Saturday, June 29, 2013

[ InAction Note ] Ch1. Introduction - ActiveMQ features and Usage

Preface: 
Enterprise messaging software has been in existence since the late 1980s. Not only is messaging a style of communication between applications, it’s also a style of integration. Therefore, messaging fulfills the need for both notification and interoperation among applications. But open source solutions have only emerged in the last 10 years. Apache ActiveMQ is one such solution, providing the ability for applications to communicate in an asynchronous, loosely coupled manner. This chapter will introduce you to ActiveMQ. 

ActiveMQ features: 
ActiveMQ provides an abundance of features created through hundreds of man-years of effort. The chapters in this book break ActiveMQ down into sets of features and describe many of them in detail. The following is a high-level list of some of the features that will be discussed throughout this book: 
- JMS compliance 
A good starting point for understanding the features in ActiveMQ is that ActiveMQ is an implementation of the JMS 1.1 spec. As discussed later in this chapter, the JMS spec provides important benefits and guarantees, including synchronous or asynchronous message delivery, once-and-only-once message delivery, message durability for subscribers, and much more. Adhering to the JMS spec for such features means that no matter what JMS provider is used, the same base set of features will be made available.

- Connectivity 
ActiveMQ provides a wide range of connectivity options, including support for protocols such as HTTP/S, IP multicast, SSL, STOMP, TCP, UDP, XMPP, and more. Support for such a wide range of protocols equates to more flexibility. Many existing systems utilize a particular protocol and don’t have the option to change, so a messaging platform that supports many protocols lowers the barrier to adoption. Though connectivity is important, the ability to closely integrate with other containers is also important. Chapter 4 addresses both the transport connectors and the network connectors in ActiveMQ.

- Pluggable persistence and security 
ActiveMQ provides multiple flavors of persistence, and you can choose between them. Also, security in ActiveMQ can be completely customized for the type of authentication and authorization that’s best for your needs. For example, ActiveMQ offers its own style of ultra-fast message persistence via KahaDB, but also supports standard JDBC-accessible databases. ActiveMQ also supports its own simple style of authentication and authorization using properties files as well as standard JAAS (Java Authentication and Authorization Service) login modules. These two topics are discussed in chapters 5 and 6.

- Building messaging applications with Java 
The most common way to use ActiveMQ is from Java applications that send and receive messages. This task entails using the JMS spec APIs with ActiveMQ and is covered in chapter 7.

- Integration with application servers 
It’s common to integrate ActiveMQ with a Java application server. Chapter 8 provides examples of integrating with some of the most popular application servers, including Apache Tomcat, Jetty, Apache Geronimo, and JBoss.

- Client APIs 
ActiveMQ provides client APIs for many languages besides just Java, including C/C++, .NET, Perl, PHP, Python, Ruby, and more. This opens the door to opportunities where ActiveMQ can be utilized outside of the Java world. Many other languages also have access to all of the features and benefits provided by ActiveMQ through these various client APIs. Of course, the ActiveMQ broker still runs in a Java VM, but the clients can be written using any of the supported languages. Client connectivity to ActiveMQ is covered in chapter 9.

- Broker clustering 
Many ActiveMQ brokers can work together as a federated network of brokers for scalability purposes. This is known as a network of brokers and can support many different topologies. This topic is covered in chapter 10.

- Many advanced broker features and client options 
ActiveMQ provides many sophisticated features for both the broker and the clients connecting to the broker. ActiveMQ also supports the use of Apache Camel within the broker’s XML configuration file. These features are discussed in chapters 11 and 12.

- Dramatically simplified administration 
ActiveMQ is designed with developers in mind. As such, it doesn’t require a dedicated administrator because it provides easy-to-use yet powerful administration features. There are many ways to monitor different aspects of ActiveMQ, including via JMX using tools such as JConsole or the ActiveMQ web console, by processing the ActiveMQ advisory messages, by using command-line scripts, and even by monitoring various types of logging. This is all covered in chapter 14.

This is just a taste of the features offered by ActiveMQ. As you can see, these topics are addressed throughout the rest of the book. For demonstration purposes, a couple of simple examples will be carried throughout, and these examples will be introduced in chapter 3. But before we look at the examples, and given that you’ve been presented with numerous features, we’re sure you have some questions about why you might use ActiveMQ. 

Using ActiveMQ: why and when? 
Back around 2003, a group of open source developers got together to form Apache Geronimo. In doing so, they discovered that there was no good message broker available that utilized a BSD-style license. Geronimo needed a JMS implementation for reasons of Java EE compatibility, so a few of the developers started discussing the possibilities. Possessing vast experience with commercial MOMs, and having even built a few MOMs themselves, these developers set out to create the next great open source message broker. Additional inspiration for ActiveMQ came from the fact that most of the MOMs on the market were commercial, closed source, and costly to buy and support. The commercial MOMs were popular with businesses, but some businesses couldn’t afford the steep costs. This further increased the motivation to build an open source alternative, which over time evolved into Apache ActiveMQ. 

ActiveMQ was meant to be used as the JMS spec intended, for remote communications between distributed applications. To better understand what this means, the best thing to do is look at a few of the ideas behind distributed application design, specifically communications. 

Loose coupling and ActiveMQ 
ActiveMQ provides the benefits of loose coupling for application architecture. Loose coupling is commonly introduced into an architecture to mitigate the classic tight coupling of Remote Procedure Calls (RPC). Such a loosely coupled design is considered to be asynchronous, where the calls from either application have no bearing on one another; there’s no interdependence or timing requirements. The applications can rely upon ActiveMQ’s ability to guarantee message delivery. Because of this, it’s often said that applications sending messages just fire-and-forget—they send the message to ActiveMQ and aren’t concerned with how or when the message is delivered. In the same manner, the consuming applications have no concern with where the messages originated or how they were sent to ActiveMQ. This is an especially powerful benefit in heterogeneous environments, allowing clients to be written using different languages and even possibly different wire protocols. ActiveMQ acts as the middleman, allowing heterogeneous integration and interaction in an asynchronous manner. More on this in the next section. 

When considering distributed application design, coupling is important. Coupling refers to the interdependence of two or more applications or systems. An easy way to think about coupling is to consider what happens to the other applications in the architecture when one application changes, for example as features are added. Do changes to one application force changes to other applications involved? If the answer is yes, then those applications are tightly coupled. But if one application can be changed without affecting other applications, then those applications are more loosely coupled. The overall lesson here is that tightly coupled applications are more difficult to maintain than loosely coupled applications. Said another way, loosely coupled applications can more easily deal with unforeseen changes.

Technologies such as those discussed in chapter 2 (COM, CORBA, DCE, and EJB) using RPC are considered to be tightly coupled. Using RPC, when one application calls another application, the caller is blocked until the callee returns control to the caller. The diagram in figure 1.1 depicts this concept. 
 

The caller (application one) in figure 1.1 is blocked until the callee (application two) returns control. Many system architectures use RPC and are successful. But there are numerous disadvantages to such a tightly coupled design: most notable is the higher amount of maintenance required, since even small changes ripple throughout the system architecture. Correct timing between the two applications is a necessity. Both applications must be available at the same time for the request from application one to reach application two, and for the response to travel from application two back to application one. Such timing requirements can be cumbersome, causing the application to be fragile. Compare such a tightly coupled design with a design where two applications are completely unaware of one another, such as that depicted in figure 1.2. 
 

Application one in figure 1.2 sends a message to the MOM in a one-way fashion. Then, possibly sometime later, application two receives a message from the MOM, in a one-way fashion. Neither application has any knowledge that the other even exists, and there’s no timing between the two applications. This one-way style of interaction results in much lower maintenance because changes in one application have little to no effect on the other application. For these reasons, loosely coupled applications offer big advantages over tightly coupled architectures when considering distributed application design. This is where ActiveMQ enters the picture. 
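The one-way interaction in figure 1.2 can be sketched in plain Java. This is a minimal illustration under stated assumptions, not ActiveMQ code: the InMemoryBroker class below is hypothetical and simply keeps named queues in memory using java.util.concurrent.BlockingQueue. It shows that the sender returns as soon as a message is stored, even though no receiver exists yet, and that the receiver can pick the messages up later.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory stand-in for a MOM; this is NOT the ActiveMQ or JMS API.
class InMemoryBroker {
    private final Map<String, BlockingQueue<String>> queues = new ConcurrentHashMap<>();

    private BlockingQueue<String> queue(String name) {
        return queues.computeIfAbsent(name, n -> new LinkedBlockingQueue<>());
    }

    // One-way send: stores the message and returns at once, whether or not a consumer exists.
    void send(String destination, String message) {
        queue(destination).add(message);
    }

    // One-way receive: waits up to timeoutMillis and returns null if no message arrives.
    String receive(String destination, long timeoutMillis) {
        try {
            return queue(destination).poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    int pending(String destination) {
        return queue(destination).size();
    }
}

class OneWayDemo {
    static String run() {
        InMemoryBroker broker = new InMemoryBroker();
        // Application one sends and moves on; application two is not running yet.
        broker.send("orders", "order-1");
        broker.send("orders", "order-2");

        // Later, application two starts and drains the queue at its own pace.
        StringBuilder received = new StringBuilder();
        String msg;
        while ((msg = broker.receive("orders", 100)) != null) {
            received.append(msg).append(';');
        }
        return received.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints order-1;order-2;
    }
}
```

A real broker adds persistence, acknowledgements, and network transports on top of this idea; the sketch only shows that there is no timing dependency between sender and receiver.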

Consider the changes necessary when an application must move to a new location. This can happen when new hardware is introduced or the application needs to be moved. With a tightly coupled system design, such movement is difficult because all segments of the application must experience an outage. With an application designed using loose coupling, different segments of the system can be moved independently of one another. Consider a scenario where there are multiple instances of application A and multiple instances of application B, where each instance resides on a different machine. ActiveMQ is installed on yet another machine, independent of either application A or application B. In this scenario, any one of the application A or application B instances can be moved around without affecting the others. In fact, multiple instances of ActiveMQ could be used in what’s known as a network of brokers configuration. This would allow the ActiveMQ instances to be moved around without affecting either application A or application B. This means that any segment of this architecture can be taken down for maintenance at any time without taking down the entire system. More details about this are available in chapter 10. 

So ActiveMQ provides an incredible amount of flexibility in application architecture, allowing the concepts surrounding loose coupling to become a reality. ActiveMQ also supports the request/reply paradigm of messaging if a completely asynchronous style of messaging isn’t possible for a given use case. But when should ActiveMQ be used to introduce these benefits? 
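Request/reply on top of one-way messaging is usually built by tagging each request with a correlation ID and a destination for the answer; in JMS these roles are played by the JMSCorrelationID and JMSReplyTo headers. The sketch below imitates that idea with in-memory queues. The Request, Reply, and RequestReplyDemo classes are hypothetical and are not part of any JMS or ActiveMQ API.

```java
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical message types. The fields mirror the roles of the JMSCorrelationID and
// JMSReplyTo headers, but this is a plain-Java sketch, not the JMS API.
final class Request {
    final String correlationId;
    final BlockingQueue<Reply> replyTo;
    final String body;

    Request(String correlationId, BlockingQueue<Reply> replyTo, String body) {
        this.correlationId = correlationId;
        this.replyTo = replyTo;
        this.body = body;
    }
}

final class Reply {
    final String correlationId;
    final String body;

    Reply(String correlationId, String body) {
        this.correlationId = correlationId;
        this.body = body;
    }
}

class RequestReplyDemo {
    static String ask(String item) {
        BlockingQueue<Request> requests = new LinkedBlockingQueue<>();
        BlockingQueue<Reply> replies = new LinkedBlockingQueue<>();

        // Responder: an independent service that answers a single request.
        Thread responder = new Thread(() -> {
            try {
                Request req = requests.take();
                String answer = req.body.equals("widget") ? "in stock" : "unknown item";
                req.replyTo.put(new Reply(req.correlationId, answer));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        responder.setDaemon(true);
        responder.start();

        // Requester: tag the request, send it, then wait for the reply with the same ID.
        String id = UUID.randomUUID().toString();
        requests.add(new Request(id, replies, item));
        try {
            Reply reply;
            while ((reply = replies.poll(2, TimeUnit.SECONDS)) != null) {
                if (reply.correlationId.equals(id)) {
                    return reply.body;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return null; // no matching reply before the timeout
    }
}
```

The requester still waits for its answer, but the two sides share only queues rather than a direct call, so the responder could be replaced or moved without the requester knowing.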

When to use ActiveMQ 
There are many occasions where ActiveMQ and asynchronous messaging can have a meaningful impact on a system architecture. Here are just a few example scenarios: 
- Heterogeneous application integration 
The ActiveMQ broker is written using the Java language, so naturally a Java client API is provided. But ActiveMQ also provides clients for C/C++, .NET, Perl, PHP, Python, Ruby, and a few other languages. This is a huge advantage when considering how you might integrate applications written in different languages on different platforms. In cases such as this, the various client APIs make it possible to send and receive messages via ActiveMQ no matter what language is used. In addition to the cross-language capabilities provided by ActiveMQ, the ability to integrate such applications without the use of RPC is definitely a big benefit because messaging truly helps to decouple the applications.

- As a replacement for RPC 
Applications using RPC-style synchronous calls are widespread. Consider that the vast majority of client-server applications use RPC including ATMs, most web applications, credit card systems, point-of-sale systems, and more. Even though many of these systems are successful, conversion to the use of asynchronous messaging can bring about benefits without giving up the guarantee of a response. Systems that rely upon synchronous requests typically have a limited ability to scale because eventually requests will begin to back up, thereby slowing the whole system. Instead of experiencing this type of a slowdown, using asynchronous messaging, additional message receivers can be easily added so that messages are consumed concurrently and therefore handled faster. This, of course, assumes that your applications can be decoupled.

- To loosen the coupling between applications 
As already discussed, tightly coupled architectures can be problematic for many reasons, especially if they’re distributed. Loosely coupled architectures, on the other hand, exhibit fewer dependencies, making them better at handling unforeseen changes. Not only will a change to one component in the system not ripple across the entire system, but component interaction is also dramatically simplified. Instead of using a synchronous scheme for component interaction (where one method calls another and the caller waits for a response from the callee), components utilize asynchronous communications (where they simply send a message without waiting for a response—also known as fire-and-forget). Such loose coupling throughout a system can lead to what’s known as an event-driven architecture (EDA).

- As the backbone of an event-driven architecture 
The decoupled, asynchronous style of architecture described in the previous point allows the broker itself to scale much further and handle considerably more clients via tuning, additional memory allocation, and so on (known as vertical scalability), instead of relying only on adding more broker nodes to handle more clients (known as horizontal scalability). Consider an incredibly high-traffic e-commerce site such as Amazon. When a user makes a purchase on Amazon, there are quite a few separate stages through which that order must travel, including order placement, invoice creation, payment processing, order fulfillment, shipping, and more. But when a user actually places an order, the user is immediately taken to a page stating, “Thanks for your order.” Not only that, but without delay, the user also receives an email stating that the order was received. The order placement process employed by Amazon is a good example of the first stage in a much larger set of asynchronous processes. Each stage of the order is handled discretely by a separate service.

- To improve application scalability 
Many applications utilize an event-driven architecture in order to provide massive scalability including such domains as e-commerce, government, manufacturing, and online gaming, just to name a few. By separating an application along lines in the business domain using asynchronous messaging, many other possibilities begin to emerge. Consider the ability to design an application using a service for a specific task. This is the backbone of service-oriented architecture (SOA). Each service fulfills a discrete function and only that function. Then applications are built through the composition of these services, and the communication among services is achieved using asynchronous messaging and eventual consistency. This style of application design makes it possible to introduce such concepts as complex event processing (CEP). Using CEP, the interactions among the components in a system are tracked for further analysis. Such possibilities are truly endless when you consider that asynchronous messaging is simply adding a level of indirection between components in a system.
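Two ideas from the list above, a chain of separate services that each handle one stage of an order and extra receivers added so messages are consumed concurrently, can be sketched together with in-memory queues. This is a plain-Java illustration under assumed names (OrderPipeline and the stage names invoice, payment, and shipping are hypothetical), not how ActiveMQ or Amazon implements it: each stage is a group of threads reading from its own queue, and the payment stage can be given several competing consumers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class OrderPipeline {
    // Starts `workers` threads that compete for messages on `in`, tag each order with
    // the stage name, and forward it to `out`. The threads run until interrupted.
    static List<Thread> stage(String name, int workers,
                              BlockingQueue<String> in, BlockingQueue<String> out) {
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        String order = in.take();
                        out.put(order + ">" + name);
                    }
                } catch (InterruptedException e) {
                    // Interrupted by run(): this stage is shutting down.
                }
            });
            t.setDaemon(true);
            t.start();
            threads.add(t);
        }
        return threads;
    }

    // Places every order (a non-blocking enqueue, like the "Thanks for your order" page),
    // then waits until all orders have passed through invoice, payment, and shipping.
    static List<String> run(List<String> orders, int paymentWorkers) {
        BlockingQueue<String> placed = new LinkedBlockingQueue<>();
        BlockingQueue<String> invoiced = new LinkedBlockingQueue<>();
        BlockingQueue<String> paid = new LinkedBlockingQueue<>();
        BlockingQueue<String> shipped = new LinkedBlockingQueue<>();

        List<Thread> all = new ArrayList<>();
        all.addAll(stage("invoice", 1, placed, invoiced));
        all.addAll(stage("payment", paymentWorkers, invoiced, paid)); // competing consumers
        all.addAll(stage("shipping", 1, paid, shipped));

        for (String order : orders) {
            placed.add(order);
        }

        List<String> done = new ArrayList<>();
        try {
            while (done.size() < orders.size()) {
                String s = shipped.poll(5, TimeUnit.SECONDS);
                if (s == null) break; // give up rather than hang forever
                done.add(s);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        for (Thread t : all) {
            t.interrupt();
        }
        Collections.sort(done); // several payment workers may finish orders out of order
        return done;
    }
}
```

Because several payment workers compete for the same queue, orders may leave that stage out of order; the sketch sorts the final list for readability, whereas a real system would rely on the order ID carried in each message.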

Tuesday, June 25, 2013

[ InAction Note ] Ch5. Advanced search techniques - Searching across multiple Lucene indexes

Preface: 
Some applications need to maintain separate Lucene indexes, yet want to allow a single search to return combined results from all the indexes. Sometimes, such separation is done for convenience or administrative reasons—for example, if different people or groups maintain the index for different collections of documents. Other times it may be done due to high volume. For example, a news site may make a new index for every month and then choose which months to search over. 

Whatever the reason, Lucene provides two useful classes for searching across multiple indexes. We’ll first meet MultiSearcher, which uses a single thread to perform searching across multiple indexes. Then we’ll see ParallelMultiSearcher, which uses multiple threads to gain concurrency. 

Using MultiSearcher: 
With MultiSearcher, all indexes can be searched with the results merged in a specified (or descending-score, by default) order. Using MultiSearcher is comparable to using IndexSearcher, except that you hand it an array of IndexSearchers to search rather than a single directory (so it’s effectively a decorator pattern and delegates most of the work to the subsearchers). 
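The merging step can be illustrated without Lucene. The sketch below is a simplified model, not Lucene's implementation: the Hit and MergeHits classes are hypothetical, each sub-index is represented by a list of (document number, score) hits, and a per-index offset keeps document numbers unique in the combined result before the hits are merged in descending-score order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical hit type: a document number and its relevance score.
final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }

    @Override
    public String toString() {
        return doc + ":" + score;
    }
}

class MergeHits {
    // subResults.get(i) holds the hits from sub-index i, numbered within that index.
    // docStarts[i] is the total document count of all earlier sub-indexes, so
    // doc + docStarts[i] is unique across the combined view.
    static List<Hit> topN(List<List<Hit>> subResults, int[] docStarts, int n) {
        PriorityQueue<Hit> byScore = new PriorityQueue<>(
                Comparator.comparingDouble((Hit h) -> h.score).reversed());
        for (int i = 0; i < subResults.size(); i++) {
            for (Hit h : subResults.get(i)) {
                byScore.add(new Hit(h.doc + docStarts[i], h.score));
            }
        }
        // Take the n highest-scoring hits across all sub-indexes.
        List<Hit> merged = new ArrayList<>();
        while (!byScore.isEmpty() && merged.size() < n) {
            merged.add(byScore.poll());
        }
        return merged;
    }
}
```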

Below illustrates how to search two indexes that are split alphabetically by keyword. The index is made up of animal names beginning with each letter of the alphabet. Half the names are in one index, and half are in the other. A search is performed with a range that spans both indexes, demonstrating that results are merged together. 
- Listing 5.17 Searching across multiple indexes 
  package ch5;

  import java.io.File;

  import junit.framework.TestCase;

  import org.apache.lucene.analysis.WhitespaceAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.IndexWriterConfig;
  import org.apache.lucene.index.IndexWriterConfig.OpenMode;
  import org.apache.lucene.index.MultiReader;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.TermRangeQuery;
  import org.apache.lucene.search.TopDocs;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.util.Version;

  public class MultiSearcherTest extends TestCase {
      public static final Version LUCENE_VERSION = Version.LUCENE_30;

      private IndexReader[] readers;

      public void setUp() throws Exception {
          String[] animals = { "aardvark", "beaver", "coati",
                               "dog", "elephant", "frog", "gila monster",
                               "horse", "iguana", "javelina", "kangaroo",
                               "lemur", "moose", "nematode", "orca",
                               "python", "quokka", "rat", "scorpion",
                               "tarantula", "uromastyx", "vicuna",
                               "walrus", "xiphias", "yak", "zebra" };
          Directory aTOmDirectory = FSDirectory.open(new File("indice/animal_a-m"));
          Directory nTOzDirectory = FSDirectory.open(new File("indice/animal_n-z"));

          // OpenMode.CREATE rebuilds each index on every run; the default
          // (CREATE_OR_APPEND) would add duplicate documents on each rerun.
          IndexWriterConfig iwConfig = new IndexWriterConfig(LUCENE_VERSION,
                  new WhitespaceAnalyzer(LUCENE_VERSION)).setOpenMode(OpenMode.CREATE);
          IndexWriter aTOmWriter = new IndexWriter(aTOmDirectory, iwConfig);

          IndexWriterConfig iwConfig2 = new IndexWriterConfig(LUCENE_VERSION,
                  new WhitespaceAnalyzer(LUCENE_VERSION)).setOpenMode(OpenMode.CREATE);
          IndexWriter nTOzWriter = new IndexWriter(nTOzDirectory, iwConfig2);

          // Names starting with a-m go to the first index, n-z to the second.
          for (int i = animals.length - 1; i >= 0; i--) {
              Document doc = new Document();
              String animal = animals[i];
              doc.add(new Field("animal", animal,
                      Field.Store.YES, Field.Index.NOT_ANALYZED));
              if (animal.charAt(0) < 'n') {
                  aTOmWriter.addDocument(doc);
              } else {
                  nTOzWriter.addDocument(doc);
              }
          }
          aTOmWriter.close();
          nTOzWriter.close();

          readers = new IndexReader[] {
              IndexReader.open(aTOmDirectory),
              IndexReader.open(nTOzDirectory)
          };
      }

      public void testMulti() throws Exception {
          // Combine both indexes into one logical view; this version uses a
          // MultiReader in place of the MultiSearcher class described in the text.
          MultiReader multiReader = new MultiReader(readers);
          IndexSearcher searcher = new IndexSearcher(multiReader);
          try {
              // Inclusive range: terms from "h" through "t".
              TermRangeQuery query = new TermRangeQuery("animal", "h", "t", true, true);
              TopDocs hits = searcher.search(query, 10);
              assertEquals("tarantula not included", 12, hits.totalHits);
          } finally {
              searcher.close();
              multiReader.close(); // also closes the two sub-readers
          }
      }
  }
The inclusive TermRangeQuery matches animal names that begin with h through animal names that begin with t, with the matching documents coming from both indexes. A related class, ParallelMultiSearcher, achieves the same functionality as MultiSearcher but uses multiple threads to gain concurrency. 
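The expected count of 12 in the listing can be checked with plain string comparison. The sketch below is an assumption-level model of the inclusive bounds, not Lucene code: the InclusiveRange class is hypothetical and keeps every name that sorts at or after "h" and at or before "t" using String.compareTo, which agrees with term order for these plain ASCII names.

```java
import java.util.ArrayList;
import java.util.List;

class InclusiveRange {
    static final String[] ANIMALS = { "aardvark", "beaver", "coati",
            "dog", "elephant", "frog", "gila monster",
            "horse", "iguana", "javelina", "kangaroo",
            "lemur", "moose", "nematode", "orca",
            "python", "quokka", "rat", "scorpion",
            "tarantula", "uromastyx", "vicuna",
            "walrus", "xiphias", "yak", "zebra" };

    // Keeps terms t with lower <= t <= upper (both bounds inclusive),
    // compared lexicographically with String.compareTo.
    static List<String> inRange(String[] terms, String lower, String upper) {
        List<String> matches = new ArrayList<>();
        for (String term : terms) {
            if (term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> hits = inRange(ANIMALS, "h", "t");
        System.out.println(hits.size()); // prints 12
    }
}
```

"tarantula" is excluded because a longer string that starts with "t" sorts after "t" itself, so an upper bound of "t" stops just before every name beginning with t.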

Multithreaded searching using ParallelMultiSearcher: 
A multithreaded version of MultiSearcher, called ParallelMultiSearcher, spawns a new thread for each Searchable and waits for them all to finish when the search method is invoked. The basic search and search with filter options are parallelized, but searching with a Collector hasn’t yet been parallelized. The exposed API is the same as MultiSearcher, so it’s a simple drop-in. 
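The "one thread per index, then wait for all of them" behavior can be sketched with an ExecutorService. This is a hypothetical illustration, not ParallelMultiSearcher's source: the ParallelSearchSketch class models each sub-index as an array of terms, runs every sub-search as its own Callable, and relies on invokeAll blocking until all tasks complete before the partial results are merged.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelSearchSketch {
    // Searches each hypothetical sub-index (an array of terms) on its own thread,
    // waits for every task to finish, then merges and sorts the matches.
    static List<String> search(List<String[]> subIndexes, String lower, String upper) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, subIndexes.size()));
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (String[] index : subIndexes) {
                tasks.add(() -> {
                    List<String> hits = new ArrayList<>();
                    for (String term : index) {
                        if (term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0) {
                            hits.add(term);
                        }
                    }
                    return hits;
                });
            }
            List<String> merged = new ArrayList<>();
            // invokeAll blocks until every task has completed, like waiting for all search threads.
            for (Future<List<String>> f : pool.invokeAll(tasks)) {
                merged.addAll(f.get());
            }
            Collections.sort(merged);
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Collections.emptyList();
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The pool is sized to the number of sub-indexes so that the sketch mirrors the one-thread-per-index idea; whether the extra threads pay off depends on the hardware, as the next paragraph explains.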

Whether you’ll see performance gains using ParallelMultiSearcher depends on your architecture. If the indexes reside on different physical disks and your computer has CPU concurrency, you should see improved performance. But there hasn’t been much real-world testing to back this up, so be sure to test it for your application. 

A cousin to ParallelMultiSearcher lives in Lucene’s contrib/remote directory, enabling you to remotely search multiple indexes in parallel. We’ll talk about term vectors next, a topic you’ve already seen on the indexing side in chapter 2.

[Git FAQ] error: The following untracked working tree files would be overwritten by merge

  Source From Here. Solution 1: // x: also delete ignored files, i.e. files git does not recognize // d: delete files in paths that have not been added to git // f: force the run # git clean -d -fx Solution 2: Today on the server, gi...