Lucene in action: a sample application
To show you Lucene’s indexing and searching capabilities, we’ll use a pair of command-line applications: Indexer and Searcher. First we’ll index files in a directory; then we’ll search the created index. Before we can search with Lucene, we need to build an index, so we start with our Indexer application.
- Creating an index
This section presents a simple class called Indexer, which indexes all files in a directory whose names end with the .txt extension. When Indexer completes execution, it leaves behind a Lucene index for its sibling, Searcher (presented next in section 1.4.2). After the annotated code listing, we show you how to use Indexer; if it helps you to learn how Indexer is used before you see how it’s coded, go directly to the usage discussion that follows the code.
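As a rough illustration of the file-selection step only (finding the .txt files in a directory), here is a minimal sketch that uses just the Java standard library. The class name TxtFileLister and its method are hypothetical and are not taken from the Indexer listing or from Lucene; matching the extension case-insensitively is also an assumption.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not part of Lucene or of the Indexer listing.
class TxtFileLister {

    /** Returns the regular files directly inside dataDir whose names end with ".txt", sorted by path. */
    static List<Path> listTextFiles(Path dataDir) throws IOException {
        List<Path> result = new ArrayList<>();
        // try-with-resources closes the directory handle even if iteration fails
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dataDir)) {
            for (Path p : stream) {
                // Assumption: the extension check ignores case, so "A.TXT" also qualifies
                String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
                if (Files.isRegularFile(p) && name.endsWith(".txt")) {
                    result.add(p);
                }
            }
        }
        Collections.sort(result);
        return result;
    }
}
```

Each path returned by such a helper would then be handed to the indexing code, one document per file.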
USING INDEXER TO INDEX TEXT FILES
Listing 1.1 shows the Indexer command-line program, originally written for Erik’s introductory Lucene article on java.net. It takes two arguments: the directory in which to store the Lucene index and the directory containing the files to index.
Listing 1.1 Indexer, which indexes .txt files
The Version class defines enum constants, such as LUCENE_24 and LUCENE_29, that reference Lucene’s minor releases. When you pass one of these values, you instruct Lucene to match the settings and behavior of that particular release. Lucene will also emulate bugs that were present in that release and fixed in later ones, if the Lucene developers felt that fixing the bug would break backward compatibility of existing indexes. For each class that accepts a Version parameter, you’ll have to consult the Javadocs to see which settings and bugs change across versions. This shows how seriously the Lucene developers take backward compatibility.
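To make the idea of version-matched behavior concrete, here is a small sketch in plain Java. Everything in it is hypothetical: ToyVersion and ToySplitter are stand-ins invented for illustration, and the "old bug" (empty tokens between consecutive spaces) is made up. It does not describe any real Lucene class or any real Lucene bug.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a version enum; this is not Lucene's Version class.
enum ToyVersion {
    RELEASE_24, RELEASE_29;

    // True when this release is the same as or newer than `other`
    // (declaration order doubles as release order).
    boolean onOrAfter(ToyVersion other) {
        return compareTo(other) >= 0;
    }
}

// Hypothetical component that accepts a version parameter.
class ToySplitter {
    private final ToyVersion matchVersion;

    ToySplitter(ToyVersion matchVersion) {
        this.matchVersion = matchVersion;
    }

    List<String> split(String text) {
        if (matchVersion.onOrAfter(ToyVersion.RELEASE_29)) {
            // "Fixed" behavior: a run of whitespace counts as one separator.
            return Arrays.asList(text.trim().split("\\s+"));
        }
        // Invented "old bug", emulated on request: each single space is a
        // separator, so consecutive spaces produce empty tokens.
        return Arrays.asList(text.split(" "));
    }
}
```

A caller that built its data with the older behavior keeps passing RELEASE_24 and sees no change, while new code opts in to the fix by passing RELEASE_29.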
Let’s use Indexer to build our first Lucene search index!
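Suppose the current directory contains a directory ./data to be indexed (holding the files doc1.txt and doc2.txt) and you want the resulting index written to ./index. Indexer is then run with these two directories as its arguments.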
In our example, each of the indexed files was small, but roughly 0.8 seconds to index a handful of text files is reasonably impressive. Indexing throughput is clearly important, and we cover it extensively in chapter 11. But generally, searching is far more important since an index is built once but searched many times.
- Searching an index
Searching in Lucene is as fast and simple as indexing; the power of this functionality is astonishing, as chapters 3, 5, and 6 will show you. For now, let’s look at Searcher, a command-line program that we’ll use to search the index created by Indexer.
USING SEARCHER TO IMPLEMENT A SEARCH
The Searcher program, originally written for Erik’s introductory Lucene article on java.net, complements Indexer and provides command-line searching capability. Listing 1.2 shows Searcher in its entirety. It takes two command-line arguments: the path to the index created with Indexer and a query to use to search that index.
Listing 1.2 Searcher, which searches a Lucene index
Next, we can query the index just built (stored in ./index). Suppose the documents we want contain the keyword "John"; Searcher is then run against ./index with John as the query.
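The build-once, search-many pattern can be sketched with a toy in-memory inverted index. This is not Lucene's API or its index format: ToyIndex is a hypothetical class written only with the Java standard library, and the document texts in the usage note are made up.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy in-memory inverted index for illustration only; a real Lucene index is far more elaborate.
class ToyIndex {
    // term -> names of the documents that contain it
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Indexing step: split the text into lowercase terms and record the document under each term. */
    void add(String docName, String text) {
        // Anything that is not a letter or digit acts as a separator
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docName);
            }
        }
    }

    /** Search step: a single map lookup, with no rescanning of the document texts. */
    Set<String> search(String term) {
        Set<String> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? Collections.<String>emptySet() : Collections.unmodifiableSet(hits);
    }
}
```

For example, after `index.add("doc1.txt", "John likes Lucene.")` and `index.add("doc2.txt", "Mary likes search.")`, calling `index.search("John")` returns a set containing only doc1.txt. The texts are read once at indexing time, and every later query is a lookup.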
You can use more sophisticated queries, such as 'patent AND freedom' or 'patent AND NOT apache' or '+copyright +developers', and so on. Chapters 3, 5, and 6 cover various aspects of searching, including Lucene’s query syntax.
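The queries above combine required and prohibited terms: 'patent AND freedom' and '+copyright +developers' each require both terms, while 'patent AND NOT apache' requires one term and prohibits another. The sketch below models only that matching rule in plain Java. ToyBooleanQuery is a hypothetical class, it does not parse query strings, and it is not how Lucene represents or scores queries.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of required/prohibited terms; not one of Lucene's query classes.
class ToyBooleanQuery {
    private final List<String> required;
    private final List<String> prohibited;

    ToyBooleanQuery(List<String> required, List<String> prohibited) {
        // Defensive copies so later changes to the caller's lists do not affect the query
        this.required = new ArrayList<>(required);
        this.prohibited = new ArrayList<>(prohibited);
    }

    /** A document matches when it contains every required term and none of the prohibited ones. */
    boolean matches(Set<String> docTerms) {
        return docTerms.containsAll(required)
                && Collections.disjoint(docTerms, prohibited);
    }
}
```

Under this model, 'patent AND NOT apache' corresponds to `new ToyBooleanQuery(Arrays.asList("patent"), Arrays.asList("apache"))` (with `java.util.Arrays` imported), which accepts a document containing patent and freedom but rejects one containing patent and apache.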
Most of the code in these two examples is not Lucene-specific: it is Indexer’s parsing of command-line arguments and directory listings to look for text files, and Searcher’s code that prints the filenames matching a query to standard output. But don’t let this fact, or the conciseness of the examples, tempt you into complacence: there’s a lot going on under the covers of Lucene. To effectively leverage Lucene, you must understand how it works and how to extend it when the need arises. The remainder of this book is dedicated to giving you these missing pieces. Next we’ll drill down into the core classes Lucene exposes for indexing and searching.
- Understanding the core searching/indexing classes