Source From Here
The API is included in the CoreNLP release from 3.6.0 onwards. Visit the download page to download CoreNLP; make sure to include both the code jar and the models jar in your classpath!
Simple CoreNLP
In addition to the fully-featured annotator pipeline interface to CoreNLP, Stanford provides a simple API for users who do not need a lot of customization. The intended audience of this package is users of CoreNLP who want “import nlp” to work as fast and easily as possible, and do not care about the details of the behaviors of the algorithms. An example usage is given below:
```java
import edu.stanford.nlp.simple.*;

Sentence sent = new Sentence("Lucy is in the sky with diamonds.");
List<String> nerTags = sent.nerTags();  // [PERSON, O, O, O, O, O, O, O]
String firstPOSTag = sent.posTag(0);    // NNP
// ...
```
Advantages and Disadvantages
This interface offers a number of advantages over the default annotator pipeline, most notably a simpler call syntax and lazy computation: each model is loaded and run only when its annotation is first requested. In exchange, users should be aware of a few disadvantages; in particular, the fine-grained annotator configuration of the full pipeline is not available.
Usage
There are two main classes in the interface: Document and Sentence. Tokens are represented as array elements in a sentence; e.g., to get the lemma of a token, fetch the lemmas array from the sentence and index it at the token's position. Constructors are provided for both the Document and Sentence classes. For the former, the text is treated as an entire document that may contain multiple sentences; for the latter, the text is forced to be interpreted as a single sentence. An example program using the interface is given below:
```java
import edu.stanford.nlp.simple.*;

public class SimpleCoreNLPDemo {
    public static void main(String[] args) {
        // Create a document. No computation is done yet.
        Document doc = new Document("add your text here! It can contain multiple sentences.");
        for (Sentence sent : doc.sentences()) {  // Will iterate over two sentences
            // We're only asking for words -- no need to load any models yet
            System.out.println("The second word of the sentence '" + sent + "' is " + sent.word(1));
            // When we ask for the lemma, it will load and run the part-of-speech tagger
            System.out.println("The third lemma of the sentence '" + sent + "' is " + sent.lemma(2));
            // When we ask for the parse, it will load and run the parser
            System.out.println("The parse of the sentence '" + sent + "' is " + sent.parse());
            // ...
        }
    }
}
```
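The difference between the two constructors can be sketched as follows. This is a minimal, hypothetical example (the class name ConstructorDemo and the sample text are illustrative), and it assumes the CoreNLP code and models jars are on the classpath:

```java
import edu.stanford.nlp.simple.Document;
import edu.stanford.nlp.simple.Sentence;

public class ConstructorDemo {
    public static void main(String[] args) {
        // Document: the text is split into (here, two) sentences.
        Document doc = new Document("One sentence. And a second one.");
        System.out.println(doc.sentences().size());  // expected: 2

        // Sentence: the same text is forced to be interpreted as a single
        // sentence, despite the sentence-final punctuation inside it.
        Sentence sent = new Sentence("One sentence. And a second one.");
        System.out.println(sent.words());
    }
}
```

Choose Document when the input may span multiple sentences; choose Sentence when you already know the boundaries and want to skip sentence splitting.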
Example-1 - Token & POS in English

```java
package demo.simple;

import edu.stanford.nlp.simple.Document;
import edu.stanford.nlp.simple.Sentence;

public class TokenizeDemo {
    public static void main(String[] args) {
        Document edoc = new Document("Hello, Mary. My name is John and nice to meet you. Will you be available tomorrow?");
        int si = 0;
        for (Sentence s : edoc.sentences()) {
            int wi = 0;
            System.out.printf("\t[Info] Sentence%d: ", si);
            for (String w : s.words()) {
                // Print each token with its part-of-speech and named-entity tag
                System.out.printf("%s(%s/%s) ", w, s.posTag(wi), s.nerTag(wi));
                wi++;
            }
            // Print the constituency parse of the sentence
            System.out.printf("\n%s\n", s.parse());
            System.out.println();
            si++;
        }
    }
}
```
Supported Annotators
The interface is not guaranteed to support all of the annotators in the CoreNLP pipeline, but most common annotators are supported. A list of these, and their invocations, is given below. Functionality is the plain-English description of the task to be performed; the second column lists the analogous CoreNLP annotator for that task; the implementing class and function describe the class and function used in this wrapper to perform the same task.
Patches for incorporating additional annotators are of course always welcome!
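As a sketch of how several of the common annotators are invoked through this wrapper, consider the following hypothetical AnnotatorDemo class (it assumes the CoreNLP code and models jars are on the classpath; actual tag values depend on the loaded models):

```java
import edu.stanford.nlp.simple.Sentence;
import java.util.List;

public class AnnotatorDemo {
    public static void main(String[] args) {
        Sentence sent = new Sentence("Lucy is in the sky with diamonds.");

        List<String> words   = sent.words();    // tokenization (tokenize annotator)
        List<String> posTags = sent.posTags();  // part-of-speech tags (pos)
        List<String> lemmas  = sent.lemmas();   // lemmatization (lemma)
        List<String> nerTags = sent.nerTags();  // named-entity tags (ner)

        // Every token-level annotation is aligned with the token list.
        System.out.println(words.size() == posTags.size());  // true
        System.out.println(posTags);
        System.out.println(nerTags);  // [PERSON, O, O, O, O, O, O, O]
    }
}
```

Each accessor triggers only the models it needs, so calling `words()` alone never loads the tagger or the NER model.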
Miscellaneous Extras
Some potentially useful utility functions are implemented in the SentenceAlgorithms class. These can be called from a Sentence object with, e.g.:
```java
import edu.stanford.nlp.ie.machinereading.structure.Span;
import edu.stanford.nlp.simple.Sentence;

Sentence sent = new Sentence("your text should go here");
sent.algorithms().headOfSpan(new Span(0, 2));  // Should return 1
```
Supplement
* Difference between constituency parser and dependency parser