程式扎記: [ Python FAQ ] English grammar for parsing in NLTK

Friday, January 18, 2019

Source From Here
Question
Is there a ready-to-use English grammar that I can just load and use in NLTK? I've searched for examples of parsing with NLTK, but it seems that I have to specify a grammar manually before parsing a sentence.

How-To
As shown in chapter 5 of the NLTK book, "Categorizing and Tagging Words", you can do part-of-speech (POS) tagging this way:

In [1]: import nltk  
In [4]: nltk.download('punkt')  
[nltk_data] Downloading package punkt to  
[nltk_data]     C:\Users\johnlee\AppData\Roaming\nltk_data...  
[nltk_data]   Unzipping tokenizers\punkt.zip.  
Out[4]: True  
  
In [5]: text = nltk.word_tokenize('And now for something completely different')  
In [7]: nltk.download('averaged_perceptron_tagger')  
In [8]: for w,pos in nltk.pos_tag(text):  
   ...:     print('{}/{} '.format(w, pos))  
   ...:  
And/CC  
now/RB  
for/IN  
something/NN  
completely/RB  
different/JJ  
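
For contrast, the manual approach mentioned in the question looks roughly like the sketch below. The grammar rules, the vocabulary, and the sample sentence are made up for illustration and cover only this toy sentence; they are not a general English grammar. The sentence is split on whitespace so that no extra NLTK data package is needed.

```python
import nltk

# A toy grammar written by hand. The rules and words are illustrative
# assumptions, not a ready-made English grammar.
toy_grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the' | 'a'
N -> 'dog' | 'cat'
V -> 'saw' | 'chased'
""")

parser = nltk.ChartParser(toy_grammar)

# Plain whitespace split, so the punkt tokenizer is not required here.
tokens = 'the dog chased a cat'.split()

# parse() yields every tree the grammar allows for these tokens.
trees = list(parser.parse(tokens))
for tree in trees:
    print(tree)
```

Any word that is missing from the grammar makes the parser reject the input, which is why a hand-written grammar does not scale to arbitrary English text.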

Another library, spaCy, provides a high-performance dependency parser. First, install it (see "Install spaCy"):

# yum install -y gcc
# yum install python-devel
# pip install --upgrade setuptools
# pip install spacy
# python -m spacy download en_core_web_sm

If pip reports a conflict with an already installed package, passing --ignore-installed sometimes works around it. Next, let's see how to use the library (see "Linguistic Features"):

>>> import spacy
>>> nlp = spacy.load('en_core_web_sm')
>>> doc = nlp(u'Apple is looking at buying U.K. startup for $1 billion.')
>>> for token in doc:
...     print("{}/{} ".format(token.text, token.pos_))
...
Apple/PROPN
is/VERB
looking/VERB
at/ADP
buying/VERB
U.K./PROPN
startup/NOUN
for/ADP
$/SYM
1/NUM
billion/NUM
./PUNCT
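
The session above prints only POS tags. Since the dependency parse is the part that replaces a hand-written grammar, here is a minimal sketch of reading it from the same kind of `doc`. The helper names `dependency_rows` and `parse` are hypothetical and exist only for this example. `token.dep_` and `token.head` are spaCy token attributes, and the sketch assumes the `en_core_web_sm` model is installed.

```python
def dependency_rows(doc):
    """Return (token text, dependency label, head text) for each token.

    `doc` is expected to be a spaCy Doc, but any iterable of objects with
    .text, .dep_ and .head attributes works.
    """
    return [(token.text, token.dep_, token.head.text) for token in doc]


def parse(sentence, model='en_core_web_sm'):
    """Load a spaCy model and return the dependency rows for one sentence.

    Hypothetical convenience wrapper. spaCy is imported here so that
    dependency_rows() can be used without it.
    """
    import spacy
    nlp = spacy.load(model)
    return dependency_rows(nlp(sentence))
```

Calling `parse('Apple is looking at buying U.K. startup for $1 billion.')` returns one row per token. Each row links a word to its syntactic head, and the exact labels depend on the model version.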

Choi et al. (2015) found spaCy to be the fastest dependency parser available. It processes over 13,000 sentences a second, on a single thread. On the standard WSJ evaluation it scores 92.7%, over 1% more accurate than any of CoreNLP's models.

Supplement
* Natural Language Processing Made Easy – using SpaCy (in Python)
