Friday, January 18, 2019

[ Python FAQ ] English grammar for parsing in NLTK

Source From Here 
Question 
Is there a ready-to-use English grammar that I can simply load and use in NLTK? I've searched for examples of parsing with NLTK, but it seems that I have to specify a grammar manually before parsing a sentence. 

How-To 
From 5. Categorizing and Tagging Words in the NLTK book, you can do POS (part-of-speech) tagging this way: 
In [1]: import nltk
In [4]: nltk.download('punkt')
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\johnlee\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping tokenizers\punkt.zip.
Out[4]: True

In [5]: text = nltk.word_tokenize('And now for something completely different')
In [7]: nltk.download('averaged_perceptron_tagger')
In [8]: for w, pos in nltk.pos_tag(text):
   ...:     print('{}/{} '.format(w, pos))
   ...:
And/CC
now/RB
for/IN
something/NN
completely/RB
different/JJ
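POS tags like these can already drive a shallow parse without a full English grammar. The following is a minimal sketch, assuming a hand-written noun-phrase pattern fed to `nltk.RegexpParser`; the `GRAMMAR` string and the helper name `noun_phrases` are illustrative choices, not part of the original answer. It works on already-tagged tokens, so no extra NLTK data download is needed.

```python
import nltk

# Illustrative chunk grammar (an assumption, not a complete English grammar):
# an NP is an optional determiner, any number of adjectives, then one or more nouns.
GRAMMAR = "NP: {<DT>?<JJ>*<NN.*>+}"


def noun_phrases(tagged_tokens, grammar=GRAMMAR):
    """Return every NP chunk found in (word, tag) pairs as a list of words."""
    tree = nltk.RegexpParser(grammar).parse(tagged_tokens)
    return [
        [word for word, _tag in subtree.leaves()]
        for subtree in tree.subtrees(lambda t: t.label() == "NP")
    ]


# The tagged tokens from the session above, written out by hand.
tagged = [
    ("And", "CC"), ("now", "RB"), ("for", "IN"),
    ("something", "NN"), ("completely", "RB"), ("different", "JJ"),
]
print(noun_phrases(tagged))  # prints [['something']]
```

A chunk grammar like this only groups adjacent tags; it does not produce a full syntax tree, which is why a dependency parser is the more practical choice for complete sentences.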
Another library, spaCy, provides a high-performance dependency parser. First, let's install it (Install spaCy): 
# yum install -y gcc
# yum install python-devel
# pip install --upgrade setuptools
# pip install spacy
# python -m spacy download en_core_web_sm

If you run into a conflict with an already installed package during installation, the pip argument --ignore-installed can sometimes bypass the issue. Now let's see how to use the library (linguistic features): 
>>> import spacy
>>> nlp = spacy.load('en_core_web_sm')
>>> doc = nlp(u'Apple is looking at buying U.K. startup for $1 billion.')
>>> for token in doc:
...     print("{}/{} ".format(token.text, token.pos_))
...
Apple/PROPN
is/VERB
looking/VERB
at/ADP
buying/VERB
U.K./PROPN
startup/NOUN
for/ADP
$/SYM
1/NUM
billion/NUM
./PUNCT
Choi et al. (2015) found spaCy to be the fastest dependency parser available. It processes over 13,000 sentences a second, on a single thread. On the standard WSJ evaluation it scores 92.7%, over 1% more accurate than any of CoreNLP's models. 
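The session above prints only POS tags, while the dependency parse itself is exposed through each token's `dep_` and `head` attributes. The following is a minimal sketch, assuming the `en_core_web_sm` model is installed; the helper names `dependency_rows` and `main` are hypothetical, and the actual labels depend on the model version, so no output is shown.

```python
def dependency_rows(doc):
    """Return (token text, dependency label, head text) for each token in doc."""
    return [(token.text, token.dep_, token.head.text) for token in doc]


def main():
    try:
        import spacy
        nlp = spacy.load("en_core_web_sm")
    except (ImportError, OSError) as err:
        # ImportError: spaCy is not installed; OSError: the model package is missing.
        print("spaCy or its English model is unavailable:", err)
        return
    doc = nlp("Apple is looking at buying U.K. startup for $1 billion.")
    for text, dep, head in dependency_rows(doc):
        print("{:<10} --{}--> {}".format(text, dep, head))


if __name__ == "__main__":
    main()
```

Each row reads as "token, its relation, the word it attaches to", which gives a parse of the sentence without writing any grammar by hand.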

Supplement 
Natural Language Processing Made Easy – using SpaCy (in Python)
