Wednesday, April 12, 2017

[ Python FAQ ] nltk - Resource 'corpora/wordnet' not found

Source From Here 
Question 
Running the test script below: 
- test.py 
  #!/usr/bin/env python
  import spacy
  import nltk
  from nltk.stem import WordNetLemmatizer

  # Load spacy's English-language models
  en_nlp = spacy.load('en')

  # Instantiate nltk's Porter stemmer and WordNet lemmatizer
  stemmer = nltk.stem.PorterStemmer()
  wordnet_lemmatizer = WordNetLemmatizer()

  # Define function to compare WordNet lemmatization with Porter stemming in nltk
  def compare_normalization(doc):
      # tokenize document in spacy
      doc_spacy = en_nlp(doc)
      # print lemmas found by nltk's WordNet lemmatizer
      print("Lemmatization:")
      print([wordnet_lemmatizer.lemmatize(token.text) for token in doc_spacy])
      # print stems found by Porter stemmer
      print("Stemming:")
      print([stemmer.stem(token.norm_.lower()) for token in doc_spacy])

  compare_normalization(u"Our meeting today was worse than yesterday, "
                        "I'm scared of meeting the clients tomorrow.")
Running it raises a LookupError: Resource 'corpora/wordnet' not found. 

How-To 
What ended up working for me was creating an 'nltk_data' directory in the application's own folder, downloading the corpus into that directory, and adding a line to my code that tells nltk to look there. 
Step 1: Start the nltk downloader 
# python -m nltk.downloader

Step 2: Download Corpus 

Or in one step from Python code: 
  nltk.download("wordnet", "whatever_the_absolute_path_to_myapp_is/nltk_data/")
Step 3: Let nltk Know Where to Look 
nltk looks for data, resources, etc. in the locations listed in the nltk.data.path variable. Just add nltk.data.path.append('whatever_the_absolute_path_to_myapp_is/nltk_data/') to the Python file that actually uses nltk, and it will search that directory for corpora, tokenizers, and so on, in addition to the default paths.
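
The steps above can be combined into one small helper. This is only a sketch under a few assumptions: the function names app_nltk_data_dir and ensure_wordnet are made up for illustration, the 'nltk_data' folder is assumed to sit next to the script that uses nltk, and the download step needs network access.

```python
import os


def app_nltk_data_dir(app_file):
    """Return the absolute path of an 'nltk_data' folder next to app_file.

    Hypothetical helper: the folder location is an assumption for this sketch.
    """
    return os.path.join(os.path.dirname(os.path.abspath(app_file)), "nltk_data")


def ensure_wordnet(data_dir):
    """Register data_dir with nltk and download WordNet into it if missing."""
    # Imported here so the path helper above works even without nltk installed.
    import nltk

    os.makedirs(data_dir, exist_ok=True)

    # Step 3: let nltk know where to look.
    if data_dir not in nltk.data.path:
        nltk.data.path.append(data_dir)

    # Step 2: download the corpus only when nltk cannot find it.
    try:
        nltk.data.find("corpora/wordnet")
    except LookupError:
        nltk.download("wordnet", download_dir=data_dir)


# Typical use at the top of the script that needs WordNet:
#     ensure_wordnet(app_nltk_data_dir(__file__))
```

Calling ensure_wordnet before creating the WordNetLemmatizer should avoid the LookupError on machines where the corpus has not been installed yet.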
