Question
I am running the test script below:
test.py:

#!/usr/bin/env python
import spacy
import nltk

# Load spacy's English-language models
en_nlp = spacy.load('en')

# Instantiate nltk's Porter stemmer
stemmer = nltk.stem.PorterStemmer()

from nltk.stem import WordNetLemmatizer
wordnet_lemmatizer = WordNetLemmatizer()

# Define function to compare WordNet lemmatization with Porter stemming
# (spacy is used only for tokenization)
def compare_normalization(doc):
    # tokenize document in spacy
    doc_spacy = en_nlp(doc)
    # print lemmas found by the WordNet lemmatizer
    print("Lemmatization:")
    print([wordnet_lemmatizer.lemmatize(token.text) for token in doc_spacy])
    # print stems found by the Porter stemmer
    print("Stemming:")
    print([stemmer.stem(token.norm_.lower()) for token in doc_spacy])

compare_normalization(u"Our meeting today was worse than yesterday, "
                      "I'm scared of meeting the clients tomorrow.")
How-To
What ended up working for me was creating an 'nltk_data' directory in the application's folder itself, downloading the corpus to that directory, and adding a line to my code that lets nltk know to look in that directory.
Step 1: Enter the nltk downloader
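The downloader can be launched from an interactive Python session; it opens a GUI window where available, or a text menu on headless machines. A minimal sketch:

import nltk

# Open the NLTK downloader UI (GUI where available, text menu otherwise)
nltk.download()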
Step 2: Download Corpus
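In the downloader UI, set the download directory to the nltk_data folder inside your application, then select and download the wordnet package. The same thing can be done non-interactively from the shell; a sketch, reusing the placeholder path from below:

python -m nltk.downloader -d whatever_the_absolute_path_to_myapp_is/nltk_data wordnet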
Or in one step from Python code:
nltk.download("wordnet", "whatever_the_absolute_path_to_myapp_is/nltk_data/")
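(The second argument to nltk.download is the download directory, so this downloads wordnet straight into the app-local folder.)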
nltk looks for data, resources, etc. in the locations listed in the nltk.data.path variable. All you need to do is add nltk.data.path.append('whatever_the_absolute_path_to_myapp_is/nltk_data/') to the Python file that actually uses nltk, and it will look for corpora, tokenizers, and such in that directory in addition to the default paths.
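Putting it together for the script in the question, the append just has to happen before the WordNet lemmatizer is first used. A minimal sketch, reusing the same placeholder path:

import nltk

# Make nltk also search the app-local nltk_data directory
nltk.data.path.append('whatever_the_absolute_path_to_myapp_is/nltk_data/')

from nltk.stem import WordNetLemmatizer
wordnet_lemmatizer = WordNetLemmatizer()

# With the path appended, the wordnet corpus is found and no LookupError is raised
print(wordnet_lemmatizer.lemmatize('meetings'))  # -> 'meeting'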