nltk_草庐IT

python - Improving the extraction of human names with nltk

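I am trying to extract human names from text. Does anyone have a method they would recommend? This is what I tried (code below): I use nltk to find everything marked as a person, then build a list of all the NNP parts of that person. I skip persons with only one NNP, to avoid capturing a lone surname. I am getting decent results, but I wonder whether there is a better way to solve this problem. Code: import nltk from nameparser.parser import HumanName def get_human_na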

python nlp nltk
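The filtering step described in the question can be sketched without running NLTK itself. The sketch below assumes the named-entity chunks have already been reduced to (label, [(word, tag), ...]) pairs, a simplified stand-in for the PERSON subtrees that nltk.ne_chunk produces. The function name collect_person_names, the min_parts parameter and the sample data are hypothetical and are not taken from the truncated code above.

```python
from typing import List, Tuple

TaggedWord = Tuple[str, str]           # (word, POS tag)
Chunk = Tuple[str, List[TaggedWord]]   # (entity label, tagged words)


def collect_person_names(chunks: List[Chunk], min_parts: int = 2) -> List[str]:
    """Join the NNP words of each PERSON chunk, skipping short ones."""
    names = []
    for label, tagged_words in chunks:
        if label != "PERSON":
            continue
        nnp_parts = [word for word, tag in tagged_words if tag == "NNP"]
        # A single NNP is often a lone surname, so it is skipped.
        if len(nnp_parts) >= min_parts:
            names.append(" ".join(nnp_parts))
    return names


# Hypothetical, hand-written chunks for illustration only.
sample_chunks = [
    ("PERSON", [("Ada", "NNP"), ("Lovelace", "NNP")]),
    ("GPE", [("London", "NNP")]),
    ("PERSON", [("Babbage", "NNP")]),
]
print(collect_person_names(sample_chunks))
```

Lowering min_parts to 1 would also keep single-token names, at the cost of picking up lone surnames again.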

python - How to download NLTK data?

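Updated answer: NLTK works with 2.7. I had 3.2. I uninstalled 3.2 and installed 2.7. Now it works!! I have installed NLTK and tried to download the NLTK data. I followed the instructions on this site: http://www.nltk.org/data.html I downloaded NLTK, installed it, and then tried to run the following code: >>> import nltk >>> nltk.download() It gave me the following error message: Traceback (most recent call last): File "", line 1, in nltk.download() AttributeError: 'module' object has no attribute 'download' Dir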

python nltk

python - How to tokenize a string sentence in NLTK?

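I am using nltk, so I want to create my own custom texts just like the default ones in nltk.books. However, I have only got as far as an approach like my_text = ['This', 'is', 'my', 'text'] I would like to find a way to input my "text" as: my_text = "This is my text, this is a nice way to input text." Which method, in Python or in nltk, allows me to do this? And more importantly, how can I get rid of the punctuation? Best answer: This is actually on the main page of nltk.org: >>> import nltk >>> sentence = """At eight o'cl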

python nlp nltk tokenize
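For the punctuation part of the question, one simple option is a regular expression that keeps only runs of word characters. The sketch below uses the standard-library re module rather than NLTK, so it is a swapped-in technique and not the truncated answer above; NLTK's RegexpTokenizer(r"\w+") should behave similarly. The helper name tokenize_words is hypothetical.

```python
import re


def tokenize_words(text):
    """Return the runs of word characters in text, dropping punctuation."""
    return re.findall(r"\w+", text)


my_text = tokenize_words("This is my text, this is a nice way to input text.")
print(my_text)
```

Note that this pattern also splits on apostrophes, so a word such as "o'clock" becomes two tokens; a tokenizer such as nltk.word_tokenize treats those cases differently.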

Python NLTK : SyntaxError: Non-ASCII character '\xc3' in file (Sentiment Analysis -NLP)

I am using NLTK for a task on sentiment analysis. I am using Python 2.7, NLTK 3.0 and NumPy 1.9.1. This is the code: __author__ = 'karan' import nltk import re import sys def main(): print("Start"); # getting the stop words stopWords = open("english.txt", "r"); stop_word = stopWords.read().split(); AllStopWrd = [] for wd in stop_word: AllStopWrd.append(wd); print("stopwords->", Al

python unicode nlp nltk
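In Python 2, a SyntaxError of this kind usually means that the source file contains non-ASCII bytes but declares no encoding (see PEP 263). The answer to this question is not included above, so the following is only a minimal sketch, assuming the script and the stop-word file are both saved as UTF-8. The helper name load_stop_words is hypothetical; english.txt is the file name used in the question.

```python
# -*- coding: utf-8 -*-
# The declaration above must sit on the first or second line of the file.
import io


def load_stop_words(path):
    """Read whitespace-separated stop words from a UTF-8 text file."""
    # io.open accepts an encoding argument on both Python 2 and Python 3.
    with io.open(path, "r", encoding="utf-8") as handle:
        return handle.read().split()
```

A call such as load_stop_words("english.txt") would then return the words as Unicode strings, without the explicit loop and append used in the question.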

python - Stopword removal with NLTK

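I am trying to process user-entered text by removing stopwords with the nltk toolkit, but stopword removal also removes words such as "and", "or" and "not". I want these words to remain after stopword removal, because they are operators needed later when the text is processed as a query. I do not know which words can act as operators in a text query, and I also want to remove unnecessary words from the text. Best answer: NLTK has a built-in stopword list made up of 2,400 stopwords for 11 languages (Porter et al.), see http://nltk.org/book/ch02.html >>> from nltk import word_tokenize >>> from nltk.c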

python nlp nltk stop-words
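Keeping the operator words can be done by subtracting them from the stop list before filtering. The sketch below is one possible approach, not the truncated answer above. SAMPLE_STOP_WORDS is a small hypothetical stand-in; with the NLTK data installed, nltk.corpus.stopwords.words("english") would supply the full English list. The names remove_stop_words and OPERATORS are also assumptions made for illustration.

```python
# Stand-in stop list for illustration; replace with the NLTK list if available.
SAMPLE_STOP_WORDS = {"the", "a", "an", "is", "of", "and", "or", "not", "to", "in"}

# Words that must survive because they act as query operators.
OPERATORS = {"and", "or", "not"}


def remove_stop_words(tokens, stop_words=SAMPLE_STOP_WORDS, keep=OPERATORS):
    """Drop stop words from tokens, except those listed in keep."""
    effective = {w.lower() for w in stop_words} - {w.lower() for w in keep}
    return [tok for tok in tokens if tok.lower() not in effective]


tokens = "the price of tea and coffee is not fixed".split()
print(remove_stop_words(tokens))
```

Which words count as operators depends on the query language being targeted, so the keep set is left as a parameter.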

python - Creating a new corpus with NLTK

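I figured the answer to my title is usually to go and read the documentation, but I went through the NLTK book and it does not give the answer. I am fairly new to Python. I have a bunch of .txt files and I want to be able to use the corpus functions that NLTK provides for the corpora in nltk_data. I have tried PlaintextCorpusReader but could not get further than: >>> import nltk >>> from nltk.corpus import PlaintextCorpusReader >>> corpus_root = './' >>> newcorpus = PlaintextCorpusReader(corpus_root, '.*') >>> newcorpus.words() How do I use punkt to segment newco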

python nlp nltk corpus

python - What are all the possible pos tags of NLTK?

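How do I find a list of all the possible pos tags used by the Natural Language Toolkit (nltk)? Best answer: To save some people some time, here is a list I extracted from a small corpus. I do not know whether it is complete, but it should contain most (if not all) of the help definitions from upenn_tagset... CC: conjunction, coordinating & 'n and both but either et for less minus neither nor or plus so therefore times v. versus vs. whether yet CD: numeral, cardinal mid-1890 nine-thirty forty-two one-tenth ten million 0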

python nltk

python - Failed loading english.pickle with nltk.data.load

When trying to load the punkt tokenizer... import nltk.data tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle') ...a LookupError was raised: > LookupError: > ********************************************************************* > Resource 'tokenizers/punkt/english.pickle' not found. Please use the NLTK Downloader to obtain the resource: n

python nltk jenkins
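The error message itself points at the NLTK Downloader. A minimal sketch of that idea follows: look the resource up with nltk.data.find and call nltk.download only when the lookup fails. The helper name ensure_nltk_resource is hypothetical, and the package name "punkt" is assumed to match the resource path in the question; newer NLTK releases may use a different resource name. On a build server such as Jenkins, the data must end up in a directory on the NLTK data path of the user running the job.

```python
def ensure_nltk_resource(resource_path, package_name):
    """Download package_name if resource_path is missing from the NLTK data path."""
    import nltk  # imported lazily so this module loads even without NLTK

    try:
        nltk.data.find(resource_path)
    except LookupError:
        # Needs network access; nltk.download returns False if it fails.
        return nltk.download(package_name, quiet=True)
    return True


# Example call (not run here because it may hit the network):
# ensure_nltk_resource("tokenizers/punkt/english.pickle", "punkt")
```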

python - How to check which version of nltk, scikit learn is installed?

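In a shell script I am checking whether this package is installed, and installing it if it is not. So in the shell script: import nltk echo nltk.__version__ but the shell script stops at the import line. In the Linux terminal I tried checking this way: which nltk which gives no indication that it is installed. Is there any other way to verify this package installation in a shell script and, if it is not installed, to install it? Best answer: import nltk is Python syntax, so it will not work in a shell script. To test the versions of nltk and scikit_learn, you can write a Python script and run it. Such a script could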

python linux shell scikit-learn nltk
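The answer's script is cut off above, so the following is only a sketch of one way to write such a check, assuming Python 3.8 or later, where importlib.metadata is part of the standard library. The helper name installed_version is hypothetical. It looks up the installed distribution without importing the package, so a missing package does not raise an ImportError.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


for name in ("nltk", "scikit-learn"):
    print(name, installed_version(name) or "not installed")
```

A shell script could run this file with the Python interpreter and decide from its output whether to install the missing package.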
