python - 朴素贝叶斯分类器错误

coder 2023-08-24 原文

嘿，我正在尝试使用朴素贝叶斯分类器对一些文本进行分类。我正在使用 NLTK。每当我使用 classify() 方法测试分类器时，它总是为第一项返回正确的分类，并为我分类的所有其他文本行返回相同的分类。以下是我的代码:

from nltk.corpus import movie_reviews
from nltk.tokenize import word_tokenize
import nltk
import random
import nltk.data

documents = [(list(movie_reviews.words(fileid)), category)
for category in movie_reviews.categories()
for fileid in movie_reviews.fileids(category)]
random.shuffle(documents)

all_words = nltk.FreqDist(w.lower() for w in movie_reviews.words())
word_features = all_words.keys()[:2000] 

def bag_of_words(words):
    return dict([word,True] for word in words)

def document_features(document): 
    document_words = set(document) 
    features = {}
    for word in word_features:
        features['contains(%s)' % word] = (word in document_words)
    return features

featuresets = [(document_features(d), c) for (d,c) in documents]
train_set, test_set = featuresets[100:], featuresets[:100]
classifier = nltk.NaiveBayesClassifier.train(train_set)

text1="i love this city"
text2="i hate this city"


feats1=bag_of_words(word_tokenize(text1))
feats2=bag_of_words(word_tokenize(text2))


print classifier.classify(feats1)
print classifier.classify(feats2)

此代码将打印 pos 两次，就像我翻转代码的最后两行一样，它将打印 neg 两次。谁能帮忙？

最佳答案

改变

features['contains(%s)' % word] = (word in document_words)

到

features[word] = (word in document)

否则分类器只知道“contains(...)”形式的“词”，因此对“i love this city”

中的词一无所知

import nltk.tokenize as tokenize
import nltk
import random
random.seed(3)

def bag_of_words(words):
    return dict([word, True] for word in words)

def document_features(document): 
    features = {}
    for word in word_features:
        features[word] = (word in document)
        # features['contains(%s)' % word] = (word in document_words)
    return features

movie_reviews = nltk.corpus.movie_reviews

documents = [(set(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
random.shuffle(documents)

all_words = nltk.FreqDist(w.lower() for w in movie_reviews.words())
word_features = all_words.keys()[:2000] 

train_set = [(document_features(d), c) for (d, c) in documents[:200]]

classifier = nltk.NaiveBayesClassifier.train(train_set)

classifier.show_most_informative_features()
for word in ('love', 'hate'):
    # No hope in passing the tests if word is not in word_features
    assert word in word_features
    print('probability {w!r} is positive: {p:.2%}'.format(
        w = word, p = classifier.prob_classify({word : True}).prob('pos')))

tests = ["i love this city",
         "i hate this city"]

for test in tests:
    words = tokenize.word_tokenize(test)
    feats = bag_of_words(words)
    print('{s} => {c}'.format(s = test, c = classifier.classify(feats)))

产量

Most Informative Features
                   worst = True              neg : pos    =     15.5 : 1.0
              ridiculous = True              neg : pos    =     11.5 : 1.0
                  batman = True              neg : pos    =      7.6 : 1.0
                   drive = True              neg : pos    =      7.6 : 1.0
                   blame = True              neg : pos    =      7.6 : 1.0
                terrible = True              neg : pos    =      6.9 : 1.0
                  rarely = True              pos : neg    =      6.4 : 1.0
                 cliches = True              neg : pos    =      6.0 : 1.0
                       $ = True              pos : neg    =      5.9 : 1.0
               perfectly = True              pos : neg    =      5.5 : 1.0
probability 'love' is positive: 61.52%
probability 'hate' is positive: 36.71%
i love this city => pos
i hate this city => neg

关于python - 朴素贝叶斯分类器错误，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/13504424/

贝叶朴素 word features words python nltk

有关python - 朴素贝叶斯分类器错误的更多相关文章

python - 如何使用 Ruby 或 Python 创建一系列高音调和低音调的蜂鸣声？ - 2
关闭。这个问题是opinion-based.它目前不接受答案。想要改进这个问题？更新问题，以便editingthispost可以用事实和引用来回答它.关闭4年前。Improvethisquestion我想在固定时间创建一系列低音和高音调的哔哔声。例如:在150毫秒时发出高音调的蜂鸣声在151毫秒时发出低音调的蜂鸣声200毫秒时发出低音调的蜂鸣声250毫秒的高音调蜂鸣声有没有办法在Ruby或Python中做到这一点？我真的不在乎输出编码是什么(.wav、.mp3、.ogg等等)，但我确实想创建一个输出文件。
ruby-on-rails - Rails 常用字符串(用于通知和错误信息等) - 2
大约一年前，我决定确保每个包含非唯一文本的Flash通知都将从模块中的方法中获取文本。我这样做的最初原因是为了避免一遍又一遍地输入相同的字符串。如果我想更改措辞，我可以在一个地方轻松完成，而且一遍又一遍地重复同一件事而出现拼写错误的可能性也会降低。我最终得到的是这样的:moduleMessagesdefformat_error_messages(errors)errors.map{|attribute,message|"Error:#{attribute.to_s.titleize}#{message}."}enddeferror_message_could_not_find(obje
ruby-on-rails - 迷你测试错误 : "NameError: uninitialized constant" - 2
我遵循MichaelHartl的“RubyonRails教程:学习Web开发”，并创建了检查用户名和电子邮件长度有效性的测试(名称最多50个字符，电子邮件最多255个字符)。test/helpers/application_helper_test.rb的内容是:require'test_helper'classApplicationHelperTest在运行bundleexecraketest时，所有测试都通过了，但我看到以下消息在最后被标记为错误:ERROR["test_full_title_helper",ApplicationHelperTest,1.820016791]test
ruby-on-rails - 如何在 Rails View 上显示错误消息？ - 2
我是rails的新手，想在form字段上应用验证。myviewsnew.html.erb.....模拟.rbclassSimulation{:in=>1..25,:message=>'Therowmustbebetween1and25'}end模拟Controller.rbclassSimulationsController我想检查模型类中row字段的整数范围，如果不在范围内则返回错误信息。我可以检查上面代码的范围，但无法返回错误消息提前致谢最佳答案关键是您使用的是模型表单，一种显示ActiveRecord模型实例属性的表单。c
使用 ACL 调用 upload_file 时出现 Ruby S3 "Access Denied"错误 - 2
我正在尝试编写一个将文件上传到AWS并公开该文件的Ruby脚本。我做了以下事情:s3=Aws::S3::Resource.new(credentials:Aws::Credentials.new(KEY,SECRET),region:'us-west-2')obj=s3.bucket('stg-db').object('key')obj.upload_file(filename)这似乎工作正常，除了该文件不是公开可用的，而且我无法获得它的公共(public)URL。但是当我登录到S3时，我可以正常查看我的文件。为了使其公开可用，我将最后一行更改为obj.upload_file(file
ruby-on-rails - 错误 : Error installing pg: ERROR: Failed to build gem native extension - 2
我克隆了一个rails仓库，我现在正尝试捆绑安装背景:OSXElCapitanruby2.2.3p173(2015-08-18修订版51636)[x86_64-darwin15]rails-v在您的Gemfile中列出的或native可用的任何gem源中找不到gem'pg(>=0)ruby'。运行bundleinstall以安装缺少的gem。bundleinstallFetchinggemmetadatafromhttps://rubygems.org/............Fetchingversionmetadatafromhttps://rubygems.org/...Fe
ruby - ＃之间？ Cooper 的 *Beginning Ruby* 中的错误或异常 - 2
在Cooper的书BeginningRuby中，第166页有一个我无法重现的示例。classSongincludeComparableattr_accessor:lengthdef(other)@lengthother.lengthenddefinitialize(song_name,length)@song_name=song_name@length=lengthendenda=Song.new('Rockaroundtheclock',143)b=Song.new('BohemianRhapsody',544)c=Song.new('MinuteWaltz',60)a.betwee
ruby-on-rails - 每次我尝试部署时，我都会得到 - (gcloud.preview.app.deploy) 错误响应 : [4] DEADLINE_EXCEEDED - 2
我是Google云的新手，我正在尝试对其进行首次部署。我的第一个部署是RubyonRails项目。我基本上是在关注thisguideinthegoogleclouddocumentation.唯一的区别是我使用的是我自己的项目，而不是他们提供的“helloworld”项目。这是我的app.yaml文件runtime:customvm:trueentrypoint:bundleexecrackup-p8080-Eproductionconfig.ruresources:cpu:0.5memory_gb:1.3disk_size_gb:10当我转到我的项目目录并运行gcloudprevie
ruby-on-rails - Rails 5 Active Record 记录无效错误 - 2
我有两个Rails模型，即Invoice和Invoice_details。一个Invoice_details属于Invoice，一个Invoice有多个Invoice_details。我无法使用accepts_nested_attributes_forinInvoice通过Invoice模型保存Invoice_details。我收到以下错误:(0.2ms)BEGIN(0.2ms)ROLLBACKCompleted422UnprocessableEntityin25ms(ActiveRecord:4.0ms)ActiveRecord::RecordInvalid(Validationfa
arrays - 这是 Ruby 中 Array.fill 方法的错误吗？ - 2
这个问题在这里已经有了答案:Arraysmisbehaving(1个回答)关闭6年前。是否应该这样，即我误解了，还是错误？a=Array.new(3,Array.new(3))a[1].fill('g')=>[["g","g","g"],["g","g","g"],["g","g","g"]]它不应该导致:=>[[nil,nil,nil],["g","g","g"],[nil,nil,nil]]

python - 朴素贝叶斯分类器错误

有关python - 朴素贝叶斯分类器错误的更多相关文章

随机推荐