Python Gensim: how to calculate document similarity using the LDA model?

coder 2023-05-24 Original post

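I have a trained LDA model, and I want to compute a similarity score between two documents from the corpus I trained the model on. I have worked through all the Gensim tutorials and functions, but I still can't figure it out. Can anyone give me a hint? Thanks!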

Best answer

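It depends on which similarity metric you want to use. Cosine similarity is generally useful and is built into Gensim: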

import gensim
sim = gensim.matutils.cossim(lda_vec1, lda_vec2)

Hellinger distance is useful for measuring similarity between probability distributions (such as LDA topic distributions):

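import gensim
import numpy as np

# Expand the sparse (topic_id, probability) lists into dense arrays of length num_topics.
dense1 = gensim.matutils.sparse2full(lda_vec1, lda.num_topics)
dense2 = gensim.matutils.sparse2full(lda_vec2, lda.num_topics)
# Hellinger distance: 0 for identical distributions, larger values mean less similar.
hellinger_dist = np.sqrt(0.5 * ((np.sqrt(dense1) - np.sqrt(dense2))**2).sum())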
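Note that the Hellinger formula yields a distance, not a similarity: it is 0 for identical distributions and reaches 1 when the two distributions share no topics. The following standard-library sketch illustrates this on hand-written topic vectors; the helper names (`sparse_to_dense`, `hellinger`) and the example vectors are hypothetical and only mirror the formula above. Gensim also provides `gensim.matutils.hellinger`, which may be preferable in practice.

```python
import math

def sparse_to_dense(sparse_vec, num_topics):
    """Expand a list of (topic_id, probability) pairs into a dense list."""
    dense = [0.0] * num_topics
    for topic_id, prob in sparse_vec:
        dense[topic_id] = float(prob)
    return dense

def hellinger(p, q):
    """Hellinger distance between two dense probability distributions."""
    return math.sqrt(
        0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    )

num_topics = 3
# Hand-written topic distributions (illustrative, not model output).
doc_a = sparse_to_dense([(0, 0.9), (1, 0.1)], num_topics)
doc_b = sparse_to_dense([(0, 0.8), (2, 0.2)], num_topics)
doc_c = sparse_to_dense([(2, 1.0)], num_topics)

d_same = hellinger(doc_a, doc_a)  # identical distributions
d_ab = hellinger(doc_a, doc_b)    # overlapping topics
d_ac = hellinger(doc_a, doc_c)    # no shared topics

# One common convention for turning the distance into a similarity score.
sim_ab = 1.0 - d_ab
print(d_same, d_ab, d_ac, sim_ab)
```

Whether to report the distance itself or `1 - distance` is a design choice; what matters is applying the same convention across all document pairs being compared.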

For more on "Python Gensim: how to calculate document similarity using the LDA model?", see the similar question on Stack Overflow: https://stackoverflow.com/questions/22433884/
