python - 寻找给定向量的 k 最近邻？

coder 2023-06-11 原文

鉴于我的知识数据库中有以下内容:

1  0  6 20  0  0  6 20
1  0  3  6  0  0  3  6
1  0 15 45  0  0 15 45
1  0 17 44  0  0 17 44
1  0  2  5  0  0  2  5

我希望能够找到以下向量的最近邻:

1  0  5 16  0  0  5 16

根据距离度量。所以在这种情况下，给定一个特定的阈值，我应该发现列出的第一个向量是给定向量的近邻。目前，我的知识数据库的大小约为数百万，因此计算每个点的距离度量然后进行比较证明是昂贵的。是否有任何替代方法可以显着加快实现这一目标？

我对几乎任何方法都持开放态度，包括在 MySQL 中使用空间索引(除了我不完全确定如何解决这个问题)或某种散列(这很好，但我也不完全相信当然)。

最佳答案

在 Python 中(来自 www.comp.mq.edu.au/):

def count_different_values(k_v1s, k_v2s):
    """kv1s and kv2s should be dictionaries mapping keys to 
    values.  count_different_values() returns the number of keys in 
    k_v1s and k_v2s that don't have the same value"""
    ks = set(k_v1s.iterkeys()) | set(k_v2s.iterkeys())
    return sum(1 for k in ks if k_v1s.get(k) != k_v2s.get(k))


def sum_square_diffs(x0s, x1s):
    """x1s and x2s should be equal-lengthed sequences of numbers.
    sum_square_differences() returns the sum of the squared differences 
    of x1s and x2s."""
    sum((pow(x1-x2,2) for x1,x2 in zip(x1s,x2s)))

def incr(x_c, x, inc=1):
    """increments the value associated with key x in dictionary x_c
    by inc, or sets it to inc if key x is not in dictionary x_c."""
    x_c[x] = x_c.get(x, 0) + inc

def count_items(xs, x_c=None):
    """returns a dictionary x_c whose keys are the items in xs, and 
    whose values are the number of times each item occurs in xs."""
    if x_c == None:
        x_c = {}
    for x in xs:
        incr(x_c, x)
    return x_c

def second(xy):
    """returns the second element in a sequence"""
    return xy[1]

def most_frequent(xs):
    """returns the most frequent item in xs"""
    x_c = count_items(xs)
    return sorted(x_c.iteritems(), key=second, reverse=True)[0][0]


class kNN_classifier:
    """This is a k-nearest-neighbour classifer."""
    def __init__(self, train_data, k, distf):
        self.train_data = train_data
        self.k = min(k, len(train_data))
        self.distf = distf

    def classify(self, x):
        Ns = sorted(self.train_data, 
                    key=lambda xy: self.distf(xy[0], x))
        return most_frequent((y for x,y in Ns[:self.k]))

    def batch_classify(self, xs):
        return [self.classify(x) for x in xs]

def train(train_data, k=1, distf=count_different_values):
    """Returns a kNN_classifer that contains the data, the number of
    nearest neighbours k and the distance function"""
    return kNN_classifier(train_data, k, distf)

也是 www.umanitoba.ca/的另一个实现

#!/usr/bin/env python
# This code is part of the Biopython distribution and governed by its
# license.  Please see the LICENSE file that should have been included
# as part of this package.
"""
This module provides code for doing k-nearest-neighbors classification.

k Nearest Neighbors is a supervised learning algorithm that classifies
a new observation based the classes in its surrounding neighborhood.

Glossary:
distance   The distance between two points in the feature space.
weight     The importance given to each point for classification. 


Classes:
kNN           Holds information for a nearest neighbors classifier.


Functions:
train        Train a new kNN classifier.
calculate    Calculate the probabilities of each class, given an observation.
classify     Classify an observation into a class.

    Weighting Functions:
equal_weight    Every example is given a weight of 1.

"""

import numpy

class kNN:
    """Holds information necessary to do nearest neighbors classification.

    Members:
    classes  Set of the possible classes.
    xs       List of the neighbors.
    ys       List of the classes that the neighbors belong to.
    k        Number of neighbors to look at.

    """
    def __init__(self):
        """kNN()"""
        self.classes = set()
        self.xs = []
        self.ys = []
        self.k = None

def equal_weight(x, y):
    """equal_weight(x, y) -> 1"""
    # everything gets 1 vote
    return 1

def train(xs, ys, k, typecode=None):
    """train(xs, ys, k) -> kNN

    Train a k nearest neighbors classifier on a training set.  xs is a
    list of observations and ys is a list of the class assignments.
    Thus, xs and ys should contain the same number of elements.  k is
    the number of neighbors that should be examined when doing the
    classification.

    """
    knn = kNN()
    knn.classes = set(ys)
    knn.xs = numpy.asarray(xs, typecode)
    knn.ys = ys
    knn.k = k
    return knn

def calculate(knn, x, weight_fn=equal_weight, distance_fn=None):
    """calculate(knn, x[, weight_fn][, distance_fn]) -> weight dict

    Calculate the probability for each class.  knn is a kNN object.  x
    is the observed data.  weight_fn is an optional function that
    takes x and a training example, and returns a weight.  distance_fn
    is an optional function that takes two points and returns the
    distance between them.  If distance_fn is None (the default), the
    Euclidean distance is used.  Returns a dictionary of the class to
    the weight given to the class.

    """
    x = numpy.asarray(x)

    order = []  # list of (distance, index)
    if distance_fn:
        for i in range(len(knn.xs)):
            dist = distance_fn(x, knn.xs[i])
            order.append((dist, i))
    else:
        # Default: Use a fast implementation of the Euclidean distance
        temp = numpy.zeros(len(x))
        # Predefining temp allows reuse of this array, making this
        # function about twice as fast.
        for i in range(len(knn.xs)):
            temp[:] = x - knn.xs[i]
            dist = numpy.sqrt(numpy.dot(temp,temp))
            order.append((dist, i))
    order.sort()

    # first 'k' are the ones I want.
    weights = {}  # class -> number of votes
    for k in knn.classes:
        weights[k] = 0.0
    for dist, i in order[:knn.k]:
        klass = knn.ys[i]
        weights[klass] = weights[klass] + weight_fn(x, knn.xs[i])

    return weights

def classify(knn, x, weight_fn=equal_weight, distance_fn=None):
    """classify(knn, x[, weight_fn][, distance_fn]) -> class

    Classify an observation into a class.  If not specified, weight_fn will
    give all neighbors equal weight.  distance_fn is an optional function
    that takes two points and returns the distance between them.  If
    distance_fn is None (the default), the Euclidean distance is used.
    """
    weights = calculate(
        knn, x, weight_fn=weight_fn, distance_fn=distance_fn)

    most_class = None
    most_weight = None
    for klass, weight in weights.items():
        if most_class is None or weight > most_weight:
            most_class = klass
            most_weight = weight
    return most_class

关于python - 寻找给定向量的 k 最近邻？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/5684370/

给定 python 34 weight distance mysql algorithm language-agnostic search

有关python - 寻找给定向量的 k 最近邻？的更多相关文章

python - 如何使用 Ruby 或 Python 创建一系列高音调和低音调的蜂鸣声？ - 2
关闭。这个问题是opinion-based.它目前不接受答案。想要改进这个问题？更新问题，以便editingthispost可以用事实和引用来回答它.关闭4年前。Improvethisquestion我想在固定时间创建一系列低音和高音调的哔哔声。例如:在150毫秒时发出高音调的蜂鸣声在151毫秒时发出低音调的蜂鸣声200毫秒时发出低音调的蜂鸣声250毫秒的高音调蜂鸣声有没有办法在Ruby或Python中做到这一点？我真的不在乎输出编码是什么(.wav、.mp3、.ogg等等)，但我确实想创建一个输出文件。
ruby - 寻找通过阅读代码确定编程语言的ruby gem？ - 2
几个月前，我读了一篇关于rubygem的博客文章，它可以通过阅读代码本身来确定编程语言。对于我的生活，我不记得博客或gem的名称。谷歌搜索“ruby编程语言猜测”及其变体也无济于事。有人碰巧知道相关gem的名称吗？最佳答案是这个吗:http://github.com/chrislo/sourceclassifier/tree/master 关于ruby-寻找通过阅读代码确定编程语言的rubygem？，我们在StackOverflow上找到一个类似的问题：
Python 相当于 Perl/Ruby ||= - 2
这个问题在这里已经有了答案:关闭10年前。PossibleDuplicate:Pythonconditionalassignmentoperator对于这样一个简单的问题表示歉意，但是谷歌搜索||=并不是很有帮助；)Python中是否有与Ruby和Perl中的||=语句等效的语句？例如:foo="hey"foo||="what"#assignfooifit'sundefined#fooisstill"hey"bar||="yeah"#baris"yeah"另外，类似这样的东西的通用术语是什么？条件分配是我的第一个猜测，但Wikipediapage跟我想的不太一样。
java - 什么相当于 ruby 的 rack 或 python 的 Java wsgi？ - 2
什么是ruby的rack或python的Java的wsgi？还有一个路由库。最佳答案来自Python标准PEP333:Bycontrast,althoughJavahasjustasmanywebapplicationframeworksavailable,Java's"servlet"APImakesitpossibleforapplicationswrittenwithanyJavawebapplicationframeworktoruninanywebserverthatsupportstheservletAPI.ht
华为OD机试用Python实现 -【明明的随机数】 2023Q1A - 2
华为OD机试题本篇题目：明明的随机数题目输入描述输出描述：示例1输入输出说明代码编写思路最近更新的博客华为od2023|什么是华为od，od薪资待遇，od机试题清单华为OD机试真题大全，用Python解华为机试题|机试宝典【华为OD机试】全流程解析+经验分享,题型分享,防作弊指南华为o
python - 如何读取 MIDI 文件、更改其乐器并将其写回？ - 2
我想解析一个已经存在的.mid文件，改变它的乐器，例如从“acousticgrandpiano”到“violin”，然后将它保存回去或作为另一个.mid文件。根据我在文档中看到的内容，该乐器通过program_change或patch_change指令进行了更改，但我找不到任何在已经存在的MIDI文件中执行此操作的库.他们似乎都只支持从头开始创建的MIDI文件。最佳答案 MIDIpackage会为您完成此操作，但具体方法取决于midi文件的原始内容。一个MIDI文件由一个或多个音轨组成，每个音轨是十六个channel中任何一个上的
「Python｜Selenium｜场景案例」如何定位iframe中的元素？ - 2
本文主要介绍在使用Selenium进行自动化测试或者任务时，对于使用了iframe的页面，如何定位iframe中的元素文章目录场景描述解决方案具体代码场景描述当我们在使用Selenium进行自动化测试的时候，可能会遇到一些界面或者窗体是使用HTML的iframe标签进行承载的。对于iframe中的标签，如果直接查找是无法找到的，会抛出没有找到元素的异常。比如近在咫尺的例子就是，CSDN的登录窗体就是使用的iframe，大家可以尝试通过F12开发者模式查看到的tag_name,class_name,id或者xpath来定位中的页面元素，会抛出NoSuchElementException异常。解决
python ffmpeg 使用 pyav 转换一组图像到视频 - 2
2022/8/4更新支持加入水印水印必须包含透明图像，并且水印图像大小要等于原图像的大小pythonconvert_image_to_video.py-f30-mwatermark.pngim_dirout.mkv2022/6/21更新让命令行参数更加易用新的命令行使用方法pythonconvert_image_to_video.py-f30im_dirout.mkvFFMPEG命令行转换一组JPG图像到视频时，是将这组图像视为MJPG流。我需要转换一组PNG图像到视频，FFMPEG就不认了。pyav内置了ffmpeg库，不需要系统带有ffmpeg工具因此我使用ffmpeg的python包装p
Python 刷Leetcode题库，顺带学英语单词（31） - 2
ValidPalindromeGivenastring,determineifitisapalindrome,consideringonlyalphanumericcharactersandignoringcases. [#125]Example:"Aman,aplan,acanal:Panama"isapalindrome."raceacar"isnotapalindrome.Haveyouconsiderthatthestringmightbeempty?Thisisagoodquestiontoaskduringaninterview.Forthepurposeofthisproblem
python - 是否可以使用 Ruby 或 Python 禁用 anchor /引用来发出有效的 YAML？ - 2
是否可以在PyYAML或Ruby的Psych引擎中禁用创建anchor和引用(并有效地显式列出冗余数据)？也许我在网上搜索时遗漏了一些东西，但在Psych中似乎没有太多可用的选项，而且我也无法确定PyYAML是否允许这样做.基本原理是我必须序列化一些数据并将其以可读的形式传递给一个不是真正的技术同事进行手动验证。有些数据是多余的，但我需要以最明确的方式列出它们以提高可读性(anchor和引用是提高效率的好概念，但不是人类可读性)。Ruby和Python是我选择的工具，但如果有其他一些相当简单的方法来“展开”YAML文档，它可能就可以了。最佳答案

python - 寻找给定向量的 k 最近邻？

有关python - 寻找给定向量的 k 最近邻？的更多相关文章

随机推荐