java - Why do hash tables resize by doubling?

coder 2023-09-02 Original

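Looking at Java and searching online for hash table code examples, it seems that the table is resized by doubling it.
But most textbooks say that the best size for the table is a prime number.
So my question is:
Is the approach of doubling because:

It is easy to implement, or
Is finding a prime number too inefficient (but I think that finding the next prime by going over n+=2 and testing for primality using modulo is O(loglogN), which is cheap)
Or is this my misunderstanding, and only certain hash table variants require a prime table size?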

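Update:
The way textbooks present it, a prime size is required for certain properties to hold (e.g. quadratic probing needs a prime-sized table to prove that, for example, if the table is not full, item X will be inserted).
The link posted as a duplicate asks in general about growing by any amount, e.g. 25% or the next prime, and the accepted answer states that we double in order to keep the resize operation "rare", so that we can guarantee amortized time.
This does not answer the question of having a prime table size and resizing to a prime that is even greater than double. So the idea is to keep the properties of a prime size while taking the resizing overhead into account.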
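The textbook property mentioned in the update can be checked numerically. The following is a minimal sketch, assuming the plain probe sequence (home + i*i) mod m; the class and method names are hypothetical and exist only for this illustration. It counts how many different slots the sequence reaches for a prime and for a power-of-2 table size.

```java
import java.util.HashSet;
import java.util.Set;

final class QuadraticProbeDemo {
    // Counts the distinct slots visited by the probe sequence
    // (home + i*i) mod tableSize over the first `probes` attempts.
    // Assumes home >= 0 and tableSize > 0.
    static int distinctSlots(int home, int tableSize, int probes) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < probes; i++) {
            long offset = (long) i * i;          // long avoids overflow of i*i
            seen.add((int) ((home + offset) % tableSize));
        }
        return seen.size();
    }
}
```

With a prime size of 11, the first 6 probes land on 6 different slots, so an insertion is guaranteed to find a free slot as long as the table is at most half full. With a size of 16, even 16 probes from the same home slot reach only 4 different slots.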

Best answer

Q: But most textbooks say that the best size for the table is a prime number.

Regarding size primality:

Whether the size should be prime depends on the collision resolution algorithm you choose. Some algorithms require a prime table size (double hashing, quadratic probing); others don't, and they can benefit from a power-of-2 table size, because it allows very cheap modulo operations. However, when the closest "available table sizes" differ by a factor of 2, the memory usage of the hash table can be unpredictable. So even with linear probing or separate chaining you can choose a non-power-of-2 size. In that case it is worth choosing a prime size in particular, because:

If you pick a prime table size (either because the algorithm requires it, or because you are not satisfied with the unpredictable memory usage implied by a power-of-2 size), the table slot computation (modulo by table size) can be combined with hashing. See this answer for more.

The point that a power-of-2 table size is undesirable when the hash function distributes poorly (from the answer by Neil Coffey) does not hold in practice: even if you have a bad hash function, avalanching it and still using a power-of-2 size is faster than switching to a prime table size, because a single integer division is still slower on modern CPUs than the several multiplications and shifts required by good avalanching functions, e.g. the one from MurmurHash3.
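To make the comparison concrete, here is a small sketch of the two ways to turn a hash code into a slot index. The mixing steps follow the 32-bit finalizer of MurmurHash3 (often called fmix32); the class and method names are assumptions made for this example, not part of any library.

```java
final class SlotIndex {
    // 32-bit finalizer of MurmurHash3: a few shifts, XORs and
    // multiplications that spread the input bits over the whole word.
    static int fmix32(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Power-of-2 capacity: avalanche the hash, then mask off the low bits.
    // No division is involved. Assumes capacity is a power of 2.
    static int indexPow2(int hash, int capacity) {
        return fmix32(hash) & (capacity - 1);
    }

    // Prime capacity: one integer division (modulo) per lookup.
    // floorMod keeps the result non-negative for negative hash codes.
    static int indexPrime(int hash, int primeCapacity) {
        return Math.floorMod(hash, primeCapacity);
    }
}
```

Without the mixing step, hash codes such as 16, 32 and 48 would all mask to slot 0 in a table of 16 slots; running them through fmix32 first lets their high bits influence the masked result.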

Q: Also to be honest I got lost a bit on if you actually recommend primes or not. Seems that it depends on the hash table variant and the quality of the hash function?

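The quality of the hash function doesn't matter: you can always "improve" a hash function with MurmurHash3 avalanching, which is cheaper than switching from a power-of-2 table size to a prime table size (see above).
I would recommend choosing a prime size, with QHash or quadratic probing (they are not the same), only when you need precise control over the hash table load factor and predictably high actual loads. With a power-of-2 table size, the minimum resize factor is 2, and in general we cannot guarantee that the actual load factor of the hash table will be any higher than 0.5. See this answer.

Otherwise, I recommend a power-of-2 sized hash table with linear probing.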
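The remark about the actual load factor can be illustrated with a little arithmetic. This is a sketch under stated assumptions: the table grows as soon as its size exceeds capacity * maxLoad, and growing always doubles the capacity. The class and method names are hypothetical.

```java
final class DoublingLoad {
    // Load factor immediately after the resize that is triggered when the
    // number of entries first exceeds capacity * maxLoad, assuming the
    // capacity is doubled.
    static double loadAfterDoubling(int capacity, double maxLoad) {
        int threshold = (int) (capacity * maxLoad); // largest size that fits
        int sizeAtResize = threshold + 1;           // the entry that triggers growth
        return (double) sizeAtResize / (capacity * 2L);
    }
}
```

For a capacity of 16 and a maximum load of 0.75, the 13th entry triggers the resize and the table ends up at 13/32 = 0.40625. Whatever maximum load is configured, the load right after doubling is roughly half of it, which is why a high actual load cannot be guaranteed with this scheme.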

Q: Is the approach of doubling because:
It is easy to implement, or

Basically, in many cases, yes. See this large answer regarding load factors:

The load factor is not an essential part of the hash table data structure; it is a way to define rules of behaviour for a dynamic system (a growing/shrinking hash table is a dynamic system).

Moreover, in my opinion, in 95% of modern hash table use cases this approach is oversimplified, and the dynamic systems behave suboptimally.

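What is doubling? It is just the simplest resizing strategy. The strategy can be arbitrarily complex, performing optimally for your use cases. It can take into account the current hash table size, the growth intensity (how many get operations were done since the previous resize), and so on. Nobody forbids you from implementing such custom resizing logic.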
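One way to picture a pluggable resizing strategy is a small interface that the table consults when it has to grow. This is only a sketch: ResizePolicy, DOUBLING and FAST_START are hypothetical names invented for this example, and the 1024 cutoff is an arbitrary assumption, not a recommendation.

```java
interface ResizePolicy {
    // Returns the capacity to switch to once a table with the given
    // current capacity and number of entries has to grow.
    int nextCapacity(int currentCapacity, int size);
}

final class ResizePolicies {
    // The simplest strategy: always double.
    static final ResizePolicy DOUBLING = (capacity, size) -> capacity * 2;

    // A hypothetical alternative that looks at the current table size:
    // small tables quadruple to get through the start-up phase with fewer
    // resizes, larger tables fall back to doubling. Both branches keep a
    // power-of-2 capacity if the table started with one.
    static final ResizePolicy FAST_START = (capacity, size) ->
            capacity < 1024 ? capacity * 4 : capacity * 2;
}
```

A table implementation would call something like policy.nextCapacity(capacity, size) inside its grow step. Overflow handling for very large capacities is left out of the sketch.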

Q: Is finding a prime number too inefficient (but I think that finding the next prime going over n+=2 and testing for primality using modulo is O(loglogN) which is cheap)

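It is good practice to precompute some subset of prime hash table sizes and to choose among them at runtime using binary search. See the list of double hash capacities and explanation, and the QHash capacities. Or even use direct lookup, which is very fast.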
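A precomputed table with binary search might look like the sketch below. The short list of primes (each roughly double the previous one) and the names PrimeCapacities and capacityFor are assumptions for this illustration; they are not the capacity lists the answer links to.

```java
import java.util.Arrays;

final class PrimeCapacities {
    // Precomputed, sorted prime capacities, each roughly twice the previous.
    static final int[] PRIMES = {53, 97, 193, 389, 769, 1543, 3079, 6151, 12289};

    // Returns the smallest precomputed prime that is >= minCapacity.
    static int capacityFor(int minCapacity) {
        int idx = Arrays.binarySearch(PRIMES, minCapacity);
        if (idx < 0) {
            idx = -idx - 1;   // not found: convert to the insertion point
        }
        if (idx == PRIMES.length) {
            throw new IllegalArgumentException("no precomputed capacity >= " + minCapacity);
        }
        return PRIMES[idx];
    }
}
```

For example, capacityFor(100) returns 193 and capacityFor(53) returns 53. The lookup costs a handful of comparisons, so no primality testing happens at resize time.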

Q: Or this is my misunderstanding and only certain hashtable variants only require prime table size?

Yes, only certain types require it; see above.

Regarding "java - Why do hash tables resize by doubling?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/30382783/
