java - Why doesn't Arrays.binarySearch improve performance compared to traversing the array?

coder 2023-05-18 Original

I am trying to solve the Hackerland Radio Transmitters programming challenge.

In short, the challenge is as follows:

Hackerland is a one-dimensional city with n houses, where each house i is located at some x_i on the x-axis. The Mayor wants to install radio transmitters on the roofs of the city's houses. Each transmitter has a range, k, meaning it can transmit a signal to all houses ≤ k units of distance away.

Given a map of Hackerland and the value of k, can you find the minimum number of transmitters needed to cover every house?

My implementation is as follows:

package biz.tugay;

import java.util.*;

public class HackerlandRadioTransmitters {

    public static int minNumOfTransmitters(int[] houseLocations, int transmitterRange) {
        // Sort and remove duplicates..
        houseLocations = uniqueHouseLocationsSorted(houseLocations);
        int towerCount = 0;
        for (int nextHouseNotCovered = 0; nextHouseNotCovered < houseLocations.length; ) {
            final int towerLocation = HackerlandRadioTransmitters.findNextTowerIndex(houseLocations, nextHouseNotCovered, transmitterRange);
            towerCount++;
            nextHouseNotCovered = HackerlandRadioTransmitters.nextHouseNotCoveredIndex(houseLocations, towerLocation, transmitterRange);
            if (nextHouseNotCovered == -1) {
                break;
            }
        }
        return towerCount;
    }

    public static int findNextTowerIndex(final int[] houseLocations, final int houseNotCoveredIndex, final int transmitterRange) {
        final int houseLocationWeWantToCover = houseLocations[houseNotCoveredIndex];
        final int farthestHouseLocationAllowed = houseLocationWeWantToCover + transmitterRange;
        int towerIndex = houseNotCoveredIndex;
        int loop = 0;
        while (true) {
            loop++;
            if (towerIndex == houseLocations.length - 1) {
                break;
            }
            if (farthestHouseLocationAllowed >= houseLocations[towerIndex + 1]) {
                towerIndex++;
                continue;
            }
            break;
        }
        System.out.println("findNextTowerIndex looped : " + loop);
        return towerIndex;
    }

    public static int nextHouseNotCoveredIndex(final int[] houseLocations, final int towerIndex, final int transmitterRange) {
        final int towerCoversUntil = houseLocations[towerIndex] + transmitterRange;
        int notCoveredHouseIndex = towerIndex + 1;
        int loop = 0;
        while (notCoveredHouseIndex < houseLocations.length) {
            loop++;
            final int locationOfHouseBeingChecked = houseLocations[notCoveredHouseIndex];
            if (locationOfHouseBeingChecked > towerCoversUntil) {
                break; // Tower does not cover the house anymore, break the loop..
            }
            notCoveredHouseIndex++;
        }
        if (notCoveredHouseIndex == houseLocations.length) {
            notCoveredHouseIndex = -1;
        }
        System.out.println("nextHouseNotCoveredIndex looped : " + loop);
        return notCoveredHouseIndex;
    }

    public static int[] uniqueHouseLocationsSorted(final int[] houseLocations) {
        Arrays.sort(houseLocations);
        final HashSet<Integer> integers = new HashSet<>();
        final int[] houseLocationsUnique = new int[houseLocations.length];

        int innerCounter = 0;
        for (int houseLocation : houseLocations) {
            if (integers.contains(houseLocation)) {
                continue;
            }
            houseLocationsUnique[innerCounter] = houseLocation;
            integers.add(houseLocationsUnique[innerCounter]);
            innerCounter++;
        }
        return Arrays.copyOf(houseLocationsUnique, innerCounter);
    }
}

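I am pretty sure this implementation is correct. But look at the details of the functions findNextTowerIndex and nextHouseNotCoveredIndex: they walk the array one element at a time!

One of my tests is as follows: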

static void test_01() throws FileNotFoundException {
    final long start = System.currentTimeMillis();
    final File file = new File("input.txt");
    final Scanner scanner = new Scanner(file);
    int[] houseLocations = new int[73382];
    for (int counter = 0; counter < 73382; counter++) {
        houseLocations[counter] = scanner.nextInt();
    }
    final int[] uniqueHouseLocationsSorted = HackerlandRadioTransmitters.uniqueHouseLocationsSorted(houseLocations);
    final int minNumOfTransmitters = HackerlandRadioTransmitters.minNumOfTransmitters(uniqueHouseLocationsSorted, 73381);
    assert minNumOfTransmitters == 1;
    final long end = System.currentTimeMillis();
    System.out.println("Took: " + (end - start) + " milliseconds..");
}

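input.txt can be downloaded from here. (This is not the most important detail in this question, but still..) So we have an array of 73382 houses, and I deliberately set the transmitter range so that my methods loop a lot:

Here is sample output from the test on my machine: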

findNextTowerIndex looped : 38213
nextHouseNotCoveredIndex looped : 13785
Took: 359 milliseconds..

I also have this test, which does not assert anything and only measures the time:

static void test_02() throws FileNotFoundException {
    final long start = System.currentTimeMillis();
    for (int i = 0; i < 400; i ++) {
        final File file = new File("input.txt");
        final Scanner scanner = new Scanner(file);
        int[] houseLocations = new int[73382];
        for (int counter = 0; counter < 73382; counter++) {
            houseLocations[counter] = scanner.nextInt();
        }
        final int[] uniqueHouseLocationsSorted = HackerlandRadioTransmitters.uniqueHouseLocationsSorted(houseLocations);

        final int transmitterRange = ThreadLocalRandom.current().nextInt(1, 70000);
        final int minNumOfTransmitters = HackerlandRadioTransmitters.minNumOfTransmitters(uniqueHouseLocationsSorted, transmitterRange);
    }
    final long end = System.currentTimeMillis();
    System.out.println("Took: " + (end - start) + " milliseconds..");
}

I create 400 random transmitter ranges and run the program 400 times. On my machine I get run times like the following:

Took: 20149 milliseconds..

So then I thought: why not use binary search instead of traversing the array? I changed my implementation as follows:

public static int findNextTowerIndex(final int[] houseLocations, final int houseNotCoveredIndex, final int transmitterRange) {
    final int houseLocationWeWantToCover = houseLocations[houseNotCoveredIndex];
    final int farthestHouseLocationAllowed = houseLocationWeWantToCover + transmitterRange;
    int nextTowerIndex = Arrays.binarySearch(houseLocations, 0, houseLocations.length, farthestHouseLocationAllowed);

    if (nextTowerIndex < 0) {
        nextTowerIndex = -nextTowerIndex;
        nextTowerIndex = nextTowerIndex -2;
    }

    return nextTowerIndex;
}

public static int nextHouseNotCoveredIndex(final int[] houseLocations, final int towerIndex, final int transmitterRange) {
    final int towerCoversUntil = houseLocations[towerIndex] + transmitterRange;
    int nextHouseNotCoveredIndex = Arrays.binarySearch(houseLocations, 0, houseLocations.length, towerCoversUntil);

    if (-nextHouseNotCoveredIndex > houseLocations.length) {
        return -1;
    }

    if (nextHouseNotCoveredIndex < 0) {
        nextHouseNotCoveredIndex = - (nextHouseNotCoveredIndex + 1);
        return nextHouseNotCoveredIndex;
    }

    return nextHouseNotCoveredIndex + 1;
}

I was expecting a big performance boost, since each search now takes at most O(log N) steps instead of O(N). The output of test_01:

Took: 297 milliseconds..

Remember, it was 359 milliseconds before. And for test_02:

Took: 18047 milliseconds..

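So I consistently get around 20 seconds with the array-traversal implementation, and around 18 - 19 seconds with the binary search implementation.

I was expecting a much bigger performance gain from Arrays.binarySearch, but apparently that is not the case. Why is that? What am I missing? Do I need an array with more than 73382 elements to see the benefit, or is that irrelevant?

Edit #01

After @huck_cussler's comment, I tried doubling and tripling the dataset I have (using random numbers) and ran test_02 again (increasing the array sizes in the test itself to match, of course). For the linear implementation, the timings are: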

Took: 18789 milliseconds..
Took: 34396 milliseconds..
Took: 53504 milliseconds..

For the binary search implementation, I get the following values:

Took: 18644 milliseconds..
Took: 33831 milliseconds..
Took: 52886 milliseconds..

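Best answer

Your timing includes retrieving the data from the hard drive. That may be taking up the majority of your run time. Omit the data load from your timings to get a more accurate comparison of your two approaches. Imagine the load takes 18 seconds and you are comparing 18.644 vs 18.789 (a 0.77% improvement) instead of 0.644 vs 0.789 (an 18.38% improvement).

If you have a linear operation, O(n), such as loading a binary structure, and you combine it with a binary search, O(log n), you end up with O(n). If you trust Big O notation, then you should expect O(n + log n) to be not significantly different from O(2 * n), since both reduce to O(n).

Also, a binary search may perform better or worse than a linear search depending on the density of houses between towers. Say there are 1024 houses with a tower evenly placed every 4 houses. A linear search takes 4 steps per tower, whereas a binary search takes log2(1024)=10 steps per tower.

One more thing... your minNumOfTransmitters method sorts the already sorted array passed into it from test_01 and test_02. That re-sorting step takes longer than your searches themselves, which further obscures the timing difference between your two search algorithms.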
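The bound behind that statement can be written out explicitly. This is only a worked illustration: n stands for the number of houses, and the concrete value n = 73382 is taken from the question.

```latex
\[
  n + \log_2 n \;\le\; 2n \quad \text{for all } n \ge 1
  \qquad\Longrightarrow\qquad
  O(n + \log n) = O(2n) = O(n).
\]
\[
  n = 73382:\qquad n + \log_2 n \approx 73382 + 16.2 = 73398.2,
  \qquad 2n = 146764.
\]
```

Both totals grow linearly in n, so replacing the search step alone cannot change the order of the overall run time.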
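The density argument can be checked by counting steps directly. The following is a minimal sketch, not the code from the question: the class SearchStepCount and its helpers are hypothetical names introduced for illustration, and the houses are assumed to sit at positions 0..1023 with one tower per group of 4.

```java
public class SearchStepCount {

    // Counts how many houses a linear scan visits, starting at index `from`,
    // before it passes `limit` (hypothetical helper, for illustration only).
    static int linearSteps(int[] sorted, int from, int limit) {
        int steps = 0;
        for (int i = from; i < sorted.length && sorted[i] <= limit; i++) {
            steps++;
        }
        return steps;
    }

    // Counts the probes of a textbook binary search over the whole array
    // for the last value <= limit (hypothetical helper, for illustration only).
    static int binarySteps(int[] sorted, int limit) {
        int lo = 0, hi = sorted.length - 1, steps = 0;
        while (lo <= hi) {
            steps++;
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] <= limit) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return steps;
    }

    // Returns {total linear steps, total binary probes, number of towers}
    // for evenly spaced houses with one tower per `housesPerTower` houses.
    static int[] totals(int houseCount, int housesPerTower) {
        int[] houses = new int[houseCount];
        for (int i = 0; i < houseCount; i++) {
            houses[i] = i; // assumption: houses at consecutive integer positions
        }
        int linear = 0, binary = 0, towers = 0;
        for (int start = 0; start < houseCount; start += housesPerTower) {
            int limit = houses[start] + housesPerTower - 1;
            linear += linearSteps(houses, start, limit);
            binary += binarySteps(houses, limit);
            towers++;
        }
        return new int[] {linear, binary, towers};
    }

    public static void main(String[] args) {
        int[] t = totals(1024, 4);
        System.out.println("towers: " + t[2]);
        System.out.println("linear steps: " + t[0]);
        System.out.println("binary probes: " + t[1]);
    }
}
```

Under these assumptions the linear scan takes 4 steps per tower (1024 in total), while each binary search over the full array needs roughly 10 to 11 probes, so the binary version does more work here, not less.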

======

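I created a small timing class to get a better look at what is going on. I removed the line of code from minNumOfTransmitters that re-runs the sort, and added a boolean parameter to select whether to use your binary version. It sums the times over 400 iterations, separating out each step. The results on my system show that the load time dwarfs the sort time, which in turn dwarfs the solve time.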

  Load:  22.565s
  Sort:   4.518s
Linear:   0.012s
Binary:   0.003s

It is easy to see that optimizing the last step does not make much difference to the overall run time.

private static class Timing {
    public long load=0;
    public long sort=0;
    public long solve1=0;
    public long solve2=0;
    private String secs(long millis) {
        return String.format("%3d.%03ds", millis/1000, millis%1000);
    }
    public String toString() {
        return "  Load: " + secs(load) + "\n  Sort: " + secs(sort) + "\nLinear: " + secs(solve1) + "\nBinary: " + secs(solve2);
    }
    public void add(Timing timing) {
        load+=timing.load;
        sort+=timing.sort;
        solve1+=timing.solve1;
        solve2+=timing.solve2;
    }
}

static Timing test_01() throws FileNotFoundException {
    Timing timing=new Timing();
    long start = System.currentTimeMillis();
    final File file = new File("c:\\path\\to\\xnpwdiG3.txt");
    final Scanner scanner = new Scanner(file);
    int[] houseLocations = new int[73382];
    for (int counter = 0; counter < 73382; counter++) {
        houseLocations[counter] = scanner.nextInt();
    }
    timing.load+=System.currentTimeMillis()-start;
    start=System.currentTimeMillis();
    final int[] uniqueHouseLocationsSorted = HackerlandRadioTransmitters.uniqueHouseLocationsSorted(houseLocations);
    timing.sort=System.currentTimeMillis()-start;
    start=System.currentTimeMillis();
    final int minNumOfTransmitters = HackerlandRadioTransmitters.minNumOfTransmitters(uniqueHouseLocationsSorted, 73381, false);
    timing.solve1=System.currentTimeMillis()-start;
    start=System.currentTimeMillis();
    final int minNumOfTransmittersBin = HackerlandRadioTransmitters.minNumOfTransmitters(uniqueHouseLocationsSorted, 73381, true);
    timing.solve2=System.currentTimeMillis()-start;
    final long end = System.currentTimeMillis();
    return timing;
}
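The three-argument minNumOfTransmitters called above is not shown in the answer. Below is a minimal sketch of what such a method might look like, assuming the input is already sorted and de-duplicated so that no re-sort is needed. The class name TransmitterSolverSketch and the helpers lastIndexAtMostLinear and lastIndexAtMostBinary are hypothetical and are not the answerer's actual code.

```java
import java.util.Arrays;

public class TransmitterSolverSketch {

    // Assumes `sorted` is already sorted and free of duplicates, so it is not re-sorted here.
    public static int minNumOfTransmitters(int[] sorted, int range, boolean useBinarySearch) {
        int towerCount = 0;
        int next = 0; // index of the leftmost house that is not covered yet
        while (next < sorted.length) {
            // Put the tower on the farthest house that can still reach sorted[next].
            int reach = sorted[next] + range;
            int tower = useBinarySearch
                    ? lastIndexAtMostBinary(sorted, reach)
                    : lastIndexAtMostLinear(sorted, next, reach);
            towerCount++;
            // Skip every house the new tower covers to its right.
            int coveredUntil = sorted[tower] + range;
            int lastCovered = useBinarySearch
                    ? lastIndexAtMostBinary(sorted, coveredUntil)
                    : lastIndexAtMostLinear(sorted, tower, coveredUntil);
            next = lastCovered + 1;
        }
        return towerCount;
    }

    // Linear scan: move right from `from` while the next house is still <= limit.
    static int lastIndexAtMostLinear(int[] sorted, int from, int limit) {
        int i = from;
        while (i + 1 < sorted.length && sorted[i + 1] <= limit) {
            i++;
        }
        return i;
    }

    // Binary search: on a miss, Arrays.binarySearch returns -(insertionPoint) - 1,
    // so the last index with a value <= limit is insertionPoint - 1.
    // Relies on the array having no duplicates, otherwise a hit may not be the last match.
    static int lastIndexAtMostBinary(int[] sorted, int limit) {
        int r = Arrays.binarySearch(sorted, limit);
        return r >= 0 ? r : -r - 2;
    }

    public static void main(String[] args) {
        int[] houses = {1, 2, 3, 4, 5};
        System.out.println(minNumOfTransmitters(houses, 1, false)); // prints 2
        System.out.println(minNumOfTransmitters(houses, 1, true));  // prints 2
    }
}
```

Both branches should return the same count for any sorted, duplicate-free input. Only the number of steps spent locating each index differs, which is the part the Linear and Binary rows in the timing table measure.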

Regarding java - Why doesn't Arrays.binarySearch improve performance compared to traversing the array?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43620487/
