c++ - Are C++17 parallel algorithms implemented yet?

coder 2023-06-01 (original post)

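I was trying to play around with the new parallel library features proposed in the C++17 standard, but I could not get them to work. I tried compiling with up-to-date versions of g++ 8.1.1 and clang++-6.0 with -std=c++17, but neither seemed to support #include <execution>, std::execution::par or anything similar.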

Looking at the cppreference page for parallel algorithms, there is a long list of algorithms, with the claim that

Technical specification provides parallelized versions of the following 69 algorithms from algorithm, numeric and memory: ( ... long list ...)

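It sounds like the algorithms are ready "on paper", but not ready to use yet?

In this SO question from over a year ago, the answers claimed that these features had not been implemented yet. By now I would have expected to see some kind of implementation. Is there anything we can already use?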

Best answer

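GCC 9 has them, but you have to install TBB separately

In Ubuntu 19.10, all components have finally aligned:

GCC 9 is the default one, and it is the minimum GCC version that supports the TBB-backed implementation
TBB (Intel Thread Building Blocks) is at 2019~U8-1, so it meets the minimum 2018 requirement

so you can simply do: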

sudo apt install g++ libtbb-dev
g++ -ggdb3 -O3 -std=c++17 -Wall -Wextra -pedantic -o main.out main.cpp -ltbb
./main.out

and use it as:

#include <execution>
#include <algorithm>

std::sort(std::execution::par_unseq, input.begin(), input.end());

See also the full runnable benchmark below.

GCC 9 and TBB 2018 are the first versions that work, as mentioned in the release notes: https://gcc.gnu.org/gcc-9/changes.html

Parallel algorithms and <execution> (requires Thread Building Blocks 2018 or newer).
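
Before touching any algorithm, it can help to see what a given toolchain advertises. The following is only a sketch: parallel_support_summary is a hypothetical helper name, and it deliberately avoids including <execution> so that it still compiles when TBB is missing. A "found" result is not a guarantee, since on GCC 9 the header can be present and still fail to compile without TBB.

```cpp
// Probe what the toolchain advertises about C++17 parallel algorithms.
// <execution> is never included here, only tested for with __has_include.
#include <string>

#if defined(__has_include)
#  if __has_include(<version>)
#    include <version>  // feature-test macros, where the library ships this header
#  endif
#endif
#include <algorithm>    // some libraries define the macro here as well

// Hypothetical helper (not part of any library): one line per check.
inline std::string parallel_support_summary() {
    std::string out;
#if defined(__has_include)
#  if __has_include(<execution>)
    out += "<execution> header: found\n";
#  else
    out += "<execution> header: missing\n";
#  endif
#else
    out += "<execution> header: unknown (no __has_include)\n";
#endif
#if defined(__cpp_lib_parallel_algorithm)
    out += "__cpp_lib_parallel_algorithm: "
           + std::to_string(__cpp_lib_parallel_algorithm) + "\n";
#else
    out += "__cpp_lib_parallel_algorithm: not defined\n";
#endif
    return out;
}
```

Printing the result, for example with std::cout << parallel_support_summary();, gives a quick first answer on a new machine; actually compiling a call with std::execution::par remains the real test.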

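Ubuntu 18.04 installation

Ubuntu 18.04 is a bit more involved:

GCC 9 can be obtained from a trustworthy PPA, so that part is not too bad
TBB is at version 2017, which does not work, and I could not find a trustworthy PPA for it. Compiling from source is easy, but there is no install target, which is annoying...

Here are fully automated, tested commands for Ubuntu 18.04: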

# Install GCC 9
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install gcc-9 g++-9

# Compile libtbb from source.
sudo apt-get build-dep libtbb-dev
git clone https://github.com/intel/tbb
cd tbb
git checkout 2019_U9
make -j `nproc`
TBB="$(pwd)"
TBB_RELEASE="${TBB}/build/linux_intel64_gcc_cc7.4.0_libc2.27_kernel4.15.0_release"

# Use them to compile our test program.
g++-9 -ggdb3 -O3 -std=c++17 -Wall -Wextra -pedantic -I "${TBB}/include" -L "${TBB_RELEASE}" -Wl,-rpath,"${TBB_RELEASE}" -o main.out main.cpp -ltbb
./main.out

Test program analysis

I have tested with this program, which compares the parallel and serial sorting speeds.

main.cpp

#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>   // uint64_t
#include <cstdlib>   // std::strtoull
#include <execution>
#include <iostream>
#include <random>
#include <string>    // std::stoi
#include <vector>

int main(int argc, char **argv) {
    using clk = std::chrono::high_resolution_clock;
    decltype(clk::now()) start, end;
    std::vector<unsigned long long> input_parallel, input_serial;
    unsigned int seed;
    unsigned long long n;

    // CLI arguments: optional element count, then optional PRNG seed.
    std::uniform_int_distribution<uint64_t> zero_ull_max(0);
    if (argc > 1) {
        n = std::strtoull(argv[1], nullptr, 0);
    } else {
        n = 10;
    }
    if (argc > 2) {
        seed = std::stoi(argv[2]);
    } else {
        seed = std::random_device()();
    }

    std::mt19937 prng(seed);
    for (unsigned long long i = 0; i < n; ++i) {
        input_parallel.push_back(zero_ull_max(prng));
    }
    input_serial = input_parallel;

    // Sort and time parallel.
    start = clk::now();
    std::sort(std::execution::par_unseq, input_parallel.begin(), input_parallel.end());
    end = clk::now();
    std::cout << "parallel " << std::chrono::duration<float>(end - start).count() << " s" << std::endl;

    // Sort and time serial.
    start = clk::now();
    std::sort(std::execution::seq, input_serial.begin(), input_serial.end());
    end = clk::now();
    std::cout << "serial " << std::chrono::duration<float>(end - start).count() << " s" << std::endl;

    assert(input_parallel == input_serial);
}
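
The start/end/print pattern in main.cpp appears twice, and it could be factored into a small timing helper. This is a sketch rather than part of the original benchmark: time_seconds is a hypothetical name, and it uses std::chrono::steady_clock, which is monotonic, instead of high_resolution_clock.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: run `work` exactly once and return the elapsed
// wall-clock time in seconds, measured with a monotonic clock.
template <class Work>
double time_seconds(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}
```

With it, the parallel measurement would read something like time_seconds([&] { std::sort(std::execution::par_unseq, input_parallel.begin(), input_parallel.end()); }), and the serial one differs only in the policy.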

On Ubuntu 19.10, on a Lenovo ThinkPad P51 laptop with CPU: Intel Core i7-7820HQ (4 cores / 8 threads, 2.90 GHz base, 8 MB cache) and RAM: 2x Samsung M471A2K43BB1-CRC (2x 16GiB, 2400 Mbps), a typical output for an input of 100 million numbers to sort:

./main.out 100000000

was:

parallel 2.00886 s
serial 9.37583 s

So the parallel version was about 4.7 times faster! See also: What do the terms "CPU bound" and "I/O bound" mean?

We can confirm that the process is spawning threads with strace:

strace -f -s999 -v ./main.out 100000000 |& grep -E 'clone'

which shows several lines of the type:

[pid 25774] clone(strace: Process 25788 attached
[pid 25774] <... clone resumed> child_stack=0x7fd8c57f4fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fd8c57f59d0, tls=0x7fd8c57f5700, child_tidptr=0x7fd8c57f59d0) = 25788

Also, if I comment out the serial version and run with:

time ./main.out 100000000

I get:

real    0m5.135s
user    0m17.824s
sys     0m0.902s

which confirms again that the algorithm was parallelized since real < user, and gives an idea of how effectively it can be parallelized on my system (about 3.5x on 4 cores / 8 threads).
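
The two ratios quoted in this section are plain divisions, and the arithmetic can be written out as below. The function names are hypothetical and the inputs are just the timings printed above. Note that the time run also includes the serial input generation, so the user/real ratio probably understates how parallel the sort itself is.

```cpp
// Hypothetical helpers for the two ratios discussed in the text.

// How many times faster the parallel run was than the serial one.
constexpr double speedup(double serial_s, double parallel_s) {
    return serial_s / parallel_s;
}

// CPU time consumed per second of wall-clock time; a value above 1
// means more than one thread was busy on average.
constexpr double cpu_to_wall_ratio(double user_s, double real_s) {
    return user_s / real_s;
}

// Values taken from the outputs shown above.
constexpr double sort_speedup = speedup(9.37583, 2.00886);          // roughly 4.67
constexpr double busy_threads = cpu_to_wall_ratio(17.824, 5.135);   // roughly 3.47
```

Dividing the second ratio by the 8 hardware threads gives a rough utilization figure of about 0.43 for the whole program.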

Error messages

Hey, Google, index this please.

If you don't have tbb installed, the error is:

In file included from /usr/include/c++/9/pstl/parallel_backend.h:14,
                 from /usr/include/c++/9/pstl/algorithm_impl.h:25,
                 from /usr/include/c++/9/pstl/glue_execution_defs.h:52,
                 from /usr/include/c++/9/execution:32,
                 from parallel_sort.cpp:4:
/usr/include/c++/9/pstl/parallel_backend_tbb.h:19:10: fatal error: tbb/blocked_range.h: No such file or directory
   19 | #include <tbb/blocked_range.h>
      |          ^~~~~~~~~~~~~~~~~~~~~
compilation terminated.

so we see that <execution> depends on a TBB component that is not installed.

If TBB is too old, e.g. the default Ubuntu 18.04 one, it fails with:

#error Intel(R) Threading Building Blocks 2018 is required; older versions are not supported.
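
Since both failures happen as soon as <execution> is included, one possible way to keep a codebase building on machines without a suitable TBB is to make the parallel path opt-in. The sketch below assumes a project-defined macro, USE_PARALLEL_SORT, which is hypothetical and not something GCC or TBB provides, and a hypothetical wrapper named sort_values.

```cpp
#include <algorithm>
#include <vector>

// USE_PARALLEL_SORT is a hypothetical, project-defined macro. Define it only
// on machines where TBB 2018 or newer is installed, for example:
//   g++ -std=c++17 -DUSE_PARALLEL_SORT main.cpp -ltbb
#ifdef USE_PARALLEL_SORT
#  include <execution>
#endif

// Hypothetical wrapper: parallel sort when opted in, plain std::sort otherwise.
template <class T>
void sort_values(std::vector<T>& values) {
#ifdef USE_PARALLEL_SORT
    std::sort(std::execution::par_unseq, values.begin(), values.end());
#else
    std::sort(values.begin(), values.end());
#endif
}
```

Without the macro the file never includes <execution>, so neither error above can trigger; the trade-off is that the build system, not the compiler, decides when the parallel version is used.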

Regarding "c++ - Are C++17 parallel algorithms implemented yet?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/51031060/
