c++ - Avoiding recursive template instantiation overflow in parallel recursive asynchronous algorithms

coder 2024-02-05 original post

This problem is easier to explain through a simplified example (my real situation is far from "minimal"): given a...

template <typename T>
void post_in_thread_pool(T&& f)

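...function template, I want to create a parallel asynchronous algorithm with a tree-like recursive structure. I will use std::count_if as a placeholder to write an example of this structure. The strategy is as follows:

- If the length of the range I'm checking is smaller than 64, I fall back to the sequential std::count_if function. (0)
- If it is greater than or equal to 64, I spawn a job in the thread pool that recurses on the left half of the range, and compute the right half of the range on the current thread. (1)
    - I use an atomic shared int to "wait" for the two halves to be computed. (2)
    - I use an atomic shared int to accumulate the partial results. (3)

Simplified code: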

auto async_count_if(auto begin, auto end, auto predicate, auto continuation)
{
    // (0) Base case:  
    if(end - begin < 64)
    {
        continuation(std::count_if(begin, end, predicate));
        return;
    }

    // (1) Recursive case:
    auto counter = make_shared<atomic<int>>(2); // (2)
    auto cleanup = [=, accumulator = make_shared<atomic<int>>(0) /*(3)*/]
                   (int partial_result)
    {
        *accumulator += partial_result; 

        if(--*counter == 0)
        {
            continuation(*accumulator);
        }
    };

    const auto sz = std::distance(begin, end);
    const auto mid = std::next(begin, sz / 2);

    post_in_thread_pool([=]
    {
        async_count_if(begin, mid, predicate, cleanup);
    });

    async_count_if(mid, end, predicate, cleanup);
}

The code can then be used as follows:

std::vector<int> v(512);
std::iota(std::begin(v), std::end(v), 0);

async_count_if(std::begin(v), std::end(v),
/*    predicate */ [](auto x){ return x < 256; }, 
/* continuation */ [](auto res){ std::cout << res << std::endl; });

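The problem in the code above is auto cleanup. Every instantiation of the cleanup lambda is deduced to a unique type, and cleanup captures continuation by value, so the recursion makes the compiler compute an infinitely nested lambda type at compile time, resulting in the following error: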

fatal error: recursive template instantiation exceeded maximum depth of 1024

Conceptually, you can think of the types being built roughly like this:

cont                                // user-provided continuation
cleanup0<cont>                      // recursive step 0
cleanup1<cleanup0<cont>>            // recursive step 1
cleanup2<cleanup1<cleanup0<cont>>>  // recursive step 2
// ...
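
The nesting can be reproduced in isolation. The following is a minimal sketch, not code from the question: `wrap` and `UserCont` are hypothetical names standing in for `cleanup` and the user-provided continuation. It shows that each wrapping step yields a distinct closure type that embeds the previous one.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the user-provided continuation.
struct UserCont {
    int operator()(int x) const { return x; }
};

// Hypothetical helper: wraps a continuation in a new closure, the way
// `cleanup` wraps `continuation` at every recursive step.
template <class Cont>
auto wrap(Cont cont) {
    return [cont](int x) { return cont(x + 1); };
}

using Cont0 = UserCont;
using Cont1 = decltype(wrap(std::declval<Cont0>()));  // closure holding a Cont0
using Cont2 = decltype(wrap(std::declval<Cont1>()));  // closure holding a Cont1

// Every level is a different type, so a recursive template that passes
// wrap(cont) to itself never reaches an already-instantiated specialization.
static_assert(!std::is_same<Cont0, Cont1>::value, "level 0 and 1 differ");
static_assert(!std::is_same<Cont1, Cont2>::value, "level 1 and 2 differ");
```

Here the chain is cut off by hand after two levels; in the question's code nothing stops it at compile time, because the base case is only checked at run time.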

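(!): Keep in mind that async_count_if is just an example that shows the "tree-like" recursive structure of my real situation. I am aware that an asynchronous count_if could be implemented trivially with a single atomic counter and sz / 64 tasks.

I want to avoid the error while minimizing any possible run-time or memory overhead.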

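- One possible solution is to use std::function<void(int)> cleanup, which lets the code compile and run correctly, but produces suboptimal assembly and introduces additional dynamic allocations.
- Another possible solution is to use a std::size_t template parameter plus a specialization to artificially limit the recursion depth of async_count_if. Unfortunately, this bloats the binary size and is very inelegant.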
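
The first option could look roughly like the sketch below. This is an illustration under assumptions, not the code behind the linked example: `Continuation` is a name introduced here, and `post_in_thread_pool` is a blocking stand-in built on std::async rather than a real pool. Because every `cleanup` is erased to the same `std::function<void(int)>` type, the function template is instantiated only once per iterator/predicate pair.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <functional>
#include <future>
#include <iterator>
#include <memory>
#include <utility>

// Hypothetical stand-in for a real thread pool. It runs `f` on another
// thread and waits for it, so it is only suitable for demonstration.
template <class F>
void post_in_thread_pool(F&& f) {
    auto fut = std::async(std::launch::async, std::forward<F>(f));
    fut.wait();
}

// Type-erased continuation: the same type at every recursion level.
using Continuation = std::function<void(int)>;

template <class It, class Pred>
void async_count_if(It begin, It end, Pred predicate, Continuation continuation) {
    assert(continuation != nullptr);
    const auto sz = std::distance(begin, end);

    // (0) Base case: count sequentially and report the partial result.
    if (sz < 64) {
        continuation(static_cast<int>(std::count_if(begin, end, predicate)));
        return;
    }

    // (1) Recursive case: (2) counter for the two halves, (3) accumulator.
    auto counter = std::make_shared<std::atomic<int>>(2);
    auto accumulator = std::make_shared<std::atomic<int>>(0);
    Continuation cleanup = [=](int partial_result) {
        *accumulator += partial_result;
        if (--*counter == 0) {
            continuation(accumulator->load());
        }
    };

    const auto mid = std::next(begin, sz / 2);
    post_in_thread_pool([=] { async_count_if(begin, mid, predicate, cleanup); });
    async_count_if(mid, end, predicate, cleanup);
}
```

Each conversion of a closure to `std::function` may allocate, and each call goes through an indirect dispatch, which is the overhead the question wants to avoid.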

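What bothers me is that I know the size of the range when I call async_count_if: it is std::distance(begin, end). If I know the size of the range, I can also deduce the number of counters and continuations required: (2^k - 1), where k is the depth of the recursion tree.

Therefore, I think there should be a way to precompute a "control structure" in the first call of async_count_if and pass it by reference to the recursive calls. This "control structure" could contain enough space for (2^k - 1) atomic counters and (2^k - 1) cleanup/continuation functions.

Unfortunately, I could not find a clean way to implement this, and decided to post a question here, since this problem should be quite common when developing asynchronous parallel recursive algorithms.

What is an elegant way of dealing with this issue without introducing unnecessary overhead?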
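
One way such a control structure might be sketched is shown below. Everything here is an assumption made for illustration: the names `Node`, `Control` and `recurse`, the heap-style indexing (children of node i at 2i+1 and 2i+2), the use of a shared_ptr instead of a plain reference to keep the structure alive, and the blocking std::async stand-in for the thread pool. Instead of storing 2^k - 1 continuations, each node only stores its counter and partial sum, and a finished subtree reports to its parent by index.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <future>
#include <iterator>
#include <memory>
#include <utility>

// Hypothetical stand-in for a real thread pool; it waits for the task.
template <class F>
void post_in_thread_pool(F&& f) {
    auto fut = std::async(std::launch::async, std::forward<F>(f));
    fut.wait();
}

// State of one internal node of the recursion tree.
struct Node {
    std::atomic<int> pending{2};  // (2) halves still running
    std::atomic<int> sum{0};      // (3) accumulated partial results
};

template <class Cont>
class Control {
public:
    Control(std::size_t node_count, Cont on_done)
        : node_count_(node_count), nodes_(new Node[node_count]),
          on_done_(std::move(on_done)) {}

    // Reports the result of the subtree rooted at `index` to its ancestors.
    void complete(std::size_t index, int result) {
        while (index != 0) {
            const std::size_t parent = (index - 1) / 2;
            assert(parent < node_count_);
            Node& p = nodes_[parent];
            p.sum += result;
            if (--p.pending != 0) return;  // sibling subtree still running
            result = p.sum;
            index = parent;
        }
        on_done_(result);  // the root has finished
    }

private:
    std::size_t node_count_;
    std::unique_ptr<Node[]> nodes_;
    Cont on_done_;
};

template <class It, class Pred, class Cont>
void recurse(It begin, It end, Pred pred,
             std::shared_ptr<Control<Cont>> ctl, std::size_t index) {
    const auto sz = std::distance(begin, end);
    if (sz < 64) {  // (0) base case
        ctl->complete(index, static_cast<int>(std::count_if(begin, end, pred)));
        return;
    }
    const auto mid = std::next(begin, sz / 2);  // (1) recursive case
    post_in_thread_pool([=] { recurse(begin, mid, pred, ctl, 2 * index + 1); });
    recurse(mid, end, pred, ctl, 2 * index + 2);
}

template <class It, class Pred, class Cont>
void async_count_if(It begin, It end, Pred pred, Cont cont) {
    // k = depth of the recursion tree; the larger half has ceil(n / 2) elements.
    std::size_t depth = 0;
    for (auto n = std::distance(begin, end); n >= 64; n = (n + 1) / 2) ++depth;
    const std::size_t node_count = (std::size_t{1} << depth) - 1;  // 2^k - 1
    auto ctl = std::make_shared<Control<Cont>>(node_count, std::move(cont));
    recurse(begin, end, pred, ctl, 0);
}
```

With this layout `recurse` always calls itself with the same template arguments, so no nested closure type is built, and the only allocation is the single block made in the first call.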

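Best answer

I must be missing something very obvious, but why do you need multiple counters and structures? You can precompute the total number of iterations (if you know your base case) and share it, together with the accumulator, across all iterations, a la (I had to modify your simplified code a bit):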

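#include <algorithm>
#include <atomic>
#include <future>
#include <iostream>
#include <iterator>
#include <memory>
#include <numeric>
#include <utility>
#include <vector>

using namespace std;

// Stand-in for a real thread pool. The temporary future returned by
// std::async blocks in its destructor until `work` has finished.
template <class T>
void post_in_thread_pool(T&& work)
{
    std::async(std::launch::async, std::forward<T>(work));
}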

template <class It, class Pred, class Cont>
auto async_count_if(It begin, It end, Pred predicate, Cont continuation)
{
    // (0) Base case:  
    if(end - begin <= 64)
    {
        continuation(std::count_if(begin, end, predicate));
        return;
    }

    const auto sz = std::distance(begin, end);
    const auto mid = std::next(begin, sz / 2);                

    post_in_thread_pool([=]
    {
         async_count_if(begin, mid, predicate, continuation);
    });

    async_count_if(mid, end, predicate, continuation);
}

template <class It, class Pred, class Cont>
auto async_count_if_facade(It begin, It end, Pred predicate, Cont continuation)
{
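    // Shared state for the whole recursion tree: one counter (2), one accumulator (3).
    const auto sz = std::distance(begin, end);
    // Number of base-case calls. sz / 64 is exact when sz is 64 times a power of two
    // (fix this for mod 64 != 0 cases).
    auto counter = make_shared<atomic<int>>(sz / 64);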
    auto cleanup = [=, accumulator = make_shared<atomic<int>>(0) /*(3)*/]
                   (int partial_result)
    {
        *accumulator += partial_result; 

        if(--*counter == 0)
        {
            continuation(*accumulator);
        }
    };

    return async_count_if(begin, end, predicate, cleanup);
}

int main ()
{
    std::vector<int> v(1024);
    std::iota(std::begin(v), std::end(v), 0);

    async_count_if_facade(std::begin(v), std::end(v), 
    /*    predicate */ [](auto x){ return x > 1000; }, 
    /* continuation */ [](const auto& res){ std::cout << res << std::endl; });
}
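
The counter initialisation marked "fix this" above could be handled by counting the base-case calls exactly. The helper below is a sketch that is not part of the answer; `leaf_count` is a hypothetical name, and it assumes the same splitting as the answer's code (base case at `<= 64`, left half of `sz / 2` elements).

```cpp
#include <cassert>
#include <cstddef>

// Number of base-case calls the answer's recursion makes for a range of
// `n` elements: a range of at most `cutoff` elements is one leaf, larger
// ranges are split into n / 2 and n - n / 2 elements.
inline std::size_t leaf_count(std::size_t n, std::size_t cutoff = 64) {
    assert(cutoff > 0);
    if (n <= cutoff) return 1;
    const std::size_t left = n / 2;
    return leaf_count(left, cutoff) + leaf_count(n - left, cutoff);
}
```

For 1024 elements this gives 16, the same as sz / 64. For 130 elements the recursion produces 4 leaves (65 + 65, then 32 + 33 each) while sz / 64 is 2, and for ranges shorter than 64 elements sz / 64 is 0, so the counter would never reach zero. Initialising the counter with `leaf_count(sz)` would cover these cases.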

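Regarding "c++ - Avoiding recursive template instantiation overflow in parallel recursive asynchronous algorithms", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41572863/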
