c++ - 当一个工作线程失败时，如何中止剩余的工作人员？

coder 2023-05-31 原文

我有一个程序，它产生多个线程，每个线程都执行一个长时间运行的任务。然后主线程等待所有工作线程加入，收集结果，然后退出。

如果其中一个工作人员发生错误，我希望其余工作人员优雅地停止，以便主线程可以稍后退出。

我的问题是，当长期运行的任务的实现由我无法修改其代码的库提供时，如何最好地做到这一点。

这是一个简单的系统草图，没有错误处理:

void threadFunc()
{
    // Do long-running stuff
}

void mainFunc()
{
    std::vector<std::thread> threads;

    for (int i = 0; i < 3; ++i) {
        threads.push_back(std::thread(&threadFunc));
    }

    for (auto &t : threads) {
        t.join();
    }
}

如果长时间运行的函数执行一个循环并且我可以访问代码，那么只需在每次迭代顶部检查共享的“继续运行”标志即可中止执行。

std::mutex mutex;
bool error;

void threadFunc()
{
    try {
        for (...) {
            {
                std::unique_lock<std::mutex> lock(mutex);
                if (error) {
                    break;
                }
            }
        }
    } catch (std::exception &) {
        std::unique_lock<std::mutex> lock(mutex);
        error = true;
    }
}

现在考虑由库提供长时间运行的操作的情况:

std::mutex mutex;
bool error;

class Task
{
public:
    // Blocks until completion, error, or stop() is called
    void run();

    void stop();
};

void threadFunc(Task &task)
{
    try {
        task.run();
    } catch (std::exception &) {
        std::unique_lock<std::mutex> lock(mutex);
        error = true;
    }
}

在这种情况下，主线程必须处理错误，并调用 stop() on 仍在运行的任务。因此，它不能简单地等待每个 worker join() 和原来的实现一样。

到目前为止我使用的方法是在主线程和每个worker:

struct SharedData
{
    std::mutex mutex;
    std::condition_variable condVar;
    bool error;
    int running;
}

当工作人员成功完成时，它会减少 running 计数。如果捕获到异常时，worker 设置 error 标志。在这两种情况下，它然后调用 condVar.notify_one()。

然后主线程等待条件变量，如果有则唤醒 error 已设置或 running 达到零。醒来时，主线程如果设置了 error，则在所有任务上调用 stop()。

这种方法有效，但我觉得应该有一个更清洁的解决方案，使用一些标准并发库中的高级原语。能有人建议改进的实现吗？

这是我当前解决方案的完整代码:

// main.cpp

#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

#include "utils.h"

// Class which encapsulates long-running task, and provides a mechanism for aborting it
class Task
{
public:
    Task(int tidx, bool fail)
    :   tidx(tidx)
    ,   fail(fail)
    ,   m_run(true)
    {

    }

    void run()
    {
        static const int NUM_ITERATIONS = 10;

        for (int iter = 0; iter < NUM_ITERATIONS; ++iter) {
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                if (!m_run) {
                    out() << "thread " << tidx << " aborting";
                    break;
                }
            }

            out() << "thread " << tidx << " iter " << iter;
            std::this_thread::sleep_for(std::chrono::milliseconds(100));

            if (fail) {
                throw std::exception();
            }
        }
    }

    void stop()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_run = false;
    }

    const int tidx;
    const bool fail;

private:
    std::mutex m_mutex;
    bool m_run;
};

// Data shared between all threads
struct SharedData
{
    std::mutex mutex;
    std::condition_variable condVar;
    bool error;
    int running;

    SharedData(int count)
    :   error(false)
    ,   running(count)
    {

    }
};

void threadFunc(Task &task, SharedData &shared)
{
    try {
        out() << "thread " << task.tidx << " starting";

        task.run(); // Blocks until task completes or is aborted by main thread

        out() << "thread " << task.tidx << " ended";
    } catch (std::exception &) {
        out() << "thread " << task.tidx << " failed";

        std::unique_lock<std::mutex> lock(shared.mutex);
        shared.error = true;
    }

    {
        std::unique_lock<std::mutex> lock(shared.mutex);
        --shared.running;
    }

    shared.condVar.notify_one();
}

int main(int argc, char **argv)
{
    static const int NUM_THREADS = 3;

    std::vector<std::unique_ptr<Task>> tasks(NUM_THREADS);
    std::vector<std::thread> threads(NUM_THREADS);

    SharedData shared(NUM_THREADS);

    for (int tidx = 0; tidx < NUM_THREADS; ++tidx) {
        const bool fail = (tidx == 1);
        tasks[tidx] = std::make_unique<Task>(tidx, fail);
        threads[tidx] = std::thread(&threadFunc, std::ref(*tasks[tidx]), std::ref(shared));
    }

    {
        std::unique_lock<std::mutex> lock(shared.mutex);

        // Wake up when either all tasks have completed, or any one has failed
        shared.condVar.wait(lock, [&shared](){
            return shared.error || !shared.running;
        });

        if (shared.error) {
            out() << "error occurred - terminating remaining tasks";
            for (auto &t : tasks) {
                t->stop();
            }
        }
    }

    for (int tidx = 0; tidx < NUM_THREADS; ++tidx) {
        out() << "waiting for thread " << tidx << " to join";
        threads[tidx].join();
        out() << "thread " << tidx << " joined";
    }

    out() << "program complete";

    return 0;
}

这里定义了一些实用函数:

// utils.h

#include <iostream>
#include <mutex>
#include <thread>

#ifndef UTILS_H
#define UTILS_H

#if __cplusplus <= 201103L
// Backport std::make_unique from C++14
#include <memory>
namespace std {

template<typename T, typename ...Args>
std::unique_ptr<T> make_unique(
            Args&& ...args)
{
    return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}

} // namespace std
#endif // __cplusplus <= 201103L

// Thread-safe wrapper around std::cout
class ThreadSafeStdOut
{
public:
    ThreadSafeStdOut()
    :   m_lock(m_mutex)
    {

    }

    ~ThreadSafeStdOut()
    {
        std::cout << std::endl;
    }

    template <typename T>
    ThreadSafeStdOut &operator<<(const T &obj)
    {
        std::cout << obj;
        return *this;
    }

private:
    static std::mutex m_mutex;
    std::unique_lock<std::mutex> m_lock;
};

std::mutex ThreadSafeStdOut::m_mutex;

// Convenience function for performing thread-safe output
ThreadSafeStdOut out()
{
    return ThreadSafeStdOut();
}

#endif // UTILS_H

最佳答案

我一直在考虑您的情况，这可能对您有所帮助。您可能可以尝试使用几种不同的方法来实现您的目标。有 2-3 个选项可能有用，或者是所有三个选项的组合。我将至少展示第一个选项，因为我仍在学习并尝试掌握模板特化的概念以及使用 Lambda。

使用管理器类
使用模板特化封装
使用 Lambda。

Manager 类的伪代码如下所示:

class ThreadManager {
private:
    std::unique_ptr<MainThread> mainThread_;
    std::list<std::shared_ptr<WorkerThread> lWorkers_;  // List to hold finished workers
    std::queue<std::shared_ptr<WorkerThread> qWorkers_; // Queue to hold inactive and waiting threads.
    std::map<unsigned, std::shared_ptr<WorkerThread> mThreadIds_; // Map to associate a WorkerThread with an ID value.
    std::map<unsigned, bool> mFinishedThreads_; // A map to keep track of finished and unfinished threads.

    bool threadError_; // Not needed if using exception handling
public:
    explicit ThreadManager( const MainThread& main_thread );

    void shutdownThread( const unsigned& threadId );
    void shutdownAllThreads();

    void addWorker( const WorkerThread& worker_thread );          
    bool isThreadDone( const unsigned& threadId );

    void spawnMainThread() const; // Method to start main thread's work.

    void spawnWorkerThread( unsigned threadId, bool& error );

    bool getThreadError( unsigned& threadID ); // Returns True If Thread Encountered An Error and passes the ID of that thread, 

};

仅出于演示目的，为了结构简单，我使用 bool 值来确定线程是否失败，当然，如果您更喜欢使用异常或无效的无符号值等，这可以替换为您的喜欢。

现在使用这种类型的类将是这样的:另请注意，如果它是 Singleton 类型的对象，则认为这种类型的类会更好，因为您不想要超过 1 个 ManagerClass，因为您正在工作使用共享指针。

SomeClass::SomeClass( ... ) {
    // This class could contain a private static smart pointer of this Manager Class
    // Initialize the smart pointer giving it new memory for the Manager Class and by passing it a pointer of the Main Thread object

   threadManager_ = new ThreadManager( main_thread ); // Wouldn't actually use raw pointers here unless if you had a need to, but just shown for simplicity       
}

SomeClass::addThreads( ... ) {
    for ( unsigned u = 1, u <= threadCount; u++ ) {
         threadManager_->addWorker( some_worker_thread );
    }
}

SomeClass::someFunctionThatSpawnsThreads( ... ) {
    threadManager_->spawnMainThread();

    bool error = false;       
    for ( unsigned u = 1; u <= threadCount; u++ ) {
        threadManager_->spawnWorkerThread( u, error );

        if ( error ) { // This Thread Failed To Start, Shutdown All Threads
            threadManager->shutdownAllThreads();
        }
    }

    // If all threads spawn successfully we can do a while loop here to listen if one fails.
    unsigned threadId;
    while ( threadManager_->getThreadError( threadId ) ) {
         // If the function passed to this while loop returns true and we end up here, it will pass the id value of the failed thread.
         // We can now go through a for loop and stop all active threads.
         for ( unsigned u = threadID + 1; u <= threadCount; u++ ) {
             threadManager_->shutdownThread( u );
         }

         // We have successfully shutdown all threads
         break;
    }
}

我喜欢管理器类的设计，因为我在其他项目中使用过它们，并且它们经常派上用场，尤其是在使用包含许多和多种资源的代码库时，例如具有许多 Assets 的工作游戏引擎，例如如 Sprite 、纹理、音频文件、 map 、游戏项目等。使用管理器类有助于跟踪和维护所有 Assets 。同样的概念可以应用于“管理”事件、非事件、等待线程，并且知道如何直观地正确处理和关闭所有线程。如果您的代码库和库支持异常以及线程安全异常处理，而不是传递和使用 bool 值来处理错误，我建议您使用 ExceptionHandler。还拥有一个 Logger 类可以很好地写入日志文件和/或控制台窗口，以提供有关异常被抛出的函数以及导致异常的原因的显式消息，其中日志消息可能如下所示:

Exception Thrown: someFunctionNamedThis in ThisFile on Line# (x)
    threadID 021342 failed to execute.

通过这种方式，您可以查看日志文件并很快找出导致异常的线程，而不是使用传递的 bool 变量。

关于c++ - 当一个工作线程失败时，如何中止剩余的工作人员？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/32246665/

amp 工作人员 std lt mutex c++multithreading

有关c++ - 当一个工作线程失败时，如何中止剩余的工作人员？的更多相关文章

ruby - 如何使用 Nokogiri 的 xpath 和 at_xpath 方法 - 2
我正在学习如何使用Nokogiri，根据这段代码我遇到了一些问题:require'rubygems'require'mechanize'post_agent=WWW::Mechanize.newpost_page=post_agent.get('http://www.vbulletin.org/forum/showthread.php?t=230708')puts"\nabsolutepathwithtbodygivesnil"putspost_page.parser.xpath('/html/body/div/div/div/div/div/table/tbody/tr/td/div
ruby - 如何从 ruby 中的字符串运行任意对象方法？ - 2
总的来说，我对ruby还比较陌生，我正在为我正在创建的对象编写一些rspec测试用例。许多测试用例都非常基础，我只是想确保正确填充和返回值。我想知道是否有办法使用循环结构来执行此操作。不必为我要测试的每个方法都设置一个assertEquals。例如:describeitem,"TestingtheItem"doit"willhaveanullvaluetostart"doitem=Item.new#HereIcoulddotheitem.name.shouldbe_nil#thenIcoulddoitem.category.shouldbe_nilendend但我想要一些方法来使用
python - 如何使用 Ruby 或 Python 创建一系列高音调和低音调的蜂鸣声？ - 2
关闭。这个问题是opinion-based.它目前不接受答案。想要改进这个问题？更新问题，以便editingthispost可以用事实和引用来回答它.关闭4年前。Improvethisquestion我想在固定时间创建一系列低音和高音调的哔哔声。例如:在150毫秒时发出高音调的蜂鸣声在151毫秒时发出低音调的蜂鸣声200毫秒时发出低音调的蜂鸣声250毫秒的高音调蜂鸣声有没有办法在Ruby或Python中做到这一点？我真的不在乎输出编码是什么(.wav、.mp3、.ogg等等)，但我确实想创建一个输出文件。
ruby-on-rails - 由于 "wkhtmltopdf"，PDFKIT 显然无法正常工作 - 2
我在从html页面生成PDF时遇到问题。我正在使用PDFkit。在安装它的过程中，我注意到我需要wkhtmltopdf。所以我也安装了它。我做了PDFkit的文档所说的一切......现在我在尝试加载PDF时遇到了这个错误。这里是错误:commandfailed:"/usr/local/bin/wkhtmltopdf""--margin-right""0.75in""--page-size""Letter""--margin-top""0.75in""--margin-bottom""0.75in""--encoding""UTF-8""--margin-left""0.75in""-
ruby-on-rails - 如何验证 update_all 是否实际在 Rails 中更新 - 2
给定这段代码defcreate@upgrades=User.update_all(["role=?","upgraded"],:id=>params[:upgrade])redirect_toadmin_upgrades_path,:notice=>"Successfullyupgradeduser."end我如何在该操作中实际验证它们是否已保存或未重定向到适当的页面和消息？最佳答案在Rails3中，update_all不返回任何有意义的信息，除了已更新的记录数(这可能取决于您的DBMS是否返回该信息)。http://ar.ru
ruby-on-rails - 'compass watch' 是如何工作的/它是如何与 rails 一起使用的 - 2
我在我的项目目录中完成了compasscreate.和compassinitrails。几个问题:我已将我的.sass文件放在public/stylesheets中。这是放置它们的正确位置吗？当我运行compasswatch时，它不会自动编译这些.sass文件。我必须手动指定文件:compasswatchpublic/stylesheets/myfile.sass等。如何让它自动运行？文件ie.css、print.css和screen.css已放在stylesheets/compiled。如何在编译后不让它们重新出现的情况下删除它们？我自己编译的.sass文件编译成compiled/t
ruby - 如何将脚本文件的末尾读取为数据文件(Perl 或任何其他语言) - 2
我正在寻找执行以下操作的正确语法(在Perl、Shell或Ruby中):#variabletoaccessthedatalinesappendedasafileEND_OF_SCRIPT_MARKERrawdatastartshereanditcontinues. 最佳答案 Perl用__DATA__做这个:#!/usr/bin/perlusestrict;usewarnings;while(){print;}__DATA__Texttoprintgoeshere 关于ruby-如何将脚
ruby - 如何指定 Rack 处理程序 - 2
Rackup通过Rack的默认处理程序成功运行任何Rack应用程序。例如:classRackAppdefcall(environment)['200',{'Content-Type'=>'text/html'},["Helloworld"]]endendrunRackApp.new但是当最后一行更改为使用Rack的内置CGI处理程序时，rackup给出“NoMethodErrorat/undefinedmethod`call'fornil:NilClass”:Rack::Handler::CGI.runRackApp.newRack的其他内置处理程序也提出了同样的反对意见。例如Rack
ruby - 使用 Vim Rails，您可以创建一个新的迁移文件并一次性打开它吗？ - 2
使用带有Rails插件的vim，您可以创建一个迁移文件，然后一次性打开该文件吗？textmate也可以这样吗？最佳答案你可以使用rails.vim然后做类似的事情::Rgeneratemigratonadd_foo_to_bar插件将打开迁移生成的文件，这正是您想要的。我不能代表textmate。关于ruby-使用VimRails，您可以创建一个新的迁移文件并一次性打开它吗？，我们在StackOverflow上找到一个类似的问题： https://sta
ruby-on-rails - Rails - 一个 View 中的多个模型 - 2
我需要从一个View访问多个模型。以前，我的links_controller仅用于提供以不同方式排序的链接资源。现在我想包括一个部分(我假设)显示按分数排序的顶级用户(@users=User.all.sort_by(&:score))我知道我可以将此代码插入每个链接操作并从View访问它，但这似乎不是“ruby方式”，我将需要在不久的将来访问更多模型。这可能会变得很脏，是否有针对这种情况的任何技术？注意事项:我认为我的应用程序正朝着单一格式和动态页面内容的方向发展，本质上是一个典型的网络应用程序。我知道before_filter但考虑到我希望应用程序进入的方向，这似乎很麻烦。最终从任何

c++ - 当一个工作线程失败时，如何中止剩余的工作人员？

有关c++ - 当一个工作线程失败时，如何中止剩余的工作人员？的更多相关文章

随机推荐