c++ - c++ 是否存在多生产者单消费者无锁队列？

coder 2023-05-03 原文

我读得越多，我就越困惑……我会认为找到一个用 C++ 实现的正式正确的 MPSC 队列是微不足道的。

每当我发现另一个问题时，进一步的研究似乎表明存在诸如 ABA 或其他微妙的竞争条件之类的问题。

很多人都在谈论垃圾收集的必要性。这是我想避免的。

那里有公认的正确开源实现吗？

最佳答案

您可能想检查破坏者；它在 C++ 中可用:http://lmax-exchange.github.io/disruptor/

您还可以找到它如何工作的解释here on stackoverflow基本上它是一个没有锁定的循环缓冲区，针对在固定大小插槽中的线程之间传递 FIFO 消息进行了优化。

这里有两个我觉得有用的实现:Lock-free Multi-producer Multi-consumer Queue on Ring Buffer @ NatSys Lab. Blog和
Yet another implementation of a lock-free circular array queue @ CodeProject

注意:下面的代码不正确，我仅将其作为示例，这些事情有多棘手。

如果您不喜欢 google 版本的复杂性，这里有一些类似的东西 - 它更简单，但我将其作为练习留给读者使其工作(它是更大项目的一部分，不可移植此时此刻)。整个想法是为数据维护循环缓冲区和一小组计数器来识别用于写入/写入和读取/读取的插槽。由于每个计数器都在其自己的缓存行中，并且(通常)每个计数器仅在消息的实时更新中自动更新一次，因此无需任何同步即可读取它们。在 post_done 中写入线程之间存在一个潜在的争用点，它是 FIFO 保证所必需的。选择计数器(head_、wrtn_、rdng_、tail_)以确保正确性和 FIFO，因此删除 FIFO 还需要更改计数器(如果不牺牲正确性，这可能很难做到)。对于只有一个消费者的场景，可以稍微提高性能，但我不会打扰 - 如果发现其他有多个阅读器的用例，您将不得不撤消它。

在我的机器上，延迟如下所示(左侧百分位，右侧百分位内的平均值，单位为微秒，由 rdtsc 测量):

    total=1000000 samples, avg=0.24us
    50%=0.214us, avg=0.093us
    90%=0.23us, avg=0.151us
    99%=0.322us, avg=0.159us
    99.9%=15.566us, avg=0.173us

这些结果适用于单个轮询消费者，即工作线程在紧密循环中调用 wheel.read() 并检查是否不为空(例如滚动到底部)。等待消费者(低得多的 CPU 利用率)将等待事件(acquire... 函数之一)，这会由于上下文切换而增加大约 1-2us 的平均延迟。

由于读取的争用很少，消费者可以很好地扩展工作线程的数量，例如我机器上的 3 个线程:

    total=1500000 samples, avg=0.07us
    50%=0us, avg=0us
    90%=0.155us, avg=0.016us
    99%=0.361us, avg=0.038us
    99.9%=8.723us, avg=0.044us

欢迎提供补丁 :)

// Copyright (c) 2011-2012, Bronislaw (Bronek) Kozicki
//
// Distributed under the Boost Software License, Version 1.0. (See accompanying
// file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

#pragma once

#include <core/api.hxx>
#include <core/wheel/exception.hxx>

#include <boost/noncopyable.hpp>
#include <boost/type_traits.hpp>
#include <boost/lexical_cast.hpp>
#include <typeinfo>

namespace core { namespace wheel
{
  struct bad_size : core::exception
  {
    template<typename T> explicit bad_size(const T&, size_t m)
      : core::exception(std::string("Slot capacity exceeded, sizeof(")
                  + typeid(T).name()
                  + ") = "
                  + boost::lexical_cast<std::string>(sizeof(T))
                  + ", capacity = "
                  + boost::lexical_cast<std::string>(m)
                  )
    {}
  };        

  // inspired by Disruptor
  template <typename Header>
  class wheel : boost::noncopyable
  {
    __declspec(align(64))
    struct slot_detail
    {
      // slot write: (memory barrier in wheel) > post_done > (memory barrier in wheel)
      // slot read:  (memory barrier in wheel) > read_done > (memory barrier in wheel)

      // done writing or reading, must update wrtn_ or tail_ in wheel, as appropriate
      template <bool Writing>
      void done(wheel* w)
      {
        if (Writing)
          w->post_done(sequence);
        else
          w->read_done();
      }

      // cache line for sequence number and header
      long long sequence;
      Header header;

      // there is no such thing as data type with variable size, but we need it to avoid thrashing
      // cache - so we invent one. The memory is reserved in runtime and we simply go beyond last element.
      // This is well into UB territory! Using template parameter for this is not good, since it
      // results in this small implementation detail leaking to all possible user interfaces.
      __declspec(align(8))
      char data[8];
    };

    // use this as a storage space for slot_detail, to guarantee 64 byte alignment
    _declspec(align(64))
    struct slot_block { long long padding[8]; };

  public:
    // wrap slot data to outside world
    template <bool Writable>
    class slot
    {
      template<typename> friend class wheel;

      slot& operator=(const slot&); // moveable but non-assignable

      // may only be constructed by wheel
      slot(slot_detail* impl, wheel<Header>* w, size_t c)
        : slot_(impl) , wheel_(w) , capacity_(c)
      {}

    public:
      slot(slot&& s)
        : slot_(s.slot_) , wheel_(s.wheel_) , capacity_(s.capacity_)
      {
        s.slot_ = NULL;
      }

      ~slot()
      {
        if (slot_)
        {
          slot_->done<Writable>(wheel_);
        }
      }

      // slot accessors - use Header to store information on what type is actually stored in data
      bool empty() const          { return !slot_; }
      long long sequence() const  { return slot_->sequence; }
      Header& header()            { return slot_->header; }
      char* data()                { return slot_->data; }

      template <typename T> T& cast()
      {
        static_assert(boost::is_pod<T>::value, "Data type must be POD");
        if (sizeof(T) > capacity_)
          throw bad_size(T(), capacity_);
        if (empty())
          throw no_data();
        return *((T*) data());
      }

    private:
      slot_detail*    slot_;
      wheel<Header>*  wheel_;
      const size_t    capacity_;
    };

  private:
    // dynamic size of slot, with extra capacity, expressed in 64 byte blocks
    static size_t sizeof_slot(size_t s)
    {
      size_t m = sizeof(slot_detail);
      // add capacity less 8 bytes already within sizeof(slot_detail)
      m += max(8, s) - 8;
      // round up to 64 bytes, i.e. alignment of slot_detail
      size_t r = m & ~(unsigned int)63;
      if (r < m)
        r += 64;
      r /= 64;
      return r;
    }

    // calculate actual slot capacity back from number of 64 byte blocks
    static size_t slot_capacity(size_t s)
    {
      return s*64 - sizeof(slot_detail) + 8;
    }

    // round up to power of 2
    static size_t round_size(size_t s)
    {
      // enfore minimum size
      if (s <= min_size)
        return min_size;

      // find rounded value
      --s;
      size_t r = 1;
      while (s)
      {
        s >>= 1;
        r <<= 1;
      };
      return r;
    }

    slot_detail& at(long long sequence)
    {
      // find index from sequence number and return slot at found index of the wheel
      return *((slot_detail*) &wheel_[(sequence & (size_ - 1)) * blocks_]);
    }

  public:
    wheel(size_t capacity, size_t size)
      : head_(0) , wrtn_(0) , rdng_(0) , tail_(0) , event_()
      , blocks_(sizeof_slot(capacity)) , capacity_(slot_capacity(blocks_)) , size_(round_size(size))
    {
      static_assert(boost::is_pod<Header>::value, "Header type must be POD");
      static_assert(sizeof(slot_block) == 64, "This was unexpected");

      wheel_ = new slot_block[size_ * blocks_];
      // all slots must be initialised to 0
      memset(wheel_, 0, size_ * 64 * blocks_);
      active_ = 1;
    }

    ~wheel()
    {
      stop();
      delete[] wheel_;
    }

    // all accessors needed
    size_t capacity() const { return capacity_; }   // capacity of a single slot
    size_t size() const     { return size_; }       // number of slots available
    size_t queue() const    { return (size_t)head_ - (size_t)tail_; }
    bool active() const     { return active_ == 1; }

    // enough to call it just once, to fine tune slot capacity
    template <typename T>
    void check() const
    {
      static_assert(boost::is_pod<T>::value, "Data type must be POD");
      if (sizeof(T) > capacity_)
        throw bad_size(T(), capacity_);
    }

    // stop the wheel - safe to execute many times
    size_t stop()
    {
      InterlockedExchange(&active_, 0);
      // must wait for current read to complete
      while (rdng_ != tail_)
        Sleep(10);

      return size_t(head_ - tail_);
    }

    // return first available slot for write
    slot<true> post()
    {
      if (!active_)
        throw stopped();

      // the only memory barrier on head seq. number we need, if not overflowing
      long long h = InterlockedIncrement64(&head_);
      while(h - (long long) size_ > tail_)
      {
        if (InterlockedDecrement64(&head_) == h - 1)
          throw overflowing();

        // protection against case of race condition when we are overflowing
        // and two or more threads try to post and two or more messages are read,
        // all at the same time. If this happens we must re-try, otherwise we
        // could have skipped a sequence number - causing infinite wait in post_done
        Sleep(0);
        h = InterlockedIncrement64(&head_);
      }

      slot_detail& r = at(h);
      r.sequence = h;

      // wrap in writeable slot
      return slot<true>(&r, this, capacity_);
    }

    // return first available slot for write, nothrow variant
    slot<true> post(std::nothrow_t)
    {
      if (!active_)
        return slot<true>(NULL, this, capacity_);

      // the only memory barrier on head seq. number we need, if not overflowing
      long long h = InterlockedIncrement64(&head_);
      while(h - (long long) size_ > tail_)
      {
        if (InterlockedDecrement64(&head_) == h - 1)
          return slot<true>(NULL, this, capacity_);

        // must retry if race condition described above
        Sleep(0);
        h = InterlockedIncrement64(&head_);
      }

      slot_detail& r = at(h);
      r.sequence = h;

      // wrap in writeable slot
      return slot<true>(&r, this, capacity_);
    }

    // read first available slot for read
    slot<false> read()
    {
      slot_detail* r = NULL;
      // compare rdng_ and wrtn_ early to avoid unnecessary memory barrier
      if (active_ && rdng_ < wrtn_)
      {
        // the only memory barrier on reading seq. number we need
        const long long h = InterlockedIncrement64(&rdng_);
        // check if this slot has been written, step back if not
        if (h > wrtn_)
          InterlockedDecrement64(&rdng_);
        else
          r = &at(h);
      }

      // wrap in readable slot
      return slot<false>(r , this, capacity_);
    }

    // waiting for new post, to be used by non-polling clients
    void acquire()
    {
      event_.acquire();
    }

    bool try_acquire()
    {
      return event_.try_acquire();
    }

    bool try_acquire(unsigned long timeout)
    {
      return event_.try_acquire(timeout);
    }

    void release()
    {}

  private:
    void post_done(long long sequence)
    {
      const long long t = sequence - 1;

      // the only memory barrier on written seq. number we need
      while(InterlockedCompareExchange64(&wrtn_, sequence, t) != t)
        Sleep(0);

      // this is outside of critical path for polling clients
      event_.set();
    }

    void read_done()
    {
      // the only memory barrier on tail seq. number we need
      InterlockedIncrement64(&tail_);
    }

    // each in its own cache line
    // head_ - wrtn_ = no. of messages being written at this moment
    // rdng_ - tail_ = no. of messages being read at the moment
    // head_ - tail_ = no. of messages to read (including those being written and read)
    // wrtn_ - rdng_ = no. of messages to read (excluding those being written or read)
    __declspec(align(64)) volatile long long head_; // currently writing or written
    __declspec(align(64)) volatile long long wrtn_; // written
    __declspec(align(64)) volatile long long rdng_; // currently reading or read
    __declspec(align(64)) volatile long long tail_; // read
    __declspec(align(64)) volatile long active_;    // flag switched to 0 when stopped

    __declspec(align(64))
    api::event event_;          // set when new message is posted
    const size_t blocks_;       // number of 64-byte blocks in a single slot_detail
    const size_t capacity_;     // capacity of data() section per single slot. Initialisation depends on blocks_
    const size_t size_;         // number of slots available, always power of 2
    slot_block* wheel_;
  };
}}

以下是轮询消费者工作线程的样子:

  while (wheel.active())
  {
    core::wheel::wheel<int>::slot<false> slot = wheel.read();
    if (!slot.empty())
    {
      Data& d = slot.cast<Data>();
      // do work
    }
    // uncomment below for waiting consumer, saving CPU cycles
    // else
    //   wheel.try_acquire(10);
  }

已编辑添加了消费者示例

关于c++ - c++ 是否存在多生产者单消费者无锁队列？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/8918401/

多生 amp slot size section c++synchronization lock-free

有关c++ - c++ 是否存在多生产者单消费者无锁队列？的更多相关文章

ruby-on-rails - 如何验证 update_all 是否实际在 Rails 中更新 - 2
给定这段代码defcreate@upgrades=User.update_all(["role=?","upgraded"],:id=>params[:upgrade])redirect_toadmin_upgrades_path,:notice=>"Successfullyupgradeduser."end我如何在该操作中实际验证它们是否已保存或未重定向到适当的页面和消息？最佳答案在Rails3中，update_all不返回任何有意义的信息，除了已更新的记录数(这可能取决于您的DBMS是否返回该信息)。http://ar.ru
ruby-on-rails - 如何优雅地重启 thin + nginx？ - 2
我的瘦服务器配置了nginx，我的ROR应用程序正在它们上运行。在我发布代码更新时运行thinrestart会给我的应用程序带来一些停机时间。我试图弄清楚如何优雅地重启正在运行的Thin实例，但找不到好的解决方案。有没有人能做到这一点？最佳答案 #Restartjustthethinserverdescribedbythatconfigsudothin-C/etc/thin/mysite.ymlrestartNginx将继续运行并代理请求。如果您将Nginx设置为使用多个上游服务器，例如server{listen80;server
ruby - 检查数组是否在增加 - 2
这个问题在这里已经有了答案:Checktoseeifanarrayisalreadysorted?(8个答案)关闭9年前。我只是想知道是否有办法检查数组是否在增加？这是我的解决方案，但我正在寻找更漂亮的方法:n=-1@arr.flatten.each{|e|returnfalseife
ruby - 检查字符串是否包含散列中的任何键并返回它包含的键的值 - 2
我有一个包含多个键的散列和一个字符串，该字符串不包含散列中的任何键或包含一个键。h={"k1"=>"v1","k2"=>"v2","k3"=>"v3"}s="thisisanexamplestringthatmightoccurwithakeysomewhereinthestringk1(withspecialcharacterslike(^&*$#@!^&&*))"检查s是否包含h中的任何键的最佳方法是什么，如果包含，则返回它包含的键的值？例如，对于上面的h和s的例子，输出应该是v1。编辑:只有字符串是用户定义的。哈希将始终相同。最佳答案
Ruby Sinatra 配置用于生产和开发 - 2
我已经在Sinatra上创建了应用程序，它代表了一个简单的API。我想在生产和开发上进行部署。我想在部署时选择，是开发还是生产，一些方法的逻辑应该改变，这取决于部署类型。是否有任何想法，如何完成以及解决此问题的一些示例。例子:我有代码get'/api/test'doreturn"Itisdev"end但是在部署到生产环境之后我想在运行/api/test之后看到ItisPROD如何实现？最佳答案根据SinatraDocumentation:EnvironmentscanbesetthroughtheRACK_ENVenvironm
ruby-on-rails - Ruby 检查日期时间是否为 iso8601 并保存 - 2
我需要检查DateTime是否采用有效的ISO8601格式。喜欢:#iso8601?我检查了ruby是否有特定方法，但没有找到。目前我正在使用date.iso8601==date来检查这个。有什么好的方法吗？编辑解释我的环境，并改变问题的范围。因此，我的项目将使用jsapiFullCalendar，这就是我需要iso8601字符串格式的原因。我想知道更好或正确的方法是什么，以正确的格式将日期保存在数据库中，或者让ActiveRecord完成它们的工作并在我需要时间信息时对其进行操作。最佳答案我不太明白你的问题。我假设您想检查
ruby - 检查日期是否在过去 7 天内 - 2
我的日期格式如下:"%d-%m-%Y"(例如，今天的日期为07-09-2015)，我想看看是不是在过去的七天内。谁能推荐一种方法？最佳答案你可以这样做:require"date"Date.today-7 关于ruby-检查日期是否在过去7天内，我们在StackOverflow上找到一个类似的问题： https://stackoverflow.com/questions/32438063/
ruby - 如何验证 IO.copy_stream 是否成功 - 2
这里有一个很好的答案解释了如何在Ruby中下载文件而不将其加载到内存中:https://stackoverflow.com/a/29743394/4852737require'open-uri'download=open('http://example.com/image.png')IO.copy_stream(download,'~/image.png')我如何验证下载文件的IO.copy_stream调用是否真的成功——这意味着下载的文件与我打算下载的文件完全相同，而不是下载一半的损坏文件？documentation说IO.copy_stream返回它复制的字节数，但是当我还没有下
ruby - 是否可以覆盖 gemfile 进行本地开发？ - 2
我们的git存储库中目前有一个Gemfile。但是，有一个gem我只在我的环境中本地使用(我的团队不使用它)。为了使用它，我必须将它添加到我们的Gemfile中，但每次我checkout到我们的master/dev主分支时，由于与跟踪的gemfile冲突，我必须删除它。我想要的是类似Gemfile.local的东西，它将继承从Gemfile导入的gems，但也允许在那里导入新的gems以供使用只有我的机器。此文件将在.gitignore中被忽略。这可能吗？最佳答案设置BUNDLE_GEMFILE环境变量:BUNDLE_GEMFI
ruby - 在 Windows 机器上使用 Ruby 进行开发是否会适得其反？ - 2
这似乎非常适得其反，因为太多的gem会在window上破裂。我一直在处理很多mysql和ruby-mysqlgem问题(gem本身发生段错误，一个名为UnixSocket的类显然在Windows机器上不能正常工作，等等)。我只是在浪费时间吗？我应该转向不同的脚本语言吗？最佳答案我在Windows上使用Ruby的经验很少，但是当我开始使用Ruby时，我是在Windows上，我的总体印象是它不是Windows原生系统。因此，在主要使用Windows多年之后，开始使用Ruby促使我切换回原来的系统Unix，这次是Linux。Rub

c++ - c++ 是否存在多生产者单消费者无锁队列？

有关c++ - c++ 是否存在多生产者单消费者无锁队列？的更多相关文章

随机推荐