
c++ - How to handle extra characters read into an ASIO streambuf?

coder 2024-02-17 Original

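Most libraries' parsers work only on a std::istream or on a single contiguous buffer. These parsers read the istream until EOF rather than until the end of the document. Even with a nice boost::asio::streambuf, which can be used with an istream, there is still the problem of reading exactly one frame and committing exactly one frame to it. Functions such as read_until commit whatever they read, and if they read a fragment of the next frame, parsing will fail.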

This mock example on Coliru shows the problem.
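The failure mode can also be reproduced without Asio. The following is a minimal sketch, not the Coliru example itself: parse_until_eof is a hypothetical stand-in for a library parser, and a std::stringbuf plays the role of a buffer that already holds two '@'-terminated frames after one greedy read.

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical stand-in for a library parser: it consumes the stream
// until EOF, not until the end of a single document.
std::string parse_until_eof(std::istream& in)
{
  return std::string(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
}

// Two frames ("12@" and "345@") sit in one buffer, as they would after a
// read that pulled in more than one frame.
std::string demo_greedy_parse()
{
  std::stringbuf buffer("12@345@");
  std::istream in(&buffer);
  // The parser was meant to see only "12@", but nothing stops it there.
  return parse_until_eof(in);
}
```

Calling demo_greedy_parse() from main returns both frames in one string, which is the behavior the question wants to avoid.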

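Assuming we need an efficient solution that does not copy buffers, I need to ensure that the end of the stream is the correct end of the document. My current solution is to scan the data and perform multiple commits/consumes on a single prepared buffer: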

size_t read_some_frames( boost::asio::streambuf& strbuf, 
                         std::function< void(istream&) > parser ) {
        auto buffers= strbuf.prepare( 1024 );
        size_t read= bad_case_of_read_some( buffers );

        vector< std::pair< size_t, size_t > > frames;
        std::pair< size_t, size_t > leftover= scanForFrames( 
                    buffers_begin(buffers), 
                    buffers_begin(buffers)+read, 
                    frames, '\0' );

        for( auto const& frame: frames ) {
            cout << "Frame size: " << frame.first 
                      << " skip: " << frame.second << endl;
            strbuf.commit( frame.first );
            strbuf.consume( frame.second );
            iostream stream( &strbuf );
            parser( stream );
        }
        cout << "Unfinished frame size: " << leftover.first 
                             << " skip:" << leftover.second << endl;
        strbuf.commit( leftover.first );
        strbuf.consume( leftover.second );
        return read;
}

Live on Coliru

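According to the documentation, this is wrong. I think this code works only because calling commit and consume does not release the internal buffers. I need to deal with this somehow.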

What are the possible solutions?

Best answer

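While the read_until() operations commit all data read into the streambuf's input sequence, they return a bytes_transferred value containing the number of bytes up to and including the first delimiter. In essence, this provides the size of the frame, and the istream can be limited to reading only a portion of the streambuf's input sequence by either: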

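  • Using a custom istream that limits the number of bytes read from the streambuf. One of the easier ways to accomplish this is to use Boost.IOStream's boost::iostreams::stream and implement a model of the Source concept.
  • Creating a custom streambuf that derives from Boost.Asio's streambuf. To limit the number of bytes read from the available input sequence, custom functions will need to manipulate the end of the input sequence. Furthermore, the custom streambuf will need to handle underflow.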
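The core idea shared by both options, an istream that reports EOF after n bytes even though the underlying buffer holds more, can be sketched with the standard library alone. The bounded_streambuf class below is a hypothetical illustration and is not one of the two approaches described in this answer; it wraps any std::streambuf (a boost::asio::streambuf is one) and forwards at most a fixed number of characters.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical wrapper: exposes at most `limit` characters of `source`.
class bounded_streambuf : public std::streambuf
{
public:
  bounded_streambuf(std::streambuf& source, std::streamsize limit)
    : source_(source), remaining_(limit), current_() {}

protected:
  int_type underflow() override
  {
    // The frame is exhausted: report EOF even if source_ has more data.
    if (remaining_ <= 0)
      return traits_type::eof();
    // Move one character out of the source into a one-character get area.
    const int_type c = source_.sbumpc();
    if (traits_type::eq_int_type(c, traits_type::eof()))
      return c;
    --remaining_;
    current_ = traits_type::to_char_type(c);
    setg(&current_, &current_, &current_ + 1);
    return c;
  }

private:
  std::streambuf& source_;
  std::streamsize remaining_;
  char current_;
};

void demo_bounded_streambuf()
{
  std::stringbuf source("12@345@");
  bounded_streambuf frame(source, 3); // First frame is "12@".
  std::istream in(&frame);

  int value = 0;
  in >> value;
  assert(value == 12);
  assert(in.get() == '@');
  assert(in.get() == std::char_traits<char>::eof()); // Frame ends here.
}
```

One trade-off of this sketch: a character is removed from the source as soon as the wrapper buffers it, so a parser that only peeks at the last character of a frame still consumes it from the source.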

Custom Source for Boost.IOStream

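Boost.IOStream's boost::iostreams::stream object delegates I/O operations to a Device. The Device is user code that implements a model of various Boost.IOStream concepts. In this case, the Source concept, which provides read access to a sequence of characters, is the only one needed. Furthermore, when boost::iostreams::stream uses a Source Device, it will inherit from std::basic_istream.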

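In the following code, asio_streambuf_input_device is a model of the Source concept that reads from a Boost.Asio streambuf. Once a given number of bytes has been read, asio_streambuf_input_device indicates underflow even if the underlying streambuf still has data in its input sequence.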

/// Type that implements a model of the Boost.IOStream's Source concept
/// for reading data from a Boost.Asio streambuf
class asio_streambuf_input_device
  : public boost::iostreams::source // Use convenience class.
{
public:

  explicit
  asio_streambuf_input_device(
      boost::asio::streambuf& streambuf,
      std::streamsize bytes_transferred
  )
    : streambuf_(streambuf),
      bytes_remaining_(bytes_transferred)
  {}

  std::streamsize read(char_type* buffer, std::streamsize buffer_size)
  {
    // Determine max amount of bytes to copy.
    auto bytes_to_copy =
      std::min(bytes_remaining_, std::min(
          static_cast<std::streamsize>(streambuf_.size()), buffer_size));

    // If there is no more data to be read, indicate end-of-sequence per
    // Source concept.
    if (0 == bytes_to_copy)
    {
      return -1; // Indicate end-of-sequence, per Source concept.
    }

    // Copy from the streambuf into the provided buffer.
    std::copy_n(buffers_begin(streambuf_.data()), bytes_to_copy, buffer);

    // Update bytes remaining.
    bytes_remaining_ -= bytes_to_copy;

    // Consume from the streambuf.
    streambuf_.consume(bytes_to_copy);

    return bytes_to_copy;
  }

private:
  boost::asio::streambuf& streambuf_;
  std::streamsize bytes_remaining_;
};

// ...

// Create a custom iostream that sets a limit on the amount of bytes
// that will be read from the streambuf.
boost::iostreams::stream<asio_streambuf_input_device> input(streambuf, n);
parse(input);

Here is a complete example demonstrating this approach:

#include <algorithm>   // std::min, std::copy_n
#include <cassert>     // assert
#include <functional>
#include <iostream>
#include <string>

#include <boost/asio.hpp>
#include <boost/iostreams/concepts.hpp>  // boost::iostreams::source
#include <boost/iostreams/stream.hpp>

/// Type that implements a model of the Boost.IOStream's Source concept
/// for reading data from a Boost.Asio streambuf
class asio_streambuf_input_device
  : public boost::iostreams::source // Use convenience class.
{
public:

  explicit
  asio_streambuf_input_device(
      boost::asio::streambuf& streambuf,
      std::streamsize bytes_transferred
  )
    : streambuf_(streambuf),
      bytes_remaining_(bytes_transferred)
  {}

  std::streamsize read(char_type* buffer, std::streamsize buffer_size)
  {
    // Determine max amount of bytes to copy.
    auto bytes_to_copy =
      std::min(bytes_remaining_, std::min(
          static_cast<std::streamsize>(streambuf_.size()), buffer_size));

    // If there is no more data to be read, indicate end-of-sequence per
    // Source concept.
    if (0 == bytes_to_copy)
    {
      return -1; // Indicate end-of-sequence, per Source concept.
    }

    // Copy from the streambuf into the provided buffer.
    std::copy_n(buffers_begin(streambuf_.data()), bytes_to_copy, buffer);

    // Update bytes remaining.
    bytes_remaining_ -= bytes_to_copy;

    // Consume from the streambuf.
    streambuf_.consume(bytes_to_copy);

    return bytes_to_copy;
  }

private:
  boost::asio::streambuf& streambuf_;
  std::streamsize bytes_remaining_;
};

/// @brief Convert a streambuf to a string.
std::string make_string(boost::asio::streambuf& streambuf)
{
  return std::string(buffers_begin(streambuf.data()),
                     buffers_end(streambuf.data()));
}

// This example is not interested in the handlers, so provide a noop function
// that will be passed to bind to meet the handler concept requirements.
void noop() {}

int main()
{
  using boost::asio::ip::tcp;
  boost::asio::io_service io_service;

  // Create all I/O objects.
  tcp::acceptor acceptor(io_service, tcp::endpoint(tcp::v4(), 0));
  tcp::socket server_socket(io_service);
  tcp::socket client_socket(io_service);

  // Connect client and server sockets.
  acceptor.async_accept(server_socket, std::bind(&noop));
  client_socket.async_connect(acceptor.local_endpoint(), std::bind(&noop));
  io_service.run();

  // Write to client.
  const std::string message =
    "12@"
    "345@";
  write(server_socket, boost::asio::buffer(message));

  boost::asio::streambuf streambuf;

  {
    auto bytes_transferred = read_until(client_socket, streambuf, '@');
    // Verify that the entire message "12@345@" was read into
    // streambuf's input sequence.
    assert(message.size() == streambuf.size());
    std::cout << "streambuf contains: " << make_string(streambuf) <<
                  std::endl;

    // Create a custom iostream that sets a limit on the amount of bytes
    // that will be read from the streambuf.
    boost::iostreams::stream<asio_streambuf_input_device> input(
      streambuf, bytes_transferred);

    int data = 0;
    input >> data; // Consumes "12" from input sequence.
    assert(data == 12);
    std::cout << "Extracted: " << data << std::endl;
    assert(!input.eof());
    input.get(); // Consume "@" from input sequence.
    assert(!input.eof());
    input.get(); // No more data available.
    assert(input.eof());
    std::cout << "istream has reached EOF" << std::endl;
  }
  std::cout << "streambuf contains: " << make_string(streambuf) <<
               std::endl;

  {
    // As the streambuf's input sequence already contains the delimiter,
    // this operation will not actually attempt to read data from the
    // socket.
    auto bytes_transferred = read_until(client_socket, streambuf, '@');

    // Create a custom iostream that sets a limit on the amount of bytes
    // that will be read from the streambuf.
    boost::iostreams::stream<asio_streambuf_input_device> input(
      streambuf, bytes_transferred);

    std::string data;
    getline(input, data, '@'); // Consumes delimiter.
    assert(data == "345");
    std::cout << "Extracted: " << data << std::endl;
    assert(!input.eof());
    input.get(); // Underflow.
    assert(input.eof());
    std::cout << "istream has reached EOF" << std::endl;
  }

  assert(streambuf.size() == 0);
  std::cout << "streambuf is empty" << std::endl;
}

Output:

streambuf contains: 12@345@
Extracted: 12
istream has reached EOF
streambuf contains: 345@
Extracted: 345
istream has reached EOF
streambuf is empty

Deriving from boost::asio::streambuf

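One can safely derive from Boost.Asio's streambuf and implement custom behavior. In this case, the goal is to limit the number of bytes an istream can extract from the input sequence before causing underflow. This can be accomplished by: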

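  • Updating the streambuf's get-area (input sequence) pointers so that they only contain the desired number of bytes to be read. This is accomplished by setting the end of the get-area pointer (egptr) to be n bytes after the current get-area pointer (gptr). In the code below, I refer to this as framing.
  • Handling underflow(). If the end of the current frame has been reached, then return EOF.
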
/// @brief Type that derives from Boost.Asio streambuf and can frame the
///        input sequence to a portion of the actual input sequence.
template <typename Allocator = std::allocator<char> >
class basic_framed_streambuf
  : public boost::asio::basic_streambuf<Allocator>
{
private:

  typedef boost::asio::basic_streambuf<Allocator> parent_type;

public:

  explicit 
  basic_framed_streambuf(
    std::size_t maximum_size = (std::numeric_limits< std::size_t >::max)(),
    const Allocator& allocator = Allocator()
  )
    : parent_type(maximum_size, allocator),
      egptr_(nullptr)
  {}

  /// @brief Limit the current input sequence to n characters.
  ///
  /// @remark An active frame is invalidated by any member function that
  ///        modifies the input or output sequence.
  void frame(std::streamsize n)
  {
    // Store actual end of input sequence.
    egptr_ = this->egptr();
    // Set the input sequence end to n characters from the current
    // input sequence pointer..
    this->setg(this->eback(), this->gptr(), this->gptr() + n);
  }

  /// @brief Restore the end of the input sequence.
  void unframe()
  {
    // Restore the end of the input sequence.
    this->setg(this->eback(), this->gptr(), this->egptr_);
    egptr_ = nullptr;
  }

protected:

  // When the end of the input sequence has been reached, underflow
  // will be invoked.
  typename parent_type::int_type underflow()
  {
    // If the  streambuf is currently framed, then return eof
    // on underflow.  Otherwise, defer to the parent implementation.
    return egptr_ ? parent_type::traits_type::eof()
                  : parent_type::underflow();
  }

private:
  char* egptr_;
};

// ...

basic_framed_streambuf<> streambuf;
// ....
streambuf.frame(n);
std::istream input(&streambuf);
parse(input);
streambuf.unframe();

Here is a complete example demonstrating this approach:

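#include <cassert>     // assert
#include <functional>
#include <iostream>
#include <limits>      // std::numeric_limits
#include <memory>      // std::allocator
#include <string>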

#include <boost/asio.hpp>

/// @brief Type that derives from Boost.Asio streambuf and can frame the
///        input sequence to a portion of the actual input sequence.
template <typename Allocator = std::allocator<char> >
class basic_framed_streambuf
  : public boost::asio::basic_streambuf<Allocator>
{
private:

  typedef boost::asio::basic_streambuf<Allocator> parent_type;

public:

  explicit 
  basic_framed_streambuf(
    std::size_t maximum_size = (std::numeric_limits< std::size_t >::max)(),
    const Allocator& allocator = Allocator()
  )
    : parent_type(maximum_size, allocator),
      egptr_(nullptr)
  {}

  /// @brief Limit the current input sequence to n characters.
  ///
  /// @remark An active frame is invalidated by any member function that
  ///        modifies the input or output sequence.
  void frame(std::streamsize n)
  {
    // Store actual end of input sequence.
    egptr_ = this->egptr();
    // Set the input sequence end to n characters from the current
    // input sequence pointer..
    this->setg(this->eback(), this->gptr(), this->gptr() + n);
  }

  /// @brief Restore the end of the input sequence.
  void unframe()
  {
    // Restore the end of the input sequence.
    this->setg(this->eback(), this->gptr(), this->egptr_);
    egptr_ = nullptr;
  }

protected:

  // When the end of the input sequence has been reached, underflow
  // will be invoked.
  typename parent_type::int_type underflow()
  {
    // If the  streambuf is currently framed, then return eof
    // on underflow.  Otherwise, defer to the parent implementation.
    return egptr_ ? parent_type::traits_type::eof()
                  : parent_type::underflow();
  }

private:
  char* egptr_;
};

typedef basic_framed_streambuf<> framed_streambuf;

/// @brief RAII type that helps frame a basic_framed_streambuf within a 
///        given scope.
template <typename Streambuf>
class streambuf_frame
{
public:
  explicit streambuf_frame(Streambuf& streambuf, std::streamsize n)
    : streambuf_(streambuf)
  {
    streambuf_.frame(n);
  }

  ~streambuf_frame() { streambuf_.unframe(); }

  streambuf_frame(const streambuf_frame&) = delete;
  streambuf_frame& operator=(const streambuf_frame&) = delete;

private:
  Streambuf& streambuf_;
};

/// @brief Convert a streambuf to a string.
std::string make_string(boost::asio::streambuf& streambuf)
{
  return std::string(buffers_begin(streambuf.data()),
                     buffers_end(streambuf.data()));
}

// This example is not interested in the handlers, so provide a noop function
// that will be passed to bind to meet the handler concept requirements.
void noop() {}

int main()
{
  using boost::asio::ip::tcp;
  boost::asio::io_service io_service;

  // Create all I/O objects.
  tcp::acceptor acceptor(io_service, tcp::endpoint(tcp::v4(), 0));
  tcp::socket server_socket(io_service);
  tcp::socket client_socket(io_service);

  // Connect client and server sockets.
  acceptor.async_accept(server_socket, std::bind(&noop));
  client_socket.async_connect(acceptor.local_endpoint(), std::bind(&noop));
  io_service.run();

  // Write to client.
  const std::string message =
    "12@"
    "345@";
  write(server_socket, boost::asio::buffer(message));

  framed_streambuf streambuf;

  // Demonstrate framing the streambuf's input sequence manually.
  {
    auto bytes_transferred = read_until(client_socket, streambuf, '@');
    // Verify that the entire message "12@345@" was read into
    // streambuf's input sequence.
    assert(message.size() == streambuf.size());
    std::cout << "streambuf contains: " << make_string(streambuf) <<
                  std::endl;

    // Frame the streambuf based on bytes_transferred.  This is all data
    // up to and including the first delimiter.
    streambuf.frame(bytes_transferred);

    // Use an istream to read data from the currently framed streambuf.
    std::istream input(&streambuf);
    int data = 0;
    input >> data; // Consumes "12" from input sequence.
    assert(data == 12);
    std::cout << "Extracted: " << data << std::endl;
    assert(!input.eof());
    input.get(); // Consume "@" from input sequence.
    assert(!input.eof());
    input.get(); // No more data available in the frame, so underflow.
    assert(input.eof());
    std::cout << "istream has reached EOF" << std::endl;

    // Restore the streambuf.
    streambuf.unframe();
  }

  // Demonstrate using an RAII helper to frame the streambuf's input
  // sequence.
  {
    // As the streambuf's input sequence already contains the delimiter,
    // this operation will not actually attempt to read data from the
    // socket.
    auto bytes_transferred = read_until(client_socket, streambuf, '@');
    std::cout << "streambuf contains: " << make_string(streambuf) <<
                  std::endl;

    // Frame the streambuf based on bytes_transferred.  This is all data
    // up to and including the first delimiter.  Use a frame RAII object
    // to only frame the streambuf within the current scope.
    streambuf_frame<framed_streambuf> frame(streambuf, bytes_transferred);

    // Use an istream to read data from the currently framed streambuf.
    std::istream input(&streambuf);
    std::string data;
    getline(input, data, '@'); // Consumes delimiter.
    assert(data == "345");
    std::cout << "Extracted: " << data << std::endl;
    assert(!input.eof());
    input.get(); // No more data available in the frame, so underflow.
    assert(input.eof());
    std::cout << "istream has reached EOF" << std::endl;
    // The frame object's destructor will unframe the streambuf.
  }

  assert(streambuf.size() == 0);
  std::cout << "streambuf is empty" << std::endl;
}

Output:

streambuf contains: 12@345@
Extracted: 12
istream has reached EOF
streambuf contains: 345@
Extracted: 345
istream has reached EOF
streambuf is empty

Regarding "c++ - How to handle extra characters read into an ASIO streambuf?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/28415142/
