c++ - 如何在 boost::spirit::qi 中将某些语义 Action 排除在 AST 之外

coder 2024-02-17 原文

我尝试使用 boost::spirit::qi 解析大量文件。解析不是问题，但有些文件包含我想跳过的噪音。构建一个简单的解析器(不使用 boost::spirit::qi)验证我可以通过跳过行首不匹配规则的任何内容来避免噪音。因此，我正在寻找一种方法来编写基于行的解析器，在不匹配任何规则时跳过行。

下面的示例允许语法在完全不匹配的情况下跳过行，但是“垃圾”规则仍然插入一个空的 V() 实例，这是不需要的行为。在示例中使用\r 而不是\n 是有意的，因为我在文件中同时遇到了\n、\r 和\r\n。

#include <iostream>
#include <string>
#include <vector>
#include <boost/foreach.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/fusion/include/std_tuple.hpp>

namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace phx = boost::phoenix;

using V = std::tuple<std::string, double, double, double>;

namespace client {
    template <typename Iterator>
    struct VGrammar : qi::grammar<Iterator, std::vector<V>(), ascii::space_type> {
        VGrammar() : VGrammar::base_type(start) {
            using namespace qi;

            v %= string("v") > double_ > double_ > double_;
            junk = +(char_ - eol);
            start %= +(v | junk);

            v.name("v");
            junk.name("junk");
            start.name("start");

            using phx::val;
            using phx::construct;

            on_error<fail>(
                start,
                std::cout
                    << val("Error! Expecting \n\n'")
                    << qi::_4
                    << val("'\n\n here: \n\n'")
                    << construct<std::string>(qi::_3, qi::_2)
                    << val("'")
                    << std::endl
            );

            //debug(v);
            //debug(junk);
            //debug(start);
        }

        qi::rule<Iterator> junk;
        //qi::rule<Iterator, qi::unused_type()> junk; // Doesn't work either
        //qi::rule<Iterator, qi::unused_type(), qi::unused_type()> junk; // Doesn't work either
        qi::rule<Iterator, V(), ascii::space_type> v;
        qi::rule<Iterator, std::vector<V>(), ascii::space_type> start;
    };
} // namespace client

int main(int argc, char* argv[]) {
    using iterator_type = std::string::const_iterator;

    std::string input = "";
    input += "v 1 2 3\r";         // keep v 1 2 3
    input += "o a b c\r";         // parse as junk
    input += "v 4 5 6 v 7 8 9\r"; // keep v 4 5 6, but parse v 7 8 9 as junk
    input += "   v 10 11 12\r\r"; // parse as junk

    iterator_type iter = input.begin();
    const iterator_type end = input.end();
    std::vector<V> parsed_output;
    client::VGrammar<iterator_type> v_grammar;

    std::cout << "run" << std::endl;
    bool r = phrase_parse(iter, end, v_grammar, ascii::space, parsed_output);
    std::cout << "done ... r: " << (r ? "true" : "false") << ", iter==end: " << ((iter == end) ? "true" : "false") << std::endl;

    if (r && (iter == end)) {
        BOOST_FOREACH(V const& v_row, parsed_output) {
            std::cout << std::get<0>(v_row) << ", " << std::get<1>(v_row) << ", " << std::get<2>(v_row) << ", " << std::get<3>(v_row) << std::endl;
        }
    }

    return EXIT_SUCCESS;
}

这是示例的输出:

run
done ... r: true, iter==end: true
v, 1, 2, 3
, 0, 0, 0
v, 4, 5, 6
v, 7, 8, 9
v, 10, 11, 12

这就是我真正希望解析器返回的内容。

run
done ... r: true, iter==end: true
v, 1, 2, 3
v, 4, 5, 6

我现在的主要问题是防止“垃圾”规则添加空的 V() 对象。我该如何做到这一点？还是我想多了？

我已经尝试将 lit(junk) 添加到开始规则中，因为 lit() 不返回任何内容，但这不会编译。它失败了:“静态断言失败:error_invalid_expression”。

我还尝试将垃圾规则的语义操作设置为 qi::unused_type() 但在这种情况下规则仍然创建一个空的 V()。

我知道以下问题，但它们没有解决这个特定问题。我之前已经尝试过 comment skipper，但看起来我必须重新实现 skipper 中的所有解析规则才能识别噪音。我的示例灵感来自于最后一个链接中的解决方案:

How to skip line/block/nested-block comments in Boost.Spirit?

How to parse entries followed by semicolon or newline (boost::spirit)?

版本信息:

Linux debian 4.9.0-7-amd64 #1 SMP Debian 4.9.110-3+deb9u2 (2018-08-13) x86_64 GNU/Linux
g++ (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
#define BOOST_VERSION 106200

和:

Linux raspberrypi 4.14.24-v7+ #1097 SMP Mon Mar 5 16:42:05 GMT 2018 armv7l GNU/Linux
g++ (Raspbian 4.9.2-10+deb8u1) 4.9.2
#define BOOST_VERSION 106200

对于那些想知道的人:是的，我正在尝试解析类似于 Wavefront OBJ 文件的文件，而且我知道已经有很多可用的解析器。然而，我正在解析的数据是一个更大的数据结构的一部分，它也需要解析，因此构建一个新的解析器确实有意义。

最佳答案

您想要实现的目标称为错误恢复。

不幸的是，Spirit 没有很好的方法来做这件事(也有一些内部决策很难从外部做出)。但是，在您的情况下，通过语法重写很容易实现。

#include <iostream>
#include <string>
#include <vector>
#include <boost/foreach.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/fusion/include/std_tuple.hpp>

namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace phx = boost::phoenix;

using V = std::tuple<std::string, double, double, double>;

namespace client {
    template <typename Iterator>
    struct VGrammar : qi::grammar<Iterator, std::vector<V>()> {
        VGrammar() : VGrammar::base_type(start) {
            using namespace qi;

            v = skip(blank)[no_skip[string("v")] > double_ > double_ > double_];
            junk = +(char_ - eol);
            start = (v || -junk) % eol;

            v.name("v");
            junk.name("junk");
            start.name("start");

            using phx::val;
            using phx::construct;

            on_error<fail>(
                start,
                std::cout
                << val("Error! Expecting \n\n'")
                << qi::_4
                << val("'\n\n here: \n\n'")
                << construct<std::string>(qi::_3, qi::_2)
                << val("'")
                << std::endl
                );

            //debug(v);
            //debug(junk);
            //debug(start);
        }

        qi::rule<Iterator> junk;
        //qi::rule<Iterator, qi::unused_type()> junk; // Doesn't work either
        //qi::rule<Iterator, qi::unused_type(), qi::unused_type()> junk; // Doesn't work either
        qi::rule<Iterator, V()> v;
        qi::rule<Iterator, std::vector<V>()> start;
    };
} // namespace client

int main(int argc, char* argv[]) {
    using iterator_type = std::string::const_iterator;

    std::string input = "";
    input += "v 1 2 3\r";         // keep v 1 2 3
    input += "o a b c\r";         // parse as junk
    input += "v 4 5 6 v 7 8 9\r"; // keep v 4 5 6, but parse v 7 8 9 as junk
    input += "   v 10 11 12\r\r"; // parse as junk

    iterator_type iter = input.begin();
    const iterator_type end = input.end();
    std::vector<V> parsed_output;
    client::VGrammar<iterator_type> v_grammar;

    std::cout << "run" << std::endl;
    bool r = parse(iter, end, v_grammar, parsed_output);
    std::cout << "done ... r: " << (r ? "true" : "false") << ", iter==end: " << ((iter == end) ? "true" : "false") << std::endl;

    if (r && (iter == end)) {
        BOOST_FOREACH(V const& v_row, parsed_output) {
            std::cout << std::get<0>(v_row) << ", " << std::get<1>(v_row) << ", " << std::get<2>(v_row) << ", " << std::get<3>(v_row) << std::endl;
        }
    }

    return EXIT_SUCCESS;
}

关于c++ - 如何在 boost::spirit::qi 中将某些语义 Action 排除在 AST 之外，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/52005150/

何在 amp lt 34 std c++boost-spirit boost-spirit-qi

有关c++ - 如何在 boost::spirit::qi 中将某些语义 Action 排除在 AST 之外的更多相关文章

ruby - 如何在 Ruby 中顺序创建 PI - 2
出于纯粹的兴趣，我很好奇如何按顺序创建PI，而不是在过程结果之后生成数字，而是让数字在过程本身生成时显示。如果是这种情况，那么数字可以自行产生，我可以对以前看到的数字实现垃圾收集，从而创建一个无限系列。结果只是在Pi系列之后每秒生成一个数字。这是我通过互联网筛选的结果:这是流行的计算机友好算法，类机器算法:defarccot(x,unity)xpow=unity/xn=1sign=1sum=0loopdoterm=xpow/nbreakifterm==0sum+=sign*(xpow/n)xpow/=x*xn+=2sign=-signendsumenddefcalc_pi(digits
ruby-on-rails - 在 Rails 中将文件大小字符串转换为等效千字节 - 2
我的目标是转换表单输入，例如“100兆字节”或“1GB”，并将其转换为我可以存储在数据库中的文件大小(以千字节为单位)。目前，我有这个:defquota_convert@regex=/([0-9]+)(.*)s/@sizes=%w{kilobytemegabytegigabyte}m=self.quota.match(@regex)if@sizes.include?m[2]eval("self.quota=#{m[1]}.#{m[2]}")endend这有效，但前提是输入是倍数(“gigabytes”，而不是“gigabyte”)并且由于使用了eval看起来疯狂不安全。所以，功能正常，
ruby - 如何在 buildr 项目中使用 Ruby 代码？ - 2
如何在buildr项目中使用Ruby？我在很多不同的项目中使用过Ruby、JRuby、Java和Clojure。我目前正在使用我的标准Ruby开发一个模拟应用程序，我想尝试使用Clojure后端(我确实喜欢功能代码)以及JRubygui和测试套件。我还可以看到在未来的不同项目中使用Scala作为后端。我想我要为我的项目尝试一下buildr(http://buildr.apache.org/)，但我注意到buildr似乎没有设置为在项目中使用JRuby代码本身!这看起来有点傻，因为该工具旨在统一通用的JVM语言并且是在ruby中构建的。除了将输出的jar包含在一个独特的、仅限ruby
ruby - 什么是填充的 Base64 编码字符串以及如何在 ruby 中生成它们？ - 2
我正在使用的第三方API的文档状态:"[O]urAPIonlyacceptspaddedBase64encodedstrings."什么是“填充的Base64编码字符串”以及如何在Ruby中生成它们。下面的代码是我第一次尝试创建转换为Base64的JSON格式数据。xa=Base64.encode64(a.to_json) 最佳答案他们说的padding其实就是Base64本身的一部分。它是末尾的“=”和“==”。Base64将3个字节的数据包编码为4个编码字符。所以如果你的输入数据有长度n和n%3=1=>"=="末尾用于填充n%
ruby-on-rails - 如何在 ruby 中使用两个参数异步运行 exe？ - 2
exe应该在我打开页面时运行。异步进程需要运行。有什么方法可以在ruby中使用两个参数异步运行exe吗？我已经尝试过ruby命令-system()、exec()但它正在等待过程完成。我需要用参数启动exe，无需等待进程完成是否有任何rubygems会支持我的问题？最佳答案您可以使用Process.spawn和Process.wait2:pid=Process.spawn'your.exe','--option'#Later...pid,status=Process.wait2pid您的程序将作为解释器的子进程执行。除
ruby-on-rails - 如何优雅地重启 thin + nginx？ - 2
我的瘦服务器配置了nginx，我的ROR应用程序正在它们上运行。在我发布代码更新时运行thinrestart会给我的应用程序带来一些停机时间。我试图弄清楚如何优雅地重启正在运行的Thin实例，但找不到好的解决方案。有没有人能做到这一点？最佳答案 #Restartjustthethinserverdescribedbythatconfigsudothin-C/etc/thin/mysite.ymlrestartNginx将继续运行并代理请求。如果您将Nginx设置为使用多个上游服务器，例如server{listen80;server
ruby - 如何在续集中重新加载表模式？ - 2
鉴于我有以下迁移:Sequel.migrationdoupdoalter_table:usersdoadd_column:is_admin,:default=>falseend#SequelrunsaDESCRIBEtablestatement,whenthemodelisloaded.#Atthispoint,itdoesnotknowthatusershaveais_adminflag.#Soitfails.@user=User.find(:email=>"admin@fancy-startup.example")@user.is_admin=true@user.save!ende
ruby-on-rails - rails : How to make a form post to another controller action - 2
我知道您通常应该在Rails中使用新建/创建和编辑/更新之间的链接，但我有一个情况需要其他东西。无论如何我可以实现同样的连接吗？我有一个模型表单，我希望它发布数据(类似于新View如何发布到创建操作)。这是我的表格prohibitedthisjobfrombeingsaved: 最佳答案使用:url选项。=form_for@job,:url=>company_path,:html=>{:method=>:post/:put} 关于ruby-on-rails-rails:Howtomak
ruby - 如何在 Ruby 中拆分参数字符串 Bash 样式？ - 2
我正在为一个项目制作一个简单的shell，我希望像在Bash中一样解析参数字符串。foobar"helloworld"fooz应该变成:["foo","bar","helloworld","fooz"]等等。到目前为止，我一直在使用CSV::parse_line，将列分隔符设置为""和.compact输出。问题是我现在必须选择是要支持单引号还是双引号。CSV不支持超过一个分隔符。Python有一个名为shlex的模块:>>>shlex.split("Test'helloworld'foo")['Test','helloworld','foo']>>>shlex.split('Test"
ruby - 如何在 Lion 上安装 Xcode 4.6，需要用 RVM 升级 ruby - 2
我实际上是在尝试使用RVM在我的OSX10.7.5上更新ruby，并在输入以下命令后:rvminstallruby我得到了以下回复:Searchingforbinaryrubies,thismighttakesometime.Checkingrequirementsforosx.Installingrequirementsforosx.Updatingsystem.......Errorrunning'requirements_osx_brew_update_systemruby-2.0.0-p247',pleaseread/Users/username/.rvm/log/138121

c++ - 如何在 boost::spirit::qi 中将某些语义 Action 排除在 AST 之外

有关c++ - 如何在 boost::spirit::qi 中将某些语义 Action 排除在 AST 之外的更多相关文章

随机推荐