xml - LibXML for Perl: parsing huge XML files through XPath causes a core segmentation fault

coder 2024-06-30 (original post)

I have a huge XML file in the following format:

<XML>
<Application id="1" attr1="some value" attr2="some val"..and many more attr also with nested tags inside application which might contain more attributes
</Application>

<Application id="2" attr1="some value" attr2="some val"..and many more attr also with nested tags inside application which might contain more attributes
</Application>

<Application id="3" attr1="some value" attr2="some val"..and many more attr also with nested tags inside application which might contain more attributes
</Application>

 .... probably 10000 more Application entries
</XML>

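Each Application tag has only attributes and no text content, but it also contains nested tags that can have attributes of their own. I need to parse the file and extract some of those attributes. I am using the script below. It works fine on a small subset of the Application tags, but it becomes very slow as the number of records grows. Unfortunately, when I run it on the whole file, or even on half of it, it crashes with a segmentation fault and leaves a core dump.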

Here is my script. Any suggestions on how to do this better would be greatly appreciated.

Best answer

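I believe you could do this with XML::LibXML::Reader, but I am not familiar with it. Here is how to do it with XML::Twig.

I have only given examples of how to get at the data inside an Application element.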

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

my $filename1 = "exam.xml";

# the handler is called each time an Application element has been fully parsed
my $parser = XML::Twig->new( twig_handlers => { Application => \&process_application })
                        ->parsefile($filename1);

sub process_application
  { my( $t, $sample)= @_;
    my $hncid    = $sample->att('ID');                    # get an attribute
    my @persons  = $sample->children( 'Person');
    my @aplnamt  = map { $_->att( 'APLN') } @persons;     # that's how you get all attribute values
    my @students = $sample->findnodes( './Person/Student');
    my @nsschl   = map { $_->att('NS') } @students;
    my @d81      = $sample->descendants('*[@D8CHRG]');    # native navigation
    my @d81_xp   = $sample->findnodes('.//*[@D8CHRG]');   # same query; you can use a subset of XPath

    $t->purge;                                            # this is where you free the memory
  }

Now that I think about it, you can actually use XML::Twig::XPath to get the full power of XPath. I am just more used to XML::Twig's native navigation methods.
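A minimal sketch of the XML::Twig::XPath variant, under some assumptions: the inline sample document is hypothetical (only the element and attribute names ID, Person, APLN, Student and NS are taken from the code above), and XML::Twig::XPath needs either XML::XPathEngine or XML::XPath installed. The handler trigger is still written in XML::Twig's own syntax; the full XPath engine is used inside the handler, here with a predicate that keeps only Person elements that have a Student child.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig::XPath;   # requires XML::XPathEngine or XML::XPath

# Hypothetical sample data, small enough to parse from a string.
my $xml = <<'XML';
<XML>
  <Application ID="1">
    <Person APLN="Smith"><Student NS="A"/></Person>
    <Person APLN="Jones"/>
  </Application>
  <Application ID="2">
    <Person APLN="Brown"><Student NS="B"/></Person>
  </Application>
</XML>
XML

# Application ID => APLN values of the Person elements that have a Student child
my %with_student;

XML::Twig::XPath->new(
    twig_handlers => {
        Application => sub {
            my ( $t, $app ) = @_;

            # XPath expression evaluated relative to the current Application
            my @persons = $app->findnodes('./Person[Student]');
            $with_student{ $app->att('ID') } = [ map { $_->att('APLN') } @persons ];

            $t->purge;   # free the memory used by this Application
        },
    },
)->parse($xml);

for my $id ( sort keys %with_student ) {
    print "$id: @{ $with_student{$id} }\n";
}
```

For the real file, replacing `parse($xml)` with `parsefile($filename1)` keeps the same one-Application-at-a-time memory behaviour, since each element is still purged once it has been processed.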

This question and answer come from a similar question found on Stack Overflow: https://stackoverflow.com/questions/17376775/
