html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a rel="nofollow" href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a rel="nofollow" href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a rel="nofollow" href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')
# Get the <head> tag
print(soup.head)
<head><title>The Dormouse's story</title></head>
# Get the <title> tag
print(soup.title)
<title>The Dormouse's story</title>
# Get the first <b> tag inside the <body> tag
print(soup.body.b)
<b>The Dormouse's story</b>
# Get the first tag with the given name
print(soup.a)
<a rel="nofollow" class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
# Get all <a> tags
print(soup.find_all('a'))
[<a rel="nofollow" class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a rel="nofollow" class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
<a rel="nofollow" class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
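As a small illustration of working with the list that find_all returns, the sketch below pulls the link text and the href attribute out of each <a> tag. The shortened markup and the variable name `links` are assumptions made for this example, not part of the original script.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample document; only the three links matter here.
html_doc = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
</p>
"""
soup = BeautifulSoup(html_doc, 'html.parser')

# find_all returns a list-like collection of Tag objects, in document order.
links = [(a.get_text(), a.get('href')) for a in soup.find_all('a')]
for text, href in links:
    print(text, href)
```

Using a.get('href') rather than a['href'] returns None instead of raising KeyError when a tag has no href attribute.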
The .contents attribute returns a tag's child nodes as a list:
head_tag = soup.head
print(head_tag)
print(head_tag.contents)
title_tag = head_tag.contents[0]
print(title_tag)
print(title_tag.contents)
<head><title>The Dormouse's story</title></head>
[<title>The Dormouse's story</title>]
<title>The Dormouse's story</title>
["The Dormouse's story"]
The .children generator lets you loop over a tag's direct children:
for child in title_tag.children:
    print(child)
The Dormouse's story
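The difference between the two attributes is mostly one of shape: .contents is a list, while .children has to be consumed by a loop or by list(). A minimal sketch, assuming a one-line document that contains only the <head> part of the sample:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<head><title>The Dormouse's story</title></head>", 'html.parser')
head_tag = soup.head

contents = head_tag.contents    # a real list: supports len() and indexing
first_child = contents[0]

# .children yields the same nodes one at a time.
child_names = [child.name for child in head_tag.children]

print(len(contents), first_child.name, child_names)
```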
The .descendants attribute iterates recursively over all of a tag's descendants (children, grandchildren, and so on):
for child in head_tag.descendants:
    print(child)
<title>The Dormouse's story</title>
The Dormouse's story
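To make the contrast with .children concrete, the following sketch counts both for the same tag. It again assumes a document reduced to the <head> element; `direct` and `nested` are illustrative names.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<head><title>The Dormouse's story</title></head>", 'html.parser')
head_tag = soup.head

direct = list(head_tag.children)      # only the <title> tag
nested = list(head_tag.descendants)   # the <title> tag and the string inside it

print(len(direct))
print(len(nested))
```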
If a tag has exactly one child and that child is a NavigableString, the child is available as .string:
print(title_tag.string)
The Dormouse's story
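It may help to see what .string does when the "exactly one child" condition does not hold. In the sketch below (the markup is a trimmed-down assumption based on the sample document), the first paragraph has a single <b> child that wraps one string, so .string passes through to it; the second paragraph has several children, so .string is None.

```python
from bs4 import BeautifulSoup

html_doc = """
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time <a>Elsie</a> and <a>Lacie</a></p>
"""
soup = BeautifulSoup(html_doc, 'html.parser')

title_p = soup.find('p', class_='title')
story_p = soup.find('p', class_='story')

print(title_p.string)   # the string inside the only child, <b>
print(story_p.string)   # None, because <p class="story"> has several children
```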
If a tag contains more than one string, use the .strings generator to loop over them:
for string in soup.strings:
    print(repr(string))
'\n'
"The Dormouse's story"
'\n'
'\n'
"The Dormouse's story"
'\n'
'Once upon a time there were three little sisters; and their names were\n'
'Elsie'
',\n'
'Lacie'
' and\n'
'Tillie'
';\nand they lived at the bottom of a well.'
'\n'
'...'
'\n'
Use .stripped_strings to skip strings that consist only of whitespace and to strip leading and trailing whitespace from the rest:
for string in soup.stripped_strings:
    print(repr(string))
"The Dormouse's story"
"The Dormouse's story"
'Once upon a time there were three little sisters; and their names were'
'Elsie'
','
'Lacie'
'and'
'Tillie'
';\nand they lived at the bottom of a well.'
'...'
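A common use of .stripped_strings is to flatten a fragment into one line of text. The sketch below joins the stripped strings with a single space; the markup is a made-up fragment with deliberately messy whitespace.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>  Once upon a time  <a>Elsie</a> ,\n <a>Lacie</a>\n</p>", 'html.parser')

# Whitespace-only strings are skipped; the others are stripped at both ends.
text = " ".join(soup.stripped_strings)
print(text)

# get_text with a separator and strip=True is expected to give the same result.
same_text = soup.get_text(" ", strip=True)
```

Note that only the ends of each string are stripped, so whitespace inside a string (such as the newline in ';\nand they lived at the bottom of a well.') is kept.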
Use the .parent attribute to get an element's parent. Here the <head> tag is the parent of the <title> tag:
title_tag = soup.title
print(title_tag)
print(title_tag.parent)
<title>The Dormouse's story</title>
<head><title>The Dormouse's story</title></head>
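The .parents attribute iterates over all of an element's ancestors, from the nearest parent up to the top of the document:
link = soup.a
print(link)
for parent in link.parents:
    if parent is None:
        print(parent)
    else:
        print(parent.name)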
<a rel="nofollow" class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
p
body
html
[document]
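The same walk can be collected into a list, which is handy for building a path to an element. This is a sketch on a minimal document; `path` is an illustrative name. The last entry, '[document]', is the name of the BeautifulSoup object itself.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<html><body><p><a id="link1">Elsie</a></p></body></html>', 'html.parser')
link = soup.a

# Names of all ancestors, nearest first.
path = [parent.name for parent in link.parents]
print(path)
```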
sibling_soup = BeautifulSoup("<a><b>text1</b><c>text2</c></b></a>", 'html.parser')
print(sibling_soup.prettify())
<a>
 <b>
  text1
 </b>
 <c>
  text2
 </c>
</a>
Use the .next_sibling and .previous_sibling attributes to move between sibling nodes:
print(sibling_soup.b.next_sibling)
print(sibling_soup.c.previous_sibling)
<c>text2</c>
<b>text1</b>
The .next_siblings and .previous_siblings attributes iterate over the siblings that follow or precede the current node:
for sibling in soup.a.next_siblings:
    print(repr(sibling))
for sibling in soup.find(id="link3").previous_siblings:
    print(repr(sibling))
',\n'
<a rel="nofollow" class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
' and\n'
<a rel="nofollow" class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
';\nand they lived at the bottom of a well.'
' and\n'
<a rel="nofollow" class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
',\n'
<a rel="nofollow" class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
'Once upon a time there were three little sisters; and their names were\n'
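As the output shows, siblings include the text between tags, not only tags. If only the tag siblings are wanted, one possible approach is to filter with isinstance, as in this sketch (the markup is a shortened assumption; `ids` is an illustrative name):

```python
from bs4 import BeautifulSoup, Tag

html_doc = '<p><a id="link1">Elsie</a>,\n<a id="link2">Lacie</a> and\n<a id="link3">Tillie</a>;</p>'
soup = BeautifulSoup(html_doc, 'html.parser')

first_link = soup.a

# Keep only siblings that are tags, dropping the strings in between.
tag_siblings = [sib for sib in first_link.next_siblings if isinstance(sib, Tag)]
ids = [sib['id'] for sib in tag_siblings]
print(ids)
```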
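The .next_element attribute points to whatever was parsed immediately after the current object (a string or a tag). It is not always the same as .next_sibling; the code below prints the .next_sibling of the last <a> tag:
last_a_tag = soup.find("a", id="link3")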
print(last_a_tag)
print(last_a_tag.next_sibling)
<a rel="nofollow" class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
;
and they lived at the bottom of a well.
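For comparison, here is a minimal sketch that prints both attributes for the same tag. The markup is a shortened assumption based on the sample document. The next sibling of the <a> tag is the text that follows it, while its next element is the string inside the tag, because that string was parsed right after the opening tag.

```python
from bs4 import BeautifulSoup

html_doc = '<p><a id="link3">Tillie</a>;\nand they lived at the bottom of a well.</p>'
soup = BeautifulSoup(html_doc, 'html.parser')

last_a_tag = soup.find("a", id="link3")

print(repr(last_a_tag.next_sibling))   # the text after the closing </a>
print(repr(last_a_tag.next_element))   # the string inside the <a> tag
```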
The .previous_element attribute is the exact opposite of .next_element: it points to whatever was parsed immediately before the current object:
print(last_a_tag.previous_element)
print(last_a_tag.previous_element.next_element)
and
<a rel="nofollow" class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
The .next_elements and .previous_elements iterators move forward or backward through the document in the order it was parsed:
for element in last_a_tag.next_elements:
    print(repr(element))
'Tillie'
';\nand they lived at the bottom of a well.'
'\n'
<p class="story">...</p>
'...'
'\n'
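The example above only walks forward. A sketch of the backward direction, assuming a small made-up paragraph and taking just the first three results with itertools.islice, could look like this:

```python
from itertools import islice

from bs4 import BeautifulSoup

html_doc = '<p>Once <a id="link1">Elsie</a> and <a id="link3">Tillie</a></p>'
soup = BeautifulSoup(html_doc, 'html.parser')

last_a_tag = soup.find("a", id="link3")

# Walk backward through the parse order, nearest element first.
backwards = list(islice(last_a_tag.previous_elements, 3))
for element in backwards:
    print(repr(element))
```

The walk visits the text ' and ' first, then the string 'Elsie', and only then the first <a> tag, since a tag is parsed before the string it contains.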
# -*- coding:utf-8 -*-
# Author: NoamaNelson
# Date: 2023/2/16
# File name: bs03.py
# Purpose: using BeautifulSoup
# Blog: https://blog.csdn.net/NoamaNelson
from bs4 import BeautifulSoup
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a rel="nofollow" href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a rel="nofollow" href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a rel="nofollow" href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
# ====== Child nodes ======
# Get the <head> tag
print(soup.head)
# Get the <title> tag
print(soup.title)
# Get the first <b> tag inside the <body> tag
print(soup.body.b)
# Get the first tag with the given name
print(soup.a)
# Get all <a> tags
print(soup.find_all('a'))
# The .contents attribute returns a tag's child nodes as a list
head_tag = soup.head
print(head_tag)
print(head_tag.contents)
title_tag = head_tag.contents[0]
print(title_tag)
print(title_tag.contents)
# The .children generator lets you loop over a tag's direct children
for child in title_tag.children:
    print(child)
# The .descendants attribute iterates recursively over all of a tag's descendants
for child in head_tag.descendants:
    print(child)
# If a tag has exactly one child and it is a NavigableString, it is available as .string
print(title_tag.string)
# If a tag contains more than one string, loop over them with .strings
for string in soup.strings:
    print(repr(string))
# .stripped_strings skips whitespace-only strings and strips the rest
for string in soup.stripped_strings:
    print(repr(string))
# ====== Parent nodes ======
# Use .parent to get an element's parent; the <head> tag is the parent of the <title> tag
title_tag = soup.title
print(title_tag)
print(title_tag.parent)
# The .parents attribute iterates over all of an element's ancestors
link = soup.a
print(link)
for parent in link.parents:
    if parent is None:
        print(parent)
    else:
        print(parent.name)
# ====== Sibling nodes ======
sibling_soup = BeautifulSoup("<a><b>text1</b><c>text2</c></b></a>", 'html.parser')
print(sibling_soup.prettify())
# Use .next_sibling and .previous_sibling to move between sibling nodes
print(sibling_soup.b.next_sibling)
print(sibling_soup.c.previous_sibling)
# .next_siblings and .previous_siblings iterate over the siblings of the current node
for sibling in soup.a.next_siblings:
    print(repr(sibling))
for sibling in soup.find(id="link3").previous_siblings:
    print(repr(sibling))
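# ====== Going back and forth ======
# .next_element points to whatever was parsed immediately afterwards (a string or a tag);
# it is not always the same as .next_sibling, which is what is printed here
last_a_tag = soup.find("a", id="link3")
print(last_a_tag)
print(last_a_tag.next_sibling)
# .previous_element is the opposite of .next_element: it points to whatever was parsed immediately before
print(last_a_tag.previous_element)
print(last_a_tag.previous_element.next_element)
# The .next_elements and .previous_elements iterators move forward or backward through the parsed document
for element in last_a_tag.next_elements:
    print(repr(element))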