Soup

python - Beautiful Soup and Unicode problems

I'm using BeautifulSoup to parse some web pages. Occasionally I run into a "unicode hell" error like the following. Looking at the source of this article on TheAtlantic.com [http://www.theatlantic.com/education/archive/2013/10/why-are-hundreds-of-harvard-students-studying-ancient-chinese-philosophy/280356/], we see this in the og:description meta property: When BeautifulSoup parses it, I see this: >>> print repr(description) u'Thepr
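The question is cut off before the actual output, so the following is only a minimal sketch of reading an og:description meta property while stating the encoding explicitly. The HTML fragment and its text are made up for illustration (they are not the article's real markup), and whether from_encoding resolves the original error is an assumption.

```python
from bs4 import BeautifulSoup

# Hypothetical page bytes; the real article markup is not shown in the question.
raw = (
    '<html><head>'
    '<meta property="og:description" content="The professor\u2019s course">'
    '</head></html>'
).encode("utf-8")

# Passing bytes plus from_encoding tells Beautiful Soup how to decode them,
# instead of leaving it to guess the encoding.
soup = BeautifulSoup(raw, "html.parser", from_encoding="utf-8")

meta = soup.find("meta", attrs={"property": "og:description"})
description = meta["content"]
print(repr(description))
```

If the declared encoding is wrong, curly quotes like the one above are typically the first characters to come out garbled, which is why the sample text includes one.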

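python - Scraping with Beautiful Soup while preserving entities

I want to scrape a table from the web and keep the entities intact so that I can republish it as HTML later. BeautifulSoup seems to be converting them to spaces. Example: from bs4 import BeautifulSoup; html = ""; html += " hello "; html += ""; soup = BeautifulSoup(html); table = soup.find_all('table')[0]; row = table.find_all('tr')[0]; cell = row.find_all('td')[0]; print cell. Observed result: hello. Required result: hello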
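The markup in the example above was lost, so here is a minimal sketch that assumes the cell contained &nbsp; entities (an assumption, as is the table markup itself). In bs4, output formatters control how characters are written back out; formatter="html" converts characters to named HTML entities where possible.

```python
from bs4 import BeautifulSoup

# Assumed input: a one-cell table whose text is wrapped in &nbsp; entities.
html = "<table><tr><td>&nbsp;hello&nbsp;</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

cell = soup.find_all("table")[0].find_all("tr")[0].find_all("td")[0]

# The default output writes the non-breaking space as the raw character U+00A0.
default_output = str(cell)

# formatter="html" writes it back as the named entity.
preserved = cell.decode(formatter="html")
print(preserved)
```

Note that the parser still stores the text as Unicode internally; the entity is only restored at output time.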

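python - How do I find all elements with a custom html attribute using Beautiful Soup, regardless of the html tag?

I have two cases where I want to scrape html tags by a custom html attribute. Here is an example of the html. How do I scrape all elements that have the custom attribute "limit"? BarFooBaz The second case is similar, but all elements share the same html tag: BarBarBar My question differs from "How to find tags with only certain attributes - BeautifulSoup" because that one targets attribute values on a specific tag, whereas mine looks only for the attribute, regardless of tag or value. Best answer: # First case: soup.find_all(attrs={"limit": True}) # Second case: soup.find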
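A small sketch of the first case from the answer. The HTML is hypothetical, since the question's markup was lost; only the attribute name "limit" and the texts Bar, Foo and Baz come from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two different tags carry "limit", one does not.
html = '<div limit="5">Bar</div><span limit="9">Foo</span><p>Baz</p>'
soup = BeautifulSoup(html, "html.parser")

# attrs={"limit": True} matches any tag that has the attribute, whatever its value.
with_limit = soup.find_all(attrs={"limit": True})

tag_names = [tag.name for tag in with_limit]
texts = [tag.get_text() for tag in with_limit]
print(tag_names, texts)
```

Passing True as the attribute value is what makes the match independent of both tag name and attribute value.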

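python - Does Beautiful Soup work with Python 3.4.1?

I want to try writing a program that downloads images from the Internet, and I found a guide that uses Beautiful Soup. I had heard of BeautifulSoup before, so I thought I would give it a try. My only problem is that I can't seem to find a version that works with Python 3. I visited their website but couldn't find one. Whenever I run the setup.py file I get an error that flashes by too quickly to read, but it looks like a syntax error. So I looked at the code and found that the strings that are supposed to be printed have no parentheses around them. I have tried many different web pages and searches but could not find an answer. I also apologize if this is not a programming-related question; if it isn't, please comment on it and I will delete it as soon as possible.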

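python - What is the closest equivalent to Beautiful Soup for Ruby?

I love the BeautifulSoup scraping library in Python. It just works. Is there a close equivalent in Ruby? Best answer: Nokogiri is another HTML/XML parser. According to these benchmarks, it is faster than hpricot. Nokogiri uses libxml2 and is a replacement for hpricot. It also supports CSS3 selectors, which is very nice. Edit: there is a new benchmark comparing nokogiri, libxml-ruby, hpricot and rexml here. Ruby Toolbox has a category on HTML parsers here. Regarding pyt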

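python - Beautiful Soup: getting only the value inside a tag

The following command: volume = soup.findAll("span", {"id": "volume"})[0] gives: 16,103.3 when I issue print(volume). How can I get just the number? Best answer: Extract the string from the element: volume = soup.findAll("span", {"id": "volume"})[0].string Regarding python - Beautiful Soup: getting only the value inside a tag, we found a similar question on StackOverflow: https://stackove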
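A minimal sketch of the answer's approach. The span markup is assumed (only the id "volume" and the text 16,103.3 appear in the question), and the conversion to a float is an extra step added here for illustration.

```python
from bs4 import BeautifulSoup

# Assumed markup for the element the question describes.
html = '<span id="volume">16,103.3</span>'
soup = BeautifulSoup(html, "html.parser")

# find_all is the bs4 spelling of the older findAll.
volume_tag = soup.find_all("span", {"id": "volume"})[0]

# .string returns the tag's single text child rather than the whole tag.
volume_text = volume_tag.string

# Optional: strip the thousands separator to get a real number.
volume_number = float(volume_text.replace(",", ""))
print(volume_text, volume_number)
```

.string is None when a tag has more than one child, so get_text() may be the safer choice for less regular markup.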

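python - Understanding the find() function in Beautiful Soup

I know that what I want to do is simple, but it is giving me grief. I want to extract data from HTML using BeautifulSoup. To do that, I need to use the .find() function correctly. Here is the HTML I am working with: EdBoon @noobde 73,599 Real 32,452 Fake Followers 69% Audit score The values I want are 73599 from data-value=73599, 32452 from data-value=32452, and 69% from percentagegood. Using past code and online examples, this is what I have so far: RealValue = soup.find("div", {"class": "realnumber"})['data-value'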
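The question's markup did not survive, so the sketch below uses hypothetical HTML. The class names "realnumber" and "percentagegood" and the data-value numbers come from the question; the class "fakenumber" and the overall structure are assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical markup reconstructed around the names given in the question.
html = """
<div class="realnumber" data-value="73599">73,599</div>
<div class="fakenumber" data-value="32452">32,452</div>
<div class="percentagegood">69%</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag; indexing the tag reads an attribute.
real_value = soup.find("div", {"class": "realnumber"})["data-value"]
fake_value = soup.find("div", {"class": "fakenumber"})["data-value"]

# The percentage is element text, not an attribute, so use get_text().
percentage = soup.find("div", {"class": "percentagegood"}).get_text(strip=True)
print(real_value, fake_value, percentage)
```

find() returns None when nothing matches, so indexing the result directly raises a TypeError on pages that lack the element.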

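python - How to find all comments with Beautiful Soup

This question was asked four years ago, but the answers are now out of date for BS4. I want to remove all comments from my html file using Beautiful Soup. Since BS4 makes each comment a special type of navigable string, I thought this code would work: for comments in soup.find_all('comment'): comments.decompose() So that didn't work.... How can I find all comments using BS4? Best answer: You can pass a function to find_all() so it can check whether a string is a comment. For example, I have the following html: TheScience&S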
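The answer's example is cut off, so here is a minimal sketch of the technique it names, passing a function to find_all(), on made-up HTML. Comment is the bs4 class for comment nodes.

```python
from bs4 import BeautifulSoup, Comment

# Made-up markup with two comments to remove.
html = "<div><!-- first --><p>Keep me<!-- second --></p></div>"
soup = BeautifulSoup(html, "html.parser")

# The function receives each string node; keep only Comment instances.
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
found = [c.strip() for c in comments]

# Comments are strings, not tags, so remove them with extract().
for comment in comments:
    comment.extract()

cleaned = str(soup)
print(found, cleaned)
```

The original attempt fails because find_all('comment') searches for a tag named comment, which comments are not.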

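python - Using Beautiful Soup to find the next occurrence of a tag and the text it contains

I am trying to parse the text between <blockquote> tags. When I type soup.blockquote.get_text(), I get the result I want for the first blockquote that occurs in the HTML file. How do I find the next and subsequent <blockquote> tags in the file? Maybe I am just tired and can't find it in the documentation. Example HTML file: header / I can get this text / eiaoiefj / trying to capture this next / do not capture this / capture this too but separately after "capture this next" Simple python code: from bs4 import BeautifulSoup html_doc = open("ex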
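No answer is included for this question, so the following is one possible approach, sketched on HTML reconstructed from the question's sample text. Which pieces of text sat inside blockquotes is an assumption.

```python
from bs4 import BeautifulSoup

# Assumed structure: the sample text wrapped in plausible tags.
html_doc = """
<h1>header</h1>
<blockquote>I can get this text</blockquote>
<p>eiaoiefj</p>
<blockquote>trying to capture this next</blockquote>
<p>do not capture this</p>
<blockquote>capture this too</blockquote>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# soup.blockquote is shorthand for the first <blockquote> in the document.
first = soup.blockquote

# find_next() walks forward in document order to the next matching tag.
second = first.find_next("blockquote")

# find_all() collects every blockquote, each one as a separate item.
all_quotes = [bq.get_text() for bq in soup.find_all("blockquote")]
print(second.get_text(), all_quotes)
```

find_next() suits stepping through one tag at a time, while find_all() is simpler when every blockquote is wanted.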

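python - How do I iterate over the HTML attributes of a Beautiful Soup element?

How do I iterate over the HTML attributes of a BeautifulSoup element? Like, given: xyz I want "bar" and "blah". Best answer: from BeautifulSoup import BeautifulSoup; page = BeautifulSoup('xyz'); for attr, value in page.find('foo').attrs: print attr, "=", value # Prints: # bar=asdf # blah=123 Regarding python - How do I iterate over the HTML attributes of a Beautiful Soup element?, we found a similar question on Sta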
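The answer above targets the older BeautifulSoup 3 package, where attrs is a list of pairs. In bs4, attrs is a dictionary, so the loop needs .items(). The sketch below assumes the lost markup was a <foo> element with bar="asdf" and blah="123", inferred from the answer's printed output.

```python
from bs4 import BeautifulSoup

# Assumed markup, inferred from the output shown in the answer.
page = BeautifulSoup('<foo bar="asdf" blah="123">xyz</foo>', "html.parser")

# In bs4, Tag.attrs is a dict mapping attribute names to values.
pairs = list(page.find("foo").attrs.items())

for attr, value in pairs:
    print(attr, "=", value)
```

To get only the attribute names, as the question asks, iterate over the dictionary's keys instead.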
