BeautifulSoup4

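python - Difference between .string and .text in BeautifulSoup

I noticed something odd while using BeautifulSoup and could not find any documentation to back it up, so I wanted to ask here. Suppose we have tags like this, which we have already parsed with BS: Some Table Data. The officially documented way to extract the data is soup.string. However, this returns a NoneType for the second tag. So I tried soup.text (because why not?) and it extracted an empty string, exactly as I wanted. But I could not find any reference to this in the documentation, and I am worried that I am missing something. Can anyone tell me whether this is safe to use or whether it will cause problems later? By the way, I am scraping table data from a web page and intend to create a CSV from the data, so I really do need the empty string rather than ...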
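
A minimal sketch of the behaviour described in the question. The markup of the original snippet did not survive, so the two-cell table below is a hypothetical stand-in: one cell with text and one empty cell.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the second <td> is empty, as in the question.
html = "<table><tr><td>Some Table Data</td><td></td></tr></table>"
soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td")

# .string is None unless the tag has exactly one child.
strings = [td.string for td in cells]

# get_text() (what .text calls) joins all descendant strings,
# so an empty tag yields an empty string.
texts = [td.get_text() for td in cells]

print(strings)
print(texts)
```

For CSV output, get_text() is usually the more convenient of the two, since it always returns a str.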

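python - Get the content attribute of a meta tag with BeautifulSoup and Python

I am trying to use Python and Beautiful Soup to extract the content part of the tags below: I have BeautifulSoup loading the page fine and finding other things (it also grabs the article id from an id tag hidden in the source), but I do not know the right way to search the HTML and find these bits. I have tried variations of find and findAll, to no avail. The code currently iterates over a list of URLs...
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # importing the libraries
    from urllib import urlopen
    from bs4 import BeautifulSoup
    def get_data(page_no):
        webpage = urlopen(...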
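
One way this lookup might be done, as a sketch only. The meta tags in the question were lost, so the names "description" and "og:title" and their values are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the question's actual meta tags are not shown.
html = (
    '<html><head>'
    '<meta name="description" content="Example summary">'
    '<meta property="og:title" content="Example title">'
    '</head><body></body></html>'
)
soup = BeautifulSoup(html, "html.parser")

# "name" clashes with find()'s own first parameter, so it is passed via attrs.
desc_tag = soup.find("meta", attrs={"name": "description"})
description = desc_tag["content"] if desc_tag is not None else None

# .get() returns None instead of raising if the attribute is missing.
og_tag = soup.find("meta", attrs={"property": "og:title"})
og_title = og_tag.get("content") if og_tag is not None else None

print(description, og_title)
```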

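python - Custom indentation width for BeautifulSoup .prettify()

Is there a way to define a custom indentation width for the .prettify() function? From what I can gather from its source:
    def prettify(self, encoding=None, formatter="minimal"):
        if encoding is None:
            return self.decode(True, formatter=formatter)
        else:
            return self.encode(encoding, True, formatter=formatter)
there is no way to specify an indentation width. I think this is because of this line in the decode_contents() function:
    s.append(" " * (indent_level - 1))
which has a fixed length of 1 space! (Why ...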
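
A possible workaround, sketched under the assumption that prettify() indents with one space per level, as the source quoted above suggests. The helper name prettify_with_indent is made up for this example and is not part of bs4. Newer Beautiful Soup releases also document an indent argument on the Formatter classes, which may be preferable where available.

```python
import re
from bs4 import BeautifulSoup

def prettify_with_indent(soup, width=4):
    """Widen the one-space-per-level indentation produced by prettify()."""
    pretty = soup.prettify()
    # Repeat each line's leading run of spaces `width` times.
    return re.sub(r"^( +)", lambda m: m.group(1) * width, pretty, flags=re.MULTILINE)

soup = BeautifulSoup("<div><p>Hi</p></div>", "html.parser")
result = prettify_with_indent(soup, width=4)
print(result)
```

Because the substitution rewrites every leading space, it would also alter whitespace-sensitive content such as pre blocks, so it is best treated as a display aid.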

python - BeautifulSoup: 'ResultSet' object has no attribute 'find_all'?

I am trying to scrape a simple table with BeautifulSoup. Here is my code:
    import requests
    from bs4 import BeautifulSoup
    url = 'https://gist.githubusercontent.com/anonymous/c8eedd8bf41098a8940b/raw/c7e01a76d753f6e8700b54821e26ee5dde3199ab/gistfile1.txt'
    r = requests.get(url)
    soup = BeautifulSoup(r.text)
    table = soup.find_all(class_='dataframe')
    fir...
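
The error in the title arises because find_all() returns a ResultSet (a list of tags), not a single tag. A short sketch of one way around it, using a hypothetical inline table in place of the gist so that it runs without network access:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page fetched in the question.
html = (
    '<table class="dataframe">'
    '<tr><th>name</th><th>value</th></tr>'
    '<tr><td>a</td><td>1</td></tr>'
    '</table>'
)
soup = BeautifulSoup(html, "html.parser")

tables = soup.find_all(class_="dataframe")  # a ResultSet, i.e. a list of tags

rows = []
for table in tables:                        # call find_all() on each tag instead
    for tr in table.find_all("tr"):
        cells = tr.find_all(["th", "td"])
        rows.append([cell.get_text(strip=True) for cell in cells])

print(rows)
```

If only one table is expected, soup.find(class_="dataframe") returns a single tag (or None) and avoids the outer loop.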

python - BeautifulSoup - Search by text inside a tag

Observe the following problem:
    import re
    from bs4 import BeautifulSoup as BS
    soup = BS("""Edit""")
    # This returns the element
    soup.find('a', href="/customer-menu/1/accounts/1/update", text=re.compile(".*Edit.*"))
    soup = BS("""Edit""")
    # This returns None
    soup.find('a', href="/customer-menu/1/accounts/1/update", text=re.compile(".*Edit.*"))
For some reason, ...
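
A sketch of one likely cause and a workaround. The markup in the question was stripped, so the link below, with a nested icon tag before the text, is an assumption. When a tag has more than one child, its .string is None, and a text filter has nothing to match against.

```python
from bs4 import BeautifulSoup

HREF = "/customer-menu/1/accounts/1/update"

# Hypothetical markup: the <i> child means <a> has two children.
html = '<a href="%s"><i class="fa fa-edit"></i> Edit</a>' % HREF
soup = BeautifulSoup(html, "html.parser")

# With two children, .string cannot pick a single string and gives None.
direct_string = soup.a.string

def is_edit_link(tag):
    """Match on the combined text of the tag instead of on .string."""
    return (
        tag.name == "a"
        and tag.get("href") == HREF
        and "Edit" in tag.get_text()
    )

match = soup.find(is_edit_link)
print(direct_string, match)
```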

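python - BeautifulSoup: Extract text from an anchor tag

I want to extract: the src text of the following image tag, and the text of the anchor tag inside the div with class data. I managed to extract the img src, but I cannot extract the text from the anchor tag. Nikon COOLPIX L26 16.1 MP Digital Camera with 5x Zoom NIKKOR Glass Lens and 3-inch LCD (Red) Here is a link to the entire HTML page. This is my code:
    for div in soup.findAll('div', attrs={'class':'image'}):
        print "\n"
        for data in div.findNextSibling('div', attrs={'class':'data'}):
            fo...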
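
A rough sketch of how the two values might be paired. The page's real HTML is not shown, so the structure below (a div with class image followed by a sibling div with class data) and the file name cam.jpg are assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the class names in the question.
html = (
    '<div class="image"><a href="/p/1"><img src="cam.jpg"></a></div>'
    '<div class="data"><a href="/p/1">Nikon COOLPIX L26</a></div>'
)
soup = BeautifulSoup(html, "html.parser")

items = []
for image_div in soup.find_all("div", class_="image"):
    img = image_div.find("img")
    # find_next_sibling returns one tag (or None), so it is not looped over.
    data_div = image_div.find_next_sibling("div", class_="data")
    link = data_div.find("a") if data_div is not None else None
    if img is not None and link is not None:
        items.append((img["src"], link.get_text(strip=True)))

print(items)
```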

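Python BeautifulSoup: Extract text between elements

I am trying to extract "THIS IS MY TEXT" from the following HTML: Text something THIS IS MY TEXT something else. I tried it like this:
    soup = BeautifulSoup(html)
    for hit in soup.findAll(attrs={'class':'MYCLASS'}):
        print hit.text
But I get all the text between all the nested tags, plus the comment. Can anyone help me get "THIS IS MY TEXT" out of this? Best answer: Read up on how to navigate through the parse tree in BeautifulSoup. The parse tree has tags and NavigableString...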
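
The answer's point about tags versus NavigableString objects can be sketched as follows. The original markup was lost, so the element names below are assumptions; only the class name MYCLASS and the target text come from the question.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# Hypothetical markup: the wanted text is a direct child of the container.
html = (
    '<div class="MYCLASS"><!-- a comment -->'
    '<b>Text</b><span>something</span>'
    'THIS IS MY TEXT'
    '<p>something else</p></div>'
)
soup = BeautifulSoup(html, "html.parser")

direct_text = []
for hit in soup.find_all(attrs={"class": "MYCLASS"}):
    for child in hit.children:
        # Keep strings that sit directly inside the container; Comment is a
        # NavigableString subclass, so it has to be excluded explicitly.
        if isinstance(child, NavigableString) and not isinstance(child, Comment):
            stripped = child.strip()
            if stripped:
                direct_text.append(stripped)

print(direct_text)
```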

python - Beautifulsoup - nextSibling

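I am trying to get the content "My home address" using the following, but I get an AttributeError:
    address = soup.find(text="Address:")
    print address.nextSibling
This is my HTML: Address: My home address. What is a good way to navigate down the td tags and pull the content? Best answer: The problem is that you have found a NavigableString, not the tag that contains it. Also, nextSibling finds the next NavigableString or Tag, so even if you had the tag it would not work the way you expect. This is what you want:
    address = soup.find(t...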
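
The accepted answer is cut off, so the following is only a sketch of the approach it describes: climb from the matched string to its enclosing td, then step to the next td. The table markup is a hypothetical reconstruction.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a label cell followed by a value cell.
html = (
    "<table><tr>"
    "<td><b>Address:</b></td>"
    "<td>My home address</td>"
    "</tr></table>"
)
soup = BeautifulSoup(html, "html.parser")

label = soup.find(string="Address:")       # a NavigableString, not a tag
label_cell = label.find_parent("td")       # the cell that contains the label
value_cell = label_cell.find_next_sibling("td")
address = value_cell.get_text(strip=True)

print(address)
```

find_next_sibling("td") skips any whitespace strings between cells, which is where a bare nextSibling tends to go wrong.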

python - How do I get rid of the BeautifulSoup user warning?

After installing BeautifulSoup, whenever I run Python from the command line, the following warning appears: D:\Application\python\lib\site-packages\beautifulsoup4-4.4.1-py3.4.egg\bs4\__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another s...
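
As the warning text itself suggests, naming the parser explicitly is the usual way to silence it. A minimal sketch, with a made-up one-line document; the warnings capture is there only to show that the message is not emitted.

```python
import warnings
from bs4 import BeautifulSoup

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Passing the parser name as the second argument avoids the guess.
    soup = BeautifulSoup("<p>Hello</p>", "html.parser")

parser_warnings = [
    w for w in caught
    if "No parser was explicitly specified" in str(w.message)
]
print(len(parser_warnings))
```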

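python - How do I scrape a website that requires login with Python and BeautifulSoup?

If I want to scrape a website that requires logging in with a password first, how do I get started doing that in Python with the beautifulsoup4 library? Below is what I do for websites that do not require login.
    from bs4 import BeautifulSoup
    import urllib2
    url = urllib2.urlopen("http://www.python.org")
    content = url.read()
    soup = BeautifulSoup(content)
How should the code be changed to handle login? Assume the site I want to scrape is a forum that requires login. An example is http://forum.arduino.cc/index.php Best answer ...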

