BeautifulSoup4

python - Parsing HTML with BeautifulSoup 4 and Python

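I am trying to parse the result list from http://mobile.de. First I tried the HTMLParser class, but it raised an error: HTMLParser.HTMLParseError: EOF in middle of construct. So I switched to BeautifulSoup 4, which copes better with invalid markup, but the elements I am searching for cannot be reached, and I do not know whether that is my fault or the site's.
    from bs4 import BeautifulSoup
    import urllib
    import socket
    searchurl = "http://suchen.mobile.de/auto/search.html?scopeId=C&isSearchRequest=true&…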

python - Handling colons in BeautifulSoup CSS selectors

Input HTML: apple peach cucumber. Desired output: all div elements directly under the parent div. I am trying to find the parent div with a CSS selector: div[style="display:flex"]. This raises an error:
    >>> soup.select('div[style="display:flex"]')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/bs4/element.py", line 1400, in select
        'Only the follow…

BeautifulSoup style python html css-selectors html-parsing
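
For the colon question above, one way to sidestep the selector parser entirely is to match the style attribute with a keyword argument and then list the direct children. This is a minimal sketch: the markup is hypothetical, since the question's original HTML was lost from the excerpt, and it assumes the style value matches exactly. Recent bs4 releases hand select() to the soupsieve package, which may accept the quoted selector as written.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the original input HTML is not recoverable from the excerpt.
html = """
<div style="display:flex">
  <div>apple</div>
  <div>peach</div>
</div>
<div>cucumber</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Match the attribute value directly instead of going through a CSS selector.
parent = soup.find("div", style="display:flex")

# recursive=False limits the search to direct children of the parent div.
children = parent.find_all("div", recursive=False)
texts = [child.get_text(strip=True) for child in children]
print(texts)
```

The keyword match compares the whole attribute string, so a value such as "display: flex" with a space would need its own exact string or a regular expression.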

python - Wrapping subsections of text with tags in BeautifulSoup

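I want the BeautifulSoup equivalent of this jQuery question. I want to find specific regular-expression matches in BeautifulSoup text and then replace each matched segment with a wrapped version. I can already do this with plain-text wrapping:
    # replace all words ending in "ug" wrapped in quotes,
    # with "ug" replaced with "ook"
    >>> soup = BeautifulSoup("Snug as a bug in a rug")
    >>> soup
    Snug as a bug in a rug
    >>> for text in soup.findAll(text=True):
    ...     if re.search(r'ug\b', te…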

python html regex beautifulsoup
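
For the wrapping question above, a possible approach is to split each text node at the regex matches and rebuild it as a sequence of plain strings and new tags. This is only a sketch: the `<b>` wrapper tag, the `<p>` container, and the pattern `\w*ug\b` are assumptions chosen for illustration, not taken from the original question.

```python
import re
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Snug as a bug in a rug</p>", "html.parser")
pattern = re.compile(r"\w*ug\b")  # assumed pattern: whole words ending in "ug"

# Copy the text nodes into a list first, because the tree is modified in the loop.
for text_node in list(soup.find_all(string=True)):
    text = str(text_node)
    pieces = []
    last = 0
    for match in pattern.finditer(text):
        if match.start() > last:
            pieces.append(text[last:match.start()])  # plain text before the match
        wrapper = soup.new_tag("b")  # hypothetical wrapper tag
        wrapper.string = match.group(0)
        pieces.append(wrapper)
        last = match.end()
    if not pieces:
        continue  # no match in this text node, leave it untouched
    if last < len(text):
        pieces.append(text[last:])  # plain text after the last match
    # Insert the rebuilt pieces in order, then drop the original text node.
    for piece in pieces:
        text_node.insert_before(piece)
    text_node.extract()

result = str(soup)
print(result)
```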

python - How to concatenate two HTML file bodies with BeautifulSoup?

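I need to concatenate the bodies of two HTML files into a single HTML file, with some arbitrary HTML in between as a separator. I had code for this, but it stopped working when I upgraded from Xubuntu 11.10 (or was it 11.04?) to 12.10, possibly because of a BeautifulSoup update (I currently use 3.2.1; I do not know which version I had before) or a vim update (I use vim to generate the HTML files automatically from plain-text files). Here is a stripped-down version of the code:
    from BeautifulSoup import BeautifulSoup
    soup_original_1 = BeautifulSoup(''.join(open('test1.html')))
    soup_original_2 = Bea…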

BeautifulSoup python html
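
For the concatenation question above, the idea can be sketched with bs4 rather than the BeautifulSoup 3 package the question uses; that substitution, the inline sample documents, and the `<hr/>` separator are all assumptions made for illustration. Appending a node that belongs to another tree moves it, so the children are copied into a list before the loop.

```python
from bs4 import BeautifulSoup

# Hypothetical inputs standing in for the two files and the separator.
doc1 = "<html><body><p>first</p></body></html>"
doc2 = "<html><body><p>second</p></body></html>"
separator = "<hr/>"

soup1 = BeautifulSoup(doc1, "html.parser")
soup2 = BeautifulSoup(doc2, "html.parser")
sep = BeautifulSoup(separator, "html.parser")

# append() detaches a node from its old tree, so iterate over a copy of the list.
for node in list(sep.contents):
    soup1.body.append(node)
for node in list(soup2.body.contents):
    soup1.body.append(node)

merged = str(soup1)
print(merged)
```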

python - Removing the first child node with BeautifulSoup

    import os
    from bs4 import BeautifulSoup
    do = dir_with_original_files = 'C:\FOLDER'
    dm = dir_with_modified_files = 'C:\FOLDER'
    for root, dirs, files in os.walk(do):
        for f in files:
            print f.title()
            if f.endswith('~'):  # you don't want to process backups
                continue
            original_file = os.path.join(root, f)
            mf = f.split('.')
            mf = ''.join(mf[:-1]) + '_mo…

BeautifulSoup python table html parsing html-parsing

python - Accessing the next sibling <li> element with BeautifulSoup

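I am completely new to web parsing with Python and BeautifulSoup. I have an HTML page whose code is (in part) as follows: Example Example Example1 Example2. I have to visit every link (basically every element) until no more tags exist. Each time a link is clicked, its corresponding element has its class set to "active". My code is:
    from bs4 import BeautifulSoup
    import urllib2
    import re
    landingPage = urllib2.urlopen('somepage.com').read()
    soup = BeautifulSoup(landingPage)
    pageList = soup.find("di…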

BeautifulSoup python html
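
For the sibling question above, walking from one `<li>` to the next can be sketched with find_next_sibling. The markup, the "pages" container class, and the "active" class name are hypothetical, since the question's HTML did not survive in the excerpt. Note that the plain .next_sibling attribute often returns the whitespace text node between elements, which is why the sketch asks for the next `li` by name.

```python
from bs4 import BeautifulSoup

# Hypothetical pagination markup standing in for the lost original.
html = """
<div class="pages">
  <ul>
    <li class="active"><a href="/page/1">1</a></li>
    <li><a href="/page/2">2</a></li>
    <li><a href="/page/3">3</a></li>
  </ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

current = soup.find("li", class_="active")
hrefs = []
while current is not None:
    hrefs.append(current.a["href"])
    # Skips text nodes and returns None after the last <li>.
    current = current.find_next_sibling("li")

print(hrefs)
```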

python - BeautifulSoup get_text does not strip all tags and JavaScript

I am trying to use BeautifulSoup to get the text from a web page. Below is the script I wrote for this. It takes two arguments: the first is the input HTML or XML file, the second is the output file.
    import sys
    from bs4 import BeautifulSoup
    def stripTags(s):
        return BeautifulSoup(s).get_text()
    def stripTagsFromFile(inFile, outFile):
        open(outFile, 'w').write(stripTags(open(inFile).read()).encode("utf-8"))
    def main(argv):
        if len(sys.argv)…

BeautifulSoup python html xml screen-scraping
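
For the get_text question above, a common workaround is to delete the `<script>` and `<style>` elements before extracting text. The sketch below uses a hypothetical inline document; it is one possible approach, not necessarily the accepted answer to the original question.

```python
from bs4 import BeautifulSoup

# Hypothetical page with script and style content mixed into the markup.
html = (
    "<html><head><style>p {color: red}</style>"
    "<script>var x = 1;</script></head>"
    "<body><p>Hello</p><script>alert('hi');</script></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# Calling the soup object is shorthand for find_all(); remove each match from the tree.
for tag in soup(["script", "style"]):
    tag.decompose()

text = soup.get_text(separator=" ", strip=True)
print(text)
```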

python - Getting select.value instead of the text with BeautifulSoup

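2002/12 2003/12 2004/12 2005/12 2006/12 2007/12. With this code, I need the value, '0', rather than the text, '2002/12'. I tried many BS4 options: .stripped_strings, .strip(), .contents, get() and so on. How do I get the value instead of the text?
Best answer: You need the value attribute; access tag attributes using mapping syntax: option['value']. Demo:
    >>> from bs4 import BeautifulSoup
    >>> soup = BeautifulSoup('''\
    ...
    ... 2002/12
    ... 2003/1…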

BeautifulSoup python option value html select
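
The mapping syntax from the answer above can be sketched as follows. Only the value '0' for '2002/12' comes from the question; the remaining option values and the surrounding `<select>` markup are assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the pairing of '0' with '2002/12' is from the question.
html = """
<select>
  <option value="0">2002/12</option>
  <option value="1">2003/12</option>
  <option value="2">2004/12</option>
</select>
"""
soup = BeautifulSoup(html, "html.parser")

# Tag attributes are read like dictionary keys.
values = [option["value"] for option in soup.find_all("option")]
labels = [option.get_text() for option in soup.find_all("option")]
print(values)
print(labels)
```

Indexing raises KeyError when the attribute is missing; option.get("value") returns None in that case instead.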

python - Searching inside a tag with BeautifulSoup in Python

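I want to search inside a tag: basically, I want to count the occurrences of <li> in this div. However, when I use BeautifulSoup I cannot get the tags between the div.
    soup = BeautifulSoup(resp)
    tags = soup.find('div', attrs={'class': 'cmePaginiation'})
    print tags
    >>>
Is there a way to count the number of li instances (4 in this example)?
Best answer: Use find_all:
    div = soup.find('div', id='cmeProductSlatePaginiationTop')
    lis = div.find_a…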

BeautifulSoup python cmeProductSlatePaginiationTop html
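
The counting approach named in the answer above can be sketched like this. The id and class names come from the question; the markup itself, including the four list items, is a hypothetical stand-in for the page that was lost from the excerpt.

```python
from bs4 import BeautifulSoup

# Hypothetical markup reusing the id and class names quoted in the question.
html = """
<div id="cmeProductSlatePaginiationTop" class="cmePaginiation">
  <ul>
    <li>1</li>
    <li>2</li>
    <li>3</li>
    <li>4</li>
  </ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

div = soup.find("div", id="cmeProductSlatePaginiationTop")
lis = div.find_all("li")  # searches all descendants of the div
count = len(lis)
print(count)
```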

Python + BeautifulSoup: How to get the 'href' attribute of an 'a' element?

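I have the following: html = '''File One Down''' and I only want to get the text of the href, that is, /file-one/additional. So I did:
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html, 'html.parser')
    link_text = ""
    for a in soup.find_all('a', href=True, text=True):
        link_text = a['href']
    print "Link: " + link_text
But it just prints a blank, nothing. Just Link: . So I tested it on another website, with different HTML, and it worked. What am I doing wrong?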

href python html web-scraping beautifulsoup
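
One likely cause of the empty result above, assuming the anchor wraps nested tags (the original markup was lost from the excerpt), is the text filter: a tag with more than one child has no single .string, so a filter on the string excludes it. The sketch below uses hypothetical markup and the `string` keyword, the current name for the `text` argument, to show the difference.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an anchor whose text lives in nested <span> elements.
html = '<a href="/file-one/additional"><span>File One</span><span>Down</span></a>'
soup = BeautifulSoup(html, "html.parser")

# With the string filter, the anchor is skipped because its .string is None.
filtered = soup.find_all("a", href=True, string=True)

# Filtering on the attribute alone finds the anchor.
links = [a["href"] for a in soup.find_all("a", href=True)]

print(filtered)
print(links)
```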