BeautifulSoup4

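python - Correctly scraping and displaying Japanese characters with Python, Django, BeautifulSoup, and Curl

I am trying to scrape a Japanese-language page using Python, curl, and BeautifulSoup. I then save the text to a MySQL database that uses UTF-8 encoding and display the resulting data with Django. Here is a sample URL: https://www.cisco.apply2jobs.com/ProfExt/index.cfm?fuseaction=mExternal.showJob&RID=930026&CurrentPage=180 I have a function that fetches the HTML as a string: def get_html(url): c = Curl() storage = StringIO() c.setopt(c.URL, str(url)) cookie_file...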

Japanese BeautifulSoup xe3 xe section python django utf-8 iso-8859-1
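A common cause of garbled Japanese text in a pipeline like the one described above is decoding the response bytes with the wrong charset before they reach the parser. The following is a minimal sketch, not the asker's code: it assumes the page is served as UTF-8 (the excerpt does not say which encoding the site uses), and `parse_page` and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup


def parse_page(raw_bytes, declared_encoding=None):
    """Parse undecoded response bytes (hypothetical helper for illustration).

    Passing bytes rather than an already-decoded str lets BeautifulSoup
    work out the charset itself, or use declared_encoding when supplied.
    """
    return BeautifulSoup(raw_bytes, "html.parser", from_encoding=declared_encoding)


# Hypothetical sample standing in for the fetched page body.
raw = "<html><body><p>日本語のページ</p></body></html>".encode("utf-8")

soup = parse_page(raw, declared_encoding="utf-8")
text = soup.p.get_text()  # a Python str, ready to store in a UTF-8 column

# What goes wrong when the same UTF-8 bytes are decoded as ISO-8859-1:
mojibake = raw.decode("iso-8859-1")
print(text)
print(mojibake)
```

If text has already been mis-decoded this way, re-encoding it as ISO-8859-1 and decoding as UTF-8 recovers the original, but fixing the decode step at the source is usually the cleaner repair.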

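python - Parsing XML with beautifulsoup4: namespace issue

While parsing the contents of a .docx file as XML (word/document.xml) with beautifulsoup4 (with lxml installed, as required), I ran into a problem. This part of the XML: ...... becomes this: ...... even when I only parse the file and save it, without any modification. Like this: from bs4 import BeautifulSoup soup = BeautifulSoup(open(filepath_in), 'xml') with open(filepath_out, "w+") as fd: fd.write(str(soup)) The same happens when parsing the XML from the Python console. To me it looks like the namespaces are declared like this, rather than in the root document node...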

beautifulsoup4 beautifulsoup name namespace python xml xml-parsing docx

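python - BeautifulSoup, but for CSS?

BeautifulSoup parses HTML and provides many ways to manipulate and search within it. Is there something similar for CSS? Specifically, I want to know whether a given piece of HTML text is rendered in bold. Either it has an ancestor that is a <b> or <strong> tag (which can be done with BeautifulSoup), or it has an ancestor (or is itself an element) with the CSS property font-weight: bold. Is this possible without writing my own library? Best answer: Take a look at the CSSParser class in the cssutils package.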

BeautifulSoup python section strong code css
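The ancestor check described in the question above can be sketched with BeautifulSoup alone, under a deliberately narrow assumption: only `<b>`/`<strong>` ancestors and inline `style` attributes are inspected. Rules from stylesheets (the case where a CSS parser such as cssutils would be needed) and numeric weights like `700` are not handled. The helper names and sample markup are hypothetical.

```python
from bs4 import BeautifulSoup, Tag

BOLD_TAGS = {"b", "strong"}


def inline_style_is_bold(tag):
    """Return True if the tag's inline style declares font-weight: bold."""
    style = tag.get("style", "") or ""
    for declaration in style.split(";"):
        name, _, value = declaration.partition(":")
        if name.strip().lower() == "font-weight" and value.strip().lower() == "bold":
            return True
    return False


def is_bold(node):
    """Walk from the node up through its ancestors looking for bold markers."""
    current = node if isinstance(node, Tag) else node.parent
    while current is not None:
        if current.name in BOLD_TAGS or inline_style_is_bold(current):
            return True
        current = current.parent
    return False


# Hypothetical sample markup.
html = (
    '<div style="font-weight: bold"><p>heavy <i>text</i></p></div>'
    "<p>plain <strong>strong</strong></p>"
)
soup = BeautifulSoup(html, "html.parser")

bold_via_style = is_bold(soup.i)                 # ancestor div has inline bold
bold_via_tag = is_bold(soup.strong.string)       # text node inside <strong>
plain = is_bold(soup.find_all("p")[1])           # second paragraph itself
```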

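python - Installing BeautifulSoup on Mac OS X

I have tried everything here: How can I install the BeautifulSoup module on the Mac? Both the traditional installation method and easy_install appear to work (I get the correct output during installation), but when I run: from bs4 import BeautifulSoup the interpreter says that no such module exists. What should I look at first to troubleshoot this? Best answer: To see all installed packages, you can run the following command in the interpreter: >>> help('modules') This lists all installed modules. Look for bs4 in the list (it appears to be in alphabetical order). Another option is to issue at your prompt: $p...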

BeautifulSoup python code section install
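As a complement to `help('modules')` in the answer above, a short standard-library check can show which interpreter is running and whether it can see a given module. This is a sketch; `locate_module` is a hypothetical helper name. A mismatch between the interpreter printed here and the one the installer targeted is one plausible reason for an "installed but not importable" package.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file a module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


print("interpreter:", sys.executable)
print("bs4 found at:", locate_module("bs4"))
```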

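python - find_all with camelCase tag names in BeautifulSoup 4

I am trying to scrape an XML file with BeautifulSoup 4.4.0. Its tag names are in camelCase, and find_all does not seem to be able to find them. Sample code: from bs4 import BeautifulSoup xml = """<hello>world</hello>""" soup = BeautifulSoup(xml, "lxml") for x in soup.find_all("hello"): print x xml2 = """<helloWorld>:-)</helloWorld>""" soup = BeautifulSoup(xml2, "lxml") for x in soup.find_all("helloWorld"): print x The output I get is: $ python soup_test.py worl...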

BeautifulSoup camelCase code python
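The behavior in the question above is consistent with how HTML parsers work: tag names are case-insensitive in HTML, so they are lowercased during parsing and a camelCase search term no longer matches. The sketch below demonstrates this with the built-in `html.parser` instead of the `lxml` parser used in the question (an assumption made to avoid the extra dependency); the sample string is hypothetical.

```python
from bs4 import BeautifulSoup

xml2 = "<helloWorld>:-)</helloWorld>"  # hypothetical sample

# An HTML parser lowercases tag names while building the tree.
soup = BeautifulSoup(xml2, "html.parser")

exact = soup.find_all("helloWorld")    # camelCase name no longer exists
lowered = soup.find_all("helloworld")  # the lowercased name does

print(len(exact), len(lowered))
```

When the input really is XML, passing `"xml"` as the parser (which requires lxml) keeps tag names case-sensitive, so the camelCase lookup can match.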

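python - Getting text with BeautifulSoup CSS selectors

Sample HTML: ABC123abc I can get the number with something like: soup.select('#name > span.numbers')[0].text How do I get the text ABC using BeautifulSoup and the select function? And what about this case? 123ABC Best answer: In the first case, get the previous sibling: soup.select_one('#name > span.numbers').previous_sibling In the second case, get the next sibling: soup.select_one('#name > #numbers').next_sib...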

BeautifulSoup python code numbers section python-2.7 css-selectors html-parsing
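The sibling approach from the answer above can be sketched as follows. The markup in the excerpt was lost, so the HTML here is a hypothetical reconstruction that merely fits the selector `#name > span.numbers`; it is not the asker's actual document.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: loose text on either side of the selected span.
html = '<h2 id="name">ABC<span class="numbers">123</span>abc</h2>'
soup = BeautifulSoup(html, "html.parser")

number = soup.select_one("#name > span.numbers")

# Text nodes next to an element are reachable as its siblings.
before = number.previous_sibling
after = number.next_sibling

print(number.get_text(), str(before), str(after))
```

CSS selectors match elements, not bare text nodes, which is why the text is reached by navigating from a selected element.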

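python - Beautifulsoup - What is the equivalent of '*' in find_all?

I am trying to get all the tr rows from a page. The attrs are different each time, and some of the siblings have classes such as colourred, colourpink, and so on. So I am looking for a way to include in the results any class that has other characters after colourblue. I tried using *, but it did not work: soup.find_all('tr', {'class': 'colourblue*'}) Thanks. Best answer: You can use ordinary CSS selectors with Beautiful Soup: >>> soup = BeautifulSoup('''..................''') >>> soup.select('tr.col...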

Beautifulsoup code colour attr python
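`find_all` does not interpret `*` as a wildcard, but a prefix match can be expressed in two other ways. The sketch below assumes class names that literally begin with `colourblue`, as written in the excerpt; the table markup is hypothetical, and the answer's own selector is truncated in the source, so this is not necessarily what it went on to show.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical rows; only the first two should match.
html = (
    "<table>"
    '<tr class="colourblue1"><td>a</td></tr>'
    '<tr class="colourblue-x"><td>b</td></tr>'
    '<tr class="colourred"><td>c</td></tr>'
    "</table>"
)
soup = BeautifulSoup(html, "html.parser")

# Option 1: a compiled regex is tested against each class on the tag.
rows_regex = soup.find_all("tr", class_=re.compile(r"^colourblue"))

# Option 2: a CSS attribute selector matching the start of the class attribute.
rows_css = soup.select('tr[class^="colourblue"]')

print([row.get_text() for row in rows_regex])
print([row.get_text() for row in rows_css])
```

Note that `[class^=...]` tests the start of the whole attribute string, so it only matches when the class of interest is listed first.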

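python - Getting a value after clicking a button with BeautifulSoup in Python

I am trying to get the value a website produces after a button is clicked. This is the site: https://www.4devs.com.br/gerador_de_cpf There is a button called "Gerar CPF", and it produces a number that appears after the click. My current script opens the browser and reads the value, but it reads the page before the click, so the value is empty. I would like to know whether it is possible to get the value after clicking the button. from selenium import webdriver from bs4 import BeautifulSoup from requests import get url = "https://www.4devs.com.br/gerador_de_cpf" def open_browser(): ...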

BeautifulSoup python cpf code driver selenium web-scraping web-crawler

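python - Extracting the sibling nodes between two nodes with BeautifulSoup

I have a document like this: I don't want this I want this and all that stuff too But not this and nothing after it I want to extract everything between the p[class=top] and p[class=end] paragraphs. Is there a good way to do this with BeautifulSoup? Best answer: The node.nextSibling attribute is your solution: from BeautifulSoup import BeautifulSoup soup = BeautifulSoup(html) nextNode = soup.find('p', {'class':...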

BeautifulSoup python section
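The answer above uses the BeautifulSoup 3 import and `nextSibling`. A comparable sketch for bs4, where the attribute is `next_sibling` and the iterator is `next_siblings`, might look like this. The markup is a hypothetical reconstruction from the text in the question, and `siblings_between` is an illustrative helper name.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical document reconstructed from the question's text.
html = (
    '<p class="top">I don\'t want this</p>'
    "<p>I want this</p>"
    "<table><tr><td>and all that stuff too</td></tr></table>"
    '<p class="end">But not this</p>'
    "<p>and nothing after it</p>"
)
soup = BeautifulSoup(html, "html.parser")


def siblings_between(start, end_class):
    """Collect the siblings after start, stopping at the first tag with end_class."""
    collected = []
    for sibling in start.next_siblings:
        if isinstance(sibling, Tag) and end_class in sibling.get("class", []):
            break
        collected.append(sibling)
    return collected


start = soup.find("p", class_="top")
nodes = siblings_between(start, "end")
texts = [node.get_text() for node in nodes if isinstance(node, Tag)]
print(texts)
```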

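python - Removing script and style tags from HTML with BeautifulSoup?

I have a simple script in which I fetch an HTML page and pass it to BeautifulSoup to strip all script and style tags, and then I want to pass the resulting HTML to another method. Is there a shortcut? I looked through BeautifulSoup.py and did not see one. soup = BeautifulSoup(html) for script in soup("script"): soup.script.extract() for style in soup("style"): soup.style.extract() contents = soup.html.contents text = loader.extract_text(contents) contents = so...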

BeautifulSoup remove section soup contents python html-parsing python-2.6
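The two loops in the question above can be collapsed into one, and each matched tag can be removed through the loop variable rather than by looking up `soup.script` again on every pass. A minimal sketch for bs4, with hypothetical sample markup:

```python
from bs4 import BeautifulSoup

# Hypothetical page with script and style tags in both head and body.
html = (
    "<html><head><style>p {color: red}</style>"
    "<script>var x = 1;</script></head>"
    "<body><p>Hello</p><script>alert('hi')</script></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# Calling the soup with a list of names finds every matching tag.
for tag in soup(["script", "style"]):
    tag.decompose()  # remove the tag and its contents from the tree

cleaned = str(soup)
print(cleaned)
```

`decompose()` discards the tag entirely; `extract()` does the same removal but returns the tag, which is useful only if the removed element is needed afterwards.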