SgmlLinkExtractor

python - Difference between LinkExtractor and SgmlLinkExtractor

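I'm new to the Scrapy framework. I've seen some tutorials that use LinkExtractors and some that use SgmlLinkExtractor. I tried to look up the differences and the pros and cons of the two, but the results were not satisfactory. Can anyone tell me the difference between them? When should we use each of these extractors? Thanks! Best answer: The reason you can't find references to SgmlLinkExtractor is that it is now deprecated (see the relevant changeset). You can find the SgmlLinkExtractor definition here, in the Scrapy 0.24 documentation. Also, you should not use SgmlLinkExtractor anymore: Scrapy now keeps only one link extractor, L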

python - Scrapy SgmlLinkExtractor problem

I'm trying to get SgmlLinkExtractor to work. This is the signature: SgmlLinkExtractor(allow=(), deny=(), allow_domains=(), deny_domains=(), restrict_xpaths=(), tags=('a', 'area'), attrs=('href'), canonicalize=True, unique=True, process_value=None) I'm only using allow=(), so I entered rules = (Rule(SgmlLinkExtractor(allow=("/aadler/",)), callback='parse'),) So, the initial ur
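To make the role of `allow`, `tags`, `attrs` and `unique` in that signature more concrete, here is a rough standard-library sketch of the idea. It is not Scrapy's implementation: `AnchorCollector` and `extract_allowed` are hypothetical helper names, and the sketch assumes that an `allow` pattern is applied as a regex search against the absolute URL.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> and <area> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Mirrors the defaults tags=('a', 'area') and attrs=('href') from the signature.
        if tag in ("a", "area"):
            for name, value in attrs:
                if name == "href" and value:
                    self.urls.append(urljoin(self.base_url, value))


def extract_allowed(html, base_url, allow=()):
    """Return unique absolute URLs matching any `allow` regex (all URLs if `allow` is empty)."""
    collector = AnchorCollector(base_url)
    collector.feed(html)
    patterns = [re.compile(p) for p in allow]
    seen, result = set(), []
    for url in collector.urls:
        if patterns and not any(p.search(url) for p in patterns):
            continue  # URL does not match any allowed pattern
        if url not in seen:  # rough equivalent of unique=True
            seen.add(url)
            result.append(url)
    return result


# Hypothetical markup for illustration only.
sample = (
    '<a href="/aadler/x">one</a>'
    '<a href="/bob/">two</a>'
    '<a href="/aadler/x">duplicate</a>'
)
print(extract_allowed(sample, "http://example.com/", allow=("/aadler/",)))
```

Separately from the extractor itself, the Scrapy documentation warns against using `parse` as a rule callback in a `CrawlSpider`, because `CrawlSpider` uses `parse` internally to implement its rule logic; a differently named callback such as `parse_item` avoids that conflict.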


python - Scrapy SgmlLinkExtractor ignores allowed links

Please see this spider example in the Scrapy documentation. The explanation is: "This spider would start crawling example.com's home page, collecting category links, and item links, parsing the latter with the parse_item method. For each item response, some data will be extracted from the HTML using XPath, and an Item will be filled with it." I copied the same spider exactly and replaced "example.com" with another initial URL. from
