robot_hunt_maze

seo - robots.txt: disallow all but a select few, why not?

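Closed as off-topic on Stack Overflow 10 years ago. I have been thinking about disallowing every crawler except Ask, Google, Microsoft, and Yahoo! from my site. The reasoning is that I have never seen any traffic generated by any other web crawler. My questions are: Is there any reason not to do this? Has anybody done it? Did you notice any negative effects? Update: Until now I have used the blacklist approach: if I do not like a crawler, I add it to the disallow list. However, I am no fan of blacklisting, because it is a never-ending story: there are always more crawlers out there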
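The allowlist approach the question describes can be sketched in Python. The helper name build_allowlist_robots and the user-agent tokens (Googlebot, bingbot, Slurp, Teoma) are assumptions for illustration and do not come from the question; each crawler's own documentation is the place to confirm its current token. The check at the end uses the standard library's urllib.robotparser, whose matching may differ from what a given crawler actually does.

```python
from urllib.robotparser import RobotFileParser


def build_allowlist_robots(allowed_agents):
    """Build a robots.txt that admits only the listed crawlers."""
    lines = []
    for agent in allowed_agents:
        # An empty Disallow value means "nothing is disallowed" for this agent.
        lines += [f"User-agent: {agent}", "Disallow:", ""]
    # Every crawler without its own record falls back to this one.
    lines += ["User-agent: *", "Disallow: /"]
    return "\n".join(lines) + "\n"


# Assumed tokens, for illustration only.
robots_txt = build_allowlist_robots(["Googlebot", "bingbot", "Slurp", "Teoma"])

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("Googlebot", "/any/page"))     # True
print(parser.can_fetch("SomeOtherBot", "/any/page"))  # False
```

This only affects crawlers that choose to honor robots.txt; it is a request, not an access control.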

robots seo crawler stackoverflow web-crawler robots.txt

seo - Asterisk in robots.txt

Closed as off-topic on Stack Overflow 10 years ago. Wondering if the following will work for Google in robots.txt: Disallow: /*.action I need to exclude all URLs ending with .action. Is this correct?
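Whether a rule like Disallow: /*.action works depends on the crawler. The original robots.txt convention only does prefix matching, while some crawlers (Google among them) document * as "any sequence of characters" and a trailing $ as "end of URL". A rough sketch of that extended matching, assuming exactly those two semantics; the helper names rule_to_regex and rule_matches and the sample paths are hypothetical:

```python
import re


def rule_to_regex(rule_path):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics (wildcard extension, not the original convention):
    '*' matches any run of characters, a trailing '$' anchors the end.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape the literal pieces and join them with '.*' where a '*' stood.
    body = ".*".join(re.escape(piece) for piece in rule_path.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(rule_path, url_path):
    # re.match anchors at the start, so an unanchored rule acts as a prefix pattern.
    return rule_to_regex(rule_path).match(url_path) is not None


print(rule_matches("/*.action", "/shop/save.action"))        # True
print(rule_matches("/*.action", "/shop/save.action?id=1"))   # True
print(rule_matches("/*.action$", "/shop/save.action?id=1"))  # False
print(rule_matches("/*.action", "/shop/index.html"))         # False
```

Without the trailing $, the rule also matches URLs where .action is followed by a query string, which may or may not be what is wanted.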

robots seo stackoverflow robots.txt

seo - robots.txt: allow all except a few subdirectories

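I want my site to be indexed by search engines, except for a few subdirectories. These are my robots.txt settings: robots.txt in the root directory: User-agent: * Allow: / Separate robots.txt in each subdirectory (to be excluded): User-agent: * Disallow: / Is this the right way, or will the root directory rule override the subdirectory rule? Best answer: No, this is wrong. You can't have a robots.txt in a subdirectory. Your robots.txt must be placed in the document root of your host. If you want to disallow crawling of URLs whose paths start with /foo, use this record in your robots.txt (h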
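The point about the document root can be illustrated with a small Python sketch: for any page URL, a crawler looks for robots.txt at the root of the same scheme and host, so a copy inside a subdirectory is never consulted. The helper name robots_url and the example URL are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the only robots.txt location that applies to page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); drop path, query, fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.example.com/foo/bar/page.html?x=1"))
# https://www.example.com/robots.txt
```

Rules for a subdirectory therefore have to live in that single root file, expressed as path prefixes.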

subdirectory seo search-engine cpanel robots.txt shared-hosting

seo - Multiple user agents in robots.txt

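In my robots.txt file I have the following sections: User-Agent: Bot1 Disallow: /A User-Agent: Bot2 Disallow: /B User-Agent: * Disallow: /C Will the statement Disallow: c be visible to Bot1 and Bot2? Best answer: tl;dr: No, Bot1 and Bot2 will happily crawl paths starting with C. Each bot only ever complies with at most a single record (block). Original specification: In the original specification it says: If the value is '*', the record describes the default acces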
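The "single record per bot" behavior can be checked with Python's standard-library urllib.robotparser, which picks the first record whose User-agent matches and falls back to the * record only when none does. This is a sketch using the three records from the question; the blank lines between records, the sample paths, and the OtherBot name are assumptions added for illustration.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-Agent: Bot1
Disallow: /A

User-Agent: Bot2
Disallow: /B

User-Agent: *
Disallow: /C
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Bot1 has its own record, so the '*' record (Disallow: /C) does not apply to it.
print(parser.can_fetch("Bot1", "/C/page"))      # True
print(parser.can_fetch("Bot1", "/A/page"))      # False
# A bot without its own record falls back to the '*' record.
print(parser.can_fetch("OtherBot", "/C/page"))  # False
print(parser.can_fetch("OtherBot", "/A/page"))  # True
```

To block /C for Bot1 and Bot2 as well, the Disallow: /C line would have to be repeated inside each of their records.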

robots seo robots.txt

seo - Robots.txt: Is this wildcard rule valid?

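Simple question. I want to add: Disallow */*details-print/ Basically, a blocking rule for paths of the form /foo/bar/dynamic-details-print, where foo and bar in this example can also be totally dynamic. I thought this would be simple, but then on www.robotstxt.org there is this message: Note also that globbing and regular expression are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifi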
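The quoted passage describes plain prefix matching with no globbing. Python's standard-library urllib.robotparser matches rules this way, so it can serve as a quick check of how a crawler without wildcard support would read such a rule. The rule below uses a leading slash (an assumed variant of the rule in the question), and ExampleBot is a hypothetical crawler name.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /*details-print/",
])

# No globbing: '*' is compared as a literal character, so this path is not blocked.
dynamic_ok = parser.can_fetch("ExampleBot", "/foo/bar/dynamic-details-print/")

# Only a path that literally starts with "/*details-print/" is matched by the rule.
literal_ok = parser.can_fetch("ExampleBot", "/*details-print/page")

print(dynamic_ok, literal_ok)
```

Crawlers that do implement wildcard extensions would treat the same line differently, so the effect of a wildcard rule has to be verified per crawler.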

robots seo robots.txt

seo - Where to put the robots.txt file?

Closed as off-topic on Stack Overflow 10 years ago. Where should robots.txt be placed? domainname.com/robots.txt or domainname/public_html/robots.txt I put the file at domainname.com/robots.txt, but it does not open when I enter that address in the browser. Screenshot: http://shup.com/Shup/358900/11056202047-My-Desktop.png

robots seo web-hosting robots.txt

seo - How to configure robots.txt to block all but 2 directories

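I don't want any search engine to index most of my website. I do, however, want search engines to index 2 folders (and their subfolders). This is what I set up, but I don't think it works; I see pages in Google that I wanted to hide. This is my robots.txt: User-agent: * Allow: /archive/ Allow: /lsic/ User-agent: * Disallow: / What is the correct way to disallow all folders except 2? Best answer: I gave a tutorial about this on this forum here, and in Wikipedia here. Basically, the first matching robots.txt pattern always wins: User-agent: * Allow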
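The "first matching pattern wins" rule can be tried out with Python's standard-library urllib.robotparser, which evaluates the lines of a record in order. The sketch below assumes a single User-agent: * record holding both Allow lines and the Disallow line (the answer's own record is cut off in the excerpt, so this layout is a guess), and compares it with the reversed order. ExampleBot and the sample paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def make_parser(robots_txt):
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


ALLOW_FIRST = """\
User-agent: *
Allow: /archive/
Allow: /lsic/
Disallow: /
"""

DISALLOW_FIRST = """\
User-agent: *
Disallow: /
Allow: /archive/
Allow: /lsic/
"""

allow_first = make_parser(ALLOW_FIRST)
disallow_first = make_parser(DISALLOW_FIRST)

print(allow_first.can_fetch("ExampleBot", "/archive/2010/post.html"))     # True
print(allow_first.can_fetch("ExampleBot", "/private/page.html"))          # False
# Under first-match evaluation, a leading "Disallow: /" shadows the Allow lines.
print(disallow_first.can_fetch("ExampleBot", "/archive/2010/post.html"))  # False
```

Crawlers that rank rules by specificity instead of order (Google documents this behavior) would accept either ordering, so putting the Allow lines first is the layout that works under both interpretations.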

robots seo robots.txt google-search

seo - Can I use the "Host" directive in robots.txt?

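While searching for specific information on robots.txt, I stumbled upon a Yandex help page‡ on this topic. It suggests that I could use the Host directive to tell crawlers my preferred mirror domain: User-Agent: * Disallow: /dir/ Host: www.example.com Also, the Wikipedia article states that Google understands the Host directive too, but there wasn't much (i.e. none) information. At robotstxt.org, I didn't find anything on Host (or on Crawl-delay, as stated on Wikipedia). Is it encouraged to use the Host directive at all? Are there any resources from Google on this robots.txt specific? How is compatibility with other crawlers? ‡ At least since the beginning of 2021, the linked entry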

robots seo robots.txt

asp.net - Robots.txt file in MVC.NET 4

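I have read an article about keeping robots away from some URLs in my ASP MVC.NET project. In the article, the author says we should add an action to one of the controllers like this. In this example he adds the action to the Home controller: #region -- Robots() Method -- public ActionResult Robots() { Response.ContentType = "text/plain"; return View(); } #endregion Then we should add a Robots.cshtml file to our project with this body: @{ Layout = null; } # robots.txt for @this.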

robots asp.net asp.net-mvc-4 seo robots.txt

seo - How to allow crawlers to access only index.php, using robots.txt?

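If I want to allow crawlers to access only index.php, will this work? User-agent: * Disallow: / Allow: /index.php Best answer: Yes, it will work. Here is the test result from Google Webmaster Tool. Url: http://www.example.org/index.php Googlebot: Allowed by line 3: Allow: /index.php Googlebot-Mobile: Allowed by line 3: Allow: /index.php However, keep in mind that with this configuration your website's home page will not be crawled unless it is accessed using the fully qualified path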
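The "Allowed by line 3" result is consistent with most-specific-rule precedence, where the longest matching path wins regardless of line order; a strict first-match parser (Python's urllib.robotparser is one) would instead stop at Disallow: /. Below is a minimal sketch of longest-match evaluation. It is not any crawler's actual implementation; the function is_allowed and the (allow, prefix) rule format are assumptions made for illustration.

```python
def is_allowed(rules, path):
    """Return True if `path` may be crawled under longest-match precedence.

    `rules` is a list of (allow, prefix) pairs taken from one user-agent record.
    """
    best_allow, best_len = True, -1  # no matching rule means "allowed"
    for allow, prefix in rules:
        if not path.startswith(prefix):
            continue
        # The longer prefix wins; on a tie, Allow wins over Disallow.
        if len(prefix) > best_len or (len(prefix) == best_len and allow):
            best_allow, best_len = allow, len(prefix)
    return best_allow


# Disallow: /  followed by  Allow: /index.php
RULES = [(False, "/"), (True, "/index.php")]

print(is_allowed(RULES, "/index.php"))   # True
print(is_allowed(RULES, "/"))            # False
print(is_allowed(RULES, "/about.html"))  # False
```

The second result mirrors the caveat in the answer: the bare home page URL / stays blocked, and only the explicit /index.php path is crawlable.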

crawler robots index seo web-crawler robots.txt
