Robots - 草庐IT

ruby-on-rails - Multiple robots.txt files for subdomains in Rails

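I have a site with several subdomains, and I want the named subdomains to serve a different robots.txt from www. I tried using .htaccess, but FastCGI doesn't read it. So I tried setting up routes, but it seems you can't do a direct rewrite, because every route needs a controller:

    map.connect '/robots.txt', :controller => ?, :path => '/robots.www.txt', :conditions => { :subdomain => 'www' }
    map.connect '/robots.txt', :controller => ?, :path => '/robots.club.txt'

What's the best way to solve this? (For the subdomains, I'm...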
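
The routing problem comes down to a single action that picks a file based on the request's host name. Below is a minimal, framework-neutral sketch of that logic in Python (WSGI). It is not Rails code. The `public/` directory, the `robots.<subdomain>.txt` naming scheme and the fallback to `robots.www.txt` are assumptions that mirror the file names in the question.

```python
from pathlib import Path

# Hypothetical directory containing robots.www.txt, robots.club.txt, ...
ROBOTS_DIR = Path("public")

def robots_filename(host: str) -> str:
    """Map a Host header such as 'club.example.com:8080' to 'robots.club.txt'."""
    hostname = host.split(":", 1)[0].lower()
    labels = hostname.split(".")
    # Naive rule: the first label is the subdomain only if there are 3+ labels.
    subdomain = labels[0] if len(labels) > 2 else "www"
    return f"robots.{subdomain}.txt"

def robots_app(environ, start_response):
    """Minimal WSGI app that serves the subdomain-specific file for /robots.txt."""
    if environ.get("PATH_INFO") != "/robots.txt":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]
    path = ROBOTS_DIR / robots_filename(environ.get("HTTP_HOST", ""))
    if not path.is_file():
        path = ROBOTS_DIR / "robots.www.txt"  # fall back to the www rules
    # An empty robots.txt allows everything, so make sure the fallback file exists.
    body = path.read_bytes() if path.is_file() else b""
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

In Rails, the same idea would be a single route for /robots.txt pointing at one controller action that reads `request.subdomain` and renders the matching file as text/plain. The host rule above is naive: a host such as example.co.uk would be read as subdomain "example" and then fall back to the www file.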

c# - Excluding some routes from Routes.AppendTrailingSlash

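In MVC 5.2.2, I can set Routes.AppendTrailingSlash to true so that a trailing slash is appended to URLs. But I also have a robots controller that returns the content of robots.txt. How can I stop the slash from being appended to the robots.txt route, so that it can be called without a trailing slash? My controller code:

    [Route("robots.txt")]
    public async Task<ActionResult> Robots()
    {
        string robots = getRobotsContent();
        return Content(robots, "text/plain");
    }

My route configuration looks like this:

    routes.IgnoreRou...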
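
Whatever the ASP.NET-specific fix turns out to be, the rule the question asks for can be stated as: add a trailing slash to generated URLs, except for file-like paths such as /robots.txt. Below is a small Python sketch of that rule. It does not use the ASP.NET routing API. The `EXCLUDED_PATHS` set and the "last segment contains a dot" heuristic are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical explicit exceptions (useful for file-like routes without a dot).
EXCLUDED_PATHS = {"/robots.txt", "/sitemap.xml"}

def with_trailing_slash(url: str) -> str:
    """Append '/' to the path unless it already ends in one or looks like a file."""
    parts = urlsplit(url)
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    if not path.endswith("/") and path not in EXCLUDED_PATHS and "." not in last_segment:
        path += "/"
    # Query string and fragment are carried over unchanged.
    return urlunsplit(parts._replace(path=path))
```

For example, "/about" becomes "/about/", while "/robots.txt" is left alone.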

php - Robots.txt and Google Calendar

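I'm looking for the best way to make sure I'm doing this correctly. My site has a calendar from which users can get an iCal feed and import it into their preferred external calendar (Outlook, iCal, Google Calendar, etc.). To stop bad actors from crawling or searching my site for *.ics files, I set up robots.txt to disallow the folder where the feeds are stored. So an iCal feed might look like this: webcal://www.mysite.com/feeds/cal/a9d90309dafda390d09/feed.ics. I know the above is still a public URL. However, I have a feature that lets users change the address of their feed whenever they want. My question is: do all external calendars...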
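
To see what the Disallow rule from the question does and does not cover, Python's standard `urllib.robotparser` can evaluate it. This sketch assumes the rule is `Disallow: /feeds/` under `User-agent: *`, and that the webcal:// feed is fetched over HTTPS at the same path. Keep in mind that robots.txt is a request to well-behaved crawlers, not access control. Whether a given calendar service's fetcher consults it at all is beyond what this check can tell you.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: disallow the folder that holds the feeds.
ROBOTS_TXT = """\
User-agent: *
Disallow: /feeds/
"""

def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honours robots.txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

feed = "https://www.mysite.com/feeds/cal/a9d90309dafda390d09/feed.ics"
print(crawler_may_fetch(ROBOTS_TXT, "Googlebot", feed))
print(crawler_may_fetch(ROBOTS_TXT, "Googlebot", "https://www.mysite.com/calendar"))
```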

php - Checking whether a website has a file (robots.txt, favicon.ico) with PHP

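I want to check whether a remote website has certain files, e.g. robots.txt or favicon.ico. Of course, the file should be accessible (readable). So if the website is http://www.example.com/, I want to check http://www.example.com/robots.txt. I tried fetching URLs like http://www.example.com/robots.txt. Sometimes you can tell whether the file exists because the response header contains a page-not-found error. But some websites handle this error themselves, and all you get is some HTML saying the page could not be found, sent with a 200 status code. So does anyone know how to check whether the file really exists? Thanks, Granite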
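
One common workaround for such "soft 404" pages is to check that a 200 response actually looks like the expected file. Below is a heuristic Python sketch, written in Python rather than PHP/cURL for illustration. The content checks are assumptions: plain text for robots.txt, and the ICO/PNG magic bytes for favicon.ico. They can be fooled, so treat False as "probably missing" rather than proof.

```python
import urllib.request

ICO_MAGIC = b"\x00\x00\x01\x00"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"

def looks_like_real_file(status: int, content_type: str, body: bytes, kind: str) -> bool:
    """Heuristic: does a response look like the requested file rather than a 'soft 404' page?"""
    if status != 200:
        return False
    mime = content_type.split(";", 1)[0].strip().lower()
    if kind == "robots":
        # robots.txt is plain text; an HTML error page usually starts with '<'.
        return mime == "text/plain" or not body.lstrip().startswith(b"<")
    if kind == "favicon":
        # Accept ICO or PNG images by their leading magic bytes.
        return body.startswith(ICO_MAGIC) or body.startswith(PNG_MAGIC)
    raise ValueError(f"unknown kind: {kind!r}")

def remote_file_exists(url: str, kind: str, timeout: float = 10.0) -> bool:
    """Fetch url; HTTP errors (404 etc.), network errors and soft 404s all count as missing."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read(64 * 1024)  # the first 64 KiB is enough for these checks
            return looks_like_real_file(
                response.status, response.headers.get("Content-Type", ""), body, kind
            )
    except OSError:  # URLError, HTTPError and socket timeouts are all OSError subclasses
        return False
```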

Java robots.txt parser with wildcard support

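I'm looking for a robots.txt parser in Java that supports the same pattern-matching rules as Googlebot. I've found some libraries that parse robots.txt files, but none of them supports Googlebot-style pattern matching: Heritrix (there is an open issue on this topic), Crawler4j (which looks like the same implementation as Heritrix) and jrobotx. Does anyone know of a Java library that can do this?

Best answer: Nutch seems to use a combination of crawler-commons and some custom code (see RobotsRulesParser.java).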
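
For reference, the Googlebot-style matching the question asks about works as follows:

- `*` matches any sequence of characters.
- A trailing `$` anchors the pattern to the end of the URL.
- When several rules match, the longest pattern wins, and Allow wins a tie.

The Python sketch below illustrates only those rules. It is not the crawler-commons or Nutch implementation. It assumes the rules for one user-agent group have already been extracted, and it skips percent-encoding normalisation.

```python
import re

def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a Googlebot-style path pattern: '*' = any sequence, trailing '$' = end of URL."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(regex + ("$" if anchored else ""))

def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """rules: (directive, pattern) pairs for one user-agent group, e.g. ("disallow", "/*.php$").

    The longest matching pattern wins; on a tie, allow wins.
    A path that matches no rule is allowed.
    """
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:  # an empty Allow/Disallow value matches nothing
            continue
        if rule_to_regex(pattern).match(path):  # match() anchors at the start of the path
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed
```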

java - robots.txt parser in Java

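I'd like to know how to parse robots.txt in Java. Is there any existing code for this?

Best answer: Heritrix is an open-source web crawler written in Java. Looking through their javadoc, I see they have a utility class, Robotstxt, for parsing robots.txt files.

Original question on Stack Overflow: https://stackoverflow.com/questions/3141031/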
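
To illustrate the first thing a parser such as Heritrix's Robotstxt class has to do, here is a simplified Python sketch that groups Allow/Disallow lines under the User-agent lines that precede them. It does not reproduce Heritrix's API. It ignores Sitemap and any other fields, and it assumes consecutive User-agent lines share one group, which is how RFC 9309 treats them.

```python
def parse_robots(text: str) -> dict[str, list[tuple[str, str]]]:
    """Group Allow/Disallow lines under the User-agent lines that precede them.

    Consecutive User-agent lines share one group; comments (#...) and
    unknown fields are ignored. Agent names are lower-cased.
    """
    groups: dict[str, list[tuple[str, str]]] = {}
    current_agents: list[str] = []
    in_rules = False  # True once the current group has seen a rule line
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                current_agents, in_rules = [], False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow") and current_agents:
            in_rules = True
            for agent in current_agents:
                groups[agent].append((field, value))
    return groups
```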

html - Google shows "A description for this result is not available because of this site's robots.txt"

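I built a web application and host the site with WordPress. When I search for its name on Google, the result says "A description for this result is not available because of this site's robots.txt". Why is this happening? Is something wrong with the meta tags?

Best answer: Your site's robots.txt file disallows crawling of the page you found in Google Search. This means Google's crawler will not visit that page to read its content. The robots.txt file lives at the URL /robots.txt, for example http://example.com/robots.txt. You...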
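
The answer's key point is that robots.txt always sits at the root of the host, whichever page turned up in search. The Python sketch below uses the standard `urllib.robotparser` module. It derives that URL from any page URL and asks whether Googlebot may crawl the page. `googlebot_blocked` needs network access, and both function names are illustrative.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

def robots_url_for(page_url: str) -> str:
    """robots.txt lives at the root of the host: scheme://host[:port]/robots.txt."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

def googlebot_blocked(page_url: str) -> bool:
    """Download the site's live robots.txt and ask whether Googlebot may crawl page_url."""
    parser = RobotFileParser(robots_url_for(page_url))
    parser.read()  # network access; a 401/403 response makes the parser disallow everything
    return not parser.can_fetch("Googlebot", page_url)
```

For example, `robots_url_for("https://example.com/blog/post?x=1")` returns `https://example.com/robots.txt`.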

wordpress - How do I stop x-robots-tag from setting noindex across my whole site?

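I have an up-to-date WordPress site running WooCommerce and Yoast, and every page is sent with the following noindex response header:

    x-robots-tag: noindex, nofollow, nosnippet, noarchive

I'm not sure where it's coming from. The only references to it I can find are in wp-admin/admin-ajax, some WooCommerce plugin files, some Yoast files and one wp-includes file; I don't think any of that is unusual. Cloudflare is enabled, and as far as I know it could somehow be causing this, but pausing it doesn't seem to make any difference. Yoast is configured the same way as on many other sites. I'm struggling to understand what is causing or controlling this, or even what, when and...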
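
One way to narrow down which layer adds the header is to compare the X-Robots-Tag values returned for different kinds of URLs. Below is a small Python diagnostic sketch. It sends a HEAD request and checks the values for noindex (or "none", which Google treats as noindex plus nofollow). The function names are illustrative.

```python
import re
import urllib.request

def has_noindex(header_values: list[str]) -> bool:
    """True if any X-Robots-Tag value contains 'noindex' or 'none' (noindex + nofollow)."""
    for value in header_values:
        # Split on commas, colons and whitespace so 'googlebot: noindex' is handled too.
        tokens = re.split(r"[\s,:]+", value.lower())
        if "noindex" in tokens or "none" in tokens:
            return True
    return False

def x_robots_tags(url: str, timeout: float = 10.0) -> list[str]:
    """Send a HEAD request and return all X-Robots-Tag header values (empty if none)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get_all("X-Robots-Tag") or []
```

Try calling `x_robots_tags` on a normal page, on /wp-admin/admin-ajax.php and on a static file such as an image under wp-content/uploads, then compare the results. Static files are typically served without going through PHP. If they also carry the header, server or CDN configuration is a more likely source than a plugin.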

seo - Robots.txt error in Google Search Console

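When I submit my site's robots.txt to Google Search Console, it shows up as an error, as in the screenshot below.

Best answer: Just upload the robots.txt file to the root directory. Go to yourdomain.com/robots.txt to check it yourself. If it works, then it... works! Google may take a while to update the status in Search Console. Sometimes you just need to take your eyes off Search Console ;-)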
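
Besides confirming that the file is reachable at yourdomain.com/robots.txt, it can help to catch simple syntax mistakes before uploading. The hypothetical Python lint below is not Google's validator. It flags:

- lines without a colon;
- field names outside a small assumed set;
- Allow/Disallow rules that appear before any User-agent line.

```python
# Assumed set of fields this lint accepts; Google itself ignores some of them (e.g. crawl-delay).
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}

def lint_robots(text: str) -> list[str]:
    """Return human-readable problems found in a robots.txt body (empty list if none)."""
    problems: list[str] = []
    seen_agent = False
    for n, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append(f"line {n}: missing ':' separator")
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS:
            problems.append(f"line {n}: unknown field {field!r}")
        elif field in ("allow", "disallow") and not seen_agent:
            problems.append(f"line {n}: {field} before any User-agent line")
        if field == "user-agent":
            seen_agent = True
    return problems
```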

php - How to use robots.txt to disallow certain controllers in CodeIgniter

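I'm quite new to robots.txt. I've researched it for hours and tried to implement it. I have two controllers, login and view. All I want is for Google Search to list only my view controller and not the login controller. But right now, when I search for my site on Google, login appears before view. How can I use robots.txt to remove login from Google?

Best answer: You can find a lot of information about robots.txt here: http://www.robotstxt.org/robotstxt.html

A simple example:

    User-agent: *
    Disallow: /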
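
Applied to the question's login and view controllers, a rule that blocks only login might look like the sketch below, checked with Python's `urllib.robotparser`. The paths are assumptions. CodeIgniter URLs may or may not include index.php depending on the site's rewrite setup. Because robots.txt rules are prefix matches, `Disallow: /login` does not cover `/index.php/login`, so both forms are listed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the login controller, with and without index.php.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login
Disallow: /index.php/login
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ("/login", "/login/forgot", "/index.php/login", "/view", "/index.php/view/42"):
    allowed = parser.can_fetch("Googlebot", "https://example.com" + path)
    print(f"{path:20} {'allowed' if allowed else 'blocked'}")
```

Prefix matching also means `/login` would block `/loginhelp`. Note, too, that robots.txt controls crawling rather than indexing. A blocked URL can still be listed without a description, as in the Google question above. Google's documentation recommends a noindex meta tag or header on a crawlable page to keep it out of results.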
