robots.txt online generator

  • Default policy: applies to all robots.
  • Search interval: the time between crawler visits.
  • Sitemap: leave blank for none.
  • Common search robots: Google (googlebot), Baidu (baiduspider), MSN Search (msnbot), Yahoo (yahoo-slurp), Ask/Teoma (teoma), Cuil (twiceler), GigaBlast (gigabot), Scrub The Web (scrubby), DMOZ Checker (robozilla), Nutch (nutch), Alexa/Wayback (ia_archiver), Naver (naverbot, yeti).
  • Special search robots: Google Image (googlebot-image), Google Mobile (googlebot-mobile), Yahoo MM (yahoo-mmcrawler), MSN PicSearch (psbot), SingingFish (asterias), Yahoo Blogs (yahoo-blogs/v3.9).
  • Restricted directories: paths are relative, but each path must include "/".

What is a robots.txt file?

  • robots.txt (all lowercase) is a text file located in a website's root directory. It tells web crawlers (also known as web spiders) which content on the site should not be crawled and which content may be accessed.
  • Because URLs are case-sensitive on some systems, the robots.txt filename should be all lowercase. The file must be placed in the root directory of the website.
  • If you want to control how a search engine's crawler accesses subdirectories, you can merge the custom rules into the root directory's robots.txt, or use robots meta tags.
  • The robots.txt protocol began as a convention rather than a formal standard (it was later published as RFC 9309), and compliance is voluntary, so it cannot guarantee website privacy. Note that robots.txt uses string comparison to decide whether a URL may be accessed, so a directory path with a trailing slash "/" and the same path without it are treated as different URLs. Many crawlers also accept wildcards in robots.txt, such as "Disallow: *.gif".
  • The Robots Exclusion Protocol is a moral code in the international Internet community, established based on the following principles: 1. Search technology should serve humanity, respect the intentions of information providers, and protect their privacy; 2. Websites have the obligation to protect their users' personal information and privacy from being violated.
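
The effect of the trailing slash can be checked with Python's standard-library urllib.robotparser. This is a minimal sketch, not a reference implementation: the helper name is_allowed and the example.com URLs are assumptions made for illustration. The standard-library parser matches rules by plain path prefix and, as far as this sketch relies on it, does not interpret wildcards.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Two rule sets that differ only in the trailing slash (hypothetical paths).
WITH_SLASH = "User-agent: *\nDisallow: /private/\n"
WITHOUT_SLASH = "User-agent: *\nDisallow: /private\n"

if __name__ == "__main__":
    for url in (
        "https://example.com/private/a.html",
        "https://example.com/private-notes.html",
    ):
        print(url)
        print("  Disallow: /private/ ->", is_allowed(WITH_SLASH, url))
        print("  Disallow: /private  ->", is_allowed(WITHOUT_SLASH, url))
```

With the trailing slash, only URLs inside the directory are blocked. Without it, any path that merely begins with "/private" is blocked as well.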

robots.txt file content

  • Whether search engine spiders may access or crawl the site.
  • Which directories or files search engine spiders may access.
  • The path to the website's sitemap.
  • The minimum time interval between crawls by search engine spiders.
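
The four kinds of content listed above map onto a handful of directives. The sketch below assembles them into robots.txt text; it is only an illustration and not the code behind this generator. The function build_robots_txt, its parameters, and the sample values are all hypothetical. Crawl-delay is a widely seen extension that not every crawler honors.

```python
from typing import Optional, Sequence


def build_robots_txt(
    allow_all: bool = True,
    restricted_dirs: Sequence[str] = (),
    crawl_delay: Optional[int] = None,
    blocked_agents: Sequence[str] = (),
    sitemap: str = "",
) -> str:
    """Assemble robots.txt text from a few generator-style settings."""
    lines = ["User-agent: *"]
    if not allow_all:
        lines.append("Disallow: /")       # refuse every path
    elif restricted_dirs:
        for path in restricted_dirs:
            if not path.startswith("/"):
                raise ValueError(f"path must start with '/': {path!r}")
            lines.append(f"Disallow: {path}")
    else:
        lines.append("Disallow:")         # empty value: nothing is blocked
    if crawl_delay is not None:
        lines.append(f"Crawl-delay: {crawl_delay}")
    for agent in blocked_agents:          # one group per refused robot
        lines += ["", f"User-agent: {agent}", "Disallow: /"]
    if sitemap:
        lines += ["", f"Sitemap: {sitemap}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(
        restricted_dirs=["/cgi-bin/", "/tmp/"],
        crawl_delay=10,
        blocked_agents=["baiduspider"],
        sitemap="https://example.com/sitemap.xml",
    ))
```

Each blocked robot gets its own group so that its rules do not alter the default group for all other robots.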

About the robots.txt file generator

  • Configure the settings in the web interface and click the generate button. The robots.txt content appears in the text box at the bottom.
  • Create a blank text file named "robots.txt" and paste the generated content into it.
  • Place "robots.txt" in your website's root directory. Then open robots.txt in a browser to confirm that visitors (such as search engines) can access it.
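
Before uploading, the generated text can be given a quick sanity check with Python's standard-library urllib.robotparser. This is a sketch under stated assumptions: the SAMPLE content, the load_rules helper, and the example.com URLs are invented for illustration, and site_maps() requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical generator output used only for this check.
SAMPLE = """\
User-agent: *
Disallow: /cgi-bin/
Crawl-delay: 10

User-agent: baiduspider
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rules = load_rules(SAMPLE)
    home = "https://example.com/index.html"
    for agent in ("googlebot", "baiduspider"):
        # can_fetch: may this robot request the URL?
        # crawl_delay: Crawl-delay of the matching group, or None.
        print(agent, rules.can_fetch(agent, home), rules.crawl_delay(agent))
    print(rules.site_maps())  # list of Sitemap URLs, or None if absent
```

Once the file is live, the same parser can read it from the site by calling set_url() with the robots.txt address followed by read().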
