SEO Robots.txt
The robots exclusion protocol (REP), implemented through the robots.txt file, is a plain text file that tells search engine robots how to crawl your site, i.e. which pages you want to be crawled and which pages you don’t want to be crawled. It is uploaded to the root directory of the website so that crawlers can find it at a predictable location (e.g. example.com/robots.txt).
Robots.txt file structure
Robots.txt has a very simple and flexible structure. Its basic syntax is given below:
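A minimal sketch of the two core directives (the path shown is only a placeholder):

```
User-agent: *
Disallow: /example-directory/
```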
“User-agent” names the search engine robot a rule applies to, and “Disallow” lists the paths that should not be crawled. You can also add a comment after a # sign, as shown below:
# user agents are not allowed to see the /temp directory.
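Putting the pieces together, a commented rule might look like this (the /temp directory is purely illustrative):

```
# user agents are not allowed to see the /temp directory.
User-agent: *
Disallow: /temp/
```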
Commonly used Robots.txt files:
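For instance, the standard way to allow everything is an empty Disallow value, which matches nothing and therefore blocks nothing:

```
User-agent: *
Disallow:
```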
The above robots.txt tells all web crawlers that they may crawl the entire site.
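To do the opposite, a single slash matches every path on the site:

```
User-agent: *
Disallow: /
```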
The above robots.txt setup is used to block all web crawlers from crawling the entire site.
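To shut out one robot only, name it in the User-agent line (“BadBot” here is a placeholder; substitute the crawler's actual user-agent token):

```
User-agent: BadBot
Disallow: /
```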
The above robots.txt setup is used to block a specific robot from the entire site.
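A specific crawler can likewise be kept out of a single folder (“Googlebot” and the directory name are illustrative):

```
User-agent: Googlebot
Disallow: /private-directory/
```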
The above robots.txt setup is used to block a specific web crawler from crawling a particular folder.
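The same pattern extends to a single page by giving the full path to the file (crawler name and file path are illustrative):

```
User-agent: Googlebot
Disallow: /private-file.html
```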
The above robots.txt setup is used to block a specific web crawler from crawling a particular web page.