📘 You're Losing Customers

Robots.txt: What Is It?


  • September 24, 2018

About the author

Ryan Stewart

I have an unhealthy obsession with being considered the world's BEST internet marketer. I'm highly active on social media and love a good debate.

In short, a Robots.txt file controls how search engines access your website.

This text file contains “directives” that tell search engine crawlers which URL paths they are allowed (“Allow”) or not allowed (“Disallow”) to crawl.


Screenshot of our Robots.txt file


Adding the wrong directives here can negatively impact your rankings, as it can hinder search engines from crawling certain pages (or your entire website).




What are “Robots” (in regards to SEO)?

Robots are applications that “crawl” through websites, documenting (i.e. “indexing”) the information they cover.

In regards to the Robots.txt file, these robots are referred to as User-agents.

You may also hear them called:

  • Spiders
  • Bots
  • Web Crawlers

These are not the official User-agent names of search engine crawlers. In other words, you would not “Disallow” a “Crawler”; you would need to use the official name of the search engine's robot (the Google crawler is called “Googlebot”).

You can find a full list of web robots here.
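
To see why the official name matters, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The rules, the `/private/` path, the `Bingbot` visitor and the `example.com` URLs are made up for illustration; only the `Googlebot` name comes from the text above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group addressed to Googlebot by its official name,
# one addressed to the generic word "Crawler" (which is not a real bot name).
RULES = """\
User-agent: Googlebot
Disallow: /private/

User-agent: Crawler
Disallow: /
"""

def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)

# Googlebot is matched by name, so its group applies to it.
googlebot_private = is_allowed(RULES, "Googlebot", "https://example.com/private/page.html")
googlebot_public = is_allowed(RULES, "Googlebot", "https://example.com/blog/")
# The "Crawler" group matches no real bot, so another bot is unaffected by it.
bingbot_private = is_allowed(RULES, "Bingbot", "https://example.com/private/page.html")

print(googlebot_private, googlebot_public, bingbot_private)
```

In this sketch the “Crawler” group blocks nobody, because no search engine identifies itself by that name.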




These bots are influenced in a number of ways, including the content you create and links pointing to your website.

Your Robots.txt file is a means to speak directly to search engine bots, giving them clear directives about which parts of your site you want crawled (or not crawled).


How to use a Robots.txt file

You need to understand the “syntax” used to create your Robots.txt file.

1. Define the User-agent

State the name of the robot you are referring to (e.g. Googlebot for Google or Slurp for Yahoo). Again, you will want to refer to the full list of user-agents for help.

2. Disallow

If you want to block access to pages or a section of your website, state the URL path here.

3. Allow

If you want to unblock a URL path within a blocked parent directory, enter that URL subdirectory path here.
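
The three steps above can be sketched as a tiny file and checked with Python's standard-library `urllib.robotparser`. The `/members/` paths and the `example.com` host are hypothetical, chosen only to show a blocked parent directory with one unblocked subdirectory.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file following the three steps above. Allow is listed before
# Disallow because Python's parser applies the first matching rule in file
# order; Google instead picks the most specific (longest) matching path.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /members/signup/
Disallow: /members/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def check(path: str) -> bool:
    """Return True if Googlebot may crawl `path` on the hypothetical site."""
    return parser.can_fetch("Googlebot", "https://example.com" + path)

results = {path: check(path) for path in ("/", "/members/profile", "/members/signup/")}
for path, allowed in results.items():
    print(f"{path}: {'crawlable' if allowed else 'blocked'}")
```

Under these assumptions the home page and the signup page stay crawlable, while everything else under `/members/` is blocked.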


Wikipedia’s Robots.txt file.


In short, you can use robots.txt to tell these crawlers, “Crawl these pages but don’t crawl these other ones.”


Why Robots.txt Is So Important

It may seem counterintuitive to “block” pages from search engines. There are a number of reasons and instances to do so:


1. Blocking sensitive information

Directories are a good example.

You’d probably want to hide those that may contain sensitive data like:

  • /cart/
  • /cgi-bin/
  • /scripts/
  • /wp-admin/
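
As a rough illustration, the directory list above could be turned into a robots.txt body like this. The `build_robots_txt` helper is a hypothetical name, and `example.com` is a placeholder host; the check at the end uses Python's standard-library parser.

```python
from urllib.robotparser import RobotFileParser

# Directories from the list above.
BLOCKED_DIRS = ["/cart/", "/cgi-bin/", "/scripts/", "/wp-admin/"]

def build_robots_txt(blocked_dirs: list[str], user_agent: str = "*") -> str:
    """Build a robots.txt body with one Disallow line per directory."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in blocked_dirs]
    return "\n".join(lines) + "\n"

robots_txt = build_robots_txt(BLOCKED_DIRS)
print(robots_txt)

# Sanity check: the cart is blocked, the blog is not.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
cart_blocked = not parser.can_fetch("Googlebot", "https://example.com/cart/checkout")
blog_open = parser.can_fetch("Googlebot", "https://example.com/blog/")
```

Keep in mind that this only asks crawlers to stay out of those directories; the robots.txt file itself is publicly readable.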


2. Blocking low quality pages

Google has stated numerous times that it’s important to keep your website “pruned” of low-quality pages. Having a lot of garbage on your site can drag down performance.

Check out our content audit for more details.


3. Blocking duplicate content

You may want to exclude any pages that contain duplicate content. For example, if you offer “print versions” of some pages, you wouldn’t want Google to index both versions, as duplicate content could hurt your rankings.

However, keep in mind that people can still visit and link to these pages, so if the information is the type you don’t want others to see, you’ll need to use password protection to keep it private.

Robots.txt itself is only meant for pages you don’t want to show up on a SERP.


Robots.txt Formats for Allow and Disallow

Robots.txt is actually fairly simple to use.

You literally tell robots which pages to “Allow” (which means they’ll crawl them) and which ones to “Disallow” (which they’ll skip).

You’ll use “Disallow” to list the paths you don’t want spiders to crawl, one per line. The “Allow” command is only needed when you want a page to be crawled, but its parent directory is “Disallowed.”

Here’s what the robots.txt for my website looks like:


example of robots text file


The initial user-agent command tells all web robots (i.e. *) – not just ones for specific search engines – that these instructions apply to them.
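
A small sketch of how the `*` group behaves, again using Python's standard-library parser. The file contents, the bot names `ExampleBot` and `SomeOtherBot`, and the URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: one group for every robot (*), one for a bot named "ExampleBot".
ROBOTS_TXT = """\
User-agent: *
Disallow: /tmp/

User-agent: ExampleBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/blog/post"
# Bots without a group of their own fall back to the * group.
any_bot_ok = parser.can_fetch("SomeOtherBot", url)
tmp_ok = parser.can_fetch("SomeOtherBot", "https://example.com/tmp/cache.html")
# A bot that has its own group follows that group instead of *.
example_bot_ok = parser.can_fetch("ExampleBot", url)

print(any_bot_ok, tmp_ok, example_bot_ok)
```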


How to Set Up Robots.txt for Your Website

First, you will need to write your directives into a text file.

Next, upload the text file to your site’s top-level directory. On many hosts this is done via cPanel.


adding robots.txt to cpanel


Your live file will always come right after the “.com/” in your URL. Ours, for example, is located at https://webris.org/robots.txt.

If it were located at webris.org/blog/robots.txt, the crawlers wouldn’t even bother looking for it and none of its commands would be followed.

If you have subdomains, make sure they have their own robots.txt files as well. For example, our training.webris.org subdomain has its own set of directives – this is incredibly important to check when running SEO audits.
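
A minimal sketch of that location rule: given any page URL, the robots.txt that applies sits at the root of the same host. The `robots_txt_url` helper name and the page paths are hypothetical; the two hosts are the ones mentioned above.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves `page_url`."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

main_site = robots_txt_url("https://webris.org/blog/some-post/?ref=home")
subdomain = robots_txt_url("https://training.webris.org/courses/")

print(main_site)  # https://webris.org/robots.txt
print(subdomain)  # https://training.webris.org/robots.txt
```

Because the host is part of the result, each subdomain resolves to its own file.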


Testing Your Robots.txt file

Google offers a free robots.txt tester tool that you can use to check your file.


robots.txt tester

It is located in Google Search Console under Crawl > Robots.txt Tester.
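
If you want a quick local check before uploading, something along these lines may help. It is not Google's tester, just a sketch built on Python's standard-library parser, and its matching rules are simpler than Google's. The `audit` helper, the draft file and the `example.com` URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def audit(robots_txt: str, expectations: dict[str, bool], user_agent: str = "Googlebot") -> list[str]:
    """Return the URLs whose crawlability differs from what you expect."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        url
        for url, should_be_crawlable in expectations.items()
        if parser.can_fetch(user_agent, url) != should_be_crawlable
    ]

# Hypothetical draft that blocks the blog by mistake and forgets the cart.
DRAFT = """\
User-agent: *
Disallow: /blog
"""
EXPECTED = {
    "https://example.com/": True,
    "https://example.com/blog/seo-tips": True,  # we want posts crawled
    "https://example.com/cart/": False,         # we want the cart blocked
}

problems = audit(DRAFT, EXPECTED)
for url in problems:
    print("Unexpected result for", url)
```

To check a live file instead of a draft, `RobotFileParser` also offers `set_url()` and `read()` to download it first.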


Putting Robots.txt to work for improved SEO

Now that you understand this important element of SEO, check your own site to ensure search engines are indexing the pages you want and ignoring those you wish to keep out of SERPs.

Going forward, you can continue using robots.txt to inform search engines how they are to crawl your site.

Comments (14)

  • Kyle Says
    4 years ago

    Thanks as always for your posts. Just wanted to give you a quick (not trying to be a dick) comment. The link to your robots file is not correct. You're missing the “s”. Thanks

  • RDF Says
    4 years ago

    I did not know that a robots.txt disallow on a folder like the following:
    Disallow: /staff

    It will also block an HTML file with the same word in it like this example:

    So, what I did to disallow only the folder was to add a trailing / to designate a folder:

  • Abhinay Says
    4 years ago

    Hi Ryan,

    The article is very informative. I have a problem with my robots.txt file: the file is not visible. I checked in Search Console but there is no information, it is blank. Earlier we had the file. Can you please tell me how to get it back?
    This is the URL

  • Fung Says
    4 years ago

    Nice Stuff! I recently viewed https://www.ibm.com/robots.txt and am seeing “Disallow: //”. What does it mean with two forward slashes?

    # $Id: robots.txt,v 1.81 2018/12/13 19:15:03 jliao Exp $
    # This is a file retrieved by webwalkers a.k.a. spiders that
    # conform to a defacto standard.
    # See
    # Comments to the webmaster should be posted at
    # Format is:
    # User-agent:
    # Disallow: |
    # ——————————————————————————

    User-agent: *
    Disallow: //
    Disallow: /account/registration
    Disallow: /account/mypro
    Disallow: /account/myint

  • Todd A. Slee Says
    3 years ago

    Thanks for the article, I’m going to do more research, and I hope you can answer my question. Will these methods work with a GoDaddy website, version 7, built with the WYSIWYG editor? GoDaddy has excellent tech support and customer service, but I wanted to get the input of a third party tech. Thanks again.

  • Abdul Aziz Qureshi Says
    3 years ago

    Do you have any youtube channel sir

