In short, a Robots.txt file controls how search engines access your website.
This text file contains “directives” that tell search engines which pages they are allowed (“Allow”) or not allowed (“Disallow”) to access.
Screenshot of our robots.txt file
Adding the wrong directives here can negatively impact your rankings, as it can hinder search engines from crawling pages (or even your entire website).
Robots are applications that “crawl” through websites, documenting (i.e. “indexing”) the information they cover.
In the context of the Robots.txt file, these robots are referred to as User-agents . You may also hear them called “spiders,” “bots,” or “web crawlers.”
These are not the official User-agent names of search engine crawlers. In other words, you would not “Disallow” a “crawler”; you would need to use the official name of that search engine’s robot (Google’s crawler is called “Googlebot”).
You can find a full list of web robots here.
These bots are influenced in a number of ways, including the content you create and links pointing to your website.
Your Robots.txt file is a means to speak directly to search engine bots , giving them clear directives about which parts of your site you want crawled (or not crawled).
You need to understand the “syntax” used to create your Robots.txt file.
User-agent: State the name of the robot the rule applies to (e.g. Google’s crawler is “Googlebot”). Again, you will want to refer to the full list of user-agents for help.
Disallow: If you want to block access to pages or a section of your website, state the URL path here.
Allow: If you want to unblock a URL path within a blocked parent directory, enter that URL subdirectory path here.
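Put together, a short robots.txt entry might look like the sketch below (the path is hypothetical – swap in your own):
User-agent: Googlebot
Disallow: /private/
This would tell Google’s crawler to stay out of everything under /private/ while leaving the rest of the site crawlable.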
In short, you can use robots.txt to tell these crawlers, “Index these pages but don’t index these other ones.”
It may seem counterintuitive to “block” pages from search engines. There are a number of reasons and instances to do so:
Directories are a good example. You’d probably want to hide any that may contain sensitive data, as sketched below.
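For example, a site might keep crawlers out of admin or internal areas with directives like these (hypothetical paths, assuming a typical WordPress-style setup):
User-agent: *
Disallow: /wp-admin/
Disallow: /cgi-bin/
Disallow: /internal-files/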
Google has stated numerous times that it’s important to keep your website “pruned” of low-quality pages. Having a lot of garbage on your site can drag down performance.
Check out our content audit for more details.
You may want to exclude any pages that contain duplicate content. For example, if you offer “print versions” of some pages, you wouldn’t want Google to index duplicate versions as duplicate content could hurt your rankings.
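For instance, if those print versions all lived under a /print/ path (a hypothetical structure), you could keep crawlers away from them like this:
User-agent: *
Disallow: /print/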
However, keep in mind that people can still visit and link to these pages, so if the information is the type you don’t want others to see, you’ll need to use password protection to keep it private.
In short, it’s because there are probably some pages that contain sensitive information you don’t want to show up on a SERP.
Robots.txt is actually fairly simple to use.
You literally tell robots which pages to “Allow” (which means they’ll index them) and which ones to “Disallow” (which they’ll ignore).
You’ll use the latter to list the pages you don’t want spiders to crawl. The “Allow” command is only needed when you want a page to be crawled even though its parent directory is “Disallowed.”
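As a sketch (again with hypothetical paths), the parent directory below is blocked while one page inside it stays crawlable:
User-agent: *
Disallow: /resources/
Allow: /resources/free-guide.html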
Here’s what the robots.txt for my website looks like:
The initial User-agent command, set to an asterisk (*), tells all web robots – not just ones for specific search engines – that these instructions apply to them.
First, you will need to write your directives into a text file.
Next, upload the text file to your site’s top-level directory – this needs to be added via cPanel.
Your live file will always sit at the root of your domain, right after the “/” that follows your domain name. Ours, for example, is located at https://webris.org/robots.txt .
If it were located at www.webris.org/blog/robots.txt, the crawlers wouldn’t even bother looking for it and none of its commands would be followed.
If you have subdomains, make sure they have their own robots.txt files as well. For example, our training.webris.org subdomain has its own set of directives – this is incredibly important to check when running SEO audits .
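In other words (using the domains mentioned above), each file lives at the root of its own host:
https://webris.org/robots.txt
https://training.webris.org/robots.txt
The directives in one file do not apply to the other.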
Google offers a free robots.txt tester tool that you can use to check.
It is located in Google Search Console under Crawl > Robots.txt Tester .
Now that you understand this important element of SEO, check your own site to ensure search engines are indexing the pages you want and ignoring those you wish to keep out of SERPs.
Going forward, you can continue using robots.txt to inform search engines how they are to crawl your site.
Thanks as always for your posts. Just wanted to give you a quick (not trying to be a dick) comment. The link to your robots file is not correct. You’re missing the “s”. Thanks
I did not know that a robots.txt disallow on a folder like the following:
Disallow: /staff
will also block an HTML file with the same word in it, like this example:
https://sites-url-here/staff-name.html
So, to disallow only the folder, what I did was add a trailing / to designate a folder:
Disallow: /staff/
Hi Ryan,
The article is very informative. I have a problem with the robots.txt file: the file is not visible. I checked in Search Console but there is no information, it is blank. Earlier we had the file. Can you please tell me how to get it back?
This is the URL
http://www.emarketagency.in/robots.txt
Nice Stuff! I recently viewed https://www.ibm.com/robots.txt and am seeing “Disallow: //”. What does it mean with two forward slashes?
# $Id: robots.txt,v 1.81 2018/12/13 19:15:03 jliao Exp $
#
# This is a file retrieved by webwalkers a.k.a. spiders that
# conform to a defacto standard.
# See
#
# Comments to the webmaster should be posted at
#
# Format is:
# User-agent:
# Disallow: |
# ------------------------------------------------------------------------------
User-agent: *
Disallow: //
Disallow: /account/registration
Disallow: /account/mypro
Disallow: /account/myint
Thanks for the article, I’m going to do more research, and I hope you can answer my question. Will these methods work with a GoDaddy website, version 7, built with the WYSIWYG editor? GoDaddy has excellent tech support and customer service, but I wanted to get the input of a third-party tech. Thanks again.
How do I set up a block so that people don’t see the content on these links? The case is closed and I don’t want my family to suffer anymore.
https://www.9and10news.com/2018/09/17/man-accused-of-luring-teen-over-internet-for-sex-expected-in-court/
https://www.9and10news.com/2018/08/27/man-accused-of-trying-to-lure-teen-girl-over-internet-for-sex-charged/
https://www.facebook.com/9and10news/posts/philip-loew-was-charged-last-month-in-grand-traverse-county-with-trying-to-entic/10156828705362578/
https://www.record-eagle.com/news/local_news/special-prosecutor-likely-in-child-solicitation-case/article_ccd69fd4-58e5-59fa-bf52-d72c2bf26eed.html
https://twitter.com/9and10news/status/1041597627148763137?lang=en
Do you have a YouTube channel, sir?