The Importance of the WordPress Robots.txt File

The WordPress robots.txt is a plain text file located in the root of your web directory, and setting it up is one of the most confusing parts of SEO. Plenty of webmasters out there never make use of it. Search engine bots read the robots.txt file and follow its guidelines to understand which parts of your blog they should crawl and which parts to leave alone. In short, it lets you block search engine bots from crawling the admin pages and other sensitive parts of your blog. A misconfigured robots.txt file can completely remove your blog's presence from search engines, which is exactly why it is such a confusing part of SEO. Robots.txt is not limited to WordPress; it works on every platform, such as Drupal, Joomla, ZenCart, etc. It resides at the root of the domain, for example www.howupdates.com/robots.txt

What Is the WordPress Robots.txt File?


The robots.txt file is located in the root of your blog. When a search engine bot or spider comes to your site to index it, it reads the robots.txt file first. This file helps search engine bots understand which parts to crawl and which to avoid. If you can't find a robots.txt file in your host's root, simply create a new one named robots.txt and edit it; or, if that doesn't work, open Notepad on your PC, save the text file as robots.txt, and upload it to the root of your host. Below is the link to the robots.txt file of HowUpdates: http://www.howupdates.com/robots.txt
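As a minimal sketch of a brand-new file (the domain is a placeholder; swap in your own), a robots.txt that lets every bot crawl everything and points to your sitemap looks like this:

User-agent: *
Disallow:

Sitemap: http://www.yourdomain.com/sitemap.xml

An empty Disallow: line means nothing is blocked, so every crawler may visit every page.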

Robots.txt Tags/Commands in Detail


Once you have found your robots.txt file, you write in it the commands/tags that search engine bots will follow. If you want quality traffic on your blog, you should allow the bots of every search engine to crawl it. The User-agent line names which bot the rules below it apply to: to address every search engine bot, just type User-agent: *; to address only Google's crawler, type User-agent: googlebot.

User-agent: *
User-agent: googlebot
User-agent: bingbot

User-agent: * This tag addresses the bots of every search engine.
User-agent: googlebot This tag addresses only Google's crawler.
User-agent: bingbot This tag addresses only Bing's crawler.

If you want a specific bot to crawl your blog, put the Allow command below its User-agent line; if you want to keep a specific search engine out, put the Disallow command below it instead. For example, suppose I want Googlebot to crawl my blog but I don't want Bing to crawl it. The tags I would use are below.

User-agent: googlebot
Allow: /

User-agent: bingbot
Disallow: /

From this example it is easy to see that to allow a crawler you type Allow: / and to block it you type Disallow: /, and that's it. You should also put the location of your blog's sitemap in the robots.txt file: simply type Sitemap: http://www.yourdomain.com/sitemap.xml, or wherever your sitemap actually lives. If you have more than one sitemap, follow the example below.

Sitemap: http://www.yourdomain.com/sitemap1.xml
Sitemap: http://www.yourdomain.com/sitemap2.xml
Sitemap: http://www.yourdomain.com/sitemap3.xml

Simply list each sitemap on its own line, so the crawler knows where every sitemap lives and can index your links easily. Next, you should disallow the search crawlers from crawling the admin area and other sensitive directories of your blog. For this, add a Disallow: /directory-name/ line for each one:

Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-content/

Disallow: /cgi-bin/ This tag stops crawlers from crawling the cgi-bin directory.
Disallow: /wp-admin/ This tag stops crawlers from crawling the wp-admin directory.
Disallow: /wp-content/ This tag stops crawlers from crawling the wp-content directory.

So if you want to disallow any directory, simply follow the pattern above and type its name. You can also disallow an individual file with the same rule, as shown below.
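As a small illustration (readme.html and wp-login.php are standard files in a WordPress root, shown here only as examples):

Disallow: /readme.html
Disallow: /wp-login.php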

Things to Avoid While Creating a Robots.txt File


Creating a robots.txt file is not very hard, but a single mistake is very dangerous for the health of your blog's traffic; one wrong line can completely remove your blog's presence from search engines. Here are the common mistakes you should avoid, with a correct-versus-incorrect comparison after the list.

  • Putting a space inside a command/tag. Example: Dis allow
  • Adding spaces before the start of a command. Example:  Disallow: /wp-admin/
  • Using extra capital letters. Example: DisAllow
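To make the difference concrete, here is the same rule written three wrong ways and one correct way (lines starting with # are comments, which robots.txt permits):

# Wrong: space inside the command
Dis allow: /wp-admin/
# Wrong: space before the command
 Disallow: /wp-admin/
# Wrong: extra capital letters
DisAllow: /wp-admin/
# Correct
Disallow: /wp-admin/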

HowUpdates Robots.txt File


Here is the robots.txt file I wrote for HowUpdates. You can copy it, change the sitemap location to your own, and leave the rest as it is. If you want to impose more limitations, edit it according to your needs, but remember: a single wrong Disallow command can completely remove your blog from search engines.

Sitemap: http://www.howupdates.com/sitemap.xml

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/

User-agent: NinjaBot
Allow: /

User-agent: Mediapartners-Google*
Allow: /

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Mobile
Allow: /

Robots.txt Files of Famous Websites


Below are links to the robots.txt files of some famous websites from around the globe.
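For instance, these are all real, publicly accessible files (representative picks; any large site publishes one at the same standard path):

Google: https://www.google.com/robots.txt
Facebook: https://www.facebook.com/robots.txt
Wikipedia: https://en.wikipedia.org/robots.txt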

Start using a robots.txt file and start monetizing your blog; you must have one in order to do quality SEO on your blog. If you need any help with your robots.txt file, let me know via the comments below. Share this post with your friends, and don't forget to subscribe to the blog for daily newsletters.

Comments


  • I have seen the robots.txt file hundreds of times on my blog but never knew what exactly it is. I read your whole post and found that my blog's robots.txt file is kind of awkward (I guess). When you read this comment, kindly help me with some code for my robots.txt. Thank you.

    Usama Arshad, 4 years ago


    • Sure! Kindly contact me on Facebook.

      M Luqman, 4 years ago


  • I updated the robots.txt for all of my sites a few weeks ago. One thing I learned is that many of the blog posts about how to create a robots.txt forget to mention social media sites.

    Following some advice from my web host, I started with a deny-all rule and then listed a lot of search bots and allowed them specific access.

    Hours later I found that Twitter wasn’t showing previews for my posts anymore.

    Of course!

    It’s extremely important to make sure you’re not accidentally blocking Facebook, Twitter, & Co.

    They won’t be able to pull images and post descriptions for sharing.

    Ralf Skirr, 4 years ago


    • Hey Ralf,
      To my knowledge there are no special robots.txt commands for social media previews; the networks fetch the previews themselves whenever you or your readers share a post on Facebook, Google+, Pinterest, or anywhere else. A deny-all rule, however, can destroy your blog's visibility: since it disallows everything, the social media crawlers may no longer be able to fetch your previews. Correct your robots.txt file if you still have that issue, or you can use my commands above.
      Thanks, and keep visiting!

      M Luqman, 4 years ago


  • Thanks for this information about robots.txt.

    monpctips, 4 years ago

