Learn How To Create & Optimize Your Robots.txt File For SEO


Robots.txt is an important part of technical SEO.

Ever wondered how a single syntax error in your robots.txt file could block search engines from crawling every page on your site?

It can happen because most search engines follow the instructions given in the robots.txt file.

So you must set up robots.txt correctly, so that search bots can easily crawl all the important pages of your website.

That makes optimizing the robots.txt file one of the crucial parts of technical SEO.

So if you’ve found robots.txt file optimization challenging, you can make it work, and I’ll show you how.

What is a robots.txt file?

A robots.txt file is a plain text file placed in the root directory of a domain. It tells search engine bots which pages to crawl and which to skip.

Robots.txt is usually the first file a crawler requests when it visits your site.

The crawler first reads all the instructions given in the robots.txt file.

Then it starts crawling the website, following those instructions.
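
To make that flow concrete, here is a minimal Python sketch of how a polite crawler might check robots.txt before fetching pages. It uses the standard library’s urllib.robotparser module. The bot name ExampleBot, the example.com URLs and the /private/ rule are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it from the site root.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# URLs the crawler has queued up (illustrative only).
queue = [
    "https://example.com/blog/post",
    "https://example.com/private/report",
]

# Keep only the URLs that the rules allow this bot to fetch.
crawlable = [url for url in queue if parser.can_fetch("ExampleBot", url)]
print(crawlable)
```

Only the blog URL survives the filter, because the other one falls under the disallowed /private/ folder.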

You can also reference a sitemap URL in the robots.txt file, which makes it easier for bots to find all the pages on your site.

Do you need a robots.txt file?

A robots.txt file is not mandatory, but you should keep one on your website.

Why?

To keep search engine bots away from unimportant pages, which in turn helps keep those pages out of the search engine’s index.

By default, search engines can crawl and index all of your website’s pages.

As a result, they also crawl pages that are not important or that you don’t want crawled.

Reasons you should have a robots.txt file on your site

#1. Block Private Pages

Such as login pages and dynamically generated content. You can also block content by folder. Keep in mind that robots.txt only asks bots not to crawl these pages; it does not password-protect them.

#2. Improve Crawl Budget

Block useless pages through the robots.txt file so that search engine bots spend more time crawling useful and valuable pages. As a result, your crawl budget is spent where it matters.

Why the robots.txt file is important

#1. A robots.txt file helps keep unimportant pages from appearing in search engine results.

#2. A robots.txt file helps your crawl budget by blocking unimportant pages, so that crawlers can spend most of their time on important pages only.

#3. A robots.txt file lets you ask unwanted bots not to crawl your website, although only bots that respect robots.txt will obey.

How to create a robots.txt file manually?

#1. Open Notepad, enter the syntax below, and save the file as robots.txt.

robots.txt file syntax –
User-agent: *
Disallow:

#2. Make sure the filename is exactly robots.txt, in lowercase.
Now your robots.txt file is ready.
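
If you would rather script this step than use Notepad, the sketch below writes the same two-line file with Python. The function name write_robots_txt is my own, and the demo saves into a temporary folder as a stand-in for your real site folder.

```python
import tempfile
from pathlib import Path

# The same basic rules shown above: every bot may crawl everything.
RULES = "User-agent: *\nDisallow:\n"

def write_robots_txt(directory):
    """Save the rules as a lowercase 'robots.txt' file inside the given folder."""
    path = Path(directory) / "robots.txt"
    path.write_text(RULES, encoding="utf-8")
    return path

# Demo: a temporary folder stands in for the folder you will upload from.
with tempfile.TemporaryDirectory() as site_root:
    created = write_robots_txt(site_root)
    saved_name = created.name
    saved_text = created.read_text(encoding="utf-8")

print(saved_name)
```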

Where to upload the robots.txt file?

#1. Log in to your cPanel and click on File Manager.
#2. Now click on Upload.
#3. Once uploaded, head over to the root directory of your domain in cPanel. You will see the robots.txt file there.
#4. Test your robots.txt file on the live domain as well. In the browser, enter your domain followed by /robots.txt.

Example – https://digitalpankaj.me/robots.txt

You will see the contents of your robots.txt file.
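
Since the file always lives at the root of the domain, you can work out its address from any page URL. Here is a small Python sketch of that; the function name robots_txt_url and the sample page path are assumptions for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url):
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("https://digitalpankaj.me/blog/some-post/?ref=home"))
# prints https://digitalpankaj.me/robots.txt
```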

How to Create a Robots.txt File in WordPress?

Log in to WordPress. These steps assume you have the Yoast SEO plugin installed.

In the left sidebar, click on SEO >> Tools.

On the next page, click on File Editor.

Then click on Create robots.txt file.

Enter your robots.txt rules in the editor and click on Save.

How to Test your Robots.txt file?

Log in to Search Console, and it will ask you to select a property.

Select the property, and you will be taken to the robots.txt Tester section of Search Console.

It automatically fetches your robots.txt file, and you can check the file for errors here. In the field below, next to your domain, enter the path you want to test (for example, robots.txt) and click on Test.

Robots.txt Syntax – 

A robots.txt file must be a UTF-8 encoded text file (ASCII is a subset of UTF-8, so plain ASCII is fine). Other character sets are not supported.
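
If you are not sure how your editor saved the file, a quick check like the Python sketch below can tell you whether the bytes are valid UTF-8. The helper name is_valid_utf8 and the sample rules are made up for illustration.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the raw file bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# Plain ASCII rules are valid UTF-8.
print(is_valid_utf8(b"User-agent: *\nDisallow:\n"))

# The same kind of rule saved as Latin-1 with an accented letter is not.
print(is_valid_utf8("Disallow: /caf\u00e9/".encode("latin-1")))
```

For a real file you would pass in the result of reading it in binary mode.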

Ready?

I’ll walk you through the whole process.

So here we go.

Robots.txt file Basic format:

User-agent: *
Disallow:

User-agent: 

This is the first line of a rule group in robots.txt. It specifies which crawlers the rules that follow apply to.

Star *

The star means the rules apply to all search engine bots.

If you want to set permissions for one particular bot, put that bot’s name after User-agent: instead of the star.

Disallow: /

If you add / after Disallow:, it tells the bots that they are not allowed to visit any page.

User-agent: *
Disallow: /

And if you remove the / after Disallow:, all bots have permission to visit every page of the website.

User-agent: *
Disallow:
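
You can see the difference that single / makes with Python’s built-in urllib.robotparser module. This is only a sketch: the bot name ExampleBot, the example.com URL and the helper build_parser are placeholders of my own.

```python
from urllib.robotparser import RobotFileParser

def build_parser(robots_txt):
    """Parse robots.txt text that is already in memory (no download needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

block_all = build_parser("User-agent: *\nDisallow: /\n")
allow_all = build_parser("User-agent: *\nDisallow:\n")

url = "https://example.com/any-page"
blocked_result = block_all.can_fetch("ExampleBot", url)
open_result = allow_all.can_fetch("ExampleBot", url)

print(blocked_result)  # False: "Disallow: /" blocks every path
print(open_result)     # True: an empty Disallow blocks nothing
```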

Allow: 

It tells search engine bots that they can access a specific file or subfolder inside a folder that has been disallowed.

User-agent: *
Disallow: /photos
Allow: /photos/mycar.jpg
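
Google documents that when an Allow rule and a Disallow rule both match a URL, the most specific (longest) rule wins, and Allow wins a tie. The Python sketch below implements just that precedence for simple path prefixes. The function is_allowed and the extra test paths are my own, and it ignores wildcards and user-agent groups to stay short.

```python
def is_allowed(rules, path):
    """Decide whether path may be crawled, using longest-match precedence.

    rules is a list of (directive, prefix) pairs, for example
    ("disallow", "/photos"). An empty prefix matches nothing.
    """
    best_length = -1
    allowed = True  # a path that matches no rule may be crawled
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        longer = len(prefix) > best_length
        tie_for_allow = len(prefix) == best_length and directive == "allow"
        if longer or tie_for_allow:
            best_length = len(prefix)
            allowed = directive == "allow"
    return allowed

RULES = [("disallow", "/photos"), ("allow", "/photos/mycar.jpg")]

print(is_allowed(RULES, "/photos/mycar.jpg"))  # True: the longer Allow rule wins
print(is_allowed(RULES, "/photos/beach.jpg"))  # False: only the Disallow rule matches
print(is_allowed(RULES, "/about"))             # True: no rule matches
```

Not every parser resolves conflicts this way, so it is worth testing Allow rules against the search engines you care about.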

Sitemap:

It tells search engines where to find your sitemap, which helps them discover the URLs you want crawled and indexed.

User-agent: *
Disallow:
Sitemap: https://example.com/sitemap.xml
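
A sitemap reference only counts when the line starts with Sitemap: followed by the full URL. As a quick sanity check, the Python sketch below reads the sitemap line back with urllib.robotparser. It assumes Python 3.8 or newer, where the site_maps() method is available, and the example.com address is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Placeholder rules with a correctly written Sitemap line.
ROBOTS_TXT = """\
User-agent: *
Disallow:
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of sitemap URLs, or None if the file lists none.
sitemaps = parser.site_maps()
print(sitemaps)
```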

Every search engine has its own crawlers and bots.

For example –

Google User-Agent: Googlebot
Bing User-Agent: Bingbot
Yahoo User-Agent: Slurp

Here you can find a list of all the search engine bots.

Here are some common useful robots.txt rules:

Block all web crawlers from all content

User-agent: *
Disallow: /

Allow all web crawlers access to all content

User-agent: *
Disallow:

Blocking a specific web crawler from accessing all content

User-agent: Googlebot
Disallow: /

Blocking a specific web crawler from a specific folder

User-agent: Googlebot
Disallow: /example-subfolder/

To allow a single robot

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
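
To check that per-bot rules like these behave the way you expect, you can run them through Python’s urllib.robotparser. In this sketch the example.com URL is a placeholder, and Bingbot simply stands in for any bot other than Googlebot.

```python
from urllib.robotparser import RobotFileParser

# The "allow a single robot" rules from above.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/page"
googlebot_ok = parser.can_fetch("Googlebot", url)
bingbot_ok = parser.can_fetch("Bingbot", url)

print(googlebot_ok)  # True: Googlebot has its own group with nothing disallowed
print(bingbot_ok)    # False: every other bot falls back to the * group
```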

Block all images on your site from Google Images

User-agent: Googlebot-Image
Disallow: /

Note – Each subdomain needs its own separate robots.txt file.

How to Use Wildcards in Robots.txt

You can use robots.txt to allow or exclude specific URLs from search engines.

To do this, robots.txt supports pattern matching on URLs.

With the help of wildcards, you can do it easily.

There are two wildcard characters –

* wildcard:

The * wildcard character matches any sequence of characters.

Block search engines from accessing any URL that has a ? in it:

User-agent: *
Disallow: /*?

$ wildcard:

The $ wildcard character is used to denote the end of a URL.

Block search engines from crawling any URL that ends with “.pdf”:

User-agent: *
Disallow: /*.pdf$
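
If you want to try a wildcard pattern before you publish it, the Python sketch below converts a robots.txt pattern into a regular expression and tests it against a few paths. The function rule_matches and the sample paths are my own; treat it as a rough approximation of how * and $ behave, not as any search engine’s actual matcher.

```python
import re

def rule_matches(pattern, path):
    """Return True if a robots.txt path pattern matches the given URL path.

    '*' matches any sequence of characters, and a trailing '$' anchors the
    pattern to the end of the URL. Without '$' the pattern is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where each '*' was.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

print(rule_matches("/*?", "/shop?color=red"))                  # True
print(rule_matches("/*?", "/shop"))                            # False
print(rule_matches("/*.pdf$", "/files/guide.pdf"))             # True
print(rule_matches("/*.pdf$", "/files/guide.pdf?download=1"))  # False
```

The last line shows why $ matters: without it, the .pdf rule would also catch URLs that merely contain “.pdf” somewhere in the middle.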

Always validate your robots.txt changes before making them live.

Robots.txt Testing Tool 

Use the below tools to double-check your work. 

  1. Google’s Robots.txt Testing Tool
  2. Robots.txt Validator and Testing Tool
  3. Robots.txt Test Tool

Robots.txt file Generator 

If you don’t want to write the robots.txt file yourself, no worries: you can easily create one with the help of online robots.txt generator tools.

Here is a list of tools that you can use to generate your robots.txt file.

  1. Seobook Robots.txt generator
  2. Ryte Robots.txt generator
  3. Seoptimer Robots.txt generator

Meta Robots vs Robots.txt

You can use robots.txt to block PDFs, images, videos and other multimedia resources. Meta robots tags are hard to apply to such files, because a meta tag can only be placed in an HTML page.

Robots.txt is the better option for excluding useless pages to improve the crawl budget.

Meta directives, on the other hand, are easier to implement on individual pages, and if implemented wrongly they only affect that particular page, whereas a syntax error in robots.txt can block the entire website from crawling.

So where both would work, I would recommend using meta directives instead of robots.txt.
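
For reference, a meta robots directive is a tag such as <meta name="robots" content="noindex, follow"> in the page’s head. If you want to confirm which directives a page actually carries, the Python sketch below pulls them out with the standard library’s HTML parser. The class and function names and the sample page are made up for illustration.

```python
from html.parser import HTMLParser

class MetaRobotsParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Directives are comma separated, e.g. "noindex, follow".
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]

def meta_robots_directives(html):
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives

# Hypothetical page that asks search engines not to index it.
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
print(meta_robots_directives(PAGE))
```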

Let me know in the comments how helpful this post was for you, or whether you learned something from it.
