The robots.txt file can be an excellent ally for improving SEO on your WordPress website. Increasing a site's traffic to reach more potential customers is one of the most common objectives today, and some professionals dedicate themselves exclusively to creating content and developing search-ranking strategies. So today, we will discuss how to edit and optimize a robots.txt file in WordPress.
What is Robots.txt?
The robots.txt is a small file that tells search engine bots which parts of a website they can and cannot crawl.
When Google, Bing, Yandex, or any other search engine visits a website, the robots.txt file is the first thing it requests.
Based on it, the bot decides what to crawl according to the directives we have set.
If you have not created one, the bot will crawl the site without any restrictions, which can sometimes be harmful, especially on websites with a lot of content.
In summary: a robots.txt file is not mandatory, but optimizing one is highly recommended because it helps your website's rankings.
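To give you an idea of what the file looks like, here is a minimal sketch of a robots.txt; the blocked path and the domain are just placeholders:
User-agent: *
Disallow: /private-folder/
Sitemap: https://yourwebsite.com/sitemap.xml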
Why Use Robots.txt?
You should make sure that Google does not index unimportant URLs on your website, marking as noindex pages such as the privacy policy, cookie policy, legal notice, or any URLs whose content nobody searches for on Google. Beyond that, you can use robots.txt to close off Google's access to those URLs or sections of the site more drastically.
Do not confuse noindex with robots.txt; their functions are different:
- Noindex: Tells search engines not to show a specific page in the SERPs; Google can still crawl the page and read its content.
- Robots.txt: Blocks crawling of the marked URLs, so Google cannot read the HTML at all, including any noindex tag it contains (see the example below).
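To make the difference concrete, here is a minimal sketch: the noindex goes in the page's own HTML head, while the blocking rule goes in robots.txt. The /privacy-policy/ path is just an example.
In the page's HTML head:
<meta name="robots" content="noindex">
In robots.txt:
User-agent: *
Disallow: /privacy-policy/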
Google recommends using noindex to keep URLs out of the SERPs:
If you don't want certain pages to appear in search results, "do not use the robots.txt file to hide your web page," as Google explains.
Although search engines usually obey the directives in robots.txt, they are not 100% effective. Google or other engines can ignore the instructions and crawl the blocked URLs.
According to Google, the information you give in your robots.txt file is instructions, not rules.
If multiple links point to a page, Google may index it and display it in its search results without knowing what it contains, even if you have blocked it in your robots.txt file.
How to Create a Robots.txt in WordPress
To check whether a website has a robots.txt file, just append /robots.txt to the domain.
Example: themrally.com/robots.txt
Creating the robots.txt file is very simple, and you can do it in several ways:
1. Creating a .txt file with the directives you want to apply and uploading it to the root of your website
– Open a new file in a plain-text editor such as Notepad, add the directives you want, and save it with the name robots.txt.
– Now you just have to upload it to the root of your website and that’s it.
2. Using a plugin like Yoast SEO.
– Access the Yoast tools option and click Create Robots.txt.
As you can see, it creates a predesigned robots.txt by default that you can save; the only thing missing is the sitemap line.
To add it, whether you use Yoast or Notepad, you must include the following line:
– If you use the Google XML Sitemaps plugin:
Sitemap: yourwebsite.com/sitemap.xml
– Or if you use the Yoast SEO sitemap:
Sitemap: yourwebsite.com/sitemap_index.xml
To insert the sitemap into the robots.txt, just copy the full sitemap URL, including your domain, into the file.
Remember to paste it at the end of the robots.txt.
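Putting it together, a minimal robots.txt with the sitemap line added at the end might look like this (assuming the Yoast sitemap path and a placeholder domain):
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://yourwebsite.com/sitemap_index.xml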
How to Tell Google That We Have Created the Robots.txt?
It is very simple: open Google Search Console and, in the Help section (?), type robots.txt.
The first result that appears will be "Test your robots.txt file." Click it, and the robots.txt tester will open.
Note: Keep in mind that uploading the robots.txt file is not the same as validating it in Google Search Console. To validate it, you must have created it first.
Choose a property (your domain) and copy and paste the robots.txt you created in Notepad or Yoast.
Click Submit and choose the option "Request the update from Google."
Make sure it doesn’t generate an error.
Now open your browser, type yourwebsite.com/robots.txt, and check that the file shows up.
Google Search Console keeps making improvements and changes to its interface, so it may move this tester option or offer it somewhere else later.
With this, your robots.txt is in place and you will be making Google's crawling easier, although everything in it can be customized further.
Note: Be careful, because every website is different, and an error in a simple * or / can stop Google from crawling essential parts.
Now we will look at the robots.txt created by default and the different options for creating a custom one.
How the Robots.txt Is Created by Default in WordPress
This is the default code that WordPress generates:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
* Remember that here you also have to add your sitemap line.
In it, we see Disallow (deny) and Allow (permit).
In it, there are three lines:
- User-agent: *: Indicates that the rules below apply to all search engine bots.
- Disallow: /wp-admin/: Prevents search engines from wasting time crawling the WordPress admin area.
- Allow: /wp-admin/admin-ajax.php: Within the above block, search engines may still crawl admin-ajax.php.
Note: If your website is blocked for search engines (an option in the WordPress settings that is often used while a site is being designed), your robots.txt will contain Disallow: /
You have to be careful because if it shows this, you are telling Google not to crawl anything on your website at all.
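For reference, this is what a full-site block looks like; you normally only want this during development or maintenance:
User-agent: *
Disallow: /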
The essential part of robots.txt lies in its wildcards and special symbols. You need to know how to use them, such as the * sign, the $, and so on. Let's take a look:
Wildcards to Use in a Robots.txt
When creating a robots.txt, you must respect uppercase, lowercase, and spaces. Indicating /wp-content is not the same as /WP-content.
Any stray space or misplaced symbol can greatly harm your website's rankings.
Hash (#)
This symbol is used to write comments that explain what the different lines mean.
Example: #blocking searches or #blocking trackbacks.
This way, you keep track of what you have indicated and the file stays organized.
User-agent
This indicates which bots you want to target. Normally you want all bots covered, so by default it is:
User-agent: *
But if, for example, you only want to address the Google bot or establish a specific rule for it, you have to add the line:
User-agent: Googlebot
Everything you add below will apply exclusively to the Googlebot.
Note: It is important to give each user-agent group its own rules; if several User-agent lines are stacked together above one set of rules, those rules apply to all of those bots. See the sketch below.
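As a hedged illustration (the /example/ path and the bot pairing are placeholders), these two layouts behave differently:
# One group: the rule applies to both bots
User-agent: Googlebot
User-agent: Bingbot
Disallow: /example/
# Separate groups: each bot only gets its own rules
User-agent: Googlebot
Disallow:
User-agent: Bingbot
Disallow: /example/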
If several groups match the same bot, the most specific (longest) user-agent match is the one that takes precedence.
Example:
User-agent: Googlebot-Image
Disallow: /
shall prevail over:
User-agent: Googlebot
Disallow:
(an empty Disallow, with no /, allows crawling)
The Asterisk (*)
This wildcard symbol represents any sequence of characters. For example, /*.pdf refers to all URLs that contain .pdf.
It will match both yourwebsite.com/document.pdf and yourwebsite.com/document.pdf?ver=1.1.
The use of the asterisk is very important.
Suppose you want to prevent search engines from accessing parameterized product category URLs on your site.
You can do it like this:
User-agent: *
Disallow: /products/t-shirts?
Disallow: /products/shirts?
Disallow: /products/coats?
Or make use of the * (the better option):
User-agent: *
Disallow: /products/*?
By using the *, you tell search engines not to crawl any product URL with parameters at all.
The Dollar Symbol ($)
The $ marks the end of a URL: if any character comes after the matched pattern, the rule will not apply. So if, for example, we indicate /*.pdf$, we are referring to all URLs that end in .pdf.
This includes yourwebsite.com/document.pdf but excludes yourwebsite.com/document.pdf?ver=1.1.
Use the "$" wildcard to mark the end of a URL.
For example, if you want to prevent search engines from accessing all the .pdf files on your site, your robots.txt file might look like this:
User-agent: *
Disallow: /*.pdf$
Noindex and Nofollow
Note: Since September 1, 2019, Google no longer supports the noindex directive in robots.txt, so do not mark noindex there; you can and should use the meta robots tag or the X-Robots-Tag HTTP header instead. In the same way, you should not use nofollow in robots.txt either.
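The meta robots tag goes in the page's HTML head (as shown earlier); the HTTP header alternative looks like the line below. How you add it depends on your server configuration, so treat this as a sketch rather than a copy-paste recipe:
X-Robots-Tag: noindex, nofollow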
Disallow
This directive prevents search engines from crawling a specific page, category, or structure.
Note: Even if a page you disallow is also marked as noindex, bots may still index its URL, but they cannot read its content.
Let us explain: if you block a page with robots.txt, search engines cannot read its HTML, so they never see the noindex tag or the content. They may still show the URL in the SERPs, with a description indicating that the page is blocked by robots.txt.
This can happen if, for example, they consider that the page has inbound links and is of good quality.
What Happens to the Link Juice with a Disallow?
The link juice (the strength a page passes on) will not be transferred to another URL if robots.txt blocks the first one.
We’ll explain:
Example: Imagine that we mark
Disallow: /service1/
Service1 receives links from the home page, and in turn service1 links to service2. Service1 continues to receive its strength, but it cannot pass it on to service2 because we have blocked that URL with robots.txt.
Blocking pages starting with a path
Disallow: /testpage
This blocks all URLs that start with /testpage, but not those that have something in front of it; for that, you would need to include a *.
That is, it blocks all URLs that start with /testpage, such as yourwebsite.com/testpage/ or yourwebsite.com/testpage-image/contact.
But we would need a * in front of it:
Disallow: /*testpage
if we want to block, for example, yourwebsite.com/example-testpage or yourwebsite.com/category/testpage.
Folder lock
If you want to block the /testpage/ folder, you must place a slash at the end of the directive, as follows:
Disallow: /testpage/
This way, we block all the URLs inside that folder, such as:
yourwebsite.com/testpage/
yourwebsite.com/testpage/images/
But we would not block URLs that do not contain exactly that folder:
yourwebsite.com/test-pictures/portfolio
yourwebsite.com/index/testpage
yourwebsite.com/tests-page
Another example: imagine that you clone your website into a subfolder on the server called /cop.
If you put:
Disallow: /cop
not only will you be blocking that subfolder, but also any page whose path starts with /cop, such as /copy-backup/, /copper-cookware, or /copier-epson/.
The solution is to block only the folder itself by putting a / at the end.
That is to say:
Disallow: /cop/
And as always, if we want to block all URLs that contain /testpage/ regardless of its position, we must use the following:
Disallow: /*/testpage/
And remembering the $, if what we want is to block all URLs that end in testpage, we should use:
Disallow: /*testpage$
Allow
The Allow directive is the opposite of Disallow and is used to grant access to specific parts that were previously blocked by a Disallow.
For example, it is common to block the /wp-content/plugins/ folder, since we do not want search engines to waste time there, but Google indicates that it should have access to the .css and .js files.
As these files live in that folder, we must permit crawling of them as follows:
Disallow: /wp-content/plugins/
Allow: /wp-content/plugins/*.js
Allow: /wp-content/plugins/*.css
Imagine that you want to block the entire blog except for one post; you can apply the following:
User-agent: *
Disallow: /blog
Allow: /blog/post-allowed
How to Optimize the Robots.txt File in WordPress
There is no fixed rule, and you have to be careful when copying the robots.txt of other websites, as it can be counterproductive.
An example of standard robots.txt with some rules can be the following:
# Block or allow access to attached content (if the installation is in /public_html).
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-content/plugins/
Disallow: /wp-content/themes/
Disallow: /wp-includes/
Disallow: /wp-admin/
# Prevent access to the different feeds generated by the site
Allow: /feed/$
Disallow: /feed
Disallow: /comments/feed
Disallow: /*/feed/$
Disallow: /*/feed/rss/$
Disallow: /*/trackback/$
Disallow: /*/*/feed/$
Disallow: /*/*/feed/rss/$
Disallow: /*/*/trackback/$
Disallow: /*/*/*/feed/$
Disallow: /*/*/*/feed/rss/$
# Prevent URLs ending in /trackback/ that serve as trackback URLs
Disallow: /*/*/*/trackback/$
# Avoid blocking CSS and JS files
Allow: /*.js$
Allow: /*.css$
# Block all PDFs
Disallow: /*.pdf$
# Block URL parameters
Disallow: /*?
# List of bots you should allow
User-agent: Googlebot-Image
Allow: /wp-content/uploads/
User-agent: Adsbot-Google
Allow: /
User-agent: Googlebot-Mobile
Allow: /
# List of blocked bots
User-agent: MSIECrawler
Disallow: /
User-agent: WebCopier
Disallow: /
User-agent: HTTrack
Disallow: /
User-agent: Microsoft.URL.Control
Disallow: /
User-agent: libwww
Disallow: /
User-agent: Baiduspider
Disallow: /
User-agent: GurujiBot
Disallow: /
User-agent: hl_ftien_spider
Disallow: /
User-agent: sogou spider
Disallow: /
User-agent: Yeti
Disallow: /
User-agent: YodaoBot
Disallow: /
# Disallow unnecessary pages
Disallow: /thanks-for-subscribing
# We add an indication of the location of the sitemap
Sitemap: https://yourwebsite.com/sitemap_index.xml
Note: You may also want to disallow comments, tags, and so on. Every website is different, but ask yourself whether you want search engines to waste time crawling those sections; a sketch of what that could look like follows.
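For instance, if you decide that tag archives and internal search results add no SEO value on your particular site (an assumption, since every site differs), the extra lines might look like this:
# Hypothetical extras; adapt them to your own site
Disallow: /tag/
Disallow: /?s=
Disallow: /search/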
Do You Need a Robots.txt File?
This is the question you can ask yourself.
As we mentioned, having a robots.txt is not essential for small sites, although our recommendation is to use it, since it can help improve your rankings.
A well-crafted robots.txt can help you:
- During maintenance tasks, when you may want to include a Disallow: /
- Prevent Google from wasting your crawl budget, increasing the likelihood of deeper and better access to your relevant pages.
- Avoid duplicate content, for example by preventing pages such as the checkout or cart from being crawled in an online store (Disallow: /checkout/ and Disallow: /cart/), as sketched below.
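A minimal sketch of what that might look like in an online store's robots.txt; the exact paths depend on your e-commerce setup, so treat them as placeholders:
User-agent: *
Disallow: /checkout/
Disallow: /cart/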
Keep in mind that a robots.txt file only applies to its own host; subdomains are independent.
In other words, if you have created a subdomain, you have to create a specific robots.txt for that subdomain as well.
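For example, assuming a hypothetical blog subdomain, each host serves its own separate file:
yourwebsite.com/robots.txt
blog.yourwebsite.com/robots.txt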
Conclusion
The robots.txt can help with the crawling of your website, but you have to make sure that it works well. For example, a single stray character or wrong capital letter can do significant SEO damage.
Whether it is necessary depends on the site. On small websites with simple architectures, search engines crawl everything without problems.
Some well-known SEO websites even say not to bother with robots.txt in WordPress, since Google is smart enough to understand a website on its own.
However, we always say that in SEO, everything helps, no matter how small.
If you can help Google prioritize and understand your site better, saving it time, we recommend using a sensible robots.txt without going overboard.
That wraps up this guide on robots.txt. We recommend that you work on your robots.txt in WordPress and improve your crawling! If you have any difficulties, please join our Theme Rally Community to ask your questions.