The robots.txt file can be an excellent ally for improving SEO on your WordPress website. Increasing a site's traffic to reach more potential customers is one of the most common objectives today, and some professionals dedicate themselves exclusively to creating content and developing search-ranking strategies. So today, we will discuss how to edit and optimize a robots.txt file in WordPress.
What is Robots.txt?
The robots.txt is a small file that tells search engine bots which parts of a website they can and cannot crawl.
When Google, Bing, Yandex, or any other search engine visits a website, the robots.txt file is the first thing it requests.
Based on it, the bot decides what to crawl according to the directives we have set.
If you have not created one, the bot will crawl the site without any restrictions, which can sometimes be harmful, especially on websites with a lot of content.
In summary: a robots.txt file is not mandatory, but optimizing one is highly recommended because it helps your website's rankings.
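To give you an idea of what the file looks like, here is a minimal sketch of a robots.txt; the blocked path and the domain are just placeholders:
User-agent: *
Disallow: /private-folder/
Sitemap: https://yourwebsite.com/sitemap.xml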
Why Use Robots.txt?
You should make sure that Google does not index unimportant URLs on your website, marking as noindex pages such as the privacy policy, cookie policy, legal notice, or any URLs whose content nobody searches for on Google. Beyond that, you can use robots.txt to close off Google's access to those URLs or sections of the site more drastically.
Do not confuse noindex with robots.txt; their functions are different:
- Noindex: Tells search engines not to show a specific page in the SERPs; Google can still crawl the page and read its content.
- Robots.txt: Blocks crawling of the marked URLs, so Google cannot read the HTML at all, including any noindex tag it contains (see the example below).
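To make the difference concrete, here is a minimal sketch: the noindex goes in the page's own HTML head, while the blocking rule goes in robots.txt. The /privacy-policy/ path is just an example.
In the page's HTML head:
<meta name="robots" content="noindex">
In robots.txt:
User-agent: *
Disallow: /privacy-policy/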
Google recommends using noindex to keep URLs out of the SERPs:
If you don't want certain pages to appear in search results, "do not use the robots.txt file to hide your web page," as Google explains.
Although search engines usually obey the directives in robots.txt, they are not 100% effective. Google or other engines can ignore the instructions and crawl the blocked URLs.
According to Google, the information you give in your robots.txt file is instructions, not rules.
If multiple links point to a page, Google may index it and display it in its search results without knowing what it contains, even if you have blocked it in your robots.txt file.
How to Create a Robots.txt in WordPress
To check whether a website has a robots.txt file, just append /robots.txt to the domain.
Example: themrally.com/robots.txt
Creating the robots.txt file is very simple, and you can do it in several ways:
1. Creating a .txt file with the directives you want to apply and uploading it to the root of your website
– Open a new file in a plain-text editor such as Notepad, add the directives you want, and save it with the name robots.txt.
– Now you just have to upload it to the root of your website and that’s it.
2. Using a plugin like Yoast SEO.
– Access the Yoast tools option and click Create Robots.txt.
As you can see, it creates a predesigned robots.txt by default that you can save; the only thing missing is the sitemap line.
To add it, whether you use Yoast or Notepad, you must include the following line:
– If you use the Google XML Sitemaps plugin:
Sitemap: yourwebsite.com/sitemap.xml
– Or if you use the Yoast SEO sitemap:
Sitemap: yourwebsite.com/sitemap_index.xml
To insert the sitemap into the robots.txt, just copy the full sitemap URL, including your domain, into the file.
Remember to paste it at the end of the robots.txt.
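Putting it together, a minimal robots.txt with the sitemap line added at the end might look like this (assuming the Yoast sitemap path and a placeholder domain):
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://yourwebsite.com/sitemap_index.xml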
How to Tell Google That We Have Created the Robots.txt?
It is very simple: open Google Search Console and, in the Help section (?), type robots.txt.
The first result that appears will be "Test your robots.txt file." Click it, and the robots.txt tester will open.
Note: Keep in mind that uploading the robots.txt file is not the same as validating it in Google Search Console. To validate it, you must have created it first.
Choose a property (your domain) and copy and paste the robots.txt you created in Notepad or Yoast.
Click Submit and choose the option "Request the update from Google."
Make sure it doesn’t generate an error.
Now open your browser, type yourwebsite.com/robots.txt, and check that the file shows up.
Google Search Console keeps making improvements and changes to its interface, so it may move this tester option or offer it somewhere else later.
With this, your robots.txt is in place and you will be making Google's crawling easier, although everything in it can be customized further.
Note: Be careful, because every website is different, and an error in a simple * or / can stop Google from crawling essential parts.
Now we will look at the robots.txt created by default and the different options for creating a custom one.
How the Robots.txt Is Created by Default in WordPress
This is the default code that WordPress generates:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
* Remember that here you also have to add your sitemap line.
In it, we see Disallow (deny) and Allow (permit).
In it, there are three lines:
- User-agent: *: Indicates that the rules below apply to all search engine bots.
- Disallow: /wp-admin/: Prevents search engines from wasting time crawling the WordPress admin area.
- Allow: /wp-admin/admin-ajax.php: Within the above block, search engines may still crawl admin-ajax.php.
Note: If your website is blocked for search engines (an option in the WordPress settings that is often used while a site is being designed), your robots.txt will contain Disallow: /
You have to be careful because if it shows this, you are telling Google not to crawl anything on your website at all.
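For reference, this is what a full-site block looks like; you normally only want this during development or maintenance:
User-agent: *
Disallow: /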
The essential part of robots.txt lies in its wildcards and special symbols. You need to know how to use them, such as the * sign, the $, and so on. Let's take a look:
Wildcards to Use in a Robots.txt
When creating a robots.txt, you must respect uppercase, lowercase, and spaces. Indicating /wp-content is not the same as /WP-content.
Any stray space or misplaced symbol can greatly harm your website's rankings.
Hash (#)
This symbol is used to write comments that explain what the different lines mean.
Example: #blocking searches or #blocking trackbacks.
This way, you keep track of what you have indicated and the file stays organized.
User-agent
This indicates which bots you want to target. Normally you want all bots covered, so by default it is:
User-agent: *
But if, for example, you only want to address the Google bot or establish a specific rule for it, you have to add the line:
User-agent: Googlebot
Everything you add below will apply exclusively to the Googlebot.
Note: It is important to give each user-agent group its own rules; if several User-agent lines are stacked together above one set of rules, those rules apply to all of those bots. See the sketch below.
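As a hedged illustration (the /example/ path and the bot pairing are placeholders), these two layouts behave differently:
# One group: the rule applies to both bots
User-agent: Googlebot
User-agent: Bingbot
Disallow: /example/
# Separate groups: each bot only gets its own rules
User-agent: Googlebot
Disallow:
User-agent: Bingbot
Disallow: /example/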
If several groups match the same bot, the most specific (longest) user-agent match is the one that takes precedence.
Example:
User-agent: Googlebot-Image
Disallow: /
shall prevail over:
User-agent: Googlebot
Disallow:
(an empty Disallow, with no /, allows crawling)
The Asterisk (*)
This wildcard symbol represents any sequence of characters. For example, /*.pdf refers to all URLs that contain .pdf.
It will match both yourwebsite.com/document.pdf and yourwebsite.com/document.pdf?ver=1.1.
The use of the asterisk is very important.
Suppose you want to prevent search engines from accessing parameterized product category URLs on your site.
You can do it like this:
User-agent: *
Disallow: /products/t-shirts?
Disallow: /products/shirts?
Disallow: /products/coats?
Or make use of the * (the better option):
User-agent: *
Disallow: /products/*?
By using the *, you tell search engines not to crawl any product URL with parameters at all.
The Dollar Symbol ($)
The $ marks the end of a URL: if any character comes after the matched pattern, the rule will not apply. So if, for example, we indicate /*.pdf$, we are referring to all URLs that end in .pdf.
This includes yourwebsite.com/document.pdf but excludes yourwebsite.com/document.pdf?ver=1.1.
Use the "$" wildcard to mark the end of a URL.
For example, if you want to prevent search engines from accessing all the .pdf files on your site, your robots.txt file might look like this:
User-agent: *
Disallow: /*.pdf$
Noindex and Nofollow
Note: Since September 1, 2019, Google no longer supports the noindex directive in robots.txt, so do not mark noindex there; you can and should use the meta robots tag or the X-Robots-Tag HTTP header instead. In the same way, you should not use nofollow in robots.txt either.
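The meta robots tag goes in the page's HTML head (as shown earlier); the HTTP header alternative looks like the line below. How you add it depends on your server configuration, so treat this as a sketch rather than a copy-paste recipe:
X-Robots-Tag: noindex, nofollow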
Disallow
This directive prevents search engines from crawling a specific page, category, or structure.
Note: Even if a page you disallow is also marked as noindex, bots may still index its URL, but they cannot read its content.
Let us explain: if you block a page with robots.txt, search engines cannot read its HTML, so they never see the noindex tag or the content. They may still show the URL in the SERPs, with a description indicating that the page is blocked by robots.txt.
This can happen if, for example, they consider that the page has inbound links and is of good quality.
What Happens to the Link Juice with a Disallow?
The link juice (the strength a page passes on) will not be transferred to another URL if robots.txt blocks the first one.
We’ll explain:
Example: Imagine that we mark
Disallow: /service1/
Service1 receives links from the home page, and in turn service1 links to service2. Service1 continues to receive its strength, but it cannot pass it on to service2 because we have blocked that URL with robots.txt.
Blocking pages starting with a path
Disallow: /testpage
This blocks all URLs that start with /testpage, but not those that have something in front of it; for that, you would need to include a *.
That is, it blocks all URLs that start with /testpage, such as yourwebsite.com/testpage/ or yourwebsite.com/testpage-image/contact.
But we would need a * in front of it:
Disallow: /*testpage
if we want to block, for example, yourwebsite.com/example-testpage or yourwebsite.com/category/testpage.
Folder lock
If you want to block the /testpage/ folder, you must place a slash at the end of the directive, as follows:
Disallow: /testpage/
This way, we block all the URLs inside that folder, such as:
yourwebsite.com/testpage/
yourwebsite.com/testpage/images/
But we would not block URLs that do not contain exactly that folder:
yourwebsite.com/test-pictures/portfolio
yourwebsite.com/index/testpage
yourwebsite.com/tests-page
Another example: imagine that you clone your website into a subfolder on the server called /cop.
If you put:
Disallow: /cop
not only will you be blocking that subfolder, but also any page whose path starts with /cop, such as /copy-backup/, /copper-cookware, or /copier-epson/.
The solution is to block only the folder itself by putting a / at the end.
That is to say:
Disallow: /cop/
And as always, if we want to block all URLs that contain /testpage/ regardless of its position, we must use the following:
Disallow: /*/testpage/
And remembering the $, if what we want is to block all URLs that end in testpage, we should use:
Disallow: /*testpage$
Allow
The Allow directive is the opposite of Disallow and is used to grant access to specific parts that were previously blocked by a Disallow.
For example, it is common to block the /wp-content/plugins/ folder, since we do not want search engines to waste time there, but Google indicates that it should have access to the .css and .js files.
As these files live in that folder, we must permit crawling of them as follows:
Disallow: /wp-content/plugins/
Allow: /wp-content/plugins/*.js
Allow: /wp-content/plugins/*.css
Imagine that you want to block the entire blog except for one post; you can apply the following:
User-agent: *
Disallow: /blog
Allow: /blog/post-allowed
How to Optimize the Robots.txt File in WordPress
There is no fixed rule, and you have to be careful when copying the robots.txt of other websites, as it can be counterproductive.
An example of standard robots.txt with some rules can be the following:
# Block or allow access to attached content (if the installation is in /public_html).
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-content/plugins/
Disallow: /wp-content/themes/
Disallow: /wp-includes/
Disallow: /wp-admin/
# Prevent access to the different feeds generated by the site
Allow: /feed/$
Disallow: /feed
Disallow: /comments/feed
Disallow: /*/feed/$
Disallow: /*/feed/rss/$
Disallow: /*/trackback/$
Disallow: /*/*/feed/$
Disallow: /*/*/feed/rss/$
Disallow: /*/*/trackback/$
Disallow: /*/*/*/feed/$
Disallow: /*/*/*/feed/rss/$
# Prevent URLs ending in /trackback/ that serve as trackback URLs
Disallow: /*/*/*/trackback/$
# Avoid blocking CSS and JS files
Allow: /*.js$
Allow: /*.css$
# Block all PDFs
Disallow: /*.pdf$
# Block URL parameters
Disallow: /*?
# List of bots you should allow
User-agent: Googlebot-Image
Allow: /wp-content/uploads/
User-agent: Adsbot-Google
Allow: /
User-agent: Googlebot-Mobile
Allow: /
# List of blocked bots
User-agent: MSIECrawler
Disallow: /
User-agent: WebCopier
Disallow: /
User-agent: HTTrack
Disallow: /
User-agent: Microsoft.URL.Control
Disallow: /
User-agent: libwww
Disallow: /
User-agent: Baiduspider
Disallow: /
User-agent: GurujiBot
Disallow: /
User-agent: hl_ftien_spider
Disallow: /
User-agent: sogou spider
Disallow: /
User-agent: Yeti
Disallow: /
User-agent: YodaoBot
Disallow: /
# Disallow unnecessary pages
Disallow: /thanks-for-subscribing
# We add an indication of the location of the sitemap
Sitemap: https://yourwebsite.com/sitemap_index.xml
Note: You may also want to disallow comments, tags, and so on. Every website is different, but ask yourself whether you want search engines to waste time crawling those sections; a sketch of what that could look like follows.
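For instance, if you decide that tag archives and internal search results add no SEO value on your particular site (an assumption, since every site differs), the extra lines might look like this:
# Hypothetical extras; adapt them to your own site
Disallow: /tag/
Disallow: /?s=
Disallow: /search/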
Do You Need a Robots.txt File?
This is the question you can ask yourself.
As we mentioned, having a robots.txt is not essential for small sites, although our recommendation is to use it, since it can help improve your rankings.
A well-crafted robots.txt can help you:
- During maintenance tasks, when you may want to include a Disallow: /
- Prevent Google from wasting your crawl budget, increasing the likelihood of deeper and better access to your relevant pages.
- Avoid duplicate content, for example by preventing pages such as the checkout or cart from being crawled in an online store (Disallow: /checkout/ and Disallow: /cart/), as sketched below.
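A minimal sketch of what that might look like in an online store's robots.txt; the exact paths depend on your e-commerce setup, so treat them as placeholders:
User-agent: *
Disallow: /checkout/
Disallow: /cart/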
Keep in mind that a robots.txt file only applies to its own host; subdomains are independent.
In other words, if you have created a subdomain, you have to create a specific robots.txt for that subdomain as well.
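For example, assuming a hypothetical blog subdomain, each host serves its own separate file:
yourwebsite.com/robots.txt
blog.yourwebsite.com/robots.txt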
Conclusion
The robots.txt can help with the crawling of your website, but you have to make sure that it works well. For example, a single stray character or wrong capital letter can do significant SEO damage.
Whether it is necessary depends on the site. On small websites with simple architectures, search engines crawl everything without problems.
Some well-known SEO websites even say not to bother with robots.txt in WordPress, since Google is smart enough to understand a website on its own.
However, we always say that in SEO, everything helps, no matter how small.
If you can help Google prioritize and understand your site better, saving it time, we recommend using a sensible robots.txt without going overboard.
That wraps up this guide on robots.txt. We recommend that you work on your robots.txt in WordPress and improve your crawling! If you have any difficulties, please join our Theme Rally Community to ask your questions.