Crawl budget is how fast and how many pages a search engine wants to crawl on your website. It depends on the amount of resources a crawler wants to use on your site and the amount of crawling your server supports.
More crawling doesn’t mean you’ll rank better. However, if your pages aren’t crawled and indexed, they won’t rank at all.
Most websites don’t have to worry about crawl budget, but there are a few cases where you’ll want to take a look. Let’s look at some of these cases.
You usually don’t have to worry about crawl budget on popular pages. The pages that aren’t crawled often tend to be ones that are newer, aren’t well linked, or don’t change much.
Crawl budget can be an issue for newer sites, especially sites with many pages. Your server may be able to support more crawling. However, because your website is new and probably not very popular yet, a search engine may not want to crawl it very often. This is mostly a disconnect in expectations. You want your pages to be crawled and indexed, but Google doesn’t know if your pages are worth indexing and may not want to crawl as many pages as you’d like.
Crawl budget can also be an issue for larger websites with millions of pages or websites that are updated frequently. In general, if many pages are not being crawled or updated as often as you’d like, you should try to speed up the crawl. We’ll talk about how to do this later in this article.
If you want an overview of Google’s crawling activity and any issues it has identified, the best place to look is the Crawl Stats report in Google Search Console.
Here you will find various reports that can help you identify changes in crawling behavior, problems with crawling, and more information about how Google is crawling your website.
You’ll definitely want to look into any flagged crawl statuses like the ones shown here:
There are also timestamps of when pages were last crawled.
If you want to see hits from all bots and users, you need access to your log files. Depending on your hosting and setup, you may have access to tools like AWStats and Webalizer, as shown here on a shared host with cPanel. These tools show some aggregated data from your log files.
For more complex setups, you will need to access and store data from the raw log files, possibly from multiple sources. For larger projects, you may also need specialized tools such as the ELK (Elasticsearch, Logstash, Kibana) stack, which handles storage, processing, and visualization of log files. There are also log analysis tools such as Splunk.
URLs to crawl can be found by crawling and parsing pages, or from a variety of other sources including sitemaps, RSS feeds, URLs submitted for indexing in Google Search Console, or the Indexing API.
There are also multiple Googlebots that share the crawl budget. For a list of the various Googlebots crawling your website, see the Crawl Stats report in GSC.
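Before reaching for a full log stack, a few lines of scripting can already tell you how often Googlebot hits your site. The following is a minimal sketch, assuming your server writes the common "combined" access log format; the sample lines and the `googlebot_hits` helper are made up for illustration, and the check only looks at the user agent string, which can be spoofed.

```python
import re
from collections import Counter

# Matches the "combined" access log format commonly used by Apache and Nginx.
# Assumption: your log format may differ, in which case adjust the pattern.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count requests per (day, status) whose user agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        # Note: this trusts the user agent string and does not verify the IP.
        if match and "Googlebot" in match.group("agent"):
            hits[(match.group("day"), match.group("status"))] += 1
    return hits


# Illustrative sample lines, not real traffic.
sample = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /blog/ HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2023:13:56:01 +0000] "GET /blog/ HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (Windows NT 10.0)"',
    '66.249.66.1 - - [10/Oct/2023:14:02:10 +0000] "GET /old-page HTTP/1.1" 404 312 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

for (day, status), count in sorted(googlebot_hits(sample).items()):
    print(day, status, count)
```

In practice you would pass an open log file instead of `sample`, since the function accepts any iterable of lines.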
Each website has a different crawl budget made up of a few different inputs.
The crawl demand is simply how much Google wants to crawl on your website. More popular pages and pages with significant changes are crawled more often.
Popular pages or those with more links generally take precedence over other pages. Remember, Google needs to prioritize your pages in some way for crawling. Links are an easy way to find out which pages on your website are more popular. However, it’s not just your website, but all of the pages on every website on the internet that Google needs to figure out how to prioritize.
You can use the … Best by links report in Site Explorer to get an indication of which pages are likely to be crawled more often. It also shows you when Ahrefs last crawled your pages.
There is also a concept of staleness. When Google finds that a page is not changing, it crawls the page less often. For example, if they crawl a page and don’t see any changes after a day, they might wait three days before crawling again, ten days the next time, then 30 days, 100 days, and so on. There is no set amount of time Google waits between crawls, but crawling becomes less frequent over time. However, when Google detects large changes across the website or a website move, the crawl rate usually increases, at least temporarily.
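To make the back-off idea concrete, here is a toy sketch of a scheduler that waits longer each time a page is unchanged and resets when it changes. The interval list reuses the example numbers from the paragraph above; the `next_interval` function is hypothetical and is not how Google actually schedules crawls.

```python
# Illustrative intervals in days, taken from the example in the text.
# Google does not publish a fixed schedule; these numbers are assumptions.
INTERVALS_DAYS = [1, 3, 10, 30, 100]


def next_interval(step, page_changed):
    """Return (new_step, days_to_wait) for a hypothetical crawl scheduler."""
    if page_changed:
        step = 0  # fresh content: go back to frequent crawling
    else:
        step = min(step + 1, len(INTERVALS_DAYS) - 1)  # back off, capped at the end
    return step, INTERVALS_DAYS[step]


step = 0
waits = []
for changed in [False, False, False, True, False]:
    step, days = next_interval(step, changed)
    waits.append(days)

print(waits)  # [3, 10, 30, 1, 3]
```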
Crawl rate limit
The crawl rate limit is how much crawling your website can support. Websites can take a certain amount of crawling before they run into server stability problems, such as slowdowns or errors. Most crawlers will back off when they see these issues so they don’t harm the site.
Google adapts to the crawling status of the website. If the site is fine with more crawling, the limit will increase. If the website has problems, Google will slow down the speed at which it is crawled.
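As a rough illustration of this kind of adaptive behavior, the sketch below raises a crawl rate slowly while the site looks healthy and halves it when errors or slow responses show up. The function name, thresholds, and step sizes are all assumptions chosen for the example; Google’s real logic is not public.

```python
def adjust_crawl_rate(rate, status_codes, avg_response_ms,
                      max_rate=10.0, min_rate=0.1, slow_ms=1000):
    """Return a new requests-per-second rate for a hypothetical crawler.

    All thresholds here are illustrative assumptions, not published values.
    """
    # Treat 429 and any 5xx as a sign the server is struggling.
    server_errors = sum(1 for code in status_codes if code == 429 or code >= 500)
    if server_errors or avg_response_ms > slow_ms:
        return max(min_rate, rate / 2)  # back off sharply when the site has problems
    return min(max_rate, rate + 0.5)    # otherwise probe for a little more capacity


print(adjust_crawl_rate(2.0, [200, 200, 200], avg_response_ms=200))  # 2.5
print(adjust_crawl_rate(2.0, [200, 503, 200], avg_response_ms=200))  # 1.0
```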
There are a few things you can do to make sure your website can support additional crawling and to increase your website’s crawl demand. Let’s look at some of these options.
Speed up your server / increase resources
Essentially, Google crawls pages by downloading resources and then processing them on its end. Page speed as a user perceives it isn’t quite the same thing. What affects crawl budget is how quickly Google can connect and download resources, which has more to do with your server and its resources.
Remember, crawl demand is generally based on popularity or links. You can increase your budget by increasing the number of external and/or internal links. Internal links are easier because you control the site. For suggested internal links, see the Link opportunities report in Site Audit, which also includes a tutorial that explains how it works.
Keeping links to broken or redirected pages active on your website will have little impact on your crawl budget. Usually the pages linked here have a relatively low priority as they probably haven’t changed in a while. However, cleaning up problems is good for website maintenance in general and helps your crawl budget a bit.
You can easily find broken (4xx) and redirected (3xx) links on your website in the Internal pages report in Site Audit.
To find broken or redirected links in your sitemap, check the All issues report for the “3XX redirect in sitemap” and “4XX page in sitemap” issues.
Use GET instead of POST where you can
This one is a bit more technical because it involves HTTP request methods. Don’t use POST requests where GET requests work. It’s basically GET (pull) vs. POST (push). POST requests aren’t cached, so they do affect crawl budget, whereas GET requests can be cached.
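If you want to spot-check a sitemap yourself, the idea is simple: pull the URLs out of the sitemap, request each one without following redirects, and flag anything that returns 3xx or 4xx. The sketch below uses only the Python standard library; the helper names and the example.com sitemap are made up for illustration, and some servers don’t answer HEAD requests properly, so treat the results as a rough check.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the 3xx status stays visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def sitemap_urls(xml_text):
    """Extract the <loc> values from a standard XML sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def fetch_status(url):
    """Return the HTTP status for a URL (performs a real network request)."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 3xx, 4xx, and 5xx responses end up here


def classify(status):
    """Label a status code the way the sitemap issues above are grouped."""
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect"
    if 400 <= status < 500:
        return "broken"
    return "other"


# Illustrative sitemap; example.com URLs are placeholders.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/old-page</loc></url>
</urlset>"""

print(sitemap_urls(SAMPLE_SITEMAP))
```

To run the full check, you could loop over `sitemap_urls(...)` and print `classify(fetch_status(url))` for each URL.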
Use the Indexing API
If you need pages crawled faster, check whether you are eligible for Google’s Indexing API. Currently this is only available for a few use cases, like job postings or live videos.
Bing also has an Indexing API that is available to everyone.
What won’t work
There are a few things that are sometimes tried that don’t really help your crawl budget.
- Small changes to pages. Making small changes to pages, such as updating dates, spaces, or punctuation, in the hope that pages will be crawled more often. Google is pretty good at determining whether changes are significant or not, so these small changes are unlikely to affect crawling.
- Crawl-delay directive in robots.txt. This directive will slow down many bots. However, Googlebot doesn’t use it, so it has no effect there. We at Ahrefs do respect it, so if you ever need to slow down our crawling, you can add a crawl delay to your robots.txt file.
- Removing third-party scripts. Third-party scripts don’t count towards your crawl budget, so removing them won’t help.
- Nofollow. Okay, this one is iffy. In the past, nofollow links wouldn’t have used crawl budget. However, nofollow is now treated as a hint, so Google may choose to crawl these links.
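For the crawl-delay point above, here is what such a rule might look like and how a bot that respects it could read it. This is a minimal sketch using Python’s standard `urllib.robotparser`; the 10-second value and the "SomeOtherBot" name are arbitrary choices for the example, and the robots.txt content is illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: slow down AhrefsBot, leave everything else unrestricted.
# The delay value (conventionally in seconds) is an arbitrary example.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

ahrefs_delay = parser.crawl_delay("AhrefsBot")
other_delay = parser.crawl_delay("SomeOtherBot")

print(ahrefs_delay)  # 10
print(other_delay)   # None, because no delay is set for other bots
```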
There are only a few good ways to slow down Google’s crawling. There are some other adjustments you could technically make, such as slowing down your website, but I wouldn’t recommend those methods.
Slow adjustment, but guaranteed
The main control Google gives us to slow crawling is a rate limiter in Google Search Console. You can use the tool to slow the crawl rate, but it can take up to two days to take effect.
Fast adjustment, but with risks
If you need a faster solution, you can take advantage of the way Google adjusts its crawl rate to your website’s health. If you serve Googlebot a “503 Service Unavailable” or “429 Too Many Requests” status code, it will crawl more slowly or may stop crawling temporarily. However, you don’t want to do this for more than a few days, or pages may be removed from the index.
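Here is a rough sketch of what serving a temporary 503 to Googlebot could look like, written as a tiny WSGI app in Python. The `make_app` factory, the `is_overloaded` callback, and the one-hour `Retry-After` value are all assumptions for illustration; `Retry-After` is a standard HTTP header, but how a given crawler treats it is up to that crawler, and matching on the user agent string is only a simple heuristic.

```python
def make_app(is_overloaded, retry_after_seconds=3600):
    """Build a WSGI app that asks Googlebot to back off while the site is overloaded.

    `is_overloaded` is a hypothetical callback you would supply, for example
    one that checks server load. Keep this on for a few days at most.
    """
    def app(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "")
        if "Googlebot" in agent and is_overloaded():
            start_response("503 Service Unavailable", [
                ("Content-Type", "text/plain"),
                ("Retry-After", str(retry_after_seconds)),
            ])
            return [b"Temporarily unavailable, please retry later.\n"]
        # Everyone else (and Googlebot when healthy) gets the normal response.
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"Hello\n"]

    return app


app = make_app(is_overloaded=lambda: True)
```

To try it locally, you could serve `app` with `wsgiref.simple_server.make_server` from the standard library and send requests with different user agents.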
I want to reiterate that most people don’t have to worry about crawl budget. If you do have concerns, I hope this guide has been helpful.
I usually only look into it when there are problems with pages not being crawled and indexed, when I need to explain why someone shouldn’t worry about it, or when I happen to see something that concerns me in the Crawl Stats report in Google Search Console.
Have any questions? Let me know on Twitter.