Friday, March 28, 2025
In the previous posts about the Robots Exclusion Protocol (REP), we explored what you can already do with its various components, namely robots.txt and the URI-level controls. In this post we will explore how the REP can play a supporting role in the ever-evolving relationship between automated clients and the human web.
The REP, specifically robots.txt, became a standard in 2022 as RFC9309. However, the heavy lifting was done prior to its standardization: it was the test of time between 1994 and 2022 that made it popular enough to be adopted by billions of hosts and virtually all major crawler operators (excluding adversarial crawlers such as malware scanners). It is a straightforward and elegant solution for expressing preferences with a simple yet versatile syntax. In its roughly 30 years of existence it has barely had to evolve from its original form: if we only consider the rules that are universally supported by crawlers, it only gained an allow rule.
That doesn't mean that there are no other rules; any crawler operator can come up with their own rules. For example, rules like "clean-param" and "crawl-delay" are not part of RFC9309, but they're supported by some search engines, though not by Google Search. The "sitemap" rule, which again is not part of RFC9309, is supported by all major search engines. Given enough support, it could become an official rule in the REP.
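To make the distinction between standardized and non-standardized rules concrete, here is a minimal sketch that feeds a small robots.txt file to the parser in Python's standard library. The file contents, the paths, the sitemap URL, and the "ExampleBot" user agent are all made up for illustration. Python's parser is used only as a convenient stand-in for "some crawler"; it is not Google's parser, and its matching behavior may differ in details from RFC9309.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; paths and the sitemap URL are made up.
ROBOTS_TXT = """\
User-agent: *
Allow: /archive/public/
Disallow: /archive/
Crawl-delay: 5
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Rules that are part of RFC9309: allow and disallow.
public_ok = parser.can_fetch(
    "ExampleBot", "https://example.com/archive/public/post.html"
)
private_ok = parser.can_fetch(
    "ExampleBot", "https://example.com/archive/private.html"
)

# Rules outside RFC9309 that this particular parser happens to understand.
delay = parser.crawl_delay("ExampleBot")   # value of the crawl-delay line
sitemaps = parser.site_maps()              # list of sitemap URLs (Python 3.8+)

print(public_ok, private_ok, delay, sitemaps)
```

Whether a rule such as crawl-delay has any effect depends entirely on the crawler reading the file: this parser exposes it, while a crawler that doesn't support the rule simply ignores the line.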
Because the REP can, in fact, get "updates". It's a widely supported protocol and it should grow with the internet. Making changes to it is not impossible, but it's not easy; it shouldn't be easy, exactly because the REP is widely supported. As with any change to a standard, there has to be a consensus that the changes benefit the majority of the users of the protocol, both on the publishers' and the crawler operators' side.
Due to its simplicity and wide adoption, the REP is an excellent candidate for carrying new crawling preferences: billions of publishers are already familiar with robots.txt and its syntax, for example, so making changes to it comes more naturally to them. On the flip side, crawler operators already have robust, well-tested parsers and matchers (and Google also open sourced its own robots.txt parser), which means it's highly likely that there won't be parsing issues with new rules.
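The claim about parsing can be illustrated with a small sketch. It assumes a made-up rule name, "Example-new-preference", which is not part of RFC9309 or of any existing proposal, and again uses Python's standard library parser as a stand-in for an existing crawler-side parser. The point is only that a parser written before the new rule existed can skip the unknown line and keep honoring the rules it does know.

```python
from urllib.robotparser import RobotFileParser

# "Example-new-preference" is a hypothetical rule invented for this sketch.
ROBOTS_TXT = """\
User-agent: *
Example-new-preference: some-value
Disallow: /private/
"""


def parse_rules(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = parse_rules(ROBOTS_TXT)

# The unknown line is skipped; the known rules keep working as before.
private_allowed = parser.can_fetch(
    "ExampleBot", "https://example.com/private/page.html"
)
public_allowed = parser.can_fetch(
    "ExampleBot", "https://example.com/public/page.html"
)

print(private_allowed, public_allowed)
```

A crawler that wants to act on the new rule would still need to add support for it; tolerant parsing only means that publishers can start using a rule without breaking crawlers that don't know it yet.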
The same goes for the REP URI-level extensions, the X-Robots-Tag HTTP header and its meta tag counterpart. If there is a need for a new rule to carry opt-out preferences, they're easily extensible. How though?
The most important thing you, the reader, can do is to talk about your idea publicly and gather supporters for that idea. Because the REP is a public standard, no one entity can make unilateral changes to it; sure, they can implement support for something new on their side, but that won't become THE standard. But talking about that change and showing the ecosystem (both crawler operators and the publishing ecosystem) that it benefits everyone will drive consensus, and that paves the road to updating the standard.
Similarly, if the protocol is lacking something, talk about it publicly. The sitemap rule became widely supported in robots.txt because it was useful for content creators and search engines alike, which paved the road to adoption of the extension. If you have a new idea for a rule, ask the consumers of robots.txt and creators what they think about it, work with them to hash out the potential (and likely) issues they raise, and write up a proposal.
If your driver is to serve the common good, it's worth it.
Posted by Gary Illyes, Search Relations team