On a recent Search Off the Record podcast, hosts John Mueller and Martin Splitt pushed back on the idea promoted by AI SEOs that stripped-down, content-only versions of web pages (such as markdown files) are a better way to optimize for AI Search. They made the case that all the things AI SEOs want to remove are actually useful for ranking.
Non-Content Parts Of Web Pages Matter
The TL;DR of this part is that HTML is for browsers to render into a visual page for humans, as well as for screen readers to read.
Martin Splitt begins the discussion by explaining why plain HTML seems not to be the ideal way to provide content to AI agents and LLMs. The idea is that, in addition to the content, there's a lot of other code in the HTML that is irrelevant to an LLM or AI agent that may be visiting a site for the content.
The appeal of markdown, then, is that it can provide the content in a way that breaks free of all the HTML that's meant to make a web page visible for humans or readable by a screen reader.
Splitt explains:
“And I think that’s also why people think it’s good for LLMs, because you have less stuff, less tokens. And if you look at an HTML file without a browser rendering it, if you just look at the plain HTML in a text editor, basically, then it’s difficult to read the content, because there’s so much cruft, so much stuff in it. There’s all these HTML tags and all this maybe even inline styles and all that kind of stuff.”
He also praises markdown for its ability to still convey the meaning of the content, even as plain text:
“But if a Markdown render fails and you look at the Markdown file in a text editor, it still is structured and readable. Like a link is the link text, like the anchor text, in square brackets and then in normal brackets. It’s probably what I would do if text was all I had available.
If I was writing an email without the possibility to actually link things, I would probably mark up some kind of link text and then put some kind of way to say, like, and this is where you need to go to actually see that.
And I think this minimalism is probably what makes people think, yeah, this is great for a machine that needs to understand this content, unlike HTML.”
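Splitt is describing standard Markdown link syntax, `[anchor text](URL)`. As a rough illustration only, the sketch below uses Python's standard-library `html.parser` to rewrite HTML anchors in that form. `AnchorToMarkdown` and `anchors_to_markdown` are hypothetical names, and the sketch ignores cases a real HTML-to-Markdown converter has to handle, such as escaping brackets inside link text or malformed `</a>` tags.

```python
from __future__ import annotations

from html.parser import HTMLParser


class AnchorToMarkdown(HTMLParser):
    """Hypothetical helper: renders each <a href="..."> as [text](href)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []
        self._href: str | None = None  # set while inside an <a> element
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace in the anchor text, then emit Markdown syntax.
            text = " ".join("".join(self._text).split())
            self.links.append(f"[{text}]({self._href})")
            self._href = None


def anchors_to_markdown(html: str) -> list[str]:
    parser = AnchorToMarkdown()
    parser.feed(html)
    parser.close()
    return parser.links


html = '<p>See <a href="https://example.com/docs" class="btn">the <b>docs</b></a>.</p>'
print(anchors_to_markdown(html))  # ['[the docs](https://example.com/docs)']
```

With the tags and attributes such as `class` gone, the link still reads naturally in a text editor. That is the property Splitt points to.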
Converting HTML To Text Is Trivial
Mueller and Splitt noted that, despite how complex HTML looks, crawling it and making sense of it is trivial and very easy to do. This is where the selling point of markdown for LLMs, that it simplifies crawling and indexing content, completely breaks down.
John Mueller explains:
“I think the big thing is that the web with HTML and everything has been around for a really long time, longer than Markdown. And all of the crawlers out there have practiced with HTML. And converting HTML into text is trivial. There are tons of libraries out there that can do that for you. So if you think about what an average web crawler might look for or might need to find on a page to be able to understand it, then probably that’s just HTML.”
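Mueller's point about the many available libraries can be illustrated with nothing more than Python's standard-library `html.parser`. The sketch below is a minimal stand-in for a real extractor. Libraries such as BeautifulSoup, through `get_text()`, handle far more edge cases. The `SKIP` and `BLOCK` tag lists and the function names were chosen for this example and are not any crawler's actual rules. Comparing the character counts of the raw HTML and the extracted text gives a rough proxy for Splitt's "less tokens" point, but it is not a real tokenizer count.

```python
from html.parser import HTMLParser

# Assumed tag lists for this sketch: contents of SKIP are never visible text,
# and BLOCK elements get a space so words in adjacent blocks don't run together.
SKIP = {"script", "style", "noscript", "template"}
BLOCK = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6",
         "header", "footer", "nav", "main", "section", "article", "td", "th"}


class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside a SKIP element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self._skip_depth += 1
        elif tag in BLOCK:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in BLOCK:
            self.parts.append(" ")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse the whitespace left behind by markup and indentation.
    return " ".join("".join(parser.parts).split())


page = """<html><head><title>Demo</title>
<style>body { color: red }</style>
<script>var x = 1;</script></head>
<body><nav><a href="/">Home</a></nav>
<main><h1>Hello</h1><p>Converting <b>HTML</b> to text is easy.</p></main>
</body></html>"""

text = html_to_text(page)
print(text)  # Demo Home Hello Converting HTML to text is easy.
print(len(page), "characters of HTML ->", len(text), "characters of text")
```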
Markdown Fails For Content Discovery
Discovery is when a crawler visits a web page and discovers other web pages within a single website, and also from website to website.
Splitt said that markdown focuses on just one piece of the page: the content itself. He explained that this makes it harder for search engines to see a web page in the context of how it connects to the rest of a website’s content through links, which help discovery.
He explained:
“Yeah, and I mean, the other thing is, yes, it’s good that Markdown is usually then focusing on a piece of content, but HTML with all the links and navigation and the headers and all that kind of stuff that kind of gets stripped out in the Markdown files that make up the website are important to understand the structure and how this connects to the rest of the site.
So I guess that’s also a bad thing. If we were to lose this, that’s probably not so good for crawling and discovery, huh?”
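To make the discovery point concrete, here is a minimal sketch of how a crawler might pull follow-up URLs out of a page's HTML, again using only the Python standard library. It rests on simple assumptions: only `<a href>` links count, and a link is "internal" if it is on the same host name. It does not describe how Googlebot or any AI crawler actually works. `LinkCollector` and `discover_links` are hypothetical names.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects absolute http(s) URLs from <a href> attributes, in page order."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL and drop #fragments,
        # so /guides/markdown and /guides/markdown#intro count as one page.
        absolute, _fragment = urldefrag(urljoin(self.base_url, href))
        if urlparse(absolute).scheme in ("http", "https") and absolute not in self.links:
            self.links.append(absolute)


def discover_links(html: str, base_url: str) -> tuple[list[str], list[str]]:
    """Return (internal, external) links, split by host name."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    host = urlparse(base_url).netloc
    internal = [u for u in parser.links if urlparse(u).netloc == host]
    external = [u for u in parser.links if urlparse(u).netloc != host]
    return internal, external


page = """<nav><a href="/">Home</a> <a href="/guides/">Guides</a></nav>
<main><p>Read <a href="/guides/markdown#intro">this guide</a> or
<a href="https://developers.google.com/search">Google's docs</a>.</p>
<a href="mailto:hi@example.com">Email</a></main>"""

internal, external = discover_links(page, "https://example.com/blog/post")
print(internal)
print(external)
```

If a markdown version kept only the `<main>` content, the navigation links to `/` and `/guides/` would disappear from what a crawler sees. That is the kind of discovery loss Splitt describes.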
Takeaway
Reading patents and research papers, it becomes clear that search engines see a website as a collection of individual web pages, but also as groups of web pages that belong to sections and categories, and also as the entire website as a whole. Zoom out, and the website is just one element among thousands and thousands of other websites in a neighborhood of websites, self-organized by links into categories and quality levels.
For SEO, we have to understand a site from both the zoomed-out and zoomed-in views to conceptualize how all the pieces fit together, because that’s what search engines do.
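The zoomed-in and zoomed-out views can be pictured as nested groupings: pages within sections within sites. The sketch below is a deliberately crude illustration that treats the first URL path segment as the "section." That choice was made only for this example. The text above describes websites organizing themselves through links, which this sketch does not model. `group_by_section` is a hypothetical helper.

```python
from __future__ import annotations

from collections import defaultdict
from urllib.parse import urlparse


def group_by_section(urls: list[str]) -> dict[str, dict[str, list[str]]]:
    """Group URLs into site -> section -> pages, using the first path segment
    as a stand-in for a site section (an assumption made for this sketch)."""
    sites = defaultdict(lambda: defaultdict(list))
    for url in urls:
        parsed = urlparse(url)
        segments = [s for s in parsed.path.split("/") if s]
        section = segments[0] if segments else "(root)"
        sites[parsed.netloc][section].append(url)
    # Convert the nested defaultdicts to plain dicts for printing.
    return {site: dict(sections) for site, sections in sites.items()}


urls = [
    "https://example.com/",
    "https://example.com/guides/markdown",
    "https://example.com/guides/html",
    "https://example.com/news/podcast-111",
    "https://other.example/about",
]
for site, sections in group_by_section(urls).items():
    for section, pages in sections.items():
        print(site, section, len(pages))
```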
AI-based SEO seems to be hung up on making it easy for LLMs and AI agents to crawl and index content. Crawling and indexing are valid concerns. But by insisting on markdown files, AI SEOs are not considering the fundamentals of discovery, or how trivial it is to extract content from an HTML web page, which makes markdown files redundant.
Aside from the above issues, there is also the issue of trustworthiness. There used to be a thing called the keywords meta tag that some search engines used to get a hint about what a web page was about. Naturally, site owners and SEOs used it to dump in all the keywords they wanted to rank for, regardless of the content.
I’m not saying that SEOs and website owners are untrustworthy, but search traffic is money, and people are going to do what they’re going to do. So the bottom line is that search engines will never trust markdown content and use it as the canonical version when it’s trivial to crawl the page and extract the original content from the HTML.
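The keywords meta tag problem is easy to demonstrate. A self-declared signal can say anything, and the only way to verify it is to compare it with the content the page actually shows. The sketch below is a hypothetical audit, not a documented search-engine check. It reads `<meta name="keywords">` and lists declared keywords that never appear in the visible text. Because it uses plain substring matching, it would also match partial words. `KeywordAudit` and `unsupported_keywords` are names invented for this example.

```python
from html.parser import HTMLParser


class KeywordAudit(HTMLParser):
    """Collects <meta name="keywords"> values and the page's visible text."""

    def __init__(self):
        super().__init__()
        self.keywords = []
        self.text_parts = []
        self._skip_depth = 0  # > 0 inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "keywords":
            content = a.get("content") or ""
            # Lowercase each keyword and normalize its internal whitespace.
            self.keywords = [" ".join(k.lower().split()) for k in content.split(",") if k.strip()]
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_parts.append(data)


def unsupported_keywords(html: str) -> list[str]:
    """Declared keywords that never appear in the visible text (case-insensitive)."""
    audit = KeywordAudit()
    audit.feed(html)
    audit.close()
    text = " ".join(" ".join(audit.text_parts).lower().split())
    return [k for k in audit.keywords if k not in text]


page = """<head><meta name="keywords" content="Markdown, AI SEO, cheap flights"></head>
<body><p>Why markdown files are not an AI SEO shortcut.</p></body>"""
print(unsupported_keywords(page))  # ['cheap flights']
```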
Circling back to what Mueller and Splitt discussed, Google’s position is that the AI SEO insistence on markdown strips away a significant amount of context that matters.
Watch Search Off The Record Episode 111 here: