The Schema Delusion: Why LLMs Aren’t Reading Your Markup (And Why You Should Still Use It)

For years, the Search Engine Optimization (SEO) industry has treated Schema.org markup as a kind of digital talisman—a secret code that, when whispered into the <head> of a website, would magically compel search engines to favor your content. With the advent of Large Language Models (LLMs) and Generative Engine Optimization (GEO), this belief has only intensified. Many consultants now claim that JSON-LD structured data is the primary bridge that allows AI models to "understand" and cite brands.

However, a recent, irreverent experiment by SEO expert Mark Williams-Cook has set the industry ablaze, suggesting that the emperor has no clothes. By injecting intentionally broken, duck-themed JSON-LD into a webpage, Williams-Cook demonstrated that LLMs are not parsing schema as intended—they are merely reading it as messy text.

The Duck Experiment: A Case of Digital Gaslighting

The premise was deceptively simple. Williams-Cook created a webpage about a fictional company, "DUCK YEA," and embedded a JSON-LD script containing a fake address: 77 The Muddy Bank, South Pondshire. Crucially, the address appeared nowhere in the visible text of the page.

When queried, LLM-based tools such as ChatGPT and Perplexity confidently returned the address. The "GEO community" seized upon this as proof that the models were intelligently parsing the structured data. The reality was far more mundane: the models were not reading the schema in any structural sense; they were extracting text from everything they found on the page. Because the JSON-LD was just another block of text wrapped in curly braces, the model gave it the same weight as the rest of the HTML.
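One way to picture this "everything is text" behavior is the minimal Python sketch below. The page, its body copy, and the naive_text helper are hypothetical stand-ins, not the markup Williams-Cook actually used or the pipeline any vendor runs. The point is only that the address becomes reachable without any JSON parsing at all.

```python
import re

# Hypothetical page modeled on the experiment: the address lives only in
# the JSON-LD block and never appears in the visible body text.
PAGE = """<html><head>
<script type="application/ld+json">
{"@type": "MallardEnterprise", "name": "DUCK YEA",
 "address": "77 The Muddy Bank, South Pondshire"}
</script>
</head><body><h1>DUCK YEA</h1><p>We sell premium pond supplies.</p></body></html>"""


def naive_text(html: str) -> str:
    """Strip tags but keep everything between them, script bodies included."""
    without_tags = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", without_tags).strip()


text = naive_text(PAGE)
# The address is now part of the flat text, curly braces and all.
print("77 The Muddy Bank" in text)
```

Nothing here knows what JSON-LD is. A reader of the flattened text, human or model, can still pick out the address, which is all the experiment needed.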

This revelation has forced a necessary, if uncomfortable, reckoning within the SEO community. If the machines aren’t actually parsing the schema, what is the value of this markup in the age of generative AI?

Chronology of a Misunderstanding

The confusion surrounding schema and LLMs did not happen overnight. It is the result of years of industry-wide conflation between traditional search engine indexing and modern probabilistic AI models.

The Pre-LLM Era: For more than a decade, schema was the gold standard for disambiguation. It allowed Google to distinguish "Apple" (the fruit) from "Apple" (the tech giant) by linking to specific knowledge graph entities.
The Rise of Generative AI: As LLMs began to replace traditional search interfaces, the industry sought to apply old SEO logic to new technology. The narrative shifted from "schema helps search engines index" to "schema helps LLMs hallucinate less."
The "IndexNow" Conflation: Microsoft’s Fabrice Canel famously mentioned that GenAI values fresh content, and that IndexNow helps keep data updated. Many interpreted this as confirmation that LLMs use JSON-LD for "reference checking," ignoring that Canel was speaking about crawl efficiency, not semantic parsing.
The Great Debunking: Williams-Cook’s experiment showed that even with completely invalid, nonsensical schema types (e.g., MallardEnterprise), LLMs returned the data. The models were not validating the schema; they were reading the code block as plain text.
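The gap between validating schema and scraping it can be sketched in a few lines of Python. The KNOWN_TYPES allow-list, the snippet with its trailing comma, and both function names are assumptions made for illustration. A real validator would check against the full Schema.org vocabulary, and the experiment's actual markup may have been broken in other ways.

```python
import json
import re

# Small hypothetical allow-list standing in for the Schema.org vocabulary.
KNOWN_TYPES = {"Organization", "LocalBusiness", "Person", "Product"}

# Hypothetical snippet: an invented type plus a trailing comma that makes
# the block invalid JSON.
BROKEN_JSONLD = (
    '{"@type": "MallardEnterprise", '
    '"address": "77 The Muddy Bank, South Pondshire",}'
)


def structured_read(raw: str):
    """Return the address only if the block parses and uses a known type."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or data.get("@type") not in KNOWN_TYPES:
        return None
    return data.get("address")


def text_read(raw: str):
    """Pull the address out by pattern matching, with no parsing at all."""
    match = re.search(r'"address"\s*:\s*"([^"]+)"', raw)
    return match.group(1) if match else None


print(structured_read(BROKEN_JSONLD))
print(text_read(BROKEN_JSONLD))
```

A consumer that genuinely parsed the markup would come back empty-handed on the broken block, while the text-based reader returns the address anyway. Returning the address from nonsense markup is therefore evidence of the second behavior, not the first.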

The Mechanical Reality: Why Schema Doesn’t "Train" Models

To understand why schema is likely ignored during the training of frontier models, one must look at the "unglamorous" side of data science: the pre-training pipeline.

The Cleaning Process

When companies like OpenAI or Google build a foundation model, they draw on massive datasets, often trillions of tokens. Before the model sees a single word, the data undergoes rigorous cleaning. Scripts, CSS, analytics tags, and yes, <script type="application/ld+json"> blocks are typically stripped out by text-extraction libraries such as trafilatura. The goal is to isolate clean, human-readable prose.
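A cleaning step of this kind might look roughly like the sketch below, which uses only Python's standard-library HTMLParser rather than trafilatura. The ProseExtractor class, the clean helper, and the sample page are hypothetical. Production pipelines do far more (boilerplate removal, deduplication, quality filtering), so treat this as an illustration of the principle only.

```python
from html.parser import HTMLParser


class ProseExtractor(HTMLParser):
    """Collect text nodes, skipping anything inside <script> or <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def clean(html: str) -> str:
    """Return the human-readable prose of a page as a single string."""
    parser = ProseExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical sample: the address exists only inside the JSON-LD block.
SAMPLE = (
    '<html><head><script type="application/ld+json">'
    '{"address": "77 The Muddy Bank"}</script>'
    "<style>p{color:red}</style></head>"
    "<body><p>DUCK YEA sells pond supplies.</p></body></html>"
)

print(clean(SAMPLE))
```

After this pass the JSON-LD is simply gone, so any fact that lives only in the markup never reaches the training corpus.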

The Tokenization Problem

Even if a block of JSON-LD survived the cleaning process, it would face the hurdle of tokenization. LLMs do not read words; they read sequences of tokens. A structured JSON object, once tokenized, is a flat sequence with no parse tree and no schema-aware handling. The @type: Organization field becomes a run of tokens of the same kind as a casual forum post about SEO. The explicit disambiguation that schema provides for a database-driven search engine is diluted in the statistical soup of a neural network.
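The flattening can be illustrated with a toy tokenizer. This is a deliberately crude stand-in: real models use subword schemes such as byte-pair encoding, and the toy_tokenize function and both sample strings are hypothetical. What carries over is the shape of the output, a flat list with no nesting and no marker saying "this part was structured data".

```python
import re


def toy_tokenize(text: str) -> list[str]:
    """Split into word and punctuation pieces; a crude stand-in for BPE."""
    return re.findall(r"\w+|[^\w\s]", text)


# Hypothetical inputs: one JSON-LD object, one casual forum sentence.
jsonld = '{"@type": "Organization", "name": "DUCK YEA"}'
forum_post = "Should I set @type to Organization for DUCK YEA?"

print(toy_tokenize(jsonld))
print(toy_tokenize(forum_post))
```

Both inputs come out as ordinary lists of strings that share pieces like "type", "Organization", and "DUCK". The braces and quotes survive as tokens, but nothing tells the model to treat them as anything other than more text.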

Official Stances and Industry Contradictions

The irony of the current situation is best captured by Google’s own internal inconsistencies. On one side of the search results page, the "AI Overview" might confidently state that a business is open, while the "Google Business Profile" (the structured, curated database) explicitly displays a red "Permanently Closed" banner.

If Google cannot reconcile its own AI output with its own structured data, it is hard to believe that a third-party LLM (like Claude or ChatGPT) is relying on your website’s JSON-LD to verify facts.

Is Schema Worthless?

Absolutely not. However, its value proposition must be re-evaluated:

The Disambiguation Argument: Schema remains the best way to tell search engines (the traditional ones) who you are and what you do. It remains a core input for the Knowledge Graph.
The Future-Proofing Argument: While LLMs might not be "reading" your schema today, retrieval systems built on RAG (Retrieval-Augmented Generation) may come to prioritize structured data inputs. By keeping your schema clean, you ensure that if a model does query your page in the future, it will have the best possible data to consume.

Implications for SEO Strategy

For those building websites today, the takeaway is not to delete your schema, but to stop treating it as a "magic button."

Stop Over-Engineering: Do not spend thousands of dollars on custom schema setups that you hope will "trigger" AI citations. It is unlikely to yield a direct ROI in the current LLM landscape.
Focus on Entity Footprint: If you are a new brand, your goal is to become an entity. This involves consistent branding, Wikipedia/Wikidata presence, and authoritative external mentions. Schema supports this, but it is not the foundation.
The "Duck" Test: If you want to know if your schema is doing anything for your brand’s presence in AI, try testing it with a neutral prompt. If the model is giving you accurate information, it is likely because your content is clear, consistent, and widely referenced across the web, not because of a few lines of JSON-LD.

Conclusion: A Shift in Perspective

The industry’s obsession with schema as an "LLM signal" is a classic example of "vibe-based" marketing. We want the technology to be elegant—we want it to read our structured data and understand our brilliance. But the technology is currently a messy, probabilistic engine that prefers simple, human-written text.

Schema is a vital piece of infrastructure for the web, but it is not a direct line to an AI’s brain. By continuing to use it correctly, as a way to provide machine-readable clarity to crawlers, you are playing the long game. Just be careful not to let the hype cycle sell you a solution that doesn’t actually exist. As Williams-Cook’s experiment suggests, if you look closely enough at the code, you might just find a duck pointing out that we’ve all been looking in the wrong direction.