llms.txt explained: should you implement it in 2026?
By Cited Research Team · Published April 16, 2026 · Updated April 2026
Key Takeaways
- llms.txt is a proposed plain-text file that curates a website's content for AI crawlers and generative engines. It lives at the site root, like robots.txt.
- Only 7.4% of Fortune 500 companies (37 in total) have implemented llms.txt (ProGEO.ai, March 2026). Adoption is an open question.
- No major AI engine (ChatGPT, Perplexity, Google AI Overviews, Gemini, Claude) has publicly committed to reading llms.txt as a ranking signal in 2026.
- 76% of AI citations go to external sources beyond the brand's own domain (Slate HQ AI Citations Study, 2026) — which limits how much any on-site file can move citation share.
- Cited's position: low-cost to implement, no demonstrated citation lift, defensible as a hedge but not a lever. Ship it in 30 minutes if your stack supports it; skip it if it requires engineering cycles.
llms.txt is a proposed content-curation file designed to give AI crawlers a clean, Markdown-formatted map of a website's most important pages. It was introduced by Jeremy Howard of Answer.AI in September 2024 and has been adopted by 7.4% of the Fortune 500 — 37 companies as of March 2026 (ProGEO.ai, March 2026). Whether it moves AI citation share is a separate question with no peer-reviewed answer yet.
What is llms.txt?
llms.txt is a plain-text file that lives at the root of a website (e.g., cited.so/llms.txt) and lists the site's most AI-relevant URLs with Markdown formatting and short descriptions. It was proposed as an open standard by Jeremy Howard of Answer.AI in September 2024. Its stated goal is to give large language models a curated, low-noise view of a site's content rather than relying on open crawl and retrieval.
The file is not mandated, enforced, or read by any major AI engine as a documented ranking signal in 2026. OpenAI, Anthropic, Google, and Perplexity have not published guidance confirming they parse llms.txt. Adoption is organic and speculative — a hedge, not a contract.
How does llms.txt differ from robots.txt?
robots.txt tells crawlers what they may and may not crawl. llms.txt tells AI engines what the site considers important and how to read it.
| File | Purpose | Syntax | Enforced by | Standard status (2026) |
|---|---|---|---|---|
| robots.txt | Crawl control | Plain-text directives (User-agent, Disallow) | All major crawlers, RFC 9309 (2022) | IETF standard |
| sitemap.xml | URL discovery | XML list of URLs + lastmod | Search engines | Sitemaps.org protocol, 2008 |
| llms.txt | AI-curation hint | Markdown with H1, H2, bullet links | No engine has committed | Proposed, Sep 2024 |
llms.txt is closer in spirit to a curated sitemap or a "best of" index than to a crawl directive. It does not block, allow, or throttle — it just highlights.
What does an llms.txt file look like?
An llms.txt file opens with an H1 naming the site, an optional blockquote summary, and then H2 sections grouping the most important pages as bulleted Markdown links with descriptions. The format is specified at llmstxt.org.
A minimal example:

```text
# Cited
> Cited is a GEO agency that gets brands cited by ChatGPT, Perplexity, and Google AI Overviews.

## Core pages
- [AI Visibility Audit](https://cited.so/audit): Free 48-hour diagnostic.
- [How GEO works](https://cited.so/how-it-works): The methodology.

## Articles
- [What is GEO?](https://cited.so/blog/what-is-geo): Definitional 2026 guide.
- [GEO vs SEO](https://cited.so/blog/geo-vs-seo): Signal-by-signal comparison.
```
A parallel llms-full.txt file can include full-text content for deeper ingestion. Implementation takes roughly 30 minutes on a static site and marginally longer on a CMS.
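To make the format concrete, here is a minimal Python sketch of how an agent-side consumer might parse an llms.txt file into its title and H2 sections. This assumes the llmstxt.org layout (H1, optional blockquote, H2 sections of bulleted Markdown links); `parse_llms_txt` is a hypothetical helper, not part of any published engine pipeline.

```python
import re

def parse_llms_txt(text: str) -> dict:
    """Parse an llms.txt body into {'title': str, 'sections': {name: [(label, url), ...]}}."""
    title = None
    sections = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# ") and title is None:
            title = line[2:]                  # first H1 names the site
        elif line.startswith("## "):
            current = line[3:]                # H2 opens a new section
            sections[current] = []
        elif line.startswith("- ") and current is not None:
            # Bulleted Markdown link: - [label](url): optional description
            m = re.match(r"- \[(.+?)\]\((.+?)\)", line)
            if m:
                sections[current].append((m.group(1), m.group(2)))
    return {"title": title, "sections": sections}

sample = """# Cited
> Cited is a GEO agency.

## Core pages
- [AI Visibility Audit](https://cited.so/audit): Free 48-hour diagnostic.
"""
parsed = parse_llms_txt(sample)
```

In practice an agent would fetch `https://example.com/llms.txt` over HTTP first; the parsing step above is the only part the spec constrains.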
Who has adopted llms.txt?
Adoption is early and concentrated. 7.4% of Fortune 500 companies — 37 in total — had implemented llms.txt as of March 2026 (ProGEO.ai research, March 2026). That figure is the first published industry benchmark, and it frames current adoption as "novel early-mover" rather than "standard practice."
Implementation examples include Anthropic, Mintlify, Hugging Face, and Cloudflare. Mintlify auto-generates llms.txt for customer documentation sites — a significant chunk of the 37 Fortune 500 adopters appear to have arrived via Mintlify rather than greenfield implementation. Cloudflare rolled out Auto llms.txt as a feature in late 2025 for its hosted sites, which will likely accelerate the adoption curve through 2026.
Do AI engines actually read llms.txt?
No major engine has confirmed it. OpenAI, Anthropic, Google DeepMind, and Perplexity have not published guidance stating that ChatGPT, Claude, AI Overviews, Gemini, or Perplexity read llms.txt as a ranking input. Informal testing by SEO practitioners has produced mixed signals: some engines appear to fetch the file, but none has documented using it.
This matters because citation share is measured, not declared. 56% of AI citations come from third-party pages outside the brand's own domain (AirOps, 2026; directionally consistent with Slate HQ's 76% finding, 2026). A file served from your root doesn't influence those off-site signals. At best, llms.txt helps an engine that is already crawling your site index the most relevant pages more efficiently.
What's the argument for implementing it?
Three defensible arguments, ranked by strength:
- Low-cost hedge. Implementation takes 30 minutes. If a major engine begins weighting llms.txt in late 2026 or 2027, early movers have it. ProGEO.ai's 7.4% number (March 2026) will be obsolete inside 12 months if adoption follows the robots.txt or sitemap.xml curves.
- Crawl efficiency for AI agents. Autonomous AI agents (Perplexity's research mode, Claude's web_search_20260209, OpenAI's Deep Research) perform multi-step retrieval. A curated page list reduces tokens spent on low-relevance URLs. This is inferred, not measured, but it is mechanically consistent with how agentic retrieval works.
- Documentation signal. Sites that ship llms.txt also tend to ship FAQPage, Article, HowTo, Organization schema, and author bylines. Those correlate with AI citations: 71% of ChatGPT-cited pages use schema markup (industry crawl sample, 2026). llms.txt is part of a broader "AI-ready documentation" stance, and stance correlates with citation performance.
What's the argument against it?
Two concrete counter-arguments:
- No demonstrated citation lift. Cited has not found a single 2026 study demonstrating a causal lift in citation share from llms.txt adoption. The available evidence is correlational, observational, and small-sample. The 7.4% Fortune 500 adoption number (ProGEO.ai, 2026) tells us who has it, not what it did for them.
- Off-site citations dominate. 56% of AI citations come from third-party sources (AirOps, 2026); 85% of brand mentions in AI responses are from pages outside the brand's own domain (AirOps, 2026). An on-site file cannot move those signals. Unlinked brand mentions correlate r=0.664 with AI citations; backlinks correlate r=0.218 (Ahrefs 75K-brand study, 2025). The highest-leverage GEO signals live off your site.
A third, softer objection: llms.txt fragments the ecosystem. robots.txt already covers crawl control; sitemap.xml already covers URL discovery. Adding a third root file with no enforcement authority risks becoming a spec nobody reads.
Where does llms.txt break down?
For brands with significant technical debt or CMS-locked publishing, implementing llms.txt is non-trivial. For brands in regulated verticals (financial services, health) where legal review gates every published file, the 30-minute task becomes a 3-week task.
llms.txt also doesn't solve the volatility problem. 40–60% of domains cited in AI responses are completely different one month later (Conductor + Superlines, 2026); only 20% of brands remain visible across five consecutive query runs (Profound AI Search Volatility, 2026). A static curated file can't counteract engine-side ranking drift.
And crucially, llms.txt is useless for brands that are invisible to AI today. If ChatGPT and Perplexity don't cite your site at all, ensuring they see a well-organized index changes nothing. The prior question — "are the right off-site signals present to bring AI crawlers to you in the first place" — is where the budget should go.
When should you implement it?
Cited's position: implement llms.txt if your stack makes it trivial (static site generator, Mintlify-hosted docs, Cloudflare-served pages) and skip it if it requires engineering cycles. It's a defensible hedge, not a lever.
Prioritize higher-leverage GEO moves first:
- Structured extractable content (sequential H2s, 40–60 word answer capsules, lists, tables).
- Entity density (15+ named Knowledge Graph entities per 1K words, per Ziptie.dev, 2026).
- Off-site signal (Wikipedia, Reddit, LinkedIn, earned media, directories).
- Schema stacking (Article + FAQPage + HowTo + Organization).
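To illustrate the schema-stacking item above, a minimal Article + FAQPage JSON-LD fragment might look like the following. All values are placeholders, and the snippet would sit in a `<script type="application/ld+json">` tag in the page head:

```json
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Article",
      "headline": "llms.txt explained",
      "author": { "@type": "Organization", "name": "Cited Research Team" },
      "datePublished": "2026-04-16"
    },
    {
      "@type": "FAQPage",
      "mainEntity": [
        {
          "@type": "Question",
          "name": "Does Google read llms.txt?",
          "acceptedAnswer": {
            "@type": "Answer",
            "text": "Google has not documented reading llms.txt for AI Overviews or Gemini."
          }
        }
      ]
    }
  ]
}
```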
After those ship, llms.txt is a 30-minute add. For the full GEO stack, see What is Generative Engine Optimization? and GEO vs SEO: what's actually different in 2026. To measure whether any of this is working, see Citation share: the GEO metric that replaces rankings. Or run a free AI Visibility Audit to get a baseline before you invest.
FAQ
Does Google read llms.txt? Google has not documented reading llms.txt for AI Overviews or Gemini. Google uses its own crawl infrastructure, Knowledge Graph, and YouTube-transcript pipeline for AI Overview source selection (Ziptie.dev, 2026). There is no evidence llms.txt influences AIO citation rates.
Does ChatGPT read llms.txt? OpenAI has not documented it. ChatGPT Search uses Bing's index with OpenAI-specific reranking; Wikipedia is 47.9% of top-cited sources (Hashmeta via Yext, 2026). Whether llms.txt moves the needle inside that pipeline is an open empirical question.
Should I implement llms-full.txt too? Optional. llms-full.txt includes the full-text content of the indexed pages, intended for deep-ingestion agents. It roughly doubles file size and maintenance burden. For most sites, llms.txt alone is sufficient; reserve llms-full.txt for documentation-heavy sites where full-content ingestion is the core use case.
Can llms.txt block AI crawlers?
No. robots.txt blocks crawlers. llms.txt only suggests content. To exclude AI crawlers, use robots.txt with User-agent: GPTBot, User-agent: PerplexityBot, User-agent: Google-Extended, User-agent: ClaudeBot and the appropriate Disallow directives.
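For example, a robots.txt that blocks those four crawlers site-wide might look like the following (crawler tokens as named above; verify current tokens against each vendor's documentation before relying on them):

```text
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /
```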
Sources
- ProGEO.ai — Fortune 500 llms.txt Implementation Research, March 2026 — https://www.globenewswire.com/news-release/2026/03/31/3265644/0/en/ProGEO-ai-research-finds-7-4-of-the-Fortune-500-have-implemented-llms-txt.html
- llmstxt.org — The llms.txt specification — https://llmstxt.org
- Answer.AI — Jeremy Howard's original llms.txt proposal, September 2024 — https://www.answer.ai/posts/2024-09-03-llmstxt.html
- AirOps — LLM Brand Citation Tracking, 2026 — https://www.airops.com/blog/llm-brand-citation-tracking
- Slate HQ — AI Citations Study, 2026 — https://slatehq.com/blog/ai-citations
- Ahrefs — 75K-Brand AI Mentions Study, 2025 — https://ahrefs.com/blog/
- Hashmeta via Yext — AI Visibility Report, 2026 — https://www.yext.com/blog/ai-visibility-in-2025-how-gemini-chatgpt-perplexity-cite-brands
- Ziptie.dev — AI Overviews Source Selection, 2026 — https://ziptie.dev/blog/google-ai-overviews-source-selection/
- Conductor — State of AEO/GEO Report, 2026 — https://www.conductor.com/academy/state-of-aeo-geo-report/
- Superlines — AI Search Statistics 2026 — https://www.superlines.io/articles/ai-search-statistics/
- Profound — AI Search Volatility, 2026 — https://www.tryprofound.com/blog/ai-search-volatility
- Cloudflare — Auto llms.txt Documentation — https://developers.cloudflare.com/
- Mintlify — llms.txt generator — https://mintlify.com/docs/llms-txt
- Anthropic — llms.txt Example — https://docs.anthropic.com/llms.txt
- Hugging Face — llms.txt Example — https://huggingface.co/llms.txt
- IETF RFC 9309 — Robots Exclusion Protocol, 2022 — https://datatracker.ietf.org/doc/html/rfc9309
- Sitemaps.org Protocol — https://www.sitemaps.org/protocol.html
About the author: The Cited Research Team tracks AI citation behavior across ChatGPT, Perplexity, Google AI Overviews, Gemini, and Claude. Cited is a GEO agency that gets brands recommended by AI without touching client websites. Run your free AI Visibility Audit.
Want Cited to run the audit for you?
50 target queries, 3 AI engines, competitor gap analysis. 48-hour turnaround. Free.
Get your free audit →