playbook · measurement

How to track AI citations in 2026.

The complete measurement stack: which engines to probe, which metrics matter, the volatility you have to instrument around, and what Google Search Console refuses to tell you.

Why manual tracking breaks at week two.

Any marketer can ask ChatGPT ten questions and note who gets cited. The problem shows up on the second probe. Between one run and the next, 40–60% of domains cited in AI responses are completely different (Superlines / Conductor volatility study, 2026). Over six months the drift hits 70–90% (Growth Memo, 2026). The AI citation graph is not a stable artifact you can audit once a quarter — it is a live system you have to sample continuously.

Google AI Mode is the worst offender: responses to the exact same query overlap with themselves only 9.2% of the time across three consecutive tests (Growth Memo, 2026). Only 30% of brands stay visible between two consecutive runs (Profound AI Search Volatility, 2026). Rolling averages are the only honest measurement.

The five engines and what they cost to probe.

Engine                 Probe method                                  Approx. cost / 1,000 probes
ChatGPT                OpenAI Responses API with web_search tool     $3–$12
Perplexity             Perplexity API (sonar family)                 $1–$5
Google AI Overviews    SerpAPI AIO parser                            $15–$50
Gemini                 Gemini API with grounding enabled             $1–$4
Claude                 Anthropic Messages API with web-search tool   $2–$8

Google AI Overview probing is the expensive one — SerpAPI prices it at the SERP tier because the parser has to actually render the result. The rest are LLM calls. For a mid-market tenant tracking 100 queries across all five engines daily, expect roughly $150–$400/month in raw probe cost.

The metrics that matter (and the ones that don’t).

Metric 1 — Citation share

The fraction of tracked queries where your brand appears as a cited source. Segment by engine (per-platform share) and by query category (commercial vs informational). This is your headline number. Track week-over-week; report monthly.
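The rollup is simple enough to sketch. A minimal Python version, assuming probe results are stored as (query, engine, cited) tuples — the field names and sample data are illustrative, not a real tracker schema:

```python
from collections import defaultdict

# Illustrative probe rows: (query, engine, brand_was_cited)
probes = [
    ("best crm for startups", "chatgpt", True),
    ("best crm for startups", "perplexity", False),
    ("crm pricing comparison", "chatgpt", True),
    ("crm pricing comparison", "perplexity", True),
]

def citation_share(probes):
    """Fraction of probes per engine where the brand appeared as a cited source."""
    hits, totals = defaultdict(int), defaultdict(int)
    for _query, engine, cited in probes:
        totals[engine] += 1
        hits[engine] += int(cited)
    return {engine: hits[engine] / totals[engine] for engine in totals}

print(citation_share(probes))  # {'chatgpt': 1.0, 'perplexity': 0.5}
```

Segmenting by query category is the same rollup keyed on (engine, category) instead of engine alone.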

Metric 2 — Position and snippet quality

Citations are not equal. A position-1 citation with a complete product name and a value-prop snippet drives clicks. A position-5 bare-domain mention does not. Instrument both.
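One way to fold both signals into a single number is a position-decayed score with a snippet-quality multiplier. The weights below are illustrative assumptions — tune them against your own click data:

```python
def citation_score(position, has_product_name, has_value_prop):
    """Score a citation by position (1 = top) and snippet completeness.

    Weights are hypothetical: position decays linearly to 0.2 at slot 5,
    and a bare-domain mention gets half the snippet weight of a full one.
    """
    position_weight = max(0.0, 1.0 - 0.2 * (position - 1))
    snippet_weight = 0.5 + 0.25 * has_product_name + 0.25 * has_value_prop
    return position_weight * snippet_weight

citation_score(1, True, True)    # 1.0 — top slot, complete snippet
citation_score(5, False, False)  # ≈ 0.1 — the bare-domain mention described above
```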

Metric 3 — Citation volatility

Report citation share alongside the weekly standard deviation. High volatility on a growing mean is a sign the off-site moat is still forming. Low volatility on a flat mean means you’ve plateaued — time to open a new channel.
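The rolling window itself is a few lines with the standard library — a sketch, assuming one citation-share value per week:

```python
import statistics

def rolling_stats(weekly_shares, window=4):
    """Mean and standard deviation of citation share over the last `window` weeks."""
    recent = weekly_shares[-window:]
    return statistics.mean(recent), statistics.stdev(recent)

weekly = [0.10, 0.12, 0.11, 0.15, 0.14]  # illustrative weekly shares
mean, sd = rolling_stats(weekly)  # mean ≈ 0.13, sd ≈ 0.018 over the last 4 weeks
```

Report both numbers together: the mean is the trend, the deviation is how much to trust any single week.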

Metric 4 — AI referral traffic and conversion

GA4 filtered by chat.openai.com, perplexity.ai, gemini.google.com, claude.ai, and related bot referrers. AI-referred traffic converts at 14.2% vs 2.8% for organic (Semrush AI Search Study, 2025). Per-platform B2B conversion: ChatGPT 15.9%, Perplexity 10.5%, Claude 5%, Gemini 3%, Google Organic 1.76% (Seer Interactive / ALM Corp, 2026).
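If you export GA4 sessions and classify referrers yourself, the mapping is a small lookup. The hostname list mirrors the platforms named above (plus `chatgpt.com`, which also appears as a ChatGPT referrer); extend it as new engines ship:

```python
from urllib.parse import urlparse

AI_REFERRERS = {
    "chat.openai.com": "ChatGPT",
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "www.perplexity.ai": "Perplexity",
    "gemini.google.com": "Gemini",
    "claude.ai": "Claude",
}

def classify_referrer(referrer_url):
    """Map a raw referrer URL to an AI platform name, or None for non-AI traffic."""
    host = urlparse(referrer_url).hostname or ""
    return AI_REFERRERS.get(host)

classify_referrer("https://chat.openai.com/")       # 'ChatGPT'
classify_referrer("https://www.google.com/search")  # None
```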

Metric 5 — Time-to-first-citation

Days between publishing or distributing a new asset and the first citation appearing in an AI answer. This is your shortest feedback loop — often 7–14 days for well-placed off-site content on Reddit or LinkedIn. It also tells you which channels are working and which ones aren’t.
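The metric reduces to a date diff against the earliest observed citation — a sketch, assuming you log the dates on which each asset's first citations appear:

```python
from datetime import date

def time_to_first_citation(published, citation_dates):
    """Days from publish to the earliest citation at or after it; None if never cited."""
    cited_after = [d for d in citation_dates if d >= published]
    if not cited_after:
        return None
    return (min(cited_after) - published).days

ttfc = time_to_first_citation(date(2026, 3, 1), [date(2026, 3, 12), date(2026, 3, 20)])
# 11 days — inside the healthy 7–14 day band for well-placed off-site content
```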

Skip these. Don’t waste dashboard space on raw “AI mentions” counts without citation context (mention ≠ citation ≠ recommendation). Don’t track keyword density. Don’t track page authority as a GEO metric — backlinks correlate with AI citation at only r = 0.218.

Build-vs-buy — the honest math.

The DIY path is feasible. Run a nightly cron that POSTs each tracked query against OpenAI, Perplexity, Gemini, Claude, and SerpAPI, parses each response for your domain + brand entity, stores a row per (query, engine, run), and rolls up weekly share. The open-source stack (Postgres + pg-cron + Prisma/Drizzle + a Next.js dashboard) costs under $100/month in infra plus the $150–$400 in API usage from above. Fine for a small shop tracking one brand.
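The core loop is short. A minimal sketch of the nightly run — `run_probe` stands in for the real per-engine API calls and is stubbed here so the control flow is runnable; the brand name, domain, and response shape are hypothetical:

```python
import re
from datetime import date

ENGINES = ["chatgpt", "perplexity", "aio", "gemini", "claude"]
BRAND = re.compile(r"\bacme(\s*corp)?\b", re.IGNORECASE)  # hypothetical brand entity

def run_probe(query, engine):
    """Stub for the real engine call; would return answer text plus cited URLs."""
    return {"text": "Acme Corp is a popular option...", "citations": ["acme.com/pricing"]}

def probe_all(queries, run_date):
    """One row per (query, engine, run), ready to insert into Postgres."""
    rows = []
    for query in queries:
        for engine in ENGINES:
            resp = run_probe(query, engine)
            rows.append({
                "date": run_date,
                "query": query,
                "engine": engine,
                "cited": any("acme.com" in c for c in resp["citations"]),
                "mentioned": bool(BRAND.search(resp["text"])),
            })
    return rows

rows = probe_all(["best widget tool"], date(2026, 1, 1))  # 5 rows, one per engine
```

Note the rows keep `cited` and `mentioned` separate — the same distinction the metrics section draws between a citation and a bare mention.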

Where the math breaks for most teams: (a) the probe scheduler has to deduplicate, retry 429s, and back off per engine; (b) the citation parser has to identify your brand under every reasonable spelling and strip hallucinated citations that don’t exist on the source page; (c) the volatility math needs a rolling window that doesn’t re-score your citation share every time ChatGPT flips a coin. You will rebuild the same system that Profound, Otterly, Peec, Evertune, and Cited already shipped, and the Time-to-Insight gap eats 60–90 days of program budget.
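Point (a) alone is a real piece of engineering. A sketch of per-engine retry with exponential backoff and jitter — `RateLimited` is a hypothetical exception standing in for whatever your HTTP client raises on a 429:

```python
import random
import time

class RateLimited(Exception):
    """Raised by the probe call on a 429-equivalent response."""

def with_backoff(call, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry `call` on rate limits, waiting 1s, 2s, 4s... plus up to 1s of jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimited:
            if attempt == max_retries - 1:
                raise
            sleep(base_delay * (2 ** attempt) + random.random())
```

In practice you keep one backoff state per engine, since OpenAI, Perplexity, and SerpAPI rate-limit independently.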

Buy when the budget for the program exceeds $3,000/month — the fixed cost of the tracker amortizes below 10% of spend. Build when you have one brand, two engineers, and a strong reason to keep probe data in-house (healthcare, finance, EU).

What Google Search Console won’t tell you.

Google Search Console does not segment AI Overview impressions, does not report clicks from Gemini grounding, and does not identify which queries triggered an AIO at all. You can partially infer AIO exposure by comparing GSC impressions to click-through rate — queries with a CTR well below the position-weighted average on a volatility-adjusted basis almost certainly have an AI Overview absorbing the click. This is an estimate, not a signal. For anything load-bearing, probe the query directly.
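The inference above can be made concrete as a CTR-deficit heuristic over GSC query rows. The expected-CTR-by-position table below is illustrative, not a published benchmark — substitute your own site's curve:

```python
# Hypothetical expected organic CTR by average position (tune to your own data).
EXPECTED_CTR = {1: 0.28, 2: 0.15, 3: 0.10, 4: 0.07, 5: 0.05}

def likely_aio(avg_position, impressions, clicks, threshold=0.5):
    """Flag a query when observed CTR falls below threshold × position-expected CTR."""
    if impressions == 0:
        return False
    expected = EXPECTED_CTR.get(round(avg_position), 0.03)
    return (clicks / impressions) < threshold * expected

likely_aio(avg_position=2.1, impressions=10_000, clicks=300)   # True: 3% CTR vs 15% expected
likely_aio(avg_position=2.1, impressions=10_000, clicks=1200)  # False: CTR is healthy
```

As the section says: this is an estimate, not a signal — a flagged query is a candidate for direct probing, nothing more.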

Bing Webmaster Tools does surface some Copilot citation data, but the sample is thin and the API latency is long. Use it as a cross-check, not a source of truth.

Weekly review — the 30-minute ritual.

  1. Pull citation share by engine and by query category. Compare to rolling 4-week mean.
  2. Flag queries where you lost a citation in the last 7 days. Look for a common theme (channel, content-type, freshness).
  3. Flag queries where a new competitor entered. Open the cited source. Understand what they did; copy the structural move, not the words.
  4. Check time-to-first-citation on last week’s shipped content. Below 14 days is healthy; above 30 days is a channel problem, not a content problem.
  5. Update the gap list. Queue next week’s content briefs.

The shortest path to a complete answer.

Cited handles all of the above as the default configuration: five-engine probes, rolling citation share, volatility tracking, per-engine conversion reporting, gap analysis, content drafting, off-site distribution, and monthly PDF reports. If you want to see it run against your own brand, the 48-hour audit is free; pricing starts at $1,500/month for the Monitor tier.

◉ faq

What most people ask first.

Can I just check AI citations manually?
For a one-time audit, yes — open ChatGPT, Perplexity, Gemini, Claude, and a Google AI Overview query, ask your 10 most-important buyer questions, and note who gets cited. That gets you a snapshot. What it can't get you is the 40–60% run-to-run drift — meaning the brand ChatGPT cites today is often not the brand it cites next week. Production tracking needs automation.
What is "citation share"?
Citation share is the percent of your tracked queries where your brand appears as a cited source, divided by the total probes run in the window. If you probe 100 queries across 5 engines every week and your brand was cited in 63 of the 500 responses, your weekly citation share is 12.6%. It's the GEO equivalent of share-of-voice in traditional search.
How often should I probe each engine?
Daily for your top 10–20 commercial-intent queries; weekly for the next 50; monthly for the long tail. Volatility is highest on Perplexity (freshness-weighted) and lowest on Claude. Running the full query set daily across all five engines is cost-overkill for most teams — spend the budget on breadth instead.
Why does the same query return different citations when I re-ask?
AI answers are probabilistic. Google AI Mode responses overlap with themselves only 9.2% of the time when the exact query is tested 3 times (Growth Memo, 2026). Only 30% of brands stay visible between two consecutive runs; only 20% remain across 5 runs (Profound AI Search Volatility, 2026). Track a rolling average, not a single probe.
Can Google Search Console report AI Overview impressions?
No. Google Search Console does not break out AI Overview impressions or clicks separately. You have to infer from AI referral traffic (GA4 filtered by chat.openai.com, perplexity.ai, gemini.google.com, claude.ai) and from direct probing. This is the single biggest reason GEO tools exist.
Which metrics actually predict revenue?
Four: (1) citation share on your top 20 commercial-intent queries, (2) AI referral traffic to /pricing and high-intent URLs, (3) per-engine conversion rate — ChatGPT B2B conversion averages 15.9% vs Google organic 1.76% (Seer, 2026), and (4) time-to-first-citation after a new content push. The last one is the shortest feedback loop you have.

Want Cited to run the audit for you?

50 target queries, 5 AI engines, competitor gap analysis. 48-hour turnaround. Free.

Get your free audit →