Methodology · Updated June 2026

Built so you can trust the numbers

A mention rate is only useful if you can act on it. Every score we report comes with a confidence interval, every alert comes with a statistical test, and every design decision was made to minimize false signals.

Why a raw percentage is worthless

If an AI model mentions you in 3 out of 5 responses, the intuitive measure is 60%. But with only 5 observations, the 95% confidence interval for the true rate runs from 23% to 88%. A 65-point range is too wide to act on.

Scale to 30 observations and the same 60% rate narrows to 42% to 75%. Now you have something you can watch and respond to.

n = 5 queries

60%

CI: 23–88%

65-point range. Don't act on this.

n = 30 queries

60%

CI: 42–75%

33-point range. Now you can.

The Wilson score interval

We use the Wilson score confidence interval, introduced by Edwin B. Wilson in 1927 and used by Reddit to rank comments. It behaves well with small samples, never produces bounds below 0% or above 100%, and stays usable at every sample size, including rates near 0% or 100%.
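
As a rough illustration, the interval can be computed in a few lines of Python. This is a minimal sketch, not our production code: the function name `wilson_interval` is hypothetical, and z = 1.96 (a 95% confidence level) is an assumption, chosen because it reproduces the 23% to 88% range quoted above for 3 mentions in 5 responses.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Return (lower, upper) Wilson score bounds for a binomial proportion.

    z = 1.96 corresponds to a 95% confidence level (assumed here).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")

    p_hat = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    # The center is pulled from the raw rate toward 50%, more strongly for small n.
    center = (p_hat + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n)) / denom
    # The bounds already lie in [0, 1]; the clamp only guards against float drift.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Same observed rate (60%), two different sample sizes.
for mentions, queries in [(3, 5), (18, 30)]:
    lower, upper = wilson_interval(mentions, queries)
    print(f"{mentions}/{queries}: {lower:.1%} to {upper:.1%}")
```

Unlike the textbook "rate plus or minus a margin" formula, this one does not collapse to a zero-width interval when the observed rate is exactly 0% or 100%, which matters when a brand is never mentioned in a small sample.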

We display both the center rate and the full interval. When intervals from two consecutive periods overlap, we don't send an alert. Overlapping means the change could be random variation. Non-overlapping means it almost certainly isn't.

How prompts are generated

We generate prompts in three stages to ensure they reflect questions real buyers actually ask, not questions we invented:

Seeds

Category templates anchor the prompt space across six entity types. Templates include competitive-comparison variants so results reflect how buyers actually weigh options.

Harvest

Autocomplete signals from major search engines surface the phrasing real users type. This ensures prompts use actual buyer language.

Normalization

A fast model deduplicates and validates the harvested prompts into a final panel. Each prompt is checked for quality before being stored.
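
The three stages can be sketched roughly as follows. Everything here is illustrative: the templates, the harvested strings, and the helper names (`expand_seeds`, `normalize`, `build_panel`) are hypothetical, and a simple string rule stands in for the model-based deduplication and validation described above.

```python
import re


def expand_seeds(templates: list[str], category: str) -> list[str]:
    """Stage 1 (seeds): fill category templates, including comparison variants."""
    return [template.format(category=category) for template in templates]


def normalize(prompt: str) -> str:
    """Canonical key used only for duplicate detection (assumed rule)."""
    collapsed = re.sub(r"\s+", " ", prompt).strip().lower()
    return collapsed.rstrip("?").strip()


def build_panel(seeds: list[str], harvested: list[str], min_words: int = 3) -> list[str]:
    """Stage 3 (normalization): drop duplicates and prompts too short to be useful."""
    seen: set[str] = set()
    panel: list[str] = []
    for prompt in [*seeds, *harvested]:
        key = normalize(prompt)
        if len(key.split()) < min_words or key in seen:
            continue
        seen.add(key)
        panel.append(prompt.strip())
    return panel


seeds = expand_seeds(
    ["best {category} for small teams", "{category} alternatives compared"],
    "CRM software",
)
# Stage 2 (harvest) would supply autocomplete phrasings; these are made up.
harvested = ["Best CRM software for small teams?", "crm software pricing", "crm"]
print(build_panel(seeds, harvested))
```

In this toy run the first harvested phrase is dropped as a duplicate of a seed, and the one-word phrase is dropped as too short.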

Which AI models we query

Every paid plan runs prompts across the three major consumer AI assistants: GPT-5 Nano (OpenAI), Gemini Flash Lite (Google), and Claude Haiku 4.5 (Anthropic). These are the models most buyers encounter day-to-day.

Custom tier subscribers can extend coverage to any additional model by bringing their own API key, including DeepSeek, Grok, Mistral, Llama, and Perplexity. Models are health-checked regularly and paused automatically after consecutive failures.
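
One simple way to implement "paused automatically after consecutive failures" is a counter that resets on every success. The sketch below assumes a threshold of three failures and a class named `ModelHealth`; both are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModelHealth:
    name: str
    max_consecutive_failures: int = 3  # assumed threshold, not a documented value
    consecutive_failures: int = 0
    paused: bool = False

    def record(self, ok: bool) -> None:
        """Record one health-check result and pause the model if needed."""
        if ok:
            # Any success breaks the streak.
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_consecutive_failures:
            self.paused = True


health = ModelHealth("example-model")
for ok in [True, False, False, True, False, False, False]:
    health.record(ok)
print(health.paused)  # True: the last three checks failed in a row
```

Counting consecutive failures rather than total failures means an occasional timeout does not take a model out of the panel.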

How mentions are extracted

Each AI response is classified at the sentence level — not the document level — to avoid counting an incidental reference as a recommendation. We run a two-stage pipeline:

Stage 1 — Deterministic

We locate the entity name (and any configured aliases) in the response, identify the sentence boundary, and classify the framing. If classification is confident, we stop here. This handles the majority of responses.

Stage 2 — LLM judge

Ambiguous cases go to a small model that returns a structured verdict with a confidence score. Unresolvable cases fall back to a conservative classification. This stage runs on a small minority of observations.
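
The control flow of the two stages might look like the sketch below. The cue phrases, the 0.7 confidence cutoff, the `judge` callable, and the choice of `mentioned` as the conservative fallback are all assumptions made for the example, not a description of the real classifier.

```python
import re
from typing import Callable, Optional

# Hypothetical cue phrases; a real rule set would be far richer.
NEGATIVE_CUES = ("avoid", "not recommended", "steer clear")
POSITIVE_CUES = ("recommend", "top pick", "best choice")

Judge = Callable[[str], tuple[str, float]]  # returns (role, confidence)


def find_sentence(response: str, names: list[str]) -> Optional[str]:
    """Return the first sentence that contains the entity name or an alias."""
    for sentence in re.split(r"(?<=[.!?])\s+", response):
        lowered = sentence.lower()
        if any(name.lower() in lowered for name in names):
            return sentence
    return None


def classify(response: str, names: list[str], judge: Judge,
             min_confidence: float = 0.7) -> str:
    sentence = find_sentence(response, names)
    if sentence is None:
        return "absent"

    # Stage 1: deterministic cues. Negative cues are checked first so that
    # "not recommended" is not read as an endorsement.
    lowered = sentence.lower()
    if any(cue in lowered for cue in NEGATIVE_CUES):
        return "dismissed"
    if any(cue in lowered for cue in POSITIVE_CUES):
        return "recommended"

    # Stage 2: ambiguous framing goes to the judge; a low-confidence verdict
    # falls back to a role that does not count toward the mention rate.
    role, confidence = judge(sentence)
    return role if confidence >= min_confidence else "mentioned"


def stub_judge(sentence: str) -> tuple[str, float]:
    """Stand-in for the small model; always answers 'listed' with 0.9."""
    return "listed", 0.9


print(classify("I recommend Acme Books. Others exist.", ["Acme Books"], stub_judge))
print(classify("Options include Acme Books, Foo, and Bar.", ["Acme Books"], stub_judge))
```

Falling back to a non-counting role keeps uncertain cases from inflating the score.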

What counts as a “mention”

Every observation is assigned one of five roles. Your mention rate counts `recommended` and `listed` only.

Role	Meaning	Counts?
`recommended`	Explicitly suggested as a top pick or solution.	Yes
`listed`	Appears in a list of options without explicit endorsement.	Yes
`mentioned`	Referenced in passing, not as a recommendation.	No
`dismissed`	Named with negative framing or as a counterexample.	No
`absent`	Not found anywhere in the response.	No
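
Given the table above, the mention rate is the share of observations whose role is `recommended` or `listed`. A minimal sketch, with a hypothetical function name and made-up observations:

```python
COUNTED_ROLES = {"recommended", "listed"}
ALL_ROLES = COUNTED_ROLES | {"mentioned", "dismissed", "absent"}


def mention_rate(roles: list[str]) -> tuple[int, int, float]:
    """Return (counted mentions, total observations, rate)."""
    if not roles:
        raise ValueError("at least one observation is required")
    unknown = set(roles) - ALL_ROLES
    if unknown:
        raise ValueError(f"unknown roles: {sorted(unknown)}")
    hits = sum(1 for role in roles if role in COUNTED_ROLES)
    return hits, len(roles), hits / len(roles)


observations = ["recommended", "listed", "mentioned", "absent", "listed", "dismissed"]
hits, total, rate = mention_rate(observations)
print(f"{hits}/{total} = {rate:.0%}")  # 3/6 = 50%
```

The pair of counts (mentions, observations) is what a confidence interval is computed from; the percentage alone loses the sample size.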

When we send you an alert

After each run we compare the new metric snapshot to the previous period. If the two Wilson confidence intervals do not overlap, the change is statistically significant. That's when we email you.

No alert — intervals overlap

W26: 45–65%
W27: 50–70%

Shared range. Could be noise.

Alert sent — no overlap

W26: 30–48%
W27: 55–72%

No shared range. Real signal.
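
The overlap rule itself is a two-line comparison. The sketch below uses hypothetical function names and treats intervals that merely touch at an endpoint as overlapping, which is an assumption.

```python
Interval = tuple[float, float]  # (lower, upper) as fractions between 0 and 1


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """True if the two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def should_alert(previous: Interval, current: Interval) -> bool:
    """Alert only when the two periods' intervals share no range at all."""
    return not intervals_overlap(previous, current)


# The two examples above, W26 versus W27.
print(should_alert((0.45, 0.65), (0.50, 0.70)))  # False: shared range 50-65%
print(should_alert((0.30, 0.48), (0.55, 0.72)))  # True: no shared range
```

Requiring non-overlap is a deliberately conservative test: it will stay silent on some real but small shifts in exchange for fewer false alarms.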

What we don’t promise

AI models are non-deterministic

The same prompt may yield different responses on different runs. We mitigate this by aggregating across multiple prompts and models, but individual runs will still vary.

Model updates happen without warning

AI providers update their models silently. A score change may reflect a model update rather than a reputation change. Per-model breakdowns help you isolate which model moved.

Prompt coverage isn't exhaustive

We generate category-appropriate prompts from real search signals, but we can't cover every question a buyer might ask.

This isn't SEO

We measure AI recommendation behavior, not search rankings. The two are increasingly correlated but not identical.

See your own numbers

Free snapshot. No credit card. Confidence intervals included.

Check your AI reputation free