Do AEO Tracking Tools Matter When AI Personalizes Answers?

I’ve been speaking to a lot of teams recently and it’s clear there’s a gap in how AEO (Answer Engine Optimization) is understood.

This concern is valid:

If every user sees a different answer, shaped by personalization and context, what is a tracking tool actually measuring?

Here’s the reality: personalization exists, but within limits, and those limits are measurable. Individual AI responses vary from run to run, but the variation is predictable in aggregate because each response is sampled from a probability distribution.
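To make the "random but predictable" point concrete, here is a toy simulation in Python. It is only a sketch under stated assumptions: the inclusion probabilities, the placeholder brand "OtherCRM" and the helpers `sample_answer` and `inclusion_rates` are invented for illustration and are not measurements of any real model. Single answers differ from run to run, but the share of answers that include each brand settles close to its underlying probability.

```python
import random

# Hypothetical inclusion probabilities, chosen only for illustration.
INCLUSION_PROB = {"Salesforce": 0.9, "HubSpot": 0.8, "Zoho": 0.6, "OtherCRM": 0.2}


def sample_answer(rng: random.Random) -> list[str]:
    """Draw one simulated answer: each brand is included with its own probability."""
    return [brand for brand, p in INCLUSION_PROB.items() if rng.random() < p]


def inclusion_rates(runs: int, seed: int = 0) -> dict[str, float]:
    """Share of simulated answers that include each brand."""
    rng = random.Random(seed)
    counts = {brand: 0 for brand in INCLUSION_PROB}
    for _ in range(runs):
        for brand in sample_answer(rng):
            counts[brand] += 1
    return {brand: n / runs for brand, n in counts.items()}


if __name__ == "__main__":
    rng = random.Random(1)
    for _ in range(3):
        print(sample_answer(rng))   # single answers vary from run to run
    print(inclusion_rates(10_000))  # aggregate rates sit close to INCLUSION_PROB
```

Treating each brand as an independent coin flip is a simplification. Real answers are generated token by token, but the aggregate behaviour the argument relies on is the same: many samples from one distribution give a stable frequency.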

What actually happens when two users ask the same question

Imagine two people asking ChatGPT: "What's the best CRM for a small business?"

One is a solo founder in Bangalore running a 3-person team.
The other is a sales manager in New York at a 50-person company.

They'll get slightly different answers. Maybe one sees Zoho ranked higher. Maybe the other sees HubSpot higher. But Salesforce, HubSpot and Zoho will appear in both.

The core set of brands stays consistent. What shifts is ranking, emphasis and sometimes one or two names at the edges. Research from Graphite.io, which analyzed 200 responses per prompt across thousands of brand-focused queries, confirmed this. The bulk of any answer is drawn from a stable underlying distribution.

A more accurate way to think about it: AI starts with a shared baseline understanding of the world, and then applies light personalization on top. That baseline is driven by:

Presence: how widely a brand is discussed across the web (think: mentions across reviews, blogs, social, forums, news, etc.)
Association: how consistently it's associated with a specific use case (think: Zomato = food delivery)
Positioning: how well it fits the intent of a query (think: Zoho showing up when someone asks for a CRM for small businesses)

Where most people get this wrong

The common assumption is: "Every user sees a completely different, personalized answer, and therefore a different universe of brands. So tracking is pointless."

That's not true.

Think of it like a restaurant shortlist. If someone asks for a good Italian restaurant, they'll get a slightly different answer depending on which local they ask and on their budget, vibe and occasion. But the same 4-5 well-known spots will keep coming up. Nobody recommends a place that's completely off the radar.

AI works the same way. Personalization reshuffles the same shortlist; it doesn't create an entirely new one.

If HubSpot is in that shortlist, personalization might move it up or down. If another CRM tool isn't in the shortlist at all, personalization won't put it there.
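The shortlist idea can be written down as a small Python sketch. Everything in it is an assumption made for illustration: the baseline scores, the boost values, the placeholder brand "UnknownCRM" and the `personalize` helper are hypothetical, and the code encodes the claim above rather than proving it.

```python
# Hypothetical baseline scores for brands the model already associates with the query.
BASELINE = {"Salesforce": 0.9, "HubSpot": 0.8, "Zoho": 0.7}


def personalize(baseline: dict[str, float], boosts: dict[str, float]) -> list[str]:
    """Rank the shortlist by baseline score plus a small per-user boost.

    Boosts for brands that are not already in the shortlist are ignored,
    mirroring the claim that personalization reorders but does not add.
    """
    scored = {brand: score + boosts.get(brand, 0.0) for brand, score in baseline.items()}
    return sorted(scored, key=scored.get, reverse=True)


# Two hypothetical user contexts applied to the same shortlist.
solo_founder = personalize(BASELINE, {"Zoho": 0.3, "UnknownCRM": 0.5})
sales_manager = personalize(BASELINE, {"HubSpot": 0.2})

print(solo_founder)   # Zoho moves to the top; UnknownCRM never appears
print(sales_manager)  # HubSpot moves to the top
```

Both users end up with the same three brands in a different order, which is the behaviour the restaurant analogy describes.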

The mistake most people make is confusing variation with unmeasurability. These are not the same thing.

So what do AEO tools actually measure?

Tools like Paprik.ai don't mimic any particular user’s search behavior. They answer a more useful question:

Across a large number of runs and neutral contexts, how often does your brand appear for a given prompt?

Imagine running the prompt "What's the best CRM for small businesses?" 100 times, each in a fresh chat window. If your brand shows up in 60 of those 100 responses, your visibility is 60%. That 60% is your baseline. It tells you how often you appear in the answer, across the full range of people likely to ask this question.

Visibility tells you your brand's baseline probability of being included in an answer, before personalization adjusts it.
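As a rough sketch of the arithmetic, visibility is just the share of collected responses that mention the brand. The Python below assumes the responses have already been gathered as plain strings. The `visibility` function and the sample responses are made up for illustration, and this is not a description of how Paprik.ai or any other tool is implemented.

```python
import re


def visibility(responses: list[str], brand: str) -> float:
    """Fraction of responses that mention `brand` as a whole word, case-insensitively."""
    if not responses:
        raise ValueError("need at least one response")
    pattern = re.compile(rf"\b{re.escape(brand)}\b", re.IGNORECASE)
    hits = sum(1 for text in responses if pattern.search(text))
    return hits / len(responses)


# Made-up responses standing in for repeated runs of the same prompt.
sample_responses = [
    "For small teams, HubSpot and Zoho are popular choices.",
    "Salesforce, HubSpot and Zoho are the usual shortlist.",
    "Consider Salesforce or Zoho depending on budget.",
    "HubSpot's free tier is a common starting point.",
    "Zoho is often recommended for small businesses.",
]

print(visibility(sample_responses, "HubSpot"))  # 3 of 5 responses -> 0.6
```

Simple word matching will miss misspellings and indirect references, so a real pipeline would likely need more careful brand detection.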

Graphite's research found that you don't even need 100 runs to get a reliable number. As few as 10 responses per prompt are enough for a solid estimate: at that sample size, 98.6% of prompts landed within a 10% margin of error.

The simple takeaway

AEO tools are measuring a brand's probability of being included in the answer, before personalization tweaks it. That score is real. It's measurable. And it's the number that should drive your AEO strategy.