Is GA4 Data Accurate?

No, buddy.

Let’s break it down.

I’m not saying GA4 is useless. Having GA4 data is still better than having none. But you need to know where the limits are. And how much those limits cost you.

I’ll assume your GA4 is technically set up correctly — no duplicate events, no missing tags, no broken triggers. Even then, GA4 doesn’t show you accurate data. And there are more reasons than you’d expect.

How GA4 counts users and sessions

GA4 doesn’t count users one by one. It uses a probabilistic algorithm called HyperLogLog++ (HLL++), which estimates the number of unique values without storing every single record.

It’s a smart trade-off: instead of storing millions of unique IDs in memory, GA4 gets by with 12 KB of data and 16,384 “buckets.” The result? An estimate with a margin of error:

  • Users (Total Users, Active Users): ±1.6% (95% confidence interval)
  • Sessions: ±3.3%

So 100,000 users in GA4 actually means somewhere between 98,400 and 101,600 in reality. For most decisions, that’s fine. For statistical tests on large samples (over 12,000 users per variant), it starts to matter.

Important clarification: HLL++ only affects unique counts — users and sessions. Event counts (pageviews, clicks, conversions) are exact. But watch out: when you calculate a conversion rate with users in the denominator, the HLL++ imprecision carries over.
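Want to see the trade-off in action? Here is a minimal sketch in Python using the open-source datasketch library (an illustration, not GA4's actual implementation, which isn't public). With p=14 you get 2^14 = 16,384 registers, the same bucket count GA4 uses:

```python
# Illustration with the datasketch library (pip install datasketch).
# p=14 gives 2^14 = 16,384 registers, the bucket count GA4 works with.
from datasketch import HyperLogLogPlusPlus

hll = HyperLogLogPlusPlus(p=14)
true_count = 100_000

# Feed 100,000 unique "user IDs" into the sketch.
for i in range(true_count):
    hll.update(f"user-{i}".encode("utf8"))

estimate = hll.count()
error = (estimate - true_count) / true_count
print(f"true: {true_count:,}, estimate: {estimate:,.0f}, error: {error:+.2%}")
# Typically lands within the ±1.6% margin quoted above.
```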

This is the smaller deviation, though. Let’s move to the bigger ones.

What GA4 doesn’t see at all

Ad blockers and tracking prevention

Brave and DuckDuckGo block Google Tag Manager outright, Firefox (ETP) blocks known trackers by default, and Safari (ITP) caps cookie lifetimes, which makes returning visitors look like new ones. Add uBlock Origin and other ad blockers on top.

Result: GA4 misses roughly 10% of your visitors due to ad blockers and browser tracking prevention. For a tech-savvy audience, it can be more. Some international sources report up to 30%, but in my experience on European projects, it’s typically around 10%.

A solution exists: server-side tracking can recover a significant portion of this data because your server sends the data, not the user’s browser. You can also serve GTM from your own domain (careful — don’t use CNAME, use something like Cloudflare Routes instead) and send data to your own domain. This bypasses most blocking rules.
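To make the principle concrete, here is a minimal sketch in Python, assuming Flask and requests; the /metrics/gtm.js path and the container ID are placeholders. Real setups typically use server-side GTM or an edge worker, but the idea is the same: the browser only ever talks to your domain.

```python
# First-party proxy sketch (assumes Flask + requests; the path and the
# GTM-XXXXXXX container ID are placeholders, not a production setup).
from flask import Flask, Response, request
import requests

app = Flask(__name__)

@app.route("/metrics/gtm.js")
def first_party_gtm():
    # Most blocklists match on the googletagmanager.com hostname,
    # not on the script's content, so fetching it server-side and
    # serving it from your own domain slips past those rules.
    upstream = requests.get(
        "https://www.googletagmanager.com/gtm.js",
        params={"id": request.args.get("id", "GTM-XXXXXXX")},
        timeout=5,
    )
    return Response(upstream.content, content_type="application/javascript")
```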

Consent and cookie banners

If your cookie banner is set up correctly (and a third of companies don’t have theirs right), GA4 only measures visitors who clicked “Accept.”

Opt-in rates typically range between 40–70%. That means 30–60% of visitors GA4 either doesn’t see at all (Basic Consent Mode) or tries to model (Advanced Consent Mode — more on that shortly).

This is, in practice, the biggest source of inaccuracy in GA4 — yet many people focus on ad blockers instead of checking their opt-in rate.

By the way, I described how to calculate opt-in rate in a separate article.
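The basic arithmetic, though, fits in a few lines. If GA4 only sees consented visitors, divide the measured number by your opt-in rate to estimate real traffic (all numbers illustrative):

```python
# Gross-up estimate: GA4 only saw the consented share of visitors.
measured_users = 100_000  # what GA4 shows
opt_in_rate = 0.55        # from your consent platform (illustrative)

estimated_real_users = measured_users / opt_in_rate
print(f"GA4 shows {measured_users:,}; reality is roughly {estimated_real_users:,.0f}")
# => GA4 shows 100,000; reality is roughly 181,818
```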

What GA4 hides or distorts

Sampling

In Explorations, GA4 processes a maximum of 10 million events per query (free version). Anything above that gets sampled. Average error is around 5%, but for small data ranges it can be up to 30%.

Standard reports aren’t sampled — they’re pre-calculated. But they have a different problem.

Cardinality — the “(other)” row

Standard reports have a limit of 500 unique dimension values per day. Everything above that falls into the “(other)” row.

Run an e-commerce store with 2,000 products and want to see revenue by product in a standard report? Most of them will end up in “(other).” Solution: Explorations (higher limit) or BigQuery (no limit).
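For illustration, here is what the BigQuery route can look like in Python. A sketch assuming the google-cloud-bigquery client and the standard GA4 export schema; the your-project.analytics_123456 dataset is a placeholder:

```python
# Revenue per product straight from the GA4 BigQuery export: no 500-value
# cardinality cap, no "(other)" bucket.
from google.cloud import bigquery

client = bigquery.Client()

query = """
SELECT
  item.item_name,
  SUM(item.item_revenue) AS revenue
FROM `your-project.analytics_123456.events_*`, UNNEST(items) AS item
WHERE event_name = 'purchase'
  AND _TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
GROUP BY item.item_name
ORDER BY revenue DESC
"""

for row in client.query(query).result():
    print(f"{row.item_name}: {row.revenue:,.2f}")
```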

Thresholding

If your reports use demographic dimensions or Google Signals data, GA4 may silently hide rows. It doesn’t want anyone to identify individual users from the data.

Google doesn’t say exactly when. From experience: typically at 30–50 users or events. Solution: switch reporting identity to “Device-based.”

Note: Google removed Google Signals from reporting identity in 2024 — Signals are still used for remarketing and demographic data, but they no longer affect reporting identity itself. Thresholding still applies when using demographic dimensions.

Reporting identity

GA4 offers three modes: Blended, Observed, and Device-based. Each gives different numbers for the same data. Blended uses modeling, Device-based uses only cookies. Differences in user counts can be in the tens of percent.

And that brings us to modeling.

Data modeling — a safety net with holes

Consent Mode v2 in Advanced mode sends anonymous pings even from users who haven’t consented. GA4 uses these to model missing data.

Sounds great. The catch:

  • Modeling only activates once the property collects 1,000+ daily events without consent and 1,000+ daily users with consent, each over at least 7 days
  • It works only in the UI and API — BigQuery exports don’t include modeled data
  • Basic Consent Mode = tags don’t fire at all = zero modeling
  • Accuracy depends on the ratio of consented vs. non-consented users

Recovery estimates: 30–50% of lost conversions can be recovered. That’s not nothing. But it’s not 100%.

And most importantly: modeling on vs. off dramatically changes your numbers. We’ll come back to this.

Attribution — a black box

GA4 uses different attribution models depending on the report type. In standard reports: User Acquisition uses first-click, Traffic Acquisition uses last non-direct click. Data-Driven Attribution (DDA) only applies to Key Events reports and Explorations — and even there, only if you have enough data.

  • DDA is a black box. Google doesn’t reveal how it assigns credit.
  • Silent fallback. If you don’t have at least 400 conversions in 28 days for a given key event, GA4 silently switches to last-click — without any notification. You’ll still see “Data-driven” in your settings, but last-click is actually running.
  • Incrementality? Neither DDA nor last-click answers the key question: how many of those conversions would have happened without the ad?

Why does GA4 show a different traffic source in User Acquisition, a different one in Traffic Acquisition, and yet another in Explorations? Exactly for this reason — each report uses a different attribution model. More on this in my article about traffic sources in GA4.
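One practical takeaway: before trusting the “Data-driven” label, check whether you even clear the volume threshold mentioned above. A sketch against the BigQuery export (google-cloud-bigquery client again; the dataset and the purchase event are placeholders):

```python
# Self-audit for the silent DDA fallback: count a key event over the
# last 28 days in the BigQuery export.
from google.cloud import bigquery

client = bigquery.Client()

query = """
SELECT COUNT(*) AS key_events
FROM `your-project.analytics_123456.events_*`
WHERE event_name = 'purchase'
  AND _TABLE_SUFFIX BETWEEN
      FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 28 DAY))
      AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
"""

count = next(iter(client.query(query).result())).key_events
if count < 400:  # the threshold discussed above
    print(f"Only {count} key events in 28 days: expect a silent last-click fallback.")
else:
    print(f"{count} key events in 28 days: DDA should have enough data.")
```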

Data latency

One more thing that doesn’t get discussed enough: GA4 has a latency of 24–48 hours. Universal Analytics was significantly faster. On larger websites, I often see data appearing in reports only by the next afternoon. Some data (offline events, late pings) can arrive up to 72 hours after the event — anything later gets discarded by GA4. And there’s no SLA.

For analysis, this doesn’t matter. For real-time decision-making, it does.

What to do about it

Having data is better than having none

This needs to be said out loud. GA4 isn’t perfect, but it’s still the most accessible and widely used tool for web measurement. Having no measurement at all is incomparably worse.

The key is to know how accurate your data is and how much you can trust it.

For trends, GA4 is enough

For tracking trends and comparing periods, GA4 works well. The deviation is consistent, so even though absolute numbers aren’t precise, relative changes are. Campaign A performs better than Campaign B? GA4 will tell you that reliably.

For absolute numbers, watch the modeling

Once you need absolute numbers, it’s a different story. For calculating CPA or ROAS, it makes a massive difference whether you have modeling turned on or not.

With modeling, GA4 reports tens of percent more conversions — and your ROAS suddenly looks great. Without modeling? Same campaign, but the numbers tell a different story. And you’re making budget allocation decisions based on this.
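A quick illustration with made-up numbers shows how big the swing can be:

```python
# Same campaign, same spend, two ROAS figures. All numbers illustrative.
spend = 10_000.0
observed_revenue = 18_000.0  # revenue from conversions GA4 actually observed
modeled_uplift = 0.35        # modeling adds 35% more conversions (illustrative)

roas_without_modeling = observed_revenue / spend
roas_with_modeling = observed_revenue * (1 + modeled_uplift) / spend

print(f"ROAS without modeling: {roas_without_modeling:.2f}")  # 1.80
print(f"ROAS with modeling:    {roas_with_modeling:.2f}")     # 2.43
# A campaign that misses a 2.0 ROAS target suddenly clears it.
```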

This isn’t an academic problem. This is a problem that costs real money.

If you’re comparing revenue in GA4 with your e-commerce backend numbers, you need to know exactly which reporting method you’re using — otherwise you’re comparing apples to oranges.

BigQuery: accurate data, but at a cost

BigQuery export from GA4 contains:

  • Every single event — no sampling, no thresholding, no cardinality limits
  • Raw data without modeling (consent mode models aren’t in BQ)
  • Event counts are perfectly accurate

Even in BigQuery, though, you need to consider how much precision you actually need. Aggregation functions like APPROX_COUNT_DISTINCT (yes, the same HLL++ as in GA4) can dramatically reduce query costs in BQ — at the cost of that same ±1.6% deviation.

If knowing “roughly 98,000 users” instead of “exactly 97,842” is good enough, you’ll save money. And as I wrote in my Lean Analytics article, you don’t need to measure everything — just what matters.
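For illustration, the two variants side by side, again assuming the google-cloud-bigquery client and a placeholder dataset:

```python
# Precision versus cost as two interchangeable queries.
# APPROX_COUNT_DISTINCT scans the same bytes but needs far less memory
# and shuffle, which is where the savings show up on large tables.
from google.cloud import bigquery

client = bigquery.Client()
table = "`your-project.analytics_123456.events_*`"  # placeholder

exact = next(iter(client.query(
    f"SELECT COUNT(DISTINCT user_pseudo_id) AS users FROM {table}"
).result())).users

approx = next(iter(client.query(
    f"SELECT APPROX_COUNT_DISTINCT(user_pseudo_id) AS users FROM {table}"
).result())).users

print(f"exact: {exact:,}  approx: {approx:,}  diff: {(approx - exact) / exact:+.2%}")
```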

Scientific note

It’s the Coastline Paradox — the more precisely you measure a coastline, the longer it gets. Same with web data: the closer you get to “accurate” numbers, the more complications emerge. Sometimes it’s wiser to measure with reasonable precision and invest the saved resources into what you actually do with the data.

Wait, it might get better

In November 2025, the European Commission published the Digital Omnibus package — and in February 2026, it officially withdrew the stalled ePrivacy Regulation proposal. Cookie and tracking rules are now being moved directly into the GDPR (new Article 88a).

The key change: the Digital Omnibus introduces a consent exemption for “audience measurement.” If a website operator collects aggregated visitor data exclusively for their own use, without sharing it with third parties, user consent won’t be required. The conditions are strict: data must be processed by the controller, exclusively for their own service and own purposes.

A similar approach already exists in France (CNIL), Spain (AEPD), and Italy (Garante). The Digital Omnibus would extend this principle across the entire EU.

Timeline-wise, this is more of a late 2026 matter, realistically 2027 — the proposal must pass through the European Parliament and the Council of the EU.

And crucially: this won’t apply to Google Analytics 4. GA4 processes data on Google’s shared infrastructure, and Google uses it for product development and its advertising ecosystem. This doesn’t meet the definition of “the operator’s own analytics.” The exemption will likely be usable for tools like Matomo (self-hosted), Piwik PRO, or Plausible — solutions where data stays fully under the website operator’s control.

Deviation summary

| Source of inaccuracy | Deviation | Solution |
| --- | --- | --- |
| HLL++ (users) | ±1.6% | BigQuery with COUNT DISTINCT |
| HLL++ (sessions) | ±3.3% | BigQuery with COUNT DISTINCT |
| Ad blockers and tracking prevention | ~10% | Server-side tracking, custom domain |
| Consent (missing consent) | 30–60% | Consent Mode v2 Advanced + modeling |
| Sampling (Explorations) | 5–30% | BigQuery, shorter date range |
| Thresholding | hidden rows | Device-based reporting identity |
| Cardinality | grouped into “(other)” | Explorations, BigQuery |
| Modeling (consent) | tens of % difference on/off | Know what you have turned on |