What is a user?
It’s hard to talk about any aspect of digital analytics without referring to a user. It might sound silly to define a user – it’s common sense, right? But the colloquial definition of a user and the way Google Analytics defines one can be quite different.
Further complicating matters, many of our clients are in the middle of upgrading to GA4 while also maintaining their existing Universal Analytics (UA) accounts for reporting purposes. The internet landscape has changed a lot since 2012, when UA was originally released, especially with regard to user privacy, so GA4 has made some changes to address this.
In this article, I’ll explain how users are defined in both UA and GA4, and how each of these might differ from your assumptions.
How Google Analytics collects data
Before diving into things, let’s take a step back and understand what’s happening behind the scenes when a user is browsing your website.
Once it’s installed, the Universal Analytics tracking code listens for any interaction that results in data being sent to GA. Each time this occurs, it’s referred to as a ‘hit’. Hits can take different forms, but the most common are page tracking (e.g. pageviews) and event tracking (e.g. CTA clicks).
When an interaction occurs, data relating to the action and the user are bundled together into a single hit and sent to Google’s servers. Bits of data that might be included in a hit are:
- a tracking ID (i.e. the UA property ID) to direct the hit to the right GA property
- the page URL to know where the interaction occurred
- hit type to let GA know what kind of data is being sent
If you’re interested in learning more about hit types and how they’re built, check out the dev guide here.
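To make this a bit more concrete, here is a minimal sketch of how the three bits of data listed above could be bundled into a single pageview hit. It assumes the Measurement Protocol parameter names `v`, `tid`, `t` and `dl`. The function name and the tracking ID value are placeholders of my own, and a real hit carries more fields than this.

```python
from urllib.parse import urlencode, parse_qs


def build_pageview_hit(tracking_id: str, page_url: str) -> str:
    """Bundle the bits of data listed above into one URL-encoded payload."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # tracking ID: which property receives the hit
        "t": "pageview",     # hit type: what kind of data is being sent
        "dl": page_url,      # document location: where the interaction occurred
    }
    return urlencode(params)


# Placeholder values, for illustration only.
payload = build_pageview_hit("UA-XXXXX-Y", "https://www.datatribe.co.nz/")
print(payload)
print(parse_qs(payload)["t"])
```

The point is simply that a hit is one flat bundle of key-value pairs, with the hit type travelling alongside the details of the interaction.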
Sidenote: This is largely the same in GA4, where each hit carries bits of data associated with the user action. However, the model has been simplified: there are no longer different hit types, only events.
How is a user defined in UA?
Now that we understand how a basic hit is constructed, we can introduce the idea of a Client ID.
In web development, there is a distinction between ‘client’ and ‘server’. In simple terms, a server is a program that provides services (i.e. it waits for requests to come through, and fulfils them), while a client is a program that sends requests to servers. A browser (e.g. Google Chrome, Firefox, Safari) is an example of a client.
The client ID is a “unique, randomly generated string” that is included in the hit data. It tells GA where the interaction came from. This means that when all the hits are sent to Google’s servers, GA is able to group together hits with the same client ID and understand that all those actions were undertaken by the same client (i.e. visitor), whether they happened in one session or several.
When talking about the total number of users in GA, what you are actually referring to is the number of unique client IDs that have sent hits.
What does that mean? Consider the following scenario – Bob accesses www.datatribe.co.nz from his work computer, and later that night, accesses it again on his mobile. He would be counted as a single user, right? Unfortunately, wrong – GA has no way of knowing that Bob is behind both visits. Why is that?
The important thing to note is that client ID is based on a unique browser instance. This means that Bob’s work computer generated a client ID of X; while on his mobile, he generated a client ID of Y. Because X and Y do not match, GA doesn’t realise that it’s Bob both times, and counts him as two users.
The same scenario occurs even if Bob was using his work computer for both visits, but used Chrome the first time and Safari the second. Each browser generates its own client ID, so GA can’t know that they’re both Bob. GA counts him as two users.
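Bob’s situation can be sketched in a few lines of code. This is a simplified illustration rather than how GA is actually implemented: the `client_id` field name and the ID values X, Y and Z are made up for the example (Z being the ID for Safari on his work computer).

```python
def count_users(hits):
    """Count users the way UA does by default: one user per unique client ID."""
    return len({hit["client_id"] for hit in hits})


# All of these hits come from Bob, but GA only sees the client IDs.
hits = [
    {"client_id": "X", "page": "/"},         # work computer, Chrome
    {"client_id": "X", "page": "/blog"},     # same browser, so same client ID
    {"client_id": "Y", "page": "/"},         # mobile
    {"client_id": "Z", "page": "/contact"},  # work computer, Safari
]

print(count_users(hits))  # 3
```

One person, three browser instances, three users in the report.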
If that’s not enough, clearing your browser cookies will also reset your client ID. So if you visited a website twice in the same week, on the same browser-device combination, but cleared your cookies in between, GA would count you as two users. This is because your client ID is stored in a cookie on the browser. Deleting it forces GA to generate a new one for you.
Note that by default, the GA cookie expires after two years, and that two-year expiry is reset every time a new hit is sent. That is a long time for an identifier to persist, and it is exactly the kind of long-lived tracking that Intelligent Tracking Prevention (ITP), built into browsers such as Safari, is designed to limit. In a browser with ITP, the cookie set by the GA script is capped at 7 days (or even one day under the right circumstances!). This means that someone who comes back after more than a week away would be counted as a new user.
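To see how much difference the cookie lifetime makes, here is a rough simulation of one person visiting a site four times. It assumes the expiry is pushed forward on every visit and that an expired cookie always results in a new client ID. The function name and the dates are invented for the example.

```python
from datetime import date, timedelta


def users_counted(visit_dates, lifetime_days):
    """Count how many client IDs one person ends up with for a given cookie lifetime."""
    users = 0
    expires = None
    for visit in sorted(visit_dates):
        if expires is None or visit >= expires:
            users += 1  # no cookie, or it has expired: a new client ID is generated
        # The expiry is reset relative to the latest visit.
        expires = visit + timedelta(days=lifetime_days)
    return users


visits = [date(2022, 1, 1), date(2022, 1, 5), date(2022, 1, 20), date(2022, 2, 15)]

print(users_counted(visits, lifetime_days=730))  # 1
print(users_counted(visits, lifetime_days=7))    # 3
```

With the default two-year lifetime, all four visits belong to one user. With a 7-day cap, only the visit on 5 January lands inside the window, so the same person shows up as three users.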
As you can see from these examples, the user metric in Universal Analytics can be slightly inflated due to these different scenarios. While it could be argued that GA is more useful as a tool for viewing overall trends in user behaviour and attributing value to marketing campaigns, some clients might wish to minimise this inaccuracy.
Enter User ID
One scenario where it would be useful to consolidate the same user across different browser instances is when a user is able to log into an account.
Having a website with its own authentication system allows GA to identify the same user across different devices. So rather than calculating users as the number of unique client IDs, the number of unique user IDs (i.e. user accounts) is used.
For example, if Bob logged into his TradeMe account on Chrome on his work computer, Safari on his mobile, and Firefox on his personal laptop, all three browsers would be counted as a single user under this feature. In other words, his logged-in account is counted as one user, regardless of the fact that he is accessing TradeMe through different browsers.
Sound good? This feature is known as User ID, and it is important to note that it is not enabled by default – it requires implementation by a developer. But even once User ID has been implemented, you will need to create a new User ID view, separate from your regular Main/Test/Raw views. This is because a standard view can’t be converted to a User ID view – the option to do so is only available when creating a new view.
Why is that? Simply because the counting methods are completely different. Going from client ID to user ID as the user counting method is a radically different way of looking at the data. User ID even gives you the option of unifying sessions so that hits sent before a user logged in are still associated with that user ID. All of this means the user count in a User ID view will be quite a bit lower than in a normal GA view.
However – and this is a big one – under UA any users that don’t log into an account would not have a user ID and would therefore not be counted at all in the User ID view!
Obviously, this is a massive drawback, as you would need to switch between User ID and regular views for different reporting needs, rather than having it combined in a single view.
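The gap between the two view types can be sketched as follows. This is a simplified model that ignores session unification. It assumes the Measurement Protocol parameter names `cid` (client ID) and `uid` (user ID) as field names, and all the values are invented.

```python
def standard_view_users(hits):
    """Standard view: one user per unique client ID."""
    return len({hit["cid"] for hit in hits})


def user_id_view_users(hits):
    """User ID view: one user per unique user ID; hits without one are left out."""
    return len({hit["uid"] for hit in hits if hit.get("uid")})


hits = [
    {"cid": "A", "uid": "bob"},  # Bob, logged in on Chrome at work
    {"cid": "B", "uid": "bob"},  # Bob, logged in on Safari on his mobile
    {"cid": "C", "uid": "bob"},  # Bob, logged in on Firefox on his laptop
    {"cid": "D"},                # a visitor who never logs in
]

print(standard_view_users(hits))  # 4
print(user_id_view_users(hits))   # 1
```

Neither number is the true count of two people: the standard view splits Bob into three, while the User ID view drops the anonymous visitor entirely.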
What has changed in GA4?
Everything up until now has covered users in the context of the Universal Analytics version of Google Analytics (i.e. GA3). Now that we have a basic understanding of the ways in which the users metric can be calculated in UA, is it any different in GA4?
How GA4 collects data
Unlike Universal Analytics, which was completely dependent on cookies being set in your browser in order to count non logged-in users accurately (though with the caveats described above!), GA4 has a range of user-counting methods available to it.
This is the biggest change in GA4. While cookies (and the client ID that is stored in them) are still a fundamental part of the user counting mechanism, it is now one of three reporting identities that feed into the same property. This is pretty big – as I mentioned in the ‘Enter User ID’ section above, User ID was not compatible with Client ID, so they had to be kept in separate views. GA4 fixes this drawback by having different reporting identities, allowing you to count both logged-in and non logged-in users in a single GA4 property.
There are 3 types of reporting identities available in your account. These are (in order of the level of priority assigned to each by GA4):
- User ID
- Google Signals
- Device ID (Client ID)
Let’s go through each of these.
User ID is the same concept as in UA. It’s not enabled by default, and must be configured to your website’s authentication system before GA will be able to link users across devices/browsers. Because GA4 allows for different data streams to be sent to the same property (e.g. mobile app + website), a single logged-in user can be tracked and unified even as they move across different device types. This was not possible in UA, as mobile app and website data were not compatible, and were therefore tracked in separate properties.
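Google Signals draws on data from users who are signed in to a Google account and have turned on Ads Personalisation. This lets GA4 recognise the same person across devices even when they haven’t logged into your own website.
Device ID is the same concept as Client ID, which we discussed in the UA section above.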
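The priority order can be pictured as a simple fallback: use the User ID if there is one, otherwise Google Signals, otherwise the Device ID. The sketch below is only my illustration of that idea, not GA4’s actual implementation, and the field names (`user_id`, `signals_id`, `device_id`) are hypothetical.

```python
from typing import Optional, Tuple

# Highest priority first. These field names are hypothetical.
PRIORITY = ("user_id", "signals_id", "device_id")


def reporting_identity(event: dict) -> Optional[Tuple[str, str]]:
    """Return the highest-priority identifier present on an event, if any."""
    for key in PRIORITY:
        value = event.get(key)
        if value:
            return (key, value)
    return None


def count_users(events) -> int:
    """Count unique identities across all events in a single property."""
    identities = {reporting_identity(event) for event in events}
    identities.discard(None)  # events with no identifier at all are not counted
    return len(identities)


events = [
    {"device_id": "A", "user_id": "bob"},    # Bob, logged in on the website
    {"device_id": "B", "user_id": "bob"},    # Bob, logged in on the mobile app
    {"device_id": "C", "signals_id": "g1"},  # not logged in, known via Google Signals
    {"device_id": "D", "signals_id": "g1"},  # same Google account, another device
    {"device_id": "E"},                      # anonymous visitor, device ID only
]

print(count_users(events))  # 3
```

Five devices collapse into three users, and the logged-in and anonymous visitors are counted side by side in the same property.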
With the increased focus on user privacy laws such as GDPR, and browsers rolling out limitations on cookies, GA4 is covering its bases in terms of having multiple methods of identifying users. You can choose which combination of reporting identities you would like to use to define your user numbers.
Google has also suggested that, in the future, machine learning could be used to fill in data gaps for users who have opted out of all 3 reporting identity methods. This would work by building a model from the behaviour of users who have opted into Google Signals, and applying it to users who haven’t opted in. This would make the gaps in data that are currently present in UA much less significant, as these users would still be accounted for in the model.
Google has recently released Consent Mode, which is currently still in beta. It allows you to change how your tags behave based on the user’s consent choice. In the context of Google Analytics, it allows you to still collect basic data even if a user has opted out of analytics cookies. Sounds great, right?
However, since a client ID cannot be persisted between pages in order to tie a user’s hits together, Google uses a “random number generated on each page load” instead – essentially resetting this pseudo client ID on every page! This would have the effect of massively overreporting user numbers…
…or at least it would if Consent Mode data was exposed in GA reports. Currently, this is not available for us to view. The data is stored on Google’s servers, with the expectation that at some point, it will be modelled based on users who did consent to tracking, and then become available in reports.
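To get a feel for how much overreporting that would be if those hits were counted naively, here is a small simulation of one visitor viewing five pages. It uses a random UUID as a stand-in for the random number Google generates, and the function names are my own.

```python
import uuid


def ids_with_cookie(pages):
    """Consent given: the client ID is generated once, then read back from the cookie."""
    client_id = str(uuid.uuid4())
    return [client_id for _ in pages]


def ids_without_cookie(pages):
    """Consent denied: nothing is stored, so a fresh random ID is used on each page load."""
    return [str(uuid.uuid4()) for _ in pages]


pages = ["/", "/blog", "/blog/what-is-a-user", "/services", "/contact"]

print(len(set(ids_with_cookie(pages))))     # 1
print(len(set(ids_without_cookie(pages))))  # 5
```

A single visitor would show up as five users in one sitting, which is why this data needs to be modelled before it is of any use in a report.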
As you can see, the concept of a user is much wider than you might have thought! If you’re interested in upgrading to GA4 to take advantage of these new features, or would like to get our help on other analytics projects, get in touch with our handy dandy form.