Measurement Protocol with no cookie

Objectives

  • Try to understand what kind of metrics we can push to GA
  • Is it doable for production usage?
  • What metrics can we get from other cookie-less analytics (Cloudflare and PanelBear)?

Methodology

  • Use this personal website as a playground
  • Use Cloudflare Workers to send data to GA
  • Measure, Fix, and Repeat

Measurement Protocol

Measurement Protocol is the name GA gives to its low-level HTTP API for pushing data to the analytics engine without going through the SDK. There are two main versions:

  • GA4 - Latest version, in beta, not well documented, and only a small number of events can be captured through the server-to-server API. Not very stable. I have discarded this version.
  • GA3 - Universal Analytics - Seems to be the relevant option for server-to-server analytics as of April 2021. The API is documented.

Implementation

The Measurement Protocol API requires a Client Id. This Id is used to track user actions; by default, the GA SDK generates it for you using some black magic. In the current case everything is server-side, so the Client Id has to be derived from the information carried by the request. The good news is that Cloudflare Workers provides a lot of information, so let’s review how unique each value is.

Header Values

  • accept: low - tied to the browser; all Chrome users are likely to have the same value
  • accept-language: low - tied to the browser; all Chrome users are likely to have the same value
  • cf-connecting-ip: high - the connecting IP; it can be shared within the same household or the same enterprise
  • user-agent: low - tied to the browser; all Chrome users are likely to have the same value

CF Extra Values

  • country: low - country related to the IP
  • tlsCipher: low - the cipher used for the TLS connection
  • tlsVersion: low - the TLS version used for the connection - most likely 1.2 or 1.3
  • city: low - based on the related ip
  • continent: low - based on the related ip
  • postalCode: low - based on the related ip
  • timezone: low - based on the related ip

However, merging all those values into one hash can give an identifier that is reasonably unique over a short period, which is enough to measure the traffic on the website. Relying only on these values has limits:

  • People on the go: their IP or location changes over time, so they will be counted as a new client.
  • Big corporate networks are likely to use the same outbound IP, although this is not certain since they can rely on service providers with many outbound connections. In any case, traffic sharing the same IP is likely to get the same Client Id.
  • ISPs may also share an IP across multiple customers.
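To make these limits concrete, here is a minimal sketch of the string that would be hashed. `Signals` and `fingerprintInput` are hypothetical names used only for illustration, and the IPs are documentation addresses. Hashing is left out on purpose, because equal inputs always produce equal hashes.

```typescript
// Hypothetical subset of the signals available to the worker.
type Signals = {
  ip: string
  userAgent: string
  acceptLanguage: string
  country: string
}

// Concatenate the signals in a fixed order; this is the string that would be hashed.
function fingerprintInput(s: Signals): string {
  return [s.ip, s.userAgent, s.acceptLanguage, s.country].join('|')
}

const ua = 'Mozilla/5.0 (X11; Linux x86_64) Chrome/90.0'

// Two colleagues behind the same corporate outbound IP, with the same browser build.
const alice: Signals = { ip: '203.0.113.7', userAgent: ua, acceptLanguage: 'en-US', country: 'FR' }
const bob: Signals = { ...alice }

// The same person later, on a mobile network with a different IP.
const aliceOnTheGo: Signals = { ...alice, ip: '198.51.100.23' }

// Two people collapse into one client, and one person splits into two.
const sameIdForTwoPeople = fingerprintInput(alice) === fingerprintInput(bob)
const newIdForSamePerson = fingerprintInput(alice) !== fingerprintInput(aliceOnTheGo)
```

Both flags end up true, which is exactly the over- and under-counting described in the list above.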

Pseudocode for getClientId could look like this:

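async function getClientId(request: Request): Promise<string> {
  // Concatenate every signal into one string, then hash it.
  let value = ""
  const fromHeaders = [
    'accept',
    'accept-language',
    'cf-connecting-ip',
    'cf-ipcountry',
    'dnt',
    'user-agent'
  ]

  // A missing header contributes an empty string instead of the literal "null".
  fromHeaders.forEach((name) => value += request.headers.get(name) ?? "")

  if (request.cf) {
    const fromCf = [
      'country',
      'tlsCipher',
      'tlsVersion',
      'city',
      'continent',
      'postalCode',
      'timezone',
    ]
    // getProperty is a small typed accessor (helper not shown here).
    fromCf.forEach((name) => value += getProperty(request.cf, name as keyof IncomingRequestCfProperties))
  }

  // Hash the result so the raw IP and headers are not sent to GA as the Client Id.
  const myText = new TextEncoder().encode(value)
  const myDigest = await crypto.subtle.digest("SHA-256", myText)

  // hex turns the digest into a hexadecimal string (helper not shown here).
  return hex(myDigest)
}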
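The function above relies on two helpers, `hex` and `getProperty`, that are not shown in this post. The sketch below is one plausible way to write them, based only on how they are called; the real implementations may differ.

```typescript
// Convert a binary buffer (e.g. a SHA-256 digest) to a lowercase hex string.
function hex(buffer: ArrayBufferLike): string {
  return Array.from(new Uint8Array(buffer))
    .map((byte) => byte.toString(16).padStart(2, '0'))
    .join('')
}

// Typed property accessor; returns undefined when the object itself is missing.
function getProperty<T extends object, K extends keyof T>(
  obj: T | undefined,
  key: K,
): T[K] | undefined {
  return obj === undefined ? undefined : obj[key]
}

// Usage: three bytes become six hex characters.
const sample = hex(new Uint8Array([0, 15, 255]).buffer) // "000fff"
const country = getProperty({ country: 'FR', city: 'Paris' }, 'country') // "FR"
```

Accepting `undefined` in `getProperty` keeps the call site simple when `request.cf` is not populated.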

The value can also include a date, for instance YYYYMMDD.ClientID. The date then acts as a discriminator that avoids duplicate identifiers across days and keeps the data correctly split by day. This can be adjusted depending on the need.
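A minimal sketch of the dated variant follows. `datedClientId` is a hypothetical name, and using the UTC day is an assumption; a site with a mostly local audience might prefer its own timezone.

```typescript
// Prefix a client id with the UTC day formatted as YYYYMMDD, so ids rotate daily.
function datedClientId(clientId: string, now: Date = new Date()): string {
  // toISOString() always returns UTC, e.g. "2021-04-18T10:00:00.000Z".
  const day = now.toISOString().slice(0, 10).replace(/-/g, '')
  return `${day}.${clientId}`
}

// Usage with a fixed date so the result is reproducible.
const example = datedClientId('abc123', new Date('2021-04-18T10:00:00Z')) // "20210418.abc123"
```

The trade-off is that a returning visitor is counted as a new client every day, so multi-day retention metrics lose their meaning.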

The code to send the data looks like this:

async function analytics_g3(request: Request) {
  // Universal Analytics tracking id of the GA property
  const measurement_id = 'XXXXXX'

  const url = new URL(request.url)

  const params = new URLSearchParams({
    'v': '1', // protocol version
    't': 'pageview', // hit type
    'tid': measurement_id,
    'cid': await getClientId(request),
    'dl': request.url,
    'dr': request.headers.get('referer') || "",
    'ds': 'web', // datasource
    'ua': request.headers.get('User-Agent') || "",
    'geoid': request.cf?.country || "", // request.cf may be undefined
    'cs': url.searchParams.get('utm_source') || "(direct)",
    'cm': url.searchParams.get('utm_medium') || "organic",
    'cn': url.searchParams.get('utm_campaign') || "(direct)",
    'ck': url.searchParams.get('utm_term') || "",
    'cc': url.searchParams.get('utm_content') || "",
    'ci': url.searchParams.get('utm_id') || "",
    'gclid': url.searchParams.get('gclid') || "",
    'dclid': url.searchParams.get('dclid') || "",
  })

  // languageParser is the Accept-Language parsing library used by the worker (import not shown);
  // the first entry is reported as the user language
  const langs = languageParser.parse(request.headers.get("accept-language") || "")
  if (langs.length > 0) {
    params.append('ul', langs[0].code)
  }

  // send the hit to GA
  return fetch(`https://www.google-analytics.com/collect?${params}`)
}

I am not a GA expert, so some values might not be relevant or might generate wrong data.
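The post does not show how the function is called from the worker. One common pattern on Cloudflare Workers is to hand the analytics promise to `waitUntil`, so the hit is sent in the background while the page is served. The sketch below is an assumption about that wiring, not the code running on this site: `handle`, `sendHit` and `fetchOrigin` are hypothetical names, and the `WaitUntil` type only models the `waitUntil` method of the fetch event.

```typescript
// Minimal shape of the object exposing waitUntil (the FetchEvent in a Worker).
type WaitUntil = { waitUntil(promise: Promise<unknown>): void }

// Serve the page and report the hit in the background.
// sendHit and fetchOrigin are injected so the sketch does not depend on a runtime.
function handle<Req, Res>(
  request: Req,
  event: WaitUntil,
  sendHit: (request: Req) => Promise<unknown>,
  fetchOrigin: (request: Req) => Promise<Res>,
): Promise<Res> {
  // A failed analytics call must never break the page, so the rejection is swallowed.
  event.waitUntil(sendHit(request).catch(() => undefined))
  return fetchOrigin(request)
}
```

In a worker this would be used roughly as `event.respondWith(handle(event.request, event, analytics_g3, fetch))`, assuming the service-worker style `fetch` event listener.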

The code has been running on this website for a couple of days now and some metrics are showing up in GA.

/public/2021/ga-measurement-protocal-with-no-cookie/ga-mp-00.png

Of course, don’t expect any values from user interactions to be available.

/public/2021/ga-measurement-protocal-with-no-cookie/ga-mp-01.png

The code sends every request to GA, so bot traffic is included and adds noise to the metrics.

/public/2021/ga-measurement-protocal-with-no-cookie/ga-mp-02.png

From a developer’s point of view, it can be nice to see bots, but for other people this is just noise. It is possible to detect bots at the Cloudflare Workers level and adjust the behaviour, either by adding a new attribute to the hit or by sending the bot traffic to a dedicated GA account.

/public/2021/ga-measurement-protocal-with-no-cookie/ga-mp-03.png
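As a rough illustration of the second option, the sketch below routes hits to a different GA property when the User-Agent looks like a crawler. The pattern is a naive heuristic that I am assuming for the example, not a reliable bot detector, and `isLikelyBot`, `trackingIdFor` and both tracking ids are hypothetical.

```typescript
// Very rough heuristic: common crawler and HTTP client markers in the User-Agent.
const BOT_PATTERN = /bot|crawler|spider|crawling|curl|wget|python-requests/i

function isLikelyBot(userAgent: string | null): boolean {
  // A request without a User-Agent is treated as a bot in this sketch.
  if (!userAgent) return true
  return BOT_PATTERN.test(userAgent)
}

// Pick the GA property for a hit; both ids are placeholders.
function trackingIdFor(userAgent: string | null): string {
  return isLikelyBot(userAgent) ? 'UA-BOTS-PLACEHOLDER' : 'UA-HUMANS-PLACEHOLDER'
}
```

The result of `trackingIdFor(request.headers.get('user-agent'))` could then replace the hard-coded `tid` value. Bots that spoof a browser User-Agent will still slip through.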

So the prototype is working!

Comparison with others

At the same time, I have also deployed two other cookie-less systems to compare numbers. Both are JavaScript-based and run in the browser.

PanelBear

Over the last 7 days, the solution has recorded 44 visits! To be honest, I don’t expect more for this kind of website.

/public/2021/ga-measurement-protocal-with-no-cookie/ga-mp-04.png

Cloudflare Analytics

Over the last 7 days, the solution has recorded 105 visits! That is more than double PanelBear’s figure. Why? I have not investigated yet; both rely on JavaScript. So either Cloudflare Analytics is not yet blocked or the two tools count unique visits differently.

/public/2021/ga-measurement-protocal-with-no-cookie/ga-mp-05.png

Both solutions are pretty far away from the GA values.

Can we use it in production? Probably.

Can we use the values as is? Probably not.

However, it is probably time to iterate and improve the implementation based on what has been learned so far.

Pandora’s box

The indirect but obvious finding of this small POC is that Cloudflare Workers provides, at no cost, an easy way to track people invisibly (if you accept that the generated Client Id might be shared by a group of users).

/public/2021/ga-measurement-protocal-with-no-cookie/ga-mp-06.png

This opens a lot of questions, but not for today…