Cached Resources and Analytics using Cloudflare Workers

We were facing the challenge of gathering request telemetry for our cached API endpoints served by a Content Delivery Network (CDN). For non-cached resources, we usually just query our proxy server logs in BigQuery. That does not work in this scenario: after the first hit, the CDN serves the cached response and the request never reaches the proxy server. On a web page we could use JavaScript to send the information to another server, but our API endpoints return plain text or JSON, so there is no page to run such a script on.

One solution we found was to gather the data at the CDN level. Luckily, Cloudflare, our CDN provider, offers such a feature in the form of Cloudflare Workers. Cloudflare Workers are serverless applications written as service workers. They are easy to deploy using Wrangler, the CLI application Cloudflare provides.

We start by installing Wrangler, logging in, and creating the application:

$ npm install -g @cloudflare/wrangler
$ wrangler login
$ wrangler init our_cf_application
$ wrangler dev

These steps are taken from the Cloudflare Workers Getting Started guide.

Apart from the main application script, the most important part is the configuration. Wrangler reads it from wrangler.toml in the root directory of the application. The file must contain the account information and other application settings. Here's a very simple wrangler.toml that we use:

name = "news-views-counter-worker"
type = "javascript"

account_id = "somerandomnumbers"
workers_dev = true
compatibility_date = "2021-11-15"

[env.staging]
route = "api.staging-domain.com/news/story/*"
zone_id = "somerandomnumbersagain"
vars = { LOG_HOST = "logging.staging-domain.com" }

[env.production]
route = "api.production-domain.com/news/story/*"
zone_id = "somerandomnumbersoncemore"
vars = { LOG_HOST = "logging.production-domain.com" }

The name identifies this worker in the Cloudflare dashboard, and type indicates that the application is written in JavaScript. The account_id and zone_id can be found in the Cloudflare dashboard. Setting workers_dev to true gives us a Cloudflare-provided workers.dev subdomain to deploy to and test our worker on. Custom variables are passed in through the vars setting; they appear as globals inside the application (LOG_HOST in the code below). The route setting tells Cloudflare which requests this worker handles: here, only requests to the API host whose path starts with /news/story/.
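To make the route setting more concrete, here is a rough JavaScript sketch of how a pattern like the ones in wrangler.toml can be matched against a request URL. The matchesRoute helper is hypothetical and only approximates Cloudflare's matching: it compares hostname plus path, treats a trailing * as "anything after this prefix", ignores the scheme and query string, and does not handle leading hostname wildcards.

```javascript
// Hypothetical helper: a simplified approximation of Cloudflare route matching.
// Assumes patterns of the form "host/path" with an optional trailing "*";
// scheme, query string and leading host wildcards are not handled.
function matchesRoute(pattern, url) {
  const u = new URL(url)
  // Routes are expressed without a scheme, so compare hostname + path only
  const target = u.hostname + u.pathname
  if (pattern.endsWith("*")) {
    // Trailing wildcard: everything before "*" must be a prefix of the target
    return target.startsWith(pattern.slice(0, -1))
  }
  return target === pattern
}

const staging = "api.staging-domain.com/news/story/*"
console.log(matchesRoute(staging, "https://api.staging-domain.com/news/story/42?locales[]=en")) // true
console.log(matchesRoute(staging, "https://api.staging-domain.com/news/feed")) // false
console.log(matchesRoute(staging, "https://api.production-domain.com/news/story/42")) // false
```

Requests that do not match any route never reach the worker, which is why the staging and production environments each declare their own route.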

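addEventListener('fetch', event => {
  // handleRequest is async, so a synchronous try/catch would not see its
  // errors; catch the rejected promise instead and return an error response
  event.respondWith(
    handleRequest(event).catch(e => new Response("Error: " + e.message, { status: 500 }))
  )
})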

async function handleRequest(event) {
  const request = event.request
  const cacheUrl = new URL(request.url)
  const cacheKey = new Request(cacheUrl.toString(), request)
  const cache = caches.default

  // look the request up in Cloudflare's cache first
  let response = await cache.match(cacheKey)

  // fetch from origin if not in cache
  if (!response) {
    response = await fetch(request)
    // re-create the response so its headers are mutable
    response = new Response(response.body, response)

    // we only put in cache if origin's response is ok
    if (response.ok) {
      event.waitUntil(cache.put(cacheKey, response.clone()))
    }
  }
  if (response.ok) {
    // assumes the origin returns a JSON array; items is its length
    let body = await response.clone().json()
    let items = body.length
    let cf_data = request.cf
    let headers = request.headers
    let url = new URL(request.url)
    let params = url.searchParams

    let data = {
      "longitude": cf_data['longitude'],
      "latitude": cf_data['latitude'],
      "continent": cf_data['continent'],
      "timezone": cf_data['timezone'],
      "country": cf_data['country'],
      "city": cf_data['city'],
      "region": cf_data['region'],
      "region_code": cf_data['regionCode'],
      "colo": cf_data['colo'], // CF colocation IATA code
      "postal_code": cf_data['postalCode'],
      "asn_organization": cf_data['asOrganization'], // ASN name or the ISP name
      "metro_code": cf_data['metroCode'],
      "eu_country": cf_data['isEUCountry'], // "1" if inside EU
      "user_agent": headers.get("user-agent"),
      "real_ip": headers.get("x-real-ip"),
      "platform": headers.get("sec-ch-ua-platform"), // e.g. "Windows", "macOS", "Linux" (Chromium-based browsers)
      "url": request.url,
      "locales": params.getAll("locales[]"),
      "items": items
    }

    let logger_url = "https://" +  LOG_HOST + "/news_logger/log"
    let logger_request = new Request(logger_url, {
      body: JSON.stringify(data),
      method: "POST",
      headers: new Headers({
        "Content-Type": "application/json",
        "User-Agent": "News Views Counter Worker"
      })
    })

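    // fire and forget: waitUntil lets the log request complete after the
    // response is returned, without delaying the client
    event.waitUntil(fetch(logger_request))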
  }

  return response
}

The code above listens for the fetch event. In our main handler, we build a cache key from the request and look it up in Cloudflare's cache. If the response is cached, we use it; otherwise we make a request to our upstream application and put its response into the cache. We then build a JSON object from Cloudflare's request metadata (request.cf), the request headers and the number of items in the upstream response, and send it to our logging application. The logged data is then processed and consumed for analytics. Without Cloudflare Workers, we would not be able to collect information on requests made to our cached resources.
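The cache-then-origin flow can also be sketched without the Workers runtime, which makes it easier to reason about and to unit test. In this sketch, cacheAside and MemoryCache are hypothetical, and the injected fetchOrigin and waitUntil functions are stand-ins for fetch() and event.waitUntil(); none of them are Cloudflare APIs, and the in-memory cache only imitates the match/put shape of caches.default.

```javascript
// Hypothetical sketch of the cache-aside flow used in handleRequest.
// `cache` needs async match(key) / put(key, value); `fetchOrigin` and
// `waitUntil` are injected stand-ins for fetch() and event.waitUntil().
async function cacheAside(key, cache, fetchOrigin, waitUntil) {
  // Cache hit: the origin (and therefore our proxy logs) never sees it
  const cached = await cache.match(key)
  if (cached !== undefined) {
    return { response: cached, hit: true }
  }
  // Cache miss: ask the origin
  const response = await fetchOrigin(key)
  // Only successful responses are stored; the write is handed to waitUntil
  if (response.ok) {
    waitUntil(cache.put(key, response))
  }
  return { response, hit: false }
}

// In-memory stand-in for caches.default (assumption, for illustration only)
class MemoryCache {
  constructor() { this.store = new Map() }
  async match(key) { return this.store.get(key) }
  async put(key, value) { this.store.set(key, value) }
}

async function demo() {
  const cache = new MemoryCache()
  let originCalls = 0
  const fetchOrigin = async () => { originCalls++; return { ok: true, body: "[]" } }
  const pending = []
  const waitUntil = p => pending.push(p)

  const first = await cacheAside("/news/story/1", cache, fetchOrigin, waitUntil)
  await Promise.all(pending) // let the deferred cache write finish
  const second = await cacheAside("/news/story/1", cache, fetchOrigin, waitUntil)
  console.log(first.hit, second.hit, originCalls) // false true 1
}

demo()
```

The first request misses and reaches the origin, the second is served from the cache, and the origin is called only once. That second request is exactly the kind our proxy logs never see, which is why the logging has to happen in the Worker.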

Cloudflare Workers can do more than intercept traffic and log it. Full-fledged applications, with database access and routing, can be built on them. We can even build a scheduled task runner by running Workers on Cron Triggers.