AD0SK Project Log

Aug 01, 2020

Building a Geocities-style Hit Counter with Google Cloud Functions

When I was working on getting this site hosted, I knew I wanted some basic metrics reporting. Just basic view counts and referrers, nothing so overbuilt or privacy-invasive as Google Analytics. Server logs would suffice but my hosting doesn't provide them, and I've been out of the web game for long enough that I started asking my friends for suggestions. Among the responses I received was the following:

LOL I remember those hit counters from the 2000s

You are visitor NUMBER 219

Just do one of those

It would legit be so cool

Another friend, somewhat more helpfully, suggested GoatCounter. This ended up being a pretty perfect fit for my needs and what I ended up going with, but my mind had already worked through how we might go about building a geocities-style hit counter in 2020. Plus, it would legit be so cool.

So, let's do it!

Basic requirements

For those who aren't clear what I'm talking about, what we're going for here is something like this:

an old-school, odometer-style hit counter

An old-school hitcounter, mined from the bowels of archive.org and used without proper attribution like I'm the British Museum or something.

I'm not sure how these actually worked. I'd imagine just a CGI script that used ImageMagick to glue the digit assets together into a final image, although if CGI was rendering the page itself you could probably just emit img tags for the correct individual digits and do it that way. The former technique sounds better for our present purposes, for reasons that should become apparent.

It's 2020, so let's assume the visiting browser can display SVG. Let's try and write an endpoint that will keep an internal count, increment it on each GET request, and return that as an SVG that shows the current number. We can then just put that into an img tag and it should just work with no AJAXy complications or client-side anything.

Note, though, we don't currently have any server-side anything either (this site, for instance, being statically-hosted). To make anything work at all, then, let's put our new endpoint up on Google Cloud Functions. This is google's version of lambda/serverless, and should allow us to (1) write a function to service the GET request, (2) communicate with one of the various datastores that google provides to maintain the counter state, and (3) render the SVG code for return as the response payload.

Let's take these tasks in 3, 1, 2 order:

Implementation: the display

For this, we want something that takes an arbitrary integer and returns an SVG document that draws it like we want.

Let's use the drawSvg module for the SVG manipulation. First nail the basic geometry by assuming (very optimistically) six digits to display, which we'll zero-pad from the left. Make each digit 40 pixels wide by 60 pixels tall.

import drawSvg as draw

N_DIGITS = 6
DIGIT_WIDTH = 40
DIGIT_HEIGHT = 60

def draw_counter(n):
    """Given integer n, returns a drawSvg.Drawing representing n"""

    d = draw.Drawing(
        width = N_DIGITS * DIGIT_WIDTH,
        height = DIGIT_HEIGHT
    )

    return d

This will give us an SVG with the correct geometry, although there's nothing in it yet. We'll want some digits in there, which we'll presumably want to draw individually, so let's write a helper that breaks those out:

def get_digits(n):
    """return list of single-character strings, left-to-right the digits of n,
    zero-padded to N_DIGITS.
    Raise AssertionError if n exceeds representable digits"""

    fmtstr = f'{{:0{N_DIGITS}d}}'
    digits = list(fmtstr.format(n))
    assert len(digits) == N_DIGITS, f'Overflow trying to fit {n} ' \
                                    f'into {N_DIGITS} digits'

    return digits

Since we want the result to pad out to N_DIGITS, and since this might conceivably change, we double-bag it on the format string. Given the default value above, fmtstr ends up being '{:06d}', and thus get_digits(69) returns ['0', '0', '0', '0', '6', '9']. (Note: there are many, many other ways of accomplishing what we're doing, some of them a lot more fun. Quick reminder that "fun" is not necessarily an admirable objective when developing software in a pedagogical or team environment.)
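To make the "many other ways" remark concrete, here is a hedged sketch of two equivalent spellings using only the standard library: a nested format spec inside an f-string, and str.zfill. The names digits_fstring and digits_zfill are made up for this illustration and are not part of the project. Neither one checks for overflow, so the assertion in get_digits would still be needed.

```python
N_DIGITS = 6

def digits_fstring(n):
    """Zero-pad n with a nested format spec rather than a two-step format string."""
    return list(f'{n:0{N_DIGITS}d}')

def digits_zfill(n):
    """Zero-pad n by converting it to a string and left-filling with zeros."""
    return list(str(n).zfill(N_DIGITS))

print(digits_fstring(69))
print(digits_zfill(69))
```

Both calls print the same six-element list that get_digits(69) returns.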

Although perhaps not in keeping with the spirit of the geocities inspiration, we also aggressively trap for an argument that's not representable in the given width (instead of silently overflowing, trying to paste ancillary characters outside the bounds of the SVG draw area, throwing a weird internal exception, or other unexpected behaviors). Thus, get_digits(1234567) will raise AssertionError. To be totally rigorous, we should also check to make sure the argument is an integer and non-negative (a small negative n, for instance, sneaks a minus sign past the length check). In a professional environment, we'd probably (hopefully?) have unit tests to ensure that exceptional cases like these are handled with a minimum of surprise.
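As a sketch of what that extra rigor might look like (an assumption on my part, not the project's code), the hypothetical get_digits_strict below swaps the assertion for explicit exceptions, which also survive running Python with -O, and adds the type and sign checks:

```python
N_DIGITS = 6

def get_digits_strict(n):
    """Like get_digits, but also rejects non-integers and negative numbers."""
    # bool is a subclass of int, so rule it out explicitly
    if not isinstance(n, int) or isinstance(n, bool):
        raise TypeError(f'expected an int, got {type(n).__name__}')
    if n < 0:
        raise ValueError(f'expected a non-negative count, got {n}')
    digits = list(f'{n:0{N_DIGITS}d}')
    if len(digits) != N_DIGITS:
        raise OverflowError(f'cannot fit {n} into {N_DIGITS} digits')
    return digits

# Exercise the exceptional cases and report which exception each one raises
for bad in (1234567, -1, 6.9, '69'):
    try:
        get_digits_strict(bad)
    except (TypeError, ValueError, OverflowError) as exc:
        print(f'{bad!r}: {type(exc).__name__}')
```

The loop at the bottom is a poor man's unit test; in a real project these cases would live in a proper test suite.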

Now that we have these digits, we can apply them as text elements to our Drawing. We add these to the d object within the scope of draw_counter above like so:

for digit, x in zip(get_digits(n), range(N_DIGITS)):
    d.draw(
        draw.Text(
            digit,
            fontSize=DIGIT_HEIGHT,
            x=(x+.5)*DIGIT_WIDTH,
            y=DIGIT_HEIGHT*.65,
            fill='white',
            font_family='courier, MONOSPACE',
            center=True
        ))

There's some munging going on to get the characters to align, the details of which I'll spare you (protip: just monkey with it between some combination of code tweaks and devtools with liberal use of the refresh button until it looks right). We will, however, bow in the general direction of cross-platform visual consistency by specifying a font, and go ahead and make the numbers white, to show up against the dark background.

Except, we don't have a dark background yet. The following needs to be specified farther up in the procedure so as to end up on a lower z-layer, but we can use an SVG gradient to get the awesome drum counter visual effect:

g = draw.LinearGradient(0, 0, 0, DIGIT_HEIGHT)
g.addStop(0, 'black')
g.addStop(0.5, '#666')
g.addStop(1, 'black')

d.draw(draw.Rectangle(
    0,
    0,
    width=N_DIGITS * DIGIT_WIDTH,
    height=DIGIT_HEIGHT,
    fill=g
))

And maybe some vertical lines between the digits, while we're at it:

for x in range(-1, N_DIGITS+1):
    d.draw(
        draw.Line(
            x*DIGIT_WIDTH,
            0,
            x*DIGIT_WIDTH,
            DIGIT_HEIGHT,
            stroke_width=8,
            stroke='#222'
        )
    )

Put all together, draw_counter(31337) then returns an SVG document that looks something like the following:

svg output of the above code

Not bad for 55 lines of (mostly whitespace) code with no static assets!

Implementation: the deployment

To catch up, we now have a thing that, given a number n, will provide a fancy SVG graphic depicting that number n in the most awesome 90s-tastic fashion possible. It remains to host this someplace that will allow for retrieval of those graphics over the public internet, and which will hopefully take care of incrementing n once per such request, thereby making it a proper hit counter.

First, let's test that our svg generation works in a deployed environment. Define a function test_counter that takes one argument, the flask.Request object representing the request, which we'll then ignore. Have this return the svg data from a static n:

def test_counter(request):
    return draw_counter(69).asSvg()

and, for convenience, a Makefile to deploy it:

deploy-test:
    gcloud functions deploy test-counter --entry-point test_counter \
           --runtime python37 --trigger-http

We also need a requirements.txt telling GCF that our execution environment needs drawSvg.

Getting that to actually work will require a bit of configuration in the Google Cloud Console, and I'm going to consider that out-of-scope for this treatment. For one, there are a million blog posts out there about "how to get started with Google Cloud Functions" that go through all that in detail. For two, they're all probably out-of-date, because google seems to change the minutiae of the configuration pages quite frequently. Google's own documentation is kept current and is also surprisingly good, and I'd refer any readers actually playing the home game to that.

Anyway, (waves hands), given the configuration of my project, this endpoint is now available at https://us-central1-geocities-counter.cloudfunctions.net/test-counter. Amazingly it just works, with the xmlns presumably saving us from having to futz about with MIME types or anything. Now, to make it tick...

Implementation: data persistence

The GCF execution environment is ephemeral, so obviously we can't just use a global variable to retain the counter state, or anything else that's tied to the interpreter lifecycle. So, although it seems like overkill, we're going to need to set up an external datastore to retain this state. GCP is absolutely resplendent with options for doing this: we could use google's Bigtable or Spanner, a hosted instance of MongoDB or Postgres, even a flat file in Google Cloud Storage. For this walk-through I'm going to use firestore, which is a NoSQL/document database that came out of google's acquisition of Firebase and that's easy to get started with for projects like this.

As above, I'm going to skip the details of actually setting this up on the Cloud Console side. Suffice to say we've set up firestore for the project and defined a collection called counter. Now, we add a second function:

from google.cloud import firestore

def get_counter(request):
    """GCF entrypoint: retrieves counter state from firestore, initializing if
    necessary.
    Increments this and persists to firestore, returning svg payload of counter
    displaying new count"""

    db = firestore.Client()
    doc_ref = db.collection(u'counter').document(u'count')
    doc = doc_ref.get()
    old_n = doc.get(u'n') if doc.exists else 0

    n = old_n + 1
    doc_ref.set({u'n': n})

    return draw_counter(n).asSvg()

and a make target for same:

deploy:
    gcloud functions deploy counter --entry-point get_counter \
           --runtime python37 --trigger-http

We need the firestore module from the google.cloud package for anything to work, and this must be added to requirements.txt also. Despite being a google product, the Cloud Functions execution environment doesn't inject every possible dependency into the runtime, which we may assume is a Good Thing.

When we instantiate a database client via firestore.Client(), it does know, somewhat magically, that we want to connect to the project's firestore, and it handles this connection, including authentication. This is actually totally awesome since anyone who's done work like this knows getting things to talk like that usually takes at least an hour, along with a large repertoire of curse words to get working properly.

Even though the value we're trying to store is scalar, firestore makes us go through a couple layers of abstraction to get there (being, of course, designed to operate on much larger collections of data, the value of which we'll see in the conclusion). First, we must identify a collection (here called counter), which we created through the Cloud Console (firestore will also create a collection implicitly the first time a document is written to it). The unit that a collection collects is called a document, and documents are identified by unique keys. We use the magic key count to identify our single document of concern. The documents are (potentially heterogeneous) associative arrays, but db.collection(...).document(...) only gives us a reference: nothing is retrieved until we call .get() on it. This allows us to check for existence and apply default data if necessary, allowing document lifecycle to be managed at the application level even after relegating that of the collection to the infrastructure. Finally, our scalar value of interest lives on a field of the document we call n, which we retrieve if the document exists and default to 0 otherwise. We then increment this value, overwrite the document with the new value of n, and pass it to draw_counter to generate our response.

Although it worked above when directly loaded, it turns out that browsers don't seem to render svg in an img tag without either a .svg file extension (which GCF doesn't allow) or the correct mimetype set, so we force that on the response. We then should be able to use this as the src value of an img tag, like so:

img tag with src pointing to our new endpoint

Sit there and jam on the refresh button for a minute to prove to yourself it works.
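The get_counter listing above doesn't show the step of forcing the mimetype. A minimal sketch of one way to do it, assuming we lean on Flask's convention that a view may return a (body, status, headers) tuple, which the Cloud Functions Python runtime hands to Flask to build the response. The helper name svg_response is hypothetical and not taken from the project:

```python
SVG_MIMETYPE = 'image/svg+xml'

def svg_response(svg_text):
    """Wrap SVG markup in a (body, status, headers) tuple that Flask understands."""
    return svg_text, 200, {'Content-Type': SVG_MIMETYPE}

# Stand-in markup; in the real function this would be draw_counter(n).asSvg()
body, status, headers = svg_response('<svg xmlns="http://www.w3.org/2000/svg"/>')
print(status, headers['Content-Type'])
```

With something like this in place, the last line of get_counter would become return svg_response(draw_counter(n).asSvg()).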

Conclusion

So that was fun. We've accomplished our goal of implementing a server-side-rendered graphical hit counter that I feel is very aligned with the spirit of those we saw on geocities and other sites in the early days of the public web (even if "server" has become a much more nebulous concept since then). Project files may be found at https://github.com/drewhutchison/geocities-counter.

From here, there are a few places we could obviously go:

  • If we don't trust our clients to render SVG, we could use drawSvg's .rasterize() method to return it as a PNG or whatever. Maybe even predicate this on the contents of the request's Accept header.
  • A lot of hit counters back then used a different visual presentation: 7-segment displays, plaintext, or whatever. My feeling is, if we're going old-school skeuomorphic, go old-school skeuomorphic and stick with the drum counter, but since this is encapsulated in the draw_counter function, we can really do whatever we want. For instance, a bit of signed random added to the y argument of the draw.Text constructor would give us a bit of jitter to the drums for a more authentic 90s feel.
  • There's a classic race condition, in that two simultaneous requests might retrieve the same n and both overwrite with their n+1, resulting in a count being "lost". Obviously this would be unacceptable in certain applications, and there are well-known ways of preventing this, but I think for a simple hitcounter we're OK.
  • This is really the dumbest thing possible and, except for the concern just noted, will advance the count on each discrete GET request. Most things like this are minimally interested in unique visitors. Since we have access to the entire Request object, we could tamp this down by IP, set a browser-side cookie, or employ some more-sophisticated type of client fingerprinting to account for non-unique views. This need not be invasive: GoatCounter, in fact, employs a rather ingenious mechanism to track unique visitors in a GDPR-compliant way (see https://www.goatcounter.com/gdpr for details).
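On the race condition mentioned in the list: the usual cure is to make the read-increment-write cycle atomic. On the datastore side, the google-cloud-firestore client offers transactions and a server-side Increment transform for this. The sketch below deliberately doesn't use either, and only illustrates the idea in a single process with a lock; LockedCounter and hammer are hypothetical names for this illustration.

```python
import threading

class LockedCounter:
    """Counter whose read-increment-write cycle is guarded by a lock."""

    def __init__(self, start=0):
        self._n = start
        self._lock = threading.Lock()

    def hit(self):
        """Atomically increment the count and return the new value."""
        with self._lock:
            n = self._n + 1
            self._n = n
            return n

    @property
    def value(self):
        """Current count, read under the same lock."""
        with self._lock:
            return self._n

def hammer(counter, n_threads=8, hits_per_thread=1000):
    """Fire hits at the counter from several threads; return the final count."""
    def worker():
        for _ in range(hits_per_thread):
            counter.hit()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value

print(hammer(LockedCounter()))
```

Because every increment happens while holding the lock, no hit is lost no matter how the threads interleave.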

Counting unique visitors would entail a modification of our use of the datastore: instead of just incrementing a scalar n we would keep a set of identifiers (the IP, some uniquely-identifying hash, or whatever) and return the cardinality of this set. The use of a document-store database like firestore makes this a very easy change. At that point, we could also do away with the magic key count entirely and instead use the URL as determined from the request, and we're well on our way to a full-featured analytics system, despite not changing the public-facing API (of a GET to the function endpoint) at all.
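A rough sketch of that set-cardinality idea, with a plain in-memory dict standing in for firestore. Everything here is an assumption for illustration: the class name UniqueVisitorCounter, the salted SHA-256 of the IP as the identifier, and the example salt. This is not GoatCounter's mechanism, and a salted hash of an IP shouldn't be mistaken for real anonymization.

```python
import hashlib

class UniqueVisitorCounter:
    """In-memory stand-in for a per-page set of visitor identifiers."""

    def __init__(self, salt):
        self._salt = salt
        self._seen = {}  # page key -> set of hashed visitor identifiers

    def record(self, page, ip):
        """Record a visit and return the number of distinct visitors to page."""
        digest = hashlib.sha256(f'{self._salt}:{ip}'.encode()).hexdigest()
        visitors = self._seen.setdefault(page, set())
        visitors.add(digest)  # adding an existing member is a no-op
        return len(visitors)

counter = UniqueVisitorCounter(salt='example-salt')
print(counter.record('/index.html', '203.0.113.7'))   # first visitor
print(counter.record('/index.html', '203.0.113.7'))   # same visitor again
print(counter.record('/index.html', '198.51.100.2'))  # a second visitor
```

Keying the outer dict by page is the in-memory analogue of using the URL as the document key instead of a single magic one.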

But, that would have the effect of reducing the total count, and I think I get more internet points the higher that thing goes. Did you refresh the page to prove to yourself it's working yet?

©2020 andrew james hutchison