I’ve been programming in PHP since I was 13. It’s one of those “you can do anything with it” languages that I just love working with. I recently launched a (pre-beta) service that automatically checks you into Facebook Places (with more to follow, such as Foursquare) based on where your phone reports you to be, courtesy of Google Latitude. It was awesome fun to write and is now live for folks to play with (you can find out more at beta.CheckMeIn.at).
The Problem
Now if it were just for me, it would have been trivial to write. Grab my Latitude position, compare it against a handful of places I frequent, and if any of them match, check me in on Facebook. Checking and comparing my location every 60 seconds would be really easy.
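As a rough illustration of that comparison step, here is a minimal sketch in Python (not the service's actual code). It assumes the haversine formula for distance and a 100-metre match radius; the function names, the radius and the sample coordinates are all made up for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def matching_places(position, places, radius_m=100):
    """Return the names of places within radius_m of position (lat, lon).

    `places` maps a name to a (lat, lon) tuple. The 100 m default is an
    assumption for this sketch, not a figure from the service.
    """
    lat, lon = position
    return [name for name, (plat, plon) in places.items()
            if haversine_m(lat, lon, plat, plon) <= radius_m]


# Hypothetical data: one place a few dozen metres away, one several km away.
places = {"office": (51.5003, -0.12), "gym": (51.6, -0.12)}
nearby = matching_places((51.5, -0.12), places)
```

For one person and a handful of places this is a few microseconds of maths; the cost only becomes interesting once it is multiplied by every user, every minute.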
But what if I’m doing that for hundreds or even thousands of people? A script that processes each user in turn would run for hours just doing one sweep of the user database: querying Google Latitude, doing the distance calculations on latitudes and longitudes, and then punching any matches to Facebook. Cron that script to run every 60 seconds and each new run starts before the previous one has finished, so the server would fall over from RAM exhaustion in about 10 minutes, and only the first 50 or 100 people in the user database would ever be processed.
The Solution
There are three background processes (excluding the maintenance bots) that ‘power’ CheckMeIn.at. They are all written to work out of a central ‘work queue’ table: the parent process gets a list of work to do and inserts work units into the work queue table. It then counts up how much work there is, and divides that by the number of work units each child process is allowed to handle at a time to get the number of children it needs. If that calls for more children than are permitted at once, it spawns off the first batch, lets them run, and then spawns more as they exit with their workloads completed.
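The parent/child pattern can be sketched roughly like this. It is a Python illustration under several assumptions, not the real daemons: a plain list stands in for the work queue table, a thread pool stands in for the spawned child processes, and `process_batch`, `units_per_child` and `max_children` are hypothetical names and values.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(work_units, units_per_child):
    """Cut the queue into slices of at most units_per_child items each."""
    return [work_units[i:i + units_per_child]
            for i in range(0, len(work_units), units_per_child)]


def process_batch(batch):
    """Placeholder for a child's real work; here it just counts its units."""
    return len(batch)


def run_sweep(work_units, units_per_child=50, max_children=10):
    """Process every work unit with at most max_children workers at once.

    Batches beyond max_children wait in the pool's queue and start as soon
    as an earlier worker finishes, mirroring "spawn more as they exit".
    """
    batches = split_into_batches(work_units, units_per_child)
    if not batches:
        return 0
    workers = min(len(batches), max_children)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_batch, batches))


# 1,234 hypothetical work units -> 25 batches, at most 10 running at a time.
processed = run_sweep(list(range(1234)))
```

The point of the design is that the parent never does the slow work itself; it only decides how to slice it up and how many workers may run at once.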
The beauty of it is that it grows itself dynamically. With 10 users it’ll happily spawn 1 process and run through them all in a second. With 100 users it’ll spawn 2 processes and do likewise. With 2,000 users it’ll spawn 10, and so on and so forth. If we have 1 million users it’ll spawn its maximum (say 50), then wait and spawn extras when there is room. All without any interaction on my part.
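The sizing arithmetic behind that is just a ceiling division with a cap. A small Python sketch, assuming 50 work units per child and a cap of 50 children (the real batch size isn't stated here, so the counts won't line up exactly with the illustrative figures above):

```python
def plan_children(work_units, units_per_child=50, max_children=50):
    """Return (children to spawn now, batches left waiting for a free slot).

    Both defaults are assumptions made for this example.
    """
    if work_units <= 0:
        return 0, 0
    # Integer ceiling division: how many batches cover all the work units.
    batches = (work_units + units_per_child - 1) // units_per_child
    running = min(batches, max_children)
    return running, batches - running


for users in (10, 100, 2_000, 1_000_000):
    print(users, plan_children(users))
# 10 (1, 0)
# 100 (2, 0)
# 2000 (40, 0)
# 1000000 (50, 19950)
```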
The Google Latitude Collector (GLC) manages the collection of user locations every 60 seconds. It’s “self-aware” in the sense that it manages its own workload, keeps track of the queries allowed by Google, and generally throttles itself to Do No Evil, while keeping the service responsive.
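One simple way to get that kind of self-throttling is a fixed-window counter: allow a set number of calls per time window and refuse the rest until the window rolls over. The Python sketch below is an assumption about the general technique, not the GLC's actual logic, and the limits in the example are placeholders rather than Google's real quotas.

```python
import time


class QuotaThrottle:
    """Allow at most max_calls per window_s seconds (fixed-window counter)."""

    def __init__(self, max_calls, window_s, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock              # injectable so it can be tested
        self.window_start = clock()
        self.used = 0

    def try_acquire(self):
        """Return True and count the call if quota remains, else False."""
        now = self.clock()
        if now - self.window_start >= self.window_s:
            # The window has expired: start a fresh one with a clean count.
            self.window_start = now
            self.used = 0
        if self.used < self.max_calls:
            self.used += 1
            return True
        return False


# Placeholder limit: 2 calls per 60 seconds, driven by a fake clock.
fake_now = [0.0]
throttle = QuotaThrottle(max_calls=2, window_s=60, clock=lambda: fake_now[0])
first_three = [throttle.try_acquire() for _ in range(3)]  # third is refused
fake_now[0] = 61.0
after_window = throttle.try_acquire()                     # allowed again
```

A collector built this way would check `try_acquire()` before each upstream query and leave the work unit in the queue for the next sweep when it returns False.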
The User Location Processor (ULP) follows the same work queue principles, and compares locations collected by the GLC against a list of Places the user has configured via the web interface. It computes matches and near misses (to help with the setup), honours the delay periods, and so on. If all criteria are met, it passes work units on to…
The Facebook Check-in Injector (FCI). The FCI handles a shedload of sanity checks, prevents double check-ins, examines Facebook for a user’s last check-in to make sure we’re not doing something they’ve already done themselves, and lots more. If it all works out, then we check them in and the whole thing goes round again.
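The decisions made by those last two stages might look something like the Python sketch below. It rests on assumptions: that a “delay period” means the user must have been at the place for a minimum time, that a near miss is a second, wider radius, and that a double check-in means the same place within a minimum gap. All names and thresholds are hypothetical, and the user's last check-in is passed in as an argument rather than fetched from Facebook.

```python
from datetime import datetime, timedelta


def classify_visit(distance_m, seconds_at_place,
                   radius_m=100, near_miss_m=250, delay_s=300):
    """ULP-style decision for one user/place pair (thresholds assumed)."""
    if distance_m <= radius_m:
        # Inside the place: only act once the delay period has elapsed.
        return "check_in" if seconds_at_place >= delay_s else "waiting"
    if distance_m <= near_miss_m:
        return "near_miss"   # close enough to report as a setup hint
    return "no_match"


def should_check_in(place_id, now, last_checkin, min_gap=timedelta(hours=1)):
    """FCI-style guard against repeating a check-in.

    `last_checkin` is None or a (place_id, datetime) tuple describing the
    user's most recent check-in, whether made by us or by the user.
    """
    if last_checkin is None:
        return True
    last_place, last_time = last_checkin
    return not (last_place == place_id and now - last_time < min_gap)


now = datetime(2011, 1, 1, 12, 0)
recent = ("cafe", now - timedelta(minutes=10))
decision = classify_visit(distance_m=40, seconds_at_place=600)
allowed = should_check_in("cafe", now, recent)   # already checked in there
```

Keeping the two decisions separate mirrors the split between the processes: one decides whether a location counts as a visit, the other decides whether acting on it would be redundant.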
It sounds complex, but from the moment a Google Latitude Collector fires to the moment a user is checked in to Facebook (assuming we’ve adhered to the delay periods) takes about 4 seconds.
The Moral
Plan for growth in your application from the very beginning. This project would have been a b*tch to modify later on. But by knowing it’d grow, and building self-awareness and control into the app, it can keep scaling as the user base grows. If the current server that does all the processing becomes overloaded, it’s trivial to add another to halve its workload, all without having to modify a single line of code.
The key, however, is to have a powerful database server to run it all off. In an hour it can easily generate a million database queries as users interact with the site and the daemons go about their own business. Without a database server capable of keeping up, things start to seriously slow down.