Jul 30 2011

I’ve been programming in PHP ever since I was 13. It’s one of those “you can do anything with it” languages that I just love working with. I have recently launched a (pre-beta) service that uses Google Latitude to check you into Facebook Places totally automatically, based on where your phone reports you to be (more will follow, such as Foursquare). It was awesome fun to write and is now live for folks to play with (you can find out more at beta.CheckMeIn.at).

The Problem

If it were just for me, it would have been trivial to write: grab my Latitude position, compare it against a handful of places I frequent, and if any of them match, check me in on Facebook. Checking and comparing my location every 60 seconds would be really easy.

But what if I’m doing that for hundreds or even thousands of people? A script that handles each user in turn would run for hours just to complete one sweep of the user database: querying Google Latitude, doing the distance calculations from the latitudes and longitudes, and then punching any matches to Facebook. Cron that script to run every 60 seconds and each new copy would start long before the previous one had finished, so the copies would pile up until the server fell over from RAM exhaustion in about 10 minutes, and only the first 50 or 100 people in the user database would ever be processed.

The Solution

There are 3 background processes (excluding the maintenance bots) that ‘power’ CheckMeIn.at. They all work out of a central ‘work queue’ table: the parent process gets a list of work to do and inserts it into the table as work units. It then counts how much work there is and divides that by the number of work units each child process is allowed to handle at a time. If that calls for more children than are permitted to run at once, it spawns the first batch, lets them run, and spawns more as each one exits with its workload completed.
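A minimal sketch of the work-queue table pattern, assuming SQLite as a stand-in for the MySQL server and hypothetical names (`work_queue`, `claimed_by`, `claim_batch`); the real CheckMeIn.at schema isn't shown in the post. The parent enqueues one unit per user, and each child claims a batch inside a write-locked transaction so no two children pick up the same user:

```python
import sqlite3

# Hypothetical work-queue table; SQLite stands in for MySQL.


def create_queue(conn):
    conn.execute(
        """CREATE TABLE IF NOT EXISTS work_queue (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               user_id INTEGER NOT NULL,
               claimed_by TEXT
           )"""
    )


def enqueue(conn, user_ids):
    # Parent process: one work unit per user to process this sweep.
    conn.executemany(
        "INSERT INTO work_queue (user_id) VALUES (?)", [(u,) for u in user_ids]
    )


def pending_count(conn):
    # Parent process: how much unclaimed work is left.
    return conn.execute(
        "SELECT COUNT(*) FROM work_queue WHERE claimed_by IS NULL"
    ).fetchone()[0]


def claim_batch(conn, worker_id, batch_size):
    # Child process: take up to batch_size unclaimed units. BEGIN IMMEDIATE
    # grabs the write lock up front, so two children can't select the same rows.
    conn.execute("BEGIN IMMEDIATE")
    try:
        rows = conn.execute(
            "SELECT id, user_id FROM work_queue "
            "WHERE claimed_by IS NULL ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        conn.executemany(
            "UPDATE work_queue SET claimed_by = ? WHERE id = ?",
            [(worker_id, row_id) for row_id, _ in rows],
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return [user_id for _, user_id in rows]


def finish_batch(conn, worker_id):
    # Child process: remove its completed units before exiting.
    conn.execute("DELETE FROM work_queue WHERE claimed_by = ?", (worker_id,))
```

Open the connection with `sqlite3.connect(path, isolation_level=None)` so the explicit `BEGIN IMMEDIATE` / `COMMIT` pairs are what control the transactions.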

The beauty of it is that it grows dynamically. With 10 users it’ll happily spawn 1 process and run through them all in a second. With 100 users it’ll spawn 2 processes and do likewise. With 2,000 users it’ll spawn 10, and so on and so forth. If we have 1 million users it’ll spawn its maximum (say 50), then wait and spawn extras when there is room. All without any interaction on my part.
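The sizing rule can be sketched as a ceiling division capped at a maximum, plus a dispatcher that keeps at most that many batches running and starts the next one whenever a slot frees up. This is only an illustration: threads stand in for the forked PHP child processes, and the 200-units-per-child figure below is an assumption (the post’s own example of 2 processes for 100 users implies a sizing rule it doesn’t spell out).

```python
import math
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def children_needed(work_units, units_per_child, max_children):
    # ceil(work / units per child), capped at the permitted maximum.
    if work_units <= 0:
        return 0
    return min(math.ceil(work_units / units_per_child), max_children)


def dispatch(units, units_per_child, max_children, run_batch):
    # Split the work into batches, start up to max_children at once, and
    # start the next waiting batch each time a running one finishes.
    if max_children < 1:
        raise ValueError("max_children must be at least 1")
    batches = [units[i:i + units_per_child]
               for i in range(0, len(units), units_per_child)]
    waiting = batches[::-1]  # pop() from the end takes batches in order
    results = []
    running = set()
    with ThreadPoolExecutor(max_workers=max_children) as pool:
        while waiting or running:
            while waiting and len(running) < max_children:
                running.add(pool.submit(run_batch, waiting.pop()))
            done, running = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                results.append(future.result())
    return results
```

With `units_per_child=200` and `max_children=50`, 2,000 units give 10 children and a million units hit the cap of 50; the dispatcher then works through the remaining batches as slots free up.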

The Google Latitude Collector (GLC) manages the collection of user locations every 60 seconds. It’s “self-aware” in the sense that it manages its own workload, keeps track of the queries allowed by Google, and generally throttles itself to Do No Evil, while keeping the service responsive. 
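One way to sketch the GLC’s self-throttling is a sliding-window budget on outbound calls plus a per-user poll interval. The quota values are hypothetical; the post doesn’t say what limits Google imposed or exactly how the GLC tracks them.

```python
from collections import deque


class QueryBudget:
    """Sliding-window limit on outbound API calls (hypothetical quota)."""

    def __init__(self, max_queries, window_seconds):
        self.max_queries = max_queries
        self.window = window_seconds
        self.sent = deque()  # timestamps of calls still inside the window

    def try_acquire(self, now):
        # Forget calls that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.max_queries:
            self.sent.append(now)
            return True
        return False  # over budget: defer this user to a later sweep


def due_for_poll(last_polled, now, interval=60):
    # Each user's location is collected at most once per interval.
    return last_polled is None or now - last_polled >= interval
```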

The User Location Processor (ULP) works from a work queue in the same way, and compares the locations collected by the GLC against the list of Places each user has configured via the web interface. It computes matches and near misses (to help with the setup), honours the delay periods, and so on. If all criteria are met, it passes work units on to…
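The ULP’s core comparison might look like the sketch below: a haversine great-circle distance between the collected fix and a configured Place, then a classification into a match, a match still waiting out its delay, a near miss, or nothing. The field names and the 2× near-miss threshold are assumptions for illustration, not the service’s actual rules.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance in metres between two points given in degrees.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def classify(fix, place, seconds_at_place, near_factor=2.0):
    # fix: (lat, lon); place: hypothetical dict with lat, lon, radius_m, delay_s.
    d = haversine_m(fix[0], fix[1], place["lat"], place["lon"])
    if d <= place["radius_m"]:
        # Inside the radius, but only a match once the delay has elapsed.
        return "match" if seconds_at_place >= place["delay_s"] else "waiting"
    if d <= place["radius_m"] * near_factor:
        return "near_miss"  # reported back to help the user tune the radius
    return "none"
```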

The Facebook Check-in Injector (FCI). The FCI handles a shedload of sanity checks: it prevents double check-ins, examines Facebook for a user’s last check-in to make sure we’re not repeating something they’ve already done themselves, and lots more. If it all works out, we check them in and the whole thing goes round again.
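A hedged sketch of the double-check-in guard: given our own last check-in and the user’s most recent one as read back from Facebook (fetching it is out of scope here, so no Facebook API calls are shown), skip the check-in if either was at the same place within a cooldown. The `CheckIn` type and the 4-hour cooldown are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckIn:
    place_id: str
    timestamp: int  # Unix seconds


def should_check_in(place_id: str, now: int,
                    last_ours: Optional[CheckIn],
                    last_on_facebook: Optional[CheckIn],
                    cooldown_s: int = 4 * 3600) -> bool:
    # Skip if either we or the user already checked in here recently.
    for previous in (last_ours, last_on_facebook):
        if (previous is not None
                and previous.place_id == place_id
                and now - previous.timestamp < cooldown_s):
            return False
    return True
```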

It sounds complex, but once a Google Latitude Collector fires off, the user is checked in to Facebook about 4 seconds later (assuming the delay periods have been met).

The Moral

Plan for growth in your application from the very beginning. This project would have been a b*tch to modify later on. But by knowing it’d grow, and building self-awareness and control into the app, it can keep scaling as the user base grows. If the server that currently does all the processing becomes overloaded, it’s trivial to add a second one to halve its workload, all without modifying a single line of code.

The key, however, is to have a powerful database server to run it all off. In an hour the system can easily generate a million database queries as users interact with the site and the daemons go about their own business. Without a database server capable of keeping up, things start to seriously slow down.

 Posted by at 7:32 pm
Apr 21 2011

Those of you who have known me for any length of time will probably be aware that you could find my current location on my personal website (which is now this blog). Originally this was just the Google Latitude ‘badge’, a simple map of my current location with a guesstimated range bubble around me, which is still displayed in the right-hand column of every page. However, it only identified the town at best in text, and offered no historical view or alternative display when I was somewhere I go regularly.

Since 19th February 2011, I have been storing my Latitude location, as updated automatically by the mobile phone that goes everywhere with me. Every 60 seconds it is written to a MySQL database along with a reverse-geocode lookup from the Google Maps API of the best possible postal address for that latitude/longitude, an accuracy estimate (spot on with GPS, within 50 metres with wifi and city-centre 3G coverage, and upwards of 2km in the countryside), and a timestamp. A couple of authorised PHP shell scripts do all the raw collection and storage. This then allows me to create my own map of my location that I can play with, as well as offer minimaps of my ‘last 5 positions’ and anything else that might take my fancy.
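As a rough illustration of the storage side, here is one possible table for the 60-second log, again with SQLite standing in for MySQL and hypothetical table and column names; the Latitude and reverse-geocoding calls themselves are left out. The ‘last 5 positions’ minimap then becomes a simple ordered query:

```python
import sqlite3

# SQLite stands in for MySQL here; table and column names are hypothetical.
SCHEMA = """
CREATE TABLE IF NOT EXISTS location_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    recorded_at INTEGER NOT NULL,  -- Unix timestamp of the fix
    latitude REAL NOT NULL,
    longitude REAL NOT NULL,
    accuracy_m REAL,               -- reported accuracy radius in metres
    address TEXT                   -- best reverse-geocoded postal address
)
"""


def record_fix(conn, recorded_at, lat, lon, accuracy_m, address):
    # Called once per 60-second collection run.
    conn.execute(
        "INSERT INTO location_log "
        "(recorded_at, latitude, longitude, accuracy_m, address) "
        "VALUES (?, ?, ?, ?, ?)",
        (recorded_at, lat, lon, accuracy_m, address),
    )


def last_positions(conn, n=5):
    # Newest first, for the 'last 5 positions' minimap.
    return conn.execute(
        "SELECT recorded_at, latitude, longitude, address "
        "FROM location_log ORDER BY recorded_at DESC LIMIT ?",
        (n,),
    ).fetchall()
```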

For example, on my location page I now calculate how long I have been somewhere, and also check my current location against a database of places I visit regularly and tend to stay at for quite a while. There are 9 entries in it; a few examples are my house, my girlfriend’s house, a couple of Starbucks I go to regularly, and where I work. If it calculates that I am within the permitted range of any place in that table (each entry has its own range) and I’ve been there for more than a few minutes, it’ll “check me in” to that place and display precisely where I am. Once I begin moving again, it’ll check me out and return to the usual ‘roaming’ display.
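The check-in/check-out logic could be sketched like this, assuming a history of timestamped fixes, each place carrying its own permitted range, and a hypothetical 5-minute minimum stay standing in for “more than a few minutes”. It uses an equirectangular distance approximation rather than an exact great-circle formula; over a few hundred metres the difference is negligible.

```python
import math

EARTH_RADIUS_M = 6_371_000


def approx_distance_m(lat1, lon1, lat2, lon2):
    # Equirectangular approximation: good enough over the few hundred
    # metres that a place's permitted range covers.
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_M * math.hypot(x, y)


def current_place(history, places, min_stay_s=300):
    # history: list of (timestamp, lat, lon) fixes, oldest first.
    # places: list of (name, lat, lon, range_m); each has its own range.
    # Returns (name, seconds_there) when checked in, or None when roaming.
    if not history:
        return None
    t_now, lat, lon = history[-1]
    for name, plat, plon, range_m in places:
        if approx_distance_m(lat, lon, plat, plon) > range_m:
            continue
        # Walk back through the history while fixes stay inside the range.
        arrived = t_now
        for t, hlat, hlon in reversed(history):
            if approx_distance_m(hlat, hlon, plat, plon) > range_m:
                break
            arrived = t
        if t_now - arrived >= min_stay_s:
            return name, t_now - arrived
    return None
```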

If I ever get asked “To eliminate you from our murder enquiry, where were you at 5.33pm on the 2nd April 2011?” I can honestly say IKEA, Lakeside Retail Park, W Thurrock Way, Thurrock RM16 6, UK!

Some may question the logic of doing this: surely it’s an invasion of my privacy? That may be so, but given that any of my friends could ring me and ask “Where are you?”, what’s the difference?