Programming – The Blog of Lee

MySQL Query to Use MaxMind GeoLite IP Database

Feb 162015

MaxMind offer a fantastic resource to geolocate IP addresses. Unfortunately the PHP API supplied is overkill for most (simple) uses. Instead, we can do it mostly within PHP/MySQL.

Create Schema:

DROP TABLE IF EXISTS `blocks`;
CREATE TABLE  `blocks` (
  `startIPNum` int(10) unsigned NOT NULL,
  `endIPNum` int(10) unsigned NOT NULL,
  `locID` int(10) unsigned NOT NULL,
  PRIMARY KEY  (`startIPNum`,`endIPNum`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 PACK_KEYS=1 DELAY_KEY_WRITE=1;

DROP TABLE IF EXISTS `location`;
CREATE TABLE  `location` (
  `locID` int(10) unsigned NOT NULL,
  `country` char(2) default NULL,
  `region` char(2) default NULL,
  `city` varchar(45) default NULL,
  `postalCode` char(7) default NULL,
  `latitude` double default NULL,
  `longitude` double default NULL,
  `dmaCode` char(3) default NULL,
  `areaCode` char(3) default NULL,
  PRIMARY KEY  (`locID`),
  KEY `Index_Country` (`country`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 ROW_FORMAT=FIXED;

Read in the MaxMind data:

load data infile 'GeoLiteCity-Blocks.csv' 
into table `blocks` 
fields terminated by ',' optionally enclosed by '"' 
lines terminated by '\n' 
ignore 2 lines;

load data infile 'GeoLiteCity-Location.csv' 
into table `location` 
fields terminated by ',' optionally enclosed by '"' 
lines terminated by '\n' 
ignore 2 lines;

And then we can use raw SQL to query against it – extremely quickly (0.0001 sec) by using the following query:

SELECT * 
FROM   `location` 
WHERE  locid = 
       ( 
                  SELECT     locid 
                  FROM       `blocks` 
                  INNER JOIN 
                             ( 
                                    SELECT Max(startipnum) AS start 
                                    FROM   `blocks` 
                                    WHERE  startipnum <= inet_aton('1.2.3.4')) 
                           AS 
                             s ON 
                             ( 
                               startipnum = s.start 
                             ) 
                             WHERE endipnum >= inet_aton('1.2.3.4'));

You can similarly use this almost exact query within PHP using the following function (assuming PDO….):

function IPdata($ip) {
	global $dbh;
	if(!isset($dbh)) die("No database connection available.");
	if(filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4 | FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE)) {
		$cip = ip2long($ip);
		if($cip) {
  			try {
				$stmt = $dbh->prepare("SELECT * FROM `ip_geolocation` WHERE locID = (SELECT locID FROM `ip_blocks` INNER JOIN (SELECT MAX(startIPNum) AS start FROM `ip_blocks` WHERE startIPNum <= :IPADDR1) AS s ON (startIPNum = s.start) WHERE endIPNum >= :IPADDR2)");
				$stmt->bindParam(':IPADDR1', $cip);
				$stmt->bindParam(':IPADDR2', $cip);
				$stmt->execute();
			} catch (PDOException $ex) {
				die("IPdata() SQL ERROR: " . $ex->getMessage());
				return false;
			}

			if($stmt->rowCount() == 1) {
				$row = $stmt->fetch();
				return $row;
			} else  {
				die("IPdata() ERROR: NO DATA RETURNED FOR {$ip}");
				return false;
			}

		} else {
			die("IPdata() ERROR: ip2long RETURNED FALSE FOR {$ip}:{$cip}");
			return false;
		}
	} else {
		die("IPdata() ERROR: IP address does not validate ({$ip}:{$cip})");
		return false;
	}
}

CubieUPS – A UPS For the CubieTruck/Cubieboard3 Backup Battery

Programming 2 Responses »

Dec 182014

I recently got hold of a CubieTruck / Cubieboard 3 (think Raspberry Pi but on steroids) and ordered the LiPo backup battery to go with it (available from New IT in the UK, if you were wondering). Aside from just how fabulous the Cubieboard is (running Ubuntu 14.04 LTS no less), it also has a built-in battery backup for when the power goes out. This can, without a SATA HDD/SSD attached, power the Cubieboard for many hours. Even with a SATA attachment, it’ll ride out most power blips, burps, minor outages and power-plug moves.

But there is no built-in intelligence to it. It runs until the battery dies, and then your system dies with it. That’s pretty much the same as not having a backup battery at all in an extended outage and you aren’t around to notice. It just delays your file system corruption rather than prevents it.

So I wrote CubieUPS one evening. It’s a simple pair of scripts, run via cron (for extra points, run the same PHP script in your .profile as well and it’ll show you the current power status without you having to even do anything at all). It logs everything, and when things are looking a little dicey, it’ll shutdown your system cleanly.

It’s quite straight forward. You can view the source code, installation instructions and download the tarball here.

Monitoring ADS-B with dump1090 and Generating an Alert on a Squawk (7700, etc) in PHP

Programming 11 Responses »

Apr 222014

A friend of mine recently said he’d like to have something to ‘plug in’ to dump1090 running locally (see my previous post for my setup on this) to alert to emergencies in the air that are nearby. As I was getting in to it I realised what a neat idea it actually was. There are many different “interesting” squawk codes, and unless you’re glued to your Virtual Radar all day long, you’re not going to notice. Not to mention if a plane is going to fall out the sky it’d be handy to know to take cover …

So. SquawkWatch was born!

It’s a lightweight PHP script (uses approximately 1% of CPU time allocated to my ‘playground’ virtual server, and it has next to no CPU allocated to it) and sifts through all the messages coming in from dump1090, looking for a set of squawk codes defined within. When it finds one, it sits on it and waits for a complete data picture, and then emails the alert to you.

Requirements:

PHP 5.3
MySQL
dump1090 (reachable by the PHP host)

Download the source code here.

Download the G-INFO database here (late March 2014).

Example (non-debug mode) output here (stdout).

Any questions pop them in the comments and I’ll help where I can.

Lee

ADS-B (dump1090) with MySQL Support on the Raspberry Pi using a DVB-T (R820T) USB Stick

Programming 15 Responses »

Mar 252014

The Raspberry Pi is an amazing piece of kit. There’s little it cannot do with sufficient tweaking. I have a passing interest in planes, and love FlightRadar24 (iPad app and website).

FlightRadar24 screenshot

I started to wonder how they got this data. A quick rummage around their website reveals they mostly rely on a world wide network of volunteers to collect and then feed them data. You need ‘line of sight’ to an aircraft to be able to query the information, and no one entity can afford to do this globally. So, a network of volunteers run ‘feeder radar stations’ (ok, it isn’t really ‘radar’ but, more the Next Generation version of).

Hardware I Use

I use a ‘stock’ Raspberry Pi model B, connected via ethernet to my home network ordered via RS Components (fwiw, they’re slow as heck… order one from somewhere else…!). My Pi is strapped up under my desk out of the way. It has a 4 port (unpowered) USB hub connected to it (the black blob underneath) but otherwise it is entirely unremarkable. I’m even still using the original RS Components 4Gb SD card.

Excuse my camera charger lurking underneath it all – it’s entirely unrelated!

My Raspberry Pi, mounted under my desk

Hardware wise, to receive the ADS-B broadcasts from planes overhead, I use a DVB-T (R820T) Stick with an external aerial attached to it. I ordered mine from 1090mhz.com (based in Germany) and it arrived in the UK about 4 days later in perfect condition, and even had an adaptor and external aerial included in the package – thanks guys! This is the ‘best’ stick to use – apparently – as it is based on the R820T chipset.

Software I Use

I use a modified version of dump1090, originally installed using instructions from David Taylor’s SatSignal.eu website. dump1090 is fantastic. Using the sdr-rtl drivers (also documented on David’s site) to re-tune my DVT-B Stick from digital TV to the 1090MHz used by ADS-B, allows it to receive the ‘next gen’ radar data right from the planes in the sky themselves.

dump1090 then takes this feed and decodes the data into something human rather than machine readable. Using the –interactive mode, you can see the output as the planes fly by.

dump1090 –interactive

Perhaps even more exciting than that though, is the –net option, which enables not only all the data sockets so that FlightRadar24’s software can pull the information out of dump1090 (setup instructions here), but also enables a built-in web server so you can run and view your own miniature version of FR24:

Screenshot of my own ‘dump1090’ FlightRadar24-style output (–net option)

MySQL Support

You may remember I said I use a modified version of dump1090. That is because as well as everything above, I also wanted to store a local copy of all the data I receive in a MySQL database so I can do my own manipulations and calculations. While there is a MySQL’d branch of dump1090 on github, it is dozens of commits behind the main branch and missing out on a lot of the hard work Malcolm Robb has put in to his master branch and fork of antirez’ original work.

So, rather than forego either mysql, or using the latest version of dump1090, I hacked them together and re-introduced mysql support into the latest version of dump1090.

To keep updates to future versions easy, there are very minimal changes to all other source code/header files of the official dump1090 branch. 95% of mysql support code is contained within mysql.h and mysql.c with pretty much the only main branch changes being the inclusion of mysql headers and a new struct in dump1090.h, the –mysql switch handler in dump1090.c, and a call to modesFeedMySQL() in mode_s.c (that could even be moved to mysql.h I suppose to separate it even more .. but I just put it with all the other structs for consistency).

This should make it relatively simple for me/you to upgrade each time a new version comes out.

MySQL authentication credentials are now in mysql.h rather than buried deep in the codebase. If it’s something lots of people show an interest in, the database credentials could even be supplied on the command line for even greater simplicity and portability. We’ll see…

If you’d like the latest version (1.08.1003.14) of dump1090 with mysql support, you can get it here.

Happy flying!

Distributed, Daemonized, Parallel PHP

Google Latitude, Programming No Responses »

Jul 302011

Ever since I was 13 I’ve been programming in PHP. It’s one of those “you can do anything with it” languages that I just love working with. I have recently launched a (pre-beta) service that automatically checks you into Facebook Places (and more will follow, such as Foursquare) based on where your phone reports you to be totally automatically, courtesy of Google Latitude. It was awesome fun to write and is now live for folks to play with (you can find out more at beta.CheckMeIn.at).

The Problem

Now if it was just for me, it would have been trivial to write. Grab my Latitude position, compare it against a handful of places I frequent, and if any of them match, check me in on Facebook. Checking and comparing my location every 60 seconds would be really easy.

But what if I’m doing that for hundreds or even thousands of people? A script that runs each user in turn would run for hours just doing one sweep of the user database, querying Google Latitude, doing the distance calculation math based on latitude and longitudes, and then punching any matches to Facebook. Cron that script to run every 60 seconds and the server would fall over from RAM exhaustion in about 10 minutes, and only the first 50 or 100 people in the user database would ever be processed.

The Solution

There are 3 background processes (excluding the maintenance bots) that ‘power’ CheckMeIn.at. They are all written to work out of a central ‘work queue’ table, where the parent process gets a list of work to do and inserts work units into the work queue table. It then counts up how much work there is to do, and divides that by the number of work units each child process will be allowed to handle at a time. If there are more work units than permitted children, it spawns off the first batch, lets them run, and then spawns more as they exit off with their completed workloads.

The beauty of it is it dynamically grows itself. With 10 users it’ll happily spawn 1 process and run through them all in a second. With 100 users it’ll spawn 2 processes and do likewise. With 2,000 users it’ll spawn 10, and so on and so forth. If we have 1 million users it’ll spawn it’s maximum (say 50), then wait and spawn extras when there is room. All without any interaction on my part.

The Google Latitude Collector (GLC) manages the collection of user locations every 60 seconds. It’s “self-aware” in the sense that it manages its own workload, keeps track of the queries allowed by Google, and generally throttles itself to Do No Evil, while keeping the service responsive.

The User Location Processor (ULP) follows the same principles of the work queue, and compares locations collected by the GLC against a list of Places the user has configured via the web interface. It computes matches, near misses (to help with the setup), honours the delay periods, and so on and so on. If all criteria are met, it passes work units on to…

The Facebook Check-in Injector (FCI). The FCI handles a shedload of sanity checks, prevents double-checkins, examines Facebook for a users last check-in to make sure we’re not doing something they’ve already done themselves, and lots more. If it all works out, then we check them in and the whole thing goes round again.

Sounds complex, but from firing off a Google Latitude Collector, to checking a user in (assuming we’ve adhered to delay periods here), the are checked in to Facebook about 4 seconds later.

The Moral

Plan for growth in your application from the very beginning. This project would have been a b*tch to modify later on. But by knowing it’d grow, and implementing self-awareness and control into the app, it can handle infinite growth. If the current server that does all the processing becomes overloaded, it’s trivial to add another to halve its workload, and all without having to modify a single line of code.

The key however is to have a powerful database server to run it all off. In an hour it can easily generate a million database queries as users interact with the site, and the daemons go about their own business. Without a database server capable of keeping up, things start to seriously slow down.