Or, How I Built a Floppy Preservation Platform From Scratch
There’s a moment I keep coming back to. I’m holding a 5.25-inch floppy disk from 1985, slightly warped, its label faded to near-illegibility. The disk belonged to a now-defunct magazine, and it contains software written by programmers who may have no idea their work survived at all. In about thirty seconds, my Greaseweazle is going to read it at the flux level – capturing not just the data but the magnetic signature of the original write head, preserved with a fidelity that would have seemed like science fiction to the person who formatted it 41 years ago.
That moment is why I built this.
The Problem With “Just Copy the Files”
When most people think about digital preservation, they imagine it’s straightforward. Files are files. Copy them. Done.
It isn’t like that.
Floppy disks from the 1980s and early 90s were written by a dozen different DOS versions, on hardware with varying geometry, using formatting conventions that were often undocumented, non-standard, or deliberately obfuscated. The IBM PC disk format evolved rapidly between 1981 and 1993 – from 160KB single-sided disks with no BIOS Parameter Block (BPB) at all, through the chaotic proliferation of 180KB, 320KB, 360KB, 720KB, and 1.2MB formats, to the eventual standardisation around 1.44MB. Every step of that evolution left behind disks that modern tools handle poorly or, increasingly, not at all. You can’t just plug a 5.25″ floppy drive into a modern computer either: motherboards no longer have the headers, there is no USB version of such drives, and most modern operating systems have no idea what to do with them.
An open-source project, Greaseweazle (keirf/greaseweazle on GitHub: tools for accessing a floppy drive at the raw flux level), gets round this problem, given sufficient technical know-how. But even with the right hardware, software, tenacity and preparation, recoveries can be slow.
My first serious problem was a 160KB disk – an IBM DOS 1.0 single-sided format from 1985. DOS 1.0 predates the BPB entirely. The part of the boot sector where a BPB would later live holds no geometry at all; it’s either zeroed or, in many cases, contains raw bootstrap code, because the formatter never needed to write geometry data there. DOS didn’t read it. The BIOS told DOS the geometry. That was enough.
Modern Linux gets no geometry from a BIOS. It reads the BPB, finds garbage, and refuses to mount.
I spent a week on this. The solution involved mtools with explicit geometry configuration, a content-addressed scan cache, and eventually a BPB-patching function that writes a correct 25-byte geometry block into a temporary copy of the image before mounting – leaving the original completely untouched.
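As a rough illustration of the BPB-patching idea, here is a minimal Python sketch (the tool itself is PHP, and this is not its code). It assumes the 25-byte block is the DOS 3.x-style BPB at offset 11 of the boot sector, and that the values in BPB_160K are the conventional ones for a 160KB single-sided, 8-sector disk. The names patch_bpb_copy and BPB_160K are invented for this example.

```python
import shutil
import struct
import tempfile
from pathlib import Path

BPB_OFFSET = 11
# Little-endian, unpadded: bytes/sector, sectors/cluster, reserved sectors,
# FAT count, root entries, total sectors (16-bit), media descriptor,
# sectors/FAT, sectors/track, heads, hidden sectors, total sectors (32-bit).
BPB_FORMAT = "<HBHBHHBHHHII"  # 25 bytes in total

# Assumed values for a 160KB disk: 40 tracks x 1 head x 8 sectors x 512 bytes.
BPB_160K = (512, 1, 1, 2, 64, 320, 0xFE, 1, 8, 1, 0, 0)


def patch_bpb_copy(image_path, bpb_fields=BPB_160K):
    """Copy the image to a temp file and overwrite the BPB in the copy only."""
    block = struct.pack(BPB_FORMAT, *bpb_fields)
    src = Path(image_path)
    tmp = tempfile.NamedTemporaryFile(
        prefix=src.stem + "-", suffix=".patched.img", delete=False
    )
    tmp.close()
    shutil.copyfile(src, tmp.name)      # the original is only ever read
    with open(tmp.name, "r+b") as fh:   # patch in place, inside the copy
        fh.seek(BPB_OFFSET)
        fh.write(block)
    return Path(tmp.name)
```

The returned path is what would be handed to the mount step. Whether a particular mount then succeeds may depend on other boot sector bytes, which this sketch leaves alone.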
The Stoned Virus Problem
Several of the cover disks I’ve been archiving are infected with boot sector viruses. Stoned is the most common – a 1987 New Zealand virus that spread primarily through floppy sharing, and that infected an absolutely enormous number of disks before it was widely understood. Finding it on a magazine cover disk isn’t surprising. Finding it on a disk that was then sent to thousands of subscribers and read on machines that went on to infect office networks is a small window into how the late 80s computing ecosystem actually worked.
The interesting preservation question is: what do you do with it?
My answer: keep the original, warts and all. Archive it faithfully. Document the infection. But also provide a clean variant for people who just want to use the software.
The tool now handles this automatically. ClamAV scans happen in two passes – first against the raw .img file (the only way to catch boot sector viruses, which live in the first 512 bytes and never appear as files on the mounted filesystem), then against the mounted filesystem itself. If the original is infected and a .clean.img exists, BPB patching is applied to the clean image, not the infected one. The variants system – original, clean, patched, recovered – means every state of a disk’s history is preserved and documented.
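The two passes and the choice of which image to patch could be sketched like this in Python. This is an illustration under assumptions, not the tool's own code: it shells out to the clamscan command-line scanner and relies on its documented exit codes (0 for clean, 1 for a detection, anything else for an error). The function names and the injectable run parameter are invented for the example.

```python
import subprocess
from pathlib import Path


def clamscan_infected(target, recursive=False, run=subprocess.run):
    """Return True if clamscan reports a detection (exit code 1)."""
    cmd = ["clamscan", "--no-summary", "--infected"]
    if recursive:
        cmd.append("--recursive")
    cmd.append(str(target))
    result = run(cmd, capture_output=True, text=True)
    if result.returncode not in (0, 1):
        raise RuntimeError(f"clamscan failed: {result.stderr.strip()}")
    return result.returncode == 1


def two_pass_scan(image, mountpoint, run=subprocess.run):
    """Pass 1 scans the raw image; pass 2 scans the mounted files."""
    raw = clamscan_infected(image, run=run)
    files = clamscan_infected(mountpoint, recursive=True, run=run)
    return {"raw_image": raw, "filesystem": files}


def choose_patch_target(original, infected):
    """Prefer the .clean.img sibling when the original is infected."""
    original = Path(original)
    clean = original.with_name(original.stem + ".clean.img")
    if infected and clean.exists():
        return clean
    return original
```

Passing the runner in as a parameter keeps the scan logic testable on a machine that has no ClamAV installed.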
The infected original is still there, still mountable via NetDrive if you want to study it. It just comes with a red warning badge and a button you have to click to acknowledge the risks before you get the connect command.
Archaeology at the Boot Sector Level
The most technically interesting disk I’ve encountered so far had a boot sector that was valid bootstrap code and yet, read as filesystem metadata, looked hopelessly corrupt. The JMP SHORT 0x3E instruction at offset zero – a two-byte jump that skips over the entire BPB area – is a deliberate design. The publisher wanted a custom boot experience: insert the disk, power on, and instead of “Non-system disk or disk error”, you’d get something. A menu, a splash screen, a welcome message. There wasn’t space for that in the standard three-byte jump and eight-byte OEM label. So they used the BPB fields – offsets 11 through 61 – as overflow for executable code, and got 51 extra bytes.
On real hardware, this worked perfectly. DOS never reads the BPB during normal file access. The BIOS knows the geometry. The disk just works.
It also functioned as copy protection. Duplicate a disk and your duplication tool reads the BPB to determine geometry. It gets: 147 bytes per sector, 240 sectors per cluster, 55,438 sectors per FAT. Whatever it produces next is not a working copy.
The bytes that should contain geometry – 0x93, 0x00, 0xF0, 0x1E, 0x50... – are MOV instructions and jump offsets. When I decoded them I found a fragment of the IBM 3.3 bootstrap string interspersed with 8086 opcodes. The formatter had been creative.
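To see how a copier arrives at those numbers, here is a small Python sketch that decodes the jump and the first BPB fields the way a geometry-trusting tool would. The sample sector is a reconstruction, not the real disk: only the five bytes quoted above, plus the two bytes implied by the sectors-per-FAT figure, come from the article, and everything else is zero padding. The function names are invented for the example.

```python
import struct

BPB_NAMES = (
    "bytes_per_sector", "sectors_per_cluster", "reserved_sectors",
    "fat_count", "root_entries", "total_sectors",
    "media_descriptor", "sectors_per_fat",
)


def short_jump_target(boot, offset=0):
    """Target of an 8086 JMP SHORT (opcode 0xEB) at offset, else None."""
    if boot[offset] != 0xEB:
        return None
    rel = struct.unpack_from("<b", boot, offset + 1)[0]  # signed displacement
    return offset + 2 + rel  # relative to the end of the 2-byte instruction


def decode_bpb(boot):
    """Read the first eight BPB fields (offsets 11 to 23) as little-endian."""
    return dict(zip(BPB_NAMES, struct.unpack_from("<HBHBHHBH", boot, 11)))


def bpb_is_plausible(bpb):
    """Cheap sanity check on the values a FAT floppy could really have."""
    return (
        bpb["bytes_per_sector"] in (128, 256, 512, 1024, 2048, 4096)
        and bpb["sectors_per_cluster"] in (1, 2, 4, 8, 16, 32, 64, 128)
        and bpb["fat_count"] in (1, 2)
        and bpb["sectors_per_fat"] > 0
    )


# Placeholder sector: zeros except for the bytes known from the article.
sample = bytearray(512)
sample[0:2] = b"\xEB\x3C"                              # JMP SHORT 0x3E
sample[11:16] = bytes([0x93, 0x00, 0xF0, 0x1E, 0x50])  # quoted "geometry"
sample[22:24] = bytes([0x8E, 0xD8])                    # 0xD88E = 55,438

print(hex(short_jump_target(sample)))
print(decode_bpb(sample))
print(bpb_is_plausible(decode_bpb(sample)))
```

On the sample, the jump resolves to 0x3E and the decoded fields are 147 bytes per sector, 240 sectors per cluster and 55,438 sectors per FAT, so the plausibility check fails. A check of that kind is one way a tool might decide to stop trusting the BPB and fall back to geometry from another source.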
The Tool
What started as a PHP script that mounted disk images and printed a directory listing has become something considerably more substantial. The current version:
- Recursively discovers .img files and groups them by base slug, automatically detecting variant types (.clean, .patched, .recovered, .cracked, and several others)
- Detects filesystem type from the raw boot sector bytes – FAT12/16/32, early DOS formats, CP/M, NTFS, ext2/3/4 – without mounting
- Extracts disk geometry from image size, using a lookup table covering every standard floppy format from 160KB (1981) to 2.88MB (1991)
- Mounts via Linux loopback, falling back to mtools with explicit geometry, falling back to BPB patching on a temp copy, with a content-based sanity check at each stage to catch silent failures
- Runs ClamAV in two passes, including raw image scanning for boot sector infections
- Extracts readable text files – documentation, README files, source code in BASIC, C, Pascal, Assembly – and publishes them as individually-crawlable HTML pages, so a search engine can index a 1987 copyright notice or an author’s name embedded in a REM statement
- Generates per-disk HTML reports in a retro green-screen CRT aesthetic, with directory trees, full SHA-256 file manifests, archivist’s notes, photo galleries, and a live mTCP NetDrive connect command
- Publishes a cross-archive file search, a sitemap.xml, a robots.txt, and a static all-disks.html specifically for search engine crawling
- Caches everything by image SHA-256, so unchanged disks cost nothing on subsequent runs
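Three of the steps in that list are simple enough to sketch in a few lines of Python: grouping images into variants, looking up geometry from image size, and deriving a cache key. This is an illustration, not the tool's PHP. The variant set contains only the four suffixes named above, the geometry table assumes the usual 512-byte-sector PC formats, and all function names are invented for the example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

SECTOR_BYTES = 512
# label -> (tracks, heads, sectors per track) for the standard PC formats
FORMATS = {
    "160KB": (40, 1, 8), "180KB": (40, 1, 9),
    "320KB": (40, 2, 8), "360KB": (40, 2, 9),
    "720KB": (80, 2, 9), "1.2MB": (80, 2, 15),
    "1.44MB": (80, 2, 18), "2.88MB": (80, 2, 36),
}
GEOMETRY_BY_SIZE = {
    t * h * s * SECTOR_BYTES: (label, t, h, s)
    for label, (t, h, s) in FORMATS.items()
}
KNOWN_VARIANTS = {"clean", "patched", "recovered", "cracked"}


def geometry_for(path):
    """Return (label, tracks, heads, sectors) for a standard size, else None."""
    return GEOMETRY_BY_SIZE.get(Path(path).stat().st_size)


def split_variant(path):
    """'disk.clean.img' -> ('disk', 'clean'); 'disk.img' -> ('disk', 'original')."""
    parts = Path(path).name.split(".")
    if len(parts) >= 3 and parts[-2] in KNOWN_VARIANTS:
        return ".".join(parts[:-2]), parts[-2]
    return ".".join(parts[:-1]), "original"


def group_images(root):
    """Walk root recursively and group .img files by base slug."""
    groups = defaultdict(dict)
    for img in sorted(Path(root).rglob("*.img")):
        slug, variant = split_variant(img)
        groups[slug][variant] = img
    return dict(groups)


def image_sha256(path, chunk_size=1 << 20):
    """Content hash of the image, usable as a cache key."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Keying the cache on the content hash rather than the filename or modification time means a renamed or re-copied image still counts as already scanned. Note that this sketch groups purely by filename, so two disks with the same slug in different directories would collide.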
The Point
The software on these disks isn’t historically significant the way a Gutenberg Bible is significant. Most of it is small utilities, games, productivity tools, and programming experiments – written by hobbyists and professionals who were, in many cases, making things for the love of it.
That’s precisely why it matters.
The commercial software of the 1980s is relatively well-preserved. Companies had catalogues, revenues, lawyers, and reasons to maintain archives. The cover disk software – the stuff that came shrink-wrapped to a magazine, was distributed to tens of thousands of subscribers, and was then largely forgotten – has no institutional custodian. Nobody owns the rights in any meaningfully active sense. The authors have often lost their own copies.
But some of them are still out there. And some of them have children, and grandchildren, who might one day search the internet for their name and find a piece of code they wrote in 1987, preserved in a running state, mountable on a DOS machine via a TCP/IP protocol that didn’t exist when the disk was formatted.
That’s not a small thing.
The archive is live at https://dl.x86.world. Come and have a rummage…