The anatomy of an empty floppy

Wed, Oct 16, 2024

While developing DOSContainer, I restarted my efforts several times because I underestimated the complexity of the task. The current attempt starts at the very beginning and works up from there. The very beginning of the PC era is the IBM 5150 with two 160KB 5.25" floppy drives. IBM had sourced an operating system from Microsoft that it would sell as PC-DOS to go with the system. How that deal came to be is subject matter for another article. Right now, let’s dive into what’s needed for IBM PC-DOS 1.00 to accept an empty floppy for use.

Floppy specifications

Floppy disks, for those too young to have seen them in the wild, are a means of removable storage. The IBM PC came with two drives that took single-sided 5.25" floppies with a capacity of 160KB each. That’s where I’m starting with DOSContainer so that’s what I’m dissecting in the most detail I possibly can. As it turns out, there’s quite a lot of detail even before you write anything useful to the disk.

A bunch of 5.25" floppies.

A floppy on the IBM PC is divided into sides, tracks and sectors. Being single-sided, the original PC could use only one side of the disk. That side is divided into 40 concentric rings, the so-called tracks. Each track is in turn divided into 8 sectors, so one side holds 40 tracks of 8 sectors each. That’s 320 raw sectors, each of which holds 512 bytes, adding up to a grand total of 163,840 bytes or 160KB of raw capacity.
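
To double-check that arithmetic, here is a small Python sketch. The constant names and the locate() helper are my own for illustration, not something taken from PC-DOS. It also assumes the usual PC convention that sectors within a track are numbered starting at 1, while tracks and logical sectors start at 0.

```python
# Geometry of the single-sided 160KB floppy as described above.
SIDES = 1
TRACKS_PER_SIDE = 40
SECTORS_PER_TRACK = 8
BYTES_PER_SECTOR = 512

total_sectors = SIDES * TRACKS_PER_SIDE * SECTORS_PER_TRACK
total_bytes = total_sectors * BYTES_PER_SECTOR


def locate(logical_sector: int) -> tuple[int, int]:
    """Map a 0-based logical sector to (track, sector-within-track).

    Assumption: sectors within a track are numbered from 1.
    """
    if not 0 <= logical_sector < total_sectors:
        raise ValueError("sector outside of the disk")
    track = logical_sector // SECTORS_PER_TRACK
    sector = logical_sector % SECTORS_PER_TRACK + 1
    return track, sector


print(total_sectors)        # 320
print(total_bytes)          # 163840
print(total_bytes // 1024)  # 160
```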

The file system

To make sense of a floppy’s contents, files need to be written to it in a structured manner. IBM PC-DOS 1.00 has a utility named FORMAT to prepare a disk for use. The idea is that you have your system disk in drive A, put a new disk in drive B and then type FORMAT B: to prepare the disk. The action of “formatting” a disk destroys any and all data that may have been present on it, so be careful what you wish for. Once done, the floppy contains the bare-bones structures that the rest of PC-DOS needs to store and retrieve files on it. The most important structural elements are the boot sector, two copies of the file allocation table, the root directory and the data region of the disk.
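
As a quick map of where those structures end up, the sketch below lists each region with its first sector and its length in sectors, using the sector numbers from the sections that follow. The LAYOUT name and the tuple shape are purely illustrative.

```python
TOTAL_SECTORS = 320

# (region name, first sector, number of sectors)
LAYOUT = [
    ("boot sector",    0, 1),
    ("FAT copy 1",     1, 1),
    ("FAT copy 2",     2, 1),
    ("root directory", 3, 4),
    ("data region",    7, TOTAL_SECTORS - 7),
]

for name, first, count in LAYOUT:
    last = first + count - 1
    print(f"{name:<15} sectors {first:>3}..{last:>3} ({count} total)")

# The regions should tile the disk without gaps or overlap.
covered = sum(count for _, _, count in LAYOUT)
print("sectors accounted for:", covered)
```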

The boot sector

The boot sector is the very first sector on the floppy, located at sector 0. Under PC-DOS 1.00 this sector is identical on every floppy. In later operating systems the contents of the boot sector are adjusted to suit the physical characteristics of the disk, but not so on PC-DOS 1.00. It only knows about the 160KB variant and nothing else. While you could replace the floppy drives in a PC with more versatile ones, you would also need a newer version of the operating system to handle them. Since only one type of floppy exists in the PC-DOS 1.00 universe, the boot sector gets written verbatim from a part of the FORMAT utility onto sector zero of any floppy it formats. You may be surprised to learn that this sector contains executable code even on floppies that hold nothing related to the operating system at all.

Another interesting detail about the early IBM boot sector is that the author of the executable code in it actually signed his work. When you look at the bytes of an IBM PC-DOS 1.00 boot sector, you’ll find the name of Bob (Robert) O’Rear, Microsoft’s seventh employee.
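
If you want to go looking for that kind of signature yourself, a rough equivalent of the Unix strings tool is enough. The sketch below pulls runs of printable ASCII out of a 512-byte sector. The function name is my own, and the demo uses a made-up sector with a placeholder name rather than the real boot sector bytes.

```python
import re


def printable_runs(sector: bytes, min_length: int = 4) -> list[str]:
    """Return every run of printable ASCII at least min_length bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(sector)]


# Synthetic stand-in for a boot sector: zero bytes with one embedded string.
fake_sector = bytes(100) + b"Example Name" + bytes(400)
print(printable_runs(fake_sector))

# With a real disk image you would read the first 512 bytes instead, e.g.:
#   with open("pcdos100.img", "rb") as f:   # hypothetical file name
#       print(printable_runs(f.read(512)))
```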

Hex dump of the IBM PC-DOS 1.00 boot sector.

File Allocation Tables

Sectors 1 and 2 each contain a copy of the File Allocation Table (FAT). The purpose of the File Allocation Table is to keep track of which “cluster” is allocated to which file’s data in the data region of the disk. On an original IBM PC floppy a “cluster” maps onto exactly one sector of data. The original idea behind having two copies of the FAT was to have a backup in case the primary table got damaged. This didn’t exactly pan out in practice, because a disk recovery utility cannot conclusively determine which of the two FATs is the damaged one. Regardless, keeping two copies of the FAT is a practice that remains part of the file system to this very day, even though recovery utilities generally don’t do much with them.
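
The weakness of the two-copy scheme is easy to demonstrate. The sketch below compares sector 1 against sector 2 of a disk image held in memory. The helper names are mine, and the image is just zero bytes standing in for a real disk. A mismatch tells you that something is wrong, but nothing in the comparison says which copy to trust.

```python
SECTOR_SIZE = 512
FAT1_SECTOR = 1
FAT2_SECTOR = 2


def read_sector(image: bytes, number: int) -> bytes:
    """Return the 512 bytes of the given 0-based sector."""
    start = number * SECTOR_SIZE
    return bytes(image[start:start + SECTOR_SIZE])


def fat_copies_match(image: bytes) -> bool:
    """True when both FAT copies are byte-for-byte identical."""
    return read_sector(image, FAT1_SECTOR) == read_sector(image, FAT2_SECTOR)


# Placeholder image: 320 sectors of zero bytes, not a real formatted disk.
image = bytearray(320 * SECTOR_SIZE)
print(fat_copies_match(image))

# Simulate damage to one byte of the second copy.
image[FAT2_SECTOR * SECTOR_SIZE + 10] ^= 0xFF
print(fat_copies_match(image))
```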

Root directory

Sectors 3, 4, 5 and 6 are occupied by the root directory. This is a sequential table of contents for the floppy. Each entry in the root directory occupies 32 bytes. With the root directory occupying 4 sectors of 512 bytes, that adds up to 64 entries. That’s also where the ability of IBM PC-DOS 1.00 to store files ends: there is no MKDIR command in PC-DOS 1.00, so you can’t create directories/folders yet. After storing 64 files, the disk reports as full because the root directory is out of entries, even though the data region may still have sectors available.

An empty root directory in later implementations of FAT file systems can simply be bytes set to a value of zero. In the case of IBM we’re dealing with a corporate behemoth that had very specific ideas on how computers were supposed to work. They invented considerable chunks of the field, so that should come as no surprise. In order for a floppy to be usable on an IBM PC, the root directory gets filled with empty placeholder entries that start with a byte of 0xE5, which is the marker for a file or subdirectory that has been deleted. Semantically it implies that something existed in this place before, but not anymore. The remaining 31 bytes of each entry are filled with 0xF6, which is the standard filler byte that IBM used on its bigger systems to indicate “empty space” on storage systems at the time.
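
Building such an empty root directory takes only a few lines. This is a minimal sketch of the byte pattern described above, with variable names of my own choosing. It follows from 4 sectors of 512 bytes and 32 bytes per entry.

```python
SECTOR_SIZE = 512
ROOT_DIR_SECTORS = 4
ENTRY_SIZE = 32

DELETED_MARKER = 0xE5  # first byte of every placeholder entry
FILLER = 0xF6          # fills the remaining bytes of the entry

entry_count = ROOT_DIR_SECTORS * SECTOR_SIZE // ENTRY_SIZE

# One placeholder entry: the marker byte followed by 31 filler bytes.
empty_entry = bytes([DELETED_MARKER]) + bytes([FILLER]) * (ENTRY_SIZE - 1)

# The whole root directory is that entry repeated for every slot.
root_directory = empty_entry * entry_count

print(entry_count)            # 64
print(len(root_directory))    # 2048
print(empty_entry[:4].hex())  # e5f6f6f6
```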

The data region

Right after the root directory you’ll find the data region. These are empty sectors that will contain the actual contents of the files you write to the floppy. Unlike later FAT implementations, IBM also wipes all of this with the same 0xF6 filler bytes, even though that is not technically required. Setting the cluster value to zero in the FAT should be enough to mark the space in the data region as available, and changing the first character of the root directory entry to 0xE5 marks the entry as effectively “deleted”. In later FAT implementations this would allow tools to “undelete” files or even “unformat” whole disks. Not so on the early IBM PC: once you format a disk, it’s all wiped into oblivion. While this is a clean approach and prevents confidential data from lingering on a disk, it does put an extra responsibility on the operator of the machine: a format is final.
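
To check whether an image really has a freshly wiped data region, you can compare everything from sector 7 onwards against the filler byte. The sketch below does that on an in-memory image. The function name is illustrative, and the demo image is synthetic: the first 7 sectors are left as zero bytes because their real contents don’t matter for this check.

```python
SECTOR_SIZE = 512
TOTAL_SECTORS = 320
DATA_START_SECTOR = 7  # first sector after the root directory
FILLER = 0xF6


def data_region_is_blank(image: bytes) -> bool:
    """True when every byte of the data region equals the 0xF6 filler."""
    start = DATA_START_SECTOR * SECTOR_SIZE
    end = TOTAL_SECTORS * SECTOR_SIZE
    expected = bytes([FILLER]) * (end - start)
    return bytes(image[start:end]) == expected


# Synthetic image: 7 placeholder sectors followed by a wiped data region.
image = bytearray(DATA_START_SECTOR * SECTOR_SIZE)
image += bytes([FILLER]) * ((TOTAL_SECTORS - DATA_START_SECTOR) * SECTOR_SIZE)
print(data_region_is_blank(image))

# Writing a single byte of file data is enough to make the check fail.
image[DATA_START_SECTOR * SECTOR_SIZE] = 0x41
print(data_region_is_blank(image))
```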