
DNS Decompression (RFC 1035)

Notes from implementing a toy DNS parser.

DNS uses a clever little trick to compress the domain names carried in a message. It is explained neatly in RFC 1035 (section 4.1.4).

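Before compression comes into play, a name is encoded as a sequence of labels: for each label we first write its length and then the label itself. For example, foo becomes 3foo, where the 3 is a single length byte and not the character '3'.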
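A minimal sketch of this length-prefixed encoding, without any compression. The name `encode_name` is a hypothetical helper made up for illustration and is not taken from my project; the 63-octet and 255-octet limits it checks are the ones RFC 1035 sets for labels and names.

```rust
/// Encode a dotted domain name into uncompressed DNS wire format.
/// `encode_name` is a hypothetical helper for illustration.
/// Returns None if a label is empty or longer than 63 octets,
/// or if the encoded name is longer than 255 octets.
fn encode_name(name: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    // A trailing dot only denotes the root, so it adds no label of its own.
    let trimmed = name.strip_suffix('.').unwrap_or(name);
    if !trimmed.is_empty() {
        for label in trimmed.split('.') {
            let bytes = label.as_bytes();
            if bytes.is_empty() || bytes.len() > 63 {
                return None;
            }
            out.push(bytes.len() as u8); // length octet
            out.extend_from_slice(bytes); // label octets
        }
    }
    out.push(0); // the zero-length root label terminates the name
    if out.len() > 255 {
        None
    } else {
        Some(out)
    }
}
```

Calling `encode_name("foo")` gives the bytes 3, 'f', 'o', 'o', 0: the 3foo from above plus the terminating zero byte.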

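Compression works by replacing a name, or the tail end of a name, with a pointer to an earlier occurrence of the same labels in the message. When we are parsing, we have to follow these pointers to decompress the name.

A pointer takes up two octets (16 bits) and looks like this: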

+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 1  1|                 OFFSET                  |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

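The first two bits are both 1 and the remaining 14 bits are the offset. The offset is counted in octets from the start of the DNS message (the first byte of the header) and tells us where the rest of the name can be found.

For example: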

+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 1  1|                   20                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Here we need to jump to offset 20 in the message, where we might find data like this (a length byte of 1 followed by the one-character label F):

+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|           1           |           F           |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
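To make the bit layout concrete, here is a small sketch of turning the two pointer octets into an offset. `pointer_offset` is a hypothetical name chosen for this example and is not part of my parser.

```rust
/// If the two octets form a compression pointer (top two bits are `11`),
/// return the 14-bit offset, counted from the start of the DNS message.
/// `pointer_offset` is a hypothetical helper for illustration.
fn pointer_offset(first: u8, second: u8) -> Option<usize> {
    if (first & 0b1100_0000) == 0b1100_0000 {
        // Drop the two flag bits, then append the low octet.
        let high = (first & 0b0011_1111) as usize;
        Some((high << 8) | second as usize)
    } else {
        None
    }
}
```

For the pointer drawn above, the two octets are 0xC0 and 0x14, and `pointer_offset(0xC0, 0x14)` returns `Some(20)`.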

Domain names and labels

Domain names in messages are expressed in terms of a sequence of labels. Each label is represented as a one octet length field followed by that number of octets. Since every domain name ends with the null label of the root, a domain name is terminated by a length byte of zero. The high order two bits of every length octet must be zero, and the remaining six bits of the length field limit the label to 63 octets or less.

To simplify implementations, the total length of a domain name (i.e., label octets and label length octets) is restricted to 255 octets or less.

When parsing, we identify a pointer with a binary AND of the length byte against the mask 11000000 (decimal 192, hex 0xC0). If the result is not zero, the byte is not an ordinary label length, so we treat it as a pointer and go and decompress it. Strictly speaking, only a result of 11000000 marks a pointer; RFC 1035 reserves the 01 and 10 combinations for future use.

Here is a snippet from my code:
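As a rough sketch, the mask check can be spelled out as a classification of the length octet. The `LengthOctet` enum and `classify` function are hypothetical names for this example. Unlike a plain non-zero test, this version separates real pointers (both top bits set) from the 01 and 10 combinations, which RFC 1035 reserves for future use.

```rust
/// What the first octet of a label can mean.
/// `LengthOctet` and `classify` are hypothetical names for illustration.
#[derive(Debug, PartialEq)]
enum LengthOctet {
    End,          // zero octet: the root label, end of the name
    Label(usize), // top bits 00: an ordinary label of 1 to 63 octets
    Pointer,      // top bits 11: first octet of a compression pointer
    Reserved,     // top bits 01 or 10: reserved in RFC 1035
}

fn classify(octet: u8) -> LengthOctet {
    // Keep only the two high-order bits and look at what is left.
    match octet & 0b1100_0000 {
        0b0000_0000 if octet == 0 => LengthOctet::End,
        0b0000_0000 => LengthOctet::Label(octet as usize),
        0b1100_0000 => LengthOctet::Pointer,
        _ => LengthOctet::Reserved,
    }
}
```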

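// At least one of the top two bits is set: treat this octet as the start of a pointer.
if length & 0b1100_0000 != 0 {
    // A pointer always ends the name, so stop reading labels after following it.
    let r = decode_compressed_name(length, reader);
    parts.push(r);
    break;
} else {
    // Ordinary label: `length` octets of label data follow.
    let t = reader.read(length as usize);
    parts.push(String::from_utf8(t).unwrap());
}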
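For comparison, here is a self-contained sketch of the whole decompression loop that works directly on a byte slice instead of a reader. It is not my project's code: `decode_name` and `MAX_JUMPS` are hypothetical names, the jump limit is an assumption I added to guard against pointers that loop back on themselves, and label bytes are converted lossily rather than unwrapped.

```rust
// Hypothetical limit chosen for this sketch; RFC 1035 does not specify one.
const MAX_JUMPS: usize = 16;

/// Decode the domain name that starts at `start` in `msg`, following pointers.
/// Returns the dotted name and the position just past the name as it
/// appeared at `start` (so a pointer counts as two octets).
fn decode_name(msg: &[u8], start: usize) -> Option<(String, usize)> {
    let mut labels: Vec<String> = Vec::new();
    let mut pos = start;
    let mut resume_at = None; // where the caller continues after the first pointer
    let mut jumps = 0;

    loop {
        let len = *msg.get(pos)?;
        if (len & 0b1100_0000) == 0b1100_0000 {
            // Pointer: 14-bit offset from the start of the message.
            let low = *msg.get(pos + 1)?;
            if resume_at.is_none() {
                resume_at = Some(pos + 2);
            }
            jumps += 1;
            if jumps > MAX_JUMPS {
                return None; // most likely a pointer loop
            }
            pos = (((len & 0b0011_1111) as usize) << 8) | low as usize;
        } else if (len & 0b1100_0000) != 0 {
            return None; // 01 and 10 are reserved in RFC 1035
        } else if len == 0 {
            pos += 1; // root label: the name is complete
            break;
        } else {
            // Ordinary label: `len` octets of label data follow.
            let end = pos + 1 + len as usize;
            let label = msg.get(pos + 1..end)?;
            labels.push(String::from_utf8_lossy(label).into_owned());
            pos = end;
        }
    }
    Some((labels.join("."), resume_at.unwrap_or(pos)))
}
```

With a message that has F.ISI.ARPA encoded at offset 20 and the label FOO followed by a pointer to offset 20 at offset 40, decoding from offset 40 yields FOO.F.ISI.ARPA and reports that the next field starts at offset 46.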

A clever little algorithm. The full code for my project can be found on GitHub.

