Zlib Flush Modes
WARNING: this document is still a working draft, and the information contained therein may be erroneous. Moreover, this document is in no way normative and although I am doing my best to make it clear and correct, I will never, I repeat never guarantee anything on that subject. Any kind of guarantee is explicitely denied. You may use the information stated in this document at your own risk.
If you notice any error in this document, please send me an e-mail: <pornin@bolet.org>
(Il n'existe pas de version française de ce document.)
Introduction
Zlib is a C library which implements the DEFLATE lossless compression algorithm. That algorithm has been specified in RFC 1951, where zlib is considered to be the reference implementation. RFC 1951 describes the core format for data, but data streams compressed with DEFLATE are often encapsulated in outer formats which include checksums. Two such outer formats are "zlib" (RFC 1950) and "gzip" (RFC 1952). In this document, we consider only DEFLATE and not those outer formats.
DEFLATE may be used with some streamed transport protocols such as PPP (see RFC 1979), TLS (see RFC 3749), and SSH (see RFC 4253). These protocols split data streams into ordered elements often called "packets" or "records". When DEFLATE is used, the successive records are meant to be part of a single continuous compressed stream. However, compression with DEFLATE is an intrinsically buffered process, hence some kind of flush operation is needed when some data must be sent to the peer: all data bytes currently scheduled for sending must make it to the peer, without knowledge of the next data byte, and even without knowing if there will be a "next data byte".
This flush operation is not specified in RFC 1951. Zlib implements two flush operations, called "sync flush" and "partial flush". The purpose of this page is to document these flush operations:
- what zlib does when flushing;
- what zlib expects when the peer should flush;
- what a conforming DEFLATE implementation is allowed to do when flushing;
- what a conforming DEFLATE implementation must be able to handle when receiving flushed data from a peer.
DEFLATE Format Details
In this section, I give some information about the DEFLATE format. This is in no way a complete DEFLATE specification; rather, I outline some of the fine details which are of interest for the flush operation.
A DEFLATE compressed stream consists of a number of blocks. Each block contains the compressed version of some data bytes. Blocks are bit strings; their length is not necessarily a multiple of 8. This means that a block may finish "in the middle of a byte" (I use the word "byte" to mean "an 8-bit quantity"; the term "octet" would be technically better, but RFC 1951 uses the term "byte" everywhere). However, the transport protocols which use DEFLATE are byte-oriented, and may send only an integral number of bytes. Blocks are tightly packed. Hence, when a block is finished, the last bits may have to wait before being sent, because the byte which contains them also contains the first few bits of the next block.
Each block begins with a three-bit header which specifies the block type (three block types are defined) and whether this is the final block in the stream. That header is then followed by block specific bits. Each block type is self-terminated, i.e. the knowledge of the block bits is enough to know whether the next bit begins a new header.
Type 0 blocks
A "type 0" block is a block where data is not compressed. It has the unique feature of including a "byte-alignment" procedure: following the three-bit header are stored between 0 and 7 bits (all equal to zero) so that the end of the current byte is reached.
The uncompressed data length for the block is then written twice: first as a 16-bit integer (this block type is limited to 65535 bytes at most), then the one's complement of the same value (this repetition is for error checking and assisting in recovering from damaged compressed streams: the special pattern of a 16-bit value followed by its one's complement can be easily detected, which allows the inflater to restart at that point). These four length bytes are followed by the uncompressed data bytes. There is no block trailer.
The type 0 blocks imply a maximum bound on the compressed stream size: whenever data does not compress well enough, type 0 blocks can be used, thus guaranteeing that an optimal compression process will not enlarge the stream by more than 5 bytes every 65535 bytes of input data. It should be noted that most compressors are not fully optimal; recent versions of zlib tend to produce type 0 blocks with 16 kB worth of data instead of 64 kB. The asymptotic overhead is still very low. Record-based protocols usually use smaller data sets anyway (for instance, a TLS record contains at most 16 kB of application data, and PPP packets are much smaller than that).
Type 1 blocks
Type 1 blocks assume that the input data is first processed by a LZ77 unit. The data then becomes a sequence of symbols. Each symbol encodes one of the following:
- a literal byte value between 0 and 255;
- a special end-of-block (EOB) marker;
- a sequence copy operation with a length n (between 3 and 258) and a distance d (between 1 and 32768), which specifies that the next n data bytes are each equal to the corresponding data byte located d positions before in the input data stream (this somehow "copies" the previous n-byte sequence, except that overlap is allowed between the original and the copy).
A special alphabet is defined with 286 symbols. The symbols 0 to 255 encode the literal bytes. The symbol 256 encodes the EOB marker. The symbols 257 to 285 introduce a sequence copy operation. Each symbol is encoded as a specific bit sequence of length 7, 8 or 9, depending on the symbol value. When the symbol is a copy symbol (between 257 and 285), it is followed by 0 to 5 extra bits which complement the length information (the number of bits depends on the symbol value), then a 5-bit distance code, which is itself followed by up to 13 bits of data; the distance code and subsequent extra bits encode the distance at which the original sequence is to be found.
What must be noted is that type 1 blocks consist of sequences of quantities which all use an integral but varying number of bits. The last such quantity is the 7-bit code for EOB, which happens to consist of 7 bits equal to 0.
Type 2 blocks
Type 2 blocks are similar to type 1 blocks, except that the bit sequences encoding a symbol or a distance code are not fixed: they do not change within a block, but they may change, both in length and contents, between distinct blocks. Huffman codes are used to compute code lengths and values, based on the encountered symbol frequencies (Huffman codes are prefix, which means that all codes are self-terminated). A type 2 block begins with a compact description of the codes used for that block, then followed by the codes themselves, interspersed with the extra bits for sequence copy symbols. The computed Huffman codes may have lengths ranging from 1 to 15 bits (inclusive).
This means that the EOB marker may use from 1 to 15 bits in a type 2 block, and does not necessarily consist of a sequence of bits equal to zero.
Zlib lookahead
Although everything is quite self-terminated in DEFLATE, zlib (when inflating data) needs some lookahead, for performance reasons: reading compressed bits one by one is deemed too slow. Type 0 blocks are no problem: their header encodes their length, which allows zlib to read the whole block data precisely, and not a single bit more. For type 1 and type 2 blocks, zlib uses a lookup table to process input bits by 9-bit chunks. This means that zlib will require at least 9 bits worth of compressed data to decode the next symbol, even if that symbol is encoded over less than 9 bits.
Zlib inflater code operates as a state machine, which decompresses as much data as possible given the available compressed bytes (this "as possible" is subject to the 9-bit lookahead). The core idea of a flush operation is that the bytes sent to the peer (which runs the inflater) shall contain the last meaningful symbol and enough subsequent bits so that this symbol can be decoded with the 9-bit lookahead. Note that the EOB marker is not a meaningful symbol (i.e. it results in no extra application data). This process is illustrated by the following pseudo-code:
repeat gather 9 input bits of lookahead if not enough then exit decode next symbol
Note: recent versions of zlib do not need these 9 bits of lookahead. But the flush modes described hereafter still assume that the 9 bits are needed by the inflater (this is needed for backward compatibility with previous versions, and Mark Adler, one of the two authors of zlib, tells me that there is no plan to change the semantics described in this document).
Sync Flush
The "sync flush" is what zlib implements when used with the Z_SYNC_FLUSH flag. It performs the following tasks:
- If there is some buffered but not yet compressed data, then this data is compressed into one or several blocks (the type for each block will depend on the amount and nature of data).
- A new type 0 block with empty contents is appended.
A type 0 block with empty contents consists of:
- the three-bit block header;
- 0 to 7 bits equal to zero, to achieve byte alignment;
- the four-byte sequence 00 00 FF FF.
Therefore, the flushed stream always uses an integral number of bytes. As an added bonus, if the flush occurs at externally predictible moments (e.g. at the end of a record, for transport protocols where data is split into length-explicit records), and the sender and receiver agree that a sync flush is always used, then the four-byte sequence 00 00 FF FF needs not be transmitted: since it is always there, the sender can omit it, and the receiver can add it back before giving the bytes to its inflater engine. This is precisely what is done when DEFLATE is used with PPP (see RFC 1979).
The sync flush method is documented in zlib as the flush method which should be used. When applying this flush mode, the sender shall make sure that the complete type 0 block is sent, i.e. with the 00 00 FF FF sequence (unless the convention described above can be applied).
Partial Flush
The "partial flush" method is the first flush method which zlib implemented. It is accessible with the Z_PARTIAL_FLUSH parameter. Its documentation has been somewhat altered in recent zlib versions: it now reads "will be removed, use Z_SYNC_FLUSH instead" (version 1.2.3 of zlib). In other words, it is a deprecated flush method. But it is also the standard flush method in some protocols, e.g. SSH. The RFC 4253 actually describes the flush method for "zlib" in these terms:
The compression context is initialized after each key exchange, and is passed from one packet to the next, with only a partial flush being performed at the end of each packet. A partial flush means that the current compressed block is ended and all data will be output. If the current block is not a stored block, one or more empty blocks are added after the current block to ensure that there are at least 8 bits, counting from the start of the end-of-block code of the current block to the end of the packet payload.
The description for SSH is accurate, if not extremely clear. I hope this document will be easier to grasp for readers.
Note: since the partial flush is now part of the SSH standard, it will probably be "reprecated" in the next zlib version. The code was not meant to be removed anyway, but a more descriptive comment will be put back. For newer protocols, the sync flush is still recommended, because it uses less bandwidth (provided that the trick of removing the 00 00 FF FF sequence is applied) and is more resilient to stream damage (thanks to the byte alignment feature).
A partial flush consists of the following steps:
- If there is some buffered but not yet compressed data, then it is compressed into one or several blocks.
- An empty type 1 block is sent.
- Possibly, a second empty type 1 block is sent.
An empty type 1 block consists of the three-bit block header, followed by the seven-bit EOB code, hence a grand total of 10 bits. The whole trickery is how to decide whether a second empty type 1 block should be sent. Zlib uses the following test:
- Let
u
be the length (in bits) of the EOB marker of the last block (if there was buffered data when the flush was decided, then the last block is the last used to send that data). If the last block was a type 0 block, or if there is no last block (flush at the very beginning of the stream), then let u = 8. - The first empty type 1 block is sent. That block consists of 10 bit. Out of those 10 bits, v could be sent because they were part of complete bytes. From the sender point of view, there are b computed but unsent bits (between 0 and 7 bits, inclusive), and thus v = 10 - b. It follows that v ranges from 3 to 10.
- If u + v is strictly lower than 8, then a second empty block is sent. Otherwise, no second block is sent.
The rationale is the following: the inflater needs 9 bits of lookahead to decode the last meaningful symbol, which may be encoded over a single bit. Hence, at least 8 more bits must be sent. The following points shall be noted:
- The deflater could actually know exactly how many bits the symbol immediately before the EOB of the last block used. But this knowledge is not used by zlib. Rather, zlib simply assumes that the said symbol could have used only one bit.
- This test is performed even if the rationale does not apply, for instance because the previous block was a type 0 or type 1 block, or possibly did not exist at all.
- Similarly, the first type 1 empty block is always sent, even if the previous block was a type 0 block and thus no actual flush operation was really needed.
- Every sent block updates the value of u. In particular, empty type 1 blocks set it to 7 and empty type 0 blocks set it to 8.
This is what zlib does, and, of course, following these rules guarantees that a zlib-based receiver will operate cleanly. In other words, if a partial flush is to be used, then, for interoperability purposes, the deflater should use the exact same test.
Full Flush
The "full flush" (with Z_FULL_FLUSH) is a variant of the sync flush. The difference lies in the LZ77 step. Recall that the LZ77 algorithm replaces some sequences of characters by references to an identical sequence occurring somewhere in the previous uncompressed data (the backward distance is up to 32 kB in DEFLATE). This can be viewed in the following way: the previous data bytes collectively represent a dictionary of sequences, which the LZ77 unit can use by including symbolic references, when such a sequence is encountered.
At the very beginning of the stream, the dictionary is empty (it can be set to arbitrary data, but the deflater and the inflater must use the same preset dictionary, which is a convention outside the scope of this document). It is then filled with the first 32 kB of data. As more data is processed, the dictionary is constantly updated, and entries which become too old are dropped. The full flush is a sync flush where the dictionary is emptied: after a full flush, the deflater will refrain from using copy symbols which reference sequences appearing before the flush point.
From the inflater point of view, no special treatment of full flushes is needed. The difference between a sync flush and a full flush alters only the way the deflater selects symbols. A full flush degrades the compression efficiency since it removes sequence sharing opportunities for the next 32 kB of data; however, this degradation is very slight if full flushes are applied only rarely with regards to the 32 kB window size, e.g. every 1 MB or so. Full flushes are handy for damage recovery:
- a full flush is also a sync flush, which includes the very specific and highly recognizable 00 00 FF FF pattern;
- byte alignment is restored by a full flush;
- inflation can take over from a full flush point without any knowledge of the previous bytes.
It is therefore recommended to include occasional full flushes in long streams of data, except if the outer protocol makes it useless (e.g. TLS and SSH are security protocols which apply cryptographic integrity checks which detect alterations with a very high probability, and it is specified that they MUST NOT try to recover from damage, since such damage could be malicious).
Flush Modes for Some Protocols
In this section, I list some protocols which may use DEFLATE, and how the flush modes apply to them.
PPP
PPP can use DEFLATE; this is documented in RFC 1979. Each packet begins at a block boundary. Each packet MUST end with a "sync flush" (or a "full flush", at the sender choice) and the 00 00 FF FF final sequence MUST be omitted from the actually sent bytes.
This means that each PPP packet will contain one or several DEFLATE blocks, and no DEFLATE block will span over several packets.
TLS
TLS can use several compression protocols, although only the "null" compression (no compression at all) is mandatory. The use of DEFLATE with TLS is specified in RFC 3749. This specification is particularly empty of details. The only relevant part is the single sentence:
Flushing ensures that each compressed packet payload can be decompressed completely.
In other words, the sender should apply some kind of flushing for each record. Which flush mode should be used is not specified. Recent versions of OpenSSL (0.9.8 and beyond) use a sync flush, and include the 00 00 FF FF sequence since there is no convention for omitting it (TLS records range up to 16 kB of clear data, and already have a few dozen bytes of overhead for the encryption and cryptographic integrity check, hence this four-byte sequence is no big deal). Since OpenSSL is quite widely used, and since OpenSSL developers usually mind about interoperability issues, it is then recommended to apply a sync flush for each outgoing record (a full flush is not needed because the TLS cryptographic mechanisms make sure that previous data is neither lost nor damaged).
TLS implementations, when receiving records, must be prepared for all kind of flush methods. In particular, the sender may use the "partial flush". One consequence is that a record may begin "in the middle" of an empty type 1 block used for partial flushing.
Note: when the receiver processes a record, and reaches the end of a block, it may note how many bytes remain untouched by the inflater process. An empty type 1 block uses 10 bits, but a non-empty block of any type uses at least 18 bits. It follows that if, at the end of a DEFLATE block, zero or one record byte has not been touched by the inflater, then it can be assumed that the untouched byte (if any) is part of an empty type 1 block and can be ignored until the next record is received. Conversely, if two or more bytes remain, then the receiver may safely assume that the (possibly empty) next block is completely encoded in the subsequent bytes and can be processed without fear of reaching the record end before the block end. This property is useful when using a DEFLATE implementation which is not a state machine, but rather processes incoming compressed data on a block basis, with blocking read requests until the block end is reached.
SSH
SSH is a complex assembly of transport, authentication and serialization protocols, which collectively specify how a remote authenticated sessions can be obtained, with multiplexing, raw data transfers and terminal emulation. DEFLATE compression can be applied; this is specified in section 6.2 of RFC 4253.
For SSH, the partial flush is mandatory. See the section above about TLS for some details about how the record boundaries can be handled with regards to split blocks.
IP
Raw IP packets may benefit from DEFLATE compression. This usage is specified in RFC 2393 and RFC 2394. Since IP packets may be duplicated, lost, or transmitted in arbitrary order, this use of DEFLATE cannot assume any kind of state transmission between subsequent packets. Hence, each packet must be decompressed independantly, and flushing does not apply (each packet contains a complete sequence of DEFLATE blocks, the last of which being marked "final" and byte-aligned with trailing zero bits).
This DEFLATE usage cannot benefit from sequence sharing between distinct packets. Its efficiency is thus less than what is achieved by applying compression at a higher level, e.g. on connected TCP streams. To my knowledge, this IP-based compression is almost never used anywhere.
HTTP
HTTP can use DEFLATE. Data is encoded in either "zlib" or "gzip" formats (HTTP 1.1 can use both; HTTP 1.0 knows only "gzip"). In these modes, the data is first compressed as if it was a complete file, and then sent with a proper transfer encoding, as specified by HTTP. No flushing applies per se, since the complete compressed file is sent as a big chunk. Partial sending can still be performed, by using the "gzip" format and a streamed transfer encoding (e.g. "chunked"). The gzip format specifies that the compressed streams consists of one or several "members", and each member may use DEFLATE (independantly of each other). Since each member has its own CRC, some HTTP clients will accept to process a member before receiving the next member. This is a rather "border line" usage of HTTP as a tunnel, which may not be applicable in all generality. Anyway, none of this uses a DEFLATE flush: each gzip member contains a complete sequence of DEFLATE blocks.
HTTP 1.1 is specified in RFC 2616.