Lempel–Ziv–Stac
Encyclopedia
Lempel-Ziv-Stac is a lossless data compression
algorithm
that uses a combination of the LZ77 sliding-window compression algorithm and fixed Huffman coding
. It was originally developed by Stac Electronics
for hard disk compression, and sold as the Stacker disk compression software. It was later specified as a compression algorithm for various network protocols. LZS is specified in the Cisco IOS
stack.
LZS compression is specified for various Internet protocols:
An LZS compressor looks for matches between the data to be compressed and the last 2 kB of data. If it finds a match, it encodes an offset/length reference to the dictionary. If no match is found, the next data byte is encoded as a "literal" byte. The compressed data stream ends with an end-marker.
A literal byte is encoded as a '0' bit followed by the 8 bits of the byte.
An offset/length reference is encoded as a '1' bit followed by the encoded offset, followed by the encoded length.
An offset can have a minimum value of 1, maximum value of 2047. A value of 1 refers to the most recent byte in the history buffer, immediately preceding the next data byte to be processed. An offset is encoded as:
A length is encoded as:
An end-marker is encoded as the 8-bit token 10000000. Following the end-marker, 0 to 7 extra 0 bits are appended as needed, to pad the stream to the next byte boundary.
has held several patents for LZS compression.
In 1993-94, Stac Electronics sued Microsoft for infringement of LZS patents in the DoubleSpace disk compression program included with MS-DOS 6.0.
Lossless data compression
Lossless data compression is a class of data compression algorithms that allows the exact original data to be reconstructed from the compressed data. The term lossless is in contrast to lossy data compression, which only allows an approximation of the original data to be reconstructed, in exchange...
algorithm
Algorithm
In mathematics and computer science, an algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function. Algorithms are used for calculation, data processing, and automated reasoning...
that uses a combination of the LZ77 sliding-window compression algorithm and fixed Huffman coding
Huffman coding
In computer science and information theory, Huffman coding is an entropy encoding algorithm used for lossless data compression. The term refers to the use of a variable-length code table for encoding a source symbol where the variable-length code table has been derived in a particular way based on...
. It was originally developed by Stac Electronics
Stac Electronics
Stac Electronics, originally incorporated as State of the Art Consulting and later shortened to Stac, Inc, was a technology company founded in 1983...
for hard disk compression, and sold as the Stacker disk compression software. It was later specified as a compression algorithm for various network protocols. LZS is specified in the Cisco IOS
Cisco IOS
Cisco IOS is the software used on the vast majority of Cisco Systems routers and current Cisco network switches...
stack.
Standards
LZS compression is standardised as an INCITS (previously ANSI) standard.LZS compression is specified for various Internet protocols:
- RFC 1967 – PPP LZS-DCP Compression Protocol (LZS-DCP)
- RFC 1974 – PPP Stac LZS Compression Protocol
- RFC 2395 – IP Payload Compression Using LZS
- RFC 3943 – Transport Layer Security (TLS) Protocol Compression Using Lempel-Ziv-Stac (LZS)
Algorithm
LZS compression and decompression uses an LZ77 type algorithm. It uses the last 2 kB of uncompressed data as a sliding-window dictionary.An LZS compressor looks for matches between the data to be compressed and the last 2 kB of data. If it finds a match, it encodes an offset/length reference to the dictionary. If no match is found, the next data byte is encoded as a "literal" byte. The compressed data stream ends with an end-marker.
Compressed Data Format
Data is encoded into a stream of variable-bit-width tokens.A literal byte is encoded as a '0' bit followed by the 8 bits of the byte.
An offset/length reference is encoded as a '1' bit followed by the encoded offset, followed by the encoded length.
An offset can have a minimum value of 1, maximum value of 2047. A value of 1 refers to the most recent byte in the history buffer, immediately preceding the next data byte to be processed. An offset is encoded as:
- If the offset is less than 128: a '1' bit followed by a 7-bit offset value.
- If the offset is greater than or equal to 128: a '0' bit followed by an 11-bit offset value.
A length is encoded as:
Length | Bit Encoding |
---|---|
2 | 00 |
3 | 01 |
4 | 10 |
5 | 1100 |
6 | 1101 |
7 | 1110 |
8 to 22 | 1111 xxxx, where xxxx is length - 8 |
23 to 37 | 1111 1111 xxxx, where xxxx is length - 23 |
length |
(1111 repeated N times) xxxx, where N is integer result of (length + 7) / 15, and xxxx is length - (N*15 - 7) |
An end-marker is encoded as the 8-bit token 10000000. Following the end-marker, 0 to 7 extra 0 bits are appended as needed, to pad the stream to the next byte boundary.
Patents
Stac Electronics' spin-off HifnHifn
Hifn is a semiconductor manufacturer founded in 1996 as a spin-off from Stac Electronics. The company is headquartered in Los Gatos, California, and has offices in North America, Europe and Asia. It is active in the market of security processors...
has held several patents for LZS compression.
In 1993-94, Stac Electronics sued Microsoft for infringement of LZS patents in the DoubleSpace disk compression program included with MS-DOS 6.0.