Zlib
zlib is a software library used for data compression. It was written by Jean-loup Gailly and Mark Adler and is an abstraction of the DEFLATE compression algorithm used in their gzip file compression program. zlib is also a crucial component of many software platforms, including Linux, Mac OS X, and iOS. It has also been used in gaming consoles such as the PlayStation 3, Wii, and Xbox 360.
The first public version of zlib, 0.9, was released on 1 May 1995 and was originally intended for use with the libpng image library. It is free software, distributed under the zlib license.
Encapsulation
zlib compressed data is typically written with a gzip wrapper or a zlib wrapper. The wrapper encapsulates the raw DEFLATE data by adding a header and trailer. This provides stream identification and error detection, which are not provided by the raw DEFLATE data.
The gzip header is larger than the zlib header, as it stores a file name and other file system information. This is the header format used in the ubiquitous gzip file format.
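The difference between the wrappers can be illustrated with a minimal sketch using Python's standard `zlib` module, which binds the library. The helper name `deflate` and the sample payload are assumptions made for this example; the `wbits` values (negative for raw DEFLATE, 15 for the zlib wrapper, 16 + 15 for the gzip wrapper) follow that module's documented convention.

```python
import zlib


def deflate(data: bytes, wbits: int) -> bytes:
    """Compress data; the wbits value selects which wrapper is written."""
    compressor = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=wbits)
    return compressor.compress(data) + compressor.flush()


payload = b"zlib wraps raw DEFLATE data. " * 50  # illustrative sample data

raw_stream = deflate(payload, -15)   # raw DEFLATE: no header, no trailer
zlib_stream = deflate(payload, 15)   # zlib wrapper: small header + checksum
gzip_stream = deflate(payload, 31)   # gzip wrapper: larger header + checksum

for name, stream in [("raw", raw_stream), ("zlib", zlib_stream), ("gzip", gzip_stream)]:
    print(f"{name:5s} {len(stream)} bytes")

# Flip the bits of the last trailer byte to simulate corruption.
damaged = zlib_stream[:-1] + bytes([zlib_stream[-1] ^ 0xFF])
try:
    zlib.decompress(damaged)
    corruption_detected = False
except zlib.error:
    corruption_detected = True  # the trailer checksum no longer matches

print("corruption detected:", corruption_detected)
```

The same damage applied to the raw stream would not necessarily be noticed, since a raw DEFLATE stream carries no checksum of its own.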
Algorithm
zlib supports only one algorithm, DEFLATE, which combines a variation of LZ77 (Lempel–Ziv 1977) with Huffman coding. This algorithm provides good compression on a wide variety of data with minimal use of system resources. It is also the algorithm used in the ZIP archive format.
It is unlikely that the zlib format will ever be extended to use any other algorithms, though the header makes allowance for this possibility.
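The allowance mentioned above is a compression-method field in the zlib header, as laid out in RFC 1950: the low four bits of the first byte name the method (8 means DEFLATE), the high four bits encode the window size, and the first two bytes together form a multiple of 31. The following sketch reads those fields from a stream produced by Python's `zlib` module; the variable names are chosen for this example only.

```python
import zlib

stream = zlib.compress(b"example data")

cmf, flg = stream[0], stream[1]      # the two zlib header bytes
compression_method = cmf & 0x0F      # 8 identifies DEFLATE
compression_info = cmf >> 4          # log2(window size) minus 8
window_size = 1 << (compression_info + 8)
header_ok = (cmf * 256 + flg) % 31 == 0  # header check defined by RFC 1950

print("method:", compression_method)
print("window size:", window_size)
print("header check passes:", header_ok)
```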
Resource use
The library provides facilities for control of processor and memory use. A compression level value may be supplied, which trades off speed against compression.
There are also facilities for conserving memory. These are probably only useful in restricted memory environments such as some embedded systems.
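As a rough illustration of these controls, the sketch below uses Python's `zlib` module. Levels run from 0 (store only) to 9 (best compression), while the `wbits` and `memLevel` arguments shrink the history window and the internal state. The sample data and the specific low-memory settings are assumptions for the example, not recommendations.

```python
import zlib

data = b"The quick brown fox jumps over the lazy dog. " * 200  # sample input

stored = zlib.compress(data, 0)     # level 0: no compression, just wrapping
fastest = zlib.compress(data, 1)    # level 1: favours speed
smallest = zlib.compress(data, 9)   # level 9: favours output size

# A memory-conserving configuration: a 512-byte window (wbits=9) and the
# smallest internal state (memLevel=1), usually at some cost in output size.
low_memory = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=9, memLevel=1)
compact = low_memory.compress(data) + low_memory.flush()

for name, out in [("stored", stored), ("level 1", fastest),
                  ("level 9", smallest), ("low memory", compact)]:
    print(f"{name:10s} {len(out)} bytes")
```

A decompressor opened with the default window size can still read the low-memory stream, because the window size used is recorded in the zlib header.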
Strategy
The compression can be optimized for specific types of data. If you are using the library to always compress specific types of data, then using a specific strategy may improve compression and performance. For example, if your data contains long runs of repeated bytes, then the RLE (run-length encoding) strategy may give good results at higher speed.
For general data, the default strategy is preferred.
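A strategy is selected when the compressor is created. The sketch below, using Python's `zlib` module, compresses run-heavy data with the default strategy and with `Z_RLE`. The helper `compress_with` and the synthetic input are assumptions for this example, and which strategy wins on size or speed depends on the data.

```python
import zlib


def compress_with(data: bytes, strategy: int) -> bytes:
    """Compress data with the zlib wrapper using the given strategy."""
    compressor = zlib.compressobj(
        level=6, method=zlib.DEFLATED, wbits=15, memLevel=8, strategy=strategy
    )
    return compressor.compress(data) + compressor.flush()


# Synthetic input made of long runs of a single repeated byte value.
runs = b"".join(bytes([value]) * 500 for value in range(64))

default_out = compress_with(runs, zlib.Z_DEFAULT_STRATEGY)
rle_out = compress_with(runs, zlib.Z_RLE)

print("input  :", len(runs), "bytes")
print("default:", len(default_out), "bytes")
print("RLE    :", len(rle_out), "bytes")
```

The strategy affects only how the compressor searches for matches; the output is ordinary DEFLATE data and is decompressed in the usual way.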
Error handling
Errors may be detected and skipped. Data corruption can be detected, as long as the data is written with a zlib or gzip header (see above).
Further, if full-flush points are written to the compressed stream, then corrupt data can be skipped and the decompression will resynchronise at the next flush point. (No error recovery of the corrupt data is provided.) Full-flush points are useful for large data streams on unreliable channels where some data loss is unimportant (e.g. multimedia). However, creating too many flush points can dramatically affect speed and compression.
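The effect of a full-flush point can be sketched with Python's `zlib` module. A full flush ends the current block on a byte boundary and resets the compressor's history, so the data that follows can be decoded without anything that came before it. This example does not use the library's own resynchronisation routine; it keeps the compressed pieces separate so the flush-point offsets are known, which is an assumption of the sketch.

```python
import zlib

segments = [b"first segment " * 40, b"second segment " * 40, b"third segment " * 40]

compressor = zlib.compressobj()
pieces = []
for index, segment in enumerate(segments):
    is_last = index == len(segments) - 1
    piece = compressor.compress(segment)
    # Write a full-flush point after every segment except the last one.
    piece += compressor.flush(zlib.Z_FINISH if is_last else zlib.Z_FULL_FLUSH)
    pieces.append(piece)

# Suppose the first piece was damaged in transit and has to be discarded.
# Decoding can restart at the flush point with a raw DEFLATE decompressor,
# because the second piece contains no references to earlier data.
resumed = zlib.decompressobj(wbits=-15)
recovered = resumed.decompress(pieces[1])

print("recovered second segment intact:", recovered == segments[1])
```

The first segment is simply lost, which matches the point above that no recovery of the corrupt data is provided.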
Data length
There is no limit to the length of data that can be compressed or decompressed. Repeated calls to the library allow an unlimited number of blocks of data to be handled. Some ancillary code (counters) may suffer from overflow for long data streams, but this does not affect the actual compression or decompression.
When compressing a long (or infinite) data stream, it is advisable to write regular full-flush points.
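Handling a stream through repeated calls might look like the following sketch, written with Python's `zlib` module. The generator names, the chunk source and the `flush_every` interval are all hypothetical choices for this example; a real application would pick the interval to balance resilience against the speed and size cost of flushing.

```python
import zlib
from typing import Iterable, Iterator


def compress_stream(chunks: Iterable[bytes], flush_every: int = 4) -> Iterator[bytes]:
    """Compress chunks one at a time, writing a full-flush point periodically."""
    compressor = zlib.compressobj()
    for count, chunk in enumerate(chunks, start=1):
        out = compressor.compress(chunk)
        if count % flush_every == 0:
            out += compressor.flush(zlib.Z_FULL_FLUSH)
        if out:
            yield out
    yield compressor.flush()  # finish the stream and write the trailer


def decompress_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Decompress chunks one at a time without holding the whole stream."""
    decompressor = zlib.decompressobj()
    for chunk in chunks:
        out = decompressor.decompress(chunk)
        if out:
            yield out
    tail = decompressor.flush()
    if tail:
        yield tail


# Ten 1 KiB chunks stand in for an arbitrarily long input.
source = [bytes([i % 251]) * 1024 for i in range(10)]
restored = b"".join(decompress_stream(compress_stream(source)))

print("round trip ok:", restored == b"".join(source))
```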
Applications
Today, zlib is something of a de facto standard, to the point that zlib and DEFLATE are often used interchangeably in standards documents. Thousands of applications rely on it for compression, directly or indirectly, including:
- The Linux kernel, where it is used to implement compressed network protocols and compressed file systems, and to decompress the kernel image itself at boot time.
- libpng, the reference implementation for the PNG image format, which specifies DEFLATE as the stream compression for its bitmap data.
- Libwww, an API for web applications such as web browsers.
- The Apache HTTP server, which uses zlib to implement HTTP/1.1 compression.
- The OpenSSH client and server, which rely on zlib to perform the optional compression offered by the Secure Shell protocol.
- The OpenSSL and GnuTLS security libraries, which can optionally use zlib to compress TLS connections.
- The FFmpeg multimedia library, which uses zlib to read and write the DEFLATE-compressed parts of stream formats such as Matroska.
- The rsync remote file synchronizer, which uses zlib to implement optional protocol compression.
- The dpkg and RPM package managers, which use zlib to unpack files from compressed software packages.
- The Subversion and CVS version control systems, which use zlib to compress traffic to and from remote repositories.
- The Git version control system, which uses zlib to store the contents of its data objects (blobs, trees, commits and tags).
- The PostgreSQL RDBMS, which uses zlib with its custom dump format (pg_dump -Fc) for database backups.
zlib is also used in many embedded devices, such as the Apple Inc. iPhone and Sony PlayStation 3, because the code is portable, liberally licensed and has a relatively small memory footprint.
External links
- zlib home page
- RFC 1950—ZLIB Compressed Data Format
- RFC 1951—DEFLATE Compressed Data Format
- RFC 1952—GZIP file format