Run-length encoding
Encyclopedia
Run-length encoding is a very simple form of data compression
in which runs of data (that is, sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run. This is most useful on data that contains many such runs: for example, simple graphic images such as icons, line drawings, and animations. It is not useful with files that don't have many runs as it could greatly increase the file size.
RLE also refers to a little-used image format in Windows 3.x
, with the extension rle, which is a Run Length Encoded Bitmap, used to compress the Windows 3.x startup screen.
s in the blank space, and many short runs of black pixels within the text. Let us take a hypothetical single scan line
, with B representing a black pixel and W representing white:
If we apply the run-length encoding (RLE) data
compression algorithm to the above hypothetical scan line, we get the following:
This is to be interpreted as twelve Ws, one B, twelve Ws, three Bs, etc.
The run-length code represents the original 67 characters in only 18. Of course, the actual format used for the storage of images is generally binary rather than ASCII
characters like this, but the principle remains the same. Even binary data files can be compressed with this method; file format specifications often dictate repeated bytes in files as padding space. However, newer compression methods such as DEFLATE
often use LZ77-based algorithms, a generalization of run-length encoding that can take advantage of runs of strings of characters (such as
and is well suited to palette
-based iconic images. It does not work well at all on continuous-tone images such as photographs, although JPEG
uses it quite effectively on the coefficients that remain after transforming and quantizing
image blocks.
Common formats for run-length encoded data include Truevision TGA
, PackBits
, PCX
and ILBM
.
Run-length encoding is used in fax
machines (combined with other techniques into Modified Huffman coding
). It is relatively efficient because most faxed documents are mostly white space, with occasional interruptions of black.
Data compression
In computer science and information theory, data compression, source coding or bit-rate reduction is the process of encoding information using fewer bits than the original representation would use....
in which runs of data (that is, sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run. This is most useful on data that contains many such runs: for example, simple graphic images such as icons, line drawings, and animations. It is not useful with files that don't have many runs as it could greatly increase the file size.
RLE also refers to a little-used image format in Windows 3.x
Windows 3.x
Windows 3.x can refer to either an individual or all of the following versions of Microsoft Windows:*Windows 3.0*Windows 3.1x*Windows 3.2...
, with the extension rle, which is a Run Length Encoded Bitmap, used to compress the Windows 3.x startup screen.
Example
For example, consider a screen containing plain black text on a solid white background. There will be many long runs of white pixelPixel
In digital imaging, a pixel, or pel, is a single point in a raster image, or the smallest addressable screen element in a display device; it is the smallest unit of picture that can be represented or controlled....
s in the blank space, and many short runs of black pixels within the text. Let us take a hypothetical single scan line
Scan line
A scan line or scanline is one line, or row, in a raster scanning pattern, such as a line of video on a cathode ray tube display of a television set or computer monitor....
, with B representing a black pixel and W representing white:
-
WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW
If we apply the run-length encoding (RLE) data
compression algorithm to the above hypothetical scan line, we get the following:
-
12W1B12W3B24W1B14W
This is to be interpreted as twelve Ws, one B, twelve Ws, three Bs, etc.
The run-length code represents the original 67 characters in only 18. Of course, the actual format used for the storage of images is generally binary rather than ASCII
ASCII
The American Standard Code for Information Interchange is a character-encoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text...
characters like this, but the principle remains the same. Even binary data files can be compressed with this method; file format specifications often dictate repeated bytes in files as padding space. However, newer compression methods such as DEFLATE
DEFLATE
Deflate is a lossless data compression algorithm that uses a combination of the LZ77 algorithm and Huffman coding. It was originally defined by Phil Katz for version 2 of his PKZIP archiving tool and was later specified in RFC 1951....
often use LZ77-based algorithms, a generalization of run-length encoding that can take advantage of runs of strings of characters (such as
BWWBWWBWWBWW
).Applications
Run-length encoding performs lossless data compressionLossless data compression
Lossless data compression is a class of data compression algorithms that allows the exact original data to be reconstructed from the compressed data. The term lossless is in contrast to lossy data compression, which only allows an approximation of the original data to be reconstructed, in exchange...
and is well suited to palette
Palette (computing)
In computer graphics, a palette is either a given, finite set of colors for the management of digital images , or a small on-screen graphical element for choosing from a limited set of choices, not necessarily colors .Depending on the context In computer graphics, a palette is either a given,...
-based iconic images. It does not work well at all on continuous-tone images such as photographs, although JPEG
JPEG
In computing, JPEG . The degree of compression can be adjusted, allowing a selectable tradeoff between storage size and image quality. JPEG typically achieves 10:1 compression with little perceptible loss in image quality....
uses it quite effectively on the coefficients that remain after transforming and quantizing
Quantization (image processing)
Quantization, involved in image processing, is a lossy compression technique achieved by compressing a range of values to a single quantum value. When the number of discrete symbols in a given stream is reduced, the stream becomes more compressible. For example, reducing the number of colors...
image blocks.
Common formats for run-length encoded data include Truevision TGA
Truevision TGA
Truevision TGA, often referred to as TARGA, is a raster graphics file format created by Truevision Inc. . It was the native format of TARGA and VISTA boards, which were the first graphic cards for IBM-compatible PCs to support Highcolor/truecolor display...
, PackBits
PackBits
PackBits is a fast, simple lossless compression scheme for run-length encoding of data.Apple introduced the PackBits format with the release of MacPaint on the Macintosh computer. This compression scheme is one of the types of compression that can be used in TIFF-files...
, PCX
PCX
PCX is an image file format developed by the now-defunct ZSoft Corporation of Marietta, Georgia. It was the native file format for PC Paintbrush and became one of the first widely accepted DOS imaging standards, although it has since been succeeded by more sophisticated image formats, such as GIF,...
and ILBM
ILBM
ILBM is a subtype of the Interchange File Format used for storing picture data. ILBM stands for InterLeaved BitMap which refers to the way the pictures are stored. The image data is stored as a varying number of bitplanes, each storing one bit of data for each pixel in the image...
.
Run-length encoding is used in fax
Fax
Fax , sometimes called telecopying, is the telephonic transmission of scanned printed material , normally to a telephone number connected to a printer or other output device...
machines (combined with other techniques into Modified Huffman coding
Modified Huffman coding
Modified Huffman coding is used in fax machines to encode black on white images . It combines the variable length codes of Huffman coding with the coding of repetitive data in run-length encoding....
). It is relatively efficient because most faxed documents are mostly white space, with occasional interruptions of black.
See also
- Look-and-say sequenceLook-and-say sequenceIn mathematics, the look-and-say sequence is the sequence of integers beginning as follows:To generate a member of the sequence from the previous member, read off the digits of the previous member, counting the number of digits in groups of the same digit...
- Comparison of graphics file formatsComparison of graphics file formats-General:Ownership of the format and related information.-Technical details:...
- Golomb codingGolomb codingGolomb coding is a lossless data compression method using a family of data compression codes invented by Solomon W. Golomb in the 1960s. Alphabets following a geometric distribution will have a Golomb code as an optimal prefix code, making Golomb coding highly suitable for situations in which the...
- Modified Huffman codingModified Huffman codingModified Huffman coding is used in fax machines to encode black on white images . It combines the variable length codes of Huffman coding with the coding of repetitive data in run-length encoding....
- Burrows-Wheeler transformBurrows-Wheeler transformThe Burrows–Wheeler transform , is an algorithm used in data compression techniques such as bzip2. It was invented by Michael Burrows and David Wheeler in 1994 while working at DEC Systems Research Center in Palo Alto, California...
- Run Length LimitedRun Length LimitedRun length limited or RLL coding is a line coding technique that is used to send arbitrary data over a communications channel with bandwidth limits. This is used in both telecommunication and storage systems which move a medium past a fixed head. Specifically, RLL bounds the length of stretches ...
- Bitmap indexBitmap IndexA bitmap index is a special kind of database index that uses bitmaps.Bitmap indexes have traditionally been considered to work well for data such as gender, which has a small number of distinct values, for example male and female, but many occurrences of those values. This would happen if, for...
- Forsyth–Edwards Notation, which uses run-length-encoding for empty spaces in chess positions.
External links
- Run-length encoding implemented in different programming languages (on Rosetta CodeRosetta CodeRosetta Code is a wiki-based programming chrestomathy website with solutions to various programming problems in many different programming languages. It was created in 2007 by Mike Mol. Rosetta Code includes 450 programming tasks, and covers 351 programming languages...
) - Variations on Run-length encoding by Daniel Lemire