Run Length Limited
Encyclopedia
Run length limited or RLL coding is a line coding
technique that is used to send arbitrary data over a communications channel
with bandwidth limits. This is used in both telecommunication
and storage systems which move a medium past a fixed head. Specifically, RLL bounds the length of stretches (runs) of repeated bits during which the signal does not change. If the runs are too long clock recovery
is difficult, while if they are too short, the high frequencies might be attenuated by the communications channel. By modulating
the data
, RLL reduces the timing uncertainty in decoding
the stored data which would lead to the possible errant insertion or removal of bits when reading the data back. Run length limited codes were widely used in hard disk
drives until the mid-1980s and are used in digital optical discs such as CD, DVD
, MD
, Hi-MD
and Blu-ray as the mechanism to ensure that the boundaries between bits can always be accurately found (preventing bit slip
) while efficiently using the media to reliably store the maximum amount of data in a given space. Early disk drives used very simple encoding schemes, such as RLL (0,1) FM code, but higher density RLL (2,7) and RLL (1,7) codes became the de facto industry standard for hard disks by the early 1990s.
, information is represented by changes in the direction of the magnetic field on the disk. In a computer, information is represented by the voltage on a wire. No voltage on the wire would be a binary zero, and a positive voltage on the wire represents a binary one. Magnetic media, on the other hand, always carries a magnetic flux - either a "north" pole or a "south" pole. In order to convert the magnetic fields to binary data, some encoding method must be used to translate between the two.
One of the simplest practical codes, Modified Non-Return-to-Zero-Inverted (NRZI), simply encodes a 1 as a magnetic polarity transition, also known as a "flux reversal", and a zero as no transition. With the disk spinning at a constant rate, each bit is given an equal time period, a "data window," for the magnetic signal that represents that bit, and the flux reversal, if any, occurs at the start of this window. (Note: older hard disks used the same data window over the whole disk, but modern disks are more complicated; for more on this, see zoned bit recording.)
While this method is simple, it is prone to errors for long runs of zeros.
In a simple example, consider the binary pattern 101 with a data window of 1 ns (one nanosecond, or one billionth of a second). This will be stored on the disk as a change, followed by no change, and then another change. If the preceding magnetic polarity was already positive, the resulting pattern might look like this: −−+. A value of 255, or all binary ones, would be written as −−+−+−+−+ or +−+−+−+−. A zero byte would be written as ++++++++ or −−−−−−−−. A 512 byte sector of zeros would be written as 4,096 sequential bits with the same polarity.
Since a disk drive is a physical piece of hardware, the rotational speed of the drive can change slightly, due to a change in the motor speed or thermal expansion of the disk platter. The physical media on a floppy disk can also become deformed, causing larger timing errors, and the timing circuit on the controller itself may have small variations in speed. The problem is that, with a long string of zeros, there's no way for the disk drive's controller to know the exact position of the read head, and so a speed variation of even .01% could result in 4 bits being added to or removed from the data stream. Without some form of synchronization and error correction, the data would become completely unusable.
The other problem is due to the limits of magnetic media itself: it is only possible to write so many polarity changes in a certain amount of space, so there's an upper limit to how many 1's can also be written sequentially.
To prevent this problem, data is coded in such a way that long repetitions of a single binary value do not occur. By limiting the number of zeros written consecutively, this makes it possible for the drive controller to stay in sync. By limiting the number of 1's written in a row, the overall frequency of polarity changes is reduced, allowing the drive to store more data in the same amount of space, resulting in either a smaller package for the same amount of data or more storage in the same size package.
engineers, and was first used commercially in 1979 on the IBM 3370 DASD
, for use with the 4300 series mainframe
. During the late 1980s, PC
hard disks began using RLL. RLL codes have found almost universal application in optical disc recording practice since 1980. In consumer electronics, RLLs like the EFM code
with (Eight-to-Fourteen:rate = 8/14, d=2, k=10) are employed in the Compact Disc
(CD) and MiniDisc
(MD), and the EFMPlus code (rate = 8/16, d=2, k=10) used in the DVD
. Parameters d, k are the minimum and maximum allowed run-lengths. For more coverage on the storage technologies, the references cited in this article are useful.
Somewhat confusingly, the run length is the number of zeros (0, 3, 1, 2 and 5 in the preceding) between adjacent ones, which is one less than the number of bit times the signal actually remains unchanged. Run length limited sequences are characterized by two parameters, (d) and (k), which stipulate the minimum and maximum zero-bit run length that can occur in the sequence. So RLL's are generally specified as (d,k) RLL. e.g.: (1,3) MFM RLL.
Suppose a magnetic tape can support up to 3,200 flux reversals per inch. A Modified Frequency Modulation
or (1,3) RLL encoding stores each data bit as two bits on tape, but since there is guaranteed to be one 0 (non flux reversal) bit between any 1 (flux reversal) bits then it is possible to store 6,400 encoded bits per inch on the tape, or 3,200 data bits per inch. A (1,7) RLL encoding can also store 6,400 encoded bits per inch on the tape, but since it only takes 3 encoded bits to store 2 data bits this is 4,267 data bits per inch. A (2,7) RLL encoding takes 2 encoded bits to store each data bit, but since there is guaranteed to be two 0 bits between any 1 bits then it is possible to store 9,600 encoded bits per inch on the tape, or 4,800 data bits per inch.
The flux reversal densities on hard drives are significantly greater, but the same improvements in storage density are seen by using different encoding systems.
The added 1 bits are referred to as clock bits.
Example:
Data: 0 0 1 0 1 1 0 1 0 0 0 1 1 0
Encoded: 1010111011111011101010111110
Clock: 1 1 1 1 1 1 1 1 1 1 1 1 1 1
variant.
Example:
Data: 0010 1101 0001 1000
Encoded: 10010011011101111010
begins to get interesting, because it imposes a constraint of d=1 (a minimum of one 0 between each two 1's), thereby allowing a bit recording rate equal to twice the flux reversal limit. Although its coding rate is only 1/2, the doubled recording rate limit makes it equivalent to a rate-1 code.
Where "x" is the complement of the previous data bit (which is also the previous encoded bit). Except for that x bit, this is the same as the FM table, which is how this code gets its name. The inserted clock bits are 0 except between two 0 data bits.
Example:
Data: 0 0 1 0 1 1 0 1 0 0 0 1 1 0
Encoded: x010010001010001001010010100
Clock: x 1 0 0 0 0 0 0 0 1 1 0 0 0
except (x, 0, 0, y) becomes (NOT x, x AND y, NOT y, 0, 0, 0)
Example:
Data: 0 0 1 0 1 1 0 1 0 0 0 1 1 0
Encoded: 101 001 010 100 100 000 001
Example:
Data: 1 1 0 1 1 0 0 1 1
Encoded: 1000 001000 00001000
Where "x" is the complement of the previous encoded bit (i.e. 1 if the previous bit was 0, and 0 if the previous bit was 1).
Example:
Data: 0 1 0 0 1 1 0 1 0 1
Encoded: 010 101 000 000 010
Line code
In telecommunication, a line code is a code chosen for use within a communications system for baseband transmission purposes...
technique that is used to send arbitrary data over a communications channel
Channel (communications)
In telecommunications and computer networking, a communication channel, or channel, refers either to a physical transmission medium such as a wire, or to a logical connection over a multiplexed medium such as a radio channel...
with bandwidth limits. This is used in both telecommunication
Telecommunication
Telecommunication is the transmission of information over significant distances to communicate. In earlier times, telecommunications involved the use of visual signals, such as beacons, smoke signals, semaphore telegraphs, signal flags, and optical heliographs, or audio messages via coded...
and storage systems which move a medium past a fixed head. Specifically, RLL bounds the length of stretches (runs) of repeated bits during which the signal does not change. If the runs are too long clock recovery
Clock recovery
Some digital data streams, especially high-speed serial data streams are sent without an accompanying clock signal. The receiver generates a clock from an approximate frequency reference, and then phase-aligns to the transitions in the data stream with a phase-locked loop...
is difficult, while if they are too short, the high frequencies might be attenuated by the communications channel. By modulating
Modulation
In electronics and telecommunications, modulation is the process of varying one or more properties of a high-frequency periodic waveform, called the carrier signal, with a modulating signal which typically contains information to be transmitted...
the data
Data
The term data refers to qualitative or quantitative attributes of a variable or set of variables. Data are typically the results of measurements and can be the basis of graphs, images, or observations of a set of variables. Data are often viewed as the lowest level of abstraction from which...
, RLL reduces the timing uncertainty in decoding
Decoding
Decoding is the reverse of encoding, which is the process of transforming information from one format into another. Information about decoding can be found in the following:* Digital-to-analog converter, the use of analog circuit for decoding operations...
the stored data which would lead to the possible errant insertion or removal of bits when reading the data back. Run length limited codes were widely used in hard disk
Hard disk
A hard disk drive is a non-volatile, random access digital magnetic data storage device. It features rotating rigid platters on a motor-driven spindle within a protective enclosure. Data is magnetically read from and written to the platter by read/write heads that float on a film of air above the...
drives until the mid-1980s and are used in digital optical discs such as CD, DVD
DVD
A DVD is an optical disc storage media format, invented and developed by Philips, Sony, Toshiba, and Panasonic in 1995. DVDs offer higher storage capacity than Compact Discs while having the same dimensions....
, MD
MiniDisc
The disc is permanently housed in a cartridge with a sliding door, similar to the casing of a 3.5" floppy disk. This shutter is opened automatically by a mechanism upon insertion. The audio discs can either be recordable or premastered. Recordable MiniDiscs use a magneto-optical system to record...
, Hi-MD
Hi-MD
In January 2004, Sony announced the Hi-MD media storage format as a further development of the MiniDisc format. With its release in later 2004 came the ability to use newly-developed, high-capacity 1 gigabyte Hi-MD discs, sporting the same dimensions as regular MiniDiscs.- Main features :* The...
and Blu-ray as the mechanism to ensure that the boundaries between bits can always be accurately found (preventing bit slip
Bit slip
In digital transmission, bit slip is the loss of a bit or bits, caused by clock drift – variations in the respective clock rates of the transmitting and receiving devices....
) while efficiently using the media to reliably store the maximum amount of data in a given space. Early disk drives used very simple encoding schemes, such as RLL (0,1) FM code, but higher density RLL (2,7) and RLL (1,7) codes became the de facto industry standard for hard disks by the early 1990s.
Need for RLL coding
On a hard diskHard disk
A hard disk drive is a non-volatile, random access digital magnetic data storage device. It features rotating rigid platters on a motor-driven spindle within a protective enclosure. Data is magnetically read from and written to the platter by read/write heads that float on a film of air above the...
, information is represented by changes in the direction of the magnetic field on the disk. In a computer, information is represented by the voltage on a wire. No voltage on the wire would be a binary zero, and a positive voltage on the wire represents a binary one. Magnetic media, on the other hand, always carries a magnetic flux - either a "north" pole or a "south" pole. In order to convert the magnetic fields to binary data, some encoding method must be used to translate between the two.
One of the simplest practical codes, Modified Non-Return-to-Zero-Inverted (NRZI), simply encodes a 1 as a magnetic polarity transition, also known as a "flux reversal", and a zero as no transition. With the disk spinning at a constant rate, each bit is given an equal time period, a "data window," for the magnetic signal that represents that bit, and the flux reversal, if any, occurs at the start of this window. (Note: older hard disks used the same data window over the whole disk, but modern disks are more complicated; for more on this, see zoned bit recording.)
While this method is simple, it is prone to errors for long runs of zeros.
In a simple example, consider the binary pattern 101 with a data window of 1 ns (one nanosecond, or one billionth of a second). This will be stored on the disk as a change, followed by no change, and then another change. If the preceding magnetic polarity was already positive, the resulting pattern might look like this: −−+. A value of 255, or all binary ones, would be written as −−+−+−+−+ or +−+−+−+−. A zero byte would be written as ++++++++ or −−−−−−−−. A 512 byte sector of zeros would be written as 4,096 sequential bits with the same polarity.
Since a disk drive is a physical piece of hardware, the rotational speed of the drive can change slightly, due to a change in the motor speed or thermal expansion of the disk platter. The physical media on a floppy disk can also become deformed, causing larger timing errors, and the timing circuit on the controller itself may have small variations in speed. The problem is that, with a long string of zeros, there's no way for the disk drive's controller to know the exact position of the read head, and so a speed variation of even .01% could result in 4 bits being added to or removed from the data stream. Without some form of synchronization and error correction, the data would become completely unusable.
The other problem is due to the limits of magnetic media itself: it is only possible to write so many polarity changes in a certain amount of space, so there's an upper limit to how many 1's can also be written sequentially.
To prevent this problem, data is coded in such a way that long repetitions of a single binary value do not occur. By limiting the number of zeros written consecutively, this makes it possible for the drive controller to stay in sync. By limiting the number of 1's written in a row, the overall frequency of polarity changes is reduced, allowing the drive to store more data in the same amount of space, resulting in either a smaller package for the same amount of data or more storage in the same size package.
History
The RLL recording technique, specifically RLL (2,7), was originally developed by IBMIBM
International Business Machines Corporation or IBM is an American multinational technology and consulting corporation headquartered in Armonk, New York, United States. IBM manufactures and sells computer hardware and software, and it offers infrastructure, hosting and consulting services in areas...
engineers, and was first used commercially in 1979 on the IBM 3370 DASD
Direct access storage device
In mainframe computers and some minicomputers, a direct access storage device, or DASD , is any secondary storage device which has relatively low access time relative to its capacity....
, for use with the 4300 series mainframe
Mainframe computer
Mainframes are powerful computers used primarily by corporate and governmental organizations for critical applications, bulk data processing such as census, industry and consumer statistics, enterprise resource planning, and financial transaction processing.The term originally referred to the...
. During the late 1980s, PC
IBM PC compatible
IBM PC compatible computers are those generally similar to the original IBM PC, XT, and AT. Such computers used to be referred to as PC clones, or IBM clones since they almost exactly duplicated all the significant features of the PC architecture, facilitated by various manufacturers' ability to...
hard disks began using RLL. RLL codes have found almost universal application in optical disc recording practice since 1980. In consumer electronics, RLLs like the EFM code
Eight-to-Fourteen Modulation
Eight-to-fourteen modulation is a data encoding technique – formally, a channel code – used by compact discs and pre-Hi-MD MiniDiscs. EFMPlus is a related code, used in DVDs and SACDs. EFM and EFMPlus were both invented by Kees A...
with (Eight-to-Fourteen:rate = 8/14, d=2, k=10) are employed in the Compact Disc
Compact Disc
The Compact Disc is an optical disc used to store digital data. It was originally developed to store and playback sound recordings exclusively, but later expanded to encompass data storage , write-once audio and data storage , rewritable media , Video Compact Discs , Super Video Compact Discs ,...
(CD) and MiniDisc
MiniDisc
The disc is permanently housed in a cartridge with a sliding door, similar to the casing of a 3.5" floppy disk. This shutter is opened automatically by a mechanism upon insertion. The audio discs can either be recordable or premastered. Recordable MiniDiscs use a magneto-optical system to record...
(MD), and the EFMPlus code (rate = 8/16, d=2, k=10) used in the DVD
DVD
A DVD is an optical disc storage media format, invented and developed by Philips, Sony, Toshiba, and Panasonic in 1995. DVDs offer higher storage capacity than Compact Discs while having the same dimensions....
. Parameters d, k are the minimum and maximum allowed run-lengths. For more coverage on the storage technologies, the references cited in this article are useful.
Technical overview
Generally run-length is the number of bits for which signal remains unchanged. A run-length of 3 for bit 1, represents a sequence of '111'. For instance, the pattern of magnetic polarizations on the disk might be '+−−−−++−−−++++++', with runs of length 1, 4, 2, 3, and 6. However, run length limited coding terminology assumes NRZI encoding, so 1 bits indicate changes and 0 bits indicate the absence of change, the above sequence would be expressed as '11000101001000001', and only runs of zero bits are counted.Somewhat confusingly, the run length is the number of zeros (0, 3, 1, 2 and 5 in the preceding) between adjacent ones, which is one less than the number of bit times the signal actually remains unchanged. Run length limited sequences are characterized by two parameters, (d) and (k), which stipulate the minimum and maximum zero-bit run length that can occur in the sequence. So RLL's are generally specified as (d,k) RLL. e.g.: (1,3) MFM RLL.
Suppose a magnetic tape can support up to 3,200 flux reversals per inch. A Modified Frequency Modulation
Modified Frequency Modulation
Modified Frequency Modulation, commonly MFM, is a line coding scheme used to encode the actual data-bits on most floppy disk formats, hardware examples include Amiga, most CP/M machines as well as IBM PC compatibles. Early hard disk drives also used this coding.MFM is a modification to the original...
or (1,3) RLL encoding stores each data bit as two bits on tape, but since there is guaranteed to be one 0 (non flux reversal) bit between any 1 (flux reversal) bits then it is possible to store 6,400 encoded bits per inch on the tape, or 3,200 data bits per inch. A (1,7) RLL encoding can also store 6,400 encoded bits per inch on the tape, but since it only takes 3 encoded bits to store 2 data bits this is 4,267 data bits per inch. A (2,7) RLL encoding takes 2 encoded bits to store each data bit, but since there is guaranteed to be two 0 bits between any 1 bits then it is possible to store 9,600 encoded bits per inch on the tape, or 4,800 data bits per inch.
The flux reversal densities on hard drives are significantly greater, but the same improvements in storage density are seen by using different encoding systems.
Coding
In the encoded format a "1" bit indicates a flux transition, while a "0" indicates that the magnetic field on the disk does not change for that time interval.FM: (0,1) RLL
Generally, the term "RLL code" is used to refer to more elaborate encodings, but the original frequency modulation code, also called differential Manchester, can be seen as a simple rate-1/2 RLL code.The added 1 bits are referred to as clock bits.
Data | Encoded |
---|---|
0 | 10 |
1 | 11 |
Example:
Data: 0 0 1 0 1 1 0 1 0 0 0 1 1 0
Encoded: 1010111011111011101010111110
Clock: 1 1 1 1 1 1 1 1 1 1 1 1 1 1
GCR: (0,2) RLL
By extending the maximum run length to 2 adjacent 0 bits, the data rate can be improved to 4/5. This is the original IBM group code recordingGroup Code Recording
In computer science, group code recording refers to several distinct but related encoding methods for magnetic media. The first, used in 6250 cpi magnetic tape, is an error-correcting code combined with a run length limited encoding scheme...
variant.
|
|
Example:
Data: 0010 1101 0001 1000
Encoded: 10010011011101111010
MFM: (1,3) RLL
Modified frequency modulationModified Frequency Modulation
Modified Frequency Modulation, commonly MFM, is a line coding scheme used to encode the actual data-bits on most floppy disk formats, hardware examples include Amiga, most CP/M machines as well as IBM PC compatibles. Early hard disk drives also used this coding.MFM is a modification to the original...
begins to get interesting, because it imposes a constraint of d=1 (a minimum of one 0 between each two 1's), thereby allowing a bit recording rate equal to twice the flux reversal limit. Although its coding rate is only 1/2, the doubled recording rate limit makes it equivalent to a rate-1 code.
Data | Encoded |
---|---|
0 | x0 |
1 | 01 |
Where "x" is the complement of the previous data bit (which is also the previous encoded bit). Except for that x bit, this is the same as the FM table, which is how this code gets its name. The inserted clock bits are 0 except between two 0 data bits.
Example:
Data: 0 0 1 0 1 1 0 1 0 0 0 1 1 0
Encoded: x010010001010001001010010100
Clock: x 1 0 0 0 0 0 0 0 1 1 0 0 0
(1,7) RLL
(1,7) RLL maps 2 bits of data onto three bits on the disk, and the encoding is done in two or four bit groups. The encoding rules are: (x, y) becomes (NOT x, x AND y, NOT y),except (x, 0, 0, y) becomes (NOT x, x AND y, NOT y, 0, 0, 0)
Data | Encoded |
---|---|
00 00 | 101 000 |
00 01 | 100 000 |
10 00 | 001 000 |
10 01 | 010 000 |
00 | 101 |
01 | 100 |
10 | 001 |
11 | 010 |
Example:
Data: 0 0 1 0 1 1 0 1 0 0 0 1 1 0
Encoded: 101 001 010 100 100 000 001
(2,7) RLL
(2,7) RLL maps n bits of data onto 2n bits on the disk, but the encoding can be done in two, three or four bit groups.Data | (2,7) RLL Encoded |
---|---|
11 | 1000 |
10 | 0100 |
000 | 000100 |
010 | 100100 |
011 | 001000 |
0011 | 00001000 |
0010 | 00100100 |
Example:
Data: 1 1 0 1 1 0 0 1 1
Encoded: 1000 001000 00001000
DC Free (1,7) RLL
There is also an alternate (1,7) RLL encoding that is sometimes used to avoid a DC bias (which helps when sending a signal over a long distance or with some types of recording media).Data | Encoded |
---|---|
00 | x01 |
01 | 010 |
10 | x00 |
11 00 | 010 001 |
11 01 | x00 000 |
11 10 | x00 001 |
11 11 | 010 000 |
Where "x" is the complement of the previous encoded bit (i.e. 1 if the previous bit was 0, and 0 if the previous bit was 1).
Example:
Data: 0 1 0 0 1 1 0 1 0 1
Encoded: 010 101 000 000 010
See also
- PRMLPRMLIn computer data storage, Partial Response Maximum Likelihood is a method for converting the weak analog signal from the head of a magnetic disk or tape drive into a digital signal. PRML attempts to correctly interpret even small changes in the analog signal, whereas peak detection relies on fixed...
- Run-length encodingRun-length encodingRun-length encoding is a very simple form of data compression in which runs of data are stored as a single data value and count, rather than as the original run...
- Line codeLine codeIn telecommunication, a line code is a code chosen for use within a communications system for baseband transmission purposes...
- Error correcting codes
- Source codingSource codingIn information theory, Shannon's source coding theorem establishes the limits to possible data compression, and the operational meaning of the Shannon entropy....
- ModulationModulationIn electronics and telecommunications, modulation is the process of varying one or more properties of a high-frequency periodic waveform, called the carrier signal, with a modulating signal which typically contains information to be transmitted...
- Physical layerPhysical layerThe physical layer or layer 1 is the first and lowest layer in the seven-layer OSI model of computer networking. The implementation of this layer is often termed PHY....
- Self-synchronizing codeSelf-synchronizing codeIn telecommunications, a self-synchronizing code is a line code in which the symbol stream formed by a portion of one code word, or by the overlapped portion of any two adjacent code words, is not a valid code word...
and bit synchronization - 8b/10b encoding8B/10B encodingIn telecommunications, 8b/10b is a line code that maps 8-bit symbols to 10-bit symbols to achieve DC-balance and bounded disparity, and yet provide enough state changes to allow reasonable clock recovery. This means that the difference between the count of 1s and 0s in a string of at least 20 bits...
- Bit slipBit slipIn digital transmission, bit slip is the loss of a bit or bits, caused by clock drift – variations in the respective clock rates of the transmitting and receiving devices....
- Eight-to-fourteen modulationEight-to-Fourteen ModulationEight-to-fourteen modulation is a data encoding technique – formally, a channel code – used by compact discs and pre-Hi-MD MiniDiscs. EFMPlus is a related code, used in DVDs and SACDs. EFM and EFMPlus were both invented by Kees A...
and EFMplus are DC-free (2,10) RLL codes used on CDs and DVDs, respectively.