Baudot code
The Baudot code, invented by Émile Baudot, is a character set predating EBCDIC and ASCII. It was the predecessor to the International Telegraph Alphabet No. 2 (ITA2), the teleprinter code in use until the advent of ASCII. Each character in the alphabet is represented by a series of bits, sent over a communication channel such as a telegraph wire or a radio signal. The unit of symbol rate, the baud, is named after Baudot.
Baudot code
Technically, five-bit codes began in the 16th century, when Francis Bacon developed the cipher now called Bacon's cipher. However, this cipher is not a machine cipher and as such is not readily suitable for telecommunications. (http://www.math.cornell.edu/~morris/135/Bacon.pdf)
Baudot invented his original code in 1870 and patented it in 1874. It was a 5-bit code, with equal on and off intervals, which allowed telegraph transmission of the Roman alphabet, punctuation and control signals. It was based on an earlier code developed by Carl Friedrich Gauss and Wilhelm Weber in 1834.
Baudot's original code was adapted to be sent from a manual keyboard, and no teleprinter equipment was ever constructed that used it in its original form. The code was entered on a keyboard which had just five piano-type keys, operated with two fingers of the left hand and three fingers of the right hand. Once pressed, the keys were locked down until mechanical contacts in a distributor unit passed over the sector connected to that particular keyboard. The keyboard was then unlocked, ready for the next character to be entered, with an audible click (known as the "cadence signal") to warn the operator. Operators had to maintain a steady rhythm, and the usual speed of operation was 30 words per minute.
The table above "shows the allocation of the Baudot code which was employed in the British Post Office for continental and inland services. It will be observed that a number of characters in the continental code are replaced by fractionals in the inland code. Code elements 1, 2 and 3 are transmitted by keys 1, 2 and 3, and these are operated by the first three fingers of the right hand. Code elements 4 and 5 are transmitted by keys 4 and 5, and these are operated by the first two fingers of the left hand."
Baudot's code became known as International Telegraph Alphabet No. 1, and is no longer used.
Murray code
In 1901 Baudot's code was modified by Donald Murray (1865–1945), prompted by his development of a typewriter-like keyboard. The Murray system employed an intermediate step, a keyboard perforator, which allowed an operator to punch a paper tape, and a tape transmitter for sending the message from the punched tape. At the receiving end of the line, a printing mechanism would print on a paper tape, and/or a reperforator could be used to make a perforated copy of the message. As there was no longer a direct correlation between the operator's hand movement and the bits transmitted, there was no concern about arranging the code to minimize operator fatigue. Instead, Murray designed the code to minimize wear on the machinery, assigning the code combinations with the fewest punched holes to the most frequently used characters.
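The "fewest holes for the most frequent characters" idea can be illustrated by counting the marks in each letter code. The following is only a sketch: it assumes the ITA2 letter codes from the table in the ITA2 section as a stand-in for Murray's original assignments, and the names `ITA2_LETTERS`, `holes` and `letters_by_holes` are made up for this example.

```python
# ITA2 letter codes (MSB on left), copied from the table in the ITA2 section.
# Assumption: ITA2 stands in here for Murray's original code assignments.
ITA2_LETTERS = {
    "Q": "10111", "W": "10011", "E": "00001", "R": "01010", "T": "10000",
    "Y": "10101", "U": "00111", "I": "00110", "O": "11000", "P": "10110",
    "A": "00011", "S": "00101", "D": "01001", "F": "01101", "G": "11010",
    "H": "10100", "J": "01011", "K": "01111", "L": "10010", "Z": "10001",
    "X": "11101", "C": "01110", "V": "11110", "B": "11001", "N": "01100",
    "M": "11100",
}


def holes(code: str) -> int:
    """Return the number of punched holes (marks, '1' bits) in a 5-bit code."""
    return code.count("1")


def letters_by_holes() -> dict:
    """Group the letters by the number of holes their code punches."""
    groups = {}
    for letter, code in ITA2_LETTERS.items():
        groups.setdefault(holes(code), []).append(letter)
    return {n: sorted(letters) for n, letters in sorted(groups.items())}


if __name__ == "__main__":
    for n, letters in letters_by_holes().items():
        print(n, "".join(letters))
```

In this table the single-hole letters are E and T, while Q, K, X and V need four holes each.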
The Murray code also introduced what became known as "format effectors" or "control characters": the CR (Carriage Return) and LF (Line Feed) codes. A few of Baudot's codes moved to the positions where they have stayed ever since: the NULL or BLANK and the DEL code. NULL/BLANK was used as an idle code for when no messages were being sent.
Early British Creed machines used the Murray system.
Western Union
Murray's code was adopted by Western Union, which used it until the 1950s, with a few changes that consisted of omitting some characters and adding more control codes. An explicit SPC (space) character was introduced, in place of the BLANK/NULL, and a new BEL code rang a bell or otherwise produced an audible signal at the receiver. Additionally, the WRU or "Who aRe yoU?" code was introduced, which caused a receiving machine to send an identification stream back to the sender.
ITA2
Around 1930, the CCITT introduced the International Telegraph Alphabet No. 2 (ITA2) code as an international standard, which was based on the Western Union code with some minor changes. The US standardized on a version of ITA2 called the American Teletypewriter code (USTTY), which was the basis for 5-bit teletypewriter codes until the debut of 7-bit ASCII in 1963.

Pattern of impulses (1 = mark, 0 = space), MSB on left | Pattern of impulses (1 = mark, 0 = space), MSB on right | Letter shift | Figure shift |
---|---|---|---|
00000 | 00000 | null | null |
00100 | 00100 | space | space |
10111 | 11101 | Q | 1 |
10011 | 11001 | W | 2 |
00001 | 10000 | E | 3 |
01010 | 01010 | R | 4 |
10000 | 00001 | T | 5 |
10101 | 10101 | Y | 6 |
00111 | 11100 | U | 7 |
00110 | 01100 | I | 8 |
11000 | 00011 | O | 9 |
10110 | 01101 | P | 0 |
00011 | 11000 | A | - |
00101 | 10100 | S | BELL |
01001 | 10010 | D | $ |
01101 | 10110 | F | ! |
11010 | 01011 | G | & |
10100 | 00101 | H | # |
01011 | 11010 | J | ' |
01111 | 11110 | K | ( |
10010 | 01001 | L | ) |
10001 | 10001 | Z | " |
11101 | 10111 | X | / |
01110 | 01110 | C | : |
11110 | 01111 | V | ; |
11001 | 10011 | B | ? |
01100 | 00110 | N | , |
11100 | 00111 | M | . |
01000 | 00010 | Carriage return | Carriage return |
00010 | 01000 | Line feed | Line feed |
11011 | 11011 | Shift to figures | Shift to figures |
11111 | 11111 | Shift to letters | Shift to letters |

ITA2 is still used in TDDs and some amateur radio applications, such as radioteletype ("RTTY"). ITA2 is also used in Enhanced Broadcast Solution (a recent financial protocol specified by Deutsche Börse) to reduce the character encoding footprint.
Nomenclature
Nearly all 20th-century teleprinter equipment used Western Union's code, ITA2, or variants thereof. Radio amateurs casually but incorrectly call ITA2 and its variants "Baudot", and even the American Radio Relay League's Amateur Radio Handbook does so, though in more recent editions the tables of codes correctly identify it as ITA2.
Details
NOTE: This table presumes the position called "1" by Baudot and Murray is rightmost, and least significant. The actual order of transmission varied by manufacturer.

In ITA2, characters are expressed using five bits. ITA2 uses two code sub-sets, the "letter shift" (LTRS) and the "figure shift" (FIGS). The FIGS character (11011) signals that the following codes are to be interpreted as being in the FIGS set, until this is reset by the LTRS (11111) character. "ENQuiry" will trigger the other machine's answerback. It means "Who are you?" CR is carriage return, LF is line feed, BEL is the bell character, which rang a small bell (often used to alert operators to an incoming message), SP is space, and NUL is the null character (blank tape).
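The shift mechanism described above can be sketched as a small encoder and decoder built from the ITA2 table. This is an illustrative sketch, not a reference implementation: it assumes both ends start in letter shift, represents BELL as `"\a"` and NUL as `"\0"`, ignores behaviours such as unshift-on-space that some equipment may have had, and uses the hypothetical names `encode` and `decode`.

```python
# (bits with MSB on left, letter-shift character, figure-shift character),
# copied from the ITA2 table above. LTRS and FIGS are handled separately.
TABLE = [
    ("00000", "\0", "\0"), ("00100", " ", " "),
    ("10111", "Q", "1"), ("10011", "W", "2"), ("00001", "E", "3"),
    ("01010", "R", "4"), ("10000", "T", "5"), ("10101", "Y", "6"),
    ("00111", "U", "7"), ("00110", "I", "8"), ("11000", "O", "9"),
    ("10110", "P", "0"), ("00011", "A", "-"), ("00101", "S", "\a"),
    ("01001", "D", "$"), ("01101", "F", "!"), ("11010", "G", "&"),
    ("10100", "H", "#"), ("01011", "J", "'"), ("01111", "K", "("),
    ("10010", "L", ")"), ("10001", "Z", '"'), ("11101", "X", "/"),
    ("01110", "C", ":"), ("11110", "V", ";"), ("11001", "B", "?"),
    ("01100", "N", ","), ("11100", "M", "."),
    ("01000", "\r", "\r"), ("00010", "\n", "\n"),
]
LTRS, FIGS = 0b11111, 0b11011

LETTERS = {int(bits, 2): ltr for bits, ltr, _ in TABLE}
FIGURES = {int(bits, 2): fig for bits, _, fig in TABLE}
ENC_LETTERS = {ch: code for code, ch in LETTERS.items()}
ENC_FIGURES = {ch: code for code, ch in FIGURES.items()}


def encode(text: str) -> list:
    """Encode text as 5-bit codes, inserting FIGS/LTRS only when needed."""
    codes, in_figs = [], False  # assumption: the line starts in letter shift
    for ch in text.upper():
        if ch in ENC_LETTERS and ch in ENC_FIGURES:
            codes.append(ENC_LETTERS[ch])  # same code in both shifts
        elif ch in ENC_LETTERS:
            if in_figs:
                codes.append(LTRS)
                in_figs = False
            codes.append(ENC_LETTERS[ch])
        elif ch in ENC_FIGURES:
            if not in_figs:
                codes.append(FIGS)
                in_figs = True
            codes.append(ENC_FIGURES[ch])
        else:
            raise ValueError(f"no ITA2 code for {ch!r}")
    return codes


def decode(codes: list) -> str:
    """Decode 5-bit codes, tracking the current shift state."""
    out, in_figs = [], False
    for code in codes:
        if code == LTRS:
            in_figs = False
        elif code == FIGS:
            in_figs = True
        else:
            out.append((FIGURES if in_figs else LETTERS)[code])
    return "".join(out)
```

For example, `encode("A1 B")` yields the codes for A, FIGS, 1, space, LTRS and B, and `decode` turns that sequence back into `"A1 B"`.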
Note: the binary representations of the code points are often shown in reverse order, presumably depending on which side the paper tape is viewed from. The "control" characters were chosen to be either symmetric or in useful pairs, so that inserting a tape "upside down" did not cause problems for the equipment and the resulting printout could still be deciphered. Thus FIGS (11011), LTRS (11111) and space (00100) are invariant, while CR and LF (01000 and 00010 in the table's MSB-on-left column), generally used as a pair, are mirror images of each other and so produce the same output when the tape is reversed. LTRS could also be used to overpunch characters to be deleted on a paper tape (much like DEL in 7-bit ASCII).
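The symmetry property can be checked mechanically. The sketch below models an "upside down" tape only as a reversal of the five bit positions within each character, which is a simplifying assumption; `CONTROLS`, `flip` and `flipped_name` are names invented for the example.

```python
# Control codes with MSB on left, as in the ITA2 table.
CONTROLS = {
    "NUL": "00000",
    "SP": "00100",
    "CR": "01000",
    "LF": "00010",
    "FIGS": "11011",
    "LTRS": "11111",
}


def flip(bits: str) -> str:
    """Reverse the bit order, as if the tape were read from the other edge."""
    return bits[::-1]


def flipped_name(name: str):
    """Return the control that a flipped code is read as, or None."""
    target = flip(CONTROLS[name])
    for other, bits in CONTROLS.items():
        if bits == target:
            return other
    return None


if __name__ == "__main__":
    for name in CONTROLS:
        print(name, "->", flipped_name(name))
```

Under this model every control code maps to itself except CR and LF, which swap with each other.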
The sequence RYRYRY... is often used in test messages, and at the start of every transmission. Since R is 01010 and Y is 10101, the sequence exercises much of a teleprinter's mechanical components at maximum stress. Also, at one time, fine-tuning of the receiver was done using two coloured lights (one for each tone). 'RYRYRY...' produced 0101010101..., which made the lights glow with equal brightness when the tuning was correct. This tuning sequence is only useful when ITA2 is used with two-tone FSK modulation, such as is commonly seen in radioteletype (RTTY) usage.
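A short sketch shows why RY gives an evenly alternating pattern. It considers only the five data bits of each character (any start and stop elements sent on a real line are left out, which is an assumption of this example), and the helper names are hypothetical.

```python
R, Y = "01010", "10101"  # ITA2 codes for R and Y, from the table


def ry_stream(pairs: int) -> str:
    """Concatenate the data bits of 'RY' repeated the given number of times."""
    return (R + Y) * pairs


def is_complement(a: str, b: str) -> bool:
    """True if every bit position differs between the two codes."""
    return len(a) == len(b) and all(x != y for x, y in zip(a, b))


def alternates(bits: str) -> bool:
    """True if no two adjacent bits in the stream are equal."""
    return all(x != y for x, y in zip(bits, bits[1:]))


if __name__ == "__main__":
    stream = ry_stream(2)
    print(stream, is_complement(R, Y), alternates(stream))
```

Because R and Y are bitwise complements and each alternates internally, the combined stream spends equal time on mark and space, which matches the equal-brightness tuning described above.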
US implementations of Baudot code may differ in the addition of a few characters, such as # and &, on the FIGS layer. The above table represents the US TTY code.
The Russian version of Baudot code (MTK-2) used three shift modes; the Cyrillic letter mode was activated by the character (00000). Because of the larger number of characters in the Cyrillic alphabet, the characters !, &, £, and BEL were omitted and replaced by Cyrillic letters.
See also
- Serial communication
- Asynchronous communication
- Morse code
- RY (test signal)
- Punched tape