EBCDIC
Extended Binary Coded Decimal Interchange Code (EBCDIC) is an 8-bit character encoding used mainly on IBM mainframe and IBM midrange computer operating systems. EBCDIC descended from the code used with punched cards and the corresponding six-bit binary-coded decimal code used with most of IBM's computer peripherals of the late 1950s and early 1960s. It is also employed on various non-IBM platforms such as Fujitsu-Siemens' BS2000/OSD, HP MPE/iX, and Unisys MCP.
History
EBCDIC was devised in 1963 and 1964 by IBM and was announced with the release of the IBM System/360 line of mainframe computers. It is an 8-bit character encoding, in contrast to, and developed separately from, the 7-bit ASCII encoding scheme. It was created to extend the existing binary-coded decimal (BCD) interchange code, or BCDIC, which itself was devised as an efficient means of encoding the two zone and number punches on punched cards into 6 bits.
While IBM was a chief proponent of the ASCII standardization committee, it did not have time to prepare ASCII peripherals (such as card punch machines) to ship with its System/360 computers, so the company settled on EBCDIC at the time. The System/360 became wildly successful, and thus so did EBCDIC.
IBM mainframe and midrange peripherals and operating systems use EBCDIC as their inherent encoding, with some exceptions: AIX running on the iSeries, Linux running on the zSeries, and the IBM PC and its descendants use ASCII. Software and many hardware peripherals can translate between encodings, and modern mainframes (such as IBM zSeries) include processor instructions, at the hardware level, to accelerate translation between character sets.
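Translation between single-byte character sets is typically table-driven: each input byte indexes a 256-entry table that holds the output byte. The following is a minimal sketch of that idea, not IBM's implementation. It assumes Python's built-in `cp037` codec (one common EBCDIC code page) as the source of the mapping, and the table and function names are made up for the example.

```python
# Sketch: byte-for-byte translation between EBCDIC (code page 037) and Latin-1
# using 256-entry lookup tables. Table and function names are illustrative.

# Entry i holds the Latin-1 byte for EBCDIC byte i.
EBCDIC_TO_LATIN1 = bytes(range(256)).decode("cp037").encode("latin-1")
# Entry i holds the EBCDIC byte for Latin-1 byte i.
LATIN1_TO_EBCDIC = bytes(range(256)).decode("latin-1").encode("cp037")


def ebcdic_to_latin1(data: bytes) -> bytes:
    """Translate EBCDIC (cp037) bytes to Latin-1, one table lookup per byte."""
    return data.translate(EBCDIC_TO_LATIN1)


def latin1_to_ebcdic(data: bytes) -> bytes:
    """Translate Latin-1 bytes to EBCDIC (cp037), one table lookup per byte."""
    return data.translate(LATIN1_TO_EBCDIC)


if __name__ == "__main__":
    ebcdic = latin1_to_ebcdic(b"HELLO")
    print(ebcdic.hex())              # hexadecimal EBCDIC codes for "HELLO"
    print(ebcdic_to_latin1(ebcdic))  # translated back to Latin-1 bytes
```

Because code page 037 covers the same 256 characters as Latin-1, the two tables are inverses of each other and a round trip is lossless. That would not hold for an arbitrary pair of code pages.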
EBCDIC has no modern technical advantage over ASCII-based code pages such as the ISO-8859 series or Unicode. Each has some technical niceties; for example, ASCII and EBCDIC both have a single bit that distinguishes upper from lower case. But some aspects of EBCDIC make it much less pleasant to work with than ASCII, such as its non-contiguous alphabet. As with single-byte extended ASCII code pages, most EBCDIC code pages allow at most two languages (English and one other language) to be used in a database or text file.
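The case bit can be checked directly. The sketch below assumes Python's `ascii` and `cp037` codecs (code page 037 stands in for EBCDIC here); `case_bits` is a hypothetical helper name. It XORs the codes of each upper/lower letter pair and collects the distinct results.

```python
# Sketch: find the bit that separates upper and lower case in an encoding.
import string


def case_bits(codec: str) -> set:
    """Return the distinct XOR differences between upper/lower letter pairs."""
    return {
        upper.encode(codec)[0] ^ lower.encode(codec)[0]
        for upper, lower in zip(string.ascii_uppercase, string.ascii_lowercase)
    }


if __name__ == "__main__":
    print({hex(b) for b in case_bits("ascii")})  # one value: the ASCII case bit
    print({hex(b) for b in case_bits("cp037")})  # one value: the EBCDIC case bit
```

In ASCII the bit (0x20) is set for lowercase letters, whereas in code page 037 the bit (0x40) is set for uppercase letters, so case-folding code written for one encoding cannot be reused unchanged for the other.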
Where true support for multilingual text is desired, a system supporting far more characters is needed. Generally this is done with some form of Unicode support. There is an EBCDIC Unicode Transformation Format called UTF-EBCDIC proposed by the Unicode Consortium, but it is not intended to be used in open interchange environments and, even on EBCDIC-based systems, it is almost never used. IBM mainframes support UTF-16, but they do not support UTF-EBCDIC natively.
Arabic EBCDIC versions are typically in presentation order, that is, the left-to-right order as displayed by an older mainframe or line printer, rather than the right-to-left logical order used by modern encodings such as Unicode.
Codepage layout
The table below is based on CCSID 500, one of the code page variants of EBCDIC; it shows only the basic (English) EBCDIC characters. Characters 00–3F and FF are controls, 40 is space, 41 is no-break space (RSP: "Required Space"), E1 is numeric space (NSP: "Numeric Space"), and CA is soft hyphen. Characters are shown with their equivalent Unicode codes. Invariant alphanumeric, punctuation, and control characters common to all EBCDIC code pages are shown in bold. Unassigned codes are typically filled with international or region-specific characters in the various EBCDIC code page variants.
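A 16×16 view of a code page can be regenerated from a codec. The sketch below assumes Python's built-in `cp500` codec; `cell` and `layout_rows` are hypothetical helper names. That codec implements the full Latin-1 variant of code page 500, so it shows more characters than the basic set described above, and it prints a dot for controls and other non-printing characters.

```python
# Sketch: rebuild a 16x16 view of an EBCDIC code page from Python's codec.


def cell(byte: int, codec: str = "cp500") -> str:
    """Return the character for one byte, or '.' if it is not printable."""
    ch = bytes([byte]).decode(codec)
    return ch if ch.isprintable() else "."


def layout_rows(codec: str = "cp500") -> list:
    """Return 16 strings, one per high nibble, each listing 16 cells."""
    return [
        f"{high:X}_  " + " ".join(cell(high * 16 + low, codec) for low in range(16))
        for high in range(16)
    ]


if __name__ == "__main__":
    print("    " + " ".join(f"{low:X}" for low in range(16)))  # column header
    for row in layout_rows():
        print(row)
```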
Criticism and humor
Open-source-software advocate and hacker Eric S. Raymond writes in his Jargon File that EBCDIC was almost universally loathed by early hackers and programmers because of its multitude of different versions, none of which resembled the other versions, and that IBM produced it in direct competition with the already-established ASCII.
The Jargon File 4.4.7 gives the following definition:
Another popular complaint is that the EBCDIC alphabetic characters follow an archaic punched card encoding rather than a linear ordering like ASCII. One consequence of this is that incrementing the character code for "I" does not produce the code for "J", and likewise there is a gap between the codes for "R" and "S". Thus programming a simple control loop to cycle through only the alphabetic characters is problematic.
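The gaps are easy to see by encoding the alphabet and comparing neighbouring codes. The sketch below assumes Python's `cp037` codec as a representative EBCDIC code page; `letter_codes` and `gaps` are hypothetical helper names.

```python
# Sketch: locate the breaks in the EBCDIC alphabet (code page 037 assumed).
import string


def letter_codes(codec: str) -> list:
    """Return the byte value of each uppercase letter A-Z in the given codec."""
    return [letter.encode(codec)[0] for letter in string.ascii_uppercase]


def gaps(codec: str) -> list:
    """Return (letter, next_letter, step) wherever the step between codes is not 1."""
    codes = letter_codes(codec)
    letters = string.ascii_uppercase
    return [
        (letters[i], letters[i + 1], codes[i + 1] - codes[i])
        for i in range(len(letters) - 1)
        if codes[i + 1] - codes[i] != 1
    ]


if __name__ == "__main__":
    print(gaps("ascii"))  # no gaps in ASCII
    print(gaps("cp037"))  # breaks after "I" and after "R"
```

A portable loop therefore iterates over an explicit list of letters, as `letter_codes` does, rather than incrementing a character code from "A" to "Z".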
These incompatibilities were also the source of many jokes. One such joke went:
- Professor: "So the American government went to IBM to come up with an encryption standard, and they came up with..."
Student: "EBCDIC!"
A reference to the EBCDIC character set is made in the classic Infocom adventure game Zork II. In the "Machine Room", there is a collection of ancient computers and other machines of uncertain purpose. The following is the description of the room, with EBCDIC used to imply an incomprehensible language:
See also
- EBCDIC code pages with Latin-1 charset
- codepage 037 (English, Portuguese)
- codepage 285 (Ireland, United Kingdom)
- UTF-EBCDIC