DBCS
A double-byte character set (DBCS) is a character set that represents each character with 2 bytes. A DBCS supports national languages that contain a large number of unique characters or symbols: 1 byte can represent at most 256 characters, while 2 bytes can represent up to 65,536. Examples of such languages include Japanese, Korean, and Chinese.
DBCS stands for Double Byte Character Set. This term has two basic meanings:
- In CJK (Chinese, Japanese and Korean) computing, the term "DBCS" traditionally means a character set in which every graphic character not representable by an accompanying SBCS is encoded in two bytes; Han characters would generally comprise most of these two-byte characters.
- The term "DBCS" can also mean a character set in which all characters (including all control characters) are encoded in two bytes.
The DBCS in CJK computing
The term DBCS traditionally refers to a character set where each graphic character is encoded in two bytes. The DBCS always has lead bytes with the most significant bit set (i.e., byte values of 0x80 or above, which do not fit in 7 bits), and is always paired with a single-byte character set (SBCS). Furthermore, for the practical reason of maintaining compatibility with unmodified, off-the-shelf software, the SBCS is associated with halfwidth characters and the DBCS with fullwidth characters.
Sometimes, the use of the term "DBCS" can imply an underlying structure that does not comply with ISO 2022. For example, "DBCS" can sometimes mean a double-byte encoding that is specifically not EUC.
Note that this original meaning of DBCS is different from what some consider correct usage today. Some insist that these character sets be properly called either multi-byte character sets (MBCS) or variable-width encodings, because character sets like EUC-JP, EUC-TW, GB18030 and UTF-8 use more than 2 bytes for some characters, and they support 1 byte for some other characters.
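As a rough illustration of why "variable-width" is the more accurate label, the following Python sketch measures how many bytes a few characters occupy in UTF-8 and GB18030, using the codecs from Python's standard library. The helper `byte_len` and the choice of sample characters are assumptions made for this example.

```python
def byte_len(ch: str, encoding: str) -> int:
    """Return how many bytes one character occupies in the given encoding."""
    return len(ch.encode(encoding))


samples = {
    "ASCII letter": "A",
    "common Han character": "中",
    "supplementary-plane Han character": "\U00020000",
}

for encoding in ("utf-8", "gb18030"):
    for label, ch in samples.items():
        print(f"{encoding:8} {label}: {byte_len(ch, encoding)} byte(s)")

# utf-8    uses 1, 3 and 4 bytes for the three samples
# gb18030  uses 1, 2 and 4 bytes for the three samples
```

Neither encoding uses exactly two bytes for every character, so neither is a double-byte character set in the strict sense.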
Controversy
Some people use DBCS to mean the UTF-16 and UTF-8 encodings, while others use it to mean older (pre-Unicode) code pages that use more than one byte per character. Shift-JIS, GB2312 and Big5 are a few such code pages, but even calling these DBCS is incorrect terminology, because they are really MBCS (multibyte character sets). Some IBM mainframes do have true DBCS code pages, which contain only the double-byte portion of a multibyte code page.
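A short Python sketch, using only standard-library codecs, shows why these code pages are multibyte rather than strictly double-byte: the same string mixes one-byte and two-byte characters. The helper `encoded_sizes` is hypothetical, and the UTF-16 lines are included only to show that it is not strictly double-byte either.

```python
def encoded_sizes(text: str, encoding: str) -> list:
    """Return the encoded size in bytes of each character in text."""
    return [len(ch.encode(encoding)) for ch in text]


mixed = "A中"  # one ASCII letter followed by one Han character

for code_page in ("shift_jis", "gb2312", "big5"):
    # Each of these code pages uses 1 byte for "A" and 2 bytes for "中".
    print(code_page, encoded_sizes(mixed, code_page))

# UTF-16 uses 2 bytes for both characters above, but needs 4 bytes
# (a surrogate pair) for characters outside the Basic Multilingual Plane.
print("utf-16-le", encoded_sizes(mixed, "utf-16-le"))
print("utf-16-le", encoded_sizes("\U00020000", "utf-16-le"))
```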
If a person uses the term "DBCS enablement" for software internationalization, they are using ambiguous terminology. They mean either that they want to write software for East Asian markets using older technology with code pages, or that they are planning on using Unicode. Sometimes this term also implies translation into an East Asian language. Usually "Unicode enablement" means internationalizing software by using Unicode, and "DBCS enablement" means internationalizing software by using the mutually incompatible code pages of the various East Asian countries. Since Unicode, unlike those code pages, supports all the major languages in East Asia, it is generally easier to enable and maintain software that uses Unicode. DBCS (non-Unicode) enablement is usually only desired when much older operating systems or applications do not support Unicode.
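The incompatibility between regional code pages can be sketched in Python: bytes written under one code page are misread under another, whereas a Unicode encoding round-trips the text regardless of region. The helper name `reinterpret` is hypothetical, and `errors="replace"` is used so that invalid byte sequences yield replacement characters instead of raising an exception.

```python
def reinterpret(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one code page and decode the bytes with another."""
    raw = text.encode(written_as)
    return raw.decode(read_as, errors="replace")


original = "中"

# The same two bytes mean different things in different code pages.
print(reinterpret(original, "gb2312", "big5") == original)       # False
print(reinterpret(original, "gb2312", "shift_jis") == original)  # False

# With a Unicode encoding the writer and reader agree.
print(reinterpret(original, "utf-8", "utf-8") == original)       # True
```

The exact characters produced by the mismatched decodes depend on the codec tables, so only the mismatch itself is checked here.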