Locale
In computing, a locale is a set of parameters that defines the user's language, country and any special variant preferences that the user wants to see in their user interface. Usually a locale identifier consists of at least a language identifier and a region identifier.
On Unix, Linux and other POSIX-type platforms, locale identifiers are defined similarly to the BCP 47 definition of language tags, but the locale variant modifier is defined differently, and the character set is included as a part of the identifier. The identifier has this format: [language[_territory][.codeset][@modifier]]. (For example, Australian English using the UTF-8 encoding is en_AU.UTF-8.)
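The identifier format above can be taken apart with a regular expression. The following is a minimal sketch rather than a reference parser: the function name `parse_posix_locale` and the group names are chosen here for illustration, and the pattern is deliberately permissive about which characters each part may contain.

```python
import re

# Pattern for language[_territory][.codeset][@modifier]. The group names are
# chosen for this sketch and do not come from any standard API.
_LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z0-9]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


def parse_posix_locale(identifier):
    """Split a POSIX-style locale identifier into its four parts.

    Parts that are absent from the identifier are returned as None.
    """
    match = _LOCALE_RE.match(identifier)
    if match is None:
        raise ValueError(f"not a POSIX-style locale identifier: {identifier!r}")
    return match.groupdict()


if __name__ == "__main__":
    print(parse_posix_locale("en_AU.UTF-8"))
```

Calling the function on `en_AU.UTF-8` yields the language `en`, the territory `AU` and the codeset `UTF-8`, with no modifier.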
General locale settings
These settings usually include the following display (output) format settings:
- Number format setting
- Character classification, case conversion settings
- Date/Time format setting
- String collation setting
- Currency format setting
- Paper size setting
- other minor settings ...
The locale settings govern how output is formatted for a given locale, so time zone information and daylight saving time are not usually part of them.
Less usual, but worth mentioning, is the input format setting. This is mostly defined on a per-application basis.
Furthermore, the general settings usually include the keyboard layout setting.
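As an illustration of how these settings surface in a programming environment, the sketch below pairs some of the settings listed above with the category constants of Python's standard `locale` module and reads the numeric conventions of a locale. The dictionary and the helper name `numeric_conventions` are assumptions made for this example. Only the "C" locale is guaranteed to exist, and other locale names work only if they are installed on the system.

```python
import locale

# Settings from the list above paired with the category constants that
# Python's standard locale module exposes for them (a partial mapping).
CATEGORY_FOR_SETTING = {
    "number format": locale.LC_NUMERIC,
    "character classification and case conversion": locale.LC_CTYPE,
    "date/time format": locale.LC_TIME,
    "string collation": locale.LC_COLLATE,
    "currency format": locale.LC_MONETARY,
}


def numeric_conventions(locale_name="C"):
    """Return (decimal point, thousands separator) for the named locale.

    The default "C" locale exists everywhere; other names, such as
    "cs_CZ.UTF-8", only work if that locale is installed on the system.
    """
    previous = locale.setlocale(locale.LC_NUMERIC)  # remember current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
        conventions = locale.localeconv()
        return conventions["decimal_point"], conventions["thousands_sep"]
    finally:
        locale.setlocale(locale.LC_NUMERIC, previous)  # restore it


if __name__ == "__main__":
    print(numeric_conventions())
```

The helper restores the previous setting before returning, because the locale is process-wide state and changing it permanently could affect unrelated code.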
Programming/markup language support
In these environments,
- C
- C++
- Eiffel
- Java
- Microsoft .NET framework
- REBOL
- Ruby
- Perl
- PHP
- Python
- XML
and other (nowadays) Unicode-based environments, locale identifiers are defined in a format similar to BCP 47. They are usually defined with just ISO 639 and ISO 3166-1 alpha-2 codes.
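A two-part identifier of this kind can be assembled from a language code and a country code. The sketch below is an illustration only: `make_locale_tag` is a hypothetical helper, it checks only the shape of the codes (not whether they are registered), and it follows the common convention of a lowercase language code and an uppercase country code joined by a hyphen.

```python
import re


def make_locale_tag(language, region):
    """Join an ISO 639 language code and an ISO 3166-1 alpha-2 country code.

    Only the shape of the codes is checked (letters and length); the
    function does not verify that the codes are actually registered.
    """
    if not re.fullmatch(r"[A-Za-z]{2,3}", language):
        raise ValueError(f"language code must be 2 or 3 letters: {language!r}")
    if not re.fullmatch(r"[A-Za-z]{2}", region):
        raise ValueError(f"country code must be 2 letters: {region!r}")
    return f"{language.lower()}-{region.upper()}"


if __name__ == "__main__":
    print(make_locale_tag("cs", "CZ"))
```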
POSIX-type platforms
On Unix, Linux and other POSIX-type platforms, locale identifiers are defined similarly to the BCP 47 definition of language tags, but the locale variant modifier is defined differently, and the character set is included as a part of the identifier.
The following example shows the output of the locale command for the Czech language (cs) in the Czech Republic (CZ), with explicit UTF-8 encoding:
$ locale
LANG=cs_CZ.UTF-8
LC_CTYPE="cs_CZ.UTF-8"
LC_NUMERIC="cs_CZ.UTF-8"
LC_TIME="cs_CZ.UTF-8"
LC_COLLATE="cs_CZ.UTF-8"
LC_MONETARY="cs_CZ.UTF-8"
LC_MESSAGES="cs_CZ.UTF-8"
LC_PAPER="cs_CZ.UTF-8"
LC_NAME="cs_CZ.UTF-8"
LC_ADDRESS="cs_CZ.UTF-8"
LC_TELEPHONE="cs_CZ.UTF-8"
LC_MEASUREMENT="cs_CZ.UTF-8"
LC_IDENTIFICATION="cs_CZ.UTF-8"
LC_ALL=
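Output in this shape is straightforward to read programmatically. The sketch below assumes the text has already been captured as a string, for example from a terminal session like the one above. The function name `parse_locale_output` is hypothetical.

```python
def parse_locale_output(text):
    """Turn the NAME=value lines printed by the locale command into a dict.

    Surrounding double quotes are stripped from values, and an empty
    value (as for LC_ALL above) becomes an empty string.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the shell prompt line and anything without "=".
        if not line or line.startswith("$") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        settings[name] = value.strip('"')
    return settings


if __name__ == "__main__":
    sample = '$ locale\nLANG=cs_CZ.UTF-8\nLC_CTYPE="cs_CZ.UTF-8"\nLC_ALL=\n'
    print(parse_locale_output(sample))
```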
The full list of POSIX locale codes may be found on the Internet Assigned Numbers Authority (IANA) website. Details of the IANA registry for language tag extensions and IANA protocols are also to be found there.
Specifics for Microsoft platforms
The locale identifier (LCID) for unmanaged code on Microsoft Windows is a number such as 1033 for English (United States) or 1041 for Japanese (Japan). These numbers consist of a language code (lower 10 bits) and a culture code (upper bits), and are therefore often written in hexadecimal notation, such as 0x0409 or 0x0411. The list of those code sets is described under character encoding.
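The bit layout described above explains why hexadecimal notation is convenient. The sketch below splits an identifier exactly as that description states, into the lower 10 bits and the remaining upper bits. The function name `split_lcid` is hypothetical, and the sketch does not model any further structure that the upper bits may have.

```python
def split_lcid(lcid):
    """Split an LCID into (language code, upper bits).

    Follows the layout described in the text: the language code sits in
    the lower 10 bits and everything above it is returned as one number.
    """
    language_code = lcid & 0x3FF  # mask for the lower 10 bits
    upper_bits = lcid >> 10       # whatever remains above them
    return language_code, upper_bits


if __name__ == "__main__":
    for lcid in (1033, 1041):
        language_code, upper_bits = split_lcid(lcid)
        print(f"{lcid} = {lcid:#06x}: language {language_code:#04x}, upper bits {upper_bits}")
```

For the two examples in the text, 1033 (0x0409) splits into language code 0x09 with upper bits 1, and 1041 (0x0411) into language code 0x11 with upper bits 1.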
Microsoft is beginning to introduce unmanaged code application programming interfaces (APIs) for .NET that use this format. One of the first to be generally released is a function to mitigate issues with internationalized domain names (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/nls_DownlevelGetLocaleScripts.asp), but more are in Windows Vista Beta 1.
Beginning with Windows Vista, new functions that use BCP 47 locale names have been introduced to replace nearly all LCID-based APIs.
See also
- Internationalization and localization
- ISO 639 language codes
- ISO 3166-1 alpha-2 country codes
- IETF language tag
- Common Locale Data Repository
- Date and time representation by country
- AppLocale
External links
- BCP 47
- Language Subtag Registry
- Common Locale Data Repository
- Javadoc API documentation
- Locale and Language information from Microsoft
- MS-LCID: Windows Language Code Identifier (LCID) Reference from Microsoft
- Microsoft LCID list
- Microsoft LCID chart with decimal equivalents
- POSIX Environment Variables
- Low Level Technical details on defining a POSIX locale
- ICU Locale Explorer
- Debian Wiki on Locales
- Article "The Standard C++ Locale" by Nathan C. Myers
- Internationalization services - Python Library Reference
- locale(7): Description of multi-language support - Linux man page
- Apache C++ Standard Library Locale User's Guide
- Sort order charts for various operating system locales and database collations
- NATSPEC Library
- Description of locale-related UNIX environment variables in Debian Linux Reference Manual
- Guides to locales and locale creation on various platforms