Character entity reference
In the markup languages SGML, HTML, XHTML and XML, a character entity reference is a reference to a particular kind of named entity that has been predefined or explicitly declared in a Document Type Definition (DTD). The "replacement text" of the entity consists of a single character from the Universal Character Set/Unicode. The purpose of a character entity reference is to provide a way to refer to a character that is not universally encodable.
Although in popular usage character references are often called "entity references" or even "entities", this usage is wrong. A character reference is a reference to a character, not to an entity. An entity reference refers to the content of a named entity. An entity declaration is created by using the <!ENTITY name "value"> syntax in a document type definition (DTD) or XML schema. The name defined in the entity declaration is subsequently used in the XML. When used in the XML, it is called an entity reference.
Predefined entity
A "predefined entity reference" is a reference to one of the special characters denoted by:
name | value | character | code (dec) | meaning |
---|---|---|---|---|
quot | &#34; | " | x22 (34) | (double) quotation mark |
amp | &#38;#38; | & | x26 (38) | ampersand |
apos | &#39; | ' | x27 (39) | apostrophe (= apostrophe-quote) |
lt | &#38;#60; | < | x3C (60) | less-than sign |
gt | &#62; | > | x3E (62) | greater-than sign |
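The behaviour of the five predefined entities can be checked with a short Python sketch. It assumes the standard-library xml.etree.ElementTree parser; the sample markup, the variable names and the declared entity name "nbsp" are illustrative choices, not taken from the article. The second half contrasts a predefined reference with a name that must first be declared in an internal DTD subset.

```python
import xml.etree.ElementTree as ET

# The five predefined references need no DTD; an XML parser expands them.
doc = "<p>&lt;a href=&quot;x&quot;&gt; Tom &amp; Jerry&apos;s</p>"
text = ET.fromstring(doc).text
print(text)

# Any other entity name must be declared before it is referenced.
# "nbsp" and its replacement text are hypothetical, chosen for illustration.
declared = '<!DOCTYPE p [<!ENTITY nbsp "&#160;">]><p>10&nbsp;km</p>'
declared_text = ET.fromstring(declared).text
print(declared_text == "10\u00a0km")

# Without the declaration, the same reference makes the document ill-formed.
undeclared_failed = False
try:
    ET.fromstring("<p>10&nbsp;km</p>")
except ET.ParseError as err:
    undeclared_failed = True
    print("undeclared entity rejected:", err)
```

In this sketch the parser resolves quot, amp, apos, lt and gt on its own, while the undeclared name is reported as a parse error.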
Character coding
A "character reference" is a construct such as &#160; or, equivalently, &#xA0; that refers to a character by means of its numeric Unicode code point. Here, the character code 160 (or xA0 in hexadecimal) refers to the non-breaking space character.
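As a minimal sketch of the decimal and hexadecimal forms described above, the following Python snippet builds both references for the non-breaking space and decodes them again. The helper name char_ref is hypothetical; html.unescape is the standard-library function used here for decoding.

```python
import html

def char_ref(ch: str, hexadecimal: bool = False) -> str:
    """Build a numeric character reference for one character (illustrative helper)."""
    code_point = ord(ch)
    if hexadecimal:
        return f"&#x{code_point:X};"
    return f"&#{code_point};"

nbsp = "\u00a0"  # non-breaking space, code point 160 (hexadecimal A0)

print(char_ref(nbsp))                    # &#160;
print(char_ref(nbsp, hexadecimal=True))  # &#xA0;

# Both spellings decode to the same single character.
print(html.unescape("&#160;") == html.unescape("&#xA0;") == nbsp)  # True
```

Both spellings name the same code point, so which one to use is a matter of convention rather than meaning.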
See also
- SGML entity
- Character encodings in HTML
- Numeric character reference
- List of XML and HTML character entity references