Document Type Definition
Document Type Definition (DTD) is a set of markup declarations that define a document type for SGML-family markup languages (SGML, XML, HTML). DTDs were a precursor to XML schema and have a similar function, although different capabilities.

DTDs use a terse formal syntax that declares precisely which elements and references may appear where in the document of the particular type, and what the elements’ contents and attributes are. DTDs also declare entities which may be used in the instance document.

XML uses a subset of the SGML DTD syntax.

Newer XML Namespace-aware schema languages (such as W3C XML Schema and ISO RELAX NG) have largely superseded DTDs. A namespace-aware version of DTDs is being developed as Part 9 of ISO DSDL (http://www.dsdl.org/). DTDs persist in applications which need special publishing characters such as the XML and HTML Character Entity References, which were derived from the larger sets defined as part of the ISO SGML standard effort.

Associating DTDs with documents

A Document Type Declaration associates a DTD with an XML document. Document Type Declarations appear in the syntactic fragment doctypedecl near the start of an XML document. The declaration establishes that the document is an instance of the type defined by the referenced DTD.

The declarations of a DTD are divided into two parts:
  • an optional external subset
  • an optional internal subset


The declarations in the internal subset form part of the Document Type Declaration in the document itself. The declarations in the external subset are located in a separate text file. The external subset may be referenced via a public identifier and/or a system identifier. Programs for reading documents may not be required to read the external subset.

Note that any valid SGML or XML document that references an external subset in its DTD, or whose body contains references to parsed external entities declared in its DTD (including those declared within its internal subset), can only be partially parsed, and cannot be fully validated, by validating SGML or XML parsers in their standalone mode: in that mode these parsers do not attempt to retrieve the external entities, so their replacement text is not accessible.

However, such documents are still fully parsable in the non-standalone mode of validating parsers, which signal an error if these external entities cannot be located from their specified public identifier (FPI) and/or system identifier (a URI), or are inaccessible. (Notations declared in the DTD also reference external entities, but these unparsed entities are not needed for the validation of documents in the standalone mode of these parsers: the validation of all external entities referenced by notations is left to the application using the SGML or XML parser.) Non-validating parsers may optionally attempt to locate these external entities in the non-standalone mode (by partially interpreting the DTD only to resolve their declared parsable entities), but do not validate the content model of these documents.

Examples

The following example of a document type declaration contains both public and system identifiers:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">


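All HTML 4.01 documents conform to one of three SGML DTDs. The public identifiers of these DTDs are constant and are as follows:
  • -//W3C//DTD HTML 4.01//EN
  • -//W3C//DTD HTML 4.01 Transitional//EN
  • -//W3C//DTD HTML 4.01 Frameset//EN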

The system identifiers of these DTDs, if present in the Document Type Declaration, will be URI references. System identifiers can vary, but usually point to a specific set of declarations in a resolvable location. SGML allows for public identifiers to be mapped to system identifiers in catalogs that are optionally made available to the URI resolvers used by document parsing software.
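As an illustration of such a mapping, the sketch below uses the OASIS XML Catalogs format, which is only one possible catalog syntax. The local path dtd/xhtml1-transitional.dtd is a hypothetical location chosen for the example, not one prescribed by any standard.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical catalog file: maps identifiers to an assumed local copy of the DTD -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- resolve the public identifier to the local copy -->
  <public publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
          uri="dtd/xhtml1-transitional.dtd"/>
  <!-- resolve the system identifier to the same local copy -->
  <system systemId="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
          uri="dtd/xhtml1-transitional.dtd"/>
</catalog>
```

A resolver that has been given this catalog can then load the DTD without any network access; whether and how a catalog is consulted depends on the parsing software.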

Note that this document type declaration can only appear after the optional XML declaration, and before the document body, if the document syntax conforms to XML. This includes XHTML documents:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!-- the XHTML document body starts here -->
<html xmlns="http://www.w3.org/1999/xhtml">
...
</html>


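An additional internal subset can also be provided after the external subset:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" [
<!-- an internal subset can be embedded here -->
]>
<!-- the XHTML document body starts here -->
<html xmlns="http://www.w3.org/1999/xhtml">
...
</html>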


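Alternatively, only the internal subset may be provided:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html [
<!-- an internal subset can be embedded here -->
]>
<!-- the XHTML document body starts here -->
<html xmlns="http://www.w3.org/1999/xhtml">
...
</html>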


Finally, the document type declaration may include no subset at all; in that case, it just specifies that the document has a single top-level element (this is an implicit requirement for all valid XML and HTML documents, but not for document fragments or for all SGML documents, whose top-level elements may be different from the implied root element), and it indicates the type name of the root element:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>
<!-- the XHTML document body starts here -->
<html xmlns="http://www.w3.org/1999/xhtml">
...
</html>

Markup declarations

DTDs describe the structure of a class of documents via element and attribute-list declarations. Element declarations name the allowable set of elements within the document, and specify whether and how declared elements and runs of character data may be contained within each element. Attribute-list declarations name the allowable set of attributes for each declared element, including the type of each attribute value, if not an explicit set of valid values.

DTD markup declarations declare which element types, attribute lists, entities and notations are allowed in the structure of the corresponding class of XML documents.

Element type declarations

An element type declaration defines an element and its possible content. A valid XML document contains only elements that are defined in the DTD.

Various keywords and characters specify an element’s content; they can be either:
  • EMPTY for specifying that the defined element allows no content, i.e., it can't have any child elements, not even text elements (for validity in XML, not even whitespace is allowed in the content);
  • ANY for specifying that the defined element allows any content, without restriction, i.e., that it may have any number (including none) and type of children elements (including text elements);
  • or an expression, specifying the only elements allowed as direct children in the content of the defined element; this content can be either:
    • a mixed content, which means that the content may include at least one text element and zero or more named elements, but their order and number of occurrences can't be restricted; this can be:
      • ( #PCDATA ): historically meaning parsed character data, this means that only text is allowed in the content (in XML, the only quantifier that may follow this form is an optional "*");
      • ( #PCDATA | element name | ... )*: a limited choice (in an exclusive list between parentheses and separated by "|" pipe characters and terminated by the required "*" quantifier) of two or more child elements (including only text elements or the specified named elements) may be used in any order and number of occurrences in the content.
    • an element content, which means that there must be no text elements among the children of the content (all whitespace encoded between child elements is then ignored, just like comments). Such element content is specified as a content particle, in a variant of Backus-Naur Form without terminal symbols and with element names as non-terminal symbols. Element content consists of:
      • a content particle can be either the name of an element declared in the DTD, or a sequence list or choice list. It may be followed by an optional quantifier.
        • a sequence list means an ordered list (specified between parentheses and separated by a "," comma character) of one or more content particles : all the content particles must appear successively as direct children in the content of the defined element, at the specified position and relative order;
        • a choice list means a mutually exclusive list (specified between parentheses and separated by a "|" pipe character) of two or more content particles : only one of these content particles may appear in the content of the defined element at the same position.
      • A quantifier is a single character that immediately follows the specified item to which it applies, to restrict the number of successive occurrences of this item at the specified position in the content of the element; it may be either:
        • + for specifying that there must be one or more occurrences of the item; the effective content of each occurrence may be different;
        • * for specifying that any number (zero or more) of occurrences is allowed; the item is optional and the effective content of each occurrence may be different;
        • ? for specifying that there must not be more than one occurrence — the item is optional;
        • If there is no quantifier, the specified item must occur exactly one time at the specified position in the content of the element.


For example:
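The declarations below are a minimal sketch of the keywords and expressions described above. The element names (article, title, abstract, section, em, figure) are hypothetical and chosen only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article [
  <!-- element content: a sequence list with an optional item and a repeated choice list -->
  <!ELEMENT article (title, abstract?, (section | figure)+)>
  <!-- text only -->
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT abstract (#PCDATA)>
  <!-- mixed content: text and "em" elements, in any order and number -->
  <!ELEMENT section (#PCDATA | em)*>
  <!ELEMENT em (#PCDATA)>
  <!-- no content at all -->
  <!ELEMENT figure EMPTY>
]>
<article>
  <title>Example</title>
  <section>Some <em>emphasized</em> text.</section>
  <figure/>
</article>
```

The instance is valid against these declarations: the optional abstract is omitted, and the repeated choice is satisfied by one section followed by one figure.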





Note that element type declarations are ignored by non-validating SGML and XML parsers (in which case any elements will be accepted in any order and any number of occurrences in the parsed document), but these declarations are still checked for well-formedness and validity.

Attribute list declarations

An attribute list specifies, for a given element type, the list of all possible attributes associated with that type. For each possible attribute, it contains:
  • the declared name of the attribute,
  • its data type (or an enumeration of its possible values),
  • and its default value.


For example:


<!ATTLIST img
   src    CDATA       #REQUIRED
   id     ID          #IMPLIED
   sort   CDATA       #FIXED "true"
   print  (yes | no)  "yes"
>


Here are some attribute types supported by both SGML and XML:
  • CDATA: this type means character data and indicates that the effective value of the attribute can be any textual value, unless the attribute is specified as fixed (the comments in the DTD may further document which values are effectively accepted, but the DTD syntax does not allow such precise specification);
  • ID: the effective value of the attribute must be a valid identifier, and it is used to define and anchor to the current element the target of references using this defined identifier (including as document fragment identifiers that may be specified at the end of a URI after a "#" sign); it is an error if distinct elements in the same document define the same identifier; the uniqueness constraint also implies that the identifier itself carries no other semantics and that identifiers must be treated as opaque in applications; note that the separate xml:id W3C Recommendation gives the attribute "xml:id" this type, without needing any declaration in the DTD, so for processors that support it the uniqueness constraint also applies to these identifiers when they are specified anywhere in an XML document;
  • IDREF or IDREFS: the effective value of the attribute can only be a valid identifier (or a space-separated list of such identifiers) and must reference the unique element defined in the document with an attribute declared with the type ID in the DTD (or the unique element defined in an XML document with an "xml:id" attribute) and whose effective value is the same identifier;
  • NMTOKEN or NMTOKENS: the effective value of the attribute can only be a valid name token (or a space-separated list of such name tokens), but it is not restricted to be a unique identifier within the document; this name may carry supplementary and application-dependent semantics and may require additional naming constraints, but this is out of scope of the DTD;
  • ENTITY or ENTITIES: the effective value of the attribute can only be the name of an unparsed external entity (or a space-separated list of such names), which must also be declared in the document type declaration; this type is not supported in HTML parsers, but is valid in SGML and XML 1.0 or 1.1 (including XHTML and SVG);
  • (value1|...): the effective value of the attribute can only be one of the enumerated list (specified between parentheses and separated by a "|" pipe character) of textual values (in XML, each enumerated value must be a name token and is written without quotation marks);
  • NOTATION (notation1|...): the effective value of the attribute can only be any one of the enumerated list (specified between parentheses and separated by a "|" pipe character) of notation names, where each notation name in the enumeration must also be declared in the document type declaration; this type is not supported in HTML parsers, but is valid in SGML and XML 1.0 or 1.1 (including XHTML and SVG).

A default value can define whether an attribute must occur (#REQUIRED) or not (#IMPLIED), or whether it has a fixed value (#FIXED), or which value should be used as a default value ("…") in case the given attribute is left out in an XML tag.

Note that attribute list declarations are ignored by non-validating SGML and XML parsers (in which case any attribute will be accepted within all elements of the parsed document), but these declarations are still checked for well-formedness and validity.
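To make the four kinds of default value concrete, the sketch below places the img attribute declarations in the internal subset of a small document. The root element gallery and the file names a.png and b.png are hypothetical, introduced only for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE gallery [
  <!ELEMENT gallery (img)+>
  <!ELEMENT img EMPTY>
  <!ATTLIST img
    src   CDATA      #REQUIRED
    id    ID         #IMPLIED
    sort  CDATA      #FIXED "true"
    print (yes | no) "yes">
]>
<gallery>
  <!-- valid: src is given; sort="true" and print="yes" are supplied from the declarations -->
  <img src="a.png"/>
  <!-- valid: id is optional, print overrides its default, sort restates the fixed value -->
  <img src="b.png" id="second" sort="true" print="no"/>
</gallery>
```

An img element without src, or with sort set to anything other than "true", would be reported as invalid by a validating parser.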

Entity declarations

An entity is similar to a macro. The entity declaration assigns it a value which is retained throughout the document. A common use is to have a name more recognizable than a numeric character reference for an unfamiliar character. Entities help to improve legibility of an XML text. In general, there are two types: internal and external.
  • Internal (parsed) entities associate a name with arbitrary textual content defined in their declaration (which may be in the internal subset or in the external subset of the DTD declared in the document). When a named entity reference is then encountered in the rest of the document (including in the rest of the DTD), and if this entity name has effectively been defined as a parsed entity, the reference itself is replaced immediately by the textual content defined in the parsed entity, and the parsing continues within this replacement text.
    • Predefined named character entities are similar to internal entities: 5 of them however are treated specially in all SGML, HTML and XML parsers. These entities are a bit different from normal parsed entities because when a named character entity reference is encountered in the document, the reference is also replaced immediately by the character content defined in the entity, but the parsing continues after the replacement text which is immediately inserted literally in the currently parsed token (if such character is permitted in the textual value of that token). This allows some characters that are needed for the core syntax of HTML or XML themselves to be escaped from their special syntactic role (notably "&" which is reserved for beginning entity references, "<" or ">" which are reserved to delimit the markup tags, and "double" or 'single' quotation marks which are reserved to delimit the values of attributes and entity definitions). Predefined character entities also include numeric character references that are handled the same way and can also be used to escape the characters they represent, or to bypass limitations in the character repertoire supported by the document encoding.
    • In basic profiles for SGML or in HTML documents, the declaration of internal entities is not possible (because external DTD subsets are not retrieved, and internal DTD subsets are not supported in these basic profiles).
    • Instead, HTML standards are predefining a large set of up to several hundreds of named character entities, but they can still be handled as standard parsed entities, defined in the DTD used by the parser.
  • External entities refer to external storage objects. They are just declared by a unique name in the document, and defined with a public identifier (an FPI) and/or a system identifier (interpreted as a URI) specifying where the source of their content is located. They exist in fact in two variants:
    • parsed external entities (most often defined with a SYSTEM identifier indicating the URI of their content) that are not associated in their definition with a named notation, in which case validating XML or SGML parsers will retrieve their contents and parse them as if they were declared as internal entities (the external entity containing their effective replacement text);
    • unparsed external entities that are defined and associated with a notation name, in which case they will be treated as opaque references and signaled as such to the application using the SGML or XML parser: their interpretation, retrieval and parsing is left to the application, according to the types of notations it supports (see the next section about notations for examples of unparsed external entities).
    • External entities are not supported in basic profiles for SGML or in HTML documents, but are valid in full implementations of SGML and in XML 1.0 or 1.1 (including XHTML and SVG, even if they are not strictly needed in those document types).


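An example of internal entity declarations (here in an internal DTD subset of an SGML document) is:

<!DOCTYPE sgml [
  <!ELEMENT sgml ANY>
  <!ENTITY % std       "standard SGML">
  <!ENTITY   signature " &#x2014; &author;.">
  <!ENTITY   question  "Why couldn&#x2019;t I publish directly my books in %std;?">
  <!ENTITY   author    "William Shakespeare">
]>
<sgml>&question;&signature;</sgml>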


Note that internal entities may be defined in any order, as long as they are not referenced and parsed, in the DTD or in the body of the document, before they are defined: it is valid to include a reference to a still undefined entity within the content of a parsed entity, but it is invalid to include anywhere else any named entity reference before this entity has been fully defined, including all other internal entities referenced in its defined content (this also prevents circular or recursive definitions of internal entities). This document is parsed as if it was:

<!DOCTYPE sgml [
  <!ELEMENT sgml ANY>
  <!ENTITY % std       "standard SGML">
  <!ENTITY   signature " &#x2014; &author;.">
  <!ENTITY   question  "Why couldn&#x2019;t I publish directly my books in standard SGML?">
  <!ENTITY   author    "William Shakespeare">
]>
<sgml>Why couldn’t I publish directly my books in standard SGML? — William Shakespeare.</sgml>


Note that the reference to the "author" internal entity is not substituted in the replacement text of the "signature" internal entity. Instead, it is replaced only when the "signature" entity reference is parsed within the content of the "sgml" element. In XML, non-validating parsers must also perform this substitution for entities declared in the internal subset; they may only leave unresolved the references to entities declared in external entities that they do not read.

This is possible because the replacement text specified in the internal entity definitions permit a distinction between parameter entity references (that are introduced by the "%" character and whose replacement applies to the parsed DTD contents) and general entity references (that are introduced by the "&" character and whose replacement is delayed until they are effectively parsed and validated). The "%" character for introducing parameter entity references in the DTD loses its special role outside of the DTD and it becomes a literal character.

However, numeric character references are substituted wherever they occur, without needing a validating parser (they are introduced by the "&#" characters).
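In XML, parameter-entity references may not occur inside markup declarations in the internal subset, so a document relying on "%" references within entity values there would not be well-formed. The sketch below is an assumed XML rendering of the same quotation that uses general entities only; the entity name std is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE sgml [
  <!ELEMENT sgml ANY>
  <!-- general entity references inside an entity value are kept as-is until the entity is used -->
  <!ENTITY std       "standard SGML">
  <!ENTITY signature " &#x2014; &author;.">
  <!ENTITY question  "Why couldn&#x2019;t I publish directly my books in &std;?">
  <!-- "author" may be declared after "signature": it is only needed when &signature; is expanded -->
  <!ENTITY author    "William Shakespeare">
]>
<sgml>&question;&signature;</sgml>
```

Here the two numeric character references are resolved when the declarations are read, while &std; and &author; are resolved only when &question; and &signature; are expanded in the document body.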

Notation declarations

Notations are used in SGML or XML. They provide a complete reference to unparsed external entities whose interpretation is left to the application (which will interpret them directly or will retrieve the external entity itself), by assigning them a simple name which is usable in the body of the document. For example, notations may be used to reference non-XML data in an XML 1.1 document, or to annotate SVG images in order to associate them with a specific renderer:
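A declaration of this kind might look as follows. This is a sketch only: the system identifier is assumed here to be the registered MIME type image/svg+xml, which may differ from the string used in the article's original example.

```xml
<!-- Assumed system identifier: a MIME type, left to the application to interpret -->
<!NOTATION type-image-svg SYSTEM "image/svg+xml">
```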





This declares the MIME type of external images with this type, and associates it with the notation name "type-image-svg". However, notation names usually follow a naming convention which is specific to the application generating or using the notation: notations are interpreted as additional meta-data whose effective content is an external entity and either a PUBLIC FPI, registered in the catalogs used by XML or SGML parsers, or a SYSTEM URI, whose interpretation is application dependent (here a MIME type, interpreted as a relative URI, but it could be an absolute URI to a specific renderer, or a URN indicating an OS-specific object identifier such as a UUID).

The declared notation name must be unique within the whole document type declaration, i.e. in the external subset as well as the internal subset, at least for conformance with XML.

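Notations can be associated with unparsed external entities included in the body of the SGML or XML document. The PUBLIC or SYSTEM parameter of these external entities specifies the FPI and/or the URI where the unparsed data of the external entity is located, and the additional NDATA parameter of these defined entities specifies the additional notation (i.e., effectively the MIME type here). For example:

<!DOCTYPE sgml [
  <!ELEMENT sgml (img)*>
  <!ELEMENT img EMPTY>
  <!ATTLIST img
    data ENTITY #IMPLIED>

  <!ENTITY   example1SVG    SYSTEM "example1.svg" NDATA type-image-svg>
  <!NOTATION type-image-svg SYSTEM "image/svg+xml">
]>

<sgml>
  <img data="example1SVG" />
</sgml>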

Within the body of the SGML document, these referenced external entities are not replaced like usual named entities (defined with a CDATA value), but are left as distinct unparsed tokens that may be used either as the value of an element attribute (like above) or within the element contents, provided that either the DTD allows such external entities in the declared content type of elements or in the declared type of attributes (here the ENTITY type for the data attribute), or the SGML parser is not validating the content. In XML, an unparsed entity can only be named in the value of an attribute declared with the ENTITY or ENTITIES type; an entity reference written between "&" and ";" must not name an unparsed entity.

Notations may also be associated directly to elements as additional meta-data, without associating them to another external entity, by giving their names as possible values of some additional attributes (also declared in the DTD within the declaration of the element). For example:




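<!DOCTYPE sgml [
  <!ELEMENT sgml (img)*>
  <!ATTLIST sgml
    type  NOTATION (
      type-vendor-specific ) #IMPLIED>

  <!ELEMENT img ANY>
  <!ATTLIST img
    title CDATA #IMPLIED
    data  ENTITY #IMPLIED
    type  NOTATION (
      type-image-svg |
      type-image-gif ) #IMPLIED>

  <!NOTATION type-image-svg       PUBLIC "-//W3C//DTD SVG 1.1//EN"
    "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
  <!NOTATION type-image-gif       PUBLIC "image/gif">
  <!NOTATION type-vendor-specific PUBLIC "application/VND.specific+sgml">

  <!ENTITY example1SVGTitle "Title of example1.svg">
  <!ENTITY example1SVG      SYSTEM "example1.svg">
  <!ENTITY example1GIFTitle "Title of example1.gif">
  <!ENTITY example1GIF      SYSTEM "example1.gif" NDATA type-image-gif>
]>
<sgml type="type-vendor-specific">
  <img title="&example1SVGTitle;" type="type-image-svg">
    &example1SVG;
  </img>
  <img title="&example1GIFTitle;" data="example1GIF" />
</sgml>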

The example above shows a notation named "type-image-svg" that references the standard public FPI and the system identifier (the standard URI) of an SVG 1.1 document, instead of specifying just a system identifier as in the first example (which was a relative URI interpreted locally as a MIME type). This notation is referenced directly within the unparsed "type" attribute of the "img" element, but its content is not retrieved. The example also declares another notation for a vendor-specific application, to annotate the "sgml" root element in the document. In both cases, the declared notation name is used directly in a declared "type" attribute, whose content is specified in the DTD with the "NOTATION" attribute type (this "type" attribute is declared for the "sgml" element, as well as for the "img" element).

However, the "title" attribute of the "img" element specifies the internal entity "example1SVGTitle", whose declaration does not define a notation, so it will be parsed by validating parsers and the entity replacement text will be "Title of example1.svg".

The content of the "img" element references another external entity, "example1SVG", whose declaration also does not define a notation, so it will also be parsed by validating parsers, and the entity replacement text will be located by its defined SYSTEM identifier "example1.svg" (also interpreted as a relative URI). The effective content of the "img" element will be the content of this second external resource. The difference with the GIF image is that the SVG image will be parsed within the SGML document, according to the declarations in the DTD, whereas the GIF image is just referenced as an opaque external object (which is not parsable with SGML) via its "data" attribute (whose value type is an opaque ENTITY).

Only one notation name may be specified in the value of ENTITY attributes (there's no support in SGML, XML 1.0 or XML 1.1 for multiple notation names in the same declared external ENTITY, so separate attributes will be needed). However, multiple external entities may be referenced (in a space-separated list of names) in attributes declared with type ENTITIES, where each named external entity is also declared with its own notation.
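A minimal sketch of an ENTITIES attribute follows. The element name gallery, the attribute name images, and the entity and file names pic1, pic2, pic1.gif and pic2.gif are hypothetical, and the system identifier of the notation is assumed to be a MIME type.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE gallery [
  <!ELEMENT gallery EMPTY>
  <!-- the attribute value is a space-separated list of unparsed entity names -->
  <!ATTLIST gallery images ENTITIES #IMPLIED>
  <!NOTATION type-image-gif SYSTEM "image/gif">
  <!-- each unparsed external entity names exactly one notation after NDATA -->
  <!ENTITY pic1 SYSTEM "pic1.gif" NDATA type-image-gif>
  <!ENTITY pic2 SYSTEM "pic2.gif" NDATA type-image-gif>
]>
<gallery images="pic1 pic2"/>
```

The parser does not retrieve pic1.gif or pic2.gif; it only reports the entity names, their identifiers and their notation to the application.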

Notations are also completely opaque for XML and SGML parsers, so they are not differentiated by the type of the external entity that they may reference (for these parsers they just have a unique name associated to a public identifier (an FPI) and/or a system identifier (a URI)).

Some applications (but not XML or SGML parsers themselves) also allow referencing notations indirectly by naming them in the "URN:name" value of a standard CDATA attribute, everywhere a URI can be specified. However this behaviour is application-specific, and requires that the application maintains a catalog of known URNs to resolve them into the notations that have been parsed in a standard SGML or XML parser. This use allows notations to be defined only in a DTD stored as an external entity and referenced only as the external subset of documents, and allows these documents to remain compatible with validating XML or SGML parsers that have no direct support for notations.

Notations are not used in HTML, or in basic profiles for XHTML and SVG, because:
  • all the external entities used by these standard document types are referenced by simple attributes, declared with the CDATA type in their standard DTD (such as the "href" attribute of an anchor "a" element, or the "src" attribute of an image "img" element, whose values are interpreted as a URI, without needing any catalog of public identifiers, i.e., known FPI);
  • all the external entities for additional meta-data are referenced:
    • either by additional attributes (such as the "type" attribute which indicates the MIME type of the external entity, or the "charset" attribute which indicates its encoding),
    • or by additional elements (such as "link" or "meta" in HTML and XHTML) within their own attributes,
    • or by standard pseudo-attributes in XML and XHTML (such as "xml:lang", or "xmlns" and "xmlns:*" for namespace declarations).


Note also that even in validating SGML or XML 1.0 or XML 1.1 parsers, the external entities referenced by an FPI and/or URI in declared notations are not retrieved automatically by the parsers themselves. Instead, these parsers just provide to the application the parsed FPI and/or URI associated with the notations found in the parsed SGML or XML document, together with a facility for a dictionary containing all notation names declared in the DTD; these validating parsers will also check the uniqueness of notation name declarations, and will report a validation error if some notation names are used anywhere in the DTD or in the document body but not declared:
  • if the application can't use any notation (or if their FPI and/or URI are unknown or not supported in their local catalog), these notations may be either ignored silently by the application or the application could signal an error;
  • otherwise the applications will decide themselves how to interpret them, and whether the external entities must be retrieved and then parsed separately;
  • applications may then signal an error if such interpretation, retrieval or separate parsing ever fails.
  • Unrecognized notations that may cause an application to signal an error should not be blocking the interpretation of the validated document using them.

XML DTDs and schema validation

The XML DTD syntax is one of several XML schema languages. However, many of the other schema languages do not fully replace the XML DTD. Notably, the XML DTD allows defining entities and notations that have no direct equivalents in DTD-less XML (because internal entities and parsable external entities are not part of XML schema languages, and because other unparsed external entities and notations have no simple equivalent mappings in most XML schema languages).

Most XML schema languages are only replacements for element declarations and attribute list declarations, in such a way that it becomes possible to parse XML documents with non-validating XML parsers (if the only purpose of the external DTD subset was to define the schema). In addition, documents for these XML schema languages have to be parsed separately, so validating the schema of XML documents in pure standalone mode is not really possible with these languages: the document type declaration remains necessary for at least identifying (with an XML Catalog) the schema used in the parsed XML document and that will be validated in another language.

A common misconception holds that non-validating XML parsers do not have to read document type declarations. In fact, such a parser must still scan the document type declaration for correct syntax and well-formed declarations; it must also parse all entity declarations in the internal subset, and substitute the replacement texts of internal entities occurring anywhere in the document type declaration or in the document body.

A non-validating parser may, however, elect not to read parsable external entities (including the external subset), and does not have to honor the content model restrictions defined in element declarations and in attribute list declarations.
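
A minimal sketch of both points, using a hypothetical note element and author entity that are not part of the other examples in this article. Because the entity is declared in the internal subset, even a non-validating parser is expected to report the content of note as "Written by Fred Bloggs.":

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE note [
  <!-- Internal entity: its replacement text must be substituted
       by validating and non-validating parsers alike. -->
  <!ENTITY author "Fred Bloggs">
]>
<note>Written by &author;.</note>
```

The document is well-formed but not valid, since note has no element declaration. A validating parser would report this, whereas a non-validating parser is not required to.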

If the XML document depends on parsable external entities (including the specified external subset, or parsable external entities declared in the internal subset), it should assert standalone="no" in its XML declaration. The validating DTD may be identified by means of XML Catalogs, in order to retrieve its specified external subset.

In the example below, the XML document is declared with standalone="no" because it has an external subset in its document type declaration:
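
A minimal document of this kind might look as follows, assuming the people_list root element and the example.dtd file used in the example later in this article:

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- The SYSTEM identifier points to the external subset. -->
<!DOCTYPE people_list SYSTEM "example.dtd">
<people_list/>
```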







If the XML document type declaration includes any SYSTEM identifier for the external subset, it cannot be safely processed as standalone: the URI should be retrieved, since otherwise there may be unknown named character entities whose definitions are needed to correctly parse the effective XML syntax in the internal subset or in the document body (XML syntax parsing is normally performed after the substitution of all named entities, excluding the five entities that are predefined in XML and that are implicitly substituted after parsing the XML document into lexical tokens). If it includes only a PUBLIC identifier, it may be processed as standalone, provided that the XML processor knows this PUBLIC identifier in its local catalog, from which it can retrieve an associated DTD entity.
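
As a sketch of the catalog lookup mentioned above, an entry in an OASIS XML Catalog can map a public identifier to a local copy of a DTD. The public identifier and the local path used here are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Resolves a document type declaration such as:
     <!DOCTYPE people_list PUBLIC "-//Example//DTD People List//EN" "example.dtd">
-->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Map the public identifier to a DTD stored next to the catalog. -->
  <public publicId="-//Example//DTD People List//EN"
          uri="dtd/example.dtd"/>
</catalog>
```

A processor configured with such a catalog could then load the DTD from the local path instead of retrieving it over the network.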

XML DTD schema example

An example of a very simple external XML DTD to describe the schema of a list of persons might consist of:
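
A DTD consistent with the line-by-line description in this section can be written as six element declarations, one per numbered item. The file name example.dtd is the one assumed elsewhere in this article:

```xml
<!ELEMENT people_list (person*)>
<!ELEMENT person (name, birthdate?, gender?, socialsecuritynumber?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT birthdate (#PCDATA)>
<!ELEMENT gender (#PCDATA)>
<!ELEMENT socialsecuritynumber (#PCDATA)>
```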










Taking this line by line:
  1. people_list is a valid element name, and an instance of such an element contains any number of person elements. The * denotes there can be 0 or more person elements within the people_list element.
  2. person is a valid element name, and an instance of such an element contains one element named name, followed by one named birthdate (optional), then gender (also optional) and socialsecuritynumber (also optional). The ? indicates that an element is optional. The reference to the name element has no ?, so a person element must contain a name element.
  3. name is a valid element name, and an instance of such an element contains "parsed character data" (#PCDATA).
  4. birthdate is a valid element name, and an instance of such an element contains parsed character data.
  5. gender is a valid element name, and an instance of such an element contains parsed character data.
  6. socialsecuritynumber is a valid element name, and an instance of such an element contains parsed character data.


An example of an XML file that uses and conforms to this DTD follows. The DTD is referenced here as an external subset, via the SYSTEM specifier and a URI. It assumes that the DTD can be identified with the relative URI reference "example.dtd"; the "people_list" after "!DOCTYPE" indicates that the root element of the document is called "people_list":

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <!DOCTYPE people_list SYSTEM "example.dtd">
    <people_list>
      <person>
        <name>Fred Bloggs</name>
        <birthdate>2008-11-27</birthdate>
        <gender>Male</gender>
      </person>
    </people_list>

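The same DTD can also be embedded directly in the XML document itself as an internal subset, by enclosing it within [square brackets] in the document type declaration. In that case the document no longer depends on external entities and can be processed as standalone, like this:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <!DOCTYPE people_list [
      <!ELEMENT people_list (person*)>
      <!ELEMENT person (name, birthdate?, gender?, socialsecuritynumber?)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT birthdate (#PCDATA)>
      <!ELEMENT gender (#PCDATA)>
      <!ELEMENT socialsecuritynumber (#PCDATA)>
    ]>
    <people_list>
      <person>
        <name>Fred Bloggs</name>
        <birthdate>2008-11-27</birthdate>
        <gender>Male</gender>
      </person>
    </people_list>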

One can render this in an XML-enabled browser (such as Internet Explorer or Mozilla Firefox) by saving the DTD component above to a text file named example.dtd and the XML file to a differently named text file in the same directory, and then opening the XML file with the browser. However, many browsers do not check that an XML document conforms to the rules in the DTD; they are only required to check that the DTD is syntactically correct. For security reasons, they may also choose not to read the external DTD.

Alternatives to DTDs (for specifying schemas) are available:
  • XML Schema, also referred to as XML Schema Definition (XSD), has achieved Recommendation status within the W3C, and is popular for "data oriented" (that is, transactional non-publishing) XML use because of its stronger typing and easier round-tripping to Java declarations. Most of the publishing world has found that the added complexity of XSD would not bring them any particular benefits, so DTDs are still far more popular there. An XML Schema Definition is itself an XML document while a DTD is not.
  • RELAX NG, which is also a part of DSDL, is an ISO international standard. It is more expressive than XSD, while providing a simpler syntax, but commercial software support has been slow in coming.
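
As a rough sketch of the contrast with XSD, the people_list structure described in this article could be expressed along the following lines. The use of xs:date for birthdate is an assumption made here to illustrate stronger typing; the DTD itself only requires parsed character data:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="people_list">
    <xs:complexType>
      <xs:sequence>
        <!-- Equivalent of (person*): zero or more person elements. -->
        <xs:element name="person" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <!-- Equivalent of (name, birthdate?, gender?, socialsecuritynumber?). -->
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="birthdate" type="xs:date" minOccurs="0"/>
              <xs:element name="gender" type="xs:string" minOccurs="0"/>
              <xs:element name="socialsecuritynumber" type="xs:string" minOccurs="0"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Unlike a DTD, this schema is itself a well-formed XML document, and it can reject a birthdate that is not a calendar date.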

See also

  • Document Type Declaration
  • Semantic Web
  • XML Schema Language Comparison - Comparison to other XML Schema languages.
  • XML Schema (W3C)

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 