XML schema
An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by XML itself. These constraints are generally expressed using some combination of grammatical rules governing the order of elements, Boolean predicates that the content must satisfy, data types governing the content of elements and attributes, and more specialized rules such as uniqueness and referential integrity constraints.
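As an illustration of the more specialized rules mentioned above, the following minimal Python sketch checks uniqueness and referential integrity by hand. It is not written in any schema language; the function name `check_ids` and the attribute names `id` and `ref` are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET


def check_ids(xml_text, id_attr="id", ref_attr="ref"):
    """Report duplicate identifiers and references to missing identifiers.

    The attribute names are hypothetical; a real schema would declare
    which attributes act as keys and which act as references.
    """
    root = ET.fromstring(xml_text)
    seen = set()
    problems = []

    # Uniqueness: every identifier value may appear only once.
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is None:
            continue
        if value in seen:
            problems.append(f"duplicate {id_attr} {value!r}")
        seen.add(value)

    # Referential integrity: every reference must name an existing identifier.
    for elem in root.iter():
        target = elem.get(ref_attr)
        if target is not None and target not in seen:
            problems.append(f"dangling {ref_attr} {target!r}")

    return problems
```

Schema languages that support such rules let the same constraints be declared rather than programmed.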
There are languages developed specifically to express XML schemas. The Document Type Definition (DTD) language, which is native to the XML specification, is a schema language of relatively limited capability, but it also has other uses in XML aside from the expression of schemas. Two more expressive XML schema languages in widespread use are XML Schema (with a capital S) and RELAX NG.
The mechanism for associating an XML document with a schema varies according to the schema language. The association may be achieved via markup within the XML document itself, or via some external means.
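Association via markup can take several forms. Common conventions include a DOCTYPE declaration for a DTD, the `xsi:schemaLocation` and `xsi:noNamespaceSchemaLocation` attributes for W3C XML Schema, and the `xml-model` processing instruction. The sketch below, which uses only Python's standard `xml.parsers.expat` module, reports which of these hints a document carries. The function name `find_schema_hints` is an assumption for the example, and the sketch only detects the hints; it does not fetch or apply any schema.

```python
import xml.parsers.expat

# Namespace of the XML Schema instance attributes (xsi:...).
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def find_schema_hints(xml_text):
    """Return (kind, value) pairs for schema associations found in the markup."""
    hints = []
    seen_root = False

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # <!DOCTYPE root SYSTEM "..."> points at a DTD.
        hints.append(("DTD", system_id))

    def on_pi(target, data):
        # <?xml-model href="..."?> can point at a schema in another language.
        if target == "xml-model":
            hints.append(("xml-model", data))

    def on_start(name, attrs):
        nonlocal seen_root
        if seen_root:
            return  # only the root element is inspected in this sketch
        seen_root = True
        for key in ("schemaLocation", "noNamespaceSchemaLocation"):
            # With namespace processing on, names are "namespace-URI local-name".
            value = attrs.get(XSI_NS + " " + key)
            if value is not None:
                hints.append(("XML Schema", value))

    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    parser.StartDoctypeDeclHandler = on_doctype
    parser.ProcessingInstructionHandler = on_pi
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return hints
```

Association by external means, such as passing a schema file to a validator on the command line, leaves no trace in the document, so a function like this would return an empty list for it.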
Metamodel
The leader of the original XML team has acknowledged that the team did not begin with a data model: "In the interests of time, XML 1.0 did not define its own data model." The team thus developed a specification for data structures without first defining a data model of its own. A number of people cooperated by email over a short period to create the original specification. The XSD specification that was eventually produced has been widely criticised as "unreadable", with one commentator going so far as to say: "it is one of the most heavily criticised specifications to come out of the organisation." User surveys also highlight the verbose, complex and difficult language used. One user notes: "One reason the spec is so unreadable is because it exposes the abstract model continuously." Indeed, it is not possible to derive the allowed structure of information elements from this abstract model alone without reading the text of the specification, which, as has been noted, is rather difficult.
The accompanying diagram is an Entity-Relationship (ER) metamodel, drawn in UML format, of the information elements of XSD. This model does not describe the components of the abstract data model; it addresses only the actual information elements themselves. The complexity of the model reflects the complexity of the specification itself.
Capitalization
There is some confusion as to when to use the capitalized spelling "Schema" and when to use the lowercase spelling. The lowercase form is a generic term that may refer to any type of schema, including DTD, XML Schema (also known as XSD), RELAX NG, or others, and it should always be written in lowercase except at the start of a sentence. The capitalized form "Schema", as commonly used in the XML community, always refers to W3C XML Schema.
Validation
The process of checking whether an XML document conforms to a schema is called validation, which is separate from XML's core concept of syntactic well-formedness. All XML documents must be well-formed, but a document is not required to be valid unless the XML parser is "validating", in which case the document is also checked for conformance with its associated schema. DTD-validating parsers are most common, but some support W3C XML Schema or RELAX NG as well.
Documents are considered valid only if they satisfy the requirements of the schema with which they have been associated. These requirements typically include such constraints as:
- Elements and attributes that must or may be included, and their permitted structure
- The structure as specified by a regular expression syntax
- How character data is to be interpreted, e.g. as a number, a date, a URL, a Boolean, etc.
Validation of an instance document against a schema can be regarded as a conceptually separate operation from XML parsing. In practice, however, many schema validators are integrated with an XML parser.
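The separation between the two operations can be sketched in Python using only the standard library. Parsing establishes well-formedness; a second, independent step checks the parsed tree against constraints of the kinds listed above. The "schema" here is a hand-written toy, not an instance of any real schema language, and the element names (`order`, `quantity`, `shipDate`, `gift`) are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date


def is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def is_date(text):
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


def is_boolean(text):
    return text in ("true", "false", "1", "0")


# Toy schema: permitted child order as a regular expression over tag names,
# plus an interpretation (data type) for the character data of each child.
CONTENT_MODEL = re.compile(r"quantity,shipDate(,gift)?")
TYPES = {"quantity": is_number, "shipDate": is_date, "gift": is_boolean}


def check(xml_text):
    """Return (status, problems) for a hypothetical <order> document."""
    # Step 1: parsing. Failure here means the text is not XML at all.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return "not well-formed", [str(err)]

    # Step 2: validation of the already-parsed tree against the toy schema.
    problems = []
    children = ",".join(child.tag for child in root)
    if root.tag != "order" or not CONTENT_MODEL.fullmatch(children):
        problems.append(f"unexpected structure: {root.tag}({children})")
    for child in root:
        is_ok = TYPES.get(child.tag)
        if is_ok and not is_ok((child.text or "").strip()):
            problems.append(f"bad content in <{child.tag}>: {child.text!r}")
    return ("invalid" if problems else "valid"), problems
```

A document can therefore fail in two distinct ways: `<order><quantity>3</order>` is rejected by the parser, while a well-formed document whose `quantity` contains `three` passes parsing and is rejected only by the validation step.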
XML schema languages
- Document Content Description facility for XML, an RDF framework
- Document Definition Markup Language (DDML)
- Document Schema Definition Languages (DSDL)
- Document Structure Description (DSD)
- SGML's Document Type Definition (DTD)
- Namespace Routing Language (NRL)
- OASIS CAM Content Assembly Mechanism
- RELAX NG and its predecessors RELAX and TREX
- Schema for Object-Oriented XML (SOX)
- Schematron
- XML-Data Reduced (XDR)
- XML Schema (WXS or XSD)
See also
- Data structure
- Structuring information
- List of XML schemas
- XML Information Set
- XML Schema Language Comparison
- Schema (disambiguation) (for other uses of the term)