Java API for XML Processing
The Java API for XML Processing, or JAXP, is one of the Java XML programming APIs. It provides the capability of validating and parsing XML documents. The three basic parsing interfaces are:
- the Document Object Model parsing interface or DOM interface
- the Simple API for XML parsing interface or SAX interface
- the Streaming API for XML or StAX interface (part of JDK 6; separate jar available for JDK 5)
In addition to the parsing interfaces, the API provides an XSLT interface to provide data and structural transformations on an XML document. JAXP was developed under the Java Community Process as JSR 5 (JAXP 1.0) and JSR 63 (JAXP 1.1 and 1.2).
J2SE version | JAXP version bundled |
---|---|
1.4 | 1.1 |
1.5 | 1.3 |
1.6 | 1.4 |
JAXP version 1.4.4 was released on September 3, 2010. JAXP 1.3 was end-of-lifed on February 12, 2008.
DOM interface
The DOM interface is perhaps the easiest to understand. It parses an entire XML document and constructs a complete in-memory representation of the document using the classes modeling the concepts found in the Document Object Model (DOM) Level 2 Core Specification.
The DOM parser is called a DocumentBuilder, as it builds an in-memory Document representation. The javax.xml.parsers.DocumentBuilder is created by the javax.xml.parsers.DocumentBuilderFactory. The DocumentBuilder creates an org.w3c.dom.Document instance, which is a tree structure containing the nodes of the XML document. Each tree node in the structure implements the org.w3c.dom.Node interface. There are many different types of tree nodes, representing the type of data found in an XML document. The most important node types are:
- element nodes that may have attributes
- text nodes representing the text found between the start and end tags of a document element.
Refer to the Javadoc documentation of the Java package org.w3c.dom for a complete list of node types.
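The DOM workflow described in this section can be sketched as follows. This is a minimal illustration, not a reference implementation: the class name DomExample, the method titles, and the sample document with its title elements and lang attributes are all assumptions made up for the example. Only the javax.xml.parsers and org.w3c.dom calls are part of the standard API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical example class; not part of JAXP.
public class DomExample {

    /** Builds a DOM tree from the XML string and returns "lang:text" for each title element. */
    public static List<String> titles(String xml) throws Exception {
        // The factory creates the DocumentBuilder, which in turn builds the Document.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // The whole document is now in memory and can be queried freely.
        NodeList nodes = doc.getElementsByTagName("title");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                Element element = (Element) node;      // element node with attributes
                String lang = element.getAttribute("lang");
                String text = element.getTextContent(); // text node content
                result.add(lang + ":" + text);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Sample document invented for this sketch.
        String xml = "<books>"
                + "<book><title lang=\"en\">Dune</title></book>"
                + "<book><title lang=\"fr\">Candide</title></book>"
                + "</books>";
        System.out.println(titles(xml));
    }
}
```

Because the complete tree is held in memory, the application can revisit any node after parsing, at the cost of memory proportional to the document size.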
SAX interface
The SAX parser is called the SAXParser and is created by the javax.xml.parsers.SAXParserFactory. Unlike the DOM parser, the SAX parser does not create an in-memory representation of the XML document, so it typically runs faster and uses less memory. Instead, the SAX parser informs clients of the XML document structure by invoking callbacks, that is, by invoking methods on an org.xml.sax.helpers.DefaultHandler instance provided to the parser. This way of accessing a document is called Streaming XML.
The DefaultHandler class implements the ContentHandler, the ErrorHandler, the DTDHandler, and the EntityResolver interfaces. Most clients will be interested in methods defined in the ContentHandler interface that are called when the SAX parser encounters the corresponding elements in the XML document. The most important methods in this interface are:
- startDocument and endDocument methods that are called at the start and end of an XML document.
- startElement and endElement methods that are called at the start and end of a document element.
- characters method that is called with the text data contents contained between the start and end tags of an XML document element.
Clients provide a subclass of the DefaultHandler that overrides these methods and processes the data. This may involve storing the data into a database or writing it out to a stream.
During parsing, the parser may need to access external documents. It is possible to store a local cache for frequently-used documents using an XML Catalog.
This was introduced with Java 1.3 in May 2000.
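The callback style described in this section can be sketched as follows. The class names SaxExample and TitleHandler, the method titles, and the sample title elements are assumptions chosen for illustration. The sketch collects results in a list, where a real client might instead write to a database or a stream.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; not part of JAXP.
public class SaxExample {

    /** Handler that records the text content of every title element. */
    static class TitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private StringBuilder text; // non-null only while inside a title element

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("title".equals(qName)) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may deliver text in several chunks, so append rather than assign.
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(qName)) {
                titles.add(text.toString());
                text = null;
            }
        }
    }

    /** Parses the XML string with a SAX parser and returns the collected titles. */
    public static List<String> titles(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TitleHandler handler = new TitleHandler();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        // Sample document invented for this sketch.
        String xml = "<books><book><title>Dune</title></book>"
                + "<book><title>Candide</title></book></books>";
        System.out.println(titles(xml));
    }
}
```

Note that the handler itself has to remember whether it is currently inside a title element. This is the state-keeping burden that a push API places on the application.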
StAX interface
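A minimal sketch of the cursor-based pull model that this section describes, using the javax.xml.stream cursor API. The class name StaxExample, the method titles, and the sample title elements are assumptions made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; not part of JAXP.
public class StaxExample {

    /** Walks the document with a StAX cursor and returns the text of every title element. */
    public static List<String> titles(String xml) throws Exception {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> result = new ArrayList<>();
        try {
            // The application drives the loop: it asks for the next event when it is ready.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // Reads the element's text and leaves the cursor on its end tag.
                    result.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Sample document invented for this sketch.
        String xml = "<books><book><title>Dune</title></book>"
                + "<book><title>Candide</title></book></books>";
        System.out.println(titles(xml));
    }
}
```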
StAX was designed as a middle ground between the DOM and SAX interfaces. In its metaphor, the programmatic entry point is a cursor that represents a point within the document. The application moves the cursor forward, 'pulling' the information from the parser as it needs. This is different from an event-based API such as SAX, which 'pushes' data to the application, requiring the application to maintain state between events as necessary to keep track of location within the document.
XSLT interface
Extensible Stylesheet Language Transformations, or XSLT, allows for conversion of an XML document into other forms of data. JAXP provides interfaces in package javax.xml.transform allowing applications to invoke an XSLT transformation. This interface was originally called TrAX (Transformation API for XML), and was developed by an informal collaboration between the developers of a number of Java XSLT processors.
Main features of the interface are:
- a factory class allowing the application to select dynamically which XSLT processor it wishes to use
- methods on the factory class to create a Templates object, representing the compiled form of a stylesheet. This is a thread-safe object that can be used repeatedly, in series or in parallel, to apply the same stylesheet to multiple source documents (or to the same source document with different parameters)
- a method on the Templates object to create a Transformer, representing the executable form of a stylesheet. This cannot be shared across threads, though it is serially reusable. The Transformer provides methods to set stylesheet parameters and serialization options (for example, whether output should be indented), and a method to actually run the transformation.
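The factory, Templates, and Transformer steps listed above can be sketched as follows. The class name XsltExample, the methods compile and greet, the stylesheet, its greeting parameter, and the person/name input document are all assumptions invented for illustration. Only the javax.xml.transform calls belong to the standard API.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class; not part of JAXP.
public class XsltExample {

    // Stylesheet invented for this sketch: emits "<greeting>, <name>" as plain text.
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:param name=\"greeting\" select=\"'Hello'\"/>"
            + "<xsl:template match=\"/\">"
            + "<xsl:value-of select=\"$greeting\"/>"
            + "<xsl:text>, </xsl:text>"
            + "<xsl:value-of select=\"/person/name\"/>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    /** Compiles the stylesheet once; the resulting Templates object may be shared across threads. */
    public static Templates compile() throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        return factory.newTemplates(new StreamSource(new StringReader(STYLESHEET)));
    }

    /** Creates a fresh Transformer for this call, sets a parameter, and runs the transformation. */
    public static String greet(Templates templates, String xml, String greeting) throws Exception {
        Transformer transformer = templates.newTransformer(); // not shareable across threads
        transformer.setParameter("greeting", greeting);
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Templates templates = compile();
        // The same compiled stylesheet is applied twice with different parameters.
        System.out.println(greet(templates, "<person><name>Ada</name></person>", "Hi"));
        System.out.println(greet(templates, "<person><name>Ada</name></person>", "Welcome"));
    }
}
```

Compiling to a Templates object once and creating a Transformer per use is one way to reuse a stylesheet safely from several threads.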
Two abstract interfaces, Source and Result, are defined to represent the input and output of the transformation. This is a somewhat unconventional use of Java interfaces, since there is no expectation that a processor will accept any class that implements the interface; each processor can choose which kinds of Source or Result it is prepared to handle. In practice all JAXP processors support the three standard kinds of Source (DOMSource, SAXSource, StreamSource) and the three standard kinds of Result (DOMResult, SAXResult, StreamResult) and possibly other implementations of their own.
External links
- Sun's JAXP product description
- jaxp: JAXP 1.4 Reference Implementation (JAXP 1.4)
- JSR 63 (JAXP 1.1 and 1.2)
- JSR 5 (JAXP 1.0)
- Document Object Model (DOM) Level 2 Core Specification
- Sample programs using the DOM and SAX parser Tutorial: XML with Xerces for Java
- Sun's Java and XML APIs: Helping or hurting?
- JAXP/TrAX introduction on the Apache XML web site