Parse XML with the StAX Java API
Friday, May 09, 2008 04:58 PM
Find out more about StAX, a pure Java API based on interfaces that can be implemented by multiple parsers, and how to use it to read and write XML documents.
Streaming API for XML (StAX) is an API that allows you to read and write XML documents in Java. StAX is a parser-independent, pure Java API based on interfaces that can be implemented by multiple parsers.
StAX, specified by JSR 173, has shipped with the standard Java platform since Java SE 6. It is often preferred over Simple API for XML (SAX) and Document Object Model (DOM) when a document should be processed as a stream without giving up simple, readable code.
An introduction to StAX
XML APIs are traditionally either tree based or event based. In tree-based APIs, the entire document is read into memory as a tree structure for random access by the calling application. In event-based XML APIs, the application registers to receive events, as entities are encountered within the source document.
Tree-based APIs (e.g., DOM) allow for random access to the document; event-based APIs (e.g., SAX) require a small memory footprint and are typically much faster.
You can think of these two access metaphors as polar opposites. A tree-based API allows unlimited and random access and manipulation, while an event-based API is a "one shot" pass through the source document.
StAX was designed as a middle ground between these two opposites. In the StAX metaphor, the programmatic entry point is a cursor that represents a point within the document. The application moves the cursor forward, "pulling" information from the parser as it needs it. This is different from an event-based API, which "pushes" data to the application, requiring the application to maintain state between events as necessary to keep track of the location within the document.
StAX uses a pull approach, so the developer requests events rather than having event information from the XML parser pushed onto the client. This results in more natural, readable code without sacrificing performance. StAX has its roots in a number of incompatible pull APIs for XML, most notably XMLPULL.
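To make the pull model concrete, here is a minimal sketch that uses the cursor-style XMLStreamReader (covered later in this article) on an in-memory document. The class name PullSketch, the method firstElementText(), and the sample XML are made up for illustration. The point to notice is that the application decides when to call next() and can stop as soon as it has what it needs.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, used only to illustrate the pull model.
final class PullSketch {

    // Returns the text of the first element called `name`, or null if there is none.
    static String firstElementText(String xml, String name) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            // The application, not the parser, drives this loop.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && name.equals(reader.getLocalName())) {
                    // Stop pulling as soon as the wanted value is found.
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root><port>1</port><baud>9600</baud></root>";
        System.out.println(firstElementText(xml, "baud"));
    }
}
```

With a push API such as SAX, the same task would need a handler object that remembers whether the parser is currently inside the wanted element; here that state is simply the position in the loop.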
How the StAX API works
The core StAX API falls into two categories: the Event Iterator API and the Cursor API. Applications can use these APIs for reading, parsing, and writing XML documents.
Event Iterator API
The Event Iterator API, which like SAX represents the document as a sequence of events, has two main interfaces: XMLEventReader (for parsing XML) and XMLEventWriter (for generating XML).
Imagine you have the following simple XML file:
<?xml version="1.0" encoding="UTF-8"?>
<root>
    <port>1</port>
    <baud>9600</baud>
    <bit>1</bit>
    <parity>0</parity>
</root>
The following code reads the XML file and prints the content of each element to standard output:
package test.xml;

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class TestEventStaX {
    private String configFile;

    public void setFile(String configFile) {
        this.configFile = configFile;
    }

    public void readConfig() {
        try {
            // First create a new XMLInputFactory
            XMLInputFactory inputFactory = XMLInputFactory.newInstance();
            // Set up a new eventReader
            InputStream in = new FileInputStream(configFile);
            XMLEventReader eventReader = inputFactory.createXMLEventReader(in);
            // Read the XML document
            while (eventReader.hasNext()) {
                XMLEvent event = eventReader.nextEvent();
                if (event.isStartElement()) {
                    // Compare element names with equals(); == only compares references
                    String name = event.asStartElement().getName().getLocalPart();
                    if (name.equals("port")) {
                        // The event after the start tag carries the element text
                        event = eventReader.nextEvent();
                        System.out.println(event.asCharacters().getData());
                        continue;
                    }
                    if (name.equals("baud")) {
                        event = eventReader.nextEvent();
                        System.out.println(event.asCharacters().getData());
                        continue;
                    }
                    if (name.equals("bit")) {
                        event = eventReader.nextEvent();
                        System.out.println(event.asCharacters().getData());
                        continue;
                    }
                    if (name.equals("parity")) {
                        event = eventReader.nextEvent();
                        System.out.println(event.asCharacters().getData());
                        continue;
                    }
                }
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (XMLStreamException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        TestEventStaX read = new TestEventStaX();
        read.setFile("root.xml");
        read.readConfig();
    }
}
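The reading code above handles each element name in its own branch. As a more compact variation, the following sketch collects every child element of the root into a map by calling XMLEventReader.getElementText(), which reads the text up to the matching end tag. The class name ConfigMapSketch and the method readSettings() are hypothetical, and the sketch assumes that every child of root is a text-only element, as in the sample file.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class, used only for illustration.
final class ConfigMapSketch {

    // Maps each text-only child of <root> to its content, in document order.
    static Map<String, String> readSettings(String xml) throws XMLStreamException {
        XMLEventReader events =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        Map<String, String> settings = new LinkedHashMap<>();
        try {
            while (events.hasNext()) {
                XMLEvent event = events.nextEvent();
                if (!event.isStartElement()) {
                    continue; // skip whitespace, end tags, and so on
                }
                String name = event.asStartElement().getName().getLocalPart();
                if (!name.equals("root")) {
                    // Assumption: the element contains text only, no child elements.
                    settings.put(name, events.getElementText());
                }
            }
        } finally {
            events.close();
        }
        return settings;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root><port>1</port><baud>9600</baud>"
                + "<bit>1</bit><parity>0</parity></root>";
        System.out.println(readSettings(xml));
    }
}
```

A map keeps the sketch short; a real application would more likely convert the values to typed fields and validate them.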
To write the same file into rootfile2.xml, you need to do the following:
package test.xml;

import java.io.FileOutputStream;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Characters;
import javax.xml.stream.events.EndElement;
import javax.xml.stream.events.StartDocument;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class WriteConfigFile {
    private String configFile;

    public void setFile(String configFile) {
        this.configFile = configFile;
    }

    public void saveConfig() throws Exception {
        // Create an XMLOutputFactory
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        // Create an XMLEventWriter
        XMLEventWriter eventWriter = outputFactory
                .createXMLEventWriter(new FileOutputStream(configFile));
        // Create an XMLEventFactory
        XMLEventFactory eventFactory = XMLEventFactory.newInstance();
        // createDTD() is used here as a shortcut for emitting a raw line break
        XMLEvent end = eventFactory.createDTD("\n");
        // Create and write the start of the document
        StartDocument startDocument = eventFactory.createStartDocument();
        eventWriter.add(startDocument);
        // Create the root open tag; its name must match the end tag written below
        StartElement rootStartElement =
                eventFactory.createStartElement("", "", "root");
        eventWriter.add(rootStartElement);
        eventWriter.add(end);
        // Write the different nodes
        createNode(eventWriter, "port", "1");
        createNode(eventWriter, "baud", "9600");
        createNode(eventWriter, "bit", "1");
        createNode(eventWriter, "parity", "0");
        eventWriter.add(eventFactory.createEndElement("", "", "root"));
        eventWriter.add(end);
        eventWriter.add(eventFactory.createEndDocument());
        eventWriter.close();
    }

    private void createNode(XMLEventWriter eventWriter, String name,
            String value) throws XMLStreamException {
        XMLEventFactory eventFactory = XMLEventFactory.newInstance();
        // Raw line break and tab, used only to make the output readable
        XMLEvent end = eventFactory.createDTD("\n");
        XMLEvent tab = eventFactory.createDTD("\t");
        // Create the start node
        StartElement sElement = eventFactory.createStartElement("", "", name);
        eventWriter.add(tab);
        eventWriter.add(sElement);
        // Create the content
        Characters characters = eventFactory.createCharacters(value);
        eventWriter.add(characters);
        // Create the end node
        EndElement eElement = eventFactory.createEndElement("", "", name);
        eventWriter.add(eElement);
        eventWriter.add(end);
    }

    public static void main(String[] args) {
        WriteConfigFile configFile = new WriteConfigFile();
        configFile.setFile("rootfile2.xml");
        try {
            configFile.saveConfig();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
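Constructing XML this way resembles the DOM approach in that you create objects for elements and text. The difference is that each event is handed to the writer as soon as it is created instead of being assembled into an in-memory tree first.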
Cursor API
The interface XMLStreamReader represents a cursor that's moved across an XML document from beginning to end. At any given time, this cursor points at one thing: a text node, a start-tag, a comment, the beginning of the document, etc. The cursor always moves forward and usually only moves one item at a time.
You invoke methods such as getName() and getText() on the XMLStreamReader to retrieve information about the item where the cursor is currently positioned. This is how you typically load a parser that depends on the installed StAX implementation:
URL u = new URL("http://www.mikhalenko.ru/");
InputStream in = u.openStream();
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader parser = factory.createXMLStreamReader(in);
The factory can also create an XMLStreamReader from a java.io.Reader instead of an InputStream. The next() method advances the cursor to the next item. Once the cursor is positioned on an item, you use various getter methods to extract data from it. These are the most important getters:
public QName getName()
public String getLocalName()
public String getNamespaceURI()
public String getText()
public String getElementText()
public int getEventType()
public Location getLocation()
public int getAttributeCount()
public QName getAttributeName(int index)
public String getAttributeValue(String namespaceURI, String localName)
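As an illustration of the attribute getters, the sketch below lists every start tag of a document together with its attributes. It uses getAttributeCount() from the list above plus the index-based getAttributeLocalName(int) and getAttributeValue(int) methods of XMLStreamReader. The class name AttributeSketch, the method describeStartTags(), and the sample document are invented for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, used only for illustration.
final class AttributeSketch {

    // Returns one line per start tag: the element name followed by name=value pairs.
    static List<String> describeStartTags(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> lines = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                // Attribute getters are only valid while the cursor is on a start tag.
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                StringBuilder line = new StringBuilder(reader.getLocalName());
                for (int i = 0; i < reader.getAttributeCount(); i++) {
                    line.append(' ')
                        .append(reader.getAttributeLocalName(i))
                        .append('=')
                        .append(reader.getAttributeValue(i));
                }
                lines.add(line.toString());
            }
        } finally {
            reader.close();
        }
        return lines;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root><port id=\"1\" mode=\"rw\">COM1</port></root>";
        for (String line : describeStartTags(xml)) {
            System.out.println(line);
        }
    }
}
```

Attributes are not separate items in the stream: they are reached through the start tag on which the cursor currently sits, which is why the sketch checks the event type before calling any attribute getter.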
A loop with a switch statement is a very common pattern in StAX. There are other ways to filter the event stream (for instance, you could use a chain of if-else statements instead of the switch), but almost all StAX programs feature an event loop similar to this one:
// Nesting depth of the heading elements the cursor is currently inside
int inHeader = 0;
for (int event = parser.next();
        event != XMLStreamConstants.END_DOCUMENT;
        event = parser.next()) {
    switch (event) {
        case XMLStreamConstants.START_ELEMENT:
            if (isHeader(parser.getLocalName())) {
                inHeader++;
            }
            break;
        case XMLStreamConstants.END_ELEMENT:
            if (isHeader(parser.getLocalName())) {
                inHeader--;
                if (inHeader == 0) System.out.println();
            }
            break;
        case XMLStreamConstants.CHARACTERS:
            if (inHeader > 0) System.out.print(parser.getText());
            break;
        case XMLStreamConstants.CDATA:
            if (inHeader > 0) System.out.print(parser.getText());
            break;
    } // end switch
} // end for
The isHeader() method checks whether the current element is an h1 or h2 heading element:
private static boolean isHeader(String name) {
    return name.equals("h1") || name.equals("h2");
}
The previous code snippet reads through an XHTML document and prints out the contents of all the heading elements h1 and h2.
When writing XML, you use the XMLStreamWriter interface. It provides methods to write elements, attributes, comments, text, and all of the other parts of an XML document. An XMLStreamWriter is created by an XMLOutputFactory, like this:
OutputStream out = new FileOutputStream("data.xml");
XMLOutputFactory factory = XMLOutputFactory.newInstance();
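// Pass the encoding explicitly so that it matches the one declared in
// writeStartDocument(); some implementations reject a mismatch
XMLStreamWriter writer =
        factory.createXMLStreamWriter(out, "ISO-8859-1");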
Then, you can use various writeXXX() methods:
writer.writeStartDocument("ISO-8859-1", "1.0");
writer.writeStartElement("greeting");
writer.writeAttribute("id", "g1");
writer.writeCharacters("Hello StAX");
writer.writeEndDocument();
When you finish creating the document, you want to flush and close the writer. This does not close the underlying output stream, so you'll need to close that too:
writer.flush();
writer.close();
out.close();
XMLStreamWriter helps maintain some well-formedness constraints. For instance, writeEndDocument() closes all unclosed start-tags, and writeCharacters() performs any necessary escaping of reserved characters like & and <. However, the checking is minimal.
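Both behaviors can be observed with a small sketch that writes to a StringWriter instead of a file, so the result can be inspected directly. The class name WriterSketch, the method writeGreeting(), and the sample text are assumptions made for this example. The exact form of the XML declaration may vary between StAX implementations, so the sketch does not rely on it.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper class, used only for illustration.
final class WriterSketch {

    // Writes a single <greeting> element containing `text` and returns the document.
    static String writeGreeting(String text) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartDocument();
        writer.writeStartElement("greeting");
        writer.writeAttribute("id", "g1");
        // Reserved characters in the text are escaped by the writer.
        writer.writeCharacters(text);
        // No writeEndElement() call: writeEndDocument() closes the open element.
        writer.writeEndDocument();
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writeGreeting("Fish & chips < 5"));
    }
}
```

In the returned string, the ampersand and the less-than sign appear as &amp; and &lt;, and the greeting element is closed even though the code never calls writeEndElement(). Relying on writeEndDocument() to close elements keeps the sketch short; calling writeEndElement() explicitly is usually clearer in real code.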
Implementations of StAX
There are a number of StAX implementations; the most notable are Sun’s StAX implementation, Woodstox (an open-source StAX implementation), and the StAX Reference Implementation from Codehaus.
In addition, the StAX-Utils Project provides a set of utility classes that make it easy for you to integrate StAX into your existing XML processing applications. For example, StAX-Utils includes classes to provide XML file indenting and formatting.
Check out these related resources
- JSR 173: Streaming API for XML
- JAXP Reference Implementation (which includes StAX)
- JAXP JavaDoc
Peter V. Mikhalenko is a Sun certified professional who works as a business and technical consultant for several top-tier investment banks.
