Parse XML with the StAX Java API

By Peter Mikhalenko, Special to ZDNet Asia
Friday, May 09, 2008 04:58 PM

Find out more about StAX, a pure Java API based on interfaces that can be implemented by multiple parsers, and how to use it to read and write XML documents.

Streaming API for XML (StAX) is an API that allows you to read and write XML documents in Java. StAX is a parser-independent, pure Java API based on interfaces that can be implemented by multiple parsers.

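StAX (JSR 173) has been part of the standard library since Java SE 6. It is often presented as an improvement on Simple API for XML (SAX) and Document Object Model (DOM) because it combines the low memory use of streaming with a simpler programming model.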

An introduction to StAX
XML APIs are traditionally either tree based or event based. In tree-based APIs, the entire document is read into memory as a tree structure for random access by the calling application. In event-based XML APIs, the application registers to receive events as the parser encounters elements, text, and other constructs in the source document.

Tree-based APIs (e.g., DOM) allow for random access to the document; event-based APIs (e.g., SAX) have a small memory footprint and are typically much faster.

You can think of these two access metaphors as polar opposites. A tree-based API allows unlimited and random access and manipulation, while an event-based API is a "one shot" pass through the source document.

StAX was designed as a middle ground between these two opposites. In the StAX metaphor, the programmatic entry point is a cursor that represents a point within the document. The application moves the cursor forward, "pulling" information from the parser as it needs it. This is different from an event-based API, which "pushes" data to the application, requiring the application to maintain state between events as necessary to keep track of the location within the document.

StAX uses a pull approach, so the developer requests events rather than having event information from the XML parser pushed onto the client. This results in more natural, readable code without sacrificing performance. StAX has its roots in a number of incompatible pull APIs for XML, most notably XMLPULL.
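
The difference between push and pull is easier to see side by side. The following is a minimal sketch, not part of the original article: the class name PushVersusPull and the inline sample document are assumptions made for illustration. Both methods extract the text of the baud element. The SAX version has to remember in a field whether it is inside the element, while the StAX version simply asks the parser for the next item.

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PushVersusPull {

    // Hypothetical sample input, modeled on the config file used later in the article
    public static final String XML =
            "<root><port>1</port><baud>9600</baud></root>";

    // Push (SAX): the parser calls the handler, so the handler must keep
    // its own state (inBaud) between callbacks.
    public static String baudWithSax(String xml) throws Exception {
        final StringBuilder result = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inBaud;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                inBaud = "baud".equals(qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                inBaud = false;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inBaud) {
                    result.append(ch, start, length);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return result.toString();
    }

    // Pull (StAX): the application drives the loop and can stop as soon
    // as it has what it needs.
    public static String baudWithStax(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "baud".equals(reader.getLocalName())) {
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(baudWithSax(XML));   // prints 9600
        System.out.println(baudWithStax(XML));  // prints 9600
    }
}
```

Note that the StAX method returns as soon as it finds the element, whereas the SAX parser keeps calling the handler until the document ends.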

How the StAX API works
The core StAX API falls into two categories: the Event Iterator API and the Cursor API. Applications can use these APIs for reading, parsing, and writing XML documents.

Event Iterator API
The Event Iterator API, which is very similar to SAX, has two main interfaces: XMLEventReader (for parsing XML) and XMLEventWriter (for generating XML).

Imagine you have the following simple XML file:

<?xml version="1.0" encoding="UTF-8"?>
 <root>
   <port>1</port>
   <baud>9600</baud>
   <bit>1</bit>
   <parity>0</parity>
 </root>

The following code reads the example XML file and prints the text content of each element to standard output:

package test.xml;

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;

import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class TestEventStaX {

    private String configFile;

    public void setFile(String configFile) {
        this.configFile = configFile;
    }

    public void readConfig() {
        try {
            // First create a new XMLInputFactory
            XMLInputFactory inputFactory = XMLInputFactory.newInstance();
            // Set up a new event reader
            InputStream in = new FileInputStream(configFile);
            XMLEventReader eventReader = inputFactory.createXMLEventReader(in);
            // Read the XML document
            while (eventReader.hasNext()) {
                XMLEvent event = eventReader.nextEvent();
                if (event.isStartElement()) {
                    String name = event.asStartElement().getName().getLocalPart();
                    // Strings must be compared with equals(), not ==
                    if (name.equals("port") || name.equals("baud")
                            || name.equals("bit") || name.equals("parity")) {
                        // The event after the start tag holds the element text
                        event = eventReader.nextEvent();
                        System.out.println(event.asCharacters().getData());
                    }
                }
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (XMLStreamException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        TestEventStaX read = new TestEventStaX();
        read.setFile("root.xml");
        read.readConfig();
    }
}
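
The same reading loop can be written more compactly. The sketch below is an illustration rather than the article's own listing: the class name CompactEventReader, the WANTED set, and the choice to parse from a string are assumptions. It relies on XMLEventReader.getElementText(), which returns the text of a text-only element and leaves the reader positioned on the matching end tag.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class CompactEventReader {

    // Element names we care about (taken from the sample config file)
    private static final Set<String> WANTED =
            new HashSet<>(Arrays.asList("port", "baud", "bit", "parity"));

    // Returns "name=value" for every wanted element, in document order.
    public static List<String> readValues(String xml) throws XMLStreamException {
        List<String> values = new ArrayList<>();
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    String name = event.asStartElement().getName().getLocalPart();
                    if (WANTED.contains(name)) {
                        // Valid only directly after a start tag of a text-only element
                        values.add(name + "=" + reader.getElementText());
                    }
                }
            }
        } finally {
            reader.close();
        }
        return values;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root><port>1</port><baud>9600</baud>"
                + "<bit>1</bit><parity>0</parity></root>";
        System.out.println(readValues(xml));
        // prints [port=1, baud=9600, bit=1, parity=0]
    }
}
```

Returning a list instead of printing keeps the parsing logic easy to test; a set lookup also avoids repeating one if block per element name.
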

To write the same file into rootfile2.xml, you need to do the following:

package test.xml;

import java.io.FileOutputStream;

import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Characters;
import javax.xml.stream.events.EndElement;
import javax.xml.stream.events.StartDocument;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class WriteConfigFile {
    private String configFile;

    public void setFile(String configFile) {
        this.configFile = configFile;
    }

    public void saveConfig() throws Exception {
        // Create an XMLOutputFactory
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        // Create an XMLEventWriter
        XMLEventWriter eventWriter = outputFactory
                .createXMLEventWriter(new FileOutputStream(configFile));
        // Create an XMLEventFactory
        XMLEventFactory eventFactory = XMLEventFactory.newInstance();
        // createDTD() is used here only as a way to emit a raw line break
        XMLEvent end = eventFactory.createDTD("\n");
        // Create and write the start of the document
        StartDocument startDocument = eventFactory.createStartDocument();
        eventWriter.add(startDocument);

        // Create the root open tag
        StartElement rootStartElement =
                eventFactory.createStartElement("", "", "root");
        eventWriter.add(rootStartElement);
        eventWriter.add(end);
        // Write the different nodes
        createNode(eventWriter, "port", "1");
        createNode(eventWriter, "baud", "9600");
        createNode(eventWriter, "bit", "1");
        createNode(eventWriter, "parity", "0");

        // The end tag must match the root start tag
        eventWriter.add(eventFactory.createEndElement("", "", "root"));
        eventWriter.add(end);
        eventWriter.add(eventFactory.createEndDocument());
        eventWriter.close();
    }

    private void createNode(XMLEventWriter eventWriter, String name,
            String value) throws XMLStreamException {
        XMLEventFactory eventFactory = XMLEventFactory.newInstance();
        // Raw whitespace used for indentation and line breaks
        XMLEvent end = eventFactory.createDTD("\n");
        XMLEvent tab = eventFactory.createDTD("\t");
        // Create the start node
        StartElement sElement = eventFactory.createStartElement("", "", name);
        eventWriter.add(tab);
        eventWriter.add(sElement);
        // Create the content
        Characters characters = eventFactory.createCharacters(value);
        eventWriter.add(characters);
        // Create the end node
        EndElement eElement = eventFactory.createEndElement("", "", name);
        eventWriter.add(eElement);
        eventWriter.add(end);
    }

    public static void main(String[] args) {
        WriteConfigFile configFile = new WriteConfigFile();
        configFile.setFile("rootfile2.xml");
        try {
            configFile.saveConfig();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

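Building the document resembles the DOM approach in that you create element and text objects through a factory. The difference is that each event is written to the output immediately, in document order, instead of being assembled into an in-memory tree first.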

Cursor API
The interface XMLStreamReader represents a cursor that's moved across an XML document from beginning to end. At any given time, this cursor points at one thing: a text node, a start-tag, a comment, the beginning of the document, etc. The cursor always moves forward and usually only moves one item at a time.

You invoke methods such as getName() and getText() on the XMLStreamReader to retrieve information about the item where the cursor is currently positioned. You typically obtain a parser through XMLInputFactory, which returns whichever StAX implementation is installed:

URL u = new URL("http://www.mikhalenko.ru/");
InputStream in = u.openStream();
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader parser = factory.createXMLStreamReader(in);

You can also create the XMLStreamReader from any java.io.Reader instead of an InputStream. The next() method advances the cursor to the next item. Once the cursor is positioned on an item, you use various getter methods to extract data from it. These are the most important getters:

public QName    getName()
public String   getLocalName()
public String   getNamespaceURI()
public String   getText()
public String   getElementText()
public int      getEventType()
public Location getLocation()
public int      getAttributeCount()
public QName    getAttributeName(int index)
public String   getAttributeValue(String namespaceURI, String localName)
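
To show several of these getters working together, here is a small sketch. The class name CursorGetters, the sample document, and the urn:example:config namespace are assumptions invented for the example. For every start tag it records the local name, the namespace URI, each attribute, and the line number reported by getLocation().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class CursorGetters {

    // Hypothetical sample document with a namespace and one attribute
    public static final String XML =
            "<cfg:root xmlns:cfg=\"urn:example:config\">\n"
          + "  <cfg:port id=\"p1\">1</cfg:port>\n"
          + "</cfg:root>";

    // Builds one description line per start tag.
    public static List<String> describeElements(String xml)
            throws XMLStreamException {
        List<String> lines = new ArrayList<>();
        XMLStreamReader parser = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (parser.hasNext()) {
                if (parser.next() != XMLStreamConstants.START_ELEMENT) {
                    continue; // the attribute getters are valid only on start tags
                }
                StringBuilder line = new StringBuilder();
                line.append(parser.getLocalName())
                    .append(" ns=").append(parser.getNamespaceURI());
                for (int i = 0; i < parser.getAttributeCount(); i++) {
                    line.append(' ')
                        .append(parser.getAttributeName(i).getLocalPart())
                        .append('=')
                        .append(parser.getAttributeValue(i));
                }
                line.append(" line=")
                    .append(parser.getLocation().getLineNumber());
                lines.add(line.toString());
            }
        } finally {
            parser.close();
        }
        return lines;
    }

    public static void main(String[] args) throws XMLStreamException {
        for (String line : describeElements(XML)) {
            System.out.println(line);
        }
    }
}
```

When you know the attribute name in advance, getAttributeValue(null, "id") looks it up directly; passing null as the namespace URI tells the parser not to check the namespace.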

A loop with a switch statement is a very common pattern in StAX. There are a few ways to filter the event stream; for instance, you could use a chain of if-else statements instead of the switch, but almost all StAX programs will feature an event loop similar to this one:

int inHeader = 0;
for (int event = parser.next();
     event != XMLStreamConstants.END_DOCUMENT;
     event = parser.next()) {
    switch (event) {
        case XMLStreamConstants.START_ELEMENT:
            if (isHeader(parser.getLocalName())) {
                inHeader++;
            }
            break;
        case XMLStreamConstants.END_ELEMENT:
            if (isHeader(parser.getLocalName())) {
                inHeader--;
                if (inHeader == 0) System.out.println();
            }
            break;
        case XMLStreamConstants.CHARACTERS:
        case XMLStreamConstants.CDATA:
            // Plain text and CDATA sections are handled the same way
            if (inHeader > 0) System.out.print(parser.getText());
            break;
    } // end switch
} // end for

The isHeader() function checks if the current element is the H1 or H2 header element in HTML:

private static boolean isHeader(String name) {
    if (name.equals("h1")) return true;
    if (name.equals("h2")) return true;
    return false;
}

The previous code snippet reads through an XHTML document and prints out the contents of all the heading elements h1 and h2.
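
For readers who want to run the heading loop as a whole, the following sketch wraps it in a class. The class name HeadingExtractor and the inline XHTML string are assumptions; the input is also assumed to be well formed and to have no DOCTYPE, so the parser does not try to load a DTD. Instead of printing, it collects each heading into a list.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class HeadingExtractor {

    private static boolean isHeader(String name) {
        return name.equals("h1") || name.equals("h2");
    }

    // Returns the text of every h1 and h2 element, including text inside
    // nested inline elements such as <em>.
    public static List<String> headings(String xhtml) throws XMLStreamException {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int inHeader = 0;
        XMLStreamReader parser = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xhtml));
        try {
            for (int event = parser.next();
                 event != XMLStreamConstants.END_DOCUMENT;
                 event = parser.next()) {
                switch (event) {
                    case XMLStreamConstants.START_ELEMENT:
                        if (isHeader(parser.getLocalName())) {
                            inHeader++;
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        if (isHeader(parser.getLocalName())) {
                            inHeader--;
                            if (inHeader == 0) {
                                // The heading is complete: store it and reset
                                result.add(current.toString());
                                current.setLength(0);
                            }
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        if (inHeader > 0) {
                            current.append(parser.getText());
                        }
                        break;
                    default:
                        break; // comments, processing instructions, etc.
                }
            }
        } finally {
            parser.close();
        }
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        String page = "<html><body><h1>StAX <em>in</em> Java</h1>"
                + "<p>skip me</p><h2>Cursor API</h2></body></html>";
        System.out.println(headings(page));
        // prints [StAX in Java, Cursor API]
    }
}
```

The inHeader counter, rather than a boolean, is what lets the text inside the nested em element count as part of the heading.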

When writing XML, you use the XMLStreamWriter interface. It provides methods to write elements, attributes, comments, text, and all of the other parts of an XML document. An XMLStreamWriter is created by an XMLOutputFactory, like this:

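OutputStream out = new FileOutputStream("data.xml");
XMLOutputFactory factory = XMLOutputFactory.newInstance();
// Pass the same encoding that the XML declaration will announce,
// so the bytes written match the declared encoding
XMLStreamWriter writer =
        factory.createXMLStreamWriter(out, "ISO-8859-1");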

Then, you can use various writeXXX() methods:

writer.writeStartDocument("ISO-8859-1", "1.0");
 writer.writeStartElement("greeting");
 writer.writeAttribute("id", "g1");
 writer.writeCharacters("Hello StAX");
 writer.writeEndDocument();

When you finish creating the document, you want to flush and close the writer. This does not close the underlying output stream, so you'll need to close that too:

writer.flush();
 writer.close();
 out.close();

XMLStreamWriter helps maintain some well-formedness constraints. For instance, writeEndDocument() closes all unclosed start-tags, and writeCharacters() performs any necessary escaping of reserved characters like & and <. However, the checking is minimal.
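
A short sketch makes both behaviors visible. The class name StreamWriterDemo and the sample text are assumptions; the writer targets a StringWriter so the result can be inspected as a string. The greeting element is never closed explicitly, and the text contains characters that must be escaped.

```java
import java.io.StringWriter;

import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StreamWriterDemo {

    // Writes a single greeting element containing the given text and
    // returns the generated XML.
    public static String write(String text) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartDocument();
        writer.writeStartElement("greeting");
        writer.writeAttribute("id", "g1");
        // & and < are escaped by the writer
        writer.writeCharacters(text);
        // No writeEndElement() call: writeEndDocument() closes the open tag
        writer.writeEndDocument();
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        // The output ends with:
        // <greeting id="g1">Fish &amp; chips &lt; 5</greeting>
        System.out.println(write("Fish & chips < 5"));
    }
}
```

The exact form of the XML declaration at the start of the output can vary between StAX implementations, so the comment above only describes the element itself.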

Implementations of StAX
There are a number of StAX implementations; the most notable are Sun’s StAX implementation, Woodstox (an open-source StAX implementation), and the StAX Reference Implementation from Codehaus.

In addition, the StAX-Utils Project provides a set of utility classes that make it easy for you to integrate StAX into your existing XML processing applications. For example, StAX-Utils includes classes to provide XML file indenting and formatting.


Peter V. Mikhalenko is a Sun certified professional who works as a business and technical consultant for several top-tier investment banks.

