Who came up with DOM and XPath in Java?

I’ve hated the DOM implementation in Java for a long time. Today I used XPath for the first time, and now I hate it too. Up to now I’ve used a collection of utility methods that just iterate over nodes until they find one with a matching tag name and/or attribute set. After my experience today I’m going back to them. Seriously, how wordy is this? (Exception handling is excluded for ‘brevity’.)

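import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

DocumentBuilderFactory docfactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docbuilder = docfactory.newDocumentBuilder();

// Assume we've got the file as an InputSource
Document docroot = docbuilder.parse(filestream);

XPath xpath = XPathFactory.newInstance().newXPath();
// "//errors" matches every errors element; "/errors" would only match the root element
// evaluate() returns Object, so the result has to be cast to NodeList
NodeList nodes = (NodeList) xpath.evaluate("//errors", docroot, XPathConstants.NODESET);

int length = nodes.getLength();
for (int i = 0; i < length; i++) {
    Node node = nodes.item(i);

    if (node instanceof Element) {
        Element e = (Element) node;
        // ...
    }
    else {
        // ...
    }
}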

You get the idea. The code I was writing was meant to pull out all ‘errors’ blocks, consolidate them, update a count attribute, then replace the old errors blocks with the new one. That meant more XPath expressions, parsing Integers and then converting them back to Strings; it was ridiculous. To make it even worse, the nodes returned in the NodeList were, as far as I could tell, copies and not the original nodes, so I couldn’t remove them. If I’m reading the API correctly, a document fragment is returned, so to be fair it is documented, but when element.getParentNode().removeChild(element) fails, it’s hard to get past the frustration and make sense of the docs.
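For reference, here is a minimal sketch of that consolidation using only the JDK’s DOM and XPath classes. It is not the code described above: the class name ErrorsConsolidator is hypothetical, and it assumes each errors element carries an integer count attribute and that errors blocks are not nested inside one another. It also assumes the NodeList returned by evaluate() holds the document’s own nodes (which, as far as I know, is how the JDK’s bundled implementation behaves), and it copies the matches into a java.util.List before changing the tree so that removals cannot disturb the iteration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class; assumes <errors> blocks carry an integer "count"
// attribute and are not nested inside one another.
public class ErrorsConsolidator {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Merges every <errors> element into one new <errors> element whose count
    // is the sum of the old counts. Returns it, or null if there were none.
    static Element consolidate(Document doc) {
        NodeList found;
        try {
            found = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//errors", doc, XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }

        // Copy the matches into a List before touching the tree.
        List<Element> blocks = new ArrayList<>();
        for (int i = 0; i < found.getLength(); i++) {
            blocks.add((Element) found.item(i));
        }
        if (blocks.isEmpty()) {
            return null;
        }

        Element merged = doc.createElement("errors");
        int total = 0;
        for (Element block : blocks) {
            String count = block.getAttribute("count"); // "" when absent
            if (!count.isEmpty()) {
                total += Integer.parseInt(count.trim());
            }
            // appendChild() detaches the child from the old block first.
            while (block.getFirstChild() != null) {
                merged.appendChild(block.getFirstChild());
            }
        }
        merged.setAttribute("count", Integer.toString(total));

        // The merged block takes the first block's place; the others are removed.
        Element first = blocks.get(0);
        first.getParentNode().replaceChild(merged, first);
        for (Element block : blocks.subList(1, blocks.size())) {
            block.getParentNode().removeChild(block);
        }
        return merged;
    }
}
```

Moving the children with appendChild() avoids cloning them, and replacing the first block in place keeps the merged errors element where the first one used to be.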

Why can’t I have something like:

Document doc = new Document(...path to XML file...);
List matchingNodes = doc.find("/errors");

for (Element errors : matchingNodes) {
    // ...
}

That’s not too un-Java, is it? Okay, the return type of my find method isn’t well defined, but that can be worked around.
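A minimal sketch of how that wish could be approximated today by wrapping the JDK’s own DOM and XPath classes. XmlDoc, fromFile, fromString and find are hypothetical names, not part of any library; find settles the return-type question by returning List&lt;Element&gt; and silently skipping non-element matches such as text or attribute nodes.

```java
import java.io.File;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical convenience wrapper around the JDK's DOM and XPath classes.
public class XmlDoc {
    private final Document doc;

    private XmlDoc(Document doc) {
        this.doc = doc;
    }

    // Parses an XML file from disk.
    public static XmlDoc fromFile(String path) {
        return parse(new InputSource(new File(path).toURI().toString()));
    }

    // Parses XML held in a string.
    public static XmlDoc fromString(String xml) {
        return parse(new InputSource(new StringReader(xml)));
    }

    private static XmlDoc parse(InputSource source) {
        try {
            return new XmlDoc(DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns the elements matched by an XPath expression; non-element
    // matches (text, attributes, ...) are skipped.
    public List<Element> find(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<Element> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node node = nodes.item(i);
                if (node instanceof Element) {
                    result.add((Element) node);
                }
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath expression: " + expression, e);
        }
    }

    // Escape hatch back to the plain W3C DOM.
    public Document document() {
        return doc;
    }
}
```

With a wrapper like this, the loop becomes for (Element errors : XmlDoc.fromFile(path).find("//errors")) { ... }. A List&lt;Node&gt; return type would be the more general choice if text or attribute matches matter.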

Why is this API so unwieldy?
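For comparison, here is a minimal sketch of the kind of tree-walking utility method mentioned at the top of the post. DomUtils and findFirst are hypothetical names, not the actual helpers; the method walks the tree depth-first and returns the first element whose tag name and/or attribute match, with null meaning “don’t check this”.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class; names and signatures are illustrative only.
public class DomUtils {

    // Depth-first, pre-order search starting at (and including) start.
    // tagName or attrName may be null to skip that test; attrValue may be
    // null to accept any value of attrName.
    public static Element findFirst(Node start, String tagName, String attrName, String attrValue) {
        if (start instanceof Element) {
            Element e = (Element) start;
            boolean tagOk = tagName == null || tagName.equals(e.getTagName());
            boolean attrOk = attrName == null
                    || (e.hasAttribute(attrName)
                        && (attrValue == null || attrValue.equals(e.getAttribute(attrName))));
            if (tagOk && attrOk) {
                return e;
            }
        }
        for (Node child = start.getFirstChild(); child != null; child = child.getNextSibling()) {
            Element match = findFirst(child, tagName, attrName, attrValue);
            if (match != null) {
                return match;
            }
        }
        return null;
    }

    // Parses XML held in a string into a W3C DOM Document.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

Passing the Document itself as start works because the search simply descends into its children.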


5 Responses to “Who came up with DOM and XPath in Java?”

  1. Simon says:

    >> Why is this API so unwieldy?

    Because it’s Java.

    It’s a marketing tool companies use to get work; it’s not a “real” programming language :)

  2. Miles says:

    There are nice Java APIs too. The Collections API is really well done, and the Streams API, when applied correctly, is really useful. But the XML processing ones… I can only assume they were designed by committee. Sure, it’s very flexible, but the overriding design principle has to be: make the 95% case simple. For me, that case is adding, changing and deleting elements in a document.

    I take it you don’t use Java these days? So what is your language du jour? And here’s an acid test: it’s got to do XML processing elegantly.

  3. JD says:

    Sane XPath APIs do exist in other languages. In Perl you can do:

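    use strict;
    use warnings;
    use XML::LibXML;

    my $parser = XML::LibXML->new();
    my $doc = $parser->parse_file($ARGV[0]);
    my $root = $doc->documentElement();

    # findvalue returns the result of the expression as a plain string
    my $firstname = $root->findvalue("/article/articleinfo/author/firstname");

    # find returns an XML::LibXML::NodeList for a node-set expression
    my $content = '';
    my $nodes = $root->find("/article/sect1");
    while (my $node = $nodes->shift()) {
        $content .= $node->toString();
    }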

    I don’t see why Java’s API is so crap.
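For comparison, a rough JDK-only equivalent of the Perl snippet, using javax.xml.xpath plus an identity Transformer in place of toString(). The class and method names are hypothetical, the XML is read from a string rather than a command-line file argument, and the two results are returned as an array purely for brevity. The two-argument evaluate() returns a String, which is roughly what findvalue() does for a single match.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class mirroring the Perl snippet with the JDK's own APIs.
public class ArticleExtract {

    // Returns {firstname, concatenated markup of every /article/sect1}.
    static String[] extract(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Roughly findvalue(): the two-argument evaluate() returns a String.
            String firstname = xpath.evaluate("/article/articleinfo/author/firstname", doc);

            // Roughly find(): ask for a node-set, then serialise each node.
            NodeList nodes = (NodeList) xpath.evaluate("/article/sect1", doc, XPathConstants.NODESET);
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringBuilder content = new StringBuilder();
            for (int i = 0; i < nodes.getLength(); i++) {
                StringWriter out = new StringWriter();
                serializer.transform(new DOMSource(nodes.item(i)), new StreamResult(out));
                content.append(out);
            }
            return new String[] {firstname, content.toString()};
        } catch (Exception e) {
            throw new IllegalStateException("XML processing failed", e);
        }
    }
}
```

It does the same job, but it is noticeably longer than the Perl version.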

  4. JD says:

    It’s not great, but JDOM appears to be a much saner API. Of course I would have preferred a findNodes() method on a node, but never mind.

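    import java.util.List;
    import org.jdom.Document;
    import org.jdom.input.SAXBuilder;
    import org.jdom.xpath.XPath;
    SAXBuilder builder = new SAXBuilder();
    Document doc = builder.build("dvds.xml");
    XPath x = XPath.newInstance("/collection/dvd");
    List list = x.selectNodes(doc);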

  5. Miles says:

    I like JDOM and have used it in the past. At work there was a debate over which XML library to use, and ‘use core libraries wherever possible’ won the day, so we were stuck with W3C DOM. JDOM’s XPath looks nice and simple, and I agree, ‘find’ should be a method on Node, but I can live with it being in a separate class.
