I’m putting together some sample code for talking to the Yahoo! Search Web Services. I’m writing it in Java and figured I might as well take the opportunity to use the latest and greatest: J2SE 5.0 (or is it 1.5? or is it Tiger?). One nice thing about J2SE 5.0 is the new XPath support in javax.xml.xpath. I’ve always felt that SAX is too low-level an API for working with XML directly, and DOM is just too much of a pain in the butt because of all the code you have to write to traverse down to a particular element.
The XML I’m attempting to parse is the result of a simple web search for “Ryan Kennedy”. I used javax.xml.xpath.XPath and the evaluate() methods it provides. Notice that there are two ways of getting XML into XPath’s evaluate() method: passing an org.xml.sax.InputSource, or passing a java.lang.Object (in practice, a DOM node such as a Document).
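To make the two overloads concrete, here’s a minimal sketch against a hypothetical, namespace-free stand-in for the search response (the real one declares a default namespace, which is where the trouble starts). The class and method names are made up for illustration, and it wraps a StringReader instead of an InputStream only so the sample is self-contained.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class EvaluateOverloads {
    // Hypothetical, namespace-free stand-in for the search response.
    static final String XML =
        "<ResultSet totalResultsReturned=\"2\">"
        + "<Result><Url>http://example.com/a</Url></Result>"
        + "<Result><Url>http://example.com/b</Url></Result>"
        + "</ResultSet>";

    // evaluate(String, InputSource, QName): the XPath object parses the source itself.
    static double countViaInputSource() throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        InputSource source = new InputSource(new StringReader(XML));
        return (Double) xpath.evaluate("count(/ResultSet/Result)", source, XPathConstants.NUMBER);
    }

    // evaluate(String, Object, QName): the caller supplies an already-parsed node.
    static double countViaDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Double) xpath.evaluate("count(/ResultSet/Result)", doc, XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countViaInputSource()); // 2.0
        System.out.println(countViaDocument());    // 2.0
    }
}
```

XPathConstants.NUMBER results come back as java.lang.Double, hence the casts.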
I tried the InputSource version first, since all I had to do was create an InputSource that wrapped a java.io.InputStream. This presented an initial problem. When I evaluated the XPath expression “/ResultSet/Result/Url”, I expected to get back an org.w3c.dom.NodeList of Url elements, but nothing matched. If you look closely at the XML, you can see there’s a default namespace declared: xmlns="urn:yahoo:srch". I had to change the XPath expression to “/urn:yahoo:srch:ResultSet/urn:yahoo:srch:Result/urn:yahoo:srch:Url”. This was ugly as hell and not intuitive at all, but it worked. I also wanted to print the totalResultsReturned attribute of the ResultSet element, so I made a second call to evaluate(), and that second call failed. Evidently, when using an InputSource you can only evaluate once: the first call consumes the wrapped InputStream, so the second has nothing left to parse.
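One way around both problems, sketched under the assumption that the response is small enough to buffer in memory: keep the XML as a String, wrap it in a fresh InputSource for every call, and match elements with local-name() so the default namespace never needs a prefix. The sample XML and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class FreshInputSource {
    // Hypothetical trimmed-down response with a default namespace, like the real one.
    static final String XML =
        "<ResultSet xmlns=\"urn:yahoo:srch\" totalResultsReturned=\"2\">"
        + "<Result><Url>http://example.com/a</Url></Result>"
        + "<Result><Url>http://example.com/b</Url></Result>"
        + "</ResultSet>";

    // Wrap the buffered XML in a brand-new InputSource on every call, so no
    // evaluation ever sees a reader that an earlier one already consumed.
    static Object evaluate(String expression, QName returnType) throws XPathExpressionException {
        InputSource source = new InputSource(new StringReader(XML));
        return XPathFactory.newInstance().newXPath().evaluate(expression, source, returnType);
    }

    // Plain names match nothing: the elements are in the urn:yahoo:srch namespace.
    static int plainUrlCount() throws XPathExpressionException {
        return ((NodeList) evaluate("/ResultSet/Result/Url", XPathConstants.NODESET)).getLength();
    }

    // local-name() tests ignore the namespace, so no prefix is needed.
    static int localNameUrlCount() throws XPathExpressionException {
        String expr = "/*[local-name()='ResultSet']/*[local-name()='Result']/*[local-name()='Url']";
        return ((NodeList) evaluate(expr, XPathConstants.NODESET)).getLength();
    }

    // Unprefixed attributes are in no namespace, so @totalResultsReturned matches directly.
    static String totalResultsReturned() throws XPathExpressionException {
        return (String) evaluate("/*[local-name()='ResultSet']/@totalResultsReturned", XPathConstants.STRING);
    }

    public static void main(String[] args) throws XPathExpressionException {
        System.out.println(plainUrlCount());        // 0
        System.out.println(localNameUrlCount());    // 2
        System.out.println(totalResultsReturned()); // 2
    }
}
```

local-name() throws the namespace away entirely, which is fine for a single-namespace response like this one but would conflate same-named elements from different namespaces.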
So I switched from the InputSource version of evaluate() to the Object version. It turns out you can pass in an org.w3c.dom.Document. So I used javax.xml.parsers.DocumentBuilder to build a DOM Document from the XML. I passed the Document into evaluate() and got nothing back. I stripped the namespace prefixes off my XPath expressions and it started working.
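Here’s roughly what the Document route looks like, as a sketch against a hypothetical trimmed-down response (class and method names are illustrative). The factory is left at its default settings, and that detail matters: DocumentBuilderFactory is not namespace-aware unless you ask it to be.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DocumentXPath {
    // Hypothetical trimmed-down response with a default namespace.
    static final String XML =
        "<ResultSet xmlns=\"urn:yahoo:srch\" totalResultsReturned=\"2\">"
        + "<Result><Url>http://example.com/a</Url></Result>"
        + "<Result><Url>http://example.com/b</Url></Result>"
        + "</ResultSet>";

    // Default factory settings: namespace awareness is off, so the parsed
    // elements carry no namespace URI (getNamespaceURI() returns null).
    static Document parse() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
    }

    // Unprefixed expressions match against the namespace-unaware Document.
    static List<String> urls(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate("/ResultSet/Result/Url", doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<String>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    // evaluate(String, Object) returns the string value of the result.
    static String totalResultsReturned(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("/ResultSet/@totalResultsReturned", doc);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse();
        System.out.println(totalResultsReturned(doc)); // 2
        for (String url : urls(doc)) {
            System.out.println(url);
        }
    }
}
```

Because the Document is parsed once up front, it can be queried as many times as you like, which sidesteps the one-shot InputSource problem.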
So, if I use an InputSource with XPath.evaluate(), I have to concern myself with namespaces. If I use a Document with XPath.evaluate(), I don’t. Maybe I’m missing something, but the XPath support doesn’t seem to behave very consistently.
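You’re missing (as I did) that the parser you use must be namespace-aware. DocumentBuilderFactory isn’t namespace-aware by default, which is why your Document matched unprefixed names. You’ll probably also want a NamespaceContext implementation to map prefixes onto namespace URIs.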
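Following that advice, a sketch of the namespace-aware version: turn on setNamespaceAware(true), bind a prefix to urn:yahoo:srch with a small NamespaceContext, and use that prefix in the expressions. The prefix y, the class names, and the sample XML are all hypothetical; any prefix works as long as the context maps it to the right URI.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NamespaceAwareXPath {
    static final String NS = "urn:yahoo:srch";
    static final String PREFIX = "y"; // hypothetical; the response itself uses no prefix

    // Hypothetical trimmed-down response with a default namespace.
    static final String XML =
        "<ResultSet xmlns=\"urn:yahoo:srch\" totalResultsReturned=\"2\">"
        + "<Result><Url>http://example.com/a</Url></Result>"
        + "<Result><Url>http://example.com/b</Url></Result>"
        + "</ResultSet>";

    // Binds the single prefix "y" to the search namespace.
    static final class SearchNamespaceContext implements NamespaceContext {
        @Override
        public String getNamespaceURI(String prefix) {
            if (prefix == null) {
                throw new IllegalArgumentException("prefix must not be null");
            }
            if (PREFIX.equals(prefix)) {
                return NS;
            }
            if (XMLConstants.XML_NS_PREFIX.equals(prefix)) {
                return XMLConstants.XML_NS_URI;
            }
            return XMLConstants.NULL_NS_URI; // unbound prefix
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return NS.equals(namespaceURI) ? PREFIX : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            return NS.equals(namespaceURI)
                ? Collections.singletonList(PREFIX).iterator()
                : Collections.<String>emptyList().iterator();
        }
    }

    static Document parse() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // off by default
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
    }

    static XPath newXPath() {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new SearchNamespaceContext());
        return xpath;
    }

    static int count(Document doc, String expression) throws XPathExpressionException {
        return ((NodeList) newXPath().evaluate(expression, doc, XPathConstants.NODESET)).getLength();
    }

    static String totalResultsReturned(Document doc) throws XPathExpressionException {
        return newXPath().evaluate("/y:ResultSet/@totalResultsReturned", doc);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse();
        System.out.println(count(doc, "/y:ResultSet/y:Result/y:Url")); // 2
        System.out.println(count(doc, "/ResultSet/Result/Url"));       // 0
        System.out.println(totalResultsReturned(doc));                 // 2
    }
}
```

With namespace awareness on, the unprefixed /ResultSet/Result/Url matches nothing, the same behaviour described above for the InputSource overload; the prefixed form is what works consistently across both routes.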