Groovy Xml processing

Explains the various techniques for creating and parsing XML documents and files using Groovy.

Published in: Software

  1. XML Processing: The Groovy approach
     "XML is like a human: it starts out cute when it’s small and gets annoying when it becomes bigger."
  2. Agenda
     • XML: should we Groovy it?
     • Parsing XML
     • Comparing Java and Groovy XML parsing
     • DOM Category
     • Downsides
     • What's GPath?
     • Using XmlParser
     • Downsides
     • Using XmlSlurper
     • XmlParser vs XmlSlurper
     • So here is my advice
     • Creating XML
     • Comparing Java and Groovy XML generation
     • GString: it's not what you think!
     • MarkupBuilder
     • StreamingMarkupBuilder
     • Comparing builders
     • Wait, XML processing is not OXM
     • XML using Groovy: conclusion
  3. XML: should we Groovy it?
     • Groovy does not force us to duplicate our efforts.
     • Use the Java-based approaches as needed, especially for legacy XML processing code.
     • If we are writing new code to process XML, though, we should use Groovy's facilities. Here is why.
  4. Sample XML

         <langs type="current">
           <language>Java</language>
           <language>Groovy</language>
           <language>JavaScript</language>
         </langs>

     • Parsing this trivial XML document is decidedly nontrivial in the Java language: it takes about 30 lines of code.
  5. Java code

         import org.xml.sax.SAXException;
         import org.w3c.dom.*;
         import javax.xml.parsers.*;
         import java.io.IOException;

         public class ParseXml {
           public static void main(String[] args) {
             DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
             try {
               DocumentBuilder db = dbf.newDocumentBuilder();
               Document doc = db.parse("src/languages.xml");

               // print the "type" attribute
               Element langs = doc.getDocumentElement();
               System.out.println("type = " + langs.getAttribute("type"));

               // print the "language" elements
               NodeList list = langs.getElementsByTagName("language");
               for (int i = 0; i < list.getLength(); i++) {
                 Element language = (Element) list.item(i);
                 System.out.println(language.getTextContent());
               }
             } catch (ParserConfigurationException pce) {
               pce.printStackTrace();
             } catch (SAXException se) {
               se.printStackTrace();
             } catch (IOException ioe) {
               ioe.printStackTrace();
             }
           }
         }
  6. Groovy code

         def langs = new XmlParser().parse("languages.xml")
         println "type = ${langs.attribute("type")}"
         langs.language.each {
           println it.text()
         }

     Output:

         type = current
         Java
         Groovy
         JavaScript
  7. Groovy 1-0 Java
     • The Groovy code is significantly shorter than the equivalent Java code.
     • It is also far more expressive: writing langs.language.each feels like working directly with the XML, unlike in Java, thanks to the dynamic nature of Groovy and to GPath.
  8. DOM Category
     • We can use Groovy categories to define dynamic methods on classes (an idea borrowed from Objective-C).
     • Groovy provides a category for working with the Document Object Model (DOM) that adds convenience methods.
     • DOMCategory lets us navigate the DOM structure through the DOM API with the convenience of GPath queries.
  9. What's GPath?
     • Much as XPath helps navigate the hierarchy of an XML document, GPath lets us navigate a hierarchy of objects (POJOs/POGOs) or XML using dot notation.
     • Example: car.engine.power

       XML:

           <car year="20">
             <engine>
               <power/>
             </engine>
           </car>

       POGO/POJO: car.getEngine().getPower()

     • We can access the year attribute of a car using car.'@year' (or car.@year).
     • For more information: http://groovy.codehaus.org/GPath
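The same dot notation can be tried on both kinds of data. The following is a minimal sketch, assuming Groovy 3 or later (where the class is `groovy.xml.XmlSlurper`; on older versions it is `groovy.util.XmlSlurper` and needs no import). The `Car` and `Engine` classes and the values 2010 and 150 are hypothetical, chosen only for illustration.

```groovy
import groovy.xml.XmlSlurper

// Hypothetical POGOs mirroring the car example
class Engine { int power }
class Car { int year; Engine engine }

// GPath on objects: property navigation with dots
def pogo = new Car(year: 2010, engine: new Engine(power: 150))
assert pogo.engine.power == 150

// GPath on XML: the same path shape, plus '@' for attributes
def xml = new XmlSlurper().parseText(
    '<car year="2010"><engine><power>150</power></engine></car>')
assert xml.engine.power.text() == '150'
assert xml.@year.text() == '2010'
assert xml.'@year'.text() == '2010'
```

Note that values read from XML arrive as text, so they usually need a conversion (for example `toInteger()`) before being compared with numbers.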
  10. Sample XML

          <languages>
            <language name="C++">
              <author>Stroustrup</author>
            </language>
            <language name="Java">
              <author>Gosling</author>
            </language>
            <language name="Lisp">
              <author>McCarthy</author>
            </language>
            <language name="Modula-2">
              <author>Wirth</author>
            </language>
            <language name="Oberon-2">
              <author>Wirth</author>
            </language>
            <language name="Pascal">
              <author>Wirth</author>
            </language>
          </languages>
  11. DOM Category

          document = groovy.xml.DOMBuilder.parse(new FileReader('languages.xml'))
          rootElement = document.documentElement
          use(groovy.xml.dom.DOMCategory) {
            println "Languages and authors"
            languages = rootElement.language
            languages.each { language ->
              println "${language.'@name'} authored by ${language.author[0].text()}"
            }
            def languagesByAuthor = { authorName ->
              languages.findAll { it.author[0].text() == authorName }.collect {
                it.'@name' }.join(', ')
            }
            println "Languages by Wirth:" + languagesByAuthor('Wirth')
          }
  12. Output

          Languages and authors
          C++ authored by Stroustrup
          Java authored by Gosling
          Lisp authored by McCarthy
          Modula-2 authored by Wirth
          Oberon-2 authored by Wirth
          Pascal authored by Wirth
          Languages by Wirth:Modula-2, Oberon-2, Pascal
  13. Downside
      • One restriction is that we need to place the code inside a use block.
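One way to live with that restriction is to do the GPath-style navigation inside the `use` block and hand plain Groovy values back out of it. The sketch below assumes an inline XML string (a shortened, hypothetical version of the sample document) instead of the `languages.xml` file; `wirthLanguages` is an illustrative name.

```groovy
import groovy.xml.DOMBuilder
import groovy.xml.dom.DOMCategory

// Shortened, hypothetical version of the sample document
def xmlText = '''<languages>
  <language name="Java"><author>Gosling</author></language>
  <language name="Pascal"><author>Wirth</author></language>
</languages>'''

def root = DOMBuilder.parse(new StringReader(xmlText)).documentElement

// The category methods (root.language, '@name', text()) apply only inside
// the block, so the result is returned from it as an ordinary list of strings.
def wirthLanguages = use(DOMCategory) {
    root.language.findAll { it.author[0].text() == 'Wirth' }.collect { it.'@name' }
}

assert wirthLanguages == ['Pascal']
```

After the block, `wirthLanguages` is a regular list and can be used anywhere without the category.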
  14. XmlParser
      • The class groovy.util.XmlParser exploits Groovy's dynamic typing and metaprogramming capabilities.
      • The code is much like the example we saw using DOMCategory, but without the use block.
      • XmlParser adds the convenience of iterators to the elements, so we can navigate easily using methods such as each(), collect(), and find().
  15. XmlParser

          languages = new XmlParser().parse('languages.xml')
          println "Languages and authors"
          languages.each {
            println "${it.@name} authored by ${it.author[0].text()}"
          }
          def languagesByAuthor = { authorName ->
            languages.findAll { it.author[0].text() == authorName }.collect {
              it.@name }.join(', ')
          }
          println "Languages by Wirth:" + languagesByAuthor('Wirth')
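For experimenting without a file on disk, the same navigation can be run against an inline string. This is a small sketch, assuming Groovy 3 or later (`groovy.xml.XmlParser`; on older versions the class is `groovy.util.XmlParser` and needs no import) and a shortened copy of the sample document.

```groovy
import groovy.xml.XmlParser

def languages = new XmlParser().parseText('''<languages>
  <language name="Java"><author>Gosling</author></language>
  <language name="Modula-2"><author>Wirth</author></language>
  <language name="Pascal"><author>Wirth</author></language>
</languages>''')

// findAll + collect: names of all languages by a given author
def byWirth = languages.language.findAll { it.author[0].text() == 'Wirth' }.collect { it.@name }
assert byWirth == ['Modula-2', 'Pascal']

// find: the first matching element, as a node we can keep navigating
def java = languages.language.find { it.@name == 'Java' }
assert java.author.text() == 'Gosling'
```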
  16. Downside
      • It does not preserve the XML InfoSet (see References), and it ignores XML comments and processing instructions in documents.
      • For large documents, the memory usage of XmlParser might become prohibitive.
  17. XmlSlurper: same code as XmlParser

          languages = new XmlSlurper().parse('languages.xml')
          println "Languages and authors"
          languages.language.each {
            println "${it.@name} authored by ${it.author[0].text()}"
          }
          def languagesByAuthor = { authorName ->
            languages.language.findAll { it.author[0].text() == authorName }.collect {
              it.@name }.join(', ')
          }
          println "Languages by Wirth:" + languagesByAuthor('Wirth')
  18. XmlSlurper
      • Namespaces

          <languages xmlns:computer="Computer" xmlns:natural="Natural">
            <computer:language name="Java"/>
            <computer:language name="Groovy"/>
            <computer:language name="Erlang"/>
            <natural:language name="English"/>
            <natural:language name="German"/>
            <natural:language name="French"/>
          </languages>
  19. XmlSlurper

          languages = new XmlSlurper().parse(
            'computerAndNaturalLanguages.xml').declareNamespace(human: 'Natural')
          print "Languages: "
          println languages.language.collect { it.@name }.join(', ')
          print "Natural languages: "
          println languages.'human:language'.collect { it.@name }.join(', ')

      Output:

          Languages: Java, Groovy, Erlang, English, German, French
          Natural languages: English, German, French
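The namespace example can also be checked with assertions against an inline string. This is a sketch under the same assumptions as the slide (the prefix `human` is simply an alias we choose for the `Natural` namespace URI), using a shortened document and Groovy 3 or later for the `groovy.xml.XmlSlurper` import.

```groovy
import groovy.xml.XmlSlurper

def xmlText = '''<languages xmlns:computer="Computer" xmlns:natural="Natural">
  <computer:language name="Java"/>
  <computer:language name="Groovy"/>
  <natural:language name="English"/>
  <natural:language name="German"/>
</languages>'''

// Map our own prefix 'human' to the namespace URI 'Natural'
def languages = new XmlSlurper().parseText(xmlText).declareNamespace(human: 'Natural')

// Without a prefix, elements from every namespace match
def allNames = languages.language.collect { it.@name.text() }
// With the declared prefix, only elements in that namespace match
def naturalNames = languages.'human:language'.collect { it.@name.text() }

assert allNames == ['Java', 'Groovy', 'English', 'German']
assert naturalNames == ['English', 'German']
```

The prefix used in the query does not have to match the prefix used in the document; only the namespace URI matters.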
  20. XmlParser vs XmlSlurper
      • The difference is that the parser's structure is evaluated only once, whereas the slurper's paths may be evaluated on demand. "On demand" can be read as "more memory efficient but slower".
      • Ultimately it depends on how many paths or requests we need. Suppose we only want to know the value of an attribute in a certain part of the XML and then be done with it:
        • XmlParser processes all nodes, so a lot of objects are created and memory and CPU are spent.
        • XmlSlurper does not create the extra objects.
      • If we need all parts of the document anyway, the slurper loses its advantage and will be slower.
  21. XmlParser vs XmlSlurper
      • Both can transform the document, but the slurper assumes the document is constant, so we would first have to write the changes out and create a new slurper to read the new XML in. The parser lets us see the changes right away.
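The difference in how changes become visible can be sketched as follows. The example assumes Groovy 3 or later for the `groovy.xml` imports and uses a tiny hypothetical document; the variable names `eager`, `lazy`, and `reparsed` are illustrative only.

```groovy
import groovy.xml.StreamingMarkupBuilder
import groovy.xml.XmlParser
import groovy.xml.XmlSlurper

def source = '<langs><language>Java</language></langs>'

// XmlParser: the tree is modified in place and the change is visible at once
def eager = new XmlParser().parseText(source)
eager.appendNode('language', 'Groovy')
assert eager.language*.text() == ['Java', 'Groovy']

// XmlSlurper: the change is only recorded, not yet visible to queries
def lazy = new XmlSlurper().parseText(source)
lazy.appendNode { language('Groovy') }
assert lazy.language.size() == 1

// Write the document out and slurp it again to see the new element
def rewritten = new StreamingMarkupBuilder().bind { mkp.yield(lazy) }.toString()
def reparsed = new XmlSlurper().parseText(rewritten)
assert reparsed.language.size() == 2
```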
  22. Creating XML
      • Again, Groovy doesn't force us to use its own facilities: we can use the full power of Java-based XML processors, such as Xerces, from Groovy as well.
  23. Comparing Java and Groovy XML generation

          import org.w3c.dom.*;
          import javax.xml.parsers.*;
          import javax.xml.transform.*;
          import javax.xml.transform.dom.DOMSource;
          import javax.xml.transform.stream.StreamResult;
          import java.io.StringWriter;

          public class CreateXml {
            public static void main(String[] args) {
              DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
              try {
                DocumentBuilder db = dbf.newDocumentBuilder();
                Document doc = db.newDocument();

                Element langs = doc.createElement("langs");
                langs.setAttribute("type", "current");
                doc.appendChild(langs);

                Element language1 = doc.createElement("language");
                Text text1 = doc.createTextNode("Java");
                language1.appendChild(text1);
                langs.appendChild(language1);

                Element language2 = doc.createElement("language");
                Text text2 = doc.createTextNode("Groovy");
                language2.appendChild(text2);
                langs.appendChild(language2);
  24. Comparing Java and Groovy XML generation (continued)

                // Output the XML
                TransformerFactory tf = TransformerFactory.newInstance();
                Transformer transformer = tf.newTransformer();
                transformer.setOutputProperty(OutputKeys.INDENT, "yes");
                StringWriter sw = new StringWriter();
                StreamResult sr = new StreamResult(sw);
                DOMSource source = new DOMSource(doc);
                transformer.transform(source, sr);
                String xmlString = sw.toString();
                System.out.println(xmlString);
              } catch (ParserConfigurationException pce) {
                pce.printStackTrace();
              } catch (TransformerConfigurationException e) {
                e.printStackTrace();
              } catch (TransformerException e) {
                e.printStackTrace();
              }
            }
          }
  25. Comparing Java and Groovy XML generation
      • I know that some of you are crying "Foul!" right now. Plenty of third-party libraries can make this code more straightforward; JDOM and dom4j are two popular ones. But none of the Java libraries comes close to the simplicity of Groovy's MarkupBuilder.
  26. Comparing Java and Groovy XML generation

          def xml = new groovy.xml.MarkupBuilder()
          xml.langs(type:"current"){
            language("Java")
            language("Groovy")
            language("JavaScript")
          }

      That's it!
      • We are back to a nearly 1:1 ratio of code to XML.
      • It's almost like a DSL for building XML, thanks to Groovy's Metaobject Protocol (MOP).
  27. Metaobject Protocol (MOP)
      • Metaprogramming means writing programs that manipulate programs, including themselves.
      • In Groovy, we can use the MOP to invoke methods dynamically and to synthesize classes and methods on the fly. This can make it feel as if an object had conveniently changed its class.
  28. Metaobject Protocol (MOP)
      • The Java language is static: the Java compiler ensures that all methods exist before you can call them.
      • Groovy's builders demonstrate that one language's bug is another language's feature.
      • The API docs for MarkupBuilder contain no langs() method, no language() method, nor a method for any other element name.
      • Luckily, Groovy can catch these calls to methods that don't exist and do something productive with them. In the case of MarkupBuilder, it takes the phantom method calls and generates well-formed XML.
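To make the idea of catching phantom method calls concrete, here is a toy builder written with Groovy's `methodMissing` hook. It is only a sketch of the general technique: `TinyXmlBuilder` is a hypothetical class invented for this illustration, it does no escaping or indentation, and it is not how `MarkupBuilder` itself is implemented.

```groovy
// Hypothetical toy class: every unknown method call becomes an XML element
class TinyXmlBuilder {
    private final StringBuilder out = new StringBuilder()

    // Invoked by Groovy whenever a method that does not exist is called
    def methodMissing(String name, args) {
        def argList = args as List
        def attrs = argList.find { it instanceof Map } ?: [:]
        def text = argList.find { it instanceof CharSequence }
        def body = argList.find { it instanceof Closure }

        out << "<${name}"
        attrs.each { k, v -> out << " ${k}='${v}'" }
        out << '>'
        if (text != null) out << text
        if (body) {
            // Route nested calls such as language('Java') back to this builder
            body.delegate = this
            body.resolveStrategy = Closure.DELEGATE_FIRST
            body()
        }
        out << "</${name}>"
        this
    }

    String toString() { out.toString() }
}

def xml = new TinyXmlBuilder()
xml.langs(type: 'current') {
    language('Java')
    language('Groovy')
}
assert xml.toString() ==
    "<langs type='current'><language>Java</language><language>Groovy</language></langs>"
```

Neither `langs` nor `language` is declared anywhere, yet both calls succeed because the missing-method hook turns the method name into the element name.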
  29. GString: it's not what you think!
      • Snapshot from the Groovy API documentation. Nice suggestion, Gosnell!
  30. GString
      • We can use GString's ability to embed expressions into a string, along with Groovy's facility for creating multiline strings. This is useful for creating small XML fragments that we may need in code and tests.
  31. GString

          langs = ['C++' : 'Stroustrup', 'Java' : 'Gosling', 'Lisp' : 'McCarthy']
          content = ''
          langs.each { language, author ->
            fragment = """
            <language name="${language}">
              <author>${author}</author>
            </language>
            """
            content += fragment
          }
          xml = "<languages>${content}</languages>"
          println xml
  32. Downside
      • This only works for small fragments of XML.
      • The preferred approach in Groovy applications is to use builders, so that we don't have to mess with string manipulation.
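One concrete reason string manipulation gets messy is character escaping. The sketch below contrasts the two approaches for a value containing a reserved character; the author string is a hypothetical value picked for the illustration.

```groovy
import groovy.xml.MarkupBuilder

// Hypothetical value containing a character that is reserved in XML
def author = 'Kernighan & Ritchie'

// GString interpolation copies the value verbatim, so the bare '&'
// ends up in the markup and the fragment is not well-formed XML
def fragment = "<author>${author}</author>".toString()
assert fragment == '<author>Kernighan & Ritchie</author>'

// MarkupBuilder escapes the value while writing it out
def sw = new StringWriter()
new MarkupBuilder(sw).author(author)
assert sw.toString().contains('Kernighan &amp; Ritchie')
```

With the GString approach we would have to escape every interpolated value ourselves.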
  33. MarkupBuilder

          def sw = new StringWriter()
          def html = new groovy.xml.MarkupBuilder(sw)
          html.html{
            head{
              title("Links")
            }
            body{
              h1("Here are my HTML bookmarks")
              table(border:1){
                tr{
                  th("what")
                  th("where")
                }
                tr{
                  td("Groovy Articles")
                  td{
                    a(href:"http://ibm.com/developerworks", "DeveloperWorks")
                  }
                }
              }
            }
          }
          def f = new File("index.html")
          f.write(sw.toString())
  34. MarkupBuilder output

          <html>
            <head>
              <title>Links</title>
            </head>
            <body>
              <h1>Here are my HTML bookmarks</h1>
              <table border='1'>
                <tr>
                  <th>what</th>
                  <th>where</th>
                </tr>
                <tr>
                  <td>Groovy Articles</td>
                  <td>
                    <a href='http://ibm.com/developerworks'>DeveloperWorks</a>
                  </td>
                </tr>
              </table>
            </body>
          </html>
  35. Downside
      • For large XML documents, it is not memory efficient.
      • It misses some XML constructs, such as namespaces, processing instructions, and comments.
      • For these reasons, StreamingMarkupBuilder should be used instead.
  36. StreamingMarkupBuilder

          langs = ['C++' : 'Stroustrup', 'Java' : 'Gosling', 'Lisp' : 'McCarthy']
          xmlDocument = new groovy.xml.StreamingMarkupBuilder().bind {
            mkp.xmlDeclaration()
            mkp.declareNamespace(computer: "Computer")
            languages {
              comment << "Created using StreamingMarkupBuilder"
              langs.each { key, value ->
                computer.language(name: key) {
                  author (value)
                }
              }
            }
          }
          println xmlDocument
  37. StreamingMarkupBuilder output

          <?xml version="1.0"?>
          <languages xmlns:computer='Computer'>
            <!--Created using StreamingMarkupBuilder-->
            <computer:language name='C++'>
              <author>Stroustrup</author>
            </computer:language>
            <computer:language name='Java'>
              <author>Gosling</author>
            </computer:language>
            <computer:language name='Lisp'>
              <author>McCarthy</author>
            </computer:language>
          </languages>
  38. MarkupBuilder vs StreamingMarkupBuilder
      • MarkupBuilder creates a representation of the document in memory, which is then written out to whichever stream is designated.
      • StreamingMarkupBuilder only evaluates the document on demand, where the demand is driven by the writer requesting the next item.
      • This inversion of control means that, for StreamingMarkupBuilder, the document is never actually represented in memory; only the program that generates the document is.
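The on-demand behaviour of StreamingMarkupBuilder can be observed with a flag that is set inside the builder closure. This is a minimal sketch; the `evaluated` flag and the tiny document are assumptions made for the illustration.

```groovy
import groovy.xml.StreamingMarkupBuilder

def evaluated = false

// bind() only captures the closure; it does not run it yet
def doc = new StreamingMarkupBuilder().bind {
    evaluated = true
    langs(type: 'current') {
        language('Java')
    }
}
assert !evaluated

// The closure runs when a writer asks for the content
def out = new StringWriter()
doc.writeTo(out)
assert evaluated
assert out.toString().contains('<language>Java</language>')
```

Until something is written, `doc` is just a description of how to produce the markup, which is what allows output to be streamed to its destination.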
  39. Wait, XML processing is not OXM
      • OXM, Object/XML Mapping, is the act of converting an XML document to and from an object.
      • It is important when we need to work with Java objects instead of their XML form, e.g. SOAP web service requests and responses.
      • In web services we need to send and receive the exact field types, so the dynamic nature of Groovy doesn't have much to offer for these static, direct field mappings. Instead, the Java Architecture for XML Binding (JAXB) or another library (e.g. JiBX) should be used.
  40. XML using Groovy: conclusions
      • What we covered can be compared with the Java API for XML Processing (JAXP).
      • Groovy greatly simplifies XML processing and results in much shorter and more expressive code.
      • GPath lets us traverse XML and POGOs in a similar way.
      • Builders and parsers use the MOP and Groovy's dynamic features to provide a DSL-like syntax for XML processing.
      • Groovy is strongly recommended for XML processing, especially when we are about to write new code.
  41. References
      • InfoSet: http://www.informit.com/library/content.aspx?b=STY_XML_21days&seqNum=40
