VTD-XML: The Future of XML Processing

Published in: Technology, News & Politics

  1. VTD-XML: The Future of XML Processing
     Albert Guo, junyuo@gmail.com
  2. VTD-XML Agenda
     - Motivations Behind VTD-XML
     - Why VTD-XML?
     - When to Use VTD-XML?
     - Known Limitations
     - Basic Concept
     - Essential Classes
     - Shortcomings
     - Typical Programming Flows
     - Demo
     - Reference
  3. Motivations Behind VTD-XML
     The older XML processing models have *numerous* well-known issues; a few are summarized here.
     Comparison with DOM, SAX and pull parsing: http://vtd-xml.sourceforge.net/userGuide/5.html
  4. Why VTD-XML?
     According to the project, VTD-XML is a next-generation XML processing model that is simultaneously:
     - the world's most memory-efficient random-access XML parser;
     - the world's fastest XML parser;
     - the world's fastest XPath 1.0 implementation;
     - the world's most efficient XML indexer, integrating seamlessly with your XML applications;
     - the world's only XML parser capable of incremental updates, able to cut, paste, split and assemble XML documents with maximum efficiency;
     - the world's only XML parser that lets you use XPath to process XML documents as large as 256 GB.
  5. When to Use VTD-XML?
     Scenarios in which you may consider using VTD-XML:
     - large XML files that DOM can't handle
     - performance-critical transactional Web services/SOA applications
     - native XML database applications
     - network-based XML content switching, routing and security applications
  6. Known Limitations
     - External entities (those declared within the DTD) are not yet supported.
     - The DTD is not yet processed (it is returned as a single VTD record).
     - Schema validation is planned for a future release.
     - Extremely long element/attribute names (>= 512 characters) or extremely deep documents (>= 255 levels) cause a parse exception.
     - Source: http://vtd-xml.sourceforge.net/userGuide/0.html
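The depth and name-length limits above can be checked before a document ever reaches VTD-XML. The sketch below is a hypothetical pre-flight check (VtdLimitCheck and its methods are illustrative names, not part of VTD-XML) built on the JDK's own StAX parser (javax.xml.stream). The thresholds are copied from the slide; the sketch counts the root element as level 1, which may differ by one from how VTD-XML counts depth.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical pre-flight check (not part of VTD-XML): measures element depth and the
// longest element/attribute qualified name using the JDK's StAX parser.
final class VtdLimitCheck {
    // Thresholds copied from the "Known Limitations" slide.
    static final int DEPTH_LIMIT = 255;  // >= 255 levels reportedly causes a parse exception
    static final int NAME_LIMIT = 512;   // names >= 512 chars reportedly cause a parse exception

    /** Returns {maximum element depth (root = 1), longest qualified name length}. */
    static int[] measure(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not resolve external entities; VTD-XML does not support them either.
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        int depth = 0, maxDepth = 0, longestName = 0;
        try {
            XMLStreamReader r = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (r.hasNext()) {
                    int event = r.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        maxDepth = Math.max(maxDepth, ++depth);
                        longestName = Math.max(longestName, qnameLength(r.getPrefix(), r.getLocalName()));
                        for (int i = 0; i < r.getAttributeCount(); i++) {
                            longestName = Math.max(longestName,
                                    qnameLength(r.getAttributePrefix(i), r.getAttributeLocalName(i)));
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        depth--;
                    }
                }
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("document is not well-formed", e);
        }
        return new int[] {maxDepth, longestName};
    }

    /** True if the document stays strictly below both thresholds. */
    static boolean withinLimits(String xml) {
        int[] m = measure(xml);
        return m[0] < DEPTH_LIMIT && m[1] < NAME_LIMIT;
    }

    // Length of "prefix:local", or just "local" when there is no prefix (null or "").
    private static int qnameLength(String prefix, String local) {
        return (prefix == null || prefix.isEmpty()) ? local.length() : prefix.length() + 1 + local.length();
    }
}
```

Running the check costs a full streaming pass, so it is mainly worth doing for documents from untrusted or unknown sources.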
  7. Basic Concept
     - Non-extractive tokenization based on the Virtual Token Descriptor (VTD): 64-bit integers encode each token's offset, length, type and depth.
     - The XML document itself is kept intact and undecoded.
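Packing offset, length, type and depth into one 64-bit integer is plain bit manipulation. In the sketch below, VtdRecord is a hypothetical class, and the field widths (4-bit type, 8-bit depth, 20-bit length, 2 reserved bits, 30-bit offset) are an assumption modeled loosely on the published VTD record layout rather than taken from this presentation.

```java
// Hypothetical illustration of a Virtual Token Descriptor packed into a single long.
// Assumed layout (high to low bits): type:4 | depth:8 | length:20 | reserved:2 | offset:30
final class VtdRecord {
    private static final int OFFSET_BITS = 30, LENGTH_BITS = 20, DEPTH_BITS = 8, TYPE_BITS = 4;
    private static final int LENGTH_SHIFT = 32, DEPTH_SHIFT = 52, TYPE_SHIFT = 60;

    static long pack(int type, int depth, int length, int offset) {
        checkRange(type, TYPE_BITS, "type");
        checkRange(depth, DEPTH_BITS, "depth");
        checkRange(length, LENGTH_BITS, "length");
        checkRange(offset, OFFSET_BITS, "offset");
        return ((long) type << TYPE_SHIFT)
                | ((long) depth << DEPTH_SHIFT)
                | ((long) length << LENGTH_SHIFT)
                | offset;  // bits 31-30 stay zero (reserved)
    }

    static int type(long rec)   { return (int) (rec >>> TYPE_SHIFT); }
    static int depth(long rec)  { return (int) ((rec >>> DEPTH_SHIFT) & mask(DEPTH_BITS)); }
    static int length(long rec) { return (int) ((rec >>> LENGTH_SHIFT) & mask(LENGTH_BITS)); }
    static int offset(long rec) { return (int) (rec & mask(OFFSET_BITS)); }

    private static long mask(int bits) { return (1L << bits) - 1; }

    // Rejects values that would spill into a neighbouring field.
    private static void checkRange(int value, int bits, String field) {
        if (value < 0 || value > mask(bits)) {
            throw new IllegalArgumentException(field + " does not fit in " + bits + " bits: " + value);
        }
    }
}
```

A record of this shape holds only coordinates into the original byte buffer, so the document never has to be copied into strings. With 30 offset bits it could address at most 2^30 bytes (1 GiB), one more reason to treat the widths as illustrative; 8 depth bits hold values up to 255, which lines up with the depth limit on the Known Limitations slide.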
  8. Basic Concept (cont.)
     - In other words, in the vast majority of cases string allocation is *unnecessary*: it is nothing but a waste of CPU and memory.
     - VTD-XML performs many string operations directly on VTD records:
       - String-to-VTD-record comparison (both equality tests and lexicographic ordering)
       - Direct conversion from VTD records to ints, longs, floats and doubles
     - VTD-record-to-String conversion is also provided, but avoid it whenever possible for performance reasons.
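To make "string operations directly on VTD records" concrete: given only a token's offset and length into the raw bytes, the helpers below compare it with a String and parse it as an int without allocating a String. RawToken and its methods are hypothetical names, and the sketch assumes ASCII content for brevity, whereas VTD-XML itself decodes according to the document's encoding and exposes this kind of work through VTDNav methods that take a record index.

```java
// Hypothetical sketch of non-extractive token operations: a token is identified only by
// (offset, length) into the raw document bytes, and no String is allocated. ASCII assumed.
final class RawToken {
    /** Lexicographic comparison of the token bytes against s (negative, zero or positive). */
    static int compare(byte[] doc, int offset, int length, String s) {
        int n = Math.min(length, s.length());
        for (int i = 0; i < n; i++) {
            int diff = (doc[offset + i] & 0xFF) - s.charAt(i);
            if (diff != 0) return diff;
        }
        return length - s.length();  // shorter token sorts first when it is a prefix
    }

    /** Equality test built on compare(). */
    static boolean matches(byte[] doc, int offset, int length, String s) {
        return compare(doc, offset, length, s) == 0;
    }

    /** Parses a decimal int straight from the bytes, ignoring surrounding whitespace. */
    static int parseInt(byte[] doc, int offset, int length) {
        int i = offset, end = offset + length;
        while (i < end && Character.isWhitespace(doc[i])) i++;
        while (end > i && Character.isWhitespace(doc[end - 1])) end--;
        boolean negative = i < end && doc[i] == '-';
        if (negative || (i < end && doc[i] == '+')) i++;
        if (i == end) throw new NumberFormatException("empty numeric token");
        long value = 0;
        for (; i < end; i++) {
            int d = doc[i] - '0';
            if (d < 0 || d > 9) throw new NumberFormatException("bad digit at offset " + i);
            value = value * 10 + d;
            // Stop early so the long accumulator itself can never overflow.
            if (value > (long) Integer.MAX_VALUE + 1) throw new NumberFormatException("overflow");
        }
        long signed = negative ? -value : value;
        if (signed > Integer.MAX_VALUE) throw new NumberFormatException("overflow");
        return (int) signed;
    }
}
```

For example, in the bytes of "<age> 42 </age>", the element name is the token (1, 3) and the text content is the token (5, 4); both can be inspected without creating a String.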
  9. Basic Concept (cont.)
     - VTD-XML's document hierarchy consists *exclusively* of elements.
     - You move a single, global cursor to different locations in the document tree.
     - Many VTDNav methods identify a VTD record by its index value.
     - An index of -1 means "no such record".
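To illustrate the single-cursor model, here is a toy navigator over a flat, document-ordered array of element depths. It is a hypothetical sketch, not VTDNav: there is one cursor, each navigation method returns the index of the element it lands on, -1 signals "no such record", and the cursor moves only on success.

```java
// Hypothetical sketch of a single global cursor over a flat, document-ordered table of
// element records. Only elements are stored, and each record here keeps just its depth.
final class ElementCursor {
    private final int[] depths;  // depths[i] = nesting depth of the i-th element in document order
    private int current = 0;     // the cursor starts at the root element

    ElementCursor(int... depths) {
        this.depths = depths.clone();
    }

    int current() {
        return current;
    }

    /** The first child, if any, is the very next record and is exactly one level deeper. */
    int toFirstChild() {
        int next = current + 1;
        if (next < depths.length && depths[next] == depths[current] + 1) {
            current = next;
            return current;
        }
        return -1;
    }

    /** Skips the current element's subtree, then checks for a record at the same depth. */
    int toNextSibling() {
        int j = current + 1;
        while (j < depths.length && depths[j] > depths[current]) j++;
        if (j < depths.length && depths[j] == depths[current]) {
            current = j;
            return current;
        }
        return -1;
    }

    /** The parent is the nearest preceding record that is one level shallower. */
    int toParent() {
        for (int j = current - 1; j >= 0; j--) {
            if (depths[j] == depths[current] - 1) {
                current = j;
                return current;
            }
        }
        return -1;
    }
}
```

For a document shaped like root > person > (name, gender), person > name, the depth table is {0, 1, 2, 2, 1, 2}; moving to the first child twice lands on index 2 (name), and asking for the next sibling of gender returns -1 while leaving the cursor where it was.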
  10. Essential Classes
  11. Essential Classes (cont.)
  12. Shortcomings
     - Poor exception handling: if the internal parsing step does not execute properly, parseFile just returns false and reports no exception message.
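Until the library reports more, a caller can at least turn the silent false into an exception with some context. The sketch below is a hypothetical helper (StrictParse, BooleanParser and parseOrThrow are illustrative names); its functional interface has the same shape as VTDGen.parseFile(String, boolean), so a method reference can be passed in.

```java
import java.io.File;

// Hypothetical helper (not part of VTD-XML): wraps a boolean-returning parse call,
// such as VTDGen.parseFile(String, boolean), so that failures raise an exception with context.
final class StrictParse {
    /** Same shape as VTDGen.parseFile(String fileName, boolean ns). */
    @FunctionalInterface
    interface BooleanParser {
        boolean parse(String fileName, boolean namespaceAware);
    }

    static void parseOrThrow(BooleanParser parser, String fileName, boolean namespaceAware) {
        File f = new File(fileName);
        // Rule out the most common I/O causes first, so the message says something useful.
        if (!f.isFile()) {
            throw new IllegalArgumentException("not a regular file: " + f.getAbsolutePath());
        }
        if (!f.canRead()) {
            throw new IllegalArgumentException("file is not readable: " + f.getAbsolutePath());
        }
        if (!parser.parse(fileName, namespaceAware)) {
            // The parser itself gives no detail, so all we can report is which file failed.
            throw new IllegalStateException("parse failed for " + f.getAbsolutePath());
        }
    }
}
```

With VTD-XML on the classpath, a call would look like StrictParse.parseOrThrow(vg::parseFile, "person.xml", true), where vg is a VTDGen and person.xml is a placeholder file name. If you need the parser's own error message, the setDoc() plus parse() route described under Typical Programming Flows throws checked exceptions (such as ParseException) instead of returning false.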
  13. Shortcomings (cont.)
     - Add a BufferedInputStream to the parseFile method to avoid exceeding the maximum read-buffer size on UNIX platforms.
     - You then need to modify build.bat to rebuild the VTD-XML jar file, and put the new jar on your classpath:
       // add the commons-io jar file to the first line
       javac -classpath .;D:\lib\commons-io-1.4\commons-io-1.4.jar com\ximpleware\*.java
       javac com\ximpleware\xpath\*.java
       javac com\ximpleware\parser\*.java
       …
     - Finally, run build.bat; it generates the new jar file for you.
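The fix above patches parseFile and rebuilds the jar. An alternative that leaves the jar untouched, assuming your VTD-XML version exposes VTDGen.setDoc(byte[]) and VTDGen.parse(boolean) as the Typical Programming Flows slide suggests, is to read the file yourself through a BufferedInputStream and hand the resulting array to setDoc. Reading in a loop matters because a single InputStream.read() call may return fewer bytes than requested. DocLoader and readAll are illustrative names, and the 8 KB chunk size is an arbitrary choice.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical loader: reads a whole file through a BufferedInputStream in fixed-size
// chunks, so the result never depends on how many bytes a single read() call returns.
final class DocLoader {
    static byte[] readAll(String fileName) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(fileName));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] chunk = new byte[8192];
            int n;
            // read() returns -1 at end of stream; any other value is the number of bytes read.
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

The returned array would then be passed to vg.setDoc(bytes) followed by vg.parse(true), where vg is a VTDGen; unlike parseFile, parse reports failures as exceptions.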
  14. Typical Programming Flows
     1. Parse the document in one of three ways:
        - call VTDGen's parseFile(…);
        - start with a byte buffer containing the XML content, call VTDGen's setDoc(), then call VTDGen's parse(); or
        - call VTDGen's loadIndex(…).
     2. Obtain a VTDNav instance from VTDGen.
     3. Then either:
        - move VTDNav's cursor manually to various locations and perform the corresponding application logic; or
        - instantiate an AutoPilot for node iteration and XPath evaluation, and perform the corresponding application logic.
  15. Demo
  16. Demo 1: Add an <age> tag after <gender>
  17. Demo 1: Add an <age> tag after <gender> (cont.)
     - Compiled the XPath expression
     - Bound it to the VTDNav
     - Assigned the age value
     - Moved the cursor to <gender> and added the <age> tag after the <gender> tag
     - Wrote the result to a new XML file
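The demos rely on VTD-XML's incremental-update capability: instead of building a tree and re-serializing it, an edit can be recorded against byte offsets in the original buffer and applied while the output is written. The sketch below illustrates that idea in plain Java. SpliceEditor is a hypothetical class, not VTD-XML's XMLModifier, and it works on raw byte offsets that a real program would obtain from the navigator; here they are found with String.indexOf on an ASCII document.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of offset-based incremental editing (not VTD-XML's XMLModifier):
// edits are recorded against byte offsets in the original document and only applied
// when output() is called, by copying the untouched byte ranges verbatim.
final class SpliceEditor {
    private static final class Edit {
        final int offset;        // position in the original document
        final int deleteLength;  // number of original bytes to drop at offset
        final byte[] insert;     // bytes to emit at offset

        Edit(int offset, int deleteLength, byte[] insert) {
            this.offset = offset;
            this.deleteLength = deleteLength;
            this.insert = insert;
        }
    }

    private final byte[] doc;
    private final List<Edit> edits = new ArrayList<>();

    SpliceEditor(byte[] doc) {
        this.doc = doc;
    }

    /** Records an insertion of UTF-8 text before the byte at offset. */
    void insertAt(int offset, String text) {
        edits.add(new Edit(offset, 0, text.getBytes(StandardCharsets.UTF_8)));
    }

    /** Records removal of length bytes starting at offset. */
    void remove(int offset, int length) {
        edits.add(new Edit(offset, length, new byte[0]));
    }

    /** Applies all recorded edits in offset order; the original array is never modified. */
    byte[] output() {
        List<Edit> sorted = new ArrayList<>(edits);
        sorted.sort(Comparator.comparingInt(e -> e.offset));  // stable: ties keep recording order
        ByteArrayOutputStream out = new ByteArrayOutputStream(doc.length);
        int pos = 0;  // first original byte not yet copied or skipped
        for (Edit e : sorted) {
            if (e.offset < pos || e.offset + e.deleteLength > doc.length) {
                throw new IllegalStateException("overlapping or out-of-range edit at offset " + e.offset);
            }
            out.write(doc, pos, e.offset - pos);      // unchanged bytes before the edit
            out.write(e.insert, 0, e.insert.length);  // new content, if any
            pos = e.offset + e.deleteLength;          // skip removed bytes, if any
        }
        out.write(doc, pos, doc.length - pos);        // unchanged tail
        return out.toByteArray();
    }
}
```

Because untouched regions are copied byte for byte, the cost of an edit is proportional to the size of the output rather than to the work of rebuilding and re-serializing a tree. Removing the <age> tag again, as in Demo 2, is a single remove() over the element's byte range.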
  18. Demo 2: Remove the <age> tag
  19. Demo 2: Remove the <age> tag (cont.)
     - Compiled the XPath expression
     - Bound it to the VTDNav
     - Removed <age>
     - Wrote the result to a new XML file
  20. Demo 3: Add contact info after the <age> tag
  21. Demo 3: Add contact info after the <age> tag (cont.)
     - Compiled the XPath expression
     - Bound it to the VTDNav
     - Assigned the age value
     - Inserted the new value after the <gender> tag
     - Wrote the result to a new XML file
  22. Demo 4: Visit the XML file
     2009-12-12 14:30:31,187 DEBUG xml.XmlTest.visitPersonData(XmlTest.java:339) - name=Albert, gender=男
     2009-12-12 14:30:31,187 DEBUG xml.XmlTest.visitPersonData(XmlTest.java:341) - phone=02-11111111
     2009-12-12 14:30:31,187 DEBUG xml.XmlTest.visitPersonData(XmlTest.java:341) - phone=0911111111
     2009-12-12 14:30:31,187 DEBUG xml.XmlTest.visitPersonData(XmlTest.java:341) - phone=0915555555
     2009-12-12 14:30:31,187 DEBUG xml.XmlTest.visitPersonData(XmlTest.java:339) - name=Mandy, gender=女
     2009-12-12 14:30:31,187 DEBUG xml.XmlTest.visitPersonData(XmlTest.java:341) - phone=02-22222222
     2009-12-12 14:30:31,187 DEBUG xml.XmlTest.visitPersonData(XmlTest.java:341) - phone=0912222222
     2009-12-12 14:30:31,187 DEBUG xml.XmlTest.visitPersonData(XmlTest.java:339) - name=Verio, gender=男
     2009-12-12 14:30:31,187 DEBUG xml.XmlTest.visitPersonData(XmlTest.java:341) - phone=0913333333
     (In the log, 男 means male and 女 means female.)
  23. Demo 4: Visit the XML file (cont.)
     - Compiled the XPath expression
     - Bound it to the VTDNav
  24. Demo 4: Visit the XML file (cont.)
  25. Reference
     - Official site: http://vtd-xml.sourceforge.net/
     - Jar files, source code, sample code: http://sourceforge.net/projects/vtd-xml/files/
     - JavaDoc: http://vtd-xml.sourceforge.net/javadoc/
     - Accelerate WSS applications with VTD-XML: http://www.javaworld.com/javaworld/jw-01-2007/jw-01-vtd.html
  26. Reference (cont.)
     - Process SOAP with VTD-XML: http://jimmyzhang.sys-con.com/node/48764/mobile
     - XPath tutorial: http://www.w3schools.com/XPath/default.asp
     - Sample code: http://sites.google.com/site/junyuo/Home/code
