Successfully reported this slideshow.

Large output in xml with unicode and namespace

0

Share

Upcoming SlideShare
ApiDesign
ApiDesign
Loading in …3
×
1 of 29
1 of 29

Large output in xml with unicode and namespace

0

Share

Download to read offline

Lightning talk for EuroPython 2012 on how to write large XML files with unicode characters and namespaces in Python. The video of the talk is available from http://www.youtube.com/watch?v=f1t2M2wcY2k#t=26m20s. The source code examples can be found at https://gist.github.com/3067859.

Lightning talk for EuroPython 2012 on how to write large XML files with unicode characters and namespaces in Python. The video of the talk is available from http://www.youtube.com/watch?v=f1t2M2wcY2k#t=26m20s. The source code examples can be found at https://gist.github.com/3067859.

More Related Content

Related Books

Free with a 14 day trial from Scribd

See all

Large output in xml with unicode and namespace

  1. 1. Large output in XML with Unicode and namespace Thomas Aglassinger http://roskakori.at
  2. 2. We wanted to write this:
  3. 3. We wanted to write this: XML
  4. 4. We wanted to write this: XML Unicode
  5. 5. We wanted to write this: Name XML Unicode spaces
  6. 6. We wanted to write this: Name Large XML Unicode spaces
  7. 7. We already knew how to read XML. ● xml.dom.minidom.parse() ● xml.etree.ElementTree.parse() ● xml.sax.parse() ● lxml.etree.parse() http://encyclopediadramatica.se/File:Bill_Nye_Expert.jpg
  8. 8. So we went to the Python Library... http://upload.wikimedia.org/wikipedia/commons/2/2b/Melk_-_Abbey_-_Library.jpg
  9. 9. ...and we were like: http://www.apple.com/switch/stories/ellenfeiss.html (in 2002)
  10. 10. xml.dom.minidom
  11. 11. xml.dom.minidom Explicit attribute for name space Verbose way to add attributes Many lines of code
  12. 12. xml.dom.minidom ● “Users [...] who would like to write less code for processing XML files should consider using the xml.etree.ElementTree module instead” (The Python Standard Library, Chapter 19, Structured Markup) http://www.destructoid.com/blogs/Sevre/femshep-5-a-space-opera-208844.phtml
  13. 13. xml.etree
  14. 14. xml.etree Clark notation instead of XPath Requires Python 2.7 Generally shorter Similar issues but wider with lxml.etree
  15. 15. Memory issues ● So far, XML-Document is built in memory ● Won't work well for large sets of data ● We need a streaming interface
  16. 16. codecs.open() and write()
  17. 17. codecs.open() and write() It just doen't feel “right” Manual escaping
  18. 18. saxutils.XMLGenerator
  19. 19. saxutils.XMLGenerator No support for <x/>, only <x></x>
  20. 20. Lack of basic validation ● Are all tags closed? ● In the correct order? ● Has a namespace been registered before usage?
  21. 21. So we had to go all kinky... http://mylittlefacewhen.com/f/3781/
  22. 22. ...and write yet-another XML module http://www.110pounds.com/?p=6880
  23. 23. Before you judge too harshly: It just writes XML!
  24. 24. loxun
  25. 25. loxun Compact Defaults to namespace UTF-8 output syntax Compact attribute syntax
  26. 26. loxun Supports with- statement Streaming interface for low memory usage No dependencies on other modules Pure Python 2.5+ Optimizes <x></x> to simply <x/>
  27. 27. Raises XmlError if... ● ...you add references to undefined name spaces ● ...if you forget to close tags (elements) ● ...if you build non-well formed documents ● ...if you add non-ASCII characters in 8-bit strings
  28. 28. Available from: ● http://pypi.python.org/pypi/loxun/ ● https://github.com/roskakori/loxun ● Open Source $ sudo pip install loxun Downloading/unpacking loxun Downloading loxun-1.3.zip Running setup.py egg_info for package loxun Installing collected packages: loxun Running setup.py install for loxun Code examples for this talk: Successfully installed loxun https://gist.github.com/3067859
  29. 29. Try loxun for: Large output in XML with Unicode and namespace Also writes small ASCII files!

×