Large output in XMLwith Unicode and namespace         Thomas Aglassinger          http://roskakori.at
We wanted to write this:
We wanted to write this:     XML
We wanted to write this:     XML      Unicode
We wanted to write this:                         Name     XML      Unicode                        spaces
We wanted to write this:                                 NameLarge        XML      Unicode                                ...
We already knew how to read XML.●   xml.dom.minidom.parse()●   xml.etree.ElementTree.parse()●   xml.sax.parse()●   lxml.et...
So we went to the Python Library...         http://upload.wikimedia.org/wikipedia/commons/2/2b/Melk_-_Abbey_-_Library.jpg
...and we were like:     http://www.apple.com/switch/stories/ellenfeiss.html (in 2002)
xml.dom.minidom
xml.dom.minidom            Explicit attribute            for name space            Verbose way to             add attribut...
xml.dom.minidom●   “Users [...] who would like to write less code    for processing XML files should consider using    the...
xml.etree
xml.etree          Clark notation                                instead of XPath RequiresPython 2.7 Generally shorter    ...
Memory issues●   So far, XML-Document is built in memory●   Wont work well for large sets of data●   We need a streaming i...
codecs.open() and write()
codecs.open() and write()                          It just doent                            feel “right” Manualescaping
saxutils.XMLGenerator
saxutils.XMLGeneratorNo support for <x/>,   only <x></x>
Lack of basic validation●   Are all tags closed?●   In the correct order?●   Has a namespace been    registered before usa...
So we had to go all kinky...          http://mylittlefacewhen.com/f/3781/
...and write yet-another XML module             http://www.110pounds.com/?p=6880
Before you judge too harshly:            It just        writes XML!
loxun
loxun                            Compact Defaults to               namespaceUTF-8 output                 syntax           ...
loxun                   Supports                      with-                   statement                Streaming interface...
Raises XmlError if...●   ...you add references to undefined name    spaces●   ...if you forget to close tags (elements)●  ...
Available from:●   http://pypi.python.org/pypi/loxun/●   https://github.com/roskakori/loxun●   Open Source                ...
Try loxun for:     Large output in XMLwith Unicode and namespace                        Also writes                       ...
Upcoming SlideShare
Loading in …5
×

Large output in xml with unicode and namespace

1,430 views

Published on

Lightning talk for EuroPython 2012 on how to write large XML files with unicode characters and namespaces in Python. The video of the talk is available from http://www.youtube.com/watch?v=f1t2M2wcY2k#t=26m20s. The source code examples can be found at https://gist.github.com/3067859.

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
1,430
On SlideShare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
5
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Large output in xml with unicode and namespace

  1. 1. Large output in XMLwith Unicode and namespace Thomas Aglassinger http://roskakori.at
  2. 2. We wanted to write this:
  3. 3. We wanted to write this: XML
  4. 4. We wanted to write this: XML Unicode
  5. 5. We wanted to write this: Name XML Unicode spaces
  6. 6. We wanted to write this: NameLarge XML Unicode spaces
  7. 7. We already knew how to read XML.● xml.dom.minidom.parse()● xml.etree.ElementTree.parse()● xml.sax.parse()● lxml.etree.parse() http://encyclopediadramatica.se/File:Bill_Nye_Expert.jpg
  8. 8. So we went to the Python Library... http://upload.wikimedia.org/wikipedia/commons/2/2b/Melk_-_Abbey_-_Library.jpg
  9. 9. ...and we were like: http://www.apple.com/switch/stories/ellenfeiss.html (in 2002)
  10. 10. xml.dom.minidom
  11. 11. xml.dom.minidom Explicit attribute for name space Verbose way to add attributes Many lines of code
  12. 12. xml.dom.minidom● “Users [...] who would like to write less code for processing XML files should consider using the xml.etree.ElementTree module instead” (The Python Standard Library, Chapter 19, Structured Markup) http://www.destructoid.com/blogs/Sevre/femshep-5-a-space-opera-208844.phtml
  13. 13. xml.etree
  14. 14. xml.etree Clark notation instead of XPath RequiresPython 2.7 Generally shorter Similar issues but wider with lxml.etree
  15. 15. Memory issues● So far, XML-Document is built in memory● Wont work well for large sets of data● We need a streaming interface
  16. 16. codecs.open() and write()
  17. 17. codecs.open() and write() It just doent feel “right” Manualescaping
  18. 18. saxutils.XMLGenerator
  19. 19. saxutils.XMLGeneratorNo support for <x/>, only <x></x>
  20. 20. Lack of basic validation● Are all tags closed?● In the correct order?● Has a namespace been registered before usage?
  21. 21. So we had to go all kinky... http://mylittlefacewhen.com/f/3781/
  22. 22. ...and write yet-another XML module http://www.110pounds.com/?p=6880
  23. 23. Before you judge too harshly: It just writes XML!
  24. 24. loxun
  25. 25. loxun Compact Defaults to namespaceUTF-8 output syntax Compact attribute syntax
  26. 26. loxun Supports with- statement Streaming interface for low memory usage No dependencies on other modulesPure Python 2.5+ Optimizes <x></x> to simply <x/>
  27. 27. Raises XmlError if...● ...you add references to undefined name spaces● ...if you forget to close tags (elements)● ...if you build non-well formed documents● ...if you add non-ASCII characters in 8-bit strings
  28. 28. Available from:● http://pypi.python.org/pypi/loxun/● https://github.com/roskakori/loxun● Open Source $ sudo pip install loxun Downloading/unpacking loxun Downloading loxun-1.3.zip Running setup.py egg_info for package loxun Installing collected packages: loxun Running setup.py install for loxunCode examples for this talk: Successfully installed loxunhttps://gist.github.com/3067859
  29. 29. Try loxun for: Large output in XMLwith Unicode and namespace Also writes small ASCII files!

×