XML in software development  Technical overview  Lars Marius Garshol  Development Manager, Ontopia  <larsga@ontopia.net>  ...
Who speaks?  •   Lars Marius Garshol       – Development manager at Ontopia, and one of the founders       – Author of Def...
My personal XML history  • Started with XML in 1997       – started my MSc thesis on content management just as XML work  ...
Overview  • Introduction  • XML and application architecture       – impedance mismatch       – web services  • Common XML...
Introduction                         What is XML really?                         Data models                         Inter...
XML is a way to represent data  • XML is one of many ways to do this  • XML is a data format (or syntax)       – used when...
Some data representations  • Relational       – tabular, rows and columns       – used by relational databases       – pri...
So, what is XML good for?  • Well, it was created for documents...       – <p>allows <term>mixed content</term>, which is ...
Why XML is good for interchange  • Standard is done right       –   short, implementable, precise, formal, readable, hacka...
XML and architecture                         Traditional information systems                         The impedance mismatc...
Information systems  • Information-centric computing has traditionally been about    information systems  • Typically, the...
Traditional 2-tier architecture                             Application #2       Application #3                           ...
XML enters the picture© 2003-2004 Ontopia AS    13   http://www.ontopia.net/
Impedance mismatch  • The OO/RDBMS impedance mismatch                               Objects       – object-oriented langua...
A very common architecture                         Objects                  <?xml version="1.0" encoding="iso-8859-1"?>   ...
The brave new world of XML  • Originally we had the OO-RDBMS mismatch  • XML adds the OO-XML and XML-RDBMS mismatches     ...
But is XML really all bad?  • Imagine an RDBMS/object interchange format       – would support RDBMS/object export and imp...
So, what to do?  • XML is already here       – all the big vendors are pushing it       – government standards and custome...
An example application  • From January 2003 the EU required all member states to    submit individual case safety reports ...
Architecture of the application© 2003-2004 Ontopia AS   20        http://www.ontopia.net/
The internals of the application© 2003-2004 Ontopia AS   21         http://www.ontopia.net/
The XML part                                           Obj.model                         Export   Import© 2003-2004 Ontopi...
Native XML databases  • XML databases have been on the rise for the past few years       – these are databases whose stora...
Using an XML database© 2003-2004 Ontopia AS   24   http://www.ontopia.net/
Other considerations  • Using an XML database would have simplified the regional    applications       – no need for the o...
A different kind of information system  • RSS is       – a simple XML format for newsfeeds       – probably the simplest u...
Information system?    topicmaps.bond.edu.au                                                                              ...
Web services                         What they are                         The promise of web services© 2003-2004 Ontopia ...
What is a web service, anyway?  • Basically any software service made available over http       – must be intended to be i...
SOAP  • Essentially a wrapper for XML messages  • Consists of       – a header (with routing information etc)       – a bo...
Web services and architecture                         Web service               <?xml version="1.0" encoding="iso-8859-1"?...
The promise of web services  • Connect legacy applications  • Create services anyone can connect to and use  • Integrate d...
A word of caution  • Weve heard all this before  • CORBA was widely touted as doing the same thing in the    90s       –  ...
Another caution  • Integrating applications is not really the issue       – what is necessary is to integrate the informat...
What web services are, second try  • Web services are an idea more than anything else  • In some cases new technology make...
Web services?    topicmaps.bond.edu.au                                 <?xml version="1.0" encoding="iso-8859-1"?>        ...
Common XML challenges                         Import/export                         Important groups of tools             ...
Deserialization  • That is, building an object structure from XML  • Usually involves some level of validation as well  • ...
SAX  • Standard for event-based parser APIs       –   passes the document to the application piece by piece       –   some...
DOM  • Presents the document as an object structure  • W3C Recommendation       – widely supported and widely derided     ...
SAX vs DOM  • Or, rather, event-based vs tree-based       – most XML technologies use one of these two approaches       – ...
XPath  • A simple query language for XML       – remarkably simple to learn given its expressive power       – graph-trave...
Data binding tools  • Tools that simplify serialization and deserialization       – automate as much as possible of those ...
Validation  • Validation is to ensure the correctness of incoming data       –   that every <person> has a <birth-date>   ...
Schema languages  • DTDs       – part of XML 1.0, but only supports structural constraints       – serious problem: the do...
Serialization  • The opposite of deserialization: writing XML from objects  • Straightforward, but some pitfalls       – r...
Importing XML to an RDBMS  • A form of deserialization, but with issues of its own  • Typical issues are       –   how to ...
Writing XML from an RDBMS  • A special kind of serialization  • Much easier than going the other way  • Main problem is ma...
XQuery  • The query language for XML databases in the future  • Embeds XPath inside a functional programming language  • M...
SQL/XML  • ISO SC32 is working on adding XML support to SQL       – this involves columns whose data type is XML       – o...
Wrapping up                         What XML means for developers                         Resources to learn more© 2003-20...
XML and software development  • The possibilities for interchange and integration are not new       – XML makes them easie...
Where to learn more  • http://www.xml.com  • http://www.cafeconleche.org  • The XML-DEV mailing list  • http://www.w3.org/...
Upcoming SlideShare
Loading in …5
×

XML in software development

2,322 views

Published on

Old presentation about the architectural implications of XML.

Published in: Technology, Education
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
2,322
On SlideShare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
11
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

XML in software development

  1. 1. XML in software development Technical overview Lars Marius Garshol Development Manager, Ontopia <larsga@ontopia.net> 2004-09-29© 2003-2004 Ontopia AS 1 http://www.ontopia.net/
  2. 2. Who speaks? • Lars Marius Garshol – Development manager at Ontopia, and one of the founders – Author of Definitive XML Application Development, published by Prentice-Hall – Wrote the xmlproc validating parser in Python – Responsible for translation of SAX to Python – Member of ISO SC34, which developed SGML – Editor of parts of the topic map standard (ISO 13250-2 and 13250-3) – Editor of the TMQL standard (topic map query language, ISO 18048) – Maintainer of Free XML Tools web site • Ontopia – Leading vendor of topic map software – “The Oracle of Topic Maps” – Norwegian company with partners world-wide© 2003-2004 Ontopia AS 2 http://www.ontopia.net/
  3. 3. My personal XML history • Started with XML in 1997 – started my MSc thesis on content management just as XML work was taking off – followed the XML process from the start – believed all the promises that XML would make it possible to find information and exchange anything with anyone • Now I work with topic maps – XML turned out not to be what I was looking for – many of the supporting standards I do not think good enough – am now a bitter and disappointed man – will return to this at the end© 2003-2004 Ontopia AS 3 http://www.ontopia.net/
  4. 4. Overview • Introduction • XML and application architecture – impedance mismatch – web services • Common XML-related tasks – XML tools and standards • Conclusion© 2003-2004 Ontopia AS 4 http://www.ontopia.net/
  5. 5. Introduction What is XML really? Data models Interchange and storage© 2003-2004 Ontopia AS 5 http://www.ontopia.net/
  6. 6. XML is a way to represent data • XML is one of many ways to do this • XML is a data format (or syntax) – used when storing XML in files – also used when transmitting XML • XML has several data models – used in APIs, XML databases, and query languages – some support for this, not main usage© 2003-2004 Ontopia AS 6 http://www.ontopia.net/
  7. 7. Some data representations • Relational – tabular, rows and columns – used by relational databases – primary focus on storage, limited interchange with CSV files • Object-oriented – objects with properties and methods – used by most programming languages today – primary focus on application-internal representation – some interchange, also some database support • XML – tree of labeled nodes – primary focus on interchange – some database support© 2003-2004 Ontopia AS 7 http://www.ontopia.net/
  8. 8. So, what is XML good for? • Well, it was created for documents... – <p>allows <term>mixed content</term>, which is unusual</p> – also strictly preserves order everywhere (except for attributes) • XML works very well for documents • XML also works for data – however, the document features make it more complicated than necessary for implementors – in use it is quite straightforward, though weak on references – for storage it is not optimal – for interchange it is still the best alternative© 2003-2004 Ontopia AS 8 http://www.ontopia.net/
  9. 9. Why XML is good for interchange • Standard is done right – short, implementable, precise, formal, readable, hackable – everything is Unicode all the way: no internationalization problems – Draconian error handling forces users to do things right – schema languages make validation simple and effective • Everyone agrees on the standard – Microsoft, Sun, IBM, Oracle, you-name-it • Lots of high-quality tools – parsers tend to be fast, highly conformant, and robust – lots and lots of higher-level tools make life easier – tools available for all languages and platforms© 2003-2004 Ontopia AS 9 http://www.ontopia.net/
  10. 10. XML and architecture Traditional information systems The impedance mismatch An example XML application© 2003-2004 Ontopia AS 10 http://www.ontopia.net/
  11. 11. Information systems • Information-centric computing has traditionally been about information systems • Typically, these were clusters of applications with a database at the center • Originally, the business logic would reside in the database • With n-tier architecture it was encapsulated by an object layer • The basic concept has remained the same, however© 2003-2004 Ontopia AS 11 http://www.ontopia.net/
  12. 12. Traditional 2-tier architecture Application #2 Application #3 Application #1 Application #4 Database© 2003-2004 Ontopia AS 12 http://www.ontopia.net/
  13. 13. XML enters the picture© 2003-2004 Ontopia AS 13 http://www.ontopia.net/
  14. 14. Impedance mismatch • The OO/RDBMS impedance mismatch Objects – object-oriented languages use objects with properties – RDBMSs use tables – these two data models do not match, and mapping between them requires substantial effort • Common solutions mismatch – attempt to isolate RDBMS interaction in an application module – use object-relational mapping tools – give up, just plunge in, and create a horrible mess • Conclusion – the problem is real, but with effort it can be handled RDBMS© 2003-2004 Ontopia AS 14 http://www.ontopia.net/
  15. 15. A very common architecture Objects <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE spec PUBLIC "-//W3C//DTD Specification V2.1//EN" "http://www.w3.org/XML/1998/06/xmlspec-v21.dtd" [ <!--ArborText, Inc., 1988-2000, v.4002--> <!ENTITY http-ident "http://www.w3.org/TR/2000/REC-xml"> <!ENTITY draft.month "October"> <!ENTITY draft.day "6"> <!ENTITY iso6.doc.date "20001006"> <!ENTITY draft.year "2000"> <!ENTITY versionOfXML "1.0"> <!ENTITY pio "&lt;?xml"> <!ENTITY doc.date "10 February 1998"> <!ENTITY w3c.doc.date "02-Feb-1998"> <!ENTITY WebSGML "WebSGML Adaptations Annex to ISO 8879"> <!ENTITY pic "?>"> <!ENTITY br "n"> <!ENTITY cellback "#c0d9c0"> <!ENTITY mdash "--"> <!ENTITY com "--"> mismatch <!ENTITY como "--"> <!ENTITY comc "--"> <!ENTITY hcro "&amp;#x"> <!ENTITY nbsp " "> <!ENTITY magicents "<code>amp</code>, <code>lt</code>, <code>gt</code>, <code>apos</code>, <code>quot</code>"> <!ENTITY doc.audience "public review and discussion"> <!ENTITY doc.distribution "may be distributed freely, as long as all text and legal notices remain intact"> ]> <spec w3c-doctype="rec"> mismatch XML mismatch RDBMS© 2003-2004 Ontopia AS 15 http://www.ontopia.net/
  16. 16. The brave new world of XML • Originally we had the OO-RDBMS mismatch • XML adds the OO-XML and XML-RDBMS mismatches – in other words: yet another issue for developers to deal with • Solutions are much the same – use data binding tools (well return to these) – restrict XML code to a specific module – give up and create a mess • Conclusion – interchange is complicated, and there is no silver bullet© 2003-2004 Ontopia AS 16 http://www.ontopia.net/
  17. 17. But is XML really all bad? • Imagine an RDBMS/object interchange format – would support RDBMS/object export and import with no effort – already exists in XML form – still need transformations, because different applications are unlikely to use, or want to use, the same internal structure • XML enforces loose coupling of data – since internal and external representation are so different • Other benefits – it supports easy XML-to-XML transformations, which makes information interchange easier – it is human-readable, which makes debugging and understanding easier • However, life could still be easier than it is© 2003-2004 Ontopia AS 17 http://www.ontopia.net/
  18. 18. So, what to do? • XML is already here – all the big vendors are pushing it – government standards and customers require it – the open source community has embraced it • In short, we just have to live with it now© 2003-2004 Ontopia AS 18 http://www.ontopia.net/
  19. 19. An example application • From January 2003 the EU required all member states to submit individual case safety reports for drugs • Basically, every time someone suffers side-effects from a drug, this is to be reported to EMEA in London • A standardized XML format is used for this • Ontopia developed the solution used by Norwegian authorities© 2003-2004 Ontopia AS 19 http://www.ontopia.net/
  20. 20. Architecture of the application© 2003-2004 Ontopia AS 20 http://www.ontopia.net/
  21. 21. The internals of the application© 2003-2004 Ontopia AS 21 http://www.ontopia.net/
  22. 22. The XML part Obj.model Export Import© 2003-2004 Ontopia AS 22 http://www.ontopia.net/
  23. 23. Native XML databases • XML databases have been on the rise for the past few years – these are databases whose storage model is XML – in other words, they store XML directly – query languages tend to be XPath and/or XQuery • Reasons for using XML databases include – supports semi-structured data – may be faster when only specific views wanted (fewer joins) – well suited to document storage • Reasons not to use them are – have to choose between poor architecture and impedance mismatch – few mature products yet – SQL and RDBMSs usually do the same job better – unfamiliar technology© 2003-2004 Ontopia AS 23 http://www.ontopia.net/
  24. 24. Using an XML database© 2003-2004 Ontopia AS 24 http://www.ontopia.net/
  25. 25. Other considerations • Using an XML database would have simplified the regional applications – no need for the object model, since application is a simple editor – however, validation would have been somewhat awkward to add • The central application is different, however – limited need for editing – main need is advanced reporting – advanced reporting means complex queries and joins – XML databases are not well suited for this – solution also needs support for replication, which few XML DBs have • Architecturally, this means going back to the 2-tier model – might work in this case, unlikely to work in the general case© 2003-2004 Ontopia AS 25 http://www.ontopia.net/
  26. 26. A different kind of information system • RSS is – a simple XML format for newsfeeds – probably the simplest useful XML application there is – probably the most widespread XML application • Today there are – tens of thousands of RSS feeds – lots of news aggregation sites using RSS – lots of desktop tools for reading RSS feeds directly© 2003-2004 Ontopia AS 26 http://www.ontopia.net/
  27. 27. Information system? topicmaps.bond.edu.au <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE spec PUBLIC "-//W3C//DTD Specifi "http://www.w3.org/XML/1998/06/xmlspec-v21.dtd" [ <!--ArborText, Inc., 1988-2000, v.4002--> <!ENTITY http-ident "http://www.w3.org/TR/2000/ <!ENTITY draft.month "October"> <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE spec PUBLIC "-//W3C//DTD Specifi "http://www.w3.org/XML/1998/06/xmlspec-v21.dtd" [ <!--ArborText, Inc., 1988-2000, v.4002--> <!ENTITY http-ident "http://www.w3.org/TR/2000/ <!ENTITY draft.month "October"> <!ENTITY draft.day "6"> <!ENTITY draft.day "6"> <!ENTITY iso6.doc.date "20001006"> <!ENTITY iso6.doc.date "20001006"> <!ENTITY draft.year "2000"> <!ENTITY draft.year "2000"> <!ENTITY versionOfXML "1.0"> <!ENTITY versionOfXML "1.0"> <!ENTITY pio "&lt;?xml"> <!ENTITY pio "&lt;?xml"> <!ENTITY doc.date "10 February 1998"> <!ENTITY doc.date "10 February 1998"> <!ENTITY w3c.doc.date "02-Feb-1998"> <!ENTITY w3c.doc.date "02-Feb-1998"> <!ENTITY WebSGML "WebSGML Adaptations A <!ENTITY WebSGML "WebSGML Adaptations A <!ENTITY pic "?>"> <!ENTITY pic "?>"> <!ENTITY br "n"> <!ENTITY br "n"> <!ENTITY cellback "#c0d9c0"> <!ENTITY cellback "#c0d9c0"> <!ENTITY mdash "--"> <!ENTITY mdash "--"> <!ENTITY com "--"> <?xml version="1.0" encoding="iso-8859-1"?> <?xml version="1.0" encoding="iso-8859-1"?> <!ENTITY com "--"> <!ENTITY como "--"> <!ENTITY como "--"> <!DOCTYPE spec PUBLIC "-//W3C//DTD Specifi <!DOCTYPE spec PUBLIC "-//W3C//DTD Specifi <!ENTITY comc "--"> <!ENTITY comc "--"> "http://www.w3.org/XML/1998/06/xmlspec-v21.dtd" [ "http://www.w3.org/XML/1998/06/xmlspec-v21.dtd" [ Publishing <!ENTITY hcro "&amp;#x"> <!--ArborText, Inc., 1988-2000, v.4002--> <!--ArborText, Inc., 1988-2000, v.4002--> <!ENTITY hcro "&amp;#x"> <!ENTITY nbsp " "> <!ENTITY nbsp " "> <!ENTITY http-ident "http://www.w3.org/TR/2000/ <!ENTITY http-ident "http://www.w3.org/TR/2000/ <!ENTITY draft.month "October"> <!ENTITY draft.month "October"> <!ENTITY draft.day "6"> <!ENTITY draft.day "6"> <!ENTITY iso6.doc.date "20001006"> <!ENTITY iso6.doc.date "20001006"> <!ENTITY draft.year "2000"> <!ENTITY draft.year "2000"> <!ENTITY versionOfXML "1.0"> <!ENTITY versionOfXML "1.0"> <!ENTITY pio "&lt;?xml"> <!ENTITY pio "&lt;?xml"> application <!ENTITY doc.date "10 February 1998"> <!ENTITY doc.date "10 February 1998"> <!ENTITY w3c.doc.date "02-Feb-1998"> <!ENTITY w3c.doc.date "02-Feb-1998"> <!ENTITY WebSGML "WebSGML Adaptations A <!ENTITY WebSGML "WebSGML Adaptations A <!ENTITY pic "?>"> <!ENTITY pic "?>"> <!ENTITY br "n"> <!ENTITY br "n"> <!ENTITY cellback "#c0d9c0"> <!ENTITY cellback "#c0d9c0"> <!ENTITY mdash "--"> <!ENTITY mdash "--"> <!ENTITY com "--"> <!ENTITY com "--"> <!ENTITY como "--"> <!ENTITY como "--"> <!ENTITY comc "--"> <!ENTITY comc "--"> <!ENTITY hcro "&amp;#x"> <!ENTITY hcro "&amp;#x"> <!ENTITY nbsp " "> <!ENTITY nbsp " "> weblogs.rss weblogs.html bloogz.com weblogs.com User desktop RSS Web reader browser© 2003-2004 Ontopia AS 27 http://www.ontopia.net/
  28. 28. Web services What they are The promise of web services© 2003-2004 Ontopia AS 28 http://www.ontopia.net/
  29. 29. What is a web service, anyway? • Basically any software service made available over http – must be intended to be invoked by another piece of software – line is somewhat blurry: is Google a web service? MapQuest? • Two schools of thought: – REST holds that http + XML has all that is needed – the SOAP camp wants special protocols and standards • In practice we see both – REST is good because it fits seamlessly into the existing web – SOAP is good because it has better tool support • Make your choice based on what is important for you© 2003-2004 Ontopia AS 29 http://www.ontopia.net/
  30. 30. SOAP • Essentially a wrapper for XML messages • Consists of – a header (with routing information etc) – a body (which holds the message) • Very little is defined in terms of message structure • Effectively, SOAP encapsulates XML, and you must figure out how to deal with the XML yourself – this is not very different from how HTTP would encapsulate XML© 2003-2004 Ontopia AS 30 http://www.ontopia.net/
  31. 31. Web services and architecture Web service <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE spec PUBLIC "-//W3C//DTD Specification V2.1//EN" "http://www.w3.org/XML/1998/06/xmlspec-v21.dtd" [ <!--ArborText, Inc., 1988-2000, v.4002--> <!ENTITY http-ident "http://www.w3.org/TR/2000/REC-xml"> <!ENTITY draft.month "October"> <!ENTITY draft.day "6"> <!ENTITY iso6.doc.date "20001006"> <!ENTITY draft.year "2000"> http <!ENTITY versionOfXML "1.0"> <!ENTITY pio "&lt;?xml"> <!ENTITY doc.date "10 February 1998"> <!ENTITY w3c.doc.date "02-Feb-1998"> <!ENTITY WebSGML "WebSGML Adaptations Annex to ISO 8879"> <!ENTITY pic "?>"> <!ENTITY br "n"> <!ENTITY cellback "#c0d9c0"> <!ENTITY mdash "--"> <!ENTITY com "--"> <!ENTITY como "--"> <!ENTITY comc "--"> <!ENTITY hcro "&amp;#x"> <!ENTITY nbsp " "> <!ENTITY magicents "<code>amp</code>, <code>lt</code>, <code>gt</code>, <code>apos</code>, <code>quot</code>"> <!ENTITY doc.audience "public review and discussion"> <!ENTITY doc.distribution "may be distributed freely, as long as all text and legal notices remain intact"> ]> <spec w3c-doctype="rec"> XML© 2003-2004 Ontopia AS 31 http://www.ontopia.net/
  32. 32. The promise of web services • Connect legacy applications • Create services anyone can connect to and use • Integrate disparate applications across the enterprise • Publish your service in a web service marketplace – people can find it using UDDI and bind to it dynamically with WSDL – you will, of course, charge them for this© 2003-2004 Ontopia AS 32 http://www.ontopia.net/
  33. 33. A word of caution • Weve heard all this before • CORBA was widely touted as doing the same thing in the 90s – applications connecting to each other over the net – CORBA as the enterprise-wide “bus” connecting all applications – directory services and dynamic service binding – component brokers and online trading • CORBA did the first, but not the last three – political, economic, and legal issues intruded – information integration turns out to be difficult – dynamic service binding was harder than anyone thought • In short, exposing services on the net works – be skeptical about the rest© 2003-2004 Ontopia AS 33 http://www.ontopia.net/
  34. 34. Another caution • Integrating applications is not really the issue – what is necessary is to integrate the information – XML is about information, but its not really designed for integration • XML has no notion of identity – no way to say when two elements represent the same thing – nothing tells you what to do when two elements do represent the same thing • Knowledge technologies are about identity – they have rules for identity and merging – better suited for information integration – thus also for application integration© 2003-2004 Ontopia AS 34 http://www.ontopia.net/
  35. 35. What web services are, second try • Web services are an idea more than anything else • In some cases new technology makes it easier to apply • The idea is what matters, however – seeing the possibilities and trying to make use of them – which way you do it always matters less than doing it© 2003-2004 Ontopia AS 35 http://www.ontopia.net/
  36. 36. Web services? topicmaps.bond.edu.au <?xml version="1.0" encoding="iso-8859-1"?> <?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE spec PUBLIC "-//W3C//DTD Specifi <!DOCTYPE spec PUBLIC "-//W3C//DTD Specifi "http://www.w3.org/XML/1998/06/xmlspec-v21.dtd" [ "http://www.w3.org/XML/1998/06/xmlspec-v21.dtd" [ Publishing <!--ArborText, Inc., 1988-2000, v.4002--> <!--ArborText, Inc., 1988-2000, v.4002--> <!ENTITY http-ident "http://www.w3.org/TR/2000/ <!ENTITY http-ident "http://www.w3.org/TR/2000/ <!ENTITY draft.month "October"> <!ENTITY draft.month "October"> <!ENTITY draft.day "6"> <!ENTITY draft.day "6"> <!ENTITY iso6.doc.date "20001006"> <!ENTITY iso6.doc.date "20001006"> <!ENTITY draft.year "2000"> <!ENTITY draft.year "2000"> <!ENTITY versionOfXML "1.0"> <!ENTITY versionOfXML "1.0"> <!ENTITY pio "&lt;?xml"> <!ENTITY pio "&lt;?xml"> application <!ENTITY doc.date "10 February 1998"> <!ENTITY doc.date "10 February 1998"> <!ENTITY w3c.doc.date "02-Feb-1998"> <!ENTITY w3c.doc.date "02-Feb-1998"> <!ENTITY WebSGML "WebSGML Adaptations A <!ENTITY WebSGML "WebSGML Adaptations A <!ENTITY pic "?>"> <!ENTITY pic "?>"> <!ENTITY br "n"> <!ENTITY br "n"> <!ENTITY cellback "#c0d9c0"> <!ENTITY cellback "#c0d9c0"> <!ENTITY mdash "--"> <!ENTITY mdash "--"> <!ENTITY com "--"> <!ENTITY com "--"> <!ENTITY como "--"> <!ENTITY como "--"> <!ENTITY comc "--"> <!ENTITY comc "--"> <!ENTITY hcro "&amp;#x"> <!ENTITY hcro "&amp;#x"> <!ENTITY nbsp " "> <!ENTITY nbsp " "> weblogs.rss weblogs.html bloogz.com weblogs.com User desktop RSS Web reader browser© 2003-2004 Ontopia AS 36 http://www.ontopia.net/
  37. 37. Common XML challenges Import/export Important groups of tools Validation Using XML databases© 2003-2004 Ontopia AS 37 http://www.ontopia.net/
  38. 38. Deserialization • That is, building an object structure from XML • Usually involves some level of validation as well • Several ways to do this – use SAX, which is low-level but fast – use DOM, which is high-level and awful – use XPath, which lets you extract information easily – use a data binding tool© 2003-2004 Ontopia AS 38 http://www.ontopia.net/
  39. 39. SAX • Standard for event-based parser APIs – passes the document to the application piece by piece – somewhat like staring at a parade through a keyhole – very fast, consumes no memory at all – suitable for applications where • documents may be big • documents require heavy processing • De-facto standard created by self-appointed group – supported by pretty much every parser there is – effectively the foundation for all XML work in Java – less standardized in other languages© 2003-2004 Ontopia AS 39 http://www.ontopia.net/
  40. 40. DOM • Presents the document as an object structure • W3C Recommendation – widely supported and widely derided – in most programming languages better alternatives are found – in Java JDOM and XOM are good alternatives • Downsides – this approach requires the entire document to be loaded into memory – using an API is awkward, whether tree-based or event-based© 2003-2004 Ontopia AS 40 http://www.ontopia.net/
  41. 41. SAX vs DOM • Or, rather, event-based vs tree-based – most XML technologies use one of these two approaches – understanding the difference is important in order to choose correctly • Essentially the difference is this – event-based solutions require less resources – however, they make many common operations too hard to be practical – tree-based solutions are slower and use more memory – but there is no limit on what you can do • Which approach is the right one depends on the requirements© 2003-2004 Ontopia AS 41 http://www.ontopia.net/
  42. 42. XPath • A simple query language for XML – remarkably simple to learn given its expressive power – graph-traversal semantics • Simplifies extracting information from XML enormously – probably the single most important XML specification – used in query languages, mapping tools, schema languages, ... • Much less powerful than SQL – cant return structured results, only a list of values – limited support for handling reference relationships – no support for aggregate functions© 2003-2004 Ontopia AS 42 http://www.ontopia.net/
  43. 43. Data binding tools • Tools that simplify serialization and deserialization – automate as much as possible of those tasks – some generate the object model for you – others let you map the XML to your object model • Most such tools have limitations – no support for mixed content – no support for element order – ignore comments, processing instructions, and entities – limited support for references • When suitable they can simplify development considerably – some event-based, others tree-based • The biggest hurdle is picking the right one and learning it© 2003-2004 Ontopia AS 43 http://www.ontopia.net/
  44. 44. Validation • Validation is to ensure the correctness of incoming data – that every <person> has a <birth-date> – that every <birth-date> is a valid date – that every <death-date> is later than the <birth-date> – that the <birth-date> is actually correct – ... • These three constraints can be grouped into – structural constraints – type constraints – “semantic” constraints – “existential” constraint • Schema languages can be used to define the first two – application logic must usually be used for the semantic ones – the last category is for human beings© 2003-2004 Ontopia AS 44 http://www.ontopia.net/
  45. 45. Schema languages • DTDs – part of XML 1.0, but only supports structural constraints – serious problem: the document says which schema to use • XML Schema – has both structural and type constraints – W3C Recommendation, widely supported and widely criticized • RELAX-NG – has very strong structural and type constraints – ISO standard, growing support, and widely praised • Schematron – weak structural and type constraints, strong on semantic constraints – constraints specified with XPath – about to become an ISO standard© 2003-2004 Ontopia AS 45 http://www.ontopia.net/
  46. 46. Serialization • The opposite of deserialization: writing XML from objects • Straightforward, but some pitfalls – remember to quote special XML characters everywhere – handling character encodings correctly – handling namespaces correctly • Validation usually part of testing, but otherwise not an issue – one assumes the object structure is already valid • Again several ways to do it – use simple print statements, and do all the above yourself – use a SAX2XML tool, which will handle the above for you – build a DOM instance, then write it out (slow and awkward) – use a data binding tool (has limitations)© 2003-2004 Ontopia AS 46 http://www.ontopia.net/
  47. 47. Importing XML to an RDBMS • A form of deserialization, but with issues of its own • Typical issues are – how to represent mixed content, if allowed – dealing with referential integrity – data typing – recognizing null values – validation • Again, there are many ways to do this – just hack it in – having an XML-to-OO mapper and an OO-to-RDBMS mapper – using a data binding tool© 2003-2004 Ontopia AS 47 http://www.ontopia.net/
  48. 48. Writing XML from an RDBMS • A special kind of serialization • Much easier than going the other way • Main problem is matching the desired output format • Several tools to do this – template-based approaches where SQL is embedded in the XML – extensions to SQL that allow XML element constructors in SELECT – some allow XSLT transformations of the initial output© 2003-2004 Ontopia AS 48 http://www.ontopia.net/
  49. 49. XQuery • The query language for XML databases in the future • Embeds XPath inside a functional programming language • Much more powerful than XPath • Progress on XQuery is slow, but language highly regarded – XPath 2.0, which is used in XQuery, has usability problems with the data typing approach taken • Likely to become an important tool in the future© 2003-2004 Ontopia AS 49 http://www.ontopia.net/
  50. 50. SQL/XML • ISO SC32 is working on adding XML support to SQL – this involves columns whose data type is XML – one assumes XPath expressions can be applied to these – probably also support for XML output • RDBMS vendors are committed to this • SQL/XML is likely to be a key building block in the future – simplifies XML storage in databases – does not, however, remove the impedance mismatch • SQL/XML may well become an XQuery killer© 2003-2004 Ontopia AS 50 http://www.ontopia.net/
  51. 51. Wrapping up What XML means for developers Resources to learn more© 2003-2004 Ontopia AS 51 http://www.ontopia.net/
  52. 52. XML and software development • The possibilities for interchange and integration are not new – XML makes them easier to achieve – XML makes us think of these possibilities in ways we didnt before • In practice, this means more work for developers – new lists of acronyms to learn and master – new kinds of tasks compared to earlier • XML makes life harder, but its worth it© 2003-2004 Ontopia AS 52 http://www.ontopia.net/
  53. 53. Where to learn more • http://www.xml.com • http://www.cafeconleche.org • The XML-DEV mailing list • http://www.w3.org/TR/ • “Definitive XML Application Development” by me, published by Prentice-Hall© 2003-2004 Ontopia AS 53 http://www.ontopia.net/

×