XML and NoSQL

Presented for the database course at Kortrijk on 31 March 2011...


  1. 1. Databases: the future... Kortrijk, Thursday 31 March 2011. Erik Duval, http://erikduval.wordpress.com, @ErikDuval
  2. 2. http://www.slideshare.net/erik.duval
  3. 3. which database holds the web?
  4. 4. • XML • NoSQL (with thanks to Steven Noels)
  5. 5. ? XML ?
  6. 6. http://en.wikipedia.org/wiki/Extensible_Markup_Language
  7. 7. http://www.itjobboard.be/ICT-banen/xml/Belgie/alle/0/relevantie/nl/
  8. 8. http://www.khbo.be/12385
  9. 9. http://www.w3.org/XML
  10. 10. http://www.w3c.it/talks/2005/openCulture/slide7-0.html
  11. 11. http://en.wikipedia.org/wiki/List_of_XML_markup_languages
  12. 12. XML is not ...
      • An extension of HTML
        • XHTML is XML-compliant, and extensible
      • Just for Web pages
        • Useful wherever data are stored or exchanged
      • Concerned with semantics
        • XML does not define semantics, just syntax
      • Innovative new technology
        • A standard, building on existing technology
      • Only a hype
        • Though it is that as well
  13. 13. XML is ...
      • Endorsed by W3C and major companies
      • Extensible
        • No tag name limitations
        • No language limitations
      • Human (software developer)-readable
        • Can be processed with basic text tools
      • Open standard
        • No vendor lock-in (in theory...)
      • Easy to implement
        • Powerful, cheap (free), off-the-shelf XML tools
  14. 14. when was XML invented?
  15. 15. • 1969: origins of SGML (Standard Generalized Markup Language) in IBM's GML
        • Meta-language: describes other languages
        • Powerful, but rather complicated
        • 1986: ISO standard
      • 1992: HTML (HyperText Markup Language)
        • Based on SGML
        • Simple, but limited
      • 1996: Start of the design of XML
        • By the World Wide Web Consortium (W3C)
      • 1998: Publication of XML 1.0
  16. 16. Design Goals
      • Easy to use over the Internet
      • Power of SGML
      • Simplicity of HTML
      • Human-legible
      • Easy to create
      • Compactness is not an issue
      • "The ASCII of the Web"
  17. 17. what does XML look like?
  18. 18. XML Basics
      <Person>
        <Name>
          <First>Thomas</First>
          <Last>Atkinson</Last>
        </Name>
        <Age>30</Age>
      </Person>
      • Self-defined, meaningful tags
      • Separate data and its representation
  19. 19. • Language for defining syntax
      • Records and fields have explicit boundaries
        • Parse-able without knowing the structure (self-descriptive)
      • Unicode support (UTF-8, UTF-16, ...)
      • Web-aware
        • DTD, ENTITY and Schema can be loaded through a URL
      • Strictly parsed: no ambiguity (case sensitive!)
      • Extensible: namespaces
  20. 20. <?xml version="1.0" encoding="UTF-8"?>
      <!-- XML declaration: XML follows -->
      <!DOCTYPE addressbook SYSTEM "http://www/~koenh/ddml/addressbook.dtd">
      <!-- Document Type Declaration, pointing to an external DTD -->
      <addressbook> <!-- root element -->
        <person first-name="John" family-name="Doe" employee-number="1234">
          <contact-info>
            <email address="Jdoe@home.com"/>
          </contact-info>
          <address street="Celestijnenlaan" number="200A"/>
        </person>
      </addressbook>
  28. 28. • Cf. HTML markup tags
      <H1 align="center"> a Heading </H1>
      • Opening tag: <H1 align="center">, with attribute align="center"
      • Content: a Heading
      • Closing tag: </H1>
      • Opening tag, content and closing tag together form an element
      • Major differences:
        • Case sensitive
        • Proper nesting: no <A> … <B> … </A> … </B>
        • Unicode instead of ASCII
  29. 29. Vocabularies
      • Agreed-upon XML tag sets for a specific domain
      • Examples
        • Chemical Markup Language (CML)
        • Business: ebXML, RosettaNet, BizTalk
        • Mathematics: MathML
        • Multimedia: Synchronized Multimedia Integration Language (SMIL)
        • Etc.
  30. 30. • well-formed: follows XML syntax
        • Proper tag and attribute names
        • Tags properly closed
        • Attributes and text between tags do not contain '<' (escape with &lt;)
      • valid: well-formed and conforming to a vocabulary declared in a DTD
        • All elements and their attributes declared in the DTD
        • Attribute values follow the DTD type declarations
          • CDATA, ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, enumerated
        • Nesting and sequencing of elements follows the DTD
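
The well-formedness rules above can be tried out with a parser. The following is a minimal sketch in Python (an assumption of this write-up; the slides themselves refer to Java parsers), using the standard library's `xml.etree.ElementTree`. The helper name `is_well_formed` is hypothetical. Note that this parser is non-validating: it checks well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<Person><Name>Thomas</Name><Age>30</Age></Person>"
bad_nesting = "<A><B></A></B>"        # tags are not properly nested
bad_char = "<note>1 < 2</note>"       # raw '<' inside character data
escaped = "<note>1 &lt; 2</note>"     # same text, escaped with &lt;

for doc in (good, bad_nesting, bad_char, escaped):
    print(is_well_formed(doc), doc)
```

Checking validity as well would need a validating parser plus the DTD, which this sketch deliberately leaves out.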
  31. 31. Elements
      • XML's container for
        • Attributes
        • Character data
        • Other elements ("child" elements)
      • Delimited by opening and closing tags
        • Non-empty element: <name>..</name>
        • Empty element: <name/>
      • Form a simple hierarchic tree
        • Root = "document element"
  32. 32. Attributes and Strings
      • Attributes
        • Name-value pairs: name=value
        • Only strings as value!
      • Strings
        • Enclosed by '...' or "..."
        • A quote character inside the string is replaced with &apos; or &quot;
      • Character data
        • Any text that is not markup
        • '&', '<' and '>' are markup: replace them with &amp;, &lt; and &gt;
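
The escaping rules above can be sketched with Python's standard `xml.sax.saxutils` module (Python is an assumption here, not the language of the slides). `escape` handles `&`, `<` and `>` in character data, and `quoteattr` picks a quoting style for an attribute value; the sample strings are made up for illustration.

```python
from xml.sax.saxutils import escape, quoteattr, unescape

# Character data: '&', '<' and '>' must be replaced.
text = "AT&T says 1 < 2 > 0"
escaped_text = escape(text)
print(escaped_text)

# Quotes can be escaped too by passing extra entities.
with_quotes = escape("it's", {"'": "&apos;", '"': "&quot;"})
print(with_quotes)

# Attribute values: quoteattr adds the enclosing quotes and only
# falls back to &quot; when the value contains both kinds of quote.
plain = quoteattr("plain")
only_double = quoteattr('say "hi"')
both = quoteattr('it\'s "x"')
print(plain, only_double, both)

# unescape reverses the three basic replacements.
round_trip = unescape(escaped_text)
```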
  33. 33. Document structure
      • Prolog (optional)
        • <?xml version="1.0" encoding="UTF-8"?>
          • version="number" (compulsory)
          • encoding="character encoding" (optional)
        • Document type declaration
          • <!DOCTYPE document_element ... >
      • Body
        • The document element
  34. 34. Another example
      <?xml version="1.0" standalone="no"?>
      <!DOCTYPE BankAccounts ...>
      <!-- This is an example XML document -->
      <BankAccounts>
        <Account accountNr="123-456789-01" use="personal">
          <Owners>
            <Person ID="1258-a8d72-98"> <Name>John Smith</Name></Person>
            <Person ID="5842-df5ef-e9"> <Name>Claudia Scott</Name></Person>
          </Owners>
          <CreditCards><CreditCard number="12345"/></CreditCards>
          <Balance Currency="EUR">50000</Balance>
        </Account>
        ...
      </BankAccounts>
  35. 35. namespaces: problem
      <widget type="gadget">
        <head size="medium"/>
        <big><subwidget ref="gizmo"/></big>
        <info>
          <head><title>Gadget</title></head>
          <body><h1>Gadget</h1> A gadget contains a big gizmo </body>
        </info>
      </widget>
      Name collision! (for instance, head is used both as a widget element and as an HTML element)
  36. 36. solution ?
  37. 37. namespaces: approach
      • A collection of names, identified by a URI reference, which are used in XML documents as element types and attribute names
      • xmlns:prefix="URI"
      • URI used only as identifier
        • does not need to point to anything
      • applies to all nested elements and attributes
  38. 38. namespaces: example
      <widget xmlns="http://www.widget.org"
              xmlns:xhtml="http://www.w3.org/TR/xhtml1"
              type="gadget">
        <head size="medium"/>
        <big><subwidget ref="gizmo"/></big>
        <info>
          <xhtml:head><xhtml:title>Gadget</xhtml:title></xhtml:head>
          <xhtml:body><xhtml:h1>Gadget</xhtml:h1>A gadget contains...</xhtml:body>
        </info>
      </widget>
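
To see how a namespace-aware parser keeps the two `head` elements apart, here is a small sketch in Python's `xml.etree.ElementTree` (an assumption; the slides do not name a tool for this). It parses a shortened version of the widget document and shows that each element name is expanded to `{URI}localname`. The prefixes `w` and `x` in the `ns` mapping are arbitrary local choices; only the URIs have to match.

```python
import xml.etree.ElementTree as ET

doc = """
<widget xmlns="http://www.widget.org"
        xmlns:xhtml="http://www.w3.org/TR/xhtml1"
        type="gadget">
  <head size="medium"/>
  <info>
    <xhtml:head><xhtml:title>Gadget</xhtml:title></xhtml:head>
  </info>
</widget>
"""

root = ET.fromstring(doc)

# Prefixes used in the queries below; they need not match the document's.
ns = {"w": "http://www.widget.org", "x": "http://www.w3.org/TR/xhtml1"}

widget_head = root.find("w:head", ns)        # the widget's own head
html_head = root.find("w:info/x:head", ns)   # the XHTML head

print(widget_head.tag)   # expanded name with the widget URI
print(html_head.tag)     # expanded name with the XHTML URI
print(html_head.find("x:title", ns).text)
```

The unprefixed `size` attribute stays in no namespace, so `widget_head.get("size")` works without any URI.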
  39. 39. Another example
      <Address>
        <Street>Celestijnenlaan</Street>
        <Nr>200A</Nr>
        <City>Heverlee-Leuven</City>
        <Country>Belgium</Country>
      </Address>
      <Server>
        <Name>www</Name>
        <Address>134.58.43.1</Address>
      </Server>
      ?
  40. 40. Another example (2)
      <Address xmlns="www.all.edu/departments">
        <Street>Celestijnenlaan</Street>
        <Nr>200A</Nr>
        <City>Heverlee-Leuven</City>
        <Country>Belgium</Country>
      </Address>
      <Server xmlns="www.dns.net/servers">
        <Name>www</Name>
        <Address>134.58.43.1</Address>
      </Server>
      <Department xmlns:edu="www.all.edu/departments"
                  xmlns:dns="www.dns.net/servers">
        <edu:Address> <Street>Celestijnenlaan</Street> ... </edu:Address>
        <dns:Name>www</dns:Name>
        <dns:Address>134.58.43.1</dns:Address>
      </Department>
  41. 41. how would you process XML?
  42. 42. Accessing XML documents
      • Manual text file manipulation
        • Cumbersome and error-prone
      • Parser
        • Simplifies document manipulation
        • Ensures proper grammar, well-formedness
        • Abstracts content from grammar
        • Accessed through a standard API
          • Document Object Model (DOM)
          • Simple API for XML (SAX)
  43. 43. • DOM parser
        • creates a DOM object tree
      • SAX parser
        • generates events when elements are encountered
        • one-pass translation
        • no need to keep the whole document tree in memory
      • Both can be validating or non-validating
      • Many available (most freeware, open source)
        • ibm xml4j, apache xerces, sun parser, microsoft, datachannel, oracle, ...
  44. 44. DOM approach: http://java.sun.com/xml/jaxp/dist/1.1/docs/tutorial/overview/3_apis.html#JAXP
  45. 45. DOM Benefits & Drawbacks
      • Benefits
        • W3C Recommendation
        • Language- and platform-independent
        • Random access
        • Intuitive
      • Drawback
        • Entire object tree in memory
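
As an illustration of the DOM approach, the sketch below uses Python's standard `xml.dom.minidom` (an assumption; the slides point to the Java JAXP tutorial instead). It builds the whole tree in memory, then reads and modifies nodes in arbitrary order, which is the "random access" benefit and the memory drawback in one picture. The document is the Person example from the XML Basics slide.

```python
from xml.dom.minidom import parseString

# The whole document is parsed into an in-memory object tree.
dom = parseString(
    "<Person><Name><First>Thomas</First><Last>Atkinson</Last></Name>"
    "<Age>30</Age></Person>"
)
person = dom.documentElement

# Random access: nodes can be visited in any order, here Last before First.
last = person.getElementsByTagName("Last")[0]
first = person.getElementsByTagName("First")[0]
print(last.firstChild.data, first.firstChild.data)

# The tree is writable: change the text node under <Age>.
age = person.getElementsByTagName("Age")[0]
age.firstChild.data = "31"

serialized = person.toxml()
print(serialized)
```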
  46. 46. Simple API for XML (SAX)
      • Not an official standard
        • Ad-hoc product by XML developers
        • Primarily a Java API
      • Event-based mechanism
        • Don't call the parser, the parser calls you
      • No object model in memory
        • Programmer must keep state information
  47. 47. SAX approach: http://java.sun.com/xml/jaxp/dist/1.1/docs/tutorial/overview/3_apis.html#JAXP
  48. 48. SAX Benefits & Drawbacks
      • Benefits
        • Suitable when
          • parsing large documents
          • constructing proprietary object structures
          • only a small subset of the information is needed
        • Simple and fast
      • Drawbacks
        • Read-only
        • No random access
        • Complex searches are messy to program
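
The event-based style can be sketched with Python's standard `xml.sax` module (an assumption; SAX is described on the slides as primarily a Java API). The handler class `NameCollector` and the tiny COMPANY document are hypothetical. The parser calls the handler's methods as it meets tags and text, and the handler has to keep its own state (`in_fname`, `_buf`) because no tree is built.

```python
import xml.sax


class NameCollector(xml.sax.ContentHandler):
    """Collect SSN attributes and FNAME texts while the parser streams by."""

    def __init__(self):
        super().__init__()
        self.ssns = []
        self.names = []
        self.in_fname = False   # state we must track ourselves
        self._buf = []

    def startElement(self, name, attrs):
        if name == "EMPLOYEE":
            self.ssns.append(attrs.get("SSN"))
        elif name == "FNAME":
            self.in_fname = True
            self._buf = []

    def characters(self, content):
        # Text may arrive in several chunks, so buffer it.
        if self.in_fname:
            self._buf.append(content)

    def endElement(self, name):
        if name == "FNAME":
            self.names.append("".join(self._buf))
            self.in_fname = False


doc = (
    b"<COMPANY>"
    b"<EMPLOYEE SSN='_123456789'><FNAME>John</FNAME></EMPLOYEE>"
    b"<EMPLOYEE SSN='_333445555'><FNAME>Franklin</FNAME></EMPLOYEE>"
    b"</COMPANY>"
)

handler = NameCollector()
xml.sax.parseString(doc, handler)   # the parser calls the handler
print(handler.ssns, handler.names)
```

Only the two small lists are kept in memory, which is why this style suits large documents from which a small subset is needed.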
  49. 49. how to define valid instances?
  50. 50. XML Schema
      • typing of values
        • e.g. integer, string, etc.
        • also constraints on min/max values
        • user-defined types
      • is specified in XML syntax
        • a more standardized representation
      • is integrated with namespaces
      • and still other possibilities
        • list types, uniqueness constraints on keys, referential (foreign key) constraints, inheritance, ...
  51. 51. XSDL
      • XML Schema Definition Language
      • documents with suffix .xsd
  52. 52. XML Schema: example
      XML schema:
      <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
        ....
        <xsd:element name="PWORKER" minOccurs="0" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="HOURS" type="xsd:float"/>
            </xsd:sequence>
            <xsd:attribute name="SSN" type="xsd:IDREF" use="required"/>
          </xsd:complexType>
        </xsd:element>
        ....
      </xsd:schema>
      XML instance:
      <PWORKER SSN="_123456789">
        <HOURS>7.5</HOURS>
      </PWORKER>
  53. 53. XML: simple types
      • built-in simple types
        • string, integer, decimal, float, boolean, date, time, ...
        • <xsd:element name="gebdat" type="xsd:date" />
      • user-defined simple types
        • defined with the simpleType element
        • the restriction element gives the base type on which the new type is built
        <xsd:simpleType name="salaryRange">
          <xsd:restriction base="xsd:integer">
            <xsd:minInclusive value="25000" />
            <xsd:maxInclusive value="100000" />
          </xsd:restriction>
        </xsd:simpleType>
  54. 54. XML: simple types
      <xsd:simpleType name="studentClassificatie">
        <xsd:restriction base="xsd:string">
          <xsd:enumeration value="bachelorstudent" />
          <xsd:enumeration value="masterstudent" />
          <xsd:enumeration value="doctorstudent" />
        </xsd:restriction>
      </xsd:simpleType>
      <xsd:simpleType name="deptType">
        <xsd:restriction base="xsd:string">
          <xsd:length value="3" />
        </xsd:restriction>
      </xsd:simpleType>
  59. 59. how to query XML?
  60. 60. XPath (example): /COMPANY/EMPLOYEE
      ROOT
        COMPANY
          EMPLOYEE (SSN = _123456789)
          EMPLOYEE (SSN = _333445555)
          EMPLOYEE (SSN = _999887777)
      The path is evaluated step by step: / starts at ROOT, /COMPANY selects the COMPANY element, and /COMPANY/EMPLOYEE selects its three EMPLOYEE children.
  65. 65. XPath: result of /COMPANY/EMPLOYEE
      <EMPLOYEE SSN="_123456789" SEX="M" SUPERSSN="_333445555" DNO="_5">
        <FNAME>John</FNAME>
        <MINIT>B</MINIT>
        ....
      </EMPLOYEE>
      <EMPLOYEE SSN="_333445555" SEX="M" SUPERSSN="_888665555" DNO="_5">
        <FNAME>Franklin</FNAME>
        <MINIT>T</MINIT>
        <LNAME>Wong</LNAME>
        <BDATE>08-DEC-45</BDATE>
      </EMPLOYEE>
      <EMPLOYEE SSN="_999887777" SEX="F" SUPERSSN="_987654321" DNO="_4">
        <FNAME>Alicia</FNAME>
        .....
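
The path /COMPANY/EMPLOYEE can be tried out with the limited XPath subset in Python's `xml.etree.ElementTree` (an assumption; the slides do not name an XPath engine). Because `fromstring` returns the COMPANY element itself, the path is written relative to it as `./EMPLOYEE`. The sample document contains only the fields visible on the slide; elided fields are simply left out.

```python
import xml.etree.ElementTree as ET

company = ET.fromstring("""
<COMPANY>
  <EMPLOYEE SSN="_123456789" SEX="M" SUPERSSN="_333445555" DNO="_5">
    <FNAME>John</FNAME><MINIT>B</MINIT>
  </EMPLOYEE>
  <EMPLOYEE SSN="_333445555" SEX="M" SUPERSSN="_888665555" DNO="_5">
    <FNAME>Franklin</FNAME><MINIT>T</MINIT>
    <LNAME>Wong</LNAME><BDATE>08-DEC-45</BDATE>
  </EMPLOYEE>
  <EMPLOYEE SSN="_999887777" SEX="F" SUPERSSN="_987654321" DNO="_4">
    <FNAME>Alicia</FNAME>
  </EMPLOYEE>
</COMPANY>
""")

# Equivalent of /COMPANY/EMPLOYEE, relative to the COMPANY element.
employees = company.findall("./EMPLOYEE")
ssns = [e.get("SSN") for e in employees]
print(ssns)

# An attribute predicate narrows the selection.
women = company.findall("./EMPLOYEE[@SEX='F']")
print([e.findtext("FNAME") for e in women])
```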
  66. 66. XML family of technologies
      • XLink: hypertext
      • XSL: Extensible Stylesheet Language
        • XSLT: Transformations
        • Formatting Objects
      • XML Schema: additional constraints on attribute types
      • and more...
  67. 67. XML applications
      • RDF: Resource Description Framework
        • infra
      • XHTML: eXtensible HTML, and HTML5
        • XML-compliant HTML
      • MathML
      • SMIL: synchronized multimedia presentation
      • Many others
        • Chemical Markup Language, Vector Graphics Markup Language, Open Software Description Format, weather observation, astronomical data, financial data, electronic components, workflow, business cards, real estate, newspaper, classifieds, javadoc, human resource, advertising, architecture ...
  68. 68. More XPath Features
      • Operator "|" is used to implement union
        • E.g. //EMPLOYEE[count(DEPENDENT) = 1] | //EMPLOYEE[not(DEPENDENT)]
        • gives employees with either 0 or 1 dependents
      • "//" can be used to skip multiple levels of nodes
        • E.g. /COMPANY//FNAME
        • finds any FNAME element anywhere under the /COMPANY element, regardless of the element in which it is contained
      • A step in the path can go to parents, siblings, ancestors and descendants of the nodes generated by the previous step, not just to the children
        • "//", described above, is a short form for specifying "all descendants"
        • ".." specifies the parent
        • E.g. /COMPANY//FNAME/../BDATE
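
Some of these features can be sketched with Python's `xml.etree.ElementTree`, under the assumption that its XPath subset is good enough for illustration: it supports `//` and `..`, but not `count()`, `not()` or the `|` operator, so the union query is emulated in plain Python. The employee data below is made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: 1, 0 and 2 dependents respectively.
company = ET.fromstring("""
<COMPANY>
  <EMPLOYEE SSN="_1"><FNAME>Ann</FNAME><BDATE>01-JAN-70</BDATE>
    <DEPENDENT>Kim</DEPENDENT></EMPLOYEE>
  <EMPLOYEE SSN="_2"><FNAME>Bob</FNAME><BDATE>02-FEB-80</BDATE></EMPLOYEE>
  <EMPLOYEE SSN="_3"><FNAME>Cid</FNAME><BDATE>03-MAR-90</BDATE>
    <DEPENDENT>Lou</DEPENDENT><DEPENDENT>Max</DEPENDENT></EMPLOYEE>
</COMPANY>
""")

# /COMPANY//FNAME: any FNAME below COMPANY, at any depth.
fnames = [e.text for e in company.findall(".//FNAME")]

# /COMPANY//FNAME/../BDATE: go up to the parent, then down to BDATE.
bdates = [e.text for e in company.findall(".//FNAME/../BDATE")]

# Emulation of the union query: employees with 0 or 1 dependents.
zero_or_one = [
    e.get("SSN")
    for e in company.iter("EMPLOYEE")
    if len(e.findall("DEPENDENT")) <= 1
]

# A child-existence predicate is supported directly.
with_dependents = [e.get("SSN") for e in company.findall("./EMPLOYEE[DEPENDENT]")]

print(fnames, bdates, zero_or_one, with_dependents)
```

A full XPath engine would evaluate the original expressions unchanged; the emulation here only mirrors their result.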
  69. 69. XQuery
      • allows more general queries to be formulated than XPath
      • general form: FLWOR expression
          for <for-variable> in <in-expression>
          let <let-variable> := <let-expression>
          [ where <filter-expression> ]
          [ order by <order-specification> ]
          return <expression>
      • note: for and let can occur alone or together
  70. 70. • Q1: first name and last name of all employees who earn more than 70000
          for $x in doc("www.company.com/info.xml")//employee[employeeSalary > 70000]/employeeName
          return <res> { $x/firstName, $x/lastName } </res>
      • alternative:
          for $x in doc("www.company.com/info.xml")/company/employee
          where $x/employeeSalary > 70000
          return <res> { $x/employeeName/firstName, $x/employeeName/lastName } </res>
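
For readers without an XQuery processor at hand, the shape of Q1 can be mimicked in Python with `xml.etree.ElementTree`. This is only an analogy under stated assumptions: it is not XQuery, the document is an inline string rather than `doc("www.company.com/info.xml")`, and the three employees and their salaries are invented. The comments mark which part plays the role of each FLWOR clause.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the company document.
company = ET.fromstring("""
<company>
  <employee>
    <employeeName><firstName>Ann</firstName><lastName>Lee</lastName></employeeName>
    <employeeSalary>80000</employeeSalary>
  </employee>
  <employee>
    <employeeName><firstName>Bob</firstName><lastName>Ray</lastName></employeeName>
    <employeeSalary>50000</employeeSalary>
  </employee>
  <employee>
    <employeeName><firstName>Cid</firstName><lastName>Moe</lastName></employeeName>
    <employeeSalary>90000</employeeSalary>
  </employee>
</company>
""")

res = [
    # return: first and last name
    (x.findtext("employeeName/firstName"), x.findtext("employeeName/lastName"))
    # for: iterate over /company/employee
    for x in company.findall("./employee")
    # where: salary filter
    if float(x.findtext("employeeSalary")) > 70000
]
print(res)
```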
  71. 71. • Q3: first name and last name of all employees who work more than 20 hours on project number 5, together with that number of hours
          for $x in doc("www.company.com/info.xml")/company/project[projectNumber = 5]/projectWorker,
              $y in doc("www.company.com/info.xml")/company/employee
          where $x/hours > 20.0 and $y/ssn = $x/ssn
          return <res> { $y/employeeName/firstName, $y/employeeName/lastName, $x/hours } </res>
  72. 72. • XML • NoSQL (with thanks to Steven Noels)
  74. 74. How to build on top of SQL?
  75. 75. select fun, profit from real_world where relational=false;
  76. 76. NoSQL
      • problems with the existing relational approach for Amazon (Dynamo) and Google (BigTable)
        • flexibility, performance, scaling, cost
        • millions of users
        • application changes rolled out incrementally without downtime
      • now more broadly applicable (velcro)
      • Open source developments: Facebook, Yahoo! - Cassandra, Hadoop, MapReduce, Hive, Pig
  77. 77. http://www.odbms.org/download/NoSQL-Whitepaper-1.pdf
  79. 79. NoSQL
      • non-relational
      • distributed
      • open source
      • horizontally scalable
      • "web scale"
      • schema free
      • easy replication
      • simple API
  80. 80. Systems (http://nosql-databases.org/)
      • Core: Hadoop, HBase, Cassandra, Hypertable, ...
      • Docs: CouchDB, MongoDB, Riak, Terrastore, ...
      • Key-Value, tuple: Amazon SimpleDB, Azure, ...
      • Graph: Neo4J, Bigdata, InfoGrid, HyperGraph, ...
      • Object: Versant, Perst, ZODB, ...
      • Grid: GigaSpaces, Hazelcast, ...
      • XML: Tamino, eXist, Mark Logic, Xindice, ...
      • ...
  82. 82. http://about.digg.com/blog/looking-future-cassandra
  84. 84. http://about.digg.com/blog/looking-future-cassandra (14 seconds)
  86. 86. http://www.slideshare.net/oemebamo/database-sharding-at-netlog-presentation
  88. 88. no attempt at ACID
      • Atomicity
      • Consistency
      • Isolation
      • Durability
      • BASE: trade ACID off in favor of high availability
  89. 89. http://cacm.acm.org/blogs/blog-cacm/50678-the-nosql-discussion-has-nothing-to-do-with-sql/fulltext
  90. 90. Questions? http://erikduval.wordpress.com/ twitter: @ErikDuval