Gegevensbanken (Databases), last lecture: XML...


Last lecture for 2010 course on databases. Focuses on XML.

Published in: Education

  1. Gegevensbanken (Databases): “Laatste Les” (Last Lecture). Prof. Erik Duval, 2009-2010. Sunday 30 May 2010
  2. http://www.slideshare.net/erik.duval
  3. • NoSQL (with thanks to Steven Noels)
      • XML (with thanks to Prof. Olivié!)
      • About the exam...
  4. http://en.wikipedia.org/wiki/Extensible_Markup_Language
  5. http://www.itjobboard.be/ICT-banen/xml/Belgie/alle/0/relevantie/nl/
  6. http://www.khbo.be/12385
  7. http://www.w3.org/XML
  8. http://www.w3c.it/talks/2005/openCulture/slide7-0.html
  9. http://en.wikipedia.org/wiki/List_of_XML_markup_languages
  10. XML is not ...
      • An extension of HTML
          • XHTML is XML-compliant, and extensible
      • Just for Web pages
          • Useful whenever data are stored or exchanged
      • Concerned with semantics
          • XML does not define semantics, just syntax
      • An innovative new technology
          • A standard, building on existing technology
      • Only hype
          • Though it is also that
  11. XML is ...
      • Endorsed by W3C and major companies
      • Extensible
          • No tag name limitations
          • No language limitations
      • Human-readable (or at least software-developer-readable)
          • Can be processed with basic text tools
      • An open standard
          • No vendor lock-in (in theory...)
      • Easy to implement
          • Powerful, cheap (free), off-the-shelf XML tools
  12. • 1969: GML, the precursor of SGML (Standard Generalized Markup Language)
          • Meta-language: describes other languages
          • Powerful, but rather complicated
          • 1986: SGML becomes an ISO standard
      • 1992: HTML (HyperText Markup Language)
          • Based on SGML
          • Simple, but limited
      • 1996: Start of the design of XML
          • By the World Wide Web Consortium (W3C)
      • 1998: Publication of XML 1.0
  13. Design Goals
      • Easy to use over the Internet
          • Power of SGML
          • Simplicity of HTML
      • Human-legible
      • Easy to create
      • Compactness is not an issue
      • “The ASCII of the Web”
  14. XML Basics
          <Person>
            <Name>
              <First>Thomas</First>
              <Last>Atkinson</Last>
            </Name>
            <Age>30</Age>
          </Person>
      • Self-defined, meaningful tags
      • Separates data from its representation
  15. • Language for defining syntax
      • Records and fields have explicit boundaries
          • Parseable without knowing the structure (self-descriptive)
      • Unicode support (UTF-8, UTF-16, ...)
      • Web-aware
          • DTD, ENTITY and Schema can be loaded through a URL
      • Strictly parsed: no ambiguity (case sensitive!)
      • Extensible: namespaces
  16.     <?xml version="1.0" encoding="UTF-8"?>
          <!-- XML declaration: XML follows -->
          <!DOCTYPE addressbook SYSTEM "http://www/~koenh/ddml/addressbook.dtd">
          <!-- Document Type Declaration... -->
          <!-- ExternalDTDPointer -->
          <addressbook> <!-- root element -->
            <person first-name="John" family-name="Doe" employee-number="1234">
              <contact-info>
                <email address="Jdoe@home.com"/>
              </contact-info>
              <address street="Celestijnenlaan" number="200A"/>
            </person>
          </addressbook>
  24. • Cf. HTML markup tags
              <H1 align="center"> a Heading </H1>
          • Opening tag: <H1 align="center">, which carries the attribute align="center"
          • Content: a Heading
          • Closing tag: </H1>
          • Element: opening tag, content and closing tag together
      • Major differences:
          • Case sensitive
          • Proper nesting: no <A> … <B> … </A> … </B>
          • Unicode instead of ASCII
  25. Vocabularies
      • Agreed-upon XML tag sets for a specific domain
      • Examples
          • Chemical Markup Language (CML)
          • Business: ebXML, RosettaNet, BizTalk
          • Mathematics: MathML
          • Multimedia: Synchronized Multimedia Integration Language (SMIL)
          • Etc.
  26. • Well-formed: follows XML syntax
          • Proper tag and attribute names
          • Tags properly closed
          • Attribute values and text between tags do not contain '<' (escape it as &lt;)
      • Valid: well-formed and conforming to a vocabulary
          • All elements and their attributes are declared in the DTD
          • Attribute values follow the DTD type declarations
              • CDATA, ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, enumerated
          • Nesting and sequencing of elements follow the DTD
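The well-formedness rules can be checked by simply handing a document to a non-validating parser. The following is a minimal sketch using the JDK's JAXP DOM parser; the class name `WellFormedCheck` and the sample strings are illustrative assumptions, not part of the lecture.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string is well-formed XML.
final class WellFormedCheck {
    /** Returns true if the text parses as XML; no DTD validation is requested. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a well-formedness violation
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<A><B>text</B></A>")); // properly nested
        System.out.println(isWellFormed("<A><B></A></B>"));     // improper nesting
        System.out.println(isWellFormed("<H1>a Heading</h1>")); // case mismatch
        System.out.println(isWellFormed("<a>x < y</a>"));       // unescaped '<'
    }
}
```

Validity is a separate, stronger check: it additionally needs a DTD (or schema) to compare the document against.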
  27. Elements
      • XML's container for
          • Attributes
          • Character data
          • Other elements (“child” elements)
      • Delimited by opening and closing tags
          • Non-empty element: <name>..</name>
          • Empty element: <name/>
      • Form a simple hierarchical tree
          • Root = “document element”
  28. Attributes and Strings
      • Attributes
          • Name-value pairs: name="value"
          • Only strings as value!
      • Strings
          • Enclosed by '...' or "..."; a quote character inside the string is replaced with &apos; or &quot;
      • Character data
          • Any text that is not markup
          • '&', '<' and '>' are markup: replace them with &amp;, &lt; and &gt;
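The five replacements listed above can be applied mechanically before text is written into a document. This is a small sketch; `XmlEscape` is a hypothetical helper name, and real projects would normally let an XML library do the escaping.

```java
// Hypothetical helper: escapes the five characters that XML treats as markup.
final class XmlEscape {
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Builds an attribute value that is safe to place between double quotes.
        System.out.println("<note text=\"" + escape("Tom & \"Jerry\" <3") + "\"/>");
    }
}
```

Escaping all five characters everywhere is slightly more than strictly required, but it keeps the same function usable for both attribute values and element content.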
  29. Document structure
      • Prolog (optional)
          • <?xml version="1.0" encoding="UTF-8"?>
              • version="number" (compulsory)
              • encoding="character encoding" (optional)
          • Document type declaration
              • <!DOCTYPE document_element ... >
      • Body
          • The document element
  30. Another example
          <?xml version="1.0" standalone="no"?>
          <!DOCTYPE BankAccounts ...>
          <!-- This is an example XML document -->
          <BankAccounts>
            <Account accountNr="123-456789-01" use="personal">
              <Owners>
                <Person ID="1258-a8d72-98"><Name>John Smith</Name></Person>
                <Person ID="5842-df5ef-e9"><Name>Claudia Scott</Name></Person>
              </Owners>
              <CreditCards><CreditCard number="12345"/></CreditCards>
              <Balance Currency="EUR">50000</Balance>
            </Account>
            ...
          </BankAccounts>
  31. Document Type Definition
          <!ELEMENT address EMPTY>
            <!-- no content, used for attributes only -->
          <!ATTLIST address
              city   CDATA   #REQUIRED
              state  NMTOKEN #REQUIRED
              number CDATA   #REQUIRED
              street CDATA   #REQUIRED>
            <!-- CDATA: character data, any string -->
            <!-- #REQUIRED: a value for that attribute must be present -->
            <!-- NMTOKEN: name token; letters, numbers, ., -, _ and : only -->
          <!ELEMENT addressbook (person+)>
            <!-- + : 1 or more -->
          <!ELEMENT contact-info (home-phone|mobile-phone|email)*>
            <!-- | : choice; * : 0 or more -->
  32. Document Type Definition
          <!ELEMENT email EMPTY>
          <!ATTLIST email address CDATA #REQUIRED>
          <!ELEMENT home-phone EMPTY>
          <!ATTLIST home-phone number CDATA #REQUIRED>
          <!ELEMENT job-info EMPTY>
          <!ATTLIST job-info
              is-manager      (yes|no)            'no'
              emp-type        (FullTime|PartTime) 'FullTime'
              job-description CDATA               #REQUIRED>
            <!-- 'no' and 'FullTime' are default values -->
          <!ELEMENT misc-info (#PCDATA)>
            <!-- Parsed Character Data: cannot contain subelements -->
          <!ELEMENT mobile-phone EMPTY>
          <!ATTLIST mobile-phone number CDATA #REQUIRED>
  33. Document Type Definition
          <!ELEMENT manager EMPTY>
          <!ATTLIST manager empnumber IDREF #REQUIRED>
            <!-- reference to the employee-number of a person -->
          <!ELEMENT person (contact-info,address,job-info?,manager?,misc-info?)>
            <!-- , : sequence; ? : zero or one -->
          <!ATTLIST person
              first-name      CDATA #REQUIRED
              middle-initial  CDATA #IMPLIED
              employee-number ID    #REQUIRED
              family-name     CDATA #REQUIRED>
            <!-- #IMPLIED: can, but need not, be provided -->
            <!-- ID: can be referred to by manager.empnumber -->
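To see such declarations enforced, a validating parser can be pointed at a document that carries its DTD. The sketch below embeds a cut-down, hypothetical version of the addressbook DTD as an internal subset (the external DTD URL from the example is not assumed to be reachable). Note that ID values must be XML names, so the sketch uses `e1234` rather than a purely numeric employee number.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: validates an addressbook body against a reduced DTD.
final class DtdValidation {
    // Assumption: a simplified subset of the DTD shown on the slides.
    static final String DOCTYPE =
          "<!DOCTYPE addressbook [\n"
        + "  <!ELEMENT addressbook (person+)>\n"
        + "  <!ELEMENT person (contact-info)>\n"
        + "  <!ATTLIST person first-name CDATA #REQUIRED\n"
        + "                   employee-number ID #REQUIRED>\n"
        + "  <!ELEMENT contact-info (email)*>\n"
        + "  <!ELEMENT email EMPTY>\n"
        + "  <!ATTLIST email address CDATA #REQUIRED>\n"
        + "]>\n";

    static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the document against its DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // validity errors are recoverable by default; treat them as failures
                }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String person = "<person first-name=\"John\" employee-number=\"e1234\">"
                      + "<contact-info><email address=\"Jdoe@home.com\"/></contact-info></person>";
        System.out.println(isValid("<addressbook>" + person + "</addressbook>"));
        System.out.println(isValid("<addressbook/>")); // violates (person+)
    }
}
```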
  34. Namespaces: problem
          <widget type="gadget">
            <head size="medium"/>
            <big><subwidget ref="gizmo"/></big>
            <info>
              <head><title>Gadget</title></head>
              <body><h1>Gadget</h1> A gadget contains a big gizmo </body>
            </info>
          </widget>
      • Name collision! (<head> is used both as a widget element and as an HTML element)
  35. Namespaces: approach
      • A namespace is a collection of names, identified by a URI reference, which are used in XML documents as element types and attribute names
      • xmlns:prefix="URI"
          • The URI is used only as an identifier
          • It does not need to point to anything
          • The declaration applies to all nested elements and attributes
  36. Namespaces: example
          <widget xmlns="http://www.widget.org"
                  xmlns:xhtml="http://www.w3.org/TR/xhtml1"
                  type="gadget">
            <head size="medium"/>
            <big><subwidget ref="gizmo"/></big>
            <info>
              <xhtml:head><xhtml:title>Gadget</xhtml:title></xhtml:head>
              <xhtml:body><xhtml:h1>Gadget</xhtml:h1>A gadget contains...</xhtml:body>
            </info>
          </widget>
  37. Another example
          <Address>
            <Street>Celestijnenlaan</Street>
            <Nr>200A</Nr>
            <City>Heverlee-Leuven</City>
            <Country>Belgium</Country>
          </Address>

          <Server>
            <Name>www</Name>
            <Address>134.58.43.1</Address>
          </Server>
      • ? (the same name, Address, with two different meanings)
  38. Another example (2)
          <Address xmlns="www.all.edu/departments">
            <Street>Celestijnenlaan</Street>
            <Nr>200A</Nr>
            <City>Heverlee-Leuven</City>
            <Country>Belgium</Country>
          </Address>

          <Server xmlns="www.dns.net/servers">
            <Name>www</Name>
            <Address>134.58.43.1</Address>
          </Server>

          <Department xmlns:edu="www.all.edu/departments"
                      xmlns:dns="www.dns.net/servers">
            <edu:Address>
              <Street>Celestijnenlaan</Street>
              ...
            </edu:Address>
            <dns:Name>www</dns:Name>
            <dns:Address>134.58.43.1</dns:Address>
          </Department>
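Once the two Address elements live in different namespaces, a namespace-aware parser can tell them apart by the pair (namespace URI, local name). The following sketch assumes the JDK's DOM parser and reuses the Department example; `NamespaceLookup` and `firstText` are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: finds elements by namespace URI and local name.
final class NamespaceLookup {
    static final String DEPARTMENT =
          "<Department xmlns:edu=\"www.all.edu/departments\" xmlns:dns=\"www.dns.net/servers\">"
        + "<edu:Address><Street>Celestijnenlaan</Street></edu:Address>"
        + "<dns:Name>www</dns:Name>"
        + "<dns:Address>134.58.43.1</dns:Address>"
        + "</Department>";

    /** Text of the first element with the given namespace and local name, or null if none. */
    static String firstText(String xml, String namespaceUri, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(xml)));
            NodeList hits = doc.getElementsByTagNameNS(namespaceUri, localName);
            return hits.getLength() == 0 ? null : hits.item(0).getTextContent().trim();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstText(DEPARTMENT, "www.dns.net/servers", "Address"));
        System.out.println(firstText(DEPARTMENT, "www.all.edu/departments", "Address"));
    }
}
```

The unprefixed Street element is in no namespace here, because Department declares only prefixed namespaces and no default one.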
  39. Accessing XML documents
      • Manual text file manipulation
          • Cumbersome and error-prone
      • Parser
          • Simplifies document manipulation
          • Ensures proper grammar, well-formedness
          • Abstracts content from grammar
          • Accessed through a standard API
              • Document Object Model (DOM)
              • Simple API for XML (SAX)
  40. • DOM parser
          • Creates a DOM object tree
      • SAX parser
          • Generates events when elements are encountered
          • One-pass translation
          • No need to keep the whole document tree in memory
      • Both can be validating or non-validating
      • Many available (most freeware, open source)
          • ibm xml4j, apache xerces, sun parser, microsoft, datachannel, oracle, ...
  41. DOM approach
      http://java.sun.com/xml/jaxp/dist/1.1/docs/tutorial/overview/3_apis.html#JAXP
  42. DOM Node Tree
      • XML document:
          <?xml version="1.0"?>
          <!-- An example XML document -->
          <BankAccounts>
            <Account accountNr="123-456789-01">
              <Owner ID="1258-a8d72-98">
                John Smith
              </Owner>
              <Balance Currency="EUR">
                50000
              </Balance>
            </Account>
            <Account ...> ...
          </BankAccounts>
      • Corresponding node tree (Doc = document, Com = comment, El = element, Att = attribute):
          Doc
            Com  An example XML document
            El   BankAccounts
              El   Account
                Att  accountNr = "123-456789-01"
                El   Owner = "John Smith"
                  Att  ID = "1258-a8d72-98"
                El   Balance = "50000"
                  Att  Currency = "EUR"
              El   Account
                ...
  43. Parsing: DOM
          public void print(Node node) {
              ...
              NodeList nlist = node.getChildNodes();
              if (nlist != null) {
                  int l = nlist.getLength();
                  for (int i = 0; i < l; i++) {
                      print(nlist.item(i));   // recurse into each child node
                      ...
                  }
                  ...
              }
              ...
          }
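The elided parts of the print method are not reconstructed here. As a separate, complete illustration of the same recursive traversal, the sketch below prints an indented outline of the element tree; the class `DomOutline` and its output format are assumptions made for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example: walks a DOM tree recursively and lists elements with their attributes.
final class DomOutline {
    static String outline(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                                                 .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            walk(doc.getDocumentElement(), 0, out);
            return out.toString();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    private static void walk(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return; // skip text and comment nodes in this outline
        }
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.getNodeName());
        NamedNodeMap atts = node.getAttributes();
        for (int i = 0; i < atts.getLength(); i++) {
            out.append(' ').append(atts.item(i).getNodeName())
               .append('=').append(atts.item(i).getNodeValue());
        }
        out.append('\n');
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, out); // same recursion as print(nlist.item(i))
        }
    }

    public static void main(String[] args) {
        System.out.print(outline(
            "<BankAccounts><Account accountNr=\"123-456789-01\">"
          + "<Balance Currency=\"EUR\">50000</Balance></Account></BankAccounts>"));
    }
}
```

Because the whole tree is in memory, the traversal can visit nodes in any order, which is the random-access benefit of DOM.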
  44. DOM Benefits & Drawbacks
      • Benefits
          • W3C Recommendation
          • Language- and platform-independent
          • Random access
          • Intuitive
      • Drawback
          • Entire object tree in memory
  45. Simple API for XML (SAX)
      • Not an official standard
          • Ad-hoc product by XML developers
          • Primarily a Java API
      • Event-based mechanism
          • Don't call the parser, the parser calls you
          • No object model in memory
          • Programmer must keep state information
  46. SAX approach
      http://java.sun.com/xml/jaxp/dist/1.1/docs/tutorial/overview/3_apis.html#JAXP
  47. SAX parsing model
      • The application creates a handler: new ContentHandler()
      • The application creates a parser: new Parser()
      • The application registers the handler and starts parsing: setContentHandler(), parse()
      • The parser calls back into the ContentHandler: startDocument(), startElement(), characters(), endElement(), endDocument()
  48. Parsing: SAX
          $xml_parser = xml_parser_create();
          xml_set_element_handler($xml_parser, "startQuestion", "endQuestion");
          ...
          xml_parse($xml_parser, $data, feof($fp))
          ...
          function startQuestion($parser, $name, $attrs) {
              ... if ($name == "QUESTION") ... new Question($attrs["QTEXT"]);
              ...
  49. • Start and end of document
          • startDocument()
          • endDocument()
      • Start and end of element
          • startElement(namespace, name, qname, attlist)
          • endElement(namespace, name, qname)
      • Character data
          • characters(char[] ch, int start, int length)
      • Processing instruction
          • processingInstruction(target, data)
      • No event for comments!
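To make the callback order concrete, a handler can simply record each event as it arrives. This is a minimal sketch using the JDK's SAX parser; `SaxEventLog` and the event labels are hypothetical, and whitespace-only character data is dropped to keep the log short.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records the SAX events it receives, in order.
final class SaxEventLog extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument()   { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("chars:" + text);
        }
    }

    static List<String> eventsFor(String xml) {
        try {
            SaxEventLog log = new SaxEventLog();
            SAXParserFactory.newInstance().newSAXParser()
                            .parse(new InputSource(new StringReader(xml)), log);
            return log.events;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The comment produces no event at all.
        for (String event : eventsFor("<Account><!-- note --><Balance>50000</Balance></Account>")) {
            System.out.println(event);
        }
    }
}
```

A parser is allowed to deliver one run of text in several characters() calls, so production handlers usually buffer the text until the matching endElement().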
  50. Another SAX example
          <?xml version="1.0" standalone="no"?>
          <!DOCTYPE BankAccounts ...>
          <!-- This is an example XML document -->
          <BankAccounts>
            <Account accountNr="123-456789-01" use="personal">
              <Owners>
                <Person ID="1258-a8d72-98"><Name>John Smith</Name></Person>
                <Person ID="5842-df5ef-e9"><Name>Claudia Scott</Name></Person>
              </Owners>
              <CreditCards><CreditCard number="12345"/></CreditCards>
              <Balance Currency="EUR">50000</Balance>
            </Account>
            ...
          </BankAccounts>
  51.     import org.xml.sax.Attributes;
          import org.xml.sax.SAXException;
          import org.xml.sax.helpers.DefaultHandler;

          public class AvgBalanceCalculator extends DefaultHandler {
              private double total = 0.0;
              private int count = 0;
              private boolean isBalance = false;

              @Override
              public void startElement(String uri, String name, String qname, Attributes atts) {
                  // qname is compared because the local name is empty unless the parser is namespace-aware
                  if (qname.equals("Balance")) {
                      isBalance = true;
                      count++;
                  }
              }

              @Override
              public void characters(char[] ch, int start, int len) throws SAXException {
                  if (isBalance) {
                      String help = new String(ch, start, len);
                      double balance = Double.parseDouble(help.trim());
                      total = total + balance;
                      isBalance = false;
                  }
              }

              @Override
              public void endDocument() {
                  if (count != 0)
                      System.out.println("Average balance is " + (total / count));
              }
          }
  52. SAX Benefits & Drawbacks
      • Benefits
          • Suitable when
              • parsing large documents
              • constructing proprietary object structures
              • only a small subset of the information is needed
          • Simple and fast
      • Drawbacks
          • Read-only
          • No random access
          • Complex searches are messy to program
  53. Limitations of DTDs
      • No typing of text elements and attributes
          • All values are strings; no integers, reals, etc.
      • An unordered collection of subelements is hard to define
          • Order is usually irrelevant in databases
      • IDs and IDREFs are not typed
          • The DNO attribute of an EMPLOYEE can hold a reference to another EMPLOYEE, which is meaningless, e.g. <EMPLOYEE SSN="_888665555" SEX="M" DNO="_888665555">
          • The DNO attribute should be constrained to refer only to a DEPARTMENT
  54. XML Schema
      • Typing of values
          • E.g. integer, string, etc.
          • Also constraints on min/max values
          • User-defined types
      • Is specified in XML syntax
          • A more standardized representation
      • Is integrated with namespaces
      • And other features
          • List types, uniqueness constraints on keys, foreign-key constraints, inheritance, …
  55. XSDL
      • XML Schema Definition Language
      • Documents with suffix .xsd
  56. XML Schema: example
      • XML schema
          <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
            ....
            <xsd:element name="PWORKER" minOccurs="0" maxOccurs="unbounded">
              <xsd:complexType>
                <xsd:sequence>
                  <xsd:element name="HOURS" type="xsd:float"/>
                </xsd:sequence>
                <xsd:attribute name="SSN" type="xsd:IDREF" use="required"/>
              </xsd:complexType>
            </xsd:element>
            ....
          </xsd:schema>
      • XML instance
          <PWORKER SSN="_123456789">
            <HOURS>7.5</HOURS>
          </PWORKER>
  57. XML: simple types
      • Built-in simple types
          • string, integer, decimal, float, boolean, date, time, …
          • <xsd:element name="gebdat" type="xsd:date" />   (gebdat: date of birth)
      • User-defined simple types
          • Defined with the simpleType element
          • The restriction element names the base type being restricted
              <xsd:simpleType name="salaryRange">
                <xsd:restriction base="xsd:integer">
                  <xsd:minInclusive value="25000" />
                  <xsd:maxInclusive value="100000" />
                </xsd:restriction>
              </xsd:simpleType>
  58. XML: simple types
          <xsd:simpleType name="studentClassificatie">
            <xsd:restriction base="xsd:string">
              <xsd:enumeration value="bachelorstudent" />
              <xsd:enumeration value="masterstudent" />
              <xsd:enumeration value="doctorstudent" />
            </xsd:restriction>
          </xsd:simpleType>

          <xsd:simpleType name="deptType">
            <xsd:restriction base="xsd:string">
              <xsd:length value="3" />
            </xsd:restriction>
          </xsd:simpleType>
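A user-defined simple type such as salaryRange only has an effect once a validator checks instance documents against it. The sketch below uses the JDK's `javax.xml.validation` API; the `salary` element that applies the type, and the class name `SalaryRangeCheck`, are assumptions added for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical example: validates a <salary> element against the salaryRange type.
final class SalaryRangeCheck {
    // Assumption: the element declaration "salary" is not on the slides; it only applies the type.
    static final String XSD =
          "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
        + "  <xsd:simpleType name=\"salaryRange\">"
        + "    <xsd:restriction base=\"xsd:integer\">"
        + "      <xsd:minInclusive value=\"25000\"/>"
        + "      <xsd:maxInclusive value=\"100000\"/>"
        + "    </xsd:restriction>"
        + "  </xsd:simpleType>"
        + "  <xsd:element name=\"salary\" type=\"salaryRange\"/>"
        + "</xsd:schema>";

    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            // validate() throws SAXException on the first validity error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<salary>30000</salary>")); // inside the range
        System.out.println(isValid("<salary>10</salary>"));    // below minInclusive
        System.out.println(isValid("<salary>abc</salary>"));   // not an integer
    }
}
```

This is exactly the kind of check a DTD cannot express, since a DTD would treat all three values as plain strings.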
  63. XPath (example): /COMPANY/EMPLOYEE
          ROOT
            COMPANY
              EMPLOYEE
                SSN  _123456789
              EMPLOYEE
                SSN  _333445555
              EMPLOYEE
                SSN  _999887777
  68. XPath: /COMPANY/EMPLOYEE selects each EMPLOYEE element under COMPANY:
          <EMPLOYEE SSN="_123456789" SEX="M" SUPERSSN="_333445555" DNO="_5">
            <FNAME>John</FNAME>
            <MINIT>B</MINIT>
            ....
          </EMPLOYEE>
          <EMPLOYEE SSN="_333445555" SEX="M" SUPERSSN="_888665555" DNO="_5">
            <FNAME>Franklin</FNAME>
            <MINIT>T</MINIT>
            <LNAME>Wong</LNAME>
            <BDATE>08-DEC-45</BDATE>
          </EMPLOYEE>
          <EMPLOYEE SSN="_999887777" SEX="F" SUPERSSN="_987654321" DNO="_4">
            <FNAME>Alicia</FNAME>
            .....
  69. XML family of technologies
      • XLink: hypertext
      • XSL: Extensible Stylesheet Language
          • XSLT: Transformations
          • Formatting Objects
      • XML Schema: additional constraints on attribute types
      • and more...
  70. XML applications
      • RDF: Resource Description Framework
          • infra
      • XHTML: eXtensible HTML, and HTML5
          • XML-compliant HTML
      • MathML
      • SMIL: synchronized multimedia presentation
      • Many others
          • Chemical Markup Language, Vector Graphics Markup Language, Open Software Description Format, weather observation, astronomical data, financial data, electronic components, workflow, business cards, real estate, newspaper, classifieds, javadoc, human resource, advertising, architecture, …
  71. XML Working Groups
      • XML Coordination
      • XML Core
      • XSL (XSLT, XSL/FO) -> W3C architecture
      • Efficient XML Interchange
      • XML Processing Model
      • XML Query (XQuery, XPath)
      • XML Schema
      • Service Modeling Language (SML)
  72. More XPath Features
      • The operator “|” implements union
          • E.g. //EMPLOYEE[count(DEPENDENT) = 1] | //EMPLOYEE[not(DEPENDENT)]
          • gives the employees with either 0 or 1 dependents
      • “//” can be used to skip multiple levels of nodes
          • E.g. /COMPANY//FNAME
          • finds any FNAME element anywhere under the /COMPANY element, regardless of the element in which it is contained
      • A step in the path can go to parents, siblings, ancestors and descendants of the nodes generated by the previous step, not just to the children
          • “//”, described above, is a short form for specifying “all descendants”
          • “..” specifies the parent
          • E.g. /COMPANY//FNAME/../BDATE
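These expressions can be tried out with the XPath 1.0 engine that ships with the JDK. The sketch below evaluates them against a small COMPANY document; the number of DEPENDENT elements per employee and John's BDATE are made-up sample data, and `XPathDemo` is a hypothetical class name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example: evaluates XPath expressions and returns the text of the selected nodes.
final class XPathDemo {
    // Sample data (assumption): John has 2 dependents, Franklin 1, Alicia none.
    static final String COMPANY =
          "<COMPANY>"
        + "<EMPLOYEE><FNAME>John</FNAME><BDATE>01-JAN-60</BDATE><DEPENDENT/><DEPENDENT/></EMPLOYEE>"
        + "<EMPLOYEE><FNAME>Franklin</FNAME><BDATE>08-DEC-45</BDATE><DEPENDENT/></EMPLOYEE>"
        + "<EMPLOYEE><FNAME>Alicia</FNAME></EMPLOYEE>"
        + "</COMPANY>";

    static List<String> select(String xml, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(COMPANY, "/COMPANY//FNAME"));
        // Union of employees with exactly one dependent and employees with none, then their FNAME.
        System.out.println(select(COMPANY,
            "(//EMPLOYEE[count(DEPENDENT) = 1] | //EMPLOYEE[not(DEPENDENT)])/FNAME"));
        // ".." steps back up to the parent EMPLOYEE before selecting BDATE.
        System.out.println(select(COMPANY, "/COMPANY//FNAME/../BDATE"));
    }
}
```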
  73. XQuery
      • Allows more general queries to be formulated than XPath
      • General form: a FLWOR expression (keywords are lowercase in XQuery)
              for < for-variable > in < in-expression >
              let < let-variable > := < let-expression >
              [ where < filter-expression > ]
              [ order by < order-specification > ]
              return < expression >
      • Note: for and let can occur alone or together
  74. • Q1: first name and last name of all employees who earn more than 70000
              for $x in doc("www.company.com/info.xml")//employee[employeeSalary > 70000]/employeeName
              return <res>{ $x/firstName, $x/lastName }</res>
      • Alternative:
              for $x in doc("www.company.com/info.xml")/company/employee
              where $x/employeeSalary > 70000
              return <res>{ $x/employeeName/firstName, $x/employeeName/lastName }</res>
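The JDK does not include an XQuery engine, so the following is only an analogy: it mirrors the clauses of a FLWOR expression (for, where, order by, return) with plain Java over an in-memory list. The `Employee` class, the sample salaries and the added ordering by last name are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical analogy: each FLWOR clause mapped onto a step of ordinary Java code.
final class FlworSketch {
    static final class Employee {
        final String firstName;
        final String lastName;
        final int salary;

        Employee(String firstName, String lastName, int salary) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.salary = salary;
        }
    }

    // Made-up sample data.
    static final List<Employee> SAMPLE = Arrays.asList(
        new Employee("John", "Smith", 30000),
        new Employee("Franklin", "Wong", 80000),
        new Employee("James", "Borg", 90000));

    static List<String> highEarners(List<Employee> employees, int minSalary) {
        List<Employee> matches = new ArrayList<>();
        for (Employee x : employees) {          // for $x in ...
            if (x.salary > minSalary) {         // where $x/employeeSalary > minSalary
                matches.add(x);
            }
        }
        matches.sort(Comparator.comparing((Employee e) -> e.lastName)); // order by lastName
        List<String> result = new ArrayList<>();
        for (Employee x : matches) {            // return firstName, lastName
            result.add(x.firstName + " " + x.lastName);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(highEarners(SAMPLE, 70000));
    }
}
```

The point of the analogy is the order of evaluation: bindings are generated first, filtered next, sorted, and only then turned into results.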
  75. • Q3: first name and last name of all employees who work more than 20 hours on project number 5, together with that number of hours
              for $x in doc("www.company.com/info.xml")/company/project[projectNumber = 5]/projectWorker,
                  $y in doc("www.company.com/info.xml")/company/employee
              where $x/hours > 20.0 and $y/ssn = $x/ssn
              return <res>{ $y/employeeName/firstName, $y/employeeName/lastName, $x/hours }</res>
  76. The End... Thank you! Questions...?
  78. NoSQL
      • non-relational
      • distributed
      • open source
      • horizontally scalable
      • “web scale”
      • schema free
      • easy replication
      • simple API
      • BASE (not ACID)
  79. Systems (http://nosql-databases.org/)
      • Core: Hadoop, HBase, Cassandra, Hypertable, ...
      • Docs: CouchDB, MongoDB, Riak, Terrastore, ...
      • Key-value, tuple: Amazon SimpleDB, Azure, ...
      • Graph: Neo4J, Bigdata, InfoGrid, HyperGraph, ...
      • Object: Versant, Perst, ZODB, ...
      • Grid: GigaSpaces, Hazelcast, ...
      • XML: Tamino, eXist, Mark Logic, Xindice, ...
      • ...
  80. NoSQL
      • Google BigTable
      • Amazon Dynamo
      • Open source: HBase
      • Cassandra: last.fm, Facebook
  81. NoSQL: why
      • Big data sets:
          • Digg green badge: 3 TB
          • Facebook inbox: 50 TB
          • eBay overall data: 2 PB
  82. (Slides 82-85 are image slides, one of them annotated “14 seconds”; source: http://about.digg.com/blog/looking-future-cassandra)
  86. http://www.slideshare.net/oemebamo/database-sharding-at-netlog-presentation
  87. No attempt at ACID
      • Atomicity
      • Consistency
      • Isolation
      • Durability
      • ACID is traded off in favor of high availability
  88. Query
      • associative array, key-value pair
      • XQuery
      • SPARQL
  89. Questions...?
