Unit 10: XML and Beyond (Semantic Web, Web Services, ...)


  1. Unit 10: XML and Web and Beyond
     - XML
       - DTD, XML Schema
       - XSL, XQuery
     - Web Services
       - SOAP, WSDL
       - RESTful Web Services
     - Semantic Web
       - Introduction
       - RDF, RDF Schema, OWL, SPARQL
     (dsbw 2011/2012 q1)
  2. eXtensible Markup Language
     “... is a simple, very flexible text format derived from SGML (ISO 8879). Originally designed to meet the challenges of large-scale electronic publishing, XML is also playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere.” (W3 Consortium)
     XML ...
     - is not a solution but a tool to build solutions
     - is not a language but a meta-language: interoperating applications that use it must adopt clear conventions on how to use it
     - is a standardized text format used to represent structured information
  3. SGML, XML and their applications
     [Figure: meta-markup languages and the markup languages (applications) defined with them]
     - SGML (meta-markup language): HyTime, HTML
     - XML (meta-markup language): XHTML, SMIL, SOAP, WML
  4. Well-Formed XML Documents
     - The document has exactly one root element
     - The root element can be preceded by an optional XML declaration
     - Non-empty elements are delimited by both a start-tag and an end-tag
     - Empty elements are marked with an empty-element (self-closing) tag
     - Tags may be nested but must not overlap
     - All attribute values are quoted with either single (') or double (") quotes

       <?xml version="1.0" encoding="UTF-8"?>
       <address>
         <street>
           <line>123 Pine Rd.</line>
         </street>
         <city name="Lexington"/>
         <state abbrev="SC"/>
         <zip base="19072" plus4=""/>
       </address>
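The well-formedness rules above can be checked mechanically, since any conforming XML parser must reject a document that breaks them. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the three broken sample strings are illustrative assumptions, not part of the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the string parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Shortened version of the address document from the slide.
good = ('<address><street><line>123 Pine Rd.</line></street>'
        '<city name="Lexington"/></address>')

# Hypothetical counter-examples, one per violated rule.
overlapping = "<a><b></a></b>"        # tags overlap instead of nesting
two_roots = "<a/><b/>"                # more than one root element
unquoted = "<city name=Lexington/>"   # attribute value is not quoted

for doc in (good, overlapping, two_roots, unquoted):
    print(is_well_formed(doc), doc)
```

Note that this only tests well-formedness; it says nothing about whether the document is valid against a schema.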
  5. Valid XML Documents
     - Are well-formed XML documents
     - Conform to the rules defined by a given schema
     - A schema defines the legal building blocks of an XML document: it describes the document structure as a list of legal elements
     - Two ways to define a schema:
       - DTD: Document Type Definition
       - XML Schema
  6. DTD Example: Embedded and External Definitions
     Embedded definition:

       <?xml version="1.0" encoding="UTF-8"?>
       <!DOCTYPE address [
         <!ELEMENT address (street, city, state, zip)>
         <!ELEMENT street (line+)>
         <!ELEMENT line (#PCDATA)>
         <!ELEMENT city (#PCDATA)>
         <!ELEMENT state (#PCDATA)>
         <!ELEMENT zip (#PCDATA)>
       ]>
       <address> ... </address>

     External definition:

       <?xml version="1.0" encoding="UTF-8"?>
       <!DOCTYPE address SYSTEM "http://dtd.mycompany.com/address.dtd">
       <address> ... </address>
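A DTD content model such as `(street, city, state, zip)` or `(line+)` is essentially a regular expression over child element names. The sketch below illustrates that idea by hand-translating the DTD above into Python regular expressions; it is not a DTD validator (ElementTree does not validate against DTDs), it ignores attributes and text content, and the `CONTENT_MODELS` table and `is_valid` helper are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the DTD, rewritten as regular expressions over
# space-terminated child names. None stands for (#PCDATA): no child elements.
CONTENT_MODELS = {
    "address": r"street city state zip ",
    "street": r"(line )+",
    "line": None,
    "city": None,
    "state": None,
    "zip": None,
}


def is_valid(elem) -> bool:
    """Check an element (and its descendants) against CONTENT_MODELS."""
    if elem.tag not in CONTENT_MODELS:
        return False  # element not declared
    model = CONTENT_MODELS[elem.tag]
    children = "".join(child.tag + " " for child in elem)
    if model is None:
        ok = children == ""
    else:
        ok = re.fullmatch(model, children) is not None
    return ok and all(is_valid(child) for child in elem)


valid_doc = ET.fromstring(
    "<address><street><line>123 Pine Rd.</line></street>"
    "<city>Lexington</city><state>SC</state><zip>19072</zip></address>")
# Well-formed, but <zip> is missing, so it violates the content model.
invalid_doc = ET.fromstring(
    "<address><street><line>123 Pine Rd.</line></street>"
    "<city>Lexington</city><state>SC</state></address>")

print(is_valid(valid_doc), is_valid(invalid_doc))
```

The second document shows the difference between the two notions: it parses without error, yet it is not valid with respect to the DTD.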
  7. DTD Limitations
     - DTDs are not integrated with namespace technology, so users cannot import and reuse definitions
     - DTDs do not support data types other than character data
     - DTD syntax is not itself XML
     - DTD language constructs are not extensible
  8. XML Schema: Example

       <?xml version="1.0" encoding="UTF-8"?>
       <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                   elementFormDefault="qualified">
         <xsd:element name="address">
           <xsd:complexType>
             <xsd:sequence>
               <xsd:element name="street">
                 <xsd:complexType>
                   <xsd:sequence>
                     <xsd:element name="line" type="xsd:string" maxOccurs="unbounded"/>
                   </xsd:sequence>
                 </xsd:complexType>
               </xsd:element>
               <xsd:element name="city" type="xsd:string"/>
               <xsd:element name="state" type="xsd:string"/>
               <xsd:element name="zip" type="xsd:string"/>
             </xsd:sequence>
           </xsd:complexType>
         </xsd:element>
       </xsd:schema>
  9. Processing XML Documents
     - Using a programming language and the SAX API
       - SAX is a lexical, event-driven interface in which a document is read serially and its contents are reported as "callbacks" to various methods on a handler object of the user's design
     - Using a programming language and the DOM API
       - DOM allows for navigation of the entire document as if it were a tree of "Node" objects representing the document's contents
     - Using a transformation engine and a filter
       - XSLT, XQuery, etc.
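The contrast between the two APIs can be sketched with Python's standard library, which ships both a SAX parser (`xml.sax`) and a small DOM implementation (`xml.dom.minidom`). The handler class `ElementCollector` is a hypothetical name; the document is the address example from the earlier slide.

```python
import xml.sax
from xml.dom import minidom

DOC = ('<address><street><line>123 Pine Rd.</line></street>'
       '<city name="Lexington"/><state abbrev="SC"/>'
       '<zip base="19072" plus4=""/></address>')


class ElementCollector(xml.sax.ContentHandler):
    """SAX: the parser calls startElement once per start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


# SAX: serial read, nothing is kept unless the handler stores it.
handler = ElementCollector()
xml.sax.parseString(DOC.encode("utf-8"), handler)
print(handler.names)

# DOM: the whole document is loaded as a tree that can be navigated freely.
dom = minidom.parseString(DOC)
city = dom.getElementsByTagName("city")[0]
print(city.getAttribute("name"), city.parentNode.tagName)
```

SAX tends to suit large documents read once, while DOM is convenient when the program needs to move back and forth in the tree.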
  10. XML Uses
      - Alternative/complement to HTML
        - XML + CSS, XML + XSL, XHTML
      - Declarative application programming/configuration
        - Configuration files, descriptors, etc.
      - Data exchange among heterogeneous systems
        - B2B, e-commerce: ebXML
      - Data integration from heterogeneous sources
        - Schema mediation
      - Data storage and processing
        - XML databases, XQuery (XPath)
      - Protocol definition
        - SOAP, WAP, WML, etc.
  11. XPath
      - Expression language to address elements of an XML document (used in XSLT, XQuery, ...)
      - A location path is a sequence of location steps separated by a slash (/)
      - Various navigation axes such as child, parent, following, etc.
      - XPath expressions look similar to file pathnames:

        /bib/book
        /bib/book[year>2008]/title
        //author[3]
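Location paths of this kind can be tried out with `xml.etree.ElementTree`, which supports only a limited subset of XPath (child steps, `//`, attribute tests and positional predicates, but no numeric comparisons). The sketch below uses a shortened, hypothetical `bib` document in which `year` is an attribute, so the year filter is written in Python rather than as a predicate.

```python
import xml.etree.ElementTree as ET

# Shortened sample data; an assumption made for this sketch.
bib = ET.fromstring("""
<bib>
  <book year="1994"><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last></author></book>
  <book year="2000"><title>Data on the Web</title>
    <author><last>Abiteboul</last></author>
    <author><last>Suciu</last></author></book>
</bib>""")

# /bib/book/title: child steps, written relative to the root element <bib>.
titles = [t.text for t in bib.findall("./book/title")]

# //author[2]: every <author> that is the second <author> child of its parent.
second_authors = [a.findtext("last") for a in bib.findall(".//author[2]")]

# /bib/book[@year>1995]/title: ElementTree has no comparison operators in
# predicates, so the numeric test is done in Python instead.
recent = [b.findtext("title") for b in bib.findall("./book")
          if int(b.get("year")) > 1995]

print(titles, second_authors, recent)
```

A full XPath engine would evaluate all three expressions directly; the Python-side filter is only a workaround for the subset used here.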
  12. eXtensible Stylesheet Language: XSL
      - XSL serves the dual purpose of
        - transforming XML documents
        - controlling document rendering
      - XSL consists of two parts:
        - XSL Transformations (XSLT):
          - An XML language for transforming XML documents
          - It uses XPath to search and traverse the element hierarchy of XML documents
        - XSL Formatting Objects (XSL-FO):
          - An XML language for specifying the visual formatting of an XML document
          - Functionally, it is a superset of CSS designed to support print layouts
  13. XQuery (XML Query): Example (source)

        <bib>
          <book year="1994">
            <title>TCP/IP Illustrated</title>
            <author><last>Stevens</last><first>W.</first></author>
            <publisher>Addison-Wesley</publisher>
            <price>65.95</price>
          </book>
          <book year="1992">
            <title>Advanced Programming in the Unix environment</title>
            <author><last>Stevens</last><first>W.</first></author>
            <publisher>Addison-Wesley</publisher>
            <price>65.95</price>
          </book>
          <book year="2000">
            <title>Data on the Web</title>
            <author><last>Abiteboul</last><first>Serge</first></author>
            <author><last>Suciu</last><first>Dan</first></author>
            <publisher>Morgan Kaufmann Publishers</publisher>
            <price>39.95</price>
          </book>
        </bib>
  14. XQuery (XML Query): Example (query)
      For each author, retrieve the last and first names as well as the titles of the author's books, ordered by last name, then first name.

        <results>
          {
            let $a := doc("http://bstore1.example.com/bib.xml")//author
            for $last in distinct-values($a/last),
                $first in distinct-values($a[last=$last]/first)
            order by $last, $first
            return
              <author>
                <name>
                  <last>{ $last }</last><first>{ $first }</first>
                </name>
                {
                  for $b in doc("http://bstore1.example.com/bib.xml")/bib/book
                  where some $ba in $b/author
                        satisfies ($ba/last = $last and $ba/first = $first)
                  return $b/title
                }
              </author>
          }
        </results>
  15. XQuery (XML Query): Example (result)

        <results>
          <author>
            <name>
              <last>Abiteboul</last><first>Serge</first>
            </name>
            <title>Data on the Web</title>
          </author>
          <author>
            <name>
              <last>Stevens</last><first>W.</first>
            </name>
            <title>TCP/IP Illustrated</title>
            <title>Advanced Programming in the Unix environment</title>
          </author>
          <author>
            <name>
              <last>Suciu</last><first>Dan</first>
            </name>
            <title>Data on the Web</title>
          </author>
        </results>
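To make the logic of the query easier to follow, here is a rough Python equivalent over the same source data (publisher and price omitted for brevity). It is only a sketch of what the XQuery expression computes, not how an XQuery engine evaluates it; the helper `author_key` is a name introduced for illustration.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author></book>
  <book year="1992"><title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last><first>W.</first></author></book>
  <book year="2000"><title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Suciu</last><first>Dan</first></author></book>
</bib>"""

bib = ET.fromstring(BIB)


def author_key(author):
    """(last, first) pair identifying an author element."""
    return (author.findtext("last"), author.findtext("first"))


# distinct-values + order by: distinct (last, first) pairs, sorted.
authors = sorted({author_key(a) for a in bib.iter("author")})

results = ET.Element("results")
for last, first in authors:
    entry = ET.SubElement(results, "author")
    name = ET.SubElement(entry, "name")
    ET.SubElement(name, "last").text = last
    ET.SubElement(name, "first").text = first
    # "some $ba in $b/author satisfies ..." corresponds to any(...).
    for book in bib.findall("book"):
        if any(author_key(a) == (last, first) for a in book.findall("author")):
            ET.SubElement(entry, "title").text = book.findtext("title")

print(ET.tostring(results, encoding="unicode"))
```

The nested loop mirrors the inner `for $b ... where some ... satisfies` clause: each book is kept if at least one of its authors matches the current name pair.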
  16. A Smarter Web Is Possible
      People and communities have data stores and applications to share.
      Vision:
      - Expand the Web to include more machine-understandable resources
      - Enable global interoperability between resources you know should be interoperable as well as those you don't yet know should be interoperable
      Key Web technologies:
      - Web Services: Web of Programs
        - Standards for interactions between programs, linked on the Web
        - Easier to expose and use services (and the data they provide)
      - Semantic Web: Web of Data
        - Standards for things, relationships and descriptions, linked on the Web
        - Easier to understand, search for, share, re-use, aggregate and extend information
  17. Web Services
      “A Web service is a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the Web service in a manner prescribed by its description using SOAP messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards.” (Web Services Glossary, W3C, http://www.w3.org/TR/ws-gloss/)
      UDDI: Universal Description, Discovery and Integration
  18. Simple Object Access Protocol (SOAP)
      - SOAP is a simple XML-based protocol that lets applications exchange information over HTTP
      - A SOAP message is an XML document containing the following elements:
        - A required Envelope element that identifies the XML document as a SOAP message
        - An optional Header element that contains header information
        - A required Body element that contains call and response information
        - An optional Fault element that provides information about errors that occurred while processing the message
  19. SOAP Request: Example

        POST /InStock HTTP/1.1
        Host: www.stock.org
        Content-Type: application/soap+xml; charset=utf-8
        Content-Length: nnn

        <?xml version="1.0"?>
        <soap:Envelope
            xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
            soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
          <soap:Body xmlns:m="http://www.stock.org/stock">
            <m:GetStockPrice>
              <m:StockName>IBM</m:StockName>
            </m:GetStockPrice>
          </soap:Body>
        </soap:Envelope>
  20. SOAP Response: Example

        HTTP/1.1 200 OK
        Content-Type: application/soap+xml; charset=utf-8
        Content-Length: nnn

        <?xml version="1.0"?>
        <soap:Envelope
            xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
            soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
          <soap:Body xmlns:m="http://www.stock.org/stock">
            <m:GetStockPriceResponse>
              <m:Price>34.5</m:Price>
            </m:GetStockPriceResponse>
          </soap:Body>
        </soap:Envelope>
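The request and response envelopes above are ordinary namespaced XML, so a client can build and read them with any XML library. The following sketch does so with `xml.etree.ElementTree`, reusing the namespace URIs from the slides. No network call is made, the function names `build_request` and `parse_price` are hypothetical, and real SOAP clients are normally generated from the WSDL rather than written by hand.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2001/12/soap-envelope"  # as used in the slides
STOCK_NS = "http://www.stock.org/stock"

ET.register_namespace("soap", SOAP_NS)
ET.register_namespace("m", STOCK_NS)


def build_request(stock_name: str) -> str:
    """Build the Envelope/Body/GetStockPrice message as an XML string."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{STOCK_NS}}}GetStockPrice")
    ET.SubElement(call, f"{{{STOCK_NS}}}StockName").text = stock_name
    return ET.tostring(envelope, encoding="unicode")


def parse_price(response_xml: str) -> float:
    """Extract the Price value from a GetStockPriceResponse message."""
    ns = {"soap": SOAP_NS, "m": STOCK_NS}
    root = ET.fromstring(response_xml)
    price = root.find("soap:Body/m:GetStockPriceResponse/m:Price", ns)
    if price is None:
        raise ValueError("no Price element in SOAP body")
    return float(price.text)


RESPONSE = f"""<soap:Envelope xmlns:soap="{SOAP_NS}">
  <soap:Body xmlns:m="{STOCK_NS}">
    <m:GetStockPriceResponse><m:Price>34.5</m:Price></m:GetStockPriceResponse>
  </soap:Body>
</soap:Envelope>"""

request = build_request("IBM")
print(request)
print(parse_price(RESPONSE))
```

In a real exchange the request string would be sent as the body of the HTTP POST shown in the request slide, and the response body would be passed to `parse_price`.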
  21. Web Services Description Language (WSDL)
      A WSDL document describes a web service using these major elements:
      - <portType>: The operations performed by the web service
      - <message>: The messages used by the web service
      - <types>: The data types used by the web service
      - <binding>: The communication protocols used by the web service

        <definitions>
          <types>
            type definition ......
          </types>
          <message>
            message definition ...
          </message>
          <portType>
            port definition ....
          </portType>
          <binding>
            binding definition ..
          </binding>
        </definitions>
  22. WSDL Document: Example (fragment)

        <message name="getStockPriceRequest">
          <part name="StockName" type="xs:string"/>
        </message>
        <message name="getStockPriceResponse">
          <part name="Price" type="xs:float"/>
        </message>
        <portType name="StockMarket">
          <operation name="getStockPrice">
            <input message="getStockPriceRequest"/>
            <output message="getStockPriceResponse"/>
          </operation>
        </portType>
  23. RESTful Web Services
      - RESTful Web Services expose their data and functionality through resources identified by URIs
      - Uniform Interface Principle: clients interact with resources through a fixed set of verbs. Example (HTTP): GET (read), PUT (update), DELETE, POST (catch-all)
      - Multiple representations (MIME types) for the same resource: XML, JSON, ...
      - Hyperlinks model resource relationships and valid state transitions for dynamic protocol description and discovery
  24. Representational State Transfer (REST)
      REST is an architectural style for networked systems based on the following principles:
      - Client-server
      - Stateless
        - No client context is stored on the server between requests
      - Cacheable
      - Layered system
        - Any number of connectors (e.g., clients, servers, caches, firewalls, tunnels, etc.) can mediate the request, but each does so without being concerned about anything but its own request
      - Code-on-demand (optional)
        - Servers can extend or customize the functionality of a client by transferring to it logic that it can execute
      - Uniform interface
  25. REST: Uniform Interface
      - All important resources are identified by one (uniform) resource identifier mechanism (e.g. URI)
      - Access methods mean the same for all resources (universal semantics; e.g. GET, POST, DELETE, PUT)
      - Hypertext as the engine of application state (HATEOAS):
        - A successful response indicates (or contains) a current representation of the state of the identified resource
        - The resource remains hidden behind the interface
        - Some representations contain links to potential next application states, including direction on how to transition to those states when a transition is selected
  26. RESTful WS: URI Design Guidelines
      - Only two base URIs per resource:
        - Collection: /stocks (plural noun)
        - Element: /stocks/{stock_id} (e.g. /stocks/IBM)
      - Complex variations:
        - /dogs?color=red&state=running&location=park
      - Versioning:
        - /v1/stocks
      - Pagination:
        - /stocks?limit=25&offset=50
      - Non-resources (e.g. calculate, convert, ...) use verbs, not nouns:
        - /convert?from=EUR&to=CNY&amount=100
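These guidelines translate into very little client code. The sketch below builds collection, element and paginated URIs with `urllib.parse`; the host `http://www.stock.org` comes from the other slides, while the helper names and the default `v1` version prefix are assumptions made for the example.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

BASE = "http://www.stock.org"  # host used in the slides' examples


def collection_uri(version="v1", **query):
    """URI of the /stocks collection, with optional query parameters."""
    uri = f"{BASE}/{version}/stocks"
    return f"{uri}?{urlencode(query)}" if query else uri


def element_uri(stock_id, version="v1"):
    """URI of one member of the collection; the ID is percent-encoded."""
    return f"{BASE}/{version}/stocks/{quote(stock_id, safe='')}"


page = collection_uri(limit=25, offset=50)
print(collection_uri())    # http://www.stock.org/v1/stocks
print(element_uri("IBM"))  # http://www.stock.org/v1/stocks/IBM
print(page)                # http://www.stock.org/v1/stocks?limit=25&offset=50

# A server can recover the pagination parameters from the query string.
params = parse_qs(urlsplit(page).query)
print(params)
```

Keeping filters and pagination in the query string leaves the path with just the two base forms, collection and element.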
  27. RESTful WS: Example (adapted from Wikipedia)
      Resource: http://www.stock.org/stocks (collection)
      - GET: List the members (URIs and perhaps other details) of the collection. For example, list all the stocks.
      - PUT: Replace the entire collection with another collection.
      - POST: Create a new entry in the collection. The new entry's ID is assigned automatically and is usually returned by the operation.
      - DELETE: Delete the entire collection.
      Resource: http://www.stock.org/stocks/IBM (element)
      - GET: Retrieve a representation of the addressed member of the collection, expressed in an appropriate Internet media type.
      - PUT: Update the addressed member of the collection, or if it doesn't exist, create it.
      - POST: Treat the addressed member as a collection in its own right and create a new entry in it.
      - DELETE: Delete the addressed member of the collection.
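The verb semantics listed above can be sketched as a small in-memory dispatcher. This is only an illustration under stated assumptions: there is no HTTP server, the class name `StockCollection`, the generated IDs (`s1`, `s2`, ...) and the chosen status codes are hypothetical, and POST on an element is answered with 405 because nested collections are left out of the sketch.

```python
import itertools


class StockCollection:
    """In-memory stand-in for /stocks and its /stocks/{stock_id} members."""

    def __init__(self):
        self._items = {}
        self._ids = itertools.count(1)

    def handle(self, method, stock_id=None, body=None):
        """Return (status_code, payload) for a verb applied to a resource."""
        if stock_id is None:  # the collection: /stocks
            if method == "GET":
                return 200, sorted(self._items)
            if method == "POST":
                new_id = f"s{next(self._ids)}"  # ID assigned by the server
                self._items[new_id] = body
                return 201, new_id
            if method == "PUT":
                self._items = dict(body)  # replace the entire collection
                return 200, None
            if method == "DELETE":
                self._items.clear()
                return 204, None
        else:  # one element: /stocks/{stock_id}
            if method == "GET":
                if stock_id in self._items:
                    return 200, self._items[stock_id]
                return 404, None
            if method == "PUT":
                created = stock_id not in self._items
                self._items[stock_id] = body  # update, or create if missing
                return (201 if created else 200), None
            if method == "DELETE":
                if stock_id in self._items:
                    del self._items[stock_id]
                    return 204, None
                return 404, None
        return 405, None  # verb not supported on this resource in the sketch


stocks = StockCollection()
print(stocks.handle("PUT", "IBM", {"price": 34.5}))
print(stocks.handle("GET", "IBM"))
print(stocks.handle("POST", body={"price": 1.0}))
print(stocks.handle("GET"))
```

The point of the uniform interface is visible in the signature: every resource is driven through the same `handle(method, ...)` entry point, and only the resource's identity changes.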
  28. SOAP+WSDL vs. RESTful
  29. Semantic Web = The Web of Data
      “The Web was designed as an information space, with the goal that it should be useful not only for human-human communication, but also that machines would be able to participate and help. One of the major obstacles to this has been the fact that most information on the Web is designed for human consumption, and even if it was derived from a database with well defined meanings (in at least some terms) for its columns, that the structure of the data is not evident to a robot browsing the web. Leaving aside the artificial intelligence problem of training machines to behave like people, the Semantic Web approach instead develops languages for expressing information in a machine processable form.”
      “If HTML and the Web made all the online documents look like one huge book, RDF, schema, and inference languages will make all the data in the world look like one huge database.”
      Tim Berners-Lee
  30. The Current Web (1/2)
      - Resources:
        - Identified by URIs
        - Untyped
      - Links:
        - href, src, ...
        - Limited, non-descriptive
      - Users:
        - A lot of information, but its meaning must be interpreted and deduced from the content, as has been done for millennia
      - Machines:
        - They don’t understand
  31. The Current Web (2/2)
      - The Public Web
        - The web found when searching and browsing
        - At least 21 billion pages indexed by standard search engines
      - The Deep Web
        - Large data repositories that require their own internal searches
        - About 6 trillion documents not indexed by standard search engines
      - The Private Web
        - Password-protected sites and data: corporate intranets, private networks, subscription-based services, etc.
        - About 3 trillion documents not indexed by standard search engines
  32. The Semantic Web
      - Resources:
        - Globally identified by URIs, or locally (blank nodes)
        - Extensible
        - Relational
      - Links:
        - Identified by URIs
        - Extensible
        - Relational
      - Users:
        - More and better information
      - Machines:
        - More processable information (Data Web)
  33. Semantic Web: How?
      - Make web resources more accessible to automated processes
      - Extend existing rendering markup with semantic markup
        - Metadata (data about data) annotations that describe the content/function of web-accessible resources
      - Use ontologies to provide vocabulary for annotations
        - “Formal specification” accessible to machines
      - A prerequisite is a standard web ontology language
        - Need to agree on a common syntax before we can share semantics
        - Syntactic web based on standards such as HTTP and HTML
  34. Metadata annotations
  35. Semantic Web: W3C Standards and Tools
      - RDF (Resource Description Framework): a simple data model to describe resources and their relationships
      - RDF Schema: a language for declaring the basic classes and types used to describe the terms in RDF; it allows defining class hierarchies
      - SPARQL: SPARQL Protocol and RDF Query Language
      - OWL: Web Ontology Language. Allows enriching the description of properties and classes with, among others, class disjunction, association cardinality, richer data types and property features (e.g. symmetry)
  36. Resource Description Framework (RDF)
      - RDF is a graphical formalism (+ XML syntax + semantics)
        - for representing metadata
        - for describing the semantics of information in a machine-accessible way
      - RDF statements are <subject, predicate, object> triples that describe properties of resources: <Carles,hasColleague,Ernest>
      - XML representation:

        <Description about="some.uri/person/carles_farre">
          <hasColleague resource="some.uri/person/ernest_teniente"/>
        </Description>
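The link between the triple and its XML form can be sketched in a few lines: statements are kept as plain tuples and grouped by subject when serialized. This follows the slide's simplified, namespace-free notation rather than real RDF/XML; the `RDF` wrapper element and the function name `to_xml` are assumptions added so that several descriptions can share one root.

```python
import xml.etree.ElementTree as ET

# One <subject, predicate, object> statement, as on the slide.
triples = [
    ("some.uri/person/carles_farre", "hasColleague",
     "some.uri/person/ernest_teniente"),
]


def to_xml(triples):
    """Serialize triples as <Description about=...> elements, one per subject."""
    root = ET.Element("RDF")  # wrapper element, an assumption of this sketch
    by_subject = {}
    for subject, predicate, obj in triples:
        description = by_subject.get(subject)
        if description is None:
            description = ET.SubElement(root, "Description", about=subject)
            by_subject[subject] = description
        # The predicate becomes the element name, the object its resource.
        ET.SubElement(description, predicate, resource=obj)
    return root


rdf = to_xml(triples)
print(ET.tostring(rdf, encoding="unicode"))
```

Reading the XML back is the reverse walk: each child of a `Description` yields one triple with the parent's `about` value as subject.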
  37. RDF Schema
      - RDF Schema allows you to define vocabulary terms and the relations between those terms
        - it gives “extra meaning” to particular RDF predicates and resources
        - this “extra meaning”, or semantics, specifies how a term should be interpreted
      - Examples:
        <Person,type,Class>
        <hasColleague,type,Property>
        <Professor,subClassOf,Person>
        <Cristina,type,Professor>
        <hasColleague,range,Person>
        <hasColleague,domain,Person>
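The “extra meaning” can be made concrete by computing what follows from the example statements. The sketch below applies three RDFS-style rules (type propagation along subClassOf, subClassOf transitivity, and domain/range typing) until nothing new is derived. It is a naive fixpoint loop over the slide's unprefixed names, not a complete RDFS reasoner, and the `<Carles,hasColleague,Ernest>` statement is borrowed from the RDF slide to give domain and range something to act on.

```python
triples = {
    ("Person", "type", "Class"),
    ("hasColleague", "type", "Property"),
    ("Professor", "subClassOf", "Person"),
    ("Cristina", "type", "Professor"),
    ("hasColleague", "range", "Person"),
    ("hasColleague", "domain", "Person"),
    ("Carles", "hasColleague", "Ernest"),  # statement from the RDF slide
}


def rdfs_closure(triples):
    """Add the statements entailed by subClassOf, domain and range."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # An instance of a subclass is an instance of the superclass.
                if p == "type" and p2 == "subClassOf" and s2 == o:
                    new.add((s, "type", o2))
                # subClassOf is transitive.
                if p == "subClassOf" and p2 == "subClassOf" and s2 == o:
                    new.add((s, "subClassOf", o2))
                # domain types the subject, range types the object.
                if s2 == p and p2 == "domain":
                    new.add((s, "type", o2))
                if s2 == p and p2 == "range":
                    new.add((o, "type", o2))
        if new <= inferred:
            return inferred
        inferred |= new


closure = rdfs_closure(triples)
for statement in sorted(closure - triples):
    print(statement)
```

None of the printed statements was asserted explicitly; they hold only because of how the schema says `subClassOf`, `domain` and `range` are to be interpreted.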
  38. Problems with RDFS
      - RDFS is too weak to describe resources in sufficient detail
        - No localized range and domain constraints
          - Can’t say that the range of hasChild is person when applied to persons and elephant when applied to elephants
        - No existence/cardinality constraints
          - Can’t say that all instances of person have a mother that is also a person, or that persons have exactly 2 parents
        - No transitive, inverse or symmetrical properties
          - Can’t say that isPartOf is a transitive property, that hasPart is the inverse of isPartOf or that touches is symmetrical
        - ...
      - Difficult to provide reasoning support
        - No “native” reasoners for non-standard semantics
        - May be possible to reason via a first-order (FO) axiomatization
  39. Web Ontology Language (OWL)
      - OWL is RDF(S), adding vocabulary to specify:
        - Relations between classes
        - Cardinality
        - Equality
        - More typing of and characteristics of properties
        - Enumerated classes
      - Three species of OWL
        - OWL Full is the union of OWL syntax and RDF
        - OWL DL is restricted to a FOL fragment (≅ SHOIN(D) Description Logic)
        - OWL Lite is an “easier to implement” subset of OWL DL
      - OWL DL benefits from many years of DL research
        - Well defined semantics
        - Formal properties well understood (complexity, decidability)
        - Known reasoning algorithms
        - Implemented systems (highly optimised)
  40. OWL in RDF(S) notation: Example
      Person ⊓ ∀hasChild.(Doctor ⊔ ∃hasChild.Doctor)

        <owl:Class>
          <owl:intersectionOf rdf:parseType="Collection">
            <owl:Class rdf:about="#Person"/>
            <owl:Restriction>
              <owl:onProperty rdf:resource="#hasChild"/>
              <owl:allValuesFrom>
                <owl:Class>
                  <owl:unionOf rdf:parseType="Collection">
                    <owl:Class rdf:about="#Doctor"/>
                    <owl:Restriction>
                      <owl:onProperty rdf:resource="#hasChild"/>
                      <owl:someValuesFrom rdf:resource="#Doctor"/>
                    </owl:Restriction>
                  </owl:unionOf>
                </owl:Class>
              </owl:allValuesFrom>
            </owl:Restriction>
          </owl:intersectionOf>
        </owl:Class>
  41. SPARQL Protocol And RDF Query Language
      - Designed to query collections of triples...
      - ...and to easily traverse relationships
      - Vaguely SQL-like syntax (SELECT, WHERE)
      - “Matches graph patterns”

        SELECT ?sal
        WHERE { emps:e13954 HR:salary ?sal }
  42. SQL vs SPARQL
      Relational table:

        EMP_ID  NAME  HIRE_DATE   SALARY
        13954   Joe   2000-04-14  48000
        10335   Mary  1998-11-23  52000
        …       …     …           …
        04182   Bob   2005-02-10  21750

      RDF triples:

        emps:e13954 HR:name      Joe
        emps:e13954 HR:hire-date 2000-04-14
        emps:e13954 HR:salary    48000
        emps:e10335 HR:name      Mary
        emps:e10335 HR:hire-date 1998-11-23
        emps:e10335 HR:salary    52000
        …

      SQL query:

        SELECT hire_date
        FROM employees
        WHERE salary >= 21750

      SPARQL query:

        SELECT ?hdate
        WHERE { ?id HR:salary ?sal .
                ?id HR:hire-date ?hdate .
                FILTER (?sal >= 21750) }
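What “matching graph patterns” means can be sketched without a SPARQL engine: each triple pattern is compared with every triple, variables (terms starting with `?`) are bound on first use, and later patterns must agree with earlier bindings. The code below is a naive nested-loop illustration of that idea, not how real engines evaluate queries. The function names are hypothetical, and the three `emps:e04182` triples are derived from Bob's table row since the slide elides them.

```python
triples = [
    ("emps:e13954", "HR:name", "Joe"),
    ("emps:e13954", "HR:hire-date", "2000-04-14"),
    ("emps:e13954", "HR:salary", 48000),
    ("emps:e10335", "HR:name", "Mary"),
    ("emps:e10335", "HR:hire-date", "1998-11-23"),
    ("emps:e10335", "HR:salary", 52000),
    ("emps:e04182", "HR:name", "Bob"),          # derived from the table row
    ("emps:e04182", "HR:hire-date", "2005-02-10"),
    ("emps:e04182", "HR:salary", 21750),
]


def match(pattern, binding, triple):
    """Extend binding so that pattern equals triple, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            # Bind the variable, or check it against its earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def select(patterns, triples, keep=lambda binding: True):
    """Join all patterns, then apply the FILTER-like predicate keep."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                candidate = match(pattern, binding, triple)
                if candidate is not None:
                    extended.append(candidate)
        bindings = extended
    return [b for b in bindings if keep(b)]


# SELECT ?hdate WHERE { ?id HR:salary ?sal . ?id HR:hire-date ?hdate .
#                       FILTER (?sal >= 21750) }
rows = select(
    [("?id", "HR:salary", "?sal"), ("?id", "HR:hire-date", "?hdate")],
    triples,
    keep=lambda b: b["?sal"] >= 21750,
)
print([row["?hdate"] for row in rows])
```

The shared variable `?id` plays the role that the row plays in SQL: it ties the salary and the hire date to the same employee.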
  43. Semantic Web Services
      [Figure: 2x2 grid relating the four technologies]
      - Static: WWW (URI, HTML, HTTP); Semantic Web (RDF, RDF(S), OWL)
      - Dynamic: Web Services (UDDI, WSDL, SOAP); Semantic Web Services
      The main aim is to enable highly flexible Web service architectures, where new services can be quickly discovered, orchestrated and composed into workflows, by
      - creating a semantic markup of Web services that makes them machine-understandable and use-apparent
      - developing an agent technology that exploits this semantic markup to support automated Web service composition and interoperability
  44. References
      - KAPPEL, Gerti et al. Web Engineering. John Wiley & Sons, 2006. Chapter 14.
      - SHKLAR, Leon and ROSEN, Rich. Web Application Architecture: Principles, Protocols and Practices, 2nd Edition. John Wiley & Sons, 2009. Chapters 5 and 13.
      - RAY, Kate. Web 3.0 (video). http://vimeo.com/11529540
      - www.w3.org
      - www.w3schools.com