Data Transformation using Semantic Web Standards

This presentation explains the benefits of using Semantic Web standards for integration and transformation of data. Step by step examples are included.

  1. Importing and Using diverse Schemas and Data with the TopBraid Suite
  2. The enterprise data integration problem
     • How does government spending in certain sectors relate to my company's earnings?
     • How does the historic spending relate to the current figures?
     • Give me a report about all of my customers across the whole organization
     (Diagram: the data is spread across XML, RDB, and spreadsheet sources)
     © Copyright 2007-2009 TopQuadrant Inc. Slide 2
  3. Merging data with RDF
     • "Rote" syntactic transformation into RDF (the mathematically simplest way to denote linked data)
     • Once in RDF:
       • Merges happen as part of the infrastructure
       • Concepts can be mapped to one another, for example, to say that one notion of "Customer" is more general than another, without needing to reference the syntactic type of the source!
       • Mapping is also captured in RDF
     • Data transformations (on merged data) make no reference to the syntax of the source; they can be written in a single language (SPARQL)
     (Diagram: XML, RDB, and spreadsheet sources)
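The merge-and-map idea on this slide can be illustrated with a minimal plain-Python sketch, assuming triples are modeled as (subject, predicate, object) string tuples and a graph as a set of them. The prefixes `crm:`, `bill:`, and `org:` and all instance names are hypothetical. Merging is a set union, and the mapping is itself stored as triples (`rdfs:subClassOf`), so asking for the general notion of customer finds instances from both sources without referring to their original syntax. This illustrates the concept only; it is not how TopBraid or a SPARQL engine is implemented.

```python
# A graph is modeled as a set of (subject, predicate, object) tuples.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical data from two sources after rote import.
crm_graph = {
    ("crm:c1", RDF_TYPE, "crm:Customer"),
    ("crm:c1", "crm:name", "Acme"),
}
billing_graph = {
    ("bill:a7", RDF_TYPE, "bill:Account"),
    ("bill:a7", "bill:holder", "Globex"),
}
# The mapping is just more triples: both local notions are narrower than a shared one.
mapping_graph = {
    ("crm:Customer", SUBCLASS_OF, "org:Customer"),
    ("bill:Account", SUBCLASS_OF, "org:Customer"),
}


def merge(*graphs):
    """Merging graphs is the union of their triple sets."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def instances_of(graph, cls):
    """Instances of cls, including instances of its (transitive) subclasses."""
    classes = {cls}
    changed = True
    while changed:
        changed = False
        for s, p, o in graph:
            if p == SUBCLASS_OF and o in classes and s not in classes:
                classes.add(s)
                changed = True
    return {s for s, p, o in graph if p == RDF_TYPE and o in classes}


merged = merge(crm_graph, billing_graph, mapping_graph)
print(sorted(instances_of(merged, "org:Customer")))  # ['bill:a7', 'crm:c1']
```

Without the mapping triples, the same question returns nothing, which is the point of keeping mappings in RDF alongside the data.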
  4. Semantic Mappings
  5. Benefits of separating Syntactic details from Semantic mapping - 1
     • Rote import provides a name (URI) for every entity in every data source, so that it can be referenced
     • It is easier to discuss how "my" use of the word "Customer" relates to "your" use than to agree on who gets to define "Customer"
     • By translating into a simple, common language, all mappings and transforms can take the same form (i.e., SPARQL)
       • In contrast to several transforms for each pair of languages
  6. Benefits of separating Syntactic details from Semantic mapping - 2
     • Each new kind of source only needs one new importer
       • In contrast to needing one for each old syntax
     • Import modules don't need to implement the merge functionality
       • The underlying data representation supports merge as a primitive operation
     • No need to worry about a number of information types and when they can be merged; there is just one
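The converter counts behind slides 5 and 6 can be made concrete with a small calculation. Assuming one direct converter per ordered pair of formats (source to target), n formats need n(n-1) pairwise converters, while converting each format once into RDF needs only n importers. Adding one format to n existing ones then costs 2n new pairwise converters but only one new importer. The function names are illustrative.

```python
def pairwise_converters(n_formats: int) -> int:
    """Direct converters if every format is translated into every other format (ordered pairs)."""
    return n_formats * (n_formats - 1)


def hub_importers(n_formats: int) -> int:
    """Importers if every format is converted once into a common model such as RDF."""
    return n_formats


for n in (3, 5, 10):
    print(f"{n} formats: {pairwise_converters(n)} pairwise converters vs {hub_importers(n)} importers")
# 3 formats: 6 pairwise converters vs 3 importers
# 5 formats: 20 pairwise converters vs 5 importers
# 10 formats: 90 pairwise converters vs 10 importers

# Cost of adding an 11th format to 10 existing ones: 20 new converters vs 1 new importer.
print(pairwise_converters(11) - pairwise_converters(10), hub_importers(11) - hub_importers(10))
```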
  7. TopBraid Suite's Implementation - 1
     • Built-in default converters transform information from a variety of sources into RDF (rote import):
       • Arbitrary XML, XML Schema, spreadsheets, databases, etc.
     • Depending on the complexity, conversion logic is encoded either in an ontology or in a Java module
     • If round-tripping is supported, all information from the original is preserved (sometimes in annotations)
  8. TopBraid Suite's Implementation - 2
     • Once in RDF, SPARQL is used to transform and map as needed
       • Imported RDF "as-is" may not be what a particular application requires
     • Transformation steps are represented using mapping ontologies and/or SPIN (http://www.topquadrant.com/spin/) rules/templates
     • The entire transformation process is saved as a SPARQLMotion (http://www.topquadrant.com/sparqlmotion/) script for repeated execution
  9. Semantic XML
  10. 10. Built-in Converter Example: Semantic XML  Select an XML file and open it in TopBraid Composer (you may need to right click on a file and select Open With > TopBraid) Each element name becomes a class Each attribute becomes datatype property Nesting is mapped into a dedicated object property (composite:child) (we are using a simple file describing people and jobs) © Copyright 2007-2009 TopQuadrant Inc. Slide 10
  11. Built-in Converter Example: Semantic XML Converted to RDF
  12. Built-in Converter Example: Semantic XML
      • Each element becomes a class, with an instance for each occurrence of the element in the document
      • The composite:child property captures the hierarchical nesting in the XML document
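A minimal sketch of the three Semantic XML rules from slides 10 and 12, using Python's standard `xml.etree.ElementTree`. The `ex:` namespace, the `<element>_<n>` instance naming scheme, and the sample document are assumptions for illustration; TopBraid's converter generates its own URIs and handles cases (such as element text and mixed content) that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical people-and-jobs document in the spirit of the slide example.
XML_DOC = """<people>
  <person name="Alice"><job title="Engineer" org="Acme"/></person>
  <person name="Bob"><job title="Analyst" org="Globex"/></person>
</people>"""


def semantic_xml(xml_text, ns="ex:"):
    """Rote XML-to-triples conversion: element -> class/instance,
    attribute -> datatype property, nesting -> composite:child."""
    triples = []
    counters = {}

    def visit(elem):
        # Each occurrence of an element becomes an instance of the element's class.
        counters[elem.tag] = counters.get(elem.tag, 0) + 1
        node = f"{ns}{elem.tag}_{counters[elem.tag]}"
        triples.append((node, "rdf:type", f"{ns}{elem.tag}"))
        # Each attribute becomes a datatype property with a literal value.
        for attr, value in elem.attrib.items():
            triples.append((node, f"{ns}{attr}", value))
        # Nesting is recorded with the composite:child object property.
        for child in elem:
            triples.append((node, "composite:child", visit(child)))
        return node

    visit(ET.fromstring(xml_text))
    return triples


for triple in semantic_xml(XML_DOC):
    print(triple)
```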
  13. Semantic Tables
  14. Built-in Converter Example: Semantic Tables*
      • Select an Excel file and simply open it in TopBraid Composer
      • Each sheet becomes a class
      • Columns become datatype properties
      • Rows become instances
      • Cells are converted into triples, where the subject is the row instance, the predicate is the column property, and the object is a literal with the value of the cell
      *Assumes that the spreadsheet is structured as a table. Not all spreadsheets are designed this way; to support different design patterns, TopBraid Suite offers more than one spreadsheet importer.
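The same idea for a table-shaped sheet, as a minimal sketch. To stay within the Python standard library, the sheet is assumed to be available as CSV text; the sheet name, columns, `ex:` namespace, and row-based instance names are hypothetical. All cell values are kept as plain string literals, whereas a real importer may assign datatypes.

```python
import csv
import io

# Hypothetical sheet named "Company", assumed to be exported as CSV for this sketch.
SHEET_NAME = "Company"
SHEET_CSV = """name,industry,employees
Acme,Manufacturing,1200
Globex,Energy,450
"""


def semantic_table(sheet_name, csv_text, ns="ex:"):
    """Sheet -> class, column -> datatype property, row -> instance, cell -> triple."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        subject = f"{ns}{sheet_name}_{row_number}"
        triples.append((subject, "rdf:type", f"{ns}{sheet_name}"))
        for column, cell in row.items():
            if cell:  # empty or missing cells produce no triple
                # (row instance, column property, literal cell value)
                triples.append((subject, f"{ns}{column}", cell))
    return triples


for triple in semantic_table(SHEET_NAME, SHEET_CSV):
    print(triple)
```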
  15. Built-in Converter Example: Semantic Tables Converted to RDF
  16. Other default importers
      • Relational databases
        • Use a simple mapping of tables to classes, and of columns and foreign keys to properties
      • XML profiles
        • Extend Semantic XML with pre-built profiles, such as one for XHTML
      • XML Schema
        • Complex logic provided in a specialized Java module
      • UML, RDFa, RSS, e-mail, additional spreadsheet importers, …
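The relational mapping named on this slide (tables to classes, columns and foreign keys to properties) can be sketched with Python's built-in `sqlite3`. The sketch assumes every table has a primary key and every foreign key references the primary key of its target table; the URI scheme (`ex:table/key`, `ex:table.column`) and the sample schema are illustrative assumptions, not TopBraid's.

```python
import sqlite3


def rdb_to_triples(conn, ns="ex:"):
    """Simplified direct mapping: table -> class, row -> instance,
    column -> datatype property, foreign key column -> object property."""
    triples = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        fks = {fk[3]: fk[2] for fk in conn.execute(f"PRAGMA foreign_key_list({table})")}
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk); pk > 0 marks key columns
        key_cols = [col[1] for col in conn.execute(f"PRAGMA table_info({table})") if col[5]]
        cursor = conn.execute(f"SELECT * FROM {table}")
        columns = [d[0] for d in cursor.description]
        for row in cursor.fetchall():
            values = dict(zip(columns, row))
            subject = f"{ns}{table}/" + "/".join(str(values[c]) for c in key_cols)
            triples.append((subject, "rdf:type", f"{ns}{table}"))
            for col, value in values.items():
                if value is None:
                    continue  # NULLs produce no triple
                if col in fks:
                    # Foreign key: link to the referenced row's instance (object property).
                    triples.append((subject, f"{ns}{table}.{col}", f"{ns}{fks[col]}/{value}"))
                else:
                    # Ordinary column: literal value (datatype property).
                    triples.append((subject, f"{ns}{table}.{col}", value))
    return triples


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        name TEXT,
        employer INTEGER REFERENCES company(id)
    );
    INSERT INTO company VALUES (1, 'Acme');
    INSERT INTO person VALUES (10, 'Alice', 1);
""")
for triple in rdb_to_triples(conn):
    print(triple)
```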
  17. Merging Data
  18. Next Steps
      • RDF converted from the XML file and RDF from the spreadsheet can now be merged:
        • Open one, switch to the Import tab, and drag and drop the second one, or
        • Create a mapping/aggregation file and import both the XML and the spreadsheet
      • Creating connections
        • Conceptually, the XML and Excel examples are linked:
          • The XML lists different people, including their jobs and the organizations they work for
          • Excel has company information organized by industry sectors
        • But there are no connections in the raw data
        • SPARQL queries (CONSTRUCT), including query templates (to generalize query patterns), can be used to establish connections
          • Mappings are recorded in mapping ontologies and scripts for repeated execution
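The connection-building step can be sketched as a CONSTRUCT-style rule: for every person whose organization name equals a company's name, produce a new `ex:worksFor` triple. The sketch below is written in plain Python rather than SPARQL; the prefixes `sx:` (from the XML import) and `tab:` (from the spreadsheet import), the property `ex:worksFor`, and the data are all hypothetical. The docstring shows a roughly equivalent SPARQL CONSTRUCT query.

```python
# Hypothetical triples: people from the XML import, companies from the spreadsheet import.
xml_triples = {
    ("sx:person_1", "sx:name", "Alice"),
    ("sx:person_1", "sx:org", "Acme"),
    ("sx:person_2", "sx:name", "Bob"),
    ("sx:person_2", "sx:org", "Initech"),
}
sheet_triples = {
    ("tab:Company_1", "tab:name", "Acme"),
    ("tab:Company_1", "tab:industry", "Manufacturing"),
}


def construct_works_for(graph):
    """Rough Python analogue of:
        CONSTRUCT { ?person ex:worksFor ?company }
        WHERE { ?person sx:org ?orgName . ?company tab:name ?orgName }
    """
    org_of = [(s, o) for s, p, o in graph if p == "sx:org"]
    companies_named = {}
    for s, p, o in graph:
        if p == "tab:name":
            companies_named.setdefault(o, []).append(s)
    # Join on the shared organization name; unmatched people produce nothing.
    return {(person, "ex:worksFor", company)
            for person, name in org_of
            for company in companies_named.get(name, [])}


merged = xml_triples | sheet_triples
print(construct_works_for(merged))  # {('sx:person_1', 'ex:worksFor', 'tab:Company_1')}
```

The constructed triples can be added back into the merged graph, which is how the raw, unconnected imports become linked data.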
  19. Scripting Data Transformations
  20. Step by Step Example
      • Extract and convert data from a real XML file
      • Publish the result as a web page
      • Combine SPARQLMotion, Web Service, Semantic XML, and XSD to accomplish the result
      • Step-by-step instructions are provided; requires TopBraid Composer Maestro Edition
        • Also requires some familiarity with SPARQL and SPARQLMotion
        • Recommended first step: go through the SPARQLMotion tutorial and examples at http://www.topquadrant.com/sparqlmotion/
  21. Open XML file
      • We will use an XML file from the US Federal Government about the FEA (Federal Enterprise Architecture). Download from:
        • http://www.whitehouse.gov/omb/assets/fea_docs/FEA_XML_Doc_Rev_2_3.xml
      • Open it with Semantic XML
  22. Explore converted RDF
      • There are 42 BusinessLines in this XML file
      • Each one has a Name, Definition, and SubFunction detail
      • Click on one and explore it in the graph view
  23. Extract some information using SPARQL
      • Looking at the graph, write a SPARQL query that determines the name of the business line and the BusinessLineID
      • Check that the business line in the graph appears in the solution
  24. Extract correlated information with SPARQL
      • Extend your query to find the corresponding BusinessLineDefinitionText
      • Display just the names and descriptions of the business lines
      • Save this query in a safe place; we'll use it later
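Slides 23 and 24 ask for a query that joins each business line with its name and definition. The sketch below implements the basic graph pattern matching that such a SPARQL SELECT performs, over a hypothetical, simplified graph. The `fea:` prefix, the `BusinessLineName` property, and the sample values are assumptions; in the actual Semantic XML import, the name and definition may be nested child elements linked through `composite:child`, so the real query would likely need extra patterns.

```python
def match(patterns, graph, binding=None):
    """Yield every variable binding under which all triple patterns occur in graph.
    Terms starting with '?' are variables; this is the basic graph pattern
    matching that a SPARQL SELECT ... WHERE { ... } performs."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        extended = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # An already-bound variable must agree with the new value.
                if extended.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # All three positions matched: continue with the remaining patterns.
            yield from match(rest, graph, extended)


# Hypothetical, flattened stand-in for the imported FEA data.
graph = {
    ("fea:bl1", "rdf:type", "fea:BusinessLine"),
    ("fea:bl1", "fea:BusinessLineName", "Business line A"),
    ("fea:bl1", "fea:BusinessLineDefinitionText", "Definition of A"),
    ("fea:bl2", "rdf:type", "fea:BusinessLine"),
    ("fea:bl2", "fea:BusinessLineName", "Business line B"),
    ("fea:bl2", "fea:BusinessLineDefinitionText", "Definition of B"),
    ("fea:sf1", "rdf:type", "fea:SubFunction"),
}

# Roughly: SELECT ?name ?definition WHERE { ?line a fea:BusinessLine ;
#          fea:BusinessLineName ?name ; fea:BusinessLineDefinitionText ?definition }
query = [
    ("?line", "rdf:type", "fea:BusinessLine"),
    ("?line", "fea:BusinessLineName", "?name"),
    ("?line", "fea:BusinessLineDefinitionText", "?definition"),
]
for b in sorted(match(query, graph), key=lambda b: b["?name"]):
    print(b["?name"], "|", b["?definition"])
```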
  25. Shortcut: SPARQL by Example - 1
      • Complete queries (or, for more complex queries, a starting point) can be generated directly from a graph
      • We call this generation capability "SPARQL by Example"; it saves a lot of tedious work and helps to prevent mistakes
      • To get started, display the graph pattern for a single business line
        • Click to "pin down" all the classes in the diagram; the rest will be treated as variables
        • Click on the star icon to generate a query
        • Run it in the usual way
        • Looks good, but we are not getting the text fields
  26. Shortcut: SPARQL by Example - 2
      • Click to "pin down" text fields so that they are included in the query
      • We get one result
        • But we need names and descriptions for all business lines, not just the one we pinned down!
      • Modify the query by hand to turn the name and description into variables and to include only these variables in the SELECT list
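A rough sketch of the "SPARQL by Example" idea from slides 25 and 26: pinned terms stay constant, every other subject or object becomes a variable, and predicates are always kept. It shows why pinning a text field yields exactly one result (the literal becomes a constant) and what the hand-edited query looks like when the literal is a variable instead. The `fea:` names, the `?v1, ?v2, …` variable naming, and the rule that prefixed names contain a colon but no space are assumptions of this sketch, not the behavior of TopBraid Composer's generator.

```python
def sparql_term(term):
    """Render a term: prefixed names (assumed to contain ':' and no space) as-is, others as string literals."""
    if ":" in term and " " not in term:
        return term
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


def query_from_example(example, pinned):
    """Generate a SELECT query from triples describing one example.

    Predicates are always kept; pinned subjects/objects stay constant;
    every other term becomes a variable (one variable per distinct term)."""
    variables = {}

    def render(term, position):
        if position == 1 or term in pinned:
            return sparql_term(term)
        if term not in variables:
            variables[term] = f"?v{len(variables) + 1}"
        return variables[term]

    body = ["  " + " ".join(render(t, i) for i, t in enumerate(triple)) + " ." for triple in example]
    head = "SELECT " + (" ".join(variables.values()) or "*")
    return "\n".join([head, "WHERE {", *body, "}"])


# Hypothetical graph pattern for a single business line.
example = [
    ("fea:bl1", "rdf:type", "fea:BusinessLine"),
    ("fea:bl1", "fea:BusinessLineName", "Business line A"),
]

# Pinning the class and the text field: the literal is a constant, so only one business line matches.
print(query_from_example(example, pinned={"fea:BusinessLine", "Business line A"}))
# Leaving the text field unpinned turns it into a variable, like the hand edit on slide 26.
print(query_from_example(example, pinned={"fea:BusinessLine"}))
```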
  27. Encode the process in SPARQLMotion
      • Create a new SPARQLMotion file
        • Click "Yes"; it will declare web services
      • Create a new SPARQLMotion script
      • Start with a CreateSpreadsheet module; call it findLOB
  28. Encode the process in SPARQLMotion
      • Bring the XML file into the SPARQLMotion script by dragging it onto the canvas
      • This automatically creates an SXML import module
  29. Encode the process in SPARQLMotion
      • Connect these two modules together
  30. Encode the process in SPARQLMotion
      • Add your query to findLOB (double-click to edit)
  31. Encode the process in SPARQLMotion
      • Add a "ModifyPrefixes" module to specify the namespace for the query you just pasted
      • Connect it to the findLOB module with a "next" link
      • Copy and paste the base URI of the XML file, with a space before it and a # after it
  32. Test the script
      • Run the whole script with the debug button
      • Select the last step
      • Results appear in the Console tab
      • Results are in tab-delimited form
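Conceptually, the script assembled on slides 27 to 32 is a chain of modules in which each step consumes the previous step's output, ending in tab-delimited text. The sketch below models that chain as plain Python functions: `run_script` and `load_graph` are hypothetical (the latter stands in for the SXML import module), `find_lob` stands in for the findLOB module (running the query and producing tabular text), and the data is made up.

```python
from typing import Callable, Iterable


def run_script(modules: Iterable[Callable], data=None):
    """Run modules in order, feeding each one the previous module's output."""
    for module in modules:
        data = module(data)
    return data


def load_graph(_):
    """Hypothetical stand-in for the SXML import module: returns made-up triples."""
    return {
        ("fea:bl1", "fea:BusinessLineName", "Business line A"),
        ("fea:bl1", "fea:BusinessLineDefinitionText", "Definition of A"),
        ("fea:bl2", "fea:BusinessLineName", "Business line B"),
        ("fea:bl2", "fea:BusinessLineDefinitionText", "Definition of B"),
    }


def find_lob(graph):
    """Stand-in for findLOB: pair each name with its definition and emit tab-delimited text."""
    names = {s: o for s, p, o in graph if p == "fea:BusinessLineName"}
    definitions = {s: o for s, p, o in graph if p == "fea:BusinessLineDefinitionText"}
    rows = sorted((names[s], definitions[s]) for s in names if s in definitions)
    return "\n".join(["name\tdefinition"] + ["\t".join(row) for row in rows])


print(run_script([load_graph, find_lob]))
```

Keeping each step as a separate module is what lets later slides reuse the first modules and swap only the final output step.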
  33. Exposing Results with Web Services
  34. Serve as a web page
      • Add a Return Text module to the script. Call it showLOB.
      • Make it the last module, right after findLOB
  35. View as a web page
      • Point your browser to: http://localhost:8083/tbl/actions?action=sparqlmotion&id=showLOB
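The web-service step exposes a script under a URL of the form `/tbl/actions?action=sparqlmotion&id=<script>`. Purely as an illustration of that request/response shape (this is not TopBraid Live's server), here is a minimal WSGI application built with Python's standard `wsgiref` utilities. It dispatches on the `action` and `id` query parameters and returns the script's text output; the script registry and its contents are hypothetical.

```python
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults


def make_app(scripts):
    """WSGI app answering /tbl/actions?action=sparqlmotion&id=NAME with the text output of scripts[NAME]."""
    def app(environ, start_response):
        params = parse_qs(environ.get("QUERY_STRING", ""))
        action = params.get("action", [""])[0]
        script_id = params.get("id", [""])[0]
        headers = [("Content-Type", "text/plain; charset=utf-8")]
        if environ.get("PATH_INFO") != "/tbl/actions" or action != "sparqlmotion" or script_id not in scripts:
            start_response("404 Not Found", headers)
            return [b"unknown script"]
        start_response("200 OK", headers)
        return [scripts[script_id]().encode("utf-8")]
    return app


def call(app, path, query):
    """Invoke the app in-process with a minimal test environ; returns (status, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    environ["QUERY_STRING"] = query
    seen = {}

    def start_response(status, headers):
        seen["status"] = status

    body = b"".join(app(environ, start_response))
    return seen["status"], body.decode("utf-8")


# Hypothetical registry: script id -> function producing its text output.
app = make_app({"showLOB": lambda: "Business line A\tDefinition of A"})
print(call(app, "/tbl/actions", "action=sparqlmotion&id=showLOB"))
# To serve it for a browser on the port used in the slides:
#   from wsgiref.simple_server import make_server
#   make_server("localhost", 8083, app).serve_forever()
```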
  36. Extend the script to create an HTML file
      • The first two modules can be re-used from the initial script
      • xhtml.owl can be found in your TBC folder; just drag and drop it
      • ApplyConstruct: see the Copy and Paste file for details
      • ConvertRDFtoXML: no configuration needed
      • ReturnXML: Mimetype text/html
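The pipeline on this slide turns query results into XHTML by way of RDF (ApplyConstruct with xhtml.owl, then ConvertRDFtoXML, then ReturnXML with mimetype text/html). The sketch below skips the intermediate RDF step and builds an equivalent XHTML table directly with `xml.etree.ElementTree`, only to show the kind of target structure; the column names and rows are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML_NS)  # serialize XHTML elements without a prefix


def xhtml(tag):
    """Qualified name of an XHTML element in ElementTree's {namespace}tag form."""
    return f"{{{XHTML_NS}}}{tag}"


def rows_to_xhtml(headers, rows):
    """Build a minimal XHTML page containing one table of results."""
    html = ET.Element(xhtml("html"))
    body = ET.SubElement(html, xhtml("body"))
    table = ET.SubElement(body, xhtml("table"))
    header_row = ET.SubElement(table, xhtml("tr"))
    for name in headers:
        ET.SubElement(header_row, xhtml("th")).text = name
    for row in rows:
        tr = ET.SubElement(table, xhtml("tr"))
        for cell in row:
            ET.SubElement(tr, xhtml("td")).text = cell
    return ET.tostring(html, encoding="unicode")


page = rows_to_xhtml(["name", "definition"], [("Business line A", "Definition of A")])
print(page)  # to be served with mimetype text/html, as the ReturnXML step is configured
```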
  37. Viewing in a Web Browser
      • http://localhost:8083/tbl/actions?action=sparqlmotion&id=tabulateLOB
  38. To Learn More
      • Attend one of TopQuadrant's Semantic Web Technology Trainings:
        • Semantic Web Technology & Introduction to TopBraid Suite
        • TopBraid Suite Advanced Product Training Series
      • For scheduled dates, locations, and other information, visit:
        • http://www.topquadrant.com/training/training_overview.html
      • Private, on-site trainings are also available
      • Call (703) 299-9330 or write to trainings@topquadrant.com.
