Data Shapes and Data Transformations

Information management systems today deal with data originating from different sources, including relational databases, NoSQL data stores, and Web data formats, which vary not only in data format but also in the underlying data model. Integrating data from heterogeneous sources is a time-consuming and error-prone engineering task; part of this process requires transforming data from its original form into other forms, repeatedly throughout the life cycle. This report provides a principled overview of the fundamental data shapes (tabular, tree, and graph) and of the transformations between them, with the aim of understanding how to perform such transformations more efficiently and effectively.

  1. Data Shapes and Data Transformations
     Michael Hausenblas (1), Boris Villazón-Terrazas (2), and Richard Cyganiak (1)
     (1) DERI, NUI Galway, Ireland, firstname.lastname@deri.org
     (2) iSOCO, Madrid, Spain, bvillazon@isoco.com
     Paper available at: http://arxiv.org/abs/1211.1565
  2. ToC
     » Motivation
     » Fundamental data shapes
     » Data shapes transformations
     » Discussion
  4. Motivation
     Current data systems combine data from a tremendous number of resources [1].
     [Figure: extract, transform, load]
     [1] Pat Helland. If You Have Too Much Data, then Good Enough Is Good Enough. Queue, 9:40:40-40:50, May 2011. http://queue.acm.org/detail.cfm?id=1988603
  5. Motivation
     We use the term data shape to refer to how data is arranged and structured.
     [Figure: resource, data shape]
  7. Tabular
     A tabular data shape organizes data items into a table.

         Location                 Environmental Services
         Carlow County Council    40
         Cavan County Council     36
         Clare County Council     38
         Cork City Council        51
         Cork County Council      47
         Donegal County Council   45
         Dublin City Council      43
  8. Tree
     A tree data shape organizes data items into a hierarchy. A data item is designated to be the root of the tree, while the remaining data items are partitioned into non-empty sets, each of which is a subtree of the root.
  9. Graph
     A graph data shape consists of a set of vertexes and a set of edges. An edge is a pair of vertexes. The two vertexes are called the edge endpoints.
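The three shapes defined above could be represented in Python roughly as follows. This is only a minimal sketch: the nested-dictionary tree layout, the helper names, and the extra vertex "Ireland" are assumptions made for illustration and do not come from the slides.

```python
# Tabular: a list of rows that all share the same columns.
table = [
    {"Location": "Carlow County Council", "EnvironmentalServices": 40},
    {"Location": "Cavan County Council", "EnvironmentalServices": 36},
]

# Tree: a root item plus zero or more subtrees (nested dicts; layout is an assumption).
tree = {
    "item": "services",
    "children": [
        {"item": "Carlow County Council",
         "children": [{"item": 40, "children": []}]},
        {"item": "Cavan County Council",
         "children": [{"item": 36, "children": []}]},
    ],
}

# Graph: a set of vertexes and a set of edges; each edge is a pair of vertexes.
# "Ireland" is a hypothetical vertex added only to have something to link to.
vertexes = {"Carlow County Council", "Cavan County Council", "Ireland"}
edges = {
    ("Carlow County Council", "Ireland"),
    ("Cavan County Council", "Ireland"),
}


def endpoints_are_vertexes(vertexes, edges):
    """Check that both endpoints of every edge belong to the vertex set."""
    return all(a in vertexes and b in vertexes for a, b in edges)


def count_items(node):
    """Count the data items in a tree: the root plus all items in its subtrees."""
    return 1 + sum(count_items(child) for child in node["children"])
```

The tree has exactly one path from the root to each item, whereas the graph places no such restriction on how vertexes are connected.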
  11. Features
      • Input/Output: generic data shape and specific implementation
      • Declarative/Operational
  12. Features
      • Lossy/Lossless: a transformation is lossless if all queries that are possible on the original shape are also possible on the resultant shape; otherwise it is lossy.
  13. Tabular - Tabular
      • RDB - RDB
      • SQL SELECT

          SELECT Location AS Region, EServices AS EnvServices FROM services

      Input:

          Location                 EServices
          Carlow County Council    40
          Cavan County Council     36
          Clare County Council     38
          Cork City Council        51
          Cork County Council      47
          Donegal County Council   45
          Dublin City Council      43

      Output of the data shape transformation:

          Region                   EnvServices
          Carlow County Council    40
          Cavan County Council     36
          Clare County Council     38
          Cork City Council        51
          Cork County Council      47
          Donegal County Council   45
          Dublin City Council      43

      • Declarative
      • No information loss
      • No provenance
      • Standard language: SQL
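The SQL SELECT on this slide can be tried out with Python's built-in sqlite3 module. The sketch below assumes the source table is named services with columns Location and EServices, as the query suggests; the in-memory database and the choice of three sample rows are illustrative only.

```python
import sqlite3

# Build a small in-memory table with some of the rows shown on the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE services (Location TEXT, EServices INTEGER)")
conn.executemany(
    "INSERT INTO services VALUES (?, ?)",
    [
        ("Carlow County Council", 40),
        ("Cavan County Council", 36),
        ("Clare County Council", 38),
    ],
)

# The tabular-to-tabular transformation: same rows, renamed columns.
cursor = conn.execute(
    "SELECT Location AS Region, EServices AS EnvServices FROM services"
)
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
conn.close()
```

Only the column names change, so every query that could be asked of the input can still be asked of the output, which is consistent with the slide marking this transformation as free of information loss.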
  14. Tabular - Tree
      • RDB - XML
      • XML representation of a relational database

          Location                 EnvironmentalServices
          Carlow County Council    40
          Cavan County Council     36
          Clare County Council     38
          Cork City Council        51
          Cork County Council      47
          Donegal County Council   45
          Dublin City Council      43

      [Figure: data shape transformation]
      • Operational
      • No information loss
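One operational way to obtain an XML representation of such a table is to emit one element per row and one child element per column. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the element names services and row and the function table_to_xml are assumptions for illustration, not the layout used by the authors.

```python
import xml.etree.ElementTree as ET

COLUMNS = ["Location", "EnvironmentalServices"]
ROWS = [("Carlow County Council", 40), ("Cavan County Council", 36)]


def table_to_xml(table_name, columns, rows):
    """Turn a table into a tree: one <row> per row, one child element per column."""
    root = ET.Element(table_name)
    for row in rows:
        row_element = ET.SubElement(root, "row")
        for column, value in zip(columns, row):
            ET.SubElement(row_element, column).text = str(value)
    return root


root = table_to_xml("services", COLUMNS, ROWS)
xml_text = ET.tostring(root, encoding="unicode")
```

Every cell value ends up in the tree together with its column name, so the table can be rebuilt from the XML. Note that this sketch stores all values as text, so column types would have to be carried separately.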
  15. Tabular - Graph
      • RDB - RDF
      • W3C RDB2RDF WG: R2RML [1]

          ID    Name
          10    Venus
          20    Felipe

      [Figure: data shape transformation via an R2RML mapping]
      • Declarative
      • No information loss
      • W3C Recommendation
      [1] http://www.w3.org/TR/r2rml/
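The idea behind such a mapping, a subject built from the row key and one triple per mapped column, can be sketched in plain Python. This is not R2RML syntax and does not use any RDF library; the IRI template, the predicate IRI, and the function name rows_to_triples are hypothetical and chosen only for illustration.

```python
ROWS = [{"ID": 10, "Name": "Venus"}, {"ID": 20, "Name": "Felipe"}]

# Hypothetical mapping: how to build a subject IRI and which column maps to which predicate.
SUBJECT_TEMPLATE = "http://example.org/person/{ID}"
PREDICATES = {"Name": "http://example.org/vocab/name"}


def rows_to_triples(rows, subject_template, predicates):
    """Produce one (subject, predicate, object) triple per mapped cell."""
    triples = set()
    for row in rows:
        subject = subject_template.format(**row)
        for column, predicate in predicates.items():
            triples.add((subject, predicate, row[column]))
    return triples


triples = rows_to_triples(ROWS, SUBJECT_TEMPLATE, PREDICATES)
```

The mapping itself is data rather than procedure, which mirrors the declarative character the slide attributes to R2RML.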
  16. Tree - Tabular
      • XML - RDB
      • A technique and tool that rely on the XSD of the XML [1]

          Location                 EnvironmentalServices
          Carlow County Council    40
          Cavan County Council     36
          Clare County Council     38
          Cork City Council        51
          Cork County Council      47
          Donegal County Council   45
          Dublin City Council      43

      [Figure: data shape transformation]
      • Operational
      • No information loss
      [1] Amy Flik, Transforming XML into a Relational Database Using XML Schema Document Type, 2009. http://scholarworks.gvsu.edu/cistechlib/48/
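The reverse direction, from an XML tree back to rows, might look like the sketch below. It does not read an XSD as the cited technique does; instead a hard-coded list of column names and types stands in for the schema, and the XML layout (services, row) is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

XML_TEXT = """
<services>
  <row><Location>Carlow County Council</Location><EnvironmentalServices>40</EnvironmentalServices></row>
  <row><Location>Cavan County Council</Location><EnvironmentalServices>36</EnvironmentalServices></row>
</services>
"""

# Stand-in for schema information: column name and the type to convert its text to.
COLUMNS = [("Location", str), ("EnvironmentalServices", int)]


def xml_to_rows(xml_text, columns):
    """Flatten each <row> element into a tuple, converting values by column type."""
    root = ET.fromstring(xml_text.strip())
    return [
        tuple(convert(row.findtext(name)) for name, convert in columns)
        for row in root.findall("row")
    ]


rows = xml_to_rows(XML_TEXT, COLUMNS)
```

Schema information matters here because XML text content carries no types of its own; without it, the value 40 would arrive in the table as a string.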
  17. Tree - Tree
      • XML - XML
      • XSLT [1]
      [Figure: data shape transformation]
      • Declarative
      • No information loss
      • W3C Recommendation
      [1] http://www.w3.org/TR/xslt
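Python's standard library has no XSLT processor, so the sketch below performs a tree-to-tree restructuring operationally with xml.etree.ElementTree rather than declaratively with a stylesheet. It only illustrates the input and output shapes; the element and attribute names (row, councils, council, name, envServices) are hypothetical.

```python
import xml.etree.ElementTree as ET

SOURCE_XML = (
    "<services>"
    "<row><Location>Carlow County Council</Location>"
    "<EnvironmentalServices>40</EnvironmentalServices></row>"
    "<row><Location>Cavan County Council</Location>"
    "<EnvironmentalServices>36</EnvironmentalServices></row>"
    "</services>"
)


def restructure(source_root):
    """Build a new tree in which each row's child elements become attributes."""
    target_root = ET.Element("councils")
    for row in source_root.findall("row"):
        ET.SubElement(
            target_root,
            "council",
            name=row.findtext("Location"),
            envServices=row.findtext("EnvironmentalServices"),
        )
    return target_root


target_root = restructure(ET.fromstring(SOURCE_XML))
target_xml = ET.tostring(target_root, encoding="unicode")
```

An XSLT stylesheet would express the same mapping as template rules instead of a loop, which is what makes the standard approach declarative.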
  18. Tree - Graph
      • XML - RDF
      • Gleaning Resource Descriptions from Dialects of Languages (GRDDL) [1]
      [Figure: data shape transformation]
      • Declarative
      • No information loss
      • W3C Recommendation
      [1] http://www.w3.org/TR/grddl/
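To illustrate the tree-to-graph direction without a GRDDL processor, the sketch below walks an XML tree and emits one triple per leaf element. It is not GRDDL itself, which associates documents with transformations that produce RDF; the base IRI, the subject naming scheme, and the function name xml_to_triples are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

BASE = "http://example.org/"  # hypothetical namespace for subjects and predicates

SOURCE_XML = (
    "<services>"
    "<row><Location>Carlow County Council</Location>"
    "<EnvironmentalServices>40</EnvironmentalServices></row>"
    "<row><Location>Cavan County Council</Location>"
    "<EnvironmentalServices>36</EnvironmentalServices></row>"
    "</services>"
)


def xml_to_triples(xml_text):
    """Emit (subject, predicate, object) triples: one subject per row, one triple per child."""
    root = ET.fromstring(xml_text)
    triples = set()
    for index, row in enumerate(root.findall("row"), start=1):
        subject = f"{BASE}row/{index}"
        for child in row:
            triples.add((subject, BASE + child.tag, child.text))
    return triples


triples = xml_to_triples(SOURCE_XML)
```

Element nesting becomes explicit edges: the parent element supplies the subject, the child tag the predicate, and the text content the object.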
  19. Graph - Tabular
      • RDF - RDB
      • SPARQL SELECT [1]
      [Figure: data shape transformation]
      • Declarative
      • Information loss
      • W3C Recommendation
      [1] http://www.w3.org/TR/rdf-sparql-query/
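A SELECT-style projection from triples to rows can be mimicked in plain Python, which also shows one way information may be lost. This sketch is not SPARQL; the prefixed names, the ex:seeAlso triple, and the function name select are hypothetical, and each subject is assumed to have at most one value per predicate.

```python
TRIPLES = {
    ("ex:carlow", "ex:name", "Carlow County Council"),
    ("ex:carlow", "ex:envServices", 40),
    ("ex:cavan", "ex:name", "Cavan County Council"),
    ("ex:cavan", "ex:envServices", 36),
    ("ex:cavan", "ex:seeAlso", "ex:carlow"),  # hypothetical triple that is not selected
}


def select(triples, predicates):
    """Return one row per subject that has a value for every requested predicate."""
    by_subject = {}
    for subject, predicate, obj in triples:
        # Assumes single-valued predicates; a repeated predicate would be overwritten.
        by_subject.setdefault(subject, {})[predicate] = obj
    rows = []
    for subject in sorted(by_subject):
        properties = by_subject[subject]
        if all(predicate in properties for predicate in predicates):
            rows.append(tuple(properties[predicate] for predicate in predicates))
    return rows


rows = select(TRIPLES, ["ex:name", "ex:envServices"])
```

The ex:seeAlso edge and the subject identifiers do not appear in the resulting rows, so queries that depend on them are no longer possible on the table.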
  20. Graph - Tree
      • RDF - XML
      • Rhizomik ReDeFer RDF2XHTML [1], relies on XSLT
      [Figure: data shape transformation]
      • Declarative (XSLT)
      • Information loss
      • Ad-hoc tool
      [1] http://rhizomik.net/html/redefer/
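One reason a graph-to-tree transformation can lose information is that a tree cannot represent cycles. The sketch below unfolds a directed graph from a chosen root and drops any edge that would lead back to a vertex already on the current path. It is an illustration of that effect only, not how ReDeFer works; the function name and the sample edges are hypothetical.

```python
def graph_to_tree(edges, root, seen=None):
    """Unfold a directed graph into a nested tree, cutting edges that would close a cycle."""
    seen = (seen or frozenset()) | {root}
    children = sorted(target for source, target in edges if source == root)
    return {
        "item": root,
        "children": [
            graph_to_tree(edges, child, seen)
            for child in children
            if child not in seen
        ],
    }


# A three-vertex cycle: a -> b -> c -> a.
EDGES = {("a", "b"), ("b", "c"), ("c", "a")}
tree = graph_to_tree(EDGES, "a")
```

In the resulting tree the edge from c back to a is missing, so a query such as "which vertexes point to a?" can no longer be answered from the tree alone.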
  21. Graph - Graph
      • RDF - RDF
      • SPARQL CONSTRUCT [1]
      [Figure: data shape transformation]
      • Declarative
      • No information loss
      • W3C Recommendation
      [1] http://www.w3.org/TR/rdf-sparql-query/
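A graph-to-graph transformation in the spirit of a CONSTRUCT query that renames predicates can be sketched on plain Python triples. This is not SPARQL and covers only one simple kind of rewriting; the prefixed names and the function name construct are hypothetical.

```python
TRIPLES = {
    ("ex:carlow", "ex:envServices", 40),
    ("ex:cavan", "ex:envServices", 36),
}


def construct(triples, predicate_map):
    """Build a new graph in which mapped predicates are renamed and all others are kept."""
    return {
        (subject, predicate_map.get(predicate, predicate), obj)
        for subject, predicate, obj in triples
    }


RENAME = {"ex:envServices": "ex:environmentalServices"}
result = construct(TRIPLES, RENAME)

# Applying the inverse mapping restores the original graph.
inverse = {new: old for old, new in RENAME.items()}
restored = construct(result, inverse)
```

Because the renaming is one-to-one, the original graph can be recovered from the result, which fits the slide's classification of this case as free of information loss.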
  22. Summary
  24. Discussion
      • We can perform (lossless) data shape transformations between certain shapes.
      • A number of data shape transformations are already standards:
          - For RDB2RDF, see R2RML and Direct Mapping.
          - For XML2XML, see XSLT.
          - For XML2RDF, see GRDDL.
      • Some data shape transformations are declarative in nature.
      • In certain cases we have to deal with lossy transformations.