Omitola o rian_eswc_idts final
Published in Technology, Education
  • 1. Capturing interactive data transformation operations using provenance workflows
      Tope Omitola, Andre Freitas, Edward Curry, Sean O'Riain, Nicholas Gibbins and Nigel Shadbolt
      SWPM Workshop, 28.05.2012, Herakleion, Crete
      Digital Enterprise Research Institute
      Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
  • 2. Outline
      - Motivation
      - Interactive data transformations (IDTs)
      - IDT & Provenance
      - Modelling IDTs
      - Provenance Representation
      - Provenance Capture
      - Case Study
      - Conclusion
  • 3. Motivation
      - Dataspaces:
          - High number of heterogeneous data sources
          - Complex data transformation environment
          - Need for both repeatable data transformations and once-off transformations
      - Traditional ETL approaches for data transformation/integration:
          - Based on scripting/programming
          - Focus on repeatable data transformation processes
  • 4. Interactive Data Transformations (IDTs)
      - Based on user interaction paradigms for the user-driven creation of data transformations
      - Explore GUI elements that map to data transformation operations
      - Give instant feedback after each iteration
      - Complementary to existing ETL tools
      - Lower the barrier for non-programmers to perform data transformations (reduced programming effort)
      - Example platforms: Google Refine, Potter's Wheel, Wrangler
  • 5. Interactive Data Transformations (IDTs)
  • 6. Challenges
      - How to model IDTs?
      - Facilitating the reuse of previous IDTs
      - Representing IDT provenance
      - Making IDT platforms provenance-aware
      - Enabling transportability across IDT and ETL platforms
  • 7. IDT & Provenance
      - Provenance supports the representation of interactive data transformations
      - Output: a provenance descriptor which shows the relationship between the inputs, the outputs, and the applied transformation operations
      - Both retrospective and prospective provenance
  • 8. IDT
      - IDT model
      - Formal model (algebra for IDT)
      - Provenance representation
      - Provenance capture of IDTs
  • 9. IDT Model: Core Elements
      - Schema and instance data
      - Set of predefined operations
      - GUI elements mapping to predefined operations
      - User actions
          - Operation selection
          - Parameter selection
          - Operation composition (workflow)
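The core elements listed on slide 9 can be sketched in a few lines of Python. This is only an illustrative reading of the slide, not the authors' implementation: the operation names (`uppercase`, `rename`), the GUI element identifiers, and the `UserAction` and `run_workflow` helpers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, str]
Dataset = List[Row]  # instance data; the schema is implied by the row keys


@dataclass(frozen=True)
class Operation:
    """One entry in the set of predefined operations (hypothetical)."""
    name: str
    apply: Callable[[Dataset, Dict[str, str]], Dataset]


def uppercase_column(data: Dataset, params: Dict[str, str]) -> Dataset:
    col = params["column"]
    return [{**row, col: row[col].upper()} for row in data]


def rename_column(data: Dataset, params: Dict[str, str]) -> Dataset:
    old, new = params["old"], params["new"]
    return [{(new if k == old else k): v for k, v in row.items()} for row in data]


# Set of predefined operations.
OPERATIONS = {
    "uppercase": Operation("uppercase", uppercase_column),
    "rename": Operation("rename", rename_column),
}

# GUI elements mapping to predefined operations (identifiers are made up).
GUI_BINDINGS = {
    "menu:edit-cells/to-uppercase": "uppercase",
    "menu:edit-column/rename": "rename",
}


@dataclass(frozen=True)
class UserAction:
    """Operation selection (via a GUI element) plus parameter selection."""
    gui_element: str
    params: Dict[str, str]


def run_workflow(data: Dataset, actions: List[UserAction]) -> Dataset:
    """Operation composition: apply the selected operations in order."""
    for action in actions:
        operation = OPERATIONS[GUI_BINDINGS[action.gui_element]]
        data = operation.apply(data, action.params)
    return data


data = [{"name": "ada"}, {"name": "grace"}]
actions = [
    UserAction("menu:edit-cells/to-uppercase", {"column": "name"}),
    UserAction("menu:edit-column/rename", {"old": "name", "new": "winner"}),
]
result = run_workflow(data, actions)
```

Keeping the GUI binding table separate from the operation table mirrors the slide's distinction between GUI elements and the predefined operations they map to.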
  • 10. IDT Model
  • 11. Formalizing the mapping from IDT to Provenance
      - Definition 1: A provenance-based interactive data transformation engine consists of a set of transformations (or activities) on a set of datasets, generating outputs in the form of other datasets or events which may trigger further transformations
      - Definition 2: An interactive data transformation event consists of the input dataset, the output dataset(s), the applied transformation function, and the time the transformation took place
  • 12. Formalizing the mapping from IDT to Provenance
      - Definition 3: A run is a function from time to dataset(s) and the transformation applied to those dataset(s)
      - Definition 4: A trace is the sequence of pairs of a run and the time the run was made
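One possible reading of Definitions 2 to 4 is sketched below in Python. The slides do not give the formal notation, so the `Event` and `Engine` classes, the integer logical clock standing in for wall-clock time, and the interpretation of a trace as time-ordered (run value, time) pairs are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Dataset = Tuple[str, ...]  # a dataset is simplified to a tuple of cell values


@dataclass(frozen=True)
class Event:
    """Definition 2: input, output, applied transformation, and time."""
    input_dataset: Dataset
    output_dataset: Dataset
    transformation: str
    time: int


class Engine:
    """Definition 1 (simplified): applies transformations and records events."""

    def __init__(self) -> None:
        self.events: List[Event] = []
        self._clock = 0  # logical clock; an assumption replacing real timestamps

    def transform(
        self, name: str, fn: Callable[[Dataset], Iterable[str]], data: Dataset
    ) -> Dataset:
        self._clock += 1
        output = tuple(fn(data))
        self.events.append(Event(data, output, name, self._clock))
        return output

    def run(self) -> Callable[[int], Tuple[Dataset, str]]:
        """Definition 3: a function from time to (dataset, transformation)."""
        by_time = {e.time: (e.input_dataset, e.transformation) for e in self.events}
        return lambda t: by_time[t]

    def trace(self) -> List[Tuple[Tuple[Dataset, str], int]]:
        """Definition 4 (as read here): run values paired with their times."""
        run = self.run()
        return [(run(e.time), e.time) for e in self.events]


engine = Engine()
cells: Dataset = ("  a ", "b")
trimmed = engine.transform("trim", lambda d: [c.strip() for c in d], cells)
upper = engine.transform("uppercase", lambda d: [c.upper() for c in d], trimmed)
trace = engine.trace()
```

The recorded events carry enough information to answer retrospective questions (what was applied, to what, and when), which is the role slide 7 assigns to the provenance descriptor.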
  • 13. Provenance Representation
      - Proposed in "Representing Interoperable Provenance Descriptions for ETL Workflows"
      - Three-layered provenance model:
          - Open Provenance Model Vocabulary Layer
          - Cogs ETL Provenance Vocabulary
          - Domain-Specific Model Layer
      - Linked Data standards
  • 14. Provenance Capture Layers
  • 15. Provenance Event-Capture Sequence Flow
  • 16. Case study
      - Implementation over the Google Refine (GR) platform
      - Example descriptor:

          @prefix grf: <> .

          # Process
          grf:MassCellChange-1092380975 rdf:type opmv:Process, cogs:ColumnOperation, cogs:Transformation ;
              # Mapping to the actual program
              cogs:operationName "MassCellChange"^^xsd:string ;
              cogs:programUsed ""^^xsd:string ;
              rdfs:label "Mass edit 1 cells in column ==List of winners=="^^xsd:string .

          # Input Artifact
          grf:MassCellChange-1092380975/1_0 rdf:type opmv:Artifact ;
              rdfs:label "* 1955 [[Meena Kumari]][[Parineeta (1953 film)|Parineeta]] as Lolita"^^xsd:string .

          # Output Artifact
          grf:MassCellChange-1092380975/1_1 rdf:type opmv:Artifact ;
              rdfs:label "* John Wayne"^^xsd:string .

          # Workflow structure
          grf:MassCellChange-1092380975/1_1 opmv:wasDerivedFrom grf:MassCellChange-1092380975/1_0 .
          grf:MassCellChange-1092380975 opmv:used grf:MassCellChange-1092380975/1_0 .
          grf:MassCellChange-1092380975/1_1 opmv:wasGeneratedBy grf:MassCellChange-1092380975 .
          grf:MassCellChange-1092380975/1_1 opmv:wasGeneratedAt "2011-11-16T11:2:14"^^xsd:dateTime .
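To make the shape of the descriptor on slide 16 concrete, the sketch below assembles the same kinds of statements as plain strings. It is a hypothetical helper, not the Google Refine extension's code: `describe_operation` and `escape_literal` are invented names, the `/1_0` and `/1_1` suffixes simply follow the slide's example, the labels and timestamp in the usage are placeholder values, and prefix declarations are omitted because the slide elides the namespace IRI.

```python
from typing import List


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a quoted string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def describe_operation(
    op_name: str,
    op_id: str,
    input_label: str,
    output_label: str,
    generated_at: str,
) -> List[str]:
    """Build descriptor statements for one operation (hypothetical helper).

    Uses only the vocabulary terms that appear on the slide.
    """
    process = f"grf:{op_name}-{op_id}"
    source = f"{process}/1_0"  # input artifact, named as on the slide
    result = f"{process}/1_1"  # output artifact, named as on the slide
    return [
        # Process and its mapping to the actual program
        f"{process} rdf:type opmv:Process .",
        f'{process} cogs:operationName "{escape_literal(op_name)}"^^xsd:string .',
        # Input and output artifacts
        f"{source} rdf:type opmv:Artifact .",
        f'{source} rdfs:label "{escape_literal(input_label)}"^^xsd:string .',
        f"{result} rdf:type opmv:Artifact .",
        f'{result} rdfs:label "{escape_literal(output_label)}"^^xsd:string .',
        # Workflow structure
        f"{result} opmv:wasDerivedFrom {source} .",
        f"{process} opmv:used {source} .",
        f"{result} opmv:wasGeneratedBy {process} .",
        f'{result} opmv:wasGeneratedAt "{generated_at}"^^xsd:dateTime .',
    ]


statements = describe_operation(
    op_name="MassCellChange",
    op_id="1092380975",
    input_label='old "cell" value',       # placeholder label
    output_label="new cell value",        # placeholder label
    generated_at="2011-11-16T11:02:14",   # placeholder timestamp
)
descriptor = "\n".join(statements)
```

Emitting one statement per line keeps the process, artifact, and workflow-structure parts easy to tell apart; a real extension would more likely use an RDF library to handle serialization and namespaces.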
  • 17. Conclusion
      - The proposed approach has a low impact on the existing IDT process
      - The provenance representation supports different data models
      - Preliminary implementation of a Google Refine provenance extension