The Practical Side of
Information Integration with




          Fariz Darari (FU Bolzano)
            fadirra@gmail.com
Outline
1. Information Integration
2. CloverETL
3. Demo
  – Global Schema
  – Data Sources
  – Queries



INFORMATION INTEGRATION


Information Integration

Information integration (II) aims to provide uniform
   access to data that are stored in a number of
   autonomous and heterogeneous sources.




Challenges
• Different data models
     (structured, semi-structured, text)
• Different schemata
• Differences in the representation of
  – values (km vs. miles, USD vs. EUR)
  – entities (addresses, dates, etc.)
• Inconsistencies among the data
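
A minimal sketch of how differences in value and entity representation might be normalised before integration. The function names and the list of date layouts are assumptions made for illustration; real sources would need their own rules.

```python
from datetime import datetime

KM_PER_MILE = 1.609344  # the international mile is defined as exactly 1.609344 km

def to_km(value: float, unit: str) -> float:
    """Convert a distance to kilometres; only 'km' and 'mi' are handled."""
    if unit == "km":
        return value
    if unit == "mi":
        return value * KM_PER_MILE
    raise ValueError(f"unknown unit: {unit}")

# Hypothetical date layouts used by different sources.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")

def to_iso_date(text: str) -> str:
    """Try each known layout in turn and return the date as YYYY-MM-DD."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text}")
```

Normalising to one unit and one date layout at the wrapper level keeps the global schema free of source-specific conventions.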

Components
• An II system consists of:
  1. Global Schema
     The unifying schema among local schemata.
  2. Wrappers
     Wrappers make sources accessible.
  3. Mediators
     Translate queries, combine answers of wrappers
     and other mediators.
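
The roles of wrappers and mediators can be sketched roughly as follows. This is a simplified illustration, not a real II system: the `make_wrapper` helper, the `Mediator` class, and the sample rows are all hypothetical, and query translation is reduced to passing a Python predicate along.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Wrapper = Callable[[], Iterable[Row]]  # a wrapper yields rows in the mediator's format

def make_wrapper(native_rows: List[tuple], columns: List[str]) -> Wrapper:
    """Wrap a source that stores plain tuples, exposing dict rows instead."""
    def wrapper() -> Iterable[Row]:
        for native in native_rows:
            yield dict(zip(columns, native))
    return wrapper

class Mediator:
    """Sends a selection to every wrapper and combines the answers."""
    def __init__(self, wrappers: List[Wrapper]):
        self.wrappers = wrappers

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        answers: List[Row] = []
        for wrapper in self.wrappers:
            answers.extend(row for row in wrapper() if predicate(row))
        return answers

# Hypothetical sources that are stored separately but share a logical shape.
source_a = make_wrapper([(1, "Anna"), (2, "Luca")], ["sid", "sname"])
source_b = make_wrapper([(3, "Marta")], ["sid", "sname"])
mediator = Mediator([source_a, source_b])
names = [row["sname"] for row in mediator.query(lambda row: row["sid"] >= 2)]
```

Because a mediator only depends on the wrapper interface, another mediator could itself be exposed as a wrapper, which is how mediators can combine answers of other mediators.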


Information Integration - GAV
• GAV (global-as-view) is an approach to mapping the
  source schemata to the global schema
• In GAV, relations in the global schema are defined
  as views over the sources
• Views are virtual relations, so the global schema
  describes a virtual DB



Information Integration - GAV




Information Integration - ETL




Information Integration - ETL Products




CLOVER ETL


CloverETL
• An open-source-based platform for
  information integration.
• Data can be:
  – extracted from any number of sources
  – validated and modified along the way
  – written to one or more destinations.
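
The extract, validate/modify, and write steps can be illustrated with a small plain-Python pipeline. This is only a sketch of the general ETL idea and does not use any CloverETL API; the input text, function names, and validation rule are assumptions.

```python
import csv
import io

# Hypothetical input; in practice this would be a file or a database cursor.
RAW = "id,name,age\n1,Anna,23\n2,Luca,not-a-number\n3,Marta,21\n"

def extract(text):
    """Extract: read CSV text into dict records."""
    return csv.DictReader(io.StringIO(text))

def validate_and_modify(records):
    """Validate: drop rows with a non-numeric age. Modify: cast types, upper-case names."""
    for record in records:
        if not record["age"].isdigit():
            continue
        yield {
            "id": int(record["id"]),
            "name": record["name"].upper(),
            "age": int(record["age"]),
        }

def load(records, destinations):
    """Load: append every record to each destination list."""
    for record in records:
        for destination in destinations:
            destination.append(record)

warehouse, audit_log = [], []
load(validate_and_modify(extract(RAW)), [warehouse, audit_log])
```

The generators keep the data flowing record by record, so nothing is held in memory until the load step writes to the destinations.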




CloverETL - Company




CloverETL - Architecture




CloverETL - Designer




CloverETL - Designer
• Transformation graphs are created in
  CloverETL Designer.
• Transformation graphs are divided into:
  – Extract (Green)
  – Transformation (Yellow)
  – Load (Blue)
• The edges correspond to the data flows from
  data sources to data targets.
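
As a toy model of such a graph, the components can be stored as nodes tagged with their phase and the data flows as directed edges. The node names are hypothetical, and the ordering below is only one way to reason about the flow; it is not a description of how the CloverETL Engine actually schedules components.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Node -> phase, following the three phases named on the slide.
PHASE = {
    "read_students": "Extract",
    "filter_age": "Transformation",
    "write_report": "Load",
}

# Edges as (from_node, to_node): data flows from sources towards targets.
EDGES = [("read_students", "filter_age"), ("filter_age", "write_report")]

def execution_order(edges):
    """Return the nodes so that every node comes after the nodes feeding it."""
    predecessors = {}
    for source, target in edges:
        predecessors.setdefault(source, set())
        predecessors.setdefault(target, set()).add(source)
    return list(TopologicalSorter(predecessors).static_order())

order = execution_order(EDGES)
phases = [PHASE[node] for node in order]
```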
DEMO


Global Schema




Global Schema - Example
• Student(sid, sname, age, nationality)
• Country(cid, cname, currency)




Data Sources
• Unibz (Bolzano), from Relational DB
  – StudentBZ(id, name, sex, age, nationality, address)
• Unitr (Trento), from XML
  – StudentTR(id, full_name, age, nationality)
• Unimi (Milan), from CSV
  – StudentMI(student_id, name, gender, age, citizenship)
• UN (United Nations), from Excel
  – CountryUN(id, country_name, population, capital, currency)
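
To make the format differences concrete, here is a sketch of two wrappers that read the CSV and XML sources into the same tuple shape. The sample rows and the XML element layout are assumptions, since the slides give only the attribute names of each source.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real files and the XML layout are not shown in the slides.
MILAN_CSV = "student_id,name,gender,age,citizenship\nM1,Paolo,M,24,Italy\n"
TRENTO_XML = """
<students>
  <student><id>T1</id><full_name>Greta</full_name><age>22</age><nationality>Germany</nationality></student>
</students>
"""

def wrap_milan(text):
    """StudentMI(student_id, name, gender, age, citizenship) -> (sid, sname, age, nationality)."""
    for row in csv.DictReader(io.StringIO(text)):
        yield (row["student_id"], row["name"], int(row["age"]), row["citizenship"])

def wrap_trento(text):
    """StudentTR(id, full_name, age, nationality) -> (sid, sname, age, nationality)."""
    for node in ET.fromstring(text.strip()).iter("student"):
        yield (
            node.findtext("id"),
            node.findtext("full_name"),
            int(node.findtext("age")),
            node.findtext("nationality"),
        )

students = list(wrap_milan(MILAN_CSV)) + list(wrap_trento(TRENTO_XML))
```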

Data Sources - Mapping
• Student(sid, sname, age, nationality) :-
     StudentBZ(sid, sname, _, age, nationality, _)
• Student(sid, sname, age, nationality) :-
     StudentTR(sid, sname, age, nationality)
• Student(sid, sname, age, nationality) :-
     StudentMI(sid, sname, _, age, nationality)
• Country(cid, cname, currency) :-
     CountryUN(cid, cname, _, _, currency)
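
One way to try these mapping rules out is to express each global relation as a SQL view over the source tables. The sketch below uses Python's `sqlite3` module with made-up rows; it assumes all sources have already been loaded into one database, which the original setup (relational DB, XML, CSV, Excel) does not do. `UNION ALL` stands in for the three Student rules on the assumption that students are disjoint across universities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Source relations as listed on the Data Sources slide (rows are made up).
CREATE TABLE StudentBZ(id, name, sex, age, nationality, address);
CREATE TABLE StudentTR(id, full_name, age, nationality);
CREATE TABLE StudentMI(student_id, name, gender, age, citizenship);
CREATE TABLE CountryUN(id, country_name, population, capital, currency);

INSERT INTO StudentBZ VALUES ('B1', 'Anna', 'F', 23, 'Italy', 'Bolzano');
INSERT INTO StudentTR VALUES ('T1', 'Greta', 22, 'Germany');
INSERT INTO StudentMI VALUES ('M1', 'Paolo', 'M', 24, 'Italy');
INSERT INTO CountryUN VALUES ('IT', 'Italy', NULL, 'Rome', 'EUR');

-- One view per global relation; each SELECT mirrors one mapping rule.
CREATE VIEW Student AS
    SELECT id AS sid, name AS sname, age, nationality FROM StudentBZ
    UNION ALL SELECT id, full_name, age, nationality FROM StudentTR
    UNION ALL SELECT student_id, name, age, citizenship FROM StudentMI;
CREATE VIEW Country AS
    SELECT id AS cid, country_name AS cname, currency FROM CountryUN;
""")

students = conn.execute("SELECT sid, sname FROM Student ORDER BY sid").fetchall()
countries = conn.execute("SELECT cid, cname, currency FROM Country").fetchall()
```

The views store no data of their own, which matches the GAV idea that the global schema describes a virtual database.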
Queries
1. All students with their information.
   q(sid, sname, age, nationality) :-
       Student(sid, sname, age, nationality).
2. All students whose age is more than 22.
   q(sid, sname) :-
       Student(sid, sname, age, nationality), age > 22.
3. All students with their nationality’s currency.
   q(sid, sname, age, nationality, currency) :-
       Student(sid, sname, age, nationality),
       Country(cid, nationality, currency).
4. The number of students per country.
   SELECT nationality, count(sid) FROM Student
       GROUP BY nationality
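
For readers who want to run queries 2 to 4, here is a sketch in SQL via Python's `sqlite3`. The rows are invented, and Student and Country are created as ordinary tables purely to keep the example short; in the integration setting they would be virtual relations. Query 3 joins on the country name, following the `Country(cid, nationality, currency)` atom above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student(sid, sname, age, nationality);
CREATE TABLE Country(cid, cname, currency);
INSERT INTO Student VALUES
    ('B1', 'Anna', 23, 'Italy'), ('T1', 'Greta', 22, 'Germany'), ('M1', 'Paolo', 24, 'Italy');
INSERT INTO Country VALUES ('IT', 'Italy', 'EUR'), ('DE', 'Germany', 'EUR');
""")

# Query 2: students whose age is more than 22.
older = conn.execute(
    "SELECT sid, sname FROM Student WHERE age > 22 ORDER BY sid"
).fetchall()

# Query 3: students with the currency of their nationality.
with_currency = conn.execute(
    """SELECT s.sid, s.sname, s.age, s.nationality, c.currency
       FROM Student s JOIN Country c ON c.cname = s.nationality
       ORDER BY s.sid"""
).fetchall()

# Query 4: number of students per country.
per_country = conn.execute(
    "SELECT nationality, count(sid) FROM Student GROUP BY nationality ORDER BY nationality"
).fetchall()
```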
Demo
• Query:
q(sid, sname) :-
  Student(sid, sname, age, nationality), age > 22.
• Logical Plans:
q(sid, sname) :-
  StudentBZ(sid, sname, _, age, nationality, _), age > 22.
q(sid, sname) :-
  StudentTR(sid, sname, age, nationality), age > 22.
q(sid, sname) :-
  StudentMI(sid, sname, _, age, nationality), age > 22.
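
The unfolding step above can be mimicked in a few lines of Python. This is a sketch under simplifying assumptions: the mapping is stored as attribute positions per source, the source instances are made-up in-memory tuples, and only this one query shape is supported.

```python
# GAV mapping: global relation -> list of (source relation, positions of
# sid, sname, age, nationality inside the source tuple).
MAPPING = {
    "Student": [
        ("StudentBZ", (0, 1, 3, 4)),
        ("StudentTR", (0, 1, 2, 3)),
        ("StudentMI", (0, 1, 3, 4)),
    ]
}

# Made-up source instances following the source schemata on the slides.
SOURCES = {
    "StudentBZ": [("B1", "Anna", "F", 23, "Italy", "Bolzano")],
    "StudentTR": [("T1", "Greta", 22, "Germany")],
    "StudentMI": [("M1", "Paolo", "M", 24, "Italy")],
}

def unfold(global_relation):
    """Replace the global relation by its view definitions: one plan per source."""
    return MAPPING[global_relation]

def answer_older_than(min_age):
    """Evaluate q(sid, sname) :- Student(sid, sname, age, _), age > min_age."""
    answers = set()
    for source, (i_sid, i_sname, i_age, _i_nat) in unfold("Student"):
        for row in SOURCES[source]:
            if row[i_age] > min_age:
                answers.add((row[i_sid], row[i_sname]))
    return answers

result = answer_older_than(22)
```

Each entry returned by `unfold` corresponds to one of the three logical plans, and the final answer is the union of their results.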
Demo - Execution Plan




References
• http://www.cloveretl.com/
• http://www.inf.unibz.it/~nutt/InfInt1112/





Data Integration with CloverETL


Editor's Notes

  1. Global Schema (mediated schema): It’s called global since we are trying to unify a number of local schemata. In some other cases, this global schema can also be a local schema for other II systems. Wrapper: Wrappers make sources accessible. They transform data from the source's native format to something acceptable to the mediator. Mediators: Translate queries, combine answers of wrappers and mediators.
  2. The approach is completely virtual: we never create a database that conforms to the global schema.
  3. How to pose queries? Simply unfold the user query by substituting the view definition for global schema relations.
  4. Corresponds to the process from sources to Global Schema
  5. CloverETL Designer is a member of the family of CloverETL software products developed by Javlin. It is a powerful Java-based standalone application for data extraction, transformation and loading. CloverETL Designer builds upon the extensible Eclipse platform. See www.eclipse.org. Working with CloverETL Designer is much simpler than writing code for data parsing. Its graphical user interface makes creating and running graphs easier and more comfortable. CloverETL Designer can be used to work with CloverETL Server. These two products are fully integrated. You can use CloverETL Designer to connect to and communicate with CloverETL Server, and to create projects, graphs and all other resources on CloverETL Server in the same way as if you were working with the standard CloverETL Designer only locally. CloverETL Server provides: statistics; monitoring; centralized ETL job management; integration into enterprise workflows; a multi-user environment; parallel execution of graphs; tracking of executions of graphs; scheduling of tasks; clustering and distributed execution of graphs; launch services; load balancing and failover.
  6. Transformation graphs are created in CloverETL Designer from graph elements and executed by CloverETL Engine. The most important graph elements are components (nodes). They all serve to process data. Most of them have ports through which they can receive data and/or send the processed data out. Most components work only when edges are connected to these ports. Each edge in a graph connected to some port must have metadata assigned to it. Metadata describes the structure of data flowing through the edge from one component to another.
  7. Demo scenario in general
  8. Data sources have different schemata and formats. We assume that students are disjoint across universities. Also, student IDs are unique in Italy.
  9. GAV Mapping
  10. As for the third query, we assume that the nationality attribute is strong enough to be an identifier. GROUP BY is not expressible by conjunctive queries (CQs).
  11. Logical plan = what to do OR what you want (declarative). We unfold the query here.
  12. Execution plan = how to do it (procedural). From (http://www.codeproject.com/Articles/9990/SQL-Tuning-Tutorial-Understanding-a-Database-Execu): The way that a statement can be physically executed is called an execution plan or a query plan. An execution plan is composed of primitive operations. Examples of primitive operations are: reading a table completely, using an index, performing a nested loop or a hash join, etc.