LINKED DATA INTEGRATION
FRAMEWORK
LDIF
• Provides an expressive mapping language for translating data from
the various vocabularies used on the Web into a
consistent, local target vocabulary [Schultz et al., 2011]
CHALLENGES
• Vocabulary heterogeneity – a wide range of different
RDF vocabularies is used to represent data about the same
type of entity.
• URI aliases – the same real-world entity is identified
by different URIs in different data sources.
SOLUTION
• Represent all data describing one class of entities
using the same vocabulary
• Give all triples describing the same entity
the same subject URI
TARGET
• Vocabulary mapping = translate data to a single target vocabulary
• Identity resolution = replace URI aliases with a single target URI on
the client's side (based on user-provided matching heuristics)
• Both while keeping track of data provenance
(using the Named Graphs data model)
INTEGRATION PIPELINE STEPS
1. COLLECT DATA: replicate data sets locally via file download,
crawling and SPARQL;
2. MAP TO SCHEMA: use an expressive mapping language to translate the various
vocabularies used on the Web into a consistent, local target
vocabulary;
3. RESOLVE IDENTITIES: the identity resolution component replaces URI
aliases with a single target URI;
4. OUTPUT: integrated data in a single file + provenance tracking
(Named Graphs data model).
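Steps 2 and 3 can be illustrated with a minimal Python sketch. It is not LDIF code: the `Quad` type, the `PREDICATE_MAP` and `URI_ALIASES` tables and all example.org URIs are hypothetical, and simple lookup tables stand in for the far more expressive mappings and matching heuristics that LDIF applies. The point it shows is that both steps rewrite the statement while leaving the named graph untouched, which is what keeps provenance intact.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: str  # named graph; never rewritten, so provenance survives


# Hypothetical mapping from source predicates to the target vocabulary.
PREDICATE_MAP = {
    "http://example.org/vocabA/name": "http://xmlns.com/foaf/0.1/name",
    "http://example.org/vocabB/label": "http://xmlns.com/foaf/0.1/name",
}

# Hypothetical alias table: alias URI -> chosen target URI.
URI_ALIASES = {
    "http://example.org/sourceB/person/42": "http://example.org/sourceA/alice",
}


def map_to_schema(quad: Quad) -> Quad:
    """Step 2: translate the predicate into the target vocabulary."""
    return quad._replace(predicate=PREDICATE_MAP.get(quad.predicate, quad.predicate))


def resolve_identity(quad: Quad) -> Quad:
    """Step 3: replace URI aliases in subject and object positions."""
    return quad._replace(
        subject=URI_ALIASES.get(quad.subject, quad.subject),
        obj=URI_ALIASES.get(quad.obj, quad.obj),
    )


def integrate(quads):
    """Run steps 2 and 3 over every quad, in that order."""
    return [resolve_identity(map_to_schema(q)) for q in quads]


source = [
    Quad(
        "http://example.org/sourceB/person/42",
        "http://example.org/vocabB/label",
        '"Alice"',
        "http://example.org/graphs/sourceB",
    )
]
for quad in integrate(source):
    print(quad)
```

Literals such as `"Alice"` pass through unchanged because they never appear in the alias table.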
ARCHITECTURE
• Steps of the data integration process that are currently supported by LDIF.
COMPONENTS
SCHEDULER
• Triggers pending data import jobs and integration jobs;
• Configured with an XML document;
• Updates the representation of external sources in the local cache;
• Has the following elements:
  • properties: path to a Java properties file with configuration parameters;
  • dataSources: directory containing the data source configurations;
  • importJobs: location of the import job configurations;
  • integrationJob: the integration job configuration;
  • dumpLocation: directory where local dumps are cached.
• Supports relative and absolute paths
SCHEDULER
<scheduler xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns="http://www4.wiwiss.fu-berlin.de/ldif/">
  <properties>scheduler.properties</properties>
  <dataSources>datasources</dataSources>
  <importJobs>importJobs</importJobs>
  <integrationJob>integration-config.xml</integrationJob>
  <dumpLocation>dumps</dumpLocation>
</scheduler>
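To make the "relative and absolute paths" point concrete, here is a small Python sketch that reads a scheduler configuration of this shape and resolves each entry against a base directory. It is only an illustration of how such a file could be consumed, not how LDIF itself loads it; the function name `read_scheduler_config` and the base directory `/opt/ldif/config` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

LDIF_NS = "http://www4.wiwiss.fu-berlin.de/ldif/"

# A well-formed copy of the scheduler configuration from the slide.
SCHEDULER_XML = """\
<scheduler xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns="http://www4.wiwiss.fu-berlin.de/ldif/">
  <properties>scheduler.properties</properties>
  <dataSources>datasources</dataSources>
  <importJobs>importJobs</importJobs>
  <integrationJob>integration-config.xml</integrationJob>
  <dumpLocation>dumps</dumpLocation>
</scheduler>
"""


def read_scheduler_config(xml_text, base_dir):
    """Return {element name: resolved path} for each child of <scheduler>."""
    root = ET.fromstring(xml_text)
    config = {}
    for child in root:
        # ElementTree reports namespaced tags as "{namespace}localname".
        name = child.tag.replace("{" + LDIF_NS + "}", "")
        path = Path((child.text or "").strip())
        # Relative entries are resolved against base_dir (an assumption of
        # this sketch); absolute entries are kept as they are.
        config[name] = path if path.is_absolute() else Path(base_dir) / path
    return config


config = read_scheduler_config(SCHEDULER_XML, "/opt/ldif/config")
for name, path in config.items():
    print(name, "->", path)
```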
COMPONENTS
DATA IMPORT
• Replicates data sets locally;
• The different types of import jobs generate provenance
metadata, which is tracked throughout the integration process;
• Managed by a scheduler that can be configured to refresh (e.g.
hourly, daily) the local cache for each source.
DATA IMPORT
• Elements:
  • internalId: unique ID used to internally track the import job and its
files (i.e. data and provenance);
  • dataSource: reference to a data source, stating from which source this
job imports data;
  • one kind of import job (exactly one per importJob element);
  • refreshSchedule: how often the import is refreshed (e.g. hourly, daily).
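The constraints on import jobs (a unique internalId, and exactly one kind of import job per element) can be expressed as a short validation routine. The Python sketch below works on plain dictionaries rather than on LDIF's XML; the key names for the four job kinds and the assumption that all three listed elements are mandatory are illustrative choices, not taken from the LDIF schema.

```python
# Hypothetical key names for the four kinds of import job.
JOB_KINDS = {"quadImportJob", "tripleImportJob", "crawlImportJob", "sparqlImportJob"}
# Assumed to be mandatory for the purpose of this sketch.
REQUIRED = ("internalId", "dataSource", "refreshSchedule")


def check_import_jobs(jobs):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    seen_ids = set()
    for index, job in enumerate(jobs):
        label = job.get("internalId", f"job #{index}")
        for key in REQUIRED:
            if key not in job:
                problems.append(f"{label}: missing {key}")
        kinds = JOB_KINDS.intersection(job)
        if len(kinds) != 1:
            problems.append(
                f"{label}: expected exactly one job kind, found {len(kinds)}"
            )
        job_id = job.get("internalId")
        if job_id is not None:
            if job_id in seen_ids:
                problems.append(f"{label}: duplicate internalId")
            seen_ids.add(job_id)
    return problems


jobs = [
    {
        "internalId": "source-a.0",
        "dataSource": "SourceA",
        "refreshSchedule": "daily",
        "tripleImportJob": {"dumpLocation": "http://example.org/dump.nt"},
    },
    {
        "internalId": "source-a.0",  # duplicate ID on purpose
        "dataSource": "SourceB",
        "refreshSchedule": "hourly",
        "quadImportJob": {},
        "sparqlImportJob": {},  # two kinds in one job on purpose
    },
]
for problem in check_import_jobs(jobs):
    print(problem)
```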
DATA IMPORT
Mechanisms to import external data:
• Quad Import Job – imports N-Quads dumps
• Triple Import Job – imports RDF/N-Triples dumps
• Crawl Import Job – imports by dereferencing URIs as RDF data, using the LDSpider
Web Crawling Framework
• SPARQL Import Job – imports by querying a SPARQL endpoint
TRIPLE/QUAD DUMP IMPORT
• Downloads a file containing the data set;
• Difference between triple and quad imports: LDIF generates a
provenance graph for a triple dump import, whereas it
takes the given graphs from a quad dump import as
provenance graphs.
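The difference is easiest to see at the syntax level: an N-Triples statement has three terms, an N-Quads statement has a fourth term naming the graph. The Python sketch below turns N-Triples lines into N-Quads lines by appending one generated provenance graph URI to every statement. It is a simplified illustration, not LDIF's importer; the function name and the graph URI are assumptions, and the input is assumed to be valid N-Triples with one statement per line.

```python
def triples_to_quads(ntriples_text, provenance_graph):
    """Append a graph term to each N-Triples statement, yielding N-Quads lines."""
    quads = []
    for line in ntriples_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not line.endswith("."):
            raise ValueError(f"not a complete N-Triples statement: {line!r}")
        statement = line[:-1].rstrip()  # drop the terminating '.'
        quads.append(f"{statement} <{provenance_graph}> .")
    return quads


dump = """\
# triple dump from a hypothetical source
<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .
<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .
"""
for quad in triples_to_quads(dump, "http://example.org/graphs/import-1"):
    print(quad)
```

A quad dump needs no such step, because every statement already carries its own graph term.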
CRAWLER IMPORT
• Data sets that can be accessed only via
dereferenceable URIs are good candidates for a
crawler;
• The data from each crawled URI is put into a separate named graph
for provenance tracking.
SPARQL IMPORT
• The relevant data to be queried can be further
specified in the configuration file for a SPARQL import
job;
• Data from each SPARQL import job is tracked by its
own named graph.
COMPONENTS
INTEGRATION RUNTIME ENVIRONMENT
• Manages the data flow between the various
stages/modules, the caching of intermediate
results and the execution of the different
modules for each stage.
• Mechanisms: data input, transformation, data
output, and runtime environments.
INTEGRATION RUNTIME ENVIRONMENT
Mechanisms:
• Data Input: input data is expected to be represented as Named Graphs and stored in N-Quads format,
accessible locally;
• Transformation: LDIF provides transformation modules for vocabulary mapping and identity
resolution:
  • R2R Data Translation
  • Silk Identity Resolution – Silk Link Discovery Framework
• Data Output: the supported writers are the N-Quads Writer and the N-Triples Writer
• Runtime Environments: chosen depending on the size of the dataset and the available computing
resources:
  • Single machine / In-memory – keeps all intermediate results in memory (fast, but limited scalability);
  • Single machine / RDF Store – uses a Jena TDB RDF store for intermediate results; the runtime
environment communicates with the store through SPARQL queries (allows processing datasets that don't
fit in memory, but is slower);
  • Cluster / Hadoop – parallelizes the work across multiple machines using Hadoop.
FURTHER STEPS
• Data Quality Evaluation and Data Fusion Module: should allow data to be filtered
according to different data quality assessment policies and provide for fusing Web
data according to different conflict resolution methods;
• Flexible integration workflow: make the workflow and its configuration more
flexible so that it is easier to include additional modules covering other
data integration aspects.
REFERENCES
• Andreas Schultz, Andrea Matteini, Robert Isele, Christian Bizer, Christian
Becker (2012) “LDIF – Linked Data Integration Framework”. Available
online: http://www4.wiwiss.fu-berlin.de/bizer/ldif/, retrieved 06.02.2012
(this link is no longer active; try:
http://www.wiwiss.fu-berlin.de/en/institute/pwo/bizer/news/ldif03released.html)
• Andreas Schultz, Andrea Matteini, Robert Isele, Christian Bizer, Christian
Becker (2011) “LDIF – Linked Data Integration Framework”. 2nd
International Workshop on Consuming Linked Data, Bonn, Germany,
October 2011.
