CEDAL: Time-Efficient Detection of Erroneous Links in
Large-Scale Link Repositories
Andre Valdestilhas (1), Tommaso Soru (1), Axel-Cyrille Ngonga Ngomo (2)
(1) University of Leipzig, (2) University of Paderborn
August 23, 2017
Andre Valdestilhas, Tommaso Soru, Axel-Cyrille Ngonga Ngomo (Universities of Somewhere and Elsewhere)CEDAL: Time-Efficient Detection of Erroneous Links in Large-Scale Link RepositoriesAugust 23, 2017 1 / 13
Introduction
The LOD Cloud has billions of links.
Errors in those links can lead to applications failing completely.
Example of error types
Error type (1): Semantic accuracy.
Whether links correctly represent real-world facts.
Error type (2): Consistency & conciseness.
Whether links are free of contradictions and redundancy is minimized.
State of the art & CEDAL Approach
State of the art:
Closures (reflexive, symmetric and transitive).
Does not scale to millions of links.
O(n^2).
CEDAL:
Union-Find (graph partition + adjacency lists).
O(m log n).
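The Union-Find idea named above can be sketched as follows. This is a minimal illustration, not CEDAL's actual code: the class `UnionFind`, the function `partition`, and the representation of links as `(subject, object)` pairs are all assumptions made for the example. Union by size alone bounds each `find` by O(log n), which is in line with the O(m log n) stated on the slide for m links over n resources.

```python
class UnionFind:
    """Disjoint-set forest with path compression and union by size (illustrative)."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Register unseen items as singleton sets.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
            return x
        # Walk up to the root of x's set.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Union by size: attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def partition(links):
    """Group the resources of (subject, object) links into connected components."""
    uf = UnionFind()
    for subject, obj in links:
        uf.union(subject, obj)
    groups = {}
    for node in list(uf.parent):
        groups.setdefault(uf.find(node), set()).add(node)
    return list(groups.values())
```

Each link is processed once and no closure is materialized, which is the contrast with the closure-based state of the art drawn on this slide.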
CEDAL Approach
Semantics of URI + Efficient Graph Partition (Union-find)
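The slide does not spell out how URI semantics are combined with the graph partition. One plausible reading, given here as an assumption rather than as CEDAL's actual rule, is that resources from the same dataset should not end up linked together in one partition, so a partition holding two distinct URIs from one dataset is an erroneous candidate. In the sketch below, `dataset_of` (which takes the URI host as the dataset) and `erroneous_candidates` are hypothetical names, and partitions are assumed to arrive as sets of URI strings.

```python
from collections import defaultdict
from urllib.parse import urlparse


def dataset_of(uri):
    # Assumption: the host part of the URI identifies the dataset.
    return urlparse(uri).netloc


def erroneous_candidates(partitions):
    """For each partition, report datasets that contribute more than one URI."""
    flagged = []
    for part in partitions:
        by_dataset = defaultdict(set)
        for uri in part:
            by_dataset[dataset_of(uri)].add(uri)
        # A dataset appearing with several distinct URIs is a candidate error.
        clashes = {d: sorted(uris) for d, uris in by_dataset.items() if len(uris) > 1}
        if clashes:
            flagged.append(clashes)
    return flagged
```

A flagged partition is only a candidate: whether the links in it are really wrong still has to be judged, which matches the "erroneous candidates" wording used in the results slides.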
Parallel processing
CEDAL CPU/GPU processing.
Experiments & Results
State of the art
[Figure: runtime in ms (log scale, 10^2 to 10^8) against input size in triples (10^3 to 10^6) for Pellet, ClosureGenerator and CEDAL (Graph Partition).]
O(n^2) vs O(m log n)
Experiments & Results
Link Types
Property      Triples
sameAs        19,606,657 (with duplicates)
country       1,309
author        766
spokenIn      624
locatedIn     250
exactMatch    167
near          30
spatial#P     28
seeAlso       14
organism      14
made          4
Links from LinkLion (http://www.linklion.org).
Experiments & Results
Runtime
[Figure: runtime in seconds against number of cores (2 to 8), with GPU and CPU series. Panel (a): 1,000 links. Panel (b): 1,000,000 links.]
Experiments & Results
Knowledge bases ranking
[Figure: bar chart of erroneous candidates (axis 0 to 60,000) per linkset knowledge-base pair.]
Knowledge-base pairs shown:
rkbexplorer.dotac---rkbexplorer.eprints
d-nb.info---viaf.org
dblp.rkbexplorer.com---dblp.l3s.de
linkedgeodata.org---sws.geonames.org
citeseer.rkbexplorer.com---kisti.rkbexplorer.com
Top 5 knowledge-base pairs with the most erroneous candidates.
Provenance of the links
Comparison of results with respect to the provenance of the links
Framework                     Errors     Resources   M1
sameas.org                    3,792,326  28,130,994  0.865
LIMES                         1,130      27,819      0.951
Silk                          5,933      208,300     0.972
DBpedia Extraction Framework  12,615     914,180     0.986
M1 is the linkset quality metric:
$$M_1 = \frac{\sum_{P \in \mathcal{P}^{-}} |P|}{\sum_{P \in \mathcal{P}} |P|}$$
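A small sketch of how M1 could be computed from a set of partitions. The slide does not define the subset of partitions in the numerator, so the code assumes it is the set of partitions without erroneous candidates. The function name `m1` and the `is_erroneous` predicate are hypothetical and supplied by the caller.

```python
def m1(partitions, is_erroneous):
    """Share of resources that sit in partitions without erroneous candidates.

    Assumption: the numerator sums over partitions for which
    is_erroneous(partition) is False.
    """
    total = sum(len(p) for p in partitions)
    if total == 0:
        raise ValueError("M1 is undefined for an empty set of partitions")
    clean = sum(len(p) for p in partitions if not is_erroneous(p))
    return clean / total
```

Under this reading, M1 is 1 when no partition is flagged and falls toward 0 as more resources sit in flagged partitions.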
[Figure: bar chart of erroneous candidates in percent (axis 0% to 14%) for SameAs.org, LIMES, Silk and the DBpedia Extraction Framework.]
Conclusions and Future work
An approach to consistency checking that avoids computing the closures
Tracks the consistency problems
Scalable (CPU / GPU)
Case study with LinkLion
Quality measure
Future work: Applications (LinkLion2 and Education)
The end
Thanks!
Questions?
—————
Contact:
valdestilhas@informatik.uni-leipzig.de
tsoru@informatik.uni-leipzig.de
axel.ngonga@upb.de
https://github.com/dice-group/CEDAL

CEDAL slides, Web Intelligence 2017
