Quantifying the bias in data links 
Ilaria Tiddi, Mathieu d’Aquin, Enrico Motta
The problem 
• Linked Data datasets are biased 
• Bias = the information is unevenly distributed 
• To detect such a bias, the information distribution in the 
dataset should be compared to an unbiased one (ground 
truth), which is not available 
• Our proposal is to use information coming from the 
connected datasets to approximate such a comparison
Is bias a problem? 
• LMDB is biased towards old movies (i.e., it mostly 
contains information about old movies) 
• A recommender system would therefore produce 
results biased towards old movies 
• This bias needs to be identified 
• to properly assess the results of Linked Data systems and 
• to compensate for the bias.
Motivation 
• Dedalo: using Linked Data to explain patterns 
• Pattern 
• Students of the Open University enrol in Health&Social Care 
courses more often around Manchester than in other places 
• Explanation 
• Health&Social Care courses are popular in Manchester because it is 
in the Northern Hemisphere 
• In DBpedia, information about the locations of places is 
incomplete in an uneven way, i.e. there is a bias
Identifying the bias 
• Measure how much a dataset is biased when compared to 
another one 
[Figure: the dataset is linked through owl:sameAs, rdfs:seeAlso, skos:exactMatch, … to a subset S of the connected dataset D] 
• Use the projection of the dataset into its connected dataset D 
• compare the distribution of property values of entities in D 
• with that of entities in S (the dataset projection)
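The projection step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the links have already been extracted as (source entity, link predicate, target entity) triples, and the function name `project`, the chosen set of link predicates, and all sample URIs are hypothetical.

```python
# Predicates treated as links between datasets (an assumed selection).
LINK_PREDICATES = {"owl:sameAs", "rdfs:seeAlso", "skos:exactMatch"}


def project(links, d_entities):
    """Return S: the entities of D that the source dataset links to."""
    return {
        target
        for _source, predicate, target in links
        if predicate in LINK_PREDICATES and target in d_entities
    }


# Hypothetical toy data: three movies in D, two of them linked from the source dataset.
d_entities = {"db:Metropolis", "db:Casablanca", "db:Alien"}
links = [
    ("lmdb:film/1", "owl:sameAs", "db:Metropolis"),
    ("lmdb:film/2", "owl:sameAs", "db:Casablanca"),
    ("lmdb:film/2", "foaf:page", "db:Alien"),  # not a linking predicate, ignored
]

s_entities = project(links, d_entities)
print(sorted(s_entities))
```

By construction S is a subset of D, so any property defined on D can be evaluated on both populations.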
Example: is LMDB biased? 
• Compare dc:subject values for the entities in D and in S 
LMDB is biased towards black and white movies 
• Same for dbp:released 
LMDB is biased towards older movies
Bias detection proposition 
• Use SPARQL to build pairs of value distributions, one for S and one for D 
• Given 
• two populations (values) and 
• the same observation (RDF property) 
dc:subject(D) = {dbCat:ScienceFictionMovies,dbCat:Black&WhiteMovies} 
dc:subject(S) = {dbCat:Black&WhiteMovies} 
• Use the statistical t-tests commonly used to compare 
observations
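A pair of value distributions for one property might be built as in the sketch below. It assumes the property values have already been retrieved (for instance with SPARQL) into a dictionary from entity to its set of values; the helper name `value_distribution` and the sample movies are hypothetical.

```python
from collections import Counter


def value_distribution(entities, values_of):
    """Fraction of the given entities that carry each property value."""
    if not entities:
        return {}
    counts = Counter(
        value for entity in entities for value in values_of.get(entity, ())
    )
    return {value: count / len(entities) for value, count in counts.items()}


# Hypothetical dc:subject values for four movies in D.
subjects = {
    "db:Metropolis": {"dbCat:Black&WhiteMovies", "dbCat:ScienceFictionMovies"},
    "db:Casablanca": {"dbCat:Black&WhiteMovies"},
    "db:Alien": {"dbCat:ScienceFictionMovies"},
    "db:Gravity": {"dbCat:ScienceFictionMovies"},
}
d_entities = set(subjects)
s_entities = {"db:Metropolis", "db:Casablanca"}  # the projection S

dist_d = value_distribution(d_entities, subjects)
dist_s = value_distribution(s_entities, subjects)
print(dist_d)
print(dist_s)
```

In this toy data every entity of S is a black and white movie, while only half of D is, which is the kind of gap the statistical test is meant to quantify.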
T-Tests of statistical significance 
• Test whether there is a significant difference between two populations 
• calculate the probability p of observing such a difference by 
chance alone 
• state a null hypothesis (the difference is due to chance) 
• there is no bias in a property 
• an alternate hypothesis (the one you want to prove) 
• there is bias in a property 
• if p is below 0.05, then one can reject the null hypothesis 
• the lower p, the more confidently the property can be considered biased 
• Rank the properties according to p to find the most biased ones
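The test-and-rank procedure can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: each property value is encoded as a 0/1 indicator per entity, the statistic is Welch's t, and the p-value uses a normal approximation to the t distribution, which is only reasonable for large samples. The function names and the sample counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_p_value(sample_d, sample_s):
    """Two-sided p-value for a difference in means, based on Welch's t statistic.

    Assumption: the t distribution is approximated by a standard normal,
    so the result is only meaningful for large samples.
    """
    standard_error = sqrt(
        variance(sample_d) / len(sample_d) + variance(sample_s) / len(sample_s)
    )
    if standard_error == 0:
        # Both samples are constant: equal means give p = 1, different means p = 0.
        return 1.0 if mean(sample_d) == mean(sample_s) else 0.0
    t = (mean(sample_s) - mean(sample_d)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(t)))


def rank_by_bias(observations):
    """Return (property, p) pairs sorted by ascending p, most biased first."""
    p_values = {
        prop: welch_p_value(sample_d, sample_s)
        for prop, (sample_d, sample_s) in observations.items()
    }
    return sorted(p_values.items(), key=lambda item: item[1])


# Hypothetical indicator samples: 1 if the entity has the value, 0 otherwise.
observations = {
    "dc:subject = dbCat:Black&WhiteMovies": ([1] * 100 + [0] * 900, [1] * 80 + [0] * 120),
    "dc:subject = dbCat:ScienceFictionMovies": ([1] * 150 + [0] * 850, [1] * 28 + [0] * 172),
}

ranked = rank_by_bias(observations)
for prop, p in ranked:
    print(prop, "biased" if p < 0.05 else "no evidence of bias", p)
```

Numeric properties such as latitude can be passed in directly as lists of values instead of indicators. For exact p-values on small samples, a library routine such as `scipy.stats.ttest_ind` with `equal_var=False` could replace the normal approximation.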
Experiments and results 
• 30 datasets and 54 pairs from the DataHub [1] 
• Varying in the number of entities in S (from approx. 30 to 
60,000) 
• Varying in domain (multi-domain, biomedical, 
computer science, education, geography…) 
[1] http://datahub.io/
When results are expected… 
• NLFinland, places in Finland (connected to DBpedia) 
class     prop                 value                       p 
db:Place  dc:subject           db:CitiesAndTownsInFinland  p < 1.00e-15 
db:Place  dbp:latd (average)   40.5                        p < 1.00e-15 
db:Place  dbp:longd (average)  24.6                        p < 1.00e-15 
• NLSpain, bibliographic Spanish data (connected to DBpedia) 
class             prop             value       p 
db:MusicalArtist  db:birthPlace    db:Spain    p < 1.13e-13 
db:Writer         dbp:nationality  db:Spanish  p < 4.64e-03
…when results are less expected 
• UniProt, biomedical data (connected to Bio2RDF/BioPax/DrugBank) 
class       prop             value           p 
up:Protein  up:isolatedFrom  uptissue:Brain  p < 1.33e-04 
• RED, writers data (connected to DBpedia) 
class     prop           value       p 
db:Agent  db:genre       db:Novel    p < 1.00e-15 
db:Agent  db:genre       db:Poetry   p < 1.00e-15 
db:Agent  db:deathCause  db:Suicide  p < 1.00e-15
Conclusions and future work 
• Identifying the bias in a dataset is important 
• Approach: 
• use information from the connected datasets 
• apply statistical t-tests to the distributions of the values of a property 
• rank properties based on the probability of being biased 
• Evaluating Dedalo’s performance on Google Trends 
Please participate! 
http://linkedu.eu/dedalo/eval/
Thank you for your attention 
Questions? 
ilaria.tiddi@open.ac.uk 
@IlaTiddi http://linkedu.eu/dedalo/eval/
Dedalo: explaining clusters with Linked Data 
• Linked Data form a graph 
• nodes: URIs 
• edges: RDF properties 
• Some nodes reach the same node through a walk 
Walk = a chain of RDF properties 
• Walks can serve as explanations for the cluster 
ExplC = a chain of properties and one final entity
Dedalo: explaining clusters with Linked Data 
ExplC =“movies whose subject is a subcategory of Science Fiction” 
• A* iterative search 
• Entropy to drive the search expanding the graph 
• Improving the F-score of ExplC at each iteration
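How an explanation could be scored against a cluster is sketched below. The slides only say "F-score"; this sketch assumes the balanced F1 measure over sets of entities, and the function name `f_score` and the sample identifiers are hypothetical.

```python
def f_score(cluster, covered):
    """F1 of an explanation.

    `cluster` holds the entities to explain; `covered` holds the entities
    whose walk reaches the explanation's final entity.
    """
    true_positives = len(cluster & covered)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(covered)
    recall = true_positives / len(cluster)
    return 2 * precision * recall / (precision + recall)


# Hypothetical cluster of four movies; the explanation covers three of them plus one outsider.
cluster = {"m1", "m2", "m3", "m4"}
covered = {"m1", "m2", "m3", "m9"}
print(f_score(cluster, covered))
```

An iterative search would keep the explanation with the highest score found so far and expand the graph only where a better one may still be reachable.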
Knowledge Discovery 
[Figure: raw data → clean data → patterns → knowledge] 
• The process of identifying patterns in data [1] 
• Patterns are usually interpreted by the experts 
• Linked Data can be used to automatically interpret patterns 
• open, shared, multi-domain, connected knowledge 
[1] Fayyad, 1998.
Contribution 
The bias needs to be identified when building Linked Data systems 
A recommender system based on DBpedia (any kind of movies) 
DBpedia is linked to the Linked Movies Database ('30s movies) 
The recommendation might be compromised 
We propose a process to identify and measure the bias based on 
statistical methods
Motivation 
• Students are interested in Health&Social Care since they live in 
the Northern Hemisphere 
• What about the other counties? 
• are they connected to the “Northern Hemisphere” entity? 
• There must be a bias: the information is unevenly distributed 
• Solution: weighting properties to rebalance the unevenness

Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Salam Al-Karadaghi
 

Recently uploaded (20)

Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyCall Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
 
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfOpen Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
 
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
 
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesVVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
 
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
 
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docxANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
 
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
 
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
 
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
 
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrSaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
 
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
 
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxGenesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
 
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
 
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
 
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfCTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
 
Motivation and Theory Maslow and Murray pdf
Motivation and Theory Maslow and Murray pdfMotivation and Theory Maslow and Murray pdf
Motivation and Theory Maslow and Murray pdf
 
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
 
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
 

Quantifying the bias in data links

  • 1. Quantifying the bias in data links Ilaria Tiddi, Mathieu d’Aquin, Enrico Motta
  • 2. The problem
      • Linked Data datasets are biased
      • Bias = the information is unevenly distributed
      • To detect such a bias, the information distribution in the dataset should be compared to an unbiased one (ground truth), which is not available
      • Our proposal is to use information coming from the connected datasets to approximate such a comparison
  • 3. Is bias a problem?
      • LMDB is biased towards old movies (i.e., it mostly contains information about old movies)
      • A recommender system would therefore produce results biased towards old movies
      • This bias needs to be identified
          • to properly assess the results of Linked Data systems and
          • to compensate for the bias
  • 4. Motivation
      • Dedalo: using Linked Data to explain patterns
      • Pattern
          • Students of the Open University enroll in Health&Social Care courses more often around Manchester than in other places
      • Explanation
          • Health&Social Care courses are popular in Manchester because it is in the Northern Hemisphere
      • In DBpedia, information about the locations of places is incomplete, and this incompleteness is unevenly distributed, i.e. there is a bias
  • 5. Identifying the bias
      • Measure how much a dataset is biased when compared to another one
      • [Figure: a dataset linked to its connecting dataset D through owl:sameAs, rdfs:seeAlso, skos:exactMatch, …; S is the dataset's projection in D]
      • Use the dataset's projection into its connecting dataset D
          • compare the distribution of property values for the entities in D
          • with the one for the entities in S (the dataset projection)
  • 6. Example: is LMDB biased?
      • Compare dc:subject values for the entities in D and in S
          • LMDB is biased towards black and white movies
      • Same for dbp:released
          • LMDB is biased towards older movies
  • 7. Bias detection proposition
      • Use SPARQL to build pairs of value distributions in S and D
      • Given
          • two populations (values) and
          • the same observation (RDF property)
        dc:subject(D) = {dbCat:ScienceFictionMovies, dbCat:Black&WhiteMovies}
        dc:subject(S) = {dbCat:Black&WhiteMovies}
      • Use statistical t-tests, which are commonly used to compare observations
  • 8. T-tests of statistical significance
      • Test whether there is a significant difference between two populations
      • The test calculates the probability p of observing such a difference if it were only due to chance
      • State a null hypothesis (i.e. the difference is due to chance)
          • there is no bias in a property
      • State an alternative hypothesis (the one you want to prove)
          • there is bias in a property
      • If p is below 0.05, one can reject the null hypothesis
          • the lower p, the more the property is biased
      • Rank the properties according to p to find the most biased ones
  • 9. Experiments and results
      • 30 datasets and 54 pairs from the DataHub [1]
      • Varying in the number of entities in S (from 30 to 60,000 approx.)
      • Varying in domain (multi-domain, biomedical, computer science, education, geography…)
      [1] http://datahub.io/
  • 10. When results are expected…
      • NLFinland, places in Finland (connected to DBpedia)
          class      prop                 value                       p
          db:Place   dc:subject           db:CitiesAndTownsInFinland  p < 1.00e-15
          db:Place   dbp:latd (average)   40.5                        p < 1.00e-15
          db:Place   dbp:longd (average)  24.6                        p < 1.00e-15
      • NLSpain, Spanish bibliographic data (connected to DBpedia)
          class             prop             value       p
          db:MusicalArtist  db:birthPlace    db:Spain    p < 1.13e-13
          db:Writer         dbp:nationality  db:Spanish  p < 4.64e-03
  • 11. …when results are less expected
      • Uniprot, biomedical data (connected to Bio2RDF/BioPax/DrugBank)
          class       prop             value           p
          up:Protein  up:isolatedFrom  uptissue:Brain  p < 1.33e-04
      • RED, writers data (connected to DBpedia)
          class     prop           value       p
          db:Agent  db:genre       db:Novel    p < 1.00e-15
          db:Agent  db:genre       db:Poetry   p < 1.00e-15
          db:Agent  db:deathCause  db:Suicide  p < 1.00e-15
  • 12. Conclusions and future work
      • The importance of identifying the bias in a dataset
      • Approach:
          • use information from the connected datasets
          • run statistical t-tests on the distributions of the values of a property
          • rank properties based on the probability of being biased
      • Evaluating Dedalo’s performance on Google Trends
      • Please participate! http://linkedu.eu/dedalo/eval/
  • 13. Thank you for your attention
      • Questions?
      • ilaria.tiddi@open.ac.uk, @IlaTiddi
      • http://linkedu.eu/dedalo/eval/
  • 14. Dedalo: explaining clusters with Linked Data
      • Linked Data are a graph
          • nodes: URIs
          • edges: RDF properties
      • Some nodes walk to the same node
          • Walk = a chain of RDF properties
      • Walks can be an explanation for the cluster
          • ExplC = a chain of properties and one final entity
  • 15. Dedalo: explaining clusters with Linked Data
      • ExplC = “movies whose subject is a subcategory of Science Fiction”
      • A* iterative search
      • Entropy to drive the search expanding the graph
      • Improving the F-score of ExplC at each iteration
  • 16. Knowledge Discovery
      • [Figure labels: raw data, clean data, Patterns, Knowledge]
      • The process of identifying patterns in data [1]
      • Patterns are usually interpreted by the experts
      • Linked Data can be used to automatically interpret patterns
          • open, shared, multi-domain, connected knowledge
      [1] Fayyad, 1998.
  • 17. Contribution
      • The bias needs to be identified when producing Linked Data systems
      • A recommender system based on DBpedia (any kind of movies)
      • DBpedia is linked to the Linked Movies Database (’30s movies)
      • The recommendation might be compromised
      • We propose a process to identify and measure the bias based on statistical methods
  • 18. Motivation
      • Students are interested in Health&Social Care since they live in the Northern Hemisphere
      • What about the other counties?
          • are they connected to the “Northern Hemisphere” entity?
      • There must be a bias: the information is unevenly distributed
      • Solution: weighting properties to rebalance the unevenness
  • 19. Thank you very much!
      • Questions?
      • ilaria.tiddi@open.ac.uk, @IlaTiddi
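
The first step on slides 5 to 7, building a pair of value distributions for the same property in D and in S, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the (entity, value) pairs are assumed to have been fetched already (for instance from SPARQL SELECT results), the entity identifiers m1 to m4 are made up, and the helper names value_distribution and aligned_distributions are hypothetical.

```python
from collections import Counter


def value_distribution(pairs):
    """Relative frequency of each property value over (entity, value) pairs."""
    counts = Counter(value for _entity, value in pairs)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def aligned_distributions(pairs_d, pairs_s):
    """Put the distributions of D and S side by side over the union of values."""
    dist_d = value_distribution(pairs_d)
    dist_s = value_distribution(pairs_s)
    values = sorted(set(dist_d) | set(dist_s))
    # A value absent from one population gets frequency 0.0 there.
    return [(v, dist_d.get(v, 0.0), dist_s.get(v, 0.0)) for v in values]


# Hypothetical toy data in the spirit of the dc:subject example on slide 7.
pairs_d = [
    ("m1", "dbCat:ScienceFictionMovies"),
    ("m2", "dbCat:Black&WhiteMovies"),
    ("m3", "dbCat:ScienceFictionMovies"),
    ("m4", "dbCat:Black&WhiteMovies"),
]
pairs_s = [
    ("m2", "dbCat:Black&WhiteMovies"),
    ("m4", "dbCat:Black&WhiteMovies"),
]

rows = aligned_distributions(pairs_d, pairs_s)
for value, freq_d, freq_s in rows:
    print(f"{value}: D={freq_d:.2f} S={freq_s:.2f}")
```

In this toy case the black and white category covers half of D but all of S, which is the kind of gap the t-tests on slide 8 are meant to assess.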

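The idea on slides 14 and 15, where an explanation ExplC is a chain of properties ending in one final entity and is scored against a cluster, can be illustrated with the sketch below. It is an assumption-laden toy, not Dedalo itself: the graph is a plain dictionary, the entity and category names are invented, the F-score is the standard harmonic mean of precision and recall, and the A* search and entropy heuristic mentioned on slide 15 are not implemented.

```python
def follow_walk(graph, start, walk):
    """Return the nodes reached from `start` by following a chain of properties."""
    frontier = {start}
    for prop in walk:
        frontier = {obj for node in frontier for obj in graph.get((node, prop), ())}
    return frontier


def f_score(graph, entities, cluster, walk, final_entity):
    """F-score of the explanation (walk, final_entity) with respect to `cluster`."""
    covered = {e for e in entities if final_entity in follow_walk(graph, e, walk)}
    hits = covered & cluster
    if not hits:
        return 0.0
    precision = len(hits) / len(covered)
    recall = len(hits) / len(cluster)
    return 2 * precision * recall / (precision + recall)


# Hypothetical graph: (subject, property) -> set of objects.
graph = {
    ("m1", "dc:subject"): {"cat:SpaceOpera"},
    ("m2", "dc:subject"): {"cat:TimeTravel"},
    ("m3", "dc:subject"): {"cat:Westerns"},
    ("cat:SpaceOpera", "skos:broader"): {"cat:ScienceFiction"},
    ("cat:TimeTravel", "skos:broader"): {"cat:ScienceFiction"},
}
entities = {"m1", "m2", "m3"}
walk = ["dc:subject", "skos:broader"]

# "Movies whose subject is a subcategory of Science Fiction"
perfect = f_score(graph, entities, {"m1", "m2"}, walk, "cat:ScienceFiction")
partial = f_score(graph, entities, {"m1", "m3"}, walk, "cat:ScienceFiction")
print(perfect, partial)
```

An iterative search would extend the walks step by step and keep the explanations whose score improves.
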
Editor's Notes

  1. in favor of certain values
  2. One dataset is linked to another; a subset is linked to another. Compare the subset of linked entities: the subset does not reflect the whole dataset. We propose a process based on statistical methods to do it.
  3. Remove background. Distributions comparison.
  4. Use SPARQL to build pairs of distributions of values for (D, S). Compare these distributions. How do we compare?
  5. Independent >> compared by distributions. A low p = a significant difference (not random) = the most bias. Independent t-test for numerical values (dbp:released). Paired t-test for the others (dc:subject). Independent t-test >> the most common: the difference is shown by the groups' average, standard deviation and sample size. p is compared against alpha, the type I error rate (assuming there is a relation when there is not).
  6. Indications of the most biased property and class, and of which values; presenting them in order of surprise.
  7. Indications of the most biased property and class, and of which values; presenting them in order of surprise.
  8. dedalo.kmi.open.ac.uk >> kmi-web 26
  9. by observing LD
  10. other cluster = false positive
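
Note 5 says that an independent t-test is used for numerical values such as dbp:released, and that the difference is shown by the groups' average, standard deviation and sample size. The sketch below computes such a statistic with the Python standard library only. It assumes Welch's unequal-variance variant, which the notes do not specify, and the release years are made up for illustration.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for two independent samples.

    Welch's variant is an assumption here. Each sample needs at least two
    values, and the two samples must not both be constant.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Squared standard errors of the two group means.
    se_a = statistics.variance(sample_a) / n_a
    se_b = statistics.variance(sample_b) / n_b
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical release years: all linked entities (D) vs. the projection (S).
years_d = [1950, 1960, 1970, 1980, 1990, 2000]
years_s = [1930, 1935, 1940, 1945]

t, df = welch_t(years_d, years_s)
print(f"t = {t:.3f}, df = {df:.2f}")
```

The p-value is then read from the t distribution with df degrees of freedom; if SciPy is available, scipy.stats.ttest_ind(years_d, years_s, equal_var=False) returns the statistic and the p-value directly. Properties can afterwards be ranked by ascending p, as slide 8 suggests.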