SlideShare a Scribd company logo
Research Knowledge Graphs at NFDI4DS & GESIS
RKG Workshop
Stefan Dietze, 23.06.2022
NFDI4DS - Consortium
2
FHI
Universität
Leipzig
 Several universities as centers of
excellence in DS and AI.
 Several non-university research
institutes as key contributors.
 Scientific information centers
and infrastructure providers as
providers of data and for access to
domain user groups.
● Paradigm shift in computer science towards data- and deep
learning driven methods (DS = fourth scientific paradigm).
● Reproducibility crisis.
○ Study on neural recommender publications at top-tier-
venues finds only 7/18 reproducible and only 1/18 beating
state-of-the-art. [Dacrema et al., RecSys2019]
● Bias and discrimination from learned data-induced biases.
○ Gender classification performance of neural approaches
performs significantly worse for darker skinned women.
[Buolamwini et al., FAccT2018]
○ Health risk of black people consistently underestimated
by predictive algorithms. [Obermeyer et al., Science2019]
Data Science & AI Challenges
3
▪ Research Data
▪ Publications
▪ Code/Scripts
▪ ML Models
▪ Methods
▪ Claims
▪ Metrics
Relations between scientific resources, data, knowledge Research Data Cycle
Provenance & Dependencies of Research Data, Resources, Knowledge
Common questions for researchers
• Which top-tier publications cite which data/method?
(„dataset authority“)
• Which data was used to train/evaluate which method?
Which method to produce what data?
• Which claims are supported/cited/rejected by what
dataset or publication?
▪ Research Data
▪ Publications
▪ Code/Scripts
▪ ML Models
▪ Methods
▪ Claims
▪ Metrics
Relations between scientific resources, data, knowledge
Provenance & Dependencies of Research Data, Resources, Knowledge
Challenges
• Data & metadata about resources and concepts not
represented in structured, machine-interpretable,
integrated manner (hidden in publications, web pages
etc)
• Persistent identifiers (e.g. DOIs) used inconsistently
(e.g. on publications/datasets)
• Relations and semantics not explicit
• Reproducibility crisis in CS/DS/AI
▪ Research Data
▪ Publications
▪ Code/Scripts
▪ ML Models
▪ Methods
▪ Claims
▪ Metrics
Relations between scientific resources, data, knowledge
Provenance & Dependencies of Research Data, Resources, Knowledge
• Data interoperability and reuse through established W3C standards for data
sharing (on the Web), e.g. RDF, JSON, shared vocabularies
(e.g. schema.org, DCAT, DDI), APIs for data reuse and linking
• Making links between resources and concepts explicit & machine-
interpretable
(e.g. which publications cite what dataset?)
• Consistent use of persisent IDs (e.g. URIs, DOIs) across all data, e.g.
concepts, resources etc („DOIs for all“)
• Partners in NFDI4DS/TA2 (“RKGs”)
Knowledge Graphs for FAIR Research Data in NFDI4DS
Resources
▪ Datasets
▪ Publications
▪ Code
▪ Software
Concepts
▪ Terms &
Definitions
▪ Claims
▪ Methods
▪ Topics
▪ Entities
GESIS & TIB RKGs in NFDI4DS
10
• Community
• Expert-curation/annotation
workflow/tools
• Focus point & hub
• PIDs
https://www.nfdi4datascience.de/
• Large-scale knowledge
graphs
• Automated deep learning-
based methods for
extracting KGs
• Populating ORKG
Resources
▪ Datasets
▪ Publications
▪ Code
▪ Software
Concepts
▪ Terms &
Definitions
▪ Claims
▪ Methods
▪ Topics
▪ Entities
11
Research KGs in Practice: integrated search @ GESIS
https://search.gesis.org/
Dataset
Rel. Publications
12
From publications to machine-interpretable metadata KGs
Disambiguation of dataset & software/script citations
https://data.gesis.org/softwarekg
▪ Training deep learning-
based model for extraction
software & data references
in large-scale data
(3.5 M publications)
▪ Data lifting into KG
▪ 300+ M triples / statements
▪ Search across
data/software/publications
(GESIS Search)
From publications to machine-interpretable metadata KGs
Understanding scientific software/data usage
https://data.gesis.org/softwarekg
(Schindler et al., CIKM2021)
▪ Understanding SW
usage, citation habits
and their evolution
across disciplines
▪ Rise of data science =
rise of software usage
https://data.gesis.org/softwarekg
▪ Top adopters of data
science/AI/software…
From publications to machine-interpretable metadata KGs
Understanding scientific software/data usage
https://data.gesis.org/softwarekg
▪ Top adopters of data
science/AI/software…
▪ …follow the worst
citation habits
From publications to machine-interpretable metadata KGs
Understanding scientific software/data usage
http://dbpedia.org/resource/Tim_Berners-Lee
wna:positive-emotion
onyx:Intensity "0.75"
onyx:Intensity "0.0"
http://dbpedia.org/resource/Solid
wna:negative-emotion
From social media to machine-interpretable research data KGs
Building a public research knowledge graph from Twitter data
https://data.gesis.org/tweetskb
From social media to machine-interpretable research data KGs
https://data.gesis.org/tweetskb
TweetsKB – a large-scale research KG of societal opinions
▪ Harvesting & archiving of 10 Billion tweets
(permanent collection from Twitter 1% sample since
2013)
▪ Information extraction pipeline to build a KG of
entities, interactions & sentiments
(distributed batch processing via Hadoop
Map/Reduce)
o Entity linking with knowledge graph/DBpedia
(“president”/“potus”/”trump” =>
dbp:DonaldTrump)
o Sentiment analysis/annotation
o Geotagging
o Lifting into knowledge graph schema
▪ Public, privacy-aware, large-scale research corpus
of public opinions and their evolution
=> interdisciplinary research
P. Fafalios, V. Iosifidis, E. Ntoutsi, and S. Dietze, TweetsKB: A Public
and Large-Scale RDF Corpus of Annotated Tweets, ESWC'18.
RKG-based social science research using TweetsKB
https://dd4p.gesis.org
Investigating Vaccine Hesitancy in DACH countries
Germany suspends
vaccinations with Astra
Zeneca
Twitter discourse zu “Impfbereitschaft”
RKG-based social science research using TweetsKB
Investigating Vaccine Hesitancy in DACH countries
https://dd4p.gesis.org
RKG-based discourse analysis using TweetsKB
Vaccine Hesitancy– key topics in “safety” category
„Schwangerschaft“ „Kimmich“
„Alter“
„Nebenwirkungen“
„Herzinfarkt“
„Zulassung“
https://dd4p.gesis.org
Summary: Research KGs @ GESIS
21
Tools for constructing scholarly knowledge graphs
● NLP and deep learning-powered methods for extracting large-scale KGs
about methods, claims, data, software involved in the scientific process
Large-scale scholarly KGs, e.g.
● KGs about scholarly use of software & research data
(e.g. SoftwareKG: 1.8 M disambiguated software mentions extracted
from 3 M publications, https://data.gesis.org/softwarekg/)
● Web mined KGs of social science research data, e.g. public opinions,
claims and attitudes expressed on social media
(e.g. TweetsKB: > 10 Bn semantically annotated tweets, sentiments,
https://data.gesis.org/tweetskb)
Semantic Search powered by KGs and related tools
● RKG-powered search across scholarly publications, datasets, methods
and their relations (e.g. GESIS Search, https://search.gesis.org)
https://gesis.org/en/kts
https://search.gesis.org
Outlook: a joint Research KG in NFDI4DS
22
Resources
▪ Datasets
▪ Publications
▪ Code
▪ Software
Concepts
▪ Terms &
Definitions
▪ Claims
▪ Methods
▪ Topics
▪ Entities
• Community
• Expert-curation/annotation
workflow/tools
• Focus point & hub
• PIDs
https://www.nfdi4datascience.de/
• Large-scale knowledge
graphs
• Automated deep learning-
based methods for
extracting KGs
• Populating ORKG
23
@stefandietze
http://stefandietze.net

More Related Content

Similar to Research Knowledge Graphs at NFDI4DS & GESIS

Fighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial IntelligenceFighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial Intelligence
vty
 
Research Data Alliance Plenary 9: DDRI Working Group Session
Research Data Alliance Plenary 9: DDRI Working Group SessionResearch Data Alliance Plenary 9: DDRI Working Group Session
Research Data Alliance Plenary 9: DDRI Working Group Session
amiraryani
 
Managing, Sharing and Curating Your Research Data in a Digital Environment
Managing, Sharing and Curating Your Research Data in a Digital EnvironmentManaging, Sharing and Curating Your Research Data in a Digital Environment
Managing, Sharing and Curating Your Research Data in a Digital Environment
philipdurbin
 
A coordinated framework for open data open science in Botswana/Simon Hodson
A coordinated framework for open data open science in Botswana/Simon HodsonA coordinated framework for open data open science in Botswana/Simon Hodson
A coordinated framework for open data open science in Botswana/Simon Hodson
African Open Science Platform
 
A coordinated framework for open data open science in Botswana/Simon Hodson
A coordinated framework for open data open science in Botswana/Simon HodsonA coordinated framework for open data open science in Botswana/Simon Hodson
A coordinated framework for open data open science in Botswana/Simon Hodson
African Open Science Platform
 
CODATA: Open Data, FAIR Data and Open Science/Simon Hodson
CODATA: Open Data, FAIR Data and Open Science/Simon HodsonCODATA: Open Data, FAIR Data and Open Science/Simon Hodson
CODATA: Open Data, FAIR Data and Open Science/Simon Hodson
Academy of Science of South Africa (ASSAf)
 
DCC and FAIR initiatives
DCC and FAIR initiativesDCC and FAIR initiatives
DCC and FAIR initiatives
Sarah Jones
 
Full Erdmann Ruttenberg Community Approaches to Open Data at Scale
Full Erdmann Ruttenberg Community Approaches to Open Data at ScaleFull Erdmann Ruttenberg Community Approaches to Open Data at Scale
Full Erdmann Ruttenberg Community Approaches to Open Data at Scale
National Information Standards Organization (NISO)
 
Rda nitrd 2015 berman - final
Rda nitrd 2015 berman  - finalRda nitrd 2015 berman  - final
Rda nitrd 2015 berman - final
Kathy Fontaine
 
The African Open Science Platform: Policy, Infrastructure, Skills and Incenti...
The African Open Science Platform: Policy, Infrastructure, Skills and Incenti...The African Open Science Platform: Policy, Infrastructure, Skills and Incenti...
The African Open Science Platform: Policy, Infrastructure, Skills and Incenti...
African Open Science Platform
 
Birgit Schmidt: RDA for Libraries from an International Perspective
Birgit Schmidt: RDA for Libraries from an International PerspectiveBirgit Schmidt: RDA for Libraries from an International Perspective
Birgit Schmidt: RDA for Libraries from an International Perspective
dri_ireland
 
Using Open Science to advance science - advancing open data
Using Open Science to advance science - advancing open data Using Open Science to advance science - advancing open data
Using Open Science to advance science - advancing open data
Robert Oostenveld
 
My FAIR share of the work - Diamond Light Source - Dec 2018
My FAIR share of the work - Diamond Light Source - Dec 2018My FAIR share of the work - Diamond Light Source - Dec 2018
My FAIR share of the work - Diamond Light Source - Dec 2018
Susanna-Assunta Sansone
 
Open FAIR Data and Open Science: Developing Partnerships, Strategies, Policie...
Open FAIR Data and Open Science: Developing Partnerships, Strategies, Policie...Open FAIR Data and Open Science: Developing Partnerships, Strategies, Policie...
Open FAIR Data and Open Science: Developing Partnerships, Strategies, Policie...
Academy of Science of South Africa (ASSAf)
 
Edinburgh DataShare: Tackling research data in a DSpace institutional repository
Edinburgh DataShare: Tackling research data in a DSpace institutional repositoryEdinburgh DataShare: Tackling research data in a DSpace institutional repository
Edinburgh DataShare: Tackling research data in a DSpace institutional repository
Robin Rice
 
Research data support: a growth area for academic libraries?
Research data support: a growth area for academic libraries?Research data support: a growth area for academic libraries?
Research data support: a growth area for academic libraries?
Robin Rice
 
FAIR for the future: embracing all things data
FAIR for the future: embracing all things dataFAIR for the future: embracing all things data
FAIR for the future: embracing all things data
ARDC
 
International Journal of Data Mining & Knowledge Management Process(IJDKP)
International Journal of Data Mining & Knowledge Management Process(IJDKP)International Journal of Data Mining & Knowledge Management Process(IJDKP)
International Journal of Data Mining & Knowledge Management Process(IJDKP)
albert ca
 
Metadata 2020 Vivo Conference 2018
Metadata 2020 Vivo Conference 2018 Metadata 2020 Vivo Conference 2018
Metadata 2020 Vivo Conference 2018
Clare Dean
 
Adoption and Integration of Persistent Identifiers in European Research Infor...
Adoption and Integration of Persistent Identifiers in European Research Infor...Adoption and Integration of Persistent Identifiers in European Research Infor...
Adoption and Integration of Persistent Identifiers in European Research Infor...
LIBER Europe
 

Similar to Research Knowledge Graphs at NFDI4DS & GESIS (20)

Fighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial IntelligenceFighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial Intelligence
 
Research Data Alliance Plenary 9: DDRI Working Group Session
Research Data Alliance Plenary 9: DDRI Working Group SessionResearch Data Alliance Plenary 9: DDRI Working Group Session
Research Data Alliance Plenary 9: DDRI Working Group Session
 
Managing, Sharing and Curating Your Research Data in a Digital Environment
Managing, Sharing and Curating Your Research Data in a Digital EnvironmentManaging, Sharing and Curating Your Research Data in a Digital Environment
Managing, Sharing and Curating Your Research Data in a Digital Environment
 
A coordinated framework for open data open science in Botswana/Simon Hodson
A coordinated framework for open data open science in Botswana/Simon HodsonA coordinated framework for open data open science in Botswana/Simon Hodson
A coordinated framework for open data open science in Botswana/Simon Hodson
 
A coordinated framework for open data open science in Botswana/Simon Hodson
A coordinated framework for open data open science in Botswana/Simon HodsonA coordinated framework for open data open science in Botswana/Simon Hodson
A coordinated framework for open data open science in Botswana/Simon Hodson
 
CODATA: Open Data, FAIR Data and Open Science/Simon Hodson
CODATA: Open Data, FAIR Data and Open Science/Simon HodsonCODATA: Open Data, FAIR Data and Open Science/Simon Hodson
CODATA: Open Data, FAIR Data and Open Science/Simon Hodson
 
DCC and FAIR initiatives
DCC and FAIR initiativesDCC and FAIR initiatives
DCC and FAIR initiatives
 
Full Erdmann Ruttenberg Community Approaches to Open Data at Scale
Full Erdmann Ruttenberg Community Approaches to Open Data at ScaleFull Erdmann Ruttenberg Community Approaches to Open Data at Scale
Full Erdmann Ruttenberg Community Approaches to Open Data at Scale
 
Rda nitrd 2015 berman - final
Rda nitrd 2015 berman  - finalRda nitrd 2015 berman  - final
Rda nitrd 2015 berman - final
 
The African Open Science Platform: Policy, Infrastructure, Skills and Incenti...
The African Open Science Platform: Policy, Infrastructure, Skills and Incenti...The African Open Science Platform: Policy, Infrastructure, Skills and Incenti...
The African Open Science Platform: Policy, Infrastructure, Skills and Incenti...
 
Birgit Schmidt: RDA for Libraries from an International Perspective
Birgit Schmidt: RDA for Libraries from an International PerspectiveBirgit Schmidt: RDA for Libraries from an International Perspective
Birgit Schmidt: RDA for Libraries from an International Perspective
 
Using Open Science to advance science - advancing open data
Using Open Science to advance science - advancing open data Using Open Science to advance science - advancing open data
Using Open Science to advance science - advancing open data
 
My FAIR share of the work - Diamond Light Source - Dec 2018
My FAIR share of the work - Diamond Light Source - Dec 2018My FAIR share of the work - Diamond Light Source - Dec 2018
My FAIR share of the work - Diamond Light Source - Dec 2018
 
Open FAIR Data and Open Science: Developing Partnerships, Strategies, Policie...
Open FAIR Data and Open Science: Developing Partnerships, Strategies, Policie...Open FAIR Data and Open Science: Developing Partnerships, Strategies, Policie...
Open FAIR Data and Open Science: Developing Partnerships, Strategies, Policie...
 
Edinburgh DataShare: Tackling research data in a DSpace institutional repository
Edinburgh DataShare: Tackling research data in a DSpace institutional repositoryEdinburgh DataShare: Tackling research data in a DSpace institutional repository
Edinburgh DataShare: Tackling research data in a DSpace institutional repository
 
Research data support: a growth area for academic libraries?
Research data support: a growth area for academic libraries?Research data support: a growth area for academic libraries?
Research data support: a growth area for academic libraries?
 
FAIR for the future: embracing all things data
FAIR for the future: embracing all things dataFAIR for the future: embracing all things data
FAIR for the future: embracing all things data
 
International Journal of Data Mining & Knowledge Management Process(IJDKP)
International Journal of Data Mining & Knowledge Management Process(IJDKP)International Journal of Data Mining & Knowledge Management Process(IJDKP)
International Journal of Data Mining & Knowledge Management Process(IJDKP)
 
Metadata 2020 Vivo Conference 2018
Metadata 2020 Vivo Conference 2018 Metadata 2020 Vivo Conference 2018
Metadata 2020 Vivo Conference 2018
 
Adoption and Integration of Persistent Identifiers in European Research Infor...
Adoption and Integration of Persistent Identifiers in European Research Infor...Adoption and Integration of Persistent Identifiers in European Research Infor...
Adoption and Integration of Persistent Identifiers in European Research Infor...
 

More from Stefan Dietze

NEWORDER Project - Science in the online knowledge order
NEWORDER Project - Science in the online knowledge orderNEWORDER Project - Science in the online knowledge order
NEWORDER Project - Science in the online knowledge order
Stefan Dietze
 
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The InsideCollecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Stefan Dietze
 
AI in between online and offline discourse - and what has ChatGPT to do with ...
AI in between online and offline discourse - and what has ChatGPT to do with ...AI in between online and offline discourse - and what has ChatGPT to do with ...
AI in between online and offline discourse - and what has ChatGPT to do with ...
Stefan Dietze
 
An interdisciplinary journey with the SAL spaceship – results and challenges ...
An interdisciplinary journey with the SAL spaceship – results and challenges ...An interdisciplinary journey with the SAL spaceship – results and challenges ...
An interdisciplinary journey with the SAL spaceship – results and challenges ...
Stefan Dietze
 
Human-in-the-loop: the Web as Foundation for interdisciplinary Data Science M...
Human-in-the-loop: the Web as Foundation for interdisciplinary Data Science M...Human-in-the-loop: the Web as Foundation for interdisciplinary Data Science M...
Human-in-the-loop: the Web as Foundation for interdisciplinary Data Science M...
Stefan Dietze
 
Human-in-the-Loop: das Web als Grundlage interdisziplinärer Data Science Meth...
Human-in-the-Loop: das Web als Grundlage interdisziplinärer Data Science Meth...Human-in-the-Loop: das Web als Grundlage interdisziplinärer Data Science Meth...
Human-in-the-Loop: das Web als Grundlage interdisziplinärer Data Science Meth...
Stefan Dietze
 
Towards research data knowledge graphs
Towards research data knowledge graphsTowards research data knowledge graphs
Towards research data knowledge graphs
Stefan Dietze
 
Beyond research data infrastructures: exploiting artificial & crowd intellige...
Beyond research data infrastructures: exploiting artificial & crowd intellige...Beyond research data infrastructures: exploiting artificial & crowd intellige...
Beyond research data infrastructures: exploiting artificial & crowd intellige...
Stefan Dietze
 
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...From Web Data to Knowledge: on the Complementarity of Human and Artificial In...
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...
Stefan Dietze
 
Using AI to understand everyday learning on the Web
Using AI to understand everyday learning on the WebUsing AI to understand everyday learning on the Web
Using AI to understand everyday learning on the Web
Stefan Dietze
 
Analysing User Knowledge, Competence and Learning during Online Activities
Analysing User Knowledge, Competence and Learning during Online ActivitiesAnalysing User Knowledge, Competence and Learning during Online Activities
Analysing User Knowledge, Competence and Learning during Online Activities
Stefan Dietze
 
Analysing & Improving Learning Resources Markup on the Web
Analysing & Improving Learning Resources Markup on the WebAnalysing & Improving Learning Resources Markup on the Web
Analysing & Improving Learning Resources Markup on the Web
Stefan Dietze
 
Beyond Linked Data - Exploiting Entity-Centric Knowledge on the Web
Beyond Linked Data - Exploiting Entity-Centric Knowledge on the WebBeyond Linked Data - Exploiting Entity-Centric Knowledge on the Web
Beyond Linked Data - Exploiting Entity-Centric Knowledge on the Web
Stefan Dietze
 
Big Data in Learning Analytics - Analytics for Everyday Learning
Big Data in Learning Analytics - Analytics for Everyday LearningBig Data in Learning Analytics - Analytics for Everyday Learning
Big Data in Learning Analytics - Analytics for Everyday Learning
Stefan Dietze
 
Retrieval, Crawling and Fusion of Entity-centric Data on the Web
Retrieval, Crawling and Fusion of Entity-centric Data on the WebRetrieval, Crawling and Fusion of Entity-centric Data on the Web
Retrieval, Crawling and Fusion of Entity-centric Data on the Web
Stefan Dietze
 
Mining and Understanding Activities and Resources on the Web
Mining and Understanding Activities and Resources on the WebMining and Understanding Activities and Resources on the Web
Mining and Understanding Activities and Resources on the Web
Stefan Dietze
 
Towards embedded Markup of Learning Resources on the Web
Towards embedded Markup of Learning Resources on the WebTowards embedded Markup of Learning Resources on the Web
Towards embedded Markup of Learning Resources on the Web
Stefan Dietze
 
Semantic Linking & Retrieval for Digital Libraries
Semantic Linking & Retrieval for Digital LibrariesSemantic Linking & Retrieval for Digital Libraries
Semantic Linking & Retrieval for Digital Libraries
Stefan Dietze
 
Linked Data for Architecture, Engineering and Construction (AEC)
Linked Data for Architecture, Engineering and Construction (AEC)Linked Data for Architecture, Engineering and Construction (AEC)
Linked Data for Architecture, Engineering and Construction (AEC)
Stefan Dietze
 
Dietze linked data-vr-es
Dietze linked data-vr-esDietze linked data-vr-es
Dietze linked data-vr-es
Stefan Dietze
 

More from Stefan Dietze (20)

NEWORDER Project - Science in the online knowledge order
NEWORDER Project - Science in the online knowledge orderNEWORDER Project - Science in the online knowledge order
NEWORDER Project - Science in the online knowledge order
 
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The InsideCollecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
 
AI in between online and offline discourse - and what has ChatGPT to do with ...
AI in between online and offline discourse - and what has ChatGPT to do with ...AI in between online and offline discourse - and what has ChatGPT to do with ...
AI in between online and offline discourse - and what has ChatGPT to do with ...
 
An interdisciplinary journey with the SAL spaceship – results and challenges ...
An interdisciplinary journey with the SAL spaceship – results and challenges ...An interdisciplinary journey with the SAL spaceship – results and challenges ...
An interdisciplinary journey with the SAL spaceship – results and challenges ...
 
Human-in-the-loop: the Web as Foundation for interdisciplinary Data Science M...
Human-in-the-loop: the Web as Foundation for interdisciplinary Data Science M...Human-in-the-loop: the Web as Foundation for interdisciplinary Data Science M...
Human-in-the-loop: the Web as Foundation for interdisciplinary Data Science M...
 
Human-in-the-Loop: das Web als Grundlage interdisziplinärer Data Science Meth...
Human-in-the-Loop: das Web als Grundlage interdisziplinärer Data Science Meth...Human-in-the-Loop: das Web als Grundlage interdisziplinärer Data Science Meth...
Human-in-the-Loop: das Web als Grundlage interdisziplinärer Data Science Meth...
 
Towards research data knowledge graphs
Towards research data knowledge graphsTowards research data knowledge graphs
Towards research data knowledge graphs
 
Beyond research data infrastructures: exploiting artificial & crowd intellige...
Beyond research data infrastructures: exploiting artificial & crowd intellige...Beyond research data infrastructures: exploiting artificial & crowd intellige...
Beyond research data infrastructures: exploiting artificial & crowd intellige...
 
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...From Web Data to Knowledge: on the Complementarity of Human and Artificial In...
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...
 
Using AI to understand everyday learning on the Web
Using AI to understand everyday learning on the WebUsing AI to understand everyday learning on the Web
Using AI to understand everyday learning on the Web
 
Analysing User Knowledge, Competence and Learning during Online Activities
Analysing User Knowledge, Competence and Learning during Online ActivitiesAnalysing User Knowledge, Competence and Learning during Online Activities
Analysing User Knowledge, Competence and Learning during Online Activities
 
Analysing & Improving Learning Resources Markup on the Web
Analysing & Improving Learning Resources Markup on the WebAnalysing & Improving Learning Resources Markup on the Web
Analysing & Improving Learning Resources Markup on the Web
 
Beyond Linked Data - Exploiting Entity-Centric Knowledge on the Web
Beyond Linked Data - Exploiting Entity-Centric Knowledge on the WebBeyond Linked Data - Exploiting Entity-Centric Knowledge on the Web
Beyond Linked Data - Exploiting Entity-Centric Knowledge on the Web
 
Big Data in Learning Analytics - Analytics for Everyday Learning
Big Data in Learning Analytics - Analytics for Everyday LearningBig Data in Learning Analytics - Analytics for Everyday Learning
Big Data in Learning Analytics - Analytics for Everyday Learning
 
Retrieval, Crawling and Fusion of Entity-centric Data on the Web
Retrieval, Crawling and Fusion of Entity-centric Data on the WebRetrieval, Crawling and Fusion of Entity-centric Data on the Web
Retrieval, Crawling and Fusion of Entity-centric Data on the Web
 
Mining and Understanding Activities and Resources on the Web
Mining and Understanding Activities and Resources on the WebMining and Understanding Activities and Resources on the Web
Mining and Understanding Activities and Resources on the Web
 
Towards embedded Markup of Learning Resources on the Web
Towards embedded Markup of Learning Resources on the WebTowards embedded Markup of Learning Resources on the Web
Towards embedded Markup of Learning Resources on the Web
 
Semantic Linking & Retrieval for Digital Libraries
Semantic Linking & Retrieval for Digital LibrariesSemantic Linking & Retrieval for Digital Libraries
Semantic Linking & Retrieval for Digital Libraries
 
Linked Data for Architecture, Engineering and Construction (AEC)
Linked Data for Architecture, Engineering and Construction (AEC)Linked Data for Architecture, Engineering and Construction (AEC)
Linked Data for Architecture, Engineering and Construction (AEC)
 
Dietze linked data-vr-es
Dietze linked data-vr-esDietze linked data-vr-es
Dietze linked data-vr-es
 

Recently uploaded

Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
UiPathCommunity
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
Vlad Stirbu
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
ViralQR
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 

Recently uploaded (20)

Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 

Research Knowledge Graphs at NFDI4DS & GESIS

  • 1. Research Knowledge Graphs at NFDI4DS & GESIS RKG Workshop Stefan Dietze, 23.06.2022
  • 2. NFDI4DS - Consortium 2 FHI Universität Leipzig  Several universities as centers of excellence in DS and AI.  Several non-university research institutes as key contributors.  Scientific information centers and infrastructure providers as providers of data and for access to domain user groups.
  • 3. ● Paradigm shift in computer science towards data- and deep learning driven methods (DS = fourth scientific paradigm). ● Reproducibility crisis. ○ Study on neural recommender publications at top-tier- venues finds only 7/18 reproducible and only 1/18 beating state-of-the-art. [Dacrema et al., RecSys2019] ● Bias and discrimination from learned data-induced biases. ○ Gender classification performance of neural approaches performs significantly worse for darker skinned women. [Buolamwini et al., FAccT2018] ○ Health risk of black people consistently underestimated by predictive algorithms. [Obermeyer et al., Science2019] Data Science & AI Challenges 3
  • 4. ▪ Research Data ▪ Publications ▪ Code/Scripts ▪ ML Models ▪ Methods ▪ Claims ▪ Metrics Relations between scientific resources, data, knowledge Research Data Cycle Provenance & Dependencies of Research Data, Resources, Knowledge
  • 5. Common questions for researchers • Which top-tier publications cite which data/method? („dataset authority“) • Which data was used to train/evaluate which method? Which method to produce what data? • Which claims are supported/cited/rejected by what dataset or publication? ▪ Research Data ▪ Publications ▪ Code/Scripts ▪ ML Models ▪ Methods ▪ Claims ▪ Metrics Relations between scientific resources, data, knowledge Provenance & Dependencies of Research Data, Resources, Knowledge
  • 6. Challenges • Data & metadata about resources and concepts not represented in structured, machine-interpretable, integrated manner (hidden in publications, web pages etc) • Persistent identifiers (e.g. DOIs) used inconsistently (e.g. on publications/datasets) • Relations and semantics not explicit • Reproducibility crisis in CS/DS/AI ▪ Research Data ▪ Publications ▪ Code/Scripts ▪ ML Models ▪ Methods ▪ Claims ▪ Metrics Relations between scientific resources, data, knowledge Provenance & Dependencies of Research Data, Resources, Knowledge
  • 7. • Data interoperability and reuse through established W3C standards for data sharing (on the Web), e.g. RDF, JSON, shared vocabularies (e.g. schema.org, DCAT, DDI), APIs for data reuse and linking • Making links between resources and concepts explicit & machine- interpretable (e.g. which publications cite what dataset?) • Consistent use of persisent IDs (e.g. URIs, DOIs) across all data, e.g. concepts, resources etc („DOIs for all“) • Partners in NFDI4DS/TA2 (“RKGs”) Knowledge Graphs for FAIR Research Data in NFDI4DS Resources ▪ Datasets ▪ Publications ▪ Code ▪ Software Concepts ▪ Terms & Definitions ▪ Claims ▪ Methods ▪ Topics ▪ Entities
  • 8. GESIS & TIB RKGs in NFDI4DS 10 • Community • Expert-curation/annotation workflow/tools • Focus point & hub • PIDs https://www.nfdi4datascience.de/ • Large-scale knowledge graphs • Automated deep learning- based methods for extracting KGs • Populating ORKG Resources ▪ Datasets ▪ Publications ▪ Code ▪ Software Concepts ▪ Terms & Definitions ▪ Claims ▪ Methods ▪ Topics ▪ Entities
  • 9. 11 Research KGs in Practice: integrated search @ GESIS https://search.gesis.org/ Dataset Rel. Publications
  • 10. 12 From publications to machine-interpretable metadata KGs Disambiguation of dataset & software/script citations https://data.gesis.org/softwarekg ▪ Training deep learning- based model for extraction software & data references in large-scale data (3.5 M publications) ▪ Data lifting into KG ▪ 300+ M triples / statements ▪ Search across data/software/publications (GESIS Search)
  • 11. From publications to machine-interpretable metadata KGs Understanding scientific software/data usage https://data.gesis.org/softwarekg (Schindler et al., CIKM2021) ▪ Understanding SW usage, citation habits and their evolution across disciplines ▪ Rise of data science = rise of software usage
  • 12. https://data.gesis.org/softwarekg ▪ Top adopters of data science/AI/software… From publications to machine-interpretable metadata KGs Understanding scientific software/data usage
  • 13. https://data.gesis.org/softwarekg ▪ Top adopters of data science/AI/software… ▪ …follow the worst citation habits From publications to machine-interpretable metadata KGs Understanding scientific software/data usage
  • 14. http://dbpedia.org/resource/Tim_Berners-Lee wna:positive-emotion onyx:Intensity "0.75" onyx:Intensity "0.0" http://dbpedia.org/resource/Solid wna:negative-emotion From social media to machine-interpretable research data KGs Building a public research knowledge graph from Twitter data https://data.gesis.org/tweetskb
  • 15. From social media to machine-interpretable research data KGs https://data.gesis.org/tweetskb TweetsKB – a large-scale research KG of societal opinions ▪ Harvesting & archiving of 10 Billion tweets (permanent collection from Twitter 1% sample since 2013) ▪ Information extraction pipeline to build a KG of entities, interactions & sentiments (distributed batch processing via Hadoop Map/Reduce) o Entity linking with knowledge graph/DBpedia (“president”/“potus”/”trump” => dbp:DonaldTrump) o Sentiment analysis/annotation o Geotagging o Lifting into knowledge graph schema ▪ Public, privacy-aware, large-scale research corpus of public opinions and their evolution => interdisciplinary research P. Fafalios, V. Iosifidis, E. Ntoutsi, and S. Dietze, TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets, ESWC'18.
  • 16. RKG-based social science research using TweetsKB https://dd4p.gesis.org Investigating Vaccine Hesitancy in DACH countries
  • 17. Germany suspends vaccinations with Astra Zeneca Twitter discourse zu “Impfbereitschaft” RKG-based social science research using TweetsKB Investigating Vaccine Hesitancy in DACH countries https://dd4p.gesis.org
  • 18. RKG-based discourse analysis using TweetsKB Vaccine Hesitancy– key topics in “safety” category „Schwangerschaft“ „Kimmich“ „Alter“ „Nebenwirkungen“ „Herzinfarkt“ „Zulassung“ https://dd4p.gesis.org
  • 19. Summary: Research KGs @ GESIS 21 Tools for constructing scholarly knowledge graphs ● NLP and deep learning-powered methods for extracting large-scale KGs about methods, claims, data, software involved in the scientific process Large-scale scholarly KGs, e.g. ● KGs about scholarly use of software & research data (e.g. SoftwareKG: 1.8 M disambiguated software mentions extracted from 3 M publications, https://data.gesis.org/softwarekg/) ● Web mined KGs of social science research data, e.g. public opinions, claims and attitudes expressed on social media (e.g. TweetsKB: > 10 Bn semantically annotated tweets, sentiments, https://data.gesis.org/tweetskb) Semantic Search powered by KGs and related tools ● RKG-powered search across scholarly publications, datasets, methods and their relations (e.g. GESIS Search, https://search.gesis.org) https://gesis.org/en/kts https://search.gesis.org
  • 20. Outlook: a joint Research KG in NFDI4DS 22 Resources ▪ Datasets ▪ Publications ▪ Code ▪ Software Concepts ▪ Terms & Definitions ▪ Claims ▪ Methods ▪ Topics ▪ Entities • Community • Expert-curation/annotation workflow/tools • Focus point & hub • PIDs https://www.nfdi4datascience.de/ • Large-scale knowledge graphs • Automated deep learning- based methods for extracting KGs • Populating ORKG