“Automatic relation extraction
from biomedical content for
knowledge graph creation and
maintenance”
Stefan Geißler - Kairntech
“AI SDV”
Vienna, Oct 11, 2022
info@kairntech.com
Introducing Kairntech
French/German NLP/AI specialist founded in December 2018
Grenoble, France (headquarters), Paris, and Heidelberg, Germany
Selected customers: Boehringer-Ingelheim, SCAI-Fraunhofer, AFP, French Government, Groupe Revue Fiduciaire,…
Mission: Making NLP / Machine Learning accessible for domain experts (“No code / Low code platform”)
Stefan Geißler
Co-founder
Kairntech
Heidelberg
Project Intro & Background
• Spring 2021: Fraunhofer SCAI: “We are looking for an automatic
procedure to extract information about entities and their relations from a
larger corpus on psychiatric disorders and render that in the BEL language.
Can you do that?”
“Phosphorylation of glycogen synthase kinase 3beta at Threonine 668 increases the degradation of Amyloid precursor protein.”
BEL (Biological Expression Language) captures
and encodes biological relations
Copied from Martin Hofmann-Apitius
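To make the target format concrete, here is a minimal Python sketch that renders the example sentence as a BEL statement. It assumes BEL 2.0-style term syntax as we understand it (`p()` for protein abundances, `pmod(Ph, Thr, 668)` for a phosphorylation at threonine 668, `deg()` for degradation) and the HGNC symbols GSK3B and APP; the `Protein` class and the helper functions are hypothetical, not part of any BEL library.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Protein:
    namespace: str                                # e.g. "HGNC"
    name: str                                     # e.g. "GSK3B"
    pmod: Optional[Tuple[str, str, int]] = None   # (modification, residue, position)


def protein_term(p: Protein) -> str:
    """Render a protein abundance as a BEL p(...) term, with an optional pmod()."""
    args = f"{p.namespace}:{p.name}"
    if p.pmod is not None:
        mod, residue, position = p.pmod
        args += f", pmod({mod}, {residue}, {position})"
    return f"p({args})"


def bel_statement(subject: str, relation: str, obj: str) -> str:
    """Join a subject term, a BEL relation and an object term into one statement."""
    return f"{subject} {relation} {obj}"


gsk3b = Protein("HGNC", "GSK3B", pmod=("Ph", "Thr", 668))
app = Protein("HGNC", "APP")
print(bel_statement(protein_term(gsk3b), "increases", f"deg({protein_term(app)})"))
# p(HGNC:GSK3B, pmod(Ph, Thr, 668)) increases deg(p(HGNC:APP))
```

Keeping terms and statements as separate small functions mirrors how BEL nests terms (a `p()` inside `deg()`) before a relation joins them.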
The challenge: updating knowledge graphs
Existing knowledge graphs become outdated quickly; updating them is time-consuming and challenging
Hypotheses:
Data volumes are large, but not infinite
Knowledge is complex, but can be formalized
[Figure: number of articles on PubMed on “TAU phosphorylation”]
Copied from Martin Hofmann-Apitius
Assessing the “Pharmacome”
Previous efforts at SCAI: Manual creation of KGs on genes, proteins and drugs in neurodegenerative diseases
Graphs often involve thousands of genes, proteins and their interactions
For Alzheimer's disease alone, SCAI built 24 specific subgraphs
Copied from Martin Hofmann-Apitius
Main goal: Find information for drug repurposing: Which known, druggable substances have a therapeutic effect on the investigated diseases?
Relevant but hard to find: Indirect relations
“A interacts with B which stimulates C which blocks D …”
Approach: Harvest interactions automatically and introduce them into large, machine-readable, indication-wide Knowledge Graphs
Chains of interactions over multiple entities are notoriously hard to find; they are often unknown even to individuals who are aware of the specific isolated steps.
Copied from Martin Hofmann-Apitius
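Once interactions are stored as a graph, finding such chains becomes a path search. The sketch below is a breadth-first search over hypothetical placeholder triples (A, B, C, D, E are not real entities) that lists every chain of at most `max_hops` relations between two entities; the function names are illustrative, not taken from the project.

```python
from collections import deque

# Hypothetical placeholder triples (subject, relation, object); not real findings.
CHAIN_TRIPLES = [
    ("A", "interacts", "B"),
    ("B", "stimulates", "C"),
    ("C", "blocks", "D"),
    ("A", "interacts", "E"),
]


def build_graph(triples):
    """Adjacency list: subject -> list of (relation, object)."""
    graph = {}
    for subj, rel, obj in triples:
        graph.setdefault(subj, []).append((rel, obj))
    return graph


def find_chains(graph, start, goal, max_hops=4):
    """Breadth-first search for all simple paths start -> goal with <= max_hops edges."""
    chains = []
    queue = deque([(start, [])])          # (current node, edges taken so far)
    while queue:
        node, path = queue.popleft()
        if node == goal and path:
            chains.append(path)
            continue
        if len(path) == max_hops:
            continue
        visited = {start} | {obj for _, _, obj in path}
        for rel, nxt in graph.get(node, []):
            if nxt not in visited:        # avoid cycles
                queue.append((nxt, path + [(node, rel, nxt)]))
    return chains


def format_chain(chain):
    """Render a chain of edges as 'A -[rel]-> B -[rel]-> C ...'."""
    parts = [chain[0][0]]
    for _, rel, obj in chain:
        parts.append(f"-[{rel}]-> {obj}")
    return " ".join(parts)


for chain in find_chains(build_graph(CHAIN_TRIPLES), "A", "D"):
    print(format_chain(chain))
# A -[interacts]-> B -[stimulates]-> C -[blocks]-> D
```

Because BFS explores shorter paths first, the shortest chains are found before longer ones; the `max_hops` cap keeps the search tractable on large graphs.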
Towards Automatic Knowledge Graph Creation:
Relation Extraction
Knowledge Graphs:
• Network of entities and the relations between them
• Makes many relevant questions easier to investigate (compared to traditional relational DBs)
• E.g. in bio-sciences: pathways (“A stimulates B which blocks C which increases D…”)
• Growing adoption in the industry
• Many intuitive data management systems available
Relation Extraction:
• Challenging NLP task: Identify relations from natural language text
• Co-occurrence: A relation is assumed to hold when entities occur together in text
  • Benefits: Fast, simple
  • Drawbacks: Overgeneration, no relation type
• Hand-crafted rules: Apply complex grammar
  • Benefits: Can be very accurate, can detect types of relations
  • Drawbacks: Rule building is a complex process; how to model relations beyond sentence level?
• Machine learning: Training on annotated text
  • Benefits: Highest quality
  • Drawbacks: Data- and computation-intensive
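The co-occurrence approach is simple enough to sketch in a few lines. This hypothetical Python example assumes entities have already been recognized (given as plain strings) and uses a naive regex sentence splitter and substring matching; the example sentences are invented. It exhibits both drawbacks listed above: the second pair is only measured together, not related, and no pair carries a relation type.

```python
import itertools
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def cooccurrence_pairs(text, entities):
    """Return unordered entity pairs that co-occur in at least one sentence.

    Matching is plain substring search, so it can also fire inside longer words.
    """
    pairs = set()
    for sentence in split_sentences(text):
        found = sorted(e for e in entities if e in sentence)
        for a, b in itertools.combinations(found, 2):
            pairs.add((a, b))
    return pairs


text = ("GSK3B phosphorylates APP. "
        "APP and MAPT were measured in the same samples.")
print(sorted(cooccurrence_pairs(text, ["GSK3B", "APP", "MAPT"])))
# [('APP', 'GSK3B'), ('APP', 'MAPT')]
```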
Kairntech Studio (build a pipeline)
• Import documents: TXT, PDF, Word, HTML, XML, JSON…
• Create a training dataset: explore and label text with manual or assisted text annotation tools
• Create and compare learning models using built-in and state-of-the-art algorithms
• Create NLP pipelines combining models, client taxonomy, knowledge graphs, technical components…
Usage via a no-code, easy-to-use web application or via REST API
Kairntech Server (run the pipeline in production)
REST API → NLP Pipeline → Knowledge applications
• Link entities to WikiData taxonomies
  • Large-scale entity recognition
  • Multi-topic
  • Multi-lingual
  • Constantly updated
  • Linked to background information
• Refine entity extraction: custom housekeeping to improve recognition quality (filtering)
• Compute additional entity properties: proteins receive a label for protein modifications (e.g. phosphorylation)
• Apply custom-trained relation extraction model
• Build relations between entities
• Format results to output: return results in BEL format for integration in a knowledge graph environment
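The production pipeline chains its stages so that each one consumes the previous one's output. The following sketch illustrates only that composition pattern, under heavy assumptions: every step (`link_entities`, `filter_entities`, `extract_relations`, `to_bel`) is a hypothetical stand-in operating on a plain dict, the lexicon is invented, and none of it reflects Kairntech's actual REST API or models.

```python
import re
from functools import reduce

# Hypothetical lexicon: surface form -> "NAMESPACE:name" identifier.
LEXICON = {"GSK3B": "HGNC:GSK3B", "APP": "HGNC:APP", "Vienna": "WIKIDATA:Vienna"}
ALLOWED_NAMESPACES = {"HGNC", "MESH", "CHEBI", "GO"}


def link_entities(doc):
    """Link every known word to an identifier (stand-in for entity linking)."""
    doc["entities"] = [LEXICON[w] for w in doc["words"] if w in LEXICON]
    return doc


def filter_entities(doc):
    """Housekeeping: keep only entities from the namespaces used downstream."""
    doc["entities"] = [e for e in doc["entities"]
                       if e.split(":", 1)[0] in ALLOWED_NAMESPACES]
    return doc


def extract_relations(doc):
    """Stand-in for a trained relation model: relate adjacent entity mentions."""
    ents = doc["entities"]
    doc["relations"] = [(a, "association", b) for a, b in zip(ents, ents[1:])]
    return doc


def to_bel(doc):
    """Format relations as BEL (simplification: every entity is treated as p())."""
    doc["bel"] = [f"p({a}) {rel} p({b})" for a, rel, b in doc["relations"]]
    return doc


PIPELINE = [link_entities, filter_entities, extract_relations, to_bel]


def run(text, steps=PIPELINE):
    doc = {"text": text, "words": re.findall(r"\w+", text)}
    # Apply each step in order, feeding its output into the next one.
    return reduce(lambda d, step: step(d), steps, doc)


print(run("GSK3B phosphorylates APP in Vienna.")["bel"])
# ['p(HGNC:GSK3B) association p(HGNC:APP)']
```

Representing the pipeline as an ordered list of steps makes it easy to insert, drop or reorder stages, which is the flexibility the slide attributes to pipeline definitions.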
Entity Linking & Namespaces
Entities are not just strings; they need to be associated with the real-world objects they refer to.
A subset of namespaces is used:
⚫ HGNC (HUGO Gene Nomenclature Committee) for proteins
⚫ MeSH (Medical Subject Headings) for pathologies
⚫ ChEBI (Chemical Entities of Biological Interest) for drug/chemical abundances
⚫ GO (Gene Ontology) for biological processes
Recognized entities are
⚫ Disambiguated (“cancer”: animal or disease?)
⚫ Scored (how prominent/important is the concept in this context?)
⚫ Normalized (synonyms → preferred form, e.g. “NIDDM” → “Diabetes Mellitus Type 2”)
⚫ Linked to world knowledge
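Normalization and linking can be pictured as a lookup from surface forms to a namespace-qualified preferred name. The sketch below assumes a tiny hand-made lexicon (hypothetical entries) and a simplified BEL-style quoting rule (quote names that contain spaces); real systems also need the disambiguation and scoring steps, which are not shown.

```python
from typing import Optional

# Hypothetical mini-lexicon: lower-cased synonym -> (namespace, preferred name).
LEXICON = {
    "niddm": ("MESH", "Diabetes Mellitus Type 2"),
    "type 2 diabetes": ("MESH", "Diabetes Mellitus Type 2"),
    "gsk3b": ("HGNC", "GSK3B"),
    "glycogen synthase kinase 3beta": ("HGNC", "GSK3B"),
}


def normalize(mention: str) -> Optional[str]:
    """Map a mention to 'NAMESPACE:preferred name', or None if it is unknown."""
    entry = LEXICON.get(mention.strip().lower())
    if entry is None:
        return None
    namespace, preferred = entry
    # Simplified quoting rule: wrap names containing spaces in double quotes.
    if " " in preferred:
        return f'{namespace}:"{preferred}"'
    return f"{namespace}:{preferred}"


for m in ["NIDDM", "Glycogen synthase kinase 3beta", "aspirin"]:
    print(m, "->", normalize(m))
```

Several synonyms map to the same entry, so "NIDDM" and "type 2 diabetes" end up as one node in the graph instead of two.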
NLP Pipeline for Entity Recognition & Linking: Example!
Relationship extraction
• Trained on a relation training dataset
• The table below lists the most important relations addressed in the project
• For any pair of entities, the model determines whether a relation holds (if not: “NoRelation”) and, if so, which one
• As expected, quality decreases for relation types with fewer training examples
• Manual inspection by SCAI experts of a sample of the results: ~73% of the relations returned by Kairntech are valid
RELATION
NoRelation
increases
decreases
regulates
positiveCorrelation
association
negativeCorrelation
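The pair-classification setup can be sketched as follows: every ordered pair of entities in a sentence receives exactly one label from the relation set, with `NoRelation` as the default. Here a hypothetical cue-word lookup stands in for the trained model, purely to show the input/output contract; it is not how the Kairntech model works.

```python
import itertools

RELATIONS = {"NoRelation", "increases", "decreases", "regulates",
             "positiveCorrelation", "association", "negativeCorrelation"}

# Hypothetical cue words standing in for the trained model's decision.
CUES = {"increases": "increases", "enhances": "increases",
        "decreases": "decreases", "inhibits": "decreases",
        "regulates": "regulates", "associated": "association"}
assert set(CUES.values()) <= RELATIONS


def classify_pair(sentence, a, b):
    """Return exactly one label from RELATIONS for the ordered pair (a, b)."""
    i, j = sentence.find(a), sentence.find(b)
    if i < 0 or j < 0 or i >= j:
        return "NoRelation"          # this toy rule only reads left to right
    for word in sentence[i + len(a):j].lower().split():
        label = CUES.get(word.strip(".,;"))
        if label:
            return label
    return "NoRelation"


def extract(sentence, entities):
    """Classify every ordered entity pair and keep those with a real relation."""
    triples = []
    for a, b in itertools.permutations(entities, 2):
        label = classify_pair(sentence, a, b)
        if label != "NoRelation":
            triples.append((a, label, b))
    return triples


print(extract("GSK3B increases the degradation of APP.", ["GSK3B", "APP"]))
# [('GSK3B', 'increases', 'APP')]
```

Treating "NoRelation" as an ordinary class is what lets one classifier both filter out non-related pairs and type the related ones.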
Complete NLP Pipeline with Relationship Extraction: Example!
Results displayed as a Knowledge Graph: Example!
“Which proteins are known to be positively correlated with both Schizophrenia and Bipolar Disorder?”
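Once relations are stored as triples, a question like this reduces to a set intersection. A minimal sketch, using invented placeholder proteins (`PROT_A` etc.) rather than any actual project results:

```python
# Hypothetical extracted triples (subject, relation, object); not real results.
KG_TRIPLES = [
    ("PROT_A", "positiveCorrelation", "Schizophrenia"),
    ("PROT_A", "positiveCorrelation", "Bipolar Disorder"),
    ("PROT_B", "positiveCorrelation", "Schizophrenia"),
    ("PROT_C", "negativeCorrelation", "Bipolar Disorder"),
]


def subjects_with(triples, relation, obj):
    """All subjects linked to obj by the given relation."""
    return {s for s, r, o in triples if r == relation and o == obj}


def positively_correlated_with_all(triples, diseases):
    """Subjects positively correlated with every disease in the list."""
    sets = [subjects_with(triples, "positiveCorrelation", d) for d in diseases]
    return set.intersection(*sets) if sets else set()


print(positively_correlated_with_all(KG_TRIPLES, ["Schizophrenia", "Bipolar Disorder"]))
# {'PROT_A'}
```

In a graph database the same question would be a two-edge pattern match; the set version just makes the logic explicit.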
“We now have relation extraction accuracies of
between 70% and 80%. Not long ago you had to
be happy to get that for just entity extraction.
Having this for relations now is outstanding.”
Martin Hofmann-Apitius, SCAI
Automated relationship extraction
Kairntech Natural Language Processing Pipeline:
Link entities to taxonomies → Refine entity extraction → Create relations between entities → BEL output
Ongoing work
• Speed up the analysis by parallelizing the process: make use of SCAI’s high-performance computing cluster and leverage SCAI expertise in parallelizing complex software processes
• Investigate extension to other therapeutic areas
• Define a joint approach & offering for industry use cases on:
  • computing topic-specific knowledge graphs
  • updating/extending knowledge graphs
• Expand the approach to cover large parts (all?) of Medline?
• Joint (Kairntech & SCAI) communication efforts: webinars, publications
• Cf. www.biorxiv.org/content/10.1101/2022.03.07.483233v1
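The slides give no details on how the parallelization will be done. As a minimal single-machine sketch, assuming each document is processed by an independent call (for instance a request to an NLP server, here replaced by a hypothetical `analyze` placeholder), documents can be fanned out over a thread pool; for CPU-bound local processing, a process pool or the cluster's scheduler would be the more natural choice.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze(doc_id, text):
    """Hypothetical placeholder for one pipeline call; here it just counts words."""
    return doc_id, len(text.split())


def analyze_corpus(docs, max_workers=8):
    """Process documents concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda item: analyze(*item), docs.items()))


docs = {"doc1": "GSK3B phosphorylates APP.", "doc2": "APP and MAPT were measured."}
print(analyze_corpus(docs))
# [('doc1', 3), ('doc2', 5)]
```

Because documents are independent, the corpus can be split arbitrarily; `pool.map` preserves input order, so results line up with the document IDs.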
Conclusion / findings
• Kairntech's off-the-shelf entity extraction performs well even in this highly specific subdomain
• Kairntech's notion of processing pipelines makes it possible to define sophisticated processing chains (here: entity recognition, application of a specific custom model (ModType), relation extraction, output encoding into BEL syntax)
• The relation results were assessed and judged useful by SCAI experts after detailed manual analysis
• Relation extraction from a large literature corpus to feed knowledge graphs is feasible
Thank you for your attention!
info@kairntech.com

 
假文凭国外(Adelaide毕业证)澳大利亚国立大学毕业证成绩单办理
假文凭国外(Adelaide毕业证)澳大利亚国立大学毕业证成绩单办理假文凭国外(Adelaide毕业证)澳大利亚国立大学毕业证成绩单办理
假文凭国外(Adelaide毕业证)澳大利亚国立大学毕业证成绩单办理
 
办理毕业证(UPenn毕业证)宾夕法尼亚大学毕业证成绩单快速办理
办理毕业证(UPenn毕业证)宾夕法尼亚大学毕业证成绩单快速办理办理毕业证(UPenn毕业证)宾夕法尼亚大学毕业证成绩单快速办理
办理毕业证(UPenn毕业证)宾夕法尼亚大学毕业证成绩单快速办理
 
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
 
重新申请毕业证书(RMIT毕业证)皇家墨尔本理工大学毕业证成绩单精仿办理
重新申请毕业证书(RMIT毕业证)皇家墨尔本理工大学毕业证成绩单精仿办理重新申请毕业证书(RMIT毕业证)皇家墨尔本理工大学毕业证成绩单精仿办理
重新申请毕业证书(RMIT毕业证)皇家墨尔本理工大学毕业证成绩单精仿办理
 
制作原版1:1(Monash毕业证)莫纳什大学毕业证成绩单办理假
制作原版1:1(Monash毕业证)莫纳什大学毕业证成绩单办理假制作原版1:1(Monash毕业证)莫纳什大学毕业证成绩单办理假
制作原版1:1(Monash毕业证)莫纳什大学毕业证成绩单办理假
 
Understanding User Behavior with Google Analytics.pdf
Understanding User Behavior with Google Analytics.pdfUnderstanding User Behavior with Google Analytics.pdf
Understanding User Behavior with Google Analytics.pdf
 
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
 

AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Analysis of Scientific Literature. Stefan Geißler (Kairntech, Germany)

  • 1. “Automatic relation extraction from biomedical content for knowledge graph creation and maintenance”. Stefan Geißler, Kairntech. AI-SDV, Vienna, Oct 11, 2022. info@kairntech.com
  • 2. Introducing Kairntech: French/German NLP/AI specialist founded in December 2018, located in Grenoble, France (headquarters), Paris, and Heidelberg, Germany. Mission: making NLP / machine learning accessible to domain experts (“no code / low code platform”). Selected customers: Boehringer-Ingelheim, SCAI-Fraunhofer, AFP, French Government, Groupe Revue Fiduciaire,… Stefan Geißler, Co-founder, Kairntech Heidelberg
  • 3. Project Intro & Background • Spring 2021, Fraunhofer SCAI: “We are looking for an automatic procedure to extract information about entities and their relations from a larger corpus on psychiatric disorders and render that in the BEL language. Can you do that?” • BEL (Biological Expression Language) captures and encodes biological relations, e.g. “Phosphorylation of glycogen synthase kinase 3beta at Threonine 668 increases the degradation of Amyloid precursor protein.” Copied from Martin Hofmann-Apitius
  • 4. The challenge: updating knowledge graphs. Existing knowledge graphs become outdated quickly; updating them is time-consuming and challenging. Hypotheses: data volumes are large, but not infinite; knowledge is complex, but can be formalized. (Chart: number of articles on PubMed on “TAU phosphorylation”) Copied from Martin Hofmann-Apitius
  • 5. Assessing the “Pharmacome”: Previous efforts at SCAI consisted of the manual creation of KGs on genes, proteins and drugs in neurodegenerative diseases. Graphs often involve thousands of genes, proteins and their interactions; for Alzheimer's disease alone, SCAI built 24 specific subgraphs. Main goal: find information for drug repurposing. Which druggable, known substances have a therapeutic effect on the investigated diseases? Copied from Martin Hofmann-Apitius
  • 6. Relevant but hard to find: indirect relations. “A interacts with B which stimulates C which blocks D …” Chains of interactions over multiple entities are notoriously hard to find and are often unknown even to individuals who are aware of the specific isolated steps. Approach: harvest interactions automatically and introduce them into large, machine-readable, indication-wide knowledge graphs. Copied from Martin Hofmann-Apitius
  • 7. Towards Automatic Knowledge Graph Creation: Relation Extraction. Knowledge Graphs: • Network of entities and the relations between them • Make many relevant questions easier to investigate than traditional relational DBs do • E.g. in the biosciences: pathways (“A stimulates B which blocks C which increases D…”) • Growing adoption in industry • Many intuitive data management systems. Relation Extraction: • Challenging NLP task: identify relations in natural language text • Co-occurrence: a relation is assumed to hold when entities occur together in text • Benefits: fast, simple • Drawbacks: overgeneration, no relation type • Hand-crafted rules: apply a complex grammar • Benefits: can be very accurate, can detect types of relations • Drawbacks: rule building is a complex process; how to model relations beyond the sentence level? • Machine learning: training on annotated text • Benefits: highest quality • Drawbacks: data- and computation-intensive
  • 8. Kairntech Studio (build a pipeline): • Import documents (TXT, PDF, Word, HTML, XML, JSON…) • Create a training dataset: explore and label text with manual or assisted text annotation tools • Create and compare learning models using built-in, state-of-the-art algorithms • Create NLP pipelines combining models, client taxonomies, knowledge graphs, technical components… • Usage via a no-code, easy-to-use web application or via REST API
  • 9. Kairntech Server (run the pipeline in production), accessed via REST API. NLP pipeline: • Link entities to Wikidata taxonomies • Refine entity extraction: custom housekeeping (filtering) to improve recognition quality • Compute additional entity properties: proteins receive a label for protein modification (e.g. phosphorylation) • Build relations between entities: apply a custom-trained relation extraction model • Format results for output: return results in BEL format for integration into a knowledge graph environment. Knowledge applications: • Large-scale entity recognition • Multi-topic • Multi-lingual • Constantly updated • Linked to background information
  • 10. Entity Linking & Namespaces: Entities are not just strings; they need to be associated with the real-world objects they refer to. A subset of namespaces is used: ⚫ HGNC (HUGO Gene Nomenclature Committee) for proteins ⚫ MeSH (Medical Subject Headings) for pathologies ⚫ ChEBI (Chemical Entities of Biological Interest) for drug/chemical abundances ⚫ GO (Gene Ontology) for biological processes. Recognized entities are ⚫ Disambiguated (“cancer”: animal or disease?) ⚫ Scored (what is the prominence/weight of the concept in this context?) ⚫ Normalized (synonyms → preferred form, e.g. “NIDDM” → “Diabetes Mellitus Type 2”) ⚫ Linked to world knowledge
  • 11. NLP Pipeline for Entity Recognition & Linking: Example!
  • 12. Relationship extraction • Trained on a relation training dataset • The table (right) lists the most important relations addressed in the project • For any pair of entities, the model determines whether a relation holds between them (otherwise “NoRelation”) and, if so, which one • As expected, quality decreases for relationship types with fewer training examples • Manual inspection of a sample of the Kairntech results by SCAI experts: ~73% of the relations returned by Kairntech are valid. Table (RELATION types): NoRelation, increases, decreases, regulates, positiveCorrelation, association, negativeCorrelation
  • 13. Complete NLP Pipeline with Relationship Extraction: Example!
  • 14. Results displayed as a Knowledge Graph: Example! “Which proteins are known to be positively correlated with both Schizophrenia and Bipolar Disorder?”
  • 15. “We now have relation extraction accuracies of between 70% and 80%. Not long ago you had to be happy to get that for just entity extraction. Having this for relations now is outstanding.” Martin Hofmann-Apitius, SCAI. (Diagram: Kairntech Natural Language Processing Pipeline for automated relationship extraction: link entities to taxonomies → refine entity extraction → create relations between entities → BEL output)
  • 16. Ongoing work • Speed up the analysis by parallelizing the process: make use of SCAI’s high-performance computing cluster and leverage SCAI expertise in parallelizing complex software processes • Investigate extension to other therapeutic areas • Define a joint approach & offering for industry use cases on: computing topic-specific knowledge graphs; updating/extending knowledge graphs • Expand the approach to cover large chunks (all?) of Medline? • Joint (Kairntech & SCAI) communication efforts: webinars, publications • Cf. www.biorxiv.org/content/10.1101/2022.03.07.483233v1
  • 17. Conclusion / findings • Kairntech off-the-shelf entity extraction performs well even in this highly specific subdomain • Kairntech’s notion of processing pipelines allows defining sophisticated processing chains (here: entity recognition, application of a specific custom model (ModType), relation extraction, output encoding into BEL syntax) • Relation results were assessed and declared useful by SCAI experts after detailed manual analysis • Relation extraction from a large literature corpus to feed knowledge graphs is feasible
  • 18. Thank you for your attention! info@kairntech.com
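Slide 3's example sentence can be written as a single BEL statement. The sketch below builds that statement as a string. It assumes BEL 2.0 short-form functions (`p`, `pmod`, `deg`), the HGNC symbols `GSK3B` and `APP` for glycogen synthase kinase 3beta and amyloid precursor protein, and hypothetical helper names (`protein`, `pmod`, `degradation`, `statement`); it illustrates the target format only and is not Kairntech's BEL encoder.

```python
# Hypothetical helpers that assemble BEL 2.0 terms as plain strings.


def protein(namespace: str, name: str, *modifiers: str) -> str:
    """Return a BEL protein abundance term, e.g. p(HGNC:APP)."""
    args = [f"{namespace}:{name}", *modifiers]
    return f"p({', '.join(args)})"


def pmod(mod_type: str, residue: str, position: int) -> str:
    """Return a BEL protein-modification argument, e.g. pmod(Ph, Thr, 668)."""
    return f"pmod({mod_type}, {residue}, {position})"


def degradation(term: str) -> str:
    """Wrap a term in the BEL degradation function (short form deg)."""
    return f"deg({term})"


def statement(subject: str, relation: str, obj: str) -> str:
    """Join subject, relation and object into one BEL statement."""
    return f"{subject} {relation} {obj}"


subject = protein("HGNC", "GSK3B", pmod("Ph", "Thr", 668))
obj = degradation(protein("HGNC", "APP"))
print(statement(subject, "increases", obj))
# prints: p(HGNC:GSK3B, pmod(Ph, Thr, 668)) increases deg(p(HGNC:APP))
```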
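Slide 6 describes indirect relations as chains over several entities (“A interacts with B which stimulates C which blocks D …”). Assuming extracted relations are stored as (subject, relation, object) triples, such a chain can be found with a breadth-first search. The sketch below is a minimal version of that idea; `find_chain` is a hypothetical name and the entities A to E are placeholders following the slide's pattern, not real biology.

```python
from __future__ import annotations

from collections import deque

# A relation triple: (subject, relation, object).
Triple = tuple[str, str, str]


def find_chain(triples: list[Triple], start: str, goal: str) -> list[Triple] | None:
    """Return the shortest chain of triples leading from start to goal, or None."""
    # Index outgoing edges by subject so each expansion is a dictionary lookup.
    outgoing: dict[str, list[Triple]] = {}
    for triple in triples:
        outgoing.setdefault(triple[0], []).append(triple)

    came_from: dict[str, Triple] = {}  # entity -> edge through which it was first reached
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the recorded edges back to the start, then reverse them.
            chain: list[Triple] = []
            while node != start:
                edge = came_from[node]
                chain.append(edge)
                node = edge[0]
            return chain[::-1]
        for edge in outgoing.get(node, []):
            target = edge[2]
            if target not in visited:
                visited.add(target)
                came_from[target] = edge
                queue.append(target)
    return None


# Toy graph with placeholder entities.
TRIPLES = [
    ("A", "interacts with", "B"),
    ("B", "stimulates", "C"),
    ("C", "blocks", "D"),
    ("A", "interacts with", "E"),
]
for subj, rel, obj in find_chain(TRIPLES, "A", "D"):
    print(subj, rel, obj)
```

This prints the three steps A → B → C → D. Breadth-first search returns the chain with the fewest steps and follows edges only in their stated direction; a real knowledge graph would likely also need depth limits and filters on relation types.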
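The co-occurrence approach listed on slide 7 fits in a few lines, which also makes its drawbacks visible. This sketch assumes a fixed set of known entity names, a naive punctuation-based sentence splitter and whole-word matching; the function name and the example text are made up for illustration.

```python
from __future__ import annotations

import re
from itertools import combinations


def cooccurrence_pairs(text: str, entities: set[str]) -> set[frozenset[str]]:
    """Return every unordered pair of known entities found in the same sentence."""
    pairs: set[frozenset[str]] = set()
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        # Whole-word, case-sensitive match of each known entity name.
        found = sorted(e for e in entities if re.search(rf"\b{re.escape(e)}\b", sentence))
        for a, b in combinations(found, 2):
            pairs.add(frozenset((a, b)))
    return pairs


TEXT = "GSK3B phosphorylates TAU in neurons. APP levels were unrelated to TAU."
print(cooccurrence_pairs(TEXT, {"GSK3B", "TAU", "APP"}))
```

For this text the function returns the pairs {GSK3B, TAU} and {APP, TAU}. The second pair is a false positive (the sentence says the two are unrelated), and neither pair says what kind of relation holds: exactly the overgeneration and missing relation type noted on the slide.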
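Slide 9 mentions that proteins receive a label for protein modification such as phosphorylation. One simple, rule-based way to attach such a label is to look for trigger phrases around the protein mention. The patterns and the function `modification_label` below are hypothetical and cover phosphorylation only; the slide does not say how Kairntech computes this property.

```python
from __future__ import annotations

import re

# Hypothetical trigger patterns for phosphorylation; "{p}" stands for the protein name.
PHOSPHO_PATTERNS = (
    r"\bphosphorylation of {p}\b",
    r"\bphosphorylated {p}\b",
    r"\b{p} phosphorylation\b",
)


def modification_label(sentence: str, protein: str) -> str | None:
    """Return "phosphorylation" if a trigger phrase mentions the protein, else None."""
    name = re.escape(protein)  # treat the protein name literally inside the regex
    for pattern in PHOSPHO_PATTERNS:
        if re.search(pattern.format(p=name), sentence, flags=re.IGNORECASE):
            return "phosphorylation"
    return None


SENTENCE = ("Phosphorylation of glycogen synthase kinase 3beta at Threonine 668 "
            "increases the degradation of Amyloid precursor protein.")
print(modification_label(SENTENCE, "glycogen synthase kinase 3beta"))
print(modification_label(SENTENCE, "Amyloid precursor protein"))
```

Only the first protein gets the label. A fixed phrase list will miss many formulations and cannot extract the modified residue ("Threonine 668"), which is one reason a trained model is the more likely choice in practice.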
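The normalization and linking steps on slide 10 map each synonym to a preferred form within one namespace. The sketch below shows that mapping with a small hand-made lexicon; the lexicon, the `Concept` type and `normalize` are assumptions for illustration. Real entries would come from the HGNC, MeSH, ChEBI and GO resources, no identifiers are shown, the MeSH label is written as on the slide, and the disambiguation and scoring steps are not covered.

```python
from __future__ import annotations

from typing import NamedTuple


class Concept(NamedTuple):
    namespace: str  # e.g. "HGNC", "MeSH", "ChEBI", "GO"
    preferred: str  # preferred form used in the knowledge graph


# Hypothetical toy lexicon: lower-cased surface form -> linked concept.
LEXICON: dict[str, Concept] = {
    "niddm": Concept("MeSH", "Diabetes Mellitus Type 2"),
    "type 2 diabetes": Concept("MeSH", "Diabetes Mellitus Type 2"),
    "glycogen synthase kinase 3beta": Concept("HGNC", "GSK3B"),
    "gsk3b": Concept("HGNC", "GSK3B"),
    "amyloid precursor protein": Concept("HGNC", "APP"),
}


def normalize(mention: str) -> Concept | None:
    """Look up a mention after lower-casing it and collapsing whitespace."""
    key = " ".join(mention.lower().split())
    return LEXICON.get(key)


print(normalize("NIDDM"))
# prints: Concept(namespace='MeSH', preferred='Diabetes Mellitus Type 2')
```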
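Slide 12 frames relation extraction as a decision over entity pairs: for each pair the model outputs either “NoRelation” or one of the relation types in the table. The sketch below shows only that control flow. The trained model is replaced by `toy_classifier`, a hypothetical single keyword rule, and ordered pairs are used on the assumption that directional relations such as `increases` must distinguish subject from object.

```python
from __future__ import annotations

from itertools import permutations
from typing import Callable

# Labels listed on slide 12; "NoRelation" means no relation holds for the pair.
RELATION_TYPES = frozenset({
    "NoRelation", "increases", "decreases", "regulates",
    "positiveCorrelation", "association", "negativeCorrelation",
})

# A classifier maps (sentence, subject, object) to one label from RELATION_TYPES.
Classifier = Callable[[str, str, str], str]


def extract_relations(
    sentence: str, entities: list[str], classify: Classifier
) -> list[tuple[str, str, str]]:
    """Classify every ordered entity pair and keep the pairs with a typed relation."""
    relations = []
    # Ordered pairs, because relations such as "increases" are directional.
    for subj, obj in permutations(entities, 2):
        label = classify(sentence, subj, obj)
        if label not in RELATION_TYPES:
            raise ValueError(f"unknown relation label: {label!r}")
        if label != "NoRelation":
            relations.append((subj, label, obj))
    return relations


def toy_classifier(sentence: str, subj: str, obj: str) -> str:
    """Hypothetical stand-in for a trained model: one keyword rule, illustration only."""
    _, found, after = sentence.partition(f"{subj} increases")
    return "increases" if found and obj in after else "NoRelation"


print(extract_relations(
    "GSK3B increases the degradation of APP.", ["GSK3B", "APP"], toy_classifier
))
# prints: [('GSK3B', 'increases', 'APP')]
```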
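The example question on slide 14 becomes a set intersection once relations are stored as triples. The sketch below assumes correlation relations are symmetric (the disease may be subject or object), uses invented placeholder protein names rather than real results, and skips the entity-type check a real query would apply (for instance, keeping only HGNC proteins). The function names are hypothetical.

```python
from __future__ import annotations

Triple = tuple[str, str, str]


def correlates(triples: list[Triple], disease: str, relation: str = "positiveCorrelation") -> set[str]:
    """Entities linked to `disease` by `relation`, treating the relation as symmetric."""
    found: set[str] = set()
    for subj, rel, obj in triples:
        if rel != relation:
            continue
        if obj == disease:
            found.add(subj)
        elif subj == disease:
            found.add(obj)
    return found


def shared_correlates(triples: list[Triple], diseases: list[str],
                      relation: str = "positiveCorrelation") -> set[str]:
    """Entities linked by `relation` to every disease in `diseases`."""
    per_disease = [correlates(triples, d, relation) for d in diseases]
    return set.intersection(*per_disease) if per_disease else set()


# Placeholder graph: protein names are invented, not extraction results.
GRAPH = [
    ("PROTEIN_A", "positiveCorrelation", "Schizophrenia"),
    ("Bipolar Disorder", "positiveCorrelation", "PROTEIN_A"),
    ("PROTEIN_B", "positiveCorrelation", "Schizophrenia"),
    ("PROTEIN_B", "negativeCorrelation", "Bipolar Disorder"),
]
print(shared_correlates(GRAPH, ["Schizophrenia", "Bipolar Disorder"]))
# prints: {'PROTEIN_A'}
```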