#EMEARC17
Linking entities via semantic indexing
Shenghui Wang, Rob Koopman
Linking Open Data cloud diagram 2017, by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
Linked data
“The Semantic Web isn't just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data.”
Tim Berners-Lee
http://5stardata.info/en/
Information locked in free-text
245 [3$c] Translated from the French by Guy Endore, illustrated with lithographs by Yngve Derg
245 [3$c] ( Ying) jian. Ao si ding zhu ; ( ying ) xiu. Tang mu sen tu ; zhou dan yi.
700 [1$a] Ao, Siding.
700 [1$a] Tang, Musen.
700 [1$a] Zhou, Dan.
More links could be recovered
• If we have enough (good) data
• If we have effective and scalable algorithms
• If we have patience
Linking entities via semantic indexing
An example by Stefan Evert: what’s the meaning of bardiwac?
• He handed her her glass of bardiwac.
• Beef dishes are made to complement the bardiwacs.
• Nigel staggered to his feet, face flushed from too much bardiwac.
• Malbec, one of the lesser-known bardiwac grapes, responds well to Australia’s sunshine.
• I dined on bread and cheese and this excellent bardiwac.
• The drinks were delicious: blood-red bardiwac as well as light, sweet Rhenish.
⇒ ‘bardiwac’ is a heavy red alcoholic beverage made from grapes
Linking entities via semantic indexing
• Statistical semantics [furnas1983, weaver1955] rests on the assumption that “a word is characterized by the company it keeps” [firth1957]
• Distributional hypothesis [harris1954, sahlgren2008]: words that occur in similar contexts tend to have similar meanings
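The distributional hypothesis can be illustrated with a minimal sketch. The three sentences below are invented for this example (they are not taken from the slides), and `context_counts` is a hypothetical helper that simply counts the words found within a small window around a target word. Words whose context counts overlap are candidates for having similar meanings.

```python
from collections import Counter

# Toy sentences invented for this sketch (not from the slides).
corpus = [
    "she drank a glass of bardiwac with dinner",
    "she drank a glass of wine with dinner",
    "he parked the car in the garage",
]

def context_counts(target, sentences, window=2):
    """Count the words that appear within `window` positions of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token == target:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                counts.update(left + right)
    return counts

bardiwac = context_counts("bardiwac", corpus)
wine = context_counts("wine", corpus)
car = context_counts("car", corpus)

# Shared context words hint at similar meaning; no overlap hints at the opposite.
print(sorted(set(bardiwac) & set(wine)))
print(sorted(set(bardiwac) & set(car)))
```

In this toy corpus "bardiwac" and "wine" keep exactly the same company, while "bardiwac" and "car" share no context words at all.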
Let’s embed entities in a vector space
• Discrete encoding does not help to process the underlying semantics automatically
• Entities (words) are represented in a continuous vector space where semantically similar words are mapped to nearby points (‘are embedded near each other’)
• A desirable property: computable similarity
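A small sketch of why a discrete encoding gives no usable similarity while a continuous one does. The one-hot vectors stand in for a discrete encoding; the two-dimensional dense vectors are made-up values chosen only for illustration, not trained embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Discrete (one-hot) encoding: every word is orthogonal to every other word.
one_hot = {"wine": [1, 0, 0], "bardiwac": [0, 1, 0], "car": [0, 0, 1]}

# Hypothetical dense vectors, hand-picked for illustration only.
dense = {"wine": [0.9, 0.1], "bardiwac": [0.8, 0.2], "car": [0.1, 0.9]}

print(cosine(one_hot["wine"], one_hot["bardiwac"]))  # no signal at all
print(cosine(dense["wine"], dense["bardiwac"]))      # close to 1
print(cosine(dense["wine"], dense["car"]))           # much lower
```

With one-hot vectors every pair of distinct words has similarity zero, so nothing can be ranked; with dense vectors the similarity becomes a number that can be compared across pairs.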
Word embedding techniques
Two main categories of approaches:
• Global co-occurrence count-based methods, such as Latent Semantic Analysis
• Local context predictive methods, such as neural probabilistic language models
Word2Vec: Continuous Bag of Words model
• Scan text in large corpus with a window
• The model predicts the current word given the context
the cat chills on a mat
w(t-2) w(t-1) w(t) w(t+1) w(t+2)
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. ICLR Workshop, 2013.
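The windowing step of the Continuous Bag of Words model can be sketched as follows. This only shows how (context, target) training pairs could be generated from the example sentence with a window of two words on each side; the neural network that learns to predict the target from the context is not shown. The function name `cbow_pairs` is hypothetical.

```python
def cbow_pairs(tokens, window=2):
    """Return (context words, target word) pairs for each position in `tokens`."""
    pairs = []
    for t, target in enumerate(tokens):
        left = tokens[max(0, t - window):t]        # w(t-2), w(t-1)
        right = tokens[t + 1:t + 1 + window]       # w(t+1), w(t+2)
        pairs.append((left + right, target))
    return pairs

tokens = "the cat chills on a mat".split()
for context, target in cbow_pairs(tokens):
    print(context, "->", target)
```

For the position of "chills", the context is "the", "cat", "on", "a", which matches the w(t-2) to w(t+2) layout on the slide. Words near the start or end of the text simply get a shorter context.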
When an entity becomes a vector
• Similarity or relatedness can be computed automatically
– Cosine similarity
• Such similarity/relatedness can be used to link an entity to its most related entities via schema:isRelatedTo
• Such links can be complementary to existing triples
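One way the linking step might look, as a sketch under stated assumptions: the identifiers `ex:a`, `ex:b`, `ex:c` and their vectors are invented, and the parameters `k` (how many neighbours to keep) and `threshold` (minimum cosine similarity) are hypothetical knobs that the slides do not specify.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def related_links(vectors, k=1, threshold=0.5):
    """Link each entity to its k most similar entities above `threshold`."""
    triples = []
    for subject, u in vectors.items():
        scored = [(cosine(u, v), obj) for obj, v in vectors.items() if obj != subject]
        scored.sort(reverse=True)  # highest similarity first
        for score, obj in scored[:k]:
            if score >= threshold:
                triples.append((subject, "schema:isRelatedTo", obj))
    return triples

# Hypothetical entities and vectors, for illustration only.
vectors = {"ex:a": [1.0, 0.0], "ex:b": [0.9, 0.1], "ex:c": [0.0, 1.0]}
for triple in related_links(vectors):
    print(triple)
```

The threshold keeps weakly related pairs from being linked just because they happen to be each other's nearest neighbour; here `ex:c` gets no link at all.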
Figure: two entities connected through several records (ocn)
viaf:141187599 (Ireland. Office of the Director of Corporate Enforcement)
fast:1091083 (Real estate management--Law and legislation)
Cosine similarity (viaf:141187599, fast:1091083) = 0.2
⇒ schema:isRelatedTo
Complementary links
• viaf:41915577 (Ferris, Ina) is most closely related to
– viaf:68928644 (Engels, Friedrich)
– fast:1033899 (Nationalism in literature)
– viaf:57397450 (Smith, Goldwin)
– viaf:27899280 (Duffy, Charles Gavan)
– viaf:49228757 (Marx, Karl)
– viaf:47158897 (McCarthy, Michael John Fitzgerald)
– viaf:56715842 (Kinealy, Christine)
– viaf:107533016 (Sydney, Lady Morgan Irish novelist)
Word embedding techniques
• Ariadne (OCLC): based on Random Projection of the global co-occurrence matrix
• Word2Vec (Google): shallow, two-layer neural networks that are trained to reconstruct linguistic contexts of words
• GloVe (Stanford): a global log-bilinear regression model to learn word vectors based on the ratio of the co-occurrence probabilities of two words
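As a rough illustration of the random-projection idea (not Ariadne's actual implementation, whose details the slides do not give), the sketch below assigns every context term a fixed random vector and embeds an entity as the count-weighted sum of the random vectors of the terms it co-occurs with. The co-occurrence counts, the dimension and the seed are all made up.

```python
import random

def random_projection(cooccurrence, dim=8, seed=0):
    """Project sparse co-occurrence rows into `dim` dimensions."""
    rng = random.Random(seed)
    contexts = sorted({c for row in cooccurrence.values() for c in row})
    # One fixed Gaussian random vector per context term.
    index = {c: [rng.gauss(0.0, 1.0) for _ in range(dim)] for c in contexts}
    embedded = {}
    for entity, row in cooccurrence.items():
        vec = [0.0] * dim
        for context, count in row.items():
            r = index[context]
            for i in range(dim):
                vec[i] += count * r[i]
        embedded[entity] = vec
    return embedded

# Hypothetical co-occurrence counts: entity -> {context term: count}.
cooccurrence = {
    "x": {"a": 2, "b": 1},
    "y": {"a": 2, "b": 1},
    "z": {"c": 3},
}
embedded = random_projection(cooccurrence, dim=4)
print(embedded["x"] == embedded["y"])
```

Entities with the same co-occurrence profile end up with the same vector, and the full co-occurrence matrix never has to be factorized, which is what makes this family of methods attractive at scale.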
Different model, different embedding
knee
  Word2Vec: ankle, hip, elbow, knees, shoulder, patellofemoral, joint, wrist, tka, patellar
  GloVe: ankle, hip, joint, knees, arthroplasty, osteoarthritis, elbow, flexion, cruciate, joints
  Ariadne: knees, knee joint, contralateral knee, tibiofemoral, knee pain, knee motion, medial compartment, lateral compartment, operated knees, right knee
frog
  Word2Vec: toad, bullfrog, amphibian, rana, turtle, salamander, caudiverbera, frogs, leptodactylid, pleurodema
  GloVe: rana, toad, amphibian, bullfrog, frogs, temporaria, laevis, xenopus, anuran, catesbeiana
  Ariadne: frogs, isolated frog, frog muscle, rana pipiens, anurans, hyla, anuran, tree frog, anuran species, hylid
Different corpus, different embedding
What is young?
  WorldCat: people, children, adolescents, nobleman, christians, pianists, siblings, vietnamese, clergyman, housekeeper
  Medline: adults, children, people, women, men, adulthood, infants, athletes, girls, leaves, patients, mania, boys, chicks, calves
  Art library: people, children, persons, adults, lady, women, gentlemen, artists, readers, folks, americans, memorial, girls, architects
  Astrophysics: stars, supernova, stellar, clusters, massive star clusters, brown dwarf
What about the temporal dimension?
• 20 million Medline articles published since 1977
• 1.5 million entities (subjects, authors, journals, words)
• 8 five-year periods
• Each subject is embedded in 8 chronological vector spaces
• Is there concept drift and can we detect it?
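The slides do not say how stability was measured. One possible approach, sketched here purely as an assumption, is to compare a subject's nearest neighbours in consecutive periods instead of comparing raw coordinates, since separately built vector spaces need not share axes. The neighbour lists below are invented placeholders, and `stability` is a hypothetical function.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def stability(neighbours_by_period):
    """Mean neighbour overlap between consecutive periods (1.0 = no drift)."""
    periods = sorted(neighbours_by_period)
    if len(periods) < 2:
        return 1.0
    scores = [
        jaccard(neighbours_by_period[p], neighbours_by_period[q])
        for p, q in zip(periods, periods[1:])
    ]
    return sum(scores) / len(scores)

# Hypothetical top-3 neighbours of one subject in three periods.
neighbours = {
    "1977-1982": ["a", "b", "c"],
    "1982-1987": ["a", "b", "d"],
    "1987-1992": ["x", "y", "z"],
}
print(stability(neighbours))
```

A subject whose neighbours barely change from period to period scores close to 1; a subject whose neighbourhood is replaced wholesale scores close to 0 and is a candidate for concept drift.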
Most and least stable MeSH subjects
Most stable subjects: history 15th century; history 18th century; history 17th century; history 16th century; history 19th century; thymoma; history ancient; history medieval; rabies; history
Least stable subjects: diagnostic techniques, surgical; chromium isotopes; shock, surgical; iodine isotopes; diagnostic techniques and procedures; blood circulation time; trauma nervous system; cesium isotopes; liver extracts; macroglobulins
Subjects most related to “trauma nervous system”
1977-1982: anatomy regional, fracture fixation internal, bulgaria, piedra, surgery plastic, germany west, wound infection, carbuncle, burns
1982-1987: legionellosis, povidone, tropocollagen, attention deficit disorder with hyperactivity, legionnaires disease, transfer psychology
1987-1992: leg injuries, neurosurgical procedures, arm injuries, wound infection, orthopedic equipment, dermatomycoses, multiple trauma, candidiasis cutaneous, fractures closed
1992-1997: piperacillin, tazobactam, microbiology, diagnostic errors, sorption detoxification, arthroplasty, hsp40 heat shock proteins, emaciation, professional patient relations
1997-2002: defensive medicine, insurance liability, diagnostic errors, expert testimony, birth injuries, maleic anhydrides, dimethyl sulfate, medical errors, p protein hepatitis b virus
2002-2007: peripheral nervous system diseases, peripheral nerve injuries, neurologic examination, male, recovery of function, peripheral nerves, elbow, comorbidity, mother child relations
2007-2012: peripheral nerve injuries, sciatic neuropathy, papilledema, sciatic nerve, peripheral nerves, nerve crush, neuroma, nerve regeneration, acute disease
2012-2017: mitochondrial dynamics, dental records, park7 protein human, persistent vegetative state, dnm1l protein human, platelet derived growth factor bb, dual specificity phosphatases, lingual nerve injuries, dental care
Summary
• Semantic indexing helps to discover links between entities
• Links might have to be time-stamped
• Free text in metadata is a promising but challenging source
• There are no perfect algorithms yet, but there is a lot of ongoing research

More Related Content

Similar to Linking Entities via Semantic Indexing and Word Embeddings

download
downloaddownload
downloadbutest
 
download
downloaddownload
downloadbutest
 
Semantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenzaSemantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenzaGiorgia Lodi
 
Efficient implementations of machine vision algorithms using a dynamically ty...
Efficient implementations of machine vision algorithms using a dynamically ty...Efficient implementations of machine vision algorithms using a dynamically ty...
Efficient implementations of machine vision algorithms using a dynamically ty...Jan Wedekind
 
Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Bhaskar Mitra
 
Recent Advances in Natural Language Processing
Recent Advances in Natural Language ProcessingRecent Advances in Natural Language Processing
Recent Advances in Natural Language ProcessingSeth Grimes
 
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2Karthik Murugesan
 
A pragmatic view on Semantic Technologies
A pragmatic view on Semantic TechnologiesA pragmatic view on Semantic Technologies
A pragmatic view on Semantic TechnologiesRoberto García
 
2010 mobilelearning workshopsctr5
2010 mobilelearning workshopsctr52010 mobilelearning workshopsctr5
2010 mobilelearning workshopsctr5Stefaan Ternier
 
Spark Summit Europe: Share and analyse genomic data at scale
Spark Summit Europe: Share and analyse genomic data at scaleSpark Summit Europe: Share and analyse genomic data at scale
Spark Summit Europe: Share and analyse genomic data at scaleAndy Petrella
 
The pulse of cloud computing with bioinformatics as an example
The pulse of cloud computing with bioinformatics as an exampleThe pulse of cloud computing with bioinformatics as an example
The pulse of cloud computing with bioinformatics as an exampleEnis Afgan
 
SFSCON23 - Michele Finelli - Management of large genomic data with free software
SFSCON23 - Michele Finelli - Management of large genomic data with free softwareSFSCON23 - Michele Finelli - Management of large genomic data with free software
SFSCON23 - Michele Finelli - Management of large genomic data with free softwareSouth Tyrol Free Software Conference
 
Linq 2013 plenary_keynote_sicilia
Linq 2013 plenary_keynote_siciliaLinq 2013 plenary_keynote_sicilia
Linq 2013 plenary_keynote_siciliaLINQ_Conference
 
Representation Learning of Vectors of Words and Phrases
Representation Learning of Vectors of Words and PhrasesRepresentation Learning of Vectors of Words and Phrases
Representation Learning of Vectors of Words and PhrasesFelipe Moraes
 
ESWC 2015 Closing and "General Chair's minute of Madness"
ESWC 2015 Closing and "General Chair's minute of Madness"ESWC 2015 Closing and "General Chair's minute of Madness"
ESWC 2015 Closing and "General Chair's minute of Madness"Fabien Gandon
 
Big Data Intelligence: from Correlation Discovery to Causal Reasoning
Big Data Intelligence: from Correlation Discovery to Causal Reasoning Big Data Intelligence: from Correlation Discovery to Causal Reasoning
Big Data Intelligence: from Correlation Discovery to Causal Reasoning Wanjin Yu
 
Question Answering over Linked Data (Reasoning Web Summer School)
Question Answering over Linked Data (Reasoning Web Summer School)Question Answering over Linked Data (Reasoning Web Summer School)
Question Answering over Linked Data (Reasoning Web Summer School)Andre Freitas
 
50 Shades of Text - Leveraging Natural Language Processing (NLP), Alessandro ...
50 Shades of Text - Leveraging Natural Language Processing (NLP), Alessandro ...50 Shades of Text - Leveraging Natural Language Processing (NLP), Alessandro ...
50 Shades of Text - Leveraging Natural Language Processing (NLP), Alessandro ...Data Science Milan
 

Similar to Linking Entities via Semantic Indexing and Word Embeddings (20)

download
downloaddownload
download
 
download
downloaddownload
download
 
Semantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenzaSemantic Interoperability - grafi della conoscenza
Semantic Interoperability - grafi della conoscenza
 
Efficient implementations of machine vision algorithms using a dynamically ty...
Efficient implementations of machine vision algorithms using a dynamically ty...Efficient implementations of machine vision algorithms using a dynamically ty...
Efficient implementations of machine vision algorithms using a dynamically ty...
 
C010521418
C010521418C010521418
C010521418
 
Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)
 
Recent Advances in Natural Language Processing
Recent Advances in Natural Language ProcessingRecent Advances in Natural Language Processing
Recent Advances in Natural Language Processing
 
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
 
A pragmatic view on Semantic Technologies
A pragmatic view on Semantic TechnologiesA pragmatic view on Semantic Technologies
A pragmatic view on Semantic Technologies
 
2010 mobilelearning workshopsctr5
2010 mobilelearning workshopsctr52010 mobilelearning workshopsctr5
2010 mobilelearning workshopsctr5
 
Spark Summit Europe: Share and analyse genomic data at scale
Spark Summit Europe: Share and analyse genomic data at scaleSpark Summit Europe: Share and analyse genomic data at scale
Spark Summit Europe: Share and analyse genomic data at scale
 
The pulse of cloud computing with bioinformatics as an example
The pulse of cloud computing with bioinformatics as an exampleThe pulse of cloud computing with bioinformatics as an example
The pulse of cloud computing with bioinformatics as an example
 
SFSCON23 - Michele Finelli - Management of large genomic data with free software
SFSCON23 - Michele Finelli - Management of large genomic data with free softwareSFSCON23 - Michele Finelli - Management of large genomic data with free software
SFSCON23 - Michele Finelli - Management of large genomic data with free software
 
Linq 2013 plenary_keynote_sicilia
Linq 2013 plenary_keynote_siciliaLinq 2013 plenary_keynote_sicilia
Linq 2013 plenary_keynote_sicilia
 
Representation Learning of Vectors of Words and Phrases
Representation Learning of Vectors of Words and PhrasesRepresentation Learning of Vectors of Words and Phrases
Representation Learning of Vectors of Words and Phrases
 
ESWC 2015 Closing and "General Chair's minute of Madness"
ESWC 2015 Closing and "General Chair's minute of Madness"ESWC 2015 Closing and "General Chair's minute of Madness"
ESWC 2015 Closing and "General Chair's minute of Madness"
 
Big Data Intelligence: from Correlation Discovery to Causal Reasoning
Big Data Intelligence: from Correlation Discovery to Causal Reasoning Big Data Intelligence: from Correlation Discovery to Causal Reasoning
Big Data Intelligence: from Correlation Discovery to Causal Reasoning
 
Question Answering over Linked Data (Reasoning Web Summer School)
Question Answering over Linked Data (Reasoning Web Summer School)Question Answering over Linked Data (Reasoning Web Summer School)
Question Answering over Linked Data (Reasoning Web Summer School)
 
ESWC 2014 Tutorial part 3
ESWC 2014 Tutorial part 3ESWC 2014 Tutorial part 3
ESWC 2014 Tutorial part 3
 
50 Shades of Text - Leveraging Natural Language Processing (NLP), Alessandro ...
50 Shades of Text - Leveraging Natural Language Processing (NLP), Alessandro ...50 Shades of Text - Leveraging Natural Language Processing (NLP), Alessandro ...
50 Shades of Text - Leveraging Natural Language Processing (NLP), Alessandro ...
 

More from Shenghui Wang

Non-parametric Subject Prediction
Non-parametric Subject PredictionNon-parametric Subject Prediction
Non-parametric Subject PredictionShenghui Wang
 
Semantic indexing for KOS
Semantic indexing for KOSSemantic indexing for KOS
Semantic indexing for KOSShenghui Wang
 
Contextualization of topics - browsing through terms, authors, journals and c...
Contextualization of topics - browsing through terms, authors, journals and c...Contextualization of topics - browsing through terms, authors, journals and c...
Contextualization of topics - browsing through terms, authors, journals and c...Shenghui Wang
 
Exploring a world of networked information built from free-text metadata
Exploring a world of networked information built from free-text metadataExploring a world of networked information built from free-text metadata
Exploring a world of networked information built from free-text metadataShenghui Wang
 
Ariadne's Thread -- Exploring a world of networked information built from fre...
Ariadne's Thread -- Exploring a world of networked information built from fre...Ariadne's Thread -- Exploring a world of networked information built from fre...
Ariadne's Thread -- Exploring a world of networked information built from fre...Shenghui Wang
 
Learning Concept Mappings from Instance Similarity
Learning Concept Mappings from Instance SimilarityLearning Concept Mappings from Instance Similarity
Learning Concept Mappings from Instance SimilarityShenghui Wang
 
Measuring the dynamic bi-directional influence between content and social ne...
Measuring the dynamic bi-directional influence between content and social ne...Measuring the dynamic bi-directional influence between content and social ne...
Measuring the dynamic bi-directional influence between content and social ne...Shenghui Wang
 
Similarity Features, and their Role in Concept Alignment Learning
Similarity Features, and their Role in Concept Alignment Learning Similarity Features, and their Role in Concept Alignment Learning
Similarity Features, and their Role in Concept Alignment Learning Shenghui Wang
 
What is concept dirft and how to measure it?
What is concept dirft and how to measure it?What is concept dirft and how to measure it?
What is concept dirft and how to measure it?Shenghui Wang
 
Study concept drift in political ontologies
Study concept drift in political ontologiesStudy concept drift in political ontologies
Study concept drift in political ontologiesShenghui Wang
 

More from Shenghui Wang (12)

Non-parametric Subject Prediction
Non-parametric Subject PredictionNon-parametric Subject Prediction
Non-parametric Subject Prediction
 
Semantic indexing for KOS
Semantic indexing for KOSSemantic indexing for KOS
Semantic indexing for KOS
 
Contextualization of topics - browsing through terms, authors, journals and c...
Contextualization of topics - browsing through terms, authors, journals and c...Contextualization of topics - browsing through terms, authors, journals and c...
Contextualization of topics - browsing through terms, authors, journals and c...
 
Exploring a world of networked information built from free-text metadata
Exploring a world of networked information built from free-text metadataExploring a world of networked information built from free-text metadata
Exploring a world of networked information built from free-text metadata
 
Ariadne's Thread -- Exploring a world of networked information built from fre...
Ariadne's Thread -- Exploring a world of networked information built from fre...Ariadne's Thread -- Exploring a world of networked information built from fre...
Ariadne's Thread -- Exploring a world of networked information built from fre...
 
Learning Concept Mappings from Instance Similarity
Learning Concept Mappings from Instance SimilarityLearning Concept Mappings from Instance Similarity
Learning Concept Mappings from Instance Similarity
 
Measuring the dynamic bi-directional influence between content and social ne...
Measuring the dynamic bi-directional influence between content and social ne...Measuring the dynamic bi-directional influence between content and social ne...
Measuring the dynamic bi-directional influence between content and social ne...
 
Similarity Features, and their Role in Concept Alignment Learning
Similarity Features, and their Role in Concept Alignment Learning Similarity Features, and their Role in Concept Alignment Learning
Similarity Features, and their Role in Concept Alignment Learning
 
What is concept dirft and how to measure it?
What is concept dirft and how to measure it?What is concept dirft and how to measure it?
What is concept dirft and how to measure it?
 
ICA Slides
ICA SlidesICA Slides
ICA Slides
 
ECCS 2010
ECCS 2010ECCS 2010
ECCS 2010
 
Study concept drift in political ontologies
Study concept drift in political ontologiesStudy concept drift in political ontologies
Study concept drift in political ontologies
 

Recently uploaded

B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Spark3's new memory model/management
Spark3's new memory model/managementSpark3's new memory model/management
Spark3's new memory model/managementakshesh doshi
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts ServiceCall Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Servicejennyeacort
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 

Recently uploaded (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Spark3's new memory model/management
Spark3's new memory model/managementSpark3's new memory model/management
Spark3's new memory model/management
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts ServiceCall Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 

Linking Entities via Semantic Indexing and Word Embeddings

  • 2. #EMEARC17 Linking entities via semantic indexing Shenghui Wang, Rob Koopman
  • 3. Linking Open Data cloud diagram 2017, by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
  • 4. Linked data: "The Semantic Web isn't just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data." Tim Berners-Lee. http://5stardata.info/en/
  • 9. Information locked in free-text: 245 [3$c] Translated from the French by Guy Endore, illustrated with lithographs by Yngve Derg
  • 11. 245 [3$c] ( Ying) jian. Ao si ding zhu ; ( ying ) xiu. Tang mu sen tu ; zhou dan yi. 700 [1$a] Ao, Siding. 700 [1$a] Tang, Musen. 700 [1$a] Zhou, Dan.
  • 13. More links could be recovered • If we have enough (good) data • If we have effective and scalable algorithms • If we have patience
  • 14. Linking entities via semantic indexing
  • 15. An example by Stefan Evert: what’s the meaning of bardiwac? • He handed her her glass of bardiwac. • Beef dishes are made to complement the bardiwacs. • Nigel staggered to his feet, face flushed from too much bardiwac. • Malbec, one of the lesser-known bardiwac grapes, responds well to Australia’s sunshine. • I dined on bread and cheese and this excellent bardiwac. • The drinks were delicious: blood-red bardiwac as well as light, sweet Rhenish. ⇒ ‘bardiwac’ is a heavy red alcoholic beverage made from grapes
  • 16. Linking entities via semantic indexing • Statistical Semantics [furnas1983, weaver1955], based on the assumption that “a word is characterized by the company it keeps” [firth1957] • Distributional Hypothesis [harris1954, sahlgren2008]: words that occur in similar contexts tend to have similar meanings
  • 17. Let’s embed entities in a vector space • Discrete encodings do not help in processing the underlying semantics automatically • Entities (words) are represented in a continuous vector space where semantically similar words are mapped to nearby points (‘are embedded near each other’) • A desirable property: computable similarity
  • 18. Word embedding techniques. Two main categories of approaches: • Global co-occurrence count-based methods, such as Latent Semantic Analysis • Local context predictive methods, such as neural probabilistic language models
  • 19. Word2Vec: Continuous Bag of Words model • Scan the text of a large corpus with a window • The model predicts the current word given the context [Figure: the example text "the cat chills on a mat" with window positions w(t-2), w(t-1), w(t), w(t+1), w(t+2)] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. ICLR Workshop, 2013.
  • 20. When an entity becomes a vector • Similarity or relatedness can be computed automatically – Cosine similarity • Such similarity/relatedness can be used to link an entity to its most related entities via schema:isRelatedTo • Such links can be complementary to existing triples
  • 21. [Figure: entities viaf:141187599 and fast:1091083, with the labels "Real estate management --Law and legislation" and "Ireland. Office of the Director of Corporate Enforcement", several ocn record nodes, …, and a schema:isRelatedTo link] Cosine similarity (viaf:141187599, fast:1091083) = 0.2
  • 22. Complementary links • viaf:41915577 (Ferris, Ina) is most related to – viaf:68928644 (Engels, Friedrich) – fast:1033899 (Nationalism in literature) – viaf:57397450 (Smith, Goldwin) – viaf:27899280 (Duffy, Charles Gavan) – viaf:49228757 (Marx, Karl) – viaf:47158897 (McCarthy, Michael John Fitzgerald) – viaf:56715842 (Kinealy, Christine) – viaf:107533016 (Sydney, Lady Morgan Irish novelist)
  • 24. Word embedding techniques • Ariadne (OCLC): based on Random Projection of the global co-occurrence matrix • Word2Vec (Google): shallow, two-layer neural networks that are trained to reconstruct linguistic contexts of words • GloVe (Stanford): a global log-bilinear regression model to learn word vectors based on the ratio of the co-occurrence probabilities of two words
  • 25. Different model, different embedding
      • knee
          – Word2Vec: ankle, hip, elbow, knees, shoulder, patellofemoral, joint, wrist, tka, patellar
          – GloVe: ankle, hip, joint, knees, arthroplasty, osteoarthritis, elbow, flexion, cruciate, joints
          – Ariadne: knees, knee joint, contralateral knee, tibiofemoral, knee pain, knee motion, medial compartment, lateral compartment, operated knees, right knee
      • frog
          – Word2Vec: toad, bullfrog, amphibian, rana, turtle, salamander, caudiverbera, frogs, leptodactylid, pleurodema
          – GloVe: rana, toad, amphibian, bullfrog, frogs, temporaria, laevis, xenopus, anuran, catesbeiana
          – Ariadne: frogs, isolated frog, frog muscle, rana pipiens, anurans, hyla, anuran, tree frog, anuran species, hylid
  • 26. Different corpus, different embedding: what is young?
      – WorldCat: people, children, adolescents, nobleman, christians, pianists, siblings, vietnamese, clergyman, housekeeper
      – Medline: adults, children, people, women, men, adulthood, infants, athletes, girls, leaves, patients, mania, boys, chicks, calves
      – Art library: people, children, persons, adults, lady, women, gentlemen, artists, readers, folks, americans, memorial, girls, architects
      – Astrophysics: stars, supernova, stellar, clusters, massive star clusters, brown dwarf
  • 27. What about the temporal dimension? • 20 million Medline articles published since 1977 • 1.5 million entities (subjects, authors, journals, words) • 8 five-year periods • Each subject is embedded in 8 chronological vector spaces • Is there concept drift and can we detect it?
  • 28. Most and least stable MeSH subjects
      – Most stable subjects: history 15th century; history 18th century; history 17th century; history 16th century; history 19th century; thymoma; history ancient; history medieval; rabies; history
      – Least stable subjects: diagnostic techniques, surgical; chromium isotopes; shock, surgical; iodine isotopes; diagnostic techniques and procedures; blood circulation time; trauma nervous system; cesium isotopes; liver extracts; macroglobulins
  • 29. Subjects most related to “trauma nervous system”
      – 1977-1982: anatomy regional, fracture fixation internal, bulgaria, piedra, surgery plastic, germany west, wound infection, carbuncle, burns
      – 1982-1987: legionellosis, povidone, tropocollagen, attention deficit disorder with hyperactivity, legionnaires disease, transfer psychology
      – 1987-1992: leg injuries, neurosurgical procedures, arm injuries, wound infection, orthopedic equipment, dermatomycoses, multiple trauma, candidiasis cutaneous, fractures closed
      – 1992-1997: piperacillin, tazobactam, microbiology, diagnostic errors, sorption detoxification, arthroplasty, hsp40 heat shock proteins, emaciation, professional patient relations
      – 1997-2002: defensive medicine, insurance liability, diagnostic errors, expert testimony, birth injuries, maleic anhydrides, dimethyl sulfate, medical errors, p protein hepatitis b virus
      – 2002-2007: peripheral nervous system diseases, peripheral nerve injuries, neurologic examination, male, recovery of function, peripheral nerves, elbow, comorbidity, mother child relations
      – 2007-2012: peripheral nerve injuries, sciatic neuropathy, papilledema, sciatic nerve, peripheral nerves, nerve crush, neuroma, nerve regeneration, acute disease
      – 2012-2017: mitochondrial dynamics, dental records, park7 protein human, persistent vegetative state, dnm1l protein human, platelet derived growth factor bb, dual specificity phosphatases, lingual nerve injuries, dental care
  • 30. Summary • Semantic indexing helps to discover links between entities • Links might have to be time-stamped • Free-text in metadata is a promising but challenging source • No perfect algorithms yet, but lots of ongoing research
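
The linking step described on slide 20 (compute cosine similarity, then connect an entity to its most related entities via schema:isRelatedTo) can be sketched as below. This is only an illustration under stated assumptions, not the Ariadne implementation: the three-dimensional vectors are invented toy values, the ex: identifiers are hypothetical, and the names related_links, top_k and threshold are chosen here for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def related_links(entity, vectors, top_k=3, threshold=0.0):
    """Return (entity, predicate, other, score) tuples, most similar first."""
    scored = [
        (other, cosine_similarity(vectors[entity], vec))
        for other, vec in vectors.items()
        if other != entity
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [
        (entity, "schema:isRelatedTo", other, score)
        for other, score in scored[:top_k]
        if score >= threshold
    ]


# Toy vectors: the identifiers and numbers are hypothetical, for illustration only.
toy_vectors = {
    "ex:entityA": [1.0, 0.0, 0.0],
    "ex:entityB": [0.8, 0.6, 0.0],
    "ex:entityC": [0.0, 1.0, 0.0],
}

links = related_links("ex:entityA", toy_vectors, top_k=2)
for subject, predicate, obj, score in links:
    print(subject, predicate, obj, round(score, 3))
```

Raising threshold would drop weak links such as the one to ex:entityC, whose vector is orthogonal to that of ex:entityA. Where to set such a cut-off is a design choice the slides do not specify.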

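Slide 27 asks whether concept drift can be detected, but the slides do not say how stability was measured. One simple possibility, shown here purely as an assumption and not as the authors' method, is to compare a subject's nearest-neighbour sets in consecutive periods with Jaccard overlap, which avoids comparing coordinates across different vector spaces. The neighbour lists are copied from slide 29 for “trauma nervous system”; the function names jaccard and stability are chosen here.

```python
def jaccard(a, b):
    """Jaccard overlap |A & B| / |A | B| of two collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


def stability(neighbours_by_period):
    """Mean Jaccard overlap between neighbour sets of consecutive periods."""
    periods = sorted(neighbours_by_period)  # period labels sort chronologically
    overlaps = [
        jaccard(neighbours_by_period[p], neighbours_by_period[q])
        for p, q in zip(periods, periods[1:])
    ]
    return sum(overlaps) / len(overlaps) if overlaps else 1.0


# Three of the eight periods listed on slide 29.
trauma_nervous_system = {
    "1997-2002": [
        "defensive medicine", "insurance liability", "diagnostic errors",
        "expert testimony", "birth injuries", "maleic anhydrides",
        "dimethyl sulfate", "medical errors", "p protein hepatitis b virus",
    ],
    "2002-2007": [
        "peripheral nervous system diseases", "peripheral nerve injuries",
        "neurologic examination", "male", "recovery of function",
        "peripheral nerves", "elbow", "comorbidity", "mother child relations",
    ],
    "2007-2012": [
        "peripheral nerve injuries", "sciatic neuropathy", "papilledema",
        "sciatic nerve", "peripheral nerves", "nerve crush", "neuroma",
        "nerve regeneration", "acute disease",
    ],
}

score = stability(trauma_nervous_system)
print(score)  # 0.0625: overlaps of 0/18 and 2/16, averaged
```

A low score like this is consistent with the subject appearing among the least stable ones on slide 28, though a real analysis would use all eight periods and a fixed neighbourhood size.
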
Editor's Notes

  1. For the past eight years, OCLC researchers have been working with library standards experts to define entity-relationship models for the description of resources managed by libraries and implement them on a large scale in OCLC’s publicly accessible databases.
  2. O