Entity-Relationship Extraction from Wikipedia Unstructured Text
Radityo Eko Prasojo (Rido)
PhD Student @ KRDB, Free University of Bozen-Bolzano
Supervised by: Mouna Kacimi & Werner Nutt
20.07.16, Bilbao, Spain
Automatically generated | Manually curated
Automated extraction without (yet) a KB as a result
Knowledge Vault [1]
Knowledge Graph
NELL [2]
20/07/16 RE Prasojo | KRDB @ UNIBZ | WebST'16, Bilbao
Infobox completion [3] [4]
Where was Obama born?
Who are the children of Obama?
Yes we can!
Honolulu, Hawaii
Malia and Sasha Obama
Which are Obama’s favourite sports teams?
Does Obama have pets?
Our goal is to enrich existing Knowledge Bases by extracting new facts, in the form of machine-readable entity relationships, from Wikipedia unstructured text.
Specific focus: RDF
Why is it difficult?
• The extraction problem
  • Entity extraction & disambiguation
  • Relation extraction
• The representation problem
  • Lack of predefined schema/ontology
  • Topic independence
  • Complex fact representation
Why is it difficult? Example
• “Obama is a supporter of the Chicago White Sox”
• Straightforward, singleton information
• Pure syntactic extraction possible
• Barack_Obama supporterOf Chicago_White_Sox
• “He is also primarily a Chicago Bears football fan in the NFL, but in his childhood and adolescence was a fan of the Pittsburgh Steelers”
• Complex, multiple information
• Semantic understanding necessary
• … how do we represent this?
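To make the contrast concrete, here is a minimal sketch of what "pure syntactic extraction" could look like for the first sentence, and why it stops working on the second. The surface pattern, the `ENTITY_IDS` lookup table, and the function name are illustrative assumptions, not the rules used in this work.

```python
import re

# Hypothetical surface pattern: "<Subject> is a supporter of the <Object>".
# Subject and object are runs of capitalised words.
PATTERN = re.compile(
    r"^(?P<subj>[A-Z]\w*(?: [A-Z]\w*)*) is a supporter of the "
    r"(?P<obj>[A-Z]\w*(?: [A-Z]\w*)*)\.?$"
)

# Hypothetical mention-to-entity table standing in for entity disambiguation.
ENTITY_IDS = {
    "Obama": "Barack_Obama",
    "Chicago White Sox": "Chicago_White_Sox",
}


def extract_supporter(sentence):
    """Return a (subject, relation, object) triple, or None if the pattern fails."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    subj = ENTITY_IDS.get(match.group("subj"))
    obj = ENTITY_IDS.get(match.group("obj"))
    if subj is None or obj is None:
        return None  # a mention could not be linked to a known entity
    return (subj, "supporterOf", obj)


simple = "Obama is a supporter of the Chicago White Sox"
complex_sentence = (
    "He is also primarily a Chicago Bears football fan in the NFL, but in his "
    "childhood and adolescence was a fan of the Pittsburgh Steelers"
)

print(extract_supporter(simple))
print(extract_supporter(complex_sentence))
```

The second call returns `None`: the pronoun, the adverb, the league qualifier, and the time expression all fall outside what a fixed surface pattern can capture.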
Example: representing a complex fact
• “He is also primarily a Chicago Bears football fan in the NFL, but in his childhood and adolescence was a fan of the Pittsburgh Steelers”
• Barack_Obama footballFan Chicago_Bears in NFL
• supporterOf vs footballFan
• Is it necessary to include NFL in the whole relation?
• What about the adverb primarily? What information does it imply?
• Barack_Obama fanOf Pittsburgh_Steelers
• fanOf vs supporterOf
• Missing the time information referred to in “in his childhood and adolescence was”
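One possible way to keep the league, the degree, and the time information is to attach qualifiers to a statement and flatten it into plain triples through an intermediate statement node (an n-ary relation pattern). The sketch below is only one modelling option under stated assumptions: the `Statement` class, the qualifier names `league`, `degree`, and `validDuring`, and reading "primarily" as a degree are all hypothetical, and the questions raised on the slide remain open.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """A binary fact plus optional qualifiers (all names are illustrative)."""
    subject: str
    predicate: str
    obj: str
    qualifiers: dict = field(default_factory=dict)


def to_triples(stmt, node_id):
    """Flatten a qualified statement into triples about an intermediate node."""
    triples = [
        (node_id, "subject", stmt.subject),
        (node_id, "predicate", stmt.predicate),
        (node_id, "object", stmt.obj),
    ]
    # Each qualifier becomes one extra triple on the statement node.
    for key in sorted(stmt.qualifiers):
        triples.append((node_id, key, stmt.qualifiers[key]))
    return triples


bears = Statement(
    "Barack_Obama", "fanOf", "Chicago_Bears",
    {"league": "NFL", "degree": "primary"},
)
steelers = Statement(
    "Barack_Obama", "fanOf", "Pittsburgh_Steelers",
    {"validDuring": "childhood_and_adolescence"},
)

for triple in to_triples(bears, "_:s1") + to_triples(steelers, "_:s2"):
    print(triple)
```

Using a single predicate (`fanOf`) with qualifiers sidesteps the supporterOf / footballFan / fanOf naming question here, but whether that is the right choice is exactly the representation problem described above.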
Approach
• Document preprocessing to annotate all entity occurrences.
• Grammatical dependency parsing to extract (candidate) relations.
• Separation between the extraction problem and the representation problem
• We first extract all candidate relations and then apply semantic refinement for better representation.
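The two-stage idea can be sketched as follows, assuming the dependency parse and the entity annotations are already available. The parse below is written by hand for the earlier example sentence; the `Token` structure, the single extraction rule, and the `RELATION_NAMES` table are hypothetical illustrations rather than the rules used in this work.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    text: str
    head: int                     # index of the governing token, -1 for the root
    dep: str                      # dependency label
    entity: Optional[str] = None  # KB identifier added during preprocessing


# Hand-written parse of "Obama is a supporter of the Chicago White Sox",
# with the multi-word entity mention already merged into one token.
SENTENCE = [
    Token("Obama", 3, "nsubj", "Barack_Obama"),
    Token("is", 3, "cop"),
    Token("a", 3, "det"),
    Token("supporter", -1, "root"),
    Token("of", 3, "prep"),
    Token("the", 6, "det"),
    Token("Chicago White Sox", 4, "pobj", "Chicago_White_Sox"),
]


def children(tokens, head, dep):
    """Indices of tokens attached to `head` with dependency label `dep`."""
    return [i for i, t in enumerate(tokens) if t.head == head and t.dep == dep]


def extract_candidates(tokens):
    """Stage 1: head with an entity nsubj and a prep -> pobj entity."""
    candidates = []
    for i, tok in enumerate(tokens):
        for s in children(tokens, i, "nsubj"):
            for p in children(tokens, i, "prep"):
                for o in children(tokens, p, "pobj"):
                    subj, obj = tokens[s].entity, tokens[o].entity
                    if subj and obj:
                        relation = f"{tok.text} {tokens[p].text}"
                        candidates.append((subj, relation, obj))
    return candidates


# Stage 2: a stand-in for semantic refinement, here only relation naming.
RELATION_NAMES = {"supporter of": "supporterOf"}


def refine(candidates):
    return [(s, RELATION_NAMES.get(r, r), o) for s, r, o in candidates]


print(refine(extract_candidates(SENTENCE)))
```

Keeping the raw relation phrase in stage 1 and mapping it to a predicate only in stage 2 mirrors the separation between extraction and representation.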
Preliminary results
• Ground truth manually curated from 25 Wikipedia articles of famous people.
• Preprocessing
• 4 handcrafted extraction rules leveraging grammatical dependency
Ongoing work
• Automated rule mining
• Semantic refinement for knowledge representation
• Ontology building
• Naming and taxonomy of entities, classes, and relations
• Handling complex facts
• Obama appoints x as y in z
• Handling modality, adjectives, and sentiment
• “In the past”, “it is rumoured that”, “it is not true that”
• Future evaluation
• Bigger ground truth (amount + topic coverage)
• Evaluate how well we enrich existing KBs
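As a rough illustration of what such an evaluation could compute, the sketch below scores extracted triples against a ground truth and counts how many correct triples are new to an existing KB. The metric choice (precision, recall, novel facts) and all toy data are assumptions for illustration; no results from this work are implied.

```python
def precision_recall(extracted, ground_truth):
    """Precision and recall of extracted triples against a ground truth."""
    extracted, ground_truth = set(extracted), set(ground_truth)
    true_positives = len(extracted & ground_truth)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


def novel_facts(extracted, ground_truth, kb):
    """Correct extractions that the existing KB does not contain yet."""
    return (set(extracted) & set(ground_truth)) - set(kb)


# Toy data only: one correct and one deliberately wrong extraction.
extracted = [
    ("Barack_Obama", "supporterOf", "Chicago_White_Sox"),
    ("Barack_Obama", "fanOf", "NFL"),  # erroneous on purpose
]
ground_truth = [
    ("Barack_Obama", "supporterOf", "Chicago_White_Sox"),
    ("Barack_Obama", "fanOf", "Pittsburgh_Steelers"),
]
existing_kb = [("Barack_Obama", "birthPlace", "Honolulu")]

print(precision_recall(extracted, ground_truth))
print(novel_facts(extracted, ground_truth, existing_kb))
```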
Future work
• Metadata extraction
• Data quality, data completeness
• Natural language question answering based on the enriched KB.