SlideShare a Scribd company logo
An Ensemble Approach for Entity Type
Prediction over Linked Data
Guangyuan Piao, John G. Breslin
Insight Centre for Data Analytics @NUI Galway, Ireland
Unit for Social Software
The Data Challenge at 5th Joint International Semantic Technology Conference
Yichang, China, 11/11/2015
Contents
• Introduction of the Data Challenge
• Overall Approach
• Results
2
• the main task of the challenge is to predict labels of
entities/resources in Zhishi.me1
• 1,897 entity URLs are provided and 1,397 of them are provided
with label information. Information related to the entities:
• abstracts of entities
• infobox properties
• external links
• related pages
• 13 participated teams
3
Introduction of the Data Challenge
1. http://zhishi.apexlab.org/
• features for predicting entity types
1. all distinct properties of entities in the dataset
2. semantic similarities between the entity and all labels (i.e., insect,
novel etc.)
3. a bag of Named Entities (NEs) created from all abstracts of entities in
the dataset
• feature selection
• in total, there were 1,888 features
• filter out irrelevant features using GainRatioAttributeEval method in
Weka1 (1,888  458 features)
• prediction strategy
• Random Forest as the classification method (using 100 trees)
4
Overall Approach
1. http://www.cs.waikato.ac.nz/ml/weka/
1. all distinct properties of entities in the dataset
• the value of the property is 1 if the entity has the property, 0 if not
1. semantic similarities between the entity and all labels (i.e.,
insect , novel etc.)
• RESIM(ei, ej)1: a measure for calculating the semantic similarity
between two entities in the context of a Linked Data graph
• |lu|: the total # of entities of label lu
5
Overall Approach - Features
1. Computing the Semantic Similarity of Resources in DBpedia for Recommendation Purposes, Piao et al., JIST2015
3. a bag of Named Entities (NEs) created from all abstracts
of entities in the dataset
• entities appeared at the beginning of an abstract and appeared
frequently in the abstract can have higher weights.
6
Overall Approach - Features
abstract Stanford NER1
segmented NEs
• nei : a NE appeared more than 10 times
• pos(nei, a): the position of nei in a
• n: the # of NEs in a
1. http://nlp.stanford.edu/software/CRF-NER.shtml
• performance on the provided training set (10-fold cross-
validation)
Classifier Precision Recall F-score
Decision Tree 0.942 0.942 0.942
SVM 0.920 0.910 0.912
Random Forest 0.970 0.969 0.969
Stacking 0.949 0.948 0.948
7
Results
• performance on the provided test set (4th among 13 teams)
Team Precision Recall F-score
4PKUICL 0.985439 0.987461 0.986449
1CBrain 0.983633 0.985069 0.98435
3FRDC_ML 0.978096 0.977348 0.977722
6pgy 0.977442 0.977247 0.977344
JIST2015-data challenge

More Related Content

What's hot

Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...
Anubhav Jain
 
Domain Ontology Usage Analysis Framework (OUSAF)
Domain Ontology Usage Analysis Framework (OUSAF)Domain Ontology Usage Analysis Framework (OUSAF)
Domain Ontology Usage Analysis Framework (OUSAF)
Jamshaid Ashraf
 
Information retrieval 10 vector and probabilistic models
Information retrieval 10 vector and probabilistic modelsInformation retrieval 10 vector and probabilistic models
Information retrieval 10 vector and probabilistic models
Vaibhav Khanna
 
PhD Defense Slides
PhD Defense SlidesPhD Defense Slides
PhD Defense Slides
Debasmit Das
 
Zero-shot Image Recognition Using Relational Matching, Adaptation and Calibra...
Zero-shot Image Recognition Using Relational Matching, Adaptation and Calibra...Zero-shot Image Recognition Using Relational Matching, Adaptation and Calibra...
Zero-shot Image Recognition Using Relational Matching, Adaptation and Calibra...
Debasmit Das
 
Naive Bayes | Statistics
Naive Bayes | StatisticsNaive Bayes | Statistics
Naive Bayes | Statistics
Transweb Global Inc
 
The Web of Data: do we actually understand what we built?
The Web of Data: do we actually understand what we built?The Web of Data: do we actually understand what we built?
The Web of Data: do we actually understand what we built?
Frank van Harmelen
 
Linked Data Quality Assessment: A Survey
Linked Data Quality Assessment: A SurveyLinked Data Quality Assessment: A Survey
Linked Data Quality Assessment: A Survey
Amrapali Zaveri, PhD
 
Linked Data Quality Assessment – daQ and Luzzu
Linked Data Quality Assessment – daQ and LuzzuLinked Data Quality Assessment – daQ and Luzzu
Linked Data Quality Assessment – daQ and Luzzu
jerdeb
 
Machine learning module 2
Machine learning module 2Machine learning module 2
Machine learning module 2
Gokulks007
 
Real Time Competitive Marketing Intelligence
Real Time Competitive Marketing IntelligenceReal Time Competitive Marketing Intelligence
Real Time Competitive Marketing Intelligence
feiwin
 
Recommendation and Information Retrieval: Two Sides of the Same Coin?
Recommendation and Information Retrieval: Two Sides of the Same Coin?Recommendation and Information Retrieval: Two Sides of the Same Coin?
Recommendation and Information Retrieval: Two Sides of the Same Coin?
Arjen de Vries
 
Mining Product Reputations On the Web
Mining Product Reputations On the WebMining Product Reputations On the Web
Mining Product Reputations On the Web
feiwin
 
Machine Learning Introduction
Machine Learning IntroductionMachine Learning Introduction
Machine Learning Introduction
Pranav Prakash
 
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
Jeff Z. Pan
 
Sybrandt Thesis Proposal Presentation
Sybrandt Thesis Proposal PresentationSybrandt Thesis Proposal Presentation
Sybrandt Thesis Proposal Presentation
Justin Sybrandt, Ph.D.
 
Crystallization classification semisupervised
Crystallization classification semisupervisedCrystallization classification semisupervised
Crystallization classification semisupervised
Madhav Sigdel
 
Mappings Validation
Mappings ValidationMappings Validation
Mappings Validation
andimou
 
From TREC to Watson: is open domain question answering a solved problem?
From TREC to Watson: is open domain question answering a solved problem?From TREC to Watson: is open domain question answering a solved problem?
From TREC to Watson: is open domain question answering a solved problem?
Constantin Orasan
 
Crowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality AssessmentCrowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality Assessment
Maribel Acosta Deibe
 

What's hot (20)

Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...
 
Domain Ontology Usage Analysis Framework (OUSAF)
Domain Ontology Usage Analysis Framework (OUSAF)Domain Ontology Usage Analysis Framework (OUSAF)
Domain Ontology Usage Analysis Framework (OUSAF)
 
Information retrieval 10 vector and probabilistic models
Information retrieval 10 vector and probabilistic modelsInformation retrieval 10 vector and probabilistic models
Information retrieval 10 vector and probabilistic models
 
PhD Defense Slides
PhD Defense SlidesPhD Defense Slides
PhD Defense Slides
 
Zero-shot Image Recognition Using Relational Matching, Adaptation and Calibra...
Zero-shot Image Recognition Using Relational Matching, Adaptation and Calibra...Zero-shot Image Recognition Using Relational Matching, Adaptation and Calibra...
Zero-shot Image Recognition Using Relational Matching, Adaptation and Calibra...
 
Naive Bayes | Statistics
Naive Bayes | StatisticsNaive Bayes | Statistics
Naive Bayes | Statistics
 
The Web of Data: do we actually understand what we built?
The Web of Data: do we actually understand what we built?The Web of Data: do we actually understand what we built?
The Web of Data: do we actually understand what we built?
 
Linked Data Quality Assessment: A Survey
Linked Data Quality Assessment: A SurveyLinked Data Quality Assessment: A Survey
Linked Data Quality Assessment: A Survey
 
Linked Data Quality Assessment – daQ and Luzzu
Linked Data Quality Assessment – daQ and LuzzuLinked Data Quality Assessment – daQ and Luzzu
Linked Data Quality Assessment – daQ and Luzzu
 
Machine learning module 2
Machine learning module 2Machine learning module 2
Machine learning module 2
 
Real Time Competitive Marketing Intelligence
Real Time Competitive Marketing IntelligenceReal Time Competitive Marketing Intelligence
Real Time Competitive Marketing Intelligence
 
Recommendation and Information Retrieval: Two Sides of the Same Coin?
Recommendation and Information Retrieval: Two Sides of the Same Coin?Recommendation and Information Retrieval: Two Sides of the Same Coin?
Recommendation and Information Retrieval: Two Sides of the Same Coin?
 
Mining Product Reputations On the Web
Mining Product Reputations On the WebMining Product Reputations On the Web
Mining Product Reputations On the Web
 
Machine Learning Introduction
Machine Learning IntroductionMachine Learning Introduction
Machine Learning Introduction
 
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
 
Sybrandt Thesis Proposal Presentation
Sybrandt Thesis Proposal PresentationSybrandt Thesis Proposal Presentation
Sybrandt Thesis Proposal Presentation
 
Crystallization classification semisupervised
Crystallization classification semisupervisedCrystallization classification semisupervised
Crystallization classification semisupervised
 
Mappings Validation
Mappings ValidationMappings Validation
Mappings Validation
 
From TREC to Watson: is open domain question answering a solved problem?
From TREC to Watson: is open domain question answering a solved problem?From TREC to Watson: is open domain question answering a solved problem?
From TREC to Watson: is open domain question answering a solved problem?
 
Crowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality AssessmentCrowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality Assessment
 

Similar to JIST2015-data challenge

Epistemic networks for Epistemic Commitments
Epistemic networks for Epistemic CommitmentsEpistemic networks for Epistemic Commitments
Epistemic networks for Epistemic Commitments
Simon Knight
 
Semi-automated Exploration and Extraction of Data in Scientific Tables
Semi-automated Exploration and Extraction of Data in Scientific TablesSemi-automated Exploration and Extraction of Data in Scientific Tables
Semi-automated Exploration and Extraction of Data in Scientific Tables
Elsevier
 
Towards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data ServicesTowards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data Services
Anita de Waard
 
Document Classification Using Expectation Maximization with Semi Supervised L...
Document Classification Using Expectation Maximization with Semi Supervised L...Document Classification Using Expectation Maximization with Semi Supervised L...
Document Classification Using Expectation Maximization with Semi Supervised L...
ijsc
 
Document Classification Using Expectation Maximization with Semi Supervised L...
Document Classification Using Expectation Maximization with Semi Supervised L...Document Classification Using Expectation Maximization with Semi Supervised L...
Document Classification Using Expectation Maximization with Semi Supervised L...
ijsc
 
Searching for Interestingness in Wikipedia and Yahoo! Answers
Searching for Interestingness in Wikipedia and Yahoo! AnswersSearching for Interestingness in Wikipedia and Yahoo! Answers
Searching for Interestingness in Wikipedia and Yahoo! Answers
Gabriela Agustini
 
Creating an Urban Legend: A System for Electrophysiology Data Management and ...
Creating an Urban Legend: A System for Electrophysiology Data Management and ...Creating an Urban Legend: A System for Electrophysiology Data Management and ...
Creating an Urban Legend: A System for Electrophysiology Data Management and ...
Anita de Waard
 
Towards a Simple, Standards-Compliant, and Generic Phylogenetic Database
Towards a Simple, Standards-Compliant, and Generic Phylogenetic DatabaseTowards a Simple, Standards-Compliant, and Generic Phylogenetic Database
Towards a Simple, Standards-Compliant, and Generic Phylogenetic Database
Hilmar Lapp
 
GARNet workshop on Integrating Large Data into Plant Science
GARNet workshop on Integrating Large Data into Plant ScienceGARNet workshop on Integrating Large Data into Plant Science
GARNet workshop on Integrating Large Data into Plant Science
David Johnson
 
Metadata Analyser: measuring metadata quality
Metadata Analyser: measuring metadata qualityMetadata Analyser: measuring metadata quality
Metadata Analyser: measuring metadata quality
Francisco Couto
 
Searching the Stuff of Life - BioSolr: Presented by Matt Pearce & Alan Woodwa...
Searching the Stuff of Life - BioSolr: Presented by Matt Pearce & Alan Woodwa...Searching the Stuff of Life - BioSolr: Presented by Matt Pearce & Alan Woodwa...
Searching the Stuff of Life - BioSolr: Presented by Matt Pearce & Alan Woodwa...
Lucidworks
 
Semantic Technologies for Big Sciences including Astrophysics
Semantic Technologies for Big Sciences including AstrophysicsSemantic Technologies for Big Sciences including Astrophysics
Semantic Technologies for Big Sciences including Astrophysics
Artificial Intelligence Institute at UofSC
 
IRJET-Classifying Mined Online Discussion Data for Reflective Thinking based ...
IRJET-Classifying Mined Online Discussion Data for Reflective Thinking based ...IRJET-Classifying Mined Online Discussion Data for Reflective Thinking based ...
IRJET-Classifying Mined Online Discussion Data for Reflective Thinking based ...
IRJET Journal
 
The Genopolis Microarray database
The Genopolis Microarray databaseThe Genopolis Microarray database
The Genopolis Microarray database
Novartis Institutes for BioMedical Research
 
01 Nature of Biology
01 Nature of Biology01 Nature of Biology
01 Nature of Biology
Stephen Taylor
 
AELA: An Adaptive Entity Linking Approach
AELA: An Adaptive Entity Linking ApproachAELA: An Adaptive Entity Linking Approach
AELA: An Adaptive Entity Linking Approach
Bianca Pereira
 
Studying archives of online behavior
Studying archives of online behaviorStudying archives of online behavior
Studying archives of online behavior
James Howison
 
Chapter-OBDD.pptx
Chapter-OBDD.pptxChapter-OBDD.pptx
Chapter-OBDD.pptx
XanGwaps
 
NE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSISNE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSIS
rathnaarul
 
Unit 1 Information Storage and Retrieval
Unit 1 Information Storage and RetrievalUnit 1 Information Storage and Retrieval
Unit 1 Information Storage and Retrieval
KishorMahale5
 

Similar to JIST2015-data challenge (20)

Epistemic networks for Epistemic Commitments
Epistemic networks for Epistemic CommitmentsEpistemic networks for Epistemic Commitments
Epistemic networks for Epistemic Commitments
 
Semi-automated Exploration and Extraction of Data in Scientific Tables
Semi-automated Exploration and Extraction of Data in Scientific TablesSemi-automated Exploration and Extraction of Data in Scientific Tables
Semi-automated Exploration and Extraction of Data in Scientific Tables
 
Towards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data ServicesTowards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data Services
 
Document Classification Using Expectation Maximization with Semi Supervised L...
Document Classification Using Expectation Maximization with Semi Supervised L...Document Classification Using Expectation Maximization with Semi Supervised L...
Document Classification Using Expectation Maximization with Semi Supervised L...
 
Document Classification Using Expectation Maximization with Semi Supervised L...
Document Classification Using Expectation Maximization with Semi Supervised L...Document Classification Using Expectation Maximization with Semi Supervised L...
Document Classification Using Expectation Maximization with Semi Supervised L...
 
Searching for Interestingness in Wikipedia and Yahoo! Answers
Searching for Interestingness in Wikipedia and Yahoo! AnswersSearching for Interestingness in Wikipedia and Yahoo! Answers
Searching for Interestingness in Wikipedia and Yahoo! Answers
 
Creating an Urban Legend: A System for Electrophysiology Data Management and ...
Creating an Urban Legend: A System for Electrophysiology Data Management and ...Creating an Urban Legend: A System for Electrophysiology Data Management and ...
Creating an Urban Legend: A System for Electrophysiology Data Management and ...
 
Towards a Simple, Standards-Compliant, and Generic Phylogenetic Database
Towards a Simple, Standards-Compliant, and Generic Phylogenetic DatabaseTowards a Simple, Standards-Compliant, and Generic Phylogenetic Database
Towards a Simple, Standards-Compliant, and Generic Phylogenetic Database
 
GARNet workshop on Integrating Large Data into Plant Science
GARNet workshop on Integrating Large Data into Plant ScienceGARNet workshop on Integrating Large Data into Plant Science
GARNet workshop on Integrating Large Data into Plant Science
 
Metadata Analyser: measuring metadata quality
Metadata Analyser: measuring metadata qualityMetadata Analyser: measuring metadata quality
Metadata Analyser: measuring metadata quality
 
Searching the Stuff of Life - BioSolr: Presented by Matt Pearce & Alan Woodwa...
Searching the Stuff of Life - BioSolr: Presented by Matt Pearce & Alan Woodwa...Searching the Stuff of Life - BioSolr: Presented by Matt Pearce & Alan Woodwa...
Searching the Stuff of Life - BioSolr: Presented by Matt Pearce & Alan Woodwa...
 
Semantic Technologies for Big Sciences including Astrophysics
Semantic Technologies for Big Sciences including AstrophysicsSemantic Technologies for Big Sciences including Astrophysics
Semantic Technologies for Big Sciences including Astrophysics
 
IRJET-Classifying Mined Online Discussion Data for Reflective Thinking based ...
IRJET-Classifying Mined Online Discussion Data for Reflective Thinking based ...IRJET-Classifying Mined Online Discussion Data for Reflective Thinking based ...
IRJET-Classifying Mined Online Discussion Data for Reflective Thinking based ...
 
The Genopolis Microarray database
The Genopolis Microarray databaseThe Genopolis Microarray database
The Genopolis Microarray database
 
01 Nature of Biology
01 Nature of Biology01 Nature of Biology
01 Nature of Biology
 
AELA: An Adaptive Entity Linking Approach
AELA: An Adaptive Entity Linking ApproachAELA: An Adaptive Entity Linking Approach
AELA: An Adaptive Entity Linking Approach
 
Studying archives of online behavior
Studying archives of online behaviorStudying archives of online behavior
Studying archives of online behavior
 
Chapter-OBDD.pptx
Chapter-OBDD.pptxChapter-OBDD.pptx
Chapter-OBDD.pptx
 
NE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSISNE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSIS
 
Unit 1 Information Storage and Retrieval
Unit 1 Information Storage and RetrievalUnit 1 Information Storage and Retrieval
Unit 1 Information Storage and Retrieval
 

More from GUANGYUAN PIAO

Env2Vec: Accelerating VNF Testing with Deep Learning
Env2Vec: Accelerating VNF Testing with Deep LearningEnv2Vec: Accelerating VNF Testing with Deep Learning
Env2Vec: Accelerating VNF Testing with Deep Learning
GUANGYUAN PIAO
 
Domain-Aware Sentiment Classification with GRUs and CNNs
Domain-Aware Sentiment Classification with GRUs and CNNsDomain-Aware Sentiment Classification with GRUs and CNNs
Domain-Aware Sentiment Classification with GRUs and CNNs
GUANGYUAN PIAO
 
A Study of the Similarities of Entity Embeddings Learned from Different Aspec...
A Study of the Similarities of Entity Embeddings Learned from Different Aspec...A Study of the Similarities of Entity Embeddings Learned from Different Aspec...
A Study of the Similarities of Entity Embeddings Learned from Different Aspec...
GUANGYUAN PIAO
 
Retweet Prediction with Attention-based Deep Neural Network
Retweet Prediction with Attention-based Deep Neural NetworkRetweet Prediction with Attention-based Deep Neural Network
Retweet Prediction with Attention-based Deep Neural Network
GUANGYUAN PIAO
 
WISE2017 - Factorization Machines Leveraging Lightweight Linked Open Data-ena...
WISE2017 - Factorization Machines Leveraging Lightweight Linked Open Data-ena...WISE2017 - Factorization Machines Leveraging Lightweight Linked Open Data-ena...
WISE2017 - Factorization Machines Leveraging Lightweight Linked Open Data-ena...
GUANGYUAN PIAO
 
Hypertext2017-Leveraging Followee List Memberships for Inferring User Interes...
Hypertext2017-Leveraging Followee List Memberships for Inferring User Interes...Hypertext2017-Leveraging Followee List Memberships for Inferring User Interes...
Hypertext2017-Leveraging Followee List Memberships for Inferring User Interes...
GUANGYUAN PIAO
 
ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ...
ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ...ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ...
ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ...
GUANGYUAN PIAO
 
EKAW2016 - Interest Representation, Enrichment, Dynamics, and Propagation: A ...
EKAW2016 - Interest Representation, Enrichment, Dynamics, and Propagation: A ...EKAW2016 - Interest Representation, Enrichment, Dynamics, and Propagation: A ...
EKAW2016 - Interest Representation, Enrichment, Dynamics, and Propagation: A ...
GUANGYUAN PIAO
 
SEMANTiCS2016 - Exploring Dynamics and Semantics of User Interests for User ...
SEMANTiCS2016 - Exploring Dynamics and Semantics of User Interests for User ...SEMANTiCS2016 - Exploring Dynamics and Semantics of User Interests for User ...
SEMANTiCS2016 - Exploring Dynamics and Semantics of User Interests for User ...
GUANGYUAN PIAO
 
UMAP2016EA - Analyzing MOOC Entries of Professionals on LinkedIn for User Mod...
UMAP2016EA - Analyzing MOOC Entries of Professionals on LinkedIn for User Mod...UMAP2016EA - Analyzing MOOC Entries of Professionals on LinkedIn for User Mod...
UMAP2016EA - Analyzing MOOC Entries of Professionals on LinkedIn for User Mod...
GUANGYUAN PIAO
 
UMAP2016 - Analyzing Aggregated Semantics-enabled User Modeling on Google+ an...
UMAP2016 - Analyzing Aggregated Semantics-enabled User Modeling on Google+ an...UMAP2016 - Analyzing Aggregated Semantics-enabled User Modeling on Google+ an...
UMAP2016 - Analyzing Aggregated Semantics-enabled User Modeling on Google+ an...
GUANGYUAN PIAO
 
SAC2016-Measuring Semantic Distance for Linked Open Data-enabled Recommender ...
SAC2016-Measuring Semantic Distance for Linked Open Data-enabled Recommender ...SAC2016-Measuring Semantic Distance for Linked Open Data-enabled Recommender ...
SAC2016-Measuring Semantic Distance for Linked Open Data-enabled Recommender ...
GUANGYUAN PIAO
 
Analyzing User Modeling on Twitter for Personalized News Recommendations
Analyzing User Modeling on Twitter for Personalized News RecommendationsAnalyzing User Modeling on Twitter for Personalized News Recommendations
Analyzing User Modeling on Twitter for Personalized News Recommendations
GUANGYUAN PIAO
 
RDFa Basics
RDFa BasicsRDFa Basics
RDFa Basics
GUANGYUAN PIAO
 
Owl 2.0 Overview
Owl 2.0 OverviewOwl 2.0 Overview
Owl 2.0 Overview
GUANGYUAN PIAO
 
OWL 2.0 Primer Part01
OWL 2.0 Primer Part01OWL 2.0 Primer Part01
OWL 2.0 Primer Part01
GUANGYUAN PIAO
 
OWL2.0 Primer Part02
OWL2.0 Primer Part02OWL2.0 Primer Part02
OWL2.0 Primer Part02
GUANGYUAN PIAO
 
Hdd industry
Hdd industryHdd industry
Hdd industry
GUANGYUAN PIAO
 

More from GUANGYUAN PIAO (18)

Env2Vec: Accelerating VNF Testing with Deep Learning
Env2Vec: Accelerating VNF Testing with Deep LearningEnv2Vec: Accelerating VNF Testing with Deep Learning
Env2Vec: Accelerating VNF Testing with Deep Learning
 
Domain-Aware Sentiment Classification with GRUs and CNNs
Domain-Aware Sentiment Classification with GRUs and CNNsDomain-Aware Sentiment Classification with GRUs and CNNs
Domain-Aware Sentiment Classification with GRUs and CNNs
 
A Study of the Similarities of Entity Embeddings Learned from Different Aspec...
A Study of the Similarities of Entity Embeddings Learned from Different Aspec...A Study of the Similarities of Entity Embeddings Learned from Different Aspec...
A Study of the Similarities of Entity Embeddings Learned from Different Aspec...
 
Retweet Prediction with Attention-based Deep Neural Network
Retweet Prediction with Attention-based Deep Neural NetworkRetweet Prediction with Attention-based Deep Neural Network
Retweet Prediction with Attention-based Deep Neural Network
 
WISE2017 - Factorization Machines Leveraging Lightweight Linked Open Data-ena...
WISE2017 - Factorization Machines Leveraging Lightweight Linked Open Data-ena...WISE2017 - Factorization Machines Leveraging Lightweight Linked Open Data-ena...
WISE2017 - Factorization Machines Leveraging Lightweight Linked Open Data-ena...
 
Hypertext2017-Leveraging Followee List Memberships for Inferring User Interes...
Hypertext2017-Leveraging Followee List Memberships for Inferring User Interes...Hypertext2017-Leveraging Followee List Memberships for Inferring User Interes...
Hypertext2017-Leveraging Followee List Memberships for Inferring User Interes...
 
ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ...
ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ...ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ...
ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ...
 
EKAW2016 - Interest Representation, Enrichment, Dynamics, and Propagation: A ...
EKAW2016 - Interest Representation, Enrichment, Dynamics, and Propagation: A ...EKAW2016 - Interest Representation, Enrichment, Dynamics, and Propagation: A ...
EKAW2016 - Interest Representation, Enrichment, Dynamics, and Propagation: A ...
 
SEMANTiCS2016 - Exploring Dynamics and Semantics of User Interests for User ...
SEMANTiCS2016 - Exploring Dynamics and Semantics of User Interests for User ...SEMANTiCS2016 - Exploring Dynamics and Semantics of User Interests for User ...
SEMANTiCS2016 - Exploring Dynamics and Semantics of User Interests for User ...
 
UMAP2016EA - Analyzing MOOC Entries of Professionals on LinkedIn for User Mod...
UMAP2016EA - Analyzing MOOC Entries of Professionals on LinkedIn for User Mod...UMAP2016EA - Analyzing MOOC Entries of Professionals on LinkedIn for User Mod...
UMAP2016EA - Analyzing MOOC Entries of Professionals on LinkedIn for User Mod...
 
UMAP2016 - Analyzing Aggregated Semantics-enabled User Modeling on Google+ an...
UMAP2016 - Analyzing Aggregated Semantics-enabled User Modeling on Google+ an...UMAP2016 - Analyzing Aggregated Semantics-enabled User Modeling on Google+ an...
UMAP2016 - Analyzing Aggregated Semantics-enabled User Modeling on Google+ an...
 
SAC2016-Measuring Semantic Distance for Linked Open Data-enabled Recommender ...
SAC2016-Measuring Semantic Distance for Linked Open Data-enabled Recommender ...SAC2016-Measuring Semantic Distance for Linked Open Data-enabled Recommender ...
SAC2016-Measuring Semantic Distance for Linked Open Data-enabled Recommender ...
 
Analyzing User Modeling on Twitter for Personalized News Recommendations
Analyzing User Modeling on Twitter for Personalized News RecommendationsAnalyzing User Modeling on Twitter for Personalized News Recommendations
Analyzing User Modeling on Twitter for Personalized News Recommendations
 
RDFa Basics
RDFa BasicsRDFa Basics
RDFa Basics
 
Owl 2.0 Overview
Owl 2.0 OverviewOwl 2.0 Overview
Owl 2.0 Overview
 
OWL 2.0 Primer Part01
OWL 2.0 Primer Part01OWL 2.0 Primer Part01
OWL 2.0 Primer Part01
 
OWL2.0 Primer Part02
OWL2.0 Primer Part02OWL2.0 Primer Part02
OWL2.0 Primer Part02
 
Hdd industry
Hdd industryHdd industry
Hdd industry
 

Recently uploaded

NOVEC 1230 Fire Suppression System Presentation
NOVEC 1230 Fire Suppression System PresentationNOVEC 1230 Fire Suppression System Presentation
NOVEC 1230 Fire Suppression System Presentation
miniruwan1
 
Red Hat Enterprise Linux Administration 9.0 RH124 pdf
Red Hat Enterprise Linux Administration 9.0 RH124 pdfRed Hat Enterprise Linux Administration 9.0 RH124 pdf
Red Hat Enterprise Linux Administration 9.0 RH124 pdf
mdfkobir
 
Generative AI and Large Language Models (LLMs)
Generative AI and Large Language Models (LLMs)Generative AI and Large Language Models (LLMs)
Generative AI and Large Language Models (LLMs)
rkpv2002
 
Machine Learning- Perceptron_Backpropogation_Module 3.pdf
Machine Learning- Perceptron_Backpropogation_Module 3.pdfMachine Learning- Perceptron_Backpropogation_Module 3.pdf
Machine Learning- Perceptron_Backpropogation_Module 3.pdf
Dr. Shivashankar
 
Concepts of Automatic Block Signalling.ppt
Concepts of Automatic Block Signalling.pptConcepts of Automatic Block Signalling.ppt
Concepts of Automatic Block Signalling.ppt
princeshah76
 
System Analysis and Design in a changing world 5th edition
System Analysis and Design in a changing world 5th editionSystem Analysis and Design in a changing world 5th edition
System Analysis and Design in a changing world 5th edition
mnassar75g
 
Safety Operating Procedure for Testing Lifting Tackles
Safety Operating Procedure for Testing Lifting TacklesSafety Operating Procedure for Testing Lifting Tackles
Safety Operating Procedure for Testing Lifting Tackles
ssuserfcf701
 
AFCAT STATIC Genral knowledge important CAPSULE.pdf
AFCAT STATIC Genral knowledge important CAPSULE.pdfAFCAT STATIC Genral knowledge important CAPSULE.pdf
AFCAT STATIC Genral knowledge important CAPSULE.pdf
vibhapatil140
 
charting the development of the autonomous train
charting the development of the autonomous traincharting the development of the autonomous train
charting the development of the autonomous train
huseindihon
 
OME754 – INDUSTRIAL SAFETY - unit notes.pptx
OME754 – INDUSTRIAL SAFETY - unit notes.pptxOME754 – INDUSTRIAL SAFETY - unit notes.pptx
OME754 – INDUSTRIAL SAFETY - unit notes.pptx
shanmugamram247
 
Cisco Intersight Technical OverView.pptx
Cisco Intersight Technical OverView.pptxCisco Intersight Technical OverView.pptx
Cisco Intersight Technical OverView.pptx
Duy Nguyen
 
Protect YugabyteDB with Hashicorp Vault.pdf
Protect YugabyteDB with Hashicorp Vault.pdfProtect YugabyteDB with Hashicorp Vault.pdf
Protect YugabyteDB with Hashicorp Vault.pdf
Gwenn Etourneau
 
Digital Image Processing - Module 4 Chapter 2
Digital Image Processing - Module 4 Chapter 2Digital Image Processing - Module 4 Chapter 2
Digital Image Processing - Module 4 Chapter 2
821priyankaj
 
"Operational and Technical Overview of Electric Locomotives at the Kanpur Ele...
"Operational and Technical Overview of Electric Locomotives at the Kanpur Ele..."Operational and Technical Overview of Electric Locomotives at the Kanpur Ele...
"Operational and Technical Overview of Electric Locomotives at the Kanpur Ele...
nanduchaihan9
 
1. DEE 1203 ELECTRICAL ENGINEERING DRAWING.pdf
1. DEE 1203 ELECTRICAL ENGINEERING DRAWING.pdf1. DEE 1203 ELECTRICAL ENGINEERING DRAWING.pdf
1. DEE 1203 ELECTRICAL ENGINEERING DRAWING.pdf
AsiimweJulius2
 
DPWH - DEPARTMENT OF PUBLIC WORKS AND HIGHWAYS
DPWH - DEPARTMENT OF PUBLIC WORKS AND HIGHWAYSDPWH - DEPARTMENT OF PUBLIC WORKS AND HIGHWAYS
DPWH - DEPARTMENT OF PUBLIC WORKS AND HIGHWAYS
RyanMacayan
 
ECONOMIC FEASIBILITY AND ENVIRONMENTAL IMPLICATIONS OF PERMEABLE PAVEMENT IN ...
ECONOMIC FEASIBILITY AND ENVIRONMENTAL IMPLICATIONS OF PERMEABLE PAVEMENT IN ...ECONOMIC FEASIBILITY AND ENVIRONMENTAL IMPLICATIONS OF PERMEABLE PAVEMENT IN ...
ECONOMIC FEASIBILITY AND ENVIRONMENTAL IMPLICATIONS OF PERMEABLE PAVEMENT IN ...
Fady M. A Hassouna
 
Developing a Genetic Algorithm Based Daily Calorie Recommendation System for ...
Developing a Genetic Algorithm Based Daily Calorie Recommendation System for ...Developing a Genetic Algorithm Based Daily Calorie Recommendation System for ...
Developing a Genetic Algorithm Based Daily Calorie Recommendation System for ...
AIRCC Publishing Corporation
 
Human_assault project using jetson nano new
Human_assault project using jetson nano newHuman_assault project using jetson nano new
Human_assault project using jetson nano new
frostflash010
 
Red Hat Enterprise Linux Administration 9.0 RH134 pdf
Red Hat Enterprise Linux Administration 9.0 RH134 pdfRed Hat Enterprise Linux Administration 9.0 RH134 pdf
Red Hat Enterprise Linux Administration 9.0 RH134 pdf
mdfkobir
 

Recently uploaded (20)

NOVEC 1230 Fire Suppression System Presentation
NOVEC 1230 Fire Suppression System PresentationNOVEC 1230 Fire Suppression System Presentation
NOVEC 1230 Fire Suppression System Presentation
 
Red Hat Enterprise Linux Administration 9.0 RH124 pdf
Red Hat Enterprise Linux Administration 9.0 RH124 pdfRed Hat Enterprise Linux Administration 9.0 RH124 pdf
Red Hat Enterprise Linux Administration 9.0 RH124 pdf
 
Generative AI and Large Language Models (LLMs)
Generative AI and Large Language Models (LLMs)Generative AI and Large Language Models (LLMs)
Generative AI and Large Language Models (LLMs)
 
Machine Learning- Perceptron_Backpropogation_Module 3.pdf
Machine Learning- Perceptron_Backpropogation_Module 3.pdfMachine Learning- Perceptron_Backpropogation_Module 3.pdf
Machine Learning- Perceptron_Backpropogation_Module 3.pdf
 
Concepts of Automatic Block Signalling.ppt
Concepts of Automatic Block Signalling.pptConcepts of Automatic Block Signalling.ppt
Concepts of Automatic Block Signalling.ppt
 
System Analysis and Design in a changing world 5th edition
System Analysis and Design in a changing world 5th editionSystem Analysis and Design in a changing world 5th edition
System Analysis and Design in a changing world 5th edition
 
Safety Operating Procedure for Testing Lifting Tackles
Safety Operating Procedure for Testing Lifting TacklesSafety Operating Procedure for Testing Lifting Tackles
Safety Operating Procedure for Testing Lifting Tackles
 
AFCAT STATIC Genral knowledge important CAPSULE.pdf
AFCAT STATIC Genral knowledge important CAPSULE.pdfAFCAT STATIC Genral knowledge important CAPSULE.pdf
AFCAT STATIC Genral knowledge important CAPSULE.pdf
 
charting the development of the autonomous train
charting the development of the autonomous traincharting the development of the autonomous train
charting the development of the autonomous train
 
OME754 – INDUSTRIAL SAFETY - unit notes.pptx
OME754 – INDUSTRIAL SAFETY - unit notes.pptxOME754 – INDUSTRIAL SAFETY - unit notes.pptx
OME754 – INDUSTRIAL SAFETY - unit notes.pptx
 
Cisco Intersight Technical OverView.pptx
Cisco Intersight Technical OverView.pptxCisco Intersight Technical OverView.pptx
Cisco Intersight Technical OverView.pptx
 
Protect YugabyteDB with Hashicorp Vault.pdf
Protect YugabyteDB with Hashicorp Vault.pdfProtect YugabyteDB with Hashicorp Vault.pdf
Protect YugabyteDB with Hashicorp Vault.pdf
 
Digital Image Processing - Module 4 Chapter 2
Digital Image Processing - Module 4 Chapter 2Digital Image Processing - Module 4 Chapter 2
Digital Image Processing - Module 4 Chapter 2
 
"Operational and Technical Overview of Electric Locomotives at the Kanpur Ele...
"Operational and Technical Overview of Electric Locomotives at the Kanpur Ele..."Operational and Technical Overview of Electric Locomotives at the Kanpur Ele...
"Operational and Technical Overview of Electric Locomotives at the Kanpur Ele...
 
1. DEE 1203 ELECTRICAL ENGINEERING DRAWING.pdf
1. DEE 1203 ELECTRICAL ENGINEERING DRAWING.pdf1. DEE 1203 ELECTRICAL ENGINEERING DRAWING.pdf
1. DEE 1203 ELECTRICAL ENGINEERING DRAWING.pdf
 
DPWH - DEPARTMENT OF PUBLIC WORKS AND HIGHWAYS
DPWH - DEPARTMENT OF PUBLIC WORKS AND HIGHWAYSDPWH - DEPARTMENT OF PUBLIC WORKS AND HIGHWAYS
DPWH - DEPARTMENT OF PUBLIC WORKS AND HIGHWAYS
 
ECONOMIC FEASIBILITY AND ENVIRONMENTAL IMPLICATIONS OF PERMEABLE PAVEMENT IN ...
ECONOMIC FEASIBILITY AND ENVIRONMENTAL IMPLICATIONS OF PERMEABLE PAVEMENT IN ...ECONOMIC FEASIBILITY AND ENVIRONMENTAL IMPLICATIONS OF PERMEABLE PAVEMENT IN ...
ECONOMIC FEASIBILITY AND ENVIRONMENTAL IMPLICATIONS OF PERMEABLE PAVEMENT IN ...
 
Developing a Genetic Algorithm Based Daily Calorie Recommendation System for ...
Developing a Genetic Algorithm Based Daily Calorie Recommendation System for ...Developing a Genetic Algorithm Based Daily Calorie Recommendation System for ...
Developing a Genetic Algorithm Based Daily Calorie Recommendation System for ...
 
Human_assault project using jetson nano new
Human_assault project using jetson nano newHuman_assault project using jetson nano new
Human_assault project using jetson nano new
 
Red Hat Enterprise Linux Administration 9.0 RH134 pdf
Red Hat Enterprise Linux Administration 9.0 RH134 pdfRed Hat Enterprise Linux Administration 9.0 RH134 pdf
Red Hat Enterprise Linux Administration 9.0 RH134 pdf
 

JIST2015-data challenge

  • 1. An Ensemble Approach for Entity Type Prediction over Linked Data Guangyuan Piao, John G. Breslin Insight Centre for Data Analytics @NUI Galway, Ireland Unit for Social Software The Data Challenge at 5th Joint International Semantic Technology Conference Yichang, China, 11/11/2015
  • 2. Contents • Introduction of the Data Challenge • Overall Approach • Results 2
  • 3. • the main task of the challenge is to predict labels of entities/resources in Zhishi.me1 • 1,897 entity URLs are provided and 1,397 of them are provided with label information. Information related to the entities: • abstracts of entities • infobox properties • external links • related pages • 13 participated teams 3 Introduction of the Data Challenge 1. http://zhishi.apexlab.org/
  • 4. • features for predicting entity types 1. all distinct properties of entities in the dataset 2. semantic similarities between the entity and all labels (i.e., insect, novel etc.) 3. a bag of Named Entities (NEs) created from all abstracts of entities in the dataset • feature selection • in total, there were 1,888 features • filter out irrelevant features using GainRatioAttributeEval method in Weka1 (1,888  458 features) • prediction strategy • Random Forest as the classification method (using 100 trees) 4 Overall Approach 1. http://www.cs.waikato.ac.nz/ml/weka/
  • 5. 1. all distinct properties of entities in the dataset • the value of the property is 1 if the entity has the property, 0 if not 1. semantic similarities between the entity and all labels (i.e., insect , novel etc.) • RESIM(ei, ej)1: a measure for calculating the semantic similarity between two entities in the context of a Linked Data graph • |lu|: the total # of entities of label lu 5 Overall Approach - Features 1. Computing the Semantic Similarity of Resources in DBpedia for Recommendation Purposes, Piao et al., JIST2015
  • 6. 3. a bag of Named Entities (NEs) created from all abstracts of entities in the dataset • entities appeared at the beginning of an abstract and appeared frequently in the abstract can have higher weights. 6 Overall Approach - Features abstract Stanford NER1 segmented NEs • nei : a NE appeared more than 10 times • pos(nei, a): the position of nei in a • n: the # of NEs in a 1. http://nlp.stanford.edu/software/CRF-NER.shtml
  • 7. • performance on the provided training set (10-fold cross- validation) Classifier Precision Recall F-score Decision Tree 0.942 0.942 0.942 SVM 0.920 0.910 0.912 Random Forest 0.970 0.969 0.969 Stacking 0.949 0.948 0.948 7 Results • performance on the provided test set (4th among 13 teams) Team Precision Recall F-score 4PKUICL 0.985439 0.987461 0.986449 1CBrain 0.983633 0.985069 0.98435 3FRDC_ML 0.978096 0.977348 0.977722 6pgy 0.977442 0.977247 0.977344