SlideShare a Scribd company logo
1 of 12
Download to read offline
COLING2014_reading_seminar 
Single Document Keyphrase 
Extraction Using Label Information 
tokyo metropolitan university 
M2 
Ryuichi Tachibana
Abstract 
• Keyphrases have found wide ranging application in NLP 
and IR tasks. 
• We pose the problem of extracting label specific 
keyphrases from a document which has labels or tags. 
• Our proposed methods utilizes both the document’s text 
and label information. 
• The objective of this paper is to explore extensions to 
TextRank for extracting label-specific keyphrases from a 
multi-labeled document. 
2
TextRank 
• The TextRank models the text as a graph, text units of 
various sizes and characteristics can be added as vertices. 
• Connections can be drawn between these vertices e.g. 
lexical or semantic relation, etc. 
• To identify “key” text units in this text graph, TextRank runs 
the PageRank algorithm on this constructed graph. 
• The ranking over vertices is obtained by finding the 
stationary distribution of the random walk on the text 
graph. 
3
TextRank for multi-labeled 
document 
• We consider the problem of extracting label specific keyphrases 
from a document which has labels. 
! 
! 
! 
• These keyphrases, though central to the article, are not specific 
to any of the labels that have been assigned to the article. 
• The objective of this paper is to explore extensions to TextRank 
for extracting label-specific keyphrases from a multi-labeled 
document. 4
Proposed Method 
• We propose an approach that models the 
problem of finding label-specific keyphrases 
in a document as a random walk on the 
document’s text-graph. 
• Two approaches are proposed namely PTR: 
Personalized TextRank and TRDMS: TextRank 
using Ranking on Data Manifolds with Sinks. 
5
PTR: Personalized 
TextRank 
• In this setting the PageRank algorithm is replaced with the personalized page rank. 
• Identify a set of label specific features fl 
cand that are strongly correlated with the 
label. 
• By using the label specific features fl 
cand as the personalization vector we are able 
to bias the walk on the underlying text graph towards terms relevant to the label. 
• The label specific features for label Plant Physiology is denoted as follows. The 
term colored in red indicates the term that is common between fplant-physiology and Gd. 
! 
! 
! 
• But terms relevant to labels other than l continue to influence the walk. 
6
TRDMS: TextRank using Ranking on Data 
Manifolds with Sinks approach 
• We model the problem of identifying label specific 
keyphrases in a given document as a random walk over the 
document’s weighted text graph with sink and query nodes. 
! 
! 
• Walk biased towards terms related to fplant-physiology , while 
simultaneously penalizing terms that are related to fair-pollution. 
• The sink points which are shown in black color are vertices 
whose ranking scores are fixed at the minimum score (zero 
in our case) during the ranking process. 
7
TRDMS: TextRank using Ranking on Data 
Manifolds with Sinks approach • One can view p as a vector i.e. p = [p1,....,pM ]. A binary vector y = [y1,....,yM ] is defined in which yi 
= 1 if wi is contained Vl 
d, otherwise yi = 0. 
• Vertices are sorted in reverse order of their score. Let this ranked list be represented as <TK>. 
• In the text “calibrated instruments are used to measure”, if the unigram terms calibrated and 
instruments are selected as potential/candidate terms since they are adjacent they are 
collapsed into one single keyphrase “calibrated instruments”. This heuristic is implemented as a 
function which is referred as kphrasegen(<TK>,d). 
8
Experiment 
• We use a subset of the multi-label corpus EUR-Lex labeled with 
EUROVOC descriptors. 
• EUROVOC is a multilingual thesaurus maintained by the 
Publications Office of the European Union. 
• A document in this data-set could be associated with multiple 
EUROVOC descriptors. 
• The data set that was downloaded contained 16k documents 
documents and 3,993 EUROVOC descriptors. 
• We removed labels represented in this data set. We refer to this 
data set as the EUR-Lexfiltered data set. 
9
Result 
• Two graduate students were asked to manually extract 
label-specific keyphrases for each document.This results 
in a total of 1721 keyphrases. 
• Any annotation conflicts between the two subjects was 
resolved by a third graduate student. 
10
Result • In order to investigate how the size of the 
evidence set impacts the performance of our 
system the following simulation was carried 
out. 
11
Future work 
• We would like to further assess the quality of 
the generated keyphrases by using these 
keyphrases for generating topic (or label) 
focused document summaries. 
12

More Related Content

What's hot

Tdm information retrieval
Tdm information retrievalTdm information retrieval
Tdm information retrievalKU Leuven
 
Text data mining1
Text data mining1Text data mining1
Text data mining1KU Leuven
 
Information Retrieval
Information RetrievalInformation Retrieval
Information Retrievalssbd6985
 
SemFacet paper
SemFacet paperSemFacet paper
SemFacet paperDBOnto
 
Translating Ontologies in Real-World Settings
Translating Ontologies in Real-World SettingsTranslating Ontologies in Real-World Settings
Translating Ontologies in Real-World SettingsMauro Dragoni
 
The vector space model
The vector space modelThe vector space model
The vector space modelpkgosh
 
Text Data Mining
Text Data MiningText Data Mining
Text Data MiningKU Leuven
 
Comparisons of ranking algorithms
Comparisons of ranking algorithmsComparisons of ranking algorithms
Comparisons of ranking algorithmsPravin Patil
 
Query Translation for Data Sources with Heterogeneous Content Semantics
Query Translation for Data Sources with Heterogeneous Content Semantics Query Translation for Data Sources with Heterogeneous Content Semantics
Query Translation for Data Sources with Heterogeneous Content Semantics Jie Bao
 
Matching and merging anonymous terms from web sources
Matching and merging anonymous terms from web sourcesMatching and merging anonymous terms from web sources
Matching and merging anonymous terms from web sourcesIJwest
 
Effective and Efficient Entity Search in RDF data
Effective and Efficient Entity Search in RDF dataEffective and Efficient Entity Search in RDF data
Effective and Efficient Entity Search in RDF dataRoi Blanco
 
Unit i data structure FYCS MUMBAI UNIVERSITY SEM II
Unit i  data structure FYCS MUMBAI UNIVERSITY SEM II Unit i  data structure FYCS MUMBAI UNIVERSITY SEM II
Unit i data structure FYCS MUMBAI UNIVERSITY SEM II ajay pashankar
 
Scalable Discovery Of Hidden Emails From Large Folders
Scalable Discovery Of Hidden Emails From Large FoldersScalable Discovery Of Hidden Emails From Large Folders
Scalable Discovery Of Hidden Emails From Large Foldersfeiwin
 
score based ranking of documents
score based ranking of documentsscore based ranking of documents
score based ranking of documentsKriti Khanna
 
Automating Relational Database Schema Design for Very Large Semantic Datasets
Automating Relational Database Schema Design for Very Large Semantic DatasetsAutomating Relational Database Schema Design for Very Large Semantic Datasets
Automating Relational Database Schema Design for Very Large Semantic DatasetsThomas Lee
 
Overview of Indexing In Object Oriented Database
Overview of Indexing In Object Oriented DatabaseOverview of Indexing In Object Oriented Database
Overview of Indexing In Object Oriented DatabaseEditor IJMTER
 
Making data FAIR using InterMine
Making data FAIR using InterMineMaking data FAIR using InterMine
Making data FAIR using InterMineJustin Clark-Casey
 

What's hot (20)

Tdm information retrieval
Tdm information retrievalTdm information retrieval
Tdm information retrieval
 
Text data mining1
Text data mining1Text data mining1
Text data mining1
 
Information Retrieval
Information RetrievalInformation Retrieval
Information Retrieval
 
SemFacet paper
SemFacet paperSemFacet paper
SemFacet paper
 
Translating Ontologies in Real-World Settings
Translating Ontologies in Real-World SettingsTranslating Ontologies in Real-World Settings
Translating Ontologies in Real-World Settings
 
The vector space model
The vector space modelThe vector space model
The vector space model
 
IR
IRIR
IR
 
Text Data Mining
Text Data MiningText Data Mining
Text Data Mining
 
Comparisons of ranking algorithms
Comparisons of ranking algorithmsComparisons of ranking algorithms
Comparisons of ranking algorithms
 
Query Translation for Data Sources with Heterogeneous Content Semantics
Query Translation for Data Sources with Heterogeneous Content Semantics Query Translation for Data Sources with Heterogeneous Content Semantics
Query Translation for Data Sources with Heterogeneous Content Semantics
 
NLP and LSA getting started
NLP and LSA getting startedNLP and LSA getting started
NLP and LSA getting started
 
Text mining
Text miningText mining
Text mining
 
Matching and merging anonymous terms from web sources
Matching and merging anonymous terms from web sourcesMatching and merging anonymous terms from web sources
Matching and merging anonymous terms from web sources
 
Effective and Efficient Entity Search in RDF data
Effective and Efficient Entity Search in RDF dataEffective and Efficient Entity Search in RDF data
Effective and Efficient Entity Search in RDF data
 
Unit i data structure FYCS MUMBAI UNIVERSITY SEM II
Unit i  data structure FYCS MUMBAI UNIVERSITY SEM II Unit i  data structure FYCS MUMBAI UNIVERSITY SEM II
Unit i data structure FYCS MUMBAI UNIVERSITY SEM II
 
Scalable Discovery Of Hidden Emails From Large Folders
Scalable Discovery Of Hidden Emails From Large FoldersScalable Discovery Of Hidden Emails From Large Folders
Scalable Discovery Of Hidden Emails From Large Folders
 
score based ranking of documents
score based ranking of documentsscore based ranking of documents
score based ranking of documents
 
Automating Relational Database Schema Design for Very Large Semantic Datasets
Automating Relational Database Schema Design for Very Large Semantic DatasetsAutomating Relational Database Schema Design for Very Large Semantic Datasets
Automating Relational Database Schema Design for Very Large Semantic Datasets
 
Overview of Indexing In Object Oriented Database
Overview of Indexing In Object Oriented DatabaseOverview of Indexing In Object Oriented Database
Overview of Indexing In Object Oriented Database
 
Making data FAIR using InterMine
Making data FAIR using InterMineMaking data FAIR using InterMine
Making data FAIR using InterMine
 

Similar to Coling2014:Single Document Keyphrase Extraction Using Label Information

Document Classification and Clustering
Document Classification and ClusteringDocument Classification and Clustering
Document Classification and ClusteringAnkur Shrivastava
 
IRS-Cataloging and Indexing-2.1.pptx
IRS-Cataloging and Indexing-2.1.pptxIRS-Cataloging and Indexing-2.1.pptx
IRS-Cataloging and Indexing-2.1.pptxShivaVemula2
 
Extractive Document Summarization - An Unsupervised Approach
Extractive Document Summarization - An Unsupervised ApproachExtractive Document Summarization - An Unsupervised Approach
Extractive Document Summarization - An Unsupervised ApproachFindwise
 
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...Spark Summit
 
Understanding Natural Languange with Corpora-based Generation of Dependency G...
Understanding Natural Languange with Corpora-based Generation of Dependency G...Understanding Natural Languange with Corpora-based Generation of Dependency G...
Understanding Natural Languange with Corpora-based Generation of Dependency G...Edmond Lepedus
 
IRE Semantic Annotation of Documents
IRE Semantic Annotation of Documents IRE Semantic Annotation of Documents
IRE Semantic Annotation of Documents Sharvil Katariya
 
Chapter 6 Query Language .pdf
Chapter 6 Query Language .pdfChapter 6 Query Language .pdf
Chapter 6 Query Language .pdfHabtamu100
 
4.4 text mining
4.4 text mining4.4 text mining
4.4 text miningKrish_ver2
 
Name IDPractical Data MiningCOMP-321BTutorial 5.docx
Name IDPractical Data MiningCOMP-321BTutorial 5.docxName IDPractical Data MiningCOMP-321BTutorial 5.docx
Name IDPractical Data MiningCOMP-321BTutorial 5.docxrosemarybdodson23141
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudOntotext
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Spark Summit
 
Project Presentation
Project PresentationProject Presentation
Project Presentationbutest
 
Reference Scope Identification of Citances Using Convolutional Neural Network
Reference Scope Identification of Citances Using Convolutional Neural NetworkReference Scope Identification of Citances Using Convolutional Neural Network
Reference Scope Identification of Citances Using Convolutional Neural NetworkSaurav Jha
 
Extraction Based automatic summarization
Extraction Based automatic summarizationExtraction Based automatic summarization
Extraction Based automatic summarizationAbdelaziz Al-Rihawi
 
Multi label text classification
Multi label text classificationMulti label text classification
Multi label text classificationraghavr186
 
NIPS 2017 Competition Track : Personalized Cancer Treatment -- Classifying Cl...
NIPS 2017 Competition Track : Personalized Cancer Treatment -- Classifying Cl...NIPS 2017 Competition Track : Personalized Cancer Treatment -- Classifying Cl...
NIPS 2017 Competition Track : Personalized Cancer Treatment -- Classifying Cl...Nishant Kumar
 

Similar to Coling2014:Single Document Keyphrase Extraction Using Label Information (20)

Document Classification and Clustering
Document Classification and ClusteringDocument Classification and Clustering
Document Classification and Clustering
 
IRS-Cataloging and Indexing-2.1.pptx
IRS-Cataloging and Indexing-2.1.pptxIRS-Cataloging and Indexing-2.1.pptx
IRS-Cataloging and Indexing-2.1.pptx
 
Extractive Document Summarization - An Unsupervised Approach
Extractive Document Summarization - An Unsupervised ApproachExtractive Document Summarization - An Unsupervised Approach
Extractive Document Summarization - An Unsupervised Approach
 
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
 
Understanding Natural Languange with Corpora-based Generation of Dependency G...
Understanding Natural Languange with Corpora-based Generation of Dependency G...Understanding Natural Languange with Corpora-based Generation of Dependency G...
Understanding Natural Languange with Corpora-based Generation of Dependency G...
 
IRE Semantic Annotation of Documents
IRE Semantic Annotation of Documents IRE Semantic Annotation of Documents
IRE Semantic Annotation of Documents
 
Chapter 6 Query Language .pdf
Chapter 6 Query Language .pdfChapter 6 Query Language .pdf
Chapter 6 Query Language .pdf
 
4.4 text mining
4.4 text mining4.4 text mining
4.4 text mining
 
Name IDPractical Data MiningCOMP-321BTutorial 5.docx
Name IDPractical Data MiningCOMP-321BTutorial 5.docxName IDPractical Data MiningCOMP-321BTutorial 5.docx
Name IDPractical Data MiningCOMP-321BTutorial 5.docx
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
 
Text mining
Text miningText mining
Text mining
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
Project Presentation
Project PresentationProject Presentation
Project Presentation
 
Reference Scope Identification of Citances Using Convolutional Neural Network
Reference Scope Identification of Citances Using Convolutional Neural NetworkReference Scope Identification of Citances Using Convolutional Neural Network
Reference Scope Identification of Citances Using Convolutional Neural Network
 
Ibm haifa.mq.final
Ibm haifa.mq.finalIbm haifa.mq.final
Ibm haifa.mq.final
 
A-Study_TopicModeling
A-Study_TopicModelingA-Study_TopicModeling
A-Study_TopicModeling
 
Harvester_presentaion
Harvester_presentaionHarvester_presentaion
Harvester_presentaion
 
Extraction Based automatic summarization
Extraction Based automatic summarizationExtraction Based automatic summarization
Extraction Based automatic summarization
 
Multi label text classification
Multi label text classificationMulti label text classification
Multi label text classification
 
NIPS 2017 Competition Track : Personalized Cancer Treatment -- Classifying Cl...
NIPS 2017 Competition Track : Personalized Cancer Treatment -- Classifying Cl...NIPS 2017 Competition Track : Personalized Cancer Treatment -- Classifying Cl...
NIPS 2017 Competition Track : Personalized Cancer Treatment -- Classifying Cl...
 

Recently uploaded

Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .Satyam Kumar
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitterShivangiSharma879191
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
An introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptxAn introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptxPurva Nikam
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncssuser2ae721
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfme23b1001
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 

Recently uploaded (20)

Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
An introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptxAn introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptx
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdf
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 

Coling2014:Single Document Keyphrase Extraction Using Label Information

  • 1. COLING2014_reading_seminar Single Document Keyphrase Extraction Using Label Information tokyo metropolitan university M2 Ryuichi Tachibana
  • 2. Abstract • Keyphrases have found wide ranging application in NLP and IR tasks. • We pose the problem of extracting label specific keyphrases from a document which has labels or tags. • Our proposed methods utilizes both the document’s text and label information. • The objective of this paper is to explore extensions to TextRank for extracting label-specific keyphrases from a multi-labeled document. 2
  • 3. TextRank • The TextRank models the text as a graph, text units of various sizes and characteristics can be added as vertices. • Connections can be drawn between these vertices e.g. lexical or semantic relation, etc. • To identify “key” text units in this text graph, TextRank runs the PageRank algorithm on this constructed graph. • The ranking over vertices is obtained by finding the stationary distribution of the random walk on the text graph. 3
  • 4. TextRank for multi-labeled document • We consider the problem of extracting label specific keyphrases from a document which has labels. ! ! ! • These keyphrases, though central to the article, are not specific to any of the labels that have been assigned to the article. • The objective of this paper is to explore extensions to TextRank for extracting label-specific keyphrases from a multi-labeled document. 4
  • 5. Proposed Method • We propose an approach that models the problem of finding label-specific keyphrases in a document as a random walk on the document’s text-graph. • Two approaches are proposed namely PTR: Personalized TextRank and TRDMS: TextRank using Ranking on Data Manifolds with Sinks. 5
  • 6. PTR: Personalized TextRank • In this setting the PageRank algorithm is replaced with the personalized page rank. • Identify a set of label specific features fl cand that are strongly correlated with the label. • By using the label specific features fl cand as the personalization vector we are able to bias the walk on the underlying text graph towards terms relevant to the label. • The label specific features for label Plant Physiology is denoted as follows. The term colored in red indicates the term that is common between fplant-physiology and Gd. ! ! ! • But terms relevant to labels other than l continue to influence the walk. 6
  • 7. TRDMS: TextRank using Ranking on Data Manifolds with Sinks approach • We model the problem of identifying label specific keyphrases in a given document as a random walk over the document’s weighted text graph with sink and query nodes. ! ! • Walk biased towards terms related to fplant-physiology , while simultaneously penalizing terms that are related to fair-pollution. • The sink points which are shown in black color are vertices whose ranking scores are fixed at the minimum score (zero in our case) during the ranking process. 7
  • 8. TRDMS: TextRank using Ranking on Data Manifolds with Sinks approach • One can view p as a vector i.e. p = [p1,....,pM ]. A binary vector y = [y1,....,yM ] is defined in which yi = 1 if wi is contained Vl d, otherwise yi = 0. • Vertices are sorted in reverse order of their score. Let this ranked list be represented as <TK>. • In the text “calibrated instruments are used to measure”, if the unigram terms calibrated and instruments are selected as potential/candidate terms since they are adjacent they are collapsed into one single keyphrase “calibrated instruments”. This heuristic is implemented as a function which is referred as kphrasegen(<TK>,d). 8
  • 9. Experiment • We use a subset of the multi-label corpus EUR-Lex labeled with EUROVOC descriptors. • EUROVOC is a multilingual thesaurus maintained by the Publications Office of the European Union. • A document in this data-set could be associated with multiple EUROVOC descriptors. • The data set that was downloaded contained 16k documents documents and 3,993 EUROVOC descriptors. • We removed labels represented in this data set. We refer to this data set as the EUR-Lexfiltered data set. 9
  • 10. Result • Two graduate students were asked to manually extract label-specific keyphrases for each document.This results in a total of 1721 keyphrases. • Any annotation conflicts between the two subjects was resolved by a third graduate student. 10
  • 11. Result • In order to investigate how the size of the evidence set impacts the performance of our system the following simulation was carried out. 11
  • 12. Future work • We would like to further assess the quality of the generated keyphrases by using these keyphrases for generating topic (or label) focused document summaries. 12