SlideShare a Scribd company logo
Diversity Measure for text
Summarization
Guided by: Dr.Vasudev Verma
Mentor: Litton J Kurisinkel
Submitted By:- Group No.27
Nirat Attri (201201013)
Dhruva Das (201301151)
Siddharth Saklecha (201505570)
Introduction
➢ A summary is brief but detailed outline of a
document, which conveys the essence of
the document.
➢ The goal of a text Summarizer is
condensing the source text into a shorter
version while its overall information and
meaning remains same.
➢ The Generated Summary should cover
relevant topics in the original corpus and
diverse enough.
Motivation
➢ A short summary, which conveys the
essence of the document, helps in finding
relevant information quickly
➢ Text summarization also provides a way to
cluster similar documents and present a
summary.
➢ Text summarization has become an
important and timely tool for assisting and
interpreting text information in today’ fast-
growing information age.
Problem Statement :
➢ Derive a method to improve efficiency of the task
of Text Summarization.
➢ We design novel methods to improve efficiency of the
task of Text Summarization using a class of sub
modular functions.
➢ These functions each combine two terms, one which
encourages the summary to be representative of the
corpus (coverage), and the other which positively
rewards diversity.
➢ Our functions are monotone non-decreasing and
submodular, which means that an efficient scalable
greedy optimization scheme has a constant factor
guarantee of optimality.
Proposed Solution :
Submodular Function:
➢ Sub-modular functions are those that satisfy the
property of diminishing returns: for any A⊆B⊆ V
v, a sub-modular function F must satisfy
F(A+v) - F(A) >= F(B + v) - F(B).
That is, the incremental value of v decreases as
the context in which v is considered grows from A
to B.
➢ An equivalent definition, useful mathematically, is
that for any A,B⊆V, we must have that
F(A)+F(B) >= F(AUB)+F(AnB).
If this is satisfied everywhere with equality, then
the function F is called modular.
Summarization Tools :
➢ CLUTO
It is a software package for clustering low
and high dimensional datasets and for
analyzing the characteristics of the various
clusters.
➢ ROUGE
Recall-Oriented Understudy for Gisting
Evaluation, is a set of metrics and a
software package used for evaluating
automatic summarization and machine
translation software in natural language
processing.
Approach
➢ Two properties of a good summary are relevance and non
redundancy.
➢ Objective functions for extractive summarization usually
measure these two separately and then mix them together
trading off encouraging relevance and penalizing
Redundancy.
➢ The redundancy penalty usually violates the monotonicity
of the objective functions.
➢ In particular, we model the summary quality as
F(S) = L(S) + λ R(S)
where, L(S) measures the coverage, or fidelity, of summary
set S to the document, R(S) rewards diversity in S, and λ is
a trade-off coefficient.
➢ Coverage Measure:
L(S) can be interpreted either as a set function that
measures the similarity of summary set S to the document
to be summarized, or as a function representing some
form of coverage of V by S.
L(S) should be monotone, as coverage improves with a
larger summary.
Approach Continue…
Shannon entropy is a well-known monotone submodular
function. So, we take our coverage function as:
L(S) = Σ min { Ci(S) , α Ci(V) } and i ∈ V
Basically, Ci(S) measures how similar S is to element i, or how
much of i is covered by S and Ci (V) is just the largest value
that Ci(S) can achieve.
➢Diversity Measure :
where Pi, i = 1, ...,K is a partition of the ground set V into
separate clusters, and ri ≥ 0 indicates the singleton reward of i
(i.e., the reward of adding i into the empty set). The value ri
estimates the
importance of i to the summary.
Approach 1 : K-means Clustering
➢ The dataset was fed into CLUTO to perform K-means and
Hierarchical Clustering to obtain clusters referring to
similar data.
➢ We ran a Grid Search on the values of and to get the best
optimal value to maximize the sub modular function.
➢ Algorithm:
➢ Summary → ⌀
➢ allowedClusters ← allClusters
➢ while size(Summary) ≤ 665:
• pick the cluster most similar to corpus from
allowedClusters → chosenCluster
• chosenSentence ← highest ranking sentence of
chosenCluster based on coverage and diversity
measure
• Summary ← chosenSentence
Approach 2 : Agglomerative
Clustering
➢ Agglomerative clustering(also called Hierarchical clustering
analysis or HCA) is a method of cluster analysis which
seeks to build a hierarchy of cluster.
➢ It is a bottom up approac. Each observation starts in its
own cluster, and pairs of clusters are merged as one moves
up the hierarchy.
➢ O(n^3) approach.
➢ Each observation starts in its own cluster and clusters are
succesively merged together. The linkage criteria
determines the metric used for the merge strategy.
➢ In computing the clusters, we used Cosine Similarity
criteria.
Challenges Faced
➢ One the challenges that we faced was to understand the
Output given by the Tool Cluto, which Clusters the given
sentences. After which finding the summary of the folder
was just a basic implementation of the formula given in the
paper.
➢ Another challenge was that the time taken to find the
summary for a single folder was quite large. This time
could be drastically reduced by not using the condition of
Sub-modular formula f(A+v) - f(A) > f(B+v) - f(B), that is, the
incremental value of v decreases as the context in which v
is considered grows from A to B. But this would be at the
cost of accuracy.
Results and Conclusion
➢ The values for λ and α values for the diversity and
coverage measures giving us the submodular function were
calculated using a sweep search for the best values of
ROGUE scores.
➢ We recovered the best values as follows :
α = 15
λ = 4
➢ The clusters were summarized and their ROGUE scores
calculated to estimate the efficiency
➢ We can conclude by saying that K-means and Heirarchical
clustering using a weighted consideration for both, the
diversity and coverage, gives us the best scores in Text
Summarization.
Apporach ROUGE-R ROUGE-F
K-Means and hierarchical 0.3843 0.3792
Agglomerative 0.3724 0.3674
References
➢ Vishal Gupta and Gurpreet Singh Lehal, A Survey of Text
Summarization Extractive Techniques.
➢ Hui Lin and Jeff Bilmes, A Class of Submodular Functions
for Document Summerization.
➢ Hui Lin and Jeff Bilmes, Multi-document Summarization
via Budgeted Maximization of Submodular Function.
➢ Ying Zhao and George Karypis, Criterion Functions for
Document Clustering Experiments and Analysis.
➢ Chin-Yew LIN, A Package for Automatic Evaluation of
Summaries.
➢ DUC 2004,http://duc.nist.gov/data.html
Thank You!!

More Related Content

What's hot

Probabilistic Retrieval
Probabilistic RetrievalProbabilistic Retrieval
Probabilistic Retrieval
otisg
 
Harendra Singh Rawat,BCA 2nd Year
Harendra Singh Rawat,BCA 2nd YearHarendra Singh Rawat,BCA 2nd Year
Harendra Singh Rawat,BCA 2nd Year
dezyneecole
 
Error Estimates for Multi-Penalty Regularization under General Source Condition
Error Estimates for Multi-Penalty Regularization under General Source ConditionError Estimates for Multi-Penalty Regularization under General Source Condition
Error Estimates for Multi-Penalty Regularization under General Source Condition
csandit
 
TextRank: Bringing Order into Texts
TextRank: Bringing Order into TextsTextRank: Bringing Order into Texts
TextRank: Bringing Order into Texts
Sharath TS
 
Mining the social web 6
Mining the social web 6Mining the social web 6
Mining the social web 6
HyeonSeok Choi
 
Topic model an introduction
Topic model an introductionTopic model an introduction
Topic model an introduction
Yueshen Xu
 
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deren Lei
 
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet AllocationA Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
Tomonari Masada
 
An Introduction to Radical Minimalism: Merge & Agree
An Introduction to Radical Minimalism: Merge & AgreeAn Introduction to Radical Minimalism: Merge & Agree
An Introduction to Radical Minimalism: Merge & Agree
Diego Krivochen
 
NLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic ClassificationNLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic Classification
Eugene Nho
 
Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-
Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-
Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-Cemal Ardil
 
The Impact Of Semantic Handshakes
The Impact Of Semantic HandshakesThe Impact Of Semantic Handshakes
The Impact Of Semantic Handshakes
Lutz Maicher
 
Topic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic ModelsTopic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic Models
Claudia Wagner
 
Instance based learning
Instance based learningInstance based learning
Instance based learning
swapnac12
 
TopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxTopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxKalpit Desai
 

What's hot (20)

Probabilistic Retrieval
Probabilistic RetrievalProbabilistic Retrieval
Probabilistic Retrieval
 
Harendra Singh Rawat,BCA 2nd Year
Harendra Singh Rawat,BCA 2nd YearHarendra Singh Rawat,BCA 2nd Year
Harendra Singh Rawat,BCA 2nd Year
 
Efficient projections
Efficient projectionsEfficient projections
Efficient projections
 
Error Estimates for Multi-Penalty Regularization under General Source Condition
Error Estimates for Multi-Penalty Regularization under General Source ConditionError Estimates for Multi-Penalty Regularization under General Source Condition
Error Estimates for Multi-Penalty Regularization under General Source Condition
 
TextRank: Bringing Order into Texts
TextRank: Bringing Order into TextsTextRank: Bringing Order into Texts
TextRank: Bringing Order into Texts
 
Mining the social web 6
Mining the social web 6Mining the social web 6
Mining the social web 6
 
Topic model an introduction
Topic model an introductionTopic model an introduction
Topic model an introduction
 
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
 
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet AllocationA Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
 
An Introduction to Radical Minimalism: Merge & Agree
An Introduction to Radical Minimalism: Merge & AgreeAn Introduction to Radical Minimalism: Merge & Agree
An Introduction to Radical Minimalism: Merge & Agree
 
NLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic ClassificationNLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic Classification
 
Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-
Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-
Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-
 
Ca notes
Ca notesCa notes
Ca notes
 
The Impact Of Semantic Handshakes
The Impact Of Semantic HandshakesThe Impact Of Semantic Handshakes
The Impact Of Semantic Handshakes
 
Searching techniques
Searching techniquesSearching techniques
Searching techniques
 
H-MLQ
H-MLQH-MLQ
H-MLQ
 
Topic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic ModelsTopic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic Models
 
KMAP PAPER (1)
KMAP PAPER (1)KMAP PAPER (1)
KMAP PAPER (1)
 
Instance based learning
Instance based learningInstance based learning
Instance based learning
 
TopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxTopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptx
 

Similar to Ire final

TEXT CLUSTERING.doc
TEXT CLUSTERING.docTEXT CLUSTERING.doc
TEXT CLUSTERING.doc
naveenchaurasia
 
Comparative study of classification algorithm for text based categorization
Comparative study of classification algorithm for text based categorizationComparative study of classification algorithm for text based categorization
Comparative study of classification algorithm for text based categorization
eSAT Journals
 
35000120030_Aritra Kundu_Operations Research.pdf
35000120030_Aritra Kundu_Operations Research.pdf35000120030_Aritra Kundu_Operations Research.pdf
35000120030_Aritra Kundu_Operations Research.pdf
JuliusCaesar36
 
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
IJDKP
 
E1062530
E1062530E1062530
E1062530
IJERD Editor
 
Duality Theory in Multi Objective Linear Programming Problems
Duality Theory in Multi Objective Linear Programming ProblemsDuality Theory in Multi Objective Linear Programming Problems
Duality Theory in Multi Objective Linear Programming Problems
theijes
 
A0311010106
A0311010106A0311010106
A0311010106theijes
 
Transformation Functions for Text Classification: A case study with StackOver...
Transformation Functions for Text Classification: A case study with StackOver...Transformation Functions for Text Classification: A case study with StackOver...
Transformation Functions for Text Classification: A case study with StackOver...
Sebastian Ruder
 
Global Citation Recommendations Using Knowledge Graphs
Global Citation Recommendations Using Knowledge GraphsGlobal Citation Recommendations Using Knowledge Graphs
Global Citation Recommendations Using Knowledge Graphs
Eötvös Loránd University
 
IRE Semantic Annotation of Documents
IRE Semantic Annotation of Documents IRE Semantic Annotation of Documents
IRE Semantic Annotation of Documents
Sharvil Katariya
 
Harnessing Deep Neural Networks with Logic Rules
Harnessing Deep Neural Networks with Logic RulesHarnessing Deep Neural Networks with Logic Rules
Harnessing Deep Neural Networks with Logic Rules
Sho Takase
 
Neural nw k means
Neural nw k meansNeural nw k means
Neural nw k means
Eng. Dr. Dennis N. Mwighusa
 
Large Scale Hierarchical Text Classification
Large Scale Hierarchical Text ClassificationLarge Scale Hierarchical Text Classification
Large Scale Hierarchical Text Classification
Hammad Haleem
 
Understanding Natural Languange with Corpora-based Generation of Dependency G...
Understanding Natural Languange with Corpora-based Generation of Dependency G...Understanding Natural Languange with Corpora-based Generation of Dependency G...
Understanding Natural Languange with Corpora-based Generation of Dependency G...Edmond Lepedus
 
Text Summarization
Text SummarizationText Summarization
Text Summarization
Prabhakar Bikkaneti
 
Cost versus distance_in_the_traveling_sa_79149
Cost versus distance_in_the_traveling_sa_79149Cost versus distance_in_the_traveling_sa_79149
Cost versus distance_in_the_traveling_sa_79149olimpica
 
Instance Learning and Genetic Algorithm by Dr.C.R.Dhivyaa Kongu Engineering C...
Instance Learning and Genetic Algorithm by Dr.C.R.Dhivyaa Kongu Engineering C...Instance Learning and Genetic Algorithm by Dr.C.R.Dhivyaa Kongu Engineering C...
Instance Learning and Genetic Algorithm by Dr.C.R.Dhivyaa Kongu Engineering C...
Dhivyaa C.R
 
UNIT IV (4).pptx
UNIT IV (4).pptxUNIT IV (4).pptx
UNIT IV (4).pptx
DrDhivyaaCRAssistant
 
Evolving Reinforcement Learning Algorithms, JD. Co-Reyes et al, 2021
Evolving Reinforcement Learning Algorithms, JD. Co-Reyes et al, 2021Evolving Reinforcement Learning Algorithms, JD. Co-Reyes et al, 2021
Evolving Reinforcement Learning Algorithms, JD. Co-Reyes et al, 2021
Chris Ohk
 

Similar to Ire final (20)

TEXT CLUSTERING.doc
TEXT CLUSTERING.docTEXT CLUSTERING.doc
TEXT CLUSTERING.doc
 
Efficient projections
Efficient projectionsEfficient projections
Efficient projections
 
Comparative study of classification algorithm for text based categorization
Comparative study of classification algorithm for text based categorizationComparative study of classification algorithm for text based categorization
Comparative study of classification algorithm for text based categorization
 
35000120030_Aritra Kundu_Operations Research.pdf
35000120030_Aritra Kundu_Operations Research.pdf35000120030_Aritra Kundu_Operations Research.pdf
35000120030_Aritra Kundu_Operations Research.pdf
 
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
 
E1062530
E1062530E1062530
E1062530
 
Duality Theory in Multi Objective Linear Programming Problems
Duality Theory in Multi Objective Linear Programming ProblemsDuality Theory in Multi Objective Linear Programming Problems
Duality Theory in Multi Objective Linear Programming Problems
 
A0311010106
A0311010106A0311010106
A0311010106
 
Transformation Functions for Text Classification: A case study with StackOver...
Transformation Functions for Text Classification: A case study with StackOver...Transformation Functions for Text Classification: A case study with StackOver...
Transformation Functions for Text Classification: A case study with StackOver...
 
Global Citation Recommendations Using Knowledge Graphs
Global Citation Recommendations Using Knowledge GraphsGlobal Citation Recommendations Using Knowledge Graphs
Global Citation Recommendations Using Knowledge Graphs
 
IRE Semantic Annotation of Documents
IRE Semantic Annotation of Documents IRE Semantic Annotation of Documents
IRE Semantic Annotation of Documents
 
Harnessing Deep Neural Networks with Logic Rules
Harnessing Deep Neural Networks with Logic RulesHarnessing Deep Neural Networks with Logic Rules
Harnessing Deep Neural Networks with Logic Rules
 
Neural nw k means
Neural nw k meansNeural nw k means
Neural nw k means
 
Large Scale Hierarchical Text Classification
Large Scale Hierarchical Text ClassificationLarge Scale Hierarchical Text Classification
Large Scale Hierarchical Text Classification
 
Understanding Natural Languange with Corpora-based Generation of Dependency G...
Understanding Natural Languange with Corpora-based Generation of Dependency G...Understanding Natural Languange with Corpora-based Generation of Dependency G...
Understanding Natural Languange with Corpora-based Generation of Dependency G...
 
Text Summarization
Text SummarizationText Summarization
Text Summarization
 
Cost versus distance_in_the_traveling_sa_79149
Cost versus distance_in_the_traveling_sa_79149Cost versus distance_in_the_traveling_sa_79149
Cost versus distance_in_the_traveling_sa_79149
 
Instance Learning and Genetic Algorithm by Dr.C.R.Dhivyaa Kongu Engineering C...
Instance Learning and Genetic Algorithm by Dr.C.R.Dhivyaa Kongu Engineering C...Instance Learning and Genetic Algorithm by Dr.C.R.Dhivyaa Kongu Engineering C...
Instance Learning and Genetic Algorithm by Dr.C.R.Dhivyaa Kongu Engineering C...
 
UNIT IV (4).pptx
UNIT IV (4).pptxUNIT IV (4).pptx
UNIT IV (4).pptx
 
Evolving Reinforcement Learning Algorithms, JD. Co-Reyes et al, 2021
Evolving Reinforcement Learning Algorithms, JD. Co-Reyes et al, 2021Evolving Reinforcement Learning Algorithms, JD. Co-Reyes et al, 2021
Evolving Reinforcement Learning Algorithms, JD. Co-Reyes et al, 2021
 

Recently uploaded

Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.
PrashantGoswami42
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
Robbie Edward Sayers
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
Pratik Pawar
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
seandesed
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
Amil Baba Dawood bangali
 
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
H.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdfH.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdf
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
AJAYKUMARPUND1
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
Jayaprasanna4
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
karthi keyan
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Dr.Costas Sachpazis
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
Osamah Alsalih
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
TeeVichai
 
LIGA(E)11111111111111111111111111111111111111111.ppt
LIGA(E)11111111111111111111111111111111111111111.pptLIGA(E)11111111111111111111111111111111111111111.ppt
LIGA(E)11111111111111111111111111111111111111111.ppt
ssuser9bd3ba
 
Automobile Management System Project Report.pdf
Automobile Management System Project Report.pdfAutomobile Management System Project Report.pdf
Automobile Management System Project Report.pdf
Kamal Acharya
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
obonagu
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Teleport Manpower Consultant
 
Halogenation process of chemical process industries
Halogenation process of chemical process industriesHalogenation process of chemical process industries
Halogenation process of chemical process industries
MuhammadTufail242431
 
Courier management system project report.pdf
Courier management system project report.pdfCourier management system project report.pdf
Courier management system project report.pdf
Kamal Acharya
 
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdfCOLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
Kamal Acharya
 

Recently uploaded (20)

Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
 
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
H.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdfH.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdf
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
 
LIGA(E)11111111111111111111111111111111111111111.ppt
LIGA(E)11111111111111111111111111111111111111111.pptLIGA(E)11111111111111111111111111111111111111111.ppt
LIGA(E)11111111111111111111111111111111111111111.ppt
 
Automobile Management System Project Report.pdf
Automobile Management System Project Report.pdfAutomobile Management System Project Report.pdf
Automobile Management System Project Report.pdf
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
 
Halogenation process of chemical process industries
Halogenation process of chemical process industriesHalogenation process of chemical process industries
Halogenation process of chemical process industries
 
Courier management system project report.pdf
Courier management system project report.pdfCourier management system project report.pdf
Courier management system project report.pdf
 
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdfCOLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
 

Ire final

  • 1. Diversity Measure for text Summarization Guided by: Dr.Vasudev Verma Mentor: Litton J Kurisinkel Submitted By:- Group No.27 Nirat Attri (201201013) Dhruva Das (201301151) Siddharth Saklecha (201505570)
  • 2. Introduction ➢ A summary is brief but detailed outline of a document, which conveys the essence of the document. ➢ The goal of a text Summarizer is condensing the source text into a shorter version while its overall information and meaning remains same. ➢ The Generated Summary should cover relevant topics in the original corpus and diverse enough.
  • 3. Motivation ➢ A short summary, which conveys the essence of the document, helps in finding relevant information quickly ➢ Text summarization also provides a way to cluster similar documents and present a summary. ➢ Text summarization has become an important and timely tool for assisting and interpreting text information in today’ fast- growing information age.
  • 4. Problem Statement : ➢ Derive a method to improve efficiency of the task of Text Summarization. ➢ We design novel methods to improve efficiency of the task of Text Summarization using a class of sub modular functions. ➢ These functions each combine two terms, one which encourages the summary to be representative of the corpus (coverage), and the other which positively rewards diversity. ➢ Our functions are monotone non-decreasing and submodular, which means that an efficient scalable greedy optimization scheme has a constant factor guarantee of optimality. Proposed Solution :
  • 5. Submodular Function: ➢ Sub-modular functions are those that satisfy the property of diminishing returns: for any A⊆B⊆ V v, a sub-modular function F must satisfy F(A+v) - F(A) >= F(B + v) - F(B). That is, the incremental value of v decreases as the context in which v is considered grows from A to B. ➢ An equivalent definition, useful mathematically, is that for any A,B⊆V, we must have that F(A)+F(B) >= F(AUB)+F(AnB). If this is satisfied everywhere with equality, then the function F is called modular.
  • 6. Summarization Tools : ➢ CLUTO It is a software package for clustering low and high dimensional datasets and for analyzing the characteristics of the various clusters. ➢ ROUGE Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics and a software package used for evaluating automatic summarization and machine translation software in natural language processing.
  • 7. Approach ➢ Two properties of a good summary are relevance and non redundancy. ➢ Objective functions for extractive summarization usually measure these two separately and then mix them together trading off encouraging relevance and penalizing Redundancy. ➢ The redundancy penalty usually violates the monotonicity of the objective functions. ➢ In particular, we model the summary quality as F(S) = L(S) + λ R(S) where, L(S) measures the coverage, or fidelity, of summary set S to the document, R(S) rewards diversity in S, and λ is a trade-off coefficient. ➢ Coverage Measure: L(S) can be interpreted either as a set function that measures the similarity of summary set S to the document to be summarized, or as a function representing some form of coverage of V by S. L(S) should be monotone, as coverage improves with a larger summary.
  • 8. Approach Continue… Shannon entropy is a well-known monotone submodular function. So, we take our coverage function as: L(S) = Σ min { Ci(S) , α Ci(V) } and i ∈ V Basically, Ci(S) measures how similar S is to element i, or how much of i is covered by S and Ci (V) is just the largest value that Ci(S) can achieve. ➢Diversity Measure : where Pi, i = 1, ...,K is a partition of the ground set V into separate clusters, and ri ≥ 0 indicates the singleton reward of i (i.e., the reward of adding i into the empty set). The value ri estimates the importance of i to the summary.
  • 9. Approach 1 : K-means Clustering ➢ The dataset was fed into CLUTO to perform K-means and Hierarchical Clustering to obtain clusters referring to similar data. ➢ We ran a Grid Search on the values of and to get the best optimal value to maximize the sub modular function. ➢ Algorithm: ➢ Summary → ⌀ ➢ allowedClusters ← allClusters ➢ while size(Summary) ≤ 665: • pick the cluster most similar to corpus from allowedClusters → chosenCluster • chosenSentence ← highest ranking sentence of chosenCluster based on coverage and diversity measure • Summary ← chosenSentence
  • 10. Approach 2 : Agglomerative Clustering ➢ Agglomerative clustering(also called Hierarchical clustering analysis or HCA) is a method of cluster analysis which seeks to build a hierarchy of cluster. ➢ It is a bottom up approac. Each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy. ➢ O(n^3) approach. ➢ Each observation starts in its own cluster and clusters are succesively merged together. The linkage criteria determines the metric used for the merge strategy. ➢ In computing the clusters, we used Cosine Similarity criteria.
  • 11. Challenges Faced ➢ One the challenges that we faced was to understand the Output given by the Tool Cluto, which Clusters the given sentences. After which finding the summary of the folder was just a basic implementation of the formula given in the paper. ➢ Another challenge was that the time taken to find the summary for a single folder was quite large. This time could be drastically reduced by not using the condition of Sub-modular formula f(A+v) - f(A) > f(B+v) - f(B), that is, the incremental value of v decreases as the context in which v is considered grows from A to B. But this would be at the cost of accuracy.
  • 12. Results and Conclusion ➢ The values for λ and α values for the diversity and coverage measures giving us the submodular function were calculated using a sweep search for the best values of ROGUE scores. ➢ We recovered the best values as follows : α = 15 λ = 4 ➢ The clusters were summarized and their ROGUE scores calculated to estimate the efficiency ➢ We can conclude by saying that K-means and Heirarchical clustering using a weighted consideration for both, the diversity and coverage, gives us the best scores in Text Summarization. Apporach ROUGE-R ROUGE-F K-Means and hierarchical 0.3843 0.3792 Agglomerative 0.3724 0.3674
  • 13. References ➢ Vishal Gupta and Gurpreet Singh Lehal, A Survey of Text Summarization Extractive Techniques. ➢ Hui Lin and Jeff Bilmes, A Class of Submodular Functions for Document Summerization. ➢ Hui Lin and Jeff Bilmes, Multi-document Summarization via Budgeted Maximization of Submodular Function. ➢ Ying Zhao and George Karypis, Criterion Functions for Document Clustering Experiments and Analysis. ➢ Chin-Yew LIN, A Package for Automatic Evaluation of Summaries. ➢ DUC 2004,http://duc.nist.gov/data.html