SlideShare a Scribd company logo
1 of 14
Diversity Measure for text
Summarization
Guided by: Dr.Vasudev Verma
Mentor: Litton J Kurisinkel
Submitted By:- Group No.27
Nirat Attri (201201013)
Dhruva Das (201301151)
Siddharth Saklecha (201505570)
Introduction
➢ A summary is brief but detailed outline of a
document, which conveys the essence of
the document.
➢ The goal of a text Summarizer is
condensing the source text into a shorter
version while its overall information and
meaning remains same.
➢ The Generated Summary should cover
relevant topics in the original corpus and
be diverse enough.
Motivation
➢ A short summary, which conveys the
essence of the document, helps in finding
relevant information quickly
➢ Text summarization also provides a way to
cluster similar documents and present a
summary.
➢ Text summarization has become an
important and timely tool for assisting and
interpreting text information in today’ fast-
growing information age.
Problem Statement :
➢ Derive a method to improve efficiency of the task
of Text Summarization.
➢ We design novel methods to improve efficiency of the
task of Text Summarization using a class of sub
modular functions.
➢ These functions each combine two terms, one which
encourages the summary to be representative of the
corpus (coverage), and the other which positively
rewards diversity.
➢ Our functions are monotone non-decreasing and
submodular, which means that an efficient scalable
greedy optimization scheme has a constant factor
guarantee of optimality.
Proposed Solution :
Submodular Function:
➢ Sub-modular functions are those that satisfy the
property of diminishing returns: for any A⊆B⊆ V
v, a sub-modular function F must satisfy
F(A+v) - F(A) >= F(B + v) - F(B).
That is, the incremental value of v decreases as
the context in which v is considered grows from A
to B.
➢ An equivalent definition, useful mathematically, is
that for any A,B⊆V, we must have that
F(A)+F(B) >= F(AUB)+F(AnB).
If this is satisfied everywhere with equality, then
the function F is called modular.
Summarization Tools :
➢ CLUTO
It is a software package for clustering low
and high dimensional datasets and for
analyzing the characteristics of the various
clusters.
➢ ROUGE
Recall-Oriented Understudy for Gisting
Evaluation, is a set of metrics and a
software package used for evaluating
automatic summarization and machine
translation software in natural language
processing.
Approach
➢ Two properties of a good summary are relevance and non
redundancy.
➢ Objective functions for extractive summarization usually
measure these two separately and then mix them together
trading off encouraging relevance and penalizing
Redundancy.
➢ The redundancy penalty usually violates the monotonicity
of the objective functions.
➢ In particular, we model the summary quality as
F(S) = L(S) + λ R(S)
where, L(S) measures the coverage, or fidelity, of summary
set S to the document, R(S) rewards diversity in S, and λ is
a trade-off coefficient.
➢ Coverage Measure:
L(S) can be interpreted either as a set function that
measures the similarity of summary set S to the document
to be summarized, or as a function representing some
form of coverage of V by S.
L(S) should be monotone, as coverage improves with a
larger summary.
Approach Continue…
Shannon entropy is a well-known monotone submodular
function. So, we take our coverage function as:
L(S) = Σ min { Ci(S) , α Ci(V) } and i ∈ V
Basically, Ci(S) measures how similar S is to element i, or how
much of i is covered by S and Ci (V) is just the largest value
that Ci(S) can achieve.
➢Diversity Measure :
where Pi, i = 1, ...,K is a partition of the ground set V into
separate clusters, and ri ≥ 0 indicates the singleton reward of i
(i.e., the reward of adding i into the empty set). The value ri
estimates the
importance of i to the summary.
Approach 1 : K-means Clustering
➢ The dataset was fed into CLUTO to perform K-means
Clustering to obtain clusters referring to similar data.
➢ We ran a Grid Search on the values to get the best optimal
value to maximize the sub modular function.
➢ Algorithm:
➢ Summary → ⌀
➢ allowedClusters ← allClusters
➢ while size(Summary) ≤ 665:
• pick the cluster most similar to corpus from
allowedClusters → chosenCluster
• chosenSentence ← highest ranking sentence of
chosenCluster based on coverage and diversity
measure
• Summary ← chosenSentence
Approach 2 : Agglomerative
Clustering
➢ Agglomerative clustering(also called Hierarchical clustering
analysis or HCA) is a method of cluster analysis which
seeks to build a hierarchy of cluster.
➢ It is a bottom up approach. Each observation starts in its
own cluster, and pairs of clusters are merged as one moves
up the hierarchy.
➢ O(n^3) approach.
➢ Each observation starts in its own cluster and clusters are
succesively merged together. The linkage criteria
determines the metric used for the merge strategy.
➢ In computing the clusters, we used Cosine Similarity
criteria.
Challenges Faced
➢ One of the challenges that we faced was to understand the
Output given by the Tool Cluto, which Clusters the given
sentences. Finding the summary of the folder was just a
basic implementation of the formula given in the paper.
➢ Another challenge was that the time taken to find the
summary for a single folder was quite large. This time
could be drastically reduced by not using the condition of
Sub-modular formula f(A+v) - f(A) > f(B+v) - f(B), that is, the
incremental value of v decreases as the context in which v
is considered grows from A to B. But this would be at the
cost of accuracy.
Results and Conclusion
➢ The values for λ and α for the diversity and coverage
measures giving us the submodular function were
calculated using a sweep search for the best values of
ROGUE scores.
➢ We recovered the best values as follows :
α = 15
λ = 4
➢ The clusters were summarized and their ROGUE scores
calculated to estimate the efficiency
➢ We can conclude by saying that Agglomerative clustering
using a weighted consideration for both, the diversity and
coverage, gives us the best scores in Text Summarization.
Approach ROUGE-R ROUGE-F
Agglomerative 0.3843 0.3792
K-Means 0.3724 0.3674
References
➢ Vishal Gupta and Gurpreet Singh Lehal, A Survey of Text
Summarization Extractive Techniques.
➢ Hui Lin and Jeff Bilmes, A Class of Submodular Functions
for Document Summerization.
➢ Hui Lin and Jeff Bilmes, Multi-document Summarization
via Budgeted Maximization of Submodular Function.
➢ Ying Zhao and George Karypis, Criterion Functions for
Document Clustering Experiments and Analysis.
➢ Chin-Yew LIN, A Package for Automatic Evaluation of
Summaries.
➢ DUC 2004,http://duc.nist.gov/data.html
Thank You!!

More Related Content

What's hot

Harendra Singh Rawat,BCA 2nd Year
Harendra Singh Rawat,BCA 2nd YearHarendra Singh Rawat,BCA 2nd Year
Harendra Singh Rawat,BCA 2nd Yeardezyneecole
 
Error Estimates for Multi-Penalty Regularization under General Source Condition
Error Estimates for Multi-Penalty Regularization under General Source ConditionError Estimates for Multi-Penalty Regularization under General Source Condition
Error Estimates for Multi-Penalty Regularization under General Source Conditioncsandit
 
Probabilistic Retrieval
Probabilistic RetrievalProbabilistic Retrieval
Probabilistic Retrievalotisg
 
Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-
Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-
Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-Cemal Ardil
 
The Impact Of Semantic Handshakes
The Impact Of Semantic HandshakesThe Impact Of Semantic Handshakes
The Impact Of Semantic HandshakesLutz Maicher
 
Topic model an introduction
Topic model an introductionTopic model an introduction
Topic model an introductionYueshen Xu
 
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...Deren Lei
 
(Icca 2014) shortest path analysis in social graphs
(Icca 2014) shortest path analysis in social graphs(Icca 2014) shortest path analysis in social graphs
(Icca 2014) shortest path analysis in social graphsWaqas Nawaz
 
Mining the social web 6
Mining the social web 6Mining the social web 6
Mining the social web 6HyeonSeok Choi
 
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet AllocationA Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet AllocationTomonari Masada
 
Graph-Based Code Completion
Graph-Based Code CompletionGraph-Based Code Completion
Graph-Based Code CompletionMasud Rahman
 
An Introduction to Radical Minimalism: Merge & Agree
An Introduction to Radical Minimalism: Merge & AgreeAn Introduction to Radical Minimalism: Merge & Agree
An Introduction to Radical Minimalism: Merge & AgreeDiego Krivochen
 
NLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic ClassificationNLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic ClassificationEugene Nho
 
Bt0080 fundamentals of algorithms2
Bt0080 fundamentals of algorithms2Bt0080 fundamentals of algorithms2
Bt0080 fundamentals of algorithms2Techglyphs
 
Text classification with fast text elena_meetup_milano_27_june
Text classification with fast text elena_meetup_milano_27_juneText classification with fast text elena_meetup_milano_27_june
Text classification with fast text elena_meetup_milano_27_juneDeep Learning Italia
 

What's hot (20)

Harendra Singh Rawat,BCA 2nd Year
Harendra Singh Rawat,BCA 2nd YearHarendra Singh Rawat,BCA 2nd Year
Harendra Singh Rawat,BCA 2nd Year
 
Error Estimates for Multi-Penalty Regularization under General Source Condition
Error Estimates for Multi-Penalty Regularization under General Source ConditionError Estimates for Multi-Penalty Regularization under General Source Condition
Error Estimates for Multi-Penalty Regularization under General Source Condition
 
Efficient projections
Efficient projectionsEfficient projections
Efficient projections
 
Probabilistic Retrieval
Probabilistic RetrievalProbabilistic Retrieval
Probabilistic Retrieval
 
Ca notes
Ca notesCa notes
Ca notes
 
Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-
Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-
Compact binary-tree-representation-of-logic-function-with-enhanced-throughput-
 
The Impact Of Semantic Handshakes
The Impact Of Semantic HandshakesThe Impact Of Semantic Handshakes
The Impact Of Semantic Handshakes
 
Journal paper 1
Journal paper 1Journal paper 1
Journal paper 1
 
Topic model an introduction
Topic model an introductionTopic model an introduction
Topic model an introduction
 
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
Deep Reinforcement Learning with Distributional Semantic Rewards for Abstract...
 
(Icca 2014) shortest path analysis in social graphs
(Icca 2014) shortest path analysis in social graphs(Icca 2014) shortest path analysis in social graphs
(Icca 2014) shortest path analysis in social graphs
 
Mining the social web 6
Mining the social web 6Mining the social web 6
Mining the social web 6
 
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet AllocationA Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
 
Graph-Based Code Completion
Graph-Based Code CompletionGraph-Based Code Completion
Graph-Based Code Completion
 
H-MLQ
H-MLQH-MLQ
H-MLQ
 
An Introduction to Radical Minimalism: Merge & Agree
An Introduction to Radical Minimalism: Merge & AgreeAn Introduction to Radical Minimalism: Merge & Agree
An Introduction to Radical Minimalism: Merge & Agree
 
NLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic ClassificationNLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic Classification
 
Bt0080 fundamentals of algorithms2
Bt0080 fundamentals of algorithms2Bt0080 fundamentals of algorithms2
Bt0080 fundamentals of algorithms2
 
Searching techniques
Searching techniquesSearching techniques
Searching techniques
 
Text classification with fast text elena_meetup_milano_27_june
Text classification with fast text elena_meetup_milano_27_juneText classification with fast text elena_meetup_milano_27_june
Text classification with fast text elena_meetup_milano_27_june
 

Viewers also liked

PowerPoint of BioclimaticsCompany
PowerPoint of BioclimaticsCompanyPowerPoint of BioclimaticsCompany
PowerPoint of BioclimaticsCompanySergio Frutos C
 
[Citavi Backup] BachelorThesisEichkornJensM_final 2016-08-30 11-21-06+correct...
[Citavi Backup] BachelorThesisEichkornJensM_final 2016-08-30 11-21-06+correct...[Citavi Backup] BachelorThesisEichkornJensM_final 2016-08-30 11-21-06+correct...
[Citavi Backup] BachelorThesisEichkornJensM_final 2016-08-30 11-21-06+correct...Jens M Eichkorn
 
Habilidades comunicativas-desempeno-profesional
Habilidades comunicativas-desempeno-profesionalHabilidades comunicativas-desempeno-profesional
Habilidades comunicativas-desempeno-profesionalJhon Wilders Serpa Villena
 
High resolution two-dimensional electrophoresis as a tool to differentiate wi...
High resolution two-dimensional electrophoresis as a tool to differentiate wi...High resolution two-dimensional electrophoresis as a tool to differentiate wi...
High resolution two-dimensional electrophoresis as a tool to differentiate wi...Egidijus Dauksas
 
Softjourn and the Entertainment industry VOD Live Video Live Events
Softjourn and the Entertainment industry VOD Live Video Live EventsSoftjourn and the Entertainment industry VOD Live Video Live Events
Softjourn and the Entertainment industry VOD Live Video Live EventsEmmy Gengler
 
Tipos de conexiones a internet
Tipos de conexiones a internetTipos de conexiones a internet
Tipos de conexiones a internetElii Zabeth D.
 
Tipos de conexión de internet
Tipos de conexión de internetTipos de conexión de internet
Tipos de conexión de internetyolipuma1990
 
ResumeComprehensivePDF
ResumeComprehensivePDFResumeComprehensivePDF
ResumeComprehensivePDFMark Sension
 
Supercritical CO2 extraction of the main constituents of Lovage (Levisticum o...
Supercritical CO2 extraction of the main constituents of Lovage (Levisticum o...Supercritical CO2 extraction of the main constituents of Lovage (Levisticum o...
Supercritical CO2 extraction of the main constituents of Lovage (Levisticum o...Egidijus Dauksas
 

Viewers also liked (12)

PowerPoint of BioclimaticsCompany
PowerPoint of BioclimaticsCompanyPowerPoint of BioclimaticsCompany
PowerPoint of BioclimaticsCompany
 
[Citavi Backup] BachelorThesisEichkornJensM_final 2016-08-30 11-21-06+correct...
[Citavi Backup] BachelorThesisEichkornJensM_final 2016-08-30 11-21-06+correct...[Citavi Backup] BachelorThesisEichkornJensM_final 2016-08-30 11-21-06+correct...
[Citavi Backup] BachelorThesisEichkornJensM_final 2016-08-30 11-21-06+correct...
 
Habilidades comunicativas-desempeno-profesional
Habilidades comunicativas-desempeno-profesionalHabilidades comunicativas-desempeno-profesional
Habilidades comunicativas-desempeno-profesional
 
User Centered Design
User Centered Design User Centered Design
User Centered Design
 
High resolution two-dimensional electrophoresis as a tool to differentiate wi...
High resolution two-dimensional electrophoresis as a tool to differentiate wi...High resolution two-dimensional electrophoresis as a tool to differentiate wi...
High resolution two-dimensional electrophoresis as a tool to differentiate wi...
 
Softjourn and the Entertainment industry VOD Live Video Live Events
Softjourn and the Entertainment industry VOD Live Video Live EventsSoftjourn and the Entertainment industry VOD Live Video Live Events
Softjourn and the Entertainment industry VOD Live Video Live Events
 
Tipos de conexiones a internet
Tipos de conexiones a internetTipos de conexiones a internet
Tipos de conexiones a internet
 
Tipos de conexión de internet
Tipos de conexión de internetTipos de conexión de internet
Tipos de conexión de internet
 
ResumeComprehensivePDF
ResumeComprehensivePDFResumeComprehensivePDF
ResumeComprehensivePDF
 
Supercritical CO2 extraction of the main constituents of Lovage (Levisticum o...
Supercritical CO2 extraction of the main constituents of Lovage (Levisticum o...Supercritical CO2 extraction of the main constituents of Lovage (Levisticum o...
Supercritical CO2 extraction of the main constituents of Lovage (Levisticum o...
 
7 Components to Medical Device Usability Testing Success
7 Components to Medical Device Usability Testing Success7 Components to Medical Device Usability Testing Success
7 Components to Medical Device Usability Testing Success
 
Game On! The New Reality of Virtual Reality at the GDC16
Game On! The New Reality of Virtual Reality at the GDC16Game On! The New Reality of Virtual Reality at the GDC16
Game On! The New Reality of Virtual Reality at the GDC16
 

Similar to Ire final

EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"IJDKP
 
35000120030_Aritra Kundu_Operations Research.pdf
35000120030_Aritra Kundu_Operations Research.pdf35000120030_Aritra Kundu_Operations Research.pdf
35000120030_Aritra Kundu_Operations Research.pdfJuliusCaesar36
 
Duality Theory in Multi Objective Linear Programming Problems
Duality Theory in Multi Objective Linear Programming ProblemsDuality Theory in Multi Objective Linear Programming Problems
Duality Theory in Multi Objective Linear Programming Problemstheijes
 
A0311010106
A0311010106A0311010106
A0311010106theijes
 
Transformation Functions for Text Classification: A case study with StackOver...
Transformation Functions for Text Classification: A case study with StackOver...Transformation Functions for Text Classification: A case study with StackOver...
Transformation Functions for Text Classification: A case study with StackOver...Sebastian Ruder
 
Comparative study of classification algorithm for text based categorization
Comparative study of classification algorithm for text based categorizationComparative study of classification algorithm for text based categorization
Comparative study of classification algorithm for text based categorizationeSAT Journals
 
data_mining_Projectreport
data_mining_Projectreportdata_mining_Projectreport
data_mining_ProjectreportSampath Velaga
 
An Approach to Mathematically Establish the Practical Use of Assignment Probl...
An Approach to Mathematically Establish the Practical Use of Assignment Probl...An Approach to Mathematically Establish the Practical Use of Assignment Probl...
An Approach to Mathematically Establish the Practical Use of Assignment Probl...ijtsrd
 
Large Scale Hierarchical Text Classification
Large Scale Hierarchical Text ClassificationLarge Scale Hierarchical Text Classification
Large Scale Hierarchical Text ClassificationHammad Haleem
 
Cost versus distance_in_the_traveling_sa_79149
Cost versus distance_in_the_traveling_sa_79149Cost versus distance_in_the_traveling_sa_79149
Cost versus distance_in_the_traveling_sa_79149olimpica
 
Harnessing Deep Neural Networks with Logic Rules
Harnessing Deep Neural Networks with Logic RulesHarnessing Deep Neural Networks with Logic Rules
Harnessing Deep Neural Networks with Logic RulesSho Takase
 
Instance Learning and Genetic Algorithm by Dr.C.R.Dhivyaa Kongu Engineering C...
Instance Learning and Genetic Algorithm by Dr.C.R.Dhivyaa Kongu Engineering C...Instance Learning and Genetic Algorithm by Dr.C.R.Dhivyaa Kongu Engineering C...
Instance Learning and Genetic Algorithm by Dr.C.R.Dhivyaa Kongu Engineering C...Dhivyaa C.R
 
An approximate possibilistic
An approximate possibilisticAn approximate possibilistic
An approximate possibilisticcsandit
 
Evolving Reinforcement Learning Algorithms, JD. Co-Reyes et al, 2021
Evolving Reinforcement Learning Algorithms, JD. Co-Reyes et al, 2021Evolving Reinforcement Learning Algorithms, JD. Co-Reyes et al, 2021
Evolving Reinforcement Learning Algorithms, JD. Co-Reyes et al, 2021Chris Ohk
 
IRJET- A Survey of Text Document Clustering by using Clustering Techniques
IRJET- A Survey of Text Document Clustering by using Clustering TechniquesIRJET- A Survey of Text Document Clustering by using Clustering Techniques
IRJET- A Survey of Text Document Clustering by using Clustering TechniquesIRJET Journal
 

Similar to Ire final (20)

TEXT CLUSTERING.doc
TEXT CLUSTERING.docTEXT CLUSTERING.doc
TEXT CLUSTERING.doc
 
Efficient projections
Efficient projectionsEfficient projections
Efficient projections
 
E1062530
E1062530E1062530
E1062530
 
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
 
35000120030_Aritra Kundu_Operations Research.pdf
35000120030_Aritra Kundu_Operations Research.pdf35000120030_Aritra Kundu_Operations Research.pdf
35000120030_Aritra Kundu_Operations Research.pdf
 
Neural nw k means
Neural nw k meansNeural nw k means
Neural nw k means
 
Duality Theory in Multi Objective Linear Programming Problems
Duality Theory in Multi Objective Linear Programming ProblemsDuality Theory in Multi Objective Linear Programming Problems
Duality Theory in Multi Objective Linear Programming Problems
 
A0311010106
A0311010106A0311010106
A0311010106
 
Transformation Functions for Text Classification: A case study with StackOver...
Transformation Functions for Text Classification: A case study with StackOver...Transformation Functions for Text Classification: A case study with StackOver...
Transformation Functions for Text Classification: A case study with StackOver...
 
Comparative study of classification algorithm for text based categorization
Comparative study of classification algorithm for text based categorizationComparative study of classification algorithm for text based categorization
Comparative study of classification algorithm for text based categorization
 
data_mining_Projectreport
data_mining_Projectreportdata_mining_Projectreport
data_mining_Projectreport
 
An Approach to Mathematically Establish the Practical Use of Assignment Probl...
An Approach to Mathematically Establish the Practical Use of Assignment Probl...An Approach to Mathematically Establish the Practical Use of Assignment Probl...
An Approach to Mathematically Establish the Practical Use of Assignment Probl...
 
Large Scale Hierarchical Text Classification
Large Scale Hierarchical Text ClassificationLarge Scale Hierarchical Text Classification
Large Scale Hierarchical Text Classification
 
Cost versus distance_in_the_traveling_sa_79149
Cost versus distance_in_the_traveling_sa_79149Cost versus distance_in_the_traveling_sa_79149
Cost versus distance_in_the_traveling_sa_79149
 
Harnessing Deep Neural Networks with Logic Rules
Harnessing Deep Neural Networks with Logic RulesHarnessing Deep Neural Networks with Logic Rules
Harnessing Deep Neural Networks with Logic Rules
 
UNIT IV (4).pptx
UNIT IV (4).pptxUNIT IV (4).pptx
UNIT IV (4).pptx
 
Instance Learning and Genetic Algorithm by Dr.C.R.Dhivyaa Kongu Engineering C...
Instance Learning and Genetic Algorithm by Dr.C.R.Dhivyaa Kongu Engineering C...Instance Learning and Genetic Algorithm by Dr.C.R.Dhivyaa Kongu Engineering C...
Instance Learning and Genetic Algorithm by Dr.C.R.Dhivyaa Kongu Engineering C...
 
An approximate possibilistic
An approximate possibilisticAn approximate possibilistic
An approximate possibilistic
 
Evolving Reinforcement Learning Algorithms, JD. Co-Reyes et al, 2021
Evolving Reinforcement Learning Algorithms, JD. Co-Reyes et al, 2021Evolving Reinforcement Learning Algorithms, JD. Co-Reyes et al, 2021
Evolving Reinforcement Learning Algorithms, JD. Co-Reyes et al, 2021
 
IRJET- A Survey of Text Document Clustering by using Clustering Techniques
IRJET- A Survey of Text Document Clustering by using Clustering TechniquesIRJET- A Survey of Text Document Clustering by using Clustering Techniques
IRJET- A Survey of Text Document Clustering by using Clustering Techniques
 

Recently uploaded

MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 

Recently uploaded (20)

MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 

Ire final

  • 1. Diversity Measure for text Summarization Guided by: Dr.Vasudev Verma Mentor: Litton J Kurisinkel Submitted By:- Group No.27 Nirat Attri (201201013) Dhruva Das (201301151) Siddharth Saklecha (201505570)
  • 2. Introduction ➢ A summary is brief but detailed outline of a document, which conveys the essence of the document. ➢ The goal of a text Summarizer is condensing the source text into a shorter version while its overall information and meaning remains same. ➢ The Generated Summary should cover relevant topics in the original corpus and be diverse enough.
  • 3. Motivation ➢ A short summary, which conveys the essence of the document, helps in finding relevant information quickly ➢ Text summarization also provides a way to cluster similar documents and present a summary. ➢ Text summarization has become an important and timely tool for assisting and interpreting text information in today’ fast- growing information age.
  • 4. Problem Statement : ➢ Derive a method to improve efficiency of the task of Text Summarization. ➢ We design novel methods to improve efficiency of the task of Text Summarization using a class of sub modular functions. ➢ These functions each combine two terms, one which encourages the summary to be representative of the corpus (coverage), and the other which positively rewards diversity. ➢ Our functions are monotone non-decreasing and submodular, which means that an efficient scalable greedy optimization scheme has a constant factor guarantee of optimality. Proposed Solution :
  • 5. Submodular Function: ➢ Sub-modular functions are those that satisfy the property of diminishing returns: for any A⊆B⊆ V v, a sub-modular function F must satisfy F(A+v) - F(A) >= F(B + v) - F(B). That is, the incremental value of v decreases as the context in which v is considered grows from A to B. ➢ An equivalent definition, useful mathematically, is that for any A,B⊆V, we must have that F(A)+F(B) >= F(AUB)+F(AnB). If this is satisfied everywhere with equality, then the function F is called modular.
  • 6. Summarization Tools : ➢ CLUTO It is a software package for clustering low and high dimensional datasets and for analyzing the characteristics of the various clusters. ➢ ROUGE Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics and a software package used for evaluating automatic summarization and machine translation software in natural language processing.
  • 7. Approach ➢ Two properties of a good summary are relevance and non redundancy. ➢ Objective functions for extractive summarization usually measure these two separately and then mix them together trading off encouraging relevance and penalizing Redundancy. ➢ The redundancy penalty usually violates the monotonicity of the objective functions. ➢ In particular, we model the summary quality as F(S) = L(S) + λ R(S) where, L(S) measures the coverage, or fidelity, of summary set S to the document, R(S) rewards diversity in S, and λ is a trade-off coefficient. ➢ Coverage Measure: L(S) can be interpreted either as a set function that measures the similarity of summary set S to the document to be summarized, or as a function representing some form of coverage of V by S. L(S) should be monotone, as coverage improves with a larger summary.
  • 8. Approach Continue… Shannon entropy is a well-known monotone submodular function. So, we take our coverage function as: L(S) = Σ min { Ci(S) , α Ci(V) } and i ∈ V Basically, Ci(S) measures how similar S is to element i, or how much of i is covered by S and Ci (V) is just the largest value that Ci(S) can achieve. ➢Diversity Measure : where Pi, i = 1, ...,K is a partition of the ground set V into separate clusters, and ri ≥ 0 indicates the singleton reward of i (i.e., the reward of adding i into the empty set). The value ri estimates the importance of i to the summary.
  • 9. Approach 1 : K-means Clustering ➢ The dataset was fed into CLUTO to perform K-means Clustering to obtain clusters referring to similar data. ➢ We ran a Grid Search on the values to get the best optimal value to maximize the sub modular function. ➢ Algorithm: ➢ Summary → ⌀ ➢ allowedClusters ← allClusters ➢ while size(Summary) ≤ 665: • pick the cluster most similar to corpus from allowedClusters → chosenCluster • chosenSentence ← highest ranking sentence of chosenCluster based on coverage and diversity measure • Summary ← chosenSentence
  • 10. Approach 2 : Agglomerative Clustering ➢ Agglomerative clustering(also called Hierarchical clustering analysis or HCA) is a method of cluster analysis which seeks to build a hierarchy of cluster. ➢ It is a bottom up approach. Each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy. ➢ O(n^3) approach. ➢ Each observation starts in its own cluster and clusters are succesively merged together. The linkage criteria determines the metric used for the merge strategy. ➢ In computing the clusters, we used Cosine Similarity criteria.
  • 11. Challenges Faced ➢ One of the challenges that we faced was to understand the Output given by the Tool Cluto, which Clusters the given sentences. Finding the summary of the folder was just a basic implementation of the formula given in the paper. ➢ Another challenge was that the time taken to find the summary for a single folder was quite large. This time could be drastically reduced by not using the condition of Sub-modular formula f(A+v) - f(A) > f(B+v) - f(B), that is, the incremental value of v decreases as the context in which v is considered grows from A to B. But this would be at the cost of accuracy.
  • 12. Results and Conclusion ➢ The values for λ and α for the diversity and coverage measures giving us the submodular function were calculated using a sweep search for the best values of ROGUE scores. ➢ We recovered the best values as follows : α = 15 λ = 4 ➢ The clusters were summarized and their ROGUE scores calculated to estimate the efficiency ➢ We can conclude by saying that Agglomerative clustering using a weighted consideration for both, the diversity and coverage, gives us the best scores in Text Summarization. Approach ROUGE-R ROUGE-F Agglomerative 0.3843 0.3792 K-Means 0.3724 0.3674
  • 13. References ➢ Vishal Gupta and Gurpreet Singh Lehal, A Survey of Text Summarization Extractive Techniques. ➢ Hui Lin and Jeff Bilmes, A Class of Submodular Functions for Document Summerization. ➢ Hui Lin and Jeff Bilmes, Multi-document Summarization via Budgeted Maximization of Submodular Function. ➢ Ying Zhao and George Karypis, Criterion Functions for Document Clustering Experiments and Analysis. ➢ Chin-Yew LIN, A Package for Automatic Evaluation of Summaries. ➢ DUC 2004,http://duc.nist.gov/data.html