Multi-Document Summarization using
Closed Patterns
Supervisor: Ms. Zakia Jalil
Co-Supervisor: Ms. Sabina Irum
Presented by
Hafsa Sattar [2896-FBAS/BSCS/F14]
Uswa Ihsan [2822-FBAS/BSCS/F14]
Department of Computer Science & Software Engineering
Faculty of Basic & Applied Sciences
International Islamic University, Islamabad.
Overview of Presentation
• Introduction.
• Literature Review.
• Problem Statement.
• Proposed Solution.
• Block Diagram.
• Evaluation.
• Tools, Technology & Dataset.
• References.
Introduction
• The internet provides access to a huge volume of documents.
• We propose multi-document summarization to extract the most important information from multiple documents.
• This saves the user's time and effort compared with reading every document in full.
Multi-Document Summarization
• Multi-document summarization is an automatic procedure aimed at extracting information from multiple documents.
• It creates information reports that are both concise and comprehensive.
Literature Review
• Multi-document summarization methods can be classified into two classes:
• Extractive summarization:
Extractive summarization extracts the most informative components of the documents.
• Abstractive summarization:
Abstractive summarization involves reformulating the contents.
Existing System
• Term-Based Method:
A term-based method has the advantages of efficiency and maturity in term-weight calculation.
Term-based methods can be divided into the following categories:
1. Centroid-Based Method:
This method uses clustering algorithms to generate sentence clusters by calculating sentence similarity.
Existing System
2. Graph-Based Method:
a) Graph-based approaches also belong to extractive summarization.
b) This method builds a graph-based model.
c) Sentences are then selected by means of voting from their neighbors.
• Ontology-Based Method:
Ontology-based approaches take the meanings of vocabulary into account.
Problem Statement
• The explosion of electronic documents presents a serious challenge for readers trying to extract the information they need.
• The information extracted can be false or incomplete, which may cause trouble later.
• The main problem in MDS arises from the collection of multiple sources from which the data is extracted.
Proposed Solution
• A pattern-based model for general multi-document summarization is proposed.
• It extracts the most informative sentences from a document collection and reduces redundancy in the summary.
• The weight of each sentence in the document collection is calculated by accumulating the weights of its covering closed patterns.
Closed Pattern
• Closed patterns represent the terms with high frequency in the document collection.
• The weight of each sentence is determined by the closed patterns it contains.
• Sentences containing more closed patterns receive higher scores.
• Sentences that do not contain any closed pattern receive the minimal score of zero.
Block Diagram
• Block diagram of MDS using Closed Patterns
Example
• Table 1. A set of sentences from two news reports.
• A sentence is represented by S^j_i, where i is the document number and j is the sentence number.
Example (Cont…)
• Table 2. The terms that occur more than 3 times are listed in the table below.
Example (Cont…)
Table 3. All frequent patterns are shown in this table with minimum support 3.
Example (Cont…)
• A pattern is a super-pattern of another pattern if that other pattern is a subset of it.
• A frequent pattern is closed if none of its super-patterns has the same support; for example, the support of every super-pattern of {Obama, Leader} (support 4) is smaller than 4.
• There are 23 frequent patterns in Table 3, but not all of them are closed patterns.
• Only the longest such patterns, namely the closed patterns, are considered.
• The closed patterns are shown in bold font in Table 3 (a mining sketch follows this list).
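To make the idea concrete, the following is a minimal brute-force Python sketch of mining frequent and closed patterns from a toy collection of tokenized sentences. The sentences below are hypothetical, and this is not the optimized mining algorithm of the referenced paper, only an illustration of the definitions above.

from itertools import combinations

# Hypothetical tokenized sentences (term sets), standing in for Table 1.
sentences = [
    {"obama", "leader", "republican", "mcconnell", "senate"},
    {"obama", "leader", "republican", "president"},
    {"obama", "leader", "republican", "president", "mcconnell", "senate"},
    {"obama", "leader", "president", "mcconnell", "senate"},
]
MIN_SUP = 3  # minimum support, as in Table 3

def support(pattern):
    """Number of sentences containing every term of the pattern."""
    return sum(pattern <= s for s in sentences)

# Enumerate all candidate term sets and keep the frequent ones.
terms = sorted(set().union(*sentences))
frequent = {}
for size in range(1, len(terms) + 1):
    for combo in combinations(terms, size):
        pattern = frozenset(combo)
        sup = support(pattern)
        if sup >= MIN_SUP:
            frequent[pattern] = sup

# A frequent pattern is closed if no proper super-pattern has the same support.
closed = {
    pattern: sup
    for pattern, sup in frequent.items()
    if not any(pattern < other and sup == frequent[other] for other in frequent)
}

print(len(frequent), "frequent patterns,", len(closed), "closed patterns")
for pattern, sup in sorted(closed.items(), key=lambda kv: -len(kv[0])):
    print(sorted(pattern), "support:", sup)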
Sentence Representation
• A closed pattern can be expressed as a term-weight pair.
• Let tw be a term-weight pair composed of a set of terms and their weights, such as {(t1, a1), …, (ti, ai), …, (tx, ax)}, where ti denotes a single term and ai is its weight.
• For example, tw = {(Obama, 3), (Republican, 3), (Leader, 3)}, where 3 is the weight of this closed pattern.
• The weight of a closed pattern pi is w(pi) = |coverSent(pi)| * |coverDoc(pi)| / N.
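As an illustration only, here is a small Python sketch of this weight and of the term-weight pair it induces, assuming coverSent(pi) is the set of sentences containing the pattern, coverDoc(pi) the set of documents containing it, and N the total number of sentences; these readings of the symbols are assumptions made for the sketch, not definitions stated on the slides.

def pattern_weight(pattern, docs):
    """w(p) = |coverSent(p)| * |coverDoc(p)| / N, under the assumptions above.
    docs is a list of documents, each a list of sentence term sets."""
    all_sentences = [s for doc in docs for s in doc]
    cover_sent = [s for s in all_sentences if pattern <= s]
    cover_doc = [doc for doc in docs if any(pattern <= s for s in doc)]
    return len(cover_sent) * len(cover_doc) / len(all_sentences)

def term_weight_pair(pattern, docs):
    """Express a closed pattern as {term: weight} with the pattern's weight."""
    w = pattern_weight(pattern, docs)
    return {term: w for term in pattern}

# Example call (hypothetical data): term_weight_pair({"obama", "republican",
# "leader"}, docs) could yield {"obama": 3.0, "republican": 3.0, "leader": 3.0}.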
Sentence Representation (Cont…)
• Using Table 3, the term-weight pairs of all closed patterns are:
a) tw1 = {(Obama, 4), (Leader, 4)}
b) tw2 = {(Obama, 3), (Republican, 3), (Leader, 3)}
c) tw3 = {(Obama, 3), (President, 3), (Leader, 3)}
d) tw4 = {(Obama, 3), (Leader, 3), (Mcconnell, 3), (Senate, 3)}
• Using tw1, tw2, tw3 and tw4 from Table 3, we have
S^1_1 = {tw1, tw2, tw4}, S^2_1 = {tw1, tw2, tw3}, S^1_2 = {tw1, tw2, tw3, tw4}, S^2_2 = {tw1, tw3, tw4}.
Sentence Representation (Cont…)
• Let tw1 = {(t1, a1), …, (ti, ai), …, (tx, ax)} and tw2 = {(w1, b1), …, (wj, bj), …, (wy, by)} be two term-weight pairs associated with two patterns.
• The composition operation ⊕ between tw1 and tw2 is used to obtain the sentence representation from closed patterns: the weights of shared terms are added, and the remaining terms are carried over unchanged.
• For example, if tw1 = {(t1, 3), (t2, 2), (t4, 4)} and tw2 = {(t2, 5), (t3, 2), (t5, 1)}, then tw1 ⊕ tw2 = {(t1, 3), (t2, 7), (t3, 2), (t4, 4), (t5, 1)}.
• Only closed patterns whose size is more than 1 are used.
• The representation of sentence Si is obtained as tw(Si) = twi1 ⊕ twi2 ⊕ … ⊕ twir.
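A minimal Python sketch of the composition operation, with term-weight pairs represented as {term: weight} dictionaries (a representation chosen here only for illustration):

from functools import reduce

def compose(tw_a, tw_b):
    """Compose two term-weight pairs: shared terms have their weights summed,
    all other terms are carried over unchanged."""
    result = dict(tw_a)
    for term, weight in tw_b.items():
        result[term] = result.get(term, 0) + weight
    return result

tw1 = {"t1": 3, "t2": 2, "t4": 4}
tw2 = {"t2": 5, "t3": 2, "t5": 1}
print(compose(tw1, tw2))
# {'t1': 3, 't2': 7, 't4': 4, 't3': 2, 't5': 1}  (same pairs as on the slide, order aside)

def sentence_representation(closed_tws):
    """tw(Si) = twi1 ⊕ twi2 ⊕ … ⊕ twir over the sentence's closed patterns."""
    return reduce(compose, closed_tws, {})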
Sentence Representation (Cont…)
Then the representation of each sentence can be obtained as:
• tw(S^1_1) = {(Senate, 3), (Leader, 10), (Obama, 10), (Republican, 3), (Mcconnell, 3)}
• tw(S^2_1) = {(Obama, 10), (Republican, 3), (President, 3), (Leader, 10)}
• tw(S^1_2) = {(Senate, 3), (Mcconnell, 3), (Leader, 13), (Obama, 13), (Republican, 3), (President, 3)}
• tw(S^2_2) = {(Senate, 3), (Leader, 10), (Obama, 10), (President, 3), (Mcconnell, 3)}
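As a quick sanity check (not part of the original slides), the first representation above can be reproduced with the sentence_representation sketch from the previous slide:

tw1 = {"Obama": 4, "Leader": 4}
tw2 = {"Obama": 3, "Republican": 3, "Leader": 3}
tw4 = {"Obama": 3, "Leader": 3, "Mcconnell": 3, "Senate": 3}

print(sentence_representation([tw1, tw2, tw4]))
# {'Obama': 10, 'Leader': 10, 'Republican': 3, 'Mcconnell': 3, 'Senate': 3}
# which matches tw(S^1_1) above, up to ordering.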
Sentence Ranking
• Pattern-based summarization ranks all sentences according to their sentence representations.
• Sentences that contain more closed patterns with high weights receive higher scores.
• Score(S^j_i) = weight(S^j_i) / |S^j_i| * (1 - (j - 1) / |d_i|)
• The position factor reflects that the starting sentences of a document usually contain more new information than the following sentences.
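A hedged Python sketch of this ranking formula, assuming weight(S^j_i) is the sum of the weights in tw(S^j_i), |S^j_i| the number of terms in the sentence, and |d_i| the number of sentences in document i; these readings are assumptions made for the sketch, not definitions given on the slides.

def score(tw_s, sentence_terms, j, doc_len):
    """Score(S^j_i) = weight(S^j_i) / |S^j_i| * (1 - (j - 1) / |d_i|).
    j is the 1-based position of the sentence in its document."""
    weight = sum(tw_s.values())
    return weight / len(sentence_terms) * (1 - (j - 1) / doc_len)

# Example: the first sentence of a 10-sentence document gets no positional
# penalty, while its last sentence has its score multiplied by 0.1.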
Sentence Selection
• The summary of the document collection is generated by selecting sentences one by one, considering both content coverage and non-redundancy, until a given summary length is reached.
• Some methods measure the similarity of the next candidate sentence to the previously selected ones and select it only if its similarity is below a threshold (a sketch of this step follows).
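One common variant of this selection step, sketched in Python; the Jaccard similarity measure and the 0.5 threshold are illustrative choices, not necessarily what the proposed system uses.

def jaccard(a, b):
    """Similarity between two sentences represented as term sets."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

def select_summary(ranked, max_words, sim_threshold=0.5):
    """ranked: list of (score, term_set, text) tuples sorted by score (descending).
    Greedily add sentences, skipping near-duplicates, until the length limit."""
    summary, chosen_terms, length = [], [], 0
    for _, terms, text in ranked:
        if any(jaccard(terms, prev) >= sim_threshold for prev in chosen_terms):
            continue  # too similar to an already selected sentence
        words = len(text.split())
        if length + words > max_words:
            continue
        summary.append(text)
        chosen_terms.append(terms)
        length += words
    return summary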
Dataset
• The standard benchmark DUC2004 dataset is used, which is from the Document Understanding Conference (DUC) and is intended for generic summarization evaluation.
• There are 50 document clusters in DUC2004, and each cluster consists of 10 English documents.
• DUC2004 provides at least four human-generated summaries for each cluster.
• Participants in the DUC2004 contest submitted their own summaries, which were evaluated against the human-generated summaries.
Evaluation
• ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation.
• It is a set of metrics and a software package used for evaluating automatic summarization.
• ROUGE tells us how effective our summary is compared with human-made summaries.
• The metrics compare an automatically produced summary against a reference summary or a set of reference summaries.
• The ROUGE score is calculated as:
the number of n-grams present in both the human-made and the system summary / the total number of n-grams in the human-made (reference) summary.
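A small Python sketch of ROUGE-N recall under this formula, assuming a single reference summary and simple whitespace tokenization (the ROUGE toolkit itself offers many more options):

from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(system_summary, reference_summary, n=1):
    sys_ngrams = ngrams(system_summary.split(), n)
    ref_ngrams = ngrams(reference_summary.split(), n)
    overlap = sum((sys_ngrams & ref_ngrams).values())  # matched n-grams
    total = sum(ref_ngrams.values())                   # n-grams in the reference
    return overlap / total if total else 0.0

print(rouge_n("obama met the senate leader",
              "obama met senate leader mcconnell"))    # 0.8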
Tools, Technology & Dataset
The following tools, technologies and dataset are used for this project:
• Python
• DUC 2004 (Document Understanding Conference) dataset
• MATLAB 2017
References
[1] J. Qiang, P. Chen, W. Ding, F. Xie, X. Wu, Multi-document summarization using closed patterns, Knowledge-Based Systems (2016).
[2] https://en.wikipedia.org/wiki/Multi-document_summarization
[3] https://www.slideshare.net/LiaRatna1/sinonim-38250183
[4] https://www.hindawi.com/journals/tswj/2016/1784827/
[5] https://en.wikipedia.org/wiki/ROUGE_(metric)
[6] https://www.quora.com/What-is-the-meaning-and-formula-for-the-ROUGE-SU-metric-for-evaluating-summaries
[7] https://en.wikipedia.org/wiki/N-gram
[8] https://en.wikipedia.org/wiki/F1_score
Questions & Answers
Q & A
Thank You