- Ayush Pareek (Sophomore)
The LNM Institute of Information Technology
TOPICS COVERED:
• Pre-processing
• Stemming algorithms
• Generic and Query-based Stemming
• Zipf's Law
• Stop-word removal
• Frequency matrix
• Clustering
• Sentence Weighting
• Pearson Correlation Coefficient
• Cosine Similarity
• Abstraction Extraction based Summary
=> For coding purposes we sharpened our knowledge of C/C++ file handling, the Standard Template Library, diverse libraries, etc.
• The same words were used in sentences containing redundant information.
• This leads to the notion of “connectivity” between sentences.
• But which sentences should we use for the summary?
• From a literature survey of statistics:
a) Pearson Correlation Coefficient
b) Cosine Correlation Coefficient
c) Classical information-retrieval F-measure
STEP 1 “Preprocessing”
Extracting only those words from the text which are relevant for analysis.
STEP 2 “Stemming”
• “do”, “doing”, “done” -> do
• “agreed”, “agree” -> agree
• “gone”, “go”, “went” -> go
• “plays”, “play”, “playing” -> play
STEP 3 “Sorting and Removing Stop Words”
• Common words like the, and, is, are, for, am, so…
• Symbols, numbers and punctuation.
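The three preprocessing steps can be sketched in C++ roughly as follows. This is a minimal illustration, not the project's actual code: the function names `stem` and `preprocess` are hypothetical, the stemmer is only a lookup table covering the examples on the slide (a real system would use a proper stemming algorithm), and the stop-word list is just the handful of words quoted above.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical lookup stemmer: it only knows the word forms shown on the
// slide. Unknown words are returned unchanged.
std::string stem(const std::string& word) {
    static const std::map<std::string, std::string> table = {
        {"doing", "do"},   {"done", "do"},    {"agreed", "agree"},
        {"gone", "go"},    {"went", "go"},    {"plays", "play"},
        {"playing", "play"}};
    auto it = table.find(word);
    return it == table.end() ? word : it->second;
}

// Lower-cases the text, turns symbols, digits and punctuation into spaces,
// stems each word, drops stop words, and returns the remaining words sorted.
std::vector<std::string> preprocess(const std::string& text) {
    static const std::set<std::string> stopWords = {
        "the", "and", "is", "are", "for", "am", "so"};
    std::string cleaned;
    for (unsigned char c : text) {
        cleaned += std::isalpha(c) ? static_cast<char>(std::tolower(c)) : ' ';
    }
    std::istringstream in(cleaned);
    std::vector<std::string> words;
    for (std::string w; in >> w;) {
        std::string s = stem(w);
        if (stopWords.count(s) == 0) words.push_back(s);
    }
    std::sort(words.begin(), words.end());
    return words;
}
```

Calling `preprocess` on one sentence at a time gives the word lists from which a frequency matrix can be built.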
Finding Correlation between Sentence Vectors

            Pakistan  India  Surgery  Medical  Patient
Sentence 1  1         2      0        1        2
Sentence 2  0         0      3        1        1
Sentence 3  2         0      0        1        0
Sentence 4  1         0      0        0        1

Now the vector corresponding to sentence 1 is:
[1 2 0 1 2]
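A word v/s sentence frequency matrix like the one above could be built along these lines. This is a sketch under assumptions: the name `frequencyMatrix` and the idea of passing an explicit vocabulary are illustrative choices, and the sentences are assumed to be already preprocessed into word lists.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using CountMatrix = std::vector<std::vector<int>>;

// Builds a sentence-by-word frequency matrix: entry [i][j] counts how often
// vocabulary word j occurs in the (already preprocessed) sentence i.
CountMatrix frequencyMatrix(const std::vector<std::vector<std::string>>& sentences,
                            const std::vector<std::string>& vocabulary) {
    CountMatrix counts(sentences.size(), std::vector<int>(vocabulary.size(), 0));
    for (std::size_t i = 0; i < sentences.size(); ++i) {
        for (const std::string& word : sentences[i]) {
            for (std::size_t j = 0; j < vocabulary.size(); ++j) {
                if (word == vocabulary[j]) ++counts[i][j];
            }
        }
    }
    return counts;
}
```

Each row of the result is the vector for one sentence, in the same layout as the table.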
• Text -> Sentences -> Vectors -> PCC -> value of r
  -> gives connectivity between vectors
  -> connectivity between sentences
COEFFICIENT VALUE
The coefficient value can range between -1.00 and 1.00.
CASE 1: PCC > 0
• As one variable increases, the other also increases.
• > 0.5 => Considerable connectivity
• > 0.7 => Strong connectivity
CASE 3: PCC < 0
Negative association between variables
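For reference, the Pearson correlation coefficient r between two sentence vectors can be computed as in the sketch below. The function name `pearson` is an assumption, and returning 0 for a constant vector (where r is undefined) is a choice made only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation coefficient r between two equal-length vectors:
//   r = sum((x - mx)(y - my)) / sqrt(sum((x - mx)^2) * sum((y - my)^2))
// where mx and my are the means of x and y. Returns 0 when either vector is
// constant, because r is undefined there (a fallback chosen for this sketch).
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && !x.empty());
    const double n = static_cast<double>(x.size());
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        mx += x[i];
        my += y[i];
    }
    mx /= n;
    my /= n;
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - mx;
        const double dy = y[i] - my;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    const double denom = std::sqrt(sxx * syy);
    return denom == 0.0 ? 0.0 : sxy / denom;
}
```

Applying `pearson` to every pair of sentence vectors fills a symmetric coefficient matrix with 1 on the diagonal.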
            Sentence 1  Sentence 2  Sentence 3  Sentence 4  Sentence 5  Sentence 6
Sentence 1  1           0.224862    0.125127    0.40471     0.127615    0.224413
Sentence 2  0.224862    1           0.317351    0.328374    0.0122265   0.116916
Sentence 3  0.125127    0.317351    1           0.297626    -0.0922254  -0.0502292
Sentence 4  0.40471     0.328374    0.297626    1           0.0799604   0.349622
Sentence 5  0.127615    0.0122265   -0.0922254  0.0799604   1           -0.0791082
Sentence 6  0.224413    0.116916    -0.0502292  0.349622    -0.0791082  1
We need to rank these sentences in order of “connectivity”.
We take the average of each sentence vector to compute its order of importance to the entire text.
• E.g. sentence 3 > sentence 5 > sentence 7 > sentence 8 > sentence 9
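The ranking step described above can be sketched as follows. The name `rankByAverage` is hypothetical, and the sketch assumes that "the average of each sentence vector" means the mean of that sentence's row in the coefficient matrix; including the diagonal 1 shifts every score equally, so it does not change the order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns sentence indices (0-based) ordered from most to least connected.
// A sentence's score is the mean of its row in the coefficient matrix.
std::vector<std::size_t> rankByAverage(const std::vector<std::vector<double>>& coeff) {
    std::vector<double> score(coeff.size(), 0.0);
    for (std::size_t i = 0; i < coeff.size(); ++i) {
        assert(!coeff[i].empty());
        score[i] = std::accumulate(coeff[i].begin(), coeff[i].end(), 0.0) /
                   static_cast<double>(coeff[i].size());
    }
    std::vector<std::size_t> order(coeff.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    // stable_sort keeps the original sentence order when two scores tie.
    std::stable_sort(order.begin(), order.end(),
                     [&score](std::size_t a, std::size_t b) { return score[a] > score[b]; });
    return order;
}
```

The first few indices of the returned order are the candidate sentences for the summary.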
    S1        S2         S3          S4         S5          S6
S1  1         0.225      0.40471     0.125      0.127       0.224
S2  0.225     1          0.317351    0.328374   0.0122265   -0.116916
S3  0.40471   0.317351   1           0.297626   -0.0922254  -0.0502292
S4  0.125127  0.328374   0.297626    1          0.0799604   0.349622
S5  0.127615  0.0122265  -0.0922254  0.0799604  1           -0.0791082
S6  0.224413  -0.116916  -0.0502292  0.349622   -0.0791082  1
           S2         (S1+S3)/2  S4        S5         S6
S2         1.000000   0.317351   0.276618  0.012226   -0.116916
(S1+S3)/2  0.317351   1.000000   0.211376  -0.092225  -0.050229
S4         0.276618   0.211376   1.000000  0.103788   0.287017
S5         0.012226   -0.092225  0.103788  1.000000   -0.079108
S6         -0.116916  -0.050229  0.287017  -0.079108  1.000000
              (S1+S2+S3)/3  S4        S5         S6
(S1+S2+S3)/3  1.000000      0.243997  -0.039999  -0.083573
S4            0.243997      1.000000  0.103788   0.287017
S5            -0.039999     0.103788  1.000000   -0.079108
S6            -0.083573     0.287017  -0.079108  1.000000
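The tables above appear to show one clustering step at a time: the most strongly connected pair is merged into a single averaged vector, such as (S1+S3)/2, and the coefficient matrix is then rebuilt. The sketch below shows only that reading of the slides; the names `mostConnectedPair` and `mergeVectors` are hypothetical and the deck does not spell out the exact merge rule.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Finds the pair (i, j) with i < j that has the largest off-diagonal
// coefficient, i.e. the two most strongly connected sentences or clusters.
std::pair<std::size_t, std::size_t> mostConnectedPair(const Matrix& coeff) {
    assert(coeff.size() >= 2);
    std::pair<std::size_t, std::size_t> best{0, 1};
    for (std::size_t i = 0; i < coeff.size(); ++i) {
        for (std::size_t j = i + 1; j < coeff[i].size(); ++j) {
            if (coeff[i][j] > coeff[best.first][best.second]) best = {i, j};
        }
    }
    return best;
}

// Element-wise mean of two vectors, i.e. (a + b) / 2.
std::vector<double> mergeVectors(const std::vector<double>& a,
                                 const std::vector<double>& b) {
    assert(a.size() == b.size());
    std::vector<double> merged(a.size());
    for (std::size_t k = 0; k < a.size(); ++k) {
        merged[k] = (a[k] + b[k]) / 2.0;
    }
    return merged;
}
```

Repeating the two calls, and recomputing the coefficients after each merge, shrinks the matrix by one row and column per step.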
Basic Steps used in all our algorithms
[Flowchart; box labels in reading order]
START
-> Get document and perform preprocessing
-> Make a word v/s sentence frequency matrix
-> Coefficient matrix using P.C.C. / Coefficient matrix using cosine similarity
-> Sentence weighting / Sentence clustering
-> ALGO 1, ALGO 2, ALGO 3, ALGO 4
-> Take consensus of final ranks from all 4 methods
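The flowchart names a coefficient matrix built with cosine similarity as an alternative to P.C.C. A minimal sketch of that computation follows; the names `cosineSimilarity` and `cosineMatrix` are assumptions, as is returning 0 when one of the vectors is all zeros.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector has
// zero length (a fallback chosen for this sketch).
double cosineSimilarity(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        dot += a[k] * b[k];
        normA += a[k] * a[k];
        normB += b[k] * b[k];
    }
    const double denom = std::sqrt(normA) * std::sqrt(normB);
    return denom == 0.0 ? 0.0 : dot / denom;
}

// Coefficient matrix: entry [i][j] is the cosine similarity between the
// frequency vectors of sentence i and sentence j.
Matrix cosineMatrix(const Matrix& sentences) {
    Matrix coeff(sentences.size(), std::vector<double>(sentences.size(), 0.0));
    for (std::size_t i = 0; i < sentences.size(); ++i) {
        for (std::size_t j = 0; j < sentences.size(); ++j) {
            coeff[i][j] = cosineSimilarity(sentences[i], sentences[j]);
        }
    }
    return coeff;
}
```

For word-count vectors, which have no negative entries, the values fall between 0 and 1, whereas P.C.C. can also be negative.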
METHOD 1 (GENERIC SUMMARY): Giving equal weights to all 4 algorithms
• The shortcomings of one algorithm are compensated by the strengths of another.
• Thus, we get a reasonably accurate ranking.
Algorithms combined: Sentence Weighting, Sentence Clustering, P.C.C., Cosine
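One simple way to take an equal-weight consensus is to average the position each sentence receives across the individual rankings. The deck does not state the exact consensus rule, so the sketch below is only one plausible reading; the name `consensusRanking` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Each input ranking lists sentence indices from best to worst. Every
// algorithm gets the same weight: a sentence's consensus score is its mean
// position over all rankings, and a lower mean position ranks higher.
std::vector<std::size_t> consensusRanking(
    const std::vector<std::vector<std::size_t>>& rankings) {
    assert(!rankings.empty());
    const std::size_t n = rankings.front().size();
    std::vector<double> meanPosition(n, 0.0);
    for (const std::vector<std::size_t>& ranking : rankings) {
        assert(ranking.size() == n);
        for (std::size_t pos = 0; pos < n; ++pos) {
            assert(ranking[pos] < n);
            meanPosition[ranking[pos]] +=
                static_cast<double>(pos) / static_cast<double>(rankings.size());
        }
    }
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&meanPosition](std::size_t a, std::size_t b) {
                         return meanPosition[a] < meanPosition[b];
                     });
    return order;
}
```

Unequal weights could be introduced by scaling each ranking's contribution, but the slide describes equal weights only.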
METHOD 2 (Identifying Datasets):
What is the genre of the data? Use an algorithm on that basis.
• Algorithm for Math Dataset
• Algorithm for Literature Dataset
• Algorithm for Encyclopedia articles
• Algorithm for News Reports
• Algorithm for Biographies
Algorithm 1, Algorithm 2, Algorithm 3, Algorithm 4, Algorithm 5, Algorithm 6, Algorithm 7, Algorithm 8
-> Take keywords from the user, or use the title of the text, for word matching with all the available summaries
-> Final Summary
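The word-matching step could look something like the sketch below, which picks the candidate summary containing the most keywords. The names `normalize` and `bestSummary` are hypothetical, keywords are assumed to be single words, and the deck does not specify how matches are scored or how ties are broken (here the earliest summary wins).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Lower-cases text and replaces every non-letter with a space.
std::string normalize(const std::string& text) {
    std::string out;
    for (unsigned char c : text) {
        out += std::isalpha(c) ? static_cast<char>(std::tolower(c)) : ' ';
    }
    return out;
}

// Returns the index of the summary that contains the most keywords as whole
// words (case-insensitive). On a tie the earliest summary is kept.
std::size_t bestSummary(const std::vector<std::string>& summaries,
                        const std::vector<std::string>& keywords) {
    assert(!summaries.empty());
    std::size_t best = 0, bestHits = 0;
    for (std::size_t i = 0; i < summaries.size(); ++i) {
        std::istringstream in(normalize(summaries[i]));
        std::set<std::string> words;
        for (std::string w; in >> w;) words.insert(w);
        std::size_t hits = 0;
        for (const std::string& k : keywords) {
            hits += words.count(normalize(k));
        }
        if (hits > bestHits) {
            best = i;
            bestHits = hits;
        }
    }
    return best;
}
```

The keywords can come from the user or from the words of the document title, as the slide suggests.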
[Figure: accuracy (y-axis, 0 to 1) plotted against number of sentences (x-axis, 0 to 25); maxima = 87.4 %]
• Language-independent summaries
• Sub-heading and index creator
• Content highlighter
• Browser add-on
• Subjective exam sheet checker
• Making abstracts of research papers and articles
• Plagiarism detector
• Hypertext context-link based summarizer
• Daily news feed summarizer / RSS
• In search engines, to present compressed descriptions of the search results
• In keyword-directed subscription of news, which is summarized and pushed to the user
• The software can effectively convert BRUTE FORCE reading effort to DIVIDE-AND-CONQUER
News summary maker
Abridged project ppt_ayush