International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 09 Issue: 05 | May 2022 www.irjet.net p-ISSN: 2395-0072
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 1063
Automatic Text Summarization Using Machine Learning
Sanjay H Pillai¹, Raheek Mohammed¹, Yadhu Krishna¹, Sangeeth S Krishnan¹, Jithin Jacob²
¹UG Scholar, Department of Computer Science and Engineering
²Asst. Prof, Department of Computer Science and Engineering, UKF College of Engineering and Technology
---------------------------------------------------------------------------------***-------------------------------------------------------------------------------------
Abstract- Our project combines human and machine-based approaches to text summarization using a multi-stage
extractor-abstractor network. We use an initial abstract generated by Google’s Pegasus abstraction model as a reference for
an initial extraction produced by a novel statistical extraction model that we created. The method uses word frequency and
likelihood of occurrence in the reference summary to generate an accurate extract from the original document, which is
then used as the input to the abstractor. Providing an accurate extracted input to the abstraction stage improves the
efficiency and speed of the abstractor, thereby improving the overall performance of the system. The method is similar to
the way humans write abstracts: we first extract the important parts of the document and then shorten them to create
abstract sentences.
Key Words: Summarization, Extraction, Abstraction
1. INTRODUCTION
Text summarization is an extensively researched area in the field of Natural Language Processing. It requires shortening
text documents while preserving their intended meaning. This is usually achieved in one of two ways: extraction or
abstraction.
The extraction method [1] [2] [3] selects important sentences from the original document such that the total number of
sentences in the final document is less than that of the original document. The sentences are not rephrased; they are
included in the final summary without change. This method provides a fast but primitive summary.
The abstraction method [4] [5] actively rephrases the sentences from the original document in a concise manner before
including them in the final summary. This method requires machine learning models that can rephrase sentences while
leaving their meaning unchanged. Even though the abstractive method provides a more human-like summary, the process is
often slow and error-prone, as the final summary may sometimes contain sentences that are semantically inaccurate or
convey a meaning different from that intended in the original document.
Although some models combine [6] extraction and abstraction, they are not mainstream and, to our knowledge, no
applications implement these methods for a large user base. Our summarization system therefore aims to be accessible to a
large user base via an online platform. The website will be an easy-to-use application that allows users to upload large
documents and create summarized documents free of cost.
We propose a novel summarization technique that combines the human and machine approaches to text summarization. We
first create a reference abstraction of the original document using Google’s Pegasus abstraction model; this abstract is
used only as a reference to determine the salient sentences of the original document. We then use a likelihood-based
extraction method in which we find the probability of adding each sentence from the original document to the extracted
summary, using the Pegasus abstract as a reference.
Our contributions therefore include:
1. A free summarization tool that will be widely accessible to a large userbase
2. A novel extraction mechanism that can provide accurate extracted summaries using a statistical approach
3. A novel abstract-extract-abstract system that can provide much more accurate results.
2. MODEL
The model works by first creating a reference abstract using the Pegasus model [7]. This abstract is then used to
determine the salient sentences of the original document using a statistical approach, and the resulting shortened
version is passed through the abstractor again to create the final summary.
The initial document is first summarized abstractively using the Pegasus model, and some important words from
the original document are appended to this abstract. This forms the reference document containing the salient words of the
initial document. We then make a separate document that contains the unimportant words from the initial document.
These two documents act as reference points that are used to classify whether a sentence from the initial document
should be added to the final extract or not. Once the extracted summary is obtained, we again use the Pegasus model to
abstract it. This ensures that we get an accurate summary that includes all the salient sentences from the initial
document, rephrased so that the final summary is concise and semantically correct. The method therefore incorporates the
advantages of both the extractive and abstractive approaches.
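The order of the three stages can be sketched in a few lines of Python. This is only an illustration of the control flow, not our implementation: `toy_abstractor` and `toy_extractor` are hypothetical stand-ins for the Pegasus model and the statistical extractor, and `split_sentences` is a deliberately naive sentence splitter.

```python
import re
from typing import Callable, List

Abstractor = Callable[[str], str]       # document -> abstract
Extractor = Callable[[str, str], str]   # (document, reference abstract) -> extract


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def abstract_extract_abstract(document: str,
                              abstractor: Abstractor,
                              extractor: Extractor) -> str:
    """Run the abstract -> extract -> abstract stages in order."""
    reference = abstractor(document)           # stage 1: reference abstract
    extract = extractor(document, reference)   # stage 2: keep salient sentences
    return abstractor(extract)                 # stage 3: abstract the extract


def toy_abstractor(text: str) -> str:
    """Stand-in for Pegasus (assumption): returns the longest sentence as is."""
    return max(split_sentences(text), key=len)


def toy_extractor(document: str, reference: str) -> str:
    """Stand-in extractor (assumption): keep sentences that share a word of
    four or more letters with the reference abstract."""
    ref_words = {w for w in re.findall(r"[a-z]+", reference.lower()) if len(w) >= 4}
    kept = [s for s in split_sentences(document)
            if ref_words & set(re.findall(r"[a-z]+", s.lower()))]
    return " ".join(kept)
```

Passing the abstractor and extractor in as functions keeps the three-stage flow separate from the models themselves, so a real Pegasus wrapper could replace `toy_abstractor` without changing `abstract_extract_abstract`.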
3. RELATED WORKS
We find that the majority of the work on automatic summarization focuses on extraction or compression-based
approaches. There are also some recent abstraction-based approaches using neural abstractive models [8] [9], RL-based
metric optimization [10], coverage [11] [12] [13], copy mechanisms [14] [15], graph-based attention [16], etc.
Reinforcement Learning (RL) is also used extensively in this field: Q-learning based RL for extractive summarization [17]
is an effective method, and RL policy gradients have been used for producing accurate abstractive summaries.
Some works in this field also use selective gates [18] to improve attention in abstractive summarization.
Other works have attempted cascaded small non-recurrent networks on extractive QA, producing a scalable,
parallelizable model [19].
4. EXTRACTOR
The extractor is a new sentence-level selection mechanism that uses naïve Bayes to estimate the likelihood that a
sentence should be added to the extracted summary. To achieve this, as mentioned above, we first create a temporary
reference abstract using Google’s Pegasus model. We then use a statistical technique to find salient words in the initial
document: we first determine the frequency of occurrence of each non-filler word in the original document, and then use
other parameters such as capitalization and spacing to find further salient words that were not included in the initial
abstract. These words are appended to the abstract so that it contains almost all the salient words from the initial
document while ignoring the unimportant filler words. The filler words are then written to another file, which is also
used as a reference when performing naïve Bayes.
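A minimal sketch of this salient-word selection is given below. It is an illustration under stated assumptions rather than our exact implementation: the filler-word list `FILLER_WORDS`, the capitalization multiplier `cap_bonus` and the `top_k` cut-off are hypothetical, and the spacing cue is left out because it is not specified in enough detail here.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical filler-word list (assumption); a real list would be much longer.
FILLER_WORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "is", "are",
                "was", "it", "that", "this", "for", "on", "with", "as", "by"}


def word_weights(document: str, cap_bonus: float = 2.0) -> Dict[str, float]:
    """Weight each non-filler word by its frequency, boosting capitalized words."""
    tokens = re.findall(r"[A-Za-z]+", document)
    counts = Counter(t.lower() for t in tokens if t.lower() not in FILLER_WORDS)
    # Treat a word as capitalized only when it appears capitalized mid-sentence,
    # so that ordinary sentence-initial words are not boosted.
    capitalized = {m.lower()
                   for m in re.findall(r"(?<=[a-z,;] )[A-Z][a-z]+", document)}
    return {w: c * (cap_bonus if w in capitalized else 1.0)
            for w, c in counts.items()}


def salient_words(document: str, top_k: int = 5) -> List[str]:
    """Return the top_k words by weight; ties are broken alphabetically."""
    weights = word_weights(document)
    return sorted(weights, key=lambda w: (-weights[w], w))[:top_k]
```

The words returned by `salient_words` would be appended to the reference abstract, while the words in `FILLER_WORDS` that occur in the document would go to the separate filler file.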
Fig -1: Model Schema
For each sentence in the original document, we find its likelihood of being included in the extracted summary by
multiplying the probability of occurrence of each word in the sentence by the prior probability:
$$L = P \cdot \prod_{i=1}^{n} p(w_i)$$
Here the likelihood ‘L’ is the product of the prior probability ‘P’ and the probabilities of occurrence p(w_i) of the n
words w_1, ..., w_n in the given sentence. If the likelihood calculated with respect to the temporary reference abstract
is low, the sentence is not included in the extracted summary, and vice versa.
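The likelihood computation can be sketched as follows. Several details are assumptions made for the sake of a runnable example and are not fixed by the description above: the product is evaluated in log space to avoid numerical underflow on long sentences, add-one (Laplace) smoothing is used so that unseen words do not drive the likelihood to zero, the two classes get equal priors by default, and a sentence is kept when its likelihood under the salient reference exceeds its likelihood under the filler-word reference.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def log_likelihood(sentence: str, reference: str,
                   vocab_size: int, prior: float) -> float:
    """Compute log L = log P + sum_i log p(w_i) against one reference text.

    p(w_i) is estimated from word counts in the reference with add-one
    smoothing (an assumption).
    """
    counts = Counter(tokenize(reference))
    total = sum(counts.values())
    score = math.log(prior)
    for word in tokenize(sentence):
        score += math.log((counts[word] + 1) / (total + vocab_size))
    return score


def extract(sentences: List[str], salient_ref: str, filler_ref: str,
            prior_salient: float = 0.5) -> List[str]:
    """Keep the sentences that are more likely under the salient reference."""
    vocab = set(tokenize(salient_ref)) | set(tokenize(filler_ref))
    for s in sentences:
        vocab |= set(tokenize(s))
    kept = []
    for s in sentences:
        salient = log_likelihood(s, salient_ref, len(vocab), prior_salient)
        filler = log_likelihood(s, filler_ref, len(vocab), 1.0 - prior_salient)
        if salient > filler:
            kept.append(s)
    return kept
```

For example, with the salient reference `"pegasus summary model extractor abstract"` and the filler reference `"weather rain today lunch coffee"`, the sentence "The extractor feeds the Pegasus model." is kept and "Lunch was coffee in the rain." is dropped.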
5. WORD WEIGHTAGE
We estimate the weight, or saliency, of a word to be included in the reference abstract using a statistical approach.
We find the frequency of occurrence of non-filler words and use that as the first indication of saliency. In some cases
there may be salient words that do not have a high frequency of occurrence; in such cases we use other parameters such as
capitalization and spacing to determine the importance of such words. Capitalized words and nouns are given a higher
weightage and have a higher chance of being included in the temporary abstract.
6. ABSTRACTOR
We use the Pegasus model for the two abstraction stages within our project. The first abstraction stage produces a
larger file that tries to include more salient sentences. This file, along with the weighted words selected from the
original document, acts as the reference for the initial extraction. Once the initial extraction is completed, we use the
extracted output as the input to the second stage of abstraction. Since this input is more concise than the original
document and is tailored during the initial extraction stage, the second stage of abstraction will be much faster and
provide much more accurate results. As future scope of our project, we intend to use our own novel abstraction system
that produces faster results than the Pegasus system. For now we continue with the Pegasus system, as it provides concise
and accurate abstracts that work well with our new extraction mechanism.
7. USER INTERFACE
The UI is an easy-to-use web application that allows the user to select documents and upload them directly to our
servers, which then employ our summarization algorithm to produce an accurate summary and return a download link to the
user. The web application is developed using an HTML, CSS and JavaScript front-end stack. For testing purposes, we
initially host the web application on a local Linux server using apache2.
8. HUMAN VALIDATION
To ensure the accuracy and semantic consistency of our summarization algorithm, we conduct human evaluation
of the output at various stages to check that it meets the required criteria. We also compare the output of our
summarization technique with the output produced by other commonly used summarization algorithms in terms of speed and
accuracy. Human validation is conducted only during the testing phase and is not required after the deployment of the
application.
9. CONCLUSION
We find that the proposed summarization technique makes the following contributions:
1. A free summarization tool that will be widely accessible to a large userbase
2. A novel extraction mechanism that can provide accurate extracted summaries using a statistical approach
3. A novel abstract-extract-abstract system that can provide much more accurate results.
We find that our summarization technique is slightly slower than some state-of-the-art fast abstraction and extraction
systems because of its three-stage process, but that this process allows us to consistently produce much more accurate
results.
REFERENCES
[1] Hongyan Jing and Kathleen R. McKeown. 2000. Cut and paste based text summarization.
[2] Kevin Knight and Daniel Marcu. 2000. Summarization beyond sentence extraction: A probabilistic approach to sentence compression.
[3] Andre F. T. Martins and Noah A. Smith. 2009. Summarization with a joint model for sentence extraction and compression.
[4] Michele Banko, Vibhu O. Mittal, and Michael J. Witbrock. 2000. Headline generation based on statistical translation. Association for Computational Linguistics.
[5] David Zajic, Bonnie Dorr, and Richard Schwartz. 2004. BBN/UMD at DUC-2004: Topiary. In HLT-NAACL 2004 Document Understanding Workshop, pages 112–119, Boston, Massachusetts.
[6] Yen-Chun Chen and Mohit Bansal. 2018. Fast abstractive summarization with reinforce-selected sentence rewriting.
[7] Peter J. Liu and Yao Zhao. 2020. PEGASUS: A state-of-the-art model for abstractive text summarization. Google Research.
[8] Ramesh Nallapati, Bowen Zhou, Cicero Nogueira dos Santos, Caglar Gulcehre, and Bing Xiang. 2016. Abstractive text summarization using sequence-to-sequence RNNs and beyond.
[9] Marc’Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. 2016. Sequence level training with recurrent neural networks. In ICLR.
[10] Romain Paulus, Caiming Xiong, and Richard Socher. 2018. A deep reinforced model for abstractive summarization. In ICLR.
[11] Jun Suzuki and Masaaki Nagata. 2016. RNN-based encoder-decoder approach with word frequency estimation. In EACL.
[12] Qian Chen, Xiaodan Zhu, Zhenhua Ling, Si Wei, and Hui Jiang. 2016. Distraction-based neural networks for modeling documents. In IJCAI.
[13] Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get to the point: Summarization with pointer-generator networks.
[14] Yishu Miao and Phil Blunsom. 2016. Language as a latent variable: Discrete generative models for sentence compression. In EMNLP.
[15] Jiatao Gu, Zhengdong Lu, Hang Li, and Victor O.K. Li. 2016. Incorporating copying mechanism in sequence-to-sequence learning.
[16] Swabha Swayamdipta, Ankur P. Parikh, and Tom Kwiatkowski. 2017. Multi-mention learning for reading comprehension with neural cascades.
[17] Sebastian Henß, Margot Mieskes, and Iryna Gurevych. 2015. A reinforcement learning approach for adaptive single- and multi-document summarization.
[18] Ramesh Nallapati, Feifei Zhai, and Bowen Zhou. 2017. SummaRuNNer: A recurrent neural network based sequence model for extractive summarization of documents.
[19] Angela Fan, David Grangier, and Michael Auli. 2017. Controllable abstractive summarization.

A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
 

AUTOMATED SUMMARY

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 09 Issue: 05 | May 2022 www.irjet.net p-ISSN: 2395-0072
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 1063
Automatic Text Summarization Using Machine Learning
1Sanjay H Pillai, 1Raheek Mohammed, 1Yadhu Krishna, 1Sangeeth S Krishnan, 2Jithin Jacob
1UG Scholar, Department of Computer Science and Engineering
2Asst. Prof, Department Of Computer Science And Engineering, UKF College of Engineering and Technology
Abstract- Our project combines human and machine-based approaches to text summarization using a multi-stage extractor-abstractor network. We use an initial abstract generated by Google’s Pegasus abstraction model as a reference to create an initial extraction, which is produced by a novel statistical extraction model that we created. The method uses word frequency and likelihood of occurrence in the reference summary to generate an accurate extract from the original document, which is then used as the input to the abstractor. Because the abstraction stage receives an accurate extracted input, the efficiency and speed of the abstractor improve, thereby improving the overall performance of the system. This method is similar to the way humans write abstracts: we first extract the important parts of the document and then shorten them to create an abstract.
Key Words: Summarization, Extraction, Abstraction
1. INTRODUCTION
Effective text summarization is an extensively researched area in the field of Natural Language Processing. Text summarization requires shortening text documents while preserving their intended meaning. This is usually achieved in one of two ways: extraction or abstraction.
The extraction method [1] [2] [3] selects important sentences from the original document such that the final document contains fewer sentences than the original. The sentences are not rephrased; they are included in the final summary without change. This method provides a fast but primitive summary.
The abstraction method [4] [5] actively rephrases the sentences from the original document in a concise manner before including them in the final summary. This method requires machine learning models that can rephrase sentences while leaving their meaning unchanged. Even though the abstractive method provides a more human-like summary, the process is often slow and error-prone, as the final summary may contain sentences that are semantically inaccurate or convey a meaning different from that intended in the original document.
Although some models combine [6] extraction and abstraction, they are not mainstream, and applications that implement these methods for a large user base do not exist. Our summarization system therefore aims to be accessible to a large user base via an online platform. The website will be an easy-to-use application that allows users to upload large documents and create summarised documents free of cost.
We propose a novel summarization technique that combines the human and machine approaches to text summarization. We first create a reference abstraction of the original document using Google’s Pegasus abstraction model; this abstract is used only as a reference to determine the salient sentences in the original document. We then use a likelihood-based extraction method in which we find the probability of adding a sentence from the original document to the extracted summary, using the Pegasus abstract as a reference.
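The abstract-extract-abstract flow proposed above can be sketched in a few lines of Python. This is only an illustrative outline, not the authors' implementation: the names `summarize`, `toy_abstractor` and `toy_extractor` are hypothetical, and the toy stand-ins simply truncate text and match shared words so that the example runs without any model. In a real system the abstractor would wrap a Pegasus model and the extractor would be the statistical extractor the paper describes.

```python
from typing import Callable, List

# Hypothetical type aliases: an abstractor maps text to a shorter text,
# an extractor picks sentences from a document given a reference abstract.
Abstractor = Callable[[str], str]
Extractor = Callable[[str, str], List[str]]


def summarize(document: str, abstractor: Abstractor, extractor: Extractor) -> str:
    """Three-stage flow: reference abstract -> extraction -> final abstract."""
    reference = abstractor(document)            # stage 1: reference abstract
    selected = extractor(document, reference)   # stage 2: pick salient sentences
    return abstractor(" ".join(selected))       # stage 3: abstract the extract


def toy_abstractor(text: str, max_words: int = 8) -> str:
    """Stand-in for an abstraction model (assumption): keep the first words."""
    return " ".join(text.split()[:max_words])


def toy_extractor(document: str, reference: str) -> List[str]:
    """Stand-in for the extractor (assumption): keep sentences that share
    at least one word with the reference abstract."""
    ref_words = {w.lower().strip(".,") for w in reference.split()}
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    return [s + "." for s in sentences
            if ref_words & {w.lower() for w in s.split()}]


if __name__ == "__main__":
    doc = "Cats sleep a lot. Dogs bark loudly. Cats also purr."
    print(summarize(doc, toy_abstractor, toy_extractor))
```

Passing the two stages in as callables keeps the sketch independent of any particular model, so either stage can be swapped without touching the control flow.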
Our contributions therefore include: a free summarization tool that will be widely accessible to a large user base; a novel extraction mechanism that can provide accurate extracted summaries using a statistical approach; and a novel abstract-extract-abstract system that can provide much more accurate results.
2. MODEL
The model works by first creating a reference abstract using the Pegasus model [7]. This abstract is then used to determine the salient sentences of the original document using a statistical approach, and the shortened version is passed through the abstractor again to create the final summary.
The initial document is first summarised abstractively using the Pegasus model, and some important words from the original document are appended to this abstract. This forms the reference document containing salient words from the initial document. We then make a separate document that contains unimportant words from the initial document. These two documents act as reference points that are used to classify whether a sentence from the initial document should be added to the final extract. Once the extracted summary is obtained, we again use the Pegasus model to abstract it. This ensures that we get an accurate summary that includes all the salient sentences from the initial document, rephrased so that the final summary is concise and semantically correct. The method therefore incorporates the advantages of both the extractive and abstractive approaches.
3. RELATED WORKS
We find that the majority of work on automatic summarization focuses on extraction or compression-based approaches. There are also some recent abstraction-based approaches using neural abstractive models [8] [9], RL-based metric optimization [10], coverage [11] [12] [13], copy mechanisms [14] [15], graph-based attention [16], etc. Reinforcement learning (RL) is also used extensively in this field: Q-learning-based RL for extractive summarization [17] is an effective method, and RL policy gradients have been used to produce accurate abstractive summaries. Some works also use selective gates [18] to improve attention in abstractive summarization. Other works have attempted cascaded small non-recurrent networks on extractive QA, producing a scalable, parallelizable model [19].
4. EXTRACTOR
The extractor is a new sentence-level selection mechanism that uses naïve Bayes to estimate the likelihood that a sentence should be added to the extracted summary.
To achieve this, as mentioned above, we first create a temporary reference abstract using Google’s Pegasus model. We then use a statistical technique to find salient words in the initial document. For this, we first determine the frequency of occurrence of each non-filler word in the original document; we then use other parameters, such as capitalization and spacing, to find other salient words that were not included in the initial abstract. These words are appended to the abstract so that it contains almost all the salient words from the initial document while ignoring the unimportant filler words. The filler words are then written to another file, which is also used as a reference when performing naïve Bayes.
Fig -1: Model Schema
For each sentence in the original document, we find its likelihood of being included in the extracted summary by multiplying the probability of occurrence of each word in the sentence by the prior probability.
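The reference construction and likelihood test described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the filler-word list, the regular-expression tokenizer, the `top_k` cut-off, add-one smoothing, log-space arithmetic, and equal priors are all choices made here for the example, and the spacing cue is omitted. A sentence is kept when its likelihood under the salient reference exceeds its likelihood under the filler reference, which is one plausible reading of the two-reference classification.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical filler-word list; the paper does not specify one.
FILLER_WORDS = {"a", "an", "the", "is", "are", "was", "of", "to", "in",
                "and", "it", "that", "this", "on", "for"}


def tokenize(text: str) -> List[str]:
    """Split text into alphabetic tokens (simplified tokenizer)."""
    return re.findall(r"[A-Za-z']+", text)


def build_references(document: str, abstract: str,
                     top_k: int = 5) -> Tuple[List[str], List[str]]:
    """Return (salient_words, filler_words) used as the two references."""
    tokens = tokenize(document)
    content = [t for t in tokens if t.lower() not in FILLER_WORDS]
    # Cue 1: the most frequent non-filler words in the document.
    frequent = [w for w, _ in Counter(t.lower() for t in content).most_common(top_k)]
    # Cue 2: capitalized words (this also catches sentence-initial words).
    capitalized = [t.lower() for t in content if t[0].isupper()]
    salient = [t.lower() for t in tokenize(abstract)
               if t.lower() not in FILLER_WORDS]
    salient += frequent + capitalized
    filler = [t.lower() for t in tokens if t.lower() in FILLER_WORDS]
    return salient, filler


def log_likelihood(words: List[str], counts: Counter,
                   vocab_size: int, prior: float) -> float:
    """log(prior) + sum of log word probabilities, with add-one smoothing."""
    total = sum(counts.values())
    score = math.log(prior)
    for w in words:
        score += math.log((counts[w] + 1) / (total + vocab_size))
    return score


def extract(document: str, abstract: str, prior_salient: float = 0.5) -> List[str]:
    """Keep sentences that are more likely under the salient reference."""
    salient, filler = build_references(document, abstract)
    salient_counts, filler_counts = Counter(salient), Counter(filler)
    vocab_size = len(set(salient) | set(filler))
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    kept = []
    for sentence in sentences:
        words = [t.lower() for t in tokenize(sentence)]
        l_salient = log_likelihood(words, salient_counts, vocab_size, prior_salient)
        l_filler = log_likelihood(words, filler_counts, vocab_size, 1 - prior_salient)
        if l_salient > l_filler:
            kept.append(sentence)
    return kept
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and smoothing prevents a single unseen word from driving the whole product to zero.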
The likelihood L of a sentence is obtained by multiplying the probability of occurrence of each word in the sentence by the prior probability P:
$$L = P \prod_{i=1}^{n} p(w_i)$$
where $w_1, \dots, w_n$ are the words of the sentence and $p(w_i)$ is the probability of occurrence of word $w_i$. If the calculated likelihood is lower with respect to the temporary reference abstract, the sentence is not included in the extracted summary, and vice versa.
5. WORD WEIGHTAGE
We estimate the weight, or saliency, of a word to be included in the reference abstract using a statistical approach. We find the frequency of occurrence of non-filler words and use that as the first indication of saliency. Some salient words may not have a high frequency of occurrence; in such cases we use other parameters, such as capitalization and spacing, to determine their importance. Capitalized words and nouns are given a higher weightage and have a higher chance of being included in the temporary abstract.
6. ABSTRACTOR
We use the Pegasus model for the two abstraction stages in our project. The first abstraction stage produces a larger file that tries to include more salient sentences; this file, together with the weighted words selected from the original document, acts as the reference for the initial extraction. Once the initial extraction is complete, we use the extracted output as the input to the second abstraction stage. Since this input is more concise than the original document and is tailored during the extraction stage, the second abstraction stage is much faster and provides much more accurate results. As future work, we intend to use our own novel abstraction system that produces results faster than the Pegasus system. For now we continue with the Pegasus system, as it provides concise and accurate abstracts that work well with our new extraction mechanism.
7. USER INTERFACE
The UI is an easy-to-use web application that allows the user to select and upload the documents to be summarized directly to our servers, which then run our summarization algorithm to produce an accurate summary and return a download link to the user. The web application is developed using an HTML, CSS and JavaScript front-end stack. For testing purposes, we initially host the web application on a local Linux server using apache2.
8. HUMAN VALIDATION
To ensure the accuracy and semantic consistency of our summarization algorithm, we conduct human evaluation of the output at various stages to check that it meets the required criteria. We also compare the output of our summarization technique with the output produced by other commonly used summarization algorithms to compare speed and accuracy. Human validation is conducted only during the testing phase and is not required after the application is deployed.
9. CONCLUSION
We find that the proposed summarization technique makes the following contributions:
1. A free summarization tool that will be widely accessible to a large user base.
2. A novel extraction mechanism that can provide accurate extracted summaries using a statistical approach.
3. A novel abstract-extract-abstract system that can provide much more accurate results.
We find that our summarization technique is slightly slower than some state-of-the-art fast abstraction and extraction systems because of its three-stage process, but this process allows us to produce much more accurate results consistently.
REFERENCES
[1] Hongyan Jing and Kathleen R. McKeown. 2000. Cut and paste based text summarizer, which uses operations derived from an analysis of human written abstracts.
[2] Kevin Knight and Daniel Marcu. 2000. Summarization beyond sentence extraction: A probabilistic approach to sentence compression.
[3] Andre F. T. Martins and Noah A. Smith. 2009. Summarization with a Joint Model for Sentence Extraction and Compression.
[4] Michele Banko, Vibhu O. Mittal, and Michael J. Witbrock. 2000. Headline generation based on statistical translation. Association for Computational Linguistics.
[5] David Zajic, Bonnie Dorr, and Richard Schwartz. 2004. BBN/UMD at DUC-2004: Topiary. In HLT-NAACL 2004 Document Understanding Workshop, pages 112–119, Boston, Massachusetts.
[6] Yen-Chun Chen and Mohit Bansal (UNC Chapel Hill). 2018. Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting.
[7] Peter J. Liu and Yao Zhao (Software Engineers, Google Research). 2020. PEGASUS: A State-of-the-Art Model for Abstractive Text Summarization.
[8] Ramesh Nallapati, Bowen Zhou, Cicero Nogueira dos Santos, Caglar Gulcehre, and Bing Xiang. 2016. Abstractive text summarization using sequence-to-sequence RNNs and beyond.
[9] Marc’Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. 2016. Sequence level training with recurrent neural networks. In ICLR.
[10] Romain Paulus, Caiming Xiong, and Richard Socher. 2018. A deep reinforced model for abstractive summarization. In ICLR.
[11] Jun Suzuki and Masaaki Nagata. 2016. RNN-based encoder-decoder approach with word frequency estimation. In EACL.
[12] Qian Chen, Xiaodan Zhu, Zhenhua Ling, Si Wei, and Hui Jiang. 2016. Distraction-based neural networks for modeling documents. In IJCAI.
[13] Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get to the point: Summarization with pointer generator networks.
[14] Yishu Miao and Phil Blunsom. 2016. Language as a latent variable: Discrete generative models for sentence compression. In EMNLP.
[15] Jiatao Gu, Zhengdong Lu, Hang Li, and Victor O.K. Li. 2016. Incorporating copying mechanism in sequence-to-sequence learning.
[16] Swabha Swayamdipta, Ankur P. Parikh, and Tom Kwiatkowski. 2017. Multi-mention learning for reading comprehension with neural cascades.
[17] Sebastian Henß, Margot Mieskes, and Iryna Gurevych. 2015. A reinforcement learning approach for adaptive single- and multi-document summarization.
[18] Ramesh Nallapati, Feifei Zhai, and Bowen Zhou. 2017. SummaRuNNer: A recurrent neural network-based sequence model for extractive summarization of documents.
[19] Angela Fan, David Grangier, and Michael Auli. 2017. Controllable abstractive summarization.