Automatic keyword extraction.pptx

PROJECT PRESENTATION
ON
“AUTOMATIC KEYWORD EXTRACTION FOR
TEXT SUMMARIZATION”
Presented by
Biswarup Das
Roll-102217, No.-02220630
10th semester
Under the guidance of Dr. Rakesh Kumar, Assistant Professor.
Department of Computer Science,
Assam University, Silchar.

CONTENTS: -
 Introduction
 Objective
 Problem Statement
 Types of Summarization
 Literature Review
 Methodology and Implementation
 System Configuration
 Results
 Conclusion
 Future Work
 References

INTRODUCTION
 Summarization is a process where the most salient
features of a text are extracted and compiled into a
short abstract of the original document.
 In order to achieve this, we need to first mine the text
from the document.

TEXT MINING
 Text mining is a method of extracting information by
collecting patterns and keywords from unstructured
data.
 It includes text categorization, sentiment analysis
and various other tasks.

NATURAL LANGUAGE PROCESSING (NLP)
 NLP deals with the interaction between computers and human
language.
 Data that comes from conversations and statements is
unstructured: it is disorganized and difficult to manage.
 For a computer to process text, we have to translate it into a
numerical form. To achieve this we will use word
embedding.

WORD EMBEDDING
 It is a numerical representation of words. A common
representation is the one-hot vector [1].
 This method encodes each word with a unique vector.
All values in that vector are zero except for a single 1,
whose position identifies the word.
 The most popular word embeddings are Word2Vec and
GloVe.
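The one-hot encoding described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's code: the function name `one_hot_vectors` and the three-word vocabulary are assumptions made for the example.

```python
def one_hot_vectors(vocabulary):
    """Map each distinct word to a vector that contains a single 1."""
    words = sorted(set(vocabulary))  # fixed order so positions are stable
    vectors = {}
    for index, word in enumerate(words):
        vector = [0] * len(words)  # all zeros ...
        vector[index] = 1          # ... except at the word's own position
        vectors[word] = vector
    return vectors


# Hypothetical vocabulary, used only for illustration.
vectors = one_hot_vectors(["text", "summary", "keyword"])
print(vectors["keyword"])
```

Each vector is as long as the vocabulary, which is one reason dense embeddings such as Word2Vec are usually preferred for large vocabularies.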

OBJECTIVE
 The main objective of automatic text summarization is
to present the source text in a shorter version that
preserves its meaning.
 The main advantage of a summary is that it reduces
reading time.

PROBLEM STATEMENT
 Generating a summary from a text document: -
 This helps in understanding a large amount of text.
 As the internet becomes available everywhere, the
amount of information is growing rapidly.
 This ultimately makes it a challenge to
summarize all types of data.

TYPES OF SUMMARIZATION
 There are two types of summarization:
1) Extractive text summarization: This method
selects information from the document exactly as it
appears in the source to form the summary.
2) Abstractive text summarization: In this method, the
machine must grasp the ideas of the documents
given as input and then produce a summary
in newly generated sentences.

LITERATURE REVIEW
 García-Hernández and Ledeneva [2] suggested a method of term selection with the help of
TF-IDF. They achieve this with an unsupervised method to generate the
summary.
 Krishnaveni et al. (2017) [3] suggested text summarization on the basis of
headings, since the context can be identified from the heading.
 Nikhil S. Shirwandkar et al. (2018) [4] proposed a method that uses both a
Restricted Boltzmann Machine (RBM) and fuzzy logic to recognize the key
sentences. The approach produces short and concise summaries
for long text documents.
 J. N. Madhuri et al. (2019) [5] proposed a technique to create an extractive
summary using sentence ranking based on term frequency after
removing stop words. It works for any type of text but cannot differentiate
between sentences.

METHODOLOGY AND IMPLEMENTATION
 Architecture:-
Figure 1- The Architecture of extractive text summarizer

DETAILS OF ARCHITECTURE
 Source File:-
- To create the summary, an input document must be supplied.
The input document should be in English only.
- For uploading the file we will use the following command:-
FIGURE 2- UPLOADING OF FILE TYPE

PRE-PROCESSING
- The input text is divided into sentences based on the sentence terminator. These
sentences are individually preprocessed using the techniques below.
- 1) Lower Casing- The entire input is converted to lowercase
letters.
- For performing this task we will use the following command-
- Figure 3- Lowering of texts
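The figure itself is not reproduced in this text, so the following is only a sketch of the two steps described: splitting on sentence terminators and lowercasing. It uses the standard library rather than NLTK or spaCy, and the function names and sample text are assumptions.

```python
import re


def split_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def lower_case(sentences):
    """Convert every sentence to lowercase letters."""
    return [sentence.lower() for sentence in sentences]


# Hypothetical input text, used only for illustration.
text = "Text mining extracts keywords. Summaries save time! Is it accurate?"
sentences = lower_case(split_sentences(text))
print(sentences)
```

A regular expression like this one splits abbreviations such as "Dr. Kumar" incorrectly; a trained sentence tokenizer handles such cases better.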

2) Stop word Removal:-
- In this step all the stop words, which occur
frequently, are removed from the input data.
- We will remove the stop words with the following command-
- FIGURE 4- REMOVING OF STOP WORDS
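As a rough illustration of stop word removal, the sketch below filters tokens against a small hand-written set. The set `STOP_WORDS` and the function name are assumptions for the example; the project presumably loads a full stop word list from its NLP library.

```python
import re

# Small illustrative stop word set; a real run would load a complete list.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "and", "to", "it"}


def remove_stop_words(sentence, stop_words=STOP_WORDS):
    """Return the lowercase word tokens of a sentence without stop words."""
    tokens = re.findall(r"[a-z0-9']+", sentence.lower())
    return [token for token in tokens if token not in stop_words]


tokens = remove_stop_words("The summary is a short version of the text.")
print(tokens)
```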

FEATURE EXTRACTION
1) Word Frequency:- All the words that occur in the document are
counted to make a frequency list of the words.
- To determine the word frequencies we will use the following command:-
- Figure 5- Word Frequency
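A frequency list of this kind can be sketched with `collections.Counter`. Dividing by the maximum frequency is an assumption here, suggested by the "Maximum Word Frequency" screenshot in the results; the token list is made up for the example.

```python
from collections import Counter


def word_frequencies(tokens):
    """Return raw counts and counts normalized by the highest count."""
    counts = Counter(tokens)
    if not counts:
        return counts, {}
    highest = max(counts.values())
    # Assumed normalization: the most frequent word gets the value 1.0.
    normalized = {word: count / highest for word, count in counts.items()}
    return counts, normalized


# Hypothetical tokens, as they might look after stop word removal.
tokens = ["summary", "text", "summary", "keyword", "text", "summary"]
counts, normalized = word_frequencies(tokens)
print(counts.most_common())
```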

2) Sentence Tokenization:-
- For sentence tokenization we first lowercase the words,
which we have done earlier. After comparing each word with the
sentences, we determine the sentence scores, so that each
sentence is stored with its score.
- We will use this command to do this task:-
Figure 6- Text Tokenization
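One common way to turn word frequencies into sentence scores is to add up the frequencies of the words each sentence contains. The sketch below assumes that scheme; the slide does not spell out the exact formula, and the names and sample data are illustrative.

```python
import re


def score_sentences(sentences, frequencies):
    """Score each sentence as the sum of the frequencies of its known words."""
    scores = {}
    for sentence in sentences:
        for word in re.findall(r"[a-z0-9']+", sentence.lower()):
            if word in frequencies:
                scores[sentence] = scores.get(sentence, 0.0) + frequencies[word]
    return scores


# Hypothetical normalized frequencies and sentences.
frequencies = {"summary": 1.0, "text": 0.5, "keyword": 0.5}
sentences = [
    "A summary shortens the text.",
    "Keyword lists help.",
    "Nothing relevant here.",
]
scores = score_sentences(sentences, frequencies)
print(scores)
```

Sentences with no scored word do not appear in the result, so they can never be selected for the summary.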

 Extraction of high score sentences:-
- After determining the sentence scores we arrange them in
descending order and store them in a list for summary generation.
- The command is given below-
- Figure 7- Sentence Scores
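Arranging the scores in descending order can be sketched with `sorted`. The function name and the sample scores are assumptions; the figure's actual command is not reproduced in this text.

```python
def rank_sentences(scores):
    """Return (sentence, score) pairs ordered from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sentence scores.
scores = {
    "Keyword lists help.": 0.5,
    "A summary shortens the text.": 1.5,
    "Text mining finds patterns.": 1.0,
}
ranked = rank_sentences(scores)
for sentence, score in ranked:
    print(score, sentence)
```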

 Summary Generation:-
- It depends on locating the file's theme, which involves common
measures such as term frequency, TF-IDF, etc.
- The steps in the processing of this summarizer are as follows:
- 1) Conversion of the input text into an intermediate representation.
- 2) Assigning a priority score to each sentence.
- We will use the following command to generate the summary-
- Figure 8- Summary Generation
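A possible selection step is sketched below: keep a fixed share of the highest-scoring sentences and output them in their original order. The `ratio` parameter, its default of 0.3, and the sample data are assumptions, not values taken from the project.

```python
from heapq import nlargest


def generate_summary(sentences, scores, ratio=0.3):
    """Keep the top-scoring share of sentences, in document order."""
    keep = max(1, int(len(sentences) * ratio))  # always keep at least one
    top = set(nlargest(keep, scores, key=scores.get))
    return [sentence for sentence in sentences if sentence in top]


# Hypothetical sentences and scores.
sentences = [
    "Summaries save time.",
    "The weather was mild.",
    "Keywords guide sentence selection.",
    "Lunch was at noon.",
]
scores = {
    sentences[0]: 1.5,
    sentences[1]: 0.2,
    sentences[2]: 2.0,
    sentences[3]: 0.1,
}
summary = generate_summary(sentences, scores, ratio=0.5)
print(summary)
```

Restoring document order is a design choice: it tends to read more naturally than listing sentences by score.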

POST-PROCESSING
 We will convert the sentences from spaCy spans to strings so that the
sentences can be joined.
 After that we will apply a list comprehension to the sentences from the previous step.
- We will perform the above steps using the following commands-
Figure 9- Conversion from spacy span to strings
Figure 10- List Comprehension
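The conversion can be sketched as follows. spaCy `Span` objects expose their string through the `.text` attribute; the `FakeSpan` class is a hypothetical stand-in so that the example runs without spaCy installed, and the function name is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class FakeSpan:
    """Hypothetical stand-in for a spaCy Span; only `.text` is modelled."""
    text: str


def spans_to_summary(spans):
    """Convert span-like objects to strings and join them into one text."""
    # List comprehension: one plain string per selected sentence.
    sentences = [span.text.strip() for span in spans]
    return " ".join(sentences)


summary_text = spans_to_summary(
    [FakeSpan("Summaries save time. "), FakeSpan("Keywords guide selection.")]
)
print(summary_text)
```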

SYSTEM CONFIGURATION
 Software Configuration:-
1) Python.
2) Natural Language Toolkit (NLTK).
3) Jupyter Notebook.
4) Various other packages.
 Hardware Configuration:-
1. Processors (min. Intel i3 processor).
2. RAM (min. 2GB).
3. Hard disk (512GB is enough)
4. Power supply (input of 100V-240V)

RESULTS
 Screenshots-
1)
Figure 11- Original text

2)
Figure 12- List of Stop Words

3)
Figure 13- Word Frequencies

4)
Figure 14- Maximum Word Frequency

5)
Figure 15- Sentence Scores
6)
Figure 16- Summary generation with length

7)
Figure 17- Length Comparisons

8)
Figure 18- Keywords with their Equivalent scores

CONCLUSION AND FUTURE WORK
 Conclusion-
- The whole project uses an extractive text summarization technique.
A summarization method should create a useful summary in a short time
with minimal redundancy and grammatically correct sentences.
- Other summarization techniques, such as the abstractive method,
can generate more relevant and precise summaries, but the main catch is that they
require more complicated heuristic algorithms.
- The summarization method needs to produce more accurate summaries in less
time with the least redundancy.

 Future Scope-
- There are quite a few problems to solve, such as the parsing accuracy,
which reduces the sequential completeness of the summary and has to be improved.
- The main focus is to improve the parsing accuracy and to
minimize the redundancy.
- This work can also be done in the deep learning domain, where we
can use a layered architecture; after training on datasets it may produce
more accurate summaries.

REFERENCES
1. J. Valverde Tohalino and D. Amancio, "Extractive Multi-document
Summarization Using Multilayer Networks," Physica A: Statistical Mechanics and its
Applications, vol. 503, 2017. doi:10.1016/j.physa.2018.03.013.
2. R. A. García-Hernández and Y. Ledeneva, "Word sequence models for single text
summarization," in Proceedings of the 2nd International Conferences on Advances in
Computer-Human Interactions, ACHI 2009, pp. 44–48, IEEE, 2009.
3. P. Krishnaveni and S. R. Balasundaram, "Automatic text summarization by local scoring
and ranking for improving coherence," 2017 International Conference on
Computing Methodologies and Communication (ICCMC), July 2017. doi:10.1109/ICCMC.2017.8282539.
4. N. S. Shirwandkar and S. Kulkarni, "Extractive Text Summarization Using Deep
Learning," 2018 Fourth International Conference on Computing Communication
Control and Automation (ICCUBEA), Pune, India, 2018, pp. 1-5.
doi:10.1109/ICCUBEA.2018.8697465.
5. J. N. Madhuri and R. Ganesh Kumar, "Extractive Text Summarization Using Sentence
Ranking," 2019 International Conference on Data Science and Communication (Icon
DSC), Bangalore, India, 2019, pp. 1-3. doi:10.1109/IconDSC.2019.8817040.
