1. Natural Language Processing (BAG OF WORDS)
Bag of words is an algorithm that helps extract features from text (the aim here is to understand its intuition).
Machine learning algorithms cannot work on raw text, so the text is first converted into vectors of numbers.
This process is known as feature extraction, and bag of words is one way of doing it.
2. ABOUT BAG OF WORDS
Bag of words is an NLP model that extracts features from text so that the text can be used by machine learning algorithms.
Any information about the order or structure of the words is discarded; only which words occur, and how often, is kept.
That is why it is called a bag of words.
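To see that word order is discarded, here is a minimal Python sketch that represents each sentence as a bag of word counts using `collections.Counter`. It assumes simple lowercasing and whitespace splitting; the function name `bag_of_words` and the example sentences are hypothetical.

```python
from collections import Counter

def bag_of_words(text):
    # Lowercase and split on whitespace; punctuation handling is omitted for brevity.
    return Counter(text.lower().split())

a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")

# Both sentences contain the same words with the same counts,
# so their bags are identical even though the meanings differ.
print(a == b)
```

Because only counts survive, the two sentences become indistinguishable; this is the price paid for ignoring order and structure.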
The intuition is that similar documents have similar content. We can also learn something about the meaning of a document from its content alone.
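One common way to turn "similar content" into a number is cosine similarity between bag-of-words count vectors. The slides do not name a similarity measure, so this choice is an assumption; the sketch below uses lowercased, whitespace-split words, and `cosine_similarity` is a hypothetical helper, not a library function.

```python
import math
from collections import Counter

def cosine_similarity(doc1, doc2):
    # Bag-of-words count vectors for each document (lowercased, whitespace-split).
    c1, c2 = Counter(doc1.lower().split()), Counter(doc2.lower().split())
    vocab = set(c1) | set(c2)
    # Dot product over the shared vocabulary (missing words count as 0).
    dot = sum(c1[w] * c2[w] for w in vocab)
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm1 * norm2)

print(cosine_similarity("cats chase mice", "mice fear cats"))     # shares two words
print(cosine_similarity("cats chase mice", "stocks rose today"))  # shares no words
```

Documents that share more words score closer to 1, and documents with no words in common score 0.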
3. Step-by-step approach to implement bag of words
Text normalisation: collect the data and preprocess it.
Create dictionary: make a list of all the unique words occurring in the corpus.
Document vectors: for a document, count how many times each word from the list of unique words occurs in it.
Document vectors for all the documents: repeat the previous step for every document in the corpus.
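The four steps above can be sketched in plain Python as follows. This is a minimal illustration, not a definitive implementation: it assumes a hypothetical three-document corpus and a simplified normalisation that only lowercases and splits on whitespace. The names `normalise`, `build_dictionary` and `document_vector` are illustrative, not from any library.

```python
def normalise(doc):
    # Simplified text normalisation: lowercase and split on whitespace.
    # Real preprocessing may also strip punctuation and remove stopwords.
    return doc.lower().split()

def build_dictionary(tokenised_docs):
    # Unique words across the whole corpus, kept in order of first appearance.
    return list(dict.fromkeys(word for tokens in tokenised_docs for word in tokens))

def document_vector(tokens, vocab):
    # Count how many times each dictionary word occurs in this one document.
    return [tokens.count(word) for word in vocab]

# Hypothetical toy corpus: one sentence per document.
corpus = ["The cat sat", "The dog sat", "The cat saw the dog"]

tokenised = [normalise(doc) for doc in corpus]             # text normalisation
vocab = build_dictionary(tokenised)                        # create dictionary
vectors = [document_vector(t, vocab) for t in tokenised]   # vectors for all documents

print(vocab)
for doc, vec in zip(corpus, vectors):
    print(vec, doc)
```

Here the dictionary is `['the', 'cat', 'sat', 'dog', 'saw']`, and the third document maps to `[2, 1, 0, 1, 1]` because "the" appears twice in it. Every document vector has the same length as the dictionary, which is what lets a machine learning algorithm compare them.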
5. Step 1: Text Normalization
No tokens have been removed in the stopword removal step. This is because we have very little data, and since all the words occur with almost the same frequency, no word can be said to have less value than the others.
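Because stopword removal is skipped here but is often applied on larger corpora, one option is to make it a switch in the normalisation step. The sketch below assumes a small hypothetical stopword list (real lists are much longer) and a `remove_stopwords` parameter that is illustrative only; by default it keeps every token, matching the decision described above.

```python
# Hypothetical stopword list for illustration; real lists are much longer.
STOPWORDS = {"a", "an", "and", "are", "is", "the", "to"}

def normalise(doc, remove_stopwords=False):
    # Lowercase and split on whitespace.
    tokens = doc.lower().split()
    # Stopword removal is optional: with very little data, every word may
    # carry useful signal, so the default keeps all tokens.
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens

print(normalise("The cat sat on the mat"))
print(normalise("The cat sat on the mat", remove_stopwords=True))
```

With the default, all six tokens are kept; with `remove_stopwords=True`, both occurrences of "the" are dropped.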
Step 2: Sentence segmentation
Each document has a single sentence, so segmentation is not needed.
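For corpora where documents do contain several sentences, segmentation splits each document into its sentences. The sketch below uses a naive regular-expression rule (split after `.`, `!` or `?` followed by whitespace), which is an assumption for illustration; it will mis-split abbreviations such as "Dr.", which dedicated tokenisers handle better. `segment_sentences` is a hypothetical name.

```python
import re

def segment_sentences(document):
    # Naive rule: split wherever '.', '!' or '?' is followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    # Drop empty strings (e.g. from an empty document).
    return [p for p in parts if p]

print(segment_sentences("The cat sat."))               # a single sentence
print(segment_sentences("The cat sat. The dog ran!"))  # two sentences
```

A one-sentence document comes back as a one-element list, which is why the step can be skipped here.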
6. Step 3: Lowercase and remove duplicates
All words are converted to lowercase so that, for example, "The" and "the" are treated as the same word. Duplicate words are then removed from the document, if any.
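A minimal sketch of this step in Python, assuming the document has already been split into tokens. `lowercase_and_dedupe` is a hypothetical helper; it relies on `dict.fromkeys`, which keeps only the first occurrence of each key in insertion order.

```python
def lowercase_and_dedupe(tokens):
    # Lowercase every token so "The" and "the" count as the same word,
    # then keep only the first occurrence of each word (order preserved).
    return list(dict.fromkeys(t.lower() for t in tokens))

print(lowercase_and_dedupe(["The", "cat", "saw", "the", "dog"]))
# ['the', 'cat', 'saw', 'dog']
```

The deduplicated list is what feeds the dictionary of unique words; when building document vectors, the counts are still taken from the original (lowercased) tokens, otherwise every count would be 1.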