5. Bar graph representing the count of each class
As we can see, half of the reviews have positive sentiment and the other half have negative sentiment.
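The counts behind a bar graph like this can be sketched in a few lines. This is a minimal illustration with a hypothetical toy list standing in for the sentiment column, not the code used for the slide; it prints a text bar per class instead of drawing a plot.

```python
from collections import Counter

# Hypothetical toy labels standing in for the dataset's sentiment column.
sentiments = ["positive", "negative", "negative", "positive"]

# Count how many reviews fall into each class.
class_counts = Counter(sentiments)

# Print one text "bar" per class, in alphabetical order of the label.
for label, count in sorted(class_counts.items()):
    print(f"{label:<10}{'#' * count} {count}")
```

Equal counts for both classes, as in the graph, indicate a balanced dataset.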
6. Preprocessing the “Sentiment” column
The sentiment column has 2 main values, ‘positive’ & ‘negative’, so we assign 1s and 0s to them.
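A minimal sketch of this encoding, assuming ‘positive’ maps to 1 and ‘negative’ maps to 0 (the slide does not state which label gets which value). The names LABEL_TO_ID and encode_sentiment are hypothetical.

```python
# Assumed mapping: the source only says the labels become 1s and 0s.
LABEL_TO_ID = {"positive": 1, "negative": 0}

def encode_sentiment(labels):
    """Turn a list of 'positive'/'negative' strings into a list of 1s and 0s."""
    return [LABEL_TO_ID[label] for label in labels]

print(encode_sentiment(["positive", "negative", "positive"]))
```

An unexpected label raises a KeyError here, which makes stray values in the column visible early.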
7. Data Cleaning
Cleaning the data
● removing punctuation marks from the data.
● removing other unnecessary marks from the data.
● converting capital letters to lowercase.
● removing extra spaces.
● removing stopwords from sentences.
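The cleaning steps listed above can be sketched as a single function. This is an illustration under stated assumptions, not the original preprocessing code: the stopword set is a hypothetical, deliberately tiny one, and punctuation is replaced by spaces so that neighbouring words stay separate.

```python
import string

# Hypothetical, deliberately tiny stopword list; a real run would use a full one.
STOPWORDS = {"the", "a", "an", "is", "was", "and", "of", "to", "in", "it", "this"}

# Translation table that turns every ASCII punctuation mark into a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def clean_review(text: str) -> str:
    text = text.lower()                    # capitalization to lowercase
    text = text.translate(PUNCT_TO_SPACE)  # punctuation and other marks
    words = text.split()                   # splitting also drops extra spaces
    kept = [w for w in words if w not in STOPWORDS]  # stopword removal
    return " ".join(kept)

print(clean_review("This movie was GREAT!!!   The plot, however, is thin."))
```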
9. Splitting the Dataset for testing and training
We split the dataset randomly in an 80-20 ratio for training and testing our models, after removing stopwords from the lines.
TRAIN size: 40000
TEST size: 10000
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(data, sentiment, test_size=0.2, random_state=42)
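To show what a random 80-20 split does, and how 50000 rows end up as 40000 and 10000, here is a standard-library sketch. It is not scikit-learn's implementation and will not reproduce the same shuffle; split_80_20 is a hypothetical helper, and the inputs are placeholder integers rather than real reviews.

```python
import random

def split_80_20(items, labels, test_percent=20, seed=42):
    """Shuffle row indices with a fixed seed, then cut off the test share."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    n_test = len(items) * test_percent // 100   # integer arithmetic, no rounding issues
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_tr = [items[i] for i in train_idx]
    x_te = [items[i] for i in test_idx]
    y_tr = [labels[i] for i in train_idx]
    y_te = [labels[i] for i in test_idx]
    return x_tr, x_te, y_tr, y_te

# Placeholder data: 50000 row ids with alternating 0/1 labels.
rows = list(range(50000))
row_labels = [i % 2 for i in rows]
x_tr, x_te, y_tr, y_te = split_80_20(rows, row_labels)
print(len(x_tr), len(x_te))
```

Fixing the seed plays the same role as random_state in the call above: the split is random but repeatable.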
10. Words and their Equivalent Tokens
Words: thesis behind rise evil seems br br hitler bad man bad man hated jews case miss going fact every scene film br br
Tokens: 13091 383 2007 337 85 1 1 2009 16 44 16 44 1631 4271 296 557 70 90 73 47 3 1 1
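The word-to-token mapping above (each word replaced by one integer, with repeated words such as "br" always getting the same number) can be sketched as follows. This assumes a frequency-ranked vocabulary in which the most frequent word gets index 1, similar in spirit to what the Keras Tokenizer builds; the slides do not name the tokenizer used, and build_word_index and texts_to_sequences here are hypothetical plain-Python helpers.

```python
from collections import Counter

def build_word_index(texts):
    """Map each word to an integer rank, with 1 for the most frequent word."""
    counts = Counter(word for text in texts for word in text.split())
    # most_common() orders words by descending count.
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=1)}

def texts_to_sequences(texts, word_index):
    """Replace every known word by its integer; unknown words are skipped."""
    return [[word_index[w] for w in text.split() if w in word_index] for text in texts]

# Toy corpus of already-cleaned reviews.
corpus = ["bad man bad film", "film br br bad"]
word_index = build_word_index(corpus)
sequences = texts_to_sequences(corpus, word_index)
print(word_index)
print(sequences)
```

Each sequence has exactly as many tokens as the cleaned review has words, which matches the 23 words and 23 tokens in the example.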
12. Model Training & Evaluating
Training the model with the necessary parameters
history = model.fit(x_train_pad, y_train, validation_split=0.3, epochs=5, batch_size=1000, shuffle=True, verbose=1)
Then we evaluate the model on the test data
result = model.evaluate(x_test_pad, y_test)
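The arithmetic implied by the fit call can be worked through as below. This is a sketch assuming Keras-style semantics, where validation_split holds out that fraction of the training data for validation and the rest is used for weight updates; the train size of 40000 comes from the split reported earlier, and all variable names are illustrative.

```python
import math

train_size = 40000          # TRAIN size reported for the 80-20 split
validation_percent = 30     # validation_split=0.3, written as a percentage
batch_size = 1000
epochs = 5

# Rows held out for validation and rows actually used for fitting.
n_val = train_size * validation_percent // 100
n_fit = train_size - n_val

# Number of batches per epoch and over the whole run.
steps_per_epoch = math.ceil(n_fit / batch_size)
total_steps = steps_per_epoch * epochs

print(n_fit, n_val, steps_per_epoch, total_steps)
```

Under these assumptions the model sees 28000 reviews per epoch in 28 batches, and the 10000 test reviews are touched only by the evaluate call.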