ExcelR Convers
PRESENTED BY
SAYAN MONDAL
GUIDED BY
MR. BYOM
Business Problem
Extract all actionable insights from the chat transcripts that will be helpful for:
•   Improvement of Business
•   Easy Connectivity
•   Reduced Human Dependency
•   Effective Connectivity
•   24/7 Service
Objective
Topic mining and exploratory analysis to support resource allocation, content modification and service improvement.
Data Set
•   90 days of conversational data
•   Semi-structured data
PROJECT ARCHITECTURE / PROJECT FLOW
START WITH BUSINESS PROBLEM
CLEAN THE DATA
PERFORM EDA
APPLY MACHINE LEARNING TECHNIQUES
INSIGHTS
DATA INSIGHTS FROM DIFFERENT VARIABLES
TIMESTAMP
From the timestamp we can identify the day of the week and the time slot in which we receive the most chats.
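A minimal sketch of the timestamp analysis, assuming each chat has a start timestamp stored as a string in the hypothetical format `%Y-%m-%d %H:%M:%S`. The real export's column name and format may differ. It counts chats per weekday and per hour of day and returns the busiest of each:

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp format; adjust to match the real chat export.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"

def busiest_day_and_hour(timestamps):
    """Return (weekday name, hour of day) with the highest chat counts."""
    parsed = [datetime.strptime(ts, TIMESTAMP_FORMAT) for ts in timestamps]
    by_day = Counter(dt.strftime("%A") for dt in parsed)  # e.g. 'Monday'
    by_hour = Counter(dt.hour for dt in parsed)           # 0..23
    return by_day.most_common(1)[0][0], by_hour.most_common(1)[0][0]

# Illustrative sample, not real data
sample = [
    "2021-03-01 10:15:00",  # Monday
    "2021-03-01 10:45:00",  # Monday
    "2021-03-08 18:05:00",  # Monday
    "2021-03-03 10:30:00",  # Wednesday
]
print(busiest_day_and_hour(sample))  # ('Monday', 10)
```

The full `by_day` and `by_hour` counters are what the day-wise and hour-wise charts would be drawn from.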
UNREAD
True/False: based on this flag we can estimate how many people visited the website and how many of them were actually interested.
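A small sketch of summarising the UNREAD flag. It assumes that `UNREAD == False` means the visitor read the messages, which is taken here as a sign of interest. It also assumes the flag may arrive as real booleans or as strings such as `"true"`/`"False"`. Both are assumptions about the export format.

```python
from collections import Counter

def to_bool(value):
    """Accept real booleans or strings such as 'true' / 'False'."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"

def engagement_summary(unread_flags):
    """Assumes UNREAD == False means the visitor read (engaged with) the messages."""
    flags = [to_bool(v) for v in unread_flags]
    counts = Counter(flags)
    total = len(flags)
    engaged = counts[False]
    return {
        "visitors": total,
        "engaged": engaged,
        "engaged_pct": 100 * engaged / total if total else 0.0,
    }

print(engagement_summary(["false", "True", "FALSE", False]))
# {'visitors': 4, 'engaged': 3, 'engaged_pct': 75.0}
```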
VISITOR EMAIL
Could be used for marketing purposes, e.g. special-occasion discounts.
COUNTRY NAME/REGION/CITY
From these fields we can perform geographic segmentation.
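Geographic segmentation can start with each segment's share of visitors. The sketch below assumes each visitor is a dict with `country`, `region` and `city` keys. The field names and sample records are hypothetical. The same helper is reused to drill down from country to region:

```python
from collections import Counter

def share_by(records, key):
    """Percentage of visitors per value of `key`, largest segment first.
    Records with a missing or empty value are ignored."""
    counts = Counter(r[key] for r in records if r.get(key))
    total = sum(counts.values())
    return [(value, round(100 * n / total, 1)) for value, n in counts.most_common()]

# Hypothetical records for illustration
visitors = [
    {"country": "India", "region": "Karnataka", "city": "Bengaluru"},
    {"country": "India", "region": "Maharashtra", "city": "Pune"},
    {"country": "India", "region": "Karnataka", "city": "Mysuru"},
    {"country": "United Arab Emirates", "region": "Dubai", "city": "Dubai"},
]
print(share_by(visitors, "country"))  # [('India', 75.0), ('United Arab Emirates', 25.0)]
# Drill down: regions within India only
print(share_by([v for v in visitors if v["country"] == "India"], "region"))
# [('Karnataka', 66.7), ('Maharashtra', 33.3)]
```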
AGE
If we collect age as an input, we could perform demographic analysis.
CHAT
From the chat start and end times we can gauge how engaged the prospect is.
During the hours when chat volume is high we can assign more executives, and give them breaks in rotation when volume is low.
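A sketch of both ideas under stated assumptions. Chat length in minutes is computed from the start and end times, and a staffing estimate is derived from the number of chats starting in each hour. The timestamp format and `CHATS_PER_EXECUTIVE` are hypothetical values chosen for illustration.

```python
import math
from collections import Counter
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"   # assumed format of the start/end columns
CHATS_PER_EXECUTIVE = 4     # hypothetical hourly capacity of one executive

def duration_minutes(start, end):
    """Chat length in minutes, a rough proxy for how engaged the prospect is."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 60

def executives_per_hour(starts):
    """Executives needed per hour, based on how many chats start in that hour."""
    per_hour = Counter(datetime.strptime(s, FMT).hour for s in starts)
    return {hour: math.ceil(n / CHATS_PER_EXECUTIVE) for hour, n in sorted(per_hour.items())}

print(duration_minutes("2021-03-01 10:00:00", "2021-03-01 10:12:30"))  # 12.5
starts = [f"2021-03-01 10:{m:02d}:00" for m in (0, 5, 20, 35, 50)] + ["2021-03-01 14:10:00"]
print(executives_per_hour(starts))  # {10: 2, 14: 1}
```

Hours that do not appear in the result have no chats, so those are the natural slots for breaks.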
The chats show us what prospects are looking for, e.g. digital marketing, deep learning, blockchain. If a topic is not yet offered but its demand percentage is high, we could consider adding it to our curriculum.
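The "demand percentage" can be estimated as the share of chats that mention each topic. This sketch uses simple case-insensitive substring matching, so it would miss spelling variants such as "block chain". `CURRICULUM` and `min_share` are hypothetical placeholders for the institute's real course list and threshold.

```python
# Hypothetical list of courses already on offer (lowercase)
CURRICULUM = {"data science", "digital marketing"}

def demand_percentage(chats, topics):
    """Percentage of chats mentioning each (lowercase) topic, by substring match."""
    lowered = [chat.lower() for chat in chats]
    return {
        topic: 100 * sum(topic in chat for chat in lowered) / len(lowered)
        for topic in topics
    }

def curriculum_gaps(chats, topics, min_share=20.0):
    """Topics with at least `min_share` % demand that are not in CURRICULUM."""
    shares = demand_percentage(chats, topics)
    return sorted(t for t, s in shares.items() if s >= min_share and t not in CURRICULUM)

chats = [
    "Do you offer a Deep Learning course?",
    "What is the fee for Blockchain?",
    "I want deep learning and blockchain together",
    "Is placement support available?",
]
topics = ["deep learning", "blockchain", "digital marketing"]
print(demand_percentage(chats, topics))
# {'deep learning': 50.0, 'blockchain': 50.0, 'digital marketing': 0.0}
print(curriculum_gaps(chats, topics))  # ['blockchain', 'deep learning']
```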
If a prospect's questions become more technical, there should be an option to transfer the chat to a technical person.
We can also perform sentiment analysis on the chat data and add a rating as a new feature. A prospect whose score is above 6 can then be treated as a potential customer who is more interested in taking a course at our institute.
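A toy sketch of the rating idea, assuming a 0 to 10 scale so that "above 6" is meaningful. The tiny `POSITIVE`/`NEGATIVE` word sets are hypothetical stand-ins for a real sentiment model or lexicon. The point is only the mapping from a sentiment balance to a rating and the `> 6` cut-off.

```python
import re

# Hypothetical toy lexicons; a real project would use a proper sentiment model.
POSITIVE = {"interested", "great", "good", "love", "enroll", "yes", "thanks"}
NEGATIVE = {"expensive", "bad", "not", "no", "later", "costly"}

def rating(chat):
    """Map the share of positive vs negative words onto a 0-10 scale."""
    words = re.findall(r"[a-z']+", chat.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 5.0  # neutral when no sentiment words are found
    return round(10 * pos / (pos + neg), 1)

def is_potential_customer(chat, threshold=6):
    """Flag prospects whose rating is strictly above the threshold."""
    return rating(chat) > threshold

print(rating("I love this course, very interested, how do I enroll?"))  # 10.0
print(is_potential_customer("Too expensive, maybe later"))              # False
```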
DATA MERGE
The goal was to merge the text data from all the text files in the directory. So far, we have been able to combine the data from every file into one output file.
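A minimal sketch of the merge step, assuming the transcripts are `*.txt` files in one directory and UTF-8 encoded. The file names in the demo are hypothetical. Files are read in sorted order so the merged output is reproducible. The output file is excluded in case it lives in the same directory.

```python
import tempfile
from pathlib import Path

def merge_text_files(directory, output_file, pattern="*.txt"):
    """Concatenate every file in `directory` matching `pattern` into `output_file`.
    Returns the number of files merged."""
    out = Path(output_file).resolve()
    # sorted() consumes the glob before the output file is created
    paths = sorted(p for p in Path(directory).glob(pattern) if p.resolve() != out)
    with out.open("w", encoding="utf-8") as dst:
        for path in paths:
            # errors="replace" keeps going if a transcript contains stray bytes
            dst.write(path.read_text(encoding="utf-8", errors="replace"))
            dst.write("\n")
    return len(paths)

# Demo in a throwaway directory with hypothetical file names
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "chat_001.txt").write_text("Hi, I want details about the course", encoding="utf-8")
(demo_dir / "chat_002.txt").write_text("What is the fee?", encoding="utf-8")
merged = demo_dir / "merged_output.txt"
count = merge_text_files(demo_dir, merged)
print(count)  # 2
```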
DATA PARSING
Chat Data Analysis & Topic Modeling Using LDA
Data Pre-Processing
•   Converting all text to lowercase
•   Removing punctuation from text
•   Removing all stopwords
•   Lemmatization
•   Removing special words, e.g. Id, Okay, etc.
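The steps above can be sketched in the same order with the standard library. The stopword and special-word sets are small hypothetical samples. Lemmatization is a pluggable function that defaults to the identity. For example, NLTK's `WordNetLemmatizer().lemmatize` could be passed in, but it needs the WordNet corpus downloaded first. Punctuation is replaced with a space rather than deleted, because deleting it can glue words together. A token such as `namecontact` in the frequency list is the kind of artefact this produces.

```python
import re

# Small hypothetical samples; a real run would use a full stopword list.
STOPWORDS = {"a", "an", "the", "is", "are", "i", "to", "of", "in", "for", "and", "do", "you", "what"}
SPECIAL_WORDS = {"id", "okay", "ok", "hi", "hello"}

def preprocess(text, lemmatize=lambda word: word):
    """Lowercase -> strip punctuation -> drop stopwords -> lemmatize -> drop special words."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)   # punctuation becomes a space
    tokens = [t for t in text.split() if t not in STOPWORDS]
    tokens = [lemmatize(t) for t in tokens]
    return [t for t in tokens if t not in SPECIAL_WORDS]

print(preprocess("Hi, I want to know the fees for Data Science! Okay?"))
# ['want', 'know', 'fees', 'data', 'science']
```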
Most Visitors by Day
Different Types of Platforms Used Outside India
Month-Wise Visits
Number of People Actively Reading Messages
Most Visits by Region
Platforms Used Worldwide
Most Visitors Outside India
Word Cloud of the Complete Corpus
Negative Word Cloud
Positive Word Cloud
Top 50 Frequent Words
[('get', 27293), ('data', 25215), ('science', 23305), ('end', 17841), ('location', 17710),
('support', 17066), ('number', 15972), ('information', 14559), ('quick', 14239), ('request', 11318),
('see', 11115), ('month', 11112), ('name', 11041), ('exploring', 11033), ('elearning', 10408),
('discount', 9830), ('exclusive', 9612), ('namecontact', 9606), ('like', 9551), ('time', 9505),
('access', 9502), ('detail', 9297), ('offer', 9151), ('enroll', 9029), ('money', 8979),
('love', 8971), ('special', 8966), ('save', 8962), ('life', 8805), ('contact', 8724),
('placement', 8620), ('region', 8333), ('project', 8284), ('please', 7989), ('city', 7772),
('group', 7772), ('clarification', 7743), ('whats', 7605), ('app', 7459), ('interview', 7260),
('doubt', 6925), ('live', 6722), ('call', 6493), ('student', 6453), ('preparation', 6444),
('25', 6316), ('forum', 6248), ('know', 5148), ('fee', 4187), ('pmp', 4148)]
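A list of `(word, count)` pairs like the one above is exactly the shape returned by `collections.Counter.most_common`. A minimal sketch, assuming each chat has already been pre-processed into a token list (the sample documents are illustrative):

```python
from collections import Counter

def top_words(token_lists, n=50):
    """Return the n most frequent tokens across all pre-processed chats."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts.most_common(n)

docs = [["data", "science", "fee"], ["data", "placement"], ["data", "science"]]
print(top_words(docs, n=2))  # [('data', 3), ('science', 2)]
```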
Bigram
Trigram
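Bigram and trigram counts generalise the word counts to contiguous word sequences. This sketch builds n-grams by zipping shifted copies of the token list. Applying it per chat, rather than to the concatenated corpus, avoids counting pairs that span two different conversations.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count contiguous n-word sequences (n=2 for bigrams, n=3 for trigrams)."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)

tokens = ["data", "science", "course", "data", "science", "placement"]
print(ngram_counts(tokens, 2).most_common(1))  # [('data science', 2)]
print(ngram_counts(tokens, 3)["data science course"])  # 1
```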
Sentiment Analysis (Positive vs. Negative)
Topic Modeling Using LDA
Topic 1 - Course inquiry
Topic 2 - Career transformation
Topic 3 - Assistance
Topic 4 - E-learning and discount
Model log perplexity: -5.46
Coherence score: 0.59
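A perplexity is always at least 1, so a negative figure such as -5.46 is a log-scale value. Assuming it is the per-word likelihood bound returned by gensim's `LdaModel.log_perplexity`, gensim's own log message reports `2 ** (-bound)` as the perplexity estimate. The sketch below applies the same conversion. The coherence score would typically come from gensim's `CoherenceModel(...).get_coherence()`, which is not reproduced here.

```python
def perplexity_from_bound(per_word_bound):
    """Convert a per-word likelihood bound into a perplexity estimate,
    using the same 2 ** (-bound) conversion that gensim logs."""
    return 2 ** (-per_word_bound)

print(round(perplexity_from_bound(-5.46), 1))  # 44.0
```

Lower perplexity and higher coherence are both generally read as a better topic model, though coherence tends to track human judgement of topic quality more closely.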
Unsupervised To Supervised Model
Why did we convert the unsupervised model into a supervised one?
What benefits does this bring from a business perspective?
How did we do it?
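The slides do not show how the conversion was done. One common approach, assumed here, is to label every chat with its dominant LDA topic and then use those labels as the target for the classifiers that follow. The sketch takes per-chat topic probabilities, for example from an LDA model's document-topic output, and returns one label per chat. The probability rows are made-up examples.

```python
def dominant_topic_labels(doc_topic_probs, topic_names):
    """Label each chat with its most probable topic, turning LDA output
    into (text, label) pairs for a supervised classifier."""
    labels = []
    for probs in doc_topic_probs:
        best = max(range(len(probs)), key=probs.__getitem__)
        labels.append(topic_names[best])
    return labels

TOPICS = ["Course inquiry", "Career transformation", "Assistance", "E-learning and discount"]
probs = [[0.70, 0.10, 0.10, 0.10], [0.05, 0.15, 0.20, 0.60]]  # illustrative values
print(dominant_topic_labels(probs, TOPICS))  # ['Course inquiry', 'E-learning and discount']
```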
Chat CSV with Time Duration
Naive Bayes Classifier
Accuracy: 97.79%
Confusion matrix
Kappa score: 0.8303
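Accuracy and Cohen's kappa, as reported for each classifier, can both be computed from a confusion matrix. With accuracies around 98%, kappa is the more telling number because it discounts the agreement expected by chance, which is high when one class dominates. A minimal sketch, with a made-up 2×2 matrix:

```python
def accuracy_and_kappa(cm):
    """Accuracy and Cohen's kappa from a square confusion matrix
    (rows = true class, columns = predicted class)."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(k)) / total        # plain accuracy
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(cm[r][c] for r in range(k)) for c in range(k)]
    # Agreement expected if predictions were independent of the true labels
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

acc, kappa = accuracy_and_kappa([[45, 5], [10, 40]])  # illustrative counts
print(round(acc, 4), round(kappa, 4))  # 0.85 0.7
```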
Logistic Regression Classifier
Accuracy: 98.06%
Confusion matrix
Kappa score: 0.8508
CatBoost Classifier
Accuracy: 98.35%
Classification report
Kappa score: 0.8749
Challenges Faced and Ways to Improve
•   The UNREAD field was misclassified.
•   The chatbot should take user credentials before starting the conversation.
•   Course fees should be shown according to the respective country.
Model Deployment Using Flask
Thank You

Chatbot data to Topic modelling
