Zendesk Tickets
and Natural Language Processing
Overview
Zendesk is a customer support
ticketing system. I will be taking a
look at 80k tickets created over
~2 years, hoping to classify
tickets based on the text of the
customer support request.
About the Data
● Wide variety of issues related to an
online education platform
● Users include administrators,
teachers, and students/parents
● Sender type is gathered
programmatically based on username
● “Page,” aka the general area the
complaint relates to, is tagged by
human customer support agents
Two Goals:
1) Predict the “Affected Page” and
determine if an automated, page-
specific response should be sent
(as opposed to a generic response).
2) Determine if the user is a student or
an adult
Gathering Data: Zendesk API
Carrier pigeons may have been faster
Zendesk has a bulk CSV export available, but the
actual message content is not included.
Instead, I had to request 1 ticket per call, which
took about 1 second. The whole process took 20+
hours.
Takeaways:
● Use “requests” and “json” in Python for API
calls & parsing
● Need to build in exception handling because
some ticket IDs are not available
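A minimal sketch of the one-ticket-per-call loop described above, assuming the standard Zendesk single-ticket endpoint (/api/v2/tickets/{id}.json) and basic authentication. The function names, the subdomain argument, and the pause parameter are illustrative and not taken from the slides; response.json() stands in for parsing with the json module.

```python
import time

import requests


def fetch_ticket(ticket_id, subdomain, auth, session=None, timeout=10):
    """Return one ticket as a dict, or None if it cannot be retrieved."""
    session = session or requests.Session()
    # Assumed endpoint: the Zendesk URL for a single ticket ID.
    url = f"https://{subdomain}.zendesk.com/api/v2/tickets/{ticket_id}.json"
    try:
        response = session.get(url, auth=auth, timeout=timeout)
        response.raise_for_status()  # raises on 4xx/5xx, e.g. a missing ticket ID
        return response.json().get("ticket")
    except (requests.RequestException, ValueError):
        # Unavailable ticket IDs and malformed responses are skipped, not fatal.
        return None


def fetch_tickets(ticket_ids, subdomain, auth, session=None, pause=0.0):
    """Fetch tickets one call at a time, keeping only those that exist."""
    session = session or requests.Session()
    tickets = []
    for ticket_id in ticket_ids:
        ticket = fetch_ticket(ticket_id, subdomain, auth, session=session)
        if ticket is not None:
            tickets.append(ticket)
        time.sleep(pause)  # optional delay between calls (hypothetical parameter)
    return tickets
```

Returning None for a failed call keeps a 20+ hour run from stopping at the first missing ticket ID.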
EDA: Affected Pages
45 unique page types tagged
The top 15 page counts
After grouping the “other” pages
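One way to do this grouping with pandas, as a sketch: keep the most frequent page tags and relabel everything else as “other”. Whether the original analysis kept exactly the top 15 is an assumption, as are the function name and the top_n parameter.

```python
import pandas as pd


def group_rare_pages(pages, top_n=15, other_label="other"):
    """Keep the top_n most frequent page tags; relabel the rest as other_label."""
    top_pages = pages.value_counts().nlargest(top_n).index
    # Series.where keeps values where the condition holds, else substitutes other_label.
    return pages.where(pages.isin(top_pages), other_label)
```

For example, group_rare_pages(df["affected_page"]) would return a Series of the same length with the rare tags merged (the column name here is hypothetical).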
Prep Data Set
1) Factorize the target values:
○ y = pd.factorize(affected_pages)
○ Y = y[0] (the integer codes; y[1] holds the unique page labels)
2) Use the count vectorizer to convert the text into a numerical matrix:
○ vectorizerA = CountVectorizer(ngram_range=(1, 2))
○ freq_term_matrixA = vectorizerA.fit_transform(DFA.text)
3) Use StratifiedKFold for the train-test split:
○ n_folds=2
This is the data --->
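The three prep steps can be sketched end to end as follows. This assumes a recent scikit-learn, where StratifiedKFold lives in sklearn.model_selection and the n_folds argument is called n_splits; the function name, the shuffle setting, and the seed are illustrative choices, not the author's.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import StratifiedKFold


def prepare_data(texts, affected_pages, n_splits=2, seed=0):
    """Factorize labels, build a unigram+bigram count matrix, and make stratified folds."""
    # 1) Integer codes for the target, plus the lookup of original labels.
    codes, labels = pd.factorize(pd.Series(affected_pages))
    # 2) Sparse document-term matrix of unigram and bigram counts.
    vectorizer = CountVectorizer(ngram_range=(1, 2))
    matrix = vectorizer.fit_transform(texts)
    # 3) Stratified folds: each fold preserves the class proportions.
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    folds = list(skf.split(matrix, codes))  # list of (train_idx, test_idx) pairs
    return matrix, codes, labels, vectorizer, folds
```

Keeping the fitted vectorizer and the labels index around matters later, when new tickets have to be converted the same way and predictions mapped back to page names.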
Initial Model
Sidebar: low-hanging fruit
If you only look at “Logins”
vs “Other,” you leap to
90% accuracy without any
tweaking of the model or
adding additional features.
Considering logins may be
the best response to
automate, this is
interesting!
~3% of “other” tickets are
miscategorized
(0 = other, 1 = login)
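A small sketch of how the binary view and the “other” error rate could be computed, using the 0 = other, 1 = login coding from the slide. The function names and the exact “login” label string are assumptions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def binarize_pages(pages, positive="login"):
    """Map page tags to 1 for the positive page and 0 for everything else."""
    return np.array([1 if page == positive else 0 for page in pages])


def other_misclassified_rate(y_true, y_pred):
    """Fraction of true 'other' tickets (0) that were predicted as login (1)."""
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
    # Row 0 holds the true 'other' tickets; column 1 holds login predictions.
    return cm[0, 1] / cm[0].sum()
```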
Further Analysis of results
A support vector machine with a linear kernel returns a score of around 0.75
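A sketch of scoring a linear-kernel SVM on the ticket text, assuming scikit-learn's SVC and 2-fold stratified cross-validation. Unlike the slides, this version puts the vectorizer inside a pipeline so it is refit on each training fold; the function name and seed are illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def linear_svm_scores(texts, y, n_splits=2, seed=0):
    """Return one accuracy score per stratified fold for a linear-kernel SVM."""
    model = make_pipeline(
        CountVectorizer(ngram_range=(1, 2)),  # unigram + bigram counts
        SVC(kernel="linear"),
    )
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(model, texts, y, cv=folds)
```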
Some misclassifications are worse than others...
Since “other” pages would receive the “generic” response, we aren’t actually doing harm if
we mislabel a ticket as “other”. It’s only a problem when we send a page-specific response
to a ticket from the wrong category, for instance, sending “login” instructions when the question is about
“tests and quizzes”.
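That asymmetry can be turned into a metric. The sketch below counts only the harmful errors: predictions that are wrong and would also trigger a page-specific response. The function name and the “other” label string are assumptions.

```python
import numpy as np


def harmful_error_rate(y_true, y_pred, other_label="other"):
    """Share of tickets that would get the wrong page-specific response."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Wrong AND not routed to the generic response.
    harmful = (y_pred != y_true) & (y_pred != other_label)
    return harmful.mean()
```

Usage: a “tests” ticket predicted as “login” counts as harmful, while a “login” ticket predicted as “other” does not.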
How can we incorporate additional features?
TF-IDF
Term frequency-inverse document frequency:
Takes into account both the frequency of a particular word in the document and
the frequency of this word across ALL the documents, so words that appear in
nearly every ticket are down-weighted.
The end result is a sparse matrix. Other data can be appended to it, for example
after converting it back to a regular “dense” matrix.
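A sketch of the conversion, assuming scikit-learn's TfidfTransformer applied to the count matrix from the earlier prep step (TfidfVectorizer would do both steps at once). The function name is illustrative. With default settings each row of the result is L2-normalized.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer


def counts_to_tfidf(texts):
    """Build a count matrix, then reweight it with TF-IDF."""
    counts = CountVectorizer(ngram_range=(1, 2)).fit_transform(texts)
    # Same shape and sparsity pattern as counts, but with reweighted values.
    tfidf = TfidfTransformer().fit_transform(counts)
    return counts, tfidf
```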
After converting to TF-IDF, the scores for various models will fluctuate
Don’t forget to convert the sparse matrix to a dense matrix
And we’ll need the indices of the train-test split...
Create a new factor...
Use the training and test indices
from the stratified K fold split to
divide into training and test data
Append the array to your matrix
Fit a model...
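The steps on the preceding slides (dense conversion, split indices, new factor, append, fit) might look like the sketch below. The slides do not say which extra feature or model was used, so the extra column is a hypothetical numeric feature and LogisticRegression is a stand-in.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold


def fit_with_extra_feature(texts, y, extra, seed=0):
    """Append one numeric column to a dense TF-IDF matrix and fit a model."""
    tfidf = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(texts)
    dense = tfidf.toarray()  # sparse -> dense so np.hstack can be used
    extra = np.asarray(extra, dtype=float).reshape(-1, 1)  # the hypothetical new factor
    X = np.hstack([dense, extra])
    y = np.asarray(y)
    # Take the train/test indices from the first stratified fold.
    skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=seed)
    train_idx, test_idx = next(skf.split(X, y))
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    return model, X, model.score(X[test_idx], y[test_idx])
```

For large vocabularies, scipy.sparse.hstack can append the column without the dense conversion, which saves memory.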
For imperceptible gains...
<-old score…
Returning values for new tickets
- First, pull new tickets via the API that the model has not
seen before
- Then convert them using the same methods as before
- Apply the model that was trained on the big data set
- Get predictions / classifications for new tickets
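A sketch of that workflow. The key detail is that new tickets go through the already-fitted vectorizer with transform, not fit_transform, so their columns line up with the training matrix. The function names and the choice of a linear SVM are assumptions.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC


def train(texts, pages):
    """Fit the vectorizer and model on the full labeled data set."""
    codes, labels = pd.factorize(pd.Series(pages))
    vectorizer = CountVectorizer(ngram_range=(1, 2))
    model = SVC(kernel="linear").fit(vectorizer.fit_transform(texts), codes)
    return vectorizer, model, labels


def classify_new(new_texts, vectorizer, model, labels):
    """Convert unseen tickets with the fitted vectorizer and return page names."""
    features = vectorizer.transform(new_texts)  # reuse the training vocabulary
    return list(labels[model.predict(features)])  # integer codes -> page names
```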
Need a new “y” value
More ticket predictions
https://docs.google.com/a/engrade.com/document/d/1VZKYwkzsY-nBHVbB26MXSIMlKUAqzCyhrKvRhZrcoSI/edit?usp=sharing
Students vs Teachers
Using only the “logins” page results, I followed the above
process to classify student vs. teacher:
Model: Logistic Regression (C=5)
Score = 0.7170
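A sketch of the same process with the stated model, LogisticRegression(C=5). The TF-IDF features, 2-fold split, function name, and label values are assumptions; the slides give only the model and the score.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline


def sender_type_score(login_texts, sender_types, seed=0):
    """Mean cross-validated accuracy for student vs. teacher on login tickets."""
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),
        LogisticRegression(C=5, max_iter=1000),  # C=5 as on the slide
    )
    folds = StratifiedKFold(n_splits=2, shuffle=True, random_state=seed)
    return cross_val_score(model, login_texts, sender_types, cv=folds).mean()
```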
TEXTSTAT library:
Still didn’t improve much...
What next?
● Pickle to save the best model for use in production
● Determine the risk threshold for sending wrong emails and
try to tweak the model to perform as desired
● Loop through C values, etc., to get the best model
● Loop through various page types to see what is easily
classified
● Use other features…
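A sketch of three of these next steps: looping over C values, falling back to the generic “other” response below a confidence threshold, and pickling the chosen model. The candidate C values, the 0.8 threshold, and the function names are hypothetical.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline


def best_model_over_c(texts, y, c_values=(0.1, 1, 5, 10), seed=0):
    """Try each C, keep the pipeline with the best mean cross-validated score."""
    folds = StratifiedKFold(n_splits=2, shuffle=True, random_state=seed)
    best_score, best_model = -1.0, None
    for c in c_values:
        model = make_pipeline(TfidfVectorizer(), LogisticRegression(C=c, max_iter=1000))
        score = cross_val_score(model, texts, y, cv=folds).mean()
        if score > best_score:
            best_score, best_model = score, model
    return best_model.fit(texts, y), best_score  # refit the winner on all data


def predict_with_threshold(model, texts, threshold=0.8, fallback="other"):
    """Send a page-specific label only when the model is confident enough."""
    proba = model.predict_proba(texts)
    labels = model.classes_[proba.argmax(axis=1)]
    confidence = proba.max(axis=1)
    return [str(l) if p >= threshold else fallback for l, p in zip(labels, confidence)]


def save_model(model, path):
    """Pickle the fitted pipeline (vectorizer and classifier together)."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)
```

Pickling the whole pipeline keeps the fitted vectorizer with the classifier, so production code only needs to call predict on raw ticket text.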
