Generic Short Text Classifier

Published in: Data & Analytics

  1. Ham or Spam? Generic Short Text Classifier
     Romit Singhai
  2. Some Stats
     •  241 million active user accounts per month
     •  5% are spam accounts
  3. Some More Stats…
     •  130 million mobile users
     •  Two thirds of mobile users received SMS spam
  4. Data Pipeline
     •  Acquisition (Twitter API)
     •  Cleaning and Munging
     •  Feature Engineering
     •  Feature Selection
     •  Feature Extraction
  5. Text Classification
  6. Text Classification
  7. Confusion Matrix
  8. Feature Engineering
  9. Feature Engineering
  10. Generic Classifier
     •  Added SMS data to Twitter data
        o  Tweet + SMS (AUC = 0.91)
     •  Just meta features
        o  Random Forest classifier (mean accuracy 0.70)
  11. Benefits
     •  Reduce operational cost
     •  Improve security
     •  Supports revenue model
  12. Next Steps…
     •  Incorporate data from other platforms
     •  Analyze and extract more meta features
     •  “Catch All” classifier as API
  13. Thanks
     Romit Singhai
     romits@gmail.com
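
The text classification and confusion matrix slides above can be illustrated with a minimal sketch. The slides do not say which model was used on the text itself, so this example swaps in a multinomial Naive Bayes classifier over word counts as an assumption. The `tokenize`, `NaiveBayes`, and `confusion_matrix` names are hypothetical, and the six training messages are made up for illustration; they are not the Twitter or SMS data from the talk.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (an assumed model)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def confusion_matrix(y_true, y_pred, positive="spam"):
    """Count true/false positives and negatives for a binary task."""
    cells = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            cells["tp" if truth == positive else "fp"] += 1
        else:
            cells["fn" if truth == positive else "tn"] += 1
    return cells


# Made-up toy messages, for illustration only.
train_texts = [
    "win a free prize now",
    "free cash click this link now",
    "claim your free prize today",
    "are we still meeting for lunch",
    "see you at the meeting tomorrow",
    "thanks for lunch today",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

clf = NaiveBayes().fit(train_texts, train_labels)

test_texts = ["free prize click now", "lunch meeting tomorrow"]
test_labels = ["spam", "ham"]
predictions = [clf.predict(t) for t in test_texts]
matrix = confusion_matrix(test_labels, predictions)
print(predictions, matrix)
```

A real run would replace the toy lists with labelled tweets and SMS messages and evaluate on a held-out split. The AUC and accuracy figures on the slides come from the author's own experiments and are not reproduced by this sketch.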
