Fault Finder: Identifying Laptop Failures from Amazon Reviews

A presentation describing the model behind Fault Finder.

  1. 1. FaultFindr: how might this laptop fail? • Brendon O’Leary
  4. 4. We use customer reviews to help decide which products to buy online. Those decisions can hinge on how likely the product is to fail after purchase.
  5. 5. DEMO
  6. 6. The Model • Data: historical dataset; scraping from Amazon; stored in SQL database • Natural Language Processing: cleaning; tokenization; bag of words • Tagging: manual tagging of ~1000 sentences • Supervised Learning: Naïve Bayes classifier
  7. 7. The Model (example review sentence) • “My Toshiba Satellite crashed 5 times within the first year, each time Toshiba states they replaced the hard drive.”
  8. 8. The Model (the same sentence after cleaning and tokenization) • toshiba_satellite crashed 5 times within first year time toshiba states replaced hard_drive .
  10. 10. The Model (classifier output for the same sentence) • 99.5% chance of failure
  11. 11. Most Predictive Features • Indicative of product failure: “defective, replacement, flaw, dead, went, send, repair, …” • Indicative of laptop characteristics: Screen: “screen, display, monitor, lcd, touchscreen, graphics, …”; Operating System: “windows_8, windows, windows_7, os, linux, ubuntu, …”; Speed/Responsiveness: “slow, fast, memory, laggy, performance, …”; …
  12. 12. Model Performance • Trade-off between precision and recall • Optimized precision at a fixed recall of 0.5
  13. 13. Brendon O’Leary • Ph.D. in Physics from Yale University (2010-2016) • Testing High Energy Physics Models with very precise Low Energy Experiments
  14. 14. end
  15. 15. extra
  16. 16. Model Discrimination
  17. 17. Improving Model Performance • Bootstrapped 4-fold cross-validation • Optimized precision at a fixed recall of 50% • Explored additional features: sentiment and word interactions • Much better than random (a random classifier reaches ~10% precision) • No statistically significant evidence that the added features improved performance
  18. 18. LDA to extract more natural review Topics?
  20. 20. Optimizing the Number of Bag of Words Features • [Figure axis label: Precision] • An L1-regularized logistic regression model was fit first. • The vocabulary was sorted by the magnitude of the logistic regression coefficients. • The vocabulary size was varied, and the optimum was observed at the brink of overfitting.
  21. 21. A Snapshot of the Dataset • 14,000 reviews distributed over about 9,000 products and about 10 years
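
Slides 6 through 10 outline a pipeline of cleaning, tokenization, bag-of-words features, and a Naïve Bayes classifier. What follows is a minimal pure-Python sketch of that kind of pipeline, not the author's implementation. The stop-word list, the bigram table, the add-one smoothing, and the four hand-labeled training sentences are all assumptions made up for illustration; the slides do not say which library or settings were used.

```python
import math
import re
from collections import Counter

# Hypothetical stop words and bigrams, chosen only for this illustration.
STOP_WORDS = {"my", "the", "a", "is", "it", "and", "they", "each",
              "this", "after", "with", "has", "very"}
BIGRAMS = {("hard", "drive"): "hard_drive",
           ("toshiba", "satellite"): "toshiba_satellite"}


def tokenize(sentence):
    """Lowercase, split into word tokens, join known bigrams, drop stop words."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    tokens, i = [], 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in BIGRAMS:
            tokens.append(BIGRAMS[pair])
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return [t for t in tokens if t not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts with add-one smoothing."""

    def fit(self, sentences, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict_proba(self, sentence):
        """Return {class: probability} for one sentence."""
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.class_counts[c] / total_docs)
            for token in tokenize(sentence):
                if token in self.vocab:  # ignore words never seen in training
                    count = self.word_counts[c][token] + 1
                    score += math.log(count / (total_words + len(self.vocab)))
            log_scores[c] = score
        # Normalize in log space for numerical stability.
        peak = max(log_scores.values())
        exp_scores = {c: math.exp(s - peak) for c, s in log_scores.items()}
        norm = sum(exp_scores.values())
        return {c: v / norm for c, v in exp_scores.items()}


# Made-up training sentences standing in for the ~1000 manually tagged ones.
train_sentences = [
    "The hard drive crashed and they replaced it",
    "Screen is dead after a month, sent it for repair",
    "The display is bright and the keyboard feels great",
    "Fast laptop with a very nice screen",
]
train_labels = ["failure", "failure", "ok", "ok"]

model = NaiveBayes().fit(train_sentences, train_labels)
review = ("My Toshiba Satellite crashed 5 times within the first year, "
          "each time Toshiba states they replaced the hard drive.")
tokens = tokenize(review)
probs = model.predict_proba(review)
```

With this toy stop-word list and bigram table, `tokens` reproduces the token sequence shown on slide 8 (minus the trailing period), and `probs["failure"]` is the analogue of the "chance of failure" on slide 10. The actual number depends entirely on the training data, so the toy value should not be compared with the 99.5% on the slide.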
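
Slides 12 and 17 describe optimizing precision at a fixed recall of 0.5. One way to read that metric, sketched here under assumptions, is to lower the decision threshold until half of the true failures are caught and then report the precision at that threshold. The function name, scores, and labels are hypothetical and not taken from the deck.

```python
def precision_at_recall(scores, labels, target_recall=0.5):
    """Return (threshold, precision, recall) at the highest threshold
    whose recall reaches target_recall.

    scores: predicted failure probabilities; labels: 1 for failure, 0 otherwise.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("need at least one positive label")
    if target_recall > 1:
        raise ValueError("target_recall must be at most 1")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    predicted_pos = 0
    for i, (score, label) in enumerate(ranked):
        predicted_pos += 1
        true_pos += label
        # Evaluate only at boundaries between distinct scores so that
        # tied scores are accepted or rejected together.
        is_last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if is_last_of_tie and true_pos / total_positives >= target_recall:
            return score, true_pos / predicted_pos, true_pos / total_positives
    raise ValueError("target_recall could not be reached")


# Made-up classifier scores and ground-truth labels.
scores = [0.95, 0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 1, 0, 1, 0]
threshold, precision, recall = precision_at_recall(scores, labels, 0.5)
```

In this toy data the threshold lands at 0.8, where two of the three flagged sentences are true failures and two of the four failures are caught. Comparing models by precision at the same recall keeps the comparison fair along the precision/recall trade-off.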
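
Slide 17 mentions "bootstrapped 4-fold cross validation" without defining it. The sketch below substitutes a related, clearly different technique, repeated 4-fold cross-validation with a fresh shuffle per repetition, as one plausible way to obtain a spread of scores for judging statistical significance. It may not match what the author actually did, and the function name and parameters are hypothetical.

```python
import random


def repeated_kfold(n_samples, k=4, repetitions=3, seed=0):
    """Yield (train_indices, test_indices) for reshuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(repetitions):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Fold j takes every k-th shuffled index, so fold sizes differ by at most one.
        folds = [indices[j::k] for j in range(k)]
        for j, test in enumerate(folds):
            train = [idx for m, fold in enumerate(folds) if m != j for idx in fold]
            yield train, test


splits = list(repeated_kfold(n_samples=10, k=4, repetitions=3))
```

Each repetition yields four train/test splits whose test folds together cover every sample exactly once. Scoring a model on each split gives a distribution of results, which is the kind of evidence needed to say whether added features help.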
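
Slide 20 describes fitting an L1-regularized logistic regression and sorting the vocabulary by coefficient magnitude. A minimal sketch of that idea follows, using proximal gradient descent (soft-thresholding) in pure Python. The solver, the penalty strength, the learning rate, and the two-word toy vocabulary are all assumptions; the deck does not say how the model was fit.

```python
import math


def soft_threshold(value, amount):
    """Shrink value toward zero by amount (the proximal step for an L1 penalty)."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def fit_l1_logistic(rows, labels, l1=0.1, learning_rate=0.5, steps=500):
    """Fit weights and an unpenalized intercept by proximal gradient descent."""
    n, d = len(rows), len(rows[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            error = 1.0 / (1.0 + math.exp(-z)) - y
            grad_b += error / n
            for j in range(d):
                grad_w[j] += error * x[j] / n
        bias -= learning_rate * grad_b
        weights = [soft_threshold(w - learning_rate * g, learning_rate * l1)
                   for w, g in zip(weights, grad_w)]
    return weights, bias


# Toy bag-of-words matrix: "defective" tracks the label, "laptop" is noise.
vocabulary = ["defective", "laptop"]
rows = [[1, 1], [1, 1], [1, 0], [1, 0],
        [0, 1], [0, 1], [0, 0], [0, 0]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = fit_l1_logistic(rows, labels)
# Sort the vocabulary by coefficient magnitude, largest first.
ranked = sorted(zip(vocabulary, weights), key=lambda pair: abs(pair[1]), reverse=True)


def top_words(size):
    """Return the `size` highest-ranked words, a candidate vocabulary."""
    return [word for word, _ in ranked[:size]]
```

Calling `top_words(size)` for increasing sizes and re-scoring a classifier on each candidate vocabulary would mirror the sweep over vocabulary size that the slide describes. The L1 penalty pushes the weight of the uninformative word to zero, which is why sorting by magnitude puts informative words first.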
