MongoDB & Machine Learning

Update: Social Harvest is going open source; see http://www.socialharvest.io for more information.

My MongoSV 2011 talk about implementing machine learning and other algorithms in MongoDB, with a small real-world example at the end showing what Social Harvest is doing with MongoDB. For more updates about my research, check out my blog at www.shift8creative.com.

Published in: Technology, Education

MongoDB & Machine Learning

  1. Machine Learning
     Tom Maiaroto @shift8creative
  2. What is Machine Learning?
  3. Algorithms & Approaches
     Decision trees, Random forests, Artificial neural networks, k-NN (nearest neighbour), Naive Bayesian classifier
  4. Algorithms & Approaches
     Decision trees, Random forests, Artificial neural networks, k-NN (nearest neighbour), Naive Bayesian classifier
  5. So could machines one day rule the earth?
  6. So could machines one day rule the earth?
     Maybe (ok probably not)
  7. What can Machine Learning do for Apps?
     Spam filtering
  8. What can Machine Learning do for Apps?
     Auto-tagging
  9. What can Machine Learning do for Apps?
     All Sorts of Categorization
  10. What can Machine Learning do for Apps?
      Sentiment Analysis
  11. Languages Commonly Used
      • Java: Java-ML, WEKA, Apache Mahout, many more...
      • Python: NLTK, scikit-learn, PyML, a good deal more...
      • C++: libDAI, Armadillo, Orange, tons more...
      and then some others...
  12. Languages Commonly Used
      http://www.mloss.org
  13. MongoDB Too!
      • Map/Reduce
      • Stored JavaScript
      • Geo-spatial Indexing
      • Replication
  17. Geo-spatial Indexing
      Did someone say nearest neighbour?
  18. Geo-spatial Indexing
      Did someone say nearest neighbour? Design geeks, imagine the visualizations...
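
The nearest-neighbour remark can be sketched in plain JavaScript. A geo-spatial index answers "which stored points are closest to this one?", which is the core step of k-NN. The code below is a brute-force, in-memory stand-in and not the code from the talk; the `points` data, the `loc` and `label` fields, and the `nearest` and `classify` helpers are all hypothetical names chosen for illustration.

```javascript
// Brute-force stand-in for what a geo-spatial index answers efficiently.
// Roughly equivalent shell query (assumes a 2d index on a `loc` field):
//   db.points.find({ loc: { $near: [x, y] } }).limit(k)
function nearest(points, target, k) {
  const dist = (p) => Math.hypot(p.loc[0] - target[0], p.loc[1] - target[1]);
  return [...points].sort((a, b) => dist(a) - dist(b)).slice(0, k);
}

// k-NN classification: majority label among the k closest points.
function classify(points, target, k) {
  const votes = {};
  for (const p of nearest(points, target, k)) {
    votes[p.label] = (votes[p.label] || 0) + 1;
  }
  return Object.keys(votes).sort((a, b) => votes[b] - votes[a])[0];
}

// Hypothetical training data: two clusters of labelled points.
const points = [
  { loc: [1, 1], label: "a" },
  { loc: [1, 2], label: "a" },
  { loc: [2, 1], label: "a" },
  { loc: [8, 8], label: "b" },
  { loc: [9, 8], label: "b" },
  { loc: [8, 9], label: "b" },
];

console.log(classify(points, [2, 2], 3));
console.log(classify(points, [7, 7], 3));
```

With an index in place, only `nearest` would change: the database returns the k closest documents and the voting step stays the same.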
  19. Replication
      • Store massive amounts of data
      • Distributed performance benefits
      • Dedicated databases for calculations
      All the obvious benefits.
  22. Map/Reduce
      It's the brain.
  23. Map/Reduce
      It's the brain. It's not just for aggregation.
  24. Map/Reduce
      It's the brain. It's not just for aggregation. It's faster than you might think.
  25. Map/Reduce
      It's the brain. It's not just for aggregation. It's faster than you might think. It runs in the database.
  26. Map/Reduce
      In the computer...
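
For readers new to the idea, here is a minimal sketch of the map/reduce shape, run in memory rather than in the database. It is an illustration under assumptions, not the talk's code: `runMapReduce` is a hypothetical helper that imitates the grouping step, and the `text` field is made up. In MongoDB itself, `emit` is available as a global inside the map function and the pair of functions would be handed to `db.collection.mapReduce(map, reduce, options)`.

```javascript
// In-memory imitation of a map/reduce pass, for illustration only.
// In MongoDB `emit` is a global inside map; here it is passed in explicitly.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit); // the document is bound to `this`
  const out = {};
  for (const [key, values] of groups) out[key] = reduce(key, values);
  return out;
}

// Map: emit (word, 1) for every word in the document's hypothetical `text` field.
function map(emit) {
  for (const word of this.text.toLowerCase().match(/[a-z]+/g) || []) emit(word, 1);
}

// Reduce: sum the emitted counts. Summing is associative, so the result does
// not depend on how the values happen to be grouped.
function reduce(key, values) {
  return values.reduce((sum, n) => sum + n, 0);
}

const docs = [{ text: "The cat sat" }, { text: "the dog" }];
console.log(runMapReduce(docs, map, reduce));
```

Keeping the reduce step associative matters in the real thing too, since the database may call reduce more than once for the same key.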
  27. Example Time!
      It's simple... Just take this...
  28. Example Time!
      It's simple... Just take this...
  29. Example Time!
      Just kidding... Let's Break Down a Naive Bayes Classifier
  30. Classification / Naive Bayes
      Training the System
  31. Classification / Naive Bayes
      Training the System. Simple... $inc
  32. Classification / Naive Bayes
      Just Keep Count of Words per Category
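
The training step described above (keep a count of words per category with `$inc`) could look something like the sketch below. The slide code itself is not in this transcript, so the document layout (`words.<word>.<category>` and `totals.<category>`), the `classifier` collection name, and both helper functions are assumptions made for illustration. `applyInc` is only an in-memory stand-in so the example runs without a database.

```javascript
// Build the update document for one training example.
// The field layout ("words.<word>.<category>", "totals.<category>") is hypothetical.
function trainingUpdate(text, category) {
  const inc = {};
  for (const word of text.toLowerCase().match(/[a-z]+/g) || []) {
    const field = `words.${word}.${category}`;
    inc[field] = (inc[field] || 0) + 1;
    inc[`totals.${category}`] = (inc[`totals.${category}`] || 0) + 1;
  }
  return { $inc: inc };
}

// In the shell this could be sent as an upsert, for example:
//   db.classifier.update({ _id: "model" }, trainingUpdate(text, "cs"), { upsert: true })

// In-memory stand-in for how $inc treats dotted paths: create missing
// levels, then add the amount to the final field.
function applyInc(doc, update) {
  for (const [path, amount] of Object.entries(update.$inc)) {
    const keys = path.split(".");
    let node = doc;
    for (const key of keys.slice(0, -1)) {
      if (!node[key]) node[key] = Object.create(null);
      node = node[key];
    }
    const last = keys[keys.length - 1];
    node[last] = (node[last] || 0) + amount;
  }
  return doc;
}

const model = {};
applyInc(model, trainingUpdate("Physics and chemistry", "cs"));
applyInc(model, trainingUpdate("Physics of film", "cae"));
console.log(JSON.stringify(model));
```

Because `$inc` is applied by the server, training needs no read-modify-write cycle in application code, which is presumably why the slide calls it simple.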
  33. Classification / Naive Bayes
      Reduce:
  34. Classification / Naive Bayes
      Reduce:
  35. Classification / Naive Bayes
      Finalize:
  36. Classification / Naive Bayes
      Finalize:
  37. Classification / Naive Bayes
      Call the Command:
  38. Classification / Naive Bayes
      Results: Can see total words. Can also see word counts per category.
  39. Classification / Naive Bayes
      Results: ...and of course the scores per category...
      cae = arts and entertainment, cs = science, ...
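
The reduce and finalize code from the slides is not captured in this transcript, so the following is only a sketch of how per-category scores can be computed from word counts. It uses a standard multinomial Naive Bayes formula with add-one (Laplace) smoothing in log space, which may differ from the formula used in the talk. The `counts` data and the `scores` and `classify` helpers are hypothetical, and the prior is approximated by each category's share of training words.

```javascript
// Hypothetical trained counts: words[word][category] and totals[category].
const counts = {
  words: {
    physics: { cs: 4, cae: 1 },
    film: { cae: 5 },
    study: { cs: 3, cae: 1 },
  },
  totals: { cs: 7, cae: 7 },
};

// Log-probability score per category for a piece of text.
function scores(counts, text) {
  const vocabulary = Object.keys(counts.words).length;
  const grandTotal = Object.values(counts.totals).reduce((a, b) => a + b, 0);
  const result = {};
  for (const [category, total] of Object.entries(counts.totals)) {
    // Prior (an assumption here): share of all training words in this category.
    let logScore = Math.log(total / grandTotal);
    for (const word of text.toLowerCase().match(/[a-z]+/g) || []) {
      const seen = (counts.words[word] || {})[category] || 0;
      // Add-one smoothing keeps unseen words from zeroing out the score.
      logScore += Math.log((seen + 1) / (total + vocabulary));
    }
    result[category] = logScore;
  }
  return result;
}

// Pick the category with the highest score.
function classify(counts, text) {
  const s = scores(counts, text);
  return Object.keys(s).sort((a, b) => s[b] - s[a])[0];
}

console.log(scores(counts, "a physics study"));
console.log(classify(counts, "new film"));
```

Scores are sums of logarithms, so they are negative and only meaningful relative to each other; the highest one wins.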
  40. Classification / Naive Bayes
      • Accurate even with little training
      • MongoDB on a small VM took 1.7 seconds
      • Compared to, say, PHP: 33 seconds and timed out
      • More training data == exponentially faster than PHP
  44. Classification / Naive Bayes
      • This wasn't even a full map/reduce
      • Your mileage will vary based on formula
      • You can cache certain values for speed
      • Don't forget about stored JavaScript (but use it wisely)
  48. Porter Stemming Algorithm
      Thank You Martin Porter http://tartarus.org/martin/PorterStemmer
  49. Porter Stemming Algorithm
      • Exists for nearly every language
      • MongoDB will use JavaScript of course
      • Decent execution time
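
To give a feel for what a stemmer does, here is a sketch of just the first rule group of the Porter algorithm (step 1a, which handles plural suffixes). It is a small excerpt for illustration, not a complete stemmer and not the implementation benchmarked in the talk; the full algorithm at the URL above has several more steps. The function name `step1a` is chosen here for convenience.

```javascript
// Step 1a of the Porter algorithm only: plural suffixes.
// Rules are tried longest suffix first, and the first match wins.
function step1a(word) {
  if (word.endsWith("sses")) return word.slice(0, -2); // sses -> ss
  if (word.endsWith("ies")) return word.slice(0, -2);  // ies  -> i
  if (word.endsWith("ss")) return word;                // ss   -> ss (unchanged)
  if (word.endsWith("s")) return word.slice(0, -1);    // s    -> (removed)
  return word;
}

for (const word of ["caresses", "ponies", "caress", "cats", "tree"]) {
  console.log(word, "->", step1a(word));
}
```

Stems such as "poni" are not meant to be real words; the point is that related forms collapse to the same token before counting.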
  52. Porter Stemming Algorithm
      • About 2.5x faster than PHP class
      • 663x faster than a web browser
  54. Porter Stemming Algorithm
      • About 2.5x faster than PHP class
      • 663x faster than a web browser
      • 7x slower than PHP PECL extension
  57. Real World Application
      Social Harvest: analyzes social data from the internet to determine languages spoken, gender, age, sentiment, and categories.
      www.social-harvest.com
  58. Real World Application
      Social Harvest: Who doesn't like pie charts?
  60. Follow Tom
      @shift8creative www.shift8creative.com www.social-harvest.com www.union-of-rad.com
      Thank You!