Tools and Technologies for Large Scale Data Mining

Published in: Technology, Education

  1. Tools and Technologies for Large Scale Data Mining. Jaganadh G, Project Lead, NLP R&D, 365Media Pvt. Ltd., jaganadhg@gmail.com. DRDO Sponsored National Level Seminar on Challenging Issues on Data Mining Semantic Web, Sri Krishna College of Engineering and Technology, Coimbatore, 27th Jan 2012.
  2. About me
     • Software engineer specializing in text analytics research and development
     • When free, teaches Python, speaks about FOSS, and blogs at http://jaganadhg.in
     • Working as Project Lead (NLP) at 365Media Pvt. Ltd., Coimbatore
     • Computational linguist, linguist, Indologist, and book reviewer
     • Master's degree holder in Sanskrit from the University of Kerala
  6. Machine Learning
     • Machine learning is a subfield of artificial intelligence (AI) concerned with algorithms that allow computers to learn.
     • This talk is not intended as an introduction to machine learning.
     • Don't expect mathy equations here.
  11. Machine Learning and Our Life
     • Do you think that machine learning has any impact on our lives? Yes.
     • In our day-to-day life we may use many machine-learning-powered tools.
     • Examples: e-mail spam filtering, product recommendations, fraud detection.
  12. Examples
  15. Tool for building machine-learning-powered products and services: Apache Mahout
     • Apache Mahout is a scalable machine learning library that supports large data sets; the project's goal is to build scalable machine learning libraries.
     • Commercially friendly licence (Apache License 2.0)
     • Well documented
     • Healthy community
     • Targeted at developers
  26. Algorithms in Apache Mahout
     • Collaborative filtering
     • User- and item-based recommenders
     • K-Means and Fuzzy K-Means clustering
     • Mean Shift clustering
     • Dirichlet process clustering
     • Latent Dirichlet Allocation
     • Singular value decomposition
     • Parallel Frequent Pattern mining
     • Complementary Naive Bayes classifier
     • Random forest decision tree based classifier
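
To make the idea of a user-based recommender from the algorithm list concrete, here is a minimal sketch in plain Python. It is not Mahout's API and not the code from the talk: the ratings, the names `RATINGS`, `cosine_similarity`, and `recommend`, and the choice of cosine similarity are all assumptions made for illustration. The sketch scores each item a user has not rated by the similarity-weighted average of other users' ratings.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not data from the talk.
RATINGS = {
    "alice": {"book_a": 5.0, "book_b": 3.0, "book_c": 4.0},
    "bob": {"book_a": 4.0, "book_b": 3.0, "book_c": 5.0, "book_d": 4.0},
    "carol": {"book_a": 1.0, "book_b": 5.0, "book_d": 2.0},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank unseen items by the similarity-weighted average of other users' ratings."""
    totals, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with no overlap or no positive similarity
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # only score items the user has not rated yet
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    scored = [(item, totals[item] / weights[item]) for item in totals]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


if __name__ == "__main__":
    print(recommend("alice", RATINGS))
```

An item-based recommender follows the same pattern with the roles swapped: similarities are computed between items rather than between users.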
  27. Demo
     • Building recommendation engines with Mahout
     • Document classification with Mahout
     • Some Python stuff on machine learning
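
The document classification demo itself is not reproduced in this transcript. As a rough illustration of the underlying idea, the following is a small multinomial Naive Bayes classifier with add-one smoothing in plain Python. It is a sketch under stated assumptions, not the demo code: it implements the standard (not the complementary) Naive Bayes variant, and the `NaiveBayes` class, the `tokenize` helper, and the training sentences are hypothetical. Only the label names are borrowed from the Twenty Newsgroups categories.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a real pipeline would do more."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # label -> number of documents
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocabulary = set()

    def train(self, labelled_docs):
        for label, text in labelled_docs:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.doc_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training sentences, invented for this sketch.
TRAINING = [
    ("sci.space", "the rocket launch reached orbit"),
    ("sci.space", "nasa plans a new orbit mission"),
    ("rec.autos", "the car engine needs new tires"),
    ("rec.autos", "my car has a loud engine"),
]

if __name__ == "__main__":
    model = NaiveBayes()
    model.train(TRAINING)
    print(model.classify("rocket in orbit"))
    print(model.classify("loud car engine"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing keeps unseen words from zeroing out a class.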
  29. Reference
     • Mahout in Action, by Sean Owen and Robin Anil, Manning Publications.
     • Taming Text, by Grant Ingersoll and Tom Morton, Manning Publications.
     • Introducing Apache Mahout, by Grant Ingersoll: an introduction to Apache Mahout focused on clustering, classification, and collaborative filtering. https://www.ibm.com/developerworks/java/library/j-mahout/index.html
     • Programming Collective Intelligence: Building Smart Web 2.0 Applications. http://www.amazon.com/Programming-Collective-Intelligence-Building-Applications/dp/0596529325
  30. Useful Resources
     • Apache Mahout site: http://mahout.apache.org/
     • Apache Mahout mailing list: user@mahout.apache.org
     • The code used for the Mahout demo: http://bitbucket.org/jaganadhg/blog/src/tip/bck9/java/
     • Twenty Newsgroups data set: http://people.csail.mit.edu/jrennie/20Newsgroups/20news-bydate.tar.gz
  31. Questions?
  32. Acknowledgments. Thanks to:
     • Manning Publications for the review copy of the book "Mahout in Action"
     • Apache Mahout mailing list members Ted Dunning and Robin Anil for suggestions
     • Sreejith S and Biju B for Java help
     • @chelakkandupoda for review and criticism
     • Mukundhanchari, R&D Director, 365Media Pvt. Ltd., for support and encouragement
  33. Finally
