Tools and Technologies for Large Scale Data Mining
Presentation Transcript

  • Tools and Technologies for Large Scale Data Mining. Jaganadh G, Project Lead, NLP R&D, 365Media Pvt. Ltd. (jaganadhg@gmail.com). DRDO Sponsored National Level Seminar on Challenging Issues on Data Mining Semantic Web, Sri Krishna College of Engineering and Technology, Coimbatore, 27th Jan 2012.
  • About me: Software engineer specializing in text analytics research and development. When free, teaches Python, speaks about FOSS, and blogs at http://jaganadhg.in. Working as Project Lead (NLP) at 365Media Pvt. Ltd., Coimbatore. I am a computational linguist, linguist and Indologist, and a book reviewer. Master's degree holder in Sanskrit from the University of Kerala.
  • Machine Learning: Machine learning is a subfield of artificial intelligence (AI) concerned with algorithms that allow computers to learn. This talk is not intended as an introduction to machine learning. Don't expect math-heavy equations here.
  • Machine Learning and Our Life: Do you think that machine learning has any impact on our lives? Yes. In our day-to-day life we may use many machine-learning-powered tools: e-mail spam filtering, product recommendations, fraud detection, etc.
  • Examples
  • Tool for building machine-learning-powered products and services: Apache Mahout. Apache Mahout is a scalable machine learning library that supports large data sets; the project's goal is to build scalable machine learning libraries. It has a commercially friendly licence, is well documented, has a healthy community, and is targeted at developers.
  • Algorithms in Apache Mahout: collaborative filtering; user- and item-based recommenders; K-Means and Fuzzy K-Means clustering; Mean Shift clustering; Dirichlet process clustering; Latent Dirichlet Allocation; singular value decomposition; Parallel Frequent Pattern mining; Complementary Naive Bayes classifier; random forest decision tree based classifier.
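To give a feel for what an item-based recommender from the list above does, here is a minimal sketch in plain Python. It is not Mahout's implementation or API: the toy ratings, the function names (`item_vectors`, `cosine`, `recommend`) and the choice of cosine similarity are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not data from the talk.
RATINGS = {
    "alice": {"book_a": 5.0, "book_b": 3.0, "book_c": 4.0},
    "bob": {"book_a": 4.0, "book_b": 2.0, "book_d": 5.0},
    "carol": {"book_b": 5.0, "book_c": 2.0, "book_d": 1.0},
}


def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating}."""
    items = {}
    for user, prefs in ratings.items():
        for item, rating in prefs.items():
            items.setdefault(item, {})[user] = rating
    return items


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Score each unseen item by a similarity-weighted average of the
    ratings the user gave to the items they have already rated."""
    items = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, vec in items.items():
        if candidate in seen:
            continue
        weighted = 0.0
        total_sim = 0.0
        for rated_item, rating in seen.items():
            sim = cosine(vec, items[rated_item])
            weighted += sim * rating
            total_sim += sim
        if total_sim > 0:
            scores[candidate] = weighted / total_sim
    # Highest predicted rating first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


if __name__ == "__main__":
    print(recommend("alice", RATINGS))
```

Everything here sits in memory on one machine. A library such as Mahout targets the same kind of computation on data sets too large for that.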
  • Demo: building recommendation engines with Mahout; document classification with Mahout; some Python stuff on machine learning.
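The demo code itself is not part of this transcript. As a rough stand-in for the document classification idea, the sketch below trains a plain multinomial Naive Bayes classifier with add-one smoothing in pure Python. Note that this is the simpler multinomial variant, not the Complementary Naive Bayes that Mahout provides, and that the class name, the tokenizer and the tiny training set are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()               # label -> number of documents
        self.word_counts = defaultdict(Counter)   # label -> word -> count
        self.vocab = set()

    def train(self, text, label):
        self.doc_counts[label] += 1
        for word in tokenize(text):
            self.word_counts[label][word] += 1
            self.vocab.add(word)

    def classify(self, text):
        """Return the label with the highest log posterior score."""
        total_docs = sum(self.doc_counts.values())
        words = tokenize(text)
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in words:
                # Smoothed log likelihood of the word under this label.
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training set; a real run would use a corpus such as
# the Twenty Newsgroups data.
TRAINING = [
    ("win cash prize now", "spam"),
    ("cheap pills win money", "spam"),
    ("meeting agenda for project", "ham"),
    ("project report attached for review", "ham"),
]

if __name__ == "__main__":
    clf = NaiveBayes()
    for text, label in TRAINING:
        clf.train(text, label)
    print(clf.classify("win a cash prize"))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.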
  • Reference: Mahout in Action, by Sean Owen and Robin Anil, Manning Publications. Taming Text, by Grant Ingersoll and Tom Morton, Manning Publications. Introducing Apache Mahout, by Grant Ingersoll, an introduction to Apache Mahout focused on clustering, classification and collaborative filtering (https://www.ibm.com/developerworks/java/library/j-mahout/index.html). Programming Collective Intelligence: Building Smart Web 2.0 Applications (http://www.amazon.com/Programming-Collective-Intelligence-Building-Applications/dp/0596529325).
  • Useful Resources: Apache Mahout site (http://mahout.apache.org/); Apache Mahout mailing list (user@mahout.apache.org); the code used for the Mahout demo (http://bitbucket.org/jaganadhg/blog/src/tip/bck9/java/); Twenty Newsgroups data set (http://people.csail.mit.edu/jrennie/20Newsgroups/20news-bydate.tar.gz).
  • Questions?
  • Acknowledgments: Thanks to Manning Publications for the review copy of the book "Mahout in Action"; Apache Mahout mailing list members Ted Dunning and Robin Anil for suggestions; Sreejith S and Biju B for Java help; @chelakkandupoda for review and criticism; Mukundhanchari, R&D Director, 365Media Pvt. Ltd., for support and encouragement.
  • Finally