Machine Learning - What, Where and How
Presentation Transcript

  • Machine Learning: What, Where and How. Narinder Kumar (nkumar@mercris.com), Mercris Technologies (www.mercris.com)
  • Agenda Definition Types of Machine Learning Under-the Hood Languages & Libraries 2
  • What is Machine Learning ? 3
  • Definition Field of Study that gives Computers the ability to learn without being explicitly programmed --Arthur Samuel A more Mathematical one A Computer program is said to learn from Experience E with respect to some Task T and Performance measure P, if its Performance at Task in T, as measured by P, improves with Experience E –Tom M. Mitchell 4
  • Related Disciplines Sub-Field of Artificial Intelligence Deals with Design and Development of Algorithms Closely related to Data Mining Uses techniques from Statistics, Probability Theory and Pattern Recognition Not new but growing fast because of Big Data 5
  • Types of Machine Learning Supervised Machine Learning  Provide right set of answers for different set of questions  Underlying algorithm learns/infers over a period of time  Tries to return correct answers for similar questions Unsupervised Machine Learning  Provide data &  Let underlying algorithm find some structure 6
  • Popular Use Cases Recommendation Systems  Amazon, Netflix, iTunes Genius, IMDb... Up-Selling & Churn Analysis Customer Sentiment Analysis Market Segmentation ... 7
  • Understanding Regression 8
  • Problem Context
  • Typical Machine Learning Algorithm: Training Set -> Learning Algorithm -> Hypothesis; Input Features -> Hypothesis -> Expected Output
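The flow on this slide (a learning algorithm consumes a training set and produces a hypothesis, which then maps input features to an expected output) can be sketched as a function that returns a function. For brevity this sketch swaps in closed-form ordinary least squares for one feature as the learning algorithm; the talk itself uses gradient descent, and the training data here is hypothetical.

```python
from typing import Callable, List, Tuple


def learn(training_set: List[Tuple[float, float]]) -> Callable[[float], float]:
    """Learning algorithm: fit h(x) = theta0 + theta1 * x to (x, y) pairs."""
    m = len(training_set)
    x_mean = sum(x for x, _ in training_set) / m
    y_mean = sum(y for _, y in training_set) / m
    # Closed-form least-squares slope and intercept for a single feature.
    cov = sum((x - x_mean) * (y - y_mean) for x, y in training_set)
    var = sum((x - x_mean) ** 2 for x, _ in training_set)
    theta1 = cov / var
    theta0 = y_mean - theta1 * x_mean

    def hypothesis(x: float) -> float:
        """Hypothesis: maps an input feature to an expected output."""
        return theta0 + theta1 * x

    return hypothesis


h = learn([(0, 1), (1, 3), (2, 5)])  # hypothetical points on y = 1 + 2x
print(h(10))  # 21.0
```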
  • Let's Simplify a Bit
    ➢ Goal is to draw a straight line which fits our data set reasonably well
    ➢ Our hypothesis can be $h_\theta(x) = \theta_0 + \theta_1 x$, such that $h_\theta(x) \approx y$
    [Figure: House Sizes vs Prices; x-axis: House Sizes (Sq Yards), y-axis: Prices (1000 USD)]
  • In Mathematical Terms
    ➢ Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
    ➢ Parameters: $\theta_0, \theta_1$
    ➢ Cost Function: $J(\theta_0, \theta_1)$
    ➢ We would like to minimize $J(\theta_0, \theta_1)$
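The slide's cost-function formula did not survive extraction. The sketch below assumes the squared-error cost used in Prof. Andrew Ng's Stanford course, J(θ0, θ1) = 1/(2m) · Σ (hθ(x) − y)² over m training examples; the slide's own version may differ. The data points are hypothetical.

```python
def hypothesis(theta0, theta1, x):
    """h_theta(x) = theta0 + theta1 * x"""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Assumed squared-error cost: half the mean squared prediction error."""
    m = len(xs)
    return sum((hypothesis(theta0, theta1, x) - y) ** 2
               for x, y in zip(xs, ys)) / (2 * m)


xs = [1, 2, 3]
ys = [2, 4, 6]  # hypothetical data lying exactly on y = 2x
print(cost(0, 2, xs, ys))  # 0.0 (perfect fit)
print(cost(0, 1, xs, ys))  # errors -1, -2, -3 -> (1 + 4 + 9) / 6 = 2.333...
```

A line that fits the data perfectly costs zero, and any other choice of θ0, θ1 costs more, which is why minimizing J selects the best-fitting line.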
  • Solution: Gradient Descent
    ➢ Start with initial values of $\theta_0, \theta_1$
    ➢ Keep changing $\theta_0, \theta_1$ until we end up at a minimum
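The procedure on this slide can be sketched as batch gradient descent, again assuming the squared-error cost J(θ0, θ1) = 1/(2m) · Σ (hθ(x) − y)². The learning rate alpha, the fixed iteration count (standing in for a convergence test) and the sample data are illustrative choices, not values from the talk.

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=5000, theta0=0.0, theta1=0.0):
    """Minimize the assumed squared-error cost for h(x) = theta0 + theta1 * x."""
    m = len(xs)
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of J with respect to theta0 and theta1.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients were computed from the old values.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # hypothetical data on y = 1 + 2x
t0, t1 = gradient_descent(xs, ys)
print(round(t0, 3), round(t1, 3))  # 1.0 2.0
```

If alpha is too large the updates overshoot and diverge; if it is too small, many more iterations are needed, which is the "Learning Rate" pointer later in the deck.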
  • Mathematically
    ➢ Repeat Until Convergence
    ➢ For Our Scenario
    ➢ Generic Formula
  • Let's See All This in Action
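The formulas on this slide did not survive extraction. Assuming the squared-error cost $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$ over $m$ training examples (the slide's own version may differ), the standard batch gradient descent updates, with learning rate $\alpha$ and both parameters updated simultaneously, are:

```latex
\begin{aligned}
&\text{Generic formula, repeat until convergence:} \\
&\quad \theta_j := \theta_j - \alpha \, \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) \qquad (j = 0, 1) \\
&\text{For our scenario, } h_\theta(x) = \theta_0 + \theta_1 x \text{:} \\
&\quad \theta_0 := \theta_0 - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \\
&\quad \theta_1 := \theta_1 - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}
\end{aligned}
```

The factor $\frac{1}{2}$ in $J$ cancels against the exponent when differentiating, which is why the update terms carry $\frac{1}{m}$ rather than $\frac{1}{2m}$.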
  • Extending Regression
    ➢ Quadratic Model
    ➢ Cubic Model
    ➢ Square Root Model
    ➢ We can create multiple new features, like $X_2 = X^2$, $X_3 = X^3$, $X_4 = \sqrt{X}$
  • Additional Pointers
    ➢ Mean Normalization
    ➢ Feature Scaling
    ➢ Learning Rate
    ➢ Gradient Descent vs Others
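The new features on this slide can be derived from one raw feature before training. This is a minimal sketch; the function name and dictionary keys are hypothetical. Because the model stays linear in the parameters (h = θ0 + θ1·x1 + θ2·x2 + θ3·x3 + θ4·x4), the same gradient descent still applies.

```python
import math


def expand_features(x):
    """Derive polynomial and square-root features from one raw feature x."""
    return {
        "x1": x,             # original feature (linear model)
        "x2": x ** 2,        # quadratic term
        "x3": x ** 3,        # cubic term
        "x4": math.sqrt(x),  # square-root term (requires x >= 0)
    }


print(expand_features(4))  # {'x1': 4, 'x2': 16, 'x3': 64, 'x4': 2.0}
```

Note how quickly x³ grows compared with √x: for a 400 sq yard house it is 64,000,000 versus 20, which is why feature scaling matters once such features are added.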
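Mean normalization and feature scaling can be combined in one step: subtract the feature's mean and divide by its range, so each feature is centred on 0 with a spread of about 1 and gradient descent converges faster. This sketch divides by (max − min); dividing by the standard deviation is an equally common choice. The sizes are hypothetical.

```python
def mean_normalize(values):
    """Rescale to (x - mean) / (max - min); assumes values are not all equal."""
    mean = sum(values) / len(values)
    spread = max(values) - min(values)
    return [(v - mean) / spread for v in values]


sizes = [100, 200, 300]  # hypothetical house sizes in sq yards
print(mean_normalize(sizes))  # [-0.5, 0.0, 0.5]
```

The same mean and range computed on the training set must also be applied to any new input before it is passed to the learned hypothesis.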
  • HOW-TO: Languages & Libraries
  • Languages
  • Libraries, Tools and Products
  • A Short Introduction
  • What is WEKA ? Developed by Machine Learning Group, University of Waikato, New Zealand Collection of Machine Learning Algorithms Contains tools for  Data Pre-Processing  Classification & Regression  Clustering  Visualization Can be embedded inside your application Implemented in Java 22
  • Main Components Explorer Experimenter Knowledge Flow CLI 23
  • Terminology Training DataSet == Instances Each Row in DataSet == Instance Instance is Collection of Attributes (Features) Types of Attributes  Nominal (True, False, Malignant, Benign, Cloudy...)  Real values (6, 2.34, 0...)  String (“Interesting”, “Really like it”, “Hate It” ...)  ... 24
  • Sample DataSets (ARFF format)
    Dataset 1:
      @RELATION house
      @ATTRIBUTE houseSize real
      @ATTRIBUTE lotSize real
      @ATTRIBUTE bedrooms real
      @ATTRIBUTE granite real
      @ATTRIBUTE bathroom real
      @ATTRIBUTE sellingPrice real
      @DATA
      3529,9191,6,0,0,205000
      3247,10061,5,1,1,224900
      4032,10150,5,0,1,197900
      2397,14156,4,1,0,189900
      2200,9600,4,0,1,195000
      3536,19994,6,1,1,325000
      2983,9365,5,0,1,230000
    Dataset 2:
      @RELATION CPU
      @attribute outlook {sunny, overcast, rainy}
      @attribute temperature real
      @attribute humidity real
      @attribute windy {TRUE, FALSE}
      @attribute play {yes, no}
      @data
      sunny,85,85,FALSE,no
      sunny,80,90,TRUE,no
      overcast,83,86,FALSE,yes
      rainy,70,96,FALSE,yes
      rainy,68,80,FALSE,yes
      rainy,65,70,TRUE,no
      overcast,64,65,TRUE,yes
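To connect the terminology to the sample files, here is a deliberately small Python reader for the ARFF subset shown above. It is a hypothetical helper written for illustration, not part of WEKA (which reads ARFF itself, in Java). It returns the relation name, the attribute list and one row per instance, and it skips many ARFF features (quoting, sparse rows, missing values, dates, strings).

```python
NUMERIC_TYPES = {"real", "numeric", "integer"}


def read_arff(text):
    """Hypothetical helper: return (relation, [(attribute, type)], [instance rows])."""
    relation, attributes, instances = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and % comments
            continue
        if in_data:
            # Each data row is one Instance; numeric values become floats.
            values = [v.strip() for v in line.split(",")]
            instances.append([float(v) if kind in NUMERIC_TYPES else v
                              for v, (_, kind) in zip(values, attributes)])
            continue
        keyword = line.split()[0].lower()
        if keyword == "@relation":
            relation = line.split()[1]
        elif keyword == "@attribute":
            name, kind = line.split(None, 2)[1:]
            # "{a, b, c}" declares a nominal attribute with a fixed set of values.
            kind = "nominal" if kind.startswith("{") else kind.lower()
            attributes.append((name, kind))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, instances


house = """\
@RELATION house
@ATTRIBUTE houseSize real
@ATTRIBUTE lotSize real
@ATTRIBUTE bedrooms real
@ATTRIBUTE granite real
@ATTRIBUTE bathroom real
@ATTRIBUTE sellingPrice real
@DATA
3529,9191,6,0,0,205000
3247,10061,5,1,1,224900
"""
relation, attributes, instances = read_arff(house)
print(relation, len(attributes), len(instances))  # house 6 2
print(instances[0])  # [3529.0, 9191.0, 6.0, 0.0, 0.0, 205000.0]
```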
  • WEKA Demo
  • Apache Mahout
    ➢ Collection of machine learning algorithms
    ➢ Map-Reduce enabled (in most cases)
    ➢ Data sources
      ➢ Database
      ➢ File system
      ➢ Lucene integration
    ➢ Very active community
    ➢ Apache License
  • WEKA vs Apache Mahout
    WEKA:
    ➢ Lots of algorithms
    ➢ Tools for
      ➢ Modeling
      ➢ Comparison
      ➢ Data flow
    ➢ May need work for running on large data sets
    ➢ License issues
    Apache Mahout:
    ➢ Fewer algorithms, but growing
    ➢ Lack of tools for modeling
    ➢ Ready by design for large scale
    ➢ Vibrant community
    ➢ Apache License
  • An Overview
  • Google Prediction API 101
    ➢ Cloud-based web service for machine learning
    ➢ Exposed as a REST API
    ➢ Does not require any machine learning knowledge
    ➢ Capabilities
      ➢ Categorical
      ➢ Regression
  • Working with Google Prediction API
  • Let's See It in Action
  • Analysis: very promising concept; can be a powerful tool for SMEs; not configurable; data security; not yet production ready (IMHO)
  • Recap
    ➢ Very vast field
    ➢ Huge demand
    ➢ Has an initial steep learning curve
    ➢ Several libraries available
    ➢ Lots of innovative work going on currently
  • nkumar@mercris.com, @kumar_narinder, www.mercris.com, http://mercris.wordpress.com
  • Resources
    ➢ Online Machine Learning Course, Prof. Andrew Ng, Stanford University
    ➢ WEKA Wiki and API docs
    ➢ Apache Mahout Wiki
    ➢ IBM developerWorks articles
    ➢ Google Prediction API web site
    ➢ Data Mining: Practical Machine Learning Tools & Techniques, Ian H. Witten, Eibe Frank, Mark Hall
    ➢ Machine Learning forums