Machine Learning - What, Where and How
Transcript

  • 1. Machine Learning: What, Where and How. Narinder Kumar (nkumar@mercris.com), Mercris Technologies (www.mercris.com)
  • 2. Agenda: Definition; Types of Machine Learning; Under the Hood; Languages & Libraries
  • 3. What is Machine Learning?
  • 4. Definition Field of Study that gives Computers the ability to learn without being explicitly programmed --Arthur Samuel A more Mathematical one A Computer program is said to learn from Experience E with respect to some Task T and Performance measure P, if its Performance at Task in T, as measured by P, improves with Experience E –Tom M. Mitchell 4
  • 5. Related Disciplines: sub-field of Artificial Intelligence; deals with the design and development of algorithms; closely related to Data Mining; uses techniques from Statistics, Probability Theory and Pattern Recognition; not new, but growing fast because of Big Data
  • 6. Types of Machine Learning. Supervised Machine Learning: provide the right answers for a set of questions; the underlying algorithm learns/infers over a period of time; it then tries to return correct answers for similar questions. Unsupervised Machine Learning: provide data and let the underlying algorithm find some structure
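The difference between the two types can be illustrated with a toy sketch. The slides do not name any particular algorithm here, so the choices below (1-nearest-neighbour for the supervised case, a two-cluster k-means for the unsupervised case) and all function names and data are assumptions made purely for illustration.

```python
# Supervised: labelled examples (question -> right answer) are provided up front.
def nearest_neighbour_predict(training_set, x):
    """Return the label of the training example closest to x (1-nearest-neighbour)."""
    _, closest_label = min(training_set, key=lambda pair: abs(pair[0] - x))
    return closest_label


# Unsupervised: only data is provided; the algorithm looks for structure itself.
def two_means(values, iterations=10):
    """Split 1-D values into two clusters with a tiny k-means (k = 2).

    Assumes the input contains at least two distinct values.
    """
    low, high = min(values), max(values)  # initial cluster centres
    for _ in range(iterations):
        cluster_low = [v for v in values if abs(v - low) <= abs(v - high)]
        cluster_high = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(cluster_low) / len(cluster_low)
        high = sum(cluster_high) / len(cluster_high)
    return sorted(cluster_low), sorted(cluster_high)


# Hypothetical toy data: a single numeric feature with a size label.
labelled = [(1.0, "small"), (2.0, "small"), (9.0, "large"), (10.0, "large")]
prediction = nearest_neighbour_predict(labelled, 8.5)

# The same kind of values without labels; the grouping is discovered, not given.
clusters = two_means([1.0, 2.0, 9.0, 10.0, 1.5, 9.5])
```

In the supervised call the answer comes from the labels supplied with the data; in the unsupervised call the two groups emerge from the values alone.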
  • 7. Popular Use Cases: Recommendation Systems (Amazon, Netflix, iTunes Genius, IMDb...); Up-Selling & Churn Analysis; Customer Sentiment Analysis; Market Segmentation; ...
  • 8. Understanding Regression
  • 9. Problem Context
  • 10. Typical Machine Learning Algorithm: Training Set → Learning Algorithm → Hypothesis; Input Features → Hypothesis → Expected Output
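The flow on this slide can be sketched in a few lines: a learning algorithm consumes a training set and returns a hypothesis, which is then applied to new input features. The slide does not say which learning algorithm is used; the sketch below assumes ordinary least squares for a single feature, and the function names and data are hypothetical.

```python
def learning_algorithm(training_set):
    """Fit a one-feature straight line by ordinary least squares.

    Takes a list of (feature, expected_output) pairs and returns the
    hypothesis as a plain function. Assumes the features are not all equal.
    """
    xs = [x for x, _ in training_set]
    ys = [y for _, y in training_set]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in training_set)
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x

    def hypothesis(x):
        # Maps an input feature to the predicted output.
        return intercept + slope * x

    return hypothesis


# Hypothetical training set that lies exactly on y = 1 + 2x.
training_set = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
h = learning_algorithm(training_set)
predicted = h(4.0)  # apply the learned hypothesis to a new input
```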
  • 11. Let's Simplify a bit ➢ Goal is to draw a straight line which covers our Data-Set reasonably ➢ Our Hypothesis can be $h_\theta(x) = \theta_0 + \theta_1 x$ ➢ Such that $h_\theta(x) \simeq y$ [Plot: House Sizes vs Prices; x-axis: House Sizes (Sq Yards), y-axis: Prices (1000 USD)]
  • 12. In Mathematical Terms ➢ Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$ ➢ Parameters: $\theta_0, \theta_1$ ➢ Cost Function: $J(\theta_0, \theta_1)$ ➢ We would like to minimize $J(\theta_0, \theta_1)$
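The cost function on this slide was an image and did not survive in the transcript. A common choice for this hypothesis, and the one used in the Andrew Ng course listed under Resources, is the squared-error cost below. Treat it as an assumption rather than a record of the slide; $m$ (the number of training examples) and the superscript $(i)$ (the index of an example) are notation introduced here.

```latex
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2,
\qquad h_\theta(x) = \theta_0 + \theta_1 x
```

The factor $\frac{1}{2}$ is a convenience that cancels when the cost is differentiated; it does not change where the minimum lies.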
  • 13. Solution: Gradient Descent ➢ Start with initial values of θ0, θ1 ➢ Keep changing θ0, θ1 until we end up at a minimum
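The two steps above can be sketched as a short loop. This is a minimal illustration, assuming a squared-error cost averaged over the training examples (the slide's own formula is not in the transcript), a fixed number of iterations in place of a real convergence check, and hypothetical data and parameter defaults.

```python
def gradient_descent(xs, ys, learning_rate=0.1, iterations=2000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent.

    Assumes a squared-error cost averaged over the m training examples.
    """
    theta0, theta1 = 0.0, 0.0  # start with initial values
    m = len(xs)
    for _ in range(iterations):
        # Prediction error for every training example.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the cost with respect to theta0 and theta1.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters simultaneously, moving against the gradient.
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1


# Hypothetical data that lies exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
theta0, theta1 = gradient_descent(xs, ys)
```

With a learning rate that is too large the updates overshoot and diverge; with one that is too small, many more iterations are needed.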
  • 14. Mathematically: Repeat Until Convergence; For Our Scenario; Generic Formula
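The formulas on this slide were images and are missing from the transcript. The standard gradient descent update rules are given below as a likely reconstruction, not a verbatim copy. They assume a squared-error cost $J$ averaged over $m$ training examples with a factor $\frac{1}{2}$, and a learning rate $\alpha$; the symbols $m$, $\alpha$ and the index $(i)$ are notation introduced here.

```latex
% Generic formula: repeat until convergence, updating all \theta_j simultaneously
\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)

% For our scenario, with h_\theta(x) = \theta_0 + \theta_1 x
\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)
\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x^{(i)}
```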
  • 15. Let's see all this in Action
  • 16. Extending Regression ➢ Quadratic Model ➢ Cubic Model ➢ Square Root Model ➢ We can create multiple new features like $X_2 = X^2$, $X_3 = X^3$, $X_4 = \sqrt{X}$
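Creating the new features is a simple transformation of the original input, after which the same linear regression machinery can be reused on the expanded feature list. A minimal sketch, with a hypothetical function name and input value:

```python
import math


def expand_features(x):
    """Return [x, x^2, x^3, sqrt(x)] for a non-negative input x."""
    return [x, x ** 2, x ** 3, math.sqrt(x)]


# Hypothetical input value chosen so the results are easy to check by hand.
features = expand_features(4.0)
```

Because the new features differ widely in magnitude (here 4, 16, 64 and 2), scaling them before running gradient descent usually helps.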
  • 17. Additional Pointers ➢ Mean Normalization ➢ Feature Scaling ➢ Learning Rate ➢ Gradient Descent vs Others
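The first two pointers are often combined into one step: subtract the mean of a feature, then divide by its range. The slide gives no formula, so the variant below (division by max minus min rather than by the standard deviation) is an assumption; the house sizes are taken from the sample dataset on slide 25.

```python
def mean_normalize(values):
    """Subtract the mean and divide by the range (max - min).

    Assumes the values are not all identical, otherwise the range is zero.
    """
    mean = sum(values) / len(values)
    value_range = max(values) - min(values)
    return [(v - mean) / value_range for v in values]


# House sizes from the sample house dataset on slide 25.
house_sizes = [3529, 3247, 4032, 2397, 2200, 3536, 2983]
scaled = mean_normalize(house_sizes)
```

After this step the feature is centred on zero and spans a range of exactly one, which tends to let gradient descent converge with a larger learning rate.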
  • 18. HOW-TO: Languages & Libraries
  • 19. Languages
  • 20. Libraries, Tools and Products
  • 21. A Short Introduction
  • 22. What is WEKA? Developed by the Machine Learning Group, University of Waikato, New Zealand; collection of Machine Learning algorithms; contains tools for Data Pre-Processing, Classification & Regression, Clustering and Visualization; can be embedded inside your application; implemented in Java
  • 23. Main Components: Explorer; Experimenter; Knowledge Flow; CLI
  • 24. Terminology: Training DataSet == Instances; each row in the DataSet == Instance; an Instance is a collection of Attributes (Features). Types of Attributes: Nominal (True, False, Malignant, Benign, Cloudy...); Real values (6, 2.34, 0...); String ("Interesting", "Really like it", "Hate It"...); ...
  • 25. Sample DataSets
    Dataset 1 (house):
      @RELATION house
      @ATTRIBUTE houseSize real
      @ATTRIBUTE lotSize real
      @ATTRIBUTE bedrooms real
      @ATTRIBUTE granite real
      @ATTRIBUTE bathroom real
      @ATTRIBUTE sellingPrice real
      @DATA
      3529,9191,6,0,0,205000
      3247,10061,5,1,1,224900
      4032,10150,5,0,1,197900
      2397,14156,4,1,0,189900
      2200,9600,4,0,1,195000
      3536,19994,6,1,1,325000
      2983,9365,5,0,1,230000
    Dataset 2 (CPU):
      @RELATION CPU
      @attribute outlook {sunny, overcast, rainy}
      @attribute temperature real
      @attribute humidity real
      @attribute windy {TRUE, FALSE}
      @attribute play {yes, no}
      @data
      sunny,85,85,FALSE,no
      sunny,80,90,TRUE,no
      overcast,83,86,FALSE,yes
      rainy,70,96,FALSE,yes
      rainy,68,80,FALSE,yes
      rainy,65,70,TRUE,no
      overcast,64,65,TRUE,yes
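To make the file layout concrete, here is a minimal sketch of a reader for the subset of ARFF used in these samples: a relation name, a list of attributes and comma-separated data rows. It is not WEKA's own parser. The function name is hypothetical, and quoted values, sparse rows and attribute names containing spaces are deliberately not handled. The embedded text is the first two rows of the house dataset.

```python
def parse_arff(text):
    """Parse a simple ARFF document into (relation, attributes, rows).

    Only the subset shown on the slide is supported; values are kept as strings.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()  # ARFF keywords are case-insensitive
        if in_data:
            rows.append([field.strip() for field in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, attr_type = line.split(None, 2)
            attributes.append((name, attr_type))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


# First two rows of the house dataset from the slide.
HOUSE_ARFF = """\
@RELATION house
@ATTRIBUTE houseSize real
@ATTRIBUTE lotSize real
@ATTRIBUTE bedrooms real
@ATTRIBUTE granite real
@ATTRIBUTE bathroom real
@ATTRIBUTE sellingPrice real
@DATA
3529,9191,6,0,0,205000
3247,10061,5,1,1,224900
"""

relation, attributes, rows = parse_arff(HOUSE_ARFF)
```

Each parsed row corresponds to one Instance in WEKA's terminology, and each `(name, type)` pair to one Attribute.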
  • 26. WEKA Demo
  • 28. Apache Mahout ➢ Collection of Machine Learning Algorithms ➢ Map-Reduce enabled (in most cases) ➢ Data Sources: Database, File-System, Lucene Integration ➢ Very Active Community ➢ Apache License
  • 29. WEKA vs Apache Mahout
    WEKA ➢ Lot of Algorithms ➢ Tools for Modeling, Comparison and Data-Flow ➢ May need work for running on large data-sets ➢ License Issues
    Apache Mahout ➢ Fewer Algorithms, but growing ➢ Lack of tools for Modeling ➢ Ready by Design for Large Scale ➢ Vibrant Community ➢ Apache License
  • 30. An Overview
  • 31. Google Prediction API 101 ➢ Cloud-based Web Service for Machine Learning ➢ Exposed as a REST API ➢ Does not require any Machine Learning knowledge ➢ Capabilities: Categorical & Regression
  • 32. Working with Google Prediction API
  • 33. Let's see it in Action
  • 34. Analysis: Very Promising Concept; can be a powerful tool for SMEs; Not configurable; Data Security; Not Yet Production Ready (IMHO)
  • 35. Recap ➢ Very vast ➢ Huge demand ➢ Has an initial steep learning curve ➢ Several libraries available ➢ Lot of innovative work going on currently
  • 36. nkumar@mercris.com, @kumar_narinder, www.mercris.com, http://mercris.wordpress.com
  • 37. Resources ➢ Online Machine Learning Course, Prof. Andrew Ng, Stanford University ➢ WEKA Wiki and API docs ➢ Apache Mahout Wiki ➢ IBM developerWorks Articles ➢ Google Prediction API Web Site ➢ Data Mining: Practical Machine Learning Tools and Techniques, Ian H. Witten, Eibe Frank, Mark Hall ➢ Machine Learning Forums