Think Machine Learning with Scikit-learn (Python)
By: Chetan Khatri
Principal Big Data Engineer, Nazara Technologies.
Data Science Lab, The Department of Computer Science, University of Kachchh.
About me
- Principal Big Data Engineer, Nazara Technologies.
- Technical Reviewer, Packt Publishing.
- Former Developer, Eccella Corporation.
- Alumnus, The Department of Computer Science, KSKV Kachchh University.
Outline
- An Introduction to Machine Learning
- Hello World in Machine Learning with 6 Lines of Code
- Visualizing a Decision Tree
- Classifying Images
- Supervised Learning: Pipeline
- Writing Our First Classifier
Early AI Programs: Deep Blue
Now, AI Programs
- AlphaGo is a well-known example: it was built to play the game of Go. A later DeepMind system, MuZero, learned to play both Go and Atari games.
Machine Learning
- Machine learning makes this possible. It is the study of algorithms that learn from examples and experience, instead of relying on a set of hand-written rules and hard-coded lines.
- “Learns from examples and experience”
Let's Take a Problem
- Here is a problem that seems easy but is difficult to solve without machine learning.
Open Source Libraries
Classifier
Scikit-learn
Test: no error. Yay!
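A minimal way to run the test step, assuming scikit-learn has already been installed (for example with `pip install scikit-learn`): if the import below raises no error, the library is ready to use.

```python
# Assumes scikit-learn was installed beforehand, e.g. with: pip install scikit-learn
import sklearn

# If this import succeeds without an ImportError, the library is ready to use.
print("scikit-learn version:", sklearn.__version__)
```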
Supervised Learning
Collect Training Data → Train Classifier → Make Predictions
Training Data
Weight   Texture   Label
150 g    Bumpy     Orange
170 g    Bumpy     Orange
140 g    Smooth    Apple
130 g    Smooth    Apple
Features: Weight and Texture (the columns). Examples: each row.
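The training data above can become a first classifier in about six lines of scikit-learn. This is a sketch, and the encoding is our own assumption: texture is 0 for bumpy and 1 for smooth, labels are 0 for apple and 1 for orange, and weights are in grams.

```python
from sklearn import tree

# Features: [weight in grams, texture], texture 0 = bumpy, 1 = smooth (assumed encoding).
features = [[150, 0], [170, 0], [140, 1], [130, 1]]
# Labels: 0 = apple, 1 = orange (assumed encoding), one per row of the table.
labels = [1, 1, 0, 0]

clf = tree.DecisionTreeClassifier(random_state=0)  # an untrained decision tree
clf = clf.fit(features, labels)  # "training": find patterns in the examples

# Predict the label of a new, unseen fruit: 160 g and bumpy.
prediction = clf.predict([[160, 0]])
print(prediction)  # [1] -> orange
```

Both features separate the two classes perfectly in this tiny table. Whichever one the tree splits on, a heavy, bumpy fruit lands on the orange side.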
Important Concepts
- How does this work in the real world?
- How much training data do you need?
- How is the tree created?
- What makes a good feature?
Many Types of Classifiers
- Artificial Neural Network (ANN)
- Support Vector Machine (SVM)
- k-Nearest Neighbors (KNN)
- Random Forest (RF)
- Gradient Boosting Machine (GBM)
- Etc.
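Most scikit-learn classifiers share the same `fit`/`predict` interface, so trying a different type of classifier is often a one-line change. The sketch below reuses the fruit table with our assumed encoding (texture 0 = bumpy, 1 = smooth; label 0 = apple, 1 = orange) and a hypothetical new fruit of 160 g that is bumpy. SVMs and neural networks are left out because they generally need scaled features, and this tiny example skips scaling.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Fruit table with an assumed encoding: [grams, texture], texture 0 = bumpy, 1 = smooth.
features = [[150, 0], [170, 0], [140, 1], [130, 1]]
labels = [1, 1, 0, 0]  # 0 = apple, 1 = orange

classifiers = {
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "KNN (k=3)": KNeighborsClassifier(n_neighbors=3),  # k must not exceed the 4 training rows
    "Random forest": RandomForestClassifier(random_state=0),
    "GBM": GradientBoostingClassifier(random_state=0),
}

predictions = {}
for name, clf in classifiers.items():
    clf.fit(features, labels)  # same training call for every classifier type
    predictions[name] = int(clf.predict([[160, 0]])[0])  # same prediction call too
    print(name, "->", "orange" if predictions[name] == 1 else "apple")
```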
Demo
2. Visualizing a Decision Tree
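This sketch shows a decision tree demo on scikit-learn's bundled iris dataset. It assumes we hold out rows 0, 50 and 100 (one flower of each species) as test data. `tree.export_text` prints the learned rules as text. To draw the tree as a picture instead, use `tree.export_graphviz` or `tree.plot_tree` (the latter needs matplotlib).

```python
import numpy as np
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()
# Hold out one example of each species (rows 0, 50, 100) for testing (assumed choice).
test_idx = [0, 50, 100]

# Training data: everything except the held-out rows.
train_data = np.delete(iris.data, test_idx, axis=0)
train_target = np.delete(iris.target, test_idx)
# Testing data: only the held-out rows.
test_data = iris.data[test_idx]
test_target = iris.target[test_idx]

clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(train_data, train_target)

print("expected: ", test_target)
print("predicted:", clf.predict(test_data))

# Text rendering of the learned tree: each line is one question of the form feature <= threshold.
print(tree.export_text(clf, feature_names=list(iris.feature_names)))
```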
3. What Makes a Good Feature?
- Imagine we want to write a classifier that tells apart two types of dogs: greyhounds and Labradors.
Variation in the world!
- Hands-on
About 80% of dogs at this height are labs.
About 95% of dogs at this height are greyhounds.
- Features capture different types of information.
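The dog example can be reproduced with synthetic data. This sketch rests on stated assumptions: 500 dogs per breed, with heights normally distributed around a mean of 28 in for greyhounds and 24 in for labs and a standard deviation of 4 in. These numbers are hypothetical and will not reproduce the exact 80% and 95% figures above. The point is only that the mix of breeds changes with height, which is what makes height a useful feature.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 500  # dogs per breed (assumed)

# Hypothetical heights in inches: greyhounds assumed taller on average than labs.
greyhound_heights = 28 + 4 * rng.standard_normal(n)
lab_heights = 24 + 4 * rng.standard_normal(n)

def greyhound_share(low, high):
    """Fraction of dogs with low <= height < high that are greyhounds."""
    grey = np.sum((greyhound_heights >= low) & (greyhound_heights < high))
    lab = np.sum((lab_heights >= low) & (lab_heights < high))
    return grey / (grey + lab)

short_share = greyhound_share(0, 22)    # short dogs: mostly labs
middle_share = greyhound_share(25, 27)  # overlap region: hard to tell apart
tall_share = greyhound_share(30, 100)   # tall dogs: mostly greyhounds

print(f"under 22 in: {short_share:.0%} greyhounds")
print(f"25 to 27 in: {middle_share:.0%} greyhounds")
print(f"30 in and up: {tall_share:.0%} greyhounds")
```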
Thought Experiment
Avoid useless features
Independent features are best
- Height in inches
- Height in centimeters
- Avoid redundant features: height in inches and height in centimeters carry the same information.
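We can check numerically why height in inches and height in centimeters are redundant. One is just 2.54 times the other, so their correlation is 1 and the second feature adds no new information. A sketch with made-up heights:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
height_in = rng.uniform(20, 32, size=100)  # hypothetical dog heights in inches
height_cm = height_in * 2.54               # the same measurement in centimeters

# Pearson correlation between the two features: 1.0 means one is a linear copy of the other.
corr = np.corrcoef(height_in, height_cm)[0, 1]
print(f"correlation: {corr:.4f}")  # correlation: 1.0000
```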
- Features should be easy to understand.
- Simpler relationships are easier to learn.
- Ideal features are:
  - Informative
  - Independent
  - Simple
4. Pipeline - Machine Learning
- TensorFlow Playground: http://playground.tensorflow.org/
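A typical supervised learning pipeline has three steps: split the data into a training set and a test set, train on the first, and measure accuracy on the second. This sketch uses scikit-learn's bundled iris dataset. The 50/50 split and the choice of a decision tree and KNN are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Half the data for training, half held back to test on examples the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

scores = {}
for name, clf in [("Decision tree", DecisionTreeClassifier(random_state=0)),
                  ("KNN", KNeighborsClassifier())]:
    clf.fit(X_train, y_train)                  # train
    predictions = clf.predict(X_test)          # predict
    scores[name] = accuracy_score(y_test, predictions)  # compare with the true labels
    print(f"{name}: accuracy = {scores[name]:.2f}")
```

The same three calls (`fit`, `predict`, `accuracy_score`) work whichever classifier is plugged in. This makes it easy to compare models on the same split.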
5. Writing our first classifier
- Measure distance
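A nearest neighbor classifier usually measures distance with the Euclidean formula. The worked lines below apply it to the fruit table from earlier in this deck, under an assumed encoding of (grams, texture) where texture is 0 for bumpy and 1 for smooth. The new fruit is a hypothetical 160 g bumpy one.

```latex
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}

d\big((160, 0), (150, 0)\big) = \sqrt{(160 - 150)^2 + (0 - 0)^2} = 10

d\big((160, 0), (140, 1)\big) = \sqrt{(160 - 140)^2 + (0 - 1)^2} = \sqrt{401} \approx 20.02
```

The bumpy 150 g orange is closer, so a nearest neighbor classifier would label the new fruit an orange.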
Demo
- Implement the nearest neighbor algorithm
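Implementing nearest neighbor from scratch takes only a few lines. This is a minimal sketch of a 1-nearest-neighbor classifier (k = 1) that mimics scikit-learn's `fit`/`predict` style. The class name `NearestNeighborClassifier` is hypothetical, `math.dist` needs Python 3.8 or later, and the fruit data uses the same assumed encoding (texture 0 = bumpy, 1 = smooth).

```python
import math

class NearestNeighborClassifier:
    """Hypothetical bare-bones 1-nearest-neighbor classifier with a fit/predict API."""

    def fit(self, X_train, y_train):
        # "Training" just memorizes the examples.
        self.X_train = X_train
        self.y_train = y_train
        return self

    def predict(self, X_test):
        # Label every test row independently.
        return [self._closest(row) for row in X_test]

    def _closest(self, row):
        # Index of the training example at the smallest Euclidean distance;
        # min() keeps the first index when distances tie.
        best_index = min(range(len(self.X_train)),
                         key=lambda i: math.dist(row, self.X_train[i]))
        return self.y_train[best_index]


features = [[150, 0], [170, 0], [140, 1], [130, 1]]  # [grams, texture], assumed encoding
labels = ["orange", "orange", "apple", "apple"]

clf = NearestNeighborClassifier().fit(features, labels)
print(clf.predict([[160, 0], [135, 1]]))  # ['orange', 'apple']
```

Ties go to the earlier training example. For instance, 160 g is 10 g from both 150 g and 170 g. A version with k > 1 would instead take a vote among the k closest labels.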
Next Step
Thank you