Catalit LLC
MACHINE LEARNING
Techniques, Best Practices and Practical Application
Francesco Mosconi, PhD
Data Weekends & Catalit LLC
INSTALL
• While I talk, download and install:
• Anaconda Python 3.6
• get code at www.dataweekends.com/tdwi
WHY HERE?
WHY ML?
https://futureoflife.org
THIS SESSION
• Recognize problem types and choose the right ML technique
• Load data with Pandas
• Build a classification model with Scikit-Learn
• Evaluate model performance with Scikit-Learn
YOUR BRAIN… A GREAT PATTERN RECOGNIZER
http://scriptoriumdaily.com/
REGRESSION
CLASSIFICATION
CLUSTERING
APPLICATIONS
• Sentiment analysis
• Book recommendation
• Human recognition
• House price prediction
• Document classification
MACHINE LEARNING
SUPERVISED LEARNING
http://www.realsafety.org/wp-content/uploads/2014/11/safety-supervisors-interaction.png
UNSUPERVISED LEARNING
http://blog.mwbookkeeping.co.uk/files/82/images/LEADER_2110.jpg
PREDICTION TYPE
http://i.ytimg.com/vi/WX0hnuniLpI/maxresdefault.jpg
CATEGORICAL              CONTINUOUS
Eye colors               Height of children
Courses at university    Weight of cars
Highest degree           Speed of the train
Gender                   Temperature
Spam or not              Stock price
COMBINED
                 CATEGORICAL       CONTINUOUS
SUPERVISED       CLASSIFICATION    REGRESSION
UNSUPERVISED     CLUSTERING
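As a rough sketch of how this grid maps onto Scikit-Learn, the block below fits one estimator per filled cell on synthetic data. LogisticRegression, LinearRegression and KMeans are illustrative choices, not the only estimators for each cell. The synthetic data and the random seed are also assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))  # 100 synthetic data points, 2 features each

# Supervised + categorical target -> classification
y_category = (X[:, 0] + X[:, 1] > 0).astype(int)
classifier = LogisticRegression().fit(X, y_category)

# Supervised + continuous target -> regression
y_continuous = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5
regressor = LinearRegression().fit(X, y_continuous)

# Unsupervised (no target given) -> clustering
clusterer = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(classifier.predict(X[:5]))  # predicted class labels (0 or 1)
print(regressor.coef_.round(2))   # slopes recovered from noiseless data
print(clusterer.labels_[:5])      # cluster ids found without any labels
```

The only difference between the three calls is what the model is given: a categorical target, a continuous target, or no target at all.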
BINARY CLASSIFICATION
            Charges    Has Internet    Internet Type    Tenure    Churn
Client 1    62.43      No              No Internet      23        No
Client 2    180.23     Yes             DSL              6         No
Client 3    670.85     Yes             Fiber            18        Yes
Features: Charges, Has Internet, Internet Type, Tenure
Label: Churn
Data point: one row (one client)
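The table above can be loaded into Pandas and split into features and label, as in the minimal sketch below. The block builds the DataFrame inline. In practice it would more likely come from a file via `pd.read_csv`, and the filename in the comment is hypothetical. Encoding the categorical columns with `pd.get_dummies` is one common choice, assumed here rather than taken from the slides.

```python
import pandas as pd

# In practice: df = pd.read_csv("churn.csv", index_col=0)  (hypothetical file)
df = pd.DataFrame(
    {
        "Charges": [62.43, 180.23, 670.85],
        "Has Internet": ["No", "Yes", "Yes"],
        "Internet Type": ["No Internet", "DSL", "Fiber"],
        "Tenure": [23, 6, 18],
        "Churn": ["No", "No", "Yes"],
    },
    index=["Client 1", "Client 2", "Client 3"],
)

X = df.drop(columns="Churn")           # features: every column except the label
y = (df["Churn"] == "Yes").astype(int)  # label: 1 = churned, 0 = stayed

# One-hot encode the text columns so a model can use them
X_encoded = pd.get_dummies(X, columns=["Has Internet", "Internet Type"])
print(X_encoded.shape)
```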
ML STEPS
1. Collection
2. Processing
3. Model Building
4. Evaluation
5. Deployment
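Steps 1 to 4 can be sketched in a few lines of Scikit-Learn. The bundled breast cancer dataset stands in for real data collection. Scaling is used as an example of processing. Both choices, along with the split size and seed, are assumptions for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Collection: a bundled dataset stands in for real data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# 2. Processing + 3. Model building, chained so the scaler is fit on training data only
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# 4. Evaluation on data the model has never seen
test_accuracy = model.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```

Step 5, deployment, is out of scope here. One common option is persisting the fitted pipeline with `joblib.dump` and loading it in a serving process.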
BENCHMARK
• What’s the dumbest model you can think of?
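One answer to "the dumbest model" is to always predict the most frequent class. Scikit-Learn provides this as `DummyClassifier(strategy="most_frequent")`. The sketch below uses the bundled breast cancer dataset as an assumed example. About 63% of its samples are benign, so any real model has to beat roughly 0.63 accuracy to be worth anything.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, stratify=y)

# Ignores the features entirely and always predicts the majority class
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_accuracy = baseline.score(X_test, y_test)
print(f"baseline accuracy: {baseline_accuracy:.3f}")
```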
BUILD MODEL
http://www.aboutdm.com/2013/04/history-of-machine-learning.html
LOGISTIC REGRESSION
[Figure: probability of belonging to class (y-axis) vs. value of independent variable (x-axis)]
• Advantages:
  • simple function
  • can be parallelized
  • scales to large datasets
GOAL: minimize the log loss (cross-entropy) of the predicted probabilities
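The sketch below makes two points about logistic regression. First, the predicted probability is the sigmoid of a linear function of the input. Second, the quantity being minimized is log loss. The synthetic data and its "true" coefficients are assumptions. Scikit-Learn's `LogisticRegression` also adds an L2 penalty by default, so strictly it minimizes log loss plus a regularization term.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic data: one independent variable, class-1 probability follows a sigmoid
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))
y = (rng.uniform(size=200) < sigmoid(2.0 * x[:, 0] - 0.5)).astype(int)

model = LogisticRegression().fit(x, y)

# Probability of belonging to class 1 = sigmoid(w * x + b)
z = x @ model.coef_.ravel() + model.intercept_[0]
p_manual = sigmoid(z)
p_sklearn = model.predict_proba(x)[:, 1]

# Log loss (cross-entropy), computed by hand and with scikit-learn
manual_loss = -np.mean(y * np.log(p_manual) + (1 - y) * np.log(1 - p_manual))
print(manual_loss, log_loss(y, p_sklearn))
```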
NEURAL NETWORKS
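[Figure: network diagram with an input layer, several hidden layers, and an output layer]
• Advantages:
  • handle unstructured data (images, text, audio)
  • model complexity can grow by adding layers and units
  • perform well on large datasets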
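As an illustration of the extra flexibility, the sketch below compares a linear model with a small neural network on the synthetic "two moons" dataset. That dataset cannot be separated by a straight line. The dataset, the two hidden layers of 16 units, and the iteration budget are all assumptions chosen for the example.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=1000, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Linear decision boundary
linear = LogisticRegression().fit(X_train, y_train)
linear_accuracy = linear.score(X_test, y_test)

# Input layer -> two hidden layers of 16 units -> output layer
mlp = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)
mlp_accuracy = mlp.score(X_test, y_test)

print(f"logistic regression: {linear_accuracy:.3f}  neural network: {mlp_accuracy:.3f}")
```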
DECISION TREES
Comedy?
  Yes → Watch
  No → Foreign Film?
    Yes → Watch
    No → Pass
• Advantages:
  • easy to interpret
  • fast prediction
  • rule-based
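The movie tree above can be learned from data, as in the minimal sketch below. The four-row viewing history and the feature names `comedy` and `foreign_film` are hypothetical. `export_text` prints the learned rules, which is what makes trees easy to interpret.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical history: [is_comedy, is_foreign_film] -> decision
X = [[1, 0], [1, 1], [0, 1], [0, 0]]
y = ["Watch", "Watch", "Watch", "Pass"]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Human-readable if/else rules of the fitted tree
print(export_text(tree, feature_names=["comedy", "foreign_film"]))
print(tree.predict([[0, 0]]))  # neither comedy nor foreign film
```

With only four rows, both questions are equally good first splits, so the printed tree may ask about foreign films first. Its predictions still match the slide.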
DECISION TREE
[Figure panels: Multiway splits | Binary splits]
ENSEMBLE METHODS
Bagging
Boosting
Random Forest
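The three ensemble families can be compared against a single tree with 5-fold cross-validation, as in the sketch below. The dataset, estimator counts and seeds are assumptions. Scikit-Learn's gradient boosting stands in for "boosting" in general. The tree is passed to `BaggingClassifier` positionally because the keyword for that argument was renamed between Scikit-Learn versions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    # Bagging: many trees, each fit on a bootstrap sample, votes averaged
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0),
    # Boosting: trees added one at a time, each correcting the previous errors
    "boosting": GradientBoostingClassifier(random_state=0),
    # Random forest: bagging plus a random subset of features at each split
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, score in scores.items():
    print(f"{name:14s} {score:.3f}")
```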
EVALUATE MODEL
MODEL PERFORMANCE
All data available is split into:
• Training data → train model
• Testing data → measure model performance
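The hold-out split can be sketched with `train_test_split`. The model is fit on the training portion only, and performance is measured on the untouched testing portion. The dataset, the 70/30 ratio and the seed are assumptions for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # all data available: 569 samples

# 70% training data, 30% testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Train model: only the training data is used here
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train)

# Measure performance: the test score estimates accuracy on new data
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train: {train_acc:.3f}  test: {test_acc:.3f}")
```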
CONFUSION MATRIX
• Accuracy: Overall, how often is it correct?
  • (TP + TN) / total
                      Test Negative                     Test Positive
Condition Negative    TRUE NEGATIVE                     FALSE POSITIVE (Type I error)
Condition Positive    FALSE NEGATIVE (Type II error)    TRUE POSITIVE
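The confusion matrix and accuracy can be computed with `sklearn.metrics`, as in the sketch below. The ten labels are made up for illustration. `confusion_matrix` puts actual classes in rows and predicted classes in columns, with class 0 first, which matches the layout above. `.ravel()` therefore yields TN, FP, FN, TP in that order.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels: 1 = positive, 0 = negative
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 1, 0, 1, 1]

# Rows = condition (actual), columns = test (predicted)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(tn, fp, fn, tp, accuracy)
```

Here TN = 4, FP = 2, FN = 1 and TP = 3, so accuracy is (3 + 4) / 10 = 0.7.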
CLASSIFICATION SCORES
• Accuracy: Overall, how often is the prediction correct?
  • (TP + TN) / total
• Precision: When the test predicts positive, how often is it correct?
  • TP / (TP + FP)
• Recall: When the actual value is positive, how often does the test predict positive?
  • TP / (TP + FN)
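The precision and recall formulas can be checked by counting TP, FP and FN directly and comparing the result with Scikit-Learn's `precision_score` and `recall_score`. The ten labels are the same made-up example used for the confusion matrix.

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical labels: 1 = positive, 0 = negative
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 1, 0, 1, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # predicted positive, actually positive
fp = sum(t == 0 and p == 1 for t, p in pairs)  # predicted positive, actually negative
fn = sum(t == 1 and p == 0 for t, p in pairs)  # predicted negative, actually positive

precision = tp / (tp + fp)  # of all positive predictions, the fraction correct
recall = tp / (tp + fn)     # of all actual positives, the fraction found
print(precision, recall)
```

With TP = 3, FP = 2 and FN = 1, precision is 3 / 5 = 0.6 and recall is 3 / 4 = 0.75.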
TRAIN / TEST
[Figure: the available data divided into train and test portions]
CROSS-VALIDATION
Round 1:  Test   Train  Train  Train  Train  → 93%
Round 2:  Train  Test   Train  Train  Train  → 89%
Round 3:  Train  Train  Test   Train  Train  → 91%
Round 4:  Train  Train  Train  Test   Train  → 92%
Round 5:  Train  Train  Train  Train  Test   → 88%
Accuracy = Average(Rounds) = (93% + 89% + 91% + 92% + 88%) / 5 = 90.6%
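The sketch below first averages the slide's five round scores. It then runs the same 5-round procedure with `KFold` and `cross_val_score`. In each round one fold is the test set and the other four are training data. The dataset, model and shuffling seed are assumptions, so the fold scores will differ from the slide's.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Averaging the slide's round scores
slide_rounds = [0.93, 0.89, 0.91, 0.92, 0.88]
slide_accuracy = sum(slide_rounds) / len(slide_rounds)

# The same procedure on real data: 5 rounds, each fold used once as the test set
X, y = load_breast_cancer(return_X_y=True)
cv = KFold(n_splits=5, shuffle=True, random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression())
fold_scores = cross_val_score(model, X, y, cv=cv)
cv_accuracy = fold_scores.mean()

print(f"slide: {slide_accuracy:.3f}  folds: {fold_scores.round(3)}  mean: {cv_accuracy:.3f}")
```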
LEARNING CURVE
More data or better model?
[Figure: Train Scores and Test Scores plotted against the number of training samples]
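A learning curve like the one on this slide can be computed with `sklearn.model_selection.learning_curve`. It refits the model on growing subsets of the training data and records train and cross-validated test scores. The dataset, the depth-limited tree and the five subset sizes are assumptions. As a rough reading guide: a test score still rising at the largest size suggests more data may help, while train and test scores that both plateau low suggest a better model is needed.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

train_sizes, train_scores, test_scores = learning_curve(
    DecisionTreeClassifier(max_depth=3, random_state=0),
    X,
    y,
    train_sizes=np.linspace(0.1, 1.0, 5),  # 10% ... 100% of each training fold
    cv=5,
    shuffle=True,
    random_state=0,
)

# One row per training size, one column per cross-validation fold
for n, tr, te in zip(train_sizes, train_scores.mean(axis=1), test_scores.mean(axis=1)):
    print(f"{n:4d} samples  train={tr:.3f}  test={te:.3f}")
```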
FEATURE SELECTION
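One simple feature selection technique is univariate selection with `SelectKBest`. It keeps the k features with the strongest individual relationship to the label. The ANOVA F-test scorer `f_classif`, k = 5 and the dataset are assumptions here, not the slide's prescription. Putting the selector inside a pipeline means it is refit on the training folds only during cross-validation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()

# Which 5 of the 30 features score highest on the ANOVA F-test?
selector = SelectKBest(score_func=f_classif, k=5).fit(data.data, data.target)
selected = data.feature_names[selector.get_support()]
X_reduced = selector.transform(data.data)
print(list(selected))

# Selection inside the pipeline avoids leaking test-fold information
model = make_pipeline(SelectKBest(f_classif, k=5), StandardScaler(), LogisticRegression())
reduced_cv = cross_val_score(model, data.data, data.target, cv=5).mean()
print(f"5-feature model, mean CV accuracy: {reduced_cv:.3f}")
```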
PYDATA
Bokeh
TUTORIAL
ANY QUESTIONS?
Remember to give feedback using the app.
www.catalit.com
info@catalit.com
