4. Attempt at definition
Machine Learning
• “… gives computers the ability to learn without being explicitly
programmed” (Arthur Samuel, 1959)
• “… is the systematic study of algorithms and systems that improve their
knowledge or performance with experience” (Peter Flach, 2012)
• “… concerns systems that automatically learn programs from data” (Pedro
Domingos, 2012)
14-10-2019 page 4
Definition
Applications
Techniques
Considerations
Sources
Software
ML in R & Python
5. Related to ML
• Artificial Intelligence
• Knowledge discovery
• (Predictive) Analytics
• Statistics / Statistical Learning
• Optimization
• Evolutionary algorithms
• Deep Learning
• Data Mining
• Pattern recognition
• Data Science
• Informatics, computer/computational science
• Econometrics
• Related buzzwords: Big Data, Internet of Things
6. Related fields
Artificial Intelligence
Data Science
Statistics
Informatics
Econometrics
Optimization
7. Terminology
Statistics/Econometrics            | Machine Learning
Independent variables, predictors  | Features, inputs
Dependent variable                 | Output, response
Estimation, fitting                | Training, learning
Dummy coding                       | One-hot encoding
Transformation of variables        | Feature engineering
Parameters                         | Weights
Regression/classification          | Supervised learning
Goal is to understand (model)      | Goal is to predict
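The "dummy coding / one-hot encoding" pair can be illustrated with a minimal sketch in plain Python. The function names and the toy data are hypothetical, chosen only for this example. One common difference: dummy coding usually drops one reference category, while one-hot encoding keeps a column for every category.

```python
def one_hot(values):
    """One-hot encode a list of categorical values.

    Returns (categories, rows) with one 0/1 column per category.
    """
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def dummy_code(values):
    """Dummy coding: like one-hot, but the first category acts as the
    reference level and gets no column (this avoids perfect
    collinearity with an intercept in a regression model)."""
    categories, rows = one_hot(values)
    return categories[1:], [row[1:] for row in rows]


colours = ["red", "green", "blue", "green"]  # hypothetical toy data
cats, encoded = one_hot(colours)
ref_cats, dummies = dummy_code(colours)
print(cats, encoded)
print(ref_cats, dummies)
```

Here "blue" becomes the reference level simply because it sorts first; in practice the reference category is a modelling choice.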
8. ML Applications
• Handwriting recognition
• Facial/image recognition
• Speech recognition
• Spam filters
• Text Mining
• DNA sequence classification
• Search engines
• Stock market analysis
• Game playing
• Medical diagnostics
• Fraud detection
• Passenger screening
• Crime prediction
• Satellite image classification
• Robotics
• Automatic flight pilots
• Self-driving cars
• …
11. “Object” recognition in computer vision
12. ML Techniques
Three groups:
• Supervised learning (classification, regression)
• Unsupervised learning (PCA, clustering, …)
• Reinforcement learning (agent-based)
• (Transfer learning)
• …
What do you need for a self-driving car?
13. Supervised techniques
Specific for Classification
• Decision trees
• Bagged trees
• Boosted trees
• Random Forests
• Neural networks
• Support Vector Machines
• Genetic programming
• Bayesian Networks
• MARS
• Lasso
• Logistic regression
• Naive Bayes
• kNN
• Ensemble models
• …
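As an illustration of one technique from this list, kNN can be sketched in a few lines of plain Python. This is a minimal sketch, not a reference implementation: the function name, the choice of Euclidean distance, and the toy data are assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest
    training points (Euclidean distance)."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
           (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_X, train_y, (1.1, 1.0)))  # near the first cluster
print(knn_predict(train_X, train_y, (5.1, 5.0)))  # near the second cluster
```

There is no training step: the "model" is the stored data, and k is the tuning parameter.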
16. Example Unsupervised: Anomaly detection
[Figure: points labelled “boundary case”, “outlier” and “extreme case”]
Robust regression: MVE (minimum volume ellipsoid) estimation
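The idea of robust outlier detection can be sketched in one dimension. Note that this is not MVE estimation, which is multivariate; it is a simpler median/MAD rule standing in for it. The function name, the threshold of 3.5, and the data are assumptions made for this example.

```python
import statistics


def mad_outliers(values, threshold=3.5):
    """Flag values whose robust z-score exceeds `threshold`.

    The robust z-score replaces mean and standard deviation by the
    median and the median absolute deviation (MAD), so the outliers
    themselves barely influence the rule.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread around the median: rule is undefined
    # The factor 0.6745 makes the MAD comparable to a standard
    # deviation for normally distributed data.
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 25.0]  # hypothetical measurements
print(mad_outliers(data))
```

A rule based on mean and standard deviation would be pulled towards the value 25.0; the median-based version is not.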
17. Optimisation techniques
• Linear programming
• (Mixed) Integer Programming
• Non-linear programming
• Modern optimisation techniques:
18. Considerations
• The similarity between ML and statistical methods is often large:
representation – evaluation – optimisation
• Personal note: if it works well (prediction!), use it!
(but explainability may also matter)
• Data preparation (incl. feature engineering) often is 80% of the work
• Bias-Variance dilemma remains (overfitting)
• Compare models fairly, e.g. using ROC curves on an independent test set
• “No free lunch”: no single technique is always best
=> Use expert knowledge and choose representation fitting the problem
(data alone is not enough)
• Curse of dimensionality: the input space grows exponentially with the
number of features k; the number of observations (generally) does not
• Consider making multiple models and combining (ensembles)
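The fair comparison with ROC curves can be sketched in plain Python. This is a minimal sketch under stated assumptions: the test labels and the scores of the two models are invented, and the function names are hypothetical. The ROC curve is traced by lowering the decision threshold step by step; the area under it (AUC) summarises the curve in one number.

```python
def roc_points(y_true, scores):
    """Return the ROC curve as a list of (FPR, TPR) points, obtained
    by sweeping the decision threshold over the scores.

    Assumes labels are 0/1 and that both classes are present.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Sort by descending score; each step lowers the threshold.
    ranked = sorted(zip(scores, y_true), reverse=True)
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all tied scores are processed.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical held-out test labels and the scores of two models.
y_test = [1, 1, 0, 1, 0, 0, 1, 0]
model_a = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.55, 0.2]
model_b = [0.6, 0.4, 0.7, 0.5, 0.8, 0.3, 0.45, 0.65]

auc_a = auc(roc_points(y_test, model_a))
auc_b = auc(roc_points(y_test, model_b))
print(f"AUC model A: {auc_a:.3f}, AUC model B: {auc_b:.3f}")
```

The comparison is only fair if neither model saw `y_test` during training or tuning. An AUC of 0.5 corresponds to random guessing.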
19. Sources
• Literature (use Google Scholar and arXiv)
• Data (Kaggle, UCI, Quandl, governments, APIs)
• Competitions (Kaggle, Topcoder, HackerRank, CrowdAnalytix)
• Courses (Coursera, Udacity, Udemy, DataCamp)
• Academic education (Amsterdam School of Data Science,
Eindhoven, Delft, Tilburg)
• Fora (Kaggle, Stack Overflow, Quora)
• Other websites (Analytics Vidhya, Data Science Central,
DeepMind, DutchDigitalDelta-Commit2data)
• …s (e.g. ADS and AMDS)
20. Literature
• Articles
– A Few Useful Things to Know About Machine Learning (Pedro Domingos, CACM Oct 2012)
– Statistical Modeling: The Two Cultures (Leo Breiman, Statistical Science 2001)
• Books
– The Elements of Statistical Learning (Hastie/Tibshirani/Friedman; Springer 2008)
– Applied Predictive Modeling (Kuhn/Johnson; Springer 2013)
– Machine Learning (Flach; Cambridge Univ. Press 2012)
– Reinforcement Learning: An Introduction (Sutton/Barto; MIT Press 2012)
– Artificial Intelligence: A Modern Approach 3rd ed. (Russell/Norvig; Prentice Hall 2016)
– Modern Optimization with R (Cortez; Springer 2014)
21. Software
• General purpose programming languages
– Python
– R
– SAS
– Matlab
• ML environments/libraries
– MS Azure
– Google TensorFlow (for R)
– AWS: Amazon Web Services
22. Machine Learning in R
• CRAN Task View:
Machine Learning & Statistical Learning
• caret package
– Vignette
– Many model types
– Training and prediction
– Variable importance
– Parameter tuning
– Cross-Validation, ROC curves, plots
– etc.
• TensorFlow interface (via Python)
23. Machine Learning in Python
• scikit-learn
• PyTorch (for deep learning)
• (auto-ml)
24. Reinforcement Learning
• MDP: Markov Decision Process
• Environment (S,A,P,R) entirely or partly known
• Packages in R
– MDPtoolbox
– ReinforcementLearning
• Code in Python
– Lots on GitHub, e.g. DeepMind TRFL
• Coding it yourself
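When the environment (S, A, P, R) is entirely known, the MDP can be solved without any learning, for example by value iteration. The sketch below is plain Python and does not use the packages named above. The two-state MDP, the discount factor gamma = 0.9, and all names are hypothetical choices for this example.

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8):
    """Solve a fully known MDP (S, A, P, R) by value iteration.

    P[s][a] is a list of (next_state, probability) pairs and
    R[s][a] is the expected immediate reward of action a in state s.
    Returns the optimal state values and a greedy policy.
    """
    V = {s: 0.0 for s in states}

    def q(s, a):
        # Expected return of taking a in s, then following V.
        return R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])

    while True:
        delta = 0.0
        for s in states:
            best = max(q(s, a) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = {s: max(actions, key=lambda a: q(s, a)) for s in states}
    return V, policy


# Hypothetical MDP: staying in B pays 2 per step, moving from A costs 1
# and succeeds with probability 0.9.
states = ["A", "B"]
actions = ["stay", "move"]
P = {
    "A": {"stay": [("A", 1.0)], "move": [("B", 0.9), ("A", 0.1)]},
    "B": {"stay": [("B", 1.0)], "move": [("A", 0.9), ("B", 0.1)]},
}
R = {
    "A": {"stay": 0.0, "move": -1.0},
    "B": {"stay": 2.0, "move": 0.0},
}

V, policy = value_iteration(states, actions, P, R)
print(policy)
print({s: round(v, 3) for s, v in V.items()})
```

The resulting policy moves from A to B and then stays in B. When P and R are only partly known, they have to be estimated from experience, which is where reinforcement learning proper starts.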