1. AGENDA - DATA SCIENCE IN PRACTICE
• The "compact" version of data science activities
• Data science process breakdown step by step
• Vote to see deep / reinforcement learning demo
4. SIMPLIFICATION OF THE DATA SCIENCE PROCESS
(1) Business Understanding + (2) Data Understanding
(3) Data Processing + (4) Feature Engineering
(5) Model Selection + (6) Performance Evaluation
(7) Deployment + (8) Consumption
5. BUSINESS UNDERSTANDING & DATA UNDERSTANDING
6. Data Science is an iterative process, and EVERY decision a data scientist makes along the way is a trade-off.
14. EXPLAINING CORRELATION WITH A METAPHOR (CONTINUED)
Setup: we observe the two cars over an interval of distance, from point A to point B, both heading to the right.
• Highly correlated (0.75–1): the Tesla and the Volvo move at almost the same speed and toward the same direction.
• Positively correlated (0.5–0.75): the Tesla moves a bit faster than the Volvo, but both still head in the same direction.
• Negatively correlated (< 0): the Tesla and the Volvo move toward different directions.
The slide turns correlation into a distance between features: distance = 1 − corr (bands shown: ≈ 0, 0.25–0.5, 0.5–1).
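The slide's distance-between-features idea can be sketched numerically. The position readings below are invented for illustration, and the Pearson correlation is computed by hand:

```python
# Correlation between two "car position" series, and the slide's
# distance = 1 - corr metric. All numbers below are made up.
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

tesla = [0, 10, 21, 33, 44, 55]  # positions along the road from A to B
volvo = [0, 9, 20, 31, 43, 54]   # moves almost in lock-step with the Tesla

corr = pearson(tesla, volvo)
distance = 1 - corr  # highly correlated features end up close to 0
print(round(corr, 4), round(distance, 4))
```

Two features this close together carry nearly the same information, which is exactly why one of them can often be dropped during feature engineering.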
16. MODEL SELECTION & PERFORMANCE EVALUATION
17. Q: DO YOU KNOW THE ANSWER TO THE QUESTION YOU ASKED?
• Yes → supervised learning: regressions, classes.
• No → unsupervised learning: deep learning, clustering, association analysis.
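The slide's branching question can be written as a toy dispatcher; the function name and boolean parameters are made up, while the category strings follow the slide:

```python
# Toy model-family picker mirroring the slide's decision:
# (a) do you know the answer (labels)? (b) is the answer a number or a class?
def pick_family(has_labels: bool, numeric_target: bool = False) -> str:
    if has_labels:
        # Known answers -> supervised learning
        return "supervised: regression" if numeric_target else "supervised: classification"
    # No known answers -> unsupervised learning
    return "unsupervised: clustering / association analysis"

print(pick_family(True, numeric_target=True))   # e.g. predicting ICU counts
print(pick_family(False))                       # e.g. grouping patients
```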
19. GENERIC SUPERVISED MACHINE LEARNING FLOW
21. MODEL PERFORMANCE EVALUATION (USING SUPERVISED LEARNING AS AN EXAMPLE)
22. NAIVE WAY TO LOOK AT IT – ACCURACY!
Raw accuracy only tells you whether the model is just guessing or better than guessing.
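A few lines illustrate why accuracy alone can amount to "just guessing": on an imbalanced label set (the numbers below are invented), a model that always predicts the majority class already scores high:

```python
# Why raw accuracy can be misleading: with 5% positives, a "model" that
# never predicts the positive class still reaches 95% accuracy.
labels = [0] * 95 + [1] * 5          # 5% positive class (e.g. recurrence)
majority_pred = [0] * len(labels)    # always predict the majority class

accuracy = sum(p == y for p, y in zip(majority_pred, labels)) / len(labels)
print(accuracy)  # high accuracy, yet it finds none of the positive cases
```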
23. ZOOM IN: WHAT IS MORE IMPORTANT?
Confusion matrix for breast-cancer recurrence (rows = true label, columns = predicted label):

True \ Predicted       Recurrent (=1)                  Not recurrent (=0)
Recurrent (=1)         True Positive                   Type II error (False Negative)
Not recurrent (=0)     Type I error (False Positive)   True Negative

The False Negative is the cell to worry about: the prediction says you don't have breast cancer, but you actually DO!
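Counting the four cells directly makes the Type I/II distinction concrete; the labels below are hypothetical (1 = recurrent, 0 = not recurrent):

```python
# Tally the confusion-matrix cells for a tiny, invented label set.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # True Positive
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # Type II error
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # Type I error
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # True Negative

recall = tp / (tp + fn)  # share of true recurrences the model caught
print(tp, fn, fp, tn, recall)
```

Recall (sensitivity) is the number to watch when a False Negative is the costly mistake, as in the breast-cancer example.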
34. Supervised Learning
Regressions:
Linear Regression
Step-wise Regression
Piecewise Polynomials and Splines
Smoothing Splines
Logistic Regression
Multivariate Adaptive Regression Splines
Least Absolute Shrinkage and Selection Operator (LASSO)
Ridge Regression
Linear Discriminant Analysis (LDA)
Trees:
Decision Trees
Gradient Boosted Regression Trees
Adaptive Boosting Trees (AdaBoost)
Conditional Inference Trees (CI trees)
Bootstrap Aggregation (Bagging) Trees
Gradient Boosted Machines (GBM)
Random Forest (RF)
Support Vector Machines (SVM):
Support Vector Classifier (two-class)
Support Vector Classifier (multiclass)
Kernels and Support Vector Machines
Unsupervised Learning
Dimensionality Reduction:
Principal Component Analysis (PCA)
Singular Value Decomposition (SVD)
MinHash
Locality-Sensitive Hashing (LSH)
t-Distributed Stochastic Neighbor Embedding (t-SNE)
Clustering:
K-means Clustering
Hierarchical Clustering
Bradley-Fayyad-Reina (BFR) Clustering
Clustering Using REpresentatives (CURE)
Bayesian Networks
Topic Modelling
Market Basket:
Apriori (association rules)
Park-Chen-Yu (PCY) algorithm
Savasere-Omiecinski-Navathe (SON) algorithm
Toivonen's algorithm
Stream Analysis:
Bloom Filters
Flajolet-Martin algorithm
Alon-Matias-Szegedy algorithm
Datar-Gionis-Indyk-Motwani algorithm
Neural Network Families / Deep Learning:
Perceptrons
Simple Neural Networks (fully connected)
Deep Boltzmann Machines
Convolutional Neural Networks
Recurrent Neural Networks
Recommender Systems:
Content-based Recommender
User-User Recommender
Item-Item Recommender
Hybrid Recommender
Others:
Genetic Algorithm (chromosome)
Multi-armed Bandit
K-Nearest Neighbors (KNN)
Latent Dirichlet Allocation
35. WHAT KIND OF PROBLEM EACH FAMILY ADDRESSES
(The same taxonomy as slide 34, annotated with an example question per family:)
• Regressions: "How many patients will enter the ICU in a given time?"
• Trees: "Does patient X have breast cancer, yes/no?"
• Support Vector Machines (SVM): "Does patient X have breast cancer, yes/no?"
• Dimensionality Reduction: "I have 1,200 features (age, gender, income, diagnostic codes, hospital visits, etc.) in my data about the patients; is there a simpler way to group those features?"
• Market Basket: "Diet preference: if I love eating strawberries, will I also like raspberries?"
• Stream Analysis: "How do I count, from a live-feed camera (= stream), the number of patients passing this check-point?"
• Neural Networks / Deep Learning: "(1) Object detection: people & cars. (2) NLP: respond to a sentence."
• Recommender Systems: "Predict the rating of hospitals."
36. (Figure: clusters of related words, labelled: frequently used English letters; countries/cities?; relative positions?; frequently used nouns in a sentence?; frequently used past-tense verbs; geographic regions & nationalities; education facilities; concept of good/just?)
Editor's Notes
Microsoft : https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/overview
Why/when do we use PCA? When lots of columns are highly correlated, which is usually a bad practice for regression models such as LDA.
So PCA does two things: (1) reduce dimension, (2) remove collinearity.
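Both effects can be demonstrated on a small synthetic dataset, computing PCA via NumPy's SVD (the data and seed below are arbitrary; the second column is deliberately almost collinear with the first):

```python
# Sketch of the two PCA effects on correlated columns, using SVD.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([
    x,                                          # feature 1
    2 * x + rng.normal(scale=0.01, size=200),   # nearly collinear with feature 1
    rng.normal(size=200),                       # independent feature
])

Xc = X - X.mean(axis=0)                 # center before PCA
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                      # principal-component scores

# (1) dimension reduction: the first two components carry ~all the variance
explained = (S ** 2) / (S ** 2).sum()
# (2) collinearity removal: component scores are mutually uncorrelated
pc_corr = np.corrcoef(scores, rowvar=False)
print(explained.round(4), abs(pc_corr[0, 1]))
```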
We have two cars here, a Tesla and a Volvo.
Over the interval of this distance (from point A to point B), both cars move toward the right at almost the same speed.
Observing them from A to B, we see that they arrive at approximately the same place and move along the path almost synchronized.
This could be because a husband and wife (each owning a car) were driving home together, or because the two cars were on a racing track.
It could also be completely coincidental: two strangers who just happened to share this road, in the same direction, within the observed path from A to B.
Since we are not given enough information, we have no idea which scenario it is. The only valid conclusion we can draw is this:
observing the Tesla and the Volvo, we know the two cars move together almost synchronized in speed and time (which means the distance they cover is quite similar as well).
So if we know we will eventually see the Tesla at point B, we know the Volvo will be there as well.
We then only need to track one of the cars (either the Tesla or the Volvo) at point B to determine how much distance both covered, since they arrive at B at almost the same time; we can just pick one.
This means the two cars are positively correlated, and their correlation is quite strong, approaching 1, since they move in the same direction almost simultaneously.
Now remember that we do not know whether the two cars happened to move in the same direction simultaneously by accident, or whether some scenario behind the scenes is yet to be discovered. In other words, correlation (positive or negative) does not mean causation.
So why is it important for feature engineering to know this?
Say we want to model fuel-consumption efficiency of cars: we should NOT take the Tesla into consideration, since the Tesla does not even use fuel.
It would just confuse the model we build; the model could not possibly know why the Tesla has only zeros, through and through, for fuel consumption.
Hence it is actually harmful not to select your features carefully.
Notes on this slide:
Source : http://scott.fortmann-roe.com/docs/BiasVariance.html
For forecasting models
Input: datetime + counts (example below: daily counts for DeviceNumber 41274)
Date        Count
2015-01-01  520
2015-01-02  319
2015-01-03  389
2015-01-04  355
2015-01-05  437
2015-01-06  333
Output & performance: also a time series, but with lower and upper bounds:
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
2018.01 159.8044 23.64662 295.9622 -48.43096 368.0398
2018.02 299.7230 143.70186 455.7441 61.10926 538.3367
2018.03 356.6332 198.41676 514.8496 114.66204 598.6043
2018.04 345.0193 186.61808 503.4206 102.76551 587.2732
2018.05 308.1619 149.19870 467.1251 65.04866 551.2751
2018.06 279.4213 118.15266 440.6899 32.78220 526.0604
Output (visual): [chart omitted]
Note: it should be evident that this type of forecasting model cannot "predict" the top 5 destinations, since it accepts no input other than date + counts.
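As a minimal stand-in for that forecast output, the sketch below predicts the next count as the historical mean with a normal-approximation 95% interval, using the example counts from this note. A real forecaster (ARIMA, exponential smoothing, etc.) would model trend and seasonality instead:

```python
# Naive "forecast with bounds": next-day point estimate = historical mean,
# 95% interval from the sample standard deviation (normal approximation).
from statistics import mean, stdev

counts = [520, 319, 389, 355, 437, 333]  # daily counts from the note
point = mean(counts)
half_width = 1.96 * stdev(counts)
lo95, hi95 = point - half_width, point + half_width
print(round(point, 1), round(lo95, 1), round(hi95, 1))
```

The output mirrors the shape of the table above: a point forecast flanked by lower and upper bounds.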
For prediction models:
Input: features of choice + label (= the correct answer to the question). Example below; note that for this type of model, one needs to feed BOTH the features AND the "answer" for every record:
DeviceNumber  PublicHoliday  year  month  hour  date2hour     DestinationZoneName
41274         0              2014  6      3     2014-06-01 3  KASTRUP/KÖPENHAMN (F+L)
41274         0              2014  6      3     2014-06-01 3  MALMÖ
41274         0              2014  6      4     2014-06-01 4  KASTRUP/KÖPENHAMN (F+L)
Output & performance: performance is usually reported as how "accurately" the model predicts each correct class, given the features fed to it; i.e., with 18 destinations to predict, accuracy is measured by how well the model answers correctly across all 18 classes!
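Per-class accuracy as described can be tallied in a few lines; the destination labels and predictions below are invented stand-ins:

```python
# Per-class and overall accuracy for a destination classifier.
from collections import defaultdict

y_true = ["KASTRUP", "MALMO", "KASTRUP", "LUND", "MALMO", "KASTRUP"]
y_pred = ["KASTRUP", "MALMO", "MALMO", "LUND", "MALMO", "KASTRUP"]

hits, totals = defaultdict(int), defaultdict(int)
for t, p in zip(y_true, y_pred):
    totals[t] += 1
    hits[t] += (t == p)          # count correct predictions per true class

per_class = {c: hits[c] / totals[c] for c in totals}
overall = sum(hits.values()) / len(y_true)
print(per_class, round(overall, 3))
```

Reporting accuracy per class, not just overall, shows whether the model is good on all destinations or only on the frequent ones.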