Deriving Knowledge from Data at Scale
Read this… a brilliant book that offers
an accessible overview of predictive
analytics; technical, but at the same
time a recreational read with ample
practical examples, and it provides
footnotes for further study…
I highly recommend it…
Review of Course Plan…
W5: Clustering Review
Clustering Assignment
W6: Feature Select/Create
SVMs & Regression
Data Prep Assignment
Kaggle Contest HW
W7: SVMs Cont’d
• Opening Discussion 30 minutes
Review Discussion…
• Data Science Hands On 60 minutes
• Break 5 minutes
• Data Science Modelling 30 minutes
Model performance evaluation…
• Machine Learning Boot Camp ~60 minutes
Clustering, k-Means…
• Close
• Clustering
• Clustering in Weka
• Class Imbalance
• Performance Measures
To keep your sensor cheap and simple, you
need to sense as few of these attributes as
possible to meet the 95% requirement.
Question: Which attributes should your
sensor be capable of measuring?
Diversity of Opinion
Independence
Decentralization
Aggregation
Began October 2006
http://www.wired.com/business/2009/09/how-the-netflix-prize-was-won/, a light read (highly suggested)
from http://www.research.att.com/~volinsky/netflix/
However, improvement slowed…
The top team posted an 8.5% improvement.
Ensemble methods are the best performers…
“Thanks to Paul Harrison's collaboration, a
simple mix of our solutions improved our result
from 6.31 to 6.75”
Rookies
“My approach is to combine the results of many
methods (also two-way interactions between
them) using linear regression on the test set.
The best method in my ensemble is regularized
SVD with biases, post processed with kernel
ridge regression”
Arek Paterek
http://rainbow.mimuw.edu.pl/~ap/ap_kdd.pdf
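Paterek's description — combining many methods' predictions with linear regression — can be illustrated in a few lines. This is a toy sketch, not his actual setup: the two "base models" are just noisy copies of made-up ground-truth ratings, and `lstsq` finds the blend weights.

```python
import numpy as np

# Hypothetical held-out predictions from two base models plus the true
# ratings; least squares finds weights for the linear blend.
rng = np.random.default_rng(0)
truth = rng.uniform(1, 5, size=200)
pred_a = truth + rng.normal(0, 0.6, size=200)   # stand-in for e.g. an SVD model
pred_b = truth + rng.normal(0, 0.8, size=200)   # stand-in for e.g. an RBM model

X = np.column_stack([pred_a, pred_b, np.ones_like(truth)])
weights, *_ = np.linalg.lstsq(X, truth, rcond=None)
blend = X @ weights

def rmse(p, t):
    return float(np.sqrt(np.mean((p - t) ** 2)))

# On the data it was fit on, the blend is at least as good as either model,
# since each single model is itself a feasible weight vector.
print(rmse(pred_a, truth), rmse(pred_b, truth), rmse(blend, truth))
```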
“When the predictions of multiple RBM models and
multiple SVD models are linearly combined, we
achieve an error rate that is well over 6% better than
the score of Netflix’s own system.”
U of Toronto
http://www.cs.toronto.edu/~rsalakhu/papers/rbmcf.pdf
Gravity
home.mit.bme.hu/~gtakacs/download/gravity.pdf
“Our common team blends the result of team
Gravity and team Dinosaur Planet.”
Might have guessed from the name…
When Gravity and
Dinosaurs Unite
And, yes, the top team, which is from AT&T…
“Our final solution (RMSE=0.8712) consists
of blending 107 individual results. “
BellKor / KorBell
Clustering
Fundamental Concepts: Calculating similarity of objects described
by data; Using similarity for prediction; Clustering as similarity-
based segmentation.
Exemplary Techniques: Searching for similar entities; Nearest
neighbor methods; Clustering methods; Distance metrics for
calculating similarity.
similar
unsupervised learning
data exploration
Customers
Movies
I loved this movie…
The movies I watched…
You might want to
watch this movie…
You might like this one too…
We may want to retrieve similar things directly. For example, IBM wants to find companies
that are similar to their best business customers, in order to have sales staff look at them as
prospects. Hewlett-Packard maintains many high performance servers for clients; this
maintenance is aided by a tool that, given a server configuration, retrieves information on
other similarly configured servers.
We may want to group similar items together into clusters, for example to see whether our
customer base contains groups of similar customers and what these groups have in
common.
Reasoning from similar cases of course extends beyond business applications; it is natural
to fields such as medicine and law. A doctor may reason about a new difficult case by
recalling a similar case and its diagnosis. A lawyer often argues cases by citing legal
precedents, which are similar historical cases whose dispositions were previously judged and
entered into the legal casebook.
Successful Predictions
Clustering is grouping: objects within a group are
similar to one another and different from (or unrelated to)
the objects in other groups.
Inter-cluster distances are maximized;
intra-cluster distances are minimized.
• Outliers: objects that do not belong to any cluster;
finding and explaining them is outlier analysis.
data reduction
natural clusters useful outlier detection
A distance function d(x, y) on objects x and y is a metric if:
• d(i, j) ≥ 0 (non-negativity)
• d(i, i) = 0 (isolation)
• d(i, j) = d(j, i) (symmetry)
• d(i, j) ≤ d(i, h) + d(h, j) (triangle inequality)
Distance measures can be defined for real,
boolean, categorical, and ordinal attributes.
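For real-valued attributes, Euclidean distance is one common metric. A quick sketch verifying the four properties on sample points:

```python
import math

def euclidean(x, y):
    """Euclidean distance between two real-valued vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

i, j, h = (0.0, 0.0), (3.0, 4.0), (1.0, 1.0)
assert euclidean(i, j) >= 0                                   # non-negativity
assert euclidean(i, i) == 0                                   # isolation
assert euclidean(i, j) == euclidean(j, i)                     # symmetry
assert euclidean(i, j) <= euclidean(i, h) + euclidean(h, j)   # triangle inequality
print(euclidean(i, j))  # → 5.0 (the 3-4-5 triangle)
```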
Single linkage: minimum distance between members of the two clusters
Complete linkage: maximum distance between members of the two clusters
Average linkage: average distance between members of the two clusters
Ward's method: minimization of within-cluster variance
Centroid method: distance between cluster centres
Clusterings can be non-overlapping or overlapping, and hierarchical or
non-hierarchical; hierarchical methods are either agglomerative or divisive.
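The linkage rules above differ only in how inter-cluster distance is scored. A minimal, illustrative agglomerative sketch (not any library's implementation): single linkage as written; swapping `min` for `max` gives complete linkage.

```python
# Agglomerative clustering sketch: repeatedly merge the two clusters
# whose closest members are nearest (single linkage).
def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def single_linkage(points, k):
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)   # merge the closest pair
    return clusters

pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
groups = single_linkage(pts, 2)
print(groups)  # the two tight pairs end up in separate clusters
```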
centroid
[Figure: the original points, shown with a sub-optimal clustering and the optimal clustering of the same data]
[Figure: one run of k-means, iterations 1–6: centroids and cluster assignments updating until convergence]
[Figure: a second run of k-means from a different random initialization, iterations 1–6]
SSE = Σ_{i=1..K} Σ_{x ∈ C_i} dist(m_i, x)²
where m_i is the centroid of cluster C_i.
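As a sketch, the SSE can be computed directly from a clustering (toy data, plain Python):

```python
# SSE for a clustering: the sum of squared distances from each point
# to its cluster's centroid -- the quantity k-means tries to minimize.
def sse(clusters):
    total = 0.0
    for points in clusters:
        m = [sum(coord) / len(points) for coord in zip(*points)]  # centroid
        total += sum(sum((x - mi) ** 2 for x, mi in zip(p, m)) for p in points)
    return total

clusters = [[(0, 0), (0, 2)], [(4, 0), (6, 0)]]
print(sse(clusters))  # → 4.0 : each point is distance 1 from its centroid
```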
• Boolean Values
• Categories
Weka evaluates a clustering with one of four modes:
• Use training set
• Supplied test set
• Percentage split
• Classes to clusters evaluation
Note: some implementations of k-means
only allow numerical values, so it may be
necessary to convert categorical attributes to binary.
Also, normalize attributes that are on very different
scales (e.g., age and income).
hands on…
Some final takeaways from this model: the power of clustering and nearest
neighbor becomes obvious when we talk about data sets like Netflix and
Amazon. With Amazon's ~100 million users and Netflix's 4 billion streamed
movies, their algorithms are very accurate, since there are likely many
potential customers in their databases with similar buying/viewing habits to
you. Thus, your nearest neighbor is likely very similar to you. This creates an
accurate and effective model.
Conversely, the model breaks down quickly and becomes inaccurate when you
have few data points for comparison. In the early stages of an online
e-commerce store, for example, when there are only 50 customers, a product
recommendation feature will likely not be accurate at all, as the nearest
neighbor may in fact be very distant from yourself.
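The nearest-neighbor recommendation idea, in miniature (all names and ratings are invented):

```python
# Recommend items from the most similar user's history: with many users,
# someone's rating vector is close to yours, and their other ratings
# become your recommendations.
ratings = {
    "you": {"MovieA": 5, "MovieB": 4},
    "ann": {"MovieA": 5, "MovieB": 4, "MovieC": 5},
    "bob": {"MovieA": 1, "MovieB": 2, "MovieD": 5},
}

def distance(u, v):
    shared = set(u) & set(v)
    if not shared:
        return float("inf")   # no overlap: treat as maximally distant
    return sum((u[m] - v[m]) ** 2 for m in shared) ** 0.5

me = ratings["you"]
neighbor = min((n for n in ratings if n != "you"),
               key=lambda n: distance(me, ratings[n]))
recs = [m for m in ratings[neighbor] if m not in me]
print(neighbor, recs)  # → ann ['MovieC']
```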
10 Minute Break…
• Classifiers tend to be biased toward the majority class
• because training aims to reduce the overall
error rate
Oversampling by generating synthetic samples
• Controlling the amount and placement of the synthetic samples
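A minimal SMOTE-style sketch: a synthetic minority sample is created by interpolating between a minority point and one of its minority neighbors, and the interpolation fraction controls placement (the points here are made up):

```python
import random

def synthetic_sample(point, neighbor, rng):
    """Place a synthetic sample on the segment between two minority points."""
    frac = rng.random()   # 0..1: where along the segment the sample lands
    return tuple(p + frac * (n - p) for p, n in zip(point, neighbor))

rng = random.Random(42)
minority = [(1.0, 1.0), (2.0, 1.5)]
s = synthetic_sample(minority[0], minority[1], rng)
print(s)  # lies on the segment between the two minority points
```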
oversampling minority class
random undersampling majority class
Legend: minority sample, synthetic sample, majority sample.
… but what if there is a majority sample nearby?
Let’s try it
10 Minute Break…
• It depends
one more
example right than you did
No Prob Target CustID Age
1 0.97 Y 1746 …
2 0.95 N 1024 …
3 0.94 Y 2478 …
4 0.93 Y 3820 …
5 0.92 N 4897 …
… … … …
99 0.11 N 2734 …
100 0.06 N 2422
Use a model to assign score (probability) to each instance
Sort instances by decreasing score
Expect more targets (hits) near the top of the list
3 hits in the top 5% of
the list.
If there are 15 targets
overall, then the top 5%
holds 3/15 = 20% of the
targets.
40% of responses for
10% of cost
Lift factor = 4
80% of responses for
40% of cost
Lift factor = 2
Model
Random
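These lift factors can be reproduced from a scored list. The sketch below builds a hypothetical 100-instance list with 15 targets, 3 of them in the top 5%, matching the earlier table:

```python
# Lift for the top fraction of a scored list: fraction of all targets
# captured, divided by the fraction of the list contacted.
def lift(scored, top_frac):
    """scored: list of (score, is_target) pairs, any order."""
    ranked = sorted(scored, key=lambda t: t[0], reverse=True)
    cutoff = max(1, int(len(ranked) * top_frac))
    hits_top = sum(t for _, t in ranked[:cutoff])
    hits_all = sum(t for _, t in ranked)
    return (hits_top / hits_all) / top_frac

# 100 scored instances, 15 targets, 3 of them in the top 5:
scored = [(0.99, 1), (0.98, 1), (0.97, 1), (0.96, 0), (0.95, 0)]
scored += [(0.5 - i / 1000, 1) for i in range(12)]   # remaining targets
scored += [(0.1 - i / 1000, 0) for i in range(83)]   # non-targets
print(round(lift(scored, 0.05), 2))  # → 4.0 : 20% of targets for 5% of the list
```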
Precision and Recall
Once you can compute precision and recall, you are often able to produce
precision/recall curves. Suppose that you are attempting to identify spam. You
run a learning algorithm to make predictions on a test set. But instead of just
taking a “yes/no” answer, you allow your algorithm to produce its confidence.
For instance, using a perceptron, you might use the distance from the
hyperplane as a confidence measure. You can then sort all of your test emails
according to this ranking. You may put the most spam-like emails at the top
and the least spam-like emails at the bottom.
Once you have this sorted list, you can choose how aggressively you want your
spam filter to be by setting a threshold anywhere on this list. One would hope
that if you set the threshold very high, you are likely to have high precision (but
low recall). If you set the threshold very low, you’ll have high recall (but low
precision). By considering every possible place you could put this threshold,
you can trace out a curve of precision/recall values, like the one in Figure 4.15.
This allows us to ask the question: for some fixed precision, what sort of
recall can I get…
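The threshold sweep just described, in miniature (scores and labels are made up): every cut point in the ranked list yields one precision/recall pair, tracing out the curve.

```python
# Sweep a confidence threshold over ranked scores and collect the
# (precision, recall) pair at every possible cut point.
def pr_curve(scores, labels):
    pairs = sorted(zip(scores, labels), reverse=True)
    curve, tp = [], 0
    total_pos = sum(labels)
    for k, (_, y) in enumerate(pairs, start=1):
        tp += y
        curve.append((tp / k, tp / total_pos))   # (precision, recall)
    return curve

scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 1, 0, 1, 0]   # 1 = spam
curve = pr_curve(scores, labels)
for p, r in curve:
    print(round(p, 2), round(r, 2))
```

Lowering the threshold (moving down the list) can only raise recall, while precision moves up and down — exactly the trade-off the curve exposes.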
Sometimes we want a single number that informs us of the quality of the
solution. A popular way to combine precision and recall into a single number is
by taking their harmonic mean. This is known as the balanced f-measure:
F = 2PR / (P + R)
The reason to use a harmonic mean rather than an arithmetic mean is that it
favors systems that achieve roughly equal precision and recall. In the extreme
case where P = R, we get F = P = R. But in the imbalanced case, for instance P =
0.1 and R = 0.9, the overall f-measure is a modest 0.18.
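Checking the arithmetic:

```python
# Balanced f-measure: the harmonic mean of precision and recall.
def f_measure(p, r):
    return 2 * p * r / (p + r)

print(f_measure(0.5, 0.5))             # → 0.5 (equal P and R give F = P = R)
print(round(f_measure(0.1, 0.9), 2))   # → 0.18, vs. an arithmetic mean of 0.5
```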
Precision and recall depend crucially on which class is considered
the positive class; it is not the case that precision on the flipped task
equals recall on the original task.
Blue dominates red and green;
neither red nor green dominates the other.
You could get the best of the red and
green curves by making a hybrid
classifier that switches between
strategies at the cross-over points.
Suppose you have a test for Alzheimer’s whose false
positive rate can be varied from 5% to 25% as the
false negative rate varies from 25% to 5% (assume
linear dependence for both):
You try the test on a population of 10,000 people, 1%
of whom actually are Alzheimer’s positive:
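Working the numbers for this population at the two extremes of the trade-off (a sketch; `confusion` is my own helper): with only 1% prevalence, even a modest false positive rate swamps the true positives, so precision stays low at both ends.

```python
# Confusion counts for n people with the given prevalence, false
# positive rate (FPR) and false negative rate (FNR).
def confusion(n, prevalence, fpr, fnr):
    pos = round(n * prevalence)
    neg = n - pos
    tp = round(pos * (1 - fnr))
    fp = round(neg * fpr)
    return tp, fp, pos - tp, neg - fp   # tp, fp, fn, tn

for fpr, fnr in [(0.05, 0.25), (0.25, 0.05)]:
    tp, fp, fn, tn = confusion(10_000, 0.01, fpr, fnr)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    print(fpr, fnr, round(precision, 3), round(recall, 2))
```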
• Area under the ROC curve (AUC) is a
measure of the model performance:
0.5 (random model) < AUC < 1 (perfect model)
• The larger the AUC, the better the model
to impact…
1. Build our predictive model in WEKA Explorer;
2. Use our model to score (predict) which new customers to
target in our upcoming advertising campaign;
• ARFF file manipulation (hacking), all too common pita…
• Excel manipulation to join model output with our customers list
3. Compute the lift chart to assess business impact of our
predictive model on the advertising campaign
• How are Lift charts built, of all the charts and/or performance
measures from a model this one is ‘on you’ to construct;
• Where is the business ‘bang for the buck’?
Bagging
with replacement…
Boosting
Decision Trees:
bagging
boosting
Decision Trees and Decision Forests
A forest is an ensemble of trees. The trees are all slightly different from one another.
[Figure: a general tree structure, with a root node, internal (split) nodes, and terminal (leaf) nodes; and an example decision tree whose splits ask "Is top part blue?", "Is bottom part green?", "Is bottom part blue?"]
Decision Forest Model: the randomness model
1) Bagging (randomizing the training set)
The full training set
The randomly sampled subset of training data made available for the tree t
Forest training
Decision Forest Model: the randomness model
The full set of all possible node test parameters
For each node the set of randomly sampled features
Randomness control parameter.
For no randomness and maximum tree correlation.
For max randomness and minimum tree correlation.
2) Randomized node optimization (RNO)
Small value of ; little tree correlation. Large value of ; large tree correlation.
The effect of
Node weak learner
Node test params
Node training
Decision Forest Model: training and information gain
Node training: choose the split with the highest information gain,
measured with Shannon's entropy
(for categorical, non-parametric distributions).
[Figure: the class distribution before the split and after two candidate splits, Split 1 and Split 2]
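Shannon's entropy and the information gain of a candidate split can be computed as follows (a sketch for categorical labels; a perfect split recovers the full parent entropy as gain):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a categorical label list, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)

parent = ["a"] * 4 + ["b"] * 4
split = [["a"] * 4, ["b"] * 4]            # a perfect split
print(entropy(parent))                    # → 1.0 bit (two balanced classes)
print(information_gain(parent, split))    # → 1.0 (all uncertainty removed)
```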
Why we prune…
Classification Forest
Training data in feature space
?
?
?
Entropy of a discrete distribution
with
Classification tree
training
Obj. funct. for node j (information gain)
Training node j
Output is categorical
Input data point
Node weak learner
Predictor model (class posterior)
Model specialization for classification
( is feature response)
(discrete set)
Classification Forest: the weak learner model
Node weak learner
Node test params
Splitting data at node j
Weak learner: axis aligned Weak learner: oriented line Weak learner: conic section
Examples of weak learners
Feature response
for 2D example.
With a generic line in homog. coordinates.
Feature response
for 2D example.
With a matrix representing a conic.
Feature response
for 2D example.
In general may select only a very small subset of features
With or
Classification Forest: the prediction model
What do we do at the leaf?
Prediction model: probabilistic
Classification Forest: the ensemble model
Tree t=1 t=2 t=3
Forest output probability
The ensemble model
Training different trees in the forest
Testing different trees in the forest
Classification Forest: effect of the weak learner model
Parameters: T=200, D=2, weak learner = aligned, leaf model = probabilistic
• “Accuracy of prediction”
• “Quality of confidence”
• “Generalization”
Three concepts to keep in mind:
Training points
Training different trees in the forest
Testing different trees in the forest
Classification Forest: effect of the weak learner model
Parameters: T=200, D=2, weak learner = linear, leaf model = probabilistic
Training points
Classification Forest: effect of the weak learner model
Training different trees in the forest
Testing different trees in the forest
Parameters: T=200, D=2, weak learner = conic, leaf model = probabilistic
Training points
Classification Forest: with >2 classes
Training different trees in the forest
Testing different trees in the forest
Parameters: T=200, D=3, weak learner = conic, leaf model = probabilistic
Training points
Classification Forest: effect of tree depth
max tree depth, D
underfitting … overfitting
T=200, D=3, w. l. = conic T=200, D=6, w. l. = conic T=200, D=15, w. l. = conic
Predictor model = prob.
Training points: 4-class mixed
Classification Forest: analysing generalization
Parameters: T=200, D=13, w. l. = conic, predictor = prob.
Training points: 4-class spiral Training pts: 4-class spiral, large gaps Tr. pts: 4-class spiral, larger gapsTestingposteriors
Q
Feature extraction and selection are the most important but
underrated steps of machine learning. Better features are
better than better algorithms…
That’s all for tonight….
B4UConference_machine learning_deeplearning
Hoa Le
 
udacity-dandsyllabus
udacity-dandsyllabusudacity-dandsyllabus
udacity-dandsyllabus
Bora Yüret
 
How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?
Tuan Yang
 
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...
Databricks
 
Scaling Face Recognition with Big Data - Key Notes at DevTalks Bucharest 2017
Scaling Face Recognition with Big Data - Key Notes at DevTalks Bucharest 2017Scaling Face Recognition with Big Data - Key Notes at DevTalks Bucharest 2017
Scaling Face Recognition with Big Data - Key Notes at DevTalks Bucharest 2017
VisageCloud
 
InfoEducatie - Face Recognition Architecture
InfoEducatie - Face Recognition ArchitectureInfoEducatie - Face Recognition Architecture
InfoEducatie - Face Recognition Architecture
Bogdan Bocse
 
Data Mining
Data MiningData Mining
Data Mining
SHIKHA GAUTAM
 
Introduction to data mining
Introduction to data miningIntroduction to data mining
Introduction to data mining
Ujjawal
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
Izwan Nizal Mohd Shaharanee
 
Declarative data analysis
Declarative data analysisDeclarative data analysis
Declarative data analysis
South West Data Meetup
 
Strata San Jose 2016: Scalable Ensemble Learning with H2O
Strata San Jose 2016: Scalable Ensemble Learning with H2OStrata San Jose 2016: Scalable Ensemble Learning with H2O
Strata San Jose 2016: Scalable Ensemble Learning with H2O
Sri Ambati
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine Learning
Varad Meru
 
background.pptx
background.pptxbackground.pptx
background.pptx
KabileshCm
 
Chapter 1: Introduction to Data Mining
Chapter 1: Introduction to Data MiningChapter 1: Introduction to Data Mining
Chapter 1: Introduction to Data Mining
Izwan Nizal Mohd Shaharanee
 
Clinical Data Classification of alzheimer's disease
Clinical Data Classification of alzheimer's diseaseClinical Data Classification of alzheimer's disease
Clinical Data Classification of alzheimer's disease
George Kalangi
 

Similar to Barga Data Science lecture 5 (20)

Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for Everyone
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learning
 
PyData 2015 Keynote: "A Systems View of Machine Learning"
PyData 2015 Keynote: "A Systems View of Machine Learning" PyData 2015 Keynote: "A Systems View of Machine Learning"
PyData 2015 Keynote: "A Systems View of Machine Learning"
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2
 
Computer Vision for Beginners
Computer Vision for BeginnersComputer Vision for Beginners
Computer Vision for Beginners
 
B4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearningB4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearning
 
udacity-dandsyllabus
udacity-dandsyllabusudacity-dandsyllabus
udacity-dandsyllabus
 
How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?
 
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...
 
Scaling Face Recognition with Big Data - Key Notes at DevTalks Bucharest 2017
Scaling Face Recognition with Big Data - Key Notes at DevTalks Bucharest 2017Scaling Face Recognition with Big Data - Key Notes at DevTalks Bucharest 2017
Scaling Face Recognition with Big Data - Key Notes at DevTalks Bucharest 2017
 
InfoEducatie - Face Recognition Architecture
InfoEducatie - Face Recognition ArchitectureInfoEducatie - Face Recognition Architecture
InfoEducatie - Face Recognition Architecture
 
Data Mining
Data MiningData Mining
Data Mining
 
Introduction to data mining
Introduction to data miningIntroduction to data mining
Introduction to data mining
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
Declarative data analysis
Declarative data analysisDeclarative data analysis
Declarative data analysis
 
Strata San Jose 2016: Scalable Ensemble Learning with H2O
Strata San Jose 2016: Scalable Ensemble Learning with H2OStrata San Jose 2016: Scalable Ensemble Learning with H2O
Strata San Jose 2016: Scalable Ensemble Learning with H2O
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine Learning
 
background.pptx
background.pptxbackground.pptx
background.pptx
 
Chapter 1: Introduction to Data Mining
Chapter 1: Introduction to Data MiningChapter 1: Introduction to Data Mining
Chapter 1: Introduction to Data Mining
 
Clinical Data Classification of alzheimer's disease
Clinical Data Classification of alzheimer's diseaseClinical Data Classification of alzheimer's disease
Clinical Data Classification of alzheimer's disease
 

More from Roger Barga

RS Barga STRATA'18 New York City
RS Barga STRATA'18 New York CityRS Barga STRATA'18 New York City
RS Barga STRATA'18 New York City
Roger Barga
 
Barga Strata'18 presentation
Barga Strata'18 presentationBarga Strata'18 presentation
Barga Strata'18 presentation
Roger Barga
 
Barga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 KeynoteBarga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 Keynote
Roger Barga
 
Data Driven Engineering 2014
Data Driven Engineering 2014Data Driven Engineering 2014
Data Driven Engineering 2014
Roger Barga
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015
Roger Barga
 
Barga DIDC'14 Invited Talk
Barga DIDC'14 Invited TalkBarga DIDC'14 Invited Talk
Barga DIDC'14 Invited Talk
Roger Barga
 
Barga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 KeynoteBarga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 Keynote
Roger Barga
 

More from Roger Barga (7)

RS Barga STRATA'18 New York City
RS Barga STRATA'18 New York CityRS Barga STRATA'18 New York City
RS Barga STRATA'18 New York City
 
Barga Strata'18 presentation
Barga Strata'18 presentationBarga Strata'18 presentation
Barga Strata'18 presentation
 
Barga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 KeynoteBarga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 Keynote
 
Data Driven Engineering 2014
Data Driven Engineering 2014Data Driven Engineering 2014
Data Driven Engineering 2014
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015
 
Barga DIDC'14 Invited Talk
Barga DIDC'14 Invited TalkBarga DIDC'14 Invited Talk
Barga DIDC'14 Invited Talk
 
Barga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 KeynoteBarga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 Keynote
 

Recently uploaded

一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
zsjl4mimo
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Fernanda Palhano
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
74nqk8xf
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
g4dpvqap0
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
74nqk8xf
 

Recently uploaded (20)

一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
 

Barga Data Science lecture 5

  • 1. Deriving Knowledge from Data at Scale
  • 2. Deriving Knowledge from Data at Scale Read this… A brilliant read that offers an accessible overview of predictive analytics, technical but at the same time a recreational read with ample practical examples, and it provides footnotes for further study... I highly recommend it…
  • 3. Deriving Knowledge from Data at Scale Review of Course Plan… W5: Clustering Review Clustering Assignment W6: Feature Select/Create SVMs & Regression Data Prep Assignment Kaggle Contest HW W7: SVMs Cont’d
  • 4. Deriving Knowledge from Data at Scale • Opening Discussion 30 minutes Review Discussion… • Data Science Hands On 60 minutes • Break 5 minutes • Data Science Modelling 30 minutes Model performance evaluation… • Machine Learning Boot Camp ~60 minutes Clustering, k-Means… • Close
  • 5. Deriving Knowledge from Data at Scale • Clustering • Clustering in Weka • Class Imbalance • Performance Measures
  • 6. Deriving Knowledge from Data at Scale • Opening Discussion 30 minutes Review Discussion… • Data Science Hands On 60 minutes • Break 5 minutes • Data Science Modelling 30 minutes Model performance evaluation… • Machine Learning Boot Camp ~60 minutes Clustering, k-Means… • Close
  • 7. Deriving Knowledge from Data at Scale To keep your sensor cheap and simple, you need to sense as few of these attributes as possible to meet the 95% requirement. Question: Which attributes should your sensor be capable of measuring?
  • 8. Deriving Knowledge from Data at Scale
  • 9. Deriving Knowledge from Data at Scale Diversity of Opinion Independence Decentralization Aggregation
  • 10. Deriving Knowledge from Data at Scale
  • 11. Deriving Knowledge from Data at Scale Began October 2006 http://www.wired.com/business/2009/09/how-the-netflix-prize-was-won/, a light read (highly suggested)
  • 12. Deriving Knowledge from Data at Scale
  • 13. Deriving Knowledge from Data at Scale from http://www.research.att.com/~volinsky/netflix/ However, improvement slowed…
  • 14. Deriving Knowledge from Data at Scale The top team posted an 8.5% improvement. Ensemble methods are the best performers…
  • 15. Deriving Knowledge from Data at Scale “Thanks to Paul Harrison's collaboration, a simple mix of our solutions improved our result from 6.31 to 6.75” Rookies
  • 16. Deriving Knowledge from Data at Scale “My approach is to combine the results of many methods (also two-way interactions between them) using linear regression on the test set. The best method in my ensemble is regularized SVD with biases, post processed with kernel ridge regression” Arek Paterek http://rainbow.mimuw.edu.pl/~ap/ap_kdd.pdf
  • 17. Deriving Knowledge from Data at Scale “When the predictions of multiple RBM models and multiple SVD models are linearly combined, we achieve an error rate that is well over 6% better than the score of Netflix’s own system.” U of Toronto http://www.cs.toronto.edu/~rsalakhu/papers/rbmcf.pdf
  • 18. Deriving Knowledge from Data at Scale Gravity home.mit.bme.hu/~gtakacs/download/gravity.pdf
  • 19. Deriving Knowledge from Data at Scale “Our common team blends the result of team Gravity and team Dinosaur Planet.” Might have guessed from the name… When Gravity and Dinosaurs Unite
  • 20. Deriving Knowledge from Data at Scale And, yes, the top team which is from AT&T… “Our final solution (RMSE=0.8712) consists of blending 107 individual results. “ BellKor / KorBell
  • 21. Deriving Knowledge from Data at Scale Clustering Fundamental Concepts: Calculating similarity of objects described by data; Using similarity for prediction; Clustering as similarity-based segmentation. Exemplary Techniques: Searching for similar entities; Nearest neighbor methods; Clustering methods; Distance metrics for calculating similarity.
  • 22. Deriving Knowledge from Data at Scale similar unsupervised learning data exploration
  • 23. Deriving Knowledge from Data at Scale Customers Movies I loved this movie… The movies I watched… You might want to watch this movie… You might like this one too…
  • 24. Deriving Knowledge from Data at Scale
  • 25. Deriving Knowledge from Data at Scale We may want to retrieve similar things directly. For example, IBM wants to find companies that are similar to their best business customers, in order to have sales staff look at them as prospects. Hewlett-Packard maintains many high performance servers for clients; this maintenance is aided by a tool that, given a server configuration, retrieves information on other similarly configured servers. We may want to group similar items together into clusters, for example to see whether our customer base contains groups of similar customers and what these groups have in common. Reasoning from similar cases of course extends beyond business applications; it is natural to fields such as medicine and law. A doctor may reason about a new difficult case by recalling a similar case and its diagnosis. A lawyer often argues cases by citing legal precedents, which are similar historical cases whose dispositions were previously judged and entered into the legal casebook.
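As a concrete illustration of retrieving similar items directly (not part of the deck), here is a minimal Python sketch of nearest-neighbor retrieval over feature vectors; the `most_similar` helper and the toy catalogue are hypothetical:

```python
def most_similar(query, catalogue, k=3):
    """Retrieve the k catalogue entries closest to the query vector
    (squared Euclidean distance), e.g. servers with a similar configuration."""
    return sorted(catalogue,
                  key=lambda item: sum((a - b) ** 2 for a, b in zip(query, item)))[:k]

# Toy example: find the 2 configurations most similar to (0.9, 0.9).
catalogue = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (10.0, 10.0)]
closest = most_similar((0.9, 0.9), catalogue, 2)
```

The same primitive underlies both use cases on the slide: retrieval of similar prospects or configurations, and case-based reasoning from the closest historical example.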
  • 26. Deriving Knowledge from Data at Scale Successful Predictions
  • 27. Deriving Knowledge from Data at Scale
  • 28. Deriving Knowledge from Data at Scale Grouping: objects within a group are similar to one another, and different from (or unrelated to) the objects in other groups. Inter-cluster distances are maximized; intra-cluster distances are minimized.
  • 29. Deriving Knowledge from Data at Scale • Outliers: objects that do not belong to any cluster (outlier analysis; cluster outliers)
  • 30. Deriving Knowledge from Data at Scale data reduction natural clusters useful outlier detection
  • 31. Deriving Knowledge from Data at Scale d(x, y): distance between x and y; d is a metric if: • d(i, j) ≥ 0 non-negativity • d(i, i) = 0 isolation • d(i, j) = d(j, i) symmetry • d(i, j) ≤ d(i, h) + d(h, j) triangular inequality. Attribute types: real, boolean, categorical, ordinal
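A quick spot-check of the four metric properties for Euclidean distance (an illustrative sketch, not from the slides; the sample points are arbitrary):

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Spot-check the four metric axioms on sample points.
p, q, r = (0.0, 0.0), (3.0, 4.0), (6.0, 8.0)
non_negativity = euclidean(p, q) >= 0                                # d(i, j) >= 0
isolation      = euclidean(p, p) == 0                                # d(i, i) == 0
symmetry       = euclidean(p, q) == euclidean(q, p)                  # d(i, j) == d(j, i)
triangle       = euclidean(p, r) <= euclidean(p, q) + euclidean(q, r)  # triangular inequality
```

The same checks apply to any candidate distance over boolean, categorical, or ordinal attributes before using it in a clustering algorithm.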
  • 32. Deriving Knowledge from Data at Scale
  • 33. Deriving Knowledge from Data at Scale
  • 34. Deriving Knowledge from Data at Scale Linkage criteria (figure): Single linkage: minimum distance; Complete linkage: maximum distance; Average linkage: average distance; Ward's method: minimization of within-cluster variance; Centroid method: distance between centres. Cluster structures: non-overlapping vs. overlapping; hierarchical (agglomerative or divisive) vs. non-hierarchical.
  • 35. Deriving Knowledge from Data at Scale
  • 36. Deriving Knowledge from Data at Scale Linkage criteria (figure): Single linkage: minimum distance; Complete linkage: maximum distance; Average linkage: average distance; Ward's method: minimization of within-cluster variance; Centroid method: distance between centres. Cluster structures: non-overlapping vs. overlapping; hierarchical (agglomerative or divisive) vs. non-hierarchical.
  • 37. Deriving Knowledge from Data at Scale centroid
  • 38. Deriving Knowledge from Data at Scale
  • 39. Deriving Knowledge from Data at Scale Figures: original points; optimal clustering; sub-optimal clustering.
  • 40. Deriving Knowledge from Data at Scale Figure: k-means iterations 1–6.
  • 41. Deriving Knowledge from Data at Scale Figure: k-means iterations 1–6.
  • 42. Deriving Knowledge from Data at Scale SSE = Σ_{i=1..K} Σ_{x ∈ C_i} dist²(m_i, x)
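The SSE objective on this slide, the quantity k-means greedily reduces, can be sketched in Python (a toy implementation, not the deck's code; `kmeans`, `sse`, and the helpers are illustrative):

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(pts):
    """Component-wise mean of a non-empty list of points (the centroid)."""
    return tuple(sum(xs) / len(pts) for xs in zip(*pts))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its cluster; repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[i].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

def sse(clusters, centroids):
    """SSE = sum over clusters of squared distances to the cluster centroid."""
    return sum(dist2(p, m) for c, m in zip(clusters, centroids) for p in c)
```

On two well-separated blobs this converges in a few iterations; different random initializations can land in the sub-optimal configurations shown on the surrounding slides.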
  • 43. Deriving Knowledge from Data at Scale • Boolean Values • Categories
  • 44. Deriving Knowledge from Data at Scale
  • 45. Deriving Knowledge from Data at Scale
  • 46. Deriving Knowledge from Data at Scale • Opening Discussion 30 minutes Review Discussion… • Data Science Hands On 60 minutes • Break 5 minutes • Data Science Modelling 30 minutes Model performance evaluation… • Machine Learning Boot Camp ~60 minutes Clustering, k-Means… • Close
  • 47. Deriving Knowledge from Data at Scale
  • 48. Deriving Knowledge from Data at Scale
  • 49. Deriving Knowledge from Data at Scale
  • 50. Deriving Knowledge from Data at Scale
  • 51. Deriving Knowledge from Data at Scale
  • 52. Deriving Knowledge from Data at Scale
  • 53. Deriving Knowledge from Data at Scale
  • 54. Deriving Knowledge from Data at Scale Weka evaluates clusters in one of four modes: • Use training set • Supplied test set • Percentage split • Classes to clusters evaluation
  • 55. Deriving Knowledge from Data at Scale
  • 56. Deriving Knowledge from Data at Scale
  • 57. Deriving Knowledge from Data at Scale
  • 58. Deriving Knowledge from Data at Scale
  • 59. Deriving Knowledge from Data at Scale
  • 60. Deriving Knowledge from Data at Scale
  • 61. Deriving Knowledge from Data at Scale
  • 62. Deriving Knowledge from Data at Scale
  • 63. Deriving Knowledge from Data at Scale
  • 64. Deriving Knowledge from Data at Scale
  • 65. Deriving Knowledge from Data at Scale
  • 66. Deriving Knowledge from Data at Scale Note, some implementations of k-means only allow numerical values, so it may be necessary to convert categorical attributes to binary. Also, normalize attributes that are on very different scales (e.g., age and income).
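The two preprocessing steps this slide mentions can be sketched as follows (illustrative helpers, not Weka's filters; in Weka the closest equivalents are the Normalize and NominalToBinary filters):

```python
def min_max(column):
    """Rescale a numeric column to [0, 1] so large-scale attributes (income)
    do not dominate small-scale ones (age) in the distance computation."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

def one_hot(column):
    """Convert a categorical column into binary indicator columns,
    one per distinct category value."""
    values = sorted(set(column))
    return {v: [1 if x == v else 0 for x in column] for v in values}
```

After both transforms, every attribute contributes on a comparable [0, 1] scale to the Euclidean distances that k-means relies on.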
  • 67. Deriving Knowledge from Data at Scale hands on…
  • 68. Deriving Knowledge from Data at Scale
  • 69. Deriving Knowledge from Data at Scale
  • 70. Deriving Knowledge from Data at Scale Some final takeaways from this model: The power of clustering and Nearest Neighbor becomes obvious when we talk about data sets like Netflix and Amazon. With Amazon's ~100 million users and Netflix's 4 billion streamed movies, their algorithms are very accurate, since there are likely many potential customers in their databases with similar buying/viewing habits to you. Thus, the nearest neighbor to yourself is likely very similar, which creates an accurate and effective model. Conversely, the model breaks down quickly and becomes inaccurate when you have few data points for comparison. In the early stages of an online e-commerce store, for example, when there are only 50 customers, a product recommendation feature will likely not be accurate at all, as the nearest neighbor may in fact be very distant from yourself.
  • 71. Deriving Knowledge from Data at Scale
  • 72. Deriving Knowledge from Data at Scale 10 Minute Break…
  • 73. Deriving Knowledge from Data at Scale
  • 74. Deriving Knowledge from Data at Scale
  • 75. Deriving Knowledge from Data at Scale
  • 76. Deriving Knowledge from Data at Scale
  • 77. Deriving Knowledge from Data at Scale • learners are biased toward the majority class • because they seek to reduce the overall error rate
  • 78. Deriving Knowledge from Data at Scale synthetic samples • controlling their amount and placement
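A minimal sketch of the synthetic-sample idea (SMOTE-style interpolation between a minority point and one of its nearest minority neighbours; the `smote` helper and its parameters are illustrative, not a library API):

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points: each lies at a random position on
    the segment between a minority sample and one of its k nearest
    minority neighbours."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        out.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return out
```

Because every synthetic point is an interpolation, it stays inside the region spanned by the minority class, which controls placement as well as amount.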
  • 79. Deriving Knowledge from Data at Scale
  • 80. Deriving Knowledge from Data at Scale
  • 81. Deriving Knowledge from Data at Scale
  • 82. Deriving Knowledge from Data at Scale oversampling the minority class; random undersampling of the majority class
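The two resampling strategies named on this slide can be sketched as (illustrative helpers; a fixed seed is used only for reproducibility):

```python
import random

def undersample(majority, target, seed=0):
    """Randomly drop majority-class examples until only `target` remain."""
    return random.Random(seed).sample(majority, target)

def oversample(minority, target, seed=0):
    """Randomly duplicate minority-class examples (with replacement)
    until the class has `target` examples."""
    rng = random.Random(seed)
    return minority + [rng.choice(minority) for _ in range(target - len(minority))]
```

Undersampling discards potentially useful majority data; oversampling duplicates minority points exactly, which is what motivates the synthetic-sample approach on the following slides.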
  • 83. Deriving Knowledge from Data at Scale
  • 84. Deriving Knowledge from Data at Scale
  • 85. Deriving Knowledge from Data at Scale Legend: minority sample; synthetic sample; majority sample. … But what if there is a majority sample nearby?
  • 86. Deriving Knowledge from Data at Scale Let’s try it
  • 87. Deriving Knowledge from Data at Scale 10 Minute Break…
  • 88. Deriving Knowledge from Data at Scale
  • 89. Deriving Knowledge from Data at Scale • It depends one more example right than you did
  • 90. Deriving Knowledge from Data at Scale
  • 91. Deriving Knowledge from Data at Scale
  • 92. Deriving Knowledge from Data at Scale
  • 93. Deriving Knowledge from Data at Scale
    No  | Prob | Target | CustID | Age
    1   | 0.97 | Y      | 1746   | …
    2   | 0.95 | N      | 1024   | …
    3   | 0.94 | Y      | 2478   | …
    4   | 0.93 | Y      | 3820   | …
    5   | 0.92 | N      | 4897   | …
    …   | …    | …      | …      | …
    99  | 0.11 | N      | 2734   | …
    100 | 0.06 | N      | 2422   |
    Use a model to assign a score (probability) to each instance; sort instances by decreasing score; expect more targets (hits) near the top of the list. 3 hits in the top 5% of the list: if there are 15 targets overall, then the top 5 holds 3/15 = 20% of the targets.
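The score-and-sort procedure on this slide can be sketched as (the `top_fraction_hits` helper is hypothetical; records are (score, target) pairs):

```python
def top_fraction_hits(scored, frac):
    """Sort records by decreasing score and report the share of all
    targets ('Y') that falls in the top `frac` of the ranked list."""
    ranked = sorted(scored, key=lambda r: -r[0])
    cut = max(1, int(len(ranked) * frac))
    hits = sum(1 for _, target in ranked[:cut] if target == "Y")
    total = sum(1 for _, target in ranked if target == "Y")
    return hits / total
```

With 3 of 15 targets in the top 5% of 100 records, this returns 0.2, matching the slide's 20%.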
  • 94. Deriving Knowledge from Data at Scale 40% of responses for 10% of cost → lift factor = 4; 80% of responses for 40% of cost → lift factor = 2. (Curves: Model vs. Random.)
  • 95. Deriving Knowledge from Data at Scale
  • 96. Deriving Knowledge from Data at Scale
  • 97. Deriving Knowledge from Data at Scale
  • 98. Deriving Knowledge from Data at Scale
  • 99. Deriving Knowledge from Data at Scale Precision and Recall
  • 100. Deriving Knowledge from Data at Scale Once you can compute precision and recall, you are often able to produce precision/recall curves. Suppose that you are attempting to identify spam. You run a learning algorithm to make predictions on a test set. But instead of just taking a “yes/no” answer, you allow your algorithm to produce its confidence. For instance, using a perceptron, you might use the distance from the hyperplane as a confidence measure. You can then sort all of your test emails according to this ranking. You may put the most spam-like emails at the top and the least spam-like emails at the bottom
  • 101. Deriving Knowledge from Data at Scale Once you have this sorted list, you can choose how aggressively you want your spam filter to be by setting a threshold anywhere on this list. One would hope that if you set the threshold very high, you are likely to have high precision (but low recall). If you set the threshold very low, you’ll have high recall (but low precision). By considering every possible place you could put this threshold, you can trace out a curve of precision/recall values, like the one in Figure 4.15. This allows us to ask the question: for some fixed precision, what sort of recall can I get…
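The threshold sweep described above can be sketched in Python (illustrative; labels are 1 for spam, 0 for ham, and records are (confidence, label) pairs):

```python
def pr_curve(scored):
    """Sweep a threshold down the confidence-sorted list; at each cut,
    everything above the cut is predicted spam.  Returns a list of
    (precision, recall) pairs, one per possible threshold position."""
    ranked = sorted(scored, key=lambda r: -r[0])
    total_pos = sum(y for _, y in ranked)
    points, tp = [], 0
    for i, (_, y) in enumerate(ranked, start=1):
        tp += y
        points.append((tp / i, tp / total_pos))
    return points
```

The first point (highest threshold) tends toward high precision and low recall; the last point (lowest threshold) always has recall 1.0, exactly as the text describes.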
  • 102. Deriving Knowledge from Data at Scale Sometimes we want a single number that summarizes the quality of the solution. A popular way to combine precision and recall into a single number is to take their harmonic mean. This is known as the balanced f-measure: F = 2PR / (P + R). The reason to use a harmonic mean rather than an arithmetic mean is that it favors systems that achieve roughly equal precision and recall. In the extreme case where P = R, then F = P = R. But in the imbalanced case, for instance P = 0.1 and R = 0.9, the overall f-measure is a modest 0.18.
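A quick check of the harmonic-mean behaviour described above (plain Python, no libraries):

```python
def f_measure(p, r):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Harmonic vs. arithmetic mean on the imbalanced case from the slide:
print(f_measure(0.1, 0.9))   # ≈ 0.18 — the harmonic mean punishes imbalance
print((0.1 + 0.9) / 2)       # 0.5  — the arithmetic mean hides it
print(f_measure(0.5, 0.5))   # 0.5  — when P = R, F = P = R
```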
  • 103. Deriving Knowledge from Data at Scale Precision and recall depend crucially on which class is treated as the positive class; it is not the case that precision on the flipped task is equal to recall on the original task.
  • 113. Deriving Knowledge from Data at Scale The blue ROC curve dominates both red and green; neither red nor green dominates the other. You could get the best of the red and green curves by making a hybrid classifier that switches between the two strategies at their cross-over points.
  • 114. Deriving Knowledge from Data at Scale Suppose you have a test for Alzheimer's whose false positive rate can be varied from 5% to 25% while the false negative rate varies from 25% down to 5% (assume both vary linearly). You try the test on a population of 10,000 people, 1% of whom actually are Alzheimer's positive:
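The slide's numbers can be worked through directly. A sketch assuming the linear trade-off stated above (the `counts` helper and the parameter `t` are illustrative names, not from the slides):

```python
# 10,000 people, 1% truly positive: 100 positives, 9,900 negatives.
POS, NEG = 100, 9900

def counts(t):
    """t in [0, 1] slides the operating point along the linear trade-off:
    FPR goes 5% -> 25% while FNR goes 25% -> 5%."""
    fpr = 0.05 + 0.20 * t
    fnr = 0.25 - 0.20 * t
    fp = fpr * NEG
    fn = fnr * POS
    tp = POS - fn
    tn = NEG - fp
    return tp, fp, fn, tn

for t in (0.0, 0.5, 1.0):
    tp, fp, fn, tn = counts(t)
    print(f"t={t:.1f}: TP={tp:.0f} FP={fp:.0f} FN={fn:.0f} "
          f"precision={tp / (tp + fp):.3f}")
```

With a 1% base rate, even the low-FPR end of the trade-off (FPR = 5%) yields 495 false positives against only 75 true positives, so precision stays low — the class-imbalance point of this part of the lecture.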
  • 115. Deriving Knowledge from Data at Scale Area under the ROC curve = AUC • The area under the ROC curve (AUC) is a measure of model performance: 0.5 (random model) < AUC < 1 (perfect model) • The larger the AUC, the better the model
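AUC can be computed without plotting the curve at all: it equals the probability that a randomly chosen positive outranks a randomly chosen negative (the Mann–Whitney U statistic, with ties counted as one half). A minimal sketch; the `auc` helper is an illustrative name:

```python
def auc(scores, labels):
    """AUC as P(random positive outranks random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([0.9, 0.8, 0.7, 0.3], [1, 1, 0, 0]))  # 1.0 — perfect ranking
print(auc([0.9, 0.3, 0.7, 0.8], [1, 1, 0, 0]))  # 0.5 — no better than random
```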
  • 118. Deriving Knowledge from Data at Scale to impact… 1. Build our predictive model in the WEKA Explorer; 2. Use our model to score (predict) which new customers to target in our upcoming advertising campaign: • ARFF file manipulation (hacking), an all too common pita… • Excel manipulation to join the model output with our customer list; 3. Compute the lift chart to assess the business impact of our predictive model on the advertising campaign: • How are lift charts built? Of all the charts and performance measures from a model, this one is 'on you' to construct; • Where is the business 'bang for the buck'?
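A sketch of the "on you" lift-chart construction, assuming hypothetical model scores already joined with the customer list (the `lift_chart` helper and the data are illustrative, not from the assignment):

```python
# Cumulative lift: response rate in the top-scored fraction of customers,
# divided by the overall response rate.
def lift_chart(scores, labels, n_bins=5):
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda p: -p[0])]
    base_rate = sum(ranked) / len(ranked)
    lifts = []
    for b in range(1, n_bins + 1):
        k = round(len(ranked) * b / n_bins)       # size of top b/n_bins slice
        lifts.append((sum(ranked[:k]) / k) / base_rate)
    return lifts

scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1,    1,   0,   1,   0,   0,   0,   0,   0,   0]
print(lift_chart(scores, labels))
```

The "bang for the buck" reads straight off the result: a lift of 3 in the top bin means targeting that slice of customers yields three times the response rate of mailing everyone; the final bin is always lift 1.0.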
  • 122. Deriving Knowledge from Data at Scale Ensembles of Decision Trees: bagging (sampling the training set with replacement) and boosting…
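The bagging idea mentioned above — sampling the training set with replacement — can be sketched as (illustrative helper names):

```python
import random

# Bagging: each tree trains on a bootstrap sample (drawn with replacement)
# of the full training set; tree predictions are then aggregated.
def bootstrap_sample(data, rng):
    """Same size as the original, drawn with replacement."""
    return [rng.choice(data) for _ in data]

rng = random.Random(0)                    # seeded for reproducibility
training_set = list(range(10))
samples = [bootstrap_sample(training_set, rng) for _ in range(3)]
for s in samples:
    print(sorted(s))   # duplicates and omissions are expected
```

Because each sample omits roughly a third of the original points and duplicates others, the trees trained on them come out slightly different, which is exactly what the forest relies on.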
  • 123. Deriving Knowledge from Data at Scale Decision Trees and Decision Forests. A forest is an ensemble of trees; the trees are all slightly different from one another. [Figure: a general tree structure with a root node, internal (split) nodes, and terminal (leaf) nodes; a decision tree whose split nodes ask questions such as "Is the top part blue?", "Is the bottom part green?", "Is the bottom part blue?"]
  • 124. Deriving Knowledge from Data at Scale Decision Forest Model: the randomness model. 1) Bagging (randomizing the training set): each tree t is trained on a randomly sampled subset of the full training set.
  • 125. Deriving Knowledge from Data at Scale Decision Forest Model: the randomness model. 2) Randomized node optimization (RNO): each node is optimized over only a randomly sampled subset of the full set of possible node test parameters. A randomness control parameter governs the effect: at one extreme there is no randomness and maximum tree correlation; at the other, maximum randomness and minimum tree correlation. A small value gives little tree correlation; a large value gives large tree correlation.
  • 126. Deriving Knowledge from Data at Scale Decision Forest Model: training and information gain. Node training (for categorical, non-parametric distributions) chooses the split that maximizes information gain: the Shannon entropy of the node before the split minus the size-weighted entropy of the children after the split, IG = H(S) − Σ_i (|S_i| / |S|) H(S_i).
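The entropy and information-gain computation used for node training can be sketched as (illustrative helper names):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a categorical label distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, splits):
    """H(parent) minus the size-weighted entropy of the child splits."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in splits)

parent = ['a'] * 4 + ['b'] * 4
print(information_gain(parent, [['a'] * 4, ['b'] * 4]))      # pure split: gain 1.0
print(information_gain(parent, [['a', 'a', 'b', 'b']] * 2))  # useless split: gain 0.0
```

Node training simply evaluates this gain for each candidate split (over the randomly sampled test parameters) and keeps the best one.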
  • 127. Deriving Knowledge from Data at Scale Why we prune…
  • 128. Deriving Knowledge from Data at Scale Classification Forest. Output is categorical; the input is a data point in feature space. Classification tree training: the objective function for training node j is the information gain, computed from the entropy of a discrete distribution. The node weak learner computes a feature response on the input data point; the predictor model at each leaf is the class posterior — the model specialization for classification.
  • 129. Deriving Knowledge from Data at Scale Classification Forest: the weak learner model. The node weak learner, with its node test parameters, splits the data at node j. Examples of weak learners (feature responses for a 2D example): axis-aligned (threshold on a single feature), oriented line (a generic line in homogeneous coordinates), and conic section (a matrix representing a conic). In general the weak learner may select only a very small subset of features.
  • 130. Deriving Knowledge from Data at Scale Classification Forest: the prediction model. What do we do at the leaf? The prediction model is probabilistic: each leaf stores a class posterior.
  • 131. Deriving Knowledge from Data at Scale Classification Forest: the ensemble model. The forest output probability is the average of the individual tree posteriors over trees t = 1, 2, 3, …, T.
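The ensemble model above — averaging per-tree leaf posteriors, p(c|v) = (1/T) Σ_t p_t(c|v) — can be sketched as (illustrative names; the posteriors are hypothetical):

```python
# Forest posterior as the average of per-tree leaf posteriors.
def forest_posterior(tree_posteriors):
    T = len(tree_posteriors)
    classes = tree_posteriors[0].keys()
    return {c: sum(p[c] for p in tree_posteriors) / T for c in classes}

trees = [
    {'red': 0.8, 'blue': 0.2},   # posterior at the leaf reached in tree 1
    {'red': 0.6, 'blue': 0.4},   # ... tree 2
    {'red': 0.7, 'blue': 0.3},   # ... tree 3
]
print(forest_posterior(trees))   # 'red' averages to 0.7
```

Averaging smooths out the over-confident posteriors of individual trees, which is where the forest's "quality of confidence" on later slides comes from.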
  • 132. Deriving Knowledge from Data at Scale Classification Forest: effect of the weak learner model. Training and testing different trees in the forest (videos). Parameters: T=200, D=2, weak learner = axis-aligned, leaf model = probabilistic. Three concepts to keep in mind: "accuracy of prediction", "quality of confidence", "generalization".
  • 133. Deriving Knowledge from Data at Scale Classification Forest: effect of the weak learner model. Training and testing different trees in the forest (videos). Parameters: T=200, D=2, weak learner = linear, leaf model = probabilistic.
  • 134. Deriving Knowledge from Data at Scale Classification Forest: effect of the weak learner model. Training and testing different trees in the forest (videos). Parameters: T=200, D=2, weak learner = conic, leaf model = probabilistic.
  • 135. Deriving Knowledge from Data at Scale Classification Forest: with >2 classes. Training and testing different trees in the forest (videos). Parameters: T=200, D=3, weak learner = conic, leaf model = probabilistic.
  • 136. Deriving Knowledge from Data at Scale Classification Forest: effect of tree depth. As the max tree depth D grows, behaviour moves from underfitting to overfitting. Training points: 4-class mixed; predictor model = probabilistic (videos): T=200 with D=3, D=6, and D=15, all with the conic weak learner.
  • 137. Deriving Knowledge from Data at Scale Classification Forest: analysing generalization. Parameters: T=200, D=13, weak learner = conic, predictor = probabilistic (videos). Testing posteriors on training points: 4-class spiral; 4-class spiral with large gaps; 4-class spiral with larger gaps.
  • 138. Deriving Knowledge from Data at Scale Q
  • 139. Deriving Knowledge from Data at Scale Feature extraction and selection are the most important but underrated steps of machine learning. Better features are better than better algorithms…
  • 140. Deriving Knowledge from Data at Scale That’s all for tonight….