Deriving Knowledge from Data at Scale
Feature extraction and selection are the most important but underrated steps
of machine learning. Better features are better than better algorithms…
Deriving Knowledge from Data at Scale
Lecture Objectives
homework
There is an order or workflow
that takes place here; don’t lose
sight of the forest for the trees…
Deriving Knowledge from Data at Scale
Review…
Deriving Knowledge from Data at Scale
• Cluster 0 – Females with an average age of 37 who live in the inner city and hold both a
savings account and a current account. They are unmarried and have neither a mortgage nor a
PEP. The average monthly income is 23,300.
• Cluster 1 – Females with an average age of 44 who live in a rural area and hold both a
savings account and a current account. They are married and have neither a mortgage nor a
PEP. The average monthly income is 27,772.
• Cluster 2 – Females with an average age of 48 who live in the inner city and hold a
current account but no savings account. They are unmarried and have no mortgage but do
have a PEP. The average monthly income is 27,668.
• Cluster 3 – Females with an average age of 39 who live in a town and hold both a
savings account and a current account. They are married and have neither a mortgage nor a
PEP. The average monthly income is 24,047.
• Cluster 4 – Males with an average age of 39 who live in the inner city and hold a
current account but no savings account. They are married and have both a mortgage and a
PEP. The average monthly income is 26,359.
• Cluster 5 – Males with an average age of 47 who live in the inner city and hold both a
savings account and a current account. They are unmarried and have no mortgage but do
have a PEP. The average monthly income is 35,419.
Deriving Knowledge from Data at Scale
Classifiers → Lazy → IBk
Deriving Knowledge from Data at Scale
No   Prob  Target  CustID  Age
1    0.97  Y       1746    …
2    0.95  N       1024    …
3    0.94  Y       2478    …
4    0.93  Y       3820    …
5    0.92  N       4897    …
…    …     …       …
99   0.11  N       2734    …
100  0.06  N       2422
Use a model to assign score (probability) to each instance
Sort instances by decreasing score
Expect more targets (hits) near the top of the list
3 hits in the top 5% of the list.
If there are 15 targets overall, then the top 5
has 3/15 = 20% of the targets.
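The score, sort, and count procedure above can be sketched in Python. This is a minimal illustration, not the lecture's tooling: the function name `hits_in_top` is hypothetical, and only the rows visible in the table are used (rows 6 to 98 are elided on the slide and are not invented here). The total of 15 targets is the figure stated on the slide.

```python
def hits_in_top(scored, k):
    """Count targets (label 'Y') among the k highest-scoring instances."""
    # Sort instances by decreasing score, as the slide describes.
    ranked = sorted(scored, key=lambda row: row[0], reverse=True)
    return sum(1 for _, label in ranked[:k] if label == "Y")

# (score, target) pairs visible on the slide, deliberately shuffled
# to show that the function does the sorting itself.
visible_rows = [(0.92, "N"), (0.06, "N"), (0.97, "Y"), (0.11, "N"),
                (0.95, "N"), (0.94, "Y"), (0.93, "Y")]

total_targets = 15                      # stated on the slide
hits = hits_in_top(visible_rows, 5)     # targets found in the top 5 rows
share_of_targets = hits / total_targets
print(hits, share_of_targets)
```

The share of targets captured in the top of the list is the quantity that a lift chart plots against the fraction of the list contacted.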
Deriving Knowledge from Data at Scale
[Lift chart comparing the Model curve with the Random baseline]
40% of responses for 10% of cost: lift factor = 4
80% of responses for 40% of cost: lift factor = 2
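The lift factors quoted on this slide follow from a simple ratio: the share of responses captured divided by the share of cost spent. A small sketch, assuming that reading of the chart (the function name `lift_factor` is hypothetical):

```python
def lift_factor(share_of_responses, share_of_cost):
    """Lift = fraction of responses captured / fraction of cost spent."""
    if share_of_cost <= 0:
        raise ValueError("share_of_cost must be positive")
    return share_of_responses / share_of_cost

# The two points called out on the slide.
lift_at_10 = lift_factor(0.40, 0.10)
lift_at_40 = lift_factor(0.80, 0.40)
# A random ordering captures responses in proportion to cost.
lift_random = lift_factor(0.10, 0.10)
print(lift_at_10, lift_at_40, lift_random)
```

A lift of 1 corresponds to the Random line, so the gap between the two curves is where the model pays for itself.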
Deriving Knowledge from Data at Scale
to impact…
1. Build our predictive model in WEKA Explorer;
2. Use our model to score (predict) which new customers to
target in our upcoming advertising campaign;
• ARFF file manipulation (hacking), an all-too-common pain…
• Excel manipulation to join model output with our customer list
3. Compute the lift chart to assess the business impact of our
predictive model on the advertising campaign
• How are lift charts built? Of all the charts and performance
measures from a model, this one is ‘on you’ to construct;
• Where is the business ‘bang for the buck’?
Deriving Knowledge from Data at Scale
You can’t turn data lead into
modeling gold – we’re data
scientists, not data alchemists…
Deriving Knowledge from Data at Scale
Motivation: Real world examples
Example (1)
Lesson: Correct data transformation is important!
Deriving Knowledge from Data at Scale
Motivation: Real world examples
Example (2): KDD Cup 2001
Lesson: A model that uses lots of features can turn out to be
very sub-optimal, however well it is designed!
Deriving Knowledge from Data at Scale
Motivation: Real world examples
Example (3)
Lesson: Feature selection can be crucial even when the
number of features is small!
Deriving Knowledge from Data at Scale
Motivation: Real world examples
Example (4)
Lesson: Variations of the same ML method can give vastly
different performances!
Deriving Knowledge from Data at Scale
Predictive modeling competitions
Deriving Knowledge from Data at Scale
Global competitions
Predicting HIV viral load
[Chart: State of the art 70%; after 1½ weeks 70.8%; at competition close 77%]
Improved by 10%
Deriving Knowledge from Data at Scale
Mismatch between those with data and
those with the skills to analyse it
Crowdsourcing
Deriving Knowledge from Data at Scale
Tourism Forecasting Competition
[Chart: Forecast Error (MASE) for the existing model and for entries at Aug 9, 2 weeks later, 1 month later, and competition end]
Deriving Knowledge from Data at Scale
Users apply different techniques
• neural networks
• logistic regression
• support vector machine
• decision trees
• ensemble methods
• AdaBoost
• Bayesian networks
• genetic algorithms
• random forest
• Monte Carlo methods
• principal component analysis
• Kalman filter
• evolutionary fuzzy modeling
Deriving Knowledge from Data at Scale
VicRoads has an algorithm they use to forecast travel time on Melbourne freeways (taking into
account time, weather, accidents, etc). Their current model is inaccurate and somewhat
useless. They want to do better (or at least find out about whether it’s possible to do better).
Deriving Knowledge from Data at Scale
1. Upload
2. Submit
3. Evaluate & Exchange
Deriving Knowledge from Data at Scale
Use the wizard to post a competition
Deriving Knowledge from Data at Scale
Participants make their entries
Deriving Knowledge from Data at Scale
Competitions are judged based on predictive accuracy
Deriving Knowledge from Data at Scale
Competition Mechanics
Competitions are judged on objective criteria
Deriving Knowledge from Data at Scale
Kaggle
How They Won It…
Deriving Knowledge from Data at Scale
Three Files
ford_train
• 510 Trials, ~1,200 observations each spaced by 0.1 sec -> 604,330 rows
ford_test
• 100 Trials,~1,200 observations/trial, 120,841 rows
example_submission.csv
Deriving Knowledge from Data at Scale
Junpei Komiyama (#4)
Deriving Knowledge from Data at Scale
Mick Wagner (#2)
Deriving Knowledge from Data at Scale
Inference (#1)
Deriving Knowledge from Data at Scale
VicRoads has an algorithm they use to forecast travel time on Melbourne freeways (taking into
account time, weather, accidents etc). Their current model is inaccurate and somewhat useless.
They want to do better (or at least find out about whether it’s possible to do better).
Deriving Knowledge from Data at Scale
François GUILLEM (#14)
Deriving Knowledge from Data at Scale
#1 used Random Forests
Deriving Knowledge from Data at Scale
Homework Week 6
Monday Sept. 21st
Upload to site…
http://blog.kaggle.com/category/dojo/
The content is 10 pages of interviews on how the team(s) built their models; some have multiple interviews.
You will review at least 10 interviews; bounce around, do not go sequentially.
For each, note: 1) what model(s) they used, 2) insights they had that influenced modeling, 3) what feature creation and
selection they did, 4) other observations. I will consolidate all of these and upload them as a shared document on our site.
Deriving Knowledge from Data at Scale
5 Minute Break…
Deriving Knowledge from Data at Scale
Course Project
Deriving Knowledge from Data at Scale
https://www.kaggle.com/c/springleaf-marketing-response
Determine whether or not to send a direct mail piece to a customer
Deriving Knowledge from Data at Scale
The Data
Deriving Knowledge from Data at Scale
The Rules
Deriving Knowledge from Data at Scale
what is the data telling you
Deriving Knowledge from Data at Scale
Data Wrangling
Deriving Knowledge from Data at Scale
[Workflow diagram: Data Acquisition → Data Exploration → Pre-processing → Feature and Target construction → Train/Test split → Feature selection → Model training → Model scoring (two boxes) → Evaluation (two boxes) → Compare metrics]
Deriving Knowledge from Data at Scale
• Data preparation is by far the most time-consuming step
[Bar chart: relative effort [%], axis 0 to 70, for each KDDM step (Understanding of Domain, Understanding of Data, Preparation of Data, Data Mining, Evaluation of Results, Deployment of Results); estimates from Cabena et al., Shearer, and Cios and Kurgan]
Deriving Knowledge from Data at Scale
Out of Class Reading, highly recommended
Deriving Knowledge from Data at Scale
1. Do you have domain knowledge?
2. Are your features commensurate?
3. Do you suspect interdependence of features?
4. Do you need to prune the input variables?
5. Do you need to assess features individually?
6. Do you need a predictor?
7. Do you suspect your data is “dirty”?
8. Do you know what to try first?
9. Do you have new ideas, time, computational resources, and enough examples?
10. Do you want a stable solution?
Deriving Knowledge from Data at Scale
[Figure sequence, recoverable values only: two groups of 15 and 15 with P = 0.5 each; groups of 7 and 13 with P = 0.35 and P = 0.65; groups of 10 and 10; Train and Test splits along a Time axis, labeled Horizontal and Vertical]
Deriving Knowledge from Data at Scale
Data Characterization…
Deriving Knowledge from Data at Scale
1. Unique values
2. Most frequent values
3. Highest and lowest values
4. Location and dispersion – Gini, statistical tests for dispersion
5. Quartiles
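The characterization checklist above maps onto a few standard-library calls. The sketch below is one possible reading of the list, not the lecture's code: the function name `characterize` and the choice of mean and standard deviation for "location and dispersion" are assumptions. The sample values are the temperature readings used on the later binning slides.

```python
from collections import Counter
import statistics

def characterize(values):
    """Summarize one numeric attribute along the lines of the checklist."""
    counts = Counter(values)
    q1, q2, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "unique": len(counts),                         # 1. unique values
        "most_frequent": counts.most_common(1)[0][0],  # 2. most frequent value
        "min": min(values),                            # 3. lowest value
        "max": max(values),                            # 3. highest value
        "mean": statistics.mean(values),               # 4. location
        "stdev": statistics.stdev(values),             # 4. dispersion
        "quartiles": (q1, q2, q3),                     # 5. quartiles
    }

temps = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
summary = characterize(temps)
for name, value in summary.items():
    print(name, value)
```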
Deriving Knowledge from Data at Scale
1. Missing values
2. Outliers
3. Coding
4. Constraints
Deriving Knowledge from Data at Scale
Missing values – UCI machine learning repository: 31 of 68 data sets are
reported to have missing values. “Missing” can mean many things…
• MAR, "Missing at Random": usually the best case, usually not true
• Non-randomly missing
• Presumed normal, so not measured
• Causally missing: the attribute value is missing because of other attribute values (or because of
the outcome value!)
Deriving Knowledge from Data at Scale
Outliers – may indicate ‘bad data’ or may represent
something scientifically interesting in the data…
Simple working definition: an outlier is an element of a data sequence
S that is inconsistent with expectations, based on the majority of other
elements of S.
Sources of outliers
• Measurement errors
• Other uninteresting anomalous data
• Surprising observations that may be important
Deriving Knowledge from Data at Scale
Sources of outliers
• An insurance company sees a niche of sports car enthusiasts: married boomers
with kids and a second family car. They are low risk, so it offers a lower rate to attract them.
A simple case where the outlier carries meaning for modeling…
Deriving Knowledge from Data at Scale
Outliers can distort the regression results. When an outlier is
included in the analysis, it pulls the regression line towards
itself. This can result in a solution that is more accurate for the
outlier, but less accurate for all the other cases in the data set.
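The pull an outlier exerts on a regression line can be seen with a tiny least-squares fit. The data below are made up purely for illustration (five points on y = 2x plus one hypothetical outlier), and `fit_line` is a hand-written ordinary least-squares helper, not a library call.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Illustrative data: five points lying exactly on y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
clean_slope, clean_intercept = fit_line(xs, ys)

# Add one hypothetical outlier at (6, 40) and refit.
out_slope, out_intercept = fit_line(xs + [6], ys + [40])

# Error on an ordinary point (x = 1, y = 2) under each fit.
clean_error = abs(clean_slope * 1 + clean_intercept - 2)
out_error = abs(out_slope * 1 + out_intercept - 2)
print(clean_slope, out_slope, clean_error, out_error)
```

The refitted line is steeper and sits closer to the outlier, while its error on the ordinary points grows, which is the distortion described above.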
Deriving Knowledge from Data at Scale
Identify outliers
• Question origin, domain knowledge invaluable
• Dispersion – "spread" of a data set, departure from central tendency, use a box plot…
Deal with outliers
• Winsorize – Set all outliers to a specified percentile of the data. Not
equivalent to trimming, which simply excludes data. In a Winsorized
estimator, extreme values are instead replaced by certain percentiles (the
trimmed minimum and maximum). Same as clipping in signal processing.
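A minimal sketch of Winsorizing next to trimming, assuming a simple rounded-rank rule for locating the percentiles (other percentile conventions give slightly different cut points). The function names and the sample data are hypothetical.

```python
def percentile_bounds(values, lower_pct, upper_pct):
    """Return the values at the given percentiles using a rounded-rank rule."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[round(lower_pct * last)], ordered[round(upper_pct * last)]

def winsorize(values, lower_pct=0.05, upper_pct=0.95):
    """Replace extreme values with the percentile values (clipping)."""
    lo, hi = percentile_bounds(values, lower_pct, upper_pct)
    return [min(max(v, lo), hi) for v in values]

def trim(values, lower_pct=0.05, upper_pct=0.95):
    """Drop values outside the percentile bounds instead of replacing them."""
    lo, hi = percentile_bounds(values, lower_pct, upper_pct)
    return [v for v in values if lo <= v <= hi]

data = [-500, 2, 3, 4, 5, 6, 7, 8, 9, 1000]   # made-up sample with two extremes
winsorized = winsorize(data, 0.10, 0.90)
trimmed = trim(data, 0.10, 0.90)
print(winsorized)
print(trimmed)
```

Winsorizing keeps the sample size unchanged, whereas trimming shortens the data set; that is the distinction the slide draws.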
Deriving Knowledge from Data at Scale
Identify outliers
• Question origin, domain knowledge invaluable
• Dispersion – "spread" of a data set, departure from central tendency, use a box plot…
Deal with outliers
• Include – Robust statistics, a convenient way to summarize results when
they include a small proportion of outliers. A hot topic for research, see
NIPS 2010 Workshop, Robust Statistical learning (robustml).
Deriving Knowledge from Data at Scale
Constraints
• Entity integrity
• Referential integrity
• Type checking
• Format
• Bounds checking
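The five constraint types can each be expressed as a small check. The sketch below is illustrative only: the record layout (`cust_id`, `age`, `zip`), the five-digit format rule, and the 0 to 120 age bound are all hypothetical choices, not taken from the lecture.

```python
import re

def check_constraints(customers, orders):
    """Return a list of (constraint, cust_id) violations."""
    problems = []
    seen_ids = set()
    for c in customers:
        cid = c.get("cust_id")
        # Entity integrity: the key must be present and unique.
        if cid is None or cid in seen_ids:
            problems.append(("entity", cid))
        seen_ids.add(cid)
        # Type checking: age must be an integer.
        if not isinstance(c.get("age"), int):
            problems.append(("type", cid))
        # Bounds checking: only meaningful once the type is right.
        elif not 0 <= c["age"] <= 120:
            problems.append(("bounds", cid))
        # Format: zip must be exactly five digits (hypothetical rule).
        if not re.fullmatch(r"\d{5}", str(c.get("zip", ""))):
            problems.append(("format", cid))
    for o in orders:
        # Referential integrity: every order must point at a known customer.
        if o["cust_id"] not in seen_ids:
            problems.append(("referential", o["cust_id"]))
    return problems

customers = [
    {"cust_id": 1, "age": 37, "zip": "98101"},
    {"cust_id": 1, "age": 44, "zip": "98102"},     # duplicate key
    {"cust_id": 2, "age": "n/a", "zip": "98103"},  # wrong type
    {"cust_id": 3, "age": 250, "zip": "9810"},     # out of bounds, bad format
]
orders = [{"cust_id": 3}, {"cust_id": 9}]          # 9 has no customer record
problems = check_constraints(customers, orders)
print(problems)
```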
Deriving Knowledge from Data at Scale
• weka.filters.unsupervised.instance.RemoveMisclassified
• weka.filters.unsupervised.instance.RemovePercentage
• weka.filters.unsupervised.instance.RemoveRange
• weka.filters.unsupervised.instance.RemoveWithValues
• weka.filters.unsupervised.instance.Resample
Deriving Knowledge from Data at Scale
5 Minute Break…
Deriving Knowledge from Data at Scale
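Simple Definition
Feature selection problem: choose a subset F' ⊆ F of the original features.
$$F = \{f_1, \ldots, f_i, \ldots, f_n\} \xrightarrow{\text{f. selection}} F' = \{f_{i_1}, \ldots, f_{i_j}, \ldots, f_{i_m}\}$$
Feature extraction: build new features as functions of the original ones.
$$F = \{f_1, \ldots, f_i, \ldots, f_n\} \xrightarrow{\text{f. extraction}} F' = \{g_1(f_1, \ldots, f_n), \ldots, g_j(f_1, \ldots, f_n), \ldots, g_m(f_1, \ldots, f_n)\}$$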
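The difference between the two definitions is easy to see on a single row of data. This is a toy sketch: the function names, the example row, and the two extraction functions (a sum and a product) are hypothetical stand-ins for the indices i_1..i_m and the functions g_1..g_m.

```python
def select_features(row, keep_indices):
    """Feature selection: keep a subset of the original features unchanged."""
    return [row[i] for i in keep_indices]

def extract_features(row, functions):
    """Feature extraction: each new feature is a function of all originals."""
    return [g(row) for g in functions]

row = [3.0, 4.0, 10.0]                      # f_1, f_2, f_3 (made-up values)

selected = select_features(row, [0, 2])     # keeps f_1 and f_3 as they are
extracted = extract_features(row, [
    sum,                                    # g_1: sum of all features
    lambda r: r[0] * r[1],                  # g_2: product of f_1 and f_2
])
print(selected, extracted)
```

Selected features are always members of the original set; extracted features generally are not.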
Deriving Knowledge from Data at Scale
3 types of methods
• Filter Methods
• Wrapper Methods
• Embedded Methods (e.g., decision trees, random forests)
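As an illustration of the first category, a filter method scores each feature on its own, without training a model, and keeps the top-ranked ones. The sketch below uses absolute Pearson correlation with the target as the score; that scoring choice, the function names, and the data are assumptions for illustration, not the method prescribed in the lecture.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0

def filter_rank(features, target):
    """Rank feature names by |correlation with the target|, best first."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Made-up columns: one informative, one weakly related, one constant.
features = {
    "constant": [7, 7, 7, 7, 7, 7],
    "noise": [3, 1, 4, 1, 5, 9],
    "signal": [1, 2, 3, 4, 5, 6],
}
target = [0, 0, 0, 1, 1, 1]
ranking = filter_rank(features, target)
print(ranking)
```

Wrapper methods would instead retrain a model for each candidate subset, and embedded methods let the learner pick features while it trains.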
Deriving Knowledge from Data at Scale
Most learning methods implicitly do feature selection:
• Decision trees: use info gain or gain ratio to decide which attributes to use as
tests. Many features don’t get used.
• Neural nets: backprop learns strong connections to some inputs and near-zero
connections to other inputs.
• kNN, MBL (any similarity-based learning): the weights in weighted Euclidean
distance determine how important each feature is. Weights near zero mean the
feature is not used.
• SVMs: the maximum-margin hyperplane may focus on important features and
ignore irrelevant ones.
So why do we need feature selection?
Data Integration
Deriving Knowledge from Data at Scale
Curse of Dimensionality
exponentially
In many cases the information lost by
discarding variables is made up for by a
more accurate mapping/sampling in the
lower-dimensional space !
Deriving Knowledge from Data at Scale
Feature Selection and Engineering
Optimality?
This deserves a deeper treatment, which we will cover next week with
hands-on exercises in class…
Deriving Knowledge from Data at Scale
Coding – shape and enrich…
Numerical data
• Binning – a mapping to discrete categories;
• Recenter – shift every value by c; the max, min, mean, and median shift by c, while the range and
standard deviation do not change;
• Rescale – multiply every value by d; all of these measures change;
• Standardize (standard normal) – recenter so the mean is 0, then divide by the standard deviation
Character data
• Lower case
• Spellcheck
• Data extraction (e.g. regular expressions)
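The claims about which summary measures move under each numeric transformation can be checked directly. A minimal sketch with made-up data; the function names are hypothetical and the population standard deviation is used for simplicity.

```python
import statistics

def recenter(values, c):
    """Shift every value by c."""
    return [v + c for v in values]

def rescale(values, d):
    """Multiply every value by d."""
    return [v * d for v in values]

def standardize(values):
    """Recenter to mean 0, then divide by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]          # made-up values: mean 5, SD 2

shifted = recenter(data, 10)             # mean moves, spread does not
scaled = rescale(data, 3)                # mean and spread both change
z_scores = standardize(data)             # mean 0, SD 1

for name, xs in [("data", data), ("shifted", shifted),
                 ("scaled", scaled), ("z_scores", z_scores)]:
    print(name, statistics.mean(xs), statistics.pstdev(xs), max(xs) - min(xs))
```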
Deriving Knowledge from Data at Scale
feature   red  blue  green
red       1    0     0
blue      0    1     0
green     0    0     1
red       1    0     0
red       1    0     0
green     0    0     1
blue      0    1     0
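The table above is a one-hot (binary indicator) encoding of a nominal feature. It can be reproduced with a short function; `one_hot` is a hypothetical helper name, and the column order red, blue, green follows the slide.

```python
def one_hot(values, categories):
    """Encode each value as a row of 0/1 indicators, one per category."""
    return [[1 if v == c else 0 for c in categories] for v in values]

# The feature column and category order shown on the slide.
colors = ["red", "blue", "green", "red", "red", "green", "blue"]
categories = ["red", "blue", "green"]

encoded = one_hot(colors, categories)
for color, row in zip(colors, encoded):
    print(color, row)
```

Every row contains exactly one 1, which is why the encoding adds no ordering between the categories.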
Deriving Knowledge from Data at Scale
Outlook   Temperature  Humidity  Windy  Play
sunny     85           85        false  no
sunny     80           90        true   no
overcast  83           78        false  yes
rain      70           96        false  yes
rain      68           80        false  yes
rain      65           70        true   no
overcast  64           65        true   yes
sunny     72           95        false  no
sunny     69           70        false  yes
rain      75           80        false  yes
sunny     75           70        true   yes
overcast  72           90        true   yes
overcast  81           75        false  yes
rain      71           80        true   no
Attributes:
Outlook (overcast, rain, sunny)
Temperature real
Humidity real
Windy (true, false)
Play (yes, no)
Standard Spreadsheet Format
Outlook=overcast  Outlook=rain  Outlook=sunny  Temp  Humidity  Windy=TRUE  Windy=FALSE  Play=yes  Play=no
0                 0             1              85    85        0           1            0         1
0                 0             1              80    90        1           0            0         1
1                 0             0              83    78        0           1            1         0
0                 1             0              70    96        0           1            1         0
0                 1             0              68    80        0           1            1         0
0                 1             0              65    70        1           0            0         1
1                 0             0              64    65        1           0            1         0
…                 …             …              …     …         …           …            …         …
Deriving Knowledge from Data at Scale
Household income, from $10,000 to $200,000, discretized as:
very low | low | average | high | very high
Deriving Knowledge from Data at Scale
Fewer features, more discrimination ability
concept hierarchies
Deriving Knowledge from Data at Scale
• Equal-width (distance) partitioning: uniform grid
• Equal-depth (frequency) partitioning
• Class label based partitioning
Deriving Knowledge from Data at Scale
into the user-specified
Deriving Knowledge from Data at Scale
Temperature values:
64 65 68 69 70 71 72 72 75 75 80 81 83 85
Bin    [64,67)  [67,70)  [70,73)  [73,76)  [76,79)  [79,82)  [82,85]
Count  2        2        4        2        0        2        2
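The equal-width counts for the temperature values can be reproduced with a few lines of Python. This is a sketch under the assumption that the seven bins evenly divide the range from the minimum to the maximum, with the maximum value falling in the last (closed) bin, as the interval labels on the slide suggest. The function name is hypothetical.

```python
def equal_width_counts(values, n_bins):
    """Count values per bin when [min, max] is split into n_bins equal widths."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would index one past the end, so clamp it
        # into the last bin, which is closed on the right.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

temps = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
width_counts = equal_width_counts(temps, 7)
print(width_counts)
```

Note the empty bin [76,79): equal-width partitioning does not adapt to where the data actually fall.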
Deriving Knowledge from Data at Scale
[Histogram: salary in a corporation, equal-width bins from [0 – 200,000) … to [1,800,000 – 2,000,000]; a count of 1 is marked]
Deriving Knowledge from Data at Scale
user-specified nFi number of
intervals
Deriving Knowledge from Data at Scale
Temperature values:
64 65 68 69 70 71 72 72 75 75 80 81 83 85
Bin    [64 .. 69]  [70 .. 72]  [73 .. 81]  [83 .. 85]
Count  4           4           4           2
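The equal-depth grouping shown above can be sketched by sorting the values and cutting them into runs of a fixed size. A depth of 4 is an assumption chosen because it reproduces the counts on the slide; the slide itself does not state the rule used, and the function name is hypothetical.

```python
def equal_depth_bins(values, depth):
    """Split the sorted values into consecutive bins of `depth` items each.

    The last bin holds whatever remains and may be smaller.
    """
    ordered = sorted(values)
    return [ordered[i:i + depth] for i in range(0, len(ordered), depth)]

temps = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
depth_bins = equal_depth_bins(temps, 4)
depth_counts = [len(b) for b in depth_bins]
depth_ranges = [(b[0], b[-1]) for b in depth_bins]  # observed min/max per bin
print(depth_counts)
print(depth_ranges)
```

Unlike the equal-width grid, no bin is empty, but the bin widths now vary with the density of the data.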
Deriving Knowledge from Data at Scale
4/12/2016 University of Waikato 119
Domain expertise, play a hunch in terms of feature discrimination
Deriving Knowledge from Data at Scale
That’s all for tonight….

More Related Content

What's hot

Barga Data Science lecture 1
Barga Data Science lecture 1Barga Data Science lecture 1
Barga Data Science lecture 1
Roger Barga
 
Data Driven Engineering 2014
Data Driven Engineering 2014Data Driven Engineering 2014
Data Driven Engineering 2014
Roger Barga
 
Barga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 KeynoteBarga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 Keynote
Roger Barga
 
Barga DIDC'14 Invited Talk
Barga DIDC'14 Invited TalkBarga DIDC'14 Invited Talk
Barga DIDC'14 Invited Talk
Roger Barga
 
Barga Data Science lecture 7
Barga Data Science lecture 7Barga Data Science lecture 7
Barga Data Science lecture 7
Roger Barga
 
Managing machine learning
Managing machine learningManaging machine learning
Managing machine learning
David Murgatroyd
 
Introduction to machine learning and deep learning
Introduction to machine learning and deep learningIntroduction to machine learning and deep learning
Introduction to machine learning and deep learning
Shishir Choudhary
 
Heuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchHeuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient search
Greg Makowski
 
H2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark LandryH2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark Landry
Sri Ambati
 
Machine learning for_finance
Machine learning for_financeMachine learning for_finance
Machine learning for_finance
Stefan Duprey
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
Niko Vuokko
 
Data Science Training | Data Science For Beginners | Data Science With Python...
Data Science Training | Data Science For Beginners | Data Science With Python...Data Science Training | Data Science For Beginners | Data Science With Python...
Data Science Training | Data Science For Beginners | Data Science With Python...
Simplilearn
 
Machine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachine Learning and Real-World Applications
Machine Learning and Real-World Applications
MachinePulse
 
From Raw Data to Deployed Product. Fast & Agile with CRISP-DM
From Raw Data to Deployed Product. Fast & Agile with CRISP-DMFrom Raw Data to Deployed Product. Fast & Agile with CRISP-DM
From Raw Data to Deployed Product. Fast & Agile with CRISP-DM
Michał Łopuszyński
 
End-to-End Machine Learning Project
End-to-End Machine Learning ProjectEnd-to-End Machine Learning Project
End-to-End Machine Learning Project
Eng Teong Cheah
 
CRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining ProjectsCRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining Projects
Michał Łopuszyński
 
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
Edureka!
 
Exploring the Data science Process
Exploring the Data science ProcessExploring the Data science Process
Exploring the Data science Process
Vishal Patel
 
A Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceA Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data Science
Mark West
 
The path to be a data scientist
The path to be a data scientistThe path to be a data scientist
The path to be a data scientist
Poo Kuan Hoong
 

What's hot (20)

Barga Data Science lecture 1
Barga Data Science lecture 1Barga Data Science lecture 1
Barga Data Science lecture 1
 
Data Driven Engineering 2014
Data Driven Engineering 2014Data Driven Engineering 2014
Data Driven Engineering 2014
 
Barga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 KeynoteBarga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 Keynote
 
Barga DIDC'14 Invited Talk
Barga DIDC'14 Invited TalkBarga DIDC'14 Invited Talk
Barga DIDC'14 Invited Talk
 
Barga Data Science lecture 7
Barga Data Science lecture 7Barga Data Science lecture 7
Barga Data Science lecture 7
 
Managing machine learning
Managing machine learningManaging machine learning
Managing machine learning
 
Introduction to machine learning and deep learning
Introduction to machine learning and deep learningIntroduction to machine learning and deep learning
Introduction to machine learning and deep learning
 
Heuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchHeuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient search
 
H2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark LandryH2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark Landry
 
Machine learning for_finance
Machine learning for_financeMachine learning for_finance
Machine learning for_finance
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Data Science Training | Data Science For Beginners | Data Science With Python...
Data Science Training | Data Science For Beginners | Data Science With Python...Data Science Training | Data Science For Beginners | Data Science With Python...
Data Science Training | Data Science For Beginners | Data Science With Python...
 
Machine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachine Learning and Real-World Applications
Machine Learning and Real-World Applications
 
From Raw Data to Deployed Product. Fast & Agile with CRISP-DM
From Raw Data to Deployed Product. Fast & Agile with CRISP-DMFrom Raw Data to Deployed Product. Fast & Agile with CRISP-DM
From Raw Data to Deployed Product. Fast & Agile with CRISP-DM
 
End-to-End Machine Learning Project
End-to-End Machine Learning ProjectEnd-to-End Machine Learning Project
End-to-End Machine Learning Project
 
CRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining ProjectsCRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining Projects
 
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
 
Exploring the Data science Process
Exploring the Data science ProcessExploring the Data science Process
Exploring the Data science Process
 
A Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceA Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data Science
 
The path to be a data scientist
The path to be a data scientistThe path to be a data scientist
The path to be a data scientist
 

Viewers also liked

Kajul verma-Product Implementation Engineer_4 years
Kajul verma-Product Implementation Engineer_4 yearsKajul verma-Product Implementation Engineer_4 years
Kajul verma-Product Implementation Engineer_4 years
KAJUL VERMA
 
Toxoplasmosis
ToxoplasmosisToxoplasmosis
Toxoplasmosis
degatitos
 
Giacomo
GiacomoGiacomo
Rfs Overhead Presentation[1]
Rfs Overhead Presentation[1]Rfs Overhead Presentation[1]
Rfs Overhead Presentation[1]
guest6e5f8
 
21. Kecamatan kerajaan
21. Kecamatan kerajaan21. Kecamatan kerajaan
21. Kecamatan kerajaan
kabupaten_pakpakbharat
 
Participantes Concurso Miradas DeGatitos 2013 Homenaje al Día del Animal
Participantes Concurso Miradas DeGatitos 2013 Homenaje al Día del AnimalParticipantes Concurso Miradas DeGatitos 2013 Homenaje al Día del Animal
Participantes Concurso Miradas DeGatitos 2013 Homenaje al Día del Animal
degatitos
 
Slides jose falck zepeda nas study economics december 2016 original submtted
Slides jose falck zepeda nas study economics december 2016 original submttedSlides jose falck zepeda nas study economics december 2016 original submtted
Slides jose falck zepeda nas study economics december 2016 original submtted
Jose Falck Zepeda
 
Rescate y adopcion canina
Rescate y adopcion caninaRescate y adopcion canina
Rescate y adopcion canina
tatycagiraldo
 
Francesco Micali : Big data per le smart cities - Mediabeta srl / Lo Stretto ...
Francesco Micali : Big data per le smart cities - Mediabeta srl / Lo Stretto ...Francesco Micali : Big data per le smart cities - Mediabeta srl / Lo Stretto ...
Francesco Micali : Big data per le smart cities - Mediabeta srl / Lo Stretto ...
f.micali
 
ENBE infographic Pop-out Poster
ENBE infographic Pop-out PosterENBE infographic Pop-out Poster
ENBE infographic Pop-out Poster
Darryl Harvey
 
Neev QA Offering
Neev QA OfferingNeev QA Offering
Neev QA Offering
Neev Technologies
 
Presentación sustento teórico
Presentación sustento teóricoPresentación sustento teórico
Presentación sustento teórico
Ada Luz Olivares
 
Отчет о работе Школы кадрового резерва, 2000-2015
Отчет о работе Школы кадрового резерва, 2000-2015Отчет о работе Школы кадрового резерва, 2000-2015
Отчет о работе Школы кадрового резерва, 2000-2015
nizhgma.ru
 
Francesco Micali : Formazione aziendale social media - Mediabeta srl
Francesco Micali : Formazione aziendale social media - Mediabeta srlFrancesco Micali : Formazione aziendale social media - Mediabeta srl
Francesco Micali : Formazione aziendale social media - Mediabeta srl
f.micali
 
Single instruction multiple data
Single instruction multiple dataSingle instruction multiple data
Single instruction multiple data
Syed Zaid Irshad
 
Big Data Platform adopting Spark and Use Cases with Open Data
Big Data  Platform adopting Spark and Use Cases with Open DataBig Data  Platform adopting Spark and Use Cases with Open Data
Big Data Platform adopting Spark and Use Cases with Open Data
Jongwook Woo
 
Distributed system
Distributed systemDistributed system
Distributed system
Syed Zaid Irshad
 
V slide budidaya sayuran
V slide budidaya sayuranV slide budidaya sayuran
V slide budidaya sayuran
hjfgjghjghj
 
Hasil Observasi Tanaman Kelapa (Bhs. Indonesia)
Hasil Observasi Tanaman Kelapa (Bhs. Indonesia)Hasil Observasi Tanaman Kelapa (Bhs. Indonesia)
Hasil Observasi Tanaman Kelapa (Bhs. Indonesia)
Andri_Ferdians
 

Barga Data Science lecture 6

  • 1. Deriving Knowledge from Data at Scale
  • 2. Deriving Knowledge from Data at Scale Feature extraction and selection are the most important but underrated step of machine learning. Better features are better than better algorithms…
  • 3. Deriving Knowledge from Data at Scale
  • 4. Deriving Knowledge from Data at Scale
  • 5. Deriving Knowledge from Data at Scale Lecture Objectives homework There is an order or workflow that takes place here, don’t lose the forest in the trees…
  • 6. Deriving Knowledge from Data at Scale Review…
  • 7. Deriving Knowledge from Data at Scale
      • Cluster 0 – Females with an average age of 37 who live in the inner city and have both a savings account and a current account. They are unmarried and have neither a mortgage nor a pep. The average monthly income is 23,300.
      • Cluster 1 – Females with an average age of 44 who live in a rural area and have both a savings account and a current account. They are married and have neither a mortgage nor a pep. The average monthly income is 27,772.
      • Cluster 2 – Females with an average age of 48 who live in the inner city and have a current account but no savings account. They are unmarried and have no mortgage but do have a pep. The average monthly income is 27,668.
      • Cluster 3 – Females with an average age of 39 who live in a town and have both a savings account and a current account. They are married and have neither a mortgage nor a pep. The average monthly income is 24,047.
      • Cluster 4 – Males with an average age of 39 who live in the inner city and have a current account but no savings account. They are married and have both a mortgage and a pep. The average monthly income is 26,359.
      • Cluster 5 – Males with an average age of 47 who live in the inner city and have both a savings account and a current account. They are unmarried and have no mortgage but do have a pep. The average monthly income is 35,419.
  • 8. Deriving Knowledge from Data at Scale
  • 9. Deriving Knowledge from Data at Scale Classifiers -> Lazy -> IBk
  • 10. Deriving Knowledge from Data at Scale
  • 11. Deriving Knowledge from Data at Scale
  • 12. Deriving Knowledge from Data at Scale
  • 13. Deriving Knowledge from Data at Scale
  • 14. Deriving Knowledge from Data at Scale
  • 15. Deriving Knowledge from Data at Scale
  • 16. Deriving Knowledge from Data at Scale
      No    Prob   Target   CustID   Age
      1     0.97   Y        1746     …
      2     0.95   N        1024     …
      3     0.94   Y        2478     …
      4     0.93   Y        3820     …
      5     0.92   N        4897     …
      …     …      …        …        …
      99    0.11   N        2734     …
      100   0.06   N        2422
      • Use a model to assign a score (probability) to each instance
      • Sort instances by decreasing score
      • Expect more targets (hits) near the top of the list
      3 hits in the top 5% of the list. If there are 15 targets overall, then the top 5 has 3/15 = 20% of the targets.
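The score-sort-count procedure on this slide can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: the five rows are the ones visible in the table, the total of 15 targets is taken from the slide text, and the function name `hits_in_top` is made up for the example.

```python
# Rows transcribed from the slide: (probability, target, customer id).
# Only the five top-ranked rows are fully visible on the slide, so the
# total of 15 targets is taken from the slide text, not recomputed.
scored = [
    (0.93, "Y", 3820),
    (0.97, "Y", 1746),
    (0.92, "N", 4897),
    (0.95, "N", 1024),
    (0.94, "Y", 2478),
]


def hits_in_top(rows, k):
    """Sort by decreasing score and count targets among the first k rows."""
    ranked = sorted(rows, key=lambda row: row[0], reverse=True)
    return sum(1 for _, target, _ in ranked[:k] if target == "Y")


TOTAL_TARGETS = 15  # stated on the slide
hits = hits_in_top(scored, 5)
capture_rate = hits / TOTAL_TARGETS  # share of all targets found in the top 5
print(hits, capture_rate)
```

The input order is deliberately shuffled to show that the ranking comes from the sort, not from the order in which rows arrive.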
  • 17. Deriving Knowledge from Data at Scale 40% of responses for 10% of cost Lift factor = 4 80% of responses for 40% of cost Lift factor = 2 Model Random
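The lift factors quoted here are the share of responses captured divided by the share of the list contacted (40% / 10% = 4, 80% / 40% = 2); a random ordering gives a lift of 1. The sketch below assumes a list of 0/1 response labels already sorted by decreasing model score. The 20-customer list is invented so that it reproduces the slide's two points, and `lift_at` is a hypothetical helper name.

```python
def lift_at(labels_sorted_by_score, fraction):
    """Lift of the top `fraction` of a ranked list of 0/1 labels."""
    n = len(labels_sorted_by_score)
    k = max(1, round(n * fraction))          # how many customers we contact
    total = sum(labels_sorted_by_score)      # all responders in the list
    captured = sum(labels_sorted_by_score[:k]) / total
    return captured / (k / n)


# Invented example: 20 customers, 5 responders, ranked by model score.
ranked_labels = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0,
                 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

lift_10 = lift_at(ranked_labels, 0.10)   # 2 of 5 responders in the top 2
lift_40 = lift_at(ranked_labels, 0.40)   # 4 of 5 responders in the top 8
lift_all = lift_at(ranked_labels, 1.00)  # contacting everyone gives no lift
print(lift_10, lift_40, lift_all)
```

Plotting captured share against contacted share for every cutoff gives the model curve; the diagonal is the random baseline.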
  • 18. Deriving Knowledge from Data at Scale
  • 19. Deriving Knowledge from Data at Scale
  • 20. Deriving Knowledge from Data at Scale
  • 21. Deriving Knowledge from Data at Scale
  • 22. Deriving Knowledge from Data at Scale to impact…
      1. Build our predictive model in WEKA Explorer;
      2. Use our model to score (predict) which new customers to target in our upcoming advertising campaign;
         • ARFF file manipulation (hacking), all too common pita…
         • Excel manipulation to join model output with our customers list
      3. Compute the lift chart to assess the business impact of our predictive model on the advertising campaign
         • How are lift charts built? Of all the charts and performance measures from a model, this one is ‘on you’ to construct;
         • Where is the business ‘bang for the buck’?
  • 23. Deriving Knowledge from Data at Scale
  • 24. Deriving Knowledge from Data at Scale
  • 25. Deriving Knowledge from Data at Scale
  • 26. Deriving Knowledge from Data at Scale You can’t turn data lead into modeling gold – we’re data scientists, not data alchemists…
  • 27. Deriving Knowledge from Data at Scale Motivation: Real world examples Example (1) Lesson: Correct data transformation is important!
  • 28. Deriving Knowledge from Data at Scale Motivation: Real world examples Example (2): KDD Cup 2001 Lesson: A model that uses lots of features can turn out to be very sub-optimal, however well it is designed!
  • 29. Deriving Knowledge from Data at Scale Motivation: Real world examples Example (3) Lesson: Feature selection can be crucial even when the number of features is small!
  • 30. Deriving Knowledge from Data at Scale Motivation: Real world examples Example (4) Lesson: Variations of the same ML method can give vastly different performances!
  • 31. Deriving Knowledge from Data at Scale
  • 32. Deriving Knowledge from Data at Scale Predictive modeling competitions
  • 33. Deriving Knowledge from Data at Scale Global competitions: Predicting HIV viral load. State of the art: 70%. After 1½ weeks: 70.8%. At competition close: 77%. Improved by 10%.
  • 34. Deriving Knowledge from Data at Scale Mismatch between those with data and those with the skills to analyse it Crowdsourcing
  • 35. Deriving Knowledge from Data at Scale Tourism Forecasting Competition [Chart: forecast error (MASE) of the existing model compared with entries at Aug 9, 2 weeks later, 1 month later, and competition end]
  • 36. Deriving Knowledge from Data at Scale Users apply different techniques: neural networks, logistic regression, support vector machine, decision trees, ensemble methods, AdaBoost, Bayesian networks, genetic algorithms, random forest, Monte Carlo methods, principal component analysis, Kalman filter, evolutionary fuzzy modeling
  • 37. Deriving Knowledge from Data at Scale VicRoads has an algorithm they use to forecast travel time on Melbourne freeways (taking into account time, weather, accidents, etc). Their current model is inaccurate and somewhat useless. They want to do better (or at least find out about whether it’s possible to do better).
  • 38. Deriving Knowledge from Data at Scale 1. Upload 2. Submit 3. Evaluate & Exchange
  • 39. Deriving Knowledge from Data at Scale Use the wizard to post a competition
  • 40. Deriving Knowledge from Data at Scale Participants make their entries
  • 41. Deriving Knowledge from Data at Scale Competitions are judged based on predictive accuracy
  • 42. Deriving Knowledge from Data at Scale Competition Mechanics Competitions are judged on objective criteria
  • 43. Deriving Knowledge from Data at Scale Kaggle How They Won It…
  • 44. Deriving Knowledge from Data at Scale
  • 45. Deriving Knowledge from Data at Scale
  • 46. Deriving Knowledge from Data at Scale Three Files
      • ford_train: 510 trials, ~1,200 observations each spaced by 0.1 sec -> 604,330 rows
      • ford_test: 100 trials, ~1,200 observations/trial, 120,841 rows
      • example_submission.csv
  • 47. Deriving Knowledge from Data at Scale Junpei Komiyama (#4)
  • 48. Deriving Knowledge from Data at Scale Junpei Komiyama (#4)
  • 49. Deriving Knowledge from Data at Scale Mick Wagner (#2)
  • 50. Deriving Knowledge from Data at Scale Mick Wagner (#2)
  • 51. Deriving Knowledge from Data at Scale Inference (#1)
  • 52. Deriving Knowledge from Data at Scale VicRoads has an algorithm they use to forecast travel time on Melbourne freeways (taking into account time, weather, accidents etc). Their current model is inaccurate and somewhat useless. They want to do better (or at least find out about whether it’s possible to do better).
  • 53. Deriving Knowledge from Data at Scale
  • 54. Deriving Knowledge from Data at Scale François GUILLEM (#14)
  • 55. Deriving Knowledge from Data at Scale #1 used Random Forests
  • 56. Deriving Knowledge from Data at Scale
  • 57. Deriving Knowledge from Data at Scale Homework Week 6, Monday Sept. 21st. Upload to site… http://blog.kaggle.com/category/dojo/ The content is 10 pages of interviews on how the team(s) built their models; some have multiple interviews. You will review at least 10 interviews; bounce around, do not go sequentially. Note 1) what model(s) they used, 2) insights they had that influenced modeling, 3) what feature creation and selection they did, 4) other observations. I will consolidate all of these and upload them as a shared document on our site.
  • 58. Deriving Knowledge from Data at Scale 5 Minute Break…
  • 59. Deriving Knowledge from Data at Scale Course Project
  • 60. Deriving Knowledge from Data at Scale
  • 61. Deriving Knowledge from Data at Scale https://www.kaggle.com/c/springleaf-marketing-response Determine whether or not to send a direct mail piece to a customer
  • 62. Deriving Knowledge from Data at Scale The Data
  • 63. Deriving Knowledge from Data at Scale The Rules
  • 64. Deriving Knowledge from Data at Scale
  • 65. Deriving Knowledge from Data at Scale
  • 66. Deriving Knowledge from Data at Scale
  • 67. Deriving Knowledge from Data at Scale what is the data telling you
  • 68. Deriving Knowledge from Data at Scale
  • 69. Deriving Knowledge from Data at Scale
  • 70. Deriving Knowledge from Data at Scale Data Wrangling
  • 71. Deriving Knowledge from Data at Scale Data Acquisition -> Data Exploration -> Pre-processing -> Feature and Target construction -> Train/Test split -> Feature selection -> Model training -> Model scoring (shown twice) -> Evaluation (shown twice) -> Compare metrics
  • 72. Deriving Knowledge from Data at Scale • Data preparation step is by far the most time consuming step [Bar chart: KDDM steps relative effort [%], on a 0–70 scale, for Understanding of Domain, Understanding of Data, Preparation of Data, Data Mining, Evaluation of Results, and Deployment of Results; series: Cabena et al. estimates, Shearer estimates, Cios and Kurgan estimates]
  • 73. Deriving Knowledge from Data at Scale Out of Class Reading, highly recommended
  • 74. Deriving Knowledge from Data at Scale Out of Class Reading, highly recommended
  • 75. Deriving Knowledge from Data at Scale
      1. Do you have domain knowledge?
      2. Are your features commensurate?
      3. Do you suspect interdependence of features?
      4. Do you need to prune the input variables?
      5. Do you need to assess features individually?
      6. Do you need a predictor?
      7. Do you suspect your data is “dirty”?
      8. Do you know what to try first?
      9. Do you have new ideas, time, computational resources, and enough examples?
      10. Do you want a stable solution?
  • 76. Deriving Knowledge from Data at Scale
  • 77. Deriving Knowledge from Data at Scale
  • 78. Deriving Knowledge from Data at Scale
  • 79. Deriving Knowledge from Data at Scale [Figure: counts 15 and 15; P = 0.5, P = 0.5]
  • 80. Deriving Knowledge from Data at Scale [Figure: counts 15 and 15 (P = 0.5, P = 0.5); counts 7 and 13 (P = 0.35, P = 0.65)]
  • 81. Deriving Knowledge from Data at Scale [Figure: counts 15, 15, 15, 15 (P = 0.5, P = 0.5); counts 10 and 10]
  • 82. Deriving Knowledge from Data at Scale [Figure: counts 15, 15, 15, 15 (P = 0.5, P = 0.5); axis: Time; Train / Test; Horizontal and Vertical]
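The numbers on slides 79–81 are consistent with comparing a simple random sample (two groups of 15 become 7 and 13, so the proportions drift from 0.5/0.5 to 0.35/0.65) against a stratified sample (10 and 10, proportions preserved). Assuming that reading, which the surviving text does not confirm, a stratified sample can be sketched as below; `stratified_sample` and the toy population are hypothetical.

```python
import random


def stratified_sample(rows, label_of, fraction, seed=0):
    """Draw the same fraction from every class so class proportions are kept."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    sample = []
    for members in by_class.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample


# Toy population: two classes of 15 instances each, as on the slides.
population = [("a", i) for i in range(15)] + [("b", i) for i in range(15)]
sample = stratified_sample(population, lambda row: row[0], 2 / 3)
counts = {c: sum(1 for row in sample if row[0] == c) for c in ("a", "b")}
print(counts)
```

Whatever the seed, each class contributes exactly 10 rows, whereas a plain `random.sample` of 20 rows can land on splits such as 7 and 13.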
  • 83. Deriving Knowledge from Data at Scale Data Characterization…
  • 84. Deriving Knowledge from Data at Scale 1. Unique values 2. Most frequent values 3. Highest and lowest values 4. Location and dispersion – gini, statistical test for dispersion 5. Quartiles
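Most of this checklist can be computed per column with the standard library. The sketch below is one possible profile, not a prescribed one: it covers unique values, the most frequent value, extremes, location and dispersion (mean and standard deviation only; the gini measure from the slide is left out), and quartiles. The input reuses the six cluster average ages from slide 7 purely as toy data.

```python
import statistics
from collections import Counter


def profile(values):
    """Basic characterization of one numeric column."""
    counts = Counter(values)
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {
        "unique": len(counts),
        "most_frequent": counts.most_common(1)[0],  # (value, count)
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "quartiles": (q1, q2, q3),
    }


# Toy column: the cluster average ages listed on slide 7.
ages = [37, 44, 48, 39, 39, 47]
summary = profile(ages)
print(summary)
```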
  • 85. Deriving Knowledge from Data at Scale 1. Missing values 2. Outliers 3. Coding 4. Constraints
  • 86. Deriving Knowledge from Data at Scale Missing values – UCI machine learning repository, 31 of 68 data sets reported to have missing values. “Missing” can mean many things…
      • MAR, “Missing at Random”: usually the best case, usually not true
      • Non-randomly missing
      • Presumed normal, so not measured
      • Causally missing: the attribute value is missing because of other attribute values (or because of the outcome value!)
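One quick check for the “causally missing” case is to compare the outcome rate between rows where an attribute is missing and rows where it is present; a large gap suggests the missingness itself carries information. This is a rough screening sketch, not a statistical test, and the rows, field names, and `target_rate_by_missingness` helper are invented for illustration.

```python
def target_rate_by_missingness(rows, attribute, target):
    """Outcome rate for rows where `attribute` is missing vs. present."""
    groups = {"missing": [], "present": []}
    for row in rows:
        key = "missing" if row.get(attribute) is None else "present"
        groups[key].append(1 if row[target] else 0)
    return {k: (sum(v) / len(v) if v else None) for k, v in groups.items()}


# Invented rows: income is missing more often for responders.
rows = [
    {"income": None, "responded": True},
    {"income": None, "responded": True},
    {"income": None, "responded": False},
    {"income": 23300, "responded": False},
    {"income": 27772, "responded": False},
    {"income": 35419, "responded": True},
    {"income": 24047, "responded": False},
]
rates = target_rate_by_missingness(rows, "income", "responded")
print(rates)
```

When the two rates differ this much, adding an "is missing" indicator feature is usually safer than silently imputing the value.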
  • 87. Deriving Knowledge from Data at Scale
  • 88. Deriving Knowledge from Data at Scale
  • 89. Deriving Knowledge from Data at Scale
  • 90. Deriving Knowledge from Data at Scale Outliers – may indicate ‘bad data’ or it may represent something scientifically interesting in the data… Simple working definition: an outlier is an element of a data sequence S that is inconsistent with expectations, based on the majority of other elements of S. Sources of outliers • Measurement errors • Other uninteresting anomalous data • Surprising observations that may be important
  • 91. Deriving Knowledge from Data at Scale Outliers – may indicate ‘bad data’ or it may represent something scientifically interesting in the data… Simple working definition: an outlier is an element of a data sequence S that is inconsistent with expectations, based on the majority of other elements of S. Sources of outliers • Insurance company sees niche of sports car enthusiasts, married boomers with kids and second family car. Low risk, lower rate to attract. Simple case where outlier carries meaning for modeling…
  • 92. Deriving Knowledge from Data at Scale Outliers – may indicate ‘bad data’ or it may represent something scientifically interesting in the data… Outliers can distort the regression results. When an outlier is included in the analysis, it pulls the regression line towards itself. This can result in a solution that is more accurate for the outlier, but less accurate for all the other cases in the data set.
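The pull of a single outlier on a least-squares line is easy to demonstrate. In this sketch the clean points lie exactly on y = 2x + 1 and one extreme point is appended; the data and the `fit_line` helper are made up for the example.

```python
def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


clean = [(x, 2 * x + 1) for x in range(1, 9)]  # exactly y = 2x + 1
with_outlier = clean + [(9, 60)]               # one extreme point

slope_clean, intercept_clean = fit_line(clean)
slope_out, intercept_out = fit_line(with_outlier)

# Largest error the outlier-influenced line makes on the clean points.
worst_error = max(abs(y - (slope_out * x + intercept_out)) for x, y in clean)
print(slope_clean, slope_out, worst_error)
```

The slope more than doubles, and the line that now passes closer to the outlier misses every one of the original points.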
  • 93. Deriving Knowledge from Data at Scale Outliers – may indicate ‘bad data’ or it may represent something scientifically interesting in the data…
      Identify outliers
      • Question origin, domain knowledge invaluable
      • Dispersion – "spread" of a data set, departure from central tendency, use a box plot…
      Deal with outliers
      • Winsorize – Set all outliers to a specified percentile of the data. Not equivalent to trimming, which simply excludes data. In a Winsorized estimator, extreme values are instead replaced by certain percentiles (the trimmed minimum and maximum). Same as clipping in signal processing.
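The difference between Winsorizing and trimming shows up in the length of the result: Winsorizing keeps every row and clips the extremes, trimming drops them. The sketch below uses the count-based form (clip or drop the k smallest and k largest values), which for 10 values and k = 1 corresponds to the 10th and 90th percentiles; the data and function names are illustrative only.

```python
def winsorize(values, k=1):
    """Replace the k smallest and k largest values with the nearest kept value."""
    if not 0 <= 2 * k < len(values):
        raise ValueError("k must leave at least one value untouched")
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    return [min(max(v, low), high) for v in values]


def trim(values, k=1):
    """Drop the k smallest and k largest values (returned in sorted order)."""
    if not 0 <= 2 * k < len(values):
        raise ValueError("k must leave at least one value untouched")
    ordered = sorted(values)
    return ordered[k:len(ordered) - k]


data = [1, 10, 11, 12, 13, 14, 15, 16, 17, 95]
print(winsorize(data))  # same length, extremes clipped to 10 and 17
print(trim(data))       # two values removed
```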
  • 94. Deriving Knowledge from Data at Scale Identify outliers • Question origin, domain knowledge invaluable • Dispersion – "spread" of a data set, departure from central tendency, use a box plot… Deal with outliers • Include – Robust statistics, a convenient way to summarize results when they include a small proportion of outliers. A hot topic for research, see NIPS 2010 Workshop, Robust Statistical learning (robustml). Outliers – may indicate ‘bad data’ or it may represent something scientifically interesting in the data…
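A small example of what "robust" means here: the median and the median absolute deviation (MAD) barely move when one value is corrupted, while the mean and standard deviation shift a lot. The median/MAD pair is just one common choice of robust summary, picked for illustration; the numbers are made up.

```python
import statistics


def mad(values):
    """Median absolute deviation from the median."""
    values = list(values)
    center = statistics.median(values)
    return statistics.median(abs(v - center) for v in values)


clean = [10, 11, 12, 13, 14, 15, 16]
dirty = clean[:-1] + [160]  # the last value is corrupted

print(statistics.mean(clean), statistics.mean(dirty))      # mean jumps
print(statistics.median(clean), statistics.median(dirty))  # median is unchanged
print(mad(clean), mad(dirty))                              # MAD is unchanged
```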
  • 95. Deriving Knowledge from Data at Scale Constraints • Entity integrity • Referential integrity • Type checking • Format • Bounds checking
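    One way to picture these five constraint types is as simple checks over records. The sketch below is illustrative only: the `customers` and `orders` tables, their field names, the age bounds, and the e-mail pattern are all invented for the example.

    ```python
    import re

    # Hypothetical records with deliberate problems.
    customers = [
        {"id": 1, "email": "a@example.com", "age": 34},
        {"id": 2, "email": "not-an-email", "age": 212},
        {"id": 2, "email": "c@example.com", "age": 51},
    ]
    orders = [
        {"order_id": 10, "customer_id": 1},
        {"order_id": 11, "customer_id": 7},
    ]


    def check_customers(rows):
        """Return (id, problem) pairs for entity, type, bounds and format checks."""
        problems = []
        seen = set()
        for r in rows:
            if r["id"] in seen:
                problems.append((r["id"], "entity integrity: duplicate key"))
            seen.add(r["id"])
            if not isinstance(r["age"], int):
                problems.append((r["id"], "type: age must be an integer"))
            elif not 0 <= r["age"] <= 120:
                problems.append((r["id"], "bounds: age out of range"))
            if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", r["email"]):
                problems.append((r["id"], "format: malformed email"))
        return problems


    def check_orders(order_rows, customer_rows):
        """Referential integrity: every order must point at a known customer."""
        known = {c["id"] for c in customer_rows}
        return [
            (o["order_id"], "referential integrity: unknown customer")
            for o in order_rows
            if o["customer_id"] not in known
        ]


    customer_problems = check_customers(customers)
    order_problems = check_orders(orders, customers)
    print(customer_problems)
    print(order_problems)
    ```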
  • 96. Deriving Knowledge from Data at Scale • weka.filters.unsupervised.instance.RemoveMisclassified • weka.filters.unsupervised.instance.RemovePercentage • weka.filters.unsupervised.instance.RemoveRange • weka.filters.unsupervised.instance.RemoveWithValues • weka.filters.unsupervised.instance.Resample
  • 97. Deriving Knowledge from Data at Scale 5 Minute Break…
  • 98. Deriving Knowledge from Data at Scale Simple Definition: the feature selection problem vs. feature extraction
      Feature selection: $F = \{f_1, \ldots, f_i, \ldots, f_n\} \xrightarrow{f.\ selection} F' = \{f_{i_1}, \ldots, f_{i_j}, \ldots, f_{i_m}\}$
      Feature extraction: $F = \{f_1, \ldots, f_i, \ldots, f_n\} \xrightarrow{f.\ extraction} F' = \{g_1(f_1, \ldots, f_n), \ldots, g_j(f_1, \ldots, f_n), \ldots, g_m(f_1, \ldots, f_n)\}$
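    The two definitions can be contrasted in a few lines: selection keeps a subset of the original features unchanged, while extraction builds each new feature as a function g_j of all the original ones. The feature vector and the two functions below are hypothetical examples.

    ```python
    def select_features(row, indices):
        """Feature selection: keep the original features at the chosen indices."""
        return [row[i] for i in indices]


    def extract_features(row, functions):
        """Feature extraction: each new feature is g_j(f_1, ..., f_n)."""
        return [g(row) for g in functions]


    # Hypothetical feature vector: height (cm), weight (kg), age (years).
    row = [170.0, 65.0, 30.0]

    selected = select_features(row, [0, 2])
    extracted = extract_features(
        row,
        [
            lambda f: f[1] / (f[0] / 100) ** 2,  # body-mass index from height and weight
            lambda f: sum(f) / len(f),           # plain average of all features
        ],
    )
    print(selected)
    print(extracted)
    ```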
  • 99. Deriving Knowledge from Data at Scale
  • 100. Deriving Knowledge from Data at Scale 3 types of methods • Filter methods • Wrapper methods • Embedded methods (e.g. decision trees, random forests)
  • 101. Deriving Knowledge from Data at Scale Most learning methods implicitly do feature selection: • Decision trees: use info gain or gain ratio to decide which attributes to use as tests. Many features don’t get used. • Neural nets: backprop learns strong connections to some inputs, and near-zero connections to other inputs. • kNN, MBL (any similarity-based learning): the weights in a weighted Euclidean distance determine how important each feature is. Weights near zero mean the feature is not used. • SVMs: the maximum-margin hyperplane may focus on important features and ignore irrelevant features. So why do we need feature selection?
  • 102. Deriving Knowledge from Data at Scale Curse of Dimensionality: the number of samples required grows exponentially with the number of variables. In many cases the information lost by discarding variables is made up for by a more accurate mapping/sampling in the lower-dimensional space!
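    A rough way to see the exponential growth is to count the cells of a grid over the feature space. The sketch assumes 10 bins per axis and a fixed budget of one million samples; both numbers are arbitrary choices for illustration.

    ```python
    def grid_cells(bins_per_axis, n_dims):
        """Number of cells in a regular grid over n_dims dimensions."""
        return bins_per_axis ** n_dims


    def samples_per_cell(n_samples, bins_per_axis, n_dims):
        """Average number of samples landing in each grid cell."""
        return n_samples / grid_cells(bins_per_axis, n_dims)


    for d in (1, 2, 3, 6):
        print(d, grid_cells(10, d), samples_per_cell(1_000_000, 10, d))
    ```

    With the same sample budget, each added dimension divides the average coverage per cell by ten, so dropping variables can buy a much denser sampling of the remaining space.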
  • 103. Deriving Knowledge from Data at Scale Feature Selection and Engineering Optimality? This deserves a deeper treatment, which we will cover next week with hands-on exercises in class…
  • 104. Deriving Knowledge from Data at Scale Numerical data • Binning – a mapping to discrete categories; • Recenter – shift every value by c: the max, min, average and median shift by c, while the range and standard deviation do not change; • Rescale – multiply every value by d: all of these measures change; • Standardize (standard normal form) – recenter so that the mean is 0, then divide each recentered value by the SD. Character data • Lower case • Spellcheck • Data extraction (e.g. regular expressions) Coding – shape and enrich…
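    The effect of each numerical transformation on the summary statistics can be checked directly. This is a minimal sketch on made-up values; the helper names are illustrative, and the population standard deviation (`statistics.pstdev`) is an assumed choice.

    ```python
    import statistics


    def recenter(values, c):
        """Shift every value by c."""
        return [v + c for v in values]


    def rescale(values, d):
        """Multiply every value by d."""
        return [v * d for v in values]


    def standardize(values):
        """Recenter to mean 0, then divide by the standard deviation."""
        mu = statistics.mean(values)
        sd = statistics.pstdev(values)
        return [(v - mu) / sd for v in values]


    values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, SD 2

    shifted = recenter(values, 10)
    scaled = rescale(values, 3)
    z = standardize(values)

    print(statistics.mean(shifted), statistics.pstdev(shifted))  # mean moves, SD does not
    print(statistics.mean(scaled), statistics.pstdev(scaled))    # both change
    print(statistics.mean(z), statistics.pstdev(z))              # mean 0, SD 1
    ```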
  • 105. Deriving Knowledge from Data at Scale
  • 106. Deriving Knowledge from Data at Scale
      feature   red  blue  green
      red       1    0     0
      blue      0    1     0
      green     0    0     1
      red       1    0     0
      red       1    0     0
      green     0    0     1
      blue      0    1     0
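    The mapping on this slide, one 0/1 indicator column per category (often called one-hot or dummy coding), can be sketched as below. The function name `one_hot` is illustrative, and the column order is passed in explicitly to match the slide.

    ```python
    def one_hot(values, categories=None):
        """Turn a categorical column into one 0/1 indicator column per category."""
        if categories is None:
            # Default: categories in order of first appearance.
            categories = list(dict.fromkeys(values))
        rows = [[1 if v == c else 0 for c in categories] for v in values]
        return categories, rows


    feature = ["red", "blue", "green", "red", "red", "green", "blue"]
    columns, encoded = one_hot(feature, ["red", "blue", "green"])

    print(columns)
    for value, row in zip(feature, encoded):
        print(value, row)
    ```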
  • 107. Deriving Knowledge from Data at Scale
      Outlook   Temperature  Humidity  Windy  Play
      sunny     85           85        false  no
      sunny     80           90        true   no
      overcast  83           78        false  yes
      rain      70           96        false  yes
      rain      68           80        false  yes
      rain      65           70        true   no
      overcast  64           65        true   yes
      sunny     72           95        false  no
      sunny     69           70        false  yes
      rain      75           80        false  yes
      sunny     75           70        true   yes
      overcast  72           90        true   yes
      overcast  81           75        false  yes
      rain      71           80        true   no
      Attributes: Outlook (overcast, rain, sunny); Temperature real; Humidity real; Windy (true, false); Play (yes, no)
      Standard Spreadsheet Format
      Outlook   Outlook  Outlook  Temp  Humidity  Windy  Windy  Play  Play
      overcast  rain     sunny                    TRUE   FALSE  yes   no
      0         0        1        85    85        0      1      0     1
      0         0        1        80    90        1      0      0     1
      1         0        0        83    78        0      1      1     0
      0         1        0        70    96        0      1      1     0
      0         1        0        68    80        0      1      1     0
      0         1        0        65    70        1      0      0     1
      1         0        0        64    65        1      0      1     0
      .         .        .        .     .         .      .      .     .
  • 108. Deriving Knowledge from Data at Scale
  • 109. Deriving Knowledge from Data at Scale
  • 110. Deriving Knowledge from Data at Scale Household income, from $10,000 to $200,000, binned as: very low, low, average, high, very high
  • 111. Deriving Knowledge from Data at Scale Fewer features, more discrimination ability; concept hierarchies
  • 112. Deriving Knowledge from Data at Scale • Equal-width (distance) partitioning: uniform grid • Equal-depth (frequency) partitioning • Class label based partitioning
  • 113. Deriving Knowledge from Data at Scale into the user-specified
  • 114. Deriving Knowledge from Data at Scale Temperature values: 64 65 68 69 70 71 72 72 75 75 80 81 83 85
      Bin      [64,67)  [67,70)  [70,73)  [73,76)  [76,79)  [79,82)  [82,85]
      Count    2        2        4        2        0        2        2
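    The equal-width counts for these temperature values can be reproduced with a short function. This is a sketch, not the lecture's tool: the name `equal_width_bins` is illustrative, and the choice of 7 bins is taken from the interval labels on the slide.

    ```python
    def equal_width_bins(values, n_bins):
        """Split [min, max] into n_bins intervals of equal width and count values."""
        lo, hi = min(values), max(values)
        width = (hi - lo) / n_bins
        counts = [0] * n_bins
        for v in values:
            # The maximum value belongs to the last (closed) interval.
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
        edges = [lo + i * width for i in range(n_bins + 1)]
        return edges, counts


    temps = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
    edges, counts = equal_width_bins(temps, 7)
    print(edges)
    print(counts)
    ```

    Equal-width bins can end up empty or very uneven, as the zero count for [76,79) shows.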
  • 115. Deriving Knowledge from Data at Scale Salary in a corporation, Count per bin: [0 – 200,000) … [1,800,000 – 2,000,000]: 1
  • 116. Deriving Knowledge from Data at Scale user-specified nFi number of intervals
  • 117. Deriving Knowledge from Data at Scale Temperature values: 64 65 68 69 70 71 72 72 75 75 80 81 83 85
      Bin      [64 .. 69]  [70 .. 72]  [73 .. 81]  [83 .. 85]
      Count    4           4           4           2
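    Equal-depth (equal-frequency) binning of the same temperature values can be sketched by sorting and cutting into chunks of a fixed size. The depth of 4 is read off the counts on the slide; the function name is illustrative, and this simple version does not try to keep tied values in the same bin.

    ```python
    def equal_depth_bins(values, depth):
        """Sort the values and cut them into consecutive bins of `depth` items."""
        ordered = sorted(values)
        return [ordered[i:i + depth] for i in range(0, len(ordered), depth)]


    temps = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
    bins = equal_depth_bins(temps, 4)

    for b in bins:
        print(b[0], "..", b[-1], "count:", len(b))
    ```

    Each bin now holds roughly the same number of values, at the cost of intervals with unequal widths.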
  • 118. Deriving Knowledge from Data at Scale
  • 119. Deriving Knowledge from Data at Scale 4/12/2016 University of Waikato 119
  • 140. Deriving Knowledge from Data at Scale
  • 141. Deriving Knowledge from Data at Scale Domain expertise, play a hunch in terms of feature discrimination
  • 142. Deriving Knowledge from Data at Scale That’s all for tonight…