Class summary
BigML, Inc 2
Day 2 – Morning sessions
Basic transformations
Poul Petersen
Expectations: Any data is always ML-ready
Reality: ML-ready data needs work!!!
What does ML-ready mean?
● Machine Learning algorithms consume instances of the problem that you want to model. Each row must describe one instance and each column one property of that instance
● Fields can be:
– already present in your data
– derived from your data
– generated using other fields
ML-ready steps
● Select the right model for the problem you want to solve: classification, regression, cluster analysis, anomaly detection, association discovery
● Perform cleansing, denormalizing, aggregating, pivoting, and other data wrangling tasks to generate a collection of instances relevant to the problem at hand. Finally, use a very common format as the output format: CSV
● Choose the right format to store each type of feature in a field
● Feature engineering: using domain knowledge and Machine Learning expertise, generate explicit features that help to better represent the instances (Flatline)
Preprocessing data
Cleansing: Homogenize missing values and different types in the same feature, fix input errors, correct semantic issues, etc.
Denormalizing: Data is usually normalized in relational databases; ML-ready datasets need the information denormalized in a single file/dataset.
Aggregation: When data is stored as individual transactions, as in log files, an aggregation to get the entity might be needed.
Pivoting: Different values of a feature are pivoted to new columns in the result dataset.
Regular time windows: Create new features using values over different periods of time.
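As a rough illustration of aggregation and pivoting, the sketch below turns a transaction log into one row per entity. The field names (customer, category, amount) and the function aggregate_and_pivot are made up for this example; it is a minimal sketch in plain Python, not the way any particular tool does it.

```python
# Hypothetical transaction log: one row per purchase, not per customer.
transactions = [
    {"customer": "c1", "category": "books", "amount": 12.0},
    {"customer": "c1", "category": "music", "amount": 8.0},
    {"customer": "c2", "category": "books", "amount": 30.0},
    {"customer": "c1", "category": "books", "amount": 5.0},
]

def aggregate_and_pivot(rows, entity, pivot_field, value_field):
    """Return one row per entity with a count, a total, and one column
    per distinct value of pivot_field holding the summed value_field."""
    pivot_values = sorted({row[pivot_field] for row in rows})
    table = {}
    for row in rows:
        key = row[entity]
        if key not in table:
            # First time we see this entity: start all aggregates at zero.
            table[key] = {entity: key, "n_transactions": 0, "total": 0.0}
            for value in pivot_values:
                table[key][f"{value_field}_{value}"] = 0.0
        out = table[key]
        out["n_transactions"] += 1                                  # aggregation
        out["total"] += row[value_field]                            # aggregation
        out[f"{value_field}_{row[pivot_field]}"] += row[value_field]  # pivoting
    return [table[key] for key in sorted(table)]

instances = aggregate_and_pivot(transactions, "customer", "category", "amount")
```

Each element of instances now describes one customer, which is the shape a Machine Learning algorithm expects when the customer is the instance to be modeled.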
Feature engineering
For numeric features:
– Discretization: percentiles, within percentiles, groups
– Replacement
– Normalization
– Exponentiation
– Shocks (speed of change compared to stdev)
For text features:
– Misspellings
– Length
– Number of subordinate sentences
– Language
– Levenshtein distance
Stacking
Compute a field using non-linear combinations of other fields
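Two of the items above can be sketched in a few lines of plain Python: the Levenshtein distance between two strings, and discretization of a number into quartiles. The function names are illustrative, and the quartile cut points come from the standard library's default quantile method, which is only one of several reasonable choices.

```python
import statistics

def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def quartile_bin(value: float, values: list) -> int:
    """Discretize value into quartile 1..4 using cut points from values."""
    cuts = statistics.quantiles(values, n=4)  # three cut points
    return 1 + sum(value > cut for cut in cuts)
```

For instance, levenshtein("kitten", "sitting") counts the edits between a misspelled value and a reference spelling, and quartile_bin(60, list(range(1, 101))) places 60 in the third quartile of the numbers 1 to 100.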
Holistic approach
● Define a clear idea of the goal.
● Understand what ML tasks will achieve the goal.
● Understand the data structure to perform those ML tasks.
● Find out what kind of data you have and make it ML-ready
– where is it, how is it stored?
– what are the features?
– can you access it programmatically?
● Feature Engineering: transform the data you have into the data you actually need.
● Evaluate: Try it on a small scale
● Accept that you might have to start over...
● But when it works, automate it!!!
Tools that help
Command line tools:
join, jq, awk, sed, sort, uniq
Automation:
Shell, Python, etc.
Talend
BigML: Flatline, bindings, BigMLer, API, WhizzML
Relational DB:
MySQL
Non-relational DB:
MongoDB
Feature Engineering
Charles Parker
Data + ML Algorithm, is that enough?
The ML algorithm only knows about the features in the dataset. Features can be useless to the algorithm if:
● They are not correlated with the objective to be predicted
● Their values change their meaning when combined with other features
For ML algorithms to work, there must be some kind of statistical relation between some of the features and the objective. Sometimes you must transform the available features to find such relations.
Feature engineering: the process of transforming raw data into Machine Learning-ready data
When do you need Feature Engineering?
● When the relationship between the feature and the objective is mathematically unsatisfying
● When the relationship of a function of two or more features with the objective is far more relevant than the one of the original features
● When there is missing data
● When the data is time-series, especially when the previous time period’s objective is known
● When the data can’t be used for machine learning in the obvious way (e.g., timestamps, text data)
Mathematical transformations
● Statistical aggregations (group by, all and all-but)
● Better categories
– too many detailed categories should be avoided
– ordered categories can be translated to numeric values. The model will be able to extract more information by partitioning the ordered number range
● Binning or discretization: consider whether your number is more informative in ranges (quartiles, deciles, percentiles), even for the objective field
● Linearization: not important for decision trees but can be for logistic regression (watch out for exponential distributions)
Missing data
● Missing value imputation (replace missing values with common values: mean, median, mode, or even the prediction of a Machine Learning model)
● The presence of missing values can be informative, so it can be added as a new feature
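The two missing-data ideas can be combined in one pass, as in the minimal sketch below. It assumes missing values are represented as None and that rows are plain dictionaries; the helper name impute_with_indicator and the "_missing" suffix are choices made for this example.

```python
import statistics

def impute_with_indicator(rows, field, strategy="median"):
    """Return copies of rows where missing values (None) in field are
    replaced by the mean, median, or mode of the observed values, plus a
    flag column recording whether the value was originally missing."""
    observed = [row[field] for row in rows if row[field] is not None]
    fill = {"mean": statistics.mean,
            "median": statistics.median,
            "mode": statistics.mode}[strategy](observed)
    result = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        new_row[f"{field}_missing"] = row[field] is None
        if row[field] is None:
            new_row[field] = fill
        result.append(new_row)
    return result

rows = [{"age": 30}, {"age": None}, {"age": 50}, {"age": 40}]
filled = impute_with_indicator(rows, "age")
```

When a model is evaluated on held-out data, the fill value would normally be computed from the training rows only and then reused, so that test data does not influence the replacement.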
Time-series transformations
● Better objective (percent change instead of absolute values)
● Deltas from previous reference time points
● Deltas from moving average (time windows)
● Recent volatility...
Problem: Exponential explosion of possible transformations
Caveats:
● The regularity in time of the points has to match your training data
● You have to keep track of past points to compute your windows
● It is really easy to get information leakage by including your objective in a window computation (and it can be very hard to detect)!
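A minimal sketch of these transformations, written so that every feature for time t is computed only from points strictly before t, which is one way to avoid the leakage described above. The function name make_rows, the feature names, and the window size are assumptions for illustration, and the series is assumed to be regularly spaced with non-zero values.

```python
def make_rows(series, window=3):
    """Build one training row per time point t. Features use only points
    strictly before t; the objective is the percent change at t."""
    rows = []
    for t in range(window + 1, len(series)):
        history = series[:t]   # everything known before time t
        last = history[-1]
        # Moving average of the `window` points that precede `last`.
        moving_avg = sum(history[-1 - window:-1]) / window
        rows.append({
            "last_delta": last - history[-2],
            "last_delta_from_ma": last - moving_avg,
            "objective_pct_change": (series[t] - last) / last,
        })
    return rows

prices = [10.0, 12.0, 11.0, 13.0, 16.0, 15.0]
rows = make_rows(prices, window=3)
```

Note that series[t] appears only in the objective column. Moving it into any window computation would leak the answer into the features.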
Date-time features
● Cannot be used “as is” in a model: a date-time is a collection of features. BigML is able to decompose them automatically when they are provided in the most usual formats. With Flatline, you can decompose them all.
● Date-time predicates that the computer does not know (some of them domain dependent): Working hours? Daylight? Is rush hour?...
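Outside BigML, the same decomposition can be sketched with Python's standard library. The working-hours and rush-hour rules below are placeholder assumptions that a real project would replace with its own domain definitions.

```python
from datetime import datetime

def datetime_features(timestamp: str) -> dict:
    """Decompose an ISO 8601 timestamp into model-friendly fields and add
    two domain predicates. The working-hours and rush-hour definitions are
    illustrative assumptions, not universal rules."""
    moment = datetime.fromisoformat(timestamp)
    weekday = moment.weekday()  # Monday is 0, Sunday is 6
    is_weekend = weekday >= 5
    return {
        "year": moment.year,
        "month": moment.month,
        "day": moment.day,
        "hour": moment.hour,
        "weekday": weekday,
        "is_weekend": is_weekend,
        "is_working_hours": not is_weekend and 9 <= moment.hour < 17,
        "is_rush_hour": not is_weekend and moment.hour in (7, 8, 17, 18),
    }

features = datetime_features("2016-09-09T08:30:00")
```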
Text features
● Bag of words: a new feature is associated with each word in the document
● Tokenization: how do we select tokens? Do we want n-grams? What about numbers?
● Stemming: grouping forms of the same word into a single term
● Length
● Text predicates: Dollar amounts? Dates? Salutations? Please and Thank you?
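The sketch below shows one possible set of answers to the tokenization questions: lowercase word tokens, numbers dropped, bigrams added, plus length and two text predicates. Stemming is left out. All names and regular expressions are illustrative assumptions.

```python
import re
from collections import Counter

def tokenize(text: str, keep_numbers: bool = False) -> list:
    """Lowercase the text and split it into word tokens; purely numeric
    tokens are dropped unless keep_numbers is set."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if not keep_numbers:
        tokens = [tok for tok in tokens if not tok.isdigit()]
    return tokens

def ngrams(tokens: list, n: int) -> list:
    """Join every run of n consecutive tokens into one n-gram string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def text_features(text: str) -> dict:
    """Bag of words over unigrams and bigrams, plus length and predicates."""
    tokens = tokenize(text)
    counts = Counter(tokens + ngrams(tokens, 2))
    features = {f"word_{term}": count for term, count in counts.items()}
    features["length"] = len(text)
    features["has_dollar_amount"] = bool(re.search(r"\$\s?\d", text))
    features["says_thank_you"] = "thank you" in text.lower()
    return features

doc = "Thank you! Please send $25 by Friday, thank you."
feats = text_features(doc)
```

Because numbers are dropped before the bigrams are built, "send" and "by" end up adjacent. Whether that is desirable is exactly the kind of decision the tokenization bullet asks about.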
Machine Learning for Feature engineering
Latent Dirichlet Allocation
• Learn word distributions for topics
• Infer topic scores for each document
• Use the topic scores as features to a model (dimensionality reduction)
Distance to cluster centroids
Stacked Generalization: Classifiers provide new features
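Distance to cluster centroids is the simplest of these to sketch. The example assumes the centroids have already been produced by some clustering step; here they are hard-coded, hypothetical values, and the feature names are made up.

```python
import math

def centroid_distance_features(point, centroids):
    """Return the Euclidean distance from point to each centroid, plus the
    index of the nearest one, as extra features for a downstream model."""
    distances = [math.dist(point, centroid) for centroid in centroids]
    features = {f"dist_centroid_{i}": d for i, d in enumerate(distances)}
    features["nearest_centroid"] = distances.index(min(distances))
    return features

# Hypothetical centroids, e.g. taken from a previously trained clustering.
centroids = [(0.0, 0.0), (3.0, 4.0)]
feats = centroid_distance_features((3.0, 0.0), centroids)
```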
Day 2 – Evening sessions
REST API, bindings and basic workflows
jao (José Antonio Ortega)
What do Machine Learning workflows look like?
Academics vs. real world
We need high-level tools to face real-world workflows by growing in:
● Automation
● Abstraction
The foundations
● REST API-first applications: Standards in software development. First level of abstraction.
Client side tools
● Web UI: Sitting on top of the REST API. Human-friendly access and visualizations for all the Machine Learning resources. Workflows must be defined and executed step by step. Second level of abstraction.
● Bindings: Sitting on top of the REST API. Fine-grained accessors for the REST API calls. Workflows must be defined and executed step by step. Second level of abstraction.
● BigMLer: Relying on the bindings. High-level syntax. Entire workflows can be created in only one command line. Third level of abstraction.
BigMLer automation
● Basic 1-click workflows in one command line
● Rich parameterized workflows: feature selection, cross-validation, etc.
● Models are downloaded to your laptop, tablet, cell phone, etc. once and can be used offline to create predictions (great for local predictions)
Still...
Problems of client-side solutions
● Complexity: Lots of details outside the problem domain
● Reuse: No inter-language compatibility
● Scalability: Client-side workflows hard to optimize
● Extensibility: BigMLer hides complexity at the cost of flexibility
● Not enough abstraction
WhizzML
Solution: bringing automation and abstraction to the server side
● DSL for ML workflow automation
● Framework for scalable, remote execution of ML workflows
Sophisticated server-side optimization
Out-of-the-box scalability
Client-server brittleness removed
Infrastructure for creating and sharing ML scripts and libraries
WhizzML's new REST API resources:
Scripts: Executable code that describes an actual workflow, taking a list of typed inputs and producing a list of outputs.
Executions: Given a script and a complete set of inputs, the workflow can be executed and its outputs generated.
Libraries: A collection of WhizzML definitions that can be imported by other libraries or scripts.
Scripts
Creating scripts
● Usable by any binding (from any language)
● Built-in parallelization
● BigML resources management as primitives of the language
● Complete programming language for workflow definition
Using scripts
Web UI
Bindings
BigMLer
WhizzML
Advanced WhizzML workflows
Charles Parker
WhizzML offers:
● Primitives for all ML resources (datasets, models, clusters, etc.)
● A complete programming language to compose these ML resources at will.
● Parallelization and scalability built in.
This empowers the user to benefit from:
● Automated feature engineering: Best-first feature selection.
● Automated configuration choice: Randomized parameter optimization, SMACdown.
● Complex algorithms as 1-click: Stacked generalization, Boosting.
All of them can be shared, reproduced and reused as one more BigML resource in a language-agnostic way.
Best-first feature selection
[Figure: the set of selected fields starts empty, (). In Step 1, models are built for the candidate fields f1 ... fn, and the best score is obtained for the model with (f5). In Step 2, models are built with f5 plus each remaining field, and the best score is obtained for the model with (f5 f7).]
Following iterations don't improve the score for the model with (f5 f7), so the process stops
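The procedure in the figure can be sketched as a greedy loop. This is plain Python rather than WhizzML, and the score function is a toy stand-in for "train a model on these fields and evaluate it"; the function names and the toy weights are assumptions for illustration only.

```python
def best_first_selection(fields, score):
    """Greedy forward selection: at each step add the single field that
    gives the best score, and stop when no addition improves it.
    `score` maps a tuple of field names to a number (higher is better)."""
    selected, best = (), float("-inf")
    remaining = list(fields)
    while remaining:
        candidates = [(score(selected + (f,)), f) for f in remaining]
        step_score, step_field = max(candidates)
        if step_score <= best:
            break  # no candidate improves on the current selection
        selected, best = selected + (step_field,), step_score
        remaining.remove(step_field)
    return selected, best

# Toy stand-in for "train and evaluate a model on these fields":
# f5 and f7 carry signal, every extra field costs a small penalty.
useful = {"f5": 0.6, "f7": 0.2}

def toy_score(subset):
    return sum(useful.get(f, 0.0) for f in subset) - 0.01 * len(subset)

selected, best = best_first_selection(["f1", "f5", "f7", "fn"], toy_score)
```

With this toy score the loop picks f5 first, then f7, and stops at the third step because adding f1 or fn lowers the score, mirroring the sequence in the figure.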
Stacked generalization
[Figure: 50% of the data is held out.]
A new dataset is generated with the predictions for the hold-out data
A new metamodel is created from this dataset
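A minimal sketch of the same flow in plain Python, under strong simplifying assumptions: the base models are one-feature threshold rules, the metamodel is a lookup table over base predictions, and the data is a tiny hand-made list. None of these choices come from the slides; they only make the two-stage structure visible.

```python
from collections import Counter, defaultdict
from statistics import mean

def train_stump(rows, feature):
    """Base model: compare the feature with the midpoint between the two
    class means seen in training, and predict the class on that side."""
    pos = mean(r[feature] for r in rows if r["label"])
    neg = mean(r[feature] for r in rows if not r["label"])
    threshold = (pos + neg) / 2
    return lambda row: (row[feature] > threshold) == (pos > neg)

def train_metamodel(pred_rows):
    """Metamodel: majority label for each combination of base predictions."""
    votes = defaultdict(Counter)
    for preds, label in pred_rows:
        votes[preds][label] += 1
    overall = Counter(label for _, label in pred_rows).most_common(1)[0][0]
    table = {preds: c.most_common(1)[0][0] for preds, c in votes.items()}
    return lambda preds: table.get(preds, overall)

def stack(rows, features):
    half = len(rows) // 2
    train, holdout = rows[:half], rows[half:]  # 50% hold out
    base = [train_stump(train, f) for f in features]
    # New dataset: base-model predictions for the hold-out data.
    pred_rows = [(tuple(m(r) for m in base), r["label"]) for r in holdout]
    meta = train_metamodel(pred_rows)
    return lambda row: meta(tuple(m(row) for m in base))

rows = [
    {"x1": 1, "x2": 9, "label": False}, {"x1": 2, "x2": 8, "label": False},
    {"x1": 8, "x2": 2, "label": True}, {"x1": 9, "x2": 1, "label": True},
    {"x1": 1.5, "x2": 8.5, "label": False}, {"x1": 8.5, "x2": 1.5, "label": True},
    {"x1": 2.5, "x2": 7, "label": False}, {"x1": 7, "x2": 2.5, "label": True},
]
model = stack(rows, ["x1", "x2"])
```

The key point is that the metamodel never sees the rows the base models were trained on, only their predictions for the held-out half.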
Randomized parameter optimization
[Figure: a configuration random generator proposes configurations; each one is evaluated and the best score is kept.]
Process stops when you reach the expected performance or the user-given iterations limit
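The loop and its two stopping conditions can be sketched as follows. The search space (depth, rate) and the evaluate function are hypothetical stand-ins for "build a model with this configuration and score it"; only the control flow is meant to reflect the slide.

```python
import random

def random_search(evaluate, sample_config, target_score, max_iterations, seed=0):
    """Sample random configurations and keep the best one. Stops when the
    target score is reached or the iteration limit is exhausted."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for iteration in range(1, max_iterations + 1):
        config = sample_config(rng)
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
        if best_score >= target_score:
            break  # expected performance reached
    return best_config, best_score, iteration

# Hypothetical search space and a toy objective standing in for
# "train a model with this configuration and evaluate it".
def sample_config(rng):
    return {"depth": rng.randint(1, 10), "rate": rng.uniform(0.01, 1.0)}

def evaluate(config):
    return 1.0 - abs(config["depth"] - 6) / 10 - abs(config["rate"] - 0.3)

config, score, used = random_search(evaluate, sample_config,
                                    target_score=0.9, max_iterations=200)
```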
SMACdown
[Figure: a configuration random generator feeding a model of performances.]
New configurations are filtered according to the predictions of the model of performances
Only promising configurations are analyzed
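The filtering idea can be sketched with a deliberately crude model of performances. Here a 1-nearest-neighbour lookup over past evaluations stands in for whatever performance model the actual script uses, and a configuration is a single number; both are assumptions made to keep the example short.

```python
import random

def surrogate_filtered_search(evaluate, sample, n_rounds=20, n_candidates=30,
                              n_keep=3, seed=0):
    """Each round, draw random candidates, rank them with a cheap model of
    performance built from past evaluations, and evaluate only the top few."""
    rng = random.Random(seed)
    history = []  # (config, score) pairs that were actually evaluated

    def predicted(config):
        # 1-nearest-neighbour surrogate: score of the closest evaluated config.
        nearest = min(history, key=lambda h: abs(h[0] - config))
        return nearest[1]

    # Seed the surrogate with a few real evaluations.
    for _ in range(n_keep):
        config = sample(rng)
        history.append((config, evaluate(config)))
    for _ in range(n_rounds):
        candidates = [sample(rng) for _ in range(n_candidates)]
        # Only the most promising candidates get a real (expensive) evaluation.
        promising = sorted(candidates, key=predicted, reverse=True)[:n_keep]
        history.extend((c, evaluate(c)) for c in promising)
    return max(history, key=lambda h: h[1]), len(history)

# Toy problem: a single numeric parameter whose best value is 7.
def sample(rng):
    return rng.uniform(0.0, 10.0)

def evaluate(x):
    return -(x - 7.0) ** 2

best, n_evaluations = surrogate_filtered_search(evaluate, sample)
```

Out of 600 generated candidates, only 63 are evaluated for real (3 to seed the surrogate, then 3 per round), which is the saving the slide points at.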
Boosting
[Figure: a sequence of boosting iterations labeled T0/F0, T1/F1, T2/F2, ..., T8/F8.]
The final model is an ensemble of models
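The slide does not say which boosting variant is used, so the sketch below swaps in AdaBoost with one-dimensional threshold stumps as a well-known instance of the idea. Labels are +1 and -1, and the data is a toy set that no single stump can separate.

```python
import math

def train_adaboost(xs, ys, rounds=3):
    """AdaBoost with one-dimensional threshold stumps. Labels are +1/-1.
    Returns a list of (alpha, threshold, polarity) weak models."""
    n = len(xs)
    weights = [1.0 / n] * n
    thresholds = sorted(set(xs))
    ensemble = []
    for _ in range(rounds):
        best = None
        # Pick the stump with the lowest weighted error.
        for thr in thresholds:
            for polarity in (1, -1):
                preds = [polarity if x >= thr else -polarity for x in xs]
                error = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
                if best is None or error < best[0]:
                    best = (error, thr, polarity, preds)
        error, thr, polarity, preds = best
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, thr, polarity))
        # Increase the weight of misclassified points, then renormalize.
        weights = [w * math.exp(-alpha * y * p)
                   for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps in the ensemble."""
    vote = sum(alpha * (pol if x >= thr else -pol)
               for alpha, thr, pol in ensemble)
    return 1 if vote >= 0 else -1

xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = train_adaboost(xs, ys, rounds=3)
```

Each stump alone misclassifies at least two points, but the weighted vote of the three stumps classifies all six training points correctly, which is what "the final model is an ensemble of models" amounts to.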
Script it once, for everybody anywhere
Publish scripts in the gallery
Add scripts to your menus

More Related Content

What's hot

Feature Engineering
Feature Engineering Feature Engineering
Feature Engineering
odsc
 
VSSML17 Review. Summary Day 2 Sessions
VSSML17 Review. Summary Day 2 SessionsVSSML17 Review. Summary Day 2 Sessions
VSSML17 Review. Summary Day 2 Sessions
BigML, Inc
 
BSSML16 L10. Summary Day 2 Sessions
BSSML16 L10. Summary Day 2 SessionsBSSML16 L10. Summary Day 2 Sessions
BSSML16 L10. Summary Day 2 Sessions
BigML, Inc
 
Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018
HJ van Veen
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle Competitions
Mark Peng
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to Stacking
Ted Xiao
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
Benjamin Bengfort
 
Building a performing Machine Learning model from A to Z
Building a performing Machine Learning model from A to ZBuilding a performing Machine Learning model from A to Z
Building a performing Machine Learning model from A to Z
Charles Vestur
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
HJ van Veen
 
Modelling and evaluation
Modelling and evaluationModelling and evaluation
Modelling and evaluation
eShikshak
 
Overview of Machine Learning and Feature Engineering
Overview of Machine Learning and Feature EngineeringOverview of Machine Learning and Feature Engineering
Overview of Machine Learning and Feature Engineering
Turi, Inc.
 
Evolutionary Design of Swarms (SSCI 2014)
Evolutionary Design of Swarms (SSCI 2014)Evolutionary Design of Swarms (SSCI 2014)
Evolutionary Design of Swarms (SSCI 2014)
Benjamin Bengfort
 
What is Machine Learning
What is Machine LearningWhat is Machine Learning
What is Machine Learning
Bhaskara Reddy Sannapureddy
 
Understanding Basics of Machine Learning
Understanding Basics of Machine LearningUnderstanding Basics of Machine Learning
Understanding Basics of Machine Learning
Pranav Ainavolu
 
Automated Machine Learning (Auto ML)
Automated Machine Learning (Auto ML)Automated Machine Learning (Auto ML)
Automated Machine Learning (Auto ML)
Hayim Makabee
 
Robust and declarative machine learning pipelines for predictive buying at Ba...
Robust and declarative machine learning pipelines for predictive buying at Ba...Robust and declarative machine learning pipelines for predictive buying at Ba...
Robust and declarative machine learning pipelines for predictive buying at Ba...
Gianmario Spacagna
 
Linear regression on 1 terabytes of data? Some crazy observations and actions
Linear regression on 1 terabytes of data? Some crazy observations and actionsLinear regression on 1 terabytes of data? Some crazy observations and actions
Linear regression on 1 terabytes of data? Some crazy observations and actions
Hesen Peng
 
Explainable Machine Learning (Explainable ML)
Explainable Machine Learning (Explainable ML)Explainable Machine Learning (Explainable ML)
Explainable Machine Learning (Explainable ML)
Hayim Makabee
 
Towards a Comprehensive Machine Learning Benchmark
Towards a Comprehensive Machine Learning BenchmarkTowards a Comprehensive Machine Learning Benchmark
Towards a Comprehensive Machine Learning Benchmark
Turi, Inc.
 
Graph Based Machine Learning with Applications to Media Analytics
Graph Based Machine Learning with Applications to Media AnalyticsGraph Based Machine Learning with Applications to Media Analytics
Graph Based Machine Learning with Applications to Media Analytics
NYC Predictive Analytics
 

What's hot (20)

Feature Engineering
Feature Engineering Feature Engineering
Feature Engineering
 
VSSML17 Review. Summary Day 2 Sessions
VSSML17 Review. Summary Day 2 SessionsVSSML17 Review. Summary Day 2 Sessions
VSSML17 Review. Summary Day 2 Sessions
 
BSSML16 L10. Summary Day 2 Sessions
BSSML16 L10. Summary Day 2 SessionsBSSML16 L10. Summary Day 2 Sessions
BSSML16 L10. Summary Day 2 Sessions
 
Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle Competitions
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to Stacking
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
 
Building a performing Machine Learning model from A to Z
Building a performing Machine Learning model from A to ZBuilding a performing Machine Learning model from A to Z
Building a performing Machine Learning model from A to Z
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
 
Modelling and evaluation
Modelling and evaluationModelling and evaluation
Modelling and evaluation
 
Overview of Machine Learning and Feature Engineering
Overview of Machine Learning and Feature EngineeringOverview of Machine Learning and Feature Engineering
Overview of Machine Learning and Feature Engineering
 
Evolutionary Design of Swarms (SSCI 2014)
Evolutionary Design of Swarms (SSCI 2014)Evolutionary Design of Swarms (SSCI 2014)
Evolutionary Design of Swarms (SSCI 2014)
 
What is Machine Learning
What is Machine LearningWhat is Machine Learning
What is Machine Learning
 
Understanding Basics of Machine Learning
Understanding Basics of Machine LearningUnderstanding Basics of Machine Learning
Understanding Basics of Machine Learning
 
Automated Machine Learning (Auto ML)
Automated Machine Learning (Auto ML)Automated Machine Learning (Auto ML)
Automated Machine Learning (Auto ML)
 
Robust and declarative machine learning pipelines for predictive buying at Ba...
Robust and declarative machine learning pipelines for predictive buying at Ba...Robust and declarative machine learning pipelines for predictive buying at Ba...
Robust and declarative machine learning pipelines for predictive buying at Ba...
 
Linear regression on 1 terabytes of data? Some crazy observations and actions
Linear regression on 1 terabytes of data? Some crazy observations and actionsLinear regression on 1 terabytes of data? Some crazy observations and actions
Linear regression on 1 terabytes of data? Some crazy observations and actions
 
Explainable Machine Learning (Explainable ML)
Explainable Machine Learning (Explainable ML)Explainable Machine Learning (Explainable ML)
Explainable Machine Learning (Explainable ML)
 
Towards a Comprehensive Machine Learning Benchmark
Towards a Comprehensive Machine Learning BenchmarkTowards a Comprehensive Machine Learning Benchmark
Towards a Comprehensive Machine Learning Benchmark
 
Graph Based Machine Learning with Applications to Media Analytics
Graph Based Machine Learning with Applications to Media AnalyticsGraph Based Machine Learning with Applications to Media Analytics
Graph Based Machine Learning with Applications to Media Analytics
 

Viewers also liked

VSSML16 L6. Feature Engineering
VSSML16 L6. Feature EngineeringVSSML16 L6. Feature Engineering
VSSML16 L6. Feature Engineering
BigML, Inc
 
BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent...
BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent...BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent...
BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent...
BigML, Inc
 
BSSML16 L8. REST API, Bindings, and Basic Workflows
BSSML16 L8. REST API, Bindings, and Basic WorkflowsBSSML16 L8. REST API, Bindings, and Basic Workflows
BSSML16 L8. REST API, Bindings, and Basic Workflows
BigML, Inc
 
BSSML16 L4. Association Discovery and Topic Modeling
BSSML16 L4. Association Discovery and Topic ModelingBSSML16 L4. Association Discovery and Topic Modeling
BSSML16 L4. Association Discovery and Topic Modeling
BigML, Inc
 
BSSML16 L6. Basic Data Transformations
BSSML16 L6. Basic Data TransformationsBSSML16 L6. Basic Data Transformations
BSSML16 L6. Basic Data Transformations
BigML, Inc
 
API, WhizzML and Apps
API, WhizzML and AppsAPI, WhizzML and Apps
API, WhizzML and Apps
BigML, Inc
 
Web UI, Algorithms, and Feature Engineering
Web UI, Algorithms, and Feature Engineering Web UI, Algorithms, and Feature Engineering
Web UI, Algorithms, and Feature Engineering
BigML, Inc
 
BigML Fall 2016 Release
BigML Fall 2016 ReleaseBigML Fall 2016 Release
BigML Fall 2016 Release
BigML, Inc
 
BSSML16 L7. Feature Engineering
BSSML16 L7. Feature EngineeringBSSML16 L7. Feature Engineering
BSSML16 L7. Feature Engineering
BigML, Inc
 
Google TensorFlow Tutorial
Google TensorFlow TutorialGoogle TensorFlow Tutorial
Google TensorFlow Tutorial
台灣資料科學年會
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
Lars Marius Garshol
 

Viewers also liked (11)

VSSML16 L6. Feature Engineering
VSSML16 L6. Feature EngineeringVSSML16 L6. Feature Engineering
VSSML16 L6. Feature Engineering
 
BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent...
BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent...BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent...
BSSML16 L9. Advanced Workflows: Feature Selection, Boosting, Gradient Descent...
 
BSSML16 L8. REST API, Bindings, and Basic Workflows
BSSML16 L8. REST API, Bindings, and Basic WorkflowsBSSML16 L8. REST API, Bindings, and Basic Workflows
BSSML16 L8. REST API, Bindings, and Basic Workflows
 
BSSML16 L4. Association Discovery and Topic Modeling
BSSML16 L4. Association Discovery and Topic ModelingBSSML16 L4. Association Discovery and Topic Modeling
BSSML16 L4. Association Discovery and Topic Modeling
 
BSSML16 L6. Basic Data Transformations
BSSML16 L6. Basic Data TransformationsBSSML16 L6. Basic Data Transformations
BSSML16 L6. Basic Data Transformations
 
API, WhizzML and Apps
API, WhizzML and AppsAPI, WhizzML and Apps
API, WhizzML and Apps
 
Web UI, Algorithms, and Feature Engineering
Web UI, Algorithms, and Feature Engineering Web UI, Algorithms, and Feature Engineering
Web UI, Algorithms, and Feature Engineering
 
BigML Fall 2016 Release
BigML Fall 2016 ReleaseBigML Fall 2016 Release
BigML Fall 2016 Release
 
BSSML16 L7. Feature Engineering
BSSML16 L7. Feature EngineeringBSSML16 L7. Feature Engineering
BSSML16 L7. Feature Engineering
 
Google TensorFlow Tutorial
Google TensorFlow TutorialGoogle TensorFlow Tutorial
Google TensorFlow Tutorial
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
 

Similar to VSSML16 LR2. Summary Day 2

VSSML17 Review. Summary Day 1 Sessions
VSSML17 Review. Summary Day 1 SessionsVSSML17 Review. Summary Day 1 Sessions
VSSML17 Review. Summary Day 1 Sessions
BigML, Inc
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
Ivo Andreev
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine Learning
Yuriy Guts
 
Practical data science
Practical data sciencePractical data science
Practical data science
Ding Li
 
AI hype or reality
AI  hype or realityAI  hype or reality
AI hype or reality
Awantik Das
 
Deep Learning Vocabulary.docx
Deep Learning Vocabulary.docxDeep Learning Vocabulary.docx
Deep Learning Vocabulary.docx
jaffarbikat
 
MOPs & ML Pipelines on GCP - Session 6, RGDC
MOPs & ML Pipelines on GCP - Session 6, RGDCMOPs & ML Pipelines on GCP - Session 6, RGDC
MOPs & ML Pipelines on GCP - Session 6, RGDC
gdgsurrey
 
machine learning workflow with data input.pptx
machine learning workflow with data input.pptxmachine learning workflow with data input.pptx
machine learning workflow with data input.pptx
jasontseng19
 
Predictive Analytics Project in Automotive Industry
Predictive Analytics Project in Automotive IndustryPredictive Analytics Project in Automotive Industry
Predictive Analytics Project in Automotive Industry
Matouš Havlena
 
Citizen Data Science Training using KNIME
Citizen Data Science Training using KNIMECitizen Data Science Training using KNIME
Citizen Data Science Training using KNIME
Ali Raza Anjum
 
Aws autopilot
Aws autopilotAws autopilot
Aws autopilot
Vivek Raja P S
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docx
Shanmugasundaram M
 
AI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in MichelangeloAI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in Michelangelo
Alluxio, Inc.
 
Feature enginnering and selection
Feature enginnering and selectionFeature enginnering and selection
Feature enginnering and selection
Davis David
 
[DSC Europe 22] Smart approach in development and deployment process for vari...
[DSC Europe 22] Smart approach in development and deployment process for vari...[DSC Europe 22] Smart approach in development and deployment process for vari...
[DSC Europe 22] Smart approach in development and deployment process for vari...
DataScienceConferenc1
 
Production-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroProduction-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to hero
Daniel Marcous
 
Python and data analytics
Python and data analyticsPython and data analytics
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
Nick Pentreath
 
DutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision MakingDutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision Making
BigML, Inc
 
Anwar kamal .pdf.pptx
Anwar kamal .pdf.pptxAnwar kamal .pdf.pptx
Anwar kamal .pdf.pptx
Luminous8
 

Similar to VSSML16 LR2. Summary Day 2 (20)

VSSML17 Review. Summary Day 1 Sessions
VSSML17 Review. Summary Day 1 SessionsVSSML17 Review. Summary Day 1 Sessions
VSSML17 Review. Summary Day 1 Sessions
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine Learning
 
Practical data science
Practical data sciencePractical data science
Practical data science
 
AI hype or reality
AI  hype or realityAI  hype or reality
AI hype or reality
 
Deep Learning Vocabulary.docx
Deep Learning Vocabulary.docxDeep Learning Vocabulary.docx
Deep Learning Vocabulary.docx
 
MOPs & ML Pipelines on GCP - Session 6, RGDC
MOPs & ML Pipelines on GCP - Session 6, RGDCMOPs & ML Pipelines on GCP - Session 6, RGDC
MOPs & ML Pipelines on GCP - Session 6, RGDC
 
machine learning workflow with data input.pptx
machine learning workflow with data input.pptxmachine learning workflow with data input.pptx
machine learning workflow with data input.pptx
 
Predictive Analytics Project in Automotive Industry
Predictive Analytics Project in Automotive IndustryPredictive Analytics Project in Automotive Industry
Predictive Analytics Project in Automotive Industry
 
Citizen Data Science Training using KNIME
Citizen Data Science Training using KNIMECitizen Data Science Training using KNIME
Citizen Data Science Training using KNIME
 
Aws autopilot
Aws autopilotAws autopilot
Aws autopilot
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docx
 
AI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in MichelangeloAI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in Michelangelo
 
Feature enginnering and selection
Feature enginnering and selectionFeature enginnering and selection
Feature enginnering and selection
 
[DSC Europe 22] Smart approach in development and deployment process for vari...
[DSC Europe 22] Smart approach in development and deployment process for vari...[DSC Europe 22] Smart approach in development and deployment process for vari...
[DSC Europe 22] Smart approach in development and deployment process for vari...
 
Production-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroProduction-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to hero
 
Python and data analytics
Python and data analyticsPython and data analytics
Python and data analytics
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
DutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision MakingDutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision Making
 
Anwar kamal .pdf.pptx
Anwar kamal .pdf.pptxAnwar kamal .pdf.pptx
Anwar kamal .pdf.pptx
 

More from BigML, Inc

Digital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in ManufacturingDigital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in Manufacturing
BigML, Inc
 
DutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationDutchMLSchool 2022 - Automation
DutchMLSchool 2022 - Automation
BigML, Inc
 
VSSML16 LR2. Summary Day 2

  • 2. BigML, Inc 2 Day 2 – Morning sessions
  • 3. Basic transformations (Poul Petersen)
      Expectations: Any data is always ML-ready
      Reality: ML-ready data needs work!!!
      What does ML-ready mean?
      ● Machine Learning algorithms consume instances of the question that you want to model. Each row must describe one of the instances and each column a property of the instance
      ● Fields can be:
        – already present in your data
        – derived from your data
        – generated using other fields
  • 4. Basic transformations: ML-ready steps
      ● Select the right model for the problem you want to solve: classification, regression, cluster analysis, anomaly detection, association discovery
      ● Perform cleansing, denormalizing, aggregating, pivoting, and other data wrangling tasks to generate a collection of instances relevant to the problem at hand. Finally, output the result in a very common format: CSV
      ● Choose the right format to store each type of feature in a field
      ● Feature engineering: using domain knowledge and Machine Learning expertise, generate explicit features that help to better represent the instances (Flatline)
  • 5. Basic transformations: Preprocessing data
      Cleansing: Homogenize missing values and different types in the same feature, fix input errors, correct semantic issues, etc.
      Denormalizing: Data is usually normalized in relational databases; ML-ready datasets need the information denormalized in a single file/dataset.
      Aggregation: When data is stored as individual transactions, as in log files, an aggregation to get the entity might be needed.
      Pivoting: Different values of a feature are pivoted to new columns in the result dataset.
      Regular time windows: Create new features using values over different periods of time.
  • 6. Basic transformations: Feature engineering
      For numeric features:
        – Discretization: percentiles, within percentiles, groups
        – Replacement
        – Normalization
        – Exponentiation
        – Shocks (speed of change compared to stdev)
      For text features:
        – Misspellings
        – Length
        – Number of subordinate sentences
        – Language
        – Levenshtein distance
      Stacking
      Compute a field using non-linear combinations of other fields
  • 7. Basic transformations: Holistic approach
      ● Define a clear idea of the goal.
      ● Understand what ML tasks will achieve the goal.
      ● Understand the data structure needed to perform those ML tasks.
      ● Find out what kind of data you have and make it ML-ready
        – where is it, how is it stored?
        – what are the features?
        – can you access it programmatically?
      ● Feature engineering: transform the data you have into the data you actually need.
      ● Evaluate: try it on a small scale.
      ● Accept that you might have to start over...
      ● But when it works, automate it!!!
  • 8. Basic transformations: Tools that help
      Command line tools: join, jq, awk, sed, sort, uniq
      Automation: Shell, Python, etc.
      Talend
      BigML: flatline, bindings, bigmler, API, whizzml
      Relational Db: MySQL
      Non-Relational Db: MongoDB
  • 9. Feature Engineering (Charles Parker)
      Data + ML Algorithm, is that enough?
      The ML Algorithm only knows about the features in the dataset. Features can be useless to the algorithm if:
      ● They are not correlated to the objective to be predicted
      ● Their values change their meaning when combined with other features
      For ML Algorithms to work there must be some kind of statistical relation between some of the features and the objective. Sometimes, you must transform the available features to find such relations.
      Feature engineering: the process of transforming raw data into machine-learning-ready data
  • 10. Feature Engineering
      When do you need Feature Engineering?
      ● When the relationship between the feature and the objective is mathematically unsatisfying
      ● When the relationship of a function of two or more features with the objective is far more relevant than the one of the original features
      ● When there is missing data
      ● When the data is time-series, especially when the previous time period's objective is known
      ● When the data can't be used for machine learning in the obvious way (e.g., timestamps, text data)
  • 11. Feature Engineering
      Mathematical transformations
      ● Statistical aggregations (group by, all and all-but)
      ● Better categories
        – too many detailed categories should be avoided
        – ordered categories can be translated to numeric values. The model will be able to extract more information by partitioning the ordered number range
      ● Binning or discretization: consider whether your number is more informative in ranges (quartiles, deciles, percentiles), even for the objective field
      ● Linearization: not important for decision trees but can be for logistic regression (watch out for exponential distributions)
      Missing data
      ● Missing value induction (replace missing values with common values: mean, median, mode, or even the output of a Machine Learning model)
      ● The presence of missing values can be informative, so it can be added as a new feature
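The binning and missing-data items on slide 11 can be illustrated with a minimal sketch. It uses only the Python standard library; the function names and the `ages` sample are hypothetical and not taken from the slides, and mean replacement is just one of the options the slide lists.

```python
from statistics import mean, quantiles


def impute_with_flag(values):
    """Replace None with the mean of the observed values.

    Also returns a parallel list of booleans marking which entries
    were missing, so that missingness itself becomes a feature.
    """
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    filled = [fill if v is None else v for v in values]
    was_missing = [v is None for v in values]
    return filled, was_missing


def quartile_bin(values):
    """Map each number to a quartile index (0 to 3).

    The three cut points are computed from the data itself.
    """
    cuts = quantiles(values, n=4)
    return [sum(v > c for c in cuts) for v in values]


# Hypothetical sample column with two missing entries.
ages = [22, None, 35, 41, None, 58]
filled, was_missing = impute_with_flag(ages)
bins = quartile_bin(filled)
```

Keeping `was_missing` next to `filled` preserves the information that a value was absent, which a plain replacement would erase.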
  • 12. Feature Engineering
      Time-series transformations
      ● Better objective (percent change instead of absolute values)
      ● Deltas from previous reference time points
      ● Deltas from moving average (time windows)
      ● Recent volatility...
      Problem: Exponential explosion of possible transformations
      Caveats:
      ● The regularity in time of the points has to match your training data
      ● You have to keep track of past points to compute your windows
      ● It is really easy to get information leakage by including your objective in a window computation (and it can be very hard to detect)!
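A minimal sketch of the time-series features on slide 12, assuming a regularly sampled series held in a Python list. The function name, the window size and the `prices` sample are hypothetical. The point of the sketch is the leakage caveat: every input feature is computed from points strictly before `t`, and only the objective uses `series[t]`.

```python
def trailing_features(series, window=3):
    """Build one row per time point t >= window.

    Input features use only points before t; the objective is the
    percent change from t-1 to t. Requires window >= 2.
    """
    rows = []
    for t in range(window, len(series)):
        past = series[t - window:t]  # excludes series[t], so no leakage
        moving_avg = sum(past) / window
        prev = series[t - 1]
        rows.append({
            "t": t,
            "delta_prev": prev - series[t - 2],
            "delta_from_avg": prev - moving_avg,
            "target_pct_change": (series[t] - prev) / prev,
        })
    return rows


# Hypothetical, regularly spaced observations.
prices = [100.0, 102.0, 101.0, 105.0, 110.0]
rows = trailing_features(prices)
```

Changing the slice to `series[t - window:t + 1]` would silently include the objective in the moving average, which is the kind of leak the slide warns about.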
  • 13. Feature Engineering
      Date-time features
      ● Cannot be used "as is" in a model. A date-time is a collection of features. BigML is able to decompose them automatically when they are provided in the most usual formats. With Flatline, you can decompose them all.
      ● Date-time predicates that the computer does not know (some of them domain dependent): Working hours? Daylight? Is rush hour?...
      Text features
      ● Bag of words: a new feature is associated with each word in the document
      ● Tokenization: how do we select tokens? Do we want n-grams? What about numbers?
      ● Stemming: grouping forms of the same word in a unique term
      ● Length
      ● Text predicates: Dollar amounts? Dates? Salutations? Please and Thank you?
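The date-time decomposition and bag-of-words ideas on slide 13 can be sketched in plain Python. This is not how BigML or Flatline implement them; the function names are hypothetical, the 9:00 to 17:00 weekday definition of working hours is an assumption, and the tokenizer makes one arbitrary choice among those the slide asks about (lowercase alphabetic tokens only, numbers dropped, no n-grams, no stemming).

```python
import re
from collections import Counter
from datetime import datetime


def decompose(timestamp):
    """Split an ISO 8601 timestamp into separate numeric features."""
    dt = datetime.fromisoformat(timestamp)
    return {
        "year": dt.year,
        "month": dt.month,
        "day": dt.day,
        "hour": dt.hour,
        "weekday": dt.weekday(),  # Monday is 0
        # Assumed domain predicate: Monday to Friday, 09:00 to 16:59.
        "is_working_hours": dt.weekday() < 5 and 9 <= dt.hour < 17,
    }


def bag_of_words(text):
    """Count lowercase alphabetic tokens; digits and punctuation are dropped."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


features = decompose("2016-09-09T10:30:00")
counts = bag_of_words("Please send 20 dollars. Thank you, thank you!")
```

Each key of `features` and each word in `counts` would become its own column in an ML-ready dataset.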
  • 14. Feature Engineering
      Machine Learning for feature engineering
      Latent Dirichlet Allocation
      • Learn word distributions for topics
      • Infer topic scores for each document
      • Use the topic scores as features to a model (dimensionality reduction)
      Distance to cluster centroids
      Stacked Generalization: classifiers provide new features
  • 15. Day 2 – Evening sessions
  • 16. REST API, bindings and basic workflows (jao, José Antonio Ortega)
      What do Machine Learning workflows look like? (Academics vs. Real world)
      We need high-level tools to face real-world workflows by growing in:
      ● Automation
      ● Abstraction
  • 17. REST API, bindings and basic workflows
      The foundations
      ● REST API first applications: Standards in software development. First level of abstraction.
      Client-side tools
      ● Web UI: Sitting on top of the REST API. Human-friendly access and visualizations for all the Machine Learning resources. Workflows must be defined and executed step by step. Second level of abstraction.
      ● Bindings: Sitting on top of the REST API. Fine-grained accessors for the REST API calls. Workflows must be defined and executed step by step. Second level of abstraction.
      ● BigMLer: Relying on the bindings. High-level syntax. Entire workflows can be created in only one command line. Third level of abstraction.
  • 18. REST API, bindings and basic workflows
      BigMLer automation
      ● Basic 1-click workflows in one command line
      ● Rich parameterized workflows: feature selection, cross-validation, etc.
      ● Models are downloaded to your laptop, tablet, cell phone, etc. once and can be used offline to create predictions
      Still... Great for local predictions
  • 19. REST API, bindings and basic workflows
      Problems of client-side solutions
      ● Complexity: Lots of details outside the problem domain
      ● Reuse: No inter-language compatibility
      ● Scalability: Client-side workflows hard to optimize
      ● Extensibility: BigMLer hides complexity at the cost of flexibility
      ● Not enough abstraction
  • 20. REST API, bindings and basic workflows
      Solution: bringing automation and abstraction to the server side
      WhizzML
      ● DSL for ML workflow automation
      ● Framework for scalable, remote execution of ML workflows
      Sophisticated server-side optimization
      Out-of-the-box scalability
      Client-server brittleness removed
      Infrastructure for creating and sharing ML scripts and libraries
  • 21. REST API, bindings and basic workflows
      WhizzML's new REST API resources:
      Scripts: Executable code that describes an actual workflow, taking a list of typed inputs and producing a list of outputs.
      Executions: Given a script and a complete set of inputs, the workflow can be executed and its outputs generated.
      Libraries: A collection of WhizzML definitions that can be imported by other libraries or scripts.
  • 22. REST API, bindings and basic workflows
      Scripts
      Creating scripts
      ● Usable by any binding (from any language)
      ● Built-in parallelization
      ● BigML resources management as primitives of the language
      ● Complete programming language for workflow definition
      Using scripts: Web UI, Bindings, BigMLer, WhizzML
  • 23. Advanced WhizzML workflows (Charles Parker)
      WhizzML offers:
      ● Primitives for all ML resources (datasets, models, clusters, etc.)
      ● A complete programming language to compose these ML resources at will
      ● Parallelization and scalability built in
      This empowers the user to benefit from:
      ● Automated feature engineering: Best-first feature selection
      ● Automated configuration choice: Randomized parameter optimization, SMACdown
      ● Complex algorithms as 1-click: Stacked generalization, Boosting
      All of them can be shared, reproduced and reused as one more BigML resource in a language-agnostic way.
  • 24. Advanced WhizzML workflows: Best-first feature selection
      [Diagram: candidate fields f1 ... fn are evaluated at each step; selected fields grow from () to (f5) to (f5 f7)]
      Step 1: The best score is obtained for the model with (f5)
      Step 2: The best score is obtained for the model with (f5 f7)
      Following iterations don't improve the score for the model with (f5 f7), so the process stops
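The procedure on slide 24 can be sketched as a greedy loop: at each step, try adding every remaining field, keep the one that gives the best score, and stop when no addition improves on the current best. This is an illustrative Python sketch, not the WhizzML script itself; `score` stands in for "build a model on these fields and evaluate it", and `toy_score` with its `gain` table is a made-up stand-in so the example runs without any ML library.

```python
def best_first_selection(features, score):
    """Greedily add the feature that most improves score(subset).

    Stops as soon as no remaining feature improves the best score.
    """
    selected = []
    best = float("-inf")
    remaining = list(features)
    while remaining:
        # Evaluate every one-feature extension of the current subset.
        cand_score, cand = max((score(selected + [f]), f) for f in remaining)
        if cand_score <= best:
            break  # no improvement: the process stops
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected, best


# Hypothetical scoring: f5 and f7 help, every other field slightly hurts.
gain = {"f5": 0.6, "f7": 0.2}


def toy_score(subset):
    return sum(gain.get(f, -0.05) for f in subset)


fields = ["f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8"]
selected, best = best_first_selection(fields, toy_score)
```

With this toy scoring, the loop selects f5 first, then f7, and then stops because every further addition lowers the score, mirroring the two steps on the slide.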
  • 25. Advanced WhizzML workflows: Stacked generalization
      [Diagram: 50% of the data is held out]
      A new dataset is generated with the predictions for the hold-out data
      A new metamodel is created from this dataset
  • 26. Advanced WhizzML workflows: Randomized parameter optimization
      [Diagram: a configuration random generator produces configurations; the best score is kept]
      The process stops when you reach the expected performance or the user-given iteration limit
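Slide 26 describes a loop that draws random configurations, keeps the best score, and stops on either a performance target or an iteration limit. A minimal Python sketch of that loop follows. It is not the WhizzML implementation; `random_search`, the `space` dictionary and `toy_evaluate` are hypothetical names, and `toy_evaluate` replaces the real "train and evaluate a model with this configuration" step.

```python
import random


def random_search(evaluate, space, max_iterations=50, target=None, seed=0):
    """Sample random configurations and keep the best-scoring one.

    Stops at max_iterations, or earlier once the best score
    reaches `target` (if a target is given).
    """
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(max_iterations):
        cfg = {name: rng.choice(options) for name, options in space.items()}
        current = evaluate(cfg)
        if current > best_score:
            best_cfg, best_score = cfg, current
        if target is not None and best_score >= target:
            break
    return best_cfg, best_score


# Hypothetical search space and scoring function (higher is better).
space = {"depth": [2, 4, 6, 8], "trees": [10, 50, 100, 200]}


def toy_evaluate(cfg):
    return -abs(cfg["depth"] - 6) - abs(cfg["trees"] - 100) / 100


best_cfg, best_score = random_search(toy_evaluate, space, max_iterations=200)
```

SMACdown, on the next slide, refines this idea by filtering new configurations through a model of past performances before evaluating them.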
  • 27. Advanced WhizzML workflows: SMACdown
      [Diagram: a configuration random generator produces configurations]
      New configurations are filtered according to the predictions of the model of performances
      Only promising configurations are analyzed
  • 28. Advanced WhizzML workflows: Boosting
      [Diagram labels: T0, F0, T1, F1, T2, F2, ..., T8, F8]
      The final model is an ensemble of models
  • 29. Advanced WhizzML workflows
      Script it once, for everybody, anywhere
      Publish scripts in the gallery
      Add scripts to your menus