Managing
Data Science
Dr. David Martinez Rego
Big Data Spain 2016
Lead Data Science
• Leading Data Science is probably one of the most
exciting/fun positions that someone can have
nowadays
• Create computational algorithms that can take
decisions and learn from errors in any problem
that can be formulated in numbers
• There are many paths to find the treasure in your data;
the role of a Lead Data Scientist is to find the
shortest and safest one.
DS plan meeting!
Top weird moments
• I prefer not to give you any insight into the problem. Why
do you want to know what the columns are? I prefer you
treat the problem as just data.
• There exist labels. We do not have permission to access
them. You inspect the results and see if they make sense.
• We can give the screenshot of the dashboard and tell the
algorithm to predict if it will break.
• Your algorithm is wrong! I have been managing this for 10
years, it cannot be like that.
Other meetings!
Common language
• Because of its short public life,
Machine Learning lacks the
general understanding of its
fundamental limitations/
principles.
• Focus on practicality makes
literature/media oblivious to
these fundamentals.
• Only when we agree on a
common language can all parties
in a room start to
understand each other.
Learning theory
• Set of fundamental results that are behind many of
the common practices and algorithms we use
nowadays.
• Has been heavily researched since the 80s and
offers a set of mathematical guarantees/limitations
in the practice of ML
• Useful both for ML practitioners and managers as a
rule of thumb to understand and manage DS.
Domain
Dataset
Loss function
Hypothesis space
Training algorithm
Evaluation
The problem
No Free Lunch
How can we prevent such failures? 	

By using our prior knowledge about a specific learning task, to
avoid the distributions that will cause us to fail when learning
that task. Such prior knowledge can be expressed by
restricting our hypothesis class.
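The effect of restricting the hypothesis class can be illustrated on a toy problem. The sketch below is not from the talk: the 5-point domain, the train/test split, and the names `majority_learner` and `off_training_error` are all assumptions made for illustration. It averages the error of one fixed learner on unseen points, first over every possible labelling of the domain and then over a restricted class of constant labellings.

```python
from itertools import product

DOMAIN = range(5)          # toy domain of 5 points (assumption)
TRAIN = [0, 1, 2]          # points whose labels the learner sees
TEST = [3, 4]              # unseen points

def majority_learner(train_labels):
    """Predict the majority training label on every unseen point."""
    guess = int(sum(train_labels) * 2 > len(train_labels))
    return {x: guess for x in TEST}

def off_training_error(learner, targets):
    """Average error on unseen points over a collection of target functions."""
    total = 0.0
    for f in targets:
        preds = learner([f[x] for x in TRAIN])
        total += sum(preds[x] != f[x] for x in TEST) / len(TEST)
    return total / len(targets)

# Every possible binary labelling of the domain: no prior knowledge at all.
all_targets = list(product([0, 1], repeat=len(DOMAIN)))
# Prior knowledge: the true labelling is constant over the domain.
constant_targets = [(0,) * 5, (1,) * 5]

err_all = off_training_error(majority_learner, all_targets)
err_const = off_training_error(majority_learner, constant_targets)
print(err_all, err_const)
```

Averaged over all labellings, the learner does no better than chance on unseen points. Once the targets are restricted to a class that matches the learner's assumption, the same learner makes no errors.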
No Free Lunch takeaways
• No free lunch theorem is a mathematical certificate
• For managers & HR
• foresee an investment in a variety of specialists if you plan
to tackle an increasing number of data challenges
• escape from promises of one killer technique that acts as
a hammer for all problems
• For Data Science teams
• foresee an increasing number of specific techniques
which you have to keep up to date (team effort)
Generalisation bounds
• How can we be sure that a model will not fail in
production?
• How can we correct when things do not go well?
• How can I know if I am being wasteful?
Generalisation bounds
• A ML practitioner is going to train a model with
complexity d (VC-dimension), on m samples, and
she is going to observe an error Ls.
• The expected error when this model goes to production
is bounded, with probability at least 1-𝜹, in terms of Ls, d and m.
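One common textbook form of such a bound, in the style of Shalev-Shwartz and Ben-David's book, is sketched below. The exact expression shown in the talk is not recoverable from this text, so the form and the unspecified constant C are assumptions. Here L_D(h) denotes the true (production) error of the trained model h, and L_S(h) is the observed training error Ls.

```latex
L_{\mathcal{D}}(h) \;\le\; L_S(h) + C\,\sqrt{\frac{d + \log(1/\delta)}{m}}
\qquad \text{with probability at least } 1-\delta
```

The second term shrinks as the sample size m grows and increases with the complexity d.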
Manage DS
• How can we correct when things do not go well?
• Get a larger sample
• Change the hypothesis class by:
• Enlarging it
• Reducing it
• Completely changing it
• Changing the parameters you consider
• Change the feature representation of the data
• Change the optimisation algorithm used to apply your learning
rule
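The first remedies in this list act directly on the complexity term of a VC-style bound. A minimal sketch, assuming the textbook form sqrt((d + ln(1/𝜹)) / m) with its constant set to 1; the function name `complexity_gap` and the example values of d and m are hypothetical.

```python
import math

def complexity_gap(d, m, delta=0.05):
    """Illustrative gap between training and production error.

    Assumes the textbook form sqrt((d + ln(1/delta)) / m) with the
    unspecified constant set to 1; real bounds are looser.
    """
    return math.sqrt((d + math.log(1.0 / delta)) / m)

baseline = complexity_gap(d=50, m=10_000)
more_data = complexity_gap(d=50, m=40_000)       # get a larger sample
smaller_class = complexity_gap(d=10, m=10_000)   # reduce the hypothesis class
larger_class = complexity_gap(d=200, m=10_000)   # enlarge the hypothesis class

print(baseline, more_data, smaller_class, larger_class)
```

Enlarging the hypothesis class widens this gap, but it can still pay off when the training error Ls itself is too high, because a richer class may fit the data better.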
Big Data
• Big Data has had a significant impact on the number of samples m, and also
on the complexity d (VC-dimension).
• When tackling Variety by making use of unstructured data, we increase the
complexity d, so we should plan for an adequate sample size m.
• Review the modelling that we are doing to know if we need a big database.
• Is it the case that you do not need to maintain all that data?
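Planning the sample size m for a given complexity d can be sketched by inverting a VC-style complexity term. This is only a rough rule of thumb: it assumes the gap sqrt((d + ln(1/𝜹)) / m) with its constant set to 1, and the function name `samples_needed` and the example values of d are hypothetical.

```python
import math

def samples_needed(d, epsilon, delta=0.05):
    """Smallest m with sqrt((d + ln(1/delta)) / m) <= epsilon.

    The constant in front of the square root is assumed to be 1.
    """
    return math.ceil((d + math.log(1.0 / delta)) / epsilon ** 2)

# Hypothetical complexities for a structured and an unstructured model.
structured = samples_needed(d=20, epsilon=0.05)
unstructured = samples_needed(d=2_000, epsilon=0.05)

print(structured, unstructured)
```

If the data already held far exceeds the figure for the model actually in use, that is a hint that not all of it needs to be maintained.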
Half pie syndrome
• Symptoms
• You are spending a lot of
money on gathering
data to fuel growth in
your business
• Your systems look like
this pie, succulent but it
seems that your
business has lost
appetite.
Enough data?
Andrew Gelman (2005):
“Sample sizes are never large. If N is
too small to get a sufficiently-precise
estimate, you need to get more data
(or make more assumptions). But once
N is "large enough", you can start
subdividing the data to learn more. N
is never enough because if it were
"enough" you'd already be on to the
next problem for which you need more
data.”
Big data bounds
Alg. design #Data Engineering
Conclusions
• In order to build a better understanding between
data science teams and other stakeholders, we
need to make an effort to build a robust common
language!
• Learning theory, originally devised as the
fundamental theoretical pillar of ML, can help to
build that understanding.
• These proven basic laws can help you to have a
structured way to manage Data Science
References
• Shai Shalev-Shwartz and Shai Ben-David.
Understanding Machine Learning: From Theory to
Algorithms, 2014.
• Léon Bottou and Olivier Bousquet. The Tradeoffs of
Large Scale Learning. NIPS 2008
• SVM Optimization: Inverse dependence on dataset size.
ICML 2008
• Gelman, Andrew. N is never large enough, http://andrewgelman.com/2005/07/31/n_is_never_larg/
Managing
Data Science
Dr. David Martinez Rego
Big Data Spain 2016

More Related Content

What's hot

MLSEV Virtual. Automating Model Selection
MLSEV Virtual. Automating Model SelectionMLSEV Virtual. Automating Model Selection
MLSEV Virtual. Automating Model SelectionBigML, Inc
 
Innovation explained
Innovation explainedInnovation explained
Innovation explainedLeroy Yau
 
Design vs Data: Battle Royale (UX+Data Meetup)
Design vs Data: Battle Royale (UX+Data Meetup)Design vs Data: Battle Royale (UX+Data Meetup)
Design vs Data: Battle Royale (UX+Data Meetup)Jess Dale
 
Using Problem Solving Skills To Get A Job
Using Problem Solving Skills To Get A JobUsing Problem Solving Skills To Get A Job
Using Problem Solving Skills To Get A JobGary Clement
 
Artur Suchwalko “What are common mistakes in Data Science projects and how to...
Artur Suchwalko “What are common mistakes in Data Science projects and how to...Artur Suchwalko “What are common mistakes in Data Science projects and how to...
Artur Suchwalko “What are common mistakes in Data Science projects and how to...Lviv Startup Club
 
10 Tips From A Young Data Scientist
10 Tips From A Young Data Scientist10 Tips From A Young Data Scientist
10 Tips From A Young Data ScientistNuno Carneiro
 
Prevalence Of Spreadsheet Errors
Prevalence Of Spreadsheet ErrorsPrevalence Of Spreadsheet Errors
Prevalence Of Spreadsheet Errorshetupatel
 
The Seven Problem Solving Steps
The Seven Problem Solving StepsThe Seven Problem Solving Steps
The Seven Problem Solving StepsDeborah_W
 
Data Analysis Goes Wrong by Microsoft Sr PM
Data Analysis Goes Wrong by Microsoft Sr PMData Analysis Goes Wrong by Microsoft Sr PM
Data Analysis Goes Wrong by Microsoft Sr PMProduct School
 
MLSEV Virtual. Predictions
MLSEV Virtual. PredictionsMLSEV Virtual. Predictions
MLSEV Virtual. PredictionsBigML, Inc
 
The current state of prediction in neuroimaging
The current state of prediction in neuroimagingThe current state of prediction in neuroimaging
The current state of prediction in neuroimagingSaigeRutherford
 
Lesson 6 troubleshooting toolkit
Lesson 6 troubleshooting toolkitLesson 6 troubleshooting toolkit
Lesson 6 troubleshooting toolkitkeem773
 
Keys to Better Problem Solving
Keys to Better Problem SolvingKeys to Better Problem Solving
Keys to Better Problem SolvingMike Wicker
 
Figuring out the right metrics for your game
Figuring out the right metrics for your gameFiguring out the right metrics for your game
Figuring out the right metrics for your gameSaurav Sahu
 
Analytical Skills Tools and Attitudes 2013 Survey lavastorm analytics
Analytical Skills Tools and Attitudes 2013 Survey   lavastorm analyticsAnalytical Skills Tools and Attitudes 2013 Survey   lavastorm analytics
Analytical Skills Tools and Attitudes 2013 Survey lavastorm analyticsjjoseph100
 

What's hot (18)

MLSEV Virtual. Automating Model Selection
MLSEV Virtual. Automating Model SelectionMLSEV Virtual. Automating Model Selection
MLSEV Virtual. Automating Model Selection
 
Math early release
Math early releaseMath early release
Math early release
 
Quantum machine learning basics
Quantum machine learning basicsQuantum machine learning basics
Quantum machine learning basics
 
Innovation explained
Innovation explainedInnovation explained
Innovation explained
 
Design vs Data: Battle Royale (UX+Data Meetup)
Design vs Data: Battle Royale (UX+Data Meetup)Design vs Data: Battle Royale (UX+Data Meetup)
Design vs Data: Battle Royale (UX+Data Meetup)
 
Using Problem Solving Skills To Get A Job
Using Problem Solving Skills To Get A JobUsing Problem Solving Skills To Get A Job
Using Problem Solving Skills To Get A Job
 
Artur Suchwalko “What are common mistakes in Data Science projects and how to...
Artur Suchwalko “What are common mistakes in Data Science projects and how to...Artur Suchwalko “What are common mistakes in Data Science projects and how to...
Artur Suchwalko “What are common mistakes in Data Science projects and how to...
 
Preso slidedeck
Preso slidedeckPreso slidedeck
Preso slidedeck
 
10 Tips From A Young Data Scientist
10 Tips From A Young Data Scientist10 Tips From A Young Data Scientist
10 Tips From A Young Data Scientist
 
Prevalence Of Spreadsheet Errors
Prevalence Of Spreadsheet ErrorsPrevalence Of Spreadsheet Errors
Prevalence Of Spreadsheet Errors
 
The Seven Problem Solving Steps
The Seven Problem Solving StepsThe Seven Problem Solving Steps
The Seven Problem Solving Steps
 
Data Analysis Goes Wrong by Microsoft Sr PM
Data Analysis Goes Wrong by Microsoft Sr PMData Analysis Goes Wrong by Microsoft Sr PM
Data Analysis Goes Wrong by Microsoft Sr PM
 
MLSEV Virtual. Predictions
MLSEV Virtual. PredictionsMLSEV Virtual. Predictions
MLSEV Virtual. Predictions
 
The current state of prediction in neuroimaging
The current state of prediction in neuroimagingThe current state of prediction in neuroimaging
The current state of prediction in neuroimaging
 
Lesson 6 troubleshooting toolkit
Lesson 6 troubleshooting toolkitLesson 6 troubleshooting toolkit
Lesson 6 troubleshooting toolkit
 
Keys to Better Problem Solving
Keys to Better Problem SolvingKeys to Better Problem Solving
Keys to Better Problem Solving
 
Figuring out the right metrics for your game
Figuring out the right metrics for your gameFiguring out the right metrics for your game
Figuring out the right metrics for your game
 
Analytical Skills Tools and Attitudes 2013 Survey lavastorm analytics
Analytical Skills Tools and Attitudes 2013 Survey   lavastorm analyticsAnalytical Skills Tools and Attitudes 2013 Survey   lavastorm analytics
Analytical Skills Tools and Attitudes 2013 Survey lavastorm analytics
 

Viewers also liked

Apache Hive 2.0 SQL, Speed, Scale by Alan Gates
Apache Hive 2.0 SQL, Speed, Scale by Alan GatesApache Hive 2.0 SQL, Speed, Scale by Alan Gates
Apache Hive 2.0 SQL, Speed, Scale by Alan GatesBig Data Spain
 
Next generation Polyglot Architectures using Neo4j by Stefan Kolmar
Next generation Polyglot Architectures using Neo4j by Stefan KolmarNext generation Polyglot Architectures using Neo4j by Stefan Kolmar
Next generation Polyglot Architectures using Neo4j by Stefan KolmarBig Data Spain
 
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...Big Data Spain
 
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey Kharlamov
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey KharlamovRUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey Kharlamov
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey KharlamovBig Data Spain
 
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...Big Data Spain
 
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Big Data Spain
 
Turning an idea into a Data-Driven Production System: An Energy Load Forecas...
 Turning an idea into a Data-Driven Production System: An Energy Load Forecas... Turning an idea into a Data-Driven Production System: An Energy Load Forecas...
Turning an idea into a Data-Driven Production System: An Energy Load Forecas...Big Data Spain
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezBig Data Spain
 
Growing Data Scientists by Amparo Alonso Betanzos
Growing Data Scientists by Amparo Alonso BetanzosGrowing Data Scientists by Amparo Alonso Betanzos
Growing Data Scientists by Amparo Alonso BetanzosBig Data Spain
 
TENSORFLOW: ARCHITECTURE AND USE CASE - NASA SPACE APPS CHALLENGE by Gema Par...
TENSORFLOW: ARCHITECTURE AND USE CASE - NASA SPACE APPS CHALLENGE by Gema Par...TENSORFLOW: ARCHITECTURE AND USE CASE - NASA SPACE APPS CHALLENGE by Gema Par...
TENSORFLOW: ARCHITECTURE AND USE CASE - NASA SPACE APPS CHALLENGE by Gema Par...Big Data Spain
 
Inferring the effect of an event using CausalImpact by Kay H. Brodersen
Inferring the effect of an event using CausalImpact by Kay H. BrodersenInferring the effect of an event using CausalImpact by Kay H. Brodersen
Inferring the effect of an event using CausalImpact by Kay H. BrodersenBig Data Spain
 
Open data : from Insight to Visualisation with Google BigQuery and Carto.com ...
Open data : from Insight to Visualisation with Google BigQuery and Carto.com ...Open data : from Insight to Visualisation with Google BigQuery and Carto.com ...
Open data : from Insight to Visualisation with Google BigQuery and Carto.com ...Big Data Spain
 

Viewers also liked (12)

Apache Hive 2.0 SQL, Speed, Scale by Alan Gates
Apache Hive 2.0 SQL, Speed, Scale by Alan GatesApache Hive 2.0 SQL, Speed, Scale by Alan Gates
Apache Hive 2.0 SQL, Speed, Scale by Alan Gates
 
Next generation Polyglot Architectures using Neo4j by Stefan Kolmar
Next generation Polyglot Architectures using Neo4j by Stefan KolmarNext generation Polyglot Architectures using Neo4j by Stefan Kolmar
Next generation Polyglot Architectures using Neo4j by Stefan Kolmar
 
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...
 
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey Kharlamov
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey KharlamovRUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey Kharlamov
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey Kharlamov
 
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
 
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
 
Turning an idea into a Data-Driven Production System: An Energy Load Forecas...
 Turning an idea into a Data-Driven Production System: An Energy Load Forecas... Turning an idea into a Data-Driven Production System: An Energy Load Forecas...
Turning an idea into a Data-Driven Production System: An Energy Load Forecas...
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier Dominguez
 
Growing Data Scientists by Amparo Alonso Betanzos
Growing Data Scientists by Amparo Alonso BetanzosGrowing Data Scientists by Amparo Alonso Betanzos
Growing Data Scientists by Amparo Alonso Betanzos
 
TENSORFLOW: ARCHITECTURE AND USE CASE - NASA SPACE APPS CHALLENGE by Gema Par...
TENSORFLOW: ARCHITECTURE AND USE CASE - NASA SPACE APPS CHALLENGE by Gema Par...TENSORFLOW: ARCHITECTURE AND USE CASE - NASA SPACE APPS CHALLENGE by Gema Par...
TENSORFLOW: ARCHITECTURE AND USE CASE - NASA SPACE APPS CHALLENGE by Gema Par...
 
Inferring the effect of an event using CausalImpact by Kay H. Brodersen
Inferring the effect of an event using CausalImpact by Kay H. BrodersenInferring the effect of an event using CausalImpact by Kay H. Brodersen
Inferring the effect of an event using CausalImpact by Kay H. Brodersen
 
Open data : from Insight to Visualisation with Google BigQuery and Carto.com ...
Open data : from Insight to Visualisation with Google BigQuery and Carto.com ...Open data : from Insight to Visualisation with Google BigQuery and Carto.com ...
Open data : from Insight to Visualisation with Google BigQuery and Carto.com ...
 

Similar to Managing Data Science by David Martínez Rego

Unit 1-ML (1) (1).pptx
Unit 1-ML (1) (1).pptxUnit 1-ML (1) (1).pptx
Unit 1-ML (1) (1).pptxChitrachitrap
 
AI in the Real World: Challenges, and Risks and how to handle them?
AI in the Real World: Challenges, and Risks and how to handle them?AI in the Real World: Challenges, and Risks and how to handle them?
AI in the Real World: Challenges, and Risks and how to handle them?Srinath Perera
 
Choosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needChoosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needGibDevs
 
BIG DATA WORKBOOK OCT 2015
BIG DATA WORKBOOK OCT 2015BIG DATA WORKBOOK OCT 2015
BIG DATA WORKBOOK OCT 2015Fiona Lew
 
Start Thinking Like a Data Scientist
Start Thinking Like a Data ScientistStart Thinking Like a Data Scientist
Start Thinking Like a Data ScientistAmanMehta47
 
How to start thinking like a data scientist?
How to start thinking like a data scientist?How to start thinking like a data scientist?
How to start thinking like a data scientist?NarasingaMoorthy V
 
Machine learning and language v2
Machine learning  and language v2Machine learning  and language v2
Machine learning and language v2William Moore
 
How to start your data career
How to start your data careerHow to start your data career
How to start your data careerAdwait Bhave
 
The Data Greenhouse DevOps Measurement at Scale
The Data Greenhouse  DevOps Measurement at ScaleThe Data Greenhouse  DevOps Measurement at Scale
The Data Greenhouse DevOps Measurement at Scalesparkagility
 
Analytics that deliver Value
Analytics that deliver ValueAnalytics that deliver Value
Analytics that deliver ValueSandro Catanzaro
 
Analytics-Enabled Experiences: The New Secret Weapon
Analytics-Enabled Experiences: The New Secret WeaponAnalytics-Enabled Experiences: The New Secret Weapon
Analytics-Enabled Experiences: The New Secret WeaponDatabricks
 
How to Get Started or Expand Your Learning Analytics Program
 How to Get Started or Expand Your Learning Analytics Program How to Get Started or Expand Your Learning Analytics Program
How to Get Started or Expand Your Learning Analytics ProgramWatershed
 
How To Make The Most Out of Enterprise Data
How To Make The Most Out of Enterprise DataHow To Make The Most Out of Enterprise Data
How To Make The Most Out of Enterprise DataSnapShot
 
Machine Learning and Languge
Machine Learning and LangugeMachine Learning and Languge
Machine Learning and LangugeWilliam Moore
 
Copyright © 2014 EMC Corporation. All Rights Reserved..docx
Copyright © 2014 EMC Corporation. All Rights Reserved..docxCopyright © 2014 EMC Corporation. All Rights Reserved..docx
Copyright © 2014 EMC Corporation. All Rights Reserved..docxdickonsondorris
 
Machine Learning for SEOs - SMXL
Machine Learning for SEOs - SMXLMachine Learning for SEOs - SMXL
Machine Learning for SEOs - SMXLBritney Muller
 
Building a successful data organization nov 2018
Building a successful data organization   nov 2018Building a successful data organization   nov 2018
Building a successful data organization nov 2018Alejandro Cantarero
 
CommonAnalyticMistakes_v1.17_Unbranded
CommonAnalyticMistakes_v1.17_UnbrandedCommonAnalyticMistakes_v1.17_Unbranded
CommonAnalyticMistakes_v1.17_UnbrandedJim Parnitzke
 

Similar to Managing Data Science by David Martínez Rego (20)

Decision Science POV 6-26-13
Decision Science POV 6-26-13Decision Science POV 6-26-13
Decision Science POV 6-26-13
 
Unit 1-ML (1) (1).pptx
Unit 1-ML (1) (1).pptxUnit 1-ML (1) (1).pptx
Unit 1-ML (1) (1).pptx
 
AI in the Real World: Challenges, and Risks and how to handle them?
AI in the Real World: Challenges, and Risks and how to handle them?AI in the Real World: Challenges, and Risks and how to handle them?
AI in the Real World: Challenges, and Risks and how to handle them?
 
Choosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needChoosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your need
 
BIG DATA WORKBOOK OCT 2015
BIG DATA WORKBOOK OCT 2015BIG DATA WORKBOOK OCT 2015
BIG DATA WORKBOOK OCT 2015
 
Start Thinking Like a Data Scientist
Start Thinking Like a Data ScientistStart Thinking Like a Data Scientist
Start Thinking Like a Data Scientist
 
How to start thinking like a data scientist?
How to start thinking like a data scientist?How to start thinking like a data scientist?
How to start thinking like a data scientist?
 
Machine learning and language v2
Machine learning  and language v2Machine learning  and language v2
Machine learning and language v2
 
Unit 2.pptx
Unit 2.pptxUnit 2.pptx
Unit 2.pptx
 
How to start your data career
How to start your data careerHow to start your data career
How to start your data career
 
The Data Greenhouse DevOps Measurement at Scale
The Data Greenhouse  DevOps Measurement at ScaleThe Data Greenhouse  DevOps Measurement at Scale
The Data Greenhouse DevOps Measurement at Scale
 
Analytics that deliver Value
Analytics that deliver ValueAnalytics that deliver Value
Analytics that deliver Value
 
Analytics-Enabled Experiences: The New Secret Weapon
Analytics-Enabled Experiences: The New Secret WeaponAnalytics-Enabled Experiences: The New Secret Weapon
Analytics-Enabled Experiences: The New Secret Weapon
 
How to Get Started or Expand Your Learning Analytics Program
 How to Get Started or Expand Your Learning Analytics Program How to Get Started or Expand Your Learning Analytics Program
How to Get Started or Expand Your Learning Analytics Program
 
How To Make The Most Out of Enterprise Data
How To Make The Most Out of Enterprise DataHow To Make The Most Out of Enterprise Data
How To Make The Most Out of Enterprise Data
 
Machine Learning and Languge
Machine Learning and LangugeMachine Learning and Languge
Machine Learning and Languge
 
Copyright © 2014 EMC Corporation. All Rights Reserved..docx
Copyright © 2014 EMC Corporation. All Rights Reserved..docxCopyright © 2014 EMC Corporation. All Rights Reserved..docx
Copyright © 2014 EMC Corporation. All Rights Reserved..docx
 
Machine Learning for SEOs - SMXL
Machine Learning for SEOs - SMXLMachine Learning for SEOs - SMXL
Machine Learning for SEOs - SMXL
 
Building a successful data organization nov 2018
Building a successful data organization   nov 2018Building a successful data organization   nov 2018
Building a successful data organization nov 2018
 
CommonAnalyticMistakes_v1.17_Unbranded
CommonAnalyticMistakes_v1.17_UnbrandedCommonAnalyticMistakes_v1.17_Unbranded
CommonAnalyticMistakes_v1.17_Unbranded
 

More from Big Data Spain

Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data Spain
 
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Big Data Spain
 
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017Big Data Spain
 
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Big Data Spain
 
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Big Data Spain
 
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Big Data Spain
 
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Big Data Spain
 
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Big Data Spain
 
State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...Big Data Spain
 
Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Big Data Spain
 
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Big Data Spain
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...Big Data Spain
 
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Big Data Spain
 
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Big Data Spain
 
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Big Data Spain
 
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Big Data Spain
 
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...Big Data Spain
 
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Big Data Spain
 
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...Big Data Spain
 
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Big Data Spain
 

Managing Data Science by David Martínez Rego

  • 1.
  • 2. Managing Data Science Dr. David Martinez Rego Big Data Spain 2016
  • 3. Lead Data Science • Leading Data Science is probably one of the most exciting/fun positions that someone can have nowadays • Create computational algorithms that can make decisions and learn from errors in any problem that can be formulated in numbers • There are many paths to the treasure in your data; the role of a Lead Data Scientist is to find the shortest and safest one.
  • 5. Top weird moments • I prefer not to give you any insight into the problem. Why do you want to know what the columns are? I prefer you treat the problem as just data. • There exist labels. We do not have permission to access them. You inspect the results and see if they make sense. • We can give the screenshot of the dashboard and tell the algorithm to predict if it will break. • Your algorithm is wrong! I have been managing this for 10 years, it cannot be like that.
  • 7. Common language • Because of its short public life, Machine Learning lacks a general understanding of its fundamental limitations/principles. • Focus on practicality makes literature/media oblivious to these fundamentals. • Only when we agree on some common language can all parties in a room start to understand each other.
  • 8. Learning theory • Set of fundamental results that are behind many of the common practices and algorithms we use nowadays. • Has been heavily researched since the 80s and offers a set of mathematical guarantees/limitations in the practice of ML • Useful both for ML practitioners and managers as a rule of thumb to understand and manage DS.
  • 12. No Free Lunch How can we prevent such failures? By using our prior knowledge about a specific learning task, to avoid the distributions that will cause us to fail when learning that task. Such prior knowledge can be expressed by restricting our hypothesis class.
  • 13.
  • 14. No Free Lunch takeaways • No free lunch theorem is a mathematical certificate • For managers & HR • foresee an investment in a variety of specialists if you plan to tackle an increasing number of data challenges • escape from promises of one killer technique that acts as a hammer for all problems • For Data Science teams • foresee an increasing number of specific techniques which you have to keep up to date (team effort)
  • 15. Generalisation bounds • How can we be sure that a model will not fail in production? • How can we correct when things do not go well? • How can I know if I am being wasteful?
  • 16. Generalisation bounds • An ML practitioner is going to train a model with complexity d (VC-dimension) on m samples, and she is going to observe a training error Ls. • The expected performance when this model goes to production is bounded by [formula] with probability 1-δ
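
The formula on slide 16 is not present in this transcript. One standard bound of this kind, following the uniform-convergence results in Shalev-Shwartz and Ben-David (2014), says that with probability at least 1-δ the production error is at most Ls + sqrt(C (d + ln(1/δ)) / m), where C is a universal constant whose value the theory does not pin down. The sketch below assumes this form and sets C = 1 purely for illustration, so the numbers it prints show how the bound moves with d and m rather than giving usable guarantees. The function names are hypothetical and the slide's exact expression may differ.

```python
import math


def complexity_term(d, m, delta, c=1.0):
    """Gap term sqrt(c * (d + ln(1/delta)) / m).

    d: VC-dimension of the hypothesis class
    m: number of training samples
    delta: allowed failure probability (bound holds with prob. 1 - delta)
    c: ASSUMED constant; the theory only says "some universal constant"
    """
    if m <= 0 or d < 0 or not (0.0 < delta < 1.0):
        raise ValueError("need m > 0, d >= 0 and 0 < delta < 1")
    return math.sqrt(c * (d + math.log(1.0 / delta)) / m)


def production_error_bound(train_error, d, m, delta, c=1.0):
    """Upper bound on expected error, clipped to 1 since an error rate cannot exceed 1."""
    return min(1.0, train_error + complexity_term(d, m, delta, c))


if __name__ == "__main__":
    # Same model class (d = 50), growing sample size: the gap term shrinks.
    for m in (1_000, 10_000, 100_000):
        bound = production_error_bound(train_error=0.05, d=50, m=m, delta=0.05)
        print(f"m={m:>7}  bound={bound:.3f}")
```

Read this way, the three questions on the previous slide map onto the three inputs: the observed error Ls, the complexity d, and the sample size m.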
  • 17. Manage DS • How can we correct when things do not go well? • Get a larger sample • Change the hypothesis class by: • Enlarging it • Reducing it • Completely changing it • Changing the parameters you consider • Change the feature representation of the data • Change the optimisation algorithm used to apply your learning rule
  • 18. Big Data • Big Data has had a significant impact on the number of samples m, and also on the complexity d (VC-dimension). • When tackling Variety by making use of unstructured data we increase the complexity d, so we should plan for an adequate sample size m. • Review the modelling that we are doing to know if we need a big database. • Is it the case that you do not need to maintain all that data?
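
As a rough planning aid for the relationship between d and m, a standard sample-complexity result in learning theory (Shalev-Shwartz and Ben-David, 2014) states that on the order of (d + ln(1/δ)) / ε² samples suffice to keep the gap between training and production error below ε with probability 1-δ. The sketch below assumes that form with a hypothetical constant of 1, so the outputs are only useful for comparing scenarios (for example, how much more data a richer feature set calls for), not as absolute requirements. The function name is illustrative.

```python
import math


def required_samples(d, epsilon, delta, c=1.0):
    """Sample size m with c * (d + ln(1/delta)) / epsilon**2 <= m.

    d: VC-dimension of the hypothesis class
    epsilon: tolerated gap between training and production error
    delta: allowed failure probability
    c: ASSUMED constant; the true universal constant is not specified by the theory
    """
    if d < 0 or epsilon <= 0 or not (0.0 < delta < 1.0):
        raise ValueError("need d >= 0, epsilon > 0 and 0 < delta < 1")
    return math.ceil(c * (d + math.log(1.0 / delta)) / epsilon ** 2)


if __name__ == "__main__":
    # Adding unstructured data typically raises d; compare the data needed.
    for d in (10, 100, 1_000):
        print(f"d={d:>5}  m>={required_samples(d, epsilon=0.1, delta=0.05)}")
```

Under this assumed form the required m grows linearly in d and quadratically in 1/ε, which is one way to check whether a planned dataset is plausibly too small or wastefully large for the model being considered.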
  • 19. Half pie syndrome • Symptoms • You are spending a lot of money on gathering data to fuel growth in your business • Your systems look like this pie, succulent but it seems that your business has lost appetite.
  • 20. Enough data? Andrew Gelman (2005): "Sample sizes are never large. If N is too small to get a sufficiently-precise estimate, you need to get more data (or make more assumptions). But once N is 'large enough', you can start subdividing the data to learn more. N is never enough because if it were 'enough' you'd already be on to the next problem for which you need more data."
  • 21. Big data bounds Alg. design #Data Engineering
  • 22. Conclusions • In order to build a better understanding between data science teams and other stakeholders, we need to make an effort to build a robust common language! • Learning theory, originally devised as the fundamental theoretic pillar of ML, can help to build an understanding • These proven basic laws can help you to have a structured way to manage Data Science
  • 23. References • Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms, 2014. • Léon Bottou and Olivier Bousquet. The Tradeoffs of Large Scale Learning. NIPS 2008. • SVM Optimization: Inverse dependence on dataset size. ICML 2008. • Andrew Gelman. N is never large enough, 2005. http://andrewgelman.com/2005/07/31/n_is_never_larg/
  • 24. Managing Data Science Dr. David Martinez Rego Big Data Spain 2016