SlideShare a Scribd company logo
DATA
SCIENCE
POP UP
AUSTIN
Data Do's and Dont's: Lessons
From the Front Line
Ryan Orban
VP of Product and Strategy, 

Data Scientist, Galvanize
ryanorban
DATA
SCIENCE
POP UP
AUSTIN
#datapopupaustin
April 13, 2016
Galvanize, Austin Campus
Data Do’s and Dont’s:
Lessons from the Frontline
Co-Founder & CEO
Zipfian Academy
Ryan Orban
@ryanorban
EVP of Product and Strategy
Galvanize
We believe an opportunity belongs 

to anyone with aptitude and ambition.
4Galvanize 2015
NODES ON THE NETWORK
COLORADO (BOULDER, DENVER, FORT COLLINS)
SEATTLE, WA
SAN FRANCISCO, CA
AUSTIN, TX (OPENING Q1 2016)
Programs: Full Stack Immersive, Data Science Immersive,
Entrepreneurship
Programs: Full Stack Immersive, Data Science Immersive,
Entrepreneurship
Programs: Full Stack Immersive, Data Science Immersive, Data
Engineering Immersive, Masters of Science in Data Science,
Entrepreneurship
Programs: Full Stack Immersive, Data Science Immersive,
Entrepreneurship
[Explanation Text]
5Galvanize 2015
5 PROGRAMS
• Full Stack Immersive
• Data Science Immersive
• Data Engineering Immersive
Project over 500 Student Member Graduates in 2015
Currently over 1500 Members
• Master of Science in Data Science 

(University of New Haven)
• Startup Membership
6Galvanize 2015
PLACEMENT STATS
FULL STACK IMMERSIVE DATA SCIENCE IMMERSIVE
$43K $77KPre-program Salary
Average Starting Salary
97% Placement
Rate*
*Galvanize is a founder member of NESTA (New Economy Skills Training Association), a trade organization founded to regulate the new “bootcamp” market.
This place rate is more rigorous than that requested by state licensure agencies. The placement rate is calculated 6 months after graduation.
$72K $114KPre-program Salary
94%Placement
Rate*
Average Starting Salary
Software Engineering
Data
Science
Data
Analysis
Data
Engineering
Machine
Learning Java
Linux, UNIX
Mobile
Development
Objective C
C, C++, C#
Web
Development
Ruby on Rails
JavaScript
Front-endPHP
Full-
Stack
Excel
Python
SQL
NLP
Hadoop
Databases
Network Analysis
Java
Assembly
Statistics
R
The orange words are the most
important things we teach.
How These Things
Relate to Each Other
Full-Stack Web Development
and Data Science are in gray
circles.
8Galvanize 2015
DATA SCIENCE IMMERSIVE
Week 1 - Exploratory Data Analysis and Software Engineering Best Practices
Week 2 - Statistical Inference, Bayesian Methods, A/B Testing, Multi-Armed Bandit
Week 3 - Regression, Regularization, Gradient Descent
Week 4 - Supervised Machine Learning: Classification, Validation, Ensemble Methods
Week 5 - Clustering, Topic Modeling (NMF, LDA), NLP
Week 6 - Network Analysis, Matrix Factorization, and Time Series
Week 7 - Hadoop, Hive, and MapReduce
Week 8 - Data Visualization with D3.js, Data Products, and Fraud Detection Case Study
Weeks 9-10 - Capstone Projects
Week 12 - Onsite Interviews
Data Manipulation Model Creation Prediction
Data Manipulation
Do
Don’t
• Assume your data is friendly
• ETL and feature engineering is largely opaque to others (and
yourself after enough time away)
• Automate cleaning and transformation pipelines
• Jupyter and RStudio are great for EDA, but have issues with
collaboration and version control
• Build functional code to be reused; export into plain code files,
track with Git
Model Creation
Do
Don’t
• Never use accuracy as your main metric
• You can have 99% accuracy but 0% predictive power
• Unbalanced classes; sampling
• Use metrics like precision and recall
• Aggregate metrics like F1-score, AUC/AIC/BIC also good
• Remember that models with highest scores are not always the
ones you need; permissive vs. conservative based on use case
Do
Don’t
• Don’t start with the most complicated models first (deep learning,
gradient boosting, SVMs, etc.)
• Don’t focus on the algorithm
•“More data always beats better algorithms”
• But better features usually beat better algorithms*
• Start with a baseline model, then continuously “close the loop”
• Create a base case to optimize against
• Does 1% greater F1-score outweigh a 10x training time in
production? Not usually unless you’re Google-scale.
Do
Don’t
• Assume your cross-validation metrics will hold up against real-life
data
• Separate your application and prediction code
• Fast iteration cycles are key. Create a “scoring service” that is
uncoupled from application code.
• APIs & service oriented architectures typically work best
Communication
Do
Don’t
• Don’t focus on the “how”, i.e. cover every trial and tribulation
• Cut to the chase
• After a presentation, I always ask the class two questions:
• What is one sentence that describes what the speaker learned?
• Why do I care?
19Galvanize 2015
• Early Access to Students
• Candidate Matching
• Curriculum Development
• Corporate Student Sponsorship
• Diversity
TALENT
20Galvanize 2015
• Membership
• Organic Relationships
• Course Content
• Mentorship
• Community
• Events
ACCESS
21Galvanize 2015
• Galvanize Experts
• Capstone Projects
• Internship
• Corporate Training
EXPERTISE
THANK YOU
RYAN ORBAN | EVP, STRATEGY
ryan.orban@galvanize.com
@ryanorban
www.galvanize.com
DATA
SCIENCE
POP UP
AUSTIN
@datapopup
#datapopupaustin

More Related Content

Viewers also liked

Data Science Popup Austin: Automating Data Science for Fun and Profit
Data Science Popup Austin: Automating Data Science for Fun and Profit Data Science Popup Austin: Automating Data Science for Fun and Profit
Data Science Popup Austin: Automating Data Science for Fun and Profit
Domino Data Lab
 
Success Through an Actionable Data Science Stack
Success Through an Actionable Data Science StackSuccess Through an Actionable Data Science Stack
Success Through an Actionable Data Science Stack
Domino Data Lab
 
The Right Question
The Right QuestionThe Right Question
The Right Question
Domino Data Lab
 
Data Science Popup Austin: Making Data Science Fast: Survey of GPU Accelerate...
Data Science Popup Austin: Making Data Science Fast: Survey of GPU Accelerate...Data Science Popup Austin: Making Data Science Fast: Survey of GPU Accelerate...
Data Science Popup Austin: Making Data Science Fast: Survey of GPU Accelerate...
Domino Data Lab
 
Challenges of Predicting User Engagement
Challenges of Predicting User EngagementChallenges of Predicting User Engagement
Challenges of Predicting User Engagement
Domino Data Lab
 
Realtime Learning: Using Triggers to Know What the ?$# is Going On
Realtime Learning: Using Triggers to Know What the ?$# is Going OnRealtime Learning: Using Triggers to Know What the ?$# is Going On
Realtime Learning: Using Triggers to Know What the ?$# is Going On
Domino Data Lab
 
Sentiment Analysis of Film-Related Messages on Social Media
Sentiment Analysis of Film-Related Messages on Social MediaSentiment Analysis of Film-Related Messages on Social Media
Sentiment Analysis of Film-Related Messages on Social Media
Domino Data Lab
 
A Tour of the Data Science Process, a Case Study Using Movie Industry Data
A Tour of the Data Science Process, a Case Study Using Movie Industry DataA Tour of the Data Science Process, a Case Study Using Movie Industry Data
A Tour of the Data Science Process, a Case Study Using Movie Industry Data
Domino Data Lab
 
Capturing the Mirage: Machine Learning in Media and Entertainment Industries
Capturing the Mirage: Machine Learning in Media and Entertainment IndustriesCapturing the Mirage: Machine Learning in Media and Entertainment Industries
Capturing the Mirage: Machine Learning in Media and Entertainment Industries
Domino Data Lab
 

Viewers also liked (9)

Data Science Popup Austin: Automating Data Science for Fun and Profit
Data Science Popup Austin: Automating Data Science for Fun and Profit Data Science Popup Austin: Automating Data Science for Fun and Profit
Data Science Popup Austin: Automating Data Science for Fun and Profit
 
Success Through an Actionable Data Science Stack
Success Through an Actionable Data Science StackSuccess Through an Actionable Data Science Stack
Success Through an Actionable Data Science Stack
 
The Right Question
The Right QuestionThe Right Question
The Right Question
 
Data Science Popup Austin: Making Data Science Fast: Survey of GPU Accelerate...
Data Science Popup Austin: Making Data Science Fast: Survey of GPU Accelerate...Data Science Popup Austin: Making Data Science Fast: Survey of GPU Accelerate...
Data Science Popup Austin: Making Data Science Fast: Survey of GPU Accelerate...
 
Challenges of Predicting User Engagement
Challenges of Predicting User EngagementChallenges of Predicting User Engagement
Challenges of Predicting User Engagement
 
Realtime Learning: Using Triggers to Know What the ?$# is Going On
Realtime Learning: Using Triggers to Know What the ?$# is Going OnRealtime Learning: Using Triggers to Know What the ?$# is Going On
Realtime Learning: Using Triggers to Know What the ?$# is Going On
 
Sentiment Analysis of Film-Related Messages on Social Media
Sentiment Analysis of Film-Related Messages on Social MediaSentiment Analysis of Film-Related Messages on Social Media
Sentiment Analysis of Film-Related Messages on Social Media
 
A Tour of the Data Science Process, a Case Study Using Movie Industry Data
A Tour of the Data Science Process, a Case Study Using Movie Industry DataA Tour of the Data Science Process, a Case Study Using Movie Industry Data
A Tour of the Data Science Process, a Case Study Using Movie Industry Data
 
Capturing the Mirage: Machine Learning in Media and Entertainment Industries
Capturing the Mirage: Machine Learning in Media and Entertainment IndustriesCapturing the Mirage: Machine Learning in Media and Entertainment Industries
Capturing the Mirage: Machine Learning in Media and Entertainment Industries
 

More from Domino Data Lab

What's in your workflow? Bringing data science workflows to business analysis...
What's in your workflow? Bringing data science workflows to business analysis...What's in your workflow? Bringing data science workflows to business analysis...
What's in your workflow? Bringing data science workflows to business analysis...
Domino Data Lab
 
The Proliferation of New Database Technologies and Implications for Data Scie...
The Proliferation of New Database Technologies and Implications for Data Scie...The Proliferation of New Database Technologies and Implications for Data Scie...
The Proliferation of New Database Technologies and Implications for Data Scie...
Domino Data Lab
 
Racial Bias in Policing: an analysis of Illinois traffic stops data
Racial Bias in Policing: an analysis of Illinois traffic stops dataRacial Bias in Policing: an analysis of Illinois traffic stops data
Racial Bias in Policing: an analysis of Illinois traffic stops data
Domino Data Lab
 
Data Quality Analytics: Understanding what is in your data, before using it
Data Quality Analytics: Understanding what is in your data, before using itData Quality Analytics: Understanding what is in your data, before using it
Data Quality Analytics: Understanding what is in your data, before using it
Domino Data Lab
 
Supporting innovation in insurance with randomized experimentation
Supporting innovation in insurance with randomized experimentationSupporting innovation in insurance with randomized experimentation
Supporting innovation in insurance with randomized experimentation
Domino Data Lab
 
Leveraging Data Science in the Automotive Industry
Leveraging Data Science in the Automotive IndustryLeveraging Data Science in the Automotive Industry
Leveraging Data Science in the Automotive Industry
Domino Data Lab
 
Summertime Analytics: Predicting E. coli and West Nile Virus
Summertime Analytics: Predicting E. coli and West Nile VirusSummertime Analytics: Predicting E. coli and West Nile Virus
Summertime Analytics: Predicting E. coli and West Nile Virus
Domino Data Lab
 
Reproducible Dashboards and other great things to do with Jupyter
Reproducible Dashboards and other great things to do with JupyterReproducible Dashboards and other great things to do with Jupyter
Reproducible Dashboards and other great things to do with Jupyter
Domino Data Lab
 
GeoViz: A Canvas for Data Science
GeoViz: A Canvas for Data ScienceGeoViz: A Canvas for Data Science
GeoViz: A Canvas for Data Science
Domino Data Lab
 
Managing Data Science | Lessons from the Field
Managing Data Science | Lessons from the Field Managing Data Science | Lessons from the Field
Managing Data Science | Lessons from the Field
Domino Data Lab
 
Doing your first Kaggle (Python for Big Data sets)
Doing your first Kaggle (Python for Big Data sets)Doing your first Kaggle (Python for Big Data sets)
Doing your first Kaggle (Python for Big Data sets)
Domino Data Lab
 
Leveraged Analytics at Scale
Leveraged Analytics at ScaleLeveraged Analytics at Scale
Leveraged Analytics at Scale
Domino Data Lab
 
How I Learned to Stop Worrying and Love Linked Data
How I Learned to Stop Worrying and Love Linked DataHow I Learned to Stop Worrying and Love Linked Data
How I Learned to Stop Worrying and Love Linked Data
Domino Data Lab
 
Software Engineering for Data Scientists
Software Engineering for Data ScientistsSoftware Engineering for Data Scientists
Software Engineering for Data Scientists
Domino Data Lab
 
Making Big Data Smart
Making Big Data SmartMaking Big Data Smart
Making Big Data Smart
Domino Data Lab
 
Moving Data Science from an Event to A Program: Considerations in Creating Su...
Moving Data Science from an Event to A Program: Considerations in Creating Su...Moving Data Science from an Event to A Program: Considerations in Creating Su...
Moving Data Science from an Event to A Program: Considerations in Creating Su...
Domino Data Lab
 
Building Data Analytics pipelines in the cloud using serverless technology
Building Data Analytics pipelines in the cloud using serverless technologyBuilding Data Analytics pipelines in the cloud using serverless technology
Building Data Analytics pipelines in the cloud using serverless technology
Domino Data Lab
 
Leveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science ToolsLeveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science Tools
Domino Data Lab
 
Domino and AWS: collaborative analytics and model governance at financial ser...
Domino and AWS: collaborative analytics and model governance at financial ser...Domino and AWS: collaborative analytics and model governance at financial ser...
Domino and AWS: collaborative analytics and model governance at financial ser...
Domino Data Lab
 
The Role and Importance of Curiosity in Data Science
The Role and Importance of Curiosity in Data ScienceThe Role and Importance of Curiosity in Data Science
The Role and Importance of Curiosity in Data Science
Domino Data Lab
 

More from Domino Data Lab (20)

What's in your workflow? Bringing data science workflows to business analysis...
What's in your workflow? Bringing data science workflows to business analysis...What's in your workflow? Bringing data science workflows to business analysis...
What's in your workflow? Bringing data science workflows to business analysis...
 
The Proliferation of New Database Technologies and Implications for Data Scie...
The Proliferation of New Database Technologies and Implications for Data Scie...The Proliferation of New Database Technologies and Implications for Data Scie...
The Proliferation of New Database Technologies and Implications for Data Scie...
 
Racial Bias in Policing: an analysis of Illinois traffic stops data
Racial Bias in Policing: an analysis of Illinois traffic stops dataRacial Bias in Policing: an analysis of Illinois traffic stops data
Racial Bias in Policing: an analysis of Illinois traffic stops data
 
Data Quality Analytics: Understanding what is in your data, before using it
Data Quality Analytics: Understanding what is in your data, before using itData Quality Analytics: Understanding what is in your data, before using it
Data Quality Analytics: Understanding what is in your data, before using it
 
Supporting innovation in insurance with randomized experimentation
Supporting innovation in insurance with randomized experimentationSupporting innovation in insurance with randomized experimentation
Supporting innovation in insurance with randomized experimentation
 
Leveraging Data Science in the Automotive Industry
Leveraging Data Science in the Automotive IndustryLeveraging Data Science in the Automotive Industry
Leveraging Data Science in the Automotive Industry
 
Summertime Analytics: Predicting E. coli and West Nile Virus
Summertime Analytics: Predicting E. coli and West Nile VirusSummertime Analytics: Predicting E. coli and West Nile Virus
Summertime Analytics: Predicting E. coli and West Nile Virus
 
Reproducible Dashboards and other great things to do with Jupyter
Reproducible Dashboards and other great things to do with JupyterReproducible Dashboards and other great things to do with Jupyter
Reproducible Dashboards and other great things to do with Jupyter
 
GeoViz: A Canvas for Data Science
GeoViz: A Canvas for Data ScienceGeoViz: A Canvas for Data Science
GeoViz: A Canvas for Data Science
 
Managing Data Science | Lessons from the Field
Managing Data Science | Lessons from the Field Managing Data Science | Lessons from the Field
Managing Data Science | Lessons from the Field
 
Doing your first Kaggle (Python for Big Data sets)
Doing your first Kaggle (Python for Big Data sets)Doing your first Kaggle (Python for Big Data sets)
Doing your first Kaggle (Python for Big Data sets)
 
Leveraged Analytics at Scale
Leveraged Analytics at ScaleLeveraged Analytics at Scale
Leveraged Analytics at Scale
 
How I Learned to Stop Worrying and Love Linked Data
How I Learned to Stop Worrying and Love Linked DataHow I Learned to Stop Worrying and Love Linked Data
How I Learned to Stop Worrying and Love Linked Data
 
Software Engineering for Data Scientists
Software Engineering for Data ScientistsSoftware Engineering for Data Scientists
Software Engineering for Data Scientists
 
Making Big Data Smart
Making Big Data SmartMaking Big Data Smart
Making Big Data Smart
 
Moving Data Science from an Event to A Program: Considerations in Creating Su...
Moving Data Science from an Event to A Program: Considerations in Creating Su...Moving Data Science from an Event to A Program: Considerations in Creating Su...
Moving Data Science from an Event to A Program: Considerations in Creating Su...
 
Building Data Analytics pipelines in the cloud using serverless technology
Building Data Analytics pipelines in the cloud using serverless technologyBuilding Data Analytics pipelines in the cloud using serverless technology
Building Data Analytics pipelines in the cloud using serverless technology
 
Leveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science ToolsLeveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science Tools
 
Domino and AWS: collaborative analytics and model governance at financial ser...
Domino and AWS: collaborative analytics and model governance at financial ser...Domino and AWS: collaborative analytics and model governance at financial ser...
Domino and AWS: collaborative analytics and model governance at financial ser...
 
The Role and Importance of Curiosity in Data Science
The Role and Importance of Curiosity in Data ScienceThe Role and Importance of Curiosity in Data Science
The Role and Importance of Curiosity in Data Science
 

Recently uploaded

一比一原版南昆士兰大学毕业证如何办理
一比一原版南昆士兰大学毕业证如何办理一比一原版南昆士兰大学毕业证如何办理
一比一原版南昆士兰大学毕业证如何办理
ugydym
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
z6osjkqvd
 
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
yuvarajkumar334
 
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
osoyvvf
 
社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
NABLAS株式会社
 
A gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented GenerationA gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented Generation
dataschool1
 
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
uevausa
 
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdfreading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
perranet1
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
Vietnam Cotton & Spinning Association
 
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
Timothy Spann
 
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
aguty
 
06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus
Timothy Spann
 
Data Scientist Machine Learning Profiles .pdf
Data Scientist Machine Learning  Profiles .pdfData Scientist Machine Learning  Profiles .pdf
Data Scientist Machine Learning Profiles .pdf
Vineet
 
Template xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptxTemplate xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptx
TeukuEriSyahputra
 
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
agdhot
 
Sid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.pptSid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.ppt
ArshadAyub49
 
ML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
ML-PPT-UNIT-2 Generative Classifiers Discriminative ClassifiersML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
ML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
MastanaihnaiduYasam
 
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
Rebecca Bilbro
 
Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)
GeorgiiSteshenko
 
8 things to know before you start to code in 2024
8 things to know before you start to code in 20248 things to know before you start to code in 2024
8 things to know before you start to code in 2024
ArianaRamos54
 

Recently uploaded (20)

一比一原版南昆士兰大学毕业证如何办理
一比一原版南昆士兰大学毕业证如何办理一比一原版南昆士兰大学毕业证如何办理
一比一原版南昆士兰大学毕业证如何办理
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
 
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
 
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
 
社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
 
A gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented GenerationA gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented Generation
 
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
 
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdfreading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
 
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
 
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
 
06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus
 
Data Scientist Machine Learning Profiles .pdf
Data Scientist Machine Learning  Profiles .pdfData Scientist Machine Learning  Profiles .pdf
Data Scientist Machine Learning Profiles .pdf
 
Template xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptxTemplate xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptx
 
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
 
Sid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.pptSid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.ppt
 
ML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
ML-PPT-UNIT-2 Generative Classifiers Discriminative ClassifiersML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
ML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
 
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
 
Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)
 
8 things to know before you start to code in 2024
8 things to know before you start to code in 20248 things to know before you start to code in 2024
8 things to know before you start to code in 2024
 

Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line

  • 1. DATA SCIENCE POP UP AUSTIN Data Do's and Dont's: Lessons From the Front Line Ryan Orban VP of Product and Strategy, 
 Data Scientist, Galvanize ryanorban
  • 3.
  • 4. Data Do’s and Dont’s: Lessons from the Frontline
  • 5. Co-Founder & CEO Zipfian Academy Ryan Orban @ryanorban EVP of Product and Strategy Galvanize
  • 6. We believe an opportunity belongs 
 to anyone with aptitude and ambition.
  • 7. 4Galvanize 2015 NODES ON THE NETWORK COLORADO (BOULDER, DENVER, FORT COLLINS) SEATTLE, WA SAN FRANCISCO, CA AUSTIN, TX (OPENING Q1 2016) Programs: Full Stack Immersive, Data Science Immersive, Entrepreneurship Programs: Full Stack Immersive, Data Science Immersive, Entrepreneurship Programs: Full Stack Immersive, Data Science Immersive, Data Engineering Immersive, Masters of Science in Data Science, Entrepreneurship Programs: Full Stack Immersive, Data Science Immersive, Entrepreneurship [Explanation Text]
  • 8. 5Galvanize 2015 5 PROGRAMS • Full Stack Immersive • Data Science Immersive • Data Engineering Immersive Project over 500 Student Member Graduates in 2015 Currently over 1500 Members • Master of Science in Data Science 
 (University of New Haven) • Startup Membership
  • 9. 6Galvanize 2015 PLACEMENT STATS FULL STACK IMMERSIVE DATA SCIENCE IMMERSIVE $43K $77KPre-program Salary Average Starting Salary 97% Placement Rate* *Galvanize is a founder member of NESTA (New Economy Skills Training Association), a trade organization founded to regulate the new “bootcamp” market. This place rate is more rigorous than that requested by state licensure agencies. The placement rate is calculated 6 months after graduation. $72K $114KPre-program Salary 94%Placement Rate* Average Starting Salary
  • 10. Software Engineering Data Science Data Analysis Data Engineering Machine Learning Java Linux, UNIX Mobile Development Objective C C, C++, C# Web Development Ruby on Rails JavaScript Front-endPHP Full- Stack Excel Python SQL NLP Hadoop Databases Network Analysis Java Assembly Statistics R The orange words are the most important things we teach. How These Things Relate to Each Other Full-Stack Web Development and Data Science are in gray circles.
  • 11. 8Galvanize 2015 DATA SCIENCE IMMERSIVE Week 1 - Exploratory Data Analysis and Software Engineering Best Practices Week 2 - Statistical Inference, Bayesian Methods, A/B Testing, Multi-Armed Bandit Week 3 - Regression, Regularization, Gradient Descent Week 4 - Supervised Machine Learning: Classification, Validation, Ensemble Methods Week 5 - Clustering, Topic Modeling (NMF, LDA), NLP Week 6 - Network Analysis, Matrix Factorization, and Time Series Week 7 - Hadoop, Hive, and MapReduce Week 8 - Data Visualization with D3.js, Data Products, and Fraud Detection Case Study Weeks 9-10 - Capstone Projects Week 12 - Onsite Interviews
  • 12.
  • 13. Data Manipulation Model Creation Prediction
  • 15. Do Don’t • Assume your data is friendly • ETL and feature engineering is largely opaque to others (and yourself after enough time away) • Automate cleaning and transformation pipelines • Jupyter and RStudio are great for EDA, but have issues with collaboration and version control • Build functional code to be reused; export into plain code files, track with Git
  • 17. Do Don’t • Never use accuracy as your main metric • You can have 99% accuracy but 0% predictive power • Unbalanced classes; sampling • Use metrics like precision and recall • Aggregate metrics like F1-score, AUC/AIC/BIC also good • Remember that models with highest scores are not always the ones you need; permissive vs. conservative based on use case
  • 18. Do Don’t • Don’t start with the most complicated models first (deep learning, gradient boosting, SVMs, etc.) • Don’t focus on the algorithm •“More data always beats better algorithms” • But better features usually beat better algorithms* • Start with a baseline model, then continuously “close the loop” • Create a base case to optimize against • Does 1% greater F1-score outweigh a 10x training time in production? Not usually unless you’re Google-scale.
  • 19. Do Don’t • Assume your cross-validation metrics will hold up against real-life data • Separate your application and prediction code • Fast iteration cycles are key. Create a “scoring service” that is uncoupled from application code. • APIs & service oriented architectures typically work best
  • 21. Do Don’t • Don’t focus on the “how”, i.e. cover every trial and tribulation • Cut to the chase • After a presentation, I always ask the class two questions: • What is one sentence that describes what the speaker learned? • Why do I care?
  • 22. 19Galvanize 2015 • Early Access to Students • Candidate Matching • Curriculum Development • Corporate Student Sponsorship • Diversity TALENT
  • 23. 20Galvanize 2015 • Membership • Organic Relationships • Course Content • Mentorship • Community • Events ACCESS
  • 24. 21Galvanize 2015 • Galvanize Experts • Capstone Projects • Internship • Corporate Training EXPERTISE
  • 25. THANK YOU RYAN ORBAN | EVP, STRATEGY ryan.orban@galvanize.com @ryanorban www.galvanize.com