Data Science:
How did we get here and where are we going?
June 2017
http://bit.ly/data-la
WIFI: CrossCamp.us Events
About us
We train developers and data
scientists through 1-on-1
mentorship and career prep
About us
• Noel Duarte
• Los Angeles Area
General Manager
• UC Berkeley ’15 — worked
primarily with R for
population genetics
analysis, at Thinkful since
January 2016
• Kyle Polich
• Data science mentor at
Thinkful
• Host for Data Skeptic, a
podcast devoted to all
things data science and
advancements in the
industry
About you
Why are you here?
• I already have a career in data
• I’m curious about switching to a career in data
• I want to learn what data science is and why it’s
important
Today’s goals
• Why is data science important?
• What is a data scientist and what do they do?
• How and why has the field emerged?
• How can one become a data scientist? (And why
would you want to?)
Why is data science important?
By 2018, the United States alone could face a shortage
of 140,000 to 190,000 people with deep analytical skills
as well as 1.5 million managers and analysts with the
know-how to use the analysis of big data to make
effective decisions.
- McKinsey Global Institute (MGI)
Data Scientist:
Case study: LinkedIn (2006)
“[LinkedIn] was like arriving at a conference reception
and realizing you don’t know anyone. So you just stand
in the corner sipping your drink—and you probably
leave early.”
-LinkedIn Manager, June 2006
The new guy
• Joined LinkedIn in 2006,
only 8M users (450M in
2016)
• Started experiments to
predict people’s networks
• Engineers were dismissive:
“you can already import
your address book”
The result
Data, data everywhere 🚀
• Uber — Where drivers should hang out
• Netflix — movie recommendations
• Ebola epidemic — Mobile mapping in Senegal to
fight disease
Data, data everywhere 🚀
Big Data — what exactly does it mean?
Big Data: datasets whose size is beyond the ability of
typical database software tools to capture, store,
manage, and analyze
Big Data — brief history
• Trend “started” in 2005 (Hadoop!)
• Web 2.0 - Majority of content is created by users
• Mobile accelerates this — data/person skyrockets
Big Data — 3 Vs
Big Data — tldr;
90% of the data in the world today has been created
in the last two years alone.
- IBM, May 2013
In come data scientists!
Intersection of engineering, statistics, & communication
The data science process
Let’s come back to LinkedIn’s evolution in 2006 and
examine it using a typical* data science approach.
• Frame the question
• Collect the raw data
• Process the data
• Explore the data
• Communicate results
Case: Frame the question
What questions do we want to answer?
Case: Frame the question
• What connections (type and number) lead to higher
user engagement?
• Which connections do people want to make but are
currently limited from making?
• How might we predict these types of connections
with limited data from the user?
Case: Collect the data
What data do we need to answer these questions?
Case: Collect the data
• Connection data (who is connected to whom?)
• Demographic data (what is the profile of each connection?)
• Retention data (how do people stay or leave?)
• Engagement data (how do they use the site?)
Case: Process the data
How is the data “dirty” and how can we clean it?
Case: Process the data
• User input
• Redundancies
• Feature changes
• Data model changes
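A minimal sketch of cleaning user input and removing redundancies, two of the issues listed above. The rows, field names, and the list of legal suffixes are hypothetical and chosen only for illustration; this is not LinkedIn's actual pipeline.

```python
import re

# Hypothetical raw profile rows, as users might type them.
raw_rows = [
    {"email": "Pat@Example.com ", "company": "Acme, Inc."},
    {"email": "pat@example.com", "company": "ACME Inc"},
    {"email": "lee@example.com", "company": "  Globex  Corporation"},
]

def normalize_company(name):
    """Lowercase, drop punctuation, collapse whitespace, strip a trailing legal suffix."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return re.sub(r"\s+(inc|corp|corporation|llc)$", "", cleaned)

def clean(rows):
    """Normalize each row and keep only the first record seen per email."""
    seen = {}
    for row in rows:
        email = row["email"].strip().lower()
        if email not in seen:
            seen[email] = {"email": email, "company": normalize_company(row["company"])}
    return list(seen.values())

cleaned_rows = clean(raw_rows)
print(cleaned_rows)
```

Keeping the first record per email is only one possible rule; a real project would decide which duplicate to trust based on how the data was collected.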
Case: Explore the data
What are the meaningful patterns in the data?
Case: Explore the data
• Triangle closing
• Time overlaps
• Geographic clustering
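Triangle closing can be sketched in a few lines: if two people share connections but are not connected themselves, suggest that they connect, ranked by how many connections they share. The names and connection pairs below are made up for illustration, and the ranking rule is an assumption rather than the method LinkedIn used.

```python
from collections import Counter

# Hypothetical connection data: each pair is an existing, mutual connection.
connections = [
    ("ana", "ben"),
    ("ana", "cara"),
    ("ben", "cara"),
    ("ben", "dev"),
    ("cara", "dev"),
    ("dev", "eli"),
]

def build_graph(pairs):
    """Map each person to the set of people they are connected to."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def suggest(graph, user):
    """Rank people the user is not yet connected to by number of shared connections."""
    mutual = Counter()
    for friend in graph[user]:
        for candidate in graph[friend]:
            if candidate != user and candidate not in graph[user]:
                mutual[candidate] += 1
    # Most shared connections first; ties broken alphabetically.
    return sorted(mutual.items(), key=lambda item: (-item[1], item[0]))

graph = build_graph(connections)
suggestions = suggest(graph, "ana")
print(suggestions)
```

Here "ana" and "dev" share two connections ("ben" and "cara"), so connecting them would close two triangles at once.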
Case: Communicate results
How do we communicate this? To whom?
Case: Communicate results
• Tell story at the right technical level for each audience
• Make sure to focus on What’s In It For You (WIIFY!)
• Be objective, don’t lie with statistics
• Be visual! Show, don’t just tell
Tools to explore “big data”
• SQL Queries
• Business Analytics Software
• Machine Learning Algorithms
Tool #1: SQL queries
SQL is the standard querying language to access and
manipulate databases
SQL example
friends
id  full_name     age
1   Dan Friedman  24
2   Jared Jones   27
3   Paul Gu       22
4   Noel Duarte   73

SELECT full_name FROM friends WHERE age = 73;
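To try the query without installing a database server, one option is Python's built-in sqlite3 module. The sketch below rebuilds the friends table from the slide in memory; the column types are an assumption, since the slide shows only the values.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE friends (id INTEGER PRIMARY KEY, full_name TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO friends (id, full_name, age) VALUES (?, ?, ?)",
    [
        (1, "Dan Friedman", 24),
        (2, "Jared Jones", 27),
        (3, "Paul Gu", 22),
        (4, "Noel Duarte", 73),
    ],
)

# Same query as on the slide, with the age passed as a parameter.
rows = conn.execute("SELECT full_name FROM friends WHERE age = ?", (73,)).fetchall()
names = [name for (name,) in rows]
print(names)
conn.close()
```

Passing the value through a `?` placeholder instead of pasting it into the SQL string is the usual habit, because it keeps user-supplied values from being interpreted as SQL.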
Tool #2: Analytics software
Business analytics software for your database enabling
you to easily find and communicate insights visually
Tableau example
Tool #3: Machine Learning Algorithms
Machine learning algorithms provide computers
with the ability to learn without being explicitly
programmed — “programming by example”
Iris data set example
Iris data set example
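"Programming by example" can be illustrated with a nearest-neighbor classifier: instead of hand-writing rules for each species, the program labels a new flower with the species of the most similar labelled example. The sketch below uses a handful of measurements in the style of the Iris data set and a 1-nearest-neighbor rule; both choices are assumptions for illustration, since the slides do not say which algorithm was shown.

```python
import math

# A few labelled measurements in the style of the Iris data set:
# (sepal length, sepal width, petal length, petal width), in cm.
examples = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
]

def predict(features):
    """Return the label of the closest labelled example (1-nearest neighbor)."""
    _, label = min(examples, key=lambda ex: math.dist(ex[0], features))
    return label

print(predict((5.0, 3.4, 1.5, 0.2)))
print(predict((6.0, 2.9, 4.5, 1.5)))
print(predict((6.5, 3.0, 5.8, 2.2)))
```

No rule about petal length appears anywhere in the code; the behavior comes entirely from the labelled examples. A real project would use the full data set and hold some rows back to check accuracy.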
Use cases for machine learning
• Classification — Predict categories
• Regression — Predict values
• Anomaly Detection — Find unusual occurrences
• Clustering — Discover structure
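As one small illustration of the anomaly detection use case, the sketch below flags values that sit far from the average, measured in standard deviations (a z-score). The daily login counts and the threshold of 2 are invented for the example; real systems often use more robust methods.

```python
import statistics

# Hypothetical daily login counts; one day is clearly unusual.
daily_logins = [102, 98, 105, 99, 101, 97, 240, 103]

def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds the threshold."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # All values identical, so nothing stands out.
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - mean) / spread > threshold
    ]

anomalies = find_anomalies(daily_logins)
print(anomalies)
```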
I’m in! Where do I start?
• Knowledge of statistics, algorithms, & software
• Comfort with languages & tools (Python, SQL,
Tableau)
• Inquisitiveness and intellectual curiosity
• Strong communication skills
Ways to keep learning
[Figure: learning options arranged on two axes, from less to more structure and from less to more support]
1-on-1 mentorship enables flexibility
325+ mentors with an average of 10
years of experience in the field
Support ‘round the clock
• You
• Your mentor
• Q&A Sessions
• In-person workshops
• Career coach
• Slack
• Program Manager
Want to try us/data science out?
Talk to us now or be on the lookout for our email 📬
Thinkful’s Data Science
Prep Course covers:
- Python fundamentals
- Statistics
- Data science concepts
- Capstone project
$250 for 3 weeks

Getting Started in Data Science
