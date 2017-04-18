Getting Started in Data Science April 2017 http://bit.ly/data-la Wiﬁ: CrossCamp.us Events

Getting started in Data Science (April 2017, Los Angeles)

Published in: Education

  1. Getting Started in Data Science April 2017 http://bit.ly/data-la Wifi: CrossCamp.us Events
  2. About us We train developers and data scientists through 1-on-1 mentorship and career prep
  3. About me • Noel Duarte • Los Angeles Area General Manager • UC Berkeley ’15 - worked primarily with R for population genetics analysis, at Thinkful since January 2016
  4. About you Why are you here? • I already have a career in data • I’m curious about switching to a career in data • I’m currently transitioning into a career in data • I want to learn what data science is and why it’s important
  5. Today’s goals • Why is data science important? • What is a data scientist and what do they do? • How and why has the field emerged? • How can one become a data scientist? (And why would you want to?)
  6. Why is data science important? By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions. - McKinsey Global Institute (MGI)
  7. Data Scientist: Definition #1
  8. Data Scientist: Definition #2 Nate Silver FiveThirtyEight.com “I think data-scientist is a sexed up term for a statistician”
  9. Data Scientist: Definition #3
  10. Case study: LinkedIn (2006) “[LinkedIn] was like arriving at a conference reception and realizing you don’t know anyone. So you just stand in the corner sipping your drink - and you probably leave early.” - LinkedIn Manager, June 2006
  11. The new guy • Joined LinkedIn in 2006, when it had only 8M users (450M in 2016) • Started experiments to predict people’s networks • Engineers were dismissive: “you can already import your address book”
  12. The result
  13. Data, data everywhere 🚀 • Uber: where drivers should hang out • Netflix: $1M movie recommendations contest • Ebola epidemic: mobile mapping in Senegal to fight disease
  14. Data, data everywhere 🚀
  15. Big Data - what exactly does it mean? Big Data: datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze
  16. Big Data - brief history • Trend “started” in 2005 (Hadoop!) • Web 2.0 - Majority of content is created by users • Mobile accelerates this: data per person skyrockets
  17. Big Data - 3 Vs
  18. Big Data - tl;dr: 90% of the data in the world today has been created in the last two years alone. - IBM, May 2013
  19. In come data scientists!
  20. Intersection of engineering, statistics, & communication
  21. The data science process Let’s come back to LinkedIn’s evolution in 2006 and examine it using a typical* data science approach. • Frame the question • Collect the raw data • Process the data • Explore the data • Communicate results
  22. Case: Frame the question What questions do we want to answer?
  23. Case: Frame the question • What connections (type and number) lead to higher user engagement? • Which connections do people want to make but are currently limited from making? • How might we predict these types of connections with limited data from the user?
  24. Case: Collect the data What data do we need to answer these questions?
  25. Case: Collect the data • Connection data (who is connected to whom?) • Demographic data (what is the profile of each connection?) • Retention data (do people stay or leave, and why?) • Engagement data (how do they use the site?)
  26. Case: Process the data How is the data “dirty” and how can we clean it?
  27. Case: Process the data • User input • Redundancies • Feature changes • Data model changes
  28. Case: Explore the data What are the meaningful patterns in the data?
  29. Case: Explore the data • Triangle closing • Time overlaps • Geographic clustering
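
Triangle closing, listed on the slide above, is the idea that two people who share a connection are likely to know each other. The following is a minimal Python sketch of that idea, not LinkedIn's actual method: the toy graph, the names in it, and the `suggest_connections` helper are all hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical toy connection graph (an assumption, not real data):
# each person maps to the set of people they are already connected to.
connections = {
    "ana": {"ben", "cam"},
    "ben": {"ana", "cam", "dee"},
    "cam": {"ana", "ben", "dee"},
    "dee": {"ben", "cam"},
    "eli": set(),
}


def suggest_connections(graph, person):
    """Rank people `person` is not yet connected to by mutual connections."""
    mutual = Counter()
    for friend in graph[person]:
        for friend_of_friend in graph[friend]:
            # Skip the person themselves and anyone already connected.
            if friend_of_friend != person and friend_of_friend not in graph[person]:
                mutual[friend_of_friend] += 1
    # Most mutual connections first; ties broken alphabetically.
    return sorted(mutual.items(), key=lambda item: (-item[1], item[0]))


print(suggest_connections(connections, "ana"))
```

Each suggestion would "close a triangle" between the person and two of their existing connections. A person with no connections gets no suggestions, which is one reason the limited-data question on the earlier slide is hard.
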
  30. Case: Communicate results How do we communicate this? To whom?
  31. Case: Communicate results • Tell the story at the right technical level for each audience • Make sure to focus on What’s In It For You (WIIFY!) • Be objective; don’t lie with statistics • Be visual! Show, don’t just tell
  32. Tools to explore “big data” • SQL Queries • Business Analytics Software • Machine Learning Algorithms
  33. Tool #1: SQL queries SQL is the standard query language for accessing and manipulating relational databases
  34. SQL example Table friends (id, full_name, age): (1, Dan Friedman, 24); (2, Jared Jones, 27); (3, Paul Gu, 22); (4, Noel Duarte, 73). Query: SELECT full_name FROM friends WHERE age=73
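
The SQL example on the slide above can be tried without installing a database server. This is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The table name, rows, and query come from the slide; the column types are an assumption.

```python
import sqlite3

# Rows of the `friends` table as shown on the slide.
rows = [
    (1, "Dan Friedman", 24),
    (2, "Jared Jones", 27),
    (3, "Paul Gu", 22),
    (4, "Noel Duarte", 73),
]

# An in-memory database exists only while this connection is open.
conn = sqlite3.connect(":memory:")
# Column types are assumed; the slide only shows column names.
conn.execute(
    "CREATE TABLE friends (id INTEGER PRIMARY KEY, full_name TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO friends (id, full_name, age) VALUES (?, ?, ?)", rows
)

# The query from the slide: names of friends whose age is 73.
result = conn.execute("SELECT full_name FROM friends WHERE age = 73").fetchall()
names = [row[0] for row in result]
print(names)

conn.close()
```
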
  35. Tool #2: Analytics software Business analytics software connects to your database so you can easily find and communicate insights visually
  36. Tableau example
  37. Tool #3: Machine Learning Algorithms Machine learning algorithms provide computers with the ability to learn without being explicitly programmed (“programming by example”)
  38. Iris data set example
  39. Iris data set example
  40. Use cases for machine learning • Classification: predict categories • Regression: predict values • Anomaly detection: find unusual occurrences • Clustering: discover structure
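
To make "programming by example" and the classification use case concrete, here is a minimal sketch of a nearest-centroid classifier in plain Python. It is one simple technique chosen for illustration, not necessarily what the Iris slides showed. The handful of labeled measurements follows the layout of the Iris data set (sepal length, sepal width, petal length, petal width, in cm); treat the exact numbers as illustrative. The helper names `centroid`, `train`, and `predict` are hypothetical.

```python
# A few labeled examples in the style of the Iris data set (illustrative values).
examples = {
    "setosa": [(5.1, 3.5, 1.4, 0.2), (4.9, 3.0, 1.4, 0.2), (4.7, 3.2, 1.3, 0.2)],
    "versicolor": [(7.0, 3.2, 4.7, 1.4), (6.4, 3.2, 4.5, 1.5), (6.9, 3.1, 4.9, 1.5)],
    "virginica": [(6.3, 3.3, 6.0, 2.5), (5.8, 2.7, 5.1, 1.9), (7.1, 3.0, 5.9, 2.1)],
}


def centroid(points):
    """Average each measurement across a list of points."""
    return tuple(sum(values) / len(points) for values in zip(*points))


def train(labeled_examples):
    """'Learning' here is just computing one average point per category."""
    return {label: centroid(points) for label, points in labeled_examples.items()}


def predict(model, point):
    """Return the category whose average point is closest to `point`."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(point, center))

    return min(model, key=lambda label: squared_distance(model[label]))


model = train(examples)
print(predict(model, (5.0, 3.4, 1.5, 0.2)))
print(predict(model, (6.0, 2.9, 4.5, 1.5)))
print(predict(model, (6.5, 3.0, 5.5, 2.0)))
```

No rule such as "short petals mean setosa" is written anywhere in the code; the categories are inferred from the labeled examples alone.
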
  41. If this excites you…
  42. This is what you’ll need • Knowledge of statistics, algorithms, & software • Comfort with languages & tools (Python, SQL, Tableau) • Inquisitiveness and intellectual curiosity • Strong communication skills
  43. More about Thinkful You’ll learn concepts, practice with drills, and build capstone projects for your own portfolio, all guided by a personal mentor
  44. Our mentors Mentors have, on average, 10+ years of experience
  45. Our results Job Titles after Graduation; Months until Employed
  46. Want to try us/data science out? • Three-week program, includes six mentor sessions for $250 • Overview of Python, Python’s data science toolkit, stats • Option to continue into full data science bootcamp • Talk to me (or email me) if you’re interested
  47. October 2015 Questions? noel@thinkful.com
