This document summarizes an introductory session on data science. It opens with introductions and the session's goals: to define what a data scientist is, explain how the field emerged, and show how to become one. It then covers the growing demand and high salaries for data scientists, and gives examples of data science applied at LinkedIn and Netflix and in the fight against Ebola. It explains key concepts such as big data, Hadoop, MapReduce, and machine learning algorithms. It concludes with the data science process and the tools used, and encourages the audience that they can become data scientists with the right knowledge, skills, and learning approach.
4. Today’s Goals
What is a data scientist and what do they do?
How and why has the field emerged?
How can one become a data scientist?
5. Why do we care?
“The United States alone faces a shortage of
140,000 to 190,000 people with deep analytical
skills as well as 1.5 million managers and
analysts to analyze big data and make
decisions based on their findings.”
- @McKinsey
6. Why do we care?
Also… average salaries are $115,000 a year
10. Example: LinkedIn 2006
“[LinkedIn] was like arriving at a conference
reception and realizing you don’t know
anyone. So you just stand in the corner
sipping your drink—and you probably leave
early.”
-LinkedIn Manager, June 2006
11. Enter: Data Scientist
Joined LinkedIn in 2006, only 8M
users (450M in 2016)
Started experiments to predict
people’s networks
Engineers were dismissive: “you
can already import your address
book”
Jonathan Goldman
13. Other Examples
Uber — Where drivers should hang out
Netflix — $1M movie recommendations
contest
Ebola — Mobile mapping in Senegal to fight
disease
14. Big Data
Big Data: datasets whose size is beyond the
ability of typical database software tools to
capture, store, manage, and analyze
15. Big Data - History
Trend “started” in 2005 (Hadoop!)
Web 2.0 - Majority of content is created by
users
Mobile accelerates this — data/person
skyrockets
24. The Process - LinkedIn Example
Frame the question
Collect the raw data
Process the data
Explore the data
Communicate results
25. Case: Frame the Question
What questions do we want to answer?
26. Case: Frame the Question
What connections (type and number) lead to
higher user engagement?
Which connections do people want to make
but are currently limited from making?
How might we predict these types of
connections with limited data from the user?
27. Case: Collect the Data
What data do we need to answer these
questions?
28. Case: Collect the Data
Connection data (who is connected to whom?)
Demographic data (what is the profile of the
connection?)
Retention data (do people stay or leave?)
Engagement data (how do they use the site?)
29. Case: Process the Data
How is the data “dirty” and how can we clean
it?
30. Case: Process the Data
User input - 80/20
Redundancies - 2 emails
Feature changes
Data model changes
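As one illustration of the "Redundancies - 2 emails" item, here is a minimal Python sketch that merges records for the same person registered under two email addresses. The record fields (name, email), the sample records, and the rule of matching on a normalized name are all assumptions made for illustration, not LinkedIn's actual method.

```python
from collections import defaultdict


def normalize_name(name):
    # Collapse case and extra whitespace so "ada  lovelace" matches "Ada Lovelace".
    return " ".join(name.lower().split())


def merge_duplicate_users(records):
    """Group records that share a normalized name and collect their emails."""
    merged = defaultdict(set)
    for record in records:
        key = normalize_name(record["name"])
        merged[key].add(record["email"].strip().lower())
    # Sort the emails so the output is deterministic.
    return {name: sorted(emails) for name, emails in merged.items()}


# Hypothetical sample records (made up for this sketch).
records = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "ada  lovelace", "email": "Ada.L@example.org"},
    {"name": "Alan Turing", "email": "alan@example.com"},
]
merged_users = merge_duplicate_users(records)
```

Matching on name alone is crude, since two different people can share a name. A real cleanup would likely combine several signals before merging accounts.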
31. Case: Explore the Data
What are the meaningful patterns in the
data?
32. Case: Explore the Data
Triangle closing
Time overlaps
Geographic clustering
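"Triangle closing" can be sketched in a few lines of Python: if A knows B and B knows C, then A and C may know each other too. The sketch below ranks suggested connections by the number of mutual connections. The graph, the names, and the function suggest_connections are hypothetical, and this is only one simple way to score candidates, not necessarily the approach used at LinkedIn.

```python
from collections import Counter


def suggest_connections(graph, user):
    """Rank people `user` is not connected to by number of mutual connections."""
    direct = graph.get(user, set())
    mutual_counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the user and people they are already connected to.
            if candidate != user and candidate not in direct:
                mutual_counts[candidate] += 1
    # Most mutual connections first; ties broken alphabetically.
    return sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical connection graph: each person maps to the set of people they know.
graph = {
    "ana": {"ben", "cho"},
    "ben": {"ana", "dev", "eli"},
    "cho": {"ana", "dev"},
    "dev": {"ben", "cho"},
    "eli": {"ben"},
}
suggestions = suggest_connections(graph, "ana")
# dev shares two connections with ana (ben and cho); eli shares one (ben).
```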
34. Case: Communicate Findings
Tell story at the right technical level for each
audience
Make sure to focus on What's In It For You
(WIIFY!)
Be objective; don't lie with statistics
Be visual! Show, don’t just tell
40. #3: Machine Learning Algorithms
Machine learning algorithms provide computers
with the ability to learn without being explicitly
programmed — “programming by example”
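To make "programming by example" concrete, here is a minimal sketch of a nearest-neighbor classifier in plain Python. No rule for "likely to connect" is written by hand. The prediction comes entirely from the labeled examples. The two features (mutual connections, shared employers), the labels, and the example values are invented for illustration.

```python
import math


def predict_nearest_neighbor(examples, features):
    """Return the label of the labeled example closest to `features`."""
    _, nearest_label = min(
        examples, key=lambda example: math.dist(example[0], features)
    )
    return nearest_label


# Hypothetical labeled examples: (mutual connections, shared employers) -> label.
examples = [
    ((0, 0), "unlikely"),
    ((1, 0), "unlikely"),
    ((5, 1), "likely"),
    ((8, 2), "likely"),
]

prediction_many = predict_nearest_neighbor(examples, (6, 1))
prediction_few = predict_nearest_neighbor(examples, (1, 1))
```

Changing the examples changes the behavior without touching the function, which is the sense in which the computer "learns" from data.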
47. That someone might be you
Knowledge of statistics, algorithms, &
software
Comfort with languages & tools (Python,
SQL, Tableau)
Inquisitiveness and intellectual curiosity
Strong communication skills
It’s all Teachable!
48. Ways to keep learning
[Chart axes: Level of support, Learning methods]
49. 1-on-1 mentorship enables flexibility
325+ mentors with an average of 10
years of experience in the field
52. Try us out!
• Initial 3-week prep course
includes six mentor sessions
for $250
• Learn Python, Python’s data
science toolkit, Statistics intro
• Option to continue on to Data
Science bootcamp
• Talk to me (or email
tj@thinkful.com) if you’re
interested