Getting Started with Data Science
December 2017
http://bit.ly/data-science-sd
Deskhub-main - stake2017!
About me
Jordan Zurowski
Thinkful Community Manager
MA in Industrial/Organizational
Psychology
About you
I already have a career in data
I'm interested in switching into a data career
I just want to see what all the fuss is about
About Thinkful
Thinkful helps people become developers or data
scientists through 1-on-1 mentorship and project-based
learning
These workshops are built using this approach.
Today's Goals
What is Data Science?
How and why has the field emerged?
What do data scientists do?
Next steps
Example: LinkedIn 2006
“[LinkedIn] was like arriving at a conference
reception and realizing you don’t know
anyone. So you just stand in the corner
sipping your drink—and you probably leave
early.”
-LinkedIn Manager, June 2006
Enter: Data Scientist
Jonathan Goldman
Joined LinkedIn in 2006, when it had
only 8M users (450M in 2016)
Started experiments to predict
people’s networks
Engineers were dismissive: “you
can already import your
address book”
The Result
Other Examples
Uber — Where drivers should hang out
Tala — Microfinance loan approval
Why now?
Big Data: datasets whose size is
beyond the ability of typical database
software tools to capture, store,
manage, and analyze
Brief history of "big data"
Trend "started" in 2005
Web 2.0 - Majority of content is created
by users
Mobile accelerates this — data/person
skyrockets
Big Data
90% of the data in the world
today has been created in the
last two years alone
- IBM, May 2013
The Problem
The Solution
Data Scientists - Jack of All Trades
Data Science is just the beginning
“The United States alone faces a shortage
of 140,000 to 190,000 people with deep
analytical skills as well as 1.5 million
managers and analysts to analyze big
data and make decisions based on their
findings.”
- McKinsey
The Process - LinkedIn Example
Frame the question
Collect the raw data
Process the data
Explore the data
Communicate results
Case: Frame the Question
What questions do we want to answer?
Case: Frame the Question
What connections (type and number) lead to
higher user engagement?
Which connections do people want to make
but are currently limited from making?
How might we predict these types of
connections with limited data from the user?
Case: Collect the Data
What data do we need to answer these
questions?
Case: Collect the Data
Connection data (who is connected to whom?)
Demographic data (what is the profile of the
connection?)
Engagement data (how do they use the site?)
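A minimal sketch of how these three kinds of data might be organized. The record types and field names (Connection, Profile, EngagementEvent) and the sample users are hypothetical, chosen only for illustration; they are not LinkedIn's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """Connection data: an undirected link between two users."""
    user_a: str
    user_b: str


@dataclass(frozen=True)
class Profile:
    """Demographic data: what we know about one user."""
    user: str
    company: str
    city: str


@dataclass(frozen=True)
class EngagementEvent:
    """Engagement data: one action a user took on the site."""
    user: str
    action: str  # hypothetical labels such as "page_view"


def connection_counts(connections):
    """Count how many connections each user has."""
    counts = {}
    for c in connections:
        counts[c.user_a] = counts.get(c.user_a, 0) + 1
        counts[c.user_b] = counts.get(c.user_b, 0) + 1
    return counts


# Hypothetical sample data
connections = [Connection("ana", "ben"), Connection("ana", "cy")]
print(connection_counts(connections))  # {'ana': 2, 'ben': 1, 'cy': 1}
```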
Case: Process the Data
How is the data “dirty” and how can we clean
it?
Case: Process the Data
User input
Redundancies
Feature changes
Data model changes
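Two of the problems above, messy user input and redundancies, can be sketched in a few lines. This is an illustrative example only: the alias table, the function names, and the sample records are assumptions, not part of the LinkedIn case.

```python
def normalize_company(raw):
    """Collapse free-text user input to one canonical spelling."""
    cleaned = raw.strip().lower().rstrip(".")
    # Hypothetical alias table; a real one would be far larger.
    aliases = {
        "ibm corp": "ibm",
        "international business machines": "ibm",
    }
    return aliases.get(cleaned, cleaned)


def deduplicate(records):
    """Drop redundant (user, company) rows, keeping first occurrences."""
    seen = set()
    unique = []
    for user, company in records:
        key = (user.strip().lower(), normalize_company(company))
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Hypothetical raw input: the first two rows describe the same person.
raw_records = [
    ("Ana", "IBM"),
    ("ana ", "ibm corp."),
    ("Ben", "International Business Machines"),
]
print(deduplicate(raw_records))  # [('ana', 'ibm'), ('ben', 'ibm')]
```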
Case: Explore the Data
What are the meaningful patterns in the
data?
Case: Explore the Data
Triangle closing
Time overlaps
Geographic overlaps
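Triangle closing is the idea that if A knows B and B knows C, then A and C are likely to know each other. A minimal sketch, assuming a small hypothetical network stored as a dictionary of sets (the names and the function suggest_connections are illustrative, not LinkedIn's implementation):

```python
from collections import Counter


def suggest_connections(graph, user):
    """Rank people `user` is not connected to by mutual connections."""
    mutual = Counter()
    for friend in graph[user]:
        for candidate in graph[friend]:
            # Skip the user and anyone they are already connected to.
            if candidate != user and candidate not in graph[user]:
                mutual[candidate] += 1
    return mutual.most_common()


# Hypothetical network: each key maps to that person's connections.
graph = {
    "ana": {"ben", "cy"},
    "ben": {"ana", "dee"},
    "cy": {"ana", "dee", "eli"},
    "dee": {"ben", "cy"},
    "eli": {"cy"},
}
print(suggest_connections(graph, "ana"))  # [('dee', 2), ('eli', 1)]
```

Here "dee" ranks first because connecting ana and dee would close two triangles (through ben and through cy).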
Case: Communicate Findings
How do we communicate this? To whom?
Case: Communicate Findings
“People You May Know” feature increased
clickthrough by 30% (generating millions
more page views)
Tools
SQL Queries
Business Analytics Software
Machine Learning Algorithms
#1: SQL Queries
SQL is the standard query language
for accessing and manipulating databases
#1: SQL Queries
SELECT full_name FROM friends WHERE age > 22;
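To try this query without installing a database, one option is Python's built-in sqlite3 module. The sketch below assumes a hypothetical friends table with made-up rows; the ORDER BY clause is an addition so the output order is predictable.

```python
import sqlite3

# An in-memory database that disappears when the script ends.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (full_name TEXT, age INTEGER)")

# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO friends VALUES (?, ?)",
    [("Ana Ruiz", 31), ("Ben Cho", 19), ("Cy Park", 23)],
)

rows = conn.execute(
    "SELECT full_name FROM friends WHERE age > 22 ORDER BY full_name"
).fetchall()
names = [row[0] for row in rows]
print(names)  # ['Ana Ruiz', 'Cy Park']

conn.close()
```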
#2: Visualization Software
Business analytics software for your database
enabling you to easily find and communicate
insights visually
#3: Machine Learning Algorithms
Machine learning algorithms provide
computers with the ability to learn
without being explicitly programmed —
“programming by example”
Iris Data Set
Use Cases for Machine Learning
Classification — Predict categories
Regression — Predict values
Anomaly/Fraud Detection — Find unusual occurrences
Clustering — Discover structure
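As an illustration of classification, here is a minimal nearest-neighbor sketch on Iris-style flower measurements. The six labeled rows are a small illustrative sample in the style of the Iris data set, and nearest-neighbor is just one simple technique chosen for the example. The program is never told the rules for each species; it labels a new flower by finding the most similar known example.

```python
import math

# (sepal length, sepal width, petal length, petal width) in cm, with label.
# A small illustrative sample, not the full data set.
examples = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
]


def classify(measurements):
    """Return the label of the closest known example (1-nearest neighbor)."""
    nearest = min(examples, key=lambda ex: math.dist(ex[0], measurements))
    return nearest[1]


print(classify((5.0, 3.4, 1.5, 0.2)))  # setosa
print(classify((6.0, 2.9, 4.5, 1.5)))  # versicolor
```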
It may seem like a daunting opportunity
But if you're interested...
Knowledge of statistics, algorithms, &
software
Comfort with languages & tools (Python,
SQL, Tableau)
Inquisitiveness and intellectual curiosity
Strong communication skills
It’s all teachable!
Ways to keep learning
For aspiring developers...
Source: Bureau of Labor Statistics
Thinkful's track record of getting students jobs
92% of grads placed in full-time tech jobs
Job guarantee
Link for the third-party audited jobs report:
https://www.thinkful.com/outcomes
Our students receive unprecedented support
1-on-1 Learning Mentor
1-on-1 Career Mentor
Program Manager
San Diego Community
You
1-on-1 mentorship enables flexible learning
Learn anywhere,
anytime, and on your
own schedule
You don't have to quit
your job to start a career
transition
Thinkful's Free Resource
Introduction to Python, Data
Visualization, and Stats.
Unlimited mentor-led Q&A sessions
Personal Program Manager
bit.ly/tf-ds-free-course
