Introductions
➔ What's your name?
➔ What brought you here today?
➔ What is your programming experience?
About Thinkful
We train web developers and data scientists through 1x1 mentorship and project-based learning.
Guaranteed.
“[LinkedIn] was like arriving at a conference reception and realizing you don’t know anyone. So you just stand in the corner sipping your drink — and you probably leave early.”
— LinkedIn Manager, June 2006
Example: LinkedIn 2006
➔ Joined LinkedIn in 2006, when it had only 8M users (450M by 2016)
➔ Started experiments to predict people’s networks
➔ Engineers were dismissive: “you can already import your address book”
Enter: Data Scientist
The Result
➔ Uber: where drivers should hang out
➔ Tala: microfinance loan approval
Other Examples
➔ Big Data: datasets whose size is beyond the ability of typical database software to capture, store, manage, and analyze
Why Now?
➔ Trend "started" in 2005
➔ Web 2.0: the majority of content is created by users
➔ Mobile accelerates this: data per person skyrockets
Brief History of Big Data
“90% of the data in the world today has been created in the last two years alone”
— IBM, May 2013
Big Data
The Problem
The Solution
Jack of All Trades
“The United States alone faces a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts to analyze big data and make decisions based on their findings.”
— McKinsey
Just the Beginning
➔ Frame the question
➔ Collect the raw data
➔ Process the data
➔ Explore the data
➔ Communicate results
The Process: LinkedIn Example
➔ What questions do we want to answer?
◆ Who?
◆ What?
◆ When?
◆ Where?
◆ Why?
◆ How?
Case: Frame the Question
➔ What connections (type and number) lead to higher user engagement?
➔ Which connections do people want to make but are currently limited from making?
➔ How might we predict these types of connections with limited data from the user?
Case: Frame the Question
➔ What data do we need to answer these questions?
Case: Collect the Data
➔ Connection data (who is connected to whom?)
➔ Demographic data (what is the profile of the connection?)
➔ Engagement data (how do they use the site?)
Case: Collect the Data
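To connect the data above to the first framing question, here is a minimal Python sketch that counts each user's connections and compares average engagement across connection-count buckets. The users, the `weekly_sessions` numbers, and the bucket edges are all hypothetical; the slides don't specify any data format.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical connection data: undirected (user, user) pairs.
connections = [
    ("ann", "bob"), ("ann", "cat"), ("ann", "dan"),
    ("bob", "cat"), ("eve", "ann"),
]

# Hypothetical engagement data: average sessions per week for each user.
weekly_sessions = {"ann": 9, "bob": 5, "cat": 4, "dan": 1, "eve": 2, "fay": 0}


def connection_counts(pairs, users):
    """Count connections per user; users with no connections get 0."""
    counts = {user: 0 for user in users}
    for a, b in pairs:
        counts[a] += 1
        counts[b] += 1
    return counts


def bucket(n):
    """Assign a connection count to a coarse, hypothetical bucket."""
    if n == 0:
        return "0"
    if n <= 2:
        return "1-2"
    return "3+"


def engagement_by_bucket(pairs, sessions):
    """Average weekly sessions for the users in each connection-count bucket."""
    counts = connection_counts(pairs, sessions)
    groups = defaultdict(list)
    for user, n in counts.items():
        groups[bucket(n)].append(sessions[user])
    return {b: mean(values) for b, values in groups.items()}


print(engagement_by_bucket(connections, weekly_sessions))
```

An association like this only suggests where to look; on its own it doesn't show that more connections cause higher engagement.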
➔ How is the data “dirty” and how can we clean it?
Case: Process the Data
➔ User input
➔ Redundancies
➔ Feature changes
➔ Data model changes
Case: Process the Data
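Here is a minimal Python sketch of two of the cleaning problems listed above: messy user input and redundant rows. The sample rows and the `ALIASES` table are hypothetical; real cleaning rules depend on the actual data model, and feature or data-model changes usually need their own migration logic.

```python
import re

# Hypothetical raw profile rows showing free-text user input and redundancies.
raw_rows = [
    {"user_id": 1, "company": "  LinkedIn "},
    {"user_id": 2, "company": "linkedin"},
    {"user_id": 2, "company": "linkedin"},         # redundant duplicate row
    {"user_id": 3, "company": "Linked In Corp."},
    {"user_id": 4, "company": ""},                 # missing value
]

# Hypothetical alias table mapping normalized spellings to one canonical name.
ALIASES = {"linked in corp": "linkedin", "linked in": "linkedin"}


def normalize_company(text):
    """Drop punctuation, lowercase, collapse whitespace, then apply aliases.

    Returns None for empty input so missing values are explicit.
    """
    cleaned = re.sub(r"[^\w\s]", "", text).lower()
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return ALIASES.get(cleaned, cleaned) or None


def clean(rows):
    """Normalize company names and keep only the first copy of each
    (user_id, company) pair."""
    seen = set()
    result = []
    for row in rows:
        key = (row["user_id"], normalize_company(row["company"]))
        if key not in seen:
            seen.add(key)
            result.append({"user_id": key[0], "company": key[1]})
    return result


print(clean(raw_rows))
```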
➔ What are the meaningful patterns in the data?
Case: Explore the Data
➔ Triangle closing
➔ Time Overlaps
➔ Geographic Overlaps
Case: Explore the Data
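Triangle closing means suggesting people who share mutual connections with you: if A knows B and B knows C, A may well know C. Below is a minimal Python sketch that ranks friends-of-friends by mutual-connection count, plus a small helper for one possible reading of "time overlaps" (two people at the same company during overlapping years). The names, the graph, and the years are hypothetical, and this is not LinkedIn's actual implementation.

```python
from collections import Counter


def build_graph(pairs):
    """Turn undirected (a, b) connection pairs into an adjacency dict of sets."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def triangle_closing(graph, user, top_n=3):
    """Suggest friends-of-friends, ranked by number of mutual connections."""
    direct = graph.get(user, set())
    mutual = Counter()
    for friend in direct:
        for candidate in graph[friend]:
            if candidate != user and candidate not in direct:
                mutual[candidate] += 1
    return mutual.most_common(top_n)


def years_overlap(start_a, end_a, start_b, end_b):
    """Length of the overlap between two [start, end) year ranges (0 if none)."""
    return max(0, min(end_a, end_b) - max(start_a, start_b))


# Hypothetical network.
graph = build_graph([
    ("ann", "bob"), ("ann", "cat"), ("bob", "dan"),
    ("cat", "dan"), ("bob", "eve"), ("dan", "fay"),
])
print(triangle_closing(graph, "ann"))
print(years_overlap(2008, 2012, 2010, 2015))
```

Here `dan` ranks first for `ann` because both of her connections (`bob` and `cat`) know him.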
➔ How do we communicate this?
➔ To whom?
Case: Communicate Findings
➔ Marketing: X more impressions per day, so we can sell X more ad space
➔ Product: build X more features
➔ Development: grow our team by X
➔ Sales: attract X more premium accounts
➔ C-Level: more revenue; users grew from 8M to 450M in 10 years
Case: Communicate Findings
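The C-level bullet compresses a decade of growth into one figure. As a worked example using the deck's own numbers (8M users in 2006, 450M in 2016), the implied compound annual growth rate is (450 / 8)^(1/10) - 1, or roughly 50% per year. Note that this is user growth, not revenue. A short Python check:

```python
def compound_annual_growth(start, end, years):
    """Constant yearly growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1


rate = compound_annual_growth(8e6, 450e6, 10)
print(f"{rate:.1%}")  # 49.6%
```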
➔ SQL Queries
➔ Business Analytics Software
➔ Machine Learning Algorithms
Tools
SQL is the standard query language for accessing and manipulating databases
SQL Queries
SELECT full_name
FROM friends
WHERE age > 22;
SQL Queries
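To try the query above without a database server, here is a minimal sketch using Python's built-in `sqlite3` module. The `friends` table and its rows are hypothetical, invented only to match the column names in the query.

```python
import sqlite3

# In-memory database with a hypothetical `friends` table matching the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (full_name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO friends (full_name, age) VALUES (?, ?)",
    [("Jack", 25), ("Jill", 22), ("Bob", 30)],
)

# The query from the slide: names of friends older than 22.
rows = conn.execute("SELECT full_name FROM friends WHERE age > 22").fetchall()
names = [name for (name,) in rows]
print(names)
conn.close()
```

Jill is left out because the condition is strictly greater than 22. Without an `ORDER BY` clause SQL does not guarantee row order, so add one if the order matters.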
Business analytics software for your database, enabling you to easily find and communicate insights visually
Visualization Software
Machine Learning Algorithms
Machine learning algorithms provide computers with the ability to learn without being explicitly programmed — “programming by example”
Iris Data Set
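As a small illustration of "programming by example" on the Iris data set, here is a k-nearest-neighbours classifier in plain Python. The slides don't name an algorithm, so k-NN is just one simple choice. The training rows are the first three rows of each species from the commonly distributed version of the data set; the flowers being classified are made-up measurements.

```python
import math
from collections import Counter

# First three rows of each species from the classic Iris data set:
# (sepal length, sepal width, petal length, petal width), in cm.
TRAINING = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((4.7, 3.2, 1.3, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.9, 3.1, 4.9, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
    ((7.1, 3.0, 5.9, 2.1), "virginica"),
]


def knn_predict(sample, training=TRAINING, k=3):
    """Label `sample` by majority vote among its k nearest training rows
    (Euclidean distance over the four measurements)."""
    nearest = sorted(training, key=lambda row: math.dist(sample, row[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical new flowers to classify.
print(knn_predict((5.0, 3.4, 1.5, 0.2)))
print(knn_predict((6.7, 3.1, 5.6, 2.4)))
```

No rule like "petal length under 2 cm means setosa" is written anywhere; the labels come entirely from the examples. `math.dist` requires Python 3.8 or later.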
➔ Classification
➔ Regression
➔ Fraud Detection
➔ Clustering
Use Cases for Machine Learning
➔ Classification
◆ Which defined category does this new piece of data belong to?
◆ Examples?
Use Cases for Machine Learning
➔ Classification
◆ Is This Email Spam or Not Spam?
◆ Pattern Recognition
◆ Handwriting Recognition
◆ Speech Recognition
◆ Iris Species
◆ Supervised Learning
Use Cases for Machine Learning
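For the spam example, here is a minimal naive Bayes classifier with add-one (Laplace) smoothing, written in plain Python. Naive Bayes is one common approach to spam filtering, not necessarily what any particular email provider uses, and the six training messages are hypothetical.

```python
import math
from collections import Counter

# Tiny hypothetical labeled training set.
TRAINING = [
    ("win cash now", "spam"),
    ("cheap pills win big", "spam"),
    ("claim your free cash prize", "spam"),
    ("meeting moved to noon", "ham"),
    ("lunch at noon tomorrow", "ham"),
    ("notes from the meeting", "ham"),
]


def train(examples):
    """Count words per label, messages per label, and the overall vocabulary."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # prior P(label)
        for word in text.split():
            # Add-one smoothing so an unseen word doesn't zero out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(classify("win a free prize", model))
print(classify("see you at the meeting", model))
```

Working in log-probabilities avoids underflow when multiplying many small word probabilities.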
➔ Regression
◆ Using Past Data to Predict a Future Value
◆ Examples?
Use Cases for Machine Learning
➔ Regression
◆ Using Past Data to Predict a Future Value
◆ Stock Market Prices
◆ House Prices
◆ Meetup Attendance
Use Cases for Machine Learning
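For the meetup attendance example, here is a minimal sketch of ordinary least-squares linear regression in plain Python, fitting attendance against RSVPs. The meetup numbers are hypothetical and were chosen to lie exactly on a line so the fit is easy to check; real attendance data would be noisier.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical past meetups: RSVPs versus people who actually attended.
rsvps = [20, 40, 60, 80]
attended = [12, 22, 32, 42]

slope, intercept = fit_line(rsvps, attended)
print(f"attendance = {slope:.2f} * rsvps + {intercept:.2f}")
print(f"predicted for 100 RSVPs: {slope * 100 + intercept:.0f}")
```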
➔ Fraud Detection
◆ Finding Uncommon Patterns to Route to a Human
◆ Examples?
Use Cases for Machine Learning
➔ Fraud Detection
◆ Finding Uncommon Patterns to Route to a Human
◆ Credit Card Purchases
◆ Online Reviews
◆ Gaming
◆ DDoS
Use Cases for Machine Learning
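Here is a minimal sketch of the "route to a human" idea for credit card purchases. It compares new amounts against the cardholder's past purchases and flags anything more than three standard deviations from the mean. The z-score rule, the threshold, and the dollar amounts are assumptions for illustration; production fraud systems combine many more signals.

```python
from statistics import mean, stdev


def unusual(history, new_amounts, threshold=3.0):
    """Return the new amounts lying more than `threshold` standard deviations
    from the mean of the cardholder's past purchases."""
    mu = mean(history)
    sigma = stdev(history)
    return [a for a in new_amounts if abs(a - mu) / sigma > threshold]


# Hypothetical past card purchases (in dollars) and a batch of new ones.
history = [42.0, 38.5, 55.0, 47.25, 40.0, 51.0, 44.5, 39.0]
new_purchases = [45.0, 980.0, 52.0]

flagged = unusual(history, new_purchases)
for amount in flagged:
    print(f"route to a human reviewer: ${amount:.2f}")
```

Scoring new purchases against a separate history avoids letting a single huge outlier inflate the standard deviation it is being measured against.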
➔ Clustering
◆ Grouping data inputs into groups that differentiate them from other groups
◆ Examples?
Use Cases for Machine Learning
➔ Clustering
◆ Grouping data inputs into groups that differentiate them from other groups
◆ Image Analysis & Identification: Cat Breeds
◆ Customer/Donor Segmentation & Strategy
◆ Unsupervised Learning
Use Cases for Machine Learning
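For customer or donor segmentation, here is a minimal k-means sketch in plain Python. It repeatedly assigns each point to its nearest centroid, then moves each centroid to the mean of its points. The two features (visits per month, average spend in tens of dollars), the data points, and the starting centroids are hypothetical; practical implementations usually choose starting centroids randomly or with k-means++.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means with caller-supplied starting centroids.

    Each round assigns every point to its nearest centroid, then moves each
    centroid to the mean of its assigned points (an empty cluster keeps its
    old centroid). Returns the final centroids and the last assignment.
    """
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers: (visits per month, average spend in $10s).
customers = [(1, 2), (2, 1), (1, 1), (8, 9), (9, 8), (9, 9)]
centroids, clusters = kmeans(customers, [(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

Unlike the classification examples, no labels are given here; the groups emerge from the data, which is why clustering counts as unsupervised learning.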
➔ Python for Programming
◆ Great for Data Science
◆ Robotics
◆ Web Development (Python/Django)
◆ Automation
Python
def greet(name):
    # Python 3: print is a function, so its arguments go in parentheses
    print('Hello', name)

greet('Jack')
greet('Jill')
greet('Bob')
Python
A Daunting Opportunity
➔ Knowledge of statistics, algorithms, & software
➔ Comfort with languages & tools (Python, SQL, Tableau)
➔ Inquisitiveness and intellectual curiosity
➔ Strong communication skills
➔ It’s all teachable and learnable!
But If You’re Interested...
Ways to Learn Data Science
➔ Start with Python and Statistics
➔ Personal Program Manager
➔ Unlimited Q&A Sessions
➔ Student Slack Community
➔ bit.ly/freetrial-ds
Thinkful Two-Week Free Trial
The Student Experience
Marnie Boyer, Thinkful Graduate
Capstone
Wolfgang Hall, Thinkful Graduate
Capstone
➔ bit.ly/tf-event-feedback
Survey

Tf gsds
