Airbnb Tech Talk: Josh Wills on The Life of a Data Scientist

This is the accompanying presentation for a tech talk given at Airbnb.
Video of the talk here:
http://www.youtube.com/watch?v=h9vQIPfe2uU
Other tech talks:
https://www.airbnb.com/tech_talks

  • It helps to be an actual scientist first.
  • In the talk I mention “Baseball By the Numbers” instead of this book – the one I really meant. ;)

Presentation Transcript

  • The Life of a Data Scientist, May 23, 2012
  • About Me
    • jwills@cloudera.com and @josh_wills
    • Formerly of Google (2008 – 2011)
      • Worked on the ad auction
      • Led the team that built the data infrastructure for Google+
    • Before that: a bunch of startups
      • Sometimes as a software engineer, sometimes as a statistician
    • Math degree from Duke and a half-finished PhD from The University of Texas at Austin
    • Now: Director of Data Science at Cloudera
    Copyright 2011 Cloudera Inc. All rights reserved
  • @josh_wills, #hacker vs. @josh_wills, #ThoughtLeader Copyright 2012 Cloudera Inc. All rights reserved
  • What is a Data Scientist?
  • One Definition…
  • … versus Another
  • Why Is Everyone Talking About Them?
  • Because They Make Things Fun.
  • Data Scientists Power The Products You Love
  • The Job Isn’t New. The Impact Is.
  • How Do I Become One?
  • The Standard Reply
  • Personality Trait #1: Relentless, but in a Lazy Way
  • Personality Trait #2: (Acquired) Humility
  • Step 1: Study Math
  • But…I didn’t study math.
  • Alternate Step 1: Study (Computer) Science
  • Things People Don’t Know About Computer Science
  • Things Scientists Don’t Know About Statistics
  • Problem Solving In Context
  • Phase 2: Stuff You Still Don’t Know
  • Statisticians: How to Work on an Engineering Team
    • Modular software design
    • Unit tests
    • Code reviews
    • Automated build and test infrastructure
    • Source code management
  • Software Engineers: How to Carry Out an Analysis
  • Industrial Machine Learning
  • Data Scientists and Hadoop
  • Data Analyst: “If my tools and data can’t answer a question, then the question doesn’t get answered.”
  • Data Scientist: “If my tools and data can’t answer a question, then I go get better tools and data.”
  • Incredibly Common Question: “When should I use Hadoop instead of a relational database?”
  • The Unit of Analysis Problem: Three Symptoms
  • First Symptom: COUNT DISTINCT
  • Second Symptom: Cursors
  • Third Symptom: ALTER TABLE OF_DOOM
  • The Unit of Analysis Problem
    • Data warehouses are optimized to analyze transactions
    • Awesome for finance and ERP
    • Not ideal for product and marketing
    • A function of what databases are good at
  • What Are You Trying to Analyze?
    • Simple Entities
      • Static attributes
      • Flat data structure
      • Transient
      • Examples
        • SKUs
        • Line items from an invoice
        • Log messages
    • Complex Entities
      • Evolving attributes
      • Hierarchical data structure
      • Persistent
      • Examples
        • Customers
        • Suppliers
        • Website visitors
  • Choosing Our Own Data Format
    • We get to structure our data in the way that works best for the problem we are solving
    • Flexible
    • Evolvable
    • Compact
    • Fast serialization/deserialization
  • Spell Correction: The Drosophila of Data Science
  • Simple Counts on Complex Objects
  • The Uncanny Valley for Statisticians on Hadoop
  • The Business of Data Science
  • Where You Should Work: The Two Options
  • A Startup
  • Close to the Money
  • Dealing for Data
  • Education and Growth
  • Questions? @josh_wills
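
The slides on the unit of analysis ("What Are You Trying to Analyze?", "First Symptom: COUNT DISTINCT", "Simple Counts on Complex Objects") only survive here as titles and bullets, so the following is one possible reading of them rather than the speaker's own code. It is a minimal Python sketch that treats a log message as a simple entity and a website visitor as a complex entity. The names `PageView`, `Visitor`, and `group_by_visitor`, along with every field and the sample data, are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class PageView:
    """Simple entity: a flat, static log record (hypothetical fields)."""
    visitor_id: str
    timestamp: int
    url: str


@dataclass
class Visitor:
    """Complex entity: persistent and hierarchical; it grows as events arrive."""
    visitor_id: str
    page_views: List[PageView] = field(default_factory=list)


def group_by_visitor(log: Iterable[PageView]) -> Dict[str, Visitor]:
    """Re-key a transaction-level log so that each visitor is one object."""
    visitors: Dict[str, Visitor] = {}
    for pv in log:
        if pv.visitor_id not in visitors:
            visitors[pv.visitor_id] = Visitor(pv.visitor_id)
        visitors[pv.visitor_id].page_views.append(pv)
    return visitors


# Made-up sample data, for illustration only.
log = [
    PageView("a", 1, "/home"),
    PageView("b", 2, "/home"),
    PageView("a", 3, "/search"),
    PageView("a", 4, "/listing"),
]

# When the row is a page view, "how many visitors?" is a distinct count.
distinct_visitors = len({pv.visitor_id for pv in log})

# When the unit of analysis is the visitor, the same question is a plain
# count, and per-visitor questions read directly off each object.
visitors = group_by_visitor(log)
visitor_count = len(visitors)
multi_page_visitors = sum(1 for v in visitors.values() if len(v.page_views) > 1)
```

Under this reading, the choice being illustrated is which entity each record represents: the flat log keeps one record per transaction, while the grouped form keeps one record per visitor with its history nested inside.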