"Human Cloning: The Data Scientist Bottleneck Resolved" Dr. Alex Farquhar @ds_ldn


Presentation by Dr. Alex Farquhar, Data Scientist @ForwardTek, at Data Science London (@ds_ldn), on the scarcity of data scientists and how data scientists can maximise their output.


Transcript

  • 1. HUMAN CLONING: The Data Scientist bottleneck resolved. Dr Alex Farquhar. Friday, 24 February 2012
  • 2. [Chart: exabytes of data by year, 2008 to 2017; vertical axis 0 to 20,000 (IDC/EMC report 2008)]
  • 3. By 2018, the United States alone could face a shortage of 140,000 to 190,000 data people...
  • 4. WE’RE ALL DOOMED
  • 5. DATA PEOPLE? © Drew Conway
  • 6. MAYBE WE CAN JUST.... • 1 statistician + 1 developer ≈ 1 data scientist?
  • 7. HOW ABOUT.... • 4 statisticians + 4 developers ≈ 4 data scientists?
  • 8. (no text on slide)
  • 9. (no text on slide)
  • 10. WHAT CAN WE DO? • Train more new data scientists (not fast enough) • Cross-train people • Cobble together different skills in teams (see above)
  • 11. WHAT CAN WE DO? • Do more work
  • 12. DOING MORE • simplify (fob the work off) • automate (fob even more work off) • choose/build the right tools • parallelise • iterate
  • 13. SIMPLIFY & AUTOMATE • Counting stuff is not much fun
  • 14. SIMPLIFY & AUTOMATE [diagram labels: Hive, TSV files, Hadoop]
  • 15. AUTOMATE / PARALLELISE [diagram labels: magic, Hadoop, Job]
  • 16. AUTOMATE / PARALLELISE [diagram labels: magic, Hadoop, Lots of jobs at once, Job 1, Job 2, Job 3, Job 4]
  • 17. TOOLS • something that allows fast iteration, i.e. not Java • R, Ruby, Python
  • 18. PARALLELISE
  • 19. ITERATE • try different things • improve what works • dump what doesn’t • constant improvement & learning → get faster
  • 20. WE’RE NOT ALL DOOMED