"Human Cloning: The Data Scientist Bottleneck Resolved" Dr. Alex Farquhar @ds_ldn

Presentation by Dr. Alex Farquhar, Data Scientist at @ForwardTek, given at Data Science London (@ds_ldn), on the scarcity of data scientists and how data scientists can maximise their output.

Published in: Spiritual, Technology

  1. HUMAN CLONING: The Data Scientist bottleneck resolved. Dr Alex Farquhar. Friday, 24 February 2012
  2. [Chart: exabytes of data per year, 2008 to 2017; y-axis from 0 to 20,000 (IDC/EMC report 2008)]
  3. By 2018, the United States alone could face a shortage of 140,000 to 190,000 data people...
  4. WE’RE ALL DOOMED
  5. DATA PEOPLE? [image © Drew Conway]
  6. MAYBE WE CAN JUST...
      • 1 statistician + 1 developer ≈ 1 data scientist?
  7. HOW ABOUT...
      • 4 statisticians + 4 developers ≈ 4 data scientists?
  8. [no text on slide]
  9. [no text on slide]
  10. WHAT CAN WE DO?
      • Train more new data scientists (not fast enough)
      • Cross-train people
      • Cobble together different skills in teams (see above)
  11. WHAT CAN WE DO?
      • Do more work
  12. DOING MORE
      • simplify (fob the work off)
      • automate (fob even more work off)
      • choose/build the right tools
      • parallelise
      • iterate
  13. SIMPLIFY & AUTOMATE
      • Counting stuff is not much fun
  14. SIMPLIFY & AUTOMATE [diagram labels: Hive, TSV files, Hadoop]
  15. AUTOMATE / PARALLELISE [diagram labels: magic, Hadoop, Job]
  16. AUTOMATE / PARALLELISE [diagram labels: magic, Hadoop, Lots of jobs at once, Job 1, Job 2, Job 3, Job 4]
  17. TOOLS
      • something that allows fast iteration, i.e. not Java
      • R, Ruby, Python
  18. PARALLELISE
  19. ITERATE
      • try different things
      • improve what works
      • dump what doesn’t
      • constant improvement & learning → get faster
  20. WE’RE NOT ALL DOOMED
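
Slides 13 to 16 pair routine counting work with TSV files, Hadoop and "lots of jobs at once". As a rough local illustration of that idea (not the speaker's Hive/Hadoop setup, which the slides do not show), the Python sketch below counts the values in one column of several TSV files, running one counting job per file concurrently and merging the results. The function names `count_column` and `count_many`, the use of a thread pool, and the file layout are all assumptions made for this example.

```python
import csv
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_column(tsv_path, column=0):
    """Count how often each value appears in one column of a TSV file.

    Hypothetical helper for illustration; rows too short to have the
    requested column are skipped.
    """
    counts = Counter()
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) > column:
                counts[row[column]] += 1
    return counts


def count_many(tsv_paths, column=0, max_workers=4):
    """Run one counting job per file at once, then merge the partial counts.

    A local stand-in for "lots of jobs at once"; a real cluster would
    distribute the jobs across machines rather than threads.
    """
    total = Counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for partial in pool.map(lambda path: count_column(path, column), tsv_paths):
            total.update(partial)  # Counter.update adds counts together
    return total
```

Calling `count_many(["a.tsv", "b.tsv"])` (hypothetical file names) returns a single `Counter` covering both files. Threads are used here only to keep the sketch short; for CPU-heavy work a process pool or a real job scheduler would be the more likely choice.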