"Human Cloning: The Data Scientist Bottleneck Resolved" Dr. Alex Farquhar @ds_ldn


Presentation by Dr. Alex Farquhar, Data Scientist at @ForwardTek, given at Data Science London (@ds_ldn), on the scarcity of data scientists and how data scientists can maximise their output.

Published in: Spiritual, Technology

  1. HUMAN CLONING: The Data Scientist bottleneck resolved. Dr Alex Farquhar, Friday, 24 February 2012
  2. [Chart: exabytes of data per year, 2008 to 2017, y-axis from 0 to 20,000 (IDC/EMC report 2008)]
  3. By 2018, the United States alone could face a shortage of 140,000 to 190,000 data people...
  4. WE’RE ALL DOOMED
  5. DATA PEOPLE? © Drew Conway
  6. MAYBE WE CAN JUST... • 1 statistician + 1 developer ≈ 1 data scientist?
  7. HOW ABOUT... • 4 statisticians + 4 developers ≈ 4 data scientists?
  8. [no text transcribed]
  9. [no text transcribed]
  10. WHAT CAN WE DO? • Train more new data scientists (not fast enough) • Cross-train people • Cobble together different skills in teams (see above)
  11. WHAT CAN WE DO? • Do more work
  12. DOING MORE • simplify (fob the work off) • automate (fob even more work off) • choose/build the right tools • parallelise • iterate
  13. SIMPLIFY & AUTOMATE • Counting stuff is not much fun
  14. SIMPLIFY & AUTOMATE • Hive • TSV files • Hadoop
  15. AUTOMATE / PARALLELISE • magic • Hadoop • Job
  16. AUTOMATE / PARALLELISE • magic • Hadoop • Lots of jobs at once: Job 1, Job 2, Job 3, Job 4
  17. TOOLS • something that allows fast iteration, i.e. not Java • R, Ruby, Python
  18. PARALLELISE
  19. ITERATE • try different things • improve what works • dump what doesn’t • constant improvement & learning → get faster
  20. WE’RE NOT ALL DOOMED
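
Slides 13 to 16 describe counting over TSV files and running lots of jobs at once. The idea can be sketched in Python, one of the languages named on slide 17. This is a minimal illustration and not the speaker's code: the function names `count_column` and `run_jobs`, the sample rows, and the use of a local thread pool as a stand-in for submitting jobs to Hadoop are all assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_column(rows, column):
    """Count how often each value appears in one column of tab-separated rows."""
    counts = Counter()
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        # Skip rows that are too short to contain the requested column.
        if column < len(fields):
            counts[fields[column]] += 1
    return counts


def run_jobs(rows, columns, max_workers=4):
    """Run one counting job per column concurrently and collect the results.

    Hypothetical stand-in: on a real cluster each job would be submitted to
    the scheduler rather than to a local thread pool.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {col: pool.submit(count_column, rows, col) for col in columns}
        return {col: future.result() for col, future in futures.items()}


if __name__ == "__main__":
    # Made-up sample data: country and browser, tab-separated.
    sample = ["uk\tchrome", "uk\tfirefox", "us\tchrome"]
    results = run_jobs(sample, columns=[0, 1])
    for col, counts in results.items():
        print(col, counts.most_common())
```

Each job is independent of the others, which is what makes it safe to launch them side by side. A thread pool only shows the shape of the pattern; for CPU-heavy counting on one machine a process pool or a cluster scheduler would be the more realistic choice.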
