"Human Cloning: The Data Scientist Bottleneck Resolved" Dr. Alex Farquhar @ds_ldn
Presentation by Dr. Alex Farquhar, Data Scientist @ForwardTek, at Data Science London (@ds_ldn), on the scarcity of data scientists and how data scientists can maximise their output.

Transcript

  • 1. HUMAN CLONING: The Data Scientist bottleneck resolved. Dr Alex Farquhar. Friday, 24 February 2012
  • 2. [Chart: exabytes of data per year, 2008 to 2017; vertical axis 0 to 20,000 (IDC/EMC report 2008)]
  • 3. By 2018, the United States alone could face a shortage of 140,000 to 190,000 data people...
  • 4. WE’RE ALL DOOMED
  • 5. DATA PEOPLE? © Drew Conway
  • 6. MAYBE WE CAN JUST... • 1 statistician + 1 developer ≈ 1 data scientist?
  • 7. HOW ABOUT... • 4 statisticians + 4 developers ≈ 4 data scientists?
  • 8. [No text on slide]
  • 9. [No text on slide]
  • 10. WHAT CAN WE DO? • Train more new data scientists (not fast enough) • Cross-train people • Cobble together different skills in teams (see above)
  • 11. WHAT CAN WE DO? • Do more work
  • 12. DOING MORE • simplify (fob the work off) • automate (fob even more work off) • choose/build the right tools • parallelise • iterate
  • 13. SIMPLIFY & AUTOMATE • Counting stuff is not much fun
  • 14. SIMPLIFY & AUTOMATE [Diagram labels: Hive, TSV files, Hadoop]
  • 15. AUTOMATE / PARALLELISE [Diagram labels: magic, Hadoop, Job]
  • 16. AUTOMATE / PARALLELISE [Diagram labels: magic, Hadoop, lots of jobs at once, Job 1, Job 2, Job 3, Job 4]
  • 17. TOOLS • something that allows fast iteration, i.e. not Java • R, Ruby, Python
  • 18. PARALLELISE
  • 19. ITERATE • try different things • improve what works • dump what doesn’t • constant improvement & learning → get faster
  • 20. WE’RE NOT ALL DOOMED
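
Slides 13 to 16 talk about handing "counting stuff" over TSV files to Hive and Hadoop, and about running lots of jobs at once. The slides do not show how this was done, so the following is only a minimal Python sketch of the general idea, not the speaker's setup. It assumes small in-memory TSV tables (the hypothetical JOB_1 and JOB_2) in place of files on a cluster, and a thread pool in place of Hadoop jobs; the function names count_column and count_many are made up for illustration.

```python
import csv
import io
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_column(tsv_text, column):
    """Count how often each value appears in one column of a TSV table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row[column] for row in reader)


def count_many(tables, column, max_workers=4):
    """Run one counting job per table at once, then merge the partial counts."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each table is counted as a separate job; results come back in order.
        for counts in pool.map(lambda t: count_column(t, column), tables):
            total.update(counts)
    return total


# Hypothetical sample data standing in for TSV files on a cluster.
JOB_1 = "user\tcountry\nann\tUK\nbob\tUS\n"
JOB_2 = "user\tcountry\ncat\tUK\ndan\tFR\n"

if __name__ == "__main__":
    print(count_many([JOB_1, JOB_2], "country"))
```

The split into a per-table counting step and a merge step mirrors the "lots of jobs at once" picture: each job can run independently, and only the small partial counts need to be combined at the end.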