"Human Cloning: The Data Scientist Bottleneck Resolved" Dr. Alex Farquhar @ds_ldn

Presentation by Dr. Alex Farquhar, Data Scientist @ForwardTek, at Data Science London (@ds_ldn), on the scarcity of data scientists and how data scientists can maximise their output.

Published in: Spiritual, Technology

Transcript

  • 1. HUMAN CLONING: The Data Scientist bottleneck resolved. Dr Alex Farquhar. Friday, 24 February 2012
  • 2. [Chart: exabytes of data by year, 2008 to 2017; vertical axis 0 to 20,000 (IDC/EMC report 2008)]
  • 3. By 2018, the United States alone could face a shortage of 140,000 to 190,000 data people...
  • 4. WE’RE ALL DOOMED
  • 5. DATA PEOPLE? © Drew Conway
  • 6. MAYBE WE CAN JUST...
      • 1 statistician + 1 developer ≈ 1 data scientist?
  • 7. HOW ABOUT...
      • 4 statisticians + 4 developers ≈ 4 data scientists?
  • 8. [no text on slide]
  • 9. [no text on slide]
  • 10. WHAT CAN WE DO?
      • Train more new data scientists (not fast enough)
      • Cross-train people
      • Cobble together different skills in teams (see above)
  • 11. WHAT CAN WE DO?
      • Do more work
  • 12. DOING MORE
      • simplify (fob the work off)
      • automate (fob even more work off)
      • choose/build the right tools
      • parallelise
      • iterate
  • 13. SIMPLIFY & AUTOMATE
      • Counting stuff is not much fun
  • 14. SIMPLIFY & AUTOMATE [diagram labels: Hive, TSV files, Hadoop]
  • 15. AUTOMATE / PARALLELISE [diagram labels: magic, Hadoop, Job]
  • 16. AUTOMATE / PARALLELISE [diagram labels: magic, Hadoop, Lots of jobs at once, Job 1, Job 2, Job 3, Job 4]
  • 17. TOOLS
      • something that allows fast iteration, i.e. not Java
      • R, Ruby, Python
  • 18. PARALLELISE
  • 19. ITERATE
      • try different things
      • improve what works
      • dump what doesn’t
      • constant improvement & learning → get faster
  • 20. WE’RE NOT ALL DOOMED
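
Slides 13 to 16 describe handing counting work over TSV files to automated jobs and running lots of jobs at once. The slides do not show how this was done, so the following is only a rough local sketch of the idea in Python (one of the languages named on slide 17), not the Hive/Hadoop setup from the talk. The function names, the sample data, and the use of a thread pool are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_column(tsv_text, column):
    """One hypothetical 'job': count how often each value appears in one TSV column."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    # Skip rows that are too short to contain the requested column.
    return Counter(row[column] for row in reader if len(row) > column)


def run_jobs(tsv_text, columns, max_workers=4):
    """Submit one counting job per column at once and collect the results by column."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda c: count_column(tsv_text, c), columns)
        return dict(zip(columns, results))


# Made-up sample data: country <TAB> browser
SAMPLE = "uk\tchrome\nuk\tfirefox\nus\tchrome\n"

counts = run_jobs(SAMPLE, [0, 1])
print(counts[0].most_common())
print(counts[1].most_common())
```

The thread pool here only stands in for "lots of jobs at once" on a single machine. On a cluster the same pattern would mean submitting several independent jobs and letting the scheduler run them side by side.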
