The Data Science Handbook
Advice and Insights from 25 Amazing Data Scientists
Carl Shan, Henry Wang, William Chen and Max Song
This book is for sale at http://leanpub.com/datasciencehandbook
This version was published on 2015-05-04
This is a Leanpub book. Leanpub empowers authors and
publishers with the Lean Publishing process. Lean Publishing is
the act of publishing an in-progress ebook using lightweight tools
and many iterations to get reader feedback, pivot until you have
the right book and build traction once you do.
©2015 The Data Science Bookshelf
To our parents, siblings, friends and mentors. Your support and
encouragement have been the fuel for our fire.
Contents
Foreword by Jake Klamka
Foreword by Jake Klamka
In the past five years, data science has gone from a nascent
tech-industry competency to a field that is having a global,
cross-industry impact in almost every major area of human endeavour.
From education, to energy, to government, to non-profits and, of
course, software and the Internet, data science is creating immense
value for companies and organizations across the world. In fact,
in early 2015, the President of the United States announced the
creation of a new role, Chief Data Scientist at the White House,
and appointed one of this book’s interviewees, DJ Patil, to fill it.
Like many innovations, this industry was started by a few motivated
people. Over the last few years, they founded and developed the
field and advocated for the value that data analytics can bring to
every industry around the world. In The Data Science Handbook, you
will have the opportunity to meet many of these founding data
scientists, hear firsthand accounts of the incredible journeys they
took, and learn where they think the field is headed.
The road to becoming a data scientist is not always an easy one.
When I tried to transition from experimental particle physics to
industry, resources were few and far between. In fact, although a
need for data science existed in companies, the job title had not
been created yet. I spent a lot of time learning and teaching myself,
working on various startup projects, and later saw many of my
friends from academia run into the same challenges.
I saw a groundswell of incredibly gifted and highly trained
researchers who were excited about moving into data-driven roles,
yet they were missing key pieces of knowledge and had trouble
transferring the strong quantitative and data analysis skills they
had gained in their research to a career in industry. Meanwhile,
having lived and worked in Silicon Valley, I also saw very strong
demand from technology companies that wanted to hire these people.
To help others bridge the gap between academia and industry, I
founded the Insight Data Science Fellows Program¹ in 2012. Insight
is a training fellowship that helps quantitative PhDs transition
from academia to industry. Over the last few years, we’ve helped
hundreds of Insight Fellows from fields like physics, computational
biology, neuroscience, math, and engineering move from academia to
become leading data scientists at companies like Facebook, Airbnb,
LinkedIn, The New York Times, Memorial Sloan Kettering Cancer Center
and nearly a hundred others, building a strong alumni network on
both the East and West Coasts.
In my own journey into the technology field, and in creating a
community for others to do the same, one resource I found
tremendously useful was conversations with others who had
successfully made the transition themselves. As I developed Insight,
I had the chance to engage with some of Silicon Valley’s best data
scientists, who are mentors to the program:
Jonathan Goldman created one of the first data products at LinkedIn,
People You May Know, which transformed the company’s growth
trajectory. DJ Patil built and grew the data science team at
LinkedIn into a powerhouse and in the process co-coined the term
“Data Scientist.” Riley Newman developed product analytics that were
instrumental in Airbnb’s growth. Jace Kohlmeier led the data team at
Khan Academy, which helped define how to optimize learning at a
scale of millions of students.
Unfortunately, face-to-face time does not scale easily. At Insight,
to maintain exceptionally high quality and personal time with our
mentors, we accept a small group of talented scientists and
engineers three times per year. The Data Science Handbook provides
readers with a way to have that in-depth conversation at scale. By
reading its interviews, you can learn from the leaders in data
science at your own pace, no matter where you are in the world.
Each interview is an in-depth conversation covering the personal
stories of these data scientists, from the early experiences that
helped them find their own paths to a career in data science.
¹http://insightdatascience.com/
It’s not just the early data science leaders who can have a big
impact on the field. New talent is entering the field as well, and
every newcomer has the opportunity to push it forward. When I met
the authors of this book, they were still college students and
aspiring data scientists, full of the same questions that anyone
beginning in data science has. Through 18 months of hard work, they
have done the legwork for everyone interested, seeking out some of
the best data scientists around the country and asking them for
their advice and guidance. This book is the result of that work,
containing over 100 hours of collected wisdom from people who would
otherwise be inaccessible (imagine having to compete with President
Obama to talk with DJ Patil!). Along the way, these young authors
have also earned their own stripes as data scientists, working at
some well-known companies.
By reading these extended, informal interviews, you will get to sit
down with industry trailblazers like DJ Patil, Jonathan Goldman
and Pete Skomoroch, who were all part of the core, early LinkedIn
data science teams. You will meet Hilary Mason and Drew Conway, who
were instrumental in creating the thriving New York data science
community. You will hear advice from the next generation of data
science leaders, like Diane Wu and Chris Moody, both PhDs and
Insight alumni, who are now blazing new trails at MetaMind and
Stitch Fix. You will meet data scientists who are having a big
impact in academia, including Bradley Voytek from UCSD and Joe
Blitzstein from Harvard. You will meet data scientists in startups
like Clare Corthell from Mattermark and Kunal Punera of Bento Labs,
who will share how they use data science and analytics as a core
competitive advantage.
The data scientists in The Data Science Handbook, along with
dozens of others, have helped create the very industry that is now
having such a tremendous impact on the world. In this book, they
discuss the mindset that allowed them to create this industry,
address misconceptions about the field, share stories of specific
challenges and victories, and talk about what skills they look for
when building their teams. By reading their stories, hearing how
they think and learning where they see the future of data science
going, you will gain the context to find ways to have an impact and
perhaps advance the field yourself in the years to come.
Jake Klamka
Founder
Insight Data Science Fellows Program²
Insight Data Engineering Fellows Program³
Insight Health Data Science Fellows Program⁴
²http://insightdatascience.com/
³http://insightdataengineering.com/
⁴http://insighthealthdata.com/

More Related Content

What's hot

NOVA Data Science Meetup 1/19/2017 - Presentation 1
NOVA Data Science Meetup 1/19/2017 - Presentation 1NOVA Data Science Meetup 1/19/2017 - Presentation 1
NOVA Data Science Meetup 1/19/2017 - Presentation 1
NOVA DATASCIENCE
 
"Enterprise Architecture and the Information Age Enterprise" @ CSDM2010
"Enterprise Architecture and the Information Age Enterprise" @ CSDM2010 "Enterprise Architecture and the Information Age Enterprise" @ CSDM2010
"Enterprise Architecture and the Information Age Enterprise" @ CSDM2010
Leon Kappelman
 
Leading Without Formal Authority -- By Using Data
Leading Without Formal Authority -- By Using DataLeading Without Formal Authority -- By Using Data
Leading Without Formal Authority -- By Using Data
Terri Griffith
 
Wtf is data science?
Wtf is data science?Wtf is data science?
Wtf is data science?
Dylan
 
Data Science and Decision Making
Data Science and Decision MakingData Science and Decision Making
Data Science and Decision Making
Luciano Vilas Boas
 
Booz Allen Hamilton's Data Science Infographic
Booz Allen Hamilton's Data Science InfographicBooz Allen Hamilton's Data Science Infographic
Booz Allen Hamilton's Data Science Infographic
Booz Allen Hamilton
 
The Field Guide to Data Science
The Field Guide to Data ScienceThe Field Guide to Data Science
The Field Guide to Data Science
EMC
 
data mining
data mining data mining
data mining
ellen16187
 
CDO Slides: A Chief Data Officer Interview
CDO Slides: A Chief Data Officer InterviewCDO Slides: A Chief Data Officer Interview
CDO Slides: A Chief Data Officer Interview
DATAVERSITY
 
Do This, Not That: Rowan-Salisbury Schools
Do This, Not That: Rowan-Salisbury SchoolsDo This, Not That: Rowan-Salisbury Schools
Do This, Not That: Rowan-Salisbury Schools
Analisa Sorrells
 
Use of Technology Tools to Improve Leadership
Use of Technology Tools to Improve LeadershipUse of Technology Tools to Improve Leadership
Use of Technology Tools to Improve Leadership
Bill Sheridan, CAE
 
Isolating values from big data with the help of four v’s
Isolating values from big data with the help of four v’sIsolating values from big data with the help of four v’s
Isolating values from big data with the help of four v’s
eSAT Journals
 
Measuring What Matters: Meaningful Metrics
Measuring What Matters: Meaningful MetricsMeasuring What Matters: Meaningful Metrics
Measuring What Matters: Meaningful Metrics
Social Media for Nonprofits
 
Inspiration Architecture: The Future of Libraries
Inspiration Architecture: The Future of LibrariesInspiration Architecture: The Future of Libraries
Inspiration Architecture: The Future of Libraries
Peter Morville
 
HICSS - 50
HICSS - 50 HICSS - 50
HICSS - 50
diannepatricia
 
Data, Responsibly: The Next Decade of Data Science
Data, Responsibly: The Next Decade of Data ScienceData, Responsibly: The Next Decade of Data Science
Data, Responsibly: The Next Decade of Data Science
University of Washington
 
Altmetrics: the movement, the tools, and the implications
Altmetrics: the movement, the tools, and the implicationsAltmetrics: the movement, the tools, and the implications
Altmetrics: the movement, the tools, and the implications
KR_Barker
 

What's hot (19)

NOVA Data Science Meetup 1/19/2017 - Presentation 1
NOVA Data Science Meetup 1/19/2017 - Presentation 1NOVA Data Science Meetup 1/19/2017 - Presentation 1
NOVA Data Science Meetup 1/19/2017 - Presentation 1
 
"Enterprise Architecture and the Information Age Enterprise" @ CSDM2010
"Enterprise Architecture and the Information Age Enterprise" @ CSDM2010 "Enterprise Architecture and the Information Age Enterprise" @ CSDM2010
"Enterprise Architecture and the Information Age Enterprise" @ CSDM2010
 
Leading Without Formal Authority -- By Using Data
Leading Without Formal Authority -- By Using DataLeading Without Formal Authority -- By Using Data
Leading Without Formal Authority -- By Using Data
 
Wtf is data science?
Wtf is data science?Wtf is data science?
Wtf is data science?
 
Data Science and Decision Making
Data Science and Decision MakingData Science and Decision Making
Data Science and Decision Making
 
Booz Allen Hamilton's Data Science Infographic
Booz Allen Hamilton's Data Science InfographicBooz Allen Hamilton's Data Science Infographic
Booz Allen Hamilton's Data Science Infographic
 
Who We Are
Who We AreWho We Are
Who We Are
 
The Field Guide to Data Science
The Field Guide to Data ScienceThe Field Guide to Data Science
The Field Guide to Data Science
 
BigDataCSEKeyNote_2012
BigDataCSEKeyNote_2012BigDataCSEKeyNote_2012
BigDataCSEKeyNote_2012
 
data mining
data mining data mining
data mining
 
CDO Slides: A Chief Data Officer Interview
CDO Slides: A Chief Data Officer InterviewCDO Slides: A Chief Data Officer Interview
CDO Slides: A Chief Data Officer Interview
 
Do This, Not That: Rowan-Salisbury Schools
Do This, Not That: Rowan-Salisbury SchoolsDo This, Not That: Rowan-Salisbury Schools
Do This, Not That: Rowan-Salisbury Schools
 
Use of Technology Tools to Improve Leadership
Use of Technology Tools to Improve LeadershipUse of Technology Tools to Improve Leadership
Use of Technology Tools to Improve Leadership
 
Isolating values from big data with the help of four v’s
Isolating values from big data with the help of four v’sIsolating values from big data with the help of four v’s
Isolating values from big data with the help of four v’s
 
Measuring What Matters: Meaningful Metrics
Measuring What Matters: Meaningful MetricsMeasuring What Matters: Meaningful Metrics
Measuring What Matters: Meaningful Metrics
 
Inspiration Architecture: The Future of Libraries
Inspiration Architecture: The Future of LibrariesInspiration Architecture: The Future of Libraries
Inspiration Architecture: The Future of Libraries
 
HICSS - 50
HICSS - 50 HICSS - 50
HICSS - 50
 
Data, Responsibly: The Next Decade of Data Science
Data, Responsibly: The Next Decade of Data ScienceData, Responsibly: The Next Decade of Data Science
Data, Responsibly: The Next Decade of Data Science
 
Altmetrics: the movement, the tools, and the implications
Altmetrics: the movement, the tools, and the implicationsAltmetrics: the movement, the tools, and the implications
Altmetrics: the movement, the tools, and the implications
 

Similar to Datasciencehandbook sample

Booz Allen Field Guide to Data Science
Booz Allen Field Guide to Data Science Booz Allen Field Guide to Data Science
Booz Allen Field Guide to Data Science
Booz Allen Hamilton
 
The field-guide-to-data-science 2015 (second edition) By Booz | Allen | Hamilton
The field-guide-to-data-science 2015 (second edition) By Booz | Allen | HamiltonThe field-guide-to-data-science 2015 (second edition) By Booz | Allen | Hamilton
The field-guide-to-data-science 2015 (second edition) By Booz | Allen | Hamilton
Arysha Channa
 
Chicago School of Data Book
Chicago School of Data BookChicago School of Data Book
Chicago School of Data Book
Smart Chicago Collaborative
 
Big data for qualitative research by kathy a. mills (z lib.org)
Big data for qualitative research by kathy a. mills (z lib.org)Big data for qualitative research by kathy a. mills (z lib.org)
Big data for qualitative research by kathy a. mills (z lib.org)
MiguelRosario24
 
1    A Guide to Performing a Needs Assessment and a .docx
1    A Guide to Performing a Needs Assessment and a .docx1    A Guide to Performing a Needs Assessment and a .docx
1    A Guide to Performing a Needs Assessment and a .docx
croftsshanon
 
Lecture #01
Lecture #01Lecture #01
Lecture #01
Konpal Darakshan
 
Data Scientist
Data ScientistData Scientist
Data Scientist
Prince Barai
 
Insight white paper_2014
Insight white paper_2014Insight white paper_2014
Insight white paper_2014
Lin Todd
 
Data Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedData Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has Changed
Philip Bourne
 
12 sept2013 imd network orchestration martha g russell
12 sept2013 imd network orchestration martha g russell12 sept2013 imd network orchestration martha g russell
12 sept2013 imd network orchestration martha g russell
Martha Russell
 
Accessing and Using Big Data to Advance Social Science Knowledge
Accessing and Using Big Data to Advance Social Science KnowledgeAccessing and Using Big Data to Advance Social Science Knowledge
Accessing and Using Big Data to Advance Social Science Knowledge
Josh Cowls
 
ASIS&T Diane Sonnenwald Information Science as a Career
ASIS&T Diane Sonnenwald Information Science as a Career ASIS&T Diane Sonnenwald Information Science as a Career
ASIS&T Diane Sonnenwald Information Science as a Career
ASIS&T
 
Data Scientist - Good Rebels -
Data Scientist - Good Rebels -Data Scientist - Good Rebels -
Data Scientist - Good Rebels -
Good Rebels
 
Emerging technologies and the future of libraries (and library systems). Keyn...
Emerging technologies and the future of libraries (and library systems). Keyn...Emerging technologies and the future of libraries (and library systems). Keyn...
Emerging technologies and the future of libraries (and library systems). Keyn...
Ken Chad Consulting Ltd
 
What is Data Science
What is Data ScienceWhat is Data Science
What is Data Science
Ioannis Kourouklides
 
What's the profile of a data scientist?
What's the profile of a data scientist? What's the profile of a data scientist?
What's the profile of a data scientist?
BICC Thomas More
 
Pros and Cons of Open Data: A Global South Perspective
Pros and Cons of Open Data: A Global South PerspectivePros and Cons of Open Data: A Global South Perspective
Pros and Cons of Open Data: A Global South Perspective
Michelle Willmers
 
The State of Open Data Report by @figshare
The State of Open Data Report  by @figshareThe State of Open Data Report  by @figshare
The State of Open Data Report by @figshare
eraser Juan José Calderón
 
In an era of social media, voluntary communities and networks, are you a conn...
In an era of social media, voluntary communities and networks, are you a conn...In an era of social media, voluntary communities and networks, are you a conn...
In an era of social media, voluntary communities and networks, are you a conn...
Kelly Hoey
 
SLA RGC Universe
SLA RGC Universe SLA RGC Universe
SLA RGC Universe
Access Innovations, Inc.
 

Similar to Datasciencehandbook sample (20)

Booz Allen Field Guide to Data Science
Booz Allen Field Guide to Data Science Booz Allen Field Guide to Data Science
Booz Allen Field Guide to Data Science
 
The field-guide-to-data-science 2015 (second edition) By Booz | Allen | Hamilton
The field-guide-to-data-science 2015 (second edition) By Booz | Allen | HamiltonThe field-guide-to-data-science 2015 (second edition) By Booz | Allen | Hamilton
The field-guide-to-data-science 2015 (second edition) By Booz | Allen | Hamilton
 
Chicago School of Data Book
Chicago School of Data BookChicago School of Data Book
Chicago School of Data Book
 
Big data for qualitative research by kathy a. mills (z lib.org)
Big data for qualitative research by kathy a. mills (z lib.org)Big data for qualitative research by kathy a. mills (z lib.org)
Big data for qualitative research by kathy a. mills (z lib.org)
 
1    A Guide to Performing a Needs Assessment and a .docx
1    A Guide to Performing a Needs Assessment and a .docx1    A Guide to Performing a Needs Assessment and a .docx
1    A Guide to Performing a Needs Assessment and a .docx
 
Lecture #01
Lecture #01Lecture #01
Lecture #01
 
Data Scientist
Data ScientistData Scientist
Data Scientist
 
Insight white paper_2014
Insight white paper_2014Insight white paper_2014
Insight white paper_2014
 
Data Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedData Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has Changed
 
12 sept2013 imd network orchestration martha g russell
12 sept2013 imd network orchestration martha g russell12 sept2013 imd network orchestration martha g russell
12 sept2013 imd network orchestration martha g russell
 
Accessing and Using Big Data to Advance Social Science Knowledge
Accessing and Using Big Data to Advance Social Science KnowledgeAccessing and Using Big Data to Advance Social Science Knowledge
Accessing and Using Big Data to Advance Social Science Knowledge
 
ASIS&T Diane Sonnenwald Information Science as a Career
ASIS&T Diane Sonnenwald Information Science as a Career ASIS&T Diane Sonnenwald Information Science as a Career
ASIS&T Diane Sonnenwald Information Science as a Career
 
Data Scientist - Good Rebels -
Data Scientist - Good Rebels -Data Scientist - Good Rebels -
Data Scientist - Good Rebels -
 
Emerging technologies and the future of libraries (and library systems). Keyn...
Emerging technologies and the future of libraries (and library systems). Keyn...Emerging technologies and the future of libraries (and library systems). Keyn...
Emerging technologies and the future of libraries (and library systems). Keyn...
 
What is Data Science
What is Data ScienceWhat is Data Science
What is Data Science
 
What's the profile of a data scientist?
What's the profile of a data scientist? What's the profile of a data scientist?
What's the profile of a data scientist?
 
Pros and Cons of Open Data: A Global South Perspective
Pros and Cons of Open Data: A Global South PerspectivePros and Cons of Open Data: A Global South Perspective
Pros and Cons of Open Data: A Global South Perspective
 
The State of Open Data Report by @figshare
The State of Open Data Report  by @figshareThe State of Open Data Report  by @figshare
The State of Open Data Report by @figshare
 
In an era of social media, voluntary communities and networks, are you a conn...
In an era of social media, voluntary communities and networks, are you a conn...In an era of social media, voluntary communities and networks, are you a conn...
In an era of social media, voluntary communities and networks, are you a conn...
 
SLA RGC Universe
SLA RGC Universe SLA RGC Universe
SLA RGC Universe
 

Recently uploaded

Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 

Recently uploaded (20)

Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 

Datasciencehandbook sample

  • 1.
  • 2. The Data Science Handbook Advice and Insights from 25 Amazing Data Scientists Carl Shan, Henry Wang, William Chen and Max Song This book is for sale at http://leanpub.com/datasciencehandbook This version was published on 2015-05-04 This is a Leanpub book. Leanpub empowers authors and publishers with the Lean Publishing process. Lean Publishing is the act of publishing an in-progress ebook using lightweight tools and many iterations to get reader feedback, pivot until you have the right book and build traction once you do. ©2015 The Data Science Bookshelf
  • 3. To our parents, siblings, friends and mentors. Your support and encouragement has been the fuel for our fire.
  • 4. Contents Foreword by Jake Klamka . . . . . . . . . . . . . . . . . . 1
Foreword by Jake Klamka

In the past five years, data science has gone from a nascent competency within the tech industry to a field that is having a global, cross-industry impact in almost every major area of human endeavour. From education, to energy, to government, to non-profits and, of course, software and the Internet, data science is creating immense value for companies and organizations across the world. In fact, in early 2015, the President of the United States announced the creation of the new role of Chief Data Scientist at the White House, appointing one of the interviewees of this book, DJ Patil.

Like many innovations, this industry was started and grown by a few motivated people. Over the last few years, they founded and developed the field and advocated for the value that data analytics can bring to every industry around the world. In The Data Science Handbook, you will have the opportunity to meet many of these founding data scientists, hear firsthand accounts of the incredible journeys they took, and learn where they think the field is headed.

The road to becoming a data scientist is not always an easy one. When I tried to transition from experimental particle physics to industry, resources were few and far between. In fact, although companies needed data science, the job title had not been created yet. I spent a lot of time learning and teaching myself and working on various startup projects, and I later saw many of my friends from academia run into the same challenges. I saw a groundswell of gifted, highly trained researchers who were excited about moving into data-driven roles, yet they were missing key pieces of knowledge and had trouble transferring the strong quantitative and data analysis skills they had gained in their research to a career in industry. Meanwhile, having lived and worked in Silicon Valley, I also saw very strong demand from technology companies that wanted to hire these people.

To help others bridge the gap between academia and industry, I founded the Insight Data Science Fellows Program¹ in 2012. Insight is a training fellowship that helps quantitative PhDs transition from academia to industry. Over the last few years, we’ve helped hundreds of Insight Fellows from fields like physics, computational biology, neuroscience, math, and engineering move from academia to become leading data scientists at companies like Facebook, Airbnb, LinkedIn, the New York Times, Memorial Sloan Kettering Cancer Center and nearly a hundred others, with a strong alumni network on both the East and West Coasts.

In my personal journey into the technology field, and in creating a community for others to do the same, one resource I found tremendously useful was conversations with others who had successfully made the transition themselves. As I developed Insight, I had the chance to engage with some of Silicon Valley’s best data scientists, who are mentors to the program:

Jonathan Goldman created one of the first data products at LinkedIn, People You May Know, which transformed the growth trajectory of the company.
DJ Patil built and grew the data science team at LinkedIn into a powerhouse and, in the process, co-coined the term “Data Scientist.”
Riley Newman developed product analytics that were instrumental in Airbnb’s growth.
Jace Kohlmeier led the data team at Khan Academy that helped define how to optimize learning at a scale of millions of students.

Unfortunately, face-to-face time with people does not scale well.
At Insight, to maintain exceptionally high quality and personal time with our mentors, we accept a small group of talented scientists and engineers three times per year. The Data Science Handbook provides readers with a way to have that in-depth conversation at scale.

¹http://insightdatascience.com/

By reading the interviews in The Data Science Handbook, you will have the experience of learning from the leaders in data science at your own pace, no matter where you are in the world. Each interview is an in-depth conversation covering the personal stories of these data scientists, from the initial experiences that helped them find their own paths to a career in data science.

It’s not just the early data science leaders who can have a big impact on the field. New talent is also entering the field, and each new member has the opportunity to push it forward. When I met the authors of this book, they were still college students and aspiring data scientists, full of the same questions that anyone beginning in data science has. Through 18 months of hard work, they have done the legwork for all those interested, seeking out some of the best data scientists around the country and asking them for their advice and guidance. This book is the result of that work, containing over 100 hours of collected wisdom from people who would otherwise be inaccessible (imagine having to compete with President Obama to talk with DJ Patil!). In the meantime, these young authors have also gone on to earn their own stripes as data scientists, working at some well-known companies.

By reading these extended, informal interviews, you will get to sit down with industry trailblazers like DJ Patil, Jonathan Goldman and Pete Skomoroch, who were all part of the core, early LinkedIn data science teams. You will meet Hilary Mason and Drew Conway, who were instrumental in creating the thriving New York data science community. You will hear advice from the next generation of data science leaders, like Diane Wu and Chris Moody, both former PhDs and Insight alumni, who are now blazing new trails at MetaMinds and Stitch Fix.
You will meet data scientists who are having a big impact in academia, including Bradley Voytek from UCSD and Joe Blitzstein from Harvard. You will meet data scientists in startups, like Clare Corthell from Mattermark and Kunal Punera of Bento Labs, who will share how they use data science and analytics as a core competitive advantage.

The data scientists in The Data Science Handbook, along with dozens of others, have helped create the very industry that is now having such a tremendous impact on the world. Here, in this book, they discuss the mindset that allowed them to create this industry, address misconceptions about the field, share stories of specific challenges and victories, and talk about the skills they look for when building their teams. By reading their stories, hearing how they think and learning where they see the future of data science going, you will gain the context to think of ways you can have an impact and perhaps even advance the field yourself in the years to come.

Jake Klamka
Founder
Insight Data Science Fellows Program²
Insight Data Engineering Fellows Program³
Insight Health Data Science Fellows Program⁴

²http://insightdatascience.com/
³http://insightdataengineering.com/
⁴http://insighthealthdata.com/