This document summarizes The Data Science Handbook, which contains interviews with 25 leading data scientists. It provides career advice and insights into data science. The foreword discusses how data science has grown from a niche field to one with global impact. It introduces some of the data scientists interviewed in the book who helped establish the field. Their stories provide guidance for others looking to enter data science.
Data Science For Social Scientists Workshop - Ian Hopkinson
The slides from a Workshop presentation on Data Science and Big Data given to academic social scientists. Lots of links to sources, should be interesting to those outside the original target field.
Hiring data scientists and deploying Hadoop is not enough. Your company needs a data driven culture, based on values such as honesty, democracy, creativity and strategy. Your company also needs good data engineering and good experimentation practices.
IBM Research Distinguished Speaker Series 2014. (Some notes included.) How can we improve work with the power of analytics? IBM’s Analytics website describes the success AAA of Northern California, Nevada, and Utah had in their compensation area (“what if” modeling was used to assess different sales compensation strategies against past data.) Tacit, acquired in 2008 by Oracle, used email and other work products to identify expertise where the experts were not always even aware of their own value, and to link people unaware of the value their being connected could provide. These are relatively rare examples of the power of analytics being turned inward on work. Using frameworks from substitutes for leadership (e.g., feedback from the work itself, technology support -- Kerr & Jermier, 1978; Jermier & Kerr, 1997) and organizational behavior more generally, I will offer a framework suggesting where analytics has the opportunity to complement our ability to lead by letting go -- to let go of work practices that made sense before we had the opportunity to work with the power of vast, varied, and dynamic data.
New Developments in Machine Learning - Prof. Dr. Max Welling - Textkernel
Presentation from Prof. Dr. Max Welling, Professor of Machine Learning at the University of Amsterdam, at Textkernel's Intelligent Machines and the Future of Recruitment on June 2nd in Amsterdam.
At the end of this slide deck, you can also find the YouTube recording.
Due to increased compute power and large amounts of available data, machine learning is flourishing once again. In particular a technology called deep learning is making great strides maturing into a powerful technology. Max Welling briefly discusses variants of deep learning, such as convolutional neural networks and recurrent neural networks. But what lies around the corner in machine learning? He will discuss the three developments that in his opinion will become increasingly important:
1) Learning to interact with the world through reinforcement learning,
2) Learning while respecting everyone's privacy, and
3) Learning the causal relations in data (as opposed to discovering mere correlations).
Together, they represent the "power tools" of the future machine learner.
"Enterprise Architecture and the Information Age Enterprise" @ CSDM2010 Leon Kappelman
Talk I gave in Paris on 28-Oct-10 @ the Complex System Design and Management Conference on "Enterprise Architecture and the Information Age Enterprise." Excellent event, wonderful people, beautiful city.
Leading Without Formal Authority -- By Using Data - Terri Griffith
Success in today's fast-paced environments requires that we all lead, whether or not we have the formal authority. Our goals, roles, and colleagues change more quickly than in the past and this pushes us to develop strategies of influence that work clearly and quickly. In this session, we will develop a data-focused approach to direct and coordinate your human, technical, and organizational resources -- we need all three! -- through "light-weight" experiments. Whether you have two minutes or a whole business cycle, you can use data to lead.
Data science is having a growing effect on our lives, from the content we see on social media feeds to the decisions businesses are making. Along with successes, data science has inspired much hype about what it is and what it can do. So I plan to try and demystify data science and have a discussion about what it really is. What does a day-in-the-life look like? What tools and skills are needed? How is data science successfully applied in the real world? In this talk, I’ll be providing insight into these questions and also speculating on the future of data science and its place in business and technology.
Presented at OpenWest 2018
An enormous amount of valuable data is out there -- waiting to be transformed into mission-driving insights. But to excavate those insights, we must first assemble the right data science team.
Data Science is the competitive advantage of the future for organizations interested in turning their data into a product through analytics. Industries from health, to national security, to finance, to energy can be improved by creating better data analytics through Data Science. The winners and the losers in the emerging data economy are going to be determined by their Data Science teams.
CDO Slides: A Chief Data Officer Interview - DATAVERSITY
Join John and Kelle as they talk to a Chief Data Officer (CDO). We will continue to explore why organizations hire CDOs and how the CDO role is still evolving. We will examine some of the critical success factors and challenges of the role. This interview will also take a deeper dive into specific activities and value generated by the CDO position.
In this webinar we will discuss:
• What were and are the biggest challenges?
• What kind of support do you get?
• What kind of business strategy planning are you a part of?
• What can be done differently?
Isolating values from big data with the help of four v’s - eSAT Journals
Abstract
Big Data refers to the massive amounts of data that accumulate over time and are difficult to analyze and handle using common database management tools. It includes business transactions, e-mail messages, photos, surveillance videos and activity logs. It also includes unstructured text posted on the Web, such as blogs and social media. Big Data has shown a lot of potential in real-world industry and the research community. We support its power and potential in solving real-world problems. However, it is imperative to understand Big Data through the lens of the 4 Vs. The 4th V, ‘Value’, is the desired output for industry challenges and issues. We provide a brief survey of the 4 Vs of Big Data in order to understand Big Data and the concept of extracting Value in general. Finally, we conclude by presenting our vision of improved healthcare, a product of Big Data utilization, as future work for researchers and students.
Keywords: Big Data, Surveillance videos, blogs, social media, four Vs.
In the age of information overload, having a social media measurement practice is the key to successful execution of your social strategy. In this session, Debra Askanase looked at which data points tell you that your community cares and is willing to take action, a methodology for figuring out what data is relevant to your outcomes, where to find the metrics that matter, and why setting up the right metrics can make the difference between knowing that people visited a page on your website and knowing whether your social media actions sent them there.
Hawaii International Conference on Systems Sciences 2017. There are many opportunities for academics to submit papers for presentation at this very important conference which has sessions on Cognitive, Analytics, Big Data and much more. Haluk Demirkan, U Washington and Sergey Belov, IBM University Relations CEEMA made this presentation at Cognitive Systems Institute Speaker Series call on March 10, 2016.
Altmetrics: the movement, the tools, and the implications - KR_Barker
Measuring scholarly impact has traditionally been tied to the calculation of a scholarly article’s number of citations and the Impact Factor of its journal. Today, however, scholarly contributions take many forms: computer code, data sets, blog postings, tweets, practice guidelines and beyond. As the products of research evolve, so will the way in which credit is measured. This class will provide an overview of “altmetrics”, the movement to assess influence of both traditional and non-traditional scholarly contributions. We will define altmetrics, discuss why it is important in today’s digital scholarly environment, and demonstrate tools available to measure influence. After completing this course, the learner will be able to define altmetrics and compare it to traditional forms of measuring scholarly impact; name examples of scholarly contributions that are alternatives to traditional methods (e.g. datasets, blog postings, tweets, etc.); name examples of alternative means of measuring scholarly contributions (e.g. download counts, tweets about, etc.); discuss why today’s online, social environment necessitates a change in the way scholarly contributions are measured; name resources to learn more about altmetrics such as altmetrics.org; and name tools to measure alternative scholarly contributions such as Altmetric.com, Impact Story, Plum Analytics, etc.
Booz Allen Hamilton created the Field Guide to Data Science to help organizations and missions understand how to make use of data as a resource. The Second Edition of the Field Guide, updated with new features and content, delivers our latest insights in a fast-changing field. http://bit.ly/1O78U42
The field-guide-to-data-science 2015 (second edition) By Booz | Allen | Hamilton - Arysha Channa
Foreword: Data science touches aspects of our lives on a daily basis. When we visit the doctor, drive our cars, get on an airplane, or shop for services, Data science is changing the way we interact with and explore our world.
The book summarizes the Chicago School of Data project which included a scan of our local data ecosystem from 2013 - 2014 and a convening we built on top of that scan. Typical with other Smart Chicago projects like CUTGroup and the Array of Things Civic Engagement Project, we also included “meta” sections in the Chicago School of Data book — specific details about how we executed our projects, what tools we used, and the logic or guiding principles behind our program design decisions.
http://www.chicagoschoolofdata.com/
12 sept2013 imd network orchestration martha g russell - Martha Russell
Presentation to the eMBA delegation of IMD on September 12, 2013 at Stanford University. Martha G Russell, Executive Director mediaX at Stanford University & Tony Lai, StartX.
ASIS&T Diane Sonnenwald Information Science as a Career ASIS&T
American Society for Information Science & Technology Board of Directors President Diane H. Sonnenwald presents "Reflections on the Journey" at European Chapter’s Celebration of ASIS&T’s 75th Anniversary. With examples from her own career, she speaks to how a career in the discipline of Information Science can be shaped.
What does a data scientist actually do? Here at Good Rebels we wanted to outline a profile of this new profession, with the help of various industry leaders from academia, business and institutions. In short, we concluded that the main tasks of a data scientist are to identify data, transform it when incomplete, categorize it, prepare it for analysis, perform the analysis, visualize the results and communicate them.
Emerging technologies and the future of libraries (and library systems). Keyn... - Ken Chad Consulting Ltd
Global technology trends and new directions in Higher Education will clearly affect the future of academic libraries and the nature of library technology. A common thread is the increasing focus on the user/consumer in an increasingly digital economy. For example a leading information technology research and advisory company, Gartner states ('Top 10 strategic predictions for 2015') that: "Renovating the customer experience is a digital priority." What should libraries and library tech companies do? Ken argues that the first step is looking again at user needs and suggests an innovative and practical methodology to help
Pros and Cons of Open Data: A Global South Perspective - Michelle Willmers
Presentation by ROER4D Curation & Dissemination Manager Michelle Willmers on open data practice in the Global South to the Committee of Plenipotentiary Representatives of the International Committee for Scientific and Technical Information (ICSTI).
The State of Open Data Report by @figshare.
A selection of analyses and articles about open data, curated by Figshare
Foreword by Professor Sir Nigel Shadbolt
OCTOBER 2016
In an era of social media, voluntary communities and networks, are you a conn... - Kelly Hoey
Facebook, Linkedin, Twitter - Technology helps find “connections” and magnifies word of mouth. What are you doing to bridge your networks, connecting people, information and ideas?
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... - Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... - pchutichetpong
M Capital Group (“MCG”) expects to see demand and the changing evolution of supply, facilitated through institutional investment rotation out of offices and into work from home (“WFH”), while the ever-expanding need for data storage as global internet usage expands, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Techniques to optimize the pagerank algorithm usually fall in two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time, and the number of iterations. If a graph has no dangling nodes, pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time, no. of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. For more details, visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
5. Foreword by Jake Klamka
In the past five years, data science has gone from a nascent, tech
industry competency to a field that is having a global,
cross-industry impact in almost every major area of human endeavour.
From education, to energy, to government, to non-profits and, of
course, software and the Internet, data science is creating immense
value for companies and organizations across the world. In fact,
in early 2015, the President of the United States announced the
creation of the new role of Chief Data Scientist to the White House,
appointing one of the interviewees of this book, DJ Patil.
Like many innovations in the world, the birth and growth of this
industry was started by a few motivated people. Over the last few
years, they founded, developed and advocated for the value that
data analytics can bring to every industry around the world. In
The Data Science Handbook, you will have the opportunity to meet
many of these founding data scientists, hear first-hand accounts of
the incredible journeys they took, and where they think the field is
headed.
The road to becoming a data scientist is not always an easy one.
When I tried to transition from experimental particle physics to
industry, resources were few and far between. In fact, although a
need for data science existed in companies, the job title had not
been created yet. I spent a lot of time learning and teaching myself,
working on various startup projects, and later saw many of my
friends from academia run into the same challenges.
I saw a groundswell of incredibly gifted and highly trained
researchers who were excited about moving into data-driven roles,
yet they were missing key pieces of knowledge, and had trouble
transferring the incredible quantitative and data analysis skills they
had gained in their research to a career in industry. Meanwhile,
having lived and worked in Silicon Valley, I also saw that there was
a very strong demand from the technology companies who wanted
to hire these people.
To help others bridge the gap between academia and industry, I
founded the Insight Data Science Fellows Program¹ in 2012. Insight
is a training fellowship that helps quantitative PhDs transition
from academia to industry. Over the last few years, we’ve helped
hundreds of Insight Fellows, from fields like physics, computational
biology, neuroscience, math, and engineering, transition from a
background in academia to become leading data scientists at
companies like Facebook, Airbnb, LinkedIn, The New York Times, Memorial
Sloan Kettering Cancer Center and nearly a hundred other
companies, with a strong alumni network on both the East and West Coasts.
In my personal journey to enter the technology field, and in creating
a community for others to do the same, one key resource I found
to be tremendously useful was conversations with others who had
successfully made the transition themselves. As I developed Insight,
I have had the chance to engage with some of Silicon Valley’s best
data scientists who are mentors to the program:
Jonathan Goldman created one of the first data products at LinkedIn,
People You May Know, which transformed the growth trajectory
of the company. DJ Patil built and grew the data science team
at LinkedIn into a powerhouse and in the process co-coined the
term “Data Scientist.” Riley Newman worked on developing product
analytics that was instrumental in Airbnb’s growth. Jace Kohlmeier
led the data team at Khan Academy that helped to define how to
optimize learning at a scale of millions of students.
Unfortunately, face-to-face time with people has trouble scaling.
At Insight, to maintain exceptionally high quality and personal
time with our mentors, we accept a small group of talented scientists
and engineers three times per year. The Data Science Handbook
¹http://insightdatascience.com/
provides readers with a way to have that in-depth conversation at
scale. By reading the interviews in The Data Science Handbook,
you will have the experience of learning from the leaders in data
science at your own pace, no matter where you are in the world.
Each interview is an in-depth conversation, covering the personal
stories of these data scientists, starting from the initial experiences that
helped them find their own path to a career in data science.
It’s not just the early data science leaders who can have a big
impact on the field. There is also new talent entering the field, with
the opportunity for each and every new member to push the field
forward. When I met the authors of this book, they were still college
students and aspiring data scientists, full of the same questions that
those beginning in data science have. Through 18 months of hard
work, they have gone and done the legwork for all those interested,
seeking out some of the best data scientists around the country, and
asking them for their advice and guidance. This book is the result
of that work, containing over 100 hours of collected wisdom from
people who would otherwise be hard to reach (imagine having to compete
with President Obama to talk with DJ Patil!). In the meantime, these
young authors also have gone on to earn their own stripes as data
scientists, working at some well-known companies.
By reading these extended, informal interviews, you will get to sit
down with industry trailblazers like DJ Patil, Jonathan Goldman
and Pete Skomoroch, who were all part of the core, early LinkedIn
data science teams. You will meet with Hilary Mason and Drew
Conway, who were instrumental in creating the thriving New
York data science community. You will hear advice from the next
generation of data science leaders, like Diane Wu and Chris Moody,
both former PhDs and Insight Alumni, who are now blazing new
trails at MetaMind and Stitch Fix. You will meet data scientists
who are having a big impact in academia, including Bradley Voytek
from UCSD and Joe Blitzstein from Harvard. You will meet data
scientists in startups like Clare Corthell from Mattermark and Kunal
Punera of Bento Labs, who will share how they use data science and
analytics as a core competitive advantage.
The data scientists in The Data Science Handbook, along with
dozens of others, have helped create the very industry that is now
having such a tremendous impact on the world. Here, in this book,
they discuss the mindset that allowed them to create this industry,
address misconceptions about the field, share stories of specific
challenges and victories, and talk about what skills they look for
when building their teams. By reading their stories, hearing how
they think and learning about where they see the future of data
science going, you will gain the context to think of ways you can
both have an impact and perhaps advance the field yourself in the
years to come.
Jake Klamka
Founder
Insight Data Science Fellows Program²
Insight Data Engineering Fellows Program³
Insight Health Data Science Fellows Program⁴
²http://insightdatascience.com/
³http://insightdataengineering.com/
⁴http://insighthealthdata.com/