BIG DATA, DATA SCIENCE, AND THE U.S. DEPARTMENT OF DEFENSE (DOD)
by
Roy Lancaster
GAYLE GRANT, DM, Faculty Mentor and Chair
MICHELLE PREIKSAITIS, JD, PhD, Committee Member
BRUCE WINSTON, PhD, Committee Member
Tonia Teasley, JD, Interim Dean
School of Business and Technology
A Dissertation Presented in Partial Fulfillment
Of the Requirements for the Degree
Doctor of Business Administration
Capella University
January 2019
data analysis skills and tools. Eleven DOD analysts answered
individual interview questions,
eight managers participated in a focus group, and the DOD
provided documents to assist with
investigating two research questions: How does the Bravo Zulu
Center glean actionable
information from big data sets? How mature are the data science
analytical skills, processes, and
software tools used by Bravo Zulu Center analysts? Qualitative
analysis of the interviews, focus group, and
documents using the NVivo-11® Pro software
showed that
overarching themes of access to quality data, training, data
science skills, domain understanding,
management, infrastructure and legacy systems, organization
structure and culture, and
competition for analytical talent appear as concerns for
improving big data analysis in the DOD.
The Bravo Zulu Center is experiencing the same large data
growth as other organizations
described in scholarly research and is struggling with creating
actionable information from large
data sets to meet mission requirements, a struggle compounded
by immature data science skills.
Dedication
The study is dedicated to my wife of thirty years, Laurie
Lancaster. Your love, continued
encouragement, and desire for life-long learning have always
provided me strength to continue. I
love you and thank you! I also dedicate this work to our
children Sarah, TJ, and Wesley and to
our grandbabies Nora and Jameson! A special thank you to my
mom Kathryn for “grounding”
me in the early years and teaching me the value of education
and for your foundational love and
support! A special thank you to my sisters, Shari and Amy, and to
all my extended family and
friends. I love you all!
Acknowledgments
I wholeheartedly thank my mentor and chair, Dr. Gayle Grant,
for her expert guidance
throughout this project and for getting me to the finish line. Thank you!
I extend gratitude to my
committee, Dr. Michelle Preiksaitis and Dr. Bruce Winston for
their expert reviews and
guidance. A special thank you to Dr. Linda Haynes for her
outstanding reviews and most
importantly her love and inspiration. Thanks, Aunt Linda! Thank
you to the Bravo Zulu Center
(pseudonym) for opening their doors to me; this study would
not have been possible without
your generosity. Thank you to the men and women who wear the
uniform of the United States
military!
Table of Contents
Dedication
Acknowledgments
List of Tables
List of Figures
CHAPTER 1. INTRODUCTION
Introduction
Background
Business Problem
Research Purpose
Research Questions
Data Sciences Skills
Federal Job Series and DOD Data Scientists
Management Implications
Summary
CHAPTER 3. METHODOLOGY
Introduction
Research Questions
Design and Methodology
Results
Data Analysis and Results
Summary
CHAPTER 5. DISCUSSION, IMPLICATIONS, RECOMMENDATIONS
Introduction
Evaluation of Research Questions
Fulfillment of Research Purpose
Contribution to Business Problem
Recommendations for Further Research
Conclusions
List of Tables
Table 5. BZC participant criteria
Table 6. Instruments and data collection methods
Table 7. Initial codes
Table 8. Interviewee experience levels
Table 9. Management focus group experience
Table 10. BZC collected documents
Table 11. Initial codes (restated)
Table 12. Analysts’ responses to questions about big data
Table 13. Analysts’ responses to data usage questions
Table 14. Analysts’ responses to questions regarding data analysis challenges
Table 15. Analysts’ responses further exploring access to quality data
Table 16. Analysts’ responses to data usage and data analysis questions
Table 17. Additional responses to analysis challenges questions
Table 18. Additional analysts’ responses to challenges questions
Table 19. Analysts’ responses to data science skills questions
Table 20. Analysts’ responses to data science skills and analysis software questions
Table 21. Analysts’ responses to training related questions
Table 22. Analysts’ responses to data scientists scarcity questions
Table 23. Analysts’ responses to data scientist skills and roles questions
Table 24. Managers’ responses to questions about big data
Table 25. Managers’ responses to data usage questions
Table 26. Managers’ responses to questions regarding data analysis challenges
Table 27. Managers’ additional responses to data analysis challenges
Table 28. Managers’ responses to data usage and data analysis questions
Table 29. Managers’ responses to analysis challenges
Table 30. Managers’ responses to data science skills questions
Table 31. Managers’ responses to data science skills and analysis software questions
Table 32. Managers’ responses to training related questions
Table 33. Managers’ responses to data scientists scarcity questions
Table 34. Managers’ responses to data scientists’ skills and roles questions
Table 35. Data scientist and BZC Supply Analyst skills comparison
Table 36. Data scientist and BZC Program Management Analyst skills comparison
Table 37. Data scientist and BZC Operations Research Analyst skills comparison
Table 38. Data scientist and BZC Computer Scientist skills comparison
List of Figures
Figure 1. Analysis of big data scholarship
Figure 2. Cleveland’s data science taxonomy
Figure 3. Adaptation of Cleveland’s data science taxonomy
Figure 4. BZC case study triangulation
Figure 5. BZC case study data analysis process
Figure 6. BZC potential analyst participants
Figure 7. Final hierarchical coding structure
Figure 8. Initial analysts interviews word frequency diagram
Figure 9. Refined analyst interviews word frequency diagram
Figure 10. Initial management focus group interview word frequency diagram
Figure 11. Refined management focus group interview word frequency diagram
Figure 12. BZC strategic document word frequency diagram
Figure 13. BZC job announcements word frequency diagram
Figure 14. Cleveland’s data science taxonomy (restated)
Figure 15. Final hierarchical coding structure (restated)
Figure 16. Domain and data science assessment model
CHAPTER 1. INTRODUCTION
Introduction
A seemingly infinite amount of data (big data) has emerged, and
its effects are profound
on modern-day corporations and the United States military as
they continue to progress through
the information technology age (Ransbotham, Kiron, &
Prentice, 2015). The ability to connect
and analyze continuously growing digital data is now essential
to competitiveness in most
sectors of the United States economy (Iansiti & Lakhani, 2014).
George, Haas, and Pentland
(2014) suggested that although there is evidence demonstrating
the significant growth in data and
its importance for sustainability, there is a gap in published
management scholarship providing
theory and practices for management. Additionally, growing
evidence supports the notion that
the skills required to manage and analyze the exponentially
growing size of data are inadequate
and in short supply with bleak predictions for the future (Harris
& Mehrotra, 2014). If there is
truly a new occupation emerging (data scientist) in the
commercial sector because of the
exceptional data growth, then determining how United States
Department of Defense (DOD)
organizations currently analyze large data sets will help
determine if data scientists are warranted
in their organizations. Chapter 1 of this study demonstrates a
business problem for both
commercial organizations and the DOD. The general business
problem is the lack of effective
analysis in organizations operating in the modern-day big data
environment (Harris & Mehrotra,
2014). The specific business problem is that DOD organizations
may be struggling to glean
actionable information from large data sets, a struggle compounded by the
immature data science skills of DOD
analysts (Harris, Murphy, & Vaisman, 2013). This chapter
describes the conceptual framework
that supports this study and the rationale, purpose, and
significance of the study. The overall
significance of this study is that it helps close the gap in
DOD-related scholarly research on
big data and data science and seeks to contribute value to
scholars and practitioners working on
this important business problem.
Gang-Hoon, Trimi, and Ji-Hyong (2014) expressed
skepticism about the United
States military’s ability to adopt the new technologies and
philosophies required to leverage
meaningful information from large data sets. The research
explored big data and data science
in connection with the challenges brought on by the enormous data
growth being observed in nearly
all organizations. The DOD is an extremely large organization,
and effecting massive change in it is well beyond the ability of one
dissertation. This research was
supported by a comprehensive literature
review of big data and data science application in corporate
America as well as the DOD and
seeks to provide actionable insights into the requirements of the
analysts in modern-day
organizations and serve as a catalyst for additional research.
Background
Managing data represents both problems and opportunity with
distinct advantages to
organizations that can manage and analyze data (McAfee &
Brynjolfsson, 2012). This research
investigated how organizational leaders and analysts manage
and probe data to make better-
informed decisions, offer new insights, and automate business
processes thereby adding value
throughout the value chain and creating sustainable competitive
advantages (Berner, Graupner,
& Maedche, 2014). Watson and Marjanovic (2013) observed
that although executives are aware
of big data and know of some specific uses, they are often
unsure how big data can be used in
their organizations and what is required to be successful.
Additionally, Edwards (2014) found that the
DOD is experiencing similar data growth, which presents similar
problems and opportunities for
DOD leaders.
Watson and Marjanovic (2013) suggested big data and data
science may not represent
something new but may simply be the next stage of business
analysis as organizations continue to
progress through the information technology age. The fields of
business intelligence (BI) and
business analytics (BA) are not new, having existed in business
for decades, and were
examined in this research. Scholarly researchers agree it is
important to understand the
desired connection between raw data and actionable information
through the evolution of
BI and BA (Chen,
Chiang, & Storey, 2012). The term
intelligence has been used in scientific research since the
early 1950s. In the 1970s,
computing technology began providing actionable information
to the business world and
companies began utilizing systems to generate information from
raw data for management
(Ortiz, 2010). In her seminal book, In the Age of the Smart
Machine: The Future of Work and
Power, Zuboff (1988) predicted that information systems would not
only automate business
processes but also produce valuable information in a
unique manner. The field of business
intelligence became popular in the business and information
technology (IT) communities, and
the idea of business analytics became popular in the 2000s as
the key analytics component of
business intelligence (Chen et al., 2012). The unquestioned
benefit of business intelligence and
business analytics is the ability to capture trends, gain insights,
and draw conclusions from the
data generated in support of the business or to gain advantages
over the competition and create
sustainable growth (Rouhani, Ashrafi, Zare Ravasan, & Afshari,
2016). Berner et al. (2014)
suggested that, with data generation on a sharp incline, there are
significant gaps in the abilities of
modern-day organizations to leverage big data, and without
mitigation, these gaps will continue to
grow. The concept of business intelligence means organizations
understand their business and
the environment it operates in, thus creating the ability to make
smarter decisions. Big data stands to
be a key enabler for business intelligence success (Swain,
2016).
4
Business Problem
Organizations face rapid data growth, requiring deliberate and
strategic action by
leadership to remain competitive and ensure sustainability
(Gabel & Tokarski, 2014). For
example, the data-rich, highly competitive airline industry gives
a clear advantage to airline
corporations that use big data to drive their strategies and
decisions, while punishing those that
do not (Akerkar, 2014). Additionally, corporations such as
Amazon are leading the way utilizing
high-powered big data analytics to alter the retail industry
(Watson & Marjanovic, 2013). The
airline and retail industries are just two examples of industries
that are being reshaped due to
their ability or inability to analyze large data sets and may
provide actionable insights for the
DOD.
Ransbotham, Kiron, and Prentice (2015) published a significant
research study in the
MIT Sloan Management Review that surveyed 2,719
participants in 2014. The participants in the
study indicated that combining high-level analytical skills with
existing business knowledge
creates competitive advantages. Phillips-Wren and Hoskisson
(2015) suggested big data is
stimulating innovation and altering foundational aspects of
many business models. Additionally,
both of these sources indicate the analysis of big data is proving
difficult as companies struggle
to create actionable analytical products and
to integrate new analysis into existing
decision venues. Ransbotham et al. (2015) proposed that a key
constraint preventing analysts from
producing actionable information from large data sets is the
lack of analytical skills.
The general business problem is the lack of effective analysis in
organizations operating
in the modern-day big data environment (Harris & Mehrotra,
2014). The specific business
problem is that DOD organizations may be struggling to
glean actionable information from
large data sets, a struggle compounded by the immature data science skills of
DOD analysts (Harris, Murphy, &
Vaisman, 2013). Symon and Tarapore (2015) proposed that the
fast-paced evolution of analysis
capabilities in commercial organizations represents great
opportunity to address this business
problem for the DOD. Hamilton and Kreuzer (2018) suggested
the amount of data collected by
DOD organizations continues to outpace the ability to process
and interpret the data, and the
ability to glean actionable information from large data sets is
crucial for DOD mission success.
Research Purpose
The purpose of this qualitative case study was to explore how
DOD employees conduct
data analysis with the influx of big data. An unidentified U.S.
Air Force command was selected
by the researcher as the case study organization to support this
study. The Bravo Zulu Center
(BZC) pseudonym was applied throughout this research to
conceal the identity of the case study
organization. This research explored the emerging commercial
data scientist occupation and the
skills required of data scientists to help determine if data
science is applicable to the DOD. This
research sought to further define the skills required of data
scientists to help enable their
effectiveness in modern organizations with specific emphasis
aimed at the DOD. The targeted
population consisted of analysts, managers, or executives
working within the Bravo Zulu Center
(BZC). The implication for positive social change includes the
potential to identify needed
adaptations in the skills and abilities of analysts and managers
working within DOD
organizations that are required to glean actionable information
from big data sets. This research
explored data science and the implications associated with the
big data phenomenon by
conducting qualitative research with a representative case study
organization. This dissertation
explored important skill sets, attitudes, and perceptions of the
analysts working big data issues
for the BZC, along with the skill sets, attitudes, and
perceptions of management within the same
organization. Big data innovations are happening throughout
commercial industries, and they are
transforming foundational aspects of many business models and
placing greater demands for
fast-paced innovation (Parmar, Cohn, & Marshall, 2014). This
fast-paced evolution of analysis
capabilities in commercial organizations represents great
opportunity for the DOD. This research
builds upon several big data and data science constructs
documented in contemporary scholarly
literature (Symon & Tarapore, 2015). First, big data represents
both potential and liability, with
the ability to manage and analyze big data sets likely required
for business sustainability
(Gobble, 2013). Second, harvesting actionable
information from big data sets
requires deliberate change in many aspects of organization
design and management of human
resources (Gabel & Tokarski, 2014).
A qualitative research methodology is appropriate for
understanding human behavior and
is common in the social and behavioral sciences and among
scholar-practitioners who seek to understand
a phenomenon (Cooper & Schindler, 2013). This type of
research involves collecting data
typically in the participants’ settings and inductively analyzing
the collected information looking
for themes to provide insight and understanding (Cooper &
Schindler, 2013). This research is an
exploration of how big data analysis is accomplished within the
DOD and why the rise of large
data sets may generate the need to increase the analytical skills
of DOD employees, making a
qualitative research methodology most appropriate.
Research Questions
The objective of this research was to develop an understanding
of how DOD analysts
respond to, probe, and assimilate data in big data environments
to help determine if a data science
occupation is justified and warranted in the DOD. The following
research questions guided the
study:
Primary Research Question 1: How does the Bravo Zulu Center
glean actionable
information from big data sets?
Primary Research Question 2: How mature are the data science
analytical skills,
processes, and software tools used by Bravo Zulu Center
analysts?
Rationale
The principal rationale for furthering knowledge of the big
data phenomenon and
data science through a qualitative case study is the
need to view big data analysis
through a humanist lens instead of an information system
technological lens (McAfee &
Brynjolfsson, 2012). Managing big data requires senior decision
makers to embrace data-driven
decisions, and this will require a cultural change in many
organizations (Gabel & Tokarski,
2014). Even though some researchers stress the
importance of big data capability, there is
no consensus on how best to re-align and organize modern-day
organizational models to support
big data efforts (Grossman & Siegel, 2014). Additionally,
McAfee and Brynjolfsson (2012)
suggested there is a lack of understanding by all levels of
management regarding the value of big
data and the changes required to harness the power of big data.
Management may need to invest
in data scientists who can manage and manipulate large data
sets and turn this raw data into
meaningful information. Unfortunately, organizations and
academia may be struggling to
define the skill sets of these so-called data scientists (Harris
et al., 2013). Gabel and Tokarski
(2014) noted that the capture and use of data are on a sharp increase and that
businesses and organizations
would like to realize the competitive advantages contained in the
tremendous amount of
data. Digital data is driving foundational changes in personal
lives, business, academia, and
functions of government. The analysis of big data promises to
reshape everything from
government and international development to how we
conduct basic science (Gobble, 2013).
DOD organizations are generating massive amounts of
information from activities along their
value chains. There has been a dramatic increase in embedded
sensors in modern-day weapon
systems, which is compounding the data growth (Hamilton &
Kreuzer, 2018).
Moorthy et al. (2015) suggested there is potential in nearly all
industries regarding the
impact of turning vast amounts of raw data into meaningful
information. Additionally, turning
large raw data sets into meaningful information will require
deliberate and strategic action
(Galbraith, 2014). Warehousing data is problematic, expensive,
and time-consuming and creates
alignment difficulties in modern organizations (Gabel &
Tokarski, 2014). Davenport and Patil
(2012) submitted that the skills required to turn large amounts of
raw data into meaningful
information are in high demand and in short supply. The
technology for producing data has
evolved greatly but the skills and software tools required to
analyze large data sets have been
lagging (Gobble, 2013). Additionally, the DOD has declared
that it has a scarcity of data
scientists. According to the Deputy Assistant Secretary for
Defense Research, data scientists are
in short supply, and data scientist is becoming the most in-demand job for the
U.S. military (Hoffman, 2013).
Experts suggest there is a data analysis skills
shortfall, especially of analysts who
have the talent to create predictive analytical products utilizing
statistics, artificial intelligence,
and machine learning (Davenport & Patil, 2012).
Conceptual Framework
The conceptual framework serves as the foundational knowledge
to support the research
study. This framework serves to guide the research by relying
on formal theory, which supports
the researcher’s thinking on how to understand and plan to
research the topic (Grant & Osanloo,
2014). William S. Cleveland (2001) coined the term data
science in the context of enlarging the
major areas of technical work in the field of statistics.
Cleveland’s seminal work described the
requirement of an “action plan to enlarge the technical areas of
statistics focuses of the data
analyst” (Cleveland, 2001, p. 1). Cleveland described a major
alteration of the analysis occupation,
to the point that a new field would emerge and be called “data
science” (Cleveland, 2001, p. 1).
The six technical areas that encompass the field of data
science, as described by Cleveland,
are multidisciplinary investigations, models and methods
for data, computing with data,
pedagogy, tool evaluation, and theory. Cleveland’s primary purpose in
declaring the six
technical areas was to provide a guideline for the percentage of
the overall effort a university or
governing organization should apply to each technical area to
begin to define curriculum for the
development of future data scientists. The plan was adapted to
support this research (Cleveland, 2001).
Significance
DISA (2015) suggested the capability to leverage meaningful
information from big data
is important to the DOD. However, researchers
also suggest there are significant
shortfalls in the abilities of complex organizations to fully
employ business intelligence
techniques on extremely large data sets (Harris & Mehrotra,
2014). In June 2014, the Office of
Naval Research published a request to commercial and DOD
industries for white papers and full
proposals on how to use big data for real insight (McCaney,
2014). The overall objective was to
achieve unprecedented access to data with deeper insights by
examining the data in new and
innovative ways (McCaney, 2014). Additionally, in March of
2015, the Defense Information
Systems Agency (DISA) published a request for information
regarding infrastructure
development to support potential big data and governance
solutions. This request specifically
sought examples of commercially developed solutions that are
more efficient than current DOD
solutions (DISA, 2015). The desired significance of this
research was to develop an
understanding of the skills required by modern-day analysts and
help determine if a data scientist occupation
is justified and warranted in the DOD.
Definition of Terms
Big Data is characterized as “datasets that are too large for
traditional data processing
systems and that therefore require new technologies” (Provost &
Fawcett, 2013, p. 54).
Big Data is characterized by “extremely high volume, velocity,
and variety (commonly
referred to as the ‘3 Vs’). It also exceeds the capabilities of
most relational database
management systems and has spawned a host of new
technologies, platforms, and approaches”
(Watson & Marjanovic, 2013, p. 5).
Big Data Analytics: “Analytical techniques in applications that
are so large (from
terabytes to exabytes) and complex (from sensor to social media
data) that they require advanced
and unique data storage, management, analysis, and
visualization technologies” (Chen et al.,
2012, p. 1165).
Data Scientist (Definition #1): A seasoned professional with the
training, skills, and
curiosity to discover new insights in the era of big data
(Davenport & Patil, 2012).
Data Scientist (Definition #2): Someone who is better at
programming than a statistician and
better at statistics than a computer scientist (Baskarada &
Koronios, 2017).
Assumptions and Limitations
The goal of this qualitative case study was to explore how DOD
employees conduct data
analysis with the influx of big data. This research explored the
emerging commercial data
scientist occupation and the skills required of data scientists to
help determine if data science is
applicable to the DOD. The limited ability to generalize conclusions to
a larger population is a potential
limitation of qualitative research (Cooper & Schindler, 2013). A
potential limitation of this study
is the difficulty of drawing conclusions about an organization as large
and complex as the DOD. The
following were the assumptions and limitations within this
study.
Assumptions
The sample in this study was limited to a small number of DOD
analysts and managers
within one organization. The research findings are not meant to
be representative of the entire
population of DOD analysts and managers but are meant to be a
catalyst for additional
quantitative research and analysis. Responses from the analysts
and the managers were based
upon their own experiences and perceptions and are not meant to be
representative of the entire DOD
population.
Limitations
There were some limitations to qualitative data collection,
primarily because of the
subjectivity and biases inherent to each participant and the
researcher (Cooper & Schindler,
2013). The researcher purposively selected an organization
within the DOD that is responsible for large
data sets and is experiencing the big data phenomenon to provide
supporting documents, research
literature, and the case study. A potential limitation was the
researcher’s bias due to his long DOD
career. The researcher is a career U.S. Navy employee and
purposively avoided U.S. Navy
organizations to prevent bias. All the data collected in support
of this research will be retained
for seven years and then destroyed personally by the researcher
via a crosscut shredder for
documents and via an approved data destruction program for
digital recordings.
Organization for Remainder of Study
This study is organized into five chapters. The aim of
Chapter 1 was to identify the
purpose, reasoning, and intent of this doctoral research. The
research in support of Chapter 1
demonstrated a clear business problem regarding the challenges
associated with the big data
phenomenon and lack of defining skills for DOD analysts and
proposed that the DOD is suffering
from this business problem (Gobble, 2013). Chapter 2 contains
a literature review with
explanations on how this study differs from previous research.
Chapter 3 describes the
methodology and research design employed in this study.
Additionally, the data collection
method(s) are described to include the data analysis, credibility,
dependability, and ethical
considerations (Moustakas, 1994). Chapter 4 presents the data
analysis and findings and Chapter
5 presents a discussion of the results, conclusions, and
recommendations for further research.
CHAPTER 2. LITERATURE REVIEW
The evidence is clear; forward acting leaders manage and
harness insights from data to
gain sustainable competitive advantages (Iansiti & Lakhani,
2014). Additionally, there is clear
evidence that there are big data problems emerging due to the
disproportionate growth between
collected data and the abilities of most organizations to analyze
the data (Géczy, 2015). The
general business problem is the lack of effective analysis in
organizations operating in the
modern-day big data environment (Harris & Mehrotra, 2014).
The specific business problem is
that DOD organizations may be struggling with gleaning
actionable information from large data
sets compounded by immature data science skills of DOD
analysts (Harris et al. 2013).
Additionally, the amount of data being collected and requiring
analysis is on a sharp increase for
the DOD. Porche III, Wilson, Johnson, Erin-Elizabeth, and
Tierney (2014) commented that as
little as 5% of all data collected in the U.S. Navy and Air
Force’s intelligence, surveillance, and
reconnaissance missions received analytical interpretation: the
U.S. military data analysts are
overwhelmed. Additionally, substantial research is underway to
determine how big data volumes
can create value for individuals, community organizations and
governments (Gobble, 2013). In
response to concern regarding extreme data growth and its
impact on modern day businesses and
society, several scholarly journals have been created just in the
past few years which are bringing
scholars and practitioners together to research and report on the
growing big data business
problem and data sciences (Frizzo-Barker, Chow-White,
Mozafari & Dung, 2016). For example,
the Big Data Analytics, Big Data & Society, and the EPJ Data
Science Journals have all been
founded since 2012.
The objective of this research was to develop an understanding
of how DOD analysts
respond to, probe and assimilate data in big data environments
to help determine if a data science
occupation is justified and warranted in the DOD. The following
research questions guided the
study:
Primary Research Question 1: How does the Bravo Zulu Center
glean actionable
information from big data sets?
Primary Research Question 2: How mature are the data science
analytical skills,
processes, and software tools used by Bravo Zulu Center
analysts?
This chapter describes the processes used to explore big data
and data sciences and
identifies and describes research studies that have been
completed regarding this important
business problem in commercial business as well as the DOD.
This chapter is the result of a
comprehensive review of the pertinent scholarly and
practitioner literature surrounding big data
and data sciences and is foundational for a qualitative
methodology and case study research
design.
Conceptual Framework and Research Design
The conceptual framework that serves as the foundational
knowledge to support this
research study is the work of William S. Cleveland (2001). This
seminal research introduced the
term data science in the context of “expanding the technical
areas of the field of statistics.” This
seminal work described the requirement of an “action plan to
enlarge the technical areas of
statistics focuses of the data analyst” (Cleveland, 2001, p. 1).
Cleveland described a major
altering of the analyst occupation to the point that a new field
shall emerge called “data science”
(Cleveland, 2001, p. 1). Cleveland’s data science taxonomy
directed universities to develop six
technical areas, allocate resources appropriately to research, and
develop curriculum within these
technical areas. Additionally, Cleveland recommended a data
science action plan that could be
adapted for research by government and corporate
organizations. Since Cleveland (2001) there
have been many researchers advancing the field of data science
through theories and methods.
However, a widely accepted
academic definition of data science, including
the skills required of data scientists and how best to
employ data scientists in modern big
data environments, has yet to be provided (Viaene, 2013). Nevertheless, there are
scholars conducting scientific research
further defining the data science occupation and there are
universities that have developed
curriculum to educate data scientists (Cotter, 2014). The lack of
a definition regarding data
science and the potential shortage of these professionals
coupled with the rapid data growth in
DOD data systems presents a key issue for the DOD.
As described by Moustakas (1994), qualitative research is an
approach to explore how
groups or individuals perceive a specific phenomenon or
problem. This type of research involves
collecting data typically in the participants’ settings and
inductively conducting analysis of the
collected information looking for themes to provide insight and
understanding (Moustakas,
1994). A qualitative research design utilizing a single embedded
case study organization is
appropriate for this research and the Bravo Zulu Center agreed
to participate as the case study
organization.
Gap in Literature
Although there is a tremendous amount of literature with
researchers investigating the
implications with big data sets and data science, there is a gap
in published scholarly literature
regarding big data and data sciences related specifically to the
DOD. Frizzo-Barker et al. (2016)
conducted a systematic review of the big data business
scholarship published between the years
2009-2014. These researchers analyzed 219 papers from 152
relevant academic journals and
concluded big data research and theory is fragmented and in
“early state of domain of research in
terms of theoretical grounding, methodological diversity, and
empirical evidence” (Frizzo-Barker
et al. 2016, p. 1). Frizzo-Barker et al. (2016) examined
key elements as to the types and
sheer volume of published big data research as well as to the
aspects of big data problems and
opportunities examined in contemporary big data research.
Frizzo-Barker et al. (2016) examined
the types of industries and organizations being analyzed through
big data research and concluded
most research can be categorized as either business in general
or financial and management.
These researchers categorized any research regarding big data
and the DOD into the law and
governance category, which made up 17% of the total big data
research submitted, suggesting a
significant gap exists in big data research associated with the DOD,
as seen in Figure 1.
Figure 1. Analysis of Big Data Scholarship. Adapted from “An
Empirical Study of the Rise of
Big Data in Business Scholarship,” by J. Frizzo-Barker, P.
Chow-White, M. Mozafari, and H.
Dung, 2016, International Journal of Information Management,
36(3), p. 410. Copyright 2016 by
Elsevier. Reprinted with permission.
Additionally, there is an abundance of contemporary big data
research regarding the
technological advances enabling the big data phenomenon and
much less surrounding the human
and data science implications associated with big data. In fact,
there appears to be a gap in
published scholarly literature that tackles the human
implications associated with big data and
data sciences and this gap is the focus of this research. There
appear to be many opportunities to
explore new theories and practices that may evolve regarding
the management of big data and
the evolution and application of data science (George et al.
2014).
The Big Data and Data Science Buzz
Without question the term big data and associated literature
experienced a sharp increase
over the past decade. In a dissertation regarding
big data and healthcare, Young (2014)
cited a 2013 Google search on the term big data that yielded
9.1 million hits. I executed the
same Google search in December 2017, and the search provided
343 million hits regarding big
data; I executed the same search again in August 2018, and the
search provided 824 million hits.
Additionally, there is a plethora of both scholarly and secondary
literature surrounding big data
and data science and this literature review was the product of
the examination of hundreds of
writings regarding these topics. This literature review focused
on the perceived benefits and
liabilities of big data and the implications for analysts in
modern organizations responsible for
capturing meaningful information from the data. Specifically,
are there actions and emerging
requirements of the people responsible for analyzing data
because of the arrival of large amounts
of data, and secondly is the notion of a data scientist warranted?
Additionally, this literature
review focused on supported evidence of successful big data
application by commercial
organizations to aid the DOD regarding their initiatives to
harness big data.
A continually growing interest from mainstream media and
research firms is
contributing to the message regarding data sciences. The
research firm Glassdoor is an
organization that ranks occupations based upon current job
openings, salaries, career
opportunities, and job satisfaction. This organization ranked
data scientist as the top job in the
United States for 2016, 2017, and 2018 and indicated a data
scientist could expect to earn an
annual salary of $110,000 (Columbus, 2018). In this example, a
major research firm on job
occupations in the United States declared data scientist as the
top profession, and yet, as this
literature review highlights, the DOD has not determined whether
and how data scientists are needed.
Additionally, a frequently cited report by Manyika et al. (2011)
suggested a shortfall of analytical
and managerial talent in the United States in the range of
140,000 to 190,000 people by 2018.
The well-published big data researchers Thomas Davenport and
D.J. Patil not only agreed to the
shortfall but also labeled data scientist as the “sexiest” job of
the 21st century (Davenport & Patil,
2012, p. 1). Conversely, Fox and Do (2013) advocated there
may be too much hype regarding
big data and its potential impacts. These researchers indicated
the term big data is too vague and
this vagueness is causing prioritization problems for
organizations. These researchers suggested
that increasing data, both in size and complexity, has been
ongoing since the mid-1990s and
does not represent a new problem (Fox & Do, 2013). Comparing
literature between researchers
such as Davenport and Patil (2012) who claimed big data and
data science are having profound
effects on most industries and researchers such as Fox and Do
(2013) who proposed that big data
is not new demonstrates this is an ongoing debate that requires
further research.
The term data scientist gained significant prominence and
momentum in 2008, when D. J.
Patil and Jeff Hammerbacher were leading the analytical efforts
at LinkedIn and Facebook, respectively
(Davenport & Patil, 2012). Data scientists are professionals at
gleaning actionable information
from large amounts of data. Data scientists use traditional math,
science, and statistical techniques
along with modern analysis software to glean actionable
information from large data sets
(Davenport & Patil, 2012). Furthermore, the term data scientist
received a great amount of
popular press when D. J. Patil went on to be appointed by
President Obama as the first Chief
Data Scientist at the White House (Smith, 2015). D.J. Patil
served in this capacity under
President Obama from 2015 to 2017. The following
comprehensive review of the existing scholarly
and practitioner literature explores the potential and effects of
big data and seeks to document the
implications and requirements of today’s business leaders and
understand the growing
importance of data science.
Big Data Defined
There is clear evidence that a big data
phenomenon is underway, but the
full ramifications and significance of big data, and how prepared
the human element is,
are less clear. There are
scholarly researchers suggesting the
arrival of big data includes cultural, technological, and
scholarly impacts (George, Haas, &
Pentland, 2014). Conversely, there are some influential
researchers, such as Watson and
Marjanovic (2013), who indicate big data may not represent
something new but is simply the next
phase of digitization as societies continue to progress through
the information age. Beer’s (2016)
theoretical framework suggested there is very little
understanding of the concept of big data,
such as where the term came from, how it is used, and how
it lends authority; answering these questions would further
conceptualize the big data phenomenon and allow for
actionable research and theory.
Schneider, Lyle, and Murphy (2015) indicated the growing
conversation of big data is a very
relevant conversation to the DOD due to the extreme data
growth and data capture by DOD
activities coupled with indications the data growth trends will
continue for the near future.
Big data has become a ubiquitous term with no single unified
definition. A commonly
cited explanation describes big data “as the collection of data
sets so large and complex that it
becomes difficult to process using traditional relational
database tools and traditional data
processing applications” (Moorthy et al. 2015, p. 76). The
origin of the term big data is
debatable; however, this term has been around since at least the
1990s. Several authors give
some credit to John Mashey, who in the 1990s was a chief
scientist working at Silicon Graphics
Inc., responsible for developing methods for the management of
large amounts of computer
graphics. Mashey gave hundreds of presentations to small
groups in the 1990s to explain the
concept of an extremely large amount of data capture coming
quickly with profound impacts
(Lohr, 2013).
Several researchers, such as Watson and Marjanovic (2013),
placed big data on an
evolutionary scale and depicted the big data phenomenon as the
fourth generation in the
information age. Decision support systems (DSS) were the
first generation, which was born in
the early 1970s. Secondly, the 1990s brought in the era of the
enterprise data warehousing in
which businesses aggregated their data from many disparate
data sources and field locations into
a single warehouse or warehouses. The third generation arrived
in the early 2000s in which
senior leaders and managers were gaining near and real-time
access into these data warehouses
and invested heavily into the business intelligence layers built
on top of these data sets to gain
powerful and competitively attractive decisions into their value
chains. Finally, the big data era is
creating a fourth generation that promises to be a catalyst for
major change and innovation in
nearly all industries (Watson & Marjanovic, 2013).
The Size of Big Data
The amount of data collection globally is growing rapidly and
modern organizations are
capturing massive amounts of data on activities up and down
their value chains. Additionally,
millions of networked sensors are being embedded into
machines creating a hugely data rich
environment. This exponential growth in data is underway in
nearly all sectors of the U.S.
economy and businesses are simply collecting more data than
they can manage (McAfee &
Brynjolfsson, 2012). There are several researchers and
organizations studying the amount of data
generated and providing predictions of massive growth in the
decade ahead. One common
resource cited in modern literature surrounding big data is the
Digital Universe research project
sponsored by the EMC Corporation (Turner, Reinsel, Gantz &
Minton, 2014). This project seeks
to define how big the big data expansion is today and provides
predictions of data growth into
the next decade. According to the Digital Universe, data
generation and collection will double
every two years and by 2020, the size of stored digital data will
reach 44 trillion gigabytes. To
help put this into context, if this amount of data were stored in a
stack of tablet computers, such as
an iPad™, there would be 6.6 stacks of tablets, each equal to the
distance from the Earth to the Moon
(Turner et al. 2014).
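The doubling claim can be illustrated with simple arithmetic. The following is a minimal sketch, assuming a smooth exponential growth curve that doubles every two years and is anchored at the 44 trillion gigabyte figure cited for 2020. The function name, the constants, and the smooth-growth assumption are illustrative only and are not taken from the Digital Universe study itself.

```python
# Illustrative only: assumes smooth exponential growth that doubles every
# two years, anchored at the 44 trillion GB figure cited for 2020.
ANCHOR_YEAR = 2020
ANCHOR_SIZE_TRILLION_GB = 44.0  # figure cited from Turner et al. (2014)
DOUBLING_PERIOD_YEARS = 2.0     # "double every two years"


def projected_size(year: float) -> float:
    """Return the implied stored data volume (trillion GB) for a year."""
    elapsed_years = year - ANCHOR_YEAR
    return ANCHOR_SIZE_TRILLION_GB * 2 ** (elapsed_years / DOUBLING_PERIOD_YEARS)


if __name__ == "__main__":
    # Print the implied volume for a few years around the anchor year.
    for year in (2014, 2016, 2018, 2020, 2022):
        print(f"{year}: {projected_size(year):.1f} trillion GB")
```

Under this assumption, each two-year step back from 2020 halves the implied volume, so 2018 corresponds to 22 trillion gigabytes and 2016 to 11 trillion gigabytes.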
The Three V’s Revised
There are many assumptions and perplexities regarding big data
definitions. If all
organizations generate data, what constitutes big data?
Additionally, because big data is a term
with different meanings it creates difficulties when determining
solution paths regarding big data
efforts (Watson & Marjanovic, 2013). Attempting to define a
taxonomy on which to conduct big
data research is a common theme in contemporary big data
literature (Beer, 2016). In 2001,
Douglas Laney of META Group authored what is now
considered a foundational white paper
regarding data management and provided a context upon which
the big data phenomenon could
be described. Even though there is no consensus on the amount
of data that constitutes big data,
the impact of big data could be described through the constructs
of volume, velocity, and variety
(Phillips-Wren & Hoskisson, 2015). Although an exact and
wide-spread definition of big data
has not been commonly agreed to, examining the data growth
through Laney’s definition is very
commonly cited in the literature. Laney described the three V’s
in the context of the amount and
size of the data (volume), the rate at which data is
produced (velocity), and the range of different
formats in which data is being generated and delivered (variety)
(Phillips-Wren & Hoskisson, 2015).
Kitchin and McArdle (2016) suggested Laney’s traditional view
of big data using the
three V’s lacks ontological clarity. Ontological clarity would
define the concepts, categories and
properties of big data and the relationships between them
(Kitchin & McArdle, 2016). The use of
the three V’s to describe big data is a useful entry point but
only describes a broad set of issues
associated with big data, rather than providing further definition and
practicality of big data (Kitchin &
McArdle, 2016). Additionally, Kitchin and McArdle (2016)
aggregated and submitted several
important and new qualities and attributes of big data,
suggested by several contemporary big
data researchers, to include the following:
• “[…] than being sampled.
• Fine-grained (in resolution) and uniquely indexical (in identification).
• […] that enable the conjoining of different datasets.
• […] and error.
• […]. Data provides many insights can be extracted and the data repurposed.
• […] context in which they are generated” (Kitchin & McArdle, 2016, p. 1).
Kitchin and McArdle (2016) explored ontological
characteristics of 26 datasets to
provide a more actionable definition of big data. These
researchers developed a taxonomy of
seven big data traits and then applied these traits against 26
data sets that were considered to
meet current definitions of big data. Kitchin and McArdle
(2016) significantly added to Laney’s
foundational definition of big data and demonstrated big data is
qualitatively different from
traditional small data sets along seven axes as seen in Table 1.
Table 1
Kitchin and McArdle’s Seven Traits and Small to Big Data Comparison

Trait                            Small Data                            Big Data
Volume                           Small or limited to large             Very large
Velocity                         Slow, freeze-framed or bundled        Fast, continuous
Variety                          Limited in scope to wide ranging      Wide
Exhaustivity                     Samples                               Entire populations
Resolution and indexicality      Coarse and weak to strong and tight   Tight and strong
Relationality                    Weak to strong                        Strong
Extensionality and scalability   Low to middling                       High

Note. Adapted from “What makes big data, big data? Exploring
the ontological characteristics of 26
datasets,” by R. Kitchin and G. McArdle, 2016. Big Data &
Society, 3(1). CC 2016 by Sage Publishing.
Big Data Benefits
The traditional analytics environment that exists in most
organizations today includes
transactional systems that generate data and data warehouses
that store the data. Data warehouses
are thus collections of federated data marts. A set of business
intelligence and analytics tools
aids decision-making through queries, data mining, and
dashboards. Typical dashboards drill from
top-level key performance indicators down through a wide range
of supporting metrics and
detailed data (Davenport, Barth, & Bean, 2012).
Almeida (2017) suggested the primary purpose of big data
analysis is to improve
business processes through greater insights and better decision
making. Understanding how to
leverage increasing amounts of data is crucial for business
success in the modern environment.
This researcher conducted an in-depth literature review of
published works between the years
2012-2017 and determined that big data analysis is a growing
theme of importance in big data
research (Almeida, 2017). Additionally, research published in
the Harvard Business Review by
McAfee and Brynjolfsson (2012) was a study encompassing 330
large North American
companies and consisted of structured interviews with
executives spread across these
organizations. The researchers gathered information in
interviews about the companies’
organizational management and technology strategies and
collected information from annual
reports and independent sources. The primary purpose of
McAfee and Brynjolfsson’s study was
to investigate if exploiting vast new flows of information in the
era of big data could radically
improve performance. The researchers suggested the era of big
data is a revolution because
companies can measure and therefore manage more precisely
activities up and down their value
streams unlike any time in the past. McAfee and Brynjolfsson
concluded that top performing
companies that are using data-driven decision-making supported
by analytical software were on
average “5% more productive and 6% more profitable”
suggesting companies can and do build
competitive advantages through big data analysis (McAfee &
Brynjolfsson, 2012, p. 64).
Additionally, according to Davenport and Dyché (2013) the
analysis of data to provide insight
into the organizations’ value chain is not a new concept.
However, most businesses are just now
starting to strategize the potential benefits of big data analysis
and how best to implement big
data analysis into their traditional business intelligence
architectures. Corporations such as
Yahoo, Google, Wal-Mart, and Amazon are clearly leading the
way regarding big data
management and analysis. However, for most companies the
ability to manage large data sets to
the extent of these leading corporations requires strategic
planning and action (Watson &
Marjanovic, 2013). Prominent researchers such as Davenport
and McAfee clearly demonstrate
there is value to companies that can analyze big data sets and
may provide actionable theory for
the DOD. Hoffman (2013) suggested that although the DOD has
been warehousing and
analyzing data for several decades, they, too, require strategic
change to leverage information in
the era of big data. Leveraging big data through analysis is a
high priority for the U.S. military;
however, there are researchers suggesting the DOD’s ability to
analyze its data is not keeping
pace with the amount of data being collected (Hoffman, 2013).
Much of the expectation involved
in big data analysis is the continued desire by companies and
the DOD to move from reactionary
metrics based on historical data to predictive and prescriptive
metrics that may be possible with
big data analysis. Research on big data and data science
suggests the ability to locate hidden
facts, indicators, and relationships immersed in big data sets not
yet explored (Chen et al. 2012).
DOD and Big Data
The amount of data collection across the DOD has been
increasing at a fast pace and the
demands from the warfighters to make well-informed decisions
from massive amounts of data
are critical (Hamilton & Kreuzer, 2018). Edwards (2014)
suggested big data insights are now an
essential requirement for modern warfare and military
organizations need to use advanced
analytics to take advantage of their massive amounts of data and
avoid over saturation from the
data. The notion the DOD is aware of its growing data challenge
is well documented. However,
it is less clear just how large the data growth in DOD
information systems is and how
prepared the DOD is to handle big data. The purpose of this
exploratory qualitative case study
was to explore how DOD employees conduct data analysis with
the influx of big data. This
research will explore the emerging commercial data scientist
occupation and the skills required
of data scientists to help determine if data science is applicable
to the DOD. A
comprehensive literature review of the perceptions of big
data and data science offers
potential benefits to the DOD.
DOD Big Data Initiatives
Although Frizzo-Barker et al. (2016) suggested there is a gap in
big data literature for
U.S. government organizations, the U.S. defense industry
appears energized by the potential of
big data and big data analysis. The DOD is reaching out to
commercial industries for assistance
and advice (Konkel, 2015). Cyber defense and situation
awareness initiatives appear to be in the
forefront of the department’s initiatives. Many of the big data
projects underway within the DOD
are aimed at advancing military intelligence, surveillance, and
reconnaissance (ISR) systems (Costlow,
2014). Porche et al. (2014) accumulated several formal research
projects requested by the U.S.
Navy to investigate the huge data growth and provide any
potential ways forward. The amount of
ISR data collected by the U.S. Navy has become overwhelming
with no end in sight. These
researchers explained the U.S. Navy is only able to analyze
approximately five percent of the
data it collects from its ISR platforms (Porche et al. 2014).
Additionally, several researchers
from the U.S. Navy’s postgraduate school collaborated on Big
Data and Deep Learning for
Understanding DOD Data (2015) further expounding on the big
data problem for the DOD with
specific research to help determine if big data and data science
are really something new or just
the next progression in information technology analysis. These
researchers explained that
applications including traditional numerical analysis, statistics,
machine learning, data mining,
business intelligence, and artificial intelligence are migrating
into a common term called big data
analytics (Zhao, MacKinnon, & Gallup, 2015).
The U.S. Air Force (USAF) is also struggling with the demands
for ISR data collection
and analysis as the requirement for these types of missions
continues to increase. In Data Science
and the USAF ISR Enterprise (2016), the USAF Deputy Chief of
Staff for Intelligence,
Surveillance and Reconnaissance released a publicly available
white paper that placed
strong emphasis on the U.S. Air Force’s big data growth and
the opportunities for data
sciences. The U.S. Air Force is experiencing exponential data
growth and increasing demands on
analysts. Data science is a key element in order to unlock big
data for the U.S. Air Force ISR
community (USAF, 2016). This white paper described three
specific conditions that exist today
that are indications of inadequate big data analysis. First, even
though there is exponential growth in
data, only a limited set of data is analyzed due to the lack of
integration and connectedness.
Second, there is an inability to dynamically correlate
and cross-reference data
vertically through organizations and horizontally across mission
areas. Lastly, there is a shortage of
streamlined processes to coordinate, combine, and disseminate
data to other participating
organizations (USAF, 2016). In this writing, the U.S. Air Force
clearly acknowledged a big data
and data science problem and is requesting additional research
to understand the impacts of
leveraging data scientists. This research suggested big data
specialists should take the lead in
researching and comprehending data science methods and
approaches that would be instrumental
in advancing the field of data sciences across the U.S. Air Force
(USAF, 2016).
Another recent big data and data science initiative suggests the
DOD is strategically
making efforts to analyze big data streams aimed at improving
personnel readiness.
Strengthening Data Science Methods for Department of Defense
Personnel and Readiness
Missions (2017) is a publicly available and comprehensive
report sponsored by the DOD. The
DOD requested that the National Academies of Sciences, Engineering,
and Medicine collaborate on
and provide recommendations on how the Office of the Under
Secretary of Defense (Personnel
& Readiness) could use the field of data science to improve the
effectiveness and efficiency of
their critical mission. Specifically, the request was to develop
an implementation plan for the
integration of data analytics into the DOD decision-making
processes. A major theme in this
report is to further the development of advanced analytics and
the strengthening of data science
education. A skilled workforce that can apply contemporary
advances in data science
methodologies is critical. Furthermore, this research study
concluded that based upon similar
research conducted in other mature organizations this portion of
the DOD’s depth, skills, and
overall resources in data analytics are insufficient. Having small
pockets of data science expertise
is not sufficient and the DOD should seek to raise the overall
general level of awareness and
skills to become more effective. Simply stated, new data science
skills are critically needed in
the DOD workforce (National Academies Press, 2017). The U.S.
Army also has several big data
initiatives underway with declarations that big data analysis
has arrived and is here to stay. The
Commander’s Risk Reduction Dashboard (CRRD) is an
initiative that integrates a variety of
personnel data from several data sources. The CRRD relies on
big data analysis to inform local
commanders and higher echelon commands of personnel who
might be at higher risk of suicide
(Schneider et al. 2015). Current and publicly
available literature from the U.S.
Navy, U.S. Air Force, and the U.S. Army shows that distinct big
data and data science projects are
ongoing. Many of the projects are championed by senior officers
who have expressed concern
regarding the abilities of DOD organizations to analyze big data
sets. Additionally, it is also clear
the DOD is interested in examining the big data and data
science practices of commercial
organizations and leveraging these advances across DOD
organizations to support national
defense strategies.
Big Data Challenges
According to Watson and Marjanovic (2013) the challenge with
harnessing the power of
big data includes identifying which sectors of data to exploit,
getting data into an appropriate
platform and integrating across several platforms, providing
governance, and getting the people
with the correct skill sets to make sense of the data. There is
evidence this fundamental problem
resides within the DOD as well. Analyzing big
data within the DOD requires
many data sources to be fed from hundreds of organizations,
which in turn requires defining data sharing
legal, policy, oversight, and compliance standards to make it
happen (Edwards, 2014). Making
effective use of big data within the DOD requires an investment
of time and money as well as
finding the correct talent to do the analysis. Locating the people
within DOD as well as bringing
in analysts from outside the DOD to successfully conduct big
data analysis is a major challenge
(Edwards, 2014). Schneider, Lyle, and Murphy (2015)
categorized the primary challenges
associated with big data specifically for the DOD and listed the
ability to analyze and interpret
the data as a primary concern. Furthermore, these researchers
recommended incentivizing
analysts to remain loyal to the DOD may be one of the biggest
challenges the DOD will face
with big data analysis.
White House Big Data Strategy
Another example that the U.S. Government is acting on big data and
data science is the White House’s big data strategy. In March 2012,
the Obama administration published the Big Data Research and
Development Initiative with specific implications for six federal
departments or agencies including the DOD. The intent of the
initiative is to build an innovation ecosystem to enhance the ability
to analyze, extract information from, and make decisions based on
large and diverse data sets. The intent is for federal agencies to
better support the entire nation based upon data (White House, 2012).
One of the specific initiatives was to expand the workforce needed
across federal agencies to develop and use big data technologies. The
DOD portion of the big data initiative focuses on three areas: data
to decisions, autonomy, and human systems. The data to decisions
aspect of this initiative is to develop computation techniques and
software tools for analyzing large amounts of data (White House,
2012). Stemming from the White House big data initiative, the Federal
Big Data Research and Development Strategic Plan (2016) was
promulgated. The Big Data Steering Group reports to the Subcommittee
on Networking and Information Technology Research and Development
(NITRD) and published its report under the direction of the Executive
Office of the President, National Science and Technology Council.
There are seven detailed strategies promulgated in this plan, with
strategy number six directly related to the business problem and
research questions that chartered this research with the BZC.
Strategy 1: “Create next generation capabilities by leveraging
emerging Big Data foundations,
techniques, and technologies” (White House, 2016, p. 6).
Strategy 2: Support R & D to explore and understand…
Strategy 3: Build and enhance research cyber infrastructure…
Strategy 4: Increase the value of data through policies that
promote sharing…
Strategy 5: Understand big data collection, sharing, regarding
…
Strategy 6: “Improve the national landscape for big data
education and training to fulfill
increasing demand for both deep analytical talent and analytical
capacity for the broader
workforce” (White House, 2016, p. 29).
-empowered domain experts
-capable workforce
Strategy 7: “Create and enhance connections in the national big
data innovation ecosystem”
(White House, 2016, p. 34).
The NITRD’s supplement to the fiscal year 2018 President’s
budget indicates the Federal Big
Data Research and Development Strategic Plan (2016) is still an
active plan under President
Trump (White House, 2018).
Data Sciences
Similar to searching for the term big data in a search engine, a
review of both scholarly and gray literature regarding data sciences
and data scientists returns a plethora of literature. There is
evidence suggesting the term data science has been around for
decades. However, many scholars credit William S. Cleveland (2001)
with introducing the term data science in the context of enlarging
the major areas of technical work in the field of statistics. This
seminal work described the requirement of an “action plan to enlarge
the technical areas of statistics focuses of the data analyst”
(Cleveland, 2001, p. 1). Cleveland argued that, due to the increasing
collections of data, the analysis occupation would be altered to the
point that a new field would emerge, to be called “data science”
(Cleveland, 2001, p. 1). The plan’s six technical areas that
encompass the field of data science are multidisciplinary
investigations, models and methods for data, computing with data,
pedagogy, tool evaluation, and theory (Figure 2). The primary
catalyst for Cleveland’s declaration of the six technical areas was
to act as a guideline for the percentage of the overall effort a
university or governing organization should apply to each technical
area to begin to define curriculum for the development of future data
scientists (Cleveland, 2001). The focal point of this research is to
understand and document the current environment surrounding the
required skills for big data analysis. Additionally, this research
explores the call for data science as described by Cleveland and
furthers the body of knowledge regarding the progression of the data
science occupation, with specific emphasis on the DOD.
Figure 2. Cleveland’s Data Science Taxonomy. Adapted from “Data
Science: An action plan for expanding the technical areas of the
field of statistics,” by W. Cleveland, 2001, International
Statistical Review, 69(1), 21-26.
Scholarly Views of the Data Scientist Role
Zhu and Xiong (2015) explained there is a new discipline emerging
called data science and there are distinct differences between the
established sciences, data technologies, and big data. The formation
and the further development of data science extends much further than
computer science. Although data scientists use similar methods and
techniques, there are profound differences, and data science requires
fundamental theories and new techniques (Zhu & Xiong, 2015). In an
attempt to further define data science and the data scientist,
Harris, Murphy, and Vaisman (2013) provided the results of the survey
they conducted in mid-2012 of working analysts across multiple
industries. These researchers surveyed analysts to understand their
experiences and perceptions of their skills. This research provided a
quantitative methodology that researchers and DOD organizations could
leverage to understand how to evolve their existing analysts into
data scientists. Harris, Murphy, and Vaisman (2013) furthered the
notion of the T-shaped data analyst. These are analysts who have
broad expertise (top of the T) coupled with in-depth knowledge of a
particular skill or business domain (stem of the T). The vertical
stem of the T represents deep and foundational business domain
understanding, and the horizontal bar represents a wide range of
skills necessary across the organization (Harris et al., 2013).
Additionally, scholars such as Vincent Granville, Ph.D., have now
published detailed descriptions of data scientists with specific
skill requirements. In his foundational book Developing Analytic
Talent: Becoming a Data Scientist (2014), Granville explained that
data science is a new role emerging across industries and government
organizations. The data scientist role is different from the
traditional roles of statisticians, business analysts, and data
engineers. Data science is a combination of business engineering and
business domain expertise, data mining, statistics, and computer
science, along with advanced predictive capabilities such as machine
learning. Data science brings a number of processes, techniques, and
methodologies together with a business vision to drive actionable
insights (Granville, 2014).
Business Intelligence and Business Analytics
Although scholars such as Zhu and Xiong (2015) and Harris, Murphy,
and Vaisman (2013) proposed that data science is an emerging
occupation with distinct skill requirements beyond those of
traditional data analysts, other scholarly researchers suggest data
science is the next logical progression of business intelligence (BI)
and business analytics (BA), generating ongoing debate. Provost and
Fawcett (2013) suggested companies have realized the benefits of
hiring data scientists, academic institutions are creating data
science curriculums, and contemporary literature is documenting
advocacy for a new data science occupation. However, there is
disagreement about what constitutes data science, and without further
definition the concept may diffuse into a meaningless term. These
researchers argue data science has been difficult to define because
it is intermingled with other data-driven decision-making concepts
such as business analytics, business intelligence, and big data. The
relationships between these concepts and data science require further
exploration, and the underlying principles of data science need to
emerge to fully understand the potential of data science (Provost &
Fawcett, 2013).
The research conducted by Chen, Chiang, and Storey (2012) described a
clear evolution of business intelligence and business analytics
starting in the 1990s and determined big data analytics is a similar
field offering new opportunities. They described big data and big
data analytics as terms used to describe the “data sets and
analytical techniques that have become large and complex and
typically require unique and advanced storage” (p. 1165).
Additionally, big data sets may require specialized management,
analysis, and visualization technologies and techniques. The big data
era has quietly moved into many public, private, and corporate
organizations, and these researchers described significant
improvements in market intelligence, government, politics, science
and technology, healthcare, security, and public safety through big
data analysis. These researchers expressed that the analysis of big
data is a related but separate field to business intelligence and
business analytics (Chen et al., 2012).
Data Sciences Skills
The literature suggests that before modern-day organizations,
including the DOD, can benefit from the rapid data growth and access
to real-time information, data scientists are going to be required
and will need to be embedded into the decision processes (Galbraith,
2014). Research published in the Harvard Business Review (Shah,
Horne, & Capellá, 2012) suggested that even though companies are
investing heavily in deriving insights from data streaming from their
customers and suppliers, there are still significant gaps in the
skills and abilities of individuals and organizations to conduct the
analysis. In 2012, these researchers surveyed 5,000 employees from 22
global companies and determined fewer than 40% of employees have
sufficiently matured skills to succeed in a big data environment
(Shah, Horne, & Capellá, 2012). Fundamentally, most organizations are
able to analyze only a small subset of their collected data,
constrained by the analytics and algorithms of desktop software
solutions with modest capability (Shah et al., 2012).
Determining whether a data scientist is different from traditional
quantitative analysts requires an investigation of the current
abilities of data scientists in relation to their requirements to
generate information and the ability of the data scientists to use
the modern tool sets (Harris & Mehrotra, 2014). Many questions still
exist, such as: What is the level of education needed? Do data
scientists need to have a terminal degree or is data science an
applied role? Do all data scientists need to be experts in machine
learning and unstructured data analysis? Additionally, there is
evidence suggesting a rise in mistaken assumptions regarding the
meaningfulness of correlations in the era of big data. For example,
big data sets often produce statistically significant findings even
though the results are false and potentially based on inappropriate
analytical methods, suggesting a required modification of analytical
skills (Shah et al., 2012). The arrival of big data suggests the
typical statistical approach of relying on p values to establish
significance and correlation is unlikely to be sufficient in a world
of immense data in which almost everything is significant. Simply,
when utilizing traditional and typical statistical tools to analyze
big data it is common to arrive at false correlations (George et al.,
2014).
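The concern about p values in very large samples can be illustrated
with a minimal sketch in Python. The sketch is not drawn from the
cited studies. It assumes a hypothetical, practically negligible
correlation of r = 0.01, and it uses a normal approximation to the t
distribution, which is reasonable only for large samples. The
function names are illustrative.

```python
import math


def approx_p_value(r: float, n: int) -> float:
    """Approximate two-sided p value for a Pearson correlation r on n pairs.

    Computes t = r * sqrt((n - 2) / (1 - r**2)) and then uses a normal
    approximation to the t distribution (an assumption that is
    reasonable only when n is large).
    """
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(abs(t) / math.sqrt(2.0))


def variance_explained(r: float) -> float:
    """Share of variance in one variable accounted for by the other."""
    return r * r


if __name__ == "__main__":
    r = 0.01  # hypothetical correlation with no practical importance
    for n in (100, 10_000, 1_000_000):
        p = approx_p_value(r, n)
        share = variance_explained(r)
        print(f"n={n:>9,}  p={p:.3g}  variance explained={share:.4%}")
```

Holding the correlation fixed, the p value falls as the sample grows
while the share of variance explained does not change, which is why
an effect size may need to be reported alongside any significance
test on big data sets.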
Harris and Mehrotra (2014) expressed that in their research the
organizations that create the most value from data science are the
ones that allow their data scientists to discover insights from
“open-ended questions that matter the most to the business” (p. 16).
These researchers also suggested there are distinguishable
differences between data scientists and traditional quantitative
analysts, with many implications for how to define the roles of data
scientists, how to attract and train these experts, and how to get
the most value from this emerging discipline. In 2014, these
researchers surveyed more than 300 analytical professionals from many
different companies and from several industries to learn how these
analysts perceived their work and role in the organization. They
concluded about one-third of the analysts describe themselves as data
scientists, with the remainder identifying themselves as analysts
with distinguishable characteristics. For example, more data
scientists than analysts consider their work critical to favorable
business outcomes. Additionally, 94% of the data scientists surveyed
indicated analytical abilities are a key element of their companies’
strategies and business models, as compared to 65% of the traditional
analysts who believe their work is tied directly to business models
and strategies (Harris & Mehrotra, 2014). According to Harris and
Mehrotra (2014), data scientists’ skills differ from those of
traditional analysts, and the most typical distinctions are provided
in Table 2.
Table 2
Harris and Mehrotra’s Analysts and Data Scientists Comparisons

Types of Data
  Traditional Analysts: Structured or semi-structured, relational and
    typically numeric data
  Data Scientists: All types, including unstructured, numeric, and
    non-numeric data (such as images, sound, text)
Preferred Tools
  Traditional Analysts: Statistical and modeling tools, usually
    contained in a data repository
  Data Scientists: Mathematical languages (such as R and Python®),
    machine learning, natural language processing and open-source
    tools
Nature of work
  Traditional Analysts: Report, predict, prescribe and optimize
  Data Scientists: Explore, discover, investigate and visualize
Typical educational background
  Traditional Analysts: Operations research, statistics, applied
    mathematics, predictive analytics
  Data Scientists: Computer science, data science, symbolic systems,
    cognitive science
Mind-set
  Traditional Analysts: Percentage who say they: formal projects 54%
  Data Scientists: Percentage who say they: projects 89%

Note. Adapted from “Getting value from your data scientists,” by J.
Harris and V. Mehrotra, 2014, MIT Sloan Management Review, 56(1),
15-18. Copyright 2014 by Massachusetts Institute of Technology.
Adapted with permission.
The research concluded data scientists are highly skilled specialists
who tackle the most significant and complex business challenges
(Harris & Mehrotra, 2014). Common themes regarding the skills
required of data scientists include advanced and, in many cases,
open-source statistical software such as R and Python. These
applications lend themselves to another common characteristic of the
perceived data scientist: they will serve the organization best if
they can explore open-ended questions (Davenport & Dyché, 2013).
Harris, Murphy, and Vaisman (2013) conducted quantitative research in
2012 that surveyed analysts across several industries to further the
knowledge of data science skills and the role of data scientists. The
researchers developed a list of 22 generic data science skills and
then asked the respondents of their survey to categorize the skills
and to self-identify their perceived roles against the list of data
science skills. The list of perceived data science skills as
described by these researchers was adapted to analyze the perceived
skills and roles of the analysts at the Bravo Zulu Center, as seen in
Table 3.
Table 3
Harris, Murphy and Vaisman Data Science Skills

Perceived Category: Data Science Skills
Business: Product development; Business
Machine Learning/Big Data: Unstructured data; Structured data;
  Machine learning; Big and distributed data
Math & Operations research: Optimization; Math; Graphical models;
  Bayesian/Monte Carlo statistics; Algorithms; Simulation
Programming: System administration; Back end programming; Front end
  programming
Statistics: Visualization; Temporal statistics; Surveys and
  marketing; Spatial statistics; Science; Data manipulation;
  Classical statistics

Note. Adapted from “Analyzing the Analyzers: An introspective survey
of data scientists and their work,” by H. Harris, D. Murphy, and M.
Vaisman, 2013, Sebastopol, CA: O’Reilly Media. Copyright 2013 by the
authors. Adapted with permission.
Defining the occupation of the data scientist is an evolutionary
process currently underway. Viaene (2013) explains that data science
is not yet a defined academic discipline or established profession.
There appears to be a group of occupations such as scientists,
analysts, technologists, engineers, and statisticians working
together to carve out the role for the data scientist. This
researcher also agrees with other data science research underway that
big data analysis requires a multi-skilled team of which the data
scientist is a member. Big data sets combined with advanced
analytical capability are creating a breed of analysts that are going
to be able to uncover hidden patterns and unknown correlations
(Santaferraro, 2013).
Data Science and Business Domain Connection
A common theme in data science research suggests that for data
scientists to generate business value they will need to work closely
with domain experts in the organization (Viaene, 2013). To create the
business value and prevent runaway data projects, this researcher
proposes a benefits realization process through a circular series of
steps. This process can create collaboration between the business
domain experts and the data scientists and should be a foundational
requirement before starting a data science project. Viaene’s benefits
realization process steps are briefly described below:
- modeling represents using data to create improvements in the
business.
- discovery takes place in the model domain.
- operational insights are transferred from the model domain to the
business domain, or operationalized.
- promotes the best practices for the use of data and data science to
maximize the investment.
Three Types of Analysts
Viaene (2013) describes the roles of traditional analysts as falling
into three categories: data analysts, business intelligence analysts,
and business analysts. First, data analysts are professionals who
understand where data comes from and how to make data available for
business decisions. These analysts typically focus on the extraction,
cleansing, and transformation of raw data into actionable
information, and most data analysts have computer science training
and solid backgrounds in math and statistics. Second, business
intelligence (BI) analysts are effective once the data have been
moved into data marts and data warehouses, where they perform the
next level of data preparation. Third, business analysts are the
group within the organization that can transform the information
collected into actionable insights on where to influence the
business. The abilities of moving, handling, and analyzing data make
these traditional analysts ideal data scientist candidates.
Evolving these traditional analysts into data scientists will require
proficiencies in parallel computing, the petabyte-sized unstructured
analysis capability of NoSQL databases, machine learning, and
advanced statistics (Santaferraro, 2013). To gain these data
scientists, Santaferraro suggested the creation of internal programs
that provide the opportunity for existing data analysts, BI analysts,
and business analysts to acquire the skills they need to become big
data scientists, and recommended building such a program around five
primary tasks. Santaferraro (2013) breaks the skills required of the
emerging data scientists into a few distinct descriptions and
provides a five-point plan for filling the demand for data
scientists. Santaferraro’s five-point plan is summarized below:
Task 1 – Canvass existing analysts and identify those with the
background, talent, and desire to increase their skills, and create
education opportunities for these individuals.
Task 2 – Provide incentives for participants and reward them for
reaching milestones. Incentivizing data scientists’ loyalty will be
important due to the shortage of data scientists.
Task 3 – Organize the analysis structure to support big data success.
Avoid tying data scientists only to business units or only creating
an enterprise pool of data scientists. A hybrid of these two
approaches is warranted.
Task 4 – Deploy the infrastructure to support big data analytics.
Create an infrastructure to support unconstrained analytics. These
systems should contain embedded analytics, agile extensions, rapid
iterations, real-time access, and extreme flexibility.
Task 5 – Foster a culture of analytics that supports data-driven
decisions. Big data analysis can eliminate emotions, gut feelings,
and egos from decision-making.
Training and Certification of Data Scientists
Henry and Venkatraman (2015) claim that average American universities
and their degree programs are unprepared to provide the analytical
skills required by corporations in the modern big data environment.
Conversely, the literature suggests there are many colleges,
universities, trade schools, research organizations, software
providers, and government organizations that are modifying their
curriculums to include advanced analytics and data science (Miller,
2014). The literature regarding data science suggests there are no
widely agreed upon standards and certification requirements for data
science and data scientists. Essentially, anyone can label themselves
a data scientist. Considerations such as the educational level and
the core skill requirements are still widely debated, making it
difficult to define data science skills and curriculums. However,
there are many educational institutions now providing their
interpretation (Cotter, 2014).
In the dissertation Analytics by Degree: The Dilemmas of Big Data
Analytics in Lasting University/Corporate Partnerships, Cotter (2014)
conducted an in-depth investigation of how corporations and
universities should partner to ensure the readiness of graduates to
fill key analysis roles in the era of big data. Cotter conducted a
phenomenological study and interviewed four business analytical
groups: business leaders, faculty, recent graduates, and supervisors
of recent graduates, to determine the readiness of the recent
graduates and the perceived overall effectiveness of the university
education. This research concluded that most business analytics
graduates are initially lacking in real-world preparation.
Additionally, Cotter concluded the ever-changing business world is
creating a need for analytical capability that may have been
previously satisfied with the T-shaped analysts (Cotter, 2014).
Cotter’s research amplifies the research questions posed in this
dissertation regarding how prepared the analysts within the DOD are
to glean actionable information from big data sets. Fundamentally,
determining how the curriculums offered today at universities and DOD
learning institutions may need to change to provide data scientists
to the workforce is of high interest to DOD leaders (Edwards, 2014).
Defining data scientists’ skills, training, and certification
requirements is problematic because of the broad implications and
overlapping language with business intelligence, data analysis, and
business analytics. Cotter (2014) also conducted a comprehensive
review of the current degrees and certifications offered at the
undergraduate and graduate levels in the United States and abroad and
concluded there are several learning institutions with many
undergraduate degrees and certifications available. Determining
whether data scientists are different from traditional quantitative
analysts requires an investigation of the current abilities of data
scientists in relation to their requirements to generate information
and the ability of the data scientists to use the modern tool sets.
There is evidence suggesting not only a skills gap but also that
analysis tools are outpacing the ability of the analysts, suggesting
a gap in human talent to harness big data (Halper, 2016). Watson and
Marjanovic (2013) suggested already embedded business analysts can
upgrade their skills through university courses and that these
courses should include Java, R, SAS Enterprise Miner, IBM SPSS
Modeler, Hadoop, and MapReduce.
Commercial Certification
Another available option for the DOD to examine its data science
abilities is through the use of certification from agencies outside
of the DOD and academia. Modest research into the options available
for certification of data scientists today suggests there are several
companies and trade organizations providing training and
certification. The Institute for Operations Research and Management
Science (INFORMS) is an international organization comprised of over
12,500 members supporting the fields of operations research and
analytics. INFORMS describes in its charter a desire to promote
practices that create advances in operations research and analytics
for the betterment of decision-making and the optimization of
business processes (INFORMS, 2017). This organization claims to be a
leading organization in the formalization of a certification process
for analytics focused on moving organizations from descriptive to
predictive and prescriptive analytics (Sharda, Asamoah, & Ponna,
2013). INFORMS sets an eligibility requirement for experience and
skills and then, through a set of high standards and rigorous
examinations, certifies analytical professionals with CAP
certification (INFORMS, 2017).
Halper (2016) provided the results of a snapshot survey from an
audience at The Data Warehouse Institute Chicago 2016. This research
aimed at furthering the understanding of the confidence of software
providers to automate analysis of big data sets and address the
skills gap. There is a push by software and hardware technology
providers to ease the skills required of data scientists by advancing
analytical software to continually move through large data sets while
also providing high-level and effective statistical analysis and
training. Halper’s modest research supports the notion that
organizations are still trying to determine what skills are required
for their analysts and where the analysts are going to come from, and
are uncertain as to the overall effectiveness of software solutions
(Halper, 2016).
Vendor Training and Certification
There are several major corporations such as Microsoft, IBM,
Teradata, and SAS that are quickly developing professional analytical
and data science programs. Microsoft is recognizing the growing need
for professional expertise in data science through its professional
development program focusing on data science theory, hands-on
training, and an on-line course curriculum coupled with a final
project prior to certification (Davis, 2016). The SAS Institute is
another organization offering a data science certification. This
company was founded in 1976 and has been consistently growing ever
since. SAS suggests that companies successfully harnessing
information from big data are augmenting their existing analytical
staffs with data scientists. Data scientists possess higher levels of
IT capability and specialized training and skills with emphasis on
big data technologies (SAS, 2017). SAS has developed an Academy for
Data Sciences that offers a blend of classroom and on-line courses
and also uses a case study approach to provide hands-on experience.
Additionally, the SAS training curriculum offers training in several
of the sought-after big data and data science applications such as R,
Python, Pig, Hive, and Hadoop (SAS, 2017). This research study
explored the commercial availability of data science training and
explored how analysts are trained at the BZC to help determine if
further exploration of commercial data science training is
appropriate for DOD organizations.
Shortfall Preparation
The literature suggests there is a significant shortfall of
analytical professionals within the commercial sector and the DOD,
and this shortfall is expected to grow (Géczy, 2015). As this
literature review demonstrates, researchers are calling for action.
Miller (2014) suggested that big data and data science are such a
significant problem that a national consortium is warranted.
Academia, industry, and the U.S. Government should work together to
continue the growth of a big data and data science national
consortium to address the big data analytical skills gap (Miller,
2014). This consortium would do the following:
scientist
and analytics
shared communities of
interest
establish strong internship
programs and increase the collaboration between academia and
business
levels of education