If that which I have said above helps you to see empirical work in a n
since the researcher(s) responsible for the operation will
undoubtedly assign, for instance, different filenames for
different outputs.
By contrast, the namespace for labeling vertices, that is, research operations, is potentially global and cuts across various governance boundaries. This makes vertex labeling more difficult. It is relatively easy to come up with a scheme to name research operations in an individual research project, even if the project is large and transcends organizational boundaries involving many different researchers. However, interesting opportunities arise if one could develop an infrastructure that provides globally unique and searchable identifiers for research operations (and, as we will discuss below, verifies their integrity). This would allow, in principle, any research operation to reference outputs from any other operation in the global network of empirical research. Note that I am not talking about the papers published from the study but the empirical operations that were performed to come up with the results. This may sound like a far-fetched vision, yet it is exactly what happens in software development, where newly written code is typically a fraction of the total codebase of a new product.
Figure 7. Research operation v4 uses two outputs from operation v2 and one from operation v1 as its inputs, and is independent of operation v3 (RPG2; vertices v1 to v4, edges e1 to e3)
27 October 2021
INFORMS Annual Meeting
Graphing the Empirical Research Process:
Toward Modular Empirical Research
Aleksi Aaltonen
aleksi@temple.edu
Motivation
The nature of empirical research varies considerably between academic fields.
Methodological plurality and varying practices between academic communities
make it difficult to understand the process by which empirical studies produce their
results beyond one’s own niche.
This is a problem because it:
1. Makes research less transparent and reproducible
2. Hinders the re-usability of intermediate outputs in the research process
Research vs. Software Development Practices
In software development, we glue together existing, well-
tested and validated components while trying to write as little
new code as possible.
In empirical research, we tend to start from scratch, from
‘raw’ data, and do everything ourselves.
The Aim of the Project
Design a rigorous approach for modeling empirical research processes without
sacrificing the diversity of research. To this end, I make three assumptions about
research:
1. Empirical research means producing a posteriori knowledge by justifying knowledge claims with
appropriately analyzed observations.
2. The observations are recorded on a relatively persistent medium as data.
3. An empirical study incorporates a process that starts from acquiring, simulating, or otherwise generating
data about a phenomenon of interest and then proceeds by performing analytical operations on the data.
The process can go through several iterations and dead ends until the data have
been transformed into a form in which they support a scholarly knowledge claim.
Definitions
Research process is a series of modular operations that transform data step-by-step into a form that
supports an a posteriori knowledge claim.
Modularity entails dividing a complex system into relatively independent components so that the
relationships between the components are easily governable.
Research operation is the basic unit (module) of a research process. Internally, a research operation
is a bundle of closely related data manipulations. Externally, the manipulations that make up the
operation are separated from all other operations so that the individual operation can be
understood in isolation.
Graph is a representation of a structure formed by vertices that may or may not be connected by
edges.
Minimal Graph Based Representation
That is, the status quo in the absence of a common language
We can think of G1 as capturing the entire research
process as one massive operation, that is, the
process is collapsed into a tightly coupled bundle
of inputs, data manipulations, and outputs.
An effective description of the process must then
fall back on whatever idiosyncratic and field-
specific practices are available to the researcher.
Obviously, a graph-based representation is here
mostly superfluous…
Such practices may be shared within a specific academic community, but they are seldom pinned down as formally specified rules. It is possible that some research projects are naturally like G1, that is, inherently difficult to break down into modular operations. A graph-based representation would then seem superfluous, although it might still help in offering the research externally as inputs to other research processes.
More interestingly, G1 can be seen as the status quo in how research processes are described without a shared language: publications may include elaborate descriptions of the empirical work that was performed to produce the results, but these tend to lack a structure that would be immediately recognizable to fellow academics.
Figure 1. A minimal graph-based representation of an empirical research process (G1 with a single vertex v1)
Vertices and Edges
We need a meaningful way to define the
vertices and edges of the research
process graph.
To account for the temporal order of
operations, we make the graph directed.
Vertices as outputs/inputs and edges as
research operations.
It would be tempting to think of edges as research operations and vertices as their outputs, yet this results in a number of problems that are worth exploring in some detail. To begin with, an edge must have a vertex at both its ends, and hence a minimal graph-based model, let us call this $G_2 = (V_2, E_2)$, would be $V_2 = \{v_1, v_2\}$, $E_2 = \{e_1\}$. To account for the temporal order of research operations, we also want to make G2 a directed graph in which the set of edges $E_2$ is made of an ordered pair $e_1 = (v_1, v_2)$. Figure 2 illustrates G2.
Figure 2. Vertices as outputs/inputs and edges as research operations (G2; edge e1 from v1 to v2)
G2 is immediately less elegant than G1 as a starting point. It is difficult to map G2 to a real-world counterpart in a similar manner to G1. Also, the former would seem to suggest that research starts with an already existing output, which does not make sense. Some material or events must of course exist for empirical research to start with, but the observations of any such entities become research data only through the actions of a researcher.
Vertices and Edges
It is tempting to think of edges as research
operations and vertices as their inputs/
outputs, but this results in several problems:
1. An edge must have vertices at both ends, which makes
the minimal graph G2 less elegant than G1 – it’s difficult
to map the former to a real-world counterpart in a
similar manner to the latter.
2. G2 suggests that research starts with an already existing
output, which does not make sense. Any observations
become data only through the actions of a researcher.
Vertices as outputs/inputs and edges as
research operations.
Finally, a research operation can use more than one output as its input, which is difficult to model if we define edges as research operations, as illustrated by G3 in Figure 3.
Figure 3. Research operation e1 with multiple inputs results in an ill-defined graph (G3; vertices v1, v2, v3; edge e1 appears twice)
Given the problems with G2 and G3, we define vertices as research operations including their outputs. It follows that the references, that is, edges between research operations, actually point backwards in time. This makes it intuitive to trace research results back to the operations that contributed to them.
Vertices and Edges
3. A research operation can use more than one
output as its input, which is difficult to model if we
define research operations as edges.
Research operation with multiple inputs
results in an ill-defined graph.
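The third problem can be made concrete with a small sketch. It assumes a hypothetical encoding of G3 in which each edge is a labelled ordered pair of outputs; the tuple layout and the function name are illustrative and not taken from the talk. Because one edge can connect only one pair of vertices, an operation with two inputs has to reuse its label for two different pairs, which is what makes the graph ill-defined.

```python
# Hypothetical encoding of G3: vertices are outputs, edges are operations.
# Each entry is (operation label, input vertex, output vertex).
edges_as_operations = [
    ("e1", "v1", "v3"),  # operation e1 reads output v1 and produces v3
    ("e1", "v2", "v3"),  # the same operation e1 also reads output v2
]


def ill_defined_labels(edges):
    """Return edge labels that stand for more than one ordered pair."""
    pairs_by_label = {}
    for label, tail, head in edges:
        pairs_by_label.setdefault(label, set()).add((tail, head))
    return sorted(
        label for label, pairs in pairs_by_label.items() if len(pairs) > 1
    )


print(ill_defined_labels(edges_as_operations))  # ['e1']
```

A single-input operation, such as the edge e1 = (v1, v2) in G2, passes this check, so the difficulty only appears once an operation consumes several outputs.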
Research Process Graph, RPG
Research operation references the outputs
of two earlier operations as its inputs.
Edges are references to outputs from other research operations. The edges are thus directed and defined as ordered pairs of operations, $e = (v_{tail}, v_{head})$, where $v_{tail}, v_{head} \in V$ and $E = \{e_1, e_2, e_3, \ldots, e_m\}$. The edges can only point to strictly preceding operations, since an operation can only use existing outputs as its inputs. Figure 4 illustrates how RPG solves the problem of capturing multiple inputs in G3. Also, the minimal RPG, $RPG = ((v_1), \emptyset)$, is isomorphic with our elegant starting point G1.
Figure 4. Research operation v3 references the outputs of two earlier operations v1 and v2 as its inputs (RPG1; edges e1, e2)
The edges of RPG record the order of operations for each path in the graph. However, problems can arise if we need to know the order of two operations that do not appear on the same path.
Let us define vertices as research operations
including their outputs, and edges as references to
outputs of earlier research operations.
It follows that references (edges) point backward in time.
This makes it intuitive to trace back a research operation
(results) to all those operations that contributed to it and
makes actual implementation of the graph more
straightforward.
Note that the minimal RPG is again isomorphic with our
elegant starting point G1
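As a minimal sketch of why backward-pointing references make tracing straightforward, the RPG of Figure 4 can be stored as a mapping from each operation to the earlier operations it references. The dictionary layout and the function name are assumptions made for illustration, not part of the talk.

```python
# Vertices are research operations; each edge is a reference from an
# operation back to an earlier operation whose output it uses.
rpg_edges = {
    "v1": [],            # uses no outputs of other operations
    "v2": [],
    "v3": ["v1", "v2"],  # Figure 4: v3 references outputs of v1 and v2
}


def contributing_operations(edges, operation):
    """Follow references backward and collect every operation reached."""
    seen = set()
    stack = list(edges[operation])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(edges[current])
    return sorted(seen)


print(contributing_operations(rpg_edges, "v3"))  # ['v1', 'v2']
```

The minimal RPG corresponds to `{"v1": []}` in this layout: one operation and no references, just like G1.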
Graphing the Research Process
Figure 5 summarizes the definition of RPG.
1: $RPG = (V, E)$
2: $V = (v_1, v_2, v_3, \ldots, v_n)$, where $n > 0$
3: Research operations are added to $V$ in a non-decreasing order according to their completion time.
5: $E = \{e_1, e_2, e_3, \ldots, e_m\}$, where $m \geq 0$
6: $e = (v_{tail}, v_{head})$, where $head < tail$
7: $RPG$ is a directed acyclic graph.
Figure 5. The definition of research process graph
RPG does not have to be a simple or connected graph. There may be parallel edges, that is, more than one reference between two operations in the case the latter uses two different outputs from the former. To distinguish between the parallel edges and, more generally, to help identify the output that is being referenced, we add a set of edge labels. We also add a set of vertex labels, which allows us to construct unique identifiers for research operations and their outputs by combining vertex and edge labels together.
However, to allow mapping the graph to actual
research processes, we need identifiers for
research operations and references (edges).
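The following sketch shows one way the rules stated for RPG (an ordered sequence of operations, references that point only to strictly earlier operations, and labels on vertices and edges) could be checked and used to build identifiers. The `Edge` class, the function names, the filenames, and the `vertex/edge` identifier format are all hypothetical choices for illustration. Indices are 0-based here, whereas the definition counts from 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    tail: int   # position in V of the operation that uses the output
    head: int   # position in V of the earlier operation that produced it
    label: str  # names the referenced output, e.g. a filename


def validate_rpg(vertex_labels, edges):
    """Raise ValueError if the structural rules of an RPG are broken."""
    n = len(vertex_labels)
    if n == 0:
        raise ValueError("V must contain at least one operation (n > 0)")
    if len(set(vertex_labels)) != n:
        raise ValueError("vertex labels must be unique")
    for edge in edges:
        # head < tail: a reference may only point to a strictly earlier
        # operation, which also rules out cycles.
        if not (0 <= edge.head < edge.tail < n):
            raise ValueError(f"{edge} does not point to an earlier operation")


def output_identifier(vertex_labels, edge):
    """Combine the producing operation's label with the edge label."""
    return f"{vertex_labels[edge.head]}/{edge.label}"


# Figure 7: v4 uses two outputs from v2 and one from v1; v3 is independent.
vertex_labels = ["v1", "v2", "v3", "v4"]
edges = [Edge(3, 1, "a.csv"), Edge(3, 1, "b.csv"), Edge(3, 0, "c.csv")]
validate_rpg(vertex_labels, edges)
print([output_identifier(vertex_labels, e) for e in edges])
# ['v2/a.csv', 'v2/b.csv', 'v1/c.csv']
```

The two parallel edges from v4 to v2 stay distinguishable only because their labels differ, which is the role the talk assigns to edge labels.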
8: $L^V = \{l^V_1, l^V_2, l^V_3, \ldots, l^V_n\}$, where $n$ is the number of vertices in $RPG$
9: $L^E = \{l^E_1, l^E_2, l^E_3, \ldots, l^E_m\}$, where $m$ is the number of edges in $RPG$
Figure 6. Adding vertex and edge labels to RPG
Isolated vertices and separate graph components can capture false starts and separate lines of inquiry that did not contribute to each other or did not lead to useful findings, but are still worthwhile to report as they may provide valuable resources to other studies. This leads to two important questions concerning the boundaries of RPG. First, one must choose whether to include in RPG false starts, all the iterations, and separate lines of inquiry.
Toward a Global Research Graph
The graph-based notation of the research process graph leaves many practical issues
open.
1. The identification of research operations and their outputs in practice
2. The degree of process modularization
3. Research operation metadata
4. The stability and verifiability of research operations
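The talk leaves open how the stability and verifiability of research operations would be achieved. One possible approach, offered here purely as an assumption and not as the author's method, is to derive an operation's identifier from a cryptographic hash of its output together with the identifiers of the operations it references. The function name and the sample byte strings below are hypothetical.

```python
import hashlib


def operation_id(output_bytes, referenced_ids):
    """Hash an operation's output together with the ids it references."""
    digest = hashlib.sha256()
    digest.update(output_bytes)
    for ref in sorted(referenced_ids):
        digest.update(ref.encode("ascii"))
    return digest.hexdigest()


v1 = operation_id(b"raw survey responses", [])
v2 = operation_id(b"cleaned table", [v1])
tampered_v1 = operation_id(b"raw survey responses (edited)", [])

print(v2 == operation_id(b"cleaned table", [v1]))           # True
print(v2 == operation_id(b"cleaned table", [tampered_v1]))  # False
```

Under this assumption, any change to an upstream output changes the identifiers of every operation that depends on it, so a stale or altered reference becomes detectable.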
The Vision
Research process graphs could evolve into boundary objects in academic
communication that allow researchers to make intermediary outputs from
empirical operations broadly available to each other.
To explore the feasibility of a global research graph, future studies should:
1. Develop algorithms to create visually appealing ways to plot RPGs
2. Assess the idea of RPG with respect to existing infrastructures
3. Simulate the benefits of modular empirical research

Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....muralinath2
 
Efficient spin-up of Earth System Models usingsequence acceleration
Efficient spin-up of Earth System Models usingsequence accelerationEfficient spin-up of Earth System Models usingsequence acceleration
Efficient spin-up of Earth System Models usingsequence accelerationSérgio Sacani
 
CONTRIBUTION OF PANCHANAN MAHESHWARI.pptx
CONTRIBUTION OF PANCHANAN MAHESHWARI.pptxCONTRIBUTION OF PANCHANAN MAHESHWARI.pptx
CONTRIBUTION OF PANCHANAN MAHESHWARI.pptxRASHMI M G
 
Cot curve, melting temperature, unique and repetitive DNA
Cot curve, melting temperature, unique and repetitive DNACot curve, melting temperature, unique and repetitive DNA
Cot curve, melting temperature, unique and repetitive DNACherry
 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsSérgio Sacani
 
Information science research with large language models: between science and ...
Information science research with large language models: between science and ...Information science research with large language models: between science and ...
Information science research with large language models: between science and ...Fabiano Dalpiaz
 
Terpineol and it's characterization pptx
Terpineol and it's characterization pptxTerpineol and it's characterization pptx
Terpineol and it's characterization pptxMuhammadRazzaq31
 

Recently uploaded (20)

Concept of gene and Complementation test.pdf
Concept of gene and Complementation test.pdfConcept of gene and Complementation test.pdf
Concept of gene and Complementation test.pdf
 
COMPOSTING : types of compost, merits and demerits
COMPOSTING : types of compost, merits and demeritsCOMPOSTING : types of compost, merits and demerits
COMPOSTING : types of compost, merits and demerits
 
Method of Quantifying interactions and its types
Method of Quantifying interactions and its typesMethod of Quantifying interactions and its types
Method of Quantifying interactions and its types
 
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
 
X-rays from a Central “Exhaust Vent” of the Galactic Center Chimney
X-rays from a Central “Exhaust Vent” of the Galactic Center ChimneyX-rays from a Central “Exhaust Vent” of the Galactic Center Chimney
X-rays from a Central “Exhaust Vent” of the Galactic Center Chimney
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
 
GBSN - Microbiology (Unit 4) Concept of Asepsis
GBSN - Microbiology (Unit 4) Concept of AsepsisGBSN - Microbiology (Unit 4) Concept of Asepsis
GBSN - Microbiology (Unit 4) Concept of Asepsis
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
Adaptive Restore algorithm & importance Monte Carlo
Adaptive Restore algorithm & importance Monte CarloAdaptive Restore algorithm & importance Monte Carlo
Adaptive Restore algorithm & importance Monte Carlo
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
CYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxCYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptx
 
Energy is the beat of life irrespective of the domains. ATP- the energy curre...
Energy is the beat of life irrespective of the domains. ATP- the energy curre...Energy is the beat of life irrespective of the domains. ATP- the energy curre...
Energy is the beat of life irrespective of the domains. ATP- the energy curre...
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
Efficient spin-up of Earth System Models usingsequence acceleration
Efficient spin-up of Earth System Models usingsequence accelerationEfficient spin-up of Earth System Models usingsequence acceleration
Efficient spin-up of Earth System Models usingsequence acceleration
 
CONTRIBUTION OF PANCHANAN MAHESHWARI.pptx
CONTRIBUTION OF PANCHANAN MAHESHWARI.pptxCONTRIBUTION OF PANCHANAN MAHESHWARI.pptx
CONTRIBUTION OF PANCHANAN MAHESHWARI.pptx
 
Cot curve, melting temperature, unique and repetitive DNA
Cot curve, melting temperature, unique and repetitive DNACot curve, melting temperature, unique and repetitive DNA
Cot curve, melting temperature, unique and repetitive DNA
 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
 
Information science research with large language models: between science and ...
Information science research with large language models: between science and ...Information science research with large language models: between science and ...
Information science research with large language models: between science and ...
 
Terpineol and it's characterization pptx
Terpineol and it's characterization pptxTerpineol and it's characterization pptx
Terpineol and it's characterization pptx
 

Graphing the Empirical Research Process: Toward Modular Empirical Research

  • 1. Graphing the Empirical Research Process: Toward Modular Empirical Research. Aleksi Aaltonen, aleksi@temple.edu. INFORMS Annual Meeting, 27 October 2021. Paper excerpt: … since the researcher(s) responsible for the operation will undoubtedly assign, for instance, different filenames for different outputs. By contrast, the namespace for labeling vertices, that is, research operations, is potentially global and cuts across various governance boundaries. This makes vertex labeling more difficult. It is relatively easy to come up with a scheme to name research operations in an individual research project, even if the project is large and transcends organizational boundaries involving many different researchers. However, interesting opportunities arise if one could develop an infrastructure that provides globally unique and searchable identifiers for research operations (and, as we will discuss below, verify their integrity). This would allow, in principle, any research operation to reference outputs from any other operation in the global network of empirical research. Note that I am not talking about the papers … [Figure 7 (RPG2: vertices v1, v2, v3, v4; edges e1, e2, e3): Research operation v4 uses two outputs from operation v2 and one from operation v1 as its inputs, and is independent of operation v3.]
  • 2. Motivation The nature of empirical research varies considerably between academic fields. Methodological plurality and varying practices between academic communities make it difficult to understand the process by which empirical studies produce their results beyond one’s own niche. This is a problem because it: 1. Makes research less transparent and reproducible 2. Hinders the re-usability of intermediate outputs in the research process
  • 3. Research vs. Software Development Practices In software development, we glue together existing, well-tested and validated components while trying to write as little new code as possible. In empirical research, we tend to start from scratch, from ‘raw’ data, and do everything ourselves.
  • 4. The Aim of the Project Design a rigorous approach for modeling empirical research processes without sacrificing the diversity of research. To this end, I make three assumptions about research: 1. Empirical research means producing a posteriori knowledge by justifying knowledge claims with appropriately analyzed observations. 2. The observations are recorded on a relatively persistent medium as data. 3. An empirical study incorporates a process that starts from acquiring, simulating, or otherwise generating data about a phenomenon of interest and then proceeds by performing analytical operations on the data. The process can go through several iterations and dead ends until the data have been transformed into a form in which they support a scholarly knowledge claim.
  • 5. Definitions A research process is a series of modular operations that transform data step by step into a form that supports an a posteriori knowledge claim. Modularity entails dividing a complex system into relatively independent components so that the relationships between the components are easily governable. A research operation is the basic unit (module) of a research process. Internally, a research operation is a bundle of closely related data manipulations. Externally, the manipulations that make up the operation are separated from all other operations so that the individual operation can be understood in isolation. A graph is a representation of a structure formed by vertices that may or may not be connected by edges.
  • 6. Minimal Graph Based Representation That is, the status quo in the absence of a common language. We can think of G1 as capturing the entire research process as one massive operation; that is, the process is collapsed into a tightly coupled bundle of inputs, data manipulations, and outputs. An effective description of the process must then fall back on whatever idiosyncratic and field-specific practices are available to the researcher. Obviously, a graph-based representation is here mostly superfluous… [Figure 1 (G1: single vertex v1): A minimal graph-based representation of an empirical research process.] Paper excerpt: Such practices may be shared within a specific academic community, but they are seldom pinned down as formally specified rules. It is possible that some research projects are naturally like G1, that is, inherently difficult to break down into modular operations. A graph-based representation would then seem superfluous, although it might still help in offering the research externally as inputs to other research processes. More interestingly, G1 can be seen as the status quo in how research processes are described without a shared language: publications may include elaborate descriptions of the empirical work that was performed to produce the results, but these tend to lack a structure that would be immediately recognizable to fellow academics. Without extensive and often tacit knowledge …
  • 7. Vertices and Edges We need a meaningful way to define the vertices and edges of the research process graph. To account for the temporal order of operations, we make the graph directed. [Figure 2 (G2: vertices v1, v2; edge e1): Vertices as outputs/inputs and edges as research operations.] Paper excerpt: It may be tempting to think of edges as research operations and vertices as their outputs, yet this results in a number of problems that are worth exploring in some detail. To begin with, an edge must have a vertex at both of its ends, and hence a minimal graph-based model, let us call this G2 = (V2, E2), would be V2 = {v1, v2}, E2 = {e1}. To account for the temporal order of research operations, we also want to make G2 a directed graph in which the set of edges E2 is made of an ordered pair e1 = (v1, v2). Figure 2 illustrates G2. G2 is immediately less elegant than G1 as a starting point. It is difficult to map G2 to a real-world counterpart in a similar way to G1. Also, the former would seem to suggest that research starts with an already existing output, which does not make sense. Some material or events must of course exist for empirical research to start with, but the observations of any such entities become research data only through the actions …
  • 8. Vertices and Edges It is tempting to think of edges as research operations and vertices as their inputs/outputs, but this results in several problems: 1. An edge must have vertices at both ends, which makes the minimal graph G2 less elegant than G1; it is difficult to map the former to a real-world counterpart in a similar manner to the latter. 2. G2 suggests that research starts with an already existing output, which does not make sense. Any observations become data only through the actions of a researcher. [Figure 2 (G2: vertices v1, v2; edge e1): Vertices as outputs/inputs and edges as research operations.]
  • 9. Vertices and Edges 3. A research operation can use more than one output as its input, which is difficult to model if we define research operations as edges. [Figure 3 (G3: vertices v1, v2, v3; edge e1 drawn twice): Research operation e1 with multiple inputs results in an ill-defined graph.] Paper excerpt: Given the problems with G2 and G3, and the overall approach they represent, … research operations including their outputs. … It follows from this that the references, that is, edges between research operations, will actually point backwards in time. This makes it intuitive to trace research results back to the operations that contributed to them …
  • 10. Research Process Graph, RPG Let us define vertices as research operations including their outputs, and edges as references to outputs of earlier research operations. It follows that references (edges) point backward in time. This makes it intuitive to trace back a research operation (results) to all those operations that contributed to it and makes actual implementation of the graph more straightforward. Note that the minimal RPG is again isomorphic with our elegant starting point G1. [Figure 4 (RPG1: vertices v1, v2, v3; edges e1, e2): Research operation v3 references the outputs of two earlier operations v1 and v2 as its inputs.] Paper excerpt: The edges are thus directed and defined as ordered pairs of operations, e = (vtail, vhead), where vtail, vhead ∈ V and E = {e1, e2, e3, ..., em}. The edges can only point to strictly preceding operations, since an operation can only use existing outputs as its inputs. Figure 4 illustrates how RPG solves the problem of capturing multiple inputs in G3. Also, the minimal RPG = ((v1), ∅) is isomorphic with our elegant starting point G1. The edges of RPG record the order of operations for each path in the graph. However, problems can arise if we need to know the order of two operations that do not appear on the same path. …
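The definition on this slide (vertices as operations, edges as backward-pointing references) can be illustrated with a small sketch. This is not the author's implementation: the class name `ResearchProcessGraph`, its methods, the operation names, and the use of 0-based list positions as vertex indices are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchProcessGraph:
    """Hypothetical minimal container for an RPG = (V, E)."""

    # Operations in completion order; the list position is the vertex index.
    vertices: list[str] = field(default_factory=list)
    # References stored as (tail, head) index pairs, pointing backward in time.
    edges: list[tuple[int, int]] = field(default_factory=list)

    def add_operation(self, name: str, inputs: tuple[int, ...] = ()) -> int:
        """Append an operation that references earlier operations as inputs."""
        tail = len(self.vertices)
        for head in inputs:
            if not 0 <= head < tail:
                raise ValueError("can only reference strictly preceding operations")
        self.vertices.append(name)
        self.edges.extend((tail, head) for head in inputs)
        return tail

    def trace_back(self, index: int) -> set[int]:
        """Return indices of all operations that contributed to `index`."""
        seen: set[int] = set()
        stack = [index]
        while stack:
            current = stack.pop()
            for tail, head in self.edges:
                if tail == current and head not in seen:
                    seen.add(head)
                    stack.append(head)
        return seen


# The graph of Figure 4: v3 uses outputs of v1 and v2 (names are made up).
rpg = ResearchProcessGraph()
v1 = rpg.add_operation("collect data")
v2 = rpg.add_operation("simulate data")
v3 = rpg.add_operation("fit model", inputs=(v1, v2))
print(sorted(rpg.trace_back(v3)))  # [0, 1]
```

Because every reference is checked against the operations already in the list, tracing a result back to its contributing operations is a plain traversal along the stored edges.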
  • 11. Research Process Graph, RPG However, to allow mapping the graph to actual research processes, we need identifiers for research operations and references (edges). [Figure 5. The definition of research process graph: RPG = (V, E); V = (v1, v2, v3, …, vn), where n > 0; research operations are added to V in a non-decreasing order according to their completion time; E = {e1, e2, e3, …, em}, where m ≥ 0; e = (vtail, vhead), where head < tail; RPG is a directed acyclic graph.] Paper excerpt: … albeit the order of operations in V and other edges in E may rule out such a possibility. Figure 5 summarizes the definition of RPG. … does not have to be a simple or connected graph. There may be parallel edges, that is, more than one reference between two operations in the case the latter uses two different outputs from the former. To distinguish between the parallel edges and, more generally, to help identify the output that is being referenced, we add a set of edge labels. We also add a set of vertex labels that allows constructing unique identifiers for research operations and their outputs by combining vertex and edge labels together. To serve their function as identifiers, the edge labels …
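The conditions listed in Figure 5 can be checked mechanically. The sketch below is an illustration only, assuming 1-based operation indices and edges given as `(tail, head)` pairs; the function name `is_valid_rpg` is hypothetical. Since every edge must satisfy head < tail, indices strictly decrease along any directed path, so no cycle can occur and acyclicity needs no separate test.

```python
def is_valid_rpg(num_operations: int, edges: list[tuple[int, int]]) -> bool:
    """Check the Figure 5 conditions for 1-based operation indices."""
    if num_operations < 1:  # V must contain at least one operation (n > 0)
        return False
    for tail, head in edges:
        # Both ends must be existing operations, and the referenced
        # operation (head) must strictly precede the referencing one (tail).
        if not (1 <= head < tail <= num_operations):
            return False
    return True


print(is_valid_rpg(1, []))                        # minimal RPG: True
print(is_valid_rpg(3, [(3, 1), (3, 2)]))          # Figure 4: True
print(is_valid_rpg(4, [(4, 2), (4, 2), (4, 1)]))  # Figure 7, parallel edges: True
print(is_valid_rpg(2, [(1, 2)]))                  # forward reference: False
```

The third call shows that parallel edges and an isolated vertex (v3) are accepted, in line with the remark that an RPG does not have to be a simple or connected graph.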
  • 12. Research Process Graph, RPG [Figure 6. Adding vertex and edge labels to RPG: a set of vertex labels with one label per vertex (n labels) and a set of edge labels with one label per edge (m labels).] Paper excerpt: … infrastructure that provides globally unique and searchable identifiers for research operations (and, as we will discuss below, verify their integrity). This would allow, in principle, any research operation to reference outputs from any other operation in the global network of empirical research. Note that I am not talking about the papers published from the study but the empirical operations that were performed to come up with the results. This may sound a far-fetched vision, yet it is exactly what happens in software development, where newly written code is typically a fraction of the total codebase of a new product. … Isolated vertices and separate graph components can capture false starts and separate lines of inquiry that did not contribute to each other or did not lead to useful findings, but are still worthwhile to report as they may provide valuable resources to other studies. These […]ations lead to two important questions concerning the boundaries of RPG. First, one [must] choose whether to include in RPG false starts, all the iterations, and separate lines of …
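As a rough illustration of how vertex and edge labels might combine into identifiers for outputs, the sketch below labels the graph of Figure 7. Everything concrete here is assumed: the label strings, the use of filenames as edge labels, the choice to join the label of the referenced operation with the edge label, and the "/" separator are not specified in the slides.

```python
# Hypothetical vertex labels for Figure 7 (1-based operation indices).
vertex_labels = {1: "survey-2021", 2: "cleaning", 3: "pilot", 4: "regression"}

# Each edge is (tail, head, edge_label): v4 references two outputs of v2
# and one output of v1, so two of the edges are parallel.
edges = [
    (4, 2, "respondents.csv"),
    (4, 2, "weights.csv"),
    (4, 1, "codebook.pdf"),
]


def output_identifier(edge: tuple[int, int, str], labels: dict[int, str]) -> str:
    """Join the referenced operation's label with the edge label."""
    _tail, head, edge_label = edge
    return f"{labels[head]}/{edge_label}"


identifiers = [output_identifier(e, vertex_labels) for e in edges]
print(identifiers)
# ['cleaning/respondents.csv', 'cleaning/weights.csv', 'survey-2021/codebook.pdf']
```

In this sketch the two parallel edges from v4 to v2 remain distinguishable only because their edge labels differ, which is the role the slides assign to edge labels.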
  • 13. Toward a Global Research Graph The graph-based notation of research process graph leaves many practical issues open. 1. The identification of research operations and their outputs in practice 2. The degree of process modularization 3. Research operation metadata 4. The stability and verifiability of research operations
  • 14. The Vision Research process graphs could evolve into boundary objects in academic communication that allow researchers to make intermediary outputs from empirical operations broadly available to each other. To explore the feasibility of a global research graph, future studies should: 1. Develop algorithms to create visually appealing ways to plot RPGs 2. Assess the idea of RPG with respect to existing infrastructures 3. Simulate the benefits of modular empirical research