… since the researcher(s) responsible for the operation will undoubtedly assign, for instance, different filenames for different outputs.
By contrast, the namespace for labeling vertices, that is, research operations, is potentially global and cuts across various governance boundaries. This makes vertex labeling more difficult. It is relatively easy to come up with a scheme to name research operations in an individual research project, even if the project is large and transcends organizational boundaries, involving many different researchers. However, interesting opportunities arise if one could develop an infrastructure that provides globally unique and searchable identifiers for research operations (and, as we will discuss below, verifies their integrity). This would allow, in principle, any research operation to reference outputs from any other operation in the global network of empirical research. Note that I am not talking about the papers published from the study but the empirical operations that were performed to come up with the results.
Figure 7. Research operation v4 uses two outputs from operation v2 and one from operation v1 as its inputs, and is independent of operation v3. (Graph RPG2 with vertices v1, v2, v3, v4 and edges e1, e2, e3.)
If that which I have said above helps you to see …
27 October 2021
INFORMS Annual Meeting
Graphing the Empirical Research Process:
Toward Modular Empirical Research
Aleksi Aaltonen
aleksi@temple.edu
Motivation
The nature of empirical research varies considerably between academic fields.
Methodological plurality and varying practices between academic communities
make it difficult to understand the process by which empirical studies produce their
results beyond one’s own niche.
This is a problem because it:
1. Makes research less transparent and reproducible
2. Hinders the re-usability of intermediate outputs in the research process
Research vs. Software Development Practices
In software development, we glue together existing, well-
tested and validated components while trying to write as little
new code as possible.
In empirical research, we tend to start from scratch, from
‘raw’ data, and do everything ourselves.
The Aim of the Project
Design a rigorous approach for modeling empirical research processes without
sacrificing the diversity of research. To this end, I make three assumptions about
research:
1. Empirical research means producing a posteriori knowledge by justifying knowledge claims with
appropriately analyzed observations.
2. The observations are recorded on a relatively persistent medium as data.
3. An empirical study incorporates a process that starts from acquiring, simulating, or otherwise generating
data about a phenomenon of interest and then proceeds by performing analytical operations on the data.
The process can go through several iterations and dead ends until the data have
been transformed into a form in which they support a scholarly knowledge claim.
Definitions
Research process is a series of modular operations that transform data step-by-step into a form that
supports an a posteriori knowledge claim.
Modularity entails dividing a complex system into relatively independent components so that the
relationships between the components are easily governable.
Research operation is the basic unit (module) of a research process. Internally, a research operation
is a bundle of closely related data manipulations. Externally, the manipulations that make up the
operation are separated from all other operations so that the individual operation can be
understood in isolation.
Graph is a representation of a structure formed by vertices that may or may not be connected by
edges.
Minimal Graph Based Representation
That is, the status quo in the absence of a common language
We can think of G1 as capturing the entire research
process as one massive operation; that is, the
process is collapsed into a tightly coupled bundle
of inputs, data manipulations, and outputs.
An effective description of the process must then
fall back on whatever idiosyncratic and field-
specific practices are available to the researcher.
Obviously, a graph-based representation is here
mostly superfluous…
The researcher uses these practices to convey the steps that were performed to arrive at the results. Such practices may be shared within a specific academic community, but they are seldom pinned down as formally specified rules. It is possible that some research projects are naturally like G1, that is, inherently difficult to break down into modular operations. A graph-based representation would then seem superfluous, although it might still help in offering the research externally as inputs to other research processes.
Figure 1. A minimal graph-based representation of an empirical research process (graph G1 with a single vertex v1)
More interestingly, G1 can be seen as the status quo in how research processes are described without a shared language: publications may include elaborate descriptions of empirical work that was performed to produce the results, but these tend to lack a structure that would be immediately recognizable to fellow academics. Without extensive and often tacit knowledge …
Vertices and Edges
We need a meaningful way to define the
vertices and edges of the research
process graph.
To account for the temporal order of
operations, we make the graph directed.
Vertices as outputs/inputs and edges as
research operations.
It may be tempting to think of edges as research operations and vertices as their outputs, yet this results in a number of problems that are worth exploring in some detail. To begin with, an edge must have a vertex at both of its ends, and hence a minimal graph-based model, let us call this $G_2 = (V_2, E_2)$, would be $V_2 = \{v_1, v_2\}$, $E_2 = \{e_1\}$. To account for the temporal order of research operations, we also want to make G2 a directed graph in which the set of edges $E_2$ is made of an ordered pair $e_1 = (v_1, v_2)$. Figure 2 illustrates G2.
Figure 2. Vertices as outputs/inputs and edges as research operations (graph G2 with edge e1 from v1 to v2)
G2 is immediately less elegant than G1 as a starting point. It is difficult to map G2 to a real-world counterpart in a similar manner to G1. Also, the former would seem to suggest that research starts with an already existing output, which does not make sense. Some material or events must of course exist for empirical research to start with, but the observations of any such entities become research data only through the actions of a researcher.
Vertices and Edges
It is tempting to think of edges as research
operations and vertices as their inputs/
outputs, but this results in several problems:
1. An edge must have vertices at both ends, which makes
the minimal graph G2 less elegant than G1 – it’s difficult
to map the former to a real-world counterpart in a
similar manner to the latter.
2. G2 suggests that research starts with an already existing
output, which does not make sense. Any observations
become data only through the actions of a researcher.
Finally, a research operation can use more than one output as its input, which is difficult to model if we define edges as research operations, as illustrated by G3 in Figure 3.
Figure 3. Research operation e1 with multiple inputs results in an ill-defined graph (graph G3 with vertices v1, v2, v3; the edge label e1 appears twice)
Given the problems with G2 and G3, vertices are instead defined as research operations including their outputs. It follows that the references, that is, the edges between research operations, actually point backwards in time. This makes it intuitive to trace research results back to the operations that contributed to them.
Vertices and Edges
3. A research operation can use more than one
output as its input, which is difficult to model if we
define research operations as edges.
Research operation with multiple inputs
results in an ill-defined graph.
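The third problem can be made concrete with a minimal sketch. It assumes that each edge label names one research operation and maps to a single ordered pair of vertices, and that v1 and v2 are the inputs and v3 the output in G3; the function `add_edge` and the dictionary layout are illustrative assumptions, not part of the original model.

```python
# Edges-as-operations model: each edge label names one research operation
# and maps to a single ordered pair (input vertex, output vertex).
edges = {}

def add_edge(label, tail, head):
    """Register an operation as an edge; reject a second, different pair."""
    pair = (tail, head)
    if label in edges and edges[label] != pair:
        raise ValueError(
            f"{label} is already defined as {edges[label]}, cannot also be {pair}"
        )
    edges[label] = pair

# Operation e1 reads output v1 and produces output v3 (assumed roles).
add_edge("e1", "v1", "v3")

# The same operation also reads output v2, which would need a second pair.
try:
    add_edge("e1", "v2", "v3")
    conflict = None
except ValueError as err:
    conflict = str(err)

print(conflict)
```

A single ordered pair has exactly one tail, so one operation with two inputs cannot be expressed as one edge in this sketch.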
Research Process Graph, RPG
Research operation references the outputs
of two earlier operations as its inputs.
Edges are references to outputs from other research operations. The edges are thus directed and defined as ordered pairs of operations, $e = (v_{tail}, v_{head})$ where $v_{tail}, v_{head} \in V$ and $E = \{e_1, e_2, e_3, \ldots, e_m\}$. The edges can only point to strictly preceding operations, since an operation can only use existing outputs as its inputs. Figure 4 illustrates how RPG solves the problem of capturing multiple inputs in G3. Also, the minimal $RPG = ((v_1), \emptyset)$ is isomorphic with our elegant starting point G1.
Figure 4. Research operation v3 references the outputs of two earlier operations v1 and v2 as its inputs (graph RPG1 with vertices v1, v2, v3 and edges e1, e2)
The edges of RPG record the order of operations for each path in the graph. However, problems can arise if we need to know the order of two operations that do not appear on the same path. This can happen, for …
Let us define vertices as research operations
including their outputs, and edges as references to
outputs of earlier research operations.
It follows that references (edges) point backward in time.
This makes it intuitive to trace a research operation
(its results) back to all the operations that contributed to it,
and it makes the actual implementation of the graph more
straightforward.
Note that the minimal RPG is again isomorphic with our
elegant starting point G1.
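As a minimal sketch of this definition, the graph in Figure 4 can be written as a list of operations and a list of backward-pointing references. The list layout and the helper `contributors` are assumptions made for illustration only.

```python
# Vertices are research operations, listed in completion order.
vertices = ["v1", "v2", "v3"]

# Edges are references (tail, head): the tail operation uses an output
# of the earlier head operation, so every edge points backward in time.
edges = [("v3", "v1"), ("v3", "v2")]

def contributors(operation):
    """Return every operation that directly or indirectly fed `operation`."""
    found = []
    stack = [operation]
    while stack:
        current = stack.pop()
        for tail, head in edges:
            if tail == current and head not in found:
                found.append(head)
                stack.append(head)
    return sorted(found)

print(contributors("v3"))  # ['v1', 'v2']
print(contributors("v1"))  # []
```

Because references point backward, tracing a result to its sources is a plain walk along the edges. The minimal RPG in this layout would be `vertices = ["v1"]` with an empty edge list.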
Graphing the Research Process
… albeit the order of operations in $V$ and other edges in $E$ may rule out such a possibility. Figure 5 summarizes the definition of RPG.
1: $RPG = (V, E)$
2: $V = (v_1, v_2, v_3, \ldots, v_n)$, where $n > 0$
3: Research operations are added to $V$ in a non-decreasing order according
to their completion time.
5: $E = \{e_1, e_2, e_3, \ldots, e_m\}$, where $m \geq 0$
6: $e = (v_{tail}, v_{head})$, where $head < tail$
7: RPG is a directed acyclic graph.
Figure 5. The definition of research process graph
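The conditions in Figure 5 can be checked mechanically. The sketch below assumes that operation v_i is represented by its index i and that edges are given as (tail, head) index pairs; the function name `is_valid_rpg` is hypothetical.

```python
def is_valid_rpg(n, edges):
    """Check the structural conditions of the definition.

    n     -- number of operations; v_i is represented by the index i (1..n)
    edges -- list of (tail, head) index pairs; duplicates model parallel edges
    """
    if n <= 0:
        # V must contain at least one operation.
        return False
    for tail, head in edges:
        # Every reference must point to a strictly preceding operation.
        if not (1 <= head < tail <= n):
            return False
    return True

print(is_valid_rpg(1, []))                # minimal RPG
print(is_valid_rpg(3, [(3, 1), (3, 2)]))  # the graph in Figure 4
print(is_valid_rpg(3, [(1, 3)]))          # reference to a later operation
```

In this sketch the acyclicity condition needs no separate test: every edge goes from a higher index to a strictly lower one, so no path can return to its starting vertex.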
RPG does not have to be a simple or connected graph. There may be parallel edges, that is, more than one reference between two operations in the case the latter uses two different outputs from the former. To distinguish between the parallel edges and, more generally, to help identify the output that is being referenced, we add a set of edge labels $L^E$. We also add a set of vertex labels $L^V$ that allows constructing unique identifiers for research operations and their outputs by combining vertex and edge labels together. To serve their function as identifiers, the edge labels …
However, to allow mapping the graph to actual
research processes, we need identifiers for
research operations and references (edges).
Graphing the Research Process
An infrastructure that provides globally unique and searchable identifiers for research operations would allow, in principle, any research operation to reference outputs from any other operation in the global network of empirical research. This may sound a far-fetched vision, yet it is exactly what happens in software development, where newly written code is typically a fraction of the total codebase of a new product.
8: $L^V = \{l^V_1, l^V_2, l^V_3, \ldots, l^V_n\}$, where $n$ is the number of vertices in RPG
9: $L^E = \{l^E_1, l^E_2, l^E_3, \ldots, l^E_m\}$, where $m$ is the number of edges in RPG
Figure 6. Adding vertex and edge labels to RPG
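A minimal sketch of how vertex and edge labels might combine into identifiers, using the structure of Figure 7 (v4 uses two outputs of v2 and one of v1). All label values, the `/` separator, and the helper `output_id` are hypothetical choices for illustration.

```python
# Vertex labels, one per operation (all names here are hypothetical).
vertex_labels = {1: "collect", 2: "clean", 3: "interview", 4: "regress"}

# Labelled edges (tail, head, edge label); the label names the referenced
# output of the head operation. v4 uses two outputs of v2 and one of v1.
edges = [
    (4, 2, "panel.csv"),
    (4, 2, "codebook.txt"),
    (4, 1, "raw.csv"),
]

def output_id(head, edge_label):
    """Combine a vertex label and an edge label into one identifier."""
    return f"{vertex_labels[head]}/{edge_label}"

inputs_of_v4 = [output_id(head, label) for tail, head, label in edges if tail == 4]
print(inputs_of_v4)
# ['clean/panel.csv', 'clean/codebook.txt', 'collect/raw.csv']
```

The two parallel edges from v4 to v2 stay distinguishable because their edge labels differ, while v3 has no edge to or from v4 and so remains independent of it.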
Isolated vertices and separate graph components can capture false starts and separate lines of inquiry that did not contribute to each other or did not lead to useful findings, but are still worthwhile to report as they may provide valuable resources to other studies. This leads to two important questions concerning the boundaries of RPG. The first is whether to include in RPG false starts, all the iterations, and separate lines of …
Toward a Global Research Graph
The graph-based notation of research process graph leaves many practical issues
open.
1. The identification of research operations and their outputs in practice
2. The degree of process modularization
3. Research operation metadata
4. The stability and verifiability of research operations
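One possible way to approach the fourth issue is to record a content fingerprint alongside each reference. The sketch below assumes SHA-256 hashing of an output's bytes; the source does not prescribe any mechanism, and the names `fingerprint`, `is_unchanged`, and the sample data are hypothetical.

```python
import hashlib

def fingerprint(output_bytes):
    """Return a SHA-256 hex digest of an operation's output."""
    return hashlib.sha256(output_bytes).hexdigest()

# A reference stores the digest recorded when the output was first used.
original = b"id,score\n1,0.42\n"
reference = {"output": "clean/panel.csv", "sha256": fingerprint(original)}

def is_unchanged(reference, current_bytes):
    """Compare the current output against the digest stored in the reference."""
    return fingerprint(current_bytes) == reference["sha256"]

print(is_unchanged(reference, original))
print(is_unchanged(reference, b"id,score\n1,0.43\n"))
```

A later operation could then detect that a referenced output has changed since it was used, without having to trust the filename alone.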
The Vision
Research process graphs could evolve into boundary objects in academic
communication that allow researchers to make intermediary outputs from
empirical operations broadly available to each other.
To explore the feasibility of a global research graph, future studies should:
1. Develop algorithms to create visually appealing ways to plot RPGs
2. Assess the idea of RPG with respect to existing infrastructures
3. Simulate the benefits of modular empirical research

More Related Content

Similar to Graphing the Empirical Research Process: Toward Modular Empirical Research

IntroductionThis report discusses the programming process whic.docx
IntroductionThis report discusses the programming process whic.docxIntroductionThis report discusses the programming process whic.docx
IntroductionThis report discusses the programming process whic.docx
mariuse18nolet
 
AN AI PLANNING APPROACH FOR GENERATING BIG DATA WORKFLOWS
AN AI PLANNING APPROACH FOR GENERATING BIG DATA WORKFLOWSAN AI PLANNING APPROACH FOR GENERATING BIG DATA WORKFLOWS
AN AI PLANNING APPROACH FOR GENERATING BIG DATA WORKFLOWS
gerogepatton
 
An ai planning approach for generating
An ai planning approach for generatingAn ai planning approach for generating
An ai planning approach for generating
ijaia
 
Correlations, Trends, and Outliers in ggplot2
Correlations, Trends, and Outliers in ggplot2Correlations, Trends, and Outliers in ggplot2
Correlations, Trends, and Outliers in ggplot2
Chris Rucker
 
Report writing
Report writingReport writing
Report writing
praveen3030
 
Apache Spark GraphX highlights.
Apache Spark GraphX highlights. Apache Spark GraphX highlights.
Apache Spark GraphX highlights.
Doug Needham
 
Assignment HandoutProgr.docx
Assignment HandoutProgr.docxAssignment HandoutProgr.docx
Assignment HandoutProgr.docx
poulterbarbara
 
01. Birta L. G., Arbez G. - Modelling and Simulation_ (2007).pdf
01. Birta L. G., Arbez G. - Modelling and Simulation_  (2007).pdf01. Birta L. G., Arbez G. - Modelling and Simulation_  (2007).pdf
01. Birta L. G., Arbez G. - Modelling and Simulation_ (2007).pdf
AftaZani1
 
From Data to Knowledge thru Grailog Visualization
From Data to Knowledge thru Grailog VisualizationFrom Data to Knowledge thru Grailog Visualization
From Data to Knowledge thru Grailog Visualization
giurca
 
Dawak f v.6camera-1
Dawak f v.6camera-1Dawak f v.6camera-1
Dawak f v.6camera-1
Mohammed El malki
 
Chapter 3.pptx
Chapter 3.pptxChapter 3.pptx
Chapter 3.pptx
BetshaTizazu2
 
New Topic Effectivesequencesofenquiry
New Topic EffectivesequencesofenquiryNew Topic Effectivesequencesofenquiry
New Topic Effectivesequencesofenquiryguest9fa52
 
New Topic Effectivesequencesofenquiry
New Topic EffectivesequencesofenquiryNew Topic Effectivesequencesofenquiry
New Topic Effectivesequencesofenquiryguest2137aa
 
Sq lite module2
Sq lite module2Sq lite module2
Sq lite module2
Highervista
 
ICDE2014 Session 14 Data Warehousing
ICDE2014 Session 14 Data WarehousingICDE2014 Session 14 Data Warehousing
ICDE2014 Session 14 Data WarehousingTakuma Wakamori
 
Towards Computational Research Objects
Towards Computational Research ObjectsTowards Computational Research Objects
Towards Computational Research Objects
David De Roure
 
Unit 5 project management
Unit 5 project management Unit 5 project management
Unit 5 project management
RASHMIPANWAR10
 
Critical Path Analysis
Critical Path AnalysisCritical Path Analysis
Critical Path Analysis
North Warwickshie & Hinckley college
 

Similar to Graphing the Empirical Research Process: Toward Modular Empirical Research (20)

IntroductionThis report discusses the programming process whic.docx
IntroductionThis report discusses the programming process whic.docxIntroductionThis report discusses the programming process whic.docx
IntroductionThis report discusses the programming process whic.docx
 
AN AI PLANNING APPROACH FOR GENERATING BIG DATA WORKFLOWS
AN AI PLANNING APPROACH FOR GENERATING BIG DATA WORKFLOWSAN AI PLANNING APPROACH FOR GENERATING BIG DATA WORKFLOWS
AN AI PLANNING APPROACH FOR GENERATING BIG DATA WORKFLOWS
 
An ai planning approach for generating
An ai planning approach for generatingAn ai planning approach for generating
An ai planning approach for generating
 
Correlations, Trends, and Outliers in ggplot2
Correlations, Trends, and Outliers in ggplot2Correlations, Trends, and Outliers in ggplot2
Correlations, Trends, and Outliers in ggplot2
 
Report writing
Report writingReport writing
Report writing
 
Apache Spark GraphX highlights.
Apache Spark GraphX highlights. Apache Spark GraphX highlights.
Apache Spark GraphX highlights.
 
Assignment HandoutProgr.docx
Assignment HandoutProgr.docxAssignment HandoutProgr.docx
Assignment HandoutProgr.docx
 
01. Birta L. G., Arbez G. - Modelling and Simulation_ (2007).pdf
01. Birta L. G., Arbez G. - Modelling and Simulation_  (2007).pdf01. Birta L. G., Arbez G. - Modelling and Simulation_  (2007).pdf
01. Birta L. G., Arbez G. - Modelling and Simulation_ (2007).pdf
 
From Data to Knowledge thru Grailog Visualization
From Data to Knowledge thru Grailog VisualizationFrom Data to Knowledge thru Grailog Visualization
From Data to Knowledge thru Grailog Visualization
 
Reportwriteup
ReportwriteupReportwriteup
Reportwriteup
 
Dawak f v.6camera-1
Dawak f v.6camera-1Dawak f v.6camera-1
Dawak f v.6camera-1
 
Chapter 3.pptx
Chapter 3.pptxChapter 3.pptx
Chapter 3.pptx
 
New Topic Effectivesequencesofenquiry
New Topic EffectivesequencesofenquiryNew Topic Effectivesequencesofenquiry
New Topic Effectivesequencesofenquiry
 
New Topic Effectivesequencesofenquiry
New Topic EffectivesequencesofenquiryNew Topic Effectivesequencesofenquiry
New Topic Effectivesequencesofenquiry
 
Sq lite module2
Sq lite module2Sq lite module2
Sq lite module2
 
ICDE2014 Session 14 Data Warehousing
ICDE2014 Session 14 Data WarehousingICDE2014 Session 14 Data Warehousing
ICDE2014 Session 14 Data Warehousing
 
Towards Computational Research Objects
Towards Computational Research ObjectsTowards Computational Research Objects
Towards Computational Research Objects
 
Unit 5 project management
Unit 5 project management Unit 5 project management
Unit 5 project management
 
Critical Path Analysis
Critical Path AnalysisCritical Path Analysis
Critical Path Analysis
 
The Literature Review
The Literature ReviewThe Literature Review
The Literature Review
 

More from Aleksi Aaltonen

Data Innovation Lens: A New Way to Approach Data Design as Value Creation
Data Innovation Lens: A New Way to Approach Data Design as Value CreationData Innovation Lens: A New Way to Approach Data Design as Value Creation
Data Innovation Lens: A New Way to Approach Data Design as Value Creation
Aleksi Aaltonen
 
Not Good Enough, But Try Again! The Impact of Improved Rejection Communicatio...
Not Good Enough, But Try Again! The Impact of Improved Rejection Communicatio...Not Good Enough, But Try Again! The Impact of Improved Rejection Communicatio...
Not Good Enough, But Try Again! The Impact of Improved Rejection Communicatio...
Aleksi Aaltonen
 
The Performative Production of Trace Data in Knowledge Work
The Performative Production of Trace Data in Knowledge WorkThe Performative Production of Trace Data in Knowledge Work
The Performative Production of Trace Data in Knowledge Work
Aleksi Aaltonen
 
What Happens to Ratings When Both Sides Multihome? The Impact of Vertical Spi...
What Happens to Ratings When Both Sides Multihome? The Impact of Vertical Spi...What Happens to Ratings When Both Sides Multihome? The Impact of Vertical Spi...
What Happens to Ratings When Both Sides Multihome? The Impact of Vertical Spi...
Aleksi Aaltonen
 
Beyond the Facts: Data as Digital-Semantic Artifacts
Beyond the Facts: Data as Digital-Semantic ArtifactsBeyond the Facts: Data as Digital-Semantic Artifacts
Beyond the Facts: Data as Digital-Semantic Artifacts
Aleksi Aaltonen
 
Not Good Enough but Try Again! Mitigating the Impact of Rejections on New Con...
Not Good Enough but Try Again! Mitigating the Impact of Rejections on New Con...Not Good Enough but Try Again! Mitigating the Impact of Rejections on New Con...
Not Good Enough but Try Again! Mitigating the Impact of Rejections on New Con...
Aleksi Aaltonen
 

More from Aleksi Aaltonen (6)

Data Innovation Lens: A New Way to Approach Data Design as Value Creation
Data Innovation Lens: A New Way to Approach Data Design as Value CreationData Innovation Lens: A New Way to Approach Data Design as Value Creation
Data Innovation Lens: A New Way to Approach Data Design as Value Creation
 
Not Good Enough, But Try Again! The Impact of Improved Rejection Communicatio...
Not Good Enough, But Try Again! The Impact of Improved Rejection Communicatio...Not Good Enough, But Try Again! The Impact of Improved Rejection Communicatio...
Not Good Enough, But Try Again! The Impact of Improved Rejection Communicatio...
 
The Performative Production of Trace Data in Knowledge Work
The Performative Production of Trace Data in Knowledge WorkThe Performative Production of Trace Data in Knowledge Work
The Performative Production of Trace Data in Knowledge Work
 
What Happens to Ratings When Both Sides Multihome? The Impact of Vertical Spi...
What Happens to Ratings When Both Sides Multihome? The Impact of Vertical Spi...What Happens to Ratings When Both Sides Multihome? The Impact of Vertical Spi...
What Happens to Ratings When Both Sides Multihome? The Impact of Vertical Spi...
 
Beyond the Facts: Data as Digital-Semantic Artifacts
Beyond the Facts: Data as Digital-Semantic ArtifactsBeyond the Facts: Data as Digital-Semantic Artifacts
Beyond the Facts: Data as Digital-Semantic Artifacts
 
Not Good Enough but Try Again! Mitigating the Impact of Rejections on New Con...
Not Good Enough but Try Again! Mitigating the Impact of Rejections on New Con...Not Good Enough but Try Again! Mitigating the Impact of Rejections on New Con...
Not Good Enough but Try Again! Mitigating the Impact of Rejections on New Con...
 

Graphing the Empirical Research Process: Toward Modular Empirical Research

  • 1. Graphing the Empirical Research Process: Toward Modular Empirical Research. Aleksi Aaltonen, aleksi@temple.edu. INFORMS Annual Meeting, 27 October 2021.
    Paper excerpt shown on the slide: … since the researcher(s) responsible for the operation will undoubtedly assign, for instance, different filenames for different outputs. By contrast, the namespace for labeling vertices, that is, research operations, is potentially global and cuts across various governance boundaries. This makes vertex labeling more difficult. It is relatively easy to come up with a scheme to name research operations in an individual research project, even if the project is large and transcends organizational boundaries involving many different researchers. However, interesting opportunities arise if one could develop an infrastructure that provides globally unique and searchable identifiers for research operations (and, as we will discuss below, verify their integrity). This would allow, in principle, any research operation to reference outputs from any other operation in the global network of empirical research; note that I am not talking about the papers …
    Figure 7 (RPG2; vertices v1, v2, v3, v4; edges e1, e2, e3). Research operation v4 uses two outputs from operation v2 and one from operation v1 as its inputs, and is independent of operation v3.
  • 2. Motivation The nature of empirical research varies considerably between academic fields. Methodological plurality and varying practices between academic communities make it difficult to understand the process by which empirical studies produce their results beyond one’s own niche. This is a problem because it: 1. Makes research less transparent and reproducible 2. Hinders the re-usability of intermediate outputs in the research process
  • 3. Research vs. Software Development Practices In software development, we glue together existing, well-tested and validated components while trying to write as little new code as possible. In empirical research, we tend to start from scratch, from ‘raw’ data, and do everything ourselves.
  • 4. The Aim of the Project Design a rigorous approach for modeling empirical research processes without sacrificing the diversity of research. To this end, I make three assumptions about research: 1. Empirical research means producing a posteriori knowledge by justifying knowledge claims with appropriately analyzed observations. 2. The observations are recorded on a relatively persistent medium as data. 3. An empirical study incorporates a process that starts from acquiring, simulating, or otherwise generating data about a phenomenon of interest and then proceeds by performing analytical operations on the data. The process can go through several iterations and dead ends until the data have been transformed into a form in which they support a scholarly knowledge claim.
  • 5. Definitions A research process is a series of modular operations that transform data step by step into a form that supports an a posteriori knowledge claim. Modularity entails dividing a complex system into relatively independent components so that the relationships between the components are easily governable. A research operation is the basic unit (module) of a research process. Internally, a research operation is a bundle of closely related data manipulations. Externally, the manipulations that make up the operation are separated from all other operations so that the individual operation can be understood in isolation. A graph is a representation of a structure formed by vertices that may or may not be connected by edges.
  • 6. Minimal Graph-Based Representation That is, the status quo in the absence of a common language. We can think of G1 as capturing the entire research process as one massive operation; that is, the process is collapsed into a tightly coupled bundle of inputs, data manipulations, and outputs. An effective description of the process must then fall back on whatever idiosyncratic and field-specific practices are available to the researcher. Obviously, a graph-based representation is here mostly superfluous…
    Paper excerpt shown on the slide: Such practices may be shared within a specific academic community, but they are seldom pinned down as formally specified rules. It is possible that some research projects are naturally like G1, that is, inherently difficult to break down into modular operations. A graph-based representation would then seem superfluous, although it might still help offering the research externally as inputs to other research processes. More interestingly, G1 can be seen as the status quo in how research processes are described without a shared language: publications may include elaborate descriptions of empirical work that was performed to produce the results, but these tend to lack a structure that would be immediately recognizable to fellow academics. Without extensive and often tacit knowledge …
    Figure 1 (G1; single vertex v1). A minimal graph-based representation of an empirical research process.
  • 7. Vertices and Edges We need a meaningful way to define the vertices and edges of the research process graph. To account for the temporal order of operations, we make the graph directed.
    Paper excerpt shown on the slide: It may be tempting to think of edges as research operations and vertices as their outputs, yet this results in a number of problems that are worth exploring in some detail. To begin with, an edge must have a vertex at both its ends, and hence a minimal graph-based model, let us call this G2 = (V2, E2), would be V2 = {v1, v2}, E2 = {e1}. To account for the temporal order of research operations, we also want to make G2 a directed graph in which the set of edges E2 is made of an ordered pair e1 = (v1, v2). Figure 2 illustrates G2. G2 is immediately less elegant than G1 as a starting point. It is difficult to map G2 to a real-world counterpart in a similar way to G1. Also, the former would seem to suggest that research starts with an already existing output, which does not make sense. Some material or events must of course exist for empirical research to start with, but the observations of any such entities become research data only through the actions …
    Figure 2 (G2; vertices v1, v2; edge e1). Vertices as outputs/inputs and edges as research operations.
  • 8. Vertices and Edges It is tempting to think of edges as research operations and vertices as their inputs/outputs, but this results in several problems: 1. An edge must have vertices at both ends, which makes the minimal graph G2 less elegant than G1; it is difficult to map the former to a real-world counterpart in a similar manner to the latter. 2. G2 suggests that research starts with an already existing output, which does not make sense. Any observations become data only through the actions of a researcher.
    Figure 2 (G2; vertices v1, v2; edge e1). Vertices as outputs/inputs and edges as research operations.
  • 9. Vertices and Edges 3. A research operation can use more than one output as its input, which is difficult to model if we define research operations as edges.
    Figure 3 (G3; vertices v1, v2, v3; the edge e1 drawn twice). Research operation e1 with multiple inputs results in an ill-defined graph.
    Paper excerpt shown on the slide: Given the problems with G2 and G3, and the overall approach they repr… research operations including their outputs. An operation is therefore … the output within the research operation. It follows from this that the references, that is, edges between research operations, will actually point backwards in time. This makes it intuitive to trace research results back to the operations that contributed to them and are needed to …
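The problem with G3 can be made concrete with a minimal Python sketch. It assumes edges are stored as (label, tail, head) triples; that encoding and the function name `ambiguous_edges` are illustrative choices, not something the slides prescribe. If an operation is an edge, one operation with two inputs needs the same edge label attached to two different pairs of endpoints, so the label no longer identifies a single edge.

```python
from collections import defaultdict

# G3 from Figure 3, with research operations modeled as edges:
# operation e1 consumes outputs v1 and v2 and produces output v3.
# Each entry is (edge label, tail vertex, head vertex); this encoding is an assumption.
g3_edges = [("e1", "v1", "v3"), ("e1", "v2", "v3")]

def ambiguous_edges(edges):
    """Return the edge labels that are attached to more than one (tail, head) pair."""
    endpoints = defaultdict(set)
    for label, tail, head in edges:
        endpoints[label].add((tail, head))
    return {label: sorted(pairs) for label, pairs in endpoints.items() if len(pairs) > 1}

# e1 would have to be two edges at once, which an ordered pair cannot express.
print(ambiguous_edges(g3_edges))
```

For G2, where e1 connects only v1 and v2, the same function returns an empty dictionary.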
  • 10. Research Process Graph, RPG Let us define vertices as research operations including their outputs, and edges as references to outputs of earlier research operations. It follows that references (edges) point backward in time. This makes it intuitive to trace a research operation (its results) back to all those operations that contributed to it, and it makes the actual implementation of the graph more straightforward. Note that the minimal RPG is again isomorphic with our elegant starting point G1.
    Paper excerpt shown on the slide: … outputs from other research operations. The edges are thus directed and defined as ordered pairs of operations, e = (v_tail, v_head), where v_tail, v_head ∈ V and E = {e1, e2, e3, …, em}. The edges can only point to strictly preceding operations, since an operation can only use existing outputs as its inputs. Figure 4 illustrates how RPG solves the problem of capturing multiple inputs in G3. Also, the minimal RPG = ((v1), ∅) is isomorphic with our elegant starting point G1. The edges of RPG record the order of operations for each path in the graph. However, problems can arise if we need to know the order of two operations that do not appear on the same path. This can happen, for …
    Figure 4 (RPG1; vertices v1, v2, v3; edges e1, e2). Research operation v3 references the outputs of two earlier operations v1 and v2 as its inputs.
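Because edges point backward in time, tracing a result back to everything that contributed to it is a plain graph traversal. The following Python sketch illustrates this on the Figure 4 example; the function name `contributors` and the list-of-pairs encoding are assumptions made for illustration only.

```python
from collections import defaultdict

# Figure 4 example: v3 references the outputs of v1 and v2.
# Each edge is (tail, head): the tail is the referencing (later) operation,
# the head is the referenced (earlier) operation.
edges = [("v3", "v1"), ("v3", "v2")]

def contributors(operation, edges):
    """Return every earlier operation reachable by following references backward."""
    references = defaultdict(list)
    for tail, head in edges:
        references[tail].append(head)
    seen = set()
    stack = [operation]
    while stack:
        current = stack.pop()
        for earlier in references[current]:
            if earlier not in seen:
                seen.add(earlier)
                stack.append(earlier)
    return seen

print(contributors("v3", edges))  # the operations v3 depends on
print(contributors("v1", edges))  # v1 references nothing, so the set is empty
```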
  • 11. Research Process Graph, RPG However, to allow mapping the graph to actual research processes, we need identifiers for research operations and references (edges).
    Paper excerpt shown on the slide: … albeit the order of operations in V and other edges in E may rule out such a possibility. Figure 5 summarizes the definition of RPG.
      1: RPG = (V, E)
      2: V = (v1, v2, v3, …, vn), where n > 0
      3: Research operations are added to V in a non-decreasing order according to their completion time.
      5: E = {e1, e2, e3, …, em}, where m ≥ 0
      6: e = (v_tail, v_head), where head < tail
      7: RPG is a directed acyclic graph.
    Figure 5. The definition of research process graph.
    RPG does not have to be a simple or connected graph. There may be parallel edges, that is, more than one reference between two operations in the case the latter uses two different outputs from the former. To distinguish between the parallel edges and, more generally, to help identify the output that is being referenced, we add a set of edge labels. We also add a set of vertex labels that allows constructing unique identifiers for research operations and their outputs by combining vertex and edge labels together. To serve their function as identifiers, the edge labels …
    Figure 4 (RPG1; vertices v1, v2, v3; edges e1, e2). Research operation v3 references the outputs of two earlier operations v1 and v2 as its inputs.
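The constraints summarized in Figure 5 can be checked mechanically. The Python sketch below is one possible reading, not the author's implementation: the function name `is_valid_rpg` is hypothetical, edges are kept in a list so that parallel edges are allowed, and the requirement that operation names be unique is an added assumption. Since every edge must point from a later position to a strictly earlier one (head < tail), no cycle can occur, so acyclicity needs no separate check.

```python
def is_valid_rpg(vertices, edges):
    """Check a candidate graph against the constraints summarized in Figure 5.

    vertices: sequence of operation names, listed in non-decreasing completion order.
    edges: list of (tail, head) pairs; a list, so parallel edges are permitted.
    """
    if len(vertices) == 0:                      # line 2: n > 0
        return False
    if len(set(vertices)) != len(vertices):     # assumption: names are unique
        return False
    position = {name: index for index, name in enumerate(vertices)}
    for tail, head in edges:                    # line 5: m >= 0, so no edges is fine
        if tail not in position or head not in position:
            return False
        if not position[head] < position[tail]:  # line 6: head < tail
            return False
    return True                                 # line 7 follows from head < tail

# The minimal RPG, the Figure 4 graph, and a graph with parallel edges all pass.
print(is_valid_rpg(["v1"], []))
print(is_valid_rpg(["v1", "v2", "v3"], [("v3", "v1"), ("v3", "v2")]))
print(is_valid_rpg(["v1", "v2"], [("v2", "v1"), ("v2", "v1")]))
```

A reference to a later operation, such as `("v1", "v2")`, or a self-reference is rejected because the head does not strictly precede the tail.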
  • 12. Research Process Graph, RPG
    Paper excerpt shown on the slide: … infrastructure that provides globally unique and searchable identifiers for research operations (and, as we will discuss below, verify their integrity). This would allow, in principle, any research operation to reference outputs from any other operation in the global network of empirical research; note that I am not talking about the papers published from the study but the empirical operations that were performed to come up with the results. This may sound a far-fetched vision, yet it is exactly what happens in software development, where newly written … typically a fraction of the total codebase of a new product.
      8: vertex labels {l1, l2, l3, …, ln}, where n is the number of vertices in RPG
      9: edge labels {l1, l2, l3, …, lm}, where m is the number of edges in RPG
    Figure 6. Adding vertex and edge labels to RPG.
    … isolated vertices and separate graph components can capture false starts and separate lines of inquiry that did not contribute to each other or did not lead to useful findings, but are still worthwhile to report as they may provide valuable resources to other studies. These … lead to two important questions concerning the boundaries of RPG. First, one … choose whether to include in RPG false starts, all the iterations, and separate lines of …
    Figure 4 (RPG1; vertices v1, v2, v3; edges e1, e2). Research operation v3 references the outputs of two earlier operations v1 and v2 as its inputs.
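Combining vertex and edge labels into identifiers can be sketched as follows, using the graph of Figure 7 (v4 uses two outputs from v2 and one from v1, and is independent of v3). Everything concrete here is hypothetical: the label strings, the `#` separator, and the function names are illustrative assumptions, since the slides do not fix a labeling scheme.

```python
# Hypothetical vertex labels, one per research operation.
vertex_labels = {
    "v1": "survey-2021/collect",
    "v2": "survey-2021/clean",
    "v3": "survey-2021/pilot",
    "v4": "survey-2021/model",
}

# Edges as (tail, head, edge label). The edge label names the referenced output,
# which also distinguishes the two parallel edges from v4 to v2.
edges = [
    ("v4", "v2", "respondents.csv"),
    ("v4", "v2", "codebook.json"),
    ("v4", "v1", "raw.csv"),
]

def output_identifier(head, edge_label):
    """Build an identifier for an output from its operation's label and the edge label."""
    return f"{vertex_labels[head]}#{edge_label}"

def inputs_of(operation):
    """List the identifiers of all outputs that the given operation references."""
    return sorted(
        output_identifier(head, label)
        for tail, head, label in edges
        if tail == operation
    )

for identifier in inputs_of("v4"):
    print(identifier)
```

Within one project such labels only need to be locally unique; making the vertex labels globally unique is the harder infrastructure question raised in the excerpt.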
  • 13. Toward a Global Research Graph The graph-based notation of research process graph leaves many practical issues open. 1. The identification of research operations and their outputs in practice 2. The degree of process modularization 3. Research operation metadata 4. The stability and verifiability of research operations
  • 14. The Vision Research process graphs could evolve into boundary objects in academic communication that allow researchers to make intermediary outputs from empirical operations broadly available to each other. To explore the feasibility of a global research graph, future studies should: 1. Develop algorithms to create visually appealing ways to plot RPGs 2. Assess the idea of RPG with respect to existing infrastructures 3. Simulate the benefits of modular empirical research