DyGraph: A Dynamic Graph Generator and Benchmark Suite
Andrew McCrabb
University of Michigan
mccrabb@umich.edu
Hellina Nigatu
University of California, Berkeley
hellina_nigatu@berkeley.edu
Absalat Getachew
Addis Ababa Institute of Technology
absalat.dawit@aait.edu.et
Valeria Bertacco
University of Michigan
valeria@umich.edu
ABSTRACT
Dynamic graph processing, execution on vertex-edge graphs that
change over time, is quickly becoming a key computing need of the
twenty-first century. Dynamic graph algorithms unlock real-time
optimization solutions and a wide range of data-mining applica-
tions in logistics, finance, marketing, healthcare, and social media,
among many others. However, graph algorithms are extremely
memory-bound (i.e., their performance is limited by the bandwidth
of memory accesses on the underlying hardware platform, rather
than the compute capacity). Moreover, dynamic graph algorithms
are being applied to increasingly-large datasets, further straining
the memory systems and reducing performance. As a result, ad-
ditional research is needed to leverage new memory technologies
for faster, more efficient, dynamic graph-based processing. Such
research is difficult without access to hitherto unavailable industrial-
scale dynamic graph datasets to evaluate solutions.
In this work, we present DyGraph, a dynamic graph synthetic
dataset generator paired with a collection of real-world graphs in
the domains of social media, recommendation systems, and fintech.
We demonstrate the breadth of graph features represented in this
repository and evaluate the DyGraph Generator’s ability to generate
synthetic graphs that mimic these real datasets. In our case
study, we find that the degree distributions of DyGraph Generator
datasets correlate 3 to 5.5 times more closely with real-world datasets
than Power Law models do, paving the way for much-needed research
on high-performance dynamic graph processing.
ACM Reference Format:
Andrew McCrabb, Hellina Nigatu, Absalat Getachew, and Valeria Bertacco.
2022. DyGraph: A Dynamic Graph Generator and Benchmark Suite. In Joint
Workshop on Graph Data Management Experiences & Systems (GRADES) and
Network Data Analytics (NDA) (GRADES NDA’22), June 12, 2022, Philadelphia,
PA, USA. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3534540.3534692
1 INTRODUCTION
Networks of all sorts have become an unavoidable, underlying
component of modern life. Messaging platforms like Facebook and
Twitter are deeply embedded in modern social life. Logistics systems
send packages anywhere in the world within days. GPS navigation
provides instant turn-by-turn driving directions in a maze of
interconnected roadways. Online retailers use market basket analysis
to anticipate consumer needs and manage supply. Recommendation
systems help deliver useful or entertaining content to users of
media streaming services. Few industries are unaffected by today’s
rich web of widespread networks.
Figure 1: A dynamic graph, represented as a discrete sequence
of static graphs, where each graph corresponds to a snapshot
of the dynamic graph at time 𝑇. Green elements are added
and red elements are removed. (Panels: 𝑇 = 0, changes 0 → 1, 𝑇 = 1.)
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.
GRADES ’22, June 12, 2022, Philadelphia, PA
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9384-3/22/06...$15.00
https://doi.org/10.1145/3534540.3534692
Such networks are digitally represented as vertex-edge graphs:
for example, social networks map people to vertices and friendships
to edges. Graph-based processing (or “graph processing”) has be-
come a critical area of big-data research, as scientists in industry and
academia seek to leverage it to generate field-specific knowledge,
improve user experiences, and expand industry services.
As graph processing spreads, processing on dynamic graphs,
that is, graphs whose topologies morph over time (see sketch in
Figure 1), has become popular for applications with evolving or real-
time data. Some dynamic graph applications are extensions of their
static counterparts to provide faster, more useful information to
their users. Navigation apps must adapt almost instantly to changes
in road networks, such as construction, congestion, or collisions.
Dynamic market basket analysis allows retailers to quickly adapt
to temporary surges in demand, such as non-perishable food before
natural disasters or merchandise linked to trending music artists. In
addition, dynamic graph processing enables new applications. For
example, contact tracing lets medical experts track the spread of
disease and create health policies to save lives in a global pandemic.
Graph processing performs poorly on typical modern comput-
ers because the graph data, especially in a large graph, is often
sparse; that is, each data point is loosely connected to the rest of
the structure [25]. This property makes it difficult to exploit spatial
and temporal locality optimizations, essential traits for quick and
efficient execution in modern computers. Researchers have devel-
oped many solutions, both in hardware (including [11] [1] [25])
and software (including [12] [22] [15]), to improve performance
and efficiency for both static and dynamic graph processing.
Research evaluations in this space use publicly-available, real-
world graphs whenever possible to evaluate the proposed solutions.
[Figure 2 diagram: three stages in sequence. Compute: create vertex data. Share: share with each neighbor vn. Update: change graph topology.]
Figure 2: Dynamic graph algorithms’ structure: the compute
stage creates per-vertex values for some (or all) vertices. The
share stage transfers these values to neighbors. The update
stage implements all topology updates that have been queued
since the previous iteration.
Graph repositories, like SNAP [21] and Network Repository [31],
provide several graph datasets, but many important types of graphs
used in real-world applications are missing from these repositories.
For example, in real-world scenarios, static graphs may include
billions to trillions of edges [32] [7]. Additionally, many real graphs
are power-law graphs; that is, a small set of vertices have a high
degree (i.e., they have many neighbors) and the degree distribu-
tion plot approximates Power Law [26]. The available datasets are
much smaller and do not cover the rich variety of realistic Power
Law properties. Thus, to evaluate their work on such large graphs,
researchers resort to synthetic graph generators. These tools cre-
ate artificial graph datasets with certain pre-set properties, such
as number of vertices, degree distribution (where degree is the
number of edges incident to the vertex), clustering coefficients, etc.
Synthetic generators thus bridge a crucial gap between what is
publicly available and what is needed in the research community.
This gap is even larger for dynamic graphs. A wider range of dy-
namic graphs are needed to evaluate new research projects because
dynamic graphs are defined by a richer set of properties across
different applications, such as the frequency of graph updates and
changes in degree distribution over time. At the same time, even
fewer dynamic graph datasets are available in public repositories for
multiple reasons. First, collecting dynamic graph data can be pro-
hibitively time-consuming and expensive. Second, existing dynamic
graphs, from sources like social media, often contain inherently
identifiable data, limiting researchers’ ability to share them publicly.
Third, the methods and information that companies use to collect
their dynamic graph datasets may be protected by industry secrets.
Moreover, the offering of synthetic dynamic graph generators
is extremely limited. The few available [2] [10] [29] [30] [35] are
built for specific purposes and are ill-suited to evaluate novel work.
This aspect leaves dynamic graph researchers with few options,
other than artificially creating dynamic graphs out of existing static
graphs, a crude, unrealistic substitute for real-world data.
To bridge this gap in dynamic graph offerings, we present Dy-
Graph, a dataset generator and benchmark suite for dynamic graph
applications. Specifically, DyGraph contributes the following:
• It provides the DyGraph Generator, a novel synthetic dataset
generator capable of both generating graphs with user-specified
uniformly random or power-law properties from scratch and mim-
icking the properties of an input dynamic graph.
• It collects many real-world dynamic graph datasets, offers them
with a uniform representation, and makes them publicly available
in this format, for use as-is or jointly with the DyGraph Generator.
• It demonstrates how the DyGraph Generator may be used to
create graphs with properties that mimic real-world datasets and
modify a graph over time, while maintaining its original charac-
teristics. In our case study, we find that DyGraph is able to closely
match the degree distribution properties of real input datasets (3 to
5.5 times better than Power Law), and that users can control the
properties in the output graphs by applying small changes to an
automatically-generated script.
2 DYNAMIC GRAPHS
2.1 Temporal Representation
A static graph 𝐺 is a structure comprising a set of unique vertices 𝑉
and a set of unique edges 𝐸, such that 𝐺 = (𝑉, 𝐸). Graph algorithms
are most often organized as a series of iterations. Each iteration
includes a “compute” stage and a “share” stage. In the compute
stage, the same instructions are executed independently for each
vertex in 𝑉 , creating some result value. In the share stage, result
values are shared with some or all of the vertex’s neighbors across
all edges in 𝐸. [16], [24], and [34] discuss this and similar paradigms
further. This type of algorithm design only requires that the graph’s
topology is not modified within an iteration, thus it can be applied
to dynamic graphs as well, as long as this condition holds.
Indeed, most dynamic graph algorithms leverage this same al-
gorithmic structure with a small modification: they add an “up-
date” stage after each iteration, during which they apply topology
changes (e.g., adding an edge, removing a vertex, etc.) in batch. Fig-
ure 2 shows the complete algorithm organization. It is therefore
most appropriate to represent dynamic graphs as a sequence of $T$
static graphs $G_0, \dots, G_{T-1}$, together with the set of changes between each $G_t$ and
$G_{t+1}$. Note that this discrete representation is valid for both streaming
(future graph states provided in real time) and non-streaming
(future states available from the start) applications.
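The compute, share, and update stages can be illustrated with a minimal sketch. It assumes a toy algorithm in which each vertex's value is its out-degree and shared values are summed at the destination; the function name `run_dynamic` and the layout of `change_batches` are illustrative assumptions, not part of DyGraph.

```python
from collections import defaultdict


def run_dynamic(edges, change_batches):
    """Toy compute/share/update loop over a dynamic graph.

    edges: iterable of (src, dst) pairs describing G_0.
    change_batches: list of (added_edges, removed_edges) pairs, one per
        transition between consecutive timesteps (hypothetical layout).
    Returns one dict per timestep, mapping each vertex to the sum of the
    values shared by its in-neighbors.
    """
    edges = set(edges)
    results = []
    for t in range(len(change_batches) + 1):
        # Compute: derive a per-vertex value (here, the out-degree).
        value = defaultdict(int)
        for src, _ in edges:
            value[src] += 1
        # Share: send each vertex's value along every outgoing edge.
        inbox = defaultdict(int)
        for src, dst in edges:
            inbox[dst] += value[src]
        results.append(dict(inbox))
        # Update: apply all queued topology changes in one batch, so the
        # topology never changes inside an iteration.
        if t < len(change_batches):
            added, removed = change_batches[t]
            edges = (edges - set(removed)) | set(added)
    return results
```

The topology is only touched in the update stage, which is the condition the text requires for reusing a static-graph iteration structure on a dynamic graph.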
Algorithms computing on dynamic graphs are often incremental;
that is, they use the solution from 𝐺𝑡 as the starting point for 𝐺𝑡+1
[4]. For example, if a shortest path passes through an edge 𝐴 → 𝐵
in 𝐺𝑡 which is removed to create 𝐺𝑡+1, a dynamic algorithm may
start by searching for a short path from 𝐴 to 𝐵 and maintain the rest
of the overall solution. Similarly, a dynamic PageRank algorithm
may use the values found for 𝐺𝑡 as an approximate solution to
$G_{t+1}$. This informs two key differences between dynamic graphs
and a series of independent static graphs: (1) unchanged vertices
and edges must persist across timesteps, with each vertexID
stable and unique across all timesteps, and (2) we must be able to
accurately describe the topological difference between two adjacent
timesteps with a manageable number of changes. For these reasons,
it is not viable to build a dynamic graph using multiple iterations
of an existing static graph generator, as each “timestep” would be
unrealistically different from the previous.
2.2 Intermediate Static Format
Compressed Sparse Row (CSR) is the most common format for
storing static graphs, as CSR representations are easier to read, offer
better space efficiency, and provide more regular memory access
opportunities than adjacency matrices or lists [33]. CSR contains
two arrays: an edge list and a vertex list. The vertex list maps each
vertex to the starting index of its list of neighbors in the edge list,
one index per vertex. The edge list has one entry for each edge,
grouped by source vertex, where each entry holds a destination
vertex. For example, if vertices 3 and 4 had vertex list entries of 7
and 12, vertex 3 has five neighbors whose IDs are in slots 7-11 of the
edge list. While CSR is popular for its minimal storage footprint, it
is inefficient for representing dynamic graphs: changes to the graph
topology would require a complete reconstruction of the graph
representation. Moreover, pinpointing the differences between two
graphs in this format entails the complete construction of both
graphs. For these reasons, many research works using dynamic
graphs avoid CSR formats.
Table 1: Key DyGraph Generator Commands
Command                                          Description
add [𝑣] vertices                                 Create 𝑣 disconnected vertices
add [𝑒] random edges                             Create 𝑒 edges, connecting two uniformly random vertices
add edge power law [𝑆𝑑] [𝐾𝑠] [𝑚𝑎𝑥𝐷𝑒𝑔] [𝑏𝑖𝑛𝑠]      Add edges via Power Law to existing graph
remove [𝑣] vertices                              Delete 𝑣 random vertices and all connecting edges
remove [𝑒] random edges                          Delete 𝑒 random edges from the graph
commit                                           Save state as $G_t$. Begin changes for $G_{t+1}$
build [𝑆𝑑] [𝐾𝑠] [𝑚𝑎𝑥𝐷𝑒𝑔] [𝑏𝑖𝑛𝑠]                   Add edges via Power Law from scratch ($G_0$ only)
bin [𝑏𝑖𝑛𝐼𝐷] [𝑏𝑖𝑛𝑆𝑖𝑧𝑒] [𝑙𝑜𝑐𝑎𝑙𝑀𝑎𝑥]                   Define bin parameters (after “add edge power law”/“build”)
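The CSR layout described in this section can be sketched as follows. This is a minimal illustration, not DyGraph code: the helper names are hypothetical, and the extra sentinel entry at the end of the vertex list is a convenience assumed here so that the last vertex's neighbor range has an explicit end.

```python
def build_csr(num_vertices, edges):
    """Build CSR arrays (vertex_list, edge_list) from (src, dst) pairs."""
    counts = [0] * num_vertices
    for src, _ in edges:
        counts[src] += 1
    # vertex_list[v] is the index in edge_list where v's neighbors start.
    # The final sentinel entry (an assumption of this sketch) marks the end.
    vertex_list = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        vertex_list[v + 1] = vertex_list[v] + counts[v]
    edge_list = [0] * len(edges)
    cursor = vertex_list[:-1].copy()  # next free slot for each source vertex
    for src, dst in edges:
        edge_list[cursor[src]] = dst
        cursor[src] += 1
    return vertex_list, edge_list


def neighbors(vertex_list, edge_list, v):
    """Return the destination vertices stored for source vertex v."""
    return edge_list[vertex_list[v]:vertex_list[v + 1]]
```

Adding or removing a single edge shifts every later slot in `edge_list` and every later offset in `vertex_list`, which is why the text calls CSR inefficient for dynamic graphs.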
Instead, these works and DyGraph use a regular edge list,
which offers a more effective representation for dynamic graphs. Each
edge is represented by two values: a source and a destination vertex
ID. This format requires more storage space ($2|E|$ instead of $|V|+|E|$),
but it makes it easier to compute the difference between two graphs, or
between two time-based snapshots of the same graph, $G_t$ and $G_{t+1}$, and to derive
a change log between them.
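As a minimal sketch of why edge lists make change logs easy to derive, the snippet below computes the additions and removals between two snapshots with set differences. The function names are illustrative, and edges are assumed to be directed (source, destination) pairs.

```python
def change_log(edges_t, edges_t1):
    """Return (added, removed) edge lists that turn snapshot t into t+1."""
    old, new = set(edges_t), set(edges_t1)
    added = sorted(new - old)
    removed = sorted(old - new)
    return added, removed


def apply_changes(edges_t, added, removed):
    """Rebuild snapshot t+1 from snapshot t and its change log."""
    return (set(edges_t) - set(removed)) | set(added)
```

Only the change log needs to be stored or streamed per timestep; each later snapshot can be rebuilt by applying the logs in order.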
3 DYGRAPH
In this section, we present the DyGraph Generator, and then sum-
marize the datasets adapted for benchmark distribution. Both can
be found at adacenter.org/dygraph.
3.1 DyGraph Generator
The DyGraph Generator is designed for three use cases. First, users
can create dynamic graphs from scratch. Second, they can automat-
ically generate datasets with properties that mimic those of other
input dynamic graphs via an automatically-generated intermediate
script. This feature allows users to create and share datasets with
the same profile as other graphs that they may be unable to share.
Third, users can modify this script to create augmented versions
of an input dynamic graph dataset, such as increasing the number
of vertices or edges. The DyGraph-generated scripts assume that
the input graph follows the Power Law, but commands are also
available to add and remove individual vertices and edges. This
latter functionality gives users the flexibility to create graphs with
any degree distribution, where degree is the number of neighbors
that a vertex is connected to. Note that the use of scripting allows
researchers to both share key properties of the datasets used in their
evaluations, and also give other researchers the means to create
their own similarly-profiled graphs, all without needing to publicly
distribute the original datasets.
3.1.1 DyGraph Generator commands. Users create dynamic graphs
from scratch by providing a list of commands (via console or script).
Commands to modify the graph either add vertices, add edges,
remove vertices, remove edges, or commit graph changes to the
current timestep. When a vertex is removed, all its edges are also re-
moved. Table 1 lists the primary DyGraph commands and their com-
mand parameters. DyGraph also provides the opportunity to add
or remove specific vertices by vertexID, and to add or remove spe-
cific edges by pairs of vertexIDs, enabling finer-grain control when
needed. Note that vertices have unique identifiers even when re-
moved from the graph, and that both new and previously-removed
vertices may be re-added using these commands, specified by vertexID.
These features are detailed in the DyGraph Generator’s user
manual, distributed with the suite. Graph modifications to be applied to different
timesteps are separated by commit commands.
Note that when vertices are added, they have no neighbors. Ver-
tices with no neighbors are omitted from output files when a commit
occurs. There are three ways to add edges to vertices: (1) single-item,
(2) with uniform random distribution, and (3) with a power-law
distribution. To add a single edge, users specify the two vertices
to connect. To add randomly distributed edges, users specify the
number of edges to add: DyGraph adds that many edges, selecting
which existing vertices to connect with a uniform probability for
each edge. Finally, to add edges via Power Law, users invoke the
“add edge power law” command, as described below.
3.1.2 Power-law dynamic graph generation. To create a power-law
graph, DyGraph first determines the desired degree distribution,
then modifies the number of edges in the existing set. Note that
commands to create power-law graphs are commands to add edges;
these commands do not add any vertices. Indeed, vertices must be
created before adding any edge. New edges modify the degree of
each vertex so as to fit into a power-law distribution. The proper
degree distribution is calculated by leveraging a combination of (i)
an exponential decay (power-law) function and (ii) a set of bins.
The power-law function can be represented as:
$$|V(\mathit{deg})| = K_s \cdot \mathit{deg}^{S_d} \tag{1}$$
where $\mathit{deg}$ is the degree and $|V(\mathit{deg})|$ is the number of vertices with
degree $\mathit{deg}$. $K_s$ is a scaling factor for the number of vertices at each
degree, and $S_d$ ($< 0$) controls the slope of the decay of $|V(\mathit{deg})|$
as $\mathit{deg}$ increases. The DyGraph Generator determines how
many edges to add and which pairs of vertices to connect with
those edges, so as to attain the power-law properties specified by $K_s$
and $S_d$.
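To make Equation (1) concrete, the sketch below evaluates the target number of vertices at each degree for given parameters. The function name and the example values in the usage note are illustrative assumptions; how DyGraph rounds these counts is not stated in the text, so the values are left as floats.

```python
def power_law_counts(ks, sd, max_deg):
    """Return {deg: |V(deg)|} for deg = 1..max_deg, with |V(deg)| = Ks * deg**Sd."""
    if sd >= 0:
        raise ValueError("Sd must be negative for a decaying distribution")
    return {deg: ks * deg ** sd for deg in range(1, max_deg + 1)}
```

For example, `power_law_counts(1000, -2, 10)` asks for 1000 vertices of degree 1, 250 of degree 2, and roughly 10 of degree 10. On a log-log plot these points fall on a straight line with slope `sd`.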
[Figure 3 diagram: log-scale plot of # Vertices versus Degree. The Power Law curve is used up to the Threshold, where binning starts; Bin 0 (Deg 14-23) and Bin 1 (Deg 24-33) follow, annotated with BinWidth, BinSize, and LocalMax.]
Figure 3: Schematic of the DyGraph Generator’s binning
process to approximate real-world power-law graphs. Degree
distribution is set by Power Law for low-degree vertices and
bin parameters for high-degree vertices.
3.1.3 Limitation of basic power-law distribution. We found that
many real-world power-law graphs do not precisely fit a power-law
distribution, as demonstrated in Section 4.2. This is expected,
as Power Law provides an efficient mechanism to approximate
many graphs arising from real-world situations, but it is still just
an approximation. The most impactful difference between an ideal
power-law graph and those we have observed in our real-world
datasets lies in the frequency of the high-degree vertices, that is,
those few vertices that have a high number of incident edges. Indeed,
many real graphs have more such vertices than predicted by a
basic power-law distribution. Figure 3 provides a schematic of this
divergence: the power-law distribution fitted to a graph
effectively approximates the low-degree vertices
on the left part of the figure, but fails to model the high-degree
vertices on the right.
In striving to design a high-accuracy generator, we assign each
degree in the distribution to one of two sections: (i) Power Law
for low-degree vertices and (ii) a process leveraging vertex binning
for high-degree vertices, as discussed below. A user-defined threshold
determines which of the two approaches DyGraph will use in
modeling the connectivity of each vertex; that is, the threshold separates
high and low degrees:
$$\mathit{minDeg} = \min(\mathit{deg}) \quad \text{s.t.} \quad |V(\mathit{deg})| < H \tag{2}$$
where 𝐻 is the threshold (see Figure 3). Vertices with degrees lower
than 𝑚𝑖𝑛𝐷𝑒𝑔 are connected based on Power Law; vertices with
higher degrees follow the binning process described below in Sec-
tion 3.1.4. We found that the exact value of 𝐻 has little effect on the
quality of the real-world graph approximation, because real-world
graphs diverge slowly from the power-law function as vertices’
degrees increase. We empirically found that several graphs deviate
from Power Law at 𝐻 ≈ 10, so we set 𝐻 = 10.
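Equation (2) can be read as a simple scan over the degree histogram. The sketch below is one possible reading, under the assumption that degrees with no vertices count as having zero vertices (and therefore fall below the threshold); the function name is hypothetical.

```python
def find_min_deg(hist, threshold=10):
    """Return the smallest degree whose vertex count is below the threshold.

    hist: dict mapping degree -> number of vertices with that degree.
    Degrees missing from hist are treated as having zero vertices
    (an assumption of this sketch). Returns None if every degree up to
    the maximum has at least `threshold` vertices.
    """
    for deg in range(1, max(hist) + 1):
        if hist.get(deg, 0) < threshold:
            return deg
    return None
```

Degrees below the returned value would be modeled by the power-law function, and degrees at or above it by the binning process.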
3.1.4 Binning for power-law graphs. In order to model the vertices’
distribution in the segment where 𝑑𝑒𝑔 > 𝑚𝑖𝑛𝐷𝑒𝑔, which is the long
tail of the power-law distribution, we chose to approximate the tail
as a series of boxes, each of width binWidth and of height localMax.
We call these boxes “bins.” Below we describe the key traits we
capture for each bin.
Table 2: DyGraph Power-law Command Parameters
Param.      Description
𝐾𝑠          Power-law scaling factor
𝑆𝑑          Power-law decay factor (< 0)
maxDeg      Maximum degree among all vertices
bins        # bins for power-law add commands
binID       Bin ID number (0..𝑏𝑖𝑛𝑠 − 1)
binWidth    # degrees in the bin
binSize     # degrees in the bin with > 0 vertices
localMax    Max # vertices with the same degree
Bins represent degree intervals; that is, each vertex whose degree
is within a certain range is assigned to the same bin. The “add edge
power law” command allows users to specify the number of bins,
while the DyGraph Generator labels each bin with its own identifier
binID. The first bin begins at degree 𝑚𝑖𝑛𝐷𝑒𝑔. The last bin contains
maxDeg, that is, the highest degree of any vertex in the graph.
Each bin is specified by three key parameters: binWidth, localMax,
and binSize. BinWidth is the width of the interval in degrees
assigned to the bin (i.e., the degree capacity of a bin). The DyGraph
Generator creates bins that all have the same binWidth. LocalMax
is the maximum number of vertices with the same degree within the
bin. BinSize is the number of degree values within a bin that have
at least one vertex at that degree, that is, $|V(\mathit{deg})| > 0$. Each bin has
a specific localMax and binSize. Table 2 summarizes all parameters
described.
The binWidth parameter determines the granularity we use in
modeling the distribution. We use localMax to capture how vertices
in each bin become more sparse as degree increases. In other words,
monitoring localMax helps us avoid the generation of spikes in the
distribution. We track the binSize parameter because the far end
of the tail is often rarefied; that is, there are very few vertices, and
many degrees have no vertex associated with them. BinSize and
localMax help us capture and reproduce that sparsity.
Consider the example in Figure 4. The first degree with fewer
than 10 vertices is degree 50, so minDeg is 50. BinWidth has been
set to 100, and bin 0 thus spans degrees 50 to 149. Bin 0 has
a binSize of 97 and a localMax of 18. If the DyGraph Generator
were to add edges so as to preserve the currently captured traits of
the graph distribution, it would add edges only so that vertices end up
at one of the 97 unique degrees, between 50 and
149, that already have at least one vertex. In addition, it would not
modify the localMax; thus, each degree would have no more than
18 vertices mapped to it. A similar analysis would take place for
bin 1, where the DyGraph Generator would add edges only so that
vertices fall into one of the 58 degrees, between 150 and 249, that
already have vertices mapped to them. The DyGraph Generator would
also maintain the localMax for vertices in this bin as 5.
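The bin parameters in this example can be extracted from a degree histogram with a short scan. The sketch below is an illustration under assumptions: the function name and the returned dictionary layout are hypothetical, and only the definitions of binWidth, binSize, and localMax come from the text.

```python
def bin_parameters(hist, min_deg, bin_width):
    """Compute binSize and localMax for each fixed-width bin of the tail.

    hist: dict mapping degree -> number of vertices with that degree.
    Bins start at min_deg and continue until the bin containing the
    maximum degree. Returns a list of dicts, one per bin.
    """
    max_deg = max(hist)
    bins = []
    start, bin_id = min_deg, 0
    while start <= max_deg:
        # Vertex counts of the degrees in this bin that are actually occupied.
        occupied = [hist[d] for d in range(start, start + bin_width)
                    if hist.get(d, 0) > 0]
        bins.append({
            "binID": bin_id,
            "range": (start, start + bin_width - 1),
            "binSize": len(occupied),               # degrees with > 0 vertices
            "localMax": max(occupied, default=0),   # tallest degree in the bin
        })
        start += bin_width
        bin_id += 1
    return bins
```

With minDeg = 50 and binWidth = 100, the first two bins cover degrees 50 to 149 and 150 to 249, matching the ranges in the Figure 4 example.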
3.1.5 Adding edges using binning. The power-law and binning
structures above provide a complete map of how new edges should
be distributed over the entire graph (set of vertices). Once this
map is computed, the DyGraph Generator adds edges such that
the resulting graph’s degree distribution aligns with the new
distribution. Note that the user controls the degree distribution of
the final graph, not the specific number of edges to add.
Table 3: Dynamic Graph Datasets
Name       Description                            |V|    Com. |E|   Max |E|   Timesteps   Time Span
AdTraffic  Online advertisement interactions      6M     410M       5.9M      135         30 days
Bitcoin    User-to-User bitcoin trust network     5k     3.7M       36k       147         N/A
Email      Internal research institution emails   1k     7.5M       167k      27          2.2 yrs
Forum      Private college forum interactions     899    33.7k      848       103         N/A
Higgs      Higgs-Boson Twitter Interactions       456k   34.3M      563k      125         30 days
Hospital   Patient-to-staff proximity             75     2.2M       32k       141         96 hrs
Movies     User-movie streaming network           138k   1.48B      16.4M     193         20 yrs
Music      User-music streaming network           92k    12.7M      186k      184         N/A
Ubuntu     AskUbuntu user-to-user interactions    159k   70.5M      964k      178         7.3 yrs
Figure 4: Degree distribution for the Ubuntu dataset at T=50,
showing binning parameter values in bins 0 and 1. (Plot: # Vertices versus Degree, with Bin 0 through Bin 3 marked; inset: LocalMax for bins 0 and 1, with 97/100 and 58/100 occupied degrees, respectively.)
To this end, DyGraph first computes the difference between the number of
vertices in each bin, and the final number of vertices that should
be in each bin. Then DyGraph determines which vertices should
be moved to each bin (unless they are already in the correct bin).
Finally, each vertex in each bin is assigned a final target degree,
and it should be connected to the additional number of edges as
required to reach its target degree. To track all the vertices that
should be connected to additional edges, we create an edge-addition
set, where each vertex is present as many times as the number of ad-
ditional incident edges it must receive. For example, if a vertex has
100 neighbors, and the new degree distribution dictates it should
have 115 neighbors, the vertex is added to the edge-addition set
15 times. Once the edge-addition set is built, edges are created by
connecting two vertices randomly from this set, removing both
from the set when connected. Self-loops (edges connecting a vertex
to itself) and duplicate edges (edges that already exist) are rejected.
These two aspects are easily modifiable by users.
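The edge-addition set described above can be sketched as follows. This is a minimal illustration under stated assumptions, not DyGraph's implementation: edges are treated as undirected, the function and parameter names are hypothetical, and a retry limit is added so the loop always terminates when only rejected pairings remain.

```python
import random


def add_edges_to_targets(edges, current_deg, target_deg, seed=0, max_tries=10_000):
    """Pair up vertices from an edge-addition set until targets are met.

    edges: existing (u, v) pairs, treated as undirected in this sketch.
    current_deg / target_deg: dicts mapping vertex -> degree.
    Returns (new_edges, leftover), where leftover holds the entries that
    could not be paired within max_tries attempts.
    """
    rng = random.Random(seed)
    existing = {frozenset(e) for e in edges}
    # Each vertex appears once per additional incident edge it must receive.
    pool = [v for v, tgt in target_deg.items()
            for _ in range(max(0, tgt - current_deg.get(v, 0)))]
    new_edges = []
    tries = 0
    while len(pool) >= 2 and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(pool)), 2)
        u, v = pool[i], pool[j]
        if u == v or frozenset((u, v)) in existing:
            continue  # reject self-loops and duplicate edges
        existing.add(frozenset((u, v)))
        new_edges.append((u, v))
        # Remove both endpoints from the set, higher index first.
        for k in sorted((i, j), reverse=True):
            pool.pop(k)
    return new_edges, pool
```

A vertex that needs 15 more neighbors contributes 15 entries to the pool, so it is 15 times as likely to be drawn as a vertex that needs one.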
The DyGraph Generator allows users to add edges with power-
law characteristics through two commands: “build” and “add edge
power law.” “Build” is a special case of “add edge power law”, to be
used for an initial graph with no existing edges. In that case, the
edge-addition set is created solely from a completely new degree
distribution (since the existing degree distribution is empty).
3.1.6 Removing edges. There are two ways to remove edges with
the DyGraph Generator: by removing a single edge, or by removing
multiple edges in a uniformly random fashion. Note that, when
removing edges, DyGraph can only choose to remove edges that
are already present in the graph. If the existing set of edges is
uniformly random among all vertices, then the multiple-edge
removal is also uniformly random. Likewise, if the existing set of
edges was distributed in accordance with Power Law, removing
edges randomly naturally abides by Power Law as well.
3.1.7 Automated script generation. In addition to enabling users to
write their own DyGraph scripts, DyGraph can also automatically
generate scripts that mimic an input dynamic graph: it analyzes the
graph to extract the parameters described (vertices to add, power-law
parameters, bin parameters, etc.), using a pre-set binWidth. This
process can be applied to all timesteps, so that the script generates
a dynamic graph with the same degree distribution trends. This
script can then be used as is, or modified before generating the
synthetic graph.
3.2 Datasets
We provide a collection of real-world dynamic graph datasets for
two reasons. First, researchers may use existing datasets as a fast,
valuable first step to measure initial results. Second, users may ana-
lyze these existing datasets to find and adjust values for DyGraph’s
power-law command parameters.
We list the datasets included with DyGraph in Table 3, along with
several of their key characteristics. Though the original datasets
are publicly available in a variety of formats, we have converted
them into a consistent format: a series of separate edge-list files,
one for each timestep, as described in Section 2. With reference
to Table 3, |𝑉 | is the number of unique vertices in the dataset.
Combined Edges (𝐶𝑜𝑚.|𝐸|) reports the sum of all edges across all
timesteps, counting each occurrence of edges that are listed in
multiple discrete timesteps. Max Edges (𝑀𝑎𝑥|𝐸|) reports the peak
number of edges in a timestep, thus tracking how large the dynamic
graph becomes throughout its evolution. Timesteps reports the
number of individual graph states included in the dynamic graph;
it also corresponds to the number of separate files. Finally, Time
Span is the duration of time represented by the dataset, as reported
by its original source.
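The Table 3 columns can be computed directly from the per-timestep edge lists. The sketch below assumes each snapshot is already loaded as a list of (source, destination) pairs; the function name and dictionary keys are illustrative.

```python
def dataset_summary(snapshots):
    """Summarize a dynamic graph given as a list of per-timestep edge lists."""
    vertices = set()
    for edges in snapshots:
        for src, dst in edges:
            vertices.update((src, dst))
    sizes = [len(edges) for edges in snapshots]
    return {
        "|V|": len(vertices),            # unique vertices across all timesteps
        "Com.|E|": sum(sizes),           # edges counted once per timestep they appear in
        "Max|E|": max(sizes, default=0), # peak number of edges in one timestep
        "Timesteps": len(snapshots),     # number of snapshots (files)
    }
```

An edge that persists across many timesteps is counted in each of them, which is why Combined Edges can far exceed Max Edges in Table 3.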
Our dataset repository includes 9 datasets. The AdTraffic dataset
[9] is a sample of live online advertisement traffic from Criteo, a
computational advertising company. The Bitcoin dataset [18] [17]
represents a trust network of users of Bitcoin OTC, an over-the-
counter bitcoin trading marketplace. We include three social media
datasets: Email, Forum, and Higgs. The Email dataset [20] [36]
represents internal email transmissions among members of a Eu-
ropean research institute, thus a corporate communication setting.
The Forum dataset [27] is the network of both group and direct
messages among members of a private forum for college students.
The Higgs dataset [8] contains the replies, retweets, and mentions
of the Higgs-Boson particle discovery on Twitter. The Hospital
dataset [31] maps which patients and staff had close contact in a
hospital. We also include two datasets for recommendation system
applications. The Music dataset [6] is the network of 2,000 Last.fm
users and the music they played. The Movies dataset [14] is the
set of 5-star ratings and text reviews of randomly selected users of
MovieLens with >20 reviews. Finally, the Ubuntu dataset [28] is
the set of user interactions in the AskUbuntu online forum.
Figure 5: Log-log plots of degree distributions for the original input AdTraffic dataset (blue) and the synthetic dataset generated
by the DyGraph Generator (green) at three time steps: 0, 50, and 100. The plots are overlaid with the power-law function derived
from the dataset by DyGraph (pink). (Panels: T=0, T=50, T=100; axes: # Vertices versus Degree; legend: Original Dataset, Made by DyGraph, Power Law Function.)
Figure 6: Log-log plots of (left) the degree distributions over time for the original AdTraffic dataset, (center) the power-law
functions and their parameters derived from the original dataset by DyGraph, and (right) the synthetic dataset generated by
DyGraph to match the original. (Each panel shows T=0, T=10, and T=100; axes: # Vertices versus Degree. The center panel lists the fitted ($K_s$, $S_d$) pairs: ($10^{3.3}$, -2.3), ($10^{5.7}$, -6.1), ($10^{6.8}$, -6.0).)
4 EXPERIMENTAL EVALUATION
In this section, we evaluate how the DyGraph Generator can build
graph datasets that closely resemble real-world graphs. We divide
our analysis into three sections: defining the evaluation metric,
showing how DyGraph can create graphs that mimic real-world
graphs, and demonstrating how users can edit these DyGraph
scripts to customize their graphs’ properties to their own needs.
4.1 Evaluation Metric
To evaluate whether DyGraph creates graphs with similar prop-
erties to an input graph, we must choose a metric to compare the
two graphs. Common metrics include diameter, clustering coef-
ficient, triangle counting, and degree distribution. For this work,
metrics must be applicable to all possible real-world graphs, allow
us to compare the size of two graphs, and demonstrate the dataset’s
sparsity, an essential factor in evaluating performance. Diameter
requires that there exist paths from all vertices to all other vertices
and ignores any disconnected components. Many real-world graphs
do not have this property. Triangle counting is heavily affected by
the size of the graph. Clustering coefficient can be measured for all
graphs, but it is unclear whether differences in power-law graph
sizes affect the clustering coefficient. We therefore use degree
distribution, the only common metric that fully meets these conditions.
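To make the metric concrete, the following is a minimal sketch of how a degree distribution can be computed from an edge list. It is not taken from the DyGraph code base; the function names and the assumption of an undirected edge list are ours, chosen only for illustration. The same pass also yields the average degree, a simple indicator of sparsity.

```python
from collections import Counter


def degree_distribution(edges):
    """Map each degree to the number of vertices that have it.

    `edges` is an iterable of (u, v) pairs, treated as undirected
    (an assumption made for this sketch).
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Count how many vertices share each degree value.
    return dict(Counter(degree.values()))


def average_degree(edges):
    """Average degree over vertices that appear in at least one edge."""
    edges = list(edges)
    vertices = {x for edge in edges for x in edge}
    return 2 * len(edges) / len(vertices)


example = [(0, 1), (0, 2), (0, 3), (1, 2)]
distribution = degree_distribution(example)
```

Because the result is a histogram over degrees, it is defined for any graph, connected or not, which is the property the discussion above relies on.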
4.2 Mimicking Existing Graphs
We evaluate whether the DyGraph Generator builds synthetic graphs
whose degree distributions resemble those of real-world input graphs.
We use one of the largest of our dynamic graphs: AdTraffic. As the
graph changes over time, the degree distribution also changes, so we
evaluate the distribution at multiple timesteps.
Figure 5 shows degree distributions of AdTraffic and DyGraph’s
generated dataset at three timesteps. For degrees from 1 to 200, the
correlation coefficient (𝑟) of the DyGraph-generated dataset against
the original is 2.98x closer to ideal (𝑟 = 1) than that of the Power
Law function against the original when T=50, and 5.57x closer when
T=100. Similarly, the deviation from the original, measured with the
two-sample Kolmogorov–Smirnov metric, improves from 0.053 for Power
Law to 0.023 for DyGraph at 𝑇 = 50, and from 0.086 to 0.015 at
𝑇 = 100.
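The two comparisons above can be sketched as follows. This is an illustrative reading rather than the evaluation code used for the paper: we assume the correlation is a Pearson coefficient over per-degree vertex counts, that "closer to ideal" means the ratio of the two gaps to 𝑟 = 1, and that the Kolmogorov–Smirnov statistic is taken over degree histograms. All function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def closeness_gain(r_baseline, r_candidate):
    """How many times closer to r = 1 the candidate is than the baseline
    (our interpretation of the 'Nx closer to ideal' figures)."""
    return (1 - r_baseline) / (1 - r_candidate)


def ks_statistic(hist_a, hist_b):
    """Two-sample KS statistic between two {degree: count} histograms:
    the largest gap between their empirical cumulative distributions."""
    total_a, total_b = sum(hist_a.values()), sum(hist_b.values())
    cdf_a = cdf_b = worst = 0.0
    for degree in sorted(set(hist_a) | set(hist_b)):
        cdf_a += hist_a.get(degree, 0) / total_a
        cdf_b += hist_b.get(degree, 0) / total_b
        worst = max(worst, abs(cdf_a - cdf_b))
    return worst
```

Under this reading, a lower KS value and a correlation nearer to 1 both indicate a synthetic degree distribution that tracks the original more closely.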
We make three observations from these plots. First, the power-law
function aligns with the AdTraffic dataset only for low-degree
vertices (i.e., the left side of each plot). Once the power-law line
falls below ∼10 vertices, the degree distributions of both the
original and the synthetic dataset diverge from the power-law line
(pink). This divergence shows that the binning process successfully
ensures that the synthetic graph has a similar number of higher-degree
vertices, more than would be included by Power Law alone. Second, the
larger the graph, the more closely DyGraph’s degree distribution
aligns with the original dataset, as DyGraph has more opportunities
to create the precise distribution that it is targeting. Third,
while DyGraph generally creates a degree distribution shape that
matches the original far more closely than the power-law function, it
also tends to create vertices with slightly lower average degree than
the original. This deviation is caused by two constraints in DyGraph:
self-loops and duplicate edges are eliminated, and vertices with no
neighbors are omitted. As DyGraph is an open-source suite, users
may remove these constraints if desired.
[Figure 7 plot: (a) log-log degree distribution at T=30 (x-axis:
Degree; y-axis: # Vertices); (b) # Vertices and (c) # Edges for
T = 10 to 50; series: Original, DyGraph, DyG (x2).]
Figure 7: (a) Log-log plot of degree distribution, (b) vertex
count across time, and (c) edge count across time for the
original AdTraffic dataset, a matching synthetic DyGraph-generated
graph, and a synthetic graph generated by doubling the original
graph size in the script.
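The effect of these two constraints can be illustrated with a small sketch. It is not DyGraph's C++ implementation; the function name, the undirected treatment of edges, and the toy input are assumptions made for illustration only.

```python
def apply_constraints(candidate_edges):
    """Drop self-loops and duplicate edges, then keep only vertices
    that still have at least one neighbor.

    Edges are treated as undirected (an assumption of this sketch),
    so (1, 0) is a duplicate of (0, 1).
    """
    kept_edges = set()
    for u, v in candidate_edges:
        if u == v:
            continue  # self-loop: eliminated
        kept_edges.add((min(u, v), max(u, v)))  # duplicates collapse
    kept_vertices = {x for edge in kept_edges for x in edge}
    return sorted(kept_edges), sorted(kept_vertices)


# Vertex 2 only has a self-loop, so it ends up with no neighbors
# and is omitted from the output.
candidates = [(0, 1), (1, 0), (2, 2), (0, 3)]
edges, vertices = apply_constraints(candidates)
```

In this toy input, four candidate edges shrink to two, which shows how filtering removes degree that a sampled distribution would otherwise have contributed.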
Figure 6 illustrates how the AdTraffic dynamic graph evolves
over time and how the DyGraph-generated graph matches this
evolution. Figure 6 also highlights how the power-law properties
change over time. Initially (T=0 to T=10), more vertices are added
and connected to few neighbors, but few existing vertices are
connected to many neighbors. This leads to a higher scaling factor
(𝐾𝑠 rises from 10^3.3 to 10^5.7) but a sharper relative decline as
degree increases (𝑆𝑑 falls from -2.3 to -6.1). After the initial
growth (T=10 to T=100), 𝐾𝑠 continues to increase and 𝑆𝑑 stays steady
around -6.0. This reflects a trend in which new vertices with few
neighbors continue to join the graph, but many more of the existing
vertices connect to additional neighbors already in the graph.
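As a worked example of what these parameters imply, the sketch below evaluates a power-law curve at the three reported time steps. It rests on two assumptions that are ours: that the curve has the form v(d) = 𝐾𝑠 · d^𝑆𝑑 (vertex count v at degree d), and that the reported 𝐾𝑠 values are powers of ten (3.3, 5.7, and 6.8 as exponents). The function names are hypothetical.

```python
def power_law_count(degree, log10_ks, sd):
    """Expected number of vertices at `degree`, assuming the form
    v(d) = K_s * d**S_d with K_s = 10**log10_ks."""
    return 10 ** log10_ks * degree ** sd


def degree_where_count_hits_one(log10_ks, sd):
    """Degree at which the assumed curve predicts a single vertex:
    K_s * d**S_d = 1  =>  d = 10 ** (log10(K_s) / -S_d)."""
    return 10 ** (log10_ks / -sd)


# (log10 K_s, S_d) at each time step, as reported for Figure 6.
params = {0: (3.3, -2.3), 10: (5.7, -6.1), 100: (6.8, -6.0)}
cutoffs = {t: degree_where_count_hits_one(k, s) for t, (k, s) in params.items()}
```

Under these assumptions the curve predicts fewer than one vertex beyond a degree of roughly 27 at T=0, 9 at T=10, and 14 at T=100, so a pure power law would assign almost no vertices to higher degrees.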
4.3 Customizing Graph Properties
As described in Section 3.1, we designed DyGraph to automatically
generate a script which can, in turn, be used to generate a synthetic
graph with the same characteristics as the original. We leveraged
this approach to give users the flexibility to modify existing graph
properties and create variants of the original real-world graphs
provided with the benchmark suite. To demonstrate this feature, we
take the AdTraffic dataset, have DyGraph produce the generating
script, then modify that script to double both vertices and edges.
We edited the script as follows. First, for each command adding
𝑛 vertices, we changed 𝑛 to 2𝑛. Second, for each command adding
edges to follow power-law properties, we scaled up both sections
of the degree distribution. For the power-law section, we increased
𝐾𝑠 so as to double the number of vertices at each degree. Finally,
for the binning section of the degree distribution, we attained
our goal by doubling each localMax.
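The three edits can be sketched on a stand-in script representation. The actual DyGraph script syntax is not reproduced here: the list-of-dictionaries format, the command names, and the treatment of localMax as a list of per-bin values are all hypothetical, as is storing 𝐾𝑠 as a base-10 exponent.

```python
import math


def double_script(script):
    """Return a copy of `script` edited to double vertices and edges.

    `script` is a list of command dicts in a made-up format used only
    for this sketch; it is not DyGraph's real script syntax.
    """
    doubled = []
    for command in script:
        command = dict(command)  # shallow copy, original left untouched
        if command["op"] == "add_vertices":
            command["n"] *= 2  # n -> 2n
        elif command["op"] == "add_edges_power_law":
            # Doubling the count at every degree multiplies K_s by 2,
            # i.e. adds log10(2) to its base-10 exponent.
            command["log10_ks"] += math.log10(2)
            # Binning section: double each localMax value.
            command["localMax"] = [2 * m for m in command["localMax"]]
        doubled.append(command)
    return doubled


script = [
    {"op": "add_vertices", "n": 1000},
    {"op": "add_edges_power_law", "log10_ks": 3.3, "sd": -2.3, "localMax": [4, 2]},
]
doubled = double_script(script)
```

Note that 𝑆𝑑 is left unchanged: scaling 𝐾𝑠 shifts the power-law line upward on a log-log plot without altering its slope.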
[Figure 8 plot: execution time in seconds (y-axis, 1k to 4k) at
T = 0, 10, 20, 30, 40, with # Edges labels 0.57M, 1.66M, 2.31M,
2.72M, 3.25M; series: Original AdTraffic, 2x Vertices, 2x Edges.]
Figure 8: DyGraph Generator execution time for generating
the synthetic graphs of Figure 7.
Figure 7 plots the degree distribution of the original AdTraffic
dataset, overlaid with that of the matching synthetic
DyGraph-generated graph, and also with that of the “doubled”
synthetic graph, obtained with the modified script. As the figure
presents the graphs in a log-log plot, the doubled synthetic graph
closely overlaps with the original synthetic graph, indicating
extremely similar degree distributions. Figure 7 also reports how the
numbers of vertices and edges change over time. Note how the edge
counts of the original graph and the synthetic DyGraph graph align
closely. In addition, the “doubled” synthetic graph reports
approximately double the vertices and edges. Note also how the
baseline synthetic graph (green) consistently includes slightly fewer
vertices than the real-world graph, because of the constraints
described in Section 4.2.
4.4 DyGraph Generator Performance
The DyGraph Generator is open-source software written in C++.
Figure 8 reports execution times for building the synthetic graphs
shown in Figure 7, timed on a machine running Ubuntu 20.04 with an
Intel i7-7700, 32GB of memory, and a 2TB HDD. We observe that
increasing either dimension of a graph (vertices or edges) extends
the execution time. We further observe that increasing |𝐸| takes
more time than a similar increase in |𝑉 |. Three factors explain this.
First, adding vertices alone requires almost no additional
computation, as new vertices start with no neighbors. Runtime
increases only because a larger set of vertices increases the size of
the distribution when adding edges. Second, adding edges with
power-law properties dominates execution time, as all other graph
updates require trivial amounts of compute beyond I/O. Finally,
doubling the vertex count without increasing the edge count results
in more vertices without edges, which are omitted from the output
graph. DyGraph is currently single-threaded, and we hope to
parallelize it as part of future work.
5 RELATED WORK
Prior works have observed that few dynamic graph generators exist
and emphasized that better tools are needed [32] [13] [5] [35]. Some
static graph generators can be used to build dynamic graphs.
Kronecker graph generators [19], for example, use Kronecker
multiplication to iteratively build increasingly large graphs
with power-law degree distribution. Similarly, Barabási-Albert (BA)
models [3] iteratively attach new vertices to pre-existing vertices
in the graph using preferential attachment. Sets of new Kronecker
multiplications or preferential attachments may be used as new
graph states, collectively forming a dynamic graph. However, such
generation methods enforce two undesirable restrictions: graph
sizes increase monotonically (i.e., vertices and edges are not re-
moved) and the density increases as the graph size increases.
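The monotonic-growth restriction can be seen in a minimal sketch of preferential attachment used as a dynamic graph, where each attachment round is recorded as a new graph state. This is a simplified illustration written for this discussion, not the formulation of [3] verbatim; the function name, seed graph, and parameters are assumptions.

```python
import random


def ba_snapshots(steps, edges_per_vertex=2, seed=0):
    """Grow a graph by preferential attachment and return the edge
    list after each round, treating rounds as dynamic graph states."""
    rng = random.Random(seed)
    edges = [(0, 1)]        # seed graph: a single edge
    endpoints = [0, 1]      # each vertex appears once per incident edge
    snapshots = [list(edges)]
    for new_vertex in range(2, 2 + steps):
        # Sampling from `endpoints` picks vertices proportionally
        # to their current degree.
        targets = set()
        while len(targets) < min(edges_per_vertex, new_vertex):
            targets.add(rng.choice(endpoints))
        for target in targets:
            edges.append((new_vertex, target))
            endpoints += [new_vertex, target]
        snapshots.append(list(edges))
    return snapshots


snapshots = ba_snapshots(steps=5)
edge_counts = [len(s) for s in snapshots]
vertex_counts = [len({x for e in s for x in e}) for s in snapshots]
```

Every snapshot contains all edges of the previous one, so nothing is ever removed, and in this sketch the ratio of edges to vertices rises from one state to the next.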
Görke et al. [10] proposed a model to generate uniformly random
graphs to evaluate algorithms for dynamic graph clustering. These
graphs follow an evolving ground truth for clustering, which can
be compared against the clusters discovered by the algorithm.
Unlike prior work, vertices and edges in this model may be added or
removed over time. However, the model can only generate graphs
with uniformly random degree distribution and is thus unable to
mimic many real-world applications.
Purohit et al.’s more flexible model [30] uses atomic, temporal graph
motifs (i.e., sub-graph patterns of ≤ 3 vertices). This model is
capable of mimicking many key properties of existing dynamic graphs,
including the original dataset’s growth rate, structure, and degree
distribution. However, vertices and edges cannot be removed, again
limiting its ability to emulate key applications [23].
Waudby et al. [35] extended the LDBC Social Network Benchmark
[2] to include insertions and deletions for dynamic graphs using
lifespans. However, there is no functionality to mimic properties of
existing graphs. This limitation restricts users’ ability to expand
their own graphs, which have specific desired properties but are
too small to be used for effectively evaluating their solutions.
6 CONCLUSION
Dynamic graph processing is quickly becoming a critical area of
data mining and analytics. As researchers develop new algorithms,
optimizations, and hardware solutions for dynamic graphs, there
is an urgent need for more robust infrastructure to evaluate these
solutions. In this work, we present DyGraph, a solution for dynamic
graph workloads that includes both a wide range of real-world
dynamic graph datasets and a novel, flexible DyGraph Generator
for synthetic dynamic graph datasets. We demonstrate how the
synthetic graphs created by the DyGraph Generator closely mimic
the properties of real-world dynamic graphs, with degree
distributions that correlate 3 to 5.5 times more closely with the
real-world datasets than Power Law. Further, we illustrate
that the DyGraph Generator can automatically produce a script
describing a real-world graph by analyzing it; such scripts can
later be modified to generate new synthetic graphs of any size and
any power-law characteristics, allowing users to create
variants of real-world datasets to fit their needs and evaluations.
ACKNOWLEDGMENTS
This work was supported by the Applications Driving Architectures
(ADA) Research Center, a JUMP Center co-sponsored by SRC and
DARPA.
REFERENCES
[1] Abraham Addisie and Valeria Bertacco. 2020. Centaur: Hybrid Processing in
On/Off-chip Memory Architecture for Graph Analytics. In Proc. DAC.
[2] Renzo Angles, János Benjamin Antal, Alex Averbuch, Peter A. Boncz, Orri Erling,
Andrey Gubichev, Vlad Haprian, Moritz Kaufmann, Josep Lluís Larriba-Pey, Nor-
bert Martínez-Bazan, József Marton, Marcus Paradies, Minh-Duc Pham, Arnau
Prat-Pérez, Mirko Spasic, Benjamin A. Steer, Gábor Szárnyas, and Jack Waudby.
2020. The LDBC Social Network Benchmark. In arXiv CoRR.
[3] Albert-László Barabási and Réka Albert. 1999. Emergence of scaling in random
networks. In Science.
[4] Maciej Besta, Marc Fischer, Vasiliki Kalavri, Michael Kapralov, and Torsten Hoe-
fler. 2021. Practice of streaming processing of dynamic graphs: Concepts, models,
and systems. In Proc. TPDS.
[5] Angela Bonifati, Irena Holubová, Arnau Prat-Pérez, and Sherif Sakr. 2020. Graph
Generators: State of the Art and Open Challenges. In ACM Comput. Surv.
[6] Iván Cantador, Peter Brusilovsky, and Tsvi Kuflik. 2011. Workshop on Information
Heterogeneity and Fusion in Recommender Systems. In Proc. RecSys.
[7] Avery Ching, Sergey Edunov, Maja Kabiljo, Dionysios Logothetis, and Sambavi
Muthukrishnan. 2015. One trillion edges: Graph processing at facebook-scale. In
Proc. VLDB.
[8] Manlio De Domenico, Antonio Lima, Paul Mougel, and Mirco Musolesi. 2013.
The Anatomy of a Scientific Rumor. In Nature Sci. Rep.
[9] Eustache Diemert, Julien Meynet, Pierre Galland, and Damien Lefortier. 2017.
Attribution Modeling Increases Efficiency of Bidding in Display Advertising. In
Proc. ADKDD.
[10] Robert Görke, Roland Kluge, Andrea Schumm, Christian Staudt, and Dorothea
Wagner. 2012. An efficient generator for clustered dynamic random networks. In
Proc. MedAlg.
[11] Tae Jun Ham, Lisa Wu, Narayanan Sundaram, Nadathur Satish, and Margaret
Martonosi. 2016. Graphicionado: A high-performance and energy-efficient accel-
erator for graph analytics. In Proc. MICRO.
[12] Kathrin Hanauer, Monika Henzinger, and Christian Schulz. 2020. Fully dynamic
single-source reachability in practice: An experimental study. In Proc. ALENEX.
[13] Kathrin Hanauer, Monika Henzinger, and Christian Schulz. 2021. Recent advances
in fully dynamic graph algorithms. arXiv preprint.
[14] Maxwell Harper and Joseph Konstan. 2015. The MovieLens Datasets: History
and Context. In ACM Trans. iiS.
[15] Takanori Hayashi, Takuya Akiba, and Ken-ichi Kawarabayashi. 2016. Fully
dynamic shortest-path distance query acceleration on massive networks. In Proc.
CIKM.
[16] Vasiliki Kalavri, Vladimir Vlassov, and Seif Haridi. 2018. High-Level Programming
Abstractions for Distributed Graph Processing. In IEEE Trans. KDE.
[17] Srijan Kumar, Bryan Hooi, Disha Makhija, Mohit Kumar, Christos Faloutsos, and
V.S. Subrahmanian. 2018. REV2: Fraudulent User Prediction in Rating Platforms.
In Proc. WSDM.
[18] Srijan Kumar, Francesca Spezzano, V. S. Subrahmanian, and Christos Faloutsos.
2016. Edge Weight Prediction in Weighted Signed Networks. In Proc. ICDM.
[19] Jurij Leskovec, Deepayan Chakrabarti, Jon Kleinberg, and Christos Faloutsos.
2005. Realistic, mathematically tractable graph generation and evolution, using
kronecker multiplication. In Proc. ECML PKDD.
[20] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. 2007. Graph Evolution:
Densification and Shrinking Diameters. In ACM Trans. KDD.
[21] Jure Leskovec and Rok Sosič. 2016. SNAP: A General-Purpose Network Analysis
and Graph-Mining Library. In ACM Trans. IST.
[22] Zhe Lin, Fan Zhang, Xuemin Lin, Wenjie Zhang, and Zhihong Tian. 2021. Hier-
archical core maintenance on large dynamic graphs. In Proc. VLDB.
[23] László Lőrincz, Júlia Koltai, Anna Fruzsina Győr, and Károly Takács. 2019.
Collapse of an online social network: Burning social capital to create it? In
Jour. Soc. Netw.
[24] Grzegorz Malewicz, Matthew Austern, Aart Bik, James Dehnert, Ilan Horn, Naty
Leiser, and Grzegorz Czajkowski. 2010. Pregel: a system for large-scale graph
processing. In Proc. SIGMOD.
[25] Andrew McCrabb, Eric Winsor, and Valeria Bertacco. 2019. DREDGE: Dynamic
repartitioning during dynamic graph execution. In Proc. DAC.
[26] Mark Newman. 2005. Power laws, Pareto distributions and Zipf’s law. In Jour.
Contemp. Phys.
[27] Tore Opsahl. 2013. Triadic closure in two-mode networks: Redefining the global
and local clustering coefficients. In Jour. Soc. Netw.
[28] Ashwin Paranjape, Austin Benson, and Jure Leskovec. 2017. Motifs in Temporal
Networks. In Proc. WSDM.
[29] Tiago Peixoto. 2020. The Netzschleuder Network Catalogue and Repository.
[30] Sumit Purohit, Lawrence Holder, and George Chin. 2018. Temporal graph gener-
ation based on a distribution of temporal motifs. In Proc. MLG.
[31] Ryan Rossi and Nesreen Ahmed. 2015. The Network Data Repository with
Interactive Graph Analytics and Visualization. In Proc. AAAI.
[32] Siddhartha Sahu, Amine Mhedhbi, Semih Salihoglu, Jimmy Lin, and Tamer Özsu.
2017. The Ubiquity of Large Graphs and Surprising Challenges of Graph Pro-
cessing. In Proc. VLDB.
[33] Xuanhua Shi, Zhigao Zheng, Yongluan Zhou, Hai Jin, Ligang He, Bo Liu, and
Qiang-Sheng Hua. 2018. Graph processing on GPUs: A survey. In CSUR.
[34] Philip Stutz, Abraham Bernstein, and William Cohen. 2010. Signal/Collect: Graph
Algorithms for the (Semantic) Web. In Proc. ISWC.
[35] Jack Waudby, Benjamin Steer, Arnau Prat-Pérez, and Gábor Szárnyas. 2020.
Supporting Dynamic Graphs and Temporal Entity Deletions in the LDBC Social
Network Benchmark’s Data Generator. In Proc. GRADES-NDA.
[36] Hao Yin, Austin Benson, Jure Leskovec, and David Gleich. 2017. Local Higher-
Order Graph Clustering. In Proc. SIGKDD.

 
ApproxBioWear: Approximating Additions for Efficient Biomedical Wearable Comp...
ApproxBioWear: Approximating Additions for Efficient Biomedical Wearable Comp...ApproxBioWear: Approximating Additions for Efficient Biomedical Wearable Comp...
ApproxBioWear: Approximating Additions for Efficient Biomedical Wearable Comp...
 
Income Tax Calender 2021 (ITD) : NOTES
Income Tax Calender 2021 (ITD) : NOTESIncome Tax Calender 2021 (ITD) : NOTES
Income Tax Calender 2021 (ITD) : NOTES
 
Youngistaan Foundation: Annual Report 2020-21 : NOTES
Youngistaan Foundation: Annual Report 2020-21 : NOTESYoungistaan Foundation: Annual Report 2020-21 : NOTES
Youngistaan Foundation: Annual Report 2020-21 : NOTES
 
Youngistaan: Voting awarness-campaign : NOTES
Youngistaan: Voting awarness-campaign : NOTESYoungistaan: Voting awarness-campaign : NOTES
Youngistaan: Voting awarness-campaign : NOTES
 

Recently uploaded

Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...ThinkInnovation
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 

Recently uploaded (20)

Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 

DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES

DyGraph: A Dynamic Graph Generator and Benchmark Suite

Andrew McCrabb
University of Michigan
mccrabb@umich.edu

Hellina Nigatu
University of California, Berkeley
hellina_nigatu@berkeley.edu

Absalat Getachew
Addis Ababa Institute of Technology
absalat.dawit@aait.edu.et

Valeria Bertacco
University of Michigan
valeria@umich.edu

ABSTRACT

Dynamic graph processing, execution on vertex-edge graphs that change over time, is quickly becoming a key computing need of the twenty-first century. Dynamic graph algorithms unlock real-time optimization solutions and a wide range of data-mining applications in logistics, finance, marketing, healthcare, and social media, among many others. However, graph algorithms are extremely memory-bound (i.e., their performance is limited by the bandwidth of memory accesses on the underlying hardware platform, rather than the compute capacity). Moreover, dynamic graph algorithms are being applied to increasingly-large datasets, further straining the memory systems and reducing performance. As a result, additional research is needed to leverage new memory technologies for faster, more efficient, dynamic graph-based processing. Such research is difficult without access to hitherto unavailable industrial-scale dynamic graph datasets to evaluate solutions.

In this work, we present DyGraph, a dynamic graph synthetic dataset generator paired with a collection of real-world graphs in the domains of social media, recommendation systems, and fintech. We demonstrate the breadth of graph features represented in this repository and evaluate the DyGraph Generator’s ability to generate synthetic graphs that mimic these real datasets. In our case study, we find that the degree distribution of DyGraph Generator datasets correlates 3 to 5.5 times more closely to real-world datasets than Power Law models, paving the way for much-needed research for high-performance dynamic graph processing.
ACM Reference Format:
Andrew McCrabb, Hellina Nigatu, Absalat Getachew, and Valeria Bertacco. 2022. DyGraph: A Dynamic Graph Generator and Benchmark Suite. In Joint Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA) (GRADES NDA’22), June 12, 2022, Philadelphia, PA, USA. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3534540.3534692

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
GRADES ’22, June 12, 2022, Philadelphia, PA
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9384-3/22/06...$15.00
https://doi.org/10.1145/3534540.3534692

1 INTRODUCTION

Networks of all sorts have become an unavoidable, underlying component of modern life. Messaging platforms like Facebook and Twitter are deeply embedded in modern social life. Logistics systems send packages anywhere in the world within days. GPS navigation provides instant turn-by-turn driving directions in a maze of interconnected roadways. Online retailers use market basket analysis to anticipate consumer needs and manage supply. Recommendation systems help deliver useful or entertaining content to users of media streaming services. Few industries are unaffected by today’s rich web of widespread networks.

Figure 1: A dynamic graph, represented as a discrete sequence of static graphs, where each graph corresponds to a snapshot of the dynamic graph at time $T$. Green elements are added and red elements are removed.
Such networks are digitally represented as vertex-edge graphs: for example, social networks map people to vertices and friendships to edges. Graph-based processing (or “graph processing”) has become a critical area of big-data research, as scientists in industry and academia seek to leverage it to generate field-specific knowledge, improve user experiences, and expand industry services.

As graph processing spreads, processing on dynamic graphs, that is, graphs whose topologies morph over time (see sketch in Figure 1), has become popular for applications with evolving or real-time data. Some dynamic graph applications extend their static counterparts to provide faster, more useful information to their users. Navigation apps must adapt almost instantly to changes in road networks, such as construction, congestion, or collisions. Dynamic market basket analysis allows retailers to quickly adapt to temporary surges in demand, such as non-perishable food before natural disasters or merchandise linked to trending music artists. In addition, dynamic graph processing enables new applications. For example, contact tracing lets medical experts track the spread of disease and create health policies to save lives in a global pandemic.

Graph processing performs poorly on typical modern computers because the graph data, especially in a large graph, is often sparse; that is, each data point is loosely connected to the rest of the structure [25]. This property makes it difficult to exploit spatial and temporal locality, which modern computers rely on for quick and efficient execution. Researchers have developed many solutions, both in hardware (including [11] [1] [25]) and software (including [12] [22] [15]), to improve performance and efficiency for both static and dynamic graph processing.

Research evaluations in this space use publicly-available, real-world graphs whenever possible to evaluate the proposed solutions.
Graph repositories, like SNAP [21] and Network Repository [31], provide several graph datasets, but many important types of graphs used in real-world applications are missing from these repositories. For example, in real-world scenarios, static graphs may include billions to trillions of edges [32] [7]. Additionally, many real graphs are power-law graphs; that is, a small set of vertices have a high degree (i.e., they have many neighbors) and the degree distribution plot approximates Power Law [26]. The available datasets are much smaller and do not cover the rich variety of realistic Power Law properties. Thus, to evaluate their work on such large graphs, researchers resort to synthetic graph generators. These tools create artificial graph datasets with certain pre-set properties, such as number of vertices, degree distribution (where degree is the number of edges incident to the vertex), clustering coefficients, etc. Synthetic generators thus bridge a crucial gap between what is publicly available and what is needed in the research community.

Figure 2: Dynamic graph algorithms’ structure: the compute stage creates per-vertex values for some (or all) vertices. The share stage transfers these values to neighbors. The update stage implements all topology updates that have been queued since the previous iteration.

This gap is even larger for dynamic graphs. A wider range of dynamic graphs is needed to evaluate new research projects because dynamic graphs are defined by a richer set of properties across different applications, such as the frequency of graph updates and changes in degree distribution over time. At the same time, even fewer dynamic graph datasets are available in public repositories for multiple reasons. First, collecting dynamic graph data can be prohibitively time-consuming and expensive.
Second, existing dynamic graphs, from sources like social media, often contain inherently identifiable data, limiting researchers’ ability to share them publicly. Third, the methods and information that companies use to collect their dynamic graph datasets may be protected as trade secrets.

Moreover, the offering of synthetic dynamic graph generators is extremely limited. The few available [2] [10] [29] [30] [35] are built for specific purposes and are ill-suited to evaluate novel work. This leaves dynamic graph researchers with few options other than artificially creating dynamic graphs out of existing static graphs, a crude, unrealistic substitute for real-world data.

To bridge this gap in dynamic graph offerings, we present DyGraph, a dataset generator and benchmark suite for dynamic graph applications. Specifically, DyGraph contributes the following:

• It provides the DyGraph Generator, a novel synthetic dataset generator capable of both generating graphs with user-specified uniformly random or power-law properties from scratch and mimicking the properties of an input dynamic graph.
• It collects many real-world dynamic graph datasets, offers them with a uniform representation, and makes them publicly available in this format, for use as-is or jointly with the DyGraph Generator.
• It demonstrates how the DyGraph Generator may be used to create graphs with properties that mimic real-world datasets and modify a graph over time, while maintaining its original characteristics. In our case study, we find that DyGraph is able to closely match the degree distribution properties of real input datasets (3 to 5.5 times better than Power Law), and that users can control the properties in the output graphs by applying small changes to an automatically-generated script.

2 DYNAMIC GRAPHS

2.1 Temporal Representation

A static graph $G$ is a structure comprising a set of unique vertices $V$ and a set of unique edges $E$, such that $G = (V, E)$.
Graph algorithms are most often organized as a series of iterations. Each iteration includes a “compute” stage and a “share” stage. In the compute stage, the same instructions are executed independently for each vertex in $V$, creating some result value. In the share stage, result values are shared with some or all of the vertex’s neighbors across the edges in $E$. [16], [24], and [34] discuss this and similar paradigms further. This type of algorithm design only requires that the graph’s topology is not modified within an iteration; thus, it can be applied to dynamic graphs as well, as long as this condition holds.

Indeed, most dynamic graph algorithms leverage this same algorithmic structure with a small modification: they add an “update” stage after each iteration, during which they apply topology changes (e.g., adding an edge, removing a vertex, etc.) in batch. Figure 2 shows the complete algorithm organization. It is therefore most appropriate to represent dynamic graphs as a sequence of $T$ static graphs $G_0, \dots, G_{T-1}$, together with the set of changes between each $G_t$ and $G_{t+1}$. Note that this discrete representation is valid for both streaming (future graph states provided in real time) and non-streaming (future states available from the start) applications.

Algorithms computing on dynamic graphs are often incremental; that is, they use the solution from $G_t$ as the starting point for $G_{t+1}$ [4]. For example, if a shortest path passes through an edge $A \to B$ in $G_t$ which is removed to create $G_{t+1}$, a dynamic algorithm may start by searching for a short path from $A$ to $B$ and maintain the rest of the overall solution. Similarly, a dynamic PageRank algorithm may use the values found for $G_t$ as an approximate solution to $G_{t+1}$.
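The iteration structure and the incremental reuse described above can be illustrated with a minimal Python sketch. It is not taken from DyGraph: the function names `pagerank` and `apply_batch`, the damping factor of 0.85, the uniform handling of vertices without out-edges, and the four-vertex example graph are all assumptions chosen for illustration. Each PageRank iteration performs a compute step and a share step, topology changes are applied as one batch between runs, and the ranks found for $G_0$ seed the run on $G_1$.

```python
def apply_batch(edges, added=(), removed=()):
    """Update stage: apply one batch of topology changes to an edge set."""
    return (set(edges) - set(removed)) | set(added)


def pagerank(num_vertices, edges, init=None, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank; returns (ranks, iterations used)."""
    out = [[] for _ in range(num_vertices)]
    for src, dst in edges:
        out[src].append(dst)
    rank = list(init) if init is not None else [1.0 / num_vertices] * num_vertices
    for iteration in range(1, max_iter + 1):
        # Compute stage: each vertex derives the value it sends along each out-edge.
        contrib = [rank[v] / len(out[v]) if out[v] else 0.0 for v in range(num_vertices)]
        # Rank held by vertices without out-edges is spread uniformly (an assumption).
        dangling = sum(rank[v] for v in range(num_vertices) if not out[v])
        base = (1.0 - damping) / num_vertices + damping * dangling / num_vertices
        # Share stage: every vertex pushes its contribution to its neighbors.
        new_rank = [base] * num_vertices
        for v in range(num_vertices):
            for w in out[v]:
                new_rank[w] += damping * contrib[v]
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            return rank, iteration
    return rank, max_iter


# G_0 and the batch of changes that produces G_1 (hypothetical example data).
g0 = {(0, 1), (1, 2), (2, 0), (2, 3), (3, 0)}
g1 = apply_batch(g0, added={(1, 3)}, removed={(2, 3)})

ranks_g0, _ = pagerank(4, g0)
warm_ranks, warm_iters = pagerank(4, g1, init=ranks_g0)  # incremental: reuse G_0's solution
cold_ranks, cold_iters = pagerank(4, g1)                 # from scratch, for comparison
```

Both runs on $G_1$ converge to the same ranks within the tolerance. The warm start changes only the starting point, which typically shortens the run when the batch of changes is small.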
This incremental reuse informs two key differences between dynamic graphs and a series of independent static graphs: (1) unchanged vertices and edges must persist across timesteps, with each vertexID stable and unique across all timesteps, and (2) we must be able to accurately describe the topological difference between two adjacent timesteps with a manageable number of changes. For these reasons, it is not viable to build a dynamic graph using multiple iterations of an existing static graph generator, as each “timestep” would be unrealistically different from the previous.

2.2 Intermediate Static Format

Compressed Sparse Row (CSR) is the most common format for storing static graphs, as CSR representations are easier to read, offer better space efficiency, and provide more regular memory access opportunities than adjacency matrices or lists [33]. CSR contains
two arrays: an edge list and a vertex list. The vertex list maps each vertex to the starting index of its list of neighbors in the edge list, one index per vertex. The edge list has one entry for each edge, grouped by source vertex, where each entry holds a destination vertex. For example, if vertices 3 and 4 have vertex list entries of 7 and 12, then vertex 3 has five neighbors whose IDs are in slots 7-11 of the edge list.

While CSR is popular for its minimal storage footprint, it is inefficient for representing dynamic graphs: changes to the graph topology would require a complete reconstruction of the graph representation. Moreover, pinpointing the differences between two graphs in this format entails the complete construction of both graphs. For these reasons, many research works using dynamic graphs avoid CSR formats. Specifically, these works and DyGraph use a regular edge list, which offers a more effective representation for dynamic graphs. Each edge is represented by two values: a source and a destination vertex ID. This format requires more storage space ($2|E|$ instead of $|V| + |E|$), but it makes it easier to compute the difference between two graphs, or between two time-based snapshots of the same graph, $G_t$ and $G_{t+1}$, and to derive a change log between them.

Table 1: Key DyGraph Generator Commands

Command                                          Description
add [v] vertices                                 Create v disconnected vertices
add [e] random edges                             Create e edges, each connecting two uniformly random vertices
add edge power law [S_d] [K_s] [maxDeg] [bins]   Add edges via Power Law to existing graph
remove [v] vertices                              Delete v random vertices and all connecting edges
remove [e] random edges                          Delete e random edges from the graph
commit                                           Save state as G_t. Begin changes for G_{t+1}
build [S_d] [K_s] [maxDeg] [bins]                Add edges via Power Law from scratch (G_0 only)
bin [binID] [binSize] [localMax]                 Define bin parameters (after “add edge power law”/“build”)
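The trade-off described in Section 2.2 can be made concrete with a short Python sketch. It is illustrative only: the helper names `build_csr`, `neighbors`, and `change_log` are assumptions, not DyGraph functions. The CSR vertex list here carries one extra trailing entry so that every neighbor range has an explicit end, a common convention that differs slightly from the one-index-per-vertex layout described in the text. The edge-list form, by contrast, yields a change log with two set differences.

```python
def build_csr(num_vertices, edges):
    """Build CSR arrays from (source, destination) pairs."""
    ordered = sorted(edges)                    # group edges by source vertex
    vertex_list = [0] * (num_vertices + 1)     # extra sentinel entry (assumption)
    for src, _ in ordered:
        vertex_list[src + 1] += 1              # count out-edges per source
    for v in range(num_vertices):
        vertex_list[v + 1] += vertex_list[v]   # prefix sum gives starting indices
    edge_list = [dst for _, dst in ordered]
    return vertex_list, edge_list


def neighbors(vertex_list, edge_list, v):
    """Destination vertices of v, read from its slice of the edge list."""
    return edge_list[vertex_list[v]:vertex_list[v + 1]]


def change_log(edges_t, edges_t1):
    """Edges added and removed when moving from snapshot t to snapshot t+1."""
    added = sorted(set(edges_t1) - set(edges_t))
    removed = sorted(set(edges_t) - set(edges_t1))
    return added, removed


# Two hypothetical snapshots of a three-vertex graph.
g_t = {(0, 1), (0, 2), (1, 2), (2, 0)}
g_t1 = {(0, 1), (1, 2), (2, 0), (2, 1)}

vertex_list, edge_list = build_csr(3, g_t)
added, removed = change_log(g_t, g_t1)
```

In general, inserting or deleting an edge in the CSR arrays shifts every later entry of `edge_list` and adjusts every later entry of `vertex_list`, whereas the plain edge list needs only one insertion or deletion.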
3 DYGRAPH

In this section, we present the DyGraph Generator, and then summarize the datasets adapted for benchmark distribution. Both can be found at adacenter.org/dygraph.

3.1 DyGraph Generator

The DyGraph Generator is designed for three use cases. First, users can create dynamic graphs from scratch. Second, they can automatically generate datasets with properties that mimic those of other input dynamic graphs via an automatically-generated intermediate script. This feature allows users to create and share datasets with the same profile as other graphs that they may be unable to share. Third, users can modify this script to create augmented versions of an input dynamic graph dataset, for example by increasing the number of vertices or edges.

The DyGraph-generated scripts assume that the input graph follows the Power Law, but commands are also available to add and remove individual vertices and edges. This latter functionality gives users the flexibility to create graphs with any degree distribution, where degree is the number of neighbors that a vertex is connected to. Note that the use of scripting allows researchers both to share key properties of the datasets used in their evaluations and to give other researchers the means to create their own similarly-profiled graphs, all without needing to publicly distribute the original datasets.

3.1.1 DyGraph Generator commands. Users create dynamic graphs from scratch by providing a list of commands (via console or script). Commands to modify the graph either add vertices, add edges, remove vertices, remove edges, or commit graph changes to the current timestep. When a vertex is removed, all its edges are also removed. Table 1 lists the primary DyGraph commands and their command parameters. DyGraph also provides the opportunity to add or remove specific vertices by vertexID, and to add or remove specific edges by pairs of vertexIDs, enabling finer-grained control when needed.
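To make the command semantics concrete, the following Python sketch models them on an in-memory edge set. It is a toy model, not the DyGraph Generator itself: the class `ToyDynamicGraph`, its method names, the directed-edge interpretation, and the choice to skip self-loops and duplicate edges are all assumptions made for illustration. The tool's actual commands are the ones listed in Table 1.

```python
import random


class ToyDynamicGraph:
    """Toy model of add/remove/commit semantics (not the DyGraph tool)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.next_id = 0          # vertex IDs are never reused
        self.vertices = set()
        self.edges = set()        # directed (source, destination) pairs
        self.snapshots = []       # committed edge lists, one per timestep

    def add_vertices(self, count):
        new_ids = list(range(self.next_id, self.next_id + count))
        self.next_id += count
        self.vertices.update(new_ids)
        return new_ids

    def add_random_edges(self, count):
        n = len(self.vertices)
        if count > n * (n - 1) - len(self.edges):
            raise ValueError("not enough vertex pairs left for that many edges")
        pool = sorted(self.vertices)
        added = 0
        while added < count:
            src, dst = self.rng.sample(pool, 2)   # two distinct, uniformly random vertices
            if (src, dst) not in self.edges:
                self.edges.add((src, dst))
                added += 1

    def remove_vertices(self, count):
        victims = self.rng.sample(sorted(self.vertices), count)
        self.vertices.difference_update(victims)
        # Removing a vertex also removes every edge that touches it.
        self.edges = {(s, d) for s, d in self.edges
                      if s not in victims and d not in victims}
        return victims

    def remove_random_edges(self, count):
        for edge in self.rng.sample(sorted(self.edges), count):
            self.edges.remove(edge)

    def commit(self):
        """Save the current edge list as G_t; later changes belong to G_{t+1}."""
        self.snapshots.append(sorted(self.edges))


graph = ToyDynamicGraph(seed=42)
graph.add_vertices(6)
graph.add_random_edges(10)
graph.commit()                       # G_0
removed_ids = graph.remove_vertices(1)
new_ids = graph.add_vertices(2)
graph.add_random_edges(3)
graph.commit()                       # G_1
```

Each committed snapshot is a plain edge list, so the change log between `graph.snapshots[0]` and `graph.snapshots[1]` can be derived by comparing the two lists.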
Note that vertices have unique identifiers even when removed from the graph, and that both new and previously-removed vertices may be re-added using the vertexID-based commands. These features are detailed in the DyGraph Generator’s user manual, provided with the suite. Graph modifications to be applied to different timesteps are separated by commit commands.

Note that when vertices are added, they have no neighbors. Vertices with no neighbors are omitted from output files when a commit occurs. There are three ways to add edges to vertices: (1) single-item, (2) with uniform random distribution, and (3) with a power-law distribution. To add a single edge, users specify the two vertices to connect. To add randomly distributed edges, users specify the number of edges to add: DyGraph adds that many edges, selecting which existing vertices to connect with a uniform probability for each edge. Finally, to add edges via Power Law, users invoke the “add edge power law” command, as described below.

3.1.2 Power-law dynamic graph generation. To create a power-law graph, DyGraph first determines the desired degree distribution, then modifies the number of edges in the existing set. Note that commands to create power-law graphs are commands to add edges; these commands do not add any vertices. Indeed, vertices must be created before adding any edge. New edges modify the degree of each vertex so as to fit into a power-law distribution.

The proper degree distribution is calculated by leveraging a combination of (i) a power-law decay function and (ii) a set of bins. The power-law function can be represented as:

$$|V(\mathit{deg})| = K_s \cdot \mathit{deg}^{S_d} \qquad (1)$$

where $\mathit{deg}$ is the degree and $|V(\mathit{deg})|$ is the number of vertices with degree $\mathit{deg}$. $K_s$ is a scaling factor for the number of vertices in each degree, and $S_d$ ($< 0$) controls how steeply $|V(\mathit{deg})|$ decays as $\mathit{deg}$ increases.
The DyGraph Generator determines how many edges to add and which pairs of vertices to connect with those edges, so as to attain the power-law properties specified by $K_s$ and $S_d$.
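As a worked illustration of Equation (1), the Python sketch below evaluates the target number of vertices per degree for given $K_s$ and $S_d$. The function names, the rounding to whole vertices, and the parameter values are assumptions for illustration; the source does not specify how DyGraph discretizes the curve.

```python
import math


def power_law_counts(k_s, s_d, max_deg):
    """Target |V(deg)| = K_s * deg**S_d for deg = 1..max_deg, rounded to whole vertices."""
    if s_d >= 0:
        raise ValueError("S_d must be negative for a decaying distribution")
    return {deg: round(k_s * deg ** s_d) for deg in range(1, max_deg + 1)}


def loglog_slope(k_s, s_d, deg_a, deg_b):
    """Slope of the unrounded curve between two degrees on log-log axes."""
    count_a = k_s * deg_a ** s_d
    count_b = k_s * deg_b ** s_d
    return (math.log(count_b) - math.log(count_a)) / (math.log(deg_b) - math.log(deg_a))


# Hypothetical parameters: K_s = 1000, S_d = -2, degrees 1 through 10.
counts = power_law_counts(k_s=1000, s_d=-2, max_deg=10)
slope = loglog_slope(k_s=1000, s_d=-2, deg_a=1, deg_b=10)
```

With $K_s = 1000$ and $S_d = -2$, the curve gives 1000 vertices of degree 1, 250 of degree 2, and 10 of degree 10. On log-log axes the curve is a straight line whose slope is $S_d$, which `loglog_slope` recovers.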
3.1.3 Limitation of basic power-law distribution. We found that many real-world power-law graphs do not fit precisely in a power-law distribution, as demonstrated in Section 4.2. This is expected, as Power Law provides an efficient mechanism to approximate many graphs arising from real-world situations, but it is still just an approximation. The most impactful difference between an ideal power-law graph and those we have observed in our real-world datasets lies in the frequency of the high-degree vertices, that is, those few vertices that have a high number of incident edges. Indeed, many real graphs have more such vertices than predicted by a basic power-law distribution. Figure 3 provides a schematic of this divergence: the power-law distribution fitted to the graph effectively approximates the low-degree vertices on the left part of the figure, but fails to model the high-degree vertices on the right.

Figure 3: Schematic of the DyGraph Generator’s binning process to approximate real-world power-law graphs. Degree distribution is set by Power Law for low-degree vertices and bin parameters for high-degree vertices. [Plot of number of vertices versus degree, annotated with Threshold, LocalMax, BinWidth, and BinSize.]

In striving to design a high-accuracy generator, we assign each degree in the distribution to one of two sections: (i) Power Law for low-degree vertices and (ii) a process leveraging vertex-binning for high-degree vertices, as discussed below. A user-defined threshold determines which of the two approaches DyGraph will use in modeling the connectivity of each vertex, by separating high and low degrees:

$$\mathit{minDeg} = \min(\mathit{deg}) \quad \text{s.t.} \quad |V(\mathit{deg})| < H \qquad (2)$$

where $H$ is the threshold (see Figure 3).
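Equation (2) can be read as a simple upward scan over the degree histogram. The sketch below is an illustration under stated assumptions: the function name `min_deg`, the dictionary-based histogram, the example values, and the choice to treat a degree with no vertices as a count of zero are not taken from the source.

```python
def min_deg(counts, threshold=10):
    """Smallest degree deg with |V(deg)| < threshold, scanning upward."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    deg = min(counts)
    # Degrees missing from the histogram count as zero vertices (assumption),
    # so the scan always stops at or before max(counts) + 1.
    while counts.get(deg, 0) >= threshold:
        deg += 1
    return deg


# Hypothetical degree histogram: degree -> number of vertices with that degree.
histogram = {1: 500, 2: 120, 3: 40, 4: 12, 5: 9, 6: 11, 7: 3}
threshold_degree = min_deg(histogram, threshold=10)
```

Here degree 5 is the first with fewer than 10 vertices, so `threshold_degree` is 5, even though degree 6 rises above the threshold again.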
Vertices with degrees lower than 𝑚𝑖𝑛𝐷𝑒𝑔 are connected based on Power Law; vertices with degrees of 𝑚𝑖𝑛𝐷𝑒𝑔 or higher follow the binning process described below in Section 3.1.4. We found that the exact value of 𝐻 has little effect on the quality of the real-world graph approximation, because real-world graphs diverge slowly from the power-law function as vertices’ degrees increase. We empirically found that several graphs deviate from Power Law at 𝐻 ≈ 10, so we set 𝐻 = 10.

3.1.4 Binning for power-law graphs. In order to model the vertices’ distribution in the segment where 𝑑𝑒𝑔 > 𝑚𝑖𝑛𝐷𝑒𝑔, which is the long tail of the power-law distribution, we chose to approximate the tail as a series of boxes, each of width binWidth and of height localMax. We call these boxes “bins.” Below we describe the key traits we capture for each bin.

Table 2: DyGraph Power-law Command Parameters
Param.      Description
𝐾𝑠          Power-law scaling factor
𝑆𝑑          Power-law decay factor (< 0)
maxDeg      Maximum degree among all vertices
bins        # bins for power-law add commands
binID       Bin ID number (0..𝑏𝑖𝑛𝑠 − 1)
binWidth    # degrees in the bin
binSize     # degrees in the bin with > 0 vertices
localMax    Max # vertices with the same degree

Bins represent degree intervals; that is, each vertex whose degree is within a certain range is assigned to the same bin. The “add edge power law” command allows users to specify the number of bins, while the DyGraph Generator labels each bin with its own identifier binID. The first bin begins at degree 𝑚𝑖𝑛𝐷𝑒𝑔. The last bin contains maxDeg, that is, the highest degree of any vertex in the graph. Each bin is specified by three key parameters: binWidth, localMax, and binSize. BinWidth is the width of the interval in degrees assigned to the bin (i.e., the degree capacity of a bin). The DyGraph Generator creates bins that all have the same binWidth. LocalMax is the maximum number of vertices with the same degree within the bin.
BinSize is the number of degree values within a bin that have at least one vertex at that degree, that is, |𝑉(𝑑𝑒𝑔)| > 0. Each bin has a specific localMax and binSize. Table 2 summarizes all parameters described.

The binWidth parameter determines the granularity we use in modeling the distribution. We use localMax to capture how vertices in each bin become more sparse as degree increases. In other words, monitoring localMax helps us avoid the generation of spikes in the distribution. We track the binSize parameter because the far end of the tail is often rarefied; that is, there are very few vertices, and many degrees have no vertex associated with them. BinSize and localMax help us capture and reproduce that sparsity.

Consider the example in Figure 4. The first degree with fewer than 10 vertices is degree 50, so minDeg is 50. BinWidth has been set to 100, and bin 0 thus spans degrees 50 to 149. Bin 0 has a binSize of 97 and a localMax of 18. If the DyGraph Generator were to add edges so as to preserve the currently captured traits of the graph distribution, it should add edges only to vertices whose degree matches one of the 97 unique degrees, between 50 and 149, that already have at least one vertex. In addition, it would not modify the localMax; thus, each degree would have no more than 18 vertices mapped to it. A similar analysis would take place for bin 1, where the DyGraph Generator would only add edges so that vertices fall into one of the 58 degrees, between 150 and 249, that already had vertices mapped to them. The DyGraph Generator would also maintain the localMax for vertices in this bin at 5.

3.1.5 Adding edges using binning. The power-law and binning structures above provide a complete map of how new edges should be distributed over the entire graph (set of vertices). Once this map is computed, the DyGraph Generator adds edges such that the resulting graph’s degree distribution aligns with the new distribution.
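The bin parameters defined in Section 3.1.4 can be extracted from a degree histogram along the following lines. This is a minimal sketch under stated assumptions: the names `Bin` and `summarize_bins` are hypothetical, the bin width is taken as an input, and the number of bins is derived so that the last bin contains the maximum degree.

```python
from typing import Dict, List, NamedTuple


class Bin(NamedTuple):
    bin_id: int     # binID, starting at 0
    first_deg: int  # lowest degree covered by the bin
    last_deg: int   # highest degree covered by the bin
    bin_size: int   # binSize: degrees in the bin with at least one vertex
    local_max: int  # localMax: largest vertex count at any single degree


def summarize_bins(histogram: Dict[int, int], min_deg: int, bin_width: int) -> List[Bin]:
    """Split the tail (degrees >= min_deg) into equal-width bins and summarize each."""
    if bin_width < 1:
        raise ValueError("bin_width must be positive")
    tail = {d: n for d, n in histogram.items() if d >= min_deg and n > 0}
    if not tail:
        return []
    max_deg = max(tail)
    num_bins = (max_deg - min_deg) // bin_width + 1  # last bin contains max_deg
    bins: List[Bin] = []
    for bin_id in range(num_bins):
        first = min_deg + bin_id * bin_width
        last = first + bin_width - 1
        counts = [n for d, n in tail.items() if first <= d <= last]
        bins.append(Bin(bin_id, first, last, len(counts), max(counts, default=0)))
    return bins


if __name__ == "__main__":
    # Toy histogram: degree -> number of vertices (values are illustrative only).
    toy = {1: 1000, 50: 9, 60: 18, 149: 2, 150: 5, 200: 1, 249: 3}
    for b in summarize_bins(toy, min_deg=50, bin_width=100):
        print(b)
```

With a bin width of 100 and a minimum degree of 50, bin 0 covers degrees 50 to 149 and bin 1 covers degrees 150 to 249, matching the intervals used in the Figure 4 example.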
Note that the user controls the degree distribution of the final graph, not the specific number of edges to add. To this
end, DyGraph first computes the difference between the current number of vertices in each bin and the final number of vertices that should be in each bin. Then DyGraph determines which vertices should be moved to each bin (unless they are already in the correct bin). Finally, each vertex in each bin is assigned a final target degree, and it should be connected to as many additional edges as required to reach its target degree.

Table 3: Dynamic Graph Datasets
Name        Description                            |V|    Com. |E|   Max |E|   Timesteps   Time Span
AdTraffic   Online advertisement interactions      6M     410M       5.9M      135         30 days
Bitcoin     User-to-User bitcoin trust network     5k     3.7M       36k       147         N/A
Email       Internal research institution emails   1k     7.5M       167k      27          2.2 yrs
Forum       Private college forum interactions     899    33.7k      848       103         N/A
Higgs       Higgs-Boson Twitter Interactions       456k   34.3M      563k      125         30 days
Hospital    Patient-to-staff proximity             75     2.2M       32k       141         96 hrs
Movies      User-movie streaming network           138k   1.48B      16.4M     193         20 yrs
Music       User-music streaming network           92k    12.7M      186k      184         N/A
Ubuntu      AskUbuntu user-to-user interactions    159k   70.5M      964k      178         7.3 yrs

Figure 4: Degree distribution for the Ubuntu dataset at T=50, showing binning parameter values in bins 0 and 1. [Plot: # vertices vs. degree, with bins 0 to 3 marked; binSize/binWidth is 97/100 in bin 0 and 58/100 in bin 1.]

To track all the vertices that should be connected to additional edges, we create an edge-addition set, where each vertex is present as many times as the number of additional incident edges it must receive. For example, if a vertex has 100 neighbors, and the new degree distribution dictates it should have 115 neighbors, the vertex is added to the edge-addition set 15 times. Once the edge-addition set is built, edges are created by connecting two vertices randomly from this set, removing both from the set when connected.
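The edge-addition procedure can be sketched as follows. This is an illustrative reading rather than DyGraph's C++ implementation: the function names are hypothetical, the set is modeled as a list with repeated entries, and the bounded retry after a rejected pair (a self-loop or an edge that already exists) is an assumption, since the text does not say how rejected pairs are handled.

```python
import random
from typing import Dict, Iterable, List, Optional, Set, Tuple

Edge = Tuple[int, int]


def build_edge_addition_set(current_deg: Dict[int, int],
                            target_deg: Dict[int, int]) -> List[int]:
    """List each vertex once per additional incident edge it still needs."""
    pool: List[int] = []
    for vertex, target in target_deg.items():
        missing = target - current_deg.get(vertex, 0)
        pool.extend([vertex] * max(0, missing))
    return pool


def connect_from_pool(pool: List[int], existing: Iterable[Edge],
                      rng: Optional[random.Random] = None,
                      max_attempts: Optional[int] = None) -> List[Edge]:
    """Pair random entries of the pool into new undirected edges."""
    rng = rng or random.Random()
    pool = list(pool)  # work on a copy so the caller's list is untouched
    seen: Set[Edge] = {(min(u, v), max(u, v)) for u, v in existing}
    added: List[Edge] = []
    # Assumed safeguard: stop after a bounded number of draws.
    attempts_left = max_attempts if max_attempts is not None else 20 * len(pool) + 100
    while len(pool) >= 2 and attempts_left > 0:
        attempts_left -= 1
        i, j = rng.sample(range(len(pool)), 2)  # two distinct positions
        u, v = pool[i], pool[j]
        edge = (min(u, v), max(u, v))
        if u == v or edge in seen:
            continue  # self-loop or duplicate edge: reject, keep both entries
        seen.add(edge)
        added.append(edge)
        # Remove both entries; handle the larger index first (swap-with-last).
        for k in sorted((i, j), reverse=True):
            pool[k] = pool[-1]
            pool.pop()
    return added


if __name__ == "__main__":
    targets = {v: 2 for v in range(6)}
    pool = build_edge_addition_set({}, targets)
    print(connect_from_pool(pool, existing=[], rng=random.Random(0)))
```

Because rejected pairs stay in the pool, a few vertices may end up slightly below their target degree when no valid partner remains.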
Self-loops (edges connecting a vertex to itself) and duplicate edges (edges that already exist) are rejected. Users can easily modify both of these behaviors.

The DyGraph Generator allows users to add edges with power-law characteristics through two commands: “build” and “add edge power law.” “Build” is a special case of “add edge power law”, to be used for an initial graph with no existing edges. In that case, the edge-addition set is created solely from a completely new degree distribution (since the existing degree distribution is empty).

3.1.6 Removing edges. There are two ways to remove edges with the DyGraph Generator: by removing a single edge, or by removing multiple edges in a uniformly random fashion. Note that, when removing edges, DyGraph can only choose to remove edges that are already present in the graph. If the existing set of edges is uniformly random among all vertices, then the multiple-edge removal is also uniformly random. If, instead, the existing set of edges is distributed in accordance with Power Law, then removing edges at random naturally abides by Power Law as well.

3.1.7 Automated script generation. In addition to enabling users to write their own DyGraph scripts, DyGraph can also automatically generate scripts that mimic an input dynamic graph: it analyzes the graph to extract the parameters described (vertices to add, power-law parameters, bin parameters, etc.), using a pre-set binWidth. This process can be applied to all timesteps, so that the script generates a dynamic graph with the same degree distribution trends. This script can then be used as is, or modified before generating the synthetic graph.

3.2 Datasets
We provide a collection of real-world dynamic graph datasets for two reasons. First, researchers may use existing datasets as a fast, valuable first step to measure initial results. Second, users may analyze these existing datasets to find and adjust values for DyGraph’s power-law command parameters.
We list the datasets included with DyGraph in Table 3, along with several of their key characteristics. Though the original datasets are publicly available in a variety of formats, we have converted them into a consistent format: a series of separate edge-list files, one for each timestep, as described in Section 2.

With reference to Table 3, |𝑉| is the number of unique vertices in the dataset. Combined Edges (𝐶𝑜𝑚.|𝐸|) reports the sum of all edges across all timesteps, counting each occurrence of edges that are listed in multiple discrete timesteps. Max Edges (𝑀𝑎𝑥|𝐸|) reports the peak number of edges in a timestep, thus tracking how large the dynamic graph becomes throughout its evolution. Timesteps reports the number of individual graph states included in the dynamic graph; it also corresponds to the number of separate files. Finally, Time Span is the duration of time represented by the dataset, as reported by its original source.

Our dataset repository includes 9 datasets. The AdTraffic dataset [9] is a sample of live online advertisement traffic from Criteo, a computational advertising company. The Bitcoin dataset [18] [17] represents a trust network of users of Bitcoin OTC, an over-the-counter bitcoin trading marketplace. We include three social media datasets: Email, Forum, and Higgs. The Email dataset [20] [36] represents internal email transmissions among members of a European research institute, thus a corporate communication setting. The Forum dataset [27] is the network of both group and direct messages among members of a private forum for college students. The Higgs dataset [8] contains the replies, retweets, and mentions
of the Higgs-Boson particle discovery on Twitter. The Hospital dataset [31] maps which patients and staff had close contact in a hospital. We also include two datasets for recommendation system applications. The Music dataset [6] is the network of 2,000 Last.fm users and the music they played. The Movies dataset [14] is the set of 5-star ratings and text reviews of randomly selected users of MovieLens with more than 20 reviews. Finally, the Ubuntu dataset [28] is the set of user interactions in the AskUbuntu online forum.

Figure 5: Log-log plots of degree distributions (# vertices vs. degree) for the original input AdTraffic dataset (blue) and the synthetic dataset generated by the DyGraph Generator (green) at three time steps: 0, 50, and 100. The plots are overlaid with the power-law function derived from the dataset by DyGraph (pink).

Figure 6: Log-log plots of (left) the degree distributions over time for the original AdTraffic dataset, (center) the power-law functions and their parameters derived from the original dataset by DyGraph, and (right) the synthetic dataset generated by DyGraph to match the original. Parameters shown in the center panel: 𝐾𝑠 = 10^3.3 and 𝑆𝑑 = -2.3 at T=0; 𝐾𝑠 = 10^5.7 and 𝑆𝑑 = -6.1 at T=10; 𝐾𝑠 = 10^6.8 and 𝑆𝑑 = -6.0 at T=100.

4 EXPERIMENTAL EVALUATION
In this section, we evaluate how closely the graph datasets built by the DyGraph Generator resemble real-world graphs. We divide our analysis into three sections: defining the evaluation metric, showing how DyGraph can create graphs that mimic real-world graphs, and demonstrating how users can edit these DyGraph scripts to customize their graphs’ properties to their own needs.
4.1 Evaluation Metric
To evaluate whether DyGraph creates graphs with similar properties to an input graph, we must choose a metric to compare the two graphs. Common metrics include diameter, clustering coefficient, triangle counting, and degree distribution. For this work, metrics must be applicable to all possible real-world graphs, allow us to compare the size of two graphs, and demonstrate the dataset’s sparsity, an essential factor in evaluating performance. Diameter requires that there exist paths from all vertices to all other vertices and ignores any disconnected components. Many real-world graphs do not have this property. Triangle counting is heavily affected by the size of the graph. Clustering coefficient can be measured for all graphs, but it is unclear whether differences in power-law graph sizes affect the clustering coefficient. We therefore use degree distribution, the only common metric that fully meets these conditions.

4.2 Mimicking Existing Graphs
We evaluate whether the DyGraph Generator builds synthetic graphs with degree distributions like those of input real-world graphs. We use one of the largest of our dynamic graphs: AdTraffic. As the graph changes over time, the degree distribution also changes, so we evaluate the distribution at multiple timesteps.

Figure 5 shows degree distributions of AdTraffic and DyGraph’s generated dataset at three timesteps. For degrees from 1 to 200, the correlation coefficient (𝑟) of the DyGraph-generated dataset against the original is 2.98x closer to ideal (𝑟 = 1) than that of the Power Law function against the original when T=50, and 5.57x closer when T=100. Similarly, the deviation from the original, measured with the two-sample Kolmogorov–Smirnov metric, improves from 0.053 for Power Law to 0.023 for DyGraph at T=50, and from 0.086 to 0.015 at T=100.

We make a few observations from these plots. First, the power-law function aligns with the AdTraffic dataset only for low-degree vertices (i.e., the left side of each plot).
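The two comparison measures quoted above can be computed along the following lines. This is a sketch under stated assumptions, not the evaluation code used for the paper: the function names are hypothetical, "times closer to ideal" is interpreted here as the ratio of the two gaps to 𝑟 = 1, and the Kolmogorov–Smirnov statistic is computed on two plain samples of vertex degrees.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def closeness_gain(r_baseline: float, r_candidate: float) -> float:
    """Assumed reading of 'x times closer to ideal': ratio of the gaps to r = 1."""
    return (1.0 - r_baseline) / (1.0 - r_candidate)


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    largest = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1  # advance past all values <= x in sample a
        while j < len(b) and b[j] <= x:
            j += 1  # advance past all values <= x in sample b
        largest = max(largest, abs(i / len(a) - j / len(b)))
    return largest
```

In this reading, `pearson_r` would be applied to the per-degree vertex counts of two distributions over the same degree range (for example, degrees 1 to 200), and `ks_statistic` to the degrees of the vertices in the two graphs.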
Once the power-law line falls below ∼10 vertices, the degree distributions of both the original and the synthetic dataset diverge from the power-law line (pink). This shows that the binning process successfully ensures that the synthetic graph has a similar number of higher-degree vertices, more than would be included by Power Law alone. Second, the larger the graph, the more closely DyGraph’s degree distribution aligns with the original dataset, as DyGraph has more opportunities to create the precise distribution that it is targeting. Third, while DyGraph generally creates a degree distribution shape which matches the original far more closely than the power-law function does, it also tends to create vertices with slightly lower average degree than the
original. This deviation is caused by two constraints in DyGraph: self-loops and duplicate edges are eliminated, and vertices with no neighbors are omitted. As DyGraph is an open-source suite, users may remove these constraints if desired.

Figure 7: (a) Log-log plot of degree distribution, (b) vertex count across time, and (c) edge count across time for the original AdTraffic dataset, a matching synthetic DyGraph-generated graph, and a synthetic graph generated by doubling the original graph size in the script. [Panel (a) is shown at T=30; series: Original, DyGraph, DyG (x2).]

Figure 6 illustrates how the AdTraffic dynamic graph evolves over time and how the DyGraph-generated graph matches this evolution. Figure 6 also highlights how the power-law properties change over time. Initially (T=0 to T=10), more vertices are added and connected to few neighbors, but few existing vertices are connected to many neighbors. This leads to a higher scaling factor (𝐾𝑠 rises from 10^3.3 to 10^5.7) but a sharper relative decline as degree increases (𝑆𝑑 falls from -2.3 to -6.1). After the initial growth (T=10 to T=100), 𝐾𝑠 continues to increase and 𝑆𝑑 stays steady around -6.0. This reflects a trend in which new vertices with few neighbors continue to join the graph, but many more of the existing vertices connect to additional neighbors already in the graph.

4.3 Customizing Graph Properties
As described in Section 3.1, we designed DyGraph to be capable of automatically generating a script which can, in turn, be used to generate a synthetic graph with the same characteristics as the original. We leveraged this approach to provide users the flexibility to modify existing graph properties, and create variants from the original real-world graphs provided with the benchmark suite.
To demonstrate this feature, we take the AdTraffic dataset, have DyGraph produce the generating script, then modify that script to double both vertices and edges.

We edited the script as follows. First, for each command adding 𝑛 vertices, we changed 𝑛 to 2𝑛. Second, for each command adding edges to follow power-law properties, we increased both sections of the degree distribution plot. For the power-law section, we increased 𝐾𝑠 so as to double the number of vertices at each degree. Finally, for the binning section of the degree distribution, we attained our goal by doubling each localMax.

Figure 7 plots the degree distribution of the original AdTraffic dataset, overlaid with that of the matching synthetic DyGraph-generated graph, and also with that of the “doubled” synthetic graph, obtained with the modified script. As the figure presents the graphs in a log-log plot, the doubled synthetic graph closely overlaps with the original synthetic graph, indicating extremely similar degree distributions. Figure 7 also reports how the number of vertices and edges change over time. Note how the original graph and the synthetic DyGraph edge-count align closely. In addition, the “doubled” synthetic graph reports approximately double the vertices and edges. Note also how the baseline synthetic graph (green) consistently includes slightly fewer vertices than the real-world graph, because of the constraints described in Section 4.2.

Figure 8: DyGraph Generator execution time for generating the synthetic graphs of Figure 7. [Plot: seconds vs. timestep T and # edges; series: Original AdTraffic, 2x Vertices, 2x Edges.]

4.4 DyGraph Generator Performance
The DyGraph Generator is open-source software written in C++. Figure 8 reports execution times for building the synthetic graphs identified in Figure 7, timed on a machine using Ubuntu 20.04, an Intel i7-7700, 32GB of memory, and a 2TB HDD.
We observe that increasing either dimension of a graph (vertices or edges) extends the execution time. We further observe that increasing |𝐸| takes more time than a similar increase in |𝑉|. Three factors explain this. First, adding vertices alone requires almost no additional computation, as new vertices start with no neighbors. Runtime increases only because a larger set of vertices increases the size of the distribution when adding edges. Second, adding edges with power-law properties dominates execution time, as all other graph updates require trivial amounts of compute beyond I/O. Finally, doubling the vertex count without increasing edge count results in more vertices without edges, which are omitted from the output graph.

DyGraph is currently single-threaded, and we hope to parallelize DyGraph as part of future work.

5 RELATED WORK
Prior works have identified that there are few dynamic graph generators, emphasizing that better such tools are needed [32] [13] [5] [35]. Some static graph generators can be used to build dynamic graphs. Kronecker graph generators [19], for example, use Kronecker multiplication to iteratively build increasingly large graphs with power-law degree distribution. Similarly, Barabási-Albert (BA) models [3] iteratively attach new vertices to pre-existing vertices in the graph using preferential attachment. Sets of new Kronecker multiplications or preferential attachments may be used as new graph states, collectively forming a dynamic graph. However, such
generation methods enforce two undesirable restrictions: graph sizes increase monotonically (i.e., vertices and edges are not removed) and the density increases as the graph size increases.

Görke et al. [10] proposed a model to generate uniformly random graphs to evaluate algorithms for dynamic graph clustering. These graphs follow an evolving ground-truth for clustering, which can be compared against the clusters discovered by the algorithm. Unlike prior work, vertices and edges in this model may be added or removed over time. However, the model can only generate graphs with uniformly random degree distribution and is thus unable to mimic many real-world applications.

The more flexible model of Purohit et al. [30] uses atomic, temporal graph motifs (i.e., sub-graph patterns of ≤ 3 vertices). This model is capable of mimicking many key properties of existing dynamic graphs, including the original dataset’s growth rate, structure, and degree distribution. However, vertices and edges cannot be removed, again limiting its ability to emulate key applications [23].

Waudby et al. [35] extended the LDBC Social Network Benchmark [2] to include insertions and deletions for dynamic graphs using lifespan. However, there is no functionality to mimic properties of existing graphs. This limitation restricts users’ ability to expand their own graphs which have specific desired properties, but are too small to be used for effectively evaluating their solutions.

6 CONCLUSION
Dynamic graph processing is quickly becoming a critical area of data mining and analytics. As researchers develop new algorithms, optimizations, and hardware solutions for dynamic graphs, there is an urgent need for more robust infrastructure to evaluate these solutions.
In this work, we present DyGraph, a solution for dynamic graph workloads that includes both a wide range of real-world dynamic graph datasets and a novel, flexible DyGraph Generator for synthetic dynamic graph datasets. We demonstrate how the synthetic graphs created by the DyGraph Generator closely mimic the properties of real-world dynamic graphs, with degree distributions that correlate 3 to 5.5 times more closely with the real-world datasets than Power Law. Further, we illustrate that the DyGraph Generator can be leveraged to automatically produce a script describing a real-world graph by analyzing it: such scripts can be later modified to generate new synthetic graphs of any size and any power-law characteristics, allowing users to create variants of real-world datasets to fit their needs and evaluations.

ACKNOWLEDGMENTS
This work was supported by the Applications Driving Architectures (ADA) Research Center, a JUMP Center co-sponsored by SRC and DARPA.

REFERENCES
[1] Abraham Addisie and Valeria Bertacco. 2020. Centaur: Hybrid Processing in On/Off-chip Memory Architecture for Graph Analytics. In Proc. DAC.
[2] Renzo Angles, János Benjamin Antal, Alex Averbuch, Peter A. Boncz, Orri Erling, Andrey Gubichev, Vlad Haprian, Moritz Kaufmann, Josep Lluís Larriba-Pey, Norbert Martínez-Bazan, József Marton, Marcus Paradies, Minh-Duc Pham, Arnau Prat-Pérez, Mirko Spasic, Benjamin A. Steer, Gábor Szárnyas, and Jack Waudby. 2020. The LDBC Social Network Benchmark. In arXiv CoRR.
[3] Albert-László Barabási and Réka Albert. 1999. Emergence of scaling in random networks. In Science.
[4] Maciej Besta, Marc Fischer, Vasiliki Kalavri, Michael Kapralov, and Torsten Hoefler. 2021. Practice of streaming processing of dynamic graphs: Concepts, models, and systems. In Proc. TPDS.
[5] Angela Bonifati, Irena Holubová, Arnau Prat-Pérez, and Sherif Sakr. 2020. Graph Generators: State of the Art and Open Challenges. In ACM Comput. Surv.
[6] Iván Cantador, Peter Brusilovsky, and Tsvi Kuflik. 2011. Workshop on Information Heterogeneity and Fusion in Recommender Systems. In Proc. RecSys.
[7] Avery Ching, Sergey Edunov, Maja Kabiljo, Dionysios Logothetis, and Sambavi Muthukrishnan. 2015. One trillion edges: Graph processing at facebook-scale. In Proc. VLDB.
[8] Manlio De Domenico, Antonio Lima, Paul Mougel, and Mirco Musolesi. 2013. The Anatomy of a Scientific Rumor. In Nature Sci. Rep.
[9] Eustache Diemert, Julien Meynet, Pierre Galland, and Damien Lefortier. 2017. Attribution Modeling Increases Efficiency of Bidding in Display Advertising. In Proc. ADKDD.
[10] Robert Görke, Roland Kluge, Andrea Schumm, Christian Staudt, and Dorothea Wagner. 2012. An efficient generator for clustered dynamic random networks. In Proc. MedAlg.
[11] Tae Jun Ham, Lisa Wu, Narayanan Sundaram, Nadathur Satish, and Margaret Martonosi. 2016. Graphicionado: A high-performance and energy-efficient accelerator for graph analytics. In Proc. MICRO.
[12] Kathrin Hanauer, Monika Henzinger, and Christian Schulz. 2020. Fully dynamic single-source reachability in practice: An experimental study. In Proc. ALENEX.
[13] Kathrin Hanauer, Monika Henzinger, and Christian Schulz. 2021. Recent advances in fully dynamic graph algorithms. arXiv preprint.
[14] Maxwell Harper and Joseph Konstan. 2015. The MovieLens Datasets: History and Context. In ACM Trans. iiS.
[15] Takanori Hayashi, Takuya Akiba, and Ken-ichi Kawarabayashi. 2016. Fully dynamic shortest-path distance query acceleration on massive networks. In Proc. CIKM.
[16] Vasiliki Kalavri, Vladimir Vlassov, and Seif Haridi. 2018. High-Level Programming Abstractions for Distributed Graph Processing. In IEEE Trans. KDE.
[17] Srijan Kumar, Bryan Hooi, Disha Makhija, Mohit Kumar, Christos Faloutsos, and V.S. Subrahmanian. 2018. REV2: Fraudulent User Prediction in Rating Platforms. In Proc. WSDM.
[18] Srijan Kumar, Francesca Spezzano, V. S. Subrahmanian, and Christos Faloutsos. 2016. Edge Weight Prediction in Weighted Signed Networks. In Proc. ICDM.
[19] Jurij Leskovec, Deepayan Chakrabarti, Jon Kleinberg, and Christos Faloutsos. 2005. Realistic, mathematically tractable graph generation and evolution, using kronecker multiplication. In Proc. ECML PKDD.
[20] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. 2007. Graph Evolution: Densification and Shrinking Diameters. In ACM Trans. KDD.
[21] Jure Leskovec and Rok Sosič. 2016. SNAP: A General-Purpose Network Analysis and Graph-Mining Library. In ACM Trans. IST.
[22] Zhe Lin, Fan Zhang, Xuemin Lin, Wenjie Zhang, and Zhihong Tian. 2021. Hierarchical core maintenance on large dynamic graphs. In Proc. VLDB.
[23] László Lőrincz, Júlia Koltai, Anna Fruzsina Győr, and Károly Takács. 2019. Collapse of an online social network: Burning social capital to create it?. In Jour. Soc. Netw.
[24] Grzegorz Malewicz, Matthew Austern, Aart Bik, James Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: a system for large-scale graph processing. In Proc. SIGMOD.
[25] Andrew McCrabb, Eric Winsor, and Valeria Bertacco. 2019. DREDGE: Dynamic repartitioning during dynamic graph execution. In Proc. DAC.
[26] Mark Newman. 2005. Power laws, Pareto distributions and Zipf’s law. In Jour. Contemp. Phys.
[27] Tore Opsahl. 2013. Triadic closure in two-mode networks: Redefining the global and local clustering coefficients. In Jour. Soc. Netw.
[28] Ashwin Paranjape, Austin Benson, and Jure Leskovec. 2017. Motifs in Temporal Networks. In Proc. WSDM.
[29] Tiago Peixoto. 2020. The Netzschleuder Network Catalogue and Repository.
[30] Sumit Purohit, Lawrence Holder, and George Chin. 2018. Temporal graph generation based on a distribution of temporal motifs. In Proc. MLG.
[31] Ryan Rossi and Nesreen Ahmed. 2015. The Network Data Repository with Interactive Graph Analytics and Visualization. In Proc. AAAI.
[32] Siddhartha Sahu, Amine Mhedhbi, Semih Salihoglu, Jimmy Lin, and Tamer Özsu. 2017. The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing. In Proc. VLDB.
[33] Xuanhua Shi, Zhigao Zheng, Yongluan Zhou, Hai Jin, Ligang He, Bo Liu, and Qiang-Sheng Hua. 2018. Graph processing on GPUs: A survey. In CSUR.
[34] Philip Stutz, Abraham Bernstein, and William Cohen. 2010. Signal/Collect: Graph Algorithms for the (Semantic) Web. In Proc. ISWC.
[35] Jack Waudby, Benjamin Steer, Arnau Prat-Pérez, and Gábor Szárnyas. 2020. Supporting Dynamic Graphs and Temporal Entity Deletions in the LDBC Social Network Benchmark’s Data Generator. In Proc. GRADES-NDA.
[36] Hao Yin, Austin Benson, Jure Leskovec, and David Gleich. 2017. Local Higher-Order Graph Clustering. In Proc. SIGKDD.