STING is a framework for real-time analysis of streaming graph-structured data on Intel-based platforms. It targets massive graphs that change over time, such as social networks, and maintains both the graph and derived metrics as the data changes rapidly. It uses a lockless data structure, STINGER, to accumulate the graph and its timestamps. STING has seen performance improvements of up to 14x on Intel platforms and is being used by several projects to analyze dynamic graphs and detect anomalies.
An introduction to the STINGER dynamic graph structure and analysis package. Shows the motivation for STINGER, what has been done with it, and how you can use it. More at http://cc.gatech.edu/stinger
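To illustrate the idea of accumulating a graph with timestamps, here is a minimal Python sketch. The layout and names are illustrative assumptions on my part; STINGER itself is a lockless, blocked adjacency structure written in C, not a Python dict.

```python
from collections import defaultdict

class TimestampedGraph:
    """Toy streaming graph: adjacency plus first/last-seen timestamps.
    Illustrative only; not the actual STINGER data structure."""
    def __init__(self):
        self.adj = defaultdict(dict)  # u -> {v: (first_ts, last_ts)}

    def update(self, u, v, ts):
        # Insert the undirected edge or refresh its most recent timestamp.
        for a, b in ((u, v), (v, u)):
            first, _ = self.adj[a].get(b, (ts, ts))
            self.adj[a][b] = (first, ts)

    def neighbors(self, u):
        return self.adj[u].keys()

g = TimestampedGraph()
g.update(1, 2, ts=100)
g.update(1, 2, ts=250)  # repeated edge: first_ts kept, last_ts refreshed
```

The per-edge (first, last) pair is what lets temporal metrics distinguish long-lived relationships from recent activity without replaying the whole stream.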
2. Outline
• Motivation
• STING for streaming, graph-structured data on Intel-based platforms
• World-leading community detection
• Community-building interactions
• Plans and direction
Note: Without hard limits, academics will use at least all the time...
3. (Prefix)scale Data Analysis
• Health care: finding outbreaks, population epidemiology
• Social networks: advertising, searching, grouping
• Intelligence: decisions at scale, regulating algorithms
• Systems biology: understanding interactions, drug design
• Power grid: disruptions, conservation
• Simulation: discrete events, cracking meshes
4. Graphs are pervasive
Graphs: things and relationships
• Different kinds of things, different kinds of relationships, but graphs provide a framework for analyzing the relationships.
• New challenges for analysis: data sizes, heterogeneity, uncertainty, data quality.
• Astrophysics: Problem: outlier detection. Challenges: massive data sets, temporal variation. Graph problems: matching, clustering.
• Bioinformatics: Problem: identifying target proteins. Challenges: data heterogeneity, quality. Graph problems: centrality, clustering.
• Social informatics: Problem: emergent behavior, information spread. Challenges: new analysis, data uncertainty. Graph problems: clustering, flows, shortest paths.
5. No shortage of data...
Existing (some out-of-date) data volumes:
• NYSE: 1.5 TB generated daily, maintaining an 8 PB archive
• Google: "several dozen" 1 PB data sets (CACM, Jan 2010)
• Wal-Mart: 536 TB, 1B entries daily (2006)
• EBay: 2 PB traditional DB and 6.5 PB streaming; 17 trillion records, 1.5B records/day; each web click is 50-150 details. http://www.dbms2.com/2009/04/30/ebays-two-enormous-data-warehouses/
• Facebook: 845 M users... and growing(?)
• All data is rich and semantic (graphs!) and changing.
• Base data rates include items and not relationships.
6. General approaches
• High-performance static graph analysis
• Develop techniques that apply to unchanging massive
graphs.
• Provides useful after-the-fact information, starting points.
• Serves many existing applications well: market research,
much bioinformatics, ...
• High-performance streaming graph analysis
• Focus on the dynamic changes within massive graphs.
• Find trends or new information as they appear.
• Serves upcoming applications: fault or threat detection,
trend analysis, ...
Both very important to different areas.
GT’s primary focus is on streaming.
Note: Not CS theory streaming, but analysis of streaming data.
7. Why analyze data streams?
Data volumes
• NYSE: 1.5TB daily
• LHC: 41TB daily
• Facebook, etc.: who knows?
Data transfer
• 1 Gb Ethernet: 8.7TB daily at 100%, 5-6TB daily realistic
• Multi-TB storage on 10GE: 300TB daily read, 90TB daily write
• CPU ↔ Memory (QPI, HT): 2PB/day at 100%
Data growth
• Facebook: > 2×/yr
• Twitter: > 10×/yr
• Growing sources: bioinformatics, µsensors, security
Speed growth
• Ethernet/IB/etc.: 4× in the next 2 years. Maybe.
• Flash storage, direct: 10× write, 4× read. Relatively huge cost.
8. Intel’s non-numeric computing
program (our perspective)
Supporting analysis of massive graphs under rapid change
across the spectrum of Intel-based platforms.
STING on Intel-based platforms
• Maintain a graph and metrics under large data changes
• Performance, documentation, distribution
• Greatly improved ingestion rate, improved example metrics
• Available at http://www.cc.gatech.edu/stinger/.
• Algorithm development
• World-leading community detection (clustering) performance
on Intel-based platforms
• Good for focused or multi-level analysis, visualization, ...
• Developments on community monitoring in dynamic graphs
9. On to STING...
Motivation
STING for streaming, graph-structured data on Intel-based
platforms
Assumptions about the data
High-level architecture
Algorithms and performance
World-leading community detection
Community-building interactions
Plans and direction
10. Overall streaming approach
Yifan Hu’s (AT&T) visualization of the Livejournal data set
Jason’s network via LinkedIn Labs
Assumptions
• A graph represents some real-world phenomenon.
• But not necessarily exactly!
• Noise comes from lost updates, partial information, ...
11. Overall streaming approach
Assumptions
• We target massive, “social network” graphs.
• Small diameter, power-law degrees
• Small changes in massive graphs often are unrelated.
12. Overall streaming approach
Assumptions
• The graph changes but we don’t need a continuous view.
• We can accumulate changes into batches...
• But not so many that it impedes responsiveness.
13. STING’s focus
[Diagram: source data flows into STING’s simulation / query component and on to visualization; control feeds back actions, predictions, and summaries]
• STING manages queries against changing graph data.
• Visualization and control often are application specific.
• Ideal: Maintain many persistent graph analysis kernels.
• One current snapshot of the graph, kernels keep smaller
histories.
• Also (a harder goal), coordinate the kernels’ cooperation.
• STING components:
• Framework for batching input changes.
• Lockless data structure, STINGER, accumulating a typed
graph with timestamps.
14. STINGER
STING Extensible Representation:
• Rule #1: No explicit locking.
• Rely on atomic operations.
• Massive graph: Scattered updates, scattered reads rarely
conflict.
• Use time stamps for some view of time.
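Rule #1 can be sketched with portable C11 atomics: a hypothetical edge slot whose weight is accumulated with fetch-and-add and whose timestamp is advanced with a compare-and-swap loop, never a mutex. This is an illustration of the lockless style only, not STINGER's actual data layout or API.

```c
#include <stdatomic.h>
#include <stdint.h>

enum { NUPDATES = 400000 };

typedef struct {
    _Atomic int64_t weight;     /* accumulated edge weight */
    _Atomic int64_t timestamp;  /* most recent modification time */
} edge_slot;

/* Apply one streamed update to an edge without taking a lock:
   fetch-and-add for the weight, a CAS loop to keep the newest timestamp. */
static void apply_update(edge_slot *e, int64_t dw, int64_t when)
{
    atomic_fetch_add(&e->weight, dw);
    int64_t old = atomic_load(&e->timestamp);
    while (old < when &&
           !atomic_compare_exchange_weak(&e->timestamp, &old, when))
        ;  /* another thread raced us; retry against its value */
}

int64_t run_lockless_updates(void)
{
    static edge_slot e;
    atomic_store(&e.weight, 0);
    atomic_store(&e.timestamp, 0);
    /* On a massive graph, scattered updates rarely conflict; here
       every update hits one slot, the worst case, and still sums up. */
    #pragma omp parallel for
    for (int64_t i = 0; i < NUPDATES; i++)
        apply_update(&e, 1, i);
    return atomic_load(&e.weight);
}
```

No update is lost even under contention, which is the point of relying on atomic operations rather than explicit locks.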
15. STING’s year
Over the past year
• Improved performance on Intel-based platforms by 14× [HPEC12]
• Evaluated multiple dynamic kernels across platforms [MTAAP12, ICASSP 2012]
• Documentation and tutorials
• Assist projects with STING adoption on Intel-based
platforms
16. STING in use
Where is it?
http://www.cc.gatech.edu/stinger/
Code, development, documentation, presentations...
Who is using it?
• Center for Adaptive Supercomputing Software –
Multithreaded Architectures (CASS-MT) directed by PNNL
• PRODIGAL team for DARPA ADAMS (Anomaly Detection at
Multiple Scales) including GTRI, SAIC, Oregon State U., U.
Mass., and CMU
• DARPA SMISC (Social Media in Strategic Communication)
including GTRI
• Well-attended tutorials given at PPoPP, in MD, and locally.
17. Experimental setup
Unless otherwise noted
Line       Model     Speed (GHz)  Sockets  Cores
Nehalem    X5570     2.93         2        4
Westmere   X5650     2.66         2        6
Westmere   E7-8870   2.40         4        10
• Large Westmere, mirasol, loaned by Intel (thank you!)
• All memory: 1067MHz DDR3, installed appropriately
• Implementations: OpenMP, gcc 4.6.1, Linux ≈ 3.2 kernel
• Artificial graph and edge stream generated by R-MAT
[Chakrabarti, Zhan, & Faloutsos].
• Scale x, edge factor f ⇒ 2^x vertices, ≈ f · 2^x edges.
• Edge actions: 7/8th insertions, 1/8th deletions
• Results over five batches of edge actions.
• Caveat: No vector instructions or low-level optimizations yet.
• Portable: OpenMP, Cray XMT family
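As a quick sanity check on the scale/edge-factor arithmetic above: scale 24 with edge factor 8 gives 2^24 ≈ 16.8M vertices and about 134M edges. A tiny helper (not part of STING, purely illustrative) makes this explicit:

```c
#include <stdint.h>

/* Graph size implied by an R-MAT scale and edge factor:
   2^scale vertices and roughly edgefactor * 2^scale edges. */
static int64_t rmat_nv(int scale)
{
    return INT64_C(1) << scale;
}

static int64_t rmat_ne(int scale, int edgefactor)
{
    return (int64_t)edgefactor << scale;
}
```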
20. Clustering coefficients
[Figure: vertex v with closed triplet i – v – j and open triplet m – v – n]
• Used to measure “small-world-ness” [Watts & Strogatz] and potential community structure
• Larger clustering coefficient ⇒ more inter-connected
• Roughly the ratio of the number of actual to potential triangles
• Defined in terms of triplets:
• i – v – j is a closed triplet (triangle).
• m – v – n is an open triplet.
• Clustering coefficient: (# of closed triplets) / (total # of triplets)
• Locally around v or globally for the entire graph.
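The definition can be made concrete with a small, hypothetical helper that computes the local coefficient of one vertex from a dense 0/1 adjacency matrix. STING itself works on adjacency lists under batched updates; this sketch only illustrates the formula: triangles through v divided by deg(v)·(deg(v)−1)/2 potential ones.

```c
#include <stddef.h>

/* Local clustering coefficient of vertex v:
   (# closed triplets centered at v) / (total triplets centered at v).
   adj is a dense n-by-n 0/1 adjacency matrix, row-major. */
double local_clustering(const int *adj, int n, int v)
{
    int deg = 0, closed = 0;
    for (int i = 0; i < n; i++)
        deg += adj[v * n + i];
    if (deg < 2)
        return 0.0;  /* no triplets possible around v */
    /* Count edges between distinct neighbors i, j of v: each such
       edge closes the triplet i - v - j into a triangle. */
    for (int i = 0; i < n; i++) {
        if (!adj[v * n + i]) continue;
        for (int j = i + 1; j < n; j++)
            if (adj[v * n + j] && adj[i * n + j])
                closed++;
    }
    return (double)closed / ((double)deg * (deg - 1) / 2.0);
}

/* Small example graph: a triangle 0-1-2 with a pendant vertex 3 on 0. */
static const int example_adj[16] = {
    0, 1, 1, 1,
    1, 0, 1, 0,
    1, 1, 0, 0,
    1, 0, 0, 0,
};
```

In the example graph, vertex 1 sits entirely inside the triangle (coefficient 1), while vertex 0's pendant neighbor opens two triplets that never close (coefficient 1/3).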
23. Connected components
• Maintain a mapping from vertex to
component.
• Global property, unlike triangle
counts
• In “scale free” social networks:
• Often one big component, and
• many tiny ones.
• Edge changes often sit within
components.
• Remaining insertions merge
components.
• Deletions are more difficult...
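One common way to maintain the vertex-to-component mapping under insertions, consistent with the sketch above though not necessarily STING's exact implementation, is a union-find structure: an edge inserted inside a component is a no-op (the frequent case in social networks), and an edge between components merges their labels.

```c
enum { NV = 8 };
static int comp[NV];  /* union-find parent / component label per vertex */

static void components_init(void)
{
    for (int i = 0; i < NV; i++)
        comp[i] = i;  /* every vertex starts in its own component */
}

/* Find the component root, halving paths as we go. */
static int component_of(int v)
{
    while (comp[v] != v) {
        comp[v] = comp[comp[v]];
        v = comp[v];
    }
    return v;
}

/* An inserted edge either lands inside one component (no work)
   or merges two components; returns 1 on a merge. */
static int insert_edge(int u, int v)
{
    int ru = component_of(u), rv = component_of(v);
    if (ru == rv)
        return 0;  /* edge change sits within a component */
    comp[ru] = rv;
    return 1;
}
```

Deletions get no such shortcut: removing an edge may or may not split a component, which is why the next slide treats them separately.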
24. Connected components: Deleted edges
The difficult case
• Very few deletions matter.
• Maintain a spanning tree, and ignore a deletion if
1. it is not in the spanning tree,
2. its endpoints share a common neighbor∗, or
3. the loose endpoint reaches the root∗.
• In the last two (∗), also fix the spanning tree.
• Rules out 99.7% of deletions.
• (Long history; see related work in [MTAAP11].)
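Rule 2, checking whether the endpoints of a deleted edge still share a neighbor (so the component cannot have split), can be sketched as a linear merge over sorted adjacency lists. This is a simplification for illustration; the example arrays are hypothetical.

```c
/* Return 1 if the two sorted neighbor lists share a vertex.
   If u and v have a common neighbor, deleting edge (u, v)
   cannot disconnect their component. */
static int share_common_neighbor(const int *nbr_u, int du,
                                 const int *nbr_v, int dv)
{
    int i = 0, j = 0;
    while (i < du && j < dv) {
        if (nbr_u[i] == nbr_v[j])
            return 1;               /* common neighbor found */
        if (nbr_u[i] < nbr_v[j])
            i++;                    /* advance the smaller side */
        else
            j++;
    }
    return 0;
}

/* Hypothetical sorted neighbor lists for two deletion scenarios. */
static const int ex_u[] = {1, 3, 5};
static const int ex_v[] = {2, 3, 7};  /* shares neighbor 3 with ex_u */
static const int ex_w[] = {2, 4, 6};  /* shares nothing with ex_u */
```

The check costs O(deg(u) + deg(v)), cheap enough to run on every deletion before falling back to the more expensive spanning-tree repair.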