This document covers tracking the structure of streaming social networks: monitoring clustering coefficients and connected components in dynamic social networks, along with initial work on community detection. Analyzing massive streaming complex networks is required to understand topics like disease spread and online communities and to perform large-scale simulations; examples include mining Twitter for social good and protecting public health through nationwide surveillance. The graph problems that arise from analyzing large-scale data are discussed.
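The clustering-coefficient monitoring mentioned above can be sketched in a few lines of Python. This is a toy illustration of the incremental idea, not the actual STINGER implementation, and the helper names (`insert_edge`, `clustering`) are hypothetical: when an edge (u, v) arrives, each common neighbor of u and v closes exactly one new triangle, so per-vertex triangle counts can be updated locally rather than recomputed globally.

```python
from collections import defaultdict

def make_graph():
    # Adjacency sets: adj[v] is the neighbor set of v.
    return defaultdict(set)

def insert_edge(adj, tri, u, v):
    """Update per-vertex triangle counts when edge (u, v) arrives.
    Each common neighbor w of u and v closes one new triangle
    touching u, v, and w."""
    common = adj[u] & adj[v]
    for w in common:
        tri[w] = tri.get(w, 0) + 1
    tri[u] = tri.get(u, 0) + len(common)
    tri[v] = tri.get(v, 0) + len(common)
    adj[u].add(v)
    adj[v].add(u)

def clustering(adj, tri, v):
    """Local clustering coefficient: closed triangles over wedges."""
    d = len(adj[v])
    return 2 * tri.get(v, 0) / (d * (d - 1)) if d > 1 else 0.0
```

Inserting the three edges of a triangle one at a time leaves every vertex with one triangle and a local clustering coefficient of 1.0; no global recount is needed.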
Reproducible Linear Algebra from Application to Architecture - Jason Riedy
All computing must be parallel to take advantage of modern systems like multicore processors, GPUs, and distributed systems. Results that are not bit-wise reproducible introduce doubt on many levels. Sometimes that is appropriate. Reproducibility limitations occur because underlying libraries do not specify their reproducibility requirements. New advances in interfaces, algorithms, and architectures allow selecting among those requirements in the future. This talk covers many of the upcoming options and their trade-offs.
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A... - Jason Riedy
The Rogues Gallery is a new experimental testbed focused on tackling "rogue" architectures for the Post-Moore era of computing. While some of these devices have roots in the embedded and high-performance computing spaces, managing current and emerging technologies presents challenges for system administration that are not always foreseen in traditional data center environments.
We present an overview of the motivations and design of the initial Rogues Gallery testbed and cover some of the unique challenges that we have seen and foresee with upcoming hardware prototypes for future post-Moore research. Specifically, we cover the networking, identity management, scheduling of resources, and tools and sensor access aspects of the Rogues Gallery and techniques we have developed to manage these new platforms. We argue that current tools like the Slurm resource manager can support new rogues without major infrastructure changes.
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture - Jason Riedy
All computing must be parallel to take advantage of modern systems like multicore processors, GPUs, and distributed systems. Results that are not bit-wise reproducible introduce doubt on many levels. Sometimes that is appropriate. Reproducibility limitations occur because underlying libraries do not specify their reproducibility requirements. New advances in interfaces, algorithms, and architectures allow selecting among those requirements in the future. This talk covers many of the upcoming options and their trade-offs.
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis - Jason Riedy
Applications in many areas analyze an ever-changing environment. On graphs with billions of vertices, providing snapshots imposes a large performance cost. We propose the first formal model for graph analysis running concurrently with streaming data updates. We consider an algorithm valid if its output is correct for the initial graph plus some implicit subset of concurrent changes. We show theoretical properties of the model, demonstrate the model on various algorithms, and extend it to updating results incrementally.
In one classic sense a rogue is someone who goes their own way, who breaks away from the crowd. The CRNCH Rogues Gallery aims to support computer architecture rogues by being a physical and virtual space providing access to novel computing architectures. Researchers find applications, and architects discover what happens when their prototypes hit reality. Our goals are to help kick-start software ecosystems, train students in novel system evaluation and use, and provide rapid feedback to architects. By exposing students and researchers to this set of unique hardware, we foster cross-cutting discussions about hardware designs that will drive future performance improvements in computing long after the Moore’s Law era of “cheap transistors” ends. We provide a brief description of the current Rogues Gallery along with successes and research highlights over the last year.
Augmented Arithmetic Operations Proposed for IEEE-754 2018 - Jason Riedy
Algorithms for extending arithmetic precision through compensated summation or arithmetics like double-double rely on operations commonly called twoSum and twoProduct. The current draft of the IEEE 754 standard specifies these operations under the names augmentedAddition and augmentedMultiplication. These operations were included after three decades of experience because of a motivating new use: bitwise reproducible arithmetic. Standardizing the operations provides a hardware acceleration target that can provide at least a 33% speed improvement in the reproducible dot product, placing it almost within a factor of two of the common dot product. This paper provides history and motivation for standardizing these operations. We also define the operations, explain the rationale for all the specific choices, and provide parameterized test cases for new boundary behaviors.
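The twoSum operation behind augmentedAddition can be expressed as Knuth's branch-free error-free transformation. The sketch below (plain Python on binary64) only illustrates the semantics; the standardized augmented operations differ slightly from classic twoSum in rounding (round-to-nearest, ties toward zero) and zero-sign handling, which this sketch does not model.

```python
def two_sum(a, b):
    """Knuth's branch-free twoSum: s = fl(a + b) and t is the exact
    rounding error, so a + b == s + t exactly (barring overflow)."""
    s = a + b
    b_virtual = s - a              # portion of b absorbed into s
    a_virtual = s - b_virtual      # portion of s attributed to a
    t = (a - a_virtual) + (b - b_virtual)
    return s, t
```

For example, `two_sum(1.0, 2**-60)` returns `(1.0, 2**-60)`: the tiny addend that rounding discards from the sum is preserved exactly in the error term, which is what makes compensated and reproducible summation possible.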
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms - Jason Riedy
The Rogues Gallery is a new concept focused on developing our understanding of next-generation hardware with a focus on unorthodox and uncommon technologies. This project, initiated by Georgia Tech's Center for Research into Novel Computing Hierarchies (CRNCH), will acquire new and unique hardware (i.e., the aforementioned "rogues") from vendors, research labs, and startups and make this hardware available to students, faculty, and industry collaborators within a managed data center environment. By exposing students and researchers to this set of unique hardware, we hope to foster cross-cutting discussions about hardware designs that will drive future performance improvements in computing long after the Moore's Law era of "cheap transistors" ends.
A New Algorithm Model for Massive-Scale Streaming Graph Analysis - Jason Riedy
Applications in computer network security, social media analysis, and other areas rely on analyzing a changing environment. The data is rich in relationships and lends itself to graph analysis. Traditional static graph analysis cannot keep pace with network security applications analyzing nearly one million events per second and social networks like Facebook collecting 500 thousand comments per second. Streaming frameworks like STINGER support ingesting up to three million edge changes per second, but there are few streaming analysis kernels that keep up with these rates. Here we present a new algorithm model for applying complex metrics to a changing graph. In this model, many more algorithms can be applied without having to stop the world.
High-Performance Analysis of Streaming Graphs - Jason Riedy
Graph-structured data in social networks, finance, network security, and others not only are massive but also under continual change. These changes often are scattered across the graph. Stopping the world to run a single, static query is infeasible. Repeating complex global analyses on massive snapshots to capture only what has changed is inefficient. We discuss requirements for single-shot queries on changing graphs as well as recent high-performance algorithms that update rather than recompute results. These algorithms are incorporated into our software framework for streaming graph analysis, STINGER.
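The asymmetry between insertions and deletions for streaming connected components can be seen with a generic union-find sketch (a textbook structure, not STINGER's actual algorithm): an inserted edge merges two components in near-constant amortized time, while a deletion may split a component and has no comparably cheap local update.

```python
class UnionFind:
    """Tracks connected components under edge insertions only."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def insert_edge(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru != rv:
            self.parent[ru] = rv   # merge the two components
```

Deleting an edge, by contrast, requires deciding whether the remaining edges still connect its endpoints, which union-find cannot answer without rebuilding; that is why update-rather-than-recompute algorithms for deletions are the interesting case.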
High-Performance Analysis of Streaming Graphs - Jason Riedy
Graph-structured data in social networks, finance, network security, and others not only are massive but also under continual change. These changes often are scattered across the graph. Stopping the world to run a single, static query is infeasible. Repeating complex global analyses on massive snapshots to capture only what has changed is inefficient. We discuss requirements for single-shot queries on changing graphs as well as recent high-performance algorithms that update rather than recompute results. These algorithms are incorporated into our software framework for streaming graph analysis, STING (Spatio-Temporal Interaction Networks and Graphs).
Algorithm for efficiently and accurately updating PageRank as the graph changes from a stream of updates. Also includes needs from the upcoming GraphBLAS to support high-performance streaming graph analysis.
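A minimal way to see the benefit of updating rather than recomputing PageRank is warm-starting: restart power iteration from the previous result after a batch of edge changes. This plain-Python sketch is not the paper's update algorithm (which propagates changes more selectively); `adj` maps every vertex 0..n-1 to its out-neighbor list and is an assumed representation.

```python
def pagerank(adj, n, x0=None, alpha=0.85, tol=1e-9, max_iter=1000):
    """Power iteration; adj maps each vertex 0..n-1 to its out-neighbors.
    Passing the previous result as x0 warm-starts the iteration."""
    x = list(x0) if x0 is not None else [1.0 / n] * n
    iters = 0
    for iters in range(1, max_iter + 1):
        y = [(1.0 - alpha) / n] * n          # teleportation term
        for u in range(n):
            outs = adj.get(u, [])
            if outs:
                share = alpha * x[u] / len(outs)
                for v in outs:
                    y[v] += share
            else:                             # dangling vertex:
                for v in range(n):            # spread mass uniformly
                    y[v] += alpha * x[u] / n
        delta = sum(abs(a - b) for a, b in zip(x, y))
        x = y
        if delta < tol:
            break
    return x, iters
```

After a small edge change, the warm-started run reaches the same fixed point as a cold recomputation, typically in fewer sweeps, since the old vector is already close to the new stationary distribution.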
20 Comprehensive Checklist of Designing and Developing a Website - Pixlogix Infotech
Dive into the world of Website Designing and Developing with Pixlogix! Looking to create a stunning online presence? Look no further! Our comprehensive checklist covers everything you need to know to craft a website that stands out. From user-friendly design to seamless functionality, we've got you covered. Don't miss out on this invaluable resource! Check out our checklist now at Pixlogix and start your journey towards a captivating online presence today.
Securing your Kubernetes cluster: a step-by-step guide to success! - KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Climate Impact of Software Testing at Nordic Testing Days - Kari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint: a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Enhancing adoption of Open Source Libraries: A case study on Albumentations.AI - Vladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Pushing the limits of ePRTC: 100ns holdover for 100 days - Adtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
A tale of scale & speed: How the US Navy is enabling software delivery from l... - sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATOs (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
How to Get CNIC Information System with Paksim Ga.pptx - danishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... - James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Full-RAG: A modern architecture for hyper-personalization - Zilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Graph Exploitation Seminar, 2011
1. Tracking Structure of Streaming Social Networks
Jason Riedy, David Ediger, David A. Bader, & Henning Meyerhenke
2. Overview
• Background: Georgia Tech's focus
• Dynamic social networks: target rates of change and software approach
• First proof-of-concept: monitoring clustering coefficients
• Monitoring connected components
– Edge insertions trivial, deletions...
• Initial work on community detection
• Forward directions
• Code: see http://www.cc.gatech.edu/~bader/code.html
Jason Riedy, GraphEx 2011 2
3. Exascale Streaming Data Analytics: Real-world challenges
All involve analyzing massive streaming complex networks:
• Health care: disease spread, detection and prevention of epidemics/pandemics (e.g. SARS, Avian flu, H1N1 “swine” flu)
• Massive social networks: understanding communities, intentions, population dynamics, pandemic spread, transportation and evacuation
• Intelligence: business analytics, anomaly detection, security, knowledge discovery from massive data sets
• Systems biology: understanding complex life systems, drug design, microbial research, unraveling the mysteries of the HIV virus; understanding life and disease
• Electric power grid: communication, transportation, energy, water, food supply
• Modeling and simulation: performing full-scale economic-social-political simulations
Exponential growth: Facebook alone has more than 750 million active users.
Sample queries:
• Allegiance switching: identify entities that switch communities.
• Community structure: identify the genesis and dissipation of communities.
• Phase change: identify significant change in the network structure.
Example: discovered minimal changes in an O(billions)-size complex network that could hide or reveal top influencers in the community.
REQUIRES PREDICTING / INFLUENCING CHANGE IN REAL-TIME AT SCALE
4. Example: Mining Twitter
for Social Good
ICPP 2010
Image credit: bioethicsinstitute.org
5. Massive Data Analytics: Protecting our Nation
Public health:
• CDC / nation-scale surveillance of public health
• Cancer genomics and drug design
– Computed betweenness centrality of the human proteome
[Figure: US high-voltage transmission grid, >150,000 miles of line]
[Figure: Human genome core protein interactions, degree vs. betweenness centrality; ENSG00000145332.2, Kelch-like protein 8, implicated in breast cancer, stands out]
6. Graphs are pervasive in large-scale data analysis
• Sources of massive data: petascale simulations, experimental devices, the Internet, scientific applications.
• New challenges for analysis: data sizes, heterogeneity, uncertainty, data quality.

Astrophysics
– Problem: outlier detection.
– Challenges: massive datasets, temporal variations.
– Graph problems: clustering, matching.

Bioinformatics
– Problems: identifying drug target proteins, de novo assembly.
– Challenges: data heterogeneity, quality.
– Graph problems: centrality, shortest paths, flows.

Social Informatics
– Problem: discover emergent communities, model spread of information.
– Challenges: new analytics routines, uncertainty in data.
– Graph problems: clustering, path-finding.

Image sources: (1) http://physics.nmt.edu/images/astro/hst_starfield.jpg (2,3) www.visualComplexity.com
7. Informatics graphs are tough
• Very different from graphs in scientific computing!
– Graphs can be enormous: billions of vertices and hundreds of billions of edges!
– Power-law distribution of the number of neighbors
– Small-world property: no long paths
– Very limited locality; not partitionable
• Perhaps clusterable?
– Highly unstructured
– Edges and vertices have types
• Experience in scientific computing applications provides only limited insight.
[Image: Six degrees of Kevin Bacon. Source: Seokhee Hong]
8. Center for Adaptive Supercomputing
Software for MultiThreaded Architectures (CASS-MT)
• Launched July 2008
• Pacific Northwest National Lab (PNNL)
– Georgia Tech, Sandia, WA State, Delaware
• The newest breed of supercomputers have hardware set up not just
for speed, but also to better tackle large networks of seemingly
random data. And now, a multi-institutional group of researchers has
been awarded over $14 million to develop software for these
supercomputers. Applications include anywhere complex webs of
information can be found: from internet security and power grid
stability to complex biological networks.
Image: www.visualComplexity.com
9. Ubiquitous High Performance Computing (UHPC)
Goal: develop highly parallel, security-enabled, power-efficient processing systems, supporting ease of programming, with resilient execution through all failure modes and intrusion attacks.
Architectural drivers:
• Energy efficiency
• Security and dependability
• Programmability
Program objectives:
• One PFLOPS in a single cabinet, including self-contained cooling
• 50 GFLOPS/W (equivalent to 20 pJ/FLOP)
• Total cabinet power budget of 57 kW, including processing resources, storage, and cooling
• Security embedded at all system levels
• Parallel, efficient execution models
• Highly programmable parallel systems
• Scalable systems: from terascale to petascale
David A. Bader (CSE) is on the Echelon leadership team: Echelon, Extreme-scale Compute Hierarchies with Efficient Locality-Optimized Nodes.
“NVIDIA-Led Team Receives $25 Million Contract From DARPA to Develop High-Performance GPU Computing Systems” (MarketWatch)
10. PRODIGAL: Proactive Detection of Insider Threats with Graph Analysis and Learning
Team: SAIC, Georgia Tech Research Institute, Carnegie Mellon University, Oregon State University, University of Massachusetts
ADAMS Program Kickoff Meeting, June 6-7, 2011 (presented Tuesday, June 7, 2011)
12. Workshop on Scalable
Graph Libraries
• Held at Georgia Tech, 29-30 June 2011
– Co-sponsored by PNNL & Georgia Tech
– Working workshop: Attendees collaborated on
common designs and data structures.
• Break-out groups for roadmap, data structure,
hierarchical capabilities, and requirement
discussions.
– 33 attendees from mix of sponsors, industry, and
academia.
– Outline and writing assignments given for a workshop
report. To be finished August 2011.
13. And more...
• Spatio-Temporal Interaction Networks and Graphs
Software for Intel platforms
– Intel Project on Parallel Algorithms in Non-Numeric
Computing, multi-year project
– Collaborators: Mattson (Intel), Gilbert (UCSB), Blelloch (CMU), Lumsdaine (Indiana)
• Graph500
– Shooting for reproducible benchmarks for massive graph-
structured data
• System evaluation
– PERCS, looking at Convey systems, etc.
• (If I'm forgetting something, bug me...)
14. Typical Rates
• Facebook
– 750M+ users (avg. 130 friends)
– 250M+ mobile users
– 30B items shared per month
• 12,000 per second
• Twitter
– Avg. 140M tweets per day (2,000 per sec.)
– 1B queries per day (12,000 per sec.)
– Basic statistics: > 100,000 writes per sec.
GigE: 1.4M packets per second
http://www.facebook.com/press/info.php?statistics
http://engineering.twitter.com/2010/10/twitters-new-search-architecture.html
http://techcrunch.com/2011/02/04/twitter-rainbird/
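The per-second rates above follow directly from the quoted daily and monthly totals; a quick sanity check of the arithmetic (assuming a 30-day month):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Facebook: 30 billion shared items per month.
fb_items_per_sec = 30e9 / (30 * SECONDS_PER_DAY)

# Twitter: 140 million tweets and 1 billion search queries per day.
tweets_per_sec = 140e6 / SECONDS_PER_DAY
queries_per_sec = 1e9 / SECONDS_PER_DAY

print(round(fb_items_per_sec))  # 11574, the slide's ~12,000 per second
print(round(tweets_per_sec))    # 1620, the slide rounds to ~2,000 per second
print(round(queries_per_sec))   # 11574, again ~12,000 per second
```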
15. So what can we handle now?
• Accumulate edge insertions and deletions forward in
an evolving snapshot.
– Idea: Graph too large to keep history easily accessible.
– Want memory for metrics, sufficient statistics.
• Most results on Cray XMT @ PNNL (1TiB).
• Monitor clustering coefficients (number of triangles /
number of triplets)
– Exact: 50k updates/sec, Approx: 193k updates/sec
• A fast approach to tracking connected components
in scale-free streaming graphs
– 240,000 updates per sec
– Speed up from 3x to 20x over recomputation
• Starting on agglomerative community detection...
16. Interlude: Our assumptions
• A graph is a representation of some
real-world phenomenon.
– Not an exact representation!
– Likely has noise...
• We are targeting “social networks.”
– Small diameter, power-law-ish degrees
– Massive graphs have different
characteristics than smaller samples.
– Have unexploited semantic aspects.
• Graphs change, but we don't need a
continuous view.
17. The Cray XMT
• Tolerates latency by massive multithreading.
– Hardware support for 128 threads on each processor
– Globally hashed address space
– No data cache
– Single cycle context switch
– Multiple outstanding memory requests
• Support for fine-grained, word-level synchronization
– Full/empty bit associated with every memory word
• Flexibly supports dynamic load balancing.
• Testing on a 128-processor XMT: 16384 threads
– 1 TB of globally shared memory
Image source: cray.com
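The full/empty-bit semantics can be emulated in software to see what the hardware provides; a minimal sketch (on the XMT, readfe/writeef are single hardware-supported memory operations, not lock-based objects like this one):

```python
import threading

class FullEmptyWord:
    """Software emulation of an XMT full/empty word: readfe() blocks until
    the word is 'full', returns the value, and marks it 'empty'; writeef()
    blocks until the word is 'empty', stores a value, and marks it 'full'."""

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False
        self._value = None

    def writeef(self, value):
        with self._cond:
            while self._full:            # wait for the empty state
                self._cond.wait()
            self._value, self._full = value, True
            self._cond.notify_all()

    def readfe(self):
        with self._cond:
            while not self._full:        # wait for the full state
                self._cond.wait()
            self._full = False
            self._cond.notify_all()
            return self._value

# One writer hands a value to one reader, synchronized on the word itself.
w = FullEmptyWord()
threading.Thread(target=lambda: w.writeef(42)).start()
print(w.readfe())  # 42
```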
18. Massive streaming data analytics
• Accumulate as much of the recent graph data as possible in main memory.
• Processing flow: insertions and deletions are pre-processed (sorted and reconciled), then alter the STINGER graph; the affected vertices drive metric updates and change detection; old vertices “age off.”
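The flow above can be sketched as a batch-processing loop. This is illustrative only, not the actual STINGER API; `process_batch` and `update_metrics` are hypothetical names:

```python
def process_batch(graph, update_metrics, actions):
    """graph: dict vertex -> set of neighbors (a stand-in for STINGER).
    actions: list of ('+'|'-', u, v) edge insertions/deletions."""
    # Pre-process: sort and reconcile. The last action on an edge wins,
    # so an insert followed by a delete cancels out within the batch.
    pending = {}
    for op, u, v in actions:
        pending[(min(u, v), max(u, v))] = op

    # Alter the graph, remembering which vertices were touched.
    affected = set()
    for (u, v), op in sorted(pending.items()):
        if op == '+':
            graph.setdefault(u, set()).add(v)
            graph.setdefault(v, set()).add(u)
        else:
            graph.get(u, set()).discard(v)
            graph.get(v, set()).discard(u)
        affected.update((u, v))

    # Affected vertices drive incremental metric updates and change
    # detection instead of a global recomputation.
    update_metrics(graph, affected)
    return affected
```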
19. STINGER: A temporal graph data structure
• Semi-dense edge list blocks with free space
• Compactly stores timestamps, types, weights
• Maps from application IDs to storage IDs
• Deletion by negating IDs, separate compaction
Core ideas: Index edges by source vertex and semantic type to provide fast iteration along both. Group edges to open up low-level parallelism.
For details of vertex & edge attributes, see the tech report: [Bader et al. 2009]
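A toy sketch of those core ideas, per-vertex chains of typed edge blocks with free space, and deletion by negating the stored ID. All names (`StingerSketch`, `BLOCK_SIZE`) are illustrative, and the real structure stores much more (weights and timestamps are shown, ID maps are not):

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tiny for illustration; real blocks hold many edges

@dataclass
class Edge:
    dest: int        # negated (via ~) to mark deletion
    weight: float
    timestamp: int

@dataclass
class EdgeBlock:
    etype: int                                  # one semantic type per block
    edges: list = field(default_factory=list)   # semi-dense: has free space

class StingerSketch:
    def __init__(self, nv):
        self.blocks = [[] for _ in range(nv)]   # blocks[v]: list of EdgeBlock

    def insert(self, src, dest, etype, weight, ts):
        # Reuse free space in an existing block of the same type.
        for blk in self.blocks[src]:
            if blk.etype == etype and len(blk.edges) < BLOCK_SIZE:
                blk.edges.append(Edge(dest, weight, ts))
                return
        blk = EdgeBlock(etype, [Edge(dest, weight, ts)])
        self.blocks[src].append(blk)

    def delete(self, src, dest):
        # Negate the ID; a separate compaction pass reclaims the slot.
        for blk in self.blocks[src]:
            for e in blk.edges:
                if e.dest == dest:
                    e.dest = ~dest

    def neighbors(self, src, etype=None):
        # Fast iteration by source vertex and, optionally, by type.
        for blk in self.blocks[src]:
            if etype is None or blk.etype == etype:
                for e in blk.edges:
                    if e.dest >= 0:   # skip deleted entries
                        yield e.dest
```

Grouping edges into fixed-size blocks is what exposes the low-level parallelism: each block can be scanned by a separate thread.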
20. Generating artificial data
• R-MAT (Chakrabarti, Zhan, Faloutsos) as a graph & edge stream generator
• Generate the initial graph with SCALE and edge factor F: 2^SCALE · F edges
– SCALE 20: 1M vertices; SCALE 24: 16M vertices
– Edge factors 8, 16
– Small, for testing...
• Generate 1 million actions
– Deletion chance 6.25% = 1/16
– Same R-MAT process, so it will prefer the same vertices
• Start with correct data on the initial graph
• Update data (clustering coeff., components)
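The R-MAT process picks one quadrant of the adjacency matrix per bit of the vertex ID, recursing SCALE times; a compact sketch (parameter values a, b, c are typical choices, not taken from the slides):

```python
import random

def rmat_edge(scale, a=0.57, b=0.19, c=0.19, rng=random):
    """One R-MAT edge [Chakrabarti et al. 2004]: at each of SCALE levels,
    choose a quadrant with skewed probabilities a, b, c, d = 1-a-b-c.
    The skew yields the power-law-ish degree distributions above."""
    src = dst = 0
    for _ in range(scale):
        r = rng.random()
        src <<= 1
        dst <<= 1
        if r < a:
            pass                      # top-left quadrant
        elif r < a + b:
            dst |= 1                  # top-right
        elif r < a + b + c:
            src |= 1                  # bottom-left
        else:
            src |= 1                  # bottom-right
            dst |= 1
    return src, dst

def rmat_stream(scale, edge_factor, seed=1):
    """Yield 2^SCALE * edge_factor edges, as for the initial graph above."""
    rng = random.Random(seed)
    for _ in range((1 << scale) * edge_factor):
        yield rmat_edge(scale, rng=rng)
```

Reusing the same process for the action stream is what makes updates prefer the same (high-degree) vertices.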
21. Clustering coefficients
• Vertex-local property: number of triangles / number of triplets
– Edge insertion & deletion affect only the local area.
• Can approximate by compressing each edge list into a Bloom filter.
• Ediger, Jiang, Riedy, Bader. “Massive Streaming Data Analytics: A Case Study with Clustering Coefficients” (MTAAP 2010)
– The approximation provides the exact result at this scale.
– B is the batch size.

Updates per second:
Algorithm    B = 1    B = 1000    B = 4000
Exact           90      25,100      50,100
Approx.         60      83,700     193,300

32 of 64P Cray XMT, 16M vertices, 134M edges
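The locality is visible in a minimal exact-update sketch: inserting edge (u, v) creates one triangle per common neighbor, so only u, v, and those neighbors change. Plain Python sets stand in here for STINGER edge lists; the Bloom-filter approximation of the neighbor sets is not shown:

```python
from collections import defaultdict

adj = defaultdict(set)        # undirected adjacency
triangles = defaultdict(int)  # per-vertex triangle counts

def insert_edge(u, v):
    """Incrementally update triangle counts for a new undirected edge."""
    common = adj[u] & adj[v]          # exact neighbor-set intersection
    for w in common:
        triangles[w] += 1             # each common neighbor gains a triangle
    triangles[u] += len(common)
    triangles[v] += len(common)
    adj[u].add(v)
    adj[v].add(u)

def clustering_coefficient(v):
    """triangles / triplets, where triplets(v) = d*(d-1)/2."""
    d = len(adj[v])
    return 2 * triangles[v] / (d * (d - 1)) if d > 1 else 0.0
```

Deletion is symmetric: subtract one triangle per common neighbor before removing the edge.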
22. Clustering coefficients performance
• Performance depends on degrees adjacent to
changes, so scattered.
• 5k batches run out of parallel work quickly.
• Depending on app, could focus on throughput.
23. Tracking connected components
• Global property
– Track <vertex, component>
mapping
– Undirected, unweighted graphs
• Scale-free network
– most changes are within one large
component
• Edge insertions
– primarily merge small component
into the one large component
• Deletions
– rarely disconnect components
– small fraction of the edge stream
24. Insertions: the easy case
Edge insertion (in batches):
• Relabel the batch of insertions with component numbers.
• Remove duplicates and self-edges.
– Now the batch holds at most (# components)² edges.
– Each edge is a component merge; relabel with the smaller ID.
• Does not access the massive graph!
– Can proceed concurrently with STINGER modification.
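The merge step can be sketched with a union-find over component IDs; only the relabeled batch and the vertex-to-component map are touched, never the massive graph. Names here are illustrative:

```python
def merge_components(comp, batch):
    """comp: dict vertex -> component ID. batch: iterable of (u, v)
    edge insertions. Each surviving edge is a component merge; the
    larger ID is relabeled into the smaller, as on the slide."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in batch:
        cu, cv = find(comp[u]), find(comp[v])
        if cu != cv:
            if cu > cv:
                cu, cv = cv, cu
            parent[cv] = cu               # merge into the smaller ID

    # Flatten the map so every vertex points at its final component.
    for v in comp:
        comp[v] = find(comp[v])
    return comp
```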
25. The problem with deletions
• An edge insertion is at most a merge.
• An edge deletion may split components, but rarely does.
• Establishing connectivity after a deletion could be a global operation!
– But only one component is sufficiently large...
26. Not a new problem
• Shiloach & Even (1981): Two breadth-first searches
– 1st to re-establish connectivity, 2nd to find separation
• Eppstein et al. (1997): Partition according to degree
• Henzinger, King, Warnow (1999): Sequence & coloring
• Henzinger & King (1999): Partition dense to sparse
– Start BFS in the densest subgraph and move up
• Roditty & Zwick (2004): Sequence of graphs
• Conclusions:
– In the worst case, we must re-run the global component computation.
– Avoid it with heuristics when possible.
27. Handling deletions
Two methods:
• Heuristics (in a moment)
• Collect deletions over larger, multi-batch
epochs.
– Intermediate component information is
approximate.
– Could affect users (e.g. sampling), but
deletions rarely matter.
– Once epoch is over, results are exact.
28. Tackling deletions at speed...
Rule out effects:
If the edge is not in the spanning tree: SAFE
If the edge endpoints have a common neighbor: SAFE
If a neighbor can still reach the root of the tree: SAFE
Else: possibly cleaved a component
[Figure: spanning-tree edges vs. non-tree edges, root vertex in black, X marking the deleted edge]
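A sketch of those rule-out checks against a per-component spanning tree. This is an illustration of the stated rules, not the XMT implementation; `deletion_is_safe` and its arguments are hypothetical names:

```python
def deletion_is_safe(u, v, adj, tree_parent, root):
    """adj: adjacency sets *after* deleting edge (u, v).
    tree_parent: spanning-tree parent of each non-root vertex.
    Returns True when the deletion provably cannot split the
    component; False means fall back to (rare) global recomputation."""
    # Rule 1: not a spanning-tree edge, so connectivity is unchanged.
    if tree_parent.get(u) != v and tree_parent.get(v) != u:
        return True
    # Rule 2: endpoints share a neighbor; a triangle bridges the cut.
    if adj[u] & adj[v]:
        return True
    # Rule 3: the orphaned endpoint has a neighbor whose tree path
    # still reaches the root without using the deleted edge.
    child = u if tree_parent.get(u) == v else v
    for w in adj[child]:
        x = w
        while x != root:
            p = tree_parent.get(x)
            if p is None or {x, p} == {u, v}:
                break                 # path broken; try another neighbor
            x = p
        if x == root:
            return True
    return False   # possibly cleaved a component
```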
31. Performance: XMT & batching
Updates per second, 32 of 128P Cray XMT, 16M vertices, 135M edges:

Algorithm                                    B = 10,000   B = 1,000,000
Insertions only                                  21,000         931,000
Insertions + deletion heuristic (neighbors)      11,800         240,000
Static connected components                       1,070          78,000

• Greater parallelism within a batch yields higher performance.
• Trade-off between time resolution and update speed.
• Note: not delaying deletions; only one heuristic used.
32. Batching & data structure trade-off
[Figures: NV = 1M, NE = 8M and NV = 16M, NE = 135M; 32 of 128P Cray XMT]
• Insertions-only improvement of 10x-20x over recomputation
• Triangle heuristic 10x-15x faster for small batch sizes
• Better scaling with increasing graph size than static connected components
34. Community detection: Static & scalable
• Agglomerative method: merge graph vertices to maximize modularity (or another metric).
• Parallel version: compute a heavy matching, then merge all matched pairs at once.
• Scalable with enough data (XMT at PNNL at right; slower on Intel currently).
• 122M vertices, 2B edges: ~2 hours!
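One pass of the match-then-merge scheme can be sketched as follows. For simplicity this toy version scores edges by raw weight rather than by the modularity gain the real method optimizes, and uses a sequential greedy matching in place of a parallel one:

```python
def match_and_contract(edges, nv):
    """edges: dict {(u, v): weight} with u < v, over nv vertices.
    Greedily builds a heavy matching, then contracts every matched
    pair at once. Returns (new_edges, vertex -> label map, #merges)."""
    mate = {}
    for (u, v), w in sorted(edges.items(), key=lambda kv: -kv[1]):
        if u not in mate and v not in mate:
            mate[u], mate[v] = v, u          # greedy heavy matching

    # Contract: both endpoints of a matched pair take the smaller ID.
    label = {v: min(v, mate.get(v, v)) for v in range(nv)}
    new_edges = {}
    for (u, v), w in edges.items():
        a, b = sorted((label[u], label[v]))
        if a != b:                           # drop intra-community edges
            new_edges[(a, b)] = new_edges.get((a, b), 0) + w
    return new_edges, label, len(mate) // 2
```

Repeating the pass on the contracted community graph gives the agglomeration; merging a whole matching per pass is what makes the parallel version diverge from the one-merge-at-a-time sequential method discussed on the next slide.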
35. Community detection: Quality?
• Agglomerative method: merge graph vertices to maximize modularity (or another metric).
• The parallel version computes something different than the sequential one.
• Optimizing a non-linear quantity in an irregular way...
• (SNAP: http://snap-graph.sf.net)
36. Community detection: Streaming...
• Agglomerative method: merge graph vertices to maximize modularity (or another metric).
• In progress...
– Algorithm: maintain the agglomerated community graph. De-agglomerate affected vertices and restart agglomeration.
– A sequential implementation is promising...
• Other variations also in progress.
– Seed-set expansion: select a handful of vertices and grow a relevant region around them. Parallelism within one set is less interesting than maintaining many...
37. Conclusions
• Graphs are fun (a.k.a. difficult)!
– Real-world benefits and interest
• Parallel graph algorithms are possible... but not always obvious.
• Streaming opens up new avenues for performance.
– Can achieve real-world rates and sizes now with special architectures (XMT); close on common ones (Intel).
– Shooting for faster & larger & better & ...
38. Related publications
• D. A. Bader, J. Berry, A. Amos-Binks, D. Chavarría-Miranda, C. Hastings, K. Madduri, and S. C. Poulos, “STINGER: Spatio-Temporal Interaction Networks and Graphs (STING) Extensible Representation,” Georgia Institute of Technology, Tech. Rep., 2009.
• D. Ediger, K. Jiang, J. Riedy, and D. Bader, “Massive Streaming Data Analytics: A Case Study with Clustering Coefficients,” Fourth Workshop on Multithreaded Architectures and Applications (MTAAP), Atlanta, GA, April 2010.
• D. Ediger, K. Jiang, J. Riedy, D. Bader, C. Corley, R. Farber, and W. Reynolds, “Massive Social Network Analysis: Mining Twitter for Social Good,” International Conference on Parallel Processing, San Diego, CA, September 13-16, 2010.
• D. Ediger, J. Riedy, D. Bader, and H. Meyerhenke, “Tracking Structure of Streaming Social Networks,” Fifth Workshop on Multithreaded Architectures and Applications (MTAAP), 2011.
• J. Riedy, D. Bader, K. Jiang, P. Panda, and R. Sharma, “Detecting Communities from Given Seed Sets in Social Networks,” Georgia Tech Technical Report, http://hdl.handle.net/1853/36980
• J. Riedy, H. Meyerhenke, D. Ediger, and D. Bader, “Parallel Community Detection for Massive Graphs,” Ninth International Conference on Parallel Processing and Applied Mathematics, September 11-14, 2011.
39. References
• D. Chakrabarti, Y. Zhan, and C. Faloutsos, “R-MAT: A recursive model for graph mining,” in Proc. 4th SIAM Intl. Conf. on Data Mining (SDM), Orlando, FL: SIAM, Apr. 2004.
• D. Eppstein, Z. Galil, G. F. Italiano, and A. Nissenzweig, “Sparsification—a technique for speeding up dynamic graph algorithms,” J. ACM, vol. 44, no. 5, pp. 669–696, 1997.
• M. R. Henzinger and V. King, “Randomized fully dynamic graph algorithms with polylogarithmic time per operation,” J. ACM, vol. 46, p. 516, 1999.
• M. R. Henzinger, V. King, and T. Warnow, “Constructing a tree from homeomorphic subtrees, with applications to computational evolutionary biology,” Algorithmica, 1999, pp. 333–340.
• L. Roditty and U. Zwick, “A fully dynamic reachability algorithm for directed graphs with an almost linear update time,” in Proc. of ACM Symposium on Theory of Computing, 2004, pp. 184–191.
• Y. Shiloach and S. Even, “An on-line edge-deletion problem,” J. ACM, vol. 28, no. 1, pp. 1–4, 1981.