The document discusses new algorithm models and architectures for graph analysis on streaming data. It introduces an algorithm model in which analysis runs concurrently with graph changes, without locking the graph. Some algorithms, such as degree computation, are valid under this model, while others, such as shortest paths, may not be. A new architecture called Emu uses lightweight threads that migrate to the data; on pointer-chasing benchmarks it performs better than CPUs by making better use of memory bandwidth. Overall, the document explores new approaches for analyzing massive streaming graphs in real time.
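As a rough illustration of what "valid" can mean here, the sketch below reads it as: the output matches the initial graph plus some subset of the concurrent changes. It is only a toy under stated assumptions (directed edges, insert-only updates, out-degree as the metric), and the fixed `schedule` stands in for a truly concurrent writer. The function names are hypothetical and do not come from the slides.

```python
from collections import defaultdict


def out_degrees(edges, vertices):
    """Out-degree of each vertex in a static directed edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
    return {x: len(adj[x]) for x in vertices}


def scan_during_updates(initial_edges, updates, vertices, schedule):
    """Read out-degrees vertex by vertex while insertions land in between.

    schedule[i] is the number of pending updates applied just before
    vertices[i] is read (an assumed, deterministic interleaving).
    Returns the observed degrees and the subset of updates they reflect.
    """
    adj = defaultdict(set)
    for u, v in initial_edges:
        adj[u].add(v)
    pending = list(updates)
    seen = {}    # vertex -> out-degree observed by the scan
    subset = []  # updates applied before their source vertex was read
    for vertex, n_apply in zip(vertices, schedule):
        for _ in range(n_apply):
            if not pending:
                break
            u, v = pending.pop(0)
            if u not in seen:
                subset.append((u, v))
            adj[u].add(v)
        seen[vertex] = len(adj[vertex])
    return seen, subset


initial = [(0, 1), (1, 2)]
updates = [(0, 2), (2, 0), (1, 0)]
vertices = [0, 1, 2]
seen, subset = scan_during_updates(initial, updates, vertices, [0, 2, 1])

# The scan never locked the graph, yet its output equals the out-degrees
# of the initial graph plus the subset of updates it happened to observe.
assert seen == out_degrees(initial + subset, vertices)
```

In this run the observed result matches neither the initial graph nor the final one, but it does match the initial graph plus one of the three insertions. The toy uses out-degree on a directed graph because each insertion then touches exactly one counter, which makes the subset easy to identify; other metrics need their own argument.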
Graph Analysis: New Algorithm Models, New Architectures
1. Graph Analysis: New Algorithm Models, New Architectures
E. Jason Riedy and a large supporting cast of students
Georgia Institute of Technology
ACM Computing Frontiers, May
4. (insert prefix here)-scale data analysis
Cyber-security: Identify anomalies, malicious actors
Health care: Finding outbreaks, population epidemiology
Social networks: Advertising, searching, grouping
Intelligence: Decisions at scale, regulating markets, smart & sustainable cities
Systems biology: Understanding interactions, drug design
Power grid: Disruptions, conservation
Simulation: Discrete events, cracking meshes
Changes are important. Cannot stop the world...
New! And Graphs! — ACM CF, May /
5. Potential Applications
• Social Networks
  • Identify communities, influences, bridges, trends, anomalies (trends before they happen)...
  • Potential to help social sciences, city planning, and others with large-scale data.
• Cyber-security
  • Determine if new connections can access a device or represent a new threat in <5ms...
  • Is a data transfer caused by a virus / persistent threat?
• Bioinformatics, health
  • Construct gene sequences, analyze protein interactions, map brain interactions
• Credit fraud forensics ⇒ detection ⇒ monitoring
  • Real-time integration of all the customer’s data
6. Streaming graph data
Network data rates:
• Gigabit ethernet: k – . M packets per second
• Over flows per second on GigE (< . µs)
Person-level data rates:
• M posts per day on Twitter ( k / sec)
• M posts per minute on Facebook ( k / sec)
Should analyze only the changes, not the entire graph.
Throughput & latency trade off and expose different levels of concurrency.
www.internetlivestats.com/twitter-statistics/
www.jeffbullas.com/ / / / -awesome-facebook-facts-and-statistics-you-need-to-check-out/
7. Streaming graph analysis
Terminology (not universal):
• Streaming changes into a massive, evolving graph
• Need to handle deletions as well as insertions
Previous STINGER performance results (x - ):
Data ingest: > M upd/sec [Ediger, McColl, Poovey, Campbell, & Bader ]
Clustering coefficients: > K upd/sec [R, Meyerhenke, B, E, & Mattson ]
Connected comp.: > M upd/sec [McColl, Green, & B ]
Community clustering: > K upd/sec∗ [R & B ]
PageRank: Up to × latency improvement [R ]
9. Starting incremental / streaming algorithms
• Incremental and streaming algorithms start somewhere.
• Initial, static computation can take a rather long time...
• During which the graph cannot change?
• What about supporting many simultaneous analyses?
[Figure: Data ingest rates, R-MAT into R-MAT, scales &. Update rate (upd/s, 1e+02 to 1e+06) versus batch size (1 to 1e+05) for platforms Power8, Haswell, and Haswell-30.]
What can we run while the graph changes?
10. Starting incremental / streaming algorithms
[Diagram labels: Graph, Changes, PageRank, Clustering Coefficients, Clusters, s-t Path]
What can we run while the graph changes?
11. What if we don’t hold up changes?
When is an algorithm valid?
Analyze concurrently with the graph changes, and produce a result correct for the starting graph and some implicit subset of concurrent changes.
• No locking beyond atomic operations.
• No versioned data structure.
• No stopping.
Extreme model for extreme data rates.
Chunxing Yin, Riedy, Bader. “Validity of Graph Algorithms on
Streaming Data.” . (in submission)
12. Sample of other execution models
• Put in a query, wait for sufficient data [Phillips, et al. at Sandia]
  • Different but very interesting model.
• Evolving: Sample, accurate with high probability.
  • Difficult to generalize into graph results (e.g. shortest path tree).
• Classical: dynamic algorithms, versioned data
  • Can require drastically more storage, possibly a copy of the graph per property, or more overhead for techniques like read-copy-update.
Generally do not address the latency of computing the “static” starting point.
13. Algorithm validity in our model: Example.
Can you compute degrees in an undirected graph (no self loops) concurrently with changes?
Algorithm: Iterate over vertices, count the number of neighbors.
[Timeline: Compute deg(v1); delete edge; Compute deg(v2)]
If the deleted edge joins v1 and v2, the first count includes it and the second does not.
Cannot correspond to an undirected graph at all!
Valid for our model? No!
Not incorrect, just not valid for our model.
14. Algorithm validity in our model: Example.
Can you compute degrees in an undirected graph (no self loops) concurrently with changes?
Algorithm: Iterate over edges, increment the degrees of the endpoints.
[Timeline: Inc deg(v1), deg(v2) (later...); delete edge]
Corresponds to the beginning graph plus a subset of concurrent changes.
Valid for our model? Yes!
Undirected stored as directed: skip edges (v1, v2) with v1 ≥ v2.
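The two degree algorithms can be compared in a small simulation. This is a minimal sketch, not the authors' code. It uses no real threads, and the "concurrent" deletion is injected at a fixed point in each scan. The names `scan_vertices` and `scan_edges`, the triangle graph, and the injection points are all illustrative assumptions.

```python
def scan_vertices(n, edges, deleted, delete_after_vertex):
    """Vertex-by-vertex degree count with one deletion injected mid-scan."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    deg = [0] * n
    for v in range(n):
        deg[v] = len(adj[v])
        if v == delete_after_vertex:
            a, b = deleted            # the concurrent change lands here
            adj[a].discard(b)
            adj[b].discard(a)
    return deg


def scan_edges(n, edges, deleted, delete_at_step):
    """Edge-by-edge degree count; each live edge bumps both endpoints."""
    alive = {e: True for e in edges}
    deg = [0] * n
    for step, (u, v) in enumerate(edges):
        if step == delete_at_step:
            alive[deleted] = False    # the concurrent change lands here
        if alive[(u, v)]:
            deg[u] += 1
            deg[v] += 1
    return deg


triangle = [(0, 1), (0, 2), (1, 2)]

# Vertex scan: the degree sum is odd, so no undirected graph has these degrees.
mixed = scan_vertices(3, triangle, (0, 1), delete_after_vertex=0)

# Edge scan: the result matches the graph with or without the deletion.
with_change = scan_edges(3, triangle, (0, 1), delete_at_step=0)
without_change = scan_edges(3, triangle, (0, 1), delete_at_step=1)
```

In the edge scan each edge is either counted at both endpoints or at neither, which is why every interleaving lands on the starting graph plus some subset of the changes.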
15. Algorithm validity in our model
[Figure: graph from source s with edge weights w(e), one weight changing, and bucket width ∆]
• What is valid?
• Typical (direction optimizing) BFS
• Shiloach-Vishkin connected components
• PageRank, Katz via Jacobi
• Label propagation
• Triangle counting (carefully!)
• Saved decisions (can make a copy)...
• Extracting a subgraph or path.
• What may be invalid?
• Making a decision twice in implementations
• ∆-stepping SSSP: Decrease a weight below ∆
• Degree optimization: Cross threshold, miss vertex
• Applying old or different information
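The "making a decision twice" hazard can be sketched with a two-pass, degree-thresholded traversal. This is an illustrative assumption about what such an implementation might look like, not code from the talk. `THRESHOLD`, `two_pass_rereading`, `two_pass_saved`, and the sample reads are hypothetical. Each vertex maps to the degree observed in pass 1 and the degree observed in pass 2, standing in for a graph that changed between the passes.

```python
THRESHOLD = 2


def two_pass_rereading(degree_reads):
    """Re-evaluate the low/high decision in each pass (may miss vertices)."""
    processed = []
    for v, (d1, _) in degree_reads.items():   # pass 1: low-degree vertices
        if d1 < THRESHOLD:
            processed.append(v)
    for v, (_, d2) in degree_reads.items():   # pass 2: high-degree vertices
        if d2 >= THRESHOLD:
            processed.append(v)
    return processed


def two_pass_saved(degree_reads):
    """Make the low/high decision once, save it, and reuse it in pass 2."""
    is_low = {v: d1 < THRESHOLD for v, (d1, _) in degree_reads.items()}
    low = [v for v in is_low if is_low[v]]
    high = [v for v in is_low if not is_low[v]]
    return low + high


# Vertex "c" is high-degree in pass 1 but drops below the threshold
# before pass 2 reads it again.
reads = {"a": (1, 1), "b": (3, 3), "c": (3, 1)}
missed = two_pass_rereading(reads)   # "c" falls through both passes
covered = two_pass_saved(reads)      # every vertex handled exactly once
```

Saving the decision costs a copy of one flag per vertex, which matches the "saved decisions (can make a copy)" entry on the valid side of the list.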
16. Fun properties for one-shot queries
Due to Chunxing Yin, under sensible assumptions:
  1. You can produce a single-change stream to demonstrate invalidity.
    • Idea: Start with a graph that incorporates all the visible changes, introduce the one change at the right time.
  2. Algorithms that produce a subgraph of their input cannot be guaranteed to run concurrently with changes and always produce moment-in-time outputs.
    • Idea: Any time a snapshot result could happen, delete then re-insert an edge from the output.
17. On to streaming...
Can we update graph metrics as new data arrives without just re-running?
• Track what changed during the one-shot query.
• Update locally around those changes, while other changes are occurring.
• If the update is valid, can repeat to follow a streaming graph.
[Diagram: Initial, then Upd. w/∆, then Upd. w/∆, ..., with changes ∆ collected during each step]
Examples: PageRank by refinement; connected components by maintaining a spanning forest.
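The PageRank refinement idea can be sketched as a warm start: after a batch of changes, restart the iteration from the previous vector instead of from scratch. This is a minimal sketch under stated assumptions, not the method from the talk. A plain fixed-point iteration stands in for the solver, the batch is applied between runs rather than concurrently, and `pagerank`, `alpha`, `tol`, and the four-vertex graph are illustrative choices.

```python
def pagerank(edges, n, x0=None, alpha=0.85, tol=1e-10, max_iter=1000):
    """Fixed-point PageRank iteration; returns (vector, iterations used)."""
    out = [[] for _ in range(n)]
    for u, v in edges:
        out[u].append(v)
    x = list(x0) if x0 is not None else [1.0 / n] * n
    for it in range(1, max_iter + 1):
        # Rank held by vertices with no out-edges is spread uniformly.
        dangling = sum(x[u] for u in range(n) if not out[u])
        nxt = [(1.0 - alpha) / n + alpha * dangling / n] * n
        for u in range(n):
            if out[u]:
                share = alpha * x[u] / len(out[u])
                for v in out[u]:
                    nxt[v] += share
        diff = sum(abs(a - b) for a, b in zip(nxt, x))
        x = nxt
        if diff < tol:
            return x, it
    return x, max_iter


initial_edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
x_initial, _ = pagerank(initial_edges, 4)            # "Initial"

delta = [(3, 0)]                                     # changes collected meanwhile
updated_edges = initial_edges + delta
x_warm, iters_warm = pagerank(updated_edges, 4, x0=x_initial)   # "Upd. w/∆"
x_cold, iters_cold = pagerank(updated_edges, 4)      # full re-run, for comparison
```

Both runs converge to the same vector. The warm start typically needs fewer iterations when the batch is small, though that depends on the graph and the change.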
18. Open issues
Difficult problems: Updating triangle counts efficiently!
• Option: re-counting a region around changes, stopping once counts do not change.
• Can mis-count on the region’s border, but only at changes.
• Next run can fix those... A looser model?
Some algorithms essentially copy subgraphs.
• What are the size bounds?
• Can those bounds characterize algorithms / properties?
20. Limitations of current architectures
• Graph analysis often uses relatively narrow memory accesses, e.g. separate -byte integers.
• Currently under-utilizing memory bandwidth.
• One-eighth of a cache line: one-eighth of bandwidth.
• Typical DRAM pages are ≥ KiB. Entire page must be powered on for an operation.
• New HBM: Kib-wide ⇒ potentially / th BW
A new approach from Emu Technology: Lightweight threads migrating to data in narrow-channel DRAM.
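The one-eighth figure can be checked with a line of arithmetic. This sketch assumes a 64-byte cache line and 8-byte integers, sizes chosen to reproduce the one-eighth ratio. The peak bandwidth value and the function names are hypothetical.

```python
def useful_fraction(access_bytes, line_bytes):
    """Fraction of each transferred cache line that the program uses."""
    return access_bytes / line_bytes


def effective_bandwidth(peak_gb_s, access_bytes, line_bytes):
    """Useful bandwidth when every access touches a different line."""
    return peak_gb_s * useful_fraction(access_bytes, line_bytes)


fraction = useful_fraction(8, 64)              # assumed sizes: 8 B of a 64 B line
useful = effective_bandwidth(100.0, 8, 64)     # hypothetical 100 GB/s peak
```

Under these assumptions a fully scattered access pattern delivers at most 12.5 GB/s of useful data from a 100 GB/s memory system.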
21. Emu PGAS architecture
[Diagram: 1 nodelet holds Gossamer Cores 1 through 4, a Memory-Side Processor, and a Migration Engine; 8 nodelets per node; 64 nodelets per Chick; RapidIO, Disk I/O, and a Stationary Core.]
• Multithreaded multicore
• Memory-side “processor” for atomics, etc. w/NCDIMM
• Stationary core for OS
• Threads migrate in hardware on reads!
22. Emu Chick prototype
Experimental system:
• Soft processors (Arria FPGAs)
• One Gossamer Core (GC) per nodelet, max threadlets
• Memory and cores are under-clocked.
• Firmware bugs limit inter-node migration, file I/O
23. Pointer chasing benchmark
Data-dependent loads, fine-grained access
Ordered
Intra-block shuffle: weak locality
Full block shuffle: weak locality
Eric Hein, Young, Srinivas Eswar, Jiajia Li, Patrick Lavin, Vuduc, Riedy.
“An Initial Characterization of the Emu Chick,” AsHES .
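A pointer-chasing kernel of this kind can be sketched as a linked list whose traversal order is shuffled at block granularity. This is a rough illustration, not the benchmark's source. The mode definitions (shuffle elements within each block, shuffle the order of blocks, or both) are an assumption about what the named shuffles do, and `build_order`, `build_list`, and `chase` are hypothetical names.

```python
import random


def build_order(n, block, mode, seed=0):
    """Return the order in which the n list elements will be visited."""
    rng = random.Random(seed)
    blocks = [list(range(s, min(s + block, n))) for s in range(0, n, block)]
    if mode in ("intra_block_shuffle", "full_block_shuffle"):
        for b in blocks:
            rng.shuffle(b)            # scramble elements inside each block
    if mode in ("block_shuffle", "full_block_shuffle"):
        rng.shuffle(blocks)           # scramble the order of whole blocks
    return [i for b in blocks for i in b]


def build_list(order):
    """Link elements so that following nxt[] visits them in `order`."""
    nxt = [-1] * len(order)
    for a, b in zip(order, order[1:]):
        nxt[a] = b
    return order[0], nxt


def chase(head, nxt):
    """Data-dependent loads: each index comes from the previous load."""
    count, i = 0, head
    while i != -1:
        count += 1
        i = nxt[i]
    return count


MODES = ("ordered", "intra_block_shuffle", "block_shuffle", "full_block_shuffle")
visited = {}
for mode in MODES:
    head, nxt = build_list(build_order(64, 8, mode))
    visited[mode] = chase(head, nxt)
```

Timing `chase` over large lists while sweeping the block size is what would expose the locality effects. The sketch only checks that every element is reached.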
24. Pointer Chasing: Intel Xeon
Performance varies drastically.
25. Pointer Chasing: Emu Chick
Matches simulation to a consistent factor of two.
Simulation of larger, full Emu systems shows promising results... More later.
26. Pointer Chasing: Bandwidth utilization
Full shuffle. Measured against STREAM.
27. Pointer Chasing: Bandwidth scaling
Full machine results. STREAM around GB/s.
Still need many threads, but not as many as MTA/XMT.
(Thanks to Eric Hein.)
28. Pointer Chasing: Bandwidth scaling
[Figure: Pointer Chasing (Emu Chick, 64 nodelets). Three panels (1024, 2048, and 4096 threads) plot memory bandwidth (MB/s, 0 to 10000) against block size (number of 16B elements, 1 to 4M) for block_shuffle, intra_block_shuffle, and full_block_shuffle, with peak STREAM bandwidth marked.]
30. Closing
• Summary
  • Analysis concurrent with graph change can work.
  • But not all implementations are valid.
  • Novel architectures show promise for fine-grained access and parallelism.
• Future work
  • Track subgraphs / communities for “slow” analyses
  • Can offload subgraphs to accelerators?
  • Develop more valid updating methods, approximation results
  • Experiment with even more new architectures
31. Introducing the CRNCH Rogues Gallery
A physical & virtual space for hosting novel computing architectures, systems, and accelerators.
Host / manage remote access for novel architectures!
• Emu Chick
• FPGA + HMC: D stacked
• FPAA: Analog/Neuromorphic
Amortize effort and cost of trying novel architectures.
Break the “but it’s too much work” barrier.
http://crnch.gatech.edu/rogues-gallery