This document discusses using graphics processing units (GPUs) to perform approximate Bayesian computation (ABC) for parameter estimation of complex models. It describes how GPUs are well-suited for ABC due to their ability to perform linear computations on many threads in parallel. The document provides examples of applying ABC to GPUs for problems involving dynamical systems, network evolution models, and parameter estimation for protein interaction networks.
Approximate Bayesian Computation on GPUs
1. Approximate Bayesian Computation on Graphical Processing Units
Thomas Thorne, Juliane Liepe, Sarah Filippi & Michael P.H. Stumpf
Theoretical Systems Biology Group
06/09/2012
ABC on GPUs Thorne, Liepe, Filippi&Stumpf 1 of 23
2. Trends in High-Performance Computing
CUDA OpenCL MPI OpenMPI PVM
3. (Hidden) Costs of Computing
Prototyping/Development Time: typically reduced for scripting languages such as R or Python.
Run Time: on single threads, C/C++ (or Fortran) have better performance characteristics. But for specialized tasks other languages, e.g. Haskell, can show good characteristics.
Energy Requirements: every Watt we use for computing we also have to extract with air conditioning.
The role of GPUs
• GPUs can be accessed from many different programming languages (e.g. from Python via PyCUDA).
• GPUs have a comparatively small footprint and relatively modest energy requirements compared to clusters of CPUs.
• GPUs were designed for consumer electronics: computer gamers have different needs from the HPC community.
4. What GPUs are Good At
• GPUs are good for linear threads involving mathematical functions.
• We should avoid loops and branches in the code.
• In an optimal world we should aim for all threads to finish at the same time.
Figure: execution cycle.
5. Approximate Bayesian Computation
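We can define the posterior as
$$p(\theta_i \mid x) = \frac{f(x \mid \theta_i)\,\pi(\theta_i)}{p(x)}.$$
Here $f(x \mid \theta_i)$ is the likelihood, which is often hard to evaluate; consider for example
$$\tilde{y} = \max[0,\, y + g_1 + y \times g_2] \quad \text{with } g_1, g_2 \sim N(0, \sigma_{1/2}) \quad \text{and} \quad \frac{dy}{dt} = g(y; \theta).$$
But we can still simulate from the data-generating model, whence
$$p(\theta_i \mid x) = \int_X \frac{\mathbf{1}(y = x)\, f(y \mid \theta_i)\,\pi(\theta_i)}{p(x)}\, dy \approx \int_X \frac{\mathbf{1}(\Delta(y, x) < \epsilon)\, f(y \mid \theta_i)\,\pi(\theta_i)}{p(x)}\, dy.$$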
Solutions for Complex Problems (?)
Approximate (i) data, (ii) model or (iii) distance.
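The approximation above suggests a simple rejection scheme: draw θ from the prior, simulate y, and keep θ whenever Δ(y, x) < ε. A minimal Python sketch follows. The toy model (Gaussian data with unknown mean), the uniform prior, the distance on sample means and the value of ε are illustrative assumptions, not taken from the talk.

```python
import random
import statistics


def abc_rejection(observed, prior_sample, simulate, distance, epsilon, n_accept, rng):
    """Keep prior draws theta whose simulated data y satisfy distance(y, x) < epsilon."""
    accepted = []
    while len(accepted) < n_accept:
        theta = prior_sample(rng)            # theta ~ pi(theta)
        simulated = simulate(theta, rng)     # y ~ f(y | theta)
        if distance(simulated, observed) < epsilon:
            accepted.append(theta)
    return accepted


# Hypothetical toy problem: infer the mean of a unit-variance Gaussian.
rng = random.Random(0)
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]

posterior_sample = abc_rejection(
    observed,
    prior_sample=lambda r: r.uniform(-5.0, 5.0),
    simulate=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(50)],
    # Assumed distance: absolute difference of sample means.
    distance=lambda y, x: abs(statistics.fmean(y) - statistics.fmean(x)),
    epsilon=0.2,
    n_accept=200,
    rng=rng,
)
print(statistics.fmean(posterior_sample))
```

Each candidate θ is processed independently of all others, which is what makes this scheme a natural fit for many parallel threads. Comparing sample means rather than full data sets is an instance of approximating the data.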
6. Approximate Bayesian Computation
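Prior, $\pi(\theta)$. Define a set of intermediate distributions, $\pi_t$, $t = 1, \ldots, T$, with
$$\epsilon_1 > \epsilon_2 > \ldots > \epsilon_T$$
$$\pi_{t-1}(\theta \mid \Delta(X_s, X) < \epsilon_{t-1})$$
$$\pi_t(\theta \mid \Delta(X_s, X) < \epsilon_t)$$
$$\pi_T(\theta \mid \Delta(X_s, X) < \epsilon_T)$$
Sequential importance sampling:
Sample from proposal $\eta_t(\theta_t)$ and weight
$$w_t(\theta_t) = \frac{\pi_t(\theta_t)}{\eta_t(\theta_t)} \quad \text{with} \quad \eta_t(\theta_t) = \int \pi_{t-1}(\theta_{t-1})\, K_t(\theta_{t-1}, \theta_t)\, d\theta_{t-1},$$
where $K_t(\theta_{t-1}, \theta_t)$ is a Markov perturbation kernel.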
Toni et al., J.Roy.Soc. Interface (2009); Toni & Stumpf, Bioinformatics (2010).
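The sequential scheme can be sketched for a one-dimensional parameter as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the uniform prior, the Gaussian perturbation kernel with fixed width `kernel_sd`, the toy Gaussian model and the tolerance schedule are all hypothetical choices. The indicator in π_t is enforced by the acceptance step, so the weight reduces to the prior density divided by the proposal mixture η_t.

```python
import math
import random
import statistics


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def abc_smc(observed, epsilons, n_particles, simulate, distance, rng,
            prior_low=-5.0, prior_high=5.0, kernel_sd=0.5):
    """Propagate a weighted particle population through decreasing tolerances."""
    particles, weights = [], []
    for t, eps in enumerate(epsilons):
        new_particles, new_weights = [], []
        while len(new_particles) < n_particles:
            if t == 0:
                theta = rng.uniform(prior_low, prior_high)
            else:
                # Draw theta_{t-1} from the previous population, then perturb it with K_t.
                parent = rng.choices(particles, weights=weights, k=1)[0]
                theta = rng.gauss(parent, kernel_sd)
                if not prior_low <= theta <= prior_high:
                    continue  # zero prior density
            if distance(simulate(theta, rng), observed) >= eps:
                continue  # simulated data too far from the observations
            if t == 0:
                weight = 1.0
            else:
                # eta_t(theta) = sum_j w_j K_t(theta_j, theta); the constant
                # uniform prior density cancels when the weights are normalised.
                eta = sum(w * normal_pdf(theta, p, kernel_sd)
                          for p, w in zip(particles, weights))
                weight = 1.0 / eta
            new_particles.append(theta)
            new_weights.append(weight)
        total = sum(new_weights)
        particles = new_particles
        weights = [w / total for w in new_weights]
    return particles, weights


# Hypothetical toy problem: infer the mean of a unit-variance Gaussian.
rng = random.Random(1)
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]
particles, weights = abc_smc(
    observed,
    epsilons=[1.0, 0.5, 0.2],
    n_particles=200,
    simulate=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(50)],
    distance=lambda y, x: abs(statistics.fmean(y) - statistics.fmean(x)),
    rng=rng,
)
posterior_mean = sum(w * p for p, w in zip(particles, weights))
print(posterior_mean)
```

Within one population the particles are independent, so the simulation and distance evaluation for each particle can be assigned to its own thread; only the weight normalisation needs the whole population.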
7. Experimental Design for Dynamical Systems
11. Parameter Estimation for Dynamical Systems
Sampling parameters from the ABC posterior yields excellent
agreement with provided data (grey dots). The scatter is entirely
explained by the added noise.
12. Network Evolution Models
(a) Duplication attachment
(b) Duplication attachment with complementarity
(c) Linear preferential attachment
(d) General scale-free
13. ABC on Networks
Summarizing Networks
• Data are noisy and incomplete.
• We can simulate models of network evolution, but we cannot calculate likelihoods for any but very trivial models.
• There is also no sufficient statistic that would allow us to summarize networks, so ABC approaches require some thought.
• Many possible summary statistics of networks are expensive to calculate.
Full likelihood: Wiuf et al., PNAS (2006).
ABC: Ratman et al., PLoS Comp.Biol. (2008).
Stumpf & Wiuf, J. Roy. Soc. Interface (2010).
14. Graph Spectrum
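Example graph on nodes a, b, c, d, e. Its adjacency matrix, with rows and columns ordered a, b, c, d, e, is
$$A = \begin{pmatrix} 0 & 1 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$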
Graph Spectra
Given a graph $G$ comprised of a set of nodes $N$ and edges $(i, j) \in E$ with $i, j \in N$, the adjacency matrix, $A$, of the graph is defined by
$$a_{i,j} = \begin{cases} 1 & \text{if } (i, j) \in E, \\ 0 & \text{otherwise.} \end{cases}$$
The eigenvalues, $\lambda$, of this matrix provide one way of defining the graph spectrum.
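As an illustration, the spectrum of the five-node example graph can be computed in plain Python. The sketch below assumes an undirected graph (so A is symmetric) and uses cyclic Jacobi rotations, a textbook method chosen here only because it needs no external library; the function names are hypothetical and nothing in the talk says which eigenvalue routine was used.

```python
import math


def adjacency_matrix(nodes, edges):
    """Build the symmetric 0/1 adjacency matrix of an undirected graph."""
    index = {node: k for k, node in enumerate(nodes)}
    a = [[0.0] * len(nodes) for _ in nodes]
    for i, j in edges:
        a[index[i]][index[j]] = 1.0
        a[index[j]][index[i]] = 1.0
    return a


def symmetric_eigenvalues(matrix, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(n) if p != q)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes a[p][q]: tan(2 theta) = 2 a_pq / (a_qq - a_pp).
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[k][k] for k in range(n))


# Edge list read off the adjacency matrix of the example graph.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("d", "e")]
A = adjacency_matrix(nodes, edges)
spectrum = symmetric_eigenvalues(A)
print(spectrum)
```

Two quick sanity checks: the eigenvalues sum to the trace of A (zero, since there are no self-loops), and their squares sum to twice the number of edges.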
17. Spectral Distances
A simple distance measure between graphs having adjacency matrices $A$ and $B$, known as the edit distance, is to count the number of edges that are not shared by both graphs,
$$D(A, B) = \sum_{i,j} (a_{i,j} - b_{i,j})^2.$$
However, for unlabelled graphs we require some mapping $h$ from $i \in N_A$ to $i' \in N_B$ that minimizes the distance
$$D(A, B) \geq D_h(A, B) = \sum_{i,j} (a_{i,j} - b_{h(i),h(j)})^2.$$
Given a spectrum (which is relatively cheap to compute) we have
$$D(A, B) = \sum_l \left(\lambda_l^{(\alpha)} - \lambda_l^{(\beta)}\right)^2.$$
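The three distances can be compared on a small example. The sketch below is illustrative only: the function names are hypothetical, the search over mappings h is a brute-force loop over all permutations (feasible only for very small graphs), and the spectral distance assumes both spectra have the same length and are compared in sorted order. Note that summing over all (i, j) of a symmetric matrix counts every unshared edge twice.

```python
from itertools import permutations


def edit_distance(a, b):
    """D(A, B): sum of squared entry-wise differences for labelled graphs."""
    n = len(a)
    return sum((a[i][j] - b[i][j]) ** 2 for i in range(n) for j in range(n))


def unlabelled_edit_distance(a, b):
    """Minimum of D_h(A, B) over all node mappings h, by brute force (O(n!))."""
    n = len(a)
    return min(
        sum((a[i][j] - b[h[i]][h[j]]) ** 2 for i in range(n) for j in range(n))
        for h in permutations(range(n))
    )


def spectral_distance(spectrum_a, spectrum_b):
    """Sum of squared differences between eigenvalues taken in sorted order."""
    return sum((x - y) ** 2 for x, y in zip(sorted(spectrum_a), sorted(spectrum_b)))


# Two labellings of the same three-node path: centre node 1 versus centre node 0.
path_a = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
path_b = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]

# Known spectra: a three-node path has eigenvalues -sqrt(2), 0, sqrt(2);
# a triangle has eigenvalues -1, -1, 2.
path_spectrum = [-(2 ** 0.5), 0.0, 2 ** 0.5]
triangle_spectrum = [-1.0, -1.0, 2.0]

print(edit_distance(path_a, path_b))
print(unlabelled_edit_distance(path_a, path_b))
print(spectral_distance(path_spectrum, path_spectrum))
print(spectral_distance(path_spectrum, triangle_spectrum))
```

The labelled distance between the two paths is non-zero although the graphs are isomorphic, while the minimised and spectral distances both vanish. The spectral version avoids the factorial search over mappings.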
18. Protein Interaction Network Data
Species           Proteins  Interactions  Genome size  Sampling fraction
S. cerevisiae         5035         22118         6532               0.77
D. melanogaster       7506         22871        14076               0.53
H. pylori              715          1423         1589               0.45
E. coli               1888          7008         5416               0.35
Model Selection
• Inference here was based on all the data, not summary statistics.
• Duplication models receive the strongest support from the data.
• Several models receive support and no model is chosen unambiguously.
Figure: model probability (vertical axis, 0.0 to 0.5) for the models DA, DAC, LPA, SF, DACL and DACR (horizontal axis), shown per organism (S. cerevisiae, D. melanogaster, H. pylori, E. coli).
19. GPUs in Computational Statistics
Performance
• With highly optimized libraries (e.g. BLAS/ATLAS) we can run numerically demanding jobs relatively straightforwardly.
• Whenever poor-man’s parallelization is possible, GPUs offer considerable advantages over multi-core CPU systems (at favourable cost and energy requirements).
• Performance depends crucially on our ability to map tasks onto the hardware.
Challenges
• GPU hardware was initially conceived for different purposes: computer gamers need fewer random numbers than MCMC or SMC procedures. The address space accessible to single-precision numbers also suffices to kill zombies etc.
• Combining several GPUs requires additional work, using e.g. MPI.
20. Alternatives to GPUs: FPGAs
Field Programmable Gate Arrays are electronic circuits
that are partly (re)configurable. Here the hardware is adapted to the
problem at hand and encapsulates the programme in its
programmable logic arrays.
[Figure (from D.B. Thomas, W. Luk, and M. Stumpf): performance (MSwap-Compare/s, axis from 1 to 100000) against graph size (4 to 24) for multiple instances in xc4vlx60, a single instance in Virtex-4, and quad Opteron software.]
21. Alternatives to GPUs: CPUs (and MPI...)
CPUs with multiple cores are flexible, have large address spaces and a
wealth of flexible numerical routines. This makes implementation of
numerically demanding tasks relatively straightforward. In particular
there is less of an incentive to consider how a problem is best
implemented in software that takes advantage of hardware features.
[Figure: processor time (axis from 10 to 40) against problem size (100 to 50000) for CPU and GPU. 6-core Xeon vs M2050 (448 cores) programmed in OpenCL.]
22. (Pseudo) Random Numbers
The Mersenne-Twister is one of the standard random number
generators for simulation. MT19937 has a period of
$$2^{19937} - 1 \approx 4 \times 10^{6001} .$$
But MT does not have cryptographic strength (once 624 iterates have
been observed, all future states are predictable), unlike
Blum-Blum-Shub,
$$x_{n+1} = x_n^2 \bmod M ,$$
where M = pq with p, q large prime numbers. Blum-Blum-Shub, however,
is too slow for simulations.
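The Blum-Blum-Shub recursion can be written in a few lines. This is a minimal sketch with deliberately tiny primes so the iterates can be checked by hand; the function name is hypothetical, and the requirement that p and q be congruent to 3 modulo 4 comes from the standard definition of the generator rather than from the slide.

```python
from math import gcd

def blum_blum_shub(seed, p, q, n):
    """Return the first n iterates of x_{k+1} = x_k^2 mod M with M = p*q.

    Assumptions (standard for this generator, not stated on the slide):
    p and q are primes congruent to 3 mod 4, and seed is coprime to M.
    Real use needs large primes; the values below are toy-sized.
    """
    if p % 4 != 3 or q % 4 != 3:
        raise ValueError("p and q must be congruent to 3 mod 4")
    m = p * q
    if gcd(seed, m) != 1:
        raise ValueError("seed must be coprime to M")
    x = seed
    iterates = []
    for _ in range(n):
        x = pow(x, 2, m)          # one modular squaring per output
        iterates.append(x)
    return iterates

toy_iterates = blum_blum_shub(seed=3, p=11, q=23, n=6)
# toy_iterates == [9, 81, 236, 36, 31, 202]
```

Each output costs a modular squaring of a number as large as M, which hints at why the generator is impractical when a simulation needs billions of draws.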
Parallel Random Number Generation
• Running RNGs in parallel does not produce reliable sets of random
numbers.
• We need algorithms to produce large numbers of parallel streams
of “good” random numbers.
• We also need better algorithms for weighted sampling.
23. Generating RNGs in Parallel
We assume that we have a function $x_{n+1} = f(x_n)$ which generates a
stream of (pseudo) random numbers
$$x_0, x_1, x_2, \ldots, x_r, \ldots, x_s, \ldots, x_t, \ldots, x_z$$
Parallel Streams
$x_0, x_1, \ldots$
$x_0, x_1, \ldots$
$x_0, x_1, \ldots$
Sub-Streams
$x_0, \ldots, x_{r-1}$
$x_r, \ldots, x_{s-1}$
$x_s, \ldots, x_{t-1}$
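Sub-streams require jumping to position r, s, ... of one long stream without generating everything in between. The sketch below shows the idea for a linear congruential generator, chosen only because its skip-ahead is easy to write down; the constants and function names are illustrative assumptions, and an LCG is not being proposed here as a "good" generator.

```python
# Illustrative 64-bit linear congruential generator: x_{n+1} = f(x_n).
A = 6364136223846793005
C = 1442695040888963407
M = 2**64

def f(x):
    """One step of the generator."""
    return (A * x + C) % M

def skip_ahead(x, k):
    """State reached after k applications of f, in O(log k) operations.

    f is the affine map x -> A*x + C, so k-fold composition is again affine
    and can be built by repeated squaring.
    """
    acc_a, acc_c = 1, 0            # identity map
    step_a, step_c = A, C          # map for 2**i applications of f
    while k > 0:
        if k & 1:
            acc_a, acc_c = (step_a * acc_a) % M, (step_a * acc_c + step_c) % M
        step_a, step_c = (step_a * step_a) % M, (step_a * step_c + step_c) % M
        k >>= 1
    return (acc_a * x + acc_c) % M

def stream(x0, length):
    """Return x_0, ..., x_{length-1} generated sequentially from x0."""
    out, x = [], x0
    for _ in range(length):
        out.append(x)
        x = f(x)
    return out

def sub_streams(x0, boundaries):
    """Cut one stream at boundaries [0, r, s, t, ...]; each piece could be
    produced by a separate worker that starts from skip_ahead(x0, lower)."""
    return [stream(skip_ahead(x0, lo), hi - lo)
            for lo, hi in zip(boundaries[:-1], boundaries[1:])]

pieces = sub_streams(42, [0, 10, 20, 30])
```

Concatenating the pieces reproduces the sequential stream exactly, so results do not depend on how many workers share the work. Parallel streams, by contrast, start independent generators from different seeds or keys.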
24. Counter-Based RNGs
We can represent RNGs as state-space processes,
$$y_{n+1} = f(y_n) \quad \text{with } f : Y \to Y$$
$$x_{n+1} = g_{k,\, n \bmod J}\left( y_{\lfloor (n+1)/J \rfloor} \right) \quad \text{with } g : Y \times K \times \mathbb{Z}_J \to X ,$$
where the x essentially behave as $x \sim U_{[0,1]}$; here K is the key
space and J the number of random numbers generated from the internal
state of the RNG.
Salmon et al. (SC11) propose to use a simple form for $f(\cdot)$ and
$$g_{k,j} = h_j \circ b_k .$$
For a sufficiently complex bijection $b_k$ we can use simple updates
and leave the randomization to $b_k$. Here cryptographic routines come
in useful.
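A toy version of this construction is sketched below. The state update is a plain counter, the keyed bijection is a small Feistel network (always invertible, whatever the round function), and the selector picks one of J = 2 words from each 64-bit block. Everything here is a hypothetical illustration: it is not Threefry or Philox, the mixing constants are arbitrary, and no statistical quality is claimed.

```python
MASK32 = 0xFFFFFFFF
J = 2  # random numbers extracted from each 64-bit block

def round_fn(half, round_key):
    """Toy mixing function on 32-bit words (arbitrary odd multipliers)."""
    z = ((half ^ round_key) * 0x9E3779B1) & MASK32
    z ^= z >> 15
    z = (z * 0x85EBCA6B) & MASK32
    z ^= z >> 13
    return z

def keyed_bijection(counter, key, rounds=8):
    """b_k: a Feistel network, hence a bijection on 64-bit integers."""
    left, right = (counter >> 32) & MASK32, counter & MASK32
    for i in range(rounds):
        round_key = (key + i * 0x9E3779B9) & MASK32
        left, right = right, left ^ round_fn(right, round_key)
    return (left << 32) | right

def select_word(block, j):
    """h_j: pick the j-th 32-bit word of a 64-bit block."""
    return (block >> (32 * j)) & MASK32

def draw(n, key):
    """n-th number of the stream with the given key, as a float in [0, 1).

    The state is simply the counter n // J, so any element of the stream
    can be computed directly, in any order, by any thread.
    """
    block = keyed_bijection(n // J, key)
    return select_word(block, n % J) / 2**32

first_ten = [draw(n, key=7) for n in range(10)]
```

Because `draw` needs only `n` and `key`, a worker can be handed a key and a counter range and requires no stored generator state, which is what makes the approach attractive on GPUs.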
25. RNG Performance on CPUs and GPUs
                                                                       Intel CPU    Nvidia GPU       AMD GPU
Method                        Max. input  Min. state  Output size     cpB   GB/s    cpB   GB/s    cpB   GB/s
Counter-based, Cryptographic
AES(sw)                       (1+0)×16    11×16       1×16           31.2    0.4      –      –      –      –
AES(hw)                       (1+0)×16    11×16       1×16            1.7    7.2      –      –      –      –
Threefish (Threefry-4×64-72)  (4+4)×8     0           4×8             7.3    1.7   51.8   15.3  302.8    4.5
Counter-based, Crush-resistant
ARS-5(hw)                     (1+1)×16    0           1×16            0.7   17.8      –      –      –      –
ARS-7(hw)                     (1+1)×16    0           1×16            1.1   11.1      –      –      –      –
Threefry-2×64-13              (2+2)×8     0           2×8             2.0    6.3   13.6   58.1   25.6   52.5
Threefry-2×64-20              (2+2)×8     0           2×8             2.4    5.1   15.3   51.7   30.4   44.5
Threefry-4×64-12              (4+4)×8     0           4×8             1.1   11.2    9.4   84.1   15.2   90.0
Threefry-4×64-20              (4+4)×8     0           4×8             1.9    6.4   15.0   52.8   29.2   46.4
Threefry-4×32-12              (4+4)×4     0           4×4             2.2    5.6    9.5   83.0   12.8  106.2
Threefry-4×32-20              (4+4)×4     0           4×4             3.9    3.1   15.7   50.4   25.2   53.8
Philox2×64-6                  (2+1)×8     0           2×8             2.1    5.9    8.8   90.0   37.2   36.4
Philox2×64-10                 (2+1)×8     0           2×8             4.3    2.8   14.7   53.7   62.8   21.6
Philox4×64-7                  (4+2)×8     0           4×8             2.0    6.0    8.6   92.4   36.4   37.2
Philox4×64-10                 (4+2)×8     0           4×8             3.2    3.9   12.9   61.5   54.0   25.1
Philox4×32-7                  (4+2)×4     0           4×4             2.4    5.0    3.9  201.6   12.0  113.1
Philox4×32-10                 (4+2)×4     0           4×4             3.6    3.4    5.4  145.3   17.2   79.1
Conventional, Crush-resistant
MRG32k3a                      0           6×4         1000×4          3.8    3.2      –      –      –      –
MRG32k3a                      0           6×4         4×4            20.3    0.6      –      –      –      –
MRGk5-93                      0           5×4         1×4             7.6    1.6    9.2   85.5      –      –
Conventional, Crushable
Mersenne Twister              0           312×8       1×8             2.0    6.1   43.3   18.3      –      –
XORWOW                        0           6×4         1×4             1.6    7.7    5.8  136.7   16.8   81.1
Table 2: Memory and performance characteristics for a variety of counter-based and conventional PRNGs.
Taken from Salmon et al. (see References). Maximum input is written as (c+k)×w, indicating a counter type
of width c×w bytes and a key type of width k×w bytes. Minimum state and output size are c×w bytes.
Counter-based PRNG performance is reported with the minimal number of rounds for Crush-resistance, and
also with extra rounds for “safety margin.” Performance is shown in bold for recommended PRNGs that have
the best platform-specific performance with ...
26. Conclusions and Challenges
• ABC is perhaps a method best left for cases when all else fails.
• Its use as an inferential framework in its own right is certainly worth
further investigation.
• Memory and address space are potential issues: using single
precision we are prone to run out of numbers for challenging SMC
applications very quickly.
• Bandwidth is an issue — coordination between GPU and CPU is
much more challenging in statistical applications than e.g. for
texture mapping or graphics rendering tasks.
• Programming has to be much more hardware aware than for CPUs:
code that is not adapted to the hardware loses efficiency far
more visibly on GPUs than on CPUs.
• There are some differences from conventional ANSI standards for
mathematics (e.g. rounding).
• Knowing how to harness computational power does allow us to
tackle real-world problems.
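The single-precision concern can be made concrete with a short check. This is only a sketch of one way the limitation shows up, assuming particle weights that are small relative to a running total; the numbers are illustrative and are not taken from the talk.

```python
import numpy as np

# Relative spacing of representable numbers near 1.0.
eps32 = float(np.finfo(np.float32).eps)   # 2**-23
eps64 = float(np.finfo(np.float64).eps)   # 2**-52

# A small weight added to a running total of 1.0 is lost entirely in
# single precision but retained in double precision.
total32 = np.float32(1.0) + np.float32(1e-8)
total64 = np.float64(1.0) + np.float64(1e-8)

weight_lost_in_float32 = bool(total32 == np.float32(1.0))
weight_kept_in_float64 = bool(total64 > np.float64(1.0))
```

In an SMC run with many particles, individual normalized weights can easily fall below this single-precision resolution, so accumulating them in double precision (or in log space) is a common precaution.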
27. References
GPUs and ABC
• Zhou, Liepe, Sheng, Stumpf, Barnes (2011) GPU accelerated biochemical network
simulation. Bioinformatics 27:874-876.
• Liepe, Barnes, Cule, Erguler, Kirk, Toni, Stumpf (2010) ABC-SysBio: approximate Bayesian
computation in Python with GPU support. Bioinformatics 26:1797-1799.
• Thorne, Stumpf (2012) Graph spectral analysis of protein interaction network evolution.
J. Roy. Soc. Interface 9:2653-2666.
GPUs and Random Numbers
• Salmon, Moraes, Dror, Shaw (2011) Parallel Random Numbers: As Easy as 1, 2, 3. in
Proceedings of 2011 International Conference for High Performance Computing,
Networking, Storage and Analysis, ACM, New York;
http://doi.acm.org/10.1145/2063384.2063405.
• Howes, Thomas (2007) Efficient Random Number Generation and Application Using CUDA
in GPU Gems 3, Addison-Wesley Professional, Boston;
http://http.developer.nvidia.com/GPUGems3/gpugems3_ch37.html.
28. Acknowledgements