The document discusses using OpenCL to accelerate genomic analysis through parallelization. It introduces OpenCL and provides examples of using it to parallelize algorithms for copy number inference in tumors, computing relatedness between individuals, and performing variable selection in regression. Key applications discussed include hidden Markov models for copy number inference, principal component analysis on relatedness matrices, and coordinate descent algorithms for lasso regression. Performance gains of up to 155x are reported for the parallel implementations compared to serial code.
OpenCL applications in genomics
1. Using OpenCL to accelerate
genomic analysis
Gary K. Chen
June 16, 2011
2. An outline
OpenCL Introduction
Copy number inference in tumors
Data considerations
Hidden Relatedness
Variable Selection
3. Scientific Programming on GPGPU
devices
nVidia and ATI are currently market leaders
Very competitive in performance and price
Impressive double-precision performance, though
still about 4 times slower than 32-bit FP
ATI 9370 chipset: 528 GFLOPS (FP64), 4GB GDDR5,
$2,399
nVidia Tesla C2050: 520 GFLOPS (FP64), 3GB GDDR5,
$2,199
Source: www.sabrepc.com
5. Future multi-core CPUs
Intel’s 48 core SCC chip
Potentially a more powerful solution when
considering data-intensive computing. Not
constrained by the PCI bus
10. An outline
OpenCL Introduction
Copy number inference in tumors
Data considerations
Hidden Relatedness
Variable Selection
11. Biology background
DNA
A string with a four letter alphabet: A,C,G,T
Humans have two copies: one from mom, one from
dad
Most of the sequence between two strands is the
same, except for a small proportion
Example sequence: ATATTGC. We could have:
A single nucleotide polymorphism (common point
mutation): ATATAGC
Copy number variants/aberrations
(deletions, amplifications, translocations):
AT–GC (a deletion)
ATATTATTATTGC (an amplification)
13. What is observed
Microarray output
Probes are dyed, and microarrays scanned with
CCD cameras
X,Y: Intensities of A and B alleles (two possible
variants)
R = X+Y: Overall intensity
LRR (log2 R ratio): Intensity relative to a standard
intensity
BAF (B allele frequency): Ratio of allelic intensity
between A and B
15. Hidden Markov Model
A formalized statistical model
We want to use information from observables
(LRR,BAF) to infer true state of nature (copy
number, genotype)
Table: Example hidden states from PennCNV software
State  CN  possible genotypes
1      0   Null
2      1   A,B
3      2   AA,AB,BB
4      2   AA,BB
5      3   AAA,AAB,ABB,BBB
6      4   AAAA,AAAB,AABB,ABBB,BBBB
16. Copy number inference in tumors
Inference is harder!
1. When dissecting breast tissue for example,
stromal (normal cell) contamination is almost
inevitable. Hence you are modeling a mixture of
two or more cell populations
Suppose you have a state assuming normal CN=2,
tumor CN=4, α = 0.2
e.g. r_i = α r_{i,n} + (1 − α) r_{i,t}
expected mean intensity: 0.2(1) + 0.8(1.68) = 1.544
2. Amplification events can be wilder than
germline (e.g. blood) events, leading to greater
copy number/genotype possibilities
Combine issues 1) and 2) and you can get a huge
search space
19. Algorithm
Initialize
Empirically estimate σ of BAF and LRR
Compute emission matrix O for each state/obs
from a Gaussian pdf
Train: Expectation Maximization
Forward-backward: computes posterior probs and
overall likelihood
Baum-Welch: compute MLE of transition
probabilities in matrix T
Traverse state path
Viterbi (dynamic programming): walk the state
path based on max-product
20. Parallel Forward Algorithm
We compute the probability vector at observation
t: f_{0:t} = f_{0:t−1} T O_t
Each state (element of the m-state vector) can
independently compute a sum-product
Threadblocks map to states
Threads calculate products in parallel, followed by
a log2(m) addition reduction
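As a serial reference for the kernel mapping described above, the matrix form of the forward recursion can be sketched in NumPy (a hypothetical helper, not the deck's OpenCL code); on the GPU, each of the m sum-products is independent, which is what lets threadblocks map to states:

```python
import numpy as np

def forward(T, emit, init):
    """Forward algorithm in matrix form: f_{0:t} = f_{0:t-1} T O_t.

    T    : (m, m) transition matrix
    emit : (n_obs, m) emission probabilities O_t per observation
    init : (m,) initial state distribution
    Returns the (n_obs, m) matrix of unnormalized forward probabilities;
    the total likelihood is the sum of the last row.
    """
    f = init * emit[0]
    out = [f]
    for t in range(1, emit.shape[0]):
        f = (f @ T) * emit[t]   # m independent sum-products
        out.append(f)
    return np.vstack(out)
```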
21. Technical issue: Underflow
Tiny probabilities often have to be represented
in log space (even for FP64)
How do we deal with adding log probabilities?
We usually exponentiate, add, then log
Remedy
Add an offset to log before exponentiating
Subtract the offset from the log space answer
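The offset trick above is the standard log-sum-exp construction; a minimal sketch:

```python
import numpy as np

def log_sum_exp(log_p):
    """Sum probabilities given in log space without underflow.

    Subtracting the max before exponentiating guarantees at least one
    term equals exp(0) = 1, so the sum never underflows to zero; the
    offset is added back to the log-space answer.
    """
    c = np.max(log_p)                        # the offset
    return c + np.log(np.sum(np.exp(log_p - c)))
```

For instance, summing two probabilities near exp(−1000) stays finite here, whereas the naive exponentiate-add-log would return −inf.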
26. Performance
Table: One EM iteration on Chr 1 (41,263 SNPs)
states  CPU     GPU     fold-speedup
128     9.5m    37s     15x
512     2h 35m  1m 44s  108x
27. An outline
OpenCL Introduction
Copy number inference in tumors
Data considerations
Hidden Relatedness
Variable Selection
28. Storing data
Global memory
Relatively abundant, but slow
However, even 4GB may be insufficient for modern
datasets
Genotype data
Highly compressible
We only care if a position differs from the
canonical sequence
Thus: AA,AB,BB,NULL are 4 possible genotypes
Should be able to encode this into two bits, so 4
genotypes per byte
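A minimal host-side sketch of the 2-bit encoding just described (hypothetical helper name, using genotype codes 0–3 for AA, AB, BB, NULL and packing low bits first):

```python
def pack_genotypes(genos):
    """Pack genotype codes (0-3) into a bytearray, 4 genotypes per byte.

    Genotype i lands in byte i//4, at bit offset 2*(i%4).
    """
    packed = bytearray((len(genos) + 3) // 4)
    for i, g in enumerate(genos):
        packed[i // 4] |= (g & 3) << (2 * (i % 4))
    return packed
```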
29. Possible approaches
Store as a float array
+: Easy to implement
-: Uses 16 times as much memory as needed!
Store as an int array
Allocate a local memory array of 256 rows, 4 cols
for mapping all possible genotype 4-tuples
+: Uses global memory efficiently, maximizes
bandwidth
-: You might not even have enough local memory,
much less for real work
Store as a char array
Right bitshift pairs of bits, then OR mask with 3
+: Uses global memory efficiently, saves on local
memory
-: Threads load a minimum of 4 bytes per word,
you use 25% of available bandwidth
30. One solution: custom container
Idea:
Designate each threadblock to handle 512
genotypes
First 32 threads: each loads a packedgeno_t
element
For each of the 32 threads:
Loop four times, extracting each char
Subloop four times, extracting each genotype via
bitshift/mask
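A serial model of the bitshift/mask extraction the kernel performs (hypothetical helper, assuming 2-bit genotypes packed low bits first within each byte):

```python
def unpack_genotypes(packed, n):
    """Recover n genotype codes (0-3) from a packed byte buffer.

    Each byte holds four genotypes; right-shift pairs of bits out
    and mask with 3, as in the kernel subloop described above.
    """
    genos = []
    for byte in packed:
        for shift in (0, 2, 4, 6):   # four genotypes per byte
            genos.append((byte >> shift) & 3)
    return genos[:n]
```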
32. An outline
OpenCL Introduction
Copy number inference in tumors
Data considerations
Hidden Relatedness
Variable Selection
33. Inferring Relatedness
Inferring relatedness
The human race is one large pedigree
Individuals of the same ethnicity are expected
to share more SNP alleles
We can summarize this relationship through a
correlation matrix called ’K’
34. Uses for the ’K’ matrix
Principal Components Analysis
A singular value decomposition on ’K’
K = V D V'
V contains orthogonal axes, facilitating population
structure inference
Estimating heritability
In random effects models
Y = µ + βX + γ²K + σ²I
h² = γ² / (γ² + σ²)
36. Computing K
Essentially a matrix multiplication
K̂_jk = (1/m) Σ_{i=1}^{m} (x_{ij} − 2f_i)(x_{ik} − 2f_i) / (4 f_i (1 − f_i))
Or in other words: K = ZZ'
Including more SNPs adds more precise, subtle
information
Parallel code
Carrying out matrix multiplication is
straightforward on GPU
Matrix multiplication is ideal for GPU: Approx.
240x speedup.
Because K is summed over SNPs, we can split
genotype matrix by subsets of SNPs and run each
K slice in parallel
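The standardization and K = ZZ' step can be sketched as follows (a NumPy stand-in for the GPU matrix multiply; assumes no monomorphic SNPs, i.e. 0 < f_i < 1):

```python
import numpy as np

def kinship(X):
    """Estimate K from an (n_subjects, m_snps) genotype matrix X with
    entries 0/1/2 (counts of one allele).

    Each SNP i is centered by 2*f_i and scaled by sqrt(4*f_i*(1-f_i)),
    then K = Z Z' / m -- a single matrix multiplication.
    """
    f = X.mean(axis=0) / 2.0                      # allele frequencies
    Z = (X - 2 * f) / np.sqrt(4 * f * (1 - f))    # standardized genotypes
    return Z @ Z.T / X.shape[1]
```

Because K is a sum over SNPs, slices of columns can be run through this independently and the resulting partial K matrices added, which is the parallel split described above.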
37. An outline
OpenCL Introduction
Copy number inference in tumors
Data considerations
Hidden Relatedness
Variable Selection
38. Variable Selection
One goal in biomedical research is correlating
DNA variation to disease phenotypes
Genomics technology
The number of subjects n remains about the same
(cost of recruiting, sample preps, etc), while
number of features p is exploding
Rate that data is being generated per dollar
surpasses Moore’s Law
40. Regression
Standard logistic regression
The usual method for hypothesis testing of
candidate predictors
log(p / (1 − p)) = Xβ, p being the probability of affection
We apply Newton-Raphson scoring until f(β) is
maximized.
Logistic regression simply fails when p > n
L1 penalized regression, aka LASSO
Idea: Fit the logistic regression model, but subject
to a penalty parameter λ
g(β) = f(β) − λ Σ_{j=1}^{p} |β_j|
41. Algorithms for fitting the LASSO
Cyclic Coordinate Descent
One-dimensional Newton-Raphson at variable j:
Δβ_j = β_j^(new) − β_j = −g′(β_j) / g″(β_j)
g′(β_j) = Σ_{i=1}^{n} x_{i,j} y_i / (1 + exp(x_{i,j} β_j y_i)) − sgn(β_j) λ
g″(β_j) = −Σ_{i=1}^{n} x_{i,j}² exp(x_{i,j} β_j y_i) / (1 + exp(x_{i,j} β_j y_i))²
We cycle through each j until the likelihood stops
increasing within some tolerance
Performs great, but only allows parallelization
across samples
ref: Genkin, Lewis, Madigan: Technometrics 2007, Vol 49, No. 3
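A simplified serial sketch of cyclic coordinate descent for the L1-penalized logistic likelihood (labels in {−1,+1}). This is an illustration, not the deck's implementation: the full linear predictor x_i·β is used in the weights, and a crude step clip stands in for the trust-region safeguard of Genkin et al.'s BBR algorithm.

```python
import numpy as np

def ccd_lasso_logistic(X, y, lam, n_sweeps=50, tol=1e-6):
    """Cyclic coordinate descent for L1-penalized logistic regression.

    X: (n, p) design matrix; y: labels in {-1, +1}; lam: L1 penalty.
    One Newton step per coordinate per sweep.
    """
    n, p = X.shape
    beta = np.zeros(p)
    eta = X @ beta                                  # linear predictor
    for _ in range(n_sweeps):
        max_step = 0.0
        for j in range(p):
            z = np.clip(eta * y, -35.0, 35.0)       # guard exp overflow
            w = 1.0 / (1.0 + np.exp(z))             # = 1 - sigmoid(y*eta)
            g1 = X[:, j] @ (y * w)                  # dlogL/dbeta_j
            g2 = -(X[:, j] ** 2) @ (w * (1.0 - w))  # d2logL/dbeta_j^2 (< 0)
            if beta[j] != 0.0:
                g1 -= np.sign(beta[j]) * lam        # penalty subgradient
            else:
                g1 = np.sign(g1) * max(abs(g1) - lam, 0.0)  # soft threshold
            if g1 == 0.0 or g2 == 0.0:
                continue
            step = float(np.clip(-g1 / g2, -1.0, 1.0))  # clipped Newton step
            if beta[j] != 0.0 and np.sign(beta[j] + step) != np.sign(beta[j]):
                step = -beta[j]                     # stop at zero, don't cross
            beta[j] += step
            eta += step * X[:, j]
            max_step = max(max_step, abs(step))
        if max_step < tol:                          # likelihood has stabilized
            break
    return beta
```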
42. Distributed GPU implementation
If possible to parallelize across variables, it is
worth splitting up design matrix
For really large dimensions, we can link up an
arbitrary number of GPUs
Message Passing Interface allows us to be
agnostic to physical location of GPU devices
43. Distributed GPU implementation
Approach:
MPI master node delegates heavy lifting to slaves
across network
Master node performs fast serial code, such as
sampling new λ, comparing logLs, broadcasting
gradients, etc.
Network traffic is kept to a minimum
Implemented for Greedy Coordinate Descent and
Gradient Descent
Developed on server at USC Epigenome Center: 2
Tesla C2050s
44.
45. Parallel algorithms for fitting the LASSO
Greedy coordinate descent (ref)
Same algorithm as CCD, except for each variable
sweep, update only j that gives greatest increase in
logL
No dependencies between subjects and variables,
massive parallelization across subjects AND
variables
Ideal if you have a huge dataset, and you want a
stringent type 1 error rate (only care about a few
variables)
Ayers and Cordell, Gen Epi 2010: Permute, and
pick largest λ that allows first “false” variable to
enter
ref: Wu, Lange: Annals Appl Stat 2008 Vol 2,No. 1
47. Overview of Greedy CD algorithm
Newton-Raphson kernel
Each threadblock maps to a block of 512 subjects
(threads) for 1 variable
Each thread calculates subject’s contribution to
gradient and hessian
Sum (reduction) across 512 subjects
Sum (reduction) across subject blocks in new
kernel
Compute log-likelihood change for each
variable (like above).
Apply a max operator (log2 reduction) to
select variable with greatest contribution to
likelihood.
Iterate repeatedly until likelihood increase less
than epsilon
48. Evaluation on large dataset
GWAS data
6,806 subjects in a case control study of prostate
cancer
1,047,986 SNPs typed
Invoke approx. 7 billion threads per iteration
Total walltime for 1 GCD iteration (sweep
across all variables)
15 minutes on optimized serial implementation
split across 2 slave CPUs
5.8 seconds on parallel implementation across 2
nVidia Tesla C2050 GPU devices
155x speed up
49. Parallel algorithms for fitting the LASSO
(Stochastic Mirror) Gradient Descent (ref)
Sometimes, we are interested in tuning λ for say
the best cross validation errors
Greedy descent seems awfully wasteful in that only
one βj is updated
However, we can update all variables in parallel
cycling through subjects
Algorithm
Extremely simple:
For subject i: gradient g_i = −y_i / (1 + exp(x_i·β y_i))
Update the β vector, where β_j ← β_j − η g_i x_{i,j}
η is a learning parameter, set sufficiently small
(e.g. 0.0001)
ref: Shalev-Shwartz, Tewari: Proc. 26th Intern. Conf.
Machine Learning 2009
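One update of this scheme, sketched serially: just the per-subject gradient step, with every coordinate of β updated in parallel. The L1 penalty handling (the mirror-descent link function of Shalev-Shwartz and Tewari) is omitted here.

```python
import numpy as np

def sgd_step(beta, x_i, y_i, eta=1e-4):
    """One stochastic gradient step from a single subject.

    beta : (p,) current coefficients
    x_i  : (p,) subject i's feature vector
    y_i  : label in {-1, +1}
    All p coordinates update at once, which is what makes the method
    parallelizable across variables.
    """
    g_i = -y_i / (1.0 + np.exp(x_i @ beta * y_i))   # scalar gradient factor
    return beta - eta * g_i * x_i
```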
50. Gradient descent
Performance
Slow convergence compared to serial cyclic
coordinate descent, but far more scalable
For large lambdas, slower than greedy coordinate
descent
Computation-to-bandwidth ratio is not great
For 1 million SNPs, only about a 15x speedup. Far
more SNPs are needed
Technical issues
Must store genotypes in subject-major order to
enable coalesced memory loads/stores
Makes SNP-level summaries like means and SDs
difficult to compute.
Heterogeneous data types: floats (E, ExG),
compressed chars (G, GxG)
Memory constrained: can perform interactions on
the fly with SNP-major storage
51. Potential for robust variable selection:
Subsampling:
Applying the LASSO once overfits the data; model
selection is inconsistent
Subsampling is preferable: bootstrapping, stability
selection, x-fold cross-validation
Number of replicates << number of samples <<
number of features
Bayesian variable selection:
If we assume the β_LASSO are conditionally independent
Master node can (quickly) sample hyperparameters
(e.g. λ) from a prior distribution