Going Server-less for Web-Services that need to Crunch Large Volumes of Data
HEALTH & BIOSECURITY
Dr Denis Bauer | Bioinformatics | @allPowerde
9 Mar 2018 – Continuous Delivery and DevOps Day, Agile India
Transformational Bioinformatics | Denis C. Bauer | @allPowerde
Overview
• CSIRO: An agile government research organization.
• GT-Scan2: A serverless web-service for complex research workflows.
• GenPhen DB: A serverless system for large data.
• Cryptobreeder: A serverless system that continuously learns. Not CSIRO-funded.
Team CSIRO
• 5319 talented staff
• $1 billion+ budget
• Working with 2800+ industry partners
• 55 sites across Australia
• Top 1% of global research agencies
• Each year 6 CSIRO technologies contribute $5 billion to the economy
• EXTENDED WEAR CONTACTS
• POLYMER BANKNOTES
• RELENZA FLU TREATMENT
• Fast WLAN (Wireless Local Area Network)
• AEROGARD
• TOTAL WELLBEING DIET
• RAFT POLYMERISATION
• BARLEYmax™
• SELF TWISTING YARN
• SOFTLY WASHING LIQUID
• HENDRA VACCINE
• NOVACQ™ PRAWN FEED
Australia’s innovation catalyst
Recruiting instantaneous appropriately powered compute

                           Desktop compute  High-performance compute  Hadoop/Spark    Serverless
Focus                      small data       Compute-intensive         Data-intensive  Agility
Fault tolerant             No               No                        Yes             (Yes)
Node-bound                 Yes              Yes                       No              No
Parallelization            10 CPU           100 CPU                   1000 CPU        1000 CPU
Parallelization procedure  bespoke          bespoke                   standardized    standardized
Overhead in the cloud      NA               spin-up lag               spin-up lag     instantaneously

CSIRO solution
Ideal application case for serverless:
Small tasks…
• Embarrassingly parallel tasks
…that need to scale
• Unpredictable, burstable workload that needs to be delivered online
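
A minimal sketch of what "small, embarrassingly parallel tasks" can look like in code. It is not taken from the talk: the GC-counting task, the function names and the local thread pool (used here as a stand-in for fanning out to serverless functions) are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(sequence, chunk_size):
    """Split a sequence into independent, non-overlapping chunks."""
    return [sequence[i:i + chunk_size] for i in range(0, len(sequence), chunk_size)]


def count_gc(chunk):
    """Hypothetical small task: count G and C bases in one chunk."""
    return sum(1 for base in chunk if base in "GC")


def fan_out(sequence, chunk_size, max_workers=8):
    """Run one small task per chunk in parallel and combine the partial results."""
    chunks = split_into_chunks(sequence, chunk_size)
    # Each chunk is processed independently, so no task waits on another.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_counts = list(pool.map(count_gc, chunks))
    return sum(partial_counts)
```

Because the chunks share no state, the same pattern scales by adding workers; in a serverless setting each chunk would presumably map to one function invocation.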
Agility + Scalability =
Genome Editing (CRISPR) can correct genetic diseases, such as hypertrophic cardiomyopathy.
However, editing does not work every time, e.g. only 7 in 10 embryos were mutation free.
Aim: Develop a computational guidance framework to enable edits the first time, every time.
Ma et al. Nature 2017 *
* Some controversy around the paper
First serverless research
Featured in
Used by
GT-Scan2
GT-Scan.net
Interoperable Workflows
• Programmable call to GT-Scan2 (API)
• Automatic result retrieval to notebook environment
• Seamless and reproducible access to tertiary analytics.
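
The slides do not describe the GT-Scan2 API itself, so the following is only a sketch of the general pattern of a programmable call with results parsed for use in a notebook. The `/jobs` path, the `sequence`, `targets`, `position` and `score` field names, and the `example.org` base URL in the test are all hypothetical.

```python
import json
import urllib.request


def build_job_request(base_url, sequence):
    """Build (but do not send) a POST request that submits a sequence for scanning."""
    payload = json.dumps({"sequence": sequence}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/jobs",  # hypothetical endpoint path
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_targets(response_body):
    """Turn a JSON response body into (position, score) tuples, best score first."""
    records = json.loads(response_body)["targets"]  # hypothetical response shape
    pairs = [(record["position"], record["score"]) for record in records]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

In a notebook, the request would be sent with `urllib.request.urlopen` and the parsed list passed straight to a plotting or table library.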
This notebook shows a genome engineering workflow: finding a specific target site,
then retrieving the result from GT-Scan for direct visualization.
GenEng
Reproducible Genome
Engineering
Demo GT-Scan2
http://gt-scan2.csiro.au/notebook-casestudy.html
Serverless systems are hard to optimize
• Pay only for what you use
-> Optimize to use as little as possible
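
To make "pay only for what you use" concrete, here is a rough cost model for a function billed by memory and duration. It is a sketch under stated assumptions: the prices are passed in as parameters and are not taken from the talk, so current provider pricing should be checked before relying on any number.

```python
def invocation_cost(duration_s, memory_mb, price_per_gb_second, price_per_request=0.0):
    """Estimate the cost of one invocation billed by GB-seconds plus a per-request fee."""
    gb_seconds = duration_s * (memory_mb / 1024)
    return gb_seconds * price_per_gb_second + price_per_request


def workload_cost(invocations, duration_s, memory_mb, price_per_gb_second,
                  price_per_request=0.0):
    """Estimate the cost of many identical invocations."""
    per_call = invocation_cost(duration_s, memory_mb, price_per_gb_second,
                               price_per_request)
    return invocations * per_call
```

Under this model, cost falls linearly with both runtime and configured memory, which is why shaving seconds off each function matters more in a serverless design than on an always-on server.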
GT-Scan2 X-Ray Analysis
[Figure: runtime (s) per function, by type (base vs. old). Functions shown: getFastaSequence, createJob, targetScan, offtargetScanStarter, offtargetSearch, targetIntersects, targetTranscriptionIntersects, targetWuScorer, targetSgRNAScorer, OnTargetScorer, genomeCRISPR.]
Results – 4x Faster (80% improvement)
From 2 min to 30 sec
Using hypothesis-driven architecture to improve serverless infrastructure
• Architecture as text
• Evolve
• Automatic performance measure
• Evaluate
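
The evolve, measure, evaluate loop can be sketched as a simple acceptance test on measured runtimes. This is an illustration only, not the tooling used in the talk: the function names, the use of the median and the 10% threshold are assumptions.

```python
from statistics import median


def relative_improvement(baseline_runtimes, candidate_runtimes):
    """Fraction of median runtime saved by the candidate architecture."""
    base = median(baseline_runtimes)
    candidate = median(candidate_runtimes)
    return (base - candidate) / base


def accept_change(baseline_runtimes, candidate_runtimes, min_improvement=0.1):
    """Keep the candidate only if it beats the baseline by the required margin."""
    improvement = relative_improvement(baseline_runtimes, candidate_runtimes)
    return improvement >= min_improvement
```

Each architectural change is then a hypothesis ("this variant is at least 10% faster") that the automatically collected runtimes either support or reject.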
James Lewis
https://www.epsagon.com/
4pm
Kief Morris
CryptoKitties in a nutshell
What’s in the genes of your CryptoKitties? By cryptobreeder
Not CSIRO-funded
CryptoBreeder.net
• Objective:
  • Build a machine learning web-service to predict the ‘cattributes’ of the offspring from a breeding pair
• Problem:
  • New ‘cattributes’ emerge all the time
• Solution:
  • Continuously learning model
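
The slides do not say which model CryptoBreeder uses, so the following is only a toy sketch of the idea of continuous learning: a count-based predictor that is updated one observation at a time and accepts previously unseen labels without retraining. The class name and the trait names in the test are hypothetical.

```python
from collections import Counter, defaultdict


class IncrementalCattributeModel:
    """Toy online model: offspring trait frequencies per unordered parent pair."""

    def __init__(self):
        self._counts = defaultdict(Counter)

    @staticmethod
    def _key(parent_a, parent_b):
        # The order of the parents does not matter, so use an unordered key.
        return frozenset((parent_a, parent_b))

    def update(self, parent_a, parent_b, child):
        """Learn from one observed breeding; new traits need no special handling."""
        self._counts[self._key(parent_a, parent_b)][child] += 1

    def predict(self, parent_a, parent_b):
        """Return observed offspring trait frequencies, or {} for an unseen pair."""
        counts = self._counts.get(self._key(parent_a, parent_b))
        if not counts:
            return {}
        total = sum(counts.values())
        return {trait: n / total for trait, n in counts.items()}
```

Because `update` can be called on every new observation, a newly emerged trait shows up in predictions as soon as it has been seen once.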
Uptake: 164 sessions / week
CryptoBreeder.net
Web service Layer
Machine Learning
Layer
Cost to date
• Any guesses?
Cost to date: AU $24.35
Ongoing costs:
Still within the 1-year AWS Free Tier; however, the S3 limit has already been exceeded and EC2 is not eligible.
Stephens et al. PLOS Biology 2015
Genomics will outpace other BigData disciplines
Astronomy
Twitter
YouTube
Genomics
Clinical use of GenPhen DB
• Objective:
  • Build a web-service that can query databases in response to a patient’s genome and medical record
• Problem:
  • Genomic data is so large
• Solution:
  • Athena-based query engine
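
A hedged sketch of how an Athena-based lookup might be issued from Python. The table and column names (`sample_id`, `position`, `genotype`, `chromosome`) are hypothetical and not taken from the talk; the only real API used is boto3's Athena `start_query_execution`, imported lazily so that the query builder runs without AWS access.

```python
import re


def build_variant_query(table, chromosome, start, end):
    """Build a SQL query for variants in a genomic region, validating every input."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError("invalid table name")
    if not re.fullmatch(r"(chr)?([0-9]{1,2}|X|Y|MT?)", chromosome):
        raise ValueError("invalid chromosome")
    start, end = int(start), int(end)
    if start > end:
        raise ValueError("start must not exceed end")
    return (
        f"SELECT sample_id, position, genotype FROM {table} "
        f"WHERE chromosome = '{chromosome}' AND position BETWEEN {start} AND {end}"
    )


def submit_query(sql, database, output_location):
    """Submit the query to Athena and return its execution id (needs AWS credentials)."""
    import boto3  # lazy import: only required when a query is actually submitted

    client = boto3.client("athena")
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

Athena scans data in place in object storage, so there is no database server to size or keep running between queries.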
genomephenome
PhenGen Database
Three things to remember
• Distributed architecture (serverless) can cater for a wide range of applications
  • Compute-intensive tasks (GT-Scan)
  • Tasks requiring continuous learning (CryptoBreeder)
  • Data-intensive tasks (PhenGen Database)
• Interoperability is built in, supporting evidence-based decision-making
• Optimization is currently still work-intensive; however, many startups are addressing this issue
Denis Bauer, PhD
Oscar Luo, PhD
Rob Dunne, PhD
Piotr Szul
Team
Aidan O’Brien
Laurence Wilson, PhD
Collaborators
News
Software
Kaitao Lai, PhD
Arash Bayat
Lynn Langit
Natalie Twine, PhD
Top 10 Australian IT stories of 2017
Transformational Bioinformatics

More Related Content

What's hot

TiECon Florida keynote - New opportunities for entrepreneurs using GPU & CUDA
TiECon Florida keynote - New opportunities for entrepreneurs using GPU & CUDATiECon Florida keynote - New opportunities for entrepreneurs using GPU & CUDA
TiECon Florida keynote - New opportunities for entrepreneurs using GPU & CUDAShanker Trivedi
 
A 100 gigabit highway for science: researchers take a 'test drive' on ani tes...
A 100 gigabit highway for science: researchers take a 'test drive' on ani tes...A 100 gigabit highway for science: researchers take a 'test drive' on ani tes...
A 100 gigabit highway for science: researchers take a 'test drive' on ani tes...balmanme
 
Techniques for Minimizing Cloud Footprint
Techniques for Minimizing Cloud FootprintTechniques for Minimizing Cloud Footprint
Techniques for Minimizing Cloud FootprintArun Kejariwal
 
GPUdb: A Distributed Database for Many-Core Devices
GPUdb: A Distributed Database for Many-Core DevicesGPUdb: A Distributed Database for Many-Core Devices
GPUdb: A Distributed Database for Many-Core Devicesinside-BigData.com
 
Accelerating Time to Science: Transforming Research in the Cloud
Accelerating Time to Science: Transforming Research in the CloudAccelerating Time to Science: Transforming Research in the Cloud
Accelerating Time to Science: Transforming Research in the CloudJamie Kinney
 
UberCloud Webinar Abaqus and cloud computing
UberCloud Webinar Abaqus and cloud computingUberCloud Webinar Abaqus and cloud computing
UberCloud Webinar Abaqus and cloud computingThomas Francis
 

What's hot (6)

TiECon Florida keynote - New opportunities for entrepreneurs using GPU & CUDA
TiECon Florida keynote - New opportunities for entrepreneurs using GPU & CUDATiECon Florida keynote - New opportunities for entrepreneurs using GPU & CUDA
TiECon Florida keynote - New opportunities for entrepreneurs using GPU & CUDA
 
A 100 gigabit highway for science: researchers take a 'test drive' on ani tes...
A 100 gigabit highway for science: researchers take a 'test drive' on ani tes...A 100 gigabit highway for science: researchers take a 'test drive' on ani tes...
A 100 gigabit highway for science: researchers take a 'test drive' on ani tes...
 
Techniques for Minimizing Cloud Footprint
Techniques for Minimizing Cloud FootprintTechniques for Minimizing Cloud Footprint
Techniques for Minimizing Cloud Footprint
 
GPUdb: A Distributed Database for Many-Core Devices
GPUdb: A Distributed Database for Many-Core DevicesGPUdb: A Distributed Database for Many-Core Devices
GPUdb: A Distributed Database for Many-Core Devices
 
Accelerating Time to Science: Transforming Research in the Cloud
Accelerating Time to Science: Transforming Research in the CloudAccelerating Time to Science: Transforming Research in the Cloud
Accelerating Time to Science: Transforming Research in the Cloud
 
UberCloud Webinar Abaqus and cloud computing
UberCloud Webinar Abaqus and cloud computingUberCloud Webinar Abaqus and cloud computing
UberCloud Webinar Abaqus and cloud computing
 

Similar to Going Server-less for Web-Services that need to Crunch Large Volumes of Data

How novel compute technology transforms life science research
How novel compute technology transforms life science researchHow novel compute technology transforms life science research
How novel compute technology transforms life science researchDenis C. Bauer
 
Genomic Scale Big Data Pipelines
Genomic Scale Big Data PipelinesGenomic Scale Big Data Pipelines
Genomic Scale Big Data PipelinesLynn Langit
 
Cloud-native machine learning - Transforming bioinformatics research
Cloud-native machine learning - Transforming bioinformatics research Cloud-native machine learning - Transforming bioinformatics research
Cloud-native machine learning - Transforming bioinformatics research Denis C. Bauer
 
Don't think DevOps think Compliant Database DevOps
Don't think DevOps think Compliant Database DevOpsDon't think DevOps think Compliant Database DevOps
Don't think DevOps think Compliant Database DevOpsRed Gate Software
 
Next Generation Manufacturing
Next Generation ManufacturingNext Generation Manufacturing
Next Generation ManufacturingElliot Duff
 
Easygenomics ISCB Cloud section 2012
Easygenomics ISCB Cloud section 2012Easygenomics ISCB Cloud section 2012
Easygenomics ISCB Cloud section 2012Xing Xu
 
Predictive Analytics: Why (I)IoT Is Different
Predictive Analytics: Why (I)IoT Is DifferentPredictive Analytics: Why (I)IoT Is Different
Predictive Analytics: Why (I)IoT Is DifferentAltoros
 
Infrastructure as Code for Network
Infrastructure as Code for NetworkInfrastructure as Code for Network
Infrastructure as Code for NetworkDamien Garros
 
apidays London 2023 - Open Standards, AI and Data for better business decisio...
apidays London 2023 - Open Standards, AI and Data for better business decisio...apidays London 2023 - Open Standards, AI and Data for better business decisio...
apidays London 2023 - Open Standards, AI and Data for better business decisio...apidays
 
Accelerating Cloud Services - Intel
Accelerating Cloud Services - IntelAccelerating Cloud Services - Intel
Accelerating Cloud Services - IntelAmazon Web Services
 
Productionizing Hadoop: 7 Architectural Best Practices
Productionizing Hadoop: 7 Architectural Best PracticesProductionizing Hadoop: 7 Architectural Best Practices
Productionizing Hadoop: 7 Architectural Best PracticesMapR Technologies
 
Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2Dan Taylor
 
Embedded Analytics: The Next Mega-Wave of Innovation
Embedded Analytics: The Next Mega-Wave of InnovationEmbedded Analytics: The Next Mega-Wave of Innovation
Embedded Analytics: The Next Mega-Wave of InnovationInside Analysis
 
Maturing IoT solutions with Microsoft Azure (Sam Vanhoutte & Glenn Colpaert a...
Maturing IoT solutions with Microsoft Azure (Sam Vanhoutte & Glenn Colpaert a...Maturing IoT solutions with Microsoft Azure (Sam Vanhoutte & Glenn Colpaert a...
Maturing IoT solutions with Microsoft Azure (Sam Vanhoutte & Glenn Colpaert a...Codit
 
Cloud-Native Fundamentals: Accelerating Development with Continuous Integration
Cloud-Native Fundamentals: Accelerating Development with Continuous IntegrationCloud-Native Fundamentals: Accelerating Development with Continuous Integration
Cloud-Native Fundamentals: Accelerating Development with Continuous IntegrationVMware Tanzu
 
Elastic Software Infrastructure to Support the Industrial Internet
Elastic Software Infrastructure to Support the Industrial InternetElastic Software Infrastructure to Support the Industrial Internet
Elastic Software Infrastructure to Support the Industrial InternetReal-Time Innovations (RTI)
 

Similar to Going Server-less for Web-Services that need to Crunch Large Volumes of Data (20)

How novel compute technology transforms life science research
How novel compute technology transforms life science researchHow novel compute technology transforms life science research
How novel compute technology transforms life science research
 
Genomic Scale Big Data Pipelines
Genomic Scale Big Data PipelinesGenomic Scale Big Data Pipelines
Genomic Scale Big Data Pipelines
 
Cloud-native machine learning - Transforming bioinformatics research
Cloud-native machine learning - Transforming bioinformatics research Cloud-native machine learning - Transforming bioinformatics research
Cloud-native machine learning - Transforming bioinformatics research
 
Don't think DevOps think Compliant Database DevOps
Don't think DevOps think Compliant Database DevOpsDon't think DevOps think Compliant Database DevOps
Don't think DevOps think Compliant Database DevOps
 
Next Generation Manufacturing
Next Generation ManufacturingNext Generation Manufacturing
Next Generation Manufacturing
 
Easygenomics ISCB Cloud section 2012
Easygenomics ISCB Cloud section 2012Easygenomics ISCB Cloud section 2012
Easygenomics ISCB Cloud section 2012
 
Predictive Analytics: Why (I)IoT Is Different
Predictive Analytics: Why (I)IoT Is DifferentPredictive Analytics: Why (I)IoT Is Different
Predictive Analytics: Why (I)IoT Is Different
 
Omniverse for the Metaverse
Omniverse for the MetaverseOmniverse for the Metaverse
Omniverse for the Metaverse
 
Applying Big Data
Applying Big DataApplying Big Data
Applying Big Data
 
Infrastructure as Code for Network
Infrastructure as Code for NetworkInfrastructure as Code for Network
Infrastructure as Code for Network
 
apidays London 2023 - Open Standards, AI and Data for better business decisio...
apidays London 2023 - Open Standards, AI and Data for better business decisio...apidays London 2023 - Open Standards, AI and Data for better business decisio...
apidays London 2023 - Open Standards, AI and Data for better business decisio...
 
Accelerating Cloud Services - Intel
Accelerating Cloud Services - IntelAccelerating Cloud Services - Intel
Accelerating Cloud Services - Intel
 
Productionizing Hadoop: 7 Architectural Best Practices
Productionizing Hadoop: 7 Architectural Best PracticesProductionizing Hadoop: 7 Architectural Best Practices
Productionizing Hadoop: 7 Architectural Best Practices
 
Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2
 
Embedded Analytics: The Next Mega-Wave of Innovation
Embedded Analytics: The Next Mega-Wave of InnovationEmbedded Analytics: The Next Mega-Wave of Innovation
Embedded Analytics: The Next Mega-Wave of Innovation
 
Maturing IoT solutions with Microsoft Azure (Sam Vanhoutte & Glenn Colpaert a...
Maturing IoT solutions with Microsoft Azure (Sam Vanhoutte & Glenn Colpaert a...Maturing IoT solutions with Microsoft Azure (Sam Vanhoutte & Glenn Colpaert a...
Maturing IoT solutions with Microsoft Azure (Sam Vanhoutte & Glenn Colpaert a...
 
C4Bio paper talk
C4Bio paper talkC4Bio paper talk
C4Bio paper talk
 
Cloud-Native Fundamentals: Accelerating Development with Continuous Integration
Cloud-Native Fundamentals: Accelerating Development with Continuous IntegrationCloud-Native Fundamentals: Accelerating Development with Continuous Integration
Cloud-Native Fundamentals: Accelerating Development with Continuous Integration
 
OGCE SC10
OGCE SC10OGCE SC10
OGCE SC10
 
Elastic Software Infrastructure to Support the Industrial Internet
Elastic Software Infrastructure to Support the Industrial InternetElastic Software Infrastructure to Support the Industrial Internet
Elastic Software Infrastructure to Support the Industrial Internet
 

More from Denis C. Bauer

Translating genomics into clinical practice - 2018 AWS summit keynote
Translating genomics into clinical practice - 2018 AWS summit keynoteTranslating genomics into clinical practice - 2018 AWS summit keynote
Translating genomics into clinical practice - 2018 AWS summit keynoteDenis C. Bauer
 
VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...Denis C. Bauer
 
Population-scale high-throughput sequencing data analysis
Population-scale high-throughput sequencing data analysisPopulation-scale high-throughput sequencing data analysis
Population-scale high-throughput sequencing data analysisDenis C. Bauer
 
Allelic Imbalance for Pre-capture Whole Exome Sequencing
Allelic Imbalance for Pre-capture Whole Exome SequencingAllelic Imbalance for Pre-capture Whole Exome Sequencing
Allelic Imbalance for Pre-capture Whole Exome SequencingDenis C. Bauer
 
Centralizing sequence analysis
Centralizing sequence analysisCentralizing sequence analysis
Centralizing sequence analysisDenis C. Bauer
 
Qbi Centre for Brain genomics (Informatics side)
Qbi Centre for Brain genomics (Informatics side)Qbi Centre for Brain genomics (Informatics side)
Qbi Centre for Brain genomics (Informatics side)Denis C. Bauer
 
Differential gene expression
Differential gene expressionDifferential gene expression
Differential gene expressionDenis C. Bauer
 
Transcript detection in RNAseq
Transcript detection in RNAseqTranscript detection in RNAseq
Transcript detection in RNAseqDenis C. Bauer
 
Functionally annotate genomic variants
Functionally annotate genomic variantsFunctionally annotate genomic variants
Functionally annotate genomic variantsDenis C. Bauer
 
Variant (SNPs/Indels) calling in DNA sequences, Part 2
Variant (SNPs/Indels) calling in DNA sequences, Part 2Variant (SNPs/Indels) calling in DNA sequences, Part 2
Variant (SNPs/Indels) calling in DNA sequences, Part 2Denis C. Bauer
 
Variant (SNPs/Indels) calling in DNA sequences, Part 1
Variant (SNPs/Indels) calling in DNA sequences, Part 1 Variant (SNPs/Indels) calling in DNA sequences, Part 1
Variant (SNPs/Indels) calling in DNA sequences, Part 1 Denis C. Bauer
 
Introduction to second generation sequencing
Introduction to second generation sequencingIntroduction to second generation sequencing
Introduction to second generation sequencingDenis C. Bauer
 
Introduction to Bioinformatics
Introduction to BioinformaticsIntroduction to Bioinformatics
Introduction to BioinformaticsDenis C. Bauer
 
The missing data issue for HiSeq runs
The missing data issue for HiSeq runsThe missing data issue for HiSeq runs
The missing data issue for HiSeq runsDenis C. Bauer
 
Deciphering the regulatory code in the genome
Deciphering the regulatory code in the genomeDeciphering the regulatory code in the genome
Deciphering the regulatory code in the genomeDenis C. Bauer
 
STAR: Recombination site prediction
STAR: Recombination site predictionSTAR: Recombination site prediction
STAR: Recombination site predictionDenis C. Bauer
 
SUMOylation site prediction
SUMOylation site predictionSUMOylation site prediction
SUMOylation site predictionDenis C. Bauer
 

More from Denis C. Bauer (19)

Translating genomics into clinical practice - 2018 AWS summit keynote
Translating genomics into clinical practice - 2018 AWS summit keynoteTranslating genomics into clinical practice - 2018 AWS summit keynote
Translating genomics into clinical practice - 2018 AWS summit keynote
 
VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...
 
Population-scale high-throughput sequencing data analysis
Population-scale high-throughput sequencing data analysisPopulation-scale high-throughput sequencing data analysis
Population-scale high-throughput sequencing data analysis
 
Trip Report Seattle
Trip Report SeattleTrip Report Seattle
Trip Report Seattle
 
Allelic Imbalance for Pre-capture Whole Exome Sequencing
Allelic Imbalance for Pre-capture Whole Exome SequencingAllelic Imbalance for Pre-capture Whole Exome Sequencing
Allelic Imbalance for Pre-capture Whole Exome Sequencing
 
Centralizing sequence analysis
Centralizing sequence analysisCentralizing sequence analysis
Centralizing sequence analysis
 
Qbi Centre for Brain genomics (Informatics side)
Qbi Centre for Brain genomics (Informatics side)Qbi Centre for Brain genomics (Informatics side)
Qbi Centre for Brain genomics (Informatics side)
 
Differential gene expression
Differential gene expressionDifferential gene expression
Differential gene expression
 
Transcript detection in RNAseq
Transcript detection in RNAseqTranscript detection in RNAseq
Transcript detection in RNAseq
 
Functionally annotate genomic variants
Functionally annotate genomic variantsFunctionally annotate genomic variants
Functionally annotate genomic variants
 
Variant (SNPs/Indels) calling in DNA sequences, Part 2
Variant (SNPs/Indels) calling in DNA sequences, Part 2Variant (SNPs/Indels) calling in DNA sequences, Part 2
Variant (SNPs/Indels) calling in DNA sequences, Part 2
 
Variant (SNPs/Indels) calling in DNA sequences, Part 1
Variant (SNPs/Indels) calling in DNA sequences, Part 1 Variant (SNPs/Indels) calling in DNA sequences, Part 1
Variant (SNPs/Indels) calling in DNA sequences, Part 1
 
Introduction to second generation sequencing
Introduction to second generation sequencingIntroduction to second generation sequencing
Introduction to second generation sequencing
 
Introduction to Bioinformatics
Introduction to BioinformaticsIntroduction to Bioinformatics
Introduction to Bioinformatics
 
The missing data issue for HiSeq runs
The missing data issue for HiSeq runsThe missing data issue for HiSeq runs
The missing data issue for HiSeq runs
 
Deciphering the regulatory code in the genome
Deciphering the regulatory code in the genomeDeciphering the regulatory code in the genome
Deciphering the regulatory code in the genome
 
ReliF
ReliFReliF
ReliF
 
STAR: Recombination site prediction
STAR: Recombination site predictionSTAR: Recombination site prediction
STAR: Recombination site prediction
 
SUMOylation site prediction
SUMOylation site predictionSUMOylation site prediction
SUMOylation site prediction
 

Recently uploaded

Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 

Recently uploaded (20)

Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...

Going Server-less for Web-Services that need to Crunch Large Volumes of Data

  • 1. Going Server-less for Web-Services that need to Crunch Large Volumes of Data HEALTH & BIOSECURITY Dr Denis Bauer | Bioinformatics | @allPowerde 9 Mar 2018 – Continuous Delivery and DevOps Day, Agile India
  • 2. Transformational Bioinformatics | Denis C. Bauer | @allPowerde Overview. CSIRO: An agile government research organization. GT-Scan2: A serverless web-service for complex research workflows. GenPhen DB: A serverless system for large data. Cryptobreeder: A serverless system that continuously learns (not CSIRO-funded).
  • 3. Team CSIRO: 5319 talented staff; $1 billion+ budget; working with 2800+ industry partners; 55 sites across Australia; top 1% of global research agencies; each year 6 CSIRO technologies contribute $5 billion to the economy.
  • 4. Australia’s innovation catalyst: EXTENDED WEAR CONTACTS; POLYMER BANKNOTES; RELENZA FLU TREATMENT; Fast WLAN (Wireless Local Area Network); AEROGARD; TOTAL WELLBEING DIET; RAFT POLYMERISATION; BARLEYmax™; SELF TWISTING YARN; SOFTLY WASHING LIQUID; HENDRA VACCINE; NOVACQ™ PRAWN FEED
  • 5. Overview. CSIRO: An agile government research organization. GT-Scan2: A serverless web-service for complex research workflows. GenPhen DB: A serverless system for large data. Cryptobreeder: A serverless system that continuously learns (not CSIRO-funded).
  • 6. Recruiting instantaneous appropriately powered compute
       Columns: Desktop compute | High-performance compute | Hadoop/Spark | Serverless
       Focus: small data | Compute-intensive | Data-intensive | Agility
       Fault tolerant: No | No | Yes | (Yes)
       Node-bound: Yes | Yes | No | No
       Parallelization: 10 CPU | 100 CPU | 1000 CPU | 1000 CPU
       Parallelization procedure: bespoke | bespoke | standardized | standardized
       Overhead in the cloud: NA | spin-up lag | spin-up lag | instantaneously
       CSIRO solution
  • 7. Ideal application case for serverless: Small tasks… • Embarrassingly parallel tasks …that need to scale • Unpredictable, burstable workload that needs to be delivered online Agility + Scalability =
  • 8. Genome Editing (CRISPR) can correct genetic diseases, such as hypertrophic cardiomyopathy. However, editing does not work every time, e.g. only 7 in 10 embryos were mutation free (Ma et al. Nature 2017*). Aim: Develop a computational guidance framework to enable edits that work the first time, every time. * Some controversy around the paper
  • 9. First serverless research Featured in Used by
  • 10. GT-Scan2
  • 11. GT-Scan.net
  • 12. Interoperable Workflows • Programmable call to GT-Scan2 (API) • Automatic result retrieval to notebook environment • Seamless and reproducible access to tertiary analytics. This notebook shows the genome-engineering workflow of finding a specific target site and then retrieving the result from GT-Scan for direct visualization. GenEng Reproducible Genome Engineering
  • 13. Demo GT-Scan2 http://gt-scan2.csiro.au/notebook-casestudy.html
  • 14. Serverless systems are hard to optimize • Pay only for what you use -> Optimize to use as little as possible
  • 15. GT-Scan2 X-Ray Analysis [Figure: runtime (s) per function, by Type (base, old). Functions: getFastaSequence, createJob, targetScan, offtargetScanStarter, offtargetSearch, targetIntersects, targetTranscriptionIntersects, targetWuScorer, targetSgRNAScorer, OnTargetScorer, genomeCRISPR]
  • 16. Results – 4x Faster (80% improvement) 2 min 30 sec
  • 17. Using hypothesis-driven architecture to improve serverless infrastructure Architecture as text Evolve Automatic performance measure Evaluate James Lewis https://www.epsagon.com/ 4pm Kief Morris
  • 18. Overview. CSIRO: An agile government research organization. GT-Scan2: A serverless web-service for complex research workflows. GenPhen DB: A serverless system for large data. Cryptobreeder: A serverless system that continuously learns (not CSIRO-funded).
  • 19. CryptoKitties in a nutshell. What’s in the genes of your CryptoKitties? By cryptobreeder. Not CSIRO-funded
  • 20. CryptoBreeder.net • Objective: Build a machine learning web-service to predict the ‘cattributes’ of the offspring from a breeding pair • Problem: New ‘cattributes’ emerge all the time • Solution: Continuously learning model. Uptake: 164 sessions / week. Not CSIRO-funded
  • 21. CryptoBreeder.net Not CSIRO-funded Web service Layer Machine Learning Layer
  • 22. Cost to date • Any guesses?
  • 23. Cost to date: AU $24.35. Ongoing costs: still within the 1-year AWS Free Tier; however, the S3 limit has already been exceeded and EC2 is not eligible. Uptake: 164 sessions / week. Not CSIRO-funded
  • 24. Overview. CSIRO: An agile government research organization. GT-Scan2: A serverless web-service for complex research workflows. GenPhen DB: A serverless system for large data. Cryptobreeder: A serverless system that continuously learns.
  • 25. Genomics will outpace other BigData disciplines (Stephens et al. PLOS Biology 2015): Astronomy, Twitter, YouTube, Genomics
  • 26. Clinical use of GenPhen DB • Objective: Build a web-service that can query databases in response to a patient’s genome and medical record • Problem: Genomic data is so large • Solution: Athena-based query engine genomephenome
  • 27. PhenGen Database
  • 28. Three things to remember • Distributed architecture (serverless) can cater for a wide range of applications: compute-intensive tasks (GT-Scan), tasks requiring continuous learning (CryptoBreeder), and data-intensive tasks (PhenGen Database) • Interoperability is built in, supporting evidence-based decision-making • Optimization is currently still work-intensive; however, there are many startups addressing this issue
  • 29. Denis Bauer, PhD Oscar Luo, PhD Rob Dunne, PhD Piotr Szul Team Aidan O’Brien Laurence Wilson, PhD Collaborators News Software Kaitao Lai, PhD Arash Bayat Lynn Langit Natalie Twine, PhD Top 10 Australian IT stories of 2017 Transformational Bioinformatics
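
The point on slide 14 (pay only for what you use, so optimize to use as little as possible) follows from how function-as-a-service platforms such as AWS Lambda bill: a charge per request plus a charge for compute time scaled by allocated memory. The sketch below is a rough cost model under that assumption only. The function name, the workload figures and both prices are hypothetical placeholders, not values from the talk or from any price list.

```python
def lambda_cost(invocations, avg_duration_s, memory_mb,
                price_per_gb_s, price_per_request):
    """Estimate the bill for a batch of function invocations.

    Compute is charged in GB-seconds (duration x allocated memory),
    plus a flat fee per request. Both prices are caller-supplied
    because the real rates are not given in the source.
    """
    gb_seconds = invocations * avg_duration_s * (memory_mb / 1024)
    return gb_seconds * price_per_gb_s + invocations * price_per_request


if __name__ == "__main__":
    # Hypothetical workload and prices, for illustration only.
    slow = lambda_cost(10_000, 8.0, 512, 0.00002, 0.0000002)
    fast = lambda_cost(10_000, 2.0, 512, 0.00002, 0.0000002)
    print(f"before: {slow:.4f}  after: {fast:.4f}")
```

Because the compute term is linear in duration, any runtime saved in a function shows up directly in the bill, which is why per-function runtime profiling (as on slide 15) is a natural starting point for optimization.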

Editor's Notes

  1. 12:30 - 13:15 Going Server-less for Web-Services that need to Crunch Large Volumes of Data, by Denis Bauer, Team Leader Transformational Bioinformatics @ CSIRO. Real-time analysis through cloud-based solutions is expected in every domain, including life sciences. However, keeping runtime real-time and constant can be challenging for problems that vary in their complexity, such as genome engineering. Here, the whole genome needs to be analyzed for every potential modification spot, hence the computational complexity of finding the optimal spot can vary by orders of magnitude. Using AWS Lambda, we break down this task into smaller sub-tasks that can be solved in parallel by instantaneously recruiting additional Lambda functions as the complexity increases. The resulting web-tool, GT-Scan2, was featured on the prestigious AWS Jeff Barr blog as it brings together novel scientific insights and unprecedented cloud-compute capacity. The same idea has been used for building CryptoBreeder. In this presentation, we will discuss the general template for serverless web-applications and bespoke solutions for overcoming the technical limitations that serverless imposes. CONTINUOUS DELIVERY AND DEVOPS
  2. Staff # as at 3 March 2016 = 5319. 2014–15 budget = $1.2 billion. Today we have around 5300 talented people working out of 50-plus centres in Australia and internationally. We are a billion-dollar organisation. We generate $485+ million in external revenue; nearly 40 per cent of our revenue is externally sourced. Our people work closely with industry and communities to leave a lasting legacy. Our ability to achieve results is shown by the quality of our research. We are in the top 1% of global research institutions in 15 of 22 research fields and in the top 0.1% in four research fields. CSIRO is the key connector of institutions in the Australian system for some areas. CSIRO is the most central Australian institution in 6 research fields: Agricultural Sciences, Environment/Ecology, Plant and Animal Sciences, Geosciences, Chemistry and Materials Science. CSIRO works with 1208 SMEs and 2,877 customers each year. We’re always looking for ways we can help business and industry.
  3. Square Kilometre Array (SKA) project is expected to lead to a storage demand of 1 exabyte per year. YouTube currently requires from 100 petabytes to 1 exabyte for storage and may be projected to require between 1 and 2 exabytes additional storage per year by 2025. Twitter’s storage needs today are estimated at 0.5 petabytes per year, which may increase to 1.5 petabytes in the next ten years.
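
The fan-out described in note 1 (breaking the genome scan into smaller sub-tasks that are solved in parallel) can be sketched locally as follows. This is a minimal illustration, not the GT-Scan2 implementation: a thread pool stands in for the recruited Lambda functions, the names `find_sites`, `split_with_overlap` and `scan` are hypothetical, and the site definition (20 nucleotides followed by an NGG PAM on the forward strand only) is a simplifying assumption.

```python
from concurrent.futures import ThreadPoolExecutor

SITE_LEN = 23  # assumed site: 20-nt target + 3-nt PAM ("NGG")


def find_sites(chunk, offset):
    """Stand-in for one function invocation: scan a single chunk.

    Returns global start positions of every 23-nt window whose
    last two bases are "GG" (forward strand only).
    """
    hits = []
    for i in range(len(chunk) - SITE_LEN + 1):
        if chunk[i + 21:i + 23] == "GG":
            hits.append(offset + i)
    return hits


def split_with_overlap(seq, chunk_size):
    """Yield (chunk, offset) pairs.

    Each chunk is extended by SITE_LEN - 1 bases so that a site
    straddling a chunk boundary is still seen by exactly one worker.
    """
    for start in range(0, len(seq), chunk_size):
        yield seq[start:start + chunk_size + SITE_LEN - 1], start


def scan(seq, chunk_size=50, workers=4):
    """Fan out one sub-task per chunk, then merge the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(find_sites, chunk, offset)
                   for chunk, offset in split_with_overlap(seq, chunk_size)]
        return sorted(set(h for f in futures for h in f.result()))


if __name__ == "__main__":
    genome = "ACGGTTGGCA" * 20  # toy sequence, not real data
    print(scan(genome, chunk_size=17))
```

The number of sub-tasks grows with the input length while each sub-task stays small, which is the property that lets additional workers be recruited as complexity increases. In a real deployment each `find_sites` call would be a separate function invocation rather than a thread.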