Australia’s National Science Agency
Cloud-native machine learning
Transforming bioinformatics research
• Workflows need to be
reproducible/compliant.
• Data sizes are ever
increasing.
• Algorithms getting more
complex/interdependent.
#FutureOfBioinf is
cloud-native
Cloud-native Innovation | A/Prof Denis Bauer | @allPowerde
1. Steep learning curve for non-IT
researchers
2. Granting bodies have not fully
embraced cloud expenses
3. Data privacy and protection
becomes your problem
Cloud is not easy
• Impact: Diverse citations
• Reach: News coverage
• Influence: Global collaborators
Not easy but rewarding
CSIRO
Australia’s National
Science Agency
Reading the
Genome
Finding disease genes in
billions of DNA
molecules
Democratize
Genomics
Affordable data exchange
for population-scale
cohorts
Cloud-native Innovation
Credit https://toolstotal.com/
• Invented WiFi, used in five billion devices globally.
• Developed the vaccine for the Hendra Virus.
• Developed the Total Wellbeing & Low-Carb Diets.
CSIRO: We are
innovators and builders
‘Improve health
care through
digital technology
and services.’
99% have a genetic
variation that affects
medical care
Chanfreau-Coffinier et al. JAMA
Netw Open. 2019
CSIRO
Australia’s National
Science Agency
Reading the
Genome
Finding disease genes in
billions of DNA
molecules
Democratize
Genomics
Affordable data exchange
for population-scale
cohorts
Cloud-native Innovation
Finding the cure for
ALS
Credit: National Institute on Aging, National Institutes of Health
• Unprecedented scale
• Groundbreaking research
• Latest Technology
Finding the disease gene(s)
cases
controls
Gene1 Gene2
Complex diseases are driven by
multiple interacting genes with variable contribution
cases
controls
Need a more
sophisticated
ML approach,
such as
Random Forest
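Why single-gene tests fall short when genes interact can be illustrated with a toy cohort. The sketch below is not a Random Forest and does not come from the talk; it assumes two hypothetical binary variants and an XOR-style disease rule, chosen only because it makes the point sharply: each gene alone looks uninformative, while the pair separates cases from controls.

```python
from itertools import product

# Hypothetical toy cohort: (gene1, gene2, disease_status) per individual.
# Assumption for illustration: disease occurs only when exactly one of the
# two variants is present (an XOR-style interaction).
cohort = [
    (g1, g2, g1 ^ g2)
    for g1, g2 in product((0, 1), repeat=2)
    for _ in range(25)
]


def case_rate(rows):
    """Fraction of individuals in `rows` that are cases."""
    return sum(status for *_, status in rows) / len(rows)


def marginal_rates(rows, gene_index):
    """Case rate among non-carriers and carriers of one gene."""
    return tuple(
        case_rate([r for r in rows if r[gene_index] == value])
        for value in (0, 1)
    )


def joint_rates(rows):
    """Case rate for every combination of the two genes."""
    return {
        (a, b): case_rate([r for r in rows if r[0] == a and r[1] == b])
        for a, b in product((0, 1), repeat=2)
    }


print("gene1 alone:", marginal_rates(cohort, 0))
print("gene2 alone:", marginal_rates(cohort, 1))
print("gene1 x gene2:", joint_rates(cohort))
```

Tree-based methods such as Random Forest can pick up this kind of signal because successive splits condition on more than one feature at a time.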
Finding higher order interactions
Score = (5*B6*B2 – 4*B6 – 4*B2 + 1) + (7*R1*C2 – 6*R1 – 6*C2 + 1)
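The score above can be evaluated directly to see where the interactions sit. The sketch below assumes each of B6, B2, R1 and C2 is a 0/1 indicator (the slide does not state the encoding) and uses a simple two-way contrast, f(1,1) - f(1,0) - f(0,1) + f(0,0), which is non-zero only when two inputs interact.

```python
from itertools import product


def score(b6, b2, r1, c2):
    """Evaluate the slide's score; argument names follow the slide."""
    return (5 * b6 * b2 - 4 * b6 - 4 * b2 + 1) + (7 * r1 * c2 - 6 * r1 - 6 * c2 + 1)


def interaction_contrast(f):
    """Two-way contrast of f over 0/1 inputs; zero when effects are additive."""
    return f(1, 1) - f(1, 0) - f(0, 1) + f(0, 0)


# Assumption: all four variables are binary indicators.
table = {bits: score(*bits) for bits in product((0, 1), repeat=4)}

b6_b2 = interaction_contrast(lambda x, y: score(x, y, 0, 0))
r1_c2 = interaction_contrast(lambda x, y: score(0, 0, x, y))
b6_r1 = interaction_contrast(lambda x, y: score(x, 0, y, 0))

print("B6 x B2 contrast:", b6_b2)
print("R1 x C2 contrast:", r1_c2)
print("B6 x R1 contrast:", b6_r1)
```

Under this encoding the contrasts recover the product-term coefficients for the two interacting pairs, and the contrast across pairs is zero because the two bracketed terms are simply added.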
Machine learning on 1.7 Trillion data points
80 Million features
Individuals
Genomic profile Disease
status
22,500 samples
Disease genes
Transcription, medical records,
population data
Survival, Drug-choice,
Pathogenicity
Predictive markers
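A quick back-of-the-envelope check relates the slide's figures to each other. It multiplies the rounded feature and sample counts quoted above; the one-byte-per-value storage estimate is an assumption added for illustration, not a figure from the talk.

```python
features = 80_000_000  # "80 Million features" from the slide
samples = 22_500       # "22,500 samples" from the slide

# Size of the genotype matrix: one value per (sample, feature) pair.
data_points = features * samples
print(f"{data_points:,} values (~{data_points / 1e12:.1f} trillion)")

# Assumption for illustration only: one byte per value, no compression.
bytes_per_value = 1
terabytes = data_points * bytes_per_value / 1e12
print(f"~{terabytes:.1f} TB if stored as a dense matrix")
```

The rounded inputs give about 1.8 trillion values, the same order of magnitude as the 1.7 trillion quoted; the small gap presumably comes from rounding of the feature count.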
Population-scale genomic data analysis requires BigData solutions
                           Desktop compute   High-performance compute cluster   Apache Spark cluster
Focus                      Small data        Compute-intensive                  Data-intensive
Node-bound                 Yes               Yes                                No
Parallelization            10 CPU            100+ CPU                           1000+ CPU
Parallelization procedure  bespoke           bespoke                            standardized
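The "standardized" parallelization in the last column refers to a split, map, combine pattern that the framework applies for you. The sketch below shows that pattern with Python's standard library only; it is not Spark's API, the function names are hypothetical, and threads are used merely to keep the example self-contained (they do not give real CPU parallelism in CPython).

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(rows, n_parts):
    """Split rows (one genotype list per sample) into n_parts chunks."""
    return [rows[i::n_parts] for i in range(n_parts)]


def map_chunk(chunk):
    """Per-partition work: sum alternate-allele counts per variant column."""
    if not chunk:
        return []
    counts = [0] * len(chunk[0])
    for genotypes in chunk:
        for j, g in enumerate(genotypes):
            counts[j] += g
    return counts


def combine(a, b):
    """Merge two partial results; an empty list means 'no data yet'."""
    if not a:
        return b
    if not b:
        return a
    return [x + y for x, y in zip(a, b)]


def allele_counts(rows, n_parts=4):
    """Compute per-variant allele counts by mapping over partitions."""
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(map_chunk, partition(rows, n_parts)))
    return reduce(combine, partials, [])


# Hypothetical genotypes: rows are samples, columns are variants (0, 1 or 2).
rows = [[0, 1, 2], [1, 1, 0], [2, 0, 0], [0, 0, 1], [1, 2, 1]]
print(allele_counts(rows))
```

Because the per-partition function and the combine step are independent of how many partitions exist, the same code shape scales from a few workers to many, which is the property the table attributes to a Spark cluster.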
Detecting disease genes efficiently
Faster
Analyzes 3000 individuals
with 80M features in
30 minutes
Used by
[Chart axes: Accuracy (low to high), Speed (low to high)]
Accessible
Available on all major public
cloud providers as well as
on premises
Bring analysis to the data
CSIRO takes genome product to
the world on Amazon
Reproducible research with shareable stacks
Prioritizing variants based on shared IBD
Using ancestry
information (identity
by descent) to narrow
down search area for
disease genes
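One way to picture "narrowing the search area" is to keep only the candidate variants that fall inside genomic segments two related individuals share identical by descent. The sketch below is a minimal illustration of that filtering idea, not the TRIBES implementation; the segment coordinates, gene names and type names are all hypothetical.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """A shared IBD segment; start and end are inclusive positions."""
    chrom: str
    start: int
    end: int


class Variant(NamedTuple):
    chrom: str
    pos: int
    gene: str


def in_shared_segment(variant, segments):
    """True if the variant lies inside any shared IBD segment."""
    return any(
        s.chrom == variant.chrom and s.start <= variant.pos <= s.end
        for s in segments
    )


def prioritize(variants, segments):
    """Keep only variants located in regions shared by the related pair."""
    return [v for v in variants if in_shared_segment(v, segments)]


# Hypothetical inputs for one pair of distantly related individuals.
shared = [Segment("chr1", 1_000, 5_000), Segment("chr6", 20_000, 30_000)]
candidates = [
    Variant("chr1", 1_500, "GENE_A"),
    Variant("chr1", 9_000, "GENE_B"),
    Variant("chr6", 25_000, "GENE_C"),
    Variant("chr2", 1_500, "GENE_D"),
]
print([v.gene for v in prioritize(candidates, shared)])
```

In practice the shared segments would come from an IBD detection step and the candidate list from variant calling; here both are hard-coded to keep the example runnable.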
• TRIBES uncovers 53 novel relationships up
to 7th degree (great-great-great-grandparent)
• Identified novel shared variant in FIG4
gene
• rare variant in coding region for one SALS pair – validated by
Sanger sequencing
• FIG4 previously reported in familial Charcot-Marie-Tooth.
Suggests low-penetrant FALS
• Focus on variants in known ALS-FTD genes, that cause
detrimental protein changes
Expanding relationships led to novel disease variants
By 2030 it is estimated that 50% of the world
population will have been sequenced.
20 EB Storage / year
Stephens et al. Big Data: Astronomical or Genomical? (2015)
Data acquisition of Big Data disciplines in 2025
Genomics
YouTube
Astronomy
Twitter
Frost&Sullivan
CSIRO
Australia’s National
Science Agency
Reading the
Genome
Finding disease genes in
billions of DNA
molecules
Democratize
Genomics
Affordable data exchange
for population-scale
cohorts
Cloud-native Innovation
• Up to $4000/month to
maintain
• Up to 33 hours to update as
new data comes in
• Up to 20 second query time
Resource consumption
unsustainable
           Desktop computers   Cloud server (auto-scale)   Serverless
Focus      Full control        Flexible                    Agility
Analogy    own car             ride share                  chauffeur
Criteria compared: Flexibility, No overhead, Scalability, Cost-effective
Image credits: madd.org, blacklane.com
Recruiting instantaneous appropriately powered compute
Re-thinking traditional architectures
5000-fold faster to update new data
Hosking B. et al.
300-fold monthly cost saving
Hosking B. et al.
17-fold faster query time
Hosking B. et al.
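The three fold-improvements can be put next to the earlier "up to" baseline figures ($4000/month, 33 hours, 20 seconds) to see what they would imply. This is a rough sketch: it assumes the fold changes were measured against exactly those worst-case baselines, which the slides do not confirm.

```python
# Baseline figures quoted earlier in the deck ("up to" values).
baseline = {
    "monthly_cost_usd": 4000.0,
    "update_seconds": 33 * 3600.0,  # 33 hours expressed in seconds
    "query_seconds": 20.0,
}

# Fold improvements quoted on the three slides above.
fold = {
    "monthly_cost_usd": 300,
    "update_seconds": 5000,
    "query_seconds": 17,
}

# Assumption: each fold change applies directly to the matching baseline.
implied = {name: baseline[name] / fold[name] for name in baseline}

for name, value in implied.items():
    print(f"{name}: {value:.2f}")
```

Under that assumption the implied values are roughly $13 per month, about 24 seconds per update and a little over one second per query.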
Exchanging genomic data cloud-natively
Partners
Faster & cheaper
Serves a query in 1 s and
costs < $15/month to
serve 100,000 WGS samples
Better control
Fine-grained control over
data ownership and access
1. Steep learning curve for non-IT
researchers
2. Granting bodies have not fully
embraced cloud expenses
3. Data privacy and protection
becomes your problem
Not “cloud-only”
but “cloud-first”
Australia’s National Science Agency
CSIRO Health and
Biosecurity
Denis Bauer
Denis.Bauer@csiro.au
Let’s work together
Visit our site:
https://bioinformatics.csiro.au
@allPowerde
@Tbioinf
