Genome sequencing using OpenStack

Lorick Jain

Human Genome sequencing with OpenStack

Human Genome Sequencing
Genome sequencing technology is driving down costs faster than Moore's law.
- Lorick Jain

How to store large amounts of data?
• First, storage should be cheap and need little manpower to run.
• They use OpenStack Swift and are the largest contributors to it.
• Research sequencing produces a lot of data.
• Hudson Alpha: they do sequence processing and biology, drawing on storage, compute, and expertise.
• Previously there was no storage in place for any of this.
• Their budget was low, so they decided to use Swift.

Sequencing Workflow

Data Avalanche
• Turnaround time should be reduced.
• Metadata
• Data proliferation
• Cost of Downtime
• HPC throughput
• Multiple generations of hardware.

Storage is expensive!
• Amazon charges $37 per TB.
• To store the data they needed 504 drives, but were limited to 8 per customer at the time (S3), so they got a 4 PB rack for $150k.
• Storage should be durable, available, and flexible.
• They used Cgate and Swift so that they could manage it easily.
• SwiftStack eradicates the difficulty of provisioning.
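
The figures on this slide can be put side by side with a little arithmetic. The sketch below is only an illustration: the dollar amounts, the 4 PB capacity, and the 504-drive count come from the slide, while the decimal unit conversion and the function names are assumptions. The slide does not state the billing period for the $37 per TB price, so the cloud cost is expressed per billing period.

```python
# Figures quoted on the slide.
RACK_COST_USD = 150_000       # "$150k" for the rack
RACK_CAPACITY_PB = 4          # "4 PB rack"
DRIVE_COUNT = 504             # "504 drives"
CLOUD_PRICE_PER_TB_USD = 37   # "$37 per TB"; billing period not stated

# Assumption: decimal units (1 PB = 1000 TB).
TB_PER_PB = 1000


def rack_cost_per_tb(cost_usd: float, capacity_pb: float) -> float:
    """One-time hardware cost per raw TB of rack capacity."""
    return cost_usd / (capacity_pb * TB_PER_PB)


def cloud_cost(capacity_pb: float, price_per_tb: float, periods: int) -> float:
    """Cost of holding the same capacity in the cloud for N billing periods."""
    return capacity_pb * TB_PER_PB * price_per_tb * periods


def implied_tb_per_drive(capacity_pb: float, drives: int) -> float:
    """Raw capacity per drive if the rack is spread evenly over the drives."""
    return capacity_pb * TB_PER_PB / drives


if __name__ == "__main__":
    per_tb = rack_cost_per_tb(RACK_COST_USD, RACK_CAPACITY_PB)
    one_period = cloud_cost(RACK_CAPACITY_PB, CLOUD_PRICE_PER_TB_USD, 1)
    per_drive = implied_tb_per_drive(RACK_CAPACITY_PB, DRIVE_COUNT)
    print(f"Rack: ${per_tb:.2f} per raw TB, paid once")
    print(f"Cloud: ${one_period:,.0f} per billing period for the same capacity")
    print(f"Implied raw capacity per drive: {per_drive:.2f} TB")
```

Under these assumptions the rack pays for itself after roughly one cloud billing period. The comparison covers raw capacity only; replication overhead, power, and staff time are not included.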

SwiftStack
• Cost is reduced.
• Architecture is simple.
• Practical applications:
• Auto-discarding objects, temp URLs for customers, a file system gateway for object storage, and an offsite replica if not using a CDN.
• Future: erasure coding to replace full replicas, since this is a lot of data; pipelining work; scaling.
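
Three of the items on this slide can be sketched in a few lines of Python. The example is a hedged illustration, not the deployment described in the talk. It follows the documented OpenStack Swift TempURL scheme (an HMAC over the method, expiry time, and object path) and the `X-Delete-After` object-expiry header. The key, account, and container names are hypothetical, and which digests a cluster accepts depends on its configuration (older deployments used SHA-1). The last two functions compare the raw-storage overhead of full replicas with that of an erasure-coded layout; the 10+4 scheme is an assumed example.

```python
import hmac
import time
from hashlib import sha256
from typing import Dict, Optional


def make_temp_url(key: str, method: str, path: str,
                  ttl_seconds: int, now: Optional[float] = None) -> str:
    """Build a Swift TempURL for an object path such as
    /v1/AUTH_account/container/object.

    `key` is the secret stored on the account or container
    (for example via X-Account-Meta-Temp-URL-Key).
    """
    current = time.time() if now is None else now
    expires = int(current + ttl_seconds)
    # Swift signs "METHOD\nEXPIRES\nPATH" with the temp URL key.
    hmac_body = f"{method}\n{expires}\n{path}"
    sig = hmac.new(key.encode(), hmac_body.encode(), sha256).hexdigest()
    return f"{path}?temp_url_sig={sig}&temp_url_expires={expires}"


def expiry_headers(ttl_seconds: int) -> Dict[str, str]:
    """Header to send on upload so Swift discards the object after a delay."""
    return {"X-Delete-After": str(ttl_seconds)}


def replica_overhead(replicas: int) -> float:
    """Raw bytes stored per byte of data with full replication."""
    return float(replicas)


def erasure_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Raw bytes stored per byte of data with erasure coding."""
    return (data_fragments + parity_fragments) / data_fragments


if __name__ == "__main__":
    # Hypothetical key and object path, for illustration only.
    url = make_temp_url("my-secret-key", "GET",
                        "/v1/AUTH_demo/results/sample.bam", ttl_seconds=3600)
    print(url)
    print(expiry_headers(7 * 24 * 3600))
    print(replica_overhead(3), erasure_overhead(10, 4))
```

The returned string is a path plus query string; prefixing it with the cluster's host name gives a link a customer can use without credentials until it expires.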

