Cloud Computing Technologies for Genomic Big Data Analysis
Fabrício A. B. Silva, Alberto Davila
FIOCRUZ
{fabs,davila}@fiocruz.br
Big Data – A Definition
“Big data is a term used to describe
information assemblages that make
conventional data, or database, processing
problematic due to any combination of
their size (volume), frequency of update
(velocity), or diversity (variety)”
Hay SI, George DB, Moyes CL, Brownstein JS (2013) Big Data
Opportunities for Global Infectious Disease Surveillance. PLoS Med
10(4): e1001413. doi:10.1371/journal.pmed.1001413
The Data Deluge
“In the last five years, more scientific data
has been generated than in the entire
history of mankind. You can imagine
what’s going to happen in the next five.”
Winston Hide, associate professor of bioinformatics
Harvard School of Public Health.
The promise of big data. HSPH News, Spring/Summer 2012
Example: GenBank

http://www.ncbi.nlm.nih.gov/genbank/statistics
Accessed on Oct 22, 2013
DNA Sequencing Evolution

Stein, L. D. (2010). The case for cloud computing in genome
informatics. Genome Biol, 11(5), 207.
Interesting Facts...
• The cost of sequencing a human genome fell
from US$ 1 million in 2007 to US$ 1 thousand in 2012
• The human genome has 3 billion bp ~ 100
GB of raw data
• NCI’s million genomes project: 1 million
TB, or 1,000 petabytes, or 1 exabyte
Driscoll, A. O., Daugelaite, J., & Sleator, R. D. (2013). ‘Big data’, Hadoop
and cloud computing in genomics. Journal of biomedical informatics.
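The unit conversions behind these figures can be checked with a few lines of arithmetic. This is only a sketch: it assumes decimal (SI) units, and the variable names are illustrative. The per-genome and per-base values are derived from the numbers quoted above, not taken from the cited paper.

```python
# Decimal (SI) byte units; assuming these are the units meant on the slide.
GB = 10**9
TB = 10**12
PB = 10**15
EB = 10**18

# "1 million TB" for the million genomes project.
project_bytes = 1_000_000 * TB
project_in_pb = project_bytes // PB   # 1,000 petabytes
project_in_eb = project_bytes // EB   # 1 exabyte

# Implied storage budget per genome in that project.
genomes = 1_000_000
per_genome_tb = project_bytes // genomes // TB

# Raw bytes per base pair implied by "3 billion bp ~ 100 GB".
raw_bytes_per_bp = (100 * GB) / 3_000_000_000

print(project_in_pb, project_in_eb, per_genome_tb, round(raw_bytes_per_bp, 1))
```

The roughly 33 bytes per base pair presumably reflect read redundancy (coverage) and per-base quality values rather than the bare sequence.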
The Processing Bottleneck
Software   Number of Cores  Start          Finish         Processing Time  File sizes
Flash      24               9/12/13 22:48  9/12/13 22:48  0:00:53          2 files: 237 MB and 238 MB
Velveth    1                9/12/13 22:50  9/12/13 22:52  0:01:39          3 files: 100 MB, 166 MB and 165 MB
Velvetg    1                9/12/13 22:54  9/12/13 22:59  0:04:53          2 files: 250 MB and 75 MB
Mira       24               9/12/13 23:11  9/12/13 23:32  0:21:21          2 files: 69 MB and 6 MB
Glimmer3   1                9/12/13 23:40  9/12/13 23:40  0:00:40          2 files: 6 MB and 1.4 MB
Blastx     24               9/12/13 23:46  9/13/13 9:23   9:36:15          Against RefSeq (17,411,217 entries)

Pipeline processed @ Computational and Systems Biology Lab, Bioinformatics Platform, Instituto
Oswaldo Cruz, FIOCRUZ – Input Data size: 500MB
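To see where the bottleneck lies, the processing times listed above can be turned into shares of the total. A minimal sketch follows; the function names are illustrative, and the total only sums the processing times, ignoring the idle gaps between steps.

```python
from datetime import timedelta

# Processing times copied from the pipeline run above (h:mm:ss).
STEP_TIMES = {
    "Flash": "0:00:53",
    "Velveth": "0:01:39",
    "Velvetg": "0:04:53",
    "Mira": "0:21:21",
    "Glimmer3": "0:00:40",
    "Blastx": "9:36:15",
}


def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    delta = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return int(delta.total_seconds())


def time_shares(step_times: dict) -> dict:
    """Return each step's fraction of the summed processing time."""
    seconds = {name: to_seconds(t) for name, t in step_times.items()}
    total = sum(seconds.values())
    return {name: s / total for name, s in seconds.items()}


shares = time_shares(STEP_TIMES)
for name, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name:<9} {share:6.1%}")
```

With these figures, Blastx accounts for about 95% of the summed processing time, even though it already runs on 24 cores.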
NGS: Expect Much More Data
What Then?
Cloud Computing: A Definition
• “Cloud computing is a model for
enabling convenient, on-demand network
access to a shared pool of configurable
computing resources (e.g., networks,
servers, storage, applications, and
services) that can be rapidly provisioned
and released with minimal management
effort or service provider interaction”
NIST – Available at http://www.nist.gov/itl/cloud/upload/cloud-def-v15.pdf
Cloud Computing: Advantages
• Flexibility
– Use of virtualization technology

• Scalability
– Large number of nodes connected at
local-network speed

• Availability/Accessibility
– Even small labs can harness the power of the
Cloud
Cloud Scalability: Example

Schadt, E. E., Linderman, M. D., Sorenson, J., Lee, L., & Nolan, G. P. (2011). Cloud
and heterogeneous computing solutions exist today for the emerging big data problems
in biology. Nature Reviews Genetics, 12(3), 224-224.
Cloud Computing: Challenges
• Bandwidth Limits
– Large data sets need to be moved to the
cloud

• Security/Privacy Issues
– Limited control over remote storage

• Expertise
– Adapting new applications to the cloud still
requires some technical expertise
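The bandwidth limit is easy to estimate with back-of-the-envelope arithmetic. The sketch below assumes decimal units and an idealized link; the function name, the `efficiency` parameter, and the link speeds are hypothetical choices for illustration, not measurements.

```python
def transfer_hours(size_gb: float, bandwidth_mbps: float,
                   efficiency: float = 1.0) -> float:
    """Hours needed to move size_gb gigabytes over a bandwidth_mbps link.

    Uses decimal units (1 GB = 8,000 megabits). `efficiency` is the fraction
    of the nominal bandwidth actually achieved (hypothetical parameter).
    """
    if bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    megabits = size_gb * 8_000
    seconds = megabits / (bandwidth_mbps * efficiency)
    return seconds / 3600


# 100 GB is the raw-data figure quoted earlier for one human genome;
# the link speeds are illustrative assumptions.
for mbps in (10, 100, 1000):
    print(f"{mbps:>5} Mbit/s: {transfer_hours(100, mbps):6.2f} h")
```

Under these assumptions, a single 100 GB data set takes a little over two hours at 100 Mbit/s and close to a day at 10 Mbit/s.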
MapReduce
• MapReduce/Hadoop
– MapReduce: Parallel distributed framework
invented by Google for processing large data sets
– Data and computations are spread over thousands
of computers, processing petabytes of data each
day
– Hadoop is the leading open-source implementation
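The programming model itself fits in a few lines. The sketch below counts k-mers in a handful of reads using explicit map, shuffle, and reduce steps. It runs in a single Python process and does not use Hadoop; the function names and the toy reads are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read: str, k: int = 3):
    """Map step: emit a (k-mer, 1) pair for every k-mer in one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)


def count_kmers(reads, k: int = 3) -> dict:
    """Run map, shuffle, and reduce in sequence over a list of reads."""
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


reads = ["ACGTAC", "GTACGT"]  # toy reads, made up for illustration
counts = count_kmers(reads)
print(counts)
```

In a real framework the map calls run on many machines at once and the shuffle moves data across the network; the per-key independence of the reduce step is what makes that distribution possible.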
MapReduce
• MapReduce/Hadoop: Advantages
– Scalable, Efficient, Reliable
– Easy to program
– Runs on commodity computers

• MapReduce/Hadoop: Challenges
– Redesigning, retooling applications
Cloud Computing in Genomics
• Crossbow
– Scalable software pipeline for whole genome
resequencing analysis over Hadoop

• CloudBurst
– Highly sensitive short read mapping over Hadoop

• Myrna
– Tool for calculating differential gene expression in large
RNA-seq datasets over Hadoop
Cloud Computing in Genomics
• Contrail
– De novo assembly of large genomes over Hadoop

• CloudBlast
– Scalable BLAST over Hadoop

• Quake
– DNA sequence error detection and correction in sequence
reads over Hadoop
Cloud Computing in Genomics
• More examples of Hadoop based apps:
– CloudAligner
– BlastReduce
– CloudBrush
– GATK
– Nephele
– BlueSNP
– Etc…
Crossbow: Hadoop Streaming

Langmead, B., Schatz, M. C., Lin, J., Pop, M., & Salzberg, S. L. (2009). Searching for
SNPs with cloud computing. Genome Biol, 10(11), R134.
Crossbow: Hadoop Streaming
1. Map (Bowtie): many sequencing reads are
mapped to the reference genome in parallel.
2. Shuffle: the sequence alignments are
aggregated so that all alignments on the same
chromosome or locus are grouped together
and sorted by position.
3. Reduce/Scan (SOAPsnp): the sorted
alignments are scanned to identify SNPs
(Single Nucleotide Polymorphisms) within each
region.
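The three steps above can be mimicked on toy data. This is not Crossbow's code: a naive mismatch-tolerant scan stands in for Bowtie, a majority vote stands in for SOAPsnp, and everything runs in one process instead of through Hadoop Streaming. The reference, the reads, and all function names are invented for illustration.

```python
from collections import Counter, defaultdict

# Toy reference and reads, invented for illustration only.
REFERENCE = {"chr1": "ACGTTGCAAT", "chr2": "GGATCCTAGC"}


def map_read(read: str, max_mismatches: int = 1):
    """Map: return (chromosome, position, read) for the first acceptable hit.

    A naive scan tolerating a few mismatches; a stand-in for a real aligner.
    """
    for chrom, seq in REFERENCE.items():
        for pos in range(len(seq) - len(read) + 1):
            window = seq[pos:pos + len(read)]
            if sum(a != b for a, b in zip(read, window)) <= max_mismatches:
                return chrom, pos, read
    return None


def shuffle(alignments):
    """Shuffle: group alignments by chromosome, sorted by position."""
    groups = defaultdict(list)
    for chrom, pos, read in alignments:
        groups[chrom].append((pos, read))
    return {chrom: sorted(hits) for chrom, hits in groups.items()}


def reduce_scan(chrom, hits, min_depth: int = 2):
    """Reduce/scan: pile up bases per position, report majority mismatches."""
    pileup = defaultdict(Counter)
    for pos, read in hits:
        for offset, base in enumerate(read):
            pileup[pos + offset][base] += 1
    for pos in sorted(pileup):
        base, _ = pileup[pos].most_common(1)[0]
        depth = sum(pileup[pos].values())
        if depth >= min_depth and base != REFERENCE[chrom][pos]:
            yield chrom, pos, REFERENCE[chrom][pos], base


reads = ["GTAGC", "TAGCA", "AGCAA", "GATCC"]
alignments = [hit for hit in map(map_read, reads) if hit is not None]
snps = [snp
        for chrom, hits in sorted(shuffle(alignments).items())
        for snp in reduce_scan(chrom, hits)]
print(snps)
```

Because each chromosome group is scanned independently after the shuffle, the reduce step can be spread over as many workers as there are groups.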
Cloud-enabled Technologies
• Apache HBase
– Open source, non-relational, distributed database modeled after
Google's BigTable. It runs on top of HDFS (Hadoop Distributed
Filesystem), providing BigTable-like capabilities for Hadoop
Cloud-enabled Technologies
• Apache Cassandra
– Linearly scalable and highly available
database that can run on commodity
hardware or cloud infrastructure,
with support for replication across
multiple datacenters.

• Google's Pregel/Apache Giraph
– Iterative graph processing system
built for high scalability
Cloud-enabled Technologies
• Apache Hive
– Data warehouse system for Hadoop that facilitates easy data
summarization, ad-hoc queries, and the analysis of large datasets

• Apache Pig
– High-level language for expressing data analysis programs, coupled with
infrastructure for evaluating these programs over Hadoop
Parallel Patterns for the Cloud
• Stream-oriented
– Farm
– Farm with feedback
– Pipeline

• Data-parallel
– Map
– Reduce
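Most of these patterns can be illustrated with the Python standard library. The sketch below shows a farm, a pipeline, and a map followed by a reduce on toy sequences; farm with feedback is not covered. The helper names and the input data are assumptions, and the pipeline stages run sequentially here rather than concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def farm(worker, tasks, workers: int = 4):
    """Farm: a pool of identical workers consumes independent tasks."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(worker, tasks))  # results keep the input order


def pipeline(items, *stages):
    """Pipeline: every item flows through the stages in order.

    Built lazily from chained iterators; stages are not run concurrently.
    """
    stream = iter(items)
    for stage in stages:
        stream = map(stage, stream)
    return stream


sequences = ["acgt", "ggcc", "atat"]  # toy input, made up for illustration

# Pipeline: normalize case, then compute GC content.
gc_values = list(pipeline(sequences, str.upper, gc_content))

# Farm: the same work spread over a pool of workers.
farmed = farm(gc_content, [s.upper() for s in sequences])

# Data-parallel map followed by a reduce: total sequence length.
total_length = reduce(lambda a, b: a + b, map(len, sequences), 0)

print(gc_values, farmed, total_length)
```

A thread pool keeps the example self-contained; CPU-bound work in practice would more likely use separate processes or separate machines.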
Pipeline Pattern: Stingray@Galaxy
Multiple Parallel Patterns

Aldinucci, Marco, et al. Parallel stochastic systems biology in the cloud. Briefings in
Bioinformatics (2013).
But... our group does not have the expertise to develop
its own cloud applications.
Can we still use the cloud/MapReduce for genomic
processing?
Galaxy Cloudman
Cloudgene

Schönherr, S. et al. (2012). Cloudgene: A graphical execution platform for
MapReduce programs on private and public clouds. BMC bioinformatics, 13(1),
200.
What's Next?
• Beyond Hadoop
– Adoption of new technologies/parallel
patterns for genomic data analysis in the
cloud

• Scalable Data Storage
– High Availability/Support for replication
– Preliminary work on HBase by Intel

• Private/Hybrid/Corporate Clouds
– Privacy/security issues
– Data tenancy
Thank You!!!
Acknowledgements: Nelson Kotowski, Rodrigo Jardim (FIOCRUZ)

Marco Andreazzi: IBGE research and data collection on health related issues.Flávio Codeço Coelho
 
Access to Information, privacy, and health research in Brazil
Access to Information, privacy, and health research in BrazilAccess to Information, privacy, and health research in Brazil
Access to Information, privacy, and health research in BrazilFlávio Codeço Coelho
 

More from Flávio Codeço Coelho (19)

Big dengue
Big dengueBig dengue
Big dengue
 
Alerta_Dengue simplified english
Alerta_Dengue simplified englishAlerta_Dengue simplified english
Alerta_Dengue simplified english
 
dengueARS0
dengueARS0dengueARS0
dengueARS0
 
Alerta dengue expo epi out2014
Alerta dengue expo epi out2014Alerta dengue expo epi out2014
Alerta dengue expo epi out2014
 
Alerta dengue abrasco 2014
Alerta dengue   abrasco 2014Alerta dengue   abrasco 2014
Alerta dengue abrasco 2014
 
Sistema de Alerta de Dengue Utilizando Dados Hbridos de Redes Sociais, Moni...
Sistema de Alerta de Dengue Utilizando Dados Hbridos de Redes Sociais, Moni...Sistema de Alerta de Dengue Utilizando Dados Hbridos de Redes Sociais, Moni...
Sistema de Alerta de Dengue Utilizando Dados Hbridos de Redes Sociais, Moni...
 
Alerta dengue: Sistema de alertas de surtos usando dados híbridos
Alerta dengue: Sistema de alertas de surtos usando dados híbridosAlerta dengue: Sistema de alertas de surtos usando dados híbridos
Alerta dengue: Sistema de alertas de surtos usando dados híbridos
 
Mauricio barreto:Big data: how can it help to expand epidemiological investig...
Mauricio barreto:Big data: how can it help to expand epidemiological investig...Mauricio barreto:Big data: how can it help to expand epidemiological investig...
Mauricio barreto:Big data: how can it help to expand epidemiological investig...
 
Gabriela gomes: Mathematical Modeling and Data Needs
Gabriela gomes: Mathematical Modeling and Data NeedsGabriela gomes: Mathematical Modeling and Data Needs
Gabriela gomes: Mathematical Modeling and Data Needs
 
Carl koppeschaar: Disease Radar: Measuring and Forecasting the Spread of Infe...
Carl koppeschaar: Disease Radar: Measuring and Forecasting the Spread of Infe...Carl koppeschaar: Disease Radar: Measuring and Forecasting the Spread of Infe...
Carl koppeschaar: Disease Radar: Measuring and Forecasting the Spread of Infe...
 
Gabriel laporta: Biodiversity can help prevent malaria outbreaks in tropical ...
Gabriel laporta: Biodiversity can help prevent malaria outbreaks in tropical ...Gabriel laporta: Biodiversity can help prevent malaria outbreaks in tropical ...
Gabriel laporta: Biodiversity can help prevent malaria outbreaks in tropical ...
 
Sander van noort: Influenzanet: self-reporting of influenza-like illness in c...
Sander van noort: Influenzanet: self-reporting of influenza-like illness in c...Sander van noort: Influenzanet: self-reporting of influenza-like illness in c...
Sander van noort: Influenzanet: self-reporting of influenza-like illness in c...
 
Mark smolinski big data and public health
Mark smolinski   big data and public healthMark smolinski   big data and public health
Mark smolinski big data and public health
 
Haroldo lopes datasus - Informações em Saúde: história, uso e desafios
Haroldo lopes   datasus - Informações em Saúde: história, uso e desafiosHaroldo lopes   datasus - Informações em Saúde: história, uso e desafios
Haroldo lopes datasus - Informações em Saúde: história, uso e desafios
 
Marco Andreazzi: IBGE research and data collection on health related issues.
Marco Andreazzi: IBGE research and data collection on health related issues.Marco Andreazzi: IBGE research and data collection on health related issues.
Marco Andreazzi: IBGE research and data collection on health related issues.
 
Access to Information, privacy, and health research in Brazil
Access to Information, privacy, and health research in BrazilAccess to Information, privacy, and health research in Brazil
Access to Information, privacy, and health research in Brazil
 
Mining legal texts with Python
Mining legal texts with PythonMining legal texts with Python
Mining legal texts with Python
 
Causal Bayesian Networks
Causal Bayesian NetworksCausal Bayesian Networks
Causal Bayesian Networks
 
In trodução ao Epigrass
In trodução ao EpigrassIn trodução ao Epigrass
In trodução ao Epigrass
 

Recently uploaded

Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
Presentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptxPresentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptxRosabel UA
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptshraddhaparab530
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operationalssuser3e220a
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmStan Meyer
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 
Dust Of Snow By Robert Frost Class-X English CBSE
Dust Of Snow By Robert Frost Class-X English CBSEDust Of Snow By Robert Frost Class-X English CBSE
Dust Of Snow By Robert Frost Class-X English CBSEaurabinda banchhor
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationdeepaannamalai16
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
Millenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptxMillenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptxJanEmmanBrigoli
 

Recently uploaded (20)

Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
Presentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptxPresentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptx
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.ppt
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
Dust Of Snow By Robert Frost Class-X English CBSE
Dust Of Snow By Robert Frost Class-X English CBSEDust Of Snow By Robert Frost Class-X English CBSE
Dust Of Snow By Robert Frost Class-X English CBSE
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentation
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
Millenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptxMillenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptx
 

Cloud Computing Technologies for Genomic Big Data Analysis

  • 1. Cloud Computing Technologies for Genomic Big Data Analysis Fabrício A. B. Silva, Alberto Davila FIOCRUZ {fabs,davila}@fiocruz.br
  • 2. Big Data – A Definition “Big data is a term used to describe information assemblages that make conventional data, or database, processing problematic due to any combination of their size (volume), frequency of update (velocity), or diversity (variety)” Hay SI, George DB, Moyes CL, Brownstein JS (2013) Big Data Opportunities for Global Infectious Disease Surveillance. PLoS Med 10(4): e1001413. doi:10.1371/journal.pmed.1001413
  • 3. The Data Deluge “In the last five years, more scientific data has been generated than in the entire history of mankind. You can imagine what’s going to happen in the next five.” Winston Hide, associate professor of bioinformatics Harvard School of Public Health. The promise of big data. HSPH News, Spring/Summer 2012
  • 5. DNA Sequencing Evolution Stein, L. D. (2010). The case for cloud computing in genome informatics. Genome Biol, 11(5), 207.
  • 6. Interesting Facts... • The cost of sequencing a human genome decreased from US$1 million in 2007 to US$1 thousand in 2012 • A human genome has 3 billion bp, or ~100 GB of raw data • NCI’s million genomes project: 1 million TB, or 1,000 petabytes, or 1 exabyte O'Driscoll, A., Daugelaite, J., & Sleator, R. D. (2013). ‘Big data’, Hadoop and cloud computing in genomics. Journal of Biomedical Informatics.
  • 7. The Processing Bottleneck
      Software | Number of Cores | Start | Finish | Processing Time | File sizes
      Flash | 24 | 9/12/13 22:48 | 9/12/13 22:48 | 0:00:53 | 2 files: 237 Mb and 238 Mb
      Velveth | 1 | 9/12/13 22:50 | 9/12/13 22:52 | 0:01:39 | 3 files: 100 Mb, 166 Mb and 165 Mb
      Velvetg | 1 | 9/12/13 22:54 | 9/12/13 22:59 | 0:04:53 | 2 files: 250 Mb and 75 Mb
      Mira | 24 | 9/12/13 23:11 | 9/12/13 23:32 | 0:21:21 | 2 files: 69 Mb and 6 Mb
      Glimmer3 | 1 | 9/12/13 23:40 | 9/12/13 23:40 | 0:00:40 | 2 files: 6 Mb and 1.4 Mb
      Blastx | 24 | 9/12/13 23:46 | 9/13/13 9:23 | 9:36:15 | Against RefSeq (17,411,217 entries)
    Pipeline processed at the Computational and Systems Biology Lab, Bioinformatics Platform, Instituto Oswaldo Cruz, FIOCRUZ. Input data size: 500 MB
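
To make the bottleneck on slide 7 concrete, the reported processing times can be summed and the Blastx share computed. This is a minimal Python sketch: the durations are copied from the slide, while the helper name `to_seconds` is an illustrative assumption and not part of the original pipeline.

```python
# Processing times copied from the slide 7 table (h:mm:ss).
durations = {
    "Flash": "0:00:53",
    "Velveth": "0:01:39",
    "Velvetg": "0:04:53",
    "Mira": "0:21:21",
    "Glimmer3": "0:00:40",
    "Blastx": "9:36:15",
}


def to_seconds(hms):
    """Convert an h:mm:ss string to seconds (hypothetical helper)."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


total = sum(to_seconds(t) for t in durations.values())
blastx_share = to_seconds(durations["Blastx"]) / total
print(f"Blastx share of summed processing time: {blastx_share:.1%}")
```

With these figures Blastx accounts for roughly 95% of the summed processing time. The sum covers processing time only and ignores the idle gaps between steps visible in the Start and Finish columns.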
  • 8. NGS: Expect Much More Data [Figure: chart; axis values not recoverable]
  • 10. Cloud Computing: a Definition • “Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction” NIST – Available at http://www.nist.gov/itl/cloud/upload/cloud-def-v15.pdf
  • 11. Cloud Computing: Advantages • Flexibility – Use of virtualization technology • Scalability – Large numbers of nodes connected at local-network speed • Availability/Accessibility – Even small labs can harness the power of the Cloud
  • 12. Cloud Scalability: Example Schadt, E. E., Linderman, M. D., Sorenson, J., Lee, L., & Nolan, G. P. (2011). Cloud and heterogeneous computing solutions exist today for the emerging big data problems in biology. Nature Reviews Genetics, 12(3), 224-224.
  • 13. Cloud Computing: Challenges • Bandwidth Limits – Large data sets need to be moved to the cloud • Security/Privacy Issues – Limited control over remote storage • Expertise – Adapting new applications to the cloud still requires some technical expertise
  • 14. MapReduce • MapReduce/Hadoop – MapReduce: Parallel distributed framework invented by Google for processing large data sets – Data and computations are spread over thousands of computers, processing petabytes of data each day – Hadoop is the leading open-source implementation
  • 15. MapReduce • MapReduce/Hadoop: Advantages – Scalable, Efficient, Reliable – Easy to program – Runs on commodity computers • MapReduce/Hadoop: Challenges – Redesigning, retooling applications
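
The MapReduce model on slides 14 and 15 can be sketched without a cluster. The example below is a single-process Python illustration of the map, shuffle, and reduce phases applied to k-mer counting. It is not Hadoop code: the function names and the toy reads are assumptions made for illustration.

```python
from collections import defaultdict


def map_kmers(read, k):
    """Map step: emit a (k-mer, 1) pair for every k-mer in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle step: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


reads = ["ACGTAC", "GTACGT"]  # toy reads, invented for illustration
pairs = [pair for read in reads for pair in map_kmers(read, 3)]
kmer_counts = reduce_counts(shuffle(pairs))
print(kmer_counts)
```

In a real Hadoop job the map calls would run on different nodes and the framework would perform the shuffle. Here all three phases run in one process, so only the data flow is shown.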
  • 16. Cloud Computing in Genomics • Crossbow – Scalable software pipeline for whole genome resequencing analysis over Hadoop • CloudBurst – Highly sensitive short read mapping over Hadoop • Myrna – Tool for calculating differential gene expression in large RNA-seq datasets over Hadoop
  • 17. Cloud Computing in Genomics • Contrail – De novo assembly of large genomes over Hadoop • CloudBlast – Scalable BLAST over Hadoop • Quake – DNA sequence error detection and correction in sequence reads over Hadoop
  • 18. Cloud Computing in Genomics • More examples of Hadoop based apps: – CloudAligner – BlastReduce – CloudBrush – GATK – Nephele – BlueSNP – Etc…
  • 19. Crossbow: Hadoop Streaming Langmead, B., Schatz, M. C., Lin, J., Pop, M., & Salzberg, S. L. (2009). Searching for SNPs with cloud computing. Genome Biol, 10(11), R134.
  • 20. Crossbow: Hadoop Streaming 1. Map (Bowtie): many sequencing reads are mapped to the reference genome in parallel. 2. Shuffle: the sequence alignments are aggregated so that all alignments on the same chromosome or locus are grouped together and sorted by position. 3. Reduce/Scan (SOAPsnp): the sorted alignments are scanned to identify SNPs (Single Nucleotide Polymorphism) within each region.
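
The three Crossbow steps on slide 20 can be imitated in a few lines of Python. This is a toy, in-process sketch under stated assumptions: `map_read` is a brute-force stand-in for Bowtie, `reduce_scan` is a naive stand-in for SOAPsnp, and the reference and reads are invented. None of it is Crossbow's actual code or interface.

```python
from collections import defaultdict

REFERENCE = {"chr1": "ACGTACGTAC", "chr2": "TTGACCAGTA"}  # toy reference, invented


def map_read(read):
    """Map: stand-in for Bowtie; choose the placement with the fewest mismatches."""
    best = None
    for chrom, seq in REFERENCE.items():
        for pos in range(len(seq) - len(read) + 1):
            window = seq[pos:pos + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if best is None or mismatches < best[0]:
                best = (mismatches, chrom, pos)
    _, chrom, pos = best
    return chrom, (pos, read)


def shuffle(alignments):
    """Shuffle: group alignments by chromosome and sort each group by position."""
    groups = defaultdict(list)
    for chrom, aligned in alignments:
        groups[chrom].append(aligned)
    return {chrom: sorted(items) for chrom, items in groups.items()}


def reduce_scan(chrom, aligned_reads):
    """Reduce/Scan: stand-in for SOAPsnp; report bases that differ from the reference."""
    calls = set()
    for pos, read in aligned_reads:
        for offset, base in enumerate(read):
            ref_base = REFERENCE[chrom][pos + offset]
            if ref_base != base:
                calls.add((chrom, pos + offset, ref_base, base))
    return sorted(calls)


reads = ["CGTTCG", "GTTCGT", "GACCAG"]  # toy reads, invented
grouped = shuffle(map_read(r) for r in reads)
snps = [call for chrom in sorted(grouped) for call in reduce_scan(chrom, grouped[chrom])]
print(snps)
```

Each tuple in `snps` holds the chromosome, the 0-based position, the reference base, and the observed base. A real SNP caller weighs coverage and base quality instead of reporting every mismatch.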
  • 21. Cloud-enabled Technologies • Apache HBase – Open source, non-relational, distributed database modeled after Google's BigTable. It runs on top of HDFS (Hadoop Distributed Filesystem), providing BigTable-like capabilities for Hadoop
  • 22. Cloud-enabled Technologies • Apache Cassandra – Linearly scalable and highly available database that can run on commodity hardware or cloud infrastructure, with support for replication across multiple datacenters. • Google's Pregel/Apache Giraph – Iterative graph processing system built for high scalability
  • 23. Cloud-enabled Technologies • Apache Hive – data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets • Apache Pig – high-level language for expressing data analysis programs, coupled with evaluation infrastructure over Hadoop
  • 24. Parallel Patterns for the Cloud • Stream-oriented – Farm – Farm with feedback – Pipeline • Data-parallel – Map – Reduce
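
The patterns listed on slide 24 can be illustrated with the Python standard library. This is a minimal sketch, assuming a toy GC-counting task. The farm uses a thread pool, while the pipeline stages are lazy generators that run in one thread and therefore show only the structure of a pipeline, not concurrent stages. Farm with feedback is not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def gc_count(seq):
    """Worker function: count G and C bases in one sequence."""
    return sum(base in "GC" for base in seq)


sequences = ["ACGT", "GGCC", "ATAT"]  # toy input, invented

# Farm: independent tasks are handed to a pool of workers.
with ThreadPoolExecutor(max_workers=2) as pool:
    farm_result = list(pool.map(gc_count, sequences))


# Pipeline: each item flows through successive stages.
def stage_upper(items):
    for item in items:
        yield item.upper()


def stage_count(items):
    for item in items:
        yield gc_count(item)


pipeline_result = list(stage_count(stage_upper(["acgt", "ggcc"])))

# Map and Reduce: apply a function to every element, then combine the results.
mapped = list(map(gc_count, sequences))
total_gc = reduce(lambda a, b: a + b, mapped, 0)
print(farm_result, pipeline_result, total_gc)
```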
  • 26. Multiple Parallel Patterns Aldinucci, Marco, et al. Parallel stochastic systems biology in the cloud. Briefings in Bioinformatics (2013).
  • 27. But... our group does not have the expertise to develop its own Cloud applications... Can we still use the Cloud/MapReduce for genomic processing?
  • 29. Cloudgene Schönherr, S. et al. (2012). Cloudgene: A graphical execution platform for MapReduce programs on private and public clouds. BMC bioinformatics, 13(1), 200.
  • 30. What's Next? • Beyond Hadoop – Adoption of new technologies/parallel patterns for genomic data analysis in the cloud • Scalable Data Storage – High Availability/Support for replication – Preliminary work on HBase by Intel • Private/Hybrid/Corporate Clouds – Privacy/security issues – Data tenancy
  • 31. Thank You!!! Acknowledgements: Nelson Kotowski, Rodrigo Jardim (FIOCRUZ)

Editor's Notes

  1. Slide 4: The number of bases in GenBank doubles every 18 months.