GENOME DATA ANALYSIS
AMELDA AKOIJAM
DEPARTMENT OF ECONOMICS
CHRIST (DEEMED TO BE UNIVERSITY)
INTRODUCTION
• The human genome is the big data of life.
• It contains an estimated 20,000-25,000 genes spread across about 3 billion base pairs. Stored at one byte per base, that is around 3 gigabytes of data.
• Sequencing millions of human genomes would add up to hundreds of petabytes of data. Analysis of gene interactions multiplies this data even further.
• Big data analytics is critical in genomics because it can store, transform and analyze large amounts of genomic information, which can yield valuable medical insights for disease prevention and cure.
• Genome-wide association studies (GWAS), at the forefront of genomics, use multiple big data and analytics models to explore the connections between genes and diseases.
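The storage figures in this introduction follow from simple arithmetic. The sketch below is a rough illustration only. It assumes decimal units and one byte per base for plain text. The 100 GB of raw sequencing output per genome is an assumed ballpark, not a figure from these slides, and the function names are hypothetical.

```python
BASES_PER_GENOME = 3_000_000_000  # ~3 billion base pairs (figure from the slides)

def genome_bytes(bases=BASES_PER_GENOME, bits_per_base=8):
    """Bytes needed to store one genome sequence at the given encoding."""
    return bases * bits_per_base // 8

def cohort_petabytes(n_genomes, bytes_per_genome):
    """Total storage for a cohort, in petabytes (1 PB = 1e15 bytes)."""
    return n_genomes * bytes_per_genome / 1e15

plain_text = genome_bytes()                # one byte per base
packed = genome_bytes(bits_per_base=2)     # A, C, G, T fit in 2 bits each

# ASSUMPTION: raw reads plus quality scores take ~100 GB per genome.
ASSUMED_RAW_BYTES_PER_GENOME = 100_000_000_000

print(plain_text / 1e9, "GB per genome as plain text")
print(packed / 1e6, "MB per genome packed at 2 bits per base")
print(cohort_petabytes(1_000_000, plain_text), "PB for 1M bare sequences")
print(cohort_petabytes(1_000_000, ASSUMED_RAW_BYTES_PER_GENOME),
      "PB for 1M genomes of raw output (assumed size)")
```

Under these assumptions the bare sequences of a million genomes take only a few petabytes. The "hundreds of petabytes" scale appears once raw sequencing output is counted as well.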
WHY BIG DATA IN GENOME?
• Better analytical tools
• Need for personalised health care
• More rapid IT developments
• New data streams
Analytics Application Avenues in Genomics
• DNA sequencing library
• Annotation
• Genomic comparison
• Genomic visualisation
• Synteny
• DNA Sequencing Library: A sequencing system needs to maintain a vast universal library so that it can sequence any DNA sample, from a virus to a bacterium to a human. This library would contain every sequence applicable to any sample being tested. Keeping such a huge archive of DNA sequences calls for big data analytics systems.
• Annotation: In genomics, annotation involves attaching a description to an individual gene and its protein (or RNA) product. The focus of each record is the function assigned to the gene product. Complex automated scripts using decision analytics determine how gene functions are assigned.
• Genomic Comparisons: Comparing genomes involves aligning billions of DNA reads to a genome and estimating how likely it is that similar sequences match purely by chance. This requires systems that can handle big sequence data and complex correlation algorithms.
• Genomic Visualization: Genome browsing tools are required to display complex correlations, with extensive options for customization.
• Synteny: The process of assessing two or more genomic regions to deduce whether they derive from a single ancestral genomic region. Its system requirements are similar to those of genomic comparisons, since it relies on complex statistical correlation algorithms.
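The chance-similarity idea under Genomic Comparisons can be illustrated with a back-of-the-envelope model. This is a minimal sketch, assuming the four bases are independent and equally likely, which real genomes are not. The function names are hypothetical, and this is not how production aligners score matches.

```python
def expected_chance_matches(read_length, genome_length=3_000_000_000):
    """Expected number of exact matches of a random read in a random genome.

    Assumes each base is A, C, G or T with probability 1/4, independently,
    and considers a single strand only.
    """
    positions = genome_length - read_length + 1
    return positions * 0.25 ** read_length

def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

# Longer reads are far less likely to match somewhere by chance alone.
for k in (10, 16, 20, 30):
    print(k, expected_chance_matches(k))

# Position-by-position comparison of two short sequences.
print(percent_identity("GATTACA", "GACTATA"))
```

Under this simplified model a 10-base read is expected to match thousands of positions in a 3-billion-base genome by chance, while a 30-base read almost never does. This is one reason read length matters when judging whether an alignment is meaningful.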
BENEFITS
• Genome sequencing
• Cost reduction
• Time saving
• Big data transfer
• Data storage
• Better analysis
• Security and privacy
