BigData in Life Sciences, Genomics and Systems Biology
1. BigData in Life Sciences, Genomics and
Systems Biology
Harsha Rajasimha
9th September 2015
2. BigData in Life Sciences, Genomics and
Systems Biology
What is BigData?
Life sciences, Genomics and systems biology
BigData in life sciences – where is it coming from?
Genomics and Systems Biology – BigData challenges.
Making sense of BigData
Future of BigData in genomics/SB
4. Medicine, Agriculture, Food Safety, Forensics, Epidemiology
The life sciences concern the study of living organisms, including
biology, botany, zoology, microbiology, physiology, biochemistry,
and related subjects.
5. Gen“omics”
Before 2000: One Gene at a time based on prior knowledge
Now: All ~25,000 genes at once – no prior knowledge necessary
Genomics is a discipline in genetics that applies recombinant
DNA, DNA sequencing methods, and bioinformatics to
sequence, assemble, and analyze the function and structure of
genomes (the complete set of DNA within a single cell of an
organism).
OMICS Characteristics
Comprehensiveness
Scale
High-throughput and low-cost technology development
Rapid data release
Social and ethical implications
6. Central Dogma of Molecular Biology
[Diagram: DNA → RNA → PROTEIN → FUNCTION, labeled with transcription, reverse transcription, and RNAi (gene silencing)]
Molecular biology is a branch of science concerning biological
activity at the molecular level. The field of molecular biology
overlaps with biology and chemistry and in particular, genetics and
biochemistry.
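The DNA to RNA to protein flow on this slide can be sketched in a few lines of Python. This is a minimal illustration only: the function names are hypothetical, the input is assumed to be a coding-strand DNA sequence, and the codon table is deliberately partial (four entries of the standard genetic code), so unknown codons are reported as "X".

```python
# Partial codon table (assumption: only the codons needed for the example).
CODON_TABLE = {
    "AUG": "M",  # methionine, the usual start codon
    "UUU": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "UAA": "*",  # stop codon
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")  # X = not in table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGTTTGGCTAA")
    print(mrna, translate(mrna))
```

A real pipeline would use a full codon table from an established library rather than a hand-written dictionary.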
7. Systems biology is the study of systems of biological components,
which may be molecules, cells, organisms or entire species.
Living systems are dynamic and complex, and their behavior may
be hard to predict from the properties of individual parts.
8. Life Sciences BigData Examples
Measuring Instruments: LIMS, ELNs
Imaging: Molecular and cellular, pathology
Genomics: personal genomes, aggregate databases, gene
expression
Electronic Health Records: variety of information, phenotypes
Literature evidence: PubMed, ISI Web of Science, clinical trials,
WWW
Curated content: biochemical pathways, drug response/resistance
9. Precision medicine is an emerging approach for disease
treatment and prevention that takes into account individual
variability in genes, environment, and lifestyle for each person.
15. Genome / DNA Sequencing
• Game Changer 1: First human genome sequenced (2001)
• Game Changer 2: Human genome costs under $1K (2014)
Cost is decreasing at the square of Moore’s law:
Flatley’s Law
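One way to read "the square of Moore's law" is that the cost reduction factor is applied twice over the same period. The sketch below compares the two curves under that reading. The two-year halving period and the starting cost are assumptions chosen for illustration, not measured sequencing costs, and the function names are hypothetical.

```python
# Assumption: under Moore's law, cost halves every two years.
MOORE_HALVING_YEARS = 2.0


def moore_factor(years: float) -> float:
    """Fraction of the initial cost remaining after `years` under Moore's law."""
    return 0.5 ** (years / MOORE_HALVING_YEARS)


def moore_cost(initial_cost: float, years: float) -> float:
    """Cost after `years` if it follows Moore's law."""
    return initial_cost * moore_factor(years)


def squared_moore_cost(initial_cost: float, years: float) -> float:
    """Cost after `years` if the Moore's law factor is squared."""
    return initial_cost * moore_factor(years) ** 2


if __name__ == "__main__":
    for years in (0, 2, 4, 6):
        print(years, moore_cost(1000.0, years), squared_moore_cost(1000.0, years))
```

Squaring the factor is equivalent to halving the halving period, so under these assumptions the squared curve halves every year instead of every two years.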
The ability to digitize humans through
genomics and genotyping will
overturn the practice of medicine.
Only a small fraction of the 700,000
medical practitioners in the US are up to
speed with genomics...
17. Other BigData Use Cases
• Insurance: Cost benefit analysis of tests
• Health record-guided drug development
• Patient Stratification – drug response based on DNA
• Measuring Instruments
• FDA Office of Regulatory Affairs: 14 labs, 1000+ instruments,
data
• 1000 Genomes, UK 100K Genomes, PMI million cohort
– genomes + phenomes
• Biochemical pathways: Reactome, KEGG, etc.
18. Lots of BigData – not enough analysis
http://searchhealthit.techtarget.com/tip/Big-data-in-health-care-Lots-of-data-but-not-enough-analysis
19. Solutions to BigData
Data Storage
Data Organization
Data Analytics
Data Movement
Data Exchange
Data Visualization
BD2K: BigData is worthless
Data Dissemination: Open data, Free data, Open Govt
21. Data Management, Retrieval
• Relational databases
• No-SQL databases
• Data use cases
http://www.tomsitpro.com/articles/rdbms-sql-cassandra-dba-developer,2-547-2.html
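The contrast between relational and No-SQL storage on this slide can be sketched with the Python standard library. This is an illustration under stated assumptions: the table layout, sample ID, and genotype values are made up, and a plain dictionary of JSON strings stands in for a document store rather than any specific No-SQL product.

```python
import json
import sqlite3


def relational_lookup(sample_id: str) -> list:
    """Fixed schema: every row has the same three columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE variants (sample_id TEXT, gene TEXT, genotype TEXT)"
    )
    conn.executemany(
        "INSERT INTO variants VALUES (?, ?, ?)",
        [("S1", "TP53", "C/C"), ("S1", "BRCA1", "A/G"), ("S2", "TP53", "C/T")],
    )
    rows = conn.execute(
        "SELECT gene, genotype FROM variants WHERE sample_id = ? ORDER BY gene",
        (sample_id,),
    ).fetchall()
    conn.close()
    return rows


def document_lookup(sample_id: str) -> dict:
    """Schemaless: each sample is one self-contained JSON document."""
    store = {
        "S1": json.dumps(
            {"variants": {"BRCA1": "A/G", "TP53": "C/C"}, "notes": "made-up record"}
        )
    }
    return json.loads(store[sample_id])


if __name__ == "__main__":
    print(relational_lookup("S1"))
    print(document_lookup("S1"))
```

The relational version answers ad hoc queries across samples with SQL, while the document version returns everything about one sample in a single read; which fits better depends on the data use cases.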
22. Data Organization and DBs
http://www.enaxisconsulting.com/images/userfiles/images/MDM-Chart640Final(1).jpg
Business Cases, Continuity,
Infrastructure, Governance
E.g., NIH public data repositories
24. Data Movement / Transfer
• How is the data expected to move within and outside
the infrastructure?
• Bring data to analysis tools or tools to data?
• From archives to compute storage, from local to cloud
• Network bandwidth considerations
• DAS, NAS, SAN, Tapes, RAM, Cache
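The bandwidth question above comes down to simple arithmetic, which helps when deciding whether to move the data or the tools. A rough sketch follows; the function name, the dataset size, and the link speeds are hypothetical, sizes are decimal gigabytes, and the `efficiency` parameter is an assumed fudge factor for protocol overhead.

```python
def transfer_hours(size_gb: float, bandwidth_mbps: float, efficiency: float = 1.0) -> float:
    """Estimate hours needed to move `size_gb` gigabytes over a link.

    bandwidth_mbps is the nominal link speed in megabits per second;
    efficiency (0 to 1) scales it down to the usable throughput.
    """
    if bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    total_bits = size_gb * 8e9  # 1 GB = 8e9 bits (decimal units)
    seconds = total_bits / (bandwidth_mbps * 1e6 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical 4.5 TB dataset over 1 Gbps and 100 Mbps links.
    for mbps in (1000, 100):
        print(mbps, "Mbps:", round(transfer_hours(4500, mbps), 1), "hours")
```

When the estimate runs to days, shipping the analysis tools to the data (or shipping physical media) is often the more practical choice.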
25. Data Integration and Exchange
• APIs: application programming interfaces for on-demand access
• XML: SBML
• EMRs
• RDF/OWL: BioPAX
• FASTQ
• DICOM
• Commons: genomics, cancer, etc.
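Of the exchange formats listed above, FASTQ is simple enough to show in full: each read is four lines (an `@` header, the sequence, a `+` separator, and a quality string). The parser below is a minimal sketch, assuming the common four-line layout with no wrapped sequences and Phred+33 quality encoding; the function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        sequence = next(it, "").rstrip("\n")
        separator = next(it, "").rstrip("\n")
        quality = next(it, "").rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], sequence, quality


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a quality string into Phred scores (assumes Phred+33)."""
    return [ord(char) - offset for char in quality]


if __name__ == "__main__":
    record = ["@read1\n", "GATTACA\n", "+\n", "IIIIII#\n"]
    for read_id, sequence, quality in parse_fastq(record):
        print(read_id, sequence, phred_scores(quality))
```

For production work an established parser is a safer choice, since real files can be compressed, very large, or use older quality encodings.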
26. Data Visualization
Circos plot
Health InfoScape: 7+ million EMRs, SENSEable City Lab at
MIT and GE HealthyMagination. Frequency of co-occurrence
of medical conditions.
Alignment of 8 Yersinia whole bacterial
genomes