This keynote talk discusses research using supercomputers and gene sequencing to study the human microbiome. The human microbiome contains 100 trillion microorganisms and their genes outnumber human genes 300 to 1. The speaker has been collecting data from his own body over 7 years to study his microbiome and immune system interactions. Collaborating researchers have sequenced his gut microbiome over time as well as samples from autoimmune disease patients. Supercomputers are needed to analyze the massive amount of sequencing data and reveal details of microbial ecology and genetics in health and disease. Studying the human microbiome will revolutionize medicine in the next decade.
Using Supercomputers and Gene Sequencers to Discover Your Inner Microbiome
1. “Using Supercomputers and Gene Sequencers
to Discover Your Inner Microbiome”
Keynote Talk
International Conference on Computational Science
San Diego, CA
June 6, 2016
Dr. Larry Smarr
Director, California Institute for Telecommunications and Information Technology
Harry E. Gruber Professor,
Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
http://lsmarr.calit2.net
2. Abstract
The human body is host to 100 trillion microorganisms, ten times the number of DNA-bearing cells in the
human body, and these microbes contain 300 times the number of DNA genes that our human DNA does.
The microbial component of our "superorganism" is comprised of hundreds of species with immense
biodiversity. To put a more personal face on the "patient of the future," I have been collecting massive
amounts of data from my own body over the last seven years, which reveals detailed examples of the
episodic evolution of this coupled immune-microbial system. Collaborating with the UC San Diego Knight
Lab, we have genetically sequenced a time series of my gut microbiome, as well as single moments from
50 patients with autoimmune disease. An elaborate software pipeline, running on high performance
computers, reveals the details of the microbial ecology and its genetic components, in health as well as in
disease. Not only can we compare a person with a disease to a healthy population, but we can also follow
the dynamics of the diseased patient. We can look forward to revolutionary changes in medical practice
over the next decade.
3. Forty Years of Computing Gravitational Waves
From Colliding Black Holes
1977
L. Smarr and K. Eppley
Gravitational Radiation Computed
from an Axisymmetric
Black Hole Collision
2016
LIGO Consortium
Spiral Black Hole Collision
40 Years
4. Complexity of Computing First Gut Microbiome Dynamics
Versus First Dynamics of Colliding Black Holes
• My 1975 PhD Dissertation
– Solving Einstein’s Equations of General Relativity for Colliding Black Holes and Grav Waves
– CDC 6600 Megaflop/s
– Hundreds of Hours
• Rob Knight and Smarr Gut Microbiome Map
– Mapping From Illumina Sequencing to Taxonomy and Gene Abundance Dynamics
– Comet Petaflop/s
– Comet Core is 40,000x CDC6600 Speed
– Million Core-Hours
– 10,000x Supercomputer Time
• Gut Microbiome Takes ~ ½ Billion Times the Compute Power of Early Solutions of
Dynamic General Relativity
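The "~ ½ Billion Times" figure follows from multiplying the two ratios quoted on this slide. A minimal check, assuming "Hundreds of Hours" is read as roughly 100 hours (that reading is an assumption, chosen because it reproduces the 10,000x figure):

```python
# Ratios quoted on the slide.
per_core_speedup = 40_000   # one Comet core vs. the CDC 6600
time_ratio = 10_000         # supercomputer time: gut microbiome vs. 1975 dissertation

# Assumption: "hundreds of hours" taken as 100 hours on the CDC 6600.
cdc_hours = 100
comet_core_hours = 1_000_000
assert comet_core_hours // cdc_hours == time_ratio

total_ratio = per_core_speedup * time_ratio
print(f"{total_ratio:.1e}")  # 4.0e+08, i.e. roughly half a billion
```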
5. As a Model for the Precision Medicine Initiative,
I Have Tracked My Internal Biomarkers To Understand My Body’s Dynamics
My Quarterly
Blood Draw
Calit2 64 Megapixel VROOM
6. Only One of My Blood Measurements
Was Far Out of Range--Indicating Chronic Inflammation
Normal Range <1 mg/L
27x Upper Limit
C-Reactive Protein (CRP) is a Blood Biomarker
for Detecting Presence of Inflammation
Episodic Peaks in Inflammation
Followed by Spontaneous Drops
7. Adding Stool Tests Revealed
Oscillatory Behavior in an Immune Variable Which is Antibacterial
Normal Range
<7.3 µg/mL
124x Upper Limit for Healthy
Lactoferrin is a Protein Shed from Neutrophils -
An Antibacterial that Sequesters Iron
Typical
Lactoferrin Value for
Active Inflammatory
Bowel Disease
(IBD)
8. To Understand the Interaction of Genetics and the Immune System
We Must Consider the Human Microbiome
Your Microbiome is
Your “Near-Body” Environment
and its Cells
Contain 100x as Many DNA Genes
As Your Human DNA-Bearing Cells
Your Body Has 10 Times
As Many Microbe Cells As DNA-Bearing
Human Cells
Inclusion of the “Dark Matter” of the Body
Will Radically Alter Medicine
9. Most of Evolutionary Time
Was in the Microbial World
You
Are
Here
Source: Carl Woese, et al
Tree of Life Derived from 16S rRNA Sequences
10. The Cost of Sequencing DNA
Has Fallen Over 100,000x in the Last Ten Years
This Has Enabled Sequencing of
Both Human and Microbial Genomes
11. Interest in the Human Microbiome
Has Moved Quickly From Frontier Science to Public Awareness
June 8, 2012; June 14, 2012; June, 2012; August 18, 2012
13. To Map Out the Dynamics of Autoimmune Microbiome Ecology
Couples Next Generation Genome Sequencers to Big Data Supercomputer
Inflammatory Bowel Disease (IBD) Patients:
5 Ileal Crohn’s Patients, 3 Points in Time
2 Ulcerative Colitis Patients, 6 Points in Time
Larry Smarr (Colonic Crohn’s), 7 Points in Time
“Healthy” Individuals: 250 Subjects, 1 Point in Time
Each Sample Has 100-200 Million Illumina Short Reads (100 bases)
Total of 27 Billion Reads or 2.7 Trillion Bases
Source: Jerry Sheehan, Calit2; Weizhong Li, Sitao Wu, CRBS, UCSD
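The read and base totals on this slide can be cross-checked with simple arithmetic. The sketch below assumes the cohort breaks down as 250 healthy subjects at one time point, 5 ileal Crohn's patients at 3 time points, 2 ulcerative colitis patients at 6 time points, and 7 time points for Larry Smarr. That grouping is a reading of the slide layout, not something the slide states in one sentence.

```python
total_reads = 27_000_000_000   # 27 billion reads (from the slide)
read_length = 100              # bases per Illumina short read (from the slide)

total_bases = total_reads * read_length
print(f"{total_bases:,}")      # 2,700,000,000,000 = 2.7 trillion bases

# Assumed cohort breakdown (see the note above): (subjects, time points each).
cohort = {
    "healthy": (250, 1),
    "ileal_crohns": (5, 3),
    "ulcerative_colitis": (2, 6),
    "larry_smarr": (1, 7),
}
n_samples = sum(subjects * points for subjects, points in cohort.values())
mean_reads_per_sample = total_reads / n_samples
print(n_samples, f"{mean_reads_per_sample / 1e6:.0f} million reads per sample on average")
```

Under these assumptions the average lands near the low end of the quoted 100-200 million reads per sample.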
14. Computational NextGen Sequencing Pipeline:
From Sequence to Taxonomy and Function
PI: (Weizhong Li, CRBS, UCSD):
NIH R01HG005978 (2010-2013, $1.1M)
15. We Used SDSC’s Gordon Data-Intensive Supercomputer
to Completely Analyze a Subset of These Gut Microbiomes
• ~180,000 Core-Hours on Gordon
– KEGG Protein Family Annotation: 90,000 Core-Hours
– Mapping: 36,000 core-hrs
– Used 16 Cores/Node and up to 50 nodes
– Duplicates removal: 18,000 core-hrs
– Assembly: 18,000 core-hrs
– Other: 18,000 core-hrs
• Gordon RAM Required
– 64GB RAM for Reference DB
– 192GB RAM for Assembly
• Gordon Disk Required
– Ultra-Fast Disk Holds Ref DB for All Nodes
– 8TB for All Subjects
Enabled by a Grant of Time on Gordon from SDSC Director Mike Norman
Source: Weizhong Li, UCSD
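The core-hour breakdown on this slide adds up to the quoted ~180,000 total. A short sketch that tallies the steps and estimates a lower bound on wall-clock time, assuming perfect scaling at the peak of 16 cores/node on 50 nodes (perfect scaling is an idealization, not a claim from the slide):

```python
# Core-hours per pipeline step, as listed on the slide.
core_hours = {
    "KEGG protein family annotation": 90_000,
    "Mapping": 36_000,
    "Duplicates removal": 18_000,
    "Assembly": 18_000,
    "Other": 18_000,
}

total = sum(core_hours.values())
for step, hours in core_hours.items():
    print(f"{step:32s} {hours:>7,d}  {hours / total:5.1%}")
print(f"{'Total':32s} {total:>7,d}")

# Idealized lower bound: every step runs on all 16 x 50 cores with no overhead.
peak_cores = 16 * 50
min_wall_clock_hours = total / peak_cores
print(f"At least {min_wall_clock_hours:.0f} hours on {peak_cores} cores")
```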
16. We Used Dell’s HPC Cloud to Extend Our Taxonomic Analysis
to All of Our Human Gut Microbiomes
• Dell’s Sanger Cluster
– 32 Nodes, 512 Cores
– 48GB RAM per Node
• We Processed the Taxonomic Relative Abundance
– Used ~35,000 Core-Hours on Dell’s Sanger
• Produced Relative Abundance of
~10,000 Bacteria, Archaea, Viruses in ~300 People
– ~3 Million Spreadsheet Cells
Source: Weizhong Li, UCSD
Enabled by a Grant of Time From Dell/R Systems
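Relative abundance means each taxon's share of the classified reads in a sample, so every sample's profile sums to 1. A minimal sketch with made-up counts; the function name and the example numbers are illustrative assumptions, not the pipeline used in the talk:

```python
def relative_abundance(counts):
    """Convert raw per-taxon read counts for one sample into fractions summing to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no classified reads")
    return {taxon: n / total for taxon, n in counts.items()}

# Made-up read counts for a single sample (illustration only).
sample = {"Bacteroidetes": 600, "Firmicutes": 300, "Proteobacteria": 100}
profile = relative_abundance(sample)
print(profile)

# Size of the full table quoted on the slide: taxa x people.
n_taxa, n_people = 10_000, 300
cells = n_taxa * n_people
print(f"{cells:,} spreadsheet cells")
```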
17. We Found Major State Shifts in Microbial Ecology Phyla
Between Healthy and Three Forms of IBD
Most
Common
Microbial
Phyla
Average HE
Average
Ulcerative Colitis
Average LS
Colonic Crohn’s Disease
Average
Ileal Crohn’s Disease
Collapse of Bacteroidetes
Explosion of Actinobacteria
Explosion of
Proteobacteria
Hybrid of UC and CD
High Level of Archaea
18. Building a UC San Diego High Performance Cyberinfrastructure
to Support Distributed Microbiome Analysis
FIONA: 12 Cores/GPU, 128 GB RAM, 3.5 TB SSD, 48TB Disk, 10Gbps NIC
Knight Lab
10Gbps
Gordon
Prism@UCSD
Data Oasis: 7.5PB, 200GB/s
Knight 1024 Cluster in SDSC Co-Lo
CHERuB
100Gbps
Emperor & Other Vis Tools
64Mpixel Data Analysis Wall
120Gbps
40Gbps
1.3Tbps
PRP/
19. We Use OpenOrd on Calit2’s 64M Pixel Tiled Wall
to Explore Clustering of Patients and Microbe Species
Ileal
Crohn’s
Healthy
Ulcerative
Colitis
www.sandia.gov/~smartin/presentations/OpenOrd.pdf
Source:
Philip Weber,
QI, UCSD
25. Larry’s 40 Stool Samples Over 3.5 Years
to Rob’s lab on April 30, 2015
26. Larry Smarr Gut Microbiome Ecology Shifted After Drug Therapy
Between Two Time-Stable Equilibriums Correlated to Physical Symptoms
Lialda & Uceris
12/1/13 to 1/1/14
Frequent IBD Symptoms
Weight Loss
7/1/12 to 12/1/14
Blue Balls on
Diagram to the Right
Principal Coordinate Analysis of
Microbiome Ecology
PCoA by Justine Debelius and Jose Navas,
Knight Lab, UCSD
Weight Data from Larry Smarr, Calit2, UCSD
Weekly Weight
Few IBD Symptoms
Weight Gain 1/1/14 to 8/1/16
Red Balls on
Diagram to the Right
27. Each Microbe Contains
a Few Thousand Genes on Its DNA
E. Coli Contains ~5000 Genes on its Circular Chromosome,
Which is 1000x the Length of the Cell!
Several Million Genes Can Occur in the Human Gut Microbiome
28. In a “Healthy” Gut Microbiome:
Large Taxonomy Variation, Low Protein Family Variation
Source: Nature, 486, 207-212 (2012)
Over 200 People
29. We Computed the Relative Abundance
of 10,000 KEGG Orthologous Protein Families In Health and Disease States
http://www.genome.jp/kegg/
Kyoto Encyclopedia
of Genes and
Genomes (KEGG)
30. Using PCA on the 10,000 KEGG Protein Families
We Can Discover Over- and Under-Abundant Genes in Health and Disease
Source: Bryn Taylor, Justine Debelius, Rob Knight, Mehrdad Yazdani, Larry Smarr, UCSD; Weizhong Li, JCVI
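PCA projects each sample's protein-family abundance vector onto the directions of greatest variance, so samples that differ systematically separate along the leading components. The sketch below finds only the first principal component by power iteration, in pure Python, on a toy table of three families. The data, function name, and use of power iteration are all illustrative assumptions; the talk does not say how its PCA was computed.

```python
import math
import random

def first_principal_component(rows, iterations=200, seed=0):
    """Return (direction, scores) for the first principal component of `rows`.

    rows: list of samples, each a list of feature values (for example the
    relative abundances of protein families). Power iteration is applied to
    X^T X of the mean-centered data without forming the matrix explicitly.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]

    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(d)]
    for _ in range(iterations):
        xv = [sum(row[j] * v[j] for j in range(d)) for row in centered]
        w = [sum(centered[i][j] * xv[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("data has no variance along the current direction")
        v = [x / norm for x in w]

    scores = [sum(row[j] * v[j] for j in range(d)) for row in centered]
    return v, scores

# Toy relative abundances for 3 protein families (made-up numbers).
healthy = [[0.50, 0.30, 0.20], [0.52, 0.29, 0.19], [0.48, 0.31, 0.21]]
disease = [[0.20, 0.30, 0.50], [0.22, 0.28, 0.50], [0.18, 0.31, 0.51]]

direction, scores = first_principal_component(healthy + disease)
print("loadings:", [round(x, 3) for x in direction])
print("scores:  ", [round(s, 3) for s in scores])
```

In this toy example the two groups fall on opposite sides of zero along the first component, and the loadings point to the first and third families as the ones that differ between groups.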
31. Using Kolmogorov-Smirnov Test and Random Forest Machine Learning,
We Can Classify Over and Under-Abundant Protein Families
Source: Bryn Taylor, Justine Debelius, Rob Knight, Mehrdad Yazdani, Larry Smarr, UCSD; Weizhong Li, JCVI
Note: Orders of Magnitude Increase or Decrease in
Protein Families Between Health and Disease
Next Step: Which Proteins (Functions) are Altered?
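The two-sample Kolmogorov-Smirnov statistic measures the largest gap between the empirical distributions of one protein family's abundance in two groups, which makes it a natural score for ranking families by how differently they behave in health and disease. A minimal pure-Python sketch follows. The family names and abundance values are placeholders, and the random forest step is not sketched here.

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute difference
    between the empirical CDFs of samples `a` and `b`."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to x (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

# Placeholder abundances per family: (healthy samples, disease samples).
families = {
    "family_A": ([0.010, 0.012, 0.011, 0.013], [0.10, 0.12, 0.09, 0.15]),
    "family_B": ([0.20, 0.22, 0.21, 0.23], [0.21, 0.20, 0.23, 0.22]),
}

# Rank families from most to least different between the two groups.
ranked = sorted(
    ((name, ks_statistic(h, d)) for name, (h, d) in families.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, stat in ranked:
    print(name, stat)
```

A statistic near 1 means the two groups barely overlap for that family; a statistic near 0 means their distributions are nearly identical.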
32. To Expand IBD Project the Knight/Smarr Labs Were Awarded
~ 1 Million Core-Hours on SDSC’s Comet Supercomputer
• 8x Compute Resources Over Prior Study
• Smarr Gut Microbiome Time Series
– From 7 Samples Over 1.5 Years
– To 50 Samples Over 4 Years
• IBD Patients: From 5 Crohn’s Disease and 2 Ulcerative Colitis Patients
to ~100 Patients
– 50 Carefully Phenotyped Patients Drawn from Sandborn BioBank
– 43 Metagenomes from the RISK Cohort of Newly Diagnosed IBD patients
• New Software Suite from Knight Lab
– Re-annotation of Reference Genomes, Functional / Taxonomic Variations
– Novel Compute-Intensive Assembly Algorithms from Pavel Pevzner
33. We Used SDSC’s Comet to Uniformly Compute
Protein-Coding Genes, RNAs, & CRISPR Annotations
• We Downloaded from NCBI Over 60,000 Bacterial and Archaea Genomes
– Required 5 Core-Hours Per Genome
– 300,000 Core-Hours to Complete
– Ran 24 Cores in Parallel
– Over 400 Days Wall-Clock Time
• Requires a Variety of Software Programs
– Prodigal for Gene Prediction
– Diamond for Protein Homolog Search Against UniRef db
– Infernal for ncRNA Prediction
– RNAMMER for rRNA Prediction
– Aragorn for tRNA Prediction
• Will Make These Results a New Community Database
– Knight Lab, Calit2, SDSC
Source: Zhenjiang (Zech) Xu, Knight Lab, UCSD
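The compute figures on this slide are consistent with each other, as a quick calculation shows. The only assumption is that the 24 cores run continuously with no idle time:

```python
genomes = 60_000             # bacterial and archaeal genomes downloaded from NCBI
core_hours_per_genome = 5    # from the slide
cores_in_parallel = 24       # from the slide

total_core_hours = genomes * core_hours_per_genome
wall_clock_hours = total_core_hours / cores_in_parallel   # assumes no idle time
wall_clock_days = wall_clock_hours / 24

print(f"{total_core_hours:,} core-hours")
print(f"{wall_clock_days:.0f} days of wall-clock time on {cores_in_parallel} cores")
```

The result is a little over 500 days, in line with the slide's "Over 400 Days Wall-Clock Time".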
34. Next Large Supercomputer Project:
Addressing the Challenges of Metagenomic Assembly
• Differences Between Closely Related Strains
• Varying Coverage Depth Across Individual Genomes
• Inter-Species Repeats (Ribosomal Genes, HGTs, etc.)
• Huge Size and Complexity of Datasets
Nagarajan and Pop, Nature Reviews Genetics, 2013
metaSPAdes: a new versatile assembler for metagenomic data
Sergey Nurk (1), Dmitry Meleshko (1), Anton Korobeynikov (1) and Pavel Pevzner (1,2)
(1) Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia
(2) University of California San Diego, La Jolla, USA
35. Massive Research is Underway to Discover
A Wide Range of New Techniques for Manipulating Your Microbiome
www.huffingtonpost.com/entry/gut-bacteria-microbiome-disease_us_57068c55e4b053766188f383
www.synlogictx.com
36. Genetic Sequencing of Humans and Their Microbes
Is a Huge Growth Area and the Future Foundation of Medicine
Source: @EricTopol
Twitter 9/27/2014
37. Thanks to Our Great Team!
Calit2@UCSD
Future Patient Team
Jerry Sheehan
Tom DeFanti
Joe Keefe
John Graham
Kevin Patrick
Mehrdad Yazdani
Jurgen Schulze
Andrew Prudhomme
Philip Weber
Fred Raab
Ernesto Ramirez
UCSD CSE Department
Pavel Pevzner
JCVI Team
Karen Nelson
Shibu Yooseph
Manolito Torralba
Ayasdi
Devi Ramanan
Pek Lum
UCSD Metagenomics Team
Weizhong Li
Sitao Wu
SDSC Team
Michael Norman
Mahidhar Tatineni
Robert Sinkovits
Ilkay Altintas
UCSD Health Sciences Team
David Brenner
Rob Knight Lab
Justine Debelius
Jose Navas
Bryn Taylor
Gail Ackermann
Greg Humphrey
William J. Sandborn Lab
Elisabeth Evans
John Chang
Dell/R Systems
Brian Kucic
John Thompson
Thomas Hill