The Human Cell Atlas Data Coordination Platform
March 20th
12th CeBiTec Symposium
Big Data in Medicine and Biotechnology
Center for Interdisciplinary Research (ZiF), Bielefeld University
Laura Clarke
A “Google Maps” of human anatomy
Goal: A periodic table of our cells
Human Cell Atlas: Mission
To create a comprehensive reference map of the types and properties of all human cells, the fundamental units of life, as a basis for understanding, diagnosing, monitoring, and treating health and disease
Experimental Approach
The Art of Clean Up, Ursus Wehrli, Kimberly Vardeman
Bulk population-based methods
Single Cell Methods
Spatial methods
Phase I status update
Data Coordination Challenges
Ensure the data is open and accessible to all researchers
Encourage innovation in computational methods
Enable sharing of new methods and technologies
Organize and standardize terabytes of data for billions of cells, across multiple modalities, generated by hundreds of labs around the world.
The Data Coordination Platform
The Team
62 scientists and engineers
4 different locations
3 time zones
~30 video conferences a month
How far the DCP has come
The Data Coordination Platform
Cloud-based
Open
Modular
International
Data Generators
Many submitters
Global community
Diversity of biology
Specialists / cores
Ingest
User interface
API
Metadata standards
Validation
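As a rough illustration of the validation step in Ingest, the sketch below checks a submitted metadata record against a small set of required fields. The field names and the `validate_metadata` helper are hypothetical and are not taken from the HCA metadata schema or the Ingest API; a real deployment would presumably validate against the published schema instead.

```python
# Hypothetical, simplified schema: field name -> expected Python type.
# These names are illustrative only and are NOT the HCA metadata schema.
REQUIRED_FIELDS = {
    "biomaterial_id": str,
    "species": str,
    "cell_count": int,
}


def validate_metadata(record):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing required field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    return problems


if __name__ == "__main__":
    submission = {"biomaterial_id": "sample_1", "cell_count": "10"}
    for problem in validate_metadata(submission):
        print(problem)
```

Returning a list of problems rather than raising on the first error lets a submitter fix everything in one pass, which matters when there are many submitters with varied data.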
Store
Writes valid data and metadata
Unique versioned identifiers
Cloud replication
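One way to picture unique versioned identifiers combined with cloud replication is the toy in-memory store below. It is a sketch under stated assumptions, not the DCP data store: the `VersionedStore` class, the replica names, and the timestamp-style version string are all hypothetical.

```python
import uuid
from datetime import datetime, timezone


class VersionedStore:
    """Toy store: every write adds a new (id, version) entry, never an overwrite."""

    def __init__(self, replica_names=("cloud_a", "cloud_b")):
        # One dict per hypothetical cloud replica: {(object_id, version): content}
        self.replicas = {name: {} for name in replica_names}

    def put(self, content, object_id=None, version=None):
        """Write content to every replica and return its (object_id, version)."""
        object_id = object_id or str(uuid.uuid4())
        version = version or datetime.now(timezone.utc).strftime(
            "%Y-%m-%dT%H%M%S.%fZ"
        )
        for objects in self.replicas.values():
            objects[(object_id, version)] = content
        return object_id, version

    def get(self, object_id, version=None, replica="cloud_a"):
        """Read a specific version, or the latest one if no version is given."""
        objects = self.replicas[replica]
        if version is None:
            versions = sorted(v for (oid, v) in objects if oid == object_id)
            if not versions:
                raise KeyError(object_id)
            version = versions[-1]  # timestamp strings sort chronologically
        return objects[(object_id, version)]
```

Because old versions are never overwritten, an analysis that recorded an identifier and version can always retrieve exactly the data it used, from any replica.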
Open Data
Raw data immediately available
Notification system with events
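The notification idea can be sketched as a small publish/subscribe helper: downstream components register interest in an event type and are called when new data lands. The `Notifier` class and the `"data_stored"` event name are hypothetical and do not describe the DCP's actual notification interface.

```python
from collections import defaultdict


class Notifier:
    """Minimal in-process publish/subscribe helper (illustrative only)."""

    def __init__(self):
        # event type -> list of callbacks to invoke
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, callback):
        self._subscribers[event_type].append(callback)

    def publish(self, event_type, payload):
        """Call every subscriber for event_type; return how many were notified."""
        callbacks = self._subscribers[event_type]
        for callback in callbacks:
            callback(payload)
        return len(callbacks)


if __name__ == "__main__":
    notifier = Notifier()
    # A hypothetical analysis pipeline reacting to newly stored raw data.
    notifier.subscribe("data_stored", lambda event: print("start analysis:", event))
    notifier.publish("data_stored", {"object_id": "example-id"})
```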
Analysis Pipelines
Standardized analysis pipelines
Analysis Working Group
Full-transcript scRNA-seq
3’ scRNA-seq
Portability
Standardized Results
Analysis products written using Ingest
Products are stored in the data store
Data Access
Search, download, and compute on all data
Web Browser
Command Line
Multiple Clouds
What is next for the DCP?
The Data Coordination Platform and the Community
Data Commons
HCA metadata standards are:
Agile
Modular
Flexible
Validated
Open
Community driven
Ensuring HCA data is FAIR
https://github.com/HumanCellAtlas/metadata-schema
Simple entities and flexible relationships
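A minimal sketch of "simple entities and flexible relationships": each entity carries only an identifier, a type, and free-form attributes, while relationships live in a separate list of links. The class names, entity types, and identifiers here are illustrative assumptions rather than the structures defined in the HCA metadata-schema repository.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    entity_id: str
    entity_type: str  # e.g. "biomaterial", "process", "file" (illustrative)
    attributes: dict = field(default_factory=dict)


@dataclass
class MetadataGraph:
    entities: dict = field(default_factory=dict)
    links: list = field(default_factory=list)  # (source_id, target_id) pairs

    def add(self, entity):
        self.entities[entity.entity_id] = entity

    def link(self, source_id, target_id):
        """Record that target_id was derived from, or produced by, source_id."""
        for entity_id in (source_id, target_id):
            if entity_id not in self.entities:
                raise KeyError(entity_id)
        self.links.append((source_id, target_id))

    def downstream(self, entity_id):
        """Return ids of all entities reachable by following links."""
        seen, stack = [], [entity_id]
        while stack:
            current = stack.pop()
            for source, target in self.links:
                if source == current and target not in seen:
                    seen.append(target)
                    stack.append(target)
        return seen


if __name__ == "__main__":
    graph = MetadataGraph()
    graph.add(Entity("donor", "biomaterial", {"species": "Homo sapiens"}))
    graph.add(Entity("specimen", "biomaterial"))
    graph.add(Entity("sequencing", "process"))
    graph.add(Entity("reads", "file", {"format": "FASTQ"}))
    graph.link("donor", "specimen")
    graph.link("specimen", "sequencing")
    graph.link("sequencing", "reads")
    print(graph.downstream("donor"))
```

Keeping links separate from entities means a new experimental design only needs new links, not a new entity definition.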
A concrete example
Tertiary portals
Browsing the data
● Save searches as collections
● Share search collections
● Download files (FASTQ, BAM, TIFF, expression matrix, etc.)
https://www.flaticon.com
Hand data to the tertiary portals
● Filter for a set of data in the Data Browser
● “Handoff” mechanism to Tertiary Portals for analysis/visualization
● Sends search results
Launch
How to get involved?
● What do you need from the DCP?
○ Submission
○ Discovery
○ Analysis
Join Slack: http://join-slack.humancellatlas.org/
See our code: https://github.com/humancellatlas
Email: data-help@humancellatlas.org
HCA registered members: 44 countries, 341 institutes
Thanks!
Mallory Freeberg
Vladimir Kiselev
Irene Papatheodorou
Brian O’Connor
Aviv Regev
Sarah Teichmann
Timothy Tickle
Mike Stubbington
