SlideShare a Scribd company logo
Cell Line Metadata in ArxLab
• Create ArxLab Registration objects to
parental cell lines with minimal common
metadata
• Create ArxLab Registration entries for
project-specific daughter cell lines with
additional project-specific metadata
• Standardize where possible on ArxLab
Assay definitions
Why this is a Problem
• Lack of a common practicesplatform
inhibits collaboration between
groups since they have to rely on
external sources to know what internal
research has been done on a cell line
• When there is collaboration, e.g., with
one group supplying cell lines and data
to another group, may have issues with
updating metadata, e.g., primary site
change
• Lack of a common vocabulary leads to
data quality issues, e.g., what do you
mean by Doubling Time
• Velocity of scientific discovery is
slower as a result
Challenge • One of the key challenges in conducting research in a diverse and dynamic organization like the Broad
Institute is connecting islands of related data.
• Since scientific groups have traditionally been separated from each other, relying on each other as internal suppliers and
customers, their data have similarly been separated; it is not uncommon to have two groups working on the same cell line but
have no means of finding out about each other's work, partially due to different means of tracking cell-line data
• The Broad Institute has collaborated with Arxspan to develop a configuration of ArxLab to share a common registry of parental
cell lines, allowing different groups to have a common vocabulary about cell lines and opening collaboration possibilities for both
new science and accelerated progress on existing science
Solution Framework
• Use institutional
database as the
canonical source of
cell line metadata
• Ingest institutional data into local
data management
systems to link
project specific data
to parental cell line data
• Have a common registry of parental
cell lines (available to all) and
daughter cell lines (project specific
by default)
• Preserve heredity of cell lines and
allow searching by such
Example
• What metadata tracked at what level?
• Who decides the metadata categories
and values?
• How do we promote project-specific
metadata to parental cell lines?
Desired State
• Common cell line metadata categories
and data
• Defined, published, flexible processes
for collaborative reviewapproval of
metadata categories and data (e.g.,
intake, change, promotion)
• Retain ability for groups to work
independently on project-specific
metadata and data
• Technology that enables wide-spread
sharing of cell-line metadata categories
and data, inside and outside Broad
Hypothesis
• Use best practices from
manufacturing around
master data management
(e.g., Master Data Review board) to
build necessary organizational
practices
• Use technology to enable
organization processes
• Principles:
o Technology without
organizational
processes is a waste
o Organizational processes without
enabling, sustainable use of
technology will wither
Institutional Cell Line Database
Sample Entity Relationship Diagram
• Tracks multiple names and
annotations (e.g., lineage) and
the source of these claims
• Has no concept of samples or
instances (annotates the
abstract entity only)
• cell_sample: Name space
for a cell line name, e.g., CCLE,
CDDB, ATCC
Enabling Cross-Group
Collaboration on Cell Lines
Data exchange via Java Script Object Notation (JSON) file:
cell_sample = { cell_sample_names: [
{cell_name_type: "CCLE", cell_sample_name: "A375_SKIN"},
{cell_name_type: "cddb", cell_sample_name: "30"},
{cell_name_type: “ATCC", cell_sample_name: "ATCC: A-375 [A375] (ATCC® CRL-1619™)“} ] }
• cell_name_type: Name for
cell line and internal priority of
that name, e.g., may prefer one
name to another name
• cell_sample_name: array of
names for a cell line, e.g.,
o CCLE: A375_SKIN
o CDDB: 30
o ATCC: A-375 [A375] (ATCC®
CRL-1619™)
Bruce Kozuma, PMP, CPIM
Broad Institute
bkozuma@broadinstitute.org
Current State
• Multiple groups creating and using cell
lines at the Broad, e.g., Achilles, PRISM,
Cancer Cell Line Encyclopedia (CCLE)
• Some canonical sources of cell-line
data at Broad, e.g., Cancer Cell Line
Dependencies Database (CDDB)
• However!
o Limited coordination in definitions
of what constitutes a unique cell line
and how changes are made to that
definition over time
o No effective mechanisms to curate,
register, or search such definitions
o No automated refresh cycle for data
in CDDB
Credits
CDD Data Curation
Paul Clemons
Mahmoud Ghandi
Shuba Gopal
Gregory Gydush
Barbara Weir
Achilles
Francesca Vazquez
Sasha Pantel
Nicole Dabkowski
Phil Montgomery
Glenn Cowley
PRISM
Chris Mader
Jen Roth
Sam Bender
Massami Laird
Ed McBride
Broad Management
Alex Burgin
Anthony Philippakis
Scott Sutherland
BITS
Chris Dwan
Eric Jones
Arxspan
Jeff Carter
Kate Hardy

More Related Content

What's hot

A Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical DataA Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical Data
Stuart Chalk
 
Scientific Units in the Electronic Age
Scientific Units in the Electronic AgeScientific Units in the Electronic Age
Scientific Units in the Electronic Age
Stuart Chalk
 
Building a Standard for Standards: The ChAMP Project
Building a Standard for Standards: The ChAMP ProjectBuilding a Standard for Standards: The ChAMP Project
Building a Standard for Standards: The ChAMP Project
Stuart Chalk
 
An Open Repository Model for Acquiring Knowledge About Scientific Experiments
An Open Repository Model for Acquiring Knowledge About Scientific ExperimentsAn Open Repository Model for Acquiring Knowledge About Scientific Experiments
An Open Repository Model for Acquiring Knowledge About Scientific Experiments
CEDAR: Center for Expanded Data Annotation and Retrieval
 
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
CEDAR: Center for Expanded Data Annotation and Retrieval
 
ACS 248th Paper 136 JSmol/JSpecView Eureka Integration
ACS 248th Paper 136 JSmol/JSpecView Eureka IntegrationACS 248th Paper 136 JSmol/JSpecView Eureka Integration
ACS 248th Paper 136 JSmol/JSpecView Eureka Integration
Stuart Chalk
 
Crosslinks
Crosslinks Crosslinks
Crosslinks
ericmeeks
 
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
CEDAR: Center for Expanded Data Annotation and Retrieval
 
Improving the Management of Computational Models -- Invited talk at the EBI
Improving the Management of Computational Models -- Invited talk at the EBIImproving the Management of Computational Models -- Invited talk at the EBI
Improving the Management of Computational Models -- Invited talk at the EBI
Martin Scharm
 
Hosting a compound centric community resource for chemistry data
Hosting a compound centric community resource for chemistry dataHosting a compound centric community resource for chemistry data
Hosting a compound centric community resource for chemistry data
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
Carole Goble
 
The FAIRDOM Commons for Systems Biology
The FAIRDOM Commons for Systems BiologyThe FAIRDOM Commons for Systems Biology
The FAIRDOM Commons for Systems Biology
FAIRDOM
 
AnIML: A New Analytical Data Standard
AnIML: A New Analytical Data StandardAnIML: A New Analytical Data Standard
AnIML: A New Analytical Data Standard
Stuart Chalk
 
Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.
FAIRDOM
 
Analyzing Extended and Scientific Metadata for Scalable Index Designs
Analyzing Extended and Scientific Metadata for Scalable Index DesignsAnalyzing Extended and Scientific Metadata for Scalable Index Designs
Analyzing Extended and Scientific Metadata for Scalable Index Designs
Aleatha Parker-Wood
 
Crediting informatics and data folks in life science teams
Crediting informatics and data folks in life science teamsCrediting informatics and data folks in life science teams
Crediting informatics and data folks in life science teams
Carole Goble
 
Bh14 ogo
Bh14 ogoBh14 ogo
Introduction to FAIRDOM
Introduction to FAIRDOMIntroduction to FAIRDOM
Introduction to FAIRDOM
Carole Goble
 
Making your data good enough for sharing.
Making your data good enough for sharing.Making your data good enough for sharing.
Making your data good enough for sharing.
FAIRDOM
 
Research Objects, SEEK and FAIRDOM
Research Objects, SEEK and FAIRDOMResearch Objects, SEEK and FAIRDOM
Research Objects, SEEK and FAIRDOM
Carole Goble
 

What's hot (20)

A Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical DataA Generic Scientific Data Model and Ontology for Representation of Chemical Data
A Generic Scientific Data Model and Ontology for Representation of Chemical Data
 
Scientific Units in the Electronic Age
Scientific Units in the Electronic AgeScientific Units in the Electronic Age
Scientific Units in the Electronic Age
 
Building a Standard for Standards: The ChAMP Project
Building a Standard for Standards: The ChAMP ProjectBuilding a Standard for Standards: The ChAMP Project
Building a Standard for Standards: The ChAMP Project
 
An Open Repository Model for Acquiring Knowledge About Scientific Experiments
An Open Repository Model for Acquiring Knowledge About Scientific ExperimentsAn Open Repository Model for Acquiring Knowledge About Scientific Experiments
An Open Repository Model for Acquiring Knowledge About Scientific Experiments
 
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
 
ACS 248th Paper 136 JSmol/JSpecView Eureka Integration
ACS 248th Paper 136 JSmol/JSpecView Eureka IntegrationACS 248th Paper 136 JSmol/JSpecView Eureka Integration
ACS 248th Paper 136 JSmol/JSpecView Eureka Integration
 
Crosslinks
Crosslinks Crosslinks
Crosslinks
 
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
 
Improving the Management of Computational Models -- Invited talk at the EBI
Improving the Management of Computational Models -- Invited talk at the EBIImproving the Management of Computational Models -- Invited talk at the EBI
Improving the Management of Computational Models -- Invited talk at the EBI
 
Hosting a compound centric community resource for chemistry data
Hosting a compound centric community resource for chemistry dataHosting a compound centric community resource for chemistry data
Hosting a compound centric community resource for chemistry data
 
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
 
The FAIRDOM Commons for Systems Biology
The FAIRDOM Commons for Systems BiologyThe FAIRDOM Commons for Systems Biology
The FAIRDOM Commons for Systems Biology
 
AnIML: A New Analytical Data Standard
AnIML: A New Analytical Data StandardAnIML: A New Analytical Data Standard
AnIML: A New Analytical Data Standard
 
Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.
 
Analyzing Extended and Scientific Metadata for Scalable Index Designs
Analyzing Extended and Scientific Metadata for Scalable Index DesignsAnalyzing Extended and Scientific Metadata for Scalable Index Designs
Analyzing Extended and Scientific Metadata for Scalable Index Designs
 
Crediting informatics and data folks in life science teams
Crediting informatics and data folks in life science teamsCrediting informatics and data folks in life science teams
Crediting informatics and data folks in life science teams
 
Bh14 ogo
Bh14 ogoBh14 ogo
Bh14 ogo
 
Introduction to FAIRDOM
Introduction to FAIRDOMIntroduction to FAIRDOM
Introduction to FAIRDOM
 
Making your data good enough for sharing.
Making your data good enough for sharing.Making your data good enough for sharing.
Making your data good enough for sharing.
 
Research Objects, SEEK and FAIRDOM
Research Objects, SEEK and FAIRDOMResearch Objects, SEEK and FAIRDOM
Research Objects, SEEK and FAIRDOM
 

Similar to 2016 Bio-IT World Cell Line Coordination Poster 2016-04-05v1

2016 Bio-IT World Cell Line Coordination 2016-04-06v1
2016 Bio-IT World Cell Line Coordination 2016-04-06v12016 Bio-IT World Cell Line Coordination 2016-04-06v1
2016 Bio-IT World Cell Line Coordination 2016-04-06v1
Bruce Kozuma
 
Building genomic data cyberinfrastructure with the online database software T...
Building genomic data cyberinfrastructure with the online database software T...Building genomic data cyberinfrastructure with the online database software T...
Building genomic data cyberinfrastructure with the online database software T...
mestato
 
Data Café — A Platform For Creating Biomedical Data Lakes
Data Café — A Platform For Creating Biomedical Data LakesData Café — A Platform For Creating Biomedical Data Lakes
Data Café — A Platform For Creating Biomedical Data Lakes
Pradeeban Kathiravelu, Ph.D.
 
ISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, Japan
ISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, JapanISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, Japan
ISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, Japan
Philippe Rocca-Serra
 
BioSamples Database Linked Data, SWAT4LS Tutorial
BioSamples Database Linked Data, SWAT4LS TutorialBioSamples Database Linked Data, SWAT4LS Tutorial
BioSamples Database Linked Data, SWAT4LS Tutorial
Rothamsted Research, UK
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-Seq
Enis Afgan
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research Objects
Carole Goble
 
The eCrystals Federation
The eCrystals FederationThe eCrystals Federation
The eCrystals Federation
ManjulaPatel
 
Chemical workflows supporting automated research data collection
Chemical workflows supporting automated research data collectionChemical workflows supporting automated research data collection
Chemical workflows supporting automated research data collection
Valery Tkachenko
 
SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...
Carole Goble
 
Duraspace Hot Topics Series 6: Metadata and Repository Services
Duraspace Hot Topics Series 6: Metadata and Repository ServicesDuraspace Hot Topics Series 6: Metadata and Repository Services
Duraspace Hot Topics Series 6: Metadata and Repository Services
Matthew Critchlow
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
 
Semantic web 101: Benefits for geologists
Semantic web 101: Benefits for geologistsSemantic web 101: Benefits for geologists
Semantic web 101: Benefits for geologists
dgarijo
 
Semi-automated Exploration and Extraction of Data in Scientific Tables
Semi-automated Exploration and Extraction of Data in Scientific TablesSemi-automated Exploration and Extraction of Data in Scientific Tables
Semi-automated Exploration and Extraction of Data in Scientific Tables
Elsevier
 
1650607.ppt
1650607.ppt1650607.ppt
1650607.ppt
KalsoomTahir2
 
Use a data parallel approach to proAcess
Use a data parallel approach to proAcessUse a data parallel approach to proAcess
Use a data parallel approach to proAcess
23mz02
 
Data Never Lies Presentation for beginners in data field.pptx
Data Never Lies Presentation for beginners in data field.pptxData Never Lies Presentation for beginners in data field.pptx
Data Never Lies Presentation for beginners in data field.pptx
TusharAgarwal49094
 
CS3270 - DATABASE SYSTEM - Lecture (2)
CS3270 - DATABASE SYSTEM - Lecture (2)CS3270 - DATABASE SYSTEM - Lecture (2)
CS3270 - DATABASE SYSTEM - Lecture (2)
Dilawar Khan
 
Nosql databases
Nosql databasesNosql databases
Nosql databases
Fayez Shayeb
 
Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...
Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...
Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...
Syed Ahmad Chan Bukhari, PhD
 

Similar to 2016 Bio-IT World Cell Line Coordination Poster 2016-04-05v1 (20)

2016 Bio-IT World Cell Line Coordination 2016-04-06v1
2016 Bio-IT World Cell Line Coordination 2016-04-06v12016 Bio-IT World Cell Line Coordination 2016-04-06v1
2016 Bio-IT World Cell Line Coordination 2016-04-06v1
 
Building genomic data cyberinfrastructure with the online database software T...
Building genomic data cyberinfrastructure with the online database software T...Building genomic data cyberinfrastructure with the online database software T...
Building genomic data cyberinfrastructure with the online database software T...
 
Data Café — A Platform For Creating Biomedical Data Lakes
Data Café — A Platform For Creating Biomedical Data LakesData Café — A Platform For Creating Biomedical Data Lakes
Data Café — A Platform For Creating Biomedical Data Lakes
 
ISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, Japan
ISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, JapanISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, Japan
ISA-Tab Standards at Metabolomics Society Meeting, Tsuruoka 2014, Japan
 
BioSamples Database Linked Data, SWAT4LS Tutorial
BioSamples Database Linked Data, SWAT4LS TutorialBioSamples Database Linked Data, SWAT4LS Tutorial
BioSamples Database Linked Data, SWAT4LS Tutorial
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-Seq
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research Objects
 
The eCrystals Federation
The eCrystals FederationThe eCrystals Federation
The eCrystals Federation
 
Chemical workflows supporting automated research data collection
Chemical workflows supporting automated research data collectionChemical workflows supporting automated research data collection
Chemical workflows supporting automated research data collection
 
SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...
 
Duraspace Hot Topics Series 6: Metadata and Repository Services
Duraspace Hot Topics Series 6: Metadata and Repository ServicesDuraspace Hot Topics Series 6: Metadata and Repository Services
Duraspace Hot Topics Series 6: Metadata and Repository Services
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
Semantic web 101: Benefits for geologists
Semantic web 101: Benefits for geologistsSemantic web 101: Benefits for geologists
Semantic web 101: Benefits for geologists
 
Semi-automated Exploration and Extraction of Data in Scientific Tables
Semi-automated Exploration and Extraction of Data in Scientific TablesSemi-automated Exploration and Extraction of Data in Scientific Tables
Semi-automated Exploration and Extraction of Data in Scientific Tables
 
1650607.ppt
1650607.ppt1650607.ppt
1650607.ppt
 
Use a data parallel approach to proAcess
Use a data parallel approach to proAcessUse a data parallel approach to proAcess
Use a data parallel approach to proAcess
 
Data Never Lies Presentation for beginners in data field.pptx
Data Never Lies Presentation for beginners in data field.pptxData Never Lies Presentation for beginners in data field.pptx
Data Never Lies Presentation for beginners in data field.pptx
 
CS3270 - DATABASE SYSTEM - Lecture (2)
CS3270 - DATABASE SYSTEM - Lecture (2)CS3270 - DATABASE SYSTEM - Lecture (2)
CS3270 - DATABASE SYSTEM - Lecture (2)
 
Nosql databases
Nosql databasesNosql databases
Nosql databases
 
Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...
Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...
Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...
 

Recently uploaded

How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 

Recently uploaded (20)

How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 

2016 Bio-IT World Cell Line Coordination Poster 2016-04-05v1

  • 1. Cell Line Metadata in ArxLab • Create ArxLab Registration objects to parental cell lines with minimal common metadata • Create ArxLab Registration entries for project-specific daughter cell lines with additional project-specific metadata • Standardize where possible on ArxLab Assay definitions Why this is a Problem • Lack of a common practicesplatform inhibits collaboration between groups since they have to rely on external sources to know what internal research has been done on a cell line • When there is collaboration, e.g., with one group supplying cell lines and data to another group, may have issues with updating metadata, e.g., primary site change • Lack of a common vocabulary leads to data quality issues, e.g., what do you mean by Doubling Time • Velocity of scientific discovery is slower as a result Challenge • One of the key challenges in conducting research in a diverse and dynamic organization like the Broad Institute is connecting islands of related data. • Since scientific groups have traditionally been separated from each other, relying on each other as internal suppliers and customers, their data have similarly been separated; it is not uncommon to have two groups working on the same cell line but have no means of finding out about each other's work, partially due to different means of tracking cell-line data • The Broad Institute has collaborated with Arxspan to develop a configuration of ArxLab to share a common registry of parental cell lines, allowing different groups to have a common vocabulary about cell lines and opening collaboration possibilities for both new science and accelerated progress on existing science Solution Framework • Use institutional database as the canonical source of cell line metadata • Ingest institutional data into local data management systems to link project specific data to parental cell line data • Have a common registry of parental cell lines (available to all) and daughter cell lines (project specific by default) • Preserve heredity of cell lines and allow searching by such Example • What metadata tracked at what level? • Who decides the metadata categories and values? • How do we promote project-specific metadata to parental cell lines? Desired State • Common cell line metadata categories and data • Defined, published, flexible processes for collaborative reviewapproval of metadata categories and data (e.g., intake, change, promotion) • Retain ability for groups to work independently on project-specific metadata and data • Technology that enables wide-spread sharing of cell-line metadata categories and data, inside and outside Broad Hypothesis • Use best practices from manufacturing around master data management (e.g., Master Data Review board) to build necessary organizational practices • Use technology to enable organization processes • Principles: o Technology without organizational processes is a waste o Organizational processes without enabling, sustainable use of technology will wither Institutional Cell Line Database Sample Entity Relationship Diagram • Tracks multiple names and annotations (e.g., lineage) and the source of these claims • Has no concept of samples or instances (annotates the abstract entity only) • cell_sample: Name space for a cell line name, e.g., CCLE, CDDB, ATCC Enabling Cross-Group Collaboration on Cell Lines Data exchange via Java Script Object Notation (JSON) file: cell_sample = { cell_sample_names: [ {cell_name_type: "CCLE", cell_sample_name: "A375_SKIN"}, {cell_name_type: "cddb", cell_sample_name: "30"}, {cell_name_type: “ATCC", cell_sample_name: "ATCC: A-375 [A375] (ATCC® CRL-1619™)“} ] } • cell_name_type: Name for cell line and internal priority of that name, e.g., may prefer one name to another name • cell_sample_name: array of names for a cell line, e.g., o CCLE: A375_SKIN o CDDB: 30 o ATCC: A-375 [A375] (ATCC® CRL-1619™) Bruce Kozuma, PMP, CPIM Broad Institute bkozuma@broadinstitute.org Current State • Multiple groups creating and using cell lines at the Broad, e.g., Achilles, PRISM, Cancer Cell Line Encyclopedia (CCLE) • Some canonical sources of cell-line data at Broad, e.g., Cancer Cell Line Dependencies Database (CDDB) • However! o Limited coordination in definitions of what constitutes a unique cell line and how changes are made to that definition over time o No effective mechanisms to curate, register, or search such definitions o No automated refresh cycle for data in CDDB Credits CDD Data Curation Paul Clemons Mahmoud Ghandi Shuba Gopal Gregory Gydush Barbara Weir Achilles Francesca Vazquez Sasha Pantel Nicole Dabkowski Phil Montgomery Glenn Cowley PRISM Chris Mader Jen Roth Sam Bender Massami Laird Ed McBride Broad Management Alex Burgin Anthony Philippakis Scott Sutherland BITS Chris Dwan Eric Jones Arxspan Jeff Carter Kate Hardy