SPONSORED BY THE NATIONAL CANCER INSTITUTE
And then there were 15 standards
Using Neo4j to harmonize data in cancer research
Todd Pihl, Ph.D.
Mark Jensen, Ph.D.
https://xkcd.com/927
Biological data is naturally a graph
Graph management by subject matter experts
Nodes
Edges
Property Defs
Props referenced here … and defined here
Entity names are the keys
Nodes at the ends, with direction
Other attributes specified
Constrain the data values to defined types
Model Description Files
https://github.com/CBIIT/bento-mdf
Bento Framework
Installing a Bento Data Sharing Platform on a Cloud Platform
LOCAL MACHINE
Clone files from GitHub
Frontend
Backend
Neo4j
-Add test metadata to DB
-Edit UI config files
-View updates in real-time
-Save updated files in bento-frontend
-Push to GitHub
GITHUB
bento-frontend
bento-backend
bento-data-model
CLOUD PLATFORM
Pull updated files from GitHub
Load data from a secure S3 bucket
Frontend
Backend
Neo4j
Data Sharing Platform
AWS Environment
Cancer Research Data Commons (CRDC)
Cancer Data Aggregator
Aggregate by patient, sample, study, disease, tissue, etc.
Clinical, Proteomics, Imaging, Genomics, Immuno-oncology, Animal Models, Cancer Biomarkers
Cancer Research Data Commons
Data Standards Services
Cancer Data Aggregator (CDA)
• CDA Mission: Provide a single location to query across all CRDC data repositories
• API, Python library
• Currently contains data from Genomics, Proteomics and Imaging Data Commons
• Remaining CRDC data repositories in progress
• Released for CRDC production use on June 28th
• Documentation: https://cda.readthedocs.io/en/latest/
• The Examples page has many Python use cases
• CDA GitHub: https://github.com/CancerDataAggregator
• Swagger: https://cda.datacommons.cancer.gov/api/swagger-ui.html
• For the first time, CDA allows us to easily look across CRDC at how data are presented to users.
Houston, we have a problem
Example: Species
Are these fields really the same?
Models are for data, not vice versa.
CRDC is a federation of going concerns
• Each CRDC node has its own data systems, business processes, stakeholders, and users
• Each has its own purpose-built data model that enables data ingestion, query, and distribution.
• Each has large, ongoing inflows and outflows of data today.
• So – A top-down, prescriptive approach to standardization is not feasible. (Believe us; we know.)
• Standardization emphasizing carrots instead of sticks:
• Access to the CDA is a benefit for any node wanting to extend the reach of its data.
• Approach data standardization as a practical mapping goal: “If you can place your model in the context of the CDA’s data maps, the CDA can query and serve your data”
• Approach standardization as an iterative process: “Start with a high priority set of metadata, and expand mapping over time.”
Graphs as a common language for expressing data models
Representing custom data models as graphs can provide:
• a unified context for managing data and semantics, and
• a framework for integrating data with minimal impact on repository operations.
Creating graph versions of many kinds of data models is possible, since many popular modeling approaches find natural expression in the Property Graph:
Property Graph | Relational Data | OWL/RDF
Node | Table rows | Class
Property | Table columns/cells | Datatype Property
Relationship | Foreign keys/Linking tables | Object Property
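The relational column of this correspondence can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `rows_to_graph`, the table and column names, and the dictionary layout of nodes and relationships are all hypothetical and are not part of Bento, the CDA, or Neo4j.

```python
def rows_to_graph(table, rows, key, foreign_keys):
    """Convert table rows into property-graph nodes and relationships.

    table        -- table name, reused as the node label
    rows         -- list of dicts, one per table row
    key          -- name of the primary-key column
    foreign_keys -- {column: (relationship_type, target_table)}
    """
    nodes, relationships = [], []
    for row in rows:
        # Columns that are not foreign keys become node properties.
        props = {c: v for c, v in row.items() if c not in foreign_keys}
        nodes.append({"label": table, "id": row[key], "properties": props})
        # Each non-null foreign key becomes a directed relationship.
        for column, (rel_type, target) in foreign_keys.items():
            if row.get(column) is not None:
                relationships.append(
                    {
                        "type": rel_type,
                        "src": (table, row[key]),
                        "dst": (target, row[column]),
                    }
                )
    return nodes, relationships


if __name__ == "__main__":
    # Hypothetical "sample" table with a foreign key to a "case" table.
    samples = [{"sample_id": "S1", "tissue": "lung", "case_id": "C1"}]
    nodes, rels = rows_to_graph(
        "sample", samples, "sample_id", {"case_id": ("of_case", "case")}
    )
    print(nodes)
    print(rels)
```

Linking tables would need one more step (a row with two foreign keys becomes a single relationship), which is left out here to keep the sketch short.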
Model Description Format (MDF) - simple, iterative model recording and schematizing
MDF is a compact, human-readable, and computable format for defining a property graph:
• Define Nodes
• Node Properties
• Define Relationships
• Relationship Properties
• Relationship Attributes
• Define and Describe Properties
• Property Attributes, including
• Allowable value types or sets
https://github.com/CBIIT/bento-mdf
In the Bento framework:
• Data SMEs directly update MDF (in GitHub) to make model updates
• Backend data loader and frontend user interfaces are configured directly by MDF
MDF is simple and standardized
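To make the structure concrete, here is a minimal Python sketch of an MDF-like model held as a dictionary, with a check that every referenced property is defined and every relationship end names a declared node. The key names (`Nodes`, `Relationships`, `PropDefinitions`, `Props`, `Ends`, `Src`, `Dst`, `Mul`, `Type`, `Enum`, `Req`) follow my reading of the bento-mdf README and should be treated as assumptions; the model content and the `check_model` function are invented for illustration and are not the bento-mdf validator.

```python
# Hypothetical model, written as the Python equivalent of an MDF YAML file.
MODEL = {
    "Nodes": {  # entity names are the keys
        "case": {"Props": ["case_id", "species"]},  # props referenced here ...
        "sample": {"Props": ["sample_id"]},
    },
    "Relationships": {
        "of_case": {
            "Mul": "many_to_one",  # other attributes specified
            "Ends": [{"Src": "sample", "Dst": "case"}],  # nodes at the ends, with direction
        },
    },
    "PropDefinitions": {  # ... and defined here
        "case_id": {"Type": "string", "Req": True},
        "sample_id": {"Type": "string", "Req": True},
        "species": {"Enum": ["human", "dog"]},  # constrain values to a defined set
    },
}


def check_model(model):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    nodes = model.get("Nodes") or {}
    defined = model.get("PropDefinitions") or {}
    relationships = model.get("Relationships") or {}

    def check_props(kind, name, spec):
        # Every property an entity references must have a definition.
        for prop in (spec or {}).get("Props") or []:
            if prop not in defined:
                problems.append(f"{kind} {name}: property {prop} is not defined")

    for name, spec in nodes.items():
        check_props("node", name, spec)
    for name, spec in relationships.items():
        check_props("relationship", name, spec)
        # Both ends of a relationship must be declared nodes.
        for end in (spec or {}).get("Ends") or []:
            for side in ("Src", "Dst"):
                if end[side] not in nodes:
                    problems.append(
                        f"relationship {name}: {side} node {end[side]} is not defined"
                    )
    return problems


if __name__ == "__main__":
    print(check_model(MODEL))
```

The bento-mdf repository is the authoritative source for the format and ships its own validation tooling; this sketch only mirrors the ideas listed on the slide.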
Philip Musk 12:06
And let me tell you, with data needs driving many of ICDC's requirements as they are, and have been thus far, being able to both write the requirements, and make the required model changes ahead of engineers doing their thing, is really powerful. I don't have to explain what model changes we need to make to someone else - I can get the model changes done myself, and explain what we need the engineers and the UI to do with those changes.
SMEs
Engineering
• Practical principles towards a practical goal led us to practical tools, enabling
• Rapid prototypes and production tier commons
• Integrated Canine Data Commons
• Clinical Trial Data Commons
• Rapid prototypes for data modeling and model visualization
• Cancer Data Service
• Children’s Cancer Data Initiative
• New practical problem: management of multiple dynamic data models over independent projects
• Creating new models: component reuse?
• Managing acceptable value sets for many Properties in models
• Understanding interrelationships between models for mapping and interoperability
Metamodel Database – the models as data
Both data and model as property graphs
Data
Model ("Schema")
Label: Person
Label: Person
Label: Group
Metamodel Schema
Defines:
• Models
• Nodes, Relationships, Properties
• Origins, Terms, and Value Sets
• Concepts and Predicates
Schema is represented in MDF
https://github.com/CBIIT/bento-meta/blob/master/metamodel.yaml
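The idea of storing a model as data can be sketched as follows: each model Node, Relationship, and Property becomes a vertex in the metamodel graph, and edges record which entity has which property and where each relationship starts and ends. The labels and edge types used here (`node`, `relationship`, `property`, `has_property`, `has_src`, `has_dst`) reflect my understanding of the metamodel and should be checked against metamodel.yaml; the function `model_to_entities` and the example model are hypothetical.

```python
def model_to_entities(handle, nodes, relationships):
    """Flatten one model into metamodel vertices and edges.

    handle        -- name of the model, stored on every vertex
    nodes         -- {node name: [property names]}
    relationships -- {relationship name: (source node, destination node)}
    """
    vertices, edges = [], []
    for node, props in nodes.items():
        vertices.append(("node", {"handle": node, "model": handle}))
        for prop in props:
            vertices.append(("property", {"handle": prop, "model": handle}))
            edges.append((node, "has_property", prop))
    for rel, (src, dst) in relationships.items():
        vertices.append(("relationship", {"handle": rel, "model": handle}))
        edges.append((rel, "has_src", src))
        edges.append((rel, "has_dst", dst))
    return vertices, edges


if __name__ == "__main__":
    # Invented two-node model; not taken from any real data commons.
    vertices, edges = model_to_entities(
        "EXAMPLE",
        {"case": ["case_id"], "sample": []},
        {"of_case": ("sample", "case")},
    )
    for label, props in vertices:
        print(label, props)
    for edge in edges:
        print(edge)
```

Because every vertex carries the model name, several models can share one graph and still be queried separately.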
Two models in a Metamodel DB (MDB)
ICDC CTDC
• In the simple context of Properties, Nodes, and Relationships, we have a
functional repository for multiple graph models
• Python packages move MDF into an MDB, create MDF from models in an MDB
• Docker containers easily run a local MDB, or can provide an instantiated, loaded MDB
• Based directly on Neo4j Community server images
• Simple Terminology Server (STS) with MDB as backend
• Enables both GUI and API access to the models
• Model browsing and fulltext search across all entities
• STS is also intended to be easy to distribute and set up
MDB as a model repository and reference
The MDB schema also defines entities for relating models to one another and to external authorities:
• Concepts & Predicates (“semantics”)
• Origins, Terms, & Value Sets (“terminology”)
Patterns for connecting these to model entities create separable “layers” that can be added or modified without disrupting the repository function.
MDB as a cross-model tool
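One way to picture the semantic layer: if properties from different models are each linked to a Concept, then entities that share a Concept are candidates for mapping. The Python sketch below keeps those links in a plain dictionary; the model names, property names, concept identifiers, and the `equivalents` function are all invented for illustration and do not describe the MDB's actual contents or API.

```python
from collections import defaultdict

# (model, property) -> concept; every value here is hypothetical.
HAS_CONCEPT = {
    ("model_a", "species"): "concept:organism",
    ("model_b", "organism_name"): "concept:organism",
    ("model_a", "case_id"): "concept:subject_identifier",
}


def equivalents(has_concept):
    """Group entities by concept, keeping concepts shared by two or more entities."""
    by_concept = defaultdict(list)
    for entity, concept in has_concept.items():
        by_concept[concept].append(entity)
    return {c: sorted(es) for c, es in by_concept.items() if len(es) > 1}


if __name__ == "__main__":
    for concept, entities in equivalents(HAS_CONCEPT).items():
        print(concept, entities)
```

Since the concept links live apart from the model definitions, they can be added or revised without touching the models themselves, which is the "layer" idea described on the slide.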
• Dynamic
• Like data and data models
• Pragmatic
• Not a repository of ultimate truth
• Tool to help us provide value to NCI today
• Friendly
• Communicates to humans and computers
• Simple, but well-defined
• Not necessarily exhaustive or “complete”
• Distributable
• Not necessarily “central”
• A platform for “mutual understanding” of data
MDB Philosophy: keys to its utility
https://cbiit.github.io/bento-meta/mdb-principles.html
• Mark Benson, PhD
• Phil Musk, PhD
• Ming Ying, MS
• Anjan Purkayastha, PhD
• Ye Wu, PhD
• Pat Dunn, PhD
• Nelson Moore, MS
• John Otridge, PhD
Acknowledgements
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 

Government GraphSummit: And Then There Were 15 Standards

  • 1. Sponsored by the National Cancer Institute. And then there were 15 standards: Using Neo4j to harmonize data in cancer research. Todd Pihl, Ph.D., and Mark Jensen, Ph.D.
  • 3. Biological data is naturally a graph
  • 4. Graph management by subject matter experts • Model Description Files: https://github.com/CBIIT/bento-mdf • Nodes, Edges, Property Defs • Props referenced here … and defined here • Entity names are the keys • Nodes at the ends, with direction • Other attributes specified • Constrain the data values to defined types
  • 6. Installing a Bento Data Sharing Platform on a Cloud Platform • Local machine: clone files from GitHub (bento-frontend, bento-backend, bento-data-model); run Frontend, Backend, and Neo4j; add test metadata to the DB; edit UI config files; view updates in real time; save updated files in bento-frontend; push to GitHub • GitHub: bento-frontend, bento-backend, bento-data-model • Cloud platform (AWS environment): pull updated files from GitHub; load data from a secure S3 bucket; Frontend, Backend, and Neo4j form the Data Sharing Platform
  • 7. Cancer Research Data Commons (CRDC) • Cancer Research Data Commons: Clinical, Proteomics, Imaging, Genomics, Immuno-oncology, Animal Models, Cancer Biomarkers • Cancer Data Aggregator: aggregate by patient, sample, study, disease, tissue, etc. • Data Standards Services
  • 8. Cancer Data Aggregator (CDA) • CDA Mission: Provide a single location to query across all CRDC data repositories • API, Python library • Currently contains data from Genomics, Proteomics and Imaging Data Commons • Remaining CRDC data repositories in progress • Released for CRDC production use on June 28th • Documentation: https://cda.readthedocs.io/en/latest/ • The Examples page has many Python use cases • CDA GitHub: https://github.com/CancerDataAggregator • Swagger: https://cda.datacommons.cancer.gov/api/swagger-ui.html • For the first time, CDA allows us to easily look across CRDC at how data are presented to users.
  • 9. Houston, we have a problem
  • 11. Are these fields really the same?
  • 12. Models are for data, not vice versa.
  • 14. CRDC is a federation of going concerns • Each CRDC node has its own data systems, business processes, stakeholders, and users • Each has its own purpose-built data model that enables data ingestion, query, and distribution. • Each has large, ongoing inflows and outflows of data today. • So – A top-down, prescriptive approach to standardization is not feasible. (Believe us; we know.) • Standardization emphasizing carrots instead of sticks: • Access to the CDA is a benefit for any node wanting to extend the reach of its data. • Approach data standardization as a practical mapping goal: “If you can place your model in the context of the CDA’s data maps, the CDA can query and serve your data” • Approach standardization as an iterative process: “Start with a high priority set of metadata, and expand mapping over time.”
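The iterative mapping idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the repository names, field names, and the `FIELD_MAPS` structure are hypothetical and are not taken from the CDA. Each repository keeps its own model, and a query sees only the fields mapped so far.

```python
from typing import Any

# Hypothetical per-repository field maps: local field name -> common field name.
# None of these names come from the CDA; they only illustrate the approach.
FIELD_MAPS: dict[str, dict[str, str]] = {
    "repo_a": {"case_id": "subject_id", "primary_site": "tissue"},
    "repo_b": {"PatientID": "subject_id"},  # "tissue" is not mapped yet
}


def to_common(repo: str, record: dict[str, Any]) -> dict[str, Any]:
    """Translate one repository record into common field names.

    Fields without a mapping are skipped, so a repository can start with
    a high-priority subset and extend its map over time.
    """
    field_map = FIELD_MAPS[repo]
    return {field_map[k]: v for k, v in record.items() if k in field_map}


def query(records_by_repo: dict[str, list[dict[str, Any]]],
          field: str, value: Any) -> list[tuple[str, dict[str, Any]]]:
    """Return (repository, translated record) pairs where field == value."""
    hits = []
    for repo, records in records_by_repo.items():
        for record in records:
            common = to_common(repo, record)
            if common.get(field) == value:
                hits.append((repo, common))
    return hits


records = {
    "repo_a": [{"case_id": "A1", "primary_site": "lung", "local_only": 1}],
    "repo_b": [{"PatientID": "B7", "BodyPart": "lung"}],
}
first_pass = query(records, "tissue", "lung")

# A later iteration adds one more mapping; no repository data changes.
FIELD_MAPS["repo_b"]["BodyPart"] = "tissue"
second_pass = query(records, "tissue", "lung")
```

In this sketch, adding a mapping entry is the only step needed to bring another repository's data into a query, which is the "carrot" the slide describes.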
  • 15. Graphs as a common language for expressing data models. Representing custom data models as graphs can provide: • a unified context for managing data and semantics, and • a framework for integrating data with minimal impact on repository operations. Creating graph versions of many kinds of data models is possible, since many popular modeling approaches find natural expression in the Property Graph:
      Property Graph | Relational Data             | OWL/RDF
      Node           | Table rows                  | Class
      Property       | Table columns/cells         | Datatype Property
      Relationship   | Foreign keys/Linking tables | Object Property
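The correspondence between relational data and the property graph (rows become nodes, columns become properties, foreign keys become relationships) can be sketched as follows. This is a minimal illustration, not part of the Bento tooling; the `Node` and `Relationship` classes, the `table_to_graph` helper, and the table and column names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str          # table name becomes the node label
    key: str            # "<table>:<primary key value>"
    props: dict = field(default_factory=dict)


@dataclass
class Relationship:
    type: str
    src: str            # key of the node holding the foreign key
    dst: str            # key of the referenced node


def table_to_graph(table, rows, pk, foreign_keys):
    """Convert table rows into property-graph nodes and relationships.

    foreign_keys maps a column name to (relationship type, target table).
    Non-key columns become node properties; each non-null foreign key
    becomes a directed relationship to the referenced row's node.
    """
    nodes, rels = [], []
    for row in rows:
        node_key = f"{table}:{row[pk]}"
        props = {c: v for c, v in row.items() if c not in foreign_keys}
        nodes.append(Node(label=table, key=node_key, props=props))
        for col, (rel_type, target) in foreign_keys.items():
            if row.get(col) is not None:
                rels.append(Relationship(rel_type, node_key, f"{target}:{row[col]}"))
    return nodes, rels


sample_rows = [
    {"id": 1, "sample_type": "blood", "case_id": 10},
    {"id": 2, "sample_type": "tissue", "case_id": None},
]
nodes, rels = table_to_graph(
    "sample", sample_rows, pk="id",
    foreign_keys={"case_id": ("of_case", "case")},
)
```

Linking tables would need one extra step (one relationship per linking row rather than per foreign-key column), which is omitted here to keep the sketch short.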
  • 16. Model Description Format (MDF): simple, iterative model recording and schematizing. MDF is a compact, human-readable, and computable format for defining a property graph: • Define Nodes: Node Properties • Define Relationships: Relationship Properties, Relationship Attributes • Define and Describe Properties: Property Attributes, including allowable value types or sets • https://github.com/CBIIT/bento-mdf
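To make the three parts of a model description concrete, here is a small Python sketch of an MDF-like structure held as a plain dictionary, with two checks that follow from the slides: every referenced property must be defined, and values are constrained to a defined type or value set. The key names (`Nodes`, `Relationships`, `PropDefinitions`, `Props`, `Ends`, `Src`, `Dst`, `Mul`, `Type`, `Enum`, `Req`) are modeled loosely on the MDF layout and should be checked against the bento-mdf repository; the model content and both helper functions are hypothetical.

```python
# An MDF-like model as a plain dict (real MDF is written as YAML files).
mdf = {
    "Nodes": {
        "case": {"Props": ["case_id", "disease"]},
        "sample": {"Props": ["sample_id", "sample_type"]},
    },
    "Relationships": {
        "of_case": {
            "Mul": "many_to_one",
            "Ends": [{"Src": "sample", "Dst": "case"}],
            "Props": None,
        },
    },
    "PropDefinitions": {
        "case_id": {"Type": "string", "Req": True},
        "disease": {"Type": "string"},
        "sample_id": {"Type": "string", "Req": True},
        "sample_type": {"Enum": ["blood", "tissue"]},
    },
}


def check_model(model):
    """Return a list of consistency problems found in the model."""
    problems = []
    nodes = model.get("Nodes", {})
    rels = model.get("Relationships", {})
    prop_defs = model.get("PropDefinitions", {})
    # Every property referenced by a node or relationship must be defined.
    for kind, owners in (("node", nodes), ("relationship", rels)):
        for name, spec in owners.items():
            for prop in (spec or {}).get("Props") or []:
                if prop not in prop_defs:
                    problems.append(f"{kind} {name}: property {prop} is not defined")
    # Every relationship end must name an existing node.
    for name, spec in rels.items():
        for end in (spec or {}).get("Ends", []):
            for role in ("Src", "Dst"):
                if end.get(role) not in nodes:
                    problems.append(f"relationship {name}: {role} {end.get(role)} is not a node")
    return problems


def value_allowed(model, prop, value):
    """Check a data value against the property's value set or simple type."""
    spec = model["PropDefinitions"][prop]
    if "Enum" in spec:
        return value in spec["Enum"]
    expected = {"string": str, "integer": int, "number": (int, float)}.get(spec.get("Type"))
    return expected is None or isinstance(value, expected)


problems = check_model(mdf)
```

Because the model is ordinary data, checks like these can run before any loader or user interface consumes it.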
  • 17. MDF is simple and standardized. In the Bento framework: • Data SMEs directly update MDF (in GitHub) to make model updates • Backend data loader and frontend user interfaces are configured directly by MDF • Philip Musk: "And let me tell you, with data needs driving many of ICDC's requirements as they are, and have been thus far, being able to both write the requirements, and make the required model changes ahead of engineers doing their thing, is really powerful. I don't have to explain what model changes we need to make to someone else - I can get the model changes done myself, and explain what we need the engineers and the UI to do with those changes." • [Diagram labels: SMEs, Engineering]
  • 18. Metamodel Database – the models as data • Practical principles towards a practical goal led us to practical tools, enabling: rapid prototypes and production tier commons (Integrated Canine Data Commons, Clinical Trial Data Commons); rapid prototypes for data modeling and model visualization (Cancer Data Service, Children’s Cancer Data Initiative) • New practical problem: management of multiple dynamic data models over independent projects: creating new models (component reuse?); managing acceptable value sets for many Properties in models; understanding interrelationships between models for mapping and interoperability
  • 19. Both data and model as property graphs [Figure: a Data graph and a Model ("Schema") graph, with node labels Person and Group]
  • 20. Metamodel Schema • Defines: Models; Nodes, Relationships, Properties; Origins, Terms, and Value Sets; Concepts and Predicates • Schema is represented in MDF: https://github.com/CBIIT/bento-meta/blob/master/metamodel.yaml
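Storing "the models as data" means that each node, relationship, and property of a model becomes an entity in its own right, tagged with the model it belongs to. The Python sketch below illustrates that idea with in-memory dictionaries. The link names `has_property`, `has_src`, and `has_dst` are assumptions about how such a metamodel could be wired; the models, keys, and helper functions are hypothetical and do not reproduce the bento-meta implementation.

```python
def model_to_entities(handle, nodes, relationships):
    """Turn one model into metamodel-style entities and links.

    nodes: node name -> list of property names
    relationships: relationship name -> (source node, destination node)
    """
    entities = {}   # key -> attributes of the entity
    links = []      # (source key, link type, destination key)
    for node_name, props in nodes.items():
        node_key = f"{handle}:node:{node_name}"
        entities[node_key] = {"label": "node", "handle": node_name, "model": handle}
        for prop in props:
            prop_key = f"{handle}:property:{node_name}.{prop}"
            entities[prop_key] = {"label": "property", "handle": prop, "model": handle}
            links.append((node_key, "has_property", prop_key))
    for rel_name, (src, dst) in relationships.items():
        rel_key = f"{handle}:relationship:{rel_name}"
        entities[rel_key] = {"label": "relationship", "handle": rel_name, "model": handle}
        links.append((rel_key, "has_src", f"{handle}:node:{src}"))
        links.append((rel_key, "has_dst", f"{handle}:node:{dst}"))
    return entities, links


def load_models(models):
    """Load several models into one store; the model handle keeps them apart."""
    all_entities, all_links = {}, []
    for handle, (nodes, relationships) in models.items():
        entities, links = model_to_entities(handle, nodes, relationships)
        all_entities.update(entities)
        all_links.extend(links)
    return all_entities, all_links


# Two hypothetical models that both define a "case" node.
models = {
    "model_a": ({"case": ["case_id"], "sample": ["sample_id"]},
                {"of_case": ("sample", "case")}),
    "model_b": ({"case": ["case_id", "arm"]}, {}),
}
entities, links = load_models(models)
```

Both `case` nodes coexist in the store because each entity carries its model handle, which is what allows several independent models to share one database.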
  • 21. Two models in a Metamodel DB (MDB): ICDC and CTDC
  • 22. MDB as a model repository and reference • In the simple context of Properties, Nodes, and Relationships, we have a functional repository for multiple graph models • Python packages move MDF into an MDB and create MDF from models in an MDB • Docker containers easily run a local MDB, or can provide an instantiated, loaded MDB (based directly on Neo4j Community server images) • Simple Terminology Server (STS) with MDB as backend: enables both GUI and API access to the models; model browsing and full-text search across all entities; STS is also intended to be easy to distribute and set up
  • 23. MDB as a cross-model tool. The MDB schema also defines entities for relating models to one another and to external authorities: • Concepts & Predicates (“semantics”) • Origins, Terms, & Value Sets (“terminology”) • Patterns for connecting these to model entities create separable “layers” that can be added or modified without disrupting the repository function.
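A separable concept layer can be pictured as a set of links from model properties to shared concepts, kept apart from the model definitions themselves. The following Python sketch groups properties from different models under a common concept, which is one way to approach the earlier question of whether two fields are really the same. All model names, property names, and concept identifiers here are hypothetical.

```python
from collections import defaultdict

# Hypothetical concept layer: (model, node, property) -> concept identifier.
# It lives outside the model definitions, so it can be added or edited
# without touching the stored models.
CONCEPT_LINKS = {
    ("model_a", "demographic", "sex"): "concept:biological_sex",
    ("model_b", "case", "gender"): "concept:biological_sex",
    ("model_a", "diagnosis", "disease_term"): "concept:diagnosis",
}


def cross_model_matches(links):
    """Group properties by concept and keep concepts shared by 2+ models."""
    by_concept = defaultdict(set)
    for prop, concept in links.items():
        by_concept[concept].add(prop)
    return {
        concept: props
        for concept, props in by_concept.items()
        if len({model for model, _node, _prop in props}) > 1
    }


matches = cross_model_matches(CONCEPT_LINKS)
```

Removing an entry from `CONCEPT_LINKS` changes only the cross-model view; the models themselves stay as they were, which mirrors the "layers" described on the slide.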
  • 25. MDB Philosophy: keys to its utility (https://cbiit.github.io/bento-meta/mdb-principles.html) • Dynamic: like data and data models • Pragmatic: not a repository of ultimate truth; a tool to help us provide value to NCI today • Friendly: communicates to humans and computers • Simple, but well-defined: not necessarily exhaustive or “complete” • Distributable: not necessarily “central” • A platform for “mutual understanding” of data
  • 26. Acknowledgements • Mark Benson, PhD • Phil Musk, PhD • Ming Ying, MS • Anjan Purkayastha, PhD • Ye Wu, PhD • Pat Dunn, PhD • Nelson Moore, MS • John Otridge, PhD