AstraZeneca at Neo4j GraphSummit London 14Nov23.pptx
1. GraphSummit | London | November 14th, 2023
Empowering AZ’s
Data Connectivity
Building an Internal Knowledge Graph Service
to foster Knowledge Graph projects, enhancing
Data Reusability with Federated Queries, and
harvesting LLM power to talk to the Graphs
Antonio Fabregat, PhD
Knowledge Graph Lead
Enterprise Data Office, IGNITE (AZ)
2. Presentation
Outline
• Revolutionizing Data Management
• Internal deployment of the Graph as a Service capability
• A view of AZ's current Knowledge Graph project landscape
• Query Federation across several Knowledge Graphs
• Harvesting LLM power to talk to the graphs
4. Tackling Data Growth with Knowledge Graphs
Rapid data growth from various sources
• Drug Discovery and Research Data
  • Genomic and proteomic data, high-throughput screening results, clinical trial data, etc.
• Real-World Evidence and Patient Data
  • Electronic health records (EHRs), wearable devices and remote monitoring, patient-reported outcomes, etc.
• And more…
Increasing need for efficient
• Data management
• Analysis tools
Knowledge Graphs as a solution
• Organize and structure data
• Facilitate easier access and analysis
5. Knowledge Graphs vs Traditional Data Management Systems
Traditional data management systems
• Struggle with complex relationships and semantics
• Limited in capturing meaning and context
Knowledge graph advantages
• Excel at representing relationships and semantics
• Understand meaning and context of data
• Enable more effective analysis and insights
6. AI and Machine Learning with Knowledge Graphs
Increasing use of AI and machine learning in Data Analysis
• Need for efficient data representation and processing
Knowledge Graphs as a Solution
• Enriched data model
• Enhance AI and machine learning algorithms' understanding
• Improve data analysis capabilities
7. Data Integration and Interconnectivity
Seamless integration of data from various sources
• Creation of a unified view of information
Overcoming Data Silos
• Interconnectivity of Knowledge Graphs
Increased value from Data Assets
• Enhanced Data Analysis and Insights
• Improved Decision-Making
8. Enhanced Search and Discovery
Improved Search and Discovery Capabilities
More intuitive Query Languages
Uncovering hidden relationships within data
Accurate and Relevant Search Results
Enhanced user experience
Increased productivity
9. Personalisation and Recommendation
Understanding user preferences, behaviour, and context
• Rich data representation and relationship mapping
Personalized and targeted recommendations
• Relevant content based on user interests
Increased user engagement
• Improved customer satisfaction and retention
10. Knowledge Graphs representation alternatives
A Knowledge Graph is a way of organizing data & information in the form of a graph
A collection of interlinked concepts, entities and events that represents a network of real-world entities and the relationships between them.
Any type of real-world information can be represented in a Knowledge Graph
Two ways of representing/storing a Knowledge Graph
RDF-star (Resource Description Framework)
Semantic Web: Good for common standards and data exchange
Data model based on 3 parts: subject, predicate and object
Nodes’ properties added as predicates. Edges with properties are “triple-resources” (like “meta-nodes”)
Storage: “Triple/Quad Stores” Graph Databases
18 nodes (5 instances, 4 classes, 8 literals, 1 triple-resource)
19 relationships (triples)
LPG (Labelled-Property Graph)
Good for highly dynamic, transactional use cases
Data organized as nodes, labels, relationships and properties
Both nodes and edges can have properties
Storage: Native Graph Databases
5 nodes (5 ids, 4 Labels, 8 properties)
4 relationships (2 properties)
* Adapted from documentation at W3C https://www.w3.org/
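To make the difference between the two representations concrete, here is a minimal Python sketch. It is not taken from the slides: the `Node` and `Relationship` classes, the `lpg_to_triples` helper and the example data (`drug1`, `gene1`, `TARGETS`) are all hypothetical, and the mapping is a simplification rather than the official RDF-star model or any library's converter. It only illustrates why one small labelled-property graph turns into a larger set of triples, with an edge property attached to a "triple-resource".

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """LPG node: an id, one or more labels, and key/value properties."""
    id: str
    labels: list
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """LPG relationship: typed, directed, and able to carry properties."""
    start: str
    type: str
    end: str
    properties: dict = field(default_factory=dict)


def lpg_to_triples(nodes, relationships):
    """Flatten an LPG into RDF-star-like (subject, predicate, object) tuples.

    Simplified, hypothetical mapping used only for illustration.
    """
    triples = []
    for node in nodes:
        # Each label becomes a class-membership triple.
        for label in node.labels:
            triples.append((node.id, "rdf:type", label))
        # Each node property becomes a triple whose object is a literal.
        for key, value in node.properties.items():
            triples.append((node.id, key, value))
    for rel in relationships:
        edge = (rel.start, rel.type, rel.end)
        triples.append(edge)
        # Edge properties use the whole edge triple as their subject,
        # which is the "triple-resource" idea from RDF-star.
        for key, value in rel.properties.items():
            triples.append((edge, key, value))
    return triples


# Hypothetical example data: 2 nodes and 1 relationship with 1 property.
nodes = [
    Node("drug1", ["Drug"], {"name": "Drug One"}),
    Node("gene1", ["Gene"], {"symbol": "G1"}),
]
relationships = [
    Relationship("drug1", "TARGETS", "gene1", {"evidence": "in vitro"}),
]

triples = lpg_to_triples(nodes, relationships)
for triple in triples:
    print(triple)
```

In this toy case, two nodes and one relationship in the LPG view become six triples in the triple view, which mirrors the node and relationship counts quoted for the two representations above.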
12. Why Knowledge Graphs? And why a Service?
Growing importance of knowledge graphs
• Data management and analysis
• Overcoming data silos and integration challenges
Need for efficient and reliable services
• Hosting and development support for knowledge graphs
• Robust and scalable solutions
• Enhanced data-driven decision-making
Benefits for businesses and organizations
• Improved data accessibility and insights
• Streamlined collaboration and innovation
13. A view of AZ's Knowledge Graph Project Landscape
14. Biology | Market Strategy | Logistics | Environmental targets
Biological Insights Knowledge Graph
Graph machine learning to help scientists make faster & better drug discovery decisions
Competitive Intelligence Knowledge Graph
One-stop-shop for competitive intelligence, transforming a manual system into a rich service
Supply Chain Knowledge Graph
Insights into the company’s supply chain, streamlining processes to enhance decision-making
Sustainability Initiative
Decision-making support system aiming to reduce the company’s carbon footprint
15. Compounds
Compounds Synthesis & Management (CSMKG)
Combines several databases
Transforms operational data into business insights to drive continuous improvements in storage, logistics and delivery
High Throughput Screening (HTSKG)
Contains >£45 million worth of data
Increases the quality and efficiency of future HTS screens
Compounds & Fragments (CFKG)
Creates a view of the chemical space like that of a medicinal or computational chemist.
Contains all internal and selected external libraries and allows users to modify a search and receive feedback ‘live’
16. PharmaSci
Formulation Knowledge Graph
Pre-clinical formulation design process
Leading to quicker, more effective scientific developments
Boston Formulation Knowledge Graph
Improves the understanding of our data
Enhances collaboration by breaking down silos and connecting disparate data sources
Lipid Nano Particles Knowledge Graph
Machine learning models
Predicts in-vivo activity from in-vitro data for intra-cellular drug delivery and LNP formulation design
19. Let’s build bridges to connect “siloes” of interest…
Query federation describes a collection of features that enable users and systems to run queries against multiple siloed data sources without needing to migrate all data to a unified system.
Federated Queries are these BRIDGES
20. Let’s build bridges to connect “siloes” of interest…
The diagram shows the resulting subgraph for the federated query that answers the question “Find all genes in BIKG linked with a specific disease, and then all trials in CIKG that are testing drugs targeting those genes”
Biological Insights Knowledge Graph (BIKG)
Competitive Intelligence Knowledge Graph (CIKG)
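The two-step shape of that federated question can be sketched in plain Python. The slides do not state which federation mechanism or schema is used, so everything here is an assumption made for illustration: the two in-memory lists stand in for the separately hosted BIKG and CIKG graphs, and the gene, disease, drug and trial identifiers are invented. The point is only that each step runs against a single source and that the intermediate result (the gene set) is what crosses the "bridge".

```python
# Hypothetical stand-in for BIKG: (gene, disease) links.
BIKG_GENE_DISEASE = [
    ("GENE_A", "disease_1"),
    ("GENE_B", "disease_1"),
    ("GENE_C", "disease_2"),
]

# Hypothetical stand-in for CIKG: (trial, drug, targeted gene) facts.
CIKG_TRIAL_DRUG_TARGET = [
    ("TRIAL_001", "drug_x", "GENE_A"),
    ("TRIAL_002", "drug_y", "GENE_C"),
    ("TRIAL_003", "drug_z", "GENE_B"),
]


def genes_for_disease(disease):
    """Step 1: query the first source only (BIKG stand-in)."""
    return {gene for gene, linked in BIKG_GENE_DISEASE if linked == disease}


def trials_targeting(genes):
    """Step 2: query the second source only (CIKG stand-in)."""
    return sorted(
        (trial, drug, gene)
        for trial, drug, gene in CIKG_TRIAL_DRUG_TARGET
        if gene in genes
    )


def federated_trials(disease):
    """Chain the two steps; no data is copied into a unified store."""
    genes = genes_for_disease(disease)
    return trials_targeting(genes)


for row in federated_trials("disease_1"):
    print(row)
```

In a real deployment each step would be a query sent to the corresponding graph database rather than a list comprehension, but the data flow would follow the same pattern.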
23. Acknowledgments
• Aaron Holt
• Nicolas Mervaillie
• Joe Depeau
• Job Maelane
• Yuen Leung Tang
• Jesus Barrasa
• Daniel Addison
• Delyan Ivanov
• Suzy Jones
• Wolfgang Klute
• Michael Lainchbury
• Andriy Nikolov
• Nishank Mahore
• Cristina Mihetiu
• Justin Morley
• Michaël Ughetto
• Lauren Eardley
• Karen Roberts
• Anthony Puleo
• Cinthia Willaman
• Ivan Figueroa
• Carlos Mercado
• Jorge Gutierrez
• Koushik Srinivasan
24. Enterprise Data Office | IGNITE
Enterprise Knowledge Graph Service
Robert Hernandez
Knowledge Engineering Lead
Sandra Carrasco
Senior Knowledge Graph Engineer
Antonio Fabregat
Knowledge Graph Lead
Ronnie Mubayiwa
Senior DevOps Engineer
Varun Bhandary
Senior Solution Architect
Sree Balasubramanyam
Senior IT Project Manager
Vishal Kumar
DevOps Engineer
Preetha Mutharasu
Knowledge Graph Engineer
Prem Oliver Vincent
Scrum Master
Andy Stafford-Hughes
Testing Manager
Umapathy Boopathy
Cloud Solution Architect
Pascual Lorente
Senior Knowledge Graph Engineer
Editor's Notes
The rapid growth of data generated from various sources, including IoT devices, social media, and business applications, has led to an increasing need for efficient data management and analysis tools.
Knowledge graphs help to organise and structure this vast amount of data, making it easier to access and analyse.
Traditional data management systems often struggle to capture the complex relationships and semantics within data.
Knowledge graphs excel at representing and understanding the meaning and context of data, allowing for more effective analysis and insights.
The increasing use of AI and machine learning in data analysis requires more efficient ways to represent and process data.
Knowledge graphs provide an enriched data model that allows AI and machine learning algorithms to better understand and analyse data.
Knowledge graphs enable seamless integration of data from various sources, creating a unified view of information.
This interconnectivity helps overcome data silos, allowing organizations to derive more value from their data assets.
Knowledge graphs improve search and discovery capabilities by enabling more intuitive query languages and uncovering hidden relationships within data.
This results in more accurate and relevant search results, enhancing user experience and productivity.
Knowledge graphs support personalized and targeted recommendations by understanding user preferences, behaviour, and the context of their interactions.
This leads to more relevant recommendations and increased user engagement.
Knowledge graphs are growing in importance for data management and analysis.
There is a need for an efficient and reliable service to support both the hosting and the development of knowledge graphs.
AZ is investing in this area to create a robust and scalable service.
When we talk about multiple siloed databases, we could imagine an archipelago. At first glance, visiting all the islands doesn't seem an easy task!
With the right infrastructure, multiple islands can be connected, and visiting them suddenly becomes much easier. Federating queries across siloed databases is like building bridges between islands.
This allows running queries against multiple siloed data sources, without needing to migrate all data to a unified system.