(Introduce ourselves, then Northrop Grumman)
Northrop Grumman is a Fortune 500 aerospace and defense technology company with over 90,000 employees.
You may not have heard of us since our products are generally sold to governments, not individuals.
Within Northrop, we are part of the Digital Transformation Organization, which focuses on improving business and engineering processes through modern software systems and data science practices.
As you might have guessed by our presence at this conference, we often use graphs to accomplish this.
Solving complex engineering and business problems generally requires varied data spread across multiple applications.
Many projects at Northrop Grumman are complex to begin with, and unnecessary complexity gets added because data is spread across so many systems.
As we began looking at this problem last summer, we found that a common thought pattern was to just shuttle data from one application to another.
As we’ll see in the subsequent slides, this leads to many problems.
We realized that, as people sought solutions, they had developed various fragmented approaches to dealing with this data management challenge.
This desire to simply shuttle data between systems without a unified approach leads to sub-optimal solutions.
While manually entering data from one system into another isn’t common, we did find instances where this was the data management approach. Obviously, this is labor-intensive, error-prone, and generally just a bad idea.
Manually exporting the data from one system and importing it into another is a slight improvement over manual entry. However, it still requires a human to export/import data between systems, and many systems do not allow data from another system to be cleanly imported.
To remove the human-in-the-loop aspect of the previous approaches, people will often write some sort of script to automatically move data from one system to another. While this does improve on manual management, it requires an additional shuttling script for each pair of systems that needs to share data.
You’ve probably watched people work their way up through this sequence of approaches if your company is anything like ours.
While exposing data via APIs is certainly preferable, we found that it wasn’t always possible at Northrop to just provide API access to a set of data. Sometimes, due to functionality in a particular app, a contract requirement, or another constraint, data from multiple systems MUST end up in a destination system.
If all you’ve done is lifted and shifted your data, you’ve missed the point
Shuttling data between applications causes data duplication
This can cause confusion as to which duplicate is to be trusted if they don’t match
Shuttling data between applications causes synchronization issues
Time consuming to create connections between each pair of applications
If data is certified for a use case and is later updated, the certification must be re-reviewed
Legacy applications lack interfaces with modern applications and vice versa
Solutions often structurally reflect the problem they are designed to solve
Some claim the structure of the CIA reflects the structure of the Soviet Union (a brain/leader at the top, siloed organizations sending data up to the top and receiving decisions back)
Kraken has many arms connected to a central hub, which reflects the many-armed nature of the problem
Prototype proposal
As this problem seemed very graph-y, we submitted a proposal to build a prototype graph-based master data management solution
Received funding and built the prototype
Mention briefly, point people to the ODD talk
TODO: put GRAND stack logos
Each system becomes a “tentacle” that plugs into the central hub
Instead of creating connections between each pair of systems, each system just needs to connect into Kraken.
Once a tentacle is connected to Kraken, the data in that tentacle can subscribe to data in any other tentacle. Conversely, each other tentacle can subscribe to the data in the newly connected tentacle.
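As a rough sketch only (not Kraken’s actual schema), here is what registering a tentacle and creating a cross-tentacle subscription could look like with the official Neo4j JavaScript driver; the node labels, relationship types, and helper names below are illustrative assumptions.

    import neo4j from "neo4j-driver";

    // Hypothetical graph model for illustration only:
    //   (:Tentacle)-[:FEEDS]->(:Hub {name: "Kraken"})
    //   (subscriber:DataItem)-[:SUBSCRIBES_TO]->(publisher:DataItem)
    const driver = neo4j.driver(
      "bolt://localhost:7687",
      neo4j.auth.basic("neo4j", "password")
    );

    // Register a new source system as a tentacle plugged into the central hub.
    async function registerTentacle(name: string): Promise<void> {
      const session = driver.session();
      try {
        await session.run(
          `MERGE (hub:Hub {name: "Kraken"})
           MERGE (t:Tentacle {name: $name})
           MERGE (t)-[:FEEDS]->(hub)`,
          { name }
        );
      } finally {
        await session.close();
      }
    }

    // Let a data item in one tentacle subscribe to a data item in another.
    async function subscribe(subscriberId: string, publisherId: string): Promise<void> {
      const session = driver.session();
      try {
        await session.run(
          `MATCH (sub:DataItem {id: $subscriberId})
           MATCH (pub:DataItem {id: $publisherId})
           MERGE (sub)-[:SUBSCRIBES_TO]->(pub)`,
          { subscriberId, publisherId }
        );
      } finally {
        await session.close();
      }
    }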
We divided the problem up into various domains, each of which represents a small part of the overall solution.
For each tentacle, we identified the need for a source domain, capturing information about where the data initially comes from.
Then, we layered on structure, based on how the source application organizes its data.
Then, we layered on semantics, which add more knowledge about what things relate to each other and what they mean to the user.
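A hedged sketch of how those three layers might hang off a single item in the graph; the labels and relationship types are illustrative assumptions, not the real Kraken domain model, and the code reuses the driver from the previous sketch.

    // Record one item with its source, structural, and semantic layers.
    // All labels and relationship types here are assumptions for illustration.
    // (Reuses the neo4j driver set up in the previous sketch.)
    async function ingestItem(itemId: string, systemName: string): Promise<void> {
      const session = driver.session();
      try {
        await session.run(
          `MERGE (item:DataItem {id: $itemId})
           // source domain: where the data originally comes from
           MERGE (src:SourceSystem {name: $systemName})
           MERGE (item)-[:FROM_SOURCE]->(src)
           // structure domain: how the source application organizes itself
           MERGE (proj:Project {key: 'DEMO'})
           MERGE (item)-[:BELONGS_TO]->(proj)
           // semantics domain: what the item means to the user
           MERGE (concept:Concept {name: 'Task'})
           MERGE (item)-[:REPRESENTS]->(concept)`,
          { itemId, systemName }
        );
      } finally {
        await session.close();
      }
    }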
When you do need data to be synchronized between systems, creating a publisher-subscriber structure is an easy way to allow one item to “listen” to another, notifying the user when a change occurs
Our data management application would ideally provide several key functionalities (see the schema sketch after this list)
Allow the user to create subscriptions (publisher-subscriber model) between specific data items
Provide an auditable history of changes for key data items
Allow data to be certified for a use case
Provide an auditable history of certifications for data items
Notify the user when certified data has been modified
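A hedged sketch of a GraphQL schema covering those capabilities, in the GRAND-stack spirit; the type and field names are assumptions for illustration, not Kraken’s actual API.

    // Illustrative GraphQL SDL only; type and field names are assumptions.
    export const typeDefs = /* GraphQL */ `
      type Change {
        at: String!
        oldValue: String
        newValue: String
        changedBy: String
      }

      type Certification {
        useCase: String!
        certifiedBy: String!
        at: String!
        staleSinceCertified: Boolean   # has certified data been modified since?
      }

      type DataItem {
        id: ID!
        value: String
        subscribesTo: [DataItem!]      # publisher-subscriber links
        changes: [Change!]             # auditable change history
        certifications: [Certification!]
      }

      type Mutation {
        subscribe(subscriberId: ID!, publisherId: ID!): DataItem
        certify(itemId: ID!, useCase: String!): Certification
      }
    `;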
While working on this problem, we discovered that, depending on whom we talked to, people described ASOTs (authoritative sources of truth) in one of these two ways.
A digital thread is a hot topic in the defense industry.
It allows tracking data lineage and auditing changes over time.
The publisher-subscriber model and history of changes for an individual atom create, from Kraken’s point of view, a digital thread.
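For illustration only, using the same made-up labels as the earlier sketches, the thread for one item could be walked with a query along these lines:

    // Walk the digital thread for one item: every upstream publisher it
    // (directly or transitively) subscribes to, plus its own change history.
    // Labels and relationships are the same illustrative assumptions as above.
    async function digitalThread(itemId: string) {
      const session = driver.session();
      try {
        const result = await session.run(
          `MATCH (item:DataItem {id: $itemId})
           OPTIONAL MATCH (item)-[:SUBSCRIBES_TO*1..]->(upstream:DataItem)
           OPTIONAL MATCH (item)-[:HAS_CHANGE]->(change:Change)
           RETURN item,
                  collect(DISTINCT upstream) AS lineage,
                  collect(DISTINCT change) AS history`,
          { itemId }
        );
        return result.records[0];
      } finally {
        await session.close();
      }
    }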
Fabricated data in the same structure as Tasks (Activities) in Jira and Activities in Primavera
Made using fabricated data.
Jira is an Atlassian product to track progress on tasking, among other things.
Primavera is an Oracle product to manage project portfolios.
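A hedged sketch of what the fabricated demo records might look like; the field names loosely mirror a Jira task and a Primavera activity, and none of the values are real program data.

    // Fabricated demo records only; shapes loosely mirror a Jira task and a
    // Primavera activity, with invented field names and values.
    interface DemoTask {
      key: string;          // Jira-style issue key
      summary: string;
      status: string;
    }

    interface DemoActivity {
      activityId: string;   // Primavera-style activity id
      name: string;
      percentComplete: number;
    }

    const jiraTask: DemoTask = {
      key: "DEMO-42",
      summary: "Draft interface spec",
      status: "In Progress",
    };

    const p6Activity: DemoActivity = {
      activityId: "A1010",
      name: "Draft interface spec",
      percentComplete: 30,
    };

    // In the demo, the Primavera activity could subscribe to the Jira task:
    // await subscribe(p6Activity.activityId, jiraTask.key);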