(Introduce ourselves, then Northrop Grumman)
Northrop Grumman is a Fortune 500 aerospace and defense technology company with over 90,000 employees.
You may not have heard of us since our products are generally sold to governments, not individuals.
Within Northrop, we are part of the Digital Transformation Organization, which focuses on improving business and engineering processes through modern software systems and data science practices.
As you might have guessed by our presence at this conference, we often use graphs to accomplish this.
Solving complex engineering and business problems generally requires varied data spread across multiple applications.
Many projects at Northrop Grumman are complex to begin with, and unnecessary complexity gets added because data is spread across so many systems.
As we began looking at this problem last summer, we found that a common thought pattern was to just shuttle data from one application to another.
As we’ll see in the subsequent slides, this leads to many problems.
We realized that, as people sought solutions, they had developed various fragmented approaches to dealing with this data management challenge.
This desire to simply shuttle data between systems without a unified approach leads to sub-optimal solutions.
While manually entering data from one system into another isn’t common, we did find instances where this was the data management approach. Obviously, this is labor-intensive, error-prone, and generally just a bad idea.
Manually exporting the data from one system and importing it into another is a slight improvement over manual entry. However, it still requires a human to export/import data between systems, and many systems do not allow data from another system to be cleanly imported.
To remove the human-in-the-loop aspect of the previous approaches, people will often write some sort of script to automatically move data from one system to another. While this does improve on manual management, it requires an additional shuttling script for each pair of systems that needs to share data.
You’ve probably watched people work their way up through this sequence of approaches if your company is anything like ours.
While exposing data via APIs is certainly preferable, we found that it wasn’t always possible at Northrop to just provide API access to a set of data. Sometimes, due to functionality in a particular app, a contract requirement, or another constraint, data from multiple systems MUST end up in a destination system.
If all you’ve done is lifted and shifted your data, you’ve missed the point
Shuttling data between applications causes data duplication
This can cause confusion as to which duplicate is to be trusted if they don’t match
Shuttling data between applications causes synchronization issues
Time consuming to create connections between each pair of applications
If data is certified for a use case and is later updated, the certification must be re-reviewed
Legacy applications lack interfaces with modern applications and vice versa
Solutions often structurally reflect the problem they are designed to solve
Some claim the structure of the CIA reflects the structure of the Soviet Union (a brain/leader at the top, siloed organizations sending data up to the top and receiving decisions back)
Kraken has many arms connected to a central hub, which reflects the many-armed nature of the problem
Prototype proposal
As this problem seemed very graph-y, we submitted a proposal to build a prototype graph-based master data management solution
Received funding and built the prototype
Mention briefly, point people to the ODD talk
TODO: put GRAND stack logos
Each system becomes a “tentacle” that plugs into the central hub
Instead of creating connections between each pair of systems, each system just needs to connect into Kraken.
Once a tentacle is connected to Kraken, the data in that tentacle can subscribe to data in any other tentacle. Conversely, each other tentacle can subscribe to the data in the newly connected tentacle.
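As a rough sketch only (not Kraken’s actual schema), here is what registering a tentacle and creating a cross-tentacle subscription could look like with the official Neo4j JavaScript driver; the node labels, relationship types, and helper names below are illustrative assumptions.

    import neo4j from "neo4j-driver";

    // Hypothetical graph model for illustration only:
    //   (:Tentacle)-[:FEEDS]->(:Hub {name: "Kraken"})
    //   (subscriber:DataItem)-[:SUBSCRIBES_TO]->(publisher:DataItem)
    const driver = neo4j.driver(
      "bolt://localhost:7687",
      neo4j.auth.basic("neo4j", "password")
    );

    // Register a new source system as a tentacle plugged into the central hub.
    async function registerTentacle(name: string): Promise<void> {
      const session = driver.session();
      try {
        await session.run(
          `MERGE (hub:Hub {name: "Kraken"})
           MERGE (t:Tentacle {name: $name})
           MERGE (t)-[:FEEDS]->(hub)`,
          { name }
        );
      } finally {
        await session.close();
      }
    }

    // Let a data item in one tentacle subscribe to a data item in another.
    async function subscribe(subscriberId: string, publisherId: string): Promise<void> {
      const session = driver.session();
      try {
        await session.run(
          `MATCH (sub:DataItem {id: $subscriberId})
           MATCH (pub:DataItem {id: $publisherId})
           MERGE (sub)-[:SUBSCRIBES_TO]->(pub)`,
          { subscriberId, publisherId }
        );
      } finally {
        await session.close();
      }
    }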
We divided the problem up into various domains, each of which represents a small part of the overall solution.
For each tentacle, we identified the need for a source domain, capturing information about where the data initially comes from.
Then, we layered on structure, based on how the source application organizes its data.
Then, we layered on semantics, which add more knowledge about what things relate to each other and what they mean to the user.
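A hedged sketch of how those three layers might hang off a single item in the graph; the labels and relationship types are illustrative assumptions, not the real Kraken domain model, and the code reuses the driver from the previous sketch.

    // Record one item with its source, structural, and semantic layers.
    // All labels and relationship types here are assumptions for illustration.
    // (Reuses the neo4j driver set up in the previous sketch.)
    async function ingestItem(itemId: string, systemName: string): Promise<void> {
      const session = driver.session();
      try {
        await session.run(
          `MERGE (item:DataItem {id: $itemId})
           // source domain: where the data originally comes from
           MERGE (src:SourceSystem {name: $systemName})
           MERGE (item)-[:FROM_SOURCE]->(src)
           // structure domain: how the source application organizes itself
           MERGE (proj:Project {key: 'DEMO'})
           MERGE (item)-[:BELONGS_TO]->(proj)
           // semantics domain: what the item means to the user
           MERGE (concept:Concept {name: 'Task'})
           MERGE (item)-[:REPRESENTS]->(concept)`,
          { itemId, systemName }
        );
      } finally {
        await session.close();
      }
    }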
When you do need data to be synchronized between systems, creating a publisher-subscriber structure is an easy way to allow one item to “listen” to another, notifying the user when a change occurs
Our data management application would ideally provide several key functionalities (see the schema sketch after this list)
Allow the user to create subscriptions (publisher-subscriber model) between specific data items
Provide an auditable history of changes for key data items
Allow data to be certified for a use case
Provide an auditable history of certifications for data items
Notify the user when certified data has been modified
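A hedged sketch of a GraphQL schema covering those capabilities, in the GRAND-stack spirit; the type and field names are assumptions for illustration, not Kraken’s actual API.

    // Illustrative GraphQL SDL only; type and field names are assumptions.
    export const typeDefs = /* GraphQL */ `
      type Change {
        at: String!
        oldValue: String
        newValue: String
        changedBy: String
      }

      type Certification {
        useCase: String!
        certifiedBy: String!
        at: String!
        staleSinceCertified: Boolean   # has certified data been modified since?
      }

      type DataItem {
        id: ID!
        value: String
        subscribesTo: [DataItem!]      # publisher-subscriber links
        changes: [Change!]             # auditable change history
        certifications: [Certification!]
      }

      type Mutation {
        subscribe(subscriberId: ID!, publisherId: ID!): DataItem
        certify(itemId: ID!, useCase: String!): Certification
      }
    `;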
While working on this problem, we discovered that, depending on whom we talked to, people described ASOTs (authoritative sources of truth) in one of these two ways.
A digital thread is a hot topic in the defense industry.
It allows tracking data lineage and auditing changes over time.
The publisher-subscriber model and history of changes for an individual atom create, from Kraken’s point of view, a digital thread.
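For illustration only, using the same made-up labels as the earlier sketches, the thread for one item could be walked with a query along these lines:

    // Walk the digital thread for one item: every upstream publisher it
    // (directly or transitively) subscribes to, plus its own change history.
    // Labels and relationships are the same illustrative assumptions as above.
    async function digitalThread(itemId: string) {
      const session = driver.session();
      try {
        const result = await session.run(
          `MATCH (item:DataItem {id: $itemId})
           OPTIONAL MATCH (item)-[:SUBSCRIBES_TO*1..]->(upstream:DataItem)
           OPTIONAL MATCH (item)-[:HAS_CHANGE]->(change:Change)
           RETURN item,
                  collect(DISTINCT upstream) AS lineage,
                  collect(DISTINCT change) AS history`,
          { itemId }
        );
        return result.records[0];
      } finally {
        await session.close();
      }
    }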
Fabricated data in the same structure as Tasks (Activities) in Jira and Activities in Primavera
Made using fabricated data.
Jira is an Atlassian product to track progress on tasking, among other things.
Primavera is an Oracle product to manage project portfolios.
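A hedged sketch of what the fabricated demo records might look like; the field names loosely mirror a Jira task and a Primavera activity, and none of the values are real program data.

    // Fabricated demo records only; shapes loosely mirror a Jira task and a
    // Primavera activity, with invented field names and values.
    interface DemoTask {
      key: string;          // Jira-style issue key
      summary: string;
      status: string;
    }

    interface DemoActivity {
      activityId: string;   // Primavera-style activity id
      name: string;
      percentComplete: number;
    }

    const jiraTask: DemoTask = {
      key: "DEMO-42",
      summary: "Draft interface spec",
      status: "In Progress",
    };

    const p6Activity: DemoActivity = {
      activityId: "A1010",
      name: "Draft interface spec",
      percentComplete: 30,
    };

    // In the demo, the Primavera activity could subscribe to the Jira task:
    // await subscribe(p6Activity.activityId, jiraTask.key);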