Unifying Space Mission Knowledge with NLP & KGs

1
ESA UNCLASSIFIED – For ESA Official Use Only
Unifying Space Mission Knowledge
with NLP & Knowledge Graphs
Dr. Audrey Berquand & Ana Victoria Ladeira
23/08/2022
Disclaimer: The views and opinions expressed in this
presentation are those of the speakers and do not
necessarily reflect the views and position of ESA.

2
Context – Mission Analysis and Design
Example challenge:
ESA needs a more effective way of detecting and monitoring potentially dangerous wildfires
DETERMINE:
• Mission Operations element
• Spacecraft element
• Orbit and Trajectories
• Launch element
• Instrumentation & Payloads

3
Context – Missions Variety
Cassini-Huygens
• Powered by nuclear fuel
• 6.8m x 4m
• Instruments for cosmic dust analysis, plasma
spectrometry, visible and ultraviolet light
• Included lander to land on Titan, one of Saturn’s
moons
• 7 years en route, 13 years orbiting Saturn
CubeSats
• 2kg max
• 10cm side cube
• Uses mostly off-the-shelf components
• Generates around 10W of power for instruments
• Very short lived – 1 to 3 years
Specifications of Space Missions can vary wildly!

4
Context – Increasing number of missions
An increasing number of space missions is
being designed and launched.
Each mission comes with a large data bundle of
reports, presentations, ...
© ESA/CNES/Arianespace/Optique Video du CSG/S Martin
Problem:
How do experts keep up with all this data,
without losing sight of the Bigger Picture?

5
Our vision
From scattered and heterogeneous
information...
...To a structured and unified view of
the space ecosystem

6
Summary
I. Background
• Language Models
• LMs applications
• Knowledge Graphs & Space
II. Approach
• A manually defined schema
• Population with the GPT-3 model
III. Corpus
IV. Results
• Our Knowledge Graphs
• Demo & Use Cases
V. Discussion

7
BACKGROUND

8
Background – Language Models
A Language Model is a probability distribution over words or word sequences.
"In space no one can hear you …"
Scream 0.89
Speak 0.75
…
Octopus 0.01
Why should you care about LMs?
• Pretrained LMs in Natural Language Processing (NLP) have pushed the limits of language
understanding and generation.
• They have become a main trend in NLP research
• Famous LMs include BERT, RoBERTa, T5, GPT-2, GPT-3, …
even SpaceRoBERTa and CosmicRoBERTa
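
To make the definition above concrete, here is a minimal sketch of a language model as a conditional probability distribution over the next word. It is a toy bigram model built from word counts, not how GPT-3 or BERT work internally, and the three-sentence corpus is invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count word pairs and convert the counts into P(next word | previous word)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    model = {}
    for prev, followers in counts.items():
        total = sum(followers.values())
        model[prev] = {word: n / total for word, n in followers.items()}
    return model


# Toy corpus, invented for illustration only.
corpus = [
    "in space no one can hear you scream",
    "in space no one can hear you speak",
    "in space no one can hear you scream",
]
model = train_bigram_model(corpus)

# Distribution over the word that follows "you"; the probabilities sum to 1.
next_word_probs = model["you"]
for word, prob in sorted(next_word_probs.items(), key=lambda item: -item[1]):
    print(f"{word}: {prob:.2f}")
```

Pretrained LMs replace the count table with a neural network conditioned on a much longer context, but the output is still a distribution over candidate next words.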

9
Background – LMs Applications in your daily life
Code generation: Debuild
Content generation: Headlime.com
Semantic search: Casetext.com

10
Background – LMs Applications Examples
1. Text Summarisation:
2. Text Generation:
3. Text Parser:
* These examples were all generated through the OpenAI playground

11
Background – KGs & Space
• "From Engineering Models To Knowledge Graph: Delivering New Insights Into Models" migrates engineering models of CubeSats to a KG
• NASA’s Space Talent KG helps them find the right space expertise for their projects
• NASA’s Lessons Learned KG combines NLP and KG to categorize their lessons
12
APPROACH

14
Approach - KG Schema
Manually defined to reflect both the technical and
economic characteristics of missions.
Visualised with TypeDB Studio

15
Approach – KG Population
1. Each mission description is parsed with the Davinci engine of the GPT-3 model, through the OpenAI API
*In reality, we use the Few-Shot Learning approach, and we have 16 parameters parsed.
16 Parameters Parsed:
Entity       Attribute                                    Example
Mission      missionName                                  BIOMASS
             missionStatus                                Operational
             program                                      Earth Explorer
             objectives                                   determine the amount of biomass and carbon stored in forests
             launchDate                                   08/2023
             endOfLife                                    10/2030
Stakeholder  StakeholderName (agency, prime contractor)   ESA, NASA, Thales, Airbus, ...
Instrument   instrumentName                               P-band Synthetic Aperture Radar
             instrumentStatus                             Operational
             instrumentType                               Hyperspectral camera, SAR
             measurementApp                               measure forest biomass
Orbit        orbitType                                    Sun-synchronous
             orbitInclination                             98 deg
             orbitAltitude                                660 km
             orbitRepeatCycle                             3 days
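
The few-shot parsing step can be sketched as follows. This is only an illustration of the idea: the prompt wording, the single worked example, the three-attribute subset, and the helper names `build_prompt` and `parse_completion` are all assumptions, since the actual prompts are not reproduced in the slides. The call to the OpenAI API is deliberately left out; the sketch covers building the prompt and reading the answer back.

```python
# Hypothetical few-shot example; the real prompts are not shown in the slides.
FEW_SHOT_EXAMPLES = [
    {
        "text": "BIOMASS is an Earth Explorer mission flying in a Sun-synchronous orbit.",
        "parsed": {
            "missionName": "BIOMASS",
            "program": "Earth Explorer",
            "orbitType": "Sun-synchronous",
        },
    },
]
# Subset of the 16 attributes, kept short for readability.
ATTRIBUTES = ["missionName", "program", "orbitType"]


def build_prompt(description, examples=FEW_SHOT_EXAMPLES, attributes=ATTRIBUTES):
    """Assemble instructions, worked examples, and the new description."""
    parts = ["Extract the following attributes from the mission description."]
    for ex in examples:
        parts.append("Description: " + ex["text"])
        for attr in attributes:
            parts.append(f"{attr}: {ex['parsed'].get(attr, 'unknown')}")
        parts.append("")
    parts.append("Description: " + description)
    # End on the first attribute name so the model continues with its value.
    parts.append(f"{attributes[0]}:")
    return "\n".join(parts)


def parse_completion(completion, attributes=ATTRIBUTES):
    """Turn the 'attribute: value' lines returned by the model into a dict."""
    text = f"{attributes[0]}:" + completion
    parsed = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in attributes:
            parsed[key.strip()] = value.strip()
    return parsed


prompt = build_prompt("DemoSat is a technology mission in a polar orbit.")
# A completion the model might return (hand-written here, not real model output).
fake_completion = " DemoSat\nprogram: unknown\norbitType: polar"
print(parse_completion(fake_completion))
```

Ending the prompt on the first attribute name nudges the model to answer in the same `attribute: value` layout as the examples, which keeps the parsing step trivial.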

16
Approach – KG Population
2. The parsed outputs are saved into a JSON file (the format is shown on the slide).
3. Using the Vaticle Python API, insert queries are
generated to populate the KG
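
Generating insert queries from the parsed JSON can be sketched as below. The JSON record, the entity label `mission`, and the choice to treat every attribute as a string are assumptions; the real file layout and schema value types are not reproduced here. The sketch only builds the TypeQL string and leaves out the database connection.

```python
import json


def escape(value):
    """Escape backslashes and double quotes for a TypeQL string literal."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"')


def build_insert_query(entity_type, attributes):
    """Build one TypeQL insert statement for an entity with string attributes."""
    clauses = [f"$x isa {entity_type}"]
    for name, value in attributes.items():
        if value:  # skip attributes the parser left empty
            clauses.append(f'has {name} "{escape(value)}"')
    return "insert " + ", ".join(clauses) + ";"


# Hypothetical JSON record; attribute names follow the parameter table.
record = json.loads(
    '{"missionName": "BIOMASS", "missionStatus": "Operational", "program": "Earth Explorer"}'
)
query = build_insert_query("mission", record)
print(query)
```

The resulting string would then be sent to the database in a write transaction through the Python client.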

17
CORPUS

18
Corpus
eoPortal Directory
• 1200+ articles, 895 selected for this project
• Each article contains a textual description of a mission
• Pages do not follow a template, nor do they contain structured data
CEOS EO Handbook Database
• 335 missions
• Structured tables with information about instruments, agencies, orbits, etc.
• Maintained by CEOS and ESA

19
RESULTS

20
Results – eoPortal Directory Knowledge Graph
The descriptions of 237 missions were parsed.
The model provided an output for a majority of the parsing requests.
Parameter                 Parsed [%]
Agencies                  100
Mission Objectives        100
Mission Status            99.6
Launch Date               98.7
Program                   92.4
Instrument Name           90.7
Orbit Type                89.9
Instrument Applications   86.9
Instrument Status         85.65
Instrument Type           85.65
Orbit Altitude            81
Prime Contractor          80.6
EOL Date                  79.75
Orbit Inclination         78.1
Orbit Repeat Cycle        63.29
Instrument Manufacturer   63.29
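
As a quick cross-check, the unweighted mean of the 16 per-parameter rates listed above can be computed directly. Weighting every parameter equally is an assumption about how an overall figure would be derived.

```python
# Per-parameter parsing rates in percent, copied from the results table.
parse_rates = {
    "Agencies": 100,
    "Mission Objectives": 100,
    "Mission Status": 99.6,
    "Launch Date": 98.7,
    "Program": 92.4,
    "Instrument Name": 90.7,
    "Orbit Type": 89.9,
    "Instrument Applications": 86.9,
    "Instrument Status": 85.65,
    "Instrument Type": 85.65,
    "Orbit Altitude": 81,
    "Prime Contractor": 80.6,
    "EOL Date": 79.75,
    "Orbit Inclination": 78.1,
    "Orbit Repeat Cycle": 63.29,
    "Instrument Manufacturer": 63.29,
}

mean_rate = sum(parse_rates.values()) / len(parse_rates)
print(f"{mean_rate:.2f}")  # 85.97
```

This rounds to the 86% overall parsing rate quoted in the discussion.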

21
Results – CEOS Knowledge Graph
The descriptions of 335 missions were ingested from the Missions Table in the CEOS DB.
The descriptions of 936 instruments were ingested from the Instruments Table in the CEOS DB.
Given their structured format, the descriptions were not parsed but ingested directly using the
TypeDB Python API.

22
Results – Comparing KGs
• GPT3 is able to capture certain relationships not present in the CEOS database, such as the prime contractor of certain missions
• The GPT3 KG captures considerably more information about industry in general, while the CEOS KG contains more information about agencies
• The CEOS KG contains more information about the other instruments in a mission, while GPT3 only focuses on the main instruments
• GPT3 could be queried for more instruments in the future
• Overall, the two KGs are very different!

23
DEMO

24
Basic Queries (1)
Inferring all missions handled by a Prime Contractor:
Console and Graph Outputs:

25
Basic Queries (2)
Inferring missions with similar objectives:
Outputs:

26
Results – Use Case 1 – Inference

27
Results – Use Case 2
Collaborating stakeholders were identified using TypeDB’s rules and inference engine. The new relationship can
be used to find collaborators that own instruments with desired applications or specifications. For example, we can
find ESA partners that own wind measuring instruments:
match
$sh1 isa stakeholder, has stakeholderName "ESA";
$sh2 isa stakeholder, has stakeholderName $sh2-name;
$inst isa instrument, has instrumentName $inst-name;
$app isa application, has applicationType contains "wind measurement", has applicationType $apptype;
$ownsinst (agency: $sh2, payload: $inst) isa ownsInstrument;
$enapp (goal: $app, goalFulfiller: $inst) isa enables;
$collab (collaborator: $sh1, collaborator: $sh2) isa collaboratesWith;
get $sh2-name; group $sh2-name; count;
CEOS KG:
Partner                          No. of Instruments
CNES                             4
NOAA                             15
JAXA                             11
UKSA                             2
EUMETSAT                         3
NASA                             29
GPT3 KG:
Partner                          No. of Instruments
nec corporation                  1
toshiba corporation              1
NASA                             2
TAS                              1
Los Alamos national laboratory   1
batc                             1
LusoSpace                        1
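
The logic of this use case can be mirrored in plain Python as a rough sketch. The rule used here (two stakeholders involved in the same mission are collaborators) is an assumption, since the actual TypeDB rule is not shown in the slides, and the partner, mission, and instrument names are hypothetical toy data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data, not taken from either knowledge graph.
involved_in = [("ESA", "M1"), ("AgencyA", "M1"), ("ESA", "M2"),
               ("AgencyB", "M2"), ("AgencyC", "M3")]
owns_instrument = [("AgencyA", "I1"), ("AgencyA", "I2"),
                   ("AgencyB", "I3"), ("AgencyC", "I4")]
applications = {
    "I1": "wind measurement",
    "I2": "wind measurement at sea surface",
    "I3": "ocean colour",
    "I4": "wind measurement",
}


def infer_collaborations(involvements):
    """Assumed rule: stakeholders involved in the same mission collaborate."""
    by_mission = {}
    for stakeholder, mission in involvements:
        by_mission.setdefault(mission, set()).add(stakeholder)
    pairs = set()
    for members in by_mission.values():
        for a, b in combinations(sorted(members), 2):
            pairs.add((a, b))
            pairs.add((b, a))  # the relation is symmetric
    return pairs


def count_partner_instruments(reference, keyword):
    """Count, per collaborator of `reference`, the owned instruments matching `keyword`."""
    collaborations = infer_collaborations(involved_in)
    counts = Counter()
    for owner, instrument in owns_instrument:
        if (reference, owner) in collaborations and keyword in applications[instrument]:
            counts[owner] += 1
    return dict(counts)


result = count_partner_instruments("ESA", "wind measurement")
print(result)  # {'AgencyA': 2}
```

AgencyC owns a wind instrument but shares no mission with ESA in the toy data, so it is excluded, which is the filtering role the inferred relation plays in the query.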

28
Results – Use Case 2 – Combining KGs
• Both graphs share some mission names. GPT3 is able to capture contractor information not included in the CEOS database. These relationships and extra stakeholders were added to the CEOS KG to further enrich it.
• In the future, a more extensive merge of both
graphs will be attempted
Class         Type             GPT3   CEOS   CEOS+GPT3
Entity        stakeholder      389    86     389
Relationship  ownsInstrument   165    1117   1154
Relationship  isPrimeOf        193    0      193
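
One simple way to combine relations from two graphs is a set union on normalised names, sketched below. This is not the merge procedure used in the study, only an illustration of why a merged count can be lower than the sum of the two inputs (here 1154 ownsInstrument relations against 165 + 1117 = 1282). The names in the example are hypothetical.

```python
def normalise(name):
    """Lower-case and collapse whitespace so spelling variants compare equal."""
    return " ".join(name.lower().split())


def merge_relations(ceos, gpt3):
    """Union of (owner, instrument) pairs, de-duplicated on normalised names."""
    merged = {}
    for owner, instrument in list(ceos) + list(gpt3):
        key = (normalise(owner), normalise(instrument))
        merged.setdefault(key, (owner, instrument))  # keep the first spelling seen
    return list(merged.values())


# Hypothetical relations; one pair appears in both graphs with different casing.
ceos_relations = [("AgencyA", "Radar-1"), ("AgencyB", "Lidar-2")]
gpt3_relations = [("agencya", "radar-1"), ("CompanyC", "Camera-3")]

merged = merge_relations(ceos_relations, gpt3_relations)
print(len(merged))  # 3
```

Exact matching after normalisation is the most conservative choice; fuzzier matching would merge more pairs at the risk of conflating distinct instruments.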

29
DISCUSSION & CONCLUSION

30
Discussion
GPT-3 limitations:
• Only takes 2k tokens as input
• Human validation is often necessary
• Not open-source
Positive points:
• High parsing rate (86% over all parameters)
• Encouraging results
• Crucial support in populating a KG from text

31
Future Work
Parsing the whole eoPortal Directory page with GPT3 instead of only the first 2000 tokens
Using open-source models like T5 or BLOOM instead of GPT3
Aligning the KG with the ESA Space System Ontology
Exploring different KG merging techniques to combine both KGs developed for this study
Adapting the GPT3 script to extract more than one instance of each variable where applicable

32
Conclusion
Should we trust the GPT-3 model's outputs?
The generated data should be validated by humans. However, it is impressive
how the model, trained on general data, understands domain-specific concepts.
Can Language Models support our vision of providing a unified overview of the space ecosystem?
Parsing with LMs is a tremendous time saver.
LMs' performance is still improving, and more open-source models are appearing.
So, definitely yes!

33
The End
Thank you for your attention.
For more information, see our paper: "From Mission Description to Knowledge Graph: Applying Transformer-
based models to map knowledge from publicly available satellite datasets" to be presented at the 10th
International Systems & Concurrent Engineering for Space Applications (SECESA 2022)
Contact:
Audrey.berquand@esa.int
Anavictoria.ladeira@esa.int

34
Extra slides

35
Approach – Validation of the GPT-3 outputs
For a subset of missions, we compared the text generated by the GPT-3 model with the manually verified CEOS database. The CEOS
data is used as the reference text. Below are the BLEU and ROUGE scores:
The BLEU and ROUGE scores are not always well suited to comparing strings, so we also used a sentence-transformer model and cosine
similarity for the objective and application parameters:
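
The two kinds of comparison can be illustrated with a small, self-contained sketch. It uses a ROUGE-1-style unigram F1 and a cosine similarity over word-count vectors; the word-count vectors are only a stand-in for sentence-transformer embeddings, which would need an external model, and the reference string is invented for the example.

```python
import math
from collections import Counter


def tokens(text):
    return text.lower().split()


def unigram_f1(candidate, reference):
    """ROUGE-1-style F1 score based on overlapping unigram counts."""
    cand, ref = Counter(tokens(candidate)), Counter(tokens(reference))
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def bag_of_words_vectors(a, b):
    """Stand-in for sentence embeddings: word counts over a shared vocabulary."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    vocab = sorted(set(ca) | set(cb))
    return [ca[w] for w in vocab], [cb[w] for w in vocab]


generated = "measure forest biomass"             # value from the parameter table
reference = "measure forest biomass and carbon"  # invented reference text
f1 = unigram_f1(generated, reference)
cos = cosine_similarity(*bag_of_words_vectors(generated, reference))
print(f"unigram F1 = {f1:.2f}, cosine = {cos:.2f}")
```

With real sentence embeddings, paraphrases that share no words can still score close to 1, which is why embedding-based similarity suits free-text fields such as objectives and applications better than n-gram overlap.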
