Synopsis
The number of space missions being designed and launched worldwide is growing exponentially. Information on these missions, such as their objectives, orbit, or payload, is disseminated across various documents and datasets. Facilitating access to this information is key to accelerating the design of future missions, enabling experts to link an application to a mission, and following various stakeholders' activities.
This presentation introduces recent research done at the ESA to combine the latest Language Models with Knowledge Graphs, unifying our knowledge on space missions. Language Models such as GPT-3 and BERT are trained to understand the patterns of human (natural) language. These models have revolutionised the field of NLP, the branch of AI enabling machines to understand human language in all its complexity. In this work, key information on a mission is parsed from documents with the GPT-3 model, and the parsed data is then migrated to a TypeDB Knowledge Graph to be easily queried. Although this work focuses on an application in the space sector, the method can be transferred to other engineering fields.
Presenters
Dr. Audrey Berquand is a Research Fellow at the ESA. Her research aims at enhancing space mission design and knowledge management with text mining, NLP, and Knowledge Graphs. She was awarded her PhD in 2021 from the University of Strathclyde (Scotland) for her thesis on “Text Mining and Natural Language Processing for the Early Stages of Space Mission Design”. Audrey has a background in space systems engineering: she holds an MSc in Aerospace Engineering from the Royal Institute of Technology KTH (Sweden) and a diplôme d'ingénieur from the EPF Graduate School of Engineering (France). Before diving into the world of AI, she spent 3 years at ESA working on the early design phases of future Earth Observation missions.
Ana Victória Ladeira works with Knowledge Management at the ESA, using automated methods to exploit the information contained in the piles and piles of documents that ESA generates every day. With a Master's degree in Data Science from Maastricht University, Ana is particularly excited about how NLP methods can help large organizations connect different documents and highlight the bigger picture across a large universe of data sources, and how Knowledge Graphs can help connect people to the expertise and information they need.
1. 1
ESA UNCLASSIFIED – For ESA Official Use Only
Unifying Space Mission Knowledge
with NLP & Knowledge Graphs
Dr. Audrey Berquand & Ana Victoria Ladeira
23/08/2022
Disclaimer: The views and opinions expressed in this
presentation are those of the speakers and do not
necessarily reflect the views and position of ESA.
2. 2
Context – Mission Analysis and Design
Example challenge:
ESA needs a more effective way of detecting and monitoring potentially dangerous wildfires
DETERMINE:
• Mission Operations element
• Spacecraft element
• Orbit and Trajectories
• Launch element
• Instrumentation & Payloads
3. 3
Context – Missions Variety
Cassini-Huygens
• Powered by nuclear fuel
• 6.8m x 4m
• Instruments for cosmic dust analysis, plasma
spectrometry, visible and ultraviolet light
• Included lander to land on Titan, one of Saturn’s
moons
• 7 years en route, 3 years orbiting Saturn
CubeSats
• 2kg max
• 10cm side cube
• Uses mostly off-the-shelf components
• Generates around 10W of power for instruments
• Very short lived – 1 to 3 years
Specifications of Space Missions can vary wildly!
5. 5
Our vision
From scattered and heterogeneous
information...
...To a structured and unified view of
the space ecosystem
6. 6
Summary
I. Background
• Language Models
• LMs applications
• Knowledge Graphs & Space
II. Approach
• A manually defined schema
• Population with the GPT-3 model
III. Corpus
IV. Results
• Our Knowledge Graphs
• Demo & Use Cases
V. Discussion
8. 8
Background – Language Models
A Language Model is a probability distribution over words or word sequences.
"In space no one can hear you …"
Scream 0.89
Speak 0.75
…
Octopus 0.01
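As a rough illustration of the definition above, the sketch below estimates next-word probabilities from bigram counts. The tiny corpus and the function names are made up for this example; models such as GPT-3 learn these probabilities with neural networks rather than by counting. In a true probability distribution, the candidate scores for a given context sum to 1.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count, for every word, which words follow it in the training sentences."""
    follow_counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            follow_counts[current_word][next_word] += 1
    return follow_counts

def next_word_probabilities(follow_counts, word):
    """Turn the raw counts for `word` into a probability distribution."""
    counts = follow_counts[word.lower()]
    total = sum(counts.values())
    if total == 0:
        return {}
    return {candidate: n / total for candidate, n in counts.items()}

# Tiny made-up corpus, for illustration only.
corpus = [
    "in space no one can hear you scream",
    "in space no one can hear you scream",
    "in space no one can hear you speak",
    "on earth everyone can hear you sing",
]
model = train_bigram_model(corpus)
probs = next_word_probabilities(model, "you")
print(probs)
```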
Why should you care about LMs?
• Pretrained LMs in Natural Language Processing (NLP) have pushed the limits of language
understanding and generation.
• They have become a main trend in NLP research
• Famous LMs include BERT, RoBERTa, T5, GPT-2, GPT-3, …
even SpaceRoBERTa and CosmicRoBERTa
9. 9
Background – LMs Applications in your daily life
• Code generation: Debuild
• Content generation: Headlime.com
• Semantic search: Casetext.com
10. 10
Background – LMs Applications Examples
1. Text Summarisation
2. Text Generation
3. Text Parser
* These examples were all generated through
the OpenAI playground
11. 11
Background – KGs & Space
• "From Engineering Models To Knowledge Graph: Delivering New Insights Into Models" migrates Engineering Models of CubeSats to a KG
• NASA’s Space Talent KG helps them find the right space expertise for their projects
• NASA’s Lessons Learned KG combines NLP and KG to categorize their lessons
14. 14
Approach - KG Schema
Manually defined to reflect both technical and
economic characteristics of missions.
Visualised with TypeDB Studio
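The actual schema is only shown as a picture on the slide. As a hedged sketch of what a manually defined schema might look like, the snippet below renders a TypeDB 2.x-style `define` query from a small Python dictionary. The attribute names are taken from the parameters listed later in this deck; the choice of string value types, the subset of entities, and the helper `build_define_query` are assumptions made for illustration.

```python
# Hypothetical subset of the schema: entity -> owned attributes.
# Attribute names come from the slides; everything else is assumed.
SCHEMA = {
    "mission": ["missionName", "missionStatus", "launchDate"],
    "instrument": ["instrumentName", "instrumentType"],
    "orbit": ["orbitType", "orbitAltitude"],
}

def build_define_query(schema):
    """Render a TypeQL `define` query; every attribute is assumed to be a string."""
    lines = ["define"]
    seen = set()
    # Declare each attribute type once.
    for attributes in schema.values():
        for attribute in attributes:
            if attribute not in seen:
                seen.add(attribute)
                lines.append(f"{attribute} sub attribute, value string;")
    # Declare each entity type together with the attributes it owns.
    for entity, attributes in schema.items():
        owns = ", ".join(f"owns {attribute}" for attribute in attributes)
        lines.append(f"{entity} sub entity, {owns};")
    return "\n".join(lines)

define_query = build_define_query(SCHEMA)
print(define_query)
```

Relations such as ownsInstrument or collaboratesWith, which appear in the use cases, would need additional `sub relation` definitions that are not sketched here.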
15. 15
Approach – KG Population
1. Each mission description is parsed with the Davinci engine of the GPT-3 model, through the OpenAI API
*In reality, we use the Few-Shot Learning approach, and we have 16 parameters parsed.
16 Parameters Parsed:
Entity | Attribute | Example
Mission | missionName | BIOMASS
Mission | missionStatus | Operational
Mission | program | Earth Explorer
Mission | objectives | determine the amount of biomass and carbon stored in forests
Mission | launchDate | 08/2023
Mission | endOfLife | 10/2030
Stakeholder | StakeholderName (agency, prime Contractor) | ESA, NASA, Thales, Airbus,...
Instrument | instrumentName | P-band Synthetic Aperture Radar
Instrument | instrumentStatus | Operational
Instrument | instrumentType | Hyperspectral camera, SAR
Instrument | measurementApp | measure forest biomass
Orbit | orbitType | Sun-synchronous
Orbit | orbitInclination | 98 deg
Orbit | orbitAltitude | 660 km
Orbit | orbitRepeatCycle | 3 days
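The few-shot approach mentioned on this slide can be sketched as a prompt that shows the model one or more worked examples before the new mission description. The snippet below only builds the prompt text; the example sentence, the instruction wording, and the function name are assumptions for illustration, and the call to the OpenAI API is deliberately left out.

```python
import json

# Hypothetical few-shot example: (mission description, expected parsed output).
# The description is a made-up sentence built from the example values on the slide.
FEW_SHOT_EXAMPLES = [
    (
        "BIOMASS is an Earth Explorer mission that will determine the amount "
        "of biomass and carbon stored in forests from a Sun-synchronous orbit.",
        {"missionName": "BIOMASS", "program": "Earth Explorer",
         "orbitType": "Sun-synchronous"},
    ),
]

def build_few_shot_prompt(examples, new_description, parameters):
    """Assemble a completion prompt: instruction, worked examples, then the new text."""
    parts = ["Extract the following parameters as JSON: " + ", ".join(parameters) + "."]
    for description, parsed in examples:
        parts.append(f"Text: {description}\nParsed: {json.dumps(parsed)}")
    # The prompt ends on 'Parsed:' so that the model continues with the JSON object.
    parts.append(f"Text: {new_description}\nParsed:")
    return "\n\n".join(parts)

prompt = build_few_shot_prompt(
    FEW_SHOT_EXAMPLES,
    "Example mission text goes here.",
    ["missionName", "program", "orbitType"],
)
print(prompt)
```

Only three of the 16 parameters are shown to keep the sketch short; the same pattern extends to the full list.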
16. 16
Approach – KG Population
2. The parsed outputs are saved into a JSON file (format shown on the slide)
3. Using the Vaticle Python API, insert queries are generated to populate the KG
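The JSON-to-query steps can be sketched as follows. The JSON layout is a guess, since the original format is not reproduced in this transcript, and `build_insert_query` is a hypothetical helper. The sketch stops at generating the TypeQL string; sending it to TypeDB through the Python client is omitted.

```python
import json

# Hypothetical parsed output for one mission; the real JSON layout may differ.
parsed_json = """
{
  "missionName": "BIOMASS",
  "missionStatus": "Operational",
  "orbitType": "Sun-synchronous"
}
"""

def escape(value):
    """Escape backslashes and double quotes so the value fits in a TypeQL string literal."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"')

def build_insert_query(entity_type, attributes):
    """Build one TypeQL insert statement from attribute name/value pairs."""
    has_clauses = ", ".join(
        f'has {name} "{escape(value)}"' for name, value in attributes.items()
    )
    return f"insert $x isa {entity_type}, {has_clauses};"

record = json.loads(parsed_json)
# Keep only the attributes that belong to the mission entity (assumed naming convention).
mission_attributes = {k: v for k, v in record.items() if k.startswith("mission")}
query = build_insert_query("mission", mission_attributes)
print(query)
```

Escaping the parsed values matters here because the text comes from a language model and may contain quotes.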
18. 18
Corpus
eoPortal Directory
• 1200+ articles, 895 selected for this project
• Each article contains a textual description of a mission
• Pages do not follow a template nor contain structured data
CEOS EO Handbook Database
• 335 missions
• Structured tables with information about instruments, agencies, orbits, etc.
• Maintained by CEOS and ESA
20. 20
Results – eoPortal Directory Knowledge Graph
237 mission descriptions were parsed.
The model provided an output for the majority of the parsing requests.
Parameter | [%] parsed
Agencies | 100
Mission Objectives | 100
Mission Status | 99.6
Launch Date | 98.7
Program | 92.4
Instrument Name | 90.7
Orbit Type | 89.9
Instrument Applications | 86.9
Instrument Status | 85.65
Instrument Type | 85.65
Orbit Altitude | 81
Prime Contractor | 80.6
EOL Date | 79.75
Orbit Inclination | 78.1
Orbit Repeat Cycle | 63.29
Instrument Manufacturer | 63.29
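As a quick check, the per-parameter rates above can be averaged. Assuming the overall figure quoted in the Discussion is an unweighted mean over the 16 parameters, the result is consistent with the reported 86%.

```python
# Parsing rates (%) copied from the table above.
parsing_rates = {
    "Agencies": 100,
    "Mission Objectives": 100,
    "Mission Status": 99.6,
    "Launch Date": 98.7,
    "Program": 92.4,
    "Instrument Name": 90.7,
    "Orbit Type": 89.9,
    "Instrument Applications": 86.9,
    "Instrument Status": 85.65,
    "Instrument Type": 85.65,
    "Orbit Altitude": 81,
    "Prime Contractor": 80.6,
    "EOL Date": 79.75,
    "Orbit Inclination": 78.1,
    "Orbit Repeat Cycle": 63.29,
    "Instrument Manufacturer": 63.29,
}

# Unweighted mean over all parameters.
mean_rate = sum(parsing_rates.values()) / len(parsing_rates)
print(f"{len(parsing_rates)} parameters, mean parsing rate: {mean_rate:.1f}%")
```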
21. 21
Results – CEOS Knowledge Graph
The descriptions of 335 missions were ingested from the Missions Table in the CEOS DB.
The descriptions of 936 instruments were ingested from the Instruments Table in the CEOS DB.
Descriptions were not parsed but directly ingested using the TypeDB Python API, given their
structured format.
22. 22
Results – Comparing KGs
• GPT-3 is able to capture certain relationships not present in the CEOS database, such as which company is the prime contractor of certain missions
• The GPT-3 KG captures far more information about industry in general, while the CEOS KG contains more information about agencies
• The CEOS KG contains more information about the other instruments in a mission, while GPT-3 only focuses on the main instruments
• GPT-3 could be queried for more instruments in the future
• Overall, the two KGs are very different!
27. 27
Results – Use Case 2
Collaborating stakeholders were identified using TypeDB’s rules and inference engine. The new relationship can
be used to find collaborators that own instruments with desired applications or specifications. For example, we can
find ESA partners that own wind measuring instruments:
match
$sh1 isa stakeholder, has stakeholderName "ESA";
$sh2 isa stakeholder, has stakeholderName $sh2-name;
$inst isa instrument, has instrumentName $inst-name;
$app isa application, has applicationType contains "wind measurement", has applicationType $apptype;
$ownsinst (agency: $sh2, payload: $inst) isa ownsInstrument;
$enapp (goal: $app, goalFulfiller: $inst) isa enables;
$collab (collaborator: $sh1, collaborator: $sh2) isa collaboratesWith;
get $sh2-name; group $sh2-name; count;
CEOS KG:
Partner | No. of Instruments
CNES | 4
NOAA | 15
JAXA | 11
UKSA | 2
EUMETSAT | 3
NASA | 29
GPT3 KG:
Partner | No. of Instruments
nec corporation | 1
toshiba corporation | 1
NASA | 2
TAS | 1
Los Alamos national laboratory | 1
batc | 1
LusoSpace | 1
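The rule behind collaboratesWith is not shown in the slides. One plausible reading, used here purely as an assumption, is that two stakeholders collaborate when they are involved in the same mission. The sketch below reproduces that inference in plain Python on made-up data; in the actual work this is done by TypeDB's rules and inference engine.

```python
from itertools import combinations

# Made-up mission -> stakeholders data, for illustration only.
mission_stakeholders = {
    "mission-a": ["ESA", "NASA"],
    "mission-b": ["ESA", "CNES", "JAXA"],
    "mission-c": ["NOAA", "NASA"],
}

def infer_collaborations(missions):
    """Assumed rule: stakeholders involved in the same mission collaborate."""
    pairs = set()
    for stakeholders in missions.values():
        # Sorting makes each pair canonical, so (A, B) and (B, A) are the same entry.
        for a, b in combinations(sorted(set(stakeholders)), 2):
            pairs.add((a, b))
    return pairs

def partners_of(name, pairs):
    """Return every stakeholder that shares at least one mission with `name`."""
    return sorted({b if a == name else a for a, b in pairs if name in (a, b)})

collaborations = infer_collaborations(mission_stakeholders)
esa_partners = partners_of("ESA", collaborations)
print(esa_partners)
```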
28. 28
Results – Use Case 2 – Combining KGs
• Both graphs share some mission names.
GPT3 is able to capture contractor information
not included on the CEOS database. These
relationships and extra stakeholders were
added to the CEOS KG to further enrich it.
• In the future, a more extensive merge of both
graphs will be attempted
Class Type GPT3 CEOS CEOS+GPT3
Entity stakeholder 389 86 389
Relationship ownsInstrument 165 1117 1154
Relationship isPrimeOf 193 0 193
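In the table above, the combined ownsInstrument count (1154) is lower than the sum of the two graphs (1117 + 165 = 1282), which suggests that overlapping relations were not added twice. The merge procedure itself is not described in the slides; the sketch below shows one simple way such a de-duplicating union could work, using a hypothetical name normalisation and made-up relations.

```python
def normalise(name):
    """Lower-case and collapse whitespace so that 'NASA ' and 'nasa' match."""
    return " ".join(name.lower().split())

def merge_relations(base, extra):
    """Union two lists of (stakeholder, instrument) pairs, ignoring case and spacing."""
    merged = {}
    for owner, instrument in list(base) + list(extra):
        key = (normalise(owner), normalise(instrument))
        merged.setdefault(key, (owner, instrument))  # keep the first spelling seen
    return list(merged.values())

# Made-up relations, for illustration only.
ceos_owns = [("NASA", "Instrument A"), ("ESA", "Instrument B")]
gpt3_owns = [("nasa", "instrument a"), ("TAS", "Instrument C")]
combined = merge_relations(ceos_owns, gpt3_owns)
print(combined)
```

Exact-match normalisation like this would miss aliases and abbreviations, which is one reason a more extensive merge is listed as future work.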
30. 30
Discussion
GPT-3 limitations:
• Only takes 2k tokens as input
• Human validation is often necessary
• Not open-source
Positive points:
• High parsing rate (86% over all parameters)
• Encouraging results
• Crucial support in populating a KG from text
31. 31
Future Work
Parsing the whole eoPortal Directory page with GPT3 instead of only the first 2000 tokens
Using Open Source models like T5 or BLOOM instead of GPT3
Align the KG with the ESA Space System Ontology
Exploring different KG merging techniques to combine both KGs developed for this study
Adapting GPT3 script to extract more than one instance of each variable where applicable
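One way to approach the first item, parsing a whole page despite the input limit, is to split the page into overlapping chunks and parse each chunk separately. The sketch below is a rough illustration under two assumptions: whitespace-separated words stand in for model tokens (GPT-3 uses a sub-word tokenizer, so real counts differ), and the overlap size is arbitrary. A real implementation would also need to leave room for the prompt and the completion.

```python
def chunk_text(text, max_tokens=2000, overlap=100):
    """Split text into overlapping chunks of at most `max_tokens` whitespace tokens."""
    if overlap >= max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")
    words = text.split()
    chunks = []
    step = max_tokens - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        # Stop once the current chunk reaches the end of the text.
        if start + max_tokens >= len(words):
            break
    return chunks

# Synthetic 4500-word page, for illustration only.
page = " ".join(f"w{i}" for i in range(4500))
chunks = chunk_text(page)
print(len(chunks), [len(c.split()) for c in chunks])
```

The outputs parsed from each chunk would then have to be reconciled, for example by keeping the first non-empty value found for each parameter.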
32. 32
Conclusion
Should we trust the GPT-3 model's outputs?
The generated data should be validated by humans. However, it is impressive
how a model trained on general data understands domain-specific concepts.
Can Language Models support our vision of providing a unified overview of the space ecosystem?
Parsing with LMs is a tremendous time saver.
LMs' performance is still improving, and more open-source models are appearing.
So, definitely yes!
33. 33
The End
Thank you for your attention.
For more information, see our paper: "From Mission Description to Knowledge Graph: Applying Transformer-based models to map knowledge from publicly available satellite datasets", to be presented at the 10th International Systems & Concurrent Engineering for Space Applications (SECESA 2022)
Contact:
Audrey.berquand@esa.int
Anavictoria.ladeira@esa.int
35. 35
Approach – Validation of the GPT-3 outputs
For a subset of missions, we compared the text generated by the GPT-3 model with the manually verified CEOS database, which is used as the reference text, and computed BLEU and ROUGE scores.
BLEU and ROUGE scores are not always well suited to comparing strings, so we also used a sentence-transformer model and cosine similarity for the objective and application parameters.
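The cosine-similarity comparison can be illustrated with a minimal sketch. Here a simple bag-of-words count stands in for the sentence-transformer embedding, which is a deliberate simplification: word counts only reward exact word overlap, whereas a sentence-transformer would also score paraphrases as similar. The reference string is taken from the parameter table, while the generated string is invented for the example.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Stand-in embedding: word counts (a real run would use a sentence-transformer)."""
    return Counter(text.lower().split())

def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors stored as dict-like counters."""
    dot = sum(vec_a[key] * vec_b[key] for key in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

reference = "measure forest biomass"          # reference text (CEOS side)
generated = "measure the biomass of forests"  # invented model output
score = cosine_similarity(bag_of_words(reference), bag_of_words(generated))
print(f"cosine similarity: {score:.2f}")
```

Note that "forest" and "forests" do not match in this stand-in, which is exactly the kind of near miss that motivates embedding-based similarity over exact string metrics.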