The PDS Imaging Node is looking to expand its machine learning efforts to involve more researchers, both inside and outside the PDS, by offering a machine learning cloud platform.
We are seeking use cases from the machine learning community, along with feedback on whether such a platform would be useful to them.
1. PDS Imaging Node’s Hosted Machine Learning Platform
https://pds-imaging.jpl.nasa.gov/updates
Kevin Grimes, Paul Ramirez, Rishi Verma, Kiri Wagstaff, Steven Lu
Jet Propulsion Laboratory, California Institute of Technology
Contact Email: kevin.m.grimes@jpl.nasa.gov
◦ Data products in excess of 1380 TB
◦ Dozens of celestial bodies captured
◦ Over 20 million photos from the surface of Mars
◦ Nearly 5 million photos taken of Mars’s surface from orbit
◦ Photos of Jupiter, Pluto, and beyond
◦ Rich image metadata
◦ Identifiable targets within image (“moon”, “star”, “rock”)
◦ Human-written captions (“Curiosity entering crater”)
◦ Spacecraft and instrument specifications (“InSight Lander”, “MAHLI”)
◦ Image timestamp and SCLK
PDS Imaging Node’s Planetary Imagery Archive
https://pds-imaging.jpl.nasa.gov/data
Running machine learning software in Python on Imaging Node data is as simple as the following steps:
1. Request a workspace be created for your team
2. Create a new Jupyter Notebook from your existing Python software (or start from scratch)
3. Point your notebook to the locally-mounted Imaging Node data
4. Run your application
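As a rough illustration of step 3, a notebook cell might locate image files under the mounted archive as sketched below. The mount point `/mnt/pds-imaging`, the list of file extensions, and the helper names `find_images` and `count_by_suffix` are assumptions made for this example; the actual path and layout would come from the workspace setup.

```python
from collections import Counter
from pathlib import Path

# Hypothetical mount point; the real location would be given with the workspace.
ARCHIVE_ROOT = Path("/mnt/pds-imaging")

# File extensions treated as image products in this sketch (an assumption).
IMAGE_SUFFIXES = {".img", ".jpg", ".png", ".tif"}


def find_images(root, suffixes=IMAGE_SUFFIXES):
    """Return sorted file paths under root whose extension is in suffixes.

    The extension comparison is case-insensitive, so "a.IMG" matches ".img".
    """
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    )


def count_by_suffix(paths):
    """Count paths per lower-cased file extension."""
    return Counter(p.suffix.lower() for p in paths)


if __name__ == "__main__":
    if ARCHIVE_ROOT.is_dir():
        # Walking the whole archive would be slow; in practice one would
        # point this at a single mission or instrument subdirectory.
        images = find_images(ARCHIVE_ROOT)
        print(count_by_suffix(images))
    else:
        print(f"{ARCHIVE_ROOT} is not mounted in this environment")
```

The list returned by `find_images` can then be handed to whatever loading and training code the notebook already contains.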
Run your algorithms on our data
The approach: an integrated machine learning platform
◦ Entire platform hosted by PDS Imaging Node in its cloud
◦ Imaging Node archive mounted to your environment
◦ Collaborative environment shared across your team
◦ Share your results, opt for integration(s)
The goal: to enable machine learning researchers, within their existing
budgets, to train and run their algorithms against Imaging Node’s vast
archives in an environment that encourages collaboration and
integration.
Share & integrate your results
The PDS Image Atlas, located at
https://pds-imaging.jpl.nasa.gov/search, makes over 30 million
images within the PDS Imaging Node archive searchable by their
metadata, including that which is generated by machine learning.
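The Image Atlas's actual query interface is not described here. Purely as an illustration of the idea, the sketch below searches a handful of in-memory records by metadata, treating curated targets and machine-learning-generated labels alike. The record fields, product IDs, the instrument name `CAM_B`, and the `search` function are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    product_id: str
    instrument: str
    targets: list = field(default_factory=list)    # curated metadata
    ml_labels: list = field(default_factory=list)  # produced by a classifier


def search(records, instrument=None, label=None):
    """Return records matching the given instrument and/or label.

    A label matches if it appears in either the curated targets or the
    machine-learning-generated labels. Filters left as None are ignored.
    """
    results = []
    for record in records:
        if instrument is not None and record.instrument != instrument:
            continue
        if (label is not None
                and label not in record.targets
                and label not in record.ml_labels):
            continue
        results.append(record)
    return results


# Made-up sample records, for illustration only.
SAMPLE_RECORDS = [
    ImageRecord("img-001", "MAHLI", targets=["rock"]),
    ImageRecord("img-002", "MAHLI", ml_labels=["crater"]),
    ImageRecord("img-003", "CAM_B", ml_labels=["rock"]),
]

if __name__ == "__main__":
    for hit in search(SAMPLE_RECORDS, label="rock"):
        print(hit.product_id, hit.instrument)
```

In this sketch a search for "rock" returns both the image with a curated target and the image whose label came from a classifier, which is the behavior the paragraph above describes.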
[Figure: ML research teams and their environments within the PDS Machine Learning Platform]
Each machine learning team (blue, orange, gray) is given an
environment within the PDS Machine Learning Platform, including
space to run applications and a database to store their results.
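The poster does not say which database each environment provides. As a minimal sketch of storing results from a notebook, the example below uses Python's built-in `sqlite3` module as a stand-in; the table name, column names, and helper functions are assumptions made for illustration.

```python
import sqlite3


def open_results_db(path=":memory:"):
    """Open (or create) a results database with one classifications table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS classifications (
            product_id TEXT NOT NULL,
            label      TEXT NOT NULL,
            confidence REAL NOT NULL,
            PRIMARY KEY (product_id, label)
        )
        """
    )
    return conn


def store_result(conn, product_id, label, confidence):
    """Insert one result; re-running a notebook overwrites the earlier row."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO classifications VALUES (?, ?, ?)",
            (product_id, label, confidence),
        )


def labels_for(conn, product_id):
    """Return (label, confidence) pairs for a product, highest confidence first."""
    cursor = conn.execute(
        "SELECT label, confidence FROM classifications "
        "WHERE product_id = ? ORDER BY confidence DESC",
        (product_id,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = open_results_db()
    store_result(conn, "img-001", "rock", 0.9)      # made-up values
    store_result(conn, "img-001", "horizon", 0.4)
    print(labels_for(conn, "img-001"))
```

Keying the table on the product and label together means a repeated run updates a team's earlier result instead of duplicating it.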
© 2019 California Institute of Technology. Government sponsorship acknowledged.
The typical workflow of a machine learning researcher using
the PDS Machine Learning Platform, beginning with updates
to their Jupyter Notebook and ending with various forms of
publication.
◦ Researcher updates notebook
◦ Researcher runs notebook
◦ Researcher verifies results
◦ Researcher makes results public
◦ Researcher visualizes results
◦ Researcher expresses dissatisfaction with results
◦ Researcher embeds permalink in publication
◦ Researcher downloads results locally
◦ Researcher posts visualization to LinkedIn
◦ Researcher opts results for Image Atlas integration
• Could your machine learning project use a platform like this?
• Do you have a use case not covered in this poster?
• Want to know when the platform is available?
Sign up for updates, or contact the team directly!
Researchers leverage the PDS Imaging Node’s 1380 TB image archive
to train their machine learning algorithms and make
incredible discoveries.
The Imaging Node has worked with several machine learning researchers to integrate their
work into the Image Atlas: their tools have classified over 5 million images, and those
classifications are now part of the Image Atlas.
There are, however, several challenges that researchers external to the Imaging Node may
face should they decide to work with Imaging Node’s data:
◦ Hardware limitations: disk space, network speeds, lack of GPUs
◦ Disconnected software tooling: ML technologies can be difficult to connect to each other
◦ Sharing issues: fascinating results may sit unused, even if published
◦ Integration issues: even if published, discoveries may never be integrated into other tools