Poster presentation of ESA-SAPS: Science Archives Publications System at ADASS XXIV
held in Calgary, Canada, October 5-9, 2014. (http://www.adass2014.org/)
This poster presents the main characteristics of the ESA SAPS (Science Archives Publications System), a system that will homogeneously extract and classify the relations between information published in papers and observational data from ESA space-based missions.
ESA has awarded a contract for under-geo-returned countries to the consortium Planetek Hellas - National Observatory of Athens to build this system.
Details of the ESA-SAPS project:
http://www.planetek.it/progetti/esa_science_archives_publication_system
1. ESA SAPS
Science Archives Publications System
Scientific data from many of ESA’s Space Missions are archived at the European Space
Astronomy Centre’s system of Scientific Archives. Those archives store and provide
access to data from astronomy, planetary and solar missions (see
http://archives.esac.esa.int/).
Scientists make use of the scientific data produced by the missions to publish their
findings in peer-reviewed scientific literature. The data used to produce a particular
scientific paper are often not recorded, although for many missions authors are
requested to provide this information in the paper. Currently, the link between papers’
bibcodes and the observational data used has been made systematically for only some
of ESA’s scientific missions, by reading the papers and recording the identifiers of the
data used (referred to as the “OBSID”).
Once these links have been established, it is possible to gain valuable insights into the
scientific productivity of a mission. Interested parties can investigate which scientific
areas are being contributed to, how the scientific productivity is evolving with time,
the delay between making an observation and publication, the number of new scientists
using a mission, whether the data for a publication is obtained from an archive or via a
successful response to an observing Announcement of Opportunity, the nationalities of
the authors and so on. For multi-instrument missions the productivity of the different
instruments and their different operation modes, if applicable, can be assessed. These
insights are useful to a mission’s Project Scientist, management and those involved in
the selection of ESA’s future science missions.
Some of the missions already provide links to their publications with some relevant
information extracted (see e.g. http://herschel.esac.esa.int/hpt). In some cases the
archives provide links from the literature to the observational data (for example the
XMM-Newton Science Archive, XSA; see http://archives.esac.esa.int/xsa/).
ESA has awarded a contract for under-geo-returned countries to the consortium
Planetek Hellas - National Observatory of Athens to build a system that will
homogeneously extract and classify the relations between information published in papers
and observational data from ESA space-based missions. This poster presents the main
characteristics of this system: the ESA SAPS (Science Archives Publications System).
European Space Astronomy Centre, Madrid, Spain
Pedro Osuna (ESA)
Stratos Gerakakis (Planetek Hellas-NOA)
pedro.osuna@esa.int
Objectives
The main objective of the activity is to develop a system
that can provide information on the scientific
performance of ESA’s operating missions by examining
the publications and the observational data used to
produce them.
This will be achieved by providing:
● a human user interface that presents information from
publications in the archive together with the associated
archival data
● a human interface that allows the listed publications to
be selected using various criteria, which may be
mission dependent
● a human interface that produces standard statistical
summaries for the selected publications
● a human interface that allows the production of
on-the-fly statistics on the scientific publications and
any parameter in the associated archived data
● a machine interface that allows the ESAC Science
Archives to query the system and retrieve the relevant
relations between observational data and papers, to be
shown within the archives
● a contribution to the ADS tagging effort for Linking
Literature and Data
High Level Overview
1. Consumes
● PDF files
● Groups of zipped PDFs
● Excel files with URLs of PDFs
2. Classifies
● Automatically detects Observations in the PDFs
● Requests human intervention only if unsure about the detection
3. Reports
● Web-based search page
● Full-text searching
● Faceted searching
● OLAP reports
4. Integrates
● ESAC Mission Archives AIO
● Machine-to-machine RESTful API server
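The "Classifies" step asks for human intervention only when the automatic detection is unsure. A minimal sketch of that idea follows, assuming each suggested Observation carries a confidence score between 0 and 1. The names `Suggestion` and `triage` and both threshold values are hypothetical and are not taken from the SAPS code base, which is written in Java; Python is used here only for brevity.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    obs_id: str    # candidate observation identifier found in a paper
    score: float   # detection confidence, assumed to lie in [0, 1]


def triage(suggestions, accept_at=0.8, reject_below=0.2):
    """Split suggestions into (accepted, rejected, needs_review).

    The thresholds are illustrative defaults, not values used by SAPS.
    """
    accepted, rejected, needs_review = [], [], []
    for s in suggestions:
        if s.score >= accept_at:
            accepted.append(s)        # confident match: no human needed
        elif s.score < reject_below:
            rejected.append(s)        # confident non-match: drop silently
        else:
            needs_review.append(s)    # uncertain: ask a human
    return accepted, rejected, needs_review
```

Only the middle band reaches a person, so tightening or widening the two thresholds trades review effort against the risk of a wrong automatic decision.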
Architectural Design
● User uploads Publications
● System tries to locate references to Observation IDs in the
publication
● If none are found, it tries to locate references to Observation Dates
● Dates are filtered to remove invalid matches
● They are scored according to location and matching keywords
in the surrounding text
● An aggregated score is calculated for each Date
● Top-scoring Observation Dates are used to pull the Observations
performed on those Dates
● These Observations are scored according to references in the
publication (instrument names, targets etc.)
● The scored Observations are displayed in the Mission
Administrator's dashboard
● The Mission Administrator:
  ● Approves or rejects Observation suggestions
  ● Can manually specify Observations missed by automatic
  parsing
  ● Can specify threshold limits for scored Observations so that
  the verification process can be automated
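The date-matching steps listed above (find candidate dates, filter invalid ones, score each match by location and nearby keywords, aggregate per date) can be sketched as follows. This is an illustration under stated assumptions, not the SAPS implementation, which is written in Java: only ISO-style `YYYY-MM-DD` dates are recognised, the paper is assumed to be already split into named sections, and the keyword list, section weights and the function name `score_dates` are all hypothetical.

```python
import re
from collections import defaultdict
from datetime import date

DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")       # ISO dates only, for brevity
KEYWORDS = ("observed", "observation", "exposure")          # hypothetical keyword list
SECTION_WEIGHT = {"observations": 2.0, "references": 0.2}   # hypothetical location weights


def score_dates(sections, mission_start, mission_end, window=60):
    """Return (date, aggregated_score) pairs, best first.

    sections: mapping of section name -> section text.
    """
    totals = defaultdict(float)
    for name, text in sections.items():
        weight = SECTION_WEIGHT.get(name, 1.0)
        for m in DATE_RE.finditer(text):
            try:
                d = date(*map(int, m.groups()))
            except ValueError:
                continue                      # filter: not a real calendar date
            if not (mission_start <= d <= mission_end):
                continue                      # filter: outside the mission lifetime
            # Score by location (section weight) and keywords near the match.
            start = max(0, m.start() - window)
            context = text[start:m.end() + window].lower()
            hits = sum(kw in context for kw in KEYWORDS)
            totals[d] += weight * (1 + hits)  # aggregate over all mentions of d
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

The top-scoring dates returned here would then be used to look up Observations in the mission archive; that lookup and the second scoring pass on instrument names and targets are left out of the sketch.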
1. Project based on
● Java
● Spring Boot Framework
2. Front-end built with
● Java
● Google Web Toolkit (GWT)
● Twitter Bootstrap
3. Indexing / Searching provided by
● ElasticSearch
4. PDF parsing
● PDFBox
● TIKA
5. Reporting – OLAP Cubes
● JasperServer
6. Backend Datastore
● PostgreSQL
7. Communication between services
● RESTful calls with JSON payloads
8. Charting
● HighCharts
9. Development Stack
● Eclipse IDE
● Apache Tomcat
● Maven
● Jenkins CI
● Trello Project management
10. Powered by
● Lots and lots of developer love
Classification Workflow
Log output from Observation ID and Observation Date
matching showing a list of the matches and their relative
location in the surrounding text. (WIP)