ESA SAPS 
Science Archives Publications System 
Scientific data from many of ESA’s Space Missions are archived at the European Space 
Astronomy Centre’s system of Scientific Archives. Those archives store and provide 
access to data from astronomy, planetary and solar missions (see
http://archives.esac.esa.int/).
Scientists make use of the scientific data produced by the missions to publish their
findings in peer-reviewed scientific literature. The data used to produce a particular
scientific paper are often not routinely recorded, although for many missions authors are
requested to provide this information in the paper. Currently, the link between papers’
bibcodes and the observational data used has been made systematically for only some
of ESA’s scientific missions, by reading the papers and recording the identifiers of the data
used (referred to as the “OBSID”).
Once these links have been established, it is possible to gain valuable insights into the 
scientific productivity of a mission. Interested parties can investigate which scientific
areas are being contributed to, how the scientific productivity is evolving with time,
the delay between making an observation and publication, the number of new scientists
using a mission, whether the data for a publication was obtained from an archive or via a
successful response to an observing Announcement of Opportunity, the nationalities of
the authors and so on. For multi-instrument missions the productivity of the different
instruments and their different operation modes, if applicable, can be assessed. These 
insights are useful to a mission’s Project Scientist, management and those involved in 
the selection of ESA’s future science missions. 
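Once bibcode-to-OBSID links exist, several of the indicators listed above reduce to simple aggregations. The following is a minimal sketch in Python (the system itself is described as Java-based); the record layout, field names, identifiers and dates are all hypothetical placeholders, not real archive entries.

```python
from collections import Counter
from datetime import date
from statistics import median

# Hypothetical links: one record per (paper, observation) pair.
# Bibcodes, OBSIDs and dates are placeholders, not real archive entries.
LINKS = [
    {"bibcode": "PAPER-A", "obsid": "OBS-1",
     "observed": date(2010, 1, 1), "published": date(2011, 1, 1)},
    {"bibcode": "PAPER-A", "obsid": "OBS-2",
     "observed": date(2010, 7, 1), "published": date(2011, 1, 1)},
    {"bibcode": "PAPER-B", "obsid": "OBS-3",
     "observed": date(2011, 1, 1), "published": date(2013, 1, 1)},
]


def publication_delays_days(links):
    """Days between each observation and the paper that used it."""
    return [(link["published"] - link["observed"]).days for link in links]


def papers_per_year(links):
    """Number of distinct papers per publication year."""
    # A paper using several observations is counted only once.
    year_by_paper = {link["bibcode"]: link["published"].year for link in links}
    return Counter(year_by_paper.values())


if __name__ == "__main__":
    print("median delay (days):", median(publication_delays_days(LINKS)))
    print("papers per year:", dict(papers_per_year(LINKS)))
```

Per-instrument or per-mode productivity would follow the same pattern, grouping on an additional field of the linked observation.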
Some of the missions already provide links to their publications with some relevant
information extracted (see e.g. http://herschel.esac.esa.int/hpt). In some cases the
archives provide links from the literature to the observational data (e.g. the XMM-Newton
Science Archive, XSA, see http://archives.esac.esa.int/xsa/).
ESA has awarded a contract for under-geo-returned countries to the consortium
Planetek Hellas - National Observatory of Athens to build a system that will
homogeneously extract and classify the relations between published papers
and observational data from ESA space-based missions. This poster presents the main
characteristics of this system: the ESA SAPS (Science Archives Publications System).
European Space Astronomy Centre, Madrid, Spain 
Pedro Osuna (ESA) 
Stratos Gerakakis (Planetek Hellas-NOA) 
pedro.osuna@esa.int 
ADASS XXIV 
October 5-9, 2014 
Calgary, Canada 
Objectives 
The main objective of the activity is to develop a system 
that can provide information on the scientific 
performance of ESA’s operating missions by examining 
the publications and the observational data used to 
produce them. 
This will be performed by providing: 
● a human user interface, allowing information from 
publications in the archive and the associated 
archival data to be presented. 
● a human interface allowing the listed publications to 
be selected using various criteria which may be 
mission dependent 
● a human interface to allow standard statistical summaries
to be produced for the selected publications.
● a human interface that will allow the production of 
on-the-fly statistics on the scientific publications and 
any parameter in the associated archived data 
● a machine interface that will allow the ESAC Science
Archives to query the system and retrieve the relevant
relations between observational data and papers, to be
shown within the archives
● contribution to the ADS tagging effort for Linking 
Literature and Data 
High Level Overview
1. Consumes
● PDF files
● Groups of zipped PDFs
● Excel files with URLs of
PDFs
2. Classifies
● Automatically detects
Observations in the PDFs
● Requests Human
Intervention only if unsure
about the detection
3. Reports
● Web based Search page
● Full text searching
● Faceted searching
● OLAP reports
4. Integrates
● ESAC Mission Archives AIO
● Machine to Machine
RESTful API server
Architectural Design
● User uploads Publications
● System tries to locate references to Observation IDs in the
publication
● If none are found, it tries to locate references to Observation Dates
● Dates are filtered to remove invalid matches
● They are scored according to location and matching keywords
in the surrounding text
● An aggregated score is calculated for each Date
● Top scoring Observation Dates are used to pull Observations
performed on those Dates
● These Observations are scored according to references in the
publication (instrument names, targets etc.)
● The scored Observations are displayed in the Mission
Administrator's dashboard
● The Mission Administrator:
  ● Approves or Rejects Observation suggestions
  ● Can manually specify Observations missed by automatic
  parsing
  ● Can specify threshold limits for scored Observations so that the
  verification process can be automated
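The workflow above can be sketched roughly as follows. This is a minimal illustration in Python, not the project's Java implementation: the 10-digit OBSID pattern, the ISO date format, the keyword list, the context window and the scoring weights are all assumptions chosen for the example.

```python
import re
from collections import defaultdict
from datetime import date

# Assumed ISO-style date pattern; real papers use many more formats.
DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
# Hypothetical keywords suggesting that a date refers to an observation.
KEYWORDS = ("observed", "observation", "exposure")
WINDOW = 40  # characters of surrounding text inspected on each side


def find_obsids(text, pattern=r"\b\d{10}\b"):
    """Return candidate Observation IDs (the default pattern is an assumption)."""
    return sorted(set(re.findall(pattern, text)))


def find_dates(text, mission_start, mission_end):
    """Yield (date, position) for valid dates within the mission lifetime."""
    for m in DATE_RE.finditer(text):
        try:
            d = date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
        except ValueError:
            continue  # not a calendar date, e.g. month 13
        if mission_start <= d <= mission_end:
            yield d, m.start()


def score_dates(text, mission_start, mission_end):
    """Aggregate one score per date from keyword hits around each mention."""
    scores = defaultdict(float)
    for d, pos in find_dates(text, mission_start, mission_end):
        context = text[max(0, pos - WINDOW): pos + WINDOW].lower()
        hits = sum(kw in context for kw in KEYWORDS)
        scores[d] += 1.0 + hits  # base score per mention plus keyword bonus
    return dict(scores)


def triage(scored, threshold):
    """Split candidates into auto-approved and needing human review."""
    approved = sorted(k for k, s in scored.items() if s >= threshold)
    review = sorted(k for k, s in scored.items() if s < threshold)
    return approved, review


def classify(text, mission_start, mission_end):
    """Prefer explicit OBSIDs; fall back to scored dates if none are found."""
    obsids = find_obsids(text)
    if obsids:
        return {"obsids": obsids}
    return {"dates": score_dates(text, mission_start, mission_end)}
```

The step that pulls the Observations performed on the top-scoring dates is omitted, since it depends on the mission archive being queried. The `triage` threshold plays the role of the limit the Mission Administrator can set to automate verification.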
1. Project based on
● Java
● Spring Boot Framework
2. Front-end built with
● Java
● Google Web Toolkit (GWT)
● Twitter Bootstrap
3. Indexing / Searching provided by
● ElasticSearch
4. PDF parsing
● PDFBox
● TIKA
5. Reporting – OLAP Cubes
● JasperServer
6. Backend Datastore
● PostgreSQL
7. Communication between services
● RESTful calls with JSON payloads
8. Charting
● HighCharts
9. Development Stack
● Eclipse IDE
● Apache Tomcat
● Maven
● Jenkins CI
● Trello Project management
10. Powered by
● Lots and lots of developer love
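To illustrate the "RESTful calls with JSON payloads" item in the list above, the sketch below shows what a publication-to-observation link might look like when exchanged between services. The poster does not specify the payload schema, so every field name here (bibcode, mission, observations, obsid, score, status) is hypothetical, and Python is used only for brevity.

```python
import json


def build_link_payload(bibcode, mission, observations):
    """Serialize one publication-to-observation link as a JSON string.

    `observations` is a list of (obsid, score, status) tuples; the schema
    is an assumption made for this example.
    """
    return json.dumps(
        {
            "bibcode": bibcode,
            "mission": mission,
            "observations": [
                {"obsid": obsid, "score": score, "status": status}
                for obsid, score, status in observations
            ],
        },
        sort_keys=True,
    )


def approved_obsids(payload):
    """Extract the OBSIDs marked as approved from a received payload."""
    data = json.loads(payload)
    return [o["obsid"] for o in data["observations"] if o["status"] == "approved"]


if __name__ == "__main__":
    payload = build_link_payload(
        "PAPER-A", "XMM-Newton",
        [("OBS-1", 3.0, "approved"), ("OBS-2", 1.0, "pending")],
    )
    print(payload)
    print(approved_obsids(payload))
```

An archive client consuming such a payload would only need the approved identifiers to display the papers associated with an observation.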
Classification Workflow
Log output from Observation ID and Observation Date 
matching showing a list of the matches and their relative 
location in the surrounding text. (WIP)
