LAGOVirtual is an ongoing project to develop a collaboration platform for the Large Aperture GRB Observatory (LAGO). This continent-wide observatory is designed to detect the high-energy component of Gamma Ray Bursts using the single particle technique in arrays of Water Cherenkov Detectors (WCD) at high mountain sites.
LAGO Data Repository
1. www.eu-eela.eu
E-science grid facility for
Europe and Latin America
LAGOVirtual
A Collaborative Environment for the
Large Aperture GRB observatory
Rodrigo Torréns
Centro Nacional de Cálculo Científico
Universidad de Los Andes
Parque Tecnológico de Mérida
2nd EELA-2 Conference
25-27 November 2009
Choroní, Venezuela
2. www.eu-eela.eu
LAGOVirtual
LAGOVirtual is an ongoing project to
develop a collaboration platform for the
Large Aperture GRB Observatory (LAGO).
This continent-wide observatory is
designed to detect the high-energy component
of Gamma Ray Bursts using the single
particle technique in arrays of Water
Cherenkov Detectors (WCD) at high
mountain sites:
-Chacaltaya, Bolivia, 5300 m a.s.l.,
-Pico Espejo, Venezuela, 4750 m a.s.l.,
-Sierra Negra, Mexico, 4650 m a.s.l.
-Malargue and Bariloche, Argentina
3. www.eu-eela.eu
LAGOVirtual
LAGOVirtual will have four modules:
1. Remote access to the detectors
2. Access to AIRES (a system for air shower
simulations) and the ROOT data analysis tool.
3. Access to distributed data files.
4. Access to distributed document files.
We are working on the module of distributed
repositories of data files (3).
Choroní, 2nd EELA-2 Conference. 25-27 november 2009 3
5. www.eu-eela.eu
Data Repositories
• The first initiatives were preprint repositories in HEP
(Los Alamos archive: arXiv.org)
• This initiative started the Open Access movement
• Networks of institutional repositories are being
constructed worldwide
But…
• Every year, data from High Energy Physics
experiments are lost or become inaccessible (as in other
areas)
6. www.eu-eela.eu
Data Repositories
• It is essential that results of experiments remain
accessible for accountability, re-analysis, and training
of future generations.
• These needs have driven the Open Data
movement and the construction of...
Data Repositories
7. www.eu-eela.eu
LAGO Data Repositories
• The LAGO Data Repositories (LAGO-DR) will allow the
collaboration to share data captured by the WCD
and/or generated by simulations.
[Diagram labels: LAGO-DR, WCD]
8. www.eu-eela.eu
LAGO Data Repositories
• Researchers at each
institution and country can
self-archive data files
generated by the project
instruments
• They can also create
automatic entries and files
for bulk data load.
• Value added services
could be generated:
federated search, data
replication, data analysis
between sites...
9. www.eu-eela.eu
Data and metadata structure
Data for LAGO are classified mainly into three types:
• instrument calibration data
• datasets captured by the WCD instruments
• data simulated by the AIRES application.
10. www.eu-eela.eu
Data and metadata structure
• Each data file is described by a metadata element set
specifically adapted to LAGO.
• The metadata model we propose for LAGOVirtual is an
adaptation of the model proposed for the CCLRC* and of
Dublin Core standard elements
*Council for the Central Laboratory of the Research Councils. UK
11. www.eu-eela.eu
Data and metadata structure
The existence and implementation of a standard
scientific metadata model will:
• allow uniform access to data for all members of the LAGO
collaboration,
• enable interoperability between scientific information
systems, and also ....
• contribute to data preservation and long-term
usability.
12. www.eu-eela.eu
LAGOVirtual system
For LAGOVirtual …
• We have developed a prototype based on DSpace
software
• DSpace is open source software that enables open
sharing of many types of content, generally used for
institutional document repositories.
• Members of the project can upload individual data files
using a Web interface, or can use a bulk load mechanism
• The system allows each data file to be described using a
standardized metadata schema
As the diagram shows, LAGOVirtual will have four modules:
• Remote access to the detectors, which will allow the user to modify the photomultiplier (PMT) gain and baseline and the input channel for the photomultiplier.
• Access to the AIRES simulation environment (a system for air shower simulations [4]) and the ROOT data analysis tool [5].
• Access to distributed data files.
• Access to distributed document files.
The present contribution reports the characteristics and the data structure of a prototype for the module of distributed repositories of data files.
For almost two decades, electronic high energy physics preprints have been submitted to repositories, stored, indexed, retrieved, and shared. This initiative started the Open Access movement, which is now building a worldwide network of Institutional Repositories (IR). Although preserving scientific data was recommended long ago, every year data from High Energy Physics experiments are lost, forgotten, or kept within the restricted community of the collaboration.
Nowadays, the cost, sophistication, and rapid advancement of new experiments make it essential that previous results remain accessible for accountability, re-analysis, and training of future generations. These needs have driven the Open Data movement. At the moment, repositories are primarily designed for preprints, but technically those documents could be linked to datasets hosted by the same repository or elsewhere. With the data linked to the document or preserved in a repository, students, researchers, and groups outside the experimental collaboration could benefit from increased access to data.
The LAGO Data Repositories (LAGO-DR) allow the collaboration to share data captured by the WCD and/or generated by simulations.
Researchers at each institution and country can self-archive data files generated by the project instruments. They can also create automatic entries and files for bulk data load. Value-added services could be generated: federated search, data replication, data analysis between sites...
The existence and implementation of a standard scientific metadata model will allow uniform access to data for all members of the LAGO collaboration, enable interoperability between scientific information systems, and contribute to data preservation and long-term usability. The metadata model we propose for LAGOVirtual is an adaptation of the model proposed for the CCLRC (Council for the Central Laboratory of the Research Councils, http://epubs.cclrc.ac.uk/bitstream/485/ ).
We have developed a prototype based on DSpace software. DSpace is open source software that enables open sharing of many types of content, generally used for institutional document repositories. Members of the project can upload individual data files using a Web interface, or can use a bulk load mechanism. The system allows each data file to be described using a standardized metadata schema.
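As an illustration of what a bulk load could look like, here is a minimal Python sketch that prepares one item in DSpace's Simple Archive Format (an item directory holding a dublin_core.xml file, a contents file, and the data files). It assumes the bulk load goes through that format; the notes do not say which mechanism the prototype uses. The function name write_saf_item, the item name, the file name, and all metadata values are hypothetical.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def write_saf_item(batch_dir, item_name, metadata, data_files):
    """Write one item directory in DSpace Simple Archive Format.

    metadata: list of (element, qualifier, value) Dublin Core triples.
    data_files: dict mapping file name -> bytes content.
    """
    item_dir = Path(batch_dir) / item_name
    item_dir.mkdir(parents=True, exist_ok=True)

    # dublin_core.xml holds one <dcvalue> element per metadata field.
    root = ET.Element("dublin_core")
    for element, qualifier, value in metadata:
        dcvalue = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        dcvalue.text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # Store the data files and list them in "contents", one name per line.
    for name, payload in data_files.items():
        (item_dir / name).write_bytes(payload)
    (item_dir / "contents").write_text("\n".join(data_files) + "\n", encoding="utf-8")
    return item_dir


# Hypothetical example values; none of them come from the LAGO prototype.
example_metadata = [
    ("title", "none", "WCD run, Pico Espejo"),
    ("date", "created", "2009-11-25"),
    ("type", "none", "Dataset"),
]

with tempfile.TemporaryDirectory() as tmp:
    item = write_saf_item(tmp, "item_000", example_metadata, {"run_0001.dat": b"0 1 2\n"})
    print(sorted(p.name for p in item.iterdir()))
```

A directory of such items could then be handed to the DSpace batch importer; the LAGO-specific metadata elements would replace the generic ones used here.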
This is the main page of the LAGOVirtual web site. Data can be retrieved by different criteria: capture date, country, institution, and data type, among others.
Based on the community, sub-community, collection, and item model available in DSpace, we established a hierarchical structure: the countries that belong to the collaboration are communities, institutions are sub-communities, and each institution has collections for the different data types, to which the items (data and document files) are associated.
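The hierarchy described above can be pictured with a small Python sketch that groups items by country, institution, and data type. This is only a plain in-memory model, not DSpace code; the function build_hierarchy, the institution names, and the file names are hypothetical placeholders.

```python
from collections import defaultdict


def build_hierarchy(items):
    """Group items as country -> institution -> data type -> [item names]."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for item in items:
        # community -> sub-community -> collection -> item
        tree[item["country"]][item["institution"]][item["data_type"]].append(item["name"])
    return tree


# Hypothetical items; institutions and file names are placeholders.
items = [
    {"country": "Venezuela", "institution": "Institution A",
     "data_type": "calibration", "name": "calib_001.dat"},
    {"country": "Venezuela", "institution": "Institution A",
     "data_type": "wcd", "name": "run_0001.dat"},
    {"country": "Bolivia", "institution": "Institution B",
     "data_type": "simulation", "name": "aires_001.dat"},
]

tree = build_hierarchy(items)
for country, institutions in tree.items():
    for institution, collections in institutions.items():
        for data_type, names in collections.items():
            print(country, "/", institution, "/", data_type, ":", names)
```

The three data types used here follow the classification given on the slides: calibration data, datasets captured by the WCD, and data simulated with AIRES.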
This is an item view. Each data file (an item in DSpace) is described by a metadata set specifically adapted to LAGO. Here we see the metadata associated with the items: description, date of capture, institution, people responsible for the data, instruments used for data capture, problems associated with the data file, etc. Several files can be associated with each item: data files, graphics, etc. Members of the project can be given the appropriate level of access to insert new data files and to review and modify them.
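To make the item metadata concrete, the following Python sketch models one record with the fields listed above and shows how records could be selected by simple criteria. It is an assumption-laden illustration: the class ItemMetadata, the helper select, the field names, and all values are hypothetical, and the actual LAGO schema (adapted from the CCLRC model) is not reproduced here.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ItemMetadata:
    """Hypothetical record mirroring the fields shown in the item view."""
    description: str
    capture_date: date
    institution: str
    responsible: list[str]
    instruments: list[str]
    problems: str = ""
    files: list[str] = field(default_factory=list)  # data files, graphics, etc.


def select(records, **criteria):
    """Return the records whose attributes equal every given criterion."""
    return [
        r for r in records
        if all(getattr(r, key) == value for key, value in criteria.items())
    ]


# Hypothetical records; names and values are placeholders.
records = [
    ItemMetadata("Calibration run", date(2009, 11, 25), "Institution A",
                 ["Researcher 1"], ["WCD-1"], files=["calib_001.dat", "calib_001.png"]),
    ItemMetadata("Night run", date(2009, 11, 26), "Institution B",
                 ["Researcher 2"], ["WCD-2"], problems="baseline drift"),
]

for record in select(records, institution="Institution A"):
    print(record.description, record.capture_date, record.files)
```

Keeping the record as a typed structure makes it easy to check that required fields are present before an item is submitted.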