Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Arecibo
Observatory
Data Movement:
so much more than data
2021.05.12
Julio Alvarado Negron
Big Data Program Manager @ Arec...
What is Big Data?
A collection of data that is huge in volume and yet
growing exponentially with time. In short such data ...
A Full Spectrum Pioneer of Sciences Since 1963
Is the study of radio waves
produced by a astronomical objects
such as Sun,...
ALFA
The Arecibo L-band Feed Array (ALFA) is a seven feed system that allows large-scale surveys of the sky to be
conducte...
Knowing the Sources and Discoveries
Venus Characterization
Venus is covered in a thick layer of clouds, but Arecibo’s rada...
Knowing the Sources and Discoveries
Fast Radio Bursts
Fast radio bursts, or FRBs, are brief, brilliant blasts of radio wav...
50+ Years of Contributions
50+ Years of Contributions
First Cable Snaps
On August 10th a first cable snaps
causing damage to the dish.
Second Cable Snaps
On November 6th a secon...
The Big Picture
Arecibo Observatory holds over
3PB of data onsite. This amount is
spread between active hard drives,
offlin...
The Call for Help
Right after the collapse, the team
at Arecibo understood the urgency of
adding redundancy and safekeep t...
Data Migration
Once the working groups worked intensely
to establish the processes and the mechanisms, the
team at Arecibo...
Upcoming SlideShare
Loading in …5
×

GlobusWorld 2021: Saving Arecibo Observatory Data

The story of how Globus helped the Arecibo Observatory save 50+ years of data for posterity and future research. Presented at the GlobusWorld 2021 conference by Julio Alvarado Negron.

  • Be the first to comment

GlobusWorld 2021: Saving Arecibo Observatory Data

  1. 1. Arecibo Observatory Data Movement: so much more than data 2021.05.12 Julio Alvarado Negron Big Data Program Manager @ Arecibo Observatory George B. Robb III, EPOC - Performance Chaser ESnet - Infrastructure Team Globus World 2021
  2. 2. What is Big Data? A collection of data that is huge in volume and yet growing exponentially with time. In short such data is so large and complex that limited traditional data management tools are able to store it or process it efficiently. Examples The New York Stock Exchange generates about 1TB of new trade data per day. Facebook generates over 500TB of data daily. A jet engine generates over 10TB of data in 30 minutes of flight. AO has the capability to generate over 80TB per day, with a total of over 3PB of data stored. Big Data @ AO - Data Management and Governance practices implementation - Facilitate access to community to Arecibo’s data - Enables access to High-Performing Computing - Implement best practices and lessons learned from partner observatories and research community Big Data Overview
  3. 3. A Full Spectrum Pioneer of Sciences Since 1963 Is the study of radio waves produced by a astronomical objects such as Sun, planets, pulsars, stars, etc. Arecibo radio telescope sensitivity allows astronomers to detect faint radio signals from far-off regions of the universe. Fast Radio Bursts, Pulsars, Spectral line, Exoplanets, VLBI. More Info Here Radio Astronomy Is the investigation of the earth's gaseous envelope. The Arecibo Radio Telescope can measure the growth and decay of disturbances in ionosphere (altitudes above 30 miles). The "big dish" is also used to study plasma physics processes in the electrically charged regions where radio waves are influenced most. More Info Here Atmospheric Sciences The Arecibo Observatory was the world's most powerful planetary radar system. The 305 meter Arecibo telescope equipped with a 1 MW transmitter at S-band (12.6 cm, 2380 MHz) was used for studies of small bodies in the solar system, terrestrial planets, and planetary satellites including the Moon. Near Earth Asteroids characterization, Surface Structure (spacecrafts landing) More Info Here Planetary Radar
  4. 4. ALFA The Arecibo L-band Feed Array (ALFA) is a seven feed system that allows large-scale surveys of the sky to be conducted with unprecedented sensitivity using the 305-m Arecibo telescope in Puerto Rico. ALFA, operating near 1.4 GHz, consists of a cluster of seven cooled dual-polarization feeds, a fiber-optical transmission system, and digital back-end signal processors. Most of this projects are considered “surveys” due to their nature. The radar is left static in a position while the Earth rotates, allowing to “drift scan” the sky above Arecibo. It could generate an aggregate of 875MB/s, 76TB per day. Knowing the Sources and Discoveries Using ALFA for ALFALFA
  5. 5. Knowing the Sources and Discoveries Venus Characterization Venus is covered in a thick layer of clouds, but Arecibo’s radar beams were able to cut through that haze and bounce off of the rocky planet’s surface, allowing researchers to map the terrain. In the figures, we can compare the first large scale view of Venus (1971) and the 2015 image with improved equipments. - Arecibo Discoveries
  6. 6. Knowing the Sources and Discoveries Fast Radio Bursts Fast radio bursts, or FRBs, are brief, brilliant blasts of radio waves with unknown origins. The first FRB known to give off multiple bursts was FRB 121102, which Arecibo first spotted in 2012 and again in 2015. Arecibo’s discovery backed up the theory from the Charles Parkes telescope in Australia that FRB’s are events that come farther than the Milky Way. Radio bursts are observed during 90 days followed by a silent period of 67 days. The same behaviour then repeats every 157 days. - Arecibo Discoveries
  7. 7. 50+ Years of Contributions
  8. 8. 50+ Years of Contributions
  9. 9. First Cable Snaps On August 10th a first cable snaps causing damage to the dish. Second Cable Snaps On November 6th a second cable snaps causing major damage to the dish. December’s Check Mate A main support cable broke from Tower 4, causing the platform to fall over the dish. The team got together and realized that the data safety and integrity was a priority. A Sequence of Snaps
  10. 10. The Big Picture Arecibo Observatory holds over 3PB of data onsite. This amount is spread between active hard drives, offline disks and the tape library. Arecibo also has copies of data stored on various institutions across the globe, to which we refer to as offsite data. Not enough fiber Arecibo’s Internet connection is limited to 1Gbps due to the condition of the infrastructure to the site. With the existing connection, transferring 3PB would need over 24 months. The Data in Numbers and Infra Limitations
  11. 11. The Call for Help Right after the collapse, the team at Arecibo understood the urgency of adding redundancy and safekeep the data. Immediately, we reached the Office of Research at UCF. From there, the logistics were driven funneled through the Research community. Getting the Teams Together In a matter of days, Arecibo got connected to working teams, BIG THANKS: - EPOC/ESnet - transfer optimization and hardware - CICoE - data management practices - TACC - high performance computing and storage - Univ of Puerto Rico HPCf - 10Gbps connectivity (I2 - AMPATH) - Engine-4 - 10Gbps connectivity - Globus - data transfer optimization The SOS Call THANK YOU!
  12. 12. Data Migration Once the working groups worked intensely to establish the processes and the mechanisms, the team at Arecibo proceeded to load the data to the NAS boxes. Those boxes are being taken to our partners, University of Puerto Rico at Mayaguez (UPRM) and Engine-4 (E4) in Bayamon. From there, the data is uploaded to the TACC via 10Gbps links. The UPRM has a 10Gbps via the AMPATH (I2) and E4 has a 10Gbps via commercial route. Benchmarks Before utilizing Globus, the team relied in rsync to move the data from Arecibo and the partners. That resulted in an avg transfer speed of 47MBps via 10Gbps wire. Once Globus Connect Personal was installed and configure in the NAS, the Effective Speed reported has been sustained at over 200MBps. The Data Transfer Arecibo Data Transfer Project Data uploaded to Computing Center 2 S t o r a g e t r a n s p o r t e d b a c k t o A O 3 1 D a t a t r a n s p o r t e d t o P a r t n e r

×