Arecibo
Observatory
Data Movement:
so much more than data
2021.05.12
Julio Alvarado Negron
Big Data Program Manager @ Arecibo Observatory
George B. Robb III,
EPOC - Performance Chaser
ESnet - Infrastructure Team
Globus World 2021
What is Big Data?
A collection of data that is huge in volume and yet
growing exponentially with time. In short, such data is so
large and complex that few traditional data
management tools are able to store or process it
efficiently.
Examples
The New York Stock Exchange generates about
1TB of new trade data per day. Facebook generates over
500TB of data daily. A jet engine generates over 10TB of
data in 30 minutes of flight.
AO has the capability to generate over 80TB per
day, with a total of over 3PB of data stored.
Big Data @ AO
- Implement data management and governance practices
- Facilitate community access to Arecibo’s data
- Enable access to high-performance computing
- Implement best practices and lessons learned from
partner observatories and the research community
Big Data Overview
A Full Spectrum Pioneer of Sciences Since 1963
Radio Astronomy
The study of radio waves
produced by astronomical objects
such as the Sun, planets, pulsars,
and stars. The Arecibo radio telescope’s
sensitivity allows astronomers to
detect faint radio signals from
far-off regions of the universe.
Fast Radio Bursts, Pulsars,
Spectral lines, Exoplanets, VLBI.
Atmospheric Sciences
The investigation of the Earth's
gaseous envelope. The Arecibo
Radio Telescope can measure the
growth and decay of disturbances
in the ionosphere (altitudes above 30
miles). The "big dish" is also used
to study plasma physics processes
in the electrically charged regions
where radio waves are influenced
most.
Planetary Radar
The Arecibo Observatory was the
world's most powerful planetary
radar system. The 305-meter
Arecibo telescope, equipped with a
1 MW transmitter at S-band (12.6
cm, 2380 MHz), was used for
studies of small bodies in the solar
system, terrestrial planets, and
planetary satellites including the
Moon.
Near-Earth asteroid
characterization, surface structure
(spacecraft landing)
ALFA
The Arecibo L-band Feed Array (ALFA) is a seven-feed system that allows large-scale surveys of the sky to be
conducted with unprecedented sensitivity using the 305-m Arecibo telescope in Puerto Rico. ALFA, operating near 1.4 GHz,
consists of a cluster of seven cooled dual-polarization feeds, a fiber-optic transmission system, and digital back-end signal
processors.
Most of these projects are considered “surveys” due to their nature. The telescope is left static in one position while the Earth
rotates, allowing it to “drift scan” the sky above Arecibo.
It could generate an aggregate of 875MB/s, or about 76TB per day.
Knowing the Sources and Discoveries
Using ALFA for ALFALFA
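The ALFA figures above (875MB/s and 76TB per day) can be cross-checked with a short calculation. This is a minimal sketch that assumes a rate sustained for a full 24 hours and decimal units (1 TB = 1,000,000 MB); the function name is illustrative and not from the slides.

```python
# Convert a sustained data rate into a daily volume.
# Assumption: decimal units, 1 TB = 1,000,000 MB, rate held for 24 hours.
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_tb(rate_mb_per_s: float) -> float:
    """Return the terabytes produced in one day at a sustained rate in MB/s."""
    return rate_mb_per_s * SECONDS_PER_DAY / 1_000_000


alfa_tb_per_day = daily_volume_tb(875)
print(f"{alfa_tb_per_day:.1f} TB/day")  # 75.6 TB/day, which rounds to 76TB
```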
Knowing the Sources and Discoveries
Venus Characterization
Venus is covered in a thick layer of clouds, but Arecibo’s radar beams were able to cut through that haze and
bounce off of the rocky planet’s surface, allowing researchers to map the terrain.
In the figures, we can compare the first large-scale view of Venus (1971) and the 2015 image taken with improved
equipment.
- Arecibo Discoveries
Knowing the Sources and Discoveries
Fast Radio Bursts
Fast radio bursts, or FRBs, are brief, brilliant blasts of radio waves with unknown origins. The first FRB known to
give off multiple bursts was FRB 121102, which Arecibo first spotted in 2012 and again in 2015.
Arecibo’s discovery backed up the theory from the Parkes telescope in Australia that FRBs are events
that originate beyond the Milky Way.
FRB 121102's radio bursts are observed during a window of about 90 days, followed by a silent period of about 67 days. The same behaviour then repeats
every 157 days.
- Arecibo Discoveries
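The activity cycle described above (about 90 active days plus 67 silent days, repeating every 157 days) can be expressed as a simple phase check. This is only an illustrative sketch: the reference point (the start of an active window) and the function name are assumptions, not values from the slides.

```python
# Model the repeating activity cycle: 90 active days, then 67 silent days.
ACTIVE_DAYS = 90
SILENT_DAYS = 67
PERIOD_DAYS = ACTIVE_DAYS + SILENT_DAYS  # 157 days, matching the stated period


def in_active_window(days_since_window_start: float) -> bool:
    """Return True if the offset falls inside the active part of the cycle.

    The offset is measured from the start of any active window
    (a hypothetical reference epoch chosen for illustration).
    """
    phase = days_since_window_start % PERIOD_DAYS
    return phase < ACTIVE_DAYS


# Fraction of each cycle during which bursts are expected.
duty_cycle = ACTIVE_DAYS / PERIOD_DAYS
print(f"period: {PERIOD_DAYS} days, duty cycle: {duty_cycle:.0%}")
```

For example, an offset of 100 days falls in the silent period, while an offset of 160 days is 3 days into the next active window.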
50+ Years of Contributions
First Cable Snaps
On August 10th, 2020, the first cable snapped,
causing damage to the dish.
Second Cable Snaps
On November 6th, a second cable
snapped, causing major damage to the
dish.
December’s Checkmate
A main support cable broke from
Tower 4, causing the platform to fall
onto the dish.
The team got together and
realized that data safety and
integrity were a priority.
A Sequence of Snaps
The Big Picture
Arecibo Observatory holds over
3PB of data onsite. This amount is
spread across active hard drives,
offline disks and the tape library.
Arecibo also has copies of data
stored at various institutions across the
globe, which we refer to as offsite
data.
Not enough fiber
Arecibo’s Internet connection is
limited to 1Gbps due to the condition of
the infrastructure to the site.
With the existing connection,
transferring 3PB would take over 24
months.
The Data in Numbers and Infra Limitations
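The transfer-time estimate above can be put in context with a back-of-the-envelope calculation. This sketch assumes decimal units (3 PB = 3e15 bytes), a 30.44-day average month, and no protocol overhead; the names are illustrative. Under those assumptions, a fully saturated 1Gbps link would need roughly 278 days, so an estimate of over 24 months presumably reflects an effective throughput well below line rate.

```python
# Estimate transfer time for the archive over the site's 1 Gbps link.
# Assumptions: decimal units, no protocol overhead, average month = 30.44 days.
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30.44

ARCHIVE_BYTES = 3e15            # 3 PB
LINE_RATE_BYTES_PER_S = 1e9 / 8  # 1 Gbps expressed in bytes per second


def transfer_days(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Return the number of days needed to move size_bytes at a sustained rate."""
    return size_bytes / rate_bytes_per_s / SECONDS_PER_DAY


# Best case: the link is saturated the whole time.
best_case_days = transfer_days(ARCHIVE_BYTES, LINE_RATE_BYTES_PER_S)

# Effective rate (in MB/s) that would make the transfer last exactly 24 months.
implied_rate_mb_s = ARCHIVE_BYTES / (24 * DAYS_PER_MONTH * SECONDS_PER_DAY) / 1e6

print(f"best case at line rate: {best_case_days:.0f} days")
print(f"rate implied by 24 months: {implied_rate_mb_s:.1f} MB/s")
```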
The Call for Help
Right after the collapse, the team
at Arecibo understood the urgency of
adding redundancy and safekeeping the
data. We immediately reached out to the
Office of Research at UCF. From there,
the logistics were funneled
through the research community.
Getting the Teams Together
In a matter of days, Arecibo was
connected with working teams. BIG THANKS to:
- EPOC/ESnet - transfer optimization
and hardware
- CICoE - data management practices
- TACC - high performance
computing and storage
- Univ of Puerto Rico HPCf - 10Gbps
connectivity (I2 - AMPATH)
- Engine-4 - 10Gbps connectivity
- Globus - data transfer optimization
The SOS Call
THANK YOU!
Data Migration
After the working groups worked intensely
to establish the processes and mechanisms, the
team at Arecibo proceeded to load the data onto
NAS boxes.
Those boxes are being taken to our partners, the
University of Puerto Rico at Mayaguez (UPRM) and
Engine-4 (E4) in Bayamon. From there, the data is
uploaded to TACC via 10Gbps links.
UPRM has a 10Gbps link via AMPATH (I2),
and E4 has a 10Gbps link via a commercial route.
Benchmarks
Before using Globus, the team relied on
rsync to move the data from Arecibo and the
partners. That resulted in an average transfer speed of
47MBps over a 10Gbps link.
Once Globus Connect Personal was installed
and configured on the NAS, the reported effective speed
has been sustained at over 200MBps.
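The two benchmark figures above can be compared against the capacity of the 10Gbps link. This is a rough sketch that assumes "MBps" means megabytes per second and uses decimal units; the 200MBps value is treated as a lower bound, and the 3PB single-stream projection is a hypothetical illustration (the actual migration used two partner links).

```python
# Compare the reported rsync and Globus rates against a 10 Gbps link.
# Assumptions: "MBps" = megabytes per second, decimal units throughout.
LINK_MB_PER_S = 10e9 / 8 / 1e6   # 10 Gbps link = 1250 MB/s
RSYNC_MB_PER_S = 47.0            # average reported for rsync
GLOBUS_MB_PER_S = 200.0          # lower bound reported for Globus


def utilization(rate_mb_per_s: float, link_mb_per_s: float = LINK_MB_PER_S) -> float:
    """Return the fraction of link capacity used at the given rate."""
    return rate_mb_per_s / link_mb_per_s


def days_to_move(size_tb: float, rate_mb_per_s: float) -> float:
    """Return days needed to move size_tb terabytes at a sustained rate."""
    return size_tb * 1e6 / rate_mb_per_s / 86_400


speedup = GLOBUS_MB_PER_S / RSYNC_MB_PER_S  # roughly 4.3x

print(f"speedup: {speedup:.2f}x")
print(f"rsync utilization:  {utilization(RSYNC_MB_PER_S):.1%}")
print(f"Globus utilization: {utilization(GLOBUS_MB_PER_S):.1%}")
# Hypothetical projection: 3 PB (3000 TB) over a single stream at each rate.
print(f"3PB via rsync:  {days_to_move(3000, RSYNC_MB_PER_S):.0f} days")
print(f"3PB via Globus: {days_to_move(3000, GLOBUS_MB_PER_S):.0f} days")
```

Even at the higher rate, the transfer uses only a fraction of the link under these assumptions, which suggests the bottleneck lies elsewhere (for example, in the NAS or endpoint software) rather than in the network.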
The Data Transfer
Arecibo Data Transfer Project
1. Data transported to Partner
2. Data uploaded to Computing Center
3. Storage transported back to AO

GlobusWorld 2021: Saving Arecibo Observatory Data
