OCEAN/ATMOSPHERE SCIENCES: A DATA-DRIVEN SCIENCE
SURYACHANDRA A. RAO
(surya@tropmet.res.in)
INDIAN INSTITUTE OF TROPICAL METEOROLOGY
MINISTRY OF EARTH SCIENCES
INDIA
Thanks to DDN for this opportunity
Outline of the Presentation
• Data growth in ocean/atmosphere sciences
• Past and present HPC and data storage setups at MoES
• Major challenges in data management
• Future expectations
Mandate of MoES
The primary mandate of the Ministry of Earth Sciences (MoES) is to provide the nation with the best possible services in forecasting the monsoons and other weather/climate parameters, the ocean state, earthquakes, tsunamis and other phenomena related to earth systems.
Ekman Currents in the Ocean
MoES Weather, Climate and Ocean State Forecasts
• Short range (next 2-3 days)
• Medium range (up to 7-10 days)
• Extended range (beyond 2 weeks, up to one month)
• Long range (seasonal mean)
• Climate change projections (contributing to IPCC, CMIP6)
• Ocean state forecast (next 3-5 days)
• Potential Fishing Zone advisories
• Air quality forecast (next 2-3 days)
• Agricultural, forest fire and hydrology advisories
• Tsunami warnings
Exponential Data Growth in Earth Sciences over the Last Two Decades
Data Required to Generate Forecasts
[Figure: data volume by year, 1997 to 2016. Series: FTP (SAT + RADAR), in GB/day; IMD (GTS) surface/upper-air observations, in GB/day.]
Why so much data?
To make a single day's forecast:
• Initial data from various sensors/satellites: 75 GB/day
• Analysis (combining the data with model forecasts): 75 GB/day
• Model forecasts: 250 GB/day (the model runs at much higher resolution)
• To reduce forecast uncertainty, 50 of these model forecasts are re-run with slight perturbations to the initial data/analysis: 50 TB/day
In addition, several forecasts are made each day (medium, extended and long range), alongside climate change projections and R&D experiments to improve the models.
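The per-stage figures above can be tallied to see where the daily volume goes. This is a minimal Python sketch, assuming decimal units (1 TB = 1000 GB, which the slide does not specify); it takes the 50 TB/day ensemble figure as stated rather than deriving it from the member count. The names `STAGE_VOLUMES_GB`, `daily_total_tb` and `stage_share` are illustrative.

```python
# Per-stage daily volumes (GB/day) for one day's forecast cycle, as listed on the slide.
# Assumption: decimal units (1 TB = 1000 GB); the slide does not specify.
STAGE_VOLUMES_GB = {
    "initial data (sensors/satellites)": 75,
    "analysis": 75,
    "model forecast": 250,
    "50-member perturbed ensemble": 50_000,  # 50 TB/day, taken as stated
}


def daily_total_tb(volumes_gb: dict) -> float:
    """Sum the per-stage volumes (GB/day) and return the total in TB/day."""
    return sum(volumes_gb.values()) / 1000


def stage_share(volumes_gb: dict, stage: str) -> float:
    """Fraction of the daily total contributed by a single stage."""
    return volumes_gb[stage] / sum(volumes_gb.values())


if __name__ == "__main__":
    total = daily_total_tb(STAGE_VOLUMES_GB)
    share = stage_share(STAGE_VOLUMES_GB, "50-member perturbed ensemble")
    print(f"total: {total:.1f} TB/day; ensemble share: {share:.1%}")
```

With these inputs the total is about 50.4 TB/day, and the ensemble accounts for roughly 99% of it, so the ensemble output dominates storage planning for each forecast cycle.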
HPC and Data Storage Capacities at MoES
Year   HPC capacity at IITM (MoES total)   Data storage capacity (MoES)   Tape capacity
2008   7 TF (50 TF)                        300 TB                         1 PB
2009   70 TF (115 TF)                      3 PB                           2 PB
2014   790 TF (1150 TF)                    18 PB                          30 PB
2018   4000 TF (8000 TF)                   45 PB                          40 PB
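The growth in the capacity table can be summarized as a compound annual growth rate (CAGR). This is a minimal Python sketch using the table's 2008 and 2018 values, assuming 1 PB = 1000 TB; the helper names are illustrative.

```python
import math

# Capacities from the table: HPC at IITM in TF, MoES data storage in TB (1 PB = 1000 TB assumed).
HPC_IITM_TF = {2008: 7, 2009: 70, 2014: 790, 2018: 4000}
STORAGE_MOES_TB = {2008: 300, 2009: 3_000, 2014: 18_000, 2018: 45_000}


def cagr(series: dict, start: int, end: int) -> float:
    """Compound annual growth rate between two years of a capacity series."""
    return (series[end] / series[start]) ** (1 / (end - start)) - 1


def doubling_time_years(rate: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + rate)


if __name__ == "__main__":
    for name, series in [("HPC at IITM", HPC_IITM_TF), ("MoES storage", STORAGE_MOES_TB)]:
        r = cagr(series, 2008, 2018)
        print(f"{name}: {r:.0%} per year, doubling every {doubling_time_years(r):.1f} years")
```

Over 2008 to 2018 this gives roughly 89% per year for IITM compute and 65% per year for MoES disk storage, so compute capacity grew faster than storage capacity over that decade.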
Challenges and Changes in data flow
Challenges
• Programming of efficient workflows
• Efficient analysis of data
• Organizing data sets
• Ensuring reproducibility of workflows/provenance of data
• Meeting compute/storage needs in an increasingly complex future hardware landscape
Expected Data Characteristics in 2020+
• Velocity: Input 5 TB/day (for NWP; reduced data from instruments)
• Volume: ensemble output amounting to petabytes of data
• Data products are used by third parties
• Various file formats
Source: Julian M. Kunkel, University of Reading
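The expected 5 TB/day NWP input can be converted into the average bandwidth needed to ingest it. This Python sketch assumes decimal units and a uniform arrival rate over 24 hours; real observation arrivals are likely burstier, so peak rates would be higher than this average.

```python
SECONDS_PER_DAY = 86_400


def sustained_rate_mb_s(volume_tb_per_day: float) -> float:
    """Average rate in MB/s (decimal units) needed to move a daily volume within 24 h."""
    return volume_tb_per_day * 1_000_000 / SECONDS_PER_DAY


if __name__ == "__main__":
    # 5 TB/day is the expected 2020+ NWP input volume from the slide.
    print(f"{sustained_rate_mb_s(5):.1f} MB/s")
```

At 5 TB/day the average works out to about 58 MB/s, a modest figure on its own; the larger pressure comes from the petabyte-scale ensemble output.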
Major Concerns in Data Handling
• Data collection and preservation
  - Data volumes are growing rapidly (multi-platform, multi-model, multi-resolution)
  - Should data be kept readily available or archived for long-term storage?
  - Moving data from one configuration to another as systems are upgraded
• Accessing the preserved data
  - Retrieving archived data faster than is currently possible
• Utilizing the preserved data
  - Combining data from different experiments, forecasts, etc.
  - Transferring data from one computer to another (bandwidth limitations)
  - Analyzing the large datasets themselves
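One concern above is moving data between configurations as systems are upgraded. A common safeguard is to compare checksum manifests of the source and target after copying. The Python sketch below is generic and hypothetical: the function names and directory layout are illustrative and not part of any MoES or DDN tool, and production migration tools may provide their own verification.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Stream a file through SHA-256 so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path relative to root to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(source: dict, target: dict) -> dict:
    """List files missing from the target, unexpected extras, and digest mismatches."""
    common = source.keys() & target.keys()
    return {
        "missing": sorted(source.keys() - target.keys()),
        "extra": sorted(target.keys() - source.keys()),
        "mismatched": sorted(k for k in common if source[k] != target[k]),
    }


def demo() -> dict:
    """Simulate a migration in which one file is corrupted and one is lost."""
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp, "old_system"), Path(tmp, "new_system")
        (src / "run01").mkdir(parents=True)
        (src / "run01" / "a.dat").write_bytes(b"member A")
        (src / "run01" / "b.dat").write_bytes(b"member B")
        (src / "c.dat").write_bytes(b"analysis C")
        shutil.copytree(src, dst)
        (dst / "run01" / "b.dat").write_bytes(b"corrupted")  # simulated bad copy
        (dst / "c.dat").unlink()  # simulated lost file
        return compare_manifests(build_manifest(src), build_manifest(dst))


if __name__ == "__main__":
    print(demo())
```

Hashing every byte is itself a full read of both copies, so at petabyte scale this check costs about as much I/O as the migration; it is usually run incrementally or on a sample.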
DDN Confidential | DDN Storage | ©2018 DataDirect Networks, Inc.
MOES ASSOCIATION WITH DDN
• 2013: 1 PB DDN S2A9900, 5 GB/s performance
• 2016: 10 PB DDN 7700X, 200 GB/s performance
• 2019: 27 PB DDN SFA18K, 81 GB/s performance
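Capacity and bandwidth together set a lower bound on how long it takes to read or write an entire system once, which matters for migrations between generations. This Python sketch uses the capacities and stated performance figures above, assuming decimal units; it ignores metadata overhead, small files and contention, so real migrations would take longer.

```python
# (year, system, capacity in PB, stated performance in GB/s), from the slide.
DDN_SYSTEMS = [
    (2013, "DDN S2A9900", 1, 5),
    (2016, "DDN 7700X", 10, 200),
    (2019, "DDN SFA18K", 27, 81),
]


def full_pass_hours(capacity_pb: float, bandwidth_gb_s: float) -> float:
    """Hours to stream the full capacity once at the stated bandwidth (decimal units)."""
    return capacity_pb * 1_000_000 / bandwidth_gb_s / 3600


if __name__ == "__main__":
    for year, name, pb, gbs in DDN_SYSTEMS:
        print(f"{year} {name}: {full_pass_hours(pb, gbs):.1f} h")
```

The 2016 system could be streamed in about 14 hours, while the 2019 archive, with 2.7 times the capacity and lower stated bandwidth, needs nearly four days for a single full pass even in this idealized case.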
Innovative Use of 300 m InfiniBand Cables for 10 PB Storage
• IITM has two data centers 300 m apart, and MoES wanted storage connectivity between them over RDMA-capable InfiniBand.
• This was the first use in APAC of 300 m Mellanox LinkX modules and InfiniBand cables. The storage delivered 200 GB/s.
• Data was migrated from the DDN S2A9900, which was connected to POWER6 compute over DDR InfiniBand.
[Diagram: 10 PB DDN storage with leaf and spine IB switches in DC1; 6x 648-port chassis switches in DC2; 300 m SR4 optical modules, patch panels with MPO connectors and multi-mode fiber in a protective duct linking DC1 and DC2; Aaditya 790+ TF HPC system; data movers; QLogic/SilverStorm DDR IB switch; DDN S2A9900.]
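The propagation delay added by the 300 m link between the two data centers can be estimated from the speed of light in fiber. This Python sketch assumes a refractive index of about 1.47, typical for silica fiber; the actual value for the installed cables is not given on the slide.

```python
SPEED_OF_LIGHT_M_S = 299_792_458


def fiber_delay_us(length_m: float, refractive_index: float = 1.47) -> float:
    """One-way propagation delay in microseconds over optical fiber of the given length."""
    return length_m * refractive_index / SPEED_OF_LIGHT_M_S * 1e6


if __name__ == "__main__":
    one_way = fiber_delay_us(300)
    print(f"one-way: {one_way:.2f} us, round trip: {2 * one_way:.2f} us")
```

Roughly 1.5 µs one way (about 3 µs round trip) is small next to millisecond-scale disk access, which helps explain why an RDMA fabric spanning the two buildings is practical.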
Adoption of Disk Archive & HPSS for Long Term Archive
[Diagram: existing Cray compute and storage environment (home and scratch file systems) connected via core switches 1 and 2, top-of-rack (TOR) switches and a Cray Ethernet switch; data movers and NAS gateways on an EDR InfiniBand network; DDN storage, 17 PB at IITM and 10 PB at NCMRWF.]
• In 2019, MoES decided to procure a 27 PB disk-based archive, together with HPSS for long-term archiving, at two of its sites.
• Through a competitive tendering process, ATOS with DDN was selected to provide this technology.
• Factors behind the winning bid:
  • Price/performance
  • Total cost of ownership, including data center footprint
  • HPSS integration experience
• Currently being installed at both sites.
Smallest DC footprint
Highest performance/$
Existing HPSS References
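A disk archive in front of tape is typically managed by a migration policy that moves cold data off disk. The Python sketch below shows a generic high/low watermark policy of the kind used by many hierarchical storage management systems; it is not HPSS's policy engine or configuration, and the thresholds, class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ArchivedFile:
    path: str
    size_bytes: int
    last_access: datetime


def select_for_tape(files, capacity_bytes, high_water=0.85, low_water=0.70):
    """Pick least-recently-accessed files to move off the disk tier.

    Nothing is selected until usage exceeds the high watermark; then files are
    chosen oldest-access-first until usage drops to the low watermark.
    """
    used = sum(f.size_bytes for f in files)
    if used <= high_water * capacity_bytes:
        return []
    selected = []
    for f in sorted(files, key=lambda item: item.last_access):
        if used <= low_water * capacity_bytes:
            break
        selected.append(f)
        used -= f.size_bytes
    return selected


def demo():
    """Three equal files filling 90% of a 100-byte disk tier (hypothetical data)."""
    files = [
        ArchivedFile("hindcast_2015.nc", 30, datetime(2019, 1, 1)),
        ArchivedFile("forecast_today.nc", 30, datetime(2019, 6, 1)),
        ArchivedFile("cmip6_run.nc", 30, datetime(2019, 3, 1)),
    ]
    return [f.path for f in select_for_tape(files, capacity_bytes=100)]


if __name__ == "__main__":
    print(demo())
```

The gap between the two watermarks keeps the policy from triggering on every small write; only the least recently accessed file is moved in the demo because one migration already brings usage below 70%.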
Challenges and Needs of MOES
Challenges
• Data migration from one system generation to the next, including disk to disk, tape to tape and disk to tape.
• Evaluation of new technologies that improve the I/O of weather/climate simulations.
• Data center footprint and electricity consumption.
• Filesystem reliability: all computation is time-sensitive and must not stop because of storage or filesystem issues.
Needs
• Reliable storage solutions and system integration capability, with experience in data migration.
• Vendors willing to come forward with innovative proposals and to work with MoES to reduce the simulation time of relevant applications.
• Focus on total cost of ownership.
• Tight integration between storage and filesystem; vendors' filesystem support capabilities are very important.
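Total cost of ownership combines purchase price with running costs such as electricity and data center footprint. The Python sketch below is illustrative only: the cost model is a simple assumption (capex plus annual energy, support and floor-space costs), and all figures in the usage example are hypothetical, not from the presentation.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 8760


@dataclass
class StorageOption:
    capex: float              # purchase and installation cost
    power_kw: float           # average IT power draw of the system
    annual_support: float     # maintenance/support contract per year
    annual_space_cost: float  # data center floor space per year (footprint)


def tco(option: StorageOption, years: int, price_per_kwh: float, pue: float) -> float:
    """Capex plus running costs over the service life.

    Energy includes facility overhead via PUE (total facility power / IT power).
    """
    energy_per_year = option.power_kw * pue * HOURS_PER_YEAR * price_per_kwh
    running_per_year = energy_per_year + option.annual_support + option.annual_space_cost
    return option.capex + years * running_per_year


if __name__ == "__main__":
    # Hypothetical option: 1,000,000 capex, 50 kW, 5 years at 0.10 per kWh, PUE 1.5.
    option = StorageOption(capex=1_000_000, power_kw=50, annual_support=50_000, annual_space_cost=10_000)
    print(f"{tco(option, years=5, price_per_kwh=0.10, pue=1.5):,.0f}")
```

Even in this toy example, running costs add more than 60% on top of the purchase price over five years, which is why power draw and footprint weigh on storage procurement decisions.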
Next Steps for MoES on Data Storage
• Evaluate new technologies
• Use of ESDM (Earth System Data Middleware) for heterogeneous storage infrastructures, and HPSS for incremental scalability
• Benefits of 3D XPoint storage
• Wider use of flash, with burst buffers based on MoES application performance
• A single storage system supporting diverse computing architectures
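Sizing a burst buffer comes down to two checks: the buffer must absorb each output burst within the application's write window, and it must drain to the parallel file system before the next burst arrives. The Python sketch below encodes that reasoning; all parameter values in the example are hypothetical, not MoES benchmarks.

```python
def burst_buffer_check(output_tb, write_window_s, output_interval_s, pfs_gb_s):
    """Return burst-buffer requirements and whether draining keeps up.

    The application writes output_tb to the burst buffer within write_window_s;
    the buffer then drains to the parallel file system at pfs_gb_s and must finish
    before the next output arrives output_interval_s later.
    """
    output_gb = output_tb * 1000  # decimal units assumed
    bb_gb_s = output_gb / write_window_s
    drain_s = output_gb / pfs_gb_s
    return {
        "bb_gb_s": bb_gb_s,                       # bandwidth the buffer must absorb
        "min_capacity_tb": output_tb,             # must hold at least one full output
        "drain_s": drain_s,                       # time to flush one output to the PFS
        "keeps_up": drain_s <= output_interval_s,
    }


if __name__ == "__main__":
    # Hypothetical: 10 TB written in 60 s, once per hour, PFS sustains 100 GB/s.
    print(burst_buffer_check(output_tb=10, write_window_s=60, output_interval_s=3600, pfs_gb_s=100))
```

In this example the buffer must absorb about 167 GB/s while the file system only needs 100 GB/s, because draining has a full hour rather than one minute; that difference is what lets flash take the peak load.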
THANK YOU
