In this deck from the DDN User Group at SC19, Dr. Suryachandra Rao from MoES presents: Ocean/Atmosphere Sciences: A Data Driven Science. The Ministry of Earth Sciences (MoES) is mandated to provide services for weather, climate, ocean and coastal state, hydrology, seismology, and natural hazards; to explore and harness marine living and non-living resources in a sustainable way and to explore the three poles (Arctic, Antarctic and Himalayas).
"MoES recently inaugurated a new supercomputer at the Indian Institute of Tropical Meteorology (IITM) in Pune, dedicated to improving weather and climate forecasts across the country. The high-performance computing (HPC) facility will provide improved weather forecasts at block level over India; higher resolution forecasts during the monsoon; high-resolution coupled models for cyclone prediction with more accuracy and lead time; improved ocean state forecasts including marine water quality forecasts at high resolution; tsunami forecasts with greater lead time; air quality forecasts for different smart cities; and high-resolution climate projections. The HPC facility will also be utilized by other MoES institutes for research activities to improve their respective weather and climate services. MoESs new supercomputer in Pune will boost the organizations overall HPC infrastructure to 6.8 petaflops of computing power, making it one of the most powerful HPC facilities in the world."
Watch the video: https://wp.me/p3RLHQ-lph
Learn more: https://www.moes.gov.in/
and
https://www.ddn.com/company/events/user-group-sc/
1. OCEAN/ATMOSPHERE SCIENCES: A DATA DRIVEN SCIENCE
SURYACHANDRA A. RAO
(surya@tropmet.res.in)
INDIAN INSTITUTE OF TROPICAL METEOROLOGY
MINISTRY OF EARTH SCIENCES
INDIA
Thanks to DDN for this opportunity
2. Outline of the Presentation
• Data growth in ocean/atmosphere sciences
• Past to present setups of HPC and data storage systems of MoES
• Major challenges in management of data
• Future expectations
3. Mandate of MoES
The primary mandate of the Ministry of Earth Sciences (MoES) is to provide the nation with the best possible services in forecasting the monsoons and other weather/climate parameters, ocean state, earthquakes, tsunamis and other phenomena related to earth systems.
6. MoES Weather, Climate and Ocean State Forecasts
Short Range (Next 2-3 days)
Medium Range (Up to 7-10 days)
Extended Range (beyond 2 weeks up to one month)
Long Range (Seasonal mean)
Climate change projections (contributing to IPCC, CMIP6)
Ocean State Forecast (next 3-5 days)
Potential Fishing Zone advisories
Air quality forecast (next 2-3 days)
Agricultural, Forest Fire, Hydrology advisories
Tsunami Warnings
8. Data required to generate Forecasts
[Figure: daily volume of data received, by year (1997 to 2016). Axes: FTP (GB/day), GTS (GB/day), Year. Series: FTP (SAT + RADAR); IMD (GTS), surface/upper air observations.]
9. Why so much data?
To make a single-day forecast:
• Initial data from various sensors/satellites: 75 GB/day
• Analysis (combining the data with model forecasts): 75 GB/day
• Model forecasts: 250 GB/day (as the model runs at much higher resolution)
• To reduce uncertainty, 50 of those model forecasts are re-run with slight perturbations to the initial data/analysis: 50 TB/day
Several forecasts are made each day for the medium, extended and long ranges, in addition to climate change projections and R&D experiments to improve models.
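The daily figures on this slide can be tallied in a short sketch. It uses only the volumes stated on the slide and assumes all four are in decimal byte units (1 TB = 1000 GB); the variable names are illustrative.

```python
# Daily data volumes for a single-day forecast, as stated on the slide.
# Assumption: decimal units (1 TB = 1000 GB).
GB_PER_TB = 1000

volumes_gb = {
    "initial_data": 75,            # sensors/satellites
    "analysis": 75,                # data combined with model forecasts
    "model_forecast": 250,         # one high-resolution forecast
    "ensemble": 50 * GB_PER_TB,    # 50 perturbed re-runs, stated as 50 TB/day
}

total_gb = sum(volumes_gb.values())
total_tb = total_gb / GB_PER_TB
ensemble_share = volumes_gb["ensemble"] / total_gb

print(f"Total: {total_tb:.1f} TB/day")
print(f"Ensemble share: {ensemble_share:.1%}")
```

Under these assumptions the total is about 50.4 TB/day, and the perturbed ensemble accounts for more than 99% of it.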
11. Challenges and Changes in data flow
Challenges
• Programming of efficient workflows
• Efficient analysis of data
• Organizing data sets
• Ensuring reproducibility of workflows/provenance of data
• Meeting the compute/storage needs in future complex hardware landscape
Expected Data Characteristics in 2020+
• Velocity: Input 5 TB/day (for NWP; reduced data from instruments)
• Volume: Data output of ensembles in PBs of data
• Data products are used by 3rd parties
• Various file formats
Source: Julian M. Kunkel, University of Reading
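One common way to approach the provenance challenge listed above is to store a checksum-based record next to each output file. The sketch below is a minimal illustration, not something the presentation describes: the record schema and the function names `sha256_of`, `provenance_record` and `to_json` are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large outputs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(output: Path, inputs: list[Path], model_version: str) -> dict:
    """Build a provenance record (hypothetical schema) for one output file."""
    return {
        "output": {"name": output.name, "sha256": sha256_of(output)},
        "inputs": [{"name": p.name, "sha256": sha256_of(p)} for p in inputs],
        "model_version": model_version,
    }


def to_json(record: dict) -> str:
    """Serialize the record deterministically so it can be stored or diffed."""
    return json.dumps(record, indent=2, sort_keys=True)
```

Re-hashing a file later and comparing it with the stored digest shows whether the data changed after the workflow produced it.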
12. Major Concerns of data handling
• Data Collection and its Preservation
Data is growing rapidly (multi-platform, multi-model with multi-resolutions)
Make it readily available or archive it for long storage?
Moving it from one configuration to the other as systems get upgraded
• Accessing the preserved data
Retrieval of archived data at faster rates than present
• Utilizing the preserved data
Combining the data from different experiments, forecasts etc.
Transferring data from one computer to another (bandwidth limitations)
Analyzing the big datasets themselves
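The bandwidth limitation mentioned above can be made concrete with a rough transfer-time estimate. This is a sketch under stated assumptions: decimal units, a hypothetical `efficiency` factor for the fraction of the nominal link rate actually achieved, and illustrative link speeds that do not come from the presentation.

```python
def transfer_hours(size_tb: float, link_gbit_s: float, efficiency: float = 0.8) -> float:
    """Estimate hours to move size_tb terabytes over a network link.

    Assumptions: decimal units (1 TB = 8000 Gbit); `efficiency` is a
    hypothetical fraction of the nominal link rate actually achieved.
    """
    if link_gbit_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link rate must be positive and efficiency in (0, 1]")
    gbits = size_tb * 8000
    seconds = gbits / (link_gbit_s * efficiency)
    return seconds / 3600


# Illustrative link speeds in Gbit/s; 50 TB is used as an example dataset size.
for rate in (1, 10, 100):
    print(f"{rate:>3} Gbit/s: {transfer_hours(50, rate):.1f} h")
```

With these assumed values, a 50 TB dataset takes roughly 139 hours at 1 Gbit/s, 14 hours at 10 Gbit/s and under 1.5 hours at 100 Gbit/s.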
14. Innovative Use of 300 m IB Cables for 10PB Storage
• IITM has two data centers 300 m apart; MoES wanted storage connectivity between them over RDMA-capable InfiniBand.
• First use of 300 m Mellanox LinkX modules and InfiniBand cables in APAC. Storage delivered 200 GB/s performance.
• Data migration from DDN S2A9900 connected to POWER6 compute over DDR InfiniBand.
[Diagram: InfiniBand link between DC2 and DC1. Labels: 10PB DDN Storage in DC1; Leaf IB Switches in DC1; Spine IB Switches in DC1; 6x 648 Port Chassis Switches in DC2; Patch panel with MPO connector (two); 300 m SR4 optical modules; Protective Duct for Multi-Mode Fiber; Aaditya 790+TF HPC System; Data Movers; Qlogic/Silverstorm DDR IB Switch; DDN S2A9900.]
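The capacity and throughput quoted on this slide can be put in perspective with two quick ratios. The sketch assumes decimal units, ignores filesystem and RAID overhead, and borrows the 50 TB/day ensemble output from the earlier "Why so much data?" slide as the daily write volume, which is a simplification.

```python
PB = 10**15  # decimal petabyte, in bytes
TB = 10**12
GB = 10**9

capacity_bytes = 10 * PB          # storage capacity from the slide
bandwidth_bytes_s = 200 * GB      # delivered performance from the slide
daily_output_bytes = 50 * TB      # ensemble output per day (earlier slide)

# Time to read or write the whole store once at the delivered rate.
full_pass_hours = capacity_bytes / bandwidth_bytes_s / 3600
# Days of output that fit if nothing is deleted or archived elsewhere.
days_to_fill = capacity_bytes / daily_output_bytes

print(f"One full pass over 10 PB: {full_pass_hours:.1f} h")
print(f"Days of 50 TB/day output that fit: {days_to_fill:.0f}")
```

Under these assumptions, one full pass over the store takes about 13.9 hours, and 10 PB holds about 200 days of ensemble output.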
15. Adoption of Disk Archive & HPSS for Long Term Archive
[Diagram labels: Core Switch 1; Core Switch 2; TOR (two); Home File system; Scratch File system; Existing CRAY Compute & Storage Environment; Data Movers (two); EDR InfiniBand N/w; DDN Storage, 17PB @ IITM and 10PB @ NCMRWF; NAS Gateways; CRAY Ethernet Switch.]
• In 2019, MoES decided to procure a 27PB disk-based archive, along with HPSS for long-term archive, at two of its sites.
• Through a competitive tendering process, ATOS with DDN was selected to provide this technology.
• Factors that governed the winning bid:
• Price/Performance
• Total Cost of Ownership including data center footprint
• HPSS integration experience
• Currently being installed at both sites.
Smallest DC footprint
Highest performance/$
Existing HPSS References
16. Challenges and Needs of MoES
Challenges
Data migration from one system generation to the next, including disk to disk, tape to tape and disk to tape.
Evaluation of new technology that improves I/O of weather/climate simulations.
Data center footprint and electricity consumption.
Filesystem reliability. All computation is time sensitive and must not stop because of storage or filesystem issues.
Needs
Reliable storage solution and system integration capability with experience in data migration.
Vendors to come forward with innovative proposals and willingness to work with MoES to reduce the simulation time of relevant applications.
Focus on Total Cost of Ownership.
Tight integration between storage and filesystem. Vendors' filesystem support capabilities are very important.
17. Next Steps for MoES on Data Storage
• Evaluate new technologies
• Use of ESDM for heterogeneous storage infrastructures/HPSS for incremental scalability
• Benefits of 3D XPoint storage
• Usage of flash on a wider scale; burst buffer based on MoES application performance
• Single storage system to support diverse computing architectures