Data Infrastructure Development for SKA/Jasper Horrell


Presentation during AOSP ICT Infrastructure meeting, 14 May 2018, Pretoria, SA.

Published in: Data & Analytics

  1. Data Infrastructure Development for SKA. Dr Jasper Horrell, Inter-University Institute for Data Intensive Astronomy (on behalf of Prof Russ Taylor)
  2. SKA Timeline(s) [timeline figure, 2010 to 2025, covering SA facilities and global facilities: early studies (e.g. VLA, GMRT deep fields, CHILES, ..); pathfinder science; precursor science (MeerKAT, ASKAP Large Survey Projects); SKA1 Key Science Survey Programs; SKA wide-field, long-baseline survey radio astronomy. Scale markers: 1%, 10%, 100%]
  3. The Square Kilometre Array: Data to Knowledge
      • Cloud-enabled analytics and visualization
      • AI-assisted data exploration and visualization
      • High Performance Computing
  4. Growth of Data Volumes to Radio Astronomers [figure: Pathfinders, Precursors, SKA1, SKA]
  5. Changing Sociology of Radio Astronomy: Key science on the SKA will be achieved by large-scale survey programs executed by globally distributed teams of researchers and creating massive data (nature of scientific enterprise changes) [figure labels: old, new]
  6. MeerKAT image courtesy of SARAO
  7. MeerKAT Large Survey Projects [figure groupings: imaging, time domain]
      • LADUMA (deep atomic hydrogen)
      • MIGHTEE (deep continuum imaging of the early universe)
      • Fornax (deep HI survey of the Fornax cluster)
      • MHONGOOSE (targeted nearby galaxies HI)
      • MeerKAT Absorption Line Survey (extragalactic HI absorption)
      • ThunderKAT (exotic phenomena, variables and transients)
      • TRAPUM (pulsar search)
      • Pulsar Timing (no acronym)
      • MESMER (high-z CO)
      • MeerGAL (Galactic Plane Survey)
  8. MeerKAT Large Surveys (43,000 hours allocated): 22 countries
  9. MeerKAT MIGHTEE Large Survey Project (graphic: Ian Heywood)
  10. The Challenges: data to knowledge [figure labels: Data, Observation, Theory, Knowledge]
      • Managing exponential increase in rates and volumes.
      • Empowering the end user for multi-purpose processing, analytics, data mining and exploration.
      • Data fusion with big multi-wavelength data and big simulations.
      • Collaboration, sharing and joint execution of data-intensive projects by globally distributed teams of researchers.
  11. SKA Regional “Science” and Data Centres. What is the “roadmap”: how do we get there from here?
      • Leverage South African investment in MeerKAT into SA expertise and leadership
      • Engage university researchers as stakeholders and drivers of innovation
      • Train the required cohort of data scientists
      • “Precursor” SKA Regional Science Centres?
  12. Precursor Regional Science Centres (MoU signed 17 November 2015) [figure labels: JVLA, ALMA]
      • Collaborate on development of Precursor SKA Regional Science and Data Centres
      • ASTRON, IDIA, SKA-SA (SARAO)
      • Bring together MeerKAT and LOFAR key science
      • Two initial components:
      • Data transport for moving data to Tier 2 processing centres
      • Processing pipelines for executing at Tier 2 processing centres
  13. Precursor Regional Science Centres: EU Horizon 2020 project, led by ASTRON in the Netherlands. 28 participants, including 3 in South Africa:
      • IDIA
      • CSIR
      • NRF (SKA-SA)
  14. Precursor Regional Science Centres: Canada Foundation for Innovation ($10M, Oct 2017), led by the Dunlap Institute, Canada. “Unlocking the Radio Sky with Next Generation Survey Astronomy”
      • ASKAP, VLASS, CHIME
  15. US National Radio Astronomy Observatory (signed 17 January 2016) [figure labels: JVLA, ALMA]
      • Collaboration on the development of data processing for data-intensive radio astronomy projects
      • Software system used for processing data from the Jansky Very Large Array, the Atacama Large Millimetre Array and MeerKAT
  16. CARTA collaboration: Cube Analysis and Rendering Tool for Astronomy. ASIAA (Taiwan), IDIA (South Africa), NRAO (US). To be deployed at ALMA Regional Science Centres
      • Web-based client-server visualization
      • IDIA’s role:
      • Introduce support for new file formats for MeerKAT and SKA data sets
      • Cloud-based distributed rendering and analysis for big data
      • Improve existing analysis algorithms and introduce distributed analysis algorithms
      • Design of graphical user interface and visual analytics for MeerKAT imaging LSP use cases
  17. Data Intensive Astronomy Research Cloud: prototype federated cloud between UCT and NWU (2016). Cloud for data-intensive research built from open source software
  18. IDIA Data-Centric Facility: Jan 2017 (R10M IDIA investment)
      • Builds on core services provided by ARC
      • Adds components for greater performance, increased capacity and for large local POSIX storage
      • 40 compute nodes:
      • 2.6 GHz Xeon processors
      • 32 cores, 256 GB RAM / node
      • 4 nodes have 2x NVIDIA K80 GPUs
      • 4 x storage targets to provide POSIX volumes that add to the block and object storage from ARC nodes
      • Initially 500 TB usable, growing to many PB
      • 50 Gb/s Ethernet core, attached to ARC
      • 10 Gb/s access network connected to SANReN
  19. ILIFU: DIRISA Tier 2 Data Intensive Research Facility (joint investment: CSIR/DIRISA, IDIA + UCT Bio)
      • Astronomy (IDIA, SKA-SA)
      • Data Intensive Astronomy with priority on MeerKAT Large Survey Programs
      • Precursor SKA Regional Science Centre with EU
      • Tier 2 node of South African Data Intensive Research Cloud federation with T1 and T3 infrastructure
      • Data Intensive Bioinformatics
      • Tuberculosis Surveillance in Africa (UWC)
      • Imputation service for African human genetics (UCT)
      • Omics for Precision Medicine (SU)
      • Research Data Management (CPUT)
  20. DIRISA Tier 2 Data Intensive Research Facility
      • “ARC” Demonstrator: operational since 2016
      • IDIA Data-Intensive Research Facility: operational June 2017
      • CSIR funded: operational 2018
  21. Data Intensive Astronomy Cloud - V 2.0 Production version
  22. Data Intensive Astronomy Cloud
  23. Data Intensive Astronomy Cloud (image: MeerKAT AR1.5 Deep Observation)
  24. Moving to scale (ideally): ILIFU
  25. (Next Gen) Data-Intensive Research Cloud. Strategic research programs in Astronomy, Bioinformatics, Geospatial. Link to DIRISA e-science training initiative. Link to EU AENEAS centres
      • University of Cape Town
      • University of the Western Cape
      • Wits University
      • University of Pretoria
      • North-West University
      • Sol Plaatje University
      • South African Radio Astronomy Observatory
      • South African National Space Agency
  26. Precursor Regional Science Centres: data transport demonstrator for SKA Data Delivery Systems
      • Clone of IDIA data intensive astronomy cloud at ASTRON
      • Data sharing and integrated analytics for teams working with LOFAR and MeerKAT Large Projects
  27. DTN Deployment
      • FTS used to provide restartable data transfers
      • GridFTP transfer endpoints as used in 100 Gb/s HEP data transfer demonstrations
      • X.509 authorization for requests and for GridFTP transfers
  28. IDIA Visualisation Lab
      • “Visualization Wall” for large image data formats
      • 2x2 array of high-end ultra-HD displays (“8k” resolution, 7680x4320)
      • Collaborative meeting space (on-line and local). Place to meet and share exploration of large data sets.
      • Plug-and-play with external graphics card support
      • Panoramic Immersive Visualization System
      • Immersive single-user mini-dome
      • Development platform for Iziko Digital Dome Visualization from the cloud
      • High-end Virtual Reality setup
  29. Virtual Reality Project
      • Supports over 1M data points
      • User can move around data by walking around room
      • Web-based user interface on user’s (virtual) wrist
  30. Thank You. For more information:
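
A back-of-envelope sketch of the hardware figures quoted on slide 18, written in Python. It assumes that the 32 cores and 256 GB RAM are both per-node figures, that "2x K80" means two K80 cards in each of the four GPU nodes, and that TB and Gb/s are decimal units. The `efficiency` parameter and all names in the code are hypothetical, introduced only for illustration; they do not come from the presentation.

```python
# Figures quoted on slide 18 (IDIA Data-Centric Facility).
NODES = 40
CORES_PER_NODE = 32        # assumption: the "32 cores" figure is per node
RAM_GB_PER_NODE = 256
GPU_NODES = 4
K80_CARDS_PER_GPU_NODE = 2  # assumption: "2x" counts K80 cards per node
STORAGE_TB = 500            # initial usable storage
ACCESS_GBPS = 10            # access network connected to SANReN
CORE_GBPS = 50              # Ethernet core


def transfer_days(volume_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move volume_tb (decimal TB) over a link_gbps link.

    efficiency is a hypothetical fraction of line rate actually achieved.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = volume_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86400


# Aggregate capacity implied by the per-node figures.
total_cores = NODES * CORES_PER_NODE
total_ram_tb = NODES * RAM_GB_PER_NODE / 1000
total_k80_cards = GPU_NODES * K80_CARDS_PER_GPU_NODE

if __name__ == "__main__":
    print(f"cores: {total_cores}, RAM: {total_ram_tb} TB, K80 cards: {total_k80_cards}")
    for name, rate in (("access", ACCESS_GBPS), ("core", CORE_GBPS)):
        days = transfer_days(STORAGE_TB, rate)
        print(f"{STORAGE_TB} TB over {name} link ({rate} Gb/s): {days:.2f} days")
```

Under these assumptions the facility totals 1280 cores, and moving the initial 500 TB across the 10 Gb/s access network would take several days even at full line rate, which is consistent with the emphasis on restartable transfers on slide 27.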