Data Infrastructure Development for SKA/Jasper Horrell
1. Data Infrastructure Development for SKA
Dr Jasper Horrell
Inter-University Institute for Data Intensive Astronomy
(on behalf of Prof Russ Taylor)
2. South
SKA Timeline(s)
[Figure: SKA timeline. Year axis: 2010, 2015, 2020, 2025. Scale markers: 1%, 10%, 100%. Phase labels: SKA1, SKA.]
• SA Facilities: MeerKAT Science
• Pathfinder Science: global facilities; early studies (e.g. VLA, GMRT deep fields, CHILES, ...)
• Precursor Science: MeerKAT, ASKAP; Large Survey Projects
• SKA1: Key Science Survey Programs
• SKA: wide-field, long baseline survey radio astronomy
3. The Square Kilometre Array
Data to Knowledge
Cloud enabled Analytics and visualization
AI assisted data exploration and visualization
High Performance Computing
5. Changing Sociology of Radio Astronomy
Key science on the SKA will be achieved by large-scale survey
programs executed by globally distributed teams of researchers and
generating massive data volumes (the nature of the scientific enterprise changes)
10. The Challenges: data to knowledge
[Diagram labels: Data, Observation, Theory, Knowledge]
• Managing exponential increase in rates and volumes.
• Empowering the end user for multi-purpose
processing, analytics, data mining and exploration.
• Data Fusion with big multi-wavelength data and big
simulations
• Collaboration, sharing and joint execution of
data-intensive projects by globally distributed teams
of researchers
11. SKA Regional “Science” and Data Centres
What is the “roadmap”: how do we get there from here?
• Leverage South African investment in MeerKAT to build SA expertise and leadership
• Engage university researchers as stakeholders and drivers of innovation
• Train the required cohort of data scientists
• “Precursor” SKA Regional Science Centres?
12. Precursor Regional Science Centres
• Collaborate on development of Precursor SKA Regional Science and Data Centres
• ASTRON, IDIA, SKA-SA (SARAO)
• Bring together MeerKAT and LOFAR key science
• Two initial components:
• Data transport for moving data to Tier 2 processing centres
• Processing pipelines for executing at Tier 2 processing centres
JVLA ALMA
MoU signed 17 November 2015
13. EU Horizon 2020 Project
Led by ASTRON in the Netherlands
28 participants, including 3 in South Africa
• IDIA
• CSIR
• NRF (SKA-SA)
14. Canada Foundation for Innovation
($10M – Oct 2017)
Led by Dunlap Institute, Canada
“Unlocking the Radio Sky with Next
Generation Survey Astronomy”
• ASKAP, VLASS, CHIME
15. US National Radio Astronomy Observatory
• Collaboration on the development of data processing for data-intensive radio
astronomy projects
• Software system used for processing data from the Jansky Very Large Array, Atacama Large
Millimetre Array and MeerKAT
JVLA ALMA
Signed 17 January 2016
16. CARTA collaboration
Cube Analysis and Rendering Tool for Astronomy
ASIAA (Taiwan) – IDIA (South Africa) – NRAO (US)
To be deployed at ALMA Regional Science Centres
• Web-based client-server visualization
• IDIA’s role:
• Introduce support for new file formats for MeerKAT and SKA data sets
• Cloud-based distributed rendering and analysis for big data
• Improve existing analysis algorithms and introduce distributed analysis algorithms
• Design of graphical user interface and visual analytics for MeerKAT imaging LSP use cases
http://cartavis.github.io
17. Data Intensive Astronomy Research Cloud
Prototype federated cloud between UCT and NWU (2016)
Cloud for data-intensive research built from open source software
19. IDIA Data-Centric Facility: Jan 2017
• Builds on core services provided by ARC
• Adds components for greater performance, increased
capacity and for large local POSIX storage
• 40 compute nodes:
• 2.6 GHz Xeon processors
• 32 cores, 256 GB RAM / node
• 4 nodes have 2x NVIDIA K80 GPUs
• 4 x Storage Targets to provide POSIX volumes that add to
the block and object storage from ARC nodes
• Initially 500 TB usable, growing to many PB
• 50 Gb/s Ethernet core, attached to ARC
• 10 Gb/s access network connected to SANReN
R10M IDIA investment
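The per-node figures on this slide can be rolled up into facility totals. The sketch below is illustrative only: it assumes that both the 32 cores and the 256 GB of RAM are per-node figures, that 1 TB means 10^12 bytes, and that the 10 Gb/s access link is fully available. The names `transfer_days` and `efficiency` are hypothetical and introduced only for this example.

```python
# Roll-up of the hardware figures quoted on the slide (assumptions noted inline).
NODES = 40
CORES_PER_NODE = 32        # assumed to be a per-node figure
RAM_GB_PER_NODE = 256
GPU_NODES = 4
K80_PER_GPU_NODE = 2
USABLE_STORAGE_TB = 500    # initial usable capacity
ACCESS_LINK_GBPS = 10      # access network connected to SANReN

total_cores = NODES * CORES_PER_NODE
total_ram_gb = NODES * RAM_GB_PER_NODE
total_k80 = GPU_NODES * K80_PER_GPU_NODE


def transfer_days(volume_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move volume_tb (decimal terabytes) over a link_gbps link.

    efficiency is the assumed fraction of the nominal link rate actually achieved.
    """
    bits = volume_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86400


if __name__ == "__main__":
    print(f"cores: {total_cores}, RAM: {total_ram_gb} GB, K80 cards: {total_k80}")
    days = transfer_days(USABLE_STORAGE_TB, ACCESS_LINK_GBPS)
    print(f"500 TB over 10 Gb/s: {days:.1f} days at full line rate")
```

Under these assumptions the facility has 1280 cores, 10240 GB of RAM and 8 K80 cards, and moving the initial 500 TB over the access network would take roughly 4.6 days at full line rate, which is one way to see why dedicated data transport is treated as its own component.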
20. ILIFU: DIRISA Tier 2 Data Intensive Research Facility
• Astronomy (IDIA, SKA-SA)
• Data Intensive Astronomy with priority on MeerKAT Large Survey
Programs
• Precursor SKA Regional Science Centre with EU
• Tier 2 node of South African Data Intensive Research Cloud
federation with T1 and T3 infrastructure
• Data Intensive Bioinformatics
• Tuberculosis Surveillance in Africa (UWC)
• Imputation service for African human genetics (UCT)
• Omics for Precision Medicine (SU)
• Research Data Management (CPUT)
Joint investment CSIR/DIRISA, IDIA + UCT Bio
21. DIRISA Tier 2 Data Intensive Research Facility
• “ARC” Demonstrator: operational since 2016
• IDIA Data-Intensive Research Facility: operational June 2017
• CSIR funded: operational 2018
26. (Next Gen) Data-Intensive Research Cloud
• University of Cape Town
• University of the Western Cape
• Wits University
• University of Pretoria
• North-West University
• Sol Plaatje University
• South African Radio Astronomy Observatory
• South African National Space Agency
Strategic research programs in Astronomy, Bioinformatics, Geospatial
Link to DIRISA e-science training initiative
Link to EU AENEAS centres
27. Precursor Regional Science Centres
Data transport demonstrator for SKA Data Delivery Systems
• Clone of IDIA data intensive astronomy cloud at ASTRON
• Data sharing and integrated analytics for teams working with LOFAR and MeerKAT Large Projects
28. DTN Deployment
• FTS used to provide restartable data transfers
• GridFTP transfer endpoints as used in 100 Gb/s HEP data transfer demonstrations
• X.509 authorization for requests and for GridFTP transfers
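The idea behind a restartable transfer can be sketched in a few lines. This is not the FTS or GridFTP API: local files stand in for the remote endpoints, and `resume_copy` is a hypothetical helper written for this illustration. A production service would also verify checksums and handle authorization.

```python
import os

CHUNK = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def resume_copy(src: str, dst: str, chunk: int = CHUNK) -> int:
    """Copy src to dst, continuing from however many bytes dst already holds.

    Returns the number of bytes written by this call.
    """
    done = os.path.getsize(dst) if os.path.exists(dst) else 0
    total = os.path.getsize(src)
    if done > total:
        raise ValueError("destination is larger than source; refusing to resume")

    written = 0
    # Append mode keeps the bytes from earlier attempts and adds only the rest.
    with open(src, "rb") as fin, open(dst, "ab") as fout:
        fin.seek(done)
        while True:
            block = fin.read(chunk)
            if not block:
                break
            fout.write(block)
            written += len(block)
    return written
```

If a transfer is interrupted, calling `resume_copy` again with the same arguments sends only the missing tail rather than the whole file, which matters when individual data products are very large.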
29. IDIA Visualisation Lab
• “Visualization Wall” for large image data formats
• 2x2 array of high end ultra-HD displays (“8k” resolution 7680x4320)
• Collaborative meeting space (on-line and local). Place to meet and share exploration of
large data sets.
• Plug-and-play with external graphics card support
• Panoramic Immersive Visualization System
• Immersive single-user mini-dome
• Development platform for Iziko Digital Dome
Visualization from the cloud
• High end Virtual Reality setup
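The “8k” figure for the visualization wall follows from tiling four panels. A minimal check, assuming each ultra-HD panel is 3840x2160 pixels (the standard UHD resolution; the slide does not state the panel size) and ignoring bezels. `wall_resolution` is a hypothetical helper for this example.

```python
def wall_resolution(cols: int, rows: int, panel_w: int = 3840, panel_h: int = 2160):
    """Combined pixel dimensions of a cols x rows array of identical panels."""
    return cols * panel_w, rows * panel_h


if __name__ == "__main__":
    width, height = wall_resolution(2, 2)
    print(f"{width}x{height}, {width * height / 1e6:.1f} megapixels")
```

For a 2x2 array this gives 7680x4320, about 33.2 megapixels, which sets the largest image region that can be shown on the wall at one image pixel per screen pixel.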
30. Virtual Reality Project
• Supports over 1M data points
• User can move around data by walking around room
• Web-based user interface on user’s (virtual) wrist