Data Infrastructure
Development for SKA
Dr Jasper Horrell
Inter-University Institute for Data Intensive Astronomy
(on behalf of Prof Russ Taylor)
SKA Timeline(s)
[Timeline figure, 2010 to 2025 (ticks at 2010, 2015, 2020, 2025); scale markers 1%, 10%, 100%; phases SKA1 and SKA]
• SA facilities: MeerKAT Science
• Pathfinder Science, global facilities: early studies (e.g. VLA, GMRT deep fields, CHILES, ...)
• Precursor Science (MeerKAT, ASKAP): Large Survey Projects
• SKA1: Key Science Survey Programs
• SKA: wide-field, long-baseline survey radio astronomy
The Square Kilometre Array
Data to Knowledge
Cloud enabled Analytics and visualization
AI assisted data exploration and visualization
High Performance Computing
Growth of Data Volumes to Radio Astronomers
[Figure: data volume growth across Pathfinders, Precursors, SKA1 and SKA]
Changing Sociology of Radio Astronomy
Key science on the SKA will be achieved by large-scale survey
programs, executed by globally distributed teams of researchers and
creating massive data volumes (the nature of the scientific enterprise changes)
[Figure: old vs. new]
MeerKAT
image courtesy of SARAO
MeerKAT Large Survey Projects
http://public.ska.ac.za/meerkat/meerkat-large-survey-projects
• LADUMA (Deep atomic hydrogen)
• MIGHTEE (Deep continuum imaging of the early universe)
• Fornax (Deep HI Survey of the Fornax cluster)
• MHONGOOSE (targeted nearby galaxies HI)
• MeerKAT Absorption Line Survey (extragalactic HI absorption)
• ThunderKAT (exotic phenomena, variables and transients)
• TRAPUM (pulsar search)
• Pulsar Timing (no acronym)
• MESMER (High-z CO)
• MeerGAL (Galactic Plane Survey)
Survey categories shown on the slide: imaging, time domain
MeerKAT Large Surveys (43,000 hours allocated)
22 countries
MeerKAT MIGHTEE Large Survey Project
Graphic: Ian Heywood
The Challenges: data to knowledge
[Diagram: Data, Observation, Theory, Knowledge]
• Managing exponential increase in rates and volumes.
• Empowering the end user for multi-purpose
processing, analytics, data mining and exploration.
• Data Fusion with big multi-wavelength data and big
simulations
• Collaboration, sharing and joint execution of
data-intensive projects by globally distributed teams
of researchers
SKA Regional “Science” and Data Centres
What is the “roadmap”: how do we get there from here?
• Leverage South African investment in MeerKAT to build SA expertise and leadership
• Engage university researchers as stakeholders and drivers of innovation
• Train the required cohort of data scientists
• “Precursor” SKA Regional Science Centres?
Precursor Regional Science Centres
• Collaborate on development of Precursor SKA Regional Science and Data Centres
• ASTRON, IDIA, SKA-SA (SARAO)
• Bring together MeerKAT and LOFAR key science
• Two initial components:
• Data transport for moving data to Tier 2 processing centres
• Processing pipelines for executing at Tier 2 processing centres
JVLA ALMA
MoU signed 17 November 2015
Precursor Regional Science Centres
EU Horizon 2020 Project
Led by ASTRON in the Netherlands
28 participants, including 3 in South Africa
• IDIA
• CSIR
• NRF (SKA-SA)
Precursor Regional Science Centres
Canada Foundation for Innovation
($10M – Oct 2017)
Led by the Dunlap Institute, Canada
“Unlocking the Radio Sky with Next
Generation Survey Astronomy”
• ASKAP, VLASS, CHIME
US National Radio Astronomy Observatory
• Collaboration on the development of data processing for data-intensive radio
astronomy projects
• Software system used for data processing for the Jansky Very Large Array, the Atacama Large
Millimetre Array and MeerKAT
JVLA ALMA
Signed 17 January 2016
CARTA collaboration
Cube Analysis and Rendering Tool for Astronomy
ASIAA (Taiwan) – IDIA (South Africa) – NRAO (US)
To be deployed at ALMA Regional Science Centres
• Web-based client-server visualization
• IDIA’s role:
• Introduce support for new file formats for MeerKAT and SKA data sets
• Cloud-based distributed rendering and analysis for big data
• Improve existing analysis algorithms and introduce distributed analysis algorithms
• Design of graphical user interface and visual analytics for MeerKAT imaging LSP use cases
http://cartavis.github.io
Data Intensive Astronomy Research Cloud
Prototype federated cloud between UCT and NWU (2016)
Cloud for data intensive
research built from open
source software
IDIA Data-Centric Facility: Jan 2017
• Builds on core services provided by ARC
• Adds components for greater performance, increased
capacity and for large local POSIX storage
• 40 compute nodes:
• 2.6 GHz Xeon processors
• 32 cores, 256 GB RAM / node
• 4 nodes have 2x NVIDIA K80 GPUs
• 4 x Storage Targets to provide POSIX volumes that add to
the block and object storage from ARC nodes
• Initially 500TB usable, growing to many PB
• 50Gb/s Ethernet core, attached to ARC
• 10Gb/s Access network connected to SANReN
R10M IDIA investment
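The per-node figures above can be turned into rough aggregate numbers. The sketch below is a back-of-envelope calculation, not a specification of the facility: it assumes that "32 cores, 256 GB RAM" both apply per node, counts K80 cards rather than GPU dies, uses decimal units (1 TB = 10^12 bytes), and assumes a link running at full line rate with no protocol overhead. The function name `transfer_days` is hypothetical.

```python
# Aggregate capacity implied by the per-node figures on the slide.
NODES = 40
CORES_PER_NODE = 32        # assumed to be per node
RAM_GB_PER_NODE = 256
GPU_NODES = 4
K80_CARDS_PER_GPU_NODE = 2

total_cores = NODES * CORES_PER_NODE                # 40 * 32
total_ram_gb = NODES * RAM_GB_PER_NODE              # 40 * 256
total_k80_cards = GPU_NODES * K80_CARDS_PER_GPU_NODE


def transfer_days(volume_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move volume_tb terabytes over a link of link_gbps gigabits/s.

    efficiency is the assumed fraction of line rate actually achieved (0 < efficiency <= 1).
    """
    bits = volume_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86400


print(f"{total_cores} cores, {total_ram_gb} GB RAM, {total_k80_cards} K80 cards")
print(f"500 TB over 10 Gb/s: {transfer_days(500, 10):.1f} days")
print(f"500 TB over 50 Gb/s: {transfer_days(500, 50):.1f} days")
```

Under these assumptions, moving the initial 500 TB through the 10 Gb/s access network would take several days even at line rate, which is one way to see why data transport is treated as a component in its own right.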
ILIFU: DIRISA Tier 2 Data Intensive Research Facility
• Astronomy (IDIA, SKA-SA)
• Data Intensive Astronomy with priority on MeerKAT Large Survey
Programs
• Precursor SKA Regional Science Centre with EU
• Tier 2 node of South African Data Intensive Research Cloud
federation with T1 and T3 infrastructure
• Data Intensive Bioinformatics
• Tuberculosis Surveillance in Africa (UWC)
• Imputation service for African human genetics (UCT)
• Omics for Precision Medicine (SU)
• Research Data Management (CPUT)
Joint investment: CSIR/DIRISA, IDIA + UCT Bio
DIRISA Tier 2 Data Intensive Research Facility
“ARC” Demonstrator
Operational since 2016
IDIA Data-Intensive Research Facility
Operational June 2017
CSIR Funded
operational 2018
Data Intensive Astronomy Cloud - V 2.0 (production version)
Data Intensive Astronomy Cloud
Image: MeerKAT AR1.5 Deep Observation
https://ca.cyberska.org/file/read/52799/deep-2-mfssc7imagett0fits
Moving to scale (ideally)
ILIFU
(Next Gen) Data-Intensive Research Cloud
• University of Cape Town
• University of the Western Cape
• Wits University
• University of Pretoria
• North-West University
• Sol Plaatje University
• South African Radio Astronomy Observatory
• South African National Space Agency
Strategic research programs in
Astronomy, Bioinformatics,
Geospatial
Link to DIRISA e-science training
initiative
Link to EU AENEAS centres
Precursor Regional Science Centres
Data transport demonstrator for SKA Data Delivery Systems
• Clone of IDIA data intensive astronomy cloud at ASTRON
• Data sharing and integrated analytics for teams working with LOFAR and MeerKAT Large Projects
DTN Deployment
• FTS used to provide
restartable data transfers
• GridFTP transfer endpoints
as used in 100Gb/s HEP
data transfer
demonstrations
• X.509 authorization for
requests and for GridFTP
transfers
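The idea of a restartable transfer can be illustrated with a small local example. This is not FTS or GridFTP, and it uses none of their APIs: it is a minimal standard-library sketch of the underlying principle, in which an interrupted copy resumes from the number of bytes already at the destination instead of starting over. The function name `resumable_copy` is hypothetical, and the sketch assumes the partial destination file is an intact prefix of the source; a production service would also verify checksums.

```python
import os


def resumable_copy(src: str, dst: str, chunk_size: int = 1 << 20) -> int:
    """Copy src to dst, resuming after whatever dst already contains.

    Returns the number of bytes written during this call.
    """
    done = os.path.getsize(dst) if os.path.exists(dst) else 0
    total = os.path.getsize(src)
    if done > total:
        raise ValueError("destination is larger than source; refusing to resume")

    written = 0
    # Append mode keeps the bytes that an earlier, interrupted run delivered.
    with open(src, "rb") as fin, open(dst, "ab") as fout:
        fin.seek(done)  # skip the part that is already at the destination
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            written += len(chunk)
    return written
```

Calling the function again after a failure transfers only the missing tail, and calling it on a completed copy writes nothing.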
IDIA Visualisation Lab
• “Visualization Wall” for large image data formats
• 2x2 array of high-end ultra-HD displays (“8k” resolution 7680x4320)
• Collaborative meeting space (on-line and local). Place to meet and share exploration of
large data sets.
• Plug-and-play with external graphics card support
• Panoramic Immersive Visualization System
• Immersive single-user mini-dome
• Development platform for Iziko Digital Dome
Visualization from the cloud
• High end Virtual Reality setup
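The combined resolution of the visualization wall follows from simple arithmetic. The sketch below assumes each panel is a standard ultra-HD display of 3840x2160 pixels, which the slide does not state explicitly.

```python
# Combined resolution of a 2x2 wall of ultra-HD panels.
PANEL_W, PANEL_H = 3840, 2160   # assumed per-panel resolution (standard ultra-HD)
COLS, ROWS = 2, 2

wall_w = COLS * PANEL_W
wall_h = ROWS * PANEL_H
megapixels = wall_w * wall_h / 1e6

print(f"{wall_w}x{wall_h} ({megapixels:.1f} megapixels)")
```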
Virtual Reality Project
• Supports over 1M data points
• User can move around data by walking around room
• Web-based user interface on user’s (virtual) wrist
Thank You
for more information: www.idia.ac.za
