nci.org.au
@NCInews
Big Data is today: key issues for big data
Dr Ben Evans
Associate Director
Research Engagements and Initiatives
Impact of Collaborations around Earth Systems Science Research
Tropical Cyclones
Cyclone Winston
20-21 Feb, 2016
Volcanic Ash
Manam Eruption
31 July, 2015
Wye Valley and
Lorne Fires
25-31 Dec, 2015
Bush Fires
Societal impacts requiring cross-domain collaboration
• Modelling Extreme & High Impact events – BoM
• NWP, Climate Coupled Systems & Data Assimilation – BoM, CSIRO, Research Collabs
• Hazards - Geoscience Australia, BoM, States
• Geophysics, Potential Fields, Seismic – Geoscience Australia, Universities
• Monitoring the Environment & Ocean – ANU, BoM, CSIRO, GA, Research, Fed/State
• International research – International agencies and Collaborative Programs
• Agriculture - …
Flooding
St George, QLD
February, 2011
© National Computational
Infrastructure 2016
Ben Evans, “Preparing for your data future”, July 2016
Emerging Petascale Geophysics HPC codes
- Assess priority Geophysics areas
- 3D/4D Geophysics: Magneto-tellurics, AEM
- Hydrology, Groundwater, Carbon Sequestration
- Forward and Inverse Seismic models and analysis (onshore and offshore)
- Natural Hazard and Risk models: Tsunami, Ash-cloud
- Issues
- Data across domains, data resolution (points, lines, grids), data coverage
- Provenance capture and query
- Model maturity for running at scale
- Ensemble, Uncertainty analysis and Inferencing
Growth of Genomics data generation and need for analysis
The arrival of the “$1,000” genome
c/- Marcel Dinger, Garvan Inst.
Computational need to access big data
[Figure: TOP500 performance development chart (http://www.top500.org/statistics/perfdevel/), annotated with "Current NCI" and "Next NCI"]
High-Performance Data (HPD) (Evans, ISESS 2015, Springer)
• HPC – turning compute into IO-bound problems
• HPD – turning IO-bound into ontology + semantic problems
• Computational Performance increasing
• Number of CPU cores increasing
• Data needs to scale
• Need compute to make full use of data
NCI National Platform – to enable collaboration/transformation
NCI Proposal to NCRIS RDSI (RDS) for a High Performance Data Node to:
• Enable dramatic increases in the scale and reach of Australian research by providing
nationwide access to enabling data collections;
• Specialise in nationally significant research collections requiring high-performance
computational and data-intensive capabilities for their use in effective research
methods;
• Realise synergies with related national research infrastructure programs
As a result, Researchers will be able to:
• share, use and reuse significant collections of data that were previously either
unavailable to them or difficult to access
• access the data in a consistent manner which will support a general interface as well as
discipline specific access
• use the consistent interface established/funded by this project for access to data
collections at participating institutions and other locations as well as data held at the
Nodes
NCI National Environment Research Data Collections
1. Climate/ESS Model Assets and Data Products
2. Earth and Marine Observations and Data Products
3. Geoscience Collections
4. Terrestrial Ecosystems Collections
5. Water Management and Hydrology Collections
• Allocations and Review panels
• Science Data Committee
• Data Technical committee
Enable global and continental scale … and to scale-down to local/catchment/plot
• Water availability and usage over time
• Catchment zone
• Vegetation changes
• Data fusion with point-clouds and local or other measurements
• Statistical techniques on key variables
Preparing for:
• Better programmatic access
• Machine/Deep Learning
• Better Integration through Semantic/Linked data technologies
Small Data to calibrate, validate and understand the Big Data
Image Credit: Japan Meteorological Agency (JMA)
Diabolical data – understanding data complexity
• Data Collections
• Data SubCollections
• Data Sets (and granules)
• Data subsetting and Dynamic data
• Versioning, licensing, provenance, citation, sync, linked/semantic data
• Social issues/responsibility mgt
NERDIP – simplified view
[Figure: NERDIP Data Platform diagram. Labels: Data Services; Compute Intensive; Virtual Laboratories; Fast/Deep Data Access; Portal views; Machine Connected; Program access; Server-side functions]
http://geonetwork.nci.org.au/ - access to metadata
Licensing and Access for Earth and Environmental
• All metadata must be open and discoverable:
– through NCI, ANDS, FIND, data.gov.au and partner websites
• Where possible, data will be CC-BY
– Metadata and landing pages will document any access restrictions
• NCI worked with Baden Appleyard QC of AusGOAL (Australian Governments Open Access and Licensing Framework)
Standards – Ensure compliant with Standards
[Figure: map of Australian data infrastructure programs and the agencies behind them, grouped as Research & Development, Government and Operational, each aligned to ISO/OGC standards. Groups shown: NCRIS IMOS and the Australian Ocean Data Network (AODCJF: AIMS, CSIRO MAR, Geoscience Australia, BOM, Dept. of Defence, AAD); Australian Research Data Commons (ANDS, NCI, RDSI; AAF, AARNet); NCRIS AuScope with the AuScope Portal and Geoscience Portal (GGIC: GA and state/territory members); ANZLIC Spatial Information Council and the Australian Spatial Data Directory (OSDM, ICSM, states/territories, NZ); NCRIS Integrated Biological Systems and the Atlas of Living Australia; Australian Govt Water and the Aust Water Resources Information System (BOM, CSIRO, states/territories); CRC for Spatial Information (Australian Spatial Consortium, ASIBA, SSI, PSMA, 43 Pty Ltd); NCRIS TERN (e-MAST, BCCVL); Climate & Weather (NCRIS CWSLab); Australian Government (AGIMO, Gov 2.0) and the Aust. Govt. Online Service Point]
Transform data to become transdisciplinary and born-connected
• A call to action for a Transdisciplinary approach starting at the conception of data collections
• Researchers across the science disciplines and broader
• Then achieve interoperability and relevant information will be accessible to all sectors
• Data moving to Born-Connected, which is part of the semantic and linked data world
• Improves quality assurance of the data if “linked”
Getting Serious about Profiling Data Performance
• Calltree analysis
• Main general-purpose global profiling tools:
– Scalasca/Score-P; TAU; OpenSpeedShop
– HPCToolkit; mpiP; ITAC
• IO analysis:
– Compare to baselines
– Darshan
– Global profiling tools focused on IO
How is data stored? (HDF5 storage layouts; source: www.hdfgroup.org, "Extreme Scale Computing HDF5", August 7, 2013)
• Contiguous (default): data elements stored physically adjacent to each other
• Chunked: better access time for subsets; extendible
• Chunked & Compressed: improves storage efficiency, transmission speed
[Figure: buffer in memory mapped to data in the file for each layout]
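The "better access time for subsets" point can be made concrete by counting how many chunks a read has to touch. The following is a minimal sketch in plain Python, not HDF5 code: the dataset shape, the three chunk shapes and the function name `chunks_touched` are all hypothetical, chosen only to illustrate the trade-off.

```python
def chunks_touched(start, stop, chunk_shape):
    """Count the chunks intersected by the hyperslab [start, stop) per dimension."""
    count = 1
    for lo, hi, size in zip(start, stop, chunk_shape):
        first = lo // size          # index of the first chunk along this axis
        last = (hi - 1) // size     # index of the last chunk along this axis
        count *= last - first + 1
    return count

# Hypothetical (time, lat, lon) dataset of shape (1000, 720, 1440).
time_series = ((0, 100, 200), (1000, 101, 201))   # all times at one grid point
one_map = ((0, 0, 0), (1, 720, 1440))             # one full map at one time

# Hypothetical chunk shapes favouring maps, time series, or a compromise.
layouts = {
    "map-oriented": (1, 720, 1440),
    "series-oriented": (1000, 8, 8),
    "balanced": (50, 90, 90),
}

for name, chunk_shape in layouts.items():
    series_reads = chunks_touched(*time_series, chunk_shape)
    map_reads = chunks_touched(*one_map, chunk_shape)
    print(f"{name:16s} time series: {series_reads:6d} chunks, map: {map_reads:6d} chunks")
```

A layout tuned for one access pattern can be very expensive for the other, which is why chunk shape is usually chosen against the expected mix of reads rather than in isolation.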
Performance Access Factors
[Figure: Independent Read throughput (MB/s, 0 to 7000) versus stripe count (1, 8, 16, 32, 64, 128) for HDF5, MPIIO and POSIX]
• Data packing
• Variable ordering
• Chunking/blocking
• Compression
• Caching
• Subsetting/Sieving
• Read vs Write
• Parallel IO
• Data conversion
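As an illustration of the "data packing" factor, the sketch below packs floating-point values into 16-bit integers using the `scale_factor` / `add_offset` approach common in netCDF/CF data, where unpacked = packed * scale_factor + add_offset. It is a plain-Python sketch under stated assumptions: the sample temperatures and the helper names are hypothetical, and a real workflow would normally leave packing to the I/O library.

```python
import struct

def packing_params(vmin, vmax, nbits=16):
    """Choose scale/offset so [vmin, vmax] spans the signed nbits integer range."""
    scale_factor = (vmax - vmin) / (2**nbits - 1)
    add_offset = vmin + 2**(nbits - 1) * scale_factor
    return scale_factor, add_offset

def pack(values, scale_factor, add_offset, nbits=16):
    """Quantise floats to signed integers, clamping to the representable range."""
    lo, hi = -(2**(nbits - 1)), 2**(nbits - 1) - 1
    return [min(hi, max(lo, round((v - add_offset) / scale_factor))) for v in values]

def unpack(packed, scale_factor, add_offset):
    """Recover approximate floats from the packed integers."""
    return [p * scale_factor + add_offset for p in packed]

# Hypothetical temperatures in kelvin.
temps_k = [250.0, 273.15, 288.4, 300.0, 310.5]
scale_factor, add_offset = packing_params(min(temps_k), max(temps_k))
packed = pack(temps_k, scale_factor, add_offset)
restored = unpack(packed, scale_factor, add_offset)

# Compare storage size as float64 versus packed int16.
as_float64 = struct.pack(f"<{len(temps_k)}d", *temps_k)
as_int16 = struct.pack(f"<{len(packed)}h", *packed)
max_error = max(abs(a - b) for a, b in zip(temps_k, restored))

print("bytes as float64:", len(as_float64))
print("bytes as int16:  ", len(as_int16))
print("max error within half a quantisation step:", max_error <= scale_factor / 2 + 1e-9)
```

Packing trades precision for size: the stored data is a quarter of the float64 size here, at the cost of a bounded quantisation error and a conversion step on every read.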
Data Classified Based On Processing Levels
Level* | Proposed Name | Description*
0 | Raw Data | Instrumental data as received from sensor. Includes any and all artefacts.
1 | Instrument Data | Instrument data that have been converted to sensor units but are otherwise unprocessed. Data includes appended time and platform georeferencing parameters (e.g., satellite ephemeris).
2 | Calibrated Data | Data that has undergone corrections or calibrations necessary to convert instrument data into geophysical value. Data includes calculated position.
3 | Gridded Data | Data that has been gridded and undergone minor processing for completeness and consistency (i.e., replacing missing data).
4 | “Value-added” Data Products | Analytical (modelled) data such as those derived from the application of algorithms to multiple measurements or sensors.
5 | Model-derived Data Products | Data resulting from the simulation of physical processes and/or application of expert knowledge and interpretation.
*The level numbers and descriptions above follow definitions used in satellite data processing, as defined by NASA (see <http://science.nasa.gov/earth-science/earth-science-data/data-processing-levels-for-eosdis-data-products/>; <http://nsidc.org/the-drift/2013/08/is-it-1b-2-or-3-definitions-of-data-processing-levels/>; <http://www.srl.caltech.edu/ACE/ASC/level1/dpl_def.htm>).
[Slide annotations beside the table: HPD; points; grids]
Quality Assurance, Conventions & Interoperable Standards
O&M ISO standards
CF and ACDD standards
Barriers: Like my Coordinate System?
Mercator grid in south
Tripolar grid in north
Standards on Nested Grids
Transforming data on-the-fly
Examples of Virtual Labs and web tools
eReefs online analysis portal
Matching to the database of events
Input Image
Feature Maps – Convolution Layer 1
c/- Rahul Ramachandran, NASA / MSFC
Reasonable progress
Overall Accuracy = 87.88%
MODIS Rapid Response Test Images (Images to Trained scheme)
[Figure: example test images for the Hurricane, Dust and Smoke classes, each labelled True Positive, False Negative or False Positive]
PROMS v3 uses an extension to the PROV ontology as its data model.
– Entities
– Activities
– Agent
RD-Switchboard http://www.rd-switchboard.org/
Enabling transparency, reproducibility & informatics techniques
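The three PROV classes named on this slide (Entity, Activity, Agent) can be sketched as a small data structure with a lineage query. This is an illustrative sketch only and not the PROMS API or data model: the class layout, the `lineage` helper and the example identifiers are hypothetical, and only the relation names in the comments (`prov:used`, `prov:wasGeneratedBy`, `prov:wasAssociatedWith`) come from the PROV ontology.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Agent:
    id: str

@dataclass(frozen=True)
class Entity:
    id: str

@dataclass
class Activity:
    id: str
    agent: Agent                                    # prov:wasAssociatedWith
    used: list = field(default_factory=list)        # prov:used
    generated: list = field(default_factory=list)   # inverse of prov:wasGeneratedBy

def lineage(entity, activities):
    """Return the ids of all upstream entities that contributed to `entity`."""
    upstream, seen, frontier = [], set(), [entity]
    while frontier:
        current = frontier.pop()
        for activity in activities:
            if current in activity.generated:
                for source in activity.used:
                    if source.id not in seen:
                        seen.add(source.id)
                        upstream.append(source.id)
                        frontier.append(source)
    return upstream

# Hypothetical two-step processing chain.
pipeline = Agent("processing-pipeline")
raw = Entity("level0-scene")
calibrated = Entity("level2-scene")
gridded = Entity("level3-grid")

activities = [
    Activity("calibrate", pipeline, used=[raw], generated=[calibrated]),
    Activity("grid", pipeline, used=[calibrated], generated=[gridded]),
]

print(lineage(gridded, activities))
```

Recording who ran what on which inputs is what makes a derived product traceable back to its sources, which is the basis for the transparency and reproducibility claims above.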
Key Messages for raising a Data Centre in a Big Data World
• Scientific Computing at today's scales has to be built across collaborations of national facilities around national institutions that both scale up and scale down
• Data needs to be
• born-connected,
• transdisciplinary,
• high quality,
• computationally ready
• Expertise in usability and performance tuning is needed to get the most out of the data.
• No one [insert grouping] can do it alone.
• No one organisation, no one group, no one country has the required resources or the expertise.
• Collaborative efforts across disciplines and collaboration across nations
Working Collaboratively in the era of Exascale and Big Data


Editor's Notes

  • #5 Ref Dinger_IMB_Winter_School_2014.pptx
  • #22 Landsat: A mosaic composed from different scenes for the selected area, using the scenes closest to the selected date. An RGB image is composed by mapping three different bands to the RGB colours. Himawari: A video for the selected date and area: 12 frames covering the period around noon, with frames 30 minutes apart. Each frame is an RGB image composed by mapping the three Himawari bands closest to the Landsat bands, to give a similar image. ERA-Interim: A video for the selected date covering 2000 square kilometers around the selected region, representing "ERA-Interim Evaporation [m] forecast on surface": 8 frames covering one day (one every 3 hours). Each frame is an RGB image composed using a colormap to represent the different evaporation values.