Research Data Services
Icons made by Freepik from www.flaticon.com

RDS / AAF / ANDS / NeCTAR / AARNET
Data Lifecycle Project

Data Lifecycle Project
RDS: Research Data Services *
ANDS: Australian National Data Service *
NeCTAR: National eResearch Collaboration Tools and Resources *
AAF: The Australian Access Federation
AARNET: Australia’s Academic Research Network
* funded by the National Collaborative Research Infrastructure Strategy (NCRIS)

What is the research data lifecycle?
Creation / Discovery
Description / Provenance
Integration / Storage
Analysis / Manipulation
Preserve / Archive / Discard

Another way of looking at it
http://www.lib.ua.edu/wiki/sura/index.php/Data_Life_Cycle_Models

And another
http://www.lib.ua.edu/wiki/sura/index.php/Data_Life_Cycle_Models

Existing Components/Services
Data Lifecycle Project: Existing components / Services
[Diagram components: ingest; share (dropbox-like); process (inc. NCI, Pawsey, local HPC, etc.); Various Storage Resources; portal 1, portal 2, portal 3; repo 1, repo 2, repo 3; Research Data Australia]

Proposed New/Enhanced Components
Data Lifecycle Project: Proposed new components
[Diagram components: provisioning (researchers, projects, grants db); Provisioning / Access / Provenance; metadata db; ingest; share (dropbox-like); process (inc. NCI, Pawsey, local HPC, etc.); National Storage Resources; portal 1, portal 2, portal 3; repo 1, repo 2, repo 3; Research Data Australia]

Possible Project Components
Data Lifecycle Project: All components
[Diagram components, numbered 1 to 6: provisioning (researchers, projects, grants db, PAP portal, metadata db) (1); National Storage Resources (2); ingest (3); share, dropbox-like (4); process, inc. NCI, Pawsey, local HPC, etc. (5); portals 1-3, repos 1-3 and Research Data Australia (6)]
1. Researchers access the grants database, which indicates which grants they ‘own’ or have access to. This “surrounding” metadata is registered with the metadata db.
2. Space marked with this metadata is provisioned on dropbox-like storage that is visible to the NeCTAR cloud. This space should belong to a project, not a person.
3. Automated and manual ingest processes feed data to this store, harvesting additional metadata where possible and relevant.
4. Provisioned space should be as dropbox-like as possible.
5. Storage is immediately visible to the NeCTAR cloud, and processes are developed to ship data to local HPC or peak facilities using existing high-speed networks and tools.
6. Once the project is complete, the data is packaged, shipped to and indexed by the relevant domain repository, and registered with the RDA index.

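The numbered steps on this slide can be sketched in code. What follows is only a minimal illustration, assuming plain in-memory dictionaries and lists stand in for the grants db, the metadata db, a domain repository and the RDA index. Every name in it (ProjectSpace, grants_for, provision, ingest, archive) is hypothetical and does not describe any existing project system. Steps 4 and 5 (sharing, and shipping data to compute) are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProjectSpace:
    """Storage space that belongs to a project, not a person (step 2)."""
    project_id: str
    grant_ids: list[str]
    files: dict[str, dict] = field(default_factory=dict)  # file name -> harvested metadata
    archived: bool = False


def grants_for(researcher: str, grants_db: dict[str, set[str]]) -> list[str]:
    """Step 1: list the grants a researcher owns or has access to."""
    return sorted(g for g, people in grants_db.items() if researcher in people)


def provision(project_id: str, grant_ids: list[str], metadata_db: dict) -> ProjectSpace:
    """Steps 1-2: register the surrounding metadata, then provision project space."""
    metadata_db[project_id] = {"grants": list(grant_ids)}
    return ProjectSpace(project_id, list(grant_ids))


def ingest(space: ProjectSpace, name: str, harvested: dict | None = None) -> None:
    """Step 3: feed a file into the store, keeping any harvested metadata."""
    space.files[name] = dict(harvested or {})


def archive(space: ProjectSpace, repository: dict, rda_index: list) -> None:
    """Step 6: package the data for a domain repository and register it in the index."""
    repository[space.project_id] = dict(space.files)
    rda_index.append(space.project_id)
    space.archived = True


# Hypothetical usage with made-up identifiers.
grants_db = {"GRANT-1": {"alice", "bob"}, "GRANT-2": {"carol"}}
metadata_db: dict = {}
space = provision("proj-1", grants_for("alice", grants_db), metadata_db)
ingest(space, "run1.csv", {"instrument": "demo"})
repository: dict = {}
rda_index: list = []
archive(space, repository, rda_index)
```

Keeping the space keyed by project rather than by researcher mirrors the slide's point that provisioned space should belong to a project, not a person.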
Phases / Workflows
Data Lifecycle Project Outline: phases/workflows
[Diagram components, numbered 1 to 6: provisioning (researchers, projects, grants db, PAP portal, metadata db) (1); National Storage Resources (2); ingest (3); share, dropbox-like (4); processing, inc. NCI, Pawsey, local HPC, etc. (5); portals 1-3, repos 1-3 and Research Data Australia (6)]
Phases:
Provisioning (phase 1)
Use (phase 2)
Archive / publish / share / reuse / discard (phase 3)
Researcher questions at each numbered step:
1. “What can I access?”
2. “Where is it?”
3. “How can I feed my data into it?”
4. “How can I share and use it with my group and my collaborators?”
5. “Can I process it on the Cloud?” “And here at Uni of X?” “I need a bigger machine...”
6. “I’ve finished my project and I think this data could be useful to someone in the future, please pack it away and make it available somehow” / “I don’t want to share it just yet, please hold on to it and let me know if someone wants access”

Possible Project Areas of Responsibility
Data Lifecycle Project: Potential Areas of Responsibility
[Diagram components: provisioning (researchers, projects, grants db, PAP portal, metadata db); Identity / Authorisation / Access; ingest; share (dropbox-like); process (inc. NCI, Pawsey, local HPC, etc.); National Storage Resources; portal 1, portal 2, portal 3; repo 1, repo 2, repo 3; Research Data Australia]
Organisations shown: RDS, NeCTAR, ANDS, AAF, AARNet, others (e.g. unis, NCRIS projects, etc.)
Candidate technologies and open questions marked on the diagram:
Trove, VLs, A1s, etc.
cloudstor?
High-speed data shipping
Mediaflux/aspera?
Medici? GPFS?
neutron?
manta?
ORCiD?
mediaflux?
Dublin core?
figshare?
In house, hybrid, 3rd party cloud?
Local stores, AWS, DropBox, Box, others?
Do we need to pick one? Can we interface with many?

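The question "Can we interface with many?" suggests putting a thin common interface in front of several storage providers. The sketch below is one possible way to picture that, not a design from the slides: StorageBackend, InMemoryBackend and StorageRegistry are hypothetical names, and the in-memory class merely stands in for whatever local or third-party store might sit behind the interface.

```python
from __future__ import annotations

from typing import Protocol


class StorageBackend(Protocol):
    """Hypothetical minimal contract that any storage provider would satisfy."""
    name: str

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryBackend:
    """Stand-in for a real store (local, national or commercial)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


class StorageRegistry:
    """Routes each project to the backend its space was provisioned on."""

    def __init__(self) -> None:
        self._backends: dict[str, StorageBackend] = {}
        self._project_backend: dict[str, str] = {}

    def register(self, backend: StorageBackend) -> None:
        self._backends[backend.name] = backend

    def assign(self, project_id: str, backend_name: str) -> None:
        if backend_name not in self._backends:
            raise KeyError(f"unknown backend: {backend_name}")
        self._project_backend[project_id] = backend_name

    def backend_for(self, project_id: str) -> StorageBackend:
        return self._backends[self._project_backend[project_id]]


# Hypothetical usage: two projects placed on two different stores.
registry = StorageRegistry()
registry.register(InMemoryBackend("local"))
registry.register(InMemoryBackend("cloud"))
registry.assign("proj-1", "local")
registry.assign("proj-2", "cloud")
registry.backend_for("proj-1").put("data.csv", b"a,b\n1,2\n")
```

With this shape, the rest of the workflow only ever talks to `backend_for(project_id)`, so adding another provider means writing one more class with `put` and `get`.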
Data Lifecycle Project: Existing components or parts of components
[Diagram components: provisioning (researchers, projects, grants db, PAP portal, metadata db); Identity / Authorisation / Access; ingest; share (dropbox-like); process (inc. NCI, Pawsey, local HPC, etc.); National Storage Resources; portal 1, portal 2, portal 3; repo 1, repo 2, repo 3; Research Data Australia]
Legend: Doesn’t exist / Exists fully / Exists partially

Examples: Virtual Lab
Example: RDS A1 Projects & NeCTAR Virtual Labs
[Diagram components, numbered 1 to 6: provisioning (researchers, projects, grants db, PAP portal, metadata db) (1); NeCTAR/RDSI Storage (2); ingest (3); share, dropbox-like (4); processing, inc. NCI, Pawsey, local HPC, etc. (5); portal 1, portal 2, Domain / VL portal, repo 1, repo 2, Domain / VL repo and Research Data Australia (6)]
Do not have a need for dropbox-like sharing or peak compute.

Examples: University
Example: University Workflow
[Diagram components, numbered 1 to 6: provisioning (researchers, projects, National grants db, Local grants db, Ethics db, PAP portal, metadata db) (1); Local Storage (2); ingest (3); share, dropbox-like (4); processing, inc. NCI, Pawsey, local HPC, etc. (5); portal 1, portal 2, Uni Research Data portal, repo 1, repo 2, Local repo and Research Data Australia (6)]
Does not have a need for dropbox-like sharing or peak compute.

Examples: High Performance Computing
Example: HPC Workflow
[Diagram components, numbered 1 to 6: provisioning (researchers, projects, grants db, PAP portal, metadata db) (1); Local HPC File System (2); ingest (3); share, dropbox-like (4); processing, inc. NCI, Pawsey, local HPC, etc. (5); portals 1-3, repos 1-3 and Research Data Australia (6)]
Does not have a need for dropbox-like sharing or shoulder compute.

Examples: AARNet Cloudstor
Example: National Cloudstor / OwnCloud
[Diagram components, numbered 1 to 6: provisioning (researchers, projects, grants db, PAP portal, metadata db) (1); storage (2); ingest (3); share, dropbox-like (4); processing, inc. NCI, Pawsey, local HPC, etc. (5); portals 1-3, repos 1-3 and Research Data Australia (6)]
Focusses on the ‘long tail’ and does have a need for dropbox-like sharing and easy access to cloud processing platforms.

Examples: OwnCloud Federation
Example: OwnCloud Federation
[Diagram components, numbered 1 to 6: provisioning (researchers, projects, grants db, PAP portal, metadata db) (1); storage at Uni A, Uni B and Uni C (2); ingest (3); share, dropbox-like (4); processing, inc. NCI, Pawsey, local HPC, etc. (5); portals 1-3, repos 1-3 and Research Data Australia (6)]
Uni A: Uni A provisioning portal
Uni B: uses the national provisioning portal
Uni C: Uni C provisioning portal

Examples: Amazon
Example: Amazon
[Diagram components, numbered 1 to 6: provisioning (researchers, projects, grants db, PAP portal, metadata db) (1); Amazon S3 (2); ingest (3); share, dropbox-like (4); EC2 in place of processing (5); portals 1-3, repo 1, repo 2, Glacier and Research Data Australia (6)]
Also focusses on the ‘long tail’ and has a need for dropbox-like sharing and easy access to highly-elastic and integrated infrastructure.

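The example slides each state what a given community does or does not need from the shared components. As a rough aid for comparing them, those statements can be written down as data. The layout, the key names and the two helper functions below are illustrative assumptions; only the needs themselves are taken from the slides, and the OwnCloud Federation slide is omitted because it states no needs.

```python
from __future__ import annotations

# Needs as stated on the example slides; the structure itself is hypothetical.
EXAMPLES: dict[str, dict[str, set[str]]] = {
    "RDS A1 Projects & NeCTAR Virtual Labs": {
        "needs": set(),
        "not_needed": {"dropbox-like sharing", "peak compute"},
    },
    "University workflow": {
        "needs": set(),
        "not_needed": {"dropbox-like sharing", "peak compute"},
    },
    "HPC workflow": {
        "needs": set(),
        "not_needed": {"dropbox-like sharing", "shoulder compute"},
    },
    "National Cloudstor / OwnCloud": {
        "needs": {"dropbox-like sharing", "cloud processing platforms"},
        "not_needed": set(),
    },
    "Amazon": {
        "needs": {"dropbox-like sharing", "highly-elastic and integrated infrastructure"},
        "not_needed": set(),
    },
}


def examples_needing(capability: str) -> list[str]:
    """Names of the examples that say they need the given capability."""
    return sorted(name for name, e in EXAMPLES.items() if capability in e["needs"])


def examples_not_needing(capability: str) -> list[str]:
    """Names of the examples that say they do not need the given capability."""
    return sorted(name for name, e in EXAMPLES.items() if capability in e["not_needed"])
```

For instance, `examples_needing("dropbox-like sharing")` picks out the two 'long tail' examples, Amazon and National Cloudstor / OwnCloud.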

“I’m still / I’m still / Chaining from the Block”
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 

160606 data lifecycle project outline

  • 1. Research Data Services. Icons made by Freepik from www.flaticon.com. RDS / AAF / ANDS / NeCTAR / AARNET: Data Lifecycle Project
  • 2. Data Lifecycle Project. RDS – Research Data Services *; ANDS – Australian National Data Service *; NeCTAR – National eResearch Collaboration Tools and Resources *; AAF – The Australian Access Federation; AARNET – Australia’s Academic Research Network. (* funded by the National Collaborative Research Infrastructure Strategy, NCRIS)
  • 3. What is the research data lifecycle? Creation / Discovery; Description / Provenance; Integration / Storage; Analysis / Manipulation; Preserve / Archive / Discard
  • 4. Another way of looking at it: http://www.lib.ua.edu/wiki/sura/index.php/Data_Life_Cycle_Models
  • 5. And another: http://www.lib.ua.edu/wiki/sura/index.php/Data_Life_Cycle_Models
  • 6. Data Lifecycle Project – Existing Components / Services. Diagram labels: ingest; share (dropbox-like); process (inc NCI, Pawsey, local HPC, etc); portal 1, portal 2, portal 3; repo 1, repo 2, repo 3; Research Data Australia; Various Storage Resources
  • 7. Data Lifecycle Project – Proposed New / Enhanced Components. Diagram labels: provisioning; researchers; projects; grants db; Provisioning / Access / Provenance; metadata db; ingest; share (dropbox-like); process (inc NCI, Pawsey, local HPC, etc); portal 1, portal 2, portal 3; repo 1, repo 2, repo 3; Research Data Australia; National Storage Resources
  • 8. Data Lifecycle Project – All components (Possible Project Components). Diagram labels: provisioning; researchers; projects; grants db; PAP portal; metadata db; ingest; share (dropbox-like); process (inc NCI, Pawsey, local HPC, etc); portal 1, portal 2, portal 3; repo 1, repo 2, repo 3; Research Data Australia; National Storage Resources. Numbered notes: (1) Researchers access the grants database, which indicates which grants they ‘own’ or have access to. This “surrounding” metadata is registered with the metadata db. (2) Space marked with this metadata is provisioned on dropbox-like storage which is visible to the NeCTAR cloud; this space should belong to a project, not a person. (3) Automated and manual ingest processes feed data to this store, harvesting additional metadata where possible and relevant. (4) Provisioned space should be as dropbox-like as possible. (5) Storage is immediately visible to the NeCTAR cloud, and processes are developed to ship data to local HPC or peak facilities using existing high-speed networks and tools. (6) Once the project is complete, the data is packaged and shipped to, and indexed by, the relevant domain repository, as well as registered with the RDA index.
  • 9. Data Lifecycle Project Outline – Phases / Workflows. Phases: Provisioning (phase 1); Use (phase 2); archive / publish / share / reuse / discard (phase 3). Researcher questions: “What can I access?” “How can I feed my data into it?” “How can I share and use it with my group and my collaborators?” “Can I process it on the Cloud?” “And here at Uni of X?” “I need a bigger machine..” “I’ve finished my project and I think this data could be useful to someone in the future, please pack it away and make it available somehow” “I don’t want to share it just yet, please hold on to it and let me know if someone wants access” “Where is it?” Diagram labels: provisioning; researchers; projects; grants db; PAP portal; metadata db; ingest; share (dropbox-like); processing (inc NCI, Pawsey, local HPC, etc); portal 1, portal 2, portal 3; repo 1, repo 2, repo 3; Research Data Australia; National Storage Resources
  • 10. Data Lifecycle Project – Potential Areas of Responsibility. Parties: RDS; NeCTAR; ANDS; AAF; AARNet; Others (eg unis, NCRIS projects, etc). Candidates and open questions: Trove, VL’s, A1’s etc; cloudstor?; High speed data shipping; Mediaflux/aspera?; Medici? GPFS?; neutron?; manta?; ORCiD?; mediaflux?; Dublin core?; figshare?; In house, hybrid, 3rd party cloud?; Do we need to pick one? Can we interface with many?; Local stores, AWS, DropBox, Box, Others? Diagram labels: provisioning; researchers; projects; grants db; PAP portal; metadata db; Identity / Authorisation / Access; ingest; share (dropbox-like); process (inc NCI, Pawsey, local HPC, etc); portal 1, portal 2, portal 3; repo 1, repo 2, repo 3; Research Data Australia; National Storage Resources
  • 11. Data Lifecycle Project – Existing components or parts of components. Legend: Doesn’t exist; Exists fully; Exists partially. Diagram labels: provisioning; researchers; projects; grants db; PAP portal; metadata db; Identity / Authorisation / Access; ingest; share (dropbox-like); process (inc NCI, Pawsey, local HPC, etc); portal 1, portal 2, portal 3; repo 1, repo 2, repo 3; Research Data Australia; National Storage Resources
  • 12. Examples: Virtual Lab. Example: RDS A1 Projects & NeCTAR Virtual Labs. Do not have a need for dropbox-like sharing or peak compute. Diagram labels: provisioning; researchers; projects; grants db; PAP portal; metadata db; NeCTAR/RDSI Storage; ingest; share (dropbox-like); processing (inc NCI, Pawsey, local HPC, etc); portal 1, portal 2, Domain / VL portal; repo 1, repo 2, Domain / VL repo; Research Data Australia
  • 13. Examples: University. Example: University Workflow. Do not have a need for dropbox-like sharing or peak compute. Diagram labels: provisioning; researchers; projects; National grants db; Local grants db; Ethics db; PAP portal; metadata db; Local Storage; ingest; share (dropbox-like); processing (inc NCI, Pawsey, local HPC, etc); portal 1, portal 2, Uni Research Data portal; repo 1, repo 2, Local repo; Research Data Australia
  • 14. Examples: High Performance Computing. Example: HPC Workflow. Does not have a need for dropbox-like sharing or shoulder compute. Diagram labels: provisioning; researchers; projects; grants db; PAP portal; metadata db; Local HPC File System; ingest; share (dropbox-like); processing (inc NCI, Pawsey, local HPC, etc); portal 1, portal 2, portal 3; repo 1, repo 2, repo 3; Research Data Australia
  • 15. Examples: AARNet Cloudstor. Example: National Cloudstor / OwnCloud. Focusses on the ‘long tail’ and does have a need for dropbox-like sharing and easy access to cloud processing platforms. Diagram labels: provisioning; researchers; projects; grants db; PAP portal; metadata db; ingest; share (dropbox-like); processing (inc NCI, Pawsey, local HPC, etc); portal 1, portal 2, portal 3; repo 1, repo 2, repo 3; Research Data Australia
  • 16. Examples: OwnCloud Federation. Example: OwnCloud Federation. Sites: Uni A (Uni A provisioning portal); Uni B (uses the national provisioning portal); Uni C (Uni C provisioning portal). Diagram labels: ingest; provisioning; researchers; projects; grants db; PAP portal; metadata db; share (dropbox-like); processing (inc NCI, Pawsey, local HPC, etc); portal 1, portal 2, portal 3; repo 1, repo 2, repo 3; Research Data Australia
  • 17. Examples: Amazon. Example: Amazon. Also focusses on the ‘long tail’ and has a need for dropbox-like sharing and easy access to highly-elastic and integrated infrastructure. Diagram labels: provisioning; researchers; projects; grants db; PAP portal; metadata db; Amazon S3; ingest; share (dropbox-like); processing (inc NCI, Pawsey, local HPC, etc); EC2; portal 1, portal 2, portal 3; repo 1, repo 2, Glacier; Research Data Australia
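
The workflow described on slide 8 (register grant metadata, provision a project-owned space, ingest and use data, then package and ship it to a repository) can be sketched as a small state model. This is only an illustrative sketch: the names `Stage`, `ProjectSpace`, `register`, `provision`, `ingest` and `archive`, and the sample grant and file names, are hypothetical and do not come from the slides or from any RDS, NeCTAR or ANDS API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Hypothetical lifecycle stages, loosely following slide 8."""
    REGISTERED = 1   # grant metadata recorded in the metadata db
    PROVISIONED = 2  # project space created on shared storage
    IN_USE = 3       # data ingested, shared and processed
    ARCHIVED = 4     # packaged and shipped to a repository


@dataclass
class ProjectSpace:
    # The space belongs to a project, not a person, so membership
    # is kept as a set of people attached to the project.
    grant_id: str
    project: str
    members: set[str] = field(default_factory=set)
    metadata: dict[str, str] = field(default_factory=dict)
    files: list[str] = field(default_factory=list)
    stage: Stage = Stage.REGISTERED
    repository: str | None = None


def register(grant_id: str, project: str, members: set[str]) -> ProjectSpace:
    """Record the 'surrounding' grant metadata for a new project."""
    space = ProjectSpace(grant_id, project, set(members))
    space.metadata["grant_id"] = grant_id
    return space


def provision(space: ProjectSpace) -> None:
    """Mark storage as provisioned; only valid straight after registration."""
    if space.stage is not Stage.REGISTERED:
        raise ValueError("space must be registered before provisioning")
    space.stage = Stage.PROVISIONED


def ingest(space: ProjectSpace, filename: str,
           harvested: dict[str, str] | None = None) -> None:
    """Add a file and merge any metadata harvested during ingest."""
    if space.stage not in (Stage.PROVISIONED, Stage.IN_USE):
        raise ValueError("space must be provisioned before ingest")
    space.files.append(filename)
    space.metadata.update(harvested or {})
    space.stage = Stage.IN_USE


def archive(space: ProjectSpace, repository: str) -> None:
    """Ship the finished project to a repository and close the space."""
    if space.stage is not Stage.IN_USE:
        raise ValueError("only a space in use can be archived")
    space.repository = repository
    space.stage = Stage.ARCHIVED


if __name__ == "__main__":
    demo = register("GRANT-001", "example-project", {"alice", "bob"})
    provision(demo)
    ingest(demo, "observations.csv", {"format": "csv"})
    archive(demo, "repo 1")
    print(demo.stage.name, demo.repository, demo.metadata)
```

Carrying the metadata on the project space rather than on individual files mirrors the slide's point that grant metadata is registered first and further metadata is harvested at ingest; a real system would keep it in the metadata db instead of in memory.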