Building an Extensible
Storage Ecosystem with
WOS
Dr. Erik Deumens
SSERCA
SC’13 DDN User Meeting
SSERCA

•  Sunshine State Education & Research
Computing Alliance

o  Members: FIU, FSU, UCF, UF, UM, USF
o  Affiliates: FAMU, FAU, FIT, UNF
o  Glue: Florida LambdaRail regional network provider

•  Enable and enhance

o  collaborative research
o  for faculty and their teams in the state

•  Making them more competitive

o  by providing advanced cyber infrastructure
Proposal Vision and Overview
The researchers and their collaborations are
the central focus driving all design aspects of
the proposed extensible storage
environment.
Intellectual Merit

•  Address the need of working researchers head-on
•  Not centered on some hardware or software design
•  Naturally extensible
•  Intrinsically sustainable
•  Inclusive of new approaches
Broader Impacts

•  Open to all communities
•  Provide a framework to explore and broaden a data centric research environment
•  Provide long-term roadmap to address archival storage and transitioning data to it
•  Link campus and NSF XSEDE resources in flexible way (eXtreme Science and Engineering Discovery Environment)
Project Vision

•  What challenges are addressed?
•  What will the proposed project build with NSF funding?
•  How are XSEDE resources leveraged?
•  Features of the architecture
o  Sustainable
o  Extensible
o  Flexible and adaptable

•  What can others build leveraging this NSF funded project?
Challenges for Storage Providers

•  Multiple sources, multiple sizes of data
o  Instrument data
o  Spreadsheets

•  Multiple places to store data
o  Campus systems
o  Cloud systems (Google Drive, Dropbox, etc)

•  Multiple actions and timescales in data life
o  Analysis - compute and data intensive
o  Distribution - web site accessibility
§  general and restricted
o  Life cycle management - initial, maturing, archiving
Principles
Create:
•  Effective environment for researchers...
•  to work collaboratively
•  with complex workflows
•  Involving large and small data
We propose to bring the essence and simplicity of cloud
infrastructure to research:
Interactivity and instant gratification.
Think of something, and start doing it!
Proposal: XDESE
The eXtreme Digital Extensible Storage Ecosystem - XDESE

•  Ecosystem is more complex than environment
•  NSF funded and supported core
o  Distributed by design
o  Multi-access, multi-protocol, multi-owner
o  Leverage XSEDE resources
o  XRAC allocation process adapted for data
§  defined quota for defined time span
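The "defined quota for defined time span" idea can be sketched as a small record type. This is only an illustration: the class name DataAllocation, its fields, and the sample values are hypothetical and do not come from XRAC or XDESE.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataAllocation:
    """Hypothetical storage allocation: a quota that is valid for a time span."""
    owner: str
    quota_tb: float
    start: date
    end: date

    def is_active(self, on: date) -> bool:
        # The allocation is valid from start to end, both inclusive.
        return self.start <= on <= self.end

    def can_store(self, used_tb: float, request_tb: float, on: date) -> bool:
        # A write is allowed only while the allocation is active and within quota.
        return self.is_active(on) and used_tb + request_tb <= self.quota_tb


# Illustrative values only.
alloc = DataAllocation("alice", quota_tb=50.0, start=date(2014, 1, 1), end=date(2014, 12, 31))
print(alloc.can_store(used_tb=40.0, request_tb=5.0, on=date(2014, 6, 1)))
print(alloc.can_store(used_tb=40.0, request_tb=5.0, on=date(2015, 1, 1)))
```

The first call falls inside both the quota and the time span; the second falls after the allocation has ended.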
Storage Architecture
[Diagram. Labels: Researcher; XSEDE - XRAC authentication and authorization; XDESE - FIU data gateway; XDESE - UCF data gateway; Internet, data replication; XSEDE resources: Stampede, Kraken]
Proposal: XDESE (2)

•  Extensible with other funding
o  Geographically: campus and regional add-ons
§  plug and play racks
o  Organizationally: multiple communities
§  astrophysics, religion, archeology, ...
o  Functionally: add new protocols and formats
o  Public data: NSF funded
o  Restricted data: funded from other sources
o  Archival data and data repositories
XDESE Extension Architecture

•  Basic concept
o  WOScore storage system at remote location
o  WOScore provides
§  data replication and motion
§  policy and demand based
o  Add WOSaccess gateway to provide local
§  CIFS (personal) and NFS (organizational)
o  Add WOS GS bridge gateway to provide local
§  GPFS on GridScaler or Lustre on ExaScaler
Extension Architecture
[Diagram. Labels: SSERCA XDESE; Internet; Campus; WOS Access; WOS GS Bridge (shown twice); XSEDE Stampede; HPC; campus net NFS/CIFS]
Leverage XSEDE Resources

•  Users store and maintain data in XDESE
o  Long term project data
o  In support of collaboration
§  meaning easy access to many people
§  fine control over who can see and do what
o  Not intended for temporary data
o  XSEDE storage resources are suitable for that
Leverage XSEDE Resources (2)

•  Transfer data to XSEDE processors
o  Stampede, Kraken, etc
o  Bulk transfer
o  Complex data flow including data selection
o  XDESE will respond from multiple sites
§  improved performance, reliability, flexibility
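One way to picture "respond from multiple sites" is a client that picks among the sites holding a replica. The sketch below is an assumption, not how WOS or XDESE chooses a site: the function pick_site and the latency figures are made up for illustration, and only the site names FIU and UCF come from the architecture slide.

```python
def pick_site(replica_sites, latency_ms):
    """Return the replica site with the lowest measured latency.

    Sites without a latency measurement are treated as unreachable and skipped.
    """
    reachable = [site for site in replica_sites if site in latency_ms]
    if not reachable:
        raise LookupError("no reachable replica site")
    return min(reachable, key=lambda site: latency_ms[site])


# Illustrative measurements only.
print(pick_site(["FIU", "UCF"], {"FIU": 12.0, "UCF": 7.5}))
print(pick_site(["FIU", "UCF"], {"FIU": 12.0}))
```

Having more than one site to choose from is what gives both the performance gain (closest site) and the reliability gain (fallback when a site is unreachable).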
Leverage XSEDE Resources (3)

•  Option 1: data transfer to XSEDE scratch file system
o  During computation on XSEDE systems
o  Optimal performance is obtained
Leverage XSEDE Resources (4)

•  Option 2: XSEDE compute job controls data
o  Program can control data selection
§  from the XDESE storage
§  initiate transfer of selected parts to and from the storage
o  XDESE storage (DDN WOScore) will optimize data location among distributed XDESE storage nodes
§  use one of the extensions for further optimization
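Job-controlled data selection could look roughly like the sketch below, in which a compute job picks only the parts of a data set it needs before asking for a transfer. The StoredObject type, the tag-based selection, and the scratch limit are hypothetical; the slides do not describe the actual selection interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredObject:
    """Hypothetical catalog entry for one object held in the storage."""
    name: str
    size_gb: float
    tags: frozenset


def select_for_job(catalog, required_tag, scratch_limit_gb):
    """Pick objects carrying required_tag, in catalog order.

    Any object that would push the total past scratch_limit_gb is skipped.
    """
    selected, total = [], 0.0
    for obj in catalog:
        if required_tag in obj.tags and total + obj.size_gb <= scratch_limit_gb:
            selected.append(obj)
            total += obj.size_gb
    return selected


# Illustrative catalog only.
catalog = [
    StoredObject("a", 10.0, frozenset({"run1"})),
    StoredObject("b", 30.0, frozenset({"run1"})),
    StoredObject("c", 5.0, frozenset({"run2"})),
    StoredObject("d", 4.0, frozenset({"run1"})),
]
print([obj.name for obj in select_for_job(catalog, "run1", scratch_limit_gb=15.0)])
```

The resulting list is what the job would hand to a transfer tool, so only the selected parts move to the compute site.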
Partnerships

•  Network partner FLR
o  Provide transport
o  Performance optimization with SDN and OpenFlow
o  Provide connection to Internet2 and XSEDEnet

•  Storage system vendor DDN
o  Provides hardware, system software, and expertise
o  Builds the extension racks

•  Software interfaces
o  Data transfer: Globus Online
SSERCA XDESE Storage Solution
[Diagram: SSERCA Storage Cloud connecting Florida State University, University of Florida, University of South Florida, University of Central Florida, Florida International University, and University of Miami; SSERCA end users statewide]
©2012 DataDirect Networks. All Rights Reserved.

ddn.com
XDESE Building Block
At each SSERCA site
Storage server
•  2.1 PB raw
WOS6000 Cabinet
WOS6000 storage server
•  12 drawers
•  180 TB per drawer (2 nodes)
•  2.1 PB raw capacity
•  Policy based data protection
o  Ranges from 100% to 20%
o  Replication 100% overhead
o  RAID-like encoding 20% overhead
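The cabinet figures can be checked with a little arithmetic. The sketch below takes the drawer count and per-drawer size from the slide; the usable-capacity formula is an assumption, reading "overhead" as extra space relative to user data, and the function names are illustrative.

```python
DRAWERS_PER_CABINET = 12   # from the slide
TB_PER_DRAWER = 180        # from the slide


def raw_capacity_tb(drawers=DRAWERS_PER_CABINET, tb_per_drawer=TB_PER_DRAWER):
    """Raw capacity of one cabinet in TB."""
    return drawers * tb_per_drawer


def usable_capacity_tb(raw_tb, overhead_percent):
    # Assumption: overhead is extra space relative to user data,
    # so raw = usable * (1 + overhead_percent / 100).
    return raw_tb * 100 / (100 + overhead_percent)


raw = raw_capacity_tb()
print(raw)                            # 2160 TB, quoted on the slide as 2.1 PB
print(usable_capacity_tb(raw, 100))   # 1080.0 TB with replication
print(usable_capacity_tb(raw, 20))    # 1800.0 TB with RAID-like encoding
```

Under this reading, the choice of protection policy moves the usable space of one cabinet between roughly 1.1 PB and 1.8 PB.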
Resource Details

•  Primary data interface to the web
o  WOScloud (Dropbox-like, REST over SSL, OAuth)
o  WOSshare (Amazon S3-like, S3 = Simple Storage Service, REST interface, BitTorrent)

•  Generic server for Globus Online transfers
o  DDN customization needed for optimal speed
o  Initially simple NFS client via WOSaccess

•  Interface to SSERCA campus HPCs
o  Grid/ExaScaler to stage to GPFS/Lustre
o  Later read via NFS
Hardware Architecture

•  At the 6 SSERCA sites
o  Object Storage at 6 sites
o  Web server with data control panel

•  Data transfer mechanisms over FLR, XSEDEnet and Internet2
•  Extension racks at other locations
o  Object storage
o  Network infrastructure OpenFlow capable
o  Provide multiple data path options to local campus resources like NFS and CIFS access
o  Optional: compute resources with scratch storage
XDESE: Extending and
Complementing XSEDE Storage
XDESE offers

•  Easy user interface
•  Composability of data flows and workflows
•  Multiple authentication domains
•  Ability to easily share data
•  Easy ingestion of instrument data

User focused!
XDESE Storage and XSEDE
Compute

•  Full integration with XSEDE compute resources
•  Easy data transfer is part of data and workflow
•  The extensibility includes the option to..
o  install WOS GS bridge gateway
o  at XSEDE compute site(s)
o  for improved performance
o  works like Hierarchical File System
Authentication Interface

•  To be successful, compatibility with multiple campus systems is also required
o  Need to design a simple system
o  Must allow users to manage multiple identities easily
§  XSEDE, XDESE, local campus, Google Drive, Dropbox, Amazon S3, etc
§  Globus Online supports transfer across authentication domains
§  other tools like BitTorrent play a role too
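Managing multiple identities can be pictured as one per-user map from authentication domain to the identity used there. The sketch below is purely illustrative: the IdentityWallet class and its methods are hypothetical and do not represent any XSEDE, XDESE, or Globus Online interface.

```python
class IdentityWallet:
    """Hypothetical per-user map from authentication domain to identity."""

    def __init__(self):
        self._identities = {}

    def link(self, domain, username):
        # Record (or replace) the identity this user holds in one domain.
        self._identities[domain] = username

    def identity_for(self, domain):
        try:
            return self._identities[domain]
        except KeyError:
            raise LookupError(f"no identity linked for domain {domain!r}") from None

    def can_transfer(self, source_domain, target_domain):
        # A cross-domain transfer needs an identity on both ends.
        return source_domain in self._identities and target_domain in self._identities


# Illustrative identities only.
wallet = IdentityWallet()
wallet.link("XSEDE", "alice_x")
wallet.link("campus", "alice@campus.example")
print(wallet.can_transfer("campus", "XSEDE"))
print(wallet.can_transfer("campus", "Dropbox"))
```

Keeping the identities in one place is what lets a tool move data between domains without asking the user to log in to each one separately every time.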
Performance and Innovation for
Science & Engineering Applications

•  Performance, scalability, extensibility, sustainability
•  Describe the general use case
o  Example from humanities and social sciences
•  Select some strong science, engineering application(s)
•  Innovation: explore archival strategies
Sustainable and Extensible

•  Distributed from inception
o  Basic functionalities will be tested and supported
•  Extensible simply by adding an XDESE rack
o  Like NSF funded GENI project and GENI racks
o  Multiple vendors can supply the racks
o  Learn once from XDESE, apply everywhere
o  Path for even the smallest institutions
§  leverage NSF funded resource and get started quickly
§  single faculty can start working with XDESE
Use case: Generic researcher
Alice works on a project that involves..

•  data from an instrument and
•  more data generated by analysis and modeling
Use Case: Setup

•  Alice gets an XDESE allocation
•  She arranges data to flow to the storage from the instrument
o  If the data flow demands it, she can set up a staging rack (needs funds) with specs and support from XDESE
Use Case: Data and Workflow

•  With the XDESE data & work control station
o  Looks like Galaxy https://main.g2.bx.psu.edu/
•  She controls data and workflow
o  Orchestrates data movement
o  Get all data in the right place
o  Right place is where the software and compute capability is at
XSEDE resources or on campus

•  Tools execute the movement
o  Globus Online, etc.
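Orchestrating data movement can be sketched as turning a list of workflow steps into a list of moves, so that each data set is where its step runs. The Step type, plan_moves, and the location names below are hypothetical; the actual transfer would be left to a tool such as Globus Online.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Hypothetical workflow step: a data set and where it must be processed."""
    dataset: str
    needs_compute_at: str   # where the software and compute capability is


def plan_moves(steps, current_location):
    """Return (dataset, source, destination) moves, in step order."""
    location = dict(current_location)  # copy so the caller's mapping is untouched
    moves = []
    for step in steps:
        source = location[step.dataset]
        if source != step.needs_compute_at:
            moves.append((step.dataset, source, step.needs_compute_at))
            location[step.dataset] = step.needs_compute_at
    return moves


# Illustrative workflow: analyse on Stampede, then visualise twice on campus.
steps = [Step("raw", "Stampede"), Step("raw", "campus"), Step("raw", "campus")]
print(plan_moves(steps, {"raw": "XDESE-FIU"}))
```

The third step produces no move because the data is already on campus, which is the "right place" for it.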
Use Case: Results

•  The results can be viewed with tools from the location specified in the flow
•  Collaborators can get accounts and access to her allocation
•  Multiple ways to access the data are available
•  Further visualization and other processing can easily be orchestrated
Use Case: Lifecycle Management

•  She can prepare the data for long-term sharing
•  Tools for creating metadata are provided
o  Rules for lifecycle management can be set up, e.g. iRODS interface
o  Data can be annotated and recorded, e.g. Dataverse Network

•  Transition data to compatible systems
o  Campus libraries
o  Discipline-specific societies
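A lifecycle rule can be as simple as mapping how long data has been idle to one of the stages named earlier (initial, maturing, archiving). The sketch below is an assumption for illustration: the thresholds and the function lifecycle_stage are made up, and it is not iRODS rule syntax.

```python
from datetime import date


def lifecycle_stage(last_modified, today, maturing_after_days=90, archiving_after_days=365):
    """Classify a data set by idle time; both thresholds are illustrative defaults."""
    idle_days = (today - last_modified).days
    if idle_days >= archiving_after_days:
        return "archiving"
    if idle_days >= maturing_after_days:
        return "maturing"
    return "initial"


print(lifecycle_stage(date(2014, 1, 1), date(2014, 1, 31)))
print(lifecycle_stage(date(2014, 1, 1), date(2014, 6, 1)))
print(lifecycle_stage(date(2014, 1, 1), date(2015, 6, 1)))
```

A rule like this could decide when data becomes a candidate for transition to a campus library or a discipline-specific repository.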
Innovation: Archival Strategies

•  Proposed Architecture
o  XDESE provides an efficient path for exploration of options
o  Institutions and libraries can buy an XDESE rack
§  dedicated to archival storage
§  data transfer in and out is supported
§  establish criteria for users to deposit data
§  e.g. pass a data quality test of sufficient metadata
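A "data quality test of sufficient metadata" might be a check that every required field is present and non-empty before a deposit is accepted. The field list and function names below are assumptions chosen for illustration; each archive would define its own criteria.

```python
# Assumed required fields; a real archive would set its own list.
REQUIRED_FIELDS = ("title", "creator", "description", "date", "license")


def missing_metadata(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in the record."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def passes_quality_test(record, required=REQUIRED_FIELDS):
    # A deposit passes only when no required field is missing.
    return not missing_metadata(record, required)


# Illustrative record only.
record = {"title": "Instrument run 42", "creator": "Alice", "description": " ", "date": "2014-03-01"}
print(missing_metadata(record))
print(passes_quality_test(record))
```

Returning the list of missing fields, not just a pass or fail, tells the depositor what to fix before trying again.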
Thank You

More Related Content

What's hot

Hadoop file system
Hadoop file systemHadoop file system
Hadoop file systemJohn Veigas
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File SystemVaibhav Jain
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
 
Hadoop distributed file system
Hadoop distributed file systemHadoop distributed file system
Hadoop distributed file systemAnshul Bhatnagar
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfsshrey mehrotra
 
CYBER INFRASTRUCTURE AS A SERVICE TO EMPOWER MULTIDISCIPLINARY, DATA-DRIVEN S...
CYBER INFRASTRUCTURE AS A SERVICE TO EMPOWER MULTIDISCIPLINARY, DATA-DRIVEN S...CYBER INFRASTRUCTURE AS A SERVICE TO EMPOWER MULTIDISCIPLINARY, DATA-DRIVEN S...
CYBER INFRASTRUCTURE AS A SERVICE TO EMPOWER MULTIDISCIPLINARY, DATA-DRIVEN S...ijcsit
 
Ravi Namboori Hadoop & HDFS Architecture
Ravi Namboori Hadoop & HDFS ArchitectureRavi Namboori Hadoop & HDFS Architecture
Ravi Namboori Hadoop & HDFS ArchitectureRavi namboori
 
IRJET- A Novel Approach to Process Small HDFS Files with Apache Spark
IRJET- A Novel Approach to Process Small HDFS Files with Apache SparkIRJET- A Novel Approach to Process Small HDFS Files with Apache Spark
IRJET- A Novel Approach to Process Small HDFS Files with Apache SparkIRJET Journal
 
Productionizing Hadoop: 7 Architectural Best Practices
Productionizing Hadoop: 7 Architectural Best PracticesProductionizing Hadoop: 7 Architectural Best Practices
Productionizing Hadoop: 7 Architectural Best PracticesMapR Technologies
 
Cloud File System with GFS and HDFS
Cloud File System with GFS and HDFS  Cloud File System with GFS and HDFS
Cloud File System with GFS and HDFS Dr Neelesh Jain
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amirydatastack
 
Survey of clustered_parallel_file_systems_004_lanl.ppt
Survey of clustered_parallel_file_systems_004_lanl.pptSurvey of clustered_parallel_file_systems_004_lanl.ppt
Survey of clustered_parallel_file_systems_004_lanl.pptRohn Wood
 

What's hot (20)

Hadoop hdfs
Hadoop hdfsHadoop hdfs
Hadoop hdfs
 
Hadoop file system
Hadoop file systemHadoop file system
Hadoop file system
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Hadoop distributed file system
Hadoop distributed file systemHadoop distributed file system
Hadoop distributed file system
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfs
 
Hdfs
HdfsHdfs
Hdfs
 
CYBER INFRASTRUCTURE AS A SERVICE TO EMPOWER MULTIDISCIPLINARY, DATA-DRIVEN S...
CYBER INFRASTRUCTURE AS A SERVICE TO EMPOWER MULTIDISCIPLINARY, DATA-DRIVEN S...CYBER INFRASTRUCTURE AS A SERVICE TO EMPOWER MULTIDISCIPLINARY, DATA-DRIVEN S...
CYBER INFRASTRUCTURE AS A SERVICE TO EMPOWER MULTIDISCIPLINARY, DATA-DRIVEN S...
 
Hdfs design
Hdfs designHdfs design
Hdfs design
 
Ravi Namboori Hadoop & HDFS Architecture
Ravi Namboori Hadoop & HDFS ArchitectureRavi Namboori Hadoop & HDFS Architecture
Ravi Namboori Hadoop & HDFS Architecture
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
IRJET- A Novel Approach to Process Small HDFS Files with Apache Spark
IRJET- A Novel Approach to Process Small HDFS Files with Apache SparkIRJET- A Novel Approach to Process Small HDFS Files with Apache Spark
IRJET- A Novel Approach to Process Small HDFS Files with Apache Spark
 
Hadoop distributed file system
Hadoop distributed file systemHadoop distributed file system
Hadoop distributed file system
 
Productionizing Hadoop: 7 Architectural Best Practices
Productionizing Hadoop: 7 Architectural Best PracticesProductionizing Hadoop: 7 Architectural Best Practices
Productionizing Hadoop: 7 Architectural Best Practices
 
Inroduction to Dspace
Inroduction to DspaceInroduction to Dspace
Inroduction to Dspace
 
Metadata in EOSDIS
Metadata in EOSDISMetadata in EOSDIS
Metadata in EOSDIS
 
Hadoop HDFS
Hadoop HDFSHadoop HDFS
Hadoop HDFS
 
Cloud File System with GFS and HDFS
Cloud File System with GFS and HDFS  Cloud File System with GFS and HDFS
Cloud File System with GFS and HDFS
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiry
 
Survey of clustered_parallel_file_systems_004_lanl.ppt
Survey of clustered_parallel_file_systems_004_lanl.pptSurvey of clustered_parallel_file_systems_004_lanl.ppt
Survey of clustered_parallel_file_systems_004_lanl.ppt
 

Similar to Building and Extensible Storage Ecosystem with WOS

Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWSExperiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWSEd Dodds
 
Research methods group accelarating impact by sharing data
Research methods group  accelarating impact by sharing dataResearch methods group  accelarating impact by sharing data
Research methods group accelarating impact by sharing dataWorld Agroforestry (ICRAF)
 
Jetstream: Adding Cloud-based Computing to the National Cyberinfrastructure
Jetstream: Adding Cloud-based Computing to the National CyberinfrastructureJetstream: Adding Cloud-based Computing to the National Cyberinfrastructure
Jetstream: Adding Cloud-based Computing to the National CyberinfrastructureMatthew Vaughn
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiridatastack
 
Dataverse Netowrk Project
Dataverse Netowrk ProjectDataverse Netowrk Project
Dataverse Netowrk ProjectJulie Goldman
 
Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...vty
 
Globus for Data Management: 2014 Joint Facility User Forum
Globus for Data Management: 2014 Joint Facility User ForumGlobus for Data Management: 2014 Joint Facility User Forum
Globus for Data Management: 2014 Joint Facility User ForumGlobus
 
XSEDE National Cyberinfrastructure, NIST, and Supporting NCSI Objectives
XSEDE National Cyberinfrastructure, NIST, and Supporting NCSI Objectives XSEDE National Cyberinfrastructure, NIST, and Supporting NCSI Objectives
XSEDE National Cyberinfrastructure, NIST, and Supporting NCSI Objectives John Towns
 
Technical integration of data repositories status and challenges
Technical integration of data repositories status and challengesTechnical integration of data repositories status and challenges
Technical integration of data repositories status and challengesvty
 
Big data and cloud computing 9 sep-2017
Big data and cloud computing 9 sep-2017Big data and cloud computing 9 sep-2017
Big data and cloud computing 9 sep-2017Dr. Anita Goel
 
Jetstream - Adding Cloud-based Computing to the National Cyberinfrastructure
Jetstream - Adding Cloud-based Computing to the National CyberinfrastructureJetstream - Adding Cloud-based Computing to the National Cyberinfrastructure
Jetstream - Adding Cloud-based Computing to the National Cyberinfrastructureinside-BigData.com
 
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Ian Foster
 
Duraspace Hot Topics Series 6: Metadata and Repository Services
Duraspace Hot Topics Series 6: Metadata and Repository ServicesDuraspace Hot Topics Series 6: Metadata and Repository Services
Duraspace Hot Topics Series 6: Metadata and Repository ServicesMatthew Critchlow
 
Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs vty
 
Integrating an electronic lab notebook with a data repository; American Chemi...
Integrating an electronic lab notebook with a data repository; American Chemi...Integrating an electronic lab notebook with a data repository; American Chemi...
Integrating an electronic lab notebook with a data repository; American Chemi...rmacneil88
 
Elns and repositories, American Chemical Society, Dallas, March 2014
Elns and repositories, American Chemical Society, Dallas, March 2014Elns and repositories, American Chemical Society, Dallas, March 2014
Elns and repositories, American Chemical Society, Dallas, March 2014ResearchSpace
 
Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...David Wallom
 
DataFest 2019 Science Gateways
DataFest 2019 Science GatewaysDataFest 2019 Science Gateways
DataFest 2019 Science GatewaysRaminder Singh
 

Similar to Building and Extensible Storage Ecosystem with WOS (20)

DataShare for UC Campuses
DataShare for UC CampusesDataShare for UC Campuses
DataShare for UC Campuses
 
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWSExperiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
 
Research methods group accelarating impact by sharing data
Research methods group  accelarating impact by sharing dataResearch methods group  accelarating impact by sharing data
Research methods group accelarating impact by sharing data
 
Jetstream: Adding Cloud-based Computing to the National Cyberinfrastructure
Jetstream: Adding Cloud-based Computing to the National CyberinfrastructureJetstream: Adding Cloud-based Computing to the National Cyberinfrastructure
Jetstream: Adding Cloud-based Computing to the National Cyberinfrastructure
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiri
 
Dataverse Netowrk Project
Dataverse Netowrk ProjectDataverse Netowrk Project
Dataverse Netowrk Project
 
Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...
 
Globus for Data Management: 2014 Joint Facility User Forum
Globus for Data Management: 2014 Joint Facility User ForumGlobus for Data Management: 2014 Joint Facility User Forum
Globus for Data Management: 2014 Joint Facility User Forum
 
XSEDE National Cyberinfrastructure, NIST, and Supporting NCSI Objectives
XSEDE National Cyberinfrastructure, NIST, and Supporting NCSI Objectives XSEDE National Cyberinfrastructure, NIST, and Supporting NCSI Objectives
XSEDE National Cyberinfrastructure, NIST, and Supporting NCSI Objectives
 
Technical integration of data repositories status and challenges
Technical integration of data repositories status and challengesTechnical integration of data repositories status and challenges
Technical integration of data repositories status and challenges
 
Big data and cloud computing 9 sep-2017
Big data and cloud computing 9 sep-2017Big data and cloud computing 9 sep-2017
Big data and cloud computing 9 sep-2017
 
Jetstream - Adding Cloud-based Computing to the National Cyberinfrastructure
Jetstream - Adding Cloud-based Computing to the National CyberinfrastructureJetstream - Adding Cloud-based Computing to the National Cyberinfrastructure
Jetstream - Adding Cloud-based Computing to the National Cyberinfrastructure
 
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
 
Duraspace Hot Topics Series 6: Metadata and Repository Services
Duraspace Hot Topics Series 6: Metadata and Repository ServicesDuraspace Hot Topics Series 6: Metadata and Repository Services
Duraspace Hot Topics Series 6: Metadata and Repository Services
 
Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs
 
Integrating an electronic lab notebook with a data repository; American Chemi...
Integrating an electronic lab notebook with a data repository; American Chemi...Integrating an electronic lab notebook with a data repository; American Chemi...
Integrating an electronic lab notebook with a data repository; American Chemi...
 
Elns and repositories, American Chemical Society, Dallas, March 2014
Elns and repositories, American Chemical Society, Dallas, March 2014Elns and repositories, American Chemical Society, Dallas, March 2014
Elns and repositories, American Chemical Society, Dallas, March 2014
 
Introduction to Research Data Management
Introduction to Research Data ManagementIntroduction to Research Data Management
Introduction to Research Data Management
 
Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...
 
DataFest 2019 Science Gateways
DataFest 2019 Science GatewaysDataFest 2019 Science Gateways
DataFest 2019 Science Gateways
 

More from inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networksinside-BigData.com
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...inside-BigData.com
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...inside-BigData.com
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...inside-BigData.com
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networksinside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoringinside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecastsinside-BigData.com
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Updateinside-BigData.com
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuninginside-BigData.com
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODinside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Accelerationinside-BigData.com
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficientlyinside-BigData.com
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Erainside-BigData.com
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Clusterinside-BigData.com
 

More from inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...



Building an Extensible Storage Ecosystem with WOS

  • 1. Building an Extensible Storage Ecosystem with WOS Dr. Erik Deumens SSERCA SC’13 DDN User Meeting
  • 2. SSERCA
      •  Sunshine State Education & Research Computing Alliance
          o  Members: FIU, FSU, UCF, UF, UM, USF
          o  Affiliates: FAMU, FAU, FIT, UNF
          o  Glue: Florida LambdaRail regional network provider
      •  Enable and enhance
          o  collaborative research
          o  for faculty and their teams in the state
      •  Making them more competitive
          o  by providing advanced cyber infrastructure
  • 3. Proposal Vision and Overview The researchers and their collaborations are the central focus driving all design aspects of the proposed extensible storage environment.
  • 4. Intellectual Merit
      •  Address the need of working researchers head-on
      •  Not centered on some hardware or software design
      •  Naturally extensible
      •  Intrinsically sustainable
      •  Inclusive of new approaches
  • 5. Broader Impacts
      •  Open to all communities
      •  Provide a framework to explore and broaden a data centric research environment
      •  Provide long-term roadmap to address archival storage and transitioning data to it
      •  Link campus and NSF XSEDE resources in flexible way (eXtreme Science and Engineering Discovery Environment)
  • 6. Project Vision
      •  What challenges are addressed?
      •  What will the proposed project build with NSF funding?
      •  How are XSEDE resources leveraged?
      •  Features of the architecture
          o  Sustainable
          o  Extensible
          o  Flexible and adaptable
      •  What can others build leveraging this NSF funded project?
  • 7. Challenges for Storage Providers
      •  Multiple sources, multiple sizes of data
          o  Instrument data
          o  Spreadsheets
      •  Multiple places to store data
          o  Campus systems
          o  Cloud systems (Google Drive, Dropbox, etc)
      •  Multiple actions and timescales in data life
          o  Analysis - compute and data intensive
          o  Distribution - web site accessibility
              §  general and restricted
          o  Life cycle management - initial, maturing, archiving
  • 8. Principles
      Create:
      •  Effective environment for researchers….
      •  to work collaboratively
      •  with complex workflows
      •  Involving large and small data
      We propose to bring the essence and simplicity of cloud infrastructure to research: Interactivity and instant gratification.
      Think of something, and start doing it!
  • 9. Proposal: XDESE
      The eXtreme Digital Extensible Storage Ecosystem - XDESE
      •  Ecosystem is more complex than environment
      •  NSF funded and supported core
          o  Distributed by design
          o  Multi-access, multi-protocol, multi-owner
          o  Leverage XSEDE resources
          o  XRAC allocation process adapted for data
              §  defined quota for defined time span
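The "defined quota for defined time span" idea can be pictured as a small record that is checked before data is stored. The following is only a minimal sketch under stated assumptions: the class `StorageAllocation`, its fields, and the example project name and dates are hypothetical and do not come from XDESE or the XRAC process.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class StorageAllocation:
    """Hypothetical data allocation: a quota that is valid for a time span."""
    project: str
    quota_tb: float   # defined quota, in terabytes
    start: date       # first day the allocation is valid
    end: date         # last day the allocation is valid

    def is_active(self, today: date) -> bool:
        # The allocation only counts inside its defined time span.
        return self.start <= today <= self.end

    def can_store(self, used_tb: float, request_tb: float, today: date) -> bool:
        # A request fits if the allocation is active and stays within quota.
        return self.is_active(today) and used_tb + request_tb <= self.quota_tb


# Illustrative values only.
alloc = StorageAllocation("alice-instrument-data", 50.0,
                          date(2014, 1, 1), date(2014, 12, 31))
print(alloc.can_store(used_tb=40.0, request_tb=5.0, today=date(2014, 6, 1)))
```

Keeping the time span in the same record as the quota means an expired allocation is rejected by the same check that rejects an over-quota request.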
  • 10. Storage Architecture
      [Diagram labels: XSEDE – XRAC Authentication Authorization; XDESE - FIU Data Gateway; XDESE - UCF Data Gateway; Data Replication; Internet; XSEDE resources: Stampede, Kraken; Researcher]
  • 11. Proposal: XDESE (2)
      •  Extensible with other funding
          o  Geographically: campus and regional add-ons
              §  plug and play racks
          o  Organizationally: multiple communities
              §  astrophysics, religion, archeology, ...
          o  Functionally: add new protocols and formats
          o  Public data: NSF funded
          o  Restricted data: funded from other sources
          o  Archival data and data repositories
  • 12. XDESE Extension Architecture
      •  Basic concept
          o  WOScore storage system at remote location
          o  WOScore provides
              §  data replication and motion
              §  policy and demand based
          o  Add WOSaccess gateway to provide local
              §  CIFS (personal) and NFS (organizational)
          o  Add WOS GS bridge gateway to provide local
              §  GPFS on GridScaler or Lustre on ExaScaler
  • 14. Leverage XSEDE Resources
      •  Users store and maintain data in XDESE
          o  Long term project data
          o  In support of collaboration
              §  meaning easy access to many people
              §  fine control over who can see and do what
          o  Not intended for temporary data
          o  XSEDE storage resources are suitable for that
  • 15. Leverage XSEDE Resources (2)
      •  Transfer data to XSEDE processors
          o  Stampede, Kraken, etc
          o  Bulk transfer
          o  Complex data flow including data selection
          o  XDESE will respond from multiple sites
              §  improved performance, reliability, flexibility
  • 16. Leverage XSEDE Resources (3)
      •  Option 1: data transfer to XSEDE scratch file system
          o  During computation on XSEDE systems
          o  Optimal performance is obtained
  • 17. Leverage XSEDE Resources (4)
      •  Option 2: XSEDE compute job controls data
          o  Program can control data selection..
          o  from the XDESE storage
          o  initiate transfer of selected parts to and from the storage
          o  XDESE storage (DDN WOScore) will optimize data location among distributed XDESE storage nodes
          o  use one of the extensions for further optimization
  • 18. Partnerships
      •  Network partner FLR
          o  Provide transport
          o  Performance optimization with SDN and OpenFlow
          o  Provide connection to Internet2 and XSEDEnet
      •  Storage system vendor DDN
          o  Provides hardware, system software, and expertise
          o  Builds the extension racks
      •  Software interfaces
          o  Data transfer: Globus Online
  • 19. SSERCA XDESE Storage Solution
      [Diagram labels: SSERCA End-Users State Wide; SSERCA Storage Cloud; Florida State University; University of Florida; University of South Florida; University of Central Florida; Florida International University; University of Miami]
      ©2012 DataDirect Networks. All Rights Reserved. ddn.com
  • 20. XDESE Building Block
      At each SSERCA site: Storage server
      •  2.1 PB raw
  • 21. WOS6000 Cabinet
      WOS6000 storage server
      •  12 drawers
      •  180 TB per drawer (2 nodes)
      •  2.1 PB raw capacity
      •  Policy based data protection
          •  Ranges from 100% to 20%
          •  Replication 100% overhead
          •  RAID-like encoding 20% overhead
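The cabinet figures above can be checked with a little arithmetic: 12 drawers at 180 TB each give 2160 TB, which the slide rounds to 2.1 PB raw. The sketch below also estimates usable capacity under the two protection policies. It assumes that "overhead" means extra space relative to the user data stored (so 100% overhead halves the usable capacity); that reading, and the function names, are assumptions rather than DDN definitions.

```python
DRAWERS_PER_CABINET = 12   # from the slide
TB_PER_DRAWER = 180        # from the slide


def raw_capacity_tb(drawers: int = DRAWERS_PER_CABINET,
                    tb_per_drawer: int = TB_PER_DRAWER) -> int:
    """Raw capacity of one cabinet in terabytes."""
    return drawers * tb_per_drawer


def usable_capacity_tb(raw_tb: float, overhead_percent: int) -> float:
    """Usable capacity, assuming overhead is measured relative to user data.

    With overhead p%, every 100 TB of user data occupies (100 + p) TB raw.
    """
    return raw_tb * 100 / (100 + overhead_percent)


raw = raw_capacity_tb()
print(f"raw capacity:            {raw} TB")
print(f"replication (100%):      {usable_capacity_tb(raw, 100):.0f} TB usable")
print(f"RAID-like encoding (20%): {usable_capacity_tb(raw, 20):.0f} TB usable")
```

Under this assumption the policy choice is a trade between roughly half and roughly five sixths of the raw capacity being available for user data.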
  • 22. Resource Details
      •  Primary data interface to the web
          o  WOScloud (Dropbox-like, REST over SSL, OAuth)
          o  WOSshare (Amazon S3-like, S3 = Simple Storage Service, REST interface, BitTorrent)
      •  Generic server for Globus Online transfers
          o  DDN customization needed for optimal speed
          o  Initially simple NFS client via WOSaccess
      •  Interface to SSERCA campus HPCs
          o  Grid/ExaScaler to stage to GPFS/Lustre
          o  Later read via NFS
  • 23. Hardware Architecture
      •  At the 6 SSERCA sites
          o  Object Storage at 6 sites
          o  Web server with data control panel
      •  Data transfer mechanisms over FLR, XSEDEnet and Internet2
      •  Extension racks at other locations
          o  Object storage
          o  Network infrastructure
          o  OpenFlow capable
          o  Provide multiple data path options to local campus resources like NFS and CIFS access
          o  Optional: compute resources with scratch storage
  • 24. XDESE: Extending and Complementing XSEDE Storage
      XDESE offers
      •  Easy user interface
      •  Composability of data flows and workflows
      •  Multiple authentication domains
      •  Ability to easily share data
      •  Easy ingestion of instrument data
      User focused!
  • 25. XDESE Storage and XSEDE Compute
      •  Full integration with XSEDE compute resources
      •  Easy data transfer is part of data and workflow
      •  The extensibility includes the option to..
          o  install WOS GS bridge gateway
          o  at XSEDE compute site(s)
          o  for improved performance
          o  works like Hierarchical File System
  • 26. Authentication Interface
      •  To be successful, compatibility with multiple campus systems is also required
          o  Need to design a simple system
          o  Must allow users to manage multiple identities easily
              §  XSEDE, XDESE, local campus, Google Drive, Dropbox, Amazon S3, etc
              §  Globus Online supports transfer across authentication domains
              §  other tools like BitTorrent play a role too
  • 27. Performance and Innovation for Science & Engineering Applications
      •  Performance, scalability, extensibility, sustainability
          o  Describe the general use case
      •  Example from humanities and social sciences
      •  Select some strong science, engineering application(s)
      •  Innovation: explore archival strategies
  • 28. Sustainable and Extensible
      •  Distributed from inception
          o  Basic functionalities will be tested and supported
      •  Extensible simply by adding an XDESE rack
          o  Like NSF funded GENI project and GENI racks
          o  Multiple vendors can supply the racks
          o  Learn once from XDESE, apply everywhere
          o  Path for even the smallest institutions
              §  leverage NSF funded resource and get started quickly
              §  single faculty can start working with XDESE
  • 29. Use case: Generic researcher
      Alice works on a project that involves..
      •  data from an instrument
      •  and more data generated by analysis and modeling
  • 30. Use Case: Setup
      •  Alice gets an XDESE allocation
      •  She arranges data to flow to the storage from the instrument
          o  If the data flow demands it, she can set up a staging rack (needs funds) with specs and support from XDESE
  • 31. Use Case: Data and Workflow
      •  With the XDESE data & work control station
          o  Looks like Galaxy https://main.g2.bx.psu.edu/
      •  She controls data and workflow
          o  Orchestrates data movement
          o  Get all data in the right place
          o  Right place is where the software and compute capability is at XSEDE resources or on campus
      •  Tools execute the movement
          o  Globus Online, etc.
  • 32. Use Case: Results
      •  The results can be viewed with tools from the location specified in the flow
      •  Collaborators can get accounts and access to her allocation
      •  Multiple ways to access the data are available
      •  Further visualization and other processing can easily be orchestrated
  • 33. Use Case: Lifecycle Management
      •  She can prepare the data for long-term sharing
      •  Tools for creating metadata are provided
          o  Rules for lifecycle management can be set up, e.g. iRODS interface
          o  Data can be annotated and recorded, e.g. Dataverse Network
      •  Transition data to compatible systems
          o  Campus libraries
          o  Discipline-specific societies
  • 34. Innovation: Archival Strategies
      •  Proposed Architecture
          o  XDESE provides an efficient path for exploration of options
          o  Institutions and libraries can buy an XDESE rack
              §  dedicated to archival storage
              §  data transfer in and out is supported
              §  establish criteria for users to deposit data
                  •  e.g. pass a data quality test of sufficient metadata
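A deposit criterion such as "pass a data quality test of sufficient metadata" could be as simple as checking that a record carries a set of required fields. The sketch below is one possible illustration: the field names in `REQUIRED_FIELDS` and both function names are hypothetical, and the slides do not say which metadata an archive would actually require.

```python
# Hypothetical list of fields an archive might require before accepting data.
REQUIRED_FIELDS = ("title", "creator", "date", "description")


def missing_metadata(record: dict, required=REQUIRED_FIELDS) -> list:
    """Return the required fields that are absent or blank in the record."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def passes_deposit_test(record: dict, required=REQUIRED_FIELDS) -> bool:
    """A record passes when no required field is missing."""
    return not missing_metadata(record, required)


# Illustrative record only.
record = {"title": "Instrument run 42", "creator": "Alice", "date": "2013-11-18"}
print(passes_deposit_test(record))
print(missing_metadata(record))
```

Returning the list of missing fields, rather than only a yes or no, lets a depositor see what to fix before trying again.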