Building an Extensible Storage Ecosystem with WOS
Dr. Erik Deumens
SSERCA
SC’13 DDN User Meeting
SSERCA

•  Sunshine State Education & Research Computing Alliance

o  Members: FIU, FSU, UCF, UF, UM, USF
o  Affiliates: FAMU, FAU, FIT, UNF
o  Glue: Florida LambdaRail regional network provider

•  Enable and enhance

o  collaborative research
o  for faculty and their teams in the state

•  Making them more competitive

o  by providing advanced cyber infrastructure
Proposal Vision and Overview
The researchers and their collaborations are
the central focus driving all design aspects of
the proposed extensible storage
environment.
Intellectual Merit

•  Address the need of working researchers head-on
•  Not centered on some hardware or software design
•  Naturally extensible
•  Intrinsically sustainable
•  Inclusive of new approaches
Broader Impacts

•  Open to all communities
•  Provide a framework to explore and broaden a data-centric research environment
•  Provide a long-term roadmap to address archival storage and transitioning data to it
•  Link campus and NSF XSEDE (eXtreme Science and Engineering Discovery Environment) resources in a flexible way
Project Vision

•  What challenges are addressed?
•  What will the proposed project build with NSF funding?
•  How are XSEDE resources leveraged?
•  Features of the architecture

o  Sustainable
o  Extensible
o  Flexible and adaptable

•  What can others build leveraging this NSF funded project?
Challenges for Storage Providers

•  Multiple sources, multiple sizes of data

o  Instrument data
o  Spreadsheets

•  Multiple places to store data

o  Campus systems
o  Cloud systems (Google Drive, Dropbox, etc.)

•  Multiple actions and timescales in data life

o  Analysis - compute and data intensive
o  Distribution - web site accessibility
§  general and restricted
o  Life cycle management - initial, maturing, archiving
Principles
Create:

•  Effective environment for researchers…
•  to work collaboratively
•  with complex workflows
•  Involving large and small data

We propose to bring the essence and simplicity of cloud
infrastructure to research:
Interactivity and instant gratification.
Think of something, and start doing it!
Proposal: XDESE
The eXtreme Digital Extensible Storage Ecosystem - XDESE

•  Ecosystem is more complex than environment
•  NSF funded and supported core

o  Distributed by design
o  Multi-access, multi-protocol, multi-owner
o  Leverage XSEDE resources
o  XRAC allocation process adapted for data
§  defined quota for defined time span
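The allocation idea on this slide (a defined quota for a defined time span) can be sketched as a small data structure. This is only an illustration: the class name, field names, and example numbers are assumptions and are not part of XDESE or the XRAC process.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataAllocation:
    """Hypothetical storage allocation: a quota valid for a fixed time span."""
    project: str
    quota_tb: float      # defined quota, in terabytes
    start: date          # first day the allocation is valid
    end: date            # last day the allocation is valid
    used_tb: float = 0.0

    def is_active(self, today: date) -> bool:
        return self.start <= today <= self.end

    def can_store(self, size_tb: float, today: date) -> bool:
        # A write is allowed only inside the time span and within the quota.
        return self.is_active(today) and self.used_tb + size_tb <= self.quota_tb

    def store(self, size_tb: float, today: date) -> None:
        if not self.can_store(size_tb, today):
            raise ValueError("allocation expired or quota exceeded")
        self.used_tb += size_tb


# Example values are made up for illustration.
alloc = DataAllocation("alice-instrument", quota_tb=50.0,
                       start=date(2014, 1, 1), end=date(2014, 12, 31))
alloc.store(20.0, date(2014, 3, 1))
print(alloc.can_store(40.0, date(2014, 3, 2)))  # request would exceed the quota
print(alloc.can_store(10.0, date(2015, 1, 1)))  # request falls after the time span
```

Keeping the time span on the allocation itself means an expired allocation rejects new writes without any separate cleanup step.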
Storage Architecture
[Diagram. Labels: XSEDE – XRAC (Authentication, Authorization); XDESE - FIU (Data Gateway); XDESE - UCF (Data Gateway); Internet; Data Replication; XSEDE resources: Stampede, Kraken; Researcher]
Proposal: XDESE (2)

•  Extensible with other funding
o  Geographically: campus and regional add-ons
§  plug and play racks
o  Organizationally: multiple communities
§  astrophysics, religion, archeology, ...
o  Functionally: add new protocols and formats
o  Public data: NSF funded
o  Restricted data: funded from other sources
o  Archival data and data repositories
XDESE Extension Architecture

•  Basic concept

o  WOScore storage system at remote location
o  WOScore provides
§  data replication and motion
§  policy and demand based
o  Add WOSaccess gateway to provide local
§  CIFS (personal) and NFS (organizational)
o  Add WOS GS bridge gateway to provide local
§  GPFS on GridScaler or Lustre on ExaScaler
Extension Architecture
[Diagram. Labels: SSERCA XDESE; Internet; WOS GS Bridge (two instances); Campus WOS Access; XSEDE Stampede; HPC; campus net NFS/CIFS]
Leverage XSEDE Resources

•  Users store and maintain data in XDESE

o  Long term project data
o  In support of collaboration
§  meaning easy access to many people
§  fine control over who can see and do what
o  Not intended for temporary data
o  XSEDE storage resources are suitable for that
Leverage XSEDE Resources (2)

•  Transfer data to XSEDE processors

o  Stampede, Kraken, etc.
o  Bulk transfer
o  Complex data flow including data selection
o  XDESE will respond from multiple sites
§  improved performance, reliability, flexibility
Leverage XSEDE Resources (3)

•  Option 1: data transfer to XSEDE scratch file system

o  During computation on XSEDE systems
o  Optimal performance is obtained
Leverage XSEDE Resources (4)

•  Option 2: XSEDE compute job controls data

o  Program can control data selection
§  from the XDESE storage
§  initiate transfer of selected parts to and from XDESE
o  XDESE storage (DDN WOScore) will optimize data location among distributed XDESE storage nodes
§  use one of the extensions for further optimization
Partnerships

•  Network partner FLR

o  Provide transport
o  Performance optimization with SDN and OpenFlow
o  Provide connection to Internet2 and XSEDEnet

•  Storage system vendor DDN

o  Provides hardware, system software, and expertise
o  Builds the extension racks

•  Software interfaces

o  Data transfer: Globus Online
SSERCA XDESE Storage Solution
[Diagram. Labels: SSERCA Storage Cloud; Florida State University; University of Florida; University of South Florida; University of Central Florida; Florida International University; University of Miami; SSERCA End-Users State Wide]

©2012 DataDirect Networks. All Rights Reserved.

ddn.com
XDESE Building Block
At each SSERCA site
Storage server
•  2.1 PB raw
WOS6000 Cabinet
WOS6000 storage server
•  12 drawers
•  180 TB per drawer (2 nodes)
•  2.1 PB raw capacity
•  Policy based data protection

o  Ranges from 100% to 20%
o  Replication 100% overhead
o  RAID-like encoding 20% overhead
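The cabinet figures can be checked with a little arithmetic. The sketch below multiplies drawers by capacity per drawer, then estimates usable capacity under the two protection policies. It assumes that "overhead" is measured relative to the user data stored (so raw = usable × (1 + overhead)); the slides do not state the definition, and the function name is illustrative.

```python
DRAWERS = 12
TB_PER_DRAWER = 180

raw_tb = DRAWERS * TB_PER_DRAWER  # 2160 TB, quoted on the slide as 2.1 PB raw


def usable_tb(raw: float, overhead: float) -> float:
    """Usable capacity when protection adds `overhead` (a fraction of user data).

    Assumption: overhead is relative to the user data stored,
    so raw = usable * (1 + overhead).
    """
    return raw / (1.0 + overhead)


replicated = usable_tb(raw_tb, 1.00)  # 100% overhead: one extra copy of everything
encoded = usable_tb(raw_tb, 0.20)     # 20% overhead: RAID-like encoding

print(f"raw: {raw_tb} TB")
print(f"usable with replication: {replicated:.0f} TB")
print(f"usable with RAID-like encoding: {encoded:.0f} TB")
```

Under this assumption, the choice of policy changes usable capacity per cabinet by roughly a factor of 1.7, which is why the protection level is set by policy rather than fixed for the whole system.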
Resource Details

•  Primary data interface to the web
o  WOScloud (Dropbox-like, REST over SSL, OAuth)
o  WOSshare (Amazon S3-like; S3 = Simple Storage Service; REST interface, BitTorrent)

•  Generic server for Globus Online transfers
o  DDN customization needed for optimal speed
o  Initially simple NFS client via WOSaccess

•  Interface to SSERCA campus HPCs

o  Grid/ExaScaler to stage to GPFS/Lustre
o  Later read via NFS
Hardware Architecture

•  At the 6 SSERCA sites

o  Object Storage at 6 sites
o  Web server with data control panel

•  Data transfer mechanisms over FLR, XSEDEnet, and Internet2
•  Extension racks at other locations

o  Object storage
o  Network infrastructure OpenFlow capable
o  Provide multiple data path options to local campus resources like NFS and CIFS access
o  Optional: compute resources with scratch storage
XDESE: Extending and Complementing XSEDE Storage
XDESE offers

•  Easy user interface
•  Composability of data flows and workflows
•  Multiple authentication domains
•  Ability to easily share data
•  Easy ingestion of instrument data

User focused!
XDESE Storage and XSEDE Compute

•  Full integration with XSEDE compute resources
•  Easy data transfer is part of data and workflow
•  The extensibility includes the option to

o  install WOS GS bridge gateway
o  at XSEDE compute site(s)
o  for improved performance
o  works like a hierarchical file system
Authentication Interface

•  To be successful, compatibility with multiple campus systems is also required

o  Need to design a simple system
o  Must allow users to manage multiple identities easily
§  XSEDE, XDESE, local campus, Google Drive, Dropbox, Amazon S3, etc.
§  Globus Online supports transfer across authentication domains
§  other tools like BitTorrent play a role too
Performance and Innovation for
Science & Engineering Applications

•  Performance, scalability, extensibility, sustainability
•  Describe the general use case

o  Example from humanities and social sciences

•  Select some strong science, engineering application(s)
•  Innovation: explore archival strategies
Sustainable and Extensible

•  Distributed from inception

o  Basic functionalities will be tested and supported

•  Extensible simply by adding an XDESE rack

o  Like NSF funded GENI project and GENI racks
o  Multiple vendors can supply the racks
o  Learn once from XDESE, apply everywhere
o  Path for even the smallest institutions
§  leverage NSF funded resource and get started quickly
§  single faculty can start working with XDESE
Use case: Generic researcher
Alice works on a project that involves

•  data from an instrument and
•  more data generated by analysis and modeling
Use Case: Setup

•  Alice gets an XDESE allocation
•  She arranges data to flow to the storage from the instrument

o  If the data flow demands it, she can set up a staging rack (needs funds) with specs and support from XDESE
Use Case: Data and Workflow

•  With the XDESE data & work control station
o  Looks like Galaxy https://main.g2.bx.psu.edu/
•  She controls data and workflow
o  Orchestrates data movement
o  Gets all data in the right place
o  The right place is where the software and compute capability are: at XSEDE resources or on campus

•  Tools execute the movement

o  Globus Online, etc.
Use Case: Results

•  The results can be viewed with tools from the location specified in the flow
•  Collaborators can get accounts and access to her allocation
•  Multiple ways to access the data are available
•  Further visualization and other processing can easily be orchestrated
Use Case: Lifecycle Management

•  She can prepare the data for long-term sharing
•  Tools for creating metadata are provided

o  Rules for lifecycle management can be set up, e.g. iRODS interface
o  Data can be annotated and recorded, e.g. Dataverse Network

•  Transition data to compatible systems

o  Campus libraries
o  Discipline-specific societies
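A lifecycle rule of the kind mentioned above can be sketched as a table of age thresholds mapped to stages. The stage names (initial, maturing, archiving) come from the earlier slide on data life; the thresholds, function name, and data set names are made up for illustration. This is plain Python, not the iRODS rule language.

```python
from datetime import date

# Hypothetical rule set: (minimum age in days since last change, stage).
# Stage names follow the slides; the thresholds are illustrative only.
RULES = [
    (365, "archiving"),
    (90, "maturing"),
    (0, "initial"),
]


def lifecycle_stage(last_modified: date, today: date) -> str:
    """Return the first stage whose age threshold the data set has reached."""
    age_days = (today - last_modified).days
    for min_age, stage in RULES:
        if age_days >= min_age:
            return stage
    return "initial"  # data stamped in the future is treated as new


today = date(2014, 6, 1)
for name, modified in [("raw-scans", date(2014, 5, 20)),
                       ("analysis-v1", date(2014, 1, 15)),
                       ("pilot-study", date(2012, 11, 3))]:
    print(name, "->", lifecycle_stage(modified, today))
```

Keeping the rules as data rather than code is one way to let a project owner adjust thresholds without touching the logic that applies them.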
Innovation: Archival Strategies

•  Proposed Architecture

o  XDESE provides an efficient path for exploration of options
o  Institutions and libraries can buy an XDESE rack
§  dedicated to archival storage
§  data transfer in and out is supported
§  establish criteria for users to deposit data, e.g. pass a data quality test of sufficient metadata
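A deposit criterion such as "sufficient metadata" could be as simple as a check for required fields. The sketch below is an assumption about what such a test might look like: the field list and function names are hypothetical, and a real archive would define its own criteria.

```python
# Hypothetical required fields; a real archive would define its own criteria.
REQUIRED_FIELDS = ("title", "creator", "date", "description", "license")


def missing_metadata(record: dict) -> list:
    """Return the required fields that are absent or empty in `record`."""
    return [field for field in REQUIRED_FIELDS
            if not str(record.get(field) or "").strip()]


def passes_quality_test(record: dict) -> bool:
    """A record may be deposited only if no required field is missing."""
    return not missing_metadata(record)


good = {"title": "Instrument run 42", "creator": "Alice", "date": "2013-11-19",
        "description": "Raw detector output", "license": "CC-BY"}
bad = {"title": "Untitled", "creator": "", "date": "2013-11-19"}

print(passes_quality_test(good))
print(missing_metadata(bad))
```

Returning the list of missing fields, rather than only a pass or fail result, tells the depositor exactly what to fix before trying again.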
Thank You

More Related Content

What's hot

Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
Vaibhav Jain
 
CYBER INFRASTRUCTURE AS A SERVICE TO EMPOWER MULTIDISCIPLINARY, DATA-DRIVEN S...
CYBER INFRASTRUCTURE AS A SERVICE TO EMPOWER MULTIDISCIPLINARY, DATA-DRIVEN S...CYBER INFRASTRUCTURE AS A SERVICE TO EMPOWER MULTIDISCIPLINARY, DATA-DRIVEN S...
CYBER INFRASTRUCTURE AS A SERVICE TO EMPOWER MULTIDISCIPLINARY, DATA-DRIVEN S...
ijcsit
 
Productionizing Hadoop: 7 Architectural Best Practices
Productionizing Hadoop: 7 Architectural Best PracticesProductionizing Hadoop: 7 Architectural Best Practices
Productionizing Hadoop: 7 Architectural Best Practices
MapR Technologies
 

What's hot (20)

Hadoop hdfs
Hadoop hdfsHadoop hdfs
Hadoop hdfs
 
Hadoop file system
Hadoop file systemHadoop file system
Hadoop file system
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Hadoop distributed file system
Hadoop distributed file systemHadoop distributed file system
Hadoop distributed file system
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfs
 
Hdfs
HdfsHdfs
Hdfs
 
CYBER INFRASTRUCTURE AS A SERVICE TO EMPOWER MULTIDISCIPLINARY, DATA-DRIVEN S...
CYBER INFRASTRUCTURE AS A SERVICE TO EMPOWER MULTIDISCIPLINARY, DATA-DRIVEN S...CYBER INFRASTRUCTURE AS A SERVICE TO EMPOWER MULTIDISCIPLINARY, DATA-DRIVEN S...
CYBER INFRASTRUCTURE AS A SERVICE TO EMPOWER MULTIDISCIPLINARY, DATA-DRIVEN S...
 
Hdfs design
Hdfs designHdfs design
Hdfs design
 
Ravi Namboori Hadoop & HDFS Architecture
Ravi Namboori Hadoop & HDFS ArchitectureRavi Namboori Hadoop & HDFS Architecture
Ravi Namboori Hadoop & HDFS Architecture
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
IRJET- A Novel Approach to Process Small HDFS Files with Apache Spark
IRJET- A Novel Approach to Process Small HDFS Files with Apache SparkIRJET- A Novel Approach to Process Small HDFS Files with Apache Spark
IRJET- A Novel Approach to Process Small HDFS Files with Apache Spark
 
Hadoop distributed file system
Hadoop distributed file systemHadoop distributed file system
Hadoop distributed file system
 
Productionizing Hadoop: 7 Architectural Best Practices
Productionizing Hadoop: 7 Architectural Best PracticesProductionizing Hadoop: 7 Architectural Best Practices
Productionizing Hadoop: 7 Architectural Best Practices
 
Inroduction to Dspace
Inroduction to DspaceInroduction to Dspace
Inroduction to Dspace
 
Metadata in EOSDIS
Metadata in EOSDISMetadata in EOSDIS
Metadata in EOSDIS
 
Hadoop HDFS
Hadoop HDFSHadoop HDFS
Hadoop HDFS
 
Cloud File System with GFS and HDFS
Cloud File System with GFS and HDFS  Cloud File System with GFS and HDFS
Cloud File System with GFS and HDFS
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiry
 
Survey of clustered_parallel_file_systems_004_lanl.ppt
Survey of clustered_parallel_file_systems_004_lanl.pptSurvey of clustered_parallel_file_systems_004_lanl.ppt
Survey of clustered_parallel_file_systems_004_lanl.ppt
 

Similar to Building and Extensible Storage Ecosystem with WOS

Research methods group accelarating impact by sharing data
Research methods group  accelarating impact by sharing dataResearch methods group  accelarating impact by sharing data
Research methods group accelarating impact by sharing data
World Agroforestry (ICRAF)
 
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Ian Foster
 

Similar to Building and Extensible Storage Ecosystem with WOS (20)

DataShare for UC Campuses
DataShare for UC CampusesDataShare for UC Campuses
DataShare for UC Campuses
 
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWSExperiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
 
Research methods group accelarating impact by sharing data
Research methods group  accelarating impact by sharing dataResearch methods group  accelarating impact by sharing data
Research methods group accelarating impact by sharing data
 
Jetstream: Adding Cloud-based Computing to the National Cyberinfrastructure
Jetstream: Adding Cloud-based Computing to the National CyberinfrastructureJetstream: Adding Cloud-based Computing to the National Cyberinfrastructure
Jetstream: Adding Cloud-based Computing to the National Cyberinfrastructure
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiri
 
Dataverse Netowrk Project
Dataverse Netowrk ProjectDataverse Netowrk Project
Dataverse Netowrk Project
 
Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...
 
Globus for Data Management: 2014 Joint Facility User Forum
Globus for Data Management: 2014 Joint Facility User ForumGlobus for Data Management: 2014 Joint Facility User Forum
Globus for Data Management: 2014 Joint Facility User Forum
 
XSEDE National Cyberinfrastructure, NIST, and Supporting NCSI Objectives
XSEDE National Cyberinfrastructure, NIST, and Supporting NCSI Objectives XSEDE National Cyberinfrastructure, NIST, and Supporting NCSI Objectives
XSEDE National Cyberinfrastructure, NIST, and Supporting NCSI Objectives
 
Technical integration of data repositories status and challenges
Technical integration of data repositories status and challengesTechnical integration of data repositories status and challenges
Technical integration of data repositories status and challenges
 
Big data and cloud computing 9 sep-2017
Big data and cloud computing 9 sep-2017Big data and cloud computing 9 sep-2017
Big data and cloud computing 9 sep-2017
 
Jetstream - Adding Cloud-based Computing to the National Cyberinfrastructure
Jetstream - Adding Cloud-based Computing to the National CyberinfrastructureJetstream - Adding Cloud-based Computing to the National Cyberinfrastructure
Jetstream - Adding Cloud-based Computing to the National Cyberinfrastructure
 
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
 
Duraspace Hot Topics Series 6: Metadata and Repository Services
Duraspace Hot Topics Series 6: Metadata and Repository ServicesDuraspace Hot Topics Series 6: Metadata and Repository Services
Duraspace Hot Topics Series 6: Metadata and Repository Services
 
Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs
 
Integrating an electronic lab notebook with a data repository; American Chemi...
Integrating an electronic lab notebook with a data repository; American Chemi...Integrating an electronic lab notebook with a data repository; American Chemi...
Integrating an electronic lab notebook with a data repository; American Chemi...
 
Elns and repositories, American Chemical Society, Dallas, March 2014
Elns and repositories, American Chemical Society, Dallas, March 2014Elns and repositories, American Chemical Society, Dallas, March 2014
Elns and repositories, American Chemical Society, Dallas, March 2014
 
Introduction to Research Data Management
Introduction to Research Data ManagementIntroduction to Research Data Management
Introduction to Research Data Management
 
Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...
 
DataFest 2019 Science Gateways
DataFest 2019 Science GatewaysDataFest 2019 Science Gateways
DataFest 2019 Science Gateways
 

More from inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
inside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
inside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
inside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
inside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
inside-BigData.com
 

More from inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 

Recently uploaded

Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
panagenda
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
FIDO Alliance
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
FIDO Alliance
 
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
FIDO Alliance
 

Recently uploaded (20)

ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 
How to Check GPS Location with a Live Tracker in Pakistan
How to Check GPS Location with a Live Tracker in PakistanHow to Check GPS Location with a Live Tracker in Pakistan
How to Check GPS Location with a Live Tracker in Pakistan
 
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdfFrisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
 
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
Vector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptxVector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptx
 
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsContinuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
Overview of Hyperledger Foundation
Overview of Hyperledger FoundationOverview of Hyperledger Foundation
Overview of Hyperledger Foundation
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
 
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate Guide
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdf
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentation
 
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
 
UiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewUiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overview
 
Design Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptxDesign Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptx
 

Building and Extensible Storage Ecosystem with WOS

  • 1. Building an Extensible Storage Ecosystem with WOS Dr. Erik Deumens SSERCA SC’13 DDN User Meeting
  • 2. SSERCA •  Sunshine State Education & Research Computing Alliance o  Members: FIU, FSU, UCF, UF, UM, USF o  Affiliates: FAMU, FAU, FIT, UNF o  Glue: Florida LambdaRail regional network provider •  Enable and enhance o  collaborative research o  for faculty and their teams in the state •  Making them more competitive o  by providing advanced cyber infrastructure
  • 3. Proposal Vision and Overview The researchers and their collaborations are the central focus driving all design aspects of the proposed extensible storage environment.
  • 4. Intellectual Merit •  Address the need of working researchers head-on •  Not centered on some hardware or software design •  Naturally extensible •  Intrinsically sustainable •  Inclusive of new approaches
  • 5. Broader Impacts •  Open to all communities •  Provide a framework to explore and broaden a data centric research environment •  Provide long-term roadmap to address archival storage and transitioning data to it •  Link campus and NSF XSEDE resources in flexible way (eXtreme Science and Engineering Discovery Environment)
  • 6. Project Vision •  What challenges are addressed? •  What will the proposed project build with NSF funding? •  How are XSEDE resources leveraged? •  Features of the architecture o  Sustainable o  Extensible o  Flexible and adaptable •  What can others build leveraging this NSF funded project?
  • 7. Challenges for Storage Providers •  Multiple sources, multiple sizes of data o  Instrument data o  Spreadsheets •  Multiple places to store data o  Campus systems o  Cloud systems (Google Drive, Dropbox, etc) •  Multiple actions and timescales in data life o  Analysis - compute and data intensive o  Distribution - web site accessibility §  general and restricted o  Life cycle management - initial, maturing, archiving
  • 8. Principles Create: Effective environment for researchers…. •  to work collaboratively with complex workflows •  Involving large and small data •  We propose to bring the essence and simplicity of cloud infrastructure to research: Interactivity and instant gratification. Think of something, and start doing it!
  • 9. Proposal: XDESE The eXtreme Digital Extensible Storage Ecosystem - XDESE •  Ecosystem is more complex than environment •  NSF funded and supported core o  Distributed by design o  Multi-access, multi-protocol, multi-owner o  Leverage XSEDE resources o  XRAC allocation process adapted for data §  defined quota for defined time span
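
The idea of an allocation as a "defined quota for defined time span" can be sketched as a small record with two checks. This is only an illustration: the `Allocation` class, its field names, and the terabyte unit are assumptions made here, not part of the XRAC process or of any XDESE software.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Allocation:
    """Hypothetical storage allocation: a quota valid for a fixed time span."""
    project: str
    quota_tb: float   # assumed unit: terabytes
    start: date       # first day the allocation is valid
    end: date         # last day the allocation is valid

    def is_active(self, on: date) -> bool:
        # The allocation only counts between its start and end dates, inclusive.
        return self.start <= on <= self.end

    def can_store(self, used_tb: float, request_tb: float, on: date) -> bool:
        # A request fits if the allocation is active and the quota is not exceeded.
        return self.is_active(on) and used_tb + request_tb <= self.quota_tb


if __name__ == "__main__":
    alloc = Allocation("alice-instrument", 50.0, date(2014, 1, 1), date(2014, 12, 31))
    print(alloc.can_store(used_tb=40.0, request_tb=5.0, on=date(2014, 6, 1)))
```

Keeping the time span in the record itself means an expired allocation rejects new data without any separate cleanup step.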
  • 10. Storage Architecture (diagram labels: XSEDE – XRAC; Authentication; Authorization; XDESE - FIU Data Gateway; XDESE - UCF Data Gateway; Data Replication; Internet; XSEDE resources: Stampede, Kraken; Researcher)
  • 11. Proposal: XDESE (2) •  Extensible with other funding o  Geographically: campus and regional add-ons §  plug and play racks o  Organizationally: multiple communities §  astrophysics, religion, archeology, ... o  Functionally: add new protocols and formats o  Public data: NSF funded o  Restricted data: funded from other sources o  Archival data and data repositories
  • 12. XDESE Extension Architecture •  Basic concept o  WOScore storage system at remote location o  WOScore provides §  data replication and motion §  policy and demand based o  Add WOSaccess gateway to provide local §  CIFS (personal) and NFS (organizational) o  Add WOS GS bridge gateway to provide local §  GPFS on GridScaler or Lustre on ExaScaler
  • 14. Leverage XSEDE Resources •  Users store and maintain data in XDESE o  Long term project data o  In support of collaboration §  meaning easy access to many people §  fine control over who can see and do what o  Not intended for temporary data o  XSEDE storage resources are suitable for that
  • 15. Leverage XSEDE Resources (2) •  Transfer data to XSEDE processors o  Stampede, Kraken, etc o  Bulk transfer o  Complex data flow including data selection o  XDESE will respond from multiple sites §  improved performance, reliability, flexibility
  • 16. Leverage XSEDE Resources (3) •  Option 1: data transfer to XSEDE scratch file system o  During computation on XSEDE systems o  Optimal performance is obtained
  • 17. Leverage XSEDE Resources (4) •  Option 2: XSEDE compute job controls data o  Program can control data selection.. o  from the XDESE storage o  initiate transfer of selected parts to and from it o  XDESE storage (DDN WOScore) will optimize data location among distributed XDESE storage nodes o  use one of the extensions for further optimization
  • 18. Partnerships •  Network partner FLR o  Provide transport o  Performance optimization with SDN and OpenFlow o  Provide connection to Internet2 and XSEDEnet •  Storage system vendor DDN o  Provides hardware, system software, and expertise o  Builds the extension racks •  Software interfaces o  Data transfer: Globus Online
  • 19. SSERCA XDESE Storage Solution (diagram labels: SSERCA Storage Cloud; SSERCA End-Users State Wide; Florida State University; University of Florida; University of South Florida; University of Central Florida; Florida International University; University of Miami) ©2012 DataDirect Networks. All Rights Reserved. ddn.com
  • 20. XDESE Building Block At each SSERCA site Storage server •  2.1 PB raw
  • 21. WOS6000 Cabinet WOS6000 storage server •  12 drawers •  180 TB per drawer (2 nodes) •  2.1 PB raw capacity •  Policy based data protection o  Ranges from 100% to 20% o  Replication 100% overhead o  RAID-like encoding 20% overhead
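
As a rough check on the cabinet figures: 12 drawers at 180 TB each give 2,160 TB, which the slide rounds to 2.1 PB raw. The sketch below also estimates usable capacity under the two protection policies. It assumes that "overhead" means extra raw storage consumed per unit of user data, so 100% overhead halves usable capacity; that reading and the function names are assumptions, not DDN definitions.

```python
DRAWERS_PER_CABINET = 12   # from the slide
TB_PER_DRAWER = 180        # from the slide


def raw_capacity_tb(drawers=DRAWERS_PER_CABINET, tb_per_drawer=TB_PER_DRAWER):
    """Raw capacity of one cabinet in terabytes."""
    return drawers * tb_per_drawer


def usable_capacity_tb(raw_tb, overhead):
    """Usable capacity, assuming each stored TB costs (1 + overhead) TB of raw space."""
    return raw_tb / (1.0 + overhead)


if __name__ == "__main__":
    raw = raw_capacity_tb()
    print(f"raw capacity: {raw} TB")
    print(f"replication (100% overhead): {usable_capacity_tb(raw, 1.0):.0f} TB usable")
    print(f"RAID-like encoding (20% overhead): {usable_capacity_tb(raw, 0.2):.0f} TB usable")
```

Under this assumption a cabinet holds about 1,080 TB of replicated data or about 1,800 TB of encoded data, which shows how strongly the protection policy affects effective cost per terabyte.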
  • 22. Resource Details •  Primary data interface to the web o  WOScloud (Dropbox-like, REST over SSL, OAuth) o  WOSshare (Amazon S3-like, S3=simple storage service, REST interface, BitTorrent) •  Generic server for Globus Online transfers o  DDN customization needed for optimal speed o  Initially simple NFS client via WOSaccess •  Interface to SSERCA campus HPCs o  Grid/ExaScaler to stage to GPFS/Lustre o  Later read via NFS
  • 23. Hardware Architecture •  At the 6 SSERCA sites o  Object Storage at 6 sites o  Web server with data control panel •  Data transfer mechanisms over FLR, XSEDEnet and Internet2 •  Extension racks at other locations o  Object storage o  Network infrastructure, OpenFlow capable o  Provide multiple data path options to local campus resources like NFS and CIFS access o  Optional: compute resources with scratch storage
  • 24. XDESE: Extending and Complementing XSEDE Storage XDESE offers •  Easy user interface •  Composability of data flows and workflows •  Multiple authentication domains •  Ability to easily share data •  Easy ingestion of instrument data User focused!
  • 25. XDESE Storage and XSEDE Compute •  Full integration with XSEDE compute resources •  Easy data transfer is part of data and workflow •  The extensibility includes the option to.. o  install WOS GS bridge gateway o  at XSEDE compute site(s) o  for improved performance o  works like Hierarchical File System
  • 26. Authentication Interface •  To be successful, compatibility with multiple campus systems is also required o  Need to design a simple system o  Must allow users to manage multiple identities easily §  XSEDE, XDESE, local campus, Google Drive, Dropbox, Amazon S3, etc §  Globus Online supports transfer across authentication domains §  other tools like BitTorrent play a role too
  • 27. Performance and Innovation for Science & Engineering Applications •  Performance, scalability, extensibility, sustainability •  Describe the general use case o  Example from humanities and social sciences •  Select some strong science, engineering application(s) •  Innovation: explore archival strategies
  • 28. Sustainable and Extensible •  Distributed from inception o  Basic functionalities will be tested and supported •  Extensible simply by adding an XDESE rack o  Like NSF funded GENI project and GENI racks o  Multiple vendors can supply the racks o  Learn once from XDESE, apply everywhere o  Path for even the smallest institutions §  leverage NSF funded resource and get started quickly §  single faculty can start working with XDESE
  • 29. Use case: Generic researcher Alice works on a project that involves.. •  data from an instrument •  and more data generated by analysis and modeling
  • 30. Use Case: Setup •  Alice gets an XDESE allocation •  She arranges data to flow to the storage from the instrument o  If the data flow demands it, she can set up a staging rack (needs funds) with specs and support from XDESE
  • 31. Use Case: Data and Workflow •  With the XDESE data & work control station o  Looks like Galaxy https://main.g2.bx.psu.edu/ •  She controls data and workflow o  Orchestrates data movement o  Get all data in the right place o  Right place is where the software and compute capability is at XSEDE resources or on campus •  Tools execute the movement o  Globus Online, etc.
  • 32. Use Case: Results •  The results can be viewed with tools from the location specified in the flow •  Collaborators can get accounts and access to her allocation •  Multiple ways to access the data are available •  Further visualization and other processing can easily be orchestrated
  • 33. Use Case: Lifecycle Management •  She can prepare the data for long-term sharing •  Tools for creating metadata are provided o  Rules for lifecycle management can be set up, e.g. iRODS interface o  Data can be annotated and recorded, e.g. Dataverse Network •  Transition data to compatible systems o  Campus libraries o  Discipline-specific societies
  • 34. Innovation: Archival Strategies •  Proposed Architecture o  XDESE provides an efficient path for exploration of options o  Institutions and libraries can buy an XDESE rack §  dedicated to archival storage §  data transfer in and out is supported §  establish criteria for users to deposit data •  e.g. pass a data quality test of sufficient metadata
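
The deposit criterion mentioned here, a data quality test of sufficient metadata, could be as simple as checking that a set of required fields is present and non-empty. A minimal sketch follows; the required field names and both function names are hypothetical, and a real archive would set its own criteria.

```python
# Hypothetical list of fields an archive might require before accepting a deposit.
REQUIRED_FIELDS = ("title", "creator", "date", "description")


def missing_metadata(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in the record."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def passes_quality_test(record, required=REQUIRED_FIELDS):
    """A record passes when no required field is missing."""
    return not missing_metadata(record, required)


if __name__ == "__main__":
    deposit = {"title": "Instrument run 12", "creator": "Alice", "date": "2013-11-18"}
    print(passes_quality_test(deposit))   # description is missing
    print(missing_metadata(deposit))
```

Reporting which fields are missing, rather than only a pass or fail result, lets a depositor fix the record before resubmitting.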