Building an Extensible Storage Ecosystem with WOS
Dr. Erik Deumens, SSERCA
SC’13 DDN User Meeting
In this presentation from the DDN User Meeting at SC13, Erik Deumens of SSERCA describes how the alliance is sharing data with WOS from DDN.

Watch the video presentation: http://insidehpc.com/2013/11/13/ddn-user-meeting-coming-sc13-nov-18/


  1. Building an Extensible Storage Ecosystem with WOS
     Dr. Erik Deumens, SSERCA
     SC’13 DDN User Meeting
  2. SSERCA
     •  Sunshine State Education & Research Computing Alliance
        o  Members: FIU, FSU, UCF, UF, UM, USF
        o  Affiliates: FAMU, FAU, FIT, UNF
        o  Glue: Florida LambdaRail regional network provider
     •  Enable and enhance
        o  collaborative research
        o  for faculty and their teams in the state
     •  Making them more competitive
        o  by providing advanced cyber infrastructure
  3. Proposal Vision and Overview
     The researchers and their collaborations are the central focus driving all design aspects of the proposed extensible storage environment.
  4. Intellectual Merit
     •  Address the need of working researchers head-on
     •  Not centered on some hardware or software design
     •  Naturally extensible
     •  Intrinsically sustainable
     •  Inclusive of new approaches
  5. Broader Impacts
     •  Open to all communities
     •  Provide a framework to explore and broaden a data-centric research environment
     •  Provide a long-term roadmap to address archival storage and transitioning data to it
     •  Link campus and NSF XSEDE (eXtreme Science and Engineering Discovery Environment) resources in a flexible way
  6. Project Vision
     •  What challenges are addressed?
     •  What will the proposed project build with NSF funding?
     •  How are XSEDE resources leveraged?
     •  Features of the architecture
        o  Sustainable
        o  Extensible
        o  Flexible and adaptable
     •  What can others build leveraging this NSF funded project?
  7. Challenges for Storage Providers
     •  Multiple sources, multiple sizes of data
        o  Instrument data
        o  Spreadsheets
     •  Multiple places to store data
        o  Campus systems
        o  Cloud systems (Google Drive, Dropbox, etc)
     •  Multiple actions and timescales in data life
        o  Analysis - compute and data intensive
        o  Distribution - web site accessibility
           §  general and restricted
        o  Life cycle management - initial, maturing, archiving
  8. Principles
     Create:
     •  Effective environment for researchers…
     •  to work collaboratively
     •  with complex workflows
     •  Involving large and small data
     We propose to bring the essence and simplicity of cloud infrastructure to research: Interactivity and instant gratification. Think of something, and start doing it!
  9. Proposal: XDESE
     The eXtreme Digital Extensible Storage Ecosystem - XDESE
     •  Ecosystem is more complex than environment
     •  NSF funded and supported core
        o  Distributed by design
        o  Multi-access, multi-protocol, multi-owner
        o  Leverage XSEDE resources
        o  XRAC allocation process adapted for data
           §  defined quota for defined time span
  10. Storage Architecture
      [Diagram labels: XSEDE – XRAC (Authentication, Authorization); XDESE - FIU (Data Gateway); Internet; Data Replication; XDESE - UCF (Data Gateway); XSEDE resources: Stampede, Kraken; Researcher]
  11. Proposal: XDESE (2)
      •  Extensible with other funding
         o  Geographically: campus and regional add-ons
            §  plug and play racks
         o  Organizationally: multiple communities
            §  astrophysics, religion, archeology, ...
         o  Functionally: add new protocols and formats
         o  Public data: NSF funded
         o  Restricted data: funded from other sources
         o  Archival data and data repositories
  12. XDESE Extension Architecture
      •  Basic concept
         o  WOScore storage system at remote location
         o  WOScore provides
            §  data replication and motion
            §  policy and demand based
         o  Add WOSaccess gateway to provide local
            §  CIFS (personal) and NFS (organizational)
         o  Add WOS GS bridge gateway to provide local
            §  GPFS on GridScaler or Lustre on ExaScaler
  13. Extension Architecture
      [Diagram labels: SSERCA XDESE; WOS GS Bridge; Internet; Campus; WOS Access; WOS GS Bridge; XSEDE Stampede; HPC; campus net; NFS/CIFS]
  14. Leverage XSEDE Resources
      •  Users store and maintain data in XDESE
         o  Long term project data
         o  In support of collaboration
            §  meaning easy access to many people
            §  fine control over who can see and do what
         o  Not intended for temporary data
         o  XSEDE storage resources are suitable for that
  15. Leverage XSEDE Resources (2)
      •  Transfer data to XSEDE processors
         o  Stampede, Kraken, etc
         o  Bulk transfer
         o  Complex data flow including data selection
         o  XDESE will respond from multiple sites
            §  improved performance, reliability, flexibility
  16. Leverage XSEDE Resources (3)
      •  Option 1: data transfer to XSEDE scratch file system
         o  During computation on XSEDE systems
         o  Optimal performance is obtained
  17. Leverage XSEDE Resources (4)
      •  Option 2: XSEDE compute job controls data
         o  Program can control data selection
            o  from the XDESE storage
            o  initiate transfer of selected parts to and from it
         o  XDESE storage (DDN WOScore) will optimize data location among distributed XDESE storage nodes
            o  use one of the extensions for further optimization
  18. Partnerships
      •  Network partner FLR
         o  Provide transport
         o  Performance optimization with SDN and OpenFlow
         o  Provide connection to Internet2 and XSEDEnet
      •  Storage system vendor DDN
         o  Provides hardware, system software, and expertise
         o  Builds the extension racks
      •  Software interfaces
         o  Data transfer: Globus Online
  19. SSERCA XDESE Storage Solution
      [Diagram labels: SSERCA End-Users State Wide; SSERCA Storage Cloud; Florida State University; University of Florida; University of South Florida; University of Central Florida; Florida International University; University of Miami]
      ©2012 DataDirect Networks. All Rights Reserved. ddn.com
  20. XDESE Building Block
      At each SSERCA site
      Storage server
      •  2.1 PB raw
  21. WOS6000 Cabinet
      WOS6000 storage server
      •  12 drawers
      •  180 TB per drawer (2 nodes)
      •  2.1 PB raw capacity
      •  Policy based data protection
         •  Ranges from 100% to 20%
         •  Replication 100% overhead
         •  RAID-like encoding 20% overhead
  22. Resource Details
      •  Primary data interface to the web
         o  WOScloud (Dropbox-like, REST over SSL, OAuth)
         o  WOSshare (Amazon S3-like, S3=simple storage service, REST interface, BitTorrent)
      •  Generic server for Globus Online transfers
         o  DDN customization needed for optimal speed
         o  Initially simple NFS client via WOSaccess
      •  Interface to SSERCA campus HPCs
         o  Grid/ExaScaler to stage to GPFS/Lustre
         o  Later read via NFS
  23. Hardware Architecture
      •  At the 6 SSERCA sites
         o  Object Storage at 6 sites
         o  Web server with data control panel
      •  Data transfer mechanisms over FLR XSEDEnet and Internet2
      •  Extension racks at other locations
         o  Object storage
         o  Network infrastructure
         o  OpenFlow capable
         o  Provide multiple data path options to local campus resources like NFS and CIFS access
         o  Optional: compute resources with scratch storage
  24. XDESE: Extending and Complementing XSEDE Storage
      XDESE offers
      •  Easy user interface
      •  Composability of data flows and workflows
      •  Multiple authentication domains
      •  Ability to easily share data
      •  Easy ingestion of instrument data
      User focused!
  25. XDESE Storage and XSEDE Compute
      •  Full integration with XSEDE compute resources
      •  Easy data transfer is part of data and workflow
      •  The extensibility includes the option to..
         o  install WOS GS bridge gateway
         o  at XSEDE compute site(s)
         o  for improved performance
         o  works like Hierarchical File System
  26. Authentication Interface
      •  To be successful, compatibility with multiple campus systems is also required
         o  Need to design a simple system
         o  Must allow users to manage multiple identities easily
            §  XSEDE, XDESE, local campus, Google Drive, Dropbox, Amazon S3, etc
            §  Globus Online supports transfer across authentication domains
            §  other tools like BitTorrent play a role too
  27. Performance and Innovation for Science & Engineering Applications
      •  Performance, scalability, extensibility, sustainability
         o  Describe the general use case
      •  Example from humanities and social sciences
      •  Select some strong science, engineering application(s)
      •  Innovation: explore archival strategies
  28. Sustainable and Extensible
      •  Distributed from inception
         o  Basic functionalities will be tested and supported
      •  Extensible simply by adding an XDESE rack
         o  Like NSF funded GENI project and GENI racks
         o  Multiple vendors can supply the racks
         o  Learn once from XDESE, apply everywhere
         o  Path for even the smallest institutions
            §  leverage NSF funded resource and get started quickly
            §  single faculty can start working with XDESE
  29. Use case: Generic researcher
      Alice works on a project that involves..
      •  data from an instrument and
      •  more data generated by analysis and modeling
  30. Use Case: Setup
      •  Alice gets an XDESE allocation
      •  She arranges data to flow to the storage from the instrument
         o  If the data flow demands it, she can set up a staging rack (needs funds) with specs and support from XDESE
  31. Use Case: Data and Workflow
      •  With the XDESE data & work control station
         o  Looks like Galaxy https://main.g2.bx.psu.edu/
      •  She controls data and workflow
         o  Orchestrates data movement
         o  Get all data in the right place
         o  Right place is where the software and compute capability is at XSEDE resources or on campus
      •  Tools execute the movement
         o  Globus Online, etc.
  32. Use Case: Results
      •  The results can be viewed with tools from the location specified in the flow
      •  Collaborators can get accounts and access to her allocation
      •  Multiple ways to access the data are available
      •  Further visualization and other processing can easily be orchestrated
  33. Use Case: Lifecycle Management
      •  She can prepare the data for long-term sharing
      •  Tools for creating metadata are provided
         o  Rules for lifecycle management can be set up, e.g. iRODS interface
         o  Data can be annotated and recorded, e.g. Dataverse Network
      •  Transition data to compatible systems
         o  Campus libraries
         o  Discipline-specific societies
  34. Innovation: Archival Strategies
      •  Proposed Architecture
         o  XDESE provides an efficient path for exploration of options
         o  Institutions and libraries can buy an XDESE rack
            §  dedicated to archival storage
            §  data transfer in and out is supported
            §  establish criteria for users to deposit data
               •  e.g. pass a data quality test of sufficient metadata
  35. Thank You
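
Slide 9 describes the XRAC allocation process adapted for data as a "defined quota for defined time span". A minimal sketch of what such an allocation record could look like follows. The class name, field names, and the example project are hypothetical and are not taken from the slides or from any XSEDE or DDN interface.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class StorageAllocation:
    """Hypothetical record: a storage quota that is valid for a fixed time span."""
    project: str
    quota_tb: float   # defined quota, in terabytes
    start: date       # first day the allocation is valid
    end: date         # last day the allocation is valid

    def is_active(self, on: date) -> bool:
        # The allocation only counts inside its defined time span.
        return self.start <= on <= self.end

    def can_store(self, used_tb: float, request_tb: float, on: date) -> bool:
        # A request fits if the allocation is active and the quota is not exceeded.
        return self.is_active(on) and used_tb + request_tb <= self.quota_tb


# Example: a one-year, 50 TB allocation (illustrative numbers only).
alloc = StorageAllocation("alice-lab", 50.0, date(2014, 1, 1), date(2014, 12, 31))
print(alloc.can_store(used_tb=40.0, request_tb=10.0, on=date(2014, 6, 1)))
```

Keeping the time span in the record itself means an expired allocation rejects new data without any separate bookkeeping.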
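
Slide 21 gives the WOS6000 cabinet figures: 12 drawers at 180 TB each, with data protection overhead ranging from 100% (replication) to 20% (RAID-like encoding). The arithmetic can be sketched as below. It assumes that "overhead" means extra raw storage consumed per unit of user data, so usable capacity is raw / (1 + overhead); the slides do not spell out this definition, and the function names are illustrative.

```python
DRAWERS_PER_CABINET = 12   # from slide 21
TB_PER_DRAWER = 180        # from slide 21


def raw_capacity_tb(drawers: int = DRAWERS_PER_CABINET,
                    tb_per_drawer: int = TB_PER_DRAWER) -> int:
    """Raw capacity of one cabinet in terabytes."""
    return drawers * tb_per_drawer


def usable_capacity_tb(raw_tb: float, overhead: float) -> float:
    """Usable capacity, assuming overhead is extra storage per unit of user data."""
    return raw_tb / (1.0 + overhead)


raw = raw_capacity_tb()  # 12 * 180 = 2160 TB, quoted on the slide as 2.1 PB raw
for label, overhead in (("replication", 1.00), ("RAID-like encoding", 0.20)):
    print(f"{label}: about {usable_capacity_tb(raw, overhead):.0f} TB usable")
```

Under this reading, one cabinet holds roughly 1080 TB of user data with full replication and roughly 1800 TB with the RAID-like encoding.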
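
Slide 34 proposes that an archival rack could require deposits to "pass a data quality test of sufficient metadata". One simple form such a test might take is a required-field check, sketched below. The field list is an assumption chosen for illustration; the slides do not define which metadata counts as sufficient.

```python
# Hypothetical required fields; a real archive would define its own list.
REQUIRED_FIELDS = ("title", "creator", "date", "description", "license")


def missing_metadata(record: dict, required=REQUIRED_FIELDS) -> list:
    """Return the required fields that are absent or blank in the record."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def passes_deposit_test(record: dict, required=REQUIRED_FIELDS) -> bool:
    """A record passes when no required field is missing."""
    return not missing_metadata(record, required)


record = {"title": "Instrument run 42", "creator": "Alice", "date": "2013-11-18"}
print(missing_metadata(record))
```

Reporting the missing fields, rather than only pass or fail, tells the depositor what to fix before resubmitting.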
