In this presentation from the DDN User Meeting at SC13, Erik Deumens from SSERCA describes how the institution is sharing data with WOS from DDN.
Watch the video presentation: http://insidehpc.com/2013/11/13/ddn-user-meeting-coming-sc13-nov-18/
2. SSERCA
• Sunshine State Education & Research Computing Alliance
o Members: FIU, FSU, UCF, UF, UM, USF
o Affiliates: FAMU, FAU, FIT, UNF
o Glue: Florida LambdaRail regional network provider
• Enable and enhance
o collaborative research
o for faculty and their teams in the state
• Making them more competitive
o by providing advanced cyber infrastructure
3. Proposal Vision and Overview
The researchers and their collaborations are the central focus driving all design aspects of the proposed extensible storage environment.
4. Intellectual Merit
• Address the need of working researchers head-on
• Not centered on some hardware or software design
• Naturally extensible
• Intrinsically sustainable
• Inclusive of new approaches
5. Broader Impacts
• Open to all communities
• Provide a framework to explore and broaden a data-centric research environment
• Provide a long-term roadmap to address archival storage and transitioning data to it
• Link campus and NSF XSEDE resources in a flexible way (XSEDE = eXtreme Science and Engineering Discovery Environment)
6. Project Vision
• What challenges are addressed?
• What will the proposed project build with NSF funding?
• How are XSEDE resources leveraged?
• Features of the architecture
o Sustainable
o Extensible
o Flexible and adaptable
• What can others build leveraging this NSF funded project?
7. Challenges for Storage Providers
• Multiple sources, multiple sizes of data
o Instrument data
o Spreadsheets
• Multiple places to store data
o Campus systems
o Cloud systems (Google Drive, Dropbox, etc)
• Multiple actions and timescales in data life
o Analysis - compute and data intensive
o Distribution - web site accessibility
§ general and restricted
o Life cycle management - initial, maturing, archiving
8. Principles
Create:
Effective environment for researchers...
• to work collaboratively
• with complex workflows
• involving large and small data
We propose to bring the essence and simplicity of cloud
infrastructure to research:
Interactivity and instant gratification.
Think of something, and start doing it!
9. Proposal: XDESE
The eXtreme Digital Extensible Storage Ecosystem - XDESE
• Ecosystem is more complex than environment
• NSF funded and supported core
o Distributed by design
o Multi-access, multi-protocol, multi-owner
o Leverage XSEDE resources
o XRAC allocation process adapted for data
§ defined quota for defined time span
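The "defined quota for defined time span" idea can be illustrated with a minimal sketch. The class name, field names, project name, quota size, and dates below are all hypothetical; the slides do not describe how XRAC or XDESE would actually record an allocation.

```python
from dataclasses import dataclass
from datetime import date

TB = 10**12  # bytes in a terabyte (decimal)


@dataclass(frozen=True)
class StorageAllocation:
    """Hypothetical storage grant: a byte quota valid for a fixed period."""
    project: str
    quota_bytes: int
    start: date
    end: date  # last day on which the allocation is valid (inclusive)

    def is_active(self, today: date) -> bool:
        """True while today falls inside the allocation period."""
        return self.start <= today <= self.end

    def can_store(self, used_bytes: int, new_bytes: int, today: date) -> bool:
        """Accept a write only if the allocation is active and stays within quota."""
        return self.is_active(today) and used_bytes + new_bytes <= self.quota_bytes


# Illustrative values only.
alloc = StorageAllocation("alice-instrument", 50 * TB, date(2014, 1, 1), date(2014, 12, 31))
print(alloc.can_store(used_bytes=40 * TB, new_bytes=5 * TB, today=date(2014, 6, 1)))
print(alloc.can_store(used_bytes=40 * TB, new_bytes=20 * TB, today=date(2014, 6, 1)))
print(alloc.can_store(used_bytes=0, new_bytes=1 * TB, today=date(2015, 1, 1)))
```

The first call fits within both the quota and the period; the second exceeds the quota; the third falls after the allocation has expired.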
10. Storage Architecture
[Diagram: storage architecture. Labels: XSEDE - XRAC (authentication, authorization); XDESE - FIU data gateway; XDESE - UCF data gateway; data replication; Internet; XSEDE resources (Stampede, Kraken); Researcher.]
11. Proposal: XDESE (2)
• Extensible with other funding
o Geographically: campus and regional add-ons
§ plug and play racks
o Organizationally: multiple communities
§ astrophysics, religion, archeology, ...
o Functionally: add new protocols and formats
o Public data: NSF funded
o Restricted data: funded from other sources
o Archival data and data repositories
12. XDESE Extension Architecture
• Basic concept
o WOScore storage system at remote location
o WOScore provides
§ data replication and motion
§ policy and demand based
o Add WOSaccess gateway to provide local
§ CIFS (personal) and NFS (organizational)
o Add WOS GS bridge gateway to provide local
§ GPFS on GridScaler or Lustre on ExaScaler
14. Leverage XSEDE Resources
• Users store and maintain data in XDESE
o Long term project data
o In support of collaboration
§ meaning easy access to many people
§ fine control over who can see and do what
o Not intended for temporary data
o XSEDE storage resources are suitable for that
15. Leverage XSEDE Resources (2)
• Transfer data to XSEDE processors
o Stampede, Kraken, etc
o Bulk transfer
o Complex data flow including data selection
o XDESE will respond from multiple sites
§ improved performance, reliability, flexibility
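One way to picture "respond from multiple sites" is a client that chooses among replica sites. The sketch below is only an illustration under stated assumptions: the function name, the latency figures, and the selection rule (lowest measured latency among reachable sites) are hypothetical and are not taken from WOS or XDESE. The site names FIU and UCF come from the storage architecture slide.

```python
from typing import Dict, Optional


def pick_site(latency_ms: Dict[str, Optional[float]]) -> str:
    """Return the reachable replica site with the lowest latency.

    A latency of None marks a site that is currently unreachable.
    """
    reachable = {site: ms for site, ms in latency_ms.items() if ms is not None}
    if not reachable:
        raise RuntimeError("no replica site is reachable")
    return min(reachable, key=lambda site: reachable[site])


# Hypothetical measurements: both sites up, then one site down.
print(pick_site({"FIU": 12.0, "UCF": 7.5}))
print(pick_site({"FIU": 12.0, "UCF": None}))
```

With both sites up the faster one is chosen (performance); with one site down the other still answers (reliability).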
16. Leverage XSEDE Resources (3)
• Option 1: data transfer to XSEDE scratch file system
o During computation on XSEDE systems
o Optimal performance is obtained
17. Leverage XSEDE Resources (4)
• Option 2: XSEDE compute job controls data
o Program can control data selection..
§ from the XDESE storage
§ initiate transfer of selected parts to and from XDESE storage
o XDESE storage (DDN WOScore) will optimize data location among distributed XDESE storage nodes
§ use one of the extensions for further optimization
18. Partnerships
• Network partner FLR
o Provide transport
o Performance optimization with SDN and OpenFlow
o Provide connection to Internet2 and XSEDEnet
• Storage system vendor DDN
o Provides hardware, system software, and expertise
o Builds the extension racks
• Software interfaces
o Data transfer: Globus Online
22. Resource Details
• Primary data interface to the web
o WOScloud (Dropbox-like, REST over SSL, OAuth)
o WOSshare (Amazon S3-like, S3 = Simple Storage Service, REST interface, BitTorrent)
• Generic server for Globus Online transfers
o DDN customization needed for optimal speed
o Initially simple NFS client via WOSaccess
• Interface to SSERCA campus HPCs
o Grid/ExaScaler to stage to GPFS/Lustre
o Later read via NFS
23. Hardware Architecture
• At the 6 SSERCA sites
o Object Storage at 6 sites
o Web server with data control panel
• Data transfer mechanisms over FLR, XSEDEnet and Internet2
• Extension racks at other locations
o Object storage
o Network infrastructure OpenFlow capable
o Provide multiple data path options to local campus resources like NFS and CIFS access
o Optional: compute resources
o with scratch storage
24. XDESE: Extending and Complementing XSEDE Storage
XDESE offers
• Easy user interface
• Composability of data flows and workflows
• Multiple authentication domains
• Ability to easily share data
• Easy ingestion of instrument data
User focused!
25. XDESE Storage and XSEDE Compute
• Full integration with XSEDE compute resources
• Easy data transfer is part of data and workflow
• The extensibility includes the option to..
o install WOS GS bridge gateway
o at XSEDE compute site(s)
o for improved performance
o works like Hierarchical File System
26. Authentication Interface
• To be successful, compatibility with multiple campus systems is also required
o Need to design a simple system
o Must allow users to manage multiple identities easily
§ XSEDE, XDESE, local campus, Google Drive, Dropbox, Amazon S3, etc
§ Globus Online supports transfer across authentication domains
§ other tools like BitTorrent play a role too
27. Performance and Innovation for Science & Engineering Applications
• Performance, scalability, extensibility, sustainability
• Describe the general use case
o Example from humanities and social sciences
• Select some strong science, engineering application(s)
• Innovation: explore archival strategies
28. Sustainable and Extensible
• Distributed from inception
o Basic functionalities will be tested and supported
• Extensible simply by adding an XDESE rack
o Like NSF funded GENI project and GENI racks
o Multiple vendors can supply the racks
o Learn once from XDESE, apply everywhere
o Path for even the smallest institutions
§ leverage NSF funded resource and get started quickly
§ single faculty can start working with XDESE
29. Use case: Generic researcher
Alice works on a project that involves..
• data from an instrument and
• more data generated by analysis and modeling
30. Use Case: Setup
• Alice gets an XDESE allocation
• She arranges data to flow to the storage from the instrument
o If the data flow demands it, she can set up a staging rack (needs funds) with specs and support from XDESE
31. Use Case: Data and Workflow
• With the XDESE data & work control station
o Looks like Galaxy https://main.g2.bx.psu.edu/
• She controls data and workflow
o Orchestrates data movement
o Get all data in the right place
o Right place is where the software and compute capability is: at XSEDE resources or on campus
• Tools execute the movement
o Globus Online, etc.
32. Use Case: Results
• The results can be viewed with tools from the location specified in the flow
• Collaborators can get accounts and access to her allocation
• Multiple ways to access the data are available
• Further visualization and other processing can easily be orchestrated
33. Use Case: Lifecycle Management
• She can prepare the data for long-term sharing
• Tools for creating metadata are provided
o Rules for lifecycle management can be set up, e.g. iRODS interface
o Data can be annotated and recorded, e.g. Dataverse Network
• Transition data to compatible systems
o Campus libraries
o Discipline-specific societies
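A lifecycle rule of the kind mentioned above could look like the following minimal sketch. It reuses the three stages named earlier in the deck (initial, maturing, archiving); the age thresholds and the function name are hypothetical, and this is plain Python rather than iRODS rule syntax.

```python
from datetime import date, timedelta

# Hypothetical thresholds; a real policy would be set per project or community.
MATURING_AFTER = timedelta(days=90)
ARCHIVE_AFTER = timedelta(days=365)


def lifecycle_stage(last_modified: date, today: date) -> str:
    """Classify a data set by the time elapsed since it was last modified."""
    age = today - last_modified
    if age >= ARCHIVE_AFTER:
        return "archiving"
    if age >= MATURING_AFTER:
        return "maturing"
    return "initial"


today = date(2013, 11, 18)
for modified in (date(2013, 11, 1), date(2013, 6, 1), date(2012, 1, 1)):
    print(modified, lifecycle_stage(modified, today))
```

A rule like this would only flag data as a candidate for the next stage; the transition to a library or society repository remains a decision for the researcher.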
34. Innovation: Archival Strategies
• Proposed Architecture
o XDESE provides an efficient path for exploration of options
o Institutions and libraries can buy an XDESE rack
§ dedicated to archival storage
§ data transfer in and out is supported
§ establish criteria for users to deposit data, e.g. pass a data quality test of sufficient metadata
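The "data quality test of sufficient metadata" could be as simple as checking that required fields are filled in before a deposit is accepted. The sketch below assumes a hypothetical set of required fields; the slides do not say which metadata an archive would demand.

```python
from typing import Dict, List, Optional

# Hypothetical required fields; each archive would define its own list.
REQUIRED_FIELDS = ("title", "creator", "date", "description")


def missing_metadata(metadata: Dict[str, Optional[str]]) -> List[str]:
    """Return the required fields that are absent or blank."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = metadata.get(field)
        if value is None or not value.strip():
            missing.append(field)
    return missing


def passes_deposit_check(metadata: Dict[str, Optional[str]]) -> bool:
    """A deposit passes only when no required field is missing."""
    return not missing_metadata(metadata)


record = {"title": "Instrument run 42", "creator": "Alice", "date": "2013-11-18", "description": " "}
print(missing_metadata(record))
print(passes_deposit_check(record))
```

Returning the list of missing fields, rather than only a yes or no, lets the depositor see what to fix before resubmitting.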