Peter Tiernan
Systems and Storage Engineer
Digital Repository of Ireland
TCHPC
Ceph at the DRI
DRI:
The Digital Repository Of Ireland (DRI) is an
interactive, national trusted digital repository
for contemporary and h...
OAIS Model:
- is concerned with all technical aspects of digital
repositories
- describes ‘components and services require...
OAIS Model:
Source:www.digital-preservation.com
DRI Storage Requirements:
OAIS/TRAC requires the following from storage:
- Minimal conditions for performing long-term
pre...
DRI Storage Requirements:
- Open Source/Open Standards
- Independence
- High Availability
- Dynamically Configurable
- Eas...
Storage Solutions We Tested:
Why we didn't choose HDFS:
- Interfaces limited. Not posix compliant due to immutable
nature of filesystem.
- Performance ...
Why we didn't choose iRODS:
- Default Interfaces limited. No Restful, RBD.
- Single point of failure at its iCAT metadata ...
Why we chose Ceph:
- We like its distributed, clustered architecture
- Provides complete high availability on install
- Sc...
Findings:
HDFS iRODS Ceph GPFS
API Yes Yes Yes Yes
Fedora 3.6.x
Driver
Yes No No No
Interface: Posix No Yes Yes Yes
Interf...
DRI Infrastructure
Performance
Poor performance with low number of OSDs (6) and
replication.
Performance
Adding OSDs (26) improves replicated performance
Source: Diana Gudu, KIT
Source:DianaGudu,KIT
Source:DianaGudu...
Features we want from Ceph:
- Asynchronous Replication
- Erasure Coding
- Tiering
- Multi-datacenter/Rados level async rep...
Other Ceph Projects:
- TCHPC: 100TB cluster used for backups
- collaboration with KIT/PSNC: performance testing, WAN
scale...
DRI: www.dri.ie
Trinity HPC: www.tchpc.tcd.ie
Trinity College Dublin: www.tcd.ie
Questions?
Links:
Ceph: www.ceph.com
HDFS: hadoop.apache.org
IRODS: www.irods.org
GPFS: www.ibm.com/systems/software/gpfs/
Project Hy...
Upcoming SlideShare
Loading in …5
×

Peter Tiernan - Ceph at the Digital Repository of Ireland

644 views

Published on

The DRI has a need for vastly scalable and dynamic storage. In this presentation we explore 4 storage solutions and describe how we made the choice to use Ceph.

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
644
On SlideShare
0
From Embeds
0
Number of Embeds
31
Actions
Shares
0
Downloads
4
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Peter Tiernan - Ceph at the Digital Repository of Ireland

  1. 1. Peter Tiernan Systems and Storage Engineer Digital Repository of Ireland TCHPC Ceph at the DRI
  2. 2. DRI: The Digital Repository Of Ireland (DRI) is an interactive, national trusted digital repository for contemporary and historical, social and cultural data held by Irish institutions. The DRI follows the Open Archival Information System (OAIS) ISO reference model and The Trusted Repository Audit Checklist (TRAC)
  3. 3. OAIS Model: - is concerned with all technical aspects of digital repositories - describes ‘components and services required to develop and maintain archives’ - is broken down into 'Functional Entities' and 'Work Packages'. - WP8 is responsible for the ‘Archival Storage’ functional entity.
  4. 4. OAIS Model: Source:www.digital-preservation.com
  5. 5. DRI Storage Requirements: OAIS/TRAC requires the following from storage: - Minimal conditions for performing long-term preservation of digital assets - Long Term Preservation of digital assets, even if the OAIS (repository) itself is not permanent or present.
  6. 6. DRI Storage Requirements: - Open Source/Open Standards - Independence - High Availability - Dynamically Configurable - Ease of Interoperability (Interfaces, APIs) - Data Security/Placement (Replication, Erasure coding, Placement, Tiering, Federation) - Self Contained - Commodity Hardware
  7. 7. Storage Solutions We Tested:
  8. 8. Why we didn't choose HDFS: - Interfaces limited. Not posix compliant due to immutable nature of filesystem. - Performance geared towards large data streams. I/O of many small files is poor. - Single point of failure and bottleneck at its Namenode. - Doesn’t provide any federation
  9. 9. Why we didn't choose iRODS: - Default Interfaces limited. No Restful, RBD. - Single point of failure at its iCAT metadata server - Overlapping functionality with Fedora Commons Why we didn't choose GPFS: - Default Interfaces limited. No Restful, RBD. - Data Replica limit of 2. - Closed source
  10. 10. Why we chose Ceph: - We like its distributed, clustered architecture - Provides complete high availability on install - Scales out horizontally to massive levels - Data Security: Distributed, Replicated - Many interface options - Rich, documented, multi-level APIs - Dynamically configurable - Very good Performance for general use (many small file I/O) - Solid release schedule, new features
  11. 11. Findings: HDFS iRODS Ceph GPFS API Yes Yes Yes Yes Fedora 3.6.x Driver Yes No No No Interface: Posix No Yes Yes Yes Interface: RBD No No Yes No Interface: RESTful Yes No Yes No Dynamic Configuration Yes Yes Yes Yes High Availability: Data Yes Yes Yes Yes High Availability: Service No No Yes Yes Max Raw Storage (PetaByte) >100 N/A >100 4 - 10^14 On-Read Data Checking No Yes No No Max Replicas 512 >2 ~2.1 Billion 2 Federation No Yes No Yes
  12. 12. DRI Infrastructure
  13. 13. Performance Poor performance with low number of OSDs (6) and replication.
  14. 14. Performance Adding OSDs (26) improves replicated performance Source: Diana Gudu, KIT Source:DianaGudu,KIT Source:DianaGudu,KIT
  15. 15. Features we want from Ceph: - Asynchronous Replication - Erasure Coding - Tiering - Multi-datacenter/Rados level async replication - Micro-services
  16. 16. Other Ceph Projects: - TCHPC: 100TB cluster used for backups - collaboration with KIT/PSNC: performance testing, WAN scale replication testing (Sync/Async)
  17. 17. DRI: www.dri.ie Trinity HPC: www.tchpc.tcd.ie Trinity College Dublin: www.tcd.ie Questions?
  18. 18. Links: Ceph: www.ceph.com HDFS: hadoop.apache.org IRODS: www.irods.org GPFS: www.ibm.com/systems/software/gpfs/ Project Hydra: projecthydra.org Fedora Commons: www.fedora-commons.org Apache SOLR: lucene.apache.org/solr/ HAProxy: haproxy.1wt.eu

×