0
Peter Tiernan
Systems and Storage Engineer
Digital Repository of Ireland
TCHPC
Ceph at the DRI
DRI:
The Digital Repository Of Ireland (DRI) is an
interactive, national trusted digital repository
for contemporary and h...
OAIS Model:
- is concerned with all technical aspects of digital
repositories
- describes ‘components and services require...
OAIS Model:
Source:www.digital-preservation.com
DRI Storage Requirements:
OAIS/TRAC requires the following from storage:
- Minimal conditions for performing long-term
pre...
DRI Storage Requirements:
- Open Source/Open Standards
- Independence
- High Availability
- Dynamically Configurable
- Eas...
Storage Solutions We Tested:
Why we didn't choose HDFS:
- Interfaces limited. Not posix compliant due to immutable
nature of filesystem.
- Performance ...
Why we didn't choose iRODS:
- Default Interfaces limited. No Restful, RBD.
- Single point of failure at its iCAT metadata ...
Why we chose Ceph:
- We like its distributed, clustered architecture
- Provides complete high availability on install
- Sc...
Findings:
HDFS iRODS Ceph GPFS
API Yes Yes Yes Yes
Fedora 3.6.x
Driver
Yes No No No
Interface: Posix No Yes Yes Yes
Interf...
DRI Infrastructure
Performance
Poor performance with low number of OSDs (6) and
replication.
Performance
Adding OSDs (26) improves replicated performance
Source: Diana Gudu, KIT
Source:DianaGudu,KIT
Source:DianaGudu...
Features we want from Ceph:
- Asynchronous Replication
- Erasure Coding
- Tiering
- Multi-datacenter/Rados level async rep...
Other Ceph Projects:
- TCHPC: 100TB cluster used for backups
- collaboration with KIT/PSNC: performance testing, WAN
scale...
DRI: www.dri.ie
Trinity HPC: www.tchpc.tcd.ie
Trinity College Dublin: www.tcd.ie
Questions?
Links:
Ceph: www.ceph.com
HDFS: hadoop.apache.org
IRODS: www.irods.org
GPFS: www.ibm.com/systems/software/gpfs/
Project Hy...
Upcoming SlideShare
Loading in...5
×

Ceph at the Digital Repository of Ireland - Ceph Day Frankfurt

826

Published on

Peter Tiernan, Trinity College

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
826
On Slideshare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
70
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Transcript of "Ceph at the Digital Repository of Ireland - Ceph Day Frankfurt "

  1. 1. Peter Tiernan Systems and Storage Engineer Digital Repository of Ireland TCHPC Ceph at the DRI
  2. 2. DRI: The Digital Repository Of Ireland (DRI) is an interactive, national trusted digital repository for contemporary and historical, social and cultural data held by Irish institutions. The DRI follows the Open Archival Information System (OAIS) ISO reference model and The Trusted Repository Audit Checklist (TRAC)
  3. 3. OAIS Model: - is concerned with all technical aspects of digital repositories - describes ‘components and services required to develop and maintain archives’ - is broken down into 'Functional Entities' and 'Work Packages'. - WP8 is responsible for the ‘Archival Storage’ functional entity.
  4. 4. OAIS Model: Source:www.digital-preservation.com
  5. 5. DRI Storage Requirements: OAIS/TRAC requires the following from storage: - Minimal conditions for performing long-term preservation of digital assets - Long Term Preservation of digital assets, even if the OAIS (repository) itself is not permanent or present.
  6. 6. DRI Storage Requirements: - Open Source/Open Standards - Independence - High Availability - Dynamically Configurable - Ease of Interoperability (Interfaces, APIs) - Data Security/Placement (Replication, Erasure coding, Placement, Tiering, Federation) - Self Contained - Commodity Hardware
  7. 7. Storage Solutions We Tested:
  8. 8. Why we didn't choose HDFS: - Interfaces limited. Not posix compliant due to immutable nature of filesystem. - Performance geared towards large data streams. I/O of many small files is poor. - Single point of failure and bottleneck at its Namenode. - Doesn’t provide any federation
  9. 9. Why we didn't choose iRODS: - Default Interfaces limited. No Restful, RBD. - Single point of failure at its iCAT metadata server - Overlapping functionality with Fedora Commons Why we didn't choose GPFS: - Default Interfaces limited. No Restful, RBD. - Data Replica limit of 2. - Closed source
  10. 10. Why we chose Ceph: - We like its distributed, clustered architecture - Provides complete high availability on install - Scales out horizontally to massive levels - Data Security: Distributed, Replicated - Many interface options - Rich, documented, multi-level APIs - Dynamically configurable - Very good Performance for general use (many small file I/O) - Solid release schedule, new features
  11. 11. Findings: HDFS iRODS Ceph GPFS API Yes Yes Yes Yes Fedora 3.6.x Driver Yes No No No Interface: Posix No Yes Yes Yes Interface: RBD No No Yes No Interface: RESTful Yes No Yes No Dynamic Configuration Yes Yes Yes Yes High Availability: Data Yes Yes Yes Yes High Availability: Service No No Yes Yes Max Raw Storage (PetaByte) >100 N/A >100 4 - 10^14 On-Read Data Checking No Yes No No Max Replicas 512 >2 ~2.1 Billion 2 Federation No Yes No Yes
  12. 12. DRI Infrastructure
  13. 13. Performance Poor performance with low number of OSDs (6) and replication.
  14. 14. Performance Adding OSDs (26) improves replicated performance Source: Diana Gudu, KIT Source:DianaGudu,KIT Source:DianaGudu,KIT
  15. 15. Features we want from Ceph: - Asynchronous Replication - Erasure Coding - Tiering - Multi-datacenter/Rados level async replication - Micro-services
  16. 16. Other Ceph Projects: - TCHPC: 100TB cluster used for backups - collaboration with KIT/PSNC: performance testing, WAN scale replication testing (Sync/Async)
  17. 17. DRI: www.dri.ie Trinity HPC: www.tchpc.tcd.ie Trinity College Dublin: www.tcd.ie Questions?
  18. 18. Links: Ceph: www.ceph.com HDFS: hadoop.apache.org IRODS: www.irods.org GPFS: www.ibm.com/systems/software/gpfs/ Project Hydra: projecthydra.org Fedora Commons: www.fedora-commons.org Apache SOLR: lucene.apache.org/solr/ HAProxy: haproxy.1wt.eu
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×