Guiding ideas of the FACIT Storage Architecture: UTK Presentation


  • All the problems of long-term preservation: multiplicity and complexity of data formats; a highly distributed user community; routine need for historical data; transparent transport of data to the LoC.
  • The view that technology refresh and data migration are at the center of the action.
  • IBP is similar to the T10 Object Storage Device approach, and also similar to peer-to-peer approaches.
  • This is a wide-area example, but exNodes work the same way when you stripe data across the depots of a cluster. This enables software RAID.
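The "software RAID" idea in this note can be sketched in a few lines: stripe a byte string round-robin across several depots and keep one XOR parity block per stripe, so the contents of any single lost depot can be reconstructed. This is a minimal illustration only; the depots here are plain Python lists standing in for real IBP allocations, and the block size is arbitrary.

```python
def stripe_with_parity(data: bytes, n_depots: int, block_size: int = 4):
    """Split data into fixed-size blocks, deal them round-robin across
    n_depots depots, and compute one XOR parity block per stripe."""
    blocks = [data[i:i + block_size].ljust(block_size, b"\x00")
              for i in range(0, len(data), block_size)]
    while len(blocks) % n_depots:            # pad to whole stripes
        blocks.append(b"\x00" * block_size)
    depots = [blocks[d::n_depots] for d in range(n_depots)]
    parity = []
    for s in range(len(blocks) // n_depots):
        p = bytes(block_size)                # all-zero accumulator
        for d in range(n_depots):
            p = bytes(x ^ y for x, y in zip(p, depots[d][s]))
        parity.append(p)
    return depots, parity

def recover_depot(depots, parity, lost: int):
    """Rebuild one lost depot's blocks by XORing each stripe's parity
    block with the corresponding blocks of the surviving depots."""
    recovered = []
    for s, p in enumerate(parity):
        block = p
        for d, depot in enumerate(depots):
            if d != lost:                    # use only surviving depots
                block = bytes(x ^ y for x, y in zip(block, depot[s]))
        recovered.append(block)
    return recovered
```

As with disk-based RAID, this tolerates the loss of any one depot per stripe at the cost of one parity block; wider codes (the "error encoding" mentioned later in the deck) would tolerate more.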
  • Measured transfer: 30 threads, 5 MB blocks, 12.78 MB/s (102.24 Mb/s).

    1. Guiding ideas of the FACIT Storage Architecture
       Library of Congress Storage Vendor Meeting, Sept 18, 2007
       Terry Moore, University of Tennessee, Knoxville
       Larry Carver, University of California at Santa Barbara
    2. What is FACIT
       • FACIT: Federated Archive Cyberinfrastructure Testbed
       • FACIT: project of the National Geospatial Data Archive (NDIIPP partner)
       • Goal of FACIT: create a testbed to test a different approach to federated resource sharing, redundancy, and access
       • FACIT partners:
         • NGDA (UCSB and Stanford)
         • Logistical Networking (UTK): network storage technology
         • REDDnet (Vanderbilt): NSF-funded infrastructure using LN for data-intensive collaboration
    3. Typical design perspective
       [Timeline diagram: recent content; take action now; "The archive begins today"; horizon extends from now to now + 100 years.]
    4. FACIT design perspective
       [Timeline diagram: the "mid-century perspective" spans now - 50 to now + 50 years; ancient, old, and current content; take action now.]
    5. What the archivist in the middle sees
       • Repeated migrations across storage media and storage systems
         • past and future: 20 to 30+ over a century
       • Repeated migrations across archive systems
         • each possibly necessitating transformation and reorganization of archived content
       • Repeated handoffs between institutions
         • each implementing different policies
       • How can we create a "handoff process" that can be sustained?
         • Design for interoperability and deployment scalability first
         • How do you do that?
    6. Generic storage stack
       • The common interface "virtualizes" the technology beneath it
       • What interface goes here?
       • Issue: whatever you choose will become the basis for storage interoperability for adopters
       • LN hypothesis: do it the way the network people did it
    7. "Bits are bits" infrastructure for storage
       • Standardize on what we have an adequate common model for
         • Storage/buffer management
         • Coarse-grained data transfer
       • Leave everything else to higher layers
         • End-to-end services: checksums, encryption, error encoding, etc.
       • Enable autonomy in wide-area service creation: security, resource allocation, QoS guarantees…
       • Gain the benefits of interoperability today!
       • One infrastructure serves all
    8. Basic elements of the LN stack
       • Highly generic, "best effort" protocol for using storage
         • Generic → doesn't restrict applications
         • "Best effort" → low burden on providers
         • Easy to port and deploy
       • Metadata container for bit-level structure
         • Modeled on the Unix inode
         • Bit-level structure, control keys, …
         • XML encoded
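To make the inode analogy concrete, here is a sketch of what an XML-encoded, inode-style metadata container for a striped file might look like. The element and attribute names ("exnode", "mapping", "read_cap") and the depot hostnames are illustrative assumptions for this example, not the actual exNode schema published by the LoCI project.

```python
import xml.etree.ElementTree as ET

def build_exnode(filename: str, mappings) -> str:
    """mappings: iterable of (offset, length, depot, capability) tuples,
    each describing where one extent of the file's bytes is stored."""
    root = ET.Element("exnode", {"filename": filename})
    for offset, length, depot, cap in mappings:
        m = ET.SubElement(root, "mapping")
        ET.SubElement(m, "offset").text = str(offset)
        ET.SubElement(m, "length").text = str(length)
        ET.SubElement(m, "depot").text = depot
        ET.SubElement(m, "read_cap").text = cap
    return ET.tostring(root, encoding="unicode")

# Hypothetical two-extent file spread across two depots.
xml_doc = build_exnode("scan-001.tif", [
    (0, 100, "depot.utk.edu", "ibp://cap-a"),
    (100, 100, "depot.ucsb.edu", "ibp://cap-b"),
])
```

Like an inode's block pointers, each mapping says which bytes live where; unlike an inode, the "where" can be any depot on the network, which is what lets the metadata cross administrative domains.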
    9. Sample exNodes
       [Diagram: exNodes for data objects A, B, and C (byte offsets 0, 100, 200, 300) mapped onto IBP depots across the network at Tennessee, UCSB, Stanford, and REDDnet. Crossing administrative domains, sharing resources. Question: where is data object C?]
    10. New federation members?
        [Diagram: the LoC joins the federation of IBP depots at Tennessee, UCSB, Stanford, and REDDnet. Steps: add new depots; rewrite the exNodes; copy the data.]
    11. LN file download
        [Diagram: an exNode holding 4 copies of the file; the download pulls from multiple IBP depots over parallel TCP streams.]
    12. LN file download
        [Diagram continued.]
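The download slides show a client exploiting the exNode's multiple copies. A minimal sketch of that fallback logic, assuming an ordered replica list and a `fetch` callable standing in for a real IBP load operation (the depot names and outage pattern below are hypothetical):

```python
def download_extent(replicas, fetch):
    """replicas: ordered list of depots holding the same bytes (nearest
    or fastest first). Returns the first successfully fetched copy."""
    last_err = None
    for depot in replicas:
        try:
            return fetch(depot)
        except IOError as err:
            last_err = err                 # remember the failure, try next copy
    raise IOError("all %d replicas failed" % len(replicas)) from last_err

# Usage: the first two depots are down; the third serves the data.
def flaky_fetch(depot):
    if depot in ("depot.utk.edu", "depot.ucsb.edu"):   # simulated outages
        raise IOError("depot unreachable: " + depot)
    return b"extent bytes"

data = download_extent(
    ["depot.utk.edu", "depot.ucsb.edu", "depot.stanford.edu"], flaky_fetch)
```

A real client would also fetch disjoint extents from different replicas in parallel, which is where the multiple TCP streams in the diagram come from.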
    13. REDDnet depot unit: COTS
        • Dual-core 2.4 GHz AMD 64 X2 processor with 4 GB of memory
        • 4 × 750 GB SATA2 drives in hot-swap bays
        • Dual GigE NICs
        • OS stored on a USB-header-mounted transflash drive; all disk drives available for use
        • >$700 per TB
        • "But there's so much it doesn't do!" True.
        • Question: how much can we do in software on top? E.g., checksums, error encoding, encryption, etc.
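One answer to the slide's question: end-to-end checksums need nothing from the depot at all. A sketch, assuming the client keeps a SHA-256 digest alongside its metadata and a plain dict stands in for a best-effort depot (`hashlib` is Python's standard hashing module):

```python
import hashlib

def store(depot: dict, key: str, data: bytes) -> str:
    """Write to the depot; return the digest to keep with the metadata."""
    depot[key] = data
    return hashlib.sha256(data).hexdigest()

def load(depot: dict, key: str, expected: str) -> bytes:
    """Read from the depot and verify end-to-end before trusting it."""
    data = depot[key]
    if hashlib.sha256(data).hexdigest() != expected:
        raise IOError("end-to-end checksum mismatch for " + key)
    return data

depot = {}
digest = store(depot, "block-0", b"archived bits")
ok = load(depot, "block-0", digest)       # verifies cleanly
depot["block-0"] = b"bit rot!"            # simulate silent corruption
try:
    load(depot, "block-0", digest)
    detected = False
except IOError:
    detected = True                       # corruption caught at the client
```

Because the check is done by the client, untrusted commodity depots like the one above can be used without weakening integrity guarantees; encryption and error encoding layer on in the same end-to-end fashion.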
    14. What is "data logistics"?
        • "Information is physical": though abstract in one sense, information exists only if it is physically encoded.
        • When the data gets big relative to the resources you have, its physicality becomes clear: "It does matter where you are."
        • Data logistics is the management of the movement and positioning of digital data, and the computing resources it requires, in order to enable people to take action at a given time and place to achieve some purpose.
        • The logistical challenges of data-intensive collaboration require storage to be everywhere. But how can you make storage deployment that scalable?