Mark Cusack "Data Intensive Computing on the Amazon Cloud."

Mark Cusack's lightning talk at CloudCamp London #4, 9th July 2009. See the video at http://skillsmatter.com/podcast/cloud-grid/breako

Published in: Technology, Travel, Sports

  1. Structured Data Archiving: Key Issues
     9th July 2009 – CloudCamp, London
     Mark Cusack, Principal Architect, RainStor
  2. Ideal Use Case: Application Retirement
     [Diagram labels: Usage; Life Cycle; Implementation; Dev & Test; Mission Critical Production System - On-premise; Cloud Bursting, e.g. reporting & peak processing; Retirement; End-of-Life]
  3. Data Volume Issues
       • How to transfer in terabytes of data?
       • Structured data can be massively compressed before uploading to a storage cloud, e.g. Amazon S3:

     Network Connection   Upload time (1 TB)   Compressed (40:1) upload time
     DSL                  166 days             4 days
     10 Mbps              13 days              8 hours
     1 Gbps               Less than a day      Less than an hour
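The upload-time figures on slide 3 can be sanity-checked with a little arithmetic. The sketch below is an illustration only: it assumes a decimal terabyte (10^12 bytes) and a link that is either fully used or used at a hypothetical `efficiency` fraction, neither of which the slide states. The DSL row is left out because the slide gives no line speed for it.

```python
TB = 10**12  # decimal terabyte; an assumption, the slide does not define it


def upload_seconds(size_bytes, link_bits_per_s, compression_ratio=1.0, efficiency=1.0):
    """Seconds needed to send size_bytes over a link.

    compression_ratio: 40.0 means the data shrinks to 1/40 of its size first.
    efficiency: hypothetical fraction of the nominal link speed actually achieved.
    """
    bits_on_wire = size_bytes * 8 / compression_ratio
    return bits_on_wire / (link_bits_per_s * efficiency)


if __name__ == "__main__":
    for name, speed in [("10 Mbps", 10e6), ("1 Gbps", 1e9)]:
        raw = upload_seconds(TB, speed)
        packed = upload_seconds(TB, speed, compression_ratio=40.0)
        print(f"{name}: {raw / 86400:.2f} days raw, {packed / 3600:.2f} hours at 40:1")
```

On an ideal 10 Mbps link this gives roughly 9.3 days for 1 TB, somewhat less than the 13 days on the slide. Setting `efficiency=0.7` brings the result close to 13 days, but that value is a guess rather than something the talk specifies. Whatever the link assumptions, the 40:1 ratio divides the upload time by 40.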
  4. Data Security Issues
       • How to maintain data privacy and integrity?
       • Theoretical solution: homomorphic encryption!?
       • Reality: encrypt network pathways and data rest-points
       • Blind cloud storage
           - Key should be generated by the application owner
           - Data should be encrypted on-premise, prior to transfer to the cloud
       • Tamper-proofing and auditing
           - Keep digests of database files off-cloud
           - Practical in the case of application retirement (read-only)
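The "keep digests of database files off-cloud" point from slide 4 can be sketched as a digest manifest that is built on-premise and later compared against the archived files. This is a minimal illustration, not the method used in the talk: SHA-256 is an assumed choice of digest, and the function names `file_digest`, `build_manifest`, and `tampered_files` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 hex digest of one file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(archive_dir):
    """Map each file's relative path to its digest; store this off-cloud."""
    root = Path(archive_dir)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def tampered_files(archive_dir, manifest):
    """Names that were changed, added, or removed since the manifest was built."""
    current = build_manifest(archive_dir)
    names = manifest.keys() | current.keys()
    return sorted(n for n in names if manifest.get(n) != current.get(n))
```

Because a retired application's data is read-only, the manifest only has to be built once, which is why the slide calls this practical for that case.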
  5. Data Availability Issues
       • How to ensure that the data is always available?
       • Make multiple copies of the data
           - Data compression makes this cost-effective
       • Employ emerging cloud interoperability standards
           - De facto standards: Eucalyptus, Amazon Web Services
           - SNIA eXtensible Access Method (XAM): as an alternative cloud storage interoperability standard
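The claim on slide 5 that compression makes multiple copies cost-effective comes down to simple arithmetic. The sketch below reuses the 40:1 ratio from slide 3; the choice of three copies and a decimal terabyte are assumptions made purely for illustration.

```python
TB = 10**12  # decimal terabyte; an assumption


def replicated_footprint(raw_bytes, copies, compression_ratio):
    """Total bytes stored when every copy is compressed before replication."""
    return raw_bytes * copies / compression_ratio


if __name__ == "__main__":
    total = replicated_footprint(TB, copies=3, compression_ratio=40.0)
    print(f"3 compressed copies of 1 TB occupy {total / 10**9:.0f} GB "
          f"({total / TB:.1%} of one uncompressed copy)")
```

Under these assumptions, three compressed copies together take up less space than a tenth of a single uncompressed copy.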
  6. Data Query Issues
       • How to query the data without compromising performance, security and accessibility?
       • Use a compute cloud to query the data
           - E.g. run retired application reports on EC2 against S3 data
       • Query directly against compressed data
           - Problem becomes CPU-bound rather than IO-bound
       • Pass encryption key to the cloud on a per-query basis
       • Provide an ODBC/JDBC interface to compute instance
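Slide 6 does not say how queries run directly against compressed data. One generic technique that has this property is dictionary encoding, shown below as a stand-in rather than as RainStor's actual method. The class `DictColumn` and its methods are hypothetical names. The predicate is evaluated once per distinct value, and rows are then matched by comparing small integer codes, so the column is never expanded back to its original values.

```python
class DictColumn:
    """A column stored as integer codes plus a dictionary of distinct values."""

    def __init__(self, values):
        self.dictionary = []  # code -> distinct value, in first-seen order
        self.codes = []       # one small integer per row
        index = {}            # distinct value -> code, used only while encoding
        for v in values:
            code = index.get(v)
            if code is None:
                code = len(self.dictionary)
                index[v] = code
                self.dictionary.append(v)
            self.codes.append(code)

    def select(self, predicate):
        """Row numbers whose value satisfies predicate, without decoding rows."""
        # Test each distinct value once, then scan only the integer codes.
        matching = {c for c, v in enumerate(self.dictionary) if predicate(v)}
        return [row for row, c in enumerate(self.codes) if c in matching]


if __name__ == "__main__":
    country = DictColumn(["UK", "FR", "UK", "DE", "UK", "FR"])
    print(country.select(lambda v: v == "UK"))
```

Less data is read from storage than with uncompressed rows, and the remaining work is comparisons on codes, which is one way a query can shift from IO-bound towards CPU-bound as the slide describes.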
  7. More Information and Contact Details
       • www.rainstor.com
       • twitter.com/rainstor
       • twitter.com/markcusack
       • [email_address]
