SPONSORED WORKSHOP by Amplidata from Structure:Data 2012:


Sponsored workshop presentation from Paul Speciale, Amplidata
More at http://event.gigaom.com/structuredata/

Published in: Technology, Business

  1. Big “Unstructured” Data: A Case for Optimized Object Storage
     Paul Speciale
     Friday, July 27, 2012
  2. Storage facts and trends
     Recent studies estimate that data storage capacities will likely increase by over 30X in the coming decade, to over 35 zettabytes (35 ZB).
     [Figure: unstructured data storage consumption over time, growing 30X to 35 ZB by 2020; callouts: high-capacity drives, less staff per TB]
  3. Storage facts and trends
     But the number of qualified people available to manage this huge volume of data will stay nearly flat (~1.5X).
     Administrators will be expected to manage 20X more data each.
     Efficiency: automate and reduce overhead.
     [Figure: storage requirements vs. storage budget over time; vertical axis: capacity / budget]
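The 20X figure on this slide follows from the two growth factors quoted in the deck: 30X more data divided across 1.5X as many administrators. A minimal sketch of that arithmetic (the function name is illustrative, not from the presentation):

```python
def per_admin_growth(data_growth: float, staff_growth: float) -> float:
    """Return how much more data each administrator must manage.

    Both arguments are growth multipliers over the same period,
    e.g. 30.0 for "30X more data" and 1.5 for "1.5X more staff".
    """
    return data_growth / staff_growth


# Figures quoted on slides 2 and 3: 30X data growth, ~1.5X staff growth.
print(per_admin_growth(30.0, 1.5))  # 20.0
```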
  4. Storage facts and trends
     • Much of that growth (80%) is driven by unstructured data
     • Billions of large objects and files
     [Figure labels: Media Archives, Online Images, Large Files, Medical Images, Online Storage, Online Movies]
  5. Storage facts and trends: Media & Entertainment Industry Example
     M&E is driving huge capacity requirements, in file sizes, file counts, and storage capacity in use, driven by HD and 3D video formats:
     • “Petabytes are peanuts”
     • 3 TB per hour for 4K video
  6. Big Data for Analytics vs. Big “Unstructured” Data
  7. Big Data for Analytics
     • In the 1990s, we experienced an explosion of data captured for analytics purposes:
       • Academic Research
       • Chemical R&D facilities
       • Travel industry
       • Geo-industry, oil & gas
       • Financial / Trading
       • Agriculture
     • In the 2000s, online applications & social media triggered a flood of trend data
  8. Big Data for Analytics
     • Data is captured as many small log files & concatenated as “Big Data”
     • Relational databases were not optimal:
       • Too much data, too big
       • Insufficient performance for analytics
     • This stimulated innovations:
       • Hadoop, MapReduce, GFS
       • XML databases
     • => This is Big Data for Analytics
  9. Big Data Evolution
     • Today, the Big Data trend refers to both Big Data for Analytics & Big Unstructured Data:
       • Media
       • Streaming
       • Business
       • Scientific
     • Fundamentally different data, but with lots of similarities:
       • Immense capacities
       • Number of transactions or objects
     • Unstructured data is traditionally stored on host file systems, but:
       • Host file systems impose fixed limits and do not scale up to the size we need
       • File systems do not meet performance requirements because the host limits access
  10. Big Unstructured Data
      • Most unstructured data is archived, often to tape (for cost reasons), where it is then difficult to access
      • Volumes are increasing exponentially
      • Data archives are an organization & management burden (Grandma’s Attic)
  11. Big Unstructured Data
      • Companies are starting to see the value of the data in their archives:
        • Documents of individuals can be valuable for others
        • Some companies have legal reasons to keep data available
        • Unexplored analytics opportunities
      • This data can be mined and monetized
  12. Big Unstructured Data
      But how do we store all this data in a cost-efficient way?
      “Building cost-efficient Live Archives”
  13. Big Unstructured Data
      What are the requirements?
      • Tape is a difficult option: access latency is key
      • Data has to be always available online
      • Direct interface to the applications
      • Petabyte scalability
      • Extreme reliability, integrity
      • Cost-efficient
      • Security
      Disk Storage (online, low-latency access)
      + Open application APIs (App & Cloud-enabled)
      + Ultra-high data durability (Erasure Coding)
      = Optimized Object Storage
  14. Disk vs. Tape
      Tape has several obvious advantages over disk & there will always be use cases for tape.
      But disks enable live archives with instant data accessibility.
      More arguments for disk-based archives:
      • Disks can be powered down
      • Tape requires replication to protect against media errors
      • Data integrity checking
      • Massive migration projects
      • …
  15. Object Storage Simplifies this Problem
      • File system organization of data becomes a burden
      • File systems impose limitations on numbers of files & directories
      • Very time-consuming to organize data
      • Object Storage simplifies this problem
      • Flat “Namespaces” (not file systems), without storage limits
      • Lets the applications talk directly to the storage
      • Use “Object” application APIs to let applications directly manage objects & metadata
      • File Gateways can be used as a transition bridge
      • Bring legacy data and apps into Object Storage
      [Figure: three applications connected to the storage through an Object API]
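To make the flat-namespace idea on this slide concrete, here is a minimal in-memory sketch of what an object-style interface could look like: objects are addressed by a single key, with no directories, and carry their own metadata. The class and method names (`FlatObjectStore`, `put`, `get`, `head`, `delete`) are hypothetical and chosen for illustration only; they are not the API of any product named in the deck.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StoredObject:
    """An object's payload together with its application-defined metadata."""
    data: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


class FlatObjectStore:
    """Hypothetical in-memory stand-in for an object API.

    All objects live in one flat namespace keyed by a string; there is
    no directory tree for the application to organize or traverse.
    """

    def __init__(self) -> None:
        self._objects: Dict[str, StoredObject] = {}

    def put(self, key: str, data: bytes,
            metadata: Optional[Dict[str, str]] = None) -> None:
        """Store (or overwrite) an object and its metadata under key."""
        self._objects[key] = StoredObject(data, dict(metadata or {}))

    def get(self, key: str) -> bytes:
        """Return the object's payload; raises KeyError if absent."""
        return self._objects[key].data

    def head(self, key: str) -> Dict[str, str]:
        """Return a copy of the object's metadata without its payload."""
        return dict(self._objects[key].metadata)

    def delete(self, key: str) -> None:
        """Remove the object; raises KeyError if absent."""
        del self._objects[key]


# Usage: the application manages objects and metadata directly by key.
store = FlatObjectStore()
store.put("scan-000123", b"pixels", {"modality": "MRI"})
print(store.head("scan-000123"))
```

A real object store would add durability, access control, and a network protocol; the sketch only shows the shape of the interface the slide describes.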
  16. Petabyte Scalability and Beyond
      Systems should scale BIG
      • Beyond petabytes of data, with no built-in limits
      • Beyond billions of data objects
      Systems should scale uniformly
      • Add resources incrementally and grow as a Single System View
      • Manage from a “Single Pane of Glass”
      • Scale performance and capacity separately
      • Migration and seamless growth across newer generations of component technologies (processors, disk densities)
  17. Ultra-High Levels of Data Integrity
      • Data needs to be archived for lifetimes
      • Expect “bit perfect” integrity to store the gold copy of critical assets
      • Consolidate multiple copies of data into a single highly durable tier
      • Ensuring the integrity of a long-term unstructured data archive requires new data protection algorithms, to:
        • Address the increasing capacity of disk drives
        • Solve issues related to long RAID rebuild windows
      “Object storage systems based on erasure-coding can not only protect data from higher numbers of drive failures, but also against the failure of entire storage modules.”
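The slide names erasure coding without describing a specific scheme. As a rough illustration of the principle only, the sketch below uses the simplest possible erasure code: k data fragments plus one XOR parity fragment, which can rebuild any single lost fragment. This is an assumption made for illustration, not the algorithm used by the presenter's product; production systems typically use codes that survive several simultaneous losses (Reed-Solomon is one well-known family). The function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal fragments (zero-padded) plus one parity fragment."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def decode(fragments: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original data when at most one fragment is missing (None)."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost fragment")
    repaired = list(fragments)
    if missing:
        # XOR of all surviving fragments equals the missing one.
        present = [f for f in fragments if f is not None]
        repaired[missing[0]] = reduce(xor_bytes, present)
    # Drop the parity fragment and the zero padding.
    return b"".join(repaired[:-1])[:length]


# Usage: spread an object over 4 data fragments + 1 parity, then lose one.
original = b"gold copy of a critical asset"
stored = encode(original, k=4)
stored[2] = None  # simulate a failed drive holding fragment 2
assert decode(stored, len(original)) == original
```

Each fragment would live on a different drive or storage module, so losing one of them does not lose data; codes with more parity fragments extend the same idea to the multiple-failure protection the quote describes.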
  18. Big Unstructured Data
      What are the requirements?
      • Tape is a difficult option: access latency is key
      • Data has to be always available online
      • Direct interface to the applications
      • Petabyte scalability
      • Extreme reliability, integrity
      • Cost-efficient
      • Security
      Disk Storage (online, low-latency access)
      + Open application APIs (App & Cloud-enabled)
      + Ultra-high data durability (Erasure Coding)
      = Optimized Object Storage
  19. Thank You!
      Paul Speciale, VP Products, Amplidata Inc.
      www.amplidata.com
  20. Sponsored Workshop