Long-Term Storage
Panel Session

Erik Riedel, EMC
Library of Congress Workshop
September 2012

top picture “Once Blue” by Jesse Wagstaff via flickr/cc
right picture by Austin Marshall via flickr/cc

revision 3
  
Parameters

•  Non-compressible data
•  Long-term storage
•  Very high reliability
•  Request rate of 10% per year
•  5, 20, 50 PB in 2012, 2015, 2018
  
Density

2012     Disks (raw) @ 3TB    Disks (protected)   Racks @ 480 disks
5 PB     1,700 disks          2,700 disks         6 racks
20 PB    6,700 disks          11,000 disks        23 racks
50 PB    17,000 disks         27,000 disks        56 racks
  
2015     Disks (raw) @ 6TB    Disks (protected)   Racks @ 600 disks
5 PB     830 disks            1,300 disks         3 racks
20 PB    3,300 disks          5,300 disks         9 racks
50 PB    8,300 disks          13,000 disks        23 racks
  
2018     Disks (raw) @ 10TB   Disks (protected)   Racks @ 600 disks
5 PB     500 disks            800 disks           2 racks
20 PB    2,000 disks          3,200 disks         6 racks
50 PB    5,000 disks          8,000 disks         14 racks
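
The density figures appear to follow from three inputs: disk size, the 1.6x erasure-coding overhead listed under Assumptions, and disks per rack. The sketch below is one way to reproduce them. The function name `size_system`, the use of decimal units (1 PB = 1,000 TB), and rounding racks up to a whole number are assumptions made for illustration, not something the slides state.

```python
import math

OVERHEAD = 1.6  # erasure-coding overhead, from the Assumptions slide

# (disk size in TB, disks per rack) per year, as given in the Density tables
CONFIGS = {2012: (3, 480), 2015: (6, 600), 2018: (10, 600)}


def size_system(capacity_pb, year):
    """Return (raw disks, protected disks, racks) for a target capacity.

    Disk counts are left unrounded; racks are rounded up because a
    partly filled rack still takes a rack position (assumption).
    """
    disk_tb, disks_per_rack = CONFIGS[year]
    raw = capacity_pb * 1000 / disk_tb        # assumes 1 PB = 1,000 TB
    protected = raw * OVERHEAD
    racks = math.ceil(protected / disks_per_rack)
    return raw, protected, racks


for year in CONFIGS:
    for pb in (5, 20, 50):
        raw, protected, racks = size_system(pb, year)
        print(f"{year}  {pb:>2} PB  raw {raw:>8,.0f}  "
              f"protected {protected:>8,.0f}  racks {racks}")
```

The slide values look like these results rounded to two significant figures (for example 1,667 raw disks shown as 1,700), with rack counts computed from the unrounded disk counts.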
  
Performance

2012     10%/yr      Disks    Disk BW    Racks   Bandwidth   Actual BW   Days-to-fill
5 PB     16 MB/s     2,700    200 GB/s   6       30 GB/s     3 GB/s      19
20 PB    63 MB/s     11,000   1.1 TB/s   23      115 GB/s    11 GB/s     20
50 PB    159 MB/s    27,000   2.7 TB/s   56      280 GB/s    28 GB/s     21
  
2012     10%/2day    Disks    Disk BW    Racks   Bandwidth   Actual BW   Days-to-fill
5 PB     2.9 GB/s    2,700    200 GB/s   6       30 GB/s     3 GB/s      19
20 PB    11 GB/s     11,000   1.1 TB/s   23      115 GB/s    11 GB/s     20
50 PB    29 GB/s     27,000   2.7 TB/s   56      280 GB/s    28 GB/s     21
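
The request-rate columns can be checked with a short calculation: 10% of the stored data divided by the time window in which it is read (a full year for 10%/yr, two days for 10%/2day). This is a sketch under the assumption of decimal units (1 PB = 1e15 bytes); the function name `request_rate` is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def request_rate(capacity_pb, fraction=0.10, active_days=365):
    """Average bytes/second needed to read `fraction` of the data
    when all reads fall within `active_days` days."""
    bytes_accessed = capacity_pb * 1e15 * fraction   # assumes 1 PB = 1e15 bytes
    return bytes_accessed / (active_days * SECONDS_PER_DAY)


for pb in (5, 20, 50):
    even = request_rate(pb) / 1e6                    # MB/s, spread over the year
    bursty = request_rate(pb, active_days=2) / 1e9   # GB/s, squeezed into 2 days
    print(f"{pb:>2} PB  even {even:6.1f} MB/s  bursty {bursty:5.1f} GB/s")
```

Under these assumptions the even case gives roughly 16, 63 and 159 MB/s, matching the table. The bursty case gives about 2.9, 11.6 and 28.9 GB/s, so the 11 GB/s shown for 20 PB may have been truncated rather than rounded.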
  
2018     10%/2day    Disks    Disk BW    Racks   Bandwidth   Actual BW   Days-to-fill
5 PB     2.9 GB/s    800      80 GB/s    2       10 GB/s     3.3 GB/s    17
20 PB    11 GB/s     3,200    320 GB/s   6       30 GB/s     10 GB/s     23
50 PB    29 GB/s     8,000    800 GB/s   14      70 GB/s     23 GB/s     25
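
The Bandwidth, Actual BW and Days-to-fill columns can be reproduced from the rack counts and the Assumptions slide (40 Gb/s per rack; 1/10 of theoretical in 2012 and 2015, up to 1/3 in 2018). The slides do not define Days-to-fill; dividing the stored capacity by the actual bandwidth matches the tables, so that reading is assumed here. Function names are hypothetical.

```python
RACK_GBITS_PER_S = 40      # 4x 10 GbE per rack, from the Assumptions slide
SECONDS_PER_DAY = 86_400


def network_bandwidth(racks, efficiency):
    """Return (theoretical, actual) aggregate bandwidth in GB/s."""
    theoretical = racks * RACK_GBITS_PER_S / 8   # gigabits -> gigabytes
    return theoretical, theoretical * efficiency


def days_to_fill(capacity_pb, actual_gb_per_s):
    """Days needed to move the full capacity at the actual bandwidth
    (assumed meaning of the Days-to-fill column; 1 PB = 1e6 GB)."""
    return capacity_pb * 1e6 / actual_gb_per_s / SECONDS_PER_DAY


# (year, capacity in PB, racks, fraction of theoretical bandwidth achieved)
scenarios = [
    (2012, 5, 6, 1 / 10), (2012, 20, 23, 1 / 10), (2012, 50, 56, 1 / 10),
    (2018, 5, 2, 1 / 3), (2018, 20, 6, 1 / 3), (2018, 50, 14, 1 / 3),
]
for year, pb, racks, eff in scenarios:
    theoretical, actual = network_bandwidth(racks, eff)
    print(f"{year}  {pb:>2} PB  {theoretical:6.1f} GB/s theoretical  "
          f"{actual:5.1f} GB/s actual  {days_to_fill(pb, actual):4.1f} days")
```

Because network bandwidth scales with the number of racks while denser disks shrink the rack count, the fill time stays in the range of weeks even with the improved 2018 efficiency.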
  
Cost	
  
2012     $/month @ $0.01/GB
5 PB     $50,000/month
20 PB    $200,000/month
50 PB    $500,000/month

Cost if using e.g. “cold” public cloud storage
  
2012            sq ft/person   $/sq ft   $/month
20 employees    90             $48       $86,000/month    Washington, DC
80 employees    75             $48       $288,000/month   Washington, DC
200 employees   75             $24       $360,000/month   Minneapolis, MN
  
For comparison, the cost to “store” 20 librarians or data scientists
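
Both cost tables reduce to simple products, which the following sketch makes explicit. It assumes 1 PB = 1,000,000 GB and takes the office figures as labeled on the slide (square feet per person times dollars per square foot times headcount); the function names are hypothetical.

```python
def cloud_cost_per_month(capacity_pb, dollars_per_gb=0.01):
    """Monthly cost of keeping `capacity_pb` in cold cloud storage."""
    return capacity_pb * 1_000_000 * dollars_per_gb   # assumes 1 PB = 1e6 GB


def office_cost_per_month(employees, sqft_per_person, dollars_per_sqft):
    """Office-space cost for a group of employees, as tabulated on the slide."""
    return employees * sqft_per_person * dollars_per_sqft


for pb in (5, 20, 50):
    print(f"{pb:>2} PB in cloud: ${cloud_cost_per_month(pb):,.0f}/month")

# (employees, sq ft per person, $ per sq ft, city) rows from the slide
offices = [
    (20, 90, 48, "Washington, DC"),
    (80, 75, 48, "Washington, DC"),
    (200, 75, 24, "Minneapolis, MN"),
]
for employees, sqft, rate, city in offices:
    cost = office_cost_per_month(employees, sqft, rate)
    print(f"{employees:>3} employees in {city}: ${cost:,.0f}/month")
```

The first office row comes out at $86,400, which the slide rounds to $86,000. The other rows match the slide exactly.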
  
Assumptions

•  Data protection in a single data center, using an erasure-coding scheme at 1.6x overhead
•  480 drive racks in 2012 (40U)
•  600 drive racks in 2015 and 2018 (50+U)
•  10%/year access assumes 10% of total data is accessed in even distribution over 365 days/year, 24 hours/day (optimistic)
•  10%/2day access assumes 10% of data is accessed on only 2 days per year, say Thanksgiving and Xmas (very bursty)
•  Bandwidth is theoretical bandwidth at 40 Gb/s per rack (4x 10 GbE)
•  Actual bandwidth is 1/10 of theoretical maximum for 2012 and 2015; up to 1/3 theoretical max for 2018 (software improvements)
•  sq ft per person and $/sq ft references
   http://www.inc.com/news/articles/2010/10/washington-dc-rents-top-those-in-nyc.html
   http://newsfeed.time.com/2011/02/08/youre-not-imagining-it-your-cubicle-is-getting-smaller/
  
References

•  Why access to data matters, not just “dark storage”, but wide access to electronic data:
   –  The Internet Archive
   –  http://archive.org/about/
   –  History of the Internet, still online after 20 years
   –  http://www.cs.cmu.edu/~riedel/library/birthday.html
      (from April 2003, LoC workshop on Digital Preservation)
•  What about Flash?
   –  Death of Disks (has been widely exaggerated)
   –  http://www.cs.cmu.edu/~riedel/#HECFSIO2011
   –  How to Build Big Storage as a Cloud
   –  http://storageconference.org/2012/Presentations/R00.Keynote.pdf
  
Backup	
  
What About Tape?

pictures by Gill Wildman via flickr/cc
  
•  Tapes are not a commodity technology
•  2011 total worldwide market for tape cartridges is about 8m units (just under $1b annual revenue)
•  Compare to the HDD business at 650m units in 2010 (close to $40b annual revenue)
•  80 disk drives are manufactured for each tape cartridge; robots are complicated
•  Fits particular application segments very well, but is not a general-purpose solution

http://www.storagenewsletter.com/news/tapes/sccg-ww-tape-market-lto-1q11
http://techreport.com/discussions.x/20890
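
The "80 disk drives for each tape cartridge" figure follows from the two unit volumes quoted above. A minimal check is sketched below; it treats the approximate revenue figures ("just under $1b", "close to $40b") as round numbers, so the per-unit revenue values are rough estimates only.

```python
TAPE_UNITS = 8e6        # tape cartridges per year (2011), from the slide
HDD_UNITS = 650e6       # hard disk drives per year (2010), from the slide
TAPE_REVENUE = 1e9      # "just under $1b", rounded up for illustration
HDD_REVENUE = 40e9      # "close to $40b", rounded up for illustration

drives_per_cartridge = HDD_UNITS / TAPE_UNITS
print(f"disk drives per tape cartridge: {drives_per_cartridge:.0f}")
print(f"rough revenue per cartridge: ${TAPE_REVENUE / TAPE_UNITS:,.0f}")
print(f"rough revenue per disk drive: ${HDD_REVENUE / HDD_UNITS:,.0f}")
```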
  
David Anderson, James Dykes, Erik Riedel “SCSI vs. ATA - More than an interface” 2nd Conference on File and Storage Technology (FAST). San Francisco, CA. April 2003. www.cs.cmu.edu/~riedel/#SCSIvsATA
  
