Did you really want that data?

2,501 views
Published in: Technology

A review of past work on data loss in large scale systems and a discussion of its implications for Apache Hadoop, including proposals for operations processes and future source code improvements.


Transcript

  • 1. Did you really want that data?
    Steve Loughran, Hewlett-Packard Laboratories
    Bristol Hadoop & NoSQL Workshop, September 2011
  • 2. Outline
    1. Problem: is the data safe?
    2. Memory
    3. Network
    4. Storage
  • 3. Is data in Apache™ Hadoop™ safe?
    1. Can data get lost or corrupted?
    2. Can this be detected and corrected?
    3. Where are the risks: RAM, network, storage?
    4. What about Hadoop itself?
    5. What can we do about this?
  • 4. Memory: risk grows linearly per GB of RAM
    - Microsoft 2011 study of consumer PCs: errors correlate with overclocking and CPU cycles; P(recurrence) = 0.3 [Nightingale 2011].
    - CERN: 1-bit errors unexpectedly low; 2-bit errors found when none were expected [Panzer-Steindel 2007].
    - Google: recurrent "hard" errors dominate; "chip-kill" is the best ECC; 8% of DIMMs see at least one error per year [Schroeder 2009].
    - "In more than 85% of the cases a correctable error is followed by at least one more correctable error in the same month" [Schroeder 2009].
  • 5. Memory: reduce the risk
    1. Use Chip-kill/Chipspare ECC for the servers that matter.
    2. Burn-in tests that read/write memory patterns.
    3. Monitor ECC failures and swap DIMMs with recurrent problems.
    4. Data-scrubbing for main memory?
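A burn-in pattern test of the kind item 2 suggests can be sketched as follows. This is only a toy illustration of the idea; real burn-in uses dedicated tools such as memtest86 on bare hardware, since a user-space buffer exercises only a sliver of RAM:

```python
# Toy sketch of a burn-in style pattern test (illustrative only).
# Writes known patterns into a buffer, reads them back, and records
# any offsets where the value read differs from the value written.

PATTERNS = [0x00, 0xFF, 0xAA, 0x55]  # all-zeros, all-ones, alternating bits

def pattern_test(size_bytes):
    """Return a list of (offset, wrote, read) mismatches over the buffer."""
    buf = bytearray(size_bytes)
    errors = []
    for pattern in PATTERNS:
        for i in range(size_bytes):
            buf[i] = pattern
        for i, value in enumerate(buf):
            if value != pattern:
                errors.append((i, pattern, value))
    return errors

errors = pattern_test(1 << 16)  # test a 64 KiB buffer
print("mismatches:", len(errors))
```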
  • 6. Network issues
    Risk: undetected corruption.
    - Ethernet: flawed CRC32 [Stone 2000].
    - IPv4: 16-bit header checksum; IPv6: no header checksum at all.
    - TCP + UDP: weak 16-bit ones'-complement sum only.
    - HTTP: optional Content-Length header.
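The weakness of the 16-bit sum is easy to demonstrate: because the ones'-complement sum is commutative over 16-bit words, reordering words in transit is invisible to it, while a CRC32 catches the same damage. The `internet_checksum` helper below is written for this demo, not taken from any library:

```python
import zlib

def internet_checksum(data: bytes) -> int:
    """RFC 1071 16-bit ones'-complement sum, as used by TCP, UDP and IPv4."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF

original = b"ABCDEFGH"
swapped = b"CDABEFGH"  # first two 16-bit words exchanged in transit

# Word reordering is invisible to the additive checksum...
assert internet_checksum(original) == internet_checksum(swapped)
# ...but changes the CRC32, which is position-sensitive.
assert zlib.crc32(original) != zlib.crc32(swapped)
print("16-bit sum missed the reordering; CRC32 caught it")
```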
  • 7. Recommendations for Hadoop
    1. Explore jumbo Ethernet frames.
    2. Servlets/JSP pages to send Content-Length headers.
    3. HTTP client code to verify Content-Length.
    4. Consider the Content-MD5 header [RFC 1864].
    5. Servlets/JSP pages to disable caching.
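Recommendations 3 and 4 amount to a client-side check like the sketch below; `verify_response` is a hypothetical helper written for illustration, not an existing Hadoop API:

```python
import base64
import hashlib

def verify_response(headers: dict, body: bytes) -> None:
    """Client-side integrity checks of the kind the slide recommends:
    compare the body against Content-Length and, if present, against the
    Content-MD5 header [RFC 1864] (base64 of the MD5 digest of the body)."""
    declared = headers.get("Content-Length")
    if declared is not None and int(declared) != len(body):
        raise IOError(f"short read: got {len(body)} of {declared} bytes")
    digest = headers.get("Content-MD5")
    if digest is not None:
        actual = base64.b64encode(hashlib.md5(body).digest()).decode("ascii")
        if actual != digest:
            raise IOError("Content-MD5 mismatch: body corrupted in transit")

body = b"block data"
headers = {
    "Content-Length": str(len(body)),
    "Content-MD5": base64.b64encode(hashlib.md5(body).digest()).decode("ascii"),
}
verify_response(headers, body)  # passes silently on an intact body
```

A truncated or corrupted body raises before the caller ever sees bad data, which is the point: fail loudly rather than process a silently short read.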
  • 8. Storage
    - HDFS saves data to disks in 64-2048 MB blocks.
    - Each block is (usually) replicated to three or more servers.
    - Each block has a CRC checksum stored alongside it.
    - Block checksums are checked on block reads.
    - Blocks are verified in idle time; once a week is apparently normal.
    - Checksum failures trigger re-replication from good copies.
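The checksum scheme can be sketched as follows. HDFS actually checksums each block in small chunks (512 bytes by default) and stores the CRCs in a sidecar metadata file next to the block; the helpers here are illustrative stand-ins for that mechanism:

```python
import zlib

CHUNK = 512  # HDFS checksums block data in small chunks (512 bytes by default)

def chunk_checksums(block: bytes):
    """Compute one CRC per chunk, as HDFS stores in a block's .meta file."""
    return [zlib.crc32(block[i:i + CHUNK]) for i in range(0, len(block), CHUNK)]

def first_bad_chunk(block: bytes, sums):
    """Re-verify on read; return the index of the first corrupt chunk, or None."""
    for idx, expected in enumerate(sums):
        if zlib.crc32(block[idx * CHUNK:(idx + 1) * CHUNK]) != expected:
            return idx
    return None

block = bytes(range(256)) * 8            # a 2 KiB stand-in for a block
sums = chunk_checksums(block)
corrupted = bytearray(block)
corrupted[700] ^= 0x01                   # flip one bit inside the second chunk
print(first_bad_chunk(bytes(corrupted), sums))  # -> 1
```

Chunked checksums localise the damage: a reader knows which 512-byte range is bad, rather than just that the whole multi-gigabyte block failed.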
  • 9. Hard Disk Drives: The Good, the Bad and the Ugly [Elerath 2007]
    - Head fly height
    - Head/platter contact: scratches and thermal asperities
    - Side-track erasure: may increase with density
    - Tracking problems: "sector not found"
    - Drive electronics
  • 10. Are disks the dominant contributor to storage failures?
    [Jiang 2008]: breakdown of failures across NetApp server classes.
  • 11. My interpretation of [Jiang 2008]
    1. Disk failures are only part of the problem.
    2. High-end interconnects only help once they have redundancy.
    3. RAID assumes the SPOF is the disk and corrects only for that.
  • 12. Between the HDD and HDFS
    - Physical interconnect [Jiang 2008]
    - Controller/HDD incompatibilities [Panzer-Steindel 2007, Ghemawat 2003]
    - DMA
    - Operating system
    - Device drivers
    - OS-level filesystem
  • 13. Hadoop itself
    - Risk of bugs in the Hadoop HDFS library: race conditions; intra-server checksums, not inter-server.
    - Past: the HDFS replicator under-replicating.
    - Pre-0.20.204: append unreliable.
    - Risk of bugs in JVM versions.
    - Namenode log corruption.
    - Only HDFS data is scrubbed; this is less efficient than SCSI/SATA VERIFY commands.
  • 14. Is your data safe?
    - P(single-block-corrupt) may increase with larger block sizes.
    - Time to recover increases with larger block sizes (disk-I/O-bound for a single block).
    - P(all-blocks-corrupt) may increase with larger block sizes.
    - Some compression schemes (gzip) are hard to recover from after a single-bit corruption.
    - How will the layers up the Hadoop stack cope?
    - Yahoo! show the risk is small, but non-zero [Radia 2011].
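The gzip fragility is easy to reproduce: flip a single bit in the middle of a stream and the whole member becomes undecompressable, because the corrupted deflate data either fails to decode or fails the trailer CRC. A minimal demonstration:

```python
import gzip
import zlib

payload = b"many log lines, compressed as one gzip stream\n" * 100
blob = bytearray(gzip.compress(payload))

blob[len(blob) // 2] ^= 0x01  # a single bit flip in mid-stream

try:
    gzip.decompress(bytes(blob))
except (OSError, zlib.error, EOFError) as err:
    print("unrecoverable:", err)
```

Everything after the flipped bit is effectively lost, which is why block-oriented codecs (LZO, or gzip applied per-record) limit the blast radius of one bad sector.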
  • 15. User actions
    One immediate option: replicate critical files at 4x.
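A back-of-envelope calculation shows what the extra replica buys. Assuming (optimistically) that replica losses are independent, each with probability p in some window, all copies are lost with probability p to the power r:

```python
# Illustrative only: assumes independent replica failures, which real
# clusters violate (shared racks, switches, batches of disks).
p = 1e-3  # assumed probability a given replica is lost/corrupt in the window

for r in (3, 4):
    print(f"replication {r}x: P(all copies lost) = {p ** r:.0e}")
```

Under these assumptions each extra replica buys a factor of 1/p. On a live cluster the setting is applied per file with something like `hadoop fs -setrep 4 /critical/path`.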
  • 16. Operational actions
    1. Burn-in tests.
    2. Monitor SMART errors and any reported corrupt blocks.
    3. Take any disk with SMART errors or corrupt blocks offline immediately.
    4. Test outside Hadoop: SATA VERIFY; others?
    5. Use LZO compression.
    6. Use ext4 with journalling.
    7. Share statistics with other HDFS users.
    One issue: if it's an interconnect problem, does a disk swap fix it?
  • 17. HDFS source enhancements
    - Add an option to SATA VERIFY critical data after a write.
    - Add methods to decommission/recommission a single drive.
    - Add monitoring of spill-data corruption.
    - Tools to recover from corrupted LZO files.
    - A tool for a bit-by-bit vote over 3+ inconsistent files.
    - Add a means to host small block sequences on the same node (assuming federated HDFS enables small blocks in large clusters).
    - Leave corrupt blocks alone to prevent physical sector reuse?
  • 18. Proposal: HDFS fault injection
    - Deliberately corrupt multiple HDFS block replicas.
    - Hand corrupt blocks to the layers above.
    - Report errors and timeouts on read/write operations.
    - Encourage upper layers to support and test recovery.
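A fault injector along these lines only needs to flip a few bits in a replica file on disk. The sketch below is a hypothetical stand-alone helper, not part of HDFS; a real injector would target actual block replica files in the datanode's storage directories:

```python
import os
import random
import tempfile

def corrupt_replica(path: str, flips: int = 8, seed: int = 0) -> None:
    """Flip `flips` randomly chosen bits in the file at `path`, so that
    read paths and the layers above can be tested against checksum failures."""
    rng = random.Random(seed)  # seeded so the injected fault is reproducible
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(flips):
            offset = rng.randrange(size)
            f.seek(offset)
            byte = f.read(1)[0]
            f.seek(offset)
            f.write(bytes([byte ^ (1 << rng.randrange(8))]))

# Demonstrate on a throwaway stand-in for a block replica file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 4096)
corrupt_replica(tmp.name, flips=8)
with open(tmp.name, "rb") as f:
    damaged = sum(b != 0 for b in f.read())
print(f"{damaged} corrupted bytes")
os.unlink(tmp.name)
```

Seeding the RNG matters: a fault that cannot be replayed cannot be used in a regression test.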
  • 19. Summary: data can get corrupted
    - Memory is a problem, but low risk outside the SPOFs.
    - Networking is manageable.
    - Storage is the threat, all the way down.
    - Operations tactics can mitigate this.
    - There is scope to improve HDFS and Hadoop networking.
    - Fault injection could stress the upper layers better.