Facebook’s Approach to Big Data Storage Challenge

Weiyan Wang
Software Engineer (Data Infrastructure – HStore)
March 1, 2013
Agenda
1   Data Warehouse Overview and Challenge

2   Smart Retention

3   Sort Before Compression

4   HDFS Raid

5   Directory XOR & Compaction

6   Q&A
Life of a tag in Data Warehouse
[Diagram: the path of a photo-tag log line through the warehouse.]
• A user tags a photo on www.facebook.com; a log line <user_id, photo_id> is generated.
• The log line reaches Scribeh (Scribe log storage) in ~10 seconds, where puma runs realtime analytics (e.g., count of users tagging photos in the last hour, ~1 minute latency).
• The copier/loader lands the log line in the Warehouse within ~1 hour; scrapes bring user info from UDB into the Warehouse within ~1 day.
• On top of the Warehouse, nocron runs periodic analysis (e.g., a daily report on counts of photo tags by country, ~1 day latency) and hipal serves adhoc analysis (e.g., count photos tagged by females age 20-25 yesterday).
History (2008/03–2012/03)
Data, Data, and more Data

Growth over four years:
• Facebook users: 14X
• Queries/day: 60X
• Scribe data/day: 250X
• Nodes in warehouse: 260X
• Size (total): 2500X
Directions to handle the data growth problem
• Improve the software
 •   HDFS Federation
 •   Prism

• Improve storage efficiency
 •   Store more data without increasing capacity
 •   Increasingly important; translates into millions of dollars in savings
Ways to Improve Storage Efficiency
• Better capacity management

• Reduce space usage of Hive tables

• Reduce replication factor of data
Smart Retention – Motivation
• Hive tables carry “retention” metadata
 •   Partitions older than the retention value are automatically purged by the system

• Table owners are unaware of table usage
 •   Difficult to set the retention value right at the beginning

• Improper retention settings may waste space
 •   e.g., users only accessed the most recent 30 days of partitions of a table with a 3-month retention
Smart Retention
• Add a post-execute hook that logs table/partition names and query start time to MySQL.

• Calculate the “empirical retention” per table
  Given a partition P whose creation time is CT_P:
      Data_age_at_last_query(P) = max{ StartTime_Q - CT_P : query Q accesses P }
  Given a table T:
      Empirical_retention(T) = max{ Data_age_at_last_query(P) : P ∈ T }
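  As a rough sketch of how the empirical retention could be derived from the audit log (the table name query_audit_log and its columns are assumptions for illustration, not the actual schema):

      -- Hypothetical audit table populated by the post-execute hook:
      --   query_audit_log(table_name, partition_name, partition_create_time, query_start_time)
      -- partition_create_time is assumed to be denormalized into the log
      -- (in practice it could be joined in from the Hive metastore).
      -- The max over all (query, partition) rows of a table equals the max over
      -- its partitions of Data_age_at_last_query, i.e. Empirical_retention(T).
      SELECT
        table_name,
        MAX(DATEDIFF(query_start_time, partition_create_time)) AS empirical_retention_days
      FROM query_audit_log
      GROUP BY table_name;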
Smart Retention
• Inform table owners of Empirical_retention(T) with a call to action:
 •   Accept the empirical value and change the retention
 •   Review the table's query history and figure out a better setting

• After 2 weeks, the system archives partitions that are older than Empirical_retention(T)
 •   Frees up space after partitions get archived
 •   Users need to restore archived data before querying it
Smart Retention – Things Learned
• Table query history enables table owners to identify outliers:
 •   A table's queries mostly touch data less than 32 days old, but one query once accessed a 42-day-old partition

• Prioritize tables with the most space savings
 •   Saved 8PB from the top 100 tables!
Sort Before Compression – Motivation
• In RCFile format, data are stored column-wise inside every row group
 •   Sorting by one or two columns with many duplicate values reduces the final compressed data size

• Trade extra computation for space savings
Sort Before Compression
• Identify the best column to sort by
 •   Take a sample of the table and sort it by every candidate column; pick the one with the most space saving (see the sketch after this list)

• Transfer target partitions from service clusters to compute clusters
• Sort them into compressed RCFile format
• Transfer the sorted partitions back to the service clusters to replace the original ones
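A minimal Hive sketch of how one candidate column could be evaluated, assuming a sampled copy is written per candidate and its on-disk size compared afterwards; the table and column names are placeholders, not the production job:

      -- Take a ~1% sample of the partition and write it as compressed RCFile,
      -- sorted by one candidate column; repeat per candidate and compare sizes.
      set hive.exec.compress.output=true;
      CREATE TABLE tmp_sample_sorted_by_userid STORED AS RCFILE AS
      SELECT *
      FROM hive_table TABLESAMPLE (BUCKET 1 OUT OF 100 ON rand()) s
      WHERE ds = '2012-08-06'
      SORT BY userid;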
How we sort
set hive.exec.reducers.max=1024;
set hive.io.rcfile.record.buffer.size=67108864;

-- Rewrite the partition as a sorted, compressed copy.
INSERT OVERWRITE TABLE hive_table PARTITION (ds='2012-08-06', source_type='mobile_sort')
SELECT `(ds|source_type)?+.+` FROM hive_table
WHERE ds='2012-08-06' AND source_type='mobile'
-- Spread rows across reducers by userid; rows with a null or zero userid get a
-- random key so they do not all land on one reducer.
DISTRIBUTE BY IF(userid <> 0 AND NOT (userid IS NULL), userid, CAST(RAND() AS STRING))
-- Within each reducer, sort by userid and ip_address before writing RCFile blocks.
SORT BY userid, ip_address;
Sort Before Compression – Things Learned
• Sorting achieves >40% space saving!

• It’s important to verify data correctness
 •   Compare the original and sorted partitions’ hash values (a sketch follows this list)
 •   This surfaced a Hive bug

• Sort cold data first, and gradually move to hot data
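One way such a hash comparison could look in Hive, as a sketch under my own assumptions (the slides do not specify the exact check, and the column list below is a placeholder): an order-independent aggregate of per-row hashes should match between the original and the sorted partition.

      -- Run the same query against the original and the sorted partition;
      -- row order does not matter because SUM is commutative.
      SELECT COUNT(*)                                        AS row_count,
             SUM(CAST(hash(userid, ip_address) AS BIGINT))   AS row_hash_sum
      FROM hive_table
      WHERE ds = '2012-08-06' AND source_type = 'mobile';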
HDFS Raid
In HDFS, data are 3X replicated.
[Diagram: a client performs metadata operations against the NameNode and reads/writes block data directly to the DataNodes; each of the three blocks of /warehouse/file1 is replicated on three of the five DataNodes.]
HDFS Raid – File-level XOR (10, 1)
[Diagram: before raiding, each of the 10 blocks of /warehouse/file1 has 3 replicas (3X). After raiding, the 10 source blocks keep 2 replicas each and one XOR parity block (block 11) is stored twice in the parity file /raid/warehouse/file1, for an effective replication factor of (10*2 + 1*2)/10 = 2.2X.]
HDFS Raid
• What if a file has 15 blocks?
 •   Treat it as 20 blocks and generate a parity file with 2 blocks
 •   Replication factor = (15*2 + 2*2)/15 ≈ 2.27 (a general form of this calculation follows after this list)

• Reconstruction
 •   Online reconstruction – DistributedRaidFileSystem
 •   Offline reconstruction – RaidNode

• Block placement
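As an interpretive sketch of the arithmetic above (the notation is mine, not the slides'): for a file of b source blocks raided with an (n, k) code, the file is padded to the next multiple of n, giving m = ceil(b/n) stripes and m*k parity blocks. With source-block replication r_s and parity-block replication r_p, the effective replication factor is

      effective replication = (b*r_s + m*k*r_p) / b

Plugging in b = 15, n = 10, k = 1, r_s = r_p = 2 gives (15*2 + 2*1*2)/15 ≈ 2.27, matching the bullet above; b = 10 with XOR (k = 1) gives 2.2X, and b = 10 with (10, 4) Reed-Solomon at single replication gives (10 + 4)/10 = 1.4X.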
HDFS Raid – File-level Reed Solomon (10, 4)
[Diagram: before raiding, each of the 10 blocks of /warehouse/file1 has 3 replicas (3X). After raiding, each source block keeps a single replica and 4 Reed-Solomon parity blocks (blocks 11-14) are stored once in the parity file /raidrs/warehouse/file1, for an effective replication factor of (10 + 4)/10 = 1.4X.]
HDFS Raid – Hybrid Storage
Life of file /warehouse/facebook.jpg:
• When born, the file is stored at 3X replication
• After 1 day, it is XOR raided (2.2X)
• After 3 months, it is RS raided (1.4X)
HDFS Raid – Things Learned
• Replication factor 3 -> 2.65 (12% space saving)

• Avoid flooding the namenode with requests
 •   A daily pipeline scans the fsimage to pick raidable files rather than recursively searching through the namenode

• Small files prevent further replication reduction
 •   50% of files in the warehouse have only 1 or 2 blocks; they are too small to be raided
Raid Warm Small Files: Directory-level XOR
[Diagram: before, the small files /data/file1 … /data/file4 (10 blocks in total) are file-level XOR raided where possible, each raided file getting its own parity file under /raid/data (e.g., /raid/data/file1, /raid/data/file3), for an effective replication factor of about 2.7X. After, the blocks of the whole directory are raided together as one stripe with a single parity file /dir-raid/data, bringing the effective replication factor down to 2.2X.]
Handle Directory Change
• Directory changes happen very infrequently in the warehouse.
• A stripe store (MySQL) records which blocks belong to which stripe, e.g. for /namespace/infra/ds=2013-07-07:

      Block id      Stripe id
      Blk_file_1    Strp_1
      Blk_file_2    Strp_1
      Blk_file_3    Strp_1
      Blk_parity    Strp_1

• When a client tries to read file2 and encounters missing blocks, it looks up the stripe table, figures out that file4 does not belong to the stripe and that file3 is in the trash, and reconstructs file2.
• The RaidNode then re-raids the directory before file3 is actually deleted from the cluster.
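A minimal sketch of what such a stripe store lookup could look like; the table and column names below are illustrative assumptions, not the schema actually used:

      -- Hypothetical MySQL stripe store: one row per block, keyed by block id.
      CREATE TABLE stripe_store (
        block_id  VARCHAR(64) NOT NULL PRIMARY KEY,
        stripe_id VARCHAR(64) NOT NULL,
        KEY idx_stripe (stripe_id)
      );

      -- Given a missing block, find all sibling blocks (source and parity) in the
      -- same stripe, so the reader knows what to decode against.
      SELECT s2.block_id
      FROM stripe_store s1
      JOIN stripe_store s2 ON s1.stripe_id = s2.stripe_id
      WHERE s1.block_id = 'Blk_file_2';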
Raid Cold Small Files: Compaction
• Compact cold small files into large files and apply file-level RS
 •   No need to handle directory changes with file-level RS
      •   Re-raiding a directory-RS-raided directory is expensive
 •   Raid-aware compaction can achieve the best space saving
      •   Change the block size so that files have a multiple of ten blocks
 •   Reduces the amount of metadata
Raid-Aware Compaction
▪   Compaction settings:
     set mapred.min.split.size = 39*blockSize;
     set mapred.max.split.size = 39*blockSize;
     set mapred.min.split.size.per.node = 39*blockSize;
     set mapred.min.split.size.per.rack = 39*blockSize;
     set dfs.block.size = blockSize;
     set hive.input.format = org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

▪   Calculate the best block size for a partition (a worked example follows this list)
    ▪    Make sure bestBlockSize * N ≈ partition size, where N = 39p + q (p ∈ N+, q ∈ {10, 20, 30})
    ▪    Compaction will then generate p 40-block files and one q-block file
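As a worked example of the sizing rule above (the numbers here are hypothetical, not from the talk): for a partition of roughly 11 GB (11,264 MB), one could pick p = 2 and q = 10, so N = 39*2 + 10 = 88 and bestBlockSize ≈ 11,264 MB / 88 = 128 MB. Running the compaction with dfs.block.size set to that value would then emit two 40-block files and one 10-block file, each of which raids cleanly with (10, 4) Reed-Solomon.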
Raid-Aware Compaction
▪   Compact a SeqFile format partition:
     INSERT OVERWRITE TABLE seq_table PARTITION (ds = "2012-08-17")
     SELECT `(ds)?+.+` FROM seq_table
     WHERE ds = "2012-08-17";

▪   Compact an RCFile format partition:
     ALTER TABLE rc_table PARTITION (ds = "2009-08-31") CONCATENATE;
Directory XOR & Compaction – Things Learned
• Replication factor 2.65 -> 2.35 (an additional 12% space saving); still rolling out

• Keeping blocks’ checksums on record could avoid data corruption caused by bugs

• HDFS’s unawareness of Raid causes some issues
 •   Operational errors could cause data loss (e.g., forgetting to move parity data along with source data)

• Directory XOR & Compaction only work for warehouse data
Questions?
