
Hadoop Scalability at Facebook. Dmytro Molkov, Facebook


Dmytro Molkov, Facebook

Bachelor of Applied Mathematics, Taras Shevchenko National University of Kyiv (2007). Master of Computer Science, Stony Brook University (2009). Hadoop HDFS committer since 2011. Member of the Hadoop team at Facebook since 2009.

Talk topic
Hadoop Scalability at Facebook.

Hadoop and Hive are an excellent toolkit for storing and analyzing petabytes of data at Facebook. Working with data at this scale, the Hadoop engineering team at Facebook deals with Hadoop scalability and efficiency problems every day. The talk covers some of the optimizations made across different parts of Facebook's Hadoop infrastructure that make it possible to deliver a high-quality service: for example, optimizing storage cost in multi-petabyte HDFS clusters, increasing system throughput, and reducing downtime through High Availability work on HDFS.


  • 1. Hadoop Scalability at Facebook
    • Dmytro Molkov ( [email_address] )
    • YaC, Moscow, September 19, 2011
  • 2. How Facebook uses Hadoop · Hadoop Scalability · Hadoop High Availability · HDFS Raid
  • 3. How Facebook uses Hadoop
  • 4. Usages of Hadoop at Facebook
    • Warehouse
      • Thousands of machines in the cluster
      • Tens of petabytes of data
      • Tens of thousands of jobs/queries a day
      • Over a hundred million files
    • Scribe-HDFS
      • Dozens of small clusters
      • Append support
      • High availability
      • High throughput
  • 5. Usages of Hadoop at Facebook (contd.)
    • Realtime Analytics
      • Medium-sized HBase clusters
      • High throughput/low latency
    • FB Messages Storage
      • Medium-sized HBase clusters
      • Low latency
      • High data durability
      • High Availability
    • Misc Storage/Backup clusters
      • Small to medium sized
      • Various availability/performance requirements
  • 6. Hadoop Scalability
  • 7. Hadoop Scalability
    • Warehouse Cluster - A “Single Cluster” approach
      • Good data locality
      • Ease of data access
      • Operational Simplicity
      • NameNode is the bottleneck
        • Memory pressure - too many files and blocks (rough estimate in the sketch after this slide)
        • CPU pressure - too many metadata operations against a single node
      • Long Startup Time
      • JobTracker is the bottleneck
        • Memory Pressure - too many jobs/tasks/counters in memory
        • CPU pressure - scheduling computation is expensive
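To make the memory-pressure point concrete, here is a rough back-of-the-envelope estimate in Java. The ~150 bytes of heap per namespace object is a commonly quoted rule of thumb for HDFS, and the blocks-per-file ratio is an assumption; neither number comes from the talk.

    // Rough estimate of NameNode heap consumed by file/block metadata.
    public class NameNodeHeapEstimate {
        public static void main(String[] args) {
            long files = 100_000_000L;      // "over a hundred million files" (slide 4)
            long blocksPerFile = 2;         // assumption for illustration
            long bytesPerObject = 150;      // rule-of-thumb heap cost per inode/block object

            long objects = files + files * blocksPerFile;
            double heapGb = objects * bytesPerObject / 1e9;
            System.out.printf("~%.0f GB of heap for metadata alone%n", heapGb);
        }
    }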
  • 8. HDFS Federation Wishlist
    • Single Cluster
    • Preserve Data Locality
    • Keep Operations Simple
    • Distribute both CPU and Memory Load
  • 9. Hadoop Federation Design (diagram: NameNode #1 … NameNode #N sharing a single pool of DataNodes)
  • 10. HDFS Federation Overview
      • Each NameNode holds a part of the NameSpace
      • Hive tables are distributed between namenodes
      • Hive Metastore stores full locations of the tables (including the namenode) -> Hive clients know which cluster the data is stored in
      • HDFS Clients have a mount table to know where the data is (see the mount-table sketch after this slide)
      • Each namespace uses all datanodes for storage -> the cluster load is fully balanced (Storage and I/O)
      • Single Datanode process per node ensures good utilization of resources
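A minimal sketch of the client-side mount table mentioned above, expressed with Apache Hadoop's ViewFS. The talk does not say Facebook used ViewFS itself, and the cluster name, paths, and NameNode hosts below are made up for illustration.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class MountTableSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // The client sees one logical namespace...
            conf.set("fs.defaultFS", "viewfs://warehouse/");
            // ...while each top-level path is mapped to the NameNode that owns it.
            conf.set("fs.viewfs.mounttable.warehouse.link./user",
                     "hdfs://namenode1.example.com:8020/user");
            conf.set("fs.viewfs.mounttable.warehouse.link./warehouse/ads",
                     "hdfs://namenode2.example.com:8020/warehouse/ads");

            FileSystem fs = FileSystem.get(conf);
            // Resolved against namenode2 without the client naming it explicitly.
            System.out.println(fs.exists(new Path("/warehouse/ads")));
        }
    }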
  • 11. Map-Reduce Federation
      • Backward Compatibility with the old code
      • Preserve data locality
      • Make scheduling faster
      • Ease the resource pressure on the JobTracker
  • 12. Map-Reduce Federation (diagram: Cluster Resource Manager, Job Client, Task Trackers; resource requests, resource heartbeats, job communication)
  • 13. MapReduce Federation Overview
      • Cluster Manager only allocates resources (illustrative interfaces after this slide)
      • JobTracker per user -> few tasks per JobTracker -> more responsive scheduling
      • ClusterManager is stateless -> shorter restart times -> better availability
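Purely illustrative Java interfaces for the split this slide describes: a stateless cluster manager that only hands out slots, and a small per-user/per-job JobTracker that schedules tasks within its grants. All names and signatures here are hypothetical, not Facebook's actual API.

    import java.util.List;

    // Hands out resources only; keeps no job state, so it restarts quickly.
    interface ClusterManager {
        List<ResourceGrant> requestResources(String jobId, int mapSlots, int reduceSlots);
        void releaseResources(String jobId, List<ResourceGrant> grants);
    }

    // One slot granted on a particular TaskTracker.
    class ResourceGrant {
        final String taskTrackerHost;
        ResourceGrant(String taskTrackerHost) { this.taskTrackerHost = taskTrackerHost; }
    }

    // Runs per user/job, so it only ever schedules a handful of tasks.
    interface PerJobTracker {
        void scheduleTasks(List<ResourceGrant> grants);
        void onTaskHeartbeat(String taskId, float progress);
    }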
  • 14. Hadoop High Availability
  • 15. Warehouse High Availability
      • Full cluster restart takes 90-120 mins
      • Software upgrade is 20-30 hrs of downtime/year
      • Cluster crash is 5 hrs of downtime/year
      • MapReduce tolerates failures
  • 16. HDFS High Availability Design (diagram: Primary NN and Standby NN share the edits log via NFS; DataNodes send block reports and block-received messages to both)
  • 17. Clients Design
      • Using ZooKeeper as a method of name resolution
      • Under normal conditions ZooKeeper contains a location of the primary node
      • During the failover ZooKeeper record is empty and the clients know to wait for the failover to complete
      • On a network failure, clients check whether the ZooKeeper entry has changed and retry the command against the new Primary NameNode if a failover has occurred (see the client sketch after this slide)
      • For large clusters, clients also cache the location of the primary on the local node to ease the load on the ZooKeeper cluster
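A minimal sketch of the client-side resolution logic described above, using the plain ZooKeeper API. The znode path, the host:port payload, and the retry interval are assumptions; the talk does not specify them.

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;

    public class PrimaryNameNodeResolver {
        private static final String ZNODE = "/hdfs/primary-namenode"; // hypothetical path
        private final ZooKeeper zk;

        public PrimaryNameNodeResolver(ZooKeeper zk) { this.zk = zk; }

        // Returns host:port of the current primary, waiting out a failover if needed.
        public String resolvePrimary() throws Exception {
            while (true) {
                byte[] data;
                try {
                    data = zk.getData(ZNODE, false, null);
                } catch (KeeperException.NoNodeException e) {
                    data = null; // treat a missing znode like an empty record
                }
                if (data != null && data.length > 0) {
                    return new String(data, "UTF-8");
                }
                Thread.sleep(1000); // empty record: failover in progress, wait and retry
            }
        }
    }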
  • 18. HDFS Raid
  • 19. HDFS Raid
      • 3 way replication
        • Data locality - necessary only for the new data
        • Data availability - necessary for all kinds of data
      • Erasure codes
        • Data locality is worse than 3 way replication
        • Data availability is at least as good as 3 way replication
  • 20. HDFS Raid Details
      • XOR: 10 blocks replicated 3 times = 30 physical blocks, effective replication factor 3.0; vs. 10 blocks replicated twice + a checksum (XOR) block replicated twice = 22 physical blocks, effective replication factor 2.2
      • Reed-Solomon encoding: 10 blocks replicated 3 times = 30 physical blocks, effective replication factor 3.0; vs. 10 blocks with replication factor 1 + erasure-code (RS) blocks replicated once = 14 physical blocks, effective replication factor 1.4
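The effective replication factors above are just physical blocks divided by logical data blocks; a quick sketch of the arithmetic (the 4 Reed-Solomon parity blocks are inferred from the 14 physical blocks, not stated on the slide):

    public class EffectiveReplication {
        // physical blocks / logical data blocks
        static double factor(int dataBlocks, int dataRep, int parityBlocks, int parityRep) {
            return (double) (dataBlocks * dataRep + parityBlocks * parityRep) / dataBlocks;
        }

        public static void main(String[] args) {
            System.out.println(factor(10, 3, 0, 0)); // plain 3-way replication -> 3.0
            System.out.println(factor(10, 2, 1, 2)); // XOR: one parity block, everything x2 -> 2.2
            System.out.println(factor(10, 1, 4, 1)); // Reed-Solomon: 4 parity blocks -> 1.4
        }
    }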
  • 21. HDFS Raid Pros and Cons
      • Saves a lot of space
      • Provides the same data availability guarantees
      • Worse data locality
      • Need to reconstruct blocks instead of replicating (CPU + Network cost)
      • Block location in the cluster is important and needs to be maintained
  • 22. [email_address] [email_address]
  • 23.