Hadoop Scalability at Facebook. Dmytro Molkov, Facebook
Dmytro Molkov, Facebook

B.Sc. in Applied Mathematics from Taras Shevchenko National University of Kyiv (2007). M.Sc. in Computer Science from Stony Brook University (2009). Hadoop HDFS committer since 2011. Member of the Hadoop team at Facebook since 2009.

Talk topic
Hadoop Scalability at Facebook.

Hadoop and Hive are an excellent toolkit for storing and analyzing petabytes of data at Facebook. Working with data at this scale, the Hadoop team at Facebook runs into Hadoop scalability and efficiency problems daily. The talk covers some of the optimizations across different parts of Facebook's Hadoop infrastructure that make it possible to provide a high-quality service: for example, reducing storage cost in multi-petabyte HDFS clusters, increasing system throughput, and cutting downtime with High Availability work on HDFS.



  • 1. Hadoop Scalability at Facebook
    • Dmytro Molkov ( [email_address] )
    • YaC, Moscow, September 19, 2011
  • 2. Agenda
    • How Facebook uses Hadoop
    • Hadoop Scalability
    • Hadoop High Availability
    • HDFS Raid
  • 3. How Facebook uses Hadoop
  • 4. Usages of Hadoop at Facebook
    • Warehouse
      • Thousands of machines in the cluster
      • Tens of petabytes of data
      • Tens of thousands of jobs/queries a day
      • Over a hundred million files
    • Scribe-HDFS
      • Dozens of small clusters
      • Append support
      • High availability
      • High throughput
  • 5. Usages of Hadoop at Facebook (contd.)
    • Realtime Analytics
      • Medium-sized HBase clusters
      • High throughput/low latency
    • FB Messages Storage
      • Medium-sized HBase clusters
      • Low latency
      • High data durability
      • High Availability
    • Misc Storage/Backup clusters
      • Small to medium sized
      • Various availability/performance requirements
  • 6. Hadoop Scalability
  • 7. Hadoop Scalability
    • Warehouse Cluster - A “Single Cluster” approach
      • Good data locality
      • Ease of data access
      • Operational Simplicity
      • NameNode is the bottleneck
        • Memory pressure - too many files and blocks
        • CPU pressure - too many metadata operations against a single node
      • Long Startup Time
      • JobTracker is the bottleneck
        • Memory Pressure - too many jobs/tasks/counters in memory
        • CPU pressure - scheduling computation is expensive
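The memory pressure on the NameNode can be made concrete with a back-of-the-envelope heap estimate. The ~150 bytes-per-object figure below is a commonly cited Hadoop-era rule of thumb, not a number from the slides, and `blocks_per_file` is an assumed ratio:

```python
# Back-of-the-envelope NameNode heap estimate.
# Assumption (common rule of thumb, not from the slides): each file,
# directory, and block object costs roughly 150 bytes of NameNode heap.

BYTES_PER_OBJECT = 150  # approximate heap cost per inode/block object

def namenode_heap_gb(num_files, blocks_per_file=1.5):
    """Rough heap needed to hold the whole namespace in memory."""
    objects = num_files * (1 + blocks_per_file)  # inodes + block objects
    return objects * BYTES_PER_OBJECT / 1e9

# "Over a hundred million files" from the Warehouse slide:
print(f"{namenode_heap_gb(100_000_000):.1f} GB")  # 37.5 GB
```

Tens of gigabytes of heap for metadata alone is why "too many files and blocks" shows up as a hard scalability wall for a single NameNode.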
  • 8. HDFS Federation Wishlist
    • Single Cluster
    • Preserve Data Locality
    • Keep Operations Simple
    • Distribute both CPU and Memory Load
  • 9. HDFS Federation Design [diagram: NameNode #1 through NameNode #N, all sharing the same pool of DataNodes]
  • 10. HDFS Federation Overview
      • Each NameNode holds a part of the NameSpace
      • Hive tables are distributed between namenodes
      • Hive Metastore stores full locations of the tables (including the namenode) -> Hive clients know which cluster the data is stored in
      • HDFS Clients have a mount table to know where the data is
      • Each namespace uses all datanodes for storage -> the cluster load is fully balanced (Storage and I/O)
      • Single Datanode process per node ensures good utilization of resources
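The client-side mount table above can be sketched as a longest-prefix lookup from a logical path to the namenode that owns that namespace. The mount points and namenode addresses below are hypothetical, and this is only a sketch of the idea, not the real HDFS client code:

```python
# Sketch of client-side mount-table resolution in a federated HDFS.
# All paths and namenode addresses are hypothetical examples.

MOUNT_TABLE = {
    "/user":      "hdfs://namenode1:8020",
    "/warehouse": "hdfs://namenode2:8020",
    "/scribe":    "hdfs://namenode3:8020",
}

def resolve(path):
    """Map a logical path to the namenode owning that part of the namespace."""
    # Longest-prefix match over the mount points.
    best = max((m for m in MOUNT_TABLE if path == m or path.startswith(m + "/")),
               key=len, default=None)
    if best is None:
        raise ValueError(f"no mount point for {path}")
    return MOUNT_TABLE[best] + path

print(resolve("/warehouse/hive/tbl1/part-00000"))
# hdfs://namenode2:8020/warehouse/hive/tbl1/part-00000
```

Because the table lives on the client, no central service sits on the read path: each client resolves the namenode locally and then talks to it directly.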
  • 11. Map-Reduce Federation
      • Backward Compatibility with the old code
      • Preserve data locality
      • Make scheduling faster
      • Ease the resource pressure on the JobTracker
  • 12. MapReduce Federation Design [diagram: Job Client and Cluster Resource Manager exchange Resource Requests; TaskTrackers send Resource Heartbeats to the Cluster Resource Manager; Job Communication flows directly between the Job Client and the TaskTrackers]
  • 13. MapReduce Federation Overview
      • Cluster Manager only allocates resources
      • JobTracker per user -> few tasks per JobTracker -> more responsive scheduling
      • ClusterManager is stateless -> shorter restart times -> better availability
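The split described above can be illustrated with a toy model: a stateless manager that only hands out slot counts, and a per-user tracker that schedules against the slots it was granted. All class and method names here are hypothetical; this is a sketch of the design idea, not the actual Facebook implementation:

```python
# Toy model of the MapReduce Federation split described on the slide.
# Hypothetical names; not the real Facebook scheduler.

class ClusterManager:
    """Holds only a free-slot count, no per-job state, so a restart
    loses nothing that TaskTracker heartbeats cannot rebuild."""
    def __init__(self, total_slots):
        self.free_slots = total_slots

    def request(self, n):
        granted = min(n, self.free_slots)
        self.free_slots -= granted
        return granted

    def release(self, n):
        self.free_slots += n

class UserJobTracker:
    """One per user: it tracks only that user's tasks, so each
    scheduling pass scans a small list instead of the whole cluster."""
    def __init__(self, cm):
        self.cm, self.slots = cm, 0

    def run_tasks(self, num_tasks):
        self.slots += self.cm.request(num_tasks)
        return self.slots  # tasks actually scheduled now

cm = ClusterManager(total_slots=100)
jt_a, jt_b = UserJobTracker(cm), UserJobTracker(cm)
print(jt_a.run_tasks(70), jt_b.run_tasks(50))  # 70 30
```

The second tracker only gets 30 of the 50 slots it asked for, which is the point: contention is resolved centrally but cheaply, while the expensive task-level scheduling happens in many small JobTrackers.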
  • 14. Hadoop High Availability
  • 15. Warehouse High Availability
      • Full cluster restart takes 90-120 mins
      • Software upgrade is 20-30 hrs of downtime/year
      • Cluster crash is 5 hrs of downtime/year
      • MapReduce tolerates failures
  • 16. HDFS High Availability Design [diagram: Primary NN and Standby NN share the Edits Log via NFS; DataNodes send Block Reports / Block Received messages to both NameNodes]
  • 17. Clients Design
      • Using ZooKeeper as a method of name resolution
      • Under normal conditions the ZooKeeper record contains the location of the primary node
      • During a failover the ZooKeeper record is empty, so clients know to wait for the failover to complete
      • On a network failure, clients check whether the ZooKeeper entry has changed and retry the command against the new primary NameNode if a failover has occurred
      • For large clusters, clients also cache the location of the primary on the local node to ease the load on the ZooKeeper cluster
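The client-side failover logic above can be sketched as a retry loop. A plain dict stands in for the ZooKeeper record holding the primary's address (a real client would use a ZooKeeper library); all names and addresses are hypothetical:

```python
import time

# Sketch of the client failover logic from the slide. `registry` is a
# stand-in for the ZooKeeper record; names/addresses are hypothetical.

registry = {"primary": "nn1:8020"}

def call_namenode(op, cached_primary, max_wait_s=1.0, poll_s=0.05):
    """Run op against the primary NameNode, retrying across a failover."""
    deadline = time.time() + max_wait_s
    while time.time() < deadline:
        current = registry.get("primary")
        if current is None:
            time.sleep(poll_s)         # record empty: failover in progress
            continue
        if current != cached_primary:  # failover happened: use the new NN
            cached_primary = current
        return op(cached_primary), cached_primary
    raise TimeoutError("failover did not complete in time")

# Simulate a completed failover: the record now names a new primary.
registry["primary"] = "nn2:8020"
result, primary = call_namenode(lambda nn: f"ls@{nn}", "nn1:8020")
print(result, primary)  # ls@nn2:8020 nn2:8020
```

The empty-record convention is what lets clients distinguish "primary is down, wait" from "primary has moved, retry elsewhere" without any extra coordination.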
  • 18. HDFS Raid
  • 19. HDFS Raid
      • 3 way replication
        • Data locality - necessary only for the new data
        • Data availability - necessary for all kinds of data
      • Erasure codes
        • Data locality is worse than 3 way replication
        • Data availability is at least as good as 3 way replication
  • 20. HDFS Raid Details
      • XOR: 10 blocks replicated 3 times = 30 physical blocks (effective replication factor 3.0), vs. 10 blocks replicated twice + a checksum (XOR) block replicated twice = 22 physical blocks (effective replication factor 2.2)
      • Reed-Solomon encoding: 10 blocks replicated 3 times = 30 physical blocks (effective replication factor 3.0), vs. 10 blocks with replication factor 1 + erasure-code (RS) blocks replicated once = 14 physical blocks (effective replication factor 1.4)
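The arithmetic behind those effective replication factors is simply physical blocks stored divided by logical data blocks, which a few lines can verify:

```python
# Effective replication factor = physical blocks stored / logical data blocks.

def effective_replication(data_blocks, data_rep, parity_blocks, parity_rep):
    physical = data_blocks * data_rep + parity_blocks * parity_rep
    return physical / data_blocks

# Plain 3-way replication: 10 blocks * 3 copies = 30 physical blocks.
print(effective_replication(10, 3, 0, 0))  # 3.0
# XOR raid: 10 blocks * 2 copies + 1 XOR block * 2 copies = 22.
print(effective_replication(10, 2, 1, 2))  # 2.2
# Reed-Solomon: 10 data blocks * 1 copy + 4 parity blocks * 1 copy = 14.
print(effective_replication(10, 1, 4, 1))  # 1.4
```

At tens of petabytes, the drop from 3.0 to 1.4 roughly halves the raw storage needed for cold data, which is the cost optimization the talk refers to.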
  • 21. HDFS Raid Pros and Cons
      • Saves a lot of space
      • Provides the same guarantees for data availability
      • Worse data locality
      • Need to reconstruct blocks instead of replicating (CPU + Network cost)
      • Block location in the cluster is important and needs to be maintained
  • 22. [email_address] [email_address]