Hadoop Scalability at Facebook. Dmytro Molkov, Facebook


Dmytro Molkov, Facebook

Bachelor of Applied Mathematics, Taras Shevchenko National University of Kyiv (2007). Master of Computer Science, Stony Brook University (2009). Hadoop HDFS committer since 2011. Member of the Hadoop team at Facebook since 2009.

Talk topic
Hadoop Scalability at Facebook.

Abstract
Hadoop and Hive are excellent tools for storing and analyzing petabytes of data at Facebook. Working with data at this scale, the Hadoop development team at Facebook runs into Hadoop scalability and efficiency problems every day. This talk covers some of the optimizations made across different parts of Facebook's Hadoop infrastructure that make it possible to provide a high-quality service: for example, reducing storage cost in multi-petabyte HDFS clusters, increasing system throughput, and cutting downtime through High Availability work on HDFS.

1. Hadoop Scalability at Facebook
   - Dmytro Molkov ( [email_address] )
   - YaC, Moscow, September 19, 2011
2. How Facebook uses Hadoop
   Hadoop Scalability
   Hadoop High Availability
   HDFS Raid
3. How Facebook uses Hadoop
4. Usages of Hadoop at Facebook
   - Warehouse
     - Thousands of machines in the cluster
     - Tens of petabytes of data
     - Tens of thousands of jobs/queries a day
     - Over a hundred million files
   - Scribe-HDFS
     - Dozens of small clusters
     - Append support
     - High availability
     - High throughput
5. Usages of Hadoop at Facebook (contd.)
   - Realtime Analytics
     - Medium-sized HBase clusters
     - High throughput / low latency
   - FB Messages Storage
     - Medium-sized HBase clusters
     - Low latency
     - High data durability
     - High availability
   - Misc storage/backup clusters
     - Small to medium sized
     - Various availability/performance requirements
6. Hadoop Scalability
7. Hadoop Scalability
   - Warehouse Cluster - a "Single Cluster" approach
     - Good data locality
     - Ease of data access
     - Operational simplicity
     - NameNode is the bottleneck
       - Memory pressure - too many files and blocks
       - CPU pressure - too many metadata operations against a single node
     - Long startup time
     - JobTracker is the bottleneck
       - Memory pressure - too many jobs/tasks/counters in memory
       - CPU pressure - scheduling computation is expensive
8. HDFS Federation Wishlist
   - Single cluster
   - Preserve data locality
   - Keep operations simple
   - Distribute both CPU and memory load
9. Hadoop Federation Design
   [Diagram: multiple NameNodes (NameNode #1 ... NameNode #N) sharing a single pool of DataNodes]
10. HDFS Federation Overview
    - Each NameNode holds a part of the namespace
    - Hive tables are distributed between NameNodes
    - The Hive Metastore stores full locations of the tables (including the NameNode) -> Hive clients know which cluster the data is stored in
    - HDFS clients have a mount table to know where the data is
    - Each namespace uses all DataNodes for storage -> the cluster load is fully balanced (storage and I/O)
    - A single DataNode process per node ensures good utilization of resources
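The client-side mount table above can be thought of as a longest-prefix lookup from a path to the NameNode that owns that part of the namespace. Below is a minimal Java sketch of that idea; the class, the mount points, and the NameNode addresses are hypothetical, not the actual Facebook implementation.

    import java.net.URI;
    import java.util.Map;
    import java.util.TreeMap;

    public class ClientMountTable {
        // Mount points kept sorted; resolution walks them in descending order
        // so the longest matching prefix wins.
        private final TreeMap<String, URI> mounts = new TreeMap<>();

        public void addMount(String pathPrefix, URI nameNode) {
            mounts.put(pathPrefix, nameNode);
        }

        // Resolve a path to the NameNode responsible for it.
        public URI resolve(String path) {
            for (Map.Entry<String, URI> e : mounts.descendingMap().entrySet()) {
                if (path.startsWith(e.getKey())) {
                    return e.getValue();
                }
            }
            throw new IllegalArgumentException("No mount point for " + path);
        }

        public static void main(String[] args) {
            ClientMountTable mt = new ClientMountTable();
            mt.addMount("/warehouse/hive", URI.create("hdfs://namenode1:8020"));
            mt.addMount("/user", URI.create("hdfs://namenode2:8020"));
            System.out.println(mt.resolve("/warehouse/hive/tables/foo")); // hdfs://namenode1:8020
        }
    }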
11. Map-Reduce Federation
    - Backward compatibility with the old code
    - Preserve data locality
    - Make scheduling faster
    - Ease the resource pressure on the JobTracker
12. MapReduce Federation
    [Diagram: Cluster Resource Manager, Job Client, and TaskTrackers, connected by Resource Request, Resource Heartbeats, and Job Communication arrows]
13. MapReduce Federation Overview
    - Cluster Manager only allocates resources
    - JobTracker per user -> few tasks per JobTracker -> more responsive scheduling
    - ClusterManager is stateless -> shorter restart times -> better availability
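A rough sketch of the division of labor this slide describes: a cluster-wide manager that only hands out generic task slots, and a small per-job tracker that keeps all job state. All class and method names here are hypothetical; the slides do not show Facebook's actual implementation.

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.List;

    public class ResourceManagerSketch {
        // Grants generic task slots; it never sees job or task details.
        static class ClusterManager {
            private final Deque<String> freeSlots = new ArrayDeque<>();

            ClusterManager(List<String> slots) { freeSlots.addAll(slots); }

            synchronized List<String> requestSlots(int n) {
                List<String> granted = new ArrayList<>();
                while (granted.size() < n && !freeSlots.isEmpty()) {
                    granted.add(freeSlots.poll());
                }
                return granted;
            }

            synchronized void releaseSlots(List<String> slots) { freeSlots.addAll(slots); }
        }

        // One tracker per job: scheduling state stays small and local to the job.
        static class PerJobTracker {
            private final String jobId;
            private final ClusterManager cm;

            PerJobTracker(String jobId, ClusterManager cm) { this.jobId = jobId; this.cm = cm; }

            void run(int tasks) {
                List<String> slots = cm.requestSlots(tasks);
                slots.forEach(s -> System.out.println(jobId + " runs a task on " + s));
                cm.releaseSlots(slots);
            }
        }

        public static void main(String[] args) {
            ClusterManager cm = new ClusterManager(List.of("tt1:slot0", "tt1:slot1", "tt2:slot0"));
            new PerJobTracker("job_1", cm).run(2);
            new PerJobTracker("job_2", cm).run(2);
        }
    }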
14. Hadoop High Availability
15. Warehouse High Availability
    - A full cluster restart takes 90-120 minutes
    - Software upgrades are 20-30 hours of downtime per year
    - Cluster crashes are 5 hours of downtime per year
    - MapReduce tolerates failures
16. HDFS High Availability Design
    [Diagram: Primary NN and Standby NN connected to an edits log on NFS; DataNodes send block reports / block received messages to both NameNodes]
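One way to read the diagram: the primary writes its edits log to shared NFS storage and the standby continuously tails that log so its namespace stays warm. The toy Java sketch below illustrates only the tailing idea; the file path and class name are invented and this is not the real NameNode code.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class StandbyEditsTailer {
        public static void main(String[] args) throws IOException, InterruptedException {
            String sharedEditsLog = "/mnt/nfs/dfs/name/edits.log"; // hypothetical NFS path
            try (BufferedReader reader = new BufferedReader(new FileReader(sharedEditsLog))) {
                while (true) {
                    String edit = reader.readLine();
                    if (edit == null) {
                        Thread.sleep(500);      // reached the end: wait for the primary to append more
                    } else {
                        applyToNamespace(edit); // replay the edit so the standby stays nearly up to date
                    }
                }
            }
        }

        static void applyToNamespace(String edit) {
            System.out.println("applying edit: " + edit);
        }
    }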
17. Clients Design
    - Using ZooKeeper as a method of name resolution
    - Under normal conditions ZooKeeper contains the location of the primary node
    - During a failover the ZooKeeper record is empty and clients know to wait for the failover to complete
    - On a network failure, clients check whether the ZooKeeper entry has changed and retry the command against the new Primary NameNode if a failover has occurred
    - For large clusters, clients also cache the location of the primary on the local node to ease the load on the ZooKeeper cluster
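The resolution step above is straightforward to sketch against the ZooKeeper Java client. In this hypothetical example the znode path and class name are made up; only the ZooKeeper.getData call is real API.

    import java.nio.charset.StandardCharsets;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;

    public class PrimaryNameNodeResolver {
        // Hypothetical znode holding the host:port of the current primary NameNode.
        private static final String PRIMARY_ZNODE = "/hdfs/warehouse/primary";
        private final ZooKeeper zk;

        public PrimaryNameNodeResolver(ZooKeeper zk) {
            this.zk = zk;
        }

        // Returns host:port of the primary; an empty znode means a failover is in
        // progress, so wait and re-read until the new primary is published.
        public String resolvePrimary() throws KeeperException, InterruptedException {
            while (true) {
                byte[] data = zk.getData(PRIMARY_ZNODE, false, null);
                if (data != null && data.length > 0) {
                    return new String(data, StandardCharsets.UTF_8);
                }
                Thread.sleep(1000); // failover in progress: back off and retry
            }
        }
    }

On a network error the client would call resolvePrimary() again and retry the failed operation only if the returned address differs from the one it had cached, matching the behavior described on the slide.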
18. HDFS Raid
19. HDFS Raid
    - 3-way replication
      - Data locality - necessary only for the new data
      - Data availability - necessary for all kinds of data
    - Erasure codes
      - Data locality is worse than 3-way replication
      - Data availability is at least as good as 3-way replication
20. HDFS Raid Details
    - XOR:
      - 10 blocks replicated 3 times = 30 physical blocks; effective replication factor 3.0
      - 10 blocks replicated twice + a checksum (XOR) block replicated twice = 22 physical blocks; effective replication factor 2.2
    - Reed-Solomon encoding:
      - 10 blocks replicated 3 times = 30 physical blocks; effective replication factor 3.0
      - 10 blocks with replication factor 1 + erasure code (RS) blocks replicated once = 14 physical blocks; effective replication factor 1.4
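The effective replication factors above are simply physical blocks divided by logical data blocks per stripe. A small Java check of the slide's numbers (the 14-block Reed-Solomon total implies 4 parity blocks per 10-block stripe):

    public class RaidReplicationFactor {
        // physical blocks / logical data blocks for one stripe
        static double effectiveReplication(int dataBlocks, int dataReplication,
                                           int parityBlocks, int parityReplication) {
            int physical = dataBlocks * dataReplication + parityBlocks * parityReplication;
            return (double) physical / dataBlocks;
        }

        public static void main(String[] args) {
            // Plain 3-way replication: 10 * 3 = 30 physical blocks -> 3.0
            System.out.println(effectiveReplication(10, 3, 0, 0));
            // XOR raid: 10 data blocks x2 + 1 XOR parity block x2 = 22 -> 2.2
            System.out.println(effectiveReplication(10, 2, 1, 2));
            // Reed-Solomon: 10 data blocks x1 + 4 parity blocks x1 = 14 -> 1.4
            System.out.println(effectiveReplication(10, 1, 4, 1));
        }
    }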
21. HDFS Raid Pros and Cons
    - Saves a lot of space
    - Provides the same guarantees for data availability
    - Worse data locality
    - Need to reconstruct blocks instead of replicating (CPU + network cost)
    - Block location in the cluster is important and needs to be maintained
22. facebook.com/dms
    [email_address]
    [email_address]