Hadoop Scalability at Facebook. Dmytro Molkov, Facebook


Dmytro Molkov, Facebook

Bachelor of Applied Mathematics from Taras Shevchenko National University of Kyiv (2007). Master of Computer Science from Stony Brook University (2009). Hadoop HDFS committer since 2011. Member of the Hadoop team at Facebook since 2009.

Talk topic
Hadoop Scalability at Facebook.

Abstract
Hadoop and Hive are an excellent toolkit for storing and analyzing petabytes of data at Facebook. Working with data at this scale, the Hadoop team at Facebook runs into Hadoop scalability and efficiency problems every day. The talk covers some of the optimizations made across different parts of the Hadoop infrastructure at Facebook that make it possible to provide a high-quality service: for example, reducing storage cost in multi-petabyte HDFS clusters, increasing system throughput, and cutting downtime with the High Availability work on HDFS.

Transcript

  • 1. Hadoop Scalability at Facebook
    - Dmytro Molkov ( [email_address] )
    - YaC, Moscow, September 19, 2011
  • 2. How Facebook uses Hadoop; Hadoop Scalability; Hadoop High Availability; HDFS Raid
  • 3. How Facebook uses Hadoop
  • 4. Usages of Hadoop at Facebook
    - Warehouse
      - Thousands of machines in the cluster
      - Tens of petabytes of data
      - Tens of thousands of jobs/queries a day
      - Over a hundred million files
    - Scribe-HDFS
      - Dozens of small clusters
      - Append support
      - High availability
      - High throughput
  • 5. Usages of Hadoop at Facebook (contd.)
    - Realtime Analytics
      - Medium-sized HBase clusters
      - High throughput / low latency
    - FB Messages Storage
      - Medium-sized HBase clusters
      - Low latency
      - High data durability
      - High availability
    - Misc storage/backup clusters
      - Small to medium sized
      - Various availability/performance requirements
  • 6. Hadoop Scalability
  • 7. Hadoop Scalability
    - Warehouse Cluster - a "Single Cluster" approach
      - Good data locality
      - Ease of data access
      - Operational simplicity
      - NameNode is the bottleneck
        - Memory pressure - too many files and blocks
        - CPU pressure - too many metadata operations against a single node
      - Long startup time
      - JobTracker is the bottleneck
        - Memory pressure - too many jobs/tasks/counters in memory
        - CPU pressure - scheduling computation is expensive
  • 8. HDFS Federation Wishlist
    - Single cluster
    - Preserve data locality
    - Keep operations simple
    - Distribute both CPU and memory load
  • 9. Hadoop Federation Design (diagram: NameNode #1 ... NameNode #N sharing a common pool of DataNodes)
  • 10. HDFS Federation Overview
    - Each NameNode holds a part of the namespace
    - Hive tables are distributed between the NameNodes
    - The Hive Metastore stores the full locations of the tables (including the NameNode) -> Hive clients know which cluster the data is stored in
    - HDFS clients have a mount table to know where the data is (see the sketch after this slide)
    - Each namespace uses all DataNodes for storage -> the cluster load is fully balanced (storage and I/O)
    - A single DataNode process per node ensures good utilization of resources
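A minimal sketch of what such a client-side mount table can look like, written in the style of Hadoop's viewfs client. The property names follow the viewfs convention; the NameNode hostnames and mounted paths are made-up examples, and the slide does not say which exact mechanism Facebook's clients use.

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FederatedClientSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();

        // Client-side mount table: each NameNode serves a part of the directory
        // tree, while all namespaces share the same pool of DataNodes.
        // Hostnames and paths below are illustrative only.
        conf.set("fs.defaultFS", "viewfs:///");
        conf.set("fs.viewfs.mounttable.default.link./warehouse",
                 "hdfs://namenode1.example.com:8020/warehouse");
        conf.set("fs.viewfs.mounttable.default.link./user",
                 "hdfs://namenode2.example.com:8020/user");

        // The mount table is resolved on the client, so a path such as
        // /warehouse/hive/some_table is routed to namenode1 transparently.
        FileSystem fs = FileSystem.get(URI.create("viewfs:///"), conf);
        System.out.println(fs.exists(new Path("/warehouse/hive/some_table")));
    }
}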
  • 11. Map-Reduce Federation
    - Backward compatibility with the old code
    - Preserve data locality
    - Make scheduling faster
    - Ease the resource pressure on the JobTracker
  • 12. Map-Reduce Federation (diagram: the Job Client sends a resource request to the Cluster Resource Manager, TaskTrackers send resource heartbeats to it, and job communication flows directly between the job and the TaskTrackers)
  • 13. MapReduce Federation Overview
    - The Cluster Manager only allocates resources (see the sketch below)
    - A JobTracker per user -> few tasks per JobTracker -> more responsive scheduling
    - The ClusterManager is stateless -> shorter restart times -> better availability
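A minimal sketch of the split described above, assuming a stateless cluster-wide resource allocator and a lightweight per-user JobTracker that only schedules its own tasks. All class and method names are illustrative, not the actual Facebook implementation.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stateless allocator: it only tracks free slots reported by TaskTracker
// heartbeats and grants them on request. Because it keeps no per-job state,
// restarting it is cheap, which is the availability argument on the slide.
class ClusterManager {
    private final Map<String, Integer> freeSlots = new HashMap<>();

    synchronized void heartbeat(String trackerHost, int freeSlotCount) {
        freeSlots.put(trackerHost, freeSlotCount);
    }

    synchronized List<String> allocate(int requested) {
        List<String> grants = new ArrayList<>();
        for (Map.Entry<String, Integer> e : freeSlots.entrySet()) {
            int free = e.getValue();
            while (grants.size() < requested && free > 0) {
                grants.add(e.getKey());
                free--;
            }
            e.setValue(free);
        }
        return grants;
    }
}

// Each small JobTracker asks the ClusterManager for slots and then talks to
// the granted TaskTrackers directly, keeping scheduling decisions local.
class PerUserJobTracker {
    private final ClusterManager cm;
    PerUserJobTracker(ClusterManager cm) { this.cm = cm; }

    void runTasks(int numTasks) {
        for (String host : cm.allocate(numTasks)) {
            System.out.println("launching task on " + host);
        }
    }
}

public class FederationSketch {
    public static void main(String[] args) {
        ClusterManager cm = new ClusterManager();
        cm.heartbeat("tracker-1", 4);
        cm.heartbeat("tracker-2", 2);
        new PerUserJobTracker(cm).runTasks(3);
    }
}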
  • 14. Hadoop High Availability
  • 15. Warehouse High Availability
    - A full cluster restart takes 90-120 minutes
    - Software upgrades cost 20-30 hours of downtime per year
    - Cluster crashes cost 5 hours of downtime per year
    - MapReduce tolerates failures
  • 16. HDFS High Availability Design (diagram: the Primary NN writes the edits log to NFS and the Standby NN reads it from there; DataNodes send block reports and block-received messages to both NameNodes; a hedged sketch of the shared edits log follows)
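A minimal sketch of the shared-edits-log idea from the diagram, assuming the primary appends transactions to a log file on NFS and the standby tails that same file to keep its namespace in sync. The file path and the plain-text record format are illustrative assumptions, not the real HDFS edits log format.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StandbyEditsTailer {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Shared NFS location written by the primary; path is illustrative.
        Path sharedEdits = Paths.get("/mnt/nfs/namenode/current/edits");
        long offset = 0;

        // The standby tails the log forever, replaying anything the primary
        // has appended since the last poll.
        while (true) {
            try (RandomAccessFile log = new RandomAccessFile(sharedEdits.toFile(), "r")) {
                if (log.length() > offset) {
                    log.seek(offset);
                    String record;
                    while ((record = log.readLine()) != null) {
                        // In the real system this would be a serialized namespace
                        // transaction applied to the standby's in-memory image.
                        System.out.println("applying edit: " + record);
                    }
                    offset = log.getFilePointer();
                }
            }
            Thread.sleep(1000); // poll the shared log for new transactions
        }
    }
}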
  • 17. Clients Design
    - ZooKeeper is used for name resolution (see the sketch below)
    - Under normal conditions ZooKeeper contains the location of the primary node
    - During a failover the ZooKeeper record is empty, and the clients know to wait for the failover to complete
    - On a network failure, clients check whether the ZooKeeper entry has changed and retry the command against the new primary NameNode if a failover has occurred
    - For large clusters, clients also cache the location of the primary on the local node to ease the load on the ZooKeeper cluster
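A minimal sketch of the lookup described above, using the standard ZooKeeper client API. The quorum address, the znode path, and the one-second retry interval are illustrative assumptions; only the wait-while-empty behavior is taken from the slide.

import java.nio.charset.StandardCharsets;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class PrimaryNameNodeResolver {
    private static final String ZK_QUORUM = "zk1.example.com:2181";   // illustrative
    private static final String PRIMARY_ZNODE = "/hdfs/warehouse/primary"; // illustrative

    public static String resolvePrimary() throws Exception {
        ZooKeeper zk = new ZooKeeper(ZK_QUORUM, 30000, event -> { });
        try {
            // Poll until the znode holds an address again. An empty or missing
            // record means a failover is in progress, so the client waits.
            while (true) {
                byte[] data = null;
                try {
                    data = zk.getData(PRIMARY_ZNODE, false, new Stat());
                } catch (KeeperException.NoNodeException e) {
                    // znode absent: treat it the same as an empty record
                }
                if (data != null && data.length > 0) {
                    return new String(data, StandardCharsets.UTF_8);
                }
                Thread.sleep(1000); // failover in progress, retry shortly
            }
        } finally {
            zk.close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("primary namenode: " + resolvePrimary());
    }
}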
  • 18. HDFS Raid
  • 19. HDFS Raid
    - 3-way replication
      - Data locality - necessary only for the new data
      - Data availability - necessary for all kinds of data
    - Erasure codes
      - Data locality is worse than with 3-way replication
      - Data availability is at least as good as with 3-way replication
  • 20. HDFS Raid Details
    - XOR: 10 blocks replicated 3 times = 30 physical blocks (effective replication factor 3.0) vs. 10 blocks replicated twice + a checksum (XOR) block replicated twice = 22 physical blocks (effective replication factor 2.2)
    - Reed-Solomon encoding: 10 blocks replicated 3 times = 30 physical blocks (effective replication factor 3.0) vs. 10 blocks with replication factor 1 + erasure-code (RS) blocks replicated once = 14 physical blocks (effective replication factor 1.4); see the arithmetic check below
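The effective replication factors quoted on the slide can be checked with a few lines of arithmetic. The (10 data + 1 XOR) and (10 data + 4 RS parity) block layouts below follow the slide's block counts; the class and method names are just illustration.

public class EffectiveReplication {
    // effective replication factor = physical blocks / logical data blocks
    static double effectiveReplication(int dataBlocks, int dataReplication,
                                       int parityBlocks, int parityReplication) {
        int physical = dataBlocks * dataReplication + parityBlocks * parityReplication;
        return (double) physical / dataBlocks;
    }

    public static void main(String[] args) {
        // Plain 3-way replication: 10 * 3 = 30 physical blocks -> 3.0
        System.out.println(effectiveReplication(10, 3, 0, 1));
        // XOR raid: 10 blocks x2 + 1 XOR block x2 = 22 physical blocks -> 2.2
        System.out.println(effectiveReplication(10, 2, 1, 2));
        // Reed-Solomon: 10 blocks x1 + 4 parity blocks x1 = 14 physical blocks -> 1.4
        System.out.println(effectiveReplication(10, 1, 4, 1));
    }
}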
  • 21. HDFS Raid Pros and Cons
    - Saves a lot of space
    - Provides the same guarantees for data availability
    - Worse data locality
    - Blocks need to be reconstructed instead of re-replicated (CPU + network cost)
    - Block placement in the cluster is important and needs to be maintained
  • 22. facebook.com/dms [email_address] [email_address]
