Hadoop Scalability at Facebook. Dmytro Molkov, Facebook

Dmytro Molkov, Facebook

Bachelor of Applied Mathematics, Taras Shevchenko National University of Kyiv (2007). Master of Computer Science, Stony Brook University (2009). Hadoop HDFS committer since 2011. Member of the Hadoop team at Facebook since 2009.

Talk topic
Hadoop Scalability at Facebook.

Abstract
Hadoop and Hive are an excellent toolkit for storing and analyzing petabytes of data at Facebook. Working with data at that scale, the Hadoop team at Facebook runs into Hadoop scalability and efficiency problems every day. The talk covers some of the optimizations across different parts of Facebook's Hadoop infrastructure that make it possible to provide a high-quality service: for example, optimizing storage costs in multi-petabyte HDFS clusters, increasing system throughput, and reducing downtime through High Availability work on HDFS.

Presentation Transcript

  • Hadoop Scalability at Facebook
    • Dmytro Molkov ( [email_address] )
    • YaC, Moscow, September 19, 2011
  • Agenda
    • How Facebook uses Hadoop
    • Hadoop Scalability
    • Hadoop High Availability
    • HDFS Raid
  • How Facebook uses Hadoop
  • Uses of Hadoop at Facebook
    • Warehouse
      • Thousands of machines in the cluster
      • Tens of petabytes of data
      • Tens of thousands of jobs/queries a day
      • Over a hundred million files
    • Scribe-HDFS
      • Dozens of small clusters
      • Append support (a small sketch follows this slide)
      • High availability
      • High throughput
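
The slides only name append support; the sketch below shows what it enables for a log collector like Scribe-HDFS, using the public Hadoop FileSystem API. The path and the log line are made-up examples, and append must be enabled on the cluster.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Appending to an existing HDFS file instead of rewriting it: this is
    // what lets Scribe stream log lines into long-lived per-category files.
    public class ScribeAppendSketch {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path logFile = new Path("/scribe/example-category/current"); // hypothetical path
            FSDataOutputStream out = fs.append(logFile);
            out.write("one more log line\n".getBytes("UTF-8"));
            out.close(); // makes the appended data visible to readers
        }
    }
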
  • Uses of Hadoop at Facebook (contd.)
    • Realtime Analytics
      • Medium-sized HBase clusters
      • High throughput/low latency
    • FB Messages Storage
      • Medium-sized HBase clusters
      • Low latency
      • High data durability
      • High Availability
    • Misc Storage/Backup clusters
      • Small to medium-sized
      • Various availability/performance requirements
  • Hadoop Scalability
  • Hadoop Scalability
    • Warehouse Cluster - A “Single Cluster” approach
      • Good data locality
      • Ease of data access
      • Operational Simplicity
      • NameNode is the bottleneck
        • Memory pressure - too many files and blocks (a back-of-envelope sketch follows this slide)
        • CPU pressure - too many metadata operations against a single node
      • Long Startup Time
      • JobTracker is the bottleneck
        • Memory Pressure - too many jobs/tasks/counters in memory
        • CPU pressure - scheduling computation is expensive
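
The memory-pressure bullet above is easy to put numbers on. A back-of-envelope Java sketch: the file count comes from the Warehouse slide, while the block count and the ~150 bytes of heap per object are rough assumptions, not figures from the talk.

    // Rough estimate of the metadata a single NameNode must hold on-heap.
    public class NameNodeHeapEstimate {
        public static void main(String[] args) {
            long files = 100_000_000L;   // "over a hundred million files" (slide above)
            long blocks = 150_000_000L;  // assumed: blocks slightly outnumber files
            long bytesPerObject = 150L;  // assumed rough heap cost per inode/block
            double heapGb = (files + blocks) * bytesPerObject / 1e9;
            System.out.printf("Estimated NameNode heap: ~%.0f GB%n", heapGb);
            // Tens of GB of hot metadata in one JVM also explains the long
            // startup time: the image and all block reports must be reloaded.
        }
    }
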
  • HDFS Federation Wishlist
    • Single Cluster
    • Preserve Data Locality
    • Keep Operations Simple
    • Distribute both CPU and Memory Load
  • Hadoop Federation Design (diagram): NameNode #1 through NameNode #N all share one pool of DataNodes
  • HDFS Federation Overview
      • Each NameNode holds a part of the namespace
      • Hive tables are distributed between the NameNodes
      • The Hive Metastore stores the full locations of the tables (including the NameNode) -> Hive clients know which cluster the data is stored in
      • HDFS clients have a mount table to know where the data is (a minimal sketch follows this slide)
      • Each namespace uses all DataNodes for storage -> the cluster load is fully balanced (storage and I/O)
      • A single DataNode process per node ensures good utilization of resources
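
The mount table itself is not shown in the slides; here is a minimal, hypothetical Java sketch of the idea - longest-prefix matching from an absolute path to the NameNode that owns that part of the namespace (all names are illustrative).

    import java.util.HashMap;
    import java.util.Map;

    // Client-side mount table: path prefix -> NameNode address.
    public class MountTable {
        private final Map<String, String> mounts = new HashMap<String, String>();

        public void addMount(String pathPrefix, String nameNode) {
            mounts.put(pathPrefix, nameNode);
        }

        // Walk an absolute path ("/warehouse/t1/part-0") upwards until a
        // mounted prefix is found; the longest prefix wins.
        public String resolve(String path) {
            for (String p = path; !p.isEmpty(); p = p.substring(0, p.lastIndexOf('/'))) {
                String nameNode = mounts.get(p);
                if (nameNode != null) {
                    return nameNode;
                }
            }
            return mounts.get("/"); // default namespace, if one is mounted
        }
    }

For example, addMount("/warehouse", "nn1:8020") and addMount("/scribe", "nn2:8020") send warehouse and log traffic to different NameNodes, while both namespaces still keep their blocks on every DataNode.
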
  • Map-Reduce Federation
      • Backward Compatibility with the old code
      • Preserve data locality
      • Make scheduling faster
      • Ease the resource pressure on the JobTracker
  • MapReduce Federation design (diagram): the Job Client sends a Resource Request to the Cluster Resource Manager, TaskTrackers report to it via Resource Heartbeats, and Job Communication flows directly between the Job Client and its TaskTrackers
  • MapReduce Federation Overview
      • Cluster Manager only allocates resources
      • JobTracker per user -> few tasks per JobTracker -> more responsive scheduling
      • ClusterManager is stateless -> shorter restart times -> better availability (a hypothetical interface sketch follows this slide)
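
The slides do not show the allocation API, so the following Java sketch is hypothetical (all names are illustrative): a stateless ClusterManager hands out slot grants, and a small per-user JobTracker schedules its own tasks against them.

    import java.util.List;

    // Resource allocation only: no job state lives here, so a restart just
    // needs fresh TaskTracker heartbeats -> better availability.
    interface ClusterManager {
        // preferredHosts lets the caller ask for slots near its input data,
        // preserving data locality.
        List<ResourceGrant> requestResources(String jobId, int numSlots,
                                             List<String> preferredHosts);

        void releaseResources(String jobId, List<ResourceGrant> grants);
    }

    // A grant ties a slot to a concrete TaskTracker; the per-user JobTracker
    // then talks to that TaskTracker directly to run tasks.
    final class ResourceGrant {
        final String taskTrackerHost;
        final int slotId;

        ResourceGrant(String taskTrackerHost, int slotId) {
            this.taskTrackerHost = taskTrackerHost;
            this.slotId = slotId;
        }
    }
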
  • Hadoop High Availability
  • Warehouse High Availability
      • A full cluster restart takes 90-120 mins
      • Software upgrades are 20-30 hrs of downtime/year (each upgrade is a full restart)
      • Cluster crashes are 5 hrs of downtime/year
      • MapReduce tolerates failures
  • HDFS High Availability Design (diagram): the Primary NN and the Standby NN share the Edits Log through NFS, and DataNodes send their Block Reports and Block Received messages to both nodes
  • Clients Design
      • Using ZooKeeper as a method of name resolution
      • Under normal conditions ZooKeeper contains a location of the primary node
      • During a failover the ZooKeeper record is empty, so clients know to wait for the failover to complete
      • On a network failure, clients check whether the ZooKeeper entry has changed and retry the command against the new Primary NameNode if a failover has occurred
      • On large clusters, clients also cache the location of the primary on the local node to ease the load on the ZooKeeper cluster (a minimal client sketch follows this slide)
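
A minimal sketch of that lookup in Java, using the stock org.apache.zookeeper client. The znode path and the host:port payload are assumptions; the slides do not specify them.

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;

    // Resolves the current Primary NameNode through ZooKeeper. An absent or
    // empty znode means a failover is in progress, so the client waits.
    public class PrimaryNameNodeResolver {
        private static final String PRIMARY_ZNODE = "/hdfs/primary"; // assumed path

        private final ZooKeeper zk;

        public PrimaryNameNodeResolver(ZooKeeper zk) {
            this.zk = zk;
        }

        // Blocks until a primary is published, i.e. the failover completes.
        public String waitForPrimary() throws KeeperException, InterruptedException {
            while (true) {
                try {
                    byte[] data = zk.getData(PRIMARY_ZNODE, false, null);
                    if (data != null && data.length > 0) {
                        return new String(data); // assumed "host:port" payload
                    }
                } catch (KeeperException.NoNodeException e) {
                    // record is empty during failover -> keep waiting
                }
                Thread.sleep(1000); // back off before re-checking
            }
        }
    }
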
  • HDFS Raid
  • HDFS Raid
      • 3-way replication
        • Data locality - necessary only for new data
        • Data availability - necessary for all kinds of data
      • Erasure codes
        • Data locality is worse than with 3-way replication
        • Data availability is at least as good as with 3-way replication
  • HDFS Raid Details (a small XOR sketch follows this slide)
      • XOR:
        • Before: 10 blocks replicated 3 times = 30 physical blocks -> effective replication factor 3.0
        • After: 10 blocks replicated twice + a checksum (XOR) block replicated twice = 22 physical blocks -> effective replication factor 2.2
      • Reed-Solomon encoding:
        • Before: 10 blocks replicated 3 times = 30 physical blocks -> effective replication factor 3.0
        • After: 10 blocks with replication factor 1 + 4 erasure code (RS) blocks replicated once = 14 physical blocks -> effective replication factor 1.4
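
To make the XOR row concrete, here is a small Java sketch (a simplification: real HDFS Raid works on HDFS block streams, not in-memory arrays). One parity block protects a stripe of 10 data blocks, and any single lost block is recovered by XOR-ing the parity with the nine survivors.

    // XOR parity over a stripe of equal-sized blocks.
    public class XorStripe {
        public static byte[] parity(byte[][] blocks) {
            byte[] p = new byte[blocks[0].length];
            for (byte[] block : blocks) {
                for (int i = 0; i < p.length; i++) {
                    p[i] ^= block[i];
                }
            }
            return p;
        }

        // Recover a lost block: XOR of the parity block and all survivors.
        public static byte[] reconstruct(byte[] parityBlock, byte[][] survivors) {
            byte[] result = parityBlock.clone();
            for (byte[] block : survivors) {
                for (int i = 0; i < result.length; i++) {
                    result[i] ^= block[i];
                }
            }
            return result;
        }
    }

This is where the 2.2 factor in the table comes from: 10 data blocks x 2 replicas + 1 parity block x 2 replicas = 22 physical blocks for 10 logical ones.
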
  • HDFS Raid Pros and Cons
      • Saves a lot of space
      • Provides the same data availability guarantees
      • Worse data locality
      • Blocks must be reconstructed instead of re-replicated (CPU + network cost)
      • Block placement in the cluster matters (blocks of one stripe must live on different nodes) and needs to be maintained
  • facebook.com/dms [email_address] [email_address]