SlideShare a Scribd company logo
Realtime Big Data at Facebook
with Hadoop and HBase


Jonathan Gray
November ,
Hadoop World NYC
Agenda


   Why Hadoop and HBase?

   Applications of HBase at Facebook

   Future of HBase at Facebook
About Me                                  Jonathan Gray
▪ Previous life as Co-Founder of Streamy.com
  ▪ Realtime Social News Aggregator

  ▪ Big Data problems led us to Hadoop/HBase

  ▪ HBase committer and Hadoop user/complainer



▪ Software Engineer at Facebook
  ▪ Develop, support, and evangelize HBase across teams

  ▪ Recently joined Database Infrastructure Engineering
      MySQL and HBase together at last!
Why Hadoop and HBase?
For Realtime Data?
Cache      Data analysis




OS   Web server    Database       Language
Problems with existing stack
▪ MySQL is stable, but...
  ▪ Limited throughput

  ▪ Not inherently distributed

  ▪ Table size limits

  ▪ Inflexible schema


▪ Memcached is fast, but...
  ▪ Only key-value so data is opaque

  ▪ No write-through
Problems with existing stack
▪ Hadoop is scalable, but...
  ▪ MapReduce is slow

  ▪ Writing MapReduce is difficult

  ▪ Does not support random writes

  ▪ Poor support for random reads
Specialized solutions
▪ Inbox Search
  ▪ Cassandra


▪ High-throughput, persistent key-value
  ▪ Tokyo Cabinet


▪ Large scale data warehousing
  ▪ Hive


▪ Custom C++ servers for lots of other stuff
Finding a new online data store
▪ Consistent patterns emerge
  ▪ Massive datasets, often largely inactive

  ▪ Lots of writes

  ▪ Fewer reads

  ▪ Dictionaries and lists

  ▪ Entity-centric schemas
     ▪   per-user, per-domain, per-app
Finding a new online data store
▪ Other requirements laid out
  ▪ Elasticity

  ▪ High availability

  ▪ Strong consistency within a datacenter

  ▪ Fault isolation


▪ Some non-requirements
  ▪ Network partitions within a single datacenter

  ▪ Active-active serving from multiple datacenters
Finding a new online data store
▪ In         , engineers at FB compared DBs
   ▪ Apache Cassandra, Apache HBase, Sharded MySQL


▪ Compared performance, scalability, and features
   ▪ HBase gave excellent write performance, good reads

   ▪ HBase already included many nice-to-have features
       ▪   Atomic read-modify-write operations
       ▪   Multiple shards per server
       ▪   Bulk importing
       ▪   Range scans
HBase uses HDFS
We get the benefits of HDFS as a storage
system for free
▪ Fault tolerance

▪ Scalability

▪ Checksums fix corruptions

▪ MapReduce

▪ Fault isolation of disks

▪ HDFS battle tested at petabyte scale at Facebook

  ▪   Lots of existing operational experience
HBase in a nutshell
▪ Sorted and column-oriented

▪ High write throughput

▪ Horizontal scalability

▪ Automatic failover

▪ Regions sharded dynamically
Applications of HBase at Facebook
Use Case
    Titan
(Facebook Messages)
The New Facebook Messages




Messages   IM/Chat   email   SMS
Facebook Messaging
▪ Largest engineering effort in the history of FB
  ▪    engineers over more than a year
  ▪ Incorporates over   infrastructure technologies
      ▪ Hadoop, HBase, Haystack, ZooKeeper, etc...



▪ A product at massive scale on day one

  ▪ Hundreds of millions of active users

  ▪   + billion messages a month
  ▪   k instant messages a second on average
Messaging Challenges
▪ High write throughput
  ▪ Every message, instant message, SMS, and e-mail

  ▪ Search indexes and metadata for all of the above

  ▪ Denormalized schema


▪ Massive clusters
  ▪ So much data and usage requires a large server footprint

  ▪ Do not want outages to impact availability

  ▪ Must be able to easily scale out
High Write Throughput
            Write
            Key Value

                                                  Sequential
  Key val                          Key val          write

  Key val                          Key val
  Key val                          Key val
    .
    .                                  .
                                       .
    .                                  . memory
                               Sorted in
  Key val                          Key val
    .
    .
    .
  Key val         Sequential       Key val
                  write

Commit Log                      Memstore
Horizontal Scalability
Region

         ...                ...
Automatic Failover
                              Find new
               HBase client   server from
                              META

server
 died
Facebook Messages Stats
▪   B+ messages per day
    ▪   B+ read/write ops to HBase per day
        ▪ . M ops/sec at peak

        ▪     read,     write
        ▪~   columns per operation across multiple families

▪   PB+ of online data in HBase
    ▪ LZO compressed and un-replicated (   PB replicated)
    ▪ Growing at    TB/month
Use Case
   Puma
(Facebook Insights)
Before Puma
                     Offline ETL
Web Tier            HDFS         Hive         MySQL
           Scribe          MR           SQL




                           SQL


                      8-24 hours
Puma
                    Realtime ETL
Web Tier            HDFS             Puma            HBase
           Scribe           PTail           HTable




                            Thrift


                    2-30 seconds
Puma as Realtime MapReduce
▪ Map phase with PTail
  ▪ Divide the input log stream into N shards

  ▪ First version supported random bucketing

  ▪ Now supports application-level bucketing


▪ Reduce phase with HBase
  ▪ Every row+column in HBase is an output key

  ▪ Aggregate key counts using atomic counters

  ▪ Can also maintain per-key lists or other structures
Puma for Facebook Insights
▪ Realtime URL/Domain Insights
  ▪ Domain owners can see deep analytics for their site

  ▪ Clicks, Likes, Shares, Comments, Impressions

  ▪ Detailed demographic breakdowns (anonymized)

  ▪ Top URLs calculated per-domain and globally


▪ Massive Throughput
  ▪ Billions of URLs

  ▪>   Million counter increments per second
Future of Puma
▪ Centrally managed service for many products


▪ Several other applications in production
  ▪ Commerce Tracking

  ▪ Ad Insights


▪ Making Puma generic
  ▪ Dynamically configured by product teams

  ▪ Custom query language
Use Case
         ODS
(Facebook Internal Metrics)
ODS
▪ Operational Data Store
  ▪ System metrics (CPU, Memory, IO, Network)

  ▪ Application metrics (Web, DB, Caches)

  ▪ Facebook metrics (Usage, Revenue)
      ▪   Easily graph this data over time
      ▪   Supports complex aggregation, transformations, etc.

▪ Difficult to scale with MySQL

  ▪ Millions of unique time-series with billions of points

  ▪ Irregular data growth patterns
Dynamic sharding of regions
Region

                ...           ...




           server
         overloaded
Future of HBase at Facebook
User and Graph Data
      in HBase
Why now?
▪ MySQL+Memcached hard to replace, but...
  ▪ Joins and other RDBMS functionality are gone

  ▪ From writing SQL to using APIs

  ▪ Next generation of services and caches make the

   persistent storage engine transparent to www

▪ Primarily a financially motivated decision
  ▪ MySQL works, but can HBase save us money?

  ▪ Also, are there things we just couldn’t do before?
HBase vs. MySQL
▪ MySQL at Facebook
  ▪ Tier size determined solely by IOPS

  ▪ Heavy on random IO for reads and writes

  ▪ Rely on fast disks or flash to scale individual nodes



▪ HBase showing promise of cost savings
  ▪ Fewer IOPS on write-heavy workloads

  ▪ Larger tables on denser, cheaper nodes

  ▪ Simpler operations and replication “for free”
HBase vs. MySQL
▪ MySQL is not going anywhere soon
  ▪ It works!



▪ But HBase is a great addition to the tool belt
  ▪ Different set of trade-offs

  ▪ Great at storing key-values, dictionaries, and lists

  ▪ Products with heavy write requirements

  ▪ Generated data

  ▪ Potential capital and operational cost savings
UDB Challenges
▪ MySQL has a       + year head start
  ▪ HBase is still a pre-   . database system

▪ Insane Requirements
  ▪ Zero data loss, low latency, very high throughput

  ▪ Reads, writes, and atomic read-modify-writes

  ▪ WAN replication, backups w/ point-in-time recovery

  ▪ Live migration of critical user data w/ existing shards

  ▪ queryf() and other fun edge cases to deal with
Technical/Developer oriented talk tomorrow:




Apache HBase Road Map
A short history of nearly everything HBase. Past, present, and future.

Wednesday @ 1PM in the Met Ballroom
Check out the HBase at Facebook Page:

facebook.com/UsingHbase


    Thanks! Questions?

More Related Content

What's hot

Hw09 Practical HBase Getting The Most From Your H Base Install
Hw09   Practical HBase  Getting The Most From Your H Base InstallHw09   Practical HBase  Getting The Most From Your H Base Install
Hw09 Practical HBase Getting The Most From Your H Base InstallCloudera, Inc.
 
[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)baggioss
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Cloudera, Inc.
 
HBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBaseHBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBase
Cloudera, Inc.
 
Intro to HBase - Lars George
Intro to HBase - Lars GeorgeIntro to HBase - Lars George
Intro to HBase - Lars George
JAX London
 
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon
 
Apache HBase for Architects
Apache HBase for ArchitectsApache HBase for Architects
Apache HBase for Architects
Nick Dimiduk
 
Hoodie: Incremental processing on hadoop
Hoodie: Incremental processing on hadoopHoodie: Incremental processing on hadoop
Hoodie: Incremental processing on hadoop
Prasanna Rajaperumal
 
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
Jonathan Seidman
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
huguk
 
Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)
alexbaranau
 
Getting Started with Hadoop
Getting Started with HadoopGetting Started with Hadoop
Getting Started with HadoopCloudera, Inc.
 
HBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, Adobe
HBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, AdobeHBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, Adobe
HBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, Adobe
Cloudera, Inc.
 
MongoDB at eBay
MongoDB at eBayMongoDB at eBay
MongoDB at eBayMongoDB
 
SQL Server 2012 and Big Data
SQL Server 2012 and Big DataSQL Server 2012 and Big Data
SQL Server 2012 and Big Data
Microsoft TechNet - Belgium and Luxembourg
 
HBase Advanced Schema Design - Berlin Buzzwords - June 2012
HBase Advanced Schema Design - Berlin Buzzwords - June 2012HBase Advanced Schema Design - Berlin Buzzwords - June 2012
HBase Advanced Schema Design - Berlin Buzzwords - June 2012
larsgeorge
 
Integration of HIve and HBase
Integration of HIve and HBaseIntegration of HIve and HBase
Integration of HIve and HBaseHortonworks
 
HBase Advanced - Lars George
HBase Advanced - Lars GeorgeHBase Advanced - Lars George
HBase Advanced - Lars George
JAX London
 
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseHBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
Cloudera, Inc.
 

What's hot (20)

Hw09 Practical HBase Getting The Most From Your H Base Install
Hw09   Practical HBase  Getting The Most From Your H Base InstallHw09   Practical HBase  Getting The Most From Your H Base Install
Hw09 Practical HBase Getting The Most From Your H Base Install
 
[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
 
HBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBaseHBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBase
 
Intro to HBase - Lars George
Intro to HBase - Lars GeorgeIntro to HBase - Lars George
Intro to HBase - Lars George
 
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
 
Apache HBase for Architects
Apache HBase for ArchitectsApache HBase for Architects
Apache HBase for Architects
 
Hoodie: Incremental processing on hadoop
Hoodie: Incremental processing on hadoopHoodie: Incremental processing on hadoop
Hoodie: Incremental processing on hadoop
 
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)
 
Getting Started with Hadoop
Getting Started with HadoopGetting Started with Hadoop
Getting Started with Hadoop
 
HBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, Adobe
HBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, AdobeHBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, Adobe
HBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, Adobe
 
MongoDB at eBay
MongoDB at eBayMongoDB at eBay
MongoDB at eBay
 
NoSQL: Cassadra vs. HBase
NoSQL: Cassadra vs. HBaseNoSQL: Cassadra vs. HBase
NoSQL: Cassadra vs. HBase
 
SQL Server 2012 and Big Data
SQL Server 2012 and Big DataSQL Server 2012 and Big Data
SQL Server 2012 and Big Data
 
HBase Advanced Schema Design - Berlin Buzzwords - June 2012
HBase Advanced Schema Design - Berlin Buzzwords - June 2012HBase Advanced Schema Design - Berlin Buzzwords - June 2012
HBase Advanced Schema Design - Berlin Buzzwords - June 2012
 
Integration of HIve and HBase
Integration of HIve and HBaseIntegration of HIve and HBase
Integration of HIve and HBase
 
HBase Advanced - Lars George
HBase Advanced - Lars GeorgeHBase Advanced - Lars George
HBase Advanced - Lars George
 
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseHBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
 

Viewers also liked

Hadoop: Extending your Data Warehouse
Hadoop: Extending your Data WarehouseHadoop: Extending your Data Warehouse
Hadoop: Extending your Data Warehouse
Cloudera, Inc.
 
Advance Facebook Techniques
Advance Facebook TechniquesAdvance Facebook Techniques
Advance Facebook Techniques
BAM Social Media Services
 
The Secrets of Building Realtime Big Data Systems
The Secrets of Building Realtime Big Data SystemsThe Secrets of Building Realtime Big Data Systems
The Secrets of Building Realtime Big Data Systems
nathanmarz
 
Instagram - Digital Marketing Tool
Instagram - Digital Marketing ToolInstagram - Digital Marketing Tool
Instagram - Digital Marketing Tool
CHRISTINE MA
 
IS OLAP DEAD IN THE AGE OF BIG DATA?
IS OLAP DEAD IN THE AGE OF BIG DATA?IS OLAP DEAD IN THE AGE OF BIG DATA?
IS OLAP DEAD IN THE AGE OF BIG DATA?
DataWorks Summit
 
Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013
Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013
Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013Cosmin Lehene
 
並列分散処理基盤Hadoopの紹介と、開発者が語るHadoopの使いどころ (Silicon Valley x 日本 / Tech x Business ...
並列分散処理基盤Hadoopの紹介と、開発者が語るHadoopの使いどころ (Silicon Valley x 日本 / Tech x Business ...並列分散処理基盤Hadoopの紹介と、開発者が語るHadoopの使いどころ (Silicon Valley x 日本 / Tech x Business ...
並列分散処理基盤Hadoopの紹介と、開発者が語るHadoopの使いどころ (Silicon Valley x 日本 / Tech x Business ...
NTT DATA OSS Professional Services
 
MemSQL
MemSQLMemSQL
Facebook App Dev 201 App Launch Distrib
Facebook App Dev 201 App Launch DistribFacebook App Dev 201 App Launch Distrib
Facebook App Dev 201 App Launch Distrib
Dave McClure
 
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
tatsuya6502
 
Power Prospecting Using Social Media
Power Prospecting Using Social MediaPower Prospecting Using Social Media
Power Prospecting Using Social Media
Doug Devitre
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
强 王
 
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookTech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
The Hive
 
Introduction to Firebase [Google I/O Extended Bangkok 2016]
Introduction to Firebase [Google I/O Extended Bangkok 2016]Introduction to Firebase [Google I/O Extended Bangkok 2016]
Introduction to Firebase [Google I/O Extended Bangkok 2016]
Sittiphol Phanvilai
 
Big Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyBig Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case Study
Nati Shalom
 
ちょっと理解に自信がないな という皆さまに贈るHadoop/Sparkのキホン (IBM Datapalooza Tokyo 2016講演資料)
ちょっと理解に自信がないなという皆さまに贈るHadoop/Sparkのキホン (IBM Datapalooza Tokyo 2016講演資料)ちょっと理解に自信がないなという皆さまに贈るHadoop/Sparkのキホン (IBM Datapalooza Tokyo 2016講演資料)
ちょっと理解に自信がないな という皆さまに贈るHadoop/Sparkのキホン (IBM Datapalooza Tokyo 2016講演資料)
hamaken
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
Amazon Web Services
 
Netflix Big Data Paris 2017
Netflix Big Data Paris 2017Netflix Big Data Paris 2017
Netflix Big Data Paris 2017
Jason Flittner
 
40分でわかるHadoop徹底入門 (Cloudera World Tokyo 2014 講演資料)
40分でわかるHadoop徹底入門 (Cloudera World Tokyo 2014 講演資料) 40分でわかるHadoop徹底入門 (Cloudera World Tokyo 2014 講演資料)
40分でわかるHadoop徹底入門 (Cloudera World Tokyo 2014 講演資料)
hamaken
 

Viewers also liked (20)

Hadoop: Extending your Data Warehouse
Hadoop: Extending your Data WarehouseHadoop: Extending your Data Warehouse
Hadoop: Extending your Data Warehouse
 
Sync your facebook friends with your database
Sync your facebook friends with your databaseSync your facebook friends with your database
Sync your facebook friends with your database
 
Advance Facebook Techniques
Advance Facebook TechniquesAdvance Facebook Techniques
Advance Facebook Techniques
 
The Secrets of Building Realtime Big Data Systems
The Secrets of Building Realtime Big Data SystemsThe Secrets of Building Realtime Big Data Systems
The Secrets of Building Realtime Big Data Systems
 
Instagram - Digital Marketing Tool
Instagram - Digital Marketing ToolInstagram - Digital Marketing Tool
Instagram - Digital Marketing Tool
 
IS OLAP DEAD IN THE AGE OF BIG DATA?
IS OLAP DEAD IN THE AGE OF BIG DATA?IS OLAP DEAD IN THE AGE OF BIG DATA?
IS OLAP DEAD IN THE AGE OF BIG DATA?
 
Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013
Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013
Real-time “OLAP” for Big Data (+ use cases) - bigdata.ro 2013
 
並列分散処理基盤Hadoopの紹介と、開発者が語るHadoopの使いどころ (Silicon Valley x 日本 / Tech x Business ...
並列分散処理基盤Hadoopの紹介と、開発者が語るHadoopの使いどころ (Silicon Valley x 日本 / Tech x Business ...並列分散処理基盤Hadoopの紹介と、開発者が語るHadoopの使いどころ (Silicon Valley x 日本 / Tech x Business ...
並列分散処理基盤Hadoopの紹介と、開発者が語るHadoopの使いどころ (Silicon Valley x 日本 / Tech x Business ...
 
MemSQL
MemSQLMemSQL
MemSQL
 
Facebook App Dev 201 App Launch Distrib
Facebook App Dev 201 App Launch DistribFacebook App Dev 201 App Launch Distrib
Facebook App Dev 201 App Launch Distrib
 
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
 
Power Prospecting Using Social Media
Power Prospecting Using Social MediaPower Prospecting Using Social Media
Power Prospecting Using Social Media
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
 
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookTech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
 
Introduction to Firebase [Google I/O Extended Bangkok 2016]
Introduction to Firebase [Google I/O Extended Bangkok 2016]Introduction to Firebase [Google I/O Extended Bangkok 2016]
Introduction to Firebase [Google I/O Extended Bangkok 2016]
 
Big Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyBig Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case Study
 
ちょっと理解に自信がないな という皆さまに贈るHadoop/Sparkのキホン (IBM Datapalooza Tokyo 2016講演資料)
ちょっと理解に自信がないなという皆さまに贈るHadoop/Sparkのキホン (IBM Datapalooza Tokyo 2016講演資料)ちょっと理解に自信がないなという皆さまに贈るHadoop/Sparkのキホン (IBM Datapalooza Tokyo 2016講演資料)
ちょっと理解に自信がないな という皆さまに贈るHadoop/Sparkのキホン (IBM Datapalooza Tokyo 2016講演資料)
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
 
Netflix Big Data Paris 2017
Netflix Big Data Paris 2017Netflix Big Data Paris 2017
Netflix Big Data Paris 2017
 
40分でわかるHadoop徹底入門 (Cloudera World Tokyo 2014 講演資料)
40分でわかるHadoop徹底入門 (Cloudera World Tokyo 2014 講演資料) 40分でわかるHadoop徹底入門 (Cloudera World Tokyo 2014 講演資料)
40分でわかるHadoop徹底入門 (Cloudera World Tokyo 2014 講演資料)
 

Similar to Hadoop World 2011: Building Realtime Big Data Services at Facebook with Hadoop and HBase - Jonathan Gray, Facebook

Facebook - Jonthan Gray - Hadoop World 2010
Facebook - Jonthan Gray - Hadoop World 2010Facebook - Jonthan Gray - Hadoop World 2010
Facebook - Jonthan Gray - Hadoop World 2010
Cloudera, Inc.
 
支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统yongboy
 
From Batch to Realtime with Hadoop - Berlin Buzzwords - June 2012
From Batch to Realtime with Hadoop - Berlin Buzzwords - June 2012From Batch to Realtime with Hadoop - Berlin Buzzwords - June 2012
From Batch to Realtime with Hadoop - Berlin Buzzwords - June 2012
larsgeorge
 
Storage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook MessagesStorage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook Messagesyarapavan
 
Conhecendo o Apache HBase
Conhecendo o Apache HBaseConhecendo o Apache HBase
Conhecendo o Apache HBase
Felipe Ferreira
 
Facebook architecture
Facebook architectureFacebook architecture
Facebook architecturedrewz lin
 
Qcon 090408233824-phpapp01
Qcon 090408233824-phpapp01Qcon 090408233824-phpapp01
Qcon 090408233824-phpapp01jgregory1234
 
Facebook的架构
Facebook的架构Facebook的架构
Facebook的架构yiditushe
 
Facebook architecture
Facebook architectureFacebook architecture
Facebook architecture
mysqlops
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ Pinterest
HBaseCon
 
Nyc hadoop meetup introduction to h base
Nyc hadoop meetup   introduction to h baseNyc hadoop meetup   introduction to h base
Nyc hadoop meetup introduction to h base智杰 付
 
HBase and Hadoop at Urban Airship
HBase and Hadoop at Urban AirshipHBase and Hadoop at Urban Airship
HBase and Hadoop at Urban Airship
dave_revell
 
Austin Scales- Clickstream Analytics at Bazaarvoice
Austin Scales- Clickstream Analytics at BazaarvoiceAustin Scales- Clickstream Analytics at Bazaarvoice
Austin Scales- Clickstream Analytics at Bazaarvoice
bazaarvoice_engineering
 
Storage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook MessagesStorage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook Messagesfeng1212
 
MyRocks introduction and production deployment
MyRocks introduction and production deploymentMyRocks introduction and production deployment
MyRocks introduction and production deployment
Yoshinori Matsunobu
 
Introduction to Apache HBase
Introduction to Apache HBaseIntroduction to Apache HBase
Introduction to Apache HBase
Gokuldas Pillai
 
The Future of Hbase
The Future of HbaseThe Future of Hbase
The Future of Hbase
Salesforce Engineering
 
Apache Drill
Apache DrillApache Drill
Apache Drill
Ted Dunning
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Esther Kundin
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Esther Kundin
 

Similar to Hadoop World 2011: Building Realtime Big Data Services at Facebook with Hadoop and HBase - Jonathan Gray, Facebook (20)

Facebook - Jonthan Gray - Hadoop World 2010
Facebook - Jonthan Gray - Hadoop World 2010Facebook - Jonthan Gray - Hadoop World 2010
Facebook - Jonthan Gray - Hadoop World 2010
 
支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统
 
From Batch to Realtime with Hadoop - Berlin Buzzwords - June 2012
From Batch to Realtime with Hadoop - Berlin Buzzwords - June 2012From Batch to Realtime with Hadoop - Berlin Buzzwords - June 2012
From Batch to Realtime with Hadoop - Berlin Buzzwords - June 2012
 
Storage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook MessagesStorage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook Messages
 
Conhecendo o Apache HBase
Conhecendo o Apache HBaseConhecendo o Apache HBase
Conhecendo o Apache HBase
 
Facebook architecture
Facebook architectureFacebook architecture
Facebook architecture
 
Qcon 090408233824-phpapp01
Qcon 090408233824-phpapp01Qcon 090408233824-phpapp01
Qcon 090408233824-phpapp01
 
Facebook的架构
Facebook的架构Facebook的架构
Facebook的架构
 
Facebook architecture
Facebook architectureFacebook architecture
Facebook architecture
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ Pinterest
 
Nyc hadoop meetup introduction to h base
Nyc hadoop meetup   introduction to h baseNyc hadoop meetup   introduction to h base
Nyc hadoop meetup introduction to h base
 
HBase and Hadoop at Urban Airship
HBase and Hadoop at Urban AirshipHBase and Hadoop at Urban Airship
HBase and Hadoop at Urban Airship
 
Austin Scales- Clickstream Analytics at Bazaarvoice
Austin Scales- Clickstream Analytics at BazaarvoiceAustin Scales- Clickstream Analytics at Bazaarvoice
Austin Scales- Clickstream Analytics at Bazaarvoice
 
Storage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook MessagesStorage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook Messages
 
MyRocks introduction and production deployment
MyRocks introduction and production deploymentMyRocks introduction and production deployment
MyRocks introduction and production deployment
 
Introduction to Apache HBase
Introduction to Apache HBaseIntroduction to Apache HBase
Introduction to Apache HBase
 
The Future of Hbase
The Future of HbaseThe Future of Hbase
The Future of Hbase
 
Apache Drill
Apache DrillApache Drill
Apache Drill
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 

More from Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
Cloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
Cloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
Cloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
Cloudera, Inc.
 

More from Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Recently uploaded

From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 

Recently uploaded (20)

From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 

Hadoop World 2011: Building Realtime Big Data Services at Facebook with Hadoop and HBase - Jonathan Gray, Facebook

  • 1.
  • 2. Realtime Big Data at Facebook with Hadoop and HBase Jonathan Gray November , Hadoop World NYC
  • 3. Agenda Why Hadoop and HBase? Applications of HBase at Facebook Future of HBase at Facebook
  • 4. About Me Jonathan Gray ▪ Previous life as Co-Founder of Streamy.com ▪ Realtime Social News Aggregator ▪ Big Data problems led us to Hadoop/HBase ▪ HBase committer and Hadoop user/complainer ▪ Software Engineer at Facebook ▪ Develop, support, and evangelize HBase across teams ▪ Recently joined Database Infrastructure Engineering MySQL and HBase together at last!
  • 5. Why Hadoop and HBase? For Realtime Data?
  • 6. Cache Data analysis OS Web server Database Language
  • 7. Problems with existing stack ▪ MySQL is stable, but... ▪ Limited throughput ▪ Not inherently distributed ▪ Table size limits ▪ Inflexible schema ▪ Memcached is fast, but... ▪ Only key-value so data is opaque ▪ No write-through
  • 8. Problems with existing stack ▪ Hadoop is scalable, but... ▪ MapReduce is slow ▪ Writing MapReduce is difficult ▪ Does not support random writes ▪ Poor support for random reads
  • 9. Specialized solutions ▪ Inbox Search ▪ Cassandra ▪ High-throughput, persistent key-value ▪ Tokyo Cabinet ▪ Large scale data warehousing ▪ Hive ▪ Custom C++ servers for lots of other stuff
  • 10. Finding a new online data store ▪ Consistent patterns emerge ▪ Massive datasets, often largely inactive ▪ Lots of writes ▪ Fewer reads ▪ Dictionaries and lists ▪ Entity-centric schemas ▪ per-user, per-domain, per-app
  • 11. Finding a new online data store ▪ Other requirements laid out ▪ Elasticity ▪ High availability ▪ Strong consistency within a datacenter ▪ Fault isolation ▪ Some non-requirements ▪ Network partitions within a single datacenter ▪ Active-active serving from multiple datacenters
  • 12. Finding a new online data store ▪ In , engineers at FB compared DBs ▪ Apache Cassandra, Apache HBase, Sharded MySQL ▪ Compared performance, scalability, and features ▪ HBase gave excellent write performance, good reads ▪ HBase already included many nice-to-have features ▪ Atomic read-modify-write operations ▪ Multiple shards per server ▪ Bulk importing ▪ Range scans
  • 13. HBase uses HDFS We get the benefits of HDFS as a storage system for free ▪ Fault tolerance ▪ Scalability ▪ Checksums fix corruptions ▪ MapReduce ▪ Fault isolation of disks ▪ HDFS battle tested at petabyte scale at Facebook ▪ Lots of existing operational experience
  • 14. HBase in a nutshell ▪ Sorted and column-oriented ▪ High write throughput ▪ Horizontal scalability ▪ Automatic failover ▪ Regions sharded dynamically
  • 15. Applications of HBase at Facebook
  • 16. Use Case Titan (Facebook Messages)
  • 17. The New Facebook Messages Messages IM/Chat email SMS
  • 18. Facebook Messaging ▪ Largest engineering effort in the history of FB ▪ engineers over more than a year ▪ Incorporates over infrastructure technologies ▪ Hadoop, HBase, Haystack, ZooKeeper, etc... ▪ A product at massive scale on day one ▪ Hundreds of millions of active users ▪ + billion messages a month ▪ k instant messages a second on average
  • 19. Messaging Challenges ▪ High write throughput ▪ Every message, instant message, SMS, and e-mail ▪ Search indexes and metadata for all of the above ▪ Denormalized schema ▪ Massive clusters ▪ So much data and usage requires a large server footprint ▪ Do not want outages to impact availability ▪ Must be able to easily scale out
  • 20. High Write Throughput Write Key Value Sequential Key val Key val write Key val Key val Key val Key val . . . . . . memory Sorted in Key val Key val . . . Key val Sequential Key val write Commit Log Memstore
  • 22. Automatic Failover Find new HBase client server from META server died
  • 23. Facebook Messages Stats ▪ B+ messages per day ▪ B+ read/write ops to HBase per day ▪ . M ops/sec at peak ▪ read, write ▪~ columns per operation across multiple families ▪ PB+ of online data in HBase ▪ LZO compressed and un-replicated ( PB replicated) ▪ Growing at TB/month
  • 24. Use Case Puma (Facebook Insights)
  • 25. Before Puma Offline ETL Web Tier HDFS Hive MySQL Scribe MR SQL SQL 8-24 hours
  • 26. Puma Realtime ETL Web Tier HDFS Puma HBase Scribe PTail HTable Thrift 2-30 seconds
  • 27. Puma as Realtime MapReduce ▪ Map phase with PTail ▪ Divide the input log stream into N shards ▪ First version supported random bucketing ▪ Now supports application-level bucketing ▪ Reduce phase with HBase ▪ Every row+column in HBase is an output key ▪ Aggregate key counts using atomic counters ▪ Can also maintain per-key lists or other structures
  • 28. Puma for Facebook Insights ▪ Realtime URL/Domain Insights ▪ Domain owners can see deep analytics for their site ▪ Clicks, Likes, Shares, Comments, Impressions ▪ Detailed demographic breakdowns (anonymized) ▪ Top URLs calculated per-domain and globally ▪ Massive Throughput ▪ Billions of URLs ▪> Million counter increments per second
  • 29.
  • 30.
  • 31. Future of Puma ▪ Centrally managed service for many products ▪ Several other applications in production ▪ Commerce Tracking ▪ Ad Insights ▪ Making Puma generic ▪ Dynamically configured by product teams ▪ Custom query language
  • 32. Use Case ODS (Facebook Internal Metrics)
  • 33. ODS ▪ Operational Data Store ▪ System metrics (CPU, Memory, IO, Network) ▪ Application metrics (Web, DB, Caches) ▪ Facebook metrics (Usage, Revenue) ▪ Easily graph this data over time ▪ Supports complex aggregation, transformations, etc. ▪ Difficult to scale with MySQL ▪ Millions of unique time-series with billions of points ▪ Irregular data growth patterns
  • 34. Dynamic sharding of regions Region ... ... server overloaded
  • 35. Future of HBase at Facebook
  • 36. User and Graph Data in HBase
  • 37. Why now? ▪ MySQL+Memcached hard to replace, but... ▪ Joins and other RDBMS functionality are gone ▪ From writing SQL to using APIs ▪ Next generation of services and caches make the persistent storage engine transparent to www ▪ Primarily a financially motivated decision ▪ MySQL works, but can HBase save us money? ▪ Also, are there things we just couldn’t do before?
  • 38. HBase vs. MySQL ▪ MySQL at Facebook ▪ Tier size determined solely by IOPS ▪ Heavy on random IO for reads and writes ▪ Rely on fast disks or flash to scale individual nodes ▪ HBase showing promise of cost savings ▪ Fewer IOPS on write-heavy workloads ▪ Larger tables on denser, cheaper nodes ▪ Simpler operations and replication “for free”
  • 39. HBase vs. MySQL ▪ MySQL is not going anywhere soon ▪ It works! ▪ But HBase is a great addition to the tool belt ▪ Different set of trade-offs ▪ Great at storing key-values, dictionaries, and lists ▪ Products with heavy write requirements ▪ Generated data ▪ Potential capital and operational cost savings
  • 40. UDB Challenges ▪ MySQL has a + year head start ▪ HBase is still a pre- . database system ▪ Insane Requirements ▪ Zero data loss, low latency, very high throughput ▪ Reads, writes, and atomic read-modify-writes ▪ WAN replication, backups w/ point-in-time recovery ▪ Live migration of critical user data w/ existing shards ▪ queryf() and other fun edge cases to deal with
  • 41. Technical/Developer oriented talk tomorrow: Apache HBase Road Map A short history of nearly everything HBase. Past, present, and future. Wednesday @ 1PM in the Met Ballroom
  • 42. Check out the HBase at Facebook Page: facebook.com/UsingHbase Thanks! Questions?