2. Topics
• Variety of Data in 3 Vs
• RDBMS (Relational Database Management System) vs MapReduce
• Top Five High-Impact Use Cases for Big Data Analytics
• Big Data at Facebook (Facebook Messages - 2011)
3. Variety of Data
• Structured Data
• e.g. database files, CSV
• Semi-Structured Data
• e.g. JSON, XML
• Unstructured Data
• e.g. text files, log files
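To make the three categories concrete, here is a minimal Python sketch that reads one tiny sample of each. The sample contents, field names, and the log line layout are invented for illustration; they are assumptions, not data from the slides.

```python
import csv
import io
import json
import re

# Structured: fixed columns, every row has the same fields.
csv_text = "id,name\n1,Ada\n2,Linus\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing keys, fields may nest or be absent.
json_text = '{"id": 1, "name": "Ada", "tags": ["admin"]}'
doc = json.loads(json_text)

# Unstructured: free text; any structure has to be imposed by the reader.
# The "date time LEVEL message" layout is an assumed, hypothetical log format.
log_text = (
    "2011-03-01 12:00:01 ERROR disk full\n"
    "2011-03-01 12:00:05 INFO retry ok\n"
)
pattern = re.compile(r"^(\S+ \S+) (\w+) (.*)$")
events = [pattern.match(line).groups() for line in log_text.splitlines()]

print(rows[1]["name"])   # field access by column name
print(doc["tags"])       # nested value
print(events[0])         # (timestamp, level, message) extracted by regex
```

The further down the list, the more of the parsing work moves from the storage format to the code that reads it.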
4. RDBMS vs MapReduce
Attribute | Traditional RDBMS | MapReduce
Data Size | GB | TB or PB
Access | Interactive and batch | Batch
Updates | Read/write many times | Write once, read many
Transactions | ACID | None
Structure | Schema-on-write | Schema-on-read
Integrity | High | Low
Scaling | Non-linear | Linear
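The "Batch" and "Schema-on-Read" rows can be sketched with a single-process imitation of a MapReduce job in plain Python. This is only an illustration of the programming model, not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, run_job) and the "user,action" record layout are assumptions made for this example.

```python
from collections import defaultdict

# Raw input is stored as-is (write once); no schema was enforced on write.
raw_lines = [
    "alice,login",
    "bob,login",
    "alice,send_message",
    "malformed line",
    "alice,login",
]


def map_phase(line):
    """Apply the schema at read time and emit (key, value) pairs."""
    parts = line.split(",")
    if len(parts) != 2:
        return  # skip records that do not fit the expected layout
    _user, action = parts
    yield (action, 1)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


counts = run_job(raw_lines)
print(counts)  # {'login': 3, 'send_message': 1}
```

Because each line is mapped independently and each key is reduced independently, the same job can be spread over more machines as the data grows, which is the intuition behind the "Linear" scaling row.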
5. Top Five High-Impact Use Cases for Big Data Analytics
• Customer Analytics (360-degree view of the customer)
• Data Exploration (Big data service refinery)
• Operational Analytics (Internet of Things)
• Fraud and Compliance (Security Intelligence)
• Enterprise Data Warehouse (EDW) Optimization (Data Warehouse Modernization)
6. Big Data at Facebook (Facebook Messages - 2011)
• 120 billion chat messages per month, ~11 TB per month with MySQL
• Now over 300 TB per month
• Grew ~27x from the previous year
• Apache HBase stores message metadata, message bodies, and the search index
• Haystack stores attachments
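As a quick sanity check on the figures above, the following Python sketch recomputes the growth factor and the implied average size per chat message. It assumes decimal terabytes (1 TB = 10^12 bytes) and that the ~11 TB figure covers the 120 billion chat messages; the slides do not state either point explicitly.

```python
TB = 10**12  # assumption: decimal terabytes

messages_per_month = 120 * 10**9   # 120 billion chat messages
mysql_volume = 11 * TB             # ~11 TB per month (MySQL era)
current_volume = 300 * TB          # "over 300 TB" per month


def growth_factor(new_bytes, old_bytes):
    """How many times larger the new monthly volume is than the old one."""
    return new_bytes / old_bytes


def avg_bytes_per_message(total_bytes, message_count):
    """Average stored size of one message, in bytes."""
    return total_bytes / message_count


growth = growth_factor(current_volume, mysql_volume)
avg_size = avg_bytes_per_message(mysql_volume, messages_per_month)

print(f"growth: ~{growth:.0f}x")
print(f"average chat message: ~{avg_size:.0f} bytes")
```

Under these assumptions, 300 TB against 11 TB matches the roughly 27x growth quoted on the slide, and the average chat message comes out at a little under 100 bytes.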
7. Big Data at Facebook (Facebook Messages - 2011)
Why Apache HBase?
• High write throughput
• Horizontal scalability (unpredictable growth)
• Automatic failover
• Built on HDFS (distributed storage)
8. Big Data at Facebook (Facebook Messages - 2011)
• Cell 1: Application Server + HBase/HDFS
• ……
• Cell n: Application Server + HBase/HDFS
9. Big Data at Facebook (Facebook Messages - 2011)
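Components: Client, User Directory Service, Cell 1 (Application Server + HBase/HDFS), Haystack
1) Client asks the User Directory Service for the user’s data cell, by user ID
2) Client goes to the Application Server in the user’s cell
3) Client calls addMessage(userId, msg)
4) Application Server strips the attachments (if any) and adds them to Haystack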
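The four steps above can be sketched in Python with in-memory stand-ins for each component. Everything here is hypothetical: the class and method names (other than the idea of addMessage), the modulo-based mapping of users to cells, and the dictionary-based "HBase" and "Haystack" stores are assumptions for illustration, not Facebook's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str
    body: str
    attachments: list = field(default_factory=list)  # raw bytes objects


class Haystack:
    """In-memory stand-in for the attachment store."""

    def __init__(self):
        self.blobs = {}

    def put(self, blob):
        blob_id = f"blob-{len(self.blobs)}"
        self.blobs[blob_id] = blob
        return blob_id


class Cell:
    """One cell: an application server in front of its own HBase/HDFS."""

    def __init__(self, name, haystack):
        self.name = name
        self.haystack = haystack
        self.rows = {}  # stand-in for HBase: user_id -> stored messages

    def add_message(self, user_id, msg):
        # Step 4: strip attachments and store them in Haystack.
        blob_ids = [self.haystack.put(a) for a in msg.attachments]
        # Only metadata, body, and attachment references go to "HBase".
        self.rows.setdefault(user_id, []).append(
            {"sender": msg.sender, "body": msg.body, "attachment_ids": blob_ids}
        )


class UserDirectoryService:
    """Maps a user ID to the cell that holds that user's data."""

    def __init__(self, cells):
        self.cells = cells

    def cell_for(self, user_id):
        # Assumption: simple modulo placement; the slides do not say how
        # users are assigned to cells.
        return self.cells[user_id % len(self.cells)]


def deliver(directory, user_id, msg):
    cell = directory.cell_for(user_id)  # steps 1 and 2: find and go to the cell
    cell.add_message(user_id, msg)      # steps 3 and 4
    return cell


haystack = Haystack()
directory = UserDirectoryService([Cell("cell-0", haystack), Cell("cell-1", haystack)])
used = deliver(directory, 7, Message("alice", "hi", [b"photo-bytes"]))
print(used.name, used.rows[7])
```

Keeping the directory lookup separate from the cells means a user's data lives in exactly one cell, so cells can be added as the user base grows without the client needing to know the layout.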