SlideShare a Scribd company logo
Raghavendra Prabhu (RVP)
Zen: A Graph Data Model on
HBase
Xun Liu
HBase @ Pinterest - 2012
•Original use case: materialized home feed
•Replaced Redis
•Need: elasticity, high write load, serve from disk/SSD
•Challenges:
•Running on public cloud (AWS)
•User facing use case (MTTR, latency, fault tolerance etc.)
HBase @ Pinterest - 2013
•Need: highly elastic key-value store
•Access from Python
•Support “move fast”
•Low operational overhead
Enter UMegaStore
Storage-as-a-Service: Key-value thrift API on top of HBase
Features:
•Key partitioning to balance load
•Master-slave clusters, semi automatic failover
•Speculative execution
•Multi-tenancy with traffic isolation
Storage-as-a-service was a great step forward,
but could we do better?
“Given how robust the messenger is on
day one, it’s surprising to learn that
Pinterest built the entire product
in three months.” — The Verge
Example: Messages Data Model
Conversation
Message 1
Message 2
Message N
User
User
Participates Contains
Realization
•These object models closely resemble a graph
•Objects are nodes, edges represent relationships
•Typical needs:
• retrieve data for a node or edge
• get all outgoing edges from a node
• get all incoming edges from a node
• count incoming or outgoing edges for a node
Enter Zen
•Provides a graph data model instead of key-value
•Automatically creates necessary indexes
•Materializes counts for efficient querying
•Implemented on top of HBase
Zen API
Nodes:
• addNode, removeNode, getNode
• Node id: globally unique 64-bit integer
ID 123
Prop 1 Val 1
Prop 2 Val 2
Zen API
Edges:
• addEdge, removeEdge, getEdge
• Edge Ref: (edgeType, fromId, toId)
• Score for ordering
Edge Ref 120, 123, 4567
Prop 1 Val 1
Prop 2 Val 2
Zen API
Edge Queries:
• getEdges, countEdges, removeEdges
struct EdgeQuery {
1: required NodeId nodeId;
2: required EdgeDirection direction;
3: optional TypeId edgeType;
}
Zen API
Property Indexes
•Unique index
•Ensures a property value is unique across all nodes of a type
•Non-unique index
•Allows retrieval by property value
•Works for both nodes and edges
Illustration: Messages on Zen
Id:1234 Id:2345 Id:3456
Type: Participates Type: Contains
Type: Conversation
Started: 12 Aug 2014 08:00
Header: “Great pin!”
Pin Id: 10001 [non-unique]
Type: User
Name: “Ben Smith” [unique]
Status: Active
Type: Message
Sent: 12 Aug 2014 08:00
Text: “Great pin!”
Zen @ Pinterest - 2014
Close to 10 million operations every second
Smart Feed Messages News Interests
Zen Schema Design
Zen - Property
Data
type name score distance
12345 (node) 10 Ben Smith
12345-20-67890 (edge) 1000 1 mile
Zen - Property Index
Data
ID
unique-10-name=ben smith 12345
nonuniq-10-lastname=smith-12345
nonuniq-10-lastname=smith-67890
Zen - Edge Score Index
Data
12345-out-20-1000-67890
12345-out-20-1001-67891
12345-in-30-990-67892
12345-in-30-991-67893
Zen - Edge Count
Data
Count
12345-out-20 2
12345-in-30 4
Production Learnings
1. Avoid Hot Regions
•Salt row keys to achieve uniform distribution
•Reverse bits of auto increment integers
•Prepend hash to row keys
•Pre-split regions using uniform split
•Tall table
2. Batch For Throughput
•Bottleneck: HLog sync
•Deferred HLog sync can lose data
•Batching:
•Client-side: batch APIs for clients to do bulk insertion
•Server-side: Zen Server to buffer edits across clients and
flush together
3. Tune For Performance
•Memory v.s. Latency
•Default: BLOOMFILTER => ‘ROW’, BLOCKSIZE => '8192'
•Special: BLOOMFILTER => ‘NONE’, BLOCKSIZE => ‘32768'
•CPU v.s. Data Size
•Default: DATA_BLOCK_ENCODING => ‘FAST_DIFF’, COMPRESSION => ‘SNAPPY'
•Special: DATA_BLOCK_ENCODING => ‘PREFIX’, COMPRESSION => ‘NONE’
4. Namespace for Isolation
Node
Namespace 1
Edge Index Node
Namespace 2
Edge Index
•Dedicated HBase cluster for big applications
•Shared cluster with dedicated namespace for smaller ones
5. Coprocessor For Efficiency
•Use Case: remove a large number of edges (feeds)
•Usual Way:
•Scan all edges of a node
•Delete edges beyond a limit
•Major compact to remove delete markers
•Coprocessor
•Trim in a major compaction coprocessor
6. Best Effort Data Consistency
• No distributed transaction: keep things simple and fast
• Best effort to maintain data consistency:
• Manual rollback in Zen server upon failures
• Offline MapReduce jobs (Dr Zen) to scan and fix inconsistencies
Takeaways
•The graph data model can be a convenient abstraction to cut
down product development time.
•HBase has worked very well as a storage backend for Zen
for large scale user facing workloads.
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase

More Related Content

What's hot

Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
Mike Dirolf
 
Web-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim HunterWeb-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim Hunter
Databricks
 
Continuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert XueContinuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert Xue
Databricks
 
Going Deep on Amazon Aurora Serverless (DAT427-R1) - AWS re:Invent 2018
Going Deep on Amazon Aurora Serverless (DAT427-R1) - AWS re:Invent 2018Going Deep on Amazon Aurora Serverless (DAT427-R1) - AWS re:Invent 2018
Going Deep on Amazon Aurora Serverless (DAT427-R1) - AWS re:Invent 2018
Amazon Web Services
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
DataWorks Summit
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
Databricks
 
Dive into PySpark
Dive into PySparkDive into PySpark
Dive into PySpark
Mateusz Buśkiewicz
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
Timelines at Scale (Raffi Krikorian - VP of Engineering at Twitter)
Timelines at Scale (Raffi Krikorian - VP of Engineering at Twitter)Timelines at Scale (Raffi Krikorian - VP of Engineering at Twitter)
Timelines at Scale (Raffi Krikorian - VP of Engineering at Twitter)
Chris Bolman
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
Till Rohrmann
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
Databricks
 
Chasing the optimizer
Chasing the optimizerChasing the optimizer
Chasing the optimizer
Mauro Pagano
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
Alexey Grishchenko
 
Everything you always wanted to know about Redis but were afraid to ask
Everything you always wanted to know about Redis but were afraid to askEverything you always wanted to know about Redis but were afraid to ask
Everything you always wanted to know about Redis but were afraid to ask
Carlos Abalde
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flink
datamantra
 
kafka
kafkakafka
Using apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at DatadogUsing apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at Datadog
Vadim Semenov
 
NoSQL databases and managing big data
NoSQL databases and managing big dataNoSQL databases and managing big data
NoSQL databases and managing big data
Steven Francia
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
Saurav Haloi
 
Hive Does ACID
Hive Does ACIDHive Does ACID
Hive Does ACID
DataWorks Summit
 

What's hot (20)

Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Web-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim HunterWeb-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim Hunter
 
Continuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert XueContinuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert Xue
 
Going Deep on Amazon Aurora Serverless (DAT427-R1) - AWS re:Invent 2018
Going Deep on Amazon Aurora Serverless (DAT427-R1) - AWS re:Invent 2018Going Deep on Amazon Aurora Serverless (DAT427-R1) - AWS re:Invent 2018
Going Deep on Amazon Aurora Serverless (DAT427-R1) - AWS re:Invent 2018
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
 
Dive into PySpark
Dive into PySparkDive into PySpark
Dive into PySpark
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Timelines at Scale (Raffi Krikorian - VP of Engineering at Twitter)
Timelines at Scale (Raffi Krikorian - VP of Engineering at Twitter)Timelines at Scale (Raffi Krikorian - VP of Engineering at Twitter)
Timelines at Scale (Raffi Krikorian - VP of Engineering at Twitter)
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Chasing the optimizer
Chasing the optimizerChasing the optimizer
Chasing the optimizer
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Everything you always wanted to know about Redis but were afraid to ask
Everything you always wanted to know about Redis but were afraid to askEverything you always wanted to know about Redis but were afraid to ask
Everything you always wanted to know about Redis but were afraid to ask
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flink
 
kafka
kafkakafka
kafka
 
Using apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at DatadogUsing apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at Datadog
 
NoSQL databases and managing big data
NoSQL databases and managing big dataNoSQL databases and managing big data
NoSQL databases and managing big data
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
 
Hive Does ACID
Hive Does ACIDHive Does ACID
Hive Does ACID
 

Viewers also liked

HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBaseHBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
HBaseCon
 
F8 tech talk_pinterest_v4
F8 tech talk_pinterest_v4F8 tech talk_pinterest_v4
F8 tech talk_pinterest_v4
malorie_pinterest
 
Data Driven Growth
Data Driven GrowthData Driven Growth
Data Driven Growth
Discover Pinterest
 
IDEs y Frameworks mas utilizados
IDEs y Frameworks mas utilizadosIDEs y Frameworks mas utilizados
IDEs y Frameworks mas utilizados
George Navarro Gomez
 
Configurando o Geany para Python - 03/2012
Configurando o Geany para Python - 03/2012Configurando o Geany para Python - 03/2012
Configurando o Geany para Python - 03/2012Marco Mendes
 
Configurando o geany_para_python
Configurando o geany_para_pythonConfigurando o geany_para_python
Configurando o geany_para_pythonMarco Mendes
 
Seda an architecture for well-conditioned scalable internet services
Seda   an architecture for well-conditioned scalable internet servicesSeda   an architecture for well-conditioned scalable internet services
Seda an architecture for well-conditioned scalable internet services
bdemchak
 
Facebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platformsFacebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platforms
Nitish Upreti
 
Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices
Presto: Distributed Machine Learning and Graph Processing with Sparse MatricesPresto: Distributed Machine Learning and Graph Processing with Sparse Matrices
Presto: Distributed Machine Learning and Graph Processing with Sparse MatricesQian Lin
 
Cassandra Summit - What's New In Apache TinkerPop?
Cassandra Summit - What's New In Apache TinkerPop?Cassandra Summit - What's New In Apache TinkerPop?
Cassandra Summit - What's New In Apache TinkerPop?
Stephen Mallette
 
Real World Tales of Repair (Alexander Dejanovski, The Last Pickle) | Cassandr...
Real World Tales of Repair (Alexander Dejanovski, The Last Pickle) | Cassandr...Real World Tales of Repair (Alexander Dejanovski, The Last Pickle) | Cassandr...
Real World Tales of Repair (Alexander Dejanovski, The Last Pickle) | Cassandr...
DataStax
 
Introduction to cassandra 2014
Introduction to cassandra 2014Introduction to cassandra 2014
Introduction to cassandra 2014
Patrick McFadin
 
HBaseCon 2015 General Session: The Evolution of HBase @ Bloomberg
HBaseCon 2015 General Session: The Evolution of HBase @ BloombergHBaseCon 2015 General Session: The Evolution of HBase @ Bloomberg
HBaseCon 2015 General Session: The Evolution of HBase @ Bloomberg
HBaseCon
 
HBaseCon 2015 General Session: State of HBase
HBaseCon 2015 General Session: State of HBaseHBaseCon 2015 General Session: State of HBase
HBaseCon 2015 General Session: State of HBase
HBaseCon
 
HBaseCon 2015: Meet HBase 1.0
HBaseCon 2015: Meet HBase 1.0HBaseCon 2015: Meet HBase 1.0
HBaseCon 2015: Meet HBase 1.0
HBaseCon
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon
 
HBaseCon 2015: Solving HBase Performance Problems with Apache HTrace
HBaseCon 2015: Solving HBase Performance Problems with Apache HTraceHBaseCon 2015: Solving HBase Performance Problems with Apache HTrace
HBaseCon 2015: Solving HBase Performance Problems with Apache HTrace
HBaseCon
 
Breaking the Sound Barrier with Persistent Memory
Breaking the Sound Barrier with Persistent Memory Breaking the Sound Barrier with Persistent Memory
Breaking the Sound Barrier with Persistent Memory
HBaseCon
 
Off-heaping the Apache HBase Read Path
Off-heaping the Apache HBase Read Path Off-heaping the Apache HBase Read Path
Off-heaping the Apache HBase Read Path
HBaseCon
 
Apache HBase Improvements and Practices at Xiaomi
Apache HBase Improvements and Practices at XiaomiApache HBase Improvements and Practices at Xiaomi
Apache HBase Improvements and Practices at Xiaomi
HBaseCon
 

Viewers also liked (20)

HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBaseHBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
 
F8 tech talk_pinterest_v4
F8 tech talk_pinterest_v4F8 tech talk_pinterest_v4
F8 tech talk_pinterest_v4
 
Data Driven Growth
Data Driven GrowthData Driven Growth
Data Driven Growth
 
IDEs y Frameworks mas utilizados
IDEs y Frameworks mas utilizadosIDEs y Frameworks mas utilizados
IDEs y Frameworks mas utilizados
 
Configurando o Geany para Python - 03/2012
Configurando o Geany para Python - 03/2012Configurando o Geany para Python - 03/2012
Configurando o Geany para Python - 03/2012
 
Configurando o geany_para_python
Configurando o geany_para_pythonConfigurando o geany_para_python
Configurando o geany_para_python
 
Seda an architecture for well-conditioned scalable internet services
Seda   an architecture for well-conditioned scalable internet servicesSeda   an architecture for well-conditioned scalable internet services
Seda an architecture for well-conditioned scalable internet services
 
Facebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platformsFacebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platforms
 
Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices
Presto: Distributed Machine Learning and Graph Processing with Sparse MatricesPresto: Distributed Machine Learning and Graph Processing with Sparse Matrices
Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices
 
Cassandra Summit - What's New In Apache TinkerPop?
Cassandra Summit - What's New In Apache TinkerPop?Cassandra Summit - What's New In Apache TinkerPop?
Cassandra Summit - What's New In Apache TinkerPop?
 
Real World Tales of Repair (Alexander Dejanovski, The Last Pickle) | Cassandr...
Real World Tales of Repair (Alexander Dejanovski, The Last Pickle) | Cassandr...Real World Tales of Repair (Alexander Dejanovski, The Last Pickle) | Cassandr...
Real World Tales of Repair (Alexander Dejanovski, The Last Pickle) | Cassandr...
 
Introduction to cassandra 2014
Introduction to cassandra 2014Introduction to cassandra 2014
Introduction to cassandra 2014
 
HBaseCon 2015 General Session: The Evolution of HBase @ Bloomberg
HBaseCon 2015 General Session: The Evolution of HBase @ BloombergHBaseCon 2015 General Session: The Evolution of HBase @ Bloomberg
HBaseCon 2015 General Session: The Evolution of HBase @ Bloomberg
 
HBaseCon 2015 General Session: State of HBase
HBaseCon 2015 General Session: State of HBaseHBaseCon 2015 General Session: State of HBase
HBaseCon 2015 General Session: State of HBase
 
HBaseCon 2015: Meet HBase 1.0
HBaseCon 2015: Meet HBase 1.0HBaseCon 2015: Meet HBase 1.0
HBaseCon 2015: Meet HBase 1.0
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ Salesforce
 
HBaseCon 2015: Solving HBase Performance Problems with Apache HTrace
HBaseCon 2015: Solving HBase Performance Problems with Apache HTraceHBaseCon 2015: Solving HBase Performance Problems with Apache HTrace
HBaseCon 2015: Solving HBase Performance Problems with Apache HTrace
 
Breaking the Sound Barrier with Persistent Memory
Breaking the Sound Barrier with Persistent Memory Breaking the Sound Barrier with Persistent Memory
Breaking the Sound Barrier with Persistent Memory
 
Off-heaping the Apache HBase Read Path
Off-heaping the Apache HBase Read Path Off-heaping the Apache HBase Read Path
Off-heaping the Apache HBase Read Path
 
Apache HBase Improvements and Practices at Xiaomi
Apache HBase Improvements and Practices at XiaomiApache HBase Improvements and Practices at Xiaomi
Apache HBase Improvements and Practices at Xiaomi
 

Similar to HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase

XMPP/Jingle(VoIP)/Perl Ocean 2012/03
XMPP/Jingle(VoIP)/Perl Ocean 2012/03XMPP/Jingle(VoIP)/Perl Ocean 2012/03
XMPP/Jingle(VoIP)/Perl Ocean 2012/03
Lyo Kato
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoop
bddmoscow
 
IT Press Tour #17 - OpenIO & Technology
IT Press Tour #17 - OpenIO & TechnologyIT Press Tour #17 - OpenIO & Technology
IT Press Tour #17 - OpenIO & Technology
OpenIO Object Storage
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and Containers
SATOSHI TAGOMORI
 
Scaling with MongoDB
Scaling with MongoDBScaling with MongoDB
Scaling with MongoDB
MongoDB
 
Dropping ACID: Wrapping Your Mind Around NoSQL Databases
Dropping ACID: Wrapping Your Mind Around NoSQL DatabasesDropping ACID: Wrapping Your Mind Around NoSQL Databases
Dropping ACID: Wrapping Your Mind Around NoSQL Databases
Kyle Banerjee
 
JDD2014: REST API versioning practice: from header to model - Łukasz Wierzbicki
JDD2014: REST API versioning practice: from header to model - Łukasz WierzbickiJDD2014: REST API versioning practice: from header to model - Łukasz Wierzbicki
JDD2014: REST API versioning practice: from header to model - Łukasz Wierzbicki
PROIDEA
 
Large-scale projects development (scaling LAMP)
Large-scale projects development (scaling LAMP)Large-scale projects development (scaling LAMP)
Large-scale projects development (scaling LAMP)
Alexey Rybak
 
Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)
Chris Richardson
 
MongoDB: What, why, when
MongoDB: What, why, whenMongoDB: What, why, when
MongoDB: What, why, when
Eugenio Minardi
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
Roger Xia
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
smallerror
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...xlight
 
Frontera: open source, large scale web crawling framework
Frontera: open source, large scale web crawling frameworkFrontera: open source, large scale web crawling framework
Frontera: open source, large scale web crawling framework
Scrapinghub
 
Running MongoDB in the Cloud
Running MongoDB in the CloudRunning MongoDB in the Cloud
Running MongoDB in the Cloud
Tony Tam
 
Scaling the Web: Databases & NoSQL
Scaling the Web: Databases & NoSQLScaling the Web: Databases & NoSQL
Scaling the Web: Databases & NoSQL
Richard Schneeman
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Don Demcsak
 

Similar to HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase (20)

XMPP/Jingle(VoIP)/Perl Ocean 2012/03
XMPP/Jingle(VoIP)/Perl Ocean 2012/03XMPP/Jingle(VoIP)/Perl Ocean 2012/03
XMPP/Jingle(VoIP)/Perl Ocean 2012/03
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoop
 
IT Press Tour #17 - OpenIO & Technology
IT Press Tour #17 - OpenIO & TechnologyIT Press Tour #17 - OpenIO & Technology
IT Press Tour #17 - OpenIO & Technology
 
Voldemort Nosql
Voldemort NosqlVoldemort Nosql
Voldemort Nosql
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and Containers
 
Scaling with MongoDB
Scaling with MongoDBScaling with MongoDB
Scaling with MongoDB
 
20120306 dublin js
20120306 dublin js20120306 dublin js
20120306 dublin js
 
Dropping ACID: Wrapping Your Mind Around NoSQL Databases
Dropping ACID: Wrapping Your Mind Around NoSQL DatabasesDropping ACID: Wrapping Your Mind Around NoSQL Databases
Dropping ACID: Wrapping Your Mind Around NoSQL Databases
 
JDD2014: REST API versioning practice: from header to model - Łukasz Wierzbicki
JDD2014: REST API versioning practice: from header to model - Łukasz WierzbickiJDD2014: REST API versioning practice: from header to model - Łukasz Wierzbicki
JDD2014: REST API versioning practice: from header to model - Łukasz Wierzbicki
 
Large-scale projects development (scaling LAMP)
Large-scale projects development (scaling LAMP)Large-scale projects development (scaling LAMP)
Large-scale projects development (scaling LAMP)
 
Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)
 
MongoDB: What, why, when
MongoDB: What, why, whenMongoDB: What, why, when
MongoDB: What, why, when
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
 
Fixing_Twitter
Fixing_TwitterFixing_Twitter
Fixing_Twitter
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Frontera: open source, large scale web crawling framework
Frontera: open source, large scale web crawling frameworkFrontera: open source, large scale web crawling framework
Frontera: open source, large scale web crawling framework
 
Running MongoDB in the Cloud
Running MongoDB in the CloudRunning MongoDB in the Cloud
Running MongoDB in the Cloud
 
Scaling the Web: Databases & NoSQL
Scaling the Web: Databases & NoSQLScaling the Web: Databases & NoSQL
Scaling the Web: Databases & NoSQL
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)
 

More from HBaseCon

hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kuberneteshbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
HBaseCon
 
hbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase on Beamhbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase on Beam
HBaseCon
 
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: HBase Disaster Recovery Solution at Huaweihbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
HBaseCon
 
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinteresthbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon
 
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
HBaseCon
 
hbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Neteasehbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Netease
HBaseCon
 
hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践
HBaseCon
 
hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: 基于HBase的企业级大数据平台hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: 基于HBase的企业级大数据平台
HBaseCon
 
hbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.comhbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.com
HBaseCon
 
hbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecturehbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecture
HBaseCon
 
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huaweihbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
HBaseCon
 
hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMihbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMi
HBaseCon
 
hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0
HBaseCon
 
HBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBaseHBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBase
HBaseCon
 
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in PinterestHBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon
 
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBaseHBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon
 
HBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBase
HBaseCon
 
HBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBaseHBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBase
HBaseCon
 
HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at Didi
HBaseCon
 
HBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon
 

More from HBaseCon (20)

hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kuberneteshbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
 
hbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase on Beamhbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase on Beam
 
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: HBase Disaster Recovery Solution at Huaweihbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
 
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinteresthbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
 
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
 
hbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Neteasehbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Netease
 
hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践
 
hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: 基于HBase的企业级大数据平台hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: 基于HBase的企业级大数据平台
 
hbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.comhbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.com
 
hbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecturehbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecture
 
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huaweihbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
 
hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMihbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMi
 
hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0
 
HBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBaseHBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBase
 
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in PinterestHBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
 
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBaseHBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
 
HBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBase
 
HBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBaseHBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBase
 
HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at Didi
 
HBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase Client
 

Recently uploaded

Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptxText-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
ShamsuddeenMuhammadA
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
Cyanic lab
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
Boni García
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket ManagementUtilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
wottaspaceseo
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
e20449
 
Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)
abdulrafaychaudhry
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
AMB-Review
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 

Recently uploaded (20)

Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptxText-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket ManagementUtilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
 
Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 

HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase

  • 1. Raghavendra Prabhu (RVP) Zen: A Graph Data Model on HBase Xun Liu
  • 2. HBase @ Pinterest - 2012 •Original use case: materialized home feed •Replaced Redis •Need: elasticity, high write load, serve from disk/SSD •Challenges: •Running on public cloud (AWS) •User facing use case (MTTR, latency, fault tolerance etc.)
  • 3. HBase @ Pinterest - 2013 •Need: highly elastic key-value store •Access from Python •Support “move fast” •Low operational overhead
  • 4. Enter UMegaStore Storage-as-a-Service: Key-value thrift API on top of HBase Features: •Key partitioning to balance load •Master-slave clusters, semi automatic failover •Speculative execution •Multi-tenancy with traffic isolation
  • 5. Storage-as-a-service was a great step forward, but could we do better?
  • 6. “Given how robust the messenger is on day one, it’s surprising to learn that Pinterest built the entire product in three months.” — The Verge
  • 7. Example: Messages Data Model Conversation Message 1 Message 2 Message N User User Participates Contains
  • 8. Realization •These object models closely resemble a graph •Objects are nodes, edges represent relationships •Typical needs: • retrieve data for a node or edge • get all outgoing edges from a node • get all incoming edges from a node • count incoming or outgoing edges for a node
  • 9. Enter Zen •Provides a graph data model instead of key-value •Automatically creates necessary indexes •Materializes counts for efficient querying •Implemented on top of HBase
  • 10. Zen API Nodes: • addNode, removeNode, getNode • Node id: globally unique 64-bit integer ID 123 Prop 1 Val 1 Prop 2 Val 2
  • 11. Zen API Edges: • addEdge, removeEdge, getEdge • Edge Ref: (edgeType, fromId, toId) • Score for ordering Edge Ref 120, 123, 4567 Prop 1 Val 1 Prop 2 Val 2
  • 12. Zen API Edge Queries: • getEdges, countEdges, removeEdges struct EdgeQuery { 1: required NodeId nodeId; 2: required EdgeDirection direction; 3: optional TypeId edgeType; }
  • 13. Zen API Property Indexes •Unique index •Ensures a property value is unique across all nodes of a type •Non-unique index •Allows retrieval by property value •Works for both nodes and edges
  • 14. Illustration: Messages on Zen Id:1234 Id:2345 Id:3456 Type: Participates Type: Contains Type: Conversation Started: 12 Aug 2014 08:00 Header: “Great pin!” Pin Id: 10001 [non-unique] Type: User Name: “Ben Smith” [unique] Status: Active Type: Message Sent: 12 Aug 2014 08:00 Text: “Great pin!”
  • 15. Zen @ Pinterest - 2014 Close to 10 million operations every second Smart Feed Messages News Interests
  • 17. Zen - Property Data type name score distance 12345 (node) 10 Ben Smith 12345-20-67890 (edge) 1000 1 mile
  • 18. Zen - Property Index Data ID unique-10-name=ben smith 12345 nonuniq-10-lastname=smith-12345 nonuniq-10-lastname=smith-67890
  • 19. Zen - Edge Score Index Data 12345-out-20-1000-67890 12345-out-20-1001-67891 12345-in-30-990-67892 12345-in-30-991-67893
  • 20. Zen - Edge Count Data Count 12345-out-20 2 12345-in-30 4
  • 22. 1. Avoid Hot Regions •Salt row keys to achieve uniform distribution •Reverse bits of auto increment integers •Prepend hash to row keys •Pre-split regions using uniform split •Tall table
  • 23. 2. Batch For Throughput •Bottleneck: HLog sync •Deferred HLog sync can lose data •Batching: •Client-side: batch APIs for clients to do bulk insertion •Server-side: Zen Server to buffer edits across clients and flush together
  • 24. 3. Tune For Performance •Memory v.s. Latency •Default: BLOOMFILTER => ‘ROW’, BLOCKSIZE => '8192' •Special: BLOOMFILTER => ‘NONE’, BLOCKSIZE => ‘32768' •CPU v.s. Data Size •Default: DATA_BLOCK_ENCODING => ‘FAST_DIFF’, COMPRESSION => ‘SNAPPY' •Special: DATA_BLOCK_ENCODING => ‘PREFIX’, COMPRESSION => ‘NONE’
  • 25. 4. Namespace for Isolation Node Namespace 1 Edge Index Node Namespace 2 Edge Index •Dedicated HBase cluster for big applications •Shared cluster with dedicated namespace for smaller ones
  • 26. 5. Coprocessor For Efficiency •Use Case: remove a large number of edges (feeds) •Usual Way: •Scan all edges of a node •Delete edges beyond a limit •Major compact to remove delete markers •Coprocessor •Trim in a major compaction coprocessor
  • 27. 6. Best Effort Data Consistency • No distributed transaction: keep things simple and fast • Best effort to maintain data consistency: • Manual rollback in Zen server upon failures • Offline MapReduce jobs (Dr Zen) to scan and fix inconsistencies
  • 28. Takeaways •The graph data model can be a convenient abstraction to cut down product development time. •HBase has worked very well as a storage backend for Zen for large scale user facing workloads.

Editor's Notes

  1. Clients can define index on properties. Index can be either unique or non-unique for the node or edge type. CAS to ensure unique constraint Tall table to allow split of hot index value.
  2. We encode the row key like the following: For each node-id, direction and edge type, we append the score, then the node-id on the other side. This way, all edges for a given node-id, direction and edge-type are stored together in ascending order of the edge scores. The pagination is then a simple range scan.