SlideShare a Scribd company logo
1 of 24
HBase Accelerated:
In-Memory Flush and Compaction
E s h c a r H i l l e l , A n a s t a s i a B r a g i n s k y , E d w a r d B o r t n i k o v ⎪ H a d o o p S u m m i t E u r o p e ,
A p r i l 1 3 , 2 0 1 6
Motivation: Dynamic Content Processing on Top of HBase
2
 Real-time content processing pipelines
› Store intermediate results in persistent map
› Notification mechanism is prevalent
 Storage and notifications on the same platform
 Sieve – Yahoo’s real-time content management platform
Crawl
Docpro
c
Link
Analysis Queue
Crawl
schedule
Content
Queue
Links
Serving
Apache Storm
Apache HBase
Notification Mechanism is Like a Sliding Window
3
 Small working set but not necessarily FIFO queue
 Short life-cycle delete message after processing it
 High-churn workload message state can be updated
 Frequent scans to consume message
Producer
Producer
Producer
Consumer
Consumer
Consumer
HBase Accelerated: Mission Definition
4
Goal
Real-time performance and less I/O in persistent KV-stores
How?
Exploit redundancies in the workload to eliminate duplicates in memory
Reduce amount of new files
Reduce write amplification effect (overall I/O)
Reduce retrieval latencies
How much can we gain?
Gain is proportional to the duplications
 Persistent map
› Atomic get, put, and snapshot scans
 Table sorted set of rows
› Each row uniquely identified by a key
 Region unit of horizontal partitioning
 Column family unit of vertical partitioning
 Store implements a column family within a
region
 In production a region server (RS) can
manage 100s-1000s of regions
› Share resources: memory, cpu, disk
key Col1 Col2 Col3
5
HBase Data model
Column family
Region
 Random I/O  Sequential I/O
 Random writes absorbed in RAM (Cm)
› Low latency
› Built in versioning
› Write-ahead-log (WAL) for durability
 Periodic batch merge to disk
› High-throughput I/O
 Disk components (Cd) immutable
› Periodic merge&compact eliminate duplicate
versions and tombstones
 Random reads from Cm or Cd cache
6
Log-Structured Merge (LSM) Key-Value Store Approach
Cm memory
disk
C1
C2
merge
Cd
 Random writes absorbed in active segment (Cm)
 When active segment is full
› Becomes immutable (snapshot)
› A new mutable (active) segment serves writes
› Flushed to disk, truncate WAL
 Compaction reads a few files, merge-sorts them, writes back new files
7
HBase Writes
C’m
flush
memory HDFS
Cm
prepare-for-flush
Cd
WAL
write
 Random reads from Cm or C’m or Cd cache
 When data piles-up on disk
› Hit ratio drops
› Retrieval latency up
 Compaction re-writes small files into fewer bigger files
› Causes replication-related network and disk IO
8
HBase Reads
C’m
memory HDFS
Cm
Cd
Read
Block
cache
 New compaction pipeline
› Active segment flushed to pipeline
› Pipeline segments compacted in memory
› Flush to disk only when needed
9
New Design: In-Memory Flush and Compaction
C’m
flush-to-disk
memory HDFS
Cm
in-memory-flush
Cd
prepare-for-flush
Compaction
pipeline
WAL
Block
cache
memory
Trade read cache (BlockCache) for write cache (compaction pipeline)
10
New Design: In-Memory Flush and Compaction
write read
cache cache
C’m
flush-to-disk
memory HDFS
Cm
in-memory-flush
Cd
prepare-for-flush
Compaction
pipeline
WAL
Block
cache
memory
11
New Design: In-Memory Flush and Compaction
CPU
IO
C’m
flush-to-disk
memory HDFS
Cm
in-memory-flush
Cd
prepare-for-flush
Compaction
pipeline
WAL
Block
cache
memory
Trade read cache (BlockCache) for write cache (compaction pipeline)
More CPU cycles for less I/O
Evaluation Settings: In-Memory Working Set
12
 YCSB: compares compacting vs. default memstore
 Small cluster: 3 HDFS nodes on a single rack, 1 RS
› 1GB heap space, MSLAB enabled (2MB chunks)
› Default: 128MB flush size, 100MB block-cache
› Compacting: 192MB flush size, 36MB block-cache
 High-churn workload, small working set
› 128,000 records, 1KB value field
› 10 threads running 5 millions operations, various key distributions
› 50% reads 50% updates, target 1000ops
› 1% (long) scans 99% updates, target 500ops
 Measure average latency over time
› Latencies accumulated over intervals of 10 seconds
Evaluation Results: Read Latency (Zipfian Distribution)
13
Flush
to disk
Compaction
Data
fits into
cache
Evaluation Results: Read Latency (Uniform Distribution)
14
Region
split
Evaluation Results: Scan Latency (Uniform Distribution)
15
Evaluation Settings: Handling Tombstones
16
 YCSB: compares compacting vs. default memstore
 Small cluster: 3 HDFS nodes on a single rack, 1 RS
› 1GB heap space, MSLAB enabled (2MB chunks), 128MB flush size, 64MB block-cache
› Default: Minimum 4 files for compaction
› Compaction: Minimum 2 files for compaction
 High-churn workload, small working set with deletes
› 128,000 records, 1KB value field
› 10 threads running 5 millions operations, various key distributions
› 40% reads 40% updates 20% deletes (with 50,000 updates head start), target 1000ops
› 1% (long) scans 66% updates 33% deletes (with head start), target 500ops
 Measure average latency over time
› Latencies accumulated over intervals of 10 seconds
Evaluation Results: Read Latency (Zipfian Distribution)
17
(total 2 flushes and 1 disk compactions)
(total 15 flushes and 4 disk compactions)
Evaluation Results: Read Latency (Uniform Distribution)
18
(total 3 flushes and 2 disk compactions)
(total 15 flushes and 4 disk compactions)
Evaluation Results: Scan Latency (Zipfian Distribution)
19
Work in Progress: Memory Optimizations
20
 Current design
› Data stored in flat buffers, index is a skip-list
› All memory allocated on-heap
 Optimizations
› Flat layout for immutable segments index
› Manage (allocate, store, release) data buffers off-heap
 Pros
› Locality in access to index
› Reduce memory fragmentation
› Significantly reduce GC work
› Better utilization of memory and CPU
Status Umbrella Jira HBASE-14918
21
 HBASE-14919 HBASE-15016 HBASE-15359 Infrastructure refactoring
› Status: committed
 HBASE-14920 new compacting memstore
› Status: pre-commit
 HBASE-14921 memory optimizations (memory layout, off-heaping)
› Status: under code review
Summary
22
 Feature intended for HBase 2.0.0
 New design pros over default implementation
› Predictable retrieval latency by serving (mainly) from memory
› Less compaction on disk reduces write amplification effect
› Less disk I/O and network traffic reduces load on HDFS
 We would like to thank the reviewers
› Michael Stack, Anoop Sam John, Ramkrishna s. Vasudevan, Ted Yu
Thank You
24
Evaluation Results: Write Latency

More Related Content

What's hot

Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase强 王
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 
HBaseCon 2013: Apache HBase Table Snapshots
HBaseCon 2013: Apache HBase Table SnapshotsHBaseCon 2013: Apache HBase Table Snapshots
HBaseCon 2013: Apache HBase Table SnapshotsCloudera, Inc.
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaCloudera, Inc.
 
HBase Read High Availabilty using Timeline Consistent Region Replicas
HBase Read High Availabilty using Timeline Consistent Region ReplicasHBase Read High Availabilty using Timeline Consistent Region Replicas
HBase Read High Availabilty using Timeline Consistent Region ReplicasDataWorks Summit
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compactionMIJIN AN
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataDataWorks Summit
 
Galera Cluster for MySQL vs MySQL (NDB) Cluster: A High Level Comparison
Galera Cluster for MySQL vs MySQL (NDB) Cluster: A High Level Comparison Galera Cluster for MySQL vs MySQL (NDB) Cluster: A High Level Comparison
Galera Cluster for MySQL vs MySQL (NDB) Cluster: A High Level Comparison Severalnines
 
[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)NAVER D2
 
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseHBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseCloudera, Inc.
 
Achieving HBase Multi-Tenancy with RegionServer Groups and Favored Nodes
Achieving HBase Multi-Tenancy with RegionServer Groups and Favored NodesAchieving HBase Multi-Tenancy with RegionServer Groups and Favored Nodes
Achieving HBase Multi-Tenancy with RegionServer Groups and Favored NodesDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용I Goo Lee
 
Dongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of FlinkDongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of FlinkFlink Forward
 
Power of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data StructuresPower of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data Structuresconfluent
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsSpark Summit
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep divet3rmin4t0r
 
Getting Started with HBase
Getting Started with HBaseGetting Started with HBase
Getting Started with HBaseCarol McDonald
 

What's hot (20)

Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
HBaseCon 2013: Apache HBase Table Snapshots
HBaseCon 2013: Apache HBase Table SnapshotsHBaseCon 2013: Apache HBase Table Snapshots
HBaseCon 2013: Apache HBase Table Snapshots
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
 
HBase Read High Availabilty using Timeline Consistent Region Replicas
HBase Read High Availabilty using Timeline Consistent Region ReplicasHBase Read High Availabilty using Timeline Consistent Region Replicas
HBase Read High Availabilty using Timeline Consistent Region Replicas
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ Salesforce
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compaction
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
 
Galera Cluster for MySQL vs MySQL (NDB) Cluster: A High Level Comparison
Galera Cluster for MySQL vs MySQL (NDB) Cluster: A High Level Comparison Galera Cluster for MySQL vs MySQL (NDB) Cluster: A High Level Comparison
Galera Cluster for MySQL vs MySQL (NDB) Cluster: A High Level Comparison
 
[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)
 
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseHBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
 
Achieving HBase Multi-Tenancy with RegionServer Groups and Favored Nodes
Achieving HBase Multi-Tenancy with RegionServer Groups and Favored NodesAchieving HBase Multi-Tenancy with RegionServer Groups and Favored Nodes
Achieving HBase Multi-Tenancy with RegionServer Groups and Favored Nodes
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용
 
Dongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of FlinkDongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of Flink
 
Power of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data StructuresPower of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data Structures
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep dive
 
Getting Started with HBase
Getting Started with HBaseGetting Started with HBase
Getting Started with HBase
 
HBase Storage Internals
HBase Storage InternalsHBase Storage Internals
HBase Storage Internals
 

Viewers also liked

HBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBaseHBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBaseCloudera, Inc.
 
Apache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to UnderstandApache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to UnderstandJosh Elser
 
Meet HBase 1.0
Meet HBase 1.0Meet HBase 1.0
Meet HBase 1.0enissoz
 
Tarantool 1.6: NoSQL database and application server
Tarantool 1.6: NoSQL database and application serverTarantool 1.6: NoSQL database and application server
Tarantool 1.6: NoSQL database and application serverAlexander Gornyi
 
HBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the ArtHBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the ArtMichael Stack
 
Real-time Analytics with HBase (short version)
Real-time Analytics with HBase (short version)Real-time Analytics with HBase (short version)
Real-time Analytics with HBase (short version)alexbaranau
 
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReducePublic Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduceHadoop User Group
 
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...Hadoop User Group
 
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...Hadoop User Group
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Hadoop User Group
 
Hadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop User Group
 
Realtime Analytics with Hadoop and HBase
Realtime Analytics with Hadoop and HBaseRealtime Analytics with Hadoop and HBase
Realtime Analytics with Hadoop and HBaselarsgeorge
 
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing HubIMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing HubIn-Memory Computing Summit
 
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...DataWorks Summit/Hadoop Summit
 
Teradata introduction
Teradata introductionTeradata introduction
Teradata introductionRameejmd
 
BigData_TP5 : Neo4J
BigData_TP5 : Neo4JBigData_TP5 : Neo4J
BigData_TP5 : Neo4JLilia Sfaxi
 

Viewers also liked (20)

HBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBaseHBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBase
 
Apache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to UnderstandApache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to Understand
 
Meet HBase 1.0
Meet HBase 1.0Meet HBase 1.0
Meet HBase 1.0
 
Tarantool 1.6: NoSQL database and application server
Tarantool 1.6: NoSQL database and application serverTarantool 1.6: NoSQL database and application server
Tarantool 1.6: NoSQL database and application server
 
HBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the ArtHBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the Art
 
Hbase at Salesforce.com
Hbase at Salesforce.comHbase at Salesforce.com
Hbase at Salesforce.com
 
The Evolution of Apache Kylin
The Evolution of Apache KylinThe Evolution of Apache Kylin
The Evolution of Apache Kylin
 
Real-time Analytics with HBase (short version)
Real-time Analytics with HBase (short version)Real-time Analytics with HBase (short version)
Real-time Analytics with HBase (short version)
 
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReducePublic Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
 
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
 
January 2011 HUG: Howl Presentation
January 2011 HUG: Howl PresentationJanuary 2011 HUG: Howl Presentation
January 2011 HUG: Howl Presentation
 
January 2011 HUG: Pig Presentation
January 2011 HUG: Pig PresentationJanuary 2011 HUG: Pig Presentation
January 2011 HUG: Pig Presentation
 
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
 
Hadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User Group
 
Realtime Analytics with Hadoop and HBase
Realtime Analytics with Hadoop and HBaseRealtime Analytics with Hadoop and HBase
Realtime Analytics with Hadoop and HBase
 
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing HubIMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
 
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
 
Teradata introduction
Teradata introductionTeradata introduction
Teradata introduction
 
BigData_TP5 : Neo4J
BigData_TP5 : Neo4JBigData_TP5 : Neo4J
BigData_TP5 : Neo4J
 

Similar to HBase Accelerated: In-Memory Flush and Compaction

Apache HBase, Accelerated: In-Memory Flush and Compaction
Apache HBase, Accelerated: In-Memory Flush and Compaction Apache HBase, Accelerated: In-Memory Flush and Compaction
Apache HBase, Accelerated: In-Memory Flush and Compaction HBaseCon
 
Red Hat Storage Server Administration Deep Dive
Red Hat Storage Server Administration Deep DiveRed Hat Storage Server Administration Deep Dive
Red Hat Storage Server Administration Deep DiveRed_Hat_Storage
 
HBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devices
HBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devicesHBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devices
HBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devicesMichael Stack
 
HBaseCon2017 Accordion: Apache HBase Beathes with In-Memory Compaction
HBaseCon2017 Accordion: Apache HBase Beathes with In-Memory CompactionHBaseCon2017 Accordion: Apache HBase Beathes with In-Memory Compaction
HBaseCon2017 Accordion: Apache HBase Beathes with In-Memory CompactionHBaseCon
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Databricks
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Databricks
 
Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...
Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...
Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...HostedbyConfluent
 
02.28.13 WANdisco ApacheCon 2013
02.28.13 WANdisco ApacheCon 201302.28.13 WANdisco ApacheCon 2013
02.28.13 WANdisco ApacheCon 2013WANdisco Plc
 
Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheNicolas Poggi
 
In-memory Data Management Trends & Techniques
In-memory Data Management Trends & TechniquesIn-memory Data Management Trends & Techniques
In-memory Data Management Trends & TechniquesHazelcast
 
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Виталий Стародубцев
 
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance BarriersCeph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance BarriersCeph Community
 
Storage and performance, Whiptail
Storage and performance, Whiptail Storage and performance, Whiptail
Storage and performance, Whiptail Internet World
 
[B4]deview 2012-hdfs
[B4]deview 2012-hdfs[B4]deview 2012-hdfs
[B4]deview 2012-hdfsNAVER D2
 
z/VM 6.3 - Mudanças de Comportamento do hypervisor para suporte de partições ...
z/VM 6.3 - Mudanças de Comportamento do hypervisor para suporte de partições ...z/VM 6.3 - Mudanças de Comportamento do hypervisor para suporte de partições ...
z/VM 6.3 - Mudanças de Comportamento do hypervisor para suporte de partições ...Joao Galdino Mello de Souza
 
HBase at Flurry
HBase at FlurryHBase at Flurry
HBase at Flurryddlatham
 
Storage and performance- Batch processing, Whiptail
Storage and performance- Batch processing, WhiptailStorage and performance- Batch processing, Whiptail
Storage and performance- Batch processing, WhiptailInternet World
 
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]Kyle Hailey
 
Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheDavid Grier
 

Similar to HBase Accelerated: In-Memory Flush and Compaction (20)

Apache HBase, Accelerated: In-Memory Flush and Compaction
Apache HBase, Accelerated: In-Memory Flush and Compaction Apache HBase, Accelerated: In-Memory Flush and Compaction
Apache HBase, Accelerated: In-Memory Flush and Compaction
 
Red Hat Storage Server Administration Deep Dive
Red Hat Storage Server Administration Deep DiveRed Hat Storage Server Administration Deep Dive
Red Hat Storage Server Administration Deep Dive
 
HBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devices
HBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devicesHBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devices
HBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devices
 
HBaseCon2017 Accordion: Apache HBase Beathes with In-Memory Compaction
HBaseCon2017 Accordion: Apache HBase Beathes with In-Memory CompactionHBaseCon2017 Accordion: Apache HBase Beathes with In-Memory Compaction
HBaseCon2017 Accordion: Apache HBase Beathes with In-Memory Compaction
 
Accordion HBaseCon 2017
Accordion HBaseCon 2017Accordion HBaseCon 2017
Accordion HBaseCon 2017
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
 
Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...
Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...
Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...
 
02.28.13 WANdisco ApacheCon 2013
02.28.13 WANdisco ApacheCon 201302.28.13 WANdisco ApacheCon 2013
02.28.13 WANdisco ApacheCon 2013
 
Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket Cache
 
In-memory Data Management Trends & Techniques
In-memory Data Management Trends & TechniquesIn-memory Data Management Trends & Techniques
In-memory Data Management Trends & Techniques
 
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
 
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance BarriersCeph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
 
Storage and performance, Whiptail
Storage and performance, Whiptail Storage and performance, Whiptail
Storage and performance, Whiptail
 
[B4]deview 2012-hdfs
[B4]deview 2012-hdfs[B4]deview 2012-hdfs
[B4]deview 2012-hdfs
 
z/VM 6.3 - Mudanças de Comportamento do hypervisor para suporte de partições ...
z/VM 6.3 - Mudanças de Comportamento do hypervisor para suporte de partições ...z/VM 6.3 - Mudanças de Comportamento do hypervisor para suporte de partições ...
z/VM 6.3 - Mudanças de Comportamento do hypervisor para suporte de partições ...
 
HBase at Flurry
HBase at FlurryHBase at Flurry
HBase at Flurry
 
Storage and performance- Batch processing, Whiptail
Storage and performance- Batch processing, WhiptailStorage and performance- Batch processing, Whiptail
Storage and performance- Batch processing, Whiptail
 
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
 
Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cache
 

More from DataWorks Summit/Hadoop Summit

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 

More from DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 

Recently uploaded

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 

Recently uploaded (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

HBase Accelerated: In-Memory Flush and Compaction

  • 1. HBase Accelerated: In-Memory Flush and Compaction E s h c a r H i l l e l , A n a s t a s i a B r a g i n s k y , E d w a r d B o r t n i k o v ⎪ H a d o o p S u m m i t E u r o p e , A p r i l 1 3 , 2 0 1 6
  • 2. Motivation: Dynamic Content Processing on Top of HBase 2  Real-time content processing pipelines › Store intermediate results in persistent map › Notification mechanism is prevalent  Storage and notifications on the same platform  Sieve – Yahoo’s real-time content management platform Crawl Docpro c Link Analysis Queue Crawl schedule Content Queue Links Serving Apache Storm Apache HBase
  • 3. Notification Mechanism is Like a Sliding Window 3  Small working set but not necessarily FIFO queue  Short life-cycle delete message after processing it  High-churn workload message state can be updated  Frequent scans to consume message Producer Producer Producer Consumer Consumer Consumer
  • 4. HBase Accelerated: Mission Definition 4 Goal Real-time performance and less I/O in persistent KV-stores How? Exploit redundancies in the workload to eliminate duplicates in memory Reduce amount of new files Reduce write amplification effect (overall I/O) Reduce retrieval latencies How much can we gain? Gain is proportional to the duplications
  • 5.  Persistent map › Atomic get, put, and snapshot scans  Table sorted set of rows › Each row uniquely identified by a key  Region unit of horizontal partitioning  Column family unit of vertical partitioning  Store implements a column family within a region  In production a region server (RS) can manage 100s-1000s of regions › Share resources: memory, cpu, disk key Col1 Col2 Col3 5 HBase Data model Column family Region
  • 6.  Random I/O  Sequential I/O  Random writes absorbed in RAM (Cm) › Low latency › Built in versioning › Write-ahead-log (WAL) for durability  Periodic batch merge to disk › High-throughput I/O  Disk components (Cd) immutable › Periodic merge&compact eliminate duplicate versions and tombstones  Random reads from Cm or Cd cache 6 Log-Structured Merge (LSM) Key-Value Store Approach Cm memory disk C1 C2 merge Cd
  • 7.  Random writes absorbed in active segment (Cm)  When active segment is full › Becomes immutable (snapshot) › A new mutable (active) segment serves writes › Flushed to disk, truncate WAL  Compaction reads a few files, merge-sorts them, writes back new files 7 HBase Writes C’m flush memory HDFS Cm prepare-for-flush Cd WAL write
  • 8.  Random reads from Cm or C’m or Cd cache  When data piles-up on disk › Hit ratio drops › Retrieval latency up  Compaction re-writes small files into fewer bigger files › Causes replication-related network and disk IO 8 HBase Reads C’m memory HDFS Cm Cd Read Block cache
  • 9.  New compaction pipeline › Active segment flushed to pipeline › Pipeline segments compacted in memory › Flush to disk only when needed 9 New Design: In-Memory Flush and Compaction C’m flush-to-disk memory HDFS Cm in-memory-flush Cd prepare-for-flush Compaction pipeline WAL Block cache memory
  • 10. Trade read cache (BlockCache) for write cache (compaction pipeline) 10 New Design: In-Memory Flush and Compaction write read cache cache C’m flush-to-disk memory HDFS Cm in-memory-flush Cd prepare-for-flush Compaction pipeline WAL Block cache memory
  • 11. 11 New Design: In-Memory Flush and Compaction CPU IO C’m flush-to-disk memory HDFS Cm in-memory-flush Cd prepare-for-flush Compaction pipeline WAL Block cache memory Trade read cache (BlockCache) for write cache (compaction pipeline) More CPU cycles for less I/O
  • 12. Evaluation Settings: In-Memory Working Set 12  YCSB: compares compacting vs. default memstore  Small cluster: 3 HDFS nodes on a single rack, 1 RS › 1GB heap space, MSLAB enabled (2MB chunks) › Default: 128MB flush size, 100MB block-cache › Compacting: 192MB flush size, 36MB block-cache  High-churn workload, small working set › 128,000 records, 1KB value field › 10 threads running 5 millions operations, various key distributions › 50% reads 50% updates, target 1000ops › 1% (long) scans 99% updates, target 500ops  Measure average latency over time › Latencies accumulated over intervals of 10 seconds
  • 13. Evaluation Results: Read Latency (Zipfian Distribution) 13 Flush to disk Compaction Data fits into cache
  • 14. Evaluation Results: Read Latency (Uniform Distribution) 14 Region split
  • 15. Evaluation Results: Scan Latency (Uniform Distribution) 15
  • 16. Evaluation Settings: Handling Tombstones 16  YCSB: compares compacting vs. default memstore  Small cluster: 3 HDFS nodes on a single rack, 1 RS › 1GB heap space, MSLAB enabled (2MB chunks), 128MB flush size, 64MB block-cache › Default: Minimum 4 files for compaction › Compaction: Minimum 2 files for compaction  High-churn workload, small working set with deletes › 128,000 records, 1KB value field › 10 threads running 5 millions operations, various key distributions › 40% reads 40% updates 20% deletes (with 50,000 updates head start), target 1000ops › 1% (long) scans 66% updates 33% deletes (with head start), target 500ops  Measure average latency over time › Latencies accumulated over intervals of 10 seconds
  • 17. Evaluation Results: Read Latency (Zipfian Distribution) 17 (total 2 flushes and 1 disk compactions) (total 15 flushes and 4 disk compactions)
  • 18. Evaluation Results: Read Latency (Uniform Distribution) 18 (total 3 flushes and 2 disk compactions) (total 15 flushes and 4 disk compactions)
  • 19. Evaluation Results: Scan Latency (Zipfian Distribution) 19
  • 20. Work in Progress: Memory Optimizations 20  Current design › Data stored in flat buffers, index is a skip-list › All memory allocated on-heap  Optimizations › Flat layout for immutable segments index › Manage (allocate, store, release) data buffers off-heap  Pros › Locality in access to index › Reduce memory fragmentation › Significantly reduce GC work › Better utilization of memory and CPU
  • 21. Status Umbrella Jira HBASE-14918 21  HBASE-14919 HBASE-15016 HBASE-15359 Infrastructure refactoring › Status: committed  HBASE-14920 new compacting memstore › Status: pre-commit  HBASE-14921 memory optimizations (memory layout, off-heaping) › Status: under code review
  • 22. Summary 22  Feature intended for HBase 2.0.0  New design pros over default implementation › Predictable retrieval latency by serving (mainly) from memory › Less compaction on disk reduces write amplification effect › Less disk I/O and network traffic reduces load on HDFS  We would like to thank the reviewers › Michael Stack, Anoop Sam John, Ramkrishna s. Vasudevan, Ted Yu