In-memory Caching in HDFS: Lower Latency, Same Great Taste

DataWorks Summit
Andrew Wang and Colin McCabe
Cloudera
Andrew Wang | awang@cloudera.com
Colin McCabe | cmccabe@cloudera.com
[Diagrams, three slides: (1) Alice sends a query to the Hadoop cluster and gets a result set; (2) Alice and fresh data; (3) Alice and a rollup job.]
Problems
• Data hotspots
• Everyone wants to query some fresh data
• Shared disks are unable to handle high load
• Mixed workloads
• Data analyst making small point queries
• Rollup job scanning all the data
• Point query latency suffers because of I/O contention
• Same theme: disk I/O contention!
How do we solve I/O issues?
• Cache important datasets in memory!
• Much higher throughput than disk
• Fast random/concurrent access
• Interesting working sets often fit in cluster memory
• Traces from Facebook’s Hive cluster
• Increasingly affordable to buy a lot of memory
• Moore’s law
• 1TB RAM server is 40k on HP’s website
[Diagrams, five slides: (1) Alice and the page cache; (2) Alice's repeated query, with a question mark; (3) Alice and a rollup; (4) Alice and extra copies; (5) Alice, checksum verification, and extra copies.]
Design Considerations
1. Placing tasks for memory locality
• Expose cache locations to application schedulers
2. Contention for page cache from other users
• Explicitly pin hot datasets in memory
3. Extra copies when reading cached data
• Zero-copy API to read cached data
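A minimal sketch of how a scheduler might use exposed cache locations, assuming it already knows which hosts hold a block in memory and which hold it on disk. The names CacheAwarePlacement, BlockInfo, and chooseNode are hypothetical, and the fallback order after the memory-local tier is an assumption, not something the slides specify.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch: place a task where its input block is already in memory. */
final class CacheAwarePlacement {

    /** Replica locations for one block; assumed to come from file system metadata. */
    static final class BlockInfo {
        final Set<String> cachedHosts; // hosts holding the block in memory
        final Set<String> diskHosts;   // hosts holding the block on disk

        BlockInfo(Set<String> cachedHosts, Set<String> diskHosts) {
            this.cachedHosts = cachedHosts;
            this.diskHosts = diskHosts;
        }
    }

    /** Picks a free node: memory-local first, then disk-local, then any free node. */
    static Optional<String> chooseNode(BlockInfo block, List<String> freeNodes) {
        for (String node : freeNodes) {
            if (block.cachedHosts.contains(node)) {
                return Optional.of(node); // memory-local read
            }
        }
        for (String node : freeNodes) {
            if (block.diskHosts.contains(node)) {
                return Optional.of(node); // disk-local read
            }
        }
        return freeNodes.stream().findFirst(); // remote read, or empty if no node is free
    }
}
```

With a block cached on dn2 and stored on dn1, dn2, and dn3, chooseNode returns dn2 whenever dn2 is among the free nodes.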
Outline
• Implementation
• NameNode and DataNode modifications
• Zero-copy read API
• Evaluation
• Microbenchmarks
• MapReduce
• Impala
• Future work
Cache Directives
• A cache directive describes a file or directory that
should be cached
• Path
• Cache replication factor: 1-N
• Stored permanently on the NameNode
• Also have cache pools for access control and quotas,
but we won’t be covering that here
Architecture
[Diagram: a NameNode and three DataNodes. A DFSClient sends "Cache /foo" to the NameNode; cache commands and cache heartbeats flow between the NameNode and the DataNodes; a second DFSClient calls getBlockLocations.]
mlock
• The DataNode pins each cached block into the page
cache using mlock, and checksums it.
• Because we’re using the page cache, the blocks don’t
take up any space on the Java heap.
[Diagram: DataNode, page cache, and DFSClient read, with mlock applied to the page cache.]
Zero-copy read API
• Clients can use the zero-copy read API to map the
cached replica into their own address space
• The zero-copy API avoids the overhead of the
read() and pread() system calls
• However, we don’t verify checksums when using the
zero-copy API
• The zero-copy API can only be used on cached data, or
when the application computes its own checksums.
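To illustrate the copy that a mapped read avoids, here is a plain-JDK sketch with no HDFS involved. One method copies file bytes into a heap buffer with read(); the other maps the file into the process address space and reads the pages directly. MmapVsRead and its method names are made up for this example; a real HDFS client would go through FSDataInputStream instead.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Illustrative only: compares a copying read with a memory-mapped read. */
final class MmapVsRead {

    /** Sums all bytes by copying the file into a heap buffer with read(). */
    static long sumWithRead(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(64 * 1024);
            long sum = 0;
            while (ch.read(buf) != -1) {   // each call copies data into buf
                buf.flip();
                while (buf.hasRemaining()) {
                    sum += buf.get() & 0xFF;
                }
                buf.clear();
            }
            return sum;
        }
    }

    /** Sums all bytes through a read-only mapping (files under 2 GB only). */
    static long sumWithMmap(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sum = 0;
            while (map.hasRemaining()) {   // reads the mapped pages, no user-space copy
                sum += map.get() & 0xFF;
            }
            return sum;
        }
    }
}
```

Both methods return the same sum for the same file; only the path the bytes take differs.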
Zero-copy read API
New FSDataInputStream methods:
ByteBuffer read(ByteBufferPool pool,
int maxLength, EnumSet<ReadOption> opts);
void releaseBuffer(ByteBuffer buffer);
Skipping Checksums
• We would like to skip checksum verification when
reading cached data
• DataNode already checksums when caching the block
• Enables more efficient SCR, ZCR
• Requirements
• Client needs to know that the replica is cached
• DataNode needs to notify the client if the replica is
uncached
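A rough sketch of the work that checksum verification implies: one checksum per fixed-size chunk, recomputed on every read. The 512-byte chunk size and the use of java.util.zip.CRC32 are assumptions for illustration; the slides do not state which checksum or chunk size HDFS uses, and ChunkChecksums is a hypothetical name.

```java
import java.util.Arrays;
import java.util.zip.CRC32;

/** Illustrative only: per-chunk checksums over a block of data. */
final class ChunkChecksums {
    static final int BYTES_PER_CHECKSUM = 512; // assumed chunk size for this sketch

    /** Computes one CRC32 per chunk, as a writer would when storing the data. */
    static long[] compute(byte[] data) {
        int chunks = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int off = i * BYTES_PER_CHECKSUM;
            int len = Math.min(BYTES_PER_CHECKSUM, data.length - off);
            CRC32 crc = new CRC32();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /** Recomputes every chunk checksum; this is the per-read cost a cached read could skip. */
    static boolean verify(byte[] data, long[] expected) {
        return Arrays.equals(compute(data), expected);
    }
}
```

If the data was verified once when it was pinned and cannot change while pinned, repeating verify on every read adds CPU cost without catching anything new.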
Skipping Checksums
• The DataNode and DFSClient use shared memory
segments to communicate which blocks are cached.
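One way such a shared flag could work, as a hypothetical sketch: a single word per block holds a "can skip checksums" bit set by the DataNode side and a count of clients currently using the block. An AtomicLong stands in for a word in the shared memory segment; the layout, the names, and the revoke protocol are assumptions, not the actual HDFS format.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-block slot shared between a DataNode and its clients. */
final class CacheSlot {
    private static final long CAN_SKIP_CSUMS = 1L << 63; // set while the block is pinned
    private static final long COUNT_MASK = 0x7FFFFFFFL;  // number of clients using the block

    private final AtomicLong word = new AtomicLong();

    /** DataNode side: the block is pinned and verified. */
    void markCached() {
        word.updateAndGet(w -> w | CAN_SKIP_CSUMS);
    }

    /** Client side: succeeds only while the flag is set, and registers the client as a user. */
    boolean tryAcquire() {
        while (true) {
            long w = word.get();
            if ((w & CAN_SKIP_CSUMS) == 0) {
                return false; // not cached: the client must verify checksums itself
            }
            if (word.compareAndSet(w, w + 1)) {
                return true;
            }
        }
    }

    /** Client side: must follow a successful tryAcquire. */
    void release() {
        word.updateAndGet(w -> {
            if ((w & COUNT_MASK) == 0) {
                throw new IllegalStateException("release without acquire");
            }
            return w - 1;
        });
    }

    /** DataNode side: clears the flag; returns true if no client is still using the block. */
    boolean revoke() {
        long w = word.updateAndGet(x -> x & ~CAN_SKIP_CSUMS);
        return (w & COUNT_MASK) == 0;
    }

    boolean inUse() {
        return (word.get() & COUNT_MASK) != 0;
    }
}
```

In this sketch the DataNode side would wait for inUse() to become false after revoke() before unpinning the block, so no client reads unverified data through a stale mapping.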
[Diagram: DataNode, page cache (mlock), and DFSClient read, with a shared memory segment between the DataNode and the DFSClient.]
[Diagram, two slides: the entry for Block 123, shared between DataNode and DFSClient, carries "Can Skip Csums" and "In Use" flags; the client side shows a zero-copy MappedByteBuffer.]
Architecture Summary
• The Cache Directive API provides per-file control over
what is cached
• The NameNode tracks cached blocks and coordinates
DataNode cache work
• The DataNodes use mlock to lock page cache blocks
into memory
• The DFSClient can determine whether it is safe to
skip checksums via the shared memory segment
• Caching makes it possible to use the efficient Zero-
Copy API on cached data
Test Node
• 48GB of RAM
• Configured 38GB of HDFS cache
• 11x SATA hard disks
• 2x4 core 2.13 GHz Westmere Xeon processors
• 10 Gbit/s full-bisection bandwidth network
Single-Node Microbenchmarks
• How much faster are cached and zero-copy reads?
• Introducing vecsum (vector sum)
• Computes sums of a file of doubles
• Highly optimized: uses SSE intrinsics
• libhdfs program
• Can toggle between various read methods
• Terminology
• SCR: short-circuit reads
• ZCR: zero-copy reads
Throughput Reading 1G File 20x
[Bar chart, GB/s: TCP 0.8; TCP no csums 0.9; SCR 1.9; SCR no csums 2.4; ZCR 5.9]
ZCR 1GB vs 20GB
[Bar chart, GB/s: 1GB 5.9; 20GB 2.7]
Throughput
• Skipping checksums matters more when going faster
• ZCR gets close to bus bandwidth
• ~6GB/s
• Need to reuse client-side mmaps for maximum perf
• page_fault function is 1.16% of cycles in 1G
• 17.55% in 20G
Client CPU cycles
[Bar chart, CPU cycles (billions): TCP 57.6; TCP no csums 51.8; SCR 27.1; SCR no csums 23.4; ZCR 12.7]
Why is ZCR more CPU-efficient?
[Two slides of figures; content not captured in the transcript.]
Remote Cached vs. Local Uncached
• Zero-copy is only possible for local cached data
• Is it better to read from remote cache, or local disk?
[Bar chart, MB/s: TCP 841; iperf 1092; SCR 125; dd 137]
Microbenchmark Conclusions
• Short-circuit reads need less CPU than TCP reads
• ZCR is even more efficient, because it avoids a copy
• ZCR goes much faster when re-reading the same
data, because it can avoid mmap page faults
• Network and disk may be the bottleneck for remote or
uncached reads
MapReduce
• Started with example MR jobs
• Wordcount
• Grep
• 5 node cluster: 4 DNs, 1 NN
• Same hardware configuration as single node tests
• 38GB HDFS cache per DN
• 11 disks per DN
• 17GB of Wikipedia text
• Small enough to fit into cache at 3x replication
• Ran each job 10 times, took the average
[Bar chart "wordcount and grep", shown on three slides. Bar labels: wordcount, wordcount cached, grep, grep cached. Values as extracted: 275, 52, 280, 55. Axis 0 to 400; units not captured.]
• Almost no speedup!
• ~60MB/s and ~330MB/s: not I/O bound
wordcount and grep
• End-to-end latency barely changes
• These MR jobs are simply not I/O bound!
• Best map phase throughput was about 330MB/s
• 44 disks can theoretically do 4400MB/s
• Further reasoning
• Long JVM startup and initialization time
• Many copies in TextInputFormat, doesn’t use zero-copy
• Caching input data doesn’t help reduce step
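The "not I/O bound" argument is simple arithmetic, sketched below. Dividing the 17GB input by a run time gives an average scan rate, which can then be compared with what the disks could deliver in aggregate. The per-disk rate of 100MB/s is an assumption implied by the slide's figure of 4400MB/s for 44 disks, and MapPhaseThroughput is a hypothetical helper name.

```java
/** Illustrative arithmetic only; names are hypothetical. */
final class MapPhaseThroughput {

    /** Average scan rate in MB/s for dataGb gigabytes processed in the given time (1 GB = 1000 MB). */
    static double mbPerSecond(double dataGb, double seconds) {
        return dataGb * 1000.0 / seconds;
    }

    /** Aggregate sequential bandwidth if every disk streams at the same rate. */
    static double aggregateDiskMbPerSecond(int disks, double perDiskMbPerSecond) {
        return disks * perDiskMbPerSecond;
    }

    /** Fraction of the available disk bandwidth that a job actually used. */
    static double utilization(double achievedMbPerSecond, double availableMbPerSecond) {
        return achievedMbPerSecond / availableMbPerSecond;
    }
}
```

For example, mbPerSecond(17, 52) comes to roughly 330MB/s, well under a tenth of aggregateDiskMbPerSecond(44, 100), so faster storage alone cannot shorten such a job by much.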
Introducing bytecount
• Trivial version of wordcount
• Counts # of occurrences of byte values
• Heavily CPU optimized
• Each mapper processes an entire block via ZCR
• No additional copies
• No record slop across block boundaries
• Fast inner loop
• Very unrealistic job, but serves as a best case
• Also tried 2GB block size to amortize startup costs
[Bar chart "bytecount", shown on three slides. Values as extracted: 52, 39, 35, 55, 45, 58. Axis 0 to 70; bar labels and units not captured.]
• 1.3x faster
• Still only ~500MB/s
MapReduce Conclusions
• Many MR jobs will see marginal improvement
• Startup costs
• CPU inefficiencies
• Shuffle and reduce steps
• Even bytecount sees only modest gains
• 1.3x faster than disk
• 500MB/s with caching and ZCR
• Nowhere close to GB/s possible with memory
• Needs more work to take full advantage of caching!
Impala Benchmarks
• Open-source OLAP database developed by Cloudera
• Tested with Impala 1.3 (CDH 5)
• Same 4 DN cluster as MR section
• 38GB of 48GB per DN configured as HDFS cache
• 152GB aggregate HDFS cache
• 11 disks per DN
Impala Benchmarks
• 1TB TPC-DS store_sales table, text format
• count(*) on different numbers of partitions
• Has to scan all the data, no skipping
• Queries
• 51GB small query (34% cache capacity)
• 148GB big query (98% cache capacity)
• Small query with concurrent workload
• Tested “cold” and “hot”
• echo 3 > /proc/sys/vm/drop_caches
• Lets us compare HDFS caching against page cache
Small Query
[Bar chart, average response time (s): Uncached cold 19.8; Cached cold 5.8; Uncached hot 4.0; Cached hot 3.0]
• 2550 MB/s and 17 GB/s: I/O bound!
• 3.4x faster, disk vs. memory
• 1.3x after warmup, still wins on CPU efficiency
Big Query
[Bar chart, average response time (s): Uncached cold 48.2; Cached cold 11.5; Uncached hot 40.9; Cached hot 9.4]
• 4.2x faster, disk vs. memory
• 4.3x faster, doesn't fit in page cache
• Cannot schedule for page cache locality
Small Query with Concurrent Workload
[Bar chart, average response time (s): Uncached; Cached; Cached (not concurrent). Axis 0 to 60; bar values not captured.]
• 7x faster when small query working set is cached
• 2x slower than isolated, CPU contention
Impala Conclusions
• HDFS cache is faster than disk or page cache
• ZCR is more efficient than SCR from page cache
• Better when working set is approx. cluster memory
• Can schedule tasks for cache locality
• Significantly better for concurrent workloads
• 7x faster when contending with a single background query
• Impala performance will only improve
• Many CPU improvements on the roadmap
Future Work
• Automatic cache replacement
• LRU, LFU, ?
• Sub-block caching
• Potentially important for automatic cache replacement
• Columns in Parquet
• Compression, encryption, serialization
• Lose many benefits of zero-copy API
• Write-side caching
• Enables Spark-like RDDs for all HDFS applications
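As a sketch of what LRU-based automatic replacement could look like, the class below tracks cached blocks by size and evicts the least recently used ones when a new block does not fit. This is not part of HDFS as described in these slides; LruBlockCache and its methods are hypothetical, and a byte-capacity limit is an assumed design choice.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU policy over cached blocks, bounded by total bytes. */
final class LruBlockCache {
    private final long capacityBytes;
    private long usedBytes = 0;
    // accessOrder=true: iteration runs from least to most recently used
    private final LinkedHashMap<String, Long> blocks = new LinkedHashMap<>(16, 0.75f, true);

    LruBlockCache(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    /** Records a read of the block; returns true on a cache hit. */
    boolean touch(String blockId) {
        return blocks.get(blockId) != null; // get() refreshes the block's recency
    }

    /** Caches a block, evicting least recently used blocks until it fits. */
    boolean add(String blockId, long sizeBytes) {
        if (sizeBytes > capacityBytes) {
            return false; // can never fit
        }
        Long old = blocks.remove(blockId);
        if (old != null) {
            usedBytes -= old;
        }
        Iterator<Map.Entry<String, Long>> it = blocks.entrySet().iterator();
        while (usedBytes + sizeBytes > capacityBytes && it.hasNext()) {
            usedBytes -= it.next().getValue();
            it.remove(); // evict the least recently used block
        }
        blocks.put(blockId, sizeBytes);
        usedBytes += sizeBytes;
        return true;
    }

    boolean contains(String blockId) {
        return blocks.containsKey(blockId); // does not affect recency
    }

    long usedBytes() {
        return usedBytes;
    }
}
```

An LFU variant would track hit counts instead of recency; which policy suits HDFS workloads is the open question the slide raises.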
Conclusion
• I/O contention is a problem for concurrent workloads
• HDFS can now explicitly pin working sets into RAM
• Applications can place their tasks for cache locality
• Use zero-copy API to efficiently read cached data
• Substantial performance improvements
• 6GB/s for single thread microbenchmark
• 7x faster for concurrent Impala workload
[Backup slide, bar chart "bytecount": values as extracted 52, 39, 35, 55, 45, 58; axis 0 to 70; annotated "Less disk parallelism".]
In-memory Caching in HDFS: Lower Latency, Same Great Taste
1 of 71

Recommended

RedisConf17- Using Redis at scale @ Twitter by
RedisConf17- Using Redis at scale @ TwitterRedisConf17- Using Redis at scale @ Twitter
RedisConf17- Using Redis at scale @ TwitterRedis Labs
3.8K views26 slides
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J... by
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Databricks
98.6K views44 slides
High Availability PostgreSQL with Zalando Patroni by
High Availability PostgreSQL with Zalando PatroniHigh Availability PostgreSQL with Zalando Patroni
High Availability PostgreSQL with Zalando PatroniZalando Technology
25K views48 slides
Solving PostgreSQL wicked problems by
Solving PostgreSQL wicked problemsSolving PostgreSQL wicked problems
Solving PostgreSQL wicked problemsAlexander Korotkov
3.9K views61 slides
Deep Dive: Memory Management in Apache Spark by
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDatabricks
14.5K views54 slides
Linux tuning to improve PostgreSQL performance by
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performancePostgreSQL-Consulting
35.9K views26 slides

More Related Content

What's hot

Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn by
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInLinkedIn
3.7K views46 slides
Introduction to Apache Flink by
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flinkmxmxm
483 views76 slides
EMR 플랫폼 기반의 Spark 워크로드 실행 최적화 방안 - 정세웅, AWS 솔루션즈 아키텍트:: AWS Summit Online Ko... by
EMR 플랫폼 기반의 Spark 워크로드 실행 최적화 방안 - 정세웅, AWS 솔루션즈 아키텍트::  AWS Summit Online Ko...EMR 플랫폼 기반의 Spark 워크로드 실행 최적화 방안 - 정세웅, AWS 솔루션즈 아키텍트::  AWS Summit Online Ko...
EMR 플랫폼 기반의 Spark 워크로드 실행 최적화 방안 - 정세웅, AWS 솔루션즈 아키텍트:: AWS Summit Online Ko...Amazon Web Services Korea
770 views81 slides
Running Apache Spark on Kubernetes: Best Practices and Pitfalls by
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsDatabricks
2.9K views36 slides
[Outdated] Secrets of Performance Tuning Java on Kubernetes by
[Outdated] Secrets of Performance Tuning Java on Kubernetes[Outdated] Secrets of Performance Tuning Java on Kubernetes
[Outdated] Secrets of Performance Tuning Java on KubernetesBruno Borges
1.6K views27 slides
Using S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud by
Using S3 Select to Deliver 100X Performance Improvements Versus the Public CloudUsing S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
Using S3 Select to Deliver 100X Performance Improvements Versus the Public CloudDatabricks
1.5K views15 slides

What's hot(20)

Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn by LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
LinkedIn3.7K views
Introduction to Apache Flink by mxmxm
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flink
mxmxm483 views
EMR 플랫폼 기반의 Spark 워크로드 실행 최적화 방안 - 정세웅, AWS 솔루션즈 아키텍트:: AWS Summit Online Ko... by Amazon Web Services Korea
EMR 플랫폼 기반의 Spark 워크로드 실행 최적화 방안 - 정세웅, AWS 솔루션즈 아키텍트::  AWS Summit Online Ko...EMR 플랫폼 기반의 Spark 워크로드 실행 최적화 방안 - 정세웅, AWS 솔루션즈 아키텍트::  AWS Summit Online Ko...
EMR 플랫폼 기반의 Spark 워크로드 실행 최적화 방안 - 정세웅, AWS 솔루션즈 아키텍트:: AWS Summit Online Ko...
Running Apache Spark on Kubernetes: Best Practices and Pitfalls by Databricks
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Databricks2.9K views
[Outdated] Secrets of Performance Tuning Java on Kubernetes by Bruno Borges
[Outdated] Secrets of Performance Tuning Java on Kubernetes[Outdated] Secrets of Performance Tuning Java on Kubernetes
[Outdated] Secrets of Performance Tuning Java on Kubernetes
Bruno Borges1.6K views
Using S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud by Databricks
Using S3 Select to Deliver 100X Performance Improvements Versus the Public CloudUsing S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
Using S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
Databricks1.5K views
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity by Wes McKinney
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityApache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Wes McKinney1.1K views
Apache Flink in the Cloud-Native Era by Flink Forward
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
Flink Forward173 views
Top 5 Mistakes When Writing Spark Applications by Spark Summit
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
Spark Summit26.4K views
Building Robust ETL Pipelines with Apache Spark by Databricks
Building Robust ETL Pipelines with Apache SparkBuilding Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache Spark
Databricks34.6K views
Cloudera Impala Source Code Explanation and Analysis by Yue Chen
Cloudera Impala Source Code Explanation and AnalysisCloudera Impala Source Code Explanation and Analysis
Cloudera Impala Source Code Explanation and Analysis
Yue Chen3.6K views
Parallelizing with Apache Spark in Unexpected Ways by Databricks
Parallelizing with Apache Spark in Unexpected WaysParallelizing with Apache Spark in Unexpected Ways
Parallelizing with Apache Spark in Unexpected Ways
Databricks5.6K views
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 - by Yoshiyasu SAEKI
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
Yoshiyasu SAEKI9.5K views
Presto query optimizer: pursuit of performance by DataWorks Summit
Presto query optimizer: pursuit of performancePresto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performance
DataWorks Summit4.3K views
Apache Tez – Present and Future by DataWorks Summit
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and Future
DataWorks Summit3.8K views
Plazma - Treasure Data’s distributed analytical database - by Treasure Data, Inc.
Plazma - Treasure Data’s distributed analytical database -Plazma - Treasure Data’s distributed analytical database -
Plazma - Treasure Data’s distributed analytical database -
Treasure Data, Inc.12.6K views
Introduction to Apache ZooKeeper by Saurav Haloi
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
Saurav Haloi128.5K views
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud by Noritaka Sekiyama
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Noritaka Sekiyama33.3K views
Apache storm vs. Spark Streaming by P. Taylor Goetz
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark Streaming
P. Taylor Goetz210.5K views
From cache to in-memory data grid. Introduction to Hazelcast. by Taras Matyashovsky
From cache to in-memory data grid. Introduction to Hazelcast.From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.
Taras Matyashovsky42.6K views

Similar to In-memory Caching in HDFS: Lower Latency, Same Great Taste

In-memory Data Management Trends & Techniques by
In-memory Data Management Trends & TechniquesIn-memory Data Management Trends & Techniques
In-memory Data Management Trends & TechniquesHazelcast
1K views35 slides
August 2013 HUG: Removing the NameNode's memory limitation by
August 2013 HUG: Removing the NameNode's memory limitation August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation Yahoo Developer Network
3.9K views21 slides
Colvin exadata mistakes_ioug_2014 by
Colvin exadata mistakes_ioug_2014Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014marvin herrera
82 views49 slides
Accelerating HBase with NVMe and Bucket Cache by
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheNicolas Poggi
1.2K views33 slides
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ... by
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...Alluxio, Inc.
120 views31 slides
Accelerating hbase with nvme and bucket cache by
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheDavid Grier
571 views34 slides

Similar to In-memory Caching in HDFS: Lower Latency, Same Great Taste(20)

In-memory Data Management Trends & Techniques by Hazelcast
In-memory Data Management Trends & TechniquesIn-memory Data Management Trends & Techniques
In-memory Data Management Trends & Techniques
Hazelcast1K views
August 2013 HUG: Removing the NameNode's memory limitation by Yahoo Developer Network
August 2013 HUG: Removing the NameNode's memory limitation August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation
Colvin exadata mistakes_ioug_2014 by marvin herrera
Colvin exadata mistakes_ioug_2014Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014
marvin herrera82 views
Accelerating HBase with NVMe and Bucket Cache by Nicolas Poggi
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket Cache
Nicolas Poggi1.2K views
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ... by Alluxio, Inc.
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Alluxio, Inc.120 views
Accelerating hbase with nvme and bucket cache by David Grier
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cache
David Grier571 views
High performace network of Cloud Native Taiwan User Group by HungWei Chiu
High performace network of Cloud Native Taiwan User GroupHigh performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User Group
HungWei Chiu1.8K views
Hadoop 3.0 - Revolution or evolution? by Uwe Printz
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?
Uwe Printz1.2K views
Tuning Linux for your database FLOSSUK 2016 by Colin Charles
Tuning Linux for your database FLOSSUK 2016Tuning Linux for your database FLOSSUK 2016
Tuning Linux for your database FLOSSUK 2016
Colin Charles1.9K views
Scaling with sync_replication using Galera and EC2 by Marco Tusa
Scaling with sync_replication using Galera and EC2Scaling with sync_replication using Galera and EC2
Scaling with sync_replication using Galera and EC2
Marco Tusa2.6K views
Caching Methodology & Strategies by Tiệp Vũ
Caching Methodology & StrategiesCaching Methodology & Strategies
Caching Methodology & Strategies
Tiệp Vũ511 views
Caching methodology and strategies by Tiep Vu
Caching methodology and strategiesCaching methodology and strategies
Caching methodology and strategies
Tiep Vu203 views
OSDC 2016 - Tuning Linux for your Database by Colin Charles by NETWAYS
OSDC 2016 - Tuning Linux for your Database by Colin CharlesOSDC 2016 - Tuning Linux for your Database by Colin Charles
OSDC 2016 - Tuning Linux for your Database by Colin Charles
NETWAYS83 views
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516] by Malin Weiss
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
Malin Weiss103 views
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516] by Speedment, Inc.
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
Speedment, Inc.206 views
Hadoop 3.0 - Revolution or evolution? by Uwe Printz
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?
Uwe Printz807 views
CPU Caches by shinolajla
CPU CachesCPU Caches
CPU Caches
shinolajla4.6K views
Leveraging Databricks for Spark Pipelines by Rose Toomey
Leveraging Databricks for Spark PipelinesLeveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark Pipelines
Rose Toomey80 views

More from DataWorks Summit

Data Science Crash Course by
Data Science Crash CourseData Science Crash Course
Data Science Crash CourseDataWorks Summit
19.3K views47 slides
Floating on a RAFT: HBase Durability with Apache Ratis by
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
2.9K views20 slides
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi by
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
2.1K views19 slides
HBase Tales From the Trenches - Short stories about most common HBase operati... by
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
1.8K views18 slides
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac... by
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
1.6K views74 slides
Managing the Dewey Decimal System by
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
1K views8 slides

More from DataWorks Summit(20)

Floating on a RAFT: HBase Durability with Apache Ratis by DataWorks Summit
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit2.9K views
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi by DataWorks Summit
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit2.1K views
HBase Tales From the Trenches - Short stories about most common HBase operati... by DataWorks Summit
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit1.8K views
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac... by DataWorks Summit
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit1.6K views
Practical NoSQL: Accumulo's dirlist Example by DataWorks Summit
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit834 views
HBase Global Indexing to support large-scale data ingestion at Uber by DataWorks Summit
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit915 views
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix by DataWorks Summit
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit714 views
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi by DataWorks Summit
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit1.3K views
Supporting Apache HBase : Troubleshooting and Supportability Improvements by DataWorks Summit
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit1.8K views
Security Framework for Multitenant Architecture by DataWorks Summit
In-memory Caching in HDFS: Lower Latency, Same Great Taste