Page1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
LLAP: Locality is dead (in the cloud)
Gopal Vijayaraghavan
Page 2
Data Locality – as usually discussed
[Diagram: Disk, CPU, Memory – share-nothing; Network – shared]
Page 3
Cloud – The Network eats itself
[Diagram: Network (storage), Processing, Memory, Network – shared]
Page 4
Cutaway Demo – LLAP on Cloud
TL;DW – repeat the LLAP+S3 benchmark on HDC
3 LLAP (m4.xlarge) nodes; the fact table has 864,001,869 rows
--------------------------------------------------------------------------------
VERTICES: 06/06 [==========================>>] 100% ELAPSED TIME: 1.68 s
--------------------------------------------------------------------------------
INFO : Status: DAG finished successfully in 1.63 seconds
INFO :
Hortonworks Data Cloud LLAP is >25x faster than EMR
Page 5
• Wait a minute – is this a new problem?
• How well do we handle data locality on-prem?
• Fast BI tools, how long can they afford to wait for locality?
• We do have non-local readers sometimes, I know it
• I mean, that’s why we have HDFS right?
Amdahl’s law knocks, who answers?
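For context on that last line (an aside, not on the original slide): Amdahl's law is why even a small non-local fraction matters. If a fraction p of query time benefits from a speedup s and the rest, say time spent waiting on remote reads, does not, the overall speedup is bounded:

\[ S(s) = \frac{1}{(1-p) + p/s} \;\le\; \frac{1}{1-p} \]

So with just 20% of the wall-clock time stuck behind the network (p = 0.8), no amount of faster local compute gets past a 5x end-to-end improvement.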
Page 6
Data Locality – BI tools fight you, even on-prem
[Diagram: Disk, CPU, Memory – share-nothing; Network – shared]
Page 7
Data Locality – what it looks like (sometimes)
[Diagram: HDFS, CPU, Memory, Network – share-nothing vs. shared]
Page 8
Evaluation week of cool new BI tool – easy mistakes
[Diagram: three racks – Rack #1, Rack #2, Rack #3; ☑ ☑ ☒]
Mistake #1 – use the whole cluster to load sample data (to do it real fast, time is money)
Mistake #2 – use the whole cluster to test the BI tool (let's really see how fast it can be)
Mistake #3 – use exactly 1 rack (we're not going to make that one)
Page 9
Someone says “Data Lake” and sets up a river
[Diagram: Rack #1, Rack #2, Rack #3]
BI – becomes a 30% tenant
Arguments start on how to set it up:
How about 1 node in every rack? We'll get lots of rack locality.
Not all joins can be co-located, so the shuffle is always cross-rack – SLOW!
And you noticed that the Kafka pipeline running on rack #2 is a big noisy neighbour.
Fast is what we're selling, so that won't do.
Page 10
“Noisy network neighbours? Get a dedicated rack”
[Diagram: Rack #1, Rack #2, Rack #3]
BI – gets its own rack. ETL – gets the other two.
All files have 3 replicas – but you might still not have rack locality.
3 replicas in HDFS – always uses 2 racks (the 3rd replica goes in-rack with the 2nd).
replication=10 on 20-node racks still uses 2 racks (1+9 replicas).
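A minimal sketch of the claim above, under a simplified model of HDFS's default block placement (an illustration, not the real BlockPlacementPolicyDefault; the cluster and node names are made up): the writer keeps the first replica, the second goes to one other rack, and the third stays in-rack with the second, so three replicas always occupy exactly two racks.

```python
import random

# Hypothetical 3-rack cluster, 20 nodes per rack (names invented for the sketch).
CLUSTER = {f"rack{r}": [f"rack{r}-node{n}" for n in range(20)] for r in (1, 2, 3)}

def rack_of(node: str) -> str:
    return node.split("-")[0]

def place_three_replicas(writer_node: str) -> list:
    """Simplified model of HDFS default placement for replication=3:
    1st replica on the writer's node, 2nd on a randomly chosen remote rack,
    3rd on a different node in the same rack as the 2nd."""
    first = writer_node
    remote_rack = random.choice([r for r in CLUSTER if r != rack_of(first)])
    second = random.choice(CLUSTER[remote_rack])
    third = random.choice([n for n in CLUSTER[remote_rack] if n != second])
    return [first, second, third]

replicas = place_three_replicas("rack2-node7")
print(replicas, "->", sorted({rack_of(n) for n in replicas}))  # always exactly 2 racks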
Page 11
Dedicated rack – your DFS IO is now crossing racks
[Diagram: Rack #1, Rack #2, Rack #3]
The real victims are now broadcast joins – which scan fact tables over the network.
If your ETL is coming from off-rack, there's a 50% probability that your new data has no locality in rack #1: you either have 2 replicas in rack #1 or none.
If you try to fix this with a custom placement policy, the DataNodes on rack #1 will get extra writes.
Tail your DFS audit logs, folks – there's so much info in there.
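A quick self-contained check of that 50% figure, under the same simplified placement model as the previous sketch (rack granularity only; rack names are placeholders): when the writer sits outside the BI rack, the second and third replicas land together on one of the two remaining racks, so rack #1 gets either both of them or neither.

```python
import random

RACKS = ["rack1", "rack2", "rack3"]   # rack1 is the dedicated BI rack

def racks_used(writer_rack: str) -> set:
    """Simplified replication=3 placement at rack granularity: the writer's rack
    plus one randomly chosen other rack (which holds the 2nd and 3rd replicas)."""
    remote = random.choice([r for r in RACKS if r != writer_rack])
    return {writer_rack, remote}

trials = 100_000
etl_racks = ["rack2", "rack3"]        # ETL writes come from off the BI rack
misses = sum("rack1" not in racks_used(random.choice(etl_racks)) for _ in range(trials))
print(misses / trials)                # ≈ 0.5 – half the new blocks have no replica in rack1
```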
Page 12
[Diagram: Rack #1, Rack #2, Rack #3]
However, you realize that you can make a "copy" in rack #1.
But for this cluster, until you setrep to 7 (hdfs dfs -setrep), there's no way to be sure rack #1 has a copy.
Cache!
Page 13
Caching efficiently – LLAP's tricks
LLAP's cache is decentralized, columnar, automatic, additive, packed and layered.
When a new column or partition is used, the cache adds to itself incrementally – unlike immutable caches.
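A toy sketch of what "additive" means here (illustrative only, not LLAP's actual data structures; the key layout is an assumption): entries live at column-chunk granularity, so touching a new column or partition just adds the missing chunks instead of invalidating or rebuilding a whole cached object.

```python
from typing import Dict, Tuple

# Hypothetical cache key: (file, column, stripe/row-group index) – made up for the sketch.
CacheKey = Tuple[str, str, int]

class AdditiveColumnarCache:
    """Toy additive cache: new columns/partitions add entries incrementally;
    existing entries are never invalidated or rebuilt wholesale."""

    def __init__(self) -> None:
        self._chunks: Dict[CacheKey, bytes] = {}

    def read(self, file: str, column: str, stripe: int, load_fn) -> bytes:
        key = (file, column, stripe)
        if key not in self._chunks:                 # miss: fetch just this chunk
            self._chunks[key] = load_fn(file, column, stripe)
        return self._chunks[key]                    # hit: serve from memory

# First query touches column "a"; a later query that also needs column "c"
# only loads c's chunks – the cached "a" chunks stay valid and untouched.
cache = AdditiveColumnarCache()
fake_load = lambda f, c, s: f"{f}:{c}:{s}".encode()
cache.read("part=2016/file1.orc", "a", 0, fake_load)
cache.read("part=2016/file1.orc", "c", 0, fake_load)
```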
Page 14
LLAP cache: ACID transactional snapshots
LLAP cache is built to handle Hive ACID 2.x data with overlapping read transactions, with failure tolerance across retries (HIVE-12631).
[Diagram: Q1 and Q2 read Partition=1 from the same LLAP cache with txn lists <1> and <1,2>; the original reads and their retries all succeed ✔]
Q2 is a txn ahead of Q1 – same partition, different data (in cache).
This works with a single cached copy for any rows which are common across the transactions.
The retries work even if txn=2 deleted a row which existed in txn=1.
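A toy illustration of serving both snapshots from one cached copy (a simplified model, not Hive's actual ACID row format; the row tags and values are invented): each cached row carries the txn that inserted it and, if any, the txn that deleted it, and every read filters by its own txn list, so Q1 and Q2 share the rows they have in common while still seeing different data.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass
class CachedRow:
    value: str
    insert_txn: int                    # txn that wrote the row
    delete_txn: Optional[int] = None   # txn that deleted it, if any

# One cached copy of Partition=1, shared by every reader.
PARTITION_1: List[CachedRow] = [
    CachedRow("a", insert_txn=1),
    CachedRow("b", insert_txn=1, delete_txn=2),  # existed in txn 1, deleted by txn 2
    CachedRow("c", insert_txn=2),
]

def snapshot_read(rows: List[CachedRow], visible_txns: Set[int]) -> List[str]:
    """Toy snapshot filter: a row is visible if it was inserted by a txn in the
    snapshot and not deleted by a txn in the snapshot."""
    return [r.value for r in rows
            if r.insert_txn in visible_txns
            and (r.delete_txn is None or r.delete_txn not in visible_txns)]

print(snapshot_read(PARTITION_1, {1}))     # Q1 [txns=<1>]   -> ['a', 'b']
print(snapshot_read(PARTITION_1, {1, 2}))  # Q2 [txns=<1,2>] -> ['a', 'c']
```

Rerunning either read (a retry) just re-applies the same filter over the same cached rows, so it returns the same answer even though txn 2 deleted a row that txn 1 could still see.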
Page 15
Locality is dead, long live cache affinity
[Diagram: Rack #1 with three LLAP nodes (1, 2, 3); Split #1 has node preference order [2,3,1], Split #2 has [3,1,2]]
If node #2 fails or is too busy, the scheduler will skip it.
When a node reboots, it takes up the lowest open slot when it comes up.
A reboot might cause an empty slot, but won't cause cache misses on others.
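A sketch of the idea behind cache affinity (a toy model, not LLAP's actual scheduler; the hashing scheme and names are assumptions): each split gets a stable preference order over the LLAP slots, here via rendezvous hashing, so the same split keeps landing on the same node, and a busy or dead node is simply skipped without perturbing where any other split goes.

```python
import hashlib

def preference_order(split_id: str, slots: list) -> list:
    """Toy cache-affinity ranking: rendezvous (highest-random-weight) hashing
    gives every split a stable, deterministic ordering of LLAP slots."""
    def weight(slot) -> int:
        return int(hashlib.md5(f"{split_id}:{slot}".encode()).hexdigest(), 16)
    return sorted(slots, key=weight, reverse=True)

def schedule(split_id: str, slots: list, available: set):
    """Pick the first available slot in the split's preference order; a busy or
    dead node is skipped without changing where any other split prefers to go."""
    for slot in preference_order(split_id, slots):
        if slot in available:
            return slot
    raise RuntimeError("no LLAP slots available")

slots = [1, 2, 3]
print(preference_order("split-1", slots))            # stable per split, e.g. [2, 3, 1]
print(schedule("split-1", slots, available={1, 3}))  # slot 2 busy/down -> next in order
```

Because a rebooted node reclaims the same (lowest open) slot number, the preference orders – and therefore the cached data every other node is relying on – stay put.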
Page 16
Questions?
This presentation represents the work of several folks from the Hive community over
several years – Sergey, Gunther, Prasanth, Rajesh, Nita, Ashutosh, Jesus, Deepak,
Jason, Sid, Matt, Teddy, Eugene and Vikram.
