17. Exploit Data Locality
Data is more likely to be read if:
• It was recently read (temporal locality)
• It is adjacent to recently read data, e.g. arrays or the fields of an object (spatial locality)
• It is part of an access pattern, e.g. looping or relations
• Some data is naturally accessed more frequently, e.g. following a Pareto distribution
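One cheap way to exploit temporal locality and a Pareto-style access distribution is a small LRU cache in front of slower storage. A minimal sketch in Java, built on java.util.LinkedHashMap purely for illustration (the cache size is arbitrary):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A tiny LRU cache: recently read entries stay in memory, so the "hot"
// minority of a Pareto-distributed access pattern is served without
// touching slower storage.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true);   // accessOrder = true gives LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;   // evict the least recently used entry
    }
}
```

Sized for roughly the hottest 20% of keys, such a cache serves the bulk of an 80/20 workload from memory.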
18. Working with the CPU’s Cache Hierarchy
• Main memory is up to 30x slower than cache
• Alleviated somewhat by NUMA, wide/multi-channel memory and larger caches
• Vector instructions
• Work with Cache Lines
• Work with Memory Pages (TLBs)
• Work with Prefetching
• Exploit NUMA with CPU affinity:
numactl --physcpubind=0 --localalloc java …
• Exploit natural data locality
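A minimal sketch of "working with cache lines" and the prefetcher: traversing a 2D array in the order it is laid out in memory touches each cache line once and gives the hardware prefetcher a predictable stride, while the transposed loop strides across rows and misses far more often. The array and its size are illustrative.

```java
// Java stores int[rows][cols] as rows of contiguous ints, so the row-major
// loop walks memory sequentially (cache-line and prefetch friendly), while
// the column-major loop jumps a full row's worth of bytes on every access.
public class Traversal {
    static long rowMajorSum(int[][] a) {
        long sum = 0;
        for (int r = 0; r < a.length; r++)
            for (int c = 0; c < a[r].length; c++)
                sum += a[r][c];            // sequential: roughly one miss per cache line
        return sum;
    }

    static long columnMajorSum(int[][] a) {
        long sum = 0;
        for (int c = 0; c < a[0].length; c++)
            for (int r = 0; r < a.length; r++)
                sum += a[r][c];            // strided: typically one miss per access
        return sum;
    }
}
```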
19. Data Locality Effects – intra machine
[Chart: read cost (scale 0-160) for Linear, Random - Page and Random - Heap access patterns on Intel U4100, i7-860 and i7-2760QM processors]
20. Tiered Storage
Local storage:
• Heap store: 5,000,000+ TPS, 10 GB
• Off-heap store: 1,000,000 TPS, 2,000+ GB
• Local disk, SSD and rotational (restartable): 100,000 TPS, 1,000+ GB
Network storage:
• Network-accessible memory: 10,000s TPS, 100,000+ GB
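A conceptual sketch of how a tiered store answers a read, checking the fastest tier first and promoting on a hit. This is not any particular product's API; the Tier interface and names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of tiered reads: a fast on-heap tier in front of
// a slower backing tier (off-heap, disk or network), with promotion on a hit.
interface Tier<K, V> { V get(K key); void put(K key, V value); }

class TieredStore<K, V> {
    private final Map<K, V> heapTier = new ConcurrentHashMap<>(); // fastest, smallest
    private final Tier<K, V> slowTier;                            // e.g. off-heap, disk or network

    TieredStore(Tier<K, V> slowTier) { this.slowTier = slowTier; }

    V get(K key) {
        V value = heapTier.get(key);
        if (value == null) {
            value = slowTier.get(key);                     // miss: go to the slower tier
            if (value != null) heapTier.put(key, value);   // promote hot data
        }
        return value;
    }
}
```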
21. Data Locality Effects – inter machine
Compared with a hybrid in-process and distributed cache:
Latency = L1 speed * proportion + L2 speed * proportion
L1 = ~0 ms (≈5 µs) for on-heap and 50-100 µs for off-heap
L2 = 1 ms
80% L1 Pareto model: latency = 0 * .8 + 1 * .2 = .2 ms
90% L1 Pareto model: latency = 0 * .9 + 1 * .1 = .1 ms
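The same weighted-latency model as a tiny helper, using the slide's example figures:

```java
public class LatencyModel {
    // Expected latency = L1 latency * L1 proportion + L2 latency * L2 proportion
    static double expectedLatencyMs(double l1HitRatio, double l1Ms, double l2Ms) {
        return l1Ms * l1HitRatio + l2Ms * (1 - l1HitRatio);
    }

    public static void main(String[] args) {
        System.out.println(expectedLatencyMs(0.8, 0.0, 1.0)); // 0.2 ms
        System.out.println(expectedLatencyMs(0.9, 0.0, 1.0)); // 0.1 ms
    }
}
```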
22. Columnar Storage
• Manipulate data locality
• Sorted dictionary compression for finite values
• Allows values to be held in cache for SSE instructions
• Better cache line effectiveness
• Fewer CPU cache misses for aggregate calculations
• Cross-over point is around a few dozen columns
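A minimal sketch of what column orientation plus sorted dictionary compression buys an aggregate: the finite set of values lives once in a small, cache-resident dictionary, the column itself is a dense array of codes, and a scan walks contiguous memory. The column names and sizes are made up for illustration.

```java
// Column-oriented with dictionary encoding: the "country" column is a dense
// array of small codes; the distinct values sit in a sorted dictionary that
// easily fits in cache, so an aggregate scan touches contiguous memory only.
public class ColumnarSketch {
    static final String[] COUNTRY_DICTIONARY = {"AU", "DE", "UK", "US"}; // sorted, finite values
    static final int[] countryCodes = new int[1_000_000];                // one code per row
    static final long[] revenue = new long[1_000_000];                   // another column

    // Aggregate: total revenue for one country, scanning two dense columns.
    static long revenueFor(String country) {
        int code = java.util.Arrays.binarySearch(COUNTRY_DICTIONARY, country);
        long total = 0;
        for (int row = 0; row < countryCodes.length; row++) {
            if (countryCodes[row] == code) {
                total += revenue[row];
            }
        }
        return total;
    }
}
```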
23. Parallelism
• Multi-threading
• Avoid synchronized blocks: use CAS (see the sketch after this list)
• Query using a scatter-gather pattern
• Map/Reduce e.g. Hazelcast Map/Reduce
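A small sketch of the "avoid synchronized, use CAS" point: the second counter retries a compare-and-swap instead of taking a monitor.

```java
import java.util.concurrent.atomic.AtomicLong;

public class Counters {
    private long lockedCount;
    private final AtomicLong casCount = new AtomicLong();

    // Lock-based: every increment acquires and releases a monitor.
    public synchronized void incrementLocked() {
        lockedCount++;
    }

    // CAS-based: read the current value and retry the compare-and-swap
    // until it succeeds; no monitor, no blocking.
    public void incrementCas() {
        long prev;
        do {
            prev = casCount.get();
        } while (!casCount.compareAndSet(prev, prev + 1));
    }
}
```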
24. Java: Will it make the cut?
Garbage collection limits heap usage: the G1 (Oracle) and Balanced (IBM) collectors aim for 100 ms pauses at 10 GB heaps.
[Chart: GC pause time versus heap size; Java apps become memory bound around a 4 GB heap with ~4 s pauses, while 64 GB of available memory goes unused; off-heap storage is one way to use it]
No low-level CPU access.
Java is challenged as an infrastructure language despite its newly popular use in that role.
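One portable way to put data outside the collected heap is a direct ByteBuffer; real off-heap stores build their own memory managers (often on sun.misc.Unsafe), but this minimal sketch shows the idea:

```java
import java.nio.ByteBuffer;

public class OffHeapSketch {
    public static void main(String[] args) {
        // 256 MB outside the Java heap: the GC never scans or copies its
        // contents, so it does not contribute to pause times.
        ByteBuffer offHeap = ByteBuffer.allocateDirect(256 * 1024 * 1024);

        offHeap.putLong(0, 42L);           // write a value at offset 0
        long value = offHeap.getLong(0);   // read it back
        System.out.println(value);
    }
}
```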
25. CEP/Stream Processing
• Don’t let data pool up and then process it with “pull queries”.
• Invert that and process it as it streams in (“push queries”).
• Queries execute against “tables” that break the stream up into a current time window
• Hold the window and intermediate results in memory
Results are available in real time.
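A conceptual sketch of a "push query", not tied to any particular CEP engine: each event is folded into the current time window as it arrives and the intermediate count is held in memory, so results are available immediately. The window length and event shape are assumptions.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Counts events per 1-second tumbling window as they stream in,
// instead of storing them and running a pull query later.
public class WindowedCounter {
    private static final long WINDOW_MS = 1_000;
    private final ConcurrentMap<Long, AtomicLong> countsByWindow = new ConcurrentHashMap<>();

    // Called for every event as it arrives ("push query").
    public void onEvent(long eventTimeMs) {
        long windowStart = (eventTimeMs / WINDOW_MS) * WINDOW_MS;
        countsByWindow.computeIfAbsent(windowStart, w -> new AtomicLong())
                      .incrementAndGet();
    }

    // The in-memory intermediate result for the window containing 'timeMs'.
    public long countFor(long timeMs) {
        AtomicLong count = countsByWindow.get((timeMs / WINDOW_MS) * WINDOW_MS);
        return count == null ? 0 : count.get();
    }
}
```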
26. In-Situ Processing
Rather than moving the data to where it will be processed, you process it in situ.
Examples:
- HANA Calculation Engine
- Google Big Query
- Exadata Storage Servers
- Hazelcast EntryProcessor and Distributed Executor Service
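As a concrete example of the last item, a sketch against the Hazelcast 3.x API: an EntryProcessor runs on the member that owns the key, so the value never crosses the network, only the small result does. The map and key names are arbitrary.

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import com.hazelcast.map.AbstractEntryProcessor;

import java.util.Map;

public class InSituIncrement {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IMap<String, Integer> counters = hz.getMap("counters");
        counters.put("page-views", 0);

        // The processor executes on the partition that owns "page-views";
        // only the (null) result crosses the network.
        counters.executeOnKey("page-views", new AbstractEntryProcessor<String, Integer>() {
            @Override
            public Object process(Map.Entry<String, Integer> entry) {
                entry.setValue(entry.getValue() + 1);
                return null;
            }
        });
    }
}
```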
27. Souped-Up Von Neumann Architecture
[Diagram: the classic architecture souped up with multi-processor, multi-core 64-bit CPUs (vector/AES instructions, compression), DRAM with more cache, NUMA, wide/multi-channel access and data locality, PCI flash, SSD (flash and RAM), and memory over the network]
29. The new data management world
Data Grid: Terracotta, Coherence, Gemfire …
30. SAP HANA
Relational | Analytical
• “Appliance”
• Aggressive Intel x86-64 optimisations
• ACID, SQL and MDX
• In-memory SSD and Disk
• Row and Column based Storage
• Fast aggregation on column store
• Single Instance 1TB limit
• Uses compression (estimated 5x reduction)
• Parallel DB: round-robin, hash, or range partitioning of a table with shared storage
• Updates as delta inserts
• Data is fed from source systems in near real-time, real-time or batch
31. Volt DB
Relational | New SQL | Operational | Analytical
• An all in-memory design
• Full SQL and full ACID
• Partitioned per core so that one thread owns its partition, which avoids locking and latching
• Redundancy provided by multiple instances with writes being replicated
• Claims to be 45x faster
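Not VoltDB's code, but a sketch of its partition-per-core idea: every operation for a key is routed to the single thread that owns that key's partition, so the partition's data structure needs no locks or latches. The store and its types are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// One single-threaded executor per partition: all work on a partition is
// serialized onto its owning thread, so its HashMap needs no synchronization.
public class PartitionedStore {
    private final int partitions;
    private final ExecutorService[] owners;
    private final Map<String, String>[] data;

    @SuppressWarnings("unchecked")
    public PartitionedStore(int partitions) {
        this.partitions = partitions;
        this.owners = new ExecutorService[partitions];
        this.data = new Map[partitions];
        for (int i = 0; i < partitions; i++) {
            owners[i] = Executors.newSingleThreadExecutor();
            data[i] = new HashMap<>();   // only ever touched by its owner thread
        }
    }

    public Future<String> put(String key, String value) {
        int p = Math.floorMod(key.hashCode(), partitions);
        Callable<String> task = () -> data[p].put(key, value);  // runs on the owner thread only
        return owners[p].submit(task);
    }
}
```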
32. Oracle Exadata
Relational | Operational | Analytical | Appliance
• Combines Oracle RAC with “Storage Servers”
• Connected within the box with InfiniBand QDR
• Storage Servers use PCI flash (not SSD) for a 22 TB hardware cache
• In-situ computation on the Storage Servers with “Smart Scan”
• Uses “Hybrid Columnar Compression”, a compromise between row and column storage
33. Terracotta BigMemory
Key-Value | Operational | Data Grid
• In-memory
• Key-value with the Ehcache and soon javax.cache APIs
• In-process (L1) and server storage (L2)
• Persistence via log-forward Fast Restart Store: SSD or Disk
• Tiered storage: local on-heap, local off-heap, server on-heap, server off-heap
• Partitions with consistent hashing
• Search with parallel in-situ execution
• Off-heap allows 2 TB uncompressed in each app server Java process and on each server partition
• Compression
• Speed ranging from 1µs to a few ms.
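A minimal sketch of the Ehcache 2.x API referred to above; the heap, off-heap (BigMemory) and Terracotta server tiers would normally be declared in ehcache.xml, which is assumed and omitted here:

```java
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class EhcacheSketch {
    public static void main(String[] args) {
        // Picks up ehcache.xml from the classpath; that file would declare the
        // heap, off-heap and Terracotta tiers for each cache.
        CacheManager cacheManager = CacheManager.newInstance();
        Cache users = cacheManager.getCache("users");   // assumes a cache named "users" is configured

        users.put(new Element("42", "Alice"));
        Element element = users.get("42");
        System.out.println(element == null ? null : element.getObjectValue());

        cacheManager.shutdown();
    }
}
```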
34. Hazelcast
Key-Value | Operational | Data Grid
• In-memory
• Key-value Map API and javax.cache API
• Near cache and server data storage
• Tiered storage: local on-heap, local off-heap, server on-heap, server off-heap
• Partitions with consistent hashing
• Search with parallel in-situ execution
• In-situ processing with Entry Processors and Distributed Executors (see the sketch below)
• Speed ranging from 1µs to a few ms.
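A sketch of the Distributed Executor Service against the Hazelcast 3.x API: the task is serialized and sent to the member that owns a given key, keeping the computation next to the data. The task and key are illustrative.

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IExecutorService;

import java.io.Serializable;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;

public class DistributedExecutorSketch {
    // Tasks must be serializable: they are shipped to the owning member.
    static class LocalComputation implements Callable<Integer>, Serializable {
        @Override
        public Integer call() {
            return 42;   // placeholder for work done against locally owned data
        }
    }

    public static void main(String[] args) throws Exception {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IExecutorService executor = hz.getExecutorService("default");

        // Run the task on whichever member owns the key "customer-1".
        Future<Integer> result = executor.submitToKeyOwner(new LocalComputation(), "customer-1");
        System.out.println(result.get());
    }
}
```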
35. Disk is the new tape
SSD is the new disk
Memory is the new operational store