GPU databases - How to use them and what the future holds

@arnon86 @sqreamtech
GPU DATABASES:
HOW TO USE THEM
AND WHAT THE FUTURE HOLDS
or
GD: HTUT AWTFH
for short

Before we start…
• We offer a free consultation and assessment to anyone here
• We can help you understand the benefits of using a GPU database

Who I am
• From Israel
• 4 years at SQream
• Originally part of the dev team
• Tweets a lot about animals - @arnon86

Who I am
• A big aviation nerd

“Moore’s law is ending”

“The consensus was that if we could keep
doing that, if we could go to chips with
1,000 cores, everything would be fine.”

“It turns out that’s really hard”
Dr. Doug Burger, an expert in chip design at Microsoft.

So we just go parallel, right?

Let’s talk BIG data
Hundreds of TB (sometimes even petabytes) of data,
coming in at a rate of multiple terabytes per day
[Chart: typical data volumes over time. 2008: up to 1-4TB; 2010: up to 10TB; 2016: hundreds of TB]
Data is STILL growing exponentially

We’re in the petabyte age
[Chart: estimated data stored. CERN: 530 PB; NSA: 12,000 PB; Google: 15,000 PB]
• Petabyte datasets are now the norm
• Even small companies have dozens of terabytes of data for analysis
• Some outliers have more:
– CERN processes 1 petabyte per day
and stores 530 PB in total
– In 2012, Facebook analyzed 5 petabytes per day,
and stores an estimated few exabytes
– The NSA might hold 12 exabytes

Are we only analyzing the tip of the iceberg?

What we’ll talk about
• Why GPUs?
• What are GPU databases?
• When are GPU databases good?
• The future

What is a GPU?
• A processor specialized for display functions
• The GPU renders images, animations and video for the computer's screen.

What is a GPGPU?
• A general-purpose GPU (GPGPU) is a GPU that performs non-specialized calculations that
would typically be conducted by the CPU.
• Put simply, it’s about taking the GPU and generalizing it for non-graphics work.
• AMD and NVIDIA each have their own platforms for GPGPU programming: ROCm and CUDA,
respectively.
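To make the GPGPU idea concrete, here is a minimal pure-Python simulation of the CUDA-style programming model, using only the standard library: a "kernel" describes the work of a single thread, and a launcher runs it over a grid of thread blocks. The names `saxpy_kernel` and `launch` are hypothetical illustrations; on real hardware the CUDA runtime executes the threads in parallel, whereas this sketch simply loops over them.

```python
from typing import Callable, List


def saxpy_kernel(thread_id: int, a: float, x: List[float], y: List[float],
                 out: List[float]) -> None:
    """The work of ONE thread: compute a single element of out = a*x + y."""
    if thread_id < len(x):  # guard: the grid may contain more threads than elements
        out[thread_id] = a * x[thread_id] + y[thread_id]


def launch(kernel: Callable[..., None], block_dim: int, n: int, *args) -> None:
    """Hypothetical launcher: round n up to whole blocks of block_dim threads,
    then run the kernel once per global thread index. A GPU would run these
    threads concurrently; this simulation runs them one after another."""
    grid_dim = (n + block_dim - 1) // block_dim  # ceiling division
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            # Same global-index formula CUDA code typically uses:
            # blockIdx.x * blockDim.x + threadIdx.x
            kernel(block_idx * block_dim + thread_idx, *args)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 20.0, 30.0, 40.0, 50.0]
out = [0.0] * len(x)
launch(saxpy_kernel, 4, len(x), 2.0, x, y, out)
print(out)  # [12.0, 24.0, 36.0, 48.0, 60.0]
```

The launch above creates 8 threads for 5 elements; the bounds check inside the kernel is what keeps the extra threads from writing past the end of the array.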

Let’s talk core count

Tesla P100 – 3,584 CUDA cores

It’s not a strange piece of hardware

GPUs all around
• Pretty much all cloud providers now offer GPU instances
• Most hardware vendors offer specially tuned GPU servers

How GPU acceleration works

What are GPU Databases?
• A GPU database is a database, relational or non-relational, that uses a GPU to perform
some database operations
• Most GPU databases focus on analytics, targeting a market that was oversold on Hadoop
for big data analytics
• And they’re typically pretty fast
• They’re not only disrupting the in-memory crowd: GPU databases can be more flexible,
processing many different types of data, or much larger amounts of data

Why GPUs in big data?
• A high core count allows offloading ‘heavy’ operations like JOIN, ORDER BY and GROUP BY from
the CPU to the GPU
• Compression and decompression reduce PCIe and disk I/O, and they are basically
free on the GPU
• The GPU can also run other computationally intensive operations, such as deep learning or
cryptography.
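One common way to spread a GROUP BY across many cores is two-phase aggregation: each worker aggregates its own chunk of rows independently, and the partial results are merged at the end. The sketch below illustrates that pattern in plain Python. The thread pool only stands in for GPU cores (in CPython it shows the structure, not a real speedup), and `parallel_group_by_sum` is a hypothetical name, not SQream DB's implementation.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (group key, value)


def partial_sum(chunk: List[Row]) -> Counter:
    """Phase 1: aggregate one chunk independently (no coordination with other workers)."""
    acc: Counter = Counter()
    for key, value in chunk:
        acc[key] += value
    return acc


def parallel_group_by_sum(rows: List[Row], n_workers: int = 4) -> Dict[str, int]:
    """Equivalent of SELECT key, SUM(value) FROM rows GROUP BY key,
    computed as per-chunk partial sums followed by a merge."""
    chunk_size = max(1, -(-len(rows) // n_workers))  # ceiling division
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    # Threads stand in for GPU cores; in CPython this shows the structure, not a speedup.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # Phase 2: merge the partial aggregates into the final result.
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return dict(total)


rows = [("us", 3), ("il", 5), ("us", 2), ("de", 1), ("il", 1), ("de", 4)]
print(parallel_group_by_sum(rows))  # {'us': 5, 'il': 6, 'de': 5}
```

Because SUM is associative, the chunks can be processed in any order and the merge still yields the same totals. That property is what makes aggregations a good fit for massively parallel hardware.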

Today’s data market - databases
• A lot of new databases are in-memory, because “memory is cheap”
• In-memory can’t handle more than ~2TB without very expensive hardware
• Scaling out with in-memory gets very expensive, very fast:
8 SAP HANA machines handling 40TB have a TCO of $22,000,000 over 4 years
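Normalizing the TCO figure makes it easier to compare with other options. Here is a small worked calculation that uses only the numbers on this slide (8 machines, 40TB, $22,000,000, 4 years):

```python
def cost_per_tb_year(tco_usd: float, capacity_tb: float, years: float) -> float:
    """Total cost of ownership spread over capacity and time."""
    return tco_usd / (capacity_tb * years)


TCO_USD, MACHINES, CAPACITY_TB, YEARS = 22_000_000, 8, 40, 4

print(f"${cost_per_tb_year(TCO_USD, CAPACITY_TB, YEARS):,.0f} per TB per year")  # $137,500 per TB per year
print(f"{CAPACITY_TB / MACHINES:.0f} TB and ${TCO_USD / MACHINES:,.0f} per machine")  # 5 TB and $2,750,000 per machine
```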

There’s more than one type of GPU database
In-memory GPU databases
• Typically for small datasets
• Stores data in-memory
• Very fast performance (milliseconds)
• For relatively simple queries
• Limited due to memory constraints
Big Data GPU databases
• Typically for giant datasets
• Stores data on-disk
• Fast performance (seconds-minutes)
• For complex queries
• Theoretically unlimited datasets
• A good fit for today’s evolving needs
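On-disk GPU databases get their "theoretically unlimited" dataset size from streaming. Data lives on disk, and only a bounded chunk is processed at a time. Below is a minimal sketch of that out-of-core pattern. It assumes a single int64 column stored as raw little-endian bytes, which is a hypothetical format and not any particular product's storage layout. The function names are also illustrative.

```python
import io
import struct
from typing import BinaryIO, Iterator, List, Tuple

# Assumed on-disk layout for this sketch: one column of little-endian int64 values.
RECORD = struct.Struct("<q")


def read_chunks(f: BinaryIO, rows_per_chunk: int) -> Iterator[List[int]]:
    """Yield the column chunk by chunk, so memory use is bounded by the chunk
    size rather than by the size of the whole dataset."""
    chunk_bytes = rows_per_chunk * RECORD.size
    while True:
        buf = f.read(chunk_bytes)
        if not buf:
            break
        yield [value for (value,) in RECORD.iter_unpack(buf)]


def out_of_core_sum(f: BinaryIO, rows_per_chunk: int) -> Tuple[int, int]:
    """Return (SUM, COUNT) over the column, streaming it from storage.
    In a GPU database, each chunk is the unit that would be shipped to GPU
    memory and processed there; here it is simply summed on the CPU."""
    total, count = 0, 0
    for chunk in read_chunks(f, rows_per_chunk):
        total += sum(chunk)
        count += len(chunk)
    return total, count


# In-memory stand-in for a file on disk: the values 1..1000.
data = io.BytesIO(b"".join(RECORD.pack(v) for v in range(1, 1001)))
print(out_of_core_sum(data, rows_per_chunk=64))  # (500500, 1000)
```

The peak memory footprint depends on `rows_per_chunk`, not on the file size. This is the key difference from the in-memory column, where the whole dataset must fit in RAM.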

Don’t BUY hardware, BUY the results
• Your boss (probably) does not care about the chips in the servers
• GPU is a cool buzzword, but buzzwords alone won’t get the job done
• Achieve incredible speeds without betting the (server) farm
• Evaluate databases based on functionality and what they can do for you

Understanding 40m telecom customers with SQream DB
Tracking customer behaviour at a large national mobile telecom operator with Tableau and
SQream DB to improve offering and increase revenue

Understanding 40 million customers with SQream DB
80-node cluster: 5 full racks, 7600 CPU cores
SQream DB v1.9.6: HP Server with NVIDIA Tesla, 96 GB RAM + 6 TB storage
Ingest time: 120 min (80-node cluster) vs. 20 min (SQream DB)
Reporting time: 300 min (80-node cluster) vs. 10 min (SQream DB)
Cost of Ownership: $10,000,000 (80-node cluster) vs. $200,000 (SQream DB)

The cost of performance
ACV calculation on 24 TB of data, 300B rows, 8 different tables, with complex, nested joins
Netezza: 8 full 42U racks, 56 S-Blades, 7 TB RAM
SQream DB v1.9.6: Dell C4130 with 4x NVIDIA Tesla K80, 512 GB RAM + iSCSI JBOD (20TB)

Metric                                  Netezza       SQream DB
Average query time (seconds)            33.70         31.70
Processing units (S-Blades / GPUs)      56            4
Compression ratio                       4.0           4.7
Cost of Ownership ($)                   12,000,000    500,000
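Reading the Netezza vs. SQream DB comparison as ratios makes the slide's point explicit. Average query times are roughly equal, while the cost of ownership differs by more than an order of magnitude. Here is a quick calculation using the slide's figures (33.70 s vs. 31.70 s, $12,000,000 vs. $500,000):

```python
# Figures from the Netezza vs. SQream DB comparison on this slide.
netezza = {"avg_query_s": 33.70, "cost_usd": 12_000_000}
sqream = {"avg_query_s": 31.70, "cost_usd": 500_000}

cost_ratio = netezza["cost_usd"] / sqream["cost_usd"]
query_time_ratio = netezza["avg_query_s"] / sqream["avg_query_s"]

print(f"Cost ratio: {cost_ratio:.1f}x")              # Cost ratio: 24.0x
print(f"Query-time ratio: {query_time_ratio:.2f}x")  # Query-time ratio: 1.06x
```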

Major ad-tech company increased revenue by improving bids
A major ad-tech company deployed an 8-GPU SQream DB instance to unlock more insights from their Hadoop
cluster
Why they chose SQream DB
• TRILLIONS of monthly ad impressions amount to 360TB (raw).
Querying them with Hadoop / Phoenix was too slow.
• Live analytics was unavailable due to Hadoop limitations
• Constructing bidding histograms for dynamic CPM campaigns was extremely time-consuming
in the existing system, with query times of around 5 hours!
Setup: 8x NVIDIA Tesla GPUs, Qumulo NAS (360TB)

Let’s see it in action

Genome Research - Speed & Scale
SQream and Sheba Medical Center cut cancer cure research time from years to weeks
• 200 GB: average size of a single sequenced human genome
• 2 months: time it takes a genome researcher to compare a handful of sequences
• 1 PB: the amount of storage needed by a genome research institute
• 2 hours: time it takes a researcher to compare up to hundreds of sequences with SQream DB
• x100: factor of improvement over existing methods

Chanel says racks are fashionable. Our customers
think otherwise

BE EFFICIENT with your hardware
This configuration can analyze ~40TB of data
SQream DB with Tesla cards

Environmentally friendly
[Diagram: certified servers enabled with GPUs, alongside certified storage]

Let’s talk about the future

Don’t be afraid of the future
• We know new databases are scary
• It’s a risk, but the reward is big
• Innovate all aspects of your data pipeline
[Chart: innovation spectrum from “Incremental” to “Cold Fusion”, with a region labeled “The scary zone”]

How we see the future of GPU databases
• The future is not just GPU databases. Different databases for different needs.
The relational model is still king for most of us
• More data = more processing power needed.
Scalable database solutions that can handle growing data become more relevant
• GPUs will be used for compute-intensive work, such as graph processing, machine learning and AI
• Rising GPU offerings in the public cloud will allow adoption by more companies

How we see the future – hardware/stack
• Improved programming extensions and better compilers in new CUDA/ROCm releases will make it
easier to write good GPU code
• Faster HBM2 memory and PCIe 5.0 will reduce the data-transfer overhead of GPU processing
• More tightly knit hardware integration, like the Intel H-series processor with an integrated GPU
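To see why both faster interconnects and GPU-side compression matter, compare how long it takes to move data to the GPU. This sketch assumes approximate theoretical per-direction bandwidths for a PCIe x16 link: about 15.75 GB/s for PCIe 3.0, 31.5 GB/s for PCIe 4.0, and 63 GB/s for PCIe 5.0. It also assumes an illustrative 4:1 compression ratio and ignores decompression cost. Real-world throughput is lower than these peak figures.

```python
# Approximate theoretical per-direction bandwidth of a PCIe x16 link, in GB/s (assumed peak values).
PCIE_X16_GB_PER_S = {"3.0": 15.75, "4.0": 31.5, "5.0": 63.0}


def transfer_seconds(data_gb: float, generation: str, compression_ratio: float = 1.0) -> float:
    """Time to copy data_gb to the GPU if it crosses the bus compressed
    and is decompressed on the GPU (decompression time is ignored here)."""
    return data_gb / compression_ratio / PCIE_X16_GB_PER_S[generation]


for gen in ("3.0", "5.0"):
    raw = transfer_seconds(1000, gen)
    packed = transfer_seconds(1000, gen, compression_ratio=4.0)
    print(f"PCIe {gen}: 1 TB raw {raw:.1f} s, 4:1 compressed {packed:.1f} s")
# PCIe 3.0: 1 TB raw 63.5 s, 4:1 compressed 15.9 s
# PCIe 5.0: 1 TB raw 15.9 s, 4:1 compressed 4.0 s
```

Under these assumptions, 4:1 compression over PCIe 3.0 moves data about as fast as uncompressed transfers over PCIe 5.0. The two improvements also stack.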

Reminder
• We offer a free consultation and assessment to anyone here
• We can help you understand the benefits of using a GPU database
