@arnon86 @sqreamtech
GPU DATABASES:
HOW TO USE THEM
AND WHAT THE FUTURE HOLDS
or
GD: HTUT AWTFH
for short

Before we start…
• We offer a free consultation and assessment to anyone here
• We can help you understand the benefits of using a GPU database

Who I am
• From Israel
• 4 years at SQream
• Originally part of the dev team
• Tweet about animals a lot - @arnon86
• A big aviation nerd

“Moore’s law is ending”

“The consensus was that if we could keep doing that, if we could go to chips with 1,000 cores, everything would be fine,”
“It turns out that’s really hard”
Dr. Doug Burger, an expert in chip design at Microsoft

So we just take things parallel, right?

Let’s talk BIG data
Hundreds of TB (sometimes even petabytes of data), coming in at a rate of multiple terabytes per day
[Timeline with marks at 2008, 2010 and 2016, labelled “Up to 1-4TB” and “Up to 10TB”]
Data is STILL growing exponentially

We’re in the petabyte age
[Chart: data stored. CERN: 530 PB, NSA: 12,000 PB, Google: 15,000 PB]
• Petabyte datasets are now the norm
• Even small companies have dozens of terabytes of data for analysis
• Some outliers have more:
– CERN processes 1 petabyte per day and stores 530 PB in total
– In 2012, Facebook analyzed 5 petabytes per day, and it stores an estimated few exabytes
– The NSA might hold 12 exabytes

Are we only analyzing the tip of the iceberg?

What we’ll talk about
• Why GPUs?
• What are GPU databases?
• When are GPU databases good?
• The future

What is a GPU?
• A processor specialized for display functions
• The GPU renders images, animations and video for the computer's screen.

What is a GPGPU?
• A general-purpose GPU (GPGPU) is a GPU that performs non-specialized calculations that would typically be conducted by the CPU.
• Put simply, it’s about taking the GPU and generalizing it for non-graphics work.
• AMD and NVIDIA have their own platforms for GPGPU programming: ROCm and CUDA, respectively.

Let’s talk core count
Tesla P100: 3,584 CUDA cores

It’s not a strange piece of hardware

GPUs all around
• Pretty much all cloud providers now offer GPU instances
• Most hardware vendors offer specially tuned GPU servers

How GPU acceleration works

What are GPU databases?
• A GPU database is a database, relational or non-relational, that uses a GPU to perform some database operations
• Most GPU databases focus on analytics, and they’re offering it to a market that was oversold on Hadoop for Big Data analytics
• They’re typically pretty fast, and they’re not only disrupting the in-memory crowd
• GPU databases are more flexible in processing many different types of data, or much larger amounts of data

Why GPUs in big data?
• The high core count allows offloading ‘heavy’ operations like JOIN, ORDER BY and GROUP BY from the CPU to the GPU
• Compression and decompression reduce PCIe and disk I/O, and they are basically free on the GPU
• The GPU can also handle computationally intensive work like deep learning and cryptography
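The first two points can be sketched on a CPU. This is only an illustration under stated assumptions: threads stand in for GPU cores, `zlib` stands in for GPU-side compression, and the names `make_chunks`, `partial_group_sum` and `group_sum` are hypothetical, not part of any GPU database's API. It shows a GROUP BY ... SUM computed as independent per-chunk aggregates that are merged at the end, over chunks that travel in compressed form.

```python
import zlib
from array import array
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def make_chunks(keys, values, chunk_size):
    """Split two parallel columns into compressed chunks (hypothetical layout)."""
    chunks = []
    for start in range(0, len(keys), chunk_size):
        k = array("i", keys[start:start + chunk_size]).tobytes()
        v = array("q", values[start:start + chunk_size]).tobytes()
        chunks.append((zlib.compress(k), zlib.compress(v)))
    return chunks


def partial_group_sum(chunk):
    """Decompress one chunk and aggregate it independently of all other chunks."""
    k = array("i")
    k.frombytes(zlib.decompress(chunk[0]))
    v = array("q")
    v.frombytes(zlib.decompress(chunk[1]))
    sums = Counter()
    for key, value in zip(k, v):
        sums[key] += value
    return sums


def group_sum(chunks, workers=4):
    """SELECT key, SUM(value) GROUP BY key: partial aggregates merged at the end."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(partial_group_sum, chunks):
            total.update(partial)  # Counter.update adds the partial sums
    return dict(total)


# Made-up demo data: 9,000 rows, 3 distinct keys, every value equal to 1.
keys = [i % 3 for i in range(9000)]
values = [1] * 9000
chunks = make_chunks(keys, values, chunk_size=1000)

raw_bytes = len(keys) * array("i").itemsize + len(values) * array("q").itemsize
compressed_bytes = sum(len(k) + len(v) for k, v in chunks)
result = group_sum(chunks)
```

Because each chunk is aggregated without looking at any other chunk, the work spreads across as many cores as are available, and only the compressed bytes need to cross the disk and bus.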

Today’s data market: databases
• A lot of new databases are in-memory, because “memory is cheap”
• In-memory can’t handle more than ~2 TB without very expensive hardware
• Scaling out with in-memory gets very expensive, very fast: 8 SAP HANA machines handling 40 TB have a TCO of $22,000,000 over 4 years
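To put the SAP HANA figure in perspective, the arithmetic below breaks the quoted TCO down per terabyte, per year and per machine. It uses only the numbers on this slide; the variable names are illustrative.

```python
# Figures quoted on the slide.
tco_usd = 22_000_000   # total cost of ownership
data_tb = 40           # data handled
years = 4              # TCO period
machines = 8           # SAP HANA machines

cost_per_tb = tco_usd / data_tb              # 550,000 USD per TB
cost_per_tb_year = cost_per_tb / years       # 137,500 USD per TB per year
cost_per_machine = tco_usd / machines        # 2,750,000 USD per machine
tb_per_machine = data_tb / machines          # 5 TB per machine

print(f"${cost_per_tb:,.0f} per TB over {years} years")
print(f"${cost_per_tb_year:,.0f} per TB per year")
print(f"${cost_per_machine:,.0f} per machine, {tb_per_machine:.0f} TB each")
```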

There’s more than one type of GPU database

In-memory GPU databases
• Typically for small datasets
• Store data in memory
• Very fast performance (milliseconds)
• For relatively simple queries
• Limited by memory constraints

Big Data GPU databases
• Typically for giant datasets
• Store data on disk
• Fast performance (seconds to minutes)
• For complex queries
• Theoretically unlimited dataset size
• A good fit for today’s evolving needs

Don’t BUY hardware, BUY the results
• Your boss (probably) does not care about the chips in the servers
• GPU is a cool buzzword, but buzzwords alone won’t get the job done
• Achieve incredible speeds without betting the (server) farm
• Evaluate databases based on functionality and what they can do for you

Understanding 40 million telecom customers with SQream DB
Tracking customer behaviour at a large national mobile telecom operator with Tableau and SQream DB, to improve the offering and increase revenue

Compared systems:
• 80 nodes (5 full racks), 7,600 CPU cores
• SQream DB v1.9.6 on an HP server with NVIDIA Tesla, 96 GB RAM + 6 TB storage

Cost of ownership: $10,000,000 (80-node system) vs. $200,000 (SQream DB)
[Chart: ingest time and reporting time for both systems; plotted values: 120 m, 300 m, 20 m, 10 m]

The cost of performance
ACV calculation on 24 TB of data, 300B rows, 8 different tables, with complex, nested joins

Metric                          Netezza        SQream DB v1.9.6
Average query time (seconds)    33.70          31.70
Processing units                56 S-Blades    4 GPUs
Compression ratio               4.0            4.7
Cost of ownership ($)           12,000,000     500,000

Hardware:
• Netezza: 8 full 42U racks, 56 S-Blades, 7 TB RAM
• SQream DB v1.9.6: Dell C4130 with 4x NVIDIA Tesla K80, 512 GB RAM + iSCSI JBOD (20 TB)
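The figures on this slide can be compared directly. The sketch below assumes the first set of numbers belongs to Netezza and the second to SQream DB, which matches the 56 S-Blades and 4 GPUs in the hardware descriptions. The dictionary keys are illustrative names.

```python
# Figures as read from the slide (assumed mapping: first set Netezza, second SQream DB).
netezza = {"query_seconds": 33.70, "processing_units": 56, "cost_usd": 12_000_000}
sqream = {"query_seconds": 31.70, "processing_units": 4, "cost_usd": 500_000}

cost_ratio = netezza["cost_usd"] / sqream["cost_usd"]
unit_ratio = netezza["processing_units"] / sqream["processing_units"]
time_ratio = netezza["query_seconds"] / sqream["query_seconds"]

print(f"Cost of ownership: {cost_ratio:.0f}x lower")
print(f"Processing units:  {unit_ratio:.0f}x fewer")
print(f"Query time:        {time_ratio:.2f}x faster")
```

On these numbers the query times are nearly the same, so the difference between the two systems is almost entirely in hardware footprint and cost.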

Major ad-tech increased revenues by improving bids
A major ad-tech company deployed an 8-GPU SQream DB instance to unlock more insights from its Hadoop cluster
Setup: 8x NVIDIA Tesla GPUs, Qumulo NAS (360 TB)

Why they chose SQream DB
• TRILLIONS of monthly ad impressions equate to 360 TB (raw). This was too slow with Hadoop / Phoenix.
• Live analytics was unavailable due to Hadoop limitations
• Constructing bidding histograms for dynamic CPM campaigns was extremely time-consuming in the current system, with query times around 5 hours!

Let’s see it in action

Genome Research - Speed & Scale
SQream and Sheba Medical Center cut cancer cure research time from years to weeks
• 200 GB: average size of a single sequenced human genome
• 2 months: time it takes a genome researcher to compare a handful of sequences
• 1 PB: the amount of storage needed by a genome research institute
• 2 hours: time it takes a researcher to compare up to hundreds of sequences with SQream DB
• x100: factor of improvement over existing methods

Chanel says racks are fashionable. Our customers think otherwise

BE EFFICIENT with your hardware
This configuration can analyze ~40 TB of data
SQream DB with Tesla cards

Environmentally friendly
[Diagram: eight GPUs; certified servers; certified storage]

Let’s talk about the future

Don’t be afraid of the future
• We know new databases are scary
• It’s a risk, but the reward is big
• Innovate all aspects of your data pipeline
[Diagram: a scale from “Incremental” to “Cold Fusion”, with “the scary zone” marked]

How we see the future of GPU databases
• The future is not just GPU databases: different databases for different needs. The relational model is still king for most of us
• More data = more processing power needed. Scalable database solutions that can handle growing data become more relevant
• GPUs will be used for compute-intensive work, e.g. graph processing, machine learning, AI
• Rising GPU offerings in the public cloud will allow adoption by more companies

How we see the future: hardware/stack
• Improved programming extensions and better compilers in new CUDA/ROCm releases will make it easier to write good GPU code
• Faster HBM2 memory and PCIe 5.0 will reduce the overhead of GPU processing
• More tightly knit hardware integration, like the Intel H-series processors with an integrated GPU

Reminder
• We offer a free consultation and assessment to anyone here
• We can help you understand the benefits of using a GPU database
