4. What is Big Data?
COMMON SENSE FROM WIKIPEDIA
“Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, analysis and visualization.”
4 | REAL-TIME INSIGHT IN BIG DATA | November 19, 2013 | CONFIDENTIAL
5. WHAT BIG DATA IS NOT
A COMMON MISTAKE
Big Data is NOT Storage of large datasets
6. REAL-TIME IN BIG DATA IS A TWO-DIMENSIONAL PROBLEM
Continuous extremely fast data load and availability
Sub-second response times
7. ANALYTICS LANDSCAPE
BIG DATA ANALYTICS REQUIRES NEW TECHNOLOGICAL SOLUTIONS
[Figure: analytics landscape chart. Region labels: Operational Data, Big Data, Stream-Analytics, Real-Time Analytics, Batch-Analytics. Technology labels: Complex Event Processing, Operations Analytics, Massively parallel (MPP), OLAP, OLTP, Reporting, In-Memory DB, Map Reduce Batches (NoSQL), ● ParStream. Axis names: Lag Time, Response time. Scale labels: Real-Time, 1 sec, 10 sec, 1 min, 10 min, 1h; < 1..10 milli sec, 10..100 milli sec; Gigabyte, Terabyte, Petabyte.]
8. PARSTREAM IS A UNIQUE PRODUCT
PARSTREAM EMPOWERS CUSTOMERS TO REALIZE NEW BUSINESS OPPORTUNITIES EVOLVING WITH BIG DATA
• Analyze and Filter Billions of Records
• Query Data Structures with 1000’s of columns
• Get Answers in Milliseconds without Cubes
• Execute 1000’s of Concurrent Queries
[Figure labels: Column Store, High Performance Index, In-Memory Technology, High-Speed Import, Clustering, Scalability, Real-time Queries]
10. ARCHITECTURE BUILDING BLOCKS
PARSTREAM IS THE BIG DATA ANALYTICS PLATFORM BASED ON A UNIQUE HIGH PERFORMANCE COMPRESSED INDEX
• Columnar Storage
• In Memory Technology
• Shared Nothing Architecture
• Standard Interfaces
• User Defined Functions
• Unique High Performance Compressed Index
[Figure labels: SQL/JDBC/ODBC, C++ UDF API, Real-Time Analytics Engine, In-Memory & Disc Technology, MPP, Compressed Index, Shared Nothing, Fast Columnar Storage, Partitioning]
11. PARALLEL ARCHITECTURE
PARSTREAM OVERCOMES LIMITATIONS OF TRADITIONAL DW ARCHITECTURES
• STANDARD DW ARCHITECTURE
‒ Long Query Runtime
‒ Frequent Full Table Scans
‒ Data is at Least 1 Day Old
• PARSTREAM ARCHITECTURE
‒ Each Query Uses Multiple Processor Cores
‒ Query execution using compressed indices
‒ Continuous Import Assures Timeliness of Data
[Figure labels, standard DW: Query, Nightly Batch-Import. Figure labels, ParStream: Query, HPCI, Parallel Import]
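The contrast between full table scans and index-based query execution can be illustrated with a plain bitmap index. This is a minimal sketch of the general technique only: it is uncompressed, and it makes no claim about how ParStream's HPCI is built. The column names, sample values, and helper functions (`build_bitmap_index`, `rows_of`) are assumptions made up for illustration.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return dict(index)


def rows_of(bitmap):
    """Return the row numbers whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Two hypothetical columns of the same six-row table, stored column-wise.
country = ["DE", "US", "DE", "FR", "US", "DE"]
device = ["phone", "phone", "tablet", "phone", "tablet", "phone"]

country_idx = build_bitmap_index(country)
device_idx = build_bitmap_index(device)

# WHERE country = 'DE' AND device = 'phone' becomes one bitwise AND,
# without touching the rows themselves.
hits = country_idx["DE"] & device_idx["phone"]
matching_rows = rows_of(hits)
match_count = bin(hits).count("1")
```

The filter is answered from the index alone; the table rows are only needed afterwards if other columns of the matching rows have to be fetched.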
12. TRADITIONAL DATABASE QUERY EXECUTION
STATIC QUERY EXECUTION
[Figure labels, in processing order: SQL-Statement, Parser, Parsed-Statement, Optimizer/Planner, ExecutionPlan, Executor]
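The static pipeline named on this slide (parser, then optimizer/planner, then executor) can be sketched as three functions called strictly one after another. This is a toy, assuming a single hard-coded statement shape; the function names, the plan format, and the sample table are hypothetical and do not describe any real database.

```python
import re


def parse(sql):
    """Parser: turn one fixed statement shape into a parsed statement (a dict)."""
    pattern = r"SELECT SUM\((\w+)\) FROM (\w+) WHERE (\w+) > (\d+)"
    m = re.fullmatch(pattern, sql.strip())
    if m is None:
        raise ValueError("unsupported statement")
    agg_col, table, filter_col, bound = m.groups()
    return {"table": table, "agg_col": agg_col,
            "filter_col": filter_col, "bound": int(bound)}


def plan(parsed):
    """Planner: fix the whole execution plan before any data is read."""
    return [
        ("scan", parsed["table"]),
        ("filter", parsed["filter_col"], parsed["bound"]),
        ("sum", parsed["agg_col"]),
    ]


def execute(steps, tables):
    """Executor: run the plan step by step, exactly as planned."""
    rows = []
    for step in steps:
        if step[0] == "scan":
            rows = tables[step[1]]
        elif step[0] == "filter":
            rows = [r for r in rows if r[step[1]] > step[2]]
        elif step[0] == "sum":
            return sum(r[step[1]] for r in rows)
    return rows


tables = {"t": [{"a": 1, "b": 10}, {"a": 5, "b": 20}, {"a": 7, "b": 30}]}
result = execute(plan(parse("SELECT SUM(b) FROM t WHERE a > 3")), tables)
```

The plan is complete before execution starts and never changes while it runs, which is what "static" refers to here.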
13. MODULAR EXECUTION TREE
ATOMIC OPERATIONS COMBINED USING QUEUES
• Parsed query descriptions are transformed into execution trees
• Optimizer distributes execution operations to available hardware
• Data-locality and current load are used for allocation
• During query execution optimizer can re-allocate if beneficial
• Optimizer continuously refines allocation based on past queries
• Flow based execution control
• Each ExecNode processes blocks of data
• Data transfer between nodes using queues
[Figure: Execution Tree. Node labels: sort, aggregate, aggregation (4x), filter (4x), calc (4x), fetch (4x)]
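The last three bullets (flow based control, nodes that process blocks, queues between nodes) can be sketched with one thread per operator and a FIFO queue on each edge. This is a minimal illustration of one branch of such a tree, not ParStream's implementation; `exec_node`, `run_pipeline`, the sentinel convention, and the three sample operators are assumptions.

```python
import queue
import threading

SENTINEL = None  # marks the end of the block stream on a queue


def exec_node(op, inbox, outbox):
    """One node: take a block from inbox, apply op, put the result on outbox."""
    while True:
        block = inbox.get()
        if block is SENTINEL:
            outbox.put(SENTINEL)  # pass the end marker downstream
            return
        outbox.put(op(block))


def run_pipeline(blocks, ops):
    """Chain the operators with queues and collect what leaves the last one."""
    queues = [queue.Queue() for _ in range(len(ops) + 1)]
    threads = [
        threading.Thread(target=exec_node, args=(op, queues[i], queues[i + 1]))
        for i, op in enumerate(ops)
    ]
    for t in threads:
        t.start()
    for block in blocks:  # stands in for the "fetch" step
        queues[0].put(block)
    queues[0].put(SENTINEL)
    results = []
    while True:
        out = queues[-1].get()
        if out is SENTINEL:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


def calc(block):
    return [x * 2 for x in block]


def keep_large(block):
    return [x for x in block if x > 4]


def aggregation(block):
    return sum(block)


partials = run_pipeline([[1, 2, 3], [4, 5, 6]], [calc, keep_large, aggregation])
total = sum(partials)  # final "aggregate" over the per-block partial sums
```

Because every node only sees its input and output queue, a node can start on the next block while downstream nodes are still busy with the previous one.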
15. ARCHITECTURE ALLOWS USAGE OF DIFFERENT PROCESSING UNITS
ANY PART OF THE QUERY MAY BE EXECUTED INDIVIDUALLY
• Each atomic operation may be processed using any available compute resource
• Dynamic workload assignment during query execution
• Overall workload management ensures optimal resource usage
[Figure: Execution Tree. Node labels: sort, aggregate, aggregation (4x), filter (4x), calc (4x), fetch (4x)]
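Dynamic workload assignment can be sketched as a shared task queue from which every processing unit takes the next operation as soon as it is free. The sketch below uses threads as stand-ins for processing units; the unit names ("cpu-0", "cpu-1", "gpu-0") and the functions `worker` and `run_dynamic` are hypothetical, and no real GPU is involved.

```python
import queue
import threading


def worker(name, tasks, results, lock):
    """Take operations off the shared queue until none are left."""
    while True:
        try:
            task_id, op, block = tasks.get_nowait()
        except queue.Empty:
            return
        value = op(block)
        with lock:
            results[task_id] = (name, value)  # remember which unit ran the task


def run_dynamic(task_list, unit_names):
    """Run every (op, block) task on whichever unit becomes free first."""
    tasks = queue.Queue()
    for task_id, (op, block) in enumerate(task_list):
        tasks.put((task_id, op, block))
    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(name, tasks, results, lock))
        for name in unit_names
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


units = ["cpu-0", "cpu-1", "gpu-0"]
task_list = [(sum, [1, 2, 3]), (max, [4, 9, 2]), (len, [7, 7])]
results = run_dynamic(task_list, units)
values = [results[i][1] for i in range(len(task_list))]
```

Which unit ends up running which task is not fixed in advance and can differ between runs; only the computed values are deterministic.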
16. PROBLEMS USING TRADITIONAL GPU COMPUTE UNITS
THE TRANSFER AND COMMUNICATION PROBLEM
• Target scenario Real-Time BIG DATA
‒ Processing huge amounts of data
‒ Dynamically changing data
‒ Interactive response time
• Part of the data fixed in GPU memory
‒ Input data transferred once via PCI during loading
‒ Transfer of result via PCI during execution
• Data resident in main memory
‒ Offload of computational task to GPU
‒ Transfer in and out via PCI during execution
• Global data needs to be transferred to GPU too
• Global data needs to be synchronized
• Latency based on blockwise processing
• Different programming models
[Figure labels: two operator chains, each with aggregation, filter, calc, fetch]
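The cost of "transfer in and out via PCI during execution" can be made concrete with a back-of-the-envelope model: offloading only pays off when the GPU compute time plus both transfers is shorter than computing on the CPU. The function names and every number below (data size, link bandwidth, compute times) are hypothetical values chosen for illustration, not measurements.

```python
def offload_time(bytes_in, bytes_out, link_bytes_per_s, gpu_compute_s):
    """Total time for an offloaded task: copy in, compute, copy the result out."""
    transfer_s = (bytes_in + bytes_out) / link_bytes_per_s
    return transfer_s + gpu_compute_s


def offload_pays_off(bytes_in, bytes_out, link_bytes_per_s,
                     cpu_compute_s, gpu_compute_s):
    """True when the offloaded path, transfers included, beats the CPU."""
    gpu_total = offload_time(bytes_in, bytes_out, link_bytes_per_s, gpu_compute_s)
    return gpu_total < cpu_compute_s


# Assumed scenario: 1 GB of input, a negligible result, an 8 GB/s link.
gpu_total = offload_time(1e9, 0, 8e9, gpu_compute_s=0.05)

# The GPU computes four times faster than a 0.2 s CPU run and still wins...
worth_it = offload_pays_off(1e9, 0, 8e9, cpu_compute_s=0.2, gpu_compute_s=0.05)

# ...but against a 0.15 s CPU run the transfer alone eats the advantage.
not_worth_it = offload_pays_off(1e9, 0, 8e9, cpu_compute_s=0.15, gpu_compute_s=0.05)
```

In this model the transfer term does not shrink when the GPU gets faster, so for data that changes continuously it sets a floor on response time.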
17. HSA SOLVES ALL OUR PROBLEMS
• No Data transfer required
• Shared page table support
• Coherent memory regions
• User-level command queueing
• Hardware scheduling
• Bold allows uniform programming model