Building your big data solution

Learn with WSO2 - Building your Big Data Solution

Srinath Perera
Director of Research
WSO2 Inc.

About WSO2
•  Providing the only complete open source componentized cloud platform
–  Dedicated to removing all the stumbling blocks to enterprise agility
–  Enabling you to focus on business logic and business value
•  Recognized by leading analyst firms as visionaries and leaders
–  Gartner cites WSO2 as visionaries in all 3 categories of application infrastructure
–  Forrester places WSO2 in top 2 for API Management
•  Global corporation with offices in USA, UK & Sri Lanka
–  200+ employees and growing
•  Business model of selling comprehensive support & maintenance for our products

150+ globally positioned support customers

Consider a day in your life

•  What is the best road to take?
•  Would there be any bad weather?
•  What is the best way to invest the money?
•  Should I take that loan?
•  Can I optimize my day?
•  Is there a way to do this faster?
•  What have others done in similar cases?
•  Which product should I buy?

People have wanted (through the ages)

•  To know (what happened?)
•  To explain (why it happened?)
•  To predict (what will happen?)

What is Big Data?

•  There is a lot of data available
–  E.g. Internet of Things
•  We have the computing power
•  We have the technology
•  The goal is the same
–  To know
–  To explain
–  To predict
•  The challenge is the full lifecycle

Data Avalanche / Moore's law of data

•  We are now collecting and converting large amounts of data to digital forms
•  90% of the data in the world today was created within the past two years.
•  The amount of data we have doubles very fast
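
As a rough sanity check on the doubling claim, the 90% figure can be turned into an implied doubling time. This is only a sketch: it assumes steady exponential growth, which the slide does not state, and the function name is illustrative.

```python
import math

def doubling_time_years(recent_fraction: float, window_years: float) -> float:
    """Doubling time implied when `recent_fraction` of all data was created
    in the last `window_years`, assuming steady exponential growth."""
    # If 90% is new, the older 10% was the whole total `window_years` ago,
    # so the total grew by a factor of 1 / (1 - recent_fraction) in that time.
    growth_factor = 1.0 / (1.0 - recent_fraction)
    return window_years * math.log(2) / math.log(growth_factor)

months = 12 * doubling_time_years(0.90, 2.0)
print(f"Implied doubling time: about {months:.1f} months")
```

Under that assumption the total grows tenfold every two years, which works out to a doubling time of a little over seven months.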

In real life, most data are Big

•  The Web sees millions of activities per second, and so a huge volume of server logs is created.
•  Social networks, e.g. Facebook: 800 million active users, 40 billion photos from its user base.
•  There are >4 billion phones and >25% are smartphones. There are billions of RFID tags.
•  Observational and sensor data
–  Weather radars, balloons
–  Environmental sensors
–  Telescopes
–  Complex physics simulations

Why is Big Data hard?

•  How to store? Assuming each computer holds 1 TB, it takes 1000 computers to store 1 PB
•  How to move? Assuming a 1 Gb/s network, it takes about 2.2 hours to copy 1 TB, or about 93 days to copy 1 PB
•  How to search? Assuming each record is 1 KB and one machine can process 1000 records per second, it needs about 278 CPU hours (roughly 11.6 CPU days) to process 1 TB and about 32 CPU years to process 1 PB
•  How to process?
–  How to convert algorithms to work at large scale
–  How to create new algorithms

http://www.susanica.com/photo/9
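
The back-of-envelope estimates on this slide can be reproduced with a few lines of arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes), a fully utilised link, and a single machine scanning records one after another; all function names are illustrative. Copy times of about two hours per terabyte correspond to roughly 1 Gb/s of sustained throughput, and a 10 Gb/s link would be ten times faster.

```python
TB = 10**12   # bytes, decimal units (assumption)
PB = 10**15
HOUR, DAY, YEAR = 3600, 86400, 365 * 86400

def machines_needed(total_bytes, bytes_per_machine=TB):
    """Number of machines needed if each one stores `bytes_per_machine`."""
    return -(-total_bytes // bytes_per_machine)   # ceiling division

def copy_seconds(total_bytes, link_bits_per_sec):
    """Time to push the data through a fully utilised network link."""
    return total_bytes * 8 / link_bits_per_sec

def scan_seconds(total_bytes, record_bytes=1000, records_per_sec=1000):
    """Time for one machine to touch every record once."""
    return total_bytes / record_bytes / records_per_sec

print("machines for 1 PB:", machines_needed(PB))
print("copy 1 TB at 1 Gb/s: %.1f hours" % (copy_seconds(TB, 1e9) / HOUR))
print("copy 1 PB at 1 Gb/s: %.0f days" % (copy_seconds(PB, 1e9) / DAY))
print("scan 1 TB: %.0f CPU hours" % (scan_seconds(TB) / HOUR))
print("scan 1 PB: %.1f CPU years" % (scan_seconds(PB) / YEAR))
```

Whatever the exact assumptions, the point stands: a single machine or a single link is not enough at petabyte scale.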

Why is it hard (Contd.)?

•  A system built of many computers
•  That handles lots of data
•  Running complex logic
•  This pushes us to the frontier of Distributed Systems and Databases
•  More data does not mean there is a simple model
•  Some models can be as complex as the system

http://www.flickr.com/photos/mariachily/5250487136, Licensed CC

WSO2 Offerings

•  Two tools
–  WSO2 BAM to store and process
–  WSO2 CEP for real-time processing
•  These tools cover the whole processing life cycle for your Big Data, with the help of a few other products as needed.
–  WSO2 Storage Server
–  WSO2 User Experience Server

Big Data Architecture Implementation

Sensors

•  Built-in sensors in WSO2 products
•  Event logs
–  Click streams, emails, chat, search, tweets, transactions …
•  Custom sensors
–  Video surveillance, cash flows, traffic, surveillance, smart grid, production line, RFID (e.g. Walmart), GPS sensors, mobile phones, Internet of Things

http://www.flickr.com/photos/imuttoo/4257813689/ by Ian Muttoo, http://www.flickr.com/photos/eastcapital/4554220770/, http://www.flickr.com/photos/patdavid/4619331472/ by Pat David, copyright CC

Collecting Data

•  Data is collected at sensors and sent to the big data system via events or flat files
•  Event streams: we name each event stream by its content/originator
•  Get data through
–  Point to point
–  Event bus
•  E.g. Data Bridge
–  A Thrift-based transport we built that handles about 400k events/sec
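
To make the idea of named event streams and an event bus concrete, here is a minimal in-memory sketch. It is not the WSO2 Data Bridge API: the `Event` and `EventBus` classes, the stream names, and the payload fields are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
import time

@dataclass
class Event:
    stream: str      # stream name, chosen by content/originator
    payload: dict
    timestamp: float = field(default_factory=time.time)

class EventBus:
    """Delivers each published event to the handlers subscribed to its stream."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, stream, handler):
        self._subscribers[stream].append(handler)

    def publish(self, event):
        handlers = self._subscribers.get(event.stream, [])
        for handler in handlers:
            handler(event)
        return len(handlers)   # number of handlers that received the event

bus = EventBus()
received = []
bus.subscribe("sales.lead_activity", received.append)
bus.publish(Event("sales.lead_activity", {"action": "closedWon", "value": 1200}))
bus.publish(Event("web.clicks", {"url": "/pricing"}))   # no subscriber: dropped
```

With point-to-point delivery the sensor would call the receiver directly; the bus decouples the two, so new consumers can be added without touching the sensors.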

Storing Data

•  Historically we used databases
–  Scale is a challenge: replication, sharding
•  Scalable options
–  NoSQL (Cassandra, HBase) [if data is structured]; column families are gaining ground
–  Distributed file systems (e.g. HDFS) [if data is unstructured]
•  NewSQL
–  In-memory computing, VoltDB
•  Specialized data structures
–  Graph databases, data structure servers

http://www.flickr.com/photos/keso/363133967/

Storing Data (Contd.)

•  WSO2 offerings (WSO2 Storage Server)
–  Small structured data: keep in relational databases
–  Large structured data: Cassandra
–  Large unstructured data: HDFS

Making Sense of Data

•  To know (what happened?)
–  Basic analytics + visualizations (min, max, average, histogram, distributions …)
–  Interactive drill down
•  To explain (why)
–  Data mining, classification, building models, clustering
•  To forecast
–  Neural networks, decision models

Making Sense of Data (Contd.)

•  Batch processing: WSO2 BAM
–  Hive scripts
–  MapReduce jobs
•  Real-time processing: CEP
–  Event Query Language
•  The two above are the platform; you still need to program your use case.

To know (what happened?)

•  Mainly analytics
–  Min, max, average, correlation, histograms
–  Might join or group data in many ways
•  Implemented with MapReduce or queries
•  Data is often presented with some visualizations
•  Examples
–  Forensics
–  Assessments
–  Historical data / reports / trends

http://www.flickr.com/photos/isriya/2967310333/
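
As a small illustration of these basic analytics, the sketch below computes min, max, average and a fixed-width histogram over a list of numbers. The sample deal sizes and the bucket width are made up for the example.

```python
from collections import Counter
from statistics import mean

def summarize(values, bucket_width):
    """Return min, max, average and a histogram keyed by bucket lower bound."""
    histogram = Counter((v // bucket_width) * bucket_width for v in values)
    return {
        "min": min(values),
        "max": max(values),
        "average": mean(values),
        "histogram": dict(sorted(histogram.items())),
    }

deal_sizes = [1200, 800, 15000, 4300, 950, 7200]   # hypothetical data
summary = summarize(deal_sizes, bucket_width=5000)
print(summary)
```

At scale the same aggregates would be computed with a Hive query or a MapReduce job rather than in a single process.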

To Explain (Patterns)

•  Correlation
–  Scatter plot, statistical correlation
•  Data mining (detecting patterns)
–  Clustering and classification
–  Finding similar items
–  Finding hubs and authorities in a graph
–  Finding frequent item sets
–  Making recommendations
•  Apache Mahout

http://www.flickr.com/photos/eriwst/2987739376/ and http://www.flickr.com/photos/focx/5035444779/
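
For the statistical correlation item, a minimal sketch of the Pearson correlation coefficient is shown here. The two sample series (deal size and days to close) are invented for illustration, and this is plain Python rather than anything from Apache Mahout.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(var_x * var_y)

deal_size = [1000, 2000, 4000, 8000]      # hypothetical data
days_to_close = [10, 14, 25, 40]          # hypothetical data
print(f"correlation: {pearson(deal_size, days_to_close):.2f}")
```

A value near +1 or -1 suggests a strong linear relationship; it says nothing on its own about which variable drives the other.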

To Predict: Forecasts and Models

•  Trying to build a model for the data
•  Theoretically or empirically
–  Analytical models (e.g. physics)
–  Neural networks
–  Reinforcement learning
–  Unsupervised learning (clustering, dimensionality reduction, kernel methods)
•  Examples
–  Translation
–  Weather forecast models
–  Building profiles of users
–  Traffic models
–  Economic models
•  A lot of domain-specific work

http://misterbijou.blogspot.com/2010_09_01_archive.html

Information Visualization

•  Presenting information
–  To end users
–  To decision makers
–  To scientists
•  Interactive exploration
•  Sending alerts
•  WSO2 UES
–  Jaggery based
•  BAM/CEP can work with most other UI tools

http://www.flickr.com/photos/stevefaeembra/3604686097/

WSO2 UES

•  Dashboards and Store
•  Build your own UIs with Jaggery

MapReduce/Hadoop

•  First introduced by Google, and used as the processing model for their architecture
•  Implemented by open source projects like Apache Hadoop and Spark
•  Users write two functions: map and reduce
•  The framework handles details like distributed processing, fault tolerance, load balancing, etc.
•  Widely used, and one of the catalysts of Big Data

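// Word count: map emits (token, 1) for every token in the input value
void map(ctx, k, v){
    tokens = v.split();
    for t in tokens
        ctx.emit(t, 1);
}
// reduce sums the counts emitted for one key
void reduce(ctx, k, values[]){
    count = 0;
    for v in values
        count = count + v;
    ctx.emit(k, count);
}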
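
The word-count pseudocode can be tried out with a tiny single-process simulation of the map, shuffle and reduce phases. This is only a sketch of the programming model: `run_mapreduce` is a hypothetical helper, not the Hadoop API, and it does none of the distribution or fault tolerance a real framework provides.

```python
from collections import defaultdict

def map_fn(key, value):
    """Emit (token, 1) for every whitespace-separated token in the line."""
    for token in value.split():
        yield token, 1

def reduce_fn(key, values):
    """Sum all counts emitted for one token."""
    yield key, sum(values)

def run_mapreduce(records, map_fn, reduce_fn):
    # Map phase plus shuffle: group every emitted value by its key.
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    # Reduce phase: one call per distinct key.
    results = {}
    for key in sorted(groups):
        for out_key, out_value in reduce_fn(key, groups[key]):
            results[out_key] = out_value
    return results

lines = [(0, "big data is big"), (1, "data on the move")]
counts = run_mapreduce(lines, map_fn, reduce_fn)
print(counts)
```

The user-written part is just `map_fn` and `reduce_fn`; everything inside `run_mapreduce` is what the framework would take care of.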

Data on the Move

•  The idea is to process data as it is received, in a streaming fashion
•  Used when we need
–  Very fast output
–  Lots of events (a few 100k to millions)
–  Processing without storing (e.g. too much data)
•  Two main technologies
–  Stream processing (e.g. Storm, http://storm-project.net/)
–  Complex Event Processing (CEP) http://wso2.com/products/complex-event-processor/
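
Processing without storing means each event updates a small amount of state and is then discarded. A minimal sketch of that idea is a running mean and variance using Welford's online algorithm; the class name is illustrative and it is not tied to Storm or WSO2 CEP.

```python
class RunningStats:
    """Count, mean and population variance updated one value at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0   # sum of squared deviations from the current mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / self.count if self.count else 0.0

stats = RunningStats()
for reading in [2, 4, 4, 4, 5, 5, 7, 9]:   # stand-in for an unbounded stream
    stats.update(reading)
print(stats.count, stats.mean, stats.variance)
```

Memory use stays constant no matter how many events arrive, which is what makes this style workable at hundreds of thousands of events per second.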

Complex Event Processing (CEP)

•  Sees inputs as event streams, which are queried with an SQL-like language
•  Supports filters, windows, joins, patterns and sequences

from p=PINChangeEvents#win.time(3600) join
     t=TransactionEvents[p.custid=custid][amount>10000]#win.time(3600)
return t.custid, t.amount;
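
To show what such a windowed join does, here is a simplified Python sketch that flags a transaction over 10000 when the same customer changed their PIN within the previous hour. It is a one-directional simplification of the query (it only matches a PIN change followed by a transaction), it assumes events arrive in timestamp order with timestamps in seconds, and the class and method names are hypothetical.

```python
from collections import deque

WINDOW_SECONDS = 3600      # same window length as the query
AMOUNT_THRESHOLD = 10000   # same amount filter as the query

class PinChangeThenLargeTransaction:
    def __init__(self):
        self._pin_changes = deque()   # (timestamp, custid), oldest first

    def _expire(self, now):
        # Drop PIN changes that have fallen out of the time window.
        while self._pin_changes and now - self._pin_changes[0][0] > WINDOW_SECONDS:
            self._pin_changes.popleft()

    def on_pin_change(self, ts, custid):
        self._expire(ts)
        self._pin_changes.append((ts, custid))

    def on_transaction(self, ts, custid, amount):
        """Return (custid, amount) on a match, otherwise None."""
        self._expire(ts)
        recent_change = any(c == custid for _, c in self._pin_changes)
        if amount > AMOUNT_THRESHOLD and recent_change:
            return (custid, amount)
        return None

detector = PinChangeThenLargeTransaction()
detector.on_pin_change(0, "c1")
print(detector.on_transaction(1800, "c1", 20000))   # matches
print(detector.on_transaction(4000, "c1", 20000))   # PIN change has expired
```

A CEP engine expresses the same logic declaratively and manages the window state for you.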

Case Study 1: Tracing Business Process

•  A business process is built using many services
•  Track and trace each step, and analyze to understand how to optimize
•  E.g. sales pipeline

Some Queries

•  Conversion rate?
•  How many deals are in the pipeline each month?
•  Average size of the deals?
•  Average time a deal takes?
•  Can we identify large deals early?
•  Which is better: going for a few large ones or many small ones?
•  Were there any delays on our side?

Hive: Average Size of the Deal

•  Hive uses an SQL-like syntax.
•  Easy to understand and learn

hive> LOAD DATA ..
hive> SELECT month, avg(value) FROM LEAD_ACTIVITY
      WHERE action = 'closedWon' GROUP BY month;
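
The same aggregation can be tried locally without a Hadoop cluster. The sketch below uses Python's built-in SQLite as a stand-in for Hive, so it only illustrates the query shape; the column types and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LEAD_ACTIVITY (month INTEGER, action TEXT, value REAL)")
conn.executemany(
    "INSERT INTO LEAD_ACTIVITY VALUES (?, ?, ?)",
    [
        (1, "closedWon", 1000.0),
        (1, "closedWon", 3000.0),
        (1, "closedLost", 9000.0),   # filtered out by the WHERE clause
        (2, "closedWon", 500.0),
    ],
)

# Average value of won deals, one row per month.
rows = conn.execute(
    "SELECT month, avg(value) FROM LEAD_ACTIVITY "
    "WHERE action = 'closedWon' GROUP BY month ORDER BY month"
).fetchall()
print(rows)
```

In Hive the query is compiled into MapReduce jobs behind the scenes, which is why a short SQL-like statement is enough for this kind of question.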

MapReduce: How many deals in the pipeline?

How many deals in the pipeline? (Contd.)

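// map emits (month, 1) for every deal record
void map(ctx, k, v){
    Deals deal = parse(v);
    int month = getMonth(deal.time);
    ctx.emit(month, 1);
}
// reduce sums the ones to get the number of deals in that month
void reduce(ctx, k, values[]){
    count = 0;
    for v in values
        count = count + v;
    ctx.emit(k, count);
}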
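
A runnable sketch of the same job in Python follows. The record layout ("deal id, date") is an assumption, since the slides do not show what `parse` reads, and the grouping step stands in for the shuffle a real framework would perform.

```python
from collections import defaultdict
from datetime import datetime

def map_deal(line):
    """Emit (month, 1) for one deal record of the assumed form 'id, YYYY-MM-DD'."""
    _deal_id, deal_date = line.split(",")
    yield datetime.strptime(deal_date.strip(), "%Y-%m-%d").month, 1

def reduce_count(month, ones):
    """Sum the ones emitted for a month."""
    return month, sum(ones)

def deals_per_month(lines):
    shuffled = defaultdict(list)          # month -> [1, 1, ...]
    for line in lines:
        for month, one in map_deal(line):
            shuffled[month].append(one)
    return dict(reduce_count(m, ones) for m, ones in sorted(shuffled.items()))

records = ["d1, 2013-01-15", "d2, 2013-01-28", "d3, 2013-02-03"]   # made-up data
print(deals_per_month(records))
```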

Case Study 2: DEBS Challenge

•  Event processing challenge
•  Real football game, sensors in player shoes + ball
•  Events at 15 kHz
•  Event format
–  Sensor ID, TS, x, y, z, v, a
•  Queries
–  Running stats
–  Ball possession
–  Heat map of activity
–  Shots at goal

Example: Detect Ball Possession

•  Possession is the time from when a player hits the ball until someone else hits it or it goes out of the ground

from Ball#window.length(1) as b join
     Players#window.length(1) as p
     unidirectional
     on debs:getDistance(b.x, b.y, b.z, p.x, p.y, p.z) < 1000
     and b.a > 55
select ...
insert into hitStream

from old = hitStream,
     b = hitStream[old.pid != pid],
     n = hitStream[b.pid == pid]*,
     (e1 = hitStream[b.pid != pid]
      or e2 = ballLeavingHitStream)
select ...
insert into BallPossessionStream

http://www.flickr.com/photos/glennharper/146164820/
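
The two queries can be read as two steps: detect hits, then turn the hit stream into possession intervals. The Python sketch below mirrors that logic under stated assumptions: events are dictionaries or tuples in timestamp order, the distance and acceleration thresholds are copied from the query, and the event shapes and function names are hypothetical.

```python
import math

DISTANCE_LIMIT = 1000   # threshold from the query; units follow the sensor data
ACCEL_LIMIT = 55        # threshold from the query

def is_hit(ball, player):
    """A hit: the ball is close to the player's sensor and accelerating sharply."""
    distance = math.dist(
        (ball["x"], ball["y"], ball["z"]),
        (player["x"], player["y"], player["z"]),
    )
    return distance < DISTANCE_LIMIT and ball["a"] > ACCEL_LIMIT

def possession_intervals(events):
    """events: time-ordered (ts, kind, pid), kind is "hit" or "out" (pid None).
    Returns a list of (pid, start_ts, end_ts) for every closed possession."""
    intervals = []
    holder, start = None, None
    for ts, kind, pid in events:
        if kind == "hit" and pid == holder:
            continue   # same player touches again: possession continues
        if holder is not None:
            intervals.append((holder, start, ts))
        holder, start = (pid, ts) if kind == "hit" else (None, None)
    return intervals

events = [(0, "hit", "A"), (5, "hit", "A"), (9, "hit", "B"),
          (12, "out", None), (20, "hit", "C")]
print(possession_intervals(events))
```

The last possession (player C) stays open until another player hits the ball or it leaves the ground, so it is not reported yet.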

Conclusions

•  What is Big Data?
•  Big Data architecture
–  Collecting data
–  Storing data
–  Processing data
•  WSO2 offerings
•  Case studies

Engage with WSO2
•  Helping you get the most out of your deployments
•  From project evaluation and inception to development
and going into production, WSO2 is your partner in
ensuring 100% project success
