How do you keep up with the velocity and variety of data streaming in, and get analytics on it even before it is persisted and replicated in Hadoop? In this talk, we'll look at common architectural patterns used today at companies such as Expedia, Groupon and Zynga that take advantage of Splunk to provide real-time collection, indexing and analysis of machine-generated big data, with reliable event delivery to Hadoop. We'll also describe how to use Splunk's advanced search language to access data stored in Hadoop and rapidly analyze, report on and visualize the results.
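For a taste of the workflow the talk describes, here is a minimal, hedged sketch of running a Splunk search from Python with the splunk-sdk package; the host, credentials and index are hypothetical placeholders, not details from the talk.

```python
# Run a one-shot Splunk search from Python (splunk-sdk package).
# Host, credentials and the "web" index are hypothetical placeholders.
import splunklib.client as client
import splunklib.results as results

service = client.connect(host="localhost", port=8089,
                         username="admin", password="changeme")

# Blocking one-shot search; iterate over the returned events.
stream = service.jobs.oneshot("search index=web status=500 | stats count by host")
for result in results.ResultsReader(stream):
    print(result)
```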
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at... - HSA Foundation
HSA is a new computing platform architecture being standardized by the HSA Foundation, whose founding members are AMD, ARM, Imagination, TI, MediaTek, Samsung and Qualcomm. HSA is intended to make heterogeneous programming widespread by making purpose-built architectures as easy to program as modern CPUs. We start by doing this with the GPU, the most widely deployed companion processor to the CPU and one that especially complements the CPU in low-power and performance workloads. This requires some hardware architecture changes that we have been working on for some time (in particular those that enable user-mode scheduling, a unified address space, unified shared memory, compute context switching, etc.) and which we have encapsulated in the spec currently under review by the HSA Foundation.
In short, HSA codifies the hardware architecture changes needed to let mainstream programmers develop heterogeneous applications with the same facility as CPU-only applications, by seamlessly integrating the sequential programming capability of the CPU with the parallel compute capability of the GPU. We describe the software stacks needed for HSA and the benefits that accrue to both developers and end users, and present our vision of how HSA will help unify the ecosystems of the smartphone and tablet platforms and bring them closer to that of the traditional PC market. We will analyze several examples that arise in applications and present data to validate the performance-per-watt benefit of HSA.
Lotusphere Comes to You 2008 - Desktop of the Future - Ed Brill
A strategy-level presentation covering desktop computing, now and in the future. It examines alternative approaches, including smartphones, user segmentation, and alternatives to the competition.
Extracting value from Big Data is not easy. The field of technologies and vendors is fragmented and rapidly evolving. End-to-end, general-purpose solutions that work out of the box don’t exist yet, and Hadoop is no exception. And most companies lack Big Data specialists. The key to unlocking real value lies in mapping business requirements smartly against the emerging and imperfect ecosystem of technology and vendor choices.
There is a long list of crucial questions to think about. How fast is the data flying at you? Are your Big Data analyses tightly integrated with existing systems, or parallel and complex? Can you tolerate a minute of latency? Do you accept data loss or lenient SLAs? Is imperfect security good enough?
The answer to Big Data ROI lies somewhere between the herd and nerd mentality. Thinking hard and being smart about each use case as early as possible avoids costly mistakes.
This talk will illustrate how Deutsche Telekom follows this segmentation approach to make sure every individual use case drives architecture design and technology selection.
EvoApp Bermuda (patent pending) is a highly scalable, cloud-native, in-memory analytic engine capable of analyzing large amounts of data extremely fast. Bermuda provides cost-effective, real-time, Big Data analysis and insight for both unstructured and structured data, enabling a wide range of business applications. Bermuda is capable of performing sub-second queries over billions of items, leveraging virtual machines and a cloud-scale storage system providing transactional, persistent storage of data.
In addition to world-leading performance on the data sets for which it is optimized, the other major benefit of Bermuda is that a user does not have to define specific queries ahead of time, as is required with traditional business intelligence systems or a platform like Hadoop. Bermuda was built to support real-time, ad-hoc queries over large datasets. With Bermuda, a user can change queries on the fly, adjusting charts and reports and seeing results immediately. This expands the options for analytics on big data, more closely resembling a web search than traditional business intelligence reports.
Bermuda can achieve such exceptionally fast query response times because data is organized in a proprietary, patent-pending architecture that facilitates scan-intensive queries. These make up the bulk of business intelligence analytics computations (e.g. time series, computing averages or sums, and grouping by day, hour, etc. over large datasets); by optimizing Bermuda for this type of query, the engine is able to allocate workload across hundreds or even thousands of servers, easily accommodating terabytes of information. Additionally, all queries are non-blocking with respect to writes of new information or updates to existing data.
The Bermuda architecture is unique because it combines the scalability of NoSQL databases, the performance of pure in-memory processing, and the cost/benefit advantages of a cloud-native deployment. It creates value by allowing EvoApp customers to make decisions and gain insights from massive quantities of data in an iterative, real-time environment. This represents a huge advance in the state of the art of unstructured data analytics and delivers on the promise of real-time/ad-hoc queries at scale.
This talk gives an overview of:
– Device virtualization on ARM
– Benefits and real products
– Android-specific virtualization considerations
– Approaches to implementing virtualization
Mindtree is one of the first IT service providers to invest in emerging technologies and has developed various technology assets. Customers of our product engineering services benefit heavily from our domain expertise.
Some of the technology assets developed include short-range wireless connectivity technologies such as Bluetooth and UWB, video analytics algorithms, acoustic echo cancellation, audio codecs, VoIP stacks, etc.
Overcoming the Top Four Challenges to Real-Time Performance in Large-Scale, D... - SL Corporation
The most critical large-scale applications today, regardless of industry, involve a demand for real-time data transfer and visualization of potentially large volumes of data. With this demand come numerous challenges and limiting factors, especially if these applications are deployed in virtual or cloud environments. In this session, SL’s CEO, Tom Lubinski, explains how to overcome the top four challenges to real-time application performance: database performance, network data transfer bandwidth limitations, processor performance, and lack of real-time predictability. Solutions discussed will include design of the proper data model for the application data, along with design patterns that facilitate optimal and minimal data transfer across networks.
Introduction: This workshop will provide a hands-on introduction to Machine Learning (ML) with an overview of Deep Learning (DL).
Format: An introductory lecture on several supervised and unsupervised ML techniques, followed by a light introduction to DL and a short discussion of the current state of the art. Several Python code samples using the scikit-learn library will be introduced, which attendees will be able to run in the Cloudera Data Science Workbench (CDSW).
Objective: To provide a quick, hands-on introduction to ML with Python’s scikit-learn library. The CDSW environment is interactive, and the step-by-step guide will walk you through setting up your environment, exploring datasets, and training and evaluating models on popular datasets. By the end of the crash course, attendees will have a high-level understanding of popular ML algorithms and the current state of DL, know what problems they can solve, and walk away with basic hands-on experience training and evaluating ML models.
Prerequisites: For the hands-on portion, registrants must bring a laptop with a Chrome or Firefox web browser. The labs are done in the cloud; no installation is needed. Everyone will be able to register and start using CDSW after the introductory lecture concludes (about one hour in). Basic knowledge of Python is highly recommended.
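As a flavor of the hands-on portion, here is a minimal sketch in the spirit of the workshop's scikit-learn labs; the dataset and model choices are illustrative, not the workshop's actual materials.

```python
# Train and evaluate a simple supervised model on a popular dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```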
Floating on a RAFT: HBase Durability with Apache Ratis - DataWorks Summit
In a world with a myriad of distributed storage systems to choose from, the majority of Apache HBase clusters still rely on Apache HDFS. Theoretically, any distributed file system could be used by HBase. One major reason HDFS predominates is the specific durability requirements of HBase's write-ahead log (WAL), which HDFS guarantees correctly. However, with sufficient effort, HBase's use of HDFS for WALs can be replaced.
This talk will cover the design of a "Log Service" that can be embedded inside HBase and provides the level of durability that HBase requires for its WALs. Apache Ratis (incubating) is a Java library implementation of the Raft consensus protocol and is used to build this Log Service. We will cover the design choices of the Ratis Log Service, comparing and contrasting it with other log-based systems that exist today. Next, we'll cover how the Log Service fits into HBase and the necessary changes to HBase that enable this. Finally, we'll discuss how the Log Service can simplify the operational burden of HBase.
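To make the durability requirement concrete, here is a toy Python illustration of the WAL contract the talk centers on (a record is acknowledged only once it is durable on disk); this is purely illustrative and is not HBase's or Ratis's implementation.

```python
# Toy write-ahead log: a record is acknowledged only after it has been
# flushed and fsync'd, the durability contract HBase requires of its WAL.
import os

class ToyWAL:
    def __init__(self, path):
        self.f = open(path, "ab")

    def append(self, record: bytes) -> None:
        # Length-prefix the record, then force it to stable storage.
        self.f.write(len(record).to_bytes(4, "big") + record)
        self.f.flush()
        os.fsync(self.f.fileno())  # durable before we acknowledge

wal = ToyWAL("/tmp/toy.wal")
wal.append(b"put row1 cf:col=value")
```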
Similar to Experiences Streaming Analytics at Petabyte Scale
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi - DataWorks Summit
Utilizing Apache NiFi, we read various open-data REST APIs and camera feeds to ingest crime and related data in real time, streaming it into HBase and Phoenix tables. HBase makes an excellent storage option for our real-time time-series data sources. We can immediately query our data utilizing Apache Zeppelin against Phoenix tables, as well as Hive external tables mapped to HBase.
Apache Phoenix tables are also a great option since we can easily put microservices on top of them for application usage. I have an example Spring Boot application that reads from our Philadelphia crime table to serve front-end web applications as well as RESTful APIs.
Apache NiFi makes it easy to push records with schemas to HBase and insert into Phoenix SQL tables.
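For illustration, here is a minimal sketch of querying a Phoenix table from Python through the Phoenix Query Server with the phoenixdb package; the URL, table and column names are hypothetical stand-ins for the crime tables described above.

```python
# Query a Phoenix table via the Phoenix Query Server (phoenixdb package).
# URL, table and column names are hypothetical placeholders.
import phoenixdb

conn = phoenixdb.connect("http://localhost:8765/", autocommit=True)
cursor = conn.cursor()
cursor.execute(
    "SELECT dc_dist, COUNT(*) AS incidents "
    "FROM philadelphia_crime GROUP BY dc_dist")
for district, incidents in cursor.fetchall():
    print(district, incidents)
```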
Resources:
https://community.hortonworks.com/articles/54947/reading-opendata-json-and-storing-into-phoenix-tab.html
https://community.hortonworks.com/articles/56642/creating-a-spring-boot-java-8-microservice-to-read.html
https://community.hortonworks.com/articles/64122/incrementally-streaming-rdbms-data-to-your-hadoop.html
HBase Tales From the Trenches - Short stories about most common HBase operati... - DataWorks Summit
Whilst HBase is the most logical answer for use cases requiring random, real-time read/write access to Big Data, it may not be trivial to design applications that make the most of it, nor the simplest to operate. Because it depends on and integrates with other components from the Hadoop ecosystem (ZooKeeper, HDFS, Spark, Hive, etc.) or external systems (Kerberos, LDAP), and its distributed nature requires "Swiss clockwork" infrastructure, many variables must be considered when observing anomalies or even outages. Adding to the equation, HBase is still an evolving product, with different release versions in current use, some of which carry genuine software bugs. In this presentation, we'll go through the most common HBase issues faced by different organisations, describing the identified causes and resolution actions from my last five years supporting HBase for our heterogeneous customer base.
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac... - DataWorks Summit
LocationTech GeoMesa enables spatial and spatiotemporal indexing and queries for HBase and Accumulo. In this talk, after an overview of GeoMesa’s capabilities in the Cloudera ecosystem, we will dive into how GeoMesa leverages Accumulo’s Iterator interface and HBase’s Filter and Coprocessor interfaces. The goal will be to discuss both what spatial operations can be pushed down into the distributed database and also how the GeoMesa codebase is organized to allow for consistent use across the two database systems.
OCLC has been using HBase since 2012 to enable single-search-box access to over a billion items from your library and the world’s library collection. This talk will provide an overview of how HBase is structured to provide this information, some of the challenges encountered in scaling to support the world catalog, and how they have been overcome.
Many individuals and organizations want to utilize NoSQL technology but often lack an understanding of how the underlying functional bits can be utilized to enable their use case. This situation can result in drastic increases in the desire to put the SQL back in NoSQL.
Since the initial commit, Apache Accumulo has provided a number of examples to help jumpstart comprehension of how some of these bits function, as well as to help tease out an understanding of how they might be applied to a NoSQL-friendly use case. One very relatable example demonstrates how Accumulo could be used to emulate a filesystem (dirlist).
In this session we will walk through the dirlist implementation. Attendees should come away with an understanding of the supporting table designs, a simple text search supporting a single wildcard (on file/directory names), and how the dirlist elements work together to accomplish its feature set. Attendees should (hopefully) also come away with a justification for sometimes keeping the SQL out of NoSQL.
HBase Global Indexing to support large-scale data ingestion at Uber - DataWorks Summit
Data serves as the platform for decision-making at Uber. To facilitate data-driven decisions, many datasets at Uber are ingested into a Hadoop data lake and exposed to querying via Hive. Analytical queries joining various datasets are run to better understand business data at Uber.
Data ingestion, at its most basic form, is about organizing data to balance efficient reading and writing of newer data. Organizing data for efficient reading involves factoring in query patterns to partition data so that read amplification stays low. Organizing data for efficient writing involves factoring in the nature of the input data, whether it is append-only or updatable.
At Uber we ingest terabytes of data into many critical tables, such as trips, that are updatable. These tables are a fundamental part of Uber's data-driven solutions and act as the source of truth for all analytical use cases across the entire company. Datasets such as trips constantly receive updates to the data apart from inserts. To ingest such datasets we need a critical component that is responsible for bookkeeping the data layout and annotates each incoming change with the location in HDFS where the data should be written. This component is called Global Indexing. Without it, all records are treated as inserts and re-written to HDFS instead of being updated, leading to duplication of data and breaking data correctness and user queries. This component is key to scaling our jobs, which now handle greater than 500 billion writes a day in our current ingestion systems, and it needs strong consistency and high throughput for index writes and reads.
At Uber, we have chosen HBase as the backing store for the Global Indexing component, which is critical in allowing us to scale our jobs to more than 500 billion writes a day in our current ingestion systems. In this talk, we will discuss data@Uber and expound on why we built the global index using Apache HBase and how it helps scale out our cluster usage. We’ll give details on why we chose HBase over other storage systems; how and why we came up with a creative solution to load HFiles directly to the backend, circumventing the normal write path when bootstrapping our ingestion tables to avoid QPS constraints; as well as other learnings from bringing this system into production at the scale of data that Uber encounters daily.
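As a purely illustrative sketch of the lookup-before-write idea behind a global index (not Uber's implementation), here is how the decision could look in Python with the happybase HBase client; all table, column and path names are hypothetical.

```python
# Consult an HBase-backed index to classify each incoming change as an
# insert or an update and route it to the right HDFS location.
import happybase

connection = happybase.Connection("hbase-host")   # hypothetical host
index = connection.table("global_index")          # hypothetical table

def annotate(record_key: bytes, change: dict) -> dict:
    row = index.row(record_key)
    if row:   # seen before: update in place at the recorded location
        change["op"] = "update"
        change["hdfs_location"] = row[b"loc:path"].decode()
    else:     # first sighting: insert at a new location and record it
        change["op"] = "insert"
        change["hdfs_location"] = "/data/trips/new_partition"  # hypothetical
        index.put(record_key, {b"loc:path": change["hdfs_location"].encode()})
    return change

print(annotate(b"trip-123", {}))
```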
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix - DataWorks Summit
Recently, Apache Phoenix has been integrated with the Apache Omid (incubating) transaction processing service to provide ultra-high system throughput with ultra-low latency overhead. Phoenix has been shown to scale beyond 0.5M transactions per second with sub-5ms latency for short transactions on industry-standard hardware. On the other hand, Omid has been extended to support secondary indexes, multi-snapshot SQL queries, and massive-write transactions.
These innovative features make Phoenix an excellent choice for translytics applications, which allow converged transaction processing and analytics. We share the story of building the next-gen data tier for advertising platforms at Verizon Media that exploits Phoenix and Omid to support multi-feed real-time ingestion and AI pipelines in one place, and discuss the lessons learned.
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi - DataWorks Summit
Cybersecurity requires an organization to collect data, analyze it, and alert on cyber anomalies in near real time. This is a challenging endeavor given the variety of data sources that need to be collected and analyzed. Everything from application logs, network events, authentication systems, IoT devices, business events, cloud service logs, and more needs to be taken into consideration. In addition, multiple data formats need to be transformed and conformed to be understood by both humans and ML/AI algorithms.
To solve this problem, the Aetna Global Security team developed the Unified Data Platform based on Apache NiFi, which allows them to remain agile and adapt to new security threats and the onboarding of new technologies in the Aetna environment. The platform currently has over 60 different data flows with 95% doing real-time ETL and handles over 20 billion events per day. In this session learn from Aetna’s experience building an edge to AI high-speed data pipeline with Apache NiFi.
In the healthcare sector, data security, governance, and quality are crucial for maintaining patient privacy and ensuring the highest standards of care. At Florida Blue, the leading health insurer of Florida serving over five million members, there is a multifaceted network of care providers, business users, sales agents, and other divisions relying on the same datasets to derive critical information for multiple applications across the enterprise. However, maintaining consistent data governance and security for protected health information and other extended data attributes has always been a complex challenge that did not easily accommodate the wide range of needs for Florida Blue’s many business units. Using Apache Ranger, we developed a federated Identity & Access Management (IAM) approach that allows each tenant to have their own IAM mechanism. All user groups and roles are propagated across the federation in order to determine users’ data entitlement and access authorization; this applies to all stages of the system, from the broadest tenant levels down to specific data rows and columns. We also enabled audit attributes to ensure data quality by documenting data sources, reasons for data collection, date and time of data collection, and more. In this discussion, we will outline our implementation approach, review the results, and highlight our “lessons learned.”
Presto: Optimizing Performance of SQL-on-Anything Engine - DataWorks Summit
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Airbnb, Bloomberg, Comcast, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, in the last few years Presto experienced an unprecedented growth in popularity in both on-premises and cloud deployments over Object Stores, HDFS, NoSQL and RDBMS data stores.
With the ever-growing list of connectors to new data sources such as Azure Blob Storage, Elasticsearch, Netflix Iceberg, Apache Kudu, and Apache Pulsar, the recently introduced Cost-Based Optimizer in Presto must account for heterogeneous inputs with differing and often incomplete data statistics. This talk will explore this topic in detail, as well as discuss the best use cases for Presto across several industries. In addition, we will present recent Presto advancements such as geospatial analytics at scale and the project roadmap going forward.
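To illustrate the SQL-on-anything idea, here is a minimal sketch of a federated query from Python using the presto-python-client package; the host, catalogs and table names are hypothetical.

```python
# One Presto query joining tables from two different connectors.
# Host, user, catalogs and tables are hypothetical placeholders.
import prestodb

conn = prestodb.dbapi.connect(
    host="presto-coordinator", port=8080, user="analyst",
    catalog="hive", schema="default")
cursor = conn.cursor()
cursor.execute("""
    SELECT o.order_id, c.name
    FROM hive.sales.orders o
    JOIN mysql.crm.customers c ON o.customer_id = c.id
    LIMIT 10
""")
print(cursor.fetchall())
```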
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl... - DataWorks Summit
Specialized tools for machine learning development and model governance are becoming essential. MLflow is an open source platform for managing the machine learning lifecycle. Just by adding a few lines of code to the function or script that trains their model, data scientists can log parameters, metrics, artifacts (plots, miscellaneous files, etc.) and a deployable packaging of the ML model. Every time that function or script is run, the results are logged automatically as a byproduct of those added lines of code, even if the person doing the training run makes no special effort to record the results. MLflow application programming interfaces (APIs) are available for the Python, R and Java programming languages, and MLflow sports a language-agnostic REST API as well. Over a relatively short time period, MLflow has garnered more than 3,300 stars on GitHub, almost 500,000 monthly downloads and 80 contributors from more than 40 companies. Most significantly, more than 200 companies are now using MLflow. We will demo the MLflow Tracking, Project and Model components with Azure Machine Learning (AML) Services and show you how easy it is to get started with MLflow on-prem or in the cloud.
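As a minimal sketch of the "few lines of code" workflow the abstract describes, assuming only a local MLflow installation:

```python
# Log parameters, a metric and a deployable model for one training run.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

X, y = load_diabetes(return_X_y=True)

with mlflow.start_run():
    alpha = 0.5
    model = Ridge(alpha=alpha).fit(X, y)
    mlflow.log_param("alpha", alpha)
    mlflow.log_metric("mse", mean_squared_error(y, model.predict(X)))
    mlflow.sklearn.log_model(model, "model")  # deployable packaging
```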
Extending Twitter's Data Platform to Google Cloud - DataWorks Summit
Twitter's data platform is built using multiple complex open source and in-house projects to support data analytics on hundreds of petabytes of data. Our platform supports storage, compute, data ingestion, discovery and management, and various tools and libraries to help users with both batch and real-time analytics. Our data platform operates on multiple clusters across different data centers to help thousands of users discover valuable insights. As we scaled our data platform to multiple clusters, we also evaluated various cloud vendors to support use cases outside of our data centers. In this talk we share our architecture and how we extend our data platform to use the cloud as another data center. We walk through our evaluation process and the challenges we faced supporting data analytics at Twitter scale in the cloud, and present our current solution. Extending Twitter's data platform to the cloud was a complex task, which we deep-dive into in this presentation.
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi - DataWorks Summit
At Comcast, our team has been architecting a customer experience platform which is able to react to near-real-time events and interactions and deliver appropriate and timely communications to customers. By combining the low latency capabilities of Apache Flink and the dataflow capabilities of Apache NiFi we are able to process events at high volume to trigger, enrich, filter, and act/communicate to enhance customer experiences. Apache Flink and Apache NiFi complement each other with their strengths in event streaming and correlation, state management, command-and-control, parallelism, development methodology, and interoperability with surrounding technologies. We will trace our journey from starting with Apache NiFi over three years ago and our more recent introduction of Apache Flink into our platform stack to handle more complex scenarios. In this presentation we will compare and contrast which business and technical use cases are best suited to which platform and explore different ways to integrate the two platforms into a single solution.
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger - DataWorks Summit
Companies are increasingly moving to the cloud to store and process data. One challenge companies face is securing data across hybrid environments with an easy way to centrally manage policies. In this session, we will talk through how companies can use Apache Ranger to protect access to data both in on-premise and in cloud environments. We will go into the details of the challenges of hybrid environments and how Ranger can solve them. We will also talk through how companies can further enhance security by leveraging Ranger to anonymize or tokenize data while moving it into the cloud and de-anonymize it dynamically using Apache Hive, Apache Spark, or when accessing data from cloud storage systems. We will also deep-dive into Ranger's integration with AWS S3, AWS Redshift and other cloud-native systems. We will wrap up with an end-to-end demo showing how policies can be created in Ranger and used to manage access to data in different systems, anonymize or de-anonymize data, and track where data is flowing.
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory... - DataWorks Summit
Advanced Big Data processing frameworks have been proposed to harness the fast data transmission capability of Remote Direct Memory Access (RDMA) over high-speed networks such as InfiniBand, RoCEv1, RoCEv2, iWARP, and OmniPath. However, with the introduction of Non-Volatile Memory (NVM) and NVM Express (NVMe) based SSDs, these designs, along with the default Big Data processing models, need to be re-assessed to discover the possibilities of further enhanced performance. In this talk, we will present NRCIO, a high-performance communication runtime for non-volatile memory over modern network interconnects that can be leveraged by existing Big Data processing middleware. We will show the performance of non-volatile memory-aware RDMA communication protocols using our proposed runtime and demonstrate its benefits by incorporating it into a high-performance in-memory key-value store, Apache Hadoop, Tez, Spark, and TensorFlow. Evaluation results illustrate that NRCIO can achieve up to 3.65x performance improvement for representative Big Data processing workloads on modern data centers.
Background: Some early applications of Computer Vision in Retail arose from e-commerce use cases - but increasingly, it is being used in physical stores in a variety of new and exciting ways, such as:
● Optimizing merchandising execution, in-stocks and sell-thru
● Enhancing operational efficiencies and enabling real-time customer engagement
● Enhancing loss prevention capabilities and response time
● Creating frictionless experiences for shoppers
Abstract: This talk will cover the use of Computer Vision in Retail and its implications for the broader Consumer Goods industry, and share the business drivers, use cases and benefits that are unfolding as an integral component in the remaking of an age-old industry.
We will also take a ‘peek under the hood’ of Computer Vision and Deep Learning, sharing technology design principles and skill set profiles to consider before starting your CV journey.
Deep learning has matured considerably in the past few years to produce human or superhuman abilities in a variety of computer vision paradigms. We will discuss ways to recognize these paradigms in retail settings, and how to collect and organize data to create actionable outcomes with the new insights and applications that deep learning enables.
We will cover the basics of object detection, then move into the advanced processing of images, describing possible ways that a retail store of the near future could operate: identifying various storefront situations with a deep learning system attached to a camera stream, such as item stock levels on shelves, a shelf in need of organization, or a wandering customer in need of assistance.
We will also cover how to use a computer vision system to automatically track customer purchases to enable a streamlined checkout process, and how deep learning can power plausible wardrobe suggestions based on what a customer is currently wearing or purchasing.
Finally, we will cover the various technologies powering these applications today: deep learning tools for research and development, production tools to distribute that intelligence to an entire inventory of cameras situated around a retail location, and tools for exploring and understanding the new data streams produced by the computer vision systems.
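As a minimal sketch of the object-detection building block discussed above, using a pretrained torchvision model; the random tensor stands in for a real camera frame.

```python
# Score one camera frame with a pretrained COCO object detector.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(pretrained=True).eval()

frame = torch.rand(3, 480, 640)        # placeholder for a real frame
with torch.no_grad():
    detections = model([frame])[0]     # dict of boxes, labels, scores

for box, label, score in zip(detections["boxes"], detections["labels"],
                             detections["scores"]):
    if score > 0.8:                    # keep confident detections only
        print(label.item(), score.item(), box.tolist())
```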
By the end of this talk, attendees should understand the impact Computer Vision and Deep Learning are having in the Consumer Goods industry, key use cases, techniques and key considerations leaders are exploring and implementing today.
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark - DataWorks Summit
Whole genome shotgun based next-generation transcriptomics and metagenomics studies often generate 100 to 1000 gigabytes (GB) of sequence data derived from tens of thousands of different genes or microbial species. De novo assembly of these data requires an ideal solution that both scales with data size and optimizes for individual genes or genomes. Here we developed an Apache Spark-based scalable sequence clustering application, SparkReadClust (SpaRC), that partitions reads based on their molecule of origin to enable downstream assembly optimization. SpaRC produces high clustering performance on transcriptomics and metagenomics test datasets from both short-read and long-read sequencing technologies. It achieves near-linear scalability with respect to input data size and number of compute nodes. SpaRC can run on different cloud computing environments without modification while delivering similar performance. In summary, our results suggest SpaRC provides a scalable solution for clustering billions of reads from next-generation sequencing experiments, and Apache Spark represents a cost-effective solution with rapid development/deployment cycles for similar big data genomics problems.
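SpaRC's own code is not shown here, but a toy PySpark sketch of the underlying idea, grouping reads that share k-mers as candidates for the same cluster, could look like this:

```python
# Key each read by its k-mers; reads sharing a k-mer are candidates to
# land in the same cluster. (SpaRC itself is far more sophisticated.)
from pyspark import SparkContext

sc = SparkContext(appName="kmer-grouping-sketch")
K = 5
reads = sc.parallelize([("r1", "ACGTACGTAC"), ("r2", "CGTACGTACG"),
                        ("r3", "TTTTGGGGCC")])

kmer_to_read = reads.flatMap(
    lambda r: [(r[1][i:i + K], r[0]) for i in range(len(r[1]) - K + 1)])

shared = (kmer_to_read.groupByKey()
          .mapValues(set)
          .filter(lambda kv: len(kv[1]) > 1))
print(shared.collect())
sc.stop()
```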
UiPath Test Automation using UiPath Test Suite series, part 5 - DianaGray10
Welcome to the UiPath Test Automation using UiPath Test Suite series, part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
20 Comprehensive Checklist of Designing and Developing a Website - Pixlogix Infotech
Dive into the world of Website Designing and Developing with Pixlogix! Looking to create a stunning online presence? Look no further! Our comprehensive checklist covers everything you need to know to craft a website that stands out. From user-friendly design to seamless functionality, we've got you covered. Don't miss out on this invaluable resource! Check out our checklist now at Pixlogix and start your journey towards a captivating online presence today.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Enhancing adoption of Open Source Libraries. A case study on Albumentations.AI - Vladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
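For readers unfamiliar with the library, a minimal Albumentations usage example; the random array stands in for a real image.

```python
# Compose an augmentation pipeline and apply it to one image.
import albumentations as A
import numpy as np

transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
])

image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
augmented = transform(image=image)["image"]
print(augmented.shape)
```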
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
UiPath Test Automation using UiPath Test Suite series, part 6 - DianaGray10
Welcome to the UiPath Test Automation using UiPath Test Suite series, part 6. In this session, we will cover test automation with generative AI and OpenAI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and OpenAI
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Full-RAG: A modern architecture for hyper-personalization - Zilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
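The Full RAG architecture itself is not spelled out in this abstract, but its retrieve-then-rerank step, blending similarity with an extra context signal, can be illustrated with a self-contained toy; all data here is synthetic.

```python
# Toy retrieval + reranking over synthetic embeddings: retrieve top-k by
# cosine similarity, then rerank with a freshness signal. Real systems
# use a vector database and learned ranking models.
import numpy as np

rng = np.random.default_rng(0)
docs = rng.random((1000, 64))    # toy document embeddings
recency = rng.random(1000)       # toy per-document freshness signal
query = rng.random(64)

# Retrieval stage: cosine similarity between the query and every document.
sims = docs @ query / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))
top_k = np.argsort(sims)[-20:]

# Reranking stage: blend similarity with the extra context signal.
scores = 0.7 * sims[top_k] + 0.3 * recency[top_k]
reranked = top_k[np.argsort(scores)[::-1]]
print(reranked[:5])              # best candidates to feed the generator
```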
Unlock the Future of Search with MongoDB Atlas: Vector Search Unleashed - Malak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
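As a hedged sketch of what such a query can look like with pymongo, assuming an Atlas cluster that already has a vector search index named vector_index over an embedding field; all names are placeholders.

```python
# Run an Atlas Vector Search aggregation with pymongo.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://user:pass@cluster.example.net")
collection = client["shop"]["products"]

query_vector = [0.12, -0.07, 0.33]   # embedding of the user's query
results = collection.aggregate([
    {"$vectorSearch": {
        "index": "vector_index",     # pre-built Atlas vector index
        "path": "embedding",         # field holding document embeddings
        "queryVector": query_vector,
        "numCandidates": 100,
        "limit": 5,
    }},
    {"$project": {"name": 1, "score": {"$meta": "vectorSearchScore"}}},
])
for doc in results:
    print(doc)
```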
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
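For a taste of that binding, here is a minimal pypowsybl sketch that creates a bundled IEEE 14-bus test network and runs an AC power flow; this follows pypowsybl's documented API, though details may vary by version.

```python
# Create a bundled test network and run an AC power flow on it.
import pypowsybl as pp

network = pp.network.create_ieee14()   # example network shipped with PowSyBl
results = pp.loadflow.run_ac(network)  # AC power flow
print(results[0].status)

# Inspect the bus voltages computed by the load flow.
print(network.get_buses()[["v_mag", "v_angle"]].head())
```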
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0! - SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Removing Uninteresting Bytes in Software Fuzzing - Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries: Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns, and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
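DIAR itself is not published in these slides' abstract, but the core idea, dropping bytes whose removal leaves observed behavior unchanged, can be illustrated with a self-contained toy; the coverage function below is a stand-in for real instrumentation such as AFL's edge coverage.

```python
# Greedily drop bytes from a seed when removing them does not change the
# (stand-in) coverage signature, leaving a leaner seed to mutate.
def coverage(data: bytes) -> frozenset:
    # Stand-in for real instrumentation: the set of byte 2-grams.
    return frozenset(data[i:i + 2] for i in range(len(data) - 1))

def shrink_seed(seed: bytes) -> bytes:
    baseline = coverage(seed)
    i = 0
    while i < len(seed):
        candidate = seed[:i] + seed[i + 1:]
        if coverage(candidate) == baseline:
            seed = candidate   # byte was uninteresting: drop it
        else:
            i += 1             # byte matters: keep it
    return seed

print(shrink_seed(b"<a><a><b></b></a></a>"))
```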
Climate Impact of Software Testing at Nordic Testing Days - Kari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The climate impact and sustainability of software testing are discussed in the talk. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Communications Mining Series - Zero to Hero - Session 1 - DianaGray10
This session provides an introduction to UiPath Communications Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communications Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Essentials of Automations: The Art of Triggers and Actions in FME - Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (the CI/CD process) involves many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to market, combined with traditionally slow, manual security checks, has created gaps in continuous security, an important piece of the software supply chain. Organizations today feel more susceptible to external and internal cyber threats because of the vast attack surface of their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its delivery process to avoid vulnerabilities and security breaches, and this needs to be achieved with existing toolchains and without extensive rework of the delivery process. This talk presents strategies and techniques for gaining visibility into the true risk of existing vulnerabilities, preventing the introduction of security issues into the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
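The talk doesn't define the DBOM format, but the capture step can be pictured simply: at deploy time, record what artifacts went where, with digests so the record is verifiable. A minimal sketch with invented fields and paths:

    import hashlib
    import json
    from datetime import datetime, timezone

    def sha256_of(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    def capture_dbom(artifacts, environment):
        """Record a deployment bill of materials for one rollout."""
        return {
            "deployed_at": datetime.now(timezone.utc).isoformat(),
            "environment": environment,
            "artifacts": [
                {"path": p, "sha256": sha256_of(p)} for p in artifacts
            ],
        }

    # Invented example: two artifacts rolled out to production.
    # print(json.dumps(capture_dbom(["app.jar", "config.yaml"],
    #                               "production"), indent=2))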
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how things work. He brings around 20 years of solution-engineering experience in application security, software continuous delivery, and SaaS platforms, and is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums, so many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever makes up our current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still treat monitoring and observability as the purview of ops, infra, and SRE teams. This is a mistake: achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and I will share these foundational concepts to build on:
2. Big Data Comes from Machines
Volume | Velocity | Variety | Variability
Machine-generated data is one of the fastest growing, most complex, and most valuable segments of big data: GPS, RFID, hypervisors, web servers, email, messaging, clickstreams, mobile, telephony, IVR, databases, sensors, telematics, storage, servers, security devices, desktops.
3. What Does Machine Data Look Like?
Sample sources: order processing, middleware error, care IVR, Twitter.
4. Machine Data Contains Critical Insights
- Order processing: Customer ID, Order ID, Product ID
- Middleware error: Order ID, Customer ID
- Care IVR: Customer ID, time waiting on hold
- Twitter: customer's Twitter ID, company's Twitter ID
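Those shared identifiers are what make the insights possible: join the sources on a common key and one customer's whole journey appears. A minimal sketch of that correlation in Python, with invented field names and sample events:

    from collections import defaultdict

    # Invented sample events from the sources named on the slide.
    events = [
        {"source": "order_processing", "customer_id": "C42",
         "order_id": "O9", "product_id": "P7"},
        {"source": "middleware_error", "customer_id": "C42", "order_id": "O9"},
        {"source": "care_ivr", "customer_id": "C42", "hold_seconds": 310},
        {"source": "twitter", "customer_id": "C42",
         "tweet": "Order O9 still missing!"},
    ]

    # Group every event by customer ID to reconstruct one customer's journey.
    by_customer = defaultdict(list)
    for e in events:
        by_customer[e["customer_id"]].append(e)

    for cid, trail in by_customer.items():
        print(cid, "->", [e["source"] for e in trail])
    # C42 -> ['order_processing', 'middleware_error', 'care_ivr', 'twitter']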
5. Big Data Technologies
A timeline: from a single RDBMS (highly structured, relational), to a bigger RDBMS, to sharding and MapReduce, to today's mix of SQL and NoSQL stores such as Aster Data, Greenplum, Cassandra, Voldemort, BigTable, CouchDB, and Hadoop. Data models range from relational tables to key/value, temporal, unstructured, and other semi-structured, heterogeneous data.
6. Splunk Turns Machine Data into Real-time Insights
Optimized for real-time, low latency and interactivity: real-time collection and indexing into Splunk storage (and other custom stores), with ad hoc search, monitoring and alerting, reporting and analysis, custom dashboards, and a developer platform on top.
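The "ad hoc search" and "developer platform" boxes correspond to Splunk's REST API and SDKs. As a rough sketch (hostname, credentials, and the search itself are placeholders), a blocking oneshot search through the Splunk SDK for Python looks like this:

    import splunklib.client as client
    import splunklib.results as results

    # Placeholder connection details for a Splunk management port.
    service = client.connect(host="splunk.example.com", port=8089,
                             username="admin", password="changeme")

    # Run a blocking search and stream back the results.
    reader = results.ResultsReader(
        service.jobs.oneshot("search index=web error | stats count by host"))
    for row in reader:
        print(row)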
7. Splunk Collects and Indexes Any Machine Data
No upfront schema. No RDBMS. No custom connectors.
Customer-facing data: click-stream data, shopping cart data, online transaction data.
Outside the datacenter: manufacturing, logistics, CDRs & IPDRs, power consumption, RFID data, GPS data.
Logfiles, configs, messages, traps, alerts, metrics, scripts, changes, tickets:
- Windows: registry, event logs, file system, sysinternals
- Linux/Unix: configurations, syslog, file system, ps, iostat, top
- Virtualization & cloud: configurations, hypervisor, guest OS, apps
- Applications: web logs, Log4J, JMS, JMX, .NET events, code and scripts
- Databases: configurations, audit/query logs, tables, schemas
- Networking: configurations, syslog, SNMP, netflow
8. New Approach to Analyzing Heterogeneous Data
- Universal indexing: no data normalization; automatically handles timestamps; parsers not required; index every term and pattern “blindly”; no attempt to “understand” up front
- Late-binding structure: knowledge applied at search time; no brittle schema to work around; multiple views into the same data; find transactions, patterns and trends
- Analysis and visualization: normalization as it’s needed; faster implementation; easy search language; multiple views into the same data
Rapid time-to-deploy: hours or days.
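"Index every term blindly, bind structure at search time" can be made concrete with a toy model (an illustration of the idea, not Splunk's implementation): raw events go into an inverted index with no schema, and a field such as status is extracted only when a search asks for it.

    import re
    from collections import defaultdict

    raw_events = [
        "2012-06-01T10:00:01 GET /cart status=500 user=alice",
        "2012-06-01T10:00:02 GET /home status=200 user=bob",
        "2012-06-01T10:00:03 POST /pay status=500 user=alice",
    ]

    # Universal indexing: every term is indexed "blindly", no schema.
    index = defaultdict(set)
    for i, event in enumerate(raw_events):
        for term in re.split(r"[\s=/]+", event.lower()):
            index[term].add(i)

    # Late binding: the requested field is extracted at search time.
    def search(term, field=None):
        for i in sorted(index.get(term.lower(), ())):
            event = raw_events[i]
            match = re.search(rf"{field}=(\S+)", event) if field else None
            yield event, match.group(1) if match else None

    for event, status in search("alice", field="status"):
        print(status, "|", event)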
10. Operational Intelligence for IT and Business Users
Solution areas: IT operations management, web intelligence, application management, business analytics, security & compliance.
Users: customer support, LOB owners and executives, operations teams, website/business analysts, system administrators, IT executives, development teams, security analysts, auditors.
11. Scalability to Tens of TBs/Day on Commodity Servers
- Offload search load to Splunk Search Heads
- Auto load-balanced forwarding to as many Splunk Indexers as you need to index terabytes per day
- Send data from 1000s of servers using a combination of Splunk Forwarders, syslog, WMI, message queues, or other remote protocols
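A forwarder's auto load balancing is, at its core, just spreading events across the configured indexers. A bare-bones illustration (placeholder hostnames, plain TCP, and none of the acknowledgment, batching, or retry logic a real Splunk Forwarder adds):

    import itertools
    import socket

    # Placeholder indexer addresses; a real forwarder reads outputs.conf.
    indexers = [("indexer1.example.com", 9997),
                ("indexer2.example.com", 9997),
                ("indexer3.example.com", 9997)]
    rotation = itertools.cycle(indexers)

    def forward(line):
        """Send one event to the next indexer in round-robin order."""
        host, port = next(rotation)
        with socket.create_connection((host, port), timeout=5) as sock:
            sock.sendall(line.encode() + b"\n")

    # forward("Jun  1 10:00:01 web01 app: checkout failed for order O9")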
12. Splunk Big Data Solution
- Product-based solution: easy to download and deploy; pre-integrated, end-to-end functionality; enterprise-grade features
- Integrated and end-to-end: collects data from tens of thousands of sources; advanced real-time and historical analysis; fast, custom visualizations for IT and business users; developer APIs and SDKs
- Performance at scale: proven at multi-terabyte scale per day; upwards of a PB under management; 4,000+ customers
13. Accelerate Games Releases with Big Data Insight
Splunk use:
- Over 10 TB/day from scaled-out cloud and physical infrastructure
- Data indexed includes web server and application logs for games
- Splunk for operational visibility, troubleshooting and monitoring
- Users include game operations, developers, and corporate IT
Value delivered:
- Faster game releases with real-time visibility into production issues
- Reduced fault-resolution time from hours to minutes
- Scaled the ops team to manage and monitor growing infrastructure
The customer is the leading social gaming company globally, with 232 million monthly active users and 60 million daily active users.
14. Groupon at a glance:
- Launched in November 2008
- Over 33 million active customers (as of December 2011)
- More than 11,000 employees worldwide
- Active in 48 countries
- Running over 1,000 deals/day worldwide
15. Daily Uses of Splunk
Key activities:
- Guarantee API performance
- Monitor API data usage
- Early access to key business metrics (conversions, funnel, etc.)
- End-to-end testing
- Ad hoc troubleshooting
Splunk use cases:
- All log data is available through Splunk
- Dashboards
- Near real-time notifications
“Cannot have a server that is not sending data into Splunk”
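"Guarantee API performance" in practice means watching latency percentiles and alerting when they breach a target. A self-contained sketch of that check (the log samples and the 500 ms SLO are invented):

    import statistics

    # Invented (endpoint, latency_ms) pairs parsed from API logs.
    samples = [("/deals", 120), ("/deals", 95), ("/deals", 640),
               ("/buy", 210), ("/buy", 180), ("/buy", 2100)]

    SLO_MS = 500  # assumed p95 target per endpoint

    by_endpoint = {}
    for endpoint, ms in samples:
        by_endpoint.setdefault(endpoint, []).append(ms)

    for endpoint, values in by_endpoint.items():
        p95 = statistics.quantiles(values, n=20)[18]  # 95th percentile
        if p95 > SLO_MS:
            print(f"ALERT {endpoint}: p95={p95:.0f}ms exceeds {SLO_MS}ms")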
17. Complementing BI and Hadoop
- Collection & operational intelligence: daily, weekly and monthly metrics across promotions, offers and acceptance rates; Application Performance Management (APM) and system availability
- Machine data ETL with integration to HDFS: highly reliable data delivery
- Data archival & batch data science: long-term data warehousing and specialized batch analytics
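The ETL-to-HDFS lane can be pictured as: roll indexed events into a compressed file, then land it in Hadoop. A rough sketch (paths are placeholders, and the stock hadoop fs -put CLI stands in for the reliable-delivery machinery the slide refers to):

    import gzip
    import subprocess
    from datetime import date

    def archive_to_hdfs(events, hdfs_dir="/data/splunk"):
        """Write events to a local gzip file, then push it into HDFS."""
        local = f"/tmp/splunk-{date.today():%Y%m%d}.gz"
        with gzip.open(local, "wt") as f:
            for event in events:
                f.write(event + "\n")
        # Real pipelines add retries and checksums for reliable delivery.
        subprocess.run(["hadoop", "fs", "-put", "-f", local, hdfs_dir],
                       check=True)

    # archive_to_hdfs(["2012-06-01T10:00:01 GET /cart status=500"])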
19. Who am I? Eddie Satterly, formerly Sr. Director of Architecture & Engineering at Expedia.
Who is Expedia?
- The world’s largest travel site (NASDAQ: EXPE)
- Hotwire®, a discount travel site
- First $1B quarter in 2011
- 4,000+ technology workers, with a development team of 1,800
- 90 localized Expedia.com® and Hotels.com® sites
20. Where Splunk Comes In
- 12,000+ servers, 27,000+ hosts, 1,000+ source types, 227,000 sources
- 38 indexers and 16 search heads, indexing more than 6.5 TB per day
- 20+ different solutions for RCA, all migrated to Splunk in 3 months
21. Why Splunk?
- SDK integrations built for Cassandra data stores
- Archiving data to Hadoop for batch analysis
- Speed of deployment
- Splunkbase apps available for download
- Scales via commodity hardware
- Developers build custom apps and dashboards
- Aggregation of log data from any device
- Simple UI for IT and business users
22. Splunk Adoption Over Ten Months
From an initial pilot, viral growth from demonstrated value:
- Business unit (deployed Jan. 2011): 125 GB/day across 1,100 systems
- Ecommerce systems (deployed March 2011): 1.8 TB/day across 8,700 systems
- All devices, all data centers (deployed Aug. 2011): ~4 TB/day across ~21,000 systems
- Big data integration, app transactions (deployed 1Q12-2Q12): 3 TB/day, 90 TB of data per month
23. Integrate External Data
Extend search with lookups to external data sources: LDAP, AD, watch lists, CMDB, message stores, reference lookups. Correlate across multiple data sources and data sets using indexes and keys.
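A lookup is a key-based join done at search time against reference data that was never indexed. The same idea in plain Python (the lookup table and events are invented samples):

    # Reference data, e.g. exported from LDAP/AD or a CMDB.
    lookup = {
        "web01": {"owner": "storefront-team", "datacenter": "PHX"},
        "db07": {"owner": "platform-team", "datacenter": "SEA"},
    }

    events = [
        {"host": "web01", "msg": "checkout failed"},
        {"host": "db07", "msg": "replication lag 12s"},
    ]

    # Enrich each event by joining on the shared key (host).
    for event in events:
        event.update(lookup.get(event["host"], {}))
        print(event)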
24. Unique Characteristics of Splunk MapReduce
- Real-time temporal MapReduce
- Preview of in-progress searches
- Searching works on any device
- Simplified search language
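"Real-time temporal MapReduce" with previews can be pictured as mapping each event to a time bucket and keeping running per-bucket reductions, so partial results are visible before the stream ends. A toy sketch (event format invented):

    from collections import Counter

    stream = [
        "2012-06-01T10:00:01 error", "2012-06-01T10:00:40 error",
        "2012-06-01T10:01:05 error",
    ]

    counts = Counter()
    for n, event in enumerate(stream, 1):
        timestamp, _ = event.split(" ", 1)
        bucket = timestamp[:16]      # map: event -> minute bucket
        counts[bucket] += 1          # reduce: running count per bucket
        # Preview: partial results are usable before the search finishes.
        print(f"after {n} events: {dict(counts)}")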
25. Splunk Impact / Top Takeaways
Splunk helped deliver Expedia an annual ROI of over $11 million.
- ROI = 5x original Splunk usage: tools consolidation and retirement, 83% MTTR reduction, outage avoidance
- The business case is viral: 50+ apps developed by our team, over 1,400 users on a regular basis
- More data = more benefits: adding more data to Splunk via weekly deployments, analyzing more data sets in the Splunk UI from Hadoop & Cassandra