Optimize IT Infrastructure

October 15, 2015
Optimize your IT Infrastructure
with Scalar, EMC and Splunk

Scalar leads Canadian Business to
the Next Generation of IT through
Innovation, Expertise & Service

DAVID WIEDASECK
SR. Partner Sales Engineer
dw@splunk.com
JEFFREY WIGGINS
ETD SE Manager
jeffrey.wiggins@emc.com
MICHAEL TRAVES
Solutions Architect
michael.traves@scalar.ca

© 2015 Scalar Decisions Inc. Not for distribution outside of intended audience.
Scalar Client Solutions
§  Security: Context-Based Enterprise Security
§  Infrastructure: Integration of Emerging Technologies
§  Cloud: Hybrid Cloud Solutions

Splunk Analytics – Use Cases
Operational Intelligence
§  IT Operations: Utilization, Capacity Growth
§  Security: Fraud Detection, Real-time Detection of Threats, Forensics
§  Internet of Things (IoT): Sensor Data, Machine-to-Machine, Machine-Human Interactions

Consulting – Solution Design
§  Business Drivers
§  Alignment with IT
§  Stakeholders and Big Data Teams
§  (Data Scientists, Business Analysts, Marketing, IT, CxO, Dir.)
§  Sizing
§  Ingest Performance and Scalability, Search & Index
§  Infrastructure – Scale Out
§  Compute (Virtual, Physical)
§  Network (1/10/40GbE)
§  Storage (Hot/Warm and Cold/Frozen Tiers)
§  Data Security and Protection (Distributed or Consolidated)

Consulting – Deployment
§  Build
§  Pilot and Pre-production
§  Proof of Value
§  Integration with Big Data and Data Lake Initiatives
§  Validate
§  Performance and Scalability
§  Availability
§  Customize
§  Dashboards
§  Reporting and Alerting

But why should you work with US?

Top Tier Technical Talent
§  Engineers average 15 years of experience
§  World-class experts from some of the leading organizations in the industry
§  Dedicated PMO, finance, sales and operations teams

Copyright © 2013 Splunk, Inc.

Splunk Big Data Analytics

Machine Data OR Big Data?

SPLUNK - MAKE MACHINE DATA ACCESSIBLE, USABLE AND VALUABLE TO EVERYONE

What is Machine Data
https://youtu.be/3YEE3RfXVVA

COLLECT DATA FROM ANYWHERE
SEARCH AND ANALYZE EVERYTHING
GAIN REAL-TIME DATA INTELLIGENCE

The Power of Splunk

Turning Machine Data Into Business Value

Index Untapped Data: Any Source, Type, Volume
•  Online Services
•  Web Services
•  Servers
•  Security
•  GPS Location
•  Storage
•  Desktops
•  Networks
•  Packaged Applications
•  Custom Applications
•  Messaging
•  Telecoms
•  Online Shopping Cart
•  Web Clickstreams
•  Databases
•  Energy Meters
•  Call Detail Records
•  Smartphones and Devices
•  RFID

On-Premises, Private Cloud, Public Cloud

Ask Any Question
•  Application Delivery
•  Security, Compliance and Fraud
•  IT Operations
•  Business Analytics
•  Industrial Data and the Internet of Things

What Does Machine Data Look Like?

Sources
•  Order Processing
•  Twitter
•  Care IVR
•  Middleware Error

Machine Data Contains Critical Insights

Sources
•  Order Processing
•  Twitter
•  Care IVR
•  Middleware Error

Fields highlighted across the sources
•  Customer ID
•  Order ID
•  Product ID
•  Customer’s Tweet
•  Twitter ID
•  Company’s Twitter ID
•  Time Waiting On Hold


SPLUNK TODAY


Platform for Machine Data
•  600+ Ecosystem of Apps
•  VMware
•  Exchange
•  PCI
•  Security
•  DB Connect
•  Mobile
•  Forwarders
•  Syslog, TCP, Other
•  Sensors, Control Systems
•  Mainframe Data
•  Stream

IT Operations
•  API, SDKs, UI
•  Server, Storage, Network
•  Server Virtualization
•  Operating Systems
•  Custom Applications
•  Business Applications
•  Cloud Services
•  App Performance Monitoring
•  Ticketing/Other
•  Web Intelligence
•  Mobile Applications

Data sources: Servers, Storage, Desktops, Email, Web, Transaction Records, Network Flows, DHCP/DNS, Hypervisor, Custom Apps, Physical Access, Badges, Threat Intelligence, Mobile, CMDB

Security
•  Intrusion Detection
•  Firewall
•  Data Loss Prevention
•  Anti-Malware
•  Vulnerability Scans
•  Authentication
•  Traditional SIEM

Business Intelligence

Soda Company Use Case
•  Soda Company extracts data from vending machines, social media, and loyalty programs
–  Distribution
–  New product development
–  Insight into consumer buying patterns
•  "Without data you're just a person with an opinion."
•  Customers face challenges with “data cartels” within their organization
•  Need to “free the data lake” from rigid structured data warehouse applications

Analytics
•  What we are looking for, or why, will depend on who we ask
–  What are the normal characteristics for a dog?
   •  Dog Show: height, weight, coat, gait, posture
   •  Veterinarian: immunizations, history of illness, injuries, diet
   •  Parent: suitability for children, temperament, allergies
   •  Data Scientist: mean +/- standard deviation

[Figure: chart with reference lines at mean + std. dev., mean, and mean – std. dev.]
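The data scientist's view of "normal" (mean +/- standard deviation) can be illustrated in a few lines of Python. This is only a sketch: the sample values, the function name find_outliers and the choice of a 2-sigma band are assumptions made for illustration, not anything taken from the slides.

```python
from statistics import mean, stdev


def find_outliers(values, k=1.0):
    """Return (low, high, outliers) for a band of mean +/- k standard deviations."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    low, high = mu - k * sigma, mu + k * sigma
    outliers = [v for v in values if v < low or v > high]
    return low, high, outliers


# Hypothetical response times in milliseconds (illustrative data only)
response_ms = [100, 102, 98, 101, 99, 100, 180]
low, high, outliers = find_outliers(response_ms, k=2.0)
print(f"normal band: {low:.1f} .. {high:.1f} ms, outliers: {outliers}")
```

A wider band (larger k) flags fewer values; the right width depends on who is asking the question, which is the point of the slide.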

Internet of Things

Correlation Criteria
•  MAC address same
•  Content in Search Results
•  Purchase time

Search (Machine Data): Search Results (Application Logs)
–  Device ID (MAC Address)
–  Time of Search

Billing (Structured Data)
–  Content Purchased (IDA #)
–  Device (MAC Address)
–  Time of Search
–  Amount of Purchase ($)

Business Value
•  Revenues driven by Search
•  Improving local content mix
•  Better search results
•  Tailor content promotion
How Splunk Stores Data

•  As Splunk indexes your data, it creates a number of files
–  Raw data in compressed form (rawdata)
–  Indexes that point to the raw data, plus some metadata files (Index Files)
•  These files reside in directories known as “buckets”
•  A bucket moves through several stages as it ages
–  Hot & Warm: $SPLUNK_HOME/var/lib/splunk/defaultdb/db/*
–  Cold: $SPLUNK_HOME/var/lib/splunk/defaultdb/colddb/
–  Frozen: Archive (can be thawed and then searched again)
•  Bucket directory name format: db_<newest_time>_<oldest_time>_<localid>_<guid>
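The bucket naming format above can be unpacked with a short script, for example to see which time range a bucket covers. This is a minimal sketch, assuming the two time fields are Unix epoch seconds and treating the guid as optional; the sample bucket name and the helper parse_bucket_name are hypothetical.

```python
import re
from datetime import datetime, timezone

# db_<newest_time>_<oldest_time>_<localid>_<guid>; guid treated as optional here
BUCKET_RE = re.compile(
    r"^db_(?P<newest>\d+)_(?P<oldest>\d+)_(?P<localid>\d+)"
    r"(?:_(?P<guid>[0-9A-Fa-f-]+))?$"
)


def parse_bucket_name(name):
    """Split a bucket directory name into its time range, local id and guid."""
    m = BUCKET_RE.match(name)
    if m is None:
        raise ValueError(f"not a recognised bucket name: {name!r}")
    return {
        # Assumption: both time fields are epoch seconds
        "newest": datetime.fromtimestamp(int(m.group("newest")), tz=timezone.utc),
        "oldest": datetime.fromtimestamp(int(m.group("oldest")), tz=timezone.utc),
        "localid": int(m.group("localid")),
        "guid": m.group("guid"),
    }


# Hypothetical bucket name, for illustration only
info = parse_bucket_name(
    "db_1444867200_1444780800_12_8A5C3E7F-1B2D-4C6E-9F00-123456789ABC"
)
print(info["oldest"].isoformat(), "->", info["newest"].isoformat())
```

Because the time range is encoded in the name, a search over a given period can skip buckets whose range does not overlap it.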

Splunk Index Buckets

Bucket Stage | Description | Searchable?
Hot | Newly indexed data. One or more hot buckets per index | Yes
Warm | Data rolled from hot. There are many warm buckets | Yes
Cold | Data rolled from warm. There are many cold buckets | Yes
Frozen | Data rolled from cold. Splunk deletes frozen data by default, but it can also be archived. Archived data can later be thawed | Can be, once thawed
Storage Considerations
•  Storage requirements != Index Volume (GB/day)
–  Search profile and number of searches are just as important
–  Also must consider data retention
•  Splunk utilizes I/O to perform both searching AND indexing
–  Load = Search Volume + Indexing Volume
–  Index load is write intensive
–  Search load is read intensive against the data searched (current vs recent vs old)
–  SSDs generally provide higher performance than HDDs, but at a cost
Storage Considerations
•  What is the use case?
–  IT Operations use cases typically search against recent data (e.g. 0 to 14 days)
–  Security and Analytics use cases typically search all data (e.g. days to months to years)
•  What is the typical time span of the data searched?
–  Most ad-hoc searches are against current or recent data
–  Analytics may span a very large time frame
–  Security forensics typically search all data
–  Reports or alerting searches might be over the past day or week

Splunk Index Replication – High Availability
1.  Master auto-detects that a peer is down
2.  Master asks the redundant peer to act as primary
3.  Peers copy the search files / index files / raw data
•  Default is 3X Replication
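The three failover steps can be mimicked with a toy model. This is not Splunk's clustering implementation; it is a simplified sketch in which the data structures, the peer names and the function handle_peer_failure are all invented for illustration.

```python
def handle_peer_failure(buckets, peers, failed_peer, replication_factor=3):
    """Toy model of the failover steps, run once the master has detected
    that `failed_peer` is down (step 1).

    buckets maps a bucket id to {"primary": peer, "copies": set_of_peers}.
    """
    live_peers = [p for p in peers if p != failed_peer]
    for bucket in buckets.values():
        bucket["copies"].discard(failed_peer)
        # Step 2: a surviving copy takes over as primary if the primary was lost
        if bucket["primary"] == failed_peer:
            bucket["primary"] = min(bucket["copies"]) if bucket["copies"] else None
        # Step 3: copy the bucket to more peers until the replication factor is met
        for peer in live_peers:
            if not bucket["copies"] or len(bucket["copies"]) >= replication_factor:
                break
            bucket["copies"].add(peer)
    return buckets


# Hypothetical four-peer cluster with two buckets
peers = ["A", "B", "C", "D"]
buckets = {
    "b1": {"primary": "A", "copies": {"A", "B", "C"}},
    "b2": {"primary": "B", "copies": {"B", "C", "D"}},
}
handle_peer_failure(buckets, peers, failed_peer="A")
print(buckets["b1"])
```

With three copies of each bucket, losing one peer leaves two searchable copies while the third is rebuilt on a remaining peer.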

Scalable Cluster Base Architecture
•  Send data from 1000s of servers using a combination of Splunk Forwarders, syslog, WMI, message queues, or other remote protocols
•  Auto load-balanced forwarding to as many Splunk Indexers as you need to index terabytes/day
•  Offload search load to Splunk Search Heads
•  Automatic load balancing linearly scales indexing
•  Distributed search and MapReduce linearly scale search and reporting

Splunk Real-Time Analytics
•  Data inputs: Monitor Input, TCP/UDP Input, Scripted Input
•  Parsing Queue
•  Parsing Pipeline
–  Source, event typing
–  Character set normalization
–  Line breaking
–  Timestamp identification
–  Regex transforms
•  Index Queue
•  Indexing Pipeline
–  Raw data
–  Index Files
•  Real-time Buffer
•  Real-time Search Process
•  Splunk Index
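Three of the parsing pipeline stages (line breaking, timestamp identification, regex transforms) can be illustrated with a small script. This is a simplified sketch rather than Splunk's parser: the timestamp format, the password-masking rule, the sample log and the function names are assumptions chosen for the example.

```python
import re
from datetime import datetime

# Assumed timestamp layout at the start of each event: YYYY-MM-DD HH:MM:SS
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def break_lines(raw):
    """Line breaking: a line starting with a timestamp opens a new event;
    any other line (e.g. a stack trace) is appended to the current event."""
    events = []
    for line in raw.splitlines():
        if TIMESTAMP_RE.match(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line
    return events


def parse_event(text, source):
    """Timestamp identification plus one regex transform (masking a secret)."""
    m = TIMESTAMP_RE.match(text)
    ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S") if m else None
    masked = re.sub(r"password=\S+", "password=****", text)
    return {"_time": ts, "source": source, "_raw": masked}


# Hypothetical raw input containing a multi-line event
raw = (
    "2015-10-15 10:00:00 ERROR order failed password=secret\n"
    "  at handler.py line 10\n"
    "2015-10-15 10:00:05 INFO retry ok"
)
events = [parse_event(e, source="app.log") for e in break_lines(raw)]
for event in events:
    print(event["_time"], repr(event["_raw"]))
```

Getting line breaking and timestamps right at this stage matters because everything downstream (indexing, time-range searches) works on whole events with a time attached.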

Splunk - Big Data Technologies
•  Relational Database (highly structured): SQL & MapReduce. RDBMS: Oracle, MySQL, IBM DB2, Teradata
•  Key/Value, Columnar or Other (semi-structured): NoSQL. Cassandra, Accumulo, MongoDB
•  Distributed File System (semi-structured): MapReduce. Hadoop: HDFS Storage + MapReduce
•  Temporal, Unstructured, Heterogeneous: Real-Time Indexing. Splunk

Hunk - Hadoop

Image Search with Hunk
http://blogs.splunk.com/2013/10/18/images-search-with-splunk-and-hunk/
•  Image search on HDFS using Splunk
•  Select images based on ranges of color
•  3 parts
–  The Preprocessor using Hadoop Record reader in Java
–  Splunk Search
–  Splunk UI
•  search index=images | eval score=color1+color2+…+colorN | sort -score by image
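The idea behind the search above (sum the selected color fields into a score, then rank images by it) can be shown outside Splunk as well. This is only an illustration: the event layout, the color field names, the sample counts and the function rank_images are hypothetical, and the preprocessor that would produce such per-image color counts is not shown.

```python
def rank_images(events, color_fields):
    """Score each image by summing the selected color fields, highest first."""
    scored = []
    for event in events:
        score = sum(event.get(color, 0) for color in color_fields)
        scored.append({**event, "score": score})
    return sorted(scored, key=lambda e: e["score"], reverse=True)


# Hypothetical per-image color counts, as a preprocessor might emit them
events = [
    {"image": "a.jpg", "red": 10, "orange": 5, "blue": 80},
    {"image": "b.jpg", "red": 60, "orange": 30, "blue": 5},
    {"image": "c.jpg", "red": 20, "orange": 20, "blue": 50},
]

ranked = rank_images(events, color_fields=["red", "orange"])
for row in ranked:
    print(row["image"], row["score"])
```

Choosing a different set of color fields changes the ranking, which is how the UI can select images by color range.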

Why Splunk & Hunk
•  Schema on the Fly – fast, flexible, interactive analytics experience.
•  Interactive Search – you don’t need to know anything about the data in advance; Hunk automatically adds structure and identifies fields of interest, keywords, top values, and patterns over time
•  Results Preview – query results are streamed back in real time. Pause and refine queries without having to wait for jobs to finish.
•  Drag and Drop Analytics – quickly create charts, visuals, and dashboards using pivot
•  Rich App ecosystem for popular applications and data types
•  Hunk – Search and report on native HDFS without ingesting the data

Challenges With Open Source Analytics
•  Open source software such as Hadoop and Cassandra requires significant services effort: as much as 20X higher personnel costs relative to software purchases.
•  Challenges Getting Value from Data in Hadoop
–  Easy storage but hard analytics: difficult for non-specialists to explore, analyze and visualize data
–  Complex technology: wide range of open source projects
–  Hard-to-staff skills: must write MapReduce jobs or pre-define schemas for Hive
•  Hadoop was designed to be a batch job processing system, i.e. you start a job and see results in a range from tens of minutes to days.

Gartner, “Big Data Drives Rapid Changes in Infrastructure and US$232 Billion in IT Spending Through 2016”, October 17, 2012

Splunk and Hadoop
•  Hunk:
–  Main use case = Analyze Hadoop data using Hadoop processing
•  Splunk Hadoop Connect:
–  Main use case = Real-time export of data from Splunk to Hadoop
•  Hunk Archive:
–  Main use case = Archive Splunk indexers to Hadoop
•  Splunk HadoopOps:
–  Main use case = Monitor Hadoop

Integrated Analytics Platform
•  Full-featured, Integrated Product
•  Insights for Everyone
•  Works with What You Have Today
•  Explore, Analyze, Visualize, Dashboards, Share
•  Hadoop Clusters; NoSQL, EMR, S3 Buckets
•  Hadoop Client Libraries for Diverse Data Stores

Hunk – Unique
1.  Run Natively in Hadoop:
–  Use Hadoop MapReduce
2.  Mixed Mode:
–  Allows for data preview
3.  Auto-deploy SplunkD to DataNodes:
–  On-the-fly indexing
4.  Access Control:
–  Allows for many users / many Hadoop directories / Kerberos support
5.  Schema on the Fly

Mixed-mode Search

[Figure: timeline (Time axis) labeled Hadoop MR / Splunk Index, Splunk Stream, switch-over time, and preview]

•  Data Preview
•  Allows users to search interactively by pausing and refining queries

Role-based Security for Shared Clusters

Pass-through Authentication
•  Provide role-based security for Hadoop clusters
•  Access Hadoop resources under security and compliance
•  Integrates with Kerberos for Hadoop security

[Figure: Hadoop as a Self Service. Business Analyst, Marketing Analyst and Sys Admin shown alongside Business Analyst (Queue: Biz Analytics), Marketing Analyst (Queue: Marketing) and Sys Admin2 (Queue: Prod)]

Copyright © 2015 Splunk Inc.

Jeff Wiggins
Systems Engineer Manager, Emerging Technologies @ EMC

Splunk…so Big and Flashy
Building Massive and Efficient Indexer Storage Environments for Splunk

Architecture Matters…
Scale-Up, Scale-Out

SPLUNK STORAGE REQUIREMENTS
ENTERPRISE PERFORMANCE AND DATA SERVICES
•  High-Performance Storage
–  Rare & Sparse Searches
•  High-Capacity Storage
–  Long-Term Retention
•  Scale-Out Infrastructure
–  Indexer & Search Heads
•  De-dupe & Compression
–  Clustered Indexer Deployments
•  Backup & Security
–  Data Protection & Compliance

[Figure: Indexers and Search Heads over capacity-triggered HOT, WARM and COLD tiers]

DAS PRESENTS CHALLENGES

SPLUNK DAS ENVIRONMENT
1.  Dedicated Storage Infrastructure
•  Silo that only runs Splunk
2.  Compromised Availability
•  SSDs & servers fail
•  Index rebuilds can take hours to days
3.  Lack of Enterprise Data Protection
•  No Snapshots or Compliance
•  DR limited to Multisite Clustering
4.  Poor Storage Efficiency
•  Multiple copies of data
•  Multisite Clustering Increases Overhead
5.  Non-Optimized Growth
•  Fixed compute to storage ratio
•  Servers must maintain storage symmetry
6.  Management Complexity
•  Multiple management points

WHY EMC FOR SPLUNK
OPTIMIZED INFRASTRUCTURE FOR BIG & FAST DATA

Optimized Shared Storage & Tiering
•  Hot & Warm Data Deployed On XtremIO or ScaleIO
•  Cold & Frozen Data Deployed On Isilon

Powerful Data Services
•  Encryption & Security
•  Index File Compression
•  Deduplication Of Clustered Indexes
•  Snapshots For Backups

Cost-Effective & Flexible Scale-Out
•  Scale-Out Capacity & Compute Independently Or As Converged Platform

Why Flash?!?

Economic Influences
•  Consumer Demand
•  Data Services Reducing Impact of Application Data Copies
•  Flash technology has improved at a faster rate than Moore’s Law

[Figure labels: Intelligent Scale-out Flash, HDD]

XTREMIO DATA SERVICES
ALWAYS-ON, INLINE, ZERO PENALTY, FREE
•  AGILE WRITEABLE SNAPSHOTS
•  INLINE DATA AT REST ENCRYPTION
•  XTREMIO DATA PROTECTION
•  INLINE DEDUPLICATION
•  INLINE COMPRESSION
•  ALWAYS-ON THIN PROVISIONING

EMC XTREMIO & SPLUNK
ALL-FLASH INFRASTRUCTURE FOR HOT & WARM DATA

Data Services For Hot & Warm Data
•  Self-Encrypting Flash Drives
•  Index File Compression
•  Dedupe Clustered Index Copies
•  In-Memory Data Copy Services

Scale-Out Flash For I/O-Bound Data
•  >1M IOPS & <1ms Latencies

High-Speed Search
•  Accelerate SuperSparse & Rare Searches

[Figure: Indexers and Search Heads]
EMC SCALEIO & SPLUNK
CONVERGED ARCHITECTURE FOR HOT & WARM DATA

Converged Splunk Architecture
•  Indexers, Search Heads
•  Servers, Network, Storage

Leveraging Existing Hardware Investments
•  5 x (5K IOPS, 1 TB)

Shared Capacity & Performance
•  Remove Silos & Increase ROI On DAS Capacity & No Single Point Of Failure
•  25K IOPS & 5TB
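The pooled figures on this slide follow from simple addition across the five servers. The sketch below only reproduces that arithmetic; the function pool_resources is a hypothetical helper, and the model is idealized in that it ignores any capacity or performance set aside for data protection.

```python
def pool_resources(nodes):
    """Sum per-server IOPS and capacity into one shared pool (idealized)."""
    total_iops = sum(node["iops"] for node in nodes)
    total_tb = sum(node["capacity_tb"] for node in nodes)
    return total_iops, total_tb


# Five servers, each contributing 5K IOPS and 1 TB, as shown on the slide
nodes = [{"iops": 5000, "capacity_tb": 1} for _ in range(5)]
total_iops, total_tb = pool_resources(nodes)
print(f"{total_iops // 1000}K IOPS & {total_tb}TB")
```

Adding a sixth server with the same profile would raise the pool to 30K IOPS and 6 TB in this model, which is the scale-out argument in miniature.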

EMC Isilon – Deep and WIDE Storage
OneFS
•  Single Volume/File System
•  Policy-based Tiering
•  Simplicity & Ease of Use
•  Linear Scalability
•  Multi-protocol support
•  High Performance
•  Unmatched Efficiency
•  Easy Growth

EMC ISILON & SPLUNK
LOW-COST & SECURE SCALE-OUT FOR COLD DATA

Consolidate, Protect & Secure Cold Data
•  SmartLock Protects Cold & Frozen Data
•  SmartDedupe For Clustered Indexes
•  SnapshotIQ For Backups
•  Self-Encrypting Drives

High-Speed Ingest & Long-Term Retention With Native HDFS Integration

Scale-Out Capacity
•  Up To 50PB Of Highly Available Capacity

[Figure: Indexers and Search Heads]


For more information:
§  Read more about Scalar’s infrastructure practice model:
§  https://www.scalar.ca/en/what-we-do/#/services/pillar/infrastructure-en


Connect with us!
§  @scalardecisions
§  Scalar Decisions
§  Facebook.com/ScalarDecisions
