The document discusses how Scalar, EMC, and Splunk can help optimize IT infrastructure for Canadian businesses through innovation, expertise, and services. It provides an overview of Scalar's client solutions in areas like security, cloud, and integrating emerging technologies. Use cases for Splunk analytics in IT operations, security, and IoT are also examined. The document explores consulting services around solution design, deployment, and customization.
14. SPLUNK - MAKE MACHINE DATA ACCESSIBLE, USABLE AND VALUABLE TO EVERYONE
What is Machine Data: https://youtu.be/3YEE3RfXVVA
15. The Power of Splunk
• COLLECT DATA FROM ANYWHERE
• SEARCH AND ANALYZE EVERYTHING
• GAIN REAL-TIME DATA INTELLIGENCE
16. Turning Machine Data Into Business Value
Index Untapped Data: Any Source, Type, Volume
• Data sources: Online Services, Web Services, Servers, Security, GPS Location, Storage, Desktops, Networks, Packaged Applications, Custom Applications, Messaging, Telecoms, Online Shopping Cart, Web Clickstreams, Databases, Energy Meters, Call Detail Records, Smartphones and Devices, RFID
• Deployment: On-Premises, Private Cloud, Public Cloud
Ask Any Question
• Application Delivery
• Security, Compliance and Fraud
• IT Operations
• Business Analytics
• Industrial Data and the Internet of Things
17. What Does Machine Data Look Like?
Sources: Order Processing, Twitter, Care IVR, Middleware Error
18. Machine Data Contains Critical Insights
Sources: Order Processing, Twitter, Care IVR, Middleware Error
Highlighted fields: Customer ID, Order ID, Product ID, Twitter ID, Company's Twitter ID, Customer's Tweet, Time Waiting On Hold
19. Machine Data Contains Critical Insights
Sources: Order Processing, Twitter, Care IVR, Middleware Error
Highlighted fields: Order ID, Customer's Tweet, Time Waiting On Hold, Product ID, Company's Twitter ID, Customer ID, Twitter ID
20. SPLUNK TODAY
Platform for Machine Data
600+ Ecosystem of Apps
Diagram labels: VMware, Exchange, PCI, Security, DB Connect, Mobile, Forwarders, Stream, Mainframe Data; Syslog, TCP, Other; Sensors, Control Systems
22. IT Operations
Diagram labels: API, SDKs, UI; Server, Storage, Network; Server Virtualization; Operating Systems; Custom Applications; Business Applications; Cloud Services; App Performance Monitoring; Ticketing/Other; Web Intelligence; Mobile Applications
23. Security
• Data sources: Servers, Storage, Desktops, Email, Web, Transaction Records, Network Flows, DHCP/DNS, Hypervisor, Custom Apps, Physical Access, Badges, Threat Intelligence, Mobile, CMDB
• Traditional SIEM: Intrusion Detection, Firewall, Data Loss Prevention, Anti-Malware, Vulnerability Scans, Authentication
24. Business Intelligence
Soda Company Use Case
• Soda Company extracts data from vending machines, social media, and loyalty programs
  – Distribution
  – New product development
  – Insight into consumer buying patterns
• "Without data you're just a person with an opinion."
• Customers face challenges with “data cartels” within their organization
• Need to “free the data lake” from rigid structured data warehouse applications
25. Analytics
• What we are looking for, or why, will depend on who we ask
  – What are the normal characteristics for a dog?
    • Dog Show: height, weight, coat, gait, posture
    • Veterinarian: immunizations, history of illness, injuries, diet
    • Parent: suitability for children, temperament, allergies
    • Data Scientist: mean +/- standard deviation
Chart labels: mean + std. dev, mean, mean - std. dev
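The data scientist's answer above (mean +/- standard deviation) can be sketched in a few lines of Python. This is only an illustration: the function names, the multiplier `k`, and the sample weights are assumptions, not material from the slides.

```python
from statistics import mean, stdev


def normal_band(values, k=1.0):
    """Return (low, centre, high): the mean minus/plus k sample standard deviations."""
    centre = mean(values)
    spread = stdev(values)
    return centre - k * spread, centre, centre + k * spread


def outliers(values, k=1.0):
    """Return the values that fall outside the mean +/- k std. dev band."""
    low, _, high = normal_band(values, k)
    return [v for v in values if v < low or v > high]


# Hypothetical weights (kg) for a group of dogs; the numbers are made up.
weights_kg = [28.0, 30.0, 31.0, 29.0, 32.0, 30.0, 55.0]

low, centre, high = normal_band(weights_kg, k=2.0)
print(f"normal range: {low:.1f} to {high:.1f} (mean {centre:.1f})")
print("outside the band:", outliers(weights_kg, k=2.0))
```

The same band, computed per field over a time window, is a common starting point for flagging unusual values in machine data.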
26. Internet of Things
Correlation Criteria
• MAC address same
• Content in Search Results
• Purchase time
Search (Machine Data)
• Search Results (Application Logs): Device ID (MAC Address), Time of Search, Content Purchased (IDA #)
Billing (Structured Data)
• Device (MAC Address), Time of Search, Amount of Purchase ($)
Business Value
• Revenues driven by Search
• Improving local content mix
• Better search results
• Tailor content promotion
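The three correlation criteria can be illustrated with a small Python sketch that pairs search log events with billing records. Everything here is hypothetical: the field names, the sample records, the 30-minute window, and the assumption that a billing record carries a content identifier.

```python
from datetime import datetime, timedelta

# Hypothetical records; field names and values are illustrative only.
searches = [
    {"mac": "00:1A:2B:3C:4D:5E", "time": datetime(2015, 1, 10, 20, 0), "content": "movie-123"},
    {"mac": "00:1A:2B:3C:4D:5F", "time": datetime(2015, 1, 10, 21, 0), "content": "movie-456"},
]
purchases = [
    {"mac": "00:1A:2B:3C:4D:5E", "time": datetime(2015, 1, 10, 20, 5),
     "content": "movie-123", "amount": 5.99},
    {"mac": "00:1A:2B:3C:4D:5F", "time": datetime(2015, 1, 11, 9, 0),
     "content": "movie-789", "amount": 3.99},
]


def correlate(searches, purchases, window=timedelta(minutes=30)):
    """Pair each purchase with the searches that satisfy all three criteria."""
    matches = []
    for p in purchases:
        for s in searches:
            same_device = s["mac"] == p["mac"]               # MAC address same
            content_found = s["content"] == p["content"]     # content in search results
            soon_after = timedelta(0) <= p["time"] - s["time"] <= window  # purchase time
            if same_device and content_found and soon_after:
                matches.append((s, p))
    return matches


matches = correlate(searches, purchases)
revenue = sum(p["amount"] for _, p in matches)
print(f"{len(matches)} purchase(s) driven by search, revenue ${revenue:.2f}")
```

Summing the matched purchase amounts gives a rough figure for "revenues driven by search"; the nested loop is fine for a sketch but would be replaced by an indexed join at real volumes.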
28. How Splunk Stores Data
• As Splunk indexes your data it creates a number of files
  – Raw data in compressed form (rawdata)
  – Indexes that point to the raw data, plus some metadata files (Index Files)
• The index files reside in directories known as “buckets”
• A bucket moves through several stages as it ages
  – Hot & Warm: $SPLUNK_HOME/var/lib/splunk/defaultdb/db/*
  – Cold: $SPLUNK_HOME/var/lib/splunk/defaultdb/colddb/
  – Frozen: Archive (can still be thawed and searched)
• Bucket directory name format: db_<newest_time>_<oldest_time>_<localid>_<guid>
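To make the bucket naming format concrete, here is a minimal Python sketch that splits a name of the form db_<newest_time>_<oldest_time>_<localid>_<guid> into its parts. It assumes the two time fields are Unix epoch seconds; the regular expression, the function name, and the sample bucket name are hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumed layout: db_<newest_time>_<oldest_time>_<localid>_<guid>,
# with both times given as Unix epoch seconds (an assumption).
BUCKET_RE = re.compile(
    r"^db_(?P<newest>\d+)_(?P<oldest>\d+)_(?P<localid>\d+)_(?P<guid>[0-9A-Fa-f-]+)$"
)


def parse_bucket_name(name):
    """Split a bucket directory name into its four components."""
    m = BUCKET_RE.match(name)
    if m is None:
        raise ValueError(f"not a recognised bucket name: {name!r}")
    return {
        "newest": datetime.fromtimestamp(int(m["newest"]), tz=timezone.utc),
        "oldest": datetime.fromtimestamp(int(m["oldest"]), tz=timezone.utc),
        "localid": int(m["localid"]),
        "guid": m["guid"],
    }


# Hypothetical bucket name, made up for this example.
info = parse_bucket_name("db_1420156800_1420070400_7_0A1B2C3D-1111-2222-3333-444455556666")
print(info["oldest"].isoformat(), "to", info["newest"].isoformat())
```

Because the time range is encoded in the directory name, a search over a given time span can skip buckets whose range does not overlap it without opening them.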
29. Splunk Index Buckets
Bucket Stage | Description | Searchable?
Hot | Newly indexed data. One or more hot buckets per index | Yes
Warm | Data rolled from hot. There are many warm buckets | Yes
Cold | Data rolled from warm. There are many cold buckets | Yes
Frozen | Data rolled from cold. Splunk deletes frozen data by default, but it can also be archived. Archived data can later be thawed | Can be
30. Storage Considerations
• Storage requirements != Index Volume (GB/day)
  – Search profile and number of searches are just as important
  – Also must consider data retention
• Splunk utilizes I/O to perform both Searching AND Indexing
  – Load = Search Volume + Indexing Volume
  – Index load is write intensive
  – Search load is read intensive against the data searched (current vs recent vs old)
  – SSDs generally provide higher performance than HDDs, but at a cost
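A rough sizing sketch shows why daily index volume alone does not give the storage requirement. The formula and every parameter below are assumptions for illustration: `on_disk_ratio` (stored size relative to raw size after compression and indexing) and `copies` (number of replicated copies) must be measured or chosen for a real deployment, and the sample numbers are made up.

```python
def estimate_storage_gb(daily_index_gb, retention_days, on_disk_ratio, copies=1):
    """Rough capacity estimate: volume per day x retention x on-disk ratio x copies.

    on_disk_ratio and copies are assumed inputs, not values from the slides.
    """
    if daily_index_gb < 0 or retention_days < 0 or on_disk_ratio < 0 or copies < 1:
        raise ValueError("inputs must be non-negative and copies must be at least 1")
    return daily_index_gb * retention_days * on_disk_ratio * copies


def total_io_load(search_load, indexing_load):
    """Load = Search Volume + Indexing Volume, in whatever unit both share."""
    return search_load + indexing_load


# Hypothetical deployment: 100 GB/day, 90 days retention, data stored at half
# its raw size, three copies kept.
capacity = estimate_storage_gb(100, 90, on_disk_ratio=0.5, copies=3)
print(f"estimated capacity: {capacity:.0f} GB")
```

The estimate covers capacity only. The I/O side still depends on the search profile, which is why the time span typically searched matters as much as the volume indexed.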
31. Storage Considerations
• What is the use-case?
  – IT Operations use-cases typically search against recent data (e.g. 0 to 14 days)
  – Security and Analytics use-cases typically search all data (e.g. days to months to years)
• What is the typical time span of the data searched?
  – Most ad-hoc searches are against current or recent data
  – Analytics may span a very large time frame
  – Security forensics typically search all data
  – Reports or Alerting Searches might be over the past day or week
32. Splunk Index Replication – High Availability
1. Master auto-detects that a peer is down
2. Master asks the redundant peer to act as primary
3. Peers copy the search files / index files / raw data
• Default is 3X Replication
33. Scalable Cluster Base Architecture
• Send data from 1000s of servers using a combination of Splunk Forwarders, syslog, WMI, message queues, or other remote protocols
• Auto load-balanced forwarding to as many Splunk Indexers as you need to index terabytes/day
• Offload search load to Splunk Search Heads
• Automatic load balancing linearly scales indexing
• Distributed search and MapReduce linearly scale search and reporting
34. Splunk Real-Time Analytics
Data inputs: Monitor Input, TCP/UDP Input, Scripted Input
Parsing Queue
Parsing Pipeline
• Source, event typing
• Character set normalization
• Line breaking
• Timestamp identification
• Regex transforms
Index Queue
Indexing Pipeline
Real-time Buffer
Real-time Search Process
Splunk Index: Raw data, Index Files
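The parsing pipeline stages listed above can be approximated in Python to show what each step does to incoming data. This is a simplified sketch, not Splunk's implementation: the timestamp pattern, the card-masking transform, the `latin-1` default, and the sample input are all assumptions.

```python
import re
from datetime import datetime

# Assumed timestamp layout and an example masking transform (both hypothetical).
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
CARD_RE = re.compile(r"\b\d{12}(\d{4})\b")


def parse(raw_bytes, source, encoding="latin-1"):
    """Turn a raw byte stream into a list of timestamped events."""
    text = raw_bytes.decode(encoding)              # character set normalization
    events = []
    for line in text.splitlines():                 # line breaking: one event per line
        if not line.strip():
            continue
        m = TIMESTAMP_RE.search(line)              # timestamp identification
        when = datetime.strptime(m.group(0), "%Y-%m-%d %H:%M:%S") if m else None
        masked = CARD_RE.sub(r"XXXXXXXXXXXX\1", line)  # regex transform: mask card numbers
        events.append({"source": source, "time": when, "raw": masked})  # source typing
    return events


sample = (
    b"2015-03-01 12:00:00 order=42 card=1234567890123456\n"
    b"2015-03-01 12:00:05 status=ok\n"
)
events = parse(sample, source="/var/log/orders.log")
for event in events:
    print(event["time"], event["raw"])
```

In the diagram the parsed events then pass through the index queue to the indexing pipeline, which writes the raw data and index files.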
35. Splunk - Big Data Technologies
• Relational Database (highly structured): SQL & MapReduce; RDBMS (Oracle, MySQL, IBM DB2, Teradata)
• Key/Value, Columnar or Other (semi-structured): NoSQL (Cassandra, Accumulo, MongoDB)
• Distributed File System (semi-structured): Hadoop, MapReduce (HDFS Storage + MapReduce)
• Temporal, Unstructured, Heterogeneous: Real-Time Indexing
37. Image Search with Hunk
http://blogs.splunk.com/2013/10/18/images-search-with-splunk-and-hunk/
• Image search on HDFS using Splunk
• Select images based on ranges of color
• 3 parts
  – The Preprocessor using Hadoop Record reader in Java
  – Splunk Search
  – Splunk UI
• search index=images | eval score=color1+color2+…+colorN | sort -score by image
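The scoring idea in the search above (add up the selected color values per image, then sort by score) can be mimicked in plain Python. The color-bin names, the per-image values, and the function name are hypothetical; in the slide's setup the preprocessor would supply the per-image color fields.

```python
# Hypothetical per-image color-bin values, as a preprocessor might emit them.
images = {
    "sunset.jpg": {"red": 0.62, "orange": 0.25, "blue": 0.05},
    "ocean.jpg": {"red": 0.03, "orange": 0.02, "blue": 0.81},
    "forest.jpg": {"red": 0.08, "orange": 0.04, "green": 0.70},
}


def rank_images(images, wanted_colors):
    """Score each image as the sum of its values in the wanted color bins,
    then return (name, score) pairs sorted from highest to lowest score."""
    scores = {
        name: sum(bins.get(color, 0.0) for color in wanted_colors)
        for name, bins in images.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_images(images, ["red", "orange"])
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Images lacking a requested bin simply contribute zero for it, so the ranking still covers every image.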
38. Why Splunk & Hunk
• Schema on the Fly – fast, flexible, interactive analytics experience.
• Interactive Search – you don’t need to know anything about the data in advance; Hunk automatically adds structure and identifies fields of interest, keywords, top values, and patterns over time
• Results Preview – query results are streamed back in real time. Pause and refine queries without having to wait for jobs to finish.
• Drag and Drop Analytics – quickly create charts, visuals, and dashboards using pivot
• Rich App ecosystem for popular applications and data types
• Hunk – Search and Report on native HDFS without ingesting the data
39. Challenges With Open Source Analytics
• Open source software such as Hadoop and Cassandra requires significant services effort, as much as 20X higher personnel costs relative to software purchases.
• Challenges Getting Value from Data in Hadoop
  – Easy storage but hard analytics: difficult for non-specialists to explore, analyze and visualize data
  – Complex technology: wide range of open source projects
  – Hard-to-staff skills: must write MapReduce jobs or pre-define schemas for Hive
• Hadoop was designed to be a batch job processing system, i.e. you start a job and see results in a range from tens of minutes to days.
Gartner, “Big Data Drives Rapid Changes in Infrastructure and US$232 Billion in IT Spending Through 2016”, October 17, 2012
40. Splunk and Hadoop
• Hunk:
  – Main use case = Analyze Hadoop Data using Hadoop Processing
• Splunk Hadoop Connect:
  – Main use case = Real-time export of data from Splunk to Hadoop
• Hunk Archive:
  – Main use case = Archive Splunk indexers to Hadoop
• Splunk HadoopOps:
  – Main use case = Monitor Hadoop
41. Integrated Analytics Platform
• Full-featured, Integrated Product
• Insights for Everyone
• Works with What You Have Today
Diagram labels: Explore, Analyze, Visualize, Dashboards, Share; Hadoop Clusters; NoSQL, EMR, S3 Buckets; Hadoop Client Libraries for Diverse Data Stores
42. Hunk – Unique
1. Run Natively in Hadoop:
  – Use Hadoop MapReduce
2. Mixed Mode:
  – Allows for data Preview
3. Auto deploy SplunkD to DataNodes:
  – On the fly Indexing
4. Access Control:
  – Allows for many users / many Hadoop directories / Kerberos support
5. Schema On the Fly
43. Mixed-mode Search
Diagram labels: Time; Hadoop MR / Splunk Index; Splunk Stream; Switch over time; preview
• Data Preview
• Allows users to search interactively by pausing and refining queries
44. Role-based Security for Shared Clusters
Pass-through Authentication
• Provide role-based security for Hadoop clusters
• Access Hadoop resources under security and compliance
• Integrates with Kerberos for Hadoop security
Role to queue mapping
• Business Analyst, Queue: Biz Analytics
• Marketing Analyst, Queue: Marketing
• Sys Admin, Queue: Prod
50. DAS PRESENTS CHALLENGES
SPLUNK DAS ENVIRONMENT
1. Dedicated Storage Infrastructure
  • Silo that only runs Splunk
2. Compromised Availability
  • SSDs & servers fail
  • Index rebuilds can take hours to days
3. Lack of Enterprise Data Protection
  • No Snapshots or Compliance
  • DR limited to Multisite Clustering
4. Poor Storage Efficiency
  • Multiple copies of data
  • Multisite Clustering Increases Overhead
5. Non-Optimized Growth
  • Fixed compute to storage ratio
  • Servers must maintain storage symmetry
6. Management complexity
  • Multiple management points
51. WHY EMC FOR SPLUNK: OPTIMIZED INFRASTRUCTURE FOR BIG & FAST DATA
Optimized Shared Storage & Tiering
• Hot & Warm Data Deployed On XtremIO or ScaleIO
• Cold & Frozen Data Deployed On Isilon
Powerful Data Services
• Encryption & Security
• Index File Compression
• Deduplication Of Clustered Indexes
• Snapshots For Backups
Cost-Effective & Flexible Scale-Out
• Scale-Out Capacity & Compute Independently Or As Converged Platform
52. Why Flash?!?
Economic Influences
• Consumer Demand
• Data Services Reducing Impact of Application Data Copies
• Flash technology has improved at a faster rate than Moore’s Law
Intelligent Scale-out
Chart labels: Flash, HDD
54. EMC XTREMIO & SPLUNK: ALL-FLASH INFRASTRUCTURE FOR HOT & WARM DATA
Data Services For Hot & Warm Data
• Self-Encrypting Flash Drives
• Index File Compression
• Dedupe Clustered Index Copies
• In-Memory Data Copy Services
Scale-Out Flash For I/O-Bound Data
• >1M IOPS & <1ms Latencies
High-Speed Search
• Accelerate SuperSparse & Rare Searches
Diagram labels: Indexers, Search Heads
55. EMC SCALEIO & SPLUNK: CONVERGED ARCHITECTURE FOR HOT & WARM DATA
Converged Splunk Architecture Leveraging Existing Hardware Investments
• Five nodes, each contributing 5K IOPS and 1 TB
• Shared Capacity & Performance: 25K IOPS & 5TB
• Remove Silos & Increase ROI On DAS Capacity & No Single Point Of Failure
Diagram labels: Indexers, Search Heads; Servers, Network, Storage
56. EMC Isilon – Deep and WIDE Storage
OneFS
• Single Volume/File System
• Policy based Tiering
• Simplicity & Ease of Use
• Linear Scalability
• Multi-protocol support
• High Performance
• Unmatched Efficiency
• Easy Growth
57. EMC ISILON & SPLUNK: LOW-COST & SECURE SCALE-OUT FOR COLD DATA
Consolidate, Protect & Secure Cold Data
• SmartLock Protects Cold & Frozen Data
• SmartDedupe For Clustered Indexes
• SnapshotIQ For Backups
• Self-Encrypting Drives
High-Speed Ingest & Long-Term Retention With Native HDFS Integration
Scale-Out Capacity
• Up To 50PB Of Highly Available Capacity
Diagram labels: Indexers, Search Heads