Explore, Analyze and Visualize Data in Hadoop and NoSQL. Make massive quantities of machine data accessible, usable and valuable for the people who need it, at the speed they need it. Use Hunk to turn underutilized data into valuable insights in minutes, not weeks or months.
BlueData Hunk Integration: Splunk Analytics for Hadoop – BlueData, Inc.
BlueData is working in partnership with Splunk to streamline and accelerate the deployment and adoption of Hunk: Splunk Analytics for Hadoop. The BlueData EPIC software platform now integrates Hunk with Hadoop clusters running on virtualized on-premises infrastructure.
Using Hunk with the BlueData EPIC platform, our joint customers can quickly provision virtual Hadoop clusters together with Hunk in a matter of minutes – providing their data scientists and analysts with the ability to rapidly detect patterns and find anomalies across petabytes of raw data in Hadoop.
Learn more at http://www.bluedata.com
Scalable Monitoring Using Apache Spark and Friends with Utkarsh Bhatnagar – Databricks
This session will give a new dimension to Apache Spark’s usage. See how Apache Spark and other open source projects can be used together to provide a scalable, real-time monitoring system. Apache Spark plays the central role in this scalable solution, since without Spark Streaming we would not be able to process millions of events in real time. This approach offers many lessons for the DevOps/infrastructure domain on how to build a scalable, automated logging and monitoring solution using Apache Spark, Apache Kafka, Grafana, and other open-source technologies.
Sony PlayStation’s monitoring pipeline processes about 40 billion events every day and generates metrics in near real time (within 30 seconds). All the components used along with Apache Spark are horizontally scalable using auto-scaling techniques, which enhances the reliability of this efficient and highly available monitoring solution. Sony Interactive Entertainment has been using Apache Spark, and specifically Spark Streaming, for the last three years. Hear about some important lessons they have learned; for example, they still use Spark Streaming’s receiver-based method in certain use cases instead of direct streaming, and they will share the application of both methods, giving the knowledge back to the community.
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production – Codemotion
What’s important about a technology is what you can use it to do. I’ve looked at what a number of groups are doing with Apache Hadoop and NoSQL in production, and I will relay what worked well for them and what did not. Drawing from real-world use cases, I show how people who understand these new approaches can employ them well in conjunction with traditional approaches and existing applications. Threat detection, data warehouse optimization, marketing efficiency, and biometric databases are some of the examples presented during this talk.
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ... – Spark Summit
In this presentation, we are going to talk about the state-of-the-art infrastructure we have established at Walmart Labs for the Search product using Spark Streaming and DataFrames. First, we have been able to successfully use multiple micro-batch Spark Streaming pipelines to update and process information like product availability, pick up today, etc., along with updating our product catalog information in our search index, at up to 10,000 Kafka events per second in near real time. Earlier, all product catalog changes in the index had a 24-hour delay; using Spark Streaming we have made it possible to see these changes in near real time. This addition has provided a great boost to the business by giving end customers instant access to features like availability of a product, store pick-up, etc.
Second, we have built a scalable anomaly detection framework purely using Spark DataFrames that is being used by our data pipelines to detect abnormality in search data. Anomaly detection is an important problem not only in the search domain but also in many domains such as performance monitoring, fraud detection, etc. During this, we realized that Spark DataFrames are not only able to process information faster but are also more flexible to work with. One can write Hive-like queries, Pig-like code, UDFs, UDAFs, Python-like code, etc., all in the same place very easily, and can build DataFrame templates that can be used and reused by multiple teams effectively. We believe that, if implemented correctly, Spark DataFrames can potentially replace Hive/Pig in the big data space and have the potential of becoming a unified data language.
We conclude that Spark Streaming and DataFrames are the key to processing extremely large streams of data in real time with ease of use.
Dataiku Big Data Paris - The Rise of the Hadoop Ecosystem – Dataiku
Snapshot of the Hadoop ecosystem at the beginning of 2014, with the rise of real-time and in-memory distributed processing frameworks that complement and supplant the MapReduce paradigm.
Spark Application Carousel: Highlights of Several Applications Built with Spark – Databricks
This talk from 2015 Spark Summit East covers 3 applications built with Apache Spark:
1. Web Logs Analysis: Basic Data Pipeline - Spark & Spark SQL
2. Wikipedia Dataset Analysis: Machine Learning
3. Facebook API: Graph Algorithms
Stratio Streaming is the result of combining the power of Spark Streaming as a continuous computing framework and the Siddhi CEP engine for complex event processing.
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark – Michael Stack
Wei Li of Alibaba
Track 2: Ecology and Solutions
https://open.mi.com/conference/hbasecon-asia-2019
THE COMMUNITY EVENT FOR APACHE HBASE™
July 20th, 2019 - Sheraton Hotel, Beijing, China
https://hbase.apache.org/hbaseconasia-2019/
Michael Cutler (CTO and cofounder of TUMRA) provides a high-level introduction to Apache Spark in a presentation given at ‘Big Data Week 2014’ #BDW14, held at University College London.
TUMRA were early adopters of Spark after a brief PoC in Dec ‘12 and took it to production just a few months later. The main motivation was the inflexibility and high latency of Hadoop Map/Reduce jobs and the knock-on effect for technology that utilises it (Mahout machine learning, Hive data warehousing, Cascading).
With two primary use cases, ‘Ecommerce Personalisation’ and ‘Marketing Automation’, TUMRA currently flow around 29 million ‘user engagement events’ (JSON) each day through Apache Kafka and Spark Streaming, at peak rates of up to 800 events per second.
TUMRA use Apache Spark on Amazon Web Services (EC2) in production for a mix of machine learning model building, graph analytics and near-real-time reporting.
To learn more about how we use Spark and the services we can deliver through our Platform please contact: hello@tumra.com
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition) – Uwe Printz
Talk held at the IT-Stammtisch Darmstadt on 08.11.2013
Agenda:
- What is Big Data & Hadoop?
- Core Hadoop
- The Hadoop Ecosystem
- Use Cases
- What‘s next? Hadoop 2.0!
http://bit.ly/1BTaXZP – As organizations look for even faster ways to derive value from big data, they are turning to Apache Spark, an in-memory processing framework that offers lightning-fast big data analytics, providing speed, developer productivity, and real-time processing advantages. The Spark software stack includes a core data-processing engine, an interface for interactive querying, Spark Streaming for streaming data analysis, and growing libraries for machine learning and graph analysis. Spark is quickly establishing itself as a leading environment for fast, iterative in-memory and streaming analysis. This talk will give an introduction to the Spark stack, explain how Spark achieves lightning-fast results, and show how it complements Apache Hadoop. By the end of the session, you’ll come away with a deeper understanding of how you can unlock deeper insights from your data, faster, with Spark.
Accelerate Hadoop and Spark deployment in a multi-tenant lab environment for dev/test/QA, evaluation of multiple tools for Big Data analytics, and other use cases. BlueData provides a turnkey on-premises solution with software and services to get up and running in two weeks.
The new Big Data Lab Accelerator solution provides a full enterprise license of BlueData EPIC software along with the professional services needed to deploy an on-premises multi-tenant Big Data lab. Within two weeks, customers will have a lab environment to evaluate Big Data tools and spin up multiple Hadoop or Spark clusters for development, testing and quality assurance. As part of this deployment, BlueData will also work with customers to implement initial use cases for Big Data analytics.
Learn more about BlueData at www.bluedata.com
Hopsworks in the Cloud – Berlin Buzzwords 2019 – Jim Dowling
This talk, given at Berlin Buzzwords 2019, describes the recent progress in making Hopsworks a cloud-native platform, with HA data-center support added for HopsFS.
Strata Singapore 2017 business use case section
"Big Telco Real-Time Network Analytics"
https://conferences.oreilly.com/strata/strata-sg/public/schedule/detail/62797
Presentation detailing the capabilities of in-memory analytics using Apache Spark: an Apache Spark overview covering the programming model, cluster mode with Mesos, supported operations, and a comparison with Hadoop MapReduce, and elaborating on the Apache Spark stack expansion: Shark, Streaming, MLlib, GraphX.
.conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu... – Splunk
.conf Go 2023 presentation:
"Das passende Rezept für die digitale (Security) Revolution zur Telematik Infrastruktur 2.0 im Gesundheitswesen?"
Speaker: Stefan Stein -
Team Lead CERT | gematik GmbH, M.Eng. IT Security & Forensics,
doctoral student at TH Brandenburg & Universität Dresden
.conf Go 2023 presentation:
De NOC a CSIRT (From NOC to CSIRT)
Speakers:
Daniel Reina - Country Head of Security Cellnex (España) & Global SOC Manager Cellnex
Samuel Noval - Global CSIRT Team Leader, Cellnex
Splunk - BMW connects business and IT with data-driven operations, SRE and O11y – Splunk
BMW is defining the next level of mobility - digital interactions and technology are the backbone to continued success with its customers. Discover how an IT team is tackling the journey of business transformation at scale whilst maintaining (and showing the importance of) business and IT service availability. Learn how BMW introduced frameworks to connect business and IT, using real-time data to mitigate customer impact, as Michael and Mark share their experience in building operations for a resilient future.
Data foundations building success, at city scale – Imperial College London – Splunk
Universities have more in common with modern cities than traditional places of learning. This mini city needs to empower its citizens to thrive and achieve their ambitions. Operationalising data is key to building critical services; from understanding complex IT estates for smarter decision-making to robust security and a more reliable, resilient student experience. Juan will share his experience in building data foundations for a resilient future whilst enabling digital transformation at Imperial College London.
Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen... – Splunk
Learn how Vodafone has provided end-to-end visibility across services by building an Operational Analytics Platform. In this session, you will hear how Stefan and his team manage legacy, on premise, hybrid and public cloud services, and how they are providing a platform for complex triage and debugging to tackle use cases across Vodafone’s extensive ecosystem.
.italo operates an Essential Service by connecting more than 100 million people annually across Italy with its super fast and secure railway. And CISO Enrico Maresca has been on a whirlwind journey of his own.
Formerly a Cyber Security Engineer, Enrico started at .italo as an IT Security Manager. One year later, he was promoted to CISO and tasked with building out – and significantly increasing the maturity level – of the SOC. The result was a huge step forward for .italo.
So how did he successfully achieve this ambitious ask? Join Enrico as he reveals the key insights and lessons learned in his SOC journey, including:
Top challenges faced in improving security posture
Key KPIs implemented in order to measure success
Strategies and approaches applied in the SOC
How MITRE ATT&CK and Splunk Enterprise Security were utilised
Next steps in their maturity journey ahead
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 – Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Removing Uninteresting Bytes in Software Fuzzing – Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries: Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns – and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor... – Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Full-RAG: A modern architecture for hyper-personalization – Zilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0! – SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
UiPath Test Automation using UiPath Test Suite series, part 5 – DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series, part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Pushing the limits of ePRTC: 100ns holdover for 100 days – Adtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... – SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of their features, but many features trade security for convenience and capability. This best-practices guide outlines steps users can take to better protect personal devices and information.
A tale of scale & speed: How the US Navy is enabling software delivery from l... – sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
GridMate - End to end testing is a critical piece to ensure quality and avoid... – ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... – DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Securing your Kubernetes cluster: a step-by-step guide to success! – KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
20 Comprehensive Checklist of Designing and Developing a Website – Pixlogix Infotech
Dive into the world of Website Designing and Developing with Pixlogix! Looking to create a stunning online presence? Look no further! Our comprehensive checklist covers everything you need to know to craft a website that stands out. From user-friendly design to seamless functionality, we've got you covered. Don't miss out on this invaluable resource! Check out our checklist now at Pixlogix and start your journey towards a captivating online presence today.
2. Splunk
Disruptive Approach to Unstructured Data
[Slide contrasts two eras of data management:]
1980-2010: structured data, RDBMS, SQL search, ETL, schema at write
2010+: unstructured data (volume, velocity, variety), universal indexing, schema at read
3. Splunk Today: Platform for Machine Data
[Slide diagram: data sources feeding Splunk – forwarders; syslog, TCP, and other inputs; DB Connect; mobile; mainframe data; VMware; Exchange; PCI security; sensors and control systems; Stream – with a 600+ ecosystem of apps.]
5. Splunk and Big Data Technologies
Relational database (highly structured): SQL and MapReduce – Oracle, MySQL, IBM DB2, Teradata
Distributed file system (semi-structured): Hadoop – HDFS storage + MapReduce
Key/value, columnar, or other (semi-structured): NoSQL – Cassandra, Accumulo, MongoDB
Splunk: temporal, unstructured, heterogeneous data with real-time indexing
6. Massive Linear Scalability to Tens of TBs/Day
Send data from thousands of servers using a combination of Splunk forwarders, syslog, WMI, message queues, or other remote protocols.
Auto-load-balanced forwarding to as many Splunk indexers as you need to index terabytes per day.
Offload search load to Splunk search heads.
Automatic load balancing linearly scales indexing; distributed search and MapReduce linearly scale search and reporting.
7. Splunk Real-Time Analytics
[Slide diagram of the indexing pipeline:]
Inputs (monitor, TCP/UDP, scripted) feed the parsing queue.
Parsing pipeline: source and event typing, character set normalization, line breaking, timestamp identification, regex transforms.
Index queue and indexing pipeline: write raw data and index files to the Splunk index.
A real-time buffer feeds the real-time search process in parallel.
8. Search Head Clustering
Ability to group search heads into a cluster in order to provide highly available and scalable search services for thousands of users. Mission critical, enterprise grade.
9. Splunk Index Replication – High Availability
1. The master auto-detects that a peer is down.
2. The master asks the redundant peer to act as primary.
3. Peers copy the search files, index files, and raw data.
Default is 3x replication.
11. Splunk and Hadoop
Hunk: main use case = analyze Hadoop data using Hadoop processing
Splunk Hadoop Connect: main use case = real-time export of data from Splunk to Hadoop
Hunk Archive: main use case = archive Splunk indexes to Hadoop
Splunk HadoopOps: main use case = monitor Hadoop
13. Hunk – Unique
1. Runs natively in Hadoop: uses Hadoop MapReduce
2. Mixed mode: allows for data preview
3. Auto-deploys splunkd to DataNodes: on-the-fly indexing
4. Access control: allows for many users and many Hadoop directories; supports Kerberos
5. Schema on the fly
14. Run Natively in Hadoop
[Architecture diagram: the Hunk search head submits MapReduce jobs to an external resource (e.g. hadoop.prod); the NameNode and JobTracker (YARN) schedule tasks on DataNodes/TaskTrackers, which run against HDFS using a working directory and index on the data nodes.]
15. Mixed-mode Search
[Timeline diagram: Splunk streaming previews are shown until the Hadoop MapReduce / Splunk index search takes over at the switch-over time.]
Data preview: allows users to search interactively by pausing and refining queries.
16. Indexing On the Fly – Hunk Data Processing
[Data-flow diagram: TaskTrackers run a MapReduce search process over raw and preprocessed data in HDFS; remote results flow back through the ERP search process on the search head, which merges them into the final search results.]
17. Role-based Security for Shared Clusters
Pass-through authentication:
• Provides role-based security for Hadoop clusters
• Access Hadoop resources under security and compliance
• Integrates with Kerberos for Hadoop security
[Diagram: each user role maps to its own Hadoop queue, e.g. business analyst → Biz Analytics queue, marketing analyst → Marketing queue, sys admin → Prod queue.]
18. Managed Archiving: Splunk Enterprise to Hunk/HDFS
• Archive buckets to Hadoop (HDFS) instead of freezing buckets or throwing data away
• Store old data at up to one-tenth the cost in Hadoop's cheap batch storage instead of on SANs
• Optimize Splunk Enterprise search head performance for real-time monitoring, alerting, and dashboarding with short-term historical context
• Use Hunk to search, analyze, and visualize months or years of historical data in Hadoop
• Run federated queries and dashboards across Splunk Enterprise and Hunk
[Diagram: warm and cold buckets on the indexers roll to frozen storage on Hadoop clusters.]
20. Yahoo - Visualizing Hadoop
[Screenshot: a Hunk search over the last 7 days (1,175,726 events, 5/20/14 8:00:00.000 PM to 5/27/14 8:26:26.000 PM) with a timechart of GB-hours by queue (apg_dailyhigh_p3, apg_dailymedium_p5, apg_hourlyhigh_p1, apg_hourlylow_p4, apg_hourlymedium_p2, apg_p7, curveball_large, curveball_med, slingshot, slingstone, OTHER):]
index="jobsummary_logs_all_red" cluster="dilithium*" | eval total_slot_seconds=(mapSlotSeconds + reduceSlotSeconds) | eval gb_hours=((total_slot_seconds * 0.5) / 3600) | eval gb_hours=round(gb_hours) | timechart span=6h sum(gb_hours) as gb_hours by queue
• 600 PB of data
• Very large clusters used by many groups across the enterprise
• 35,000 individual DataNodes
• Hadoop is provided as a self service
21. Vantrix – Mobile Media Optimization
144 Hadoop nodes, 69 TB SSD storage, analytics application
10 million subscribers generate:
• 80 GB of raw session log data per day
• 26 million video data session records
Hunk query: 20 seconds to search through 27M events, returning 4.7M events
Hunk as indexer: automatically indexed and counted field-value occurrences
Hunk as self-service: proved invaluable for identifying and exploring use cases
Hunk business value: helped identify when subscribers abandon video
But listening to your machine data isn’t as easy as it sounds.
Machine Data is different:
It is voluminous unstructured time series data with no predefined schema
It is generated by all IT systems – from servers and applications to RFIDs and wire data.
It is non-standard data and characterized by many unpredictable and changing formats
Because of this, machine data cannot be managed using traditional approaches.
Traditional approaches require you to transform your data and force fit it into a brittle schema – They aren’t designed to handle the inconsistent machine data formats
Traditional approaches are designed with specific use cases and queries in mind – they limit the problems that you can solve
Traditional approaches rely on siloed tools that are designed for structured data approaches and legacy computing environments – They are inherently limited in their ability to scale
To listen to your machine data, you need a solution with no limits:
No limits on the formats of data
No limits on where you can collect the data from
No limits on the questions that you can ask and the use cases you can solve.
And no limits on scale.
You need a solution that can keep up with Machine Data.
Since then, Splunk has invested significantly to expand from a search tool to a mission-critical platform. The platform includes hundreds of data types and can scale to massive volumes
Today, it’s more than Splunk Enterprise: we’ve added Splunk Cloud, Hunk, and Splunk MINT for mobile intelligence, and we have more than 600 apps.
Machine data is more than logs! It’s wire data, mainframe data, mobile device data, sensor data, and metrics.
Your use cases have evolved well beyond troubleshooting so we’re investing in solutions that leverage the power of Splunk Enterprise to provide you with packaged views into your data for faster, deeper insights.
Our most well-known solution is Splunk Enterprise Security and if you aren’t using it yet, we encourage you to find out why it’s turning the traditional SIEM market upside down.
How has big data evolved over time? For a long time, ‘big data’ was simply a large database.
The database industry, in order to handle large data, moved to many smaller databases. Horizontal partitioning (also known as sharding) is a database design principle whereby rows of a database table are held separately (for example, A-D in one database, E-H in a second database, etc.).
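As a minimal sketch of that routing step, sharding can be pictured as a small lookup table; the shard names and the route helper here are illustrative, not any particular database's API:

    # Route each row to a shard by the first letter of its key,
    # mirroring the A-D / E-H example above.
    SHARDS = {
        "shard1": ("A", "D"),
        "shard2": ("E", "H"),
        "shard3": ("I", "Z"),
    }

    def route(key: str) -> str:
        """Return the shard that should hold the row for this key."""
        first = key[0].upper()
        for shard, (lo, hi) in SHARDS.items():
            if lo <= first <= hi:
                return shard
        raise ValueError(f"no shard covers key {key!r}")

    print(route("Alice"))  # shard1
    print(route("Gary"))   # shard2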
Hadoop, inspired by Google's MapReduce and GFS papers, became the de-facto big data system. Hadoop is an open source project from Apache that has evolved rapidly into a major technology movement. It has emerged as a popular way to handle massive amounts of data, including structured and complex unstructured data. Its popularity is due in part to its ability to store and process large amounts of data effectively across clusters of commodity hardware. Apache Hadoop is not actually a single product but a collection of several components. For the most part, Hadoop is a batch-oriented system.
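To make the batch-oriented model concrete, here is a toy word count following the MapReduce shape (map emits key/value pairs, shuffle groups by key, reduce aggregates); this is a sketch of the paradigm in plain Python, not Hadoop code:

    from collections import defaultdict

    def map_phase(lines):
        # Map: emit (word, 1) for every word in every input line.
        for line in lines:
            for word in line.split():
                yield word.lower(), 1

    def shuffle(pairs):
        # Shuffle: group all values by key.
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        return groups

    def reduce_phase(groups):
        # Reduce: aggregate each key's values.
        return {key: sum(values) for key, values in groups.items()}

    lines = ["the quick brown fox", "the lazy dog"]
    print(reduce_phase(shuffle(map_phase(lines))))  # {'the': 2, ...}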
Teradata Aster Data and SQL-on-Hadoop systems provide SQL interfaces that can talk to Hadoop.
Cassandra and HBase are NoSQL databases that can process data using a key/value model in real time.
Splunk is a real-time analytics platform for temporal, unstructured, heterogeneous data.
Splunk allows you to divide up the work of search and indexing across as many servers as you need to achieve the performance and scale you require. Using work-dividing techniques such as MapReduce, Splunk can take a single search and query as many indexers as you need to complete the job, allowing you to use inexpensive commodity hardware in massively parallel clusters.
For example, if you had 1 million events to search, one indexer could easily complete that search, but it would take a little time, say 30 seconds. If the same million events were spread across 10 indexers, the same search would complete in 3 seconds. How fast and how large your searches run is yours to control by adding indexers as desired.
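A back-of-the-envelope sketch of that arithmetic, plus the scatter-gather shape itself; the scan rate and the partial_search helper are illustrative assumptions, not Splunk internals:

    from concurrent.futures import ThreadPoolExecutor

    EVENTS = 1_000_000
    SCAN_RATE = EVENTS / 30            # one indexer: 1M events in ~30 s

    def search_time(num_indexers: int) -> float:
        # Each indexer scans its share in parallel; merge overhead ignored.
        return (EVENTS / num_indexers) / SCAN_RATE

    print(search_time(1))   # 30.0 seconds
    print(search_time(10))  # 3.0 seconds

    def partial_search(shard):          # hypothetical per-indexer search
        return [e for e in shard if "error" in e]

    shards = [["error a", "ok"], ["ok", "error b"]]
    with ThreadPoolExecutor() as pool:
        merged = [hit for hits in pool.map(partial_search, shards) for hit in hits]
    print(merged)  # ['error a', 'error b']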
For the most part, you can use monitor to add nearly all your data sources from files and directories. However, you might want to use upload to add one-time inputs, such as an archive of historical data. You can enable Splunk to accept an input on any TCP or UDP port. Splunk consumes any data sent on these ports. Use this method for syslog (default port is UDP 514), or set up netcat and bind to a port. TCP is the protocol underlying Splunk's data distribution and is the recommended method for sending data from any remote machine to your Splunk server. Splunk can index remote data from syslog-ng or any other application that transmits via TCP. However, there are times when you want to use scripts to feed data to Splunk for indexing, or prepare data from a non-standard source so Splunk can properly parse events and extract fields. You can use shell scripts, python scripts, Windows batch files, PowerShell, or any other utility that can format and stream the data that you want Splunk to index. You can stream the data to Splunk or write the data from a script to a file.
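For example, a scripted input can be any program whose stdout Splunk indexes on a schedule; this is a minimal sketch (the load-average metric is just an illustrative choice, and os.getloadavg is POSIX-only):

    #!/usr/bin/env python3
    import os
    import time

    # Splunk runs the script periodically and indexes whatever it prints.
    timestamp = time.strftime("%Y-%m-%d %H:%M:%S")
    load1, load5, load15 = os.getloadavg()
    print(f"{timestamp} load1={load1:.2f} load5={load5:.2f} load15={load15:.2f}")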
All data that comes into Splunk enters through the parsing pipeline as large chunks. During parsing, Splunk breaks these chunks into events which it hands off to the indexing pipeline, where final processing occurs. During both parsing and indexing, Splunk acts on the data, transforming it in various ways. Most of these processes are configurable, so you have the ability to adapt them to your needs.
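A toy version of those parse steps, assuming simple illustrative patterns (Splunk's real pipeline is driven by configuration, not hard-coded regexes):

    import re

    TS = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
    MASK = re.compile(r"\b\d{16}\b")  # e.g. mask 16-digit card numbers

    def parse(chunk: str):
        for line in chunk.splitlines():                  # line breaking
            if not line.strip():
                continue
            match = TS.search(line)                      # timestamp identification
            event = MASK.sub("*" * 16, line)             # regex transform
            yield {"_time": match.group() if match else None, "_raw": event}

    chunk = "2014-05-20 08:00:00 purchase card=4111111111111111\n"
    print(list(parse(chunk)))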
To kick off a real-time search in Splunk Web, use the time range menu to select a preset Real-time time range window, such as 30 seconds or 1 minute. You can also specify a sliding time range window to apply to your real-time search. This defines a real-time buffer.
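The sliding window behind a real-time buffer can be sketched as a deque that expires events older than the window span (illustrative only, not Splunk's implementation):

    import time
    from collections import deque

    class RealTimeWindow:
        def __init__(self, span_seconds: float):
            self.span = span_seconds
            self.buffer = deque()            # (arrival_time, event) pairs

        def add(self, event):
            now = time.time()
            self.buffer.append((now, event))
            while self.buffer and self.buffer[0][0] < now - self.span:
                self.buffer.popleft()        # expire events outside the window

        def events(self):
            return [e for _, e in self.buffer]

    window = RealTimeWindow(span_seconds=30)
    window.add("GET /index.html 200")
    print(window.events())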
The Splunk Index is the repository for Splunk Enterprise data. Splunk Enterprise transforms incoming data into events, which it stores in indexes.
Faster Recovery II:
If you look at the screen, the two indexers on the left with green cylinders hold searchable copies of the data; the two indexers on the right hold only raw data.
When a peer goes down, the master waits for the heartbeat timeout and marks the peer down. It then reassigns primaries to another peer and tries to enforce the replication policy, making copies of the raw data and search files.
In 5.0, search files are generated on each peer from the raw data; in 6.0, the search files are copied over from a peer that already has them instead of being regenerated.
These statistics are from our internal tests. Another point to note: generating search files from the raw data is CPU-intensive compared to copying search files.
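The failover flow above can be sketched as follows; class and method names are invented for illustration and do not reflect Splunk's actual implementation:

    REPLICATION_FACTOR = 3

    class Cluster:
        def __init__(self, peers):
            self.up = set(peers)
            self.buckets = {}  # bucket name -> {"primary": peer, "copies": set}

        def place(self, bucket, primary, copies):
            self.buckets[bucket] = {"primary": primary, "copies": set(copies)}

        def on_peer_down(self, dead):
            self.up.discard(dead)                        # 1. mark the peer down
            for b in self.buckets.values():
                b["copies"].discard(dead)
                if b["primary"] == dead:                 # 2. promote a survivor
                    b["primary"] = next(iter(b["copies"]))
                while len(b["copies"]) < REPLICATION_FACTOR:
                    spare = next(p for p in self.up if p not in b["copies"])
                    b["copies"].add(spare)               # 3. copy raw + search files

    c = Cluster(["idx1", "idx2", "idx3", "idx4"])
    c.place("bucket_42", primary="idx1", copies=["idx1", "idx2", "idx3"])
    c.on_peer_down("idx1")
    print(c.buckets["bucket_42"])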
Quick to set up, scales to multiple concurrent databases
Enrich machine data with structured data from relational databases
Execute database queries directly from the Splunk user interface
Browse and navigate database schemas and tables
Combine machine data with structured data from relational databases
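The enrichment idea can be sketched with an in-memory SQLite table standing in for the relational database; this illustrates the concept, not the DB Connect app's API:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, tier TEXT)")
    db.execute("INSERT INTO customers VALUES ('c1001', 'gold')")

    # Machine-data events carry a customer_id; the database adds context.
    events = [{"customer_id": "c1001", "action": "login"}]
    for event in events:
        row = db.execute(
            "SELECT tier FROM customers WHERE id = ?", (event["customer_id"],)
        ).fetchone()
        event["tier"] = row[0] if row else "unknown"

    print(events)  # [{'customer_id': 'c1001', 'action': 'login', 'tier': 'gold'}]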
Search execution:
The Hunk search head takes the list of contents of the directories in the virtual index. The search head filters directories and files based on the search and time range (partition pruning).
The NameNode and JobTracker (the MapReduce resource manager in YARN) read data from the MapReduce framework and feed it to the search process. The process computes file splits, then constructs and submits the MapReduce jobs.
Hunk streams a few file splits from HDFS and processes them in the search head to provide quick previews. The search head consumes and merges the MapReduce results (providing incremental previews) while the MapReduce jobs kick off.
The data nodes run a copy of splunkd to process the jobs and write the results to a working directory in HDFS.
Final results are stored in the Hunk search head.
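The partition-pruning step can be pictured like this, assuming date-named directories (a common but not universal HDFS layout):

    from datetime import date

    def prune(directories, earliest: date, latest: date):
        kept = []
        for path in directories:
            # Assume a .../YYYY/MM/DD layout; real layouts vary per deployment.
            y, m, d = path.rstrip("/").split("/")[-3:]
            if earliest <= date(int(y), int(m), int(d)) <= latest:
                kept.append(path)
        return kept

    dirs = ["/data/logs/2014/05/19", "/data/logs/2014/05/21", "/data/logs/2014/06/02"]
    print(prune(dirs, date(2014, 5, 20), date(2014, 5, 27)))
    # ['/data/logs/2014/05/21']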
Hunk utilizes the Splunk Search Processing Language, the industry-leading method to enable interactive data exploration across large, diverse data sets. There is no requirement to "understand" data up front. For customers of Splunk Enterprise, reuse your Search Processing Language knowledge and skill set for data stored in Hadoop. Any commands whose output depends on the event input order would yield different results – this is because Splunk guarantees events to be delivered in descending time order. Hunk doesn’t. This is the reason why transaction and localize do not work.
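"Schema on the fly" means fields are pulled out of the raw event at search time rather than fixed at index time; here is a minimal sketch with an illustrative key=value pattern, mirroring the idea rather than Splunk's extraction engine:

    import re

    KV = re.compile(r"(\w+)=(\S+)")

    def extract_fields(raw: str) -> dict:
        # Search-time extraction: the schema comes from the event itself.
        return dict(KV.findall(raw))

    event = "2014-05-20 08:00:01 status=500 uri=/checkout latency_ms=842"
    fields = extract_fields(event)
    print(fields["status"], fields["uri"])  # 500 /checkout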
We can see the results from the intermediate Hadoop Map jobs getting streamed into the Splunk UI even before all the Map jobs are finished; once all the Hadoop Map jobs are done processing, Splunk displays the full results.
In essence, Splunk acts as the Hadoop Reduce phase and there is no need to use Hadoop for that phase.
Hunk starts the streaming and reporting modes concurrently. Streaming results show until the reporting results come in. Allows users to search interactively by pausing and refining queries.
This is a major, unique advantage of Hunk compared to alternative approaches such as Hive or SQL-on-Hadoop, which require a fixed schema in an effort to speed up searches, while Hunk retains the combination of schema on the fly with results preview.
With this new feature, planned for the next Hunk release (version 6.2.1), you can archive buckets to Hadoop (the Hadoop Distributed File System, or HDFS) instead of freezing buckets or throwing data away. This significantly lowers the total cost of ownership (TCO) for Splunk Enterprise installations while giving security analysts, risk managers, and marketers access to months or years of historical data integral to their job success.
Store old data at up to one-tenth the cost in Hadoop's cheap batch storage instead of on SANs.
Optimize Splunk Enterprise search head performance for real-time monitoring, alerting, and dashboarding with short-term historical context.
Use Hunk to search, analyze, and visualize months or years of historical data in Hadoop.
Run federated queries and dashboards across Splunk Enterprise and Hunk.
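As a sketch of the archiving step, Splunk Enterprise can hand each expiring bucket to a script via its coldToFrozenScript hook, which passes the bucket directory as the script's argument; the HDFS destination below is an assumption for illustration:

    #!/usr/bin/env python3
    import subprocess
    import sys

    HDFS_ARCHIVE = "/archive/splunk"  # hypothetical HDFS target directory

    def archive_bucket(bucket_dir: str) -> None:
        # Copy the whole bucket directory into HDFS batch storage
        # instead of deleting it when it freezes.
        subprocess.run(["hdfs", "dfs", "-put", bucket_dir, HDFS_ARCHIVE], check=True)

    if __name__ == "__main__":
        archive_bucket(sys.argv[1])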