This document provides an overview of Hunk, an analytics platform for exploring, analyzing, and visualizing data stored in Hadoop. It discusses how Hunk allows users to connect to HDFS and MapReduce, create virtual indexes, use MapReduce as an orchestration framework, and search data in Hadoop. The document also highlights how Hunk provides an easier, more flexible workflow for business users compared to traditional Hadoop approaches.
During the course of this presentation, forward-looking statements were made regarding Splunk's expected performance, and legal notices were provided. The presentation discussed using Splunk to analyze large amounts of data stored in Hadoop by moving computation to the data through MapReduce jobs, while supporting the Splunk Processing Language and maintaining schema on read. Optimization techniques such as partition pruning were covered to improve performance, along with best practices, troubleshooting tips, and resources for using Hunk.
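To make schema on read and partition pruning concrete, here is a minimal SPL sketch; the virtual index name hunk_web and the status field are hypothetical. Because the time range is bounded, Hunk can skip HDFS directories whose path-encoded timestamps fall outside the window, and the status field is extracted only at search time rather than at ingest:

    index=hunk_web earliest=-24h@h latest=now
    | stats count by status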
The document provides an overview of Hunk, a product from Splunk that allows users to explore, analyze and visualize data stored in Hadoop. Some key points:
- Hunk uses virtual indexes to enable searching of data in Hadoop using Splunk's interface and capabilities without needing to move the data. It handles MapReduce jobs behind the scenes.
- It provides an interactive interface for business users to explore and query data in Hadoop in an easy and flexible way, with the ability to preview results while MapReduce jobs are running.
- Integration with Hadoop is done through Hadoop client libraries, requiring only read access to data stored in HDFS. Hunk supports various Hadoop distributions and operating systems.
Hunk - Unlocking The Power of Big Data Breakout Session (Splunk)
This document discusses Splunk's Hunk product and how it allows users to analyze data stored in Hadoop using Splunk. Hunk runs natively in Hadoop using MapReduce, supports mixed mode searching that allows previewing data, and auto-deploys Splunk components to Hadoop data nodes for real-time indexing. It also provides role-based security and supports connecting to data in NoSQL databases and SQL databases through Splunk's DB Connect product.
This document provides an overview of Splunk Hunk, which allows users to run Splunk analytics on data stored in Hadoop. Some key points:
- Hunk uses "virtual indexes" to make data in Hadoop look and feel like Splunk indexes, allowing seamless use of the Splunk interface and search processing language.
- It supports running Splunk searches either through an interactive "streaming" mode or an efficient batch "reporting" mode using MapReduce (see the sketch after this list).
- The indexing and search pipelines apply the same field extraction, event breaking, and other processing as standard Splunk to enable flexible searches.
- A processing library allows plugging in custom data preprocessors when ingesting data into Hunk.
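As a rough illustration of the two modes (the virtual index hunk_web and the field names are hypothetical), consider these two separate searches. The first is an event (streaming) search, whose matching raw events can be previewed as early MapReduce splits complete; the second is a reporting search, whose stats aggregation Hunk pushes down into the MapReduce job:

    index=hunk_web sourcetype=access_combined error

    index=hunk_web sourcetype=access_combined | stats count by status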
Explore, Analyze and Visualize Data in Hadoop and NoSQL. Make massive quantities of machine data accessible, usable and valuable for the people who need it, at the speed they need it. Use Hunk to turn underutilized data into valuable insights in minutes, not weeks or months.
Monitoring a Database Driven System Utilizing Splunk's DB Connect (Splunk)
This document discusses how Cerner uses Splunk's DB Connect tool to monitor a complex database-driven system that processes over 180 million real-time eligibility transactions annually between healthcare partners and payers. By connecting Splunk to their Oracle database, Cerner is now able to create near real-time dashboards and alerts to more proactively monitor performance, identify issues, and improve processes. This represents a transformation from a previously reactive, manual approach accessible only to technical analysts.
BlueData Hunk Integration: Splunk Analytics for Hadoop (BlueData, Inc.)
Hunk is a Splunk analytics tool that allows users to explore, analyze, and visualize raw big data stored in Hadoop and NoSQL data stores. It can interactively query raw data, accelerate reporting, create charts and dashboards, and archive historical data to HDFS. BlueData's EPIC platform enables running Hunk jobs on Hadoop clusters while accessing data from any storage system, such as HDFS, NFS, Gluster, and others. Hunk supports ingesting large amounts of data and provides pre-packaged analytics functions and intuitive visualization of results.
Splunk's Hunk: A Powerful Way to Visualize Your Data Stored in MongoDB (MongoDB)
This document discusses Splunk Hunk, which enables users to combine time series event data stored in MongoDB with Splunk's data visualization and search capabilities. It provides an overview of Splunk Hunk's components and architecture, describes how to install and configure the MongoDB virtual index app to integrate MongoDB data with Splunk, and demonstrates how to query and analyze MongoDB data using Splunk.
Splunk is an American software company that allows users to search and analyze real-time data across various sources to generate reports, visualizations, alerts and dashboards. Splunk Enterprise is Splunk's main product that makes it easy to collect, analyze and act on big data from different sources for dashboard creation and searches. The key features of Splunk include interactive searching, subsearching, customizable dashboards, searching by field, and saved searches. Case studies show that Splunk improves operational efficiencies by providing real-time data analytics.
This document provides an overview of Splunk, including:
- Splunk's main functionality is real-time log collection, indexing, and analytics of time series data through search queries and data exploration/visualization capabilities.
- Reasons to use Splunk include its proven success in the field, flexible and user-friendly interface, and ability to handle large volumes of data from various sources through infinite scaling.
- Splunk uses a MapReduce-based architecture to index and search large volumes of data across multiple servers.
Enabling Exploratory Analytics of Data in Shared-service Hadoop Clusters (DataWorks Summit)
This document discusses enabling exploratory analytics of data in shared-service Hadoop clusters using Hunk. It describes how Hunk allows users to visually browse and analyze data in HDFS through an interactive search interface without needing to understand the data schema. The document provides examples of how Hunk has been used at Yahoo to gain operational insights from Hadoop cluster metrics and optimize performance. It demonstrates how Hunk can create visualizations and dashboards for analyzing jobs, queues, NameNode usage and more.
SplunkSummit 2015 - A Quick Guide to Search Optimization (Splunk)
This document provides an overview and tips for optimizing searches in Splunk. It discusses how to scope searches more narrowly through techniques like limiting the time range and including specific indexes, sourcetypes, and fields. This helps reduce the amount of data that needs to be scanned to find search results. The document also recommends using inclusionary search terms rather than exclusionary ones when possible to improve performance. Additional optimization strategies covered include using smarter search modes and defining fields on segmented boundaries.
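For example, a narrowly scoped search along these lines (the index, sourcetype, host, and field names are hypothetical) lets Splunk discard most events before they are scanned, whereas a bare keyword search over all time would have to scan every index:

    index=web sourcetype=access_combined host=web01 status=503 earliest=-4h
    | stats count by uri_path

Note the inclusionary term status=503 rather than an exclusionary NOT status=200, which would force Splunk to examine far more events.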
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc... (Agile Testing Alliance)
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Processing, by "Sampat Kumar" from "Harman". The presentation was given at the #doppa17 DevOps++ Global Summit 2017. All copyrights are reserved by the author.
SplunkLive! Presentation - Data Onboarding with Splunk (Splunk)
- The data onboarding process involves systematically bringing new data sources into Splunk to make the data instantly usable and valuable for users
- The process includes pre-boarding activities like identifying the data, mapping fields, and building index-time and search-time configurations (a minimal configuration sketch follows this list)
- It also involves deploying any necessary infrastructure, deploying the configurations, testing and validating the data, and getting user approval before the process is complete
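A minimal, hypothetical props.conf sketch of what such index-time and search-time configurations typically cover; the sourcetype name, timestamp settings, and field extraction are illustrative only and should be checked against the props.conf documentation for your Splunk version:

    [acme:app:log]
    SHOULD_LINEMERGE = false
    LINE_BREAKER = ([\r\n]+)
    TIME_PREFIX = ^\[
    TIME_FORMAT = %Y-%m-%d %H:%M:%S %z
    MAX_TIMESTAMP_LOOKAHEAD = 30
    EXTRACT-status = status=(?<status>\d{3})

The line-breaking and timestamp settings apply at index time, while the EXTRACT- entry defines a search-time field extraction.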
Splunk Ninjas: New Features, Pivot and Search Dojo (Splunk)
Besides seeing the newest features in Splunk Enterprise and learning the best practices for data models and pivot, we will show you how to use a handful of search commands that will solve most search needs. Learn these well and become a ninja.
Yahoo Enabling Exploratory Analytics of Data in Shared-service Hadoop Clusters (Brett Sheppard)
The document discusses Hunk, a self-service analytics platform for exploring, visualizing, and analyzing data stored in Hadoop clusters and other data stores. Hunk allows users to rapidly interact with data through an interactive search interface and preview results without waiting for full queries to finish. It provides integrated visualization of data through built-in graphs and charts. Hunk deployment is fast, requiring under 60 minutes to connect to Hadoop clusters and begin searching data.
Power of Splunk Search Processing Language (SPL) ... (Splunk)
This session will unveil the power of the Splunk Search Processing Language (SPL). See how to use Splunk's simple search language for searching and filtering through data, charting statistics and predicting values, converging data sources and grouping transactions, and finally data science and exploration. We'll begin with basic search commands and build up to more powerful advanced tactics to help you harness your Splunk Fu!
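By way of illustration only (the index, sourcetype, and field names are hypothetical), a search can start as a simple filter and grow into charting and prediction within one pipeline:

    index=web sourcetype=access_combined status>=500
    | timechart span=1h count as errors
    | predict errors

Each pipe hands the results of one command to the next, so the filter, the hourly chart, and the forecast are expressed in a single statement.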
Splunk is a scalable software that indexes and searches logs and IT data in real time. It can analyze data from any application, server, or device. Splunk uses a server component and forwarders to collect and index streaming data, and provides a web interface for searching, reporting, monitoring and alerting on the data.
This document provides an overview of data models in Splunk:
- A data model maps raw machine data onto a hierarchical structure to encapsulate domain knowledge and enable non-technical users to interact with data via pivot reports.
- There are three root object types: events, searches, and transactions. Objects have constraints, attributes, and inherit properties from parent objects.
- Data models are built using the UI or REST API. Pivot reports leverage data models by generating optimized search strings from the model.
- Data model acceleration improves performance of pivot reports by pre-computing searches on disk. Only the first event object and descendants are accelerated by default.
Data models provide a hierarchical structure for mapping raw machine data onto conceptual objects and relationships. They encapsulate domain knowledge needed to build searches and reports. Data models allow non-technical users to interact with data via a pivot interface without understanding the underlying data structure or search syntax. When reports are generated from a data model, the search strings are automatically constructed based on the model. Model acceleration can optimize searches by pre-computing search results.
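As a hedged example of querying a data model directly (assuming an accelerated data model named Web with CIM-style field names; the names are illustrative), a tstats search runs against the pre-computed summaries instead of raw events:

    | tstats count from datamodel=Web where Web.status=404 by Web.src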
SplunkLive! London: Splunk ninjas - new features and search dojo (Splunk)
The document discusses new features and enhancements in Splunk 6.4, including improvements to reduce storage costs through TSIDX reduction, enhance platform security and management through features like improved DMC and new SSO options, and new interactive visualizations. It also covers search commands like eval, stats, eventstats, streamstats, and transaction that can solve most data analysis problems, and provides examples of using these commands. Finally, it discusses some tips and tricks for Splunk searches.
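A small, hypothetical example of how those commands combine (the index and field names are illustrative): eval normalizes a response time, eventstats attaches a per-host average to every event, streamstats adds a ten-event moving average for context, and where keeps only the outliers:

    index=web sourcetype=access_combined
    | eval resp_s = round(response_time / 1000, 2)
    | eventstats avg(resp_s) as avg_resp by host
    | streamstats window=10 avg(resp_s) as moving_avg by host
    | where resp_s > 2 * avg_resp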
This document provides an agenda and overview for a Splunk TechDay event focused on Splunk Ninja skills. The agenda includes refreshers on search language and structure, examples of SPL commands for searching, charting, and exploring data, and custom commands for extending SPL capabilities. The overview sections explain key aspects of SPL like its large command set, syntax based on Unix pipelines and SQL, and uses for data searching, filtering, and manipulation. Examples are provided for various SPL techniques including search/filter, evaluating/modifying fields, statistics, and charting.
Building a Data Pipeline With Tools From the Hadoop Ecosystem - StampedeCon 2016 (StampedeCon)
This document discusses building a data pipeline using tools from the Apache Hadoop ecosystem. It begins with an introduction to the speaker and why Hadoop is useful for data pipelines. It then provides a matrix comparing the different Hadoop distributions and their included components. It outlines the various tiers of projects in the Hadoop ecosystem and disclaims any completeness. It also presents the typical data lifecycle of capture, enrichment, analysis, presentation, reporting, archival and removal. The document concludes with a reference to demo code and soliciting questions.
SplunkLive! Analytics with Splunk Enterprise (Splunk)
Splunk provides analytics capabilities through data models and pivot reporting. Data models encapsulate domain knowledge about data sources and allow non-technical users to interact with and report on data. Pivot provides a query builder interface for creating reports based on data models without using the Splunk search language. Data models define objects that map to events, searches, or groups of events/searches with constraints and attributes. Pivot reports generate optimized search strings from the data model objects.
Splunk Ninjas: New features, pivot, and search dojo (Splunk)
Besides seeing the newest features in Splunk Enterprise and learning the best practices for data models and pivot, we will show you how to use a handful of search commands that will solve most search needs. Learn these well and become a ninja.
This document outlines an agenda for a Splunk getting started user training workshop. The agenda includes introducing Splunk functionality like search, alerts, dashboards, deployment and integration. It also covers installing Splunk, indexing data, search basics, field extraction, saved searches, alerting and reporting dashboards. The workshop aims to help users get started with the core Splunk features.
Real World Use Cases: Hadoop and NoSQL in Production (Codemotion)
"Real World Use Cases: Hadoop and NoSQL in Production" by Tugdual Grall.
What’s important about a technology is what you can use it to do. I’ve looked at what a number of groups are doing with Apache Hadoop and NoSQL in production, and I will relay what worked well for them and what did not. Drawing from real world use cases, I show how people who understand these new approaches can employ them well in conjunction with traditional approaches and existing applications. Thread Detection, Datawarehouse optimization, Marketing Efficiency, Biometric Database are some examples exposed during this presentation.
(1) The document discusses challenges of managing large and complex datasets for interdisciplinary research projects. It presents Hadoop and the Etosha data catalog as solutions.
(2) Etosha aims to publish and link metadata about datasets to enable discovery and sharing across distributed research clusters. It focuses on descriptive, structural and administrative metadata rather than just technical metadata.
(3) Etosha's architecture includes a distributed metadata service and context browser that can query metadata from different Hadoop clusters to support federated querying and subquery delegation.
SplunkSummit 2015 - Real World Big Data Architecture (Splunk)
This document discusses big data architectures using Splunk, Hadoop, and relational databases. It begins with an overview of Splunk's scalability and real-time analytics capabilities. It then discusses Hunk, an analytics platform for Hadoop that provides self-service analytics. The document also examines using structured data in Splunk and connecting to relational databases. A case study examines challenges with the open source Hadoop ecosystem. Finally, it outlines a real-world customer architecture that uses Splunk for machine data, Hadoop for storage, Hunk for analytics, and connects to relational databases.
What is Splunk? At the end of this session you’ll have a high-level understanding of the pieces that make up the Splunk Platform, how it works, and how it fits in the landscape of Big Data. You’ll see practical examples that differentiate Splunk while demonstrating how to gain quick time to value.
Getting Started with Splunk Breakout Session (Splunk)
This presentation provides an overview of Splunk Enterprise for getting started. It discusses how Splunk fits into the big data landscape, highlighting its capabilities for real-time indexing of machine data from various sources. Key differentiators of Splunk like role-based access control and centralized access management are covered. The presentation demonstrates Splunk's components for data collection, indexing, and presentation and provides a demo of basic search functionality. Resources for learning more about Splunk like documentation, books, and the Splunk community are also mentioned.
Splunk Announces Beta Version of Hunk: Splunk Analytics for Hadoop
New Software Product to Explore, Analyze and Visualize Data in Hadoop
HADOOP SUMMIT NORTH AMERICA 2013, SAN JOSE – June 26, 2013 - Splunk Inc. (NASDAQ: SPLK), the leading software platform for real-time operational intelligence, today announced the beta version of Hunk: Splunk® Analytics for Hadoop. Hunk (beta) is a new software product from Splunk that integrates exploration, analysis and visualization of data in Hadoop. Building upon Splunk’s years of experience with big data analytics technology deployed at thousands of customers, Hunk drives dramatic improvements in the speed and simplicity of interacting with and analyzing data in Hadoop without programming, costly integrations or forced data migrations. Watch the Hunk video to learn more.
This summary provides an overview of a presentation about Splunk:
1. The presentation introduces Splunk, an enterprise software platform that allows users to search, monitor, and analyze machine-generated big data for security, IT and business operations.
2. Key components of Splunk include universal forwarders for data collection, indexers for data storage and search heads for data visualization. Splunk supports data ingestion from various sources like servers, databases, applications and sensors.
3. A demo section shows how to install Splunk, ingest sample data, perform searches, set up alerts and reports. It also covers dynamic field extraction, the search command language and Splunk applications.
Here’s your chance to get hands-on with Splunk for the first time! Bring your modern Mac, Windows, or Linux laptop and we’ll go through a simple install of Splunk. Then, we’ll load some sample data, and see Splunk in action – we’ll cover searching, pivot, reporting, alerting, and dashboard creation. At the end of this session you’ll have a hands-on understanding of the pieces that make up the Splunk Platform, how it works, and how it fits in the landscape of Big Data. You’ll experience practical examples that differentiate Splunk while demonstrating how to gain quick time to value.
What's new in Splunk Enterprise 6.5 (Splunk)
Guided ML
Splunk Enterprise
Splunk Cloud
Splunk Light
Splunk Analytics for Hadoop
Splunk User Behavior Analytics
Splunk IT Service Intelligence
Splunk Security Essentials
Splunk App for AWS
Splunk App for Cisco
Splunk App for VMware
Splunk App for Microsoft
Splunk App for PCI
Splunk App for ServiceNow
Splunk App for SAP
Splunk App for Oracle
Splunk App for Salesforce
Splunk App for Workday
Splunk App for Marketo
Intel IT empowers business units to easily make rapid, impactful business decisions. Ingesting a variety of internal/external data sources has challenges. This slideset covers how Intel IT overcame the issues with Hadoop and Gobblin. Learn more at http://www.intel.com/itcenter
This presentation was given at Integration Developer News Big Data in Technology Summit on July 23 2015. See how Pepperdata's unique patented technology helps organizations gain up to 50% more throughput in their clusters. We are the only technology that can help organizations GUARANTEE SLAs in Hadoop production environments.
This document summarizes a presentation by Pepperdata about improving Hadoop performance. It discusses challenges companies face using Hadoop, such as wasted capacity and inability to prioritize jobs. Pepperdata provides fine-grained visibility into resource usage, total predictability through SLA enforcement, and 30-50% greater throughput by reclaiming wasted capacity. It allows mixed workloads on a single cluster through dynamic resource allocation based on user-defined policies. Q&A is provided where Pepperdata's CEO discusses how it installs agents, helps with mixed workloads, and is different than YARN. More information links are also included.
Level Up – How to Achieve Hadoop Acceleration (Inside Analysis)
The Briefing Room with Robin Bloor and HP Vertica
Live Webcast on August 26, 2014
Watch the archive:
https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=3dd6d1b068fe395f665c75adb682ac41
Hadoop has long passed the point of being a nascent technology, but many users have found that when left to its own devices, Hadoop can be a one trick pony. To get the most out of Hadoop, organizations need a flexible platform that empowers analysts and data managers with a complete set of information lifecycle management and analytics tools without a performance tradeoff.
Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor as he outlines Hadoop’s role in a big data architecture. He’ll be briefed by Walt Maguire of HP Vertica, who will showcase his company’s big data solutions, including HAVEn and the HP Big Data Platform. He will demonstrate how HP Vertica acts as a complement to Hadoop, and how the combination of the two provides a versatile and highly performant solution.
Visit InsideAnalysis.com for more information.
Enrich a 360-degree Customer View with Splunk and Apache Hadoop (Hortonworks)
What if your organization could obtain a 360 degree view of the customer across offline, online and social and mobile channels? Attend this webinar with Splunk and Hortonworks and see examples of how marketing, business and operations analysts can reach across disparate data sets in Hadoop to spot new opportunities for up-sell and cross-sell. We'll also cover examples of how to measure buyer sentiment and changes in buyer behavior. Along with best practices on how to use data in Hadoop with Splunk to assign customer influence scores that online, call-center, and retail branches can use to customize more compelling products and promotions.
SplunkLive! Analytics with Splunk Enterprise - Part 1 (Splunk)
This document discusses analytics using Splunk Enterprise software. It provides an overview and context for Splunk analytics capabilities including search, data modeling, pivot reporting, and the analytics store. The agenda outlines discussing the big picture of analytics, examples of operational intelligence across the enterprise, data models, and a question and answer session. Legal notices are also included, discussing forward-looking statements, roadmap information, and trademarks.
This document summarizes a presentation about using Hadoop as an analytic platform. It discusses how Actian has added seven key ingredients to Hadoop to unlock its full potential for analytics. These include high-speed data integration, a visual framework for data science and modeling, open-source analytic operators, high-performance data processing engines, vector-based SQL processing natively on HDFS, an extremely fast parallel analytics engine, and a next-generation big data analytics platform. The goal is to transform Hadoop from merely a data reservoir to a fully-featured analytics platform.
This document discusses Oracle Data Integration solutions for tapping into big data reservoirs. It begins with an overview of Oracle Data Integration and how it can improve agility, reduce risk and costs. It then discusses Oracle's approach to comprehensive data integration and governance capabilities including real-time data movement, data transformation, data federation, and more. The document also provides examples of how Oracle Data Integration has been used by customers for big data use cases involving petabytes of data.
The document discusses how Staples uses Splunk for operational support, application insights, and business intelligence across their infrastructure. Staples relies on Splunk for real-time visibility into the health of their Advantage website and business/operational analytics. Splunk provides comprehensive insights into Staples' infrastructure and helps map application performance to user experience. It has saved Staples numerous times by quickly detecting issues. Adoption of Splunk at Staples has grown organically as more teams see its benefits.
ASUG83511 - Accelerate Digital Transformation at General Mills.pdf (SreeGe1)
General Mills and SAP presented on using SAP Data Hub to accelerate General Mills' digital transformation. The presentation provided General Mills' data integration journey and challenges with their existing enterprise data warehouse. It outlined the key capabilities desired in a data integration solution, including real-time replication, federated analytics, data governance, and data science capabilities. Finally, it reviewed SAP Data Hub's capabilities and General Mills' proof of concept, and discussed SAP's roadmap to address additional requirements.
An overview of Splunk Enterprise 6.3. Presented by Splunk's Jim Viegas at GTRI's Splunk Tech Day, December 8, 2015.
Visit http://www.gtri.com/ for more information.
The document discusses an upcoming webinar on Big Data and SQL. It provides details on the webinar topics, speakers, and the HP Vertica Analytics Platform. The webinar will explore how HP Vertica allows users to navigate and analyze data stored in Hadoop using SQL, avoiding complex ETL processes. It will also discuss how the platform handles both query and analytical workloads and enables exploration of semi-structured data through its Flex Zone.
Splunk Webinar: Turn your data into valuable insights - Machine Lear... (Georg Knon)
This document provides an overview of machine learning presented at a Splunk webinar. It begins with disclaimers about forward-looking statements and product roadmaps. It then discusses why machine learning is needed to make use of both historical and real-time data. The rest of the document covers the basics of machine learning, including the main types (supervised, unsupervised, reinforcement learning) and algorithms. Example use cases for machine learning in IT operations, security, and business analytics are presented. The document concludes with information about Splunk's Machine Learning Toolkit and links to resources.
Splunk Webinar: Searching, transforming, and ... machine data with Splunk SPL (Georg Knon)
This document provides examples of SPL commands for searching, filtering, modifying, visualizing, and exploring data in Splunk. It discusses commands for searching and filtering data, modifying or creating new fields, calculating statistics and charting them over time, converging different data sources, identifying transactions and anomalies, and exploring data relationships. Examples are provided for commands like eval, stats, timechart, lookup, appendcols, transaction, anomalydetection, cluster, correlate, and others.
SplunkLive! Zürich 2016 - Use Case Swisscom (Georg Knon)
Swisscom uses Splunk to gain operational intelligence and visibility into its cloud infrastructure and services. Splunk aggregates data from various systems to provide monitoring, troubleshooting, and license management across Swisscom's complex cloud environment. This centralization with Splunk improves customer experience by enabling faster issue resolution. Going forward, Swisscom aims to leverage Splunk further for predictive analytics and make more operational data accessible to the wider business.
Splunk Webinar: Splunk for Application Management (Georg Knon)
The document discusses how Splunk can be used for application management. It begins with an introduction of the speaker and agenda. It then discusses challenges in application management like availability, response time, planning capacity and reducing mean time to repair. It shows how traditionally there were infrastructure and application silos with low visibility. With Splunk, it provides a platform to index and analyze data across the technology stack. Splunk can complement application performance monitoring for complete visibility. It then demonstrates Splunk and discusses trying Splunk for free.
Splunk Webinar: IT Operations Demo for Troubleshooting & Dashboarding (Georg Knon)
This document provides an overview of Splunk's IT operations software. It discusses the challenges facing IT operations, including siloed tools and reactive problem solving. It presents Splunk as a solution, with its ability to index and analyze machine data from any source in real-time. Key benefits highlighted include faster troubleshooting to reduce downtime, proactive monitoring to address issues before they become problems, and increased operational visibility across the IT environment. The document concludes with a demonstration of Splunk's IT service intelligence capabilities.
Splunk for IT Operations Breakout Session (Georg Knon)
This document discusses how IT complexity is a challenge for CIOs due to siloed technologies, disconnected point solutions, and time spent maintaining rather than innovating. It presents Splunk as a solution that provides comprehensive visibility across infrastructure, applications, databases, and more through centralized data collection and analysis. Splunk reduces problem resolution time by 67% and escalations by 90% by enabling "first responders" to search across all IT data from a single interface. The document also outlines how Splunk apps can provide insights by role and technology and its capabilities for various IT functions like virtualization, storage, and operating systems.
Getting started with Splunk - Breakout Session (Georg Knon)
This document provides an overview and getting started guide for Splunk. It discusses what Splunk is for exploring machine data, how to install and start Splunk, add sample data, perform basic searches, create saved searches, alerts and dashboards. It also covers deployment and integration topics like scaling Splunk, distributing searches across data centers, forwarding data to Splunk, and enriching data with lookups. The document recommends resources like the Splunk community for further support.
Webinar: Using Big Data for real-time fraud detection in eBanking with Splunk ... (Georg Knon)
In this webinar we show how fraud detection works in this environment:
- Real-time monitoring service
- New insights into business activity
- Open interface for internal and external systems
- Automated response to irregularities
- Suspicious IP addresses can be blocked
- Affected transactions can be cancelled immediately
- Affected accounts and transactions can be locked, and the end customer can be informed about the incident
Splunk Webinar: Turn data silos into Operational Intelligence (Georg Knon)
This document provides an overview and agenda for a Splunk presentation on operational intelligence. It introduces Matthias Maier and Rene Siekermann as today's speakers and includes a safe harbor statement. The agenda covers an overview of operational intelligence, a live demo, use case, and roadmap. It also provides a company overview of Splunk including its products, customers, and ability to collect and analyze machine data from various sources to provide insights.
5 ways to improve your security (Georg Knon)
Splunk Enterprise Security can improve organizations' security posture in 5 ways:
1. Detect external, advanced threats by finding abnormal access to sensitive data or signs of data exfiltration.
2. Detect insider threats by monitoring for terminated employee accounts being used or active employee accounts when those employees are on vacation.
3. Use free, external threat intelligence from sources like Emerging Threats and SANS, integrating threat indicators like bad IP addresses.
4. Accelerate incident investigations using Splunk's incident review framework, investigation timeline and journaling capabilities.
5. Perform advanced analytics and visualizations to detect anomalies through correlation of disparate security data sources.
The document provides an overview of Splunk IT Service Intelligence (ITSI). Some key points:
- ITSI makes Splunk "service-aware" and provides insights into IT services to help accelerate customers' path to operational intelligence.
- ITSI provides search-based KPIs, full-fidelity service health monitoring, and leverages Splunk's universal data platform to provide a data-driven approach.
- Core concepts in ITSI include services, KPIs, health scores, service analyzers for monitoring services, glass tables dashboards, and deep dives for investigation.
- Notable events are also generated by correlation searches to indicate service degradation.
Data models and pivot with Splunk breakout session (Georg Knon)
Here are the key points about data model acceleration in Splunk:
- Data model acceleration optimizes searches that use data models by pre-processing constraints and attribute definitions at search time. This can significantly improve search performance.
- Acceleration only applies to the first "event" object in the data model tree and its descendant objects. Searches against other object types like "search" or "transaction" do not benefit from acceleration.
- The more filtering/extraction done in the data model objects, the more acceleration can improve performance by reducing the number of events earlier in the search pipeline. Simply defining fields may not yield huge gains.
- Acceleration is most helpful for reports that run the same search repeatedly, like scheduled reports; a minimal example follows below.
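A hedged sketch of a search that explicitly uses those summaries (assuming an accelerated data model named Web; summariesonly=true restricts tstats to the pre-computed summary data instead of falling back to raw events):

    | tstats summariesonly=true count from datamodel=Web by _time span=1h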
Splunk IT Service Intelligence is a solution that provides end-to-end service visibility, reduces time to problem resolution, and allows for proactive management of IT health. It introduces a data-centric approach to service monitoring and analytics built on the Splunk platform. Key benefits include unified data insights across IT silos, easy access to actionable troubleshooting information through dynamic service models and customizable visualizations, and early warning on deviations through correlated KPI monitoring.
Splunk Internet of Things Roundtable 2015 (Georg Knon)
This document contains an agenda and presentation materials for an Internet of Things Day event by Splunk. The presentation provides an overview of Splunk as a company, its machine data platform for collecting and analyzing data from IoT devices, and use cases from customers across various industries utilizing Splunk for IoT applications. Examples include using machine data from manufacturing equipment to optimize energy usage and enable predictive maintenance, and aggregating data from vending machines for diagnostics and insights into customer behavior.
Webinar: Splunk Cloud - SaaS platform for Operational Intelligence (Georg Knon)
This document discusses Splunk Cloud, a platform for collecting, analyzing, and visualizing machine data from any source. Some key points:
- Splunk Cloud can handle any amount and type of machine data from various online services, applications, devices, and systems, regardless of location.
- It offers universal indexing without needing to filter or schema data beforehand.
- The cloud portfolio includes apps for AWS, ServiceNow, and Salesforce, as well as deploying Splunk Enterprise as a service and analyzing data stored in cloud services.
- Splunk Cloud provides instant access, security, reliability with 100% uptime, and hybrid capabilities to search data across on-premises, private cloud, and public cloud environments.
Splunk App for Stream - Insights into your network traffic (Georg Knon)
The document discusses the Splunk App for Stream, which enables real-time insights into private, public and hybrid cloud infrastructures by capturing and analyzing critical events from wire data not found in logs or with other collection methods. It provides an overview of the app, what's new, important features, architecture and deployment, customer success examples, and FAQs.
Webinar: Vulnerability Management made easy – with Splunk and Qualys (Georg Knon)
This document discusses how Splunk and Qualys can be used together for vulnerability management. It provides an overview of Splunk and how it is used across IT and business operations, including for security use cases. It then discusses Qualys' vulnerability management and security solutions. The remainder consists of an agenda, demos of Qualys data in Splunk, and benefits of correlating Qualys and Splunk data for improved security posture monitoring and risk visibility.
3. Agenda
1. What is Hunk?
2. Powerful Developer Platform
3. Preparation
4. Connect Hunk to HDFS and MapReduce
5. Create Virtual Indexes
6. MapReduce as the Orchestration Framework
7. Search Data in Hadoop
8. Flexible, Iterative Workflow for Business Users
4. Explore, Analyze, Visualize Data in Hadoop
No fixed schema to search unstructured data
Preview results while MapReduce jobs start
Easier app development than in raw Hadoop
Unlock business value of data in Hadoop
Fast to learn – no scarce MapReduce skills required
Integrated – explore, analyze and visualize
6. Connect to HDFS and MapReduce
Connect to Apache HDFS and MapReduce, or your choice of Hadoop distribution
(Diagram: Hadoop Cluster 1)
7. Unmet Needs for Hadoop Analytics

OPTION 1 – “Do it yourself” Hadoop / Pig
Problems:
• Scarce skill sets to hire
• Need to know MapReduce
• Wait for slow jobs to finish
• No results preview
• No built-in visualization
• No granular authentication
• Slow time to value

OPTION 2 – Hive or SQL on Hadoop
Problems:
• Pre-defined fixed schema
• Need knowledge of data
• Miss data that “doesn’t fit”
• No results preview
• No built-in visualization
• Scarce skill sets to hire
• Slow time to value

OPTION 3 – Extract to in-memory store
Problems:
• Data too big to move
• Limited drill down to raw data
• No results preview
• Another data mart
• Expensive hardware
8. Hadoop in Real Life
MapReduce job for Hadoop (excerpt):
public class WordCount extends Configured implements Tool {

  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {

    static enum Counters { INPUT_WORDS }

    private Text word = new Text();
    private boolean caseSensitive = true;
    private Set<String> patternsToSkip = new HashSet<String>();
    private long numRecords = 0;
    private String inputFile;

    public void configure(JobConf job) {
      caseSensitive = job.getBoolean("wordcount.case.sensitive", true);
      inputFile = job.get("map.input.file");
      if (job.getBoolean("wordcount.skip.patterns", false)) {
        Path[] patternsFiles = new Path[0];
        try {
          patternsFiles = DistributedCache.getLocalCacheFiles(job);
        } catch (IOException ioe) {
          System.err.println("Caught exception while getting cached files: "
              + StringUtils.stringifyException(ioe));
        }
        for (Path patternsFile : patternsFiles) {
          parseSkipFile(patternsFile);
        }
      }
    }
    // ... the map() method, the Reduce class, and the job setup are omitted here
Using Hunk, the same word count is just:
  index=Hadoop
  | wc usestopwords=f
  | stats sum(count) by word
9. Integrated Analytics Platform for Hadoop Data
Full-featured, Integrated Product
Insights for Everyone
Works with What You Have Today
(Diagram: Explore, Analyze, Visualize, Dashboards, Share – on top of Hadoop (MapReduce & HDFS))
10. What Hunk Does Not Do
1. Hunk does not replace your Hadoop distribution
2. Hunk does not replace or require Splunk Enterprise
3. Interactive, but not real time
4. No data ingest management (that’s Flume or Sqoop)
5. No Hadoop operations management
11. Product Portfolio
(Diagram: the Splunk product portfolio)
• Splunk Enterprise – real-time indexing and real-time search
• Splunk Apps – IT Ops, Security & Compliance, Web Intelligence, App Dev & App Mgmt., Business Analytics
• Splunk Hadoop Connect and DB Connect
• Hunk – ad hoc analytics of historical data in Hadoop; developers building big data apps on top of Hadoop
• Use cases: 360° Customer View, Complete Security Analytics, Product and Service Analytics
• Vibrant and passionate developer community
12. Powerful Developer Platform with Familiar Tools
API and SDKs: JavaScript, Java, Python, PHP, C#, Ruby
• Add new UI components
• Integrate into existing systems
• Use known languages and frameworks
14. MapReduce as the Orchestration Framework
(Diagram: Hunk Search Head, HDFS, and TaskTrackers 1–3)
1. Copy the splunkd binary (.tgz) from the Hunk Search Head to HDFS
2. Each TaskTracker copies the .tgz from HDFS
3. Expand it in the specified location on each TaskTracker
4. TaskTrackers not involved in the first search receive the binary in subsequent searches
15. Data Processing Pipeline
(Diagram: Raw data (HDFS) → Custom processing → Indexing pipeline → Search pipeline)
• Custom processing (MapReduce/Java): you can plug in data preprocessors, e.g. Apache Avro or format readers
• Indexing pipeline (splunkd/C++, fed via stdin): event breaking, timestamping
• Search pipeline: event typing, lookups, tagging, search processors
16. Hunk Applies Schema on the Fly
Hunk applies schema for all fields – including transactions – at search time
• Structure applied at search time
• No brittle schema to work around
• Automatically find patterns and trends
17. Mixed-mode Search
Streaming: transfers the first several blocks from HDFS to the Hunk Search Head for immediate processing
Reporting: pushes computation to the DataNodes and TaskTrackers for the complete search
• Hunk starts the streaming and reporting modes concurrently
• Streaming results show until the reporting results come in
• Allows users to search interactively by pausing and refining queries
18. Flexible, Iterative Workflow for Business Users
(Workflow: Explore → Analyze → Model → Pivot → Visualize → Share)
Interactive Analytics
• Preview results
• Normalization as it’s needed
• Faster implementation and flexibility
• Easy search language + data models & pivot
• Multiple views into the same data
This session is designed for audiences who have seen an introduction to Hunk and would like a more comprehensive understanding of how Hunk works. I’ll cover each of these eight topics.
Hunk is a new product for organizations deploying Hadoop and is priced and packaged separately from Splunk Enterprise; a Splunk Enterprise license is not required to run Hunk. Hunk is the integrated analytics platform for data in Hadoop. It supports business use cases that unlock the value of data stored in Hadoop: data analytics to launch and optimize products and services; synthesis of data from all customer touch points; comprehensive security analytics for modern threats; and easier app development than in raw Hadoop, with tools and frameworks that developers already know. It is easy to use for any business or IT user, versus the scarce skills needed to manually write MapReduce jobs or define Hive data schemas. It is a fully integrated analytics product: explore, analyze, visualize, create dashboards, create data models, pivot, and share. There is no fixed schema to search raw and unstructured data, you can preview results while MapReduce jobs start, and app development is easier than in raw Hadoop.
Hunk is essentially the Splunk Enterprise technology stack sitting on top of Hadoop, with some limitations (no real time, and several functions in the Splunk processing language that do not apply to virtual indexes). Hunk is a high-performance, scalable software server written in C/C++ and Python. It indexes and searches logs and other big data stored in the Hadoop Distributed File System (HDFS), or MapR’s proprietary variant of HDFS. Hunk works with machine data generated by any application, server or device. The Splunk Developer API is accessible via REST, SOAP or the command line. After downloading Hunk, installing it on your choice of 64-bit Linux operating system, and starting it, you'll find two Hunk server processes running on your host: splunkd and splunkweb. splunkweb is a Python-based application server providing the Splunk Web user interface. It allows users to search and navigate machine data virtually indexed by Hunk servers and to manage your Hunk deployment through the browser. splunkd is a distributed C/C++ server that creates a virtual index from machine data and handles search requests. An ODBC driver (in beta as of September 2013) will provide integration with 3rd-party data visualization software.
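Because the developer API is the standard Splunk REST interface on the management port, a Hunk search can be driven from any HTTP client. The sketch below is illustrative only: the host name, credentials and virtual index name are placeholders, and it assumes the usual search-jobs endpoints on port 8089.

  import time
  import requests

  BASE = "https://hunk.example.com:8089"   # placeholder host; 8089 is the default management port
  AUTH = ("admin", "changeme")             # placeholder credentials

  # Create a search job against a (hypothetical) virtual index.
  resp = requests.post(
      BASE + "/services/search/jobs",
      auth=AUTH,
      data={"search": "search index=hadoop_weblogs | stats count by sourcetype",
            "output_mode": "json"},
      verify=False,                        # the management port often uses a self-signed certificate
  )
  sid = resp.json()["sid"]

  # Poll until the MapReduce-backed job finishes, then fetch the merged results.
  while True:
      job = requests.get(BASE + "/services/search/jobs/" + sid,
                         auth=AUTH, params={"output_mode": "json"}, verify=False).json()
      if job["entry"][0]["content"]["isDone"]:
          break
      time.sleep(2)

  results = requests.get(BASE + "/services/search/jobs/" + sid + "/results",
                         auth=AUTH, params={"output_mode": "json"}, verify=False).json()
  for row in results["results"]:
      print(row)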
Connect Hunk to your Hadoop cluster as an external results provider. The external results provider is a search-time helper process responsible for: accessing the external system (Hadoop); translating or interpreting the search request; and pushing as much of the computation as possible to the external system. Connect to the Hadoop Distributed File System (HDFS) and MapReduce from Apache downloads or from your choice of Hadoop distribution, including the option for Cloudera, Hortonworks, MapR or Pivotal. Hunk only requires basic Hadoop: HDFS and MapReduce. You can continue to use additional projects and subprojects with your Hadoop cluster but what’s required by Hunk is just MapReduce and HDFS (or MapR’s proprietary variant of HDFS).
Connect Hunk to multiple Hadoop clusters.
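To make the connection concrete, the sketch below shows roughly what the provider and virtual index definitions look like in indexes.conf on the Hunk search head. Treat it as an illustrative sketch rather than a copy-paste configuration: the host names, paths and index name are placeholders, and the exact vix.* setting names can vary by Hunk version and Hadoop distribution.

  [provider:hadoop-cluster-1]
  vix.family = hadoop
  # Placeholder paths to the local Java and Hadoop client libraries
  vix.env.JAVA_HOME = /usr/lib/jvm/java-7-openjdk
  vix.env.HADOOP_HOME = /opt/hadoop
  # Placeholder NameNode and JobTracker addresses for the cluster
  vix.fs.default.name = hdfs://namenode.example.com:8020
  vix.mapred.job.tracker = jobtracker.example.com:8021
  # HDFS scratch space where Hunk writes its binary and interim results
  vix.splunk.home.hdfs = /user/hunk/workdir

  [hadoop_weblogs]
  vix.provider = hadoop-cluster-1
  # "..." makes Hunk search the directory tree recursively
  vix.input.1.path = /data/weblogs/...

A second provider stanza pointing at another NameNode and JobTracker is all that is needed to search a second cluster from the same search head.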
There are significant challenges with these approaches to asking and answering questions of data in Hadoop. Not shown is a less common option, spreadsheet-like interfaces, which raises its own problems: these are batch job builders with no interactive engine, and they use “spreadsheet-like” interfaces, not Microsoft Excel or Apple Numbers.
Hunk (Splunk Analytics for Hadoop) is a full-featured, integrated product that delivers interactive data exploration, analysis and visualization for Hadoop. Full-featured, integrated product: delivers interactive data exploration, analysis and visualization for Hadoop. Insights for everyone: empowers broader user groups to derive actionable insights from raw data in Hadoop. Works with what you have today: works with leading Hadoop distributions to maximize enterprise technology investments.
Hunk does not replace your Hadoop distribution: Hunk coexists with your Apache HDFS & MapReduce downloads or your Hortonworks, Cloudera, or MapR distribution. Hunk does not replace or require Splunk Enterprise: Hunk is a separate product designed for new use cases involving data in Hadoop. Iterative search, but not real time or needle-in-the-haystack searches: that’s Splunk Enterprise. No data ingest management: that’s handled by tools from Apache Hadoop or your Hadoop distribution vendor, or by Hadoop connectors from enterprise software or business intelligence vendors. Note: needle in a haystack means one-in-a-million searches.
Splunk Enterprise is a standalone solution and the industry-leading platform for machine data with all of Splunk’s core use cases. For customers who are storing historical data in Hadoop, we offer Hunk to run analytics on data stored natively in Hadoop. Hunk targets new use cases, including: data analytics for new product and service launches; synthesis of data from all customer touch points; comprehensive security analytics for modern threats; and easier big data app development than in raw Hadoop. Furthermore, you can use Splunk Hadoop Connect to send data between Splunk Enterprise and Hadoop. Many accounts may decide to buy Splunk Enterprise for real-time monitoring and real-time search together with Hunk for exploratory analytics of historical data stored in Hadoop. With this combination, you can run searches across native indexes in Splunk Enterprise and Hunk virtual indexes for data in Hadoop.
Hunk offers a rich developer platform and tool chain that includes a robust API and software development kits in Java, JavaScript, Python, PHP, C# and Ruby, enabling developer teams to rapidly build powerful big data applications. Activity on DEV.SPLUNK.COM highlights a strong developer community.
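As a concrete illustration of the SDKs, here is a minimal sketch using the Splunk SDK for Python (splunklib) to run a oneshot search against a virtual index; the connection details and index name are placeholders.

  import splunklib.client as client
  import splunklib.results as results

  # Connect to the Hunk search head's management port (placeholder host and credentials).
  service = client.connect(host="hunk.example.com", port=8089,
                           username="admin", password="changeme")

  # Run a blocking "oneshot" search over a hypothetical virtual index.
  stream = service.jobs.oneshot(
      "search index=hadoop_weblogs status=500 | stats count by host",
      earliest_time="-24h", latest_time="now")

  # Iterate over the results as they are parsed from the response stream.
  for event in results.ResultsReader(stream):
      if isinstance(event, dict):
          print(event)

The same search could be issued from the Java, JavaScript, PHP, C# or Ruby SDKs, or directly against the REST API shown earlier.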
What you’ll need to get started:
• Data in Hadoop to analyze
• Hadoop client libraries – from your Hadoop distribution vendor or from http://archive.apache.org/dist/hadoop/core/
• Hadoop access rights – Hunk requires permission to read from HDFS and run MapReduce jobs
• Java 1.6+
• HDFS scratch space – the amount depends on the size of the interim results; between 10 and 20 GB is common
• DataNode local temp disk space – at most 5 GB per DataNode
On the first search, MapReduce auto-populates the Splunk binaries. The orchestration process begins when Hunk copies the Hunk binary .tgz file to HDFS. Hunk supports both the MapReduce JobTracker and the YARN MapReduce Resource Manager. Each TaskTracker (called an ApplicationContainer in YARN) fetches the binary. The binary files expand in the specified location on each TaskTracker; the default location is configurable. TaskTrackers not involved in the first search will receive the Hunk binary in a subsequent search that involves those TaskTrackers. This process is one example of why Hunk needs some scratch space in HDFS and in the local file system of the TaskTrackers/DataNodes. Background on Hadoop: typically a Hadoop cluster has a single master and multiple worker nodes. The master node (also referred to as the NameNode) coordinates the reads and writes to worker nodes (also referred to as DataNodes). HDFS reliability is achieved by replicating the data across multiple machines; by default the replication factor is 3 and the chunk size is 64 MB. The JobTracker dispatches tasks to worker nodes (TaskTrackers) in the cluster. Priority is given to nodes that host the data upon which a task will operate; if the task cannot be run on that node, priority goes next to neighboring nodes (to minimize network traffic). Upon job completion, each worker node writes its own results locally and HDFS ensures replication across the cluster. HDFS = NameNode + DataNodes; MapReduce engine = JobTracker + TaskTrackers.
Search execution: the Hunk search head lists the contents of the directories in the virtual index. The search head filters directories and files based on the search and time range (partition pruning). The NameNode and JobTracker (MapReduce Resource Manager in YARN) read data from the MapReduce framework and feed it to the search process. The process computes file splits, then constructs and submits the MapReduce jobs. Hunk streams a few file splits from HDFS and processes them in the search head to provide quick previews. The search head consumes and merges the MapReduce results (providing incremental previews) while the MapReduce jobs kick off. The data nodes run a copy of splunkd to process the jobs and write the results to a working directory in HDFS. Final results are stored on the Hunk search head. Hunk utilizes the Splunk Search Processing Language, the industry-leading method for interactive data exploration across large, diverse data sets. There is no requirement to "understand" data up front. For customers of Splunk Enterprise, reuse your Search Processing Language knowledge and skill set on data stored in Hadoop. Any command whose output depends on the event input order would yield different results – Splunk Enterprise guarantees events to be delivered in descending time order; Hunk doesn’t. This is the reason transaction and localize do not work. We can see the results from the intermediate Hadoop map jobs streamed into the Splunk UI even before all the map jobs are finished, and once all the Hadoop maps are done processing, Splunk displays the full results. In essence, Splunk acts as the Hadoop reduce phase, and there is no need to use Hadoop for that phase.
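Partition pruning works because Hunk can be told how to read a time range out of the directory layout itself. As a hedged sketch (the et/lt settings below follow the documented vix.input.N.* pattern, but verify names and formats against your Hunk version), a virtual index over hourly directories such as /data/weblogs/2013/11/05/14/ might be configured like this, so directories outside the search time range are skipped before any MapReduce task is launched:

  [hadoop_weblogs]
  vix.provider = hadoop-cluster-1
  vix.input.1.path = /data/weblogs/...
  # Earliest time for each directory, extracted from the path components
  vix.input.1.et.regex = /data/weblogs/(\d+)/(\d+)/(\d+)/(\d+)/
  vix.input.1.et.format = yyyyMMddHH
  vix.input.1.et.offset = 0
  # Latest time is one hour (3600 seconds) after the extracted timestamp
  vix.input.1.lt.regex = /data/weblogs/(\d+)/(\d+)/(\d+)/(\d+)/
  vix.input.1.lt.format = yyyyMMddHH
  vix.input.1.lt.offset = 3600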
Before data is processed by Hunk, you can plug in your own data preprocessor. Preprocessors have to be written in Java and can transform the data before Hunk gets a chance to. Data preprocessors vary in complexity from simple translators (say, Avro to JSON) to full image/video/document processing. Hunk translates Avro to JSON; these translations happen on the fly and are not persisted.
Hunk applies structure at search time. It is designed for data exploration across large datasets – preview data and iterate quickly. There is no requirement to understand the data upfront, no limit to the number of results returned by Hadoop or the number of searches, and no brittle schema to maintain or update. Find patterns and trends across disparate data sets in a “grab bag” Hadoop cluster. Use the Search Processing Language or create data models and pivot. Unlike Splunk Enterprise, Hunk applies schema for all fields – including transactions and localizations – at search time.
MapReduce considerations: stats/chart/timechart/top and similar commands work well in a distributed environment – they MapReduce well. Time- and order-dependent commands don’t work well in a distributed environment – they don’t MapReduce well. For large summary indexes, consider a dedicated “summarizer” instance with plenty of CPU to execute search jobs; summary jobs won’t interfere with user searches, and it aggregates and stores the results away from the indexers. Report acceleration is not supported by Hunk 6.0 but may be supported in a future release.
Hunk starts the streaming and reporting modes concurrently. Streaming results show until the reporting results come in. This allows users to search interactively by pausing and refining queries. It is a major, unique advantage of Hunk compared to alternative approaches such as Hive or SQL on Hadoop, which require a fixed schema in an effort to speed up searches, while Hunk retains the combination of schema on the fly with results preview.
Pause or stop jobs in progress and revise queries interactively. We’re mindful of the resources we use in Hadoop. Pause in Hunk pauses in the search head; Hadoop jobs keep running until the TCP header runs out. If you abandon a search for more than 30 seconds, Hunk will kill the search.
There’s no one path to explore data. Preview results and refine your queries. Hunk applies normalization as it’s needed, for faster implementation and flexibility. Hunk supports the easy-to-use Splunk search processing language along with data models and pivot to provide multiple views into the same data. Find insights following a flexible, iterative workflow, going back and forth across the components at the speed of thought. I’ll touch on each of the components of the data workflow. Explore: explore and search data from one place with the powerful Search Processing Language (SPL), designed for data exploration across large datasets; preview data and iterate quickly, with no fixed schema and no requirement to “understand” data upfront. Analyze: easy-to-use interactive analytics for deep analysis, pattern detection, and finding anomalies, with over 100 statistical commands. Model: make unstructured data more valuable; a data model describes how underlying machine data is represented and accessed, defines hierarchical relationships, and enables a single authoritative view of the underlying raw data. Pivot: powerful analytics anyone can use; a drag-and-drop interface to easily build complex queries and reports, click to visualize chart types, and reports dynamically update. Visualize: interactive reporting and visualization of data; an interactive reports view, rapidly build advanced graphs and charts, generate visualizations on the fly, drill down to raw data in Hadoop, and an ODBC connector to 3rd-party data visualization software. Share: build, personalize and share custom dashboards and PDFs; combine multiple charts, views, reports and external data; set role and group access security for web dashboards; view and edit on any desktop, tablet or mobile device. And do all of this from one integrated platform for data in Hadoop.