Copyright © 2013 Splunk Inc.

Hunk: Technical Overview
Agenda
1. What is Hunk?
2. Powerful Developer Platform
3. Preparation
4. Connect Hunk to HDFS and MapReduce
5. Create Virtual Indexes
6. MapReduce as the Orchestration Framework
7. Search Data in Hadoop
8. Flexible, Iterative Workflow for Business Users

Explore, Analyze, Visualize Data in Hadoop
Unlock business value of data in Hadoop
• No fixed schema to search unstructured data
• Fast to learn instead of scarce skills
• Preview results while MapReduce jobs start
• Integrated: explore, analyze and visualize
• Easier app development than in raw Hadoop

Unmet Needs for Hadoop Analytics
OPTION 1: “Do it yourself” Hadoop / Pig
Problems
• Scarce skill sets to hire
• Need to know MapReduce
• Wait for slow jobs to finish
• Upfront schema (Pig)
• No interactive exploration
• No results preview
• No built-in visualization
• No granular authentication
• Slow time to value

OPTION 2: Hive or SQL on Hadoop
Problems
• Pre-defined fixed schema
• Need knowledge of data
• Miss data that “doesn’t fit”
• No results preview
• No built-in visualization
• No granular authentication
• Scarce skill sets to hire
• Slow time to value

OPTION 3: Extract to in-memory store
Problems
• Data too big to move
• Limited drill down to raw data
• No results preview
• Another data mart
• Expensive hardware

Integrated Analytics Platform for Hadoop Data
• Full-featured, integrated product: Explore, Analyze, Visualize, Dashboards, Share
• Insights for everyone
• Works with what you have today: Hadoop (MapReduce & HDFS)

About Hunk
Features: Hunk
Delivery Model: Licensed install
License Model: Size of Hadoop cluster (number of Hadoop DataNodes); Hunk does not require a Splunk Enterprise license
Trial License: Free for 60 days
Where Data is Stored and Read: HDFS or HDFS proprietary variants (MapR); needs read-only access to data
Supported Hadoop Distributions: Hortonworks, Cloudera, MapR and Pivotal
Indexes: Virtual Indexes
Supported Operating Systems: 64-bit Linux
Operations Management: Splunk App for HadoopOps
Data Ingest Management: HDFS API or Flume / Scribe / Sqoop (not managed by Hunk); Splunk Hadoop Connect between Splunk Enterprise and HDFS

What Hunk Does Not Do
1. Hunk does not replace your Hadoop distribution
2. Hunk does not replace or require Splunk Enterprise
3. Interactive but not real time or needle in haystack search
4. No data ingest management
5. No Hadoop operations management

Product Portfolio
• Real-time indexing and real-time search
• Ad hoc analytics of historical data in Hadoop
• App Dev & App Mgmt., IT Ops., Web Intelligence, Security & Compliance, Business Analytics
• Product and Service Analytics, Complete 360° Customer View, Security Analytics
• Developers building big data apps on top of Hadoop
• Splunk Apps
• Vibrant and passionate developer community
• Splunk Hadoop Connect

Powerful Developer Platform with Familiar Tools
• Add new UI components
• With known languages and frameworks: JavaScript, Java, Python, PHP, C#, Ruby
• Integrate into existing systems: API

Integration Methods
Dashboards and Views

User Interface Extensibility
• Interactive dashboards and user workflows
• Simple or advanced XML or REST API and SDKs
• Custom styling, behavior & visuals
• iframe embed
• Integrate Hunk charts, dashboards and query results into other applications
• Create workflows that trigger an action in an external system or use REST endpoints
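
As one illustration of the REST route, the sketch below builds (but does not send) an HTTP request that runs a search and streams the results as JSON. It assumes that Hunk exposes the Splunk REST export endpoint `/services/search/jobs/export` on the management port; the host name, port, credentials and index name are hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def build_export_request(base_url, username, password, spl):
    """Build (without sending) a POST request that runs a search and streams JSON."""
    # Assumption: the Splunk REST export endpoint is reachable under base_url.
    url = base_url.rstrip("/") + "/services/search/jobs/export"
    # Over REST, the query text starts with a command such as "search".
    body = urllib.parse.urlencode({"search": spl, "output_mode": "json"}).encode("utf-8")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )


# Hypothetical host, credentials and virtual index name.
request = build_export_request(
    "https://hunk.example.com:8089",
    "analyst",
    "changeme",
    "search index=apache_logs | stats count by status",
)
```

Passing `request` to `urllib.request.urlopen` would send it; an embedding application could then read the response line by line and feed the rows into its own charts.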

Preparation
1. What are your goals for analytics of data in Hadoop?
2. What are the potential use cases?
3. What is your Hadoop environment?
4. Who are the business and IT users?
5. What are your Hadoop access policies?

Hadoop Cluster

Prerequisites

• Data in Hadoop to analyze
• Hadoop client libraries
• Hadoop access rights
• Java 1.6+
• HDFS scratch space
• DataNode local temp disk space

Get Started
1. Set up virtual or physical 64-bit Linux server
2. Download and install Hunk software
3. Start Splunk > ./splunk/bin/splunk start
4. Follow instructions to install or update Hadoop client libraries and Java

Hunk Server
Explore, Analyze, Visualize, Dashboards, Share

splunkweb
• Web and Application server
• Python, AJAX, CSS, XSLT, XML

Interfaces: REST API, COMMAND LINE, ODBC (beta)

splunkd
• Search Head
• Virtual Indexes
• C++, Web Services

Hadoop Interface
• Hadoop Client Libraries
• JAVA

64-bit Linux OS

Hunk Uses Virtual Indexes

• Enables seamless use of almost the entire Splunk stack on data in Hadoop
• Automatically handles MapReduce
• Technology is patent pending
Examples of Virtual Indexes
[Figure: the Hunk Search Head maps virtual indexes to data in External System 1, External System 2 and External System 3]
• index = syslog (/home/syslog/…)
• index = apache_logs
• index = sensor_data
• index = twitter

Point at Hadoop Cluster

Specify basic properties about the Hadoop cluster

Hunk works with any compression method supported by HDFS (e.g., gzip, bzip2 or lzo)

Set Additional Parameters
Prepopulated fields save time and can be overwritten

Add more MapReduce settings

• Configuration files can be edited manually: indexes.conf, props.conf and transforms.conf
• No restart is necessary if working with .conf files.

Define Virtual Indexes and Paths
External Resource (e.g. hadoop.prod)
• Virtual Index (e.g. twitter)
• Virtual Index (e.g. sensor data)
• Virtual Index (e.g. Apache logs)

Specify Virtual Index and data paths, and optionally:

• Filter files or directories using a whitelist or blacklist
• Extract metadata or time range from paths
• Use props/transforms.conf to specify search time processing
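
The whitelist and blacklist filtering described above can be sketched in a few lines. This is an illustration of the idea only: the function name, the regular expressions and the file listing are hypothetical, and the deck does not state how Hunk evaluates the two lists internally.

```python
import re


def select_paths(paths, whitelist=None, blacklist=None):
    """Keep paths that match the whitelist (if given) and do not match the blacklist."""
    allow = re.compile(whitelist) if whitelist else None
    deny = re.compile(blacklist) if blacklist else None
    selected = []
    for path in paths:
        if allow and not allow.search(path):
            continue  # not whitelisted
        if deny and deny.search(path):
            continue  # explicitly blacklisted
        selected.append(path)
    return selected


# Hypothetical file listing under a virtual index's data path.
candidates = [
    "/data/apache/2013/06/10/access.log.gz",
    "/data/apache/2013/06/10/access.log.tmp",
    "/data/apache/2013/06/10/_SUCCESS",
]
kept = select_paths(candidates, whitelist=r"access\.log", blacklist=r"\.tmp$")
```

Here only the compressed access log survives: the temporary file is blacklisted and the marker file never matches the whitelist.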

Set Authentication and Access Control

• Splunk role-based access control
• No field-based access control
• LDAP/AD for authentication and group management
• Single sign on (tokens, certificates)

MapReduce as the Orchestration Framework
1. Copy splunkd binary (.tgz) from the Hunk Search Head to HDFS
2. Copy .tgz from HDFS to TaskTracker 1, TaskTracker 2 and TaskTracker 3
3. Expand in specified location on each TaskTracker
4. Receive binary in subsequent searches

Search Data in Hadoop
[Figure: search flow in five numbered steps between the Hunk Search Head and the External Resource (e.g. hadoop.prod)]
• Run a copy of splunkd to process
• Labels on the flow: JSON configs, MapReduce jobs, Tasks, working directory
• Cluster components: NameNode, JobTracker (MapReduce Resource Manager in YARN), three DataNode / TaskTracker nodes (Node in YARN), HDFS

Data Processing Pipeline
Stages, in order: Raw data (HDFS), Custom processing, stdin, Indexing pipeline, Search pipeline
• Custom processing: you can plug in data preprocessors, e.g. Apache Avro or format readers
• Indexing pipeline: Event breaking, Timestamping
• Search pipeline: Event typing, Lookups, Tagging, Search processors
Components: splunkd/C++, MapReduce/Java
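
To make the two indexing-pipeline stages concrete, here is a minimal sketch of event breaking and timestamping on raw lines. It is not the splunkd implementation; it assumes, for illustration only, that a new event starts at any line beginning with a `YYYY-MM-DD HH:MM:SS` timestamp.

```python
import re
from datetime import datetime

# Assumed timestamp shape at the start of each event (hypothetical).
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def break_events(lines):
    """Event breaking: a line that starts with a timestamp opens a new event."""
    events, current = [], []
    for line in lines:
        if TIMESTAMP.match(line) and current:
            events.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        events.append("\n".join(current))
    return events


def timestamp(event):
    """Timestamping: parse the leading timestamp, or return None if there is none."""
    match = TIMESTAMP.match(event)
    return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S") if match else None


raw_lines = [
    "2013-06-10 01:00:00 ERROR job failed",
    "  at com.example.Job.run",
    "2013-06-10 01:00:05 INFO retrying",
]
events = break_events(raw_lines)
times = [timestamp(event) for event in events]
```

The stack-trace line stays attached to the error that produced it, so the three raw lines become two events.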
Hunk Applies Schema on the Fly
• Structure applied at search time
• No brittle schema to work around
• Automatically find patterns and trends

Hunk applies schema for all fields – including transactions – at search time
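
A small sketch of what applying structure at search time means: the raw events are stored as plain text, and fields exist only once a pattern is applied to them at query time. The log lines, the pattern and the field names are hypothetical and are not taken from Hunk.

```python
import re

# Hypothetical raw events; no schema was declared when they were stored.
RAW_EVENTS = [
    '10.0.0.1 - - [10/Jun/2013:01:02:03 +0000] "GET /cart HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/Jun/2013:01:05:09 +0000] "POST /checkout HTTP/1.1" 500 87',
    "malformed line that does not fit the pattern",
]

# The "schema" is just a pattern chosen at query time.
ACCESS_LOG = re.compile(
    r'(?P<clientip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+)'
)


def extract_fields(raw):
    """Return the fields found in a raw event; non-matching events keep only _raw."""
    event = {"_raw": raw}
    match = ACCESS_LOG.match(raw)
    if match:
        event.update(match.groupdict())
    return event


events = [extract_fields(raw) for raw in RAW_EVENTS]
errors = [event for event in events if event.get("status") == "500"]
```

The malformed line is not rejected; it simply has no extracted fields, which is the difference from a fixed schema that would drop data that does not fit.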
Hunk Usage in HDFS

hdfs://<scratch_space_path>/
• bundles: Search Head bundles; keeps last 5 bundles
• packages: Hunk .tgz packages; no automatic cleanup
• dispatch/<sid>: Search scratch space; cleanup when sid is invalid

Search Optimization: Partition Pruning

• Most data types are stored in hierarchical directories
– Such as /<base_path>/<date>/<hour>/<hostname>/somefile.log

• You can instruct Hunk to extract fields and time ranges from a path
• Searches ignore directories that cannot possibly contain search results
– Such as time ranges outside of a defined range

Example time-based partition pruning
Search: index=hunk earliest_time="2013-06-10T01:00:00" latest_time="2013-06-10T02:00:00"
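
The pruning idea can be sketched as follows. The sketch assumes a path layout of `/<base_path>/<YYYYMMDD>/<HH>/<hostname>/<file>`; the exact date and hour formats, the regular expression and the function names are hypothetical, since the deck only shows the general directory shape.

```python
import re
from datetime import datetime, timedelta

# Assumed layout: /<base_path>/<YYYYMMDD>/<HH>/<hostname>/<file>
PARTITION = re.compile(r"/(?P<date>\d{8})/(?P<hour>\d{2})/(?P<host>[^/]+)/[^/]+$")


def partition_range(path):
    """Return the (start, end) time span a path can contain, or None if unknown."""
    match = PARTITION.search(path)
    if not match:
        return None
    start = datetime.strptime(match["date"] + match["hour"], "%Y%m%d%H")
    return start, start + timedelta(hours=1)


def prune(paths, earliest, latest):
    """Keep only paths whose time span overlaps [earliest, latest)."""
    kept = []
    for path in paths:
        span = partition_range(path)
        # Paths without a recognizable partition are kept so no results are lost.
        if span is None or (span[0] < latest and span[1] > earliest):
            kept.append(path)
    return kept


paths = [
    "/logs/20130610/00/web01/access.log",
    "/logs/20130610/01/web01/access.log",
    "/logs/20130610/02/web01/access.log",
]
selected = prune(paths, datetime(2013, 6, 10, 1), datetime(2013, 6, 10, 2))
```

For the one-hour search window in the example, only the `01` directory needs to be read; the other two are skipped before any MapReduce work is scheduled for them.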
Common Issues with Hunk Configuration
• User running Hunk lacks permission to write to HDFS or run MapReduce
• HDFS scratch space for Hunk is not writable
• DataNode or TaskTracker scratch space is not writable or out of disk
• Data reading permission issues

Search Performance with MapReduce
MapReduce considerations
• Stats/chart/timechart/top/etc. commands work well in a distributed environment
– They MapReduce well
• Time and order commands don’t work well in a distributed environment
– They don’t MapReduce well
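
Why counting-style commands distribute well can be shown with a toy map and reduce. In this sketch each block is counted independently and the partial counts are merged by addition, so the order in which the partial results arrive does not matter. The data and function names are hypothetical, and this is not how Hunk implements its commands.

```python
from collections import Counter
from functools import reduce

# Hypothetical events, split across three blocks as if stored on different DataNodes.
BLOCKS = [
    [{"status": "200"}, {"status": "500"}],
    [{"status": "200"}, {"status": "200"}],
    [{"status": "404"}],
]


def map_count_by(block, field):
    """Map phase: each node counts its own block independently."""
    return Counter(event[field] for event in block)


def reduce_counts(left, right):
    """Reduce phase: partial counts merge by addition, in any order."""
    return left + right


partials = [map_count_by(block, "status") for block in BLOCKS]
totals = reduce(reduce_counts, partials, Counter())
reversed_totals = reduce(reduce_counts, reversed(partials), Counter())
```

A computation that depends on global event order, such as comparing each event with the one before it, has no equivalent merge step, because the previous event may live in a different block.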

Summary Indexing
• Useful for speeding up searches
• Summaries could have different retention policy
• In most cases resides on the search head
• Backfill is a manual (scripted) process

Mixed-mode Search
Streaming
• Transfers first several blocks from HDFS to the Hunk Search Head for immediate processing

Reporting
• Pushes computation to the DataNodes and TaskTrackers for the complete search

• Hunk starts the streaming and reporting modes concurrently
• Streaming results show until the reporting results come in
• Allows users to search interactively by pausing and refining queries
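
The interplay of the two modes can be sketched with two tasks started at the same time: one computes a preview over the first few blocks, the other computes the complete answer. The thread pool here only stands in for the search head and the MapReduce job; the block contents, the `preview_blocks` parameter and the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical HDFS blocks, each a list of HTTP status values.
BLOCKS = [["200", "500"], ["200", "200"], ["404", "200"], ["500", "200"]]


def count_statuses(blocks):
    """Count status values across the given blocks."""
    counts = Counter()
    for block in blocks:
        counts.update(block)
    return counts


def mixed_mode_search(blocks, preview_blocks=2):
    """Start a preview over the first blocks and the full job at the same time."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        streaming = pool.submit(count_statuses, blocks[:preview_blocks])  # search head
        reporting = pool.submit(count_statuses, blocks)  # stands in for MapReduce
        preview = streaming.result()  # shown to the user first
        final = reporting.result()    # replaces the preview when the job finishes
    return preview, final


preview, final = mixed_mode_search(BLOCKS)
```

The preview is incomplete but already shows the shape of the answer, which is what lets a user pause and refine a query before the full job has finished.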

Interactively Question your Data in Hadoop

• Pause means stop fetching results from Hadoop
• Stop means treat the current results as final and kill the MapReduce job

Data Discovery Modes

Hunk supports almost all of the Search Processing Language (SPL), excluding the transaction and localize commands, which require Splunk Enterprise native indexes.
Flexible, Iterative Workflow for Business Users
Interactive Analytics: Explore, Analyze, Visualize, Share, Model, Pivot
• Preview results
• Normalization as it’s needed
• Faster implementation and flexibility
• Easy search language + data models & pivot
• Multiple views into the same data

Thank You

More Related Content

What's hot

Enabling Exploratory Analytics of Data in Shared-service Hadoop Clusters
Enabling Exploratory Analytics of Data in Shared-service Hadoop ClustersEnabling Exploratory Analytics of Data in Shared-service Hadoop Clusters
Enabling Exploratory Analytics of Data in Shared-service Hadoop ClustersDataWorks Summit
 
Splunk Architecture overview
Splunk Architecture overviewSplunk Architecture overview
Splunk Architecture overviewAlex Fok
 
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...Agile Testing Alliance
 
GPU 101: The Beast In Data Centers
GPU 101: The Beast In Data CentersGPU 101: The Beast In Data Centers
GPU 101: The Beast In Data CentersRommel Garcia
 
How To Achieve Real-Time Analytics On A Data Lake Using GPUs
How To Achieve Real-Time Analytics On A Data Lake Using GPUsHow To Achieve Real-Time Analytics On A Data Lake Using GPUs
How To Achieve Real-Time Analytics On A Data Lake Using GPUsKinetica
 
Solution Brief: Big Data Lab Accelerator
Solution Brief: Big Data Lab AcceleratorSolution Brief: Big Data Lab Accelerator
Solution Brief: Big Data Lab AcceleratorBlueData, Inc.
 
Interactive query in hadoop
Interactive query in hadoopInteractive query in hadoop
Interactive query in hadoopRommel Garcia
 
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop EcosystemWhy Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop EcosystemCloudera, Inc.
 
De-Bugging Hive with Hadoop-in-the-Cloud
De-Bugging Hive with Hadoop-in-the-CloudDe-Bugging Hive with Hadoop-in-the-Cloud
De-Bugging Hive with Hadoop-in-the-CloudDataWorks Summit
 
Overview of stinger interactive query for hive
Overview of stinger   interactive query for hiveOverview of stinger   interactive query for hive
Overview of stinger interactive query for hiveDavid Kaiser
 
Demystify Big Data Breakfast Briefing: Herb Cunitz, Hortonworks
Demystify Big Data Breakfast Briefing:  Herb Cunitz, HortonworksDemystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks
Demystify Big Data Breakfast Briefing: Herb Cunitz, HortonworksHortonworks
 
SplunkLive! Presentation - Data Onboarding with Splunk
SplunkLive! Presentation - Data Onboarding with SplunkSplunkLive! Presentation - Data Onboarding with Splunk
SplunkLive! Presentation - Data Onboarding with SplunkSplunk
 
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionTugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionCodemotion
 
SplunkLive! London: Splunk ninjas- new features and search dojo
SplunkLive! London: Splunk ninjas- new features and search dojoSplunkLive! London: Splunk ninjas- new features and search dojo
SplunkLive! London: Splunk ninjas- new features and search dojoSplunk
 
Check Point Big Data Forum m3
Check Point Big Data Forum m3Check Point Big Data Forum m3
Check Point Big Data Forum m3Alex Fok
 
Summer Shorts: Big Data Integration
Summer Shorts: Big Data IntegrationSummer Shorts: Big Data Integration
Summer Shorts: Big Data Integrationibi
 
Hadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to TezHadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to TezJan Pieter Posthuma
 
Hadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise CustomersHadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise CustomersDataWorks Summit/Hadoop Summit
 

What's hot (20)

Enabling Exploratory Analytics of Data in Shared-service Hadoop Clusters
Enabling Exploratory Analytics of Data in Shared-service Hadoop ClustersEnabling Exploratory Analytics of Data in Shared-service Hadoop Clusters
Enabling Exploratory Analytics of Data in Shared-service Hadoop Clusters
 
Splunk Architecture overview
Splunk Architecture overviewSplunk Architecture overview
Splunk Architecture overview
 
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
 
GPU 101: The Beast In Data Centers
GPU 101: The Beast In Data CentersGPU 101: The Beast In Data Centers
GPU 101: The Beast In Data Centers
 
How To Achieve Real-Time Analytics On A Data Lake Using GPUs
How To Achieve Real-Time Analytics On A Data Lake Using GPUsHow To Achieve Real-Time Analytics On A Data Lake Using GPUs
How To Achieve Real-Time Analytics On A Data Lake Using GPUs
 
Solution Brief: Big Data Lab Accelerator
Solution Brief: Big Data Lab AcceleratorSolution Brief: Big Data Lab Accelerator
Solution Brief: Big Data Lab Accelerator
 
Interactive query in hadoop
Interactive query in hadoopInteractive query in hadoop
Interactive query in hadoop
 
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop EcosystemWhy Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
 
De-Bugging Hive with Hadoop-in-the-Cloud
De-Bugging Hive with Hadoop-in-the-CloudDe-Bugging Hive with Hadoop-in-the-Cloud
De-Bugging Hive with Hadoop-in-the-Cloud
 
Intro to Big Data - Spark
Intro to Big Data - SparkIntro to Big Data - Spark
Intro to Big Data - Spark
 
Overview of stinger interactive query for hive
Overview of stinger   interactive query for hiveOverview of stinger   interactive query for hive
Overview of stinger interactive query for hive
 
Splunk Architecture
Splunk ArchitectureSplunk Architecture
Splunk Architecture
 
Demystify Big Data Breakfast Briefing: Herb Cunitz, Hortonworks
Demystify Big Data Breakfast Briefing:  Herb Cunitz, HortonworksDemystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks
Demystify Big Data Breakfast Briefing: Herb Cunitz, Hortonworks
 
SplunkLive! Presentation - Data Onboarding with Splunk
SplunkLive! Presentation - Data Onboarding with SplunkSplunkLive! Presentation - Data Onboarding with Splunk
SplunkLive! Presentation - Data Onboarding with Splunk
 
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionTugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
 
SplunkLive! London: Splunk ninjas- new features and search dojo
SplunkLive! London: Splunk ninjas- new features and search dojoSplunkLive! London: Splunk ninjas- new features and search dojo
SplunkLive! London: Splunk ninjas- new features and search dojo
 
Check Point Big Data Forum m3
Check Point Big Data Forum m3Check Point Big Data Forum m3
Check Point Big Data Forum m3
 
Summer Shorts: Big Data Integration
Summer Shorts: Big Data IntegrationSummer Shorts: Big Data Integration
Summer Shorts: Big Data Integration
 
Hadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to TezHadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to Tez
 
Hadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise CustomersHadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise Customers
 

Similar to SplunkLive! Hunk Technical Overview

Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...
Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...
Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...SpringPeople
 
Big Data Analytics Course Guide TOC
Big Data Analytics Course Guide TOCBig Data Analytics Course Guide TOC
Big Data Analytics Course Guide TOCManish Chopra
 
HariKrishna4+_cv
HariKrishna4+_cvHariKrishna4+_cv
HariKrishna4+_cvrevuri
 
Learn Hadoop at your Leisure time
Learn Hadoop at your Leisure time Learn Hadoop at your Leisure time
Learn Hadoop at your Leisure time Saantosh Rohera
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft PlatformJesus Rodriguez
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2Aswini Ashu
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2aswini pilli
 
9.-dados e processamento distribuido-hadoop.pdf
9.-dados e processamento distribuido-hadoop.pdf9.-dados e processamento distribuido-hadoop.pdf
9.-dados e processamento distribuido-hadoop.pdfManoel Ribeiro
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY pptsravya raju
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY pptsravya raju
 
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDYVenneladonthireddy1
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3tcloudcomputing-tw
 

Similar to SplunkLive! Hunk Technical Overview (20)

Hadoop ppt1
Hadoop ppt1Hadoop ppt1
Hadoop ppt1
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...
Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...
Best Practices for Administering Hadoop with Hortonworks Data Platform (HDP) ...
 
Big Data Analytics Course Guide TOC
Big Data Analytics Course Guide TOCBig Data Analytics Course Guide TOC
Big Data Analytics Course Guide TOC
 
Search On Hadoop
Search On HadoopSearch On Hadoop
Search On Hadoop
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
 
HariKrishna4+_cv
HariKrishna4+_cvHariKrishna4+_cv
HariKrishna4+_cv
 
Learn Hadoop at your Leisure time
Learn Hadoop at your Leisure time Learn Hadoop at your Leisure time
Learn Hadoop at your Leisure time
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft Platform
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2
 
9.-dados e processamento distribuido-hadoop.pdf
9.-dados e processamento distribuido-hadoop.pdf9.-dados e processamento distribuido-hadoop.pdf
9.-dados e processamento distribuido-hadoop.pdf
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
 

More from Splunk

.conf Go 2023 - Data analysis as a routine
.conf Go 2023 - Data analysis as a routine.conf Go 2023 - Data analysis as a routine
.conf Go 2023 - Data analysis as a routineSplunk
 
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTVSplunk
 
.conf Go 2023 - Navegando la normativa SOX (Telefónica)
.conf Go 2023 - Navegando la normativa SOX (Telefónica).conf Go 2023 - Navegando la normativa SOX (Telefónica)
.conf Go 2023 - Navegando la normativa SOX (Telefónica)Splunk
 
.conf Go 2023 - Raiffeisen Bank International
.conf Go 2023 - Raiffeisen Bank International.conf Go 2023 - Raiffeisen Bank International
.conf Go 2023 - Raiffeisen Bank InternationalSplunk
 
.conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett
.conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett .conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett
.conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett Splunk
 
.conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär)
.conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär).conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär)
.conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär)Splunk
 
.conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu...
.conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu....conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu...
.conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu...Splunk
 
.conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever...
.conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever....conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever...
.conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever...Splunk
 
.conf go 2023 - De NOC a CSIRT (Cellnex)
.conf go 2023 - De NOC a CSIRT (Cellnex).conf go 2023 - De NOC a CSIRT (Cellnex)
.conf go 2023 - De NOC a CSIRT (Cellnex)Splunk
 
conf go 2023 - El camino hacia la ciberseguridad (ABANCA)
conf go 2023 - El camino hacia la ciberseguridad (ABANCA)conf go 2023 - El camino hacia la ciberseguridad (ABANCA)
conf go 2023 - El camino hacia la ciberseguridad (ABANCA)Splunk
 
Splunk - BMW connects business and IT with data driven operations SRE and O11y
Splunk - BMW connects business and IT with data driven operations SRE and O11ySplunk - BMW connects business and IT with data driven operations SRE and O11y
Splunk - BMW connects business and IT with data driven operations SRE and O11ySplunk
 
Splunk x Freenet - .conf Go Köln
Splunk x Freenet - .conf Go KölnSplunk x Freenet - .conf Go Köln
Splunk x Freenet - .conf Go KölnSplunk
 
Splunk Security Session - .conf Go Köln
Splunk Security Session - .conf Go KölnSplunk Security Session - .conf Go Köln
Splunk Security Session - .conf Go KölnSplunk
 
Data foundations building success, at city scale – Imperial College London
 Data foundations building success, at city scale – Imperial College London Data foundations building success, at city scale – Imperial College London
Data foundations building success, at city scale – Imperial College LondonSplunk
 
Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen...
Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen...Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen...
Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen...Splunk
 
SOC, Amore Mio! | Security Webinar
SOC, Amore Mio! | Security WebinarSOC, Amore Mio! | Security Webinar
SOC, Amore Mio! | Security WebinarSplunk
 
.conf Go 2022 - Observability Session
.conf Go 2022 - Observability Session.conf Go 2022 - Observability Session
.conf Go 2022 - Observability SessionSplunk
 
.conf Go Zurich 2022 - Keynote
.conf Go Zurich 2022 - Keynote.conf Go Zurich 2022 - Keynote
.conf Go Zurich 2022 - KeynoteSplunk
 
.conf Go Zurich 2022 - Platform Session
.conf Go Zurich 2022 - Platform Session.conf Go Zurich 2022 - Platform Session
.conf Go Zurich 2022 - Platform SessionSplunk
 
.conf Go Zurich 2022 - Security Session
.conf Go Zurich 2022 - Security Session.conf Go Zurich 2022 - Security Session
.conf Go Zurich 2022 - Security SessionSplunk
 

More from Splunk (20)

.conf Go 2023 - Data analysis as a routine
.conf Go 2023 - Data analysis as a routine.conf Go 2023 - Data analysis as a routine
.conf Go 2023 - Data analysis as a routine
 
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
 
.conf Go 2023 - Navegando la normativa SOX (Telefónica)
.conf Go 2023 - Navegando la normativa SOX (Telefónica).conf Go 2023 - Navegando la normativa SOX (Telefónica)
.conf Go 2023 - Navegando la normativa SOX (Telefónica)
 
.conf Go 2023 - Raiffeisen Bank International
.conf Go 2023 - Raiffeisen Bank International.conf Go 2023 - Raiffeisen Bank International
.conf Go 2023 - Raiffeisen Bank International
 
.conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett
.conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett .conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett
.conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett
 
.conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär)
.conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär).conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär)
.conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär)
 
.conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu...
.conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu....conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu...
.conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu...
 
.conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever...
.conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever....conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever...
.conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever...
 
.conf go 2023 - De NOC a CSIRT (Cellnex)
.conf go 2023 - De NOC a CSIRT (Cellnex).conf go 2023 - De NOC a CSIRT (Cellnex)
.conf go 2023 - De NOC a CSIRT (Cellnex)
 
conf go 2023 - El camino hacia la ciberseguridad (ABANCA)
conf go 2023 - El camino hacia la ciberseguridad (ABANCA)conf go 2023 - El camino hacia la ciberseguridad (ABANCA)
conf go 2023 - El camino hacia la ciberseguridad (ABANCA)
 
Splunk - BMW connects business and IT with data driven operations SRE and O11y
Splunk - BMW connects business and IT with data driven operations SRE and O11ySplunk - BMW connects business and IT with data driven operations SRE and O11y
Splunk - BMW connects business and IT with data driven operations SRE and O11y
 
Splunk x Freenet - .conf Go Köln
Splunk x Freenet - .conf Go KölnSplunk x Freenet - .conf Go Köln
Splunk x Freenet - .conf Go Köln
 
Splunk Security Session - .conf Go Köln
Splunk Security Session - .conf Go KölnSplunk Security Session - .conf Go Köln
Splunk Security Session - .conf Go Köln
 
Data foundations building success, at city scale – Imperial College London
 Data foundations building success, at city scale – Imperial College London Data foundations building success, at city scale – Imperial College London
Data foundations building success, at city scale – Imperial College London
 
Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen...
Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen...Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen...
Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen...
 
SOC, Amore Mio! | Security Webinar
SOC, Amore Mio! | Security WebinarSOC, Amore Mio! | Security Webinar
SOC, Amore Mio! | Security Webinar
 
.conf Go 2022 - Observability Session
.conf Go 2022 - Observability Session.conf Go 2022 - Observability Session
.conf Go 2022 - Observability Session
 
.conf Go Zurich 2022 - Keynote
.conf Go Zurich 2022 - Keynote.conf Go Zurich 2022 - Keynote
.conf Go Zurich 2022 - Keynote
 
.conf Go Zurich 2022 - Platform Session
.conf Go Zurich 2022 - Platform Session.conf Go Zurich 2022 - Platform Session
.conf Go Zurich 2022 - Platform Session
 
.conf Go Zurich 2022 - Security Session
.conf Go Zurich 2022 - Security Session.conf Go Zurich 2022 - Security Session
.conf Go Zurich 2022 - Security Session
 

SplunkLive! Hunk Technical Overview

  • 1. Copyright © 2013 Splunk Inc. Hunk: Technical Overview
  • 2. Agenda: 1. What is Hunk? 2. Powerful Developer Platform 3. Preparation 4. Connect Hunk to HDFS and MapReduce 5. Create Virtual Indexes 6. MapReduce as the Orchestration Framework 7. Search Data in Hadoop 8. Flexible, Iterative Workflow for Business Users
  • 3. Explore, Analyze, Visualize Data in Hadoop • Unlock business value of data in Hadoop • No fixed schema to search unstructured data • Fast to learn instead of scarce skills • Preview results while MapReduce jobs start • Integrated: explore, analyze and visualize • Easier app development than in raw Hadoop
  • 4. Unmet Needs for Hadoop Analytics
      – Option 1: “Do it yourself” Hadoop / Pig. Problems: Scarce skill sets to hire; Need to know MapReduce; Wait for slow jobs to finish; Upfront schema (Pig); No interactive exploration; No results preview; No built-in visualization; No granular authentication; Slow time to value
      – Option 2: Hive or SQL on Hadoop. Problems: Pre-defined fixed schema; Need knowledge of data; Miss data that “doesn’t fit”; No results preview; No built-in visualization; No granular authentication; Scarce skill sets to hire; Slow time to value
      – Option 3: Extract to in-memory store. Problems: Data too big to move; Limited drill down to raw data; No results preview; Another data mart; Expensive hardware
  • 5. Integrated Analytics Platform for Hadoop Data • Full-featured, Integrated Product: Explore, Analyze, Visualize, Dashboards, Share • Insights for Everyone • Works with What You Have Today: Hadoop (MapReduce & HDFS)
  • 6. About Hunk (Features: Hunk)
      – Delivery Model: Licensed install
      – License Model: Size of Hadoop cluster (number of Hadoop DataNodes); Hunk does not require a Splunk Enterprise license
      – Trial License: Free for 60 days
      – Where Data is Stored and Read: HDFS or HDFS proprietary variants (MapR); needs read-only access to data
      – Supported Hadoop Distributions: Hortonworks, Cloudera, MapR and Pivotal
      – Indexes: Virtual Indexes
      – Supported Operating Systems: 64-bit Linux
      – Operations Management: Splunk App for HadoopOps
      – Data Ingest Management: HDFS API or Flume / Scribe / Sqoop (not managed by Hunk); Splunk Hadoop Connect between Splunk Enterprise and HDFS
  • 7. What Hunk Does Not Do: 1. Hunk does not replace your Hadoop distribution 2. Hunk does not replace or require Splunk Enterprise 3. Interactive, but not real-time or needle-in-a-haystack search 4. No data ingest management 5. No Hadoop operations management
  • 8. Product Portfolio: Real-time indexing; Real-time search; App Dev & App Mgmt.; Ad hoc analytics of historical data in Hadoop; IT Ops.; Web Intelligence; Security & Compliance; Business Analytics; Product and Service Analytics; Complete 360° Customer View; Security Analytics; Developers building big data apps on top of Hadoop; Splunk Apps; Vibrant and passionate developer community; Splunk Hadoop Connect
  • 9. Powerful Developer Platform with Familiar Tools • Add New UI components • Integrate into Existing Systems • With Known Languages and Frameworks: JavaScript, Java, Python, PHP, C#, Ruby • API
  • 10. Integration Methods: Dashboards and Views; User Interface Extensibility • Interactive dashboards and user workflows • Simple or advanced XML or REST API and SDKs • Custom styling, behavior & visuals • iframe embed • Integrate Hunk charts, dashboards and query results into other applications • Create workflows that trigger an action in an external system or use REST endpoints
  • 11. Preparation (Hadoop Cluster): 1. Who are the business and IT users? 2. What are your goals for analytics of data in Hadoop? 3. What are the potential use cases? 4. What is your Hadoop environment? 5. What are your Hadoop access policies?
  • 12. Prerequisites: Data in Hadoop to analyze; Hadoop client libraries; Hadoop access rights; Java 1.6+; HDFS scratch space; DataNode local temp disk space
  • 13. Get Started: 1. Set up virtual or physical 64-bit Linux server 2. Download and install Hunk software 3. Start Splunk: ./splunk/bin/splunk start 4. Follow instructions to install or update Hadoop client libraries and Java
  • 14. Hunk Server (64-bit Linux OS): Explore, Analyze, Visualize, Dashboards, Share
      – Interfaces: REST API, command line, ODBC (beta)
      – splunkweb: Web and Application server; Python, AJAX, CSS, XSLT, XML
      – splunkd: Search Head; Virtual Indexes; C++, Web Services
      – Hadoop Interface: Hadoop Client Libraries; Java
  • 15. Hunk Uses Virtual Indexes • Enables seamless use of almost the entire Splunk stack on data in Hadoop • Automatically handles MapReduce • Technology is patent pending
  • 16. Examples of Virtual Indexes: a Hunk Search Head in front of External Systems 1, 2 and 3, with virtual indexes such as index = syslog (/home/syslog/…), index = apache_logs, index = sensor_data and index = twitter
  • 17. Point at Hadoop Cluster: Specify basic properties about the Hadoop cluster. Hunk works with any compression method supported by HDFS (e.g., gzip, bzip or lzo).
  • 18. Set Additional Parameters: Prepopulated fields save time and can be overwritten. Add more MapReduce settings. • Configuration files can be edited manually: indexes.conf, props.conf and transforms.conf • No restart is necessary if working with .conf files.
  • 19. Define Virtual Indexes and Paths: one External Resource (e.g. hadoop.prod) can back several Virtual Indexes (e.g. twitter, sensor data, Apache logs). Specify Virtual Index and data paths, and optionally: • Filter files or directories using a whitelist or blacklist • Extract metadata or time range from paths • Use props/transforms.conf to specify search time processing
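
The optional path filtering and path metadata extraction on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not Hunk's implementation or configuration syntax: the function names, the sample paths and the regular expressions are all assumptions made for the example.

```python
import re

def select_files(paths, whitelist=None, blacklist=None):
    """Keep paths that match the whitelist (if given) and do not match the blacklist."""
    allow = re.compile(whitelist) if whitelist else None
    deny = re.compile(blacklist) if blacklist else None
    selected = []
    for path in paths:
        if allow and not allow.search(path):
            continue  # not whitelisted
        if deny and deny.search(path):
            continue  # explicitly blacklisted
        selected.append(path)
    return selected

def extract_path_fields(path, pattern):
    """Pull named fields (for example date and host) out of a directory path."""
    match = re.search(pattern, path)
    return match.groupdict() if match else {}

# Hypothetical directory layout: /logs/apache/<date>/<host>/<file>
paths = [
    "/logs/apache/2013-06-10/web01/access.log.gz",
    "/logs/apache/2013-06-10/web01/access.log.tmp",
    "/logs/apache/_archive/old.log.gz",
]
selected = select_files(paths, whitelist=r"\.gz$", blacklist=r"/_archive/")
fields = extract_path_fields(
    selected[0], r"/(?P<date>\d{4}-\d{2}-\d{2})/(?P<host>[^/]+)/"
)
print(selected, fields)
```

Fields recovered from the path this way come for free: no file has to be opened to learn the date or host of the data it holds.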
  • 20. Set Authentication and Access Control • Splunk role-based access control • No field-based access control • LDAP/AD for authentication and group management • Single sign on (tokens, certificates)
  • 21. MapReduce as the Orchestration Framework: 1. Copy splunkd binary (.tgz) from the Hunk Search Head to HDFS 2. Copy .tgz to the TaskTrackers (TaskTracker 1, 2, 3) 3. Expand in specified location on each TaskTracker 4. Receive binary in subsequent searches
  • 22. Search Data in Hadoop (diagram with steps 1 to 5). Components: Hunk Search Head; External Resource (e.g. hadoop.prod); NameNode; JobTracker (MapReduce Resource Manager in YARN); three DataNode / TaskTracker machines (Node in YARN); HDFS working directory. Labels: JSON configs; MapReduce jobs; Tasks; Run a copy of splunkd to process
  • 23. Data Processing Pipeline: Raw data (HDFS) → Custom processing → stdin → Indexing pipeline (Event breaking, Timestamping) → Search pipeline (Event typing, Lookups, Tagging, Search processors). You can plug in data preprocessors, e.g. Apache Avro or format readers. Layers: MapReduce/Java; splunkd/C++
  • 24. Hunk Applies Schema on the Fly • Structure applied at search time • No brittle schema to work around • Automatically find patterns and trends. Hunk applies schema for all fields, including transactions, at search time
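
As a rough illustration of "structure applied at search time", the sketch below stores events as raw text and derives key=value fields only when they are read. It is a toy stand-in for search-time field extraction in general, not Hunk's extraction logic; the pattern, the function name and the sample events are assumptions made for the example.

```python
import re

# Matches key=value pairs; values may be bare tokens or double-quoted strings.
KV_PATTERN = re.compile(r'(\w+)=("[^"]*"|\S+)')

def extract_fields(raw_event):
    """Derive fields from a raw event at read time; nothing is declared up front."""
    fields = {}
    for key, value in KV_PATTERN.findall(raw_event):
        fields[key] = value.strip('"')
    return fields

# Hypothetical raw events with different shapes; no fixed schema is required.
events = [
    '2013-06-10T01:15:00 action=purchase user=alice amount=42',
    '2013-06-10T01:16:30 action=login user="bob smith"',
]
rows = [extract_fields(event) for event in events]
print(rows)
```

The second event has no amount field, and nothing breaks: each event simply yields the fields it contains.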
  • 25. Hunk Usage in HDFS, under hdfs://<scratch_space_path>/
      – bundles: Search Head bundles; keeps last 5 bundles
      – packages: Hunk .tgz packages; no automatic cleanup
      – dispatch/<sid>: Search scratch space; cleanup when sid is invalid
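
The "keeps last 5 bundles" policy is a keep-the-newest-N retention rule. The sketch below shows that rule on an in-memory listing; it does not talk to HDFS, and the tuple layout and names are assumptions made for the example. An administrator could apply the same logic by hand to the packages directory, which the slide says is not cleaned up automatically.

```python
def bundles_to_delete(bundles, keep=5):
    """Given (name, modification_time) pairs, return the names outside the newest `keep`."""
    newest_first = sorted(bundles, key=lambda bundle: bundle[1], reverse=True)
    return [name for name, _ in newest_first[keep:]]

# Hypothetical listing: seven bundles with increasing modification times.
bundles = [(f"bundle-{i}", i) for i in range(1, 8)]
stale = bundles_to_delete(bundles)
print(stale)
```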
  • 26. Search Optimization: Partition Pruning • Most data types are stored in hierarchical directories – Such as /<base_path>/<date>/<hour>/<hostname>/somefile.log • You can instruct Hunk to extract fields and time ranges from a path • Searches ignore directories that cannot possibly contain search results – Such as time ranges outside of a defined range. Example time-based partition pruning search: index=hunk earliest_time="2013-06-10T01:00:00" latest_time="2013-06-10T02:00:00"
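
Time-based partition pruning can be sketched as a filter over directory paths that runs before any file is read. The example assumes a layout of <base>/<YYYY-MM-DD>/<HH>/<hostname>/<file>, which mirrors the pattern on the slide; the function name and sample paths are hypothetical, and the real pruning is driven by Hunk's virtual index configuration rather than by code like this.

```python
from datetime import datetime, timedelta

def prune_partitions(paths, earliest, latest):
    """Keep only paths whose <date>/<hour> directory overlaps [earliest, latest)."""
    kept = []
    for path in paths:
        parts = path.strip("/").split("/")
        # Assumed layout: <base>/<YYYY-MM-DD>/<HH>/<hostname>/<file>
        date_str, hour_str = parts[-4], parts[-3]
        start = datetime.strptime(f"{date_str} {hour_str}", "%Y-%m-%d %H")
        end = start + timedelta(hours=1)
        # An hourly directory qualifies only if its interval overlaps the search window.
        if start < latest and end > earliest:
            kept.append(path)
    return kept

paths = [
    "/data/2013-06-10/00/web01/access.log",
    "/data/2013-06-10/01/web01/access.log",
    "/data/2013-06-10/01/web02/access.log",
    "/data/2013-06-10/02/web01/access.log",
]
earliest = datetime(2013, 6, 10, 1, 0, 0)
latest = datetime(2013, 6, 10, 2, 0, 0)
selected = prune_partitions(paths, earliest, latest)
print(selected)
```

With the one-hour window from the slide's example search, only the two files under the 01 directory remain, so the other directories never need to be scanned.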
  • 27. Common Issues with Hunk Configuration • User running Hunk lacks permission to write to HDFS or run MapReduce • HDFS scratch space for Hunk is not writable • DataNode or TaskTracker scratch space is not writable or out of disk • Data reading permission issues
  • 28. Search Performance with MapReduce
      – MapReduce considerations: Stats/chart/timechart/top/etc. commands work well in a distributed environment (they MapReduce well). Time and order commands don’t work well in a distributed environment (they don’t MapReduce well).
      – Summary Indexing: • Useful for speeding up searches • Summaries could have different retention policy • In most cases resides on the search head • Backfill is a manual (scripted) process
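
One way to see why stats-style commands "MapReduce well" is that a count by field can be computed independently per input split and then merged, because addition does not care about order. The sketch below mimics a count-by-status aggregation with Python counters. It is a conceptual model only: the function names and sample events are assumptions, and no MapReduce job is involved.

```python
from collections import Counter
from functools import reduce

def map_split(events):
    """Map phase: partial count by status for one input split."""
    return Counter(event["status"] for event in events)

def reduce_counts(partials):
    """Reduce phase: merge partial counts; the merge order does not matter."""
    return reduce(lambda left, right: left + right, partials, Counter())

# Hypothetical events spread over three splits.
splits = [
    [{"status": 200}, {"status": 404}],
    [{"status": 200}, {"status": 200}],
    [{"status": 500}],
]
partials = [map_split(split) for split in splits]
total = reduce_counts(partials)
print(total)
```

Commands that depend on global time or event order have no such independent partial result, which is consistent with the slide's note that they distribute poorly.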
  • 29. Mixed-mode Search
      – Streaming: Transfers first several blocks from HDFS to the Hunk Search Head for immediate processing
      – Reporting: Pushes computation to the DataNodes and TaskTrackers for the complete search
      – Hunk starts the streaming and reporting modes concurrently
      – Streaming results show until the reporting results come in
      – Allows users to search interactively by pausing and refining queries
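
The mixed-mode behaviour can be modelled as two computations started together: a cheap one over the first few blocks that yields a preview, and a complete one whose result replaces the preview. The sketch below uses two threads on in-memory data. It is an analogy rather than Hunk's mechanism (where the complete search runs as MapReduce on the cluster), and all names and sample data are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def count_matches(blocks, term):
    """Count events containing `term` across the given blocks."""
    return sum(1 for block in blocks for event in block if term in event)

def mixed_mode_search(blocks, term, preview_blocks=2):
    """Start the preview and the full search together; yield the preview, then the final result."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        streaming = pool.submit(count_matches, blocks[:preview_blocks], term)
        reporting = pool.submit(count_matches, blocks, term)
        yield "preview", streaming.result()  # shown while the full search runs
        yield "final", reporting.result()    # supersedes the preview

# Hypothetical data: three blocks of raw events.
blocks = [
    ["ERROR disk full", "INFO started"],
    ["ERROR timeout"],
    ["INFO ok", "ERROR retry", "ERROR failed"],
]
results = list(mixed_mode_search(blocks, "ERROR"))
print(results)
```

A caller that stops consuming after the preview corresponds loosely to a user pausing to refine the query before the complete result arrives.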
  • 30. Interactively Question your Data in Hadoop: Pause means stop fetching results from Hadoop. Stop means treat the current results as final and kill the MapReduce job.
  • 31. Data Discovery Modes: Hunk supports almost all of the Search Processing Language (SPL), excluding Transactions and Localize, which require Splunk Enterprise native indexes.
  • 32. Flexible, Iterative Workflow for Business Users: Interactive Analytics • Preview results • Normalization as it’s needed • Faster implementation and flexibility • Easy search language + data models & pivot • Multiple views into the same data. Diagram labels: Explore, Analyze, Visualize, Share, Model, Pivot