© Cloudera, Inc. All rights reserved.
Supercharge Splunk with Cloudera
1,000,000,000,000+
[ events per day ]
Challenges with Splunk Today
• Explosion of Data: Splunk cannot cost-effectively scale to the volume and variety of modern data
• Limited Enterprise Visibility: Only a partial view of the enterprise limits analytics and slows decisions
• Limited Analytic Processing: Difficult to deploy custom advanced machine learning capabilities
[Figure: data access (1%, 50%, 100%) and data volume (1 TB, 1 PB, 10 PB) over time; rule logic IF (X) AND (Y) THEN (Z); data sources: user, network, endpoint, archived data, emerging data]
Advantages of Cloudera over Splunk
Cloud-Native & On-Premise
Go Beyond Splunk’s SPL
• Share enriched data across
multiple analytic processing
engines
• Simple search, SQL, Python,
R, Scala
Data Flexibility
• Faster, more agile, full-fidelity data acquisition
• Data portability: Open data
model and open storage
Cost-Effective Scalability
• Elastic scale on-prem or in
the cloud
• Cloud-native pay-per-use
and transience
• Proven at big data scale
Hybrid
• Runs across multi-clouds &
on-prem
• Multi-storage over S3, HDFS, Kudu, Isilon, etc.
Optimizing Splunk with Cloudera
[Diagram: Cloudera Data Hub (on premise or cloud)]
• Applications: Packaged, Open Source, Custom
• Analytic Processing (Spark, Impala, Solr)
• Management, Governance, Security (Cloudera Manager, Cloudera Navigator)
• Data and Analytic Management
• Apache Spot Open Data Models (HDFS, HBase, Kudu)
• Ingestion (Kafka, Flume, StreamSets)
• Sources: Splunk, Servers, Threat Intelligence, Network, User, Endpoint
Support multiple workloads with community-defined Open Data Models
[Figure: diverse data sources (endpoint, user, network) with single access]
Source: Momentum Partners Cybersecurity Snapshot April 2016
Many applications on one shared data set and architecture
• Packaged: Visualization and machine learning applications can share a common data set and infrastructure
• Open Source: The Spot community is developing machine learning (e.g., network threat detection)
• Custom: Build custom applications and analytics using Cloudera without having to buy new infrastructure
When to Use Cloudera vs Splunk
Cloudera
Best for:
• Self-service exploratory analytics
• Machine learning
• Long term archive
• Custom data streams
Benefits:
• Faster performance over large amounts of data
• Complete analytic flexibility (search, SQL,
statistical, and machine learning)
• Cost-effective scale
• Open data models and open data storage
Splunk
Best for:
• Workflow management
• Using pre-packaged rules
• Hot data management
• Preconfigured connectors
Benefits:
• Optimized for specific use cases
• Existing rules and connectors built out
• Quickly query hot data for simple questions
• Proprietary data format and data storage
optimized for their applications
Two starter use cases for Splunk Optimization with Cloudera
Getting Started
Potential Goals
Demonstrate Splunk optimization
✓ Install and configure Cloudera clusters (cloud or on-prem)
✓ Install and configure Apache Spot Open Data Models
✓ Build ingest adapters for Splunk to Apache Spot
✓ Build visualization dashboard that delivers some subset of optics currently defined in Splunk
Establish Cloudera data hub
Provide analytic foundations
✓ Build IT and cybersecurity analytics platform on the Apache Spot Open Data Model (ODM)
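As a rough illustration of the "ingest adapter" goal, the sketch below maps one exported Splunk event to a flat record. It assumes events arrive as one JSON object per line carrying Splunk's default fields (`_time`, `host`, `sourcetype`, `_raw`) and that `_time` is an ISO 8601 string. The function name `to_odm_record` and the output column names are hypothetical; they are not the actual Apache Spot ODM schema.

```python
import json
from datetime import datetime, timezone


def to_odm_record(splunk_json_line: str) -> dict:
    """Map one exported Splunk event (a JSON object per line) to a flat record.

    The output column names are illustrative only, not the Apache Spot ODM schema.
    """
    event = json.loads(splunk_json_line)
    # Normalize the event time to UTC so records from all sources line up.
    event_time = datetime.fromisoformat(event["_time"]).astimezone(timezone.utc)
    return {
        "event_time": event_time.isoformat(),
        "host": event.get("host", ""),
        "source_type": event.get("sourcetype", ""),
        "raw": event.get("_raw", ""),
    }


# Hypothetical sample event for demonstration.
sample = json.dumps({
    "_time": "2017-01-31T12:00:00+00:00",
    "host": "web01",
    "sourcetype": "syslog",
    "_raw": "Jan 31 12:00:00 web01 sshd[42]: Failed password",
})
print(to_odm_record(sample))
```

A real adapter would also need schema validation and a write path into the chosen ingestion layer, which this sketch leaves out.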
2 potential starting places…
1. Splunk Cost Tuning
2. Context Enrichment and Increased Visibility
Splunk Cost Tuning
• Identify where the enterprise wants to optimize cost: ingest/indexing or storage
• Offload event data from the heavy forwarder to reduce long-term storage and ingest/indexing costs
• Keep enough data in Splunk to power dashboards, with long-term analytics in Cloudera. Enable flexible analytics:
• Search
• SQL
• Machine Learning (Python, Scala, R)
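The split described in the bullets above can be sketched as a simple routing rule. This is a minimal illustration, not a Splunk or Cloudera API: the `Event` type, the `route` function, the 30-day hot window, and the set of dashboard sources are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Event:
    source: str          # e.g. "network", "user", "endpoint"
    timestamp: datetime  # event time, timezone-aware
    payload: str


# Hypothetical policy values; tune them to the dashboards Splunk must keep powering.
HOT_WINDOW = timedelta(days=30)
DASHBOARD_SOURCES = frozenset({"network", "endpoint"})


def route(event: Event, now: datetime) -> list[str]:
    """Return the destinations for one event.

    Every event goes to Cloudera (full-fidelity, long-term analytics).
    Only recent events from dashboard-backing sources are also sent to Splunk.
    """
    destinations = ["cloudera"]
    is_hot = now - event.timestamp <= HOT_WINDOW
    if is_hot and event.source in DASHBOARD_SOURCES:
        destinations.append("splunk")
    return destinations


now = datetime(2017, 1, 31, tzinfo=timezone.utc)
recent = Event("network", now - timedelta(days=1), "flow record")
old = Event("network", now - timedelta(days=90), "flow record")
print(route(recent, now))  # ['cloudera', 'splunk']
print(route(old, now))     # ['cloudera']
```

Sending every event to Cloudera while filtering what reaches Splunk is one way to cut ingest/indexing volume without losing the long-term record; where the filter actually runs depends on the deployment.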
[Diagram labels: Sources (Threat Intelligence, Network, User, Endpoint); Splunk Heavy Forwarder; Splunk Storage; Cloudera Data Hub (on premise or cloud) with Ingestion (Kafka, Flume, StreamSets), Apache Spot Open Data Models (HDFS, HBase, Kudu), Data and Analytic Management, Analytic Processing (Spark, Impala, Solr), Management, Governance, Security (Cloudera Manager, Cloudera Navigator), and Packaged, Open Source, and Custom applications]
Context Enrichment and Increased Visibility
• Load events and context sources into the EDH, landing them in Apache Spot’s Open Data Model
• Enrich and enhance events with additional context in the ODM
• Keep enough data in Splunk to power dashboards, with long-term analytics in Cloudera. Enable flexible analytics:
• Search
• SQL
• Machine Learning (Python, Scala, R)
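The enrichment step above amounts to joining each event with a context source. The sketch below shows the idea with an in-memory lookup. The asset inventory, the `src_ip` join key, the `ctx_` prefix, and the `enrich` function are all assumptions made for illustration; none of them come from the Apache Spot ODM.

```python
from typing import Mapping

# Hypothetical context source: asset inventory keyed by IP address.
ASSET_CONTEXT = {
    "10.0.0.5": {"owner": "alice", "asset_type": "laptop"},
    "10.0.0.9": {"owner": "build-svc", "asset_type": "server"},
}


def enrich(event: Mapping[str, str],
           context: Mapping[str, Mapping[str, str]]) -> dict:
    """Return a copy of the event with context fields merged in.

    Context fields are prefixed with "ctx_" so they never overwrite raw event fields.
    Events with no matching context come back unchanged (as a copy).
    """
    enriched = dict(event)
    for key, value in context.get(event.get("src_ip", ""), {}).items():
        enriched[f"ctx_{key}"] = value
    return enriched


event = {"src_ip": "10.0.0.5", "action": "login_failed"}
print(enrich(event, ASSET_CONTEXT))
# {'src_ip': '10.0.0.5', 'action': 'login_failed', 'ctx_owner': 'alice', 'ctx_asset_type': 'laptop'}
```

At cluster scale the same join would more likely be expressed in SQL or Spark over tables in the data hub; the dictionary lookup here only shows the shape of the operation.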
[Diagram labels: Sources (Threat Intelligence, Network, User, Endpoint); Splunk Heavy Forwarder; Splunk Indexer; Splunk Storage; Apache Spot Algorithms; Cloudera Data Hub (on premise or cloud) with Ingestion (Kafka, Flume, StreamSets), Apache Spot Open Data Models (HDFS, HBase, Kudu), Data and Analytic Management, Analytic Processing (Spark, Impala, Solr), Management, Governance, Security (Cloudera Manager, Cloudera Navigator), and Packaged, Open Source, and Custom applications]
Q&A
Thank You
