We all know that Splunk software is designed to handle streaming work flows and scale out very linearly. Infrastructure needs to be able to grow with Splunk’s scalability while being smart enough to take care of itself. There are multiple ways to deploy storage to support your Splunk environment but there are a few recommended practices to keep in mind as you begin planning your Splunk deployment to avoid the pitfalls of traditional DAS infrastructure. From flashing your home path, to ice cold data lakes, + converged solutions, we will cover what you need to know to avoid the common challenges of traditional IT when it comes to big data workloads.
6. 6
Enter SPLUNK ENTERPRISE
Enterprise
Scalability
Search &
Investigation
Proactive
Monitoring
Operational
Visibility
Real-time
Business
Insights
Operational IntelligenceAny Machine Data
INDUSTRY-LEADINGPLATFORMFORMACHINE DATA
Online
Services
Web
Services
Servers
Security
GPS
Location
Storage Desktops
Networks
Packaged
Applications
Custom
Applications
Messaging
Telecoms
Online
Shopping
Cart
Web
Clickstreams
Databases
Energy
Meters
Call Detail
Records
Smartphones
and Devices
RFID
Datacenter
Private
Cloud
Public
Cloud
7. 7
SPLUNK ARCHITECTURE
Search Heads
Query information across indexers
and are usually CPU and memory
intensive.
Indexers
Write data to disk and are both CPU
and I/O intensive.
Forwarders
Collect and forward data; usually
lightweight and not resource
intensive.
http://docs.splunk.com/Documentation/Splunk/latest/Overview/AboutSplunkEnterprisedeployments
9. 9
Splunk Indexer Buckets
HOT / WARM
• Recent searches/dashboards
• Usually block LUN
• High Random reads
• Sequential reads / writes
• Rare searches
• Usually NAS share
• Light Random Reads
• Sequential reads / writes
• Not searchable
• Usually offline media
• Only sequential write
Splunk moves indexes from between from hot/warm to cold to frozen
based on user configuration
Size of the buckets impacts performance
COLD FROZEN
10. 10
Indexer Storage Capacity
Indexer
Uncompressed ‘indexes’
70% of written data
350GB
1TB Ingested Data
= ½ Ingested Data
= 500GB
Compressed Raw data
30% of written data
150GB
Raw Data Indexes
*.gz *.tsidx
Written Data
11. 11
How much storage you need?
Indexer
1TB Data Ingested Daily
= 1TB * ½ * 60 days = 30TB
Raw Data Indexes
150GB/Day * 60 Days = 9TB 350GB/Day * 60 Days = 21TB
= Daily indexing rate
x ½
x Retention policy
12. 12
Multiple copies of index and raw data
• Replication factor (RF) = # of of copies of raw data
• Search factor (SF) = # copies of indexes
SPLUNK INDEXER AVAILABILITY
30TB written 30TB replicated
1TB * 60 days x ½ x 2
= 60TB (RF/SF=2) ** doubled **
1TB * 60 days x ½ x 3
= 90TB (RF/SF=3) ** tripled **
STORAGE CAPACITY
MULTIPLIES!
Raw
9TB
Index
21TB
Raw
9TB
Index
21TB
SF=2 / RF=2
14. 14
DAS PRESENTS CHALLENGES
SPLUNK DAS ENVIRONMENT
1
Dedicated Storage Infrastructure
• Silo that only runs Splunk
2
Compromised Availability
• SSDs & servers fail
• Index rebuilds can take hours to days
3
Lack of Enterprise Data Protection
• No Snapshots or Compliance
• DR limited to Multisite Clustering
4
Poor Storage Efficiency
• Multiple copies of data
• Multisite Clustering Increases Overhead
5
Non-Optimized Growth
• Fixed compute to storage ratio
• Servers must maintain storage symmetry
6
Management complexity
• Multiple management points
1x
2x
3x
2x
3x
1x
15. 15
WHY EMC FOR SPLUNK
OPTIMIZED INFRASTRUCTURE FOR BIG & FAST DATA
Optimized Shared
Storage & Tiering
Hot & Warm
Data Deployed
On XtremIO or
ScaleIO
Cold & Frozen
Data Deployed
On Isilon
Powerful Data
Services
Encyption &
Security
Index File
Compression
Deduplication Of
Clustered Indexes
Snapshots For
Backups
Cost-Effective &
Flexible Scale-Out
Scale-Out Capacity &
Compute Independently Or
As Converged Platform
16. 16
Why Flash?!?
Economic Influences
Consumer Demand
Data Services
Allowing free Copies
of Application Data
Flash technology has
improved at a faster
rate than Moore’s
Law
Intelligent Scale-out Flash
HDD
18. 18
Data Services For
Hot & Warm Data
Self-Encrypting
Flash Drives
Index File
Compression
Dedupe Clustered
Index Copies
In-Memory Data
Copy Services
EMC XTREMIO & SPLUNK
ALL-FLASH INFRASTRUCTURE FOR HOT & WARM DATA
Scale-Out Flash For
I/O-Bound Data
>1M IOPS & <1ms Latencies
High-Speed Search
Accelerate SuperSparse
& Rare Searches
Indexers
Search Heads
19. 19
XTREMIO & INDEXER AVAILABILITY
30TB written 30TB replicated 30TB replicated
Raw
9TB
Index
21TB
Raw
9TB
Index
21TB
(RF/SF=2)
Raw
9TB
Index
21TB
= 31.25TB
(RF/SF=3)
= 32.5TB
(RF/SF=1)
1TB * 60 days x ½ x 1 = 30TB
= 30TB
** doubled ** ** tripled **
With XTREMIO Inline De-Dup
1TB * 60 days x ½ x 2 = 60TB 1TB * 60 days x ½ x 3 = 90TB
20. 20
EMC SCALEIO & SPLUNK
CONVERGED ARCHITECTURE FOR HOT & WARM DATA
Indexers
Search Heads
Servers
Network
Storage
Converged Splunk
Architecture
Leveraging Existing
Hardware Investments
5K IOPS
1 TB
5K IOPS
1 TB
5K IOPS
1 TB
5K IOPS
1 TB
5K IOPS
1 TB
Shared Capacity &
Performance
Remove Silos & Increase
ROI On DAS Capacity & No
Single Point Of Failure
25K IOPS & 5TB
21. 21
OneFS
EMC Isilon – Deep and WIDE Storage
Single Volume/
File System
Policy based
Tiering
Simplicity &
Ease of Use
Linear
Scalability
Multi-protocol
support
High
Performance
Unmatched
Efficiency
Easy
Growth
22. 22
Consolidate, Protect
& Secure Cold Data
SmartLock Protects
Cold & Frozen Data
SmartDedupe For
Clustered Indexes
Snapshots IQ
For Backups
EMC ISILON & SPLUNK
LOW-COST & SECURE SCALE-OUT FOR COLD DATA
High-Speed Ingest
& Long-Term Retention With
Native HDFS Integration
Indexers
Search Heads
Scale-Out Capacity
Up To 50PB Of Highly
Available Capacity
Self-Encrypting
Drives
27. 27
EMC REFERENCE ARCHITECTURES
Single-Instance Distributed
HOT & COLD
Mostly searches
Heavy Random reads
Sequential writes
Adhoc searches
Light Random Reads
Sequential Writes
Indexers Indexers Search
WARM
XtremIO
Scale-Out 160 TB Flash & No Tuning
No RAID Configuration Needed
Many Copies & No Overhead
Isilon
Multi-Protocol = Always Searchable
Tier “Frozen” Data Without Migration
Scale-Out 50 PB Of Cold & Frozen Data
Deduplication & Compression SmartDedupe & SmartLock
28. 28
VBLOCK® SYSTEMS – THE ONLY TRUE CONVERGED
INFRASTRUCTURE
Application
Optimization
Lifecycle System
Assurance
API Enabled, Converged
Management
Integrated Protection and
Workload Mobility Solutions
Pre-engineered,
Pre-validated, Pre-tested
Best-of-breed
Technology
Fastest Time-to-Business
Highest Performance
Highest Availability
Converged Management
Lowest Risk
Customer Experience
Lowest TCO
29. 29
WHY VCE FOR SPLUNK?
Factory Physical and Logical Build
Roadmap and New Feature
Planning
Compliance-Ready
Configuration and Patch
Management
Performance and Availability Single Support Through VCE
FOCUS ON BUSINESS OPERATIONS, NOT MAINTAINING INFRASTRUCTURE
30. 30
VCE to Address Three Opportunities Splunk
VCE™
technology
extension for
EMC® Isilon®
Storage
Vblock/VxBlock
System 540
Vblock/VxBlock
System 340
HOT/WARM COLDRack-Scale Bundle
Block/Scale-Out Bundle
31. 31
SPLUNK APP FOR VCE Vision
VCE Systems presented as
an entity
Compliance history – what
has been changed?
System inventory and
health
KPI dashboards
One command to configure
system logs and events
33. 33
How We Size Splunk Infrastructure
• We Use Splunk Best Practices
“Religiously”
• We build “Converged Systems” first
• We have our own Ninjas to help!
• http://splunk-sizing.appspot.com/
Not Officially Splunk-Supported