Taking Splunk to the Next Level – Architecture

Copyright © 2015 Splunk Inc.
Take Splunk to the Next Level:
Technical - Architecture

Am I in the right place?
You should have familiarity with…
• Splunk Roles
• Administering a single Splunk instance
• Great if you’ve worked with a distributed environment
• General Server Hardware
– Disk I/O
– CPU Cores

Who’s This Dude?
Jeff Champagne
Client Architect
Started with Splunk in Fall 2014
Former Splunk customer in the Financial Services Industry
Lived previous lives as a Systems Administrator, Engineer, and Architect

Splunk at the Next Level
Time to move beyond initial Splunk environment
• More use cases – how to tackle?
• More data – how do we scale?
• Splunk is mission critical == HA
• Global deployments
• Improving Splunk user experience

Growing your Splunk Deployment
Many customers start with a single use case…
• Ex: Monitor the web servers
• Help ensure up-time & response times
• Track usage, errors
• Provides business value

Value statement for each overall service
Your services exist in a larger context than just one app or one tier.
What is the value of the service as a whole?
What are the CIO’s commitments for the service?
• The organization’s web site is one of the most critical parts of the business.
• Performance of the overall environment must be maintained at all times.
• Failures in any portion of the web site must be quickly identified, with notifications sent to the appropriate parties.
• Dependencies on external processes must be monitored as well.

The larger context
• Failure in one system cascades
• Map dependencies, estimate costs
• Use Splunk to track all dependencies.
• What happens when it is down?
Dependencies often include:
• Networking dependencies
• Shared storage
• Databases, middleware, custom apps
• Virtualization layer

Scales to Hundreds of TBs/Day
Enterprise-Class Scale, Resilience and Interoperability
Send data from thousands of servers using any combination of Splunk Forwarders
Auto load-balanced forwarding to Splunk Indexers
Offload search load to Splunk Search Heads

Product Roles
Searching and Reporting (Search Head)
Indexing and Search Services (Indexer)
Data Collection and Forwarding (Forwarder)
Indexer Cluster Master, SHC Deployer
Distributed Management / Deployment Server
License Master, Distributed Mgmt Console
[Diagram: data sources include databases, networks, servers, virtual machines, smartphones and devices, custom applications, security, web servers, and sensors]

Forwarders

Splunk Universal Forwarder
Why use the UF over other methods?
Collect syslog / event log / custom application logs
Collect configuration files, registry settings
Collect data NOT in log files: scripted inputs on current state
Collect wire data – Splunk Stream
Faster, lower overhead than “agentless” polling
Centrally administered
… and

Forwarder Load Balancing
Have the UF balance across multiple indexers
Load balancing options:
– Multiple hosts in outputs.conf
– DNS round robin
– No external load balancer needed!
Geography-based routing
Optional SSL encryption
Compressed roughly 10 to 1
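A toy model of the time-based switching behind AutoLB, offered as a sketch only. It assumes the forwarder moves to a randomly chosen indexer from its configured list at a fixed interval (the outputs.conf setting autoLBFrequency, 30 seconds by default) and ignores the fact that a universal forwarder normally waits for a safe point in the data stream before switching, which is what forceTimebasedAutoLB is meant to relax. The host names are hypothetical.

```python
import random

def simulate_autolb(indexers, duration_s, switch_every_s=30, seed=None):
    """Return (window_start_s, indexer) pairs, one per switching window.

    Idealized model: at every window boundary the forwarder picks a target
    at random from the configured list and sends that window's data to it.
    """
    rng = random.Random(seed)
    return [(start, rng.choice(indexers))
            for start in range(0, duration_s, switch_every_s)]

# Hypothetical indexer list, as it might appear in an outputs.conf server setting.
indexers = ["idx1.example.com:9997", "idx2.example.com:9997", "idx3.example.com:9997"]

for start, target in simulate_autolb(indexers, duration_s=150, seed=42):
    print(f"t={start:>3}s -> {target}")
```

Over many windows each indexer receives a roughly equal share, which is why no external load balancer is needed in front of the indexers.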

Deployment Server
Central management of Splunk Forwarders
Deployment Server manages apps and configs
Assign each host to one or more server classes
Each server class defines the apps & configs to deploy
Works by phone-home: deployment clients periodically poll the DS for updates
Notes:
The DS does not push forwarder binaries
Use the Cluster Master, not the DS, to manage indexers in a cluster
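The phone-home flow can be pictured as follows: a deployment client checks in, the DS works out which server classes the host belongs to, and the client downloads the apps those classes define. The sketch below mimics serverclass.conf-style wildcard whitelists with Python's fnmatch; the class names, app names, and dictionary layout are hypothetical and are not Splunk's internal data model.

```python
from fnmatch import fnmatch

# Hypothetical server classes: each maps host-name patterns to the apps it deploys.
SERVER_CLASSES = {
    "web_servers": {"whitelist": ["web*"], "apps": ["uf_base", "web_inputs"]},
    "db_servers": {"whitelist": ["db*", "sql*"], "apps": ["uf_base", "db_inputs"]},
    "all_forwarders": {"whitelist": ["*"], "apps": ["uf_outputs"]},
}

def apps_for_host(hostname, classes=SERVER_CLASSES):
    """Return the sorted list of apps a phoning-home client should receive."""
    apps = set()
    for cls in classes.values():
        # A host joins a class if any whitelist pattern matches its name.
        if any(fnmatch(hostname, pattern) for pattern in cls["whitelist"]):
            apps.update(cls["apps"])
    return sorted(apps)

print(apps_for_host("web01"))
print(apps_for_host("sql07"))
```

A host can match several classes; the union of their apps is what it ends up with, which is how one "base" class can be layered under role-specific classes.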

Forwarding Tier Design Best Practices
• Use a Syslog Server for Syslog data
• Deployment server (on a VM) for central management
• Let AutoLB distribute data across available indexers
• For high-velocity sources, you may need to tune the UF:
– Enable forceTimebasedAutoLB in outputs.conf (for more even distribution)
– Raise maxKBps in limits.conf (the UF’s throughput throttle)
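To judge whether the UF throttle will keep up with a high-velocity source, convert the source's daily volume to an average KB/s and compare it with maxKBps. A minimal sketch, assuming 1 GB = 1024 × 1024 KB, a 2× headroom factor for bursts (an arbitrary choice), and that maxKBps = 0 means unlimited, as in limits.conf. In the versions we have seen the UF ships with maxKBps = 256, which averages out to only about 21 GB/day.

```python
SECONDS_PER_DAY = 86400
KB_PER_GB = 1024 * 1024

def required_kbps(gb_per_day):
    """Average throughput in KB/s needed to forward gb_per_day."""
    return gb_per_day * KB_PER_GB / SECONDS_PER_DAY

def throttle_is_sufficient(gb_per_day, max_kbps, headroom=2.0):
    """True if max_kbps covers the average rate times a burst headroom factor.

    max_kbps == 0 is treated as unlimited.
    """
    if max_kbps == 0:
        return True
    return max_kbps >= required_kbps(gb_per_day) * headroom

for gb in (5, 15, 50):
    ok = throttle_is_sufficient(gb, max_kbps=256)
    print(f"{gb:>3} GB/day needs ~{required_kbps(gb):.0f} KB/s avg; 256 KB/s enough: {ok}")
```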

Indexers

Indexers
Dedicated indexers serve three primary roles:
• Data Storage
– Processing and parsing at index-time
– Indexing
• Data Management
– Hot / warm / cold data rotation
– Aging and removal
• Data Retrieval
– Perform searches upon request, return data to search heads

Scaling - Indexers
Sizing for index performance
Indexers are usually storage-bound
Each indexer: 150 to 250 GB per day (with reference hardware)
Ref HW: 12 cores (2 GHz+), 12 GB RAM, 800+ IOPS
Optimal HW (normal disk): 16 CPU cores, 48 GB RAM
Optimal HW (SSD): 24 CPU cores, 132 GB RAM
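A first-pass indexer count follows directly from the per-indexer figure above. A sketch, assuming ingest volume is the only constraint; heavy search load or clustering can push the count higher. The 1000 GB/day input is hypothetical.

```python
import math

def indexers_needed(gb_per_day, gb_per_indexer):
    """Smallest whole number of indexers whose combined capacity covers gb_per_day."""
    if gb_per_indexer <= 0:
        raise ValueError("gb_per_indexer must be positive")
    return math.ceil(gb_per_day / gb_per_indexer)

daily_gb = 1000  # hypothetical daily ingest
# Bracket the estimate with the 150-250 GB/day range quoted for reference hardware.
low = indexers_needed(daily_gb, 250)
high = indexers_needed(daily_gb, 150)
print(f"{daily_gb} GB/day -> {low} to {high} indexers")  # 1000 GB/day -> 4 to 7 indexers
```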

Tiered Storage
• Splunk supports tiered storage
• Hot / Warm buckets – put on fastest disk
• Size hot/warm for typical saved search time ranges (7d, 30d)
• Use slower / cheaper storage (NAS?) for long term access
• Optional: use the frozen tier to roll data to Glacier, Hadoop, etc.

SSD Advantage
http://blogs.splunk.com/2012/05/10/quantifying-the-benefits-of-splunk-with-ssds/
• Low cost random seeks
• Writes are not that much faster – no great improvement with Indexing
• Significant improvements with Sparse/needle-haystack searches
• Dense searches become CPU bound
• Searches run faster allowing for more completed searches/min
• Use enterprise-grade SSDs, not consumer-grade.

Scaling - Storage
Manual storage calculation
Raw data rate → net compression of ~50% on disk
Simple: rate * compression * retention / #indexers
Hot / warm requirements
– 200 GB / day * 50% * 30 days = 3TB per indexer
Cold storage requirements
– 200 GB / day * 50% * 335 days = 33.5TB per indexer
Clustering
– Changes storage story completely
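The simple formula and the two worked examples above can be checked with a few lines of Python. This sketch follows the slide's conventions: ~50% net compression, 1 TB = 1000 GB, and the 200 GB/day rate taken as the share landing on each indexer (so #indexers = 1). With 30 days hot/warm plus 335 days cold, retention totals one year. The 1 TB/day, 5-indexer case at the end is a hypothetical illustration.

```python
def storage_tb(gb_per_day, retention_days, indexers=1, compression=0.5):
    """Disk needed per indexer, in TB: rate * compression * retention / #indexers."""
    return gb_per_day * compression * retention_days / indexers / 1000

hot_warm = storage_tb(200, 30)   # 3.0 TB per indexer
cold = storage_tb(200, 335)      # 33.5 TB per indexer
print(f"hot/warm: {hot_warm:.1f} TB, cold: {cold:.1f} TB, total: {hot_warm + cold:.1f} TB")

# Same formula for a whole deployment: 1 TB/day spread over 5 indexers, 30 days hot/warm.
print(f"{storage_tb(1000, 30, indexers=5):.1f} TB hot/warm per indexer")
```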

Scaling - Storage
One example of good local storage
A well configured indexer using local storage might look like:
• SSDs in RAID 5, sized for 14 days of storage
• SATA drives in RAID 5, sized for 6 months of storage
SSDs: RAID 5 provides decent performance
Spinning disks:
• Hot/Warm, RAID 1+0, 800 IOPS or faster
• Cold – RAID 5 with proper block / stripe sizing
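Why RAID 1+0 for hot/warm: RAID 5 pays a larger write penalty. A rough sketch using the common rule-of-thumb write penalties (2 back-end I/Os per write for RAID 1+0, 4 for RAID 5); the 150 IOPS per spindle and the 50% write mix are illustrative assumptions, not measured values.

```python
WRITE_PENALTY = {"raid10": 2, "raid5": 4}

def usable_tb(level, disks, disk_tb):
    """Usable capacity: half the disks for RAID 1+0, all but one for RAID 5."""
    if level == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 1+0 needs an even number of disks, at least 4")
        return disks // 2 * disk_tb
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_tb
    raise ValueError(f"unsupported level: {level}")

def effective_iops(level, disks, iops_per_disk, write_fraction):
    """Front-end IOPS the array can serve for a given read/write mix."""
    raw = disks * iops_per_disk
    return raw / ((1 - write_fraction) + write_fraction * WRITE_PENALTY[level])

for level in ("raid10", "raid5"):
    print(level, usable_tb(level, 8, 2), "TB,",
          round(effective_iops(level, 8, 150, 0.5)), "IOPS")
```

With eight assumed 150-IOPS spindles, RAID 1+0 just reaches the 800 IOPS target while RAID 5 falls well short, at the cost of less usable capacity.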

Scaling - Storage
Sizing Calculator: http://splunk-sizing.appspot.com/

Indexer Clustering

Delivers Mission-Critical Availability
• Data replication – maintain searchability even if servers go down
• Multi-site capable – maintain searchability even if a site goes down
• Search Affinity – optimized searches by fetching from the closest/fastest location
[Diagram: indexer clustering with replication between the Portland and New York datacenters]

Indexer Clustering
High-Availability, Out of the Box
Splunk indexer clustering
Active-active = better performance
Specific terms:
– Master Node / Master Cluster Node
– Peer Node
– Search Factor
– Replication Factor
Additional details: Splunk Docs, Distributed Deployment Manual

Cross-site Clustering
Search Affinity by location
“Search locally”, “Store Globally”
DR scenarios

How Clustering Affects Sizing
• Increased storage:
– ~15% of raw volume for every replicated copy (compressed raw data)
– ~35% of raw volume MORE to make a copy searchable (index files)
• Increased processing
– Incoming data to an indexer is streamed to peer indexers to satisfy the required number of copies
• More hosts
– Need “replication factor” + 2 hosts (search head, cluster master)
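The clustering overheads above combine into a simple per-day estimate: every replicated copy costs ~15% of raw volume, and every searchable copy costs another ~35%. A sketch, assuming those ratios hold for your data (they vary with data type); with replication factor and search factor both 1 it reduces to the ~50% figure used for a standalone indexer. The 100 GB/day, RF 3, SF 2 case is hypothetical.

```python
def clustered_daily_storage_gb(raw_gb_per_day, replication_factor, search_factor,
                               rawdata_ratio=0.15, index_ratio=0.35):
    """Disk consumed per day across the whole cluster, in GB."""
    if search_factor > replication_factor:
        raise ValueError("search factor cannot exceed replication factor")
    return raw_gb_per_day * (replication_factor * rawdata_ratio
                             + search_factor * index_ratio)

def min_cluster_hosts(replication_factor):
    """Peers for each copy, plus one search head and one cluster master."""
    return replication_factor + 2

print(f"{clustered_daily_storage_gb(100, 3, 2):.0f} GB/day for 100 GB/day raw at RF=3, SF=2")
print(min_cluster_hosts(3), "hosts minimum at RF=3")
```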

Downsides of Indexer Clustering
• Increased Storage
• Cluster master is required – use a VM.
• Increased bandwidth

Search Heads

Scaling the Search Heads
Splunk Search is critical, too!
Scaling your search heads
Scale to handle the number of concurrent queries
Dedicated search heads for certain apps, scheduled alerts
Remember – Search heads virtualize well!
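Search head sizing starts from concurrency. In limits.conf, Splunk derives the maximum number of concurrent historical searches as max_searches_per_cpu × number of CPUs + base_max_searches; the defaults of 1 and 6 used below are the ones we have seen documented and may differ in your version. The peak concurrency of 50 searches is a hypothetical input.

```python
import math

def max_hist_searches(cpu_cores, max_searches_per_cpu=1, base_max_searches=6):
    """Concurrent historical search limit per search head (limits.conf formula)."""
    return max_searches_per_cpu * cpu_cores + base_max_searches

def search_heads_needed(peak_concurrent_searches, cpu_cores_per_sh):
    """Search heads required so the combined limit covers peak concurrency."""
    per_sh = max_hist_searches(cpu_cores_per_sh)
    return math.ceil(peak_concurrent_searches / per_sh)

print(max_hist_searches(16))        # 22
print(search_heads_needed(50, 16))  # 3
```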

Search Head Clustering

SHP vs SHC
Search Head Pooling
• Available since v4.2
• Sharing configurations through NFS
• Single point of failure
• Performance issues
Search Head Clustering
• No shared storage requirement
• Replication using local storage
• Commodity hardware
• OSes: Linux or Solaris

1. Group search heads into a cluster
2. A captain gets elected dynamically
3. User-created reports/dashboards are automatically replicated to other search heads

Search Tier Design Best Practices
• Minimum 3 nodes required
• Enterprise Security (ES) still requires a separate search head or dedicated SHC
• Use LDAP/AD/SSO for user authentication
• Load balancer configured for sticky sessions
• Must use the deployer to push apps to search heads
• Confirm your applications’ support for SHC!
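The three-node minimum comes from captain election: an SHC elects its captain by majority vote, so a majority of members must be up for the cluster to have a captain. A sketch of that arithmetic, assuming a strict majority of all configured members is required.

```python
def majority(members):
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1

def tolerable_failures(members):
    """Members that can be lost while a majority can still elect a captain."""
    return members - majority(members)

for n in range(1, 6):
    print(f"{n} members: majority {majority(n)}, survives {tolerable_failures(n)} failure(s)")
```

Two members tolerate no failures, the same as one; three is the smallest size that survives losing a node.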

The Final Stretch

[Diagram: example architecture with a load balancer, search head cluster and deployer, clustered peer nodes and cluster master, deployment server, universal forwarders on servers, syslog and NetFlow data, and heavy forwarders (HFs) for scheduled polling via API]

Hybrid Approach for Rollout
• Add the existing Splunk instance as a search peer until the data retention period has expired
• Disable scheduled searches on the old instance
• Migrate any Summary Index data to new Indexers
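When the old instance can be retired is simple date arithmetic: once it stops ingesting, its newest data reaches the retention age after the retention period. A sketch, assuming the old instance stops ingesting at cutover and its summary index data has already been migrated; the cutover date and 90-day retention are hypothetical.

```python
from datetime import date, timedelta

def retirement_date(last_ingest_day, retention_days):
    """Date on which the newest data on the old instance reaches the retention age."""
    return last_ingest_day + timedelta(days=retention_days)

cutover = date(2015, 1, 1)  # hypothetical last day of ingest on the old instance
print(retirement_date(cutover, 90))  # 2015-04-01
```

Until that date the old instance stays attached as a search peer so historical searches still cover the full retention window.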

Additional considerations
• Don’t share hosts with other services
– Not co-located with Exchange, Active Directory, or hypervisors
• Don’t let anti-virus scan the Splunk partition on search heads and indexers
• Some data collection apps require a heavy forwarder; see the docs
– VMware App
– NetApp App
– Check Point LEA

Distributed Management Console
Manage Splunk 6.2 environments
Replaces Deployment Monitor App
Incorporates the functionality of the SOS app used prior to 6.2

Cloud & Hybrid
Scale without waiting for hardware

Suggested Reading
• Distributed Deployment Manual
– http://docs.splunk.com/Documentation/Splunk/latest/Deploy/Distributedoverview
• Highlights
– Reference hardware specs
– How searches affect performance (dense / rare / sparse)
– App considerations
– Summary table

Top 5 Things to Remember
• Indexers: Storage requirements, IOPS, RAID config
• Indexer clustering: HA, DR, and site affinity!
• SHC: Minimum buy-in for an SHC is 3 nodes
• When in doubt – add another Indexer
• Excellent VM candidates:
– Master Cluster Node (Indexer clustering)
– Deployer (Search head clustering)
– Deployment Server (Central Forwarder management)
– License Master
– Distributed Management Console
