2. 2
Splunk at the Next Level
Time to move beyond the initial Splunk environment
• More use cases – how to tackle?
• More data – how do we scale?
• Splunk is mission critical == HA
• Global deployments
• Splunk user experience
4. 4
Growing your Splunk Deployment
Many customers start with a single use case…
• Ex: Monitor the web servers
• Help ensure up-time & response times
• Track usage, errors
• Provides business value
5. 5
Growing your Splunk Deployment
Value statement for each overall service
Your services exist in a larger context than just one app, or one tier.
What is the value of the service as a whole?
What are CIO commitments for the service?
• The company’s web store is one of the most critical parts of the business.
• Performance of the overall environment must be maintained at all times.
• Failures in any portion of the web store must be quickly identified, and
notifications sent to the appropriate parties.
• Dependencies on external processes must be monitored as well.
6. 6
Growing your Splunk Deployment
The larger context
• Failure in one system cascades
• Map dependencies, estimate costs
• Use Splunk to track all dependencies.
• What happens when it is down?
Dependencies often include:
• Networking dependencies
• Shared storage
• Databases, middleware, custom apps
• Virtualization layer
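As a rough illustration of "failure in one system cascades", the sketch below walks a dependency map to list everything affected when one component goes down. The service names and the map itself are hypothetical and not taken from the talk; a real map would come from your own environment.

```python
from collections import deque

# Hypothetical dependency map: each service lists what it depends on.
DEPENDS_ON = {
    "web_store": ["load_balancer", "app_server"],
    "app_server": ["database", "middleware"],
    "database": ["shared_storage"],
    "middleware": ["network"],
    "load_balancer": ["network"],
}


def impacted_by(failed, depends_on):
    """Return every service that directly or indirectly depends on `failed`."""
    # Invert the map so we can look up "who depends on X".
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    # Breadth-first walk from the failed component up through its dependents.
    impacted = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, ()):
            if service not in impacted:
                impacted.add(service)
                queue.append(service)
    return impacted


if __name__ == "__main__":
    print(sorted(impacted_by("shared_storage", DEPENDS_ON)))
```

In this made-up map, a shared storage outage reaches the web store through the database and the app server, which is the kind of chain the slide asks you to map.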
10. 10
Scaling - Storage
Simple storage to complex
Raw data compresses to a net ~50% of its original size on disk.
Simple: rate * compression * retention
200 GB / day * 50% * 100 days = 10TB
Consider cold storage on NAS
– Changes storage story.
– Retention on fast, retention on slow
Clustering
– Changes storage story
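The simple sizing rule on this slide (rate * compression * retention) can be sketched as below. The function names and the 30/70-day split between fast and slow storage are illustrative assumptions; only the 200 GB/day, 50% and 100-day figures come from the slide. Clustering is deliberately left out, since replicated copies change the totals.

```python
def disk_needed_gb(daily_gb, compression_ratio, retention_days):
    """Simple sizing: daily raw volume * net on-disk ratio * days retained."""
    return daily_gb * compression_ratio * retention_days


def tiered_disk_gb(daily_gb, compression_ratio, fast_days, slow_days):
    """Split retention between fast local disk and slower storage (e.g. NAS for cold)."""
    return {
        "fast": disk_needed_gb(daily_gb, compression_ratio, fast_days),
        "slow": disk_needed_gb(daily_gb, compression_ratio, slow_days),
    }


if __name__ == "__main__":
    # Slide example: 200 GB/day * 50% * 100 days = 10,000 GB (10 TB).
    print(disk_needed_gb(200, 0.5, 100))
    # Hypothetical split: 30 days on fast disk, 70 days on slow storage.
    print(tiered_disk_gb(200, 0.5, fast_days=30, slow_days=70))
```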
12. 12
Scaling - Storage
RAID + SSD deep dive
• For spinning disks, Splunk recommends RAID 1+0 with 1k IOPS
• SSDs provide extremely high IOPS (45,000+)
• RAID 5 SSD arrays give great Splunk performance in most
scenarios.
Additional details: Splunk Docs, Capacity Planning Manual
13. 13
Forwarder Load Balancing
Have UF balance across multiple indexers
DNS round robin
Multiple hosts in outputs
LB not needed!
Geography-based routing
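To make "multiple hosts in outputs" concrete, here is a small sketch that renders an outputs.conf stanza listing several indexers, so the forwarder does its own load balancing. The group name, the hostnames and the helper function are hypothetical; port 9997 and a 30-second autoLBFrequency are common defaults used here as assumptions, so check them against your own deployment.

```python
def render_outputs_conf(group, indexers, port=9997, lb_frequency=30):
    """Build outputs.conf text that lists several indexers in one target group."""
    servers = ", ".join(f"{host}:{port}" for host in indexers)
    lines = [
        "[tcpout]",
        f"defaultGroup = {group}",
        "",
        f"[tcpout:{group}]",
        # Listing more than one server lets the forwarder balance across them.
        f"server = {servers}",
        # Seconds between switches to another indexer in the list.
        f"autoLBFrequency = {lb_frequency}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical indexer hostnames.
    print(render_outputs_conf("primary_indexers",
                              ["idx1.example.com", "idx2.example.com"]))
```

With DNS round robin, the server line would instead hold a single name that resolves to several A records.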
14. 14
Indexer Clustering
High-Availability, Out of the Box
Splunk indexer clustering
Active-Active = better performance
Specific terms:
– Master Node
– Peer Node
– Search Factor
– Replication Factor
Additional details: Splunk Docs, Distributed Deployment Manual
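As a rough guide to what Replication Factor and Search Factor buy you, the sketch below turns the two settings into the number of peer failures a cluster can absorb. The function and its field names are illustrative, not a Splunk API; the reasoning is simply that N copies survive N-1 lost peers.

```python
def cluster_tolerance(replication_factor, search_factor, peers):
    """Summarize how many peer failures a given RF/SF combination can absorb."""
    if not 1 <= search_factor <= replication_factor:
        raise ValueError("search factor must be between 1 and the replication factor")
    if replication_factor > peers:
        raise ValueError("need at least as many peers as the replication factor")
    return {
        "copies_per_bucket": replication_factor,
        "searchable_copies": search_factor,
        # Data survives as long as one copy of each bucket remains.
        "failures_without_data_loss": replication_factor - 1,
        # A searchable copy is still on hand without rebuilding index files.
        "failures_with_searchable_copy_left": search_factor - 1,
    }


if __name__ == "__main__":
    # Example values: Replication Factor 3, Search Factor 2, on 4 peer nodes.
    print(cluster_tolerance(replication_factor=3, search_factor=2, peers=4))
```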
16. 16
Scaling the Search Heads
Splunk Search is critical, too!
Splunk Search high availability needs
Scale to handle # of concurrent queries
17. 17
SHP vs SHC
Search Head Pooling (SHP)
• Available since v4.2
• Sharing configurations through NFS
• Single point of failure
• Performance issues
Search Head Clustering (SHC)
• No NFS
• Replication using local storage
• Commodity hardware
19. 19
Search Head Clustering
The coordinating node is called the “Captain” rather than “Master” to avoid confusion with indexer clustering
Minimum 3 nodes required. Odd is always preferred.
Cluster takes certain key decisions based on *majority* (consensus)
In multi-site setup have more nodes in main datacenter
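The majority rule explains both the three-node minimum and the preference for odd member counts. The sketch below is plain arithmetic, not Splunk code, and the function names are made up for illustration.

```python
def majority(members):
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def failures_tolerated(members):
    """How many members can be lost while a majority still remains."""
    return members - majority(members)


def keeps_majority(reachable, total):
    """True if the reachable members can still reach consensus."""
    return reachable >= majority(total)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, "members: majority", majority(n),
              "tolerates", failures_tolerated(n), "failure(s)")
```

Going from 3 to 4 members does not add any failure tolerance, which is why odd counts are preferred. In a two-site cluster of 5 members, placing 3 in the main datacenter means that site keeps its majority if the link between sites fails.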
21. 21
Deployment Server
Central management of Splunk Forwarders
Deployment Server manages Apps, Configs
Select one or more classes for each host
Class defines apps & configs
Works by phone-home
Notes:
DS does not push forwarder binaries
Use Cluster Master to manage indexers in cluster, not DS
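The class-to-apps idea can be modeled in a few lines. This is a simplified sketch of the matching logic only, not how the Deployment Server is implemented; the class names, hostname patterns and app names are all hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical server classes: which hosts belong, and which apps they receive.
SERVER_CLASSES = {
    "all_linux": {
        "whitelist": ["*.linux.example.com"],
        "apps": ["nix_inputs"],
    },
    "web_servers": {
        "whitelist": ["web*"],
        "apps": ["apache_inputs", "outputs_base"],
    },
}


def apps_for_host(hostname, server_classes):
    """Collect the apps from every class whose whitelist matches the host."""
    apps = []
    for server_class in server_classes.values():
        if any(fnmatchcase(hostname, pattern) for pattern in server_class["whitelist"]):
            for app in server_class["apps"]:
                if app not in apps:  # a host can match several classes
                    apps.append(app)
    return apps


if __name__ == "__main__":
    # What a forwarder with this hostname would receive when it phones home.
    print(apps_for_host("web01.linux.example.com", SERVER_CLASSES))
```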
23. The 6th Annual Splunk Worldwide Users’ Conference
September 21-24, 2015 The MGM Grand Hotel, Las Vegas
• 50+ Customer Speakers
• 50+ Splunk Speakers
• 35+ Apps in Splunk Apps Showcase
• 65 Technology Partners
• 4,000+ IT & Business Professionals
• 2 Keynote Sessions
• 3 days of technical content (150+ Sessions)
• 3 days of Splunk University
– Get Splunk Certified
– Get CPE credits for CISSP, CAP, SSCP, etc.
– Save thousands on Splunk education!
Register at: conf.splunk.com
24. The 6th Annual Splunk Worldwide Users’ Conference
September 21-24, 2015 The MGM Grand Hotel, Las Vegas
Did you like this session on Taking Splunk Architecture to the Next
Level? You should check out these sessions at .conf2015:
• 1 Architecting Splunk for High Availability and Disaster Recovery
• 2 Go Big or Go Home
• 3 Search Head Clustering
Register at: conf.splunk.com
26. 26
We Want to Hear your Feedback!
After the Breakout Sessions conclude
Text Splunk to 878787
And be entered for a chance to win a $100 AMEX gift card!
I want to look at my Web Server Environment
Gauge end-user responses, look for 404 errors, or whatever you are doing in your environment
Look at dependencies that Web Servers have.
Same conversations about email, or Active Directory or other Key services
Dependencies, middleware, storage, information available to you on the wire.
If any part of this environment goes down, what is the business impact of that?
Started with just looking at a Web Server
But
Load balancers
Firewalls
DNS Servers Facing the Internet
All of that guides people to your Web
None of it works when the database is down,
middleware
How to Plan out the Number of Disks you Need as well as Scaling out your Search Heads
SSDs: 50K IOPS, so far off the charts. That is on a SATA-based SSD using MLC, the cheapest thing you can buy, and it just goes through the roof after that.
RAID 5 is terrible for performance on standard physical (spinning) disks.
RAID 5 with SSDs is an option.
Avoid RAID 5 when you can afford RAID 1+0, or any time you have spinning disks.
When you want to scale that out, consider moving your Cold out to NAS
He said Splunk supports CIFS; it is not recommended, but it can be used for cold storage: heavy reads, no writes.
Virtualizing:
Biggest concern is shared disk storage.
Do you have OLTP high transactional Oracle Databases running in your
Too high of a Disk Profile, if no, then your Splunk Indexers shouldn’t be running there either.
Give 100% reservation when you can
Use the same reference specs
Our Splunk in the Cloud is all Virtualized. There is nothing inherently wrong about a virtualized environment, you just have to be careful
Splunk App for VMware
Side note: if you double the number of indexers, you will effectively double the performance.
When I go from
28.00
Outputs.conf file: IP or hostname of a single indexer
Pointed to a DNS multi-value A record (what they call a round robin A record) or you can identify the indexers
If you are using DNS round Robin. Lots of solutions, the first one they see
Indexers.splunk
Put them into a pool and randomly cycle through all ten of those indexers
Don’t need a Load Balancer, if you have
Any time you have an application that understands Load Balancing, it is going to do a better job because
31.20
Geography Based Routing.
How many have more than one physical location
Indexers that are geographically located. Data from all of those local sources can roll to the Indexer located locally within that data center
By the
32.25
Active-Active All are ‘in service’ at any given time
The search head distributes its query across all of the indexers
How it knows that is
Replication Factor
Cross Site Clustering
Search Affinity by Location (how does that work?)
37.0
Search Factor of Two, Rep Factor of 3
I want to have a copy of that Data sent over t
SH in New York knows to query the local copy of the LA data it has
How do you get from Not Clustered to Clustered
Master Node manages all of the Apps and Configurations
Turn on Clustering with a Search Factor of 1 and a Replication Factor of 1
Splunk is going to add a little bit of additional metadata at ingest time
Stand up another Indexer and increase your Replication and Search Factor
If I have 5 Regions, can I have a local set of replicated copies at that one location?
Multi-Site Clustering
41.00
Search Head Pooling: Sorry. Had to have extremely high speed NFS to handle it. Single point of failure if the storage went down
Search Head Clustering: Doesn’t require NFS
Replication using local storage. Splunk itself replicates that data back and forth between the search heads.
One search per core
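Taking "one search per core" as a rule of thumb, a back-of-the-envelope estimate of how many search heads a given concurrency needs might look like the sketch below. The function name and the example numbers are assumptions for illustration; real limits also depend on Splunk settings and search types.

```python
import math


def search_heads_needed(peak_concurrent_searches, cores_per_search_head):
    """Rule of thumb from the talk: budget one concurrent search per CPU core."""
    if cores_per_search_head <= 0:
        raise ValueError("cores_per_search_head must be positive")
    return math.ceil(peak_concurrent_searches / cores_per_search_head)


if __name__ == "__main__":
    # Hypothetical load: 40 concurrent searches on 16-core search heads.
    print(search_heads_needed(40, 16))
```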
Deployment Manager (see number of concurrent Searches Running)
Example Topology
One of the Cluster Members will self-elect as a Captain
Deployer is responsible for managing the configurations of all of these Search Heads
Take away from this slide: Clustering Works, No longer requires NFS.
Talk to your Engineer
Architecture Class
Documentation
Came out last October
We require three nodes in the SH cluster. We use majority decision consensus approach.
Load Balancer should be ‘pretty sticky’. How much affinity to that session.
Use Search Head Clustering so I can scale out (not really focused on HA so much)
If you have used SOS in the Past. Support analyzing diag
Scaling discussion we already had
Health of your Indexers, Search Heads, License Master, Deployment Servers, KV Store (new feature 6.1 or later)
Distributed Management Console rolls in all kinds of info
50.59
Puppet or Chef or some fancy auto sync method
If you don’t have those tools, can use the Deployment Server
Enables the Splunk
Knows what kind of OS it’s coming from
Active Directory, Mac OS
Manually managing
Allows your Splunk Admins to control what Splunk is collecting without having to contact Puppet or Chef environment
Instead of waiting for change control
55.00
And finally, I would like to encourage all of you to attend our user conference in September.
The energy level and passion that our customers bring to this event is simply electrifying.
Combined with inspirational keynotes and 150+ breakout sessions across all areas of operational intelligence,
it is simply the best forum to bring our Splunk community together, to learn about new and advanced Splunk offerings, and most of all to learn from one another.