5. Splunk Use Cases at Pure Storage
• Security Ops: security logging and monitoring, security detections, and correlation searches on Splunk Enterprise Security
• App/Dev Ops: application monitoring, tracing, and alerting on Splunk Enterprise
• IT Ops: visualizations, auditing, and user and system management on Splunk Enterprise
6. Let’s Talk to the Stakeholders
• Security Ops (“We need to double our ingest”): high demand for CPU cores for correlation searches and heavy storage utilization for historical searches; performance is crucial.
• App/Dev Ops (“We need more verbose logging to better support our applications”): application logs are high volume; if security needed the available storage, DevOps would take second priority.
• IT Ops (“We rely on Splunk to perform our role and require high availability”): infrastructure changes are increasing as the company grows, and audit requirements are increasing.
7. Fork in the Road…
The same three demands point to three candidate options:
• Security Ops: add more indexers with additional block storage
• App/Dev Ops: migrate to the Splunk Cloud offering
• IT Ops: Splunk SmartStore on FlashBlade®, separating storage from compute
How do we meet our customer demands without sacrificing performance?
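The SmartStore option described above boils down to defining a remote object-store volume in indexes.conf. A minimal sketch, assuming an S3-compatible endpoint; the volume name, bucket, endpoint, and credentials are placeholders, not Pure's actual configuration:

```ini
# indexes.conf (sketch): point indexes at an S3-compatible remote store
[volume:remote_store]
storageType = remote
path = s3://splunk-smartstore                          # placeholder bucket name
remote.s3.endpoint = https://flashblade.example.com    # placeholder endpoint
remote.s3.access_key = <access_key>
remote.s3.secret_key = <secret_key>

[main]
# Local storage becomes a cache; warm buckets live in the object store
remotePath = volume:remote_store/$_index_name
```

With this in place, indexers keep only a hot/cache tier locally, which is what lets compute and storage scale independently.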
8. Scaling Issues with ‘Classic’ Splunk Architecture
• Managing block storage is hard work!
• Increased management overhead to add additional indexers
• Unscheduled increases in data ingestion
• Planning for multi-site growth
9. Use Cases, Demands, Resources, and Why SmartStore
• FlashBlade allowed us to increase ingest without sacrificing search performance
• FlashBlade out-performed spinning-disk DAS cold storage when searching historical data
• FlashBlade provides easy storage scalability with non-disruptive upgrades
10. Key Takeaways
• SmartStore allowed us to scale up with fewer scale-out resources while maintaining high performance
• Migration to SmartStore was transparent to our users
• Future capacity upgrades can be dictated by storage or CPU
• Future storage increases will be non-disruptive
11. Benefits of SmartStore on FlashBlade
©2023 Pure Storage, Pure//Accelerate 2023
• FlashBlade//S ObjectStore provides a performant, scalable, S3-compatible backend for SmartStore
• Bucket migration between sites is easy, with zero impact to Splunk, utilizing free, built-in features of Purity
• Future capacity, performance, and EoL/EoS upgrades can be performed non-disruptively, without performance impact
• Pure1 Manage eases management and observability of multiple FlashArray and FlashBlade appliances
14. 12+ Year Splunk Admin: Lessons Learned Going from a 10 TB to a 50 GB License
Daniel Wilson
15. #whoami – Daniel Wilson
“Balancing impostor syndrome and Dunning-Kruger with a risk-based approach”
• PCI, GDPR, SOX and SEC compliance stuff
• SOC operations, investigations and incident response
• Cloud and Hybrid Security in AWS, Azure and GCP
Stalk me on LinkedIn
- https://www.linkedin.com/in/daniel-wilson-0229177/
16. Splunk Experience
• About 12 years of Splunk
• Splunk Customer Advisory
• Occasional speaker at Splunk user groups
• I’m told I am one of the handful who ever got Arch II before they got rid of it, so that’s cool
18. eBay/StubHub
Splunk fell in my lap
Fell in love with Splunk
Between 2011 and 2019 we got to 10 TB
130 daily users, about a dozen SOC users
3 Splunk stacks across 3 data centers, plus Splunk Cloud
Heavy focus on Splunk itself
Extensive mixed-use cases, partnering with eBay and PayPal
Extremely favorable budget
19. Voleon
I was brought in to stand up Splunk
Started with a 25 GB license, grew to around 100 GB
2 daily users, peaking at 5
Standalone instance, later a 4-indexer cluster
Dozens of related and unrelated SIEM tasks
Highly focused use cases
Budget for hardware and software is tight
20. Let’s Chat
• You can’t do it all: every best practice and every good idea
• You can overengineer
• This talk is an attempt to share what I think mattered, looking back over the last few years
21. What Have I Learned?
- Some people are chickens and some are pigs
• I am not professional services
• There is a minimum cost to run Splunk
• Administrative overhead
• Training of users
• Focus on what matters, not shelving data
22. Documentation
• Both companies have varying standards; ask questions and ask again. Everyone hates docs until they need them.
• People don’t understand Splunk, they THINK they understand Splunk, and that’s scarier than those who admit they don’t get it. OVER-DOCUMENT AND LINK TO TRAINING or .conf talks
• Splunk Lantern + copy/paste is your friend
• Docs will help you get through complex change controls and GRC
• I’ve run into management who wanted world-class docs and management who wanted only the high points; culture will guide the level of docs, but always deliver better than asked for
• Build trust in documentation
• README files EVERYWHERE
23. README
• Match your app.conf version number to your history for easier coordination
• Standardize your README files and index them for self-documentation
• Standardize your comments to improve readability
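The app.conf/README pairing might look like this; the app name and version number are invented for illustration:

```ini
# etc/apps/myapp/default/app.conf (sketch)
[launcher]
version = 1.2.2    # keep in lockstep with the History section of the README

[ui]
label = My App
```

The README's history section then carries the same number (e.g., "1.2.2 – dwilson – TICKET-123"), so a glance at either file tells you which change you're looking at.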
24. Comments
• Splunk support worked with me back around 2014 to create the standard you see, and I’ve been doing it ever since
• Comments need to answer who, what, where, how, when, and why
• At the very least, link out to change-control tickets
• Consider standardizing your comment format to make it easier to script or ingest your configs to build documentation dashboards
# 1.2.2023 – dwilson, I change the thing for the reason (TICKET)
26. Educating the Team
• SPL isn’t easy; we start to take it for granted, and seeing it through the eyes of someone very smart but untrained can be eye-opening
• Splunk classes are great, but ONLY if you can immediately put that person into practical application
• Don’t just answer their question; ask them to post it to Splunk Answers or Splunk Community Slack and answer it there
• If they are not curious, they are not going to learn it. Focus on the value of use cases and passion over formal processes.
27. Support
• At a smaller company, getting help can be more complex
• Technical and sales contacts rotate aggressively
• No one is there to get a beer with
• You really must make an extra effort to build a connection with the community
• The effort to sync with your sales team will pay off!
28. Networking
• Don’t change the default ports; it makes it that much harder for people to help you
• Take the time to review how your firewall engineer set up Splunk ports and paths. Help them build aliases and groups that match your docs and internal naming conventions.
• Protect that deployment server; it has a lot of power
• Disable port 8089 on Splunk when not needed
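On boxes where the management port isn't needed, it can be turned off in server.conf; setting shown as I understand the server.conf spec, so verify against your Splunk version:

```ini
# server.conf (sketch): disable the default 8089 management port
[httpServer]
disableDefaultPort = true
```

Make sure nothing (deployment server, license manager, REST automation) still needs to reach the instance on 8089 before disabling it.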
29. Deployment Topologies
Standalone Splunk instances are almost always the wrong way to go unless you are really, really sure you’re never going to cluster.
I made the mistake of going standalone, and expanding out was challenging.
CNAMEs for licensing, the deployment server… everything! They make it easier to migrate off shared instances.
Even if you have one indexer, consider setting up an indexer cluster in a “cluster of 1” model to make expansion easier.
Don’t buy hardware you can’t get a year later.
Have a dedicated box you do all your admin work from, never your workstation. Harden it and lock out everyone else.
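The CNAME advice can be sketched with deploymentclient.conf; "ds.example.com" is a placeholder alias, not a real host:

```ini
# etc/system/local/deploymentclient.conf (sketch)
[target-broker:deploymentServer]
# Point at a CNAME, not a physical host, so the deployment server can be
# rebuilt or moved without touching every forwarder
targetUri = ds.example.com:8089
```

The same pattern applies to license and cluster manager URIs: clients reference the alias, and only DNS changes at migration time.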
30. Configuration Files
Going straight to Git is challenging.
Teams all need to be versed in Git.
Taking the GUI away from beginners made the learning curve harder.
Consider relaxing the config management until your teams are level-set.
Indent your config files to make changes easier to see.
31. Data
• Keeping people educated about data is complex; formalize the process
• Folks want to use Splunk as a data lake. Stay use-case focused, not data-lake focused.
• Splunk is not a syslog server
• Metrics are still too hard in Splunk; folks don’t get it
32. Apps
• Use a global app like zzzzMyApp; remember Splunk applies app configs BACKWARDS, Z to A
• Create an app Zglobal to set your defaults
• Folks have trouble understanding proper data onboarding. It’s important to over-educate.
• Don’t trust anything with a binary, .py file, etc. in it. Test, test, test.
• It’s not even just security; I’ve found gigs of error logs in apps that were working fine. Only bring in what you need and understand.
33. Roles and Users
• Modern SSO with Entra or Okta is outstanding, especially for clusters
• If you must use LDAP, use LDAPS; don’t send creds in clear text
• I found there was “demand” for extreme rights in all cases, and in every case it resulted in bad things
• Match your permissions to your org chart as much as possible; 5 years from now you will thank yourself for not creating group creep
34. Closing Out: Save Money
• Don’t ingest what you don’t need
• Answer real questions, don’t just collect logs
• Use metrics if you can
• If use cases are unclear, leave the data on a syslog server, zip it up, and save it
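"Don't ingest what you don't need" often comes down to routing junk to nullQueue at parse time. A sketch, with an invented sourcetype and regex:

```ini
# props.conf (sketch)
[my_sourcetype]
TRANSFORMS-drop_debug = drop_debug

# transforms.conf (sketch)
[drop_debug]
REGEX = \bDEBUG\b
DEST_KEY = queue
FORMAT = nullQueue
```

Events matching the regex are discarded before indexing and never count against the license, which is exactly the lever you want on a tight 50 GB budget.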
35. It’s a balancing act you can’t ‘win’, but remember DOWNTIME to reduce risk and waste
• Defects – if the customer isn’t happy, it’s not worth it
• Overproduction – have standards, but don’t overdo the data enrichment
• Waiting – if you’re waiting or they are waiting, you’re not adding value
• Non-Utilized Talent – design your roles, training programs, and documentation for self-service
• Transport – design your experience to reduce moving between tools when possible
• Inventory – logs and metrics without a use case are just inventory junk
• Motion – requiring multiple people to get one thing done is a recipe for problems
• Extra Process – too much process is just as bad as not enough