Data Obfuscation in Splunk Enterprise

Copyright © 2015 Splunk Inc.

2
Agenda
The Drivers
Data-in-Flight
Data-at-Rest
Data Obfuscation within Splunk Enterprise
– Anonymization
– Pseudonymization
Summing Up
Demonstration

4
The Drivers
Risk minimization strategy

5
The Drivers
Stakeholders* and their requirements*:
– Workers Council: anonymization
– Data Privacy Officer: pseudonymization
– GDPR: pseudonymization
– Privacy Shield: encryption
– PCI: raw event archival for 1 year, 3 months of it online
– …
*Examples only | Your legal department will assist you.

6
The Drivers
You need a flexible platform that fits your needs, even if they change!

7
Spoilt for Choice
What
– Confidentiality / Integrity / Authenticity
Where
– At Source / In Flight / At Rest / Presentation Layer
How
– Anonymization / Pseudonymization
Usability, Maintainability, Cost, …

9
Data-in-Flight
Encryption and/or authentication using your own certificates for:
– Communication between the browser and Splunk Web
– Communication from Splunk forwarders to indexers
– Other types of communication, such as communication between Splunk instances over the management port
Type of exchange | Client function | Server function | Encryption | Certificate authentication | Common Name checking | Type of data exchanged
Browser to Splunk Web | Browser | Splunk Web | NOT enabled by default | dictated by client (browser) | dictated by client (browser) | search term results
Inter-Splunk communication | Splunk Web | splunkd | enabled by default | NOT enabled by default | NOT enabled by default | search term results
Forwarding | splunkd as a forwarder | splunkd as an indexer | NOT enabled by default | NOT enabled by default | NOT enabled by default | data to be indexed
Deployment server to indexers | splunkd as a forwarder | splunkd as an indexer | NOT enabled by default | NOT enabled by default | NOT enabled by default | Not recommended. Use pass4SymmKey instead.
http://docs.splunk.com/Documentation/Splunk/latest/Security/AboutsecuringyourSplunkconfigurationwithSSL
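The table distinguishes encryption, certificate authentication and Common Name checking. As a generic illustration only (this is not Splunk code; Splunk configures these settings in its own .conf files), a TLS client built with Python's standard ssl module that trusts your own CA might look like the sketch below. The helper name and its parameters are hypothetical.

```python
import ssl
from typing import Optional


def make_client_context(ca_file: Optional[str] = None,
                        check_server_name: bool = True) -> ssl.SSLContext:
    """Hypothetical helper: a TLS client context that trusts your own CA.

    ca_file: path to your CA certificate bundle (PEM); None falls back to
    the system trust store.
    """
    # create_default_context() enables certificate verification
    # (CERT_REQUIRED) and server name checking by default.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # Refuse legacy protocol versions.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if not check_server_name:
        # Comparable to "Common Name checking: NOT enabled": the certificate
        # chain is still verified, but the server's name is not compared
        # against the certificate.
        ctx.check_hostname = False
    return ctx
```

The context would then wrap a connected socket, for example `ctx.wrap_socket(sock, server_hostname="indexer.example.com")` with a hypothetical host name. Disabling name checking keeps encryption and chain validation but lets any holder of a certificate from your CA impersonate the server.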

11
Integrity
Computes a SHA-256 hash for every slice in the hot bucket
When the bucket rolls from hot to warm, computes a SHA-256 hash of the file containing the hashes of the individual slices
Integrity can be verified from the CLI
Enabled for an entire index
http://docs.splunk.com/Documentation/Splunk/latest/Security/Dataintegritycontrol
http://blogs.splunk.com/2015/10/28/data-integrity-is-back-baby/
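The two-level hashing described above can be sketched with Python's hashlib. This only illustrates the idea: Splunk's actual file names and on-disk layout are not shown on the slide, so the "one hex digest per line" format of the hash file below is an assumption.

```python
import hashlib
from typing import Iterable, List


def slice_hashes(slices: Iterable[bytes]) -> List[str]:
    """Level 1: one SHA-256 digest per slice, computed while the bucket is hot."""
    return [hashlib.sha256(s).hexdigest() for s in slices]


def hash_of_hashes(hashes: List[str]) -> str:
    """Level 2: SHA-256 of the file holding the slice hashes (assumed: one per line)."""
    content = "".join(h + "\n" for h in hashes).encode("ascii")
    return hashlib.sha256(content).hexdigest()


def verify(slices: Iterable[bytes], stored_hashes: List[str], stored_top: str) -> bool:
    """Recompute both levels and compare them with the values stored at roll time."""
    # A modified slice changes its level-1 digest; a modified hash file
    # changes the level-2 digest.
    return (slice_hashes(slices) == stored_hashes
            and hash_of_hashes(stored_hashes) == stored_top)
```

The second level is what makes tampering with the hash file itself detectable: an attacker who rewrites a slice and its level-1 digest still has to produce a matching level-2 digest.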

12
Encryption
Encryption of all data Splunk writes to disk (index, raw data, metadata)
Pros:
– Easy to implement with OS- or device-level encryption
– Covers all data
– Transparent to Splunk
Cons:
– Limited granularity
– Performance overhead
– Limited security against rogue users

13
Encryption
https://www.vormetric.com/sites/default/files/wp-splunk-vormetric.pdf

Data Obfuscation in Splunk Enterprise

15
What is Anonymization?
Anonymization of data means processing it with the aim of irreversibly
preventing the identification of the individual to whom it relates.
2016-12-24 09:00 host1 mm28522 login successful
2016-12-24 09:00 host1 ****** login successful

16
What is Pseudonymization?
Pseudonymization of data means replacing any identifying
characteristics of data with a pseudonym, or, in other words, a value
which does not allow the data subject to be directly identified.
2016-12-24 09:00 host1 mm28522 login successful
2016-12-24 09:00 host1 0fc43cd589ec74ddb677501adf6c295b login successful
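The slide does not say how the example pseudonym was produced. One common approach, sketched here as an assumption, is a keyed hash (HMAC-SHA256): the same user ID always maps to the same pseudonym, so events stay correlatable per user. A key matters because employee IDs such as mm28522 come from a small space, and an unkeyed hash could be reversed by simply hashing every candidate ID.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for `value` using HMAC-SHA256 (hex encoded)."""
    # Same value + same key -> same pseudonym; without the key the
    # pseudonym cannot be linked back to the value by brute force.
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    key = b"hypothetical-key-store-securely"  # placeholder, not a real key
    event = "2016-12-24 09:00 host1 mm28522 login successful"
    print(event.replace("mm28522", pseudonymize("mm28522", key)))
```

Whoever holds the key can still re-identify users, which is exactly what distinguishes pseudonymization from anonymization; the key therefore needs the same protection as the clear-text data.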

19
At Indexing Time
Use SEDCMD or TRANSFORMS at index time
Pros:
– Easy to implement and maintain, easy to use, low complexity
– No impact on licensing
Cons:
– Modifies raw events
– Anonymization means less information is available
https://docs.splunk.com/Documentation/Splunk/latest/Data/Anonymizedata
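A SEDCMD entry in props.conf has the sed-style form `SEDCMD-<class> = s/<regex>/<replacement>/g`. Before deploying one, the substitution can be tried out in Python. The ID pattern below (two lowercase letters followed by five digits, matching the mm28522 example) is an assumption, and Python's re and Splunk's regex engine are not identical dialects, so test the final expression in Splunk as well.

```python
import re

# Hypothetical employee-ID format: two lowercase letters + five digits.
# A props.conf line with the same intent might look like:
#   SEDCMD-anonymize_uid = s/\b[a-z]{2}\d{5}\b/******/g
USER_ID = re.compile(r"\b[a-z]{2}\d{5}\b")


def anonymize(raw_event: str) -> str:
    """Irreversibly mask every user ID in the raw event (like s/.../******/g)."""
    return USER_ID.sub("******", raw_event)


if __name__ == "__main__":
    print(anonymize("2016-12-24 09:00 host1 mm28522 login successful"))
```

Because the replacement is a constant, the original value cannot be recovered from the indexed event, which is the "less information available" trade-off listed above.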

21
Presentation Layer
Hide data at the presentation layer
Locked-down users:
– Pre-defined app with dashboard access only
– No Search app, no raw search, no drilldown to raw events
Mask or hash sensitive fields in the dashboard searches:
| eval username="****"
| eval username=sha256(username)
or use your own custom search command
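What the two eval variants do to each result row can be sketched in plain Python, for example as the core of a custom search command. The function name and the `mode` values are hypothetical, and the sketch assumes SPL's sha256() hashes the UTF-8 bytes of the value and returns lowercase hex.

```python
import hashlib
from typing import Dict, Iterable, Iterator


def mask_field(records: Iterable[Dict[str, str]], field: str,
               mode: str = "mask") -> Iterator[Dict[str, str]]:
    """Mask (constant) or hash (SHA-256 hex) one field in every result row."""
    for record in records:
        out = dict(record)  # never modify the caller's row in place
        if field in out:
            if mode == "mask":
                out[field] = "****"  # like: eval username="****"
            elif mode == "sha256":
                # like: eval username=sha256(username)
                out[field] = hashlib.sha256(out[field].encode("utf-8")).hexdigest()
            else:
                raise ValueError(f"unknown mode: {mode}")
        yield out
```

Masking hides the value completely, while hashing keeps rows correlatable per user. An unkeyed SHA-256 of a short ID can be reversed by hashing all candidate IDs, so the hash variant relies on users being locked out of raw searches as described above.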

22
Application Layer
Pseudonymize data before Splunk picks it up
Pros:
– Handled as early as possible in the process
– The data source owner is responsible
– Also solves the data-privacy challenge for data stored at the source
Cons:
– Requires an individual solution per data source, type, or method
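For an application that writes its own log file, pseudonymization at the application layer can be as small as a logging filter. The sketch below uses Python's standard logging module; the ID pattern and the HMAC-based pseudonym are assumptions, not something the slide prescribes.

```python
import hashlib
import hmac
import logging
import re

USER_ID = re.compile(r"\b[a-z]{2}\d{5}\b")  # hypothetical employee-ID format


class PseudonymizingFilter(logging.Filter):
    """Replace user IDs with keyed pseudonyms before a record is written."""

    def __init__(self, secret_key: bytes) -> None:
        super().__init__()
        self._key = secret_key

    def _pseudonym(self, match: re.Match) -> str:
        return hmac.new(self._key, match.group(0).encode("utf-8"),
                        hashlib.sha256).hexdigest()

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the final message once, rewrite it, and clear the args so
        # the handler does not re-insert the original values.
        record.msg = USER_ID.sub(self._pseudonym, record.getMessage())
        record.args = ()
        return True  # keep the record; only its content changes
```

Attach it to the handler that writes the file Splunk monitors, e.g. `handler.addFilter(PseudonymizingFilter(key))`, so the clear-text ID never reaches the disk at the source either.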

23
Event Duplication
Duplicate each event; store the original and the pseudonymized event in separate indexes
Pros:
– Easy to implement and maintain, easy to use, low complexity
Cons:
– Storage costs (can be limited with tsidx retention, at the price of slower searches)
– License costs
Indexes: idx_cleartext, idx_pseudonym
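The routing logic behind event duplication is simple; the cost side is what matters. The sketch below is illustrative only: how the copy is produced inside Splunk is not specified on the slide, the function names are hypothetical, and counting UTF-8 bytes is only a rough stand-in for license metering.

```python
from typing import Callable, List, Tuple

Routed = List[Tuple[str, str]]  # (index name, event)


def duplicate(event: str, pseudonymize: Callable[[str], str],
              clear_index: str = "idx_cleartext",
              pseudo_index: str = "idx_pseudonym") -> Routed:
    """Emit the original event and a pseudonymized copy, one per index."""
    return [(clear_index, event), (pseudo_index, pseudonymize(event))]


def indexed_bytes(routed: Routed) -> int:
    """Rough volume estimate: both copies are indexed, so both are counted."""
    return sum(len(event.encode("utf-8")) for _, event in routed)
```

Access control then does the rest: most roles search only idx_pseudonym, and only a few privileged roles get access to idx_cleartext.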

24
Summary Index
A scheduled search transforms the data and stores the results in a separate summary index
Pros:
– Summary index does not count against the license
– Everything can be managed through the GUI
– Allows aggregation by groups, which can also anonymize the data
Cons:
– The scheduled search consumes resources every time it runs
– Breaks out-of-the-box CIM compatibility (source is set to the search name, sourcetype=stash, and the original sourcetype moves to orig_sourcetype)
Indexes: idx_cleartext → idx_summary
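Aggregation as anonymization means keeping only group-level counts, roughly what a scheduled search ending in `stats count by host, action` would write to idx_summary. The Python sketch below mirrors that idea; the `min_count` threshold (suppressing very small groups) is an added assumption, not part of the slide.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def summarize(events: Iterable[Dict[str, str]],
              group_by: Tuple[str, ...],
              min_count: int = 1) -> List[Dict[str, object]]:
    """Count events per group; every field not in `group_by` is dropped.

    Groups smaller than `min_count` are suppressed, because a group of one
    can still point back to a single person.
    """
    counts = Counter(tuple(e.get(f, "") for f in group_by) for e in events)
    return [
        dict(zip(group_by, key), count=n)
        for key, n in sorted(counts.items())
        if n >= min_count
    ]
```

Choosing the grouping fields is the real design decision: the fewer and coarser they are, the harder it becomes to single out an individual.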

25
Input Layer
Data is piped through a custom method using a modular input, decentralized at the data sources
Pros:
– High flexibility in encryption, hashing, and other methods and requirements
– Processing can be done at each forwarder, distributing the processing load
Cons:
– Modular inputs require scripting
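At its core, the custom method is a line filter: read events, transform them, pass them on. The sketch below assumes a simple line-oriented interface on stdin/stdout; it does not model how a real modular input hands events to Splunk, and `pipe`/`main` are hypothetical names.

```python
import sys
from typing import Callable, Iterable, Iterator, Optional, TextIO


def pipe(lines: Iterable[str], transform: Callable[[str], str]) -> Iterator[str]:
    """Apply `transform` to each event line, keeping the line terminator."""
    for line in lines:
        body = line.rstrip("\n")
        ending = "\n" if line.endswith("\n") else ""
        yield transform(body) + ending


def main(transform: Callable[[str], str],
         source: Optional[TextIO] = None,
         sink: Optional[TextIO] = None) -> None:
    """Read events from `source` (default stdin), write results to `sink` (default stdout)."""
    source = source or sys.stdin
    sink = sink or sys.stdout
    for out in pipe(source, transform):
        sink.write(out)
    sink.flush()
```

Because `transform` is just a function, the same wrapper can host encryption, hashing or masking, which is the flexibility the slide lists as a pro; running it on every forwarder spreads the CPU cost.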

26
Summing Up
Many possible ways, each with its own pros and cons
Anonymization
– Data aggregation may be needed as an additional layer: even without a user name, an access to a specific file from a specific host can potentially be traced back to an individual
Pseudonymization
– Requires a well-defined concept, so that the pros and cons are known and accepted in advance and the impact and added complexity in production and operations are understood
We are transparent about the possibilities and support multiple methods and levels of data obfuscation.

28
Demo Scenario
Encryption (Modular Input)
– Log file with sensitive data
– Read log file data: File Monitor input (UF)
– Modular Input encrypts field values
– Data sent and stored
Decryption (Custom Search Command)
– Events in Splunk with encrypted field values
– User is authorized to use the custom search command
– Custom search command decrypts fields
Anonymization (SEDCMD)
– Log file with sensitive data
– Read log file data: File Monitor Input (UF)
– Pipeline: apply SEDCMD and replace data
– Data stored

29
Modular Input?
https://splunkbase.splunk.com/app/1901/

30
Modular Input? Splunkbase!
http://docs.splunk.com/Documentation/Splunk/latest/AdvancedDev/ModInputsIntro

31
Protocol Data Input
Different input protocols
A custom data handler allows you to pre-process data
– Polyglot: many programming languages can be used, e.g. Java, JavaScript, Python, …
Different output protocols
Data Handler
https://splunkbase.splunk.com/app/1901/

32
Log file cleartext.log
Field | Description | Action we want to take
first | First name | Encrypt with AES
name | Last name | Encrypt with AES
dob | Date of birth | Encrypt with AES
uid | Employee ID | Anonymize
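The per-field actions in the table can be expressed as a small dispatch over key=value pairs. This is a sketch under stated assumptions: the key=value line format of cleartext.log is assumed, and the AES step is passed in as a callable (for example a wrapper around a crypto library's AES implementation) rather than implemented here.

```python
import re
from typing import Callable, Dict

# Assumed cleartext.log format: space-separated key=value pairs.
PAIR = re.compile(r"(?P<key>\w+)=(?P<value>\S+)")

# Field -> action, as listed in the cleartext.log table.
ACTIONS: Dict[str, str] = {
    "first": "encrypt",
    "name": "encrypt",
    "dob": "encrypt",
    "uid": "anonymize",
}

MASK = "******"  # fixed-length mask, so the original length is not leaked


def protect_line(line: str, encrypt: Callable[[str], str]) -> str:
    """Apply the per-field action to every key=value pair in `line`.

    `encrypt` is supplied by the caller (e.g. an AES wrapper); it must
    return text without spaces, such as base64, to keep the line format.
    """
    def replace(match: re.Match) -> str:
        key, value = match.group("key"), match.group("value")
        action = ACTIONS.get(key)
        if action == "encrypt":
            value = encrypt(value)
        elif action == "anonymize":
            value = MASK
        # Fields without an action (e.g. action=login) pass through unchanged.
        return f"{key}={value}"

    return PAIR.sub(replace, line)
```

Keeping the action table separate from the code means a changed requirement (for example, uid moving from anonymize to pseudonymize) is a data change, not a rewrite.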

34
IDX Anonymization SEDCMD (props.conf)

35
Create PDI Custom Data Handler

36
Receiver – Protocol Data Input

37
PDI Configuration – Protocols

38
PDI Configuration – Data Handler
Parameters for the custom data handler:
• regex: identifies the fields to encrypt
• *_Key_File: key files to use for encryption
PDI Custom data handler (here: Java)

40
Decrypt Data – Custom Search Command
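The decryption side of the demo can be sketched as a function a custom search command might call for each batch of results. Two assumptions apply: the decryption routine is passed in as a callable (the AES details are not shown here), and the explicit role check is a hypothetical stand-in, since in Splunk who may run a command is normally controlled through roles and permissions rather than inside the command.

```python
from typing import Callable, Dict, Iterable, Iterator, Set

# Encrypted fields, per the cleartext.log table.
ENCRYPTED_FIELDS = ("first", "name", "dob")


def _decrypt_record(record: Dict[str, str],
                    decrypt: Callable[[str], str]) -> Dict[str, str]:
    out = dict(record)  # leave the input record untouched
    for field in ENCRYPTED_FIELDS:
        if field in out:
            out[field] = decrypt(out[field])
    return out


def decrypt_records(records: Iterable[Dict[str, str]],
                    user_roles: Set[str],
                    allowed_roles: Set[str],
                    decrypt: Callable[[str], str]) -> Iterator[Dict[str, str]]:
    """Decrypt the encrypted fields, but only for authorized users."""
    # Checked eagerly, before any record is processed.
    if not user_roles & allowed_roles:
        raise PermissionError("user is not authorized to decrypt these fields")
    return (_decrypt_record(r, decrypt) for r in records)
```

Note that uid stays masked: anonymized values cannot be recovered by any key, which is the practical difference between the encryption and the SEDCMD paths of the demo.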
