
Anonymizing and Pseudonymizing Data in Splunk Enterprise


There are many reasons why machine data should be protected against unauthorized access. Internal and external compliance requirements, as well as "privacy by design" strategies for improving security or as part of a risk-minimization strategy, are becoming increasingly important for companies working with big data.

In this webinar you will learn how to protect your machine data at different levels:

- In motion: secure the connections to and from Splunk Enterprise
- Data integrity: ensure the integrity of the data stored in Splunk
- At rest: encrypt all data that Splunk writes to disk
- Anonymize or pseudonymize individual sensitive fields in your machine data



  1. 1. Copyright © 2015 Splunk Inc. Data Obfuscation in Splunk Enterprise
  2. 2. Agenda: The Drivers; Data-in-Flight; Data-at-Rest; Data Obfuscation within Splunk Enterprise (Anonymization, Pseudonymization, Summing Up); Demonstration
  4. 4. The Drivers risk minimization strategy
  5. 5. The Drivers: Collect and Process Data. Stakeholders*: Workers Council, Data Privacy Officer, GDPR, Privacy Shield, PCI, …. Requirements*: Anonymization, Pseudonymization, Pseudonymization, Encryption, RAW event archival for 1 year, 3 months online. *Examples only. Your legal department will assist you.
  6. 6. The Drivers: Collect and Process Data (continued). You need a flexible platform that fits your needs, even if they change!
  7. 7. Spoilt for Choice. What: Confidentiality / Integrity / Authenticity. Where: At Source / In Flight / At Rest / Presentation Layer. How: Anonymization / Pseudonymization. Plus usability, maintainability, cost, …
  8. 8. Data-in-Flight
  9. 9. Data-in-Flight: Ways to secure your connections to Splunk Enterprise. Encryption and/or authentication using your own certificates for: communications between the browser and Splunk Web; communication from Splunk forwarders to indexers; other types of communication, such as communications between Splunk instances over the management port.

     | Type of exchange | Client function | Server function | Encryption | Certificate authentication | Common Name checking | Type of data exchanged |
     | Browser to Splunk Web | Browser | Splunk Web | NOT enabled by default | dictated by client (browser) | dictated by client (browser) | search term results |
     | Inter-Splunk communication | Splunk Web | splunkd | enabled by default | NOT enabled by default | NOT enabled by default | search term results |
     | Forwarding | splunkd as a forwarder | splunkd as an indexer | NOT enabled by default | NOT enabled by default | NOT enabled by default | data to be indexed |
     | Deployment server to indexers | splunkd as a forwarder | splunkd as an indexer | NOT enabled by default | NOT enabled by default | NOT enabled by default | Not recommended. Use Pass4SymmKey instead. |

     http://docs.splunk.com/Documentation/Splunk/latest/Security/AboutsecuringyourSplunkconfigurationwithSSL
  10. 10. Data-at-Rest
  11. 11. Data-at-Rest: Integrity. Ways to ensure the integrity of your machine data stored in Splunk: a SHA256 hash is computed for every slice in a hot bucket; when the bucket rolls from hot to warm, a SHA256 hash is created of the file containing the hashes of the individual slices; integrity can be verified from the CLI; the feature is enabled for an entire index. http://docs.splunk.com/Documentation/Splunk/latest/Security/Dataintegritycontrol http://blogs.splunk.com/2015/10/28/data-integrity-is-back-baby/
  12. 12. Data-at-Rest: Encryption of the entire data set. Encryption of all data Splunk writes to disk (index, raw data, metadata). Pros: easy to implement with OS or device means; covers all data; transparent to Splunk. Cons: applies to all indexes on a given file system; performance overhead; limited security against rogue users.
  13. 13. Data-at-Rest: Encryption. Transparent encryption at rest with Vormetric: https://www.vormetric.com/sites/default/files/wp-splunk-vormetric.pdf
  14. 14. Data Obfuscation within Splunk
  15. 15. What is Anonymization? Anonymization of data means processing it with the aim of irreversibly preventing the identification of the individual to whom it relates. Example: "2016-12-24 09:00 host1 mm28522 login successful" becomes "2016-12-24 09:00 host1 ****** login successful".
  16. 16. What is Pseudonymization? Pseudonymization of data means replacing any identifying characteristics of data with a pseudonym, or, in other words, a value which does not allow the data subject to be directly identified. Example: "2016-12-24 09:00 host1 mm28522 login successful" becomes "2016-12-24 09:00 host1 0fc43cd589ec74ddb677501adf6c295b login successful".
  17. 17. Anonymization
  18. 18. Anonymization: At Rest / At Indexing Time / Modify Raw Events, using SEDCMD or TRANSFORMS.
      props.conf:
        [source::.../accounts.log]
        SEDCMD-accounts = s/ssn=\d{5}(\d{4})/ssn=xxxxx\1/g
        [source::.../another.log]
        TRANSFORMS-anon = ssn-anon
      transforms.conf:
        [ssn-anon]
        REGEX = (?s)^(.*ssn=)\d{5}(\d{4}.*)$
        FORMAT = $1xxxxx$2
        DEST_KEY = _raw
      With DEST_KEY = _raw, the FORMAT result replaces the whole raw event, so the regex captures the text before and after the masked digits.
      https://docs.splunk.com/Documentation/Splunk/latest/Data/Anonymizedata
  19. 19. Anonymization: Presentation Layer / At Search Time. Locked-down user: pre-defined app with dashboard access only; no search app, no raw search, no raw event drill-down. Search: | eval username = "******" https://docs.splunk.com/Documentation/Splunk/6.5.1/Data/Anonymizedata
  20. 20. Pseudonymization
  21. 21. Pseudonymization: Presentation Layer / At Search Time. Locked-down user: pre-defined app with dashboard access only; no search app, no raw search, no raw event drill-down. Search: | eval username = sha256(username), or use your own custom search command. https://docs.splunk.com/Documentation/Splunk/6.5.1/Data/Anonymizedata
  22. 22. Pseudonymization: At Source / Application. Data is pseudonymized before Splunk picks it up. Pros: handled as early as possible in the process; the data source owner is responsible; the data-privacy challenge is solved for data stored at the source as well. Cons: an individual solution is required per data source, type, and method.
  23. 23. Pseudonymization: Event Duplication Into Different Indexes (idx_cleartext, idx_pseudonym). User authorization is managed via role-based access control for indexes. Pros: easy to implement and maintain, easy to use, low complexity. Cons: storage costs (can be limited with tsidx retention, at the price of slower searches); license costs.
  24. 24. Pseudonymization: Using Summary Index (idx_cleartext, idx_summary). A scheduled summary search transforms the data and stores it in a new summary index. Pros: the summary index does not count against the license; everything is managed in the GUI; allows grouped aggregation (anonymization, too). Cons: the regular search uses resources; breaks out-of-the-box CIM (source=search name, sourcetype=stash, original sourcetype moved to orig_sourcetype).
  25. 25. Pseudonymization: Modular Input. Data is piped through a custom method in a decentralized way, using a modular input. Pros: high flexibility regarding encryption and hashing methods and requirements; processing can be done at each forwarder to distribute the load. Cons: scripting is required for modular inputs.
  26. 26. Summing Up
  27. 27. Summing Up. There are many possible ways, and each has pros and cons. Anonymization: data aggregation might be needed as an additional layer, because specific access to a specific file from a specific host can potentially be traced back to an individual. Pseudonymization: requires a proper concept so that the pros and cons are known and accepted in advance, and the impact and additional complexity are understood for production and operations. We are transparent about the possibilities and offer multiple ways and levels of data obfuscation. Choose the best and most efficient combination for you!
  28. 28. Demonstration
  29. 29. Modular Input Documentation: http://docs.splunk.com/Documentation/Splunk/latest/AdvancedDev/ModInputsIntro
  30. 30. Modular Input Search on Splunkbase: https://splunkbase.splunk.com/apps/#/search/Modular%20Input/
  31. 31. Protocol Data Inputs (https://splunkbase.splunk.com/app/1901/). Supports different input protocols and different output protocols. A custom data handler allows data to be pre-processed. Polyglot: many programming languages can be used for the data handler, e.g. Java, JavaScript, Python, …
  32. 32. Demo Scenarios.
      Encryption (Modular Input): log file with sensitive data → log file data read by File Monitor input (UF) → Protocol Data Inputs → Data Handler encrypts field values → data sent and stored.
      Decryption (Custom Search Command): events in Splunk with encrypted field values → user is authorized to use the custom search command → custom search command decrypts fields.
      Anonymization (SEDCMD): log file with sensitive data → log file data read by File Monitor input (UF) → pipeline applies SEDCMD and replaces data → data stored.
  33. 33. Log File With Sensitive Data – cleartext.log

      | Field | Description | Action we want to take |
      | first | First name | Encrypt with AES |
      | name | Last name | Encrypt with AES |
      | dob | Date of birth | Encrypt with AES |
      | uid | Employee ID | Anonymize |
  34. 34. UF File Monitor – Forward Data
  35. 35. Receiving side – Protocol Data Inputs
  36. 36. Protocol Data Inputs Configuration – Protocols
  37. 37. Protocol Data Inputs Configuration – Data Handler. PDI custom data handler (here: Java). Parameters for the custom data handler: regex (identifies the fields to encrypt) and AES_Key_File (the key used for encryption).
  38. 38. Processed Data
  39. 39. Decrypt Data – Custom Search Command
  40. 40. Anonymization
  41. 41. SEDCMD for Anonymization of uid Field (props.conf)
  42. 42. Q & A
  43. 43. Splunk User Groups EMEA: https://usergroups.splunk.com/
  44. 44. Thank You!
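
The two field-level techniques from slides 18 and 21 can be tried outside Splunk with a few lines of Python. This is only a minimal sketch, not Splunk's implementation: the function names are hypothetical, the regex mirrors the SEDCMD rule for ssn values, and the hash is assumed to match the hexadecimal digest that eval's sha256() returns.

```python
import hashlib
import re

# Same idea as the SEDCMD rule on slide 18: mask the first five digits
# of a nine-digit ssn value and keep the last four.
SSN_PATTERN = re.compile(r"ssn=\d{5}(\d{4})")


def anonymize_ssn(event: str) -> str:
    """Return the raw event with every ssn value partially masked."""
    return SSN_PATTERN.sub(r"ssn=xxxxx\1", event)


def pseudonymize(value: str) -> str:
    """Return the hex SHA-256 digest of the UTF-8 encoded value.

    The same input always yields the same pseudonym, so events of one
    user can still be correlated without showing the user name.
    """
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def pseudonymize_field(event: dict, field: str) -> dict:
    """Return a copy of the event with one field replaced by its pseudonym."""
    result = dict(event)
    if field in result:
        result[field] = pseudonymize(result[field])
    return result


if __name__ == "__main__":
    print(anonymize_ssn("2016-12-24 09:00 host1 ssn=123456789 login successful"))
    print(pseudonymize_field({"host": "host1", "username": "mm28522"}, "username"))
```

An unsalted hash of a short value such as a user ID can be reversed by hashing all candidate IDs, so a keyed hash (for example HMAC with a secret key) may be a better fit where that matters.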
