Limitations of Privacy Solutions for Log Files
Jonathan Oliver jon oliver@trendmicro.com
31 August 2021
1 Introduction
In this paper we consider collecting log files (in particular, log files used for security purposes)
and the storage and processing of those log files. Some use cases include:
• Working with data which has PII (personally identifiable information) embedded in it.
For example, data with email addresses in it.
• When data is processed in a third-party country. For example, data collected in country A
may be hosted on cloud servers in country B. Complex situations may arise because the data
may then fall under the laws of country B.
• Extracting IoCs (indicators of compromise) from data. We are interested in IoCs which
are public knowledge and do not uniquely identify a victim.
1.0.1 Privacy Example
Consider a situation with 3 people: Alice, Bob and Charlie. Each person generates log files
which track various events which occur on their computers.
Attackers send personalized malware containing the string XYZZY (the malicious IoC) together
with an encoding of the victim's name. So the logs look like:
Person Computer Event Data
------ -------- ----- ----
Alice Computer1 EventA-1 XYZZY-abc
Alice Computer1 EventA-2 XYZZY-abc
Alice Computer1 EventA-3 XYZZY-abc
...
Bob Computer2 EventB-1 XYZZY-def
Bob Computer2 EventB-2 XYZZY-def
Bob Computer2 EventB-3 XYZZY-def
...
Charlie Computer3 EventC-1 XYZZY-ghj
Charlie Computer3 EventC-2 XYZZY-ghj
Charlie Computer3 EventC-3 XYZZY-ghj
where
abc = encrypted(Alice)
def = encrypted(Bob)
ghj = encrypted(Charlie)
We want to extract an IoC associated with this malware (XYZZY in this case) while maximising
the privacy afforded to Alice / Bob / Charlie.
This example is typical of various log files which are generated by security products such as:
• Email logs;
• Windows event logs;
• Firewall logs;
• . . .
1.1 Desirable Properties
We desire a privacy solution which allows us to collect the logs from various machines / computers
and process them in a way that protects the privacy of the individuals. Specifically, we want to
meet the following privacy requirements with a reasonable amount of computation:
• Collect these logs from multiple computers into a single repository;
• Transform / delete parts of the data which identify a person;
• Retain data which occurs across multiple people (and hence may be considered public
data).
1.2 Review of Privacy Approaches
Here we review the various privacy methods and attempt to apply them to our example
above. We distinguish between 2 types of data:
• Descriptive data: which has one row per person (the majority of privacy methods
adequately address this problem)
• Log files: where a person may contribute multiple rows (typically many rows). This covers
the various log files mentioned above (event logs, firewall logs, etc.) and we discuss below
why privacy solutions (such as differential privacy or k-anonymity) do not adequately
address these types of data.
1.2.1 Descriptive Data
A typical list of people might look like:
Person Id   Name       Email            Country     Industry
1           Person A   a@abc.company    Argentina   Accounting
2           Person B   b@b.company      Brazil      Manufacturing
...
100         Person Z   z@z.company      USA         Health
This type of data can be made “private” using differential privacy or k-anonymity (well respected
privacy approaches used around the world).
1.2.2 Log Files
Log files consist of 2 separate tables (explicitly or implicitly). In most log files, the first
table defines the people under consideration, and the second table defines events or
transactions for each person in the first table.
The first table is a list of people:
Table 1
PID Col1 . . . ColMax1
P1 . . .
. . . . . .
PMax . . .
Column 1 is a PID which defines each person.
The second table is a list of events (or transactions) from the people in Table 1:
Table 2
PID Event Id Col1 . . . ColMax2
P1 Event1 . . .
P1 Event2 . . .
P1 Event3 . . .
. . . . . . . . .
Pj EventMax . . .
In the second table, we allow multiple events to be associated with a personal identifier. For
example, Table 2 has 3 events associated with PID P1.
1.2.3 Privacy Approaches
We review a range of privacy mechanisms in this paper, and consider how they can be applied
to the log file problem. We consider:
• Differential Privacy [1, 2]
• k-anonymity [3, 4]
• Homomorphic Encryption [5, 6]
• Monero style privacy [7]
• Secure Multiparty Computation [8, 9] (which also covers Federated Machine Learning [10])
• Secret Sharing Schemes [11]
1.2.4 Privacy Operations
The operations used by privacy mechanisms (including those listed above) include:
• Suppressing data (either deleting it or replacing it with NULL values);
• Generalizing data (for example, transforming a person's age into an age range);
• Encrypting data;
• Hashing data; and
• Adding errors to data.
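Three of these operations can be illustrated on a single field. The sketch below is illustrative only: the function names are hypothetical, the 10-year age bands are an arbitrary choice, and HMAC-SHA256 is our own choice of hashing (the paper does not specify one). Encryption and error addition are omitted.

```python
import hashlib
import hmac
from typing import Optional


def suppress(_value: str) -> Optional[str]:
    """Suppression: replace the value with a NULL (None)."""
    return None


def generalize_age(age: int, width: int = 10) -> str:
    """Generalization: map an exact age to a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def keyed_hash(value: str, key: bytes) -> str:
    """Hashing: a keyed hash (HMAC-SHA256) of the value.

    A keyed hash is used rather than a bare hash because low-entropy values
    such as email addresses can be recovered from a bare hash by guessing.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    print(suppress("a@abc.company"))   # None
    print(generalize_age(37))          # 30-39
    # The key below is a hypothetical placeholder.
    print(keyed_hash("a@abc.company", b"hypothetical-secret-key")[:16])
```

Note that hashing keeps equal values linkable across rows, which is useful for finding IoCs but also means a hashed identifier still groups all of one person's events together.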
2 Differential Privacy
Differential privacy is a system for publicly sharing information about a dataset by describing
the patterns of groups within the dataset while withholding information about individuals in
the dataset.
Consider a data row of interest. If errors are added in a systematic way, so that you get the
same or similar answers with or without the row in question, then we have protected the
privacy of that row.
The definition and maths can extend to making 2 rows, 3 rows, ... private. This covers the
case where we want groups of individuals up to some size N to remain private. So, given N,
the maximum number of rows that we need to make private at once, we can determine the
error distribution needed to achieve that.
Differential Privacy is not suited to the log file problem. The amount of error required to
achieve privacy on a log file depends on the number of rows which may be associated with
a person. For example, an email log file for 1 day might contain 100 emails from a single user.
Ensuring the privacy of this data would require an extraordinary amount of error to be added,
and would almost certainly make any analysis useless.
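To make the scaling argument concrete, here is a minimal sketch of the Laplace mechanism, one standard way of achieving differential privacy (the paper does not commit to a particular mechanism), applied to a count query over a log. The names `laplace_noise`, `noise_scale`, `noisy_count` and `max_rows_per_person` are our own. The key assumption is that one person may contribute up to `max_rows_per_person` rows, so the sensitivity of the count, and hence the noise scale, grows in proportion.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponential variables with mean
    `scale` is Laplace-distributed with location 0 and the given scale.
    """
    rate = 1.0 / scale  # expovariate takes a rate, i.e. 1 / mean
    return rng.expovariate(rate) - rng.expovariate(rate)


def noise_scale(max_rows_per_person: int, epsilon: float) -> float:
    """Laplace scale for a count query under person-level privacy.

    Removing one person can change a row count by up to
    max_rows_per_person, so that is the query's sensitivity.
    """
    return max_rows_per_person / epsilon


def noisy_count(true_count: int, max_rows_per_person: int,
                epsilon: float, rng: random.Random) -> float:
    """Release a row count with Laplace noise calibrated to the sensitivity."""
    return true_count + laplace_noise(noise_scale(max_rows_per_person, epsilon), rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # One row per person (descriptive data) versus 100 emails per user per day.
    for rows in (1, 100):
        print(rows, noise_scale(rows, epsilon=1.0), round(noisy_count(5000, rows, 1.0, rng), 1))
```

With epsilon = 1, the typical magnitude of the noise grows from about 1 row to about 100 rows, which would swamp a count of how often a rare IoC appears.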
3 K-Anonymity
k-anonymity is a property possessed by certain anonymized data. A release of data is said to
have the k-anonymity property if the information for each person contained in the release cannot
be distinguished from that of at least k − 1 other individuals whose information also appears in the release.
k-anonymity does appear to be relevant to the log file problem.
3.1 Limitations of K-Anonymity
k-anonymity suffers from the following limitations:
• Background knowledge that is not in the dataset may be available and may allow
identification.
• k-anonymity is not a good method for anonymizing high-dimensional data. For example,
researchers from MIT [12] showed that, given 4 locations, the unicity¹ of mobile phone
timestamp-location datasets can be as high as 95%.
k-anonymity is not suited to the log file problem, or to checking IoCs. The k value in
k-anonymity needs to be replaced by MaxRows, the maximum number of rows that we associate
with a person. So if we are analysing network logs where a single user has 100 rows, then we
would need to apply k-anonymity with k = 100, which would probably result in nearly all data
in the log being suppressed.
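The gap between counting rows and counting people can be shown directly. The sketch below, using hypothetical column names `person` and `data`, computes k for data shaped like the example of Section 1.0.1 in two ways: classic k-anonymity over rows, and a person-level count of distinct individuals per equivalence class.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Sequence, Tuple

Row = Mapping[str, str]


def equivalence_classes(rows: Sequence[Row], quasi_ids: Sequence[str]) -> Dict[Tuple[str, ...], List[Row]]:
    """Group rows that share the same values on the quasi-identifier columns."""
    classes: Dict[Tuple[str, ...], List[Row]] = defaultdict(list)
    for row in rows:
        classes[tuple(row[col] for col in quasi_ids)].append(row)
    return classes


def row_level_k(rows: Sequence[Row], quasi_ids: Sequence[str]) -> int:
    """Classic k-anonymity: the smallest number of rows in any equivalence class."""
    return min(len(group) for group in equivalence_classes(rows, quasi_ids).values())


def person_level_k(rows: Sequence[Row], quasi_ids: Sequence[str], person_col: str) -> int:
    """The smallest number of distinct people in any equivalence class."""
    return min(len({row[person_col] for row in group})
               for group in equivalence_classes(rows, quasi_ids).values())


if __name__ == "__main__":
    # Hypothetical log: Alice and Bob each contribute 100 identical rows.
    rows = ([{"person": "Alice", "data": "XYZZY-abc"}] * 100
            + [{"person": "Bob", "data": "XYZZY-def"}] * 100)
    print(row_level_k(rows, ["data"]))               # 100: looks "100-anonymous"
    print(person_level_k(rows, ["data"], "person"))  # 1: each value identifies one person
```

Row-level k reports 100 although every value points at a single person, which is why k would have to be raised to MaxRows before row-based k-anonymity protected anyone.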
4 Homomorphic Encryption
Homomorphic Encryption involves doing computation on encrypted data. Microsoft in 2012 reported
a slowdown of 6-7 orders of magnitude
(https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/323.pdf). UPenn in 2016
reported a slowdown of 9 orders of magnitude
(https://haeberlen.cis.upenn.edu/papers/seabed-osdi2016.pdf). It would appear that Homomorphic
Encryption is not yet feasible for working with data at scale or processing large log files.
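For readers unfamiliar with the idea, the toy sketch below shows what computing on encrypted data means, using the Paillier cryptosystem. Paillier is only additively homomorphic; it is not the fully homomorphic scheme of [5], and it is our own choice of illustration. The tiny primes make it completely insecure, and it says nothing about the performance figures quoted above.

```python
import math
import random

# Toy key: tiny textbook primes (insecure; real keys use primes of 1024 bits or more).
P, Q = 61, 53
N = P * Q
N2 = N * N
G = N + 1                                          # standard simplified generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)                               # valid because G = N + 1 (Python 3.8+)


def encrypt(m: int, rng: random.Random) -> int:
    """Encrypt 0 <= m < N as g^m * r^N mod N^2 with a random r coprime to N."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Recover m = L(c^lambda mod N^2) * mu mod N, where L(x) = (x - 1) // N."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying two ciphertexts adds the underlying plaintexts (mod N)."""
    return (c1 * c2) % N2


if __name__ == "__main__":
    rng = random.Random(1)
    total = add_encrypted(encrypt(17, rng), encrypt(25, rng))
    print(decrypt(total))  # 42, computed without decrypting 17 or 25
```

Even this simplest scheme replaces one integer addition with modular exponentiations on numbers twice the key size, which hints at where the reported slowdowns come from.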
¹ Unicity is measured by the number of points needed to uniquely identify an individual in a data set.
5 Monero Style Privacy
Monero is a crypto-currency where the key features are those around privacy and anonymity:
• The value of transactions is obfuscated.
• Sending addresses are hidden in combination with other addresses (in a “ring signature”)
so it is not clear exactly who sent a transaction.
• Receiving addresses are hidden using stealth addresses which are generated using a secret
sharing scheme.
There has been a back and forth between Monero and researchers who have pointed out
privacy concerns in the approaches used by Monero. More recently (September 2020), the
United States IRS posted a US$625,000 bounty for a company to develop tools to help trace
Monero and related cryptocurrencies.
6 Secure Multi-party Computation / Federated Learning
The example in Section 1.0.1 highlights the problem with Federated Learning.
• A learner at Computer1 cannot distinguish between the IoC (XYZZY) and an encoded
version of the first victim (abc).
• A learner at Computer2 cannot distinguish between the IoC (XYZZY) and an encoded
version of the second victim (def).
• A learner at Computer3 cannot distinguish between the IoC (XYZZY) and an encoded
version of the third victim (ghj).
We need to merge the records from different people to identify which elements are private and
which elements are suitable as public IoCs. But the very process of merging the records breaks
the privacy that we are attempting to create.
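The dilemma can be illustrated with a small sketch using the Section 1.0.1 data. The feature extractor `tokens` (the full string plus its dash-separated parts) is a hypothetical stand-in for real IoC extraction, and we assume one victim per computer, so counting computers counts people.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical per-computer logs mirroring Section 1.0.1 (one victim per computer).
LOGS: Dict[str, List[str]] = {
    "Computer1": ["XYZZY-abc"] * 3,
    "Computer2": ["XYZZY-def"] * 3,
    "Computer3": ["XYZZY-ghj"] * 3,
}


def tokens(value: str) -> Set[str]:
    """Hypothetical feature extractor: the full string plus its '-'-separated parts."""
    return {value, *value.split("-")}


def local_view(values: List[str]) -> Dict[str, int]:
    """How many local rows each token appears in, as seen by one computer alone."""
    counts: Dict[str, int] = defaultdict(int)
    for v in values:
        for t in tokens(v):
            counts[t] += 1
    return dict(counts)


def merged_view(logs: Dict[str, List[str]]) -> Dict[str, int]:
    """How many distinct computers (people) each token appears on, after merging."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for computer, values in logs.items():
        for v in values:
            for t in tokens(v):
                seen[t].add(computer)
    return {t: len(s) for t, s in seen.items()}


if __name__ == "__main__":
    print(sorted(local_view(LOGS["Computer1"]).items()))
    print(sorted(merged_view(LOGS).items()))
```

Locally, "XYZZY" and "abc" occur in exactly the same rows, so no local learner can tell them apart; only the merged view shows that "XYZZY" spans three people while "abc" belongs to one.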
7 An Approach for Making Log Files Private
7.1 Proposal Step 1: Rewrite Identifiers with a Ring Signature
We may have sensitive data sets where we want/need to replace a personal identifier with another
token for the purposes of clustering / pivoting / identifying IoCs / etc.
The problematic table in a log file is Table 2:
Table 2
PID Event Id Col1 . . . ColMax2
P1 Event1 . . .
P1 Event2 . . .
P1 Event3 . . .
. . . . . . . . .
Pj EventMax . . .
We replace the PID with a Ring Signature for that data row. We define a parameter R to
determine how imprecise each Ring Signature will be. The Ring Signature for EventE which
came from person Pi should be created by
1. SetE = randomly generate a set of R − 1 people;
2. RSE = generate a ring signature for the set Pi + SetE
This gives us the following Table:
Table 3
Ring Signature Event Id Col1 . . . ColMax2
RS1 Event1 . . .
RS2 Event2 . . .
RS3 Event3 . . .
. . . . . . . . .
RSj EventMax . . .
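The two steps of Section 7.1 can be sketched as follows. This is a stand-in under a stated assumption: each ring is modelled only by its member set (a frozenset of hypothetical person IDs), and no cryptographic signing is performed. A real deployment would publish a ring signature over that set rather than the set itself.

```python
import random
from typing import FrozenSet, List, Sequence, Tuple

Event = Tuple[str, str, str]                  # (pid, event_id, data)
RingRow = Tuple[FrozenSet[str], str, str]     # (ring, event_id, data)


def make_ring(owner: str, population: Sequence[str], r: int,
              rng: random.Random) -> FrozenSet[str]:
    """Ring for one event: the true owner plus R - 1 randomly chosen decoys.

    Stand-in only: models what a ring signature reveals (its member set),
    not the cryptographic signature itself.
    """
    others = [p for p in population if p != owner]
    decoys = rng.sample(others, r - 1)        # Step 1: SetE, R - 1 other people
    return frozenset([owner, *decoys])        # Step 2: ring over Pi + SetE


def rewrite_table(events: Sequence[Event], population: Sequence[str],
                  r: int, rng: random.Random) -> List[RingRow]:
    """Turn Table 2 rows into Table 3 rows by replacing each PID with a ring."""
    return [(make_ring(pid, population, r, rng), event_id, data)
            for pid, event_id, data in events]


if __name__ == "__main__":
    people = [f"P{i}" for i in range(1, 11)]  # hypothetical population
    table2 = [("P1", "Event1", "XYZZY-abc"), ("P1", "Event2", "XYZZY-abc")]
    for ring, event_id, data in rewrite_table(table2, people, r=5, rng=random.Random(3)):
        print(sorted(ring), event_id, data)
```

A fresh set of decoys is drawn for every event, so two events from the same person generally carry different rings.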
7.2 Proposal Step 2: Apply a modified k-anonymity
We now apply a modified k-anonymity procedure to Table 3. We apply a range of feature
extraction approaches (from security or machine learning). Each of these methods gives us a
candidate feature, F, with a group of rows, G.
We apply the following steps to determine if F is potentially a privacy violation.
1. get the set of ring signatures for group G
2. MinPID(F) = process this set of ring signatures to determine the minimum number of
identities in the group
3. If MinPID(F) ≤ k then feature F is a privacy violation and needs to be suppressed or
deleted.
If MinPID(F) > k, then F (independent of other features) can be considered anonymous, since
in isolation we can only associate it with a set of more than k identities.
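The paper does not say how MinPID(F) is computed. One natural reading, sketched below, is the size of a minimum hitting set: each row's true author belongs to its ring, so the set of true authors intersects every ring, and the smallest set of people that intersects every ring is a lower bound on how many distinct people produced the rows. Rings are again modelled as member sets, and the brute-force search is exponential (minimum hitting set is NP-hard in general), so this is only suitable for small examples.

```python
from itertools import combinations
from typing import FrozenSet, Sequence


def min_pid(rings: Sequence[FrozenSet[str]]) -> int:
    """Smallest number of identities that could account for every row.

    Brute force over candidate sets of increasing size; returns the size of
    the first set that shares at least one member with every ring.
    """
    candidates = sorted(set().union(*rings))
    for size in range(1, len(candidates) + 1):
        for chosen in combinations(candidates, size):
            chosen_set = set(chosen)
            if all(ring & chosen_set for ring in rings):
                return size
    return 0  # only reached when there are no rings


def is_privacy_violation(rings: Sequence[FrozenSet[str]], k: int) -> bool:
    """Step 3: the feature must be suppressed when MinPID(F) <= k."""
    return min_pid(rings) <= k


if __name__ == "__main__":
    one_person = [frozenset("ABC"), frozenset("ADE"), frozenset("AFG")]
    three_people = [frozenset("AB"), frozenset("CD"), frozenset("EF")]
    print(min_pid(one_person), is_privacy_violation(one_person, k=2))      # 1 True
    print(min_pid(three_people), is_privacy_violation(three_people, k=2))  # 3 False
```

A lower bound is the right quantity here: it is the number of people an adversary must assume, so a feature is only kept when even the most pessimistic explanation involves more than k people.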
7.3 Properties of Table 3
Table 3 is a useful table for identifying pivots and IoCs.
Let's consider the situation where we have logs from 100 people and each person has 100
events in Table 3. Let the ring imprecision parameter be R = 5. Table 3 has 10,000 events.
Let's consider what an attacker who obtained the entire contents of Table 3 might do:
• They may try to extract information about a specific event. Due to the ring signature,
there are R = 5 people it may have come from, and they cannot tell which.
• They may try to extract all the events for person Pi. They would get a collection of 100
events from Pi together with about 400 events which were not generated by Pi (events from
other people whose rings happened to include Pi). All they could conclude is that each of
these events has a chance of 1/R of really being from Pi.
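The figures of 400 decoy events and a 1/R chance can be checked with exact arithmetic. The sketch assumes, as in Step 1, that each event's R − 1 decoys are drawn uniformly without replacement from the other people; the function name `attacker_view` is our own.

```python
from fractions import Fraction
from typing import Tuple


def attacker_view(num_people: int, events_per_person: int, r: int) -> Tuple[int, Fraction, Fraction]:
    """Expected events whose ring contains a given person Pi.

    Returns (Pi's own events, expected decoy events, share truly from Pi).
    """
    own = events_per_person
    others = (num_people - 1) * events_per_person
    # Chance that one particular event of another person picks Pi as a decoy.
    p_decoy = Fraction(r - 1, num_people - 1)
    decoy_hits = others * p_decoy
    return own, decoy_hits, Fraction(own) / (own + decoy_hits)


if __name__ == "__main__":
    own, hits, share = attacker_view(num_people=100, events_per_person=100, r=5)
    print(own, hits, share)  # 100 400 1/5
```

Algebraically, the decoy count is (N − 1)·E·(R − 1)/(N − 1) = E·(R − 1), so the share from Pi is E / (E + E·(R − 1)) = 1/R regardless of the number of people N.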
7.4 Light Weight Ring Signatures (LWRS)
Most Ring Signature approaches create large signatures; the size of the cryptographic signature
increases linearly with the number of people (identifiers) which you are anonymizing [13, Section
Efficiency]. This makes their use for large log files / large sets of people more difficult.
Many aspects of the above proposal can be satisfied by the following approach:
• Allocate each person a large prime (a few hundred bits);
• The ring signature for a set of people is the product of the primes for each person;
• Given two light weight ring signatures, we can determine if they have one or more people
in common by performing a greatest common divisor (GCD) operation.
If GCD(LWRS1, LWRS2) = 1, then we know that these 2 rows came from different
identities. We can do pairwise GCD calculations to show that a group of LWRS came from more
than k identities.
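The light weight ring signature can be sketched in a few lines. Small primes stand in for the large primes the proposal calls for, and the person-to-prime mapping and function names are hypothetical. The lower-bound helper uses the pairwise GCD argument above: signatures that are pairwise coprime have disjoint rings, so their rows must come from different people. It picks such a subset greedily, so it yields a valid but not necessarily tight bound.

```python
import math
from typing import Dict, Iterable, List, Sequence


def lwrs(members: Iterable[str], primes: Dict[str, int]) -> int:
    """Light weight ring signature: the product of each ring member's prime."""
    return math.prod(primes[m] for m in members)


def share_member(sig1: int, sig2: int) -> bool:
    """Two signatures have a person in common iff their GCD is not 1."""
    return math.gcd(sig1, sig2) != 1


def distinct_identity_lower_bound(sigs: Sequence[int]) -> int:
    """Size of a greedily chosen pairwise-coprime subset of signatures.

    Any pairwise-coprime subset has disjoint rings, so its size is a lower
    bound on the number of distinct people behind the group.
    """
    chosen: List[int] = []
    for s in sigs:
        if all(math.gcd(s, c) == 1 for c in chosen):
            chosen.append(s)
    return len(chosen)


if __name__ == "__main__":
    primes = {"Alice": 3, "Bob": 13, "Charlie": 19, "D": 5, "E": 7}  # hypothetical
    s1 = lwrs(["Alice", "D"], primes)    # 15
    s2 = lwrs(["Bob", "E"], primes)      # 91
    s3 = lwrs(["Alice", "Bob"], primes)  # 39
    print(share_member(s1, s2), share_member(s1, s3))  # False True
    print(distinct_identity_lower_bound([s1, s2, s3]))  # 2
```

Each signature is a single integer whose size grows with the number of bits per prime times R, and comparing two rows costs one GCD, which is what makes the scheme light weight compared with standard ring signatures.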
7.5 Worked Example
We now apply the proposal to the example from Section 1.0.1.
The data:
Person Computer Event Data
------ -------- ----- ----
Alice Computer1 EventA-1 XYZZY-abc
Alice Computer1 EventA-2 XYZZY-abc
Alice Computer1 EventA-3 XYZZY-abc
...
Bob Computer2 EventB-1 XYZZY-def
Bob Computer2 EventB-2 XYZZY-def
Bob Computer2 EventB-3 XYZZY-def
...
Charlie Computer3 EventC-1 XYZZY-ghj
Charlie Computer3 EventC-2 XYZZY-ghj
Charlie Computer3 EventC-3 XYZZY-ghj
where
abc = encrypted(Alice)
def = encrypted(Bob)
ghj = encrypted(Charlie)
7.6 Step 1: Rewrite Identifiers with a Ring Signature
We assign the following primes²:
Alice 3
Bob 13
Charlie 19
We generate a Light Weight Ring Signature for each event, here with R = 3 (each signature is the product of three primes).
This results in an intermediate data set:
LW Ring Signature Data
----------------- ----
3 x 11 x 23 XYZZY-abc
3 x 29 x 31 XYZZY-abc
3 x 29 x 37 XYZZY-abc
...
5 x 13 x 17 XYZZY-def
13 x 19 x 57 XYZZY-def
13 x 7 x 61 XYZZY-def
...
19 x 57 x 67 XYZZY-ghj
19 x 5 x 71 XYZZY-ghj
11 x 19 x 73 XYZZY-ghj
² In this example we use small primes, but in a real application we would use large primes with 200+ binary digits.
7.7 Step 2: Apply a modified k-anonymity
We define the GCD of a feature:
GCD(F) = GCD(set of LWRS for Feature F)
We now evaluate the GCD for a range of features:
• “XYZZY-abc”
• “XYZZY-def”
• “XYZZY-ghj”
• “XYZZY”
• “abc”
• “def”
• “ghj”
The group of data associated with the feature “XYZZY-abc” has
GCD(“XYZZY-abc”) = GCD(3 x 11 x 23, 3 x 29 x 31, 3 x 29 x 37) = 3,
and hence these data rows most likely came from a single person. Thus this feature should be
rejected.
Similarly,
GCD(“XYZZY-def”) = 13 and GCD(“XYZZY-ghj”) = 19,
and hence these strings must not be retained.
When we apply common string algorithms to the data, we also consider the strings “abc”,
“def”, “ghj” and “XYZZY”. We find that
GCD(“abc”) = 3, GCD(“def”) = 13 and GCD(“ghj”) = 19,
so these strings must not be retained. We find
GCD(“XYZZY”) = 1,
so this feature can be used, since we know it comes from multiple people.
The final transformed data set is:
LW Ring Signature Data
----------------- ----
3 x 11 x 23 XYZZY
3 x 29 x 31 XYZZY
3 x 29 x 37 XYZZY
...
5 x 13 x 17 XYZZY
13 x 19 x 57 XYZZY
13 x 7 x 61 XYZZY
...
19 x 57 x 67 XYZZY
19 x 5 x 71 XYZZY
11 x 19 x 73 XYZZY
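The GCD computations of Step 2 can be reproduced mechanically. The sketch below copies only the nine rows shown in the intermediate table (the elided rows are not filled in) and uses a hypothetical feature extractor that takes the whole string plus its dash-separated parts; in practice any feature extraction method could be substituted.

```python
import math
from functools import reduce
from typing import Dict, List, Set, Tuple

# (LWRS factors, data) for the rows shown in the intermediate table above.
ROWS: List[Tuple[Tuple[int, ...], str]] = [
    ((3, 11, 23), "XYZZY-abc"),
    ((3, 29, 31), "XYZZY-abc"),
    ((3, 29, 37), "XYZZY-abc"),
    ((5, 13, 17), "XYZZY-def"),
    ((13, 19, 57), "XYZZY-def"),
    ((13, 7, 61), "XYZZY-def"),
    ((19, 57, 67), "XYZZY-ghj"),
    ((19, 5, 71), "XYZZY-ghj"),
    ((11, 19, 73), "XYZZY-ghj"),
]


def features(data: str) -> Set[str]:
    """Hypothetical extractor: the whole string plus its '-'-separated parts."""
    return {data, *data.split("-")}


def feature_gcds(rows: List[Tuple[Tuple[int, ...], str]]) -> Dict[str, int]:
    """GCD(F) = GCD of the LWRS of every row in which feature F occurs."""
    groups: Dict[str, List[int]] = {}
    for factors, data in rows:
        sig = math.prod(factors)
        for f in features(data):
            groups.setdefault(f, []).append(sig)
    return {f: reduce(math.gcd, sigs) for f, sigs in groups.items()}


if __name__ == "__main__":
    for feature, g in sorted(feature_gcds(ROWS).items()):
        verdict = "keep" if g == 1 else "suppress"
        print(f"{feature:10s} GCD={g:<3d} {verdict}")
```

Only “XYZZY” ends with a GCD of 1, matching the transformed data set: if every row came from one person, that person's prime would divide every signature and the GCD could not be 1.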
8 Conclusion
We have considered applying a range of privacy solutions to log files. We found that methods
such as differential privacy and k-anonymity are not suitable for log files. We make a proposal
that replaces personal identifiers with ring signatures when collecting log files. In particular we
offer a light weight ring signature proposal which significantly improves the privacy for collecting
log files while allowing processing of those log files for tasks such as identifying IoCs.
References
[1] C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise to sensitivity in
private data analysis,” in Theory of cryptography conference. Springer, 2006, pp. 265–284,
https://link.springer.com/content/pdf/10.1007/11681878_14.pdf.
[2] “Differential privacy,” https://en.wikipedia.org/wiki/Differential_privacy, [Online; accessed 17-May-2020].
[3] P. Samarati and L. Sweeney, “Protecting privacy when disclosing information:
k-anonymity and its enforcement through generalization and suppression,” 1998,
https://dataprivacylab.org/dataprivacy/projects/kanonymity/paper3.pdf.
[4] “K-anonymity,” https://en.wikipedia.org/wiki/K-anonymity, [Online; accessed 17-May-2020].
[5] C. Gentry, “Fully homomorphic encryption using ideal lattices,” in Proceedings of the
forty-first annual ACM symposium on Theory of computing, 2009, pp. 169–178.
[6] “Homomorphic encryption,” https://en.wikipedia.org/wiki/Homomorphic_encryption, [Online; accessed 17-May-2020].
[7] “Monero,” https://en.wikipedia.org/wiki/Monero, [Online; accessed 17-May-2020].
[8] A. C. Yao, “Protocols for secure computations,” in 23rd annual symposium on foundations
of computer science (sfcs 1982). IEEE, 1982, pp. 160–164.
[9] “Secure multi-party computation,” https://en.wikipedia.org/wiki/Secure_multi-party_computation, [Online; accessed 17-May-2020].
[10] “Federated learning,” https://en.wikipedia.org/wiki/Federated_learning, [Online; accessed 17-May-2020].
[11] “Secret sharing,” https://en.wikipedia.org/wiki/Secret_sharing, [Online; accessed 17-May-2020].
[12] Y.-A. De Montjoye, C. A. Hidalgo, M. Verleysen, and V. D. Blondel, “Unique in the crowd:
The privacy bounds of human mobility,” Scientific reports, vol. 3, no. 1, pp. 1–5, 2013,
https://www.nature.com/articles/srep01376.
[13] “Ring signature,” https://en.wikipedia.org/wiki/Ring_signature, [Online; accessed 17-May-2020].
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 

Privacy log files

Limitations of Privacy Solutions for Log Files
Jonathan Oliver jon_oliver@trendmicro.com
31 August 2021

1 Introduction
In this paper we consider collecting log files (in particular, log files for security purposes) and the storage / processing of those log files. Some use cases include:
• Working with data which has PII (personally identifiable information) embedded in it. For example, data with email addresses in it.
• Processing data in a third-party country. For example, data which is collected in country A may be hosted on cloud servers in country B. Complex situations may arise because the data may fall under the laws of country B.
• Extracting IoCs (indicators of compromise) from data. We are interested in IoCs which are public knowledge and do not uniquely identify a victim.

1.0.1 Privacy Example
Consider a situation with 3 people: Alice, Bob and Charlie. Each person generates log files which track various events that occur on their computer.
Attackers send personalized malware containing the string XYZZY (the malicious IoC) together with an encoded version of the victim's name. The logs therefore look like:

Person    Computer    Event       Data
------    --------    -----       ----
Alice     Computer1   EventA-1    XYZZY-abc
Alice     Computer1   EventA-2    XYZZY-abc
Alice     Computer1   EventA-3    XYZZY-abc
...
Bob       Computer2   EventB-1    XYZZY-def
Bob       Computer2   EventB-2    XYZZY-def
Bob       Computer2   EventB-3    XYZZY-def
...
Charlie   Computer3   EventC-1    XYZZY-ghj
Charlie   Computer3   EventC-2    XYZZY-ghj
Charlie   Computer3   EventC-3    XYZZY-ghj

where
abc = encrypted(Alice)
def = encrypted(Bob)
ghj = encrypted(Charlie)

We want to extract an IoC associated with this malware (XYZZY in this case) while maximising the privacy afforded to Alice, Bob and Charlie.
This example is typical of the log files generated by security products such as:
• Email logs;
• Windows event logs;
• Firewall logs;
• ...

1.1 Desirable Properties
We desire a privacy solution which allows us to collect the logs from various machines / computers and process them in a way that protects the privacy of the individuals. Specifically, the solution should meet the following privacy requirements:
• Collect these logs from multiple computers into a single repository;
• Transform / delete the parts of the data which identify a person;
• Retain data which occurs across multiple people (and hence may be considered public data), using a reasonable amount of computation.

1.2 Review of Privacy Approaches
Here we review various privacy methods and attempt to apply them to our example above. We distinguish between 2 types of data:
• Descriptive data: one row per person (the majority of privacy methods adequately address this case);
• Log files: a person may contribute multiple rows (typically many rows). This covers the various log files mentioned above (event logs, firewall logs, etc.), and we discuss below why privacy solutions such as differential privacy and k-anonymity do not adequately address this type of data.

1.2.1 Descriptive Data
A typical list of people might look like:

Id    Name       Email            Country     Industry
---   --------   -------------    ---------   -------------
1     Person A   a@abc.company    Argentina   Accounting
2     Person B   b@b.company      Brazil      Manufacturing
...
100   Person Z   z@z.company      USA         Health

This type of data can be made "private" using differential privacy or k-anonymity (well-respected privacy approaches used around the world).

1.2.2 Log Files
Log files consist of 2 separate tables (explicitly or implicitly). In most log files, the first table defines the people under consideration, and the second table defines the events or transactions for each person in the first table.
The first table is a list of people:

Table 1
PID    Col1   ...   ColMax1
----   ----   ---   -------
P1     ...    ...   ...
...
PMax   ...    ...   ...

Column 1 is a PID (personal identifier) which identifies each person. The second table is a list of events (or transactions) generated by the people in Table 1:

Table 2
PID   Event Id   Col1   ...   ColMax2
---   --------   ----   ---   -------
P1    Event1     ...
P1    Event2     ...
P1    Event3     ...
...
Pj    EventMax   ...

In the second table, we allow multiple events to be associated with a personal identifier. For example, Table 2 has 3 events associated with PID P1.

1.2.3 Privacy Approaches
We review a range of privacy mechanisms in this paper and consider how they can be applied to the log file problem. We consider:
• Differential Privacy [1, 2]
• k-anonymity [3, 4]
• Homomorphic Encryption [5, 6]
• Monero-style privacy [7]
• Secure Multiparty Computation [8, 9] (which also covers Federated Machine Learning [10])
• Secret Sharing Schemes [11]

1.2.4 Privacy Operations
The operations used by privacy mechanisms (including those listed above) include:
• Suppressing data (either deleting it or replacing it with NULL values);
• Generalizing data (for example, transforming a person's age into an age range);
• Encrypting data;
• Hashing data; and
• Adding errors to data.
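Several of the privacy operations above can be sketched on a small Table 2 style log. This is a minimal illustration, not part of the paper's proposal: the rows, the salt and the helper names (`suppress`, `generalize_age`, `hash_value`, `max_rows_per_person`) are all hypothetical, and encryption is omitted because the Python standard library has no cipher. Note that hashing a PID is only pseudonymization: every row from the same person still carries the same token, which is why the number of rows per person matters in the sections that follow.

```python
import hashlib
from collections import Counter

# Hypothetical rows in the shape of Table 2: (PID, event id, email, age).
rows = [
    ("P1", "Event1", "alice@example.com", 34),
    ("P1", "Event2", "alice@example.com", 34),
    ("P1", "Event3", "alice@example.com", 34),
    ("P2", "Event4", "bob@example.com", 51),
]

def suppress(_value):
    """Suppression: replace the value with a NULL."""
    return None

def generalize_age(age, width=10):
    """Generalization: map an exact age onto a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def hash_value(value, salt):
    """Hashing: a salted SHA-256 digest, truncated to 16 hex characters."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]

def max_rows_per_person(table):
    """The largest number of rows that any single PID contributes."""
    return max(Counter(pid for pid, *_ in table).values())

# Apply hashing to the PID, suppression to the email and generalization to the age.
transformed = [
    (hash_value(pid, salt="demo-salt"), event, suppress(email), generalize_age(age))
    for pid, event, email, age in rows
]

if __name__ == "__main__":
    for row in transformed:
        print(row)
    print("max rows per person:", max_rows_per_person(rows))
```

The first three transformed rows share one hashed PID, so anyone holding the table can still group P1's events together.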
2 Differential Privacy
Differential privacy is a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset.
Consider a data row of interest. If errors are added in a systematic way, so that you get similar or identical answers with and without the row in question, then we have protected the privacy of that row. The definition and mathematics extend to making 2 rows, 3 rows, ... private at once. This covers the case where we want groups of individuals up to some size N to remain private: given N, the maximum number of rows that we need to make private at once, we can determine the error distribution that achieves this.
Differential privacy is not suited to the log file problem. The amount of error required to achieve privacy on a log file depends on the number of rows which may be associated with a person. For example, an email log file for 1 day might contain 100 emails from a single user. Ensuring the privacy of this data would require an extraordinary amount of error to be added, which would almost certainly make any analysis useless.

3 K-Anonymity
k-anonymity is a property possessed by certain anonymized data. A release of data is said to have the k-anonymity property if the information for each person contained in the release cannot be distinguished from that of at least k − 1 other individuals whose information also appears in the release. At first glance, k-anonymity appears to be relevant to the log file problem.

3.1 Limitations of K-Anonymity
k-anonymity suffers from the following limitations:
• Background knowledge that is not in the dataset may be available and may allow identification.
• k-anonymity is not a good method for anonymizing high-dimensional data. For example, researchers from MIT [12] showed that, given 4 locations, the unicity¹ of mobile phone timestamp-location datasets can be as high as 95%.
k-anonymity is not suited to the log file problem, or to checking IoCs. The k value in k-anonymity needs to be replaced by MaxRows, the maximum number of rows that we associate with a person. So if we are analysing network logs where a single user has 100 rows, we would need to apply k-anonymity with k = 100, which would probably result in nearly all the data in the log being suppressed.

4 Homomorphic Encryption
Homomorphic Encryption involves performing computation on encrypted data. Microsoft in 2012 reported a slowdown of 6-7 orders of magnitude (https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/323.pdf). UPenn in 2016 reported a slowdown of 9 orders of magnitude (https://haeberlen.cis.upenn.edu/papers/seabed-osdi2016.pdf). It would appear that Homomorphic Encryption is not yet feasible for working with data at scale or for processing large log files.

¹ Unicity is the fraction of individuals in a data set who can be uniquely identified from a given number of points.
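The scaling arguments in Sections 2 and 3 can be made concrete with a minimal sketch. It assumes a count query in which each row changes the answer by at most 1, so protecting a whole person who contributes up to MaxRows rows means multiplying the Laplace noise scale by MaxRows (the standard group-privacy argument). The function names and the sample log are hypothetical. The second half shows why row counts mislead k-anonymity on log files: an equivalence class can hold 100 rows that all come from one person.

```python
import math
import random
from collections import defaultdict

def laplace_scale(max_rows_per_person, epsilon, sensitivity_per_row=1.0):
    """Laplace scale b = MaxRows * (row sensitivity) / epsilon.

    Protecting a whole person rather than a single row multiplies the
    sensitivity by the number of rows that person can contribute."""
    return max_rows_per_person * sensitivity_per_row / epsilon

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def laplace_std(scale):
    """Standard deviation of Laplace(0, scale) noise."""
    return math.sqrt(2.0) * scale

def rows_per_class(rows, quasi_ids):
    """Row-level k-anonymity view: how many rows share each quasi-identifier value."""
    counts = defaultdict(int)
    for row in rows:
        counts[tuple(row[q] for q in quasi_ids)] += 1
    return dict(counts)

def people_per_class(rows, quasi_ids, pid_field="pid"):
    """Person-level view: how many distinct people share each quasi-identifier value."""
    people = defaultdict(set)
    for row in rows:
        people[tuple(row[q] for q in quasi_ids)].add(row[pid_field])
    return {key: len(members) for key, members in people.items()}

if __name__ == "__main__":
    rng = random.Random(0)
    b = laplace_scale(max_rows_per_person=100, epsilon=1.0)
    print(b, round(laplace_std(b), 1))  # 100.0 141.4
    print("noisy count of 250 emails:", 250 + laplace_noise(b, rng))

    # 100 log rows, all from Alice: row-level k looks like 100, person-level k is 1.
    log = [{"pid": "Alice", "data": "XYZZY-abc"} for _ in range(100)]
    print(rows_per_class(log, ["data"]), people_per_class(log, ["data"]))
```

With MaxRows = 100 and ε = 1, the noise on a single count has a standard deviation of roughly 141, which is comparable to or larger than many of the counts one would want to report from a day of logs.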
5 Monero Style Privacy
Monero is a cryptocurrency whose key features are those around privacy and anonymity:
• The value of transactions is obfuscated.
• Sending addresses are hidden in combination with other addresses (in a "ring signature"), so it is not clear exactly who sent a transaction.
• Receiving addresses are hidden using stealth addresses, which are generated from a secret shared between the sender and the receiver.
There has been a back and forth between Monero and researchers who have pointed out privacy concerns in the approaches used by Monero. More recently (September 2020), the United States IRS offered a USD 625,000 bounty for a company to develop tools to help trace Monero and related cryptocurrencies.

6 Secure Multi-party Computation / Federated Learning
The example in Section 1.0.1 highlights the problem with Federated Learning:
• A learner at Computer1 cannot distinguish between the IoC (XYZZY) and the encoded version of the first victim's name (abc).
• A learner at Computer2 cannot distinguish between the IoC (XYZZY) and the encoded version of the second victim's name (def).
• A learner at Computer3 cannot distinguish between the IoC (XYZZY) and the encoded version of the third victim's name (ghj).
We need to merge the records from different people to identify which elements are private and which elements are suitable as public IoCs. But the process of merging the records breaks the very privacy that we are attempting to create.

7 An Approach for Making Log Files Private
7.1 Proposal Step 1: Rewrite Identifiers with a Ring Signature
We may have sensitive data sets where we want or need to replace a personal identifier with another token for the purposes of clustering / pivoting / identifying IoCs / etc. The problematic table in a log file is Table 2:

Table 2
PID   Event Id   Col1   ...   ColMax2
---   --------   ----   ---   -------
P1    Event1     ...
P1    Event2     ...
P1    Event3     ...
...
Pj    EventMax   ...

We replace the PID with a Ring Signature for that data row.
We define a parameter R that determines how imprecise each Ring Signature will be. The Ring Signature for event E, which came from person Pi, is created as follows:
1. SetE = randomly generate a set of R − 1 people (other than Pi);
2. RSE = generate a ring signature for the set Pi + SetE.
This gives us the following table:

Table 3
Ring Signature   Event Id   Col1   ...   ColMax2
--------------   --------   ----   ---   -------
RS1              Event1     ...
RS2              Event2     ...
RS3              Event3     ...
...
RSj              EventMax   ...

7.2 Proposal Step 2: Apply a modified k-anonymity
We now apply a modified k-anonymity procedure to Table 3. We apply a range of feature extraction approaches (from Security or Machine Learning). Each of these methods gives us a candidate feature, F, with a group of rows, G. We apply the following steps to determine whether F is potentially a privacy violation:
1. Get the set of ring signatures for group G.
2. MinPID(F) = process this set of ring signatures to determine the minimum number of identities that could account for all the rows in the group.
3. If MinPID(F) ≤ k, then feature F is a privacy violation and needs to be suppressed or deleted.
If MinPID(F) > k, then F (independent of other features) can be considered anonymous, since in isolation we can associate more than k identities with it.

7.3 Properties of Table 3
Table 3 is a useful table for identifying pivots and IoCs. Let's consider the situation where we have logs from 100 people and each person has 100 events in Table 3. Let the Ring imprecision parameter R = 5. Table 3 then has 10,000 events.
Let's consider what an attacker who obtained the entire contents of Table 3 might do:
• They may try to extract information about a specific event. Due to the ring signature, they can only narrow its origin down to R = 5 people, without knowing which of them generated it.
• They may try to extract all the events for person Pi. They would get a collection of about 500 events whose ring signatures include Pi: the 100 events generated by Pi, and about 400 events which were not generated by Pi (each of the other 9,900 events includes Pi in its ring with probability (R − 1)/99 = 4/99). All they could identify is that each event in this collection has a chance of about 1/R of really being from Pi.
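The two proposal steps and the attacker analysis in Section 7.3 can be simulated with a short sketch. It assumes a ring signature can be modelled by the set of people it covers (no cryptography is performed), and it interprets MinPID as the smallest set of people such that every ring in the group contains at least one of them. That is the classic hitting-set problem, which is NP-hard in general; the bounded search below explores at most R^k branches, so it is only practical for small k. All function names are hypothetical.

```python
import random

def make_ring(owner, population, R, rng):
    """Stand-in for a ring signature: the owner plus R - 1 other randomly chosen people.
    A real scheme would output a cryptographic signature; here we keep only the member set."""
    others = rng.sample([p for p in population if p != owner], R - 1)
    return frozenset([owner, *others])

def has_hitting_set(rings, k):
    """True if at most k people can account for every ring (each ring contains one of them)."""
    if not rings:
        return True
    if k == 0:
        return False
    # Any valid set must include a member of the first ring, so branch on those members.
    for person in rings[0]:
        remaining = [ring for ring in rings if person not in ring]
        if has_hitting_set(remaining, k - 1):
            return True
    return False

def min_pid(rings, limit):
    """Fewest identities that could have generated all the rows, searched up to `limit`.
    Returns limit + 1 when more than `limit` identities are needed."""
    for size in range(limit + 1):
        if has_hitting_set(rings, size):
            return size
    return limit + 1

def is_privacy_violation(rings, k):
    """Step 2, item 3: the feature is a violation when MinPID(F) <= k."""
    return min_pid(rings, k) <= k

if __name__ == "__main__":
    rng = random.Random(42)
    people = [f"P{i}" for i in range(100)]
    # Table 3 with R = 5 and 100 events per person; the true owner is kept only to score the attack.
    table3 = [(make_ring(p, people, 5, rng), p) for p in people for _ in range(100)]
    hits = [owner for ring, owner in table3 if "P0" in ring]
    print(len(table3), "events;", len(hits), "mention P0;",
          "fraction really from P0:", hits.count("P0") / len(hits))

    # A feature that only ever appears in P0's events collapses to a single identity.
    own_rings = [ring for ring, owner in table3 if owner == "P0"][:10]
    print(min_pid(own_rings, 3), is_privacy_violation(own_rings, 3))
```

The attack on P0 always returns P0's own 100 events plus a random number (around 400) of other people's events, so the printed fraction hovers around 1/R.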
7.4 Light Weight Ring Signatures (LWRS)
Most Ring Signature approaches create large signatures; the size of the cryptographic signature increases linearly with the number of people (identifiers) being anonymized [13, Section Efficiency]. This makes their use for large log files / large sets of people more difficult.
Many aspects of the above proposal can be satisfied by the following approach:
• Allocate each person a large prime (a few hundred bits);
• The ring signature for a set of people is the product of the primes for each person;
• Given two light weight ring signatures, we can determine whether they have one or more people in common by performing a greatest common divisor (GCD) operation. If GCD(LWRS1, LWRS2) = 1, then we know that these 2 rows came from different identities. We can do pairwise GCD calculations to show that a group of LWRS came from > k identities.

7.5 Worked Example
We now apply the proposal to the example from Section 1.0.1. The data:

Person    Computer    Event       Data
------    --------    -----       ----
Alice     Computer1   EventA-1    XYZZY-abc
Alice     Computer1   EventA-2    XYZZY-abc
Alice     Computer1   EventA-3    XYZZY-abc
...
Bob       Computer2   EventB-1    XYZZY-def
Bob       Computer2   EventB-2    XYZZY-def
Bob       Computer2   EventB-3    XYZZY-def
...
Charlie   Computer3   EventC-1    XYZZY-ghj
Charlie   Computer3   EventC-2    XYZZY-ghj
Charlie   Computer3   EventC-3    XYZZY-ghj

where
abc = encrypted(Alice)
def = encrypted(Bob)
ghj = encrypted(Charlie)

7.6 Step 1: Rewrite Identifiers with a Ring Signature
We assign the following primes²:

Alice     3
Bob       13
Charlie   19

We then generate a Light Weight Ring Signature for each event by multiplying the prime of the person who generated it by the primes of 2 other, randomly chosen people (so R = 3). This results in an intermediate data set:

LW Ring Signature   Data
-----------------   ----
3 x 11 x 23         XYZZY-abc
3 x 29 x 31         XYZZY-abc
3 x 29 x 37         XYZZY-abc
...
5 x 13 x 17         XYZZY-def
13 x 19 x 59        XYZZY-def
13 x 7 x 61         XYZZY-def
...
19 x 59 x 67        XYZZY-ghj
19 x 5 x 71         XYZZY-ghj
11 x 19 x 73        XYZZY-ghj

² In this example we use small primes, but in a real application we would use large primes with 200+ binary digits.

7.7 Step 2: Apply a modified k-anonymity
We define the GCD of a feature:

GCD(F) = GCD(set of LWRS for feature F)

We now evaluate the GCD for a range of features:
• "XYZZY-abc"
• "XYZZY-def"
• "XYZZY-ghj"
• "XYZZY"
• "abc"
• "def"
• "ghj"
The group of data associated with the feature "XYZZY-abc" has

GCD("XYZZY-abc") = GCD(3 x 11 x 23, 3 x 29 x 31, 3 x 29 x 37) = 3,

and hence these data rows most likely came from a single person. Thus this feature should be rejected. Similarly, GCD("XYZZY-def") = 13 and GCD("XYZZY-ghj") = 19, and hence these strings must not be retained.
When we apply common string algorithms to the data, we also consider the strings "abc", "def", "ghj" and "XYZZY". We find that GCD("abc") = 3, GCD("def") = 13 and GCD("ghj") = 19, so these strings must not be retained. We find GCD("XYZZY") = 1, so this feature can be used: we know it comes from multiple people.
The final transformed data set is:

LW Ring Signature   Data
-----------------   ----
3 x 11 x 23         XYZZY
3 x 29 x 31         XYZZY
3 x 29 x 37         XYZZY
...
5 x 13 x 17         XYZZY
13 x 19 x 59        XYZZY
13 x 7 x 61         XYZZY
...
19 x 59 x 67        XYZZY
19 x 5 x 71         XYZZY
11 x 19 x 73        XYZZY
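The GCD tests in the worked example can be reproduced with Python's arbitrary-precision integers. This is a minimal sketch, assuming that a row "contains" a feature when the feature string is a substring of its data; the helper names are hypothetical, and the small primes come from the worked example (a deployment would use the 200+ bit primes mentioned above). Besides GCD(F), it implements the pairwise check from Section 7.4: signatures that are pairwise coprime have disjoint rings and so must come from distinct people, which gives a lower bound on the number of identities behind a feature.

```python
import math
from functools import reduce

def lwrs(*primes):
    """Light Weight Ring Signature: the product of the ring members' primes."""
    return math.prod(primes)

# Rows from the worked example: Alice = 3, Bob = 13, Charlie = 19.
# Every factor must be prime; 59 is the decoy person's prime.
rows = [
    (lwrs(3, 11, 23), "XYZZY-abc"),
    (lwrs(3, 29, 31), "XYZZY-abc"),
    (lwrs(3, 29, 37), "XYZZY-abc"),
    (lwrs(5, 13, 17), "XYZZY-def"),
    (lwrs(13, 19, 59), "XYZZY-def"),
    (lwrs(13, 7, 61), "XYZZY-def"),
    (lwrs(19, 59, 67), "XYZZY-ghj"),
    (lwrs(19, 5, 71), "XYZZY-ghj"),
    (lwrs(11, 19, 73), "XYZZY-ghj"),
]

def signatures_for(feature, table):
    """The LWRS values of every row whose data contains the feature string."""
    return [sig for sig, data in table if feature in data]

def feature_gcd(feature, table):
    """GCD(F): a value above 1 means one person's prime divides every row's signature."""
    return reduce(math.gcd, signatures_for(feature, table))

def coprime_lower_bound(signatures):
    """Greedily keep pairwise-coprime signatures. Their rings are disjoint, so
    the count is a lower bound on the number of distinct identities."""
    kept = []
    for sig in signatures:
        if all(math.gcd(sig, other) == 1 for other in kept):
            kept.append(sig)
    return len(kept)

if __name__ == "__main__":
    for feature in ["XYZZY-abc", "XYZZY-def", "XYZZY-ghj", "XYZZY", "abc", "def", "ghj"]:
        g = feature_gcd(feature, rows)
        verdict = "keep" if g == 1 else "reject"
        print(feature, g, verdict, coprime_lower_bound(signatures_for(feature, rows)))
```

Running it reports GCDs of 3, 13 and 19 for the victim-specific strings and 1 for "XYZZY", matching Section 7.7. The pairwise-coprime bound for "XYZZY" is 3 (for example 3 x 11 x 23, 5 x 13 x 17 and 19 x 59 x 67 share no prime), so at least three different people produced rows containing it.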
8 Conclusion
We have considered applying a range of privacy solutions to log files. We found that methods such as differential privacy and k-anonymity are not suitable for log files. We propose replacing personal identifiers with ring signatures when collecting log files. In particular, we offer a light weight ring signature proposal which significantly improves the privacy of collected log files while still allowing those log files to be processed for tasks such as identifying IoCs.

References
[1] C. Dwork, F. McSherry, K. Nissim, and A. Smith, "Calibrating noise to sensitivity in private data analysis," in Theory of Cryptography Conference. Springer, 2006, pp. 265–284, https://link.springer.com/content/pdf/10.1007/11681878_14.pdf.
[2] "Differential privacy," https://en.wikipedia.org/wiki/Differential_privacy, [Online; accessed 17-May-2020].
[3] P. Samarati and L. Sweeney, "Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression," 1998, https://dataprivacylab.org/dataprivacy/projects/kanonymity/paper3.pdf.
[4] "K-anonymity," https://en.wikipedia.org/wiki/K-anonymity, [Online; accessed 17-May-2020].
[5] C. Gentry, "Fully homomorphic encryption using ideal lattices," in Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, 2009, pp. 169–178.
[6] "Homomorphic encryption," https://en.wikipedia.org/wiki/Homomorphic_encryption, [Online; accessed 17-May-2020].
[7] "Monero," https://en.wikipedia.org/wiki/Monero, [Online; accessed 17-May-2020].
[8] A. C. Yao, "Protocols for secure computations," in 23rd Annual Symposium on Foundations of Computer Science (SFCS 1982). IEEE, 1982, pp. 160–164.
[9] "Secure multi-party computation," https://en.wikipedia.org/wiki/Secure_multi-party_computation, [Online; accessed 17-May-2020].
[10] "Federated learning," https://en.wikipedia.org/wiki/Federated_learning, [Online; accessed 17-May-2020].
[11] "Secret sharing," https://en.wikipedia.org/wiki/Secret_sharing, [Online; accessed 17-May-2020].
[12] Y.-A. De Montjoye, C. A. Hidalgo, M. Verleysen, and V. D. Blondel, "Unique in the crowd: The privacy bounds of human mobility," Scientific Reports, vol. 3, no. 1, pp. 1–5, 2013, https://www.nature.com/articles/srep01376.
[13] "Ring signature," https://en.wikipedia.org/wiki/Ring_signature, [Online; accessed 17-May-2020].