This document discusses data explosion and the lack of privacy protections for personally identifiable information. It provides examples of how much user data companies collect and store. It then discusses how anonymizing data by removing personally identifying information like names is not enough to protect privacy, as external data can be linked to the anonymized data using quasi-identifiers like zip codes, dates of birth, and other attributes. Methods for achieving k-anonymity are discussed, but these are shown to still leak private information through various attacks. The document examines different privacy models and their limitations in protecting user privacy.
To Support Digital India, We are trying to enforce the security on the web and digital Information. This Slides provide you basic as well as advance knowledge of security model. Model covered in this slides are Chinese Wall, Clark-Wilson, Biba, Harrison-Ruzzo-Ullman Model, Bell-LaPadula Model etc.
Types of Access Control.
To Support Digital India, We are trying to enforce the security on the web and digital Information. This Slides provide you basic as well as advance knowledge of security model. Model covered in this slides are Chinese Wall, Clark-Wilson, Biba, Harrison-Ruzzo-Ullman Model, Bell-LaPadula Model etc.
Types of Access Control.
A hash function usually means a function that compresses, meaning the output is shorter than the input
A hash function takes a group of characters (called a key) and maps it to a value of a certain length (called a hash value or hash).
The hash value is representative of the original string of characters, but is normally smaller than the original.
This term is also known as a hashing algorithm or message digest function.
Hash functions also called message digests or one-way encryption or hashing algorithm.
http://phpexecutor.com
CS8792 - Cryptography and Network Securityvishnukp34
this is an engineering subject.this consist of
pgno: 5 - Information security in past & present
pgno: 7 - Aim of Course
pgno: 8 - OSI Security Architecture
pgno: 9 - Security Goals – CIA Triad
pgno: 13 - Aspects of Security
pgno: 17 - ATTACKS
pgno: 22 - Passive Versus Active Attacks
pgno: 23 - SERVICES AND MECHANISMS
Pretty Good Privacy (PGP) is strong encryption software that enables you to protect your email and files by scrambling them so others cannot read them. It also allows you to digitally "sign" your messages in a way that allows others to verify that a message was actually sent by you. PGP is available in freeware and commercial versions all over the world.
PGP was first released in 1991 as a DOS program that earned a reputation for being difficult. In June 1997, PGP Inc. released PGP 5.x for Win95/NT. PGP 5.x included plugins for several popular email programs.
These slides guides you through the tools and techniques one can use for footprinting websites or people.You will find amazing tools and techniques have a look
Symmetric encryption and message confidentialityCAS
Symmetric Encryption Principles
Data Encryption Standard
Advanced Encryption Standard
Stream Ciphers and RC4
Cipher Block Modes of Operation
Key Distribution
Distinct l diversity anonymization of set valued dataRohan Khude
We are proposing the use of l-diversity which uses the sensitivity for generalizing the data when anonymized and also a algorithm for hiding the sensitive attribute information which reveals identity of one’s individual in situations like homogeneity and background knowledge.
A hash function usually means a function that compresses, meaning the output is shorter than the input
A hash function takes a group of characters (called a key) and maps it to a value of a certain length (called a hash value or hash).
The hash value is representative of the original string of characters, but is normally smaller than the original.
This term is also known as a hashing algorithm or message digest function.
Hash functions also called message digests or one-way encryption or hashing algorithm.
http://phpexecutor.com
CS8792 - Cryptography and Network Securityvishnukp34
this is an engineering subject.this consist of
pgno: 5 - Information security in past & present
pgno: 7 - Aim of Course
pgno: 8 - OSI Security Architecture
pgno: 9 - Security Goals – CIA Triad
pgno: 13 - Aspects of Security
pgno: 17 - ATTACKS
pgno: 22 - Passive Versus Active Attacks
pgno: 23 - SERVICES AND MECHANISMS
Pretty Good Privacy (PGP) is strong encryption software that enables you to protect your email and files by scrambling them so others cannot read them. It also allows you to digitally "sign" your messages in a way that allows others to verify that a message was actually sent by you. PGP is available in freeware and commercial versions all over the world.
PGP was first released in 1991 as a DOS program that earned a reputation for being difficult. In June 1997, PGP Inc. released PGP 5.x for Win95/NT. PGP 5.x included plugins for several popular email programs.
These slides guides you through the tools and techniques one can use for footprinting websites or people.You will find amazing tools and techniques have a look
Symmetric encryption and message confidentialityCAS
Symmetric Encryption Principles
Data Encryption Standard
Advanced Encryption Standard
Stream Ciphers and RC4
Cipher Block Modes of Operation
Key Distribution
Distinct l diversity anonymization of set valued dataRohan Khude
We are proposing the use of l-diversity which uses the sensitivity for generalizing the data when anonymized and also a algorithm for hiding the sensitive attribute information which reveals identity of one’s individual in situations like homogeneity and background knowledge.
Amnesia: Data anonymization made easy (8th OpenAIRE workshop)OpenAIRE
Presentation by Manolis Terrovitis, Athena Research Center, "Amnesia: Data anonymization made easy", at the 8th OpenAIRE workshop, Barcelona, 4 April 2017.
Northeastern Ohio nonprofit innovators met for first annual Big Data for a Better World conference on November 16 at Hyland Software's sprawling Westake, Ohio campus. Leading Hands Through Technology (LHTT) and Workman’s Circle teamed up to offer the event so local nonprofits could discuss how analytics could be successfully used to keep their organization profitable and ultimately improve the community.
Michael J. Berens presents the first part of the free, two-day webinar, "Data Journalism 101," hosted by the Donald W. Reynolds National Center for Business Journalism.
For access to the webinar materials, visit http://bit.ly/datajourn101.
For more information about training for business journalists, please visit http://businessjournalism.org
What is big data, and what are its potential benefits and risks?
Presentation given by Sir Mark Walport at the Oxford Martin School on 3 December 2013.
Managing Confidential Information – Trends and ApproachesMicah Altman
Personal information is ubiquitous and it is becoming increasingly easy to link information to individuals. Laws, regulations and policies governing information privacy are complex, but most intervene through either access or anonymization at the time of data publication.
Trends in information collection and management -- cloud storage, "big" data, and debates about the right to limit access to published but personal information complicate data management, and make traditional approaches to managing confidential data decreasingly effective.
This session presented as part of the the Program on Information Science seminar series, examines trends information privacy. And the session will also discuss emerging approaches and research around managing confidential research information throughout its lifecycle.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
2. INDEX
• Statistics and Lack of barriers in Collection
and Distribution of Person-specific
information
• Mathematical model for characterizing and
comparing real-world data sharing practices
and policies and for computing privacy and
risk measurements
• Demographics and Uniqueness
3. How much data?
• Google processes 20 PB a day (2008)
• Wayback Machine has 3 PB + 100 TB/month (3/2009)
• Facebook has 2.5 PB of user data + 15 TB/day (4/2009)
• eBay has 6.5 PB of user data + 50 TB/day (5/2009)
• CERN’s Large Hydron Collider (LHC) generates 15 PB
a year
640K ought to be
enough for anybody.
4. Big Data EveryWhere!
• Lots of data is being collected
and warehoused
– Web data, e-commerce
– purchases at department/
grocery stores
– Bank/Credit Card
transactions
– Social Network
5. Background
• Large amount of person-specific data has been
collected in recent years
– Both by governments and by private entities
• Data and knowledge extracted by data mining
techniques represent a key asset to the society
– Analyzing trends and patterns.
– Formulating public policies
• Laws and regulations require that some
collected data must be made public
– For example, Census data
6. • Health-care datasets
– Clinical studies, hospital discharge databases …
• Genetic datasets
– $1000 genome, HapMap, deCode …
• Demographic datasets
– U.S. Census Bureau, sociology studies …
• Search logs, recommender systems, social
networks, blogs …
– AOL search data, social networks of blogging
sites, Netflix movie ratings, Amazon …
Public Data Conundrum
7. • First thought: anonymize the data
• How?
• Remove “personally identifying information”
(PII)
– Name, Social Security number, phone number,
email, address… what else?
– Anything that identifies the person directly
• Is this enough?
What About Privacy?
8. Re-identification by Linking
Name Zipcode Age Sex
Alice 47677 29 F
Bob 47983 65 M
Carol 47677 22 F
Dan 47532 23 M
Ellen 46789 43 F
Voter registration data
QID SA
Zipcode Age Sex Disease
47677 29 F Ovarian Cancer
47602 22 F Ovarian Cancer
47678 27 M Prostate Cancer
47905 43 M Flu
47909 52 F Heart Disease
47906 47 M Heart Disease
ID
Name
Alice
Betty
Charles
David
Emily
Fred
Microdata
10. Quasi-Identifiers
• Key attributes
– Name, address, phone number - uniquely
identifying!
– Always removed before release
• Quasi-identifiers
– (5-digit ZIP code, birth date, gender) uniquely
identify 87% of the population in the U.S.
– Can be used for linking anonymized dataset with
other datasets
11. Classification of Attributes
• Sensitive attributes
– Medical records, salaries, etc.
– These attributes is what the researchers need, so they are
always released directly
Name DOB Gender Zipcode Disease
Andre 1/21/76 Male 53715 Heart Disease
Beth 4/13/86 Female 53715 Hepatitis
Carol 2/28/76 Male 53703 Brochitis
Dan 1/21/76 Male 53703 Broken Arm
Ellen 4/13/86 Female 53706 Flu
Eric 2/28/76 Female 53706 Hang Nail
Key Attribute Quasi-identifier Sensitive attribute
12. K-Anonymity: Intuition
• The information for each person contained in the
released table cannot be distinguished from at
least k-1 individuals whose information also
appears in the release
– Example: you try to identify a man in the released
table, but the only information you have is his birth
date and gender. There are k men in the table with the
same birth date and gender.
• Any quasi-identifier present in the released table
must appear in at least k records
13. K-Anonymity Protection Model
• Private table
• Released table: RT
• Attributes: A1, A2, …, An
• Quasi-identifier subset: Ai, …, Aj
14. Male Female
*
476**
47677 4767847602
2*
29 2722
ZIP code Age Sex
Generalization
• Goal of k-Anonymity
– Each record is indistinguishable from at least k-1
other records
– These k records form an equivalence class
• Generalization: replace quasi-identifiers with
less specific, but semantically consistent
values
15. Achieving k-Anonymity
• Generalization
– Replace specific quasi-identifiers with less specific
values until get k identical values
– Partition ordered-value domains into intervals
• Suppression
– When generalization causes too much information loss
• This is common with “outliers”
• Lots of algorithms in the literature
– Aim to produce “useful” anonymizations
… usually without any clear notion of utility
18. Name Birth Gender ZIP Race
Andre 1964 m 02135 White
Beth 1964 f 55410 Black
Carol 1964 f 90210 White
Dan 1967 m 02174 White
Ellen 1968 f 02237 White
Released table External data Source
By linking these 2 tables, you still don’t learn Andre’s problem
Example of Generalization (1)
19. QID SA
Zipcode Age Sex Disease
47677 29 F Ovarian Cancer
47602 22 F Ovarian Cancer
47678 27 M Prostate Cancer
47905 43 M Flu
47909 52 F Heart Disease
47906 47 M Heart Disease
QID SA
Zipcode Age Sex Disease
476**
476**
476**
2*
2*
2*
*
*
*
Ovarian Cancer
Ovarian Cancer
Prostate Cancer
4790*
4790*
4790*
[43,52]
[43,52]
[43,52]
*
*
*
Flu
Heart Disease
Heart Disease
Microdata Generalized table
Example of Generalization (2)
• Released table is 3-anonymous
• If the adversary knows Alice’s quasi-identifier
(47677, 29, F), he still does not know which of
the first 3 records corresponds to Alice’s record
!!
20. Curse of Dimensionality
• Generalization fundamentally relies
on spatial locality
– Each record must have k close neighbors
• Real-world datasets are very sparse
– Many attributes (dimensions)
– “Nearest neighbor” is very far
• Projection to low dimensions loses all info
k-anonymized datasets are useless
[Aggarwal VLDB ‘05]
21. HIPAA Privacy Rule
“The identifiers that must be removed include direct identifiers, such as
name, street address, social security number, as well as other identifiers, such
as birth date, admission and discharge dates, and five-digit zip code. The safe
harbor requires removal of geographic subdivisions smaller than a State,
except for the initial three digits of a zip code if the geographic unit formed by
combining all zip codes with the same initial three digits contains more than
20,000 people. In addition, age, if less than 90, gender, ethnicity, and other
demographic information not listed may remain in the information. The safe
harbor is intended to provide covered entities with a simple, definitive
method that does not require much judgment by the covered entity to
determine if the information is adequately de-identified."
"Under the safe harbor method, covered entities must remove all of a list of
18 enumerated identifiers and have no actual knowledge that the information
remaining could be used, alone or in combination, to identify a subject of the
information."
22. Two (and a Half) Interpretations
• Membership disclosure: Attacker cannot tell
that a given person in the dataset
• Sensitive attribute disclosure: Attacker cannot
tell that a given person has a certain sensitive
attribute
• Identity disclosure: Attacker cannot tell which
record corresponds to a given person
23. Unsorted Matching Attack
• Problem: records appear in the same order in
the released table as in the original table
• Solution: randomize order before releasing
24. Complementary Release Attack
• Different releases of the same private table can be
linked together to compromise k-anonymity
26. Zipcode Age Disease
476** 2* Heart Disease
476** 2* Heart Disease
476** 2* Heart Disease
4790* ≥40 Flu
4790* ≥40 Heart Disease
4790* ≥40 Cancer
476** 3* Heart Disease
476** 3* Cancer
476** 3* Cancer
A 3-anonymous patient table
Bob
Zipcode Age
47678 27
Carl
Zipcode Age
47673 36
Homogeneity attack
Background knowledge attack
Attacks on k-Anonymity
• k-Anonymity does not provide privacy if
– Sensitive values in an equivalence class lack diversity
– The attacker has background knowledge
27. Caucas 787XX Flu
Caucas 787XX Shingles
Caucas 787XX Acne
Caucas 787XX Flu
Caucas 787XX Acne
Caucas 787XX Flu
Asian/AfrAm 78XXX Flu
Asian/AfrAm 78XXX Flu
Asian/AfrAm 78XXX Acne
Asian/AfrAm 78XXX Shingles
Asian/AfrAm 78XXX Acne
Asian/AfrAm 78XXX Flu
Sensitive attributes must be
“diverse” within each
quasi-identifier equivalence class
[Machanavajjhala et al. ICDE ‘06]
l-Diversity
28. Distinct l-Diversity
• Each equivalence class has at least l well-
represented sensitive values
• Doesn’t prevent probabilistic inference attacks
Disease
...
HIV
HIV
HIV
pneumonia
...
...
bronchitis
...
10 records
8 records have HIV
2 records have other values
29. Other Versions of l-Diversity
• Probabilistic l-diversity
– The frequency of the most frequent value in an
equivalence class is bounded by 1/l
• Entropy l-diversity
– The entropy of the distribution of sensitive values
in each equivalence class is at least log(l)
• Recursive (c,l)-diversity
– r1<c(rl+rl+1+…+rm) where ri is the frequency of the
ith most frequent value
– Intuition: the most frequent value does not appear
too frequently
30. … Cancer
… Cancer
… Cancer
… Flu
… Cancer
… Cancer
… Cancer
… Cancer
… Cancer
… Cancer
… Flu
… Flu
Original dataset
Q1 Flu
Q1 Cancer
Q1 Cancer
Q1 Cancer
Q1 Cancer
Q1 Cancer
Q2 Cancer
Q2 Cancer
Q2 Cancer
Q2 Cancer
Q2 Flu
Q2 Flu
Anonymization B
Q1 Flu
Q1 Flu
Q1 Cancer
Q1 Flu
Q1 Cancer
Q1 Cancer
Q2 Cancer
Q2 Cancer
Q2 Cancer
Q2 Cancer
Q2 Cancer
Q2 Cancer
Anonymization A
99% have cancer
50% cancer quasi-identifier group is “diverse”
This leaks a ton of information
99% cancer quasi-identifier group is not “diverse”
…yet anonymized database does not leak anything
Neither Necessary, Nor Sufficient
31. Limitations of l-Diversity
• Example: sensitive attribute is HIV+ (1%) or
HIV- (99%)
– Very different degrees of sensitivity!
• l-diversity is unnecessary
– 2-diversity is unnecessary for an equivalence class that
contains only HIV- records
• l-diversity is difficult to achieve
– Suppose there are 10000 records in total
– To have distinct 2-diversity, there can be at most
10000*1%=100 equivalence classes
32. Skewness Attack
• Example: sensitive attribute is HIV+ (1%) or
HIV- (99%)
• Consider an equivalence class that contains an
equal number of HIV+ and HIV- records
– Diverse, but potentially violates privacy!
• l-diversity does not differentiate:
– Equivalence class 1: 49 HIV+ and 1 HIV-
– Equivalence class 2: 1 HIV+ and 49 HIV-l-diversity does not consider overall distribution of sensitive values!
33. Bob
Zip Age
47678 27
Zipcode Age Salary Disease
476** 2* 20K Gastric Ulcer
476** 2* 30K Gastritis
476** 2* 40K Stomach Cancer
4790* ≥40 50K Gastritis
4790* ≥40 100K Flu
4790* ≥40 70K Bronchitis
476** 3* 60K Bronchitis
476** 3* 80K Pneumonia
476** 3* 90K Stomach Cancer
A 3-diverse patient table
Conclusion
1. Bob’s salary is in [20k,40k],
which is relatively low
2. Bob has some stomach-related
disease
l-diversity does not consider semantics of sensitive values!
Similarity attack
Sensitive Attribute Disclosure
34. Caucas 787XX Flu
Caucas 787XX Shingles
Caucas 787XX Acne
Caucas 787XX Flu
Caucas 787XX Acne
Caucas 787XX Flu
Asian/AfrAm 78XXX Flu
Asian/AfrAm 78XXX Flu
Asian/AfrAm 78XXX Acne
Asian/AfrAm 78XXX Shingles
Asian/AfrAm 78XXX Acne
Asian/AfrAm 78XXX Flu
[Li et al. ICDE ‘07]
Distribution of sensitive
attributes within each
quasi-identifier group should
be “close” to their distribution
in the entire original database
Trick question: Why publish
quasi-identifiers at all??
t-Closeness
36. Caucas 787XX HIV+ Flu
Asian/AfrAm 787XX HIV- Flu
Asian/AfrAm 787XX HIV+ Shingles
Caucas 787XX HIV- Acne
Caucas 787XX HIV- Shingles
Caucas 787XX HIV- Acne
Bob is Caucasian and
I heard he was
admitted to hospital
with flu…
This is against the rules!
“flu” is not a quasi-identifier
Yes… and this is yet another
problem with k-anonymity
What Does Attacker Know?
37. AOL Privacy Debacle
• In August 2006, AOL released anonymized search
query logs
– 657K users, 20M queries over 3 months (March-May)
• Opposing goals
– Analyze data for research purposes, provide better
services for users and advertisers
– Protect privacy of AOL users
• Government laws and regulations
• Search queries may reveal income, evaluations, intentions to
acquire goods and services, etc.
38. AOL User 4417749
• AOL query logs have the form
<AnonID, Query, QueryTime, ItemRank,
ClickURL>
– ClickURL is the truncated URL
• NY Times re-identified AnonID 4417749
– Sample queries: “numb fingers”, “60 single men”, “dog
that urinates on everything”, “landscapers in Lilburn,
GA”, several people with the last name Arnold
• Lilburn area has only 14 citizens with the last name Arnold
– NYT contacts the 14 citizens, finds out AOL User
4417749 is 62-year-old Thelma Arnold
39. k-Anonymity Considered Harmful
• Syntactic
– Focuses on data transformation, not on what can
be learned from the anonymized dataset
– “k-anonymous” dataset can leak sensitive
information
• “Quasi-identifier” fallacy
– Assumes a priori that attacker will not
know certain information about his target
• Relies on locality
– Destroys utility of many real-world datasets
41. Demography—from the Greek demo, for people, and graphy, for writing,
or analysis, or field of study, is “the study of changes (such as the
number of births, deaths, marriages, and illnesses) that occur over a
period of time in human populations”, or just “science of population”
What is demography?
53. Some Other Scenarios
Clinical Trial Deviations
Originally Viagra was developed to lower blood pressure and treat Angina
Now its used to help newborn pulmonary hypertension and altitude sickness
Incidence Prediction
Missed 4 or more visits, twice as likely to have an asthmatic incident
Particular Cardiac monitor sine wave points to highly likelihood of heart attack
Campaigns
Social media and advertising campaigns to understand user behavior and sentiment
Patient Satisfaction
Social media and advertising campaigns to understand user behavior and sentiment
54. Auditing is critical component HIPAA in ensuring patient privacy
1 Billion rows+ of audit data
146 mission critical clinical applications
Comprehensive audits yield 300-500k transactions/day
HIPAA requires audit system with 20 years of data
Auditing Project
Available to community as part of Compliance SDK
Updating for SQL Server 2012, HDInsight, Power View, and MobileBI*