6. Problem Entities
• Distributor
– Dataset T: set of all valuable data
• Agents U1, …, Un
– Datasets R1, …, Rn
– Ri: subset of records from T received by agent Ui
• Leaker
– Dataset S: set of leaked data
7. Agent’s Data Requests
• Sample
– Ri = SAMPLE(T, mi), i.e., any subset of mi records
from T can be given to Ui.
• Explicit
– Ri = EXPLICIT(T, conditioni), i.e., Ui receives all
records of T that satisfy conditioni.
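The two request types can be sketched in Python as follows. This is a minimal illustration, assuming records are plain strings held in a set; the function names `sample_request` and `explicit_request` and the `rng` parameter are hypothetical and do not come from the slides.

```python
import random


def sample_request(T, m, rng=None):
    """SAMPLE(T, m): return any subset of m records from T."""
    if m > len(T):
        raise ValueError("cannot sample more records than T contains")
    rng = rng or random.Random()
    # random.sample needs a sequence, so the set is sorted into a list first.
    return set(rng.sample(sorted(T), m))


def explicit_request(T, condition):
    """EXPLICIT(T, condition): return all records of T satisfying condition."""
    return {t for t in T if condition(t)}


T = {"Kiran", "John", "Sarah", "Mark"}
R1 = sample_request(T, 2, rng=random.Random(0))           # any 2 profiles
R2 = explicit_request(T, lambda name: name.startswith("K"))  # only matching profiles
```

With a sample request the distributor chooses which records to hand out; with an explicit request the result is fully determined by the condition.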
9. Guilt Models (1/3)
p: posterior probability that a leaked profile
comes from other sources
[Figure: a leaked profile can come from other sources, e.g. Sarah’s network, with probability p]
Guilty Agent: Agent who leaks at least one profile
Pr{Gi|S}: probability that agent Ui is guilty, given
the leaked set of profiles S
10. Guilt Models (2/3)
• Agents leak each of their data items independently
or
• Agents leak all their data items OR nothing
[Figure: four cases labeled with probabilities p^2, p(1-p), (1-p)p and (1-p)^2]
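To make the independent-leak model concrete, the sketch below estimates Pr{Gi|S}. The formula is an assumption and is not stated on these slides: each leaked record t is taken to come from other sources with probability p, and otherwise to have been leaked by one of the agents holding t, each equally likely. This gives Pr{Gi|S} = 1 − ∏ (1 − (1 − p)/|Vt|) over the records t in S ∩ Ri, where Vt is the set of agents holding t. The function name `guilt_probability` and the example data are hypothetical.

```python
def guilt_probability(agent, S, R, p):
    """Estimate Pr{G_agent | S} under the assumed independent-leak model.

    S: set of leaked records
    R: dict mapping agent name -> set of records given to that agent
    p: probability that a leaked record comes from other sources
    """
    prob_innocent = 1.0
    for t in S & R[agent]:
        # Number of agents that received record t (|V_t|).
        holders = sum(1 for Rj in R.values() if t in Rj)
        # Probability that this agent did NOT leak t.
        prob_innocent *= 1.0 - (1.0 - p) / holders
    return 1.0 - prob_innocent


R = {"U1": {"Kiran", "John"}, "U2": {"John", "Sarah"}}
S = {"Kiran", "John"}
g1 = guilt_probability("U1", S, R, p=0.2)
g2 = guilt_probability("U2", S, R, p=0.2)
```

In this example U1 holds both leaked profiles, one of them exclusively, so its estimate (about 0.88) is well above that of U2 (about 0.4), which holds only the shared profile.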
13. The Distributor’s Objective (1/2)
[Figure: agents U1, …, U4 holding sets R1, …, R4, and the leaked set S]
Pr{G1|S} >> Pr{G2|S}
Pr{G1|S} >> Pr{G4|S}
14. The Distributor’s Objective (2/2)
• To achieve his objective, the distributor has to
distribute sets R1, …, Rn that minimize
$$\sum_{i} \frac{1}{|R_i|} \sum_{j \neq i} |R_i \cap R_j|, \quad i, j = 1, \ldots, n$$
• Intuition: Minimized data sharing among
agents makes leaked data reveal the guilty
agents
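The quantity to minimize can be computed directly. The sketch below assumes the objective is the sum, over all agents, of the overlap between an agent's set and every other agent's set, divided by the size of the agent's own set; the function name `overlap_objective` is hypothetical.

```python
def overlap_objective(R):
    """Sum over i of (1/|R_i|) * sum over j != i of |R_i ∩ R_j|.

    R: list of non-empty sets, one per agent.
    """
    total = 0.0
    for i, Ri in enumerate(R):
        # Number of records agent i shares with each other agent, added up.
        shared = sum(len(Ri & Rj) for j, Rj in enumerate(R) if j != i)
        total += shared / len(Ri)
    return total


disjoint = [{"a", "b"}, {"c", "d"}]
overlapping = [{"a", "b"}, {"b", "c"}]
```

Disjoint sets give a value of 0, the best possible; every shared record raises the value, and it raises it more for agents that hold few records.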
15. Distribution Strategies – Sample (1/4)
• Set T has four profiles:
– Kiran, John, Sarah and Mark
• There are 4 agents:
– U1, U2, U3 and U4
• Each agent requests a sample of any 2 profiles
of T for a market survey
16. Distribution Strategies – Sample (2/4)
[Figure: two assignments of profiles to agents U1, …, U4, one labeled "Poor" and one that minimizes $\sum_{i \neq j} |R_i \cap R_j|$]
17. Distribution Strategies – Sample (3/4)
• Optimal Distribution
• Avoid full overlaps and minimize
$$\sum_{i} \frac{1}{|R_i|} \sum_{j \neq i} |R_i \cap R_j|$$
[Figure: optimal assignment of profiles to agents U1, …, U4]
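As a worked example for the four-profile, four-agent scenario, the sketch below compares two assignments under the normalized overlap measure. Both assignments are hypothetical, chosen for illustration only, since the assignments drawn on the slides are not recoverable here; `overlap_objective` is likewise an illustrative name.

```python
def overlap_objective(R):
    """Sum over i of (1/|R_i|) * sum over j != i of |R_i ∩ R_j|."""
    sets = list(R.values())
    total = 0.0
    for i, Ri in enumerate(sets):
        shared = sum(len(Ri & Rj) for j, Rj in enumerate(sets) if j != i)
        total += shared / len(Ri)
    return total


# Hypothetical poor assignment: every agent receives the same two profiles.
poor = {
    "U1": {"Kiran", "John"},
    "U2": {"Kiran", "John"},
    "U3": {"Kiran", "John"},
    "U4": {"Kiran", "John"},
}

# Hypothetical better assignment: no two agents hold exactly the same set.
better = {
    "U1": {"Kiran", "John"},
    "U2": {"Sarah", "Mark"},
    "U3": {"Kiran", "Sarah"},
    "U4": {"John", "Mark"},
}

print(overlap_objective(poor))    # 12.0
print(overlap_objective(better))  # 4.0
```

Since 4 agents × 2 profiles exceeds the 4 profiles in T, some overlap is unavoidable; the second assignment keeps it low and never gives two agents identical sets, so any leaked pair of profiles points to at most one agent.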
18. Distribution Strategies
Sample Data Requests
• The distributor has the freedom
to select the data items to
provide to the agents
• General Idea:
– Provide agents with sets of data
that are as disjoint as possible
• Problem: There are cases where
the distributed data must
overlap, e.g., |R1| + … + |Rn| > |T|
Explicit Data Requests
• The distributor must provide
agents with the data they request
• General Idea:
– Add fake data to the
distributed data to minimize
overlap of distributed data
• Problem: Agents can collude and
identify fake data
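For sample requests, one simple way to keep the distributed sets as disjoint as possible is a greedy heuristic: serve each agent with the records that have been handed out the fewest times so far. The sketch below is an illustrative heuristic under that assumption, not the allocation algorithm of the presentation; the name `allocate_samples` and the `requests` mapping are hypothetical.

```python
def allocate_samples(T, requests):
    """Greedily give each agent the least-distributed records of T.

    T: set of records
    requests: dict mapping agent name -> number of records requested (m_i)
    Returns a dict mapping agent name -> set of records (R_i).
    """
    counts = {t: 0 for t in T}  # how many agents already hold each record
    R = {}
    for agent, m in requests.items():
        if m > len(T):
            raise ValueError(f"{agent} requests more records than T contains")
        # Prefer records held by the fewest agents; break ties by record name.
        chosen = sorted(counts, key=lambda t: (counts[t], t))[:m]
        for t in chosen:
            counts[t] += 1
        R[agent] = set(chosen)
    return R


T = {"Kiran", "John", "Sarah", "Mark"}
small = allocate_samples(T, {"U1": 2, "U2": 2})
full = allocate_samples(T, {"U1": 2, "U2": 2, "U3": 2, "U4": 2})
```

When the requests fit inside T (the `small` case) the resulting sets are disjoint. When they do not (the `full` case) the heuristic spreads copies evenly across records, but it does not by itself prevent two agents from receiving identical sets, so an additional check for full overlaps would still be needed.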
19. Conclusions
• Modeled as maximum likelihood problem
• Data distribution strategies that help identify
the guilty agents