This document discusses techniques for detecting data leakage from organizations. It proposes adding fake records to distributed data to improve the chances of identifying guilty parties if a leak occurs. The key techniques are:
1. The data distributor adds "fake but realistic" records to sensitive data distributed to agents. If a leak is discovered, the fake records act as watermarks to identify the source.
2. Two allocation strategies are evaluated: explicit, where the distributor cannot add fakes, and sample, where fakes can be injected. The goal is to maximize the probability of identifying a leaker.
3. An optimization problem is formulated to distribute data across agents in a way that maximizes probability differences between potential leakers.
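
The watermarking idea in point 1 can be illustrated with a minimal Python sketch. It is not the document's algorithm: it assumes a simplified setting in which every agent receives all real records plus a few fake records unique to that agent, and it does not model the probability estimates or the optimization problem mentioned above. The record schema (name, email) and the helper names `make_fake_record`, `distribute`, and `suspects` are hypothetical, chosen only for illustration.

```python
import random
import string


def make_fake_record(rng):
    """Build a fake but plausible-looking record (hypothetical schema: name, email)."""
    name = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
    return (name.capitalize(), f"{name}@example.com")


def distribute(real_records, agents, fakes_per_agent, seed=0):
    """Give every agent the real records plus its own unique fake records.

    Assumption: all agents get the full real data set. Returns the
    per-agent allocation and a private map of which fakes went to whom.
    """
    rng = random.Random(seed)
    used = set(real_records)  # keeps fakes distinct from real data and each other
    allocation = {}           # agent -> records handed to that agent
    fakes = {}                # agent -> fakes planted in that agent's copy
    for agent in agents:
        planted = set()
        while len(planted) < fakes_per_agent:
            candidate = make_fake_record(rng)
            if candidate not in used:
                used.add(candidate)
                planted.add(candidate)
        fakes[agent] = planted
        records = list(real_records) + sorted(planted)
        rng.shuffle(records)  # avoid fakes sitting together at the end
        allocation[agent] = records
    return allocation, fakes


def suspects(leaked_records, fakes):
    """Count, per agent, how many of that agent's fakes appear in the leak."""
    leaked = set(leaked_records)
    hits = {agent: len(planted & leaked) for agent, planted in fakes.items()}
    return {agent: n for agent, n in hits.items() if n > 0}


# Usage: simulate agent_b leaking its entire copy.
real = [("Alice", "alice@corp.test"), ("Bob", "bob@corp.test")]
allocation, fakes = distribute(real, ["agent_a", "agent_b"], fakes_per_agent=2)
print(suspects(allocation["agent_b"], fakes))
```

Because each fake record is handed to exactly one agent, any fake that turns up in a leak points to that agent's copy. A leak containing only real records yields no suspects in this sketch, which is the case the allocation strategies and the optimization in points 2 and 3 are meant to address.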