From the Hadoop Summit 2015 Session with Ted Dunning.
Open source is great, if it is developed in the open. Privacy is great, but only if things stay private. So what happens when you find an open source bug with private data? How do you even file the bug report? Likewise, how can you develop fraud detection algorithms in academic settings when the training data can't be transported outside a secure perimeter? One answer is really good fake data. Good enough to fool the bug. Good enough to emulate the fraud. I will describe log-synth and several physics-based approaches that can do this, and tell some real stories about fake data.