EOCD Big Data Flows vs. Wicked Leaks

Jeff Jonas' presentation to OECD about big data and wicked leaks on Dec 1, 2010.

Published in: Technology

Transcript

  • 1. Big Data Flows vs. Wicked Leaks. Jeff Jonas, IBM Distinguished Engineer; Chief Scientist, IBM Entity Analytics. [email_address]. December 1, 2010
  • 2. Background
      - Early 80’s: Founded Systems Research & Development (SRD), a custom software consultancy
      - 1989 – 2003: Built numerous systems for Las Vegas casinos, including a technology known as Non-Obvious Relationship Awareness (NORA)
      - 2005: IBM acquires SRD; now chief scientist of IBM Entity Analytics
      - Personally designed and deployed roughly 100 systems, a number of which contained billions of transactions describing hundreds of millions of entities
      - Today: My focus is on ‘sensemaking on streams’, with special attention to privacy and civil liberties protections
          - Markle Foundation, Member, Task Force on National Security
          - EPIC, Member, Advisory Board
  • 3. Data Volumes Exploding: “Every two days now we create as much information as we did from the dawn of civilization up until 2003.” -Eric Schmidt, CEO Google
  • 4. Big Data Flows: How Many Copies?
      - Blog Post: How Many Copies of Your Data? Is Somewhat Like Asking: How Many Licks to the Center of the Tootsie Pop?
      - Often, at minimum, 144 copies
          - Backups
          - Internal transfers
              - Other operational systems
              - Operational data stores
              - Data warehouses
              - Data marts
              - Testing systems
              - Training systems
              - Their backups
          - External transfers (information sharing partners)
              - And then their entire ecosystem (from warehouses to backups)
              - And their information sharing partners
      - Sometimes 10,000’s of copies
  • 5. What’s driving information sharing and big data?
  • 6. Organizations Are Getting Dumber (chart; labels: Time, Computing Power Growth, Available Observation Space, Context, Sensemaking Algorithms, Enterprise Amnesia)
  • 7. No Context [email_address]
  • 8. Information in Context … and Accumulating (figure; labels: Top 200 Customer, Job Applicant, Identity Thief, Termination “No-Rehire”, [email_address])
  • 9. Demonstration
  • 10. Is This Voter Deceased?
      - VOTER: George F Balston; YOB: 1951; D/L: 4801; 13070 SW Karen Blvd Apt 7, Beaverton, OR 97005; Last voted: 2008
      - DECEASED PERSON: George Balston; YOB: 1951; SSN: 5598; DOD: 1995
      When it comes to best practices in voter matching, if only a name and year of birth match, this is insufficient proof of a match. Many different people in the U.S. share a name and year of birth. Human review is required. Unfortunately, there are thousands and thousands of cases just like this and state election offices don’t have the staff (or budget) to manually review such volumes.
  • 11. Now Consider This Tertiary DMV Record
      - VOTER: George F Balston; YOB: 1951; D/L: 4801; 13070 SW Karen Blvd Apt 7, Beaverton, OR 97005; Last voted: 2008
      - DECEASED PERSON: George Balston; YOB: 1951; SSN: 5598; DOD: 1995
      - DMV: George F Balston; YOB: 1951; SSN: 5598; D/L: 4801; 3043 SW Clementine Blvd Apt 210, Beaverton, OR 97005
      The DMV record contains enough features to match the voter record (name, year of birth and driver’s license), the deceased person record (name, year of birth and SSN), or both. For the sake of argument, let’s say it matches the voter best.
  • 12. Is This Voter/DMV Person Deceased?
      - VOTER: George F Balston; YOB: 1951; D/L: 4801; 13070 SW Karen Blvd Apt 7, Beaverton, OR 97005; Last voted: 2008
      - DMV: George F Balston; YOB: 1951; SSN: 5598; D/L: 4801; 3043 SW Clementine Blvd Apt 210, Beaverton, OR 97005
      - DECEASED PERSON: George Balston; YOB: 1951; SSN: 5598; DOD: 1995
      The voter/DMV record now shares a name, year of birth and SSN with the deceased person record. In voter matching best practices, this evidence would be sufficient to make a determination that this voter is in fact deceased. This case no longer needs human review.
  • 13. Context Accumulates
      - VOTER: George F Balston; YOB: 1951; D/L: 4801; 13070 SW Karen Blvd Apt 7, Beaverton, OR 97005; Last voted: 2008
      - DMV: George F Balston; YOB: 1951; SSN: 5598; D/L: 4801; 3043 SW Clementine Blvd Apt 210, Beaverton, OR 97005
      - DECEASED PERSON: George Balston; YOB: 1951; SSN: 5598; DOD: 1995
      As features accumulate, it becomes easier to match future identity records. As events and transactions accumulate, detection of relevance improves. Here we can see that George, who died in 1995, voted in 2008.
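
The demonstration in slides 10 to 13 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the matching logic of NORA or any IBM product: the rule "name, year of birth, and one shared strong identifier (SSN or driver's license)" is a simplification of the slides' wording, and the names `name_key`, `same_entity`, `merge`, `resolve`, and `voted_after_death` are hypothetical.

```python
# Hypothetical rule: a match needs the same first/last name, the same year of
# birth, and at least one shared strong identifier.
STRONG_IDS = ("ssn", "dl")

def name_key(name):
    """Reduce a name to (first, last), ignoring middle initials and case."""
    parts = name.lower().split()
    return parts[0], parts[-1]

def same_entity(features, record):
    """Name and year of birth alone are not enough; a strong ID must agree too."""
    if name_key(features["name"]) != name_key(record["name"]):
        return False
    if features["yob"] != record["yob"]:
        return False
    return any(
        key in features and key in record and features[key] == record[key]
        for key in STRONG_IDS
    )

def merge(parts):
    """Union the features of several records; the first value seen wins."""
    merged = {"sources": []}
    for part in parts:
        merged["sources"] += part["sources"]
        for key, value in part.items():
            if key != "sources":
                merged.setdefault(key, value)
    return merged

def resolve(records):
    """Fold records into entities; one record may bridge two existing entities."""
    entities = []
    for record in records:
        matched = [e for e in entities if same_entity(e, record)]
        rest = [e for e in entities if all(e is not m for m in matched)]
        entities = rest + [merge(matched + [record])]
    return entities

def voted_after_death(entity):
    return "dod" in entity and "last_voted" in entity and entity["last_voted"] > entity["dod"]

voter = {"sources": ["VOTER"], "name": "George F Balston", "yob": 1951,
         "dl": "4801", "last_voted": 2008}
deceased = {"sources": ["DECEASED PERSON"], "name": "George Balston", "yob": 1951,
            "ssn": "5598", "dod": 1995}
dmv = {"sources": ["DMV"], "name": "George F Balston", "yob": 1951,
       "ssn": "5598", "dl": "4801"}

before = resolve([voter, deceased])       # name + YOB only: stays as two entities
after = resolve([voter, deceased, dmv])   # DMV record bridges them into one
```

Without the DMV record the voter and the deceased person remain separate entities, which corresponds to the "human review is required" case. Once the DMV record arrives it matches the voter on driver's license and the deceased person on SSN, so all three collapse into one entity and `voted_after_death(after[0])` flags the 2008 vote.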
  • 14. Flows vs. Leaks
  • 15. Flows vs. Leaks
      - Most flows are by design
          - Better context
          - Better prediction
      - Unintended disclosure flows
          - External: e.g., Cyber
          - Internal: e.g., Insider threats
      - Recent WikiLeaks
          - Devastating consequences
          - Could have been worse. What if the leaks had not been made public, but rather selectively and quietly passed around to others?
  • 16. Wicked Leaks, Prediction
      - Sudden change of leak cadence: so much, so fast, now considered intolerable
          - May result in new harsh laws and prosecutions
      - Receiving stolen property (e.g., data) and benefiting from it … met with severe pursuit and prosecution
          - Stealing party
          - The information exchange points (e.g., WikiLeaks)
          - Those publishing the contents (e.g., the media)
      - What if? Could result in less transparency, less accountability
  • 17. Protecting Big Data from Wicked Leaks
      - Central indexes
          - fewer copies of the data moved; easier to control and audit usage
      - Analytics in the anonymized data space
          - data anonymization before transfer, reducing the risk of unintended disclosure
      - Immutable (tamper resistant) audit logs
          - to help prove the system is being used within policy and law
      - Real-time active audits
          - to evaluate the actions of authorized users
  • 18. Big Data Flows vs. Wicked Leaks. Jeff Jonas, IBM Distinguished Engineer; Chief Scientist, IBM Entity Analytics. [email_address]. December 1, 2010
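
Slide 17 lists immutable (tamper resistant) audit logs without saying how they would be built. One common way to approximate the idea is a hash chain, where each log entry's hash covers the previous entry's hash. The sketch below is an assumption about how such a log might look, not the design the presentation has in mind; `GENESIS`, `append`, and `verify` are hypothetical names, and the entry fields are made up for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(prev_hash, entry):
    """Hash the previous link together with a canonical encoding of the entry."""
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()

def append(log, entry):
    """Add an entry whose hash depends on everything logged before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"entry": entry, "hash": _digest(prev_hash, entry)})

def verify(log):
    """Recompute the chain; any edited, removed, or reordered entry breaks it."""
    prev_hash = GENESIS
    for item in log:
        if item["hash"] != _digest(prev_hash, item["entry"]):
            return False
        prev_hash = item["hash"]
    return True

log = []
append(log, {"user": "analyst1", "action": "search", "subject": "record 4801"})
append(log, {"user": "analyst2", "action": "export", "subject": "record 5598"})
intact = verify(log)               # chain is consistent

log[0]["entry"]["action"] = "view" # someone quietly rewrites history
tampered_ok = verify(log)          # chain no longer verifies
```

A chain like this is tamper evident rather than tamper proof: someone who can rewrite the whole log can also recompute every hash, so in practice the latest hash would need to be stored or published somewhere the log's administrators cannot alter.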