Jeff Jonas: Big Data. New Physics.
Big Data 12.3.14

Presentation Transcript

  • Big Data. New Physics. And Geospatial “Superfood”. Jeff Jonas, IBM Fellow, Chief Scientist, Context Computing. Email: jeffjonas@us.ibm.com Blog: www.jeffjonas.typepad.com Twitter: http://www.twitter.com/jeffjonas © 2014 IBM Corporation
  • About the Speaker: Jeff Jonas, IBM Fellow, Chief Scientist for Context Computing. Founder and Chief Scientist of Systems Research & Development (SRD), acquired by IBM in 2005. Has been designing, building and deploying entity resolution systems for three decades. This technology is used today by defense and intelligence, financial institutions, humanitarian efforts and more. Today: primarily focused on ‘sensemaking on streams’, with special attention to privacy and civil liberties protections.
  • “The data must find the data and the relevance must find the user.”
  • Trend: Organizations Are Getting Dumber. [Chart. Axes: Time, Computing Power Growth. Labels: Available Observation Space, Sensemaking Algorithms, Context, Enterprise Amnesia]
  • Trend: Organizations Are Getting Dumber. WHY? [Chart. Axes: Time, Computing Power Growth. Labels: Available Observation Space, Sensemaking Algorithms, Context]
  • Algorithms at Dead End. You Can’t Squeeze Knowledge Out of a Pixel.
  • No Context: scrila34@msn.com
  • Context, definition: Better understanding something by taking into account the things around it.
  • I ducked as the bat flew my way. Another exciting baseball game …
  • Information in Context … and Accumulating. scrila34@msn.com: Top 200 Customer, Job Applicant, Twitter Influencer, LinkedIn Career History, AML Investigation.
  • The Puzzle Metaphor: Imagine an ever-growing pile of puzzle pieces of varying sizes, shapes and colors. What it represents is unknown; there is no picture on hand. Is it one puzzle, 15 puzzles, or 1,500 different puzzles? Some pieces are duplicates, missing, incomplete, low quality, or have been misinterpreted. Some pieces may even be professionally fabricated lies. Until you take the pieces to the table and attempt assembly, you don’t know what you are dealing with.
  • 270 pieces (90%); 200 pieces (66%); 150 pieces (50%); 6 pieces (2%); 30 pieces (10%, duplicates). Puzzling images courtesy of Ravensburger © 2011.
  • First Discovery
  • More Data Finds Data
  • Duplicates in Front Of Your Eyes
  • First Duplicate Found Here
  • Incremental Context – Incremental Discovery. 6:40pm: START. 22min: “Hey, this one is a duplicate!” 35min: “I think some pieces are missing.” 37min: “Looks like a bunch of hillbillies on a porch.” 44min: “Hillbillies, playing guitars, sitting on a porch, near a barber sign … and a banjo!”
  • 150 pieces (50%)
  • Incremental Context – Incremental Discovery. 47min: “We should take the sky and grass off the table.” 2hr: “Let’s switch sides, and see if we can make sense of this from different perspectives.” 2hr10m: “Wait, there are three … no, four puzzles.” 2hr17m: “We need a bigger table.” 2hr18m: “I think you threw in a few random pieces.”
  • How Context Accumulates: With each new observation, one of three assertions is made: 1) unassociated; 2) placed near like neighbors; or 3) connected. Must favor the false negative. New observations sometimes reverse earlier assertions. Some observations produce novel discovery. The emerging picture helps focus collection interests. As the working space expands, computational effort increases. Given sufficient observations, there can come a tipping point. Thereafter, confidence improves while computational effort decreases!
  • Overstated Population. [Chart. Axes: Observations, Unique Identities. Label: True Population]
  • Counting Is Difficult. File 1: Mark Smith, 6/12/1978, 443-43-0000. File 2: Mark R Smith, (707) 433-0000, DL: 00001234.
  • The Rise and Fall of a Population. [Chart. Axes: Observations, Unique Identities. Label: True Population]
  • Data Triangulation. File 1: Mark Smith, 6/12/1978, 443-43-0000. File 2: Mark R Smith, (707) 433-0000, DL: 00001234. New Record: Mark Randy Smith, 443-43-0000, DL: 00001234.
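The rise and fall of the identity count can be sketched in a few lines of Python. This is an illustration only, not the method used by any IBM product: the field names (`ssn`, `dl`), the `STRONG` set, and the rule "records sharing any strong identifier are the same entity" are assumptions made for the example. The three records are the ones shown on the slides.

```python
# Hypothetical rule: two records are the same entity if they share
# at least one strong identifier (SSN or driver's license).
STRONG = {"ssn", "dl"}

FILE_1 = {"name": "Mark Smith", "dob": "6/12/1978", "ssn": "443-43-0000"}
FILE_2 = {"name": "Mark R Smith", "phone": "(707) 433-0000", "dl": "00001234"}
NEW_RECORD = {"name": "Mark Randy Smith", "ssn": "443-43-0000", "dl": "00001234"}


def count_identities(observations):
    """Return the number of distinct entities after each observation."""
    entities = []  # each entity is a set of (feature, value) pairs
    counts = []
    for record in observations:
        keys = {(k, v) for k, v in record.items() if k in STRONG}
        # Every existing entity that shares a strong identifier is absorbed.
        matched = [e for e in entities if e & keys]
        merged = set(keys).union(*matched)
        entities = [e for e in entities if not (e & keys)] + [merged]
        counts.append(len(entities))
    return counts


counts = count_identities([FILE_1, FILE_2, NEW_RECORD])
print(counts)
```

File 1 and File 2 share no strong identifier, so the count first rises to two. The new record carries both the SSN and the driver's license, bridges the two, and the count falls back to one.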
  • Big Data [in context]. New Physics. More data: better predictions (lower false positives, lower false negatives). More data: bad data good (suddenly glad your data is not perfect). More data: less compute.
  • Big Data. Pile of ____. Information In Context.
  • One Form of Context: “Expert Counting”. Is it 5 people each with 1 account … or is it 1 person with 5 accounts? Is it 20 cases of H1N1 in 20 cities … or one case reported 20 times? If one cannot count … one cannot estimate vector or velocity (direction and speed). Without vector and velocity … prediction is nearly impossible.
  • Entity Resolution Demonstration
  • Entity Resolution Demonstration. DECEASED PERSON: George Balston, YOB: 1951, SSN: 5598, DOD: 1995. VOTER: George F Balston, YOB: 1951, D/L: 4801, 13070 SW Karen Blvd Apt 7, Beaverton, OR 97005, Last voted: 2008. When it comes to best practices in voter matching, a match on only name and year of birth is insufficient proof. Many different people in the U.S. share a name and year of birth. Human review is required. Unfortunately, there can be many thousands of cases just like this, and state election offices don’t have the staff or budget to manually review them all.
  • Now Consider This Tertiary DMV Record. DECEASED PERSON: George Balston, YOB: 1951, SSN: 5598, DOD: 1995. VOTER: George F Balston, YOB: 1951, D/L: 4801, 13070 SW Karen Blvd Apt 7, Beaverton, OR 97005, Last voted: 2008. DMV: George F Balston, YOB: 1951, SSN: 5598, D/L: 4801, 3043 SW Clementine Blvd Apt 210, Beaverton, OR 97005. The DMV record contains enough features to match the voter (name, year of birth and driver’s license) and/or the deceased person’s record (name, year of birth and SSN). For the sake of argument, let’s say it matches the voter best.
  • Features Accumulate. DECEASED PERSON: George Balston, YOB: 1951, SSN: 5598, DOD: 1995. VOTER: George F Balston, YOB: 1951, D/L: 4801, 13070 SW Karen Blvd Apt 7, Beaverton, OR 97005, Last voted: 2008. DMV: George F Balston, YOB: 1951, SSN: 5598, D/L: 4801, 3043 SW Clementine Blvd Apt 210, Beaverton, OR 97005. The voter/DMV record now shares a name, year of birth and SSN with the deceased person. In voter matching best practices, this evidence would be sufficient to determine that this voter is likely deceased. This case no longer needs human review.
  • Useful Insight Revealed! VOTER: George F Balston, YOB: 1951, D/L: 4801, 13070 SW Karen Blvd Apt 7, Beaverton, OR 97005, Last voted: 2008. DMV: George F Balston, YOB: 1951, SSN: 5598, D/L: 4801, 3043 SW Clementine Blvd Apt 210, Beaverton, OR 97005. DECEASED PERSON: George Balston, YOB: 1951, SSN: 5598, DOD: 1995. As features accumulate, it becomes possible to resolve previously unresolvable identity records. As events and transactions accumulate, detection of relevance improves. Here we can see that George, who died in 1995, voted in 2008.
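The demonstration above can be sketched as a small matching rule, under stated assumptions. The rule follows the slides: name plus year of birth is not enough, while name plus year of birth plus a shared SSN or driver's license is. The helper names (`name_key`, `is_match`), the field names, and the crude "first and last name token" comparison are hypothetical choices for this example, not part of the deck.

```python
DECEASED = {"name": "George Balston", "yob": 1951, "ssn": "5598", "dod": 1995}
VOTER = {"name": "George F Balston", "yob": 1951, "dl": "4801", "last_voted": 2008}
DMV = {"name": "George F Balston", "yob": 1951, "ssn": "5598", "dl": "4801"}


def name_key(name):
    """Crude name normalization (assumption): first and last token only."""
    parts = name.lower().split()
    return (parts[0], parts[-1])


def is_match(a, b):
    """Name and year of birth must agree, plus at least one strong identifier."""
    if name_key(a["name"]) != name_key(b["name"]) or a["yob"] != b["yob"]:
        return False
    return any(
        a.get(key) is not None and a.get(key) == b.get(key)
        for key in ("ssn", "dl")
    )


# Voter vs. deceased: only name and year of birth agree, so no match.
print(is_match(VOTER, DECEASED))

# The DMV record matches the voter on the driver's license; merge them.
merged = {**DMV, **VOTER} if is_match(VOTER, DMV) else dict(VOTER)

# The merged record now carries an SSN and matches the deceased person.
print(is_match(merged, DECEASED))
print(merged["last_voted"] > DECEASED["dod"])
```

The merge is what matters here: neither source record alone links the voter to the death record, but the accumulated features do.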
  • Expert Counting: Degrees of Difficulty. Exactly Same: Bob Jones 123455 vs. Bob Jones 123455. Fuzzy: Bob Jones 123455 vs. Robert T Jonnes 000123455. Incompatible Features: Bob Jones 123455 vs. bjones@hotmail. Deceit: Bob Jones 123455 vs. Ken Wells 550119.
  • Deceit Detection Using Context Accumulation. Deceit: Bob Jones 123455; Ken Wells 550119. Feature Accumulation: Robert Jones 123455, POB 13452, DOB 03/12/73; Bob Jones, POB 13452, gw3e56@hotmail.com; Ken Wells 550119, POB 999911, DOB 03/12/73, gw3e56@hotmail.com. Resolved!: Robert Jones 123455 and Ken Wells 550119 (DOB 03/12/73, gw3e56@hotmail.com).
  • Skilled adversaries use “channel separation” to avoid detection. Cell Phone #1: Unknown. Cell Phone #2: Unknown. Passport #1: William A. Bank Acct #1: Billy K.
  • Detection requires “channel consolidation.” William A aka Billy K.: Cell Phone #1, Cell Phone #2, Bank Acct #1, Passport #1.
  • Take Note: To catch clever criminals, one must ... 1) Collect observations the adversary doesn’t know you have. 2) Or, be able to perform compute over your observations in a manner the adversary cannot fathom.
  • InfoSphere Identity Insight v8
  • New Think About Expert Counting. Exactly Same: Bob Jones 123455 vs. Bob Jones 123455. Fuzzy: Bob Jones 123455 vs. Robert T Jonnes 000123455. Incompatible Features: Bob Jones 123455 vs. bjones@hotmail. Deceit: Bob Jones 123455 vs. Ken Wells 550119.
  • Key Features Enable Expert Counting. People: Name, Address, Date of Birth, Phone, Passport, Nationality, Biometric, Etc. Cars: License Plate No., VIN, Make, Model, Year, Color, Etc. Router: Serial Number, MAC Address, IP Address, Make, Model, Firmware Version, Etc.
  • Consider Lying Identical Twins. Two passports, each reading: PASSPORT #123, Sue, 3/3/84, Uberstan, Exp 2011. Fingerprint, DNA. Most Trusted Authority: “Same person – trust me.”
  • The same thing cannot be in two places … at the same time. Two different things cannot occupy the same space … at the same time.
  • Space & Time Enables Absolute Disambiguation. People: When, Where, Name, Address, Date of Birth, Phone, Passport, Nationality, Biometric, Etc. Cars: When, Where, License Plate No., VIN, Make, Model, Year, Color, Etc. Router: When, Where, Serial Number, MAC Address, IP Address, Make, Model, Firmware Version, Etc.
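As a rough sketch of how "when" and "where" can disambiguate otherwise identical records, the check below asks whether one entity could physically have produced two sightings. The function names, the 1000 km/h speed ceiling, the 0.1 km position tolerance, and the coordinates are all assumptions made for this illustration; none of them come from the deck.

```python
import math
from datetime import datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def could_be_same_entity(obs_a, obs_b, max_speed_kmh=1000.0, tolerance_km=0.1):
    """False when the implied travel speed is implausible (assumed ceiling)."""
    distance = haversine_km(obs_a["lat"], obs_a["lon"], obs_b["lat"], obs_b["lon"])
    hours = abs((obs_b["when"] - obs_a["when"]).total_seconds()) / 3600.0
    return distance <= max_speed_kmh * hours + tolerance_km


# Made-up sightings of the same passport number in two cities roughly 300 km apart.
noon = datetime(2014, 12, 3, 12, 0)
sighting_a = {"when": noon, "lat": 44.94, "lon": -123.04}
sighting_b = {"when": noon, "lat": 47.61, "lon": -122.33}
sighting_c = {"when": datetime(2014, 12, 3, 22, 0), "lat": 47.61, "lon": -122.33}

print(could_be_same_entity(sighting_a, sighting_b))  # same instant, two places
print(could_be_same_entity(sighting_a, sighting_c))  # ten hours apart
```

Two records that agree on every attribute, as with the twins' passports, still fail this check if they are observed in two places at the same instant.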
  • “Life Arcs” Are Also Telling. Bill Smith, 4/13/67, Salem, Oregon. Address History: Tampa, FL 2008-2008; Biloxi, MS 2005-2008; NY, NY 1996-2005; Tampa, FL 1984-1996. Bill Smith, 4/13/67, Seattle, Washington. Address History: San Diego, CA 2005-2009; San Fran, CA 2005-2005; Phoenix, AZ 1990-2005; San Jose, CA 1982-1990.
  • OMG
  • Space-Time-Travel: Cell phones are generating a staggering amount of geolocational data: 600B transactions per day are being created in the US alone. This data is being “de-identified” and shared with third parties, in volume and in real time. Your movement quickly reveals where you spend your time (e.g., evenings vs. working hours). Re-identification (figuring out who is who) is somewhat trivial. And, oh so powerful predictions …
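To make the "evenings vs. working hours" point concrete, here is a minimal sketch of how a de-identified stream of location pings can still reveal a device's likely home and workplace. The hour windows, the cell identifiers, and the function name `infer_anchor_places` are assumptions for illustration; real data and real methods will differ.

```python
from collections import Counter
from datetime import datetime


def infer_anchor_places(pings):
    """Guess (home, workplace) from (timestamp, cell_id) pings.

    Assumed windows: weekdays 09:00-17:00 count as working hours,
    21:00-06:00 counts as night.
    """
    night, work = Counter(), Counter()
    for when, cell in pings:
        if when.weekday() < 5 and 9 <= when.hour < 17:
            work[cell] += 1
        elif when.hour >= 21 or when.hour < 6:
            night[cell] += 1
    home = night.most_common(1)[0][0] if night else None
    workplace = work.most_common(1)[0][0] if work else None
    return home, workplace


# Hypothetical pings for one anonymous device (1 Dec 2014 was a Monday).
pings = [
    (datetime(2014, 12, 1, 2, 0), "cell-A"),
    (datetime(2014, 12, 1, 10, 0), "cell-B"),
    (datetime(2014, 12, 1, 12, 30), "cell-C"),
    (datetime(2014, 12, 1, 14, 0), "cell-B"),
    (datetime(2014, 12, 1, 23, 0), "cell-A"),
    (datetime(2014, 12, 2, 3, 0), "cell-A"),
    (datetime(2014, 12, 2, 11, 0), "cell-B"),
]

home, workplace = infer_anchor_places(pings)
print(home, workplace)
```

A home and workplace pair is often distinctive enough to narrow a device down to very few people, which is why removing the name from such data offers limited protection.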
  • The 10 People I Spend the Most Time With (Not at Home and Not at Work): 1. Michelle J; 2. Renee M; 3. Peggy M; 4. Erin E; 5. Joshua J; 6. Ivan X; 7. Bob Y; 8. Amanda H; 9. Dane J; 10. Wesley R. Callout: He must be following me!
  • Consequences: Space-time-travel data is the ultimate biometric. It will enable enormous opportunity. It will unravel one’s secrets. It will challenge existing notions of privacy. Adoption is now accelerating at a blistering pace.
  • [Theatrical Pause]
  • The G2 | Sensemaking Project
  • The G2 Vision: 1) Evaluate each new observation against previous observations. 2) Determine if what is being observed is relevant. 3) Deliver this actionable insight to its consumer … fast enough to do something about it while it is still happening. 4) Do this with sufficient accuracy and scale to really matter.
  • Uniquely G2: Real “Context Computing”. Complete Context: contextualize diverse observations, each observation benefiting from others. Current Context: real-time, incremental integration. Conflicting Context: high tolerance for disagreement, confusion and uncertainty. Self-Correcting Context: new observations able to reverse earlier assertions. Engineered ground-up for cloud compute … in support of hemisphere-scale data. Introduce new data sources (e.g., geospatial), new entity types (e.g., vessels), new features (e.g., MAC addresses) … without schema change or re-engineering. From sense to respond in sub-200ms: fast enough to do something about the transaction while it is still happening. Unprecedented number of Privacy by Design (PbD) features baked in.
  • Privacy by Design (PbD): 1. Full Attribution; 2. Tamper Resistant Audit Log; 3. Information Transfer Accounting; 4. Data Tethering; 5. False Negative Favoring; 6. Self-Correcting False Positives; 7. Analytics on Anonymized Data. http://jeffjonas.typepad.com/jeff_jonas/2012/06/privacy-by-design-in-the-era-of-big-data.html
  • Example: Self-Correcting False Positive. Record 1: John T Smith Jr, 123 Main Street, 703 111-2000, DOB: 03/12/1984. Record 2: John T Smith, 123 Main Street, 703 111-2000, DL: 009900991. A plausible claim that these two people are the same … until Record 3 comes into view: John T Smith Sr, 123 Main Street, 703 111-2000, DL: 009900991. This reveals that the claim is a FALSE POSITIVE.
  • Example: Self-Correcting False Positive. Record 1 now stands alone: John T Smith Jr, 123 Main Street, 703 111-2000, DOB: 03/12/1984. Records 2 and 3 are resolved together: John T Smith / John T Smith Sr, 123 Main Street, 703 111-2000, DL: 009900991. New Best Practice: FIXED IN REAL-TIME (not end of month).
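One way to picture a self-correcting false positive is the toy resolver below. It is a sketch under loud assumptions, not the G2 algorithm: the feature weights, the threshold of 3, the "conflicting generational suffix is a veto" rule, and the function names are all invented for this example. Each new record is placed first, and then records in other entities are re-checked against the entity that just changed.

```python
WEIGHTS = {"name": 1, "address": 1, "phone": 1, "dob": 3, "dl": 3}  # assumed weights
THRESHOLD = 3  # assumed minimum score to assert "same entity"


def score(record, members):
    """Similarity of a record to an entity (list of records); -1 means vetoed."""
    if not members:
        return 0
    for other in members:
        a, b = record.get("suffix"), other.get("suffix")
        if a and b and a != b:
            return -1  # Jr vs. Sr: treated as proof of different people
    total = 0
    for key, weight in WEIGHTS.items():
        value = record.get(key)
        if value and any(other.get(key) == value for other in members):
            total += weight
    return total


def add_record(entities, record):
    """Place a new record, then revisit earlier assertions it may overturn."""
    best = max(entities, key=lambda e: score(record, e), default=None)
    if best is not None and score(record, best) >= THRESHOLD:
        best.append(record)
        target = best
    else:
        target = [record]
        entities.append(target)
    for entity in list(entities):
        if entity is target:
            continue
        for member in list(entity):
            rest = [m for m in entity if m is not member]
            candidate = score(member, target)
            if candidate >= THRESHOLD and candidate > score(member, rest):
                entity[:] = rest          # detach from the old entity
                target.append(member)     # re-resolve into the better one
    entities[:] = [e for e in entities if e]
    return entities


base = {"name": "John T Smith", "address": "123 Main Street", "phone": "703 111-2000"}
r1 = {"id": 1, **base, "suffix": "Jr", "dob": "03/12/1984"}
r2 = {"id": 2, **base, "dl": "009900991"}
r3 = {"id": 3, **base, "suffix": "Sr", "dl": "009900991"}

entities = []
add_record(entities, r1)
add_record(entities, r2)
after_two = [[r["id"] for r in e] for e in entities]
add_record(entities, r3)
after_three = [[r["id"] for r in e] for e in entities]
print(after_two, after_three)
```

After two records the resolver asserts one person. When the third record arrives, record 2 moves to the Sr entity because of the shared driver's license, and the earlier assertion is reversed immediately rather than in a later batch cleanup.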
  • Use Cases. Maritime Domain Awareness: “New system lets authorities track suspicious ships”, http://www.asiaone.com/print/News/Latest%2BNews/Science%2Band%2BTech/Story/A1Story20130703-434337.html Voter Registration Modernization: David Becker (Pew Charitable Trusts) and Jeff Jonas (IBM) discuss how G2 has helped modernize voter registration in America, http://ibmreferencehub.com/STG/ibm_executive_edge_2013/#gensession_daytwo_jonasbecker
  • Closing Thoughts
  • Wish This on the Adversary: Enterprise Amnesia. [Chart. Axes: Time, Computing Power Growth. Labels: Available Observation Space, Sensemaking Algorithms, Context]
  • Wish This for Yourself: Better Sensemaking Skills. [Chart. Axes: Time, Computing Power Growth. Labels: Available Observation Space, Sensemaking Algorithms, Context]
  • State of the Union: Isolated Analytics. [Diagram labels: Observation Space; Structured Data Analytics; Unstructured Data Analytics; Social Network Analytics; Action]
  • The Future: General Purpose Context Accumulation. Data Finds Data. Relevance Finds You. This is G2. [Diagram labels: Observation Space; Information In Context; Consumer (an analyst, a system, the sensor itself, etc.)]
  • The most competitive organizations are going to make sense of what they are observing fast enough to do something about it while they are observing it.
  • Related Blog Posts: Algorithms At Dead-End: Cannot Squeeze Knowledge Out Of A Pixel; Puzzling: How Observations Are Accumulated Into Context; Big Data. New Physics.; On A Smarter Planet … Some Organizations Will Be Smarter-er Than Others; Your Movements Speak for Themselves: Space-Time Travel Data is Analytic Super-Food!; When Federated Search Bites; Data Finds Data; Structuring Unstructured Data; Fantasy Analytics.
  • Questions? Email: jeffjonas@us.ibm.com Blog: www.jeffjonas.typepad.com Twitter: http://www.twitter.com/jeffjonas