Big data new physics giga om structure conference ny - march 2011

Opening keynote @ Structure Big Data 2011 conference.

Published in: Technology

  1. Big Data. New Physics. And Why Geospatial Data is Analytic SuperFood
     - Jeff Jonas, IBM Distinguished Engineer; Chief Scientist, IBM Entity Analytics
     - [email_address]
     - March 23rd, 2011
  2. Big Data. New Physics.
     - More data: better the predictions
       - Lower false positives
       - Lower false negatives
     - More data: faster
       - The compute required decreases as the database gets bigger
     - Bonus: bad data … good
       - Suddenly glad your data is not perfect
  3. Background
     - Early 80’s: Founded Systems Research & Development
     - 1989 – 2003: Built numerous systems for Las Vegas, including NORA
     - Designed and deployed +/- 100 systems, at least 5 systems containing multi-billions of records and 100’s of millions of entities
     - 2005: IBM acquires SRD
     - Today: Focus on ‘sensemaking on streams’ with special attention towards privacy and civil liberties protections
  4. Trend: Organizations Are Getting Dumber
     - (Chart labels: Time; Computing Power Growth; Sensemaking Algorithms; Available Observation Space; Context)
     - “Every two days now we create as much information as we did from the dawn of civilization up until 2003.” ~ Eric Schmidt, CEO Google
     - Enterprise Amnesia
  5. Trend: Organizations Are Getting Dumber
     - (Chart labels: Time; Computing Power Growth; Sensemaking Algorithms; Available Observation Space; Context)
     - WHY?
  6. Algorithms at Dead End. You Can’t Squeeze Knowledge Out of a Pixel.
  7. No Context [email_address]
  8. Context, definition: Better understanding something by taking into account the things around it.
  9. Information in Context … and Accumulating
     - (Labels: Top 200 Customer; Job Applicant; Identity Thief; Criminal Investigation; [email_address])
  10. From Pixels to Pictures to Insight
     - (Diagram labels: Observations; Contextualization; Information in Context; Relevance; Consumer (an analyst, a system, the sensor itself, etc.))
  11. The Puzzle Metaphor
     - Imagine an ever-growing pile of puzzle pieces of varying sizes, shapes and colors
     - What it represents is unknown (there is no picture on hand)
     - Is it one puzzle, 15 puzzles, or 1,500 different puzzles?
     - Some pieces are duplicates, missing, incomplete, low quality, or have been misinterpreted
     - Some pieces may even be professionally fabricated lies
     - Point being: Until you take the pieces to the table and attempt assembly, you don’t know what you are dealing with
  12. How Context Accumulates
     - With each new observation … one of three assertions is made: 1) un-associated; 2) placed near like neighbors; or 3) connected
     - Must favor the false negative
     - New observations sometimes reverse earlier assertions
     - As the working space expands, computational effort increases
     - Given sufficient observations, there can come a tipping point … thereafter, confidence improves while computational effort decreases!
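
The three assertions on slide 12 can be pictured with a small sketch. This is not the speaker's engine: the function name `assess`, the feature names, the "Ann Lee" record, and the `connect_threshold` of two shared features are all assumptions made for illustration. Requiring more than one matching feature before connecting is one simple way to "favor the false negative".

```python
def assess(observation, entity, connect_threshold=2):
    """Return one of the three assertions for an observation against one known entity.

    Hypothetical rule: count features whose values match exactly.
    """
    shared = [k for k, v in observation.items() if k in entity and entity[k] == v]
    if not shared:
        return "un-associated"
    if len(shared) >= connect_threshold:
        return "connected"
    # Some resemblance, but not enough evidence to merge (favor the false negative).
    return "near"


known = {"name": "Mark Smith", "dob": "6/12/1978", "id_number": "443-43-0000"}

print(assess({"name": "Ann Lee", "phone": "(707) 433-0000"}, known))     # un-associated
print(assess({"name": "Mark Smith", "phone": "(707) 433-0000"}, known))  # near
print(assess({"name": "Mark Smith", "dob": "6/12/1978"}, known))         # connected
```

A "near" result is kept rather than discarded, so a later observation can upgrade it to "connected" or leave it apart.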
  13. Overstated Population
     - (Chart labels: Observations; Unique Identities; True Population)
  14. Counting Is Difficult
     - File 1: Mark Smith, 6/12/1978, 443-43-0000
     - File 2: Mark R Smith, (707) 433-0000, DL: 00001234
  15. The Bigger, The More Accurate, The Faster
     - (Chart labels: Observations; Unique Identities; True Population)
  16. Data Triangulation
     - File 1: Mark Smith, 6/12/1978, 443-43-0000
     - File 2: Mark R Smith, (707) 433-0000, DL: 00001234
     - New Record: Mark Randy Smith, 443-43-0000, DL: 00001234
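
A minimal sketch of the triangulation on slides 14 to 16, using the three records shown there. It assumes the unlabeled number 443-43-0000 is an identifier (called `id_number` here) and that sharing that number or the driver's license is enough to link two records; both are assumptions for illustration, and the union-find bookkeeping is a generic technique, not necessarily what the speaker's systems do.

```python
STRONG_KEYS = ("id_number", "dl")  # assumed "strong" identifiers for this sketch


def count_entities(records):
    """Count distinct entities by linking records that share a strong identifier."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    seen = {}  # (key, value) -> index of the first record carrying that value
    for idx, rec in enumerate(records):
        for key in STRONG_KEYS:
            value = rec.get(key)
            if value is None:
                continue
            if (key, value) in seen:
                parent[find(idx)] = find(seen[(key, value)])
            else:
                seen[(key, value)] = idx
    return len({find(i) for i in range(len(records))})


file_1 = {"name": "Mark Smith", "dob": "6/12/1978", "id_number": "443-43-0000"}
file_2 = {"name": "Mark R Smith", "phone": "(707) 433-0000", "dl": "00001234"}
new_record = {"name": "Mark Randy Smith", "id_number": "443-43-0000", "dl": "00001234"}

print(count_entities([file_1, file_2]))              # 2
print(count_entities([file_1, file_2, new_record]))  # 1
```

Adding a third observation lowers the entity count from two to one, which is the sense in which more data makes the count more accurate.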
  17. Big Data … pile of … Big Data … in context
  18. One Form of Context is “Expert Counting”
     - Is it 5 people each with 1 account … or is it 1 person with 5 accounts?
     - Is it 20 cases of H1N1 in 20 cities … or one case reported 20 times?
     - If one cannot count … one cannot estimate vector or velocity (direction and speed).
     - Without vector and velocity … prediction is nearly impossible.
  19. “Key Features” Enable Expert Counting
     - People: Name, Address, Date of Birth, Phone, Passport, Nationality, Biometric, etc.
     - Cars: Make, Model, Year, License Plate No., VIN, Owner, etc.
     - Router: Device ID, Make, Model, Firmware Vers., Asset ID, etc.
  20. Consider Lying Identical Twins
     - Passport: #123, Sue, 3/3/84, Uberstan, Exp 2011
     - Passport: #123, Sue, 3/3/84, Uberstan, Exp 2011
     - Fingerprint; DNA
     - Most Trusted Authority: “Same person – trust me.”
     - Most Trusted Authority
  21. The same thing cannot be in two places … at the same time. Two different things cannot occupy the same space … at the same time.
  22. Space & Time Enables Absolute Disambiguation
     - People: Name, Address, Date of Birth, Phone, Passport, Nationality, Biometric, etc., plus When and Where
     - Cars: Make, Model, Year, License Plate No., VIN, Owner, etc., plus When and Where
     - Router: Device ID, Make, Model, Firmware Vers., Asset ID, etc., plus When and Where
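
The rule on slide 21 ("the same thing cannot be in two places at the same time") can be turned into a feasibility check. The sketch below is an illustration only: the function names, the 900 km/h speed ceiling, the 50 m same-place tolerance, and the approximate Salem and Seattle coordinates are assumptions, and the distance is the standard haversine great-circle formula.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def could_be_same_entity(obs_a, obs_b, max_speed_kmh=900.0, same_place_km=0.05):
    """False when no plausible travel speed connects the two sightings."""
    distance = haversine_km(obs_a["lat"], obs_a["lon"], obs_b["lat"], obs_b["lon"])
    hours = abs((obs_b["when"] - obs_a["when"]).total_seconds()) / 3600.0
    if hours == 0:
        # Same instant: only the same place is consistent with one entity.
        return distance <= same_place_km
    return distance / hours <= max_speed_kmh


# Hypothetical sightings; coordinates are approximate.
salem = {"lat": 44.94, "lon": -123.04, "when": datetime(2011, 3, 23, 9, 0)}
seattle_5_min_later = {"lat": 47.61, "lon": -122.33, "when": datetime(2011, 3, 23, 9, 5)}
seattle_next_day = {"lat": 47.61, "lon": -122.33, "when": datetime(2011, 3, 24, 9, 0)}

print(could_be_same_entity(salem, seattle_5_min_later))  # False
print(could_be_same_entity(salem, seattle_next_day))     # True
```

A `False` result separates two records even when every other key feature matches, which is the case the lying-twins slide sets up.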
  23. “Life Arcs” Are Also Telling
     - Bill Smith, 4/13/67, Salem, Oregon
     - Bill Smith, 4/13/67, Seattle, Washington
     - Address History: Tampa, FL 2008-2008; Biloxi, MS 2005-2008; NY, NY 1996-2005; Tampa, FL 1984-1996
     - Address History: San Diego, CA 2005-2009; San Fran, CA 2005-2005; Phoenix, AZ 1990-2005; San Jose, CA 1982-1990
  24. OMG
  25. Space-Time-Travel
     - Cell phones are generating a staggering amount of geo-locational data – 600B transactions per day being created in the US alone
     - This data is being “de-identified” and shared with third parties – in volume and in real-time
     - Your movement quickly reveals where you spend your time (e.g., evenings vs. working hours) and who you spend your time with
     - Re-identification (figuring out who is who) is somewhat trivial
  26. Space-Time-Travel is Prediction Super-Food
     - Prediction with 87% certainty where you will be next Thursday at 5:35pm
     - Names of the top 10 people you co-locate with, not at home and not at work
     - The Uberstan intelligence service preempts the next mass protest in real-time
     - A political opponent is crushed and resigns two days after announcing their candidacy
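
To show why regular movement is so predictive, here is a deliberately naive sketch: it guesses the place most often seen at the same weekday and hour. The function names, place labels, and pings are invented for illustration, and nothing here reproduces how the 87% figure on slide 26 was obtained.

```python
from collections import Counter, defaultdict
from datetime import datetime


def build_habits(pings):
    """Tally how often each place was seen per (weekday, hour) slot."""
    habits = defaultdict(Counter)
    for when, place in pings:
        habits[(when.weekday(), when.hour)][place] += 1
    return habits


def predict(habits, when):
    """Return (most frequent place, share of past sightings) for that slot."""
    counts = habits.get((when.weekday(), when.hour))
    if not counts:
        return None, 0.0
    place, hits = counts.most_common(1)[0]
    return place, hits / sum(counts.values())


# Hypothetical pings on four consecutive Thursdays at 5:35pm.
pings = [
    (datetime(2011, 3, 3, 17, 35), "gym"),
    (datetime(2011, 3, 10, 17, 35), "gym"),
    (datetime(2011, 3, 17, 17, 35), "gym"),
    (datetime(2011, 3, 24, 17, 35), "office"),
]
habits = build_habits(pings)

print(predict(habits, datetime(2011, 3, 31, 17, 35)))  # ('gym', 0.75)
```

Even this frequency count yields a confident guess from four pings; the volume of data described on slide 25 would sharpen it.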
  27. Consequences
     - Space-time-travel data is the ultimate biometric
     - It will enable enormous opportunity
     - It will unravel one’s secrets
     - It will challenge existing notions of privacy
     - And, it’s here now and more to come
  28. Surveillance society is irresistible. And you are doing it. Location-based services (GPS), free email, Facebook, etc.
  29. 2 Big Data Trends
  30. Trend: Time Is Of The Essence
     - The better the predictions … the faster they will be wanted.
     - “Why did we have to wait until the end of the day for the smart answer?”
     - (Chart labels: Willingness to Wait; Relevance (Iffy) (Totally); Day; Hour; 200ms; Batch; Real-Time)
  31. Trend: Growing Tolerance for Non-Repeatability
     - It appears the market is becoming more tolerant of one-time results that cannot be easily repeated or substantiated
     - (Chart labels: Accountable and Repeatable; Facebook; Going Forward; Yesterday; Payroll; Now; Google)
  32. Trend: Be Careful What You Wish For
     - 6:34pm Recommendation: Shoot it
     - 6:35pm Action Taken: Bang. Dead
     - 6:36pm Recommendation: Oops. Send Flowers
     - (Chart labels: Accountable and Repeatable; Going Forward; Yesterday; Now)
  33. Closing Thoughts
  34. Wish This On The Adversary
     - (Chart labels: Time; Computing Power Growth; Sensemaking Algorithms; Available Observation Space; Context)
  35. Context Accumulation: The Way Forward
     - (Chart labels: Time; Computing Power Growth; Sensemaking Algorithms; Available Observation Space; Context; Context Accumulation)
  36. Related Blog Posts
     - Big Data. New Physics.
     - Algorithms At Dead-End: Cannot Squeeze Knowledge Out Of A Pixel
     - Puzzling: How Observations Are Accumulated Into Context
     - Smart Sensemaking Systems, First and Foremost, Must be Expert Counting Systems
     - Your Movements Speak for Themselves: Space-Time Travel Data is Analytic Super-Food!
     - Data Finds Data
     - General Purpose Sensemaking Systems and Information Colocation
     - Sensemaking on Streams – My G2 Skunk Works Project: Privacy by Design (PbD)
  37. Big Data. New Physics. And Why Geospatial Data is Analytic SuperFood
     - Jeff Jonas, IBM Distinguished Engineer; Chief Scientist, IBM Entity Analytics
     - [email_address]
     - March 23rd, 2011
  38. “G2” My R&D Skunk Works Project
  39. My G2 Goals
     - General purpose, real-time, sensemaking engine
     - Performs ‘information colocation’ over diverse data types, e.g., structured, unstructured, social, geospatial, queries, hypothesis, anonymized data and more
     - Exploiting the big data, new physics phenomenon
     - Delivers “data finds data, relevance finds you”
     - Engineered for grid compute for massive scalability
       - Dreaming about: 1T rows for breakfast – then sustaining 1M context accumulating observations per second
       - While new observations reverse earlier assertions
     - Privacy by Design (PbD) – a number of exciting privacy and civil liberties enhancing features baked-in, by design
  40. Big Data. New Physics. And Why Geospatial Data is Analytic SuperFood
     - Jeff Jonas, IBM Distinguished Engineer; Chief Scientist, IBM Entity Analytics
     - [email_address]
     - March 23rd, 2011