Big Data is a Hotbed of Thoughtcrime, Part II: The Code

Strata Conference
Santa Clara, CA
Feb 27, 2013
http://strataconf.com/strata2013/public/schedule/detail/27443

At Strata 2012 in New York, we discussed the hazards of curbing big data inferences by defining a new category of thoughtcrime. After all, acting on thoughts might constitute a crime, but thoughts, in isolation, cannot be criminal. It’s time to go deeper. Let’s create and evaluate a predictive criminal model that highlights where the sensitivities lie, both technically and ethically.

Over the last decade, Intelius has built a people-centric big data platform — what we call the inome platform. We’ll use it and our criminal database of several hundred million U.S. criminal records to train and evaluate a predictive criminal model. As part of this talk, we’ll release the model and some of the inome machine-learning scaffolding code.

What makes big data so scary is that, for the first time, we are leveraging huge data mines to make inferences outside the wisdom of our own minds. Is it possible to predict, with meaningful recall and acceptable precision, who might commit a crime? We’ll showcase our model’s shortcomings due to inescapable precision/recall trade-offs — false negatives miss criminals while false positives indict the innocent. And even if we could build a perfect predictor, does a powerful government have the right to use it and eclipse free will?
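
The trade-off described above can be made concrete with a small sketch. The scores and labels below are invented toy values, not output from the inome model, and `error_rates` is a hypothetical helper name; the only assumption is that the classifier emits a real-valued score and predicts "felon" when the score meets a threshold.

```python
from typing import List, Tuple


def error_rates(scores: List[float], labels: List[bool],
                threshold: float) -> Tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate) at one threshold.

    A person is predicted to be a felon when score >= threshold.
    """
    tp = fp = tn = fn = 0
    for score, is_felon in zip(scores, labels):
        predicted_felon = score >= threshold
        if predicted_felon and is_felon:
            tp += 1          # correctly identifies a felon
        elif predicted_felon and not is_felon:
            fp += 1          # flags someone who is not a felon
        elif not predicted_felon and is_felon:
            fn += 1          # misses a felon
        else:
            tn += 1          # correctly ignores a non-felon
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Toy data (an assumption for illustration only).
scores = [2.0, 1.2, 0.7, 0.3, -0.5, -1.0, -1.5, -2.0]
labels = [True, True, False, True, False, True, False, False]

for threshold in (1.0, 0.0, -1.2):
    fpr, fnr = error_rates(scores, labels, threshold)
    print(f"threshold={threshold:+.1f}  FP rate={fpr:.0%}  FN rate={fnr:.0%}")
```

Lowering the threshold catches more felons (the false negative rate falls, so recall rises) but flags more innocent people (the false positive rate rises). No threshold on this toy data drives both rates to zero at once.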


Big Data is a Hotbed of Thoughtcrime, Part II: The Code

  1. Jim Adler, VP Data Systems & Chief Privacy Officer, inome
     • @jim_adler
     • http://jimadler.me
     • inome: The Genomics of How We All Fit Together
  2. OVERTURE & 3 ACTS
     • 1. About inome
     • 2. Strata Redux
     • 3. Felon Classifier
     • 4. Closing Arguments
  3. I am not an Attorney
     • [Figure: Venn diagram of Intelligence, Obsession, and Social Ineptitude, with regions labeled Geek, Dweeb, Nerd, and Dork]
  4. ABOUT INOME
     • Real-time, person-centric data engine
     • Structured and unstructured data
     • 10 years in the making
     • Scalable: serves over 1 million visitors a day
     • APIs support 3rd-party apps: http://developer.inome.com
  5. When towns were small …
  6. INFORMATION SOCIAL GENOMICS INTERACTION
  7. inome is bringing the “local village” back
  8. HOW WE ALL FIT TOGETHER
  9. HOW INOME SOLVES THE “BIG DATA” PEOPLE PROBLEM
     • Billions of Records, Millions of People
     • 213 records mapped to the correct 37 Jim Adlers
     • [Figure: clusters of records grouped by name. Jim Adler profiles shown: McKinney, TX, age 57; Houston, TX, age 68; Hastings, NE, age 32; Canaan, NH, age 59; Redmond, WA, age 48; Denver, CO, age 48. Other names shown: Philip Collins, Randolph Hutchins, Gwen Fleming, Carol Brooks]
  10. THE INOME ENGINE
     • Inputs: Names, Places, Phones, Court Records, News/Blogs, Professional, Relatives, Friends, Colleagues
     • Data Acquisition and Data Exchange: Acquire, Standardize, Validate, Extract Features
     • Components: Full Text Search Index, Machine Learners, Clustering, Blocking, Document Store
     • APIs: http://developer.inome.com
  11. ACT 1: Strata Redux
  12. Quotations
     • “… the essential crime that contained all others in itself. Thoughtcrime, they called it.” George Orwell
     • “Watch your thoughts, they become words. Watch your words, they become actions. Watch your actions, they become habits. Watch your habits, they become your character. Watch your character, it becomes your destiny.” Lao Tzu
  13. THE PLACES-PLAYERS-PERILS PRIVACY FRAMEWORK
     • [Figure: framework diagram, labeled PRIVACY PERILS]
     • http://jimadler.me/post/14171086020/creepy-is-as-creepy-does
     • http://jimadler.me/post/18618791545/strata-2012-is-privacy-a-big-data-prison
  14. PLACES-PLAYERS-PERILS CASES
     • [Figure: cases plotted on two axes, MORE PLAYER POWER GAP and MORE PRIVATE PLACES]
     • US deports tourists over Tweets
     • Predictive Policing
     • FBI GPS surveillance
     • Google privacy policy unification
     • Target finds out teen pregnant before parents
     • PA school district spies on students with webcams
     • NYPD catches gangs bragging on Twitter
     • HR exec loses job over LinkedIn profile updates
     • Disney tracks kids without parental consent
     • Carrier IQ logging location
     • News of the World phone hacking
     • Netflix shares your movie picks
     • Woman caught naked by Google Street View
     • Actress sues IMDB over revealing her age
     • iPhone caching location
     • GM OnStar tracks users
     • Craigslist prostitution client exposure
     • Rutgers student commits suicide after spied by webcam
     • FB user sets fire to home after de-friending
  15. ACT 2: Felon Classifier
     • Contributors: Jeremy Kahn, Senior Scientist; Deepak Konidena, Software Engineer
  16. THE CLASSIFIER’S GOAL
     • If someone has minor offenses on their criminal record, do they also have any felonies?
  17. MOTIVATIONS
     • Ask the hard questions
     • Convene the suits, wonks, and geeks
     • Drive responsible innovation
     • Explore the data & showcase the technology
  18. A FEW DEFINITIONS
     • Definition
       • Positive: Has at least one felony
       • Negative: Has no felonies but does have lesser offenses
     • Classifier Performance
       • True Positive: Correctly identifies a felon
       • True Negative: Correctly ignores someone who isn’t a felon
       • False Positive: Incorrectly flags someone who isn’t a felon as one
       • False Negative: Incorrectly ignores a felon
  19. DATA EXTRACTION AND CLEANSING
     • [Figure: pipeline diagram]
     • INOME ENGINE stages: Data Acquisition, Data Exchange, Clustering, Blocking, Linking
     • Other elements shown: 250 M Defendants (avro files), State Fan-Out, Noise Filter, 40 M Defendants
  20. EXAMPLE DATA
     • Prediction Data

        key: e926f511b7f8289c64130a266c66411e
        val:
          offenses:
          - {CaseID: MDAOC206059-2, CaseInfo: 'CASE DISPO: TRIAL', CJIS CODE: '3 5010',
             Disposition: STET, Key: hyg-MDAOC206059, OffenseClass: M, OffenseCount: 2,
             OffenseDate: 20041205, OffenseDesc: 'THEFT:LESS $500 VALUE'}
          - {CaseID: MDAOC206060-1, CaseInfo: 'CASE DISPO: TRIAL', CJIS CODE: '1 4803',
             Disposition: GUILTY, Key: hyg-MDAOC206060, OffenseClass: M, OffenseCount: 1,
             OffenseDate: 20040928, OffenseDesc: FALSE STATEMENT TO OFFICER}
          profile: {BodyMarks: 'TAT L ARM; ,TAT L SHLD: N/A; ,TAT R ARM: N/A; ,TAT R SHLD: N/A; ,TAT RF ARM; ,TAT UL ARM; ,TAT UR AR',
            DOB: 19711206, DOB.Completeness: 111, EyeColor: HAZEL, Gender: m,
            HairColor: BROWN, Height: '58"', SkinColor: FAIR,
            State: 'DE,MD,MD,MD,MD,MD,MD,MD,MD,MD,MD,MD,MD', Weight: 180 LBS}

     • Training Labels

        key: e926f511b7f8289c64130a266c66411e
        val:
          label: true
          offenses:
          - {CaseID: MDAOC206065-4, CaseInfo: 'CASE DISPO: TRIAL', CJIS CODE: '1 6501',
             Disposition: NOLLE PROSEQUI, Key: hyg-MDAOC206065, OffenseClass: F,
             OffenseCount: 1, OffenseDesc: ARSON 2ND DEGREE}
  21. MODEL TRAINING AND MODEL OPERATION
     • [Figure: two data-flow diagrams]
     • Model Training: Prediction Data (INOME person profile information, non-felony offense information) → Features; Training Labels (felony offense information); Features and Labels → Learn → Model
     • Model Operation: Prediction Data (INOME person profile information, non-felony offense information) → Model → Has any felonies?
  22. MODEL FEATURES
     • Personal Profile: Person.NumBodyMarks, Person.HasTattoo, Person.IsMale, Person.HairColor, Person.EyeColor, Person.SkinColor
     • Criminal Profile: Offenses.NumOffenses, Offenses.OnlyTraffic
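  23. EXAMPLE FEATURE

        # Extractor is the feature-extractor base class from the inome code
        # (not shown on the slide).
        class EyeColor(Extractor):
            # Map abbreviated eye-color codes to full color names.
            normalizer = {
                'bro': 'brown', 'blu': 'blue', 'blk': 'black', 'hzl': 'hazel',
                'haz': 'hazel', 'grn': 'green'}
            # Avro-style enum schema listing the allowed output values.
            schema = {'type': 'enum', 'name': 'EyeColors',
                      'symbols': ('black', 'brown', 'hazel', 'blue',
                                  'green', 'other', 'unknown')}

            def extract(self, record):
                recorded = record['profile'].get('EyeColor', None)
                if recorded is None:
                    return 'unknown'
                recorded = recorded.lower()
                if recorded in self.normalizer:
                    recorded = self.normalizer[recorded]
                # Collapse values that merely start with a known symbol.
                for i in self.schema['symbols']:
                    if recorded.startswith(i):
                        recorded = i
                if recorded in self.schema['symbols']:
                    return recorded
                else:
                    return 'other'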
  24. THE CODE
     • Gasket: an inome functional toolset for data extraction
       • Avro, Json, and Yaml
     • Gemini: an inome framework for feature extraction and learning
       • Domain knowledge feature extractors
       • Model construction from features and labels
     • Felon detector available now: http://github.com/inome/strataconf-2013-sc
  25. FELON CLASSIFIER PERFORMANCE
     • [Figure: False Negative Rate (y-axis, 0% to 100%, labeled ANARCHY) plotted against False Positive Rate (x-axis, 0% to 20%, labeled TYRANNY)]
     • Threshold 1.01: FP Rate 1%, FN Rate 40%
     • Threshold 0.66: FP Rate 5%, FN Rate 22%
     • Threshold -1.82: FP Rate 19%, FN Rate 0%
  26. ALTERNATING DECISION TREE
  27. ACT 3: Closing Arguments
  28. PLACES-PLAYERS-PERILS CASES (government players)
     • [Figure: cases plotted on two axes, MORE PLAYER POWER GAP and MORE PRIVATE PLACES]
     • US deports tourists over Tweets
     • Predictive Policing
     • FBI GPS surveillance
     • PA school district spies on students with webcams
     • NYPD catches gangs bragging on Twitter
     • HR exec loses job over LinkedIn profile updates
     • Public data used by powerful government players resulting in perilous consequences like stop, seizure, arrest, and imprisonment
  29. FROM INFERENCES TO ACTIONS
     • Fourth Amendment checks gov’t abuses
     • Principles of reasonable suspicion
     • Geographic Profiling
     • Criminal Profiling
     • References
       • Predictive Policing, Andrew Guthrie Ferguson, U of District of Columbia Law, http://ssrn.com/abstract_id=2050001
       • Rethinking Racial Profiling, Bernard Harcourt, U Chicago Law, http://www.law.uchicago.edu/files/files/rethinking_racial_profiling.pdf
       • Looking at Prediction from an Economics Perspective, Yoram Margalioth, http://bernardharcourt.com/documents/margalioth-againstprediction.pdf
  30. REASONABLE SUSPICION
     • Courts have upheld profiling
     • Predictive information never enough
       • 1. Reliable
       • 2. Efficient
       • 3. Particularized
       • 4. Detailed
       • 5. Timely
       • 6. Corroborated
  31. GEOGRAPHIC PROFILING
     • “Very soon, we will be moving to a predictive policing model where, by studying real time crime patterns, we can anticipate where a crime is likely to occur.” Chief William Bratton, Los Angeles Police, Testimony to US House, September 24, 2009
     • predpol.com
     • Profile identifies higher crime area
       • Small area, 500 sq ft to avoid profiling neighborhoods
     • Must be corroborated by witnessed criminal activity
     • What about police “stops” outside the profiled area?
  32. CRIMINAL PROFILING
     • “Computerized” tips and profiles
       • Predicting crime for specific individuals
       • Courts have held that profiling is a reasonable factor
     • Violates punishment theory of equal chances of getting caught
     • Ratcheting creates a closed loop of confusion
     • Self-fulfilling prophecy by controlling profile
  33. SUMMARY
     • Big data inferences are thought, not crime
     • Speech and action could be criminal… So think carefully
     • Check us out
       • Classifier available on http://github.com/inome
       • APIs for exploring people data at http://developer.inome.com
  34. Jim Adler, VP Data Systems & Chief Privacy Officer, inome
     • @jim_adler
     • http://jimadler.me
     • It’s in inome
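
As a rough illustration of how the feature names on slide 22 could be derived from a record shaped like the example on slide 20, here is a minimal sketch. It is not the Gemini implementation: the function names, the body-mark splitting rule, the "TAT" prefix test, and the "TRAFFIC" keyword test are all assumptions made for this example.

```python
from typing import Any, Dict, List


def split_body_marks(raw: str) -> List[str]:
    """Split a raw BodyMarks string into individual mark codes.

    Assumption: marks are separated by ';' or ',', and any text after ':'
    (for example ': N/A') is an annotation rather than part of the code.
    """
    marks = []
    for piece in raw.replace(";", ",").split(","):
        code = piece.split(":")[0].strip()
        if code:
            marks.append(code)
    return marks


def extract_features(record: Dict[str, Any]) -> Dict[str, Any]:
    """Compute a few slide-22 style features (hypothetical rules)."""
    profile = record.get("profile", {})
    offenses = record.get("offenses", [])
    marks = split_body_marks(profile.get("BodyMarks", ""))
    return {
        "Person.NumBodyMarks": len(marks),
        # Assumption: tattoo codes begin with "TAT".
        "Person.HasTattoo": any(m.upper().startswith("TAT") for m in marks),
        "Person.IsMale": profile.get("Gender", "").lower() == "m",
        # Assumption: OffenseCount is the number of counts per case entry.
        "Offenses.NumOffenses": sum(int(o.get("OffenseCount", 1)) for o in offenses),
        # Assumption: a traffic offense mentions "TRAFFIC" in its description.
        "Offenses.OnlyTraffic": bool(offenses) and all(
            "TRAFFIC" in o.get("OffenseDesc", "").upper() for o in offenses),
    }


# A trimmed-down record in the shape of the slide-20 example.
record = {
    "offenses": [
        {"OffenseClass": "M", "OffenseCount": 2, "OffenseDesc": "THEFT:LESS $500 VALUE"},
        {"OffenseClass": "M", "OffenseCount": 1, "OffenseDesc": "FALSE STATEMENT TO OFFICER"},
    ],
    "profile": {"BodyMarks": "TAT L ARM; ,TAT L SHLD: N/A; ,TAT R ARM: N/A", "Gender": "m"},
}

features = extract_features(record)
print(features)
```

Each feature is a small, independent function of the record, which mirrors the one-extractor-per-feature style of the `EyeColor` example on slide 23.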
