Big Data is a Hotbed of Thoughtcrime, Part II: The Code

Strata Conference
Santa Clara, CA
Feb 27, 2013

At Strata 2012 in New York, we discussed the hazards of curbing big data inferences by defining a new category of thoughtcrime. After all, acting on thoughts might constitute a crime, but thoughts, in isolation, cannot be criminal. It’s time to go deeper. Let’s create and evaluate a predictive criminal model that highlights where the sensitivities lie, both technically and ethically.

Over the last decade, Intelius has built a people-centric big data platform — what we call the inome platform. We’ll use it and our criminal database of several hundred million U.S. criminal records to train and evaluate a predictive criminal model. As part of this talk, we’ll release the model and some of the inome machine-learning scaffolding code.

What makes big data so scary is that, for the first time, we are leveraging huge data mines to make inferences outside the wisdom of our own minds. Is it possible to predict, with meaningful recall and acceptable precision, who might commit a crime? We’ll showcase our model’s shortcomings due to inescapable precision/recall trade-offs — false negatives miss criminals while false positives indict the innocent. And even if we could build a perfect predictor, does a powerful government have the right to use it and eclipse free will?



Usage Rights

© All Rights Reserved


    Presentation Transcript

    • Jim Adler, VP Data Systems & Chief Privacy Officer, inome | @jim_adler | http://jimadler.me | inome: The Genomics of How We All Fit Together
    • OVERTURE & 3 ACTS: 1. About inome 2. Strata Redux 3. Felon Classifier 4. Closing Arguments
    • I am not an Attorney [Venn diagram: circles labeled Intelligence, Obsession, and Social Ineptitude; regions labeled Geek, Dweeb, Nerd, and Dork]
    • ABOUT INOME
        • Real-time, person-centric data engine
        • Structured and unstructured data
        • 10 years in the making
        • Scalable: serves over 1 million visitors a day
        • APIs support 3rd party apps (http://developer.inome.com)
    • When towns were small …
    • inome is bringing the “local village” back
    • HOW INOME SOLVES THE “BIG DATA” PEOPLE PROBLEM: Billions of Records, Millions of People. 213 records mapped to the correct 37 Jim Adlers. [Diagram: name clusters (Philip Collins, Randolph Hutchins, Gwen Fleming, Carol Brooks, Jim Adler) annotated with record and people counts; the Jim Adler cluster (213 records, 37 people) fans out to individuals aged 32 to 68 in McKinney, TX; Houston, TX; Hastings, NE; Canaan, NH; Redmond, WA; and Denver, CO]
    • THE INOME ENGINE [Diagram: data sources (Names, Places, Phones, Court Records, News/Blogs, Professional, Relatives, Friends, Colleagues) enter through Data Acquisition and Data Exchange; processing steps are Acquire, Standardize, Validate, and Extract Features; components are Full Text Search Index, Machine Learners, Clustering, Blocking, Document Store, and APIs (http://developer.inome.com)]
    • ACT 1: Strata Redux
    • “… the essential crime that contained all others in itself. Thoughtcrime, they called it.” George Orwell
      “Watch your thoughts, they become words. Watch your words, they become actions. Watch your actions, they become habits. Watch your habits, they become your character. Watch your character, it becomes your destiny.” Lao Tzu
    • THE PLACES-PLAYERS-PERILS PRIVACY FRAMEWORK [Diagram: privacy perils] http://jimadler.me/post/14171086020/creepy-is-as-creepy-does http://jimadler.me/post/18618791545/strata-2012-is-privacy-a-big-data-prison
    • PLACES-PLAYERS-PERILS CASES [Chart: cases plotted on two axes, “More Player Power Gap” and “More Private Places”]
        • US deports tourists over Tweets
        • Predictive Policing
        • FBI GPS surveillance
        • Google privacy policy unification
        • Target finds out teen pregnant before parents
        • PA school district spies on students with webcams
        • NYPD catches gangs bragging on Twitter
        • HR exec loses job over LinkedIn profile updates
        • Disney tracks kids without parental consent
        • Carrier IQ logging location
        • News of the World phone hacking
        • Netflix shares your movie picks
        • Woman caught naked by Google Street View
        • Actress sues IMDB over revealing her age
        • iPhone caching location
        • GM OnStar tracks users
        • Craigslist prostitution client exposure
        • Rutgers student commits suicide after spied by webcam
        • FB user sets fire to home after de-friending
    • ACT 2: Felon Classifier. Contributors: Jeremy Kahn, Senior Scientist; Deepak Konidena, Software Engineer
    • THE CLASSIFIER’S GOAL: If someone has minor offenses on their criminal record, do they also have any felonies?
    • MOTIVATIONS
        • Ask the hard questions
        • Convene the suits, wonks, and geeks
        • Drive responsible innovation
        • Explore the data & showcase the technology
    • A FEW DEFINITIONS
        • Definition
            • Positive: has at least one felony
            • Negative: has no felonies but does have lesser offenses
        • Classifier Performance
            • True Positive: correctly identifies a felon
            • True Negative: correctly ignores someone who isn’t a felon
            • False Positive: incorrectly flags someone who isn’t a felon
            • False Negative: incorrectly ignores a felon
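The definitions above translate directly into counts and error rates. What follows is a minimal sketch, assuming the true labels and the classifier's decisions are available as parallel lists of booleans (True meaning "has at least one felony"). The function names are illustrative and are not taken from the inome code.

```python
def confusion_counts(actual, predicted):
    """Tally TP, TN, FP, FN for parallel lists of booleans (True = felon)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for is_felon, flagged in zip(actual, predicted):
        if is_felon and flagged:
            counts["TP"] += 1   # correctly identifies a felon
        elif not is_felon and not flagged:
            counts["TN"] += 1   # correctly ignores a non-felon
        elif flagged:
            counts["FP"] += 1   # flags someone who isn't a felon
        else:
            counts["FN"] += 1   # ignores a felon
    return counts


def error_rates(counts):
    """Return (false positive rate, false negative rate) from the tallies."""
    negatives = counts["FP"] + counts["TN"]
    positives = counts["FN"] + counts["TP"]
    fp_rate = counts["FP"] / negatives if negatives else 0.0
    fn_rate = counts["FN"] / positives if positives else 0.0
    return fp_rate, fn_rate
```

The false positive rate is the share of non-felons who get flagged, and the false negative rate is the share of felons who are missed. These are the two quantities plotted against each other on the performance slide.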
    • DATA EXTRACTION AND CLEANSING [Diagram: the inome engine (Data Acquisition, Data Exchange, Clustering, Blocking, Linking) alongside the extraction stages shown on the slide: 250 M defendants, 40 M defendants, State Fan-Out, Noise Filter, avro files]
    • EXAMPLE DATA
        Prediction Data
          key: e926f511b7f8289c64130a266c66411e
          val:
            offenses:
              - {CaseID: MDAOC206059-2, CaseInfo: CASE DISPO: TRIAL, CJIS CODE: 3 5010,
                 Disposition: STET, Key: hyg-MDAOC206059, OffenseClass: M, OffenseCount: 2,
                 OffenseDate: 20041205, OffenseDesc: THEFT:LESS $500 VALUE}
              - {CaseID: MDAOC206060-1, CaseInfo: CASE DISPO: TRIAL, CJIS CODE: 1 4803,
                 Disposition: GUILTY, Key: hyg-MDAOC206060, OffenseClass: M, OffenseCount: 1,
                 OffenseDate: 20040928, OffenseDesc: FALSE STATEMENT TO OFFICER}
            profile: {BodyMarks: TAT L ARM; ,TAT L SHLD: N/A; ,TAT R ARM: N/A; ,TAT R SHLD: N/A; ,TAT RF ARM; ,TAT UL ARM; ,TAT UR AR,
                      DOB: 19711206, DOB.Completeness: 111, EyeColor: HAZEL, Gender: m,
                      HairColor: BROWN, Height: 58", SkinColor: FAIR,
                      State: DE,MD,MD,MD,MD,MD,MD,MD,MD,MD,MD,MD,MD, Weight: 180 LBS}
        Training Labels
          key: e926f511b7f8289c64130a266c66411e
          val:
            label: true
            offenses:
              - {CaseID: MDAOC206065-4, CaseInfo: CASE DISPO: TRIAL, CJIS CODE: 1 6501,
                 Disposition: NOLLE PROSEQUI, Key: hyg-MDAOC206065, OffenseClass: F,
                 OffenseCount: 1, OffenseDesc: ARSON 2ND DEGREE}
    • [Diagram: two flows]
        • Model Training: inome person profile information and non-felony offense information form the prediction data; features are extracted from it; felony offense information supplies the training labels; the learner turns features and labels into the model
        • Model Operation: inome person profile information and non-felony offense information form the prediction data for a person; the model answers “Has any felonies?”
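The separation between prediction data and training labels can be sketched as a single split over a person's offenses. This is a minimal sketch, not the inome implementation. It assumes each offense is a dict with an `OffenseClass` field in which `F` marks a felony, as the example data suggests. The function name `split_record` is hypothetical.

```python
def split_record(profile, offenses):
    """Split one person's record into model input and training label.

    Assumption: OffenseClass == "F" marks a felony; anything else counts
    as a lesser offense that the model is allowed to see.
    """
    felonies = [o for o in offenses if o.get("OffenseClass") == "F"]
    lesser = [o for o in offenses if o.get("OffenseClass") != "F"]

    # The model only ever sees the profile and the non-felony offenses.
    prediction_data = {"profile": profile, "offenses": lesser}
    # The label (and the felonies behind it) is held back for training.
    training_label = {"label": bool(felonies), "offenses": felonies}
    return prediction_data, training_label
```

Keeping felony offenses out of the prediction data matters: if they leaked into the features, the model would simply read the answer instead of predicting it.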
    • MODEL FEATURES
        • Personal Profile: Person.NumBodyMarks, Person.HasTattoo, Person.IsMale, Person.HairColor, Person.EyeColor, Person.SkinColor
        • Criminal Profile: Offenses.NumOffenses, Offenses.OnlyTraffic
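A few of the features named above could be computed along the following lines. This is a sketch under stated assumptions, not the Gemini code. The `Extractor` base class here is a hypothetical stand-in. Treating any `TAT` entry in `BodyMarks` as a tattoo, and counting offense entries rather than summing `OffenseCount`, are guesses based on the example record. `Offenses.OnlyTraffic` is left out because the example data does not show how traffic offenses are marked.

```python
class Extractor:
    """Hypothetical stand-in for the framework's extractor base class."""

    def extract(self, record):
        raise NotImplementedError


class IsMale(Extractor):
    def extract(self, record):
        # Assumption: a missing gender is treated as "not male".
        gender = record["profile"].get("Gender") or ""
        return gender.strip().lower() == "m"


class HasTattoo(Extractor):
    def extract(self, record):
        # Assumption: tattoo entries in BodyMarks carry the token "TAT".
        marks = record["profile"].get("BodyMarks") or ""
        return "TAT" in marks.upper()


class NumOffenses(Extractor):
    def extract(self, record):
        # Assumption: count offense entries, not the OffenseCount field.
        return len(record.get("offenses", []))


def extract_features(record, extractors):
    """Apply each named extractor to one record and collect the results."""
    return {name: ex.extract(record) for name, ex in extractors.items()}


EXTRACTORS = {
    "Person.IsMale": IsMale(),
    "Person.HasTattoo": HasTattoo(),
    "Offenses.NumOffenses": NumOffenses(),
}
```

Calling `extract_features(record, EXTRACTORS)` on a record shaped like the example data yields one flat dict of feature values per person, which is the form a learner would consume.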
    • EXAMPLE FEATURE

          class EyeColor(Extractor):
              # Map common abbreviations to canonical color names.
              normalizer = {'bro': 'brown', 'blu': 'blue', 'blk': 'black',
                            'hzl': 'hazel', 'haz': 'hazel', 'grn': 'green'}
              # Avro-style enum schema for the extracted value.
              schema = {'type': 'enum', 'name': 'EyeColors',
                        'symbols': ('black', 'brown', 'hazel', 'blue',
                                    'green', 'other', 'unknown')}

              def extract(self, record):
                  recorded = record['profile'].get('EyeColor', None)
                  if recorded is None:
                      return 'unknown'
                  recorded = recorded.lower()
                  if recorded in self.normalizer:
                      recorded = self.normalizer[recorded]
                  # Collapse longer descriptions onto the symbol they start with.
                  for i in self.schema['symbols']:
                      if recorded.startswith(i):
                          recorded = i
                  if recorded in self.schema['symbols']:
                      return recorded
                  else:
                      return 'other'
    • THE CODE
        • Gasket: an inome functional toolset for data extraction
            • Avro, JSON, and YAML
        • Gemini: an inome framework for feature extraction and learning
            • Domain knowledge feature extractors
            • Model construction from features and labels
        • Felon detector available now: http://github.com/inome/strataconf-2013-sc
    • FELON CLASSIFIER PERFORMANCE [Plot: False Negative Rate (0% to 100%) against False Positive Rate (0% to 20%); the high false negative end is labeled ANARCHY and the high false positive end is labeled TYRANNY]
        • Threshold 1.01: FP rate 1%, FN rate 40%
        • Threshold 0.66: FP rate 5%, FN rate 22%
        • Threshold -1.82: FP rate 19%, FN rate 0%
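The trade-off on this slide comes from sweeping a decision threshold over the model's scores. The sketch below assumes a person is flagged as a felon when their score is at or above the threshold, which matches the direction of the slide's numbers (a higher threshold gives fewer false positives and more false negatives). The function name is illustrative, not from the released code.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (FP rate, FN rate) when flagging every score >= threshold.

    scores: model outputs, one per person.
    labels: True if the person actually has a felony.
    """
    pairs = list(zip(scores, labels))
    false_pos = sum(1 for s, y in pairs if s >= threshold and not y)
    false_neg = sum(1 for s, y in pairs if s < threshold and y)
    negatives = sum(1 for _, y in pairs if not y)
    positives = sum(1 for _, y in pairs if y)
    fp_rate = false_pos / negatives if negatives else 0.0
    fn_rate = false_neg / positives if positives else 0.0
    return fp_rate, fn_rate


def sweep(scores, labels, thresholds):
    """Evaluate several thresholds to trace out the trade-off curve."""
    return {t: rates_at_threshold(scores, labels, t) for t in thresholds}
```

Running `sweep` over the three thresholds on the slide with a real score file would reproduce its three operating points. No single threshold drives both error rates to zero, so choosing one is a policy decision rather than a purely technical one.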
    • ACT 3: Closing Arguments
    • [Chart: government cases plotted on two axes, “More Player Power Gap” and “More Private Places”]
        • US deports tourists over Tweets
        • Predictive Policing
        • FBI GPS surveillance
        • PA school district spies on students with webcams
        • NYPD catches gangs bragging on Twitter
        • HR exec loses job over LinkedIn profile updates
      Public data used by powerful government players resulting in perilous consequences like stop, seizure, arrest, and imprisonment
    • FROM INFERENCES TO ACTIONS
        • Fourth Amendment checks gov’t abuses
        • Principles of reasonable suspicion
        • Geographic Profiling
        • Criminal Profiling
        • References
            • Predictive Policing, Andrew Guthrie Ferguson, U of District of Columbia Law, http://ssrn.com/abstract_id=2050001
            • Rethinking Racial Profiling, Bernard Harcourt, U Chicago Law, http://www.law.uchicago.edu/files/files/rethinking_racial_profiling.pdf
            • Looking at Prediction from an Economics Perspective, Yoram Margalioth, http://bernardharcourt.com/documents/margalioth-againstprediction.pdf
    • REASONABLE SUSPICION
        • Courts have upheld profiling
        • Predictive information never enough: 1. Reliable 2. Efficient 3. Particularized 4. Detailed 5. Timely 6. Corroborated
    • GEOGRAPHIC PROFILING
      “Very soon, we will be moving to a predictive policing model where, by studying real time crime patterns, we can anticipate where a crime is likely to occur.” Chief William Bratton, Los Angeles Police, Testimony to US House, September 24, 2009
      predpol.com
        • Profile identifies higher crime area
            • Small area, 500 sq ft to avoid profiling neighborhoods
        • Must be corroborated by witnessed criminal activity
        • What about police “stops” outside the profiled area?
    • CRIMINAL PROFILING
        • “Computerized” tips and profiles
            • Predicting crime for specific individuals
            • Courts have held that profiling is a reasonable factor
        • Violates punishment theory of equal chances of getting caught
        • Ratcheting creates a closed loop of confusion
        • Self-fulfilling prophecy by controlling profile
    • SUMMARY
        • Big data inferences are thought, not crime
        • Speech and action could be criminal… So think carefully
        • Check us out
            • Classifier available on http://github.com/inome
            • APIs for exploring people data at http://developer.inome.com
    • Jim Adler, VP Data Systems & Chief Privacy Officer, inome | @jim_adler | http://jimadler.me | It’s in inome