Data ethics and machine learning: discrimination, algorithmic bias, and how to discover them. Dino Pedreschi

Machine learning and data mining algorithms construct predictive models and decision making systems based on big data. Big data are the digital traces of human activities - opinions, preferences, movements, lifestyles, ... - hence they reflect all human biases and prejudices. Therefore, the models learnt from big data may inherit all such biases, leading to discriminatory decisions. In my talk, I discuss many real examples, from crime prediction to credit scoring to image recognition, and how we can tackle the problem of discovering discrimination using the very same approach: data mining.

Published in: Data & Analytics

  1. Data ethics and machine learning. Discrimination, algorithmic bias, and how to discover them. Dino Pedreschi, KDDLab, Dipartimento di Informatica, Università di Pisa
  2. Opportunities of big data
  3. Spot business trends; prevent diseases; fight crime; improve transportation; personalised services; improve wellbeing
  4. Event Detection: detecting events in a geographic area and classifying the different kinds of users. City of Rome, metropolitan area. Covered geographical region: city of Rome. Dataset size per snapshot: ≈ 1.2 GB per day. Number of records: ≈ 5.6 million lines per day. Period: 8 months between 2015 and 2016.
  5. San Pietro
  6. San Giovanni
  7. Circo Massimo
  8. Stadio Olimpico
  9. End users: Traveler, Mobility Manager, City
  10. Personal mobility assistant; Carpooling Network
  11. Estimating wellbeing with mobility data (AI and Big Data 13) [figure labels: A, B, C, H, W]
  12. Predicting GDP with Retail Market data. Generic utility function (rationality); personal utility function (diversity). Product Price, Quantity Needed, Sophistication: R² = 17.25%, R² = 32.38%, R² = 85.72%
  13. Risks of big data
  14. Big Data, Big Risks. Big data is algorithmic, therefore it cannot be biased! And yet… • All traditional evils of social discrimination, and many new ones, exhibit themselves in the big data ecosystem • Because of its tremendous power, massive data analysis must be used responsibly • Technology alone won’t do: we also need policy, user involvement and education efforts
  15. By 2018, 50% of business ethics violations will occur through improper use of big data analytics [source: Gartner, 2016]
  16. (no slide text)
  17. (no slide text)
  18. The danger of black boxes - 1. The COMPAS score (Correctional Offender Management Profiling for Alternative Sanctions): a 137-question questionnaire and a predictive model for “risk of crime recidivism.” The model is a proprietary secret of Northpointe, Inc. The data journalists at propublica.org have shown that • the prediction accuracy of recidivism is rather low (around 60%) • the model has a strong ethnic bias ◦ blacks who did not reoffend were classified as high risk twice as often as whites who did not reoffend ◦ whites who did reoffend were classified as low risk twice as often as blacks who did reoffend.
  19. The danger of black boxes - 2. The three major US credit bureaus, Experian, TransUnion, and Equifax, which provide credit scoring for millions of individuals, are often discordant. In a study of 500,000 records, 29% of consumers received credit scores that differed by at least fifty points between credit bureaus, a difference that may mean tens of thousands of dollars over the life of a mortgage [CRS+16].
  20. The danger of black boxes - 3. In 2010, some homeowners with a regular mortgage payment history reported a sudden drop of forty points in their credit score, soon after their own enquiry.
  21. The danger of black boxes - 4. During the 1970s and 1980s, St. George’s Hospital Medical School in London used a computer program for initial screening of job applicants. The program used information from applicants’ forms, which contained no reference to ethnicity. The program was found to unfairly discriminate against female applicants and ethnic minorities (inferred from surnames and place of birth), who were less likely to be selected for interview [LM88].
  22. The danger of black boxes - 5. In a recent paper at SIGKDD 2016 [RSG16] the authors show how an accurate but untrustworthy classifier may result from an accidental bias in the training data. In a task of discriminating wolves from huskies in a dataset of images, the resulting deep learning model is shown to classify a wolf in a picture based solely on …
  23. The danger of black boxes - 5. In a recent paper at SIGKDD 2016 [RSG16] the authors show how an accurate but untrustworthy classifier may result from an accidental bias in the training data. In a task of discriminating wolves from huskies in a dataset of images, the resulting deep learning model is shown to classify a wolf in a picture based solely on … the presence of snow in the background! [RSG16] “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. SIGKDD 2016 conference paper.
  24. Deep learning is creating computer systems we don't fully understand. www.theverge.com/2016/7/12/12158238/first-click-deep-learning-algorithmic-black-boxes
  25. Is AI Permanently Inscrutable? nautil.us/issue/40/learning/is-artificial-intelligence-permanently-inscrutable
  26. The danger of black boxes - 6. In a recent study at Princeton University, the authors show how the semantics derived automatically from large text/web corpora contains human biases ◦ E.g., names associated with whites were found to be significantly easier to associate with pleasant than unpleasant terms, compared to names associated with black people. Therefore, any machine learning model trained on text data for, e.g., sentiment or opinion mining has a strong chance of inheriting the prejudices reflected in the human-produced training data.
  27. Human Bias
  28. Human Bias can be Learned - 7
  29. As we stated in our 2008 SIGKDD paper that started the field of discrimination-aware data mining [PRT08]: “learning from historical data recording human decision making may mean to discover traditional prejudices that are endemic in reality, and to assign to such practices the status of general rules, maybe unconsciously, as these rules can be deeply hidden within the learned classifier.”
  30. Policies: Big Data Ethics
  31. Satya Nadella's rules for AI. www.theverge.com/2016/6/29/12057516/satya-nadella-ai-robot-laws
  32. U.S. – F.T.C. (Salvatore Ruggieri 34) www.ftc.gov/system/files/documents/reports/big-data-tool-inclusion-or-exclusion-understanding-issues/160106big-data-rpt.pdf (Sept. 2014)
  33. U.S. – White House. www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_may_1_2014.pdf (May 2014)
  34. U.S. – White House. www.whitehouse.gov/sites/default/files/microsites/ostp/2016_0504_data_discrimination.pdf (May 2016)
  35. U.S. – White House. www.whitehouse.gov/sites/default/files/whitehouse_files/microsites/ostp/NSTC/preparing_for_the_future_of_ai.pdf (October 2016)
  36. E.U. - EDPS. secure.edps.europa.eu/EDPSWEB/webdav/site/mySite/shared/Documents/Consultation/Opinions/2015/15-11-19_Big_Data_EN.pdf
  37. E.U. - EDPS. secure.edps.europa.eu/EDPSWEB/webdav/site/mySite/shared/Documents/Consultation/Opinions/2015/15-09-11_Data_Ethics_EN.pdf
  38. Netherlands. www.knaw.nl/en/news/publications/ethical-and-legal-aspects-of-informatics-research (September 2016)
  39. Big Data Ethics. informationaccountability.org/big-data-ethics-initiative/
  40. Value-Sensitive Design: design for privacy; design for security; design for inclusion; design for sustainability; design for democracy; design for safety; design for transparency; design for accountability; design for human capabilities
  41. EU Projects: SoBigData.eu. Social Mining & Big Data Ecosystem project (SoBigData, H2020-INFRAIA-2014-2015, duration: 2015-2019, www.sobigdata.eu)
  42. Second-level University Master's (Master Universitario di II Livello): BigData Technology; BigData Sensing & Procurement; BigData Mining; BigData StoryTelling; BigData Ethics. The Big Data Master's aims to train “data scientists”: professionals with a mix of multidisciplinary skills that enable them not only to acquire data and extract knowledge from it, but also to tell “stories” through these data in support of decision making, creativity, and the development of innovative services, and to manage the ethical and legal repercussions of Big Data, which often contain personal information and raise issues of privacy, transparency, and awareness. Areas of socio-economic innovation: BigData for Social Good; BigData for Business. Big Data Analytics & Social Mining. SoBigData. Data Ethics Literacy. MIUR report on Big Data, 28 July 2016 ◦ www.istruzione.it/allegati/2016/bigdata.pdf. UNIPI Master in Big Data Analytics & Social Mining ◦ masterbigdata.it
  43. Data ethics technologies: discrimination discovery from data
  44. (no slide text)
  45. Discrimination discovery. Given: ◦ a historical database of decision records, each describing features of an applicant for a benefit (e.g., a credit request to a bank and the corresponding credit approval/denial) ◦ some designated categories of applicants, such as groups protected by anti-discrimination laws, find whether, and in which circumstances, evidence of discrimination against the designated categories emerges from the data. (DCUBE: Discrimination Discovery in Databases 47)
  46. German Credit dataset
  47. How? Fight with the same weapons. Idea: use data mining to discover discrimination ◦ the decision policies hidden in a database can be represented by decision rules and discovered by frequent pattern mining ◦ once all such decision rules have been found, highlight all potential niches of discrimination by filtering the rules using a measure that quantifies the discrimination risk.
  48. Discrimination discovery from data. FOREIGN_WORKER=yes & PURPOSE=new_car & HOUSING=own → CREDIT=bad ◦ elift = 5.19, supp = 56, conf = 0.37. elift = 5.19 means that foreign workers are more than 5 times as likely to be refused credit as the average population (even if they own their house).
  49. Case Study: grant evaluation. Outcome: • Funded • Not funded • Conditionally funded
  50. Dataset attributes: features of the PI; project costs; research area; project evaluation
  51. A potentially discriminatory rule. Antecedent ◦ Project proposals in “Physical and Analytical Chemical Sciences” ◦ Young females ◦ Total cost of 1,358,000 Euros or above. Possible interpretation ◦ “Peer-reviewers of panel PE4 trusted young females requiring high budgets less than males leading similar projects”
  52. Case study: US Harmonized Tariff System. US Harmonized Tariff System (HTS), https://hts.usitc.gov/: detailed tariff classification system for merchandise imported to the US. Chapters 61, 62, 64, 65: apparel ◦ Different taxes for the same garments separately produced for male and female ◦ Descriptions are in semi-structured form. Different taxes for the same apparel for men and women: women and girls: coats 64.4¢/kg + 18.8%, fur felt hats 96¢/doz + 1.4%, cotton pajamas 8.5%; men and boys: coats 38.6¢/kg + 10%, fur felt hats 0, cotton pajamas 8.9%. Women: 14%; men: 9%; 1.3 billion USD!
  53. Totes-Isotoner Corp. v. U.S.; Rack Room Shoes Inc. and Forever 21 Inc. v. U.S. Court of International Trade; U.S. Court of Appeals for the Federal Circuit (2014). “[…] the courts may have concluded that Congress had no discriminatory intent when ruling the HTS, but there is little doubt that gender-based tariffs have discriminatory impact”
  54. Sample rule from the HTS dataset
  55. Soccer Player Ratings
  56. Soccer Player Ratings: how do humans evaluate sports performance?
  57. [figure labels: Human evaluation line; Technical features; Machine performance]
  58. [figure labels: Human evaluation line; Technical features; Technical+Contextual features; Machine performance]
  59. Wrapping up
  60. Right of explanation • Applying AI within many domains requires transparency and responsibility: • health care • finance • surveillance • autonomous vehicles • government • The EU General Data Protection Regulation (April 2016) establishes (?) a right of explanation for all individuals to obtain “meaningful explanations of the logic involved” when automated (algorithmic) individual decision-making, including profiling, takes place. • In sharp contrast, (big) data-driven AI/ML models are often black boxes.
  61. Accountability. “Why exactly was my loan application rejected?” “What could I have done differently so that my application would not have been rejected?”
  62. Social Mining & Big Data Ecosystem. www.sobigdata.eu
  63. Knowledge Discovery & Data Mining Lab. http://kdd.isti.cnr.it
  64. Special thanks • Salvatore Ruggieri • Franco Turini • Fosca Giannotti • Anna Monreale • Luca Pappalardo. SMARTCATs
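
The rule-based discovery idea on slides 45 to 48 can be sketched in a few lines. This is an illustration under stated assumptions, not the DCUBE implementation: it takes the extended lift of a rule A, B → C to be conf(A, B → C) / conf(B → C), as defined in [PRT08]; it replaces a real frequent-pattern miner with brute-force enumeration of small contexts; and the records, attribute names, helper names, and thresholds are all hypothetical toy values (not the German Credit dataset, so the numbers differ from the slide's elift of 5.19).

```python
from itertools import combinations, product

def conf(records, antecedent, consequent):
    """Return (hit count, confidence) of the rule antecedent -> consequent."""
    covered = [r for r in records if antecedent.items() <= r.items()]
    if not covered:
        return 0, 0.0
    hits = sum(1 for r in covered if consequent.items() <= r.items())
    return hits, hits / len(covered)

def elift(records, protected, context, consequent):
    """Extended lift: conf(protected & context -> C) / conf(context -> C)."""
    supp, c_full = conf(records, {**protected, **context}, consequent)
    _, c_ctx = conf(records, context, consequent)
    return supp, c_full, (c_full / c_ctx if c_ctx > 0 else 0.0)

def discover(records, protected, consequent, context_attrs,
             max_len=2, min_supp=1, min_elift=1.5):
    """Enumerate contexts of up to max_len attributes and keep the rules
    whose extended lift reaches min_elift (brute force, toy scale only)."""
    values = {a: sorted({r[a] for r in records}) for a in context_attrs}
    found = []
    for k in range(max_len + 1):
        for attrs in combinations(context_attrs, k):
            for combo in product(*(values[a] for a in attrs)):
                context = dict(zip(attrs, combo))
                supp, c, e = elift(records, protected, context, consequent)
                if supp >= min_supp and e >= min_elift:
                    found.append((context, supp, c, e))
    return sorted(found, key=lambda rule: -rule[3])

def make(n, foreign, purpose, housing, credit):
    """Build n identical hypothetical decision records."""
    return [{"foreign_worker": foreign, "purpose": purpose,
             "housing": housing, "credit": credit} for _ in range(n)]

# Hypothetical toy data: 20 decision records.
records = (
    make(3, "yes", "new_car", "own", "bad") + make(1, "yes", "new_car", "own", "good")
    + make(1, "no", "new_car", "own", "bad") + make(7, "no", "new_car", "own", "good")
    + make(1, "yes", "used_car", "rent", "bad") + make(1, "yes", "used_car", "rent", "good")
    + make(2, "no", "used_car", "rent", "bad") + make(4, "no", "used_car", "rent", "good")
)

rules = discover(records, {"foreign_worker": "yes"}, {"credit": "bad"},
                 context_attrs=["purpose", "housing"])
for context, supp, c, e in rules:
    print(context, f"supp={supp} conf={c:.2f} elift={e:.2f}")
```

On this toy data, the context PURPOSE=new_car & HOUSING=own gives conf = 3/4 for foreign workers against 4/12 for everyone in that context, so the extended lift is 2.25. A real analysis would use a proper frequent-pattern miner and a legally grounded threshold rather than the arbitrary 1.5 used here.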

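The ProPublica finding on slide 18 is a comparison of error rates across groups: how often people who did not reoffend were flagged as high risk (false positive rate), and how often people who did reoffend were rated low risk (false negative rate). A minimal sketch of that kind of audit follows; the group labels, the record layout, and the counts are made up for illustration and are not COMPAS data.

```python
from collections import defaultdict

def error_rates_by_group(rows):
    """rows: iterable of (group, predicted_high_risk, reoffended) tuples.
    Returns {group: (false_positive_rate, false_negative_rate)}."""
    counts = defaultdict(lambda: {"fp": 0, "tn": 0, "fn": 0, "tp": 0})
    for group, predicted, actual in rows:
        if actual:
            counts[group]["tp" if predicted else "fn"] += 1
        else:
            counts[group]["fp" if predicted else "tn"] += 1
    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]   # people who did not reoffend
        positives = c["fn"] + c["tp"]   # people who did reoffend
        fpr = c["fp"] / negatives if negatives else float("nan")
        fnr = c["fn"] / positives if positives else float("nan")
        rates[group] = (fpr, fnr)
    return rates

# Hypothetical toy data: (group, predicted high risk?, reoffended?)
rows = (
    [("group_a", True, False)] * 4 + [("group_a", False, False)] * 6
    + [("group_a", True, True)] * 8 + [("group_a", False, True)] * 2
    + [("group_b", True, False)] * 2 + [("group_b", False, False)] * 8
    + [("group_b", True, True)] * 6 + [("group_b", False, True)] * 4
)

rates = error_rates_by_group(rows)
for group, (fpr, fnr) in sorted(rates.items()):
    print(f"{group}: false positive rate={fpr:.2f}, false negative rate={fnr:.2f}")
```

In this toy example the two groups have mirrored errors: group_a is wrongly flagged twice as often as group_b, while group_b is wrongly cleared twice as often, the same pattern of asymmetry the slide describes.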