Pragmatic ethical and fair AI for data scientists

David Graus, Lead Data Scientist
DDMA Monthly Meetup AI @ online, 8 April 2021
👨 David Graus
✉ david.graus@randstadgroep.nl
🌐 www.graus.nu
🐦 @dvdgrs
⚠ Disclaimer
This slide deck contains opinion: mine (not my employer’s)
Motivation
Narrative: From Negative to Positive
Topic 1: 

⚖ Fair AI in recruitment & HR
bias is everywhere
in humans…
"All the curricula vitae actually came from a real-life scientist [...] but the names were changed to traditional male and female names.

[…]

Both men and women were more likely to hire a male job applicant than a female job applicant with an identical record."
Rhea E. Steinpreis, Katie A. Anders and Dawn Ritzke. The Impact of Gender on the Review of the Curriculum Vitae of Job Applicants and Tenure Candidates. Sex Roles, Springer.
bias is everywhere
in humans…
ā€œWhiteā€ names receive 50 percent
more callbacks for interviews [than
ā€œAfrican-Americanā€ names]. Results
suggest that racial discrimination
is still a prominent feature of the
labor market.
Marianne Bertrand and Sendhil Mullainathan. Are Emily and Greg More Employable 

than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination. NBER.
bias is everywhere
…and algorithms
Amazon’s system taught itself that male candidates were preferable. It penalized résumés that included the word "women’s", as in "women’s chess club captain". And it downgraded graduates of two all-women’s colleges.
Amazon ditched AI recruiting tool that favored men for technical jobs
[https://www.theguardian.com/technology/2018/oct/10/amazon-hiring-ai-gender-bias-recruiting-engine]
bias is everywhere
…and algorithms
"We examine gender inequality on the resume search engines. We ran queries on each site’s resume search engine for 35 job titles. [...] even when controlling for all other visible candidate features, there is a slight penalty against feminine candidates."
Le Chen, Ruijun Ma, Anikó Hannák, and Christo Wilson. Investigating the Impact of Gender on Rank in Resume Search Engines. CHI 2018, ACM.
bias is everywhere
🦾🤖 AI to the rescue
Fair AI #1
Representational ranking
"Representation re-ranking" makes sure the proportion of female candidates shown is the same as the corresponding proportion of profiles matching that query.
"In the US, a form of demographic parity is required by the Equal Employment Opportunity Commission (EEOC) according to the 4/5ths (or P%) rule."
https://www.ifow.org/publications/artificial-intelligence-in-hiring-assessing-impacts-on-equality
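The 4/5ths (P%) check is easy to express in code. A minimal sketch (the group names and counts below are hypothetical, not from the talk):

```python
def passes_four_fifths(rate_a, rate_b, threshold=0.8):
    """EEOC 4/5ths (P%) rule: the lower selection rate must be at least
    `threshold` times the higher one, otherwise adverse impact is flagged."""
    low, high = sorted([rate_a, rate_b])
    return low >= threshold * high

# Hypothetical selection rates per group
rate_men = 30 / 100    # 30 of 100 male applicants selected
rate_women = 20 / 100  # 20 of 100 female applicants selected
print(passes_four_fifths(rate_men, rate_women))  # → False (0.20/0.30 ≈ 0.67 < 0.8)
```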

• Improves fairness metrics, with no statistically significant change in business metrics.

• Deployed to all LinkedIn Recruiter users worldwide.
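LinkedIn's production system is more involved, but the core idea of representation re-ranking can be sketched as a greedy pass that enforces a minimum protected-group share at every prefix of the ranking (candidate names and the target share below are hypothetical):

```python
from collections import deque

def rerank(ranked, target_share):
    """Greedy representation re-ranking sketch: at every prefix of length k,
    require at least floor(target_share * k) protected-group candidates.
    `ranked` is a relevance-ordered list of (candidate_id, is_protected)."""
    protected = deque(c for c in ranked if c[1])
    rest = deque(c for c in ranked if not c[1])
    out, n_protected = [], 0
    while protected or rest:
        k = len(out) + 1
        need = int(target_share * k)  # floor of the required count
        if n_protected < need and protected:
            c = protected.popleft()   # pull the next protected candidate forward
        elif rest:
            c = rest.popleft()        # otherwise keep relevance order
        else:
            c = protected.popleft()
        out.append(c)
        n_protected += c[1]
    return out

slate = [("a", False), ("b", False), ("c", True), ("d", False), ("e", True)]
print([c[0] for c in rerank(slate, 0.5)])  # → ['a', 'c', 'b', 'e', 'd']
```

Within each group the original relevance order is preserved, so the adjustment only promotes candidates as far as needed to satisfy the constraint.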
Fair AI #2
Human bias mitigation through (over)compensation
• Can we adjust rankings to "fix" human bias?

• Yes: balancing gender representation in candidate slates can correct biases for some professions.

• No: doing so has no impact on professions where persistent human preferences are at play, e.g., nannies and gynaecologists.

• Gender of decision-maker, complexity of the decision-making task, and over- and under-representation of genders in the candidate slate can all impact the final decision.
Topic 2: 

⚖ Value-driven news recommendation
(Editorial) value-driven AI
Context: Het Financieele Dagblad
Goal: Article recommendations (for the reader), upholding editorial values (of the provider)
Editorial values?
• Participated in a study by Bastian and Helberger [1]

• "[Conducted] semi-structured interviews with employees from different departments (journalists, data scientists, product managers), it explores the costs and benefits of value articulation and a mission-sensitive approach in algorithmic news distribution from the newsroom perspective."

• Resulting values, mapped to recommender metrics [2]:

1. surprise readers ➡ Serendipity

2. timely and fresh news ➡ Dynamism

3. diverse reading behavior ➡ Diversity

4. cover more articles ➡ Coverage

[1] Bastian & Helberger. Safeguarding the journalistic DNA. Future of Journalism 2019.
[2] Ge et al. Beyond Accuracy: Evaluating Recommender Systems by Coverage and Serendipity. RecSys 2010.
RQ1
Does FD’s recommender system steer users to useful recommendations?
• Compare usefulness between recommended and manually curated articles

• 115 users

• One month of rankings (August 2019)

➡ Serendipity, Dynamism, Diversity, Coverage
Usefulness 1: Diversity
Metric
• Intra-list diversity

• For four article attributes: Sections, Tags, Authors, Word Embeddings

Results
• Recommendations are more diverse in article topic/content

• Manual curation is more diverse in authors

• Both are diverse in tags
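Intra-list diversity is typically the mean pairwise dissimilarity within one ranking. A minimal sketch over hypothetical tag sets, with Jaccard distance standing in for the attribute-specific measures used in the study:

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 minus the Jaccard similarity of two attribute sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1 - len(a & b) / len(a | b)

def intra_list_diversity(attr_sets):
    """Mean pairwise dissimilarity of the attribute sets in one ranking."""
    pairs = list(combinations(attr_sets, 2))
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)

# Hypothetical tag sets of three recommended articles
tags = [{"markets", "tech"}, {"markets", "politics"}, {"energy"}]
print(round(intra_list_diversity(tags), 2))  # → 0.89
```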
Usefulness 2: Dynamism
Metric
• Inter-list diversity

Results
• Recommended rankings change more between visits than manually curated ones
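Inter-list diversity compares two consecutive rankings for the same reader. One simple formulation (a sketch, not necessarily the paper's exact definition) is the share of items in the new ranking that were not in the previous one:

```python
def dynamism(prev_list, new_list):
    """Share of items in the new ranking absent from the previous one:
    a simple proxy for inter-list diversity between two visits."""
    prev = set(prev_list)
    return sum(1 for item in new_list if item not in prev) / len(new_list)

# Two consecutive (hypothetical) rankings for one reader: 2 of 3 items are new
print(dynamism(["a", "b", "c"], ["a", "d", "e"]))
```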
Usefulness 3: Serendipity
Metric
• Articles’ average dissimilarity to a reader’s historic articles

• (same attributes as diversity)

Results
• Manual curation yields more serendipitous rankings in terms of tags and authors

• Recommended articles are more serendipitous in content
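The serendipity metric described above can be sketched as the mean dissimilarity between each recommended article and the reader's history (hypothetical tag sets; Jaccard distance stands in for the study's attribute measures):

```python
def jaccard_distance(a, b):
    """1 minus the Jaccard similarity of two attribute sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1 - len(a & b) / len(a | b)

def serendipity(recommended, history):
    """Mean dissimilarity of each recommended article's attributes to the
    reader's historically read articles: higher = more unexpected."""
    per_article = [
        sum(jaccard_distance(rec, past) for past in history) / len(history)
        for rec in recommended
    ]
    return sum(per_article) / len(recommended)

# Hypothetical tag sets: the recommendation shares no tags with the history
history = [{"markets"}, {"tech"}]
print(serendipity([{"sports"}], history))  # → 1.0
```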
Usefulness 4: Coverage
Metric
• Percentage of daily published articles that are served

Results
• Per user, the recommendations provide a narrow set of articles

• Across all users, the overall coverage of recommended articles is much higher than that of manually curated articles
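Catalogue coverage can be sketched as the share of the day's published articles that reached at least one user's recommendations (article IDs below are hypothetical):

```python
def coverage(served_per_user, published):
    """Share of the day's published articles that appeared in at least
    one user's recommendations (catalogue coverage)."""
    served = set().union(*served_per_user)
    return len(served & set(published)) / len(published)

published = ["a1", "a2", "a3", "a4"]
per_user = [{"a1"}, {"a1", "a3"}]  # each user sees a narrow slice...
print(coverage(per_user, published))  # → 0.5 ...but the union covers more
```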
RQ1
Does FD’s recommender system steer users to useful recommendations?
✅
RQ2
Can we effectively adjust our news recommender to steer our readers
towards more dynamic reading behavior, without loss of accuracy?
Approach: ⚠ Intervention study
• Single usefulness treatment: Dynamism

• avoid exposing readers to sub-optimal rankings

• constrained by technical requirements
Method: Online A/B test
• Control: the original recommender system
• Variant: the adjusted recommender system, steered towards more dynamic recommendations
• 2 weeks (November 25 to December 4, 2019)

• 1,108 users, each randomly assigned to one of the two conditions
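Whether the control/variant difference is statistically meaningful can be checked with a simple permutation test on per-user dynamism scores (a sketch with hypothetical scores; the study's actual test may differ):

```python
import random

def permutation_test(control, variant, n_iter=2000, seed=0):
    """Two-sided permutation test on the difference of group means.
    Returns an approximate p-value for the observed difference."""
    rng = random.Random(seed)
    observed = abs(sum(variant) / len(variant) - sum(control) / len(control))
    pooled = list(control) + list(variant)
    n = len(control)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # break any real group structure
        a, b = pooled[:n], pooled[n:]
        if abs(sum(b) / len(b) - sum(a) / len(a)) >= observed:
            hits += 1
    return hits / n_iter

# Hypothetical per-user dynamism scores for each condition
control_scores = [0.42, 0.38, 0.45, 0.40, 0.41, 0.39]
variant_scores = [0.55, 0.58, 0.52, 0.57, 0.54, 0.56]
print(permutation_test(control_scores, variant_scores))  # small p-value
```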
Metric: Dynamism
Results: [charts omitted — Dynamism and Accuracy, control vs. variant]
RQ2
Can we effectively adjust our news recommender to steer our readers
towards more dynamic reading behavior, without loss of accuracy?
✅
Take homes
Recommendations can benefit both news providers and readers
• Fairness and ethics are time-, culture-, and context-dependent

• Organizational/editorial/ethical values can be ‘operationalized’ in algorithms

• Data scientist: don’t try to define fairness (it’s not your job, nor your expertise)

• But talk to stakeholders in your organization!

• Come up with a shared definition (combining what you can achieve technically + what you want to achieve "conceptually")

• Build! 🦾
Fin

thank you for your attention
discuss.