Pragmatic ethical and fair AI for data scientists

David Graus, Lead Data Scientist
DDMA Monthly Meetup AI @ online, 8 April 2021

👨 David Graus
✉ david.graus@randstadgroep.nl
🌐 www.graus.nu
🐦 @dvdgrs
⚠ Disclaimer
This slide deck contains opinion: mine (not my employer’s)
Motivation
Narrative: From Negative to Positive
Topic 1: ⚖ Fair AI in recruitment & HR
bias is everywhere
in humans…
“All the curricula vitae actually came from a real-life scientist [...] but the names were changed to traditional male and female names. […] Both men and women were more likely to hire a male job applicant than a female job applicant with an identical record.”
Rhea E. Steinpreis, Katie A. Anders and Dawn Ritzke. The Impact of Gender on the Review of the Curriculum Vitae of Job Applicants and Tenure Candidates. Sex Roles, Springer.
bias is everywhere
in humans…
“White” names receive 50 percent more callbacks for interviews [than “African-American” names]. Results suggest that racial discrimination is still a prominent feature of the labor market.
Marianne Bertrand and Sendhil Mullainathan. Are Emily and Greg More Employable than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination. NBER.
bias is everywhere
…and algorithms
Amazon’s system taught itself that male candidates were preferable. It penalized résumés that included the word “women’s”, as in “women’s chess club captain”. And it downgraded graduates of two all-women’s colleges.
Amazon ditched AI recruiting tool that favored men for technical jobs
[https://www.theguardian.com/technology/2018/oct/10/amazon-hiring-ai-gender-bias-recruiting-engine]
bias is everywhere
…and algorithms
“We examine gender inequality on the resume search engines. We ran queries on each site’s resume search engine for 35 job titles. [...] even when controlling for all other visible candidate features, there is a slight penalty against feminine candidates.”
Le Chen, Ruijun Ma, Anikó Hannák, and Christo Wilson. Investigating the Impact of Gender on Rank in Resume Search Engines. CHI 2018, ACM.
bias is everywhere
[image-only slides: further examples of bias]
🦾🤖 AI to the rescue
Fair AI #1
Representational ranking
“Representation re-ranking” makes sure the proportion of female candidates shown is the same as the corresponding proportion of profiles matching that query.
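As a rough sketch of how such representation re-ranking can work (illustrative Python, not LinkedIn's production algorithm; the Candidate fields and the target_share parameter are assumptions): greedily build the list so that every prefix keeps the protected group's share at or near its share among all matching profiles, breaking ties by relevance score.

from dataclasses import dataclass

@dataclass
class Candidate:
    score: float  # relevance score from the base ranker
    gender: str   # protected attribute, e.g. "f" / "m"

def rerank(candidates, target_share, group="f"):
    # Split candidates into the protected group and the rest, best-scored first.
    grp = sorted((c for c in candidates if c.gender == group),
                 key=lambda c: -c.score)
    rest = sorted((c for c in candidates if c.gender != group),
                  key=lambda c: -c.score)
    ranking, n_grp = [], 0
    while grp or rest:
        k = len(ranking) + 1
        # Force a group pick when the prefix share would fall below the
        # target; otherwise take the best-scored remaining candidate.
        if grp and (not rest or n_grp / k < target_share
                    or grp[0].score >= rest[0].score):
            ranking.append(grp.pop(0))
            n_grp += 1
        else:
            ranking.append(rest.pop(0))
    return ranking

With target_share set to the female share of profiles matching the query, the shown slate mirrors the underlying candidate pool.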
“In the US, a form of demographic parity is required by the Equal Employment Opportunity Commission (EEOC) according to the 4/5ths (or P%) rule.”
https://www.ifow.org/publications/artificial-intelligence-in-hiring-assessing-impacts-on-equality
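The 4/5ths rule itself is straightforward to monitor (a minimal sketch; the group labels and counts below are made up, not real hiring data): compare each group's selection rate against the highest group's rate and flag any ratio below 0.8.

def adverse_impact_ratios(selected, applied):
    # selected / applied: dicts mapping group label -> counts.
    rates = {g: selected[g] / applied[g] for g in applied}
    best = max(rates.values())
    return {g: rate / best for g, rate in rates.items()}

ratios = adverse_impact_ratios(selected={"f": 30, "m": 50},
                               applied={"f": 100, "m": 100})
print({g: r for g, r in ratios.items() if r < 0.8})  # fails 4/5ths: {'f': 0.6}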

• Results in an improvement in fairness metrics, without a statistically significant change in the business metrics.
• Deployed to all LinkedIn Recruiter users worldwide.
Fair AI #2
Human bias mitigation through (over)compensation
• Can we adjust rankings to “fix” human bias?
• Yes: balancing gender representation in candidate slates can correct biases for some professions.
• No: doing so has no impact on professions where persistent human preferences are at play, e.g., nannies and gynaecologists.
• The gender of the decision-maker, the complexity of the decision-making task, and over- and under-representation of genders in the candidate slate can all impact the final decision.
Topic 2: ⚖ Value-driven news recommendation
(Editorial) value-driven AI
Context: Het Financieele Dagblad
Goal: article recommendations (for the reader) upholding editorial values (of the provider)
Editorial values?
• Participated in a study by Bastian and Helberger [1]
• “[Conducted] semi-structured interviews with employees from different departments (journalists, data scientists, product managers), it explores the costs and benefits of value articulation and a mission-sensitive approach in algorithmic news distribution from the newsroom perspective.”
• Resulting values, mapped onto recommender-system metrics [2]:
1. surprise readers ➡ Serendipity
2. timely and fresh news ➡ Dynamism
3. diverse reading behavior ➡ Diversity
4. cover more articles ➡ Coverage
[1] Bastian & Helberger. Safeguarding the journalistic DNA. Future of Journalism 2019.
[2] Ge et al. Beyond Accuracy: Evaluating Recommender Systems by Coverage and Serendipity. RecSys 2010.
RQ1: Does FD’s recommender system steer users to useful recommendations?
• Compare usefulness between recommended and manually curated articles
• 115 users
• One month of rankings (August 2019)
• Usefulness ➡ Serendipity, Dynamism, Diversity, Coverage
Usefulness 1: Diversity
Metric
• Intra-list diversity
• For four article attributes: Sections, Tags, Authors, Word Embeddings
Results
• Recommendations are more diverse in article topic/content
• Manual curation is more diverse in authors
• Both are diverse in tags
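A common way to compute intra-list diversity, sketched below under the assumption of cosine distance over article embedding vectors (the study evaluates several attribute representations): average the pairwise distances between all articles in one ranked list.

import numpy as np

def intra_list_diversity(vectors):
    # vectors: (n_articles, dim) array of article representations.
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sim = v @ v.T                       # pairwise cosine similarities
    n = len(v)
    mean_sim = (sim.sum() - np.trace(sim)) / (n * (n - 1))
    return 1 - mean_sim                 # mean pairwise cosine distance

print(intra_list_diversity(np.random.rand(10, 50)))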
Usefulness 2: Dynamism
Metric
• Inter-list diversity
Results
• When recommendation lists change, they tend to change more (than manually curated lists)
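Inter-list diversity can be defined in several ways; below is a minimal sketch using one minus the Jaccard overlap between a user's consecutive recommendation lists (an assumed distance, not necessarily the exact one used in the study).

def dynamism(previous_list, current_list):
    # 1 - Jaccard overlap between two consecutive article sets.
    a, b = set(previous_list), set(current_list)
    return 1 - len(a & b) / len(a | b)

print(dynamism(["x", "y", "z"], ["x", "q", "r"]))  # 0.8: the list changed a lot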
Usefulness 3: Serendipity
Metric
• Articles’ average dissimilarity to a reader’s historic articles
• (same attributes as diversity)
Results
• Manual curation yields more serendipitous rankings in terms of tags and authors
• Recommended articles are more serendipitous in content
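A sketch of the metric as described: the mean distance between each recommended article and the reader's reading history, here with cosine distance over illustrative vectors.

import numpy as np

def serendipity(recommended, history):
    # recommended, history: (n, dim) arrays of article vectors.
    r = recommended / np.linalg.norm(recommended, axis=1, keepdims=True)
    h = history / np.linalg.norm(history, axis=1, keepdims=True)
    return float(np.mean(1 - r @ h.T))  # mean cosine distance to history

print(serendipity(np.random.rand(5, 50), np.random.rand(30, 50)))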
Usefulness 4: Coverage
Metric
• Percentage of daily published articles that are served
Results
• Per user, the recommendations provide a narrow set of articles
• Across all users, the overall coverage of recommended articles is much higher than that of the manually curated articles
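Coverage as described reduces to a set ratio (a sketch; the IDs are illustrative): the share of a day's published articles that appear in at least one served recommendation.

def coverage(served_ids, published_ids):
    published = set(published_ids)
    return len(set(served_ids) & published) / len(published)

print(coverage(served_ids=[1, 2, 3], published_ids=range(10)))  # 0.3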
RQ1: Does FD’s recommender system steer users to useful recommendations? ✅
RQ2: Can we effectively adjust our news recommender to steer our readers towards more dynamic reading behavior, without loss of accuracy?
Approach: ⚠ Intervention study
• Single usefulness treatment: Dynamism
• Avoid exposing readers to sub-optimal rankings
• Constrained by technical requirements
Method: Online A/B test
• Control: the original recommender system
• Variant: the adjusted recommender system, steered towards more dynamic recommendations (one possible adjustment is sketched below)
• 2 weeks (November 25 to December 4, 2019)
• 1,108 users
• Each randomly assigned to one of the two treatments
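One simple way to steer a recommender towards more dynamic lists (an illustration only, not necessarily the adjustment used in the FD system; the penalty parameter is an assumption): demote articles that already appeared in the user's previous list, so fresh items rise in the ranking.

def steer_dynamic(scored, previous, penalty=0.5):
    # scored: dict article_id -> relevance score;
    # previous: ids shown in the user's last recommendation list.
    adjusted = {a: s - penalty * (a in previous) for a, s in scored.items()}
    return sorted(adjusted, key=adjusted.get, reverse=True)

print(steer_dynamic({"a": 0.9, "b": 0.8, "c": 0.3}, previous={"a"}))  # b, a, c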
Metrics: Dynamism, Accuracy
[results charts: dynamism and accuracy, control vs. variant]
RQ2: Can we effectively adjust our news recommender to steer our readers towards more dynamic reading behavior, without loss of accuracy? ✅
Take homes
Recommendations can benefit both news providers and readers
• Fairness and ethics are time-, culture-, and context-dependent
• Organizational/editorial/ethical values can be ‘operationalized’ in algorithms
• Data scientist: don’t try to define fairness (it’s not your job, nor your expertise)
• But talk to stakeholders in your organization!
• Come up with a shared definition (combining what you can achieve technically with what you want to achieve “conceptually”)
• Build! 🦾

Fin
Thank you for your attention.
Discuss.