Kharkiv National University of Radio Electronics
17 November 2019
@tati_alchueyr
Ethical Machine Learning
building recommendation engines with
editorial support
It is a pleasure to be here with you
Thank you very much
About me
● Brazilian living in London since 2014
● Senior Data Engineer at the BBC Datalab team
● Graduated in Computer Engineering at Unicamp
● Passionate software developer for 16 years
● Experience in the private and public sectors
● Developed software for Medicine, Media and Education
● Loves Open Source
● Loves Brazilian Jiu Jitsu
● Proud mother of Amanda
BBC
● British Broadcasting Corporation
● Values
○ Independent, impartial and honest
○ Audiences are at the heart of everything we do
○ We take pride in delivering quality and value for
money
○ Creativity is the lifeblood of our organisation
○ We respect each other and celebrate our diversity
so that everyone can give their best
BBC
● Founded in 1922
● Purpose
○ Inform
○ Educate
○ Entertain
● “Our organisation exists in order to serve individuals and
society as a whole rather than a small set of stakeholders.”
Reference: Gabriel Straub (BBC)
bbc.stats()
➢ BBC TV reaches 91% UK adult population
➢ BBC News reaches 426 million global audience weekly
Reference 1: BBC
Reference 2: BBC
Image Credit: BBC
BBC. .
“Bring the BBC’s data together
accessible through a common platform,
along with flexible and scalable tools to
support machine learning to enable
content enrichment and deeper
personalisation”
Some of the Datalab team members (15 August 2019)
BBC. .
BBC. .
● Multi-disciplinary team
○ Editorial
○ Data scientists
○ Engineers
○ Product Manager
○ Project Manager
BBC
machine learning
BBC Machine learning applied to the audiences
Image credit: BBC
BBC Machine learning applied to content creation
Image credit: BBC
Made by the Machine: when AI met the archive (BBC 4)
Machine learning overview
BBC+
experimental personalised app
BBC+ app experiment
● Fully personalised experience on short videos, on Android & iPhone
● Allow users to find gems they didn’t know about, at a time that suits them
Content-based recommendations content
We create a content representation (*):
{
  "genres": {
    "science": 0.8,
    "nature": 0.2
  }
}
(*) simplified for didactic purposes
Content-based recommendations user
We learn about the user indirectly
● news you read
● videos you watch
● things you search
● quizzes you answer
● things you like
● things you comment on
Content-based recommendations user
We create a user representation (*):
{
  "genres": {
    "science": 0.4,
    "folk-music": 0.5,
    "judo": 0.1
  }
}
(*) simplified for didactic purposes
Content-based recommendations prediction
We use the user representation to search for content similar to it,
using Elasticsearch. As an output, we have a ranked list of content.
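The matching step can be illustrated without a search engine. The sketch below is a plain-Python stand-in, not the Elasticsearch query used in the product: it assumes cosine similarity between the simplified genre weights shown above, and the item ids and the `rank_content` helper are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse {genre: weight} dicts."""
    dot = sum(weight * b.get(genre, 0.0) for genre, weight in a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_content(user_genres, catalogue):
    """Return (item_id, score) pairs, most similar to the user first."""
    scored = [
        (item_id, cosine(user_genres, genres))
        for item_id, genres in catalogue.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# User representation from the slide above.
user = {"science": 0.4, "folk-music": 0.5, "judo": 0.1}

# Hypothetical catalogue; the first entry reuses the content representation above.
catalogue = {
    "planet-documentary": {"science": 0.8, "nature": 0.2},
    "folk-session": {"folk-music": 1.0},
    "cooking-show": {"food": 1.0},
}

ranked = rank_content(user, catalogue)
for item_id, score in ranked:
    print(f"{item_id}: {score:.3f}")
```

The output is a ranked list of content, which is the shape of result the slide describes; a production system would delegate the scoring to the search index.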
BBC+ app experiment
● How to get from algorithm to product
○ Start with content-based recommendations
○ Apply business rules
Legal, editorial, GDPR, business values
https://www.bbc.com/editorialguidelines/
Legal Policies
Programme: BBC
Contempt of court
● The recommendations should not affect the
outcome of a legal case
● The BBC can be held accountable for
influencing the jury’s opinion
Action
● Create a “contempt of court risk” label by
detecting keywords such as arrest, assault,
allegation etc
● Avoid items with this label
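A keyword-based label of this kind could be sketched as follows. This is only an illustration under assumptions: the keyword list contains just the three examples named on the slide, and the `title` and `description` fields and both helper functions are hypothetical.

```python
import re

# Only the three example keywords from the slide; a real list would be longer.
RISK_KEYWORDS = ("arrest", "assault", "allegation")

# Match a keyword at the start of a word, so "arrested" and "allegations" also count.
_RISK_PATTERN = re.compile(r"\b(" + "|".join(RISK_KEYWORDS) + r")", re.IGNORECASE)


def has_contempt_risk(item):
    """Return True if the item's title or description mentions a risk keyword."""
    text = f"{item.get('title', '')} {item.get('description', '')}"
    return _RISK_PATTERN.search(text) is not None


def drop_risky(items):
    """Keep only the items that do not carry the contempt-of-court risk label."""
    return [item for item in items if not has_contempt_risk(item)]


items = [
    {"title": "Man arrested after assault", "description": "Police statement."},
    {"title": "Gardening tips", "description": "How to prune roses."},
]
safe_items = drop_risky(items)
print([item["title"] for item in safe_items])
```

Keyword matching is deliberately blunt: it will also exclude harmless items, which fits a rule whose aim is to avoid legal risk rather than to maximise recall.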
Legal Policies
Electoral law
● During elections we should not surface
political content that could influence the vote
Action
● Create a “political risk” label by detecting
political content sources
● Avoid items when appropriate
Editorial Policies
Quality criteria
● Avoid content whose metadata shows that little care has
been taken
Action
● Avoid content with poor titles and descriptions
Editorial Policies
Under 16 audience
● Provide children-safe content
● BBC’s 9PM watershed
Action
● Avoid items with warnings of sex, violence,
strong language
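The legal and editorial rules above can be thought of as filters applied to the ranked list before it reaches the user. The sketch below is one hypothetical way to chain them: the field names (`warnings`, `political_risk`), the length thresholds and the decision to apply the warnings rule to every user are all assumptions made for illustration.

```python
# Warning categories named on the slide.
BLOCKED_WARNINGS = {"sex", "violence", "strong language"}


def has_poor_metadata(item, min_title=5, min_description=20):
    """Flag items with very short titles or descriptions (thresholds are assumptions)."""
    title = item.get("title", "").strip()
    description = item.get("description", "").strip()
    return len(title) < min_title or len(description) < min_description


def has_blocked_warning(item):
    """Flag items carrying any of the blocked content warnings."""
    return bool(BLOCKED_WARNINGS & set(item.get("warnings", [])))


def apply_business_rules(ranked_items, election_period=False):
    """Filter a ranked list, preserving the original order of the kept items."""
    kept = []
    for item in ranked_items:
        if has_poor_metadata(item):
            continue
        if has_blocked_warning(item):
            continue
        if election_period and item.get("political_risk", False):
            continue
        kept.append(item)
    return kept


ranked_items = [
    {"id": "a", "title": "Deep sea life",
     "description": "A short film about deep sea creatures."},
    {"id": "b", "title": "Clip", "description": ""},
    {"id": "c", "title": "Late night drama",
     "description": "An episode with scenes of violence.",
     "warnings": ["violence"]},
    {"id": "d", "title": "Party leader interview",
     "description": "An interview recorded on the campaign trail.",
     "political_risk": True},
]

print([item["id"] for item in apply_business_rules(ranked_items)])
print([item["id"] for item in apply_business_rules(ranked_items, election_period=True)])
```

Keeping each rule as a small named function makes it easier for editorial colleagues to review what is being excluded and why.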
Cold start: human curation alongside automation
GDPR
Explainability
● Choose simple models over complex ones
● UI features to provide explanations
Agency
● UI features for users to interact with the algorithm
● E.g. delete history items, like, dislike, report
Curation values
● Affection
● Authenticity
● Compelling
● Fresh
● Warm
● Quirky
● Relatable
● Aspirational
● Entertaining
● Reassuring
Reference: Anna McGovern
“Website editor, manager, analyst and
digital nurturer” at the BBC
Much more than click rates
Business values & objectives
Quantitative offline evaluation
● NDCG, hit rate, diversity, recency, surprisal
● Prioritise diversity and recency over accuracy
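Two of the metrics listed above, hit rate and NDCG, can be computed per user as in the sketch below. It assumes binary relevance (an item is relevant if the user interacted with it in a held-out period); the function names are hypothetical, and diversity, recency and surprisal are not covered here.

```python
import math


def hit_rate(recommended, relevant, k):
    """1.0 if any of the top-k recommendations is relevant, else 0.0."""
    return 1.0 if any(item in relevant for item in recommended[:k]) else 0.0


def dcg(gains):
    """Discounted cumulative gain: each gain is discounted by log2(rank + 1)."""
    return sum(gain / math.log2(position + 2) for position, gain in enumerate(gains))


def ndcg(recommended, relevant, k):
    """DCG of the top-k list divided by the DCG of an ideal ordering."""
    gains = [1.0 if item in relevant else 0.0 for item in recommended[:k]]
    ideal_gains = [1.0] * min(len(relevant), k)
    ideal_dcg = dcg(ideal_gains)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


recommended = ["a", "b", "c", "d"]  # ranked output of a recommender
relevant = {"b", "d"}               # held-out items the user actually consumed

print(hit_rate(recommended, relevant, k=3))
print(round(ndcg(recommended, relevant, k=3), 3))
```

In practice these per-user values would be averaged over all test users and read alongside the diversity and recency measures, which the slide prioritises over accuracy.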
Qualitative offline evaluation
● Prioritise content for young audiences
● Prioritise content of editorial importance
BBC+ app experiment
Takeaways
● The editorial partnership is key to how we work
● The company’s principles are at the heart of all of our decisions
● There is a significant path between an implementation and
production readiness
BBC Sounds Recommendations
Challenging existing recs provider
Current external provider
Content-based recommendations
● 9 to 12 items on native apps and web
● Current provider: content-based algorithm
○ Poor metadata, poor recommendations
○ Popularity biases towards heritage audience
○ Cold start using editorially curated lists
○ Opportunity for improvement of performance
We decided to try a different approach: Factorisation Machines
Recommended for you
Recommendation strategy content-based
How it works
● Given a user, find content similar to their preferences
● Characterise items using genres, masterbrand, etc.
● Based on the user’s historical data and content metadata
Challenges
● Potential lack of diversity; relies on good content descriptions
Where can we find this?
● “You may also be interested in …”
Recommendation strategy collaborative filtering
How it works
● Given a user, find similar users and the content they watched
● Based on all users’ historical data
● Uses implicit feedback (user-item interactions)
Challenges
● Sparse matrix
○ SVMs are very efficient, except in sparse settings where there is
not enough data to estimate interactions
● Cold start
Where can we find this?
● “Customers who viewed this item, also viewed...”
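As a rough illustration of collaborative filtering on implicit feedback, the sketch below finds users with overlapping histories and recommends what they consumed. It assumes Jaccard similarity over sets of item ids, which is one simple choice among many; the users, item ids and the `recommend` helper are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two sets of item ids, between 0.0 and 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, interactions, top_n=3):
    """Score unseen items by the similarity of the users who consumed them."""
    seen = interactions[target]
    scores = {}
    for user, items in interactions.items():
        if user == target:
            continue
        similarity = jaccard(seen, items)
        if similarity == 0.0:
            continue  # no shared history, so no evidence from this user
        for item in items - seen:
            scores[item] = scores.get(item, 0.0) + similarity
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Implicit feedback: the set of items each user has played.
interactions = {
    "alice": {"comedy-quiz", "history-talk"},
    "bob": {"comedy-quiz", "history-talk", "science-hour"},
    "carol": {"comedy-quiz", "drama-serial"},
    "dave": {"football-daily"},
}

print(recommend("alice", interactions))
```

Both challenges from the slide show up even here: a user with no history gets an empty list (cold start), and with few overlapping histories most similarities are zero (sparsity).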
Recommendation strategy factorisation machine
How it works
● Hybrid of content-based and collaborative filtering
● Combines SVM and factorisation techniques
● Based on all users’ historical data and content metadata
● Based on reliable information (latent features)
● Linear time complexity
Reference: Academic Paper
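For reference, the standard second-order factorisation machine model is written out below, on the assumption that it is the formulation the slide refers to. The symbols are not from the slides: x is the feature vector of length n, w_0 and w_i are the bias and linear weights, and v_i is the k-dimensional latent factor vector of feature i. The second identity is what gives the linear time complexity, O(kn).

```latex
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j

\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j
  = \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f} \, x_i \right)^{2}
  - \sum_{i=1}^{n} v_{i,f}^{2} \, x_i^{2} \right]
```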
Recommendation strategy factorisation machine
Example

User         Item               Rating
Alice (A)    Titanic (T)        5
Alice (A)    Notting Hill (NH)  3
Alice (A)    Star Wars (SW)     1
Bob (B)      Star Wars (SW)     4
Bob (B)      Star Trek (ST)     5
Charlie (C)  Titanic (T)        1
Charlie (C)  Star Wars (SW)     5

● Estimate the interaction between Alice and Star Trek
a. There is no case where Alice rated Star Trek, so a direct
estimate gives wA,ST = 0
b. Use the factorised interaction parameters {vA, vST} instead
c. The dot product of the factor vectors of A and ST will be
similar to that of A and SW, because Bob rated SW and ST similarly
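The Alice and Star Trek estimate can be traced in code. The sketch below implements the standard second-order factorisation machine prediction in plain Python, once with the linear-time formulation and once with the naive pairwise sum as a cross-check. The factor values are hand-picked for illustration, not learned from the ratings, and the feature layout (one-hot users followed by one-hot items) is an assumption.

```python
def fm_predict(x, w0, w, v):
    """Second-order FM prediction using the O(k * n) reformulation."""
    n, k = len(x), len(v[0])
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        total = sum(v[i][f] * x[i] for i in range(n))
        total_of_squares = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (total * total - total_of_squares)
    return w0 + linear + pairwise


def fm_predict_naive(x, w0, w, v):
    """Same model, summing every feature pair explicitly (O(k * n^2))."""
    n = len(x)
    result = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            result += dot * x[i] * x[j]
    return result


# Feature order: users A, B, C, then items T, NH, SW, ST (one-hot encoded).
features = ["A", "B", "C", "T", "NH", "SW", "ST"]

# Hand-picked 2-dimensional factors; vSW and vST are deliberately close,
# mirroring the fact that Bob rated both films similarly.
v = [
    [-1.0, 0.5],  # A
    [1.0, 0.2],   # B
    [1.0, -0.3],  # C
    [-0.8, 0.4],  # T
    [-0.5, 0.3],  # NH
    [0.9, 0.1],   # SW
    [1.0, 0.1],   # ST
]
w0 = 0.0
w = [0.0] * len(features)  # linear terms switched off to isolate the interaction


def one_hot(user, item):
    return [1.0 if name in (user, item) else 0.0 for name in features]


alice_star_trek = fm_predict(one_hot("A", "ST"), w0, w, v)
alice_star_wars = fm_predict(one_hot("A", "SW"), w0, w, v)
print(alice_star_trek, alice_star_wars)
```

Even though Alice never rated Star Trek, the model produces an interaction score for the pair, and it lands close to her Star Wars score because the two item vectors are close.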
Qualitative Experiment
Who
● ~30 test users recruited
○ From non-editorial and editorial
teams from BBC audio networks
○ Under 35
How
● Two sets of recommendations
displayed
● Users have to pick either the best list,
or “both”, or “neither”
● And explain why
Qualitative Experiment Feedback
● “Need to categorize speech vs music,
background listening vs ‘serious’
content”
● “Need to consider the age of the item”
● “Looking for diverse content durations
…”
Reducing item/user biases helped to
generate more personalised
recommendations than the current state
           Neither  Content-based  Hybrid approach  Both
Responses  2        8              17               1
Share      7%       28.5%          61%              3.5%
Quantitative Experiment (MVT or A/B test)
Machine Learning Principles
The BBC Machine Learning Values
1. Audiences at the heart of everything we do. We celebrate diversity
○ Good value for money and focusing on using the audience-based
data to improve their experience
3. Our algorithms serve our audiences equally and fairly, so that the
full breadth of the BBC is available to everyone
6. Algorithms form only part of the content discovery process for our
audiences, and sit alongside (human) editorial curation
Reference: Gabriel Straub (BBC)
Flourishing
in the age of AI
Flourishing in the age of AI
● Research
● 11,000 people
● 7 markets
● What people want from their lives
● How technology might enable that
Reference: Flourishing in AI report
Flourishing in the age of AI
“(...) people in the UK don’t think technology is being
developed with their best interests at heart”
Reference: Flourishing in AI report
Flourishing in the age of AI
Reference: Flourishing in AI report
● How satisfied are you with
your life?
● To what extent are the things
you do in life
worthwhile?
● How anxious did you feel
yesterday?
Base: 5432, May 2019
Flourishing in the age of AI
Reference: Flourishing in AI report
Empower the user
Bonus
BBC Radio 1 studios tour
Link to video from Jacob Rickard
http://datalab.rocks
We are hiring
Further Reading
Ethical Machine Learning
● How do you make decisions about what is fair?
● What metrics can you use?
● How can you achieve ethical machine learning in your work?
Reference: Avoiding the Fate of Icarus
Medium
Thank you very much
Thank you
Image credit: Wikipedia Commons
@tati_alchueyr

 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 

Responsible Machine Learning at the BBC

  • 1. Kharkiv National University of Radio Electronics 17 November 2019 @tati_alchueyr Ethical Machine Learning building recommendation engines with editorial support
  • 2. мне приятно быть здесь с тобой (it's a pleasure to be here with you) большое Вам спасибо (thank you very much)
  • 3. About me ● Brazilian living in London since 2014 ● Senior Data Engineer at the BBC Datalab team ● Graduated in Computer Engineering at Unicamp ● Passionate software developer for 16 years ● Experience in the private and public sectors ● Developed software for Medicine, Media and Education ● Loves Open Source ● Loves Brazilian Jiu Jitsu ● Proud mother of Amanda
  • 4. BBC ● British Broadcasting Corporation ● Values ○ Independent, impartial and honest ○ Audiences are at the heart of everything we do ○ We take pride in delivering quality and value for money ○ Creativity is the lifeblood of our organisation ○ We respect each other and celebrate our diversity so that everyone can give their best
  • 5. BBC ● Founded in 1922 ● Purpose ○ Inform ○ Educate ○ Entertain ● “Our organisation exists in order to serve individuals and society as a whole rather than a small set of stakeholders.” Reference: Gabriel Straub (BBC)
  • 6. bbc.stats() ➢ BBC TV reaches 91% of the UK adult population ➢ BBC News reaches a global audience of 426 million weekly Reference 1: BBC Reference 2: BBC Image Credit: BBC
  • 7. BBC Datalab “Bring the BBC’s data together accessible through a common platform, along with flexible and scalable tools to support machine learning to enable content enrichment and deeper personalisation”
  • 8. BBC Datalab Some of the Datalab team members (15 August 2019)
  • 9. BBC Datalab ● Multi-disciplinary team ○ Editorial ○ Data scientists ○ Engineers ○ Product Manager ○ Project Manager
  • 11. BBC Machine learning applied to the audiences Image credit: BBC
  • 12. BBC Machine learning applied to content creation Image credit: BBC. Made by Machine: When AI Met the Archive (BBC Four)
  • 15. BBC+ app experiment ● Fully personalised experience on short videos, on Android & iPhone ● Allow users to find gems that they didn’t know about, at a time that suits them
  • 18. Content-based recommendations content We create a content representation (*): { "genres": { "science": 0.8, "nature": 0.2 } } (*) simplified for didactic purposes
  • 19. Content-based recommendations user We learn about the user indirectly ● news you read ● videos you watch ● things you search for ● quizzes you answer ● things you like ● things you comment on
  • 20. Content-based recommendations user We create a user representation (*): { "genres": { "science": 0.4, "folk-music": 0.5, "judo": 0.1 } } (*) simplified for didactic purposes
  • 21. Content-based recommendations prediction We use the user representation to search for content similar to it, using Elasticsearch. As an output, we have a ranked list of content.
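Slides 18 to 21 can be illustrated with a minimal sketch. The talk states that the search is done with Elasticsearch; the version below swaps in a plain in-memory cosine similarity so that it runs without a cluster, which is an assumption made for illustration and not the BBC implementation. The item ids, the two extra catalogue entries and the function names are hypothetical; only the two genre dictionaries come from the slides.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse genre-weight dictionaries."""
    dot = sum(weight * b.get(genre, 0.0) for genre, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_content(user, catalogue):
    """Return (item_id, score) pairs, most similar to the user first."""
    scored = [
        (item_id, cosine_similarity(user["genres"], item["genres"]))
        for item_id, item in catalogue.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# User representation from slide 20.
user = {"genres": {"science": 0.4, "folk-music": 0.5, "judo": 0.1}}

catalogue = {
    # Content representation from slide 18.
    "item-science": {"genres": {"science": 0.8, "nature": 0.2}},
    # Hypothetical items added so that the ranking has something to order.
    "item-folk": {"genres": {"folk-music": 1.0}},
    "item-cooking": {"genres": {"cooking": 1.0}},
}

ranked = rank_content(user, catalogue)
for item_id, score in ranked:
    print(f"{item_id}: {score:.3f}")
```

The output is a ranked list of content, as on slide 21. An item that shares no genre with the user scores zero, which is one way to see the lack-of-diversity risk of purely content-based approaches.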
  • 22. BBC+ app experiment ● How to get from algorithm to product ○ Start with content-based recommendations ○ Apply business rules
  • 23. Legal, editorial, GDPR, business values https://www.bbc.com/editorialguidelines/
  • 24. Legal Policies Programme: BBC Contempt of court ● The recommendations should not affect the outcome of a legal case ● The BBC can be held accountable for influencing the jury’s opinion Action ● Create a “contempt of court risk” label by detecting keywords such as arrest, assault, allegation etc ● Avoid items with this label
  • 25. Legal Policies Electoral law ● During elections we should not surface political content that could influence the vote Action ● Create a “political risk” label by detecting political content sources ● Avoid items when appropriate
  • 26. Editorial Policies Quality criteria ● Avoid content that shows little care has been taken in the metadata Action ● Avoid content with poor titles and descriptions
  • 27. Editorial Policies Under 16 audience ● Provide children-safe content ● BBC’s 9PM watershed Action ● Avoid items with warnings of sex, violence, strong language
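The "Action" items on slides 24 to 27 amount to labelling items and then filtering the ranked list. The sketch below is one possible shape for that step, under stated assumptions: the label names, the item fields (`title`, `description`, `warnings`) and the function names are hypothetical, and the keyword list contains only the three examples named on slide 24. The electoral and quality rules would presumably follow the same pattern.

```python
import re

# Keywords given as examples on slide 24; a real list would be longer.
CONTEMPT_KEYWORDS = ("arrest", "assault", "allegation")
# Warnings named on slide 27.
SENSITIVE_WARNINGS = {"sex", "violence", "strong language"}


def risk_labels(item):
    """Return the set of (hypothetical) risk labels that apply to an item."""
    labels = set()
    text = f"{item.get('title', '')} {item.get('description', '')}".lower()
    # Prefix match at a word boundary, so "arrested" also matches "arrest".
    if any(re.search(rf"\b{re.escape(word)}", text) for word in CONTEMPT_KEYWORDS):
        labels.add("contempt-of-court-risk")
    if SENSITIVE_WARNINGS & set(item.get("warnings", [])):
        labels.add("not-child-safe")
    return labels


def apply_business_rules(ranked_items, blocked_labels):
    """Drop items carrying any blocked label, preserving the ranking order."""
    return [item for item in ranked_items if not (risk_labels(item) & blocked_labels)]


items = [
    {"id": "a", "title": "Man arrested after city centre incident", "warnings": []},
    {"id": "b", "title": "The secret life of penguins", "description": "Nature documentary"},
    {"id": "c", "title": "Late night drama", "warnings": ["violence"]},
]

kept = apply_business_rules(items, {"contempt-of-court-risk", "not-child-safe"})
print([item["id"] for item in kept])
```

Passing the blocked labels as an argument keeps the policy decision (which labels to avoid for which audience) separate from the detection logic.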
  • 28. Cold start: human curation alongside automation
  • 29. GDPR Explainability ● Choose simple models over complex ones ● UI features to provide explanations Agency ● UI features for users to interact with the algorithm ● Eg. delete history items, like, dislike, report
  • 30. Curation values ● Affection ● Authenticity ● Compelling ● Fresh ● Warm ● Quirky ● Relatable ● Aspirational ● Entertaining ● Reassuring Reference: Anna McGovern “Website editor, manager, analyst and digital nurturer” at the BBC Much more than click rates
  • 31. Business values & objectives Quantitative offline evaluation ● NDCG, hit rate, diversity, recency, surprisal ● Prioritise diversity and recency over accuracy Qualitative offline evaluation ● Prioritise content for young audiences ● Prioritise content of editorial importance
  • 33. BBC+ app experiment Takeaways ● The editorial partnership is key to how we work ● The company’s principles are at the heart of all of our decisions ● There is a significant path between implementation and production readiness
  • 34. BBC Sounds Recommendations Challenging existing recs provider
  • 36. Recommended for you ● 9 to 12 items on native apps and web ● Current provider: content-based algorithm ○ Poor metadata, poor recommendations ○ Popularity biases towards heritage audience ○ Cold start using editorially curated lists ○ Opportunity for improvement of performance We decided to try a different approach: Factorisation Machines
  • 37. Recommendation strategy content-based How it works ● Given a user, find similar content to their preferences ● Characterising item using genres, masterbrand, etc. ● Based on user’s historical data and content metadata Challenges ● Potential lack of diversity and relies on good content description Where can we find this? ● “You may also be interested in …”
  • 38. Recommendation strategy collaborative filtering How it works ● Given a user, find similar users and the content they watched ● Based on all users’ historical data ● Uses implicit feedback (user-item interactions) Challenges ● Sparse matrix ○ SVMs are very efficient, except in sparse settings where there is not enough data to estimate interactions ● Cold start Where can we find this? ● “Customers who viewed this item, also viewed...”
  • 39. Recommendation strategy factorisation machine How it works ● Hybrid of content-based and collaborative filtering ● Combines SVM and factorisation techniques ● Based on all users’ historical data and content metadata ● Based on reliable information (latent features) ● Linear time complexity Reference: Academic Paper
  • 40. Recommendation strategy factorisation machine Example ● Estimate the interaction between Alice (A) and Star Trek (ST) a. There is no observation in which A rated ST, so a direct estimate gives w_{A,ST} = 0 b. Use the factorised interaction parameters ⟨v_A, v_ST⟩ instead c. Bob rated both SW and ST highly, so v_ST is likely to be similar to v_SW; the dot product of the factor vectors of A and ST will therefore be similar to that of A and SW
      User          Item                 Rating
      Alice (A)     Titanic (T)          5
      Alice (A)     Notting Hill (NH)    3
      Alice (A)     Star Wars (SW)       1
      Bob (B)       Star Wars (SW)       4
      Bob (B)       Star Trek (ST)       5
      Charlie (C)   Titanic (T)          1
      Charlie (C)   Star Wars (SW)       5
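The example on slide 40 can be made concrete with a small sketch of the second-order factorisation machine scoring function, using the standard linear-time reformulation of the pairwise term. This is an illustration only: the factor vectors are random and untrained, `K = 2` is an arbitrary choice, and the function names are hypothetical. It shows that the A/ST interaction is the dot product of two factor vectors, so it can be non-zero even though Alice never rated Star Trek.

```python
import random

USERS = ["Alice", "Bob", "Charlie"]
ITEMS = ["Titanic", "Notting Hill", "Star Wars", "Star Trek"]
FEATURES = USERS + ITEMS
K = 2  # number of latent factors (arbitrary choice for the sketch)


def one_hot(user, item):
    """Encode a (user, item) pair as one indicator per user and per item."""
    return [1.0 if name in (user, item) else 0.0 for name in FEATURES]


def fm_predict(x, w0, w, v):
    """Second-order FM score, computed in O(K * n) rather than O(K * n^2)."""
    linear = sum(w_i * x_i for w_i, x_i in zip(w, x))
    pairwise = 0.0
    for f in range(len(v[0])):
        total = sum(v_i[f] * x_i for v_i, x_i in zip(v, x))
        total_of_squares = sum((v_i[f] * x_i) ** 2 for v_i, x_i in zip(v, x))
        pairwise += 0.5 * (total * total - total_of_squares)
    return w0 + linear + pairwise


def fm_predict_naive(x, w0, w, v):
    """Same score with the explicit sum over all feature pairs, for checking."""
    n = len(x)
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = sum(
        sum(v[i][f] * v[j][f] for f in range(len(v[0]))) * x[i] * x[j]
        for i in range(n)
        for j in range(i + 1, n)
    )
    return w0 + linear + pairwise


rng = random.Random(0)
w0 = 0.0                                 # global bias (untrained)
w = [0.0] * len(FEATURES)                # per-feature weights (untrained)
v = [[rng.uniform(-1.0, 1.0) for _ in range(K)] for _ in FEATURES]

x = one_hot("Alice", "Star Trek")
score = fm_predict(x, w0, w, v)
print(score)
```

With zero bias and zero linear weights, the score for this one-hot input reduces to the dot product of the factor vectors of Alice and Star Trek. In a trained model those vectors would be learned from the ratings in the table, for example with stochastic gradient descent.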
  • 41. Qualitative Experiment Who ● ~30 test users recruited ○ From non-editorial and editorial teams from BBC audio networks ○ Under 35 How ● Two sets of recommendations displayed ● Users have to pick either the best list, or “both”, or “neither” ● And explain why
  • 42. Qualitative Experiment Feedback ● “Need to categorize speech vs music, background listening vs ‘serious’ content” ● “Need to consider the age of the item” ● “Looking for diverse content durations …” Reducing item/user biases helped to generate more personalised recommendations than the current state Results (users who preferred each option): Neither: 2 (7%); Content-based: 8 (28.5%); Hybrid approach: 17 (61%); Both: 1 (3.5%)
  • 45. The BBC Machine Learning Values 1. Audiences at the heart of everything we do. We celebrate diversity ○ Good value for money and focusing on using the audience-based data to improve their experience 3. Our algorithms serve our audiences equally and fairly, so that the full breadth of the BBC is available to everyone 6. Algorithms form only part of the content discovery process for our audiences, and sit alongside (human) editorial curation Reference: Gabriel Straub (BBC)
  • 47. Flourishing in the age of AI ● Research ● 11,000 people ● 7 markets ● What people want from their lives ● How technology might enable that Reference: Flourishing in AI report
  • 48. Flourishing in the age of AI “(...) people in the UK don’t think technology is being developed with their best interests at heart” Reference: Flourishing in AI report
  • 49. Flourishing in the age of AI Reference: Flourishing in AI report ● How satisfied are you with your life? ● To what extent are the things you do in life worthwhile? ● How anxious did you feel yesterday? Base: 5432, May 2019
  • 50. Flourishing in the age of AI Reference: Flourishing in AI report
  • 51. Flourishing in the age of AI Reference: Flourishing in AI report
  • 52. Flourishing in the age of AI Reference: Flourishing in AI report
  • 53. Flourishing in the age of AI Reference: Flourishing in AI report
  • 56. Bonus BBC Radio 1 studios tour
  • 60. Link to video from Jacob Rickard
  • 63. Ethical Machine Learning ● How do you make decisions about what is fair? ● What metrics can you use? ● How can you achieve ethical machine learning in your work? Reference: Avoiding the Fate of Icarus (Medium)
  • 64. дуже тобі дякую (Ukrainian: thank you very much) Спасибо (Russian: thank you) Image credit: Wikimedia Commons @tati_alchueyr

Editor's Notes

  1. Russian: “мне приятно быть здесь с тобой” means “it's a pleasure to be here with you”; “большое Вам спасибо” means “thank you very much”.
  2. UK population: 66.44 million. Ukraine: ~42.22 million. Worldwide population: 7.7 billion people as of April 2019. Image from Seven Worlds, One Planet. ~12 million penguins live in Antarctica: https://oceanites.org/wp-content/uploads/2019/06/SOAP-2019-Online.pdf
  3. Program: Made by Machine: when AI met the archive https://www.bbc.co.uk/rd/blog/2018-09-artificial-intelligence-archive-made-machine https://www.bbc.co.uk/programmes/b0bhwk3p
  4. The General Data Protection Regulation 2016/679 is a regulation in EU law on data protection and privacy for all individual citizens of the European Union and the European Economic Area. It also addresses the transfer of personal data outside the EU and EEA
  5. (normalised) Discounted cumulative gain (DCG) is a measure of ranking quality. In information retrieval, it is often used to measure effectiveness of web search engine algorithms or related applications. Using a graded relevance scale of documents in a search-engine result set, DCG measures the usefulness, or gain, of a document based on its position in the result list. The gain is accumulated from the top of the result list to the bottom, with the gain of each result discounted at lower ranks.
  6. Ukrainian: “приємно бути тут з тобою” (pryyemno buty tut z toboyu) means “it's a pleasure to be here with you”; “дуже тобі дякую” (duzhe tobi dyakuyu) means “thank you very much”.
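Note 5 describes (normalised) discounted cumulative gain, the first offline metric listed on slide 31. A minimal sketch follows, assuming the common formulation in which the gain at rank r is divided by log2(r + 1); the function names and the graded relevance scores in the example are hypothetical and not taken from the talk.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance scores listed in ranked order."""
    return sum(
        relevance / math.log2(rank + 1)
        for rank, relevance in enumerate(relevances, start=1)
    )


def ndcg(relevances, k=None):
    """DCG of the first k results divided by the DCG of the ideal ordering."""
    ideal = sorted(relevances, reverse=True)
    ideal_dcg = dcg(ideal[:k])
    if ideal_dcg == 0.0:
        return 0.0
    return dcg(relevances[:k]) / ideal_dcg


# Hypothetical graded relevance of six recommended items, in ranked order.
ranking = [3, 2, 3, 0, 1, 2]
print(f"DCG:  {dcg(ranking):.3f}")
print(f"NDCG: {ndcg(ranking):.3f}")
```

A perfectly ordered list scores 1.0, and relevant items placed lower in the list contribute less. Slide 31 notes that the team prioritises diversity and recency over accuracy, so a metric like this is one input among several.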