Scaling machine learning
workflows with Apache Beam
Tatiana Al-Chueyr
Senior Data Engineer
Online, 24 October 2020 @tati_alchueyr
Good afternoon!
@tati_alchueyr
Multi
NIX Conf
Good day!
tati.__doc__
● Brazilian living in London since 2014
● Senior Data Engineer at the BBC Datalab team
● Graduated in Computer Engineering at Unicamp, Brazil
● Passionate software developer for 16 years
● Experience in the private and public sectors
● Developed software for Medicine, Media and Education
I ❤ Ukraine
In 2019, Amanda and I went to Kharkiv for 3 days, when:
● We were Keynote Speakers at OctopusCon
● We lectured at the Kharkiv National University of Radio Electronics
● I was really impressed with the Ukrainian Tech Community
● We had a Dolphin therapy session at the Nemo Dolphinarium
Credit: @obestwalter
Credit: OctopusCon
BBC: British Broadcasting Corporation
● Founded in 1922
● In the UK…
○ The BBC has no advertisements
○ If a resident wants to watch the BBC, they pay for a TV Licence
● Values
○ Independent, impartial and honest
○ Audiences are at the heart of everything we do
● Purpose
Inform + Educate + Entertain
bbc.stats()
● BBC TV reaches 91% of the UK adult population
● BBC News reaches a global audience of 426 million weekly
Reference 1: BBC
Reference 2: BBC
Image Credit: BBC
bbc.stats()
~2,000 pieces of BBC content are produced every day...
and there is a limited number of available slots to occupy!
BBC.
Vision
For the BBC to be a leader in Machine Learning that
delights audiences and prioritises the needs of
individuals and society over corporations and states.
Mission
To develop and deploy Machine Learning at BBC scale
so that teams can tailor services to individuals whilst
upholding our editorial values.
Pre-lockdown (subset of) Datalab team members (15 August 2019)
BBC. .
Locked-down (subset of) Datalab team members (19 March 2020)
BBC. .
COVID-19 pandemic
recommendation engine
the challenge
The BBC outsourced a recommendation engine
The audience liked personalised recommendations
Could we replace it with our own recommender?
recommendation engine
principles
Content-based approach
Collaborative-filtering approach
Hybrid approach e.g. Factorisation Machine Algorithm
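As a rough illustration of how a factorisation machine can mix user, item and metadata features, the sketch below scores one feature vector with the standard second-order factorisation machine equation (bias, linear terms, plus pairwise interactions weighted by dot products of latent vectors). The helper `fm_score`, the feature layout and every number are made-up toy values for illustration, not the model used in the talk.

```python
from itertools import combinations


def fm_score(x, w0, w, v):
    """Second-order factorisation machine score for one feature vector.

    x: feature values, w0: global bias, w: per-feature weights,
    v: per-feature latent vectors (all illustrative toy values).
    """
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for i, j in combinations(range(len(x)), 2):
        # Interaction strength is the dot product of the two latent vectors.
        dot = sum(a * b for a, b in zip(v[i], v[j]))
        pairwise += dot * x[i] * x[j]
    return w0 + linear + pairwise


# Hypothetical features: [is_user_A, is_item_X, item_is_comedy]
x = [1.0, 1.0, 1.0]
w0 = 0.1
w = [0.2, 0.3, 0.4]
v = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
score = fm_score(x, w0, w, v)
print(round(score, 6))
```

The user and item indicators play the collaborative-filtering role, while the metadata feature plays the content-based role, which is why the approach is called hybrid.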
Machine learning workflow
recommendation engine
the prototype
The prototype
1-2 months of work:
● Collected data (quick-and-dirty™ scripts)
● Compared existing Python Factorisation Machines libraries (winner: LightFM)
● Trained and predicted recommendations (quick-and-dirty™ scripts)
● Implemented a qualitative experiment tool
● Recruited volunteers to join the qualitative experiment
● Ran qualitative experiment, comparing:
○ External provider recommendations
○ Our own Factorisation Machines-powered recommendations
Qualitative experiment: how
Who
● ~30 test users recruited
○ Internal BBC employees
○ Under 35
How
● Two sets with 9 recommendations each:
○ External provider
○ Internal factorisation machines
● Users, without knowing the origin of the recs, had to:
○ choose “the best”, “both”, or “neither”
○ explain why
Qualitative experiments
[Chart comparing responses: neither, external provider, factorisation machines, both]
recommendation engine
productionising
Productionising machine learning
● Configuration
● Data Collection and Transformation
● Feature Extraction
● Data Verification
● Machine Resource Management
● Serving Infrastructure
● Monitoring
● Process Management Tools
● Analysis Tools
● ML Code
Image copied from presentation by Googler @mpyeager
Machine learning workflow
Input: User activity data, Content metadata
Processing: Machine Learning model training, Predict recommendations
Output: Recommendations
Machine learning workflow
Input: User activity data, Content metadata
Processing: Machine Learning model training, Predict recommendations
Business Rules, part I - Non-personalised
- Recency
- Availability
- Excluded Masterbrands
- Excluded genres
Business Rules, part II - Personalised
- Already seen items
- Local radio (if not consumed previously)
- Specific language (if not consumed previously)
- Episode picking from a series
- Diversification (1 episode per brand/series)
Output: Recommendations
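To make the two groups of business rules more concrete, here is a minimal sketch covering a subset of them (recency, availability, excluded masterbrands and genres, already seen items, and one episode per brand). The item fields, function names and thresholds are hypothetical and chosen only for illustration; the real rules are likely more involved.

```python
from datetime import date


def apply_non_personalised_rules(items, today, max_age_days,
                                 excluded_masterbrands, excluded_genres):
    """Part I: filters that are identical for every user."""
    kept = []
    for item in items:
        if (today - item["published"]).days > max_age_days:  # recency
            continue
        if not item["available"]:  # availability
            continue
        if item["masterbrand"] in excluded_masterbrands:
            continue
        if item["genre"] in excluded_genres:
            continue
        kept.append(item)
    return kept


def apply_personalised_rules(ranked_items, seen_ids):
    """Part II: drop already seen items, keep one episode per brand."""
    result, used_brands = [], set()
    for item in ranked_items:
        if item["id"] in seen_ids or item["brand"] in used_brands:
            continue
        used_brands.add(item["brand"])
        result.append(item)
    return result


def make_item(item_id, brand, genre, published, available=True):
    # Hypothetical item shape used only in this sketch.
    return {"id": item_id, "brand": brand, "masterbrand": "radio_x",
            "genre": genre, "published": published, "available": available}


catalogue = [
    make_item("e1", "b1", "music", date(2020, 10, 20)),
    make_item("e2", "b1", "music", date(2020, 10, 21)),   # same brand as e1
    make_item("e3", "b2", "news", date(2020, 10, 22)),    # excluded genre
    make_item("e4", "b3", "comedy", date(2020, 1, 1)),    # too old
    make_item("e5", "b4", "comedy", date(2020, 10, 23)),  # already seen
    make_item("e6", "b5", "comedy", date(2020, 10, 23), available=False),
    make_item("e7", "b6", "comedy", date(2020, 10, 19)),
]
candidates = apply_non_personalised_rules(
    catalogue, today=date(2020, 10, 24), max_age_days=30,
    excluded_masterbrands=set(), excluded_genres={"news"})
recommended = apply_personalised_rules(candidates, seen_ids={"e5"})
print([item["id"] for item in recommended])
```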
Steps to be done in the workflows, before the API
Input: User activity data, Content metadata
Processing: Machine Learning model training, Predict recommendations
Business Rules, part I - Non-personalised
- Recency
- Availability
- Excluded Masterbrands
- Excluded genres
Business Rules, part II - Personalised
- Already seen items
- Local radio (if not consumed previously)
- Specific language (if not consumed previously)
- Episode picking from a series
- Diversification (1 episode per brand/series)
Output: Recommendations
Recommendation API strategies
A. On the fly: the API uses the model, user activity and content metadata; it predicts & applies rules
B. Precompute: the API retrieves pre-computed recommendations from cached recs
Goal: 1500 requests/s with P95 responses < 60 ms
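The difference between the two strategies can be sketched in a few lines. Everything below is a toy stand-in: `toy_model`, the in-memory `cache` dictionary and the data are hypothetical, and a real service would use a trained model and an external cache.

```python
def toy_model(user_id, catalogue):
    """Hypothetical stand-in for model prediction: items ordered by score."""
    return sorted(catalogue, key=catalogue.get, reverse=True)


def serve_on_the_fly(user_id, catalogue, seen):
    """Strategy A: predict and apply rules inside each request."""
    ranked = toy_model(user_id, catalogue)
    return [item for item in ranked if item not in seen.get(user_id, set())]


def precompute(user_ids, catalogue, seen):
    """Batch job for strategy B: do the expensive work ahead of time."""
    return {u: serve_on_the_fly(u, catalogue, seen) for u in user_ids}


def serve_precomputed(user_id, cache):
    """Strategy B: each request is a single key lookup."""
    return cache.get(user_id, [])


catalogue = {"ep1": 0.9, "ep2": 0.7, "ep3": 0.4}
seen = {"alice": {"ep1"}}
cache = precompute(["alice", "bob"], catalogue, seen)
print(serve_precomputed("alice", cache))
```

Both strategies return the same list; the trade-off is where the cost is paid: per request in A, or in a batch workflow ahead of time in B.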
Recommendation API: load performance
Metric                               On the fly   Precomputed   Precomputed
Concurrent load tests (requests/s)   50           50            1500
Success percentage                   63.88%       100%          100%
Latency of p50 (success)             323.78 ms    1.68 ms       4.75 ms
Latency of p95 (success)             939.28 ms    3.21 ms       57.53 ms
Latency of p99 (success)             979.24 ms    4.51 ms       97.49 ms
Maximum successful requests/s        23           50            1500
Goal: 1500 requests/s with P95 responses < 60 ms
Machine type: c2-standard-8, Python 3.7, Sanic workers: 7, Prediction threads: 1, vCPU cores: 7, Memory: 15 Gi, Deployment Replicas: 1
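A small check of the reported figures against the goal, using only the numbers from the table above. The dictionary layout and the `meets_goal` helper are illustrative assumptions; treating a 100% success rate as part of the goal is also an assumption.

```python
GOAL_RPS = 1500
GOAL_P95_MS = 60.0

# Figures copied from the load-test table.
results = [
    {"strategy": "on the fly", "load_rps": 50, "success_pct": 63.88,
     "p95_ms": 939.28, "max_rps": 23},
    {"strategy": "precomputed", "load_rps": 50, "success_pct": 100.0,
     "p95_ms": 3.21, "max_rps": 50},
    {"strategy": "precomputed", "load_rps": 1500, "success_pct": 100.0,
     "p95_ms": 57.53, "max_rps": 1500},
]


def meets_goal(result):
    """True when throughput, p95 latency and success rate all pass."""
    return (result["max_rps"] >= GOAL_RPS
            and result["p95_ms"] < GOAL_P95_MS
            and result["success_pct"] == 100.0)


passing = [(r["strategy"], r["load_rps"]) for r in results if meets_goal(r)]
print(passing)
```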
Strategies to serve recommendations
A. On the fly: the API uses the model, user activity and content metadata; it predicts & applies rules
B. Precompute: the API retrieves pre-computed recommendations from cached recs
Steps to be done in the workflows, before the API
Input: User activity data, Content metadata
Processing: Machine Learning model training, Predict recommendations
Business Rules, part I - Non-personalised
- Recency
- Availability
- Excluded Masterbrands
- Excluded genres
Business Rules, part II - Personalised
- Already seen items
- Local radio (if not consumed previously)
- Specific language (if not consumed previously)
- Episode picking from a series
- Diversification (1 episode per brand/series)
Output: Precomputed recommendations
recommendation engine
workflows orchestration
Workflows orchestration: requirements
● Scheduling recurrent jobs
● Retrying a task if it fails
● Task dependency management
● Monitoring and logs
● Ability to define workflows programmatically (directed acyclic graphs)
● Built-in support for writing automated tests
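Two of these requirements, dependency management and retries, can be sketched with the standard library alone. This is not how an orchestrator is implemented; `run_workflow`, the task names and the retry policy are hypothetical and only show what "run tasks in DAG order and retry on failure" means.

```python
from graphlib import TopologicalSorter


def run_workflow(dag, tasks, max_retries=2):
    """Run tasks in dependency order, retrying each failed task."""
    log = []
    for name in TopologicalSorter(dag).static_order():
        for attempt in range(1, max_retries + 2):
            try:
                tasks[name]()
                log.append((name, attempt, "success"))
                break
            except Exception:
                log.append((name, attempt, "failed"))
        else:
            # Every attempt failed: stop the whole workflow.
            raise RuntimeError(f"task {name} failed after {max_retries + 1} attempts")
    return log


# Each key lists the tasks it depends on.
dag = {"collect": set(), "train": {"collect"}, "predict": {"train"}}

calls = {"train": 0}


def flaky_train():
    # Fails on the first call only, to exercise the retry path.
    calls["train"] += 1
    if calls["train"] == 1:
        raise ValueError("transient failure")


tasks = {"collect": lambda: None, "train": flaky_train, "predict": lambda: None}
log = run_workflow(dag, tasks)
print(log)
```

A workflow orchestrator adds scheduling, monitoring and logs on top of this basic loop.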
Workflows orchestration: Apache Airflow
Google Managed Apache Airflow: Cloud Composer
Cloud Composer: monitoring
Limitation of Apache Airflow
● Good for orchestrating tasks
● Not good for processing a data-intensive task within an Airflow worker
Issue: depending on the volume of data, a single PythonOperator task which usually takes 10 min could take almost 3 h!
Consequences: overall delay and a blocked worker
Limitation of Apache Airflow
Time estimations (in seconds) to predict recommendations using a c2-standard-30 instance (30 vCPU and 120 GB RAM)
2 h to predict recommendations for 10k users
What about 5 million users - or more?
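A back-of-the-envelope extrapolation of the figure above, assuming (and this is only an assumption) that prediction time grows linearly with the number of users on the same single machine:

```python
def estimated_hours(users, hours_per_batch=2.0, users_per_batch=10_000):
    """Linear extrapolation from the measured 2 h per 10k users."""
    return users / users_per_batch * hours_per_batch


hours = estimated_hours(5_000_000)
days = hours / 24
print(f"{hours:.0f} hours, about {days:.1f} days")
```

Under that assumption a single run for 5 million users would take around 1000 hours, which is why the work has to be split across machines.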
Limitation of Apache Airflow: solutions
Delegating processing to other services
● Tasks which scale vertically (better hardware)
○ Airflow Compute Engine (Virtual Machine) Operator (GceInstanceStartOperator)
○ Airflow Kubernetes Pod Operator (GKEPodOperator)
● Tasks which scale horizontally (can be split and distributed across multiple nodes)
○ Airflow Dataflow Operator (Google Dataflow, Apache Beam)
○ Airflow Dataproc Operator (Google Dataproc, Apache Spark & Hadoop)
recommendation engine
efficient data processing
Apache Beam
“Apache Beam is a unified programming model designed to provide efficient and portable data processing pipelines”
Apache Beam
https://towardsdatascience.com/running-an-apache-beam-data-pipeline-on-azure-databricks-c09e521d8fc3
Apache Beam: overview of Dataflow job
Image from the book “Google Cloud Platform In Action” by JJ Geewax, Chapter 20
Parallel processing “effortlessly”
Simple Beam example
https://beam.apache.org/documentation/transforms/python/aggregation/cogroupbykey/
Adoption of Apache Beam & Dataflow
“Serverless” parallel processing of 41,258,135 items (27.32 GB) with Python in 1 min 24 s using 10 default workers
s/PythonOperator/DataflowOperator
Computation time reduced by almost one order of magnitude
● PythonOperator: pure Airflow, running in Cloud Composer
● DataflowOperator: running a Beam pipeline within Dataflow
Document type          PythonOperator   DataflowOperator   Performance gain
episode                60 min           6 min              90%
availability episode   12 min           5 min              58%
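The performance gain column can be reproduced from the two timing columns. The helper names below are illustrative.

```python
def performance_gain(before_min, after_min):
    """Percentage of computation time saved."""
    return (before_min - after_min) / before_min * 100


def speedup(before_min, after_min):
    """How many times faster the new run is."""
    return before_min / after_min


for doc_type, before, after in [("episode", 60, 6),
                                ("availability episode", 12, 5)]:
    print(doc_type,
          f"{performance_gain(before, after):.0f}% saved,",
          f"{speedup(before, after):.1f}x faster")
```

The 10x speed-up for the episode documents is the one that corresponds to an order of magnitude; the availability episode documents improve by a factor of 2.4.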
Precomputing recs for millions of users
recommendation engine
beam/dataflow gotchas
Quiz time
https://forms.gle/CxhnDU4wd55hmgQX7
To Beam or not to Beam?
● 8.4 GiB distributed across 130 Parquet files
● Task: read only one of the columns and export it to new files
● Three implementations:
○ Single-threaded PyArrow on my computer (quad-core, 16 GB RAM)
○ Dataflow with autoscaling, up to 10 default workers
○ Dataflow with a fixed number of 10 workers
● Which is the most efficient in terms of vCPU, memory and time?
To Beam or not to Beam?
Metric         PyArrow        Dataflow (autoscaling)   Dataflow (fixed workers)
Time           3m56.355s      12m27.314s               7m44.518s
Total vCPU     0.05 vCPU hr   0.997 vCPU hr            0.979 vCPU hr
Total memory   0.016 GB hr    3.739 GB hr              3.673 GB hr
Does a better machine mean faster?
n1-standard-1
● 1 vCPU
● 3.75 GB RAM
n1-standard-4
● 4 vCPU
● 15 GB RAM
Error message from worker: ConnectionReset
A single “executor” within each worker (VM) needed 10 GB...
https://stackoverflow.com/questions/63705660/optimising-gcp-costs-for-a-memory-intensive-dataflow-pipeline
Solutions for memory-intensive Beam transformations
● Use a custom machine type with extended memory
● Use the shared memory feature available since Beam 2.24
Cost analysis
https://cloud.google.com/dataflow/pricing
Resources metrics per job
Cost reduction
$300 per run
[Diagram highlighting the memory-intensive transformation]
Solutions
● Use shared memory
● Split pipelines so that only the memory-intensive transformation uses expensive machine types.
http://datalab.rocks
Thank you very much! (in Ukrainian, Russian and English)

Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software Solutions
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 

Scaling machine learning workflows with Apache Beam

Editor's Notes

  1. Thank you very much! (said in Ukrainian, "duzhe tobi dyakuyu", and in Russian, "bol'shoye spasibo")
  2. Founded in 1922. “Our organisation exists in order to serve individuals and society as a whole rather than a small set of stakeholders.”
  3. UK population: 66.44 million. Ukraine: ~42.22 million. Worldwide population: 7.7 billion people as of April 2019. Image from Seven Worlds, One Planet. ~12 million penguins live in Antarctica: https://oceanites.org/wp-content/uploads/2019/06/SOAP-2019-Online.pdf
  4. Multi-disciplinary team: Architecture, Data Science, Editorial, Engineering, Product Management, Project Management.
  5. Thank you very much! (said in Ukrainian, "duzhe tobi dyakuyu", and in Russian, "bol'shoye spasibo")