niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer
Machine Learning is more than Algorithms
A Consultant's Perspective on the Industry and the Job Market
Image source: https://cloud.google.com/blog/products/application-development/a-cloud-built-for-developers-2021-year-in-review
Agenda
Introduction
Example Projects
Tooling
Job Market Perspectives
Key Takeaways
Feel free to ask questions right away!
I will then keep an eye on the clock.
$ whoami
Niklas Haas
Living in Düsseldorf, NRW
Data Scientist turned ML Engineer
Graduated in 2017 in Industrial Engineering and Management from the Karlsruhe Institute of Technology (KIT). Curriculum focus on Statistics, Operations Research, and Information Technology
With codecentric AG in Solingen since 2018
codecentric has a 4+1 model: we work 4 days per week for the customer and have 1 day per week for learning and for developing ourselves and/or the company (administrative tasks are also included in this time)
Project history
Machine Learning Microservice for the Industry 4.0 platform
Create a microservice for early detection of a serious production failure and integrate it into the existing Industry 4.0 platform.
Customer Lifecycle Recommendations
Scaling algorithms for the detection of customer churn. Migrating an on-prem solution for personalized customer communication to GCP and improving the product in close collaboration with its users.
Recommendations in Wholesale
Set up a scalable ML system for personalized product recommendations on the Google Cloud Platform (GCP) following MLOps principles. Evaluating ML models using AB testing.
Industrial Analytics in Renewable Energy
Unsupervised pattern recognition on wind turbine data. Using automated feature engineering and Bayesian clustering to build a transparent and verifiable decision support system (Decision Tree). Results are presented in an interactive dashboard.
“Sexiest Job of the 21st Century” - said in 2012
The Data Scientist job was googled a lot thereafter
How relevant is the Data “Scientist” for the industry?
OR
How much “deep” learning do you need?
In real-life ML systems, there are many things to consider
Image source: https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
That might be what is expected of a Data Scientist.
However, this alone does not add value to the business, as it is unlikely to go beyond the “Proof of Concept” stage.
As of 2020:
“Gartner research shows only 53% of projects make it from artificial intelligence (AI) prototypes to production.”
https://www.gartner.com/en/newsroom/press-releases/2020-10-19-gartner-identifies-the-top-strategic-technology-trends-for-2021#:~:text=Gartner%20research%20shows%20only%2053,a%20production%2Dgrade%20AI%20pipeline.
From our experience, covering all aspects of an ML system requires a ratio of Data Scientists to {Data, ML, DevOps} Engineers of around 1:3, at least in the project ramp-up phase.
Example Projects
Our projects in the Gartner “AI” Hype Cycle
Image source: https://www.gartner.com/en/articles/the-4-trends-that-prevail-on-the-gartner-hype-cycle-for-ai-2021
Our projects represented in “State of AI” 2021 (2nd pandemic year)
Image source: https://www.mckinsey.com/business-functions/quantumblack/our-insights/global-survey-the-state-of-ai-in-2021
Machine Learning Microservice for the Industry 4.0 platform
Reference: https://www.youtube.com/watch?v=WywQm0wHLvA
Reference: https://www.kampf.de/de/digitale-produkte/theadvanced/
Recommendations in Wholesale
References:
https://www.codecentric.de/success-stories/metro-digital
https://cloud.google.com/bigquery-ml/docs/bigqueryml-mf-implicit-tutorial
https://developers.google.com/machine-learning/recommendation/collaborative/matrix
https://cloud.google.com/retail/recommendations-ai/docs/create-models
Using matrix factorization with implicit feedback (the customer did not give an explicit rating but gave implicit feedback).
Use Cases:
- Ranked Promotions
- “Others you may like” (not “Frequently bought together”: that is better calculated using basket mining, because it has a different objective: click-through rate vs. conversion rate).
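To make the idea of matrix factorization with implicit feedback more concrete, here is a minimal pure-Python sketch using alternating least squares with confidence weights. It is an illustration only, not the implementation used in the project (the references above point to BigQuery ML for that). The purchase counts in `R`, the function names, and the hyperparameters `alpha` and `reg` are assumptions made up for this example.

```python
import random

def solve(A, b):
    """Solve the small linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[k][n] / M[k][k] for k in range(n)]

def update(fixed, conf_row, pref_row, reg):
    """Regularized weighted least-squares update for one user (or item) vector."""
    k = len(fixed[0])
    A = [[reg if a == b else 0.0 for b in range(k)] for a in range(k)]
    rhs = [0.0] * k
    for vec, c, p in zip(fixed, conf_row, pref_row):
        for a in range(k):
            rhs[a] += c * p * vec[a]
            for b in range(k):
                A[a][b] += c * vec[a] * vec[b]
    return solve(A, rhs)

def implicit_als(R, factors=2, alpha=40.0, reg=0.1, iters=20, seed=0):
    """Factorize an interaction-count matrix R into user and item factors."""
    rnd = random.Random(seed)
    n_users, n_items = len(R), len(R[0])
    # Preference: did the customer interact at all? Confidence: how often?
    pref = [[1.0 if r > 0 else 0.0 for r in row] for row in R]
    conf = [[1.0 + alpha * r for r in row] for row in R]
    pref_t, conf_t = list(zip(*pref)), list(zip(*conf))
    Y = [[rnd.gauss(0, 0.1) for _ in range(factors)] for _ in range(n_items)]
    for _ in range(iters):
        X = [update(Y, conf[u], pref[u], reg) for u in range(n_users)]
        Y = [update(X, conf_t[i], pref_t[i], reg) for i in range(n_items)]
    return X, Y

def score(X, Y, user, item):
    """Predicted preference of a user for an item (dot product of factors)."""
    return sum(a * b for a, b in zip(X[user], Y[item]))

# Hypothetical purchase counts: rows are customers, columns are products.
R = [
    [3, 1, 0, 0, 0, 0],
    [2, 4, 1, 0, 0, 0],
    [1, 2, 5, 0, 0, 0],
    [0, 0, 0, 2, 1, 3],
    [0, 0, 0, 4, 2, 1],
    [0, 0, 0, 1, 3, 2],
]
X, Y = implicit_als(R)
unseen = [i for i, r in enumerate(R[0]) if r == 0]
ranking = sorted(unseen, key=lambda i: score(X, Y, 0, i), reverse=True)
print("Others you may like for customer 0:", ranking)
```

The ranking only covers products the customer has not bought yet, which matches the “Others you may like” use case rather than basket mining.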
Customer Lifecycle Recommendations
Reference:
https://dzone.com/articles/xgboost-a-deep-dive-into-boosting (image taken from here)
https://www.codecentric.de/success-stories/metro-digital
https://towardsdatascience.com/churn-prediction-3a4a36c2129a
https://github.com/dmlc/xgboost
https://github.com/slundberg/shap
Target variable: will the customer buy something in the next 3 months?
Boosting = sequential optimization of Decision Trees, where each new tree corrects the errors of the previous ones
SHAP for model explanation
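The “sequential optimization” idea behind boosting can be sketched in a few lines of pure Python. This is a toy version with one-split trees (stumps) and squared loss, not xgboost itself. The single feature (days since last purchase), the labels, and the function names are hypothetical and only serve the illustration.

```python
def fit_stump(xs, residuals):
    """Fit a one-split decision tree (stump) to the residuals by least squares."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best  # assumes at least two distinct feature values
    return lambda x: lm if x <= t else rm

def boost(xs, ys, rounds=20, lr=0.5):
    """Sequentially add stumps, each one fitted to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + lr * sum(s(x) for s in stumps)

# Hypothetical data: days since last purchase -> bought again within 3 months?
xs = [3, 10, 14, 30, 45, 60, 90, 120, 200, 365]
ys = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

model = boost(xs, ys)
mean_y = sum(ys) / len(ys)
mse_base = sum((y - mean_y) ** 2 for y in ys) / len(ys)
mse_model = sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(f"baseline MSE: {mse_base:.3f}, boosted MSE: {mse_model:.3f}")
```

In a real project you would use xgboost with many features and a proper train/test split. The sketch only shows why the trees are fitted one after another.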
Industrial Analytics in Renewable Energy
Semi-supervised pattern recognition on wind turbine data
1. Infer patterns empirically from data
2. Classify/interpret data with domain knowledge and assign classes
3. Build higher-level analysis and models on these classes
NLP: Automating document processing - Sherloq (cc project)
Tooling
There is no shortage of tooling - Linux Foundation Data & AI Landscape
Image source: https://landscape.lfai.foundation/
The amount of available open source tooling is ridiculous. It is like the Wild West. Innovation is everywhere.
Strategy that works for us:
Be aware of your own FOMO (fear of missing out) and ignore it.
Get proficient in the tools you use.
Discuss with colleagues / the community what they use, to get inspired.
Regularly try out new tools.
If a tool gives you a productivity boost, adopt it.
Do NOT choose tooling simply because it is new/cool/promising (sometimes referred to as “tech porn”).
Also: Do NOT replace one tool with another unless it adds real value.
Before we start: Python vs R (caution: my opinion)
Area of application
  Python: General purpose with performance-optimized packages for statistical computing/ML
  R: Statistical computing
Contributors
  Python: Professional Software Engineers from all over the world, also from Google et al.
  R: Academia
(Unit-)Testing & Linting capabilities
  Python: Excellent and prominent (pytest, flake8, black, yapf)
  R: Exists, but I never saw it used in the wild
Development Environments
  Python: Depending on the use: VSCode, PyCharm, Spyder (Matlab/RStudio clone), Jupyter
  R: I saw basically only RStudio in the wild
API development (for example for model serving)
  Python: Many alternatives: flask, FastAPI, Django
  R: plumber? (I never used it)
Dashboarding & Data Apps
  Python: Plotly / Dash, Streamlit
  R: Shiny
Documentation and QA
  Python: websites, GitHub, StackOverflow
  R: websites, GitHub, CRAN (awful, looks like it is from the 90s), a little bit on StackOverflow
Verdict (my opinion!)
  Python: No doubt the relevant language for the industry. Learning opportunities are abundant. Using Python will improve your coding skills. A lot of 3rd party software has APIs in Python.
  R: No demand for R in the industry except in specialized areas. Mainly academia. Using R will almost surely not improve your coding skills, because that is not a focus of the community.
Without an “MLOps mindset”, the situation might look like this
Diagram labels: Data; ...; Notebooks with data storytelling (can turn very long); Model artifacts, processed data
MLOps Level 1: manual experimentation + automated pipeline
Image source: https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
Experimentation and Data Storytelling: JupyterLab
For experimentation and data storytelling
Source: https://jupyter.org/
Source: https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html
Reference: https://github.com/jupyterlab/jupyterlab
Interactive Data Apps: Streamlit
For building data apps (comparable to R Shiny)
Source: https://streamlit.io/
Source: https://share.streamlit.io/data-science-at-swast/handover_poc/main/handover.py
Source: https://github.com/streamlit/streamlit
Experiment Tracking: MLflow
For experiment and metadata tracking. Integrates nicely with all popular ML frameworks.
Source: https://mlflow.org/
Source: https://towardsdatascience.com/managing-your-machine-learning-experiments-with-mlflow-1cd6ee21996e
SQL (+ DBT)
You will need it. Everywhere.
Some things don’t need a fancy ML model; they can be done using SQL :-)
DBT is a great tool! You can write data tests and auto-generate data lineage.
Source: https://console.cloud.google.com/bigquery
Source: https://www.postgresql.org/
Source: https://www.getdbt.com/
Source: https://www.datatask.io/blog/workflow-dbt-materialisations-documentation/
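As a small illustration of “no fancy ML model needed”, here is a sketch of a best-seller baseline computed purely in SQL, run through Python's built-in `sqlite3` module so that it is self-contained. The `orders` table and its contents are made up for this example. The second query is written in the spirit of a DBT data test, which passes when the query returns no rows.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer_id INTEGER, product TEXT, quantity INTEGER);
INSERT INTO orders VALUES
  (1, 'coffee', 3), (1, 'milk', 1),
  (2, 'coffee', 2), (2, 'sugar', 4),
  (3, 'milk', 2),   (3, 'coffee', 1);
""")

# A simple "most popular products" baseline: no model, just aggregation.
top_products = conn.execute("""
SELECT product, SUM(quantity) AS total
FROM orders
GROUP BY product
ORDER BY total DESC, product ASC
LIMIT 2
""").fetchall()

# Data test: select the rows that violate an expectation.
# An empty result means the test passes.
bad_rows = conn.execute("""
SELECT * FROM orders
WHERE quantity <= 0 OR customer_id IS NULL
""").fetchall()

print("top products:", top_products)
print("data test passed:", not bad_rows)
```

A baseline like this is also a useful yardstick: an ML recommender has to beat it to justify its complexity.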
Dashboarding and Apps for Stakeholders: Metabase
Nice tool to discover data from databases with dashboards. Share dashboards and data stories with stakeholders. An authentication mechanism is included! Start it in a Docker container:
docker run -d -p 3000:3000 --name metabase metabase/metabase
Source: https://github.com/metabase/metabase
Source: https://www.metabase.com/start/oss/
Algorithms: Sklearn / xgboost / Tensorflow
Good old sklearn is the industry standard. Xgboost often delivers the best results. For image processing, you might use fine-tuned Tensorflow models.
Source: https://en.wikipedia.org/wiki/Scikit-learn
Source: https://scikit-learn.org/stable/
Source: https://xgboost.readthedocs.io/en/stable/
Source: https://github.com/tensorflow/tensorflow
Pipelining: Kedro / Dagster / Kubeflow
Split your ML tasks into logical and self-contained steps and combine them in pipelines.
Use kedro / dagster for lightweight on-machine tasks, and Kubeflow for heavyweight scaling on Kubernetes (k8s).
Source: https://github.com/kedro-org/kedro
Source: https://github.com/dagster-io/dagster
Source: https://www.kubeflow.org/
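The idea of self-contained steps combined into a pipeline can be sketched in plain Python. This is not the API of kedro, dagster, or Kubeflow (each has its own way of declaring steps and their inputs and outputs). The step names and the toy data are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# A step takes the pipeline context and returns an extended copy of it.
Step = Callable[[Dict], Dict]

def load_data(ctx: Dict) -> Dict:
    """Load raw data (hard-coded here, a database query in real life)."""
    return {**ctx, "raw": [1.0, 2.0, None, 4.0]}

def clean_data(ctx: Dict) -> Dict:
    """Drop missing values."""
    return {**ctx, "clean": [v for v in ctx["raw"] if v is not None]}

def build_features(ctx: Dict) -> Dict:
    """Center the values around their mean."""
    clean = ctx["clean"]
    mean = sum(clean) / len(clean)
    return {**ctx, "features": [v - mean for v in clean]}

def run_pipeline(steps: List[Step], ctx: Optional[Dict] = None) -> Dict:
    """Run the steps in order, passing the context from one to the next."""
    ctx = dict(ctx or {})
    for step in steps:
        ctx = step(ctx)
    return ctx

result = run_pipeline([load_data, clean_data, build_features])
print(result["features"])
```

Because every step only depends on the context it receives, each one can be unit-tested, rerun, or swapped out on its own, which is the main benefit the pipeline tools build upon.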
Model Serving: Flask / FastAPI / BentoML
Flask / FastAPI: to build from scratch / to add some business logic
BentoML: to simply serve business models
Microservices are nice, but splitting logic into different services introduces latency overhead.
Source: https://flask.palletsprojects.com/en/2.1.x/
Source: https://fastapi.tiangolo.com/
Source: https://github.com/bentoml/BentoML
Basics: Git, Containerization, Shell (bash, zsh), poetry
You will need these basics in every ML project (and in every software project).
Source: https://git-scm.com/
Source: https://alexec.github.io/slides/intro-to-docker.html#/
Source: https://de.wikipedia.org/wiki/Bash_(Shell)
Source: https://de.wikipedia.org/wiki/Z_shell
Source: https://github.com/python-poetry
AB testing: “by hand”
So far we have only done it “by hand”, i.e. we use SQL (BigQuery) + dashboarding. We haven’t found a very nice open source solution yet.
AB distribution image source: https://pubs.rsc.org/image/article/2018/AN/c8an01303a/c8an01303a-f3_hi-res.gif
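To give an idea of what evaluating an AB test “by hand” can involve once SQL has produced the counts per group, here is a sketch of a two-proportion z-test using only the standard library. It is one common way to compare two conversion rates, not necessarily the procedure used in the project, and the visitor and conversion numbers are invented.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts, e.g. the result of a GROUP BY over the experiment table.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The test assumes reasonably large groups and a sample size that was fixed before the experiment started.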
NLP processing: spaCy
A pretty good industry standard for NLP.
Source: https://spacy.io/
Comparison of Public Cloud Providers: AWS vs Google vs Microsoft
Source: https://aws.amazon.com/resources/analyst-reports/gartner-mq-cips-2021/
The “public cloud” market is dominated by 3 major providers from the US:
Amazon Web Services / AWS: https://aws.amazon.com/
Microsoft Azure: https://azure.microsoft.com/
Google Cloud Platform / GCP: https://cloud.google.com/?hl=de
Source: https://www.itprotoday.com/iaas-and-paas/aws-continues-dominance-over-azure-google-cloud-strong-growth
AI / Data offerings by public ☁ providers and other vendors
Personal recommendation: Go to a public cloud (AWS, Azure, GCP) or Databricks (cloud-agnostic) and use the open source tooling suite they deploy as a service.
In my opinion, no vendor solution has proven to be superior so far. Often, the vendors just rebrand existing OSS solutions (JupyterLab!) with tweaks.
ML is getting commoditized very fast! The barrier to entry is shrinking.
Source: https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create-dnn-models
BigQuery ML trains models from tabular data directly in SQL! No Python + Pandas + Notebooks needed. This can enable faster model iteration.
AutoML tools are getting better very fast! - Example: Vertex AI by Google Cloud
Source: https://cloud.google.com/vision/docs/features-list
Source: https://codelabs.developers.google.com/vertex_custom_training_prediction#1
Source: https://console.cloud.google.com/vertex-ai/datasets/create
AWS, Azure, and GCP offer similar services, though these can differ in quality.
Job Market Perspectives
Job Roles
Source: modified from Chandra Reddy, https://medium.com/@lchandratejareddy/a-data-analyst-vs-a-data-scientist-vs-a-data-engineer-91b1f46d5995
Data Analyst / Data Scientist / Machine Learning Engineer / Data Engineer / Backend Engineer / DevOps Engineer
The jobs generate different output and use different tools.
Job Roles - My Path
Source: modified from Chandra Reddy, https://medium.com/@lchandratejareddy/a-data-analyst-vs-a-data-scientist-vs-a-data-engineer-91b1f46d5995
Figure labels: Data Analyst / Data Scientist / Machine Learning Engineer / Data Engineer / Backend Engineer / DevOps Engineer / ?
Job Roles - Skills
Source: Chandra Reddy, https://medium.com/@lchandratejareddy/a-data-analyst-vs-a-data-scientist-vs-a-data-engineer-91b1f46d5995
Figure labels: Data Engineer / Backend Engineer / SWE / DevOps Engineer
Job Roles - Task & Tooling
Source: modified from Chandra Reddy, https://medium.com/@lchandratejareddy/a-data-analyst-vs-a-data-scientist-vs-a-data-engineer-91b1f46d5995
Data Analyst
  Task: Create one-off or recurring analyses as foundation for business decisions
  Tooling: Excel / PowerPoint / Power BI / Tableau / SQL / Python / R Scripts / …
Data Scientist
  Task: Identify business problems that can be solved with data science + implement solutions
  Tooling: Python / sklearn / pandas / Streamlit / Notebooks / SQL / AB testing / PowerPoint / …
Machine Learning Engineer
  Task: Scale data solutions and increase ML team productivity following MLOps principles
  Tooling: Pipeline tools / MLFlow / Kubernetes / Databases / Python / YAML / …
Data Engineer
  Task: Design data models and implement processes that are essential to run the business
  Tooling: SQL / Postgres / BigQuery / Message Queues / Serverless Functions / DBT / Kafka / …
Backend Engineer
  Task: Engineer scalable backend systems that implement the business logic
  Tooling: Java / Java Spring / Golang / Microservices / Containers / Authentication / API Management / …
DevOps Engineer
  Task: Promote the DevOps culture of releasing software frequently and establishing feedback loops everywhere
  Tooling: Continuous Integration (CI) / Continuous Delivery (CD) / Containers / Cloud / APM / …
Job Roles Google Trends - Germany
Source: https://trends.google.de/trends/explore?geo=DE&q=data%20analyst,data%20scientist,machine%20learning%20engineer,data%20engineer
Job Roles Google Trends - United States
Source: https://trends.google.de/trends/explore?geo=US&q=data%20analyst,data%20scientist,machine%20learning%20engineer,data%20engineer
LinkedIn Job Offerings
Source: https://de.linkedin.com/jobs
Notice that the numbers of results for Data Analyst and Data Scientist are similar. This is because many companies advertise the same job as “Data Analyst” and “Data Scientist” at the same time.
Be careful: when applying for a Data Scientist job, you might actually end up with a Data Analyst job.
There is also a considerable overlap between “Data Scientist” and “Machine Learning Engineer”, but it is not as large as between “Data Analyst” and “Data Scientist”.
The “Data Engineer” role, in contrast, is pretty well defined from years of experience in classical Business Intelligence (BI) environments, so there is not much overlap with the other roles.
Salary DE - Data Analyst
Source: https://www.kununu.com/de/gehalt/datenanalyst-982
Salary DE - Data Scientist + ML Engineer
Source: https://www.kununu.com/de/gehalt/data-scientist-973
Salary DE - Data(base) Engineer
Source: https://www.kununu.com/de/gehalt/datenbankentwickler-985
Resources for self-development
For beginners:
- Udacity (high quality content!):
  - https://www.udacity.com/course/data-scientist-nanodegree--nd025
  - https://www.udacity.com/course/machine-learning-dev-ops-engineer-nanodegree--nd0821
  - https://www.udacity.com/course/intro-to-relational-databases--ud197
- To get to know GCP:
  - https://www.cloudskillsboost.google/ with free hands-on labs
  - https://developers.google.com/machine-learning/crash-course/ml-intro (from developers)
  - https://www.coursera.org/learn/gcp-fundamentals
Advanced:
- Do certifications (very advanced, needs prior knowledge of GCP):
  - https://cloud.google.com/certification/data-engineer
  - https://cloud.google.com/certification/machine-learning-engineer
Key Takeaways
The job roles in the data space are very different; they do different things. You should decide what mixture of Communication/Math/Programming/Business you want.
A job advertised as Data Scientist can actually turn out to be a Data Analyst job, so be careful!
MLOps is currently the sort-of industry standard for structuring ML projects.
The tooling landscape is abundant and innovating all the time; it is impossible to keep up. Instead, develop your own mechanism to deal with this complexity.
Thanks for having me!
Special thanks to Maxx Richard Rahman for reaching out to me!
Feel free to add me on LinkedIn: https://www.linkedin.com/in/niklas-haas/
I am open to feedback about the content / slides / style of presentation / presentation performance, right now, on LinkedIn, or via email.
I have time for more questions :-)

Machine Learning Engineer Perspective on Industry Trends and Job Market

  • 1. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer Machine Learning is more than Algorithms A Consultant's Perspective on the Industry and the Job Market 1
  • 2. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer 2 Image source: https:/ /cloud.google.com/blog/products/application-development/a-cloud-built-for-developers-2021-year-in-review Agenda Introduction Example Projects Tooling Job Market Perspectives Key Takeaways
  • 3. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer 3 Image source: https:/ /cloud.google.com/blog/products/application-development/a-cloud-built-for-developers-2021-year-in-review Agenda Introduction Example Projects Tooling Job Market Perspectives Key Takeaways
  • 4. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer 4 Image source: https:/ /cloud.google.com/blog/products/application-development/a-cloud-built-for-developers-2021-year-in-review Agenda Introduction Example Projects Tooling Job Market Perspectives Key Takeaways Feel free to ask questions right away! Then it’s my duty to have a look on the wall clock.
  • 5. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer $ whoami 5 Niklas Haas living in Dusseldorf, NRW Data Scientist turned ML Engineer Graduated 2017 in Industrial Engineering and Management from Karlsruhe Institute of Technology, KIT. Curriculum focus on Statistics, Operations Research, Information Technology With codecentric AG in Solingen since 2018 codecentric has a 4+1 model, i.e. we work 4 days for the customer and have 1 day per week for learning and development of ourselves and / or the company (though administrative tasks are also included in this time)
  • 6. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer $ whoami 6 Project history Machine Learning Microservice for the Industry 4.0 platform Create a microservice for early detection of a serious production failure and integrate into the existing Industry 4.0 Platform. Customer Lifecycle Recommendations Scaling Algorithms for Detection of Customer Churn. Migrating an on-prem solution for personalized customer communication to GCP and improving the product in close collaboration with its users Recommendations in Wholesale Set up a scalable ML system for personalized product recommendations on the Google Cloud Platform (GCP) following MLOps principles. Evaluating ML models using AB testing Industrial Analytics in Renewable Energy Unsupervised pattern recognition on wind turbine data. Using automated feature engineering and bayesian clustering to build a penetrable and validatable decision support system (Decision Tree). Results are presented in an interactive Dashboard.
  • 7. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer Sexiest Job of 21st Century - said in 2012 7
  • 8. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer Data Scientist Job was googled thereafter 8
  • 9. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer How relevant is the Data “Scientist” for the industry? OR How much “deep” learning do you need?
  • 10. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer In real-life ML systems, there are many things to consider 10 Image source: https:/ /cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
  • 11. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer In real-life ML systems, there are many things to consider 11 Image source: https:/ /cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning That might be the expectations of a Data Scientist. However, this alone does not add value to the business, as it is likely to not go beyond the “Proof of Concept” state.
  • 12. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer In real-life ML systems, there are many things to consider 12 Image source: https:/ /cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning That might be the expectations of a Data Scientist. However, this alone does not all value to the business, as it is likely to not go beyond the “Proof of Concept” state. As of 2020: “Gartner research shows only 53% of projects make it from artificial intelligence (AI) prototypes to production.” https:/ /www.gartner.com/en/newsroom/press-releases/2020-10-19-g artner-identifies-the-top-strategic-technology-trends-for-2021#:~:text =Gartner%20research%20shows%20only%2053,a%20production%2Dgra de%20AI%20pipeline.
  • 13. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer In real-life ML systems, there are many things to consider 13 Image source: https:/ /cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning To cover all aspects of a ML system, from our experience, the ratio of Data Scientists to {Data,ML,DevOps} Engineers should be around 1:3. At least in the project ramp-up phase.
  • 14. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer 14 Image source: https:/ /cloud.google.com/blog/products/application-development/a-cloud-built-for-developers-2021-year-in-review Agenda Introduction Example Projects Tooling Job Market Perspectives Key Takeaways
  • 15. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer Our projects in the Gartner “AI” Hype Cycle 15 Image source: https:/ /www.gartner.com/en/articles/the-4-trends-that-prevail-on-the-gartner-hype-cycle-for-ai-2021
  • 16. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer Our Projects represented in “State of AI” 2021 (2nd pandemic year) 16 Image source: https:/ /www.mckinsey.com/business-functions/quantumblack/our-insights/global-survey-the-state-of-ai-in-2021
  • 17. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer Machine Learning Microservice for the Industry 4.0 platform 17 Reference: https://www.youtube.com/watch?v=WywQm0wHLvA Reference: https://www.kampf.de/de/digitale-produkte/theadvanced/
  • 18. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer Recommendations in Wholesale 18 References: https://www.codecentric.de/success-stories/metro-digital https://cloud.google.com/bigquery-ml/docs/bigqueryml-mf-implicit-tutorial https://developers.google.com/machine-learning/recommendation/collaborative/matrix https://cloud.google.com/retail/recommendations-ai/docs/create-models Using matrix factorization with implicit feedback (customer did not give explicit rating but gave implicit feedback). Use Cases: - Ranked Promotions - “Others you may like” (-> not “Frequently bought together”, this is better calculated using basket mining, it has a different objective: click through rate vs. conversion rate).
  • 19. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer Customer Lifecycle Recommendations 19 Reference: https:/ /dzone.com/articles/xgboost-a-deep-dive-into-boosting (image taken from here) https://www.codecentric.de/success-stories/metro-digital https://towardsdatascience.com/churn-prediction-3a4a36c2129a https://github.com/dmlc/xgboost https://github.com/slundberg/shap Target variable: will the customer buy something in the next 3 months? Boosting = sequential optimization of Decision Trees SHAP for model explanation
  • 20. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer Industrial Analytics in Renewable Energy 20 Semi-supervised pattern recognition on wind turbine data 1. Infer patterns empirically from data 2. Classify/interpret data with domain knowledge and assign classes 3. Build higher-level analysis and models on these classes
  • 21. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer NLP: Automating document processing - Sherloq (cc project) 21
  • 22. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer 22 Agenda Introduction Example Projects Tooling Job Market Perspectives Key Takeaways
  • 23. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer There is no shortage in tooling - Linux Foundation Data & AI Landscape 23 Image source: https:/ /landscape.lfai.foundation/
  • 24. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer There is no shortage in tooling - Linux Foundation Data & AI Landscape 24 Image source: https:/ /landscape.lfai.foundation/ The amount of available open source tooling is ridiculous. It is like the wild west. Innovation is everywhere. Strategy that works for us: Be aware of your own FOMO (fear of missing out) and ignore it. Get proficient in the tools you use. Discuss with colleagues / the community, what they use to get inspired. Regularly try out new tools. If it gives you the productivity boost, adapt it. Do NOT choose tooling simply because it is new/cool/promising (sometimes referred to as “tech porn”). Also: Do NOT replace one tool with another unless it adds real value.
  • 25. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer Before we start: Python vs R (caution: my opinion) 25 Area of application General purpose with performance-optimized packages for statistical computing/ML Statistical computing Contributors Professional Software Engineers from all over the world, also from Google et al. Academia (Unit-)Testing & Linting capabilities Excellent and prominent (pytest, flake8, black, yapf) Exists, but I never saw it used in the wild Development Environments Depending on the use: VSCode, PyCharm, Spyder (Matlab/RStudio clone), Jupyter I saw basically only RStudio in the wild API development (for example for model serving) Many alternatives: flask, FastAPI, Django plumbe.R ? (I never used it) Dashboarding & Data Apps Plotly / Dash, Streamlit Shiny Documentation and QA websites, github, StackOverflow websites, Github, CRAN (awful, looks like from the 90s), a little bit on StackOverflow Verdict (my opinion!) No doubt the relevant language for the industry. Learning opportunities are abundant. Using Python will improve your coding skills. A lot of 3rd party software have APIs in Python. No demand for R in the industry unless in specialized areas. Mainly Academia. Using R will almost surely not improve your coding skills, because that is no focus of the community.
  • 26. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer Without “MLOps mindset”, the situation might look like this 26 Data ... Notebooks with data storytelling, can turn very long Model artifacts, processed data
  • 27. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer MLOps Level 1: manual experimentation + automated pipeline 27 Image source: https:/ /cloud.google .com/architecture/ mlops-continuous-d elivery-and-automat ion-pipelines-in-ma chine-learning
  • 28. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer Experimentation and Data Storytelling: JupyterLab 28 For experimentation and data storytelling Source: https:/ /jupyter.org/ Source: https:/ /jupyterlab.readthedocs.io/en/stable/getting_started/overview.html Reference: https:/ /github.com/jupyterlab/jupyterlab
  • 29. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer Interactive Data Apps: Streamlit 29 For building data apps (comparable to R Shiny) Source: https:/ /streamlit.io/ Source: https:/ /share.streamlit.io/data-science-at-swast/handover_poc/main/handover.py Source: https:/ /github.com/streamlit/streamlit
  • 30. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer Experiment Tracking: MLFlow 30 For experiment and metadata tracking. Integrates nicely with all popular ML frameworks. Source: https:/ /mlflow.org/ Source: https:/ /towardsdatascience.com/managing-your-machine-learning-experiments-with-mlflow-1cd6ee21996e
  • 31. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer SQL (+ DBT ) 31 You will need it. Everywhere. Some things don’t need a fancy ML model, they can be done using SQL :-) DBT is a great tool! You can write data tests and auto-generate data lineage. Source: https:/ /console.cloud.google.com/bigquery Source: https:/ /www.postgresql.org/ Source: https:/ /www.getdbt.com/ Source: https:/ /www.datatask.io/blog/workflow-dbt-materialisations-documentation/ BigQuery
  • 32. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer Dashboarding and Apps for Stakeholders: Metabase 32 Nice tool to discover data from databases with dashboards. Share dashboards and data stories with stakeholders. Authentication mechanism is included! Start in a docker container. Source: https:/ /github.com/metabase/metabase Source: https:/ /www.metabase.com/start/oss/ docker run -d -p 3000:3000 --name metabase metabase/metabase
  • 33. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer Algorithms: Sklearn / xgboost / Tensorflow 33 Good old sklearn is the industry standard. Xgboost often delivers the best results. For image processing, you might use fine-tuned Tensorflow models. Source: https:/ /en.wikipedia.org/wiki/Scikit-learn Source: https:/ /scikit-learn.org/stable/ Source: https:/ /xgboost.readthedocs.io/en/stable/ Source: https:/ /github.com/tensorflow/tensorflow
  • 34. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer Pipelining: Kedro / Dagster / Kubeflow 34 Split your ML tasks in logical and self-contained steps and combine them in pipelines. Use kedro / dagster for lightweight on-machine tasks, and Kubeflow for heavyweight scaling on kubernetes (k8s). Source: https:/ /github.com/kedro-org/kedro Source:https:/ /github.com/dagster-io/dagster Source: https:/ /www.kubeflow.org/
  • 35. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer Model Serving: Flask / FastAPI / BentoML 35 Flask / FastAPI to build from scratch / to add some business logic BentoML: To simply serve business models Microservices are nice, but splitting logic into different services introduces latency overhead. Source: https:/ /flask.palletsprojects.com/en/2.1.x/ Source: https:/ /fastapi.tiangolo.com/ Source: https:/ /github.com/bentoml/BentoML
  • 36. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer Basics: Git, Containerization, Shell (bash, zsh), poetry 36 You will need these basics in every ML project (also in every software project). Source: https:/ /git-scm.com/ Source: https:/ /alexec.github.io/slides/intro-to-docker.html#/ Source: https:/ /de.wikipedia.org/wiki/Bash_(Shell) Source: https:/ /de.wikipedia.org/wiki/Z_shell Source: https:/ /github.com/python-poetry
  • 37. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer AB testing: “by hand” 37 So far we did it only “by hand”, i.e. we use SQL + Dashboarding. We haven’t found a very nice open source solution yet. BigQuery AB distribution image source: https:/ /pubs.rsc.org/image/article/2018/AN/c8an01303a/c8an01303a-f3_hi-res.gif
  • 38. niklas.haas@codecentric.de | Machine Learning Engineer & Google Cloud Data Engineer NLP processing: spaCy 38 A pretty good industry standard for NLP. Source: https:/ /spacy.io/
  • 39. Comparison of Public Cloud Providers: AWS vs Google vs Microsoft. The “public cloud” market is dominated by 3 major providers from the US: Amazon Web Services / AWS (https://aws.amazon.com/), Microsoft Azure (https://azure.microsoft.com/), and Google Cloud Platform / GCP (https://cloud.google.com/?hl=de). Source: https://aws.amazon.com/resources/analyst-reports/gartner-mq-cips-2021/
  • 40. Comparison of Public Cloud Providers: AWS vs Google vs Microsoft. Source: https://www.itprotoday.com/iaas-and-paas/aws-continues-dominance-over-azure-google-cloud-strong-growth
  • 41. AI / Data offerings by public ☁ providers and other vendors. Personal recommendation: Go to a public cloud (AWS, Azure, GCP) or Databricks (cloud-agnostic) and use the open source tooling suite they deploy as a service. In my opinion, no vendor solution has proven superior so far. Often, the vendors just rebrand existing OSS solutions (JupyterLab!) with tweaks.
  • 42. ML is getting commoditized very fast! The barrier to entry shrinks. BigQuery ML trains models from tabular data directly in SQL! No Python + Pandas + Notebooks needed. This can enable faster model iteration. Source: https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create-dnn-models
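To make “train models directly in SQL” concrete, the following sketch builds the kind of `CREATE MODEL` statement BigQuery ML accepts for a DNN classifier. It only assembles and prints the SQL text; it does not connect to BigQuery. The helper `build_create_model_sql` and the dataset, table, and column names are hypothetical placeholders, and a real statement would usually select specific feature columns rather than `*`.

```python
def build_create_model_sql(model: str, table: str, label: str) -> str:
    """Build a BigQuery ML CREATE MODEL statement for a DNN classifier."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'DNN_CLASSIFIER', input_label_cols = ['{label}']) AS\n"
        f"SELECT * FROM `{table}`"
    )


# Hypothetical names, for illustration only.
sql = build_create_model_sql(
    model="my_dataset.churn_model",
    table="my_dataset.customers",
    label="churned",
)
print(sql)
```

The point of the slide is visible in the statement itself: the training data is whatever the `SELECT` returns, so iterating on features means editing a query rather than a Python training script.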
  • 43. AutoML tools are getting better very fast! AWS, Azure, and GCP offer similar services, though these services can differ in quality. Sources: https://cloud.google.com/vision/docs/features-list, https://codelabs.developers.google.com/vertex_custom_training_prediction#1, https://console.cloud.google.com/vertex-ai/datasets/create
  • 44. AutoML tools are getting better very fast! - Example: Vertex AI by Google Cloud. Sources: https://cloud.google.com/vision/docs/features-list, https://codelabs.developers.google.com/vertex_custom_training_prediction#1, https://console.cloud.google.com/vertex-ai/datasets/create
  • 45. Agenda: Introduction, Example Projects, Tooling, Job Market Perspectives, Key Takeaways.
  • 46. Job Roles. Data Analyst, Data Scientist, Machine Learning Engineer, Data Engineer, Backend Engineer, DevOps Engineer. The jobs generate different output and use different tools. Source: modified from Chandra Reddy, https://medium.com/@lchandratejareddy/a-data-analyst-vs-a-data-scientist-vs-a-data-engineer-91b1f46d5995
  • 47. Job Roles - My Path. Data Analyst, Data Scientist, Machine Learning Engineer, Data Engineer, Backend Engineer, DevOps Engineer, ? Source: modified from Chandra Reddy, https://medium.com/@lchandratejareddy/a-data-analyst-vs-a-data-scientist-vs-a-data-engineer-91b1f46d5995
  • 48. Job Roles - Skills. Data Engineer, Backend Engineer / SWE, DevOps Engineer. Source: Chandra Reddy, https://medium.com/@lchandratejareddy/a-data-analyst-vs-a-data-scientist-vs-a-data-engineer-91b1f46d5995
  • 49. Job Roles - Data Analyst - Task & Tooling. Task: create one-off or recurring analyses as a foundation for business decisions. Tooling: Excel / PowerPoint / Power BI / Tableau / SQL / Python / R Scripts / … Source: modified from Chandra Reddy, https://medium.com/@lchandratejareddy/a-data-analyst-vs-a-data-scientist-vs-a-data-engineer-91b1f46d5995
  • 50. Job Roles - Data Scientist - Task & Tooling. Task: identify business problems that can be solved with data science + implement solutions. Tooling: Python / sklearn / pandas / Streamlit / Notebooks / SQL / A/B testing / PowerPoint / … Source: modified from Chandra Reddy, https://medium.com/@lchandratejareddy/a-data-analyst-vs-a-data-scientist-vs-a-data-engineer-91b1f46d5995
  • 51. Job Roles - ML Engineer - Task & Tooling. Task: scale data solutions and increase ML team productivity following MLOps principles. Tooling: Pipeline tools / MLflow / Kubernetes / Databases / Python / YAML / … Source: modified from Chandra Reddy, https://medium.com/@lchandratejareddy/a-data-analyst-vs-a-data-scientist-vs-a-data-engineer-91b1f46d5995
  • 52. Job Roles - Data Engineer - Task & Tooling. Task: design data models and implement processes that are essential to run the business. Tooling: SQL / Postgres / BigQuery / Message Queues / Serverless Functions / dbt / Kafka / … Source: modified from Chandra Reddy, https://medium.com/@lchandratejareddy/a-data-analyst-vs-a-data-scientist-vs-a-data-engineer-91b1f46d5995
  • 53. Job Roles - Backend Engineer - Task & Tooling. Task: engineer scalable backend systems that implement the business logic. Tooling: Java / Java Spring / Golang / Microservices / Docker / Authentication / API Management / … Source: modified from Chandra Reddy, https://medium.com/@lchandratejareddy/a-data-analyst-vs-a-data-scientist-vs-a-data-engineer-91b1f46d5995
  • 54. Job Roles - DevOps Engineer - Task & Tooling. Task: promote the DevOps culture of releasing software frequently and establishing feedback loops everywhere. Tooling: Continuous Integration (CI) / Continuous Delivery (CD) / Containers / Cloud / APM / … Source: modified from Chandra Reddy, https://medium.com/@lchandratejareddy/a-data-analyst-vs-a-data-scientist-vs-a-data-engineer-91b1f46d5995
  • 55. Job Roles - DevOps Engineer - Task & Tooling (overview of all roles). Data Analyst: create one-off or recurring analyses as a foundation for business decisions (Excel / PowerPoint / Power BI / Tableau / SQL / Python / R Scripts / …). Data Scientist: identify business problems that can be solved with data science + implement solutions (Python / sklearn / pandas / Streamlit / Notebooks / SQL / A/B testing / PowerPoint / …). Machine Learning Engineer: scale data solutions and increase ML team productivity following MLOps principles (Pipeline tools / MLflow / Kubernetes / Databases / Python / YAML / …). Data Engineer: design data models and implement processes that are essential to run the business (SQL / Postgres / BigQuery / Message Queues / Serverless Functions / dbt / Kafka / …). Backend Engineer: engineer scalable backend systems that implement the business logic (Java / Java Spring / Golang / Microservices / Containers / Authentication / API Management / …). DevOps Engineer: promote the DevOps culture of releasing software frequently and establishing feedback loops everywhere (Continuous Integration (CI) / Continuous Delivery (CD) / Containers / Cloud / APM / …). Source: modified from Chandra Reddy, https://medium.com/@lchandratejareddy/a-data-analyst-vs-a-data-scientist-vs-a-data-engineer-91b1f46d5995
  • 56. Job Roles Google Trends - Germany. Source: https://trends.google.de/trends/explore?geo=DE&q=data%20analyst,data%20scientist,machine%20learning%20engineer,data%20engineer
  • 57. Job Roles Google Trends - United States. Source: https://trends.google.de/trends/explore?geo=US&q=data%20analyst,data%20scientist,machine%20learning%20engineer,data%20engineer
  • 58. LinkedIn Job Offerings. Notice that the numbers of results for Data Analyst and Data Scientist are similar? This is because many companies advertise the same job as “Data Analyst” and “Data Scientist” at the same time. Be careful: when applying for a Data Scientist job, you might actually end up with a Data Analyst job. Source: https://de.linkedin.com/jobs
  • 59. LinkedIn Job Offerings. There is also considerable overlap between “Data Scientist” and “Machine Learning Engineer”, though not as much as between “Data Analyst” and “Data Scientist”. The “Data Engineer” role, by contrast, is pretty well defined from years of experience in classical Business Intelligence (BI) environments, so there is not much overlap with the other roles. Source: https://de.linkedin.com/jobs
  • 60. Salary DE - Data Analyst. Source: https://www.kununu.com/de/gehalt/datenanalyst-982
  • 61. Salary DE - Data Scientist + ML Engineer. Source: https://www.kununu.com/de/gehalt/data-scientist-973
  • 62. Salary DE - Data(base) Engineer. Source: https://www.kununu.com/de/gehalt/datenbankentwickler-985
  • 63. Resources for self-development. For beginners: Udacity (high quality content!): https://www.udacity.com/course/data-scientist-nanodegree--nd025, https://www.udacity.com/course/machine-learning-dev-ops-engineer-nanodegree--nd0821, https://www.udacity.com/course/intro-to-relational-databases--ud197. To get to know GCP: https://www.cloudskillsboost.google/ (with free hands-on labs), https://developers.google.com/machine-learning/crash-course/ml-intro (from developers), https://www.coursera.org/learn/gcp-fundamentals. Advanced: do certifications (very advanced, needs prior knowledge of GCP): https://cloud.google.com/certification/data-engineer, https://cloud.google.com/certification/machine-learning-engineer
  • 64. Agenda: Introduction, Example Projects, Tooling, Job Market Perspectives, Key Takeaways.
  • 65. Key Takeaways. The job roles in the data space are very different; they do different things. You should decide what mixture of Communication / Math / Programming / Business you want. A “Data Scientist” job can actually turn out to be a Data Analyst job, so be careful! MLOps is currently the sort-of industry standard for structuring ML projects. The tooling landscape is abundant and innovating all the time; it is impossible to keep up. Instead, develop your own mechanism to deal with this complexity.
  • 66. Key Takeaways. Thanks for having me! Special thanks to Maxx Richard Rahman for reaching out to me! Feel free to add me on LinkedIn: https://www.linkedin.com/in/niklas-haas/ I am open to feedback about the content / slides / style of presentation / presentation performance, right now, on LinkedIn, or via email. I have time for more questions :-)