Data In Motion Paris 2023

https://docs.google.com/spreadshe
ets/d/19sCmNHIF29faxPonSquGz
PKMcvBgTGyClBLAmjQUEFc/edit
?usp=sharing

Evaluez votre niveau de maturité dans le
streaming de données
● Scannez le QR code
● Répondez aux questions
● Découvrez votre niveau de maturité
Data in Motion

Agenda
Plénière
Horaire SESSION
09:30 Keynote: Reinventing Kafka in the Data Streaming Era
10:05 Adéo : Construire une plateforme de données sur-mesure
10:40 CA GIP - CA PS : Publication métier des évènements du système d’autorisation émetteur
11:15 Pause café - Networking
11:45 Lactalis : le bilan
12:20 L’Oréal : L’Oréal Beauty Tech empowered by event-driven architecture
12:55 Cocktail déjeunatoire - Networking
14:00 CDC Informatique : Scaling with Kafka
14:30 Keynote : Stream processing with Apache Flink
15:00 Europcar : De Kafka open-source à une stratégie multi-cloud avec Confluent Cloud
15:30 Everysens : How Everysens made its product pivot a success with confluent cloud
16:00 AWS : Building Modern Streaming Analytics with Confluent on AWS

Agenda
Breakout
Horaire SESSION - Auditorium
16:30
Confluent et Flink: le mariage parfait à l'ère des données en temps
réel
17:00
Comment gouverner une plateforme Confluent - un équilibre à
trouver entre anarchisme et autoritarisme
17:30 Cocktail - Networking- Clap de fin
H SESSION - Auditorium
Imply : Building an Event Analytics Pipeline with Confluent Cloud and
Imply Polaris
Tinybird : Speed Wins: From Kafka to APIs in Minutes

Keynote : Reinventing Kafka in the
Data Streaming Era
Dan Rosanova
Head of Product
Confluent Cloud Platform and Growth

Loyalty Rewards
Curbside Pickup
Trending Now
Popular on Netflix
Top Picks for Joshua
Created by the founders of
Confluent while at LinkedIn
Apache Kafka has ushered in the
data streaming era…
>70%
of the Fortune 500
>100,000+
Organizations
>41,000
Kafka Meetup
Attendees
>200
Global Meetup
Groups
>750
Kafka Improvement Proposals
(KIPs)
>12,000
Jiras for Apache Kafka
>32,000
Stack Overflow
Questions
Real-time Trades
Ride ETA
Personalized Recommendations

The need for a cloud-native, data streaming platform
Connecting all your apps, systems and data into a central nervous system

Self-managing Kafka comes with cost and complexity…
Infrastructure and
Operations
Development
Resources
Security &
Governance
Global
Availability

“Hosted Streaming Services” didn’t solve all our
problems…
How can I connect
to all of my source
and sink systems?
How do I govern
my data for quality
and compliance?
How do I deploy
across multi and
hybrid cloud
environments?
How can I control my
networking
costs?
How can I ensure
low-latency,
while maintaining a
resilient service
How can I meet each
use case with
stream
processing?

What is this costing your business?
Unpacking the direct and indirect costs of self-managing and hosted streaming services
FTE Costs
It’s hard because... Which results in...
Costly time & resources (~$3-5M/year) managing Kafka,
connectors, governance, security, etc.
Delayed Time-to-Value
Infra Spend
$$$$ on underutilized infra for storage, compute and
networking
Increased Total Cost
of Ownership
Business Risk
Potential downtime and security breaches means
diverting resources
Unplanned Downtime
and Breaches
It’s expensive because... Which results in...

The world is moving towards fully-managed services…
Data Warehousing Databases
Self-managed
hardware and
software
Fully managed
services
Snowflake
“By 2025, at least 75% of organizations will depend on managed
services.”
— Globe Newswire
Hosted cloud
services CloudSQL
Data Streaming
Hosted Streaming
Vendors

Confluent Cloud
Cloud-native data streaming platform built by the founders of Apache Kafka®
Everywhere
Connect your data in real
time with a platform that
spans from on-prem to
cloud and across clouds
Complete
Go above & beyond
Kafka with all the
essential tools for a
complete data streaming
platform
Cloud-Native
The 10x Apache Kafka®
service: elastic, resilient
and performant,
powered by the Kora
Engine
Stream confidently on the world’s most trusted data streaming platform built by the founders of
Apache Kafka©, with resilience, security, compliance, and privacy built-in by default.

A Cloud-Native Kafka Service Can
Eliminate Operational and Infrastructure
Burden…
Compute and
Storage Decoupling
Networking and
Global Replication
Elastic and
Automated
Multi-tenancy
and Serverless
… But Putting Kafka in the Cloud Isn’t Just Putting Kafka in the Cloud

We Transformed Kafka for the Cloud, Ground Up!
Resilient
with automated
operations to ensure
high availability and
reliability
Performant
with networking
service decoupling
and replication
optimization
Elastic
to seamlessly
expand and shrink
based on customer
demands
KORA ENGINE
The Apache Kafka® Engine Built for the Cloud
Cost efficient
with multi-tenancy,
data tiering, cloud
optimizations and
hands-off operations

We Invested 5M Engineering Hours to Reachitect Every
Layer of Kafka and Built a Truly Cloud-Native Engine
NETWORK
COMPUTE
AZ AZ AZ
Cells
Cells
Cells
OBJECT
STORAGE
CUSTOMERS
Multi-Cloud Networking & Routing Tier
Metadata
Durability Audits
METRICS & OBSERVABILITY
CONNECT
PROCESSING
GOVERNANCE
Data Balancing
Health Checks
Real-
time
feedback
data
Other Confluent Cloud Services
GLOBAL CONTROL PLANE

50
40
30
20
10
0
Hours required to scale 3 brokers to 4, replication factor of 3,
30-day retention, 100 MBps throughput, 10GBps network
30X
ELASTICITY
Scale to handle GBps+
workloads and peak
customer demands
30x faster without
operational burden
30X
Confluent
Cloud
OSS
Kafka
Hours

10
8
6
4
2
0
Other
Kafka Service
Confluent
Cloud
Minimum downtime commitment
by Kafka service based on SLA
10X
RESILIENCY
Ensure high availability
and offload Kafka ops
with 99.99% uptime
SLA, multi-AZ clusters,
and no-touch Kafka
patches
10X
8.76 hrs
0.876 hrs
99.99%
99.9%
Minimum
downtime
commitment
(hrs/year)

Infinite Storage
AWS GA
Infinite Storage
GCP GA
Infinite Storage
Azure GA
Time
X
∞
Average Storage Used per Cluster
by Cloud Providers
AWS GCP AZURE
STORAGE
Never worry about
Kafka storage again
with Intelligent Tiered
Storage and Infinite
Retention
AVG Storage
per Cluster

$2.57M
Total savings
Operate 60%+ more efficiently with reduced
infrastructure costs, maintenance demands
and overhead, and downtime risk
257%
3-year ROI
Launch in months rather than years by
reducing the burden on your teams
with our fully managed cloud service
Our Customers Save on Costs and Increase Their ROI
Total Economic Impact of using Confluent • Forrester, March 2022
“Confluent Cloud made it possible for us to meet our tight launch deadline with limited resources.
With event streaming as a managed service, we had no costly hires to maintain our clusters and
no worries about 24x7 reliability.”

Cloud-native data streaming platform built by the founders of Apache Kafka®
KORA: THE APACHE KAFKA ENGINE, BUILT FOR THE CLOUD
STREAM
Fully managed
service, available
Everywhere
The 10x, Cloud-native
Kafka service powered
by Kora Engine
A Complete,
enterprise-grade
Data Streaming
Platform
CONNECT GOVERN PROCESS
Confluent is so much more than Apache Kafka

Tom
Architect Lead
Anne
Architect Lead
Legacy apps
Real-time
apps
Cloud-native
apps
Cloud-based
data systems
Both Tom and Anne are tasked with…
● Maintaining OSS Kafka across all distributed systems, apps,
etc.
● Ensuring the web application is performant and resilient
● Building the new digital experiences for mobile, tablets, and
etc.
Legacy data
systems
Mainframes
PIVOT
INC.
FOSTER
OPS

…This is the result!
Without a fully
managed Kafka
service, Tom is
struggling…
PIVOT
INC.
…His “vendor” doesn’t
help connect, process,
or govern data
Self-managing Kafka was costly
and complex…

Creates, maintains and scales Kafka clusters
Onboards teams to use Kafka in a secure way
Connect to source and sink systems, while maintaining
governance
1
2
3
Build projects and distribute time between new tasks and
Kafka management
4
In this example, you will see how Anne…
Anne is going to
try with
Confluent
Cloud!
FOSTER
OPS

Tom and Anne have
very different budgets
and delivery timelines
Tom has exponentially rising
TCO, and can’t deliver for 12
months!
Anne has reduced TCO of
by up to 60%, and can
deliver in 3 months!
*App development time for example purposes only, actual time varies based on use
case
Cost to operate
Kafka environment
Time to market
~6-9 months to build production grade Kafka platform
~3 months on app
development*
~3 months on app
development*
Start
in 1
wee
k
Go to market in ~12 months
⬇60%
Cloud
Infrastructure
Operational
(FTE)
Downtime
Impact
Support & other
3rd party spend
Total self-
managed
Confluent
Cloud
Go to market in ~3 months

Who would you rather
be?
Anne at Foster Ops
with Confluent Cloud
Fully managed, cloud-native data
streaming solution
Complete data streaming platform with
connectors, governance and security
Flexible deployments across clouds and
on-premises
Anne has reduced TCO by up to 60% while
delivering to market 3x faster, and is in line for
that promotion real soon!
Significant effort self-managing
and maintaining Kafka
Custom coded connectors, governance
and security
Manually replicate clusters across
environments
Tom at Pivot Inc. with
OSS Kafka
Tom has exponentially rising infra costs and
spends 80% of his time self-managing Kafka,
thus is constantly getting pestered by
leadership!

As a result, Tom
isn’t very popular
right now…
PIVOT
INC.

While Anne is
quite the
superstar!
FOSTER
OPS

Trusted by customers everywhere

Program Details/Benefits
- Grand prize of up to $500K
- 2 runner-up awards of up to $250K
- Oppty to pitch to Benchmark, Sequoia,
Index
Target Profile
- Founded within last 5 years
- <$10M in venture funding
- Must use Confluent in submission
9/12 to 12/31 → Application window open
1/22 → Top 10 Announced
2/15 → Top 3 announced
3/19 → Grand prize announced at KSL
Sign up now!

Scan to get started
Start your free trial of Confluent
Cloud & get $500 in credits
Get started with Confluent Cloud!
$400 to spend immediately, plus an
additional $100 credit voucher
Code: DIMT2023 confluent.io/get-started/

Data Platform
Define your
business
assets
Document,
Reference and Share
your data
Get assisted by
data architects
Make your
data available
through the
datahub
Build your data
pipelines to transform
your data into
business data

Data Platform
Find &
understand
data
Share and manage
your reports
Explore the
data
Build your
reports
By Datahub
By Datahub

Data Platform
SQL
SQL
Transformations MLOPS
by
ADEO

Data Streaming - Patterns
CD
C
…
…

Data Streaming - Patterns
KStream /
ksqlDB

Technology &
Expertise
SelfService &
Governance
Vision and
Sponsorship

Construire une plateforme de
streaming de données, sur-mesure
Mustapha Benosmane
Product Leader Data Exchange & Processing
Adeo

Buildinga tailored
datastreaming
platform

Product Manager
Data Exchange & Processing
Dad of a little boy
I have a passion for technology and how to make it
useful
Data, Apache Kafka, Api management, ESB, REST,
Java, GO ...
Mustapha Benosmane

Collaborateurs ADEO
Habitants
Professionnels de
l’amélioration de l’habitat
Ecosystèmes
Fournisseurs, Partenaires
Marchands
Construire, Rénover
Aménager, Décorer
Produire, Délivrer
Agir, Impacter
Maison, appartement
Quartier, ville
Environnement
Planète
Endroit sain, sécurisé
responsable, durable
économe et confortable
Vie
Bien-être
Accomplissement
59

2
COMPLEMEN
TARY
MARKETS
INHABITANTS WITH HOME
IMPROVEMENT PROJECTS
HOME IMPROVEMENT
PROFESSIONALS

WORLDWIDE COLLABORATORS IN ADEO
61
150 000

DIGITAL COLLABORATORS IN ADEO
62
4 500

Central Integration platform
Product Teams Product Teams
ESB
Expert Team

Central Data Lake
Data Team
Product Teams
Data Warehouse ou Data
Lake centralisé
ESB

Centralizing skills ensures strong governance
Centralizing skills can help mutualize costs.
Centralizing skills reduces training and support costs
Centralizing skills reduces iteration capacity.
Centralizing the platform disengages users.
Centralizing skills and platforms reduces autonomy
and innovation.
Lessons learned

How can we provide
a service that
enables autonomy
and innovation,
while maintaining a
high level of
governance?

67
Data Streaming Platform
Topic As A
Service
Technology
Governance
Self-Service
1. Enable developers to search, find,
understand and use Topics.
2. Enable teams to subscribe and agree on a
defined Interface agreement.
3. Enable developers to create and manage
the life-cycle of Topics and Schemas
4. Within a defined framework.
Automatically enforced.
5. Provide visibility of links between
applications.
6. Enable the product teams to control costs.

1. Kafka for its properties
2. A managed offering -> No added
value in operating a Kafka cluster
3. Performance and resilience
4. A high level of security
5. A controlled cost
Technologie

69
Kafka as a service
Serverless
● Elastic scaling up &
down from 0 to GBps
● Auto capacity mgmt,
load balancing, and
upgrades
High Availability
● 99.99% SLA
● Multi-region / AZ availability
across cloud providers
● Patches deployed in
Confluent Cloud before
Apache Kafka
Infinite Storage
● Store data cost-effectively
at any scale without
growing compute
DevOps Automation
● API-driven and/or point-
and-click ops
● Service portability &
consistency across cloud
providers and on-prem
Network Flexibility
● Public, VPC, and Private Link
● Seamlessly link across clouds
and on-prem with Cluster
Linking

1. Respect best practices.
2. Maintain visibility and control over
interdependencies.
3. Provide and enforce interface
contracts.
4. Resource segmentation
5. Control access and authorizations
Governance

Governance
1 Respect best practices

Governance
1
2
Respect best practices
Interdependencies
cartography

Governance
1
2
Interdependencies
cartography
3
Avoid mixing business
objects in the same
Topic

Governance
1
2
Interdependencies
cartography
3
objects in the same
Topic
4Provide and enforce
interface contracts

Governance
1
2
Interdependencies
cartography
3
objects in the same
Topic
4Provide and enforce
interface contracts
5
Resource segmentation
/ access and
authorizations

1. Topic catalog
2. Topic documentation
3. Topic subscription
4. Topic and Schema management
Self Service

insert
here
Governance
1 Topic catalog

insert
here
Governance
1
2
Topic catalog
Topic documentation

insert
here
Governance
1
2
Topic catalog
Topic documentation
3Topic subscription

insert
here
Governance
1
2
Topic catalog
Topic documentation
3Topic subscription
4Topic and Schema
management

Confluent Cloud
Topic Topic
DSP API
DSP CLI
UI
Kafka To
BigQuery
Github
Action
Terraform
provider
Topic

Billions Records
produced/consumed
per month
470 40/160 4296
Topics in production
Digital Products
using the platform
Strong adoption
some figures

Great
responsiveness from
the team in the Run
channel
Very fast
OnBoarding for
newcomers
Extremely high
user autonomy
Rich and clear
documentation
A pleasure to work
with DSP
Glad to have a knowledgeable
team at Adeo with this level of
maturity

The Data Streaming
Platform is part of
the Adeo Data
Platform

Data Platform
Customer
Orders
Offers
SQL
SQL
SQL
Product & Data
Team
Product & Data
Team
x

Data Platform
Digital
Product
Connectivity
Product Team
Workflow
(histo, transfo, quality…)
Expose/Explore
Batch storage
Stream storage
Doc
Search & Find
Monitor Security
Digital
Product
Business Users
IT Users
Business
Users
Quality

Do you have any questions?
Mustapha.benosmane@adeo.com
THANKS!

Publication métier des événements
du système d’autorisation émetteur
Julien Legrand
Product owner data
Crédit Agricole Gip
Camille Facque
Chef de projet
Crédit Agricole Gip

« Construire une offre de service, c’est industrialiser le déploiement d’une
solution technique complexe en y ajoutant un ensemble d’outils et
d’expertises permettant de rendre autonome l’utilisateur final. »

Publication métier des
évènements du système
d’autorisation émetteur
CAPS - KAFKA
19/10/2023

Les activités et l’expertise paiement pour le compte du Crédit Agricole
Gestion des cartes bancaires,
de l’émission de la carte
jusqu’au paiement
Monétique porteurs
Encaissement des paiements
par carte ou par chèque en proximité
ou en VAD
Monétique commerçants
Paiements SEPA & internationaux
Echanges et Flux
Gestion et mise à disposition de billets
et de pièces sur les différents marchés
(particuliers, professionnels, entreprises).
Fiduciaire
Garantir aux clients la sécurité des
transactions et des systèmes
d’information notamment via la DATA
Science et des outils d’IA
Authentification, Sécurité
& DATA
Développement de nouveaux services
innovants par l’Open Banking et l’utilisation
de la DATA
Open banking
& Data
Les domaines d’activités
Pour le compte de Crédit Agricole S.A.
auprès des instances de place nationales,
européennes et internationales
Représentation
interbancaire
Gestion des échanges d’opérations
bancaires entre banques, entre clients sur
tous les marchés France et l’international

10/10/2023
95
Chiffres en suivi cumulé
janvier à décembre 2022
Nos principaux chiffres clés
13,6 milliards
D’opérations paiement traitées
MONÉTIQUE FLUX
9,8 milliards
d’opérations carte (Groupe CA)
22,9 millions
de cartes dans le parc Crédit Agricole
(CR, LCL, CACF)
5,2 milliards
d’autorisations fournies (paiement, retrait)
1,3 milliard
d’opérations SCT (virements)
1,6 milliard
d’opérations SDD (prélèvements)
19 millions
de virements SWIFT (Groupe CA)

Affichage des opérations
d’autorisations (paiements &
retraits)
Affichage des
opérations temps réel
Mise à jour du solde
provisoire
Emissions de notifications INAPP
Alerting client
Besoins Clients d’opérations temps réel
Gestion du cycle de vie
des cartes bancaires
Prise en compte des évolutions
des statuts cartes
Refonte de la MAJ des soldes
provisoires
Simplification de la
restitution
Enrichissement des données
existantes externes et restitution
dans un message unique
Utilisation des données statiques &
supervision business
Sauvegarde &
supervision

Enjeux
Des refontes d’architectures techniques et
fonctionnelles
Choix de la solution technique
MQ Séries
KAFKA
API
Protocoles d’échanges techniques
Diversités du format fonctionnel des
messages
Structure fonctionnelle historiquement
complexe
Collecte de données externes
Restitution d’avis unique
Enrichissement des données
Le chef
Les équipes
solutions
Diversités des échanges
Utilisation statique des données
Monitoring métier
Utilisations des données

Architecture existante
SAE SPAA
Cluster
MQ
Demande
d’autorisation
Application 3
Application 4
Application 5
Application 6
API Format de données 1
MQ Format de données 2
API Spécificités techniques 1
Format de données 3
API Spécificités techniques 2
Application 2
MQ Format de données 2
Application 1
Cluster
MQ
S.A.E - Serveur d’autorisation émetteur
S.P.A.A - Service de publication des avis d’autorisations
AVANT
Diffusion des avis d’autorisations en échanges
synchrones

Architecture KAFKA
SAE
SPAA
KAFKA
Demande
d’autorisation
Diffusion des avis d’autorisations en échanges
asynchrones
Application 1
Consumers
S.P.A.A - Service de publication des avis d’autorisation
Application 2
Application 3
Application 4
Application 5
Application 6
Stream
Producteur
Prométhéus ELK
Grafana
APRES
Schéma registry

Focus SPAA
Une application stateless
Evènement unique
SAE
DLT
LCL
CAPS
C.R
Validation
Autorisation
Autorisation
Notification
Carte
Contrat
Autorisation
Autorisation
Notification
Carte
Contrat
Autorisation
Autorisation
Notification
Carte
Contrat
Identification
clients
Split
Split
Split

Données fonctionnelles – Répartition par topics
Pics de volume ~550 TPS soit ~15M de transactions / jour
Cycle de vie carte
Cycle de vie contrat
Avis de paiement
Avis de retrait
Avis de redressement
3 %
96 %
1 %
Autorisation /
notifications
Opposition
Activation du sans contact
Ouverture de service VAD
Création de carte
Suppression de carte
Changement de plafonds carte

Architecture KAFKA
SAE
SPAA
KAFKA
Demande
d’autorisation
Passage d’une application stateless à statefull
Application 1
Consumers
Application 2
Application 3
Application 4
Application 5
Application 6
Stream
Producteur
Prométhéus ELK
Grafana
AVANT
Schéma registry

Architecture KAFKA
SAE
SPAA
KAFKA
Demande
d’autorisation
Application 1
Consumers
Application 2
Application 3
Application 4
Application 5
Application 6
Stream
Producteur
Système
externe
Prométhéus ELK
Grafana
APRES
Schéma registry

Architecture KAFKA
Messages CAPS
Accounts
Client HTTP
Producteur
Connector HTTP SINK
Questions / Réponse
Mise au format Success Error
Response
Messages CAPS
enrichis
Messages CAPS
Left Join
Merge
Join
Évènement unique
SAE
DLT
CAPS
Validation Identification
clients
Split
APRES
Système externe

Architecture KAFKA
Autorisation
Input
Évènement unique
SAE
DLT
CAPS
Validation Identification
clients
Split
AVANT

Architecture KAFKA
SAE
SPAA
KAFKA
Demande
d’autorisation
Logstash & mongoDB
Application 1
Consumer
Application 2
Application 3
Application 4
Application 5
Application 6
Stream
Producteur
Système
externe
Envoi et récupération des
données
Connector HTTPS SINK
Prométhéus ELK
Grafana
AVANT
Schéma registry

Architecture KAFKA
SAE
SPAA
KAFKA
Demande
d’autorisation
Logstash & mongoDB
Application 1
Consumer
Schéma registry
Application 2
Application 3
Application 4
Application 5
Application 6
Stream
Producteur
Système
externe
Envoi et récupération des
données
Connector HTTPS SINK
Prométhéus ELK
Grafana
Connector MongoDB
Consumer Logstash
APRES

10/10/2023
111
Offre Topic as a Service
Des fonctionnalités déjà disponibles
❖ Création d’un compte technique
associé au contexte applicatif
SPAA via un call HTTP KAPI.
❖ Export des données vers
mongoDB ou Elasticsearch ou
ingestion de données issues
d’api HTTP via le cluster de
worker Kafka Connect déjà
disponible.
❖ Sollicitation de
l’accompagnement de la squad
Streaming ou de l’expertise
Confluent à tout moment.

10/10/2023
112
Et d’autres à améliorer

10/10/2023
113
Objectif
2024…

10/10/2023
114

Retour d’expérience : Projet kafka
Le bilan, un an plus tard
Cédric Barbin
IT & Innovation Architect
Lactalis

118
Cédric BARBIN
Architecte SI, Lactalis Informatique
• 20+ années d'expérience
• Développeur, expert technique, architecte, manager, …
• Transformation digitale des entreprises
• Expériences en SSII, Cabinet de conseil, Client final
• Passionné par la technologie et l’innovation
• Entrepreneur dans l’âme
• Certifications Dev et Ops sur Kafka et MongoDB

119
Le groupe Lactalis
Premier groupe Laitier au monde
270
Sites de production
dans 51 Pays
85 500
Collaborateurs
dans 84 pays
28
Milliards d’euros
de chiffre d’affaire

120
Amériques
18 900
collaborateurs
Afrique
9 900
collaborateurs
Asie
Océanie
10 700
collaborateurs
Europe
32 000
collaborateurs

123
Le Lactopole
La Cité du Lait

124
• Une DSI groupe, Française, à Laval
• Des correspondants internationaux rattachés aux Pays
• Des projets d’envergure en France et à l’international
• Une forte croissance externe du groupe
• Une stratégie Cloud Privé (LACTIC)
Et des postes à pourvoir,
notamment sur Kafka !
Direction des Systèmes d’Informations
Internationale et basée en France, à Laval
80
Salles
serveurs
~200
Personnes en
France
~500
Personnes à
l’international
2 Po
De données

125
Premier projet Kafka…
… le bilan 1 an après !

126
Contexte
Traçabilité produit fini (vision supply-chain)

127
Le projet (périmètre Kafka)
Modernisation de la traçabilité produit fini
5 WMS as
Data Source
(CDC & Connect)
450
Utilisateurs
300k
business
event/day
(output)
63 lieux de
conditionnement
23 entrepôts
Source : GS1

128
• Fraicheur des données
• Aujourd’hui : plusieurs dizaines de minutes (mode batch)
• Cible : moins de 1 minute
• Capacité de corriger / rejouer
• Problématique de référentiels pas à jour
• Traçabilité technique des données
• Expliquer d’où vient la donnée,
• Comment elle a été calculée,
• Le cas échéant pourquoi elle a été rejouée
Les objectifs business
Amélioration du nouveau système de traçabilité

130
• Une refonte des Batch BigData / Scala « as is »
• Un principe de « migration technique »
• Des règles métiers à priori simples
• Donc utilisation de ksqlDB
• Langage SQL connu des analystes
• Pas de micro-services à gérer
• Complément à ksqlDB/Connect : la boite à outils
• SMT / Plugins
• UDF
• CI/CD JulieOps
L’orientation projet initiale
KSQL/Connect + Boite à outils

131
Talend CDC
ex GammaSoft
Kafka Connect
SFTP
Kafka Connect
JDBC
ksqlDB
Jointures
Formatage
Règles de gestions
Kafka Connect
MongoDB
YAML
JSON
SQL
Pipeline « 0 code »
déployé avec JulieOps

132
Fiction vs. Réalité
Des problématiques aussi variées que nombreuses

133
Talend CDC
ex GammaSoft
Kafka Connect
JDBC
ksqlDB
Jointures
Formatage
Règles de gestions
Kafka Connect
MongoDB
Micro-services
KafkaStream

134
• Des données en BDD (pas de « push » métier)
• Une captation des changements : CDC ou Connect
• Refonte milieu de projet : émission d’événement par certaines sources
• L’insertion dans un TOPIC d’entrée dit « RAW »
• Un traitement optionnel de préfiltre (lié au modèle CDC)
• Des requêtes ksqlDB
• Données brutes (pointeur sur le TOPIC initial ou pré-filtré)
• Données préparées : formatage, conversion, clés externes, …
• Données consolidées : jointure et transcodifications
• Données exposées : règles de gestions
La « topologie » classique d’un de nos flux
26 flux très proches d’un point de vue structure
Plus des nouveaux flux 100% KafkaStream !

137
1 déploiement comporte aujourd’hui :
• Des configurations CDC
• Des configurations JSON pour les Connect Source
• La création de TOPICS avec paramétrage « Stream ou Table »
• Une gestion des consumer-group
• Des inventaires de déploiement de « PréFiltre » Kafka Stream
• Des données de références de transcodification
• Les mapping KSQL (ensemble de requêtes cohérent) = 1 flux
• Des configuration JSON pour les Connect Sink
Un déploiement complexe
Ecart démarrage : pas simplement du SQL…

138
Un de nos grosses problématiques :
• Un système toujours en mouvement : pas de début et de fin, pas de « OK/KO »
• Comment détecter des écarts business ? Comment les corriger ?
On a donc besoin de s’outiller pour, entre autres choses :
• Cartographier nos flux
• Déployer (et dé-comissionner) ces flux
• Lancer des rejeux métier sur ces flux
• Gérer nos tables de transco et le cache des UDF
• Superviser les traitements (compteurs / KPI / latence)
On a intégré la supervision à nos outils d’exploitation (EON),
de BI (Qlik) et dans notre outils de gestion des flux : LactaFlux !
Une exploitation complexe
Un « run » en production sous-estimé

141
Déploiement / cartographie

143
• Migration technique = pas si évident
• Complexité du « In Motion »
• Concepts temporels à intégrer
• Besoin de maîtrise de la donnée et des systèmes amonts
• Principe d’idempotence à intégrer au projet
• Besoin fort d’expertise dès le début
• Besoin d’experts (externes) mais d’une implication forte interne
• Besoin d’optimisation pour ne pas exploser les volumes/perfs
• Un outillage spécifique à concevoir et développer
• Une plateforme technique complexe (on premise) : go to cloud ?
Les enseignements de ce démarrage
Data in motion <> Data at rest !!

144
• Fraîcheur & qualité des données
• Cible : 1 minute 🡺 moyenne à 1 seconde !
• Responsabilisation des systèmes sources (pivot / event)
• Capacité de corriger / rejouer
• Rejeu sur plusieurs dizaines de milliers de lignes simple
(quelques clics) exécuté en quelques secondes
• Traçabilité technique des données
• Rejeu via topic = traçabilité
• Principe des topics Kafka = pas d’update
Les objectifs business
Un succès, on time !
Objectifs atteints
voir même dépassés

145
• Nouveaux flux business & Machine Learning
• Déploiement à l’international (US notamment)
• Intégration de la traçabilité amont (production)
• Migration / passerelle vers normalisation GS1
• Plateforme commune (GS1/Distributeur)
• Blockchain & SmartContract
Et demain ?
Augmentation périmètre et nouveaux use case

146
Merci de votre attention
Questions / Réponses ?
Rejoignez-nous !

L’Oréal Beauty Tech empowered by
event-driven architecture
Julien Brun
Head of APIs & EDA
Center of Enablement
L’Oréal
Sindhu Prasanna
EDA Expert
L’Oréal

C1 - Internal
use
C1 – Usage interne
L’ORÉAL BEAUTY TECH
EMPOWERED BY EDA
DATA IN MOTION
19TH OCTOBER 2023

C1 - Internal
use
149
MODERN INTEGRATION TO SUPPORT BEAUTY TECH
API not enough to address all
integration pattern
Give to program, project,
platform the rights tools for theirs
use cases
Provide the freedom and
autonomy by providing a
frame, best practices and
support
API FIRST
if NOT
Only…..
….ED
A

C1 - Internal
use
150
COE API & EDA
COE API & EDA
PROVIDE THE BEST PRACTICES AND FRAMEWORK
Z
Sindhu
PRASANNA
EDA expert
BUSINESS
Business Enablement (support API Product
Owner, projects)
governance & processes
Backlog management aligned with
business priorities
TECHNICAL
API/EDA expert community
Continuous improvement of
Framework
API & EDA community & Technology
Expertise
TRANSVERSAL
Training and upskilling programs
Modern integration sustainability
Analytics and reporting
API expert
BUSINESS TECHNICAL
Abdeladim
ABDELLAH
GLOBAL ARCHITECTURE & DATA

C1 - Internal
use
151
EMEA
DEV
QU
A
PPD
PRD
AMERICAS
DEV
QU
A
PPD
PRD
APAC
DEV
QU
A
PPD
PRD
EVENT DRIVEN PLATFORM
EXISTING PLATFORM
AZURE EVENT HUB
DEV
QUA
PPD
PRD
.
.
REPLICATORS RUNNING
BETWEEN THE ZONES
CONNECTOR DEPLOYED TO
REPLICATE DATA FROM AZURE
EVENT HUB TO CONFLUENT
private
EDA AT L’OREAL

C1 - Internal
use
152
EDA AT L’OREAL

C1 - Internal
use
153
ENABLEMENT
OFFICE
HOURS
c
WEBINAR
ENABLEMENT
TRAINING
PROFESSIONAL
SERVICE

C1 - Internal
use
154
PLANNING
JAN
2023 MILESTONES
FEB
MAR
APR
JUN
MAY
JUL
AUG
SEP
CONTRACT
CLUSTERS SET UP
AUTOMATION
MONITORING (ELK)
PREPRD & PROD READY
(INTERNAL PROJECTS)
PREPRD & PROD READY
(EXTERNAL PROJECTS)
PROOF OF VALUE
GOVERNANCE & BEST
PRACTICES ONBOARDING
KT FOR SUPPORT
TEAM
FIRST PROJECT LIVE

C1 - Internal
use
155
CHALLENGES
.
.
.
.
.
Network configuration between the clusters
OAuth2: Compliancy between Confluent and our IDP
Lack of maturity
Hybrid use cases on private clusters
KsqlDB roles restrictions

C1 - Internal
use
156
DEMOCRATIZED EDA PLATFORM

C1 - Internal
use
157
TOPIC AS A SERVICE
Automatization of access management
to confluent using ITSM tool
(ServiceNow).
Automatization of topic management
for:
to create a topic
to subscribe to a topic
to publish into a topic

C1 - Internal
use
158
ENABLEMENT
OFFICE
HOURS
WEBINAR
ENABLEMENT
TRAINING
PROFESSIONAL
SERVICE

C1 - Internal
use
159
ENABLEMENT

C1 - Internal
use
160
PRODUCT CATALOG
OpenAPI
AsyncAPI

C1 - Internal
use
161
USE CASE
Example 3PL
L’ORÉAL
SAPS4HANA
APIGEE /
CONFLUENT
GEODIS
DHL
Event for Inbound delivery
Inbound delivery confirmation

C1 - Internal
use
162
NEXT STEP
Shared Domain Data
Sets on GCP
Governed Business APIs
on APIGEE
Use Cases
DATA
Product SellOut O+O …
TO SUSTAINABLE DATA PRODUCTS
OWNERSHIP
Data mapped and under
business ownership
ACCESSIBILITY
Data accessibility
to all use case
STANDARDISATION
Shared data and common
catalogue
QUALITY
Single source of truth
SECURITY
Follow group security rules
Governed Business
Event on CONFLUENT

C1 - Internal
use
163
ORGANIZATION
Confluent Account Team
Philippe Amiel Account Executive
Identifies opportunities with new and existing customers
and builds them into long-term profitable relationships. philippe@confluent.io
Eric Carlier
Senior Solutions
Engineer
Key technical advisor to customers, undertaking
technical design and development of end-to-end
solution.
eric.carlier@confluent.io
Camille de Rosier
Customer Success
Manager
Ensures customers are successful in their deployments of
Confluent service throughout onboarding and beyond.
cderosier@confluent.io
Sylvain Le Gouellec
Customer Success
Technical Architect
Ensures customers realize the full value of the Confluent
service. Runs point with customer and liaises with internal
account team from day to day.
slegouellec@confluent.io
Daniel Petisme
Customer Success
Technical Architect dpetisme@confluent.io
Nils Bouchardon
Senior Solutions
Architect
Your senior technical lead who will guide you through
design principles, deployment strategies, best practices
and much more.
nbouchardon@confluent.io

Cocktail déjeunatoire & Networking

Scaling with Kafka: notre expérience
Julien Maillard
Architecte
CDC Informatique

CDC Informatique
La Caisse des Dépôts
Le groupe Caisse des Dépôts, alliance
unique d'acteurs économiques publics et
privés, s’engage, au cœur des territoires,
pour accélérer la transformation écologique
et pour contribuer à une vie meilleure pour
toutes et tous.
1 320Md€
Bilan agrégé 2022 *
4,2 Md€
Résultat net agrégé
*
* Chiffres agrégés : :Section générale comptes consolidés en
normes IFRS + Fonds d’Epargne en normes françaises
171

CDC Informatique
CDC Informatique
PRIORÉNO
172

Kafka introduit pour l'ingestion de
données en temps réel dans Hive.
CDC Informatique
L’arrivée de Kafka : Un tournant en 2019
173
Module HDF déployé en mars 2019.

Nécessité de revoir l'ISP pour être conforme aux nouveaux usages
Offre de service du socle non critique
Multiplicité des outils pour la création de ressource Kafka 3 équipes
Peur et résistance au changement
Présence de silos prégnants
CDC Informatique
Nos Constats : 2021
174

CDC Informatique
Pourquoi avons-nous évolué ?
175
Conviction que
l'état actuel
n'était pas viable
Soutien d'un
responsable de
squad engagé
Nouveaux enjeux
et jalons métier
(SRE, Instant
payment)
Élément
incontournable dans
l'approche Cloud
Native de notre
schéma directeur
Décryptage de Kafka
pour le rendre lisible
auprès de la DG et avoir
des sponsors

CDC Informatique
Notre démarche
◆ Diagnostic 360°
◆ Infrastructures
◆ Sécurité
◆ DevSecOps
◆ Supervision
◆ Enjeux et jalons métier (instant payment)
◆ Usage existant
◆ Questionnement sur la distribution de Kafka
176
Réalisation d'une étude complète comprenant :

CDC Informatique 177
La nouvelle cible
◆ Mise en œuvre rapide des projets et diffusion de la connaissance.
◆ Formation des équipes.
◆ Sécurité renforcée de la plateforme.
◆ Plateforme DevSecOps maîtrisée de bout en bout.
◆ Haute disponibilité et compatibilité avec le PSI.
◆ Prendre en compte les irritants collectés lors du constat

CDC Informatique
Rappel des scénarios éditeurs
pour le socle Kafka
Évolution de la plateforme actuelle
Cloudera HDF vers CDP
Nouvelle plateforme Confluent
Nouvelle plateforme Apache Kafka basée
sur les fonctionnalités de la LAPOSTE BSCC
178

CDC Informatique
◆ Professional Service
◆ MultiRegion cluster (PSI)
◆ Délai de mise en œuvre faible
179
Scénario choisi par le codir et
recommandé par nos équipes
Scénario Confluent
◆ Haut niveau d’industrialisation
◆ Meilleure gestion du contenu
◆ Support éditeur expert Kafka

CDC Informatique
Bilan en chiffre après 1 an de production
◆ 69 applications en recette.
◆ 30 personnes formées (120 jours
de formation).
◆ 40 jours de PS.
◆ 4 montées de version sans
interruption de service.
◆ 2 ops, 2 experts techniques, 1
archi 180
◆ Mars 2022:
Démarrage du projet.
◆ Octobre 2022 :
Ouverture de la production.
◆ 5 clusters : 3 clusters (8 brokers)
sur 2,5 Data Centers.
◆ 35 applications en production.

CDC Informatique
Bilan après 1 an de production
181
◆ La plateforme est devenue un exemple à suivre dans l’entreprise
◆ Retour très positif sur l'autonomie des équipes
◆ Documentation en ligne autoportante
◆ Pattern sur étagère transactional outbox
◆ Dashboard de métriques public de la plateforme
◆ Elastic qui offre l'accès à tous les logs des connecteurs par cluster

CDC Informatique
Les clés de notre réussite
Équipe
pluridisciplinaire
Intégration
précoce de
toutes les
équipes
Budget projet
complet
Beaucoup de PS
au démarrage
Transformation
organisationnelle
182

CDC Informatique
Prochaines étapes et défis à relever
183
◆ Industrialisation des secrets, des cas d’usage, et des tableaux de bord.
◆ Travaux de rework et convergence sur l’IAC (API, Kubernetes, S3).
◆ Vérification automatisée des normes d’entreprise.
◆ Interaction utilisateur via IHM avec la plateforme.

CDC Informatique
En conclusion
We have a Dream !
Rendez-vous dans 2 ans !
184

CDC Informatique
Scaling avec Kafka : Notre expérience
Julien Maillard
Architecte
CDC Informatique
MERCI
185

01
02
03
Understanding the importance of stream processing
Why Apache Flink is becoming the de facto standard
Enhancing Apache Flink as a cloud-native service
Agenda

Keynote: Stream processing with
Apache Flink®
Konstantin Knauf
Director Solutions Engineering
Confluent

Understanding the importance of
stream processing

Stream processing is a critical part of data streaming
Enable frictionless
access to up-to-date
trustworthy data
products
Share
Reimagine data
streaming everywhere,
on-prem and in every
major public cloud
Stream
Make data in motion
self-service, secure,
compliant and
trustworthy
Govern
Drive greater
data reuse with
always-on stream
processing
Process
Make it easy to on-
ramp and off-ramp
data from existing
systems and apps
Connect

Stream processing acts as the compute layer to Kafka,
powering real-time applications & pipelines
DATA IN MOTION
Streaming
Applications
Apache
Flink
Apache
Kafka
DATA AT REST
Application
Layer
Processing
Layer
Storage
Layer
Traditional
Databases
File
Systems
Web
Applications

Processing
Kafka
Custom
apps
3rd party
apps
Databases
Databas
e
Data
Warehouse
SaaS
app
Querie
s
Analytics
Interactions
Processing
Processing
Processing down
stream of Kafka
increases latency,
adds costs and
redundancy, and
inhibits data reuse
Increased complexity from
redundant processing
Data systems & applications
built on stale data
Expensive & inefficient to clean
and enrich data multiple times

Processing data at
ingest improves
latency, data
portability, and cost
effectiveness
Custom
apps
3rd party
apps
Databases
Databas
e
Data
Warehouse
SaaS
app
Querie
s
Analytics
Interactions
Kafka
Storage
Flink
Compute
Stream Processing
Process your data once, process your data right
Maximized data reusability &
consistency
Improved cost-efficiency from
cleaning & enriching data once
Real-time apps & data systems
reflect current state

Stream processing enables users to filter, join, and enrich
streams on-the-fly to drive greater data reuse
Heatmap service
Payment service
Supply chain systems
Watch lists
Profile mgmt
Incident mgmt
Customer
profile data
ITSM systems
Central log systems
Fraud & SIEM systems
Alerting systems
AI/ML engines
Visualization apps
Threat vector
Transactions
Payments
Mainframe data
Inventory
Weather
Telemetry
IoT data
Notification engine
Payroll systems
CRM systems
Mobile application
Personalization
Web application
Clickstreams
Customer loyalty
Change logs
Customer data
Recommendation
engine

Why Apache Flink is becoming
the de facto standard

Flink growth has
mirrored the growth
of Kafka, the de facto
standard for
streaming data
>75% of the Fortune 500 estimated
to be using Kafka
>100,000+ orgs using Kafka
>41,000 Kafka meetup attendees
>750 Kafka Improvement Proposals
>12,000 Jiras for Apache Kafka
0
50,000
100,000
150,000
2020 2021 2022
2016 2017 2018
Flink
Kafka
Two Apache Projects, Born a
Few Years Apart
Monthly Unique Users

Innovative companies have adopted both Kafka & Flink

Digital natives leverage Flink to disrupt markets and gain
competitive advantage
UBER: Real-time Pricing NETFLIX: Personalized Recs STRIPE: Real-time Fraud Detection

Developers choose Flink because of its performance and
rich feature set
Scalability and
Performance
Fault
Tolerance
Flink is a top 5 Apache project and boasts a robust developer community
Unified
Processing
Flink is capable of
supporting stream
processing workloads
at tremendous scale
Language
Flexibility
Flink's fault tolerance
mechanisms ensure it
can handle failures
effectively and provide
high availability
Flink supports Java,
Python, & SQL with
150+ built-in functions,
enabling devs to work
in their language of
choice
Flink supports stream
processing, batch
processing, and ad-hoc
analytics through one
technology

Flink’s powerful runtime offers limitless scalability
Job Manager
Client
. . . . . .
Task Slot
. . . . . .
Task Slot
. . . . . .
Task Slot
. . . . . .
Task Slot
Data Streams
Deploy, Stop, Cancel
Tasks
Trigger Checkpoints
Submit
Job
Result
s
Applications are parallelized into possibly
thousands of tasks that are distributed and
concurrently executed in a cluster

Leverage in-memory performance
. . .
Durable
Storage
Logic State Logic State Logic State
Input
Tasks
Output
In-Memory or
On-Disk State
Local State
Access
Periodic, Asynchronous,
Incremental Snapshots
Stateful Flink applications are optimized for fast access to local state by maintaining
task state in memory or on-disk data structures, resulting in low latency processing.

Flink checkpoints and savepoints enable fault tolerance and
stateful processing
CHECKPOINTS SAVEPOINTS
Automatic snapshot
created by Flink
periodically
● Used to recover from failures
● Optimized for quick recovery
● Automatically created and managed
by Flink
User-triggered snapshot at
a specific point in time
● Enables manual operational tasks,
such as upgrades
● Optimized for operational flexibility
● Created and managed by the user

Flink recovers from failures in a timely and efficient manner
Job Manager
Client
. . . . . .
Task Slot
. . . . . .
Task Slot
. . . . . .
Task Slot
. . . . . .
Task Slot
Data Streams
Deploy, Stop, Cancel
Tasks
Trigger Checkpoints
Submit
Job
Result
s
If a task managers fails, the job manager will
detect the failure and arrange for the job to be
restarted from the most recent state snapshot
X

Flink offers layered APIs at different levels of of abstraction
to handle both common and specialized use cases
Flink SQL
Table API
DataStream API
ProcessFunction Apache Flink Runtime
Low-level Stream Operator API
DataStream
API
ProcessFunction
Table / SQL API
Planner/Optimize
r
Flink SQL
High-level, declarative API that allows you to write SQL
queries to process data streams and batch data as
dynamic tables
Table API
Programmatic equivalent of Flink SQL, allowing you to
define your business logic in either Java or Python, or
combine it with SQL
DataStream API
Low-level, expressive API that exposes the building
blocks for stream processing, giving you direct access to
things like state and timers
ProcessFunction
The most low-level API, allowing for fine-grained
processing of individual elements for complex event-
driven processing logic and state management

Process real-time
data streams with
Flink SQL
Flink SQL is an ANSI-compliant SQL
engine that can define both simple
and complex queries, making it well-
suited for most stream processing
use cases, particularly building real-
time data products and pipelines.
GROUP BY color
events
results
COUNT
WHERE color <> orange
4
3

Flink supports unified stream and batch processing
● Entire pipeline must always be running ● Execution proceeds in stages, running as needed
● Input must be processed as it arrives ● Input may be pre-sorted by time and key
● Results are reported as they become ready ● Results are reported at the end of the job
● Failure recovery resumes from a recent snapshot ● Failure recovery does a reset and full restart
● Flink guarantees effectively exactly-once results
despite out-of-order data and restarts due to
failures, etc.
● Effectively exactly-once guarantees are more
straightforward

Enhancing Apache Flink as a
cloud-native service

Operating Flink on your own (along with the Kafka storage
layer) is difficult
Deployment
Complexity
Setting up Flink requires a
deep understanding of
resource allocation and
management
Management &
Monitoring
Picking relevant metrics can
be overwhelming for a
DevOps team just starting
with stream processing
Limited
Ecosystem
Flink lacks pre-built
integrations with observability,
metadata management, data
governance, and security
tooling
Cost &
Risk
Self-supporting Flink
incurs significant costs &
resources in terms of infra
footprint and Dev & Ops
FTEs

Effortlessly filter, join, and enrich your
data streams with Flink, the de facto
standard for stream processing
Enable high-performance and efficient
stream processing at any scale,
without the complexities of
infrastructure management
Experience Kafka and Flink as a
unified platform, with fully integrated
monitoring, security, and governance
Confluent Cloud for
Apache Flink®
Simple, Serverless Stream Processing
Easily build high-quality,
reusable data streams with
the industry’s only cloud-
native, serverless Flink service
Available for preview in select regions – see the docs for regional availability

Effortlessly filter, join, and enrich your data streams with Apache Flink
Real-time processing
Power low-latency applications and pipelines that react
to real-time events and provide timely insights
Data reusability
Share consistent and reusable data streams widely with
downstream applications and systems
Data enrichment
Curate, filter, and augment data on-the-fly with
additional context to improve completeness, accuracy, &
compliance
Efficiency
Improve resource utilization and cost-effectiveness by
avoiding redundant processing across silos
“With Confluent’s fully managed Flink offering, we can access, aggregate, and enrich data from IoT sensors,
smart cameras, and Wi-Fi analytics, to swiftly take action on potential threats in real time, such as intrusion
detection. This enables us to process sensor data as soon as the events occur, allowing for faster detection and
response to security incidents without any added operational burden.”

Analyze real-time
data streams to
generate important
business insights
Get up-to-date results to power
dashboards or applications requiring
continuous updates using:
● Materialized views
● Temporal analytic functions
● Interactive queries
Account Balance
A $15
B $2
C $15
Account A,
+$10
Account B,
+$12
Account C, +$5
Account B, -
$10
Account C,
+$10
Account A, -$5
Account A,
+$10
Time
REAL-TIME ANALYTICS

Build streaming data
pipelines to inform
real-time decision
making
Create new enriched and curated
streams of higher value using:
● Data transformations
● Streaming joins, temporal joins,
lookup joins, and versioned joins
● Fan out queries, multi-cluster
queries
215
t1, 21.5 USD
t3, 55 EUR
t5, 35.3
EUR
t0, EUR:USD=1.00
t2, EUR:USD=1.05
t4: EUR:USD=1.10
t1, 21.5 USD
t3, 57.75 USD
t5, 38.83 USD
Currency rate
Orders
STREAMING DATA PIPELINES

Recognize patterns
and react to events in
a timely manner
Develop applications using fine-
grained control over how time
progresses and data is grouped
together using:
● Hopping, tumbling, session
windows
● OVER aggregations
● Pattern matching with
MATCH_RECOGNIZE
EVENT-DRIVEN APPLICATIONS
C
price>lag(price)
D
price<lag(price)
C
price>lag(price)
B
price<lag(price)
A
Double Bottom
Period & Volume
Price

Enrich real-time data streams with Generative AI directly
from Flink SQL
INSERT INTO enriched_reviews
SELECT id
,
review
,
invoke_openai(prom
pt,review) as score
FROM
product_reviews
;
K
N
Kate
4 hours ago
This was the worst decision ever.
Nikola
1 day ago
Not bad. Could have been
cheaper.
K
N
B
Kate
★★★★★ 4 hours ago
This was the worst decision ever.
Nikola
★★★★★ 1 day ago
Not bad. Could have been
cheaper.
Brian
★★★★★ 3 days ago
Amazing! Game Changer!
The Prompt
“Score the following text on a scale of
1 and 5 where 1 is negative and 5 is
positive returning only the number”
DATA STREAMING PLATFORM
B
Brian
3 days ago
Amazing! Game Changer!
COMING SOON

Fully managed
Easily develop Flink applications with a serverless, SaaS-
based experience instantly available & without ops
burden
Elastic scalability
Automatically scale up or down to meet the demands of
the most complex workloads without overprovisioning
Usage-based billing
Pay only for resources used instead of infrastructure
provisioned, with scale-to-zero pricing
Continuous, no touch updates
Build using an always up-to-date platform with
declarative, versionless APIs and interfaces
Throughput/Data Traffic Over Time
Capacity Demand
Enable high-performance and efficient stream processing at any scale
"Offloading that day-to-day burden of operations has been a huge help. A lot of overall operations-type work
gets offloaded when you move to Confluent Cloud… Where we’re saving time now is on the DevOps side of
maintenance of all those systems — patching underlying systems or upgrading(them) — those were big things
to be able to offload."

Go from zero to production in minutes versus months
Minutes
Weeks
Open Source
Apache Flink
In-house development and
maintenance without
support
Cloud-hosted
Flink services
Manual Day 2 operations
with basic tooling and/or
support
Apache Flink on
Confluent Cloud
Fully managed, elastic,
and automated product
capabilities with zero
overhead
Months

Throughput over Time Capacity
Demand
Maximize resource utilization & avoid over-provisioning infrastructure
Scale elastically to
meet changing
business needs
Automatically scale up or down to
meet the demands of the most
complex workloads
● Avoid underutilized infrastructure
resources
● Pay only for resources used, with
scale-to-zero pricing

Tap into a next-generation, serverless SQL experience …
SQL client in Confluent
Cloud CLI
Different teams with different skills and needs can access
stream processing using the interface of their choice
Rich SQL editing
user interface

"When used in combination, Apache Flink & Apache Kafka can enable data reusability and avoid redundant
downstream processing. The delivery of Flink & Kafka as fully managed services delivers stream processing
without the complexities of infrastructure management, enabling teams to focus on building real-time streaming
applications & pipelines that differentiate the business."
Enterprise-grade security
Secure stream processing with built-in identity and
access management, RBAC, and audit logs
Stream governance
Enforce data policies and avoid metadata duplication
leveraging native integration with Stream Governance
Monitoring
Ensure the health and uptime of your Flink queries in the
Confluent UI or via 3rd party monitoring services
Connectors
Ensure the health and uptime of your Flink queries in the
Confluent UI or via 3rd party monitoring services
Monitoring Connectors
Enterprise-grade
Security
Stream
Governance
Experience Kafka and Flink seamlessly integrated as a unified
platform

Provide platform-wide security with granular access to
critical resources
Flink
Admin
Flink
Developer
Flink
Developer
Flink SQL
queries
Flink Control
Plane requests

Automate metadata
synchronization for
effortless data
exploration
Integration with Schema Registry
enables Flink to easily access and
process data from multiple Kafka
clusters and Confluent environments
in a consistent and unified way:
● Kafka topics → Flink tables
● Confluent environments → catalogs
● Kafka clusters → databases …
…
…

Connect your entire business with just a few clicks
70+
fully
managed
connectors
Amazon S3
Amazon Redshift
Amazon DynamoDB
Google Cloud
Spanner
AWS Lambda
Amazon SQS
Amazon Kinesis
Azure Service Bus
Azure Event Hubs
Azure Synapse
Analytics
Azure Blob
Storage
Azure
Functions
Azure Data Lake
Google
BigTable

De Kafka Open Source à la mise en
place d’une stratégie multi-cloud
avec Confluent
Ahmed Tali
Group Head of Architecture & Foundations Engineering
Europcar Mobility Group

229
229
Agenda
1. Europcar Mobility Group Global Context
1. Group Information System
1. Internal Kafka Usage
1. Event Driven Architecture Study
1. Why Confluent Cloud
1. Migration Plan
1. Project Status & Next Steps

230
Europcar Mobility Group in a nutshell
230
Global Context
- Part of Green Mobility Holding led by VW
- Extensive network in more than 140 countries
- Almost 9000 employees
- 3 Billions revenue / 256 000 Vehicles
- 5 million customers worldwide

231
231
Group Information System
EMOBG Information System in nutshell
Business Oriented
IS Components
Domain Driven Design
Brand Agnostic
Products based organization
Multi-cloud
Strategy
AWS First approach
Specialized business domains in
GCP
Composable
Architecture
Interoperability with 3rd party
solutions
API Products & Events as main
communication flows
Technology
Transformation
Monoliths to Microservice
Architecture
API First Approach
Event Driven Architecture

232
Overview & Architecture Patterns
232
Event Driven Usage
Event Driven Patterns
- Publish-Subscribe
- Kafka Connectors
- Change Data Capture
Microservices Based Architecture
- Autonomous Microservices
(own storage)
- Microfrontend apps
Distributed and Open IS
- Multicloud / Multi region
- Full Integration with 3rd Party
and Partners Systems

233
Former Situation
233
Group Event Driven Study
Main Issues
- Difficult to setup a stable and extensible platform
- Tricky to scale Kafka platforms causing performance issues
- Hard to achieve high availability
- Costly Integration :
- Several development workloads
- 3 Implementations to maintain
- Lack of visibility on group level

234
Target Situation
234
Main Expectations
- Unify our event driven layer and setup a well
governed kafka based solution
- Adopt latest Kafka Market Standards
- Be focus on business flows instead of managing
Kafka platform
Studied Options
- Self Hosted Event Driven Solution
- Fully Managed Event Driven Solution

235
Why fully managed model
235
1. Kafka Technology difficult to master by internal teams (based on years of
experience)
- We need permanent high level kafka expertise
1. Autonomous teams operating in a multicloud and distributed environments are not
adopting same industry standards
- We need central goverened Kafka solution with group policies
(security, monitoring…) and applied by everybody
1. Managing, Scaling & maintaining Kafka platform reduces teams autonomy and
impact focus on business aspects
- We need stable, performant and auto scaled solution with low
internal effort

236
Why Confluent Cloud
236
Why Confluent Cloud
- We need high level of kafka expertise
- Confluent Cloud are original creators of Apache Kafka
- We need fully managed, stable and auto scaled solution
- Confluent Cloud provides Fully Managed and Hybrid services
- We need central governed Kafka solution where we can apply group
policies (security, monitoring…)
- Confluent Cloud brings features over Kafka such as monitoring,
security, connectors…
- We need cloud agnostic solution offering good level of our infrastructure
coverage
- Confluent Cloud covers all our cloud providers and aligns to our
multicloud strategy

237
High level Architecture
237
EMOBG Confluent Cloud Integration
- Confluent Cloud cluster for each Cloud Provider
- Private Links to secure access for each cloud
provider
- Using CI / CD automation, based on terraform
- Self hosted Connectors on EMOBG clouds
(Internal flows)
- Fully managed connectors for external sources /
sinks (Salesforce, SAP..)
- Cluster linking feature as migration enabler

238
238
Migration Plan to Confluent Cloud
- Stop all local Kafka brokers evolutions (No more flow on
them)
- Migration of technical flows : CDC, JDBC Connectors
- Replication of current kafka local configuration in new
confluent cluster
- Connection of Data sources and Data sinks to new clusters
- Assessment, Quality assurance & Validation with teams
- Migration of functional flows : Publish / Subscribe
- Confluent Cloud CI / CD pipeline shared and used in full
autonomy by teams
- Pilote phase with selected teams (Learning path)
- Full Migration Tribe by tribe (10 tribes)

239
239
Project Status and Next Steps
Project status
- Foundations
- AWS & GCP Terraform CICD pipeline
- Production & Non production environments & Clusters
- Self Hosted Cluster Connect on AWS
- Security flow Access through OIDC & CC Identity Pool
- Migration Status
- Tech flows : Self Hosted Debezium connectors migrated to Confluent Cloud
- Functional flows :
- Kafka Legacy Topics Replication to confluent Cloud
- Connect sources & sinks to CC topics (end of Q1 2024)

240
240
Project Status and Next Steps
Next steps & Opportunities
- Big milestone : Data platform BI, Data Analytics Integration
- Tech Transformation & Azure cloud extension
- Buy First approach & Third parties flows (SAP, Salesforce connectors)

How Everysens made its product
pivot a success with confluent cloud
Dai-Chinh Nguyen
CTO
Everysens
Luc Jallerat
Senior Back Developer
Everysens

How Everysens made its product pivot a success
with confluent.cloud
Luc Jallerat (Senior Backend Developer)
Dai-Chinh Nguyen (CTO)
October 2023

Titre
Everysens: Smart collaboration to decarbonise freight transportation
Why
Why
How
How
What
What
✔ 55+ employees: 60% engineers & products
✔ 3 Offices in Paris, Lille and Duisburg
✔ One-stop shop for rail users
✔ 8 years of expertise in Rail Freight Digitisation
✔ A team experienced in deploying international
projects
Decarbonize Freight Transport
Collaborative SaaS Solution “TVMS”
● The largest integrated rail freight
ecosystem
● A SaaS tool made by and for
shippers
● Single Source of Truth for shared
Data
● Leveraging real-time data in rail
freight processes

Titre
What does a TVMS do ?
Day-to-day challenges of a logistic operator
● Plan & operate freight transports
● Anticipate loading/unloading operations
● Challenge carriers’ performance
● Secure communication with partners
● Optimize wagon fleet size
● Reduce logistic operation costs
● Reduce logistic operation CO2 emissions
● …
Everysens TVMS facilitates those operations

Titre
Once upon a time…
2016
2019
2020-2021
RAIL FREIGHT VISIBILITY
SYSTEM (SaaS)
RAIL FREIGHT TRANSPORT
MANAGEMENT SYSTEM (SaaS)
RAIL FREIGHT TRANSPORT AND
VISIBILITY MANAGEMENT SYSTEM (SaaS)
2015
IOT DEVICE MAKER FOR ASSET
LOCALISATION
Move to Cloud (GCP) From self hosted Kafka to
Confluent cloud
2022-2023
OPENING OF OUR
GERMAN OFFICE IN
DUISBURG
AND
FUNDRAISING OF 6M€

Titre
How Technology supported those transformations ? (1/2)
1.
From IoT sensors to a SaaS
Visibility System
Main Challenge :
SaaS system design principles
1 2
Modular service-based architecture
API & Event-based communication
Agility & Continuous Delivery
Container orchestration
Cloud infrastructure & managed services
Cloud Native Interoperable
Standard public API
Data Integration middleware
Master Data Standards
3
Data Centric
Data Analytics
Real time data processing
Data Science
4
Reliable & Secured
Scalability
Resiliency
Recoverability
Security policy & Legal compliance

Titre
2.
Adding the “V” to
the TVMS
Main Challenge :
Seamless merging of Visibility & TMS systems
TMS VISIBILITY
Contract
Asset
Asset
Type
Route
Goods
Order
Contact
Goods
Route
Transpor
t

Titre
1 Golden Source
+
2 Domains
=
Exchanging Transactional
Data
+
Sharing Static Data
Referential
TMS VISIBILITY
Front TMS
static data
Front Visibility
static data
Front MDR
static data

Titre
Referential
Front MDR
static data
TMS
DB
TMS VISIBILITY
V
DB
REPLICATION
1 New Referential
Binding
2 Old Referentials
Sharing Static Data in a Legacy Context

Titre
Outbox Pattern
DB
TRANSACTION
Entity Outbox
Double Write Issue

Titre
Global Solution for the New Referential
Architecture

Titre
Global Solution in Action
Golden Source
Event in AVRO

Titre
The rest of the journey
???
???
???
Integrationof Flink for a global
Past + Present perspective in
real-time
General WebHook Catalogconnected to
our internalevents
ModularRealTime Fully-IntegratedGlobal TVMS
System
???
TrackingEngine computingimpactsof
unorderedevents on both the past and
the present
From a Batch Driven approach
To an Event Drivenone

Questions?
Contact
youness.lemrabet@everysens.com
Website
www.everysens.com
256

© 2022, Amazon Web Services, Inc. or its
affiliates.
© 2022, Amazon Web Services, Inc. or its
affiliates.
Reatime is everywhere
Confluent on AWS
Mickael Baye,
DATA IN MOTION PARIS 2023
Senior Solution Architect , AWS
257
Mohamed Hamza Ben Mansour
Senior Solution Architect , AWS

Copyright 2021, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc.
Speakers
Mohamed Hamza Ben Mansour
Senior Solution Architect , AWS, France
Mickael Baye,
Senior Solution Architect , AWS, France

Agenda
1. Real time is everywhere
2. Confluent on AWS
3. What our customers do together
4. Wrap up
259

2. Confluent on AWS
3. What customers do with Confluent on AWS
4. Wrap Up
260

Event Streaming allow us to set Data in Motion:
Continuously processing evolving streams of data in real-time
Rich front-end
customer
experiences
Real-time
Events
Real-time
Event Streams and Analysis
A Sale A shipment
A Trade
A Customer
Experience
Real-time
backend
operations

Real time in everyday life
262
Anomaly and fraud detection
Empowering IoT analytics
Nourishing marketing campaigns
Real-time personalization
Tailoring customer experience in real time
Supporting healthcare and emergency services

Other
Systems
Other
Systems
Kafka
Connect
Kafka Cluster
Kafka
Connect
Apache Kafka is an Event Streaming Platform

ksqlDB
Meeting you where you are

Copyright 2021, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc. 265
The standard across industries
Finance & Banking Insurance Telecom
Travel & Retail
10 OUT
OF 10 8 OUT
OF 8
Fortune 500 Companies
Using Apache Kafka
70%
Transportation Energy & Utilities Entertainment
Technology
8 OUT
OF 10 9 OUT
OF 10
10 OUT
OF 10
10 OUT
OF 10
10 OUT
OF 10 8 OUT
OF 10

2. Confluent on AWS
4. Wrap Up
266

Cloud-native, complete, everywhere
Re-imagined Kafka Experience
Fully Managed
No Ops
On AWS

Cloud-native, complete, everywhere
Integrated Solutions
● Data Lake/Warehouse
modernization
● Mainframe offload
● Streaming Analytics
● Hybrid Cloud App Modernization
● Industry specific use-cases
OSS Developer Traction
● 100s of thousands of Kafka OSS
developers in the enterprise
Accelerate Cloud
Migrations
● No complex Lift-n-Shift
● Maintain business continuity with
zero-downtime
● Break silos to enable immediate
App/Data innovation in cloud
True Hybrid-Cloud
Architectures
● Across global multi-DCs & cloud
● Leverage legacy investments with
Hybrid Kafka & bidirectional sync
● Shift legacy $ spend to AWS by
offloading Mainframe, Oracle,...
Meet you
where you are
● 200+ pre-built connectors
including S3, RedShift, Lambda,...
● Support Well–Architected
Scenarios

Out-of-box integration with
popular services
AWS Native Services
Top-5 Global ISV for S3 Data Volume
3rd-Party ISV Services
Native integrations

Confluent and AWS: Better together

Lots of integrations ☺
271

Redshift Sink
Lambda Sink
AWS Direct
Connect
Replicator
LEGACY EDW
MAINFRAME
LEGACY DB
JDBC / CDC
connectors
Connect
Leverage 130+ Confluent pre-built connectors
Modernize
Value added apps, increase agility, reduce TCO
On-prem AWS
Cloud
Bridge
Hybrid cloud streaming
Amazon Athena
AWS Glue
SageMaker
Lake Formation
Amazon
DynamoDB
Amazon
Aurora
S3 Sink
Data Streams
Apps
ksqlDB
Connect to all AWS

2. Confluent on AWS
4. Wrap Up
274

Challenge: Modernizing legacy systems for traditional
banks to enable them to innovate faster, deliver hyper-
personalized customer experiences, and compete with
digital- native banks.
Solution: Deliver a cloud-native SaaS solution—
powered by Confluent Cloud’s real-time data streaming
platform.
Results:
● Reduced costs with increased agility and faster time
to market for traditional banks
● Achieved better hyper-personalized experiences for
banking customers
● Delivered a resilient and highly available platform
● Enhanced enterprise-grade security
● Reduced TCO with simplified management
“Our mission is to make banking 10x better for banks, for
customers, and society. To do that, we need a cloud-
native data streaming platform that is also 10x more
reliable, 10x more performant than Apache Kafka.”

Challenge: Design and maintain a resilient IT
infrastructure that can ensure continued seamless
grocery delivery during a period of unprecedented
growth.
Solution: Confluent Cloud for a real-time, data platform
that unlocks the full value of streaming data and
empowers data visibility, agility, and flexibility across a
rapidly growing organization.
Results:
● Better inventory management via real-time data
● Reduced TCO
● Improved fraud detection
● Faster execution
“For me to go hire a bunch of engineers to babysit Kafka,
I don't have the ability to go do that. Being able to
offload those concerns [to Confluent] is such a relief for
us and lets us focus on delivering value to the
organization and not worrying about ops and the other
overhead”
– Nate Kupp, Director of Engineering, Instacart

Challenge: Address legacy tech-related operational
overhead and scalability issues to allow for better
customer behavior analytics and improve internal
processes.
Solution: Confluent Cloud to save time and money by
reducing operational overhead and allowing for real-
time processing and easy scalability of event data.
Results:
● Reduced infrastructure costs by 40%
● Simplified, future-proof data architecture
● Improved infrastructure monitoring for better SLAs
and system health
● Elimination of data loss
“Confluent provides exactly what we dreamed of: an
ecosystem of tools to source and sink data from data
streams. It’s provided us not only with great data
pipeline agility and flexibility but also a highly simplified
infrastructure that’s allowed us to reduce costs.”
— Dima Kalashnikov, Technical Lead

Challenge: Build a conversational chatbot service that
incorporates complex technologies such as fulfillment,
natural-language understanding, and real-time analytics.
Solution: Use Confluent to build a fast, super-scalable
event-driven architecture that could handle immense
traffic spikes and also provide other guarantees around
delivery semantics.
Results:
● Near-zero downtime even during huge traffic spikes
● Rapid acceleration of new-skill onboarding
● Doubling of NPS rating
“We chose event-driven architecture as the core of our
platform, for which we needed a messaging service that
gave us all the guarantees…not to mention that it had to
be extremely scalable, highly available, and simple to use.
Kafka hit all of these markers, and by using Confluent
Cloud, our team was able to reduce the bottom line and
operational burden.”
— Ravi Vankamamidi, Senior Director, Technology, at Expedia Group

2. Confluent on AWS
4. Wrap Up
279

Amazon Redshift Warehousing with Confluent Cloud
Serverless with AWS and Confluent
Cloud
Real-time Sentiment Analysis with Confluent
Amazon ElastiCache and Confluent Cloud
confluent.awsworkshop.io
Try it out yourself !

Learn more
Working with streaming data on AWS
https://aws.amazon.com/streaming-data/
Modern Data Architecture on AWS
https://go.aws/3OJDhFk
Build Modern Data Streaming Analytics
Architectures on AWS
https://go.aws/3bt0HAm
Derive Insights from Modern Data
https://go.aws/3xVU3dn

Data In Motion Paris 2023

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Data In Motion Paris 2023

Similar to Data In Motion Paris 2023 (20)

More from confluent

More from confluent (20)

Recently uploaded

Recently uploaded (20)

Data In Motion Paris 2023