SlideShare a Scribd company logo
Emanuele Bardelli - OLX Group
From
“All-At-Once, Once-A-Day”
To
“A-Little-Each-Time, All-The-Time”
#SAISExp5
2#SAISExp5
OLX Group and its Data Platform
Users Communication and Lifecycle Management
Near-Time Data Engineering with Spark
Takeaways & Questions
3#SAISExp5
we fuel local economies
by making it
super easy
for anyone to
buy or sell
almost anything
through our platforms
Our Mission
4#SAISExp5
Who We Are
we operate a network of
market-leading trading platforms
in over 40 countries
that are used by more than
350 million people every month
to buy and sell almost anything
5#SAISExp5
Our Numbers
brands
17+
people
5000+
daily listings
1M+
offices
35
daily events
4B+
daily notifications
50M+
monthly users
350M+
countries
43
shopping app
(20+ markets)
#1
6#SAISExp5
Data Platform Overview
RAW JSON
EVENTS
4B+
events day
DATA LAKE
400TB+
compressed
INTERNAL
TRACKERS
DATA
PUMPS
2 minutes
delay
TEAM 1
RESERVOIR
raw data
subset
DATA
PRODUCTS
ANALYTICS
TEAM 2
RESERVOIR
anonymised
parquet
MACHINE
LEARNING
BI
DATA
CATALOG
7#SAISExp5
Users Communication Team
7#SAISExp5
“optimize and deliver all OLX Group
end-user communication”
Transactional
standard messages
produced due to
day-to-day interactions
(reset password, post
approved, new follower, ...)
Marketing
one-off messages
sent to advertise
specific initiatives
(happy holidays, get the
new OLX app, ...)
Lifecycle (CLM)
messages crafted to
influence customers
journey positively
(we miss you, we picked
this item just for you, ...)
8#SAISExp5
Customers Lifecycle Management
70%
of mobile APPs users are lost after 3 days
40%
increase in Daily Active Users
15%
contribution to Value Added Services purchase
9#SAISExp5
CLM Services Ecosystem
• Segmentation
• Recommendations
• User-Devices Mapping
• Live / Control Group Bucketing
• Audience Generation
• Treatments Definition (A/B Testing)
• Messages Prioritization
• Frequency Capping
• Scheduling
• Analytics
10#SAISExp5
Former “Batch” Approach
11#SAISExp5
From
“All-At-Once, Once-A-Day”
To
“A-Little-Each-Time, All-The-Time”
12#SAISExp5
New “Near-Time” Approach
• Near-Time - Produce and send messages within 10 minutes
after a set of events happened
• Open - Give our users freedom to be more creative and
experiment more
• Reliable - Powered by technologies that make it easy to apply
engineering best practices
• Flexible - Adapt to Big Data challenges easily
13#SAISExp5
Near-Time Data Processing with Spark (Platform)
METADATA
CONSUMER
python
application
TASKS
DATABASE
data chunks
to process
KINESIS
STREAM
updated files
metadata
DATA
RESERVOIR
raw data
files
PYTHON + SPARK
FEATURES
PROCESSOR
OFFLINE
STORE
lo-freq data
products
ONLINE
STORE
hi-freq data
products
GLUE DATA
CATALOG
serverless
analytics
USER-DEVICES
MAPPING
SEGMENTATION
BUSINESS
ANALYTICS
RAW DATA SOURCE
files “virtually partitioned”
in “data chunks”
11:30:00 -> 11:39:59
14#SAISExp5
Near-Time Data Processing with Spark (Data Flow)
RAW DATA FILE
…20181003-11:27:05.gz
RAW DATA FILE
…20181003-11:39:48.gz
PY-SPARK FEATURES PROCESSOR
load a “data chunk“ and
process dependent “features”
FEATURES STORE
“features” set partitioned
by “data chunk”
USER_NUMBER_POSTS
…/hour=11/ten_min=20/
DEVICE_FIRST_BROWSE
…/hour=11/ten_min=20/
USER_LAST_LOGIN
…/hour=11/ten_min=20/
RAW DATA FILE
…20181003-11:34:23.gz
RAW DATA FILE
…20181003-11:22:34.gz
USER_NUMBER_POSTS
…/hour=11/ten_min=30/
DEVICE_FIRST_BROWSE
…/hour=11/ten_min=30/
USER_LAST_LOGIN
…/hour=11/ten_min=30/
PROCESSING CORE
(config, load, save, orchestration)
ENGINE
(reusable logic)
ENGINE
(reusable logic)
FEATURE
S
(data prep)
FEATURE
S
(data prep)
11:20:00 -> 11:29:59
kinesis stream metadata (raw data files paths)
tasks database records (data chunks to process)
15#SAISExp5
Data Chunks Metadata Example
16#SAISExp5
Processed Features Data Example
17#SAISExp5
Features Code Example (Data Preparation)
18#SAISExp5
Engines Code Example (Data Transformation)
19#SAISExp5
Data Product Example (Serverless Analytics)
20#SAISExp5
Spark Environment
• PySpark 2.3.0 : Python3 + Spark SQL + Dataframes
• AWS EMR 5.15 (Yarn) : r5.xlarge instances
1 Master + 1 Core + N Task (SPOT)
• Deployment Mode : Client
• Scheduler Mode : Fair
• Features Stores : Offline S3 (Parquet) – Online Cassandra
• CI / CD : GitLab Pipeline (RPM Packaging)
21#SAISExp5
Takeaways
• Goodbye Schedulers, Welcome Events!
• Raw data is messy and unpredictable. Constantly (re)processing
small chunks of data helps with accuracy and consistency
• Separation of storage and compute helps to optimize resources
and allows scalability on-the-cheap
• Python + Spark : Software + Data + Science engineered
together for effective big data processing
22#SAISExp5
THANK YOU! Questions?
GET IN TOUCH WITH US!
olxgroup.com
tech.olx.com
@olxtechberlin
manu@olx.com
/in/emanuelebardelli
22#SAISExp5

More Related Content

What's hot

Nordstrom's Event-Sourced Architecture and Kafka-as-a-Service | Adam Weyant a...
Nordstrom's Event-Sourced Architecture and Kafka-as-a-Service | Adam Weyant a...Nordstrom's Event-Sourced Architecture and Kafka-as-a-Service | Adam Weyant a...
Nordstrom's Event-Sourced Architecture and Kafka-as-a-Service | Adam Weyant a...
HostedbyConfluent
 
Librecon 2016 bilbao: kappa architecture IoT of the cars
Librecon 2016 bilbao:   kappa architecture IoT of the carsLibrecon 2016 bilbao:   kappa architecture IoT of the cars
Librecon 2016 bilbao: kappa architecture IoT of the cars
Juantomás García Molina
 
HOP! Airlines Jets to Real Time
HOP! Airlines Jets to Real TimeHOP! Airlines Jets to Real Time
HOP! Airlines Jets to Real Time
confluent
 
Kafka Summit SF 2017 - DNS for Data: The Need for a Stream Registry
Kafka Summit SF 2017 - DNS for Data: The Need for a Stream RegistryKafka Summit SF 2017 - DNS for Data: The Need for a Stream Registry
Kafka Summit SF 2017 - DNS for Data: The Need for a Stream Registry
confluent
 
Divide & Conquer - Logging Architecture in Distributed Ecosystems with Elasti...
Divide & Conquer - Logging Architecture in Distributed Ecosystems with Elasti...Divide & Conquer - Logging Architecture in Distributed Ecosystems with Elasti...
Divide & Conquer - Logging Architecture in Distributed Ecosystems with Elasti...
Elasticsearch
 
How to Define and Share your Event APIs using AsyncAPI and Event API Products...
How to Define and Share your Event APIs using AsyncAPI and Event API Products...How to Define and Share your Event APIs using AsyncAPI and Event API Products...
How to Define and Share your Event APIs using AsyncAPI and Event API Products...
HostedbyConfluent
 
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
confluent
 
Snowplow Analytics: from NoSQL to SQL and back again
Snowplow Analytics: from NoSQL to SQL and back againSnowplow Analytics: from NoSQL to SQL and back again
Snowplow Analytics: from NoSQL to SQL and back again
Alexander Dean
 
Unified Log London (May 2015) - Why your company needs a unified log
Unified Log London (May 2015) - Why your company needs a unified logUnified Log London (May 2015) - Why your company needs a unified log
Unified Log London (May 2015) - Why your company needs a unified log
Alexander Dean
 
FME Powers CKAN Open Data Portal
FME Powers CKAN Open Data PortalFME Powers CKAN Open Data Portal
FME Powers CKAN Open Data Portal
Safe Software
 
Kafka & InfluxDB: BFFs for Enterprise Data Applications | Russ Savage, Influx...
Kafka & InfluxDB: BFFs for Enterprise Data Applications | Russ Savage, Influx...Kafka & InfluxDB: BFFs for Enterprise Data Applications | Russ Savage, Influx...
Kafka & InfluxDB: BFFs for Enterprise Data Applications | Russ Savage, Influx...
HostedbyConfluent
 
Big data == lean data
Big data == lean dataBig data == lean data
Big data == lean data
Lars Albertsson
 
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
HostedbyConfluent
 
Kafka Summit NYC 2017 - Simplifying Omni-Channel Retail at Scale
Kafka Summit NYC 2017 - Simplifying Omni-Channel Retail at ScaleKafka Summit NYC 2017 - Simplifying Omni-Channel Retail at Scale
Kafka Summit NYC 2017 - Simplifying Omni-Channel Retail at Scale
confluent
 
Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016
Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016
Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016
yalisassoon
 
Kafka and Kafka Streams in the Global Schibsted Data Platform
Kafka and Kafka Streams in the Global Schibsted Data PlatformKafka and Kafka Streams in the Global Schibsted Data Platform
Kafka and Kafka Streams in the Global Schibsted Data Platform
Fredrik Vraalsen
 
Netflix Data Engineering @ Uber Engineering Meetup
Netflix Data Engineering @ Uber Engineering MeetupNetflix Data Engineering @ Uber Engineering Meetup
Netflix Data Engineering @ Uber Engineering Meetup
Blake Irvine
 
Digital Transformation in Healthcare with Kafka—Building a Low Latency Data P...
Digital Transformation in Healthcare with Kafka—Building a Low Latency Data P...Digital Transformation in Healthcare with Kafka—Building a Low Latency Data P...
Digital Transformation in Healthcare with Kafka—Building a Low Latency Data P...
confluent
 
Memory Database Technology is Driving a New Cycle of Business Innovation
Memory Database Technology is Driving a New Cycle of Business InnovationMemory Database Technology is Driving a New Cycle of Business Innovation
Memory Database Technology is Driving a New Cycle of Business Innovation
VoltDB
 
Check Point Big Data Forum m3
Check Point Big Data Forum m3Check Point Big Data Forum m3
Check Point Big Data Forum m3
Alex Fok
 

What's hot (20)

Nordstrom's Event-Sourced Architecture and Kafka-as-a-Service | Adam Weyant a...
Nordstrom's Event-Sourced Architecture and Kafka-as-a-Service | Adam Weyant a...Nordstrom's Event-Sourced Architecture and Kafka-as-a-Service | Adam Weyant a...
Nordstrom's Event-Sourced Architecture and Kafka-as-a-Service | Adam Weyant a...
 
Librecon 2016 bilbao: kappa architecture IoT of the cars
Librecon 2016 bilbao:   kappa architecture IoT of the carsLibrecon 2016 bilbao:   kappa architecture IoT of the cars
Librecon 2016 bilbao: kappa architecture IoT of the cars
 
HOP! Airlines Jets to Real Time
HOP! Airlines Jets to Real TimeHOP! Airlines Jets to Real Time
HOP! Airlines Jets to Real Time
 
Kafka Summit SF 2017 - DNS for Data: The Need for a Stream Registry
Kafka Summit SF 2017 - DNS for Data: The Need for a Stream RegistryKafka Summit SF 2017 - DNS for Data: The Need for a Stream Registry
Kafka Summit SF 2017 - DNS for Data: The Need for a Stream Registry
 
Divide & Conquer - Logging Architecture in Distributed Ecosystems with Elasti...
Divide & Conquer - Logging Architecture in Distributed Ecosystems with Elasti...Divide & Conquer - Logging Architecture in Distributed Ecosystems with Elasti...
Divide & Conquer - Logging Architecture in Distributed Ecosystems with Elasti...
 
How to Define and Share your Event APIs using AsyncAPI and Event API Products...
How to Define and Share your Event APIs using AsyncAPI and Event API Products...How to Define and Share your Event APIs using AsyncAPI and Event API Products...
How to Define and Share your Event APIs using AsyncAPI and Event API Products...
 
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
 
Snowplow Analytics: from NoSQL to SQL and back again
Snowplow Analytics: from NoSQL to SQL and back againSnowplow Analytics: from NoSQL to SQL and back again
Snowplow Analytics: from NoSQL to SQL and back again
 
Unified Log London (May 2015) - Why your company needs a unified log
Unified Log London (May 2015) - Why your company needs a unified logUnified Log London (May 2015) - Why your company needs a unified log
Unified Log London (May 2015) - Why your company needs a unified log
 
FME Powers CKAN Open Data Portal
FME Powers CKAN Open Data PortalFME Powers CKAN Open Data Portal
FME Powers CKAN Open Data Portal
 
Kafka & InfluxDB: BFFs for Enterprise Data Applications | Russ Savage, Influx...
Kafka & InfluxDB: BFFs for Enterprise Data Applications | Russ Savage, Influx...Kafka & InfluxDB: BFFs for Enterprise Data Applications | Russ Savage, Influx...
Kafka & InfluxDB: BFFs for Enterprise Data Applications | Russ Savage, Influx...
 
Big data == lean data
Big data == lean dataBig data == lean data
Big data == lean data
 
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
 
Kafka Summit NYC 2017 - Simplifying Omni-Channel Retail at Scale
Kafka Summit NYC 2017 - Simplifying Omni-Channel Retail at ScaleKafka Summit NYC 2017 - Simplifying Omni-Channel Retail at Scale
Kafka Summit NYC 2017 - Simplifying Omni-Channel Retail at Scale
 
Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016
Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016
Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016
 
Kafka and Kafka Streams in the Global Schibsted Data Platform
Kafka and Kafka Streams in the Global Schibsted Data PlatformKafka and Kafka Streams in the Global Schibsted Data Platform
Kafka and Kafka Streams in the Global Schibsted Data Platform
 
Netflix Data Engineering @ Uber Engineering Meetup
Netflix Data Engineering @ Uber Engineering MeetupNetflix Data Engineering @ Uber Engineering Meetup
Netflix Data Engineering @ Uber Engineering Meetup
 
Digital Transformation in Healthcare with Kafka—Building a Low Latency Data P...
Digital Transformation in Healthcare with Kafka—Building a Low Latency Data P...Digital Transformation in Healthcare with Kafka—Building a Low Latency Data P...
Digital Transformation in Healthcare with Kafka—Building a Low Latency Data P...
 
Memory Database Technology is Driving a New Cycle of Business Innovation
Memory Database Technology is Driving a New Cycle of Business InnovationMemory Database Technology is Driving a New Cycle of Business Innovation
Memory Database Technology is Driving a New Cycle of Business Innovation
 
Check Point Big Data Forum m3
Check Point Big Data Forum m3Check Point Big Data Forum m3
Check Point Big Data Forum m3
 

Similar to From “All-at-Once, Once-a-Day” to “A-Little-Each-Time, All-the-Time” with Emanuele Bardelli

Real-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLReal-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQL
SingleStore
 
Design for X: Exploring Product Design with Apache Spark and GraphLab
Design for X: Exploring Product Design with Apache Spark and GraphLabDesign for X: Exploring Product Design with Apache Spark and GraphLab
Design for X: Exploring Product Design with Apache Spark and GraphLab
Amanda Casari
 
ACDKOCHI19 - Next Generation Data Analytics Platform on AWS
ACDKOCHI19 - Next Generation Data Analytics Platform on AWSACDKOCHI19 - Next Generation Data Analytics Platform on AWS
ACDKOCHI19 - Next Generation Data Analytics Platform on AWS
AWS User Group Kochi
 
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...
Databricks
 
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Flink Forward
 
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Zalando Technology
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017
SingleStore
 
SAP REAL TIME DATA PLATFORM WITH SYBASE SUPPORT
SAP REAL TIME DATA PLATFORM WITH SYBASE SUPPORTSAP REAL TIME DATA PLATFORM WITH SYBASE SUPPORT
SAP REAL TIME DATA PLATFORM WITH SYBASE SUPPORT
Sybase Türkiye
 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Data Con LA
 
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analytics
kgshukla
 
The hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaThe hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at Helixa
Alluxio, Inc.
 
sap-overview.ppt
sap-overview.pptsap-overview.ppt
sap-overview.ppt
RamaKrishnaErroju
 
Data Onboarding Breakout Session
Data Onboarding Breakout SessionData Onboarding Breakout Session
Data Onboarding Breakout Session
Splunk
 
Getting Started with Splunk Breakout Session
Getting Started with Splunk Breakout SessionGetting Started with Splunk Breakout Session
Getting Started with Splunk Breakout Session
Splunk
 
Data & analytics challenges in a microservice architecture
Data & analytics challenges in a microservice architectureData & analytics challenges in a microservice architecture
Data & analytics challenges in a microservice architecture
Niels Naglé
 
Managing your black friday logs Voxxed Luxembourg
Managing your black friday logs Voxxed LuxembourgManaging your black friday logs Voxxed Luxembourg
Managing your black friday logs Voxxed Luxembourg
David Pilato
 
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
Landon Robinson
 
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
HUJAK - Hrvatska udruga Java korisnika / Croatian Java User Association
 
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
ALTER WAY
 
There is more to Big Data than data
There is more to Big Data than dataThere is more to Big Data than data
There is more to Big Data than data
Capgemini
 

Similar to From “All-at-Once, Once-a-Day” to “A-Little-Each-Time, All-the-Time” with Emanuele Bardelli (20)

Real-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLReal-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQL
 
Design for X: Exploring Product Design with Apache Spark and GraphLab
Design for X: Exploring Product Design with Apache Spark and GraphLabDesign for X: Exploring Product Design with Apache Spark and GraphLab
Design for X: Exploring Product Design with Apache Spark and GraphLab
 
ACDKOCHI19 - Next Generation Data Analytics Platform on AWS
ACDKOCHI19 - Next Generation Data Analytics Platform on AWSACDKOCHI19 - Next Generation Data Analytics Platform on AWS
ACDKOCHI19 - Next Generation Data Analytics Platform on AWS
 
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...
 
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
 
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
Stream Processing using Apache Flink in Zalando's World of Microservices - Re...
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017
 
SAP REAL TIME DATA PLATFORM WITH SYBASE SUPPORT
SAP REAL TIME DATA PLATFORM WITH SYBASE SUPPORTSAP REAL TIME DATA PLATFORM WITH SYBASE SUPPORT
SAP REAL TIME DATA PLATFORM WITH SYBASE SUPPORT
 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
 
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analytics
 
The hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaThe hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at Helixa
 
sap-overview.ppt
sap-overview.pptsap-overview.ppt
sap-overview.ppt
 
Data Onboarding Breakout Session
Data Onboarding Breakout SessionData Onboarding Breakout Session
Data Onboarding Breakout Session
 
Getting Started with Splunk Breakout Session
Getting Started with Splunk Breakout SessionGetting Started with Splunk Breakout Session
Getting Started with Splunk Breakout Session
 
Data & analytics challenges in a microservice architecture
Data & analytics challenges in a microservice architectureData & analytics challenges in a microservice architecture
Data & analytics challenges in a microservice architecture
 
Managing your black friday logs Voxxed Luxembourg
Managing your black friday logs Voxxed LuxembourgManaging your black friday logs Voxxed Luxembourg
Managing your black friday logs Voxxed Luxembourg
 
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
 
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
 
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
 
There is more to Big Data than data
There is more to Big Data than dataThere is more to Big Data than data
There is more to Big Data than data
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
hyfjgavov
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
SaffaIbrahim1
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
slg6lamcq
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
lzdvtmy8
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
ElizabethGarrettChri
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
hqfek
 
Cell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docxCell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docx
vasanthatpuram
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
eoxhsaa
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
oaxefes
 
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
aguty
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
bmucuha
 
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
1tyxnjpia
 

Recently uploaded (20)

University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
 
Cell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docxCell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docx
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
 
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
一比一原版澳洲西澳大学毕业证(uwa毕业证书)如何办理
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
 

From “All-at-Once, Once-a-Day” to “A-Little-Each-Time, All-the-Time” with Emanuele Bardelli

  • 1. Emanuele Bardelli - OLX Group From “All-At-Once, Once-A-Day” To “A-Little-Each-Time, All-The-Time” #SAISExp5
  • 2. 2#SAISExp5 OLX Group and its Data Platform Users Communication and Lifecycle Management Near-Time Data Engineering with Spark Takeaways & Questions
  • 3. 3#SAISExp5 we fuel local economies by making it super easy for anyone to buy or sell almost anything through our platforms Our Mission
  • 4. 4#SAISExp5 Who We Are we operate a network of market-leading trading platforms in over 40 countries that are used by more than 350 million people every month to buy and sell almost anything
  • 5. 5#SAISExp5 Our Numbers brands 17+ people 5000+ daily listings 1M+ offices 35 daily events 4B+ daily notifications 50M+ monthly users 350M+ countries 43 shopping app (20+ markets) #1
  • 6. 6#SAISExp5 Data Platform Overview RAW JSON EVENTS 4B+ events day DATA LAKE 400TB+ compressed INTERNAL TRACKERS DATA PUMPS 2 minutes delay TEAM 1 RESERVOIR raw data subset DATA PRODUCTS ANALYTICS TEAM 2 RESERVOIR anonymised parquet MACHINE LEARNING BI DATA CATALOG
  • 7. 7#SAISExp5 Users Communication Team 7#SAISExp5 “optimize and deliver all OLX Group end-user communication” Transactional standard messages produced due to day-to-day interactions (reset password, post approved, new follower, ...) Marketing one-off messages sent to advertise specific initiatives (happy holidays, get the new OLX app, ...) Lifecycle (CLM) messages crafted to influence customers journey positively (we miss you, we picked this item just for you, ...)
  • 8. 8#SAISExp5 Customers Lifecycle Management 70% of mobile APPs users are lost after 3 days 40% increase in Daily Active Users 15% contribution to Value Added Services purchase
  • 9. 9#SAISExp5 CLM Services Ecosystem • Segmentation • Recommendations • User-Devices Mapping • Live / Control Group Bucketing • Audience Generation • Treatments Definition (A/B Testing) • Messages Prioritization • Frequency Capping • Scheduling • Analytics
  • 12. 12#SAISExp5 New “Near-Time” Approach • Near-Time - Produce and send messages within 10 minutes after a set of events happened • Open - Give our users freedom to be more creative and experiment more • Reliable - Powered by technologies that make it easy to apply engineering best practices • Flexible - Adapt to Big Data challenges easily
  • 13. 13#SAISExp5 Near-Time Data Processing with Spark (Platform) METADATA CONSUMER python application TASKS DATABASE data chunks to process KINESIS STREAM updated files metadata DATA RESERVOIR raw data files PYTHON + SPARK FEATURES PROCESSOR OFFLINE STORE lo-freq data products ONLINE STORE hi-freq data products GLUE DATA CATALOG serverless analytics USER-DEVICES MAPPING SEGMENTATION BUSINESS ANALYTICS
  • 14. RAW DATA SOURCE files “virtually partitioned” in “data chunks” 11:30:00 -> 11:39:59 14#SAISExp5 Near-Time Data Processing with Spark (Data Flow) RAW DATA FILE …20181003-11:27:05.gz RAW DATA FILE …20181003-11:39:48.gz PY-SPARK FEATURES PROCESSOR load a “data chunk“ and process dependent “features” FEATURES STORE “features” set partitioned by “data chunk” USER_NUMBER_POSTS …/hour=11/ten_min=20/ DEVICE_FIRST_BROWSE …/hour=11/ten_min=20/ USER_LAST_LOGIN …/hour=11/ten_min=20/ RAW DATA FILE …20181003-11:34:23.gz RAW DATA FILE …20181003-11:22:34.gz USER_NUMBER_POSTS …/hour=11/ten_min=30/ DEVICE_FIRST_BROWSE …/hour=11/ten_min=30/ USER_LAST_LOGIN …/hour=11/ten_min=30/ PROCESSING CORE (config, load, save, orchestration) ENGINE (reusable logic) ENGINE (reusable logic) FEATURE S (data prep) FEATURE S (data prep) 11:20:00 -> 11:29:59
  • 15. kinesis stream metadata (raw data files paths) tasks database records (data chunks to process) 15#SAISExp5 Data Chunks Metadata Example
  • 17. 17#SAISExp5 Features Code Example (Data Preparation)
  • 18. 18#SAISExp5 Engines Code Example (Data Transformation)
  • 19. 19#SAISExp5 Data Product Example (Serverless Analytics)
  • 20. 20#SAISExp5 Spark Environment • PySpark 2.3.0 : Python3 + Spark SQL + Dataframes • AWS EMR 5.15 (Yarn) : r5.xlarge instances 1 Master + 1 Core + N Task (SPOT) • Deployment Mode : Client • Scheduler Mode : Fair • Features Stores : Offline S3 (Parquet) – Online Cassandra • CI / CD : GitLab Pipeline (RPM Packaging)
  • 21. 21#SAISExp5 Takeaways • Goodbye Schedulers, Welcome Events! • Raw data is messy and unpredictable. Constantly (re)processing small chunks of data helps with accuracy and consistency • Separation of storage and compute helps to optimize resources and allows scalability on-the-cheap • Python + Spark : Software + Data + Science engineered together for effective big data processing
  • 22. 22#SAISExp5 THANK YOU! Questions? GET IN TOUCH WITH US! olxgroup.com tech.olx.com @olxtechberlin manu@olx.com /in/emanuelebardelli 22#SAISExp5