FEBRUARY 9, 2017, WARSAW
Scalable Analytics for
Microservices Architecture
Nikolay Golov
Part 1 – Avito story
What is Avito?
Clear №1 in Russia
One Country, Many Cities, Five Verticals
Cities: Moscow, St. Petersburg, Novosibirsk, N. Novgorod, Kazan, Samara, Rostov, Volgograd, Voronezh, Ufa, Chelyabinsk, Omsk, Krasnoyarsk, Vladivostok, Yakutsk, Irkutsk, Khabarovsk
• 1.6x2 Size of Russian classifieds market
• More than 60% Page Views from Mobile
• 40%(3) of all used cars sold in Russia in 2014 sold on Avito
• 6 mln listers per day, of which ca. 10% list for the first time
• 9.5bn Page Views in Dec-2016 across all devices generated by 40 mln people
• Ca. 11,000+ Local SMEs on subscription model
• 31mln Active Items on the site as of 2016 YE
• RUB32bln 2016 Revenue(1) (+200% to 2015)
Size Relative to #2 Classified Site in Key Verticals by Page Views
• Auto: 2.0x
• Real Estate: 15.8x
• Jobs: 1.5x
• Services: 75.8x
• General: 75.1x
• All Categories: 4.8x
✓ By far the largest classifieds in Russia and one of the biggest general classifieds worldwide
✓ Top-of-mind in many categories
✓ Ca. 1200 employees, of which ca. 300 moderators
✓ Profitable since Q2 2013
Avito now.
Avito Business Development
[Chart: Weekly Page Views (m), Jan-09 to Jan-16, y-axis 0 to 2 800]
Source: Google Analytics, LiveInternet, Internal data
Milestones:
• Q1 2010: Focus on Moscow and St.Pete
• September 2010: Target 13 additional cities
• August 2011: Target total of 28 cities
• January 2012: Avito has national coverage
• Q2 2013: Merger with Slando and Olx reaffirmed #1 position in the Russian market
• Q2 2014: Launch of Domofond, a dedicated real estate classified
• Q4 2014: Launch of a new revenue stream: Listing Fees
• Q4 2015: Sold to Naspers
Chart labels: +Vertical, +Listing Fees, +Pro tools, Goods C2C, +RE & Cars, +B2C, +Jobs, +Services
Path from Investment Stage to Cash Flow Generation
Stage 1
• Position: Competing with others
• Economics: Heavy investment
• Focus: Build user base
Stage 2
• Position: Ahead of competition
• Economics: Approximately break-even
• Focus: Develop business model and build leading brand
Stage 3
• Position: x times ahead of competition
• Economics: High EBITDA margin
• Focus: Focus on monetization enhancement; attract professional classifieds market spend
Part 2: Let’s start from the beginning. What
issues could you face?
[Chart: growth curve, Jan-09 to Jan-16, y-axis 0 to 2 800, overlaid with question marks]
At the beginning you do not know:
▪… your future traffic
▪… all your future data sources
▪… your future data monetization tools
At the beginning you hope that:
▪… your traffic will grow
▪… you will connect more and more data sources
▪… you will launch more and more data monetization
initiatives
Alternatives…
Data Lake
• Data in a natural state
• Hadoop based
• Schemaless
Our choice
• Full SQL support
• Single data model
• Normalization of all incoming
data
SCHEMA ON WRITE
Data Lake
SCHEMA ON READ
SCHEMA ON WRITE
Data Lake
SCHEMA ON READ (shown three times)
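The contrast on these slides can be illustrated with a minimal Python sketch. It is not Avito's implementation: the record fields (`user_id`, `item_id`, `ts`) and all function names are hypothetical. With schema on write, a record is validated and normalized once, at load time. With schema on read, raw payloads are stored untouched and every reader has to apply its own parsing.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemView:
    """Normalized record; the field names are illustrative assumptions."""
    user_id: int
    item_id: int
    ts: str


def load_schema_on_write(raw: str, table: list) -> None:
    """Validate and normalize at load time; malformed records fail early."""
    payload = json.loads(raw)
    table.append(
        ItemView(int(payload["user_id"]), int(payload["item_id"]), str(payload["ts"]))
    )


def load_schema_on_read(raw: str, lake: list) -> None:
    """Store the payload as-is; no structure is enforced."""
    lake.append(raw)


def read_user_ids_from_lake(lake: list) -> list:
    """Each reader re-parses and must cope with missing fields itself."""
    ids = []
    for raw in lake:
        payload = json.loads(raw)
        if "user_id" in payload:
            ids.append(int(payload["user_id"]))
    return ids


good = '{"user_id": "7", "item_id": 42, "ts": "2017-02-09T10:00:00"}'
bad = '{"item_id": 42}'

table, lake = [], []
load_schema_on_write(good, table)
load_schema_on_read(good, lake)
load_schema_on_read(bad, lake)        # accepted silently

try:
    load_schema_on_write(bad, table)  # rejected at write time
    rejected = False
except KeyError:
    rejected = True
```

In the sketch the incomplete record ends up in the lake but never reaches the normalized table, so the cost of handling it moves from one loader to every reader.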
Back office
Click stream
BI Team
Antifraud
MDM
CRM
15 Operational DB
• All changes
• Hourly loading
Clickstream
• ~1 bln. actions/day
• Loading every 15 minutes
ClickStream data
• ~ 90 tables
• ~ 190 billion records
Historized data
• ~ 1000 tables
• up to ~ 2 billion records
Daily sync
Live connection
Analysts
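As a rough back-of-the-envelope check on the clickstream figures above: assuming the ~1 bln daily actions were spread evenly over the day (real traffic is not), a 15-minute loading interval implies 96 loads per day of roughly 10 million events each. The variable names are illustrative.

```python
ACTIONS_PER_DAY = 1_000_000_000   # "~1 bln. actions/day" from the slide
LOAD_INTERVAL_MIN = 15            # clickstream loading interval from the slide

loads_per_day = 24 * 60 // LOAD_INTERVAL_MIN
avg_events_per_load = ACTIONS_PER_DAY / loads_per_day
avg_events_per_second = ACTIONS_PER_DAY / (24 * 60 * 60)

print(loads_per_day)                 # 96
print(round(avg_events_per_load))    # 10416667
print(round(avg_events_per_second))  # 11574
```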
Avito DWH evolution
Metric                              2013   2014   2015   2016
Cluster(s) size (servers)              4     10     15     17
Cluster(s) size (TB)                  11     26     51     76
ClickStream size (Mln events/day)    300    560    740   1000
Integrated systems count               3     14     23     29
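The four series can be compared by their growth factor from 2013 to 2016. The short calculation below only restates the chart's data labels; the dictionary keys are illustrative names.

```python
years = [2013, 2014, 2015, 2016]
metrics = {
    "servers": [4, 10, 15, 17],
    "storage_tb": [11, 26, 51, 76],
    "clickstream_mln_events_per_day": [300, 560, 740, 1000],
    "integrated_systems": [3, 14, 23, 29],
}

# Growth factor = last value divided by first value of each series.
growth = {name: values[-1] / values[0] for name, values in metrics.items()}
for name, factor in growth.items():
    print(f"{name}: x{factor:.2f} from {years[0]} to {years[-1]}")
```

On these numbers the count of integrated systems grew fastest (about 9.7 times), while the number of servers grew about 4.3 times.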
Part 3: Microservices – new level of
complexity
Microservices
Microservices – Polyglot Persistence
Registration
Search
Notifications
Billing
Moderation
Search
Search
ESP – Event Stream Processing
Registration
Search
Notifications
Billing
Moderation
Search
Search
ESP
ESP:
Kafka
NSQ
…
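As far as the diagram labels allow one to read it, the idea is that each service publishes events to a shared stream instead of exposing its private storage, and analytics consumes that stream. The sketch below is an in-memory stand-in for a broker such as Kafka or NSQ. The `EventBus` class, the topic names, and the payload fields are all hypothetical and do not mirror either product's API.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, object]
Handler = Callable[[Event], None]


class EventBus:
    """Toy in-process stand-in for an event stream broker."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        # Deliver synchronously to every subscriber of the topic.
        for handler in self._subscribers[topic]:
            handler(event)


bus = EventBus()
dwh_staging: List[Event] = []   # what the analytics side collects

# The warehouse loader listens to the topics of several services.
for topic in ("registration", "billing", "moderation"):
    bus.subscribe(topic, dwh_staging.append)

# Services emit events without knowing who consumes them.
bus.publish("registration", {"service": "registration", "user_id": 1})
bus.publish("billing", {"service": "billing", "user_id": 1, "amount": 100})

print(len(dwh_staging))  # 2
```

The point of the design is decoupling: whatever storage each service uses internally, the analytics side depends only on the events it emits.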
Click stream automation
Registration
Search
Notifications
Billing
Moderation
Search
Search
ESP
Event register:
Event 01
Event 02
New Event -> Service -> Attributes
logging
expanding(?)
new chart (?)
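One possible reading of the event register is a catalogue that records, for every event, which service emits it and which attributes it carries, so that incoming events can be checked before loading. The sketch below follows that reading only. `EventSpec`, `EventRegister`, and the sample event and attribute names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class EventSpec:
    service: str                 # which service emits the event
    attributes: FrozenSet[str]   # attribute names the event carries


class EventRegister:
    def __init__(self) -> None:
        self._specs: Dict[str, EventSpec] = {}
        self.rejected: List[str] = []   # names of events that failed the check

    def register(self, name: str, service: str, attributes: List[str]) -> None:
        """Declare a new event: event -> service -> attributes."""
        self._specs[name] = EventSpec(service, frozenset(attributes))

    def accept(self, name: str, payload: Dict[str, object]) -> bool:
        """Accept only registered events whose keys match the declared attributes."""
        spec = self._specs.get(name)
        if spec is None or set(payload) != spec.attributes:
            self.rejected.append(name)
            return False
        return True


register = EventRegister()
register.register("event_01", service="search", attributes=["user_id", "query"])

ok = register.accept("event_01", {"user_id": 1, "query": "bike"})
unknown = register.accept("event_02", {"user_id": 1})   # never registered
```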
Part 4: Proof that the approach works
Expanding the system
[Chart: cumulative tables, cumulative anchors, cumulative ties, and cumulative attributes over time; y-axis 0 to 2500]
First and last data labels per series:
• cumulative tables: 141 to 2266
• cumulative anchors: 29 to 342
• cumulative ties: 39 to 580
• cumulative attributes: 73 to 1344
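The legend (anchors, ties, attributes) matches the vocabulary of Anchor Modeling, a highly normalized technique in which every entity, every relationship, and every attribute lives in its own table. The chart's final data labels are consistent with that reading: 342 anchors + 580 ties + 1344 attributes = 2266 tables. Assuming that is what the chart counts, the sketch below shows why such a model grows by adding tables rather than altering existing ones. The `Model` class and the table-naming scheme are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Model:
    anchors: List[str] = field(default_factory=list)
    ties: List[str] = field(default_factory=list)
    attributes: List[str] = field(default_factory=list)

    def add_anchor(self, entity: str) -> None:
        self.anchors.append(f"a_{entity}")

    def add_attribute(self, entity: str, name: str) -> None:
        # One table per attribute: existing tables are never altered.
        self.attributes.append(f"a_{entity}__{name}")

    def add_tie(self, left: str, right: str) -> None:
        self.ties.append(f"t_{left}__{right}")

    @property
    def table_count(self) -> int:
        return len(self.anchors) + len(self.ties) + len(self.attributes)


model = Model()
model.add_anchor("user")
model.add_anchor("item")
model.add_attribute("user", "email")
model.add_tie("user", "item")

before = model.table_count
model.add_attribute("item", "price")   # a new source field arrives
after = model.table_count
print(before, after)  # 4 5
```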
Avito DWH – playground for analysts
[Chart: row_count (axis 0 to 12 000 000 000) and marts_count (axis 0 to 180), 2016-01-19 to 2016-12-19]
Avito DWH evolution
Metric                              2013   2014   2015   2016
Cluster(s) size (servers)              4     10     15     17
Cluster(s) size (TB)                  11     26     51     76
ClickStream size (Mln events/day)    300    560    740   1000
Integrated systems count               3     14     23     29
Thank you!

From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 

Scalable Analytics for Microservices Architecture, Nikolay Golov

  • 1. FEBRUARY 9, 2017, WARSAW. Scalable Analytics for Microservices Architecture. Nikolay Golov
  • 2. Part 1 – Avito story
  • 3. What is Avito?
  • 5. Clear №1 in Russia. One Country, Many Cities, Five Verticals.
      ▪ Cities: Moscow, St. Petersburg, Novosibirsk, N. Novgorod, Kazan, Samara, Rostov, Volgograd, Voronezh, Ufa, Chelyabinsk, Omsk, Krasnoyarsk, Vladivostok, Yakutsk, Irkutsk, Khabarovsk
      ▪ Size relative to #2 classified site in key verticals, by page views: Auto 2.0x; Real Estate 15.8x; Jobs 1.5x; Services 75.8x; General 75.1x; All Categories 4.8x
      ▪ 1.6x(2): size of Russian classifieds market
      ▪ More than 60%: page views from mobile
      ▪ 40%(3): of all used cars sold in Russia in 2014 sold on Avito
      ▪ 6 mln: listers per day, of which ca. 10% list for the first time
      ▪ 9.5 bn: page views in Dec-2016 across all devices, generated by 40 mln people
      ▪ Ca. 11,000+: local SMEs on subscription model
      ▪ 31 mln: active items on the site as of 2016 YE
      ▪ RUB 32 bln: 2016 revenue(1) (+200% to 2015)
      ✓ By far the largest classifieds in Russia and one of the biggest general classifieds worldwide
      ✓ Top-of-mind in many categories
      ✓ Ca. 1200 employees, of which ca. 300 moderators
      ✓ Profitable since Q2 2013
  • 6. Avito now. Avito Business Development. Source: Google Analytics, LiveInternet, Internal data
      [Chart: Weekly Page Views (m), axis 0 to 2 800, Jan-09 to Jan-16]
      ▪ Q1 2010: focus on Moscow and St. Pete
      ▪ September 2010: target 13 additional cities
      ▪ August 2011: target total of 28 cities
      ▪ January 2012: Avito has national coverage
      ▪ Q2 2013: merger with Slando and Olx reaffirmed #1 position in the Russian market
      ▪ Q2 2014: launch of Domofond, a dedicated real estate classified
      ▪ Q4 2014: launch of a new revenue stream: Listing Fees
      ▪ Q4 2015: sold to Naspers
      ▪ Product expansion: Goods C2C; +RE & Cars; +B2C; +Jobs; +Services; +Vertical; +Listing Fees; +Pro tools
      Path from Investment Stage to Cash Flow Generation
      ▪ Stage 1. Position: competing with others. Economics: heavy investment. Focus: build user base
      ▪ Stage 2. Position: ahead of competition. Economics: approximately break-even. Focus: develop business model and build leading brand
      ▪ Stage 3. Position: x times ahead of competition. Economics: high EBITDA margin. Focus: monetization enhancement; attract professional classifieds market spend
  • 7. Part 2: Let’s start from the beginning. What issues could you face? [Chart: timeline from Jan-09 to Jan-16 on a 0 to 2 800 axis, annotated with question marks]
  • 8. At the beginning you do not know:
      ▪ … your future traffic
      ▪ … all your future data sources
      ▪ … your future data monetization tools
  • 9. At the beginning you hope that:
      ▪ … your traffic will grow
      ▪ … you will connect more and more data sources
      ▪ … you will launch more and more data monetization initiatives
  • 10. Alternatives…
      ▪ Data Lake: data in a natural state; Hadoop based; schemaless
      ▪ Our choice: full SQL support; single data model; normalization of all incoming data
  • 11. [Diagram labels: SCHEMA ON WRITE; Data Lake; SCHEMA ON READ]
  • 12. [Diagram labels: SCHEMA ON WRITE; Data Lake; SCHEMA ON READ (three times)]
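The contrast between schema on write and schema on read on slides 10 to 12 can be illustrated with a minimal Python sketch. The `ClickEvent` record, its fields, and both function names are hypothetical and chosen only for illustration; the slides do not show how Avito implements normalization.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ClickEvent:
    """Hypothetical normalized record; the fields are illustrative only."""
    user_id: int
    event_type: str
    occurred_at: datetime


def load_schema_on_write(raw: str) -> ClickEvent:
    """Schema on write: validate and normalize at load time.

    Malformed input fails here, before it reaches the warehouse.
    """
    doc = json.loads(raw)
    return ClickEvent(
        user_id=int(doc["user_id"]),
        event_type=str(doc["event_type"]).lower(),
        occurred_at=datetime.fromisoformat(doc["occurred_at"]),
    )


def read_schema_on_read(raw: str, field: str):
    """Schema on read: keep the raw text and interpret it only at query time.

    A missing field is discovered by whoever runs the query.
    """
    return json.loads(raw).get(field)


good = '{"user_id": "42", "event_type": "VIEW", "occurred_at": "2017-02-09T10:00:00"}'
bad = '{"user": 42, "event_type": "VIEW"}'

event = load_schema_on_write(good)

rejected = False
try:
    load_schema_on_write(bad)
except KeyError:
    rejected = True  # the loader refuses the record

missing = read_schema_on_read(bad, "user_id")  # the reader silently gets None
```

In this sketch the cost of a bad record is paid once at load time under schema on write, whereas under schema on read every consumer has to handle it separately.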
  • 13. [Diagram labels: Back office; Click stream; BI Team; Antifraud; MDM; CRM]
  • 15. [Diagram: data sources feeding the DWH]
      ▪ Clickstream: ~1 bln actions/day; loaded every 15 minutes
      ▪ Operational DB: all changes; loaded hourly
      ▪ ClickStream data: ~90 tables, ~190 billion records
      ▪ Historized data: ~1000 tables, up to ~2 billion records
      ▪ Other labels: Daily sync; Live connection; Analysts
  • 16. Avito DWH evolution (2013, 2014, 2015, 2016)
      ▪ Cluster(s) size (servers): 4, 10, 15, 17
      ▪ Cluster(s) size (TB): 11, 26, 51, 76
      ▪ ClickStream size (mln events/day): 300, 560, 740, 1000
      ▪ Integrated systems count: 3, 14, 23, 29
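As a rough reading of the slide 16 figures, the 2013 to 2016 growth multiple of each metric can be computed directly. The dictionary keys and the `growth_multiple` helper are names chosen here for illustration; only the numbers come from the slide.

```python
# Values copied from the "Avito DWH evolution" slide; keys are illustrative labels.
metrics = {
    "servers": {2013: 4, 2014: 10, 2015: 15, 2016: 17},
    "storage_tb": {2013: 11, 2014: 26, 2015: 51, 2016: 76},
    "clickstream_mln_events_per_day": {2013: 300, 2014: 560, 2015: 740, 2016: 1000},
    "integrated_systems": {2013: 3, 2014: 14, 2015: 23, 2016: 29},
}


def growth_multiple(series: dict, start: int = 2013, end: int = 2016) -> float:
    """Return how many times larger the metric is in `end` than in `start`."""
    return series[end] / series[start]


multiples = {name: round(growth_multiple(series), 2) for name, series in metrics.items()}

for name, multiple in multiples.items():
    print(f"{name}: x{multiple}")
```

On these numbers the count of integrated systems grew fastest (roughly 9.7 times), while the server count grew 4.25 times and daily clickstream volume about 3.3 times.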
  • 17. Part 3: Microservices – new level of complexity
  • 18. Microservices
  • 19. Microservices – Polyglot Persistence [Diagram labels: Registration; Search; Notifications; Billing; Moderation; Search; Search]
  • 20. ESP – Event Stream Processing [Diagram labels: Registration; Search; Notifications; Billing; Moderation; Search; Search; ESP] ESP: Kafka, NSQ, …
  • 21. Click stream automation [Diagram labels: Registration; Search; Notifications; Billing; Moderation; Search; Search; ESP] Event register: Event 01, Event 02. New Event -> Service -> Attributes; logging; expanding(?); new chart(?)
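One possible reading of slides 20 and 21 is that every service publishes events through the ESP, and an event register records which service owns each event and which attributes it carries. The sketch below models that idea in plain Python. `InMemoryStream` is a stand-in written for this example, not the Kafka or NSQ client API, and the event names and attributes are hypothetical.

```python
from collections import defaultdict

# Hypothetical event register: event name -> owning service and expected attributes.
EVENT_REGISTER = {
    "user_registered": {"service": "Registration", "attributes": {"user_id", "ts"}},
    "search_performed": {"service": "Search", "attributes": {"user_id", "query", "ts"}},
}


class InMemoryStream:
    """Stand-in for an ESP such as Kafka or NSQ (assumption, not a real client)."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self.rejected = []

    def subscribe(self, event_name, handler):
        """Register a callback that receives each accepted payload."""
        self._subscribers[event_name].append(handler)

    def publish(self, service, event_name, payload):
        """Deliver the event only if it matches its entry in the register."""
        spec = EVENT_REGISTER.get(event_name)
        ok = (
            spec is not None
            and spec["service"] == service
            and set(payload) == spec["attributes"]
        )
        if not ok:
            # Unregistered or malformed events are kept aside for review.
            self.rejected.append((service, event_name, payload))
            return False
        for handler in self._subscribers[event_name]:
            handler(payload)
        return True


stream = InMemoryStream()
dwh_rows = []  # stands in for the analytics loader

# The loader subscribes to every registered event without per-service code.
for name in EVENT_REGISTER:
    stream.subscribe(name, lambda payload, name=name: dwh_rows.append((name, payload)))

accepted = stream.publish(
    "Registration", "user_registered", {"user_id": 1, "ts": "2017-02-09T10:00:00"}
)
unknown = stream.publish("Billing", "invoice_paid", {"invoice_id": 7})
```

Under this reading, adding a new event means adding one register entry; the loader picks it up through the same subscription loop.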
  • 22. Part 4: Proof that the approach works
  • 23. Expanding the system [Chart: four cumulative series, 40 data points each, axis 0 to 2500]
      ▪ Cumulative tables: 141, 195, 199, 261, 274, 295, 326, 423, 482, 538, 580, 671, 704, 752, 797, 826, 876, 921, 949, 972, 979, 986, 1032, 1043, 1052, 1093, 1120, 1132, 1156, 1171, 1205, 1237, 1277, 1331, 1496, 1760, 1938, 2149, 2223, 2266
      ▪ Cumulative anchors: 29, 31, 31, 32, 34, 38, 45, 65, 78, 85, 89, 95, 101, 108, 115, 119, 124, 131, 136, 139, 139, 139, 144, 146, 146, 156, 159, 161, 168, 168, 174, 176, 181, 196, 222, 276, 306, 329, 334, 342
      ▪ Cumulative ties: 39, 44, 44, 46, 49, 55, 66, 89, 108, 126, 148, 150, 162, 174, 191, 203, 213, 221, 231, 237, 238, 238, 253, 256, 260, 275, 279, 283, 292, 299, 310, 314, 327, 343, 392, 465, 519, 556, 569, 580
      ▪ Cumulative attributes: 73, 120, 124, 183, 191, 202, 215, 269, 296, 327, 343, 426, 441, 470, 491, 504, 539, 569, 582, 596, 602, 609, 635, 641, 646, 662, 682, 688, 696, 704, 721, 747, 769, 792, 882, 1019, 1113, 1264, 1320, 1344
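The four series on slide 23 use Anchor Modeling vocabulary (anchors, ties, attributes). In Anchor Modeling each anchor, tie, and attribute is typically stored as its own table, so the table count should equal the sum of the other three. That interpretation is an assumption, since the slide does not state it, but it can be checked against the transcribed values:

```python
# Values transcribed from the slide 23 chart, in chart order (40 points per series).
tables = [141, 195, 199, 261, 274, 295, 326, 423, 482, 538,
          580, 671, 704, 752, 797, 826, 876, 921, 949, 972,
          979, 986, 1032, 1043, 1052, 1093, 1120, 1132, 1156, 1171,
          1205, 1237, 1277, 1331, 1496, 1760, 1938, 2149, 2223, 2266]
anchors = [29, 31, 31, 32, 34, 38, 45, 65, 78, 85,
           89, 95, 101, 108, 115, 119, 124, 131, 136, 139,
           139, 139, 144, 146, 146, 156, 159, 161, 168, 168,
           174, 176, 181, 196, 222, 276, 306, 329, 334, 342]
ties = [39, 44, 44, 46, 49, 55, 66, 89, 108, 126,
        148, 150, 162, 174, 191, 203, 213, 221, 231, 237,
        238, 238, 253, 256, 260, 275, 279, 283, 292, 299,
        310, 314, 327, 343, 392, 465, 519, 556, 569, 580]
attributes = [73, 120, 124, 183, 191, 202, 215, 269, 296, 327,
              343, 426, 441, 470, 491, 504, 539, 569, 582, 596,
              602, 609, 635, 641, 646, 662, 682, 688, 696, 704,
              721, 747, 769, 792, 882, 1019, 1113, 1264, 1320, 1344]

# Indices where the table count differs from anchors + ties + attributes.
mismatches = [
    i
    for i, (t, a, ti, at) in enumerate(zip(tables, anchors, ties, attributes))
    if t != a + ti + at
]

# Growth of each series from the first to the last data point.
growth = {
    name: series[-1] / series[0]
    for name, series in [
        ("tables", tables),
        ("anchors", anchors),
        ("ties", ties),
        ("attributes", attributes),
    ]
}

print("mismatching points:", mismatches)
print({name: round(value, 1) for name, value in growth.items()})
```

The identity holds at every one of the 40 points, which supports reading the chart as a count of model objects that only grows as new sources are integrated.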
  • 24. Avito DWH – playground for analysts [Chart: row_count (axis 0 to 12 000 000 000) and marts_count (axis 0 to 180), monthly from 2016-01-19 to 2016-12-19]
  • 25. Avito DWH evolution (repeat of slide 16), 2013 to 2016: cluster size 4 to 17 servers and 11 to 76 TB; clickstream 300 to 1000 mln events/day; integrated systems 3 to 29
  • 26. Thank you!