Company Profile
User segmentation
in online advertising
Spark vs Hadoop
Sergey Zhemzhitsky (Сергей Жемжицкий),
CTO, CleverDATA,
May 22, 2015
cleverdata.ru | info@cleverdata.ru
International market
business development
since 2012
One of three leading IT companies in Russia
43 branches in Russia and abroad
+5500 employees
100K projects for 10K customers
Innovative data management
platform (Data Exchange Service)
Cloud Service
In-house development
Internet advertising solutions
Data Management Platforms
Customers Base Management
Web Analytics
Marketing automation
Big Data
Data Mining
Digital Intelligence
Operational Intelligence
Low Latency and NoSQL
Cloud Computing
Agenda
• The problem;
• Hadoop vs. Spark;
• Specifics;
• What's next.
Real Time Bidding (RTB)
[Diagram: publishers, SSP, ad networks, DSP, advertisers]
Data Management Platform (DMP)
[Diagram: publishers, SSP, DSP, advertisers, DMP. Tracking data: cookie syncs, access logs, partner's data, 3rd-party data, click streams]
Typical data flows
[Diagram: 3rd-party data, raw data, Raw Data Store & Processing, Relational Data Store, RealTime Data Store, user profiles, aggregates]
Typical data flows :: RTB
[Diagram: Exchange, SSP, RTB SRV (bid req., bid resp.), pixels :: impressions :: clicks, bid requests, user profiles, 3rd-party data, raw data, Raw Data Store & Processing, Relational Data Store, RealTime Data Store, aggregates]
1st-party data
[Diagram: the same components as the RTB data-flow diagram, plus 1st-party data]
1st-party data
• Why monetize?
• How to monetize?
• What to monetize with?
Why monetize?
Find all users who
took part in the “Star Wars” ad campaign [and]
saw one of the banners “Darth Vader” or “Luke Skywalker”
within the last 6 days [and]
clicked that banner [and]
visited the purchase page for Darth Vader's lightsaber [and]
did not buy anything in the end
In order to
retarget them with a personalized banner offering a
40% discount on the lightsaber
How to monetize?
find all users who have                                           [cookies]
  taken part in campaign[s] “Star Wars” [and]
  viewed banner[s] “Darth Vader” or “Luke Skywalker”
    during [last] 6 day[s] [and]                                  [impression]
  clicked banner[s] “Darth Vader's lightsaber” [and]              [click]
  visited buying area of “Darth Vader's lightsaber” [and]         [tr. pixel]
  not visited order confirmed area of “Darth Vader's lightsaber”  [tr. pixel]

id  cookie  event_id                    event_type  campaign_id  timestamp                …
1   c1      “Darth Vader”               impression  “Star Wars”  2015-04-20 14:25:11.462  …
2   c1      “Darth Vader's lightsaber”  click       “Star Wars”  2015-04-21 06:31:12.157  …
3   c1      “Darth Vader's lightsaber”  tr. pixel   “Star Wars”  2015-04-22 18:57:19.628  …
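A minimal Python sketch of how a single event row from the table could be checked against the segment conditions. It is not the speaker's implementation: the `Event` class, the condition indexes 0..4, the `NOW` reference time and the "order confirmed" event_id are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    cookie: str
    event_id: str
    event_type: str        # "impression", "click" or "tr. pixel"
    campaign_id: str
    timestamp: datetime


LIGHTSABER = "Darth Vader's lightsaber"
ORDER_CONFIRMED = "order confirmed"   # hypothetical event_id, not in the source table


def matched_conditions(e, now):
    """Return the indexes of the segment conditions this one event satisfies."""
    hits = set()
    if e.campaign_id == "Star Wars":
        hits.add(0)                                   # took part in the campaign
    if (e.event_type == "impression"
            and e.event_id in {"Darth Vader", "Luke Skywalker"}
            and now - e.timestamp <= timedelta(days=6)):
        hits.add(1)                                   # viewed a banner in the last 6 days
    if e.event_type == "click" and e.event_id == LIGHTSABER:
        hits.add(2)                                   # clicked the lightsaber banner
    if e.event_type == "tr. pixel" and e.event_id == LIGHTSABER:
        hits.add(3)                                   # visited the buying area
    if e.event_type == "tr. pixel" and e.event_id == ORDER_CONFIRMED:
        hits.add(4)                                   # reached the order-confirmed area
    return hits


NOW = datetime(2015, 4, 23)   # assumed evaluation time
events = [
    Event("c1", "Darth Vader", "impression", "Star Wars", datetime(2015, 4, 20, 14, 25, 11)),
    Event("c1", LIGHTSABER, "click", "Star Wars", datetime(2015, 4, 21, 6, 31, 12)),
    Event("c1", LIGHTSABER, "tr. pixel", "Star Wars", datetime(2015, 4, 22, 18, 57, 19)),
]
per_event = [matched_conditions(e, NOW) for e in events]
```

Each event only knows about itself, so the negative condition ("not visited order confirmed area") cannot be decided here; it needs all events of a cookie gathered in one place.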
How to monetize?
map: find all users who have
  0  taken part in campaign[s] “Star Wars”                            → (c1, 0)
  1  viewed banner[s] “Darth Vader” or “Luke Skywalker”
     during [last] 6 day[s]                                           → (c1, 1)
  2  clicked banner[s] “Darth Vader's lightsaber”                     → (c1, 2)
  3  visited buying area of “Darth Vader's lightsaber”                → (c1, 3)
  4  not visited order confirmed area of “Darth Vader's lightsaber”   → Ø
reduce: (c1, 0;1;2;3)
  true(0) and true(1) and true(2) and true(3) and not false(4) → C1
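The map and reduce steps on this slide can be emulated in plain Python. This is a sketch under assumptions, not Hadoop or Spark code: the function names are made up, index 4 stands for "visited order confirmed area", and cookie `c2` is a hypothetical buyer added to show the negative condition at work.

```python
from collections import defaultdict


def map_phase(per_event_matches):
    """Emit one (cookie, condition index) pair per satisfied condition."""
    for cookie, indexes in per_event_matches:
        for i in sorted(indexes):
            yield cookie, i


def shuffle(pairs):
    """Group emitted indexes by cookie, as the shuffle stage would."""
    grouped = defaultdict(set)
    for cookie, i in pairs:
        grouped[cookie].add(i)
    return grouped


def reduce_phase(cookie, indexes):
    """Evaluate the boolean expression: 0 and 1 and 2 and 3 and not 4."""
    in_segment = all(i in indexes for i in (0, 1, 2, 3)) and 4 not in indexes
    return cookie if in_segment else None


# (cookie, conditions satisfied by one event)
per_event_matches = [
    ("c1", {0, 1}), ("c1", {0, 2}), ("c1", {0, 3}),
    ("c2", {0, 1}), ("c2", {0, 2}), ("c2", {0, 3}), ("c2", {0, 4}),
]
grouped = shuffle(map_phase(per_event_matches))
segment = sorted(c for c in (reduce_phase(k, v) for k, v in grouped.items()) if c)
```

Nothing is emitted for an absent event, so the negation is resolved only in the reduce step, where the full set of indexes for a cookie is visible.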
VS.
MR vs Spark :: Real life
• Stylish;
• Trendy;
• Youthful.
Spark :: Size
Before looking at Hadoop
Map-Reduce :: Size
Materials and tools
Hardware (3 Nodes)
• 12 Core AMD Opteron™ 6338P ~ 2.8 GHz
• 64 GB RAM
• 1 Gbps NICs
Software
• CDH 5.3.1 (Hadoop 2.5.0)
• Spark 1.2.0
Data
• 14.2 GB of raw data
• 61.1 M transactions
• 128 MB block size
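A back-of-the-envelope calculation from the figures above gives a feel for the scale of the test. It assumes 1 GB = 1024 MB, one input split per 128 MB block over splittable input, and all cores of the 3 nodes available to tasks; none of these assumptions is stated on the slide.

```python
import math

RAW_DATA_GB = 14.2
BLOCK_MB = 128
TRANSACTIONS = 61.1e6
NODES, CORES_PER_NODE = 3, 12

# Number of 128 MB blocks (and, under the assumptions above, input splits).
blocks = math.ceil(RAW_DATA_GB * 1024 / BLOCK_MB)

# Upper bound on tasks running at the same time.
total_cores = NODES * CORES_PER_NODE

# Average size of one transaction record in bytes.
bytes_per_txn = RAW_DATA_GB * 1024 ** 3 / TRANSACTIONS

print(blocks, total_cores, round(bytes_per_txn))
```

Under these assumptions the input yields roughly three times as many splits as there are cores, so the map side runs in several waves.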
MR vs Spark :: Execution time
Spark :: Exec-cores vs Num-execs
MR vs Spark :: Initialization
MR
protected void setup(Context ctx)
o.a.h.c.Configured
distributed cache
Spark
mapRegion
broadcast vars
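Spark has no direct counterpart to the Mapper's `setup(Context)` hook. One common pattern, assumed here rather than taken from the slide, is to perform one-time initialization inside the function applied to a whole partition. The plain-Python sketch below only emulates that idea; `expensive_setup` and the lookup table are hypothetical.

```python
setup_calls = []   # records how many times initialization ran


def expensive_setup():
    """Hypothetical stand-in for loading segment definitions or a lookup table."""
    setup_calls.append(1)
    return {"Star Wars": 0}


def process_partition(records):
    """Initialize once, then stream over every record of the partition."""
    lookup = expensive_setup()          # once per partition, like setup() once per task
    for campaign in records:
        yield lookup.get(campaign, -1)


partitions = [["Star Wars", "Other"], ["Star Wars"]]
results = [list(process_partition(p)) for p in partitions]
```

Read-only data that every task needs is usually a better fit for a broadcast variable, which is shipped to each executor once instead of being rebuilt per partition.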
MR vs Spark :: Parallelism
MR
mapred.reduce.tasks
mapreduce.job.reduces
splittable formats
Spark
spark.default.parallelism
num-executors, executor-cores on YARN
numTasks in groupByKey, reduceByKey, aggregateByKey…
MR vs Spark :: Dependencies
MR
o.a.h.u.Tool
o.a.h.u.ToolRunner
-conf app.conf
-files
-libjars
setUserClassesTakesPrecedence
Spark
--jars
--files
--conf
--driver-java-options
spark.driver.extraJavaOptions
spark.executor.extraJavaOptions
spark.driver.userClassPathFirst
spark.executor.userClassPathFirst
MR vs Spark :: Secondary Sort
MR
setSortComparatorClass
setGroupingComparatorClass
setPartitionerClass
Spark
repartitionAndSortWithinPartitions
mapPartitions
The result of processing an entire partition
must fit in memory
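The secondary-sort idea (partition by key, sort within each partition, then stream over the sorted records) can be emulated in plain Python. This sketch only mirrors the semantics of `repartitionAndSortWithinPartitions` followed by `mapPartitions`; it does not call Spark, and the toy partitioner, function names and cookie `c2` are assumptions.

```python
from itertools import groupby
from operator import itemgetter


def partition_of(cookie, num_partitions):
    """Deterministic toy partitioner: all events of a cookie land together."""
    return sum(cookie.encode()) % num_partitions


def repartition_and_sort(records, num_partitions):
    """Partition (cookie, timestamp, event_type) by cookie, sort by (cookie, timestamp)."""
    parts = [[] for _ in range(num_partitions)]
    for rec in records:
        parts[partition_of(rec[0], num_partitions)].append(rec)
    return [sorted(p, key=itemgetter(0, 1)) for p in parts]


def process_partition(sorted_records):
    """Stream over one sorted partition; yield (cookie, event types in time order)."""
    for cookie, group in groupby(sorted_records, key=itemgetter(0)):
        yield cookie, [event_type for _, _, event_type in group]


records = [
    ("c1", "2015-04-22 18:57:19.628", "tr. pixel"),
    ("c2", "2015-04-21 10:00:00.000", "impression"),   # hypothetical second cookie
    ("c1", "2015-04-20 14:25:11.462", "impression"),
    ("c1", "2015-04-21 06:31:12.157", "click"),
]
result = dict(
    pair
    for part in repartition_and_sort(records, num_partitions=2)
    for pair in process_partition(part)
)
```

Because the records arrive already ordered by cookie and timestamp, the per-cookie logic sees events in time order without sorting them itself, which is what the three MR comparator and partitioner settings achieve together.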
MR vs Spark :: Testing
MR
MRUnit
o.a.h.h.MiniDFSCluster
o.a.h.m.MiniMRCluster
o.a.h.y.s.MiniYARNCluster
o.a.h.m.v2.MiniMRYarnCluster
Spark
Local executor
What's next, and why Spark?
• Spark Streaming;
• Micro Batches;
• λ-architecture.
without major surgery
Thank you for your questions!
info@cleverleaf.co.uk :: info@cleverdata.ru
cleverleaf.co.uk :: cleverdata.ru
1dmp.io :: crawler.1dmp.io
facebook.com/CleverData :: +7 (495) 967-66-50
