Confidential
Big Data in the Advertising Industry
Agenda
- Introduction to the Ad Exchange business area
- Big Data tools overview
- Architectural approach
- JVM-based processing in Big Data analytics

Introduction to the Ad Exchange business area
Ad Evolution (1990s to now)
- Reservation Buying: ads sold via direct transactions between advertisers/agencies and publishers.
- Ad Networks: aggregate inventory and sell it to advertisers. They helped publishers by selling inventory the publishers could not sell themselves.
- Ad Exchanges & SSPs: real-time marketplaces with large pools of liquid inventory not sold in direct buys. SSPs give publishers more controls to optimize yield.
- DSPs: bidding technology designed to help advertisers/agencies target and optimize their buys across multiple ad exchanges/publisher inventory pools.
- Private Exchanges & Automated Guaranteed: exclusive advertiser-to-publisher inventory relationships for programmatic purchasing in brand-safe environments.
Categories along the timeline: Direct Sold/Guaranteed/Reserved; Indirect/Programmatic/Unreserved; Programmatic Premium
Ad Ecosystem: How does it work?
[Diagram of buyers and sellers. Nodes: Brand, Agency, DSP, Ad Network, Ad Exchange, RTB, DMP/Data Supply, SSP, Publisher, Audience]
Big Data tools overview
What is Big Data?
We’ve all heard the term “big data,” but you may not know exactly what it means. Most experts agree the term describes information that shares these three attributes:
Typical Big Data pipeline
- Data Sources: structured, unstructured
- Data Ingestion: batch layer, stream layer
- Storage
- Processing Layer: data mining, machine learning
- BI / Data Warehouse
- Visualization and Reporting Tools
Cross-cutting concerns: Governance and Privacy; Security; Quality Management; High Scale, Low Cost
Storage (non-relational)
- Key-value
- Document
- Column-oriented
- Graph
- Full-text (search engine)
- BLOB
Data ingestion or ETL
- Modes: batch, near real-time, real-time
- Flow: Source, ETL, Destination
Hadoop
- Distributed storage: HDFS
- Resource management: YARN
- Processing: MapReduce 1.0, MapReduce 2.0
MapReduce
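The slide carries only the model's name, so here is a minimal sketch of its three phases (map, shuffle, reduce) using plain Scala collections rather than the Hadoop API. The input lines of event types and all names in the block are assumptions made for illustration.

```scala
object MapReduceSketch {
  // Map phase: emit a (key, 1) pair for every token in a line.
  def mapPhase(line: String): Seq[(String, Int)] =
    line.toLowerCase.split("\\s+").filter(_.nonEmpty).map(token => (token, 1)).toSeq

  // Shuffle phase: bring all values emitted for the same key together.
  def shuffle(pairs: Seq[(String, Int)]): Map[String, Seq[Int]] =
    pairs.groupBy(_._1).map { case (key, kvs) => key -> kvs.map(_._2) }

  // Reduce phase: fold the values of each key into a single result.
  def reducePhase(grouped: Map[String, Seq[Int]]): Map[String, Int] =
    grouped.map { case (key, values) => key -> values.sum }

  def countTokens(lines: Seq[String]): Map[String, Int] =
    reducePhase(shuffle(lines.flatMap(mapPhase)))

  def main(args: Array[String]): Unit = {
    // Hypothetical input: each line lists event types seen in one log chunk.
    println(countTokens(Seq("ad bid ad", "bid ad")))
  }
}
```

In a real cluster the map and reduce functions run on different machines and the shuffle moves data over the network, which is why that step often dominates job cost.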
Spark
Apache Spark is a unified analytics engine for large-scale data processing.
MapReduce vs Spark: Which one to pick?
MapReduce
● Good old, slow, and reliable
● Written in Java
● Natively supports Java, though any JVM-compatible language can be adapted
● Easy to learn and tune
● Batch processing only
● Hard to implement complex pipelines
● Unit testing
Spark
● “Brand-new”, fast, and flexible
● Written in Scala
● Natively supports Scala and Java (also R and Python)
● Provides a rich set of functionality
● Batch and micro-batch processing
● Complex pipelines are its strength
● Unit testing
Architectural approach
High level overview
[Diagram. Nodes: Buyers, Bid Platform, Ad Platform, Seller, Analytical Platform]
Big Data analytics: What’s the challenge?
Daily
● 65B raw ad and bid events
● over 100 TB of serialized and compressed raw input data
● around 150K analytic queries over 110 dimensions in an analytic data store
● query time of 4 s at the 98th percentile and 1 s on average
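To put the daily figures in perspective, the short sketch below converts them to average per-second rates. It assumes decimal terabytes (1 TB = 10^12 bytes) and a uniform load over the day; real traffic is likely to peak well above these averages.

```scala
object DailyLoad {
  val SecondsPerDay: Long = 24L * 60 * 60 // 86,400 seconds

  // Average per-second rate for a quantity accumulated over one day.
  def perSecond(perDay: Double): Double = perDay / SecondsPerDay

  def main(args: Array[String]): Unit = {
    val eventsPerSec  = perSecond(65e9)   // 65B raw events per day
    val bytesPerSec   = perSecond(100e12) // 100 TB per day (assumed decimal TB)
    val queriesPerSec = perSecond(150e3)  // 150K analytic queries per day

    println(f"events per second:  $eventsPerSec%.0f")
    println(f"input GB per second: ${bytesPerSec / 1e9}%.2f")
    println(f"queries per second: $queriesPerSec%.2f")
  }
}
```

On average that is roughly 750K events and about 1.16 GB of compressed input every second, against fewer than two analytic queries per second, so ingestion and batch processing dominate the load far more than query volume does.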
Big Data pipeline applied
Components: Ad & Bid Platforms, Data Collector, HDFS, MapReduce, Spark, Druid, Performance Analytics
UI
JVM-based processing in Big Data analytics
Let’s solve a problem: Keywords
Seller
“I want to be able to get performance reports on dimensions beyond the standard account, site, zone, size, geography, etc.”
Ad Exchange Company
“I want to satisfy the high demand for this functionality (let’s name it Keywords), but I also want to reduce processing and retention costs by serving only sellers with a limited number of distinct keywords.”
Engineering
“There are two steps to solving the Keywords problem: first, identify the sellers that comply with the threshold; second, prepare reports only for them.”
Spark: Let’s write some code
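Step #1: Identify valid sellers
(AdLogs: SeqFile[ID,AdLog], maxKeywords: Long) => Set[AccountId]

// Extracts the (account, keyword) pair from a log record, if present
def getKeyword(adLog: AdLog): Option[(AccountId, Keyword)]

AdLog.getDataset(inputPath)(sparkSession)
  .flatMap(getKeyword)     // drop records without a keyword
  .distinct                // one entry per (account, keyword) pair
  .mapValues(_ => 1L)
  .reduceByKey(_ + _)      // number of distinct keywords per account
  .filter { case (_, totalKeywords) => totalKeywords <= maxKeywordsNumber }
  .keys
  .collect()
  .toSet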
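The same logic can be tried without a cluster. The sketch below mirrors Step #1 with plain Scala collections instead of Spark. The `AdLog` case class and its fields are hypothetical, simplified stand-ins for the real record type, which the slides do not show.

```scala
object ValidSellersSketch {
  type AccountId = Long
  type Keyword = String

  // Hypothetical, simplified stand-in for the real AdLog record.
  final case class AdLog(accountId: AccountId, keyword: Option[Keyword])

  // Extracts the (account, keyword) pair if the record carries a keyword.
  def getKeyword(adLog: AdLog): Option[(AccountId, Keyword)] =
    adLog.keyword.map(keyword => (adLog.accountId, keyword))

  // Accounts whose number of distinct keywords does not exceed the threshold.
  def validSellers(logs: Seq[AdLog], maxKeywordsNumber: Long): Set[AccountId] =
    logs
      .flatMap(log => getKeyword(log))                 // drop records without a keyword
      .distinct                                        // one entry per (account, keyword)
      .groupBy { case (accountId, _) => accountId }    // local equivalent of reduceByKey
      .collect { case (accountId, pairs) if pairs.size <= maxKeywordsNumber => accountId }
      .toSet
}
```

As in the Spark version, an account that never sends a keyword does not appear in the result at all, because it is dropped before counting.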
Spark: Let’s write some code
Step #2: Prepare Keywords Report
(AdLogs: SeqFile[ID,AdLog], validSellers: Set[AccountId]) => TextFile[Json]

case class KeywordsRecord( … ) // fields that represent dimensions and metrics
object KeywordsRecord { .. }   // helper functions for reading input and writing output

AdLog.getDataset(inputPath)(sparkSession)
  .filter(adLog => validSellers.contains(adLog.getAccountId)) // keep valid sellers only
  .map(KeywordsRecord.fromAdLog)
  .toDS
  .groupBy(KeywordsRecord.groupBy: _*)                                        // dimensions
  .agg(KeywordsRecord.aggregations.head, KeywordsRecord.aggregations.tail: _*) // metrics
  .select(KeywordsRecord.allCols: _*)
  .as[KeywordsRecord]
  .map(KeywordsRecord.toJson)
  .write
  .text(outputPath)
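For a local illustration of Step #2, the sketch below filters by valid sellers, groups by dimensions, and sums metrics using plain Scala collections. The fields (`keyword`, `impressions`, `clicks`) are assumptions chosen for the example, since the slides elide the real dimensions and metrics, and the JSON writer does no string escaping.

```scala
object KeywordsReportSketch {
  // Hypothetical input record: two dimensions and two metrics.
  final case class AdLog(accountId: Long, keyword: String, impressions: Long, clicks: Long)

  // Hypothetical report row with the same dimensions and aggregated metrics.
  final case class KeywordsRecord(accountId: Long, keyword: String, impressions: Long, clicks: Long)

  // Naive JSON rendering; a real job would use a JSON library that escapes strings.
  def toJson(r: KeywordsRecord): String =
    s"""{"accountId":${r.accountId},"keyword":"${r.keyword}","impressions":${r.impressions},"clicks":${r.clicks}}"""

  def prepareReport(logs: Seq[AdLog], validSellers: Set[Long]): Seq[KeywordsRecord] =
    logs
      .filter(log => validSellers.contains(log.accountId)) // keep valid sellers only
      .groupBy(log => (log.accountId, log.keyword))        // dimensions
      .map { case ((accountId, keyword), group) =>         // metrics
        KeywordsRecord(accountId, keyword, group.map(_.impressions).sum, group.map(_.clicks).sum)
      }
      .toSeq
      .sortBy(r => (r.accountId, r.keyword))               // stable order for output
}
```

Filtering before grouping matters at scale: records from sellers over the threshold are discarded before any shuffle-heavy aggregation takes place.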
Spark: Let’s write some code
Put it together: Step #1 + Step #2
(AdLogs: SeqFile[ID,AdLog], maxKeywords: Long) => TextFile[Json]

object KeywordsApplication {
  // Step #1: identify sellers within the keyword threshold
  def getValidSellers(inputPath, maxKeywordsNumber)(implicit SparkSession)
  // Step #2: build and write the report for those sellers
  def prepareReport(inputPath, outputPath, validSellers)(implicit SparkSession)

  def main(args: Array[String]): Unit = {
    …
    implicit val sparkSession: SparkSession = SparkSession.builder()
      .appName(jobName)
      .getOrCreate()
    val validSellers = getValidSellers(inputPath, maxKeywordsNumber)
    prepareReport(inputPath, outputPath, validSellers)
    …
  }
}
Is this all about writing clean code?
Nope!
It may be a bottleneck:
- Network bandwidth
- Storage I/O
- CPU
- RAM
It may help to overcome the bottleneck:
- Compression algorithms
- MapReduce & Spark job tuning
- Storage formats
- Data access patterns
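As one illustration of trading CPU for network and storage I/O, the sketch below gzips a repetitive batch of log lines with the JDK's `java.util.zip` and compares sizes. The sample JSON line is invented for the example, and gzip stands in for whatever codec a real pipeline would choose.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets
import java.util.zip.GZIPOutputStream

object CompressionSketch {
  // Compresses a byte array in memory with gzip.
  def gzip(bytes: Array[Byte]): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new GZIPOutputStream(buffer)
    try out.write(bytes) finally out.close() // close() flushes the gzip trailer
    buffer.toByteArray
  }

  // Hypothetical log batch: the same JSON line repeated many times.
  def sampleBatch(lines: Int): Array[Byte] = {
    val line = """{"account":42,"site":"example","keyword":"sports"}""" + "\n"
    (line * lines).getBytes(StandardCharsets.UTF_8)
  }

  def main(args: Array[String]): Unit = {
    val raw = sampleBatch(10000)
    val packed = gzip(raw)
    println(s"raw=${raw.length} bytes, gzip=${packed.length} bytes")
  }
}
```

Real logs are less repetitive than this sample, so actual ratios will be lower, and the CPU spent compressing and decompressing has to be weighed against the I/O it saves.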
MapReduce + Spark: One must use them right
Q&A session
Thank you!

Presentation: Big Data in Advertising Industry, by Oleksandr Fedirko and Danylo Stepanchuk

Unit-III-ELECTROCHEMICAL STORAGE DEVICES.pptUnit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
 
Heat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation pptHeat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation ppt
 
Understanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine LearningUnderstanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine Learning
 
ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
 
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
 
Modelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdfModelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdf
 

Big Data in Advertising Industry — Oleksandr Fedirko, Danylo Stepanchuk

  • 2. 2 Confidential Big Data in Advertisement Industry
  • 3. Agenda: Introduction to the Ad Exchange business area; Big Data tools overview; Architectural approach; JVM-based processing in Big Data analytics
  • 4. Introduction to the Ad Exchange business area
  • 5. Ad Evolution. Timeline from the 1990s to now, moving from Direct Sold / Guaranteed / Reserved, through Indirect / Programmatic / Unreserved, to Programmatic Premium.
      - Reservation Buying: ads sold via direct transactions between advertisers/agencies and publishers.
      - Ad Networks: ad networks aggregated inventory and sold it to advertisers, helping publishers sell inventory they could not sell themselves.
      - Ad Exchanges & SSPs: real-time marketplaces with large pools of liquid inventory not sold in direct buys; SSPs give publishers more controls to optimize yield.
      - DSPs: bidding technology designed to help advertisers/agencies target and optimize their buys across multiple ad exchanges and publisher inventory pools.
      - Private Exchanges & Automated Guaranteed: exclusive advertiser-to-publisher inventory relationships for programmatic purchasing in brand-safe environments.
  • 6. Ad Ecosystem: how does it work? [Diagram of Buyers and Sellers. Top row: Ad Network, Ad Network. Middle row: Agency, DSP, Ad Exchange, SSP, Publisher. Also shown: DMP/Data Supply, Brand, Audience, RTB.]
  • 8. What is Big Data? We’ve all heard the term “big data,” but you may not know exactly what it means. Most experts agree the term describes information that shares these three attributes:
  • 9. Typical Big Data pipeline: Data Sources (structured, unstructured); Data Ingestion (batch layer, stream layer); Storage; BI / Data Warehouse; Visualization and Reporting Tools; Processing Layer (data mining, machine learning). Cross-cutting concerns: Governance and Privacy; Security; Quality Management; High Scale, Low Cost.
  • 10. Storages (non-relational): Key-value; Document; Column-oriented; Graph; Full-text (search engine); BLOB
  • 11. Data ingestion or ETL: Batch; Near real-time; Real-time. Flow: Source, ETL, Destination
  • 14. Spark: Apache Spark is a unified analytics engine for large-scale data processing
  • 15. MapReduce vs Spark: Which one to pick?
      MapReduce
      ● Good old, slow and reliable
      ● Written in Java
      ● Natively supports Java, though all JVM-compatible languages are adaptable
      ● Easy to learn and tune
      ● Batch processing only
      ● Hard to implement complex pipelines
      ● Unit testing
      Spark
      ● “Brand-new”, fast and flexible
      ● Written in Scala
      ● Natively supports Scala and Java (plus R and Python)
      ● Provides a rich set of functionality
      ● Batch and micro-batch processing
      ● Supporting complex pipelines is its thing
      ● Unit testing
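The "batch" versus "micro-batch" distinction in the comparison above can be illustrated without Spark. The following is only a minimal sketch in plain Scala collections: the `Event` type, the `windowMs` parameter and the sample data are assumptions made for illustration, and this is not Spark's streaming API.

```scala
// Hypothetical event type for illustration; it does not come from the slides.
final case class Event(timestampMs: Long, payload: String)

// Batch: process the whole input in one pass.
def batchCount(events: Seq[Event]): Int = events.size

// Micro-batch: split the input into fixed-length time windows and
// process each window as its own small batch, in window order.
def microBatchCounts(events: Seq[Event], windowMs: Long): Seq[(Long, Int)] =
  events
    .groupBy(e => e.timestampMs / windowMs)          // window index
    .toSeq
    .sortBy { case (window, _) => window }
    .map { case (window, batch) => (window * windowMs, batch.size) }

// Four events spread over three one-second windows.
val sampleEvents = Seq(Event(0L, "a"), Event(400L, "b"), Event(1200L, "c"), Event(2500L, "d"))
```

With `windowMs = 1000L`, the four sample events fall into three windows (starting at 0, 1000 and 2000 ms), so results become available per window rather than only once the whole input has been read.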
  • 17. High-level overview [Diagram: Bid Platform, Ad Platform, three Buyers, Analytical Platform, Seller]
  • 21. Big Data analytics: What’s the challenge? Daily:
      ● 65B raw ad and bid events
      ● over 100 TB of serialized and compressed raw input data
      ● around 150K analytic queries over 110 dimensions in an analytic data store
      ● 4 s query time at the 98th percentile and 1 s average query time
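To put the daily figures above in perspective, they can be converted to average per-second rates. This is a back-of-the-envelope sketch only: it assumes 1 TB = 10^12 bytes and a perfectly even load over the day, so real peaks will be higher, and "over 100 TB" makes the throughput figures lower bounds.

```scala
val secondsPerDay = 24L * 60 * 60          // 86,400

val eventsPerDay  = 65e9                   // 65B raw ad and bid events
val bytesPerDay   = 100e12                 // 100 TB, assuming 1 TB = 10^12 bytes
val queriesPerDay = 150e3                  // 150K analytic queries

val eventsPerSecond  = eventsPerDay / secondsPerDay    // roughly 752 thousand events/s
val bytesPerSecond   = bytesPerDay / secondsPerDay     // roughly 1.16 GB/s
val queriesPerSecond = queriesPerDay / secondsPerDay   // roughly 1.7 queries/s
val bytesPerEvent    = bytesPerDay / eventsPerDay      // roughly 1.5 KB per compressed event

println(f"$eventsPerSecond%.0f events/s, ${bytesPerSecond / 1e9}%.2f GB/s, " +
        f"$queriesPerSecond%.2f queries/s, $bytesPerEvent%.0f bytes/event")
```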
  • 22. Big Data pipeline applied [Diagram: Ad & Bid Platforms, Data Collector, HDFS, Druid, Performance Analytics; processing with MapReduce and Spark]
  • 28. Let’s solve some problem: Keywords
      - Seller: “I want to be able to get performance reports beyond the standard account, site, zone, size, geography, etc.”
      - Ad Exchange Company: “I want to satisfy the high demand for this functionality (let’s name it Keywords), but I also want to reduce processing and retention costs by serving only sellers with a limited number of distinct keywords.”
      - Engineering: “There are two steps to solving the Keywords problem: first, we need to identify the sellers that comply with a threshold; second, we need to prepare reports only for them.”
  • 29. Spark: Let’s write some code. Step #1: Identify valid sellers
      (AdLogs: SeqFile[ID,AdLog], maxKeywords: Long) => Set[AccountId]

      def getKeyword(AELog) => Option[ ( AccountId, Keyword ) ]

      AdLog.getDataset(inputPath)(sparkSession)
        .flatMap( getKeyword )          // (AccountId, Keyword) pairs
        .distinct                       // one pair per distinct keyword
        .mapValues(_ => 1L)
        .reduceByKey(_ + _)             // distinct keywords per account
        .filter { case (_, totalKeywords) => totalKeywords <= maxKeywordsNumber }
        .keys
        .collect()
        .toSet
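The logic of Step #1 can be tried locally without a cluster. The sketch below mirrors the same chain (flatMap, distinct, count per account, filter) on plain Scala collections. The `AdLog` fields, the `String` aliases and the sample data are assumptions for illustration, since the slides do not show the real types, and `groupBy` stands in for Spark's `reduceByKey`.

```scala
// Hypothetical stand-ins for the slide's types; the real AdLog is not shown.
type AccountId = String
type Keyword   = String
final case class AdLog(accountId: AccountId, keyword: Option[Keyword])

// A log contributes a pair only if it carries a keyword.
def getKeyword(log: AdLog): Option[(AccountId, Keyword)] =
  log.keyword.map(k => (log.accountId, k))

def validSellers(logs: Seq[AdLog], maxKeywords: Long): Set[AccountId] =
  logs
    .flatMap(log => getKeyword(log))            // (account, keyword) pairs
    .distinct                                   // count each keyword once per account
    .groupBy { case (account, _) => account }   // local stand-in for reduceByKey
    .filter { case (_, pairs) => pairs.size <= maxKeywords }
    .keySet

// Account "a" has 2 distinct keywords, "b" has 3, "c" has none.
val sampleLogs = Seq(
  AdLog("a", Some("cars")), AdLog("a", Some("cars")), AdLog("a", Some("bikes")),
  AdLog("b", Some("x")), AdLog("b", Some("y")), AdLog("b", Some("z")),
  AdLog("c", None)
)
```

As in the slide's version, an account with no keywords at all never appears in the result, because `flatMap` drops its logs before counting.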
  • 30. Spark: Let’s write some code. Step #2: Prepare Keywords Report
      (AdLogs: SeqFile[ID,AdLog], validSellers: Set[AccountId]) => TextFile[Json]

      case class KeywordsRecord( … )   // fields which represent dimensions and metrics
      object KeywordsRecord { .. }     // functions pack to operate with input/output data

      AdLog.getDataset(inputPath)(sparkSession)
        .filter( adLog => validSellers.contains(adLog.getAccountId) )
        .map( KeywordsRecord.fromAdLog )
        .toDS
        .groupBy( KeywordsRecord.groupBy: _* )   // dimensions
        .agg( KeywordsRecord.aggregations.head, KeywordsRecord.aggregations.tail: _* )   // metrics
        .select( KeywordsRecord.allCols: _* )
        .as[ KeywordsRecord ]
        .map( KeywordsRecord.toJson )
        .write
        .text(outputPath)
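The filter, group-by-dimensions and aggregate-metrics shape of Step #2 can also be sketched on plain Scala collections. Everything concrete here is an assumption for illustration: the slides elide the fields of `KeywordsRecord`, so this sketch invents two dimensions (account, keyword) and two metrics (impressions, revenue), uses a hypothetical `AdEvent` input type, and skips the JSON serialization and file output.

```scala
// Hypothetical input and output shapes; the real fields are elided on the slide.
final case class AdEvent(accountId: String, keyword: String, impressions: Long, revenue: Double)
final case class KeywordsRecord(accountId: String, keyword: String, impressions: Long, revenue: Double)

def prepareReport(events: Seq[AdEvent], validSellers: Set[String]): Seq[KeywordsRecord] =
  events
    .filter(e => validSellers.contains(e.accountId))   // only sellers under the threshold
    .groupBy(e => (e.accountId, e.keyword))            // dimensions
    .map { case ((account, keyword), group) =>         // metrics: sum per group
      KeywordsRecord(account, keyword, group.map(_.impressions).sum, group.map(_.revenue).sum)
    }
    .toSeq
    .sortBy(r => (r.accountId, r.keyword))             // stable order for the example

val sampleAdEvents = Seq(
  AdEvent("a", "cars", 10L, 1.5), AdEvent("a", "cars", 5L, 0.5),
  AdEvent("a", "bikes", 1L, 0.25), AdEvent("b", "x", 7L, 1.0)
)
```

Calling `prepareReport(sampleAdEvents, Set("a"))` keeps only account "a" and collapses its two "cars" events into one record, which is the same reduction the Dataset `groupBy` and `agg` calls perform at scale.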
  • 31. Spark: Let’s write some code. Put it together: Step #1 + Step #2
      (AdLogs: SeqFile[ID,AdLog], maxKeywords: Long) => TextFile[Json]

      object KeywordsApplication {
        def getValidSellers(inputPath, maxKeywordsNumber)(implicit SparkSession)
        def prepareReport(inputPath, outputPath, validSellers)(implicit SparkSession)

        def main(args: Array[String]) = {
          …
          implicit val sparkSession = SparkSession.builder()
            .appName(jobName)
            .getOrCreate
          val validSellers = getValidSellers(inputPath, maxKeywordsNumber)
          prepareReport(inputPath, outputPath, validSellers)
          …
        }
      }
  • 34. Is this all about writing clean code? Nope!
      - It may be a bottleneck: Network Bandwidth; Storage I/O; CPU; RAM
      - It may help to overcome the bottleneck: Compression algorithms; MapReduce & Spark jobs tuning; Storage formats; Data access patterns
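Compression is one of the levers listed above: it spends CPU to save storage I/O and network bandwidth. The sketch below shows that trade-off with gzip from the Java standard library. It assumes Java 9 or later (for `readAllBytes`), and the sample log lines are invented and highly repetitive, so real ad logs will compress differently. Which codec suits a given MapReduce or Spark job is a separate question this sketch does not answer.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

// Compress a byte array in memory.
def gzip(bytes: Array[Byte]): Array[Byte] = {
  val buffer = new ByteArrayOutputStream()
  val out = new GZIPOutputStream(buffer)
  try out.write(bytes) finally out.close()   // close() writes the gzip trailer
  buffer.toByteArray
}

// Decompress back to the original bytes (readAllBytes needs Java 9+).
def gunzip(bytes: Array[Byte]): Array[Byte] = {
  val in = new GZIPInputStream(new ByteArrayInputStream(bytes))
  try in.readAllBytes() finally in.close()
}

// Hypothetical, highly repetitive log lines for illustration only.
val rawLog = Seq.fill(1000)("account=42,site=news,zone=top,size=300x250\n").mkString.getBytes("UTF-8")
val packedLog = gzip(rawLog)

println(s"raw: ${rawLog.length} bytes, gzip: ${packedLog.length} bytes")
```

Fewer bytes on disk and on the wire come at the price of the CPU time spent in `gzip` and `gunzip`, which is why the slide lists CPU among the possible bottlenecks alongside I/O and bandwidth.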
  • 35. MapReduce + Spark: One must use them right