Building reliable big data
applications for news brands
across the Benelux
- Why
- What
- How, 5 challenges
Rogier Vlijm
Frank Mekkelholt
Big Data De Persgroep Nederland 2019
25 GB = 111,825 books per day
>300 million events a day
75 GB data a day
Monthly reach
DIGITAL and PRINT 2018
Data strategy
Building a data foundation to generate value with data for:
1. Subscriptions marketing: increase conversion & personal offer
2. Digital advertising: personalized ads, better results
3. Newsroom: uplift of page views, unique visitors & time on site
4. Compliance & security: respectful & reducing risk
360° PROFILE AND DATA LAKE IN THE CENTRE OF THE ORGANIZATION
Digital advertising, newsroom, subscription marketing
- Increase automated conversion on paywall and newspaper.nl
- Automated CTR optimization
- Audiences
- Visitor behavior for news innovations
- B2C marketing data foundation for Consumer Intelligence
- B2B sales data foundation
Optimize Media channels, creation and platform.
- Channels: Where can we find our consumers, and what is the best way to convince them?
- Creation: How do we best appeal to you as a consumer, and in what format?
- Platform: In what phase is the consumer, and do we convert them to sales or to more engagement?
Data: Online subscriptions marketing
Content topics: television, sport, other, regional news, domestic politics, education, digital & technology, commemoration, food, international, culinary, consuming & spending, literature, construction & real estate, defence, nature & environment, religion & society, health, domestic news, disasters & law enforcement, transport, music, film & performing arts, royalty, weather, BV Nederland, celebrities, higher education & emancipation, football, drama & emotion, art
KONING VOETBAL ("King Football")
Specific interest in football, but also reads about other sports. Regularly checks the voetbalcenter for results.
Characteristic per user: value
- # visits per month: 5.8
- # pages per visit: 6.5
- # article pages: 79
- % watches videos: 19
- % cross-domain, national / regional: 15/20
- % logged in: 4.9
Reduce the distance to Google and Facebook
• Strong brands (Volkskrant, Parool, Trouw, AD, Tweakers, Qmusic, etc.)
• Link with demographic characteristics through CRM data
• Create audiences based on behaviour
• Demand from larger advertisers is growing to be less dependent on Google
or Facebook while maintaining results
→ Closing the gap step by step by building data in 2 zones
1. Demographic and behavioral data
2. Intent data
Improve service for advertisers and close gap with Google and Facebook
● How successful is my story?
● Via which channels do I need to publish?
● Can I improve the header?
● Should we create a follow-up?
RAW layer
Master
(datamarts)
Clean layer
Batch / micro batch
Data catalog raw
● Source
● Owner
● Location
● Frequency
● Description
● Consent
● Delta / full
Data catalog clean
● Consent
● PI data (hashed)
● Frequency
● Lookup tables
● Field description
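The catalog fields and the "PI data (hashed)" entry above can be sketched in code. This is a minimal illustration only: the class name, the field names, the salted SHA-256 choice, and the normalization step are assumptions, not the setup described in the slides.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class RawCatalogEntry:
    """One data set in the raw layer, using the catalog fields listed above."""
    source: str
    owner: str
    location: str
    frequency: str
    description: str
    consent: str
    load_type: str  # "delta" or "full"


def hash_pi(value: str, salt: str) -> str:
    """Return a salted SHA-256 hex digest (hypothetical hashing scheme)."""
    normalized = value.strip().lower()
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()


def to_clean_record(record: dict, pi_fields: set, salt: str) -> dict:
    """Copy a raw record, replacing the values of PI fields by their hash."""
    return {
        key: hash_pi(str(value), salt) if key in pi_fields else value
        for key, value in record.items()
    }


entry = RawCatalogEntry(
    source="web tracking", owner="data team", location="s3://example-bucket/raw/",
    frequency="hourly", description="page view events", consent="analytics",
    load_type="delta",
)
clean = to_clean_record({"email": "Reader@Example.nl ", "article": "a1"}, {"email"}, "salt")
```

Hashing after normalization keeps the same person joinable across data sets in the clean layer without exposing the original value.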
🕐 Airflow ingestion
Monitoring / alerting
👤 Access by owner and data processor
🕐 Airflow transformation
Monitoring / alerting
Consumers of data (CI/BI/CX/IT/DCC)
● Dataiku
● Databricks
● Redshift (Spectrum)
● Athena
● Looker / Qlik Sense
S3
🕐 Airflow / data transformation
👤 Access role-based / PI data hashed
Logging user trails
Monitoring performance / costs
How:
Translating 5 challenges
to technical solutions
Analytics log level data:
- time1 - user1 - article1
- time2 - user1 - article2
- time3 - user2 - article2
- ....
1 - User-content interactions
for analytics and data science
User × article interaction matrix (rows: users, columns: articles):
1 1
0 1
Per-user score vectors (one row per user):
0.5 0.22 0.28
0.01 0.01 0.98
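As a minimal sketch of how log-level data turns into a user × article matrix, the three example events above can be processed as follows. The function name and the tuple layout (time, user, article) are assumptions for illustration.

```python
def build_interaction_matrix(events):
    """Build a binary user x article matrix from (time, user, article) events."""
    users = sorted({user for _, user, _ in events})
    articles = sorted({article for _, _, article in events})
    row = {user: i for i, user in enumerate(users)}
    col = {article: j for j, article in enumerate(articles)}

    matrix = [[0] * len(articles) for _ in users]
    for _, user, article in events:
        matrix[row[user]][col[article]] = 1  # 1 = user interacted with article
    return users, articles, matrix


events = [
    ("time1", "user1", "article1"),
    ("time2", "user1", "article2"),
    ("time3", "user2", "article2"),
]
users, articles, matrix = build_interaction_matrix(events)
# matrix is [[1, 1], [0, 1]]: user1 read both articles, user2 only article2
```

At real volumes a sparse representation would be the more likely choice; the dense list of lists is only for readability.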
Problems with known analytics partners
- Throttling/sampling
- Non-realtime (event level)
- 3rd party tracking
- Non-transparent
- Privacy control
- Vendor lock-in
Open source tracking:
- Android
- Go
- .NET
- iOS
- Java
- JavaScript
- NodeJS
- Python
- Scala
- [many more]
- Infrastructure as a Service on AWS
- Open source
- Flexible/configurable
- Realtime
- 1st party
2 - Easy testing playground
- Snowplow Collector → Raw topic
- Snowplow Enricher → Enriched topic
- Snowplow Enricher → Corrupt topic
3 - Data quality
Challenges:
- Variety of brands
- Variety of platforms
- Variety of development teams
Solutions:
- Enforcing schema verification => corrupt events topic
- Tag manager templating
- Monitoring of tags and anomalies
- Automated quality assurance for new releases
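The idea of enforcing schema verification and sending failures to a corrupt events topic can be sketched as below. This is not Snowplow's validation code: the schema, the field names, and the two output lists (standing in for topics) are hypothetical.

```python
import json

# Hypothetical schema: required field name -> expected Python type.
PAGEVIEW_SCHEMA = {"event": str, "user_id": str, "article_id": str, "timestamp": int}


def validate(event: dict, schema: dict) -> list:
    """Return a list of validation errors; an empty list means the event is valid."""
    errors = []
    for field, expected_type in schema.items():
        if field not in event:
            errors.append(f"missing field: {field}")
        elif not isinstance(event[field], expected_type):
            errors.append(f"wrong type for field: {field}")
    return errors


def route(raw_messages, schema):
    """Split raw messages into valid events and corrupt events with error details."""
    enriched, corrupt = [], []
    for raw in raw_messages:
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            corrupt.append({"raw": raw, "errors": ["invalid JSON"]})
            continue
        if not isinstance(event, dict):
            corrupt.append({"raw": raw, "errors": ["not a JSON object"]})
            continue
        errors = validate(event, schema)
        if errors:
            corrupt.append({"raw": raw, "errors": errors})
        else:
            enriched.append(event)
    return enriched, corrupt


messages = [
    '{"event": "pageview", "user_id": "u1", "article_id": "a1", "timestamp": 1}',
    '{"event": "pageview", "user_id": "u1"}',
    "not json",
]
enriched, corrupt = route(messages, PAGEVIEW_SCHEMA)
```

Keeping the raw payload next to the error list makes it possible to monitor which brand, platform, or team produces invalid events.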
4 - High event volumes
Clicks, pageviews, player heartbeats
5 billion/month
Processing
- Transform
- Filter
- Parse
- Window
- Aggregate
Integrate with
- Business Intelligence tools
- Data Science tools
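The processing steps listed above (parse, transform, filter, window, aggregate) can be illustrated with a small plain-Python sketch. The slides use Spark for this; the line format "timestamp,event,article", the function name, and the 60-second tumbling window are assumptions chosen for illustration.

```python
from collections import Counter


def aggregate_pageviews(lines, window_seconds=60):
    """Count pageviews per (window start, article) from 'timestamp,event,article' lines."""
    counts = Counter()
    for line in lines:
        # Parse: skip lines that do not have exactly three comma-separated parts.
        parts = line.split(",")
        if len(parts) != 3:
            continue
        try:
            timestamp = int(parts[0])
        except ValueError:
            continue
        # Transform: normalize the event name.
        event_type = parts[1].strip().lower()
        article = parts[2].strip()
        # Filter: keep pageviews only (drops clicks and player heartbeats).
        if event_type != "pageview":
            continue
        # Window: assign the event to a tumbling window by its start time.
        window_start = timestamp - timestamp % window_seconds
        # Aggregate: count events per window and article.
        counts[(window_start, article)] += 1
    return dict(counts)


lines = ["5,pageview,a1", "59,pageview,a1", "61,PAGEVIEW,a1", "30,heartbeat,a1", "bad line"]
result = aggregate_pageviews(lines)
# result is {(0, "a1"): 2, (60, "a1"): 1}
```

The same five steps map onto DataFrame operations in a distributed engine; only the execution model changes.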
Solutions:
- Snowplow => heavy lifting of collecting
- Start/terminate EMR clusters on AWS from Airflow when needed
- Spark for cleaning and aggregating
- Mirror S3 (partially) to Redshift for fast querying and BI tooling
regional news
construction & real estate
football
5 - Realtime scalability
Night/day pattern
- almost no night time traffic
Breaking news/developing stories
- double / quadruple daily volume
Push notifications
- peaks up to 16K events per second
How to aggregate?
Challenges
- Fluctuating traffic
- Stateful streaming
Considerations
- Latency - How fast is fast enough?
- Spark Streaming is still micro-batch
Solutions
- Dockerize applications
- Orchestrate with Kubernetes
- Container I/O to Kafka
- Redis
- ElasticSearch
- Flink
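To make the effect of fluctuating traffic concrete, the sizing arithmetic behind scaling containers up and down can be sketched as follows. It does not call any Kubernetes API; the per-replica capacity of 1,000 events per second and the replica bounds are hypothetical numbers.

```python
import math


def desired_replicas(events_per_second, capacity_per_replica,
                     min_replicas=1, max_replicas=50):
    """Return how many consumer replicas are needed for the given event rate."""
    needed = math.ceil(events_per_second / capacity_per_replica)
    # Clamp to the allowed range so quiet nights keep one replica running
    # and extreme peaks cannot scale without limit.
    return max(min_replicas, min(max_replicas, needed))


night = desired_replicas(0, 1000)              # almost no night-time traffic
push_peak = desired_replicas(16000, 1000)      # peak after a push notification
capped = desired_replicas(100000, 1000)        # beyond the configured maximum
```

With these assumed numbers, a push-notification peak of 16K events per second needs 16 replicas, while night-time traffic falls back to the minimum of one.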
‘Data isn’t magic,
it’s what you do with it that counts’
(Mary Hamilton, The Guardian)

Presentation Data Council Meetup: F. Mekkenholt, R. Vlijm

  • 1. Building reliable big data applications for news brands across the Benelux - Why - What - How, 5 challenges Rogier Vlijm Frank Mekkelholt
  • 2. Big Data De Persgroep Nederland 2019: 25 GB = 111,825 books per day; >300 million events a day; 75 GB data a day
  • 4. Data strategy: Building a data foundation to generate value with data for: 1 Subscriptions Marketing (increase conversion & personal offer); 2 Digital advertising (personalized ads, better results); 3 Newsroom (uplift of page views, unique visitors & time on site); 4 Compliance & security (respectful & reducing risk)
  • 5. Digital advertising News room Subscription Marketing 360° PROFILE AND DATALAKE IN THE CENTRE OF THE ORGANIZATION Increase automated conversion on paywall and newspaper.nl Automated CTR optimization Audiences visitor behavior for news innovations b2c marketing data foundation for Consumer Intelligence b2b Sales data foundation
  • 6. Data: Online subscriptions marketing. Optimize Media channels, creation and platform. - Channels: Where can we find our consumer and find the best way to convince them? - Creation: How do we best appeal to you as a consumer and in what format? - Platform: In what phase is the consumer and do we convert him to sales or more engagement?
  • 7. television, sports, other, regional news, domestic politics, education, digital & technology, commemoration, food, international, culinary, consuming & spending, literature, construction & real estate, defence, nature & environment, religion & society, health, domestic news, disasters & law enforcement, transport, music, film & performing arts, royalty, weather, BV Nederland, celebrities, higher education & emancipation, football, drama & emotion, art
  • 8. KONING VOETBAL ("King Football"): Specific interest in football, but also reads about other sports. Regularly checks voetbalcenter for results. Characteristic per user (count): # visits per month 5.8; # pages per visit 6.5; # article pages 79; % watches videos 19; % cross-domain national / regional 15/20; % logged in 4.9
  • 9. Reduce the distance to Google and Facebook • Strong brands (Volkskrant, Parool, Trouw, AD, tweakers, Qmusic etc ..) • Link with demographic characteristics through CRM data • Create audiences based on behaviour • Demand from larger advertisers is growing to be less dependent on Google or Facebook while maintaining results → Closing step by step by building data in 2 zones 1. Demographic and behavioral data 2. Intent data Improve service for advertisers and close gap with Google and Facebook
  • 10. ● How successful is my story? ● Via which channels do I need to publish? ● Can I improve the header? ● Should we create a follow up?
  • 11. RAW layer Master (datamarts) Clean layer Batch / micro batch Data catalog raw ● Source ● Owner ● Location ● Frequency ● Description ● Consent ● Delta / full Data catalog clean ● Consent ● PI data (hashed) ● Frequency ● Lookup tables ● Field description 🕐 Airflow ingestion Monitoring / alerting 👤 Access by owner and data processor 🕐 Airflow transformation Monitoring / alerting Consumers of data (CI/BI/CX/IT/DCC) ● Dataiku ● Databricks ● Redshift (Spectrum) ● Athena ● Looker / Clicksense S3 S3 🕐 Airflow / data transformation 👤 Access role based / PI data hashed Logging user trails Monitoring performance / costs
  • 12. How: Translating 5 challenges to technical solutions
  • 13. 1 - User-content interactions for analytics and data science. Analytics log level data: (time1, user1, article1), (time2, user1, article2), (time3, user2, article2), .... [Figure: articles × users matrix with values 1 1 0 1; second matrix labelled "users" with values 0.5 0.22 0.28 .01 .01 0.98]
  • 14. 1 - User-content interactions for analytics and data science Problems with known analytics partners - Throttling/sampling - Non-realtime (event level) - 3rd party tracking - Non-transparent - Privacy control - Vendor lock-in
  • 15. 1 - User-content interactions for analytics and data science Open source tracking: - Android - Go - .NET - iOS - Java - JavaScript - NodeJS - Python - Scala - [many more] - Infrastructure as a Service on AWS - Open source - Flexible/configurable - Realtime - 1st party
  • 16.
  • 17. 2 - Easy testing playground
  • 18. 2 - Easy testing playground
  • 19. 2 - Easy testing playground - Raw topic - Enriched topic - Corrupt topic Snowplow Collector Snowplow Enricher
  • 20. 3 - Data quality Challenges: - Variety of brands - Variety of platforms - Variety of development teams Solutions: - Enforcing schema verification => corrupt events topic - Tag manager templating - Monitoring of tags and anomalies - Automated quality assurance for new releases
  • 21. 4 - High event volumes: Clicks, Pageviews, Player heartbeats: 5 B/month Processing - Transform - Filter - Parse - Window - Aggregate Integrate with - Business Intelligence tools - Data Science tools
  • 22. 4 - High event volumes Solutions: - Snowplow => heavy lifting of collecting - Start/terminate EMR clusters on AWS from Airflow when needed - Spark for cleaning and aggregating - Mirror S3 (partially) to Redshift for fast querying and BI tooling [Figure labels: regional news, construction & real estate, football]
  • 23. 5 - Realtime scalability
  • 24.
  • 25.
  • 26.
  • 27. 5 - Realtime scalability Night/day pattern - almost no night time traffic Breaking news/developing stories - double / quadruple daily volume Push notifications - peaks up to 16K events per second How to aggregate?
  • 28. 5 - Realtime scalability Challenges - Fluctuating traffic - Stateful streaming Considerations - Latency - How fast is fast enough? - Spark Streaming is still mini-batch Solutions - Dockerize applications - Orchestrate with Kubernetes - Container I/O to Kafka - Redis - ElasticSearch - Flink
  • 29. ‘Data isn’t magic, it’s what you do with it that counts’ (Mary Hamilton, The Guardian)
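
Slides 13, 19 and 20 describe log-level events being checked against a schema, with failing events routed to a corrupt topic and valid ones feeding a user-content interaction matrix. A minimal sketch of that flow in plain Python is given here, under stated assumptions: the field names (`time`, `user`, `article`), the `REQUIRED_FIELDS` schema, and the helper names `is_valid`, `route` and `interaction_matrix` are hypothetical and do not come from the slides. It uses in-memory lists where the real setup uses Snowplow and Kafka topics.

```python
from collections import defaultdict

# Hypothetical schema: the slides do not show the real event schema.
REQUIRED_FIELDS = {"time": str, "user": str, "article": str}


def is_valid(event):
    """Return True when every required field is present with the expected type."""
    return all(
        isinstance(event.get(name), kind) for name, kind in REQUIRED_FIELDS.items()
    )


def route(events):
    """Split raw events into an enriched list and a corrupt list."""
    enriched, corrupt = [], []
    for event in events:
        (enriched if is_valid(event) else corrupt).append(event)
    return enriched, corrupt


def interaction_matrix(events):
    """Count interactions per (user, article) pair as a nested dict."""
    matrix = defaultdict(lambda: defaultdict(int))
    for event in events:
        matrix[event["user"]][event["article"]] += 1
    return {user: dict(row) for user, row in matrix.items()}


# The first three events follow the example on slide 13; the fourth is invented.
raw_events = [
    {"time": "time1", "user": "user1", "article": "article1"},
    {"time": "time2", "user": "user1", "article": "article2"},
    {"time": "time3", "user": "user2", "article": "article2"},
    {"time": "time4", "user": "user2"},  # missing "article": routed to corrupt
]

enriched, corrupt = route(raw_events)
matrix = interaction_matrix(enriched)
print(matrix)
print(len(corrupt), "corrupt event(s)")
```

Keeping rejected events in a separate stream, instead of dropping them, means a faulty tag release shows up as a rising corrupt count that can be monitored.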