Auditing data and answering that life long question, is it the end of the day yet?
A true story based off of my endeavors at Nielsen
Simona Meriam, Aidoc
Agenda
1. Little fires everywhere
2. Designing our auditing data
3. Storing and querying our data
4. Is it the end of the day yet?
5. Alerts and add-ons
WHOAMI?
● Simona Meriam
● Senior Big Data Engineer @ Aidoc
● Data lover
● Concert goer
● Japan enthusiast
LinkedIn: https://il.linkedin.com/in/simona-meriam-9468885
Twitter: @simonameriam
Nielsen’s architecture (at the time)
Little fires everywhere
● Data arrival pain points and recovering from failures
● When to process data?
● Is it the end of the day yet?
● Some more pain points
Data arrival pain points
Recovering from failures
Is it the end of the day yet?
● Let’s talk about this question, and possible answers
○ Data granularity
○ Time granularity
● So when do we process data? When is it the end of the day?
● The implications of processing and reprocessing
Is it the end of the day yet?
Legacy answers to a legacy question
● Fixed time
● AWS S3 ls
Back to pain points
And some more
Little fires everywhere
Designing our auditing data
Auditing window?
● Several Kafka topics
● Data serving infrastructure
○ Our own “Nielsen Kafka Producer”
○ 2 JVMs on a single machine
○ Each JVM works against several topics
○ SLAs are very important!
● The use of Avro
Then finally, what is a window?
What should we keep in mind?
Auditing window
Key
● Topic
● Server
● Process
● Audit time
Value
● Counter
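A minimal sketch of the key/value layout above, assuming Python for illustration (the producer in the talk runs on the JVM). The 5-minute window length is an assumption read off the sample rows later in the deck, and `AuditKey`, `window_start` and `record` are hypothetical names, not part of the Nielsen Kafka Producer.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed window length; the slides do not state it explicitly.
WINDOW = timedelta(minutes=5)


@dataclass(frozen=True)
class AuditKey:
    """Key of an audit window: topic, server, process and audit time."""
    topic: str
    server: str
    process: int
    audit_time: datetime  # start of the window the event falls into


def window_start(event_time: datetime, window: timedelta = WINDOW) -> datetime:
    """Floor an event time to the start of its audit window."""
    midnight = event_time.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = int((event_time - midnight).total_seconds())
    size = int(window.total_seconds())
    return midnight + timedelta(seconds=elapsed - elapsed % size)


# Value of an audit window: a plain event counter per key.
counters: Counter = Counter()


def record(topic: str, server: str, process: int, event_time: datetime) -> AuditKey:
    """Count one event against the window it belongs to."""
    key = AuditKey(topic, server, process, window_start(event_time))
    counters[key] += 1
    return key
```

Two events at 00:03:10 and 00:04:59 from the same topic, server and process share one key and raise its counter to 2; an event at 00:05:00 starts the next window.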
Auditing header
Auditing header injection
Auditing subsystem
Audit windows
● Open window
● Close window
● Increment counter
Audit headers
● Inject audit header
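The four operations above could be sketched roughly as follows. This is an illustration in Python under assumptions, not the actual subsystem: the class names, the one-open-window-per-topic rule and the header field names are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class AuditWindow:
    topic: str
    server: str
    process: int
    audit_time: datetime
    counter: int = 0
    closed: bool = False


@dataclass
class AuditSubsystem:
    """Hypothetical audit subsystem living inside one producer process."""
    server: str
    process: int
    open_windows: dict = field(default_factory=dict)    # topic -> AuditWindow
    closed_windows: list = field(default_factory=list)  # ready to be published

    def open_window(self, topic: str, audit_time: datetime) -> AuditWindow:
        """Open window: start counting events for a topic."""
        window = AuditWindow(topic, self.server, self.process, audit_time)
        self.open_windows[topic] = window
        return window

    def close_window(self, topic: str) -> AuditWindow:
        """Close window: stop counting and hand the window over for publishing."""
        window = self.open_windows.pop(topic)
        window.closed = True
        self.closed_windows.append(window)
        return window

    def audit(self, topic: str, payload: bytes) -> dict:
        """Increment counter, then inject the audit header into the message."""
        window = self.open_windows[topic]
        window.counter += 1
        header = {  # assumed header fields, mirroring the window key
            "topic": window.topic,
            "server": window.server,
            "process": window.process,
            "audit_time": window.audit_time.isoformat(),
        }
        return {"audit_header": header, "payload": payload}
```

Because every message carries the key of the window that counted it, a downstream consumer can rebuild the same counters from headers alone and compare them with the producer's windows.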
Nielsen Kafka Producer
In context
● S3 Consumer
● Kafka consumer
● Straight through API
Audit header
Audit window
Consuming our data
In context
Storing and querying our data
Designing our output table
● Level of granularity?
● Arrival rates?
● Arrival latency?
Questions we want answered!
● Audit timestamp
● Topic
● Server
● Process
● Event count
● Location - origin of data
How about some add-ons?
● Region
● Insert time
Designing our output table
insert_time                 window_timestamp     topic_name  server_name           region  process_id  location       event_count
2021-08-02T00:05:00.079000  2021-08-02T00:00:00  TOPIC1      server1.ams1.nielsen  ams1    19862       kafka_windows  0
2021-08-02T00:05:00.082000  2021-08-02T00:00:00  TOPIC1      server1.slj1.nielsen  slj1    4075        kafka_windows  98396
2021-08-02T00:05:00.082000  2021-08-02T00:00:00  TOPIC2      server1.slj1.nielsen  slj1    4075        kafka_windows  31805
2021-08-02T00:05:00.082000  2021-08-02T00:00:00  TOPIC1      server1.slj1.nielsen  slj1    4075        kafka_windows  98396
2021-08-02T00:32:12.082000  2021-08-02T00:00:00  TOPIC2      server1.slj1.nielsen  slj1    4075        rdr_headers    12453
2021-08-02T00:05:00.132000  2021-08-02T00:00:00  TOPIC3      server2.ams1.nielsen  ams1    31573       kafka_windows  84924
2021-08-02T00:05:00.131000  2021-08-02T00:00:00  TOPIC1      server2.ams1.nielsen  ams1    31573       kafka_windows  0
2021-08-02T00:10:00.009700  2021-08-02T00:05:00  TOPIC2      server2.ams1.nielsen  ams1    31571       kafka_windows  3177
Q & A with Superset
SELECT window_timestamp AT TIME ZONE 'UTC' AS window_timestamp,
       topic_name,
       SUM(CASE WHEN location = 'kafka_windows' THEN event_count ELSE 0 END) AS producer_count,
       SUM(CASE WHEN location = 'rdr_headers' AND
                     (insert_time AT TIME ZONE 'UTC' - window_timestamp AT TIME ZONE 'UTC' <= INTERVAL '3 HOURS')
                THEN event_count ELSE 0 END) AS rdr_count
FROM audit.audit_data
WHERE window_timestamp <= CURRENT_TIMESTAMP - INTERVAL '2 HOURS' AND
      window_timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 MONTH' AND
      event_count > 0
GROUP BY window_timestamp, topic_name
HAVING SUM(CASE WHEN location = 'kafka_windows' THEN event_count ELSE 0 END) > 0
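For readers who want to trace the aggregation by hand, here is a rough Python equivalent of the query's core logic. It is a sketch under assumptions: rows are passed in as plain tuples, the `reconcile` name is hypothetical, and the `CURRENT_TIMESTAMP` range filter is left out so the result does not depend on when it runs.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Same bound as the INTERVAL '3 HOURS' condition in the query.
MAX_READER_DELAY = timedelta(hours=3)


def reconcile(rows):
    """Sum producer and reader counts per (window_timestamp, topic_name).

    rows: iterable of
        (insert_time, window_timestamp, topic_name, location, event_count)
    """
    totals = defaultdict(lambda: {"producer_count": 0, "rdr_count": 0})
    for insert_time, window_ts, topic, location, count in rows:
        if count <= 0:
            continue  # mirrors `event_count > 0`
        bucket = totals[(window_ts, topic)]
        if location == "kafka_windows":
            bucket["producer_count"] += count
        elif location == "rdr_headers" and insert_time - window_ts <= MAX_READER_DELAY:
            bucket["rdr_count"] += count  # reader rows count only if on time
    # mirrors the HAVING clause: keep groups the producer actually reported
    return {key: v for key, v in totals.items() if v["producer_count"] > 0}
```

Comparing `producer_count` with `rdr_count` per window and topic shows how much of what was produced has arrived downstream within the allowed delay.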
Is it the end of the day yet?
1. Data arrival rate for the entire scope?
2. Number of audit windows for the entire scope?
3. Arrival rate for the last window?
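One way the three questions above might combine into a single decision, sketched in Python. The thresholds, the `WindowStats` type and the `is_end_of_day` function are hypothetical; the slides do not give the actual values or the order of the checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowStats:
    producer_count: int  # events the producer reported for the window
    arrived_count: int   # events counted downstream for the same window


def is_end_of_day(windows, expected_windows,
                  min_total_rate=0.99, min_last_rate=0.95):
    """Decide whether a scope (for example one day) is complete enough to process.

    windows: WindowStats for the scope, ordered by window time.
    expected_windows: how many audit windows the scope should contain.
    The two rate thresholds are illustrative assumptions.
    """
    # Number of audit windows for the entire scope
    if len(windows) < expected_windows:
        return False
    # Data arrival rate for the entire scope
    produced = sum(w.producer_count for w in windows)
    arrived = sum(w.arrived_count for w in windows)
    if produced == 0 or arrived / produced < min_total_rate:
        return False  # an empty scope is treated as "not yet", conservatively
    # Arrival rate for the last window
    last = windows[-1]
    if last.producer_count == 0:
        return True  # nothing was produced in the last window
    return last.arrived_count / last.producer_count >= min_last_rate
```

With 5-minute windows (an assumption), a full day would mean `expected_windows=288` per producer and topic.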
Some alerts and add-ons
● Alert granularity for different types of failures
○ Region
○ Topic
○ Server
● Detecting duplications
● More locations!
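As an illustration of the duplication check, the sketch below flags audit rows that share the same window, topic, server, process and location. The talk does not describe how duplicates were actually detected, so treating a repeated key as a duplicate is an assumption, and `find_duplicates` is a hypothetical helper.

```python
from collections import Counter


def find_duplicates(rows):
    """Return the audit keys that were reported more than once.

    rows: iterable of
        (window_timestamp, topic_name, server_name, process_id, location, event_count)
    A key is every field except event_count.
    """
    seen = Counter(row[:5] for row in rows)
    return [key for key, times in seen.items() if times > 1]
```

Run against the sample output table, this would flag the TOPIC1 row from server1.slj1.nielsen, which appears twice for the 00:00 window.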
CREDITS: This presentation template was
created by Slidesgo, including icons by
Flaticon, and infographics & images by Freepik
Thank you!
@simonameriam
https://il.linkedin.com/in/simona-meriam-9468885

Auditing your data and answering the life long question, is it the end of the day yet? with Simona Meriam | Kafka Summit London 2022

  • 1. Auditing data and answering that life long question, is it the end of the day yet? A true story based off of my endeavors at Nielsen Simona Meriam, Aidoc
  • 2. Agenda 1. Little fires everywhere 2. Designing our auditing data 3. Storing and querying our data 4. Is it the end of the day yet? 5. Alerts and add-ons
  • 3. ● Simona Meriam ● Senior Big Data Engineer @ Aidoc ● Data lover ● Concert goer ● Japan enthusiast WHOAMI? LinkedIn: https://il.linkedin.com/in/simona-meriam-9468885 Twitter: @simonameriam
  • 6. Little fires everywhere ● Data arrival pain points and recovering from failures ● When to process data? ● Is it the end of the day yet? ● Some more pain points
  • 12. Is it the end of the day yet? ● Let’s talk about this question, and possible answers ○ Data granularity ○ Time granularity ● So when do we process data? When is it the end of the day? ● The implications of processing and reprocessing
  • 13. Is it the end of the day yet? Legacy answers to a legacy question ● Fixed time ● AWS S3 ls
  • 14. Back to pain points
  • 18. Auditing window? ● Several Kafka topics ● Data serving infrastructure ○ Our own “Nielsen Kafka Producer” ○ 2 JVMs on a single machine ○ Each JVM works against several topics ○ SLAs are very important!! ● The use of Avro Then finally, what is a window? What should we keep in mind?
  • 19. Auditing window Key ● Topic ● Server ● Process ● Audit time Value ● Counter
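The key/value layout on this slide can be sketched as a small data model. This is only an illustration, not the Nielsen implementation: the 5-minute window length is an assumption read off the sample timestamps later in the deck, and `AuditKey`, `window_start`, and `record_event` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)  # assumed window length, not stated on the slide


@dataclass(frozen=True)
class AuditKey:
    """The four key fields from the slide; the value is a plain counter."""
    topic: str
    server: str
    process: int
    audit_time: datetime  # start of the window the event falls into


def window_start(ts: datetime, window: timedelta = WINDOW) -> datetime:
    """Floor a timestamp to the start of its audit window (same-day windows)."""
    size = int(window.total_seconds())
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = int((ts - midnight).total_seconds())
    return midnight + timedelta(seconds=elapsed - elapsed % size)


counters = Counter()  # AuditKey -> number of events seen in that window


def record_event(topic: str, server: str, process: int, ts: datetime) -> AuditKey:
    """Count one event against the window that contains its timestamp."""
    key = AuditKey(topic, server, process, window_start(ts))
    counters[key] += 1
    return key
```

Because the key is frozen and hashable, two events from the same topic, server, and process that fall in the same window land on the same counter.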
  • 23. Auditing subsystem Audit windows Audit headers ● Open window ● Close window ● Increment counter ● Inject audit header Nielsen Kafka Producer
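The four operations on this slide (open window, close window, increment counter, inject audit header) could fit together roughly as follows. This is a hedged sketch: the class and method names are hypothetical, the header fields (`audit_server`, `audit_process`, `audit_window`) are assumptions because the slide does not list the header contents, and events are assumed to arrive in time order.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


@dataclass
class AuditWindow:
    topic: str
    server: str
    process: int
    start: datetime
    count: int = 0


class AuditSubsystem:
    """Hypothetical producer-side audit state for one process."""

    def __init__(self, server, process, window=timedelta(minutes=5)):
        self.server = server
        self.process = process
        self.window = window      # assumed window length
        self.open_windows = {}    # topic -> currently open AuditWindow
        self.closed = []          # closed windows, ready to be published

    def open_window(self, topic, ts):
        # Floor ts to a window boundary (timedelta // timedelta yields an int).
        start = EPOCH + ((ts - EPOCH) // self.window) * self.window
        win = AuditWindow(topic, self.server, self.process, start)
        self.open_windows[topic] = win
        return win

    def close_window(self, topic):
        win = self.open_windows.pop(topic)
        self.closed.append(win)
        return win

    def increment_counter(self, topic, ts):
        win = self.open_windows.get(topic)
        if win is None:
            win = self.open_window(topic, ts)
        elif ts >= win.start + self.window:
            # The event belongs to a later window: roll over.
            self.close_window(topic)
            win = self.open_window(topic, ts)
        win.count += 1
        return win

    def inject_audit_header(self, win, headers):
        # Copy so the caller's headers are left untouched.
        audited = dict(headers)
        audited["audit_server"] = win.server
        audited["audit_process"] = win.process
        audited["audit_window"] = win.start.isoformat()
        return audited

    def on_send(self, topic, ts, headers):
        """Count the event, then stamp its headers with the window identity."""
        return self.inject_audit_header(self.increment_counter(topic, ts), headers)
```

Stamping every record with the window it was counted in is what would let a downstream consumer rebuild per-window counts and compare them with the producer's.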
  • 25. ● S3 Consumer ● Kafka consumer ● Straight through API Audit header Audit window Consuming our data
  • 28. Designing our output table ● Level of granularity? ● Arrival rates? ● Arrival latency? Questions we want answered!
  • 29. ● Audit timestamp ● Topic ● Server ● Process ● Event count ● Location - origin of data How about some add ons? ● Region ● Insert time Designing our output table
  • 30.
      insert_time                 window_timestamp     topic_name  server_name           region  process_id  location       event_count
      2021-08-02T00:05:00.079000  2021-08-02T00:00:00  TOPIC1      server1.ams1.nielsen  ams1    19862       kafka_windows  0
      2021-08-02T00:05:00.082000  2021-08-02T00:00:00  TOPIC1      server1.slj1.nielsen  slj1    4075        kafka_windows  98396
      2021-08-02T00:05:00.082000  2021-08-02T00:00:00  TOPIC2      server1.slj1.nielsen  slj1    4075        kafka_windows  31805
      2021-08-02T00:05:00.082000  2021-08-02T00:00:00  TOPIC1      server1.slj1.nielsen  slj1    4075        kafka_windows  98396
      2021-08-02T00:32:12.082000  2021-08-02T00:00:00  TOPIC2      server1.slj1.nielsen  slj1    4075        rdr_headers    12453
      2021-08-02T00:05:00.132000  2021-08-02T00:00:00  TOPIC3      server2.ams1.nielsen  ams1    31573       kafka_windows  84924
      2021-08-02T00:05:00.131000  2021-08-02T00:00:00  TOPIC1      server2.ams1.nielsen  ams1    31573       kafka_windows  0
      2021-08-02T00:10:00.009700  2021-08-02T00:05:00  TOPIC2      server2.ams1.nielsen  ams1    31571       kafka_windows  3177
  • 32. Q & A with Superset
  • 33. Q & A with Superset
  • 34. Q & A with Superset
      SELECT window_timestamp AT TIME ZONE 'UTC' AS window_timestamp,
             topic_name,
             SUM(CASE WHEN location = 'kafka_windows'
                      THEN event_count ELSE 0 END) AS producer_count,
             SUM(CASE WHEN location = 'rdr_headers'
                       AND (insert_time AT TIME ZONE 'UTC' - window_timestamp AT TIME ZONE 'UTC' <= INTERVAL '3 HOURS')
                      THEN event_count ELSE 0 END) AS rdr_count
      FROM audit.audit_data
      WHERE window_timestamp <= CURRENT_TIMESTAMP - INTERVAL '2 HOURS'
        AND window_timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 MONTH'
        AND event_count > 0
      GROUP BY window_timestamp, topic_name
      HAVING SUM(CASE WHEN location = 'kafka_windows' THEN event_count ELSE 0 END) > 0
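For readers who want to trace the query's logic outside the database, here is a rough Python equivalent of its aggregation. It is a sketch under assumptions: rows are plain dicts keyed by the column names from the output table, timestamps are naive `datetime` values already in UTC, and the query's date-range filter on `window_timestamp` is left out for brevity. `compare_counts` is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime, timedelta

MAX_LAG = timedelta(hours=3)  # same 3-hour bound as in the query


def compare_counts(rows):
    """Per (window, topic): events counted by the producer vs. seen in rdr headers."""
    totals = defaultdict(lambda: {"producer_count": 0, "rdr_count": 0})
    for row in rows:
        if row["event_count"] <= 0:  # WHERE event_count > 0
            continue
        group = totals[(row["window_timestamp"], row["topic_name"])]
        if row["location"] == "kafka_windows":
            group["producer_count"] += row["event_count"]
        elif (row["location"] == "rdr_headers"
              and row["insert_time"] - row["window_timestamp"] <= MAX_LAG):
            # Only header counts that arrived within the lag bound are included.
            group["rdr_count"] += row["event_count"]
    # HAVING: keep only groups where the producer reported events.
    return {key: val for key, val in totals.items() if val["producer_count"] > 0}
```

A gap between `producer_count` and `rdr_count` for a window means events the producer counted had not been seen downstream within three hours of that window.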
  • 35. Is it the end of the day yet?
  • 37. Is it the end of the day yet? 1. Data arrival rate for the entire scope? 2. Number of audit windows for the entire scope? 3. Arrival rate for the last window?
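One way to read the three questions on this slide is as three checks that must all pass before a day is treated as complete. The sketch below is an interpretation, not the logic presented in the talk: the 5-minute window, the 0.99 threshold, and the names `arrival_rate` and `is_end_of_day` are all assumptions.

```python
from datetime import datetime, timedelta


def arrival_rate(produced, consumed):
    """Share of produced events that arrived; an empty window counts as complete."""
    return 1.0 if produced == 0 else consumed / produced


def is_end_of_day(windows, day_start, window=timedelta(minutes=5), min_rate=0.99):
    """windows maps each window start to (producer_count, consumer_count)."""
    expected = int(timedelta(days=1) / window)
    starts = [day_start + i * window for i in range(expected)]

    # Number of audit windows for the entire scope: none may be missing.
    if any(start not in windows for start in starts):
        return False

    # Arrival rate for the entire scope.
    produced = sum(windows[start][0] for start in starts)
    consumed = sum(windows[start][1] for start in starts)
    if arrival_rate(produced, consumed) < min_rate:
        return False

    # Arrival rate for the last window, which the whole-day rate can hide.
    return arrival_rate(*windows[starts[-1]]) >= min_rate
```

The last-window check is separate because one lagging window barely moves the daily total: with 288 windows of 100 events each, a final window at 50% arrival still leaves the whole-day rate above 99%.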
  • 39. Some alerts and add-ons ● Alert granularity for different types of failures ○ Region ○ Topic ○ Server ● Detecting duplications ● More locations!
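For the "detecting duplications" item, a minimal sketch could flag audit rows that share the same identity. It assumes a row is identified by window, topic, server, process, and location, which is a guess based on the output table columns rather than something the slides state. `find_duplicates` is a hypothetical name.

```python
from collections import Counter

# Assumed identity of an audit row; insert_time and event_count are excluded.
KEY_FIELDS = ("window_timestamp", "topic_name", "server_name", "process_id", "location")


def find_duplicates(rows):
    """Return each row identity that appears more than once, with its count."""
    seen = Counter(tuple(row[field] for field in KEY_FIELDS) for row in rows)
    return {key: n for key, n in seen.items() if n > 1}
```

Run against the sample rows from the output table, this would flag the `TOPIC1` / `server1.slj1.nielsen` / `kafka_windows` row, which appears twice for the 00:00 window.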
  • 42. CREDITS: This presentation template was created by Slidesgo, including icons by Flaticon, and infographics & images by Freepik