SlideShare a Scribd company logo
Logging Apache Spark:
How we made it easy
From my past endeavours at Nielsen
Simona Meriam
Big Data Engineer, Aidoc
Agenda
01 Architecture overview
02 How we used to do it
03 Solution overview
04 Main obstacles
05 Pretty Kibana charts
06 Future and Add-ons
● Simona Meriam
● Big Data Engineer @ Aidoc
● Data lover
● Concert goer
● Japan enthusiast
whoami
Nielsen’s Data Architecture Overview
Nielsen’s Data Architecture Overview
Nielsen’s Data Architecture overview
● Launching a new Spark app
● Production issues
When would you need to access your logs?
How we used to do it
● Access directly on Server
‒ SSH to server running the driver
‒ Decompress and/or access log
● Access through YARN
● Download log file from s3
Accessing your Apache Spark logs on Amazon EMR
How we used to do it
8
Solution overview
1 2 3 4 5
Configure EMR with
bootstrap actions that
include team specific
settings
Bootstrap actions install
filebeat and metricbeat
Spark logs get
automatically shipped to
REDIS
Logs from REDIS are
processed by Logstash
and indexed in
Elasticsearch
Kibana <3
Solution overview
● You can use a bootstrap action to install additional software
● Bootstrap actions run before Amazon EMR installs the applications
that you specify when you create the cluster
● If you add nodes to a running cluster, bootstrap actions also run on
those nodes in the same way
What are AWS bootstrap actions even meant for?
Solution overview
Bootstrapping
Installing Filebeat
Installing Filebeat
Installing Filebeat
“YAML template” file
# Optional additional fields. These fields can be freely picked
# to add additional information to the crawled log files for filtering
fields:
emr_cluster_id: %EMR_CLUSTER_ID%
fields_under_root: true
# level: debug
# review: 1
### Multiline options
# Multiline can be used for log messages spanning multiple lines. This is common
# for Java Stack Traces or C-Line Continuation
# The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
multiline.pattern: %REGEXP_PATTERN%
Installing Metricbeat
• Old is good
• Change is work
Data Engineers
Main obstacles
Pretty Kibana charts
19
● Replace REDIS with Kafka
● Collection from other EC2 based services
– DRUID
– Kafka
Future and add-ons
Spark logs made easy

More Related Content

What's hot

Algolia - Hosted Search API
Algolia - Hosted Search API Algolia - Hosted Search API
Algolia - Hosted Search API
enterprisesearchmeetup
 
Infrastructure as code (iac) - Terraform for AWS
Infrastructure as code (iac) - Terraform for AWSInfrastructure as code (iac) - Terraform for AWS
Infrastructure as code (iac) - Terraform for AWS
Johanes Glenn
 
Best practices for running Windows workloads on AWS - AWS Summit Stockholm (M...
Best practices for running Windows workloads on AWS - AWS Summit Stockholm (M...Best practices for running Windows workloads on AWS - AWS Summit Stockholm (M...
Best practices for running Windows workloads on AWS - AWS Summit Stockholm (M...
T. Alexander Lystad
 
Serverless Computing @ x-celerate 2018
Serverless Computing @ x-celerate 2018Serverless Computing @ x-celerate 2018
Serverless Computing @ x-celerate 2018
Nils Rhode
 
Serverless Computing: Run code, not servers
Serverless Computing: Run code, not serversServerless Computing: Run code, not servers
Serverless Computing: Run code, not servers
x-celerate
 
Fall in Love with Graphs and Metrics using Grafana
Fall in Love with Graphs and Metrics using GrafanaFall in Love with Graphs and Metrics using Grafana
Fall in Love with Graphs and Metrics using Grafana
torkelo
 
MongoDB World 2018: Solving Your Backup Needs Using MongoDB Ops Manager, Clou...
MongoDB World 2018: Solving Your Backup Needs Using MongoDB Ops Manager, Clou...MongoDB World 2018: Solving Your Backup Needs Using MongoDB Ops Manager, Clou...
MongoDB World 2018: Solving Your Backup Needs Using MongoDB Ops Manager, Clou...
MongoDB
 
How to move a mission critical system to 4 AWS regions in one year?
How to move a mission critical system to 4 AWS regions in one year?How to move a mission critical system to 4 AWS regions in one year?
How to move a mission critical system to 4 AWS regions in one year?
Wojciech Gawroński
 
MongoDB World 2018: Using Puppet, Ansible and Ops Manager to Create Your Own ...
MongoDB World 2018: Using Puppet, Ansible and Ops Manager to Create Your Own ...MongoDB World 2018: Using Puppet, Ansible and Ops Manager to Create Your Own ...
MongoDB World 2018: Using Puppet, Ansible and Ops Manager to Create Your Own ...
MongoDB
 
What is new in pass summit 2014
What is new in pass summit 2014What is new in pass summit 2014
What is new in pass summit 2014
Harry Zheng
 
Elk meetup
Elk meetupElk meetup
Elk meetup
Asaf Yigal
 
The Problem is Data: Gwen Shapira, Confluent, Serverless NYC 2018
The Problem is Data: Gwen Shapira, Confluent, Serverless NYC 2018The Problem is Data: Gwen Shapira, Confluent, Serverless NYC 2018
The Problem is Data: Gwen Shapira, Confluent, Serverless NYC 2018
iguazio
 
Tracing Java Applications on Azure
Tracing Java Applications on AzureTracing Java Applications on Azure
Tracing Java Applications on Azure
CodeOps Technologies LLP
 
Algolia's Fury Road to a Worldwide API
Algolia's Fury Road to a Worldwide APIAlgolia's Fury Road to a Worldwide API
Algolia's Fury Road to a Worldwide API
Paul-Louis NECH
 
How we use the play framework
How we use the play frameworkHow we use the play framework
How we use the play framework
Itai Gilo
 
Choosing the right messaging service for your serverless app [with lumigo]
Choosing the right messaging service for your serverless app [with lumigo]Choosing the right messaging service for your serverless app [with lumigo]
Choosing the right messaging service for your serverless app [with lumigo]
Dhaval Nagar
 
CICD in the World of Serverless
CICD in the World of ServerlessCICD in the World of Serverless
CICD in the World of Serverless
Srushith Repakula
 
Михаил Максимов ( Software engineer, DataArt. AWS certified Solution Architect)
Михаил Максимов ( Software engineer, DataArt. AWS certified Solution Architect)Михаил Максимов ( Software engineer, DataArt. AWS certified Solution Architect)
Михаил Максимов ( Software engineer, DataArt. AWS certified Solution Architect)
DataArt
 
FME Cloud Tips for Success
FME Cloud Tips for SuccessFME Cloud Tips for Success
FME Cloud Tips for Success
Safe Software
 
Building the Serverless Container Experience: Kevin McGrath, Spotinst, Server...
Building the Serverless Container Experience: Kevin McGrath, Spotinst, Server...Building the Serverless Container Experience: Kevin McGrath, Spotinst, Server...
Building the Serverless Container Experience: Kevin McGrath, Spotinst, Server...
iguazio
 

What's hot (20)

Algolia - Hosted Search API
Algolia - Hosted Search API Algolia - Hosted Search API
Algolia - Hosted Search API
 
Infrastructure as code (iac) - Terraform for AWS
Infrastructure as code (iac) - Terraform for AWSInfrastructure as code (iac) - Terraform for AWS
Infrastructure as code (iac) - Terraform for AWS
 
Best practices for running Windows workloads on AWS - AWS Summit Stockholm (M...
Best practices for running Windows workloads on AWS - AWS Summit Stockholm (M...Best practices for running Windows workloads on AWS - AWS Summit Stockholm (M...
Best practices for running Windows workloads on AWS - AWS Summit Stockholm (M...
 
Serverless Computing @ x-celerate 2018
Serverless Computing @ x-celerate 2018Serverless Computing @ x-celerate 2018
Serverless Computing @ x-celerate 2018
 
Serverless Computing: Run code, not servers
Serverless Computing: Run code, not serversServerless Computing: Run code, not servers
Serverless Computing: Run code, not servers
 
Fall in Love with Graphs and Metrics using Grafana
Fall in Love with Graphs and Metrics using GrafanaFall in Love with Graphs and Metrics using Grafana
Fall in Love with Graphs and Metrics using Grafana
 
MongoDB World 2018: Solving Your Backup Needs Using MongoDB Ops Manager, Clou...
MongoDB World 2018: Solving Your Backup Needs Using MongoDB Ops Manager, Clou...MongoDB World 2018: Solving Your Backup Needs Using MongoDB Ops Manager, Clou...
MongoDB World 2018: Solving Your Backup Needs Using MongoDB Ops Manager, Clou...
 
How to move a mission critical system to 4 AWS regions in one year?
How to move a mission critical system to 4 AWS regions in one year?How to move a mission critical system to 4 AWS regions in one year?
How to move a mission critical system to 4 AWS regions in one year?
 
MongoDB World 2018: Using Puppet, Ansible and Ops Manager to Create Your Own ...
MongoDB World 2018: Using Puppet, Ansible and Ops Manager to Create Your Own ...MongoDB World 2018: Using Puppet, Ansible and Ops Manager to Create Your Own ...
MongoDB World 2018: Using Puppet, Ansible and Ops Manager to Create Your Own ...
 
What is new in pass summit 2014
What is new in pass summit 2014What is new in pass summit 2014
What is new in pass summit 2014
 
Elk meetup
Elk meetupElk meetup
Elk meetup
 
The Problem is Data: Gwen Shapira, Confluent, Serverless NYC 2018
The Problem is Data: Gwen Shapira, Confluent, Serverless NYC 2018The Problem is Data: Gwen Shapira, Confluent, Serverless NYC 2018
The Problem is Data: Gwen Shapira, Confluent, Serverless NYC 2018
 
Tracing Java Applications on Azure
Tracing Java Applications on AzureTracing Java Applications on Azure
Tracing Java Applications on Azure
 
Algolia's Fury Road to a Worldwide API
Algolia's Fury Road to a Worldwide APIAlgolia's Fury Road to a Worldwide API
Algolia's Fury Road to a Worldwide API
 
How we use the play framework
How we use the play frameworkHow we use the play framework
How we use the play framework
 
Choosing the right messaging service for your serverless app [with lumigo]
Choosing the right messaging service for your serverless app [with lumigo]Choosing the right messaging service for your serverless app [with lumigo]
Choosing the right messaging service for your serverless app [with lumigo]
 
CICD in the World of Serverless
CICD in the World of ServerlessCICD in the World of Serverless
CICD in the World of Serverless
 
Михаил Максимов ( Software engineer, DataArt. AWS certified Solution Architect)
Михаил Максимов ( Software engineer, DataArt. AWS certified Solution Architect)Михаил Максимов ( Software engineer, DataArt. AWS certified Solution Architect)
Михаил Максимов ( Software engineer, DataArt. AWS certified Solution Architect)
 
FME Cloud Tips for Success
FME Cloud Tips for SuccessFME Cloud Tips for Success
FME Cloud Tips for Success
 
Building the Serverless Container Experience: Kevin McGrath, Spotinst, Server...
Building the Serverless Container Experience: Kevin McGrath, Spotinst, Server...Building the Serverless Container Experience: Kevin McGrath, Spotinst, Server...
Building the Serverless Container Experience: Kevin McGrath, Spotinst, Server...
 

Similar to Spark logs made easy

Lambda Architecture with Spark
Lambda Architecture with SparkLambda Architecture with Spark
Lambda Architecture with Spark
Knoldus Inc.
 
MLOps with serverless architectures (October 2018)
MLOps with serverless architectures (October 2018)MLOps with serverless architectures (October 2018)
MLOps with serverless architectures (October 2018)
Julien SIMON
 
Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)
Sid Anand
 
(BDT316) Offloading ETL to Amazon Elastic MapReduce
(BDT316) Offloading ETL to Amazon Elastic MapReduce(BDT316) Offloading ETL to Amazon Elastic MapReduce
(BDT316) Offloading ETL to Amazon Elastic MapReduce
Amazon Web Services
 
Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...
Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...
Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...
Amazon Web Services
 
London Redshift Meetup - July 2017
London Redshift Meetup - July 2017London Redshift Meetup - July 2017
London Redshift Meetup - July 2017
Pratim Das
 
Cloud Native Data Pipelines
Cloud Native Data PipelinesCloud Native Data Pipelines
Cloud Native Data Pipelines
Bill Liu
 
Aws-What You Need to Know_Simon Elisha
Aws-What You Need to Know_Simon ElishaAws-What You Need to Know_Simon Elisha
Aws-What You Need to Know_Simon Elisha
Helen Rogers
 
AWS re:Invent 2016: Amazon Aurora Best Practices: Getting the Best Out of You...
AWS re:Invent 2016: Amazon Aurora Best Practices: Getting the Best Out of You...AWS re:Invent 2016: Amazon Aurora Best Practices: Getting the Best Out of You...
AWS re:Invent 2016: Amazon Aurora Best Practices: Getting the Best Out of You...
Amazon Web Services
 
Cloud Native Data Pipelines (GoTo Chicago 2017)
Cloud Native Data Pipelines (GoTo Chicago 2017)Cloud Native Data Pipelines (GoTo Chicago 2017)
Cloud Native Data Pipelines (GoTo Chicago 2017)
Sid Anand
 
Building a Big Data & Analytics Platform using AWS
Building a Big Data & Analytics Platform using AWS Building a Big Data & Analytics Platform using AWS
Building a Big Data & Analytics Platform using AWS
Amazon Web Services
 
What are DevOps Application Patterns on AWS…and why do I need them?
What are DevOps Application Patterns on AWS…and why do I need them?What are DevOps Application Patterns on AWS…and why do I need them?
What are DevOps Application Patterns on AWS…and why do I need them?
DevOps.com
 
Building machine learning inference pipelines at scale (March 2019)
Building machine learning inference pipelines at scale (March 2019)Building machine learning inference pipelines at scale (March 2019)
Building machine learning inference pipelines at scale (March 2019)
Julien SIMON
 
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Sam Palani
 
kreuzwerker AWS Modernizing Legacy Operations with Containerized Solutions 20...
kreuzwerker AWS Modernizing Legacy Operations with Containerized Solutions 20...kreuzwerker AWS Modernizing Legacy Operations with Containerized Solutions 20...
kreuzwerker AWS Modernizing Legacy Operations with Containerized Solutions 20...
kreuzwerker GmbH
 
Running Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using KubernetesRunning Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using Kubernetes
Databricks
 
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Amazon Web Services
 
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
AWS Summits
 
Serverless design considerations for Cloud Native workloads
Serverless design considerations for Cloud Native workloadsServerless design considerations for Cloud Native workloads
Serverless design considerations for Cloud Native workloads
Tensult
 
(CMP403) AWS Lambda: Simplifying Big Data Workloads
(CMP403) AWS Lambda: Simplifying Big Data Workloads(CMP403) AWS Lambda: Simplifying Big Data Workloads
(CMP403) AWS Lambda: Simplifying Big Data Workloads
Amazon Web Services
 

Similar to Spark logs made easy (20)

Lambda Architecture with Spark
Lambda Architecture with SparkLambda Architecture with Spark
Lambda Architecture with Spark
 
MLOps with serverless architectures (October 2018)
MLOps with serverless architectures (October 2018)MLOps with serverless architectures (October 2018)
MLOps with serverless architectures (October 2018)
 
Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)
 
(BDT316) Offloading ETL to Amazon Elastic MapReduce
(BDT316) Offloading ETL to Amazon Elastic MapReduce(BDT316) Offloading ETL to Amazon Elastic MapReduce
(BDT316) Offloading ETL to Amazon Elastic MapReduce
 
Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...
Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...
Build, Deploy, and Serve Machine-Learning Models on Streaming Data Using Amaz...
 
London Redshift Meetup - July 2017
London Redshift Meetup - July 2017London Redshift Meetup - July 2017
London Redshift Meetup - July 2017
 
Cloud Native Data Pipelines
Cloud Native Data PipelinesCloud Native Data Pipelines
Cloud Native Data Pipelines
 
Aws-What You Need to Know_Simon Elisha
Aws-What You Need to Know_Simon ElishaAws-What You Need to Know_Simon Elisha
Aws-What You Need to Know_Simon Elisha
 
AWS re:Invent 2016: Amazon Aurora Best Practices: Getting the Best Out of You...
AWS re:Invent 2016: Amazon Aurora Best Practices: Getting the Best Out of You...AWS re:Invent 2016: Amazon Aurora Best Practices: Getting the Best Out of You...
AWS re:Invent 2016: Amazon Aurora Best Practices: Getting the Best Out of You...
 
Cloud Native Data Pipelines (GoTo Chicago 2017)
Cloud Native Data Pipelines (GoTo Chicago 2017)Cloud Native Data Pipelines (GoTo Chicago 2017)
Cloud Native Data Pipelines (GoTo Chicago 2017)
 
Building a Big Data & Analytics Platform using AWS
Building a Big Data & Analytics Platform using AWS Building a Big Data & Analytics Platform using AWS
Building a Big Data & Analytics Platform using AWS
 
What are DevOps Application Patterns on AWS…and why do I need them?
What are DevOps Application Patterns on AWS…and why do I need them?What are DevOps Application Patterns on AWS…and why do I need them?
What are DevOps Application Patterns on AWS…and why do I need them?
 
Building machine learning inference pipelines at scale (March 2019)
Building machine learning inference pipelines at scale (March 2019)Building machine learning inference pipelines at scale (March 2019)
Building machine learning inference pipelines at scale (March 2019)
 
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
 
kreuzwerker AWS Modernizing Legacy Operations with Containerized Solutions 20...
kreuzwerker AWS Modernizing Legacy Operations with Containerized Solutions 20...kreuzwerker AWS Modernizing Legacy Operations with Containerized Solutions 20...
kreuzwerker AWS Modernizing Legacy Operations with Containerized Solutions 20...
 
Running Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using KubernetesRunning Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using Kubernetes
 
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
 
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
 
Serverless design considerations for Cloud Native workloads
Serverless design considerations for Cloud Native workloadsServerless design considerations for Cloud Native workloads
Serverless design considerations for Cloud Native workloads
 
(CMP403) AWS Lambda: Simplifying Big Data Workloads
(CMP403) AWS Lambda: Simplifying Big Data Workloads(CMP403) AWS Lambda: Simplifying Big Data Workloads
(CMP403) AWS Lambda: Simplifying Big Data Workloads
 

Recently uploaded

DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
wyddcwye1
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
bmucuha
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
mkkikqvo
 
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
a9qfiubqu
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
z6osjkqvd
 
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
y3i0qsdzb
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
Márton Kodok
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
yuvarajkumar334
 

Recently uploaded (20)

DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
 
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
 
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
 

Spark logs made easy

  • 1. Logging Apache Spark: How we made it easy From my past endeavours at Nielsen Simona Meriam Big Data Engineer, Aidoc
  • 2. Agenda 01 Architecture overview 02 How we used to do it 03 Solution overview 04 Main obstacles 05 Pretty Kibana charts 06 Future and Add-ons
  • 3. ● Simona Meriam ● Big Data Engineer @ Aidoc ● Data lover ● Concert goer ● Japan enthusiast whoami
  • 4. Nielsen’s Data Architecture Overview Nielsen’s Data Architecture Overview Nielsen’s Data Architecture overview
  • 5. ● Launching a new Spark app ● Production issues When would you need to access your logs? How we used to do it
  • 6. ● Access directly on Server ‒ SSH to server running the driver ‒ Decompress and/or access log ● Access through YARN ● Download log file from s3 Accessing your Apache Spark logs on Amazon EMR How we used to do it
  • 7.
  • 9. 1 2 3 4 5 Configure EMR with bootstrap actions that include team specific settings Bootstrap actions install filebeat and metricbeat Spark logs get automatically shipped to REDIS Logs from REDIS are processed by Logstash and indexed in Elasticsearch Kibana <3 Solution overview
  • 10. ● You can use a bootstrap action to install additional software ● Bootstrap actions run before Amazon EMR installs the applications that you specify when you create the cluster ● If you add nodes to a running cluster, bootstrap actions also run on those nodes in the same way What are AWS bootstrap actions even meant for? Solution overview
  • 15. “YAML template” file # Optional additional fields. These fields can be freely picked # to add additional information to the crawled log files for filtering fields: emr_cluster_id: %EMR_CLUSTER_ID% fields_under_root: true # level: debug # review: 1 ### Multiline options # Multiline can be used for log messages spanning multiple lines. This is common # for Java Stack Traces or C-Line Continuation # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [ multiline.pattern: %REGEXP_PATTERN%
  • 17. • Old is good • Change is work Data Engineers Main obstacles
  • 19. 19
  • 20. ● Replace REDIS with Kafka ● Collection from other EC2 based services – DRUID – Kafka Future and add-ons