Ayan Sen, Vinay Sen
@ All Things Open 2018, Raleigh
Observability at Expedia
Observability
Observability Events
Logs: stateless events generated by the application
Metrics: time-series events containing measurements
Traces: correlated events that track causal ordering
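The boundaries between these event types are not sharp: the speaker notes give the example of an audit log whose response times can be turned into an average response-time metric. A minimal Python sketch of that idea follows. The `AuditLogEvent` fields and the `average_response_time` helper are hypothetical names chosen for illustration, not part of any Expedia system.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class AuditLogEvent:
    """One hypothetical audit log entry for a handled request."""
    timestamp_ms: int
    endpoint: str
    response_time_ms: float


def average_response_time(events, endpoint):
    """Derive an average response-time metric from raw log events.

    Returns None when no event matches the endpoint.
    """
    samples = [e.response_time_ms for e in events if e.endpoint == endpoint]
    if not samples:
        return None
    return mean(samples)


events = [
    AuditLogEvent(1000, "/search", 120.0),
    AuditLogEvent(1500, "/search", 80.0),
    AuditLogEvent(1700, "/checkout", 300.0),
]
print(average_response_time(events, "/search"))  # 100.0
```

Here the metric is never collected as its own event; it is computed from logs that already exist.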
Logs
Metrics
Traces
A span typically represents a service call or a block of code
A trace represents a collection of spans correlated by an identifier
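As a rough illustration of these two definitions, the sketch below models a span as a small record and builds traces by grouping spans on a shared identifier. The field names (`trace_id`, `span_id`, `parent_span_id`, and so on) are assumptions made for this example and are not taken from the Haystack data model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical span record: one service call or block of code."""
    trace_id: str                  # identifier shared by all spans of a trace
    span_id: str
    parent_span_id: Optional[str]  # None for the root span
    service: str
    operation: str
    start_ms: int
    duration_ms: int


def group_into_traces(spans):
    """Collect spans into traces keyed by trace_id, ordered by start time."""
    traces = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    for trace in traces.values():
        trace.sort(key=lambda s: s.start_ms)
    return dict(traces)


spans = [
    Span("t1", "b", "a", "inventory", "lookup", 105, 20),
    Span("t1", "a", None, "frontend", "GET /search", 100, 50),
    Span("t2", "c", None, "frontend", "GET /home", 200, 10),
]
traces = group_into_traces(spans)
print([s.span_id for s in traces["t1"]])  # ['a', 'b']
```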
Distributed tracing tracks production requests as they traverse different parts of the architecture
Traces – Context Propagation
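Context propagation can be sketched as passing trace identifiers along with each outgoing call so that the callee can attach its span to the caller's trace. The Python sketch below is a simplified illustration under stated assumptions: the header names `X-Trace-Id` and `X-Span-Id` and the helpers `start_span` and `inject` are hypothetical, and real tracers define their own wire formats.

```python
import uuid

# Hypothetical header names used only for this illustration.
TRACE_HEADER = "X-Trace-Id"
SPAN_HEADER = "X-Span-Id"


def start_span(incoming_headers):
    """Join the caller's trace if headers are present, else start a new trace."""
    trace_id = incoming_headers.get(TRACE_HEADER) or uuid.uuid4().hex
    return {
        "trace_id": trace_id,
        "span_id": uuid.uuid4().hex,
        # The caller's span becomes this span's parent (None for a root span).
        "parent_span_id": incoming_headers.get(SPAN_HEADER),
    }


def inject(context):
    """Build the headers to attach to an outgoing call."""
    return {TRACE_HEADER: context["trace_id"], SPAN_HEADER: context["span_id"]}


# A frontend receives a request without trace headers, then calls a backend.
frontend = start_span({})
backend = start_span(inject(frontend))
print(backend["trace_id"] == frontend["trace_id"])  # True
```

Both spans end up with the same trace identifier, and the backend span records the frontend span as its parent, which is what lets a tracing system correlate them later.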
Traces – Why do I care?
Traces - Event
Distributed Tracing Landscape
Log
Metric
Trace
Observability
A resilient, scalable tracing and analysis system
Haystack Architecture
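The speaker notes for this deck describe a split between a store that holds stitched traces (Cassandra) and an index used to query them faster (Elasticsearch). The sketch below imitates that split with two in-memory dictionaries. It is only an illustration: the `TraceStore` class and its methods are hypothetical, and no real Cassandra or Elasticsearch client is involved.

```python
from collections import defaultdict


class TraceStore:
    """In-memory stand-in for a trace store paired with a search index."""

    def __init__(self):
        # trace_id -> list of spans (the role the notes assign to Cassandra)
        self._traces = {}
        # service name -> trace_ids (the role the notes assign to Elasticsearch)
        self._by_service = defaultdict(set)

    def write(self, trace_id, spans):
        """Store the whole trace and index it by every service it touches."""
        self._traces[trace_id] = list(spans)
        for span in spans:
            self._by_service[span["service"]].add(trace_id)

    def find_by_service(self, service):
        """Use the index to find trace IDs, then fetch the full traces."""
        trace_ids = sorted(self._by_service.get(service, ()))
        return [self._traces[t] for t in trace_ids]


store = TraceStore()
store.write("t1", [{"service": "frontend"}, {"service": "inventory"}])
store.write("t2", [{"service": "frontend"}])
print(len(store.find_by_service("frontend")))  # 2
```

Keeping the index separate from the stored traces means queries only touch small index entries, and the full trace is read once its ID is known.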
Haystack Subsystems
• Traces
• Trends
• Service Graph
• Anomaly Detection
• Pipes
Traces
Traces
Traces Subsystem Architecture
Trends
Trends Subsystem Architecture
Service Graph Subsystem
Service Graph Subsystem
Anomaly Detection Subsystem
Pipes Subsystem
Haystack @ Expedia
• Multiple brands
• More than a few hundred services
• > 400k spans/sec ingested
• 40-node Kafka cluster
• 65+ node c5.xlarge k8s cluster
• 50-node c5.xlarge Cassandra cluster
• Tens of nodes in the ES cluster
• Supports OpenTracing clients in Java, NodeJS, Go & Python (coming soon)
• Supports integration with Istio
• Zipkin-to-Haystack span converter
• Deployment done through Terraform scripts
References
• https://expediadotcom.github.io/haystack/
• https://github.com/ExpediaDotCom/haystack
• https://github.com/ExpediaDotCom/haystack-idl
• https://github.com/ExpediaDotCom/haystack-commons
• https://github.com/ExpediaDotCom/haystack-traces
• https://github.com/ExpediaDotCom/haystack-client-java
• https://github.com/ExpediaDotCom/haystack-agent
• https://github.com/ExpediaDotCom/haystack-trends
• https://github.com/ExpediaDotCom/haystack-collector
• https://github.com/ExpediaDotCom/haystack-pipes
• https://github.com/ExpediaDotCom/haystack-metrics
• https://github.com/ExpediaDotCom/haystack-service-graph
Questions?

Observability at Expedia

  • 1. Ayan Sen,Vinay Sen @ All Things Open 2018, Raleigh Observability at Expedia
  • 2. Observability @ All Things Open 2018, Raleigh
  • 3. @ All Things Open 2018, Raleigh Observability Events Logs : stateless events generated by the application Metrics : timeseries events containing measurements Traces : correlated events to track cause of ordering
  • 4. @ All Things Open 2018, Raleigh Logs
  • 5. @ All Things Open 2018, Raleigh Metrics
  • 6. @ All Things Open 2018, Raleigh Traces Span typically represents a service call or a block of code Trace represents a collection of spans correlated by an identifier
  • 7. Distribution tracing tracks production requests as they track different parts of the architecture @ All Things Open 2018, Raleigh Traces – Context Propagation
  • 8. @ All Things Open 2018, Raleigh Traces –Why do I care
  • 9. @ All Things Open 2018, Raleigh Traces - Event
  • 10. @ All Things Open 2018, Raleigh DistributedTracing Landscape
  • 11. @ All Things Open 2018, Raleigh Log Metric Trace Observability
  • 12. @ All Things Open 2018, Raleigh A resilient, scalable tracing and analysis system
  • 13. Haystack Architecture @ All Things Open 2018, Raleigh
  • 14. @ All Things Open 2018, Raleigh  Traces  Trends  Service Graph  Anomaly Detection  Pipes Haystack Subsystems
  • 15. @ All Things Open 2018, Raleigh Traces
  • 16. @ All Things Open 2018, Raleigh Traces
  • 17. @ All Things Open 2018, Raleigh Traces Subsystem Architecture
  • 18. @ All Things Open 2018, Raleigh Trends
  • 19. @ All Things Open 2018, Raleigh Trends Subsystem Architecture
  • 20. @ All Things Open 2018, Raleigh Service Graph Subsystem
  • 21. @ All Things Open 2018, Raleigh Service Graph Subsystem
  • 22. @ All Things Open 2018, Raleigh Anomaly Detection Subsystem
  • 23. @ All Things Open 2018, Raleigh Pipes Subsystem
  • 24. @ All Things Open 2018, Raleigh  Multiple brands  More than few hundred services  > 400k/sec spans ingestion  40 node Kafka cluster  65+ node c5.xlarge k8s cluster  50 node c5.xlarge Cassandra  Tens of ES node cluster  SupportOpenTracing clients in Java, NodeJS, Go & Python (coming soon).  Support integration with Istio  Zipkin to Haystack span converter  Deployment done throughTerraform scripts Haystack @ Expedia
  • 25. @ All Things Open 2018, Raleigh  https://expediadotcom.github.io/haystack/  https://github.com/ExpediaDotCom/haystack  https://github.com/ExpediaDotCom/haystack-idl  https://github.com/ExpediaDotCom/haystack-commons  https://github.com/ExpediaDotCom/haystack-traces  https://github.com/ExpediaDotCom/haystack-client-java  https://github.com/ExpediaDotCom/haystack-agent  https://github.com/ExpediaDotCom/haystack-trends  https://github.com/ExpediaDotCom/haystack-collector  https://github.com/ExpediaDotCom/haystack-pipes  https://github.com/ExpediaDotCom/haystack-metrics  https://github.com/ExpediaDotCom/haystack-service-graph References
  • 26. Questions?

Editor's Notes

  1. In today's microservice architecture, a lot goes on at the backend while serving a request: multiple service interactions, levels of resiliency, multiple layers of caching, etc. So when something goes wrong, it is not always evident why it happened. Observability is the ability to understand and troubleshoot our systems in production by collecting a series of timestamped events. These events can be either request scoped or system scoped. A garbage collection event would most likely not be associated with a request, whereas a response time event is. For the sake of this presentation we are going to talk about request-scoped events.
  2. So what kinds of events are we talking about here? I think they can be broadly classified into three types: logs, metrics, and traces. Each kind of event has its own use cases, but the boundaries between them are not very clear. For instance, an audit log that records the response time of each incoming request can be used to compute the average response time metric. In that case you don't explicitly collect the metric event.
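The audit log example above can be sketched in a few lines. This is only an illustration: the record layout and field names (`endpoint`, `response_time_ms`) are assumptions, not a format the talk describes.

```python
from statistics import mean

# Hypothetical audit-log records; the field names are illustrative assumptions.
audit_log = [
    {"timestamp": 1540000000, "endpoint": "/search", "response_time_ms": 120},
    {"timestamp": 1540000001, "endpoint": "/search", "response_time_ms": 80},
    {"timestamp": 1540000002, "endpoint": "/book", "response_time_ms": 300},
]


def average_response_time(events, endpoint):
    """Derive the average response time for one endpoint from log events.

    Returns None when no event matches, since no metric can be derived.
    """
    durations = [e["response_time_ms"] for e in events if e["endpoint"] == endpoint]
    return mean(durations) if durations else None


if __name__ == "__main__":
    print(average_response_time(audit_log, "/search"))
```

No metric event was ever emitted here; the metric falls out of the log events, which is why the boundary between the two is blurry.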
  3. 1 minute
  4. 1 minute
  5. 2 minutes. Distributed tracing tracks production requests by correlating the different service interactions in the architecture.
  6. 2 minutes. Context propagation.
  7. 2 minutes. Reduce time to triage by contextualizing errors and delays. Visualize latencies over the network.
  8. 1 minute
  9. 2 minutes
  10. 3 minutes
  11. 3 minutes
  12. 3 minutes. This is the architecture of the Haystack system. We have Kafka as the central nervous system backing Haystack. The system is:
      1. Componentized: Haystack includes all of the subsystems needed to make the system ready to use, but the overall design lets you replace any given subsystem to better meet your own needs.
      2. Resilient: there is no single point of failure.
      3. Scalable: the system is completely decentralized, which lets us scale every component individually.
      The architecture can be broken down into three parts:
      • Subsystems: Haystack includes various subsystems to perform tracing, trending, service graph, etc. We will go over these subsystems in a bit.
      • Data stores: we have three. Cassandra stores the raw stitched spans, i.e. traces. Elasticsearch is used as an indexer to query the data faster. MetricTank, backed by Cassandra, stores trends in Metrics 2.0 format.
      • Visualization: Haystack UI is a central place to visualize the processed data, such as traces, trends, and alerts, from the various Haystack subsystems.
      Let's see the subsystems one by one.
  13. I will do a deep dive into the use cases and architecture of each of Haystack's current subsystems. The Traces subsystem is mandatory; the others are optional. When you deploy, you can configure Haystack to run only a subset of them, as long as Traces is included. Some subsystems depend on others. Specifically, Anomaly Detection requires Trends, because you need trends to detect anomalies: the output of Trends goes into Kafka and Anomaly Detection picks it up from there. Feel free to add any new subsystem on top of the Kafka backbone. It doesn't need to be part of Haystack's repositories; if you need something specific to your company's needs, you can build it and run it on top of Haystack's Kafka. You don't need to come and talk to us before adding anything new.
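The rules in this note (Traces is mandatory, Anomaly Detection requires Trends) can be written down as a small check. This is a sketch, not Haystack's deployment configuration: the subsystem identifiers and the `validate_subsystems` helper are hypothetical, and only the two rules stated above are encoded.

```python
# Hypothetical identifiers for the five subsystems named in the slides.
KNOWN_SUBSYSTEMS = {"traces", "trends", "service-graph", "anomaly-detection", "pipes"}

# The only rules stated in the notes: Traces is mandatory,
# and Anomaly Detection consumes the output of Trends.
REQUIRED = {"traces"}
DEPENDS_ON = {"anomaly-detection": {"trends"}}


def validate_subsystems(enabled):
    """Return a sorted list of human-readable problems with a subsystem selection."""
    enabled = set(enabled)
    problems = []
    for name in sorted(enabled - KNOWN_SUBSYSTEMS):
        problems.append(f"unknown subsystem: {name}")
    for name in sorted(REQUIRED - enabled):
        problems.append(f"missing mandatory subsystem: {name}")
    for name, needs in sorted(DEPENDS_ON.items()):
        if name in enabled:
            for dep in sorted(needs - enabled):
                problems.append(f"{name} requires {dep}")
    return problems


if __name__ == "__main__":
    print(validate_subsystems({"traces", "anomaly-detection"}))
```

An empty list means the selection satisfies both rules.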
  14. Demo. If you know the traceId, you can jump straight to the timeline/waterfall view showing how a single end-user request was served inside your system. In this example, the user request was to the stark service at /stark/endpoint. You might have used Zipkin or Jaeger before. Use cases: identifying the root cause of errors, finding performance bottlenecks, and understanding the flow of requests. Haystack is OpenTracing compliant and uses three IDs: traceId, spanId, and parentSpanId. The spanId needs to be passed on from one service to the next; how you pass it (in an HTTP header or in the payload) is up to your own logic. When the next service logs its span, it uses the caller's spanId as its parentSpanId. We are also looking into supporting Zipkin-style IDs, which have a slight but crucial difference.
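A minimal sketch of the ID handling described in this note, assuming the IDs travel in HTTP headers. The header names, the span dictionary layout, and the downstream service name are assumptions made for illustration; this is not the Haystack client API.

```python
import uuid

# Hypothetical header names; how the IDs travel is left to your own logic.
TRACE_HEADER = "X-Trace-Id"
SPAN_HEADER = "X-Span-Id"


def new_id():
    """Generate a random identifier for a trace or span."""
    return uuid.uuid4().hex


def start_span(service, operation, incoming_headers=None):
    """Create a span, continuing the caller's trace when headers are present."""
    incoming_headers = incoming_headers or {}
    return {
        # Reuse the caller's traceId so all spans of one request correlate.
        "traceId": incoming_headers.get(TRACE_HEADER) or new_id(),
        "spanId": new_id(),
        # The caller's spanId becomes this span's parentSpanId (None at the root).
        "parentSpanId": incoming_headers.get(SPAN_HEADER),
        "serviceName": service,
        "operationName": operation,
    }


def outgoing_headers(span):
    """Headers to attach to a downstream call made while this span is active."""
    return {TRACE_HEADER: span["traceId"], SPAN_HEADER: span["spanId"]}


if __name__ == "__main__":
    root = start_span("stark-service", "/stark/endpoint")
    child = start_span("downstream-service", "lookup", outgoing_headers(root))
    print(child["traceId"] == root["traceId"], child["parentSpanId"] == root["spanId"])
```

The root span has no parentSpanId; every span created from `outgoing_headers` shares the root's traceId and points back at its caller, which is what lets the UI draw the waterfall.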
  15. Use case: you might not have traceIds handy. For example, say your site has started showing intermittent errors for the US siteId. You might want to search for traces where error = true and siteid = us and inspect the traces for that scenario. You can set up a number of whitelisted fields, and they become searchable in haystack-ui. Click on any of the resulting traces to get the timeline/waterfall view.
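The search in this note can be pictured as a filter over span tags restricted to whitelisted fields. The sketch below runs in memory and is not how Haystack queries its index; the data layout, the `search_traces` helper, and the choice to match a trace when any single span carries all requested tags are assumptions.

```python
# Hypothetical whitelist; in Haystack this would be whatever fields you configure.
WHITELISTED_FIELDS = {"error", "siteid"}


def search_traces(traces, **criteria):
    """Return the IDs of traces with at least one span matching all criteria.

    `traces` maps a traceId to its list of spans; each span has a `tags` dict.
    """
    unknown = set(criteria) - WHITELISTED_FIELDS
    if unknown:
        raise ValueError(f"fields are not whitelisted: {sorted(unknown)}")
    return [
        trace_id
        for trace_id, spans in traces.items()
        if any(
            all(span.get("tags", {}).get(key) == value for key, value in criteria.items())
            for span in spans
        )
    ]


if __name__ == "__main__":
    sample = {
        "t1": [{"tags": {"error": True, "siteid": "us"}}],
        "t2": [{"tags": {"error": False, "siteid": "us"}}],
        "t3": [{"tags": {"error": True, "siteid": "uk"}}],
    }
    print(search_traces(sample, error=True, siteid="us"))
```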
  16. About the architecture: the Traces subsystem has two apps, Indexer and Reader.
  17. The Trends subsystem is responsible for reading spans and generating vital service health trends. Introduce a new term: operation. What is a service operation? [user service -> loyalty service example]
  18. The Trends subsystem is responsible for reading spans and generating vital service health trends. This system is loosely coupled and can be run on demand. It has two components:
      • haystack-span-timeseries-transformer: reads spans and converts them to Metrics 2.0 compatible MetricPoints. These metric points are then pushed back to Kafka.
      • haystack-timeseries-aggregator: reads metric points, aggregates them based on rules, and pushes the aggregated metric points to Kafka. The metric points are MetricTank compliant and can be consumed directly by MetricTank, which is a time series database.
      Currently we compute four trends for each combination of service and operation:
      • total count
      • success_count [count]
      • failure_count [count]
      • duration [mean, median, std-dev, 99th percentile, 95th percentile]
      Each trend is computed for four intervals: 1 min, 5 min, 15 min, and 1 hour.
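To make the four trends concrete, here is a rough in-memory sketch that groups spans by service, operation, and time window and computes the statistics listed above. It collapses the transformer and the aggregator into one function and leaves Kafka out entirely. The span field names, the use of population standard deviation, and the nearest-rank percentile are all assumptions, not Haystack's actual rules.

```python
import math
from collections import defaultdict
from statistics import mean, median, pstdev

# The four intervals named in the notes, in seconds.
INTERVALS_SEC = {"1min": 60, "5min": 300, "15min": 900, "1hour": 3600}


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def aggregate(spans, interval="1min"):
    """Compute trends keyed by (service, operation, window start).

    Each span is a dict with hypothetical fields: service, operation,
    startTime (epoch seconds), duration, and error (bool).
    """
    width = INTERVALS_SEC[interval]
    buckets = defaultdict(list)
    for span in spans:
        window_start = span["startTime"] - span["startTime"] % width
        buckets[(span["service"], span["operation"], window_start)].append(span)

    trends = {}
    for key, group in buckets.items():
        durations = sorted(s["duration"] for s in group)
        failures = sum(1 for s in group if s["error"])
        trends[key] = {
            "count": len(group),
            "success_count": len(group) - failures,
            "failure_count": failures,
            "duration": {
                "mean": mean(durations),
                "median": median(durations),
                "std-dev": pstdev(durations),
                "p95": percentile(durations, 95),
                "p99": percentile(durations, 99),
            },
        }
    return trends
```

Calling `aggregate(spans, "5min")` on the same spans produces the coarser windows; in this sketch each interval is computed independently from the raw spans.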
  20. The alerts view shows alerts for any anomalous behavior in service health trends. Currently Haystack alerts on total count, failure count, and duration (TP99). These alerts will be powered by the Adaptive Alerting system, another open source project from Expedia.