Openzipkin conf: Zipkin at Yelp

Zipkin @ Yelp
Prateek Agarwal, Software Engineer at Yelp
@prat0318
About me
- Prateek Agarwal
- Software Engineer
- Infrastructure team @ Yelp
- Have worked on
  - Python Swagger clients
  - Zipkin infrastructure
  - Maintaining Cassandra, ES clusters
Yelp’s Mission
Connecting people with great local businesses.
Yelp Stats
As of Q1 2016
90M | 32 | 70% | 102M (the labels for these figures were part of the slide graphic)
Agenda
- Zipkin infrastructure
- pyramid_zipkin / swagger_zipkin
- Lessons learned
- Future plans
Infrastructure overview
- 250+ services
- We <3 Python
- Pyramid/uWSGI framework
- SmartStack for service discovery
- Swagger for API schema declaration
- Zipkin transport: Kafka | Zipkin datastore: Cassandra
- Traces are generated on live traffic at a very low sampling rate (0.005%)
- Traces can also be generated on demand by passing a particular query param
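The two ways of starting a trace described above (low-rate sampling and an on-demand query param) can be sketched as a single decision function. This is only an illustration: the slides do not name the query param, so `force_trace` is a hypothetical name, and `should_trace` is not part of any Yelp library.

```python
import random

TRACING_PERCENT = 0.005            # percent of live requests to trace, from the slide
FORCE_TRACE_PARAM = "force_trace"  # hypothetical query-param name (assumption)


def should_trace(query_params, rng=random.random):
    """Return True if this request should produce a Zipkin trace."""
    # On demand: the caller explicitly asked for a trace.
    if query_params.get(FORCE_TRACE_PARAM) == "1":
        return True
    # Otherwise sample: rng() is uniform in [0, 1), so convert percent to a fraction.
    return rng() < TRACING_PERCENT / 100.0
```

At 0.005%, roughly one request in 20,000 is traced, which is why the on-demand switch is useful when debugging a specific request.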
Infrastructure overview
Let’s talk about a scenario where service A calls B.
pyramid_zipkin
- A simple decorator around every request
- Able to handle scribe | kafka transport
- Attaches a `unique_request_id` to every request
- No changes needed in the service logic
- Ability to add annotations using python’s `logging` module
- Ability to add custom spans
Diagram: Service B (pyramid_zipkin, pyramid, uwsgi)
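The idea of "a simple decorator around every request" with annotations coming from the `logging` module can be sketched conceptually as follows. This is not pyramid_zipkin's actual code or API: `traced`, `SpanAnnotationHandler`, the logger name, and the dict-based request and span shapes are all assumptions made for illustration, and the sketch ignores thread safety.

```python
import logging
import time
import uuid

# Hypothetical logger name; service code logs here to annotate the current span.
annotation_log = logging.getLogger("zipkin_sketch.annotations")
annotation_log.setLevel(logging.INFO)
annotation_log.propagate = False  # keep annotations out of the normal log output


class SpanAnnotationHandler(logging.Handler):
    """Collects log records as (timestamp, message) annotations on one span."""

    def __init__(self, span):
        super().__init__()
        self.span = span

    def emit(self, record):
        self.span["annotations"].append((record.created, record.getMessage()))


def traced(handler, transport, service_name):
    """Wrap a request handler so every call emits one span to `transport`."""
    def wrapped(request):
        headers = request.get("headers", {})
        span = {
            # Continue the caller's trace if it sent an id, else start a new one.
            "trace_id": headers.get("X-B3-TraceId") or uuid.uuid4().hex[:16],
            "span_id": uuid.uuid4().hex[:16],
            "service": service_name,
            "annotations": [],
        }
        log_handler = SpanAnnotationHandler(span)
        annotation_log.addHandler(log_handler)
        start = time.time()
        try:
            return handler(request)  # the service logic runs unchanged
        finally:
            annotation_log.removeHandler(log_handler)
            span["duration_s"] = time.time() - start
            transport(span)  # e.g. a callable that publishes to Kafka or scribe
    return wrapped
```

A view wrapped this way needs no tracing code of its own; calling `annotation_log.info("cache miss")` inside it is enough to attach an annotation to the span.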
swagger_zipkin
- Eliminates the manual work of attaching zipkin headers
- Decorates Swagger clients
  - swaggerpy (Swagger v1.2)
  - bravado (Swagger v2.0)
Diagram: Service A (swagger_client, swagger_zipkin)
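The "manual work" that gets eliminated is adding Zipkin's B3 propagation headers to each outgoing call from service A to service B. A rough sketch of that step is below. The header names are the standard B3 ones; the function names, the dict-based span, and the `send(url, headers)` client shape are assumptions for illustration and not swagger_zipkin's real interface.

```python
import uuid


def new_id():
    """Return a random 64-bit id as 16 lowercase hex characters."""
    return uuid.uuid4().hex[:16]


def child_span_headers(current_span):
    """Build B3 headers for an outgoing call made while handling `current_span`."""
    return {
        "X-B3-TraceId": current_span["trace_id"],      # same trace end to end
        "X-B3-SpanId": new_id(),                       # new span for the A -> B call
        "X-B3-ParentSpanId": current_span["span_id"],  # links the call to its caller
        "X-B3-Sampled": "1" if current_span["sampled"] else "0",
    }


def with_zipkin_headers(send, current_span):
    """Wrap a `send(url, headers)` callable so every call carries B3 headers."""
    def wrapped(url, headers=None):
        merged = dict(headers or {})
        merged.update(child_span_headers(current_span))
        return send(url, merged)
    return wrapped
```

Wrapping the client once means individual call sites never touch tracing headers, which is the point of decorating the Swagger client rather than each request.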
Lessons Learned
- Cassandra is an excellent datastore for heavy writes
  - Typical prod writes/sec: 15k
  - It was even able to handle 100k writes/sec
Lessons Learned
- Allocating off-heap memory for Cassandra reduced write latency by 2x
  - Pending compactions also went down
Lessons Learned
- With more services added, fetching from Kafka became a bottleneck
- Solutions tried:
  - Adding more Kafka partitions
  - Running more instances of the collector
  - Adding multiple Kafka consumer threads
    - with appropriate changes in openzipkin/zipkin
    - WIN
  - Batching up messages before sending to Kafka
    - with appropriate changes in openzipkin/zipkin
    - BIG WIN
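The batching idea can be sketched as a small buffer placed in front of the Kafka producer. The slides do not say how batches were sized or encoded, so the class name, the `max_batch_size` default, and the `send_batch` callable are all assumptions; a real setup would also flush on a timer so spans do not sit in the buffer during quiet periods.

```python
class BatchingTransport:
    """Buffers spans and hands them to `send_batch` one list at a time."""

    def __init__(self, send_batch, max_batch_size=100):
        self.send_batch = send_batch          # callable taking a list of spans
        self.max_batch_size = max_batch_size  # hypothetical default
        self.buffer = []

    def __call__(self, span):
        """Queue one span; send the batch once it is full."""
        self.buffer.append(span)
        if len(self.buffer) >= self.max_batch_size:
            self.flush()

    def flush(self):
        """Send whatever is buffered; call this on shutdown as well."""
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.send_batch(batch)
```

With a batch size of N, the collector fetches one Kafka message for every N spans, which reduces the per-message overhead on the consumer side.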
Future Plans
- Use Zipkin during deployments to check for degradations
  - Validate the differences in number of downstream calls
  - Check against any new dependency sneaking in
  - Time differences in the spans
- Create trace aggregation infrastructure using Splunk (WIP)
  - A missing part of Zipkin
- Redeploy the Zipkin dependency graph service after improvements
  - The service was unprovisioned because it created 100s of GB of /tmp files
  - These files got purged after the run (in ~1-2 hours)
  - Meanwhile, ops got alerted due to low disk space remaining
  - Didn’t add much value
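The deployment checks in the first item above (changes in downstream call counts and newly introduced dependencies) could look roughly like the comparison below. It assumes each span has already been reduced to a dict with hypothetical `caller` and `callee` keys; how traces would actually be collected and compared is not described in the slides, and span timing differences are left out of this sketch.

```python
from collections import Counter


def downstream_calls(spans):
    """Count calls per (caller, callee) pair in one trace."""
    return Counter((s["caller"], s["callee"]) for s in spans)


def compare_traces(before, after):
    """Report new dependencies and changed downstream call counts."""
    old, new = downstream_calls(before), downstream_calls(after)
    return {
        # Edges present after the deploy that did not exist before.
        "new_dependencies": sorted(set(new) - set(old)),
        # Edges present in both traces whose call count changed: (old, new).
        "call_count_changes": {
            edge: (old[edge], new[edge])
            for edge in sorted(set(old) & set(new))
            if old[edge] != new[edge]
        },
    }
```

Running this on a trace captured before a deploy and one captured after would flag, for example, a service that now calls a backend three times instead of twice.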
@YelpEngineering
fb.com/YelpEngineers
engineeringblog.yelp.com
github.com/yelp