How Yelp Leapt to Microservices with More than a Message Queue

Until you see where today’s messaging queues fall short, it can be hard to see why Apache Kafka is more than one. By adding functionality, true storage, and guarantees, Kafka opens up opportunities to take full advantage of a publish/subscribe model.

Yelp’s Justin Cunningham joins us to show how the company’s infrastructure has quickly evolved. Powered by Kafka, Yelp has made the leap to microservices and is seeing gains in efficiency and performance.

Speakers:
Justin Cunningham
Technical Lead, Software Engineer, Yelp

Gehrig Kunz
Technical Product Marketing Manager, Confluent

Published in: Technology

  1. Confidential. Messaging done right: How Yelp Leapt to Microservices with more than a Message Queue. Justin Cunningham, Technical Lead Software Engineering, Yelp; Gehrig Kunz, Technical Product Marketing, Confluent.
  2. Streaming in Action Series. August 10th: Why VR needed Stream Processing to Survive. August 16th: Pandora Plays Nicely Everywhere with Real-Time Data Pipelines. You are here!
  3. Today’s agenda. How a streaming platform is ‘messaging done right’: • Review what messaging queues are and why they are a thing • Gaps you might run into • Building our dream messaging queue. How Yelp uses Kafka to move to microservices: • Transition to microservices • Using Kafka for their data pipeline • Benefits realized
  4. What is a message queue? From Wikipedia: "They use a queue for messaging – the passing of control or of content. Group communication systems provide similar kinds of functionality."
  5. What is a message queue? From Wikipedia: "They use a queue for messaging – the passing of control or of content. Group communication systems provide similar kinds of functionality."
  6. What is a message queue?
  7. Why use a messaging queue? • Decouple producers and consumers of data • Greater/more predictable performance • More flexible architecture
  8. A message queue at scale
  9. Trying to scale a message queue: Uh oh. Single point of failure.
  10. Trying to scale a message queue: Uh oh. Single point of failure.
  11. Trying to scale a message queue: Welp, better make this HA.
  12. Trying to scale a message queue: Welp, we need to increase throughput... Let’s add more.
  13. Trying to scale a message queue: Welp, we need to increase throughput... Let’s add more.
  14. Problems in the real world: ● Unwieldy to scale ● Performance could be better ● Want to use data
  15. Let’s re-think messaging.
  16. Building our dream messaging queue: Publish/Subscribe Model. I want to ____________: • have everyone in the company use this • connect whatever I need • survive failure scenarios
  17. Our dream messaging queue: Publish/Subscribe Model + Scalability. I want to ____________: • use Oracle, MySQL, MongoDB, Cassandra • add search • recover an entire database • send some test data to an ML library • do something new. Rewind and replay.
  18. Our dream messaging queue: Publish/Subscribe Model + Scalability + True Storage. I want to ____________: • quickly build apps that use data • use real-time data • use accurate, real-time data • not manage additional things
  19. Messaging Queue to Streaming Platform: Publish/Subscribe Model + Scalability + True Storage + Stream Processing = Streaming Platform
  20. What a streaming platform enables: • Access (and process) what you need • Be flexible for the future • Simplify your infrastructure
  21. Messaging Queue to Streaming Platform. Netflix: Uses Kafka to power their data pipeline, supporting a trillion messages a day. Line: Uses Kafka’s stream processing to perform streaming ETL on millions of messages daily. The New York Times: Kafka is the ‘source of truth’ storing every article since 1851. Yelp: Let’s talk to Justin.
  22. Yelp’s Mission: Connecting people with great local businesses.
  23. Yelp stats (as of Q3 2016): 97M 3274%115M
  24. Why build a data pipeline? Start with a monolith. 2011: ~1,000,000 lines
  25. Services Solve Everything! 2014: ~150 services
  26. Almost Everything. Metcalfe's Law: 150 Services, 22,350 Omni-Directional Communication Paths, 11,175 Bi-Directional Communication Paths
  27. What about the data?
  28. Reasonable Becomes Unreasonable. 86 Million is a Magic Number. "I want to process all reviews every day." "I want to make 1,000 requests per second to your service, every second, forever."
  29. Potential Solutions? What if we implement a raw bulk-data API? We could pass it arbitrary SQL to generalize it. What if we take DB snapshots and pass them around? (Table: Flags 33939, Prefs 533248, Category 37)
  30. Failing at Failure: session.begin(); business = Business(); session.add(business); session.commit(); my_service_client.notify_business_changed(business.id)
  31. Failing at Failure: session.begin(); business = Business(); session.add(business); my_service_client.notify_business_changed(business.id); session.commit()
  32. How do we start solving these problems? Services connected through a Message Bus: n² → n
  33. Why Kafka? ● High Performance ● Persistent ● Reliable ● Replicated ● Scalable ● Log Compaction
  34. + Log (Offset, Key, Data): (0, 47, The), (1, 21, Quick), (2, 18, Brown), (3, 47, Fox), (4, 21, Jumps). Index (Key → Offset): 18 → 2, 47 → 3, 21 → 4. Compacted log (Offset, Key, Data): (2, 18, Brown), (3, 47, Fox), (4, 21, Jumps) ...
  35. What is the Data Pipeline? COMMUNICATION
  36. Guaranteed Format and Compatibility. Consumer loads schema dynamically as it receives messages. Diagram labels: PRODUCER, PIPE (topic w/ registered schema), CONSUMER, CONSUMER, SCHEMATIZER; messages tagged Schema 1, Schema 2, Schema 3; Load Schema 1, Load Schema 2, Load Schema 3
  37. Guaranteed Data Availability
  38. All About the Data. CORE SCHEMA STORE
  39. How We Use It. CORE SCHEMA STORE. Diagram labels: Application Event Logs, MySQL, Code / Stream Processor, Amazon Redshift, S3
  40. How We Use It: Processing Business Changes. Diagram labels: Amazon Redshift, S3, Elasticsearch, Code, MySQL, Stream Processor
  41. Glorious Future? Event-First Architecture. Event-Log as System of Record. CORE SCHEMA STORE. Diagram labels: Web Workers, Everything Else
  42. Overall Data Infra. Diagram labels: Datapipe Producer, Bunsen, Scribe, Replication Handler, MySQL, Other Data Stores, Yelp-main, Services, MONK, DP, DP, JSON, SCHEMATIZER, KAFKA, KAFKA • Paastorm • Python • Flatmap • Flink* • Java/Scala • Advanced Primitives & Stream SQL. Recursive, MySQL, Services, Yelp-main, Redshift, S3, Flink, Kafka Connect, Cassandra, ES
  43. How it’s helped Yelp: ● More than $10 million in direct savings ● Eliminated many duplicative systems ● Higher quality data, metrics and analytics ● Faster, Better Decision Making
  44. A streaming platform can be messaging done right: • Decouple and modernize your infrastructure • Reach company-wide scale • Build streaming applications and data pipelines (like Yelp’s) with real-time data
  45. Streaming in Action Series. Up next: August 10th, Why VR needed Stream Processing to Survive; August 16th, Pandora Plays Nicely Everywhere with Real-Time Data Pipelines
  46. @YelpEngineering, fb.com/YelpEngineers, engineeringblog.yelp.com, github.com/yelp. Download Confluent Open Source. Join the Slack community. Check out Kafka Summit! August 28th in San Francisco
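
Slide 17 lists "Rewind and replay" among the things a dream messaging queue should allow. The toy model below is one way to picture that idea: records stay in an append-only log, and each consumer only tracks its own read position. The `Log` and `Consumer` classes are hypothetical names made up for this sketch; they are not Kafka's client API.

```python
class Log:
    """Toy append-only log. Reading never removes records."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Store a record and return the offset it was written at."""
        self._records.append(value)
        return len(self._records) - 1

    def read_from(self, offset):
        """Return every record at or after the given offset."""
        return list(self._records[offset:])


class Consumer:
    """Toy consumer that remembers only its own position in the log."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        """Return unread records and advance the position past them."""
        records = self.log.read_from(self.offset)
        self.offset += len(records)
        return records

    def seek(self, offset):
        """Rewind (or skip ahead) by moving the read position."""
        self.offset = offset


log = Log()
for word in ["a", "b", "c"]:
    log.append(word)

consumer = Consumer(log)
first_pass = consumer.poll()   # reads everything once
consumer.seek(0)               # rewind to the start
second_pass = consumer.poll()  # replays the same records
```

Because reading does not delete anything, a new use case (search, a test feed for an ML library, rebuilding a database) can start from offset 0 without affecting other consumers.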
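
The path counts on slide 26 and the "n² → n" note on slide 32 follow from simple counting. The sketch below reproduces them, assuming every service may talk to every other service directly, and that with a message bus each service needs only its own connection to the bus. The function names are made up for illustration.

```python
def directed_paths(n: int) -> int:
    """Ordered (sender, receiver) pairs among n services: n * (n - 1)."""
    return n * (n - 1)


def undirected_paths(n: int) -> int:
    """Unordered pairs of services: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def bus_connections(n: int) -> int:
    """With a shared message bus, each service connects once to the bus."""
    return n


services = 150
omni = directed_paths(services)     # 22,350 on the slide
bi = undirected_paths(services)     # 11,175 on the slide
via_bus = bus_connections(services)
```

The point-to-point counts grow quadratically with the number of services, while the bus count grows linearly.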
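
Slide 28 does not say where "86 million" comes from. One plausible reading, assumed here, is that it is the number of requests per day produced by a steady 1,000 requests per second, since a day has 86,400 seconds.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def requests_per_day(requests_per_second: int) -> int:
    """Daily request volume for a constant request rate."""
    return requests_per_second * SECONDS_PER_DAY


daily = requests_per_day(1_000)  # 86,400,000
```

Under that reading, a request that sounds modest per second becomes roughly 86 million calls a day against a single service.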
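
Slides 30 and 31 show the same write with the commit and the notification in opposite orders, both under the title "Failing at Failure". The simulation below sketches why neither order is safe when one of the two steps fails. `FakeSession` and `FakeClient` are hypothetical stand-ins written for this example; they are not the ORM session or service client used in the slides.

```python
class FakeSession:
    """Stand-in for a database session (assumption, not a real ORM API)."""

    def __init__(self, fail_on_commit=False):
        self.fail_on_commit = fail_on_commit
        self.pending = []
        self.committed = []

    def add(self, row):
        self.pending.append(row)

    def commit(self):
        if self.fail_on_commit:
            self.pending.clear()  # the write is lost
            raise RuntimeError("commit failed")
        self.committed.extend(self.pending)
        self.pending.clear()


class FakeClient:
    """Stand-in for the downstream service client (assumption)."""

    def __init__(self, fail_on_notify=False):
        self.fail_on_notify = fail_on_notify
        self.notified = []

    def notify_business_changed(self, business_id):
        if self.fail_on_notify:
            raise RuntimeError("notify failed")
        self.notified.append(business_id)


def commit_then_notify(session, client, business_id):
    """Order used on slide 30."""
    session.add(business_id)
    session.commit()
    client.notify_business_changed(business_id)


def notify_then_commit(session, client, business_id):
    """Order used on slide 31."""
    session.add(business_id)
    client.notify_business_changed(business_id)
    session.commit()


def run(write, session, client, business_id):
    """Run one write, swallow the failure, and report the final state."""
    try:
        write(session, client, business_id)
    except RuntimeError:
        pass
    return session.committed, client.notified


# Commit succeeds, notification fails: the change is never announced.
silent_change = run(commit_then_notify, FakeSession(),
                    FakeClient(fail_on_notify=True), 42)

# Notification succeeds, commit fails: a change is announced that never happened.
phantom_change = run(notify_then_commit, FakeSession(fail_on_commit=True),
                     FakeClient(), 42)
```

In both cases the database and the downstream service end up disagreeing, which is the problem the message-bus approach in the following slides sets out to address.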
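
The table on slide 34 can be reproduced with a few lines of code. The sketch below is a toy model of log compaction that keeps only the most recent record for each key. It works on an in-memory list and ignores how Kafka actually performs compaction on disk; `compact` is a made-up helper name.

```python
def compact(log):
    """Keep the latest record per key, preserving offset order.

    `log` is a list of (offset, key, data) tuples in offset order.
    Returns the compacted log and the key -> latest offset index.
    """
    latest_offset = {}
    for offset, key, _data in log:
        latest_offset[key] = offset  # later records overwrite earlier ones

    keep = set(latest_offset.values())
    compacted = [record for record in log if record[0] in keep]
    return compacted, latest_offset


log = [
    (0, 47, "The"),
    (1, 21, "Quick"),
    (2, 18, "Brown"),
    (3, 47, "Fox"),
    (4, 21, "Jumps"),
]

compacted, index = compact(log)
```

With the slide's data, offsets 0 and 1 are dropped because keys 47 and 21 are written again later, leaving the records at offsets 2, 3, and 4.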
