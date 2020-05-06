
  1. Flink acceptance testing and state compatibility checking. Catlyn Kong, catlynk@yelp.com
  2. Yelp’s Mission: Connecting people with great local businesses
  3. FLINK AT YELP: Powering Data Enrichment and Transformation as a Service (StreamSQL manipulations and multi-stream unwindowed joins as a service); Bot Detection (customized filters and ML features to provide trustworthy data); User Activity Sessions (multi-platform user activity sessions from event logs).
  4. FLINK AT YELP: Powering Connector Ecosystem (Cassandra, Elasticsearch, Redshift, S3, MySQL, etc.); Apache Beam (all Python stream processing).
  5. OUTLINE: What you’ll see. Flink acceptance testing framework (a.k.a. Flink Compose) at Yelp; lessons learned; state compatibility checking using Flink Compose.
  6. Acceptance Testing
  7. ACCEPTANCE TESTING: What is acceptance testing? A test conducted to determine if the requirements of a specification or contract are met. Often involves orchestrating several services, creating fixture data, and running some type of test driver.
  8. ACCEPTANCE TESTING: Why is it hard? Too many moving blocks! Flink Service, Kafka, Schema Registry, Database Dependencies.
  9. ACCEPTANCE TESTING: Our solution? Built on top of yelp-compose, which provides better integration with Yelp infrastructure. Provides a set of libraries that take care of common tasks. A great way for developers to verify correctness. Lowers the overhead across applications.
  10. ACCEPTANCE TESTING: What does it look like? Diagram labels: Flink Compose Sandbox; test_script; Flink Standalone Cluster; Dependencies (Kafka, Schema Registration); Input Test Stream; Output Test Stream. Arrows: Read, Write, Submit *.jar job to the cluster.
  11. Lessons Learned
  12. LESSONS LEARNED: Ordering of Operations. Make sure the assumption of ordering is met: 1) the test writes to Kafka, 2) the Flink app processes, 3) the test reads. Ordering of operations is important.
  13. LESSONS LEARNED: Deterministic results using Event Time. SELECT business_id, COUNT(*) AS review_count FROM biz_reviews GROUP BY business_id, TUMBLE(rowtime, INTERVAL '2' MINUTE). Event time is the timestamp associated with the message; maxOutOfOrderness is 30 sec. Diagram: Kafka messages msg1, msg2, msg3, msg4, msg5 on an Event Time axis.
  14. LESSONS LEARNED: Deterministic results using Event Time. Count the number of reviews per business in 2-minute non-overlapping windows. Event time is the timestamp associated with the message; maxOutOfOrderness is 30 sec. Kafka messages: (biz_id: 1, review_id: 1, time: 35); (biz_id: 2, review_id: 2, time: 65); (biz_id: 1, review_id: 3, time: 95); (biz_id: 1, review_id: 4, time: 125).
  15. LESSONS LEARNED: Event Time. currentWatermark = highestTimestamp - maxOutOfOrderness. Timeline, with cW = currentWatermark: message (biz_id: 1, review_id: 1, time: 35), cW=5, count (biz_id: 1, # review: 1, window: [0, 120)); message (biz_id: 2, review_id: 2, time: 65), cW=35, count (biz_id: 2, # review: 1, window: [0, 120)); message (biz_id: 1, review_id: 3, time: 95), cW=65, count (biz_id: 1, # review: 2, window: [0, 120)); message (biz_id: 1, review_id: 4, time: 125), cW=95, count (biz_id: 1, # review: 1, window: [120, 240)); message (biz_id: 1, review_id: 4, time: 155), cW=125, count (biz_id: 1, # review: 1, window: [120, 240)). Event time? Careful with watermark manipulation!
  16. LESSONS LEARNED: Deterministic results using Processing Time. SELECT business_id, COUNT(*) AS review_count FROM biz_reviews GROUP BY business_id, TUMBLE(proctime, INTERVAL '2' MINUTE). Diagram: Kafka messages msg1, msg2, msg3, msg4, msg5 on a Processing Time axis.
  17. LESSONS LEARNED: Processing Time. Deterministic results using Processing Time. Count the number of reviews per business in 2-minute non-overlapping windows. Kafka messages: (biz_id: 1, review_id: 1, time: 35); (biz_id: 2, review_id: 2, time: 65); (biz_id: 1, review_id: 3, time: 95); (biz_id: 1, review_id: 4, time: 125).
  18. LESSONS LEARNED: Processing Time. Messages: (biz_id: 1, review_id: 1, time: 35); (biz_id: 2, review_id: 2, time: 65); (biz_id: 1, review_id: 3, time: 95); (biz_id: 1, review_id: 4, time: 125), shown against two Window Start markers. We have no control over when Flink sees the messages. Proctime? Wait till start of window to produce!
  19. LESSONS LEARNED: Best Practices. Publish common testing images. Generalize common functionalities: ● Setting up consumer & producer ● Schema registration ● flink-clientlib to accommodate upgrades. Run your tests in parallel.
  20. Another Dimension
  21. STATE COMPATIBILITY: Another dimension
  22. STATE COMPATIBILITY: State compatibility checking. The test_script: submit job_1 with v1 of the service; cancel job_1 with a savepoint; submit job_2 with v2 of the service; check for potential issues of state restoration. Just another test.
  23. STATE COMPATIBILITY: CI/CD integration
  24. WHAT’S NEXT: Looking Forward. Ensure every stateful service is guarded by a compatibility check. Leverage the state API starting from Flink 1.9 for smoother state migration. Automate test message generation. Provide test template generation.
  25. @YelpEngineering, fb.com/YelpEngineers, engineeringblog.yelp.com, github.com/yelp
  26. Questions/Suggestions? catlynk@yelp.com
  27. Thank you.
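
The ordering slide (write, then process, then read) suggests that a test script should not read the output stream until the job has had a chance to process the input. One common way to enforce that is to poll a readiness condition with a timeout. The helper below is only a sketch of that idea: `wait_until` and its parameters are hypothetical names, not part of Flink Compose or yelp-compose.

```python
import time


def wait_until(predicate, timeout=30.0, interval=0.5,
               clock=time.monotonic, sleep=time.sleep):
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Hypothetical helper: `clock` and `sleep` are injectable so the function
    can be exercised without real waiting.
    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A test might call this with a predicate such as "the job is running" before writing input, and again with "the output stream has the expected number of messages" before asserting on results.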
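
The event-time slides give the rule currentWatermark = highestTimestamp - maxOutOfOrderness with 2-minute tumbling windows and a 30-second maxOutOfOrderness. The following is a simplified pure-Python model of that rule, not Flink code. It assumes timestamps in seconds, a window that fires once the watermark reaches its end, and no late-data handling; `run` and `window_start` are names made up for this sketch.

```python
from collections import defaultdict

WINDOW_SIZE = 120          # 2-minute tumbling windows, in seconds
MAX_OUT_OF_ORDERNESS = 30  # seconds, as stated on the slides


def window_start(event_time, size=WINDOW_SIZE):
    """Start of the tumbling window that contains event_time."""
    return event_time - (event_time % size)


def run(events):
    """Simulate per-business counts over event-time tumbling windows.

    events: iterable of (biz_id, event_time) in arrival order.
    Returns a list of (window_start, window_end, biz_id, count) in the
    order the simplified model would emit them.
    """
    counts = defaultdict(int)  # (window_start, biz_id) -> running count
    highest = None
    emitted = []
    for biz_id, ts in events:
        counts[(window_start(ts), biz_id)] += 1
        highest = ts if highest is None else max(highest, ts)
        watermark = highest - MAX_OUT_OF_ORDERNESS
        # Simplification: a window fires when the watermark reaches its end.
        ready = sorted(k for k in counts if k[0] + WINDOW_SIZE <= watermark)
        for start, biz in ready:
            emitted.append((start, start + WINDOW_SIZE, biz,
                            counts.pop((start, biz))))
    return emitted
```

With the four messages from the slides (times 35, 65, 95, 125) the watermark only reaches 95, so this model emits nothing. Appending one more message at time 155 moves the watermark to 125 and closes the window [0, 120). A test that wants deterministic event-time output therefore needs an extra message to advance the watermark.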
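
For processing time, the slides advise waiting until the start of a window before producing, since the test cannot control when Flink sees the messages. Below is a minimal sketch of that waiting step. It assumes windows are aligned to multiples of the window size on the epoch clock and that the test host's clock is close to the cluster's; the function names are hypothetical.

```python
import time

WINDOW_SIZE = 120  # 2-minute tumbling windows, in seconds


def seconds_until_next_window(now, size=WINDOW_SIZE):
    """Seconds from `now` (epoch seconds) to the next window boundary.

    Returns 0 when `now` is exactly on a boundary.
    """
    return (-now) % size


def wait_for_window_start(size=WINDOW_SIZE, clock=time.time, sleep=time.sleep):
    """Sleep until the next window boundary and return the time slept.

    `clock` and `sleep` are injectable so the logic can be tested
    without actually waiting.
    """
    delay = seconds_until_next_window(clock(), size)
    sleep(delay)
    return delay
```

Producing all test messages right after `wait_for_window_start()` returns leaves almost a full window for them to arrive, which makes it far more likely that they are all counted in the same processing-time window.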
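
The state compatibility slide lists three steps: submit v1, cancel with a savepoint, submit v2 from that savepoint. As an illustration only, these steps map onto the standard Flink command-line client as shown below. How Flink Compose actually drives the cluster is not described in the slides. All arguments are placeholders: in a real run the job ID would come from the output of the first command and the savepoint path from the output of the second.

```python
def savepoint_roundtrip_commands(jar_v1, jar_v2, job_id,
                                 savepoint_dir, savepoint_path):
    """Build Flink CLI invocations for a v1 -> savepoint -> v2 check.

    Hypothetical helper: it only builds argument lists and runs nothing.
    """
    return [
        # 1) Submit job_1 with v1 of the service (detached).
        ["flink", "run", "-d", jar_v1],
        # 2) Cancel job_1 and trigger a savepoint into savepoint_dir.
        ["flink", "cancel", "-s", savepoint_dir, job_id],
        # 3) Submit job_2 with v2, restoring from the savepoint.
        ["flink", "run", "-d", "-s", savepoint_path, jar_v2],
    ]
```

If the third command fails, or the restored job fails while restoring state, the two versions are not state compatible. Each list can be passed to `subprocess.run` by a test driver.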

