Testing production streaming applications
Gyula Fora & Matyas Orhidi
© 2019 Cloudera, Inc. All rights reserved. 2
Decomposing production applications
[Diagram: Pipeline with Source Connectors and Sink Connectors]
Operator unit testing
Testing Stateless Operators
Stateless testing is simple: no magic involved.
Steps
1. Instantiate operator
2. (Create Mock/ListCollector)
3. Send input
4. Validate output
[Diagram: Input 1, Input 2, Input 3 → Map → Output 1, Output 2, Output 3]
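The four steps above can be sketched in plain Java. This is a minimal illustration, not Flink code: `Out`, `UpperCaseSplitter` and `StatelessOperatorTest` are hypothetical stand-ins that only mirror the shape of a flat-map style function writing to a collector, so the example runs without any dependencies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an output collector; not the Flink interface.
interface Out<T> {
    void collect(T record);
}

// Hypothetical stateless operator: splits a line into upper-cased words.
class UpperCaseSplitter {
    void flatMap(String line, Out<String> out) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.collect(word.toUpperCase());
            }
        }
    }
}

class StatelessOperatorTest {
    // Runs steps 1-3 and returns the collected output so the caller can validate it.
    static List<String> run(List<String> inputs) {
        UpperCaseSplitter operator = new UpperCaseSplitter(); // 1. instantiate operator
        List<String> collected = new ArrayList<>();           // 2. list-backed collector
        for (String input : inputs) {
            operator.flatMap(input, collected::add);          // 3. send input
        }
        return collected;                                     // 4. validate in the caller
    }
}
```

Because the operator holds no state, each input can be checked in isolation and the test needs no runtime at all.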
Testing Stateful Operators
[Diagram: Harness wrapping Map; in: Input 1, Watermark, Input 2; out: Output 1, Output 2, Output 3]
We need to use the test harness to access state and time-related functionality (timers and watermarks).
Steps
1. Create operator test harness
2. Send inputs
3. Send watermarks
4. Validate output
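To make the role of the harness concrete, here is a toy version in plain Java. It is only a sketch under stated assumptions: `MiniHarness` and `CountOperator` are hypothetical names, and the harness is a simplified stand-in for what Flink's operator test harnesses (for example `KeyedOneInputStreamOperatorTestHarness`) provide. The point is that the harness owns state and timers, and timers fire only when the test sends a watermark.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical operator under test: counts elements per key and emits the
// count 10 time units after the first element seen for that key.
class CountOperator {
    void onElement(String key, long timestamp, MiniHarness ctx) {
        if (ctx.state.merge(key, 1, Integer::sum) == 1) {
            ctx.registerTimer(timestamp + 10, key);
        }
    }

    void onTimer(String key, MiniHarness ctx) {
        ctx.output.add(key + "=" + ctx.state.remove(key));
    }
}

// Hypothetical miniature harness: owns keyed state, event-time timers and output.
class MiniHarness {
    final Map<String, Integer> state = new HashMap<>();
    final List<String> output = new ArrayList<>();
    private final TreeMap<Long, List<String>> timers = new TreeMap<>();
    private final CountOperator operator;

    MiniHarness(CountOperator operator) {
        this.operator = operator;              // 1. create operator test harness
    }

    void registerTimer(long time, String key) {
        timers.computeIfAbsent(time, t -> new ArrayList<>()).add(key);
    }

    void processElement(String key, long timestamp) {
        operator.onElement(key, timestamp, this);   // 2. send inputs
    }

    // 3. send watermarks: fires every timer at or before the watermark.
    void processWatermark(long watermark) {
        while (!timers.isEmpty() && timers.firstKey() <= watermark) {
            for (String key : timers.pollFirstEntry().getValue()) {
                operator.onTimer(key, this);
            }
        }
    }
}
```

A test drives `processElement` and `processWatermark` in a chosen order and then inspects `output` (step 4), which makes timer behavior fully deterministic.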
Pipeline / flow testing
Simple end-to-end tests
Steps
1. Prepare test input
2. Run application
3. Collect test output
4. Validate
[Diagram: Test Input → Pipeline → Test Output → Validate]
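The same four steps, sketched in plain Java. `WordCountPipeline` and `SimpleEndToEndTest` are hypothetical stand-ins, not Flink classes: the pipeline is reduced to a method that reads from a bounded source and writes to a collecting sink. The sink is a map, so validation does not depend on the order in which results arrive.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical pipeline: source -> tokenize -> count -> sink.
class WordCountPipeline {
    static void run(Iterable<String> source, Map<String, Integer> sink) {
        for (String line : source) {
            for (String word : line.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    sink.merge(word, 1, Integer::sum);
                }
            }
        }
    }
}

class SimpleEndToEndTest {
    static Map<String, Integer> runWithTestInput() {
        List<String> testInput = Arrays.asList("to be", "or not to be"); // 1. prepare test input
        Map<String, Integer> testOutput = new TreeMap<>();               // 3. collecting sink
        WordCountPipeline.run(testInput, testOutput);                     // 2. run application
        return testOutput;                                                // 4. validate in the caller
    }
}
```

The test only sees the final result, which is why this style gives no control over when individual records or watermarks reach the operators.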
Simple end-to-end tests
Pros
• Mimics proper behavior
• Perfect for simple apps
Cons
• Hard to cover all cases
• Useless for complex applications
• Nearly impossible to control ordering
• Hard to test windowing
Use it to validate simple pipeline logic.
Can be replaced by proper integration tests.
“Manual” pipeline tests
Steps
1. Create manual Sources
2. Start application
3. Send input
4. Wait for output
5. Validate output
6. Repeat 3-5
7. Stop application
[Diagram: Input 1, Input 2, Watermark, Input 3 → Pipeline → Output 1, Output 2, Output 3, Output 4]
Check it out in the flink-tutorials repo!
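The send, wait, validate loop can be sketched with two queues in plain Java. This is an illustration only and does not reproduce the utilities in the flink-tutorials repo: `ManualJob`, its methods and the `STOP` marker are hypothetical. The test owns the source queue, so it decides exactly when each input reaches the running pipeline.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical manually driven job; the test controls both ends.
class ManualJob {
    private static final String STOP = "__stop__";            // hypothetical end marker
    private final BlockingQueue<String> source = new LinkedBlockingQueue<>(); // 1. manual source
    private final BlockingQueue<Integer> sink = new LinkedBlockingQueue<>();
    private final Thread worker = new Thread(this::runPipeline);

    // Pipeline logic: emit the running total of all input lengths.
    private void runPipeline() {
        int total = 0;
        try {
            for (String in = source.take(); !in.equals(STOP); in = source.take()) {
                total += in.length();
                sink.put(total);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    void start() {                                  // 2. start application
        worker.setDaemon(true);
        worker.start();
    }

    void send(String input) {                       // 3. send input
        source.add(input);
    }

    Integer awaitOutput() {                         // 4. wait for output (null on timeout)
        try {
            return sink.poll(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    void stop() {                                   // 7. stop application
        source.add(STOP);
        try {
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A test calls `start()`, then alternates `send(...)` and `awaitOutput()` with an assertion after each output (steps 3 to 6), and finally calls `stop()`.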
“Manual” pipeline tests
Pros
• Full flow control
• Easy to test complex pipelines
• Easy watermark control
• Can cover corner cases
Cons
• Does not test race conditions
• Still pretty easy to miss corner cases due to ordering
Use it to validate complex, timely dataflow logic and corner cases.
Integration testing
Integration Testing
[Diagram: Pipeline with Source Connectors and Sink Connectors attached to Embedded/Testing Storage; an Input Generator supplies Test Input; Output Validation checks the result]
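A minimal sketch of that layout in plain Java, assuming a temporary directory as the testing storage. `FileBackedApp` and `IntegrationTestSketch` are hypothetical names. Unlike the earlier sketches, the application here reads and writes through real I/O, which is the part an integration test is meant to exercise.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical application whose connectors read and write real files.
class FileBackedApp {
    static void run(Path source, Path sink) throws IOException {
        List<String> results = new ArrayList<>();
        for (String line : Files.readAllLines(source)) {
            results.add(line + ":" + line.length());
        }
        Files.write(sink, results);
    }
}

class IntegrationTestSketch {
    // Generates input, runs the app against temporary storage, returns the sink content.
    static List<String> runAgainstTempStorage(int records) {
        try {
            Path dir = Files.createTempDirectory("it-storage"); // testing storage
            Path source = dir.resolve("input.txt");
            Path sink = dir.resolve("output.txt");
            List<String> generated = new ArrayList<>();          // input generator
            for (int i = 0; i < records; i++) {
                generated.add("event-" + i);
            }
            Files.write(source, generated);
            FileBackedApp.run(source, sink);                      // app uses its real connectors
            return Files.readAllLines(sink);                      // handed to output validation
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real setup the temporary directory would be replaced by an embedded or test instance of whatever storage the production connectors talk to.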
Integration Testing
Pros
• Actually tests the “real thing”
• Covers connector config
• Can be used for performance testing
Cons
• Resource intensive
• Similar caveats to simple end-to-end testing
• Validation can be even more complex
Every production application should be integration tested.
Wrapping up
Overview of testing utilities
Great utils on all levels of our application logic
1. Functions/Operators
a. Unit test directly
b. Operator Test Harnesses
2. Pipeline/Flow
a. Simple end-to-end tests
b. Manual (JobTester) flow tests
3. Complete Application
a. Proper integration tests
Some are more user friendly than others...
Thank you!
References
https://flink.apache.org/news/2020/02/07/a-guide-for-unit-testing-in-apache-flink.html
https://www.ververica.com/flink-forward/resources/testing-stateful-streaming-applications
https://github.com/ottogroup/flink-spector
https://github.com/knaufk/flink-testing-pyramid
https://github.com/demiourgoi/flink-check/blob/master/flink-check/README.md
https://github.com/cloudera/flink-tutorials/tree/master/flink-stateful-tutorial/src/main/java/com/cloudera/streaming/examples/flink/utils/testing

Virtual Flink Forward 2020: Testing production streaming applications - Gyula Fora & Matyas Orhidi
