Kafka Summit 2019 Microservice Orchestration


  1. Building and Evolving a Dependency-Graph Based Microservice Architecture
     Lars Francke – Partner and Co-Founder @ OpenCore
     Kafka Summit 2019 – 30 September 2019
  2. About Me – Lars Francke
     © 2019 OpenCore GmbH & Co. KG
     • Partner & Co-Founder at OpenCore
     • We do Hadoop/Big Data/insert Buzzword consulting
     • Based in Germany but doing business world-wide if you need us
     • https://www.opencore.com
     • ASF/Big Data/Hadoop since 2008
     • Apache Committer & Member: HBase, Hive, ORC, Training (PMC)
     • Contact
       • lars.francke@opencore.com
       • @lars_francke
  3. The problem
  4. The problem
     • No one here knows the dependencies between all our Microservices anymore!
     • We drew a picture but it hasn't been updated in months and is now doing more harm than good
     • We're afraid of stopping this service because we don't know who depends on it
     • How do these topics differ and where can I find the latest customer registrations? "customer_regs", "customer_regs1", "customer_regs_new", "new_customers", "customers_lars_test"
     • We need to migrate from On-Prem Kafka to Confluent Cloud but have no idea where to begin and what we need.
  5. The problem
     • Didn't the London team already build a service to check zip codes?
     • Why has this dashboard stopped showing data?
     • Does anyone mind if I add a field to the "Customer" object?
     • Oh no, Governance wants to know where in Kafka we store PII data
  6. Microservice architectures
  7. Choreography
  8. Choreography
     Also known as Event-Driven
  9. Choreography
     • Services coordinate amongst themselves
       • No central service
       • "Smart endpoints and dumb pipes" – Martin Fowler & James Lewis
       • Kafka often used for the "dumb pipes" part (no offense!)
     • Lots of flexibility
       • Just add a new service, no need to coordinate with others
       • Use whatever language you want, whatever data format you want etc.
     • Often brittle
       • Loose coupling means you might depend on a service without knowing it
       • Those dependencies might change and break
       • People might depend on your service without you knowing it!
  10. Choreography
      • Hard to keep track of everything & get an overview
      • Harder to verify at build-time
        • Only the equivalent of a unit test is easy; integration testing is harder if other components are unknown or controlled by a different team
      • Different teams can work independently
  11. Choreography
      When I said "no central service", what I meant was that we obviously still do have central services like:
      • Schema Registry
      • Log collection
      • Monitoring
      • Etc.
  12. Orchestration
      Source: https://commons.wikimedia.org/wiki/File:Peter_Oundjian_-_Conductor_of_Toronto_Symphony_Orchestra_2014.jpg
  13. Orchestration
      • One central "coordinator" that tells everyone what to do
        • Like a conductor in an orchestra
        • The Enterprise Service Bus (ESB) is an example
        • Routing, Transformations, Business rules etc.
      • It's easy to get an overview of the whole system
        • The central service can even provide a nice UI, showing dependency graphs
        • Monitoring is easier
  14. Orchestration
      • Less flexible
        • Adding a new service requires coordination and potentially changing/restarting existing things
      • Less brittle
        • Central service can validate the architecture
        • The architecture/graph can often be verified at "build"-time
      • Works well with CI/CD
        • * as Code (Infrastructure, Configuration, …)
      • May require coordination between teams
        • Less self-service
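
As an illustration of what verifying the architecture at "build"-time could look like, here is a minimal Python sketch. It assumes a hypothetical declaration format (service name mapped to its input and output stream names) that is not taken from the talk or from any specific orchestrator. The check reports inputs that nobody produces and dependency cycles between services.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical declaration: service name -> (input streams, output streams).
SERVICES = {
    "registration": (set(), {"new_customers"}),
    "zip_check": ({"new_customers"}, {"validated_customers"}),
    "dashboard": ({"validated_customers"}, set()),
}


def validate(services):
    """Return a list of problems; an empty list means the graph is consistent."""
    problems = []
    produced = {stream for _, outs in services.values() for stream in outs}

    # Every declared input needs at least one producer.
    for name, (ins, _) in sorted(services.items()):
        for stream in sorted(ins - produced):
            problems.append(f"{name}: no producer for '{stream}'")

    # A consumer depends on every service that produces one of its inputs.
    deps = {
        name: {p for p, (_, outs) in services.items() if outs & ins}
        for name, (ins, _) in services.items()
    }
    try:
        tuple(TopologicalSorter(deps).static_order())
    except CycleError as err:
        problems.append(f"cycle: {err.args[1]}")
    return problems
```

A check like this could run in a CI/CD pipeline and fail the build whenever the returned list is not empty.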
  15. Orchestography
      Natural question to ask: Which is better?
  16. Orchestography
  17. Orchestography
      Both have their uses!
  18. Microservices
      • Microservices are often used to split up a single monolithic app into multiple independent services
      • There are still independent "business applications" even though some may share data or even services
      • Ideally a single team is responsible for a product
      • Orchestration is easier within one product (or team) while Choreography is appealing across product/team borders
  19. Orchestography
      • Orchestration lends itself more to "workflow" oriented tasks which are split across multiple processes and/or need to be distributed
        • Strict or at least strong dependencies between those tasks
        • Can be seen as "one" thing, that could – in theory – also be implemented as one monolithic process
      • Choreography lends itself more to loosely coupled or decoupled services
        • These might also have dependencies but often not as strict
  20. Orchestography
      [Diagram: Application 1, Application 2, Application 3, Application 4 (each Orchestrated); Kafka (or similar, for Choreography); "?"]
  21. Example
  22. Naming things is hard
  23. Cattle vs. Pets
      Show of hands: Who here has (had) servers with names like:
      Sources: https://www.flickr.com/photos/gageskidmore/7584137078, https://commons.wikimedia.org/wiki/File:Jean-Luc_Picard_2.jpg, https://www.flickr.com/photos/44214515@N06/21547144233
  24. Pets
      Names like that are a good sign that these servers might be your Pets.
      They often have a combination of these features:
      • Manually built and managed
      • Indispensable
      • Can never be down
  25. Cattle
      The industry has moved on (or is in the process) to treating Servers (and services) as Cattle instead
      • No identity (random names or based on some pattern)
      • Disposable
      • Infrastructure as Code
  26. Cattle
      • The Cloud was a big "enabler" for this movement
        • Servers have more or less random names
        • Each specific instance doesn't matter, will be rebuilt when needed
        • e.g. Spot Instances
      • Kubernetes & Co. playing a role as well
  27. Cattle
      If we agree that this is a good thing…
      …why do you have a topic called customerCreated?
  28. Technology to the rescue
      Lars, tell us what to do!
  29. We are not the first to struggle with this
      Surprising, I know
  30. Good Compan(y|ies)
      Source: https://commons.wikimedia.org/wiki/File:ING_Group_N.V._Logo.svg
  31. Netflix Conductor
      • Netflix OSS Project: https://github.com/Netflix/conductor
      • "Conductor is a Workflow Orchestration engine that runs in the cloud."
      • Conductor runs backend Servers providing UI & REST API
      • You define your Workflows in a JSON DSL, POST it to the API
      • You develop your Workers in whichever language you want (convenience libraries available for Java & Python) and they get new work from the REST API
  32. Netflix Conductor
      • Workflows consist of Tasks
      • Conductor itself can store some Payload or it can/must be stored externally
      • It does not support using Kafka (or similar) to decouple Tasks
  33. Netflix Conductor – Tasks

      [
        {
          "name": "verify_if_idents_are_added",
          "retryCount": 3,
          "retryLogic": "FIXED",
          "retryDelaySeconds": 10,
          "timeoutSeconds": 300,
          "timeoutPolicy": "TIME_OUT_WF",
          "responseTimeoutSeconds": 180
        },
        {
          "name": "add_idents",
          "retryCount": 3,
          "retryLogic": "FIXED",
          "retryDelaySeconds": 10,
          "timeoutSeconds": 300,
          "timeoutPolicy": "TIME_OUT_WF",
          "responseTimeoutSeconds": 180
        }
      ]

  34. Netflix Conductor – Workflow Pt. 1

      {
        "name": "add_netflix_identation",
        "description": "Adds Netflix Identation to video files.",
        "version": 2,
        "schemaVersion": 2,
        "tasks": [
          {
            "name": "verify_if_idents_are_added",
            "taskReferenceName": "ident_verification",
            "inputParameters": {
              "contentId": "${workflow.input.contentId}"
            },
            "type": "SIMPLE"
          },
          {
            "name": "decide_task",
            "taskReferenceName": "is_idents_added",
            "inputParameters": {
              "case_value_param": "${ident_verification.output.is_idents_added}"
            },

  35. Netflix Conductor – Workflow Pt. 2

            "type": "DECISION",
            "caseValueParam": "case_value_param",
            "decisionCases": {
              "false": [
                {
                  "name": "add_idents",
                  "taskReferenceName": "add_idents_by_type",
                  "inputParameters": {
                    "identType": "${workflow.input.identType}",
                    "contentId": "${workflow.input.contentId}"
                  },
                  "type": "SIMPLE"
                }
              ]
            }
          }
        ]
      }

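
To make the control flow of this workflow easier to follow, here is a small Python sketch that walks a trimmed copy of the definition and lists which worker tasks would be scheduled for a given case value. This is not Conductor's implementation: the function `scheduled_tasks` is hypothetical, it skips parameter resolution such as `${ident_verification.output.is_idents_added}`, and it takes the resolved case value as a plain argument.

```python
# Trimmed copy of the workflow's "tasks" array (input parameters omitted).
WORKFLOW_TASKS = [
    {
        "name": "verify_if_idents_are_added",
        "taskReferenceName": "ident_verification",
        "type": "SIMPLE",
    },
    {
        "name": "decide_task",
        "taskReferenceName": "is_idents_added",
        "type": "DECISION",
        "decisionCases": {
            "false": [
                {
                    "name": "add_idents",
                    "taskReferenceName": "add_idents_by_type",
                    "type": "SIMPLE",
                }
            ]
        },
    },
]


def scheduled_tasks(tasks, case_value):
    """List the SIMPLE task names that run for one resolved case value."""
    names = []
    for task in tasks:
        if task["type"] == "DECISION":
            # Only the branch matching the case value is followed;
            # a value without a matching case schedules nothing here.
            branch = task["decisionCases"].get(case_value, [])
            names.extend(scheduled_tasks(branch, case_value))
        else:
            names.append(task["name"])
    return names
```

With the case value "false" both tasks run; with "true" only the verification task runs, because the definition has no matching case.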
  36. Uber Cadence
      • Uber Project: https://github.com/uber/cadence
      • "Cadence is a distributed, scalable, durable, and highly available orchestration engine to execute asynchronous long-running business logic in a scalable and resilient way."
      • From the same people that led the Amazon Simple Workflow service
      • Has Clients for Java & Go
      • Other languages are possible, communicating via Thrift
  37. Uber Cadence
      • Cadence handles Task state & Queues for us
      • Your Workflow is implemented in code
      • Workflows can run & wait for a long time
        • Example: Subscription Renewal workflow that runs forever and charges your customer every 30 days
      • Also no direct Kafka integration
  38. Uber Cadence – Example

      @Override
      public void execute(String customerId) {
        activities.sendWelcomeEmail(customerId);
        try {
          boolean trialPeriod = true;
          while (true) {
            // The workflow waits here for 30 days before charging again.
            Workflow.sleep(Duration.ofDays(30));
            activities.chargeMonthlyFee(customerId);
            if (trialPeriod) {
              activities.sendEndOfTrialEmail(customerId);
              trialPeriod = false;
            } else {
              activities.sendMonthlyChargeEmail(customerId);
            }
          }
        } catch (CancellationException e) {
          // Cancelling the workflow ends the loop and triggers the cleanup.
          activities.processSubscriptionCancellation(customerId);
          activities.sendSorryToSeeYouGoEmail(customerId);
        }
      }

  39. Expedia Stream Registry
      • Expedia project (originally HomeAway): https://github.com/ExpediaGroup/stream-registry
      • A metadata service for streams
        • Who owns the stream?
        • Who are the producers and consumers of the stream?
        • Management of stream replication across clusters and regions
        • Management of stream storage for permanent access
        • Management of stream triggers for legacy stream sources
  40. Expedia Stream Registry
      • It manages Clusters as well as "Streams" of data
        • Including schemas, owners and other metadata
      • Unfortunately the docs are pretty thin
      • Moved from HomeAway to Expedia while undergoing a refactor
  41. Others
      There are others:
      • ING Baker
        • ING Project: https://github.com/ing-bank/baker
        • "Orchestrate microservice-based process flows"
        • Java based library
        • You specify a Recipe which includes all your functions (interactions), the data they need (ingredients) and the data they produce (events)
      • Zeebe
        • From the Camunda folks
        • BPMN 2
        • "A Workflow Engine for Microservices Orchestration"
      • Dagster
      • …
  42. Where does this leave us?
  43. The Current State
      • Most existing tools require you to explicitly model your dependency graph
        • This makes sense for "strict" workflows
        • But not for many other use-cases (e.g. analytics, logging, persistence etc.)
        • This is comparable to having SQL without a Query Optimizer or Spark without Catalyst
      • Some tools require you to implement their API or use their library
  44. The Current State
      • Unfortunately, the perfect solution doesn't yet exist
      • The Orchestrators that do exist are all very nice and work
      • For Choreography, though, things are a bit bleak
        • Stream Registry moves in the right direction
        • Schema Registry is necessary as well but not sufficient
  45. Does this seem familiar?
  46. Wishlist
      • We need better support for Event-Driven (Choreography) style architectures
      • We need better Governance for data in Kafka
        • This problem is not exclusive to Kafka
      • Kafka topics shouldn't be managed manually
      • We need better self-service tools to find data sources
  47. Wishlist
      • We'd like a tool that
        • allows us to register logical streams of data,
          • Used to distinguish flows with the same schema
          • Metadata (e.g. owners)
          • e.g. "New customers stream"
        • allows us to register Connections,
          • e.g. Kafka Clusters, Kinesis credentials etc.
        • allows us to register (Micro-)Services
          • Including their Inputs and Outputs
          • These are the "Data" in- and outputs, not any topic itself
          • Both reference existing Schemas
          • Optional: Dependencies
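
The three kinds of registrations on this wishlist could be modelled roughly as follows. This is a minimal Python sketch under stated assumptions: the class names (`Stream`, `Connection`, `Service`, `Registry`) and all field names are hypothetical and do not correspond to an existing tool. The point it illustrates is that services declare logical streams as inputs and outputs and never mention topic names.

```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    """A logical stream of data, e.g. "New customers stream"."""
    name: str
    schema: str          # reference to an existing schema
    owner: str           # example of stream metadata


@dataclass
class Connection:
    """Where data can physically live, e.g. a Kafka cluster."""
    name: str
    kind: str            # e.g. "kafka" or "kinesis"
    settings: dict = field(default_factory=dict)


@dataclass
class Service:
    """A (micro-)service with its data inputs and outputs."""
    name: str
    inputs: list = field(default_factory=list)    # stream names, not topics
    outputs: list = field(default_factory=list)   # stream names, not topics


@dataclass
class Registry:
    streams: dict = field(default_factory=dict)
    connections: dict = field(default_factory=dict)
    services: dict = field(default_factory=dict)

    def register_stream(self, stream):
        self.streams[stream.name] = stream

    def register_connection(self, connection):
        self.connections[connection.name] = connection

    def register_service(self, service):
        # Inputs and outputs must reference streams that are already known.
        unknown = [s for s in service.inputs + service.outputs
                   if s not in self.streams]
        if unknown:
            raise ValueError(f"unknown streams: {unknown}")
        self.services[service.name] = service
```

Rejecting services that reference unregistered streams is one possible design choice; it keeps the registry consistent at the cost of requiring streams to be declared first.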
  48. Wishlist
      • This tool could use this information to
        • automatically build an optimal DAG,
        • and execute all necessary steps to enable this DAG:
          • Create Kafka Topics
          • Create necessary ACLs
          • Optionally: Update MirrorMaker configuration or other steps
      • The Services themselves can then get all the information they need from the REST API
        • Cluster configuration
        • Schema information
        • Topic names for in- and output
        • Optional: Pre- & Postconditions
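
A rough Python sketch of how such a tool might derive the DAG and the topic names from declared inputs and outputs. Everything here is an assumption made for illustration: the `plan` function, the declaration format and the random topic naming are hypothetical, and the sketch only computes a plan; it does not create topics or ACLs on a real cluster.

```python
import secrets
from graphlib import TopologicalSorter

# Hypothetical declarations: services name logical streams, never topics.
SERVICES = {
    "dashboard": {"inputs": ["validated_customers"], "outputs": []},
    "registration": {"inputs": [], "outputs": ["new_customers"]},
    "zip_check": {"inputs": ["new_customers"],
                  "outputs": ["validated_customers"]},
}


def plan(services, name_topic=lambda: secrets.token_hex(4)):
    """Derive topic names, a start order and per-service configuration."""
    streams = sorted({s for spec in services.values()
                      for s in spec["inputs"] + spec["outputs"]})
    # One generated ("cattle") topic name per logical stream.
    topics = {stream: name_topic() for stream in streams}

    producers = {}
    for name, spec in services.items():
        for stream in spec["outputs"]:
            producers.setdefault(stream, set()).add(name)

    # A service depends on every producer of one of its inputs.
    deps = {name: {p for s in spec["inputs"] for p in producers.get(s, ())}
            for name, spec in services.items()}
    order = list(TopologicalSorter(deps).static_order())

    # What each service would be told when it asks for its configuration.
    config = {name: {"inputs": {s: topics[s] for s in spec["inputs"]},
                     "outputs": {s: topics[s] for s in spec["outputs"]}}
              for name, spec in services.items()}
    return topics, order, config
```

Calling `plan(SERVICES)` returns the stream-to-topic mapping, an order in which producers come before their consumers, and the topic names each service should read from and write to.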
  49. Wishlist
      • For those who use Apache Spark:
        • In Spark you define all your actions and transformations, at the end it builds an optimal DAG out of this information and executes it
        • This tool would do the exact same thing but across process boundaries
      • The Services themselves can be written in any language as long as they can make REST calls
        • Convenience clients would be great but optional
      • As this tool controls the data flow (no data flows through the tool itself though) it can create "intermediate" topics to enable more use-cases:
        • Quality checks
        • Automatic anonymization
        • Automatic collection of samples
  50. Example
  51. Example
      [Diagram: Service A, Topic "xqdrnc", Service B]
  52. Example
      [Diagram: Service A, Topic "xqdrnc", Service B, Infra Service]
      if (booking.travel_agency == "Thomas Cook") { alert() }
  53. Example
      [Diagram: Service A, Topic "xqdrnc", Service B]
      if (booking.travel_agency == "Thomas Cook") { fail() }
  54. Example
      [Diagram: Service A, Topic "xqdrnc", Infra Service, Topic "blgrgb", Service B]
      if (booking.travel_agency == "Thomas Cook") { fail() }
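
One way to read this example: because services ask the tool for their topic names, the tool can place a check between a producer and a consumer by creating an intermediate topic and rewiring the consumer. The Python sketch below assumes that reading of the diagrams (Service A writes to "xqdrnc", the Infra Service reads it and writes to "blgrgb", Service B reads "blgrgb"); the `insert_before` function and the wiring format are hypothetical.

```python
import copy

# Hypothetical wiring as the tool might hand it out: service -> topics.
WIRING = {
    "service_a": {"in": None, "out": "xqdrnc"},
    "service_b": {"in": "xqdrnc", "out": None},
}


def insert_before(wiring, consumer, middle, new_topic):
    """Return new wiring with `middle` placed in front of `consumer`.

    The middle service takes over the consumer's old input topic and
    writes to `new_topic`, which becomes the consumer's new input.
    The producer is left untouched.
    """
    updated = copy.deepcopy(wiring)
    old_topic = updated[consumer]["in"]
    updated[middle] = {"in": old_topic, "out": new_topic}
    updated[consumer]["in"] = new_topic
    return updated
```

For the example, `insert_before(WIRING, "service_b", "infra_service", "blgrgb")` leaves Service A writing to "xqdrnc" while Service B now reads from "blgrgb"; neither service needs a code change, only a fresh lookup of its topic names.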
  55. Wishlist
      • This tool could also (optionally) automatically run or re-run the services using e.g. Kubernetes
        • This would allow for total control
        • Services need to be made aware of changes in the topology
      • We could automatically transform between data formats
        • e.g. a service accepting Protobuf but the data only exists in Avro
  56. Wishlist
      • A side effect would be an automatically up-to-date Governance/Data Catalog
      • This would allow for better self-service operations: You don't have to find "topics" in Kafka with your data, you just have to declare which data you're interested in and the system will always tell you where this data lives
      • Orchestrators like Conductor etc. would still be important, encapsulated in an "Application"
        • Which itself could consist of multiple services
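
The self-service lookup described here could be as simple as a query against the catalog the tool maintains. The Python sketch below uses a hypothetical catalog layout; the field names, including the `pii` flag used to answer the Governance question from the beginning of the talk, are assumptions and not part of any existing product.

```python
# Hypothetical catalog entries, kept up to date by the tool itself.
CATALOG = [
    {"stream": "new_customers", "connection": "kafka-eu", "topic": "xqdrnc",
     "schema": "Customer", "owner": "registration-team", "pii": True},
    {"stream": "bookings", "connection": "kafka-eu", "topic": "blgrgb",
     "schema": "Booking", "owner": "booking-team", "pii": False},
]


def locate(catalog, stream):
    """Tell a consumer where a logical stream currently lives."""
    for entry in catalog:
        if entry["stream"] == stream:
            return entry["connection"], entry["topic"]
    raise LookupError(f"no stream named '{stream}' is registered")


def pii_locations(catalog):
    """List every (connection, topic) pair that holds PII data."""
    return sorted((e["connection"], e["topic"]) for e in catalog if e["pii"])
```

A consumer asks for "new_customers" and gets back the cluster and topic, so the topic name itself can stay as meaningless as a cattle server name.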
  57. Orchestography
      [Diagram: Service 1, Service 2, Service 3, Service 4 (each Orchestrated); Kafka (or similar, for Choreography); "?"]
  58. Questions
      What are your questions?
      lars.francke@opencore.com
      @lars_francke

Editor's Notes

  • Note: No Kafka in the title
    We promised Kafka Streams but that's not going to happen, sorry
  • https://pxhere.com/en/photo/940450
  • https://commons.wikimedia.org/wiki/File:Peter_Oundjian_-_Conductor_of_Toronto_Symphony_Orchestra_2014.jpg
  • Before microservices these workflow style things would probably have been one monolithic service
  • A customer of ours has distributed teams
    They write their services in various languages (JavaScript, Java, Python, Go)
    What can't be seen is that every arrow means a new topic
    Downstream services don't care which topic they read from, they only care about the "best" booking there is, some don't care and use the earliest one
  • https://www.flickr.com/photos/gageskidmore/7584137078
    https://commons.wikimedia.org/wiki/File:Jean-Luc_Picard_2.jpg
    https://www.flickr.com/photos/44214515@N06/21547144233

    Other examples: Rancor, Merlin, ...
  • http://cloudscaling.com/blog/cloud-computing/the-history-of-pets-vs-cattle/
  • https://en.wikipedia.org/wiki/File:Netflix_2015_logo.svg
    https://brand.uber.com/guide#logo-overview
    https://commons.wikimedia.org/wiki/File:ING_Group_N.V._Logo.svg
    https://newsroom.expedia.com/image-gallery
  • https://netflix.github.io/conductor/
  • Baker is the exception
