How a Data Mesh is Driving our Platform | Trey Hicks, Gloo

At Gloo.us, we face a challenge in providing platform data to heterogeneous applications in a way that eliminates access contention, avoids high-latency ETLs, and ensures consistency for many teams. We're solving this problem by adopting Data Mesh principles and leveraging Kafka, Kafka Connect, and Kafka Streams to build an event-driven architecture that connects applications to the data they need. A domain-driven design keeps the boundaries between specialized process domains and singularly focused data domains clear, distinct, and disciplined. Applying the principles of a Data Mesh, process domains assume the responsibility of transforming, enriching, or aggregating data rather than relying on these changes at the source of truth: the data domains. Architecturally, we've broken centralized big data lakes into smaller data stores that can be consumed into storage managed by process domains.
This session covers how we're applying Kafka tools to enable our data mesh architecture: how we interpret and apply the data mesh paradigm, the role of Kafka as the backbone for a mesh of connectivity, the role of Kafka Connect to generate and consume data events, and the use of KSQL to perform minor transformations for consumers.


1. How the Data Mesh is Driving Our Platform (Trey Hicks, Director of Engineering)
2. Building Technologies To Connect People
   Applications that help people:
   • Mentors
   • Faith
   • Recovery Centers
   • Resources
3. Supporting The Mission: A Common Platform Must Consider
   • Diverse application types and purposes
   • Serving several verticals
   • Varying resource needs
   • Apps built internally by Gloo or with partners
   • Common means of connectivity to data and services
4. Technical Landscape: Our Approach
   Architectural:
   • Microservices
   • Datastores per service or application domain
   • Domain-based services
   • Event-driven
   • Domain-driven
   Infrastructure:
   • Kubernetes
   • AWS
   • Confluent Cloud (Kafka, KsqlDB)
   • Kafka Connect cluster
   • Docker
5. Challenges in Building the Platform
   • Heterogeneous apps
   • Resource contention
   • A gravitational pull to put application use-cases lower in the stack
   • Tight coupling due to customization of shared services
   • Blocked development due to cross-team dependencies
   • Limits to our ability to scale the organization
6. Platform Facts
   • Our value prop isn't the applications, it's the data
   • Application-specific use-cases low in the stack cause problems
7. Enter Data Mesh: The Data Mesh Paradigm
   Principles (Zhamak Dehghani, https://martinfowler.com/articles/data-monolith-to-mesh.html):
   • Domain-driven architecture
   • Data as a product
   • Self-serve architecture
   • Governance
   Perhaps the ideas have existed before (data emphasis, domain-driven design, service-oriented architectures), but the paradigm provides terminology to shift the conversation upwards into a broad data strategy, as opposed to a purely technical concern.
8. Solving the Challenges: what each principle offers and what it solves
   • Domain-Driven Architecture
     ◦ Appeal: microservice architecture
     ◦ Solves: many apps, resource contention, app requirements in core services
   • Data As a Product
     ◦ Appeal: our primary value; apps are transient
     ◦ Solves: blocking development, tight coupling
   • Self-Serve Infrastructure
     ◦ Appeal: easy connectivity to data and domains
     ◦ Solves: blocking development, app requirements in the stack
   • Governance
     ◦ Appeal: secure data ports, community trust, privacy
     ◦ Solves: tight coupling, blocking development
9. Adopting The Principles: Culture Shift
   • Establish common terminology and language
   • Promote a data-first philosophy
   • Embrace democratized ownership and the associated responsibilities
   • Acceptance of eventual consistency
   • In our case, embracing event streams
10. Data As a Product: How We Define Data Products
   • Our data is our unique value
   • The foundation for apps and services that drive success
   • Requires governance:
     ◦ Security
     ◦ Availability
     ◦ Accessibility
     ◦ Change controls
   • Free of application use-cases
   • Integrity
11. Data Product Examples
   Core data objects:
   • Person
   • Organization
   • Catalysts
   • Relationships
   Secondary objects:
   • Cohorts/Collections
   • Growth Intelligence
   • Assessments
12. Access via Data Ports
13. Sharing the Data: Conceptual Architecture
   • Distributed data products
   • Domain boundaries
   • Process/application domains apply their own use-cases
   • Domains may use subsets or combinations
   • Derived data products
14. Examples
   • Campaign Data
   • Event Sourcing
15. Implementation: Campaign Data
16. Creating a Data Product
17. Connecting to the Data Mesh: Sharing the Data Product
   • Governed data is available
   • Options for access:
     ◦ Download with ETL or ELT
     ◦ Kafka
   • Both have complications:
     ◦ Manual processes
     ◦ Lack of a consuming process
     ◦ Skillsets not aligned
18. More Complexity
19. Enter the Kafka Ecosystem: A Data Mesh Platform Using Kafka
   • Kafka is perfect for one-to-many distribution
   • Event streams/batches provide a means of keeping the consuming domains in sync with the data product
   • Kafka Connect is perfect for turning datastores into event streams
   • Kafka Connect is perfect for sinking the streams into a datastore
   • KsqlDB is perfect for selecting subsets of data or combining streams to shape the data
20. Kafka Connect: Building the Mesh
   • Connect the data product:
     ◦ S3 Source Connector
   • Connect the consumers:
     ◦ JDBC sinks
     ◦ ES sink
21. Kafka Connect: S3 Source Connector (connector by Mario Molina)
   • S3 connection
   • Policies:
     ◦ Polling
     ◦ Subdirectories
   • JSON = more approachable
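
For illustration, such a source config could look like the sketch below. This assumes Mario Molina's open source kafka-connect-fs connector; the bucket, topic, polling interval, and file pattern are placeholders, not Gloo's actual values:

```json
{
  "name": "campaign-data-s3-source",
  "config": {
    "connector.class": "com.github.mmolimar.kafka.connect.fs.FsSourceConnector",
    "tasks.max": "1",
    "fs.uris": "s3a://example-data-products/campaigns",
    "topic": "campaign-data",
    "policy.class": "com.github.mmolimar.kafka.connect.fs.policy.SleepyPolicy",
    "policy.recursive": "true",
    "policy.regexp": ".*\\.json$",
    "policy.sleepy.sleep": "60000",
    "file_reader.class": "com.github.mmolimar.kafka.connect.fs.file.reader.JsonFileReader"
  }
}
```

The policy settings cover the two bullets above: SleepyPolicy re-polls the bucket on an interval (here every 60 seconds) and policy.recursive walks subdirectories, while JsonFileReader is what makes plain JSON files in the bucket approachable as records.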
22. Kafka Connect: JDBC Sink Connector
   • DB connection
   • Dealing with schema:
     ◦ table.name.format
     ◦ auto.create and auto.evolve
   • Single message transform:
     ◦ Inject timestamp
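
Mapped to a concrete config, those bullets might look like the following sketch (the connection URL, topic, table, and timestamp field names are illustrative):

```json
{
  "name": "campaign-data-jdbc-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "campaign-data",
    "connection.url": "jdbc:postgresql://db.example.com:5432/campaigns",
    "table.name.format": "campaigns",
    "auto.create": "true",
    "auto.evolve": "true",
    "transforms": "InsertTs",
    "transforms.InsertTs.type": "org.apache.kafka.connect.transforms.InsertField$Value",
    "transforms.InsertTs.timestamp.field": "ingested_at"
  }
}
```

table.name.format controls the destination table (it defaults to the topic name); auto.create and auto.evolve let the connector create and alter the table from the record schema, so the converter in use matters; and the InsertField single message transform stamps each record with its timestamp on the way in.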
23. Kafka Connect: ES Sink Connector
   • Uses a REST client
   • Single message transforms:
     ◦ Document id
     ◦ Index name
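
Both bullets can be handled with stock single message transforms, as in this sketch (the hostname and the campaign_id field are assumptions): ValueToKey plus ExtractField derive the document id from a record field, and RegexRouter rewrites the topic name, which the connector uses as the index name.

```json
{
  "name": "campaign-data-es-sink",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "1",
    "topics": "campaign-data",
    "connection.url": "http://elasticsearch.example.com:9200",
    "key.ignore": "false",
    "schema.ignore": "true",
    "transforms": "SetKey,ExtractId,RenameIndex",
    "transforms.SetKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
    "transforms.SetKey.fields": "campaign_id",
    "transforms.ExtractId.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.ExtractId.field": "campaign_id",
    "transforms.RenameIndex.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.RenameIndex.regex": "campaign-data",
    "transforms.RenameIndex.replacement": "campaigns"
  }
}
```

With key.ignore set to false, the record key becomes the document id, which makes the sink idempotent: replaying the same event updates the same document instead of duplicating it.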
24. Derived Data Products
25. Implementation: Event Sourcing
26. Revisiting the Technical Landscape: New Concerns
   • Bloated infrastructure:
     ◦ Expensive footprint
     ◦ K8s is great, but maybe too easy to spin up new instances
   • Experimentation leaves dead instances and other bones
   • A complicated data model and APIs
27. Going Forward In Reverse: Rethinking Parts of the Platform
   • Simplify the overall footprint:
     ◦ Fewer and simpler services
     ◦ Smaller clusters
     ◦ Fewer instances
   • Improve the database schema
   • Rethink our APIs
28. Event Sourcing: Changing the Schema
   • Major changes without interruption:
     ◦ Tables restructured
     ◦ Elements combined or removed
   • Existing streams via connectors
   • Need additional JDBC sinks
29. Applying KsqlDB
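
The deck doesn't show the statements themselves, so the following is a minimal sketch of the "minor transformations for consumers" role; the stream, topic, and column names are invented for illustration:

```sql
-- Register the raw data product topic as a stream
-- (this schema is hypothetical).
CREATE STREAM campaigns_src (
  campaign_id VARCHAR KEY,
  org_id VARCHAR,
  name VARCHAR,
  status VARCHAR
) WITH (KAFKA_TOPIC = 'campaign-data', VALUE_FORMAT = 'JSON');

-- Derive a consumer-shaped subset onto its own topic,
-- which a JDBC or ES sink can then consume.
CREATE STREAM active_campaigns
  WITH (KAFKA_TOPIC = 'active-campaigns', VALUE_FORMAT = 'JSON') AS
  SELECT campaign_id, org_id, name
  FROM campaigns_src
  WHERE status = 'ACTIVE'
  EMIT CHANGES;
```

The derived topic is itself consumable by sinks like those above, which is how a process domain gets data already shaped to its use-case without pushing that use-case down into the data domain.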
30. More On Infrastructure: Creation of Data Mesh Engineering
   • Structured like other engineering "pods": engineers and product
   • Charter is to build the self-serve connectivity
   • Responsible for the Data Mesh infrastructure
   • Create reference configs for all Kafka Connectors
   • Make it super simple to define, add, and govern new data products
   • One team responsible for connectivity and data movement
31. Discovery: Keeping Track of All the Things
   • Provide a catalog of all data products:
     ◦ Documentation or manual catalogs are DOA; it must be automatic
   • The catalog covers all data products, communication channels, consuming domains, schemas, and data ports
32. Deployment: Continuous Deployment
   • Kafka configs project:
     ◦ One project for all Connector, KsqlDB, and topic configurations
     ◦ Updates trigger deployment
   • Uses REST proxies to deploy updates
   • Open source?
   • Kafka JMX Exporter collects the metrics used in Grafana dashboards
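
As an illustration of that flow (the hostname, repo layout, and connector name are assumptions), a CI step can push a changed config straight to the Kafka Connect REST API. PUT /connectors/<name>/config creates or updates a connector idempotently; because that endpoint expects the bare config map, jq strips the {name, config} wrapper used in the files above:

```sh
# Deploy (create or update) one connector from the configs project.
# The /config endpoint takes only the inner "config" object.
jq .config connectors/campaign-data-jdbc-sink.json |
  curl -sS -X PUT -H "Content-Type: application/json" \
    --data @- \
    http://connect.example.com:8083/connectors/campaign-data-jdbc-sink/config
```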
33. Closure: Summary
   • A data-first organization
   • The data mesh paradigm helps us solve problems
   • The Kafka ecosystem is the core of the data mesh driving the platform
   • Serving our application domains using Kafka Connect and KsqlDB
   • Future:
     ◦ Improve self-serve
     ◦ Discovery app
   → If you have experienced this problem, let's chat!
34. Acknowledgments (we're hiring!)
   • Collin Shaafsma: leadership
   • Ken Griesi: inspiration, guidance, and discovering the articles
   • Alex Lauderbaugh: all things data, and ghost writer
   • Scott Symmank: technical lead
   • Hannah Manry: amazing engineer
   • Mitch Ertle: resident BA expert and principal consumer
   • Chicken Mascot
