SlideShare a Scribd company logo
1 of 89
Full slide deck here:
http://bit.ly/ceposta-hardest-part
Twitter: @christianposta
Blog: http://blog.christianposta.com
Email: christian@redhat.com
Christian Posta
Principal Architect – Red Hat
• Author “Microservices for Java Developers”
• Committer/contributor Apache Camel, Apache ActiveMQ,
Fabric8.io, Apache Kafka, Debezium.io, et. al.
• Worked with large Microservices, web-scale, unicorn
company
Free download @ http://developers.redhat.com
People try to copy Netflix, but they can only
copy what they see. They copy the
results, not the process.
Adrian Cockcroft, former Chief Cloud Architect, Netflix
“Microservices” is about optimizing… for speed.
• Maybe it doesn’t matter so much… What
we really care about is speed, reduced
time to value, and business outcomes.
• Maybe a data-driven approach is a better
way to answer this question...
Are you doing microservices?
• Number of features accepted
• % of features completed
• User satisfaction
• Feature Cycle time
• defects discovered after deployment
• customer lifetime value (future profit as a result of relationship with the
customer) https://en.wikipedia.org/wiki/Customer_lifetime_value
• revenue per feature
• mean time to recovery
• % improvement in SLA
• number of changes
• number of user complaints, recommendations, suggestions
• % favorable rating in surveys
• % of users using which features
• % reduction in error rates
• avg number of tx / user
• MANY MORE!
Are you doing microservices?
How does your company go fast?
Manage dependencies.
Data is a major dependency.
Wait. What is data?
What is one “thing”?
Book checkout / purchase Title Search
Recommendations
Weekly reporting
Focus on domain models, not data models
• Break things into smaller,
understandable models
• Surround a model and its
“context” with a boundary
• Implement the model in code
or get a new model
• Explicitly map between
different contexts
• Model transactional
boundaries as aggregates
Aggregates
• Use the domain to lead you to invariant rules across your domain
model
• Model the invariants and their associated entities/value objects as
“aggregates”
• Aggregates focus on transactional boundaries (ie, transactional in
the “A” from ACID sense)
• Individual aggregates are transactionally consistent
• Aggregates use relaxed consistency models between aggregates
(ie, something like the Actor model?)
• Bounded Contexts use relaxed consistency models between
boundaries
Stick with these conveniences as long as you can.
Seriously.
But ...
• Load/size is too great to fit on one box
• Modules/use cases have different read/write
characteristics
• Queries/joins are getting too complex
• Security issues
• Lots of conflicting changes to the model/schema
• Need denormalized, optimized indexing engines
• We can live with eventual consistency (whatever that
really means)
From here on out, what we’re saying is
“thank you old reliable, awesome database…
we’ve got it from here”…
Kinda looks like a combinatorial mess….
“A microservice has its own database”
How do we deal with data in this world?
We need to understand something about the data
inside our services and the data outside our services.
https://msdn.microsoft.com/en-us/library/ms954587.aspx
Data inside a service
Data inside a service
Data outside a service
Data outside a service
Data outside a service
We’re now building a full-fledged distributed system.
Some things to remember…
Plan for failures.
Build concepts of time, delay,
network, and failures into the
design as a first-class citizen.
How do you “read” data and how do you “update” data.
tx.begin()
c = retrieveCustomer()
c.addNewAddress(address)
tx.add(c)
tx.commit()
publishAddressChange(address, c.id)
tx.begin()
c = retrieveCustomer()
c.addNewAddress(address)
tx.add(c)
publishAddressChange(address, c.id)
tx.commit()
Separate reads and writes
(CQRS)
https://secure.phabricator.com/book/phabcontrib/article/n_plus_one/
https://secure.phabricator.com/book/phabcontrib/article/n_plus_one/
getBulkHats()
getBulkHatsForCatsExcept()
wellReallyIJustWantCertainHats()
justExecuteThisSqlForMe()
https://secure.phabricator.com/book/phabcontrib/article/n_plus_one/
For our reads and writes, we need some “consistency”.
What is consistency?
The history of past operations we observe as
a reader of the data
We need reads and writes. But we expect failures. This
is starting to sound like a distributed-systems theorem
I’ve heard…
CAP tells us to pick 2: Consistency, Availability,
Partition Tolerance
CAP is a bad way to think about this.
Linearizable (strict) consistency
CAP - C
Sequential consistency
Monotonic reads consistency
Eventual consistency
Consistency models…
https://en.wikipedia.org/wiki/Consistency_model
• Strict consistency (Linearizability)
• Sequential consistency
• Causal consistency
• Processor consistency
• PRAM consistency (FIFO)
• Bounded staleness consistency
• Monotonic read consistency
• Monotonic write consistency
• Read your writes consistency
• Eventual consistency
Can we really use relaxed consistency models?
Tradeoffs to make with read consistency and
performance
Replicated Data Consistency Explained through Baseball
(Doug Terry)
https://www.microsoft.com/en-us/research/publication/
replicated-data-consistency-explained-through-baseball/
• What consistency model do you need, depending on what
role you’re playing?
• What consistency model are you willing to pay for?
• Official score keeper? (Linearizability or RMW)
• Umpire? (Linearizability)
• Sports writer? (Bounded staleness, Eventual consistency)
• Radio updates? (Monotonic read, Bounded staleness)
• Statistician (Bounded staleness)
• Friends in the pub (Eventual consistency)
Replicated Data Consistency Explained through Baseball
(Doug Terry)
https://www.microsoft.com/en-us/research/publication/
replicated-data-consistency-explained-through-baseball/
Maybe we can use a relaxed consistency model for some
of those previously mentioned use cases…
Example relaxing consistency…
Internet companies created their own tools
for helping with this. (some opensource!!)
• Yelp – MySQL Streamer
https://github.com/Yelp/mysql_streamer
• LinkedIn – Databus
https://github.com/linkedin/databus
• Zendesk – Maxwell
https://github.com/zendesk/maxwell
Meet debezium.io
Meet debezium.io
WePay uses Debezium
https://wecode.wepay.com/posts/streaming-databases-in-realtime-with-mysql-debezium-kafka
Meet debezium.io
Twitter: @christianposta
Blog: http://blog.christianposta.com
Email: christian@redhat.com
Thanks for listening! Time for demo?

More Related Content

What's hot

SQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at ComcastSQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at Comcast
Databricks
 

What's hot (20)

Introduction to snowflake
Introduction to snowflakeIntroduction to snowflake
Introduction to snowflake
 
Cloud arch patterns
Cloud arch patternsCloud arch patterns
Cloud arch patterns
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
 
Mysql data replication
Mysql data replicationMysql data replication
Mysql data replication
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
 
CI CD Basics
CI CD BasicsCI CD Basics
CI CD Basics
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Restoring Restoration's Reputation in Kafka Streams with Bruno Cadonna & Luca...
Restoring Restoration's Reputation in Kafka Streams with Bruno Cadonna & Luca...Restoring Restoration's Reputation in Kafka Streams with Bruno Cadonna & Luca...
Restoring Restoration's Reputation in Kafka Streams with Bruno Cadonna & Luca...
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
 
SQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at ComcastSQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at Comcast
 
Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com
 
Neo4j Training Cypher
Neo4j Training CypherNeo4j Training Cypher
Neo4j Training Cypher
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for Experimentation
 
Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0
 
PostgreSQL
PostgreSQLPostgreSQL
PostgreSQL
 
Apache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper OptimizationApache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper Optimization
 
Introduction to ML with Apache Spark MLlib
Introduction to ML with Apache Spark MLlibIntroduction to ML with Apache Spark MLlib
Introduction to ML with Apache Spark MLlib
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 

Similar to The hardest part of microservices: your data

Similar to The hardest part of microservices: your data (20)

A microservices journey - Round 2
A microservices journey - Round 2A microservices journey - Round 2
A microservices journey - Round 2
 
Microservices with Apache Camel, Docker and Fabric8 v2
Microservices with Apache Camel, Docker and Fabric8 v2Microservices with Apache Camel, Docker and Fabric8 v2
Microservices with Apache Camel, Docker and Fabric8 v2
 
Exploring Twitter's Finagle technology stack for microservices
Exploring Twitter's Finagle technology stack for microservicesExploring Twitter's Finagle technology stack for microservices
Exploring Twitter's Finagle technology stack for microservices
 
Advanced web application architecture Way2Web
Advanced web application architecture Way2WebAdvanced web application architecture Way2Web
Advanced web application architecture Way2Web
 
Microservices Journey Summer 2017
Microservices Journey Summer 2017Microservices Journey Summer 2017
Microservices Journey Summer 2017
 
Advanced web application architecture - Talk
Advanced web application architecture - TalkAdvanced web application architecture - Talk
Advanced web application architecture - Talk
 
The Hardest Part of Microservices: Calling Your Services
The Hardest Part of Microservices: Calling Your ServicesThe Hardest Part of Microservices: Calling Your Services
The Hardest Part of Microservices: Calling Your Services
 
Patterns of the Lambda Architecture -- 2015 April -- Hadoop Summit, Europe
Patterns of the Lambda Architecture -- 2015 April -- Hadoop Summit, EuropePatterns of the Lambda Architecture -- 2015 April -- Hadoop Summit, Europe
Patterns of the Lambda Architecture -- 2015 April -- Hadoop Summit, Europe
 
PHX DevOps Days: Service Mesh Landscape
PHX DevOps Days: Service Mesh LandscapePHX DevOps Days: Service Mesh Landscape
PHX DevOps Days: Service Mesh Landscape
 
Case Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets at Cisco IntercloudCase Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
 
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco IntercloudCase Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud
 
DDD, CQRS and testing with ASP.Net MVC
DDD, CQRS and testing with ASP.Net MVCDDD, CQRS and testing with ASP.Net MVC
DDD, CQRS and testing with ASP.Net MVC
 
DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...
DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...
DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...
 
Microservices for java architects it-symposium-2015-09-15
Microservices for java architects it-symposium-2015-09-15Microservices for java architects it-symposium-2015-09-15
Microservices for java architects it-symposium-2015-09-15
 
DjangoCon 2010 Scaling Disqus
DjangoCon 2010 Scaling DisqusDjangoCon 2010 Scaling Disqus
DjangoCon 2010 Scaling Disqus
 
API World: The service-mesh landscape
API World: The service-mesh landscapeAPI World: The service-mesh landscape
API World: The service-mesh landscape
 
Ratpack and Grails 3
 Ratpack and Grails 3 Ratpack and Grails 3
Ratpack and Grails 3
 
Ratpack and Grails 3
Ratpack and Grails 3Ratpack and Grails 3
Ratpack and Grails 3
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing data
 
Serverless solution architecture in AWS
Serverless solution architecture in AWSServerless solution architecture in AWS
Serverless solution architecture in AWS
 

More from Christian Posta

Kubernetes Ingress to Service Mesh (and beyond!)
Kubernetes Ingress to Service Mesh (and beyond!)Kubernetes Ingress to Service Mesh (and beyond!)
Kubernetes Ingress to Service Mesh (and beyond!)
Christian Posta
 
Role of edge gateways in relation to service mesh adoption
Role of edge gateways in relation to service mesh adoptionRole of edge gateways in relation to service mesh adoption
Role of edge gateways in relation to service mesh adoption
Christian Posta
 
API Gateways are going through an identity crisis
API Gateways are going through an identity crisisAPI Gateways are going through an identity crisis
API Gateways are going through an identity crisis
Christian Posta
 

More from Christian Posta (20)

Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Understanding Wireguard, TLS and Workload Identity
Understanding Wireguard, TLS and Workload IdentityUnderstanding Wireguard, TLS and Workload Identity
Understanding Wireguard, TLS and Workload Identity
 
Compliance and Zero Trust Ambient Mesh
Compliance and Zero Trust Ambient MeshCompliance and Zero Trust Ambient Mesh
Compliance and Zero Trust Ambient Mesh
 
Cilium + Istio with Gloo Mesh
Cilium + Istio with Gloo MeshCilium + Istio with Gloo Mesh
Cilium + Istio with Gloo Mesh
 
Multi-cluster service mesh with GlooMesh
Multi-cluster service mesh with GlooMeshMulti-cluster service mesh with GlooMesh
Multi-cluster service mesh with GlooMesh
 
Multicluster Kubernetes and Service Mesh Patterns
Multicluster Kubernetes and Service Mesh PatternsMulticluster Kubernetes and Service Mesh Patterns
Multicluster Kubernetes and Service Mesh Patterns
 
Cloud-Native Application Debugging with Envoy and Service Mesh
Cloud-Native Application Debugging with Envoy and Service MeshCloud-Native Application Debugging with Envoy and Service Mesh
Cloud-Native Application Debugging with Envoy and Service Mesh
 
Kubernetes Ingress to Service Mesh (and beyond!)
Kubernetes Ingress to Service Mesh (and beyond!)Kubernetes Ingress to Service Mesh (and beyond!)
Kubernetes Ingress to Service Mesh (and beyond!)
 
The Truth About the Service Mesh Data Plane
The Truth About the Service Mesh Data PlaneThe Truth About the Service Mesh Data Plane
The Truth About the Service Mesh Data Plane
 
Deep Dive: Building external auth plugins for Gloo Enterprise
Deep Dive: Building external auth plugins for Gloo EnterpriseDeep Dive: Building external auth plugins for Gloo Enterprise
Deep Dive: Building external auth plugins for Gloo Enterprise
 
Role of edge gateways in relation to service mesh adoption
Role of edge gateways in relation to service mesh adoptionRole of edge gateways in relation to service mesh adoption
Role of edge gateways in relation to service mesh adoption
 
Navigating the service mesh landscape with Istio, Consul Connect, and Linkerd
Navigating the service mesh landscape with Istio, Consul Connect, and LinkerdNavigating the service mesh landscape with Istio, Consul Connect, and Linkerd
Navigating the service mesh landscape with Istio, Consul Connect, and Linkerd
 
Chaos Debugging for Microservices
Chaos Debugging for MicroservicesChaos Debugging for Microservices
Chaos Debugging for Microservices
 
Leveraging Envoy Proxy and GraphQL to Lower the Risk of Monolith to Microserv...
Leveraging Envoy Proxy and GraphQL to Lower the Risk of Monolith to Microserv...Leveraging Envoy Proxy and GraphQL to Lower the Risk of Monolith to Microserv...
Leveraging Envoy Proxy and GraphQL to Lower the Risk of Monolith to Microserv...
 
Service-mesh options with Linkerd, Consul, Istio and AWS AppMesh
Service-mesh options with Linkerd, Consul, Istio and AWS AppMeshService-mesh options with Linkerd, Consul, Istio and AWS AppMesh
Service-mesh options with Linkerd, Consul, Istio and AWS AppMesh
 
Intro Istio and what's new Istio 1.1
Intro Istio and what's new Istio 1.1Intro Istio and what's new Istio 1.1
Intro Istio and what's new Istio 1.1
 
API Gateways are going through an identity crisis
API Gateways are going through an identity crisisAPI Gateways are going through an identity crisis
API Gateways are going through an identity crisis
 
KubeCon NA 2018: Evolution of Integration and Microservices with Service Mesh...
KubeCon NA 2018: Evolution of Integration and Microservices with Service Mesh...KubeCon NA 2018: Evolution of Integration and Microservices with Service Mesh...
KubeCon NA 2018: Evolution of Integration and Microservices with Service Mesh...
 
Intro to Knative
Intro to KnativeIntro to Knative
Intro to Knative
 
Making sense of microservices, service mesh, and serverless
Making sense of microservices, service mesh, and serverlessMaking sense of microservices, service mesh, and serverless
Making sense of microservices, service mesh, and serverless
 

Recently uploaded

AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
Alluxio, Inc.
 
Mastering Windows 7 A Comprehensive Guide for Power Users .pdf
Mastering Windows 7 A Comprehensive Guide for Power Users .pdfMastering Windows 7 A Comprehensive Guide for Power Users .pdf
Mastering Windows 7 A Comprehensive Guide for Power Users .pdf
mbmh111980
 
JustNaik Solution Deck (stage bus sector)
JustNaik Solution Deck (stage bus sector)JustNaik Solution Deck (stage bus sector)
JustNaik Solution Deck (stage bus sector)
Max Lee
 

Recently uploaded (20)

SQL Injection Introduction and Prevention
SQL Injection Introduction and PreventionSQL Injection Introduction and Prevention
SQL Injection Introduction and Prevention
 
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdfA Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
 
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
 
APVP,apvp apvp High quality supplier safe spot transport, 98% purity
APVP,apvp apvp High quality supplier safe spot transport, 98% purityAPVP,apvp apvp High quality supplier safe spot transport, 98% purity
APVP,apvp apvp High quality supplier safe spot transport, 98% purity
 
AI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in MichelangeloAI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in Michelangelo
 
5 Reasons Driving Warehouse Management Systems Demand
5 Reasons Driving Warehouse Management Systems Demand5 Reasons Driving Warehouse Management Systems Demand
5 Reasons Driving Warehouse Management Systems Demand
 
Mastering Windows 7 A Comprehensive Guide for Power Users .pdf
Mastering Windows 7 A Comprehensive Guide for Power Users .pdfMastering Windows 7 A Comprehensive Guide for Power Users .pdf
Mastering Windows 7 A Comprehensive Guide for Power Users .pdf
 
How to pick right visual testing tool.pdf
How to pick right visual testing tool.pdfHow to pick right visual testing tool.pdf
How to pick right visual testing tool.pdf
 
Top Mobile App Development Companies 2024
Top Mobile App Development Companies 2024Top Mobile App Development Companies 2024
Top Mobile App Development Companies 2024
 
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
 
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product Updates
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product UpdatesGraphSummit Stockholm - Neo4j - Knowledge Graphs and Product Updates
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product Updates
 
COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...
COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...
COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...
 
How to install and activate eGrabber JobGrabber
How to install and activate eGrabber JobGrabberHow to install and activate eGrabber JobGrabber
How to install and activate eGrabber JobGrabber
 
AI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning Framework
 
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
 
The Impact of PLM Software on Fashion Production
The Impact of PLM Software on Fashion ProductionThe Impact of PLM Software on Fashion Production
The Impact of PLM Software on Fashion Production
 
INGKA DIGITAL: Linked Metadata by Design
INGKA DIGITAL: Linked Metadata by DesignINGKA DIGITAL: Linked Metadata by Design
INGKA DIGITAL: Linked Metadata by Design
 
10 Essential Software Testing Tools You Need to Know About.pdf
10 Essential Software Testing Tools You Need to Know About.pdf10 Essential Software Testing Tools You Need to Know About.pdf
10 Essential Software Testing Tools You Need to Know About.pdf
 
JustNaik Solution Deck (stage bus sector)
JustNaik Solution Deck (stage bus sector)JustNaik Solution Deck (stage bus sector)
JustNaik Solution Deck (stage bus sector)
 
StrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdf
StrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdfStrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdf
StrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdf
 

The hardest part of microservices: your data

Editor's Notes

  1. Speed!!!.... As in performance? Or scale? What is this speed thing all about? This is a very different way of thinking about IT. Typically IT is optimized for Cost. Many parts of the business are. We’re not product companies anymore…. IT was traditionally used to transform otherwise paper processes or manual processes. And to support things like CRM, Accounting, Procurement, etc. Internally supporting. But now companies are using IT to deliver value through services. In fact, startups, are finding out to deliver value through digital channels and are quickly disrupting old guard enterprise corporations. We are service companies. Services require bi-direction/omni-directional interactions, communication with our customers. Creating value is done with customers. The faster you can get things to market the faster you can see what works and what doesn’t. We don’t know what will work up front. We don’t know what will deliver business value up front. We need to discover it. What we want is to build an organization that’s able to experiment, fail fast, and iterate on what does work. We basically want IT to drive outcomes that deliver business value. And we want to go fast.
  2. The discovery of what’s important, and the experimentation process leads us to want to find business value. We want to quickly find out the things that don’t work and minimize the cost it takes to do these experiments. This transformation is a process, not something that happens over night, and not something you can copy. You’ll even note that each organization is different in how it can go about this process; each needs to balance speed, safety, and business value for itself.
  3. Get back to first principles. Focus on principles, patterns, methodologies. Tools will help, but you cannot start with tools.
  4. Autonomy….
  5. What is it? Who defines it? Who owns that definition? Who owns the instances of it. How do I get it? How do I not miss something? And if I do solve these questions what does the architecture look like? Bunch of point to point connections? Lots of big up front design? Lots of contracts and governance? These things tend to break autonomy. Let’s explore this a bit and see what problems we run into with data in a microservices world.
  6. Now, understanding the domain, understanding the data model, an understanding where the boundaries is complex stuff. It cannot be solved with technology alone. Let me give you a simple example…. This seems like a simple, even absurd question. It’s really not. This one simple question can illustrate how ambiguous contradictory our language is with respect to understanding “real life” We cannot understand how to store a representation of a perceived “real life” unless we can describe it in plain language without ambiguity. What is a book and how do we represent it? We need to first understand what “one book is.” If an author has written two books we may expect to see two “book” entries represented in some kind of editor database or bibliographic database as “two records”. If they’ve written two editions of the one book, does it appear multiple times? Or do we model that as a revision record? Or maybe each edition gets its own record? If a library or bookstore has 5 copies each of two books, do they record that as books? Is a book really just a copy of a book? Or do we call it a copy? But maybe a library inventory system may just refer to copies as books as well for the purposes of counting the total number of physical books. So we could come up with “copies” and “books” but they can be used interchangeably depending on who’s asking. Or maybe for some systems a Book is something with a hard cover because they want to exclude periodicals, magazines, ebooks? So a “manual” may be classified as a book in some contexts, but not others. Or maybe a book is just a bounded physical unit? But some novels are so long they are actually broken down into two physical elements. Maybe labeled Volume I and Volume II. So then are those separate books or one Book? Or the opposite; maybe multiple novel compositions are bounded together into a single physical unit; but really they are individual works. So we could have a system where the author has written one book, it’s broken into two phsycial volumes, also known as books, and each volume has 5 copies each for a total of ten books. So what is one book? It gets incredibly confusing. So now just try to wrap your head around a Customer, or Patient, or Account, etc. The same polysemes exist there, only far more convoluted and ambiguous. And now when we talk about microservices we talk about the big ball of mud and how we cannot change part of it without re-deploying others. That is the easy part. Reconciling all of the different implicit usages of domain models across multiple contexts slammed into a single application is the hard part.
  7. A is the book checkout system -- book is a physical copy (second edition, volume I, II, etc all individual books) B is the book search system – book can be individual works where a composition may be multiple books and volumes I and II are all the same book C is the checkout reporting engine – a book is what A thinks is a book D is a recommendation engine – a book isn’t even a book, it’s an “interest” which has a mapping between books Book recommendations (D) D also wants to consume messages from A. But the things we need to do in our service is sufficiently different that we want to change the language. A and B have a translation and are coordinating, D is not coordinating and will build an AC that will do the translation. And it’s nobody else’s business how this works. In this case, maybe we have a Book recommendation engine that also reads what gets checked out and whom. Maybe D has some more complicated models it uses for describing recommendations. It wants to use A’s data, but it doesn’t want to conform to A’s domain model. It builds an Anti-Corruption layer to keep its model pure and that can do the translation between its and A’s models.
  8. An order. A customer. An account. A return. A claim. A discount?
  9. This concept of defining language, developing models to describe a domain, implementing those models, enforcing assertions, etc all happen within a certain context, and that context is vitally important in software. In common language, we are smart enough to resolve these types of language conflicts within a sentence because of its context. The computer doesn’t have this context. We have to make it explicit. And any context needs to have explicit boundaries. This model needs to be “useful” ie, it should be able to be implemented. Try to establish a model that’s both useful for discussion with the domain experts and is implementable. There are infinite ways to model/think about something. Balance both masters with the model you choose. Large complex domains may need multiple models. And really the only way to understand a language and model is within a certain context. That context should have boundaries so it doesn’t bleed or force others to bleed definitions and semantics. Bounded context: within this space, this is the context of the language. This is what it means and it’s not ambiguous. Central thing about a model is the language you create to express the prblem and solution very crisply. Need clear language and need boundaries. Anti corruption layers are translations between the different models that may exist in multiple bounded contexts. They keep an internal model consistent and pure without bleeding across the boundaries. Bounded contexts tend to be “self contained systems” themselves with a complete vertical stack of the software including UI, business logic, data models, and database. They tend to not share databases across multiple models.
  10. We store our data inside this thing…
  11. We store our data inside this thing… Do we really, as developers, understand how to properly use this thing?
  12. Put it all into one big database… No, seriously.. Just do this for your applications. You’ll save yourself a lot of trouble. Focus on
  13. Traditional Databases have tremendous flexibiluty, safety, etc.
  14. Traditional Databases have tremendous flexibiluty, safety, etc.
  15. Traditional Databases have tremendous flexibiluty, safety, etc.
  16. This concept of defining language, developing models to describe a domain, implementing those models, enforcing assertions, etc all happen within a certain context, and that context is vitally important in software. In common language, we are smart enough to resolve these types of language conflicts within a sentence because of its context. The computer doesn’t have this context. We have to make it explicit. And any context needs to have explicit boundaries. This model needs to be “useful” ie, it should be able to be implemented. Try to establish a model that’s both useful for discussion with the domain experts and is implementable. There are infinite ways to model/think about something. Balance both masters with the model you choose. Large complex domains may need multiple models. And really the only way to understand a language and model is within a certain context. That context should have boundaries so it doesn’t bleed or force others to bleed definitions and semantics. Bounded context: within this space, this is the context of the language. This is what it means and it’s not ambiguous. Central thing about a model is the language you create to express the prblem and solution very crisply. Need clear language and need boundaries. Anti corruption layers are translations between the different models that may exist in multiple bounded contexts. They keep an internal model consistent and pure without bleeding across the boundaries. Bounded contexts tend to be “self contained systems” themselves with a complete vertical stack of the software including UI, business logic, data models, and database. They tend to not share databases across multiple models.
  17. This concept of defining language, developing models to describe a domain, implementing those models, enforcing assertions, etc all happen within a certain context, and that context is vitally important in software. In common language, we are smart enough to resolve these types of language conflicts within a sentence because of its context. The computer doesn’t have this context. We have to make it explicit. And any context needs to have explicit boundaries. This model needs to be “useful” ie, it should be able to be implemented. Try to establish a model that’s both useful for discussion with the domain experts and is implementable. There are infinite ways to model/think about something. Balance both masters with the model you choose. Large complex domains may need multiple models. And really the only way to understand a language and model is within a certain context. That context should have boundaries so it doesn’t bleed or force others to bleed definitions and semantics. Bounded context: within this space, this is the context of the language. This is what it means and it’s not ambiguous. Central thing about a model is the language you create to express the prblem and solution very crisply. Need clear language and need boundaries. Anti corruption layers are translations between the different models that may exist in multiple bounded contexts. They keep an internal model consistent and pure without bleeding across the boundaries. Bounded contexts tend to be “self contained systems” themselves with a complete vertical stack of the software including UI, business logic, data models, and database. They tend to not share databases across multiple models.
  18. Business Agility!!! Journey … Understand them Test them Change them Different pace – rate of change is key !!! Agile business… Before going to far, we should have a definition about what microservices are. When we talk about microservices, we talk about breaking up complicated, potentially really large systems, whatever they may be, into smaller components. We break them into smaller components so we can understand them individually, test them individually, scale them and ultimately change them at a different pace than the rest of the system. You can imagine having to please every master in a monoithic environment can slow/bogg things down and inhibit change. Which as we discussed in the beginning is the key here. We need to be able to work on systems that can change with the rest of the business as it’s getting even more competitive and disruptive. One of the keys to this flexibility and ability to change is to focus on autonomy. Systems should be designed to be more autonomous so that changes don’t affect other downstream systems, faults don’t ripple across into cascading failures etc. The more dependencies we have (on other systems, protocols, shared libraries, databases, etc) the harder it can be to make changes. So we talk about services having and owning their own data, chosing the right technology for their function, and conciously enforcing modularity through APIs and contracts. Autonomy is key here But autonomoy of systems includes autonomoy of teams as well. Microservices can be a means to an end for a company serious about investing into their digital experiecne they provide to customers. It’s not in and of itself the end goal. It’s part of a digital transformation that encompasses all parts of the organization.
  19. This concept of defining language, developing models to describe a domain, implementing those models, enforcing assertions, etc all happen within a certain context, and that context is vitally important in software. In common language, we are smart enough to resolve these types of language conflicts within a sentence because of its context. The computer doesn’t have this context. We have to make it explicit. And any context needs to have explicit boundaries. This model needs to be “useful” ie, it should be able to be implemented. Try to establish a model that’s both useful for discussion with the domain experts and is implementable. There are infinite ways to model/think about something. Balance both masters with the model you choose. Large complex domains may need multiple models. And really the only way to understand a language and model is within a certain context. That context should have boundaries so it doesn’t bleed or force others to bleed definitions and semantics. Bounded context: within this space, this is the context of the language. This is what it means and it’s not ambiguous. Central thing about a model is the language you create to express the prblem and solution very crisply. Need clear language and need boundaries. Anti corruption layers are translations between the different models that may exist in multiple bounded contexts. They keep an internal model consistent and pure without bleeding across the boundaries. Bounded contexts tend to be “self contained systems” themselves with a complete vertical stack of the software including UI, business logic, data models, and database. They tend to not share databases across multiple models.
  20. This concept of defining language, developing models to describe a domain, implementing those models, enforcing assertions, etc all happen within a certain context, and that context is vitally important in software. In common language, we are smart enough to resolve these types of language conflicts within a sentence because of its context. The computer doesn’t have this context. We have to make it explicit. And any context needs to have explicit boundaries. This model needs to be “useful” ie, it should be able to be implemented. Try to establish a model that’s both useful for discussion with the domain experts and is implementable. There are infinite ways to model/think about something. Balance both masters with the model you choose. Large complex domains may need multiple models. And really the only way to understand a language and model is within a certain context. That context should have boundaries so it doesn’t bleed or force others to bleed definitions and semantics. Bounded context: within this space, this is the context of the language. This is what it means and it’s not ambiguous. Central thing about a model is the language you create to express the prblem and solution very crisply. Need clear language and need boundaries. Anti corruption layers are translations between the different models that may exist in multiple bounded contexts. They keep an internal model consistent and pure without bleeding across the boundaries. Bounded contexts tend to be “self contained systems” themselves with a complete vertical stack of the software including UI, business logic, data models, and database. They tend to not share databases across multiple models.
  21. One large database! We should focus on how we design our data models so that they can be sharded and distributed…. Focus on transactions, etc not 2PC
  22. One large database! We should focus on how we design our data models so that they can be sharded and distributed…. Focus on transactions, etc not 2PC
  23. One large database! We should focus on how we design our data models so that they can be sharded and distributed…. Focus on transactions, etc not 2PC
  24. One large database! We should focus on how we design our data models so that they can be sharded and distributed…. Focus on transactions, etc not 2PC
  25. One large database! We should focus on how we design our data models so that they can be sharded and distributed…. Focus on transactions, etc not 2PC
  26. One large database! We should focus on how we design our data models so that they can be sharded and distributed…. Focus on transactions, etc not 2PC
  27. One large database! We should focus on how we design our data models so that they can be sharded and distributed…. Focus on transactions, etc not 2PC
  28. This concept of defining language, developing models to describe a domain, implementing those models, enforcing assertions, etc all happen within a certain context, and that context is vitally important in software. In common language, we are smart enough to resolve these types of language conflicts within a sentence because of its context. The computer doesn’t have this context. We have to make it explicit. And any context needs to have explicit boundaries. This model needs to be “useful” ie, it should be able to be implemented. Try to establish a model that’s both useful for discussion with the domain experts and is implementable. There are infinite ways to model/think about something. Balance both masters with the model you choose. Large complex domains may need multiple models. And really the only way to understand a language and model is within a certain context. That context should have boundaries so it doesn’t bleed or force others to bleed definitions and semantics. Bounded context: within this space, this is the context of the language. This is what it means and it’s not ambiguous. Central thing about a model is the language you create to express the prblem and solution very crisply. Need clear language and need boundaries. Anti corruption layers are translations between the different models that may exist in multiple bounded contexts. They keep an internal model consistent and pure without bleeding across the boundaries. Bounded contexts tend to be “self contained systems” themselves with a complete vertical stack of the software including UI, business logic, data models, and database. They tend to not share databases across multiple models.
  29. In this scenario we may have established our boundaries… our customer profile service has taken an update to customer preferences. A customer and its profile/preferences may be modeled in other services like our recommendation engine, our master customer SOR, our social alerting engine, etc. And we need to update some important information... So the systems that are interested in this data must first be defined and implemented in code ahead of time. Adding a new system requires changes. Additionally, these downstream systems are not transactional... So if there are errors somewhere, then it’s up to the application try and decide what action to take... And while deciding that action, the application could fail.. And no state is stored about where it left off.. And now we’re in an inconsistent state.
  30. You could try adding compensation logic and stateful tracking of this locally.. And it’s also great practice to implement idempotent consumers.. The problem with is there could be “read uncommitted” issues like dirty reads or dirty writes that happen downstream because of this, and a compensation now gets much more complicated.
  31. We could just try emmitting events and say “whenever this happens over here, we just update a message queue”… now we have to try get consensus between the two systems. This can be expensive, as consensus tends to be, and you also suffer from availability issues..2PC is an anti-availability protocol.
  32. 2PC is perfectly fine when consensus is required, though have to consider the drawbacks. 2PC requires operational overhead to manage the TX Log of the tx manager. Also, you can run into issues with deadlocks when holding locks too long. You can also end up in heuristic situations where one side unilaterally rollsback. Now you need human intervention and reconcilliation logic. People poopoo 2PC, but it may be appropriate in some situations..
  33. Another situation that will tend to come up is identifying boundaries around IO and read/write patterns. How do we get the writes over to the read database? Do we do 2PC from the application? Do we use a message queue?
  34. What about the so called N+1 problem? Where we interact with downstream services, or maybe we take on events and need to enrich them with additional metadata. For example, we may want to group and sort a set of customers that fall into a certain criteria for specific recommendations, and we need to enrich the customer with additional preferences.. So we query for the customer list and then we loop through and enrich each customer.. Can downstream systems sustain this kind of rapid invocation? If they can, are you exposed at all to udnerlying storage incnsistenecies and concurrency issues? Do they just try create bulk APIs? And those APIs are inconsistent across providers (pagination, missed processing of singular elements, etc)
  35. So maybe we set up a cache in front of the service to alleviate the penalties of calling downstream services rapidly… and now what sort of stale data can you deal with? Bounded staleness? How do you handle cache eviction?
  36. We expect some levels of consistency and we need to be able to withstand faults because we know faults occur… But brewer says we can only pick 2 out of consistency, availability, partition tolerance...
  37. Consistency models… the set of allowable histories of operations We say that we read what we wrote
  38. Now, a process is allowed to read the most recently written value from any process, not just itself. The register becomes a place of coordination between two processes; they share state. We relax our model and say when we read, we read the value at the time of the read and take into account other processes writes…
  39. Now, a process is allowed to read the most recently written value from any process, not just itself. The register becomes a place of coordination between two processes; they share state. We relax our model and say when we read, we read the value at the time of the read and take into account other processes writes…
  40. Somewhere (or some node… a database, a service, a set of databases in a cluster) where there is an appearance of an order that is immediately visible to everyone viewing it. Moreover, linearizability’s time bounds guarantee that those changes will be visible to other participants after the operation completes. Hence, linearizability prohibits stale reads
  41. Different users will see my message at different times–but each user will see my operations in order. Once seen, a post shouldn’t disappear.
  42. Different users will see my message at different times–but each user will see my operations in order. Once seen, a post shouldn’t disappear.
  43. Different users will see my message at different times–but each user will see my operations in order. Once seen, a post shouldn’t disappear.
  44. Different users will see my message at different times–but each user will see my operations in order. Once seen, a post shouldn’t disappear.
  45. Score keeper – needs to read most up to date version of the score… cannot do an eventually consistent read (bounded staleness, consistent prefix, monotonic read).. BUT could do a read my writes read Umpire needs to do a strict consistent read to determine at the 9th inning or any afterward whether he can end the game
  46. Show a diagram of consistency related to performance..
  47. Different users will see my message at different times–but each user will see my operations in order. Once seen, a post shouldn’t disappear.
  48. Show a diagram of consistency related to performance..
  49. Different users will see my message at different times–but each user will see my operations in order. Once seen, a post shouldn’t disappear.
  50. Different users will see my message at different times–but each user will see my operations in order. Once seen, a post shouldn’t disappear.
  51. Different users will see my message at different times–but each user will see my operations in order. Once seen, a post shouldn’t disappear.
  52. Different users will see my message at different times–but each user will see my operations in order. Once seen, a post shouldn’t disappear.
  53. Different users will see my message at different times–but each user will see my operations in order. Once seen, a post shouldn’t disappear.