The hardest part of microservices: your data



Microservices architecture is a very powerful way to build scalable systems optimized for speed of change. To do this, we need to build independent, autonomous services, which by definition tend to minimize dependencies on other systems. One of the tenets of microservices, and a way to minimize dependencies, is “a service should own its own database”. Unfortunately, this is a lot easier said than done. Why? Because: your data. We’ve been dealing with data in information systems for five decades, so isn’t this a solved problem? Yes and no. A lot of the lessons learned are still very relevant. Traditionally, we application developers have accepted the practice of using relational databases and relying on all of their safety guarantees without question. But as we build service architectures that span more than one database (by design, as with microservices), things get harder. If data about a customer changes in one database, how do we reconcile that with other databases (especially where the data storage may be heterogeneous)? For developers focused on the traditional enterprise, not only do we have to build fast-changing systems that are surrounded by legacy systems, but the domains (finance, insurance, retail, etc.) are also incredibly complicated. Just copying what Netflix does for microservices may or may not be useful. So how do we develop and reason about the boundaries in our system to reduce complexity in the domain? In this talk, we’ll explore these problems and see how Domain-Driven Design helps us grapple with domain complexity. We’ll see how DDD concepts like Entities and Aggregates help us reason about boundaries based on use cases, and how transactions are affected. Once we can identify our transactional boundaries, we can more carefully adjust our needs from the CAP theorem to scale out and achieve truly autonomous systems with strictly ordered eventual consistency.
We’ll see how technologies like Apache Kafka, Apache Camel and Debezium.io can help build the backbone for these types of systems. We’ll even explore the details of a working example that brings all of this together.


Full slide deck here:
http://bit.ly/ceposta-hardest-part

Twitter: @christianposta
Blog: http://blog.christianposta.com
Email: christian@redhat.com
Christian Posta
Principal Architect – Red Hat
• Author “Microservices for Java Developers”
• Committer/contributor: Apache Camel, Apache ActiveMQ,
Fabric8.io, Apache Kafka, Debezium.io, et al.
• Worked with a large microservices, web-scale “unicorn”
company

Free download @ http://developers.redhat.com

People try to copy Netflix, but they can only
copy what they see. They copy the
results, not the process.
Adrian Cockcroft, former Chief Cloud Architect, Netflix

“Microservices” is about optimizing… for speed.

• Maybe it doesn’t matter so much… What
we really care about is speed, reduced
time to value, and business outcomes.
• Maybe a data-driven approach is a better
way to answer this question...
Are you doing microservices?

• Number of features accepted
• % of features completed
• User satisfaction
• Feature cycle time
• Defects discovered after deployment
• Customer lifetime value (future profit as a result of the relationship with the
customer) https://en.wikipedia.org/wiki/Customer_lifetime_value
• Revenue per feature
• Mean time to recovery
• % improvement in SLA
• Number of changes
• Number of user complaints, recommendations, suggestions
• % favorable rating in surveys
• % of users using each feature
• % reduction in error rates
• Average number of transactions per user
• MANY MORE!
Are you doing microservices?
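Several of these measures fall straight out of timestamps you probably already record. Below is a minimal Java 17 sketch of two of them, mean time to recovery and feature cycle time; the `Incident` and `Feature` records and the `DeliveryMetrics` helper are hypothetical, and reporting the median cycle time (rather than the mean) is an assumption of this sketch, not something the slide prescribes.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Hypothetical incident record, e.g. exported from an incident tracker.
record Incident(Instant detectedAt, Instant resolvedAt) {
    Duration timeToRecover() { return Duration.between(detectedAt, resolvedAt); }
}

// Hypothetical feature record: when work started and when it reached production.
record Feature(Instant workStartedAt, Instant deployedAt) {
    Duration cycleTime() { return Duration.between(workStartedAt, deployedAt); }
}

final class DeliveryMetrics {
    private DeliveryMetrics() {}

    // Mean time to recovery: average of (resolved - detected) over all incidents.
    static Duration meanTimeToRecovery(List<Incident> incidents) {
        if (incidents.isEmpty()) return Duration.ZERO;
        Duration total = incidents.stream()
                .map(Incident::timeToRecover)
                .reduce(Duration.ZERO, Duration::plus);
        return total.dividedBy(incidents.size());
    }

    // Median feature cycle time; one stuck feature skews a median less than a mean.
    static Duration medianCycleTime(List<Feature> features) {
        if (features.isEmpty()) return Duration.ZERO;
        List<Duration> sorted = features.stream().map(Feature::cycleTime).sorted().toList();
        int mid = sorted.size() / 2;
        return sorted.size() % 2 == 1
                ? sorted.get(mid)
                : sorted.get(mid - 1).plus(sorted.get(mid)).dividedBy(2);
    }
}
```

Tracking these over time, rather than counting services, is the data-driven answer the slide is pointing at.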

Book checkout / purchase
Title search
Recommendations
Weekly reporting

Focus on domain models, not data models
• Break things into smaller,
understandable models
• Surround a model and its
“context” with a boundary
• Implement the model in code
or get a new model
• Explicitly map between
different contexts
• Model transactional
boundaries as aggregates
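To make “explicitly map between different contexts” concrete, here is a minimal Java 17 sketch assuming two hypothetical bounded contexts from the bookstore example: checkout, which models a customer with a name and shipping address, and recommendations, which only needs a reader id and interests. The record names and the translator class are illustrative, not from the talk.

```java
import java.util.List;

// Hypothetical checkout context: cares about who pays and where books ship.
record CheckoutCustomer(String customerId, String fullName, String shippingAddress) {}

// Hypothetical recommendations context: cares only about what a reader likes.
record ReaderProfile(String readerId, List<String> favoriteGenres) {}

// The mapping lives at the boundary, so neither model leaks into the other context.
final class CheckoutToRecommendationsTranslator {
    private CheckoutToRecommendationsTranslator() {}

    static ReaderProfile toReaderProfile(CheckoutCustomer customer, List<String> genresFromPurchases) {
        // Only the identifier crosses the boundary; name and address stay in checkout.
        return new ReaderProfile(customer.customerId(), List.copyOf(genresFromPurchases));
    }
}
```

Keeping the translation in one explicit place means a change to the checkout model touches the translator, not every consumer of customer data.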

Aggregates
• Use the domain to lead you to invariant rules across your domain
model
• Model the invariants and their associated entities/value objects as
“aggregates”
• Aggregates focus on transactional boundaries (i.e., transactional in
the sense of the “A” in ACID)
• Individual aggregates are transactionally consistent
• Aggregates use relaxed consistency models between aggregates
(e.g., something like the Actor model?)
• Bounded Contexts use relaxed consistency models between
boundaries
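A minimal Java 17 sketch of an aggregate, assuming a hypothetical `Order` aggregate whose invariant is “the order total never exceeds the customer’s credit limit”. Every change goes through the aggregate root, so the invariant is checked inside one transactional boundary; the customer, a different aggregate, is referenced only by id. The names and the credit-limit rule are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Value object: immutable and compared by value.
record OrderLine(String bookId, int quantity, long unitPriceCents) {
    OrderLine {
        if (quantity <= 0) throw new IllegalArgumentException("quantity must be positive");
    }
    long totalCents() { return quantity * unitPriceCents; }
}

// Aggregate root: the only entry point for changing an order.
final class Order {
    private final String orderId;
    private final String customerId;       // another aggregate, referenced by id only
    private final long creditLimitCents;   // hypothetical input to the invariant
    private final List<OrderLine> lines = new ArrayList<>();

    Order(String orderId, String customerId, long creditLimitCents) {
        this.orderId = orderId;
        this.customerId = customerId;
        this.creditLimitCents = creditLimitCents;
    }

    // Invariant enforced atomically: a line is added only if the new total stays within the limit.
    void addLine(OrderLine line) {
        long newTotal = totalCents() + line.totalCents();
        if (newTotal > creditLimitCents) {
            throw new IllegalStateException("credit limit exceeded for order " + orderId);
        }
        lines.add(line);
    }

    long totalCents() {
        return lines.stream().mapToLong(OrderLine::totalCents).sum();
    }

    List<OrderLine> lines() { return Collections.unmodifiableList(lines); }

    String customerId() { return customerId; }
}
```

Persisting one `Order` per transaction keeps it consistent; anything that spans several orders or customers would have to rely on the relaxed consistency between aggregates described above.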

Stick with these conveniences as long as you can.
Seriously.

But ...
• Load/size is too great to fit on one box
• Modules/use cases have different read/write
characteristics
• Queries/joins are getting too complex
• Security issues
• Lots of conflicting changes to the model/schema
• Need denormalized, optimized indexing engines
• We can live with eventual consistency (whatever that
really means)

From here on out, what we’re saying is
“thank you old reliable, awesome database…
we’ve got it from here”…

We need to understand something about the data
inside our services and the data outside our services.
https://msdn.microsoft.com/en-us/library/ms954587.aspx

We’re now building a full-fledged distributed system.
Some things to remember…

Plan for failures.
Build concepts of time, delay,
network, and failures into the
design as a first-class citizen.
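One small, concrete way to make time and failure first-class: never make a remote call without a deadline and an explicit degraded answer. A minimal Java 17 sketch using `CompletableFuture.orTimeout` (available since Java 9); `RecommendationsClient`, the simulated latency, and the “bestsellers” fallback are hypothetical stand-ins for a real remote call.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

final class RecommendationsClient {
    private final long simulatedLatencyMillis;

    RecommendationsClient(long simulatedLatencyMillis) {
        this.simulatedLatencyMillis = simulatedLatencyMillis;
    }

    // Hypothetical remote call, simulated by sleeping on a pool thread.
    private CompletableFuture<List<String>> fetchRecommendations(String customerId) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(simulatedLatencyMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return List.of("personalized-for-" + customerId);
        });
    }

    // The deadline and the fallback are part of the design, not an afterthought.
    List<String> recommendationsOrDefault(String customerId, long timeoutMillis) {
        return fetchRecommendations(customerId)
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                .exceptionally(failure -> List.of("bestsellers"))   // on timeout or any error
                .join();
    }
}
```

The caller decides up front what “good enough” looks like when the network misbehaves, instead of blocking indefinitely.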

How do you “read” data, and how do you “update” data?

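// Option 1: commit first, then publish
tx.begin()
c = retrieveCustomer()
c.addNewAddress(address)
tx.add(c)
tx.commit()
// crash here: the change is saved, but the event is never published
publishAddressChange(address, c.id)

// Option 2: publish first, then commit
tx.begin()
c = retrieveCustomer()
c.addNewAddress(address)
tx.add(c)
publishAddressChange(address, c.id)
// commit fails here: the event announced a change that never happened
tx.commit()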
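Both orderings of the address-change pseudocode leave a window where the database and the published events disagree. One common way to close it, and a pattern that change-data-capture tools such as Debezium support, is to record the event in the same transaction as the state change (an “outbox”) and publish it afterwards from committed data. The Java 17 sketch below is an in-memory simplification under stated assumptions: `OutboxStore` is hypothetical, a single lock stands in for a database transaction, and `relay` stands in for a CDC connector or poller. It is not Debezium’s API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical event, stored next to the state change it describes.
record AddressChanged(long sequence, String customerId, String address) {}

final class OutboxStore {
    private final Map<String, List<String>> addressesByCustomer = new HashMap<>();
    private final List<AddressChanged> outbox = new ArrayList<>();
    private long nextSequence = 1;
    private int relayCursor = 0;   // index of the next outbox entry to publish

    // State change and event are written under one lock, standing in for one
    // database transaction: either both are recorded or neither is.
    synchronized void addAddress(String customerId, String address) {
        addressesByCustomer.computeIfAbsent(customerId, id -> new ArrayList<>()).add(address);
        outbox.add(new AddressChanged(nextSequence++, customerId, address));
    }

    // Publishes committed events in order. If the publisher throws, the cursor
    // does not move past that event, so it is retried on the next call
    // (at-least-once delivery; consumers should tolerate duplicates).
    synchronized int relay(Consumer<AddressChanged> publisher) {
        int published = 0;
        while (relayCursor < outbox.size()) {
            publisher.accept(outbox.get(relayCursor));
            relayCursor++;
            published++;
        }
        return published;
    }

    synchronized List<String> addresses(String customerId) {
        return List.copyOf(addressesByCustomer.getOrDefault(customerId, List.of()));
    }
}
```

Because events are only ever read from committed data, a crash can delay publication but cannot lose or invent an event; the price is that consumers see the change eventually, in order, rather than immediately.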


https://secure.phabricator.com/book/phabcontrib/article/n_plus_one/
getBulkHats()
getBulkHatsForCatsExcept()
wellReallyIJustWantCertainHats()
justExecuteThisSqlForMe()
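The linked Phabricator article describes the N+1 query problem; across service boundaries every “query” becomes a network round trip. A minimal Java 17 sketch, with a hypothetical `HatService` that counts calls (the `getBulkHats` name echoes the slide but is illustrative), compares looking up hats one cat at a time with a single bulk call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical remote service; each method call stands for one network round trip.
final class HatService {
    private final Map<String, String> hatByCat = new HashMap<>();
    private int roundTrips = 0;

    HatService(Map<String, String> data) { hatByCat.putAll(data); }

    String getHat(String catId) {
        roundTrips++;
        return hatByCat.get(catId);
    }

    Map<String, String> getBulkHats(List<String> catIds) {
        roundTrips++;
        Map<String, String> result = new HashMap<>();
        for (String id : catIds) result.put(id, hatByCat.get(id));
        return result;
    }

    int roundTrips() { return roundTrips; }
}

final class HatQueries {
    private HatQueries() {}

    // N calls here, plus the "+1" call that produced catIds in the first place.
    static List<String> hatsOneByOne(HatService service, List<String> catIds) {
        List<String> hats = new ArrayList<>();
        for (String id : catIds) hats.add(service.getHat(id));
        return hats;
    }

    // One call no matter how many cats there are.
    static List<String> hatsInBulk(HatService service, List<String> catIds) {
        Map<String, String> byCat = service.getBulkHats(catIds);
        List<String> hats = new ArrayList<>();
        for (String id : catIds) hats.add(byCat.get(id));
        return hats;
    }
}
```

The catch, which `getBulkHatsForCatsExcept()` and `justExecuteThisSqlForMe()` poke fun at, is that each new read pattern tends to demand yet another bulk endpoint on the owning service.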

For our reads and writes, we need some “consistency”.

What is consistency?
The history of past operations we observe as
a reader of the data

We need reads and writes. But we expect failures. This
is starting to sound like a distributed-systems theorem
I’ve heard…

CAP tells us to pick 2: Consistency, Availability,
Partition Tolerance
CAP is a bad way to think about this.

Linearizable (strict) consistency
CAP - C

Consistency models…
https://en.wikipedia.org/wiki/Consistency_model
• Strict consistency (Linearizability)
• Sequential consistency
• Causal consistency
• Processor consistency
• PRAM consistency (FIFO)
• Bounded staleness consistency
• Monotonic read consistency
• Monotonic write consistency
• Read your writes consistency
• Eventual consistency
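To make one of these models concrete: read-your-writes can be layered on top of eventually consistent replicas if the client remembers the version of its last write and only accepts reads from a replica that has applied at least that version. A minimal Java 17 sketch; `Replica`, `Session`, the version scheme (primary’s applied version plus one, i.e. a single primary), and the fall-back-to-primary policy are all hypothetical simplifications.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A replica applies writes in version order but may lag behind the primary.
final class Replica {
    private final Map<String, String> data = new HashMap<>();
    private long appliedVersion = 0;

    void apply(long version, String key, String value) {
        data.put(key, value);
        appliedVersion = version;
    }

    long appliedVersion() { return appliedVersion; }

    String get(String key) { return data.get(key); }
}

// Client session enforcing read-your-writes on top of lagging followers.
final class Session {
    private final Replica primary;
    private final List<Replica> followers;
    private long lastWrittenVersion = 0;

    Session(Replica primary, List<Replica> followers) {
        this.primary = primary;
        this.followers = followers;
    }

    void write(String key, String value) {
        long version = primary.appliedVersion() + 1;   // single primary assigns versions
        primary.apply(version, key, value);            // followers catch up later (not modeled)
        lastWrittenVersion = version;
    }

    // Any follower that has seen our latest write will do; otherwise read the primary.
    String read(String key) {
        for (Replica r : followers) {
            if (r.appliedVersion() >= lastWrittenVersion) return r.get(key);
        }
        return primary.get(key);
    }
}
```

Monotonic reads work the same way, except the session tracks the highest version it has read rather than written.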

Can we really use relaxed consistency models?

Tradeoffs to make with read consistency and
performance

Replicated Data Consistency Explained Through Baseball
(Doug Terry)
https://www.microsoft.com/en-us/research/publication/replicated-data-consistency-explained-through-baseball/
• What consistency model do you need, depending on what
role you’re playing?
• What consistency model are you willing to pay for?
• Official scorekeeper? (Linearizability or read-my-writes)
• Umpire? (Linearizability)
• Sports writer? (Bounded staleness, eventual consistency)
• Radio updates? (Monotonic reads, bounded staleness)
• Statistician? (Bounded staleness)
• Friends in the pub? (Eventual consistency)
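Bounded staleness, as used for the sports writer and the statistician, can be sketched as: accept a replica’s answer only if its data is no older than a chosen bound, otherwise pay for a read from the primary. A minimal Java 17 sketch, assuming each replica snapshot records when it last synced; `Snapshot`, `BoundedStalenessReader`, and the fallback policy are hypothetical, and the current time is passed in to keep the example deterministic.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;

// Hypothetical read-only copy that remembers when it last synced with the primary.
record Snapshot(Map<String, Integer> scores, Instant syncedAt) {}

final class BoundedStalenessReader {
    private final Map<String, Integer> primary;
    private final Duration maxStaleness;

    BoundedStalenessReader(Map<String, Integer> primary, Duration maxStaleness) {
        this.primary = primary;
        this.maxStaleness = maxStaleness;
    }

    // Cheap replica read if it is fresh enough at time 'now'; otherwise the expensive primary read.
    int score(String team, Snapshot replica, Instant now) {
        Duration age = Duration.between(replica.syncedAt(), now);
        if (age.compareTo(maxStaleness) <= 0) {
            return replica.scores().getOrDefault(team, 0);
        }
        return primary.getOrDefault(team, 0);
    }
}
```

The bound is the knob the slide is asking about: the looser it is, the more reads the cheap replicas can absorb.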

Maybe we can use a relaxed consistency model for some
of those previously mentioned use cases…

Internet companies created their own tools
to help with this (some open source!)
• Yelp – MySQL Streamer
https://github.com/Yelp/mysql_streamer
• LinkedIn – Databus
https://github.com/linkedin/databus
• Zendesk – Maxwell
https://github.com/zendesk/maxwell

WePay uses Debezium
https://wecode.wepay.com/posts/streaming-databases-in-realtime-with-mysql-debezium-kafka
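Debezium emits one change event per row change, with an envelope carrying the row state `before` and `after` the change and an `op` code (`c` create, `u` update, `d` delete, `r` snapshot read). A consuming service can fold those events into its own local copy of the data. To stay self-contained, this Java 17 sketch models the envelope as a plain record instead of deserializing Kafka messages; `ChangeEvent`, `CustomerReadModel`, and the `id` key column are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Simplified stand-in for a Debezium change event envelope.
record ChangeEvent(String op, Map<String, Object> before, Map<String, Object> after) {}

// Hypothetical downstream service keeping its own copy of customer rows, keyed by "id".
final class CustomerReadModel {
    private final Map<Object, Map<String, Object>> rows = new HashMap<>();

    void apply(ChangeEvent event) {
        switch (event.op()) {
            // create, snapshot read, and update all carry the new row state in "after"
            case "c", "r", "u" -> rows.put(event.after().get("id"), new HashMap<>(event.after()));
            // delete events have no "after"; the key comes from "before"
            case "d" -> rows.remove(event.before().get("id"));
            default -> { /* other ops (e.g. truncate) are ignored in this sketch */ }
        }
    }

    Optional<Map<String, Object>> find(Object id) {
        return Optional.ofNullable(rows.get(id));
    }
}
```

Because events for a given key arrive in order on a Kafka partition, applying them one by one yields the ordered eventual consistency described in the abstract.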

Thanks for listening! Time for demo?

What's hot

Introduction to snowflake

Sunil Gurav

Cloud arch patterns

Corey Huinker

The Parquet format is one of the most widely used columnar storage formats in the Spark ecosystem. Given that I/O is expensive and that the storage layer is the entry point for any query execution, understanding the intricacies of your storage format is important for optimizing your workloads. As an introduction, we will provide context around the format, covering the basics of structured data formats and the underlying physical data storage model alternatives (row-wise, columnar and hybrid). Given this context, we will dive deeper into specifics of the Parquet format: representation on disk, physical data organization (row-groups, column-chunks and pages) and encoding schemes. Now equipped with sufficient background knowledge, we will discuss several performance optimization opportunities with respect to the format: dictionary encoding, page compression, predicate pushdown (min/max skipping), dictionary filtering and partitioning schemes. We will learn how to combat the evil that is ‘many small files’, and will discuss the open-source Delta Lake format in relation to this and Parquet in general. This talk serves both as an approachable refresher on columnar storage as well as a guide on how to leverage the Parquet format for speeding up analytical workloads in Spark using tangible tips and tricks.

The Parquet Format and Performance Optimization Opportunities

Databricks

Parquet is a very popular column based format. Spark can automatically filter useless data using parquet file statistical data by pushdown filters, such as min-max statistics. On the other hand, Spark user can enable Spark parquet vectorized reader to read parquet files by batch. These features improve Spark performance greatly and save both CPU and IO. Parquet is the default data format of data warehouse in Bytedance. In practice, we find that parquet pushdown filters work poorly resulting in reading too much unnecessary data for statistical data has no discrimination across parquet row groups(column data is out of order when writing to parquet files by ETL jobs).

Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...

Databricks

Mysql data replication

Tuấn Ngô

AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English

Omid Vahdaty

CI CD Basics

Prabhu Ramkumar

Flink Forward San Francisco 2022. Resource Elasticity is a frequently requested feature in Apache Flink: Users want to be able to easily adjust their clusters to changing workloads for resource efficiency and cost saving reasons. In Flink 1.13, the initial implementation of Reactive Mode was introduced, later releases added more improvements to make the feature production ready. In this talk, we’ll explain scenarios to deploy Reactive Mode to various environments to achieve autoscaling and resource elasticity. We’ll discuss the constraints to consider when planning to use this feature, and also potential improvements from the Flink roadmap. For those interested in the internals of Flink, we’ll also briefly explain how the feature is implemented, and if time permits, conclude with a short demo. by Robert Metzger

Autoscaling Flink with Reactive Mode

Flink Forward

"Restoring local state in Kafka Streams applications is indispensable for recovering after a failure or for moving stream processors between Kafka Streams clients. However, restoration has a reputation for being operationally problematic, because a Streams client occupied with restoration of some stream processors blocks other stream processors that are ready from processing new records. When the state is large this can have a considerable impact on the overall throughput of the Streams application. Additionally, when failures interrupt restoration, restoration restarts from the beginning, thus negatively impacting throughput further. In this talk, we will explain how Kafka Streams currently restores local state and processes records. We will show how we decouple processing from restoring by moving restoration to a dedicated thread and how throughput profits from this decoupling. We will present how we avoid restarting restoration from the beginning after a failure. Finally, we will talk about the concurrency and performance problems that we had to overcome and we will present benchmarks that show the effects of our improvements."

Restoring Restoration's Reputation in Kafka Streams with Bruno Cadonna & Luca...

HostedbyConfluent

Bucketing is a partitioning technique that can improve performance in certain data transformations by avoiding data shuffling and sorting. The general idea of bucketing is to partition, and optionally sort, the data based on a subset of columns while it is written out (a one-time cost), while making successive reads of the data more performant for downstream jobs if the SQL operators can make use of this property. Bucketing can enable faster joins (i.e. single stage sort merge join), the ability to short circuit in FILTER operation if the file is pre-sorted over the column in a filter predicate, and it supports quick data sampling. In this session, you’ll learn how bucketing is implemented in both Hive and Spark. In particular, Patil will describe the changes in the Catalyst optimizer that enable these optimizations in Spark for various bucketing scenarios. Facebook’s performance tests have shown bucketing to improve Spark performance from 3-5x faster when the optimization is enabled. Many tables at Facebook are sorted and bucketed, and migrating these workloads to Spark have resulted in a 2-3x savings when compared to Hive. You’ll also hear about real-world applications of bucketing, like loading of cumulative tables with daily delta, and the characteristics that can help identify suitable candidate jobs that can benefit from bucketing.

Hive Bucketing in Apache Spark with Tejas Patil

Databricks

Comcast is one of the leading providers of communications, entertainment, and cable products and services. At the heart of it is Comcast RDK providing the backbone of telemetry to the industry. RDK (Reference Design Kit) is pre-bundled opensource firmware for a complete home platform covering video, broadband and IoT devices. RDK team at Comcast analyzes petabytes of data, collected every 15 minutes from 70 million devices (video and broadband and IoT devices) installed in customer homes. They run ETL and aggregation pipelines and publish analytical dashboards on a daily basis to reduce customer calls and firmware rollout. The analysis is also used to calculate WIFI happiness index which is a critical KPI for Comcast customer experience. In addition to this, RDK team also does release tracking by analyzing the RDK firmware quality. SQL Analytics allows customers to operate a lakehouse architecture that provides data warehousing performance at data lake economics for up to 4x better price/performance for SQL workloads than traditional cloud data warehouses. We present the results of the “Test and Learn” with SQL Analytics and the delta engine that we worked in partnership with the Databricks team. We present a quick demo introducing the SQL native interface, the challenges we faced with migration, The results of the execution and our journey of productionizing this at scale.

SQL Analytics Powering Telemetry Analysis at Comcast

Databricks

(Alex Mironov, Booking.com) Kafka Summit SF 2018 Since its original introduction at Booking.com, Apache Kafka and overall concept of real-time data streaming have come a long way from being a complicated novelty to a common tool, used by a multitude of internal users ranging in their importance from the ad-hoc consumers to business-critical services powering up our property search engine. Over the course of this talk we’ll dive deep into how a relatively small team of SREs is successfully managing a multi-cluster, multi-tenant setup of Kafka and its surrounding ecosystem capable of transporting millions of messages per day. We’ll discuss challenges they faced along their way while building this platform and take a close look not only at application but also at architectural-level decisions they made to overcome them. Surely, we will also review what kind of tooling and automation team is using to stay sane during the day and sleep well during the night.

Data Streaming Ecosystem Management at Booking.com

confluent

Neo4j Training Cypher

Max De Marzi

Using ClickHouse for Experimentation

Gleb Kanterov

Deep Dive into the New Features of Apache Spark 3.0

Databricks

by Mahesh Pakal, AWS PostgreSQL is a powerful, enterprise class open source object-relational database system with an emphasis on extensibility and standards-compliance. PostgreSQL boasts many sophisticated features and runs stored procedures in more than a dozen programming languages. We’ll explore the advantages and limitations of PostgreSQL, examples of where it is best suited for use, and examples of who is using PostgreSQL to power their applications.

PostgreSQL

Amazon Web Services

Optimizing spark jobs through a true understanding of spark core. Learn: What is a partition? What is the difference between read/shuffle/write partitions? How to increase parallelism and decrease output files? Where does shuffle data go between stages? What is the "right" size for your spark partitions and files? Why does a job slow down with only a few tasks left and never finish? Why doesn't adding nodes decrease my compute time?

Apache Spark Core—Deep Dive—Proper Optimization

Databricks

Machine learning is overhyped nowadays. There is a strong belief that this area is exclusively for data scientists with a deep mathematical background that leverage Python (scikit-learn, Theano, Tensorflow, etc.) or R ecosystem and use specific tools like Matlab, Octave or similar. Of course, there is a big grain of truth in this statement, but we, Java engineers, also can take the best of machine learning universe from an applied perspective by using our native language and familiar frameworks like Apache Spark. During this introductory presentation, you will get acquainted with the simplest machine learning tasks and algorithms, like regression, classification, clustering, widen your outlook and use Apache Spark MLlib to distinguish pop music from heavy metal and simply have fun. Source code: https://github.com/tmatyashovsky/spark-ml-samples Design by Yarko Filevych: http://filevych.com/

Introduction to ML with Apache Spark MLlib

Taras Matyashovsky

Running Apache Spark on Kubernetes: Best Practices and Pitfalls

Databricks

Want to see a high-level overview of the products in the Microsoft data platform portfolio in Azure? I’ll cover products in the categories of OLTP, OLAP, data warehouse, storage, data transport, data prep, data lake, IaaS, PaaS, SMP/MPP, NoSQL, Hadoop, open source, reporting, machine learning, and AI. It’s a lot to digest but I’ll categorize the products and discuss their use cases to help you narrow down the best products for the solution you want to build.

Azure data platform overview

James Serra

What's hot (20)

Introduction to snowflake

Cloud arch patterns

The Parquet Format and Performance Optimization Opportunities

Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...

Mysql data replication

AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English

CI CD Basics

Autoscaling Flink with Reactive Mode

Restoring Restoration's Reputation in Kafka Streams with Bruno Cadonna & Luca...

Hive Bucketing in Apache Spark with Tejas Patil

SQL Analytics Powering Telemetry Analysis at Comcast

Data Streaming Ecosystem Management at Booking.com

Neo4j Training Cypher

Using ClickHouse for Experimentation

Deep Dive into the New Features of Apache Spark 3.0

PostgreSQL

Apache Spark Core—Deep Dive—Proper Optimization

Introduction to ML with Apache Spark MLlib

Running Apache Spark on Kubernetes: Best Practices and Pitfalls

Azure data platform overview

Similar to The hardest part of microservices: your data

A microservices journey - Round 2

Christian Posta

Microservices with Apache Camel, Docker and Fabric8 v2

Christian Posta

Exploring Twitter's Finagle technology stack for microservices

💡 Tomasz Kogut

How to: - Design a clean domain model - Model your application's use cases as application services - Connect those well-designed layers to the world outside Protecting your high quality domain model can be accomplished by applying a so-called ports & adapters or hexagonal architecture. Some of the keywords for this talk: aggregate design, domain events, application services, commands, queries and events, layered architecture, ports & adapters, hexagonal architecture.

Advanced web application architecture Way2Web

Matthias Noback

We consider a microservices architecture to achieve an end goal, not because it's "the cool thing to do". Every organization looking to adopt this architecture must realize (and adhere) to a set of foundational principles. Guided by those principles, we can correctly choose the technology to help support a microservices architecture and meet our end goals. This talk explains those core principles and gives you the tools needed for your microservices journey.

Microservices Journey Summer 2017

Christian Posta

Advanced web application architecture - Talk

Matthias Noback

The Hardest Part of Microservices: Calling Your Services

Christian Posta

Patterns of the Lambda Architecture -- 2015 April -- Hadoop Summit, Europe

Flip Kromer

PHX DevOps Days: Service Mesh Landscape

Christian Posta

Case Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud

Rick Bilodeau

Case Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud

Streamsets Inc.

DDD, CQRS and testing with ASP.Net MVC

Andy Butland

One of the main issues with ML and DL deployment is finding the right way to train and operationalize the model within the company. Serverless approach for deep learning provides simple, scalable, affordable yet reliable architecture. The challenge of this approach is to keep in mind certain limitations in CPU, GPU and RAM, and organize training and inference of your model. My presentation will show how to utilize services like Amazon SageMaker, AWS Batch, AWS Fargate, AWS Lambda and AWS Step Functions to organize deep learning workflows.

DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...

Rustem Feyzkhanov

Microservices for java architects it-symposium-2015-09-15

Derek Ashmore

DjangoCon 2010 Scaling Disqus

zeeg

API World: The service-mesh landscape

Christian Posta

Ratpack and Grails 3

GR8Conf

Ratpack and Grails 3

Lari Hotari

Apache Beam: A unified model for batch and stream processing data

DataWorks Summit/Hadoop Summit

Serverless solution architecture in AWS

Runcy Oommen

Similar to The hardest part of microservices: your data (20)

A microservices journey - Round 2

Microservices with Apache Camel, Docker and Fabric8 v2

Exploring Twitter's Finagle technology stack for microservices

Advanced web application architecture Way2Web

Microservices Journey Summer 2017

Advanced web application architecture - Talk

The Hardest Part of Microservices: Calling Your Services

Patterns of the Lambda Architecture -- 2015 April -- Hadoop Summit, Europe

PHX DevOps Days: Service Mesh Landscape

Case Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud

Case Study: Elasticsearch Ingest Using StreamSets @ Cisco Intercloud

DDD, CQRS and testing with ASP.Net MVC

DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...

Microservices for java architects it-symposium-2015-09-15

DjangoCon 2010 Scaling Disqus

API World: The service-mesh landscape

Ratpack and Grails 3

Apache Beam: A unified model for batch and stream processing data

Serverless solution architecture in AWS

Recently uploaded

SQL Injection Introduction and Prevention

Mohammed Fazuluddin

A Comprehensive Appium Guide for Hybrid App Automation Testing.pdf

kalichargn70th171

AI/ML Infra Meetup May. 23, 2024 Organized by Alluxio For more Alluxio Events: https://www.alluxio.io/events/ Speaker: - Lu Qiu (Data & AI Platform Tech Lead, @Alluxio) - Siyuan Sheng (Senior Software Engineer, @Alluxio) Speed and efficiency are two requirements for the underlying infrastructure for machine learning model development. Data access can bottleneck end-to-end machine learning pipelines as training data volume grows and when large model files are more commonly used for serving. For instance, data loading can constitute nearly 80% of the total model training time, resulting in less than 30% GPU utilization. Also, loading large model files for deployment to production can be slow because of slow network or storage read operations. These challenges are prevalent when using popular frameworks like PyTorch, Ray, or HuggingFace, paired with cloud object storage solutions like S3 or GCS, or downloading models from the HuggingFace model hub. In this presentation, Lu and Siyuan will offer comprehensive insights into improving speed and GPU utilization for model training and serving. You will learn: - The data loading challenges hindering GPU utilization - The reference architecture for running PyTorch and Ray jobs while reading data from S3, with benchmark results of training ResNet50 and BERT - Real-world examples of boosting model performance and GPU utilization through optimized data access

AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...

Alluxio, Inc.

APVP,apvp apvp High quality supplier safe spot transport, 98% purity

amy56318795

AI/ML Infra Meetup May. 23, 2024 Organized by Alluxio For more Alluxio Events: https://www.alluxio.io/events/ Speaker: - Eric Wang (Software Engineer, @Uber) Uber has numerous deep learning models, most of which are highly complex with many layers and a vast number of features. Understanding how these models work is challenging and demands significant resources to experiment with various training algorithms and feature sets. With ML explainability, the ML team aims to bring transparency to these models, helping to clarify their predictions and behavior. This transparency also assists the operations and legal teams in explaining the reasons behind specific prediction outcomes. In this talk, Eric Wang will discuss the methods Uber used for explaining deep learning models and how we integrated these methods into the Uber AI Michelangelo ecosystem to support offline explaining.

AI/ML Infra Meetup | ML explainability in Michelangelo

Alluxio, Inc.

5 Reasons Driving Warehouse Management Systems Demand

Canary7-Warehouse Management System

"Introduction to Windows 7" serves as the foundational chapter in our guide, setting the stage for understanding the key features and functionalities of this operating system. Windows 7, released by Microsoft in 2009, quickly became one of the most popular and widely used versions of Windows due to its user-friendly interface, stability, and performance improvements over its predecessor, Windows Vista. This chapter begins by providing an overview of the Windows 7 operating system, highlighting its key attributes and improvements compared to earlier versions of Windows. It introduces users to the visual enhancements such as Aero Glass, the revamped taskbar (also known as the Superbar), and the redesigned Start menu, which all contribute to a more intuitive and streamlined user experience. Furthermore, "Introduction to Windows 7" delves into the architecture and system requirements of the operating system, helping users understand what hardware specifications are necessary for optimal performance. It covers topics such as processor requirements, RAM, disk space, and graphics capabilities, ensuring that readers have a clear 3 understanding of the hardware prerequisites for running Windows 7 smoothly. Additionally, this chapter explores the various editions of Windows 7, including Home Premium, Professional, Ultimate, and Enterprise, outlining the differences between them and helping users choose the edition that best suits their needs and requirements. Moreover, "Introduction to Windows 7" provides an overview of the installation process, guiding users through the steps required to install or upgrade to Windows 7 on their computers. It covers topics such as preparing for installation, choosing the installation type (upgrade or custom), partitioning disks, and configuring initial settings. In summary, "Introduction to Windows 7" serves as a comprehensive primer for users who are new to the operating system or seeking to refresh their understanding. 
By familiarizing themselves with the core concepts and features of Windows 7, readers can lay a solid foundation for exploring more advanced topics covered in subsequent chapters of this guide.

Mastering Windows 7 A Comprehensive Guide for Power Users .pdf

mbmh111980

How to pick right visual testing tool.pdf

Testgrid.io

Top Mobile App Development Companies 2024

XongoLab Technologies LLP

The hardest part of microservices: your data

2. Full slide deck here: http://bit.ly/ceposta-hardest-part

3. Twitter: @christianposta Blog: http://blog.christianposta.com Email: christian@redhat.com Christian Posta, Principal Architect, Red Hat • Author “Microservices for Java Developers” • Committer/contributor Apache Camel, Apache ActiveMQ, Fabric8.io, Apache Kafka, Debezium.io, et al. • Has worked with large microservices, web-scale, unicorn companies

4. Free download @ http://developers.redhat.com

7. “People try to copy Netflix, but they can only copy what they see. They copy the results, not the process.” (Adrian Cockcroft, former Chief Cloud Architect, Netflix)

8. “Microservices” is about optimizing… for speed.

9. Are you doing microservices? • Maybe it doesn’t matter so much… What we really care about is speed, reduced time to value, and business outcomes. • Maybe a data-driven approach is a better way to answer this question…

10. Are you doing microservices? • Number of features accepted • % of features completed • User satisfaction • Feature cycle time • Defects discovered after deployment • Customer lifetime value (future profit as a result of the relationship with the customer) https://en.wikipedia.org/wiki/Customer_lifetime_value • Revenue per feature • Mean time to recovery • % improvement in SLA • Number of changes • Number of user complaints, recommendations, suggestions • % favorable rating in surveys • % of users using which features • % reduction in error rates • Average number of transactions per user • MANY MORE!

11. How does your company go fast?

12. Manage dependencies.

13. Data is a major dependency.

14. Wait. What is data?

15. What is one “thing”?

16. Book checkout / purchase • Title search • Recommendations • Weekly reporting

18. Focus on domain models, not data models • Break things into smaller, understandable models • Surround a model and its “context” with a boundary • Implement the model in code or get a new model • Explicitly map between different contexts • Model transactional boundaries as aggregates

19. Aggregates • Use the domain to lead you to invariant rules across your domain model • Model the invariants and their associated entities/value objects as “aggregates” • Aggregates focus on transactional boundaries (i.e., transactional in the sense of the “A”, atomicity, in ACID) • Individual aggregates are transactionally consistent • Aggregates use relaxed consistency models between one another (i.e., something like the Actor model?) • Bounded Contexts use relaxed consistency models between boundaries
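To make the aggregate idea concrete, here is a minimal sketch assuming a hypothetical library `Patron` aggregate whose invariant is "a patron may hold at most `MAX_LOANS` copies at once". The names and the limit of 3 are illustrative, not from the talk. The aggregate root is the only way to change a patron's loans, so the invariant is checked inside one transactional boundary. Other aggregates learn about the change through the returned event instead of being updated in the same transaction.

```python
from dataclasses import dataclass, field

MAX_LOANS = 3  # hypothetical invariant: at most 3 copies checked out per patron


@dataclass(frozen=True)
class BookCheckedOut:
    """Domain event for other aggregates/contexts, published after the change commits."""
    patron_id: str
    copy_id: str


class LoanLimitExceeded(Exception):
    pass


@dataclass
class Patron:
    """Aggregate root: every change to a patron's loans goes through this object."""
    patron_id: str
    loans: set[str] = field(default_factory=set)

    def check_out(self, copy_id: str) -> BookCheckedOut:
        # The invariant lives inside the aggregate, so one local transaction
        # that saves this Patron is enough to keep it true.
        if copy_id in self.loans:
            raise ValueError(f"{copy_id} is already checked out by {self.patron_id}")
        if len(self.loans) >= MAX_LOANS:
            raise LoanLimitExceeded(self.patron_id)
        self.loans.add(copy_id)
        # Other aggregates (e.g. the copy's availability) react to this event
        # later, under a relaxed consistency model.
        return BookCheckedOut(self.patron_id, copy_id)
```

A fourth `check_out` call raises `LoanLimitExceeded` without touching anything outside the aggregate.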

28. Stick with these conveniences as long as you can. Seriously.

29. But ... • Load/size is too great to fit on one box • Modules/use cases have different read/write characteristics • Queries/joins are getting too complex • Security issues • Lots of conflicting changes to the model/schema • Need denormalized, optimized indexing engines • We can live with eventual consistency (whatever that really means)

30. From here on out, what we’re saying is “thank you old reliable, awesome database… we’ve got it from here”…

31. Kinda looks like a combinatorial mess…

32. “A microservice has its own database”

34. How do we deal with data in this world?

35. We need to understand something about the data inside our services and the data outside our services. https://msdn.microsoft.com/en-us/library/ms954587.aspx

36. Data inside a service

38. Data outside a service

41. We’re now building a full-fledged distributed system. Some things to remember…

49. Plan for failures. Build concepts of time, delay, network, and failures into the design as a first-class citizen.

50. How do you “read” data and how do you “update” data?

51. Option A: commit, then publish
    tx.begin()
    c = retrieveCustomer()
    c.addNewAddress(address)
    tx.add(c)
    tx.commit()
    publishAddressChange(address, c.id)  // crash before this line: change is saved, event is lost

52. Option B: publish, then commit
    tx.begin()
    c = retrieveCustomer()
    c.addNewAddress(address)
    tx.add(c)
    publishAddressChange(address, c.id)
    tx.commit()  // if this fails, the event announced a change that never happened
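Both orderings are a "dual write": the commit and the publish are two separate actions, so a crash between them leaves the database and the message stream disagreeing. One common way out is the transactional outbox: write the event into a table in the same local transaction as the state change, and let a separate relay publish committed rows. This is a swapped-in illustration, not the talk's code. It assumes SQLite stands in for the service's database and a Python list stands in for a Kafka topic; the table and function names are hypothetical. A CDC tool such as Debezium can play the relay's role by reading those rows from the database's transaction log instead of polling.

```python
import json
import sqlite3
from typing import Callable

# Assumption: SQLite stands in for the service's own database.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE customer_address (customer_id TEXT, address TEXT);
CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, topic TEXT, payload TEXT);
""")


def add_new_address(customer_id: str, address: str) -> None:
    # One local ACID transaction covers the state change AND the event,
    # so either both are committed or neither is.
    with db:  # commits on success, rolls back on exception
        db.execute("INSERT INTO customer_address VALUES (?, ?)", (customer_id, address))
        db.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("address-changes", json.dumps({"customer_id": customer_id, "address": address})),
        )


def relay(last_id: int, publish: Callable[[str, dict], None]) -> int:
    """Publish committed outbox rows with id > last_id; return the new offset.

    Delivery is at-least-once: if the relay crashes after publishing but before
    its offset is saved, rows are sent again, so consumers should be idempotent.
    """
    rows = db.execute(
        "SELECT id, topic, payload FROM outbox WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, json.loads(payload))
        last_id = row_id
    return last_id


# Usage: a list stands in for the Kafka topic (assumption).
topic_log = []
add_new_address("c-42", "1 Main St")
offset = relay(0, lambda topic, event: topic_log.append((topic, event)))
```

Because the relay only ever sees committed rows, neither failure mode from the two slides can produce an event without a change, or a change without an event.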

57. Separate reads and writes (CQRS)
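A minimal CQRS sketch with in-memory stand-ins (all class and method names are hypothetical). Commands go to a write model that validates them and records events. A separate read model, a projection, is built from those events and serves queries. In a real deployment the events would flow through something like Kafka, so the read side lags the write side. That lag is the relaxed consistency the following slides discuss.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressAdded:
    customer_id: str
    address: str


class CustomerWriteModel:
    """Command side: enforces rules and records events; never answers queries."""

    def __init__(self) -> None:
        self.events: list[AddressAdded] = []  # stand-in for a log/topic (assumption)
        self._addresses: dict[str, set[str]] = defaultdict(set)

    def add_address(self, customer_id: str, address: str) -> None:
        if address in self._addresses[customer_id]:
            return  # nothing changed, so no event
        self._addresses[customer_id].add(address)
        self.events.append(AddressAdded(customer_id, address))


class CustomersByCityView:
    """Query side: a denormalized view shaped for one read use case."""

    def __init__(self) -> None:
        self._by_city: dict[str, set[str]] = defaultdict(set)
        self.position = 0  # number of events already applied

    def catch_up(self, events: list[AddressAdded]) -> None:
        for event in events[self.position:]:
            city = event.address.rsplit(",", 1)[-1].strip()  # assumes "street, city"
            self._by_city[city].add(event.customer_id)
        self.position = len(events)

    def customers_in(self, city: str) -> set[str]:
        return set(self._by_city.get(city, set()))


writes = CustomerWriteModel()
view = CustomersByCityView()
writes.add_address("c-1", "1 Main St, Springfield")
before = view.customers_in("Springfield")  # empty: the view has not caught up yet
view.catch_up(writes.events)
after = view.customers_in("Springfield")   # {"c-1"}
```

Each side can now use its own storage, indexes and scaling, at the cost of the view being only as fresh as its last `catch_up`.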

58. https://secure.phabricator.com/book/phabcontrib/article/n_plus_one/

59. https://secure.phabricator.com/book/phabcontrib/article/n_plus_one/ • getBulkHats() • getBulkHatsForCatsExcept() • wellReallyIJustWantCertainHats() • justExecuteThisSqlForMe()

61. For our reads and writes, we need some “consistency”.

62. What is consistency? The history of past operations we observe as a reader of the data

65. We need reads and writes. But we expect failures. This is starting to sound like a distributed-systems theorem I’ve heard…

66. CAP tells us to pick 2: Consistency, Availability, Partition Tolerance. CAP is a bad way to think about this.

68. Linearizable (strict) consistency: the “C” in CAP

69. Sequential consistency

70. Monotonic reads consistency

71. Eventual consistency

72. Consistency models… https://en.wikipedia.org/wiki/Consistency_model • Strict consistency (Linearizability) • Sequential consistency • Causal consistency • Processor consistency • PRAM consistency (FIFO) • Bounded staleness consistency • Monotonic read consistency • Monotonic write consistency • Read your writes consistency • Eventual consistency
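To make one of these models concrete, here is a minimal sketch of monotonic reads over asynchronously replicated data. It assumes each replica has applied some prefix of a single ordered write log; all names are hypothetical. A client that reads from whichever replica it reaches can see a newer value and then an older one. A session that remembers the highest log position it has observed, and skips replicas behind it, never goes backwards. This gives monotonic reads, not linearizability: the session can still read stale data, just never staler than what it already saw.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    """A copy of one register that has applied the first `applied` writes of a log."""
    name: str
    log: list[str]  # shared, ordered write history (assumption: single leader)
    applied: int

    def read(self) -> tuple[int, Optional[str]]:
        value = self.log[self.applied - 1] if self.applied else None
        return self.applied, value


class MonotonicSession:
    """Client session that never observes a state older than one it already saw."""

    def __init__(self) -> None:
        self.high_water = 0  # highest log position returned to this client

    def read(self, replicas: list[Replica]) -> Optional[str]:
        for replica in replicas:  # e.g. ordered by latency
            position, value = replica.read()
            if position >= self.high_water:
                self.high_water = position
                return value
        raise RuntimeError("no replica is caught up enough; retry later")


log = ["v1", "v2", "v3"]  # three writes to one register, in order
fresh = Replica("a", log, applied=3)
stale = Replica("b", log, applied=1)

# Naive client: hits the fresh replica, then the stale one, and goes backwards.
naive = [fresh.read()[1], stale.read()[1]]  # ["v3", "v1"]

session = MonotonicSession()
first = session.read([fresh])           # "v3"
second = session.read([stale, fresh])   # skips the stale replica, "v3"
```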

73. Can we really use relaxed consistency models?

74. Tradeoffs to make with read consistency and performance

75. Replicated Data Consistency Explained through Baseball (Doug Terry) https://www.microsoft.com/en-us/research/publication/replicated-data-consistency-explained-through-baseball/ • What consistency model do you need, depending on what role you’re playing? • What consistency model are you willing to pay for? • Official score keeper? (Linearizability or read-my-writes) • Umpire? (Linearizability) • Sports writer? (Bounded staleness, eventual consistency) • Radio updates? (Monotonic reads, bounded staleness) • Statistician? (Bounded staleness) • Friends in the pub? (Eventual consistency)

77. Maybe we can use a relaxed consistency model for some of those previously mentioned use cases…

78. Example relaxing consistency…

79. Internet companies created their own tools to help with this (some open source!) • Yelp – MySQL Streamer https://github.com/Yelp/mysql_streamer • LinkedIn – Databus https://github.com/linkedin/databus • Zendesk – Maxwell https://github.com/zendesk/maxwell

80. Meet debezium.io
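Debezium captures row-level changes from a database's transaction log and emits one change event per row. Each event's payload carries `before` and `after` row images, an `op` code (`c` create, `u` update, `d` delete, `r` snapshot read), plus `source` and `ts_ms`. The sketch below applies such events to a local, read-optimized copy in another service. It assumes the Kafka Connect JSON converter with schemas disabled, so each message value is the bare envelope. It also assumes a hypothetical `customers` table keyed by `id`. The Kafka consumer loop is left out, and the messages are plain JSON strings. Because each event carries the full row image and the copy is keyed by primary key, replaying events in order from an earlier offset converges to the same state.

```python
import json
from typing import Optional

# Local copy of a hypothetical upstream `customers` table, keyed by primary key.
customers: dict[int, dict] = {}


def apply_change(message_value: Optional[str]) -> None:
    """Apply one Debezium change event (assumes JSON converter, schemas disabled)."""
    if message_value is None:
        return  # tombstone emitted after a delete; only meaningful for log compaction
    event = json.loads(message_value)
    op = event["op"]
    if op in ("c", "r", "u"):
        row = event["after"]
        customers[row["id"]] = row  # upsert with the full row image
    elif op == "d":
        customers.pop(event["before"]["id"], None)


create = json.dumps({"before": None, "after": {"id": 1, "email": "a@example.com"}, "op": "c", "ts_ms": 1})
update = json.dumps({"before": {"id": 1, "email": "a@example.com"},
                     "after": {"id": 1, "email": "b@example.com"}, "op": "u", "ts_ms": 2})
for message in (create, update, update):  # the update is redelivered
    apply_change(message)
```

The redelivered update leaves `customers` unchanged, which is what makes at-least-once delivery from Kafka tolerable here.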

87. WePay uses Debezium https://wecode.wepay.com/posts/streaming-databases-in-realtime-with-mysql-debezium-kafka

89. Twitter: @christianposta Blog: http://blog.christianposta.com Email: christian@redhat.com Thanks for listening! Time for demo?

Editor's Notes

Speed!!! As in performance? Or scale? What is this speed thing all about? This is a very different way of thinking about IT. Typically IT is optimized for cost, as are many parts of the business. We’re not product companies anymore… IT was traditionally used to transform otherwise paper-based or manual processes, and to support things like CRM, accounting, procurement, etc. It was internally supporting. But now companies are using IT to deliver value through services. In fact, startups are finding ways to deliver value through digital channels and are quickly disrupting old-guard enterprise corporations. We are service companies. Services require bidirectional/omnidirectional interactions and communication with our customers. Creating value is done with customers. The faster you can get things to market, the faster you can see what works and what doesn’t. We don’t know up front what will work. We don’t know up front what will deliver business value. We need to discover it. What we want is to build an organization that’s able to experiment, fail fast, and iterate on what does work. We basically want IT to drive outcomes that deliver business value. And we want to go fast.
The discovery of what’s important, and the experimentation process, lead us to business value. We want to quickly find out the things that don’t work and minimize the cost of these experiments. This transformation is a process, not something that happens overnight, and not something you can copy. Each organization is different in how it goes about this process; each needs to balance speed, safety, and business value for itself.
Get back to first principles. Focus on principles, patterns, methodologies. Tools will help, but you cannot start with tools.
Autonomy….
What is it? Who defines it? Who owns that definition? Who owns the instances of it? How do I get it? How do I make sure I don’t miss something? And if I do solve these questions, what does the architecture look like? A bunch of point-to-point connections? Lots of big up-front design? Lots of contracts and governance? These things tend to break autonomy. Let’s explore this a bit and see what problems we run into with data in a microservices world.
Now, understanding the domain, understanding the data model, and understanding where the boundaries are is complex stuff. It cannot be solved with technology alone. Let me give you a simple example… This seems like a simple, even absurd question. It’s really not. This one simple question illustrates how ambiguous and contradictory our language is with respect to understanding “real life.” We cannot understand how to store a representation of a perceived “real life” unless we can describe it in plain language without ambiguity. What is a book and how do we represent it? We first need to understand what “one book” is. If an author has written two books, we may expect to see two “book” entries represented in some kind of editor database or bibliographic database as “two records.” If they’ve written two editions of one book, does it appear multiple times? Or do we model that as a revision record? Or maybe each edition gets its own record? If a library or bookstore has 5 copies each of two books, do they record those as books? Is a book really just a copy of a book? Or do we call it a copy? But maybe a library inventory system refers to copies as books as well, for the purposes of counting the total number of physical books. So we could come up with “copies” and “books,” but they can be used interchangeably depending on who’s asking. Or maybe for some systems a book is something with a hard cover, because they want to exclude periodicals, magazines, and ebooks? So a “manual” may be classified as a book in some contexts but not others. Or maybe a book is just a bound physical unit? But some novels are so long they are actually broken into two physical volumes, maybe labeled Volume I and Volume II. So are those separate books or one book? Or the opposite: maybe multiple novels are bound together into a single physical unit, but really they are individual works.
So we could have a system where the author has written one book, it’s broken into two physical volumes, also known as books, and each volume has 5 copies, for a total of ten books. So what is one book? It gets incredibly confusing. Now try to wrap your head around a Customer, or Patient, or Account, etc. The same polysemes exist there, only far more convoluted and ambiguous. And when we talk about microservices, we talk about the big ball of mud and how we cannot change part of it without redeploying the rest. That is the easy part. Reconciling all of the different implicit usages of domain models across multiple contexts slammed into a single application is the hard part.
A is the book checkout system: a book is a physical copy (second edition, Volume I, Volume II, etc. are all individual books). B is the book search system: a book is an individual work, where a composition may span multiple books, and Volumes I and II are the same book. C is the checkout reporting engine: a book is whatever A thinks a book is. D is a book recommendation engine: a book isn’t even a book, it’s an “interest” that maps to books. D also wants to consume messages from A, but what D does is sufficiently different that it wants to change the language. A and B have a translation and are coordinating; D is not coordinating and will build an anti-corruption layer (ACL) to do the translation, and it’s nobody else’s business how that works. In this case, maybe our book recommendation engine also reads what gets checked out and by whom. Maybe D has more complicated models it uses for describing recommendations. It wants to use A’s data, but it doesn’t want to conform to A’s domain model. It builds an anti-corruption layer to keep its model pure and to translate between its model and A’s.
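A minimal sketch of D's anti-corruption layer, assuming hypothetical shapes for A's checkout event and D's model. A speaks in physical copies (a copy of a particular edition and volume), while D only cares which work a reader is interested in. The translation lives in one place inside D, so a change to A's model touches only the translator, and A never needs to know D exists. The copy-to-work lookup is an assumed dependency that D would maintain itself (for example, fed from B's catalog).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyCheckedOut:
    """A's language (checkout context): a 'book' is one physical copy."""
    copy_id: str
    edition: int
    volume: int
    patron_id: str


@dataclass(frozen=True)
class Interest:
    """D's language (recommendations): a reader's interest in a work."""
    reader_id: str
    work_id: str
    weight: float


class CheckoutTranslator:
    """Anti-corruption layer: the only code in D that knows A's model."""

    def __init__(self, work_for_copy: dict[str, str]) -> None:
        self._work_for_copy = work_for_copy  # copy id -> work id (assumed lookup)

    def translate(self, event: CopyCheckedOut) -> Interest:
        # Every edition and every volume maps to the same work,
        # so D never has to reason about physical copies.
        work_id = self._work_for_copy[event.copy_id]
        return Interest(reader_id=event.patron_id, work_id=work_id, weight=1.0)


acl = CheckoutTranslator({"copy-7": "war-and-peace", "copy-8": "war-and-peace"})
vol_1 = acl.translate(CopyCheckedOut("copy-7", edition=2, volume=1, patron_id="p-1"))
vol_2 = acl.translate(CopyCheckedOut("copy-8", edition=2, volume=2, patron_id="p-1"))
```

Checking out Volume I and Volume II yields the same `Interest`, which is exactly the meaning of "book" that D wants.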
An order. A customer. An account. A return. A claim. A discount?
This concept of defining language, developing models to describe a domain, implementing those models, enforcing assertions, etc. all happens within a certain context, and that context is vitally important in software. In common language, we are smart enough to resolve these kinds of language conflicts within a sentence because of its context. The computer doesn’t have this context. We have to make it explicit, and any context needs explicit boundaries. The model needs to be “useful,” i.e., it should be implementable. Try to establish a model that’s both useful for discussion with the domain experts and implementable. There are infinite ways to model or think about something; balance both masters with the model you choose. Large, complex domains may need multiple models. And really the only way to understand a language and a model is within a certain context. That context should have boundaries so it doesn’t bleed, or force others to bleed, definitions and semantics. Bounded context: within this space, this is the context of the language; this is what it means, and it’s not ambiguous. The central thing about a model is the language you create to express the problem and solution very crisply. We need clear language and we need boundaries. Anti-corruption layers are translations between the different models that may exist in multiple bounded contexts. They keep an internal model consistent and pure without bleeding across the boundaries. Bounded contexts tend to be “self-contained systems” themselves, with a complete vertical stack of the software including UI, business logic, data models, and database. They tend not to share databases across multiple models.
We store our data inside this thing… Do we really, as developers, understand how to properly use this thing?
Put it all into one big database… No, seriously. Just do this for your applications. You’ll save yourself a lot of trouble. Focus on
Traditional databases offer tremendous flexibility, safety, etc.
Business agility!!! Journey… Understand them. Test them. Change them. Different pace: rate of change is key!!! Agile business… Before going too far, we should have a definition of what microservices are. When we talk about microservices, we talk about breaking up complicated, potentially really large systems, whatever they may be, into smaller components. We break them into smaller components so we can understand them individually, test them individually, scale them, and ultimately change them at a different pace than the rest of the system. You can imagine that having to please every master in a monolithic environment can slow or bog things down and inhibit change, which, as we discussed in the beginning, is the key here. We need to be able to work on systems that can change with the rest of the business as it gets even more competitive and disruptive. One of the keys to this flexibility and ability to change is to focus on autonomy. Systems should be designed to be more autonomous so that changes don’t affect other downstream systems, faults don’t ripple into cascading failures, etc. The more dependencies we have (on other systems, protocols, shared libraries, databases, etc.), the harder it can be to make changes. So we talk about services having and owning their own data, choosing the right technology for their function, and consciously enforcing modularity through APIs and contracts. Autonomy is key here. But autonomy of systems includes autonomy of teams as well. Microservices can be a means to an end for a company serious about investing in the digital experience it provides to customers. It’s not in and of itself the end goal. It’s part of a digital transformation that encompasses all parts of the organization.
One large database! We should focus on how we design our data models so that they can be sharded and distributed… Focus on local transactions, etc., not 2PC.
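A minimal sketch of designing for sharding. It assumes a hypothetical fixed set of four shards and customer-scoped aggregates. Every aggregate is routed by a stable hash of its id, so all rows for one customer live on one shard, and an ordinary local transaction covers each update with no 2PC. SHA-256 is used instead of Python's built-in `hash()`, which is randomized per process for strings. Plain modulo placement moves most keys when the shard count changes; consistent hashing or a lookup directory are common alternatives.

```python
import hashlib

# Hypothetical shard names; in practice these would map to database connections.
SHARDS = ["customers-0", "customers-1", "customers-2", "customers-3"]


def shard_for(aggregate_id: str, shards: list[str] = SHARDS) -> str:
    """Pick the shard that owns an aggregate, identically in every process."""
    digest = hashlib.sha256(aggregate_id.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


# All writes for one customer (profile, addresses, preferences) go to one place,
# so they can share a single local transaction.
home = shard_for("customer-42")
```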
In this scenario we may have established our boundaries… our customer profile service has taken an update to customer preferences. A customer and its profile/preferences may be modeled in other services, like our recommendation engine, our master customer SOR (system of record), our social alerting engine, etc. And we need to update some important information... So the systems that are interested in this data must be defined and implemented in code ahead of time; adding a new system requires changes. Additionally, these downstream systems are not transactional... So if there are errors somewhere, it’s up to the application to try to decide what action to take... And while deciding that action, the application could fail... And no state is stored about where it left off... And now we’re in an inconsistent state.
You could try adding compensation logic and stateful tracking of this locally. It’s also great practice to implement idempotent consumers. The problem with this is that there could be “read uncommitted” issues, like dirty reads or dirty writes, that happen downstream because of this, and a compensation now gets much more complicated.
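A minimal idempotent-consumer sketch. It assumes each message carries a unique `event_id` (a hypothetical field) and that the consumer can record processed ids in the same local transaction as its own state change. SQLite stands in for the consumer's database. The operation, counting preference updates per customer, is deliberately not naturally idempotent, so a redelivered message would double-count without the dedupe check.

```python
import sqlite3

# Assumption: SQLite stands in for the consuming service's own database.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE processed (event_id TEXT PRIMARY KEY);
CREATE TABLE update_counts (customer_id TEXT PRIMARY KEY, n INTEGER NOT NULL);
""")


def handle(event: dict) -> bool:
    """Apply a preference-update event at most once; return False for duplicates."""
    try:
        with db:  # the dedupe record and the state change commit (or roll back) together
            db.execute("INSERT INTO processed VALUES (?)", (event["event_id"],))
            db.execute("INSERT OR IGNORE INTO update_counts VALUES (?, 0)", (event["customer_id"],))
            db.execute(
                "UPDATE update_counts SET n = n + 1 WHERE customer_id = ?",
                (event["customer_id"],),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # already processed: the PRIMARY KEY rejected the event id


first = handle({"event_id": "evt-1", "customer_id": "c-1"})
redelivered = handle({"event_id": "evt-1", "customer_id": "c-1"})  # ignored
```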
We could just try emitting events and say “whenever this happens over here, we just update a message queue”… but now we have to try to get consensus between the two systems (the database and the queue). This can be expensive, as consensus tends to be, and you also suffer from availability issues. 2PC is an anti-availability protocol.
2PC is perfectly fine when consensus is required, though you have to consider the drawbacks. 2PC requires operational overhead to manage the transaction log of the transaction manager. You can also run into deadlocks when holding locks too long. And you can end up in heuristic situations where one side unilaterally rolls back; now you need human intervention and reconciliation logic. People pooh-pooh 2PC, but it may be appropriate in some situations.
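For reference, the core of two-phase commit fits in a few lines. This is a deliberately simplified sketch with hypothetical in-memory participants. It omits the coordinator's durable transaction log, timeouts and recovery, which is exactly the operational overhead described above. It does show why 2PC hurts availability: one participant that votes no (or cannot be reached) aborts the whole transaction, and prepared participants must hold their locks until the coordinator's decision arrives.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical resource manager (e.g. a database or a message broker)."""
    name: str
    can_commit: bool = True
    state: str = "idle"  # idle -> prepared -> committed, or -> aborted

    def prepare(self) -> bool:
        # Phase 1: stage the change durably, hold locks, then vote.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    # Phase 1 (voting): every participant must vote yes.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): a real coordinator writes this decision to its log first;
    # until it arrives, prepared participants are blocked holding their locks.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


database, queue = Participant("database"), Participant("message-queue")
committed = two_phase_commit([database, queue])
```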
Another situation that will tend to come up is identifying boundaries around IO and read/write patterns. How do we get the writes over to the read database? Do we do 2PC from the application? Do we use a message queue?
What about the so-called N+1 problem? This is where we interact with downstream services, or maybe we take in events and need to enrich them with additional metadata. For example, we may want to group and sort a set of customers that fall into certain criteria for specific recommendations, and we need to enrich each customer with additional preferences. So we query for the customer list and then loop through and enrich each customer. Can downstream systems sustain this kind of rapid invocation? If they can, are you exposed at all to underlying storage inconsistencies and concurrency issues? Do they just try to create bulk APIs? And those APIs are inconsistent across providers (pagination, missed processing of individual elements, etc.).
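A minimal sketch of the difference, assuming a hypothetical preferences service that counts how often it is called. After the one query that produced the customer list, enriching N customers one at a time costs N more round trips (the "N + 1"), while a bulk lookup costs one. The bulk signature is invented for illustration. It is exactly the kind of provider-specific API that the `getBulkHats()` slide pokes fun at.

```python
class PreferencesService:
    """Stand-in for a remote service; `calls` counts network round trips."""

    def __init__(self, prefs: dict[str, list[str]]) -> None:
        self._prefs = prefs
        self.calls = 0

    def get(self, customer_id: str) -> list[str]:
        self.calls += 1
        return self._prefs.get(customer_id, [])

    def get_bulk(self, customer_ids: list[str]) -> dict[str, list[str]]:
        # Hypothetical bulk endpoint: one round trip for many ids.
        self.calls += 1
        return {cid: self._prefs.get(cid, []) for cid in customer_ids}


def enrich_one_by_one(customer_ids: list[str], service: PreferencesService) -> dict[str, list[str]]:
    return {cid: service.get(cid) for cid in customer_ids}  # N round trips


def enrich_in_bulk(customer_ids: list[str], service: PreferencesService) -> dict[str, list[str]]:
    return service.get_bulk(customer_ids)  # 1 round trip


prefs = {"c-1": ["scifi"], "c-2": ["history"]}
ids = ["c-1", "c-2", "c-3"]  # result of the "1" query for the customer list
slow, fast = PreferencesService(prefs), PreferencesService(prefs)
by_loop = enrich_one_by_one(ids, slow)
by_bulk = enrich_in_bulk(ids, fast)
```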
So maybe we set up a cache in front of the service to alleviate the penalties of calling downstream services rapidly… and now what sort of stale data can you deal with? Bounded staleness? How do you handle cache eviction?
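A minimal sketch of a read-through cache with a time-to-live. It assumes a hypothetical loader function for the downstream call and an injectable clock, so the behavior is deterministic. The TTL is what turns "some stale data" into bounded staleness: a read never returns data that was fetched more than `ttl` seconds ago, but updates made downstream in the meantime stay invisible until the entry expires. Eviction here is purely by expiry; a real cache would also need a size bound (e.g. LRU) or event-driven invalidation.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Read-through cache: values are reloaded once they are `ttl` seconds old."""

    def __init__(self, loader: Callable[[str], object], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader  # hypothetical downstream call
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, object]] = {}

    def get(self, key: str) -> object:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fetched less than ttl seconds ago
        value = self._loader(key)  # miss or expired: call downstream
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: str) -> None:
        self._entries.pop(key, None)
```

With `ttl=30`, a value changed downstream at t = 1 s may still be served from the cache until t = 30 s, and is reloaded on the first read after that.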
We expect some level of consistency, and we need to be able to withstand faults because we know faults occur… But Brewer says we can only pick 2 out of consistency, availability, and partition tolerance...
Consistency models: the set of allowable histories of operations. We say that we read what we wrote.
Now, a process is allowed to read the most recently written value from any process, not just itself. The register becomes a place of coordination between two processes; they share state. We relax our model and say when we read, we read the value at the time of the read, taking into account other processes’ writes…
Somewhere (on some node: a database, a service, a set of databases in a cluster) there is the appearance of a single order of operations that is immediately visible to everyone viewing it. Moreover, linearizability’s time bounds guarantee that those changes will be visible to other participants after the operation completes. Hence, linearizability prohibits stale reads.
Different users will see my message at different times–but each user will see my operations in order. Once seen, a post shouldn’t disappear.
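One way to get "each user sees my operations in order, and a post doesn't disappear" is to combine consistent-prefix replicas with a client-side monotonic-read session. The sketch below is illustrative: `Replica` always exposes a prefix of the author's log, and the hypothetical `MonotonicReader` remembers how much it has already seen and skips any replica that is further behind.

```python
from typing import List


class Replica:
    """A copy of the author's post log that may lag, but always holds a prefix of it."""

    def __init__(self, log: List[str], applied: int) -> None:
        self.log = log          # shared, append-only log of posts in the author's order
        self.applied = applied  # this replica has applied log[:applied]

    def read(self) -> List[str]:
        return self.log[:self.applied]  # consistent prefix: no gaps, no reordering


class MonotonicReader:
    """Client-side session that never goes backwards in what it has already seen."""

    def __init__(self) -> None:
        self.high_water = 0  # number of posts this reader has already seen

    def read(self, replicas: List[Replica]) -> List[str]:
        for replica in replicas:
            if replica.applied >= self.high_water:
                posts = replica.read()
                self.high_water = len(posts)
                return posts
        raise RuntimeError("no replica has caught up with what this reader already saw")


log = ["p1", "p2", "p3"]
fast, slow = Replica(log, applied=3), Replica(log, applied=1)
alice, bob = MonotonicReader(), MonotonicReader()
print(alice.read([fast]))        # ['p1', 'p2', 'p3']
print(alice.read([slow, fast]))  # ['p1', 'p2', 'p3'] (slow replica skipped)
print(bob.read([slow]))          # ['p1'] (Bob is behind Alice, but in order)
```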
Scorekeeper: needs to read the most up-to-date version of the score… so it cannot do an eventually consistent read (bounded staleness, consistent prefix, monotonic read), BUT it could do a read-my-writes read. Umpire: needs to do a strictly consistent read in the 9th inning or any inning afterward to determine whether he can end the game.
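The scorekeeper/umpire contrast can be sketched with a primary and one lagging replica. Everything here is hypothetical: `ScoreStore`, `Scorekeeper`, and `umpire_can_end_game` are illustrative names, replication is a manual `replicate()` call, and the end-of-game rule is simplified to "9th inning or later and not tied". The scorekeeper, as the only writer, uses a read-my-writes read that falls back to the primary only when the replica has not yet seen its last write; the umpire always reads from the primary.

```python
from typing import Dict


class ScoreStore:
    """Hypothetical store: one primary plus one asynchronously updated replica."""

    def __init__(self) -> None:
        self.primary: Dict[str, int] = {"version": 0, "home": 0, "visitors": 0}
        self.replica: Dict[str, int] = dict(self.primary)

    def write(self, home: int, visitors: int) -> int:
        version = self.primary["version"] + 1
        self.primary = {"version": version, "home": home, "visitors": visitors}
        return version  # the replica is NOT updated until replicate() runs

    def replicate(self) -> None:
        self.replica = dict(self.primary)

    def strong_read(self) -> Dict[str, int]:
        return dict(self.primary)

    def eventual_read(self) -> Dict[str, int]:
        return dict(self.replica)


class Scorekeeper:
    """The only writer, so read-my-writes is enough to see an up-to-date score."""

    def __init__(self, store: ScoreStore) -> None:
        self.store = store
        self.last_written = 0

    def record(self, home: int, visitors: int) -> None:
        self.last_written = self.store.write(home, visitors)

    def read_my_writes(self) -> Dict[str, int]:
        snapshot = self.store.eventual_read()
        if snapshot["version"] >= self.last_written:
            return snapshot              # replica already has my latest write
        return self.store.strong_read()  # replica is behind my own write: use primary


def umpire_can_end_game(store: ScoreStore, inning: int) -> bool:
    # Simplified rule: from the 9th inning on, the game can end if it is not tied.
    score = store.strong_read()  # the umpire cannot act on a stale score
    return inning >= 9 and score["home"] != score["visitors"]


store = ScoreStore()
keeper = Scorekeeper(store)
keeper.record(home=1, visitors=0)
print(store.eventual_read()["home"])    # 0 (replica lags)
print(keeper.read_my_writes()["home"])  # 1
print(umpire_can_end_game(store, 9))    # True
```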
Show a diagram relating consistency to performance.

More from Christian Posta

More from Christian Posta (20)

Recently uploaded

Recently uploaded (20)

The hardest part of microservices: your data

Editor's Notes