This document discusses Redis, MongoDB, and Amazon DynamoDB. It begins with an overview of NoSQL databases and the differences between SQL and NoSQL databases. It then covers Redis data types such as strings, hashes, lists, sets, sorted sets, and streams, along with example Redis use cases such as leaderboards, geospatial queries, and message queues. The document also discusses MongoDB design patterns such as embedding data, embracing duplication, and modeling relationships. Finally, it provides a high-level overview of DynamoDB concepts such as tables, items, attributes, and primary keys.
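The leaderboard use case mentioned above is usually built on a Redis sorted set. The sketch below is not a Redis client: it is a hypothetical in-memory class whose methods mimic the semantics of the real commands ZADD, ZINCRBY, ZREVRANGE ... WITHSCORES and ZREVRANK, so the ordering rules can be seen without a server.

```python
from typing import Dict, List, Tuple


class Leaderboard:
    """In-memory model of a Redis sorted set used as a leaderboard (not a Redis client)."""

    def __init__(self) -> None:
        self._scores: Dict[str, float] = {}

    def zadd(self, member: str, score: float) -> None:
        # Like ZADD: set or overwrite the member's score.
        self._scores[member] = float(score)

    def zincrby(self, member: str, delta: float) -> float:
        # Like ZINCRBY: add delta, treating a missing member as score 0.
        self._scores[member] = self._scores.get(member, 0.0) + delta
        return self._scores[member]

    def top(self, n: int) -> List[Tuple[str, float]]:
        # Like ZREVRANGE key 0 n-1 WITHSCORES: highest score first;
        # equal scores fall back to reverse lexicographic member order.
        ranked = sorted(self._scores.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
        return ranked[:n]

    def rank(self, member: str) -> int:
        # Like ZREVRANK: 0-based position in the descending ordering.
        # Raises ValueError for an unknown member (Redis returns nil instead).
        return [m for m, _ in self.top(len(self._scores))].index(member)


board = Leaderboard()
board.zadd("alice", 120)
board.zadd("bob", 95)
board.zincrby("bob", 40)
print(board.top(2))         # [('bob', 135.0), ('alice', 120.0)]
print(board.rank("alice"))  # 1
```

With a real server, each call maps onto one command against a single key, which is why sorted sets suit leaderboards: Redis keeps members ordered on every write, so reads of the top N do not re-sort the data.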
Building Cloud-Native App Series - Part 2 of 11
Microservices Architecture Series
Event Sourcing & CQRS,
Kafka, RabbitMQ
Case Studies (E-Commerce App, Movie Streaming, Ticket Booking, Restaurant, Hospital Management)
Building Cloud-Native App Series - Part 7 of 11
Microservices Architecture Series
Containers Docker Kind Kubernetes Istio
- Pods
- ReplicaSet
- Deployment (Canary, Blue-Green)
- Ingress
- Service
Building Cloud-Native App Series - Part 11 of 11
Microservices Architecture Series
Service Mesh - Observability
- Zipkin
- Prometheus
- Grafana
- Kiali
Building Cloud-Native App Series - Part 1 of 11
Microservices Architecture Series
Design Thinking, Lean Startup, Agile (Kanban, Scrum),
User Stories, Domain-Driven Design
Building Cloud-Native App Series - Part 3 of 11
Microservices Architecture Series
AWS Kinesis Data Streams
AWS Kinesis Firehose
AWS Kinesis Data Analytics
Apache Flink - Analytics
Apache Camel v3, Camel K and Camel Quarkus (Claus Ibsen)
In this session, we will explore key challenges with function interactions and coordination, addressing these problems using Enterprise Integration Patterns (EIP) and modern approaches with the latest innovations from the Apache Camel community:
Apache Camel is the Swiss army knife of integration, and the most powerful integration framework. In this session you will hear about the latest features in the brand new 3rd generation.
Camel K is a lightweight integration platform that enables Enterprise Integration Patterns to be used natively on any Kubernetes cluster. When used in combination with Knative, a framework that adds serverless building blocks to Kubernetes, and with the subatomic execution environment of Quarkus, Camel K can combine serverless features such as auto-scaling, scaling to zero, and event-based communication with the outstanding integration capabilities of Apache Camel.
- Apache Camel 3
- Camel K
- Camel Quarkus
We will show how Camel K works. We’ll also use examples to demonstrate how Camel K makes it easier to connect to cloud services or enterprise applications using some of the 300 components that Camel provides.
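One of the Enterprise Integration Patterns that Camel implements is the Content-Based Router. The Python sketch below only illustrates the pattern's logic; it is not Camel's Java or YAML DSL, and the endpoint strings ("kafka:orders", "jms:invoices", "direct:unmatched") and header names are hypothetical placeholders chosen to look like Camel URIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Message:
    headers: Dict[str, str]
    body: str


Predicate = Callable[[Message], bool]


@dataclass
class ContentBasedRouter:
    """Content-Based Router EIP: deliver each message to the first endpoint whose predicate matches."""

    routes: List[Tuple[Predicate, str]] = field(default_factory=list)
    otherwise: str = "direct:unmatched"  # hypothetical fallback endpoint

    def when(self, predicate: Predicate, endpoint: str) -> "ContentBasedRouter":
        self.routes.append((predicate, endpoint))
        return self  # allow chaining, similar in spirit to a routing DSL

    def route(self, msg: Message) -> str:
        for predicate, endpoint in self.routes:
            if predicate(msg):
                return endpoint
        return self.otherwise


router = (
    ContentBasedRouter()
    .when(lambda m: m.headers.get("type") == "order", "kafka:orders")
    .when(lambda m: m.headers.get("type") == "invoice", "jms:invoices")
)
print(router.route(Message({"type": "order"}, "{}")))   # kafka:orders
print(router.route(Message({"type": "refund"}, "{}")))  # direct:unmatched
```

In Camel itself the same idea is expressed with a choice/when/otherwise route, and the component behind each endpoint URI handles the actual connection to the target system.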
The Top 5 Apache Kafka Use Cases and Architectures in 2022 (Kai Wähner)
I see the following topics coming up more regularly in conversations with customers, prospects, and the broader Kafka community across the globe:
Kappa Architecture: Kappa goes mainstream to replace Lambda and Batch pipelines (that does not mean that there is no batch processing anymore). Examples: Kafka-powered Kappa architectures from Uber, Disney, Shopify, and Twitter.
Hyper-personalized Omnichannel: Retail and customer communication across online and offline channels becomes the new black, including context-specific upselling, recommendations, and location-based services. Examples: Omnichannel Retail and Customer 360 in Real-Time with Apache Kafka.
Multi-Cloud Deployments: Business units and IT infrastructures span across regions, continents, and cloud providers. Linking clusters for bi-directional replication of data in real-time becomes crucial for many business models. Examples: Global Kafka deployments.
Edge Analytics: Low latency requirements, cost efficiency, or security requirements enforce the deployment of (some) event streaming use cases at the far edge (i.e., outside a data center), for instance, for predictive maintenance and quality assurance on the shop floor level in smart factories. Examples: Edge analytics with Kafka.
Real-time Cybersecurity: Situational awareness and threat intelligence need to process massive data in real-time to defend against cyberattacks successfully. The many successful ransomware attacks across the globe in 2021 were a warning for most CIOs. Examples: Cybersecurity for situational awareness and threat intelligence in real-time.
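The core idea behind the Kappa architecture is that there is one append-only event log, and every view is derived by replaying it; reprocessing with new logic means replaying the same log again instead of running a separate batch layer. The toy sketch below assumes a hypothetical event shape of (user_id, order_amount) and uses a Python list in place of a Kafka topic.

```python
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, int]  # hypothetical event shape: (user_id, order_amount)
View = Dict[str, int]


def build_view(log: List[Event], step: Callable[[View, Event], None]) -> View:
    """Derive a view by replaying the whole log from offset 0."""
    view: View = {}
    for event in log:
        step(view, event)
    return view


def total_spend(view: View, event: Event) -> None:
    user, amount = event
    view[user] = view.get(user, 0) + amount


def large_order_count(view: View, event: Event) -> None:
    # "New" logic deployed later: count orders of 100 or more per user.
    user, amount = event
    if amount >= 100:
        view[user] = view.get(user, 0) + 1


log: List[Event] = [("u1", 50), ("u2", 120), ("u1", 200)]
print(build_view(log, total_spend))        # {'u1': 250, 'u2': 120}
print(build_view(log, large_order_count))  # {'u2': 1, 'u1': 1}
```

Both views come from the same log; in a real deployment the replay would read a Kafka topic with sufficient retention rather than a list.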
CSI – IT2020, IIT Mumbai, October 6th 2017
Computer Society of India, Mumbai Chapter
The presentation focuses on microservices architecture and compares microservices with standard monolithic apps and SOA-based apps. It also gives a quick outline of Domain-Driven Design, Event Sourcing and CQRS, and Functional Reactive Programming, and compares the Saga pattern with two-phase commit.
http://www.csimumbai.org/it2020-17/index.html
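Where two-phase commit locks every participant until a coordinator decides, a saga runs a sequence of local transactions and, if one fails, undoes the completed ones with compensating actions in reverse order. The orchestrated saga below is a minimal sketch of that idea; the step names and the failing payment step are hypothetical.

```python
from typing import Callable, List, Tuple

# (step name, action, compensating action)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step], journal: List[str]) -> bool:
    """Run local transactions in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            journal.append(f"failed:{name}")
            for done_name, _, compensate in reversed(completed):
                compensate()
                journal.append(f"compensated:{done_name}")
            return False
        journal.append(f"done:{name}")
        completed.append(step)
    return True


def decline_payment() -> None:
    raise RuntimeError("payment declined")


journal: List[str] = []
ok = run_saga(
    [
        ("reserve_stock", lambda: None, lambda: None),
        ("charge_payment", decline_payment, lambda: None),
        ("ship_order", lambda: None, lambda: None),
    ],
    journal,
)
print(ok)       # False
print(journal)  # ['done:reserve_stock', 'failed:charge_payment', 'compensated:reserve_stock']
```

The trade-off is visible in the journal: intermediate states (stock reserved, payment not yet taken) are briefly visible to other services, so a saga gives eventual rather than atomic consistency.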
Service Mesh with Apache Kafka, Kubernetes, Envoy, Istio and Linkerd (Kai Wähner)
Microservice architectures are not a free lunch! Microservices need to be decoupled, flexible, operationally transparent, data-aware and elastic. Most material from recent years discusses only point-to-point architectures with inflexible and non-scalable technologies like REST / HTTP. This video looks at cutting-edge technologies like Apache Kafka, Kubernetes, Envoy, Linkerd and Istio to implement a cloud-native service mesh that solves these challenges and brings microservices to the next level of scale, speed and efficiency.
Key takeaways:
- Apache Kafka decouples services, including event streams and request-response
- Kubernetes provides a cloud-native infrastructure for the Kafka ecosystem
- Service Mesh helps with security and observability at ecosystem / organization scale
- Envoy and Istio sit in the layer above Kafka and are orthogonal to the goals Kafka addresses
Blog post: http://www.kai-waehner.de/blog/2019/09/24/cloud-native-apache-kafka-kubernetes-envoy-istio-linkerd-service-mesh
Video recording of this slide deck: https://youtu.be/Us_C4RFOUrA
Kafka Tutorial - introduction to the Kafka streaming platform (Jean-Paul Azar)
Why is Kafka so fast? Why is Kafka so popular? Why Kafka?
Introduction to the Kafka streaming platform. Covers Kafka architecture with some small examples from the command line, then expands on this with a multi-server example. Lastly, we add some simple Java client examples for a Kafka producer and a Kafka consumer. We have started to expand the Java examples to correlate with the design discussion of Kafka, and we have also expanded the Kafka design section and added references.
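The design ideas behind Kafka's producer and consumer (records with the same key go to the same partition, each partition is an ordered log, and a consumer tracks the next offset per partition) can be sketched without a broker. This is a toy model, not the Kafka client API; in particular it uses CRC32 as a stand-in hash, whereas Kafka's default partitioner hashes keys with murmur2.

```python
import zlib
from typing import Dict, List, Tuple


class MiniTopic:
    """Toy topic: one append-only list per partition."""

    def __init__(self, partitions: int) -> None:
        self.logs: List[List[Tuple[str, str]]] = [[] for _ in range(partitions)]

    def partition_for(self, key: str) -> int:
        # Stand-in hash: Kafka's default partitioner hashes keys with murmur2, not CRC32.
        return zlib.crc32(key.encode("utf-8")) % len(self.logs)

    def produce(self, key: str, value: str) -> Tuple[int, int]:
        p = self.partition_for(key)
        self.logs[p].append((key, value))
        return p, len(self.logs[p]) - 1  # (partition, offset)


class MiniConsumer:
    """Toy consumer that tracks the next offset to read in every partition."""

    def __init__(self, topic: MiniTopic) -> None:
        self.topic = topic
        self.offsets: Dict[int, int] = {p: 0 for p in range(len(topic.logs))}

    def poll(self) -> List[Tuple[int, int, str, str]]:
        records = []
        for p, log in enumerate(self.topic.logs):
            for offset in range(self.offsets[p], len(log)):
                key, value = log[offset]
                records.append((p, offset, key, value))
            self.offsets[p] = len(log)  # like committing the next offset to read
        return records


topic = MiniTopic(partitions=3)
p_login, _ = topic.produce("user-1", "login")
p_click, _ = topic.produce("user-1", "click")
topic.produce("user-2", "login")
consumer = MiniConsumer(topic)
first = consumer.poll()   # all three records
second = consumer.poll()  # nothing new since the last poll
```

Because both "user-1" records land in the same partition, their relative order is preserved for the consumer; Kafka guarantees ordering only within a partition, not across a topic.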
Preparing for a future Microservices journey using DDD & Wardley Maps (Susanne Kaiser)
The journey to Microservices can be very challenging. Identifying proper boundaries, integrating services, and handling infrastructure and operational complexities that Microservices come with can be very overwhelming.
How do you avoid losing sight, cope with those challenges, and still deliver user and business value? One approach is to focus on the part of your business that gives the most competitive advantage - your core domain - and outsource undifferentiated commodities to utility suppliers.
Domain-Driven Design combined with Wardley Maps can help us understand the problem domain and focus on the core domain.
In this talk, Susanne will show how Domain-Driven Design and Wardley Maps can be used together to visualise how a value chain can evolve during a Microservices journey while keeping the focus on your core domain.
KubeCon EU 2022: From Kubernetes to PaaS to Err What's Next (Daniel Bryant)
Developers building applications on Kubernetes today are being asked not just to code applications; they are also responsible for shipping and running them. We often talk about needing a Kubernetes platform, but are we really looking for a PaaS? Or are we instead looking for some kind of developer control plane with a Goldilocks-sized collection of tools that provides just the right amount of platform? This talk will look back on my experience of building platforms, both as an end user and now as part of an organization helping our customers do the same. The key takeaways are:
- Treat platform as a product
- Realize that you can’t have good developer experience (DevEx) without good UX
- Focus on workflows and tooling interoperability
We’ll wrap this talk with a walk-through of the CNCF ecosystem through the developer control plane lens, and look at what’s next in the future of this important emerging category.
In this session we’ll take a high-level overview of AWS Lambda, a serverless compute platform that has changed the way that developers around the world build applications. We’ll explore how Lambda works under the hood, the capabilities it has, and how it is used. By the end of this talk you’ll know how to create Lambda based applications and deploy and manage them easily.
Speaker: Chris Munns - Principal Developer Advocate, AWS Serverless Applications, AWS
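To make "create Lambda based applications" concrete: in Python, a Lambda function is a module-level handler that receives an event and a context object. The sketch below assumes the function sits behind an API Gateway proxy integration, which expects a dict with statusCode, headers and a string body; the "name" query parameter is a hypothetical example. Because the handler is a plain function, it can also be invoked locally with a hand-built event.

```python
import json
from typing import Any, Dict, Optional


def lambda_handler(event: Dict[str, Any], context: Optional[Any]) -> Dict[str, Any]:
    # API Gateway sends queryStringParameters as null when there are none.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")  # hypothetical query parameter
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


if __name__ == "__main__":
    # Local invocation with a hand-built event; no AWS account needed.
    print(lambda_handler({"queryStringParameters": {"name": "Lambda"}}, None))
```

The handler name ("lambda_handler" in module "app", for example) is whatever you configure as the function's handler setting; it is shown here only as a common convention.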
Apache Kafka is the de facto standard for data streaming to process data in motion. With its significant adoption growth across all industries, I get a very valid question every week: When NOT to use Apache Kafka? What limitations does the event streaming platform have? When does Kafka simply not provide the needed capabilities? How do you rule Kafka out when it is not the right tool for the job?
This session explores the DOs and DON'Ts. Separate sections explain when to use Kafka, when NOT to use Kafka, and when to MAYBE use Kafka.
Whether you are considering open source Apache Kafka, a cloud service like Confluent Cloud, or another technology using the Kafka protocol like Redpanda or Pulsar, check out this slide deck.
A detailed article about this topic:
https://www.kai-waehner.de/blog/2022/01/04/when-not-to-use-apache-kafka/
MongoDB has taken a clear lead in adoption among the new generation of databases, including the enormous variety of NoSQL offerings. A key reason for this lead has been a unique combination of agility and scalability. Agility provides business units with a quick start and flexibility to maintain development velocity, despite changing data and requirements. Scalability maintains that flexibility while providing fast, interactive performance as data volume and usage increase. We'll address the key organizational, operational, and engineering considerations to ensure that agility and scalability stay aligned at increasing scale, from small development instances to web-scale applications. We will also survey some key examples of highly-scaled customer applications of MongoDB.
Big Data is everywhere these days. But what is it and how can you use it to fuel your business? Data is as important to organizations as labour and capital, and if organizations can effectively capture, analyze, visualize and apply big data insights to their business goals, they can differentiate themselves from their competitors and outperform them in terms of operational efficiency and the bottom line.
Join this session to understand the different AWS Big Data and Analytics services such as Amazon Elastic MapReduce (Hadoop), Amazon Redshift (Data Warehouse) and Amazon Kinesis (Streaming), when to use them and how they work together.
Reasons to attend:
Learn how AWS can help you process and make better use of your data with meaningful insights.
Learn about Amazon Elastic MapReduce, a managed Hadoop framework, and Amazon Redshift, a fully managed petabyte-scale data warehouse.
Learn about real time data processing with Amazon Kinesis.
NoSQL Application Development with JSON and MapR-DB (MapR Technologies)
NoSQL databases are being used everywhere by startups and Global 2000 companies alike for data environments that require cost-effective scaling. These environments also typically need to represent data in a more flexible way than is practical with relational databases.
In this presentation, you will get a look under the covers of Amazon Redshift, a fast, fully managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. Learn how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. We'll also walk through techniques for optimizing performance, and you’ll hear from a customer about how they take advantage of fast performance on enormous datasets, leveraging economies of scale on the AWS platform.
Speakers:
Ian Meyers, AWS Solutions Architect
Toby Moore, Chief Technology Officer, Space Ape
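A simplified way to see why columnar storage speeds up analytic queries: an aggregate over one column only needs that column's values, while a row store has to read every field of every row. The toy table below is hypothetical, and "values read" is only a stand-in for disk I/O; Redshift additionally compresses each column and processes blocks in parallel across nodes, none of which is modeled here.

```python
from typing import Dict, List, Tuple

# Hypothetical orders table: (order_id, region, amount)
rows: List[Tuple[int, str, float]] = [
    (1, "EU", 20.0),
    (2, "US", 35.5),
    (3, "EU", 12.5),
]

# The same data laid out column by column.
columns: Dict[str, list] = {
    "order_id": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def sum_amount_row_store(rows: List[Tuple[int, str, float]]) -> Tuple[float, int]:
    values_read = 0
    total = 0.0
    for row in rows:
        values_read += len(row)  # a row store reads whole rows
        total += row[2]
    return total, values_read


def sum_amount_column_store(columns: Dict[str, list]) -> Tuple[float, int]:
    col = columns["amount"]  # a column store reads only the referenced column
    return sum(col), len(col)


print(sum_amount_row_store(rows))        # (68.0, 9)
print(sum_amount_column_store(columns))  # (68.0, 3)
```

The gap grows with table width: with dozens of columns and a query touching two of them, a column store skips most of the data.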
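Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all of your data using your existing business intelligence tools. In this talk, we will look at best practices and various considerations for building a data warehouse and analyzing data with Redshift, and we will also cover considerations for putting Redshift Spectrum into practice, which lets you run complex queries directly against exabyte-scale data in Amazon S3.
Speaker: 정영준, Solutions Architect, Amazon Web Services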
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of Things (Amazon Web Services)
Big Data is everywhere these days. But what is it and how can you use it to fuel your business? Data is as important to organizations as labour and capital, and if organizations can effectively capture, analyze, visualize and apply big data insights to their business goals, they can differentiate themselves from their competitors and outperform them in terms of operational efficiency and the bottom line.
Join this session to understand the different AWS Big Data and Analytics services such as Amazon Elastic MapReduce (Hadoop), Amazon Redshift (Data Warehouse) and Amazon Kinesis (Streaming), when to use them and how they work together.
Reasons to attend:
- Learn how AWS can help you process and make better use of your data with meaningful insights.
- Learn about Amazon Elastic MapReduce, a managed Hadoop framework, and Amazon Redshift, a fully managed petabyte-scale data warehouse.
- Learn about real time data processing with Amazon Kinesis.
Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction wi... (Databricks)
In this talk, we will present how we analyze, predict, and visualize network quality data as a Spark AI use case at a telecommunications company. SK Telecom is the largest wireless telecommunications provider in South Korea, with 300,000 cells and 27 million subscribers. These 300,000 cells generate data every 10 seconds, totaling 60 TB and 120 billion records per day.
To address the problems we previously had running Spark on HDFS, we developed a new data store for SparkSQL, built on Redis and RocksDB, that lets us distribute and store this data in real time and analyze it right away. Not satisfied with analyzing network quality in real time, we also set out to predict near-future network quality so that network device failures can be detected and recovered from quickly. To do this, we designed a DNN model that is aware of network signal patterns and a new in-memory data pipeline from Spark to TensorFlow.
In addition, by integrating Apache Livy and MapboxGL with SparkSQL and our new store, we have built a geospatial visualization system that shows the current population and signal strength of 300,000 cells on a map in real time.
Relational databases are used extensively in many applications and systems, but they are not always the best data store solution to the problem at hand. In this session we discuss the limitations of RDBMS and show which NoSQL solutions can be used to overcome these limitations. We also cover migration topics, such as how to add NoSQL databases without adding complexity to your development and operations.
An overview of the Amazon ElastiCache managed service, with examples of how it can be used to increase performance, lower costs and augment other database services and databases to make things faster, easier and less expensive.
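A common way to use ElastiCache in front of a database is the cache-aside (lazy loading) pattern: read from the cache, and on a miss load from the database and populate the cache with a TTL. The sketch below uses a Python dict in place of Redis or Memcached and a hypothetical loader function in place of a database query; the injectable clock exists only so expiry can be demonstrated deterministically.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """Cache-aside with TTL; a dict stands in for ElastiCache."""

    def __init__(self, load_from_db: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)
        self.db_reads = 0

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit: the database is not touched
        value = self._load(key)  # cache miss or expired entry: read from the database
        self.db_reads += 1
        self._store[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key: str) -> None:
        # Call after writing to the database so the next read reloads fresh data.
        self._store.pop(key, None)


now = [0.0]  # fake clock for the example
cache = CacheAside(lambda key: f"row-{key}", ttl_seconds=60, clock=lambda: now[0])
cache.get("42")
cache.get("42")
reads_before_expiry = cache.db_reads  # 1: the second read was a hit
now[0] = 61.0
cache.get("42")
reads_after_expiry = cache.db_reads   # 2: the entry expired after 60 s
```

The TTL bounds how stale a cached value can get, while explicit invalidation on writes keeps hot keys fresh; choosing between them (or combining them, as here) depends on how much staleness the application tolerates.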
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse that makes it simpler and more economical to analyze all of your data using the business intelligence tools you already have. Start small, for just USD 0.25 per hour with no commitments, and scale up to petabytes for USD 1,000 per terabyte per year, less than a tenth of the cost of traditional solutions. Customers typically report 3x compression, which reduces their costs to USD 333 per uncompressed terabyte per year.
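The cost figures quoted above follow from simple arithmetic, reproduced below as a sketch. It uses only the list prices stated in the paragraph (which may no longer be current) and assumes a 365-day year for the hourly figure.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760; ignores leap years


def cost_per_uncompressed_tb(cost_per_stored_tb_year: float, compression_ratio: float) -> float:
    # With 3x compression, one uncompressed TB occupies a third of a stored TB.
    return cost_per_stored_tb_year / compression_ratio


def yearly_cost_of_hourly_rate(rate_per_hour: float) -> float:
    # Cost of running one node continuously for a year at the given hourly rate.
    return rate_per_hour * HOURS_PER_YEAR


print(round(cost_per_uncompressed_tb(1000.0, 3.0), 2))  # 333.33
print(yearly_cost_of_hourly_rate(0.25))                 # 2190.0
```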
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes (MongoDB)
With so much talk of how Big Data is revolutionizing the world and how a data lake with Hadoop and/or Spark will solve all your data problems, it is hard to tell what is hype, reality, or somewhere in-between.
In working with dozens of enterprises in varying stages of their enterprise data management (EDM) strategy, MongoDB enterprise architect, Matt Kalan, sees the same challenges and misunderstandings arise again and again.
In this session, he will explain common challenges in data management, what capabilities are necessary, and what the future state of architecture looks like. MongoDB is uniquely capable of filling common gaps in the data lake strategy.
This session also includes a live Q&A portion during which you are encouraged to ask questions of our team.
In this talk from the Dublin Websummit 2014 AWS Technical Evangelist Danilo Poccia discusses NoSQL technology.
Includes an introduction to NoSQL DB and a discussion of when it is time to consider NoSQL.
Danilo also introduces Amazon DynamoDB as a NoSQL solution and talks through several case studies of customers that are using Amazon DynamoDB today.
Traditional data warehouses become expensive and slow down as the volume of your data grows. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it easy to analyze all of your data using existing business intelligence tools for 1/10th the traditional cost. This session will provide an introduction to Amazon Redshift and cover the essentials you need to deploy your data warehouse in the cloud so that you can achieve faster analytics and save costs. We’ll also cover the recently announced Redshift Spectrum, which allows you to query unstructured data directly from Amazon S3.
A presentation I made on Apache Spark and Apache Cassandra integration.
First, I present some of the differences between RDBMS and NoSQL, then I move on to the Cassandra infrastructure and common errors when creating a Cassandra data model.
Finally, I cover the main underlying concepts of Spark and some settings for proper configuration.
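A frequent Cassandra data-modeling error is designing tables like relational ones and then querying by columns that are not part of the primary key. Cassandra tables are usually designed around the query: the partition key decides which node holds the data, and clustering columns keep rows sorted inside the partition. The toy model below imitates a hypothetical table with PRIMARY KEY ((user_id), sent_at) in plain Python; it is not a Cassandra driver.

```python
from bisect import insort
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class MessagesByUser:
    """Toy model of a table with PRIMARY KEY ((user_id), sent_at): one partition per user."""

    def __init__(self) -> None:
        self._partitions: DefaultDict[str, List[Tuple[int, str]]] = defaultdict(list)

    def insert(self, user_id: str, sent_at: int, text: str) -> None:
        # Rows inside a partition stay sorted by the clustering column (sent_at).
        insort(self._partitions[user_id], (sent_at, text))

    def latest(self, user_id: str, limit: int) -> List[str]:
        # Efficient query: one partition lookup, then a slice of already-sorted rows.
        if limit <= 0:
            return []
        rows = self._partitions.get(user_id, [])
        return [text for _, text in reversed(rows[-limit:])]


table = MessagesByUser()
table.insert("alice", 3, "see you")
table.insert("alice", 1, "hi")
table.insert("alice", 2, "how are you?")
table.insert("bob", 5, "hey")
print(table.latest("alice", 2))  # ['see you', 'how are you?']
```

A query such as "find messages containing a given text" has no partition key to route on, so it would have to scan every partition; Cassandra rejects such queries unless you add ALLOW FILTERING or an index, which is why a second access pattern usually gets its own table.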
Add Redis to Postgres to Make Your Microservices Go Boom! (Dave Nielsen)
Slides for talk delivered at PostgresOpen 2018 in San Francisco https://postgresql.us/events/pgopen2018/schedule/session/538-add-redis-to-postgres-to-make-your-microservice-go-boom/
In this presentation, you will get a look under the covers of Amazon Redshift, a fast, fully managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. Learn how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. You’ll also hear from Dan Wagner, CEO at Civis Analytics, as he discusses why the Civis data science platform was designed on top of Amazon Redshift and the AWS platform to help smart organizations bridge their data silos, build a 360-degree view of their customer relationships, and identify opportunities for driving their companies forward by leveraging enormous datasets, the power of analytics, and economies of scale on the AWS platform.
Jump Start with Apache Spark 2.0 on Databricks (Databricks)
Apache Spark 2.0 has laid the foundation for many new features and functionality. Its main three themes—easier, faster, and smarter—are pervasive in its unified and simplified high-level APIs for Structured data.
In this introductory workshop, part lecture and part hands-on, you’ll learn how to apply some of these new APIs using Databricks Community Edition. In particular, we will cover the following areas:
What’s new in Spark 2.0
SparkSessions vs SparkContexts
Datasets/Dataframes and Spark SQL
Introduction to Structured Streaming concepts and APIs
Building Cloud-Native App Series - Part 5 of 11
Microservices Architecture Series
Microservices Architecture,
Monolith Migration Patterns
- Strangler Fig
- Change Data Capture
- Split Table
Infrastructure Design Patterns
- API Gateway
- Service Discovery
- Load Balancer
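The Strangler Fig pattern listed above is often implemented as a routing facade (frequently the API Gateway itself) that sends already-migrated paths to new services and everything else to the monolith. The sketch below shows that routing rule with longest-prefix matching; the backend URLs are hypothetical.

```python
from typing import Dict


class StranglerFacade:
    """Route migrated path prefixes to new services; everything else goes to the monolith."""

    def __init__(self, legacy_backend: str) -> None:
        self.legacy_backend = legacy_backend
        self.migrated: Dict[str, str] = {}

    def migrate(self, path_prefix: str, backend: str) -> None:
        self.migrated[path_prefix] = backend

    def backend_for(self, path: str) -> str:
        # Match whole path segments only, so "/orders" does not capture "/ordersummary".
        matches = [p for p in self.migrated
                   if path == p or path.startswith(p.rstrip("/") + "/")]
        if not matches:
            return self.legacy_backend
        return self.migrated[max(matches, key=len)]  # longest prefix wins


facade = StranglerFacade("http://monolith:8080")
facade.migrate("/orders", "http://orders-svc:8080")
print(facade.backend_for("/orders/17"))     # http://orders-svc:8080
print(facade.backend_for("/ordersummary"))  # http://monolith:8080
print(facade.backend_for("/customers/3"))   # http://monolith:8080
```

Each migration step adds one entry to the routing table; once the monolith no longer receives traffic, it can be retired.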
Docker Kubernetes Istio
Understanding Docker and creating containers.
Container Orchestration based on Kubernetes
Blue Green Deployment, AB Testing, Canary Deployment, Traffic Rules based on Istio
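In Istio, canary and blue-green releases are expressed as weighted routes between service versions. The sketch below shows only the traffic-splitting idea, not Istio configuration. It assumes a hash-based, sticky assignment per user, so a given user always sees the same version. This differs from Istio's plain weighted routing, which splits individual requests. The version labels are hypothetical. Setting the percentage from 0 straight to 100 corresponds to a blue-green switch.

```python
import hashlib


def pick_version(user_id: str, canary_percent: int) -> str:
    """Send roughly canary_percent% of users to the canary; a given user always gets the same answer."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    return "v2-canary" if bucket < canary_percent else "v1-stable"


canary_users = sum(pick_version(f"user-{i}", 10) == "v2-canary" for i in range(10_000))
print(canary_users)  # close to 1,000, not exact
```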
Distributed transactions are a key concept for microservices-based apps, and the Saga design pattern helps here. However, developers struggle to shift their mindset from CRUD-based design to Event Sourcing / CQRS concepts. To solve this problem, we are introducing the concept of Event Storming and the Event Storming process map.
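The mindset shift from CRUD to Event Sourcing can be summarized in code. Instead of updating a row, you append events, and the current state is a fold over those events. With CQRS, a separate read model is fed by the same events. The sketch below is a minimal illustration with hypothetical event types for an order. It is not tied to any event-store product.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    order_id: str
    kind: str  # hypothetical event types: "OrderPlaced", "ItemAdded", "OrderShipped"
    amount: int = 0


@dataclass
class Order:
    total: int = 0
    status: str = "PLACED"


def apply(orders: Dict[str, Order], event: Event) -> None:
    """Write side: current state is a fold over the event stream."""
    if event.kind == "OrderPlaced":
        orders[event.order_id] = Order()
    elif event.kind == "ItemAdded":
        orders[event.order_id].total += event.amount
    elif event.kind == "OrderShipped":
        orders[event.order_id].status = "SHIPPED"
    else:
        raise ValueError(f"unknown event kind: {event.kind}")


def replay(events: List[Event]) -> Dict[str, Order]:
    orders: Dict[str, Order] = {}
    for event in events:
        apply(orders, event)
    return orders


class ShippedRevenueProjection:
    """Read side (CQRS): a separate, query-optimised model fed by the same events."""

    def __init__(self) -> None:
        self._open_totals: Dict[str, int] = {}
        self.shipped_total = 0

    def handle(self, event: Event) -> None:
        if event.kind == "OrderPlaced":
            self._open_totals[event.order_id] = 0
        elif event.kind == "ItemAdded" and event.order_id in self._open_totals:
            self._open_totals[event.order_id] += event.amount
        elif event.kind == "OrderShipped":
            self.shipped_total += self._open_totals.pop(event.order_id, 0)


events = [
    Event("o1", "OrderPlaced"), Event("o1", "ItemAdded", 30),
    Event("o2", "OrderPlaced"), Event("o1", "ItemAdded", 20),
    Event("o1", "OrderShipped"), Event("o2", "ItemAdded", 5),
]
state = replay(events)
projection = ShippedRevenueProjection()
for e in events:
    projection.handle(e)
print(state["o1"], state["o2"])  # Order(total=50, status='SHIPPED') Order(total=5, status='PLACED')
print(projection.shipped_total)  # 50
```

The event names that come out of an Event Storming session are natural candidates for the `kind` values in such a model.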
Generative AI Deep Dive: Advancing from Proof of Concept to Production (Aggregage)
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Free Complete Python - A step towards Data Science
Big Data Redis Mongodb Dynamodb Sharding
1. @arafkarsh arafkarsh
ARAF KARSH HAMID
Co-Founder / CTO
MetaMagic Global Inc., NJ, USA
@arafkarsh
arafkarsh
Microservices Architecture Series
Building Cloud Native Apps
NoSQL Vs. SQL
Redis / MongoDB / DynamoDB
Scalability: Shards and Partitions
Distributed Transactions
Part 4 of 11
2. @arafkarsh arafkarsh
Slides are color coded based on the topic colors.
1. NoSQL Vs. SQL
2. Redis / MongoDB / DynamoDB
3. Scalability: Sharding & Partitions
4. Distributed Transactions
3. @arafkarsh arafkarsh
Developer Journey: Monolithic vs. Microservices
Monolithic
• Process: Waterfall, Agile Scrum (4-6 Weeks)
• Domain Driven Design, Event Sourcing and CQRS, Design Patterns: Optional
• Continuous Integration (CI)
• Release cycle: 6/12 Months
• Enterprise Service Bus
• Relational Database [SQL] / NoSQL
• Separate Development, QA / QC and Ops
Microservices
• Process: Scrum / Kanban (1-5 Days)
• Domain Driven Design, Event Sourcing and CQRS, Design Patterns: Mandatory
• Infrastructure Design Patterns
• CI / CD, DevOps
• Event Streaming / Replicated Logs
• SQL and NoSQL
• Container Orchestrator, Service Mesh
6. @arafkarsh arafkarsh
NoSQL Databases
Database | Type | License | ACID | Query | Use Case
Couchbase | Doc Based, Key Value | Open Source | Yes | N1QL | Financial Services, Inventory, IoT
Cassandra | Wide Column | Open Source | No | CQL | Social Analytics, Retail, Messaging
Neo4J | Graph | Open Source, Commercial | Yes | Cypher | AI, Master Data Mgmt, Fraud Protection
Redis | Key Value | Open Source | Yes | Many languages | Caching, Queuing
MongoDB | Doc Based | Open Source, Commercial | Yes | JS | IoT, Real Time Analytics, Inventory
Amazon DynamoDB | Key Value, Doc Based | Vendor | Yes | DQL | Gaming, Retail, Financial Services
Source: https://searchdatamanagement.techtarget.com/infographic/NoSQL-database-comparison-to-help-you-choose-the-right-store.
7. @arafkarsh arafkarsh
SQL Vs NoSQL
Aspect | SQL | NoSQL
Database Type | Relational | Non-Relational
Schema | Pre-Defined | Dynamic Schema
Database Category | Table Based | Documents, Key Value Stores, Graph Stores, Wide Column Stores
Queries | Complex queries (standard SQL for all relational databases) | A special query language is needed for each type of NoSQL DB
Hierarchical Storage | Not a good fit | Perfect fit
Scalability | Scales well for traditional applications | Scales well for modern, data-heavy applications
Query Language | SQL, a standard language across all the databases | Non-standard query language, as each NoSQL DB is different
ACID Support | Yes | For some of the databases (Ex. MongoDB)
Data Size | Good for traditional applications | Handles massive amounts of data for modern app requirements
8. @arafkarsh arafkarsh
SQL Vs NoSQL (MongoDB)
1. In MongoDB, transactional properties are scoped at the document level.
2. One or more fields can be written atomically in a single operation,
3. including updates to multiple sub-documents and nested arrays.
4. Any error causes the entire operation to roll back.
5. This is on par with the data integrity guarantees provided by traditional databases.
9. @arafkarsh arafkarsh
Multi Table / Doc ACID Transactions
Examples: Systems of Record or Line of Business (LoB) Applications
1. Finance
   • Moving funds between bank accounts
   • Payment processing systems
   • Trading platforms
2. Supply Chain
   • Transferring ownership of goods & services through supply chains and booking systems. Ex. adding an order and reducing inventory.
3. Billing System
   • Adding a Call Detail Record and then updating the monthly plan.
Source: ACID Transactions in MongoDB
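The "moving funds" case can be sketched in plain Python to show what all-or-nothing means across two documents. This is an in-memory illustration under stated assumptions, not MongoDB's API: the `accounts` dict stands in for two account documents, `transfer` and `InsufficientFunds` are hypothetical names, and a deep copy plays the role of the transaction's private working set.

```python
import copy


class InsufficientFunds(Exception):
    """Hypothetical error raised when a debit would make a balance negative."""


def transfer(accounts: dict, src: str, dst: str, amount: int) -> None:
    """Move `amount` between two account documents, all or nothing (in-memory illustration)."""
    staged = copy.deepcopy(accounts)     # private working copy, like a transaction's snapshot
    staged[src]["balance"] -= amount     # write 1: debit
    if staged[src]["balance"] < 0:
        raise InsufficientFunds(src)     # abort: the caller's accounts are untouched
    staged[dst]["balance"] += amount     # write 2: credit (a KeyError here also aborts)
    accounts.update(staged)              # commit: both writes become visible together
```

With a real MongoDB driver, the two writes would instead run inside a session transaction; the sketch only shows the guarantee the slide describes.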
11. @arafkarsh arafkarsh
Redis
• Data Structures
• Design Patterns
In-Memory Databases (ranking)
Rank 2020 | Rank 2019 | NoSQL Database | Model
1 | 1 | Redis | Key-Value, Multi Model
2 | 2 | Amazon DynamoDB | Multi Model
3 | 3 | Microsoft Cosmos | Multi Model
4 | 4 | Memcached | Key-Value
12. @arafkarsh arafkarsh
Why do you need In-Memory Databases?
1. Users: 1 Million +
2. Data Volume: Terabytes to Petabytes
3. Locality: Global
4. Performance: Microsecond Latency
5. Request Rate: Millions Per Second
6. Access: Mobile, IoT, Devices
7. Economics: Pay as you go
8. Developer Access: Open API
Source: AWS re:Invent 2020: https://www.youtube.com/watch?v=2WkJeofqIJg
13. @arafkarsh arafkarsh
Tables / Docs (JSON): Why is Redis different?
• Redis is a multi data model key store
• Commands operate on keys
• The data type of a key can change over time
Source: https://www.youtube.com/watch?v=ELk_W9BBTDU
14. @arafkarsh arafkarsh
Keys, Values & Data Types
Key Name: movie:StarWars, Value: “Sold Out”
Basic Data Types: String, Hash, List, Set, Sorted Set
Key Properties
• Unique
• Binary safe (case sensitive)
• Max size = 512 MB
Expiration / TTL
• By default, keys are retained (no expiry)
• Time can be given in seconds, milliseconds, or as a Unix epoch timestamp
• A TTL can be added to or removed from a key
SET movie:StarWars "Sold Out" EX 5000 (expires in 5000 seconds)
PEXPIRE movie:StarWars 5 (expires in 5 milliseconds)
https://redis.io/commands/set
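A toy model of the key expiry rules above, assuming an injectable clock so the behaviour can be checked without waiting. `TTLStore` is a hypothetical class, not a Redis client; it implements only lazy expiry (Redis also expires keys actively in the background), and the -2 / -1 return codes follow the Redis TTL command.

```python
import time
from typing import Callable, Optional


class TTLStore:
    """Toy key store that mimics Redis expiry rules (an illustration, not Redis itself)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock        # injectable so tests can move time forward
        self._data = {}            # key -> value
        self._expires_at = {}      # key -> absolute deadline; keys without a TTL are absent

    def set(self, key: str, value: str, ex: Optional[float] = None) -> None:
        """SET key value [EX seconds]: a plain SET also discards any previous TTL."""
        self._data[key] = value
        self._expires_at.pop(key, None)
        if ex is not None:
            self._expires_at[key] = self._clock() + ex

    def pexpire(self, key: str, ms: int) -> bool:
        """PEXPIRE key ms: returns False when the key does not exist."""
        if self.get(key) is None:
            return False
        self._expires_at[key] = self._clock() + ms / 1000
        return True

    def get(self, key: str) -> Optional[str]:
        deadline = self._expires_at.get(key)
        if deadline is not None and self._clock() >= deadline:
            # Lazy expiry: an expired key is deleted the next time it is touched.
            del self._data[key]
            del self._expires_at[key]
        return self._data.get(key)

    def ttl(self, key: str) -> int:
        """-2 if the key is missing, -1 if it has no TTL, else remaining seconds (truncated)."""
        if self.get(key) is None:
            return -2
        if key not in self._expires_at:
            return -1
        return int(self._expires_at[key] - self._clock())
```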
15. @arafkarsh arafkarsh
Redis – Remote Dictionary Server
Distributed In-Memory Data Store
String: standard string data
Hash: { A: “John Doe”, B: “New York”, C: “USA” }
List: [ A -> B -> C -> D -> E ]
Set: { A, B, C, D, E }
Sorted Set: { A:10, B:12, C:14, D:20, E:32 }
Stream: … msg1, msg2, msg3
Pub / Sub: … msg1, msg2, msg3
https://redis.io/topics/data-types
16. @arafkarsh arafkarsh
Data Type: Hash
Key Name: movie:The-Force-Awakens
Field | Value
Director | J. J. Abrams
Writer | L. Kasdan, J. J. Abrams, M. Arndt
Cinematography | Dan Mindel
HGET movie:The-Force-Awakens Director
“J. J. Abrams”
• Field & value pairs
• Single level (no nested hashes)
• Fields can be added and removed
https://redis.io/topics/data-types
https://redis.io/commands#hash
Use Cases
• Session Cache
• Rate Limiting
17. @arafkarsh arafkarsh
Data Type: List
Key Name: movies
Value: “Force Awakens, The” “Last Jedi, The” “Rise of Skywalker, The”
Popping twice from the left (head) of the full list:
LPOP movies
“Force Awakens, The”
LPOP movies
“Last Jedi, The”
Popping twice from the right (tail) of the full list:
RPOP movies
“Rise of Skywalker, The”
RPOP movies
“Last Jedi, The”
• Ordered list (FIFO or LIFO)
• Duplicates allowed
• Elements added from the left, the right, or by position
• Max 4 billion elements per list
Type of Lists
• Queues
• Stacks
• Capped List
https://redis.io/topics/data-types
https://redis.io/commands#list
Use Cases
• Communication
• Activity List
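Because the two LPOP and the two RPOP calls above each start from the full three-element list, a `collections.deque` reproduces both results. The `lpop` / `rpop` helpers are hypothetical stand-ins for the Redis commands, and the capped-list lines assume the common LPUSH + LTRIM pattern.

```python
from collections import deque

# Hypothetical stand-in for the Redis list stored at key "movies".
movies = deque(["Force Awakens, The", "Last Jedi, The", "Rise of Skywalker, The"])


def lpop(lst: deque):
    """Like LPOP: remove and return the head, or None when the list is empty."""
    return lst.popleft() if lst else None


def rpop(lst: deque):
    """Like RPOP: remove and return the tail, or None when the list is empty."""
    return lst.pop() if lst else None


# Two independent runs on fresh copies of the same three-element list.
left = deque(movies)
from_left = [lpop(left), lpop(left)]
right = deque(movies)
from_right = [rpop(right), rpop(right)]

# A capped list (LPUSH then LTRIM key 0 2 in Redis) maps onto deque(maxlen=3).
recent = deque(maxlen=3)
for event in ["e1", "e2", "e3", "e4"]:
    recent.appendleft(event)   # newest first; the oldest falls off the right end
```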
18. @arafkarsh arafkarsh
Data Type: Set
Key Name: movies
Members / Elements: “Force Awakens, The”, “Last Jedi, The”, “Rise of Skywalker, The”
SMEMBERS movies (members are returned in no guaranteed order)
“Force Awakens, The”
“Last Jedi, The”
“Rise of Skywalker, The”
• Unordered collection of unique elements
• Set Operations
   • Difference
   • Intersect
   • Union
https://redis.io/topics/data-types
https://redis.io/commands#set
Use Cases
• Unique Visitors
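A sketch of the unique-visitors use case with Python sets standing in for Redis sets; the `visitors:<day>` key names and the `sadd` helper are assumptions for illustration. Each set operation is annotated with the Redis command it mirrors.

```python
visitors = {}   # key -> set of members, standing in for Redis sets


def sadd(key, *members):
    """Like SADD: add members and return how many were new (duplicates are ignored)."""
    s = visitors.setdefault(key, set())
    before = len(s)
    s.update(members)
    return len(s) - before


sadd("visitors:mon", "u1", "u2", "u1")   # u1 is stored once
sadd("visitors:tue", "u2", "u3")

unique_mon = len(visitors["visitors:mon"])                       # SCARD visitors:mon
both_days = visitors["visitors:mon"] & visitors["visitors:tue"]  # SINTER
any_day = visitors["visitors:mon"] | visitors["visitors:tue"]    # SUNION
only_mon = visitors["visitors:mon"] - visitors["visitors:tue"]   # SDIFF
```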
19. @arafkarsh arafkarsh
Data Type: Sorted Set
Key Name: movies
Value | Score
“Force Awakens, The” | 3
“Last Jedi, The” | 1
“Rise of Skywalker, The” | 2
ZRANGE movies 0 1 (the two lowest scores)
“Last Jedi, The”
“Rise of Skywalker, The”
• Ordered collection of unique elements, sorted by score
• Set Operations
   • Intersect
   • Union
https://redis.io/topics/data-types
https://redis.io/commands#sorted_set
Use Cases
• Leaderboard
• Priority Queues
20. @arafkarsh arafkarsh
Redis: Transactions
• Transactions are
   • Atomic
   • Isolated
• Redis commands sent after MULTI are queued
• On EXEC, all the queued commands are executed sequentially as an atomic unit
MULTI
SET movie:The-Force-Awakens:Review Good
INCR movie:The-Force-Awakens:Rating
EXEC
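The queue-then-execute behaviour of MULTI / EXEC can be modelled in a few lines. `MiniRedis` is a hypothetical toy supporting only SET and INCR; it mirrors two Redis rules: commands sent after MULTI are only queued, and a command that fails while EXEC runs does not roll back the others.

```python
class MiniRedis:
    """Toy, single-threaded model of MULTI / EXEC queuing for SET and INCR only."""

    COMMANDS = {"SET", "INCR"}

    def __init__(self):
        self.data = {}
        self._queue = None                  # None means "not inside MULTI"

    def _run(self, cmd, args):
        if cmd == "SET":
            key, value = args
            self.data[key] = value
            return "OK"
        (key,) = args                       # INCR
        self.data[key] = int(self.data.get(key, 0)) + 1   # ValueError if not an integer
        return self.data[key]

    def execute(self, cmd, *args):
        if cmd == "MULTI":
            self._queue = []
            return "OK"
        if cmd == "EXEC":
            if self._queue is None:
                raise RuntimeError("EXEC without MULTI")
            queued, self._queue = self._queue, None
            results = []
            for c, a in queued:             # run back to back; no other client can interleave
                try:
                    results.append(self._run(c, a))
                except ValueError as err:   # as in Redis, a run-time error neither rolls back
                    results.append(err)     # nor stops the other queued commands
            return results
        if cmd not in self.COMMANDS:
            raise ValueError(f"unknown command {cmd}")   # rejected before queuing
        if self._queue is not None:
            self._queue.append((cmd, args))
            return "QUEUED"
        return self._run(cmd, args)
```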
21. @arafkarsh arafkarsh
Redis In-Memory Data Store Use cases
• Machine Learning
• Message Queues
• Gaming Leaderboards
• Geospatial
• Session Store
• Media Streaming
• Real-time Analytics
• Caching
22. @arafkarsh arafkarsh
Use Case: Sorted Set – Leader Board
• Collection of sorted, distinct entities
• Set operations and range queries based on score
Game Scores
value: John, score: 610
value: Jane, score: 987
value: Sarah, score: 1597
value: Maya, score: 144
value: Fred, score: 233
value: Ann, score: 377
ZADD game:1 987 Jane 1597 Sarah 144 Maya 610 John 377 Ann 233 Fred
ZREVRANGE game:1 0 3 WITHSCORES (get the top 4 scores)
• Sarah 1597
• Jane 987
• John 610
• Ann 377
Source: AWS re:Invent 2020: https://www.youtube.com/watch?v=2WkJeofqIJg
https://redis.io/commands/zadd
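The leaderboard query can be checked with a small in-memory model: a dict maps each member to one score, and the ZREVRANGE-style helper sorts by score descending. Both helpers are hypothetical; the tie-break (member name descending) follows the documented ZREVRANGE ordering.

```python
game = {}   # member -> score, standing in for the sorted set at key game:1


def zadd(key_scores: dict, *pairs):
    """Like ZADD key score member [score member ...]; returns the number of new members."""
    added = 0
    for score, member in zip(pairs[0::2], pairs[1::2]):
        added += member not in key_scores
        key_scores[member] = score          # an existing member just gets its score updated
    return added


def zrevrange_withscores(key_scores: dict, start: int, stop: int):
    """Like ZREVRANGE start stop WITHSCORES for non-negative, inclusive indexes."""
    # Highest score first; equal scores ordered by member name descending.
    ordered = sorted(key_scores.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    return ordered[start:stop + 1]


zadd(game, 987, "Jane", 1597, "Sarah", 144, "Maya", 610, "John", 377, "Ann", 233, "Fred")
top4 = zrevrange_withscores(game, 0, 3)
```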
23. @arafkarsh arafkarsh
Use Case: Geospatial
• Compute the distance between members
• Find all members within a radius
Source: AWS re:Invent 2020: https://www.youtube.com/watch?v=2WkJeofqIJg
GEOADD cities -87.6298 41.8781 Chicago
GEOADD cities -122.3321 47.6062 Seattle
ZRANGE cities 0 -1 (members are ordered by their geohash score)
• “Seattle”
• “Chicago”
GEODIST cities Chicago Seattle mi
• “1733.4089”
GEORADIUS cities -122.4194 37.7749 1000 mi WITHDIST
• “Seattle”
• “679.4848”
Units:
o m for meters
o km for kilometers
o mi for miles
o ft for feet
https://redis.io/commands/geodist
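GEODIST uses the haversine great-circle formula, so the figures above can be reproduced approximately in plain Python. The Earth radius and the miles conversion factor are assumptions taken from Redis' source, the `georadius` helper is hypothetical, and small differences are expected because Redis stores coordinates as 52-bit geohashes.

```python
import math

EARTH_RADIUS_M = 6372797.560856   # assumed: the radius Redis uses for its geo commands
METERS_PER_MILE = 1609.34         # assumed: the factor Redis uses for the "mi" unit


def geodist(lon1, lat1, lon2, lat2):
    """Haversine great-circle distance in meters (longitude first, as in GEOADD)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


cities = {"Chicago": (-87.6298, 41.8781), "Seattle": (-122.3321, 47.6062)}


def georadius(members, lon, lat, radius_mi):
    """Members within radius_mi of (lon, lat), nearest first, with distances in miles."""
    hits = []
    for name, (mlon, mlat) in members.items():
        d = geodist(lon, lat, mlon, mlat) / METERS_PER_MILE
        if d <= radius_mi:
            hits.append((name, round(d, 4)))
    return sorted(hits, key=lambda h: h[1])


chicago_seattle_mi = geodist(*cities["Chicago"], *cities["Seattle"]) / METERS_PER_MILE
near_sf = georadius(cities, -122.4194, 37.7749, 1000)   # centre point: San Francisco
```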
26. @arafkarsh arafkarsh
MongoDB Docs – Prefer Embedding
• Use structure to keep related data within a document
• Include bounded arrays to hold multiple records
27. @arafkarsh arafkarsh
MongoDB Docs – Embrace Duplication
• Field info duplicated from the Customer Profile
• Address duplicated from the Customer Profile
28. @arafkarsh arafkarsh
Know When Not to Embed
• As the Item is used outside of the Order, you don't need to embed the whole object here. Instead, give the Item reference ID. (Not to Embed)
• The name is included to decouple the Order from the Item (Product) Service. (Embrace Duplication)
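The three rules (prefer embedding, embrace duplication, know when not to embed) can be seen together in one hypothetical order document, written here as a Python dict of the shape a MongoDB driver would store. All field names, values, the `MAX_LINE_ITEMS` cap and the helper functions are assumptions for illustration.

```python
# Hypothetical order document; every field name and value here is illustrative.
order = {
    "_id": "ORD-1001",
    # Embrace duplication: name and address are copied from the Customer Profile,
    # so showing the order never needs a second lookup.
    "customer": {
        "customer_id": "CUST-42",
        "name": "Jane Doe",
        "shipping_address": {"street": "1 Main St", "city": "Pune", "country": "India"},
    },
    # Prefer embedding: the line items live inside the order as a bounded array.
    "line_items": [
        # Know when not to embed: reference the product by item_id rather than embedding
        # the whole Item, but duplicate its name to stay decoupled from the Item service.
        {"item_id": "SKU-7", "name": "Coffee Mug", "quantity": 2, "unit_price": 450},
        {"item_id": "SKU-9", "name": "Tea Pot", "quantity": 1, "unit_price": 1200},
    ],
}

MAX_LINE_ITEMS = 100  # assumed application-level cap that keeps the array bounded


def add_line_item(doc: dict, item_id: str, name: str, quantity: int, unit_price: int) -> None:
    if len(doc["line_items"]) >= MAX_LINE_ITEMS:
        raise ValueError("order is full; start a new order document")
    doc["line_items"].append(
        {"item_id": item_id, "name": name, "quantity": quantity, "unit_price": unit_price})


def order_total(doc: dict) -> int:
    # Everything needed is inside one document: no join with a products collection.
    return sum(li["quantity"] * li["unit_price"] for li in doc["line_items"])
```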
30. @arafkarsh arafkarsh
MongoDB – Tips & Best Practices
1. By default, MongoDB aborts any multi-document transaction that runs for more than 60 seconds.
2. No more than 1,000 documents should be modified within a single transaction.
3. Developers need to add logic to retry the transaction in case it is aborted due to a network error.
4. Transactions that affect multiple shards incur a greater performance cost, as operations are coordinated across multiple participating nodes over the network.
5. Performance will be impacted if a transaction runs against a collection that is being rebalanced.
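Tip 3 can be sketched as a small retry wrapper. `TransientTransactionError` is a hypothetical exception standing in for a driver error labelled as transient, and the backoff values are arbitrary; in PyMongo, `ClientSession.with_transaction` already provides retry behaviour of this kind, so a hand-written loop like this is mainly illustrative.

```python
import random
import time


class TransientTransactionError(Exception):
    """Hypothetical stand-in for a driver error labelled as transient (e.g. a network blip)."""


def run_with_retry(txn_fn, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Re-run the whole transaction body when it aborts with a transient error.

    txn_fn must be safe to re-run from the start: a retried transaction begins again.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_fn()
        except TransientTransactionError:
            if attempt == max_attempts:
                raise
            # Exponential backoff with jitter before starting the transaction again.
            sleep(base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.5))
```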
32. @arafkarsh arafkarsh
Amazon DynamoDB Concept
Figure: one DynamoDB Table holding Customer, Catalogue, Cart, Order and Payments items. Example attribute sets: Customer ID, Name, Category, State | Product ID, Name, Value, Description, Image | Item ID, Quantity, Value, Currency; example composite key: User ID + Item ID.
1. A single table holds multiple entities (Customer, Catalogue, Cart, Order etc.), stored as items.
2. An item contains a collection of attributes.
3. The primary key plays a key role in performance, scalability and avoiding joins (as used in a typical RDBMS).
4. The primary key contains a partition key and an optional sort key.
5. The item data model is JSON, and an attribute can be a field or a custom object.
33. @arafkarsh arafkarsh
DynamoDB – Under the Hood
One single table holds multiple entities, each with multiple items (records, in RDBMS terms).
Figure: each organization's partition holds 1 Org record and 2 Employee records.
1. The DynamoDB structure is JSON (document model). However, it has no resemblance to MongoDB in terms of DB implementation or schema design patterns.
2. Multiple entities are part of a single table, which helps to avoid expensive joins. For Ex. PK = ORG#Magna retrieves all 3 records: 1 record from the Org entity and 2 records from the Employee entity.
3. The partition key helps in sharding and horizontal scalability.
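The ORG#Magna example can be mimicked in memory to show why one query returns the Org item and its Employee items together. The item layout (PK / SK names, `METADATA#` and `EMP#` prefixes), the partition count and the SHA-256 hash are assumptions; DynamoDB manages partitioning internally and this is not its algorithm.

```python
import hashlib
from collections import defaultdict

# Hypothetical single-table items; PK/SK naming follows the ORG#Magna example.
items = [
    {"PK": "ORG#Magna", "SK": "METADATA#Magna", "type": "Org", "name": "Magna"},
    {"PK": "ORG#Magna", "SK": "EMP#001", "type": "Employee", "name": "Alice"},
    {"PK": "ORG#Magna", "SK": "EMP#002", "type": "Employee", "name": "Bob"},
    {"PK": "ORG#Other", "SK": "METADATA#Other", "type": "Org", "name": "Other"},
]

NUM_PARTITIONS = 4  # assumption; DynamoDB manages its partitions internally


def partition_for(pk: str) -> int:
    """Hash the partition key to pick a storage partition (illustrative hash only)."""
    return int(hashlib.sha256(pk.encode()).hexdigest(), 16) % NUM_PARTITIONS


table = defaultdict(list)   # partition number -> items, kept sorted by SK
for it in items:
    bucket = table[partition_for(it["PK"])]
    bucket.append(it)
    bucket.sort(key=lambda x: x["SK"])


def query(pk: str, sk_begins_with: str = ""):
    """Like a Query on PK = :pk [AND begins_with(SK, :prefix)]: one partition is read,
    and the Org plus its Employees come back without any join."""
    return [it for it in table[partition_for(pk)]
            if it["PK"] == pk and it["SK"].startswith(sk_begins_with)]
```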
36. @arafkarsh arafkarsh
App Scalability based on Microservices Architecture
Source: The New Stack. Based on The Art of Scalability by Martin Abbott & Michael Fisher
38. @arafkarsh arafkarsh
Scalability Best Practices: Lessons from eBay
#1 Partition By Function
• Decouple unrelated functionalities.
• Selling functionality is served by one set of applications, bidding by another, search by yet another.
• 16,000 app servers in 220 different pools
• 1,000 logical databases, 400 physical hosts
#2 Split Horizontally
• Break the workload into manageable units.
• eBay's interactions are stateless by design.
• All app servers are treated equally and none retains any transactional state.
• Data partitioning based on specific requirements
#3 Avoid Distributed Transactions
• 2 Phase Commit is a pessimistic approach that comes with a big cost.
• CAP Theorem (Consistency, Availability, Partition Tolerance): only two of the three can be applied at any point in time.
• At eBay: no distributed transactions of any kind and no 2 Phase Commit.
#4 Decouple Functions Asynchronously
• If component A calls component B synchronously, they are tightly coupled: to scale A, you need to scale B as well.
• If the call is asynchronous, A can move forward irrespective of the state of B.
• SEDA (Staged Event Driven Architecture)
#5 Move Processing to Asynchronous Flow
• Move as much processing as possible to the asynchronous side.
• Anything that can wait should wait.
#6 Virtualize at All Levels
• Virtualize everything. eBay created their own O/R layer for abstraction.
#7 Cache Appropriately
• Cache slow-changing, read-mostly data, metadata, configuration and static data.
Source: http://www.infoq.com/articles/ebay-scalability-best-practices
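Practices #4 and #5 can be illustrated with an in-process queue: component A accepts each request and returns immediately, while component B drains the queue at its own pace. This is a minimal asyncio sketch under the assumption that `asyncio.Queue` stands in for a real message queue between separately deployed services; the function names are hypothetical.

```python
import asyncio


async def component_b(inbox: asyncio.Queue, processed: list):
    """Downstream stage B: drains its queue at its own pace."""
    while True:
        order_id = await inbox.get()
        await asyncio.sleep(0.01)          # simulated slow work (e.g. sending an email)
        processed.append(order_id)
        inbox.task_done()


async def component_a(inbox: asyncio.Queue, order_id: str) -> str:
    """Upstream stage A: hands work to B and returns without waiting for B."""
    inbox.put_nowait(order_id)             # "anything that can wait should wait"
    return f"accepted {order_id}"


async def main():
    inbox = asyncio.Queue()
    processed = []
    worker = asyncio.create_task(component_b(inbox, processed))
    replies = [await component_a(inbox, f"order-{i}") for i in range(3)]
    done_before_b = len(processed)         # A has already answered all three requests
    await inbox.join()                     # wait for B only so the demo can inspect results
    worker.cancel()
    return replies, done_before_b, processed


replies, done_before_b, processed = asyncio.run(main())
```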
40. @arafkarsh arafkarsh
CAP Theorem by Eric Allen Brewer
Pick Any 2!! Say NO to 2 Phase Commit
Source: https://en.wikipedia.org/wiki/CAP_theorem | http://en.wikipedia.org/wiki/Eric_Brewer_(scientist)
CAP Twelve Years Later: How the “Rules” Have Changed
“In a network subject to communication failures, it is
impossible for any web service to implement an atomic
read / write shared memory that guarantees a response
to every request.”
Partition Tolerance
The system continues to operate despite an arbitrary
number of messages being dropped (or delayed) by
the network between nodes.
Consistency
Every read receives the
most recent write or an
error.
Availability
Every request receives a (non-error) response – without
guarantee that it contains the most recent write.
41. @arafkarsh arafkarsh
Sharding / Partitioning
Method | Scalability | Table (split by) | Schema
Sharding | Horizontal | Rows | Same schema with unique rows
Sharding | Vertical | Columns | Different schema
Partition | Vertical | Rows | Same schema with unique rows
1. Optimize the database.
2. Separate rows or columns into multiple smaller tables.
3. Each table either has the same schema with unique rows,
4. or has a schema that is a subset of the original.
Original Table
Customer ID | Customer Name | DOB | City
1 | ABC |  | Bengaluru
2 | DEF |  | Tokyo
3 | JHI |  | Kochi
4 | KLM |  | Pune
Horizontal Sharding - 1
Customer ID | Customer Name | DOB | City
1 | ABC |  | Bengaluru
2 | DEF |  | Tokyo
Horizontal Sharding - 2
Customer ID | Customer Name | DOB | City
3 | JHI |  | Kochi
4 | KLM |  | Pune
Vertical Sharding - 1
Customer ID | Customer Name | DOB
1 | ABC | 
2 | DEF | 
3 | JHI | 
4 | KLM | 
Vertical Sharding - 2
Customer ID | City
1 | Bengaluru
2 | Tokyo
3 | Kochi
4 | Pune
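The horizontal and vertical splits of the customer table above can be expressed in a few lines of Python. The range-based `shard_for` rule (IDs 1-2 on one shard, 3-4 on the other) is an assumption chosen to match the slide, the helper names are hypothetical, and the DOB column is omitted because the slide gives no values for it.

```python
# The four customer rows from the slide (DOB values are not given there, so they are omitted).
customers = [
    {"customer_id": 1, "name": "ABC", "city": "Bengaluru"},
    {"customer_id": 2, "name": "DEF", "city": "Tokyo"},
    {"customer_id": 3, "name": "JHI", "city": "Kochi"},
    {"customer_id": 4, "name": "KLM", "city": "Pune"},
]


def shard_for(customer_id: int) -> int:
    """Assumed range-based routing: IDs 1-2 -> shard 0, IDs 3-4 -> shard 1."""
    return 0 if customer_id <= 2 else 1


def horizontal_shards(rows):
    """Same schema in every shard; each row lives in exactly one shard."""
    shards = {0: [], 1: []}
    for row in rows:
        shards[shard_for(row["customer_id"])].append(row)
    return shards


def vertical_shards(rows, column_groups):
    """Different schema per shard; every shard keeps the key so rows can be re-joined."""
    return [[{"customer_id": r["customer_id"], **{c: r[c] for c in cols}} for r in rows]
            for cols in column_groups]


h = horizontal_shards(customers)
v = vertical_shards(customers, [["name"], ["city"]])


def find_customer(customer_id: int):
    """A lookup by key touches only the one shard that owns it."""
    return next(r for r in h[shard_for(customer_id)] if r["customer_id"] == customer_id)
```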
42. @arafkarsh arafkarsh
Sharding Scenarios
1. Horizontal scaling: a single server is unable to handle the load even after partitioning the data sets.
2. Data can be partitioned in such a way that specific server(s) can serve a search query based on the partition. For Ex. in an e-Commerce application, searching the data based on:
   1. Product Type
   2. Product Brand
   3. Seller's Region (for local shipping)
   4. Orders based on Year / Month
43. @arafkarsh arafkarsh
Geo Partitioning
• Geo-partitioning is the ability to control the location of
data at the row level.
• CockroachDB lets you control which tables are replicated
to which nodes. But with geo-partitioning, you can control
which nodes house data with row-level granularity.
• This allows you to keep customer data close to the user,
which reduces the distance it needs to travel, thereby
reducing latency and improving user experience.
Source: https://www.cockroachlabs.com/blog/geo-partition-data-reduce-latency/
45. @arafkarsh arafkarsh
Oracle Sharding and Geo
CREATE SHARDED TABLE customers (
  cust_id     NUMBER NOT NULL,
  name        VARCHAR2(50),
  address     VARCHAR2(250),
  geo         VARCHAR2(20),
  class       VARCHAR2(3),
  signup_date DATE,
  CONSTRAINT cust_pk PRIMARY KEY(geo, cust_id) )
PARTITIONSET BY LIST (geo)
PARTITION BY CONSISTENT HASH (cust_id)
PARTITIONS AUTO (
  PARTITIONSET AMERICA VALUES ('AMERICA') TABLESPACE SET tbs1,
  PARTITIONSET ASIA VALUES ('ASIA') TABLESPACE SET tbs2
);
Primary Shards | Standby Shards | Read / Write Tx / Second | Read Only Tx / Second
25 | 25 | 1.18 Million | 1.62 Million
50 | 50 | 2.11 Million | 3.26 Million
75 | 75 | 3.57 Million | 5.05 Million
100 | 100 | 4.38 Million | 6.82 Million
Linear Scalability
Source: https://www.oracle.com/a/tech/docs/sharding-wp-12c.pdf
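Composite sharding as declared above (list by `geo`, then consistent hash by `cust_id`) can be sketched with a small consistent-hash ring. This illustrates the general technique only: the shard names, the number of virtual nodes and the SHA-256-based hash are assumptions, not Oracle's implementation.

```python
import bisect
import hashlib


def _h(value: str) -> int:
    """Stable 32-bit hash (illustrative; not Oracle's hash function)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:4], "big")


class HashRing:
    """Consistent hashing: each shard owns several points (virtual nodes) on a ring."""

    def __init__(self, shards, vnodes=64):
        self._points = sorted((_h(f"{s}#{i}"), s) for s in shards for i in range(vnodes))
        self._keys = [p for p, _ in self._points]

    def shard_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around at the end.
        i = bisect.bisect(self._keys, _h(key)) % len(self._keys)
        return self._points[i][1]


# Step 1 (list): geo picks the partition set; step 2 (hash): cust_id picks a shard inside it.
# Shard names are hypothetical; tbs1 / tbs2 echo the tablespace sets above.
partition_sets = {
    "AMERICA": HashRing(["tbs1-shard1", "tbs1-shard2", "tbs1-shard3"]),
    "ASIA": HashRing(["tbs2-shard1", "tbs2-shard2"]),
}


def locate(geo: str, cust_id: int) -> str:
    return partition_sets[geo].shard_for(str(cust_id))
```

The point of the ring is that adding a shard to a partition set only moves the keys the new shard takes over; every other key keeps its shard.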
48. @arafkarsh arafkarsh
MongoDB Replication
Figure: the application (client app driver) connects to a replica set of three members: one Primary Server and two Secondary Servers. The secondaries replicate from the primary, and all members exchange heartbeats.
Source: MongoDB Replication https://docs.mongodb.com/manual/replication/
Replication provides redundancy and high availability. It provides fault tolerance, as multiple copies of data on different database servers ensure that the loss of a single database server will not affect the application.
What does the Primary do?
1. Receives all write operations.
What do the Secondaries do?
1. Replicate the primary's oplog, and
2. Apply the operations to their data sets so that the secondaries' data sets reflect the primary's data set.
3. Secondaries apply the operations to their data sets asynchronously.
Replica Set Connection Configuration
mongodb://mongodb0.example.com:27017,mongodb1.example.com:27017,mongodb2.example.com:27017/?replicaSet=myRepl
Use Secure Connection
mongodb://myDBReader:D1fficultP%40ssw0rd@mongodb0.example.com:27017
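The password in the secure connection string above is percent-encoded (%40 is `@`). A standard-library sketch for building such a URI follows; `replica_set_uri` is a hypothetical helper, and `urllib.parse.quote_plus` is the function PyMongo's documentation recommends for escaping credentials.

```python
from urllib.parse import quote_plus


def replica_set_uri(hosts, replica_set, user=None, password=None):
    """Build a mongodb:// URI; credentials are percent-encoded so '@', ':' or '/' in a
    password cannot be mistaken for URI delimiters."""
    auth = ""
    if user is not None:
        auth = f"{quote_plus(user)}:{quote_plus(password)}@"
    return f"mongodb://{auth}{','.join(hosts)}/?replicaSet={replica_set}"


uri = replica_set_uri(
    ["mongodb0.example.com:27017", "mongodb1.example.com:27017", "mongodb2.example.com:27017"],
    "myRepl",
    user="myDBReader",
    password="D1fficultP@ssw0rd",
)
```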
49. @arafkarsh arafkarsh
MongoDB Replication: Automatic Failover
Source: MongoDB Replication https://docs.mongodb.com/manual/replication/
If the Primary is not reachable for longer than the configured electionTimeoutMillis (default 10 seconds), then:
• One of the Secondaries becomes the Primary after an election process.
• The most up-to-date Secondary becomes the next Primary.
• The election should typically not take more than 12 seconds.
Figure: before failover, a Primary Server and two Secondary Servers exchange heartbeats; when the primary becomes unreachable, the secondaries hold an election for a new primary, one of them becomes the Primary, and replication continues from it.
Write operations are blocked until the new Primary is elected.
Secondaries can serve read operations while the election is in progress, provided the client is configured to read from them.
MongoDB 4.2+ compatible drivers enable retryable writes by default.
MongoDB 4.0 and 3.6-compatible drivers must explicitly enable retryable writes by including retryWrites=true in the connection string.
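A heavily simplified model of the failover rules above: a new primary needs a strict majority of votes among reachable members, and the most up-to-date electable member wins. `Member` and `elect` are hypothetical; real MongoDB elections also involve terms, vote requests, priority takeover and catch-up, all ignored here. An arbiter is modelled as a voting member with priority 0 and no data.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    reachable: bool
    last_optime: int      # position of the last oplog entry this member has applied
    priority: int = 1     # priority 0 members (e.g. arbiters) vote but never become primary
    votes: int = 1


def elect(members, total_votes):
    """Simplified election: a strict majority of all votes must be reachable, then the
    most up-to-date electable member wins (higher priority breaks ties)."""
    reachable = [m for m in members if m.reachable]
    if sum(m.votes for m in reachable) <= total_votes // 2:
        return None                       # no majority: no primary, writes stay blocked
    electable = [m for m in reachable if m.priority > 0]
    if not electable:
        return None
    return max(electable, key=lambda m: (m.last_optime, m.priority)).name
```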
50. @arafkarsh arafkarsh
MongoDB Replication: Arbiter
Figure: the application (client app driver) talks to a replica set with a Primary Server, one Secondary Server replicating from it, and an Arbiter.
An Arbiter can be used to save the cost of adding an additional Secondary Server: it holds no copy of the data and takes part only in the election process to select a Primary.
Source: MongoDB Replication https://docs.mongodb.com/manual/replication/
51. @arafkarsh arafkarsh
MongoDB Replication: Secondary Reads
Figure: the application (client app driver) writes to the Primary Server and reads from the Secondary Servers; the secondaries replicate from the primary and all members exchange heartbeats.
Source: MongoDB Replication https://docs.mongodb.com/manual/core/read-preference/
Asynchronous replication to secondaries means that reads from secondaries may return data that does not reflect the state of the data on the primary.
Multi-document transactions that contain read operations must use read preference primary. All operations in a given transaction must route to the same member.
Write to Primary and Read from Secondary
$ mongo 'mongodb://mongodb0,mongodb1,mongodb2/?replicaSet=rsOmega&readPreference=secondary'
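The stale-read effect described above can be reproduced with a toy primary and secondary that share an oplog. The classes and the `stock:SKU-7` key are hypothetical, and `replicate()` is called by hand where a real secondary would apply entries asynchronously.

```python
class Primary:
    def __init__(self):
        self.data, self.oplog = {}, []

    def write(self, key, value):
        self.data[key] = value
        self.oplog.append((key, value))      # secondaries copy this log later


class Secondary:
    def __init__(self, primary):
        self.primary, self.data, self.applied = primary, {}, 0

    def replicate(self):
        """Apply any oplog entries not yet seen (asynchronous in a real replica set)."""
        for key, value in self.primary.oplog[self.applied:]:
            self.data[key] = value
        self.applied = len(self.primary.oplog)

    def read(self, key):
        return self.data.get(key)


p = Primary()
s = Secondary(p)
p.write("stock:SKU-7", 10)
s.replicate()
p.write("stock:SKU-7", 9)          # a new write lands on the primary...
stale = s.read("stock:SKU-7")      # ...but the secondary has not caught up yet
s.replicate()
fresh = s.read("stock:SKU-7")
```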
52. @arafkarsh arafkarsh
MongoDB – Deploy Replica Set
Step 1: Start each mongod instance with the replica set name.
Use the command line to set the replica details on each Mongo instance:
$ mongod --replSet "rsOmega" --bind_ip localhost,<hostname(s)|ip address(es)>
Or use a config file to set the replica config on each Mongo instance:
Config File
replication:
   replSetName: "rsOmega"
net:
   bindIp: localhost,<hostname(s)|ip address(es)>
$ mongod --config <path-to-replica-config>
Source: MongoDB Replication https://docs.mongodb.com/manual/tutorial/deploy-replica-set/
53. @arafkarsh arafkarsh
MongoDB – Deploy Replica Set
Step 2: Connect to MongoDB
$ mongo
Step 3: Initiate the replica set. Run rs.initiate() on one and only one mongod instance of the replica set.
> rs.initiate( {
   _id : "rsOmega",
   members: [
      { _id: 0, host: "mongodb0.host.com:27017" },
      { _id: 1, host: "mongodb1.host.com:27017" },
      { _id: 2, host: "mongodb2.host.com:27017" }
   ]
})
Source: MongoDB Replication https://docs.mongodb.com/manual/tutorial/deploy-replica-set/
54. @arafkarsh arafkarsh
MongoDB – Deploy Replica Set
Step 4: Show the replica set config
> rs.conf()
Step 5: Ensure that the replica set has a primary
> rs.status()
Step 6: Connect to the replica set
$ mongo 'mongodb://mongodb0,mongodb1,mongodb2/?replicaSet=rsOmega'
Source: MongoDB Replication https://docs.mongodb.com/manual/tutorial/deploy-replica-set/
57. @arafkarsh arafkarsh
Distributed Transactions : 2 Phase Commit
2 PC or not 2 PC, Wherefore Art Thou XA?
How does 2PC impact scalability?
• Transactions are committed in two phases.
• This involves communicating with every database (XA
Resources) involved to determine if the transaction will commit
in the first phase.
• During the second phase each database is asked to complete
the commit.
• While all of this coordination is going on, locks in all of the data
sources are being held.
• The longer duration locks create the risk of higher contention.
• Additionally, the two phases require more database
processing time than a single phase commit.
• The result is lower overall TPS in the system.
Figure: the Transaction Manager sends a Request to Prepare to the XA Resources, which reply Prepared (Prepare Phase); it then sends Commit, and they reply Done (Commit Phase).
Source : Pat Helland (Amazon) : Life Beyond Distributed Transactions Distributed Computing : http://dancres.github.io/Pages/
Solution : Resilient System
• Event Based
• Design for failure
• Asynchronous Recovery
• Make all operations idempotent.
• Each DB operation is a 1 PC
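The two phases and the lock-holding that hurts throughput can be sketched with in-memory participants. `Participant` and `two_phase_commit` are hypothetical; a real transaction manager must also log its decision durably and handle coordinator failure, which this sketch omits.

```python
class Participant:
    """Hypothetical XA resource: votes in phase 1, finishes in phase 2."""

    def __init__(self, name, can_commit=True):
        self.name, self.can_commit = name, can_commit
        self.state = "idle"
        self.locked = False

    def prepare(self) -> bool:
        self.locked = True                 # locks are held from prepare until phase 2 ends
        self.state = "prepared" if self.can_commit else "refused"
        return self.can_commit

    def commit(self):
        self.state, self.locked = "committed", False

    def rollback(self):
        self.state, self.locked = "rolled_back", False


def two_phase_commit(participants) -> bool:
    # Phase 1 (prepare): ask every resource whether it can commit.
    votes = [p.prepare() for p in participants]
    # Phase 2 (commit / abort): a single "no" vote aborts everyone.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```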
58. @arafkarsh arafkarsh
Distributed Tx: SAGA Design Pattern instead of 2PC
Long Lived Transactions (LLTs) hold on to DB resources for relatively long periods of time, significantly delaying the termination of shorter and more common transactions.
Source: SAGAS (1987) Hector Garcia Molina / Kenneth Salem, Dept. of Computer Science, Princeton University, NJ, USA
Local Transactions: T1, T2, …, Tn
Compensating Transactions: C1, C2, …, Cn-1
Divide long-lived, distributed transactions into quick local ones with compensating actions for recovery.
BASE (Basic Availability, Soft State, Eventual Consistency)
Travel: Flight Ticket & Hotel Booking Example
T1: Room Reserved
T2: Room Payment
T3: Seat Reserved
T4: Ticket Payment
C1: Cancelled Room Reservation
C2: Cancelled Room Payment
C3: Cancelled Ticket Reservation
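Backward recovery for the travel example can be sketched as a loop that runs local transactions in order and, on failure, runs the compensations of completed steps in reverse. `run_saga` and the step tuples are hypothetical; the step labels follow T1 to T4 and C1 to C3 above, with the final step needing no compensation.

```python
def run_saga(steps):
    """steps: list of (t_name, c_name, action, compensation). Runs actions in order; if one
    fails, runs the compensations of the completed steps in reverse (backward recovery)."""
    log, completed = [], []
    for t_name, c_name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"{t_name} failed")
            for done_c_name, done_compensate in reversed(completed):
                done_compensate()          # each compensation is itself a local transaction
                log.append(done_c_name)
            return False, log
        log.append(t_name)
        completed.append((c_name, compensate))
    return True, log


def ok():
    pass


def card_declined():
    raise RuntimeError("ticket payment declined")


travel = [
    ("T1", "C1", ok, ok),                # Room Reserved / Cancelled Room Reservation
    ("T2", "C2", ok, ok),                # Room Payment / Cancelled Room Payment
    ("T3", "C3", ok, ok),                # Seat Reserved / Cancelled Ticket Reservation
    ("T4", None, card_declined, ok),     # Ticket Payment: last step, nothing to compensate
]
success, log = run_saga(travel)
```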
59. @arafkarsh arafkarsh
SAGA Design Pattern Features
Type | Sequence | Examples
1. Backward Recovery (Rollback) | T1 T2 T3 T4 C3 C2 C1 | Order Processing, Banking Transactions, Ticket Booking
2. Forward Recovery with Save Points | T1 (sp) T2 (sp) T3 (sp) | Updating individual scores in a Team Game
• To recover from hardware failures, a SAGA needs to be persistent.
• Save Points are available for both Forward and Backward Recovery.
Source: SAGAS (1987) Hector Garcia Molina / Kenneth Salem, Dept. of Computer Science, Princeton University, NJ, USA
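Forward recovery with save points can be sketched as: persist the index of the next step after every completed step, retry a failing step instead of compensating, and resume from the save point after a crash. `SavePointStore` (a dict standing in for durable storage), `run_forward` and the use of `ConnectionError` as the retryable failure are assumptions.

```python
import json


class SavePointStore:
    """Hypothetical durable store for save points (a dict stands in for a DB table)."""

    def __init__(self):
        self._rows = {}

    def save(self, saga_id, next_step):
        self._rows[saga_id] = json.dumps({"next_step": next_step})

    def load(self, saga_id):
        row = self._rows.get(saga_id)
        return json.loads(row)["next_step"] if row else 0


def run_forward(saga_id, steps, store, max_retries=3):
    """Forward recovery: start at the last save point and retry a failing step
    instead of compensating the steps already done."""
    i = store.load(saga_id)
    while i < len(steps):
        for attempt in range(max_retries):
            try:
                steps[i]()
                break
            except ConnectionError:
                if attempt == max_retries - 1:
                    raise          # give up for now; the save point lets a later run resume here
        i += 1
        store.save(saga_id, i)     # save point (sp) after each completed step
    return i
```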
60. @arafkarsh arafkarsh
Handling Invariants – Monolithic to Micro Services
In a typical monolithic app, the customer credit limit info and the order processing are part of the same app. Following is typical pseudo code (Monolithic, 2 Phase Commit):
Begin Transaction
   If Order Value <= Available Credit
      Process Order
      Process Payments
End Transaction
In the microservices world with Event Sourcing, it's a distributed environment. The order is cancelled if the credit is NOT available. If payment processing fails, then the reserved credit is cancelled.
T1: Order Created (Order Microservice)
T2: Credit Reserved (Customer Microservice)
T3: Payment Processed (Payment Microservice)
C1: Order Cancelled
C2: Credit Cancelled due to payment failure
https://en.wikipedia.org/wiki/Invariant_(computer_science)
61. @arafkarsh arafkarsh 61
Use Case : Restaurant – Forward Recovery
Domain
The example focus on a
concept of a Restaurant
which tracks the visit of
an individual or group
to the Restaurant. When
people arrive at the
Restaurant and take a
table, a table is opened.
They may then order
drinks and food. Drinks
are served immediately
by the table staff,
however food must be
cooked by a chef. Once
the chef prepared the
food it can then be
served.
Contexts: Dining, Billing, Payment
Source: http://cqrs.nu/tutorial/cs/01-design

Dining Event Stream (Aggregate Root : Dining Order)
Table Opened, Juice Ordered, Soda Ordered, Soda Cancelled, Appetizer Ordered, Soup Ordered, Food Ordered, Juice Served, Food Prepared, Food Served, Appetizer Served, Table Closed

Billing Event Stream (Aggregate Root : Food Bill)
T1 Billed Order (sp) → T2 Payment CC (sp) → T3 Payment Cash (sp)
On a Network Error, compensation C1 is applied and the Saga continues from the save point (sp).
The transaction doesn't roll back if one payment method fails. It moves forward to the NEXT one.
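A minimal sketch of Forward Recovery for the Food Bill, assuming payment methods are tried in a fixed order and a save point records the last committed step. `FoodBillSaga`, `billOrder`, `pay` and the event names are hypothetical, and the save point is a plain field standing in for durable storage.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of Forward Recovery for the Food Bill aggregate.
final class FoodBillSaga {
    final List<String> eventStream = new ArrayList<>();
    // Last committed step; a real Saga would persist this durably so it survives crashes.
    private String savePoint;

    // T1: raise the bill. It is never rolled back by a later payment failure.
    void billOrder() {
        eventStream.add("Billed Order");
        savePoint = "Billed Order";
    }

    // T2, T3: try each payment method in turn. A failure (e.g. a network error)
    // is recorded, and the Saga moves forward from the save point to the NEXT method.
    boolean pay(List<String> methods, Predicate<String> attempt) {
        for (String method : methods) {
            if (attempt.test(method)) {
                eventStream.add("Paid by " + method);
                savePoint = "Paid by " + method;
                return true;
            }
            eventStream.add(method + " Failed");
        }
        return false; // every method failed; the bill stays open at the last save point
    }

    String savePoint() { return savePoint; }

    public static void main(String[] args) {
        FoodBillSaga saga = new FoodBillSaga();
        saga.billOrder();
        // Simulate a network error on the credit card payment; cash succeeds.
        boolean paid = saga.pay(List.of("Credit Card", "Cash"), m -> m.equals("Cash"));
        System.out.println(paid + " " + saga.eventStream);
        // true [Billed Order, Credit Card Failed, Paid by Cash]
    }
}
```

Unlike the Backward Recovery sketch, nothing is undone here: the failure is recorded in the event stream and the Saga keeps moving forward.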
62. @arafkarsh arafkarsh
Local SAGA Features
1. Part of the Micro Services
2. Local Transactions and Compensation Transactions
3. SAGA State is persisted
4. All the Local transactions are based on Single Phase Commit (1 PC)
5. Developers need to ensure that appropriate compensating transactions are Raised in the event of a failure.
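API Examples
// Starts the "HotelBooking" Saga.
@StartSaga(name = "HotelBooking")
public void reserveRoom(…) {
}
// Ends the "HotelBooking" Saga.
@EndSaga(name = "HotelBooking")
public void payForTickets(…) {
}
// Aborts the "HotelBooking" Saga.
@AbortSaga(name = "HotelBooking")
public void cancelBooking(…) {
}
// Compensating transaction for reserveRoom.
@CompensationTx()
public void cancelReservation(…) {
}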
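The annotations above are not tied to a specific framework. A minimal sketch of how such an API could work, assuming the annotations are declared as plain Java runtime annotations and a small reflective runner raises the compensating transaction when ending the Saga fails. All names, method bodies and the `paymentFails` switch are hypothetical, the parameters elided in the original are dropped, and `@AbortSaga` is omitted for brevity.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical annotations mirroring the slide; declared here, not taken from a library.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface StartSaga { String name(); }
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface EndSaga { String name(); }
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface CompensationTx {}

// Hypothetical service: each method stands for a local, single-phase-commit transaction.
class HotelBooking {
    final List<String> log = new ArrayList<>(); // stands in for the persisted Saga state
    boolean paymentFails;                       // switch to simulate a failure

    @StartSaga(name = "HotelBooking")
    public void reserveRoom() { log.add("Room Reserved"); }

    @EndSaga(name = "HotelBooking")
    public void payForTickets() {
        if (paymentFails) throw new IllegalStateException("Payment declined");
        log.add("Paid");
    }

    @CompensationTx()
    public void cancelReservation() { log.add("Reservation Cancelled"); }
}

final class LocalSagaRunner {
    // Invokes the first public method carrying the given annotation.
    private static void invoke(Object target, Class<? extends Annotation> marker)
            throws ReflectiveOperationException {
        for (Method m : target.getClass().getMethods()) {
            if (m.isAnnotationPresent(marker)) {
                m.invoke(target);
                return;
            }
        }
        throw new NoSuchMethodException("No method annotated with @" + marker.getSimpleName());
    }

    // Start the Saga, try to end it, and raise the compensating transaction if ending fails.
    static void run(Object target) {
        try {
            invoke(target, StartSaga.class);
            try {
                invoke(target, EndSaga.class);
            } catch (InvocationTargetException failure) {
                invoke(target, CompensationTx.class);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        HotelBooking booking = new HotelBooking();
        booking.paymentFails = true;
        run(booking);
        System.out.println(booking.log); // [Room Reserved, Reservation Cancelled]
    }
}
```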
63. @arafkarsh arafkarsh
SAGA Execution Container
1. SEC is a separate Process.
2. It is Stateless in nature; the Saga state is stored in a messaging system (Kafka is a Good choice).
3. An SEC process failure MUST NOT affect Saga Execution: a restarted SEC must resume from where the Saga left off.
4. SEC – No Single Point of Failure (Master Slave Model).
5. Distributed SAGA Rules are defined using a DSL.
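A minimal sketch of why a stateless SEC can restart safely: it keeps no progress of its own and rebuilds it by replaying the Saga Log. Here the log is an in-memory `List` standing in for a durable log such as a Kafka topic (no Kafka API is used), and the step list stands in for the DSL definition. `SagaExecutionContainer`, `nextStep` and `resume` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the SEC derives all progress from the Saga Log, so a new
// instance can pick up exactly where a crashed one stopped.
final class SagaExecutionContainer {
    private final List<String> steps;    // Saga definition (would come from the DSL)
    private final List<String> sagaLog;  // stand-in for a durable log

    SagaExecutionContainer(List<String> steps, List<String> sagaLog) {
        this.steps = steps;
        this.sagaLog = sagaLog;
    }

    // The next step is the first one without an "End" entry; null when all have ended.
    String nextStep() {
        for (String step : steps) {
            if (!sagaLog.contains("End " + step)) return step;
        }
        return null;
    }

    // Executes the remaining steps, appending Start/End entries to the log.
    void resume() {
        if (!sagaLog.contains("Start Saga")) sagaLog.add("Start Saga");
        for (String step = nextStep(); step != null; step = nextStep()) {
            if (!sagaLog.contains("Start " + step)) sagaLog.add("Start " + step);
            // A real SEC would call the service for this step here. The call may repeat a
            // request sent before the crash, so the service must handle it idempotently.
            sagaLog.add("End " + step);
        }
        if (!sagaLog.contains("End Saga")) sagaLog.add("End Saga");
    }

    public static void main(String[] args) {
        // The previous SEC instance crashed right after logging "Start Car".
        List<String> log = new ArrayList<>(List.of("Start Saga", "Start Hotel", "End Hotel", "Start Car"));
        new SagaExecutionContainer(List.of("Hotel", "Car", "Flight", "Payment"), log).resume();
        System.out.println(log);
    }
}
```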
64. @arafkarsh arafkarsh
Use Case : Travel Booking – Distributed Saga (SEC)
Services: Hotel Booking, Car Booking, Flight Booking, Payment
Coordinator: Saga Execution Container (SEC) with its Saga Log

Saga Log
Start Saga {Booking Request}
Start Hotel, End Hotel
Start Car, End Car
Start Flight, End Flight
Start Payment, End Payment
End Saga {Booking Confirmed}

SEC knows the structure of the distributed Saga: for each Request, which Service needs to be called and which Recovery mechanism needs to be followed.
SEC can parallelize the calls to multiple services to improve performance.
Whether to Roll back or Roll forward depends on the business case.
Source: Distributed Sagas By Caitie McCaffrey, June 6, 2017
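A minimal sketch of the parallel calls, assuming the Hotel, Car and Flight bookings are independent and Payment must wait for all three. It uses the JDK's `CompletableFuture`; the service calls are hypothetical `Supplier`s, and `ParallelTravelSaga` is an illustrative name.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical sketch: the SEC fires the independent bookings in parallel
// and calls Payment only after all three have completed.
final class ParallelTravelSaga {
    static String book(Supplier<String> hotel, Supplier<String> car,
                       Supplier<String> flight, Supplier<String> payment) {
        CompletableFuture<String> h = CompletableFuture.supplyAsync(hotel);
        CompletableFuture<String> c = CompletableFuture.supplyAsync(car);
        CompletableFuture<String> f = CompletableFuture.supplyAsync(flight);
        // allOf completes when every booking finishes; join() rethrows a failure wrapped
        // in CompletionException, which is where the SEC would start compensating.
        CompletableFuture.allOf(h, c, f).join();
        return String.join(", ", List.of(h.join(), c.join(), f.join(), payment.get()));
    }

    public static void main(String[] args) {
        System.out.println(book(() -> "Hotel Booked", () -> "Car Booked",
                                () -> "Flight Booked", () -> "Payment Processed"));
        // Hotel Booked, Car Booked, Flight Booked, Payment Processed
    }
}
```

The total latency is roughly the slowest of the three bookings rather than their sum.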
65. @arafkarsh arafkarsh
Use Case : Travel Booking – Rollback
Services: Hotel Booking, Car Booking, Flight Booking, Payment
Coordinator: Saga Execution Container (SEC) with its Saga Log

Saga Log
Start Saga {Booking Request}
Start Hotel, End Hotel
Start Car, Abort Car
Start Comp Saga
Cancel Hotel
Cancel Flight
End Comp Saga
End Saga {Booking Cancelled}

Kafka is a good choice to implement the SEC log.
SEC is completely STATELESS in nature. A Master Slave model can be implemented to avoid a Single Point of Failure.
Source: Distributed Sagas By Caitie McCaffrey, June 6, 2017
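A minimal sketch of how the SEC could derive the compensating Saga from its log when a sub-transaction aborts. It assumes that every started sub-transaction is cancelled (even one without an End entry, since a request sent in parallel may already have taken effect), that an aborted one needs no cancel, and that cancels run newest first. `SagaRollback` and the log format are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: build the compensating Saga from the Saga Log.
final class SagaRollback {
    static List<String> compensate(List<String> sagaLog) {
        List<String> plan = new ArrayList<>();
        plan.add("Start Comp Saga");
        for (int i = sagaLog.size() - 1; i >= 0; i--) {           // newest entry first
            String entry = sagaLog.get(i);
            if (!entry.startsWith("Start ") || entry.equals("Start Saga")) continue;
            String step = entry.substring("Start ".length());
            // An aborted step never took effect, so it needs no compensation.
            if (!sagaLog.contains("Abort " + step)) plan.add("Cancel " + step);
        }
        plan.add("End Comp Saga");
        return plan;
    }

    public static void main(String[] args) {
        // Flight was requested in parallel and has no End entry yet when Car aborts.
        List<String> log = List.of("Start Saga", "Start Hotel", "End Hotel",
                                   "Start Flight", "Start Car", "Abort Car");
        System.out.println(compensate(log));
        // [Start Comp Saga, Cancel Flight, Cancel Hotel, End Comp Saga]
    }
}
```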
66. @arafkarsh arafkarsh
Summary: Databases
1. DB Sharding / Partition
2. 2 Phase Commit: doesn't scale well in a cloud environment.
3. SAGA Design Pattern: raise compensating events when a local transaction fails.
4. SAGA supports Rollbacks & Roll Forwards: a critical pattern to address distributed transactions.
67. @arafkarsh arafkarsh
Scalability Best Practices : Lessons from eBay
Best Practices Highlights
#1 Partition By Function
• Decouple the Unrelated Functionalities.
• Selling functionality is served by one set of applications, bidding by another, search by yet another.
• 16,000 App Servers in 220 different pools
• 1000 logical databases, 400 physical hosts
#2 Split Horizontally
• Break the workload into manageable units.
• eBay’s interactions are stateless by design.
• All App Servers are treated equally, and none retains any transactional state.
• Data Partitioning based on specific requirements
#3 Avoid Distributed Transactions
• 2 Phase Commit is a pessimistic approach that comes with a big COST.
• CAP Theorem (Consistency, Availability, Partition Tolerance): only two of the three can be guaranteed at any point in time.
• @ eBay: No Distributed Transactions of any kind and NO 2 Phase Commit.
#4 Decouple Functions Asynchronously
• If Component A calls Component B synchronously, they are tightly coupled: to scale A, you need to scale B as well.
• If the call is Asynchronous, A can move forward irrespective of the state of B.
• SEDA (Staged Event Driven Architecture)
#5 Move Processing to Asynchronous Flow
• Move as much processing as possible towards the Asynchronous side.
• Anything that can wait should wait.
#6 Virtualize at All Levels
• Virtualize everything. eBay created their own O/R layer for abstraction.
#7 Cache Appropriately
• Cache slow-changing, read-mostly data, metadata, configuration and static data.
Source: http://www.infoq.com/articles/ebay-scalability-best-practices
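A minimal sketch of #4 as a single SEDA-style stage, assuming an in-process `BlockingQueue` as the boundary between component A and component B. eBay's actual mechanism is not described here; `AsyncStage`, the event text and the poison-pill shutdown are hypothetical illustration choices.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: component A only enqueues events; component B processes
// them on its own thread, so A never waits on B.
final class AsyncStage {
    // Poison pill used to stop the worker; compared by identity, not by value.
    private static final String STOP = new String("STOP");
    private final BlockingQueue<String> inbox = new LinkedBlockingQueue<>();
    private final BlockingQueue<String> results = new LinkedBlockingQueue<>();
    private final Thread worker;

    AsyncStage() {
        worker = new Thread(() -> {
            try {
                while (true) {
                    String event = inbox.take();          // blocks until work arrives
                    if (event == STOP) break;
                    results.put("handled " + event);      // component B's processing
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Component A: returns immediately, irrespective of the state of B.
    void submit(String event) { inbox.add(event); }

    // Waits up to `seconds` for the next result from B; returns null on timeout.
    String nextResult(long seconds) {
        try {
            return results.poll(seconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    void shutdown() {
        inbox.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AsyncStage stage = new AsyncStage();
        stage.submit("Bid Placed");                 // A moves on without waiting for B
        System.out.println(stage.nextResult(5));    // handled Bid Placed
        stage.shutdown();
    }
}
```

Because the queue absorbs bursts, B can be scaled (more worker threads or instances) independently of A.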
68. @arafkarsh arafkarsh
100s Microservices
1,000s Releases / Day
10,000s Virtual Machines
100K+ User actions / Second
81 M Customers Globally
1 B Time series Metrics
10 B Hours of video streaming every quarter
10s OPs Engineers
0 NOC
0 Data Centers
Source: Netflix: https://www.youtube.com/watch?v=UTKIT6STSVM

So what does Netflix think about DevOps?
No DevOps
Don’t do a lot of Process / Procedures
Freedom for Developers & be Accountable
Trust people you Hire
No Controls / Silos / Walls / Fences
Ownership – You Build it, You Run it.
69. @arafkarsh arafkarsh
50M Paid Subscribers
100M Active Users
60 Countries
1000+ Microservices
1000+ Tech Employees
120+ Teams
Cross Functional Team
• Full, End to End ownership of features
• Autonomous
Source: https://microcph.dk/media/1024/conference-microcph-2017.pdf
70. @arafkarsh arafkarsh
Design Patterns
Design Patterns are solutions to general problems that software developers faced during software development.
74. @arafkarsh arafkarsh
References
1. July 15, 2015 – Agile is Dead : GoTo 2015 By Dave Thomas
2. Apr 7, 2016 - Agile Project Management with Kanban | Eric Brechner | Talks at Google
3. Sep 27, 2017 - Scrum vs Kanban - Two Agile Teams Go Head-to-Head
4. Feb 17, 2019 - Lean vs Agile vs Design Thinking
5. Dec 17, 2020 - Scrum vs Kanban | Differences & Similarities Between Scrum & Kanban
6. Feb 24, 2021 - Agile Methodology Tutorial for Beginners | Jira Tutorial | Agile Methodology Explained.
Agile Methodologies
75. @arafkarsh arafkarsh
References
1. Vmware: What is Cloud Architecture?
2. Redhat: What is Cloud Architecture?
3. Cloud Computing Architecture
4. Cloud Adoption Essentials:
5. Google: Hybrid and Multi Cloud
6. IBM: Hybrid Cloud Architecture Intro
7. IBM: Hybrid Cloud Architecture: Part 1
8. IBM: Hybrid Cloud Architecture: Part 2
9. Cloud Computing Basics: IaaS, PaaS, SaaS
1. IBM: IaaS Explained
2. IBM: PaaS Explained
3. IBM: SaaS Explained
4. IBM: FaaS Explained
5. IBM: What is Hypervisor?
Cloud Architecture
76. @arafkarsh arafkarsh
References
Microservices
1. Microservices Definition by Martin Fowler
2. When to use Microservices By Martin Fowler
3. GoTo: Sep 3, 2020: When to use Microservices By Martin Fowler
4. GoTo: Feb 26, 2020: Monolith Decomposition Pattern
5. Thought Works: Microservices in a Nutshell
6. Microservices Prerequisites
7. What do you mean by Event Driven?
8. Understanding Event Driven Design Patterns for Microservices
77. @arafkarsh arafkarsh
References – Microservices – Videos
1. Martin Fowler – Micro Services : https://www.youtube.com/watch?v=2yko4TbC8cI&feature=youtu.be&t=15m53s
2. GOTO 2016 – Microservices at NetFlix Scale: Principles, Tradeoffs & Lessons Learned. By R Meshenberg
3. Mastering Chaos – A NetFlix Guide to Microservices. By Josh Evans
4. GOTO 2015 – Challenges Implementing Micro Services By Fred George
5. GOTO 2016 – From Monolith to Microservices at Zalando. By Rodrigue Schaefer
6. GOTO 2015 – Microservices @ Spotify. By Kevin Goldsmith
7. Modelling Microservices @ Spotify : https://www.youtube.com/watch?v=7XDA044tl8k
8. GOTO 2015 – DDD & Microservices: At last, Some Boundaries By Eric Evans
9. GOTO 2016 – What I wish I had known before Scaling Uber to 1000 Services. By Matt Ranney
10. DDD Europe – Tackling Complexity in the Heart of Software By Eric Evans, April 11, 2016
11. AWS re:Invent 2016 – From Monolithic to Microservices: Evolving Architecture Patterns. By Emerson L, Gilt D. Chiles
12. AWS 2017 – An overview of designing Microservices based Applications on AWS. By Peter Dalbhanjan
13. GOTO Jun, 2017 – Effective Microservices in a Data Centric World. By Randy Shoup.
14. GOTO July, 2017 – The Seven (more) Deadly Sins of Microservices. By Daniel Bryant
15. Sept, 2017 – Airbnb, From Monolith to Microservices: How to scale your Architecture. By Melanie Cebula
16. GOTO Sept, 2017 – Rethinking Microservices with Stateful Streams. By Ben Stopford.
17. GOTO 2017 – Microservices without Servers. By Glynn Bird.
78. @arafkarsh arafkarsh
References
Domain Driven Design
1. Oct 27, 2012 What I have learned about DDD Since the book. By Eric Evans
2. Mar 19, 2013 Domain Driven Design By Eric Evans
3. Jun 02, 2015 Applied DDD in Java EE 7 and Open Source World
4. Aug 23, 2016 Domain Driven Design the Good Parts By Jimmy Bogard
5. Sep 22, 2016 GOTO 2015 – DDD & REST Domain Driven API’s for the Web. By Oliver Gierke
6. Jan 24, 2017 Spring Developer – Developing Micro Services with Aggregates. By Chris Richardson
7. May 17, 2017 DEVOXX – The Art of Discovering Bounded Contexts. By Nick Tune
8. Dec 21, 2019 What is DDD - Eric Evans - DDD Europe 2019. By Eric Evans
9. Oct 2, 2020 - Bounded Contexts - Eric Evans - DDD Europe 2020. By Eric Evans
10. Oct 2, 2020 - DDD By Example - Paul Rayner - DDD Europe 2020. By Paul Rayner
79. @arafkarsh arafkarsh
References
Event Sourcing and CQRS
1. IBM: Event Driven Architecture – Mar 21, 2021
2. Martin Fowler: Event Driven Architecture – GOTO 2017
3. Greg Young: A Decade of DDD, Event Sourcing & CQRS – April 11, 2016
4. Nov 13, 2014 GOTO 2014 – Event Sourcing. By Greg Young
5. Mar 22, 2016 Building Micro Services with Event Sourcing and CQRS
6. Apr 15, 2016 YOW! Nights – Event Sourcing. By Martin Fowler
7. May 08, 2017 When Micro Services Meet Event Sourcing. By Vinicius Gomes
80. @arafkarsh arafkarsh
References
Kafka
1. Understanding Kafka
2. Understanding RabbitMQ
3. IBM: Apache Kafka – Sept 18, 2020
4. Confluent: Apache Kafka Fundamentals – April 25, 2020
5. Confluent: How Kafka Works – Aug 25, 2020
6. Confluent: How to integrate Kafka into your environment – Aug 25, 2020
7. Kafka Streams – Sept 4, 2021
8. Kafka: Processing Streaming Data with KSQL – Jul 16, 2018
9. Kafka: Processing Streaming Data with KSQL – Nov 28, 2019
81. @arafkarsh arafkarsh
References
Databases: Big Data / Cloud Databases
1. Google: How to Choose the right database?
2. AWS: Choosing the right Database
3. IBM: NoSQL Vs. SQL
4. A Guide to NoSQL Databases
5. How does NoSQL Databases Work?
6. What is Better? SQL or NoSQL?
7. What is DBaaS?
8. NoSQL Concepts
9. Key Value Databases
10. Document Databases
11. Jun 29, 2012 – Google I/O 2012 - SQL vs NoSQL: Battle of the Backends
12. Feb 19, 2013 - Introduction to NoSQL • Martin Fowler • GOTO 2012
13. Jul 25, 2018 - SQL vs NoSQL or MySQL vs MongoDB
14. Oct 30, 2020 - Column vs Row Oriented Databases Explained
15. Dec 9, 2020 - How do NoSQL databases work? Simply Explained!
1. Graph Databases
2. Column Databases
3. Row Vs. Column Oriented Databases
4. Database Indexing Explained
5. MongoDB Indexing
6. AWS: DynamoDB Global Indexing
7. AWS: DynamoDB Local Indexing
8. Google Cloud Spanner
9. AWS: DynamoDB Design Patterns
10. Cloud Provider Database Comparisons
11. CockroachDB: When to use a Cloud DB?
82. @arafkarsh arafkarsh
References
Docker / Kubernetes / Istio
1. IBM: Virtual Machines and Containers
2. IBM: What is a Hypervisor?
3. IBM: Docker Vs. Kubernetes
4. IBM: Containerization Explained
5. IBM: Kubernetes Explained
6. IBM: Kubernetes Ingress in 5 Minutes
7. Microsoft: How Service Mesh works in Kubernetes
8. IBM: Istio Service Mesh Explained
9. IBM: Kubernetes and OpenShift
10. IBM: Kubernetes Operators
11. 10 Considerations for Kubernetes Deployments
Istio – Metrics
1. Istio – Metrics
2. Monitoring Istio Mesh with Grafana
3. Visualize your Istio Service Mesh
4. Security and Monitoring with Istio
5. Observing Services using Prometheus, Grafana, Kiali
6. Istio Cookbook: Kiali Recipe
7. Kubernetes: Open Telemetry
8. Open Telemetry
9. How Prometheus works
10. IBM: Observability vs. Monitoring
83. @arafkarsh arafkarsh
References
1. Feb 6, 2020 – An introduction to TDD
2. Aug 14, 2019 – Component Software Testing
3. May 30, 2020 – What is Component Testing?
4. Apr 23, 2013 – Component Test By Martin Fowler
5. Jan 12, 2011 – Contract Testing By Martin Fowler
6. Jan 16, 2018 – Integration Testing By Martin Fowler
7. Testing Strategies in Microservices Architecture
8. Practical Test Pyramid By Ham Vocke
Testing – TDD / BDD
84. @arafkarsh arafkarsh
1. Simoorg : LinkedIn’s own failure inducer framework. It was designed to be easy to extend, and most of the important components are pluggable.
2. Pumba : A chaos testing and network emulation tool for Docker.
3. Chaos Lemur : Self-hostable application to randomly destroy virtual machines in a BOSH-managed environment, as an aid to resilience testing of high-availability systems.
4. Chaos Lambda : Randomly terminate AWS ASG instances during business hours.
5. Blockade : Docker-based utility for testing network failures and partitions in distributed applications.
6. Chaos-http-proxy : Introduces failures into HTTP requests via a proxy server.
7. Monkey-ops : Monkey-Ops is a simple service implemented in Go, which is deployed into an OpenShift V3.X and generates some chaos within it. Monkey-Ops seeks some OpenShift components like Pods or Deployment Configs and randomly terminates them.
8. Chaos Dingo : Chaos Dingo currently supports performing operations on Azure VMs and VMSS deployed to an Azure Resource Manager-based resource group.
9. Tugbot : Testing in Production (TiP) framework for Docker.
Testing tools
85. @arafkarsh arafkarsh
References
CI / CD
1. What is Continuous Integration?
2. What is Continuous Delivery?
3. CI / CD Pipeline
4. What is CI / CD Pipeline?
5. CI / CD Explained
6. CI / CD Pipeline using Java Example Part 1
7. CI / CD Pipeline using Ansible Part 2
8. Declarative Pipeline vs Scripted Pipeline
9. Complete Jenkins Pipeline Tutorial
10. Common Pipeline Mistakes
11. CI / CD for a Docker Application
86. @arafkarsh arafkarsh
References
DevOps
1. IBM: What is DevOps?
2. IBM: Cloud Native DevOps Explained
3. IBM: Application Transformation
4. IBM: Virtualization Explained
5. What is DevOps? Easy Way
6. DevOps?! How to become a DevOps Engineer???
7. Amazon: https://www.youtube.com/watch?v=mBU3AJ3j1rg
8. NetFlix: https://www.youtube.com/watch?v=UTKIT6STSVM
9. DevOps and SRE: https://www.youtube.com/watch?v=uTEL8Ff1Zvk
10. SLI, SLO, SLA : https://www.youtube.com/watch?v=tEylFyxbDLE
11. DevOps and SRE : Risks and Budgets : https://www.youtube.com/watch?v=y2ILKr8kCJU
12. SRE @ Google: https://www.youtube.com/watch?v=d2wn_E1jxn4
87. @arafkarsh arafkarsh
References
1. Lewis, James, and Martin Fowler. “Microservices: A Definition of This New Architectural Term”, March 25, 2014.
2. Miller, Matt. “Innovate or Die: The Rise of Microservices”. The Wall Street Journal, October 5, 2015.
3. Newman, Sam. Building Microservices. O’Reilly Media, 2015.
4. Alagarasan, Vijay. “Seven Microservices Anti-patterns”, August 24, 2015.
5. Cockcroft, Adrian. “State of the Art in Microservices”, December 4, 2014.
6. Fowler, Martin. “Microservice Prerequisites”, August 28, 2014.
7. Fowler, Martin. “Microservice Tradeoffs”, July 1, 2015.
8. Humble, Jez. “Four Principles of Low-Risk Software Release”, February 16, 2012.
9. Zuul Edge Server, Ketan Gote, May 22, 2017
10. Ribbon, Hystrix using Spring Feign, Ketan Gote, May 22, 2017
11. Eureka Server with Spring Cloud, Ketan Gote, May 22, 2017
12. Apache Kafka, A Distributed Streaming Platform, Ketan Gote, May 20, 2017
13. Functional Reactive Programming, Araf Karsh Hamid, August 7, 2016
14. Enterprise Software Architectures, Araf Karsh Hamid, July 30, 2016
15. Docker and Linux Containers, Araf Karsh Hamid, April 28, 2015
88. @arafkarsh arafkarsh
References
16. MSDN – Microsoft https://msdn.microsoft.com/en-us/library/dn568103.aspx
17. Martin Fowler : CQRS – http://martinfowler.com/bliki/CQRS.html
18. Udi Dahan : CQRS – http://www.udidahan.com/2009/12/09/clarified-cqrs/
19. Greg Young : CQRS - https://www.youtube.com/watch?v=JHGkaShoyNs
20. Bertrand Meyer – CQS - http://en.wikipedia.org/wiki/Bertrand_Meyer
21. CQS : http://en.wikipedia.org/wiki/Command–query_separation
22. CAP Theorem : http://en.wikipedia.org/wiki/CAP_theorem
23. CAP Theorem : http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
24. CAP Twelve Years Later: How the “Rules” Have Changed
25. EBay Scalability Best Practices : http://www.infoq.com/articles/ebay-scalability-best-practices
26. Pat Helland (Amazon) : Life beyond distributed transactions
27. Stanford University: Rx https://www.youtube.com/watch?v=y9xudo3C1Cw
28. Princeton University: SAGAS (1987) Hector Garcia Molina / Kenneth Salem
29. Rx Observable : https://dzone.com/articles/using-rx-java-observable
Editor's Notes
Distributed Transactions and Multi-Document Transactions
Starting in MongoDB 4.2, the two terms are synonymous. Distributed transactions refer to multi-document transactions on sharded clusters and replica sets. Multi-document transactions (whether on sharded clusters or replica sets) are also known as distributed transactions starting in MongoDB 4.2.
http://microservices.io/articles/scalecube.html
Sharding in the context of databases is the process of splitting very large databases into smaller parts, or shards. As experience can tell us, some statements that we issue to our database can take a considerable amount of time to execute. During these statements’ execution, the database becomes locked and unavailable for the application. This means that we are introducing a period of downtime to our users.
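A minimal sketch of routing requests to shards, assuming simple hash-modulo partitioning on a customer key. `ShardRouter` and the shard names are hypothetical. With this scheme, changing the number of shards remaps most keys, which is why consistent hashing is often used instead.

```java
import java.util.List;

// Hypothetical sketch: hash-based sharding. Each customer's data lives on exactly one
// shard, so a long-running statement only ties up that shard, not the whole database.
final class ShardRouter {
    private final List<String> shards;

    ShardRouter(List<String> shards) {
        if (shards.isEmpty()) throw new IllegalArgumentException("need at least one shard");
        this.shards = List.copyOf(shards);
    }

    // Same key -> same shard. floorMod keeps the index non-negative for negative hash codes.
    String shardFor(String customerId) {
        return shards.get(Math.floorMod(customerId.hashCode(), shards.size()));
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(List.of("shard-0", "shard-1", "shard-2"));
        for (String id : List.of("customer-1", "customer-2", "customer-3")) {
            System.out.println(id + " -> " + router.shardFor(id));
        }
    }
}
```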
Pat Helland (Amazon) : Life beyond distributed transactions… http://adrianmarriott.net/logosroot/papers/LifeBeyondTxns.pdf
In computer science, an invariant is a condition that can be relied upon to be true during execution of a program, or during some portion of it. It is a logical assertion that is held to always be true during a certain phase of execution.