This document provides an overview of modern software architecture models and concepts. It begins with an introduction to software architecture and its definitions, then discusses Kruchten's 4+1 view model for describing architecture through multiple views. Additional topics include the OCTO matrix approach, example architecture diagrams for a sample application called RIA Organizer, and modern architectures such as big data, microservices, and serverless computing.
Deep-dive into Microservices Patterns with Replication and Stream Analytics
Target Audience: Microservices and Data Architects
This is an informational presentation about microservices event patterns, GoldenGate event replication, and event stream processing with Oracle Stream Analytics. This session will discuss some of the challenges of working with data in a microservices architecture (MA), and how the emerging concept of a “Data Mesh” can go hand-in-hand to improve microservices-based data management patterns. You may have already heard about common microservices patterns like CQRS, Saga, Event Sourcing and Transaction Outbox; we’ll share how GoldenGate can simplify these patterns while also bringing stronger data consistency to your microservice integrations. We will also discuss how complex event processing (CEP) and stream processing can be used with event-driven MA for operational and analytical use cases.
Business pressures for modernization and digital transformation drive demand for rapid, flexible DevOps, which microservices address, but also for data-driven analytics, machine learning, and data lakes, which is where data management technology really shines. Join us for this presentation, where we take a deep look at the intersection of microservice design patterns and modern data integration technology.
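The Transaction Outbox pattern mentioned above can be sketched in a few lines. This is a minimal illustration, not GoldenGate's actual mechanism: sqlite3 stands in for the service database, and the table, topic, and event names (orders, outbox, OrderPlaced) are invented for the example.

```python
# Minimal sketch of the Transaction Outbox pattern. The business write
# and the event record are committed in ONE transaction, so a separate
# relay process (e.g. a log-based replicator) can deliver the event
# without the dual-write inconsistency problem.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, "
             "topic TEXT, payload TEXT)")

def place_order(order_id: int) -> None:
    """Write the order and its event atomically."""
    with conn:  # one transaction: both rows commit or neither does
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, "PLACED"))
        conn.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                     ("orders", json.dumps({"order_id": order_id,
                                            "type": "OrderPlaced"})))

place_order(42)
# A separate relay would poll the outbox and publish to the broker:
events = [json.loads(p) for (p,) in
          conn.execute("SELECT payload FROM outbox ORDER BY id")]
print(events)  # [{'order_id': 42, 'type': 'OrderPlaced'}]
```

The point of the pattern is that the relay, not the service, talks to the message broker, so losing the broker connection can never leave the database and the event stream disagreeing.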
In this session, we’ll discuss the benefits of moving from monolithic to micro-services application architectures, and examine where micro-services can be used. We’ll share common transition strategies and relate them to the specifics of e-commerce and retail workloads, using customer examples. You’ll learn how to build micro-services using AWS services, and get a better understanding of the role of data storage, API endpoints and service discovery. Plus, you can learn from the real-life experience of Digital Goodie, an online retailing platform for connected commerce.
Event-driven architecture (EDA) is a software architecture pattern promoting the production, detection, consumption of, and reaction to events.
This architectural pattern may be applied by the design and implementation of applications and systems which transmit events among loosely coupled software components and services.
In this session you’ll learn how to create a loosely coupled architecture for your business that has the domain at the core. You’ll learn the basics of EDA, and also learn how we are transforming our architecture at Unibet.com to become event driven, and what benefits it will bring to our business. The session will cover technologies such as JMS, XML, JSON, Google Protocol Buffers, ActiveMQ and Spring.
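As a concrete, if toy, illustration of the loose coupling EDA promotes, here is a minimal in-memory publish/subscribe bus. A production system would use a broker such as ActiveMQ as the session describes; the topic and payload names here are invented.

```python
# A toy in-memory event bus: producers know only topic names, never the
# consumers, which is the essence of loose coupling in EDA.
from collections import defaultdict
from typing import Callable

class EventBus:
    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Deliver to every registered consumer of the topic.
        for handler in self._handlers[topic]:
            handler(event)

bus = EventBus()
audit_log: list[dict] = []
bus.subscribe("bet.placed", audit_log.append)   # one consumer: auditing
bus.subscribe("bet.placed", lambda e: None)     # another, fully independent
bus.publish("bet.placed", {"user": "alice", "stake": 10})
print(audit_log)  # [{'user': 'alice', 'stake': 10}]
```

Adding or removing a consumer never touches the producer, which is what makes such architectures easy to evolve.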
JFokus: Cubes, Hexagons, Triangles, and More: Understanding Microservices, by Chris Richardson
The microservice architecture is becoming increasingly important. But what is it exactly? Why should you care about microservices? And what do you need to do to ensure that your organization uses the microservice architecture successfully? In this talk, I’ll answer these and other questions using shapes as visual metaphors. You will learn about the motivations for the microservice architecture and why simply adopting microservices is insufficient. I describe the essential characteristics of microservices, and you will learn how a successful microservice architecture consists of loosely coupled services with stable APIs that communicate asynchronously. I will also cover strategies for effectively testing microservices.
Event-driven architecture is a versatile approach to designing and integrating complex software systems. These systems tend to be easier to model and build. Event-driven architecture is not a new concept, but as more organizations contemplate microservices, this approach to system design has become appropriate in more situations and is worth a fresh look.
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB), by Kai Wähner
Learn the differences between an event-driven streaming platform and middleware like MQ, ETL and ESBs – including best practices and anti-patterns, but also how these concepts and tools complement each other in an enterprise architecture.
Extract-Transform-Load (ETL) is still a widely-used pattern to move data between different systems via batch processing. Due to its challenges in today’s world where real time is the new standard, an Enterprise Service Bus (ESB) is used in many enterprises as integration backbone between any kind of microservice, legacy application or cloud service to move data via SOAP / REST Web Services or other technologies. Stream Processing is often added as its own component in the enterprise architecture for correlation of different events to implement contextual rules and stateful analytics. Using all these components introduces challenges and complexities in development and operations.
This session discusses how teams in different industries solve these challenges by building on a native streaming platform from the ground up instead of using ETL and ESB tools in their architecture. This makes it possible to build and deploy independent, mission-critical real-time streaming applications and microservices. The architecture leverages distributed processing and fault tolerance with fast failover, no-downtime rolling deployments, and the ability to reprocess events, so you can recalculate output when your code changes. Integration and stream processing remain key functionality but can be realized natively in real time instead of using additional ETL, ESB, or stream-processing tools.
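The "reprocess events" capability described above can be sketched with an append-only log that consumers re-read from offset zero when their logic changes. This is a conceptual toy, not Kafka's actual API; all names are illustrative.

```python
# Sketch of event reprocessing: the log is the durable source of truth,
# and a consumer that changes its logic simply replays from offset 0 to
# recompute its derived state.
class Log:
    def __init__(self) -> None:
        self.records: list[dict] = []

    def append(self, record: dict) -> int:
        self.records.append(record)
        return len(self.records) - 1  # the record's offset

    def read_from(self, offset: int):
        return self.records[offset:]

log = Log()
for amount in (5, 7, 3):
    log.append({"amount": amount})

# First version of the consumer: a running total.
total = sum(r["amount"] for r in log.read_from(0))

# "Code changed": replay the same events with new logic (here, doubling).
doubled_total = sum(2 * r["amount"] for r in log.read_from(0))
print(total, doubled_total)  # 15 30
```

Because the events themselves are never mutated, both the old and new results are reproducible at any time, which is what makes rolling deployments and recalculation safe.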
Building Cloud-Native App Series - Part 1 of 11
Microservices Architecture Series
Design Thinking, Lean Startup, Agile (Kanban, Scrum),
User Stories, Domain-Driven Design
Any team that has made the jump from building monoliths to building microservices knows the complexities you must overcome to build a system that is functional and maintainable. Building a microservice architecture that stays low latency while communicating only over REST APIs is even trickier, with high request latency being a common concern. This talk explains how you can use events as the backbone of your microservice architecture and build an efficient, event-driven system. It covers how to get started with designing your microservice architecture and the key requirements any system needs to fulfil. It also introduces the different patterns you will encounter in event-driven architectures, along with the advantages and disadvantages of each choice. Finally, it explains why Apache Kafka is a great choice for event-driven microservices.
Comparing Service-Oriented Architecture (SOA), Microservices and Service-Based Architecture (SBA - SOA and Microservices Hybrid) patterns.
Also discussing coupling and cohesion concepts in relation to the systems design.
Mariusz Richtscheid: Had enough of your monolith? Considering breaking it up into microservices? Not so fast. Before you start changing the architecture, it is worth learning what problems come with it. Fortunately, most of the pain points can be solved with the right patterns and techniques, some of which will be covered during the presentation.
MICROSERVICE ARCHITECTURE: AN OVERVIEW OF THE CONCEPT AND BEST PRACTICES, by SOAT
Distributed systems have evolved considerably over the past 10 years, moving from huge monolithic applications to small containers of services, bringing more flexibility and agility to information systems.
The term “microservice architecture” emerged to describe this particular way of designing software applications.
Although there is no precise definition of this architectural style, such architectures share a number of common characteristics based around the organization of the business, automated deployment, and decentralized control of languages and data.
However, developing these systems can turn into a real headache. I therefore offer a tour of the concepts and characteristics of this type of architecture, and of its good and bad practices, from creation through to application deployment.
Building End-to-End Delta Pipelines on GCP, by Databricks
Delta has been powering many production pipelines at scale in the data and AI space since it was introduced a few years ago.
Built on open standards, Delta provides data reliability and enhances storage and query performance to support big data use cases (both batch and streaming), fast interactive queries for BI, and machine learning. Delta has matured over the past couple of years on both AWS and Azure and has become the de facto standard for organizations building their data and AI pipelines.
In this talk, we will explore building end-to-end pipelines on the Google Cloud Platform (GCP). Through presentation, code examples, and notebooks, we will build a Delta pipeline from ingest to consumption using our Delta Bronze-Silver-Gold architecture pattern, and show examples of consuming the Delta files using the BigQuery Connector.
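The Bronze-Silver-Gold layering can be sketched conceptually with plain Python lists standing in for Delta tables; on GCP the real pipeline would use Spark and Delta Lake, and the record fields below are invented for illustration.

```python
# Conceptual sketch of the Bronze-Silver-Gold (medallion) pattern.
bronze = [  # Bronze: raw ingest, kept as-is, including bad rows
    {"sku": "A", "qty": "2"},
    {"sku": "B", "qty": "oops"},   # malformed record survives in bronze
    {"sku": "A", "qty": "3"},
]

# Silver: cleaned and typed; malformed rows are filtered out here.
silver = [{"sku": r["sku"], "qty": int(r["qty"])}
          for r in bronze if r["qty"].isdigit()]

# Gold: business-level aggregate, ready for BI / BigQuery consumption.
gold: dict[str, int] = {}
for r in silver:
    gold[r["sku"]] = gold.get(r["sku"], 0) + r["qty"]
print(gold)  # {'A': 5}
```

Keeping the raw bronze layer means the silver and gold tables can always be rebuilt when cleaning rules or aggregations change, which is the same replay idea that underpins streaming pipelines.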
To view a recording of this webinar, use the URL below:
http://wso2.com/library/webinars/2015/09/event-driven-architecture/
Enterprise systems today are becoming dynamic: change has become the norm rather than the exception. Such systems need to be loosely coupled, autonomous, versatile, and adaptive. This creates the need to model such systems, and event-driven architecture (EDA) is how they can be modelled and explained.
This webinar will discuss:
The basics of EDA
How it can benefit your enterprise
How the WSO2 product stack complements this architectural pattern
ELT vs. ETL: How They’re Different and Why It Matters, by Matillion
ELT is a fundamentally better way to load and transform your data. It’s faster. It’s more efficient. And Matillion’s browser-based interface makes it easier than ever to work with your data. You’re using data to improve your world: shouldn’t the tools you use return the favor?
In this webinar:
- Explore the differences between ELT and ETL
- Learn why ELT is a better, more modern process
- Discover the latest trends in ELT and how they apply to your business
- Find out how Matillion ETL makes loading large amounts of data easier
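The ETL/ELT distinction in the bullets above can be made concrete with a small sketch, using sqlite3 as a stand-in for the cloud warehouse (Matillion itself targets warehouses such as Snowflake or Redshift); the table names are invented.

```python
# ETL vs. ELT in miniature: the same typed result, produced two ways.
import sqlite3

rows = [("a", "10"), ("b", "20")]   # raw source data: values are strings
wh = sqlite3.connect(":memory:")

# ETL: transform in the pipeline, then load the finished result.
wh.execute("CREATE TABLE etl_target (k TEXT, v INTEGER)")
wh.executemany("INSERT INTO etl_target VALUES (?, ?)",
               [(k, int(v)) for k, v in rows])        # transform happens here

# ELT: load the raw data first, then transform inside the warehouse.
wh.execute("CREATE TABLE raw (k TEXT, v TEXT)")
wh.executemany("INSERT INTO raw VALUES (?, ?)", rows)
wh.execute("CREATE TABLE elt_target AS "
           "SELECT k, CAST(v AS INTEGER) AS v FROM raw")  # in-warehouse SQL

print(wh.execute("SELECT SUM(v) FROM elt_target").fetchone()[0])  # 30
```

The ELT variant is attractive precisely because the heavy transformation runs on the warehouse's own engine, and the untouched raw table remains available for re-transformation later.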
The introduction covers the following:
1. What are microservices, and why should we use this paradigm?
2. 12 factor apps and how Microservices make it easier to create them
3. Characteristics of Microservices
Note: Please download the slides to view animations.
An introduction to microservices architecture: definition, characteristics, frameworks, success stories. It includes a demo of implementing microservices with Spring Boot, Spring Cloud, and Eureka.
Software Architecture.
Software design is a process to transform user requirements into some suitable form, which helps the programmer in software coding and implementation.
Software design is an important step in the SDLC (Software Development Life Cycle), which moves the focus from the problem domain to the solution domain. It tries to specify how to fulfill the requirements stated in the SRS.
Software design plays an important role in developing software: during software design, software engineers produce various models that form a kind of blueprint of the solution to be implemented.
4+1: a model for describing the architecture of software-intensive systems, based on the use of multiple, concurrent views.
The views are used to describe the system from the viewpoints of different stakeholders.
Chapter 7 Design Architecture and Methodology1.docx, by mccormicknadine86
Chapter 7:
Design: Architecture and Methodology
Design Topics Covered
Architectural vs. detailed design
“Common” architectural styles, tactics, and reference architectures
Basic techniques for detailed design
Basic issues with user-interface design
Design
Starts mostly from the requirements (evolving mostly from the functionalities and other, non-functional, characteristics).
How is the software solution going to be structured?
What are the main (functional) components?
These often come directly from the requirements' functionalities (use cases).
How are these components related?
Possibly re-organize the components (composition/decomposition).
Two main levels of design:
Architectural (high level)
Detailed design
How should we depict design—notation/language?
Relationship between Architecture and Design
Detailed design comes from the requirements and the architecture.
Software Architecture
Structure(s) of the solution, comprising:
Major software elements
Their externally visible properties
Relationships among elements
Every software system has an architecture.
May have multiple structures!
Multiple ways of organizing elements, depending on the perspective
External properties of components (and modules)
Component (module) interfaces
Component (module) interactions, rather than internals of components and modules
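The emphasis on externally visible properties rather than internals can be shown with a tiny, invented example: callers depend only on the component's interface, so its internal representation is free to change.

```python
# "Interfaces, not internals": callers see only push/pop/len, so the
# internal list could be swapped for another structure without
# affecting any client of the component.
class Stack:
    """Externally visible interface: push, pop, __len__."""
    def __init__(self) -> None:
        self._items: list = []   # internal detail, free to change

    def push(self, item) -> None:
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def __len__(self) -> int:
        return len(self._items)

s = Stack()
s.push(1)
s.push(2)
print(s.pop(), len(s))  # 2 1
```

Architecturally, this is the same discipline at a larger scale: a component's interface and interactions are part of the architecture, while its internals are not.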
Views and Viewpoints
View – representation of a system structure
4+1 views (by Kruchten)
Logical (OO decomposition – key abstractions)
Process (run-time; concurrency/distribution of functions)
Development (subsystem decomposition)
Physical architecture (mapping to hardware)
+1: use cases
Other classification (Bass, Clements, Kazman)
Module
Run-time
Allocation (mapping to development environment)
Different views for different people
Architectural Styles/Patterns
Pipes and filters
Event driven
Client-server
Model-view-controller (MVC)
Layered
Database centric
Three tier
We discuss architectural styles/patterns as
“reusable” starting point for design activities.
Pipe-Filter Architecture Style
The high-level design solution is decomposed into two “generic” parts (filters and pipes):
Filter is a service that transforms a stream of input data into a stream of output data.
Pipe is a mechanism or conduit through which the data flows from one filter to another.
(Diagram: input time cards → prepare for check processing → process checks.)
Problems that require batch file processing seem to fit this architecture: e.g., payroll, compilers, month-end accounting.
** Reminds one of a DFD without the data stores or sources/sinks. **
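The filters-and-pipes decomposition above maps naturally onto chained generators in Python, where each generator is a filter and the chaining plays the role of the pipes. The payroll-flavoured names follow the slide's example; the pay rate and input format are invented.

```python
# Pipe-and-filter with generators: each filter transforms a stream of
# records; composing them is the "pipe".
def read_time_cards(lines):            # source filter: parse raw input
    for line in lines:
        name, hours = line.split(",")
        yield name.strip(), int(hours)

def prepare_checks(cards, rate=20):    # transforming filter
    for name, hours in cards:
        yield name, hours * rate

def process_checks(checks):            # sink filter: collect results
    return {name: amount for name, amount in checks}

raw = ["ann, 40", "bob, 35"]
payroll = process_checks(prepare_checks(read_time_cards(raw)))
print(payroll)  # {'ann': 800, 'bob': 700}
```

Because each filter only consumes and produces a stream, filters can be reordered, replaced, or tested in isolation, which is exactly the property that makes the style a good fit for batch problems.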
Event Driven (Real Time)
The high-level design solution is based on an event dispatcher, which manages events and the functionalities that depend on those events. Event-driven designs have the following characteristics:
Events may be a simple notification or may include associated data.
Events may be prioritized or be based on constraints such as time.
Events may require synchronous or asynchronous processing.
Events may be “registered” ...
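The characteristics above can be illustrated with a tiny dispatcher that registers handlers per event type and delivers queued events in priority order. This is a hypothetical sketch; the event names and priority scheme are invented.

```python
# A minimal event dispatcher: handlers register for event types, and
# posted events are dispatched lowest-priority-number first.
import heapq
import itertools

class Dispatcher:
    def __init__(self) -> None:
        self._handlers: dict[str, list] = {}
        self._queue: list = []
        self._seq = itertools.count()  # tie-breaker for equal priorities

    def register(self, event_type: str, handler) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def post(self, event_type: str, data=None, priority: int = 10) -> None:
        heapq.heappush(self._queue, (priority, next(self._seq), event_type, data))

    def run(self) -> None:
        while self._queue:
            _, _, event_type, data = heapq.heappop(self._queue)
            for handler in self._handlers.get(event_type, []):
                handler(data)

d = Dispatcher()
seen: list = []
d.register("alarm", seen.append)
d.register("tick", seen.append)
d.post("tick", "t1")                  # default priority
d.post("alarm", "fire!", priority=0)  # urgent: dispatched first
d.run()
print(seen)  # ['fire!', 't1']
```

A real-time variant would add time-based constraints and asynchronous delivery, but the register/post/dispatch structure stays the same.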
Oop final project documentation jose pagan v2.1, by Jose Pagan
The purpose of this Software Architecture Document is to describe, through the use of diagrams and descriptions, the architecture of the Event Driven Process Manager application. The document provides a comprehensive architectural overview of the system, and conveys the significant architectural decisions which have been made in the development of Event Driven Process Manager. A Pre-design Project Proposal and Work Plan, Project Requirements / Documentation, Design Documentation, and Installation Instructions have been included in the Appendices.
Software design is a very crucial thing to manage.
A Proposed Framework for Agile Roadmap Design and Maintenance, by Jérôme Kehrli
Maintaining a relevant and meaningful roadmap while adopting a state-of-the-art Agile methodology is challenging, and the two goals are somewhat at odds.
This presentation proposes a framework for designing and maintaining an Agile Roadmap.
A presentation of the search for Product-Market Fit with the principles, practices and processes that lead to it, from the Lean-Startup and Design Thinking perspective
From Product Vision to Story Map: Lean/Agile Product Shaping, by Jérôme Kehrli
A lot of Software Engineering projects fail for a lack of shared vision due to poor communication among people involved in the project.
A sound maintenance of the product backlog can only be achieved if all the people have a good understanding of what they have to do (common vision).
Roman Pichler, in a post originally written on 16 July 2012, proposed a really interesting approach: use various canvases to create and share the product vision and to support product backlog creation and refinement.
This presentation is a drive through these various boards and canvas that should be designed in prior to any product development: the Product Vision, the Lean Canvas, The Product Definition and the Story Map.
Artificial Intelligence and Digital Banking: What about Fraud Prevention? by Jérôme Kehrli
Artificial intelligence for banking fraud prevention.
A presentation on how it takes root in digitalization trends and how it impacts the customer experience.
Artificial Intelligence for Banking Fraud Prevention, by Jérôme Kehrli
Artificial Intelligence at NetGuardians:
"From skepticism to large scale adoption towards fraud prevention"
Slides of my speech at the EPFL / EMBA Innovation Leader 2018 event.
Introduction to NetGuardians' Big Data Software Stack, by Jérôme Kehrli
NetGuardians executes its Big Data Analytics Platform on three key big data components: ElasticSearch, Apache Mesos, and Apache Spark. This is a presentation of the behaviour of this software stack.
Periodic Table of Agile Principles and Practices, by Jérôme Kehrli
Recently I stumbled by chance upon the Periodic Table of the Elements... Long time no see... Remembering my physics lessons at university, I always loved that table. I remember spending hours understanding the layout and admiring the beauty of its natural simplicity.
So I had the idea of trying the same layout (not the same approach, since the two are not comparable; really only the same layout) for Agile Principles and Practices.
The result is in this presentation: The Periodic Table of Agile Principles and Practices.
Agility and Planning: Tools and Processes, by Jérôme Kehrli
In this presentation, I intend to present the fundamentals, the roles, the processes, the rituals and the values that I believe a team would need to embrace to achieve success down the line in Agile Software Development Management - Product Management, Team Management and Project Management - with the ultimate goal of making planning and forecasting as simple and efficient as it can be.
Bytecode Manipulation with Javassist for Fun and Profit, by Jérôme Kehrli
Java bytecode is the form of instructions that the JVM executes.
A Java programmer, normally, does not need to be aware of how Java bytecode works.
Understanding the bytecode, however, is essential to the areas of tooling and program analysis, where the applications can modify the bytecode to adjust the behavior according to the application's domain. Profilers, mocking tools, AOP, ORM frameworks, IoC Containers, boilerplate code generators, etc. require to understand Java bytecode thoroughly and come up with means of manipulating it at runtime.
Every one of these advanced features, which are nowadays standard approaches when programming in Java, requires a sound understanding of Java bytecode, not to mention completely new languages running on the JVM such as Scala or Clojure.
Bytecode manipulation is not easy though ... except with Javassist.
Of all the libraries and tools providing advanced bytecode manipulation features, Javassist is the easiest to use and the quickest to master. It takes only a few minutes for any initiated Java developer to understand Javassist and be able to use it efficiently. And mastering bytecode manipulation opens up a whole new world of approaches and possibilities.
DevOps is a methodology capturing the practices adopted from the very start by the web giants who had a unique opportunity as well as a strong requirement to invent new ways of working due to the very nature of their business: the need to evolve their systems at an unprecedented pace as well as extend them and their business sometimes on a daily basis.
While DevOps makes obviously a critical sense for startups, I believe that the big corporations with large and old-fashioned IT departments are actually the ones that can benefit the most from adopting these principles and practices.
Digitalization: A Challenge and An Opportunity for Banks, by Jérôme Kehrli
Today’s banking industry is strongly defined by one word: digital. The urgency to act grows more severe each day. Banks using digital technologies to automate processes, improve regulatory compliance, and transform the customer experience may realize a profit upside of 40% or more, while laggards that resist digital innovation will be punished by customers, financial markets, and regulators, and may see up to 35% of net profit eroded, according to a McKinsey analysis.
The vital question to answer is, do we get digitalization right? Why is it getting extremely urgent to digitize?
Some years ago, Eric Ries, Steve Blank and others initiated The Lean Startup movement. The Lean Startup is a movement, an inspiration, a set of principles and practices that any entrepreneur initiating a startup would be well advised to follow.
Projecting myself into it, I think that if I had read Ries' book before, or even better Blank's book, I would maybe own my own company today, around AirXCell or another product, instead of being disgusted and honestly not considering it for the near future.
In addition to giving a pretty important set of principles when it comes to creating and running a startup, The Lean Startup also implies an extended set of Engineering practices, especially software engineering practices.
Smart Contracts are a central component to next-generation blockchain platforms. Blockchain technology is much broader than just bitcoin. The sustained levels of robust security achieved by public cryptocurrencies have demonstrated to the world that this new wave of blockchain technologies can provide efficiencies and intangible technological benefits very similar to what the internet has done.
Blockchains are a very powerful technology, capable of going much further than only "simple" financial transaction; a technology capable of performing complex operations, capable of understanding much more than just how many bitcoins one currently has in his digital wallet.
This is where the idea of Smart Contracts come in. Smart Contracts are in the process of becoming a cornerstone for enterprise blockchain applications and will likely become one of the pillars of blockchain technology.
In this presentation, we will explore what a smart contract is, how it works, and how it is being used.
The Blockchain - The Technology behind Bitcoin Jérôme Kehrli
The blockchain and blockchain related topics are becoming increasingly discussed and studied nowadays. There is not one single day where I don't hear about it, that being on linkedin or elsewhere.
I interested myself deeply in the blockchain topic recently and this is the first article of a coming whole serie around the blockchain.
This presentation is an introduction to the blockchain, presents what it is in the light of its initial deployment in the Bitcoin project as well as all technical details and architecture concerns behind it.
We won't focus here on business applications aside from what is required to present the blockchain purpose, more concrete business applications and evolutions will be the topic of another presentation I'll post in a few weeks
Quality defects in TMT Bars, Possible causes and Potential Solutions.PrashantGoswami42
Maintaining high-quality standards in the production of TMT bars is crucial for ensuring structural integrity in construction. Addressing common defects through careful monitoring, standardized processes, and advanced technology can significantly improve the quality of TMT bars. Continuous training and adherence to quality control measures will also play a pivotal role in minimizing these defects.
Explore the innovative world of trenchless pipe repair with our comprehensive guide, "The Benefits and Techniques of Trenchless Pipe Repair." This document delves into the modern methods of repairing underground pipes without the need for extensive excavation, highlighting the numerous advantages and the latest techniques used in the industry.
Learn about the cost savings, reduced environmental impact, and minimal disruption associated with trenchless technology. Discover detailed explanations of popular techniques such as pipe bursting, cured-in-place pipe (CIPP) lining, and directional drilling. Understand how these methods can be applied to various types of infrastructure, from residential plumbing to large-scale municipal systems.
Ideal for homeowners, contractors, engineers, and anyone interested in modern plumbing solutions, this guide provides valuable insights into why trenchless pipe repair is becoming the preferred choice for pipe rehabilitation. Stay informed about the latest advancements and best practices in the field.
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's thought to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of general workflow and administration process of the shop. The main processes of the system focus on customer's request where the system is able to search the most appropriate products and deliver it to the customers. It should help the employees to quickly identify the list of cosmetic product that have reached the minimum quantity and also keep a track of expired date for each cosmetic product. It should help the employees to find the rack number in which the product is placed.It is also Faster and more efficient way.
Water scarcity is the lack of fresh water resources to meet the standard water demand. There are two type of water scarcity. One is physical. The other is economic water scarcity.
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)MdTanvirMahtab2
This presentation is about the working procedure of Shahjalal Fertilizer Company Limited (SFCL). A Govt. owned Company of Bangladesh Chemical Industries Corporation under Ministry of Industries.
Overview of the fundamental roles in Hydropower generation and the components involved in wider Electrical Engineering.
This paper presents the design and construction of hydroelectric dams from the hydrologist’s survey of the valley before construction, all aspects and involved disciplines, fluid dynamics, structural engineering, generation and mains frequency regulation to the very transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
Final project report on grocery store management system..pdfKamal Acharya
In today’s fast-changing business environment, it’s extremely important to be able to respond to client needs in the most effective and timely manner. If your customers wish to see your business online and have instant access to your products or services.
Online Grocery Store is an e-commerce website, which retails various grocery products. This project allows viewing various products available enables registered users to purchase desired products instantly using Paytm, UPI payment processor (Instant Pay) and also can place order by using Cash on Delivery (Pay Later) option. This project provides an easy access to Administrators and Managers to view orders placed using Pay Later and Instant Pay options.
In order to develop an e-commerce website, a number of Technologies must be studied and understood. These include multi-tiered architecture, server and client-side scripting techniques, implementation technologies, programming language (such as PHP, HTML, CSS, JavaScript) and MySQL relational databases. This is a project with the objective to develop a basic website where a consumer is provided with a shopping cart website and also to know about the technologies used to develop such a website.
This document will discuss each of the underlying technologies to create and implement an e- commerce website.
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxR&R Consult
CFD analysis is incredibly effective at solving mysteries and improving the performance of complex systems!
Here's a great example: At a large natural gas-fired power plant, where they use waste heat to generate steam and energy, they were puzzled that their boiler wasn't producing as much steam as expected.
R&R and Tetra Engineering Group Inc. were asked to solve the issue with reduced steam production.
An inspection had shown that a significant amount of hot flue gas was bypassing the boiler tubes, where the heat was supposed to be transferred.
R&R Consult conducted a CFD analysis, which revealed that 6.3% of the flue gas was bypassing the boiler tubes without transferring heat. The analysis also showed that the flue gas was instead being directed along the sides of the boiler and between the modules that were supposed to capture the heat. This was the cause of the reduced performance.
Based on our results, Tetra Engineering installed covering plates to reduce the bypass flow. This improved the boiler's performance and increased electricity production.
It is always satisfying when we can help solve complex challenges like this. Do your systems also need a check-up or optimization? Give us a call!
Work done in cooperation with James Malloy and David Moelling from Tetra Engineering.
More examples of our work https://www.r-r-consult.dk/en/cases-en/
Democratizing Fuzzing at Scale by Abhishek Aryaabh.arya
Presented at NUS: Fuzzing and Software Security Summer School 2024
This keynote talks about the democratization of fuzzing at scale, highlighting the collaboration between open source communities, academia, and industry to advance the field of fuzzing. It delves into the history of fuzzing, the development of scalable fuzzing platforms, and the empowerment of community-driven research. The talk will further discuss recent advancements leveraging AI/ML and offer insights into the future evolution of the fuzzing landscape.
TECHNICAL TRAINING MANUAL GENERAL FAMILIARIZATION COURSEDuvanRamosGarzon1
AIRCRAFT GENERAL
The Single Aisle is the most advanced family aircraft in service today, with fly-by-wire flight controls.
The A318, A319, A320 and A321 are twin-engine subsonic medium range aircraft.
The family offers a choice of engines
2. 2
Part I – Software Architecture Models
1.1 Introduction to Software Architecture
1.2 Our illustration example
1.3 The Kruchten 5 + 1 View Model
1.4 The OCTO Matrix Approach
Part II - Modern Architectures
2.1 Big Data
2.2 The Death of Moore's Law
2.3 The CAP Theorem
2.4 NoSQL / NewSQL
2.5 Hadoop
2.6 Data Lake
2.7 Streaming Architecture
2.8 Lambda Architecture
2.9 Big Data 2.0 & Kubernetes
2.10 Microservices Architecture
Part III - Takeaways
Agenda
4. 4
Definitions 1/3
A software system's architecture is the set of principal design decisions about the system
Software architecture is the blueprint for a system's construction and evolution
Design decisions encompass the following aspects of the system under development
Structure,
Behaviour,
Interactions,
Non-functional properties
Taylor 2010
"Principal" implies a degree of importance that grants a design decision an "architectural status".
This implies that not all design decisions are architectural; as such, they do not necessarily impact a system's architecture.
How one defines principal depends on what the stakeholders define as the system
goals.
5. 5
Definitions 2/3
An architecture is
the set of significant decisions about the organization of a software system,
the selection of the structural elements and their interfaces by which the system is
composed
together with their behavior as specified in the collaborations among those elements,
the composition of these structural and behavioral elements into progressively larger
subsystems,
and the architectural style that guides this organization, these elements and their
interfaces, their collaborations, and their composition.
RUP – Rational Unified Process
6. 6
Definitions 3/3
In most successful software projects, the expert developers working on that project have a
shared understanding of the system design. This shared understanding is called
‘architecture’. This understanding includes how the system is divided into components and how
the components interact through interfaces.
Architecture is about stuff that’s hard to change later
Ralph Johnson
Neal Ford
Architecture is about the important stuff
Martin Fowler
7. 7
Sidenotes
Any organization that designs a system (defined broadly) will produce a design whose structure
is a copy of the organization's communication structure.
Melvin E. Conway (Conway's law)
... all models are approximations. Essentially, all models are wrong, but some are useful.
However, the approximate nature of the model must always be borne in mind...
George Box
8. 8
Software Architecture is
A Process : to design a high-level solution
A Product : schemas, models, documentation, prototypes
Means : frameworks, libraries, middleware, etc. to ease implementation
of large systems
A Reality : the working software or Information System
My View
9. 9
Different Kinds of Architectures
Enterprise Architecture
Enterprise Architecture defines the way the enterprise uses
several applications.
Metaphor : City Planning / City Map
Focus : Strategy / Business
Some Key Concerns:
- Uncover operational gaps
- Understand data-dependencies across the IT landscape
- Understand Interactions between Solutions / Applications
- Streamline the application landscape for optimal
performance
- Decommissioning of legacy solutions
- Eliminate redundancies
- Identify and avoid tech risks
Solution / Application Architecture
Application architecture defines the various pieces that compose an application
Metaphor : Building / House Architecture
Focus : Technology / Functional
Some Key Concerns:
- Define a best-fit solution for identified problems
- Ensure solution meets functional and non-functional
requirements
- Understand how application supports business
capabilities
- Understand functional fit, technical fit and risks
- Implement technical processes for Application
development
10. 10
Architecture or Design
(spectrum: Architecture → Design → Implementation; Abstraction → Fine Granularity / Reality)
Architecture
- Process of creating the high-level structures of a software system
- Converts the software characteristics into a high-level structure
- Micro-services, serverless, streaming, lambda are some software architecture patterns
- Helps define the high-level structure of the software system
Design
- Process of creating a form of specification of a software artifact that helps implement the software
- Describes all units of a software system to support coding
- Creational, structural and behavioural are some types of software design patterns
- Helps implement the software
16. 16
Philippe Kruchten defined a 4+1 View Model to capture the description of software architecture in multiple complementary views, in 1995, while he was working for Rational Software Corp.
The 4+1 view model is an information organization framework; it consists of logical, process, development and physical views of an application, plus end-user perspective information.
A view is an aspect (subpart) of the information.
A notation is a way of representing information.
The 4 + 1 Kruchten Views Model
Philippe Kruchten, Architectural Blueprints—The “4+1” View Model of Software Architecture
The “4+1” view model is rather “generic”: other notations and tools can be used, other design
methods can be used, especially for the logical and process decompositions, but we have
indicated the ones we have used with success.
17. [quadrant layout: Conceptual / Logic vs Physical / Operational on one axis, Functional vs Non-functional on the other]
Logical / Structural View
The logical view is concerned with the functionality that the system provides to end-users. UML diagrams used to represent the logical view include the class diagram, communication diagram and sequence diagram.
Implementation / Development View
The development view illustrates a system from a programmer's perspective and is concerned with software management. This view is also known as the implementation view. It uses the UML component diagram to describe system components. UML diagrams used to represent the development view include the package diagram.
Process / Behaviour View
The process view deals with the dynamic aspects of the system, explains the system processes and how they communicate, and focuses on the runtime behavior of the system. The process view addresses concurrency, distribution, integrators, performance, scalability, etc. UML diagrams used to represent the process view include the activity diagram.
Deployment / Physical View
The physical view depicts the system from a system engineer's point of view. It is concerned with the topology of software components on the physical layer, as well as communication between these components. This view is also known as the deployment view. UML diagrams used to represent the physical view include the deployment diagram.
Use Case / Scenario View
The description of an architecture is illustrated using a small set of use cases, or scenarios, which become a fifth view. The scenarios describe sequences of interactions between objects and/or processes. They are used to identify architectural elements and to illustrate and validate the architecture design. They also serve as a starting point for tests of an architecture prototype. UML diagrams used to represent the scenario view include the use case diagram.
18. [same quadrant layout: Conceptual / Logic vs Physical / Operational, Functional vs Non-functional]
Logical / Structural View
Perspective: End users, business analysts
Stage: Requirements analysis
Focus: Components / objects / services model - decomposition
Concerns: Functionality
Artifacts: functions schema; class / object diagrams; (composite) structure diagrams; state machines
Process / Behaviour View
Perspective: System integrators
Stage: Design
Focus: Process decomposition
Concerns: Performance, scalability, throughput, synchronization, concurrency
Artifacts: sequence / activity diagrams; communication / interaction diagrams; state machine diagrams; timing diagrams
Implementation / Development View
Perspective: Developers, designers
Stage: Design
Focus: Subsystem decomposition
Concerns: Software / configuration management
Artifacts: component diagrams; package diagrams
Deployment / Physical View
Perspective: System engineers
Stage: Design
Focus: Software mapping to hardware (deployment)
Concerns: System topology, delivery, installation, communication
Artifacts: deployment diagrams; network / cluster topology (not UML)
Use Case / Scenario View
Perspective: End user
Stage: Putting it all together
Focus: Understandability, usability
Concerns: Feature decomposition
Artifacts: use-case diagrams; user stories (not UML); story maps (not UML)
25. 25
In 2010, OCTO Technology designed a matrix that presents a 360° overview of most, if not all, of the questions, concerns and aspects that need to be answered and addressed when defining a software architecture
The OCTO Architecture Matrix
The questions and concerns are
related to different levels of
architecture:
Functional
Application
Technical
System
They regroup different perspectives:
Security
Usage
Services
Data
Exchanges
27. 27
RIAO Functional Architecture
[diagram: functional decomposition of RIA Organizer into a Global Application (Login, User Management, Search, Folder Management / Display / Edition, Attachment Management, Email IO), an Email Application (email display / edition, email search, Text / HTML / RTF composition and display), a Calendar Application (calendar and appointment display / edition, calendar management, appointment management, calendar search, appointment mapping) and a Contact Application (contact display / edition, contact management, contact search), layered into Business / Entry Points, User Interactions, and Services & Functions]
28. 28
RIAO Application Architecture
[diagram: layered architecture of the RIAO UI and RIAO Backend. Presentation tier: main page, login page, profile edition, folder / email / calendar / appointment / contact views and composers (Text / HTML / RTF composition and display), search page, local storage, and the email / calendar / contact / search / user controllers. Business tier: search, user, email, calendar and contact services; management components (user, folder, email, attachment, calendar, appointment, contact, search) with their models; a REST API and process orchestration (CRUD, fetch, send, delegation, email synchronization, appointment mapping, email IO). Data / exchanges and integration tiers: MongoDB / GridFS document stores for user, folder, email, attachment, calendar, appointment and contact documents, plus an SMTP server and a POP3 store]
29. 29
RIAO Technical Architecture
[diagram: three tiers. User tier: a web browser running the RIAO UI (views built with JQuery, CKEditor and Bootstrap; forms, models, UI controllers, local store, session cookie; JSON / object mapping; JAX-RS over HTTP/HTTPS). Processing tier: the RIAO backend on a Java VM behind an Apache proxy with an SSL certificate (Spring Boot / Tomcat 8 runtime, Spring Framework, Spring Security, Apache Commons; business services, business managers, DAOs, MongoDB client; IO management with SMTP and POP3 clients) on Linux Debian. Integration tier: a Courier mail server on Debian exposing SMTP and POP3]
30. 30
RIAO System Architecture
[diagram: the user computer (web browser with the RIAO UI and local storage, on the user OS) connects over HTTPS through the Internet to the RIA server (Apache proxy, FirewallD, SystemD, Tomcat / Spring Boot running the RIAO backend on OpenJDK 11 / JVM, Debian Linux). Over the internal network, the server reaches a Courier server (SMTP / POP3, Courier on Debian) for integration and a MongoDB cluster of three nodes running in Docker on a Kubernetes cluster with a K8s service locator, on Debian Linux. Tiers: Presentation, Processing / Business, Integration]
34. 34
Data deluge
5 exabytes of data (5 billion gigabytes) were generated from the first measurements until 2003.
In 2011, the same quantity was generated in 2 days.
In 2018, it was generated in 2 minutes.
Source: https://www.emc.com/collateral/analyst-reports/idc-the-digital-universe-in-2020.pdf
35. 35
Our architectures are 30 years old !
[diagram: the classic split between the Operational Information System and the Analytical Information System / Business Intelligence. Operational side: corporate operational data (operational / live, audit / logs, archived data, external data), an operational application space (online business applications, batch business applications, monitoring / operation applications), an internal GUI space (desktop and web apps) and an external GUI space behind a DMZ (web and mobile apps). ETL jobs feed a staging database and then a datawarehouse (cleaning / cleansing / enrichment / remapping, historization, storage, query), which serves data marts for reporting / analytics / querying]
37. 37
Moore's Law
“The number of transistors and
resistors on a chip doubles every
24 months”
- Gordon Moore, 1965
38. 38
Technical capacities evolution
For the last 40 years, IT component capabilities grew exponentially.
Moore's law!
Source :
http://radar.oreilly.com/2011/08/building-data-startups.html
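As a back-of-the-envelope check of that exponential claim (my own arithmetic, not from the slide): doubling every 24 months over 40 years means 20 doublings, i.e. roughly a million-fold increase.

```python
# Capability growth under Moore's law: one doubling every 24 months.
years = 40
doubling_period_years = 2
doublings = years // doubling_period_years  # 20 doublings in 40 years
growth_factor = 2 ** doublings
print(growth_factor)  # 1048576 -> about a million-fold increase
```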
39. 39
Storage cost evolution
While the unit cost is decreasing…
[log-scale chart of cost per gigabyte, 1975-2015, for hard drives and RAM, falling from about 5 M$/GB in 1982 to about 5 $/GB in 2012]
Source : http://www.mkomo.com/cost-per-gigabyte
41. 41
Disk throughput evolution
Issue: throughput always evolves more slowly than capacity.
How to read and write more and more data through a pipe that is, relatively, ever narrower?
Capacity gain: ×10,000 in 15 years
Throughput gain: ×50 in 15 years
42. 42
New architectures and paradigms
Key Idea #1: Since the data is too big to fit on one computer, distribute it among many computers (partitioning / sharding)!
Key Idea #2: Run transactions and computations in parallel on multiple (many!) nodes, and scale the grid of CPU, RAM and HDD at the multi-datacenter level.
Key Idea #3: Move the code to the data node, not the data to the computing node (the data tier revolution).
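Key Idea #3 can be sketched in a few lines (a toy model of my own, not a real cluster API): ship a function to where the data lives, so that only the small partial results travel over the network, never the data itself.

```python
# Toy "move the code to the data": each node holds its own shard and
# computes a partial result locally; only the small results travel back.
node_data = {
    "node-1": [1, 2, 3],
    "node-2": [4, 5],
    "node-3": [6, 7, 8, 9],
}

def run_on_nodes(func):
    # Ship `func` to every node; collect one small partial result per node.
    return {node: func(data) for node, data in node_data.items()}

partials = run_on_nodes(sum)   # runs "on" each node, next to the data
total = sum(partials.values()) # combine the partial results centrally
print(total)  # 45
```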
44. 44
The early days of digital data …
Before 1960, the data within a Computer Information
System was mostly stored in rather flat files
(sometimes indexed) manipulated by top-level software
systems.
Directly using flat files was cumbersome and painful…
Various needs emerged at the time :
Data isolation
Access efficiency
Data integrity
Reducing the time required to develop brand new
applications
Something else was required …
A bit of history …
45. 45
The relational model has ruled for 40 years !
Enter the Relational Model …
1969 / Edgar F. Codd - RDBMS
Entities as tables & associations
The relational model reduces redundancy to optimize disk space usage.
At the time of its creation:
- Disk storage was very expensive and limited
- The volume of data in Information Systems was rather small
Avoiding redundancy optimized disk space usage, thanks to guarantees of:
- Structure: using normal design forms and modeling techniques
- Coherence: using transaction principles and mechanisms
E.g. an exam grade management app: to display a student's subject on his profile screen, one needs to
1. Extract the personal data from the “student” table
2. Fetch its subject id from the relation table
3. Read the subject title from the “subject” table
Why, oh why, separate these two kinds of information when, in 95% of the use cases around these data, both will always be used together ?!?
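The three-step lookup described above collapses into a single join. A minimal sketch using Python's built-in sqlite3 module (the table and column names are my own illustration, not from the slide):

```python
import sqlite3

# In-memory database with the three tables from the example:
# student, subject, and the relation table linking them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE subject (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE student_subject (student_id INTEGER, subject_id INTEGER);
    INSERT INTO student VALUES (1, 'Alice');
    INSERT INTO subject VALUES (10, 'Databases');
    INSERT INTO student_subject VALUES (1, 10);
""")

# Steps 1-3 become one query: extract the student, follow the
# relation table, and read the subject title.
row = conn.execute("""
    SELECT s.name, subj.title
    FROM student s
    JOIN student_subject ss ON ss.student_id = s.id
    JOIN subject subj ON subj.id = ss.subject_id
    WHERE s.id = 1
""").fetchone()
print(row)  # ('Alice', 'Databases')
```

The join is cheap here, but on a cluster the two "kinds of information" may live on different nodes, which is exactly the friction the slide complains about.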
46. 46
The mid and late 2000’s were times of major changes in the IT landscape
Hardware capabilities significantly increased
eCommerce and internet trade, in general, exploded
Some internet companies, the so-called “Web giants” (Yahoo!, Facebook, Google, Amazon, Ebay, Twitter, …), pushed traditional databases to their limits. Those databases are by design hard to scale
With relational DBMSes, the only way to improve performance is by scaling up, i.e. getting
bigger servers (more CPU, more RAM, more disk, …). One eventually hits a hard limit
imposed by the current technology
The origins of NoSQL
Scaling up:
[chart: a single database server gets faster, with more storage and more reliable as investments grow - but from a certain point, investments yield little improvement (hard limit)]
47. 47
By rethinking the architecture of databases, those companies were able to make
them scale at will, by adding more servers to clusters instead of upgrading the
servers.
The servers are not made of expensive, high-end hardware; they are qualified as
commodity servers (or commodity hardware)
The origins (cont’d)
Scaling out:
[chart: a database cluster gets faster, with more storage and more reliable as investments grow - power grows linearly with the number of servers (linear scalability)]
48. 48
This is the essence of Big Data !
With most NoSQL databases, the data is not stored in one place (i.e. on one server). It is distributed
among the nodes of the cluster. When created, an object A is assigned to a node in the cluster. This is
called sharding – the amount of data assigned to a node is called a shard (also called partition)
Having more cluster nodes implies a higher risk of having some nodes crash, or a network outage splitting
the cluster in two. For this reason, and to avoid data loss, objects are also replicated across the clusters
The number of copies, called replicas, can be tuned. 3 replicas is a common figure
Data distribution
[diagram: objects A, B, C and D distributed across a four-node cluster, with three replicas of each object spread over different nodes]
The objects may move, as nodes crash or new nodes join the cluster, ready to take charge of some of the
objects. Such events are usually handled automatically by the cluster; the operation of shuffling objects
around to keep a fair repartition of data is called rebalancing
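The placement described above can be sketched in a few lines (a hypothetical helper of my own, not from the deck): hash each object key to a primary node, then place the remaining replicas on the next nodes in the ring.

```python
import hashlib

def replica_nodes(key: str, nodes: list, replicas: int = 3) -> list:
    """Assign a key to `replicas` distinct nodes: hash the key to pick a
    primary node, then place copies on the following nodes in the ring."""
    h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    primary = h % len(nodes)
    return [nodes[(primary + i) % len(nodes)] for i in range(replicas)]

nodes = ["node-1", "node-2", "node-3", "node-4"]
placement = replica_nodes("object-A", nodes)
print(placement)  # three distinct nodes from the list
```

A real cluster would use consistent hashing instead of a bare modulo, so that adding or removing a node only moves a small fraction of the shards - that is what makes rebalancing cheap.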
49. 49
The CAP Theorem
Consistency
All clients see exactly the same data at the same time, even in the presence of an update (ACID properties). Consistency refers to the fact that all replicas of an entity, identified by a key in the database, have the same value whatever node is queried. [diagram: after a client update, one replica still holds the old version while the others hold the new version]
Availability
The system continues to operate and all clients can see “a version” of a replica, even in the presence of node failure. The cluster is available if a request made by a client is always acknowledged by the system, i.e. it is guaranteed to be taken into account. That doesn’t mean the request is processed immediately; it may be put on hold. An available system will at a minimum acknowledge it.
Partition tolerance
The system continues to operate even when the cluster is partitioned (some nodes are unavailable: node crashes, network equipment down, etc.). Partition tolerance is related to availability and consistency, but it is still different: it states that the system continues to function internally (e.g. ensuring data distribution and replication), whatever its interactions with a client.
[Venn diagram: the pairwise combinations AC, CP and AP are achievable; the intersection of all three properties is not possible]
50. 50
The previous 3 properties, Consistency, Availability and Partition tolerance, are not independent. The CAP
theorem - or Brewer’s theorem - states that a distributed system cannot guarantee all 3 properties at the
same time
This is a theorem. That means it is formally true, but in practice it is less severe than it seems
The system or a client can often choose CA, AP or CP according to the context, and “walk” along the chosen
edge by appropriate tuning
Partition splits happen, but they are rare events (hopefully)
Rule of thumb
Traditional relational DBMSes are CA or CP – consistency is a must, in case of a problem either bring the
cluster down or split it and expect heavy synchronization later
Many NoSQL DBMSes are AP – availability is a must, and with big clusters failures happen so better live with
it. Consistency is only eventual
51. 51
This is essential !
Consistency refers to the fact that all replicas of an entity, identified by a key in
the database, have the same value whatever the node queried
With many NoSQL databases, the preferred working mode is AP, and all-the-time consistency is sacrificed.
Favoring performance, updates take a little time to propagate across the cluster. When
an entity’s value has just been created or modified, there is a short span during which
the entity is not consistent.
However the cluster guarantees that it will eventually be, when replication has
occurred. This is called eventual consistency
Eventual Consistency
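The inconsistency window described above can be illustrated with a toy model (my own sketch, not a real replication protocol): a write lands on one replica first, so a read from another replica can return the old value until propagation completes.

```python
# Toy eventual-consistency model: a write is applied to one replica
# immediately and propagated to the other replicas later.
replicas = {"node-1": "v1", "node-2": "v1", "node-3": "v1"}

def write(primary, value):
    replicas[primary] = value  # only the primary is updated right away

def propagate(value):
    for node in replicas:      # replication catches up asynchronously
        replicas[node] = value

write("node-1", "v2")
print(replicas["node-3"])  # 'v1' -- stale read during the inconsistency window
propagate("v2")
print(replicas["node-3"])  # 'v2' -- the cluster is eventually consistent
```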
53. 53
A NoSQL - originally referring to "non-SQL" or "non-relational" - database provides a mechanism for storage
and retrieval of data that is modeled in means other than the tabular relations used in relational databases.
Such databases have existed since the late 1960s, but the name "NoSQL" was only coined in the early 21st
century, triggered by the needs of Web 2.0 companies.
NoSQL databases are increasingly used in Big Data and Real-Time Web applications.
NoSQL systems are also sometimes called "Not only SQL" to emphasize that they may support SQL-like query
languages or sit alongside SQL databases in polyglot-persistent architectures.
NoSQL / NewSQL
The fundamental idea behind NoSQL is as follows:
because of the need to distribute data (Big Data), the Web giants have abandoned the whole idea of
ACID transactions (only eventual consistency is possible)
So if we drop ACID Transactions - which we always deemed to be so fundamental - why wouldn't we
challenge all the rest - the relational model and table structure?
Wikipedia - https://en.wikipedia.org/wiki/NoSQL
54. 54
For data fundamentally structured as tabular data and of a manageable size, the relational model fits.
For instance:
Accounting Data
Customer information
But some other data are modeled in a much more complex way
Geospatial data
Molecular models
Some underlying notions there are fundamentally not relational
Hierarchical data
Several levels of interconnections
In addition, some data models have high volatility and require
flexibility over time
The information available at the time the model is created is
sometimes incomplete
Or their inherent structure changes over time
The relational model is not well suited for data experiencing constant structural changes
The relational model is not always well suited
56. 56
NoSQL Database Types
Document-oriented (e.g. MongoDB, ES)
Key/Value pairs (e.g. Redis)
Graph (e.g. Neo4J)
Column-family aka BigTable (e.g. Cassandra)
One key has one (and only one) value
The value type is not specified (object value)
Values may be of different types
Issue: it is difficult to fit a data model into this modeling pattern
Row = a set of columns
Sorted vertical storage
Operations:
Query by key or by a set of keys
Queries on secondary indexes are allowed
Selection of the resulting columns
The column-family model looks a bit like the relational model
For a given row, the contents of a column can thus be seen as a hash table
with arbitrary (key, value) pairs
Each row in a table is uniquely identified by a key
Documents are structured data in the form of
hierarchical trees (sub-documents)
Data can be of various types
Strings, numbers, arrays
Documents are self-supporting:
they contain metadata about the structure and the
corresponding values
Several storage formats for the document
XML, JSON, BSON
In this model, objects are documents, i.e. trees of
values
Each document has a root and attributes
Attribute values are scalars (integers, strings), lists
or other objects
Each object has a unique ID, a conventional
property whose value serves as a key
Objects are organized into collections. Objects in the
same collection don’t need to have the same schema
– there is no mandatory structure
Based on the interconnection of data (contrary to the other NoSQL
solutions which do not support relations)
Data are not only linked to nodes but also to edges (property graph)
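The four families above can be sketched as plain Python structures. All keys, fields and values below are made up for illustration; they only show the shape each model imposes:

```python
# Key/value: one key maps to one opaque value; the store ignores its structure.
kv = {"session:9f2a": "serialized-blob"}

# Document: a self-describing hierarchical tree (JSON-like), no mandatory schema.
doc = {
    "_id": "email-1",
    "subject": "Hello",
    "recipients": ["a@example.com", "b@example.com"],
    "attachments": [{"name": "plan.pdf", "size": 12000}],
}

# Column-family: each row key maps to an arbitrary set of (column, value) pairs,
# so a row's contents behave like a hash table and rows need not share columns.
column_family = {
    "row-1": {"name": "Alice", "city": "Geneva"},
    "row-2": {"name": "Bob"},
}

# Graph: data lives on nodes AND on edges (a property graph).
nodes = {1: {"label": "Person", "name": "Alice"},
         2: {"label": "Person", "name": "Bob"}}
edges = [(1, 2, {"type": "KNOWS", "since": 2019})]
```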
59. 59
What is NewSQL ?
NewSQL refers to relational databases that have adopted some of the NoSQL genes, thus exposing
a relational data model and SQL interfaces on top of distributed, high-volume databases
NewSQL, contrary to NoSQL, enables an application to keep
The relational view on the data
The SQL query language
Response times suited to transactional processing
Some were built from scratch (e.g. VoltDB), others are built on top of a NoSQL data store (e.g. SQLFire,
backed by GemFire, a key/value store)
The current trend is for some proven NoSQL databases, like Cassandra, to offer a thin SQL interface,
achieving the same purpose
Generally speaking, the frontier between NoSQL and NewSQL is a bit blurry… SQL compliance is often
sought after, as the key to integrating legacy SQL software (ETL, reporting) with modern No/NewSQL
databases
NewSQL?
61. 61
Hadoop is an Open Source Platform providing
A distributed, scalable and fault tolerant storage system as a grid
Initially, a single parallelism paradigm : MapReduce to reuse the storage nodes as processing nodes
Since Hadoop V2 and YARN, other parallelization paradigms can be implemented on Hadoop
Schemaless, and optimized for sequential write-once / read-many access
Querying and processing DSL (Hive, Pig)
Hadoop ?
Hadoop comes in
different distributions
Apache Foundation
Cloudera
HortonWorks
MapR
IBM
…
Hadoop's origins
Initiated by Doug Cutting, leader of Lucene
Based on Google's publications about their
indexing system (GFS / MapReduce / BigTable)
Official Apache project since 2009
Hadoop was primarily intended for Big Data analytics
Nowadays Hadoop can be an infrastructure for much more
Microservices architecture (Hadoop V3)
Real-time Architectures
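The MapReduce paradigm mentioned above can be sketched in a few lines of Python. This is a single-process toy with invented sample documents; a real Hadoop job distributes the map, shuffle and reduce phases across the storage nodes of the cluster:

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit (word, 1) for every word of one input split."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key across the mappers' outputs."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values independently (hence in parallel)."""
    return {word: sum(counts) for word, counts in groups.items()}

# Each string stands in for one input split stored on a different node.
splits = ["big data big cluster", "data lake"]
mapped = chain.from_iterable(map_phase(s) for s in splits)
result = reduce_phase(shuffle(mapped))
print(result)  # {'big': 2, 'data': 2, 'cluster': 1, 'lake': 1}
```

The key property is that map and reduce only see local data, which is what lets Hadoop reuse storage nodes as processing nodes.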
62. 62
Hadoop Distribution
Hadoop overview
Distributed storage
MapReduce processing engine /
Parallel Computing Framework
Querying Orchestration
Machine learning /
Processing
IS
integration
Supervision
and
Management
Reporting
(Core)
66. 66
Vision of a data lake
With the continued growth in scope and scale of analytics applications using Hadoop and other data
sources, the vision of an enterprise data lake can become a reality.
In a practical sense, a data lake is characterized by three key attributes:
Collect everything. A data lake contains all data, both raw sources over extended periods of time and
any processed data, in big volumes
Dive in anywhere. A data lake enables users across multiple business units to refine, explore and enrich data
on their own terms; you don't know, a priori, the analytical structures
Flexible access. A data lake enables multiple data access patterns across a shared infrastructure: batch,
interactive, online, search, in-memory and other processing engines.
As a result, a data lake delivers maximum scale and insight with the lowest possible friction and cost.
Data lake
A data lake is a system or repository of data stored in its natural/raw format
It is usually a single store of data, including raw copies of source system data, sensor data, social data,
etc., and transformed data used for tasks such as reporting, visualization, advanced analytics and
machine learning.
It can include structured data from relational databases, semi-structured data (CSV, logs, XML,
JSON), unstructured data (emails, documents, PDFs) and binary data (images, audio, video).
Wikipedia - https://en.wikipedia.org/wiki/Data_lake
67. 67
Datalake Application Architecture
Unstructured Data Storage
Semi-structured data storage
(NoSQL)
Structured Data storage (e.g.
relational)
Interactive Querying Analytics / Processing Flow Processing
Machine Learning
Databases Raw files Application
logs
External Data / Open
APIs
Events /
Messages
Enterprise DWH Operational
System
Query /
Reporting
APIs / Services Events /
messages
DATA
LAKE
INGESTION
PUBLICATION
69. 69
Definition
A real-time system is an event-driven system that is available, scalable and stable, able
to take decisions (actions) with a latency defined as … below the frequency of events
In a streaming architecture …
Historical data is regularly and consistently updated with live data
Live data is available to the end user
Both types of data (historical and live) are not necessarily presented consistently to the
end user
Both sets of data can have their own screens or even application
A consistent view on both sets of data would be proposed by Lambda Architecture (next topic in
this presentation)
Streaming Architectures
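A core building block of the CEP engines shown next is the time window. Here is a minimal, illustrative sliding-window aggregate (the class name, timestamps and values are invented; real engines add watermarks, out-of-order handling, persistence, etc.):

```python
from collections import deque

class TimeWindow:
    """Keeps only the events of the last `span` seconds and aggregates them."""

    def __init__(self, span):
        self.span = span
        self.events = deque()  # (timestamp, value), in arrival order

    def add(self, ts, value):
        self.events.append((ts, value))
        # Evict events that have fallen out of the window.
        while self.events and self.events[0][0] <= ts - self.span:
            self.events.popleft()

    def total(self):
        return sum(v for _, v in self.events)

window = TimeWindow(span=60)
window.add(0, 10)
window.add(30, 5)
window.add(70, 1)   # the event at t=0 falls out of the 60 s window
print(window.total())  # 6
```

In-memory state like this is exactly what the engine must balance, replicate and keep coherent across nodes.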
70. 70
Complex Event Processing Engine
decision /
action
Transactional
Applications
BPM, ESB
Capture
Streaming Architecture
In memory states and
Calculations:
Time window,
operators, rules
Rule editing GUI
Cache / Distributed Cache
latency : 100 ms
Event/Condition/Action
Stream-based querying
multi-dimensional analysis
…
Real-time Data GUI
Historical Data GUI
Structured
Events
Unstructured
Events
Reference Data, DWH,
Services Querying
Event
History
71. 71
Complex Event Processing Engine
decision /
action
Transactional
Applications
BPM, ESB
Capture
Streaming Architecture
In memory states and
Calculations:
Time window,
operators, rules
Rule editing GUI
Cache / Distributed Cache
latency : 100 ms
Event/Condition/Action
Stream-based querying
multi-dimensional analysis
…
Real-time Data GUI
Historical Data GUI
Structured
Events
Unstructured
Events
Reference Data, DWH,
Services Querying
Event
History
Stakes:
- Latency management (< 100 ms)
- Throughput (10'000 msg/sec)
- Memory consumption
- Balancing and replication
- Fault tolerance
- State coherence
- What about lost events?
- Init from historical data
Stakes:
- Dynamic GUIs
- Data exploration along axes and criteria
- Real-time GUI: event-driven, of the « web-push » type
Stakes:
- High read performance with respect to latency
- Good cache management
Stakes:
- High capacity
- High write performance
- High historical-data querying performance
- Flexible design abilities
Stakes:
- « WYSIWYG » editor, usable by business users
- « Hot » updates of rules
- Backtesting
Stakes:
- Throughput (10'000 msg/sec)
- Fault tolerance: message retry?
73. 73
Real-Time Analytics
What if I want real-time analytics ?
• Most data analytics software are batch-processing solutions!
• So what happens with updates occurring while a batch is running?
• What happens between two of its executions?
Objectives:
• Take all the data into account
• Be able to answer any kind of request
• Fault-tolerance
• Robustness to evolutions, errors
• Scalability !
• Low latency for writing AND reading
PROCESSED DATA (more or less a few minutes to a few hours of data)
DATA THAT CAME AFTER THE START OF THE CURRENT BATCH (a few minutes to a few hours of data)
Time
74. 74
λ (Lambda) Architecture
CONSISTENT
BATCH ANALYTICS ON
COMPREHENSIVE DATA
REAL-TIME / STREAMING
ANALYTICS ON
INCREMENTAL DATA
DATA
STREAM
STORAGE OF PRE-
COMPUTED RESULTS /
VIEWS OF THE DATA
STORAGE OF
INCREMENTAL RESULTS /
VIEWS OF THE DATA
To Real-Time Analytics with Near-Real-Time background statistics and models
SPEED LAYER
BATCH LAYER Final latency
< 1second
QUERYING AND
REPORTING
TOOL
AGGREGATION,
MERGING AND
CONSOLIDATION
SERVING LAYER
The batch layer is responsible for consistency and long-term data storage
The speed layer only analyzes the required time window
The speed layer covers only the gap between the last batch execution and the latest real-time data, i.e. only the most recent data.
Both layers produce the same output (unlike usual streaming architectures)
The serving layer provides a consolidated view of both results
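The serving layer's consolidation step can be sketched as follows. The view contents, key names and additive merge rule are hypothetical (real Lambda deployments merge whatever aggregate the two layers both compute):

```python
def serving_layer(batch_view, speed_view):
    """Consolidate: the batch view is authoritative for the period it covers;
    the speed view only fills the gap since the last batch run."""
    merged = dict(batch_view)
    for key, increment in speed_view.items():
        merged[key] = merged.get(key, 0) + increment
    return merged

# Batch view: precomputed over all historical data up to the last batch run.
batch_view = {"page/home": 1_000_000, "page/cart": 40_000}
# Speed view: incremental counts on events seen since that run.
speed_view = {"page/home": 1_200, "page/checkout": 35}

print(serving_layer(batch_view, speed_view))
```

When the next batch run completes, the speed view for the now-covered period is simply discarded, which is how the batch layer restores consistency.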
75. 75
λ (Lambda) Architecture
CONSISTENT
BATCH ANALYTICS ON
COMPREHENSIVE DATA
REAL-TIME / STREAMING
ANALYTICS ON
INCREMENTAL DATA
DATA
STREAM
STORAGE OF PRE-
COMPUTED RESULTS /
VIEWS OF THE DATA
STORAGE OF
INCREMENTAL RESULTS /
VIEWS OF THE DATA
Many solutions for all components
SPEED LAYER
BATCH LAYER
QUERYING AND
REPORTING
TOOL
AGGREGATION,
MERGING AND
CONSOLIDATION
SERVING LAYER
D3.js
HighCharts
Tableau
Storm DRPC
Java API
Flink
76. 76
κ (Kappa) Architecture
REAL-TIME / STREAMING
ANALYTICS ON
INCREMENTAL DATA
DATA
STREAM
RELOAD OF PREVIOUS
RESULTS / VIEWS OF THE
DATA
STORAGE OF
INCREMENTAL RESULTS /
VIEWS OF THE DATA
Recent stream-processing technologies make the batch layer less necessary
UNIFIED STREAMING LAYER / TECHNOLOGY Final latency
< 1second
QUERYING AND
REPORTING
TOOL
AGGREGATION,
MERGING AND
CONSOLIDATION
SERVING LAYER
Kappa architecture is a streaming-first architecture deployment pattern
With the most recent stream-processing technologies (Kafka Streams, Flink, etc.), the interest and relevance of the batch
layer tend to diminish. The streaming layer matches the computation abilities of the batch layer (ML, statistics, etc.) and
stores data as it processes it.
A batch layer would only be needed to kick-start the system on historical data (Flink can do that)
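The "single code path" idea behind Kappa can be sketched as follows. The event log, user names and aggregation are invented for the example; in a real deployment, "reprocessing" means replaying a long-retention Kafka topic through the same streaming job, not iterating a Python list:

```python
def process(stream, state=None):
    """One streaming code path: consume events, update incremental state."""
    state = dict(state or {})
    for user, amount in stream:
        state[user] = state.get(user, 0) + amount
    return state

# The immutable event log (stand-in for a Kafka topic retained long enough).
log = [("alice", 10), ("bob", 5), ("alice", 7)]

# Live mode: apply only the newest event to the current state.
live = process(log[-1:], state={"alice": 10, "bob": 5})
# Reprocessing mode: replay the whole log through the SAME code.
replayed = process(log)
print(live == replayed)  # True: no separate batch implementation needed
```

This equivalence is exactly why the batch layer becomes optional: "batch" is just a replay of the stream.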
78. 78
Big Data 2.0
2012
2011 2014
Nowadays, in 2021:
With Hadoop 3, these three technologies tend to converge toward the same possibilities. Hadoop 3 supports
deploying jobs as Docker containers, just as Mesos and K8s do
Mesos and Kubernetes can use alternatives to HDFS such as Ceph, GlusterFS, Minio (and of course Amazon,
Azure, …), etc.
However, Kubernetes (and/or technologies based on Kubernetes) emerges as a market standard for the
operational IS, just as Hadoop remains a market standard for the analytical IS
79. 79
Kubernetes is an Open Source Platform providing
Automated software application deployment, scaling, failover and management across a cluster
of nodes
Management of application runtime components as Docker containers and of application units as Pods
Multiple common services required for service location, distributed volume management, etc. (pretty
much everything one requires to deploy an application on a Big Data cluster)
Kubernetes
Kubernetes is emerging as a
standard
Cloud Operating System
Many distributions
PKS (Pivotal Container Service)
Red-Hat OpenShift
Canonical Kubernetes
Google / AWS / Azure …
…
Kubernetes origins
Based on Google Borg, (one of) Google's
initial cluster management system(s)
Released as an open-source component by
Google in 2014
First official release in 2015
80. 80
Kubernetes Architecture
Client
Applications
Client
Applications
Client
Applications
(Secondary Master Node [HA])
(Master Node)
API Server
Control
Plane
Etcd
Key – Value Store
Controller Manager
Kubectl
Port
Forward
Load
Balanc.
Controller
Node
Kubelet
App
App App App App
POD
POD
Volumes
CR1 CR2 GR1 GR3
Ceph Gluster
Kube-Proxy
Docker
Node
App App
App App App
POD
POD
Volumes
CR2 CR3 GR1 GR2
Ceph Gluster
Docker
Node
App
App App App App
POD
POD
Volumes
CR1 CR3 GR2 GR3
Ceph Gluster
Docker
cAdvisor Kubelet Kube-Proxy
cAdvisor Kubelet Kube-Proxy
cAdvisor
KubeMQ
KubeMQ
KubeMQ
82. 82
Microservice architecture – a variant of the Service-Oriented Architecture (SOA) structural style – arranges an application
as a collection of loosely-coupled services. In a microservices architecture, services are fine-grained and the protocols are
lightweight. Its characteristics are as follows:
Services in a microservices architecture (MSA) are small in size, messaging-enabled, bounded by contexts,
autonomously developed, independently deployable, decentralized and built and released with automated
processes.
Services are often processes that communicate over a network to fulfill a goal using technology-agnostic protocols such
as HTTP.
Services are organized around business capabilities.
Services can be implemented using different programming languages, databases, and hardware and software environments,
depending on what fits best.
Microservices Architecture
Origins of Micro-services:
As early as 2005, Peter Rodgers introduced the
term "Micro-Web-Services" during a presentation
at the Web Services Edge conference.
The architectural style name was really adopted
in 2012
Kubernetes democratized the architectural
approach
The two big players in this field are Spring
Cloud and Kubernetes
A Microservices-based architecture has the following properties:
Lends itself to a continuous-delivery software development
process. A change to a small part of the application requires
rebuilding and redeploying only one or a small
number of services.
Adheres to principles such as fine-grained interfaces (to
independently deployable services), business-driven
development (e.g. domain-driven design).
Wikipedia - https://en.wikipedia.org/wiki/Microservices
Martin Fowler
83. 83
Microservices Architecture
Client
Applications
Client
Applications
Client
Applications
Master Node
API
Gateway
Service Catalog / Discovery
Management / Orchestration
Node
Node Mgmt.
Execution middleware
Service Proxy
Node Node
Distributed Storage
R1 R2
Distributed Storage
R1 R3
Distributed Storage
R2 R3
Execution middleware Execution middleware
Service B
Service C
Service A
Service D
Service E
Microservices
Node Mgmt. Service Proxy Node Mgmt. Service Proxy
MQ MQ MQ
Static Content
84. 84
Ask yourself: do you need microservices?
Microservices are NOT Big Data! [co-located processing]
You don't need microservices or Kubernetes to benefit from Docker
You're not scaling anything with synchronous calls
Don’t do microservices unless:
You need independent service-level scalability (vs. storage / processing scalability – Big Data)
You need a strong SOA - Service-Oriented Architecture
You need independent services lifecycle management
Challenges
Distributed caching vs. reloading the world all over again
Not all applications are fit for asynchronous communications (WYCIWYG)
Identifying the proper granularity for services
The enterprise architecture view is too coarse
The application architecture view is too fine
RIA Organizer: good candidates would be EmailService, CalendarService, ContactService, SearchService
Data consistency without distributed transactions. Applications need to be designed with this in mind.
Weighing the overall memory and performance waste
A Spring Boot stack + JVM + Linux Docker base for every single service?
HTTP calls between layers?
Microservices discussion
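To make the granularity discussion concrete, here is a toy ContactService (one of the RIA Organizer candidates named above) exposed over HTTP using only the Python standard library. The route, contact data and in-memory store are invented for illustration; this is a sketch of the "one service owns its data" principle, not a recommended production stack:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# The service owns its data: no other service reads this store directly,
# everything goes through the HTTP API.
CONTACTS = {"1": {"name": "Alice", "email": "alice@example.com"}}

class ContactService(BaseHTTPRequestHandler):
    def do_GET(self):
        contact_id = self.path.rstrip("/").split("/")[-1]
        contact = CONTACTS.get(contact_id)
        body = json.dumps(contact or {"error": "not found"}).encode()
        self.send_response(200 if contact else 404)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), ContactService)  # port 0 -> any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

port = server.server_address[1]
with urlopen(f"http://127.0.0.1:{port}/contacts/1") as resp:
    contact = json.loads(resp.read())
print(contact)
server.shutdown()
```

Even this trivial example illustrates the overhead being questioned above: a whole HTTP server, serialization and a network round-trip, where a monolith would make one in-process call.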
86. 86
The strong frontier between the operational IS and the analytical IS vanishes
NoSQL, streaming, Lambda and Kappa architectures increasingly overflow into the
operational IS and as such provide a common ground for operational processes and
analytical processes.
Historically strong on the BI side, Hadoop (V3) nowadays fits the needs of the
operational IS well, while Kubernetes can be useful on the analytical IS
Kubernetes (also Mesos, etc.) is a cloud Operating System, but not only that (distribution,
scaling, running your cloud locally)
Don't do micro-services unless you need micro-services … otherwise just do services :-)
Final notes …
Operational Information System BI
X
Motivation:
On one side: the operational IS with its 3-tier model, and the analytical IS with its D-1 push model
On the other: micro-services used left and right
Take a step back and understand what the technology enables and brings
First, briefly walk through architecture description models and introduce a tool that has accompanied me for many years in my work as an architect
Agenda
Typically, the architectural design decisions are related to key aspects:
Structural: Typically, "The architectural elements should be organized like this..."
Behavioural: For instance, "Data processing, storage and visualization will be performed in strict sequence."
Interaction: For instance, "Communication among all system elements should occur only using event notification."
Non-functional: For instance, "The system's reliability will be ensured by replicating modules."
A process to design a high-level solution: a process which, unfortunately, is not documented on Wikipedia; understanding it comes from experience, but it is supported by the two tools we will see in a moment
A product: the description of a system's architecture. It cannot be a single diagram. It is often several diagrams, sometimes the same one several times with slightly varying perspectives, functional and non-functional specifications, technical documentation, etc.
Means: technical foundations, technical or functional libraries, middleware, etc.
But it is above all a reality. A system's architecture is defined first and foremost by the running system
and the architect is the person who builds that system, not the person who draws diagrams in their office
Enterprise Architecture vs Application Architecture
Enterprise architecture identifies how the different applications of an information system behave together, whereas application architecture looks at how the different components behave within an application.
The best image to understand this is to consider enterprise architecture as the plan of a city, while the architecture of an application would be the plan of a building
There are differences between these two jobs, such as the challenges to address, the scope and the topics covered
But there are also great similarities, such as the tools available to describe them and the questions to ask in order to identify the decision points
…
Architecture is not quite design, and design is not quite architecture.
But the border between these two worlds is subtle and, above all, blurry. Moreover, this border depends on the perspective, on its interpretation within a team, etc.
Neal Ford: "Architecture is about stuff that's hard to change later"
That speaks to me. For me, architecture stops at the structuring decisions (functional as well as non-functional) on the product to be built or on the information system as a whole. The elements that can be changed later, that can be refactored, are design, not architecture.
Logical View
…
- Features and the breakdown into features => identify the functional blocks and their materialization.
Describe or materialize the relationships between functional blocks
For me, the logical view is intimately linked to the story map, even if the granularity and cardinality may vary
Process View
…
Concretely, we seek to identify how the technical-functional blocks interact with each other to realize the expected features.
To do so, we take into account functional constraints but also non-functional ones (performance, scalability, distribution, etc.)
Implementation View
…
The developer's view, where we want to see packages and stereotypes, but also address source code management concerns.
From my point of view, it is the only view of Kruchten's model that, in the era of IntelliJ, git and maven, may no longer be entirely relevant; we will see an alternative approach in a moment
Physical View
…
It is really the system architecture: the one where the software and system components are placed on the machines on which the application is deployed
Scenario View
Show how all the elements from the previous views work together to realize the features
More and more, the scenario view is a derivation of the story map... or it is even dropped entirely in favor of a description of the user stories. => I will not dwell on it further
=> There is a lot of documentation on Kruchten's views and the 4 + 1 View model online
=> Give a few examples of views and of the accompanying design
…
- Group the functional components by category: business/backend or presentation/UI
Use a color code for the functional family
Show the most important associations
Show layers: it is a choice, not necessarily relevant for functional architecture
Also decide to show a few technical components, because they realize important functional elements
In the end, I decided to produce a diagram that allows me to
Present a functional breakdown of the software components
Communicate how these components will carry the essential features: edit an email, display an email, save an email, send an email, etc.
…
Kruchten takeaways.
- Kruchten's 4 + 1 views are a formalization of the perspectives to describe in software architecture.
An interesting tool, still relevant today (except perhaps the implementation view...)
My criticism would be:
Many people have gone to great lengths to discuss the formalism when studying Kruchten
The formalism itself has no interest whatsoever... circles, ASCII art...
A good tool for doing architecture must let you ask yourself the right questions
Propose another tool
The implementation view displeases me; architecture is an abstract formalism for communicating, not necessarily something striving to describe a technical reality
Finally, the formalism of the 4 + 1 view model (based on UML) naturally tends to spill over from architecture into design (at the application level)
Consumerization : new information technologies emerge first in the consumer market and then spread into businesses
This is a change compared to the previous situation
Companies used to have better servers/desktop/applications/... than those employees could buy at home
Now, new solutions emerge every month : companies can't keep up
New trend : employees are hired with their devices and their applications
BYOD trend : employees are more comfortable and more efficient with their own devices
Same power in an iPad now as in a Cray a few years back
This consumerization can be found in infrastructures too and is an enabler for the consumer market
A direct consequence of this consumerization: employees use a mix of professional and personal tools (office suite, Gmail, Google+, Twitter, Facebook, Dropbox, Evernote, ...)
Nowadays several companies still block access to these tools for their employees (private banks). Tomorrow, that won't be possible anymore.
People are used to being connected all the time, with highly efficient devices and highly responsive services, everywhere and for all kinds of uses.
The revolution came from the web giants. They had to find technical answers to business challenges like :
GGL: index the whole web and keep the response time of any search below one second, or how to keep search free for the user?
LINK: how to understand how millions of users use the website?
AMZ: how to build a product recommendation engine for millions of customers, on millions of products?
EBAY: how to search eBay ads, even with misspellings?
From the time we started estimating and measuring the amount of data produced until 2003, 5 exabytes (5 billion gigabytes) had been produced.
In 2011, that quantity was generated in 2 days (think of Facebook, Twitter, Google search logs, financial transaction logs, etc.)
In 2014, that quantity was generated in 10 minutes.
Not only do we generate more and more data
We have the means and the technology to analyze, exploit and mine it and extract meaningful business insights
The data generated by the company's own systems can be a very interesting source of information regarding customer behaviours, profiles, trends, desires, etc.
But also external data, facebook, twitter logs, etc.
Twitter story: the Uber car transportation system in Paris. A driver refused to carry a customer because the customer was gay. That customer tweeted his misadventure. The driver was excluded by Uber only a few hours later.
Instead of harming Uber's reputation, the story rather gave it credit.
Just an example on how a company can get significant advantages by monitoring social network feeds
For a long time, the increasing volume of data to be handled was not an issue
The volume of data rises, the number of users rises
Processing capacity rises as well, sometimes even faster
See Moore's law above
This model held for a very long time.
Costs go down, computing capacity rises; one simply needs to buy a new machine to absorb the load increase.
This is especially true in the mainframe world
There wasn't even any need to make the architecture of the systems (COBOL, etc.) evolve for 30 years
Even outside the mainframe world
The architecture patterns and styles we use in the operational IS world haven't really evolved over the last 15 years
Despite new technologies such as the Web, Web 2.0, Java, etc., of course
I'm just speaking about architectures and styles
The analytical systems' architecture hasn't evolved over the last 20 years
So everything’s fine ?
No !
As we’ll see, at least two problems emerged relatively recently
1st concern : the throughput
We are able to store more and more data, no problem
Yet we are more and more unable to manipulate this data efficiently
Specifically, fetching all the data on a computation machine to process it is becoming more and more difficult
One challenge : how to handle the massive computation needs / massive amount of data ?
-> New architectures and paradigms are required
3 ideas …
Availability
Availability (or lack thereof) is a property of the database cluster. The cluster is available if a request made by a client is always acknowledged by the system, i.e. it is guaranteed to be taken into account
That doesn't mean that the request is processed immediately. It may be put on hold. An available system will, at a minimum, acknowledge it
Practically speaking, availability is usually measured in percent. For instance, 99.99% availability means that the system is unavailable at most 0.01% of the time, that is, at most 53 minutes per year
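The 53-minute figure follows directly from the percentage; a one-liner makes the arithmetic explicit (the function name is ours, for illustration):

```python
# Downtime budget implied by an availability percentage, on a 365-day year.
def downtime_minutes_per_year(availability_percent):
    return (100 - availability_percent) / 100 * 365 * 24 * 60

print(round(downtime_minutes_per_year(99.99)))  # 53
```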
Partition tolerance
Partition tolerance is verified if a system made of several interconnected nodes can withstand a partition of the cluster, i.e. if it continues to operate when one or several nodes disappear. This happens when nodes crash or when a piece of network equipment is shut down, taking a whole portion of the cluster away
Partition tolerance is related to availability and consistency, but it is still different. It states that the system continues to function internally (e.g. ensuring data distribution and replication), whatever its interactions with a client
Consistency
When talking about distributed databases, like NoSQL, consistency has a meaning that is somewhat more precise than in the relational context
It refers to the fact that all replicas of an entity, identified by a key in the database, have the same value whatever the node queried
With many NoSQL databases, updates take a little time to propagate across the cluster. When an entity’s value has just been created or modified, there is a short span during which the entity is not consistent. However the cluster guarantees that it will eventually be, when replication has occurred. This is called eventual consistency
GFS / Map Reduce – 2002 / BigTable 2006
E.g. data deposit / data reservoir, or data hub
The world of operational decision-making.
Potentially many rules, which must evolve frequently.
Out of the question to send everything back to the dev team for 3 months:
we must move fast = a business-side analyst must be able to evolve the rules (= not development work)
and be able to imagine new rules and simulate them on historical data (backtesting)