Data Engineer, Patterns & Architecture The future: Deep-dive into Microservices Patterns with Stream Process

Igor De Souza, June 2020

Copyright © 2020, Oracle and/or its affiliates. All rights reserved.
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.

Today’s Agenda
• Industry 4.0
• Data Fabric & Data Mesh
• Microservices Design Patterns
• Is Streaming a Database?
• Streaming Everywhere

During 2020/2021 the world continues to go through a Paradigm Shift into a future where “Cyber-Physical Systems” are the new normal.
“Digital Transformation” requires mindset shift:
1. Sharing data is more effective than accumulating
2. Decentralizing, distributing, and copying is more powerful than stockpiling
3. Connectivity and flow of data is the starting point for innovation and socializing.

Real-Time Industry 4.0
…from Industry 3.0
Batch Centric, Schedulers
Hubs (EDW, Hadoop, Data Lake)
Mostly Relational Data (aka Views)
Simplex Processing is Standard
Size for Peak Workloads
Kimball / Inmon Architecture
Governance is “Bolt On”
Vendor Specific
…to Industry 4.0
Event Centric, Streams
(Edge, Hybrid, Multi-Cloud)
Polyglot Data (via Logs)
Massively Parallel is Standard
Elastic, Scale on Demand
Distributed Kappa
Governance is Embedded
Open Source Enabled

Evolution
This data pattern, popularized by Ralph
Kimball and Bill Inmon, has been the
foundation for enterprise data
management since 1993.
It is transaction consistent, scales up well for most use cases, and is based on
SQL, the lingua franca for most tools.
By 2010, the Lambda (big data) pattern
was common. In 2014, Jay Kreps (of
LinkedIn) questioned the Lambda
Architecture and spawned Kappa.
The Kappa principles treat batch processing as a special case of stream
processing: a single historized event log serves both real-time and batch
processing.
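A minimal sketch of the Kappa idea, assuming a tiny in-memory list stands in for the historized event log. The `Event` shape, the `apply_events` function and the account balances are hypothetical names chosen for illustration; they are not from the slides. The point is that one processing function serves both a full replay ("batch") and incremental arrival ("real-time").

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Event:
    offset: int   # position in the append-only log
    account: str
    amount: int


def apply_events(state: Dict[str, int], events: Iterable[Event]) -> Dict[str, int]:
    """The single processing function: fold events into per-account balances."""
    for e in events:
        state[e.account] = state.get(e.account, 0) + e.amount
    return state


# Hypothetical historized event log (in practice a durable, replayable log).
log: List[Event] = [
    Event(0, "a", 10),
    Event(1, "b", 5),
    Event(2, "a", -3),
]

# "Batch" is a replay of the whole log from offset 0 ...
batch_view = apply_events({}, log)

# ... while "real-time" feeds the same function as events arrive.
live_view = apply_events({}, log[:2])          # state built so far
live_view = apply_events(live_view, [log[2]])  # newly arrived event
```

Because both views come from the same function over the same log, they converge on the same result; there is no separate batch code path to keep in sync.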

Microservices are Good!
Service Mesh Revolution
The emergence and widespread use of microservices have directly led to a revolution in DevOps, massive uptake in Kubernetes and, by 2020, the Service Mesh revolution
• Key Benefits:
• Decomposition of monolithic architecture
• Modularity, smaller services and improved speed of initial development
• Independence, loosely-coupled systems that can be created using different languages or data
• With the loose coupling also comes much greater flexibility around deployment and upgrades, eliminating complex dependencies
• Flexibility at Scale, deployments may start small, run locally and later scale very wide, running across multi-cloud environments and containers
• Sidecar pattern for “Mesh” frameworks are

Domain Driven Design (DDD) principles guide developers to create microservices that align to Bounded
Contexts, which “defines tangible boundaries of applicability of some sub-domain”
Challenges with Bounded Context & DDD
Sounds hard!

Do not burden my code with all these infrastructure related decisions

Data and Events are First-Class Citizens
Analytics, Data Science and Data Lakes are too important to my business’ Digital Transformation and Data-Driven initiatives… we need architecture focus on Data and Events too
[Figure labels: Application Microservices (Service, Service, Service) produce events, consume events, read and write; App Events, Data Events, DB Log Events; Control Plane]
Data Stores
• “State of the Truth” at a point in time (current or historic)
• Durable storage used for Data Recovery / Archives / years of data
• Polyglot, each service may determine its own data structures
Event Logs
• “Narrative of the Truth” sequence of events (between data snapshots)
• Days/months of event data available as Time Series or Messaging
• Strict ordering of events & Idempotency
• Strong Consistency of DB logs (e.g., when using GoldenGate)
Application Microservices
• “Systems of Record” at application tier
• APIs, business rules and business object semantics
• Not durable storage
The microservice API is king!
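The "strict ordering of events & idempotency" point for event logs can be sketched as a consumer that tracks the last sequence number it applied. This is an illustration only: the `LogEvent` tuple, the `IdempotentConsumer` class and the assumption that the log assigns gap-free, increasing sequence numbers are hypothetical and not taken from the slides.

```python
from typing import Dict, List, Tuple

# (sequence number, key, value); the sequence number is an assumed
# monotonically increasing position assigned by the event log.
LogEvent = Tuple[int, str, int]


class IdempotentConsumer:
    def __init__(self) -> None:
        self.state: Dict[str, int] = {}
        self.last_applied = -1  # highest sequence number applied so far

    def handle(self, event: LogEvent) -> bool:
        """Apply an event once; return False for duplicates or replays."""
        seq, key, value = event
        if seq <= self.last_applied:
            return False  # already applied: redelivery is harmless
        if seq != self.last_applied + 1:
            raise ValueError(f"gap in log: expected {self.last_applied + 1}, got {seq}")
        self.state[key] = value
        self.last_applied = seq
        return True


consumer = IdempotentConsumer()
deliveries: List[LogEvent] = [(0, "x", 1), (1, "x", 2), (1, "x", 2), (2, "y", 9), (0, "x", 1)]
applied = [consumer.handle(e) for e in deliveries]
```

Redelivered events (the repeated sequence numbers 1 and 0) leave the state untouched, so the consumer can be restarted and the log replayed without corrupting the data store.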

Can we take some of the best ideas of a Service Mesh and apply them to
Data and Events, to create a kind of Data Mesh?
Microservice Mesh and Data Mesh
[Figure labels: Application Microservices; App Events, Data Events, DB Log Events; Immutable: Raw Data Events, Prepared Data, Canonical Data; Data Domain Projection 1 … Data Domain Projection n; Control Plane]

What is a Data Mesh?
Data Mesh is a data-tier architecture to integrate and govern enterprise data assets across distributed multi-cloud environments. Two defining characteristics are:
(1) Decentralized data processing; no ETL/Hubs/Lake monoliths
(2) Event-driven; real-time where possible, batch only when necessary
Microservices-centric:
• For the administration, deployment and monitoring of the core frameworks of data movement and governance
• “Sidecar Proxy” style pattern for Events and Data; aligns with Service Mesh frameworks (Kubernetes, Istio, etc.)
Immutable event-logs for data integrations:
• Messaging and data store events are globally accessible via immutable event logs
• Logs may be used to drive Streaming or Batch integrations
Distributed data movement of all types of data:
• A data mesh moves data: Relational, NoSQL, JSON, Graph…
• Relational data consistency (ACID) during data movement
• Must work reliably with enterprise OLTP data sets
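One way to read "relational data consistency (ACID) during data movement" is that change events belonging to one source transaction are applied to the target as one transaction. The sketch below assumes a simplified change-event tuple, a `customer` table and an in-memory SQLite target; all of these are hypothetical stand-ins and do not represent any replication product's actual format.

```python
import sqlite3
from itertools import groupby
from typing import List, Tuple

# (txn_id, op, id, name): a hypothetical, simplified change-event format.
Change = Tuple[int, str, int, str]


def apply_changes(target: sqlite3.Connection, changes: List[Change]) -> None:
    """Apply change events to the target, one source transaction at a time."""
    # Events are assumed to arrive in commit order, grouped by txn_id.
    for _txn_id, group in groupby(changes, key=lambda c: c[0]):
        with target:  # commits on success, rolls back the whole group on error
            for _, op, row_id, name in group:
                if op == "insert":
                    target.execute("INSERT INTO customer (id, name) VALUES (?, ?)", (row_id, name))
                elif op == "update":
                    target.execute("UPDATE customer SET name = ? WHERE id = ?", (name, row_id))
                elif op == "delete":
                    target.execute("DELETE FROM customer WHERE id = ?", (row_id,))
                else:
                    raise ValueError(f"unknown operation: {op}")


target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")

changes: List[Change] = [
    (1, "insert", 1, "Ada"),
    (1, "insert", 2, "Bob"),
    (2, "update", 1, "Ada L."),
    (2, "delete", 2, ""),
]
apply_changes(target, changes)
rows = target.execute("SELECT id, name FROM customer ORDER BY id").fetchall()
```

Grouping by transaction means a reader of the target never observes half of a source transaction, and a failure part-way through a group leaves the target as it was before that group started.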
[Figure labels: Data Mesh; Event Streaming; Immutable Logs; Data Replication; Polyglot Persistence; Edge / 5G Frameworks; Domain Driven Design; Service Mesh “Sidecars”]

Microservice Design Patterns

Microservice Design Patterns for Data
Patterns for Microservices
Inherent to the Microservice Architecture is the developer using specific patterns. Sometimes the patterns are partially embodied in a programming framework, but typically developers must choose to follow certain heuristics while programming.
This presentation’s focus:
• “Database Patterns” & “Integration Patterns” …using DB Event Replication (AKA: Change Data Capture) to improve them
• Simplify the pattern, make the microservice application more resilient and provide better data consistency guarantees
DB Patterns for Discussion:
• Database per Service (covered earlier)
• CQRS – Command Query Responsibility Segregation
• Event Sourcing
• Saga Pattern
• Transactional Outbox
• Aggregates (AKA: Domain Events)
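As a point of reference for the list above, here is a minimal sketch of the Transactional Outbox pattern using an in-memory SQLite database. The table names, the `place_order` and `relay` functions and the polling relay are all assumptions made for illustration. The slides propose DB event replication (change data capture) for this role; the polling loop here is only a stand-in for that mechanism.

```python
import json
import sqlite3
from typing import Callable, List

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
db.execute(
    "CREATE TABLE outbox ("
    "seq INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT, published INTEGER DEFAULT 0)"
)


def place_order(order_id: int) -> None:
    """Write the business row and its event in one local transaction."""
    with db:
        db.execute("INSERT INTO orders (id, status) VALUES (?, 'PLACED')", (order_id,))
        event = {"type": "OrderPlaced", "order_id": order_id}
        db.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def relay(publish: Callable[[dict], None]) -> int:
    """Hand unpublished outbox rows to a publisher, in sequence order."""
    rows = db.execute(
        "SELECT seq, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, payload in rows:
        publish(json.loads(payload))  # stand-in for a message broker client
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)


sent: List[dict] = []
place_order(1)
place_order(2)
first_pass = relay(sent.append)
second_pass = relay(sent.append)
```

The order row and its event commit or fail together, so no event is emitted for an order that was never stored. A crash between `publish` and the `UPDATE` would resend the event, which is why consumers of an outbox are usually expected to be idempotent.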

Hype Cycle 2009
Complex Event Processing:
• CEP is a kind of computing in which incoming data about events is distilled into more useful, higher level event data that provides insight into what is happening. […] CEP is used for highly demanding, continuous-intelligence applications that enhance situation awareness and support real-time decisions. (Gartner)
20 Years Too Early?
• CEP dates back to the 1990s (history of CEP engines)
• CEP came before “Event Stream Processing” and generally has covered more complex use cases (e.g., handling of out-of-order events and more complicated correlation semantics) (what’s the difference, 2019 and mythbuster CEP vs ESP, 2008)
• Largely overtaken by Big Data stream processing technologies that are open-source, massively-parallel, and widely available as cloud-native
2009!

Stream Processing 2020
Time Series DB/OLAP
Big Data Event Stream Processing
Complex Event Processing
Becoming more aligned to open source / Apache frameworks
Becoming more capable of rich windowing functions and time-clock correlation semantics
Complex Event Processing:
• Traditionally running in “scale-up” SMP in-memory framework
• Many CEP engines are moving toward MPP “scale-out” architectures
• Programming-centric historically, some CEP engines were also query-centric as far back as 2009
• Time-clock semantics and advanced cache may be used for “state machine” type use cases
Stream Processing:
• Built around MPP frameworks and typically Apache open-sourced
• Genesis was around simplistic use cases on high volumes of events
• All SP engines began with rudimentary windowing and correlation semantics, but most frameworks are gradually becoming more functionally comparable to classic CEP
• Simplistic windowing and caching for basic stream-clock events
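To make "windowing" concrete, the following is a small sketch of a tumbling (fixed, non-overlapping) window count keyed by event time. The `Reading` tuple, the sensor names and the `tumbling_counts` function are hypothetical; real stream processors add watermarks, triggers and state management on top of this basic idea.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (event_time_seconds, key): the event time is carried by the event itself.
Reading = Tuple[int, str]


def tumbling_counts(events: Iterable[Reading], size: int) -> Dict[Tuple[int, str], int]:
    """Count events per key in fixed, non-overlapping windows of `size` seconds."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for ts, key in events:
        window_start = (ts // size) * size  # window covers [start, start + size)
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(1, "sensor-a"), (4, "sensor-a"), (9, "sensor-b"), (10, "sensor-a"), (3, "sensor-b")]
result = tumbling_counts(events, size=10)
```

The last reading arrives out of order (time 3 after time 10) but still lands in the window starting at 0, because assignment uses event time rather than arrival time. Deciding when such a window is complete is the harder problem that the richer time-clock semantics mentioned above address.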
Time Series Databases:
• Optimized for persistence and historic analytics on time-ordered events…data is often sourced from CEP or SP engines
Technical and Functional Differences between CEP and SP: https://complexevents.com/2019/07/15/whats-the-difference-between-esp-and-cep-2/

Stream Processing/CEP for Event Driven Architectures
There has been a widespread awakening to the benefits of Event Driven Architecture (EDA) for increasing the scalability and agility of business systems. […] Stream analytics is based on the mathematics of complex-event processing (CEP). CEP is a computing technique in which incoming data about what is happening (event data) is processed as it arrives (data in motion or recently in motion) to generate higher level, more useful, summary information (complex events).
W. Roy Schulte (of Gartner), March 2020:
EDA is Suddenly Popular Will Stream Analytics be Next?
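The quoted definition, low-level events processed as they arrive to produce higher-level "complex events", can be illustrated with a toy correlation rule: several failed logins by the same user within a time window yield one summary event. The event shape, the threshold and the `detect_bursts` function are hypothetical and exist only for this sketch; events are assumed to arrive in time order.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple

# (timestamp_seconds, user, outcome): a hypothetical low-level event shape.
LoginEvent = Tuple[int, str, str]


def detect_bursts(
    events: Iterable[LoginEvent], threshold: int = 3, within: int = 60
) -> List[Tuple[int, str]]:
    """Emit (timestamp, user) when `threshold` failures occur within `within` seconds."""
    recent: Dict[str, Deque[int]] = defaultdict(deque)
    complex_events: List[Tuple[int, str]] = []
    for ts, user, outcome in events:  # events assumed to arrive in time order
        if outcome != "fail":
            continue
        window = recent[user]
        window.append(ts)
        while window and ts - window[0] >= within:
            window.popleft()  # drop failures that fell out of the time window
        if len(window) >= threshold:
            complex_events.append((ts, user))
            window.clear()  # start fresh so one burst yields one complex event
    return complex_events


stream = [
    (0, "ann", "fail"), (10, "bob", "fail"), (20, "ann", "fail"),
    (25, "ann", "ok"), (30, "ann", "fail"), (200, "bob", "fail"), (210, "bob", "fail"),
]
alerts = detect_bursts(stream)
```

Seven raw events reduce to a single higher-level event for "ann"; "bob" never has three failures inside one 60-second span, so nothing is emitted for that user.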
Event Stream Analytics (& CEP)
Data & Microservice Events
Use Cases:
• Event/Data Pipelines
• Time-Series Analysis
• Geospatial Analysis
• Real-time AI/ML
• Continuous ETL

Critiques of Event Sourcing
Exposing the Persistence Tier:
• Taken too far (Why Event Sourcing is an Anti-Pattern), developers wind up using the Event Store as a Shared Persistence model, and other microservices now have hard-coupled binding to the message formats of the originating service
Whole System Fallacy:
• Some microservices leaders (Udi and Greg Reach CQRS Agreement) say to narrow the aperture on when to use CQRS + Event Sourcing → only within a Business Component and a Single Bounded Context
• Minimizes utility of pattern for Communications
Forcing Eventual Consistency on Developers:
• The propensity to over-use CQRS & Event Sourcing at the whole system level forces developers to manage eventual consistency in the Application tiers (What they don’t tell you about event sourcing)
• “…they will make your life a living hell” doing DevOps, debugging and system recovery when a “Mesh” of services are interacting via Event Store and message signatures can lead to disaster

Is Streaming a Database?
• KStore
• KCache
• KarelDB
• KSQL
• SparkSQL
• Flink SQL
• Stream Java & Scala
• Oracle 20c - Transactional Event Queues (TEQ)
• Martin Kleppmann | Kafka Summit SF 2018

Spark, Flink or KSQL
[best] ˜ œ › ™ [worst]
Spark Streaming | Apache Flink / SQL | Confluent KSQL
User Experience
Low Code Development (with built-in patterns/accelerators) ™
Interactive/Live Edits (browser based, view changes immediately) ™ ˜
Built-in Live Dashboards (event-driven charts/graphs) ™
Core Streaming Semantics
What is Being Computed (transforms, joins, flatten, statefulness etc) › œ œ
Time Windows (global, fixed, sliding, tumbling, custom etc) › ˜ ˜
When in Processing Time (triggers – event, time, count, timers, etc) œ
How do Refinements Relate (discarding, accumulating, retracting) › œ
Analytics
Robust CEP Capabilities (complex event correlations, native time clock) ™
Geo-Fencing & Spatial (lat/long, built in maps, custom map tiles, etc) ™ ™ ™
Machine Learning (native scala, PMML, python support etc) œ œ
Time Series Analysis (built-in interval patterns, thresholding etc)
Other Features
Backpressure (dynamic ingest per pipeline) Custom Custom Custom
State Management (automation across streams & native cache) N/A RocksDB RocksDB
Data Consistency (OLTP Change Events, Inserts/Updates/Deletes) Custom Custom Custom
GoldenGate Stream Type (aware of SCN/CSN, transactions, order, etc) Custom Custom Custom

Evolution towards Real-Time Data Mesh
mesh & microservice controls

This is not a Metamorphosis, it is a Paradigm Shift
The Success Paradox
Data success factors that did well in Industry 3.0 will not be the factors that create success in Industry 4.0
Next Gen Data Architecture
ETL Vendors
1990 – 2010’s Gen1:
• Replication
• Messaging
• Streaming
• Pipelines
Next-Gen has new DNA not tied to old ETL tools
It is impossible to evolve older Batch Processing tools into a modern Event-Centric Stream Processing solution; the underlying paradigms are fundamentally different
