Functional architectural patterns

Lars Albertsson, Founder & Data Engineer
Who’s talking?
Swedish Institute of Comp. Sc. (test tools)
Sun Microsystems (very large machines)
Google (Hangouts, productivity)
Recorded Future (NLP startup)
Cinnober Financial Tech. (trading systems)
Spotify (data processing & modelling)
Schibsted (data processing & modelling)
Why functional?
Verbs
... has made ... expanding ...
... flourishes ... merged ... has been unable to escape lingering .. built ...
... are ... placed ... say ... are ... to explode ...
.. are considering ... to reopen … to recall ...
Or object-oriented?
Nouns, pronouns
... bankruptcy ... government bailout ... automaker Chrysler ... comeback ... sales ... Jeep sport utility vehicles.
... Chrysler ... part ... Fiat Chrysler Automobiles, it ... concerns ... the safety ... Jeeps ...
... Jeeps ... gas tanks ... regulators ... safety advocates ... rear-end crash.
... regulators ... an investigation ... those Jeeps ... Fiat Chrysler’s agreement ... models.
Functional benefits? My version.
Matches a few problems
Data processing
Matches a few computer properties
Consistency through immutability
Deterministic - replay for resilience
Local vs distributed properties
Local
Hardware provides strong consistency
Faults -> death
Distributed
Eventual consistency
Faults must be survived
Architectural functional patterns
Personal anti-pattern experiences
Strive to look for
Immutability
Reexecution
MapReduce
Discovered pattern, not invention
Well known, enough said
Succeeded by Spark RDD paradigm
Data flows
[Diagram: data flow between datasets, in columns labelled Raw and Derived. Labels: Users, Page views, Sales, Sales reports, Views with demographics, Sales with demographics, Conversion analytics]
Dataset artifacts, typically files with date parameter.
Anti-pattern - isolated batch jobs
Get data (more on that later)
Cron an ETL batch job (function)
Output solidifies. Mostly.
Steps in isolation - often different teams
What to do on ETL code changes?
[Diagram: Sales with demographics, Views with demographics]
Pattern: data pipeline
End-to-end sequences/DAG of jobs
Pipelines not only exist, but are treated end-to-end
Input is raw, original data
Separate raw data from generated
[Diagram: Users, Page views, Sales with demographics, Views with demographics, Conversion analytics]
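A minimal sketch of the pipeline pattern, assuming small in-memory lists stand in for date-parameterized dataset files. The dataset names follow the diagram; the record fields (`user_id`, `country`, `amount`) and all function names are hypothetical.

```python
from collections import defaultdict

# Raw datasets for one date. In practice these would be immutable files
# such as raw/users/2015-06-01; the fields here are made up.
RAW_USERS = [{"user_id": 1, "country": "SE"}, {"user_id": 2, "country": "NO"}]
RAW_SALES = [
    {"user_id": 1, "amount": 10},
    {"user_id": 2, "amount": 5},
    {"user_id": 1, "amount": 7},
]


def sales_with_demographics(users, sales):
    """Derived dataset: each sale joined with the buyer's country."""
    country_by_user = {u["user_id"]: u["country"] for u in users}
    return [dict(s, country=country_by_user[s["user_id"]]) for s in sales]


def sales_report(enriched_sales):
    """Derived dataset: total sales amount per country."""
    totals = defaultdict(int)
    for sale in enriched_sales:
        totals[sale["country"]] += sale["amount"]
    return dict(totals)


def run_pipeline(users, sales):
    """The whole DAG as one function of raw input only."""
    return sales_report(sales_with_demographics(users, sales))


report = run_pipeline(RAW_USERS, RAW_SALES)
```

Because every step is a pure function and the raw input is never modified, a change to any step can be handled by rerunning `run_pipeline` on the same raw data.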
Lambda architecture, part 1
Save all collected data without preprocessing
But timestamp on generation, registration, and arrival
Rerun everything downstream on code change
Human fault tolerance
In conflict with privacy management?
Pipeline workflow orchestration
Ideally: Good old make + cluster + IDE + xUnit
Test end-to-end
Rebuild on upstream changes (but not all)
State of practice: Luigi, Pinball, Azkaban
Don’t take you all the way :-(
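The "good old make" idea can be sketched as follows, assuming a dictionary stands in for cluster storage. The target names, the `TARGETS` table, and the `build` helper are hypothetical and are not the API of Luigi, Pinball, or Azkaban.

```python
# Each target lists its upstream targets and a pure function that
# computes it from their values.
TARGETS = {
    "views_with_demographics": (["users", "page_views"],
                                lambda users, views: [(v, users[v]) for v in views]),
    "conversion_analytics": (["views_with_demographics"],
                             lambda enriched: len(enriched)),
}


def build(target, store, built):
    """Build `target` and anything missing upstream of it, like make."""
    if target in store:          # already materialized: reuse it
        return store[target]
    deps, fn = TARGETS[target]
    inputs = [build(dep, store, built) for dep in deps]
    store[target] = fn(*inputs)
    built.append(target)
    return store[target]


# Raw data is present from the start; derived data is built on demand.
store = {"users": {"u1": "SE", "u2": "NO"}, "page_views": ["u1", "u2", "u1"]}
built = []
build("conversion_analytics", store, built)
first_run = list(built)

# A second request finds everything in place and rebuilds nothing.
built.clear()
build("conversion_analytics", store, built)
second_run = list(built)
```

In this sketch, rerunning after a code change amounts to deleting the affected derived targets from the store and building again; the raw targets are left alone.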
Lambda architecture, part 2
Parallel batch and real-time pipelines
Batch more accurate, overrides
Real-time for window of recent data
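A minimal sketch of how the two pipelines could be combined at serving time, assuming both produce a view keyed by hour. The function name and the sample numbers are hypothetical.

```python
def merged_view(batch_view, realtime_view):
    """Serve batch results where they exist, real-time results elsewhere."""
    merged = dict(realtime_view)
    merged.update(batch_view)   # batch is more accurate, so it overrides
    return merged


# Page views per hour. The batch pipeline has processed up to 10:00;
# the real-time pipeline covers a window of recent hours.
batch_view = {"09:00": 120, "10:00": 98}
realtime_view = {"10:00": 95, "11:00": 40}

serving = merged_view(batch_view, realtime_view)
```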
Obtaining data
Log things. Conceptually stable, but collection is challenging at scale.
Have legacy code and master data in databases? Let us have a look.
Anti-pattern: direct dump
Database dimensioned for online traffic
Hadoop = herd of elephants
Load spike
Height = #mapper nodes
Area = #users
Direct dumps in the trenches
Company successful - #users increasing
More Sqoop mappers - higher DB load
Daily dump jobs went to 25h
Devops firewalled off Hadoop to recover
Anti-pattern: dump through API
SOA/microservice culture
DB protected by throttling
API not used to elephants
Query area is still large
Herd of elephants through gate - 1-2 weeks
Anti-pattern: slave dump
Protect live service by mirroring to a dump slave
No online service risk, good!
Why anti-pattern?
All dumps are non-deterministic
HDFS down? Dump later.
State is gone - dump not accurate
Slave replication down?
Dump not accurate
Anti-pattern: deterministic mirror
Replay commit log until full day/hour
Discovered through archaeology :-)
Not scalable, point of failure
Hourly dump took 45 minutes, increasing...
(Anti-)pattern: better dumping
Netflix Aegisthus
Snapshot Cassandra (fast, atomic, reliable)
Transfer SSTables to HDFS
Replicate compaction in MapReduce
Other DBs? Depends on atomic snapshot.
All dumps are anti-patterns?
Typical use: Join activity events with user info
Event time != dump time
Aggregation discards information
Which users enabled X, tried, and disabled?
Pattern: Event source
All facts are events. Immutable, timestamped
Event stream is source of truth
No explicit “current state”
The functional data architecture?
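A minimal sketch of event sourcing, assuming a tiny in-memory event stream. The `Event` fields and the action names (`enable_x`, `use_x`, `disable_x`) are hypothetical. State is never stored; it is derived from the immutable events for whatever point in time is asked about, which also answers the earlier question of which users enabled X, tried it, and disabled it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: int   # event time, e.g. seconds since epoch
    user: str
    action: str      # "enable_x", "use_x" or "disable_x"


EVENTS = (
    Event(1, "alice", "enable_x"),
    Event(2, "bob", "enable_x"),
    Event(3, "alice", "use_x"),
    Event(4, "alice", "disable_x"),
)


def enabled_at(events, time):
    """View: the set of users with X enabled as of `time`."""
    enabled = set()
    for e in sorted(events, key=lambda e: e.timestamp):
        if e.timestamp > time:
            break
        if e.action == "enable_x":
            enabled.add(e.user)
        elif e.action == "disable_x":
            enabled.discard(e.user)
    return enabled


def enabled_tried_disabled(events):
    """Users who enabled X, tried it, and then disabled it, in that order."""
    wanted = ["enable_x", "use_x", "disable_x"]
    progress = {}
    for e in sorted(events, key=lambda e: e.timestamp):
        step = progress.get(e.user, 0)
        if step < len(wanted) and e.action == wanted[step]:
            progress[e.user] = step + 1
    return {user for user, step in progress.items() if step == len(wanted)}
```

A dump of current state taken after time 4 would show only that bob has X enabled; the history needed for the second question would be gone.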
Event source incarnated: unified log
Pour events into pub/sub bus, with long history.
Kafka de-facto standard.
Tap from bus to HDFS/S3 in time buckets.
Camus/Secor
Stream processing pipelines to dest topics
Replay on code changes
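The tap into time buckets can be sketched as below, assuming events arrive as `(epoch_seconds, payload)` pairs. The path layout and the function names are hypothetical and do not reflect how Camus or Secor name their output.

```python
from collections import defaultdict
from datetime import datetime, timezone


def bucket_path(topic, epoch_seconds):
    """Hourly bucket for an event, derived from its own timestamp."""
    t = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"{topic}/{t:%Y-%m-%d}/{t:%H}"


def tap(topic, events):
    """Group (timestamp, payload) events into their time buckets."""
    buckets = defaultdict(list)
    for epoch_seconds, payload in events:
        buckets[bucket_path(topic, epoch_seconds)].append(payload)
    return dict(buckets)


events = [(0, "a"), (3599, "b"), (3600, "c")]
buckets = tap("page_views", events)
```

Since the bucket depends only on the event's timestamp, replaying the same events produces the same buckets.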
Unified log, practical considerations
Long history necessary
Must have time to fix stream process bugs
Use 3+ months and use stream as temp DB
Unified log also useful for meta and control
Tweak Kafka for low latency
Event source + views
View = snapshot of aggregated state @ time
For ETL, choice of hourly/daily aggregates or exact views
[Diagram: Logs, View, View]
Event source + database
Business logic may demand “current state”
Event stream is truth, keep DB in sync
Event source, synced database
A. Service interface generates events and DB transactions
B. Generate stream from DB commit log. Postgres, MySQL -> Kafka
C. Build DB with stream processing
Deployment & orchestration
System = many machines
Desired system state = code + config
Actual state = Orchestrator(current, desired)
Anti-pattern: stateful orchestration
Orchestrator = Puppet|Chef|Ansible {
current.changeSomeProperties(desired)
return current
// current.otherProperties unchanged
}
Stateful orchestration in the trenches
Desired = { case roleA: install(x,y)
case roleB: install(z) }
Current = x installed on roleB. Old x. Zombie woke up when B load decreased.
Puppet+apt = No simple way to remove undesired state
Pattern: artifacts from source
Orchestrator = Docker|Packer {
delete current
return Image(desired)
}
No state leak from existing state. Sort of.
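The difference between the two orchestrator styles can be sketched as follows, assuming a dictionary of package name to version stands in for a machine. Both functions are hypothetical illustrations, not how Puppet, Chef, Ansible, Docker, or Packer work internally.

```python
def stateful_orchestrator(current, desired):
    """Mutate the existing machine: only properties named in `desired` change."""
    current.update(desired)
    return current


def image_orchestrator(current, desired):
    """Discard the existing machine and build a fresh one from `desired`."""
    return dict(desired)


# roleB should only have z, but an old x was installed there earlier.
desired = {"z": "1.0"}

machine = {"x": "0.1 (old)"}
after_stateful = stateful_orchestrator(machine, desired)

machine = {"x": "0.1 (old)"}
after_image = image_orchestrator(machine, desired)
```

The stateful variant leaves the old `x` in place because nothing in `desired` mentions it; the image variant is a function of `desired` alone, so the old state cannot leak through.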
Deterministic, predictable?
Image building leaky on purpose
E.g. “apt-get update && apt-get install”
Imports external state
Ephemeral databases preserve state
Ability to rebuild from unified log is valuable
Jay Kreps, Confluent: Unified log
Martin Kleppmann: Unified log, Bottled Water
Nathan Marz: Lambda
Sander Mak @ Jfokus: Event sourcing
Datomic
Questions?
More?
Functional architectural patterns

  • 2. Who’s talking? Swedish Institute of Comp. Sc. (test tools) Sun Microsystems (very large machines) Google (Hangouts, productivity) Recorded Future (NLP startup) Cinnober Financial Tech. (trading systems) Spotify (data processing & modelling) Schibsted (data processing & modelling) 2
  • 3. Why functional? Verbs ... has made ... expanding ... ... flourishes ... merged ... has been unable to escape lingering .. built ... ... are ... placed ... say ... are ... to explode ... .. are considering ... to reopen … to recall ... 3
  • 4. Or object-oriented? Nouns, pronouns ... bankruptcy ... government bailout ... automaker Chrysler ... comeback ... sales ... Jeep sport utility vehicles. ... Chrysler ... part ... Fiat Chrysler Automobiles, it ... concerns ... the safety ... Jeeps ... ... Jeeps ... gas tanks ... regulators ... safety advocates ... rear-end crash. ... regulators ... an investigation ... those Jeeps ... Fiat Chrysler’s agreement ... models. 4
  • 5. Functional benefits? My version. Matches a few problems Data processing Matches a few computer properties Consistency through immutability Deterministic - replay for resilience 5
  • 6. Local vs distributed properties Local Hardware provides strong consistency Faults -> death 6 Distributed Eventual consistency Faults must be survived
  • 7. Architectural functional patterns Personal anti-pattern experiences Strive to look for Immutability Reexecution 7
  • 8. MapReduce Discovered pattern, not invention Well known, enough said Succeeded by Spark RDD paradigm 8
  • 9. Data flows 9 Users Page views Sales Sales reports Views with demographics Sales with demographics Conversion analytics Conversion analytics Views with demographics Dataset artifacts, typically files with date parameter. Raw Derived
  • 10. Anti-pattern - isolated batch jobs Get data (more on that later) Cron an ETL batch job (function) Output solidifies. Mostly. Steps in isolation - often different teams What to do on ETL code changes? 10 Sales with demographics Views with demographics
  • 11. Pattern: data pipeline End-to-end sequences/DAG of jobs Not only exist, but treated end-to-end Input is raw, original data Separate raw data from generated 11 Users Page views Sales with demographics Conversio n analytics Conversion analytics Views with demographics
  • 12. Lambda architecture, part 1 Save all collected data without preprocessing But timestamp on generation, register, arrival Rerun everything downstream on code change Human fault tolerance In conflict with privacy management? 12
  • 13. Pipeline workflow orchestration Ideally: Good old make + cluster + IDE + xUnit Test end-to-end Rebuild on upstream changes (but not all) State of practice: Luigi, Pinball, Azkaban Don’t take you all the way :-( 13
  • 14. Lambda architecture, part 2. Parallel batch and real-time pipelines. Batch more accurate, overrides. Real-time for window of recent data.
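One way to read "batch overrides, real-time for recent data" is as a merge of two views. The sketch below assumes both pipelines produce per-hour counts keyed by an integer hour. The function name and the `batch_horizon` parameter are hypothetical.

```python
def serving_view(batch_view, realtime_view, batch_horizon):
    """Combine per-hour counts from the batch and real-time pipelines.

    batch_view / realtime_view: dict mapping hour (int) -> count
    batch_horizon: first hour that the batch pipeline has not yet covered
    """
    # Keep real-time results only for the recent window the batch has not reached.
    merged = {hour: n for hour, n in realtime_view.items() if hour >= batch_horizon}
    # Batch is more accurate, so its results win wherever it has them.
    merged.update(batch_view)
    return merged
```

For example, with batch results for hours 0 and 1 and real-time results for hours 1 to 3, the merged view takes hours 0 and 1 from batch and hours 2 and 3 from real-time.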
  • 15. Obtaining data. Log things. Conceptually stable, but collection is challenging at scale. Have legacy code and master data in databases? Let us have a look.
  • 16. Anti-pattern: direct dump. Database dimensioned for online traffic. Hadoop = herd of elephants. Load spike: height = #mapper nodes, area = #users.
  • 17. Direct dumps in the trenches. Company successful - #users increasing. More Sqoop mappers - higher DB load. Daily dump jobs went to 25h. Devops firewalled off Hadoop to recover.
  • 18. Anti-pattern: dump through API. SOA/microservice culture. DB protected by throttling. API not used to elephants. Query area is still large. Herd of elephants through gate - 1-2 weeks.
  • 19. Anti-pattern: slave dump. Protect live service by mirroring to a dump slave. No online service risk, good! Why anti-pattern?
  • 20. All dumps are non-deterministic. HDFS down? Dump later. State is gone - dump not accurate. Slave replication down? Dump not accurate.
  • 21. Anti-pattern: deterministic mirror. Replay commit log until full day/hour. Discovered through archaeology :-) Not scalable, point of failure. Hourly dump took 45 minutes, increasing...
  • 22. (Anti-)pattern: better dumping. Netflix Aegisthus: snapshot Cassandra (fast, atomic, reliable), transfer SSTables to HDFS, replicate compaction in MapReduce. Other DBs? Depends on atomic snapshot.
  • 23. All dumps are anti-patterns? Typical use: join activity events with user info. Event time != dump time. Aggregation discards information. Which users enabled X, tried, and disabled?
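The "event time != dump time" problem can be shown with a toy join. All data and names below are hypothetical: one user changes country at time 5, and the user table is dumped later, at time 10.

```python
# Hypothetical data: alice moved from SE to NO at time 5.
PROFILE_CHANGES = [(0, "alice", "SE"), (5, "alice", "NO")]  # (time, user, country)
PAGE_VIEWS = [(3, "alice", "/a"), (7, "alice", "/b")]       # (time, user, page)
DUMP = {"alice": "NO"}  # what the user table holds when dumped at time 10


def join_with_dump(views, dump):
    """Typical ETL join: every event gets the attribute as of dump time."""
    return [(ts, user, page, dump[user]) for ts, user, page in views]


def country_at(changes, user, ts):
    """Latest country recorded for `user` at or before `ts`, or None."""
    country = None
    for change_ts, change_user, value in sorted(changes):
        if change_user == user and change_ts <= ts:
            country = value
    return country


def join_with_history(views, changes):
    """Join each event with the attribute as of the event's own time."""
    return [(ts, user, page, country_at(changes, user, ts)) for ts, user, page in views]
```

The dump-based join attributes the page view at time 3 to NO, although the user was in SE at that point. Only the change history can give the right answer, and the dump no longer contains it.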
  • 24. Pattern: Event source. All facts are events. Immutable, timestamped. Event stream is source of truth. No explicit “current state”. The functional data architecture?
  • 25. Event source incarnated: unified log. Pour events into pub/sub bus, with long history. Kafka de-facto standard. Tap from bus to HDFS/S3 in time buckets. Camus/Secor. Stream processing pipelines to destination topics. Replay on code changes.
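Tapping the bus into time buckets might look roughly like the sketch below. It does not use the Camus or Secor APIs. The directory layout (one directory per topic and UTC hour) and both function names are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def bucket_path(topic, epoch_seconds):
    """Hypothetical layout: one directory per topic and UTC hour."""
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"{topic}/{ts:%Y/%m/%d/%H}"


def tap(topic, events):
    """Group (epoch_seconds, payload) events into hourly buckets."""
    buckets = defaultdict(list)
    for epoch_seconds, payload in events:
        buckets[bucket_path(topic, epoch_seconds)].append(payload)
    return dict(buckets)
```

Each bucket then becomes a dataset artifact with a time parameter, which batch jobs can consume and reprocess after code changes.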
  • 26. Unified log, practical considerations. Long history necessary. Must have time to fix stream process bugs. Use 3+ months and use stream as temp DB. Unified log also useful for meta and control. Tweak Kafka for low latency.
  • 27. Event source + views. View = snapshot of aggregated state @ time. For ETL, choice of hourly/daily aggregates or exact views. [Diagram: Logs -> View, View.]
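A view as a "snapshot of aggregated state @ time" can be sketched as a fold over the log. The event shape `(timestamp, key, value)` and the last-write-wins rule are assumptions made for this example.

```python
def view_at(events, at_time):
    """Fold all events with timestamp <= at_time into a snapshot of state.

    events: iterable of (timestamp, key, value) facts; later facts win.
    """
    state = {}
    for ts, key, value in sorted(events, key=lambda event: event[0]):
        if ts > at_time:
            break
        state[key] = value
    return state
```

Since the log itself is never modified, views for any past time can be computed, or recomputed after a bug fix, from the same events.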
  • 28. Event source + database. Business logic may demand “current state”. Event stream is truth, keep DB in sync.
  • 29. Event source, synced database. A. Service interface generates events and DB transactions. B. Generate stream from DB commit log. Postgres, MySQL -> Kafka. C. Build DB with stream processing.
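Option C can be illustrated with an in-memory SQLite database standing in for the service database. The stream contents, table schema and function names are hypothetical, and a real setup would consume from the log rather than a Python list.

```python
import sqlite3

# Hypothetical event stream: (offset, user, email); later offsets win.
STREAM = [
    (0, "alice", "alice@example.com"),
    (1, "bob", "bob@example.com"),
    (2, "alice", "alice@example.org"),
]


def build_db(stream):
    """Materialise "current state" by applying the stream in offset order."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (name TEXT PRIMARY KEY, email TEXT)")
    for _, name, email in sorted(stream):
        db.execute(
            "INSERT OR REPLACE INTO users (name, email) VALUES (?, ?)",
            (name, email),
        )
    db.commit()
    return db


def current_state(db):
    """Read the materialised table back as a dict of name -> email."""
    return dict(db.execute("SELECT name, email FROM users ORDER BY name"))
```

The database here is disposable: dropping it and calling `build_db` again on the same stream yields the same state, which is what keeps the event stream the source of truth.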
  • 30. Deployment & orchestration. System = many machines. Desired system state = code + config. Actual state = Orchestrator(current, desired).
  • 31. Anti-pattern: stateful orchestration. Orchestrator = Puppet|Chef|Ansible { current.changeSomeProperties(desired); return current // current.otherProperties unchanged }
  • 32. Stateful orchestration in the trenches. Desired = { case roleA: install(x,y); case roleB: install(z) }. Current = x installed on roleB. Old x. Zombie woke up when B load decreased. Puppet+apt = no simple way to remove undesired state.
  • 33. Pattern: artifacts from source. Orchestrator = Docker|Packer { delete current; return Image(desired) }. No state leak from existing state. Sort of.
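The two orchestrator styles can be contrasted as functions of `(current, desired)`. This is a toy model, not how Puppet, Chef, Ansible, Docker or Packer are implemented. Machine state is reduced to a dict of package -> version, and the version strings are made up.

```python
def stateful_orchestrator(current, desired):
    """Patch the existing machine: only properties named in `desired` change."""
    updated = dict(current)   # copy, so the example can compare before and after
    updated.update(desired)   # anything not mentioned in `desired` survives
    return updated


def image_orchestrator(current, desired):
    """Discard the existing machine and build purely from the description."""
    return dict(desired)      # `current` is deliberately ignored


# roleB once had an old package x installed; today's description only mentions z.
current_role_b = {"x": "1.0 (old)"}
desired_role_b = {"z": "2.3"}
```

With these inputs the stateful variant returns a machine that still carries the old `x`, while the image variant returns only `z`. In this simplified model the second function is pure in `desired`, which is the property the pattern is after.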
  • 34. Deterministic, predictable? Image building leaky on purpose. E.g. “apt-get update && apt-get install” imports external state. Ephemeral databases preserve state. Ability to rebuild from unified log is valuable.
  • 35. Jay Kreps, Confluent: Unified log. Martin Kleppmann: Unified log, Bottled Water. Nathan Marz: Lambda. Sander Mak @ Jfokus: Event sourcing. Datomic. Questions? More?