Uploaded by Lorenzo Alberton


Scaling Teams, Processes and Architectures

The document by Lorenzo Alberton outlines key principles for scaling teams, processes, and technology within an organization. It emphasizes the importance of hiring the right talent, establishing effective team structures, and implementing critical processes for improved management and scalability. Additionally, it covers architectural principles, the significance of monitoring systems, and strategies for managing big data and ensuring performance through caching and load testing.


In this document


Slide 1: Managing Growth in Scalability

Introduction to scaling teams, processes, and architectures for managing growth.

Slide 2: Core Elements of Scalability

Scalability encompasses three main elements: People, Processes, and Technology.

Slide 3: Focus on People

Discussion on staffing, roles, management, and team dynamics.

Slides 4 - 6: Essential Staffing Guidelines

Key staffing principles: hire smarter people, cultural fit, and avoid toxic individuals.

Slides 7 - 9: Optimal Team Size and Structure

Challenges of team size including micromanagement and communication issues; examples of structures.

Slides 10 - 15: Importance of Processes

Processes are critical for effective team management, standardizing actions, and enabling agility.

Slides 16 - 20: Capacity Planning Explained

Understanding capacity, current load assessment, and implications for planning. Control change by determining risk levels and cumulative effects.

Slides 21 - 34: Testing and Performance Measurement

Load and stress testing to identify bottlenecks and ensure app stability. Fundamentals of architecting scalable solutions, including N + 1 design principles.

Slides 35 - 46: Fault Isolation for Stability

Implementing fault isolative structures to enhance system availability and debugging ease. Various caching methods (Object, Application, CDN) to improve performance and scale. Myriad factors in managing big data including costs, management, and storage policies.

Slides 47 - 57: Monitoring and Measurement Practices

Comprehensive strategies to monitor systems and applications for identifying issues. Examples of software architecture demonstrating independent scalability and SOA principles.

Slides 58 - 62: Messaging Systems Explained

Messaging frameworks such as ZeroMQ and Kafka for workload processing and distribution.

Slides 63 - 66: Recruitment and Contact Information

Call for job applications, references, and contact details for further inquiries.

Lorenzo Alberton
@lorenzoalberton

Scaling Teams, Processes and Architectures
Managing growth

London Scalability Group, Innovation Warehouse, 16th April 2012

Scalability Is About...

People
Processes
Technology

People
Staffing, Roles, Management, Teams

Staffing

Never compromise.
Only hire people smarter than you.
Hire people who can fit the company culture.
Promote fun in your working environment.
Beware of toxic people.

http://www.earthrangers.com/content/wildwire/toxic_spill.jpg

Team Size and Structure

Too small:
- Micromanaging managers
- Overworked team members
- Can’t accomplish much

Too big:
- Poor communication
- Low morale
- Low productivity

[Diagram: functional vs. matrix organisation under a CTO. Functional columns: Designers, Developers, Testers. Matrix rows: Proj 1 to Proj 4, each with a PM, a Designer, a Developer and a Tester.]

Processes


Why are processes critical?

Improve management of teams and employees
Standardise actions in repetitive tasks
Reduce mundane decisions to focus on grander ideas
Allow the team to react quickly to crisis
Determine system capacity and scalability needs

Challenge: right amount, right process, right time

Determining Headroom

[Diagram: capacity vs. current load]

Why?
- Planning annual budget
- Hiring plan
- Prioritisation
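The slide only names capacity and current load, so the following is a minimal sketch of how headroom might be computed from them, not the speaker's method. The `ideal_usage` ceiling, the linear growth model and all the figures are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Headroom:
    max_capacity: float        # e.g. requests/second the system can sustain
    current_load: float        # peak load observed today, same unit
    ideal_usage: float = 0.5   # assumed fraction of capacity we are willing to use

    def remaining(self) -> float:
        """Usable capacity left before the ideal usage ceiling is reached."""
        return self.ideal_usage * self.max_capacity - self.current_load

    def months_left(self, monthly_growth: float) -> float:
        """Months until headroom runs out, assuming linear growth per month."""
        if monthly_growth <= 0:
            return float("inf")
        return max(self.remaining(), 0.0) / monthly_growth


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
h = Headroom(max_capacity=10_000, current_load=3_000, ideal_usage=0.5)
print(h.remaining())       # 2000.0
print(h.months_left(250))  # 8.0
```

A number like "eight months left" is what feeds the budget, hiring and prioritisation decisions listed on the slide.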

Controlling Change: Determine Risk

http://dilbert.com/strips/comic/2008-05-08/

Risk Management
Risk is cumulative

Determine limits and tolerance
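One way to read "risk is cumulative" is that every change in a release window adds to a running total, which is checked against an agreed tolerance. The sketch below assumes a made-up 1 to 5 risk score per change and a hypothetical limit; neither comes from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    name: str
    risk: int  # hypothetical score: 1 (trivial) to 5 (dangerous)


def admit_changes(changes: list[Change], risk_limit: int) -> tuple[list[Change], list[Change]]:
    """Accept changes in order while the summed risk stays within the limit."""
    accepted: list[Change] = []
    deferred: list[Change] = []
    total = 0
    for change in changes:
        if total + change.risk <= risk_limit:
            accepted.append(change)
            total += change.risk
        else:
            deferred.append(change)  # push to a later release window
    return accepted, deferred


queue = [
    Change("schema migration", 5),
    Change("copy tweak", 1),
    Change("new cache layer", 4),
    Change("config flag", 2),
]
accepted, deferred = admit_changes(queue, risk_limit=8)
print([c.name for c in accepted])  # ['schema migration', 'copy tweak', 'config flag']
print([c.name for c in deferred])  # ['new cache layer']
```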

Load / Stress Testing

Load testing
- identify, document and eliminate bottlenecks through a strictly controlled process of measurement and analysis
- measure system’s response and stability
- verify the app can meet the desired performance objectives (SLA)

Stress testing
- determine the app’s stability when subjected to above-normal loads
- verify the app’s behaviour when close to the breaking point
- test the application recoverability (negative testing)
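The "verify the app can meet the desired performance objectives (SLA)" step can be sketched as a percentile check over measured latencies. The nearest-rank percentile, the 95th-percentile target and the sample timings are assumptions for illustration; a real load test would drive concurrent traffic with a dedicated tool.

```python
import math
import time
from typing import Callable


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sample list, pct in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def measure(request: Callable[[], None], calls: int) -> list[float]:
    """Time `calls` sequential invocations of `request`, in milliseconds."""
    timings = []
    for _ in range(calls):
        start = time.perf_counter()
        request()
        timings.append((time.perf_counter() - start) * 1000)
    return timings


def meets_sla(timings_ms: list[float], pct: float, limit_ms: float) -> bool:
    """True when the chosen percentile is within the agreed limit."""
    return percentile(timings_ms, pct) <= limit_ms


# Hypothetical latencies in milliseconds, with one slow outlier.
latencies = [12.0, 15.0, 11.0, 240.0, 14.0, 13.0, 16.0, 12.5, 13.5, 14.5]
print(percentile(latencies, 50))        # 13.5
print(meets_sla(latencies, 95, 100.0))  # False
```

The median looks healthy while the 95th percentile breaks the limit, which is why the check is made on a high percentile rather than on the average.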

Barrier Conditions

Code reviews
Manual and automated QA processes
Performance and stress testing
Release documentation checks (runbook)
Dev, Test, Stage and Live environments
Instrumentation checks

Protection from significant failures

Technology
Architecting Scalable Solutions


Architectural Principles

- N + 1 design
- Design for rollback
- Design to be disabled
- Design to be monitored
- Design for multiple live sites
- Use mature technology
- Asynchronous design
- Stateless systems
- Buy when non core
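As an illustration of the "to be disabled" principle, a feature can be wrapped in a switch so that it can be turned off in production without a deployment. This is a minimal sketch: the `FeatureFlags` class, the flag name and the page sections are hypothetical, and a real system would read flags from configuration rather than memory.

```python
class FeatureFlags:
    """In-memory on/off switches (assumed; normally backed by config)."""

    def __init__(self, **flags: bool) -> None:
        self._flags = dict(flags)

    def disable(self, name: str) -> None:
        self._flags[name] = False

    def enabled(self, name: str) -> bool:
        # Unknown features default to off, so new code can ship dark.
        return self._flags.get(name, False)


def render_page(flags: FeatureFlags) -> list[str]:
    """Build the page, skipping any optional section that is switched off."""
    sections = ["header", "timeline"]
    if flags.enabled("recommendations"):
        sections.append("recommendations")
    return sections


flags = FeatureFlags(recommendations=True)
print(render_page(flags))  # ['header', 'timeline', 'recommendations']

flags.disable("recommendations")  # e.g. the recommender is overloading the database
print(render_page(flags))  # ['header', 'timeline']
```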

Stateless, Asynchronous Systems

http://upload.wikimedia.org/wikipedia/commons/4/46/Synchronized_swimming_-_Russian_team.jpg

Fault Isolative Structures

- Increase availability
- Limit impact of failures
- Easier debugging

First:
- Functions causing repetitive problems
- Natural layout or topology of the site

Caching for Performance and Scale

Object Caches
- Usually serialized (marshalling / unmarshalling)
- get() / set() / replace()
- APC, Memcached

Application Caches
- Proxy caches
- Reverse proxy caches
- HTTP headers
- ISP/Uni proxies
- Squid, Varnish, mod_cache

CDNs
- Multiple locations / backbones
- CNAME entries
- Akamai, Coral, Limelight...
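The object-cache column (serialized values behind get() / set() / replace()) can be sketched with an in-process stand-in. This is not the Memcached or APC API: `ObjectCache`, the TTL handling and the `cached` helper are assumptions that only mirror the operations named on the slide.

```python
import pickle
import time
from typing import Any, Callable, Optional


class ObjectCache:
    """Tiny in-process stand-in for a Memcached-style object cache."""

    def __init__(self) -> None:
        self._store: dict[str, tuple[float, bytes]] = {}

    def set(self, key: str, value: Any, ttl: float = 60.0) -> None:
        # Values are serialized (marshalled) before being stored.
        self._store[key] = (time.monotonic() + ttl, pickle.dumps(value))

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None or entry[0] < time.monotonic():
            self._store.pop(key, None)  # drop missing or expired entries
            return None
        return pickle.loads(entry[1])   # unmarshal on the way out

    def replace(self, key: str, value: Any, ttl: float = 60.0) -> bool:
        # Only overwrite keys that are already present and unexpired.
        if self.get(key) is None:
            return False
        self.set(key, value, ttl)
        return True


def cached(cache: ObjectCache, key: str, loader: Callable[[], Any]) -> Any:
    """Cache-aside read: try the cache, fall back to the loader on a miss.

    A stored value of None is indistinguishable from a miss in this sketch.
    """
    value = cache.get(key)
    if value is None:
        value = loader()
        cache.set(key, value)
    return value


cache = ObjectCache()
db_hits = []


def load_profile() -> dict:
    db_hits.append("query")  # stands in for an expensive database call
    return {"id": 7, "name": "alice"}


cached(cache, "user:7", load_profile)
cached(cache, "user:7", load_profile)
print(len(db_hits))  # 1
```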

Managing “Big Data”

The more storage... the more storage management:
- storage costs
- people and software
- power and space
- processing power
- backup time and costs

Evaluate data retention policy
Consider multi-tiered storage
Distribute data / work (Hadoop, M/R)
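The "distribute data / work (Hadoop, M/R)" point refers to the map/reduce model. The sketch below runs the three phases in a single process to show the shape of such a job; it is not Hadoop code, and the hashtag-counting task and sample records are made up for illustration.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    """Emit (hashtag, 1) for every hashtag found in one record."""
    for word in record.split():
        if word.startswith("#"):
            yield word.lower(), 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key, as the framework would between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(records: Iterable[str]) -> dict[str, int]:
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


records = ["Loving #Scala and #kafka", "#scala at scale", "no tags here"]
print(run_job(records))  # {'#scala': 2, '#kafka': 1}
```

Because each map call sees one record and each reduce call sees one key, both phases can be spread across machines, which is the property the slide relies on.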

Monitoring: Measure Everything

1. Is there a problem? User experience / Business metrics monitors
2. Where is the problem? System monitors (threshold - variance)
3. What is the problem? Application monitors

Keep Signal vs. Noise ratio high

StatsD
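Application monitors of the kind StatsD collects are usually emitted as small UDP datagrams in the plain-text form `bucket:value|type`. The following sketch formats and sends such a line using only the standard library; the bucket names are hypothetical, and the host and port (8125 is the StatsD default) are assumptions about the local setup.

```python
import socket
from typing import Union


def statsd_line(bucket: str, value: Union[int, float], kind: str) -> bytes:
    """Format one metric in the StatsD text protocol: bucket:value|type."""
    if kind not in {"c", "ms", "g"}:  # counter, timer, gauge
        raise ValueError(f"unsupported metric type: {kind}")
    return f"{bucket}:{value}|{kind}".encode("ascii")


def send_metric(line: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send, so instrumentation never blocks the app."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line, (host, port))


# Hypothetical bucket names; call send_metric(...) with these lines to emit them.
print(statsd_line("api.requests", 1, "c"))        # b'api.requests:1|c'
print(statsd_line("api.latency", 12.5, "ms"))     # b'api.latency:12.5|ms'
print(statsd_line("queue.depth", 42, "g"))        # b'queue.depth:42|g'
```

Since UDP is connectionless, a missing or slow collector costs the application nothing, which makes it cheap to measure everything.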

DataSift Architecture
Some Architecture Pr0n


DataSift Architecture

SOA - loosely coupled, independently scalable services. Simple APIs

[Diagram: DataSift architecture, with an "example" callout]

http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html

SOA - Scale Each Component


Our Stack

Languages: C++, PHP, Java, Scala, Ruby, Node.js
Storage: MySQL, HBase
Cache: Memcached, APC, Redis
Queues: ZeroMQ, Kafka, Redis
Development/Deployment: Git, Jenkins CI, RPM, Chef
Monitoring: StatsD + Graphite, Zenoss

Secret recipe: amazing people and working environment

Messaging

ZeroMQ: PUSH-PULL, REQ-REP, PUB-SUB (multicast, broadcast)
Internal communication: pass messages to the next processing stage, control events, monitoring

Kafka/Redis: PUSH-PULL with persistence
Internal message / workload buffering and distribution

Node.js: WebSockets / HTTP Streaming
Message delivery (output)

0mq PUSH-PULL (workload distribution)

[Diagram: work is pushed to Consumer 1, Consumer 2 and Consumer 3, round-robin-ish.]
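The round-robin distribution in the PUSH-PULL pattern can be simulated with the standard library alone. This is not ZeroMQ code: it only shows how a stream of work items is dealt out to consumers in turn, and the consumer names and job numbers are made up. A real PUSH socket also skips peers that are not ready, which is presumably where the "-ish" comes from.

```python
from collections import deque
from itertools import cycle
from typing import Iterable


def distribute(messages: Iterable[int], consumers: list[str]) -> dict[str, deque]:
    """Hand each message to the next consumer in turn (strict round-robin)."""
    inboxes: dict[str, deque] = {name: deque() for name in consumers}
    for message, name in zip(messages, cycle(consumers)):
        inboxes[name].append(message)
    return inboxes


inboxes = distribute(range(7), ["consumer-1", "consumer-2", "consumer-3"])
for name, queue in inboxes.items():
    print(name, list(queue))
# consumer-1 [0, 3, 6]
# consumer-2 [1, 4]
# consumer-3 [2, 5]
```

Adding a consumer to the list spreads the same workload more thinly, which is how this pattern scales a processing stage horizontally.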

0mq PUB-SUB (High Availability)

[Diagram: Publisher 1 and Publisher 2 broadcast to Listener 1, Listener 2 and Listener 3, with dynamic subscriptions.]

0mq PUB-SUB (High Availability)

[Diagram: Publisher 1 and Publisher 2 across two data centres, DC 1 and DC 2.]

Internal “Firehose”

[Diagram: publishers X, Y and Z feed a data bus; subscribers (Alice’s timeline, John’s Inbox, System Monitor, Fred’s Followers, Tech Blog Feed) subscribe to topics such as X or Y.]
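The data bus in this diagram can be sketched as a topic-based publish/subscribe registry. This is an in-process illustration under assumptions, not DataSift's implementation: the `DataBus` class is hypothetical, and which subscriber listens to which topic is invented, since the diagram text does not say.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[str], None]


class DataBus:
    """In-process topic bus: publishers and subscribers only share topic names."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def unsubscribe(self, topic: str, handler: Handler) -> None:
        if handler in self._handlers.get(topic, []):
            self._handlers[topic].remove(handler)

    def publish(self, topic: str, message: str) -> int:
        """Deliver to every current subscriber; return how many received it."""
        handlers = list(self._handlers.get(topic, []))
        for handler in handlers:
            handler(message)
        return len(handlers)


bus = DataBus()
alice_timeline: list[str] = []
john_inbox: list[str] = []
bus.subscribe("X", alice_timeline.append)
bus.subscribe("X", john_inbox.append)
bus.subscribe("Y", john_inbox.append)

bus.publish("X", "new post on X")
bus.publish("Y", "new post on Y")
bus.publish("Z", "nobody is listening")
print(alice_timeline)  # ['new post on X']
print(john_inbox)      # ['new post on X', 'new post on Y']
```

Publishers never learn who consumes their topics, so new consumers such as a system monitor can be attached without touching the producing services.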

Instrumentation

https://play.google.com/store/apps/details?id=net.networksaremadeofstring.rhybudd

We’re Hiring!

http://datasift.com/whoweare/jobs

References

M. L. Abbott, M. T. Fisher, “The Art Of Scalability”, Addison Wesley
http://theartofscalability.com/

http://www.slideshare.net/quipo/the-art-of-scalability-managing-growth
http://www.slideshare.net/postwait/scalable-internet-architecture
http://bit.ly/IJKwuc
http://agile.dzone.com/news/approaches-organizational
https://bitly.com/vCSd49

Lorenzo Alberton
@lorenzoalberton

Thank you!
lorenzo@alberton.info
http://www.alberton.info/talks

Questions?


Scaling Teams, Processes and Architectures

1.
Lorenzo Alberton @lorenzoalberton Scaling Teams, Processes and Architectures Managing growth London Scalability Group, Innovation Warehouse, 16th April 2012 1
2.
Scalability Is About... People Processes Technology 2
3.
People Stafﬁng, Roles, Management,Teams 3
4.
Stafﬁng Never compromise. Only hire people smarter than you. http://www.earthrangers.com/content/wildwire/toxic_spill.jpg 4
5.
Stafﬁng Hire people who can ﬁt the company culture. Promote fun in your working environment. http://www.earthrangers.com/content/wildwire/toxic_spill.jpg 4
6.
Stafﬁng Beware of toxic people http://www.earthrangers.com/content/wildwire/toxic_spill.jpg 4
7.
Team Size andStructure Micromanaging managers Poor communication too small Overworked team members Low morale too big Can’t accomplish much Low productivity 5
8.
Team Size andStructure Micromanaging managers Poor communication too small Overworked team members Low morale too big Can’t accomplish much Low productivity CTO functional PM PM PM Designer Developer Tester Designer Developer Tester Designer Developer Tester Designer Developer Tester Designers Developers Testers 5
9.
Team Size andStructure Micromanaging managers Poor communication too small Overworked team members Low morale too big Can’t accomplish much Low productivity CTO functional matrix PM PM PM Proj 1 PM Designer Developer Tester Proj 2 PM Designer Developer Tester Proj 3 PM Designer Developer Tester Proj 4 PM Designer Developer Tester Designers Developers Testers 5
10.
Processes 6
11.
Why are processescritical? Improve management of teams and employees Standardise actions in repetitive tasks Reduce mundane decisions to focus on grander ideas Allow the team to react quickly to crisis Determine system capacity and scalability needs 7
12.
Why are processescritical? Improve management of teams and employees Standardise actions in repetitive tasks Reduce mundane decisions to focus on grander ideas Allow the team to react quickly to crisis Determine system capacity and scalability needs Challenge 7
13.
Why are processescritical? Improve management of teams and employees Standardise actions in repetitive tasks Reduce mundane decisions to focus on grander ideas Allow the team to react quickly to crisis Determine system capacity and scalability needs Challenge right amount 7
14.
Why are processescritical? Improve management of teams and employees Standardise actions in repetitive tasks Reduce mundane decisions to focus on grander ideas Allow the team to react quickly to crisis Determine system capacity and scalability needs Challenge right amount right process 7
15.
Why are processescritical? Improve management of teams and employees Standardise actions in repetitive tasks Reduce mundane decisions to focus on grander ideas Allow the team to react quickly to crisis Determine system capacity and scalability needs Challenge right amount right process right time 7
16.
Determining Headroom Capacity Current Load 8
17.
Determining Headroom Why? Capacity Planning annual budget Hiring plan Current Load Prioritisation 8
18.
Controlling Change: DetermineRisk http://dilbert.com/strips/comic/2008-05-08/ 9
19.
Controlling Change: DetermineRisk http://dilbert.com/strips/comic/2008-05-08/ 9
20.
Risk Management Risk is cumulative Determine limits and tolerance 10
21.
Load / StressTesting Load testing - identify, document and eliminate bottlenecks through a strict controlled process of measurement and analysis - measure system’s response and stability - verify the app can meet the desired performance objectives (SLA) Stress testing - determine the app’s stability when subjected to above-normal loads - verify the app’s behaviour when close to the breaking point - test the application recoverability (negative testing) 11
22.
Barrier Conditions Code reviews Manual and automated QA processes Performance and stress testing Release documentation checks (runbook) Dev, Test, Stage and Live environments Instrumentation checks Protection from signiﬁcant failures 12
23.
Technology Architecting Scalable Solutions 13
24.
Architectural Principles 14
25.
Architectural Principles +1 N + 1 design 14
26.
Architectural Principles +1 N + 1 design for rollback 14
27.
Architectural Principles +1 N + 1 design for rollback to be disabled 14
28.
Architectural Principles +1 N + 1 design for rollback to be disabled to be monitored 14
29.
Architectural Principles +1 N + 1 design for rollback to be disabled to be for multiple monitored live sites 14
30.
Architectural Principles +1 N + 1 design for rollback to be disabled to be for multiple use mature monitored live sites technology 14
31.
Architectural Principles +1 N + 1 design for rollback to be disabled to be for multiple use mature monitored live sites technology asynchronous design 14
32.
Architectural Principles +1 N + 1 design for rollback to be disabled to be for multiple use mature monitored live sites technology asynchronous stateless design systems 14
33.
Architectural Principles +1 N + 1 design for rollback to be disabled to be for multiple use mature monitored live sites technology asynchronous stateless buy when design systems non core 14
34.
Stateless, Asynchronous Systems http://upload.wikimedia.org/wikipedia/commons/4/46/Synchronized_swimming_-_Russian_team.jpg 15
35.
Fault Isolative Structures 16
36.
Fault Isolative Structures Increase availability Limit impact of failures Easier debugging 16
37.
Fault Isolative Structures Increase availability Limit impact of failures Easier debugging First 16
38.
Fault Isolative Structures Increase availability Limit impact of failures Easier debugging Functions causing repetitive problems First 16
39.
Fault Isolative Structures Increase availability Limit impact of failures Easier debugging Functions Natural layout causing or topology repetitive of the site problems First 16
40.
Caching for Performanceand Scale 17
41.
Caching for Performanceand Scale Object Caches Usually serialized (marshalling / unmarshalling) get() / set() / replace() APC, Memcached 17
42.
Caching for Performanceand Scale Object Caches Application Caches Usually serialized Proxy caches (marshalling / Reverse proxy unmarshalling) caches get() / set() / HTTP headers replace() ISP/Uni proxies APC, Memcached Squid, Varnish, mod_cache 17
43.
Caching for Performanceand Scale Object Caches Application Caches CDNs Usually serialized Proxy caches Multiple locations (marshalling / / backbones Reverse proxy unmarshalling) caches get() / set() / HTTP headers CNAME entries replace() ISP/Uni proxies Akamai, Coral, APC, Memcached Squid, Varnish, Limelight... mod_cache 17
44.
Managing “Big Data” storage costs people and software power and space processing power backup time and costs 18
45.
Managing “Big Data” The more storage ...the more storage management storage costs people and software power and space processing power backup time and costs 18
46.
Managing “Big Data” The more storage ...the more storage management storage costs people and software power and space processing power backup time and costs Evaluate data retention policy Consider multi-tiered storage Distribute data/ work (Hadoop, M/R) 18
47.
Monitoring: Measure Everything 19
48.
Monitoring: Measure Everything 1. Is there a problem? User experience / Business metrics monitors 2. Where is the problem? System monitors (threshold - variance) 3. What is the problem? Application monitors 19
49.
Monitoring: Measure Everything 1. Is there a problem? User experience / Business metrics monitors 2. Where is the problem? System monitors (threshold - variance) 3. What is the problem? Application monitors Keep Signal vs. Noise ratio high 19
50.
Monitoring: Measure Everything StatsD 1. Is there a problem? User experience / Business metrics monitors 2. Where is the problem? System monitors (threshold - variance) 3. What is the problem? Application monitors Keep Signal vs. Noise ratio high 19
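To make "measure everything" concrete, the sketch below formats and sends metrics using the plain-text StatsD line format (name:value|type, with an optional |@rate sample-rate suffix) over UDP. The metric names, host and helper function names are assumptions for illustration; 8125 is the conventional StatsD port.

```python
import socket


def format_metric(name: str, value: float, metric_type: str,
                  sample_rate: float = 1.0) -> str:
    """Build one StatsD line, e.g. 'api.requests:1|c' (c = counter, ms = timer, g = gauge)."""
    line = f"{name}:{value}|{metric_type}"
    if sample_rate < 1.0:
        # Tell the server that only a fraction of events is being reported.
        line += f"|@{sample_rate}"
    return line


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget over UDP, so instrumentation never blocks the application."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    # Hypothetical metric names, for illustration only.
    send_metric(format_metric("api.requests", 1, "c"))
    send_metric(format_metric("api.latency", 320, "ms"))
```

Because the transport is UDP, a missing or overloaded collector costs the application nothing, which is what makes it cheap to instrument every code path.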
51.
DataSift Architecture Some Architecture Pr0n 20
52.
DataSift Architecture http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html 21
53.
DataSift Architecture SOA - loosely coupled, independently scalable services. Simple APIs http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html 21
54.
DataSift Architecture SOA - loosely coupled, independently scalable services. Simple APIs example http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html 21
55.
SOA - Scale Each Component 22
56.
Our Stack Languages: C++, PHP, Java, Scala, Ruby, Node.JS Storage: MySQL, HBase Cache: Memcached, APC, Redis Queues: ZeroMQ, Kafka, Redis Development/Deployment: GIT, Jenkins CI, RPM, Chef Monitoring: StatsD + Graphite, Zenoss 23
57.
Our Stack Languages: C++, PHP, Java, Scala, Ruby, Node.JS Storage: MySQL, HBase Cache: Memcached, APC, Redis Queues: ZeroMQ, Kafka, Redis Development/Deployment: GIT, Jenkins CI, RPM, Chef Monitoring: StatsD + Graphite, Zenoss Secret recipe: amazing people and working environment 23
58.
Messaging ZeroMQ: PUSH-PULL, REQ-REP, PUB-SUB (multicast, broadcast). Internal communication: pass messages to the next processing stage, control events, monitoring. Kafka/Redis: PUSH-PULL with persistence. Internal message / workload buffering and distribution. Node.js: WebSockets / HTTP Streaming. Message delivery (output) 24
59.
0mq PUSH-PULL (workload distribution) Consumer 1 Consumer 2 Consumer 3 [Round-Robin-ish] 25
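The round-robin distribution on this slide can be pictured with a small pure-Python simulation. This is not ZeroMQ code: a real PUSH socket also skips peers that are disconnected or whose queues are full (hence "round-robin-ish"), whereas this sketch assumes every consumer is always available. The function and consumer names are illustrative.

```python
from itertools import cycle
from typing import Dict, List


def distribute(tasks: List[str], consumers: List[str]) -> Dict[str, List[str]]:
    """Hand each task to the next consumer in turn, wrapping around at the end."""
    assignment: Dict[str, List[str]] = {consumer: [] for consumer in consumers}
    for task, consumer in zip(tasks, cycle(consumers)):
        assignment[consumer].append(task)
    return assignment


if __name__ == "__main__":
    work = distribute(["t1", "t2", "t3", "t4", "t5"],
                      ["consumer-1", "consumer-2", "consumer-3"])
    for consumer, assigned in work.items():
        print(consumer, assigned)
```

Adding a consumer to the list spreads the same workload over more workers without any change on the producing side, which is the property that makes this pattern useful for scaling a processing stage.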
60.
0mq PUB-SUB (High Availability) Listener 1 Publisher 1 Listener 2 Publisher 2 Listener 3 [Broadcast] [Dynamic Subscriptions] 26
61.
0mq PUB-SUB (High Availability) DC 1 Publisher 1 Publisher 2 DC 2 27
62.
Internal “Firehose” [Diagram: publishers and subscribers connected through a Data Bus. Topics: X, Y, Z. Labels: Alice’s timeline, John’s Inbox, System Monitor, Fred’s Followers, Tech Blog Feed; "subscribe to topic X", "subscribe to topic Y"] 28
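A toy version of such a topic-based data bus is sketched below. It is an in-process illustration, not the DataSift implementation: the DataBus class and the handler names are assumptions, and it matches topics exactly, whereas ZeroMQ SUB sockets filter by message prefix.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class DataBus:
    """Hypothetical in-process bus: publishers and subscribers share only topic names."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        # Subscriptions can be added at any time (dynamic subscriptions).
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        """Deliver the message to every subscriber of the topic; return the delivery count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


if __name__ == "__main__":
    bus = DataBus()
    timeline: List[str] = []
    inbox: List[str] = []
    bus.subscribe("X", timeline.append)  # e.g. a timeline following topic X
    bus.subscribe("Y", inbox.append)     # e.g. an inbox following topic Y
    bus.publish("X", "new item on X")
    bus.publish("Z", "nobody is listening to Z")
    print(timeline, inbox)
```

Publishers never learn who is listening, so new consumers (a monitor, a feed, an inbox) can be attached without touching the producing services.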
63.
Instrumentation https://play.google.com/store/apps/details?id=net.networksaremadeofstring.rhybudd 29
64.
We’re Hiring! http://datasift.com/whoweare/jobs 30
65.
References M. L. Abbott, M. T. Fisher, “The Art Of Scalability”, Addison Wesley http://theartofscalability.com/ http://www.slideshare.net/quipo/the-art-of-scalability-managing-growth http://www.slideshare.net/postwait/scalable-internet-architecture http://bit.ly/IJKwuc http://agile.dzone.com/news/approaches-organizational https://bitly.com/vCSd49 31
66.
Lorenzo Alberton @lorenzoalberton Thank you! lorenzo@alberton.info http://www.alberton.info/talks Questions? 32

Editor's Notes

#3 Let’s start by focusing on the true foundation: people and process, without which true scalability cannot be built. People are the most important element of scalability, as without people there are no processes and no technology.
#5 - #12 Leadership is the influencing of an organisation to accomplish a specific objective (it comes down to personal characteristics + skills + experience + actions).
Look for solid software engineers with a good understanding of CS topics, and exceptional devops. Create a fun working environment. We solve serious, challenging problems. We also want to have fun. Avoid rockstars. "Hard work beats talent when talent doesn't work hard." - Tim Notke
Focus and dedication.
#13 - #20 Overlapping responsibilities create wasted effort and value-destroying conflicts.
Key scale-related responsibilities for any organisation include:
- setting measurable goals;
- staffing the team with the appropriate skills;
- defining and implementing a scalable architecture.
#21 - #46 There’s a link between organisational structure and scalability: structure has a big impact on personal productivity.
The goal is to minimise the friction caused by organisational or team boundaries without limiting throughput, while making innovation and the flow of work easy.
The team can be organised into 2 structures:
- functional (employees divided by their primary function; benefits: homogeneity, simplicity of responsibilities, adherence to standards; drawbacks: no single project owner, poor cross-functional communication);
- matrix (similar, but with a second dimension that includes a new management structure; benefits: better communication, project ownership; drawbacks: multiple bosses, distraction from a person’s primary discipline).
#48 - #55 Processes serve 3 general purposes:
- they augment the management of our teams and employees;
- they standardise employees’ actions while performing repetitive tasks;
- they free employees up from daily mundane decisions to concentrate on grander ideas.
#56-#68 To navigate your way out of the woods, you need to know the point from which you are starting.
Headroom = the amount of free capacity that exists within your system before you start having problems such as degraded performance or an outage. Because your app is a system made of many different components, such as a db, a firewall and application servers, to truly understand headroom you first need to understand the headroom of each of these.
1) Identify major components. 2) Identify the responsible team. 3) Determine usage and capacity. 4) Determine growth rate. Work together for a better analysis and to find the best solution.
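The four steps above can be sketched as a small calculation. This is only an illustration under stated assumptions: the component names and figures are hypothetical, growth is assumed to compound monthly, and `ideal_usage` (the share of raw capacity you are willing to use before expecting problems) is a parameter introduced here, not something the slides define.

```python
import math


def months_of_headroom(capacity, current_load, monthly_growth, ideal_usage=0.5):
    """Months until load reaches the usable share of capacity.

    Assumes compound growth: load(t) = current_load * (1 + monthly_growth) ** t.
    """
    usable = capacity * ideal_usage
    if current_load >= usable:
        return 0.0  # no headroom left
    if monthly_growth <= 0:
        return math.inf  # load is flat or shrinking
    return math.log(usable / current_load) / math.log(1 + monthly_growth)


def system_headroom(components, monthly_growth, ideal_usage=0.5):
    """Return the bottleneck component and the headroom of every component."""
    per_component = {
        name: months_of_headroom(capacity, load, monthly_growth, ideal_usage)
        for name, (capacity, load) in components.items()
    }
    bottleneck = min(per_component, key=per_component.get)
    return bottleneck, per_component


# Hypothetical figures: (capacity, current load) in requests per second.
COMPONENTS = {
    "db": (2000, 600),
    "firewall": (10000, 1200),
    "app_servers": (4000, 900),
}

if __name__ == "__main__":
    bottleneck, detail = system_headroom(COMPONENTS, monthly_growth=0.10)
    for name, months in detail.items():
        print(f"{name}: {months:.1f} months")
    print("bottleneck:", bottleneck)
```

The system as a whole has only as much headroom as its tightest component, which is why the sketch reports the minimum rather than an average.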
#69 The intent of change management is to limit the impact of changes by controlling their release into the production environment and logging them as they are introduced. There are several ways to assess the risk of a change.
Gut feeling / finger in the air: relies on experience and innate ability; fast, but not accurate.
Semaphore: assign a risk level of green/yellow/red to each small component, then assign an overall colour; methodical, repeatable, documentable, accurate, and no longer reliant on a single person.
Better: Failure Mode and Effect Analysis.
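As a rough illustration of Failure Mode and Effect Analysis, the sketch below scores each failure mode with the conventional risk priority number (severity × occurrence × detection, each rated 1 to 10) and maps the worst score back to a semaphore colour. The thresholds and the example failure modes are assumptions chosen for illustration, not values from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    description: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (unlikely) .. 10 (almost certain)
    detection: int   # 1 (certain to be detected) .. 10 (cannot be detected)

    def __post_init__(self):
        for score in (self.severity, self.occurrence, self.detection):
            if not 1 <= score <= 10:
                raise ValueError("scores must be between 1 and 10")

    @property
    def rpn(self):
        """Risk priority number: higher means riskier."""
        return self.severity * self.occurrence * self.detection


def colour(rpn, yellow_from=50, red_from=150):
    """Map a risk priority number to a colour (thresholds are hypothetical)."""
    if rpn >= red_from:
        return "red"
    if rpn >= yellow_from:
        return "yellow"
    return "green"


def assess_change(failure_modes):
    """Overall colour of a change is the colour of its riskiest failure mode."""
    worst = max(failure_modes, key=lambda mode: mode.rpn)
    return colour(worst.rpn), worst


if __name__ == "__main__":
    modes = [
        FailureMode("schema migration locks a hot table", 8, 3, 4),
        FailureMode("typo on a static page", 2, 2, 1),
    ]
    overall, worst = assess_change(modes)
    print(overall, worst.description, worst.rpn)
```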
#70 Risk is cumulative. You might want to establish limits on the amount of risk that you are willing to allow at a particular time of day or level of customer volume.
Also consider the human factor, i.e. the level of risk that a person can tolerate within a certain time frame.
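One way to make cumulative risk explicit is a per-window risk budget: every change carries a risk score, and a change is refused once the scores already scheduled in that window would exceed the limit. The window names, limits and scores below are hypothetical.

```python
class RiskBudget:
    """Tracks the cumulative risk scheduled in each time window."""

    def __init__(self, limits):
        # limits: mapping of window name -> maximum cumulative risk allowed
        self._limits = dict(limits)
        self._used = {window: 0 for window in self._limits}
        self.scheduled = []

    def remaining(self, window):
        return self._limits[window] - self._used[window]

    def try_schedule(self, window, change, risk):
        """Schedule the change if the window can still absorb its risk."""
        if risk > self.remaining(window):
            return False
        self._used[window] += risk
        self.scheduled.append((window, change, risk))
        return True


if __name__ == "__main__":
    # Hypothetical policy: tolerate far less risk while customer volume is high.
    budget = RiskBudget({"peak": 100, "off_peak": 400})
    print(budget.try_schedule("peak", "new checkout flow", 60))      # accepted
    print(budget.try_schedule("peak", "cache layer upgrade", 60))    # refused
    print(budget.try_schedule("off_peak", "cache layer upgrade", 60))
```

The same structure could hold a per-person budget to model the human factor, with one window per engineer and shift.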
#71 The purpose of load testing is to identify, document and eliminate bottlenecks in the system through a strictly controlled process of measurement and analysis. Load testing is the process of putting load or user demand on a system to measure its response and stability, and to verify that the app can meet the desired performance objectives (SLA: service level agreement).
- Establish success criteria (concurrent usage, response time, ...)
- Establish the test environment (as close as possible to the production environment)
- Define the tests (Pareto rule 20% - 80%) to cover different things (endurance, most used, most visible, different components)
- Identify what needs to be monitored / what data needs to be collected
- Run, analyse, report to engineers
- Repeat tests and analysis
Stress testing is a process used to determine an application’s stability when subjected to above-normal loads, and to verify its behaviour when close to the breaking point.
Positive testing is where the load is progressively increased to overwhelm the system’s resources.
Negative testing takes away resources such as memory, threads or connections, to test the application’s recoverability.
The way to ensure that the headroom calculations remain accurate is to conduct performance testing on all your releases, so that you are not introducing unexpected load increases.
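A minimal load-test harness along these lines might look as follows. It is a sketch using only the standard library: `request_fn` stands in for whatever issues one request against the system under test, and the success criteria (95th-percentile latency and error rate) are example choices, not thresholds taken from the slides.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles


def run_load_test(request_fn, concurrency, total_requests):
    """Call request_fn total_requests times from `concurrency` worker threads."""

    def timed_call(_):
        start = time.perf_counter()
        try:
            request_fn()
            ok = True
        except Exception:
            ok = False  # count the failure, keep the test running
        return time.perf_counter() - start, ok

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    latencies = sorted(elapsed for elapsed, _ in results)
    errors = sum(1 for _, ok in results if not ok)
    return {
        "requests": total_requests,
        "errors": errors,
        # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
        "p95_s": quantiles(latencies, n=100)[94],
        "max_s": latencies[-1],
    }


def meets_criteria(report, max_p95_s, max_error_rate):
    """Compare a report with the success criteria established up front."""
    error_rate = report["errors"] / report["requests"]
    return report["p95_s"] <= max_p95_s and error_rate <= max_error_rate


if __name__ == "__main__":
    # Stand-in for a real request: pretend every call takes about 2 ms.
    report = run_load_test(lambda: time.sleep(0.002), concurrency=8, total_requests=200)
    print(report, meets_criteria(report, max_p95_s=0.05, max_error_rate=0.01))
```

Raising `concurrency` step by step until `meets_criteria` fails turns the same harness into a simple positive stress test.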
#72 Good processes for promoting systems into the production environment can protect you from significant failures. Effective barrier conditions, coupled with a process and the capability to roll back production changes, are necessary components of any highly available service and are critical to the success of your scalability goals.
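The combination of barrier conditions and rollback can be sketched as a promotion function. Everything here is hypothetical: `barriers` is a list of named checks a release must pass before promotion, `deploy` puts a given release into production, and `healthy` is a post-release check; none of these names come from the slides.

```python
def promote(release, previous, barriers, deploy, healthy):
    """Promote `release` only if every barrier passes; roll back if unhealthy.

    Returns (release now in production, outcome message).
    """
    failed = [name for name, check in barriers if not check(release)]
    if failed:
        # Barrier conditions stop the release before it reaches production.
        return previous, "blocked by: " + ", ".join(failed)

    deploy(release)
    if not healthy():
        deploy(previous)  # roll back to the last known good release
        return previous, "rolled back"
    return release, "promoted"


if __name__ == "__main__":
    deployed = []
    barriers = [
        ("code review", lambda rel: True),
        ("load test", lambda rel: rel != "v2-slow"),
    ]
    print(promote("v2-slow", "v1", barriers, deployed.append, lambda: True))
    print(promote("v2", "v1", barriers, deployed.append, lambda: False))
    print(promote("v3", "v1", barriers, deployed.append, lambda: True))
    print(deployed)
```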
#73 \n
#74-#91
- N+1 design (ensure that everything you develop has at least one additional instance of that system in the event of failure)
- Designing the capability to roll back into an app helps limit the scalability impact of any given release.
- Designing to disable features adds the flexibility of keeping the most recent release in production while limiting / containing the impact of offending features or functionality.
- Design to be monitored: you want your system to identify when it’s performing differently from how it normally operates, in addition to telling you when it’s not functioning properly.
- Design for multiple live sites: it usually costs less than operating a hot site plus a cold disaster recovery site.
- Use mature technology: early adopters risk a lot in finding the bugs; availability and reliability are important.
- Asynchronous design: asynchronous systems tend to be more tolerant of extreme load.
- Stateless systems (if necessary, store state with the end users)
- Buy when non-core
- Scale out, not up (with commodity hardware; horizontal split in terms of data, transactions and customers).
- Design for any technology, not for a specific product/vendor
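Two of the principles above, N+1 design and designing to disable features, lend themselves to a short sketch. The sizing rule (enough instances for peak load, plus one spare) is one straightforward reading of N+1; the `FeatureSwitches` class and the feature names are hypothetical.

```python
import math


def instances_needed(peak_load, capacity_per_instance):
    """N+1 sizing: enough instances to carry peak load, plus one spare."""
    return math.ceil(peak_load / capacity_per_instance) + 1


class FeatureSwitches:
    """Runtime switches so an offending feature can be disabled without a rollback."""

    def __init__(self, enabled):
        self._enabled = dict(enabled)

    def disable(self, name):
        self._enabled[name] = False

    def run(self, name, feature, fallback):
        # Unknown features are treated as disabled.
        if self._enabled.get(name, False):
            return feature()
        return fallback()


if __name__ == "__main__":
    print(instances_needed(peak_load=1000, capacity_per_instance=300))

    switches = FeatureSwitches({"recommendations": True})
    show = lambda: switches.run(
        "recommendations",
        feature=lambda: ["personalised item"],
        fallback=lambda: [],  # the page still renders, just without the widget
    )
    print(show())
    switches.disable("recommendations")
    print(show())
```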
#92 Synchronous calls, if used excessively or incorrectly, place an undue burden on the system and prevent it from scaling.
Systems designed to interact synchronously have a higher failure rate than asynchronous ones, and their ability to scale is tied to the slowest system in the chain of communications. It's better to use callbacks, with timeouts so that callers can recover gracefully should they not receive responses in a timely fashion.
Synchronisation is when two or more pieces of work must happen in a specific order to accomplish a task. Asynchronous coordination between the original method and the invoked method requires a mechanism by which the original method determines when, or if, the called method has completed executing (callbacks).
A related problem is stateful versus stateless applications. An application that uses state relies on the current condition of execution as a determinant of the next action to be performed.
There are 3 basic approaches to solving the complexities of scaling an application that uses session data: 1) Avoidance: use no sessions, or use sticky sessions, so that session data never has to be replicated (share-nothing architecture); 2) Decentralisation: store session data in the browser's cookie, or in a db whose key is referenced by a hash in the cookie; 3) Centralisation: store session data in a central db / memcached.
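The advice on timeouts and graceful recovery can be illustrated with a minimal Python sketch. It is not taken from the slides: `slow_service`, `call_with_timeout`, the delays and the "cached result" fallback are all hypothetical names and values chosen for illustration, and `await` stands in for an explicit callback.

```python
import asyncio


async def slow_service(delay: float) -> str:
    """Hypothetical downstream call that takes `delay` seconds to respond."""
    await asyncio.sleep(delay)
    return "fresh result"


async def call_with_timeout(delay: float, timeout: float, fallback: str) -> str:
    """Wait for the downstream call, but give up after `timeout` seconds."""
    try:
        return await asyncio.wait_for(slow_service(delay), timeout=timeout)
    except asyncio.TimeoutError:
        # Recover gracefully instead of stalling the whole chain of calls.
        return fallback


async def main() -> list:
    # Both calls run concurrently; the slow one does not hold up the fast one.
    return await asyncio.gather(
        call_with_timeout(delay=0.01, timeout=0.5, fallback="cached result"),
        call_with_timeout(delay=1.0, timeout=0.05, fallback="cached result"),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The first call answers in time and returns the fresh value; the second exceeds its timeout and falls back, so the caller's latency is bounded by the timeout rather than by the slowest system in the chain.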
#93 You must be able to isolate and limit the effects of failures within any system by segmenting its components. Decouple, decouple, decouple! A swim lane (also known as a shard) represents both a barrier and a guide: it ensures that swimmers don't interfere with each other, and it helps guide each swimmer toward their objective with minimal effort.
Swim lanes increase availability by limiting the impact of failures to a subset of functionality, and they make incidents easier to detect, identify and resolve. The fewer things are shared between lanes, the more isolative and beneficial the swim lane becomes to both scalability and availability. Swim lanes should not have lines of communication crossing lane boundaries, and should always move in the direction of the communication.
When designing swim lanes, always address the transactions making the company money first (e.g. Search & Browse vs Shopping Cart), then move functions causing repetitive problems into swim lanes. Finally, consider the natural layout or topology of the site for opportunities to create swim lanes, e.g. customer boundaries within an app / environment: if you have a tenant who is very busy, assign it a swim lane of its own; other tenants with low utilisation can all be put into another swim lane.
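The tenant example above can be sketched in a few lines of Python. This is an illustrative toy, not the author's design: `SwimLane`, `LaneRouter` and the tenant and lane names are hypothetical, and a real lane would be a separate set of servers and data stores rather than an in-memory object.

```python
from dataclasses import dataclass, field


@dataclass
class SwimLane:
    """A self-contained slice of the system; it shares nothing with other lanes."""
    name: str
    healthy: bool = True
    handled: list = field(default_factory=list)

    def handle(self, tenant: str, request: str) -> str:
        if not self.healthy:
            raise RuntimeError(f"lane {self.name} is down")
        self.handled.append((tenant, request))
        return f"{self.name} served {tenant}:{request}"


class LaneRouter:
    """Sends busy tenants to a dedicated lane and everyone else to a shared one."""

    def __init__(self, dedicated: dict, shared: SwimLane):
        self.dedicated = dedicated
        self.shared = shared

    def lane_for(self, tenant: str) -> SwimLane:
        return self.dedicated.get(tenant, self.shared)

    def route(self, tenant: str, request: str) -> str:
        # A request is served entirely inside one lane; it never crosses over.
        return self.lane_for(tenant).handle(tenant, request)


if __name__ == "__main__":
    router = LaneRouter({"big-tenant": SwimLane("lane-a")}, SwimLane("lane-shared"))
    router.dedicated["big-tenant"].healthy = False  # simulate a failure in one lane
    print(router.route("small-tenant", "search"))   # the shared lane is unaffected
```

Because the lanes share nothing, the simulated failure of the busy tenant's lane affects only that tenant, which is the fault-isolation property the notes describe.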
#98 What is the best way to handle large volumes of traffic? Answer: "Establish the right organisation, implement the right processes and follow the right architectural principles". Correct, but the best way is not to have to handle it at all. The key to achieving this is pervasive use of caching. The cache hit ratio is the measure of a cache's effectiveness. The cache can be updated/refreshed via a batch job or on a cache miss. When the cache is full, an eviction algorithm (LRU, MRU, ...) decides which entry to evict. When the data changes, the cache can be updated through a write-back or write-through policy. There are 3 cache types:
- Object caches: used to store objects for the app to reuse, usually as serialized objects. The app must be aware of them. They form a layer in front of the db / external services. Marshalling is the process of transforming an object into a data format suitable for transmitting or storing.
- Application caches: A) Proxy caches, usually implemented by ISPs, universities or corporations; they cache for a limited number of users and for an unlimited number of sites. B) Reverse proxy caches (the opposite): they cache for an unlimited number of users and for a limited number of applications; the configuration of the specific app determines what can be cached. HTTP headers give much control over caching (Last-Modified, ETag, Cache-Control).
- Content Delivery Networks: they speed up response time, offload requests from your application's origin server, and usually lower costs. The total capacity of the CDN's strategically placed servers can yield a higher capacity and availability than the network backbone. It works by pointing your hostname at the CDN's domain name with a canonical name (CNAME) record in your DNS.
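The mechanics above (refresh on a cache miss, LRU eviction when full, and the hit ratio) can be sketched as a small object cache in Python. This is a minimal illustration under assumed names: `LRUCache`, `loader` and the capacity of 2 are hypothetical, and a production object cache such as memcached would live in a separate process.

```python
from collections import OrderedDict
from typing import Callable


class LRUCache:
    """Object cache that fills itself on a miss and evicts the least recently used entry."""

    def __init__(self, capacity: int, loader: Callable[[str], str]):
        self.capacity = capacity
        self.loader = loader          # stands in for the db / external service
        self.entries = OrderedDict()  # oldest entry first, most recent last
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)     # mark as most recently used
            return self.entries[key]
        self.misses += 1
        value = self.loader(key)              # refresh on a cache miss
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return value

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


if __name__ == "__main__":
    cache = LRUCache(capacity=2, loader=lambda key: f"value-for-{key}")
    for key in ["a", "b", "a", "c", "b"]:
        cache.get(key)
    print(cache.hit_ratio())
```

In this run only the second read of "a" is a hit: loading "c" evicts "b", so the final read of "b" goes back to the loader. Tracking the ratio this way shows whether the cache is large enough for the access pattern.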
#105 Logging at scale is useless: too much noise. Instrumentation is essential.
You need to identify bottlenecks quickly or suffer prolonged and painful outages. The question "How come we didn't catch that earlier?" addresses the incident, not the problem. The alternative question "What in our process is flawed that allowed us to launch the service without the appropriate monitoring to catch such an issue?" addresses the people and the processes that allowed the event you just had, and every other event for which you didn't have appropriate monitoring.
Designing to be monitored is an approach wherein one builds monitoring into the application rather than around it. "How do we know when it's starting to behave poorly?" First, you need to answer the question "Is there a problem?" with user experience and business metrics monitors (lower click-through rate, shopping cart abandonment rate, ...). Then you need to identify where the problem is with system monitors. The weakness of system monitors is that they usually rely on threshold alerts, i.e. checking whether something is behaving outside of our expectations, rather than alerting when it's performing significantly differently than in the past. Finally, you need to identify what the problem is with application monitoring.
Not all monitoring data is valuable: too much of it only creates noise, while wasting time and resources. It's advisable to save only a summary of the reports over time, to keep costs down while still providing value. In the ideal world, incidents and crises are predicted and avoided by a robust monitoring solution.
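The difference between a fixed threshold alert and alerting on a significant change from past behaviour can be shown with a small Python sketch. The approach here (a deviation of more than three standard deviations from recent history) is one simple choice made for illustration, not something the slides prescribe; the function name, the metric values and the 0.02 threshold are hypothetical.

```python
from statistics import mean, stdev


def deviates_from_baseline(history: list, current: float, max_sigma: float = 3.0) -> bool:
    """Return True when `current` is far from what `history` suggests is normal."""
    if len(history) < 2:
        return False                 # not enough data to define "normal"
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current != mu         # any change from a perfectly flat history
    return abs(current - mu) / sigma > max_sigma


if __name__ == "__main__":
    # Hypothetical hourly checkout conversion rates for the past five hours.
    history = [0.050, 0.052, 0.049, 0.051, 0.050]
    current = 0.030

    fixed_threshold_alert = current < 0.02  # static rule: stays silent here
    baseline_alert = deviates_from_baseline(history, current)
    print(fixed_threshold_alert, baseline_alert)
```

With these numbers the static rule does not fire because 0.030 is still above the threshold, while the baseline comparison flags the drop because it is far outside the recent range.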
#106 Logging at scale is useless. Too much noise. Instrumentation is essential.\nYou need to identify bottlenecks quickly or suffer prolonged and painful outages. The question of "How come we didn't catch that earlier?" addresses the incident, not the problem. The alternative question "What in our process is flawed that allowed us to launch the service without the appropriate monitoring to catch such an issue?" addresses the people and the processes that allowed the event you just had and every other event for which you didn't have appropriate monitoring.\nDesigning to be monitored is an approach wherein one builds monitoring into the application rather than around it. "How do we know when it's starting to behave poorly?" First, you need to answer the question "Is there a problem?" with user experience and business metrics monitors (lower click-through rate, shopping cart abandonment rate, ...). Then you need to identify where the problem is with system monitors (the problem with this is that it's usually relying on threshold alerts - i.e. checking if something is behaving outside of our expectations - rather than alerting on when it's performing significantly differently than in the past). Finally you need to identify what is the problem thanks to application monitoring. \nNot all monitoring data is valuable, too much of it only creates noise, while wasting time and resources. It's advisable to only save a summary of the reports over time to keep costs down while still providing value. In the ideal world, incidents and crises are predicted and avoided by a robust monitoring solution.\n
#107 Logging at scale is useless. Too much noise. Instrumentation is essential.\nYou need to identify bottlenecks quickly or suffer prolonged and painful outages. The question of "How come we didn't catch that earlier?" addresses the incident, not the problem. The alternative question "What in our process is flawed that allowed us to launch the service without the appropriate monitoring to catch such an issue?" addresses the people and the processes that allowed the event you just had and every other event for which you didn't have appropriate monitoring.\nDesigning to be monitored is an approach wherein one builds monitoring into the application rather than around it. "How do we know when it's starting to behave poorly?" First, you need to answer the question "Is there a problem?" with user experience and business metrics monitors (lower click-through rate, shopping cart abandonment rate, ...). Then you need to identify where the problem is with system monitors (the problem with this is that it's usually relying on threshold alerts - i.e. checking if something is behaving outside of our expectations - rather than alerting on when it's performing significantly differently than in the past). Finally you need to identify what is the problem thanks to application monitoring. \nNot all monitoring data is valuable, too much of it only creates noise, while wasting time and resources. It's advisable to only save a summary of the reports over time to keep costs down while still providing value. In the ideal world, incidents and crises are predicted and avoided by a robust monitoring solution.\n
#108 Logging at scale is useless. Too much noise. Instrumentation is essential.\nYou need to identify bottlenecks quickly or suffer prolonged and painful outages. The question of "How come we didn't catch that earlier?" addresses the incident, not the problem. The alternative question "What in our process is flawed that allowed us to launch the service without the appropriate monitoring to catch such an issue?" addresses the people and the processes that allowed the event you just had and every other event for which you didn't have appropriate monitoring.\nDesigning to be monitored is an approach wherein one builds monitoring into the application rather than around it. "How do we know when it's starting to behave poorly?" First, you need to answer the question "Is there a problem?" with user experience and business metrics monitors (lower click-through rate, shopping cart abandonment rate, ...). Then you need to identify where the problem is with system monitors (the problem with this is that it's usually relying on threshold alerts - i.e. checking if something is behaving outside of our expectations - rather than alerting on when it's performing significantly differently than in the past). Finally you need to identify what is the problem thanks to application monitoring. \nNot all monitoring data is valuable, too much of it only creates noise, while wasting time and resources. It's advisable to only save a summary of the reports over time to keep costs down while still providing value. In the ideal world, incidents and crises are predicted and avoided by a robust monitoring solution.\n
#109 Logging at scale is useless. Too much noise. Instrumentation is essential.\nYou need to identify bottlenecks quickly or suffer prolonged and painful outages. The question of "How come we didn't catch that earlier?" addresses the incident, not the problem. The alternative question "What in our process is flawed that allowed us to launch the service without the appropriate monitoring to catch such an issue?" addresses the people and the processes that allowed the event you just had and every other event for which you didn't have appropriate monitoring.\nDesigning to be monitored is an approach wherein one builds monitoring into the application rather than around it. "How do we know when it's starting to behave poorly?" First, you need to answer the question "Is there a problem?" with user experience and business metrics monitors (lower click-through rate, shopping cart abandonment rate, ...). Then you need to identify where the problem is with system monitors (the problem with this is that it's usually relying on threshold alerts - i.e. checking if something is behaving outside of our expectations - rather than alerting on when it's performing significantly differently than in the past). Finally you need to identify what is the problem thanks to application monitoring. \nNot all monitoring data is valuable, too much of it only creates noise, while wasting time and resources. It's advisable to only save a summary of the reports over time to keep costs down while still providing value. In the ideal world, incidents and crises are predicted and avoided by a robust monitoring solution.\n
#110 Logging at scale is useless. Too much noise. Instrumentation is essential.\nYou need to identify bottlenecks quickly or suffer prolonged and painful outages. The question of "How come we didn't catch that earlier?" addresses the incident, not the problem. The alternative question "What in our process is flawed that allowed us to launch the service without the appropriate monitoring to catch such an issue?" addresses the people and the processes that allowed the event you just had and every other event for which you didn't have appropriate monitoring.\nDesigning to be monitored is an approach wherein one builds monitoring into the application rather than around it. "How do we know when it's starting to behave poorly?" First, you need to answer the question "Is there a problem?" with user experience and business metrics monitors (lower click-through rate, shopping cart abandonment rate, ...). Then you need to identify where the problem is with system monitors (the problem with this is that it's usually relying on threshold alerts - i.e. checking if something is behaving outside of our expectations - rather than alerting on when it's performing significantly differently than in the past). Finally you need to identify what is the problem thanks to application monitoring. \nNot all monitoring data is valuable, too much of it only creates noise, while wasting time and resources. It's advisable to only save a summary of the reports over time to keep costs down while still providing value. In the ideal world, incidents and crises are predicted and avoided by a robust monitoring solution.\n
#111 Logging at scale is useless. Too much noise. Instrumentation is essential.\nYou need to identify bottlenecks quickly or suffer prolonged and painful outages. The question of "How come we didn't catch that earlier?" addresses the incident, not the problem. The alternative question "What in our process is flawed that allowed us to launch the service without the appropriate monitoring to catch such an issue?" addresses the people and the processes that allowed the event you just had and every other event for which you didn't have appropriate monitoring.\nDesigning to be monitored is an approach wherein one builds monitoring into the application rather than around it. "How do we know when it's starting to behave poorly?" First, you need to answer the question "Is there a problem?" with user experience and business metrics monitors (lower click-through rate, shopping cart abandonment rate, ...). Then you need to identify where the problem is with system monitors (the problem with this is that it's usually relying on threshold alerts - i.e. checking if something is behaving outside of our expectations - rather than alerting on when it's performing significantly differently than in the past). Finally you need to identify what is the problem thanks to application monitoring. \nNot all monitoring data is valuable, too much of it only creates noise, while wasting time and resources. It's advisable to only save a summary of the reports over time to keep costs down while still providing value. In the ideal world, incidents and crises are predicted and avoided by a robust monitoring solution.\n
#112 Logging at scale is useless. Too much noise. Instrumentation is essential.\nYou need to identify bottlenecks quickly or suffer prolonged and painful outages. The question of "How come we didn't catch that earlier?" addresses the incident, not the problem. The alternative question "What in our process is flawed that allowed us to launch the service without the appropriate monitoring to catch such an issue?" addresses the people and the processes that allowed the event you just had and every other event for which you didn't have appropriate monitoring.\nDesigning to be monitored is an approach wherein one builds monitoring into the application rather than around it. "How do we know when it's starting to behave poorly?" First, you need to answer the question "Is there a problem?" with user experience and business metrics monitors (lower click-through rate, shopping cart abandonment rate, ...). Then you need to identify where the problem is with system monitors (the problem with this is that it's usually relying on threshold alerts - i.e. checking if something is behaving outside of our expectations - rather than alerting on when it's performing significantly differently than in the past). Finally you need to identify what is the problem thanks to application monitoring. \nNot all monitoring data is valuable, too much of it only creates noise, while wasting time and resources. It's advisable to only save a summary of the reports over time to keep costs down while still providing value. In the ideal world, incidents and crises are predicted and avoided by a robust monitoring solution.\n
#113 Logging at scale is useless. Too much noise. Instrumentation is essential.\nYou need to identify bottlenecks quickly or suffer prolonged and painful outages. The question of "How come we didn't catch that earlier?" addresses the incident, not the problem. The alternative question "What in our process is flawed that allowed us to launch the service without the appropriate monitoring to catch such an issue?" addresses the people and the processes that allowed the event you just had and every other event for which you didn't have appropriate monitoring.\nDesigning to be monitored is an approach wherein one builds monitoring into the application rather than around it. "How do we know when it's starting to behave poorly?" First, you need to answer the question "Is there a problem?" with user experience and business metrics monitors (lower click-through rate, shopping cart abandonment rate, ...). Then you need to identify where the problem is with system monitors (the problem with this is that it's usually relying on threshold alerts - i.e. checking if something is behaving outside of our expectations - rather than alerting on when it's performing significantly differently than in the past). Finally you need to identify what is the problem thanks to application monitoring. \nNot all monitoring data is valuable, too much of it only creates noise, while wasting time and resources. It's advisable to only save a summary of the reports over time to keep costs down while still providing value. In the ideal world, incidents and crises are predicted and avoided by a robust monitoring solution.\n
#114 Logging at scale is useless. Too much noise. Instrumentation is essential.\nYou need to identify bottlenecks quickly or suffer prolonged and painful outages. The question of "How come we didn't catch that earlier?" addresses the incident, not the problem. The alternative question "What in our process is flawed that allowed us to launch the service without the appropriate monitoring to catch such an issue?" addresses the people and the processes that allowed the event you just had and every other event for which you didn't have appropriate monitoring.\nDesigning to be monitored is an approach wherein one builds monitoring into the application rather than around it. "How do we know when it's starting to behave poorly?" First, you need to answer the question "Is there a problem?" with user experience and business metrics monitors (lower click-through rate, shopping cart abandonment rate, ...). Then you need to identify where the problem is with system monitors (the problem with this is that it's usually relying on threshold alerts - i.e. checking if something is behaving outside of our expectations - rather than alerting on when it's performing significantly differently than in the past). Finally you need to identify what is the problem thanks to application monitoring. \nNot all monitoring data is valuable, too much of it only creates noise, while wasting time and resources. It's advisable to only save a summary of the reports over time to keep costs down while still providing value. In the ideal world, incidents and crises are predicted and avoided by a robust monitoring solution.\n
#115 Logging at scale is useless. Too much noise. Instrumentation is essential.\nYou need to identify bottlenecks quickly or suffer prolonged and painful outages. The question of "How come we didn't catch that earlier?" addresses the incident, not the problem. The alternative question "What in our process is flawed that allowed us to launch the service without the appropriate monitoring to catch such an issue?" addresses the people and the processes that allowed the event you just had and every other event for which you didn't have appropriate monitoring.\nDesigning to be monitored is an approach wherein one builds monitoring into the application rather than around it. "How do we know when it's starting to behave poorly?" First, you need to answer the question "Is there a problem?" with user experience and business metrics monitors (lower click-through rate, shopping cart abandonment rate, ...). Then you need to identify where the problem is with system monitors (the problem with this is that it's usually relying on threshold alerts - i.e. checking if something is behaving outside of our expectations - rather than alerting on when it's performing significantly differently than in the past). Finally you need to identify what is the problem thanks to application monitoring. \nNot all monitoring data is valuable, too much of it only creates noise, while wasting time and resources. It's advisable to only save a summary of the reports over time to keep costs down while still providing value. In the ideal world, incidents and crises are predicted and avoided by a robust monitoring solution.\n
#116 \n
#117 \n
#118 \n
#119 \n
#120 \n
#121 \n
#122 \n
#123 Use queues and workers to make processes asynchronous, and distribute data to parallel workers.\n
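As a rough, single-process illustration of the queue-and-workers pattern, the sketch below uses Python's standard `queue` and `threading` modules. The function names, the worker count and the squaring "job" are placeholders chosen for the example; a production system would more likely use a message broker and separate worker processes.

```python
import queue
import threading


def worker(jobs, results):
    """Pull items off the shared queue until a None sentinel arrives."""
    while True:
        item = jobs.get()
        if item is None:  # sentinel: no more work for this worker
            jobs.task_done()
            break
        results.put((item, item * item))  # stand-in for slow work
        jobs.task_done()


def run(items, num_workers=4):
    """Enqueue all items, let parallel workers process them, collect results."""
    jobs, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for item in items:  # the producer returns immediately after enqueueing
        jobs.put(item)
    for _ in threads:  # one sentinel per worker
        jobs.put(None)
    jobs.join()
    for t in threads:
        t.join()
    collected = {}
    while not results.empty():
        key, value = results.get()
        collected[key] = value
    return collected
```

The producer only pays the cost of `put`, so the caller is decoupled from how long each job takes and from how many workers are consuming.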
#124 happy to talk about any of them\n
#125 \n
#126 \n
#127 \n
#128 Listeners subscribe only to the topics they care about (one or more). Different output channels.\nZeroMQ v3: filtering is done on the publisher side.\n
#129 An interesting idea for a highly dynamic site or service, where each update affects several other users or pages, is an internal data bus that carries all the information: updates are labelled with topics, and the services and users subscribe to the relevant topics.\n
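The topic-labelled data bus can be sketched in a few lines. This is an in-process toy, not ZeroMQ code: the `DataBus` class and the topic names are hypothetical. It borrows two ideas from the notes, prefix-style topic subscriptions and filtering on the publishing side, so that subscribers never see messages they did not ask for.

```python
from collections import defaultdict


class DataBus:
    """Toy in-process bus: updates carry a topic, subscribers pick prefixes."""

    def __init__(self):
        self._subscriptions = defaultdict(list)  # topic prefix -> callbacks

    def subscribe(self, prefix, callback):
        """Register a callback for every topic starting with `prefix`."""
        self._subscriptions[prefix].append(callback)

    def publish(self, topic, payload):
        """Deliver to matching subscribers only; return the delivery count."""
        delivered = 0
        for prefix, callbacks in self._subscriptions.items():
            if topic.startswith(prefix):  # filtering happens on the publisher
                for callback in callbacks:
                    callback(topic, payload)
                    delivered += 1
        return delivered


bus = DataBus()
timeline, audit = [], []

# A user's timeline service only cares about that user's updates.
bus.subscribe("user.42.", lambda topic, payload: timeline.append(topic))
# An empty prefix matches every topic, e.g. for an audit trail.
bus.subscribe("", lambda topic, payload: audit.append(topic))

first = bus.publish("user.42.comment", {"text": "hello"})
second = bus.publish("user.7.like", {"item": 99})
```

Here the first update reaches both subscribers and the second reaches only the audit subscriber, which is the behaviour the bus is meant to provide: each service sees just the topics relevant to it.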
#130 We collect millions of events every second.\nThe importance of people: devops who know what to monitor and how, who can use and write tools, and who are 100% dedicated.\nWe use different technologies. It's very easy to set up a new ZeroMQ listener.\nWe use StatsD (from Flickr / Etsy), Zenoss and Graphite.\n
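To give a feel for how lightweight StatsD-style instrumentation is, here is a small sketch that formats metrics in the plain-text StatsD line format (`name:value|type`) and sends them over UDP. The metric names, the default host and the helper function names are assumptions for illustration; 8125 is the port StatsD conventionally listens on, but a given deployment may differ.

```python
import socket


def statsd_line(name, value, metric_type):
    """Format one metric in the StatsD text format: <name>:<value>|<type>.

    Common types: "c" (counter), "ms" (timer), "g" (gauge).
    """
    return f"{name}:{value}|{metric_type}"


def send_metric(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; host and port are assumed defaults."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# Hypothetical metric names, for illustration only.
counter = statsd_line("checkout.completed", 1, "c")
timer = statsd_line("api.search.latency", 320, "ms")
```

Because the transport is UDP, the application does not wait for the metrics server, which is part of why this style of instrumentation is cheap enough to leave enabled everywhere.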
#131 shameless plug\n
#132 \n
#133 \n