System Design & Scalability
A quick reference guide
By John DiFini
Overview
• Types of Scaling & Load Balancing
• Data Storage Design
• Message-oriented Middleware (MOM)
• Fault Handling
• Networking
• MapReduce
Types of Scaling & Load Balancing
Horizontal vs. Vertical Scaling
Type | Scaling Process | Example | Complexity | Limited
Vertical | Add more resources to a single node | Add more memory to a single server | Easy | Yes (e.g. you can only add so much memory)
Horizontal | Add more nodes | Add more servers | Hard | Practically unlimited
Load Balancing
• Round Robin
• Source IP Hash - a given client IP address will always go
to the same server
• Request Hash - a given request type will always go to the
same server/cache; avoids cache duplication
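• Least Connections - send each request to the server with the fewest active connections
• Least Traffic - send each request to the server currently carrying the least traffic
• Least Latency - send each request to the server with the lowest response time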
Reference
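As a rough illustration of three of the strategies above, here is a minimal single-process Python sketch. The class names and server names are hypothetical; production load balancers apply these policies to network traffic rather than inside application code, and also handle health checks and server pools that change over time.

```python
import hashlib
import itertools


class RoundRobinBalancer:
    """Round Robin: cycle through the servers in a fixed order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self, client_ip=None):
        return next(self._cycle)


class SourceIPHashBalancer:
    """Source IP Hash: a given client IP always maps to the same server (while the pool is unchanged)."""

    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_ip):
        # Use a stable hash; Python's built-in hash() of a str is randomized per process.
        digest = hashlib.sha256(client_ip.encode()).digest()
        return self.servers[int.from_bytes(digest[:8], "big") % len(self.servers)]


class LeastConnectionsBalancer:
    """Least Connections: send each new request to the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self, client_ip=None):
        server = min(self.active, key=self.active.get)  # ties go to the first server listed
        self.active[server] += 1
        return server

    def release(self, server):
        """Call when a connection to `server` closes."""
        self.active[server] -= 1
```

Round Robin ignores the request entirely, Source IP Hash uses only the client address, and Least Connections needs per-server state, which is why it is usually the one that requires the balancer to track connection lifecycles.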
Data Storage Design
Database - Read vs. Write Performance
• Normalize vs. Denormalize
  ◦ Normalize - ↓duplicate data ⇨ ↑write perf (each fact is updated in one place) but ↓read perf (reads need joins)
  ◦ Denormalize - ↑duplicate data ⇨ ↑read perf (fewer joins) but ↓write perf (every copy must be updated)
• Have your cake and… - Use an append-only structure for
writes; then asynchronously restructure data into a read-
optimized format[*]
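A minimal sketch of the "have your cake and…" idea, assuming a hypothetical account-balance workload: writes only append to a log, and a separate step folds the log into a read-optimized view. Here that step is an explicit `materialize()` call standing in for an asynchronous background job; the pattern loosely resembles event sourcing, but this is an illustration, not a specific product's design.

```python
class AppendOnlyStore:
    """Writes only append to a log; reads hit a separately maintained, read-optimized view."""

    def __init__(self):
        self.log = []        # write-optimized: cheap appends, never updated in place
        self.view = {}       # read-optimized: precomputed balance per account
        self._applied = 0    # how many log entries the view already reflects

    def write(self, account, amount):
        self.log.append((account, amount))

    def materialize(self):
        """Fold new log entries into the view (a background job would run this in practice)."""
        for account, amount in self.log[self._applied:]:
            self.view[account] = self.view.get(account, 0) + amount
        self._applied = len(self.log)

    def read(self, account):
        # May be stale until materialize() has caught up with the log.
        return self.view.get(account, 0)
```

The trade-off is staleness: a read issued before the view catches up will not see the latest writes.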
Database - Structure
• Relational - general purpose, for tabular (table-based) data
• Specialized - for data structures that don't easily fit the tabular format (e.g. multi-level nesting & hierarchies)
  ◦ NoSQL
  ◦ Others
Cache
• DB reads are expensive, so hold as much of the frequently read data in memory as possible
• Cache Hit - data were found in cache;
Cache Miss - data not found, so retrieve
it from DB[*]
• Local vs. Distributed - rule of thumb: use a local cache for small data sets with a predictable number of immutable records; otherwise, consider a distributed cache[*]
• Cache Warming - anticipate queries and "prime" the cache not only on startup but also in real time (e.g. load the surrounding tiles of a recently requested map)
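To make cache hits, misses, and warming concrete, here is a small cache-aside sketch built around the map-tile example. The `load_from_db` callable and the tile coordinates are hypothetical stand-ins for a real data source; warming runs synchronously here, whereas a real system would more likely do it in the background, and the cache has no eviction.

```python
class TileCache:
    """Cache-aside lookups for map tiles, plus warming of the surrounding tiles."""

    def __init__(self, load_from_db):
        self.load_from_db = load_from_db  # hypothetical slow loader: (x, y) -> tile data
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, x, y):
        key = (x, y)
        if key in self.cache:   # cache hit: served from memory
            self.hits += 1
        else:                   # cache miss: fall back to the DB, then keep the result
            self.misses += 1
            self.cache[key] = self.load_from_db(x, y)
        self.warm_neighbors(x, y)
        return self.cache[key]

    def warm_neighbors(self, x, y):
        """Prime the 8 surrounding tiles, anticipating that the user will pan the map."""
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                key = (x + dx, y + dy)
                if key not in self.cache:
                    self.cache[key] = self.load_from_db(*key)
```

After the first request for tile (0, 0), a request for any adjacent tile is already a hit.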
Cache - Replacement Policy
• Replacement Policy - algorithm used to maximize
cache performance by choosing which data to
eject & which data to add in its place[*]
  ◦ LRU - ejects the Least Recently Used data
  ◦ Advanced - also considers access frequency, size of items, latency & throughput
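A compact LRU sketch in Python, using `collections.OrderedDict` to keep entries ordered from least to most recently used. The class name and capacity are illustrative; an MRU policy would instead eject the most recently used entry, and the "advanced" policies above would need extra bookkeeping (e.g. access counts or item sizes) not shown here.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that ejects the least recently used entry when full."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data = OrderedDict()  # least recently used first, most recently used last

    def get(self, key, default=None):
        if key not in self._data:
            return default              # cache miss
        self._data.move_to_end(key)     # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # eject the least recently used entry
```

For example, with capacity 2: put `a`, put `b`, read `a`, then put `c`; `b` is ejected because `a` was touched more recently.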
Data Store Sharding
Sharding - partition data
across multiple nodes
Type | Process | Drawback
Table-based | Put Table A on Node 1, Table B on Node 2, etc. | What if a table gets too large for its node?
Hash-based | The primary key is hashed, and every node is responsible for a range of hashed keys | What happens if the # of nodes changes? -> much of the data may need to be reallocated
Directory-based | A lookup service keeps track of which data are stored in which shard | What if the directory service is down (i.e. a single point of failure)? What if it has to process too many requests (i.e. a bottleneck)?
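The hash-based drawback can be measured with a short sketch. It assumes hypothetical `user:<n>` primary keys, a SHA-256-derived 64-bit hash, and equal, contiguous hash ranges per node (one way to read "every node is responsible for a range of hashed keys"). Going from 4 to 5 nodes should move roughly half the keys (50% in expectation for evenly split ranges); techniques such as consistent hashing aim to move far fewer.

```python
import hashlib

HASH_BITS = 64  # hash space is 0 <= h < 2**64


def key_hash(key):
    """Stable 64-bit hash of a primary key (Python's built-in hash() is randomized per process)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def shard_for(key, num_nodes):
    """Node i owns the i-th of num_nodes equal, contiguous ranges of the hash space."""
    return (key_hash(key) * num_nodes) >> HASH_BITS


def fraction_moved(keys, old_nodes, new_nodes):
    """Fraction of keys whose owning node changes when the cluster is resized."""
    moved = sum(shard_for(k, old_nodes) != shard_for(k, new_nodes) for k in keys)
    return moved / len(keys)


keys = [f"user:{i}" for i in range(10_000)]  # hypothetical primary keys
print(f"4 -> 5 nodes: {fraction_moved(keys, 4, 5):.0%} of keys change shards")
```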
Message-oriented Middleware (MOM)
MOM Considerations
• Used by distributed systems to communicate
amongst nodes[*]
• Abstracts OS & network intricacies (e.g. endian
format, sockets, etc.)
MOM Types
Type | Use Case | Examples | Underlying Protocol | Cast
Request/Response | 1 sender; 1 receiver (point-to-point), e.g. Stock Trade Order[*] | Synchronous - JSON Web Services; Asynchronous - message queues like ActiveMQ, IBM MQ | TCP ("guaranteed" delivery) | Unicast
Publish/Subscribe | 1 sender; many receivers/listeners, e.g. Stock Tick | Kafka[*], TIBCO Rendezvous/RV | UDP, e.g. TIBCO RV (Kafka instead runs over TCP) | Broadcast (all nodes) or Multicast (node groups)
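The difference between the two delivery patterns can be sketched in-process. This is only an illustration of who receives a message; it involves no network, TCP, or UDP, and real brokers such as ActiveMQ or Kafka add durability, acknowledgements, and consumer groups. The class names and the `ACME` messages are hypothetical.

```python
from collections import deque


class PointToPointQueue:
    """Request/response style: each message is delivered to exactly one consumer."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        """Called by any one of several competing consumers; returns None when empty."""
        return self._messages.popleft() if self._messages else None


class Topic:
    """Publish/subscribe style: every subscriber gets each published message."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        for callback in self._subscribers:
            callback(message)
```

A trade order sent to the queue is consumed once, whereas a stock tick published to the topic reaches every listener.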
Fault Handling
• High Availability (HA) - delayed recovery to a secondary
• Fault Tolerance - immediate recovery
  ◦ Active/Passive - the primary fails over to a secondary
  ◦ Active/Active - no primary vs. secondary; when one node fails, the other(s) take over its load
What is dead should never die
• Great YouTube video on the subject!
• @todo - explain no-special-node, ring topologies
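A minimal sketch of the routing decision behind Active/Passive and Active/Active, assuming hypothetical `Node` objects that raise `NodeDown` when they cannot serve. Real failover also involves health checks, timeouts, and replicating state to the secondary, none of which is modeled here.

```python
class NodeDown(Exception):
    """Raised by a (hypothetical) node that cannot serve requests."""


class Node:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def handle(self, request):
        if not self.healthy:
            raise NodeDown(self.name)
        return f"{self.name} handled {request}"


def call_active_passive(primary, secondary, request):
    """Active/Passive: use the primary; fail over to the secondary only if the primary is down."""
    try:
        return primary.handle(request)
    except NodeDown:
        return secondary.handle(request)


def call_active_active(nodes, request, start=0):
    """Active/Active: any node may serve; skip failed nodes so the survivors absorb the load."""
    for i in range(len(nodes)):
        node = nodes[(start + i) % len(nodes)]
        try:
            return node.handle(request)
        except NodeDown:
            continue
    raise NodeDown("all nodes are down")
```

In the Active/Passive case the secondary sits idle until needed; in the Active/Active case every node is already serving traffic, so a failure only shifts load.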
Networking
Network Metrics
• Bandwidth - The maximum amount of data that can
be transferred in a unit of time (e.g. 100Mbps)[*]
• Throughput - The actual amount of data that is transferred in a unit of time; at most the bandwidth (e.g. 88Mbps)
• Latency - The time it takes to send & receive
(round-trip) a packet of data (e.g. 20ms)[*]
Network Metrics - Analogy
Given a water pipe, its diameter determines its capacity (bandwidth, and therefore its maximum throughput), and its length determines its latency.
Therefore, to improve:
• Throughput - Get a fatter pipe
• Latency - Colocate to reduce distance or reduce
network hops (point-to-point), which also reduces
distance that data have to travel
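A back-of-the-envelope calculation shows when each improvement matters. This simple model counts the latency once and adds size divided by throughput, ignoring protocol overheads such as connection setup; the payload sizes are hypothetical, while 88 Mbps and 20 ms echo the examples above.

```python
def transfer_time_s(size_megabytes, throughput_mbps, latency_ms):
    """Rough time to move a payload: latency counted once, plus size divided by throughput.

    Units: size in megabytes (MB), throughput in megabits per second (Mbps), latency in ms.
    """
    size_megabits = size_megabytes * 8  # 1 byte = 8 bits, so MB -> Mb
    return latency_ms / 1000 + size_megabits / throughput_mbps


for label, size_mb in [("100 MB file", 100), ("10 KB request", 0.01)]:
    base = transfer_time_s(size_mb, 88, 20)
    fatter_pipe = transfer_time_s(size_mb, 176, 20)   # double the throughput
    shorter_pipe = transfer_time_s(size_mb, 88, 10)   # halve the latency
    print(f"{label}: base {base:.4f}s, fatter pipe {fatter_pipe:.4f}s, shorter pipe {shorter_pipe:.4f}s")
```

For the 100 MB file, doubling throughput cuts the time from about 9.11 s to about 4.57 s, while halving latency saves only 10 ms. For the 10 KB request the opposite holds: halving latency takes it from about 0.021 s to about 0.011 s, while a fatter pipe barely helps.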
MapReduce
Uses parallel & distributed systems to process large
data sets[*]
• Implementations - Spark, Hadoop, etc.[*]
• YouTube presentation
MapReduce - Steps
Fundamentally consists of two steps, Map & Reduce, but a Shuffle step between them is also common:
• Map - Transforms/filters each input record into key-value pairs (e.g. <key, value>). Think of putting elements into a typical Map interface
• Shuffle - Redistributes data so that all data pertaining to a given key reside on the same node
• Reduce - Summary/aggregation (e.g. sum all values for a given key)
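The classic word-count example walks through all three steps. This is a single-process simulation with hypothetical function names; frameworks such as Hadoop or Spark run many mappers and reducers in parallel on different nodes and perform the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values by key, as if routing each key to a single reducer node."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate all values for one key (here, a sum)."""
    return key, sum(values)


def word_count(documents):
    # In a real cluster each document (or block) would be mapped on a different node.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

For example, `word_count(["the cat sat", "The dog sat"])` yields counts of 2 for "the" and "sat" and 1 for "cat" and "dog".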
Coming Soon
Define P9s
