Scalability
Broad Strokes - Best practices
Definition
● Concurrency, i.e. the number of simultaneous requests, and latency
● Throughput, i.e. the total number of items processed
● Extensibility: designing the application so that new features can be added
● We will mostly talk about the first two.
Concurrency & Performance
● Scalability is measured as the number of requests/users an application supports without degrading performance.
● Performance is mostly a measure of the processing time of an individual request.
Handling Scale
● Throttling
● Cache
● Stateful vs. stateless
● Asynchronous vs. synchronous
● Service oriented design
Where (Multi tiered)
● At the client (Browser)
○ HTTP headers
○ Asynchronous calls
○ Local DB
● At the server (web tier/application tier)
○ Cache -- distributed
○ Stateless
○ Asynchronous
● DB
○ CAP theorem
Client
● HTTP headers
○ Well-chosen cache headers enable caching not only in browsers but also in intelligent proxies.
○ The YSlow / Google PageSpeed guidelines are always useful.
○ ETags and long expiry times are very good practices.
○ Sprites and image maps
● Ajax is good for scalability but may sometimes cause performance issues.
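As a rough illustration of the ETag and long-expiry advice, the sketch below shows how a server might answer a conditional GET. The function names make_etag and respond, the truncated SHA-256 tag and the one-year max-age are all assumptions made for this example, not something the slides prescribe.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the response body (hypothetical scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a cacheable GET."""
    etag = make_etag(body)
    headers = {
        "ETag": etag,
        # Long expiry: browsers and proxies may reuse the response for a year.
        "Cache-Control": "public, max-age=31536000",
    }
    if if_none_match == etag:
        # The client already holds the current version: send no body.
        return 304, headers, b""
    return 200, headers, body
```

A client that replays the ETag it received in an If-None-Match header gets a 304 with an empty payload, which saves bandwidth on every revalidation.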
Client Server Network
● Always compress responses.
● Even for JSON, the bandwidth gains are great.
● For server-to-server calls, consider binary or otherwise more efficient protocols.
● Even on the web, network-layer protocols such as SPDY are interesting.
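A quick way to see the compression point for JSON is to gzip a repetitive payload and compare sizes. This is a minimal sketch using only the Python standard library; the encode helper and the sample records are made up for illustration, and real gains depend on the data.

```python
import gzip
import json


def encode(payload):
    """Serialize payload to JSON and return (raw_bytes, gzip_bytes)."""
    raw = json.dumps(payload).encode("utf-8")
    return raw, gzip.compress(raw)


# Hypothetical API response: many records sharing the same keys and values.
records = [{"id": i, "status": "active", "country": "IN"} for i in range(1000)]
raw, packed = encode(records)
print(len(raw), len(packed))  # raw size vs. compressed size in bytes
```

JSON repeats its key names in every record, which is exactly the kind of redundancy that gzip removes.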
Server -- Numbers everyone should know
● http://static.googleusercontent.com/media/research.google.com/en//people/jeff/stanford-295-talk.pdf
● Writes are heavy.
● A disk seek is heavier than a network round trip (within a datacenter) plus a memory read.
● Global shared data is expensive if locking is involved.
● Reads do not need to be transactional, just consistent.
● Eventual consistency is useful.
Server - Cache (Low latency)
● Cache
○ Complete HTML response
○ Output from Database
● Cache strategy is determined by
○ is it a broadcast?
○ is it a multicast?
○ A unicast?
● Cache works best for broadcast.
● Distributed caching with consistent hashing works very well.
● The pitfall is cache purging.
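The consistent-hashing idea mentioned above can be sketched as a hash ring with virtual nodes. This is an illustrative toy, not the implementation used by any particular cache. The class name HashRing, the node names and the choice of SHA-256 with 100 virtual nodes are assumptions.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        """Place vnodes points for this node on the ring."""
        for i in range(self._vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        """Drop every point owned by this node."""
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key):
        """Return the node owning the first point clockwise from the key."""
        if not self._points:
            raise LookupError("ring is empty")
        i = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]
```

When a cache node is removed, only the keys that lived on that node move to another node. All other keys keep their placement, so most of the cache stays warm.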
Server (Concurrency)
● Sequential processing leaves CPU and other resources idle.
● Write parallelism is very important.
● But shared globals are heavy, hence a trade-off.
● In the case of Java, understanding the Java Memory Model (JMM) is necessary.
● Amdahl’s Law helps in determining the maximum gain that can be achieved with parallel implementations.
● Even when the work is parallelized, a small fraction of sequential work can cause a loss of throughput.
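Amdahl's Law can be written as speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of workers. The short sketch below evaluates it; the function and parameter names are chosen here for illustration.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when parallel_fraction of the work runs on workers."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# With 5% sequential work, the speedup can never reach 20x,
# no matter how many workers are added.
for n in (2, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.95, n), 2))
```

The sequential fraction sets a hard ceiling of 1 / (1 - p), which is why even a small sequential portion limits the gain from adding hardware.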
Server (State?full:less)
● Given that shared access is expensive, keeping state on the server is heavy.
● Sessions, if available in shared memory, are great.
● No session and share-nothing works best.
● Even a cache is better.
● Generally, stateless code is modular, easier to unit test and easier to profile.
● Keep state on the function stack rather than the heap.
● Stateless helps in scale out. (Scale out??)
Server Synchronous/Asynchronous
● Waiting for I/O, network connections or DB queries is bad.
● What about a “query of death” on a write?
● Writes, if not very small, should be kept asynchronous.
● This helps with parallelization.
● Reliable queues can improve latency.
● Idempotent code helps in avoiding many pitfalls.
● Generally, asynchrony is achieved through
○ Queue/topic-based infrastructure
■ Good for event processing and propagation of events
○ Incremental batches
● Async I/O servers? Node.js / nginx / Apache event MPM?
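The combination of asynchronous writes, a queue and idempotent handlers can be sketched as follows. The in-process queue.Queue is only a stand-in for a reliable queue or topic, and Store, worker and run_demo are hypothetical names for this example.

```python
import queue
import threading


class Store:
    """Toy data store with an idempotent write keyed by event id."""

    def __init__(self):
        self.rows = {}
        self._lock = threading.Lock()

    def apply(self, event_id, value):
        # Applying the same event twice leaves the store unchanged.
        with self._lock:
            self.rows[event_id] = value


def worker(q, store):
    """Drain the queue in the background until a None sentinel arrives."""
    while True:
        item = q.get()
        try:
            if item is None:
                return
            event_id, value = item
            store.apply(event_id, value)
        finally:
            q.task_done()


def run_demo():
    q = queue.Queue()
    store = Store()
    consumer = threading.Thread(target=worker, args=(q, store))
    consumer.start()
    # The request path only enqueues and returns; "e1" is delivered twice,
    # as can happen with at-least-once delivery.
    for event in [("e1", 10), ("e2", 20), ("e1", 10)]:
        q.put(event)
    q.put(None)  # sentinel: tell the worker to stop
    q.join()
    consumer.join()
    return store.rows
```

Because the write is idempotent, the duplicate delivery of "e1" is harmless. This is what makes retries and at-least-once queues safe to use.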
Debugging for Scale
● Profile
○ In Java
■ gc logs
■ JVisualVM
■ Thread and memory dumps
○ GNU
■ hprof
■ strace
■ gdb
■ system utilities
Scale Horizontal vs. vertical
● For a stateless, asynchronous, idempotent and multithreaded application, horizontal scaling works very well.
● It is easier to understand with storage, a.k.a. databases.
Database
● Which type of DBMS?
○ RDBMS
○ Keyspace-based, multi-column-family
○ Document based
○ Graph
○ Any other NoSQL?
○ Solr and Elasticsearch
Database scale out limitation
● CAP theorem
○ Consistency
○ Availability
○ Partition tolerance
○ All three are not available simultaneously
● Eventual consistency is the preferred choice.
RDBMS
● Always query via an index.
● For an RDBMS, a query of death is a death knell.
● Generally, writing once and reading from multiple slaves works better.
● To normalize or not?
● Normalize for extensibility.
● Use Solr/NoSQL for read scale.
● One complex multi-table join, or multiple simple queries? (performance/scale)
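To make the advice about index-based queries concrete, the sketch below uses SQLite (via Python's sqlite3 module) as a stand-in RDBMS and compares the query plan before and after adding an index. The users table, its columns and the index name are invented for this example; other databases expose similar information through their own EXPLAIN variants.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

lookup = "SELECT * FROM users WHERE email = ?"
args = ("user42@example.com",)

before = query_plan(conn, lookup, args)  # no index yet: full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = query_plan(conn, lookup, args)   # the lookup can now use the index

print(before)
print(after)
```

Checking the plan of every frequent query in this way is a cheap guard against a full scan turning into a query of death as the table grows.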
NoSQL
● Several options, ranging from document databases to multi-column-family stores
● We mostly use
○ Mongo
○ Cassandra
○ Neo4j (in some cases)
○ Titan
● Provide very high throughput with manageable
clustering/sharding
Mongo (iBeat)
● Increasing data volumes threaten scalability and availability.
● Though search is available, it’s not very efficient.
● The limit of a single document is 16 MB.
● Database repair and reindexing do impact performance.
Mongo (iBeat ..)
● Mongo sharding as a solution
● Data volume per replica set decreased.
● For the document size limit, GridFS was used.
● With a lower document volume, the overhead of indexes etc. was reduced.
● But sharding itself, with a large amount of data, was carried out over a long period of time.
Big Data
● Normally associated with such large and complex data that traditional data
management/visualization tools fail to capture, curate or process.
● The current definition covers three aspects, a.k.a. the 3 Vs
○ Volume
○ Velocity
○ Variety
● General usage is in
○ Genetic algorithms
○ Machine learning
○ Natural language processing
○ Time series analysis (a.k.a attribution analysis)
○ Visualizations
Big Data
● Our usage is
○ Analytics
○ User preference, personalization, profiling
○ Recommendation
○ Decision support system
● The well-known open-source ecosystems
○ Hadoop
○ Event processors/stream engines, e.g. Storm, Spark, S4
Big data (Hadoop..)
● Hadoop - originally a component of Nutch, it is now one of the biggest drivers of big data technologies.
● MapReduce - a mechanism/framework for running massively parallel computations, originally published by Google.
● MapReduce - the trick is distributed sorting.
● New languages for statistical computation, e.g. R
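The remark that the trick in MapReduce is distributed sorting can be illustrated with a single-process word count. The sort step below plays the role of the shuffle that Hadoop performs across machines. This is a toy sketch with made-up function names, not Hadoop's API.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(lines):
    """Emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Sort by key so that equal keys become adjacent (the distributed sort)."""
    return sorted(pairs, key=itemgetter(0))


def reduce_phase(sorted_pairs):
    """Sum the counts of each run of identical keys."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def word_count(lines):
    return dict(reduce_phase(shuffle(map_phase(lines))))
```

Because the reducer only ever sees one key's values at a time, map and reduce tasks can run independently on many machines once the sort has grouped the keys.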
Hadoop stack components
Image borrowed from http://blogs.gartner.com/merv-adrian/2013/02/21/hadoop-2013-part-two-projects/
Big data - Real time analysis
● While MapReduce is a great throughput solution, it doesn’t help with real-time or near-real-time processing.
● Ecosystems are evolving, coupled with either MapReduce or HDFS.
● Storm/Spark Streaming for augmenting MapReduce-based computations.
Most important
● Ability to determine impact of changes
● Seamless deployments
?

More Related Content

What's hot

Consistent hashing algorithmic tradeoffs
Consistent hashing  algorithmic tradeoffsConsistent hashing  algorithmic tradeoffs
Consistent hashing algorithmic tradeoffsEvan Lin
 
Challenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan LambrightChallenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan LambrightGluster.org
 
Big Data Lakes Benchmarking 2018
Big Data Lakes Benchmarking 2018Big Data Lakes Benchmarking 2018
Big Data Lakes Benchmarking 2018Tom Grek
 
Maintaining spatial data infrastructures (SDIs) using distributed task queues
Maintaining spatial data infrastructures (SDIs) using distributed task queuesMaintaining spatial data infrastructures (SDIs) using distributed task queues
Maintaining spatial data infrastructures (SDIs) using distributed task queuesPaolo Corti
 
Consistency Models in New Generation Databases
Consistency Models in New Generation DatabasesConsistency Models in New Generation Databases
Consistency Models in New Generation Databasesiammutex
 
Data engineering Stl Big Data IDEA user group
Data engineering   Stl Big Data IDEA user groupData engineering   Stl Big Data IDEA user group
Data engineering Stl Big Data IDEA user groupAdam Doyle
 
Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...
Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...
Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...DataStax
 
Tiering barcelona
Tiering barcelonaTiering barcelona
Tiering barcelonaGluster.org
 
Sdc challenges-2012
Sdc challenges-2012Sdc challenges-2012
Sdc challenges-2012Gluster.org
 
State of Gluster Performance
State of Gluster PerformanceState of Gluster Performance
State of Gluster PerformanceGluster.org
 
RPC in Smalltalk
 RPC in Smalltalk RPC in Smalltalk
RPC in SmalltalkESUG
 
Enabling presto to handle massive scale at lightning speed
Enabling presto to handle massive scale at lightning speedEnabling presto to handle massive scale at lightning speed
Enabling presto to handle massive scale at lightning speedShubham Tagra
 
Seattle Cassandra Meetup - HasOffers
Seattle Cassandra Meetup - HasOffersSeattle Cassandra Meetup - HasOffers
Seattle Cassandra Meetup - HasOffersbtoddb
 
Erasure codes and storage tiers on gluster
Erasure codes and storage tiers on glusterErasure codes and storage tiers on gluster
Erasure codes and storage tiers on glusterRed_Hat_Storage
 
Disperse xlator ramon_datalab
Disperse xlator ramon_datalabDisperse xlator ramon_datalab
Disperse xlator ramon_datalabGluster.org
 

What's hot (20)

Consistent hashing algorithmic tradeoffs
Consistent hashing  algorithmic tradeoffsConsistent hashing  algorithmic tradeoffs
Consistent hashing algorithmic tradeoffs
 
Challenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan LambrightChallenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan Lambright
 
Big Data Lakes Benchmarking 2018
Big Data Lakes Benchmarking 2018Big Data Lakes Benchmarking 2018
Big Data Lakes Benchmarking 2018
 
Maintaining spatial data infrastructures (SDIs) using distributed task queues
Maintaining spatial data infrastructures (SDIs) using distributed task queuesMaintaining spatial data infrastructures (SDIs) using distributed task queues
Maintaining spatial data infrastructures (SDIs) using distributed task queues
 
Spark
SparkSpark
Spark
 
Consistency Models in New Generation Databases
Consistency Models in New Generation DatabasesConsistency Models in New Generation Databases
Consistency Models in New Generation Databases
 
Geo data analytics
Geo data analyticsGeo data analytics
Geo data analytics
 
Data engineering Stl Big Data IDEA user group
Data engineering   Stl Big Data IDEA user groupData engineering   Stl Big Data IDEA user group
Data engineering Stl Big Data IDEA user group
 
Cassandra
CassandraCassandra
Cassandra
 
Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...
Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...
Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...
 
Tiering barcelona
Tiering barcelonaTiering barcelona
Tiering barcelona
 
Sdc challenges-2012
Sdc challenges-2012Sdc challenges-2012
Sdc challenges-2012
 
State of Gluster Performance
State of Gluster PerformanceState of Gluster Performance
State of Gluster Performance
 
Gluster Data Tiering
Gluster Data TieringGluster Data Tiering
Gluster Data Tiering
 
Dedupe nmamit
Dedupe nmamitDedupe nmamit
Dedupe nmamit
 
RPC in Smalltalk
 RPC in Smalltalk RPC in Smalltalk
RPC in Smalltalk
 
Enabling presto to handle massive scale at lightning speed
Enabling presto to handle massive scale at lightning speedEnabling presto to handle massive scale at lightning speed
Enabling presto to handle massive scale at lightning speed
 
Seattle Cassandra Meetup - HasOffers
Seattle Cassandra Meetup - HasOffersSeattle Cassandra Meetup - HasOffers
Seattle Cassandra Meetup - HasOffers
 
Erasure codes and storage tiers on gluster
Erasure codes and storage tiers on glusterErasure codes and storage tiers on gluster
Erasure codes and storage tiers on gluster
 
Disperse xlator ramon_datalab
Disperse xlator ramon_datalabDisperse xlator ramon_datalab
Disperse xlator ramon_datalab
 

Viewers also liked

What we do to improve scalability in our RDF processing system
What we do to improve scalability in our RDF processing systemWhat we do to improve scalability in our RDF processing system
What we do to improve scalability in our RDF processing systemAlejandro Llaves
 
Middle Tier Scalability - Present and Future
Middle Tier Scalability - Present and FutureMiddle Tier Scalability - Present and Future
Middle Tier Scalability - Present and Futuredfilppi
 
Realtime Computation with Storm
Realtime Computation with StormRealtime Computation with Storm
Realtime Computation with Stormboorad
 
Hadoop and Storm - AJUG talk
Hadoop and Storm - AJUG talkHadoop and Storm - AJUG talk
Hadoop and Storm - AJUG talkboorad
 
Learning Stream Processing with Apache Storm
Learning Stream Processing with Apache StormLearning Stream Processing with Apache Storm
Learning Stream Processing with Apache StormEugene Dvorkin
 
Cloud-based Data Stream Processing
Cloud-based Data Stream ProcessingCloud-based Data Stream Processing
Cloud-based Data Stream ProcessingZbigniew Jerzak
 
Stateful Distributed Stream Processing
Stateful Distributed Stream ProcessingStateful Distributed Stream Processing
Stateful Distributed Stream ProcessingGyula Fóra
 
Apache Storm and Oracle Event Processing for Real-time Analytics
Apache Storm and Oracle Event Processing for Real-time AnalyticsApache Storm and Oracle Event Processing for Real-time Analytics
Apache Storm and Oracle Event Processing for Real-time AnalyticsPrabhu Thukkaram
 
Apache Kafka - Free Friday
Apache Kafka - Free FridayApache Kafka - Free Friday
Apache Kafka - Free FridayOtávio Carvalho
 
Introduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with StormIntroduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with StormBrandon O'Brien
 
I Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache KafkaI Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache KafkaJay Kreps
 
Storm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationStorm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationnathanmarz
 

Viewers also liked (13)

What we do to improve scalability in our RDF processing system
What we do to improve scalability in our RDF processing systemWhat we do to improve scalability in our RDF processing system
What we do to improve scalability in our RDF processing system
 
Middle Tier Scalability - Present and Future
Middle Tier Scalability - Present and FutureMiddle Tier Scalability - Present and Future
Middle Tier Scalability - Present and Future
 
Realtime Computation with Storm
Realtime Computation with StormRealtime Computation with Storm
Realtime Computation with Storm
 
Hadoop and Storm - AJUG talk
Hadoop and Storm - AJUG talkHadoop and Storm - AJUG talk
Hadoop and Storm - AJUG talk
 
Learning Stream Processing with Apache Storm
Learning Stream Processing with Apache StormLearning Stream Processing with Apache Storm
Learning Stream Processing with Apache Storm
 
Cloud-based Data Stream Processing
Cloud-based Data Stream ProcessingCloud-based Data Stream Processing
Cloud-based Data Stream Processing
 
Stateful Distributed Stream Processing
Stateful Distributed Stream ProcessingStateful Distributed Stream Processing
Stateful Distributed Stream Processing
 
Apache Storm and Oracle Event Processing for Real-time Analytics
Apache Storm and Oracle Event Processing for Real-time AnalyticsApache Storm and Oracle Event Processing for Real-time Analytics
Apache Storm and Oracle Event Processing for Real-time Analytics
 
Apache Kafka - Free Friday
Apache Kafka - Free FridayApache Kafka - Free Friday
Apache Kafka - Free Friday
 
Introduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with StormIntroduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with Storm
 
Kafka for Scale
Kafka for ScaleKafka for Scale
Kafka for Scale
 
I Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache KafkaI Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache Kafka
 
Storm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationStorm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computation
 

Similar to Scalability broad strokes

Impala presentation ahad rana
Impala presentation ahad ranaImpala presentation ahad rana
Impala presentation ahad ranaData Con LA
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriDemi Ben-Ari
 
Concurrency, Parallelism And IO
Concurrency,  Parallelism And IOConcurrency,  Parallelism And IO
Concurrency, Parallelism And IOPiyush Katariya
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned Omid Vahdaty
 
Ledingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartLedingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartMukesh Singh
 
Cassandra background-and-architecture
Cassandra background-and-architectureCassandra background-and-architecture
Cassandra background-and-architectureMarkus Klems
 
If the data cannot come to the algorithm...
If the data cannot come to the algorithm...If the data cannot come to the algorithm...
If the data cannot come to the algorithm...Robert Burrell Donkin
 
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB
 
Interactive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark StreamingInteractive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark Streamingdatamantra
 
Distributed systems and consistency
Distributed systems and consistencyDistributed systems and consistency
Distributed systems and consistencyseldo
 
Introductionofdatastructure 110731092019-phpapp01
Introductionofdatastructure 110731092019-phpapp01Introductionofdatastructure 110731092019-phpapp01
Introductionofdatastructure 110731092019-phpapp01Jay Patel
 
Introduction of data_structure
Introduction of data_structureIntroduction of data_structure
Introduction of data_structureeShikshak
 
The Dark Side Of Go -- Go runtime related problems in TiDB in production
The Dark Side Of Go -- Go runtime related problems in TiDB  in productionThe Dark Side Of Go -- Go runtime related problems in TiDB  in production
The Dark Side Of Go -- Go runtime related problems in TiDB in productionPingCAP
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | EnglishOmid Vahdaty
 
Netflix machine learning
Netflix machine learningNetflix machine learning
Netflix machine learningAmer Ather
 
MySQL 高可用性
MySQL 高可用性MySQL 高可用性
MySQL 高可用性YUCHENG HU
 
Introduction to AWS Big Data
Introduction to AWS Big Data Introduction to AWS Big Data
Introduction to AWS Big Data Omid Vahdaty
 
Machine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsMachine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsZhenxiao Luo
 

Similar to Scalability broad strokes (20)

Impala presentation ahad rana
Impala presentation ahad ranaImpala presentation ahad rana
Impala presentation ahad rana
 
SNIA SDC 2016 final
SNIA SDC 2016 finalSNIA SDC 2016 final
SNIA SDC 2016 final
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-Ari
 
Concurrency, Parallelism And IO
Concurrency,  Parallelism And IOConcurrency,  Parallelism And IO
Concurrency, Parallelism And IO
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
 
Ledingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartLedingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @Lendingkart
 
Cassandra background-and-architecture
Cassandra background-and-architectureCassandra background-and-architecture
Cassandra background-and-architecture
 
If the data cannot come to the algorithm...
If the data cannot come to the algorithm...If the data cannot come to the algorithm...
If the data cannot come to the algorithm...
 
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
 
Interactive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark StreamingInteractive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark Streaming
 
Distributed systems and consistency
Distributed systems and consistencyDistributed systems and consistency
Distributed systems and consistency
 
Introductionofdatastructure 110731092019-phpapp01
Introductionofdatastructure 110731092019-phpapp01Introductionofdatastructure 110731092019-phpapp01
Introductionofdatastructure 110731092019-phpapp01
 
Introduction of data_structure
Introduction of data_structureIntroduction of data_structure
Introduction of data_structure
 
The Dark Side Of Go -- Go runtime related problems in TiDB in production
The Dark Side Of Go -- Go runtime related problems in TiDB  in productionThe Dark Side Of Go -- Go runtime related problems in TiDB  in production
The Dark Side Of Go -- Go runtime related problems in TiDB in production
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
 
Netflix machine learning
Netflix machine learningNetflix machine learning
Netflix machine learning
 
MySQL 高可用性
MySQL 高可用性MySQL 高可用性
MySQL 高可用性
 
Introduction to AWS Big Data
Introduction to AWS Big Data Introduction to AWS Big Data
Introduction to AWS Big Data
 
Machine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsMachine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systems
 
Dynomite @ Redis Conference 2016
Dynomite @ Redis Conference 2016Dynomite @ Redis Conference 2016
Dynomite @ Redis Conference 2016
 

Recently uploaded

Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 

Recently uploaded (20)

Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
EY_Graph Database Powered Sustainability

Scalability broad strokes

  • 2. Definition
    ● Concurrency, a.k.a. the number of simultaneous requests, and latency
    ● Throughput, a.k.a. the total number of items processed per unit of time
    ● Extensibility: designing the application so that new features can be added
    ● We will mostly talk about the first two.
  • 3. Concurrency & Performance
    ● Scalability is measured as the number of requests/users an application supports without degrading performance.
    ● Performance is mostly a measure of the processing time of an individual request.
  • 4. Handling Scale
    ● Throttling
    ● Cache
    ● Stateful vs. stateless
    ● Asynchronous vs. synchronous
    ● Service-oriented design
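The slide only names throttling as a technique. As one possible illustration, here is a minimal token-bucket sketch; the token bucket is a common throttling scheme chosen for this example, not something the deck prescribes, and the class name `TokenBucket` and its parameters are hypothetical.

```python
import time


class TokenBucket:
    """Allow roughly `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate                # tokens added per second
        self.capacity = capacity        # maximum burst size
        self.tokens = float(capacity)   # start with a full bucket
        self.clock = clock              # injectable clock (assumption: eases testing)
        self.last = clock()

    def allow(self):
        """Return True if the request may proceed, False if it is throttled."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A caller would typically keep one bucket per client or per API key and reject or delay the request (for example with HTTP 429) when `allow()` returns False.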
  • 5. Where (Multi-tiered)
    ● At the client (browser)
      ○ HTTP headers
      ○ Asynchronous calls
      ○ Local DB
    ● At the server (web tier/application tier)
      ○ Cache -- distributed
      ○ Stateless
      ○ Asynchronous
    ● DB
      ○ CAP theorem
  • 6. Client
    ● HTTP headers
      ○ Pragmatic caching headers not only enable caching in browsers but also help intelligent proxies.
      ○ The YSlow and Google PageSpeed guidelines are always useful.
      ○ ETags and long expiry times are very good practices.
      ○ Sprites and image maps
    ● Ajax is good for scalability but may sometimes cause performance issues.
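To make the ETag and long-expiry advice above concrete, here is a small framework-free sketch of a conditional GET. The helper names `make_etag` and `respond` are hypothetical, the one-year `max-age` is an arbitrary illustrative value, and the `If-None-Match` handling is simplified to a single exact match (real requests may carry a list of tags or weak validators).

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive an ETag from the response body (quoted, as HTTP expects)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match=None, max_age=31536000):
    """Return (status, headers, payload) for a cacheable resource."""
    etag = make_etag(body)
    headers = {
        "ETag": etag,
        # Long expiry: browsers and proxies may reuse the response for max_age seconds.
        "Cache-Control": f"public, max-age={max_age}",
    }
    if if_none_match == etag:
        # The client copy is still valid: send headers only, no body.
        return 304, headers, b""
    return 200, headers, body
```

The first request returns 200 with the body; a revalidation that sends the same tag back gets a 304 and an empty payload, which saves bandwidth on both sides.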
  • 7. Client Server Network
    ● Always compress responses.
    ● Even on JSON the bandwidth gains are great.
    ● For server-to-server calls, consider binary or otherwise more efficient protocols.
    ● Even on the web, network-layer protocols such as SPDY are interesting.
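The claim about JSON can be checked with a few lines of standard-library Python. This is only a sketch: the payload below is made-up sample data, and the actual ratio depends heavily on how repetitive the real responses are.

```python
import gzip
import json


def compress_json(obj) -> tuple:
    """Serialize obj to JSON and return (raw_bytes, gzip_bytes)."""
    raw = json.dumps(obj).encode("utf-8")
    return raw, gzip.compress(raw)


# Hypothetical API response: a list of similar records, as is typical for JSON.
records = [{"id": i, "name": f"user-{i}", "active": True} for i in range(1000)]
raw, compressed = compress_json(records)
print(f"raw={len(raw)} bytes, gzip={len(compressed)} bytes")
```

In practice the web server or a reverse proxy usually performs this step when the client sends `Accept-Encoding: gzip`, so application code rarely compresses by hand.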
  • 8. Server -- Numbers all should know
    ● http://static.googleusercontent.com/media/research.google.com/en//people/jeff/stanford-295-talk.pdf
    ● Writes are heavy.
    ● A disk seek is more expensive than a network round trip plus a memory read.
    ● Global shared data is expensive if locking is involved.
    ● Reads do not need to be transactional, just consistent.
    ● Eventual consistency is useful.
  • 9. Server - Cache (Low latency)
    ● Cache
      ○ Complete HTML response
      ○ Output from the database
    ● Cache strategy is determined by
      ○ Is it a broadcast?
      ○ Is it a multicast?
      ○ A unicast?
    ● Caching works best for broadcast.
    ● Distributed caching with consistent hashing works very well.
    ● The pitfall is cache purging.
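The consistent-hashing bullet above can be sketched as a hash ring with virtual nodes. This is a minimal illustration under assumptions, not a production client: the class name `HashRing`, the choice of SHA-256, and the default of 100 virtual nodes per server are all hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Map keys to cache nodes so that adding/removing a node moves few keys."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas   # virtual nodes per physical node
        self._keys = []            # sorted ring positions
        self._ring = {}            # ring position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            self._ring[pos] = node
            bisect.insort(self._keys, pos)

    def remove(self, node):
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            del self._ring[pos]
            self._keys.remove(pos)

    def get(self, key):
        """Return the node owning key: the first ring position clockwise from its hash."""
        if not self._keys:
            raise LookupError("ring is empty")
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._ring[self._keys[idx]]
```

When a node is removed, only the keys it owned are reassigned; every other key keeps its node, so most of the cache stays warm. That property is what makes the scheme attractive compared with `hash(key) % node_count`.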
  • 10. Server (Concurrency)
    ● Sequential processing leaves CPU and other resources underused.
    ● Write parallelism is very important.
    ● But shared globals are heavy, hence a trade-off.
    ● In the case of Java, an understanding of the Java Memory Model (JMM) is necessary.
    ● Amdahl’s Law helps in determining the maximum gain that can be achieved with parallel implementations.
    ● Even when the work is made parallel, a small fraction of sequential work can cause a loss of throughput.
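Amdahl's Law, mentioned above, says that if a fraction p of the work can be parallelized across n workers, the speedup is 1 / ((1 - p) + p / n). A short sketch (the function name `amdahl_speedup` is just illustrative) shows how quickly the sequential part dominates:

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's Law: 1 / ((1 - p) + p / n)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# With 95% of the work parallelizable, extra workers give diminishing returns.
for n in (2, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.95, n), 2))
```

As n grows, the speedup approaches 1 / (1 - p); with 5% sequential work it can never exceed 20x, no matter how many workers are added.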
  • 11. Server (State?full:less)
    ● Given that shared access is expensive, keeping state on the server is heavy.
    ● Sessions, if available in shared memory, are great.
    ● No session and share-nothing works best.
    ● Even a cache is better.
    ● Generally, stateless code is modular, easier to unit test and easier to profile.
    ● Its data lives on the function stack rather than on the heap.
    ● Statelessness helps in scaling out. (Scale out??)
  • 12. Server Synchronous/Asynchronous
    ● Waiting for I/O, network connections and DB queries is bad.
    ● How about a “query of death” on a write?
    ● Writes, if not very small, should be kept asynchronous.
    ● This helps with parallelization.
    ● Reliable queues can improve latency.
    ● Idempotent code helps in avoiding many pitfalls.
    ● Asynchronous processing is generally achieved through
      ○ Queue/topic-based infrastructure
        ■ Good for event processing and propagation of events
      ○ Incremental batches
    ● Async I/O servers? Node.js, nginx, Apache event MPM?
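The combination of asynchronous writes, queues and idempotent code can be sketched with the standard library alone. Everything here is an assumption made for illustration: an in-process `queue.Queue` stands in for a reliable message queue, a dictionary stands in for the database, and the `IdempotentWriter` class with its `message_id` parameter is hypothetical.

```python
import queue
import threading


class IdempotentWriter:
    """Accept writes immediately, apply them in the background exactly once each."""

    def __init__(self):
        self.store = {}            # stand-in for the real database
        self.seen = set()          # message ids that were already applied
        self.jobs = queue.Queue()  # stand-in for a reliable queue/topic
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, message_id, key, amount):
        """Enqueue the write and return at once; the caller does not wait for I/O."""
        self.jobs.put((message_id, key, amount))

    def _run(self):
        while True:
            message_id, key, amount = self.jobs.get()
            try:
                # Idempotency: a redelivered message must not be applied twice.
                if message_id not in self.seen:
                    self.store[key] = self.store.get(key, 0) + amount
                    self.seen.add(message_id)
            finally:
                self.jobs.task_done()

    def drain(self):
        """Block until every queued write has been processed."""
        self.jobs.join()
```

Because queues with at-least-once delivery may hand the same message to a consumer more than once, tracking applied message ids is what keeps a retried increment from being counted twice.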
  • 13. Debugging for Scale
    ● Profile
      ○ In Java
        ■ GC logs
        ■ JVisualVM
        ■ Thread and memory dumps
      ○ GNU
        ■ hprof
        ■ strace
        ■ gdb
        ■ system utilities
  • 14. Scale Horizontal vs. vertical
    ● For a stateless, asynchronous, idempotent and multithreaded application, horizontal scaling works very well.
    ● It is easier to understand with storage, a.k.a. databases.
  • 15. Database
    ● Which type of DBMS?
      ○ RDBMS
      ○ Key space based multi column family
      ○ Document based
      ○ Graph
      ○ Any other NoSQL?
      ○ Solr and Elasticsearch
  • 16. Database scale out limitation
    ● CAP theorem
      ○ Consistency
      ○ Availability
      ○ Partition tolerance
      ○ All three are not available simultaneously.
    ● Eventual consistency is the preferred choice.
  • 17. RDBMS
    ● Always use index-based queries.
    ● For an RDBMS, a query of death is a death knell.
    ● Generally, writing once and reading from multiple slaves works better.
    ● To normalize or not?
    ● Normalize for extensibility.
    ● Use Solr/NoSQL for read scale.
    ● One complex multi-table join query, or multiple simple queries? (performance/scale)
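Whether a query is index-based can be checked by asking the database for its query plan. The sketch below uses SQLite through Python's `sqlite3` module purely because it runs anywhere; the table `users`, the index name `idx_users_email` and the helper `query_plan` are made up for this example, and other RDBMSs expose the same idea through their own `EXPLAIN` variants.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"u{i}@example.com", f"user {i}") for i in range(1000)],
)

sql = "SELECT name FROM users WHERE email = ?"
plan_before = query_plan(conn, sql, ("u42@example.com",))  # full table scan

conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, sql, ("u42@example.com",))   # index lookup

print(plan_before)
print(plan_after)
```

Before the index exists the plan reports a scan of the whole table; afterwards it reports a search using `idx_users_email`. A query that still scans on a large table is a candidate "query of death".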
  • 18. NoSQL
    ● Several options, ranging from document databases to multi-column-family stores
    ● We mostly use
      ○ Mongo
      ○ Cassandra
      ○ Neo4j (in some cases)
      ○ Titan
    ● They provide very high throughput with manageable clustering/sharding.
  • 19. Mongo (iBeat)
    ● Increasing data volumes threaten scalability and availability.
    ● Though search is available, it’s not very efficient.
    ● The limit of a single document is 16 MB.
    ● Repairing the DB and reindexing do impact performance.
  • 20. Mongo (iBeat ..)
    ● Mongo sharding as a solution
    ● Data volume per replica set decreased.
    ● GridFS was used to work around the document size limit.
    ● With fewer documents per shard, the overhead of indexes etc. was reduced.
    ● But with a large amount of data, the sharding itself was carried out over a long period of time.
  • 21. Big Data
    ● Normally associated with data so large and complex that traditional data management/visualization tools fail to capture, curate or process it.
    ● The current definition covers three aspects, a.k.a. the 3 Vs
      ○ Volume
      ○ Velocity
      ○ Variety
    ● General usage is in
      ○ Genetic algorithms
      ○ Machine learning
      ○ Natural language processing
      ○ Time series analysis (a.k.a. attribution analysis)
      ○ Visualizations
  • 22. Big Data
    ● Our usage is
      ○ Analytics
      ○ User preference, personalization, profiling
      ○ Recommendation
      ○ Decision support system
    ● The standard known open source ecosystems
      ○ Hadoop
      ○ Event processors/stream engines, e.g. Storm, Spark, S4
  • 23. Big data (Hadoop..)
    ● Hadoop, originally a component of Nutch, is now one of the biggest drivers in big data technologies.
    ● MapReduce is a mechanism/framework for running massively parallel computations, originally published by Google.
    ● MapReduce: the trick is distributed sorting.
    ● New languages for statistical computation, e.g. R
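The remark that the trick in MapReduce is sorting can be seen in a toy, single-process word count. This is not Hadoop's API, only a sketch of the data flow with hypothetical function names: the map step emits key/value pairs, the shuffle step sorts them by key so that equal keys become adjacent, and the reduce step folds each group.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(documents, mapper):
    """Apply the mapper to every input record and collect the (key, value) pairs."""
    pairs = []
    for doc in documents:
        pairs.extend(mapper(doc))
    return pairs


def shuffle(pairs):
    """Sort by key; in a real cluster this sort is distributed across machines."""
    return sorted(pairs, key=itemgetter(0))


def reduce_phase(sorted_pairs, reducer):
    """Group adjacent pairs with the same key and reduce each group to one value."""
    return {
        key: reducer(key, [value for _, value in group])
        for key, group in groupby(sorted_pairs, key=itemgetter(0))
    }


def word_count(documents):
    mapper = lambda doc: [(word.lower(), 1) for word in doc.split()]
    reducer = lambda word, counts: sum(counts)
    return reduce_phase(shuffle(map_phase(documents, mapper)), reducer)


print(word_count(["to be or not to be", "To scale"]))
```

Because the reducer only ever sees one key's values at a time, map and reduce tasks can run on different machines; the sort in the middle is the part that has to move data between them.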
  • 24. Hadoop stack components
    Image borrowed from http://blogs.gartner.com/merv-adrian/2013/02/21/hadoop-2013-part-two-projects/
  • 25. Big data - Real time analysis
    ● While MapReduce is a great throughput solution, it doesn’t help with real-time or near-real-time processing.
    ● Ecosystems are evolving, coupled with either MapReduce or HDFS.
    ● Storm/Spark Streaming augment MapReduce-based computations.
  • 26. Most important
    ● Ability to determine the impact of changes
    ● Seamless deployments
  • 27. ?