"Surviving highload with Node.js", Andrii Shumada

This document discusses highload systems and strategies for scaling Node.js applications to handle increased traffic. It recommends using multiple servers for redundancy and handling spikes in load. Key metrics for monitoring include status codes, backend latency, CPU and memory utilization, and event loop lag. Batching operations to third parties and sampling logs are suggested to reduce load. Offloading heavy tasks to workers can also help optimize performance. The document emphasizes monitoring systems closely and using as few servers as possible through optimization.

ABOUT ME
- 14 years in IT
- 13 years in Node.js development
- R&D Team Lead at WalkMe
- Working with highload services

Why 2+?
- Redundancy: 1 service can shut down or break
- 0 downtime updates: you can update 1 service while the 2nd handles requests
2 is the minimum number of servers, even for non-highload projects

When do you need 2+ servers?
- Customers are complaining about performance
- Your metrics show performance degradation

Maybe optimize your app?
- Yes
- Any code optimization has its limits
- At some point you will reach your CPU capacity with more users

So adding more servers is the right approach to handle more requests?

Status codes
- The more 2xx, the better
- The fewer 5xx, the better
Backend latency
- Preferably respond in under 200 ms
To satisfy business needs
Be cost effective
- The less we spend, the more money the business can keep
How to achieve this?

Monitoring & auto scaling
CPU
- ~40-60% avg utilization
Memory
- <50% max utilization
Traffic pattern
- This can affect our auto scaling parameters
Active handles
- Spikes of active handles can block requests from being processed
Active requests
- Spikes of active requests can block requests from being processed
Event loop lag
- Can be the reason why we can't handle requests in time

Case 1: traffic increases and decreases gradually

Case 2: traffic and/or CPU usage increases and decreases sporadically
$$ - potential money saving
It is hard to auto scale such systems because some requests are heavy.
Possible solution: offload CPU-heavy tasks to offline jobs (workers, separate deployments)
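One way to offload a CPU-heavy task inside a single Node.js process is the built-in `worker_threads` module. The sketch below is only an illustration: `sumOfSquares` is a hypothetical stand-in for a heavy job, and the worker is created from an inline script (`eval: true`) to keep the example in one file. A real service would more likely use a separate worker file, a worker pool, or a separate deployment, as the slide suggests.

```javascript
const { Worker } = require('node:worker_threads');

// Hypothetical CPU-heavy task, used here only for illustration.
function sumOfSquares(n) {
  let total = 0;
  for (let i = 1; i <= n; i++) total += i * i;
  return total;
}

// Inline worker script: it receives n via workerData and posts the result back.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  ${sumOfSquares.toString()}
  parentPort.postMessage(sumOfSquares(workerData));
`;

// Runs the task on a separate thread so the main event loop stays free.
function runInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: n });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```

Calling `runInWorker(1e9)` from a request handler keeps the loop itself off the main thread, so other requests can still be served while it runs.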

Node.js metrics: event loop lag
Hundreds of these can cause high event loop lag and lead to app unresponsiveness.
Mitigation: add setImmediate() to your loops
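The mitigation above can be sketched as follows. The original slide's code is not preserved, so this is an assumed example: `sumInChunks` and `chunkSize` are hypothetical names, and the loop body stands in for any long synchronous work.

```javascript
// Resolves on the next iteration of the event loop, after pending I/O callbacks.
function yieldToEventLoop() {
  return new Promise((resolve) => setImmediate(resolve));
}

// Processes a large array, pausing every `chunkSize` items
// so that other requests can be handled in between.
async function sumInChunks(items, chunkSize = 1000) {
  let total = 0;
  for (let i = 0; i < items.length; i++) {
    total += items[i];
    if ((i + 1) % chunkSize === 0) {
      await yieldToEventLoop();
    }
  }
  return total;
}
```

The total work stays the same; it is just split into slices, so each slice blocks the event loop for a shorter time.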

Event loop lag in sync methods
I hope you are not using the sync methods of fs.
Use the async variants of these methods everywhere.
Do not use it
Use it
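The slide's own code samples are not preserved, so here is a minimal sketch of the contrast, assuming a hypothetical JSON config file. `readConfigSync` and `readConfig` are illustrative names.

```javascript
const fs = require('node:fs');

// Do not use in request handling: readFileSync blocks the event loop
// until the whole file has been read.
function readConfigSync(file) {
  return JSON.parse(fs.readFileSync(file, 'utf8'));
}

// Use this instead: the read happens in the background
// and the event loop keeps serving other requests.
async function readConfig(file) {
  return JSON.parse(await fs.promises.readFile(file, 'utf8'));
}
```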

How to capture these?
Default metrics can be collected in the prom-client registry and later exposed by your HTTP server, so Prometheus can scrape them and Grafana can display them
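A minimal sketch of the exposing side, under stated assumptions: the handler takes any object shaped like the prom-client registry (an async `metrics()` method and a `contentType` string), so the block itself has no external dependency. The `/metrics` path, the port, and the name `createMetricsHandler` are illustrative choices, not taken from the talk.

```javascript
// `register` is expected to look like prom-client's registry:
// metrics() returns the text exposition, contentType is the header value.
function createMetricsHandler(register) {
  return async function handle(req, res) {
    if (req.url !== '/metrics') {
      res.statusCode = 404;
      res.end('not found');
      return;
    }
    try {
      const body = await register.metrics();
      res.setHeader('Content-Type', register.contentType);
      res.end(body);
    } catch (err) {
      res.statusCode = 500;
      res.end(String(err));
    }
  };
}

// Possible wiring with prom-client (requires `npm install prom-client`):
//   const http = require('node:http');
//   const client = require('prom-client');
//   client.collectDefaultMetrics();
//   http.createServer(createMetricsHandler(client.register)).listen(9100);
```

Prometheus would then be configured to scrape that endpoint, and Grafana would read the series from Prometheus.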

Exploring event loop lag
An average event loop lag above 100 ms is a case for investigation
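For a quick check without a metrics stack, Node's built-in `perf_hooks.monitorEventLoopDelay()` can report the same kind of number. This is a sketch, not the talk's setup: `startLagWatch`, the 5-second reporting interval, and the 20 ms sampling resolution are assumptions, while the 100 ms threshold comes from the slide.

```javascript
const { monitorEventLoopDelay } = require('node:perf_hooks');

const NS_PER_MS = 1e6;

// The histogram reports values in nanoseconds; convert the mean to milliseconds.
function meanLagMs(histogram) {
  return histogram.mean / NS_PER_MS;
}

// Periodically checks the average lag and reports when it exceeds the threshold.
// Returns a function that stops the watch.
function startLagWatch({ thresholdMs = 100, intervalMs = 5000, onHighLag = console.warn } = {}) {
  const histogram = monitorEventLoopDelay({ resolution: 20 });
  histogram.enable();
  const timer = setInterval(() => {
    const lag = meanLagMs(histogram);
    if (lag > thresholdMs) {
      onHighLag(`avg event loop lag ${lag.toFixed(1)} ms`);
    }
    histogram.reset(); // start a fresh window for the next interval
  }, intervalMs);
  timer.unref(); // monitoring alone should not keep the process alive
  return function stop() {
    clearInterval(timer);
    histogram.disable();
  };
}
```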

Other default metrics that are collected with “collectDefaultMetrics”:
https://github.com/siimon/prom-client/tree/master/lib/metrics

Debug a specific pod and check the types of handles
- Incoming HTTP requests from the load balancer
- Outgoing connections to 3rd parties
'Number of active libuv handles grouped by handle type. Every handle type is C++ class name.'
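When debugging a single pod by hand, a similar grouping can be approximated with `process.getActiveResourcesInfo()`, which recent Node.js versions document as experimental. Note the assumption: it lists active resources (handles, requests and timers), so it is close to, but not the same as, the handle metric quoted above. `countByType` and `activeResourceCounts` are illustrative names.

```javascript
// Counts how many times each type name appears in the list.
function countByType(types) {
  const counts = {};
  for (const type of types) {
    counts[type] = (counts[type] || 0) + 1;
  }
  return counts;
}

// Groups the resources currently keeping the event loop alive by type,
// e.g. TCP sockets or timers. Returns {} on Node versions without the API.
function activeResourceCounts() {
  if (typeof process.getActiveResourcesInfo !== 'function') return {};
  return countByType(process.getActiveResourcesInfo());
}
```

A sudden growth of socket-type entries would point at either incoming connections from the load balancer or outgoing connections to 3rd parties.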

Code improvements: batch writes
Kafka write example.
Batch operations are also supported by Kinesis, DynamoDB, Aerospike and many more

Batch writes example
Can be applied to any 3rd party that supports batch writes
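The slide's Kafka code is not preserved, so the following is a generic sketch of the idea rather than the author's implementation. `BatchWriter`, `maxSize` and `maxWaitMs` are hypothetical names, and `writeBatch` stands for whatever batch call the 3rd party client offers.

```javascript
// Buffers items and sends them in one call, either when the buffer is full
// or when the oldest buffered item has waited `maxWaitMs`.
class BatchWriter {
  constructor(writeBatch, { maxSize = 100, maxWaitMs = 1000 } = {}) {
    this.writeBatch = writeBatch; // async function taking an array of items
    this.maxSize = maxSize;
    this.maxWaitMs = maxWaitMs;
    this.buffer = [];
    this.timer = null;
  }

  add(item) {
    this.buffer.push(item);
    if (this.buffer.length >= this.maxSize) {
      return this.flush();
    }
    if (!this.timer) {
      this.timer = setTimeout(() => {
        this.flush().catch(console.error);
      }, this.maxWaitMs);
      this.timer.unref(); // a pending flush timer should not block shutdown
    }
    return Promise.resolve();
  }

  async flush() {
    clearTimeout(this.timer);
    this.timer = null;
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = []; // new items go to a fresh buffer while the write is in flight
    await this.writeBatch(batch);
  }
}
```

With this shape, 1,000 incoming events and `maxSize = 100` turn into 10 network calls instead of 1,000. A production version would also call `flush()` on shutdown and decide what to do with a batch that fails to write.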

Logs, what can go wrong?
In highload this can become 100_000 * 3_600 = 0.36B logs per hour
- How much would you pay DataDog for this?
- What network load will this create?
- What CPU load will this create?
- How would you navigate through 0.36B logs per hour?
Mitigation 1: sample errors
You don’t need all 100_000 errors in your logs
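Sampling can be sketched as a thin wrapper around an existing log function. `createSampledLogger` is a hypothetical helper, and `sampleRate` is a fraction between 0 and 1; the random source is injectable only to make the behaviour easy to check.

```javascript
// Wraps `log` so that only a fraction `sampleRate` of calls is actually logged.
// Returns true when the call was logged and false when it was skipped.
function createSampledLogger(log, sampleRate, random = Math.random) {
  return function logSampled(...args) {
    if (random() < sampleRate) {
      log(...args);
      return true;
    }
    return false;
  };
}
```

For example, `createSampledLogger(console.error, 0.001)` would keep roughly 1 in 1,000 error messages.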

Mitigation 2: store statistics of errors
It’s important to know when errors happened and how many you had
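A minimal in-memory sketch of such statistics, assuming errors are identified by a stable name and grouped per minute. `ErrorStats` is a hypothetical class; in a setup with Prometheus, a counter metric would typically play this role instead.

```javascript
const MS_PER_MINUTE = 60000;

// Counts errors per minute and per error name, so "when" and "how many"
// stay known even if most individual log lines are dropped.
class ErrorStats {
  constructor() {
    this.counts = new Map();
  }

  record(name, now = Date.now()) {
    const minuteStart = Math.floor(now / MS_PER_MINUTE) * MS_PER_MINUTE;
    const key = `${new Date(minuteStart).toISOString()} ${name}`;
    this.counts.set(key, (this.counts.get(key) || 0) + 1);
  }

  snapshot() {
    return Object.fromEntries(this.counts);
  }
}
```

Every error is counted, which costs one map update, while only a sampled subset needs to be written out as a full log entry.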

Now combine these methods
- Error messages should be persistent
- You will know the exact number of events that happened
- You can still find details about the error and where it happened
- You should tune the log rate to your load. It can be any number from 0.00001% to 100%

Conclusion
- Horizontal scaling is the most effective way to handle more requests
- Use as few servers as possible
- Use batch operations when possible
- Log only as much as you need
- Offload heavy jobs to “offline workers”
- Eliminate long blocking operations
- Monitor everything

THANK YOU!
Time for questions!
Andrii Shumada
More talks:
https://eagleeye.github.io
