.NET Fest 2019. Leonid Molotiievskyi. DotNet Core in production

In this talk I will share the experience we gained running microservices in a production K8S cluster. I will also outline the main problems our team ran into while diagnosing them and, most importantly, what we did to avoid them in the future. I will answer the questions: Why did we migrate to the cloud? Why did dotNet Core 2.2 cause a pile of problems? This talk will save hundreds of hours for your developers and for your DevOps team, whose life can resemble a nightmare.


  1. .NET Core in production. By Leonid Molotiievskyi. KYIV 2019. .NET CONFERENCE #1 IN UKRAINE
  2. About me • Hands-on software architect and technology consultant • Good at splitting a monolith into microservices • Built a huge enterprise financial solution from scratch • Technical guy who believes that the right people decisions are more important than technological ones • Speaker and mentor
  3. Agenda: spoilers about what we are going to talk about. Context overview: the environment that we used to live with. Scaling: how did we scale our services?
  4. Hell for the DevOps team: do we solve the right problem? Useful advice: the things that can help you to resolve the problem. Lessons learned: how can we benefit in the future? Q&A: questions and answers.
  5. Context overview. Several statements about the project
  6. Context overview • Financial domain • 25+ microservices • Team of 70+ people • 20+ environments • Three versions in support, one in development
  7. Solution overview: managing workflows
  8. Scaling. How did we scale our services?
  9. Notification service
  10. Solution? • A set of dummy queues is left behind after each descale/redeploy
  11. Gateway: infinite redirect • Where do we store the keys for cookies?
  12. Gateway: infinite redirect solution
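The transcript does not include the fix shown on slide 12. Slide 11 points at cookie key storage, so a common remedy, assuming the gateway uses ASP.NET Core Data Protection for its cookies, is to make every replica share one key ring. The sketch below is an illustration under that assumption, not the author's code; the class name, the application name `gateway`, and the directory `/var/gateway/dp-keys` are hypothetical.

```csharp
using System.IO;
using Microsoft.AspNetCore.DataProtection;
using Microsoft.Extensions.DependencyInjection;

public static class GatewayDataProtection
{
    // Hypothetical path: a volume mounted into every gateway pod.
    private const string SharedKeyDirectory = "/var/gateway/dp-keys";

    public static IServiceCollection AddSharedCookieKeys(this IServiceCollection services)
    {
        services.AddDataProtection()
            // Every replica must use the same application name,
            // otherwise the key rings are isolated from each other.
            .SetApplicationName("gateway")
            // Keep the key ring outside the container file system so that
            // all replicas read the same keys and the keys survive a redeploy.
            .PersistKeysToFileSystem(new DirectoryInfo(SharedKeyDirectory));

        return services;
    }
}
```

Calling `services.AddSharedCookieKeys()` from `ConfigureServices` is enough to wire it in. Without a shared key ring, a cookie issued by one replica cannot be decrypted by another, which typically shows up as a login redirect loop once the service is scaled out.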
  13. Hell for the DevOps team. What technology decisions helped us to survive
  14. Each morning… • The Dev/Staging/Prod cluster is down • RabbitMQ/Mongo/Consul/Prometheus is not operational • The fire-fighter team is on duty
  15. Greedy service
  16. Queues are growing… - 1 • “The TTL is too small”, or?
  17. Queues are growing… - 2 • A queue has a set of consumers • Service A consumes the message • Service A starts processing the message • The health check of the consumer fails due to high load on service A, a network issue, an OOM kill, etc. • A duplicate message appears in the queue
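The sequence on slide 17 means a consumer can receive the same message more than once. The slides do not say how the team handled this; one generic defence is an idempotent consumer that remembers the ids it has already processed. A minimal sketch, assuming each message carries a unique id (the `Message` and `IdempotentConsumer` types are hypothetical, and the in-memory set stands in for a shared store):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical message shape; the talk does not describe the real contract.
public sealed class Message
{
    public Message(string id, string body) { Id = id; Body = body; }
    public string Id { get; }
    public string Body { get; }
}

public sealed class IdempotentConsumer
{
    // In-memory and not thread-safe: for illustration only. With several
    // service instances the processed ids would have to live in a shared store.
    private readonly HashSet<string> _processedIds = new HashSet<string>();
    private readonly Action<Message> _handler;

    public IdempotentConsumer(Action<Message> handler)
    {
        _handler = handler ?? throw new ArgumentNullException(nameof(handler));
    }

    // Returns true when the handler ran, false when the message was a duplicate.
    public bool Handle(Message message)
    {
        if (_processedIds.Contains(message.Id))
        {
            return false;
        }

        _handler(message);

        // Record the id only after the handler succeeded, so a message
        // whose processing failed is retried on redelivery.
        _processedIds.Add(message.Id);
        return true;
    }
}
```

With this in place a redelivered copy of a message is acknowledged without repeating its side effects.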
  18. OOM Killed issue • .NET Core 2.2 doesn’t respect Docker limits: https://github.com/aspnet/AspNetCore/issues/3409 https://github.com/dotnet/coreclr/issues/18971 • “Server GC was designed with the assumption that the process using Server GC is the dominant process on the machine. By default it uses as many heaps as there are # of processors on the machine.”
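Since slide 18 ties the OOM kills to Server GC sizing itself by processor count, it helps to log what the runtime actually sees inside the container. A minimal sketch using only `System.Runtime.GCSettings` and `Environment.ProcessorCount`; the `GcDiagnostics` class is hypothetical and is not shown in the talk.

```csharp
using System;
using System.Runtime;

public static class GcDiagnostics
{
    // Builds a one-line summary of the GC mode and the processor count
    // visible to this process. Intended to be logged once at startup.
    public static string Describe()
    {
        return $"ServerGC={GCSettings.IsServerGC}, " +
               $"LatencyMode={GCSettings.LatencyMode}, " +
               $"ProcessorCount={Environment.ProcessorCount}";
    }
}
```

If the summary shows `ServerGC=True` on a pod with a small memory limit, one option is to switch to workstation GC, either with `<ServerGarbageCollector>false</ServerGarbageCollector>` in the project file or with the `COMPlus_gcServer=0` environment variable. Whether the team took this route is not stated in the slides.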
  19. Let’s fix the issue by upgrading to .NET Core 3.0? https://github.com/mongodb/mongo-csharp-driver/pull/372/files
  20. Socket file descriptor leak in HttpClient
  21. Docker: no space left on the device • level=info msg="[8] System error: write /sys/fs/cgroup/docker/01f5670fbee1f6687f58f3a943b1e1bdaec2630197fa4da1b19cc3db7e3d3883/cgroup.procs: no space left on device"
  22. Reason:
  23. Prometheus is down
  24. Useful advice. What can prevent nasty situations
  25. What can help you to find them? Configure monitoring to track: • Memory consumption • CPU consumption • Number of threads on the worker node • Number of open socket descriptors per node/pod • Connection refused errors • Correlation IDs in logs • Number of messages in queues • Number of consumers per queue
  26. Use the standard health check middleware
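For slide 26, the standard health check middleware ships with ASP.NET Core 2.2 and later. A minimal sketch of wiring it up, assuming a conventional `Startup` class; the `/health` path is an assumption, so use whatever path the cluster probes are configured to call.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        // Registers the built-in health check services.
        // Individual checks can be chained onto the returned builder.
        services.AddHealthChecks();
    }

    public void Configure(IApplicationBuilder app)
    {
        // Assumed path. With no checks registered the endpoint
        // reports the service as healthy whenever it can respond.
        app.UseHealthChecks("/health");
    }
}
```

Using the built-in middleware keeps the probe endpoint consistent across services, which matters when 25+ microservices share the same liveness and readiness probe configuration.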
  27. Set up the environment so that… • Infrastructure services have an HA setup • At least two instances of each service are deployed • Monitoring and alerting are set up • “Temporary data” disappears after redeployment • Nothing is configured manually
  28. Lessons learned. What we get from it
  29. Lessons learned • The “do it as simple as possible” principle doesn’t work; “do it the smart way” works • Think about application scaling from the beginning • Know about the open issues in your target framework • Do not blame the DevOps team; try to help them find out what the reason is
  30. Q&A. Follow me: @lmolotii
  31. THANKS FOR WATCHING!
