Using Coroutines to Create Efficient, High-Concurrency Web Applications
6. Meebo Bar
  1000+ sites
  Quantcast: 197 MM monthly uniques*
  LOTS of pageviews
  LOTS of ad requests
  * http://bit.ly/xAPXx
7. Meebo’s Ad Server
  Given
    User features
    Available ads
  Objective
    Maximize revenue
      P(click)
      Price
    Satisfy advertisers
      Respect targeting
      Smooth campaign delivery
  Complex application
  Lots of concurrent requests
9. Sample App: FortuneTeller
  Given
    Username
    Available fortunes
  Objective
    Select fortune for user
    JaccardSimilarity(username, fortune)
  username=PyConIsForLovers
  “Generosity and perfection are your everlasting goals.”
17. Take Two: Apache mod_wsgi
  Using mpm_prefork
  Worker processes handle requests
  One concurrent request per process
  Memory cached between requests
  O/S schedules CPU
18. Take Two: Apache mod_wsgi
  Advantages
    Straightforward, synchronous code
    Cached memory
  Disadvantages
    Resource inefficient
      Need working set in each process
      Cold cache on restart
    Managing worker count
      Too few: 502
      Too many: OOM? Database DoS?
21. Take Three: Twisted
  Asynchronous framework
    Events and callbacks
    Twisted orchestrates context switches
  Twisted server
    Single event loop
    Concurrent requests
22. Quick Break: Event Loops
    s = socket.socket(…)
    s.setblocking(ISBLOCKING)
    s.connect((HOST, PORT))
    greeting = s.recv(1024)
    s.close()
  Blocking
    Wait for data
  Nonblocking
    Initiate, return immediately
      Data (if available)
      Exception: “I’m not done yet”
    Requires more plumbing
23. Quick Break: Event Loops
  Nonblocking sockets in an event loop
  1. f(x):
       s = NonBlockingSocket(…)
       greeting = s.recv(1024)
       print x, “|”, greeting
  2. Call recv().
  3. Create context, add to the event loop.
  4. Process events that are ready (select/poll).
  5. Return to context when data is ready.
  6. “80 | Hello from socket s!”
  Events
    fd=5, fp=g, {s: ‘hi’, a: 5}
    fd=2, fp=f, {x: 8080}
    fd=3, fp=myfunc, {}
    fd=18, fp=f, {x: 80}
24. Take Three: Twisted
  Asynchronous framework
    Events and callbacks
    Twisted orchestrates context switches
  Twisted server
    Single event loop
    Concurrent requests
25. Take Three: Twisted
  Advantages
    Shared memory
    User space context switches
  Disadvantages
    Develop asynchronously
    Stuck in the framework
    Asynchronous libraries
    No I/O in C
    Unfair scheduling
    Using multiple cores
28. Take Four: gevent + gunicorn
  gevent
    Networking library
    Uses event loop
    Synchronous API
    Synchronous code running asynchronously
    Monkey patching
      Rewrites standard modules
    Coroutines for function context
      Lightweight threads, no stack
      greenlet implementation
29. Take Four: gevent + gunicorn
  gunicorn (“Green Unicorn”)
    Lightweight WSGI server
    Multiple worker processes
    Share queued requests
    gevent support
30. Take Four: gevent + gunicorn
  Advantages
    Best of both worlds!
    mod_wsgi
      Straightforward, synchronous code
      No framework, just python
      Multicore support
    Twisted
      Shared memory
      User space context switches
  Disadvantages
    Pure-python libraries
    Unfair scheduling
37. “Evented” Development
  Synchronous code still runs asynchronously
  Requests aren’t independent
  Things to keep in mind
    Duplicate work
    Socket caching
    CPU hogging
38. gunicorn + gevent in Production
  Managing gunicorn
    greins
    Randall Leeds (tilgovi): github/meebo/greins
    Multiple apps
    URL routing
    Server hooks
      Worker launch
      Pre/post requests
    Daemon interface
  Debugging gevent
    gevent-profiler
    Shaun Lindsay (srlindsay): github/meebo/gevent-profiler
    Execution trace
    Time spent
Introduction: Matt Spitz, Software Engineer at Meebo. Here today to talk about building web applications in Python and the pros and cons of the various ways we can serve them.
Users make requests to an application, which uses a shared storage backend.
Same thing, just lots and lots and lots of concurrent requests
With such a large-scale application, small optimizations can have a huge impact:
- Save money on hardware (machines, RAM, CPU)
- Faster response time, better user experience
- Handling more concurrent requests
- Substantially decrease impact on shared resources
One example of a high-concurrency web application is the ad server we run at Meebo. Before I talk about the ad server, let me introduce the Meebo Bar.
The Meebo Bar is deployed to our partner sites. It offers a neat way to share content on the site and lets users chat with other members of the site.
Show off the chat in the corner, the sharing buttons, and the ad unit. Can’t give you numbers, but suffice it to say that any ad server to which you’re making those calls can be considered a “high-concurrency web application”.
- Selecting the ad a user is most likely to click on
- Serving the most valuable ads (e.g. highest CPC)
- Respecting whatever targeting the advertisers have selected
- Ensuring smooth, complete delivery for each ad campaign
The ad server is a pretty complicated beast, and I think that going through it wouldn’t really help in making my point for this talk, so I wrote a sample application that has a similar structure and resource-usage patterns.
Describe JaccardSimilarity: size(intersection(x, y)) / size(union(x, y)). Super arbitrary, just to represent some CPU processing in the application. SHOW OFF THE CODE (make sure to show off the user fortune caching).
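The similarity measure described above can be sketched in a few lines. This is a minimal illustration, not the talk's actual code: the source does not say what the two sets contain, so comparing the sets of lowercase characters is an assumption, and `select_fortune` is a hypothetical helper name.

```python
def jaccard_similarity(x, y):
    """Return size(intersection) / size(union) for two strings.

    Assumption: each string is treated as the set of its lowercase characters.
    """
    a, b = set(x.lower()), set(y.lower())
    union = a | b
    if not union:
        return 0.0  # two empty strings: define the similarity as 0
    return len(a & b) / len(union)


def select_fortune(username, fortunes):
    """Hypothetical helper: pick the fortune most similar to the username."""
    return max(fortunes, key=lambda fortune: jaccard_similarity(username, fortune))
```

Scoring every fortune for every request is deliberately wasteful: it stands in for the CPU work a real application would do per request.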
We’re gonna try four different serving implementations
How difficult is it to write code for these applications?
To what extent do these applications allow us to use 3rd-party libraries?
How efficient is the application in terms of memory?
Can we take advantage of multi-core machines?
SHOW OFF THE CODE
Simple to write, and requests don’t affect one another. On the other hand, the whole working set (all fortunes) must be reloaded with each request, and there is no database connection caching. It’s a start, but it doesn’t scale.
Before I show you a performance graph, I want to go over the benchmarks. There is a 25ms delay on the interface between guest and host to exaggerate the effects of I/O on response time.
8 processes maximum. Requires loading all fortunes with each request.
- Apache spins up a number of worker processes to handle requests
- Workers handle a configurable number of requests before being replaced
- Workers handle exactly one request at a time
- Memory is cached in the worker, so we can re-use the set of fortunes between requests
- Operating system handles scheduling
MAKE SURE TO SHOW OFF THE HANDLER
- Using almost the same simple, synchronous code as we had in the CGI
- Memory is cached across requests in the same worker
- No shared memory between workers
- Need to load the set of all fortunes in each worker
- More workers require more RAM
- Each worker load requires a DB request
- Hammers the database on Apache restart
Using 8 worker processes
Twisted is an asynchronous framework for building network applications.
- Developer structures code as events and callbacks
- Twisted orchestrates context switches among requests, typically on things that take a long time (I/O)
Twisted server
- Single event loop => single process
- Handles multiple requests simultaneously in the event loop
- And since we’re all in one process, memory is shared among requests
Some of this may be review, but it’s important that everyone understands this.
- Blocking: connect and recv wait until their actions complete before returning
- Nonblocking: connect and recv initiate the action (if it hasn’t been already) and return the data or raise an exception immediately
- Requires a lot more plumbing than the example above
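The difference between the two modes can be seen with a pair of connected local sockets. This is a small sketch using only the standard library; `socket.socketpair()` stands in for the HOST/PORT connection on the slide, and the variable names are illustrative.

```python
import socket

# A connected pair of local sockets stands in for the client/server link.
left, right = socket.socketpair()

# Nonblocking: recv returns immediately. No data has been sent yet, so it
# raises instead of waiting (the "I'm not done yet" case).
right.setblocking(False)
try:
    right.recv(1024)
    early_result = "data"
except BlockingIOError:
    early_result = "not ready"

# Blocking: recv waits until data is available, then returns it.
left.sendall(b"hello")
right.setblocking(True)
greeting = right.recv(1024)

left.close()
right.close()
```

The extra plumbing the notes mention is everything a nonblocking caller has to do after the exception: remember what it was doing and try again once the socket is readable.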
…so let’s go back to this slide (the next one)
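The event-loop steps (register a context for each socket, process whichever events are ready, return to the right context when data arrives) can be sketched with the standard `selectors` module. This is an illustration only, not how Twisted is implemented; `make_handler` and the use of `socket.socketpair()` are assumptions made so the example runs on its own.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
output = []


def make_handler(x):
    """Build the context for one pending recv: the callback plus its variable x."""
    def on_readable(sock):
        greeting = sock.recv(1024)
        output.append(f"{x} | {greeting.decode()}")
        sel.unregister(sock)
        sock.close()
    return on_readable


# Register one nonblocking reader per "request" with the event loop.
writers = []
for x in (80, 8080):
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    sel.register(reader, selectors.EVENT_READ, make_handler(x))
    writers.append(writer)

# Data arrives in a different order than the sockets were registered.
writers[1].sendall(b"hi from 8080")
writers[0].sendall(b"hi from 80")

# The event loop: wait for ready sockets, then resume the matching context.
pending = len(writers)
while pending:
    for key, _events in sel.select(timeout=1):
        key.data(key.fileobj)
        pending -= 1

for writer in writers:
    writer.close()
sel.close()
```

Each registered entry plays the role of a row in the slide's event table: a file descriptor, a function to call, and the saved variables that function needs.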
Twisted is a framework built around an event loop.
- Provides a nice interface for setting up your functions and callbacks (for success or error)
- Keeps track of multiple execution paths simultaneously, just as we saw in the previous example
The big problem with Twisted is that you can’t just plug in your synchronous app. You have to set up these events and callbacks for every piece of code that might block. MAKE SURE TO SHOW OFF THE CODE AND HOW MUCH OF A PAIN IT IS
Advantages
- Memory is shared among requests (we only have to load the fortunes once to service many simultaneous requests)
- Context switches happen in user space (fast)
Disadvantages
- Need to rewrite code to be asynchronous. Guido sez: “I hate callback-based programming.” It’s hard to wrap your brain around.
- Stuck in the framework: everything has to be asynchronous, and you have to use Twisted’s standard libraries, which may not behave quite as you’d like
- 3rd-party libraries must also be asynchronous
- No I/O in C libraries (at least not out of the box)
- CPU-intense requests monopolize the processor. mod_wsgi: the O/S handles scheduling, processes are scheduled at any time, and CPU time is shared “fairly”. Twisted: CPU is scheduled explicitly, so CPU-bound blocks of code prevent other requests from running.
- Taking advantage of multiple cores isn’t trivial: load balancer? multiprocessing module?
Note that Twisted is running only on a single core
gevent
- Networking library using libevent
- Has an event loop, but its API is synchronous
- Transforms synchronous applications to be asynchronous automatically!!!
- “Monkey patches” Python system modules (socket): rewrites socket calls to set up a callback and a context after writing the request to the socket
- Function context in coroutines: think of coroutines as lightweight threads (pointer to code + context, no stack; e.g. closures and generators)
- Uses an event loop to manage all concurrent requests
- Context switch on network I/O (just like Twisted)
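To make "coroutines as lightweight threads" concrete, here is a toy round-robin scheduler built on plain Python generators. It is an analogy only: gevent uses greenlets rather than generators, and the names `request` and `run` are hypothetical. Each `yield` marks the point where a real request would be waiting on network I/O.

```python
from collections import deque


def request(name, steps, log):
    """A toy request handler; each yield stands in for waiting on network I/O."""
    for step in range(steps):
        log.append(f"{name}: step {step}")
        yield


def run(coroutines):
    """Round-robin scheduler: resume each coroutine until it yields or finishes."""
    ready = deque(coroutines)
    while ready:
        coroutine = ready.popleft()
        try:
            next(coroutine)
        except StopIteration:
            continue  # this request is finished; drop it
        ready.append(coroutine)


log = []
run([request("A", 2, log), request("B", 2, log)])
# log is now ["A: step 0", "B: step 0", "A: step 1", "B: step 1"]
```

The two requests interleave inside one process and one OS thread. A handler that never yields (a CPU-bound loop) would keep every other request waiting, which is the unfair-scheduling drawback shared with Twisted.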
gunicorn
- Fast, lightweight WSGI server written by Benoit
- Uses multiple workers to handle requests
- Big win: supports gevent workers out of the box
- Each worker maintains a pool of coroutines to handle incoming requests
- Those workers share memory among requests
At this point, we look at the code.
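For readers without the talk's code, a synchronous WSGI application of this general shape might look like the sketch below. It is not the FortuneTeller source: `load_fortunes` is a hypothetical stand-in for the database query, and the module-level cache simply illustrates a working set that every request handled by the same worker can reuse.

```python
FORTUNES = None  # working set, cached in the worker process between requests


def load_fortunes():
    # Hypothetical stand-in for the database query that loads all fortunes.
    return ["Generosity and perfection are your everlasting goals."]


def application(environ, start_response):
    """Plain synchronous WSGI callable; nothing here is gevent-specific."""
    global FORTUNES
    if FORTUNES is None:
        FORTUNES = load_fortunes()
    body = FORTUNES[0].encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Assuming the file is saved as `fortuneteller.py` (an illustrative name), it could be served with gevent workers via `gunicorn --worker-class gevent --workers 4 fortuneteller:application`.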
Advantages: best of both worlds!
- mod_wsgi: easy to write; no framework to do everything asynchronously, just Python; can take advantage of multiple cores
- Twisted: shared memory among requests within each worker; context switches in user space
Disadvantages (similar to Twisted)
- No I/O in C libraries
- CPU-intense requests monopolize the processor
gunicorn_1 is comparable to Twisted. Negligible performance impact when the application is made asynchronous.
gunicorn_4-8 is faster than mod_wsgi. Making context switches deterministically and in user space is more efficient than OS scheduling.
gevent takes care of transforming synchronous code, but it’s still executed in an event loop. Synchronous code is not necessarily executed synchronously.
- Duplicated loads: simultaneous database requests. 1) No fortunes? Load up the fortunes! 2) No fortunes? Load up the fortunes! => use “events” to protect against duplicate effort
- Socket caching: can’t naively cache sockets. Can’t use the same socket for two simultaneous operations; must create a new socket per connection or use a pool
- CPU hogging: might want to offload CPU-intense things to another daemon/process
Managing gunicorn: greins (Randall Leeds)
- Enables running multiple apps in a single gunicorn instance
- Routes traffic based on URL
- Allows for global and per-app server hooks: on worker startup (preloading a working set) and pre/post requests (Apache-style request logging)
- Provides a standard start/stop/reload/restart interface to gunicorn
Debugging gevent applications: gevent-profiler (Shaun Lindsay)
- Provides a linear trace of all function calls and context switches
- Analyzes where CPU time is spent in a given application
Blocking code is easy to understand, but traditional deployments aren’t very efficient. Asynchronous applications make the best use of resources, but they’re a pain to write. Running gevent workers in gunicorn is both simple and efficient, as it allows you to write blocking code that is converted to be asynchronous automatically.
At Meebo, we’ve found this setup to be amazingly efficient and reliable, even under extreme load. A number of our mission-critical, high-concurrency web applications have been running under this setup for the last 7 months with no major issues or outages. We’ve been able to save money on hardware with no impact on response time… we even got a Halloween costume out of it.
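The duplicated-load problem and the "event" fix can be sketched as follows. The example swaps in the standard library's `asyncio` for gevent so that it runs without extra dependencies; the idea carries over, but the class and method names are hypothetical and this is not the talk's code.

```python
import asyncio


class FortuneCache:
    """Load the working set once, even if many requests ask at the same time."""

    def __init__(self):
        self.fortunes = None
        self.loading = None   # event that is set once the first load finishes
        self.load_count = 0   # how many times the "database" was actually hit

    async def _load_from_db(self):
        self.load_count += 1
        await asyncio.sleep(0.01)  # a context switch happens here, as on real I/O
        return ["fortune one", "fortune two"]

    async def get(self):
        if self.fortunes is not None:
            return self.fortunes
        if self.loading is None:
            # First request: start the load and let the others wait on the event.
            self.loading = asyncio.Event()
            self.fortunes = await self._load_from_db()
            self.loading.set()
        else:
            # A load is already in flight: wait for it instead of repeating it.
            await self.loading.wait()
        return self.fortunes


async def main():
    cache = FortuneCache()
    results = await asyncio.gather(*(cache.get() for _ in range(5)))
    return cache.load_count, results


load_count, results = asyncio.run(main())
```

Without the event, all five concurrent requests would see an empty cache during the first load and each would issue its own query.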