Presented at PyCon UK 2018 (18 September 2018, Cardiff).
The slides are incomplete.
Recording available at:
https://www.youtube.com/watch?v=-weU0Zy4Yd8
PyConUK 2018 - Journey from HTTP to gRPC
1. A journey from HTTP to gRPC
Tati Al-Chueyr
@tati_alchueyr
PyCon UK 2018
15 September 2018, Cardiff
2. tati_alchueyr.__doc__
● computer engineer from Unicamp
● senior data engineer at BBC (previously engineer at EF, globo.com & Ministry of Science and Technology of Brazil)
● open source enthusiast
● pythonist since 2003
● Amanda’s mummy
3. bbc.datalab.mission
“Bring the BBC’s data together accessible through a common platform, along with flexible and scalable tools to support machine learning to enable content enrichment and deeper personalization”
16. the recommender platform started being built
view orchestrate recommend
content database
the first generation of microservices was developed using
version 1
17. the microservices started to grow up
view orchestrate recommend
content database
they happily learned how to talk to each other using JSON over HTTP
http http http http http
json json json json json
version 1
18. and their contracts were defined using swagger
view orchestrate recommend
content database
version 1
19. however...
view orchestrate recommend
content database
although for some tasks they performed well, for others they would have huge latencies if there were concurrent users
version 1
20. flask_api.py (Flask==1.0.2)

import time
from flask import Flask

app = Flask(__name__)

@app.route("/io")
def io():
    time.sleep(2)
    return "IO bound task completed"

@app.route("/cpu")
def cpu():
    total = 0
    for i in range(19840511):
        total += i*i
    return "CPU bound task completed"

$ FLASK_APP=flask_api.py flask run
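Not part of the original deck: the numbers on the following slides come from boom, a small command-line HTTP load tester, where -c is the concurrency and -n the total number of requests; assuming a standard setup it can presumably be installed with pip:

$ pip install boom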
21. $ boom http://127.0.0.1:5000/io -c 10 -n 10
Running 10 queries - concurrency 10
-------- Results --------
Successful calls 10
Total time 2.0495 s
Average 2.0302 s
Fastest 2.0227 s
Slowest 2.0377 s
Amplitude 0.0150 s
Standard deviation 0.004716
RPS 4
BSI :(
-------- Status codes --------
Code 200 10 times.
-------- Legend --------
RPS: Request Per Second
BSI: Boom Speed Index
22. $ boom http://127.0.0.1:5000/cpu -c 10 -n 10
Running 10 queries - concurrency 10
-------- Results --------
Successful calls 10
Total time 19.2738 s
Average 18.8230 s
Fastest 18.1451 s
Slowest 19.2623 s
Amplitude 1.1172 s
Standard deviation 0.379198
RPS 0
BSI :(
-------- Status codes --------
Code 200 10 times.
-------- Legend --------
RPS: Request Per Second
BSI: Boom Speed Index
23. why?
Flask uses Werkzeug as its WSGI server
● By default Werkzeug uses threads
Python has the GIL (Global Interpreter Lock)
● Only one thread can execute Python bytecode at a time
● the threads fight for the CPU, blocking one another all the way and returning the responses to the users at almost the same time
● This does not affect I/O, but it does affect computations (a small sketch follows)
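A minimal sketch (not from the slides) of the effect described above: the same CPU-bound function takes roughly as long in a thread pool as it does sequentially, because only one thread can hold the GIL at a time.

import time
from concurrent.futures import ThreadPoolExecutor

def cpu():
    # same CPU-bound loop used by the /cpu endpoint
    total = 0
    for i in range(19840511):
        total += i * i
    return total

start = time.time()
for _ in range(4):
    cpu()
print("sequential: %.1fs" % (time.time() - start))

start = time.time()
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(lambda _: cpu(), range(4)))
print("4 threads:  %.1fs" % (time.time() - start))   # roughly the same wall time (or worse)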
26. a gunicorn microservice
version 2
[diagram: a microservice = one master process + n workers, each marked "s" for synchronous]
and with synchronous gunicorn workers, each microservice could now respond to n concurrent requests
27. flask_api.py (unchanged; Flask==1.0.2, gunicorn==19.9.0)

import time
from flask import Flask

app = Flask(__name__)

@app.route("/io")
def io():
    time.sleep(2)
    return "IO bound task completed"

@app.route("/cpu")
def cpu():
    total = 0
    for i in range(19840511):
        total += i*i
    return "CPU bound task completed"

$ gunicorn --bind 0.0.0.0:5000 --workers 1 flask_api:app   (changed)
28. $ boom http://127.0.0.1:5000/io -c 10 -n 10
Running 10 queries - concurrency 10
-------- Results --------
Successful calls 10
Total time 20.0589 s
Average 11.0316 s
Fastest 2.0171 s
Slowest 20.0579 s
Amplitude 18.0407 s
Standard deviation 5.754321
RPS 0
BSI :(
-------- Status codes --------
Code 200 10 times.
-------- Legend --------
RPS: Request Per Second
BSI: Boom Speed Index
1 x s (sync worker)
29. $ boom http://127.0.0.1:5000/io -c 10 -n 10
Running 10 queries - concurrency 10
-------- Results --------
Successful calls 10
Total time 10.0679 s
Average 6.0398 s
Fastest 2.0494 s
Slowest 10.0412 s
Amplitude 7.9918 s
Standard deviation 2.823781
RPS 0
BSI :(
-------- Status codes --------
Code 200 10 times.
-------- Legend --------
RPS: Request Per Second
BSI: Boom Speed Index
2 x s (sync workers)
30. $ boom http://127.0.0.1:5000/io -c 10 -n 10
Running 10 queries - concurrency 10
-------- Results --------
Successful calls 10
Total time 2.0541 s
Average 2.0300 s
Fastest 2.0233 s
Slowest 2.0432 s
Amplitude 0.0199 s
Standard deviation 0.007658
RPS 4
BSI :(
-------- Status codes --------
Code 200 10 times.
-------- Legend --------
RPS: Request Per Second
BSI: Boom Speed Index
10 x s (sync workers)
31. $ boom http://127.0.0.1:5000/cpu -c 10 -n 10
Running 10 queries - concurrency 10
-------- Results --------
Successful calls 10
Total time 18.3119 s
Average 10.0470 s
Fastest 1.8415 s
Slowest 18.3008 s
Amplitude 16.4593 s
Standard deviation 5.244141
RPS 0
BSI :(
-------- Status codes --------
Code 200 10 times.
-------- Legend --------
RPS: Request Per Second
BSI: Boom Speed Index
1 x s (sync worker)
32. $ boom http://127.0.0.1:5000/cpu -c 10 -n 10
Running 10 queries - concurrency 10
-------- Results --------
Successful calls 10
Total time 10.5875 s
Average 6.2530 s
Fastest 2.8555 s
Slowest 10.5788 s
Amplitude 7.7232 s
Standard deviation 2.670508
RPS 0
BSI :(
-------- Status codes --------
Code 200 10 times.
-------- Legend --------
RPS: Request Per Second
BSI: Boom Speed Index
3 x s (sync workers)
33. $ boom http://127.0.0.1:5000/cpu -c 10 -n 10
Running 10 queries - concurrency 10
-------- Results --------
Successful calls 10
Total time 10.1825 s
Average 7.6599 s
Fastest 5.9517 s
Slowest 10.1776 s
Amplitude 4.2260 s
Standard deviation 2.038286
RPS 0
BSI :(
-------- Status codes --------
Code 200 10 times.
-------- Legend --------
RPS: Request Per Second
BSI: Boom Speed Index
5 x s (sync workers)
34. why?
Gunicorn, by default, spawns a synchronous process for each worker
● the number of workers limits the number of concurrent requests; for this reason I/O was affected in a negative way (compared to the previous "no limit")
● for CPU, there is a significant improvement over pure Flask/Werkzeug, since the service can now handle requests concurrently without having to wait for all of them to finish
● there is a limit to the number of workers (suggested: (2 x $num_cores) + 1); after a point they will start thrashing system resources, decreasing the throughput of the entire system (a quick sketch follows)
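Not from the deck: a minimal sketch of that suggested worker count on a Linux host, assuming nproc reports the number of CPU cores:

$ gunicorn --bind 0.0.0.0:5000 --workers $((2 * $(nproc) + 1)) flask_api:app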
36. stronger forces decided to replace them by
view orchestrate recommend
content database
version 1
with the promise of higher performance and free type checking
37. along came the new generation...
view orchestrate recommend
content database
there was a learning curve towards gRPC
version 3
38. the microservices started to grow up
view orchestrate recommend
content database
and in two months' time most of the microservices started talking protocol buffers over tcp
http tcp tcp tcp http
json pb3 pb3 pb3 json
version 3
pb3 json
39. grpc.proto

syntax = "proto3";

message Empty {
}

service Sample {
    rpc IntenseProcess(Empty) returns (Empty) {}
    rpc IntenseIO(Empty) returns (Empty) {}
}

$ pip install grpcio==1.15.0 grpcio-tools==1.15.0
$ python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. grpc.proto
$ ls
grpc_pb2.py
grpc_pb2_grpc.py

The 2 in pb2 indicates that the generated code follows version 2 of the Protocol Buffers Python API. It has no relation to the Protocol Buffers language version, which is the one indicated by syntax in the .proto file.
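Not in the deck: a quick way to check what those generated modules expose, assuming the protoc invocation above succeeded in the current directory:

$ python -c "import grpc_pb2, grpc_pb2_grpc; print(grpc_pb2.Empty, grpc_pb2_grpc.SampleStub, grpc_pb2_grpc.SampleServicer)"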
40. grpc_server.py (grpcio==1.15.0, grpcio-tools==1.15.0)

import time
from concurrent import futures

import grpc

import grpc_pb2
import grpc_pb2_grpc


class SampleServicer(grpc_pb2_grpc.SampleServicer):

    def IntenseProcess(self, request, context):
        total = 0
        for i in range(19840511):
            total += i*i
        return grpc_pb2.Empty()

    def IntenseIO(self, request, context):
        time.sleep(2)
        return grpc_pb2.Empty()


server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
grpc_pb2_grpc.add_SampleServicer_to_server(SampleServicer(), server)
print('Starting server. Listening on port 50051.')
server.add_insecure_port('[::]:50051')
server.start()
try:
    while True:
        time.sleep(86400)
except KeyboardInterrupt:
    server.stop(0)

$ python grpc_server.py
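The client side is not shown in this (incomplete) deck. A minimal client sketch, assuming the grpc.proto above has been compiled into grpc_pb2.py / grpc_pb2_grpc.py and the server is listening on port 50051:

import grpc

import grpc_pb2
import grpc_pb2_grpc

# open an insecure channel to the server started above
channel = grpc.insecure_channel('localhost:50051')
stub = grpc_pb2_grpc.SampleStub(channel)

stub.IntenseIO(grpc_pb2.Empty())        # I/O-bound call (~2s), the gRPC counterpart of /io
stub.IntenseProcess(grpc_pb2.Empty())   # CPU-bound call, the gRPC counterpart of /cpu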
48. flask_api.py (unchanged; Flask==1.0.2, gunicorn==19.9.0)

import time
from flask import Flask

app = Flask(__name__)

@app.route("/io")
def io():
    time.sleep(2)
    return "IO bound task completed"

@app.route("/cpu")
def cpu():
    total = 0
    for i in range(19840511):
        total += i*i
    return "CPU bound task completed"

$ gunicorn --bind 0.0.0.0:5000 --workers 1 -k gevent flask_api:app   (changed: gevent worker class)
49. $ boom http://127.0.0.1:5000/io -c 10 -n 10
Running 10 queries - concurrency 10
-------- Results --------
Successful calls 10
Total time 2.0366 s
Average 2.0219 s
Fastest 2.0185 s
Slowest 2.0252 s
Amplitude 0.0066 s
Standard deviation 0.001698
RPS 4
BSI :(
-------- Status codes --------
Code 200 10 times.
-------- Legend --------
RPS: Request Per Second
BSI: Boom Speed Index
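The deck stops here, without a closing "why?" slide for the gevent run above. A rough sketch of the mechanism, assuming (as gunicorn's gevent worker does) that gevent.monkey patches blocking calls such as time.sleep so that waiting greenlets yield to each other:

from gevent import monkey
monkey.patch_all()   # roughly what the gevent worker class does at start-up

import time
import gevent

def fake_io():
    time.sleep(2)    # now cooperative: yields to the other greenlets while waiting

start = time.time()
gevent.joinall([gevent.spawn(fake_io) for _ in range(10)])
print("10 concurrent 2s sleeps took %.1fs" % (time.time() - start))   # ~2s, matching the boom run above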