The document provides an overview of Amazon EC2 instance types and best practices for optimizing performance. It discusses factors to consider when choosing an EC2 instance, how instances deliver performance and flexibility, and tips for making the most of different instance types. The document reviews EC2 instance history, describes virtual CPUs and resource allocation, and provides guidance on topics like NUMA, hugepages, operating systems, and hardware aspects that impact performance.
Velocity 2017: Performance analysis superpowers with Linux eBPF - Brendan Gregg
Talk for Velocity 2017 by Brendan Gregg: Performance analysis superpowers with Linux eBPF.
"Advanced performance observability and debugging have arrived built into the Linux 4.x series, thanks to enhancements to Berkeley Packet Filter (BPF, or eBPF) and the repurposing of its sandboxed virtual machine to provide programmatic capabilities to system tracing. Netflix has been investigating its use for new observability tools, monitoring, security uses, and more. This talk will investigate this new technology, which sooner or later will be available to everyone who uses Linux. The talk will dive deep on these new tracing, observability, and debugging capabilities. Whether you’re doing analysis over an ssh session, or via a monitoring GUI, BPF can be used to provide an efficient, custom, and deep level of detail into system and application performance.
This talk will also demonstrate the new open source tools that have been developed, which make use of kernel- and user-level dynamic tracing (kprobes and uprobes), and kernel- and user-level static tracing (tracepoints). These tools provide new insights for file system and storage performance, CPU scheduler performance, TCP performance, and a whole lot more. This is a major turning point for Linux systems engineering, as custom advanced performance instrumentation can be used safely in production environments, powering a new generation of tools and visualizations."
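Tools in this family (for example BCC's biolatency) typically aggregate events into power-of-two latency histograms inside a BPF map, so only the summary crosses into user space. The aggregation itself can be sketched in plain Python — illustrative only, since the real tools do this in kernel context:

```python
from collections import Counter

def log2_bucket(us: int) -> int:
    """Return the power-of-two bucket a latency (in microseconds) falls into."""
    bucket = 0
    while (1 << (bucket + 1)) <= us:
        bucket += 1
    return bucket

def latency_histogram(samples_us):
    """Aggregate latency samples into log2 buckets, biolatency-style."""
    hist = Counter(log2_bucket(us) for us in samples_us if us > 0)
    return {f"{1 << b}-{(1 << (b + 1)) - 1}us": n for b, n in sorted(hist.items())}

print(latency_histogram([3, 5, 9, 120, 130, 700]))
```

The BPF version keeps only the bucket counters in the kernel, which is why these tools stay cheap enough for production use.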
How Netflix Tunes Amazon EC2 Instances for Performance - CMP325 - re:Invent 2017 - Amazon Web Services
At Netflix, we make the best use of Amazon EC2 instance types and features to create a high-performance cloud, achieving near bare-metal speed for our workloads. This session summarizes the configuration, tuning, and activities for delivering the fastest possible EC2 instances, and helps you improve performance, reduce latency outliers, and make better use of EC2 features. We show how to choose EC2 instance types, how to choose between Xen modes (HVM, PV, or PVHVM), and the importance of EC2 features such as SR-IOV for bare-metal performance. We also cover basic and advanced kernel tuning and monitoring, including the use of Java and Node.js flame graphs and performance counters.
Cloud-Native Apache Spark Scheduling with YuniKorn Scheduler - Databricks
Kubernetes is the most popular container orchestration system, and it is natively designed for the cloud. At Lyft and Cloudera, we have both built next-generation, cloud-native infrastructure based on Kubernetes that supports various distributed workloads.
Anatomy of a Container: Namespaces, cgroups & Some Filesystem Magic - LinuxCon - Jérôme Petazzoni
Containers are everywhere. But what exactly is a container? What are they made from? What's the difference between LXC, systemd-nspawn, Docker, and the other container systems out there? And why should we care about specific filesystems?
In this talk, Jérôme will show the individual roles and behaviors of the components making up a container: namespaces, control groups, and copy-on-write systems. Then, he will use them to assemble a container from scratch, and highlight the differences (and similarities) with existing container systems.
Stop the Guessing: Performance Methodologies for Production Systems - Brendan Gregg
Talk presented at Velocity 2013. Description: When faced with performance issues on complex production systems and distributed cloud environments, it can be difficult to know where to begin your analysis, or to spend much time on it when it isn’t your day job. This talk covers various methodologies, and anti-methodologies, for systems analysis, which serve as guidance for finding fruitful metrics from your current performance monitoring products. Such methodologies can help check all areas in an efficient manner, and find issues that can be easily overlooked, especially for virtualized environments which impose resource controls. Some of the tools and methodologies covered, including the USE Method, were developed by the speaker and have been used successfully in enterprise and cloud environments.
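The USE Method mentioned here checks every resource for Utilization, Saturation, and Errors. A minimal sketch of that checklist in Python — the thresholds below are illustrative choices, not part of the method itself:

```python
def use_check(resources):
    """Apply the USE Method: for each resource, flag high Utilization,
    any Saturation (queueing), and any Errors."""
    findings = []
    for name, m in resources.items():
        if m["utilization"] >= 0.9:
            findings.append(f"{name}: high utilization ({m['utilization']:.0%})")
        if m["saturation"] > 0:
            findings.append(f"{name}: saturated (queue length {m['saturation']})")
        if m["errors"] > 0:
            findings.append(f"{name}: {m['errors']} errors")
    return findings

metrics = {
    "cpu":  {"utilization": 0.95, "saturation": 4, "errors": 0},
    "disk": {"utilization": 0.30, "saturation": 0, "errors": 2},
}
for finding in use_check(metrics):
    print(finding)
```

The value of the method is coverage: every resource gets the same three questions, so problems are not overlooked just because no dashboard happens to chart them.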
Revisiting CephFS MDS and mClock QoS Scheduler - Yongseok Oh
This presentation covers CephFS performance scalability and evaluation results. Specifically, it addresses technical issues such as multi-core scalability, cache size, static pinning, recovery, and QoS.
Cgroups, namespaces and beyond: what are containers made from? - Docker, Inc.
Linux containers are different from Solaris Zones or BSD Jails: they use discrete kernel features like cgroups, namespaces, SELinux, and more. We will describe those mechanisms in depth, as well as demo how to put them together to produce a container. We will also highlight how different container runtimes compare to each other.
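The cgroup membership of any process can be inspected without a container runtime, via `/proc/<pid>/cgroup`. A minimal parser for that file's `hierarchy-id:controller-list:path` format (the sample input below is made up for illustration):

```python
def parse_proc_cgroup(text):
    """Parse /proc/<pid>/cgroup lines into (hierarchy_id, controllers, path)
    tuples. cgroup v2 lines look like '0::/path' (no controller list)."""
    entries = []
    for line in text.strip().splitlines():
        hier, controllers, path = line.split(":", 2)
        entries.append((int(hier), controllers.split(",") if controllers else [], path))
    return entries

# A v1 memory-controller entry and a v2 unified-hierarchy entry:
sample = "12:memory:/docker/abc123\n0::/system.slice/docker.service"
print(parse_proc_cgroup(sample))
```

Reading this file for a containerized process shows exactly which control-group paths the runtime placed it in — one concrete way to see that a "container" is just ordinary kernel bookkeeping.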
High-Performance Networking Using eBPF, XDP, and io_uring - ScyllaDB
In the networking world there are a number of ways to increase performance over naive use of basic Berkeley sockets. These techniques have ranged from polling blocking sockets, to non-blocking sockets driven by epoll, all the way to completely bypassing the Linux kernel for maximum network performance, talking directly to the network interface card with something like DPDK or Netmap. All these tools have their place, and generally occupy a spectrum from convenience to performance. But in recent years, that landscape has changed massively. The tools available to the average Linux systems developer have improved, from the creation of io_uring to the expansion of BPF from a simple filtering language into a full-on programming environment embedded directly in the kernel. Along with that came XDP (eXpress Data Path), the Linux kernel's answer to kernel-bypass networking. AF_XDP is the new socket type created by this feature, and it generally works very similarly to something like DPDK. History lessons out of the way, this talk will look into and discuss the merits of this technology, its place in the broader ecosystem, and how it can be used to attain the highest level of performance possible. The talk will dive into crucial details, such as how AF_XDP works, how it can be integrated into a larger system, and more advanced topics such as request sharding and load balancing. There will be a detailed look at the design of AF_XDP, the eBPF code used, and the userspace code required to drive it all, along with performance numbers from this setup compared to regular kernel networking. And most importantly, how to put all this together to handle as much data as possible on a single modern multi-core system.
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and GPU instance families. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud - Noritaka Sekiyama
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud (Hadoop / Spark Conference Japan 2019)
# English version #
http://hadoop.apache.jp/hcj2019-program/
Talk for SCaLE13x. Video: https://www.youtube.com/watch?v=_Ik8oiQvWgo . Profiling can show what your Linux kernel and applications are doing in detail, across all software stack layers. This talk shows how we are using Linux perf_events (aka "perf") and flame graphs at Netflix to understand CPU usage in detail, to optimize our cloud usage, solve performance issues, and identify regressions. This will be more than just an intro: profiling difficult targets, including Java and Node.js, will be covered, which includes ways to resolve JITed symbols and broken stacks. Included are the easy examples, the hard, and the cutting edge.
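Flame graphs are built from sampled stacks collapsed into a "folded" format: one line per unique stack, frames joined by semicolons (root first), followed by a sample count. That stackcollapse step can be sketched in a few lines of Python (a simplification of the stackcollapse-perf scripts in the FlameGraph toolkit):

```python
from collections import Counter

def collapse_stacks(samples):
    """Fold sampled call stacks into FlameGraph's collapsed format:
    'frame1;frame2;...;leaf count' per unique stack, root frame first."""
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]

samples = [
    ["main", "parse", "read"],
    ["main", "parse", "read"],
    ["main", "render"],
]
for line in collapse_stacks(samples):
    print(line)
```

The resulting lines are exactly what `flamegraph.pl` consumes to render the interactive SVG, where box width is proportional to sample count.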
AWS provides customers a fully managed service for running Apache Flink applications called Amazon Kinesis Data Analytics. Running a hosted service for Apache Flink applications that accept arbitrary Java code from customers can pose unique challenges. One such challenge is issue attribution: how do you determine whether runtime errors are due to errors in a customer's code or to problems with the underlying infrastructure? In this talk, we describe how we automatically categorize errors and either programmatically or manually intervene to restore application availability. This is critical for us to ensure the application availability expected of an AWS service and to maintain sustainable operations. We review how we improved the Apache Flink engine to crisply disambiguate issues, the design choices we made, and how they helped us ensure highly available Apache Flink applications.
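The categorization idea can be illustrated with a toy pattern-matching classifier. The categories and exception patterns below are hypothetical stand-ins chosen for illustration; the actual Kinesis Data Analytics attribution logic is not public:

```python
# Hypothetical patterns, for illustration only.
CUSTOMER_PATTERNS = ("NullPointerException", "ClassCastException", "SerializationException")
INFRA_PATTERNS = ("TimeoutException", "NoResourceAvailableException", "OutOfMemoryError")

def attribute_error(stack_trace: str) -> str:
    """Roughly attribute a runtime error to customer code or infrastructure."""
    if any(p in stack_trace for p in CUSTOMER_PATTERNS):
        return "customer"
    if any(p in stack_trace for p in INFRA_PATTERNS):
        return "infrastructure"
    return "unknown"

print(attribute_error("java.lang.NullPointerException at com.example.MyMapFunction"))
```

A real system would drive different remediations off the category — for example, restarting on infrastructure faults but surfacing customer-code faults back to the application owner rather than retrying forever.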
High performance computing tutorial, with checklist and tips to optimize clus... - Pradeep Redddy Raamana
An introduction to high performance computing: what it is, how to use it, and when to use what. Provides a detailed checklist for building pipelines, plus tips to optimize cluster usage and reduce waiting time in the queue. It also gives a quick overview of resources available in Compute Canada.
This talk covers Kafka cluster sizing, instance type selections, scaling operations, replication throttling and more. Don’t forget to check out the Kafka-Kit repository.
https://www.youtube.com/watch?time_continue=2613&v=7uN-Vlf7W5E
AWS re:Invent 2016: Deep Dive on Amazon EC2 Instances, Featuring Performance ... - Amazon Web Services
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and GPU instance families. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
Deep Dive on Amazon EC2 Instances - January 2017 AWS Online Tech Talks - Amazon Web Services
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and GPU instance families. We will also provide an overview of the newest instances announced at re:Invent, including the latest generation of Memory and Compute Optimized Instances R4 and C5 instances, new Storage Optimized High I/O I3 instances, and new larger T2 instances. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
Learning Objectives:
• Get an overview of the EC2 instance platform, key platform features, and the concept of instance generations
• Learn about the latest generation of Amazon EC2 Instances
• Learn best practices around instance selection to optimize performance
(DVO312) Sony: Building At-Scale Services with AWS Elastic Beanstalk - Amazon Web Services
Learn about Sony's efforts to build a cloud-native authentication and profile management platform on AWS. Sony engineers demonstrate how they used AWS Elastic Beanstalk (Elastic Beanstalk) to deploy, manage, and scale their applications. They also describe how they use AWS CloudFormation for resource provisioning, Amazon DynamoDB for the main database, and AWS Lambda and Amazon Redshift for log handling and analysis. This discussion focuses on best practices, security considerations, tradeoffs, and final architecture and implementation. By the end of the session, you will clearly understand how to use Elastic Beanstalk as a platform to quickly and easily build at-scale web applications on AWS, and how to use Elastic Beanstalk with other AWS services to build cloud-native applications.
AWS re:Invent 2016: The AWS Hero’s Journey to Achieving Autonomous, Self-Heal... - Amazon Web Services
We are all embarking on a journey in the cloud that can be frightening at times, thrilling at others, but at all times filled with pitfalls and scary monsters that threaten the security of our infrastructure, applications, and data. The ultimate reward for all our hard work is to achieve a state of autonomous, self-healing security within our environment--one that can withstand any threats, whether internal or external. In this session, we walk you through the steps you need to be successful in your journey, just like Ellie Mae and many other enterprises and agencies. Your journey starts with security automation, and from there you will push outside of your security comfort zone, thanks to the gift of enhanced visibility and omniscience. Next we use CloudFormation Templates and custom signatures to move through our next security challenge with speed, and finally, we build auto-remediation into our security strategy with AWS Lambda workflows that enable the system to self-correct when misconfigurations occur. This fast-paced session will be filled with code, best practices to help you in your quest, and even a few surprises about the ultimate destination of your journey. Session sponsored by Evident.io.
AWS re:Invent 2016: Another Day, Another Billion Packets (NET401) - Amazon Web Services
In this session, we walk through the Amazon VPC network presentation and describe the problems we were trying to solve when we created it. Next, we walk through how these problems are traditionally solved, and why those solutions are not scalable, inexpensive, or secure enough for AWS. Finally, we provide an overview of the solution that we've implemented and discuss some of the unique mechanisms that we use to ensure customer isolation, get packets into and out of the network, and support new features like VPC endpoints.
AWS re:Invent 2016: Optimizing Network Performance for Amazon EC2 Instances (... - Amazon Web Services
Many customers are using Amazon EC2 instances to run applications with high performance networking requirements. In this session, we provide an overview of Amazon EC2 network performance features (enhanced networking, ENA, placement groups, etc.), and discuss how we are innovating on behalf of our customers to improve networking performance in a scalable and cost-efficient manner. We share best practices and performance tips for getting the best networking performance out of your Amazon EC2 instances.
AWS re:Invent 2016: AWS Mobile State of the Union - Serverless, New User Expe... - Amazon Web Services
AWS provides a range of services and tools to help you create industry leading, cloud-enabled mobile apps that can securely scale to millions of users globally. Join Amit Patel, GM of AWS Mobile, to hear our vision for mobile apps and the cloud, industry trends, recent product launches, and success stories directly from our customers. We'll walk through and demo the AWS Mobile offerings for building compelling cloud-enabled mobile apps and for engaging your app users. You’ll learn how to use these offerings (serverless – API Gateway/Lambda, Cognito, and new services) to make it easy to develop both your iOS and Android frontend, as well as your mobile backend.
AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ... - Amazon Web Services
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and GPU instance families. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ... - Amazon Web Services
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and Accelerated Computing (GPU and FPGA) instance families. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and GPU instance families. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
Amazon EC2 provides a broad selection of instance types to deliver high performance for a diverse mix of applications. In this session, we overview the drivers of system performance and discuss in depth how Amazon EC2 instances deliver system performance while also providing elasticity and complete control over your infrastructure. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi... - Amazon Web Services
Amazon Elastic Compute Cloud (Amazon EC2) provides a broad selection of instance types to accommodate a diverse mix of workloads. In this technical session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, and Memory Optimized families. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
Learning Objectives:
• Understand the differences between instances
• Learn best practices and tips for getting the most out of EC2 instances
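A common first cut at instance selection is the workload's memory-per-vCPU ratio, since the major families differ mainly on that axis (roughly 2, 4, and 8 GiB per vCPU for the Compute Optimized, General Purpose, and Memory Optimized families). A toy selector along those lines — a starting point for illustration, not a sizing rule:

```python
def suggest_family(vcpus: int, memory_gib: float) -> str:
    """Suggest an EC2 instance family from a workload's GiB-per-vCPU ratio,
    using the approximate 2/4/8 ratios of the C/M/R families."""
    ratio = memory_gib / vcpus
    if ratio <= 2.5:
        return "compute optimized (C family)"
    if ratio <= 5:
        return "general purpose (M family)"
    return "memory optimized (R family)"

print(suggest_family(16, 32))   # ~2 GiB per vCPU
print(suggest_family(16, 128))  # ~8 GiB per vCPU
```

Real selection also has to weigh network throughput, local storage, and burst behavior, which is exactly what these sessions cover.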
Amazon EC2 provides a broad selection of instance types to deliver high performance for a diverse mix of applications. In this session, we overview the drivers of system performance and discuss in depth how Amazon EC2 instances deliver system performance while also providing elasticity and complete control over your infrastructure. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017 - Amazon Web Services
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and Accelerated Computing (GPU and FPGA) instance families. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
AWS Speaker: Ian Massingham, Sr Mgr, Technical Evangelist - Amazon Web Services
Customer Speaker: Andrew Mori, Technical Director, Konga
2. What to Expect from the Session
• Understanding of the factors that go into choosing an EC2 instance
• Defining system performance and how it is characterized for different workloads
• How Amazon EC2 instances deliver performance while providing flexibility and agility
• How to make the most of your EC2 instance experience through the lens of several instance types
9. What’s a Virtual CPU? (vCPU)
• A vCPU is typically a hyper-threaded physical core*
• On Linux, “A” threads are enumerated before “B” threads
• On Windows, threads are interleaved
• Divide the vCPU count by 2 to get the core count
• Cores by EC2 & RDS DB instance type: https://aws.amazon.com/ec2/virtualcores/
* The “t” family is special
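The core-count rule above can be sketched in shell. The helper name is illustrative, not an AWS tool, and the T family is exempt as the footnote says:

```shell
# Minimal sketch of the rule above: outside the T family, each physical
# core is exposed as two hyper-threaded vCPUs, so halve the vCPU count.
# vcpus_to_cores is an illustrative helper, not a system command.
vcpus_to_cores() {
  echo $(( $1 / 2 ))
}

vcpus_to_cores 40    # m4.10xlarge: 40 vCPUs -> prints 20
vcpus_to_cores 128   # x1.32xlarge: 128 vCPUs -> prints 64
```

On a live instance you would feed it the vCPU count from `nproc` or `lscpu`.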
10. [screenshot: lstopo output for an m4.10xlarge, showing sockets, memory on each socket, L1-L3 caches, and the CPU thread-to-core mapping]
11. Disable Hyper-Threading If You Need To
• Useful for FPU-heavy applications
• Use ‘lscpu’ to validate the layout
• Hot offline the “B” threads:
for i in `seq 64 127`; do
  echo 0 > /sys/devices/system/cpu/cpu${i}/online
done
• Or set grub to only initialize the first half of all threads:
maxcpus=63
Example lscpu output (128-vCPU instance):
[ec2-user@ip-172-31-7-218 ~]$ lscpu
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 4
NUMA node(s): 4
Model name: Intel(R) Xeon(R) CPU
Hypervisor vendor: Xen
Virtualization type: full
NUMA node0 CPU(s): 0-15,64-79
NUMA node1 CPU(s): 16-31,80-95
NUMA node2 CPU(s): 32-47,96-111
NUMA node3 CPU(s): 48-63,112-127
14. Resource Allocation
• All resources assigned to you are dedicated to your instance, with no over-commitment*
• All vCPUs are dedicated to you
• Memory allocated is assigned only to your instance
• Network resources are partitioned to avoid “noisy neighbors”
• Curious about the number of instances per host? Use “Dedicated Hosts” as a guide.
* Again, the “T” family is special
15. “Launching new instances and running tests in parallel is easy… [when choosing an instance] there is no substitute for measuring the performance of your full application.”
- EC2 documentation
16. Timekeeping Explained
• Timekeeping in an instance is deceptively hard: gettimeofday(), clock_gettime(), QueryPerformanceCounter()
• The TSC: a CPU counter accessible from userspace; requires calibration and the vDSO; invariant on Sandy Bridge+ processors
• Xen pvclock does not support the vDSO
• On current generation instances, use TSC as the clocksource
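The switch itself is a couple of sysfs reads and writes. A hedged sketch: the sysfs paths are the standard Linux clocksource files, and `pick_clocksource` is an illustrative helper of mine, not a system command:

```shell
# Sketch of preferring tsc on a current-generation instance.
cs=/sys/devices/system/cpu/clocksource/clocksource0

pick_clocksource() {
  # prefer tsc when the kernel lists it among the available sources
  case " $1 " in
    *" tsc "*) echo tsc ;;
    *)         echo "${1%% *}" ;;   # otherwise keep the first listed source
  esac
}

pick_clocksource "xen tsc hpet acpi_pm"   # prints: tsc

# On a live instance you would apply it like this:
#   echo "$(pick_clocksource "$(cat $cs/available_clocksource)")" \
#     | sudo tee $cs/current_clocksource
```

The change takes effect immediately, with no reboot, which makes it easy to A/B test against your own workload.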
17. Benchmarking - Time Intensive Application
#include <sys/time.h>
#include <time.h>
#include <stdio.h>
#include <unistd.h>
int main()
{
    time_t start, end;
    time(&start);
    for (int x = 0; x < 100000000; x++) {
        float f = 123456789.0f;
        float g = 123456789.0f;
        float h = f * g;
        (void)h;                  /* keep the result from being optimized away */
        struct timeval tv;
        gettimeofday(&tv, NULL);
    }
    time(&end);
    double dif = difftime(end, start);
    printf("Elapsed time is %.2f seconds.\n", dif);
    return 0;
}
21. P-state and C-state control
• Available on c4.8xlarge, d2.8xlarge, m4.10xlarge, m4.16xlarge, p2.16xlarge, x1.16xlarge, x1.32xlarge
• By entering deeper idle states, non-idle cores can achieve up to 300MHz higher clock frequencies
• But… deeper idle states require more time to exit, so they may not be appropriate for latency-sensitive workloads
• Limit the C-state by adding “intel_idle.max_cstate=1” to grub
22. Tip: P-state control for AVX2
• If an application makes heavy use of AVX2 on all cores, the processor may attempt to draw more power than it should
• The processor will transparently reduce its frequency
• Frequent changes of CPU frequency can slow an application
sudo sh -c "echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo"
See also: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/processor_state_control.html
23. Review: T2 Instances
• Lowest cost EC2 instance at $0.0065 per hour
• Burstable performance
• Fixed allocation enforced with CPU credits

Model      vCPU  Baseline  CPU Credits/Hour  Memory (GiB)  Storage
t2.nano    1     5%        3                 0.5           EBS Only
t2.micro   1     10%       6                 1             EBS Only
t2.small   1     20%       12                2             EBS Only
t2.medium  2     40%**     24                4             EBS Only
t2.large   2     60%**     36                8             EBS Only

General purpose: web serving, developer environments, small databases
24. How Credits Work
• A CPU credit provides the performance of a full CPU core for one minute
• An instance earns CPU credits at a steady rate
• An instance consumes credits when active
• Credits expire (leak) after 24 hours
[chart: credit balance over time against the baseline rate and the burst rate]
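The bucket mechanics above can be sketched numerically. Figures are for a t2.micro from the table on slide 23; launch credits and the 24-hour expiry are deliberately left out to keep the arithmetic obvious:

```shell
# Sketch of the t2.micro credit bucket: it earns 6 credits per hour,
# and one credit buys one full core-minute of CPU.
earn_per_hour=6
balance=0

# four idle hours of accrual
for hour in 1 2 3 4; do
  balance=$(( balance + earn_per_hour ))
done

# a burst: one core at 100% for 10 minutes consumes 10 credits
balance=$(( balance - 10 ))

echo "$balance"   # prints 14: credits left after the burst
```

Once the balance hits zero the instance is held to its baseline rate, which is why monitoring the balance (next slides) matters.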
26. Review: X1 Instances
• Largest memory instance with 2 TB of DRAM
• Quad socket, Intel E7 processors with 128 vCPUs

Model        vCPU  Memory (GiB)  Local Storage   Network
x1.16xlarge  64    976           1x 1920GB SSD   10Gbps
x1.32xlarge  128   1952          2x 1920GB SSD   20Gbps

In-memory databases, big data processing, HPC workloads
27. NUMA
• Non-uniform memory access
• Each processor in a multi-CPU system has local memory that is accessible through a fast interconnect
• Each processor can also access memory from other CPUs, but local memory access is a lot faster than remote memory
• Performance is related to the number of CPU sockets and how they are connected - Intel QuickPath Interconnect (QPI)
30. Tip: Kernel Support for NUMA Balancing
• An application will perform best when the threads of its processes are accessing memory on the same NUMA node.
• NUMA balancing moves tasks closer to the memory they are accessing. This is done automatically by the Linux kernel when automatic NUMA balancing is active: version 3.8+ of the Linux kernel.
• Windows support for NUMA first appeared in the Enterprise and Data Center SKUs of Windows Server 2003.
• Set “numa=off” or use numactl to reduce NUMA paging if your application uses more memory than will fit on a single socket, or has threads that move between sockets
31. Operating Systems Impact Performance
• Memory-intensive web application: created many threads and rapidly allocated/deallocated memory
• Comparing performance of RHEL6 vs RHEL7
• Noticed a high amount of “system” time in top
• Found a benchmark tool (ebizzy) with a similar performance profile
• Traced its performance with “perf”
32. On RHEL6
[ec2-user@ip-172-31-12-150-RHEL6 ebizzy-0.3]$ sudo perf stat ./ebizzy -S 10
12,409 records/s
real 10.00 s
user 7.37 s
sys 341.22 s
Performance counter stats for './ebizzy -S 10':
361458.371052 task-clock (msec) # 35.880 CPUs utilized
10,343 context-switches # 0.029 K/sec
2,582 cpu-migrations # 0.007 K/sec
1,418,204 page-faults # 0.004 M/sec
10.074085097 seconds time elapsed
34. On RHEL7
[ec2-user@ip-172-31-7-22-RHEL7 ~]$ sudo perf stat ./ebizzy-0.3/ebizzy -S 10
425,143 records/s
real 10.00 s
user 397.28 s
sys 0.18 s
Performance counter stats for './ebizzy-0.3/ebizzy -S 10':
397515.862535 task-clock (msec) # 39.681 CPUs utilized
25,256 context-switches # 0.064 K/sec
2,201 cpu-migrations # 0.006 K/sec
14,109 page-faults # 0.035 K/sec
10.017856000 seconds time elapsed
Up from 12,409 records/s on RHEL6!
Down from 1,418,204 page-faults!
36. Hugepages
Disable Transparent Hugepages:
# echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
# echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
Use explicit huge pages:
$ sudo mkdir /dev/hugetlbfs
$ sudo mount -t hugetlbfs none /dev/hugetlbfs
$ sudo sysctl -w vm.nr_hugepages=10000
$ HUGETLB_MORECORE=yes LD_PRELOAD=libhugetlbfs.so numactl --cpunodebind=0 --membind=0 /path/to/application
See also: https://lwn.net/Articles/375096/
37. Hardware
[diagram: the Xen split driver model. An application’s socket I/O in a guest domain passes through a frontend driver, across the VMM to a backend driver and device driver in the driver domain, and down to the physical CPU, memory, and storage device; numbered steps (1-5) trace the request path]
38. Granting in pre-3.8.0 Kernels
• Requires “grant mapping” prior to 3.8.0
• Grant mappings are expensive operations due to TLB flushes
Inter-domain I/O for a read(fd, buffer,…) against an SSD:
(1) Grant memory
(2) Write to ring buffer
(3) Signal event
(4) Read ring buffer
(5) Map grants
(6) Read or write grants
(7) Unmap grants
39. Granting in 3.8.0+ Kernels, Persistent and Indirect
• Grant mappings are set up in a pool one time
• Data is copied in and out of the grant pool
[diagram: a read(fd, buffer…) in the instance is served by copying to and from a persistent grant pool shared with the I/O domain, which talks to the SSD]
41. 2009 – Longer ago than you think
Avatar was the top movie in the theaters
Facebook overtook MySpace in active users
President Obama was sworn into office
The 2.6.32 Linux kernel was released
Tip: Use 3.10+ kernel
Amazon Linux 13.09 or later
Ubuntu 14.04 or later
RHEL/Centos 7 or later
Etc.
42. Device Pass Through: Enhanced Networking
SR-IOV eliminates need for driver domain
Physical network device exposes virtual function to instance
Requires a specialized driver, which means:
Your instance OS needs to know about it
EC2 needs to be told your instance can use it
43. Hardware
[diagram: after Enhanced Networking, the driver domain is bypassed. The guest’s NIC driver talks directly to a virtual function exposed by the SR-IOV network device, while the VMM still manages virtual CPU and memory; numbered steps (1-3) trace the request path]
44. Elastic Network Adapter
• Next generation of Enhanced Networking
• Hardware checksums
• Multi-queue support
• Receive side steering
• 20Gbps in a Placement Group
• New open source Amazon network driver
45. Network Performance
• 20 Gigabit & 10 Gigabit: measured one-way; double that for bi-directional (full duplex)
• High, Moderate, Low: a function of the instance size and EBS optimization
• Not all created equal - test with iperf if it’s important!
• Use placement groups when you need high and consistent instance-to-instance bandwidth
• All traffic limited to 5 Gb/s when exiting EC2
46. EBS Performance
• Instance size affects throughput
• Match your volume size and type to your instance
• Use EBS optimization if EBS performance is important
47. Summary: Getting the Most Out of EC2 Instances
• Choose HVM AMIs
• Timekeeping: use TSC
• C-state and P-state controls
• Monitor T2 CPU credits
• Use a modern Linux OS
• NUMA balancing
• Persistent grants for I/O performance
• Enhanced networking
• Profile your application
48. Virtualization Themes
• Bare metal performance is the goal, and in many scenarios we are already there
• History of eliminating hypervisor intermediation and driver domains
• Hardware-assisted virtualization
• Scheduling and granting efficiencies
• Device pass-through
49. Next Steps
Visit the Amazon EC2 documentation
Launch an instance and try your app!
Let’s start at the basics
What is an EC2 instance?
They are virtual machines
Guests
On a Hypervisor
On physical hardware
Presentation built to be a deep dive
Going into the depths of how EC2 works
Highlight actionable things
Get the most performance out of your instances
Talk a bit about how to choose an EC2 instance
When it comes to performance
Making sure you’re picking the right one is as important as the tuning tips that I’m also going to talk about
EC2 is a big subject
Talk about the Purchase Options
APIs & SDK’s
Networking
Talk today about the instances themselves
How they operate
Features
Options when you go to launch
Other Topics
List Recommended sessions at the end
Launched in 2006
“an instance”
Didn’t have a name
Didn’t get any choices
Like the Model T – any color as long as it’s black
Eventually gave it a name
M1 instance
Customers wanted more choice
We’ve been iterating and growing ever since.
Not only adding instances, but changing how EC2 works
Launched the cc2 in 2011
Placement groups
Bandwidth and latency
Hardware assisted virtualization
Exposes more of the underlying hardware
Lets you get even more performance
EC2 is always growing and changing based on customer feedback
Always check our documentation for the latest as you’re building out
How we do things today may be different in the future
Go over how we talk about instances and name them
Get on the same page
First letter is the family
Stands for what it’s suited for or what resources it has
C for compute
R for Ram
I for IOPS
Number is the generation
Like a version number
Last is the instance size
T-Shirt size
You’ve got a lot of choices and flexibility when you go to launch
It can seem overwhelming
Trying to pick the right instance
Looking at just the families
First find what your application is constrained by
If you need memory, start with R3
CPU, go with C4
If balanced, look at general purpose M or T
Perspective of your constraint, it’s easy to pick the right family
Test to find the right size within that family
If you need a little help, check the documentation
List of workloads for each family.
When you’re looking at instances, you’ll see something called vCPUs
On modern instances not in the T family
A hyperthreaded core
Hyperthreading is great for increasing performance
It kinda lets your CPU do two things at once
Like waiting on IO
Real core count
Divide by two
Visit link, used for licensing.
To give a visual representation…
Output of LSTOPO on m4.10xlarge
Linux utility for enumerating hardware
Can run on any instance or physical server
Shows graphical output of hardware configuration
Sockets
Memory on each socket
L1-3 Cache
CPU thread to core mapping
Case of m4.10xlarge
40 threads
20 cores
Some applications don’t benefit from hyperthreading
Context switching may decrease performance
Typically compute heavy apps
Financial calculations & engineering simulations
These apps usually disable hyperthreading
If you’re not sure or don’t typically disable hyperthreading, don’t worry
If you do, try running with it disabled on EC2 and see if it improves performance
Easy on Linux, harder on Windows
Linux
The first set of threads on each core is listed first, and the second or B threads are listed after that
Disable the last half, which will be all the B threads
Two ways
Online
Great for no reboot
But it may cause instability
Disable processors where threads may be running
Won’t be persisted after a reboot
In grub,
Set max cpus to match physical cpu count minus 1
Safer – disabled when booting
But makes it harder when you switch size
Windows is harder
Interleaved
Have to use CPU affinity
Same m4.10xlarge with hyperthreading turned off
Only one CPU thread per core
Compared to the two that you saw earlier
Let’s dig into how instance sizes work
We build instances
Easy to scale vertically and horizontally
Look at the c4 family as an example
C4.8xlarge on the left
Largest instance size available
That single c4.8xlarge
Roughly equal to 2 c4.4xlarges
That c4.4xlarge has roughly half the
Number of vCPUS
Amount of ram
Available network bandwidth
Keeps following down the line
2x c4.4xlarge = 4x c4.2xlarge
And so on…
Reason is because of how we partition instances
Largest size is typically a full server
On the smaller ones you’re running a fraction of it depending on the size
Virtualization historically has a bad reputation
Usually used to manage over utilization of resources
More virtual machines than physical resources
We use virtualization for a lot of other reasons
Security & Isolation
Dedicate specific resources to specific customers
vCPUS as an example
With exception of T
When you’re assigned a vCPU
only customer using it
Not sharing with anyone else on the box
Same applies to Memory & Network
We build with the goal of providing a consistent experience
No matter what else is happening
Last thing I want to say about choosing your instance
Cheesy to quote documentation
Good sentiment
Easy to get an app up and running
Don’t run synthetic benchmarks
Install your app and send some realistic load
Examples:
Mobile App
HPC application
BI database
Use a real workload to understand how your app will behave
Digging deeper into the OS…
On all systems, time keeping is important
Used for things like
Processing interrupts
Getting the time and date
Measuring performance
Most AMIs on AWS use the Xen clock by default
Compatible with all instance types
The invariant TSC was introduced with Sandy Bridge
Handled by bare metal
You’re talking to your processor
Not the hypervisor
And because of this, calls to it are going to be much quicker
To demonstrate this – simple application
It does two things
Performs a large number of get time of day calls
a bit of math
Don’t laugh at my code…
I’m a sysadmin, not a developer
Quick and dirty to test it out
These are results on Xen clock source
Profiled with Strace
Really great tool to use with any app, yours included
Shows the number of system calls make
& the time they took
Gettimeofday takes the most time, with a lot of calls
Overall, the test took about 12 seconds to run with Xen clock source
On the same system
Switched clocksource to TSC
Reran the test
Results look a lot different
Gettimeofday doesn’t show up
Run time reduced to two seconds
This is extreme for a simple app
I’ve seen apps improve by as much as 40%
It’s an easy change to make on Linux
Do it while the system is running
First command shows available clock sources
Second shows the current clock source
Third would change it to TSC
On windows, it’s handled automatically
If you’re running a recently released EC2 instance
Can improve a lot of apps
JVM debugging
Performance tracing
SAP applications
Recent change to the platform
Added P- and C-state control with C4, now available on many more instance types
First, let’s talk about C states
C states control the power savings features of a processor
Using c4.8xlarge as an example
Base clock speed of 2.9Ghz
Can turbo up to 3.5Ghz on one or two cores
Must let other cores idle down
Great when you need a few cores to have high frequencies
Letting them idle down
increases the time it takes for them to respond when you want to actually use them
So if you have an application where latency is important
You can limit how deep they’ll sleep
Set the C-state parameter in GRUB
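A sketch of what that GRUB change looks like; `intel_idle.max_cstate=1` is the usual knob, but treat the exact value as workload-dependent:

```shell
# /etc/default/grub — limit how deep cores may sleep (C1 here),
# then regenerate grub.cfg and reboot for it to take effect
GRUB_CMDLINE_LINUX="intel_idle.max_cstate=1"
```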
You can use P state to set the desired running frequency of the cores
Some customers and some workloads
consistency is more important than performance
Game servers are a good example
Operate in loops
Loop needs to complete in the same time, every time
You can set the P-state to prevent the processor from scaling up and down
Operates at the same frequency all the time
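One way to sketch this on Linux is with the cpupower utility (needs root; 2.9 GHz matches the c4.8xlarge base clock used above):

```shell
# Pin the frequency floor (-d) and ceiling (-u) to the base clock
# so cores neither turbo up nor idle down
cpupower frequency-set -d 2.9GHz -u 2.9GHz
```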
Next I want to talk about T2 and why they’re special
T2 instances are great general purpose instances
Lowest-cost instance available on AWS, at ~1/2 a cent per hour for t2.nano
Great for workloads where CPU demand varies over time
Websites
Developer environments
Small databases
You start with a baseline level of performance
That you can see in the chart above
The magic of T2 is that you earn credits when the instance is idle
Allows you to burst above the baseline
We launched T2 because we saw that most workloads aren’t using 100% of CPU all of the time
T2 family is a great way to
Still get the performance you need when you need it
Don’t pay for it when you don’t
Let’s talk about how credits work
You can think of credits in a T2 like a bucket
When you boot the instance
Start with enough credits for OS & Application
When your app is up and running, you’ll use credits when you use CPU
A single credit will let you run 100% of one core for one minute
When the work dies down and instance becomes idle
Earning new credits that will start to fill up the bucket
Credits also expire after 24 hours if unused
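The bucket math above can be sketched like this (the numbers are made up for illustration):

```shell
# 1 credit = 100% of one vCPU for one minute
balance=60        # current CPUCreditBalance (hypothetical)
cores_busy=2      # vCPUs you want to run flat out
burst_minutes=$((balance / cores_busy))
echo "minutes of full burst left: $burst_minutes"
```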
To monitor those instances
CloudWatch metrics
Two available
The one in Orange is the credit usage
Spikes when usage is high
Shows you how many credits you’re using per minute
The Blue is the Balance
Keep this above zero if you want more performance than baseline
Monitoring your credit balance lets you ensure you’re getting consistent performance on a T2
What you'll want to hook into if you're using autoscaling
Recently launched the X1
Biggest instance
2TB of RAM
128 Virtual CPUs
Great for apps that need a huge memory footprint
Good for:
In memory databases
big data processing
some HPC
When you have that much memory
Managing it is important
On any system with multiple sockets
Memory attached to local socket will be faster than remote
Concept is called NUMA
On Intel, there’s a QPI between sockets
It's the bus that transfers memory from one to the other
Look at the r3.8xlarge as an example
Two sockets
122GB of RAM on each socket
Between are 2 QPI links
Application on the left reading from the right
Will go over the QPI
Fast, but not as fast as what’s attached directly
When you go to X1, things are more complex
X1 is a 4 socket system
NUMA is more important
Compared to an r3.8xlarge
More memory per socket
Only one QPI between sockets
Memory transfers from one zone to another are going to take longer on X1
So what can we do?
If you’ve ever watched top on a linux system
shows threads moving from one core to another
Process scheduling to make sure work is balanced
Around the 3.8 kernel, the scheduler started to use NUMA affinity
Will try to keep processes in same NUMA zone
Will also try to move memory around to be close to the process
The downside is that this can actually slow down performance on some apps
Especially true if you have a large memory pool spanning sockets
The scheduler will be moving things around when it doesn’t need to be
To disable, set numa=off in GRUB
This disables memory transfers between zones
And disables NUMA awareness in process scheduling
Alternative is to use numactl to lock processes to a specific zone
Only be reading and writing memory that’s local to them
Another thing to keep in mind
Operating system and libraries can affect application performance
It's not just running a modern Linux kernel that's important
Run as recent a distro as you can
Recent customer visit
Custom Application using a large amount of memory
EC2 performance not as good as on premise
Their app was very complex and it was hard to get quick results when making changes
Found a benchmark tool (ebizzy) with similar behavior to test
Results of ebizzy on RHEL6
Used perf to profile and see what’s happening at a system level
Generated 12,000 requests/second
Lots of time in system space
1.5 million page faults
Generated flame graphs to see what’s happening
Created by Brendan Gregg, check out his site for more information
A really good way to understand
Paths the code is taking through the system
Time spent in specific calls
You can see ebizzy on the bottom
Making lots of madvise calls
End up with a xen hypercall
Compiled the same app on RHEL7 and tested on same instance type
Saw significantly better performance
RPS went from 12,000 to 425,000
Page faults went from 1.5 million to only 14,000
What happened?
This is where flamegraphs really shine
Same exact flame graph
Same Code
Same run type
Only difference is the OS version
What the flamegraph showed us
glibc changed the path of memory calls on RHEL7
Instead of a long madvise with a trip to the hypervisor
A single Intel-optimized call for memory management
Recompile when moving to a different OS; it can make a big difference
Last memory related tip is to Disable transparent huge pages
Huge pages are a really big subject with a lot of different options
See article
It goes into detail about all the different options
Transparent huge pages are enabled by default on most recent distributions
Disabling transparent huge pages and using explicit huge pages
Can help significantly for apps that are accessing a lot of memory
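The runtime knob for this lives in sysfs (path as on stock kernels; the change needs root and doesn't survive a reboot — make it permanent via your boot config):

```shell
# Current mode is shown in brackets, e.g.: [always] madvise never
cat /sys/kernel/mm/transparent_hugepage/enabled
# Disable THP for this boot
echo never > /sys/kernel/mm/transparent_hugepage/enabled
```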
Next, let’s talk about IO
We have a few families that are optimized for IO
I2 – IOPS – SSD Based
D2 – Dense storage – Magnetic
Need a modern kernel to get best storage performance
Reason is split driver model
Application on left doing some disk IO
Talks to the front end driver
Then back end
Then real driver
Then hardware
Data transfer happens through shared pages
Need permissions to be granted and released
Granting had lots of overhead in early kernels
Every time it needs to write to disk
Talks to VMM
Get permission to write to device
Fill a buffer with the data
Pass to backend
Wait for data to be written
Remove the grant
Really expensive process, lots of buffer flushing
Gets worse the more CPUs you have
Persistent grants created to solve this.
Permission to write is reused for all transactions between front and back
Grants don’t need to be unmapped
Translation buffer never flushed
Much better performance for IO operations
Validating grants is easy
Run dmesg and grep for blkfront
This is an i2.8xlarge
All volumes have persistent grants enabled
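The check looks like this on a Xen guest (the example output line is illustrative, not copied from a real system):

```shell
# Each volume should report persistent grants in its blkfront line
dmesg | grep -i blkfront
# e.g.  blkfront: xvda: ... persistent grants: enabled ...
```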
If I haven’t said it enough
Using a modern kernel is really important
Many customers still use CentOS 6
Just by switching to a 3.10 kernel
Seen as much as a 60% improvement
The 2.6 kernel in CentOS 6 was released in 2009
Long time ago in the cloud computing world
Please use a modern Kernel & OS.
Along the same lines as the split driver model
Released enhanced networking with C3
Uses Single Root IO Virtualization – SR-IOV
Physical device exposed to instance itself
Has a few requirements
Needs a special driver installed in the OS
EC2 needs to be told to expose it that way
Network path is much simpler
Packets don't have to go through the VMM
Higher packets per second
Decreased jitter – talking to bare metal
It's free on all supported instances
Enabled by default in many AMIs
Highly recommend it if you’re touching the network
And we’re not done improving the network
Still making constant improvements
Latest is with a new Network Adapter
Launched with the X1
Called Elastic Network Adapter – ENA
Built a new Amazon-developed open source driver
Will grow with us as we’re adding new features to the network
Built to handle throughputs up to 20 Gigabits/second
This + Hardware checksums & RSS make it the fastest network available on AWS today.
Touch briefly on a few points about network performance
Attend the Deep dive to learn more
Easy to forget that network can be a bottleneck on smaller instance types
Customer doing S3 performance testing
Not getting good performance
Found out all network traffic was going through a T2 NAT
Largest instances should get closer to 5Gbps when leaving EC2 and talking to things like S3
When we list 10 & 20 Gigabit bandwidth
Instance bandwidth is bi-directional
On p2, X1, and m4
20Gb in and out at the same time
But you need placement groups and multiple TCP streams
Just like network throughput, EBS throughput is a function of the size of the instance
Larger instance, more EBS traffic
EBS optimization by default on newest instances
Don’t have to worry about network and EBS competing
Look at the EBS documentation
Table of every EBS optimized instance
Throughput and max IOPS
Great place to go to look for specific performance out of EBS
In conclusion
Lots of things
Getting the most out of it
At bare minimum
Benchmark your app
Use a modern OS
Monitor Cloudwatch
Use enhanced networking
Goal is to make virtualization as transparent as possible
Eliminate any inefficiencies it may cause
Goal of bare metal like performance
Already there in a lot of ways
So if you have any questions, the EC2 documentation is a great resource and covers even more than I could today. Otherwise, launch an instance and start testing your app. Thank you!