This document provides an overview of performance tuning for Java applications. It discusses top-down and bottom-up performance analysis approaches. It also covers choosing the right garbage collector and JVM tuning basics like calculating allocation rates and live data size from GC logs. The document shows examples of tuning JVM settings for latency using CMS and G1 collectors as well as tuning for throughput using ParallelOldGC.
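The allocation-rate calculation mentioned above can be sketched from GC log output. The log lines and the regex below assume the classic HotSpot `-XX:+PrintGCDetails` young-GC format (`timestamp: [GC ... before->after(total), secs]`); real logs vary by collector and JVM version, so treat this parser as an illustration of the arithmetic, not a general tool:

```python
import re

# Hypothetical young-GC log lines; values are invented for illustration.
LOG = """\
10.0: [GC (Allocation Failure) 81920K->10240K(262144K), 0.0123 secs]
20.0: [GC (Allocation Failure) 92160K->10752K(262144K), 0.0101 secs]
30.0: [GC (Allocation Failure) 92672K->11264K(262144K), 0.0098 secs]
"""

PATTERN = re.compile(r"^(?P<ts>[\d.]+): \[GC .*? (?P<before>\d+)K->(?P<after>\d+)K")

def allocation_rate_mb_per_sec(log_text):
    """Bytes allocated between two young GCs is roughly
    (heap-before of this GC) - (heap-after of the previous GC)."""
    events = []
    for line in log_text.splitlines():
        m = PATTERN.match(line)
        if m:
            events.append((float(m["ts"]), int(m["before"]), int(m["after"])))
    rates = []
    for (t0, _, after0), (t1, before1, _) in zip(events, events[1:]):
        allocated_kb = before1 - after0
        rates.append(allocated_kb / 1024.0 / (t1 - t0))  # MB/s
    return rates

rates = allocation_rate_mb_per_sec(LOG)  # -> [8.0, 8.0]
```

The same log also yields live data size: the stable post-GC heap occupancy (here around 10-11 MB) after full collections approximates the live set, which is the usual starting point for sizing the heap.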
This document provides an overview and tuning guide for the Hotspot Garbage Collection system. It introduces the presenters and outlines what will be covered, including an introduction to GC concepts like collectors, flags, and tooling. It then discusses when and why GC tuning is important and covers common scenarios like memory leaks, long pause times, premature promotion, and low throughput. For each scenario, it provides an example log, analysis, and suggestions for tuning flags to address the issue. The goal is to help users understand how to analyze GC logs, identify issues, and tune the GC to improve performance.
LCU14-410: How to build an Energy Model for your SoC (Linaro)
---------------------------------------------------
Speaker: Morten Rasmussen
Date: September 18, 2014
---------------------------------------------------
★ Session Summary ★
- ARM to provide a quick overview of the current energy model
- Introduce the methodology/recipe used to build the energy model
- Discuss ways in which the model is used today and intended next steps
- Key outcomes:
- Describe the
- Identify gaps and limitations
Summary of EAS workshop (Amit)
- Summary of hacking sessions: plan to integrate Qualcomm-ARM-Linaro work to send upstream
- Key outcomes:
- List of features and responsibilities
- Dependencies between upstreaming of features, if any
---------------------------------------------------
★ Resources ★
Zerista: http://lcu14.zerista.com/event/member/137778
Google Event: https://plus.google.com/u/0/events/ck3ti7eurknnsq0a4e9ks5a1sbs
Video: https://www.youtube.com/watch?v=JfZt8W3NVgk&list=UUIVqQKxCyQLJS6xvSmfndLA
Etherpad: http://pad.linaro.org/p/lcu14-410
---------------------------------------------------
★ Event Details ★
Linaro Connect USA - #LCU14
September 15-19th, 2014
Hyatt Regency San Francisco Airport
---------------------------------------------------
http://www.linaro.org
http://connect.linaro.org
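The energy model the session describes can be illustrated with a toy sketch: each CPU performance state has a compute capacity and a busy power, and running a given utilization on a state costs roughly `(util / capacity) * power`, since the CPU is busy for that fraction of the time. The capacities and power figures below are invented for illustration, not taken from any real SoC or from the talk:

```python
# Toy per-CPU energy model, EAS-style. All numbers are made up.
OPPS = [  # (capacity, busy_power_mw), sorted by capacity ascending
    (512, 140),
    (768, 250),
    (1024, 450),
]

def energy_mw(util):
    """Estimated average power for `util` on the lowest sufficient OPP."""
    for capacity, power in OPPS:
        if util <= capacity:
            return util / capacity * power
    raise ValueError("utilization exceeds max capacity")

# e.g. a task with utilization 600 lands on the 768-capacity state:
# energy_mw(600) == 600/768 * 250 == 195.3125 mW
```

Comparing `energy_mw` across candidate CPUs for a task is, in miniature, the trade-off an energy-aware scheduler evaluates at wake-up.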
This document discusses techniques for improving web performance, including using content delivery networks (CDNs) and optimizing front-end content delivery to reduce latency from networking issues like TCP slow start and SSL negotiation. It also covers protocols like SPDY and HTTP/2 that aim to enhance web performance through features like request multiplexing and server push. Tools for analyzing network performance are also mentioned.
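The TCP slow-start latency cost mentioned above can be made concrete with a rough round-trip estimate: with an initial congestion window of `initcwnd` segments and the window doubling each round trip, a payload needs the smallest `n` with `initcwnd * (2^n - 1) >= segments`. The 1460-byte MSS and initcwnd of 10 are common modern values, not universal ones, and the model ignores losses and receive-window limits:

```python
import math

def round_trips(payload_bytes, mss=1460, initcwnd=10):
    """Round trips needed to deliver a payload under idealized slow start."""
    segments = math.ceil(payload_bytes / mss)
    if segments <= initcwnd:
        return 1  # fits in the initial window
    # smallest n with initcwnd * (2**n - 1) >= segments
    return math.ceil(math.log2(segments / initcwnd + 1))

# A ~100 KB page needs 3 round trips (10 + 20 + 40 segments),
# which is one motivation for CDNs: shorter round trips shrink
# the fixed multi-RTT cost of slow start and TLS negotiation.
```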
This document discusses tuning the Java HotSpot G1 garbage collector (GC) for improved performance in big data applications, using HBase as a case study. It provides background on the G1 GC and describes how tuning the -XX:G1HeapWastePercent flag from the default 5% to 2% resulted in a 29.3% reduction in total GC pause time, an 18.6% improvement in throughput, and a 15.7% reduction in latency for an HBase workload. The document concludes by providing contact information for those interested in learning more about G1 GC tuning and contributing to OpenJDK.
This document discusses using Perfmon and Profiler tools in SQL Server to capture metrics and traces in order to diagnose and resolve performance issues. It provides examples of using these tools to identify issues such as high disk queue lengths caused by table scans without indexes, low memory availability causing page file usage, long transaction log backups impacting drives, and inefficient cursor usage. The document emphasizes establishing a cause-and-effect relationship between metrics and traces to define effective mitigation strategies like adding indexes, memory, or changing queries.
This document discusses HTTP caching. It covers:
- The purpose of caching is to eliminate unnecessary requests and server load by caching responses.
- HTTP defines various caching mechanisms like expiration dates, ETags, and Cache-Control headers that allow caching by clients, proxies, and gateways.
- Cache-Control headers allow fine-grained control over caching behavior and expiration of responses.
- Status codes provide information about cacheability and conditional requests.
- Tools like browser developer tools and caching validation services can help test and debug caching.
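The freshness and revalidation mechanics in the bullets above can be sketched in a few lines: a response is fresh while its age is below the Cache-Control max-age, and once stale a client can revalidate with If-None-Match using the stored ETag (a 304 response means the cached body is still usable). The header parsing here is deliberately naive (single max-age directive, no Expires fallback):

```python
def is_fresh(cache_control, age_seconds):
    """True while the response's age is within its max-age lifetime."""
    for directive in cache_control.split(","):
        directive = directive.strip()
        if directive.startswith("max-age="):
            return age_seconds < int(directive.split("=", 1)[1])
    return False  # no explicit lifetime: treat as stale

def revalidation_headers(etag):
    """Headers for a conditional GET of a stale cached response."""
    return {"If-None-Match": etag}

# is_fresh("public, max-age=3600", 120)  -> True (serve from cache)
# is_fresh("public, max-age=3600", 3600) -> False (revalidate)
```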
This document discusses PostgreSQL streaming replication and switchover/switchback capabilities. It covers limitations in earlier PostgreSQL versions, timelines, new features in version 9.3 that enable switchover/switchback without needing fresh backups, and things to know like using clean shutdown and the recovery.conf file. A demo of these features is promised at the end.
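For context on the recovery.conf file mentioned above, a minimal sketch of what it might contain on an old primary being reintroduced as a standby after a clean switchover (hostnames and the user are placeholders; `recovery_target_timeline = 'latest'` is the setting that lets the node follow the promoted primary's new timeline):

```ini
# Hypothetical recovery.conf for the demoted node (PostgreSQL 9.3 era)
standby_mode = 'on'
primary_conninfo = 'host=new-primary port=5432 user=replicator'
recovery_target_timeline = 'latest'
```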
Whitepaper: Exadata Consolidation Success Story (Kristofferson A)
1. The document discusses database and server consolidation using Oracle Exadata and describes the challenges of managing highly consolidated environments to ensure quality of service.
2. It outlines a 4-step process for accurate provisioning and capacity planning using a tool called the Provisioning Worksheet: collecting database details, defining the target Exadata hardware capacity, creating a provisioning plan, and reviewing resource utilization.
3. The process relies on basic capacity planning to ensure workload requirements fit available capacity. Database CPU and storage requirements are gathered, a target Exadata configuration is set, databases are mapped to nodes in the plan, and final utilization is summarized to identify any capacity shortfalls.
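The capacity check at the heart of the provisioning step can be sketched simply: map each database's CPU requirement to a target node and flag nodes whose summed demand exceeds a utilization ceiling. The database names, core counts, and the 70% ceiling below are illustrative assumptions, not figures from the whitepaper:

```python
NODE_CORES = {"node1": 24, "node2": 24}
PLAN = [  # (database, target node, required cores) - invented examples
    ("erp", "node1", 8),
    ("crm", "node1", 6),
    ("dw",  "node2", 20),
]

def shortfalls(plan, node_cores, ceiling=0.70):
    """Nodes whose planned CPU utilization exceeds the ceiling."""
    demand = {}
    for _, node, cores in plan:
        demand[node] = demand.get(node, 0) + cores
    return {node: used / node_cores[node]
            for node, used in demand.items()
            if used / node_cores[node] > ceiling}

# Here node2 is flagged (20/24 ~ 83%), so the plan would move a
# database or add capacity before committing the consolidation.
```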
Benchmarking OTM and Java - Is Your Platform Limiting Performance (MavenWire)
This document discusses benchmarking various hardware platforms and operating systems for optimal OTM performance. It provides an agenda for a presentation that will teach how to benchmark OTM platforms using tools like VolanoMark, DaCapo, Soap Stone and Hammerora. The presentation will show hands-on exercises for running the benchmarks and interpreting the results. Higher scores are better for VolanoMark and Soap Stone, while lower scores indicate better performance for DaCapo and Hammerora. Online resources for monitoring performance and learning more about the benchmarks are also provided.
Anti-patterns in Hadoop Cluster Deployment (Sunil Govindan)
Rohith Sharma, Naganarasimha, and Sunil presented on Hadoop cluster configurations and anti-patterns. They discussed sample node manager configurations with high resources, related YARN and MapReduce resource tuning settings, and anti-patterns like not configuring container heap size properly leading to out of memory errors. They also covered YARN capacity scheduler queue planning best practices like queue mapping, preemption, user limits, and application priority to improve cluster utilization.
Hotsos 2011: Mining the AWR repository for Capacity Planning, Visualization, ... (Kristofferson A)
The document discusses mining the Automatic Workload Repository (AWR) in Oracle databases for capacity planning, visualization, and other real-world uses. It introduces Karl Arao as a speaker and discusses topics he will cover including AWR, diagnosing performance issues using AWR data, visualization of AWR data, capacity planning, and tools for working with AWR data like scripts and linear regression. References and resources on working with AWR are also provided.
Hotspot Garbage Collection - The Useful Parts (jClarity)
The document discusses garbage collection in the Java HotSpot virtual machine. It covers the basics of garbage collection theory, HotSpot's memory organization and different collectors. The presentation also discusses how to read and analyze GC logs to understand application performance and identify issues like memory leaks or premature promotion.
This document provides an overview of a Hadoop cluster deployment and configuration best practices from Rohith Sharma, Naganarasimha, and Sunil. It discusses:
1. Examples of YARN resource configurations for high-end node managers with 64GB RAM, 8-16 CPU cores, and 100TB of disk space.
2. Common YARN and MapReduce configuration parameters to tune resources like memory, CPU, and I/O.
3. Anti-patterns related to container memory allocation, long shuffle phases in MapReduce, and RM restarts impacting performance.
4. Best practices for queue configuration, capacity planning, user limits, and application priorities to improve cluster utilization.
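The container-heap anti-pattern from point 3 comes down to one constraint: the JVM heap (`-Xmx`) inside a YARN container must be smaller than the container allocation, or YARN kills the container for exceeding its memory limit. A common rule of thumb (not a YARN default) is to size the heap at about 80% of the container, leaving headroom for off-heap memory:

```python
def container_heap_mb(container_mb, ratio=0.8):
    """Heap size to pair with a YARN container allocation.

    The 0.8 ratio is a conventional rule of thumb; the remaining 20%
    covers metaspace, thread stacks, and native buffers.
    """
    return int(container_mb * ratio)

# e.g. mapreduce.map.memory.mb = 4096 would pair with
# mapreduce.map.java.opts = -Xmx3276m
heap = container_heap_mb(4096)  # -> 3276
```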
What's New in Postgres Plus Advanced Server 9.3 (EDB)
Learn more about EnterpriseDB's Postgres Plus Advanced Server 9.3!
Highlights of Postgres Plus Advanced Server 9.3 include:
Major Partitioning Enhancements
Materialized Views
New RPM packages
New EDB Failover Manager
New capabilities in Postgres Enterprise Manager 4.0
Distributed Resource Scheduling Frameworks: Is There a Clear Winner? (Naganarasimha Garla)
Distributed resource scheduling frameworks like Kubernetes, Mesos, YARN, and Swarm each take different architectural approaches to scheduling resources across a cluster. The document provides an overview of each framework's architecture, key features related to scheduling like priority, isolation, and support for multiple container types. It also compares the frameworks based on functional attributes such as resource granularity, scheduler support, oversubscription, and support for isolation and applications.
This is part one of my Monitoring Distributed Apps series.
Here we explore the premises of distributed application monitoring, focusing on metrics: why we need them, and a gradual introduction of Prometheus as a solution.
The video recording is available here: https://youtu.be/lvogDmRN-Hs
This is a copy of the NoSQL Day 2019 session presented in Washington, D.C. in May 2019. It covers a series of the most common HBase issues observed across the Cloudera customer base, together with root-cause analysis (RCA) and recipes for recovery.
This document discusses the timeline server which collects and stores application metrics and event data in YARN. It describes the limitations of the original job history server and application history server, which only supported MapReduce jobs and did not capture YARN-level data. The timeline server versions 1 and 2 are presented as improved solutions, with version 2 focusing on distributed and reliable storage in HBase, a new data model to support arbitrary application types, and online aggregation of metrics.
RMOUG2016 - Resource Management (the critical piece of the consolidation puzzle) (Kristofferson A)
This document discusses resource management in Oracle databases. It begins with an introduction of the speaker and his company, Accenture Enkitec Group. It then covers various aspects of resource management including the consolidation and resource management lifecycle, new features in Oracle 12c such as instance caging and threaded execution, barriers to adopting resource management, and a systematic approach to implementing resource management. Real-world scenarios are also discussed.
Analyze an SVC/Storwize Metro/Global Mirror performance problem - v58-20150818... (Michael Pirker)
A latency problem was reported for VDisk CA-CL1-Disk04-N at 02/05/15 8:09. The environment is two clusters connected with Metro Mirror. The first aim of this document is to show how the root cause of this problem was traced to the link between the two clusters.
The second aim is to describe how the root cause was found using the BVQ structured performance problem analysis method. It demonstrates that successful analysis work needs both a structured method and a tool that supports the method and delivers the needed technical insight. The underlying idea is that everybody should be able to conduct a performance analysis. This matters because service levels are being lowered day by day, and small customers in particular are increasingly reliant on their own skills or those of their partners. This is a common problem across all vendors.
This document discusses strategies for scaling MySQL infrastructure. It covers various backup options like physical backups using data file copies or snapshots, and logical backups using SQL dumps or CSV exports. It emphasizes the importance of consistency across versions, data, configurations, and operations. Security best practices are outlined like restricting superuser access and using SSL. Scaling is discussed in terms of read replicas, connection pooling, proxy usage, and topology managers. Automation, advanced monitoring, and separate developer environments are also recommended.
The document discusses lessons learned from setting up and maintaining a PostgreSQL cluster for a data analytics platform. It describes four stories where problems arose: 1) Implementing automatic failover using Repmgr when the master node failed, 2) The disk filling up faster than expected due to PostgreSQL's MVCC implementation, 3) Being unable to add a new standby node due to missing WAL segments, and 4) Long running queries on the standby node causing conflicts with replication. The key lessons are around using the right tools like Repmgr for replication management, tuning autovacuum, archiving WALs, and addressing hardware limitations for analytics workloads.
5 Best Practices for Monitoring Hive and MapReduce Application Performance (Driven Inc.)
Apache Hive queries and MapReduce jobs often experience performance issues and bottlenecks because of the multi-tenant nature of Hadoop and a lack of visibility into performance. There are 5 best practices that leading enterprises have used to gain control and reliably meet service levels.
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah... (Lucidworks)
MapQuest developed a search ahead feature for their mobile app to enable auto-complete searching across their large dataset. They used Solr and implemented various techniques to optimize performance, including custom routing, analysis during ETL, and extensive JVM tuning. Their architecture included multiple Solr clusters with different configurations. Through testing and monitoring, they were able to meet their sub-140ms response time requirement for queries.
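Validating a latency target such as the sub-140ms requirement above typically means checking what share of sampled query times stay under the threshold. A minimal sketch, with invented sample values:

```python
def fraction_under(samples_ms, threshold_ms=140):
    """Share of latency samples below the target threshold."""
    return sum(1 for s in samples_ms if s < threshold_ms) / len(samples_ms)

# With made-up samples, 3 of 4 queries meet a 140ms target:
share = fraction_under([100, 120, 150, 90], threshold_ms=140)  # -> 0.75
```

In practice a target like this is usually stated against a high percentile (p95/p99) rather than a simple fraction, but the bookkeeping is the same.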
This document summarizes a case study of using Apache Spark at scale to process over 60 TB of data at Facebook. It describes the previous implementation using Hive, which involved many small jobs that were unmanageable and slow. The new Spark implementation processes the entire dataset in a single job with two stages, shuffling over 90 TB of intermediate data. It provides performance comparisons showing significant reductions in CPU time, latency, and resource usage. It also details reliability and performance improvements made to Spark, such as fixing memory leaks, enabling seamless cluster restarts, and reducing shuffle write latency. Configuration tuning tips are provided to optimize memory usage and shuffle processing.
This document discusses Apache Tez, a framework for accelerating Hadoop query processing. Some key points:
- Tez is a dataflow framework that expresses computations as directed acyclic graphs (DAGs) of tasks, allowing for optimizations like container reuse and locality-aware scheduling.
- It is built on YARN and provides a customizable execution engine as well as APIs for applications like Hive and Pig.
- By expressing jobs as DAGs, Tez can reduce overheads, queueing delays, and better utilize cluster resources compared to the traditional MapReduce framework.
- The document provides examples of how Tez can improve performance for operations like joins, aggregations, and handling of multiple outputs.
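The DAG idea behind Tez can be sketched in a few lines: tasks are vertices, data movement is edges, and the engine may schedule any vertex whose inputs are complete. Kahn's algorithm below produces one valid execution order for a made-up map/join/aggregate-style graph (the vertex names are illustrative, not Tez API identifiers):

```python
from collections import deque

EDGES = {  # vertex -> downstream vertices (hypothetical query plan)
    "map1": ["join"],
    "map2": ["join"],
    "join": ["aggregate"],
    "aggregate": [],
}

def topo_order(edges):
    """One valid scheduling order: a vertex runs once its inputs are done."""
    indegree = {v: 0 for v in edges}
    for outs in edges.values():
        for v in outs:
            indegree[v] += 1
    ready = deque(sorted(v for v, d in indegree.items() if d == 0))
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for w in edges[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    return order

# -> ['map1', 'map2', 'join', 'aggregate']
```

Expressing the whole pipeline as one DAG is what lets Tez skip the intermediate HDFS writes that separate chained MapReduce jobs would require.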
This document provides a summary of improvements made to Hive's performance through the use of Apache Tez and other optimizations. Some key points include:
- Hive was improved to use Apache Tez as its execution engine instead of MapReduce, reducing latency for interactive queries and improving throughput for batch queries.
- Statistics collection was optimized to gather column-level statistics from ORC file footers, speeding up statistics gathering.
- The cost-based optimizer Optiq was added to Hive, allowing it to choose better execution plans.
- Vectorized query processing, broadcast joins, dynamic partitioning, and other optimizations improved individual query performance by over 100x in some cases.
This document discusses performance monitoring in Hive and provides information on how to identify reasons for inefficient Hive queries. It suggests examining the Hive plan, Hive logs, MapReduce monitoring, and Hadoop job logs to find issues related to the Hive query, Hadoop/Hive configuration, data distribution strategy, or system problems. Sample output is shown for a Hive abstract syntax tree, stage dependencies, stage plans, and session logs to demonstrate how these sources can be analyzed.
Hive on Spark is Blazing Fast - Or Is It? (Hortonworks)
This presentation was given at the Strata + Hadoop World, 2015 in San Jose.
Apache Hive is the most popular and most widely used SQL solution for Hadoop. To keep pace with Hadoop’s increasingly vital role in the Enterprise, Hive has transformed from a batch-only, high-latency system into a modern SQL engine capable of both batch and interactive queries over large datasets. Hive’s momentum is accelerating: With Spark integration and a shift to in-memory processing on the horizon, Hive continues to expand the boundaries of Big Data.
In this talk the speakers examined Hive performance: past, present, and future. In particular, they looked at Hive’s origins as a petabyte-scale SQL engine.
Through some numbers and graphs, they showed how Hive became 100x faster by moving beyond MapReduce, by vectorizing execution and by introducing a cost-based optimizer.
They detailed and discussed the challenges of scalable SQL on Hadoop.
They looked into Hive’s sub-second future, powered by LLAP and Hive on Spark, and showed just how fast Hive on Spark really is.
This document discusses performance monitoring in Hive and provides information on how to identify reasons for inefficient Hive queries. It suggests examining the Hive plan, Hive logs, MapReduce monitoring, and Hadoop job logs to find issues related to the Hive query, Hadoop/Hive configuration, data distribution strategy, or system problems. Sample output is shown for a Hive abstract syntax tree, stage dependencies, stage plans, and session logs to demonstrate how these sources can be analyzed.
Hive on spark is blazing fast or is it finalHortonworks
This presentation was given at the Strata + Hadoop World, 2015 in San Jose.
Apache Hive is the most popular and most widely used SQL solution for Hadoop. To keep pace with Hadoop’s increasingly vital role in the Enterprise, Hive has transformed from a batch-only, high-latency system into a modern SQL engine capable of both batch and interactive queries over large datasets. Hive’s momentum is accelerating: With Spark integration and a shift to in-memory processing on the horizon, Hive continues to expand the boundaries of Big Data.
In this talk the speakers examined Hive performance, past, present and future. In particular they looked at Hive’s origins as a petabyte scale SQL engine.
Through some numbers and graphs, they showed how Hive became 100x faster by moving beyond MapReduce, by vectorizing execution and by introducing a cost-based optimizer.
They detailed and discussed the challenges of scalable SQL on Hadoop.
The looked into Hive’s sub-second future, powered by LLAP and Hive on Spark.
And showed just how fast Hive on Spark really is.
Big Data visualization with Apache Spark and Zeppelinprajods
This presentation gives an overview of Apache Spark and explains the features of Apache Zeppelin(incubator). Zeppelin is the open source tool for data discovery, exploration and visualization. It supports REPLs for shell, SparkSQL, Spark(scala), python and angular. This presentation was made on the Big Data Day, at the Great Indian Developer Summit, Bangalore, April 2015
Performance Tuning Oracle Weblogic Server 12cAjith Narayanan
The document summarizes techniques for monitoring and tuning Oracle WebLogic server performance. It discusses monitoring operating system metrics like CPU, memory, network and I/O usage. It also covers monitoring and tuning the Java Virtual Machine, including garbage collection. Specific tools are outlined for monitoring servers like the WebLogic admin console, and command line JVM tools. The document provides tips for configuring domain and server parameters to optimize performance, including enabling just-in-time starting of internal applications, configuring stuck thread handling, and setting connection backlog buffers.
JVM and OS Tuning for accelerating Spark applicationTatsuhiro Chiba
1) The document discusses optimizing Spark applications through JVM and OS tuning. Tuning aspects covered include JVM heap sizing, garbage collection options, process affinity, and large memory pages.
2) Benchmark results show that after applying these optimizations, execution time was reduced by 30-50% for Kmeans clustering and TPC-H queries compared to the default configuration.
3) Dividing the application across multiple smaller JVMs instead of a single large JVM helped reduce garbage collection overhead and resource contention, improving performance by up to 16%.
Robby Morgan presented on Bazaarvoice's large-scale use of Solr. Bazaarvoice uses Solr to index over 250 million documents and handle up to 10,000 queries per second. They deployed Solr across multiple data centers for high availability. Key lessons included ensuring adequate RAM, simulating performance before large deployments, and challenges with cross-data center replication and schema changes. Overall, Solr provided fast search but real-time updates and elastic scaling required additional work.
Enterprise application performance - Understanding & LearningsDhaval Shah
This document discusses enterprise application performance, including:
- Performance basics like response time, throughput, and availability
- Common metrics like response time, transactions per second, and concurrent users
- Factors that affect performance such as software issues, configuration settings, and hardware resources
- Case studies where the author analyzed memory leaks, optimized services, and addressed an inability to meet non-functional requirements
- Learnings around heap dump analysis, hotspot identification, and database monitoring
This session brings to your attention how several millions of dollars are wasted and what you can do to save money. Optimizing garbage collection performance not only saves money, but also improves the overall customer experience as well.
How Adobe uses Structured Streaming at ScaleDatabricks
Adobe’s Unified Profile System is the heart of its Experience Platform. It ingests TBs of data a day and is PBs large. As part of this massive growth we have faced multiple challenges in our Apache Spark deployment which is used from Ingestion to Processing. We want to share some of our learnings and hard earned lessons and as we reached this scale specifically with Structured Streaming.
Know thy Lag
While consuming off a Kafka topic which sees sporadic loads, its very important to monitor the Consumer lag. Also makes you respect what a beast backpressure is.
Reading Data In
Fan Out Pattern using minPartitions to Use Kafka Efficiently
Overload protection using maxOffsetsPerTrigger
More Apache Spark Settings used to optimize Throughput
MicroBatching Best Practices
Map() +ForEach() vs MapPartitons + forEachPartition
Adobe Spark Speculation and its Effects
Calculating Streaming Statistics
Windowing
Importance of the State Store
RocksDB FTW
Broadcast joins
Custom Aggegators
OffHeap Counters using Redis
Pipelining
This presentation was given to the system adminstration team to give them an idea of how GC works and what to look for when there is abottleneck and troubles.
The document discusses troubleshooting performance issues for SQL Server. It begins with an introduction and case study on the MS Society of Canada's website. It then discusses optimizing the environment, using Performance Monitor (PerfMon) to monitor performance, and concludes with recommendations to address issues like high CPU usage, slow disk speeds, and insufficient memory.
This session brings to your attention how several millions of dollars are wasted and what you can do to save money. Optimizing garbage collection performance not only saves money, but also improves the overall customer experience as well.
Trivadis TechEvent 2016 Capacity Management with TVD-CapMan - recent projects...Trivadis
TVD-CapMan is capacity management software that collects metrics on CPU, I/O, memory usage and other resources from Oracle databases. It analyzes the data to identify resource shortages and spare capacities, perform trend analysis and predictions, and make recommendations for database distribution and consolidation across host servers. The software was demonstrated through examples showing reports on metric trends over time, predictions, sizing recommendations, and visualizations of resource usage across a database environment.
NYC Java Meetup - Profiling and PerformanceJason Shao
A brief overview of some of the tools that ship with the Java platform that can be used to troubleshoot performance issues, and common production/performance problems
How to use Impala query plan and profile to fix performance issuesCloudera, Inc.
Apache Impala is an exceptional, best-of-breed massively parallel processing SQL query engine that is a fundamental component of the big data software stack. Juan Yu demystifies the cost model Impala Planner uses and how Impala optimizes queries and explains how to identify performance bottleneck through query plan and profile and how to drive Impala to its full potential.
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon
In this presentation, we will introduce Hotspot's Garbage First collector (G1GC) as the most suitable collector for latency-sensitive applications running with large memory environments. We will first discuss G1GC internal operations and tuning opportunities, and also cover tuning flags that set desired GC pause targets, change adaptive GC thresholds, and adjust GC activities at runtime. We will provide several HBase case studies using Java heaps as large as 100GB that show how to best tune applications to remove unpredicted, protracted GC pauses.
DevoxxUK: Optimizating Application Performance on KubernetesDinakar Guniguntala
Now that you have your apps running on K8s, wondering how to get the response time that you need ? Tuning a polyglot set of microservices to get the performance that you need can be challenging in Kubernetes. The key to overcoming this is observability. Luckily there are a number of tools such as Prometheus that can provide all the metrics you need, but here is the catch, there is so much of data and metrics that is difficult make sense of it all. This is where Hyperparameter tuning can come to the rescue to help build the right models.
This talk covers best practices that will help attendees
1. To understand and avoid common performance related problems.
2. Discuss observability tools and how they can help identify perf issues.
3. Look closer into Kruize Autotune which is a Open Source Autonomous Performance Tuning Tool for Kubernetes and where it can help.
Introduces important facts and tools to help you get starting with performance improvement.
Learn to monitor and analyze important metrics, then you can start digging and improving.
Includes useful munin probes, predefined SQL queries to investigate your database's performance, and a top 5 of the most common performance problems in custom Apps.
By Olivier Dony - Lead Developer & Community Manager, OpenERP
Session ID: SFO17-307
Session Name: WALT vs PELT : Redux
- SFO17-307
Speaker: Pavan Kumar Kondeti
Track: LMG
★ Session Summary ★
New data on the comparison of the WALT and PELT load tracking schemes in the scheduler
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/sfo17/sfo17-307/
Presentation:
Video: https://www.youtube.com/watch?v=r3QKEYpyetU
---------------------------------------------------
★ Event Details ★
Linaro Connect San Francisco 2017 (SFO17)
25-29 September 2017
Hyatt Regency San Francisco Airport
---------------------------------------------------
Keyword:
'http://www.linaro.org'
'http://connect.linaro.org'
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://twitter.com/linaroorg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961
Query Optimization with MySQL 8.0 and MariaDB 10.3: The BasicsJaime Crespo
Query optimization tutorial for Beginners using MySQL 8.0 and MariaDB 10.3 presented at the Open Source Database Percona Live Europe 2018 organized in Frankfurt. The source can be found and errors can be reported at https://github.com/jynus/query-optimization
Material URL moved to: http://jynus.com/dbahire/pleu18
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayC4Media
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2mAKgJi.
Ian Nowland and Joel Barciauskas talk about the challenges Datadog faces as the company has grown its real-time metrics systems that collect, process, and visualize data to the point they now handle trillions of points per day. They also talk about how the architecture has evolved, and what they are looking to in the future as they architect for a quadrillion points per day. Filmed at qconnewyork.com.
Ian Nowland is the VP Engineering Metrics and Alerting at Datadog. Joel Barciauskas currently leads Datadog's distribution metrics team, providing accurate, low latency percentile measures for customers across their infrastructure.
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New FeaturesAmazon Web Services
Learn the specifics of Amazon RDS for PostgreSQL’s capabilities and extensions that make it powerful. This session begins with a brief overview of the RDS PostgreSQL service, how it provides High Availability & Durability and will then deep dive into the new features that we have released since re:Invent 2014, including major version upgrade and newly added PostgreSQL extensions to RDS PostgreSQL. During the session, we will also discuss lessons learned running a large fleet of PostgreSQL instances, including specific recommendations. In addition we will present benchmarking results looking at differences between the 9.3, 9.4 and 9.5 releases.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Building RAG with self-deployed Milvus vector database and Snowpark Container...Zilliz
This talk will give hands-on advice on building RAG applications with an open-source Milvus database deployed as a docker container. We will also introduce the integration of Milvus with Snowpark Container Services.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Speck&Tech
ABSTRACT: A prima vista, un mattoncino Lego e la backdoor XZ potrebbero avere in comune il fatto di essere entrambi blocchi di costruzione, o dipendenze di progetti creativi e software. La realtà è che un mattoncino Lego e il caso della backdoor XZ hanno molto di più di tutto ciò in comune.
Partecipate alla presentazione per immergervi in una storia di interoperabilità, standard e formati aperti, per poi discutere del ruolo importante che i contributori hanno in una comunità open source sostenibile.
BIO: Sostenitrice del software libero e dei formati standard e aperti. È stata un membro attivo dei progetti Fedora e openSUSE e ha co-fondato l'Associazione LibreItalia dove è stata coinvolta in diversi eventi, migrazioni e formazione relativi a LibreOffice. In precedenza ha lavorato a migrazioni e corsi di formazione su LibreOffice per diverse amministrazioni pubbliche e privati. Da gennaio 2020 lavora in SUSE come Software Release Engineer per Uyuni e SUSE Manager e quando non segue la sua passione per i computer e per Geeko coltiva la sua curiosità per l'astronomia (da cui deriva il suo nickname deneb_alpha).
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
3. @TwitterEng 3
Performance Tuning Overview
Top-Down Analysis
- Commonly used when you have the ability to change code at the highest level of the software stack.
1. Monitor target application under load
- System-level diagnostics
- JVM-level diagnostics
2. Profile the application under load
3. Identify bottlenecks, analyze, and optimize
- Make code more efficient
- Reduce allocation rates
4. Repeat
Thursday, September 26, 13
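The JVM-level monitoring in step 1 can be sketched with the standard java.lang.management API, which exposes per-collector counts and accumulated pause time (the class name `GcMonitor` is illustrative, not from the talk):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Minimal sketch of JVM-level diagnostics: poll the GC MXBeans to see how
// many collections have run and how much time they have consumed so far.
public class GcMonitor {
    // Total GC pause time (ms) accumulated across all collectors.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime()); // -1 means undefined
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("total GC time: " + totalGcTimeMillis() + " ms");
    }
}
```

Sampling these counters periodically under load gives the GC-frequency and pause-time trend that the profiling step then explains.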
4. @TwitterEng 4
Performance Tuning Overview
Bottom-Up Analysis
- Commonly used when you do not have the ability to change code at the highest level of the software stack.
- JVM and OS performance optimization is a common use case.
1. Monitor CPU-level statistics against the target application under load
- Use hardware counters (cache misses, path length, etc.)
- Profile at the hardware level and map back to instructions, OS/JVM, and Scala/Java code
- Use tools when available; otherwise visually inspect the assembly code
2. Manipulate static and runtime compilers to address code issues
- Missed optimizations
- Example: autobox elision
3. Manipulate the javac / scala compilers
4. Manipulate core platform libraries
5. Identify issues at higher levels of the application stack
6. Repeat
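The "autobox elision" example above can be illustrated with a small sketch: the boxed accumulator allocates a wrapper object on every addition unless the JIT can elide the boxing, while the primitive version allocates nothing (class and method names are illustrative):

```java
// Illustration of autoboxing as a missed-optimization target: both loops
// compute the same sum, but sumBoxed boxes the accumulator on each add.
public class AutoboxDemo {
    static long sumBoxed(int n) {
        Long total = 0L;              // boxed accumulator: a Long per iteration
        for (int i = 0; i < n; i++) total += i;
        return total;
    }

    static long sumPrimitive(int n) {
        long total = 0L;              // primitive accumulator: no allocation
        for (int i = 0; i < n; i++) total += i;
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed(1000));     // 499500
        System.out.println(sumPrimitive(1000)); // 499500
    }
}
```

Bottom-up analysis catches this kind of issue when the hardware profile shows allocation and GC work attributable to a loop that looks allocation-free at the source level.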
10. @TwitterEng 10
Choosing the Right Metrics
Identify Metrics
- What’s important to your users
- What influences your bottom line?
- What are you willing to trade off?
Define Success
- If it’s not broken, don’t fix it.
- Perfect is the enemy of done.
11. @TwitterEng 11
Choosing the Right Metrics
We want it all!
- High throughput
- Fast response times
- Small footprint
But …
- There’s no free lunch.
Choose your metrics wisely
- Target metrics that impact your customers first
Use Statistics!
- High variability can render some metrics useless
12. @TwitterEng 12
Throughput Metrics
Transactions per Second (TPS)
- # of Transactions / Time
- Aka pages/sec, queries/sec, hits/sec
- Good measure of top-end performance
Average Response Time
- Inverse of TPS for a single request stream
- Time / # of Transactions
- Sometimes a rolling average
CPU utilization
- Measure of computation efficiency
- Good for capacity planning, not for development regression testing (new features can increase work).
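The arithmetic behind the two throughput metrics above is simple enough to sketch directly (class name and sample numbers are illustrative): TPS is transactions over elapsed time, and for a single request stream the average response time is its inverse.

```java
// Sketch of the throughput metrics: TPS = transactions / time, and
// average response time = time / transactions (inverse of TPS for
// a single serial request stream).
public class ThroughputMetrics {
    static double tps(long transactions, double elapsedSeconds) {
        return transactions / elapsedSeconds;
    }

    static double avgResponseSeconds(long transactions, double elapsedSeconds) {
        return elapsedSeconds / transactions;
    }

    public static void main(String[] args) {
        // e.g. 120,000 transactions served in 60 s
        System.out.println(tps(120_000, 60.0));                // 2000 TPS
        System.out.println(avgResponseSeconds(120_000, 60.0)); // 0.5 ms each
    }
}
```

With concurrent clients the inverse relationship no longer holds, which is one reason the next slide treats latency as its own set of metrics.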
13. @TwitterEng 13
Latency Metrics
Maximum response time
- Worst case
99% response time
- Drops a few outliers
90% response time
- May drop too many outliers and give a false sense of security
Critical Injection Rate
- Critical jOPs in SPECjbb2013
- Achievable throughput under response time SLA
Not Average Response Time
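A minimal sketch of the percentile metrics above, using the nearest-rank method on a set of measured latencies (class name and sample data are illustrative). Note how the max captures the single outlier while p90 drops it, which is exactly the trade-off the slide warns about:

```java
import java.util.Arrays;

// Sketch of the latency metrics: max, p99, p90 as nearest-rank percentiles
// over a sample of measured response times.
public class LatencyMetrics {
    static long percentile(long[] latencies, double pct) {
        long[] sorted = latencies.clone();
        Arrays.sort(sorted);
        // nearest-rank: smallest value such that pct% of samples are <= it
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        long[] ms = {12, 15, 11, 14, 13, 250, 16, 12, 13, 14}; // one outlier
        System.out.println("max = " + percentile(ms, 100)); // 250: worst case
        System.out.println("p90 = " + percentile(ms, 90));  // 16: outlier dropped
    }
}
```
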
14. @TwitterEng 14
Memory Footprint Metrics
Heap size after Full GC (Live Data Size; see upcoming slide)
Native process size
- ps -o rss,vsz -p <PID>
Static footprint
- Size of application binary
- Size of .jar
- Why does it matter?
- Download/deployment speed
- Update/refresh speed
16. @TwitterEng 16
JVM Tuning Basics
Track size of Old Generation after Full GCs
[GC 435426K->392697K(657920K), 0.1411660 secs]
[Full GC 392697K->390333K(927232K), 0.5547680 secs]
[GC 625853K->592369K(1000960K), 0.1852460 secs]
[GC 831473K->800585K(1068032K), 0.1707610 secs]
[Full GC 800585K->798499K(1456640K), 1.9056030 secs]
Calculating Live Data Size
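The calculation above can be sketched as a small log parser: live data size is the heap occupancy remaining after a Full GC, which is the second number in a line like `[Full GC 392697K->390333K(927232K), 0.5547680 secs]` (class name and regex are illustrative, not a standard tool):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: extract live data size (post-collection occupancy) from a
// Full GC log line of the form [Full GC <before>K-><after>K(<total>K), ...].
public class LiveDataSize {
    static final Pattern FULL_GC =
            Pattern.compile("\\[Full GC (\\d+)K->(\\d+)K\\((\\d+)K\\)");

    // Returns the post-Full-GC occupancy in KB, or -1 if not a Full GC line.
    static long liveDataKB(String logLine) {
        Matcher m = FULL_GC.matcher(logLine);
        return m.find() ? Long.parseLong(m.group(2)) : -1;
    }

    public static void main(String[] args) {
        String line = "[Full GC 392697K->390333K(927232K), 0.5547680 secs]";
        System.out.println(liveDataKB(line) + "K live"); // 390333K live
    }
}
```

Tracking this value across several Full GCs at steady state gives the live data size used by the sizing rules later in the deck.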
17. @TwitterEng 17
JVM Tuning Basics
Track size of Old Generation after Young GCs if no Full GC events occur
2013-09-10T05:39:03.489+0000: [GC[ParNew: 11766264K->18476K(13212096K), 0.0326070 secs] 12330878K->583306K(16357824K), 0.0327090 secs] [Times: user=0.48 sys=0.01, real=0.03 secs]
2013-09-10T05:42:54.666+0000: [GC[ParNew: 11762604K->20088K(13212096K), 0.0270110 secs] 12327434K->585068K(16357824K), 0.0271140 secs] [Times: user=0.39 sys=0.00, real=0.02 secs]
2013-09-10T05:46:41.623+0000: [GC[ParNew: 11764216K->21013K(13212096K), 0.0267490 secs] 12329196K->586133K(16357824K), 0.0268490 secs] [Times: user=0.40 sys=0.03 secs]
Calculating Live Data Size
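The same calculation from young-GC records can be sketched as a subtraction: each ParNew entry reports young-gen occupancy after collection and whole-heap occupancy after collection, so old-gen occupancy is their difference. The numbers below come from the log lines on this slide; the class name is illustrative:

```java
// Sketch: old-generation occupancy after a ParNew collection is
// (whole-heap occupancy after GC) - (young-gen occupancy after GC).
public class OldGenFromParNew {
    static long oldGenAfterKB(long youngAfterKB, long heapAfterKB) {
        return heapAfterKB - youngAfterKB;
    }

    public static void main(String[] args) {
        // First record:  ParNew ...->18476K, heap ...->583306K
        // Second record: ParNew ...->20088K, heap ...->585068K
        long first  = oldGenAfterKB(18476, 583306);  // 564830K
        long second = oldGenAfterKB(20088, 585068);  // 564980K
        System.out.println("old gen: " + first + "K -> " + second + "K"
                + " (~" + (second - first) + "K promoted between GCs)");
    }
}
```

If this difference stays flat across young GCs, it approximates live data size without waiting for a Full GC; if it climbs steadily, objects are being promoted.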
18. @TwitterEng 18
JVM Tuning Basics
Size of Old Generation
- Good starting point: 2x the size of live data at steady state.
- If the object promotion rate causes frequent CMS cycles, increase the size of the old generation.
- If live data size is 5GB, the starting point should be ~10GB.
- That is the Old Generation size alone.
- Set -Xms and -Xmx to the same value
- Nobody really needs extra Full GC pauses.
Young and Old Generation Sizing
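The sizing rule above reduces to simple arithmetic, sketched here (class name is illustrative): old gen starts at 2x live data size, young gen starts equal to old gen, so the initial heap is roughly 4x live data.

```java
// Sketch of the starting-point sizing rule:
//   old gen   = 2 x live data size
//   young gen = old gen
//   heap      = old + young = 4 x live data size
public class HeapSizing {
    static long oldGenGB(long liveDataGB)   { return 2 * liveDataGB; }
    static long youngGenGB(long liveDataGB) { return oldGenGB(liveDataGB); }
    static long heapGB(long liveDataGB)     { return oldGenGB(liveDataGB) + youngGenGB(liveDataGB); }

    public static void main(String[] args) {
        long live = 4; // GB of live data at steady state
        System.out.printf("-Xms%dg -Xmx%dg -Xmn%dg%n",
                heapGB(live), heapGB(live), youngGenGB(live));
    }
}
```

For 4GB of live data this yields -Xms16g -Xmx16g -Xmn8g, the same starting point the enterprise-application example in this deck arrives at.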
19. @TwitterEng 19
JVM Tuning Basics
Size of Young Generation
- Young gen = old gen is a good starting point.
- Young generation size should increase with allocation rate
- Sometimes 2-3x larger than the old gen
- Young GC times are dominated by copying of live objects to survivor spaces, not by the overall size of the Young Generation
- Size so that most objects die in the Young Generation
- Higher allocation rates -> larger Young Generation
Young and Old Generation Sizing
20. @TwitterEng 20
JVM Tuning Basics
Example Enterprise Application
- Significant application state
- In-memory cache size: 3.5GB
- Overall live data size: 4GB
- High allocation rate of transient data
- Most objects die in a large young generation
- Suggested initial heap size:
- -Xms16g -Xmx16g -Xmn8g
Young and Old Generation Sizing
21. @TwitterEng 21
JVM Tuning Basics
Throughput
- -XX:+UseParallelOldGC
Low server response times?
- CMS
- Older technology
- Can be highly tuned, but tuning can be brittle
- -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
- G1
- Current development focus
- Young GC times slower than CMS
- -XX:+UseG1GC
Choosing a Garbage Collector
26. @TwitterEng 26
Tuning for Latency
Enable CMS
- -XX:+UseConcMarkSweepGC
Good to have
- -XX:+CMSScavengeBeforeRemark
- -XX:+ParallelRefProcEnabled
- -XX:CMSInitiatingOccupancyFraction=70
Start with Basic Tuning Guidelines
- -XX:PermSize=256m -XX:MaxPermSize=256m
- Old Gen size is 2x live data size
- Young Gen size = Old Gen size
Using CMS
27. @TwitterEng 27
Tuning for Latency
General rules of thumb
- Increase young gen size to handle higher allocation rates.
- Increase young gen size if the promotion rate is high
- May suffer from premature promotion, i.e. promotion caused by too-frequent young GCs.
- A larger young gen decreases GC frequency and gives objects more time to die.
- Increase old gen size if the promotion rate is still high; avoid allocation failures and concurrent mode failures.
Using CMS
28. @TwitterEng 28
Tuning for Latency
CMS Tuned for Latency
-Xmx18g -Xms18g -XX:PermSize=256m
-XX:MaxPermSize=256m -XX:+CMSScavengeBeforeRemark
-XX:-OmitStackTraceInFastThrow -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=70
-XX:+UseCMSInitiatingOccupancyOnly
-XX:SurvivorRatio=6 -XX:NewSize=8g
-XX:MaxNewSize=8g -verbose:gc
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
Note: Increased Young Gen Size, Survivor Ratio Tuning
Using CMS
29. @TwitterEng 29
Tuning for Latency
Enable G1
-XX:+UseG1GC -XX:MaxGCPauseMillis=100
- Start with just overall heap size and target pause time.
- Increase Young Generation Size for High Allocation
- Tune to keep remembered set processing low
Using G1GC
30. @TwitterEng 30
Tuning for Latency
G1 Tuning to Consider
-XX:InitiatingHeapOccupancyPercent=90
–XX:G1MixedGCLiveThresholdPercent: The occupancy threshold
of live objects in the old region to be included in the mixed collection.
–XX:G1HeapWastePercent: The threshold of garbage that you can
tolerate in the heap.
–XX:G1MixedGCCountTarget: The target number of mixed garbage
collections within which the regions with at most
G1MixedGCLiveThresholdPercent live data should be collected.
–XX:G1OldCSetRegionThresholdPercent: A limit on the max
number of old regions that can be collected during a mixed collection.
Reference: Monica Beckwith’s InfoQ article:
“G1: One Garbage Collector To Rule Them All“
http://www.infoq.com/articles/G1-One-Garbage-Collector-To-Rule-Them-All
Using G1GC
31. @TwitterEng 31
Tuning for Latency
G1GC Tuned for Latency
-XX:+TieredCompilation -XX:InitialCodeCacheSize=256m
-XX:ReservedCodeCacheSize=256m -Xmx18g -Xms18g
-XX:PermSize=256m -XX:MaxPermSize=256M -XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:InitiatingHeapOccupancyPercent=90
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
Note: MaxGCPauseMillis is the biggest tuning knob.
Don’t start from your CMS tuning!
Using G1GC
33. @TwitterEng 33
Enable ParallelOldGC
-XX:+UseParallelOldGC
Old Gen needs to be 2-4X live data size (LDS)
Young generation should be ¾ the heap
Often used when tuning for throughput
-XX:+AggressiveOpts
-XX:+TieredCompilation
Disable adaptive sizing and tune survivor spaces directly:
-XX:-UseAdaptiveSizePolicy -XX:SurvivorRatio=7
-XX:TargetSurvivorRatio=90
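The -XX:TargetSurvivorRatio=90 flag above drives the adaptive tenuring threshold: HotSpot picks the smallest object age at which the cumulative surviving bytes would exceed 90% of a survivor space. A simplified sketch of that calculation (not the actual HotSpot code):

```java
public class TenuringThreshold {
    // Simplified version of HotSpot's age-table calculation: walk the ages
    // in order, accumulating surviving bytes, and stop at the first age
    // where the total would exceed the target survivor occupancy.
    static int computeThreshold(long[] survivingBytesByAge, long survivorCapacity,
                                int targetSurvivorRatio, int maxTenuringThreshold) {
        long desired = survivorCapacity * targetSurvivorRatio / 100;
        long total = 0;
        for (int age = 0; age < survivingBytesByAge.length; age++) {
            total += survivingBytesByAge[age];
            if (total > desired) {
                return Math.min(age + 1, maxTenuringThreshold);
            }
        }
        return maxTenuringThreshold;
    }

    public static void main(String[] args) {
        // Hypothetical tenuring distribution (bytes surviving at each age),
        // against a 100 MB survivor space and TargetSurvivorRatio=90:
        long[] byAge = {40L << 20, 30L << 20, 30L << 20};
        System.out.println(computeThreshold(byAge, 100L << 20, 90, 15));
    }
}
```

The -XX:+PrintTenuringDistribution output used throughout this deck is exactly this per-age table.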
Using ParallelOldGC
Tuning for Throughput
34. @TwitterEng 34
Tuning for Throughput
ParallelOldGC tuned for Throughput:
-showversion -server -XX:-UseBiasedLocking
-XX:LargePageSizeInBytes=2m -XX:+AlwaysPreTouch
-XX:+UseLargePages -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+UseLargePages
-Xms29g -Xmx29g -Xmn27g -XX:+UseParallelOldGC
-XX:ParallelGCThreads=24 -XX:SurvivorRatio=16
-XX:TargetSurvivorRatio=90 -XX:-UseAdaptiveSizePolicy
-XX:+AggressiveOpts -XX:InitialCodeCacheSize=160m
-XX:ReservedCodeCacheSize=160m -XX:+TieredCompilation
Using ParallelOldGC
35. @TwitterEng 35
Enable G1
-XX:+UseG1GC
Old Gen needs to be 2X live data size (LDS)
Young generation should be ¾ the heap
Often used when tuning for throughput
-XX:+AggressiveOpts
-XX:+TieredCompilation
Using G1GC
Tuning for Throughput
36. @TwitterEng 36
Tuning for Throughput
G1GC tuned for throughput:
-showversion -server -XX:-UseBiasedLocking
-XX:LargePageSizeInBytes=2m -XX:+AlwaysPreTouch
-XX:+UseLargePages -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+UseLargePages
-Xms28g -Xmx28g -Xmn21g -XX:+UseG1GC
-XX:+AggressiveOpts
-XX:InitialCodeCacheSize=160m -XX:ReservedCodeCacheSize=160m
-XX:+TieredCompilation
Using G1GC
37. @TwitterEng 37
Enable CMS, and tune for throughput
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC
- Configure heap to avoid promotion
- Application design should separate stateful and stateless components
to allow targeted tuning.
Young generation should be ¾ the heap
- Young generation should be sized to ensure nearly all objects die young.
- Very large heaps, very large old generation
- Use memory to avoid the need for Full GC.
Tuning survivor spaces manually, etc.
-XX:SurvivorRatio=7 -XX:+CMSScavengeBeforeRemark
-XX:+ParallelRefProcEnabled
Using CMS
Tuning for Throughput
38. @TwitterEng 38
Tuning for Throughput
CMS Tuned for Throughput
-Xmx18g -Xms18g -XX:PermSize=256m
-XX:MaxPermSize=256M -XX:+CMSScavengeBeforeRemark
-XX:-OmitStackTraceInFastThrow -XX:+AggressiveOpts
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=90
-XX:+UseCMSInitiatingOccupancyOnly
-XX:SurvivorRatio=6 -XX:NewSize=16g
-XX:MaxNewSize=16g -verbose:gc
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:InitialCodeCacheSize=160m -XX:ReservedCodeCacheSize=160m
-XX:+TieredCompilation
Using CMS
40. @TwitterEng 40
Enable ParallelOldGC
-XX:+UseParallelOldGC
Old Gen needs to be 2X live data size (LDS)
Young generation should start at 1/2 the Old Generation size.
Strategy is to reduce young and old GC sizes independently
until a maximum acceptable end user response time is met.
Definitely not low-pause: trading higher response times and lower throughput for a smaller footprint.
Using ParallelOldGC
Tuning for Footprint
41. @TwitterEng 41
Tuning for Footprint
ParallelOldGC tuned for Footprint
-showversion -server -XX:LargePageSizeInBytes=2m
-XX:+UseLargePages -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+UseLargePages
-Xms8g -Xmx8g -Xmn4g -XX:+UseParallelOldGC
-XX:-UseAdaptiveSizePolicy -XX:+AggressiveOpts
–XX:PermSize=256m -XX:MaxPermSize=256M
Using ParallelOldGC
42. @TwitterEng 42
Enable G1
-XX:+UseG1GC
Heap should be 3x live data size (LDS)
- Do not tune the size of the young generation
- Allow G1 to adapt the size
- Tune only after observing the minimum size G1 settles on
Increase the Pause Target to decrease GC overhead
-XX:MaxGCPauseMillis=400
Strategy is to reduce young and old GC sizes independently
until a maximum acceptable end user response time is met.
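The "3x live data size" rule above reduces to simple arithmetic once you have measured the live data size (old-gen occupancy after a Full GC). The class name and numbers here are illustrative:

```java
public class FootprintSizing {
    // Rule of thumb from the slide: for a footprint-tuned G1 heap,
    // total heap ~= 3x the live data size (LDS).
    static long heapGbFromLiveData(long liveDataGb) {
        return 3 * liveDataGb;
    }

    public static void main(String[] args) {
        // e.g. 4 GB of live data measured after a Full GC
        long heapGb = heapGbFromLiveData(4);
        System.out.println("-Xms" + heapGb + "g -Xmx" + heapGb + "g");
    }
}
```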
Using G1GC
Tuning for Footprint
43. @TwitterEng 43
Tuning for Footprint
G1 Tuned for Footprint
-showversion -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-Xms12g -Xmx12g -XX:+UseG1GC -XX:InitialCodeCacheSize=160m
-XX:ReservedCodeCacheSize=160m
Using G1GC
44. @TwitterEng 44
Enable CMS, and tune for footprint
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC
Old Gen needs to be 2X live data size (LDS)
Young generation should start at 1/2 the Old Generation size.
- Young generation should be sized so “enough” objects die young
to reduce the pressure on CMS
- Promotion rate needs to be low enough so CMS concurrent
threads don’t lose the race (concurrent mode failures)
Strategy is to reduce young and old GC sizes independently
until a maximum acceptable end user response time is met.
-Young Generation first, then OldGen.
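The "race" mentioned above can be estimated with back-of-the-envelope numbers: CMS risks a concurrent mode failure when the amount promoted during one concurrent cycle approaches the old-gen free space at cycle start. A sketch of that check, with hypothetical names and inputs:

```java
public class CmsHeadroom {
    // CMS loses the race when promotions during a concurrent cycle fill
    // the old-gen headroom before the cycle finishes, forcing a
    // stop-the-world concurrent mode failure.
    static boolean concurrentModeFailureRisk(double promotionRateMBps,
                                             double cycleDurationSec,
                                             long oldGenFreeMb) {
        return promotionRateMBps * cycleDurationSec >= oldGenFreeMb;
    }

    public static void main(String[] args) {
        // 10 MB/s of promotion over a 30s cycle needs 300 MB of headroom.
        System.out.println(concurrentModeFailureRisk(10.0, 30.0, 200));  // risky
        System.out.println(concurrentModeFailureRisk(10.0, 30.0, 2048)); // safe
    }
}
```

This is why shrinking the young gen first is safer: it lowers the promotion rate before you take headroom away from the old gen.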
Using CMS
Tuning for Footprint
45. @TwitterEng 45
Tuning for Footprint
Example of a highly tuned CMS deploy for footprint:
-Xmx12g -Xms12g -Xmn4g -XX:PermSize=256m
-XX:MaxPermSize=256M -XX:+CMSScavengeBeforeRemark
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=60
-XX:SurvivorRatio=6 -verbose:gc
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
Note: Increased Young Gen Size, Survivor Ratio Tuning
Using CMS
47. @TwitterEng 47
Common Performance Issues
Size of Permanent Generation
- Perm. Gen. only collects and resizes at Full GC.
Heap before GC invocations=40019 (full 36522):
 par new generation total 15354176K, used 14K [0x00000003b9c00000, 0x0000000779c00000, 0x0000000779c00000)
  eden space 14979712K, 0% used [0x00000003b9c00000, 0x00000003b9c039a8, 0x000000074c0a0000)
  from space 374464K, 0% used [0x000000074c0a0000, 0x000000074c0a0000, 0x0000000762e50000)
  to   space 374464K, 0% used [0x0000000762e50000, 0x0000000762e50000, 0x0000000779c00000)
 concurrent mark-sweep generation total 2097152K, used 588343K [0x0000000779c00000, 0x00000007f9c00000, 0x00000007f9c00000)
 concurrent-mark-sweep perm gen total 102400K, used 102399K [0x00000007f9c00000, 0x0000000800000000, 0x0000000800000000)
2013-09-05T17:21:39.530+0000: [Full GC[CMS: 588343K->588343K(2097152K), 1.6166150 secs] 588357K->588343K(17451328K), [CMS Perm : 102399K->102399K(102400K)], 1.6167040 secs] [Times: user=1.57 sys=0.00, real=1.61 secs]
Heap after GC invocations=40020 (full 36523):
 par new generation total 15354176K, used 0K [0x00000003b9c00000, 0x0000000779c00000, 0x0000000779c00000)
  eden space 14979712K, 0% used [0x00000003b9c00000, 0x00000003b9c00000, 0x000000074c0a0000)
  from space 374464K, 0% used [0x000000074c0a0000, 0x000000074c0a0000, 0x0000000762e50000)
  to   space 374464K, 0% used [0x0000000762e50000, 0x0000000762e50000, 0x0000000779c00000)
 concurrent mark-sweep generation total 2097152K, used 588343K [0x0000000779c00000, 0x00000007f9c00000, 0x00000007f9c00000)
 concurrent-mark-sweep perm gen total 102400K, used 102399K [0x00000007f9c00000, 0x0000000800000000, 0x0000000800000000)
}
Recommendation: -XX:PermSize=256m -XX:MaxPermSize=256m
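The diagnosis above can be automated by scanning Full GC lines for a perm gen that stays at capacity even after collection. A sketch based on the "CMS Perm" format shown in the log; the class and method names are illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PermGenCheck {
    private static final Pattern CMS_PERM =
            Pattern.compile("CMS Perm : (\\d+)K->(\\d+)K\\((\\d+)K\\)");

    // Perm-gen occupancy after the Full GC as a fraction of capacity,
    // or -1 if the line has no CMS Perm entry.
    static double permOccupancyAfterGc(String fullGcLogLine) {
        Matcher m = CMS_PERM.matcher(fullGcLogLine);
        if (!m.find()) {
            return -1.0;
        }
        return Double.parseDouble(m.group(2)) / Double.parseDouble(m.group(3));
    }

    public static void main(String[] args) {
        String line = "[Full GC[CMS: 588343K->588343K(2097152K), 1.6166150 secs]"
                + " 588357K->588343K(17451328K),"
                + " [CMS Perm : 102399K->102399K(102400K)], 1.6167040 secs]";
        // Occupancy near 1.0 even after a Full GC means the perm gen is
        // exhausted: raise -XX:PermSize / -XX:MaxPermSize.
        System.out.printf("perm occupancy after Full GC: %.5f%n",
                permOccupancyAfterGc(line));
    }
}
```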
In Enterprise Software
48. @TwitterEng 48
Common Performance Issues
Size of Code Cache
- Default size is 64mb, 96mb if running TieredCompilation
- Enterprise Applications have lots of code
Aggressively Tune to Avoid Issue
- Tuning without TieredCompilation:
  -XX:InitialCodeCacheSize=128m
  -XX:ReservedCodeCacheSize=128m
- Tuning with TieredCompilation:
  -XX:InitialCodeCacheSize=256m
  -XX:ReservedCodeCacheSize=256m
In Enterprise Software
50. @TwitterEng 50
What’s up with Twitter and JDK Development?
Twitter runs Java + Scala on the HotSpot JVM
- Most highly optimized managed runtime
- Open source :-)
- Massive performance gains moving technologies to it
Own and Optimize our Platform
- Build out diagnostic tools
- Build, test, and deploy OpenJDK
- Optimize HotSpot Runtime Compilers for Scala, etc.
- Tailored GC for Twitter’s needs
- Extremely low latency requirements (< 10 ms)
@TwitterJDK
51. @TwitterEng 51
What’s up with Twitter and JDK Development?
Contribute Back to the Community
- Working closely with Oracle Java Development
- Collaborating with Other OpenJDK contributors
- Posting tools to Github and OpenJDK repositories
Interesting isn’t it?
- We’re just ramping up now.
- Follow us soon: @TwitterJDK (new idea)
- Follow me at: @dagskeenan
- #jointheflock
@TwitterJDK