1. Install the dhcp RPM package and configure the dhcpd service to run on a specific network interface, such as eth1.
2. Copy the sample dhcpd configuration file and modify it to suit your network settings, such as IP ranges, DNS servers, and lease times. You can assign static IP addresses by specifying client hardware addresses or names.
3. Start the dhcpd service and test that DHCP is functioning properly by requesting IP addresses with client devices on the configured network interface. Restart dhcpd whenever the configuration file is modified.
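The steps above can be sketched with a minimal /etc/dhcp/dhcpd.conf; the subnet, address ranges, hostname, and MAC address below are placeholders, not values from the original steps:

```
# /etc/dhcp/dhcpd.conf -- minimal sketch; adjust subnet, ranges, and MACs
subnet 192.168.1.0 netmask 255.255.255.0 {
    range 192.168.1.100 192.168.1.200;
    option domain-name-servers 192.168.1.1;
    default-lease-time 600;
    max-lease-time 7200;
}

# Static address pinned to a client's hardware (MAC) address
host printer1 {
    hardware ethernet 00:11:22:33:44:55;
    fixed-address 192.168.1.10;
}
```

After editing, restart the service (e.g. `systemctl restart dhcpd`) so the new configuration takes effect.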
The document discusses resource overbooking and application profiling techniques in shared hosting platforms. It proposes using application profiling to determine resource needs and then provisioning resources through controlled overbooking to improve platform utilization while still meeting performance guarantees. Dynamic variations in workload are handled through online profiling and reacting by recomputing allocations or moving applications as needed. Experimental results show that small amounts of overbooking can yield large gains in throughput and that diverse application requirements impact the effectiveness of different capsule placement algorithms.
What do data center operators need to know when deploying Hadoop in the data center? Multi-tenancy, network topology, workload types, and myriad other factors affect the way applications run and perform. Understanding the performance characteristics of the distributed system is key not only to optimizing for Hadoop, but also to letting Hadoop operate seamlessly side by side with existing applications.
At Twitter we started out with a large monolithic cluster that served most use cases. As usage expanded and the cluster grew accordingly, we realized we needed to split the cluster by access pattern, which allows us to tune the access policy, SLA, and configuration for each cluster. We will explain our various use cases, their performance requirements, and operational considerations, and how those are served by the corresponding clusters. We will also discuss what our baseline Hadoop node looks like. Various, sometimes competing, considerations (storage size, disk IO, CPU throughput, fewer fast cores versus many slower cores, bonded 1 GE network interfaces versus a single 10 GE card, 1 TB, 2 TB, or 3 TB disk drives, and power draw) all need to be weighed in a trade-off where cost and performance are the major factors. We will show how we arrived at quite different hardware platforms at Twitter, not only saving money but also increasing performance.
HBaseCon 2013: Scalable Network Designs for Apache HBase (Cloudera, Inc.)
This document discusses scalable network designs and how modern networks can help applications. It begins with a brief history of network software and describes how switches now run Linux. Typical network designs are presented starting small and scaling up through multiple racks and core switches. The benefits of layer 3 designs, jumbo frames, and deep buffers to prevent packet loss are covered. Finally, it discusses how the network can help applications by detecting server failures, redirecting traffic, and enabling fast failover through features only possible by the switch running Linux.
Performance Analysis and Troubleshooting Methodologies for Databases (ScyllaDB)
Have you heard about the USE Method (Utilization - Saturation - Errors), RED (Rate - Errors - Duration), or Golden Signals (Latency - Traffic - Errors - Saturation)?
In this presentation, we will talk briefly about these different but similar “focuses” and discuss how we can apply them to data infrastructure performance analysis, troubleshooting, and monitoring.
We will use MySQL as an example but most of the talk will apply to other database technologies as well.
Outline:
- Introduce the challenge of troubleshooting by random Googling (1 min)
- Introduce the USE Method and how it applies to databases (5 min)
- Introduce the RED Method and how it applies to databases (5 min)
- Introduce Golden Signals (4 min)
- Provide a high-level comparison of the methods as a takeaway (4 min)
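As a toy illustration of applying the USE Method per resource, here is a Python sketch; the threshold values and the ordering of the checks are assumptions made for illustration, not part of the method's formal definition:

```python
# Sketch: classifying one resource sample with the USE method
# (Utilization - Saturation - Errors). Thresholds are illustrative.

def use_verdict(utilization, saturation, errors,
                util_limit=0.7, sat_limit=0.0):
    """Return the USE-method checks that fire for one resource sample.

    utilization: fraction of capacity in use (0.0-1.0)
    saturation:  amount of queued/deferred work (e.g. run-queue depth)
    errors:      error-event count over the sample interval
    """
    findings = []
    if errors > 0:
        findings.append("errors")        # errors first: cheapest, clearest signal
    if utilization > util_limit:
        findings.append("utilization")
    if saturation > sat_limit:
        findings.append("saturation")
    return findings or ["ok"]

# Example: a disk at 85% busy with a queue depth of 3 and no errors
print(use_verdict(0.85, 3, 0))  # -> ['utilization', 'saturation']
```

The same check loop can be run over each resource (CPU, disks, network, buffer pools) in turn, which is what makes the method systematic rather than ad hoc.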
This document discusses strategies for reducing the mean time to recovery (MTTR) in HBase to below one minute. It outlines how HBase recovery works and the key components involved. Techniques discussed for reducing MTTR include faster failure detection by lowering ZooKeeper timeouts, improving parallelism in region reassignment, and rewriting the data recovery process in HBase 0.96. However, the document notes that high MTTR is often due to downtime from HDFS data replication when a datanode fails along with a regionserver.
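As a sketch of the failure-detection tuning mentioned above, the ZooKeeper session timeout is set in hbase-site.xml. The value below is illustrative only; overly aggressive timeouts risk false failovers during long GC pauses:

```
<!-- hbase-site.xml: illustrative value, not a recommendation.
     Lowering the session timeout speeds failure detection, but a
     regionserver that pauses longer than this (e.g. in GC) will be
     declared dead and failed over. -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>30000</value>
</property>
```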
Key Considerations in Productionizing Streaming Applications (KafkaZone)
This document discusses key considerations for productionizing streaming applications. It gives an overview of streaming concepts and Spark Structured Streaming, and discusses how to choose a streaming engine based on factors like latency, throughput, and functionality requirements. Spark Structured Streaming is recommended for its low latency, high throughput, and developer-friendly APIs. The document also covers using Spark Lens to monitor streaming jobs and make recommendations for meeting performance goals such as trigger intervals, with examples of experimenting with different cluster configurations and analyzing the results.
HBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devices (Michael Stack)
The document summarizes a presentation on using persistent memory technology with Apache HBase. It describes the current HBase model using volatile memory and write-ahead logs, as well as the capabilities of persistent memory technology. The presentation proposes using persistent memory with HBase to store data durably, eliminate write-ahead logs for faster writes, and increase availability through replica regions held in memory. Performance tests found over 2x higher throughput and up to 104x lower maximum latency without write-ahead logs when using persistent memory with HBase.
MapR clusters disks into storage pools for data distribution. By default, storage pools contain 3 disks each. The mrconfig command can be used to create, remove, and manage storage pools and disks. Each node supports up to 36 storage pools. Zookeeper should always be started before other services and is critical for high availability. Logs are centrally stored for 30 days by default and can be configured through yarn-site.xml.
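An operational sketch of the storage-pool handling described above; the command paths and flags follow common MapR documentation, but should be verified against the documentation for your MapR version before use:

```
# List existing storage pools on this node
/opt/mapr/server/mrconfig sp list

# Format disks into storage pools, 3 disks per pool (-W sets the
# stripe width, i.e. disks per storage pool; 3 is the default).
# /tmp/disks.txt lists one device path per line, e.g. /dev/sdb
/opt/mapr/server/disksetup -F -W 3 /tmp/disks.txt
```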
How Netflix Tunes EC2 Instances for Performance (Brendan Gregg)
CMP325 talk for AWS re:Invent 2017, by Brendan Gregg. "
At Netflix we make the best use of AWS EC2 instance types and features to create a high performance cloud, achieving near bare metal speed for our workloads. This session will summarize the configuration, tuning, and activities for delivering the fastest possible EC2 instances, and will help other EC2 users improve performance, reduce latency outliers, and make better use of EC2 features. We'll show how we choose EC2 instance types, how we choose between EC2 Xen modes: HVM, PV, and PVHVM, and the importance of EC2 features such as SR-IOV for bare-metal performance. SR-IOV is used by EC2 enhanced networking, and recently for the new i3 instance type for enhanced disk performance as well. We'll also cover kernel tuning and observability tools, from basic to advanced. Advanced performance analysis includes the use of Java and Node.js flame graphs, and the new EC2 Performance Monitoring Counter (PMC) feature released this year."
M|18 Where and How to Optimize for Performance (MariaDB plc)
The document discusses optimizing performance in MariaDB. It defines key performance metrics like throughput, response time, and scalability. It recommends identifying where time is spent through profiling, using benchmarks with a real dataset and environment, improving hardware, optimizing queries and indexes, tuning the query cache, temporary tables, and file sorting. Other tips include proper data types, normalization/denormalization, index design, and server configuration best practices like buffer sizes, table caches, and using InnoDB features like the adaptive hash index. Encoding is also discussed.
Software Architecture Reconstruction: Why, What and How (Mehdi Mirakhorli)
Every system is a legacy system: the moment a programmer writes a line of code, it becomes legacy. Therefore, even in relatively new systems, much as in long-lived ones, developers are faced with a body of code that they need to understand and from which they need to extract architectural knowledge. Unfortunately, anecdotal evidence has shown that such knowledge tends to be tacit in nature, stored in the heads of people, and inconsistently scattered across various software artifacts and repositories. Furthermore, architectural knowledge vaporizes over time. Given the size, complexity, and longevity of many projects, developers therefore often lack a comprehensive knowledge of architectural design decisions and consequently make changes in the code that inadvertently degrade the underlying design and compromise its qualities.
This technical briefing will answer three fundamental questions about software architecture recovery: Why? What? and How? Through several examples, it articulates and synthesizes the technical forces and financial motivations that lead software companies to invest in software architecture recovery. It discusses “what” pieces of design knowledge can be recovered, and lastly demonstrates a methodology and the required tools for answering “how” to reconstruct architecture from implementation artifacts.
C* Summit EU 2013: The Cassandra Experience at Orange (DataStax Academy)
Speaker: Jean Armel Luce — Senior Software Engineer/Cassandra Admin at Orange
Video: http://www.youtube.com/watch?v=mefOE9K7sLI&list=PLqcm6qE9lgKLoYaakl3YwIWP4hmGsHm5e&index=28
At Orange, Jean Armel has helped develop an open source tool for migrating data to Cassandra. He and his team needed the NoSQL solution Apache Cassandra in order to sustain the growth in requests and data volume required by their application PnS. In this session, Jean Armel will start with an overview of the Orange application PnS, then dive into why they chose Apache Cassandra and how they did their data migration without any interruption of service. He will also show how the application behaves after the migration.
HBaseConAsia2018 Track1-7: HDFS optimizations for HBase at Xiaomi (Michael Stack)
This document discusses optimizations made to HDFS for HBase at Xiaomi. It addresses issues such as shared-memory allocation causing full GCs in datanodes, listen drops on SSD clusters causing delays, peer cache bucket adjustment, and connection timeouts. Changes such as preallocating shared memory, increasing the socket backlog, reducing client and datanode timeouts, and adjusting datanode dead-node detection improve performance and availability. The overall goal is to keep data local, return fast responses from HDFS to HBase, and reduce GC overhead in both systems.
Computing Performance: On the Horizon (2021), by Brendan Gregg
Talk by Brendan Gregg for USENIX LISA 2021. https://www.youtube.com/watch?v=5nN1wjA_S30 . "The future of computer performance involves clouds with hardware hypervisors and custom processors, servers running a new type of BPF software to allow high-speed applications and kernel customizations, observability of everything in production, new Linux kernel technologies, and more. This talk covers interesting developments in systems and computing performance, their challenges, and where things are headed."
Effectively Migrating to Cassandra from a Relational Database (Todd McGrath)
Cassandra kicks butt. But how do you enjoy the benefits when refactoring an existing RDBMS-based app? How can we migrate to Cassandra, test our data model, and prepare our operations while leaving our RDBMS online? What do we need to consider? How can we do it? Why should we do it?
Analyze a SVC, STORWIZE metro/global mirror performance problem-v58-20150818... (Michael Pirker)
A latency problem was reported for VDisk CA-CL1-Disk04-N at 02/05/15 8:09. The environment is two clusters connected with Metro Mirror. The first aim of this document is to show how we found the root cause of this problem in the link between the two clusters.
The second aim of this document is to describe how the root cause was found using the BVQ structured performance problem analysis method. It demonstrates that successful analysis work needs both a structured method and a tool that supports this method and delivers the needed technical insight. Our view is that everybody should be able to conduct a performance analysis. This matters because the level of vendor service is lowered day by day, and small customers in particular are increasingly reliant on their own skills or the skills of their partners; this is a problem common to all vendors.
The document provides an overview of initiatives and findings related to improving backup and restore processes at Century Link. Key accomplishments include developing health check scripts for TSM servers to reduce downtime and improve backup success rates. New findings identified opportunities to streamline TSM tape storage utilization by removing expired nodes and eliminating duplicate data to save on tapes. Issues were also found with some TSM server storage pools not having offsite copies. Suggested changes to address issues included evaluating backup tools, server configurations, and separating test/dev environments. Help was requested to approve removing expired and duplicate data.
This document provides instructions for completing a lab exercise on advanced port scanning using Nmap and Metasploit. Students will learn how to identify open ports and services on a target machine, the Metasploitable VM. The document outlines tasks to perform Nmap scans to discover the target IP address, identify open ports and services, perform version scanning, and clean up the host list. Screenshots are requested to support answers.
Performance Problems with iSeries Applications (mboadway)
In this webcast, iSeries performance expert Mike Boadway of MB Software used your enrollment survey responses to drive the focus of the content. Many performance problems start with applications and jobs that create a lot of unnecessary work for your AS/400 or iSeries. You'll learn why it's important to solve the underlying software issues of any performance problem before you spend a lot of money on new hardware.
This document discusses various metrics for monitoring MySQL database performance, including hard and soft capacity limits, availability, task performance, resource performance, and scalability. It provides examples of top 10 metric lists and Nagios plugins for MySQL monitoring. It recommends monitoring work getting done, desirable metrics like utilization and response times, and avoiding non-actionable metrics like cache hit ratios. Alerts are best when they combine multiple critical issues rather than single isolated problems.
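A minimal Python sketch of the "monitor work getting done" advice above: convert two samples of cumulative SHOW GLOBAL STATUS counters into per-second rates, rather than tracking non-actionable ratios. The counter names are real MySQL status variables; the sample values and interval are invented:

```python
# Sketch: turning two SHOW GLOBAL STATUS samples into rate metrics.
# Cumulative counters (e.g. Questions, Slow_queries) are only useful
# as deltas over a known interval.

def rates(prev, curr, interval_s):
    """Per-second deltas for cumulative counters between two samples."""
    return {k: (curr[k] - prev[k]) / interval_s
            for k in prev if k in curr}

# Two samples taken 60 seconds apart (values are made up)
t0 = {"Questions": 1_000_000, "Slow_queries": 40}
t1 = {"Questions": 1_006_000, "Slow_queries": 43}

print(rates(t0, t1, 60))  # -> {'Questions': 100.0, 'Slow_queries': 0.05}
```

An alerting rule would then combine several such rates (e.g. queries/sec collapsing while threads running spikes) rather than firing on a single isolated counter, as the summary recommends.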
This document summarizes a project to replace aging NetApp storage devices and improve backup capabilities. It outlines performance details of current systems, options for new vSeries devices with increased throughput and capacity, and an improved backup solution using Snapshots. The solution would provide new NetApp filers in the PPMC and Quincy data centers, optionally replace the Sabey filer, and allow maintaining 90 days of Snapshots across sites rather than relying on tapes for backups.
Cloudera’s performance engineering team recently completed a new round of benchmark testing based on Impala 2.5 and the most recent stable releases of the major SQL engine options for the Apache Hadoop platform, including Apache Hive-on-Tez and Apache Spark/Spark SQL. This presentation explains the methodology and results.
This document discusses various techniques for improving cache performance, including reducing the miss rate and miss penalty. It describes reducing misses through larger block sizes, higher associativity, victim caches, and prefetching. It also covers reducing miss penalties via read priority on misses, non-blocking caches, and adding a second level cache. The goal is to improve CPU performance by lowering the miss rate, miss penalty, and time to hit in the cache.
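The techniques above all target the terms of the standard average memory access time (AMAT) model; here is a small Python sketch with illustrative cycle counts (not taken from the document):

```python
# Sketch: average memory access time (AMAT), the quantity that
# miss-rate and miss-penalty optimizations try to lower.

def amat(hit_time, miss_rate, miss_penalty):
    """AMAT = hit time + miss rate * miss penalty (in cycles)."""
    return hit_time + miss_rate * miss_penalty

# Single-level cache: 1-cycle hit, 5% miss rate, 100-cycle penalty
print(amat(1, 0.05, 100))     # -> 6.0

# Adding a second-level cache: the L1 miss penalty becomes the
# L2's own AMAT (10-cycle hit, 20% local miss rate)
l2 = amat(10, 0.20, 100)      # 30.0 cycles seen by an L1 miss
print(amat(1, 0.05, l2))      # -> 2.5
```

The example shows why a second-level cache pays off: it shrinks the effective miss penalty seen by the first level, even though nothing about the L1 itself changed.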
Managing Apache Spark Workload and Automatic Optimizing (Databricks)
eBay uses Spark heavily as one of its most significant data engines. In the data warehouse domain, millions of batch queries run every day against 6,000+ key DW tables containing over 22 PB of data (compressed), and the volume keeps booming every year. In the machine learning domain, Spark is playing a more and more significant role. We presented our migration from an MPP database to Apache Spark last year at the Europe Summit. Across the entire infrastructure, however, managing workload and efficiency for all Spark jobs in our data centers remains a big challenge. Our team leads the big data platform infrastructure and the management tools on top of it, helping our customers (not only DW engineers and data scientists, but also AI engineers) to work from the same page. In this session, we will introduce how all of them benefit from a self-service workload management portal. First, we will share the basic architecture of this system, illustrating how it collects metrics from multiple data centers and detects abnormal workloads in real time; we developed a component called Profiler that enhances Spark core to support customized metric collection. Next, we will demonstrate real user stories from eBay showing how the self-service system reduces effort on both the customer side and the infra-team side; that is the highlight of the Spark job analysis and diagnosis. Finally, we will introduce upcoming advanced features that move toward an automatic optimizing workflow rather than just alerting.
Speaker: Lantao Jin
Architectural Overview of MapR's Apache Hadoop Distribution (mcsrivas)
Describes the thinking behind MapR's architecture. MapR's Hadoop achieves better reliability on commodity hardware than anything else on the planet, including custom, proprietary hardware from other vendors. Apache HDFS and Cassandra replication are also discussed, as are SAN and NAS storage systems like NetApp and EMC.
Talk for QConSF 2015: "Broken benchmarks, misleading metrics, and terrible tools. This talk will help you navigate the treacherous waters of system performance tools, touring common problems with system metrics, monitoring, statistics, visualizations, measurement overhead, and benchmarks. This will likely involve some unlearning, as you discover tools you have been using for years, are in fact, misleading, dangerous, or broken.
The speaker, Brendan Gregg, has given many popular talks on operating system performance tools. This is an anti-version of these talks, to focus on broken tools and metrics instead of the working ones. Metrics can be misleading, and counters can be counter-intuitive! This talk will include advice and methodologies for verifying new performance tools, understanding how they work, and using them successfully."
HBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devicesMichael Stack
The document summarizes a presentation on using persistent memory technology with Apache HBase. It describes the current HBase model using volatile memory and write-ahead logs, as well as persistent memory technology capabilities. The presentation proposes using persistent memory with HBase to store data non-volatility, eliminate write-ahead logs for faster performance, and increase availability through replica regions in memory. It shows performance test results finding over 2x higher throughput and up to 104x lower maximum latency without write-ahead logs when using persistent memory with HBase.
MapR clusters disks into storage pools for data distribution. By default, storage pools contain 3 disks each. The mrconfig command can be used to create, remove, and manage storage pools and disks. Each node supports up to 36 storage pools. Zookeeper should always be started before other services and is critical for high availability. Logs are centrally stored for 30 days by default and can be configured through yarn-site.xml.
How Netflix Tunes EC2 Instances for PerformanceBrendan Gregg
CMP325 talk for AWS re:Invent 2017, by Brendan Gregg. "
At Netflix we make the best use of AWS EC2 instance types and features to create a high performance cloud, achieving near bare metal speed for our workloads. This session will summarize the configuration, tuning, and activities for delivering the fastest possible EC2 instances, and will help other EC2 users improve performance, reduce latency outliers, and make better use of EC2 features. We'll show how we choose EC2 instance types, how we choose between EC2 Xen modes: HVM, PV, and PVHVM, and the importance of EC2 features such SR-IOV for bare-metal performance. SR-IOV is used by EC2 enhanced networking, and recently for the new i3 instance type for enhanced disk performance as well. We'll also cover kernel tuning and observability tools, from basic to advanced. Advanced performance analysis includes the use of Java and Node.js flame graphs, and the new EC2 Performance Monitoring Counter (PMC) feature released this year."
M|18 Where and How to Optimize for PerformanceMariaDB plc
The document discusses optimizing performance in MariaDB. It defines key performance metrics like throughput, response time, and scalability. It recommends identifying where time is spent through profiling, using benchmarks with a real dataset and environment, improving hardware, optimizing queries and indexes, tuning the query cache, temporary tables, and file sorting. Other tips include proper data types, normalization/denormalization, index design, and server configuration best practices like buffer sizes, table caches, and using InnoDB features like the adaptive hash index. Encoding is also discussed.
Software Architecture Reconstruction: Why What and HowMehdi Mirakhorli
Every system is a legacy system, the moment a programmer writes a line of code it becomes a legacy. Therefore in even relatively new systems similar to long lived systems, developers are faced with a body of code that they need to understand, and from which they need to extract architectural knowledge. Unfortunately, anecdotal evidence has shown that such knowledge tends to be tacit in nature, stored in the heads of people, and inconsistently scattered across various software artifacts and repositories. Furthermore, architectural knowledge vaporizes over time. Given the size, complexity, and longevity of many projects, developers therefore often lack a comprehensive knowledge of architectural design decisions and consequently make changes in the code that inadvertently degrade the underlying design and compromise its qualities.
This technical briefing will answer three fundamental questions about software architecture recovery: Why? What? and How? Through several examples it articulates and synthesizes technical forces and financial motivations that make software companies to invest in software architecture recovery. It discusses “what” are the pieces of design knowledge that can be recovered and lastly demonstrates a methodology as well as required tools for answering “how” to reconstruct architecture from implementation artifacts.
C* Summit EU 2013: The Cassandra Experience at Orange DataStax Academy
Speaker: Jean Armel Luce — Senior Software Engineer/Cassandra Admin at Orange
Video: http://www.youtube.com/watch?v=mefOE9K7sLI&list=PLqcm6qE9lgKLoYaakl3YwIWP4hmGsHm5e&index=28
At Orange, Jean Armel has helped develop an open source tool for the migration of data to Cassandra; Jean and his team were in need of the NoSQL solution Apache Cassandra in order to sustain the growth of requests and volume of data required by their application PnS. In this session, Jean Armel will start out with an overview of the Orange application PnS and dive into why they chose Apache Cassandra how they did their data migration without any interruption of service. Jean Armel will also show how his application behaves after the migration
HBaseConAsia2018 Track1-7: HDFS optimizations for HBase at XiaomiMichael Stack
This document discusses optimizations made to HDFS for Hbase at XiaoMi. It addresses issues like shared memory allocation causing full GC in datanodes, listen drops on SSD clusters causing delays, peer cache bucket adjustment, and connection timeouts. Changes like preallocating shared memory, increasing socket backlog, reducing client and datanode timeouts, and adjusting the datanode dead node detection help improve performance and availability. The overall goal is to maintain local data, return fast responses from HDFS to Hbase, and reduce GC overhead from both systems.
Computing Performance: On the Horizon (2021)Brendan Gregg
Talk by Brendan Gregg for USENIX LISA 2021. https://www.youtube.com/watch?v=5nN1wjA_S30 . "The future of computer performance involves clouds with hardware hypervisors and custom processors, servers running a new type of BPF software to allow high-speed applications and kernel customizations, observability of everything in production, new Linux kernel technologies, and more. This talk covers interesting developments in systems and computing performance, their challenges, and where things are headed."
Effectively Migrating to Cassandra from a Relational DatabaseTodd McGrath
Cassandra kicks butt. But, how do enjoy the benefits when refactoring an existing RDBMS based app? How can we migrate to Cassandra and test our data model, prepare our operations while simultaneously leaving our RDBMS online? What do we need to consider? How can we do it? Why should we do it?
Analyze a SVC, STORWIZE metro/ global mirror performance problem-v58-20150818...Michael Pirker
Latency problem was reported for VDisk CA-CL1-Disk04-N at 02/05/15 8:09,
The environment are two clusters connected with Metro Mirror. The first aim of this document is to show how we found the root cause of this problem in the link between the two clusters.
The second aim of this document is to describe how the root cause for this problem was found by using the BVQ structured performance problem analysis method. It demonstrates that successful analysis work needs a structured method and also a tool which supports this method and delivers the needed technical insight. We have the concept that everybody should be able to conduct a performance analysis. This is important because the level of service is lowered day by day and especially small customers are more and more reliant on their own skills or on the skills of their partners. This is a common problem occurring at all vendors!
The document provides an overview of initiatives and findings related to improving backup and restore processes at Century Link. Key accomplishments include developing health check scripts for TSM servers to reduce downtime and improve backup success rates. New findings identified opportunities to streamline TSM tape storage utilization by removing expired nodes and eliminating duplicate data to save on tapes. Issues were also found with some TSM server storage pools not having offsite copies. Suggested changes to address issues included evaluating backup tools, server configurations, and separating test/dev environments. Help was requested to approve removing expired and duplicate data.
This document provides instructions for completing a lab exercise on advanced port scanning using Nmap and Metasploit. Students will learn how to identify open ports and services on a target machine, the Metasploitable VM. The document outlines tasks to perform Nmap scans to discover the target IP address, identify open ports and services, perform version scanning, and clean up the host list. Screenshots are requested to support answers.
Performance Problems with iSeries Applications - mboadway
In this webcast, iSeries performance expert Mike Boadway of MB Software used your enrollment survey responses to drive the focus of the content. Many performance problems start with applications and jobs that create a lot of unnecessary work for your AS/400 or iSeries... you'll learn why it's important to solve the underlying software issues of any performance problem before you go out and spend a lot of money on new hardware.
This document discusses various metrics for monitoring MySQL database performance, including hard and soft capacity limits, availability, task performance, resource performance, and scalability. It provides examples of top 10 metric lists and Nagios plugins for MySQL monitoring. It recommends monitoring work getting done, desirable metrics like utilization and response times, and avoiding non-actionable metrics like cache hit ratios. Alerts are best when they combine multiple critical issues rather than single isolated problems.
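The recommendation above, alerting on a combination of critical issues rather than a single isolated metric, can be sketched in a few lines. This is an illustrative Python sketch; the metric names and thresholds are assumptions, not figures from the document.

```python
# Sketch of a composite alert rule: fire only when several independent
# symptoms coincide, not on any single isolated metric.
# Metric names and thresholds below are illustrative assumptions.

def should_alert(metrics):
    """Alert only when multiple independent symptoms coincide."""
    replication_broken = metrics["seconds_behind_master"] > 300
    connections_saturated = (
        metrics["threads_connected"] / metrics["max_connections"] > 0.9
    )
    queries_stalled = metrics["slow_queries_per_sec"] > 50
    # Require at least two simultaneous symptoms before paging anyone.
    symptoms = [replication_broken, connections_saturated, queries_stalled]
    return sum(symptoms) >= 2

healthy = {"seconds_behind_master": 2, "threads_connected": 40,
           "max_connections": 500, "slow_queries_per_sec": 1}
degraded = {"seconds_behind_master": 900, "threads_connected": 480,
            "max_connections": 500, "slow_queries_per_sec": 80}
print(should_alert(healthy))   # False
print(should_alert(degraded))  # True
```

A single high cache-miss ratio would never page anyone under this rule; three co-occurring symptoms would.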
This document summarizes a project to replace aging NetApp storage devices and improve backup capabilities. It outlines performance details of current systems, options for new vSeries devices with increased throughput and capacity, and an improved backup solution using Snapshots. The solution would provide new NetApp filers in the PPMC and Quincy data centers, optionally replace the Sabey filer, and allow maintaining 90 days of Snapshots across sites rather than relying on tapes for backups.
Cloudera’s performance engineering team recently completed a new round of benchmark testing based on Impala 2.5 and the most recent stable releases of the major SQL engine options for the Apache Hadoop platform, including Apache Hive-on-Tez and Apache Spark/Spark SQL. This presentation explains the methodology and results.
This document discusses various techniques for improving cache performance, including reducing the miss rate and miss penalty. It describes reducing misses through larger block sizes, higher associativity, victim caches, and prefetching. It also covers reducing miss penalties via read priority on misses, non-blocking caches, and adding a second level cache. The goal is to improve CPU performance by lowering the miss rate, miss penalty, and time to hit in the cache.
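The effect of block size on miss rate described above can be illustrated with a toy simulation of a direct-mapped cache. This is a hedged sketch: the cache geometry and the sequential access pattern are assumptions chosen to demonstrate spatial locality, not figures from the document.

```python
# Toy direct-mapped cache simulation: larger block sizes reduce the miss
# rate for sequential access by exploiting spatial locality.
# Cache geometry (64 lines) and the access stream are illustrative.

def miss_rate(addresses, num_lines, block_size):
    """Count misses for a direct-mapped cache with the given geometry."""
    lines = [None] * num_lines            # each line holds one block tag
    misses = 0
    for addr in addresses:
        block = addr // block_size        # which memory block this byte is in
        index = block % num_lines         # direct-mapped: one candidate line
        if lines[index] != block:
            misses += 1                   # miss: fetch the block
            lines[index] = block
    return misses / len(addresses)

sequential = list(range(1024))            # a simple sequential byte stream
for bs in (1, 4, 16):
    print(f"block size {bs:2d}: miss rate {miss_rate(sequential, 64, bs):.4f}")
```

With block size 1 every access misses; with block size 16 only the first access to each block misses, so the miss rate drops to 1/16.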
Managing Apache Spark Workload and Automatic Optimizing - Databricks
eBay uses Spark heavily as one of its most significant data engines. In the data warehouse domain, millions of batch queries run every day against 6000+ key DW tables containing over 22PB of data (compressed), a volume that keeps growing every year. In the machine learning domain, Spark plays a more and more significant role. We presented our achievements in migrating from an MPP database to Apache Spark last year at the Europe Summit. Viewed across the entire infrastructure, however, managing workload and efficiency for all Spark jobs in our data centers is still a big challenge. Our team leads the whole big data platform infrastructure and the management tools on top of it, helping our customers, not only DW engineers and data scientists but also AI engineers, work from the same page. In this session, we will introduce how all of them benefit from a self-service workload management portal/system. First, we will share the basic architecture of this system to illustrate how it collects metrics from multiple data centers and how it detects abnormal workloads in real time. We developed a component called Profiler that enhances Spark core to support customized metric collection. Next, we will demonstrate some real user stories at eBay showing how the self-service system reduces effort on both the customer side and the infra-team side; that is the highlight of the Spark job analysis and diagnosis part. Finally, some upcoming advanced features will be introduced, describing an automatic optimizing workflow rather than just alerting.
Speaker: Lantao Jin
Architectural Overview of MapR's Apache Hadoop Distribution - mcsrivas
Describes the thinking behind MapR's architecture. MapR's Hadoop achieves better reliability on commodity hardware compared to anything on the planet, including custom, proprietary hardware from other vendors. Apache HDFS and Cassandra replication is also discussed, as are SAN and NAS storage systems like NetApp and EMC.
Talk for QConSF 2015: "Broken benchmarks, misleading metrics, and terrible tools. This talk will help you navigate the treacherous waters of system performance tools, touring common problems with system metrics, monitoring, statistics, visualizations, measurement overhead, and benchmarks. This will likely involve some unlearning, as you discover that tools you have been using for years are, in fact, misleading, dangerous, or broken.
The speaker, Brendan Gregg, has given many popular talks on operating system performance tools. This is an anti-version of these talks, to focus on broken tools and metrics instead of the working ones. Metrics can be misleading, and counters can be counter-intuitive! This talk will include advice and methodologies for verifying new performance tools, understanding how they work, and using them successfully."
Spark started at Facebook as an experiment when the project was still in its early phases. Spark's appeal stemmed from its ease of use and an integrated environment to run SQL, MLlib, and custom applications. At that time the system was used by a handful of people to process small amounts of data. However, we've come a long way since then. Currently, Spark is one of the primary SQL engines at Facebook in addition to being the primary system for writing custom batch applications. This talk will cover the story of how we optimized, tuned and scaled Apache Spark at Facebook to run on 10s of thousands of machines, processing 100s of petabytes of data, and used by 1000s of data scientists, engineers and product analysts every day. In this talk, we'll focus on three areas: * *Scaling Compute*: How Facebook runs Spark efficiently and reliably on tens of thousands of heterogeneous machines in disaggregated (shared-storage) clusters. * *Optimizing Core Engine*: How we continuously tune, optimize and add features to the core engine in order to maximize the useful work done per second. * *Scaling Users*: How we make Spark easy to use, and faster to debug, to seamlessly onboard new users.
Speakers: Ankit Agarwal, Sameer Agarwal
MySQL 8 advanced tuning with resource groups - Marco Tusa
I have a very noisy secondary application, written by a very, very bad developer, that accesses my servers, mostly with read queries and occasionally with write updates. Reads and writes are obsessive and create an impact on the MAIN application. My task is to limit the impact of this secondary application without affecting the main one. To do that I will create two resource groups, one for WRITE and another for READ. The first group, Write_app2, will have no CPU affinity, but will have the lowest priority.
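The setup described above can be sketched with MySQL 8 resource-group DDL. This is a configuration sketch, not the speaker's actual script: only the group name Write_app2 comes from the abstract; the READ group's name, the VCPU range, and the exact priorities are assumptions (for USER groups, THREAD_PRIORITY ranges from 0 down to 19, the lowest).

```sql
-- Write_app2 (name from the abstract): no CPU affinity restriction,
-- lowest user-level thread priority (19).
CREATE RESOURCE GROUP Write_app2
  TYPE = USER
  THREAD_PRIORITY = 19;

-- A READ group pinned to a subset of vCPUs so reads cannot starve the
-- MAIN application. The name, the 2-3 vCPU range, and the priority
-- are illustrative assumptions.
CREATE RESOURCE GROUP Read_app2
  TYPE = USER
  VCPU = 2-3
  THREAD_PRIORITY = 15;

-- A session can then opt in, or a single statement can, via a hint:
SET RESOURCE GROUP Write_app2;
SELECT /*+ RESOURCE_GROUP(Read_app2) */ COUNT(*) FROM orders;
```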
Are you using the fastest query tool for Hadoop? This session provides and discusses the latest performance results of the industry-standard TPC-H benchmarks executed across an assortment of open source query tools such as Hive (using MR, Tez, LLAP, Spark), SparkSQL, Presto, and Drill. Additionally, the performance tests utilize a variety of data sizes, popular storage formats such as ORC, Parquet, and Text, and compression codecs.
The document discusses troubleshooting a SnapDiff performance issue. Key factors that can affect SnapDiff throughput include hardware resources, Data ONTAP version, disk I/O latency, number of files in volumes, parallel SnapDiff sessions, and overlapping backup jobs. Symptoms of timeouts between ONTAP and the media agent during the indexing phase suggest the indexing is taking too long. The document provides workarounds like skipping cataloging and running it later, and using live browse for restores if needed before investigation is complete.
Critical Performance Metrics for DDR4-based Systems - Barbara Aichinger
Servers are critical to today's cloud computing, and DDR memory is at the heart of all cloud computing servers. Presented at DesignCon 2015, this presentation outlines new measurable performance metrics for DDR4 memory subsystems.
SVC / Storwize: cache partition analysis (BVQ howto) - Michael Pirker
This document provides a summary of cache partitioning and management for the SVC storage system. It discusses how the cache is divided into partitions that correspond to managed disk groups. Each partition is allocated a percentage of the total cache and thresholds are used to trigger destaging to avoid any single partition becoming overloaded. Examples are given showing partitions operating normally, a partition becoming overloaded when its backend controller cannot keep up with destaging, and the overall cache responding when usage crosses thresholds.
Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U... - HostedbyConfluent
"With a few tweaks under the hood of your Kafka Streams implementation, you can greatly improve performance. Sound too good to be true? Well, the secret lies in understanding storage engines.
If you're using Kafka Streams, you already have a storage engine in place, but do you know what options are available to tune it for optimal performance and scalability?
This presentation will discuss the importance of optimizing and choosing storage engines for Kafka Streams applications.
Outline:
- What a storage engine is and how it relates to Kafka stateful streams
- The importance of understanding storage engines for optimal performance and scalability
- Evaluation of storage engines - an overview of popular options, including LevelDB, RocksDB, and the open-source Speedb
- Review of the 5 most relevant configurable items and how they affect performance
- Practical ways to optimize and fine-tune your storage engine
- Showcase - a 2-minute drop-in replacement demonstration"
Drill Down the most underestimated Oracle Feature - Database Resource Manager - Luis Marques
Database Resource Manager (DBRM) is a crucial feature for managing database load, yet real-world practice shows it is not often used. This talk wants to change that and demystify the feature by explaining how it works in detail in different scenarios, covering the CPU math behind it, showing how to measure it in real time using Python and SQL, and exploring more complex features to understand its behaviour. Special attention will be paid to its internals whenever possible.
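The "CPU math" mentioned above can be sketched for a simple two-level plan: each level-1 directive receives its stated share of CPU, and whatever level 1 leaves unclaimed cascades down to level 2 in proportion to its directives. This is an illustrative model of how such multi-level plans allocate CPU, not DBRM's exact algorithm; the group names and percentages are assumptions.

```python
# Illustrative two-level resource-plan CPU math: level 1 gets its stated
# shares; the remainder cascades to level 2 proportionally.
# Group names and percentages are made up for the example.

def effective_shares(level1, level2):
    """Return each group's effective CPU share under a two-level plan."""
    shares = dict(level1)                       # level 1 gets its % directly
    leftover = 1.0 - sum(level1.values())       # unclaimed CPU cascades down
    for group, pct in level2.items():
        shares[group] = shares.get(group, 0.0) + leftover * pct
    return shares

plan = effective_shares(
    level1={"OLTP": 0.60, "BATCH": 0.20},       # 20% left unclaimed
    level2={"REPORTING": 0.50, "OTHER_GROUPS": 0.50},
)
for group, share in plan.items():
    print(f"{group:13s} {share:.0%}")
```

Here REPORTING and OTHER_GROUPS each end up with 10%: half of the 20% that level 1 did not claim.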
The document discusses using capacity planning and performance analysis to improve system performance. It describes two case studies: capacity planning for an Oracle RAC database and performance analysis of a SQL Server application on HP blades. For the Oracle case, different platform options were evaluated and optimized configurations identified. For SQL Server, enabling AWE resolved soft paging and reduced response times by improving memory usage. The lessons highlight challenges in using performance tools and the need for better fault detection and data presentation.
This presentation gives a quick introduction to Slurm, the scheduler that runs programs (scripts) in HPC. It is targeted at audiences who are new to Lawrencium or who want to learn a few more things about troubleshooting their jobs.
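A minimal batch script of the kind such an introduction typically walks through might look like the sketch below. The partition name and the program being run are assumptions (these are site-specific; on Lawrencium you would substitute your own partition and account).

```shell
#!/bin/bash
# Minimal Slurm batch script (illustrative; partition/program are assumptions).
#SBATCH --job-name=hello_slurm
#SBATCH --partition=lr4          # which queue/partition to run in (site-specific)
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00          # wall-clock limit (HH:MM:SS)
#SBATCH --output=%x-%j.out       # %x = job name, %j = job id

echo "Running on $(hostname)"
srun ./my_program                # srun launches the task under Slurm
```

Such a script would be submitted with `sbatch job.sh`, monitored with `squeue -u $USER`, and cancelled with `scancel <jobid>`.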
This document discusses Apache NiFi and how it was used to create a new composable data flow system for Schlumberger in just 10 man hours. The previous system was very complex, took over 100 man years to create, and was difficult to change. NiFi allows for easy visualization of the data flow, debugging of issues, and rapid creation of new processors. It also enables quick testing of data flows using curated test data sets and live data in Docker containers. Next steps discussed include further exploring use cases for rig data ingestion with NiFi to provide data provenance and understand the chain of custody of data as it moves through the system.
OnTune provides real-time system performance monitoring down to the second, allowing it to detect issues that other solutions miss. It collects a wide range of system data without customization and stores historical data to help analyze long-term trends and troubleshoot intermittent problems. Case studies demonstrate how OnTune's high-granularity data helped users identify specific processes causing performance issues and crashes.
- onTune is a system analysis and performance monitoring tool that provides real-time data collection and monitoring of key system metrics like CPU, memory, disk, network usage, and applications every 2 seconds with low overhead.
- It allows visibility into performance across on-premise and cloud environments from a single view and helps identify issues through deep diagnostic reports and root cause analysis.
- onTune's simple interface makes it fast and easy to implement and use to optimize system performance, reduce costs, and ensure service quality and SLAs.
Case 3 inspecting cause of failure - a car maker plmTeemStone Pty Ltd
Three Oracle Java processes on the PLM DB#2 server requested 3GB of memory at 13:26, causing high CPU I/O wait times and paging as the server struggled to allocate memory. This overwhelmed the server and caused it to temporarily hang. Inspecting the server's CPU usage, paging activity, and memory usage graphs during the hang showed abnormal peaks at 13:26 that identified the memory allocation request as the likely root cause of the problem. Stopping the EM monitoring agent on the troubled server reduced memory usage and prevented further hangs.
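The diagnostic step in this case, spotting a simultaneous abnormal peak across several metrics at 13:26, can be sketched in a few lines. This is an illustrative method, not onTune's actual algorithm; the metric values below are made up to mirror the 13:26 spike described above.

```python
# Illustrative root-cause correlation: flag samples more than 3 standard
# deviations above the mean in each metric, then intersect the flagged
# timestamps across metrics. Data below is synthetic.

from statistics import mean, stdev

def abnormal_times(series, threshold=3.0):
    """Timestamps whose value is > threshold std-devs above the mean."""
    values = list(series.values())
    mu, sigma = mean(values), stdev(values)
    return {t for t, v in series.items() if v > mu + threshold * sigma}

# Synthetic per-minute samples: flat baseline with one spike at 13:26.
cpu_wait = {f"13:{m:02d}": 10 for m in range(10, 30)}
paging   = {f"13:{m:02d}": 5  for m in range(10, 30)}
cpu_wait["13:26"] = 95       # abnormal CPU I/O-wait spike
paging["13:26"]   = 80       # paging spike at the same minute

suspects = abnormal_times(cpu_wait) & abnormal_times(paging)
print(suspects)              # {'13:26'}
```

The intersection narrows the investigation to the one minute where both symptoms peaked together, which is exactly how the 13:26 memory request was identified as the likely root cause.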
George, a system administrator, was having trouble completing a deployment that was taking all morning. He had contacted colleagues, experts, and developers for help but was still unable to resolve the issue. System administrators play a critical role in maintaining computer infrastructure but troubleshooting problems can be challenging and time-consuming. Expert performance analysis tools like onTune that collect real-time system data and monitor all processes can help administrators identify issues faster and save organizations time and money compared to general system management solutions.
This document discusses the System Analysis & Performance Instrument tool from TeemStone Corp. It outlines some of the key benefits of their onTune product over competitors like IBM Tivoli, HP OpenView, and BMC Patrol. Specifically, it notes that onTune allows for real-time monitoring at the second level, collects abundant performance data, and has intuitive functions for problem and performance analysis. Case studies and customer testimonials are also provided showing how onTune helped customers identify issues that other tools could not detect.
An internet shopping mall was experiencing frequent system errors and crashes from a new interactive flash program. The system operator could not determine the root cause of the issues through standard Windows performance monitoring tools. After installing OnTune to monitor system and process resource usage, the operator discovered that the virtual memory usage of the flash program was steadily increasing until it caused the system to crash. This identified a memory leak in the flash program code. The developer was notified and fixed the problem, resolving the system crashes.
onTune is a next generation system management solution that can quickly analyze, identify, and offer solutions to any performance problems for the system administrator in real-time. It can be installed through a simple installation and automatically monitors the system without additional configuration changes. onTune also creates reports like PowerPoint to analyze historical system performance and status.
onTune is a next generation system management solution from TeemStone that allows real-time monitoring and analysis of system performance. It collects system data without customization and allows administrators to monitor CPU, memory, I/O, and other essential elements. onTune supports monitoring in virtualization environments from vendors like IBM, HP, and Oracle. It is currently used by IBM for virtualization monitoring and supports various monitoring and analysis scenarios for ensuring system performance.
onTune is a next generation performance monitoring and analysis solution that allows real-time monitoring of systems to maximize analytic capacity and understand the reasons for any errors. It collects performance data every 2 seconds to provide more detailed insight compared to traditional SMS tools with collection intervals of 1 minute or more. System administrators can monitor CPU, memory, I/O and other metrics for overall systems or individual processes.
onTune provides powerful virtualization monitoring and analysis capabilities. It can monitor virtual resources on servers from various hardware vendors that use technologies like VMware, Xen, and Oracle's LDOM. onTune collects structure information and monitors both logical virtualized resources and physical server resources. It presents this information visually to help administrators respond to issues. onTune also helps analyze the root causes of errors in virtualized systems through real-time performance monitoring and analysis.
Workshop - Innovating with Generative AI and Knowledge Graphs - Neo4j
Go beyond the AI hype and discover practical techniques for using AI responsibly across your organization's data. Explore how to use knowledge graphs to increase accuracy, transparency, and explainability in generative AI systems. You will leave with hands-on experience combining data relationships and LLMs to bring domain-specific context and improve reasoning.
Bring your laptop and we will guide you through setting up your own generative AI stack, providing practical, coded examples so you can get started in minutes.
Do you want Software for your Business? Visit Deuglo
Deuglo has top Software Developers in India. They are experts in software development and help design and create custom Software solutions.
Deuglo follows a seven-step method for delivering services to its customers, called the software development life cycle (SDLC) process.
Requirement — Collecting the requirements is the first phase in the SDLC process.
Feasibility Study — after the requirements are collected, the project's feasibility is assessed before moving to the design phase.
Design — in this phase, they start designing the software.
Coding — when designing is completed, the developers start coding for the software.
Testing — in this phase when the coding of the software is done the testing team will start testing.
Installation — after completion of testing, the application opens to the live server and launches!
Maintenance — after the software is delivered and customers start using it, it is maintained and supported.
Software Engineering, Software Consulting, Tech Lead, Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Transaction, Spring MVC, OpenShift Cloud Platform, Kafka, REST, SOAP, LLD & HLD.
Unveiling the Advantages of Agile Software Development - brainerhub1
Learn about the advantages of Agile software development and simplify your workflow to spur quicker innovation. Jump right in!
Artificial Intelligence and XPath Extension Functions - Octavian Nadolu
The purpose of this presentation is to provide an overview of how you can use AI from XSLT, XQuery, Schematron, or XML Refactoring operations, the potential benefits of using AI, and some of the challenges we face.
SOCRadar's Aviation Industry Q1 Incident Report is out now!
The aviation industry has always been a prime target for cybercriminals due to its critical infrastructure and high stakes. In the first quarter of 2024, the sector faced an alarming surge in cybersecurity threats, revealing its vulnerabilities and the relentless sophistication of cyber attackers.
SOCRadar's Aviation Industry Quarterly Incident Report provides an in-depth analysis of these threats, detected and examined through our extensive monitoring of hacker forums, Telegram channels, and dark web platforms.
UI5con 2024 - Keynote: Latest News about UI5 and its Ecosystem - Peter Muessig
Learn about the latest innovations in and around OpenUI5/SAPUI5: UI5 Tooling, UI5 linter, UI5 Web Components, Web Components Integration, UI5 2.x, UI5 GenAI.
Recording:
https://www.youtube.com/live/MSdGLG2zLy8?si=INxBHTqkwHhxV5Ta&t=0
UI5con 2024 - Bring Your Own Design System - Peter Muessig
How do you combine the OpenUI5/SAPUI5 programming model with a design system that makes its controls available as Web Components? Since OpenUI5/SAPUI5 1.120, the framework supports the integration of any Web Components. This makes it possible, for example, to natively embed your design system's own Web Components created with Stencil. The integration embeds the Web Components in a way that they can be used naturally in XMLViews, like standard UI5 controls, and can be bound with data binding. Learn how you can make use of the Web Components base class in OpenUI5/SAPUI5 to integrate your own Web Components, and get inspired by the solution to generate a custom UI5 library providing the control wrappers for the native Web Components.
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies - Quickdice ERP
Explore the seamless transition to e-invoicing with this comprehensive guide tailored for Saudi Arabian businesses. Navigate the process effortlessly with step-by-step instructions designed to streamline implementation and enhance efficiency.
Graspan: A Big Data System for Big Code Analysis - Aftab Hussain
We built a disk-based parallel graph system, Graspan, that uses a novel edge-pair centric computation model to compute dynamic transitive closures on very large program graphs.
We implement context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large codebases such as Linux shows that their Graspan implementations scale to millions of lines of code and are much simpler than their original implementations.
These analyses were used to augment the existing checkers; these augmented checkers found 132 new NULL pointer bugs and 1308 unnecessary NULL tests in Linux 4.4.0-rc5, PostgreSQL 8.3.9, and Apache httpd 2.2.18.
- Accepted in ASPLOS ‘17, Xi’an, China.
- Featured in the tutorial, Systemized Program Analyses: A Big Data Perspective on Static Analysis Scalability, ASPLOS ‘17.
- Invited for presentation at SoCal PLS ‘16.
- Invited for poster presentation at PLDI SRC ‘16.
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris - Neo4j
Dr. Jesús Barrasa, Head of Solutions Architecture for EMEA, Neo4j
Discover the latest innovations from Neo4j, including the latest cloud integrations and product improvements that make Neo4j an essential choice for developers building applications with connected data and generative AI.
Mobile App Development Company In Noida - Drona Infotech
Drona Infotech is a premier mobile app development company in Noida, providing cutting-edge solutions for businesses.
Visit Us For : https://www.dronainfotech.com/mobile-application-development/
Flutter is a popular open source, cross-platform framework developed by Google. In this webinar we'll explore Flutter and its architecture, delve into the Flutter Embedder and Flutter’s Dart language, discover how to leverage Flutter for embedded device development, learn about Automotive Grade Linux (AGL) and its consortium and understand the rationale behind AGL's choice of Flutter for next-gen IVI systems. Don’t miss this opportunity to discover whether Flutter is right for your project.
hqplvc1 – Memory Usage
The service responses of hqplvc1 in the evening of 19 Mar appear to have been very poor due to a lack of memory. At that time, paging-space in/out was very high and the RSS of the kernel process was very low. It seems the business applications were restarted, or their memory was paged out, and the services were delayed. The memory shortfall appears to be roughly 30%.
CPU usage of all servers on 24 Mar

Server     Average (%)   Max (%)
hqpap4     88.95         100.00
hqpap3     41.94         95.20
hqpap2     28.66         96.23
hqplvc1    26.77         92.67
hqpdb1     37.82         74.00
hqpap6     24.05         68.10
hqpscm0    13.52         39.57
hqpha1     12.24         37.50
hqpscm3    7.78          35.63
hqpscm2    6.48          27.43
hqpap5     0.09          0.70
Time period: 08h to 12h on 24 Mar.
hqpap4, hqpap3, hqpap2, and hqplvc1 show CPU usage peaks; see details on the following pages.
The high CPU usage on all servers except hqplvc1 was caused by the process disp+work. On hqplvc1, CPU time was mostly consumed by kernel processes run by the sdb user.
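A table like the one above can also be screened programmatically. Here is a hedged sketch using the table's own figures; the 90% threshold is an assumption, chosen to match the four servers called out.

```python
# Screen the CPU-usage table for servers whose peak crossed 90% during
# the measurement window. Figures come from the table; the 90% threshold
# is an illustrative assumption.

servers = {
    "hqpap4":  (88.95, 100.00),   # (average %, max %)
    "hqpap3":  (41.94, 95.20),
    "hqpap2":  (28.66, 96.23),
    "hqplvc1": (26.77, 92.67),
    "hqpdb1":  (37.82, 74.00),
    "hqpap6":  (24.05, 68.10),
    "hqpscm0": (13.52, 39.57),
    "hqpha1":  (12.24, 37.50),
    "hqpscm3": (7.78, 35.63),
    "hqpscm2": (6.48, 27.43),
    "hqpap5":  (0.09, 0.70),
}

peaked = [name for name, (avg, peak) in servers.items() if peak > 90]
print(peaked)  # ['hqpap4', 'hqpap3', 'hqpap2', 'hqplvc1']
```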
hqpap3 – Memory usage
The service responses of hqpap3 in the morning of 24 Mar appear to have been very poor due to a lack of memory; high paging-space in/out occurred at that time. The service delay was likely caused by this memory shortage. Judging from the max values of ActiveVM, the memory shortfall is approximately 5 GB.