Continuent Tungsten - Scalable SaaS Data Management (by guest2e11e8)
The key needs of SaaS vendors include:
i) managing multi-tenant architectures with shared DBMS, ii) maintaining customer SLAs for uptime and performance, and iii) running optimized, efficient operations.
The key benefits Continuent Tungsten offers SaaS vendors are:
i) high availability and protection from data loss, ii) simple, efficient cluster management, and iii) support for complex database topologies.
Tungsten offers high availability, database cluster management, and management of complex topologies for multi-tenant architectures.
Tungsten high availability and data protection features include maintaining live copies with data consistency checking and tightly coupled backup/restore integration with cluster management tools.
Tungsten cluster management allows SaaS vendors to migrate customers and perform system upgrades without downtime, thus enabling these maintenance operations during normal business hours.
Tungsten also enables complex replication topologies, including data filtering and data archiving strategies, maintaining extra data copies for data-marts, routing different customers to different DBMS copies, and providing cross-site multi-master replication.
The document discusses several concepts related to building scalable and available systems, including:
- Scalability involves a system's ability to handle expected loads with acceptable performance and to grow easily when loads increase. This may involve scaling up using bigger/faster systems or scaling out across multiple systems.
- Availability is the goal of having a system operational 100% of the time, requiring redundancy so there are no single points of failure.
- Performance measures like response time and throughput relate to a system's scalability and capacity. Distributing load across redundant and partitioned components can help improve scalability and availability.
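To make the redundancy arithmetic behind these points concrete, here is a minimal sketch (the formulas are standard; the component availabilities are illustrative, not from the document):

```python
# Availability arithmetic: components in series multiply their
# availabilities, while redundant (parallel) copies fail only when
# every copy is down at once. Values below are illustrative.

def serial(*availabilities: float) -> float:
    """A chain of single points of failure: availabilities multiply."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result

def parallel(a: float, copies: int) -> float:
    """n redundant copies: unavailable only if all n fail together."""
    return 1.0 - (1.0 - a) ** copies

single = 0.99                        # one 99% server: ~3.65 days down/year
print(parallel(single, 2))           # 0.9999 -> roughly 52 minutes/year
print(serial(0.999, 0.999, 0.999))   # three 99.9% tiers in series: ~99.7%
```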
The document provides an overview of distributed caching with Coherence, JPA with TopLink Grid, and integrating Coherence with WebLogic Server. It discusses Coherence clustering, data management options like partitioned caching, data processing options like events and queries, how TopLink Grid allows scaling JPA applications using the Coherence data grid, and how Coherence servers integrate with the WebLogic lifecycle.
MoreVRP is a database performance monitoring and acceleration tool that gives DBAs real-time monitoring along with resource management and control.
Proactive performance monitoring with adaptive thresholds (by John Beresniewicz)
Presentation given at the UKOUG 2008 conference on the Adaptive Thresholds technology in Oracle Database 10.2+ and Enterprise Manager 11. Adaptive Thresholds enables consistent, effective performance monitoring across systems and architectures by statistically characterizing metric streams to set and adapt monitoring thresholds automatically, independent of application workload.
Acumatica SaaS provides benefits like disaster recovery, backups, high availability, software updates and maintenance that surpass most external hosting providers. It uses 24/7 monitoring to ensure consistent performance. Data is securely hosted on AWS and accessible from any device. Automated backups are taken every 2 hours and retained for months. The optional backup access service allows downloading backups. Failover protection is included, and the recovery process involves restoring from the additional backup location. Customizations can be easily maintained through upgrades due to Acumatica's APIs.
These slides show how to use real-world applications to teach early architecture exploration of electronics, embedded systems, software/firmware, and semiconductors using VisualSim.
This document discusses distributed data center architectures and disaster recovery strategies. It begins by providing background on the evolution of data centers and then covers key aspects of distributed data center design like replication, high availability, and disaster recovery plans. The objectives of disaster recovery plans, such as recovery point and recovery time objectives, are explained. Different disaster recovery architectures like warm and hot standbys are also summarized.
DB Time, Average Active Sessions, and ASH Math - Oracle performance fundamentals (by John Beresniewicz)
RMOUG 2020 abstract:
This session will cover core concepts for Oracle performance analysis first introduced in Oracle 10g and forming the backbone of many features in the Diagnostic and Tuning packs. The presentation will cover the theoretical basis and meaning of these concepts, as well as illustrate how they are fundamental to many user-facing features in both the database itself and Enterprise Manager.
Anti-patterns in Hadoop Cluster deployment (by Sunil Govindan)
Rohith Sharma, Naganarasimha, and Sunil presented on Hadoop cluster configurations and anti-patterns. They discussed sample node manager configurations with high resources, related YARN and MapReduce resource-tuning settings, and anti-patterns such as improperly configured container heap sizes that lead to out-of-memory errors. They also covered YARN capacity scheduler queue-planning best practices like queue mapping, preemption, user limits, and application priority to improve cluster utilization.
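As a hedged illustration of the container heap-size anti-pattern above: the JVM heap must fit inside the YARN container allocation with headroom for off-heap memory, or the container is killed. The 0.8 ratio below is a common rule of thumb, not a value from the presentation:

```python
# Rule-of-thumb heap sizing for MapReduce tasks in YARN containers.
# Setting mapreduce.map.java.opts (-Xmx) at or above
# mapreduce.map.memory.mb leaves no headroom for JVM overhead, so the
# container gets killed with an out-of-memory error -- the anti-pattern
# described above.

HEAP_FRACTION = 0.8  # common guidance: heap at ~80% of the container

def heap_opts(container_mb: int) -> str:
    """Derive a -Xmx setting from the container allocation."""
    return f"-Xmx{int(container_mb * HEAP_FRACTION)}m"

print(heap_opts(4096))  # -Xmx3276m, leaving ~800 MB of headroom
```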
This document discusses the timeline server which collects and stores application metrics and event data in YARN. It describes the limitations of the original job history server and application history server, which only supported MapReduce jobs and did not capture YARN-level data. The timeline server versions 1 and 2 are presented as improved solutions, with version 2 focusing on distributed and reliable storage in HBase, a new data model to support arbitrary application types, and online aggregation of metrics.
Distributed Services Scheduling and Cloud Provisioning (by Ar Agarwal)
This is the presentation for my final year project at NIT Allahabad (2013-14). The purpose of the project is to design a scheduling algorithm for cloud environment with proper resource management.
EM12c: Capacity Planning with OEM Metrics (by Maaz Anjum)
Some of my thoughts and adventures encapsulated in a presentation regarding capacity planning, resource utilization, and Enterprise Manager's collected metrics.
IBM Managing Workload Scalability with MQ Clusters (by IBM Systems UKI)
This document discusses various clustering scenarios for WebSphere MQ, beginning with a simple initial setup and expanding in complexity. It addresses scenarios like workload balancing, high availability during failures, and location dependencies when applications and services are distributed across data centers separated by large distances. Key points covered include using queue aliases, cluster workload priorities, and the AMQSCLM monitoring tool to help direct messages to available instances of services and ensure responses can be routed properly even if client or queue manager failures occur.
This document provides an overview of Oracle Coherence, an in-memory distributed computing platform. It defines a data grid, describes Coherence clustering and data management options like partitioned caching. It also covers data processing options in Coherence like events, parallel queries, continuous query caches, and invocable maps. The document concludes with an overview of the Coherence Incubator project.
Best Practices: Migrating a Postgres Production Database to the CloudEDB
Do you want to learn how you can move to the Cloud? This presentation will provide the solid ideas and approaches you need to plan and execute a successful migration of a production Postgres database to the Cloud.
This document discusses high availability strategies for MySQL databases across multiple datacenters. It covers architectural considerations for hot/hot vs hot/cold configurations and disaster recovery approaches. The main sections explore replication techniques like MySQL replication and alternative schemes, application high availability mechanisms, and how Percona can help with high availability solutions and services.
- The document discusses managing a large OLTP database at PayPal, including capacity management, planned maintenance, performance management, and troubleshooting. It provides details on monitoring the database infrastructure, conducting maintenance such as patching and switchovers, and optimizing performance for Oracle RAC environments. The goal is to support business needs and provide uninterrupted service through proactive management of the database tier.
Learn about the challenges and the design patterns that will help you prepare your application for Azure.
.NET Core samples are available here: https://github.com/cmendible/dotnetcore.samples/tree/master/cloud.design.patterns
The Next Generation Application Server – How Event Based Processing yields s... (by Guy Korland)
The document discusses event-based processing using event containers to achieve scalability in application servers. It describes how event containers allow collocating services and data in memory to minimize latency and maximize throughput. This approach provides built-in failover/redundancy through SLA-driven containers and allows linear scalability through automated deployment of additional processing units. Customer use cases that were able to significantly improve performance and scalability using this approach are also presented.
[DSBW Spring 2009] Unit 05: Web Architectures (by Carles Farré)
The document discusses physical architecture design for web applications. It describes several common architecture patterns including single server, separate database, and replicated web servers. Key considerations for architecture design are also outlined, such as performance, scalability, availability, security and constraints related to cost, complexity and standards.
Disaster Recovery & Data Backup Strategies (by Spiceworks)
This document discusses data backup strategies and planning. It emphasizes that backups are critical for businesses to protect their data and recover from data loss. The document outlines planning considerations like identifying critical systems and data, recovery objectives, and capacity needs. It then covers various backup methods and factors to consider when developing a backup plan such as repository type, media type, and testing procedures. Regularly monitoring and testing backups is key to ensuring the plan is effective.
SharePoint Backup And Disaster Recovery with Joel Oleson (by Joel Oleson)
This walks through the various options around backup and restore with SharePoint. The deck was presented at Tech Ed South East Asia 2008 by Joel Oleson.
Netezza provides workload management options to service user queries efficiently. It allows restricting the maximum number of concurrent jobs, creating resource-sharing groups to control resource allocation disproportionately, and uses multiple schedulers, such as the gatekeeper and GRA. The gatekeeper queues jobs and schedules them based on priority and resource availability, while GRA allocates resources to jobs based on the user's resource group. Short queries can be prioritized using short query bias, which reserves system resources for such queries.
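The short query bias mentioned above can be pictured as a reserved slice of concurrency that long-running work may never occupy. A toy sketch of the concept (not Netezza's actual gatekeeper/GRA implementation):

```python
# Toy short-query-bias scheduler: short queries may use any free slot,
# long queries compete only for the general slots, so short work never
# queues behind long work. Concept sketch only.
from collections import deque

TOTAL_SLOTS = 8
GENERAL_SLOTS = 6        # long-running queries compete for these
SHORT_THRESHOLD_S = 2.0  # estimated runtime below this counts as "short"

short_running = 0
long_running = 0
waiting = deque()

def try_schedule(query_id: str, est_seconds: float) -> bool:
    global short_running, long_running
    if est_seconds < SHORT_THRESHOLD_S:
        if short_running + long_running < TOTAL_SLOTS:
            short_running += 1   # short work may take any free slot
            return True
    elif long_running < GENERAL_SLOTS:
        long_running += 1        # long work never touches the reserve
        return True
    waiting.append((query_id, est_seconds))
    return False

print(try_schedule("q1", 0.5))   # True: admitted immediately
```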
Metro Cluster High Availability or SRM Disaster Recovery? (by David Pasek)
The presentation explains the difference between multi-site high availability (aka metro cluster) and disaster recovery. The general concepts are similar for any product, but the presentation is tailored to VMware technologies.
The document provides an overview of a workshop on implementing a data warehouse appliance using Netezza. The objectives are to understand the need for a data warehouse appliance approach compared to traditional data warehousing, learn about Netezza's architecture and capabilities, and be able to apply selected Netezza functionality and optimize Netezza databases for performance. Key topics that will be covered include the differences between symmetric multi-processing (SMP) and massively parallel processing (MPP) architectures, and how Netezza's asymmetric MPP architecture works to optimize performance.
The document provides an overview of distributed systems patterns and practices. It discusses why distributed systems are used to solve problems like single points of failure and elastic demand. Common distributed system patterns are explained, including leader-follower models, data replication across nodes, and handling failures. Specific distributed systems like Zookeeper, HDFS and Cassandra are described as examples of implementing patterns like quorum management and consistent hashing for replicated data.
This document provides a reverse chronological work history for Geoff Garrad including dates worked, contracting companies, tasks performed, ROV systems used, additional equipment, locations, and clients. It lists over 30 jobs between 2007 and 2015 involving construction, inspection, surveying, and operations support using various workclass ROVs for companies like Subsea 7, Fugro Rovtech, I.S.S., and Deep Ocean. The jobs involved tasks like pipeline surveying, jacket inspection, drilling support, and diver assistance on fields throughout the North Sea.
The document announces the National Conference on Recent Innovations in Science, Engineering, Technology and Management (NCRISETM - 2016) to be held from October 25-27, 2016 at Amity University in Greater Noida, India. It provides information on registration fees, payment details, instructions, and contact information for registration, publication, finance, and sponsorship. Authors are asked to fill out forms providing their contact information, paper title, and signature transferring copyright.
Students focused on making numbers using MAB blocks, writing numbers in their expanded form such as 345 = 300 + 40 + 5, and matching numbers to their written words.
The document discusses the role of student affairs in promoting student learning and success. It outlines various services provided by student affairs, such as academic support, counseling, financial aid, and residential life, to help students transition to college and succeed academically and socially. It also notes that student affairs seeks to collaborate with other campus offices and that involvement in student affairs activities is linked to better student retention and grade performance.
Top 10 applications engineer interview questions and answers (by janhjonh)
This document provides resources for applications engineer interviews, including common interview questions, tips, and examples. It lists 10 frequently asked interview questions for applications engineers along with sample answers. Additionally, it provides many links to further interview preparation materials on topics such as situational interviews, behavioral interviews, phone interviews, and more. The document aims to help candidates seeking applications engineer roles to succeed in interviews and secure job offers.
Delise Marajh has over 15 years of experience in learning and development, most recently as a Learning and Development Specialist for Liberty where she partners with business to design and facilitate learning interventions to improve performance. She holds qualifications in biotechnology, retirement funds, education and training, and wealth management. Her experience also includes positions in retirement fund administration and as a microbiologist working with dolphins.
These are the slides from my presentation at CLOUDCOMP 2009 on AppScale, an open-source platform for running Google App Engine apps. See our project home page at http://appscale.cs.ucsb.edu or our code page at http://code.google.com/p/appscale
ATM is a high-speed networking standard designed to support voice, image, video, and data communications through fixed-size cells. It provides high bandwidth, high data transfer rates, quality of service, and efficient bandwidth allocation. ATM is used for both constant rate traffic like audio and video as well as variable rate traffic like data. It can be implemented through a company's own ATM network or through fixed connections from network operators. While ATM requires new hardware and software and has some complexity, it allows for a single network connection that can easily mix different media types.
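The fixed-size cells mentioned above are 53 bytes each: a 5-byte header plus 48 bytes of payload, which makes the protocol overhead easy to quantify:

```python
# ATM cell layout: 53 bytes total = 5-byte header + 48-byte payload.
HEADER_BYTES = 5
PAYLOAD_BYTES = 48
CELL_BYTES = HEADER_BYTES + PAYLOAD_BYTES

print(f"per-cell header overhead: {HEADER_BYTES / CELL_BYTES:.1%}")  # ~9.4%

# Cells needed for a 1500-byte packet (ignoring AAL framing overhead):
packet = 1500
cells = -(-packet // PAYLOAD_BYTES)        # ceiling division -> 32 cells
print(cells, "cells =", cells * CELL_BYTES, "bytes on the wire")
```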
The document discusses Deployit, an application release automation platform from XebiaLabs that aims to optimize the application deployment process by reducing costs, accelerating time to market, and bridging the gap between development and operations. Deployit utilizes deployment packages, environments, and deployments to automate application releases in a lightweight and scalable way that supports both public and private clouds.
The document discusses normalization in relational databases. It defines some key concepts like functional dependencies, normal forms, and anomalies like insertion and deletion anomalies. It explains how normalization aims to eliminate anomalies by decomposing relations and placing attributes together that are closely related based on functional dependencies. The goal of normalization is to produce a stable and flexible database design with relations that faithfully represent the enterprise data.
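A compact worked example of the decomposition idea (standard textbook material, not taken from this document):

```latex
\[
R(\mathit{Emp},\mathit{Dept},\mathit{Mgr}),\qquad
\mathit{Emp}\to\mathit{Dept},\quad \mathit{Dept}\to\mathit{Mgr}
\]
$\mathit{Mgr}$ depends transitively on the key $\mathit{Emp}$, so deleting a
department's last employee also loses who manages that department (a
deletion anomaly). Decomposing along the functional dependencies removes it:
\[
R_1(\mathit{Emp},\mathit{Dept}),\qquad R_2(\mathit{Dept},\mathit{Mgr}),\qquad
R = R_1 \bowtie R_2
\]
The join is lossless because the shared attribute $\mathit{Dept}$ is a key of
$R_2$, and both relations are now in third normal form.
```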
Last week I had a good opportunity to talk to 50+ budding healthcare entrepreneurs @ #In50Hrs about the tools and standards that could help them build a prototype from the idea stage in 50 hours. Some very interesting prototypes were presented...
John Duchneskie, assistant managing editor for The Philadelphia Inquirer, teaches journalists to use Datawrapper.de to create charts and ArcGIS Online to create maps in this presentation for NewsTrain in Murfreesboro, Tennessee, on Sept. 30-Oct. 1, 2016. It is accompanied by a handout, "Create Your Own Simple Graphics for Mobile." NewsTrain is a training initiative of Associated Press Media Editors. More info: http://bit.ly/NewsTrain
Planning datacenter migrations can involve thousands of workloads and tens of thousands of servers that are often deeply interdependent. Application discovery and dependency mapping are important early steps in the migration process, but difficult to perform at scale due to the lack of automated tools. AWS Application Discovery Service is a new service (coming soon) that automatically identifies data center applications and their dependencies, and baselines application health and performance to help you plan your application migration to AWS quickly and reliably. This talk introduces the new Application Discovery Service capabilities for simplifying the planning of data center and large-scale migrations to AWS. We will discuss how you can use the AWS Application Discovery Service to examine the applications running in your data center, their attributes, and their dependencies, and then use this information to help reduce the time, cost, and risk of migrating applications to AWS.
The document discusses application portfolio management. It begins by explaining the need for application portfolio management and how it provides visibility into an organization's applications. It then discusses how to capture application information, including how to categorize applications using metrics like type, business functionality, usage profile, and capabilities. Industry reference models for capabilities are also mentioned. The document provides examples of how to map applications to business capabilities and show application dependencies through a RACI chart. It suggests publishing the application portfolio information on an intranet for feedback. Overall, the document provides an overview of best practices for setting up and maintaining an application portfolio management practice.
Adding Value in the Cloud with Performance Test (by Rodolfo Kohn)
This document discusses the importance of performance testing cloud applications and outlines best practices for defining performance requirements, testing methodology, and identifying issues. It provides examples of performance problems found in databases, applications, operating systems, and networks. The key goals of performance testing are to understand system behavior under load, find bottlenecks and hidden bugs, and verify that requirements are met.
“Performance testing is the process by which software is tested to determine the current system performance. This process aims to gather information about current performance, but places no value judgments on the findings.”
Jugal Shah has over 14 years of experience in IT working in roles such as manager, solution architect, DBA, developer and software engineer. He has worked extensively with database technologies including SQL Server, MySQL, PostgreSQL and others. He has received the MVP award from Microsoft for SQL Server in multiple years. Common causes of SQL Server performance problems include configuration issues, design problems, bottlenecks and poorly written queries or code. Various tools can be used to diagnose issues including dynamic management views, Performance Monitor, SQL Server Profiler and DBCC commands.
Grails has great performance characteristics but as with all full stack frameworks, attention must be paid to optimize performance. In this talk Lari will discuss common missteps that can easily be avoided and share tips and tricks which help profile and tune Grails applications.
Leveraging Functional Tools and AWS for Performance Testing (by Thoughtworks)
This document discusses leveraging functional test tools and AWS for performance testing. It describes challenges with functional testing like needing quick reusable tools for continuous integration. It also covers using AWS to help with performance testing by allowing different customer environments to be easily setup and configured. Key aspects of performance testing discussed include measuring response times, concurrency, and failover testing using tools like SOAP UI, custom code, and analyzing performance counters.
WebLogic Server Work Managers and Overload Protection (by James Bayer)
A tour of the WebLogic Server work manager and self-tuning thread pool features that automatically adjust to changing workloads and protect the server from overload conditions.
This document discusses performance engineering for batch and web applications. It begins by outlining why performance testing is important. Key factors that influence performance testing include response time, throughput, tuning, and benchmarking. Throughput represents the number of transactions processed in a given time period and should increase linearly with load. Response time is the duration between a request and first response. Tuning improves performance by configuring parameters without changing code. The performance testing process involves test planning, creating test scripts, executing tests, monitoring tests, and analyzing results. Methods for analyzing heap dumps and thread dumps to identify bottlenecks are also provided. The document concludes with tips for optimizing PostgreSQL performance by adjusting the shared_buffers configuration parameter.
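The throughput and response-time measures discussed above are linked by Little's Law, which is worth stating explicitly (the numbers are illustrative):

```latex
\[
N = X \cdot R
\]
where $N$ is the average number of requests concurrently in the system, $X$
the throughput, and $R$ the average response time. A system sustaining
$X = 200$ requests/s at $R = 0.5$ s therefore holds
$N = 200 \times 0.5 = 100$ concurrent requests; when throughput stops
growing linearly with load, the extra concurrency surfaces as rising $R$.
```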
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi... (by Prolifics)
Abstract: Recent projects have stressed the "need for speed" while handling large amounts of data, with near zero downtime. An analysis of multiple environments has identified optimizations and architectures that improve both performance and reliability. The session covers data gathering and analysis, discussing everything from the network (multiple NICs, nearby catalogs, high speed Ethernet), to the latest features of extreme scale. Performance analysis helps pinpoint where time is spent (bottlenecks) and we discuss optimization techniques (MQ tuning, IIB performance best practices) as well as helpful IBM support pacs. Log Analysis pinpoints system stress points (e.g. CPU starvation) and steps on the path to near zero downtime.
Scalable scheduling of updates in streaming data warehouses (by Finalyear Projects)
This document discusses scheduling updates in streaming data warehouses. It proposes a scheduling framework to handle the complications streaming data introduces, including view hierarchies, data consistency, the inability to preempt updates, and transient overload. Key aspects of the proposed system include a scheduling metric based on data staleness rather than job properties, and two modes (push and pull) for auditing logs to provide data accountability. The goal is to propagate new data across relevant tables and views as quickly as possible to enable real-time decision making.
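A minimal sketch of the staleness-driven metric described above (an illustration of the idea, not the paper's actual algorithm):

```python
# Choose the next warehouse table to refresh by data staleness: how far
# the newest loaded data trails the present. Illustration of the metric
# only, not the paper's scheduling algorithm.
import time

def staleness(table: dict) -> float:
    """Seconds by which the warehouse copy lags the stream."""
    return time.time() - table["last_loaded_event_ts"]

def next_update(tables: list) -> dict:
    # Most-stale-first: refresh whichever table is furthest behind.
    return max(tables, key=staleness)

tables = [
    {"name": "orders", "last_loaded_event_ts": time.time() - 120},
    {"name": "clicks", "last_loaded_event_ts": time.time() - 15},
]
print(next_update(tables)["name"])  # orders: 120 s stale beats 15 s
```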
- The document discusses understanding system performance and knowing when it's time for a system tune-up. It covers monitoring tools like DBQL and Viewpoint, establishing performance baselines, using real-time alerts, and examining growth patterns.
- It emphasizes the importance of regular benchmarks to compare performance over time, especially before and after upgrades. Successful benchmarks require consistency in data, queries, indexing, and concurrency levels.
- The document outlines various aspects of performance tuning like query tuning, load techniques, compression, and utilizing new database features. It stresses automating processes and educating developers on database technologies.
PostgreSQL is a powerful, enterprise class open source object-relational database system with an emphasis on extensibility and standards-compliance. PostgreSQL boasts many sophisticated features and runs stored procedures in more than a dozen programming languages. We’ll explore the advantages and limitations of PostgreSQL, examples of where it is best suited for use, and examples of who is using PostgreSQL to power their applications.
HA and DR Architecture for HANA on Power Deck - 2022-Nov-21.PPTX (by ThinL389917)
This document discusses high availability (HA) and disaster recovery (DR) architectures for SAP HANA on IBM Power Systems. It provides an overview of typical HA/DR configurations including host auto-failover, SAP HANA system replication in performance-optimized and cost-optimized modes, and the roles of cluster managers like Pacemaker in automating failover. Key aspects covered are recovery point objectives (RPOs), recovery time objectives (RTOs), synchronous vs. asynchronous replication modes, and multi-tier DR landscapes.
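The replication-mode trade-off summarized above reduces to a simple relationship (definitional, not taken from the slides):

```latex
\[
\mathrm{RPO}_{\mathrm{sync}} = 0,\qquad
\mathrm{RPO}_{\mathrm{async}} \approx \text{replication lag},\qquad
\mathrm{RTO} \approx t_{\mathrm{detect}} + t_{\mathrm{failover}}
\]
Synchronous replication acknowledges a commit only once the secondary holds
it, so no committed data is lost at failover; asynchronous replication
trades that guarantee for lower commit latency, risking at most the
in-flight lag window.
```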
Speed up your XPages Application performance (by Maarga Systems)
This document discusses best practices for optimizing performance of XPages applications on Domino servers. It covers recommended server hardware and software configurations including memory allocation, enabling server-side caching, and configuring timeouts. Application-level optimizations are also presented such as reducing database lookups, limiting partial refreshes, and properly using scoped variables. Tools for identifying bottlenecks like XPages Toolbox are also mentioned. The document aims to provide guidance for configuring servers and coding applications for optimal performance when deploying and maintaining XPages applications.
DB12c: All You Need to Know About the Resource Manager (by Andrejs Vorobjovs)
Resource Manager has changed a lot in Oracle Database 12c, especially if Oracle Multitenant is used. It can manage the available resources between the consumer groups in a single PDB as well as among all the PDBs. DBAs who are planning the upgrades or consolidations to Oracle Database 12c need to understand how the new resource manager works and how the existing resource management plans need to be changed to make them work in the new Oracle Multitenant configuration.
This paper will explain the differences between 11g and 12c resource manager, will dig into resource management features and limitations in 12c Oracle Multitenant, will provide guidelines for migrating your current resource management plan to 12c at the time of upgrade or consolidation, and will also reveal how much overhead the resource manager introduces.
This document discusses Oracle's database management and performance monitoring tools. It describes features like Automatic Database Diagnostic Monitor (ADDM) which can diagnose performance issues. It also discusses automatic SQL tuning which can capture and tune SQL statements. Real-time monitoring of SQL and PL/SQL is introduced which provides visibility into executions. The document emphasizes manageability and aims to boost administrator productivity through self-managing capabilities and issue detection.
More and more clients are looking to understand the capabilities of the OTM/G-Log architecture and configuration in order to better tune OTM. Usually, this is required because of poor OTM performance or as preparation for significant changes to OTM configuration, volume, or platform. The client may be experiencing poor performance throughout the entire system or only for very specific use cases. The primary objective of a performance tuning exercise is to understand how OTM is being utilized and to recommend solutions to improve the performance of OTM.
We recommend a “ground-up” performance tuning exercise and will take the audience through one, starting with hardware and infrastructure, moving to Java and app server tuning, then to OTM technical tuning, and finally to OTM functional tuning (data, agents, etc.).
These audits may identify hardware constraints at each tier, networking, or other infrastructure constraints causing sub-optimal system performance. Simply stated, the performance audit will identify all bottlenecks in the system if they exist.
In many cases the largest performance impacts come not from hardware but from how the data is configured within the application. So as part of the exercise we will analyze database performance, individual SQL queries, OTM queues, bulk planning parameters, agents, rates, and the settlement process.
Understanding the methods which will best identify these bottlenecks will help you avoid performance issues early in your project and save considerable time and expense as you near go-live. This presentation will guide you through the steps necessary to better understand what is impacting performance and how to best handle it. It will provide lessons learned and tools that are available to you better manage and maintain a healthy OTM environment.
Presented by Chris Plough at MavenWire
by Ben Willett, Solutions Architect, AWS
Database Week at the AWS Loft is an opportunity to learn about Amazon’s broad and deep family of managed database services. These services provide easy, scalable, reliable, and cost-effective ways to manage your data in the cloud. We explain the fundamentals and take a technical deep dive into Amazon RDS and Amazon Aurora relational databases, Amazon DynamoDB non-relational databases, Amazon Neptune graph databases, and Amazon ElastiCache managed Redis, along with options for database migration, caching, search, and more. You'll learn how to get started, how to support applications, and how to scale.
Similar to IBM Information on Demand 2013 - Session 2839 - Using IBM PureData System for Analytics Workload Management
Serverless Cloud Data Lake with Spark for Serving Weather Data
1) The document discusses using a serverless architecture with IBM Cloud services like SQL Query powered by Spark, Cloud Object Storage, and Cloud Functions to build a cost-effective cloud data lake for serving historical weather data on demand.
2) It describes how data skipping techniques and geospatial indexes in SQL Query can accelerate queries by an order of magnitude by pruning irrelevant data (see the sketch after this list).
3) The new serverless solution provides unlimited storage, global coverage, and supports large queries for machine learning and analytics at an order of magnitude lower cost than the previous implementation.
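The data skipping mentioned in point 2 keeps lightweight min/max metadata per storage object and prunes any object whose value range cannot satisfy the predicate. A minimal sketch of the idea (illustrative only, not IBM's implementation):

```python
# Data skipping sketch: per-object min/max statistics let the planner
# skip objects that cannot contain matching rows, so they are never
# fetched from object storage. Illustrative, not IBM's implementation.

objects = [  # (object name, min temperature, max temperature)
    ("part-0001.parquet", -10.0, 12.5),
    ("part-0002.parquet",  13.0, 24.0),
    ("part-0003.parquet",  24.5, 41.0),
]

def objects_to_scan(lo: float, hi: float) -> list:
    """Keep only objects whose [min, max] range overlaps [lo, hi]."""
    return [name for name, mn, mx in objects if mx >= lo and mn <= hi]

# Query: ... WHERE temperature BETWEEN 30 AND 35
print(objects_to_scan(30.0, 35.0))  # only part-0003 is read; 2 of 3 skipped
```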
This document summarizes an IBM Cloud Day 2021 presentation on IBM Cloud Data Lakes. It describes the architecture of IBM Cloud Data Lakes including data skipping capabilities, serverless analytics, and metadata management. It then discusses an example COVID-19 data lake built on IBM Cloud to provide trusted COVID-19 data to analytics applications. Key aspects included landing, preparation, and integration zones; serverless pipelines for data ingestion and transformation; and a data mart for querying and reporting.
IBM Cloud Native Day April 2021: Serverless Data Lake (by Torsten Steinbach)
- The document discusses serverless data analytics using IBM's cloud services, including a serverless data lake built on cloud object storage, serverless SQL queries using Spark, and serverless data processing functions.
- It provides an example of a COVID-19 data lake built on IBM Cloud that collects and integrates data from various sources, prepares and transforms the data, and makes it available for analytics and dashboards through serverless SQL queries.
IBM Cloud Day January 2021 - A well architected data lake (by Torsten Steinbach)
- The document discusses an IBM Cloud Day 2021 event focused on well-architected data lakes. It provides an overview of two sessions on data lake architecture and building a cloud native data lake on IBM Cloud.
- It also summarizes the key capabilities organizations need from a data lake, including visualizing data, flexibility/accessibility, governance, and gaining insights. Cloud data lakes can address these needs for various roles.
IBM's Cloud-based Data Lake for Analytics and AI presentation covered:
1) IBM's cloud data lake provides serverless architecture, low barriers to entry, and pay-as-you-go pricing for analytics on data stored in cloud object storage.
2) The data lake offers SQL-based data exploration, transformation, and analytics capabilities as well as industry-leading optimizations for time series and geospatial data.
3) Security features include customer-controlled encryption keys and options to hide SQL queries and keys from IBM.
The document summarizes IBM's cloud data lake and SQL query services. It discusses how these services allow users to ingest, store, and analyze large amounts of data in the cloud. Key points include that IBM's cloud data lake provides a fully managed data lake service with serverless consumption and fully elastic scaling. It also discusses how IBM SQL Query allows users to analyze data stored in cloud object storage using SQL, and supports various data formats and analytics use cases including log analysis, time series analysis, and spatial queries.
Serverless SQL provides a serverless analytics platform that allows users to analyze data stored in object storage without having to manage infrastructure. Key features include seamless elasticity, pay-per-query consumption, and the ability to analyze data directly in object storage without having to move it. The platform includes serverless storage, data ingest, data transformation, analytics, and automation capabilities. It aims to create a sharing economy for analytics by allowing various users like developers, data engineers, and analysts flexible access to data and analytics.
IBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM Cloud (by Torsten Steinbach)
Cloud is a sharing economy that reduces your spending. But does this also apply to data and analytics? Doesn't this require you to provision dedicated data warehouse systems to run analytics SQL queries on terabytes of data? With IBM Cloud, the answer is no. By using serverless analytics via IBM Cloud SQL Query, you can analyze your data directly where it sits, be it in IBM Cloud Object Storage or in your NoSQL databases. Due to the serverless nature of SQL Query, you only pay for your queries depending on the data volume that they process. There are no standing costs. You do not need to provision and wait for a data warehouse. But you can still run SQLs on terabytes of data.
IBM THINK 2019 - What? I Don't Need a Database to Do All That with SQL? (by Torsten Steinbach)
You don't necessarily have to set up a relational database, tables and load data in order to use a surprisingly rich set of SQL capabilities on your data in the cloud. IBM SQL Query lets you analyze terabytes of distributed data of heterogeneous formats with a complete ANSI SQL dialect in a completely serverless usage model, elegantly ETL data between formats and partitioning layouts as needed, and run complex time series transformations, analysis and correlations with advanced built-in timeseries SQL algorithms that are differentiating in the entire industry. It also support a complete PostGIS compliant geospatial SQL function set. Come explore the stunningly advanced world of SQL without a database in IBM Cloud.
IBM THINK 2019 - Cloud-Native Clickstream Analysis in IBM Cloud (by Torsten Steinbach)
Agile user and workload insights are one of the key elements of a cloud-native solution. When done well, this represents a real competitive advantage. In this session, we show you how to run cloud-native clickstream analysis with IBM Cloud. By combining serverless mechanisms like object storage for affordable and scalable persistency with SQL Query for serverless analysis of your clickstream data, you can establish a very cost-effective clickstream analysis pipeline easily and quickly.
IBM THINK 2019 - Self-Service Cloud Data Management with SQL (by Torsten Steinbach)
SQL is a powerful language for expressing data transformations. But did you know that you can also use IBM Cloud SQL to convert data between various data formats and layouts on disk? In this session, you will see the full power of using SQL Query to move and transform your cloud data in an entirely self-service fashion. You can specify any data format, layout, or partitioning with a simple SQL statement. See how you can move and transform terabytes of data in the cloud in a very scalable fashion while being charged only for the individual SQL movement and transformation jobs, with no standing costs.
Torsten Steinbach and Chris Glew present IBM Cloud Query, a serverless analytics service that allows users to run ANSI SQL queries against data stored in cloud object storage. Some key points:
- IBM Cloud Query allows users to query data in various open formats like CSV, Parquet, and JSON stored in cloud object storage using SQL, with results also stored in object storage.
- It has a pay-per-query pricing model with no infrastructure to manage. Queries can be run via a web console, REST API, or Python client (see the sketch after this list).
- The presentation outlines the architecture and provides examples of using Cloud Query for log analytics, data exploration, and building serverless data pipelines with Cloud Functions.
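As a hedged sketch of the Python route mentioned above, assuming the ibmcloudsql package; the credentials and bucket URLs are placeholders, and the exact client interface may differ from what is shown:

```python
# Minimal sketch of submitting a SQL Query job from Python. Assumes the
# ibmcloudsql package; all credentials and COS URLs are placeholders.
import ibmcloudsql

sql_client = ibmcloudsql.SQLQuery(
    api_key="<IBM_CLOUD_API_KEY>",
    instance_crn="<SQL_QUERY_INSTANCE_CRN>",
    target_cos_url="cos://us-geo/my-results-bucket/",
)
sql_client.logon()

# Query CSV objects in object storage directly; results land back in COS.
result_df = sql_client.run_sql(
    "SELECT order_id, total "
    "FROM cos://us-geo/my-data-bucket/orders.csv STORED AS CSV "
    "WHERE total > 100"
)
print(result_df.head())
```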
IBM Insight 2014 - Advanced Warehouse Analytics in the Cloud (by Torsten Steinbach)
- dashDB is IBM's cloud data warehouse service that provides advanced analytics capabilities in the cloud
- It offers three deployment options: an entry plan deployed within Bluemix, a bare metal option, and virtual machine options
- The entry plan provides terabyte-scale capacity within Bluemix while the bare metal and VM options provide more capacity and dedicated resources
- All options provide in-database analytics and backup/restore to Swift object storage for high availability
IBM Insight 2015 - 1823 - Geospatial analytics with dashDB in the cloud (by Torsten Steinbach)
This document discusses geospatial analytics capabilities in IBM dashDB. It describes how dashDB supports geospatial data types and functions that enable spatial queries and analysis, including functions for spatial predicates, constructors, and calculations. GeoJSON and other formats can be loaded, and dashDB implements OGC and ISO spatial standards. Predictive analytics is also possible using the R extension to dashDB. Overall, the document covers dashDB's geospatial and predictive analytics capabilities for spatial data.
IBM Insight 2015 - 1824 - Using Bluemix and dashDB for Twitter Analysis (by Torsten Steinbach)
This document discusses using IBM's Bluemix and dashDB services for Twitter analysis. It provides an overview of the IBM Insights for Twitter service in Bluemix, which allows querying and searching over enriched Twitter data stored in dashDB. Examples are given of queries that can be performed, such as searching for tweets about an upcoming movie within a time frame or searching for tweets with positive sentiment about a product. The document also discusses loading Twitter data into dashDB using a Bluemix app and performing predictive analytics on the data using built-in R and Python capabilities in dashDB.
IBM InterConnect 2016 - 3505 - Cloud-Based Analytics of The Weather Company i... (by Torsten Steinbach)
This document summarizes a presentation on analyzing weather data using IBM's Cloud-Based Analytics of The Weather Company in IBM Bluemix. It discusses loading weather data from various sources into the IBM dashDB data warehouse for analysis using R and Python. Key points include:
- Loading weather data from sources like S3, Swift, on-premise databases, and The Weather Company into IBM Bluemix for analysis in dashDB.
- Using the ibmdbR and ibmdbPy packages to interface with dashDB from R and Python, performing analytics like predictive modeling, statistics, and visualizations directly in the database.
- Publishing predictive models and analytics as web applications using the dashDB REST API.
This document discusses analyzing geospatial data with IBM Cloud Data Services and Esri ArcGIS. It provides an overview of using Cloudant as a NoSQL database to store geospatial data in GeoJSON format and then load it into IBM dashDB for analytics. GeoJSON data can be stored in Cloudant in three different structures - as simple geometry, feature collections, or features. The document also describes how geospatial data from Cloudant can be transformed and loaded into dashDB tables for analysis using IBM data warehousing technologies.
Penify - Let AI do the Documentation, you write the Code. (by KrishnaveniMohan1)
Penify automates the software documentation process for Git repositories. Every time a code modification is merged into "main", Penify uses a Large Language Model to generate documentation for the updated code. This automation covers multiple documentation layers, including InCode Documentation, API Documentation, Architectural Documentation, and PR documentation, each designed to improve different aspects of the development process. By taking over the entire documentation process, Penify tackles the common problem of documentation becoming outdated as the code evolves.
https://www.penify.dev/
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G... (by The Third Creative Media)
"Navigating Invideo: A Comprehensive Guide" is an essential resource for anyone looking to master Invideo, an AI-powered video creation tool. This guide provides step-by-step instructions, helpful tips, and comparisons with other AI video creators. Whether you're a beginner or an experienced video editor, you'll find valuable insights to enhance your video projects and bring your creative ideas to life.
Flutter vs. React Native: A Detailed Comparison for App Development in 2024 (by dhavalvaghelanectarb)
Choosing the right framework for your cross-platform mobile app can be a tough decision. Both Flutter and React Native offer compelling features and have earned their place in the development world. Here is a detailed comparison of the pros and cons of developing mobile apps in React Native versus Flutter, to help you weigh the strengths and weaknesses of each.
What is Continuous Testing in DevOps - A Definitive Guide.pdfkalichargn70th171
Once an overlooked aspect, continuous testing has become indispensable for enterprises striving to accelerate application delivery and reduce business impacts. According to a Statista report, 31.3% of global enterprises have embraced continuous integration and deployment within their DevOps practices, signaling a broad trend toward faster release cycles.
Superpower Your Apache Kafka Applications Development with Complementary Open...Paul Brebner
Kafka Summit talk (Bangalore, India, May 2, 2024, https://events.bizzabo.com/573863/agenda/session/1300469 )
Many Apache Kafka use cases take advantage of Kafka’s ability to integrate multiple heterogeneous systems for stream processing and real-time machine learning scenarios. But Kafka also exists in a rich ecosystem of related but complementary stream processing technologies and tools, particularly from the open-source community. In this talk, we’ll take you on a tour of a selection of complementary tools that can make Kafka even more powerful. We’ll focus on tools for stream processing and querying, streaming machine learning, stream visibility and observation, stream meta-data, stream visualisation, stream development including testing and the use of Generative AI and LLMs, and stream performance and scalability. By the end you will have a good idea of the types of Kafka “superhero” tools that exist, which are my favourites (and what superpowers they have), and how they combine to save your Kafka applications development universe from swamploads of data stagnation monsters!
Enhanced Screen Flows UI/UX using SLDS with Tom KittPeter Caitens
Join us for an engaging session led by Flow Champion, Tom Kitt. This session will dive into a technique of enhancing the user interfaces and user experiences within Screen Flows using the Salesforce Lightning Design System (SLDS). This technique uses Native functionality, with No Apex Code, No Custom Components and No Managed Packages required.
Secure-by-Design Using Hardware and Software Protection for FDA ComplianceICS
This webinar explores the “secure-by-design” approach to medical device software development. During this important session, we will outline which security measures should be considered for compliance, identify technical solutions available on various hardware platforms, summarize hardware protection methods you should consider when building in security and review security software such as Trusted Execution Environments for secure storage of keys and data, and Intrusion Detection Protection Systems to monitor for threats.
Boost Your Savings with These Money Management AppsJhone kinadey
A money management app can transform your financial life by tracking expenses, creating budgets, and setting financial goals. These apps offer features like real-time expense tracking, bill reminders, and personalized insights to help you save and manage money effectively. With a user-friendly interface, they simplify financial planning, making it easier to stay on top of your finances and achieve long-term financial stability.
🏎️Tech Transformation: DevOps Insights from the Experts 👩💻campbellclarkson
Connect with fellow Trailblazers, learn from industry experts Glenda Thomson (Salesforce, Principal Technical Architect) and Will Dinn (Judo Bank, Salesforce Development Lead), and discover how to harness DevOps tools with Salesforce.
Building API data products on top of your real-time data infrastructureconfluent
This talk and live demonstration will examine how Confluent and Gravitee.io integrate to unlock value from streaming data through API products.
You will learn how data owners and API providers can document and secure data products on top of Confluent brokers, including schema validation, topic routing and message filtering.
You will also see how data and API consumers can discover and subscribe to products in a developer portal, as well as how they can integrate with Confluent topics through protocols like REST, Websockets, Server-sent Events and Webhooks.
Whether you want to monetize your real-time data, enable new integrations with partners, or provide self-service access to topics through various protocols, this webinar is for you!
What to do when you have a perfect model for your software but you are constrained by an imperfect business model?
This talk explores the challenges of bringing modelling rigour to the business and strategy levels, and of talking to your non-technical counterparts in the process.
Software Test Automation - A Comprehensive Guide on Automated Testing.pdfkalichargn70th171
As we move into a more digitally focused era, the importance of software is rapidly increasing. Software tools are crucial for improving living standards, enhancing business prospects, and building a smarter world. The smooth, fail-proof functioning of software is critical, as a large number of people depend on it.
The Comprehensive Guide to Validating Audio-Visual Performances.pdfkalichargn70th171
Ensuring the optimal performance of your audio-visual (AV) equipment is crucial for delivering exceptional experiences. AV performance validation is a critical process that verifies the quality and functionality of your AV setup. Whether you're a content creator, a business conducting webinars, or a homeowner creating a home theater, validating your AV performance is essential.
14th Edition of International Conference on Computer VisionShulagnaSarkar2
About the event
14th Edition of the International Conference on Computer Vision
Computer conferences organized by the ScienceFather group. ScienceFather takes the privilege of inviting speakers, participants, students, delegates and exhibitors from across the globe to its International Conference on Computer Vision, to be held in various beautiful cities of the world. These conferences are a forum for discussing common invention-related issues, trading information, and sharing thoughts and insights into advanced developments in science, inventions and service systems. New technology may create many materials and devices with a vast range of applications, such as in science, medicine, electronics, biomaterials, energy production and consumer products.
Nominations are open! Don't miss it.
Visit: computer.scifat.com
Award Nomination: https://x-i.me/ishnom
Conference Submission: https://x-i.me/anicon
For Enquiry: Computer@scifat.com
Streamlining End-to-End Testing Automation with Azure DevOps Build & Release Pipelines
Automating end-to-end (e2e) test for Android and iOS native apps, and web apps, within Azure build and release pipelines, poses several challenges. This session dives into the key challenges and the repeatable solutions implemented across multiple teams at a leading Indian telecom disruptor, renowned for its affordable 4G/5G services, digital platforms, and broadband connectivity.
Challenge #1. Ensuring Test Environment Consistency: Establishing a standardized test execution environment across hundreds of Azure DevOps agents is crucial for achieving dependable testing results. This uniformity must seamlessly span from Build pipelines to various stages of the Release pipeline.
Challenge #2. Coordinated Test Execution Across Environments: Executing distinct subsets of tests using the same automation framework across diverse environments, such as the build pipeline and specific stages of the Release Pipeline, demands flexible and cohesive approaches.
Challenge #3. Testing on Linux-based Azure DevOps Agents: Conducting tests, particularly for web and native apps, on Azure DevOps Linux agents lacking browser or device connectivity presents specific challenges in attaining thorough testing coverage.
This session delves into how these challenges were addressed by:
1. Automating the setup of essential dependencies to ensure a consistent testing environment.
2. Creating standardized templates for executing API tests, API workflow tests, and end-to-end tests in the Build pipeline, streamlining the testing process.
3. Implementing task groups in Release pipeline stages to facilitate the execution of tests, ensuring consistency and efficiency across deployment phases.
4. Deploying browsers within Docker containers for web application testing, enhancing the portability and scalability of testing environments.
5. Leveraging dedicated device farms for Android, iOS, and browser testing to cover a wide range of platforms and devices.
6. Integrating AI technology, such as Applitools Visual AI and Ultrafast Grid, to automate test execution and validation, improving accuracy and efficiency.
7. Utilizing an AI/ML-powered central test automation reporting server through platforms like reportportal.io, providing consolidated, real-time insights into test performance and issues.
These solutions not only facilitate comprehensive testing across platforms but also promote the principles of shift-left testing, enabling early feedback, implementing quality gates, and ensuring repeatability. By adopting these techniques, teams can effectively automate and execute tests, accelerating software delivery while upholding high-quality standards across Android, iOS, and web applications.
3. PureData for Analytics (Striper, Full Rack)
[Architecture diagram: a central host (plus a standby host) fronts SPU blades 1-6; each blade owns disks 1-40, one data slice per disk, with a DataReceive thread per slice. The example workload shows a BI application issuing operational queries (Q1) and analytics queries (Q2), a power user issuing heavy ad-hoc queries (Q3), and two ETL jobs (Load 1, Load 2) ingesting data.]
4. Challenging Workload Situations
- Large amount of concurrent workload: large queries can run out of memory and temp space, and throughput can be worse than with lower concurrency.
- Concurrent mix of short queries & heavy queries: large queries can starve out the short ones.
- Concurrent ingest & queries: loads can starve out queries.
- Rushing (workload shifts): the sudden arrival of a large set of users or applications can monopolize the system.
- Runaway queries: carelessly submitted heavy queries (e.g. by a power user) can occupy the system without business value (e.g. a cartesian join).
5. Resources that matter for Workload Management
- Allocation: memory and temporary storage. These are in use even when a query is not actively being worked on; we call these the fixed resources.
- Utilization: CPU, I/O bandwidth and network bandwidth. These are in use only when a query is actively being worked on; we call these the renewable resources.
6. Meeting User Objectives through WLM
- Provide a simple, user-oriented way to specify performance goals; low-level control knobs (such as declarative concurrency limits) should not be the primary user model of WLM.
- Allow system resources to be sub-divided and assigned to different users, tenants or applications (the slide illustrates shares of 60%, 25% and 15%).
- Ensure consistent performance for a tenant: don't "spoil" users just because the system could at the moment, and allow a maximum resource limit to be declared for a tenant.
- Respect declared relative priorities: allow explicit declaration of query priority by the application or user; higher-priority queries always go before lower-priority queries.
7. The Control Tool Box
[Diagram of the WLM control mechanisms: admission sequence, I/O priority, process CPU priority, delay, and allocation & concurrency limits.]
8. Admission Control through Scheduler Queues
[Diagram: JOBS enter the GATEKEEPER, which controls admission by priority & duration; the PLANNER turns them into PLANS, whose admission & execution the GRA controls by renewable resource share; SNIPPETS are then admitted by fixed resource fit and dispatched to Disk, Fabric and CPU.]
9. Declaring Priorities
Four priority levels: Critical, High, Normal, Low. Higher-priority queries get served first within the same resource sharing group.
System-level default priority:
SET SYSTEM DEFAULT [DEFPRIORITY | MAXPRIORITY] TO [CRITICAL | HIGH | NORMAL | LOW | NONE]
Set the default priority per permission group:
CREATE GROUP <group name> WITH DEFPRIORITY <prio>;
Change the default priority of a specific user:
ALTER USER <user> WITH DEFPRIORITY LOW MAXPRIORITY HIGH;
Change the priority of an existing session:
nzsession priority -high -u <user> -pw <pw> -id <session ID>, or
ALTER SESSION [<session ID>] SET PRIORITY TO <prio>;
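Putting these statements together, here is a minimal sketch; the group name reports, the user etl_user and the session ID 12345 are invented for illustration:

-- Hypothetical: a reporting group that defaults to HIGH priority
CREATE GROUP reports WITH DEFPRIORITY HIGH;
-- Hypothetical: cap an ETL user at NORMAL, defaulting to LOW
ALTER USER etl_user WITH DEFPRIORITY LOW MAXPRIORITY NORMAL;
-- Bump a running session (the brackets in the syntax above mean the ID is optional)
ALTER SESSION 12345 SET PRIORITY TO CRITICAL;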
10. Gatekeeper
Limits how many plans can run concurrently, by priority and estimated duration (host.gkEnabled=yes).
- Priority queues: Critical & High (host.gkHighPriQueries), Normal (host.gkMaxPerQueue), Low (host.gkLowPriQueries).
- Duration queues split the Normal queue by estimated duration, e.g. (see the reading below):
host.gkMaxPerQueue=20,5,3,1
host.gkQueueThreshold=1,10,60,-1
- Admitted jobs are passed on to the GRA.
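The two settings are read together: each entry in host.gkQueueThreshold defines a duration bucket (in seconds, -1 meaning unbounded) and the matching entry in host.gkMaxPerQueue caps that bucket's concurrency. A hypothetical annotated reading of the slide's values, under that assumption:

host.gkQueueThreshold=1,10,60,-1    (buckets: up to 1s, up to 10s, up to 60s, unbounded)
host.gkMaxPerQueue=20,5,3,1         (20 concurrent very short plans, down to 1 very long one)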
11. GRA & Resource Sharing Groups
Resource Sharing Groups (RSGs):
- Different from the user groups used for permissions.
- A group is created with a resource minimum (see the sketch below): CREATE GROUP Analysts WITH RESOURCE MINIMUM 50;
- A user is in exactly one RSG (by default: public).
- Optionally, a group can carry a job limit.
GRA (Guaranteed Resource Allocation):
- Accuracy: +/- 5% of resource use (CPU, disk, network; host & SPU), averaged over a trailing window of one hour.
- Control mechanisms: admission (job order follows the groups' compliance with their goals) and execution (a feedback loop that modifies weights & applies "greed" waits).
- The shares of all currently active groups are normalized to 100%.
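A minimal sketch of creating an RSG and placing a user in it. The CREATE GROUP ... WITH RESOURCE MINIMUM statement is taken from the slide; the IN RESOURCEGROUP clause and the user name alice are assumptions to be verified against your NPS release:

-- Guarantee the Analysts group at least 50% of system resources
CREATE GROUP Analysts WITH RESOURCE MINIMUM 50;
-- Assumed syntax for assigning a user to the RSG
ALTER USER alice WITH IN RESOURCEGROUP Analysts;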
12. Short Query Bias – SQB
Reserves resources for short queries:
- Part of memory is reserved for short queries only.
- Short queries go into a special queue per group that is always served first.
- host.schedSQBEnabled=true, host.schedSQBNominalSecs=2
- Cache retention priority for transient data (nzlocal).
- Reserved resources: host.schedSQBReservedGraSlots, host.schedSQBReservedSnSlots, host.schedSQBReservedSnMb, host.schedSQBReservedHostMb
13. GRA Ceilings
- Data service providers need to control the user experience: give users only the performance they paid for; don't "spoil" them.
- GRA can hard-limit a group's resource share: ALTER GROUP ... WITH RESOURCE MAXIMUM 30;
- MAX can be larger than MIN (allowing limited headroom).
- The ceiling is enforced by inserting a delay at the end of each snippet, holding it until the moment it would have ended under the capped share.
[Diagram: resource use over time for groups A and B, with and without a 30% ceiling; delays are inserted after each of the capped group's snippets.]
14. GRA Load
- A load is an insert from an external table: INSERT INTO JUNK SELECT * FROM EXTERNAL '<stdin>' ...
- It runs as a host-only snippet, with no SPU snippet! Data is sent to a common data-receive thread per slice.
- GRA's execution control using weights has no bite on loads: it can't balance load requirements, and queries get clobbered.
- GRA Load is an additional mechanism on top of GRA: host.schedGRALoadEnabled=true
- It controls load rates based on a load performance model ("how fast could the load go without any concurrent work"), limits the data send rate according to the GRA group goal, and tracks system utilization and actual rates.
15. WLM Mechanisms Review
Priority, Gatekeeper, GRA, SQB, GRA Ceilings, GRA Load.
[Recap diagram showing the example resource shares of 60%, 25% and 15%, with a 30% ceiling.]
16. Usage Scenarios
- Application consolidation
- Mixed workload: ELT vs. reports vs. interactive
- Departmental chargeback
- Data analysis service provider
- System rollout / application migration
- Power users
17. Usage Scenarios: Application Consolidation
- Combine applications from separate systems; maintain SLAs and provide each a fair share.
- Use RESOURCE MINIMUM per application (see the sketch after the table); if one group has no jobs, the others expand:

Workload   App A   App B   App C
Setup      50%     30%     20%
No App A   -       60%     40%
No A, B    -       -       100%
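A minimal sketch of the Setup row above (the group names app_a, app_b and app_c are invented for illustration):

-- Guaranteed minimum shares; any unused share flows to the groups that are active
CREATE GROUP app_a WITH RESOURCE MINIMUM 50;
CREATE GROUP app_b WITH RESOURCE MINIMUM 30;
CREATE GROUP app_c WITH RESOURCE MINIMUM 20;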
18. Usage Scenarios: Mixed Workload
- Uncontrolled ELT may affect queries; big queries (reports / analytics) may delay little ones; interactive queries are highly variable and sensitive.

Workload   MINIMUM   MAXIMUM   JOB LIMIT
ELT        10-30%    10% *     4-10 or OFF
Reports    20-40%              4-10 or OFF
Prompts    40-70%    100%      OFF

* Limit loads only when you want other groups to fully expand.
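An illustrative setup picking mid-range values from the table; the group names are invented, and if your NPS release does not accept combined clauses, issue separate ALTER GROUP statements instead:

CREATE GROUP elt     WITH RESOURCE MINIMUM 20 JOB LIMIT 6;
CREATE GROUP reports WITH RESOURCE MINIMUM 30 JOB LIMIT 6;
CREATE GROUP prompts WITH RESOURCE MINIMUM 50 RESOURCE MAXIMUM 100;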
19. Usage Scenarios: Department Control
- The system is used by independent departments (or applications); you want to control them and keep some balance, but it's OK for one to use more if nobody else needs it.
- Create a resource sharing group for each and set RESOURCE MINIMUM as expected.
- Monitor and adjust the shares over time based on _V_SCHED_GRA.
- You may even have chargebacks: the system is charged to departments / cost centers, tracked via _V_SCHED_GRA.
20. Usage Scenarios: Service Provider
- Data analysis service provider: paying customers need to be limited.
- Fixed departmental chargeback: the system is charged to departments / cost centers, and the share is not variable but FIXED. If they paid for 10% and refused to pay more, they only get 10%!
- Use RESOURCE MAXIMUM: it limits use of the system and does not expand.
21. Usage Scenarios
- New system rollout: for a consistent experience as applications arrive, set RESOURCE MAXIMUM for early users, increase it over time, and eventually remove it.
- Power users: individuals who write raw SQL, producing killer / crazy queries, and lots of them! Use JOB LIMIT, GK queue limits, and the runaway-query event.
22. Workload Analysis
- Capacity planning & application fit.
- Query history: a shared, remotable database (NPS) holding query text, start / end, queue time, …
- Virtual table _v_sched_gra: per group, jobs started / completed, resource details, compliance, busy%, …
- Virtual table _v_system_util: host & SPU resources (CPU, disk table & temp, network).
- nzportal.
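A minimal monitoring sketch against the virtual tables named above (no column names beyond the slide's description are assumed):

-- Per-group scheduler/GRA statistics: jobs started / completed, compliance, busy%
SELECT * FROM _v_sched_gra;
-- Host & SPU utilization: CPU, disk, network
SELECT * FROM _v_system_util;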
23. WLM Guidelines & Best Practices
- Use no more than 8-10 resource sharing groups; you want each group to be able to run concurrently, and the system runs roughly N max-size snippets at once (approximately 11).
- The +/- 5% accuracy means that a 5% group could get NO TIME and still be compliant.
- Smaller groups are harder to keep balanced.
24. WLM Guidelines & Best Practices
- RESOURCE MINIMUMs should add up to 100%: not strictly necessary, but easier to think about.
- It is OK to change RSG minimums on the fly, e.g. to have different day/night balances (see the sketch below).
- ADMIN gets at least 50% of system resources and a lot of boosts, which can ruin the balance for other groups. Avoid using the admin user account for normal work; like "root", reserve it for emergencies and occasional privileged access.
25. WLM Guidelines & Best Practices
- Short Query Bias (SQB) gives "short" queries many boosts: they go to the head of the queue, can use reserved memory, and get more CPU and preferential disk access.
- The default threshold is an estimated runtime of less than 2 seconds; that may not be right for you.
- Make sure short queries really are short! Check the plan files.
26. WLM Guidelines & Best Practices
- PRIORITY controls queries within a group, e.g. interactive queries vs. reports vs. loads.
- Users and groups have default priorities; the priority can be set in a session and for running queries.
- Two impacts: higher-priority queries are selected first (they go to the head of the queue), and a query's resource share within the group increases with priority: Normal gets 2X Low, High gets 2X Normal, …
27. WLM Guidelines & Best Practices
- RESOURCE MAXIMUM limits an RSG to protect the other RSGs; other cases include pay-for-use and controlling the growth experience.
- Accuracy is generally 5%, averaged over an hour; because the ceiling is enforced by inserted delays, latency varies.
- Values should be between 30% and 80%: larger values are mostly meaningless (not very effective), and smaller values introduce a lot of variability.
28. WLM Guidelines & Best Practices
Limiting jobs: two ways.
- An RSG JOB LIMIT works for a specific RSG. Example: limit the ETL group to 10 loads: ALTER GROUP … WITH JOB LIMIT 10
- Gatekeeper queues limit jobs across RSGs. Example: to limit large jobs across the entire system, set the query priority to LOW (per user or session) and limit the size of the GK LOW queue. Example: to limit long jobs across the system, split the GK Normal queue at (say) 300 seconds, allowing 48 jobs under 300 seconds and 5 over (see the sketch below).
29. WLM Guidelines & Best Practices
- JOB LIMIT limits one RSG and thereby protects the others.
- Consider the job type & peak throughput: a few medium queries can hit peak throughput, loads may need ten or so, and small queries may need dozens.
- JOB LIMIT limits shorts, longs and all priorities alike, so it is best for groups with big queries and loads.
30. WLM Guidelines & Best Practices
- Experiment: change limits, record the results, verify.
- Your workload is not the same as others', and your workload today is not the same as yesterday's.
- Effects may depend on subtle workload differences and can be hard to predict.
31. Thank You
Your feedback is important!
- Access the Conference Agenda Builder to complete your session surveys:
  - from any web or mobile browser at http://iod13surveys.com/surveys.html
  - at any Agenda Builder kiosk onsite