This document discusses various ways to optimize FirebirdSQL performance, including hardware optimization like using SSD drives, RAID 10, and enabling write caches. It also discusses optimizing the Firebird server configuration by increasing cache sizes, enabling CPU affinity, and optimizing programming and SQL. Specific tips include using prepared statements, limiting record fetches, avoiding unnecessary indexes, and using analytical functions in Firebird 3.0. Overall, the document provides over 45 tips across hardware, software, programming, and SQL optimizations to improve FirebirdSQL performance.
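To illustrate the prepared-statement tip, here is a minimal sketch using the `fdb` Python driver for Firebird (connection details are placeholders): preparing a statement once lets the server parse and plan it a single time, after which it can be re-executed cheaply.

```python
import fdb  # Python driver for Firebird

# Placeholder connection details.
con = fdb.connect(dsn="localhost:/data/sales.fdb",
                  user="SYSDBA", password="masterkey")
cur = con.cursor()

# Prepare once: the server parses and plans the statement a single time.
insert = cur.prep("insert into orders (customer_id, amount) values (?, ?)")

# Re-execute with different parameters, avoiding repeated parse/plan work.
for customer_id, amount in [(1, 9.99), (2, 19.99), (3, 4.50)]:
    cur.execute(insert, (customer_id, amount))

con.commit()
con.close()
```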
Arrow Flight is a proposed RPC layer for Apache Arrow that allows for efficient transfer of Arrow record batches between systems. It uses gRPC as the foundation to define streams of Arrow data that can be consumed in parallel across locations. Arrow Flight supports custom actions that can be used to build services on top of the generic API. By extending gRPC, Arrow Flight aims to simplify the creation of data applications while enabling high-performance data transfer and locality awareness.
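A minimal client-side sketch with `pyarrow.flight`, assuming a Flight server at a placeholder address and a dataset published under a placeholder path; each endpoint in the flight info can point at a different location, which is how consumers parallelize across nodes:

```python
import pyarrow.flight as flight

# Connect to a Flight server (placeholder address).
client = flight.FlightClient("grpc://localhost:8815")

# Ask the server how a dataset is laid out; each endpoint may live on a
# different location, so the streams can be consumed in parallel.
descriptor = flight.FlightDescriptor.for_path("example.parquet")
info = client.get_flight_info(descriptor)

for endpoint in info.endpoints:
    # Fetch one stream of Arrow record batches and materialize it.
    reader = client.do_get(endpoint.ticket)
    table = reader.read_all()
    print(table.num_rows)
```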
Hyperspace: An Indexing Subsystem for Apache Spark | Databricks
At Microsoft, we store datasets (both from internal teams and external customers) ranging from a few GBs to 100s of PBs in our data lake. The scope of analytics on these datasets ranges from traditional batch-style queries (e.g., OLAP) to explorative, ‘finding needle in a haystack’ type of queries (e.g., point-lookups, summarization etc.).
Extending the Apache Kafka® Replication Protocol Across Clusters, Sanjana Kaundinya | Current 2022 | HostedbyConfluent
When Apache Kafka® was first created, one of the hallmarks was its native replication protocol, which provided built-in resiliency in the system. As a business scales, there’s a need to have this fault-tolerance transcend beyond the local data center, and a multi-geographic deployment becomes critical. Traditionally, Kafka Connect based solutions have tried their hand at enabling these types of deployments. However, this presents its own set of operational challenges that can be quite costly.
In this talk, we will go over how you can use the existing replication protocol across clusters. You will learn how to use Cluster Linking to run a multi-region data streaming deployment without the burden and operational overhead of running yet another data system. We will discuss:
* Automation options for creating mirror topics
* Failover processes and caveats to consider
* Handling ACL replication and consumer offset synchronization
* And more!
So, join us on this intergalactic journey to discover how you can use Cluster Linking to decrease your operational overhead, maintain a multi-geographic deployment, and perhaps even reach infinity (and beyond)!
Dynamic filtering for Presto join optimisation | Ori Reshef
@Roman Zeyde explains how to optimize Presto joins in selective use cases.
Roman is a Talpiot graduate and an ex-Googler, today working as Varada's Presto architect.
Fail-Safe Cluster for FirebirdSQL and something more | Alexey Kovyazin
With Firebird HQbird it is possible to create a highly available cluster or a warm standby solution. This presentation defines the problem and describes ways to create such solutions.
Deep Dive into the New Features of Apache Spark 3.0 | Databricks
Continuing with the objectives to make Spark faster, easier, and smarter, Apache Spark 3.0 extends its scope with more than 3000 resolved JIRAs. We will talk about the exciting new developments in Spark 3.0 as well as some other major initiatives that are coming in the future.
Sputnik: Airbnb’s Apache Spark Framework for Data Engineering | Databricks
Apache Spark is a general-purpose big data execution engine. You can work with different data sources using the same set of APIs in both batch and streaming mode, as the sketch below shows.
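A short PySpark sketch of that point: the same DataFrame operations run over a static directory in batch mode and over a monitored directory in streaming mode (paths and schema are placeholders):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("batch-vs-stream").getOrCreate()

# Batch: read a static directory of JSON files (placeholder path/schema).
batch_df = spark.read.schema("user STRING, amount DOUBLE").json("/data/events/")
batch_df.filter(col("amount") > 100).show()

# Streaming: the same DataFrame operations over a monitored directory.
stream_df = spark.readStream.schema("user STRING, amount DOUBLE").json("/data/events/")
query = stream_df.filter(col("amount") > 100).writeStream.format("console").start()
query.awaitTermination()
```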
Building Active Directory Monitoring with Telegraf, InfluxDB, and Grafana | Boni Yeamin
Building Active Directory Monitoring with Telegraf, InfluxDB, and Grafana: A Brief Overview
Active Directory (AD) Monitoring is essential for maintaining network security, performance, and compliance. One powerful approach to achieve this is by utilizing the combination of Telegraf, InfluxDB, and Grafana.
Telegraf: Data Collection
Telegraf acts as a versatile data collector, capable of retrieving various metrics from your AD environment. It offers a range of plugins to monitor AD-related parameters, including event logs, replication status, user activity, and more. Telegraf gathers these metrics and prepares them for further processing.
InfluxDB: Data Storage
InfluxDB serves as a robust time-series database, designed to handle high-frequency data updates. It's an ideal choice for storing the metrics collected by Telegraf. The schemaless architecture accommodates evolving data requirements. Metrics are stored with timestamps, making historical analysis and trend identification seamless.
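Telegraf normally handles the writes, but as a sketch of what landing a point in InfluxDB 2.x looks like, the official `influxdb-client` Python package can write the same kind of time-series data directly (URL, token, org, bucket, and the `ad_replication` measurement are placeholders):

```python
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# Placeholder connection details for an InfluxDB 2.x instance.
client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

# One time-series point: AD replication latency, tagged by domain controller.
point = (Point("ad_replication")
         .tag("dc", "DC01")
         .field("latency_ms", 42.0))
write_api.write(bucket="ad-monitoring", record=point)
client.close()
```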
Grafana: Data Visualization
Grafana excels in turning data into meaningful insights. It connects to InfluxDB and transforms raw metrics into interactive, visually appealing dashboards. You can design custom visualizations, such as line charts for monitoring replication status, gauges for real-time user login activity, and tables for critical event logs. Alerts can also be set up to notify administrators of anomalies.
For a long time, relational database management systems have been the only solution for persistent data storage. However, with the phenomenal growth of data, this conventional way of storing it has become problematic.
To manage the exponentially growing data traffic, the largest information technology companies, such as Google, Amazon, and Yahoo, have developed alternative solutions that store data in what have come to be known as NoSQL databases.
Some of the NoSQL features are flexible schema, horizontal scaling and no ACID support. NoSQL databases store and replicate data in distributed systems, often across datacenters, to achieve scalability and reliability.
The CAP theorem states that any networked shared-data system (e.g. NoSQL) can have at most two of three desirable properties:
• consistency (C): equivalent to having a single up-to-date copy of the data
• availability (A) of that data (for reads and writes)
• tolerance to network partitions (P)
Because of this inherent tradeoff, it is necessary to sacrifice one of these properties. The general belief is that designers cannot sacrifice P and therefore have a difficult choice between C and A.
In this seminar two NoSQL databases are presented: Amazon's Dynamo, which sacrifices consistency, thereby achieving very high availability, and Google's BigTable, which guarantees strong consistency while providing only best-effort availability.
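To make the consistency-versus-availability trade concrete, here is a toy Python illustration of Dynamo-style quorum tuning, not any product's API: with N replicas, choosing R and W so that R + W > N forces every read quorum to overlap every write quorum, while smaller R and W favor availability at the cost of possibly stale reads.

```python
# Toy illustration of Dynamo-style quorum tuning with N replicas.
N, W, R = 3, 2, 2  # R + W > N => a read quorum overlaps every write quorum

replicas = [{} for _ in range(N)]  # each replica: key -> (version, value)

def write(key, value, version):
    acks = 0
    for rep in replicas:          # in a real system some replicas may be down
        rep[key] = (version, value)
        acks += 1
        if acks >= W:             # report success after W acknowledgements
            return True
    return False

def read(key):
    # Query R replicas and keep the highest-versioned value seen.
    answers = [rep.get(key) for rep in replicas[:R]]
    return max(a for a in answers if a is not None)

write("cart", ["book"], version=1)
print(read("cart"))  # (1, ['book']) -- the read quorum saw the write
```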
This document discusses Flipkart's search architecture and how it addresses challenges for e-commerce search. It has a diverse catalog of 13 million products across 900 categories. It needs high performance with 99.99% availability and 1000 queries per second. There are also high rates of updates. Solutions discussed include caching, external source fields for sorting/faceting/filtering, and relevance optimizations. Caching improves performance 10-50x by caching results. External fields help with updates and partitioning. Relevance is tuned using boosts, user feedback, and query classification.
The document describes Google File System (GFS), which was designed by Google to store and manage large amounts of data across thousands of commodity servers. GFS consists of a master server that manages metadata and namespace, and chunkservers that store file data blocks. The master monitors chunkservers and maintains replication of data blocks for fault tolerance. GFS uses a simple design to allow it to scale incrementally with growth while providing high reliability and availability through replication and fast recovery from failures.
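A toy sketch of the metadata split the summary describes, with made-up chunk handles and server names: the master only answers "which chunkservers hold the chunk covering this offset", and clients then talk to chunkservers directly (GFS used 64 MB chunks and, typically, 3-way replication).

```python
# Toy sketch of GFS-style metadata: the master maps files to chunk handles,
# and each chunk handle to the chunkservers holding its replicas.
CHUNK_SIZE = 64 * 1024 * 1024  # GFS used 64 MB chunks

file_to_chunks = {"/logs/web.0": ["chunk-a1", "chunk-a2"]}
chunk_locations = {
    "chunk-a1": ["cs-01", "cs-07", "cs-12"],  # 3-way replication
    "chunk-a2": ["cs-03", "cs-07", "cs-09"],
}

def locate(path, offset):
    """Return the replica locations for the chunk covering a byte offset."""
    handle = file_to_chunks[path][offset // CHUNK_SIZE]
    return handle, chunk_locations[handle]

print(locate("/logs/web.0", 70 * 1024 * 1024))  # second chunk's replicas
```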
This document discusses logical replication with pglogical. It begins by explaining that pglogical performs row-oriented replication and outputs replication data that can be used in various ways. It then covers the architectures of standalone PostgreSQL, physical replication, and logical replication. The rest of the document discusses key aspects of pglogical such as its output plugin, selective replication capabilities, performance and future plans, and examples of using the output with other applications.
Batch Processing at Scale with Flink & Iceberg | Flink Forward
Flink Forward San Francisco 2022.
Goldman Sachs's Data Lake platform serves as the firm's centralized data platform, ingesting 140K (and growing!) batches per day of Datasets of varying shape and size. Powered by Flink and using metadata configured by platform users, ingestion applications are generated dynamically at runtime to extract, transform, and load data into centralized storage where it is then exported to warehousing solutions such as Sybase IQ, Snowflake, and Amazon Redshift. Data Latency is one of many key considerations as producers and consumers have their own commitments to satisfy. Consumers range from people/systems issuing queries, to applications using engines like Spark, Hive, and Presto to transform data into refined Datasets. Apache Iceberg allows our applications to not only benefit from consistency guarantees important when running on eventually consistent storage like S3, but also allows us the opportunity to improve our batch processing patterns with its scalability-focused features.
by Andreas Hailu
Flink provides unified batch and stream processing. It natively supports streaming dataflows, long batch pipelines, machine learning algorithms, and graph analysis through its layered architecture and treatment of all computations as data streams. Flink's optimizer selects efficient execution plans such as shipping strategies and join algorithms. It also caches loop-invariant data to speed up iterative algorithms and graph processing.
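A minimal PyFlink sketch of the "everything is a data stream" model: a bounded collection is treated as a finite stream, and unbounded sources plug into the same DataStream API (job name and data are placeholders):

```python
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

# A bounded collection behaves like a finite stream; unbounded sources
# (Kafka, sockets, ...) plug into the same dataflow API.
ds = env.from_collection([1, 2, 3, 4, 5])
ds.map(lambda x: x * x).filter(lambda x: x % 2 == 1).print()

env.execute("squares")
```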
URP? Excuse You! The Three Kafka Metrics You Need to Know | Todd Palino
What do you really know about how to monitor a Kafka cluster for problems? Is your most reliable monitoring your users telling you there’s something broken? Are you capturing more metrics than the actual data being produced? Sure, we all know how to monitor disk and network, but when it comes to the state of the brokers, many of us are still unsure of which metrics we should be watching, and what their patterns mean for the state of the cluster. Kafka has hundreds of measurements, from the high-level numbers that are often meaningless to the per-partition metrics that stack up by the thousands as our data grows.
We will thoroughly explore three key monitoring concepts in the broker, that will leave you an expert in identifying problems with the least amount of pain:
Under-replicated Partitions: The mother of all metrics
Request Latencies: Why your users complain
Thread pool utilization: How could 80% be a problem?
We will also discuss the necessity of availability monitoring and how to use it to get a true picture of what your users see, before they come beating down your door!
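As a sketch of the under-replicated-partitions check, the `confluent_kafka` Python client's AdminClient exposes cluster metadata from which URPs can be derived: a partition is under-replicated when its in-sync replica set is smaller than its assigned replica set (bootstrap address is a placeholder):

```python
from confluent_kafka.admin import AdminClient

# Placeholder bootstrap address.
admin = AdminClient({"bootstrap.servers": "localhost:9092"})
metadata = admin.list_topics(timeout=10)

# A partition is under-replicated when its in-sync replica set (ISR)
# is smaller than its full replica set.
for topic in metadata.topics.values():
    for pid, p in topic.partitions.items():
        if len(p.isrs) < len(p.replicas):
            print(f"URP: {topic.topic}[{pid}] isr={p.isrs} replicas={p.replicas}")
```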
The document discusses Protocol Buffers, which are a mechanism for serializing structured data. It provides advantages over XML such as being smaller, faster, and generating easier to use data access classes. It describes how protocol buffers work by defining data formats in a .proto file, then using the protocol buffer compiler to generate classes to access serialized data. The document outlines best practices for protocol buffers like maintaining backward compatibility and choosing field numbers and types carefully.
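A minimal sketch of that workflow in Python, where the `.proto` definition and the generated module name `person_pb2` are assumptions for illustration:

```python
# Assumes a person.proto such as:
#   syntax = "proto3";
#   message Person {
#     string name = 1;   // field numbers must stay stable for compatibility
#     int32 id = 2;
#   }
# compiled with: protoc --python_out=. person.proto
import person_pb2  # hypothetical generated module

msg = person_pb2.Person(name="Ada", id=1)
wire = msg.SerializeToString()        # compact binary encoding

decoded = person_pb2.Person()
decoded.ParseFromString(wire)         # round-trips the same data
assert decoded.name == "Ada" and decoded.id == 1
```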
Best Practices for Becoming an Exceptional Postgres DBA | EDB
Drawing from our teams who support hundreds of Postgres instances and production database systems for customers worldwide, this presentation provides real-world best practices from the nation's top DBAs. Learn top-notch monitoring and maintenance practices, get resource planning advice that can help prevent, resolve, or eliminate common issues, learn top database tuning tricks for increasing system performance, and ultimately gain greater insight into how to improve your effectiveness as a DBA.
The document describes the Volcano/Cascades query optimizer. It uses dynamic programming to efficiently search the large space of possible query execution plans. The optimizer represents queries as logical and physical operators connected by transformation and implementation rules. It explores the logical plan space and then builds physical plans by applying these rules. The search is guided by estimating physical operator costs. The optimizer memoizes partial results to avoid redundant work. This approach allows finding optimal execution plans in a principled way that scales to complex queries and optimizer extensions.
NoSQL Data Modeling Foundations — Introducing Concepts & Principles | ScyllaDB
In this talk, Pascal Desmarets, CEO and Founder of Hackolade discusses the foundations of NoSQL data modeling. He highlights:
- Why is data modeling a key success factor?
- “Sweet spot” use cases where NoSQL shines the most
- Basic principles of Data Modeling for ScyllaDB
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter | DataWorks Summit
- Profiling Hadoop jobs at Twitter revealed that compression/decompression of intermediate data and deserialization of complex object keys were very expensive. Optimizing these led to performance improvements of 1.5x or more.
- Using columnar file formats like Apache Parquet allows reading only needed columns, avoiding deserialization of unused data. This led to gains of up to 3x.
- Scala macros were developed to generate optimized implementations of Hadoop's RawComparator for common data types, avoiding deserialization for sorting.
1. Log-structured merge trees store data in multiple levels with different storage speeds and costs, requiring data to be periodically merged across levels.
2. This structure allows fast writes by storing new data in faster levels before merging to slower levels, and efficient reads by querying multiple levels and merging results.
3. The merging process involves loading, sorting, and rewriting levels to consolidate and propagate deletions and updates between levels (a toy sketch of this structure follows below).
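Here is that toy sketch: a two-level LSM with an in-memory memtable that flushes to sorted runs, point lookups that search newest data first, and a compaction step that merges runs (sizes and structures are deliberately simplistic):

```python
import bisect

# Toy two-level LSM: an in-memory memtable that flushes to sorted runs,
# which are periodically merged into one consolidated run.
MEMTABLE_LIMIT = 4
memtable, runs = {}, []   # runs: list of sorted (key, value) lists, newest first

def put(key, value):
    memtable[key] = value
    if len(memtable) >= MEMTABLE_LIMIT:          # flush fast level to a run
        runs.insert(0, sorted(memtable.items()))
        memtable.clear()

def get(key):
    if key in memtable:                          # newest data wins
        return memtable[key]
    for run in runs:                             # then search runs, newest first
        i = bisect.bisect_left(run, (key,))
        if i < len(run) and run[i][0] == key:
            return run[i][1]
    return None

def compact():
    """Merge all runs; newer values overwrite older ones."""
    merged = {}
    for run in reversed(runs):                   # oldest first so new overwrites
        merged.update(run)
    runs[:] = [sorted(merged.items())]
```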
PostgreSQL major version upgrade using built-in Logical Replication | Atsushi Torikoshi
This document discusses upgrading a PostgreSQL database from one major version to another using logical replication. It begins with an introduction to logical replication and its advantages over other upgrade methods. It then covers the architecture of logical replication, including how the walsender reads WAL files and applies changes to the subscriber. Finally, it addresses some limitations of logical replication for upgrades, such as objects that are not replicated, and provides strategies for completing the upgrade while minimizing downtime.
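The built-in mechanism underneath this upgrade path is PostgreSQL's publication/subscription pair. A minimal sketch with `psycopg2`, with DSNs as placeholders (note that `CREATE SUBSCRIPTION` must run outside a transaction block, hence autocommit):

```python
import psycopg2

# On the old-version server: publish every table (placeholder DSN).
src = psycopg2.connect("dbname=app host=old-server")
src.autocommit = True
src.cursor().execute("CREATE PUBLICATION upgrade_pub FOR ALL TABLES")
src.close()

# On the new-version server (schema created beforehand): subscribe.
# The subscriber copies the initial data, then streams changes decoded
# from the provider's WAL by the walsender.
dst = psycopg2.connect("dbname=app host=new-server")
dst.autocommit = True  # CREATE SUBSCRIPTION cannot run in a txn block
dst.cursor().execute(
    "CREATE SUBSCRIPTION upgrade_sub "
    "CONNECTION 'dbname=app host=old-server' "
    "PUBLICATION upgrade_pub"
)
dst.close()
```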
The document discusses MySQL's buffer pool and buffer management. It describes how the buffer pool caches frequently accessed data in memory for faster access. The buffer pool contains several lists including a free list, LRU list, and flush list. It explains functions for reading pages from storage into the buffer pool, replacing pages using LRU, and flushing dirty pages to disk including single page flushes during buffer allocation.
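A toy Python sketch of those structures, not InnoDB's actual implementation: an LRU-ordered page map plus a dirty set standing in for the flush list, with eviction of the least recently used page when the pool is full:

```python
from collections import OrderedDict

# Toy buffer pool: an LRU list of pages plus a dirty set (the "flush list").
class BufferPool:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # page_id -> data, in LRU order
        self.dirty = set()

    def read_page(self, page_id):
        if page_id in self.pages:                 # hit: promote to MRU end
            self.pages.move_to_end(page_id)
            return self.pages[page_id]
        data = f"<page {page_id} from disk>"      # miss: fetch from storage
        if len(self.pages) >= self.capacity:      # evict the LRU victim
            victim, _ = self.pages.popitem(last=False)
            if victim in self.dirty:              # single-page flush on eviction:
                self.dirty.discard(victim)        # write back before reuse
        self.pages[page_id] = data
        return data

    def write_page(self, page_id, data):
        self.read_page(page_id)                   # ensure residency
        self.pages[page_id] = data
        self.dirty.add(page_id)                   # joins the flush list

pool = BufferPool(capacity=2)
pool.read_page(1); pool.write_page(2, "rows"); pool.read_page(3)  # evicts page 1
```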
Apache Carbondata: An Indexed Columnar File Format for Interactive Query with... | Spark Summit
Real-time analytics over large datasets has become an increasingly widespread demand. Over the past several years the Hadoop ecosystem has been continuously evolving: even complex queries over large datasets can now be answered interactively with distributed processing frameworks like Apache Spark, and new storage paradigms have been introduced to support them, with Apache Parquet and ORC providing fast scans over columnar data and Apache HBase offering fast ingest and millisecond-scale random access.
In this talk, we will outline Apache Carbondata, a new addition to the open-source Hadoop ecosystem: an indexed columnar file format aimed at bridging the gap to fully enable real-time analytics. It has been deeply integrated with Spark SQL and enables dramatic acceleration of query processing by leveraging efficient encoding/compression and effective predicate push-down through Carbondata's multi-level index technique.
The document provides an overview of various MySQL storage engines. It discusses key storage engines like MyISAM, InnoDB, MEMORY, and MERGE. It describes that storage engines manage how data tables are handled and each engine has its own advantages and purposes. The selection of a storage engine depends on the user's table type and purpose, considering factors like transactions, backups, and special features.
PostgreSQL: The Best Database in the Universe | elliando dias
PostgreSQL is an open-source database management system whose development began in the 1980s. It supports several operating systems and platforms, offers high scalability, availability, and performance, and is used by large companies such as NASA, Apple, Cisco, and Yahoo. PostgreSQL is BSD-licensed and can be freely modified and redistributed.
Firebird recovery tools and techniques by IBSurgeon | Alexey Kovyazin
Presentation "Firebird recovery tools and techniques", by Alexey Kovyazin (IBSurgeon). This presentation is devoted to Firebird corruptions: why they occur, what are their symptoms, how to fix corruptions with Firebird standard tools and with advanced IBSurgeon tools and services.
Firebird is an open-source relational database management system whose main purpose is to support and promote the development of the Firebird database engine. Firebird offers features such as support for ACID transactions, foreign keys, good user- and role-based security, and SQL-92 compatibility. There are two types of Firebird servers, Classic and Superserver; Classic is the recommended option for most situations because of its lower resource consumption.
The document describes the Entity-Relationship (ER) Model, which makes it possible to represent abstractions and knowledge in an information system through entities and relationships. The ER model uses diagrams with symbols such as rectangles and lines to show the connections between entities, relationships, and attributes. Entities represent real-world objects and have attributes that describe them, while relationships show how the entities are linked.
This document discusses known compatibility issues when migrating from earlier versions of Firebird to version 2.1. Key points include: changes to note in v2.1 like the need to upgrade metadata; compatibility issues related to SQL migration like changes in DDL, DML, and PSQL; security changes in v2.1 like the renaming of the security database; and platform-specific issues when installing on Windows or POSIX systems. The document provides guidance on addressing these issues to ensure a smooth migration.
Understanding Firebird optimizer, by Dmitry Yemanov (in English) | Alexey Kovyazin
The document discusses Firebird's query optimizer. It explains that the optimizer analyzes statistical information to retrieve data in the most efficient way. It can use rule-based or cost-based strategies. Rule-based uses heuristics while cost-based calculates costs based on statistics. The optimizer prepares queries, calculates costs of different plans, and chooses the most efficient plan based on selectivity, cardinality, and cost metrics. It relies on up-to-date statistics stored in the database to estimate costs and make optimization decisions.
In this presentation we consider how to resolve Firebird performance problems: which Firebird database parameters need to be monitored, and how to tune the Firebird configuration and adjust client applications.
This document discusses how to prevent corruption in Firebird databases. It outlines key things to monitor, such as server parameters, database files, backups, and indices, to recognize potential problems. Regular maintenance like validating metadata, checking backups, and analyzing logs and errors is important. The document also introduces FBDataGuard, a tool that automates monitoring of 26 database and server parameters, alerts administrators to issues, and helps maintain databases to prevent corruption from hardware failures or other causes.
Firebird database recovery and protection for enterprises and ISV | Mind The Firebird
This document discusses database reliability and recovery for Firebird databases. It begins by addressing whether Firebird databases are reliable, noting that all databases can experience corruption from hardware or software failures over time. The document then discusses what happens when a Firebird database crashes, explaining that the database consists of system metadata pages and user data pages. It provides an overview of standard recovery methods using the gfix and gbak commands. Additional approaches for recovery are presented, including the FBDataGuard and FirstAID tools from IBSurgeon, and developers are offered FBDataGuard subscriptions to protect their databases.
This document discusses best practices for optimizing SQL Server performance. It recommends establishing a baseline, identifying bottlenecks, making one change at a time and measuring the impact. It also provides examples of metrics, tools and techniques to monitor performance at the system, database and query levels. These include Windows Performance Monitor, SQL Server Activity Monitor, Dynamic Management Views and trace flags.
The document discusses recommendations for improving DB2 system availability and performance. It recommends automating the monitoring of critical DB2 messages, applying preventative service regularly, designing applications for high availability and parallelism, managing virtual storage above 16MB efficiently, and performing high performance object recovery through techniques like using DASD and parallel jobs.
Geek Sync | Guide to Understanding and Monitoring Tempdb | IDERA Software
You can watch the replay for this Geek Sync webcast in the IDERA Resource Center: http://ow.ly/7OmW50A5qNs
Every SQL Server system you work with has a tempdb database. In this Geek Sync, you’ll learn how tempdb is structured, what it’s used for and the common performance problems that are tied to this shared resource.
Presentation backup and recovery best practices for very large databases (v... | xKinAnx
This document provides best practices for backup and recovery of very large databases (VLDBs). It discusses VLDB trends requiring databases to scale to terabytes and beyond. The key is protecting growing data while maintaining cost efficiency. The presentation covers assessing recovery requirements, architecting backup environments, leveraging Oracle tools, planning data layout, developing backup procedures, and recovery strategies. It also provides a Starbucks case study example.
Exchange Server 2013 introduced changes to the database and store to decrease hardware costs, increase reliability and availability, and provide better data protection and diagnostics. Key changes included running the store as multiple processes per database, optimizing data structures and storage for sequential IO, adding managed availability monitoring and recovery actions, and improving diagnostic tools and data available through PowerShell.
More and more companies have big Firebird databases, from 100 GB to 1 TB. Maintenance and optimization tasks for such databases differ from those for small ones, and database administrators need to take several important things about big Firebird databases into account.
Hoodie (Hadoop Upsert Delete and Incremental) is an analytical, scan-optimized data storage abstraction which enables applying mutations to data in HDFS on the order of a few minutes, and chaining of incremental processing in Hadoop.
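A hedged sketch of a Hudi upsert from PySpark, assuming the Hudi Spark bundle is on the classpath; table name, fields, and path are placeholders, and the options shown are the classic datasource write options (the record key identifies rows, the precombine field decides which version of a key wins):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-upsert").getOrCreate()
df = spark.createDataFrame(
    [(1, "alice", "2024-01-01 10:00:00")],
    ["id", "name", "ts"],
)

# Upsert into a Hudi table; mutations to existing keys are applied rather
# than appended as duplicates.
(df.write.format("hudi")
   .option("hoodie.table.name", "users")
   .option("hoodie.datasource.write.recordkey.field", "id")
   .option("hoodie.datasource.write.precombine.field", "ts")
   .option("hoodie.datasource.write.operation", "upsert")
   .mode("append")
   .save("/tmp/hudi/users"))  # placeholder path
```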
Galaxy Big Data with MariaDB 10 by Bernard Garros, Sandrine Chirokoff and Stéphane Varoqui.
Presented 26.6.2014 at the MariaDB Roadshow in Paris, France.
MySQL Enterprise Backup provides fast, consistent, online backups of MySQL databases. It allows for backing up InnoDB and MyISAM tables while the database is running, minimizing downtime. The tool takes physical backups of the data files rather than logical backups, allowing for very fast restore times compared to alternatives like mysqldump. It supports features like compressed backups, incremental backups, and point-in-time recovery.
This document discusses database backup and recovery strategies. It outlines different backup types including logical, physical, hot, and cold backups. It describes how backups can protect a database from failures, increase uptime, and minimize data loss. The document also categorizes different types of failures and whether recovery is needed. It provides details on enabling archive logging mode and performing physical database backups in both open and closed states. Logical backups using Oracle Export and Import utilities are also covered.
This document discusses database backup and recovery. It defines backup as additional copies of data for restoration if the primary copy is lost or corrupted. There are several types of backups including full, incremental, differential, and mirror backups. Recovery brings the database back to a prior consistent state, using techniques like log files, check pointing, and immediate or deferred transaction updates. Factors like backup location, test restores, automation, and database design can influence recovery duration. Alternatives to traditional backup and recovery include standby databases, replication, and disk mirroring.
This document introduces Oracle database architecture, including the main components of an Oracle instance, Oracle database, and tablespaces. An Oracle instance consists of an SGA (memory) and background processes. The Oracle database comprises control files, redo log files, and data files. Common tablespaces include SYSTEM, SYSAUX, UNDO, TEMPORARY, and user tablespaces like DATA and INDEX. The document also discusses building a career with passion in Oracle database technologies.
During this webinar, we are going to take a look at the difference between a backup (using tools like BART, Barman or pgBackRest) and a dump (using tools like pg_dump). We will review the advantages and disadvantages, key considerations, and tools available for both methods.
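As a tiny illustration of the dump side, `pg_dump`'s custom format can later be selectively restored with `pg_restore`; here both are driven from Python, with database and file names as placeholders:

```python
import subprocess

# A dump is a logical snapshot: pg_dump's custom format (-Fc) can be
# selectively restored with pg_restore, unlike a plain SQL script.
subprocess.run(
    ["pg_dump", "-Fc", "--dbname=app", "--file=app.dump"],  # placeholder names
    check=True,
)

# Restoring the dump into another (already created) database.
subprocess.run(
    ["pg_restore", "--dbname=app_copy", "app.dump"],
    check=True,
)
```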
SQL Server is really the brain of SharePoint, yet the default settings of SQL Server are not optimised for SharePoint. In this session, Serge Luca (SharePoint MVP) and Isabelle Van Campenhoudt (SQL Server MVP) will give you an overview of what every SQL Server DBA needs to know about configuring, monitoring, and setting up SQL Server for SharePoint 2013. After a quick description of the SharePoint architecture (sites, site collections, ...), we will describe the different types of SharePoint databases and their specific configuration settings, some do's and don'ts specific to SharePoint, and the disaster recovery options for SharePoint, including (but not only) SQL Server AlwaysOn Availability Groups for high availability and disaster recovery, in order to achieve an optimal level of business continuity.
Benefits of Attending this Session:
Tips & tricks
Lessons learned from the field
Super return on Investment
Make your SharePoint fly by tuning and optimizing SQL Server | serge luca
This document summarizes a presentation on optimizing SQL Server for SharePoint. It discusses basic SharePoint database concepts, planning for long-term performance by optimizing resources like CPU, RAM, disks and network latency. It also covers optimal SQL Server configuration including installation, database settings like recovery models and file placement. Maintaining databases through tools like DBCC CheckDB and measuring performance using counters and diagnostic queries is also presented. The presentation emphasizes the importance of collaboration between SharePoint and database administrators to ensure compliance and optimize performance.
This document discusses very large database (VLDB) configurations and maintenance. It begins by defining a VLDB as a database occupying more than 1 terabyte or containing several billion rows. It then covers various configuration topics like operating system settings, instance memory allocation including the importance of tempdb configuration, and database file configuration. The document also discusses maintenance best practices such as disaster recovery planning, partitioning data to aid restores, compressing backups and data, purging or archiving old data, and performing regular index maintenance and integrity checks.
Similar to Firebird's Big Databases (in English) (20)
The document discusses OLTP-EMUL, an open-source performance testing tool for Firebird. It describes how to configure and run OLTP-EMUL tests on Firebird 2.5, 3.0 and 4.0 to simulate a trading application. IBSurgeon regularly runs these tests and publishes the results online, showing significant performance improvements from Firebird 2.5 to 3.0 and 4.0, especially for the SuperServer engine. Configuration settings for Firebird.conf that optimize performance are also provided.
Transactions in Firebird work by assigning each record multiple versions, with each version associated with a transaction. This allows concurrent transactions to operate on stable views of the database. The transaction inventory pages (TIP) track the state of transactions and four markers - next, oldest active, oldest snapshot, oldest interesting - help determine which record versions are visible to each transaction. Keeping transactions short helps avoid problems related to these markers being out of date and prevents unnecessary record versions from building up over time.
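A toy Python sketch of the record-versioning idea, not Firebird's actual on-disk format: each record keeps a chain of versions tagged with the writing transaction, and a transaction sees only versions written by itself or by transactions already committed when it started:

```python
# Toy sketch of multi-version records: each record keeps a chain of
# (writer_txn, value) versions; a transaction sees the newest version
# written by a transaction that committed before it started.
committed = set()          # committed transaction ids
versions = {}              # record_id -> list of (txn_id, value), newest first

def write(txn, rec, value):
    versions.setdefault(rec, []).insert(0, (txn, value))

def read(txn, rec, snapshot):
    for writer, value in versions.get(rec, []):
        if writer == txn or writer in snapshot:   # visible version
            return value
    return None

write(10, "row1", "v1"); committed.add(10)
snapshot_t20 = set(committed)       # txn 20 starts: sees txn 10's work
write(30, "row1", "v2")             # txn 30 writes but hasn't committed
print(read(20, "row1", snapshot_t20))   # "v1" -- stable view for txn 20
```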
Professional tools for Firebird optimization and maintenance from IBSurgeon | Alexey Kovyazin
How do you create a better environment for big Firebird databases? How can a DBA recognize and solve problems with Firebird performance, backups, or corruption (and, better yet, prevent corruption)? This session was devoted to the professional Firebird tools from IBSurgeon that help to solve all these problems.
Firebird migration: from Firebird 1.5 to Firebird 2.5 | Alexey Kovyazin
This document summarizes the migration of a 75GB Firebird database from version 1.5 to 2.5 for a pharmaceutical distributor in Russia. Key steps included preparing metadata, testing the data conversion, application migration by checking and updating SQL queries, and optimizing performance. Over 55,000 SQL queries were analyzed, with around 750 requiring changes to work with the new version. The migration was completed in less than 4 months with improved performance on the new platform.
This document outlines several investment opportunities in Russian retail networks, including a stock exchange that handles over $100 million in daily turnover, a retail network called Lider with shops in 10 cities, and two building-material retail networks called Baucenter with shops in 5 cities that cover over 15,000 square meters. It also mentions a shoe retail network called ECCO SHOES with a presence in 60 cities and over 100 shops.
FBScanner: IBSurgeon's tool to solve all types of performance problems with F... | Alexey Kovyazin
FBScanner can be used to profile database applications, monitor user activity, manage database connections (including client disconnection on both Classic and SuperServer architecture). It’s also ideal for troubleshooting INET errors (INET/inet_error: send errno = 10054), as well as auditing existing applications and performance tuning.
Firebird: cost-based optimization and statistics, by Dmitry Yemanov (in English) | Alexey Kovyazin
This document discusses cost-based optimization and statistics in Firebird. It covers:
1) Rule-based optimization uses heuristics while cost-based optimization uses statistical data to estimate the cost of different access paths and choose the most efficient.
2) Statistics like selectivity, cardinality, and histograms help estimate costs by providing information on data distribution and amounts.
3) The optimizer aggregates costs from the bottom up and chooses the access path with the lowest total cost based on the statistical information (a toy cost comparison follows below).
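Here is that toy cost comparison: estimate matching rows from selectivity and cardinality, price each access path, and pick the cheaper one (the numbers and cost formulas are illustrative only):

```python
# Toy cost comparison in the spirit of a cost-based optimizer: estimate
# rows via selectivity, then pick the cheaper access path.
cardinality = 1_000_000          # rows in the table
selectivity = 0.001              # fraction matching the predicate
page_size_rows = 100             # rows per data page

estimated_rows = cardinality * selectivity

full_scan_cost = cardinality / page_size_rows    # read every data page
index_scan_cost = 3 + estimated_rows             # B-tree descent + row fetches

plan = "index scan" if index_scan_cost < full_scan_cost else "full scan"
print(estimated_rows, full_scan_cost, index_scan_cost, plan)
# ~1000 matching rows: 10000 page reads vs ~1003 fetches -> index scan wins
```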
The Firebird DBMS: A Brief Overview, Dmitry Yemanov (in Russian) | Alexey Kovyazin
A short presentation by Dmitry Yemanov, lead Firebird developer, giving an overview of the Firebird DBMS, including its current state and development plans.
Open Source: A View from the Inside, Dmitry Yemanov (The Firebird Project) (in Russian) | Alexey Kovyazin
This presentation by Dmitry Yemanov, lead developer of the Firebird project, is devoted to modern Open Source models: business models and ways of organizing a community. It also describes the place and major milestones in the development of the open-source DBMS Firebird.
Firebird Scalability, by Dmitry Yemanov (in English) | Alexey Kovyazin
In-depth presentation regarding key concepts of Firebird scalability, including SuperServer vs Classic discussion, memory usage for page and sorting buffers, CPU and concurrency, multi-CPU and multi-core, TPC-C figure, etc.
Firebird 2.1 What's New by Vladislav Khorsun (English) | Alexey Kovyazin
Detailed presentation devoted to the new features of Firebird 2.1 by Vladislav Khorsun, core Firebird developer. The main features are covered, including tips, tricks, and common usage scenarios.
Firebird: A Universal DBMS. A Short Presentation at Interop 2008, Dmitry Yem... | Alexey Kovyazin
Dmitry Yemanov gave a short presentation about Firebird at the Interop 2008 conference. The presentation considers Firebird's place among other open-source DBMSs, describes the current state of affairs, and presents plans for the future.
Firebird in 2008: New Features and Plans for Further Development, by Dmitri... | Alexey Kovyazin
A presentation by Dmitry Yemanov, lead Firebird developer, devoted to Firebird development plans for 2008 and 2009. It describes new approaches to the Firebird architecture in versions 2.5 and 3.0, new Firebird features, and the development roadmap.
HCL Notes and Domino license cost reduction in the world of DLAU | panagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and licensing under the CCB and CCX models have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefits it brings you. Above all, you surely want to stay within your budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also practices that can lead to unnecessary expenses, for example using a person document instead of a mail-in for shared mailboxes. We show you such cases and their solutions. And of course we explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It will give you the tools and know-how to keep an overview. You will be able to reduce your costs through an optimized Domino configuration and keep them low going forward.
These topics are covered:
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Practical examples and best practices you can apply immediately
Introduction of Cybersecurity with OSS at Code Europe 2024 | Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
TrustArc Webinar - 2024 Global Privacy Survey | TrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf | Chart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on automated letter generation for Bonterra Impact Management using Google Workspace or Microsoft 365.
Interested in deploying letter generation automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and quality of generated answers
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, a complimentary SAP software asset management tool for customers.
SAM4U, SAP's complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
A Comprehensive Guide to DeFi Development Services in 2024Intelisync
DeFi represents a paradigm shift in the financial industry. Instead of relying on traditional, centralized institutions like banks, DeFi leverages blockchain technology to create a decentralized network of financial services. This means that financial transactions can occur directly between parties, without intermediaries, using smart contracts on platforms like Ethereum.
In 2024, we are witnessing an explosion of new DeFi projects and protocols, each pushing the boundaries of what’s possible in finance.
In summary, DeFi in 2024 is not just a trend; it’s a revolution that democratizes finance, enhances security and transparency, and fosters continuous innovation. As we proceed through this presentation, we'll explore the various components and services of DeFi in detail, shedding light on how they are transforming the financial landscape.
At Intelisync, we specialize in providing comprehensive DeFi development services tailored to meet the unique needs of our clients. From smart contract development to dApp creation and security audits, we ensure that your DeFi project is built with innovation, security, and scalability in mind. Trust Intelisync to guide you through the intricate landscape of decentralized finance and unlock the full potential of blockchain technology.
Ready to take your DeFi project to the next level? Partner with Intelisync for expert DeFi development services today!
2. Agenda
- What is a big Firebird database?
- 1 terabyte (1024 GB) database test
- Real-world examples
- Our statistics on database size growth
- Peculiarities of big database maintenance
- Backup/restore approach
- Health checks for indices and data
- Automated maintenance with FBDataGuard
- Licensing changes in IBSurgeon: no more "Personal"; focus on preventing corruptions, not fixing them
4. What is a big Firebird database? Pre-history
- 2005: IBSurgeon found a 37-gigabytes-per-table limit in Firebird 1.0-1.5 and all InterBase versions
- 2006: Firebird 2.0 fixed that limit and introduced nbackup; AVARDA (ERP) demonstrated working with a 100 GB database
- 2009: IBSurgeon created a 1 TB Firebird database http://www.ib-aid.com/articles/item104
5. What is a big Firebird database? Examples from our customers
- BAS-X, Australia, warehouse & ERP: 200-450 GB, no BLOBs
- Watermark, UK, scanned documents and full-text search: 300+ GB, BLOBs
- Profitmed, Russia, medical distribution company: 50 GB, no BLOBs
- Avarda, Russia, ERP: 100 GB, no BLOBs
- Megafon, Russian cell operator, cell information storage: 50 GB, no BLOBs
9. Zero maintenance? Self-tuned in 90% of cases
- Default firebird.conf is good enough
- Default database parameters (sweep, buffers, etc.) are good enough
- Page size 4096 is OK
- Maintenance is simple: just make a backup/restore
Not true for big databases.
10. Big databases are hard to maintain
- A backup/restore cycle can take hours; the time depends on data density and HDD performance (numeric data restore slowly)
- Bad example: 1 hour to restore a 3.6 GB database; good example: 12 hours to back up a 450 GB database
- Sweep takes from 30 minutes to 4 hours for a 50 GB database
- gfix of a 37 GB database takes 1-2 hours
- Restore of a 50 GB database takes 4 hours (see the sketch below)
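To make those timings concrete, here is a minimal Python sketch that times one gbak backup/restore cycle; the paths and credentials are placeholders, and on a big database each of the two calls can run for hours:

    import subprocess, time

    # Placeholder paths and credentials -- adjust for the real environment.
    DB = "/data/bigdb.fdb"
    BAK = "/backup/bigdb.fbk"
    RESTORED = "/data/bigdb_restored.fdb"
    CREDS = ["-user", "SYSDBA", "-password", "masterkey"]

    def timed(label, cmd):
        start = time.time()
        subprocess.run(cmd, check=True)
        print(f"{label} took {time.time() - start:.0f} s")

    timed("backup", ["gbak", "-b"] + CREDS + [DB, BAK])         # logical backup
    timed("restore", ["gbak", "-c"] + CREDS + [BAK, RESTORED])  # rebuild from scratch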
21. Key difference between small and big databases
- Small database: backup/restore and relax. A backup/restore creates the database from scratch and removes all problems; transaction issues and other problems do not have time to grow between backup/restore cycles
- Big database: needs to be checked and maintained in several steps…
22. A big database requires an individual maintenance plan
- The maintenance plan depends on the size of the database and the work mode (8x5, 24x7)
- The backup scheme is not simple; perform test restores separately
To be checked:
- Errors: watch firebird.log and run error-checking queries on the live database (see the sketch after this list)
- Metadata: check integrity and metadata limits
- Data & BLOBs: walk through the data, check segmentation
- Indices: check index health
- Transactions: any gaps, garbage growth, other problems
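For the error check, a minimal sketch that reports error-like lines appended to firebird.log since the last run; the log and state-file paths are assumptions:

    import os

    # Assumed locations of the server log and a small offset-state file.
    LOG = "/opt/firebird/firebird.log"
    STATE = "/var/lib/fbcheck/offset"

    offset = int(open(STATE).read()) if os.path.exists(STATE) else 0
    with open(LOG, errors="replace") as f:
        f.seek(offset)
        fresh = f.readlines()
        offset = f.tell()

    for line in fresh:
        if any(w in line.lower() for w in ("error", "corrupt", "wrong")):
            print("ALERT:", line.rstrip())

    os.makedirs(os.path.dirname(STATE), exist_ok=True)
    open(STATE, "w").write(str(offset))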
24. Example of a backup plan for a big Firebird database (slide diagram): the main server holding the Firebird database, an nbackup copy, gbak -b, and a checking restore on a separate maintenance server. Each step should be confirmed and reported.
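One plausible reading of that chain, as a hedged sketch with assumed paths and credentials: an nbackup copy taken on the main server, materialized on the maintenance server, then backed up with gbak and test-restored to prove the data is readable.

    import subprocess

    CREDS = ["-user", "SYSDBA", "-password", "masterkey"]  # placeholders

    # Level-0 nbackup: a physical page-level copy taken without stopping the server.
    subprocess.run(["nbackup", "-B", "0", "/data/bigdb.fdb", "/transfer/bigdb.nbk0"], check=True)
    # ...transfer the file to the maintenance server, then materialize it...
    subprocess.run(["nbackup", "-R", "/work/bigdb_copy.fdb", "/transfer/bigdb.nbk0"], check=True)
    # gbak -b plus a test restore proves every record is actually readable.
    subprocess.run(["gbak", "-b"] + CREDS + ["/work/bigdb_copy.fdb", "/backup/bigdb.fbk"], check=True)
    subprocess.run(["gbak", "-c"] + CREDS + ["/backup/bigdb.fbk", "/work/bigdb_check.fdb"], check=True)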
25. What to monitor (1)
- Server and database errors: server and database availability, all changes in firebird.log, metadata checks
- Transactions: transaction marker monitoring (garbage problems); the limit of 2 billion transactions between backup/restore cycles (see the sketch below)
- Users: min/max/avg number of users
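A minimal sketch of marker monitoring, reading the database header with gstat -h; the database path and the alert threshold are assumptions:

    import re, subprocess

    out = subprocess.run(["gstat", "-h", "/data/bigdb.fdb"],
                         capture_output=True, text=True, check=True).stdout
    m = dict(re.findall(r"(Oldest transaction|Next transaction)\s+(\d+)", out))
    gap = int(m["Next transaction"]) - int(m["Oldest transaction"])
    print("oldest-to-next gap:", gap)  # a steadily growing gap means garbage is piling up
    if int(m["Next transaction"]) > 1_500_000_000:
        print("ALERT: nearing the 2-billion transaction limit -- schedule a backup/restore")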
26. What to monitor (2)
- Database files: single-volume and multi-volume; paths (not on the same drive as temp files and backups!); sizes and growth limits
- Delta files (nbackup): lifetime and sizes (see the sketch below)
- Backup files: existence, sizes, and growth limits
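A minimal sketch of the delta-file check: a delta that lives too long or grows too large usually means an nbackup lock was never released. The default <database>.delta naming and the thresholds are assumptions:

    import os, time

    DELTA = "/data/bigdb.fdb.delta"  # assumed delta name and path
    if os.path.exists(DELTA):
        st = os.stat(DELTA)
        size_mb = st.st_size / 2**20
        age_h = (time.time() - st.st_mtime) / 3600
        if size_mb > 1024 or age_h > 24:
            print(f"ALERT: delta is {size_mb:.0f} MB and {age_h:.0f} h old; "
                  "merge it back with 'nbackup -N <database>'")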
27. What to monitor (3)
- Firebird temp files: overall size and quantity
- Number of formats per table: no more than 255; keep fewer formats in production (see the query below)
- Non-activated and deactivated indices: deactivated means explicitly deactivated (why was it deactivated?); non-activated indicates problems during restore
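The format count can be read from the system tables. A sketch using the fdb Python driver; the connection details are placeholders:

    import fdb  # Python driver for Firebird

    con = fdb.connect(dsn="localhost:/data/bigdb.fdb", user="SYSDBA", password="masterkey")
    cur = con.cursor()
    # Every ALTER TABLE adds a record format; only a backup/restore resets the counter.
    cur.execute("""
        SELECT r.RDB$RELATION_NAME, COUNT(*)
        FROM RDB$FORMATS f
        JOIN RDB$RELATIONS r ON r.RDB$RELATION_ID = f.RDB$RELATION_ID
        GROUP BY r.RDB$RELATION_NAME
        HAVING COUNT(*) > 200
    """)
    for name, n in cur:
        print(f"WARNING: {name.strip()} has {n} of the 255 allowed formats")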
28. What to monitor (4)
- Periodic statistics (gstat)
- Firebird server version: the latest patches are recommended
- Firebird installation size
- Firebird log sizes and paths
29. Maintenance (1): Backups
- Revolver backups (daily, weekly, monthly copies) and backup depth (see the sketch below)
- Checking restore (the results need to be verified)
- Growth prognosis (if there is not enough space, the backup should be canceled)
- Control backup time (a backup that takes too long indicates problems)
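A sketch of such a revolver rotation; the directory layout and slot scheme are assumptions:

    import datetime, shutil

    # Assumed layout: a fresh backup at /backup/bigdb.fbk, rotated into
    # 7 daily slots plus weekly and monthly copies.
    today = datetime.date.today()
    src = "/backup/bigdb.fbk"
    shutil.copy(src, f"/backup/daily/bigdb-{today:%a}.fbk")        # Mon..Sun slots
    if today.weekday() == 6:                                       # Sundays
        shutil.copy(src, f"/backup/weekly/bigdb-w{today:%W}.fbk")
    if today.day == 1:                                             # 1st of the month
        shutil.copy(src, f"/backup/monthly/bigdb-{today:%Y-%m}.fbk")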
30. Maintenance (2): Indices
- Recalculate index statistics, for selected indices or all except excluded ones (see the sketch below)
- Check index status: active/inactive/non-activated
- Check physical index health
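Statistics recalculation is plain DDL (SET STATISTICS INDEX). A sketch over all user indices via the fdb driver; the connection details are placeholders:

    import fdb

    con = fdb.connect(dsn="localhost:/data/bigdb.fdb", user="SYSDBA", password="masterkey")
    cur = con.cursor()
    cur.execute("SELECT RDB$INDEX_NAME FROM RDB$INDICES "
                "WHERE COALESCE(RDB$SYSTEM_FLAG, 0) = 0")
    for (name,) in cur.fetchall():
        # Recalculates selectivity in place; the index itself is not rebuilt.
        con.cursor().execute(f'SET STATISTICS INDEX "{name.strip()}"')
        con.commit()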
31. Maintenance (3)
- Validate the database with gfix; don't forget to shut the database down first (see the sketch below)
- Analysis (including firebird.log)
- Metadata validation: check important system tables
- firebird.log maintenance: when the log becomes very big, copy it to backup log files
And some more things…
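A sketch of the validation step with gfix; the path and credentials are placeholders:

    import subprocess

    DB = "/data/bigdb.fdb"
    CREDS = ["-user", "SYSDBA", "-password", "masterkey"]

    subprocess.run(["gfix", "-shut", "-force", "0", DB] + CREDS, check=True)  # no writers
    subprocess.run(["gfix", "-v", "-full", DB] + CREDS, check=True)           # walk all pages
    subprocess.run(["gfix", "-online", DB] + CREDS, check=True)               # back online
    # Validation findings are also written to firebird.log, so check it afterwards.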
36. FBDataGuard does all the things above…
- Watches database files, volumes, and deltas; performs and checks backups in the right way
- Verifies metadata, data, and indices
- Watches for errors, limits, and wrong versions
- Sends alerts and recommendations
37. And even more: it protects from hardware failures. FBDataGuard Extractor extracts data from a corrupted database (metadata, table data, BLOBs) and inserts it into a new database.
38. Firebird DataGuard
- Watches 26 important database and server parameters
- Alerts for potential and real problems by email
- Proper automation of database maintenance
- Windows, Linux, MacOS; Firebird 1.5-2.1
- Special licensing for ISVs (Independent Software Vendors) and Firebird developers
40. Why change?
- Prevent corruptions instead of fixing them: it's time to fight causes, not consequences
- Wrong license understanding: no more "Personal"
- Software-as-a-Service offer: 1 month and 1 instance