1. MySQL vs. MonetDB
A benchmark comparison between in-memory and out-of-memory databases
Derek Aikins
Advisor: Dr. Feng Yu
2. Overview
• History of MySQL
• What is a relational database
• History of MonetDB
• What is a column-store database
• Relational vs. column-store database operations
• Goal of Project
• TPC-H
• Installing and compiling TPC-H
• Generating data using TPC-H
• Generating queries using TPC-H
• Queries
• Results of Tests
• Problems Encountered
• Conclusion
3. History of MySQL
• The world’s most popular open-source relational database.
• Leading database choice for web-based applications; used by high-profile web properties including Facebook, Twitter, and YouTube.
• Created by a Swedish company, MySQL AB; originally developed by David Axmark and Michael Widenius in 1994.
• First version released on May 23, 1995.
• MySQL AB was acquired by Sun Microsystems in 2008.
• Oracle acquired Sun Microsystems on January 27, 2010.
4. What is a Relational Database?
• MySQL is a relational database.
• A relational database is a digital database that organizes data into one or more tables of columns and rows.
• Tables are known as relations.
• Each table represents one “entity type,” such as customer or product.
• Rows (records) represent instances of that entity type, such as “Lee” or “chair.”
• Columns represent attribute values belonging to that instance, such as address or price.
5. Examples of a database
Region Table
R_REGIONKEY | R_NAME      | R_COMMENT
0           | AFRICA      | lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are
1           | AMERICA     | hs use ironic, even requests. s
2           | ASIA        | ges. thinly even pinto beans ca
3           | EUROPE      | ly final courts cajole furiously final excuse
4           | MIDDLE EAST | uickly special accounts cajole carefully blithely close requests. carefully final asymptotes

Nation Table
N_NATIONKEY | N_NAME    | N_REGIONKEY | N_COMMENT
0           | ALGERIA   | 0           | haggle. carefully final
1           | ARGENTINA | 1           | al foxes promise slyly
2           | BRAZIL    | 1           | y alongside of the pending
3           | CANADA    | 1           | eas hang ironic,
4           | EGYPT     | 4           | y above the carefully

(N_REGIONKEY in the Nation table is a foreign key referencing R_REGIONKEY in the Region table.)
6. History of MonetDB
• An open-source column-store database
• Developed in the Netherlands at the Centrum Wiskunde & Informatica (CWI)
• A data-mining project in the 1990s required improved database support, which resulted in a CWI spin-off called Data Distilleries that used early MonetDB implementations in its analytical suite
• Data Distilleries became a subsidiary of SPSS in 2003, which was in turn acquired by IBM in 2009
• MonetDB in its current form was first created by Peter A. Boncz and Martin L. Kersten at the University of Amsterdam
• For more details, see Dr. Boncz’s thesis: Monet: A Next-Generation DBMS Kernel for Query-Intensive Applications
• The first version was released on September 30, 2004
7. What is a Column-Store Database?
• Column-store databases store data as columns rather than rows
• By storing data in columns, the database can read only the columns a query actually needs, instead of scanning whole rows and discarding the unwanted fields
• Query performance is often much better as a result, particularly on very large data sets
• Column stores such as MonetDB are also designed to operate on data in main memory (RAM) wherever possible, whereas traditional row stores like MySQL are built around disk-resident storage (see the toy sketch below)
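To make the access pattern concrete, here is a toy shell illustration (not from the deck) using two hypothetical files: customers_rows.tbl, a pipe-delimited row file with income as a plain number in field 10, and income.col, a file holding only the income column:

    # row layout: summing one attribute still forces a scan over every full row
    awk -F'|' '{ s += $10 } END { print s }' customers_rows.tbl

    # column layout: the income column is stored by itself, so only it is read
    awk '{ s += $1 } END { print s }' income.col

Both commands compute the same sum; the column file simply contains far fewer bytes to scan, which is the core of the column-store advantage described above.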
8. Relational vs. column-store database operation
Cust_ID | Name | Address | City | State | Zip Code | Area Code | Phone # | Rent/Own | Annual Income
1 | Jack | 12 A St. | Howland | OH | 44481 | 330 | 369-3597 | Rent | 74,000
2 | Brian | 13 B St. | Howland | OH | 44481 | 330 | 856-1534 | Rent | 58,000
3 | Mike | 8 K St. | Warren | OH | 44483 | 330 | 373-1215 | Own | 92,000
4 | Anna | 62 Main St. | Sharon | PA | 16101 | 724 | 654-0893 | Own | 110,000
5 | Tasha | 546 1st St. | Stow | OH | 44752 | 216 | 849-5775 | Rent | 52,000
6 | Sidney | 84 Third St. | Gilbert | AZ | 76534 | 480 | 758-6549 | Own | 90,000
7 | Tyler | 846 Wick Rd. | Las Vegas | NV | 65487 | 231 | 654-5473 | Own | 60,000
8 | Aaron | 213 Maple St. | Daytona | FL | 32547 | 519 | 159-3425 | Rent | 66,000
9 | Beth | 8749 Trump St. | Detroit | MI | 87945 | 375 | 325-1849 | Own | 50,000
9. Goal of this Project
• Take a standard dataset and a standard set of queries and run the same tests on two different databases, MySQL and MonetDB
• By doing so, I intend to demonstrate the efficiency and speed that a column-store database offers over a traditional relational database
• To do this I will be using TPC-H, a database benchmark whose data generator can also generate the queries for the data
• Then I will load all the data generated by TPC-H into both databases and run each query multiple times to get average times on both databases (a sketch of the timing loop follows this list)
• After all runs are complete, I will gather the results and compare how the two databases performed
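A minimal sketch of the timing loop this implies, assuming each generated query was saved to a file such as q1.sql (a hypothetical name) and timed with the shell’s time builtin:

    # run one query three times against the tpch database, keeping wall-clock time;
    # averages were then taken over the three runs
    for i in 1 2 3; do
        ( time mysql -u root tpch < q1.sql > /dev/null ) 2>&1 | grep real
    done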
10. TPC-H
• A decision support benchmark
• Consists of a suite of business-oriented ad hoc queries and concurrent data modifications
• The queries and the data populating the database have been chosen to have broad industry-wide relevance
• This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions (a sketch of generating these queries with qgen follows)
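The overview lists generating queries with TPC-H, though the deck does not show that step. A hedged sketch of how it is typically done: qgen is built alongside dbgen by the make step on the next slide, and DSS_QUERY points at the kit’s query template directory (the paths here are assumptions):

    cd ~/Downloads/tpch_2_16_0/tpch_2_15_0/dbgen
    # emit query template 2 with generated parameter values substituted
    DSS_QUERY=queries ./qgen 2 > q2.sql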
11. Installing and Compiling TPC-H
• The program used to generate the TPC-H data is called dbgen
• To install dbgen, I first downloaded the kit from the TPC-H site and changed into its dbgen directory: cd Downloads/tpch_2_16_0/tpch_2_15_0/dbgen/
• Then I had to create a makefile and change its lines to read: CC = gcc, DATABASE = SQLSERVER, MACHINE = LINUX, WORKLOAD = TPCH
• Next, in the dbgen folder I had to find the tpcd.h file and edit these lines: #define START_TRAN "BEGIN WORK;" and #define END_TRAN "COMMIT WORK;"
• Then I ran the make command (a consolidated sketch of these steps follows)
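A consolidated sketch of the build steps above; copying makefile.suite to makefile is an assumption, since the slide only says a makefile was created:

    cd ~/Downloads/tpch_2_16_0/tpch_2_15_0/dbgen
    cp makefile.suite makefile   # then edit: CC=gcc, DATABASE=SQLSERVER, MACHINE=LINUX, WORKLOAD=TPCH
    # in tpcd.h, make sure the transaction markers read:
    #   #define START_TRAN "BEGIN WORK;"
    #   #define END_TRAN   "COMMIT WORK;"
    make                         # builds dbgen (and qgen)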
12. Generating the data (100 MB)
• After installing and setting up TPC-H, I generated the data using dbgen
• I used the command ./dbgen -s 0.1, where the 0.1 is the scale factor that dictates how much data is generated; in this case roughly 100 MB
• Once the data was generated, I created the database in MySQL with the CREATE DATABASE tpch; command and then selected that database to load the data into its tables
• I then created each table with the CREATE TABLE command and set the definitions for each column
• Once the tables were created, I loaded the data into each table using LOAD DATA LOCAL INFILE 'customer.tbl' INTO TABLE CUSTOMER FIELDS TERMINATED BY '|'; changing the file and table name for each table (a consolidated sketch of these steps follows)
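A hedged sketch of the generate-and-load flow above. The deck shows only the MySQL side; the mclient line is an assumption based on MonetDB’s COPY INTO syntax, and the user name and paths are hypothetical (LOAD DATA LOCAL also requires local_infile to be enabled on the server):

    ./dbgen -s 0.1                        # scale factor 0.1: roughly 100 MB of .tbl files
    mysql -u root -e "CREATE DATABASE tpch;"
    # one load per generated file (CUSTOMER shown); tables assumed already created
    mysql -u root --local-infile=1 tpch \
        -e "LOAD DATA LOCAL INFILE 'customer.tbl' INTO TABLE CUSTOMER FIELDS TERMINATED BY '|';"
    # MonetDB equivalent (assumed): COPY INTO needs an absolute path
    mclient -d tpch -s "COPY INTO customer FROM '/full/path/customer.tbl' USING DELIMITERS '|';"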
13. Query 1
select
    s_acctbal,
    s_name,
    n_name,
    p_partkey,
    p_mfgr,
    s_address,
    s_phone,
    s_comment
from
    part,
    supplier,
    partsupp,
    nation,
    region
where
    p_partkey = ps_partkey
    and s_suppkey = ps_suppkey
    and p_size = 19
    and p_type like 'PROMO ANODIZED BRASS'
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'ASIA'
    and ps_supplycost = (
        select
            min(ps_supplycost)
        from
            partsupp,
            supplier,
            nation,
            region
        where
            p_partkey = ps_partkey
            and s_suppkey = ps_suppkey
            and s_nationkey = n_nationkey
            and n_regionkey = r_regionkey
            and r_name = 'ASIA'
    )
order by
    s_acctbal desc,
    n_name,
    s_name,
    p_partkey;
14. Query 2
select
    l_orderkey,
    sum(l_extendedprice * (1 - l_discount)) as revenue,
    o_orderdate,
    o_shippriority
from
    customer,
    orders,
    lineitem
where
    c_mktsegment = 'AUTOMOBILE'
    and c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate < date '1995-08-15'
    and l_shipdate > date '1995-08-27'
group by
    l_orderkey,
    o_orderdate,
    o_shippriority
order by
    revenue desc,
    o_orderdate;
15. Query 3
select
    o_orderpriority,
    count(*) as order_count
from
    orders
where
    o_orderdate >= date '1996-10-29'
    and o_orderdate < date '1996-10-29' + interval '3' month
    and exists (
        select
            *
        from
            lineitem
        where
            l_orderkey = o_orderkey
            and l_commitdate < l_receiptdate
    )
group by
    o_orderpriority
order by
    o_orderpriority;
16. Query 4
select
sum(l_extendedprice * l_discount) as revenue
from
lineitem
where
l_shipdate >= date '1993-03-05'
and l_shipdate < date '1993-03-05' + interval '1' year
and l_discount between .03 and .06
and l_quantity < 3;
17. Query 5
select
sum(l_extendedprice) / 7.0 as avg_yearly
from
lineitem,
part
where
p_partkey = l_partkey
and p_brand = 'Brand#11'
and p_container = 'MED JAR'
and l_quantity < (
select
0.2 * avg(l_quantity)
from
lineitem
where
l_partkey = p_partkey
);
18. Results for Queries 1 & 2
[Bar charts: average query time in seconds per engine]
Query 1: MySQL (InnoDB) 26.157 s; MySQL (MyISAM) 25.977 s; MonetDB 0.0121 s
Query 2: MySQL (InnoDB) 645.317 s; MySQL (MyISAM) 652.297 s; MonetDB 0.0202 s
19. Results for Queries 3 & 4
[Bar charts: average query time in seconds per engine]
Query 3: MySQL (InnoDB) 853.25 s; MySQL (MyISAM) 2086.01 s; MonetDB 0.0147 s
Query 4: MySQL (InnoDB) 0.483 s; MySQL (MyISAM) 0.293 s; MonetDB 0.0113 s
20. Results for Query 5
[Bar chart: average query time in seconds per engine]
Query 5: MySQL (InnoDB) 137.153 s; MySQL (MyISAM) 337.277 s; MonetDB 0.0188 s
21. Total Numerical Results
Query 1 (times in seconds) | MySQL (InnoDB) | MySQL (MyISAM) | MonetDB
run 1 | 26.14 | 25.83 | 0.012854 (12.854 ms)
run 2 | 26.20 | 26.06 | 0.012141 (12.141 ms)
run 3 | 26.13 | 26.04 | 0.011199 (11.199 ms)
Average | 26.15666667 | 25.977 | 0.012064667
22. Total Numerical Results
Query 2 (times in seconds) | MySQL (InnoDB) | MySQL (MyISAM) | MonetDB
run 1 | 642.03 (10 min 42.03 sec) | 652.36 (10 min 52.36 sec) | 0.023548 (23.548 ms)
run 2 | 646.84 (10 min 46.84 sec) | 652.56 (10 min 52.56 sec) | 0.016765 (16.765 ms)
run 3 | 647.08 (10 min 47.08 sec) | 651.97 (10 min 51.97 sec) | 0.024834 (24.834 ms)
Average | 645.3166667 | 652.2966667 | 0.0201565

Query 3 (times in seconds) | MySQL (InnoDB) | MySQL (MyISAM) | MonetDB
run 1 | 840.65 (14 min 0.65 sec) | 2153.17 (35 min 53.17 sec) | 0.015332 (15.332 ms)
run 2 | 852.67 (14 min 12.67 sec) | 2051.52 (34 min 11.52 sec) | 0.015168 (15.168 ms)
run 3 | 866.43 (14 min 26.43 sec) | 2053.34 (34 min 13.34 sec) | 0.013698 (13.698 ms)
Average | 853.25 | 2086.01 | 0.014732667
23. Total Numerical Results
Query 4 (times in seconds) | MySQL (InnoDB) | MySQL (MyISAM) | MonetDB
run 1 | 0.48 | 0.29 | 0.012992 (12.992 ms)
run 2 | 0.49 | 0.30 | 0.010641 (10.641 ms)
run 3 | 0.48 | 0.29 | 0.010308 (10.308 ms)
Average | 0.483333333 | 0.293333333 | 0.011313667

Query 5 (times in seconds) | MySQL (InnoDB) | MySQL (MyISAM) | MonetDB
run 1 | 138.08 (2 min 18.08 sec) | 336.59 (5 min 36.59 sec) | 0.018776 (18.776 ms)
run 2 | 139.34 (2 min 19.34 sec) | 339.10 (5 min 39.10 sec) | 0.021746 (21.746 ms)
run 3 | 134.04 (2 min 14.04 sec) | 336.14 (5 min 36.14 sec) | 0.015775 (15.775 ms)
Average | 137.1533333 | 337.2766667 | 0.018765667
24. Comparison Results (average times in seconds)
Query | MySQL (InnoDB) | MySQL (MyISAM) | MonetDB | MyISAM vs. InnoDB | MonetDB vs. InnoDB | MonetDB vs. MyISAM
Query 1 | 26.157 | 25.977 | 0.0121 | 0.69% faster | 2,168× faster | 2,153× faster
Query 2 | 645.317 | 652.297 | 0.0201 | 1.08% slower | 32,015× faster | 32,361× faster
Query 3 | 853.25 | 2086.01 | 0.0147 | 2.445× slower | 57,915× faster | 141,591× faster
Query 4 | 0.483 | 0.293 | 0.0113 | 1.648× faster | 43× faster | 26× faster
Query 5 | 137.153 | 337.277 | 0.0188 | 2.459× slower | 7,309× faster | 17,973× faster
25. Challenges Encountered
• Throughout this project I encountered several challenges:
• The first difficulty was installing the several programs used for this project
• Once all programs were installed, the next challenge was loading the data into the databases
• After all data was loaded into the database tables, one of the largest challenges was examining each query and filling in the placeholders that needed exact values from the tables before the query would even run
• The largest challenge I faced through this entire project was learning to use the command line for everything, as I had not had much experience with it
26. Summary
• A relational database is a digital database that organizes data into one
or more tables of columns and rows.
• Column-store databases store data as columns rather than rows
• TPC-H is a decision support benchmark that examines large volumes of data, executes queries with a high degree of complexity, and gives answers to critical business questions
• As the data from the tests conducted on the two databases shows, column-store databases such as MonetDB are considerably faster in run time than traditional relational databases such as MySQL
27. All results and test queries can be found at:
https://github.com/Djaikins/MySQL-vs-Monetdb
or by searching djaikins on GitHub