New Features
● Developer and SQL Features
● DBA and Administration
● Replication
● Performance
By Amit Kapila at India PostgreSQL UserGroup Meetup, Bangalore at InMobi.
http://technology.inmobi.com/events/india-postgresql-usergroup-meetup-bangalore
This document summarizes a talk on optimizer hints in databases. It begins with introducing the speaker and their background. It then covers the basics of query optimization in databases and how hints can provide additional information to the optimizer. Specifically, it discusses query hints to force a plan, statistics hints to provide join selectivity, and data hints about column dependencies. It notes that PostgreSQL does not support hints directly but similar control is possible through configuration parameters. It concludes by listing some drawbacks of hints.
Toro DB - Open-source, MongoDB-compatible database, built on top of PostgreSQL (InMobi Technology)
By Álvaro Hernández at India PostgreSQL UserGroup Meetup, Bangalore at InMobi.
http://technology.inmobi.com/events/india-postgresql-usergroup-meetup-bangalore
This document summarizes Spark as a service on YARN clusters and discusses key features:
- Spark on YARN allows running multiple workloads, such as Spark and Hadoop jobs, on the same cluster and improves resource utilization. The application master can dynamically request more containers as needed.
- Qubole YARN clusters support autoscaling to upscale and downscale based on load and use spot instances for cost savings.
- Spark applications were limited by initial resource allocation. Dynamic provisioning allows applications to request more executors or release unused executors to improve performance and cluster utilization.
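As a rough illustration of the dynamic-provisioning idea, the sketch below computes how many executors an application might ask for, given its pending and running tasks, and whether it should request more or release some. It is a simplified, hypothetical policy (the function names and the one-shot sizing are assumptions); Spark's real allocator ramps requests up gradually and is bounded by settings such as `spark.dynamicAllocation.minExecutors` and `spark.dynamicAllocation.maxExecutors`.

```python
import math


def target_executors(pending_tasks, running_tasks, tasks_per_executor,
                     min_executors, max_executors):
    """Executors a simplified, hypothetical dynamic-allocation policy would want.

    Assumes each executor runs `tasks_per_executor` tasks at once and that the
    application wants enough executors to run all pending and running tasks.
    """
    needed = math.ceil((pending_tasks + running_tasks) / tasks_per_executor)
    # Clamp to the configured bounds so the app neither starves nor hogs the cluster.
    return max(min_executors, min(max_executors, needed))


def plan_change(current_executors, target):
    """Positive result: executors to request; negative: idle executors to release."""
    return target - current_executors
```

For example, `target_executors(100, 20, 4, 2, 20)` has enough work for 30 executors but is clamped to the maximum of 20, while an almost idle application falls back to the minimum.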
Agenda
• Technical cases in PostgreSQL
• Database Monitoring Methods
By Rohit Vyas at India PostgreSQL UserGroup Meetup, Bangalore at InMobi.
http://technology.inmobi.com/events/india-postgresql-usergroup-meetup-bangalore
PostgreSQL performance improvements in 9.5 and 9.6 (Tomas Vondra)
The document summarizes performance improvements in PostgreSQL versions 9.5 and 9.6. Some key improvements discussed include optimizations to sorting, hash joins, BRIN indexes, parallel query processing, aggregate functions, checkpoints, and freezing. Performance tests on sorting, hash joins, and parallel queries show significant speedups from these changes, such as faster sorting times and better scalability with parallel queries.
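To make the BRIN idea concrete: a BRIN index keeps only a small summary per range of `pages_per_range` consecutive heap blocks (for the minmax operator classes, the minimum and maximum value), and a scan skips every range whose summary cannot match the predicate. The Python below is a toy, in-memory model of that pruning step under those assumptions; the helper names are hypothetical and it is not PostgreSQL's implementation.

```python
from dataclasses import dataclass


@dataclass
class BlockRangeSummary:
    first_block: int
    last_block: int
    min_value: int
    max_value: int


def build_brin(blocks, pages_per_range):
    """Summarize consecutive blocks into (min, max) entries, BRIN-style.

    `blocks` is a list whose elements are the column values stored in each
    block, a toy stand-in for heap pages.
    """
    summaries = []
    for start in range(0, len(blocks), pages_per_range):
        chunk = blocks[start:start + pages_per_range]
        values = [v for block in chunk for v in block]
        if values:  # empty ranges hold no rows, so they need no summary
            summaries.append(BlockRangeSummary(
                start, start + len(chunk) - 1, min(values), max(values)))
    return summaries


def candidate_blocks(summaries, lo, hi):
    """Blocks whose range might hold values in [lo, hi].

    Ranges whose [min, max] cannot overlap the predicate are skipped entirely;
    the surviving blocks still have to be rechecked row by row.
    """
    result = []
    for s in summaries:
        if s.max_value >= lo and s.min_value <= hi:
            result.extend(range(s.first_block, s.last_block + 1))
    return result
```

The summaries are tiny compared with a B-tree, which is why BRIN works best when values correlate with physical order, as in append-only, time-ordered tables.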
Performance improvements in PostgreSQL 9.5 and beyond (Tomas Vondra)
This document discusses several performance improvements made in PostgreSQL versions 9.5 and beyond. Some key improvements discussed include:
- Faster sorting through allowing sorting by inlined functions, abbreviated keys for VARCHAR/TEXT/NUMERIC, and Sort Support benefits.
- Improved hash joins through reduced palloc overhead, smaller NTUP_PER_BUCKET, and dynamically resizing the hash table.
- Index improvements like avoiding index tuple copying, GiST and bitmap index scan optimizations, and block range tracking in BRIN indexes.
- Aggregate functions see speedups through using 128-bit integers for internal state instead of NUMERIC in some cases.
- Other optimizations affect PL/pgSQL performance,
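The hash-join changes listed above (a smaller `NTUP_PER_BUCKET` and a hash table that is resized at run time) can be pictured with a toy chained hash table that doubles its bucket count whenever the average chain length would exceed `NTUP_PER_BUCKET`. This is a hypothetical Python sketch, not the C code in PostgreSQL's executor; the starting size of 4 buckets and the value `NTUP_PER_BUCKET = 1` are assumptions chosen for illustration.

```python
NTUP_PER_BUCKET = 1  # target average chain length (assumed value for this sketch)


class HashTable:
    def __init__(self, nbuckets=4):
        self.buckets = [[] for _ in range(nbuckets)]
        self.ntuples = 0

    def _bucket(self, key):
        return hash(key) % len(self.buckets)

    def insert(self, key, row):
        self.buckets[self._bucket(key)].append((key, row))
        self.ntuples += 1
        # Grow as soon as chains would get longer than NTUP_PER_BUCKET on average.
        if self.ntuples > NTUP_PER_BUCKET * len(self.buckets):
            self._resize(len(self.buckets) * 2)

    def _resize(self, nbuckets):
        old = self.buckets
        self.buckets = [[] for _ in range(nbuckets)]
        for chain in old:
            for key, row in chain:
                self.buckets[self._bucket(key)].append((key, row))

    def probe(self, key):
        # Short chains mean few key comparisons per outer row.
        return [row for k, row in self.buckets[self._bucket(key)] if k == key]


def hash_join(inner, outer, inner_key, outer_key):
    """Build a hash table on `inner`, then probe it once per `outer` row."""
    table = HashTable()
    for row in inner:
        table.insert(inner_key(row), row)
    return [(o, i) for o in outer for i in table.probe(outer_key(o))]
```

Growing the table as tuples arrive keeps chains short even when the planner underestimates the inner relation, at the cost of occasionally rehashing.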
PostgreSQL is an open-source, full-featured relational database. This presentation gives an overview of the Postgres 11 release.
Creative Commons Attribution License http://momjian.us/presentations
Last updated: September, 2018
Percona XtraDB Cluster (PXC) non-blocking operations, what you need to know t... (Marco Tusa)
Performing simple DDL operations such as ADD/DROP INDEX in a tightly coupled cluster like PXC can become a nightmare. The metadata lock prevents data modifications for long periods of time, and to get around it we need to be creative, for example by using Rolling Schema Upgrade or Percona's pt-online-schema-change. With NBO (non-blocking operations), we can avoid such contortions, at least for a simple operation like adding an index. In this brief talk I will show the negative effect of not using NBO, how to use it correctly, and what to expect from it.
Devrim Gunduz gives a presentation on Write-Ahead Logging (WAL) in PostgreSQL. WAL logs all transactions to files called write-ahead logs (WAL files) before changes are written to data files. This allows for crash recovery by replaying WAL files. WAL files are used for replication, backup, and point-in-time recovery (PITR) by replaying WAL files to restore the database to a previous state. Checkpoints write all dirty shared buffers to disk and update the pg_control file with the checkpoint location.
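A toy model can make the write-ahead rule and checkpointing concrete: every change is appended to the log before the in-memory page is touched, a checkpoint flushes dirty buffers and records how far the log is covered (the role `pg_control` plays), and recovery replays only the records after that point. The class below is a hypothetical Python sketch under simplifying assumptions (a log entry is just `(lsn, page, value)`, the log is assumed durable, and redo starts exactly at the checkpoint); it is not how PostgreSQL stores or replays WAL.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDatabase:
    wal: list = field(default_factory=list)         # durable log of (lsn, page, value)
    data_files: dict = field(default_factory=dict)  # what has reached "disk"
    buffers: dict = field(default_factory=dict)     # dirty pages held only in memory
    checkpoint_lsn: int = 0                         # stand-in for pg_control's checkpoint location

    def write(self, page, value):
        lsn = len(self.wal) + 1
        self.wal.append((lsn, page, value))  # 1. log the change first
        self.buffers[page] = value           # 2. then modify the page in memory only

    def checkpoint(self):
        self.data_files.update(self.buffers)  # flush every dirty buffer
        self.buffers.clear()
        self.checkpoint_lsn = len(self.wal)   # remember where replay may start

    def crash_and_recover(self):
        self.buffers.clear()  # memory contents are lost in a crash
        for lsn, page, value in self.wal:
            if lsn > self.checkpoint_lsn:  # replay only what the checkpoint did not cover
                self.data_files[page] = value
```

Because the log is written first, a crash between a change and the next checkpoint loses nothing: replay reapplies the logged values to the data files.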
In 40 minutes, the audience will learn a variety of ways to make a PostgreSQL database suddenly run out of memory on a box with half a terabyte of RAM.
Best practices for developers and DBAs to prevent this will also be discussed, along with a bit of Postgres and Linux memory-management internals.
PostgreSQL Performance Tables Partitioning vs. Aggregated Data Tables (Sperasoft)
Table partitioning and aggregated data tables (such as materialized views) are two approaches to improve PostgreSQL database performance as data volumes grow large over time. Table partitioning involves splitting a large table into multiple smaller tables (partitions) based on a partition function and key, while aggregated data tables pre-compute query results to avoid repeated computation. Both can improve query performance but come with caveats such as increased planning time for partitions or expensive refresh costs for materialized views. The best approach depends on each unique situation and data access patterns.
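The partitioning half of this comparison rests on two mechanics: routing each row to the partition whose key range contains it, and pruning partitions that a query's range cannot touch. The sketch below models monthly range partitions in plain Python; the table names and bounds are hypothetical, the last partition is assumed to be open-ended, and real PostgreSQL partitions are declared with DDL rather than computed like this.

```python
import bisect
from datetime import date

# Hypothetical monthly partitions: each lower bound starts a new partition,
# and the last partition has no upper bound.
PARTITION_BOUNDS = [date(2018, 1, 1), date(2018, 2, 1), date(2018, 3, 1)]
PARTITION_NAMES = ["events_2018_01", "events_2018_02", "events_2018_03"]


def route(key):
    """Return the partition that should hold a row with this partition key."""
    idx = bisect.bisect_right(PARTITION_BOUNDS, key) - 1
    if idx < 0:
        raise ValueError(f"no partition for {key}")
    return PARTITION_NAMES[idx]


def prune(lo, hi):
    """Partitions a query on lo <= key < hi must scan; the rest are skipped."""
    keep = []
    for i, start in enumerate(PARTITION_BOUNDS):
        end = PARTITION_BOUNDS[i + 1] if i + 1 < len(PARTITION_BOUNDS) else None
        if start < hi and (end is None or lo < end):
            keep.append(PARTITION_NAMES[i])
    return keep
```

Pruning is where partitioning pays off for range queries; the extra planning time mentioned above comes from the planner having to consider every partition's bounds.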
PostgreSQL 9.4, 9.5 and Beyond @ COSCUP 2015 Taipei (Satoshi Nagayasu)
The document provides an overview of new features in PostgreSQL versions 9.4 and 9.5, including improvements to NoSQL support with JSONB and GIN indexes, analytics functions like aggregation and materialized views, SQL features like UPSERT, security with row level access policies, replication capabilities using logical decoding, and infrastructure to support parallelization. It also outlines the status and changes between versions, and resources for using and learning about PostgreSQL.
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015 (PostgreSQL-Consulting)
This document discusses how PostgreSQL works with disks and provides recommendations for disk subsystem monitoring, hardware selection, and configuration tuning to optimize performance. It explains that PostgreSQL relies on disk I/O for reading pages, writing the write-ahead log (WAL), and checkpointing. It recommends monitoring disk utilization, IOPS, latency, and I/O wait. The document also provides tips for choosing hardware like SSDs or RAID configurations and configuring the operating system, file systems, and PostgreSQL to improve performance.
The document discusses PostgreSQL's write-ahead log (WAL), which records database changes before the modified data pages are written to disk, for crash safety. The WAL enables online backups by archiving WAL records, point-in-time recovery by restoring a backup and replaying WAL, and replication by transmitting WAL to standby servers. It works by writing each change as a WAL record before updating data pages, and by replaying the log during recovery to redo changes that had not yet reached the data files when the server crashed.
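Point-in-time recovery, as described here, amounts to restoring a base backup and replaying archived WAL only up to a chosen moment. A minimal sketch of that loop, assuming each archived record is a hypothetical `(lsn, timestamp, key, value)` tuple, records are in LSN order, and timestamps never decrease; PostgreSQL itself expresses the stopping point with settings such as `recovery_target_time`.

```python
def point_in_time_restore(base_backup, archived_wal, backup_lsn, target_time):
    """Rebuild state as of `target_time` from a base backup plus archived WAL.

    Hypothetical model: `base_backup` is a dict snapshot taken at `backup_lsn`,
    and `archived_wal` is a list of (lsn, timestamp, key, value) in LSN order.
    """
    state = dict(base_backup)  # start from a copy of the restored backup
    for lsn, timestamp, key, value in archived_wal:
        if lsn <= backup_lsn:
            continue  # already reflected in the base backup
        if timestamp > target_time:
            break  # stop at the recovery target; later changes are discarded
        state[key] = value
    return state
```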
In-core compression: how to shrink your database size in several times (Aleksander Alekseev)
The document discusses techniques for compressing database size in Postgres, including:
1. In-core block-level compression, a feature of Postgres Pro Enterprise Edition, can shrink database size several times.
2. The ZSON extension provides transparent JSONB compression by replacing common strings with 16-bit codes and compressing the data.
3. Various schema optimizations like proper data types, column ordering, and packing data can reduce size by improving storage layout and enabling TOAST compression.
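The dictionary idea behind ZSON can be sketched as follows: learn which strings recur across documents, give each one a small integer code (anything below 65,536 fits in 16 bits), and substitute codes for those strings when storing. The Python below shows only that substitution step on plain dicts and lists; the function names and the `('#', code)` marker are hypothetical, and ZSON itself works on binary JSONB and additionally compresses the result.

```python
from collections import Counter


def _strings(node):
    """Yield every key and string value in a JSON-like document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield key
            yield from _strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from _strings(item)
    elif isinstance(node, str):
        yield node


def learn_dictionary(documents, max_entries=65536):
    """Assign 16-bit codes to strings that occur more than once across documents."""
    counts = Counter(s for doc in documents for s in _strings(doc))
    common = [s for s, n in counts.most_common(max_entries) if n > 1]
    return {s: code for code, s in enumerate(common)}  # codes 0..65535 fit in 16 bits


def encode(node, dictionary):
    """Replace dictionary strings with ('#', code) markers; keep everything else."""
    if isinstance(node, dict):
        return {encode(k, dictionary): encode(v, dictionary) for k, v in node.items()}
    if isinstance(node, list):
        return [encode(item, dictionary) for item in node]
    if isinstance(node, str) and node in dictionary:
        return ("#", dictionary[node])
    return node


def decode(node, dictionary):
    """Reverse `encode` by mapping each code back to its string."""
    reverse = {code: s for s, code in dictionary.items()}

    def walk(n):
        if isinstance(n, dict):
            return {walk(k): walk(v) for k, v in n.items()}
        if isinstance(n, list):
            return [walk(item) for item in n]
        if isinstance(n, tuple) and len(n) == 2 and n[0] == "#":
            return reverse[n[1]]
        return n

    return walk(node)
```

Repeated keys such as `"status"` shrink to a 2-byte code per occurrence, which is where most of the savings on many small, similar documents come from.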
This document provides instructions for setting up hot standby replication between a primary and secondary PostgreSQL database. It describes configuring WAL archiving on the primary, taking a backup of the primary to initialize the secondary, creating a recovery.conf file on the secondary, and testing replication. It also explains how to trigger a failover, switchover, and rebuild the primary database after a failover.
The document provides an overview of PostgreSQL performance tuning. It discusses caching, query processing internals, and optimization of storage and memory usage. Specific topics covered include the PostgreSQL configuration parameters for tuning shared buffers, work memory, and free space map settings.
Right now Postgres can't compress its data in many situations, which sometimes leads to storage overhead an order of magnitude larger than with commercial DBMSs. A common view is that this task can be handled by file-system-level compression, but the most popular and well-tested Linux file systems can't do that. I will talk about our patches that implement page compression on disk, or on disk plus in memory; which kind of compression is better in which situation; and our experience using compression in production.
How PostgreSQL interacts with the disk, what performance problems this can cause, and how to solve them by choosing suitable hardware and tuning the operating system and PostgreSQL settings.
This talk covers various advanced topics in the area of backups:
- incremental backups;
- archive management;
- backup validation;
- retention policies;
etc.
Based on these features, we'll compare various backup/recovery solutions for PostgreSQL.
This information will help you to choose the most appropriate tool for your system.
Size can creep up on you. Some day you may wake up to a multi-terabyte Postgres system handling over 3000 tps staring you down. Learn the best ways to manage these systems as they grow, and find out what new features in 9.0 have made life easier for administrators and application developers working with big data.
This talk will lead you through solutions to problems Postgres faces when it gets big: backups, transaction wraparound, bloat, huge catalogs and upgrades. You need to monitor the right things, find the gems in DBA-friendly database functions and catalog tables, and know the right places to look to spot problems early. We’ll also go over monitoring best practices and open source tools to get the job done.
Working with multiple versions of Postgres, back to version 8.2, will be covered, as well as tips on making the most of the new features in 9.0. War stories will be taken from real-world work with Emma, an email marketing company with a few large databases.
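Transaction wraparound comes from PostgreSQL's 32-bit transaction IDs (XIDs) being compared modulo 2^32: an XID counts as older than another if it is less than 2^31 transactions behind it. The sketch below mimics that comparison for ordinary XIDs (the special values 0 to 2 are ignored) and shows why an XID that falls more than 2^31 transactions behind suddenly looks like it is in the future, which is what VACUUM freezing, triggered by settings such as `autovacuum_freeze_max_age`, exists to prevent. The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def xid_precedes(a, b):
    """True if normal XID `a` is older than `b` under modulo-2**32 comparison.

    Mirrors the idea of casting (a - b) to a signed 32-bit integer and checking
    whether it is negative. Special XIDs 0 to 2 are deliberately ignored here.
    """
    diff = (a - b) & MASK32
    if diff >= 2**31:
        diff -= 2**32  # reinterpret as a signed 32-bit value
    return diff < 0


def xid_age(current_xid, xid):
    """How many transactions ago `xid` was assigned, allowing for wraparound."""
    return (current_xid - xid) & MASK32
```

Monitoring the age of each database's oldest unfrozen XID, and keeping it far below 2^31, is the practical defense the talk refers to.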
Spencer Christensen
There are many aspects to managing an RDBMS. Some of these are handled by an experienced DBA, but there are a good many things that any sys admin should be able to take care of if they know what to look for.
This presentation will cover basics of managing Postgres, including creating database clusters, overview of configuration, and logging. We will also look at tools to help monitor Postgres and keep an eye on what is going on. Some of the tools we will review are:
* pgtop
* pg_top
* pgfouine
* check_postgres.pl.
Check_postgres.pl is a great tool that can plug into your Nagios or Cacti monitoring systems, giving you even better visibility into your databases.
This document discusses streaming replication in PostgreSQL. It covers how streaming replication works, including the write-ahead log and replication processes. It also discusses setting up replication between a primary and standby server, including configuring the servers and verifying replication is working properly. Monitoring replication is discussed along with views and functions for checking replication status. Maintenance tasks like adding or removing standbys and pausing replication are also mentioned.
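Replication lag in bytes is usually computed from two WAL positions, for example the primary's current position and a standby's `replay_lsn` from `pg_stat_replication` (named `replay_location` before PostgreSQL 10). A position is written as two hexadecimal numbers, `high/low`, holding the upper and lower 32 bits of a 64-bit offset. The Python below is a minimal sketch of that arithmetic with hypothetical helper names; inside the database, the built-in `pg_wal_lsn_diff()` (`pg_xlog_location_diff()` in 9.x) does the same job.

```python
def parse_lsn(text):
    """Convert a WAL position such as '16/B374D848' into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(primary_lsn, standby_lsn):
    """Bytes of WAL the standby still has to replay (never negative)."""
    return max(0, parse_lsn(primary_lsn) - parse_lsn(standby_lsn))
```

Comparing positions as integers rather than as strings matters: `'0/FFFFFF00'` sorts after `'1/0'` as text, yet it is 256 bytes behind it.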
This presentation covers all aspects of PostgreSQL administration, including installation, security, file structure, configuration, reporting, backup, daily maintenance, monitoring activity, disk space computations, and disaster recovery. It shows how to control host connectivity, configure the server, find the query being run by each session, and find the disk space used by each database.
This document describes how to configure MySQL database replication between a master and slave server. The key steps are:
1. Configure the master server by editing its configuration file to enable binary logging and set the server ID. Create a replication user and grant privileges.
2. Export the databases from the master using mysqldump.
3. Configure the slave server by editing its configuration file to point to the master server. Import the database dump. Start replication on the slave.
4. Verify replication is working by inserting data on the master and checking it is replicated to the slave.
This document provides an overview of troubleshooting streaming replication in PostgreSQL. It begins with introductions to write-ahead logging and replication internals. Common troubleshooting tools are then described, including built-in views and functions as well as third-party tools. Finally, specific troubleshooting cases are discussed, such as replication lag, WAL bloat, recovery conflicts, and high CPU usage during recovery. Throughout, examples are provided of how to detect and diagnose issues using the various tools.
This is the presentation from Null/OWASP/g4h December Bangalore MeetUp by Ahamed Nafeez.
technology.inmobi.com/events/null-owasp-g4h-december-meetup
Proxpective: Attacking Web Proxies like never before
Building ML Pipelines:
- What do ML Pipelines Look Like?
- Building one ML pipeline
- ML pipeline in code
- Why use ML pipeline?
By Debidatta Dwibedi, presented at Data Science Meetup at InMobi.
http://technology.inmobi.com/events/data-science-meetup
This document defines and explains cloud computing. It begins by defining cloud and computing separately, then combining the terms to explain cloud computing as computing done over the Internet. It describes how cloud computing differs from conventional computing by being distributed across networks rather than done locally. The document also defines the three main types of cloud computing: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). It provides examples to illustrate each type and explains their relationship. Additional advantages and applications of cloud computing are discussed.
8 Ways a Digital Media Platform is More Powerful than “Marketing” (New Rainmaker)
You may have heard that “media not marketing” is the future of online business … but what does that actually mean, what can it look like?
As you’ll see in this SlideShare, examples of a media-first approach done very well are all around us; it only takes a simple shift in thinking to see them.
Can this "media not marketing" approach to building an audience have an actual effect on the bottom line revenue of your business, or is it just more philosophical wordplay?
Let's find out ...
The document provides five design principles for creating slides that effectively communicate messages to audiences:
1. Focus on the main message you want the audience to remember.
2. Keep designs simple with less text and only 1 main point per slide.
3. Use interesting fonts instead of boring standard ones to engage audiences.
4. Include high quality images that visually represent the message.
5. Choose a color scheme that fits the theme and works cohesively.
Rand Fishkin discusses why content marketing often fails and provides 5 key reasons: 1) Unrealistic expectations of how content marketing works, 2) Creating content without a community to amplify it, 3) Focusing on content creation but not amplification, 4) Ignoring search engine optimization, and 5) Giving up too soon and not allowing time for content to gain traction. He emphasizes that content marketing is a long-term process of building relationships and that most successful content took years of iteration before gaining significant reach.
SlideShare now has a player specifically designed for infographics. Upload your infographics now and see them take off! Need advice on creating infographics? This presentation includes tips for producing stand-out infographics. Read more about the new SlideShare infographics player here: http://wp.me/p24NNG-2ay
This infographic was designed by Column Five: http://columnfivemedia.com/
No need to wonder how the best on SlideShare do it. The Masters of SlideShare provides storytelling, design, customization and promotion tips from 13 experts of the form. Learn what it takes to master this type of content marketing yourself.
This document provides tips to avoid common mistakes in PowerPoint presentation design. It identifies the top 5 mistakes as including putting too much information on slides, not using enough visuals, using poor quality or unreadable visuals, having messy slides with poor spacing and alignment, and not properly preparing and practicing the presentation. The document encourages presenters to use fewer words per slide, high quality images and charts, consistent formatting, and to spend significant time crafting an engaging narrative and rehearsing their presentation. It emphasizes that an attractive design is not as important as being an effective storyteller.
10 Ways to Win at SlideShare SEO & Presentation Optimization (Oneupweb)
Thank you, SlideShare, for teaching us that PowerPoint presentations don't have to be a total bore. But in order to tap SlideShare's 60 million global users, you must optimize. Here are 10 quick tips to make your next presentation highly engaging, shareable and well worth the effort.
For more content marketing tips: http://www.oneupweb.com/blog/
This document provides tips for getting more engagement from content published on SlideShare. It recommends beginning with a clear content marketing strategy that identifies target audiences. Content should be optimized for SlideShare by using compelling visuals, headlines, and calls to action. Analytics and search engine optimization techniques can help increase views and shares. SlideShare features like lead generation and access settings help maximize results.
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ... (SlideShare)
This document provides a summary of the analytics available through SlideShare for monitoring the performance of presentations. It outlines the key metrics that can be viewed such as total views, actions, and traffic sources over different time periods. The analytics help users identify topics and presentation styles that resonate best with audiences based on view and engagement numbers. They also allow users to calculate important metrics like view-to-contact conversion rates. Regular review of the analytics insights helps users improve future presentations and marketing strategies.
Each month, join us as we highlight and discuss hot topics ranging from the future of higher education to wearable technology, best productivity hacks and secrets to hiring top talent. Upload your SlideShares, and share your expertise with the world!
Not sure what to share on SlideShare?
SlideShares that inform, inspire and educate attract the most views. Beyond that, ideas for what you can upload are limitless. We’ve selected a few popular examples to get your creative juices flowing.
How to Make Awesome SlideShares: Tips & TricksSlideShare
Turbocharge your online presence with SlideShare. We provide the best tips and tricks for succeeding on SlideShare. Get ideas for what to upload, tips for designing your deck and more.
SlideShare is a global platform for sharing presentations, infographics, videos and documents. It has over 18 million pieces of professional content uploaded by experts like Eric Schmidt and Guy Kawasaki. The document provides tips for setting up an account on SlideShare, uploading content, optimizing it for searchability, and sharing it on social media to build an audience and reputation as a subject matter expert.
The document discusses various techniques for optimizing query performance in MySQL, including using indexes appropriately, avoiding full table scans, and tools like EXPLAIN, Performance Schema, and pt-query-digest for analyzing queries and identifying optimization opportunities. It provides recommendations for index usage, covering indexes, sorting and joins, and analyzing slow queries.
MySQL® 5.7 is a great release which has a lot to offer, especially in the development and replication areas. It provides a lot of new optimizer features for developers to take advantage of, a much more powerful GIS function and high performance JSON data type, allowing for a more powerful store for semi-structured data. It also features dramatically improved Performance Schema, Parallel and Multi-Source replication, allowing you to scale much further than ever before, just to give you a taste. In this webinar, we will provide an overview of the most important MySQL 5.7 features.
This webinar will be part of a 3-part series which will include MySQL 5.7 for Developers and MySQL 5.7 for DBAs.
Oracle Database 11g Release 2 includes enhancements to database administration features such as automated segment creation, audit trail management tools, and SQL*Plus exit behavior configuration; it also changes the installation process by making ASM a separate Grid Infrastructure and including full software updates in patch set installations.
Query Optimization with MySQL 5.6: Old and New TricksMYXPLAIN
The document discusses query optimization techniques for MySQL 5.6, including both established techniques and new features in 5.6. It provides an overview of tools for profiling queries such as EXPLAIN, the slow query log, and the performance schema. It also covers indexing strategies like compound indexes and index condition pushdown.
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1MariaDB plc
MariaDB Server 10.2 includes several new features for analytics, JSON, replication, database compatibility, storage engines, security, administration, performance, and optimizations. Some key additions include window functions and common table expressions for more efficient queries, JSON and GeoJSON functions, delayed and compressed replication, multi-trigger support, CHECK constraints, indexes on virtual columns, the MyRocks storage engine, per-user load limitations, and TLS connections. MaxScale 2.1 provides up to 2.8x performance gains along with new security features like encrypted binlogs and LDAP authentication as well as support for Aurora clusters and dynamic configurations.
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1MariaDB plc
The document provides an overview of new features and enhancements in MariaDB Server 10.2 and MaxScale 2.1. For MariaDB Server 10.2, key additions include window functions, common table expressions, JSON and GeoJSON functions, new replication features like delayed replication, storage engine enhancements including a new MyRocks storage engine, and performance optimizations. MaxScale 2.1 focuses on performance improvements up to 2.8x faster, enhanced security features like encrypted binlogs and SSL, and support for Aurora clusters and dynamic configuration.
The slow query log aggregates queries that took longer than a threshold to run and examines more than a minimum number of rows. Tools like mk-query-digest and mysqldumpslow can analyze the slow query log to provide summaries of the longest running queries, number of calls, and other metrics to help identify optimization opportunities. The top query in this example was a SELECT statement joining multiple tables that accounted for over 99% of the total execution time recorded in the log.
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...Databricks
The document discusses code optimization techniques in Spark SQL's Catalyst optimizer. It describes how function outlining can improve performance of generated Java code by splitting large routines into smaller ones. The document outlines a Spark SQL query optimization case study where outlining a 300+ line routine from Catalyst code generation improved query performance by up to 19% on a Power8 cluster. Overall, the document examines how function outlining and other code generation optimizations in Catalyst can help the Java JIT compiler better optimize Spark SQL queries.
The document summarizes new features in the query optimizer in MariaDB 10.4, including:
1) An optimizer trace that provides insight into the query planning process.
2) Using sampling for histogram collection during ANALYZE TABLE to improve performance.
3) Rowid filtering that pushes qualifying conditions into joins to filter out non-matching rows earlier.
4) Updated default settings that make better use of statistics and condition selectivity.
- The document discusses managing a large OLTP database at PayPal, including capacity management, planned maintenance, performance management, and troubleshooting. It provides details on monitoring the database infrastructure, conducting maintenance such as patching and switchovers, and optimizing performance for Oracle RAC environments. The goal is to support business needs and provide uninterrupted service through proactive management of the database tier.
OSMC 2021 | pg_stat_monitor: A cool extension for better database (PostgreSQL...NETWAYS
The pg_stat_monitor is the statistics collection tool based on PostgreSQL’s contrib module pg_stat_statements. PostgreSQL’s pg_stat_statements provides only basic statistics, which is sometimes not enough. The major shortcoming in pg_stat_statements is that it accumulates all the queries and statistics, but does not provide aggregated statistics or histogram information. In this case, a user needs to calculate the aggregate, which is quite expensive. Pg_stat_monitor provides the pre-calculated aggregates. pg_stat_monitor collects and aggregates data on a bucket basis. The size and number of buckets should be configured using GUC (Grand Unified Configuration). The buckets are used to collect the statistics and aggregate them in a bucket. The talk will cover the usage of pg_stat_monitor and how it is better than pg_stat_statements.
SQL Server 2022 Programmability & PerformanceGianluca Hotz
SQL Server 2022 has introduced many new features across all areas of the product. In this session, we will focus on the news regarding programmability and performance improvements.
This document provides a summary of a presentation on becoming an accidental PostgreSQL database administrator (DBA). It covers topics like installation, configuration, connections, backups, monitoring, slow queries, and getting help. The presentation aims to help those suddenly tasked with DBA responsibilities to not panic and provides practical advice on managing a PostgreSQL database.
As cloud adoption has grown more rapidly in the last decade , how DBA's a can add more value to system and bring in more scalability to the DB server. This talk was presented at Open Source India 2018 conference by Kabilesh and Manosh of Mydbops. They share a few experience and value addition made to customers during their consulting process.
Modernizing Your Database with SQL Server 2019 discusses SQL Server 2019 features that can help modernize a database, including:
- The Hybrid Buffer Pool which supports persistent memory to improve performance on read-heavy workloads.
- Memory-Optimized TempDB Metadata which stores TempDB metadata in memory-optimized tables to avoid certain blocking issues.
- Intelligent Query Processing features like Adaptive Query Processing, Batch Mode processing on rowstores, and Scalar UDF Inlining which improve query performance.
- Approximate Count Distinct, a new function that provides an estimated count of distinct values in a column faster than a precise count.
- Lightweight profiling, enabled by default, which provides query plan
The document summarizes new features in Oracle Database 12c from Oracle 11g that would help a DBA currently using 11g. It lists and briefly describes features such as the READ privilege, temporary undo, online data file move, DDL logging, and many others. The objectives are to make the DBA aware of useful 12c features when working with a 12c database and to discuss each feature at a high level within 90 seconds.
Testing Persistent Storage Performance in Kubernetes with SherlockScyllaDB
Getting to understand your Kubernetes storage capabilities is important in order to run a proper cluster in production. In this session I will demonstrate how to use Sherlock, an open source platform written to test persistent NVMe/TCP storage in Kubernetes, either via synthetic workload or via variety of databases, all easily done and summarized to give you an estimate of what your IOPS, Latency and Throughput your storage can provide to the Kubernetes cluster.
2. 2
Development Status
● June 10, 2014 – 9.4 branched
● June 2014 – CommitFest 1 (CF1): completed
● August 2014 – CF2: completed
● October 2014 – CF3: completed
● December 2014 – CF4: completed
● February 2015 – CF5: in progress
3. 3
New Features
● Developer and SQL Features
● DBA and Administration
● Replication
● Performance
4. 4
Multi-column Subselect Update
● Update more than one column with a single sub-SELECT
● SQL-standard syntax
UPDATE tab SET (col1, col2) =
(SELECT foo, bar FROM tab2)
WHERE ...
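A minimal runnable sketch of the multi-column form. The dept and emp tables are hypothetical, invented only for illustration; each emp row gets both columns from one correlated sub-SELECT:

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE dept (id int PRIMARY KEY, name text, city text);
CREATE TABLE emp (
    id        int PRIMARY KEY,
    dept_id   int REFERENCES dept,
    dept_name text,
    dept_city text
);

INSERT INTO dept VALUES (1, 'Engineering', 'Bangalore'), (2, 'Sales', 'Mumbai');
INSERT INTO emp (id, dept_id) VALUES (10, 1), (11, 2);

-- One correlated sub-SELECT fills both columns of each emp row,
-- instead of writing a separate sub-SELECT per column.
UPDATE emp SET (dept_name, dept_city) =
    (SELECT name, city FROM dept WHERE dept.id = emp.dept_id);
```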
5. 5
SKIP LOCKED
● Like SELECT ... FOR UPDATE NOWAIT
● Except that locked rows are skipped instead of raising an error
postgres=# SELECT * FROM a FOR UPDATE NOWAIT;
ERROR: could not obtain lock on row in relation "a"
postgres=# SELECT * FROM a FOR UPDATE SKIP LOCKED;
a | b | c
----+----+----
2 | 2 | 2
3 | 3 | 3
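A common use of SKIP LOCKED is a work queue from which several workers take jobs concurrently without blocking each other. A simplified sketch, assuming a hypothetical jobs table (a real queue would usually do the work inside the same transaction before committing):

```sql
-- Hypothetical work-queue table.
CREATE TABLE jobs (
    id      serial PRIMARY KEY,
    payload text,
    done    boolean NOT NULL DEFAULT false
);
INSERT INTO jobs (payload) VALUES ('a'), ('b'), ('c');

-- Claim the oldest unfinished job. A row already locked by another
-- worker is skipped, rather than waited for (plain FOR UPDATE) or
-- reported as an error (NOWAIT).
UPDATE jobs
   SET done = true
 WHERE id = (SELECT id
               FROM jobs
              WHERE NOT done
              ORDER BY id
              LIMIT 1
                FOR UPDATE SKIP LOCKED)
RETURNING id, payload;
```

Run concurrently from several sessions, each UPDATE claims a different job. If every remaining job is locked, the sub-SELECT returns no row and the UPDATE affects nothing.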
7. 7
Row Level Security
● Controls, at the row level, which rows can be retrieved by SELECT or modified by INSERT, UPDATE or DELETE
● Policies must be defined for tables using the policy commands (CREATE POLICY, ALTER POLICY, DROP POLICY)
● Row security must be enabled or disabled by the table owner on a per-table basis, using ALTER TABLE ... ENABLE/DISABLE ROW LEVEL SECURITY
8. 8
Row Level Security
● Row security is disabled on tables by default and must be enabled for the table's policies to be used.
● If no policies exist on a table with row security enabled, a default-deny policy is used and no rows are visible.
● A new role attribute, BYPASSRLS, which only a superuser can set, allows other users to bypass row security by setting row_security = OFF.
9. 9
Row Level Security
● row_security: a new postgresql.conf parameter that controls whether row security policies are applied to queries against tables that have row security enabled.
on - the table's policies are applied to the queries of all users except superusers and the table owner.
force - the policies are applied to superusers and the table owner as well.
off - the policies are bypassed if the user running the query has the BYPASSRLS attribute; otherwise an error is raised.
10. 10
Row Level Security – How it works
● create table clients ( id serial primary key,
account_name text not null unique,
account_manager text not null
);
CREATE TABLE
create user peter;
CREATE ROLE
create user joanna;
CREATE ROLE
create user bill;
CREATE ROLE
11. 11
Row Level Security – How it works
● Grant appropriate permissions
grant all on table clients to peter, joanna, bill;
GRANT
grant all on sequence clients_id_seq to peter, joanna, bill;
GRANT
● Populate the table
insert into clients (account_name, account_manager)
values ('initrode', 'peter'), ('initech', 'bill'), ('chotchkie''s',
'joanna');
INSERT 0 3
12. 12
Row Level Security – How it works
● By default, all the rows are visible.
$ \c - peter
$ select * from clients;
id | account_name | account_manager
----+--------------+-----------------
1 | initrode | peter
2 | initech | bill
3 | chotchkie's | joanna
(3 rows)
13. 13
Row Level Security – How it works
● Now let's create a policy and enable row level security
create policy just_own_clients on clients
for all
to public
using ( account_manager = current_user );
CREATE POLICY
alter table clients ENABLE ROW LEVEL SECURITY;
ALTER TABLE
14. 14
Row Level Security – How it works
● Now peter can see only his own rows:
$ select * from clients;
id | account_name | account_manager
----+--------------+-----------------
1 | initrode | peter
(1 row)
$ \c - joanna
$ select * from clients;
id | account_name | account_manager
----+--------------+-----------------
3 | chotchkie's | joanna
(1 row)
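The BYPASSRLS attribute and the row_security setting described earlier can be tried with a sketch like the following, run as a superuser. The notes table and the alice and auditor roles are hypothetical; the behaviour assumed is the one the earlier slides describe (row_security = off bypasses policies only for roles with BYPASSRLS):

```sql
-- Run as a superuser. Table, rows and roles are hypothetical.
CREATE TABLE notes (owner text NOT NULL, body text);
INSERT INTO notes VALUES ('alice', 'alice''s note'), ('carol', 'carol''s note');

CREATE ROLE alice;
CREATE ROLE auditor BYPASSRLS;   -- only a superuser may set BYPASSRLS
GRANT SELECT ON notes TO alice, auditor;

CREATE POLICY own_notes ON notes USING (owner = current_user);
ALTER TABLE notes ENABLE ROW LEVEL SECURITY;

-- alice has no BYPASSRLS: the policy applies and she sees one row.
SET ROLE alice;
SELECT * FROM notes;
RESET ROLE;

-- auditor has BYPASSRLS: with row_security = off the policy is
-- bypassed and both rows are returned. A role without BYPASSRLS
-- would get an error here instead.
SET ROLE auditor;
SET row_security = off;
SELECT * FROM notes;
RESET row_security;
RESET ROLE;
```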
15. 15
New Features
● Developer and SQL Features
● DBA and Administration
● Replication
● Performance
16. 16
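min and max wal size
● checkpoint_segments removed!
● Instead, control the minimum and maximum WAL size
● min_wal_size (default 80MB)
● max_wal_size (default 1GB)
● Checkpoints are auto-tuned so that WAL stays between the two (max_wal_size is a soft limit)
● The number of WAL files recycled for reuse follows a moving average of the WAL used in previous checkpoint cycles
● Space is only consumed when actually needed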
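A sketch of adjusting the new limits from SQL with ALTER SYSTEM (available since 9.4, superuser only). The sizes are arbitrary example values, not recommendations:

```sql
-- Example values only, not recommendations; requires superuser.
ALTER SYSTEM SET min_wal_size = '1GB';
ALTER SYSTEM SET max_wal_size = '4GB';

-- Both parameters take effect on a configuration reload; no restart needed.
SELECT pg_reload_conf();

-- Verify, e.g. from a new session:
SHOW min_wal_size;
SHOW max_wal_size;
```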
17. 17
Foreign Table Inheritance
● Foreign tables can now be inheritance children or parents.
● PostgreSQL implements partitioning with table inheritance and CHECK constraints, so foreign tables can now act as partitions
● This feature can be used for sharding
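A sketch of a sharded layout with a foreign child table, using the postgres_fdw extension. The server name, host, database, credentials and table names are all hypothetical placeholders:

```sql
-- All names, hosts and credentials below are hypothetical placeholders.
CREATE EXTENSION postgres_fdw;

CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard1.example.com', dbname 'sales');
CREATE USER MAPPING FOR CURRENT_USER SERVER shard1
    OPTIONS (user 'app', password 'secret');

-- Local parent table; it holds no rows itself in this layout.
CREATE TABLE orders (id bigint, region text, amount numeric);

-- Foreign child: its rows live in table orders_south on shard1.
-- The CHECK constraint tells the planner which rows can be here.
CREATE FOREIGN TABLE orders_south (CHECK (region = 'south'))
    INHERITS (orders)
    SERVER shard1 OPTIONS (table_name 'orders_south');

-- With constraint_exclusion = partition (the default), a query that
-- filters on region can skip children whose CHECK rules them out.
SELECT sum(amount) FROM orders WHERE region = 'south';
```

Note that PostgreSQL does not enforce CHECK constraints on foreign tables; it simply assumes they hold, so the remote table must really contain only matching rows.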
18. 18
Commit Timestamp Tracking
● Optional tracking of commit timestamps
● track_commit_timestamp = on
● Default is off; changing this parameter requires a server restart
● Commit timestamps can be retrieved only for transactions that committed after the option was enabled
● Can be used by multi-master systems for conflict resolution
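Once track_commit_timestamp is on and the server has been restarted, commit times can be queried with pg_xact_commit_timestamp() and pg_last_committed_xact(). A sketch, assuming a hypothetical events table:

```sql
-- Assumes track_commit_timestamp = on and a server restart;
-- the events table is hypothetical.
CREATE TABLE events (id serial PRIMARY KEY, note text);
INSERT INTO events (note) VALUES ('first'), ('second');

-- Commit time of the transaction that created each row version (its xmin).
SELECT id, note, pg_xact_commit_timestamp(xmin) AS committed_at
  FROM events;

-- xid and commit timestamp of the most recently committed transaction.
SELECT * FROM pg_last_committed_xact();
```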
20. 20
New Features
● Developer and SQL Features
● DBA and Administration
● Replication
● Performance
21. 21
pg_rewind
● a tool for synchronizing a PostgreSQL
cluster with another copy of the same
cluster, after the clusters' timelines
have diverged
● This is used to bring an old master
server back online after failover, as a
standby that follows the new master
● The advantage of pg_rewind over taking
a new base backup, or tools like rsync,
is that pg_rewind does not require
reading through all unchanged files in
the cluster
22. 22
pg_rewind
● It is a lot faster when the database is large and only a small portion of it differs between the clusters
● The target server (old-master) must be
shut down cleanly before running
pg_rewind
● pg_rewind requires that the
wal_log_hints option is enabled in
postgresql.conf, or that data checksums
were enabled when the cluster was
initialized with initdb.
● full_page_writes must also be enabled.
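Before relying on pg_rewind, it is worth confirming these prerequisites on any server that may later need to be rewound. A minimal check in SQL:

```sql
-- pg_rewind needs wal_log_hints = on OR data checksums enabled at initdb,
-- and in either case full_page_writes = on.
SHOW wal_log_hints;
SHOW data_checksums;
SHOW full_page_writes;
```

After a clean shutdown of the old master, the rewind itself is run from the shell, along the lines of pg_rewind --target-pgdata=<old master data directory> --source-server='<connection string for the new master>', where the placeholders stand for your own path and connection details.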
23. 23
New Features
● Developer and SQL Features
● DBA and Administration
● Replication
● Performance
24. 24
BRIN
● Block Range Index
● Stores only summary bounds for each block range
● Default range is 128 blocks
● Very small indexes
● Lossy: all blocks in matching ranges are scanned for matches
● Intended for scanning very large tables
25. 25
BRIN
=# CREATE TABLE brin_example AS SELECT
generate_series(1,100000000) AS id;
SELECT 100000000
=# CREATE INDEX btree_index ON
brin_example(id);
CREATE INDEX
Time: 239033.974 ms
=# CREATE INDEX brin_index ON
brin_example USING brin(id);
CREATE INDEX
Time: 42538.188 ms
Conclusion – BRIN index creation is much faster (about 5.6x in this run)
27. 27
BRIN – Index creation with different block ranges
=# CREATE INDEX brin_index_64 ON brin_example USING
brin(id) WITH (pages_per_range = 64);
CREATE INDEX
=# CREATE INDEX brin_index_256 ON brin_example USING
brin(id) WITH (pages_per_range = 256);
CREATE INDEX
=# CREATE INDEX brin_index_512 ON brin_example USING
brin(id) WITH (pages_per_range = 512);
CREATE INDEX
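To check the "very small indexes" claim, a scaled-down sketch like this compares on-disk index sizes. The brin_demo table and its index names are hypothetical, and the table is much smaller than the 100-million-row example above:

```sql
-- Hypothetical, scaled-down table (1 million rows).
CREATE TABLE brin_demo AS SELECT generate_series(1, 1000000) AS id;

CREATE INDEX brin_demo_btree ON brin_demo (id);
CREATE INDEX brin_demo_brin_128 ON brin_demo USING brin (id);   -- default pages_per_range
CREATE INDEX brin_demo_brin_16 ON brin_demo USING brin (id)
    WITH (pages_per_range = 16);

-- A smaller pages_per_range stores more summaries (a larger but more
-- selective index on big tables); either BRIN index should be a tiny
-- fraction of the btree's size.
SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
  FROM pg_class
 WHERE relkind = 'i' AND relname LIKE 'brin_demo%'
 ORDER BY pg_relation_size(oid);
```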
29. 29
BRIN – How it works
● A new index access method intended to
accelerate scans of very large tables,
without the maintenance overhead of
btrees or other traditional indexes.
● They work by maintaining "summary" data
about block ranges.
30. 30
BRIN – How it works
● For data types with natural 1-D sort
orders like integers, the summary info
consists of the maximum and the minimum
values of each indexed column within
each page range
● As new tuples are added to the table, the summary information is updated if the block range to which the tuple is added is already summarized
● Otherwise, a subsequent VACUUM or a call to the brin_summarize_new_values() function creates the summary information.
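A sketch of the summarization behaviour just described, using a hypothetical readings table with a small pages_per_range so that newly inserted rows start new, not yet summarized ranges:

```sql
-- Hypothetical table; a small pages_per_range makes new ranges appear quickly.
CREATE TABLE readings AS SELECT generate_series(1, 100000) AS id;
CREATE INDEX readings_brin ON readings USING brin (id)
    WITH (pages_per_range = 8);

-- Rows appended after index creation spill into new block ranges,
-- which remain unsummarized for now.
INSERT INTO readings SELECT generate_series(100001, 200000);

-- Summarize them now instead of waiting for the next VACUUM.
-- Returns the number of block ranges that were summarized.
SELECT brin_summarize_new_values('readings_brin');
```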
31. 31
Read Scalability
● We will see a boost in scalability for read workloads when the data fits in RAM. I ran a pgbench read-only load to compare the performance of 9.4 and HEAD (62f5e447) on an IBM POWER-8 machine with 24 cores, 192 hardware threads and 492GB RAM
● Data was collected for 2 kinds of workloads: when all the data fits in shared buffers (scale_factor = 300), and when the data can't fit in shared buffers but can fit in RAM (scale_factor = 1000)
32. 32
Read Scalability – Data fits in shared_buffers
[Chart: TPS (0 to 600,000) vs. client count (1, 8, 16, 32, 64, 128, 256), 9.4 vs. HEAD. pgbench -S -M prepared, PG9.5dev as of commit 62f5e4; median of 3 5-minute runs, scale_factor = 300, max_connections = 300, shared_buffers = 8GB]
33. 33
Read Scalability
● In 9.4 throughput peaks at 32 clients; now it peaks at 64 clients. The improvement is up to ~98%, and HEAD is better at all client counts from 32 upward
● The main work that led to this improvement is commit ab5194e6 (Improve LWLock scalability)
34. 34
Read Scalability
● The previous implementation had a bottleneck around the spinlocks acquired for LWLock acquisition and release; 9.5 changes the LWLock implementation to use atomic operations to manipulate the lock state.
35. 35
Read Scalability – Data fits in RAM
[Chart: TPS (0 to 400,000) vs. client count (1, 8, 16, 32, 64, 128, 256), 9.4 vs. HEAD. pgbench -S -M prepared, PG9.5dev as of commit 62f5e4; median of 3 5-minute runs, scale_factor = 1000, max_connections = 300, shared_buffers = 8GB]
36. 36
Read Scalability
● In this case we see a good performance improvement (~25%) even at 32 clients, rising to ~96% at higher client counts. Here too, 9.4 peaked at 32 clients while HEAD peaks at 64 clients, and performance is better at all higher client counts.
● The main work that led to this improvement is commit ab5194e6 (Improve LWLock scalability)
37. 37
Read Scalability
● In this case there were mainly 2 bottlenecks:
● the BufFreeList LWLock was acquired to find a free buffer for a page
● to change a buffer's association in the buffer mapping hash table, an LWLock is acquired on the hash partition to which the buffer belongs, and there were just 16 such partitions
38. 38
Read Scalability
● To reduce the first bottleneck, a spinlock is used that is held just long enough to pop the freelist or advance the clock-sweep hand, and is then released
● To reduce the second bottleneck, the number of buffer mapping partitions was increased to 128
● The crux of this improvement is that both bottlenecks had to be resolved together to see a major improvement in scalability
39. 39
Parallel Vacuumdb
● vacuumdb can use concurrent connections
● Add -j <n> (or --jobs=<n>) to the command line
● Can speed up VACUUM or ANALYZE significantly
● This option reduces processing time, but it also increases the load on the database server.
40. 40
Sorting Improvements
● Use abbreviated keys for faster sorting of text
● Strings are transformed into binary keys using strxfrm(), and the keys are sorted instead
● The keys are compared with a strcmp()-based comparator, which only considers raw byte ordering
● The key is abbreviated by taking the first few bytes of the strxfrm() blob
41. 41
Sorting Improvements
● If the abbreviated comparison is insufficient to resolve the comparison, we fall back on the normal comparator.
● This can be much faster than the old way of sorting if the first few bytes of the string are usually sufficient to resolve the comparison.
42. 42
Sorting Improvements
● As an example
create table stuff as select
random()::text as a, 'filler filler
filler'::text as b, g as c from
generate_series(1, 1000000) g;
SELECT 1000000
create index on stuff (a);
CREATE INDEX
● On a PPC64 machine, the above index creation took 6.3 seconds before this feature and just 1.9 seconds after it, roughly a 3x improvement. Hooray!
43. 43
WAL Compression
● Optional compression of full-page images in WAL
● wal_compression = on
● Default is off; can be changed by a superuser without a restart
44. 44
WAL Compression
● Smaller WAL
● Faster writes, faster replication
● Costs CPU
● Only compresses full-page images (FPIs), not other WAL records
● Still useful to gzip archives!
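One way to see the effect on WAL volume is to measure the WAL generated by an update that touches every page right after a checkpoint, once with wal_compression = on and once with off. A sketch, assuming superuser access and a hypothetical wal_demo table; it uses the 9.5 function names pg_current_xlog_location() and pg_xlog_location_diff(), which PostgreSQL 10 renamed to pg_current_wal_lsn() and pg_wal_lsn_diff():

```sql
-- Assumes superuser access; wal_demo is a hypothetical table.
CREATE TABLE wal_demo AS
    SELECT g AS id, md5(g::text) AS filler
      FROM generate_series(1, 100000) AS g;

SET wal_compression = on;   -- rerun the steps below with off to compare

-- After a checkpoint, the first change to each page writes a full-page
-- image (with the default full_page_writes = on).
CHECKPOINT;

DROP TABLE IF EXISTS wal_mark;
CREATE TEMP TABLE wal_mark AS
    SELECT pg_current_xlog_location() AS start_lsn;

UPDATE wal_demo SET filler = filler;   -- modifies every heap page

SELECT pg_size_pretty(
           pg_xlog_location_diff(pg_current_xlog_location(), start_lsn)
       ) AS wal_generated
  FROM wal_mark;
```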
45. 45
Index Scan Optimization
● Improved performance of index scans with a ">" condition
● Performance improvements of 5 to 30 percent were observed
46. 46
● Thanks to Magnus Hagander, who presented PostgreSQL 9.5 features at PGConf US 2015. Some of the slides in this deck are taken from his presentation. You can download his slides from http://www.hagander.net/talks/
● Thanks to Hubert 'depesz' Lubaczewski and Michael Paquier for writing blogs about new features in PostgreSQL 9.5. Some of the examples used in this deck are taken from their blogs.