Comparisons are fundamental to computing - and comparing strings is not nearly as straightforward as you might think. Come learn about the history, nuance, and surprises of “putting words in order” that you never knew existed in computer science, and how that nuance impacts both general programming and SQL programming. Next, we'll walk through a few real scenarios and demonstrations using PostgreSQL as a user and administrator, which you can re-run yourself later for further study, including one way you could easily corrupt your self-managed PostgreSQL database if you aren't prepared. Finally, we'll dive into an explanation of the surprising behaviors we saw in PostgreSQL and learn more about the user and administrative features PostgreSQL provides for localized string comparison.
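The kind of surprise the talk opens with can be sketched in a few lines of Python (a stand-in for PostgreSQL's behavior, not the database itself): a raw codepoint comparison, like PostgreSQL's "C" collation, orders every uppercase letter before every lowercase one, while a locale-aware collation produces the dictionary order most users expect.

```python
words = ["apple", "Zebra", "banana"]

# Codepoint-by-codepoint comparison, as in PostgreSQL's "C" collation:
# every uppercase letter ("Z" = 90) sorts before every lowercase one ("a" = 97).
print(sorted(words))                    # ['Zebra', 'apple', 'banana']

# A rough stand-in for a locale-aware collation such as en_US.UTF-8,
# which compares letters linguistically rather than by raw codepoint:
print(sorted(words, key=str.casefold))  # ['apple', 'banana', 'Zebra']
```

Which of the two orders PostgreSQL produces depends on the collation of the column or expression, which is exactly why ORDER BY results can change when a database moves between systems.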
Percona Toolkit for Effective MySQL Administration - Mydbops
The document discusses various tools from the Percona Toolkit that can be used for effective MySQL administration. It describes tools like pt-config-diff to find configuration differences, pt-query-digest to analyze MySQL queries from logs, pt-duplicate-key-checker to check for duplicate indexes, and pt-table-checksum to perform replication consistency checks. Installation instructions and usage examples are provided for some of the key tools.
Tips And Tricks For Bioinformatics Software Engineering - jtdudley
This document provides tips and tricks for software engineering in bioinformatics. It discusses using object-oriented software design principles like encapsulation and inheritance. It also covers best practices like automating documentation, performance optimization, working with data using databases and file formats, parallel and distributed computing, hardware acceleration, and web services.
These slides were presented on a Software Craftsmanship meetup @ EPAM Hungary on 26 January, 2017.
During the talk we went through the evolution of structured data analytics in Spark. We compared the RDD, the SparkSQL (DataFrame) and the DataSet APIs. We used the very latest and greatest Spark 2.1, released on December 28, went through code samples and dove deep into Spark optimizations. The code samples can be downloaded from here: https://github.com/symat/spark-api-comparison
This document provides an introduction and overview of Cassandra and NoSQL databases. It discusses the challenges faced by modern web applications that led to the development of NoSQL databases. It then describes Cassandra's data model, API, consistency model, and architecture including write path, read path, compactions, and more. Key features of Cassandra like tunable consistency levels and high availability are also highlighted.
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI) - Serban Tanasa
1) The document provides a quick guide to using data.table in R and Pentaho Data Integration (PDI) for fast data loading and manipulation. It discusses benchmarks showing data.table is 2-20x faster than traditional methods for reading, ordering, and transforming large data.
2) The outline discusses how to use basic data.table functions for speed gains and to overcome R's scaling limitations. It also provides a very brief overview of PDI's capabilities for Extract/Transform/Load (ETL) workflows without writing code.
3) The benchmarks section shows data.table is up to 500% faster than traditional R methods for reading large CSV files and orders of magnitude faster for sorting and aggregating.
PostgreSQL is a free and open-source relational database management system that provides high performance and reliability. It supports replication through various methods including log-based asynchronous master-slave replication, which the presenter recommends as a first option. The upcoming PostgreSQL 9.4 release includes improvements to replication such as logical decoding and replication slots. Future releases may add features like logical replication consumers and SQL MERGE statements. The presenter took questions at the end and provided additional resources on PostgreSQL replication.
The document provides an overview of developing Python applications with MySQL Connector/Python:
- It introduces MySQL Connector/Python and its features like dual licensing, supported MySQL server versions, and choice of three APIs including the traditional PEP249 API and new X DevAPI.
- It demonstrates how to install MySQL Connector/Python using pip or MySQL Installer and covers other installation methods. Basic usage examples are provided for the traditional and X DevAPIs.
- Tips are given like using prepared statements to protect against SQL injection, checking for warnings, and recommendations for character sets and user privileges. The new MySQL X DevAPI for SQL and NoSQL is also overviewed.
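The prepared-statement tip generalizes beyond MySQL Connector/Python. As a sketch using the standard library's sqlite3 module (with Connector/Python the pattern is the same, except placeholders are `%s` rather than `?`), passing values separately from the SQL text keeps attacker-controlled input from ever being parsed as SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

hostile = "alice' OR '1'='1"  # classic injection attempt

# Parameterized query: the driver treats the whole value as a single
# string literal, so the injection matches no rows instead of every row.
rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()
print(rows)  # []
```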
This document summarizes new features and improvements in MySQL 8.0. Key highlights include utf8mb4 becoming the default character set to support Unicode 9.0, performance improvements for utf8mb4 of up to 1800%, continued enhancements to JSON support including new functions, expanded GIS functionality including spatial reference system support, and new functions for working with UUIDs and bitwise operations. It also provides a brief history of MySQL and outlines performance improvements seen in benchmarks between MySQL versions.
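The move to utf8mb4 matters because MySQL's legacy utf8 character set (utf8mb3) stores at most three bytes per character, while plenty of modern text, emoji in particular, needs four. A quick Python illustration of the byte counts involved:

```python
# A Latin-1-range character fits in two UTF-8 bytes...
print(len("é".encode("utf-8")))   # 2

# ...but an emoji (U+1F600) encodes to four bytes, which utf8mb3
# cannot store and utf8mb4 can.
print(len("😀".encode("utf-8")))  # 4
```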
The document summarizes the scaling challenges faced by Fotolog, a large photo blogging community. It discusses how Fotolog grew to hosting hundreds of millions of photos and billions of comments. It describes Fotolog's technology stack including their use of MySQL, Memcached, 3Par storage and CDNs. It also outlines some of the MySQL scaling techniques used, such as sharding, replication, table partitioning and optimization.
These are the slides from my presentation at MySQL Conference and Expo 2007 held in Santa Clara, CA. The talk was focused on scaling InnoDB to meet Fotolog's unique challenges.
Beyond Breakpoints: A Tour of Dynamic Analysis - C4Media
Video and slides synchronized; mp3 and slide download available at http://bit.ly/2dXUUTG.
Nathan Taylor provides an introduction to the dynamic analysis research space, suggesting integrating these techniques into various internal tools. Filmed at qconnewyork.com.
Nathan Taylor is a software developer currently employed at Fastly, where he works on making the Web faster through high performance content delivery. Previous gigs have included hacking on low-level systems software such as Java runtimes at Twitter and, prior to that, the Xen virtual machine monitor in grad school.
The document discusses creating an optimized algorithm in R. It covers writing functions and algorithms in R, creating R packages, and optimizing code performance using parallel computing and high performance computing. Key steps include reviewing existing algorithms, identifying gaps, testing and iterating a new algorithm, publishing the work, and making the algorithm available to others through an R package.
R is an open-source statistical programming language that can be used for data analysis and visualization. The document provided an introduction to R including how to install R, create variables, import and assemble data, perform basic statistical analyses like t-tests and linear regression, and create plots and graphs. Key functions and concepts introduced included using c() to combine values into vectors, reading in data from CSV files, using lm() for linear regression, and the basic plot() function.
Jump Start into Apache® Spark™ and Databricks - Databricks
These are the slides from the Jump Start into Apache Spark and Databricks webinar on February 10th, 2016.
---
Spark is a fast, easy to use, and unified engine that allows you to solve many Data Sciences and Big Data (and many not-so-Big Data) scenarios easily. Spark comes packaged with higher-level libraries, including support for SQL queries, streaming data, machine learning, and graph processing. We will leverage Databricks to quickly and easily demonstrate, visualize, and debug our code samples; the notebooks will be available for you to download.
There are many common workloads in R that are "embarrassingly parallel": group-by analyses, simulations, and cross-validation of models are just a few examples. In this talk I'll describe several techniques available in R to speed up workloads like these, by running multiple iterations simultaneously, in parallel.
Many of these techniques require the use of a cluster of machines running R, and I'll provide examples of using cloud-based services to provision clusters for parallel computations. In particular, I will describe how you can use the SparklyR package to distribute data manipulations using the dplyr syntax, on a cluster of servers provisioned in the Azure cloud.
Presented by David Smith at Data Day Texas in Austin, January 27 2018.
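The talk's examples are in R, but the "embarrassingly parallel" pattern itself is language-agnostic: when iterations share no state, they can simply be mapped across workers. A minimal Python sketch of the same idea, where `simulate` is a hypothetical stand-in for one independent iteration:

```python
from concurrent.futures import ThreadPoolExecutor

def simulate(seed):
    # Stand-in for one independent unit of work: a simulation run,
    # one cross-validation fold, one group of a group-by analysis.
    return seed * seed

# map() farms the iterations out to the pool and returns results in
# input order, with no coordination needed between workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(simulate, range(8)))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```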
The document summarizes the Percona Toolkit, which contains free and open source command-line tools for MySQL based on Percona's experience developing best practices. Some of the most popular tools are pt-summary, pt-mysql-summary, pt-stalk, pt-archiver, and pt-query-digest, which allow users to summarize MySQL servers, analyze queries from logs, and check for issues. The toolkit can be installed via package repositories or by downloading individual tools.
Introduces important facts and tools to help you get started with performance improvement.
Learn to monitor and analyze important metrics, then you can start digging and improving.
Includes useful munin probes, predefined SQL queries to investigate your database's performance, and a top 5 of the most common performance problems in custom Apps.
By Olivier Dony - Lead Developer & Community Manager, OpenERP
Are blade servers suitable for HPTC? This talk covers the pros and cons of building your next cluster using blades.
Talk given at International Supercomputing blade workshop in 2007.
This talk will present R as a programming language suited for solving data analysis and modeling problems, MLflow as an open source project to help organizations manage their machine learning lifecycle and the intersection of both by adding support for R in MLflow. It will be highly interactive and touch on some of the technical implementation choices taken while making R available in MLflow. It will also demonstrate using MLflow tracking, projects, and models directly from R as well as reusing R models in MLflow to interoperate with other programming languages and technologies.
Node is used to build a reverse proxy to provide secure access to internal web resources and sites for mobile clients within a large enterprise. Performance testing shows the proxy can handle over 1000 requests per second with latency under 1 second. Code quality analysis tools like Plato and testing frameworks like Jest are useful for maintaining high quality code. Scalability is achieved through auto-scaling virtual machine instances with a load balancer and configuration management.
1. The document discusses using graphics and data visualization to improve understanding of database performance issues and SQL tuning. It provides examples of how visualizations can clearly show relationships in complex SQL queries and data that are difficult to understand from text or code alone.
2. Key steps in visual SQL tuning are laid out, including drawing tables as nodes, joins as connection lines, and filters as markings on tables. This helps identify optimization opportunities like missing indexes or stale statistics.
3. The document emphasizes that a lack of clarity in visualizing complex data and queries can have devastating consequences, while graphics enable easy understanding and effective problem-solving.
Beyond php - it's not (just) about the code - Wim Godden
Most PHP developers focus on writing code. But creating Web applications is about much more than just writing PHP. Take a step outside the PHP cocoon and into the big PHP ecosphere to find out how small code changes can make a world of difference on servers and network. This talk is an eye-opener for developers who spend over 80% of their time coding, debugging and testing.
JCConf 2022 - New Features in Java 18 & 19 - Joseph Kuo
This document summarizes Joseph Kuo's presentation on new features in Java 18 and 19. It discusses survey results on the state of the Java ecosystem from TIOBE Index, GitHub Octoverse, and Stack Overflow. It then covers new language features including simple web server, UTF-8 default encoding, code snippets in JavaDoc, pattern matching for switch/instanceof, record patterns, vector API, virtual threads, and preview features.
Regex Considered Harmful: Use Rosie Pattern Language Instead - All Things Open
The document discusses using the Rosie Pattern Language (RPL) instead of regular expressions for parsing log and data files. RPL aims to address issues with regex like readability, maintainability, and performance. It describes how RPL is designed like a programming language with common patterns. RPL patterns are loaded into the Rosie Pattern Engine which can parse files and annotate text with semantic tags.
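The readability complaint RPL addresses is easy to reproduce. Even with named groups (about the best regular expressions offer for self-documentation), a log-parsing pattern stays dense; a sketch in Python against a hypothetical syslog-like line:

```python
import re

line = "2017-03-01 12:34:56 ERROR disk full"

# Named groups help, but each field is still spelled out as raw
# character classes; RPL's pitch is to name and reuse such patterns.
pattern = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<message>.*)"
)
m = pattern.match(line)
print(m.group("level"), m.group("message"))  # ERROR disk full
```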
Overview of the Talk:
Introduction to the Subject
Database
Relational Database
Object-Relational Database
Database Management System
History
Programming
SQL
Connecting Java and Matlab to a Database
Advanced DBMS
Data Grid
BigTable
Demo
Products: MySQL, SQLite, Oracle, DB2, Microsoft Access, Microsoft SQL Server
Products Comparison
The document proposes an IT infrastructure for Shiv LLC, a company with locations in Los Angeles, Dallas, and Houston. It recommends implementing an Active Directory domain to enable communication and file sharing across the three locations. A centralized file server would store common files and applications. Each location would have its own local area network, connected to the other sites and to the internet via VPN. Firewalls, antivirus software, and regular backups would help secure the network and protect company data. The design allows for future growth and expansion as the company scales up.
Driving Business Innovation: Latest Generative AI Advancements & Success Story - Safe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ... - alexjohnson7307
Predictive maintenance is a proactive approach that anticipates equipment failures before they happen. At the forefront of this innovative strategy is Artificial Intelligence (AI), which brings unprecedented precision and efficiency. AI in predictive maintenance is transforming industries by reducing downtime, minimizing costs, and enhancing productivity.
Similar to String Comparison Surprises: Did Postgres lose my data?
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/temporal-event-neural-networks-a-more-efficient-alternative-to-the-transformer-a-presentation-from-brainchip/
Chris Jones, Director of Product Management at BrainChip , presents the “Temporal Event Neural Networks: A More Efficient Alternative to the Transformer” tutorial at the May 2024 Embedded Vision Summit.
The expansion of AI services necessitates enhanced computational capabilities on edge devices. Temporal Event Neural Networks (TENNs), developed by BrainChip, represent a novel and highly efficient state-space network. TENNs demonstrate exceptional proficiency in handling multi-dimensional streaming data, facilitating advancements in object detection, action recognition, speech enhancement and language model/sequence generation. Through the utilization of polynomial-based continuous convolutions, TENNs streamline models, expedite training processes and significantly diminish memory requirements, achieving notable reductions of up to 50x in parameters and 5,000x in energy consumption compared to prevailing methodologies like transformers.
Integration with BrainChip’s Akida neuromorphic hardware IP further enhances TENNs’ capabilities, enabling the realization of highly capable, portable and passively cooled edge devices. This presentation delves into the technical innovations underlying TENNs, presents real-world benchmarks, and elucidates how this cutting-edge approach is positioned to revolutionize edge AI across diverse applications.
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
5th LF Energy Power Grid Model Meet-up SlidesDanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Mircosoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid -Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframePrecisely
Inconsistent user experience and siloed data, high costs, and changing customer expectations – Citizens Bank was experiencing these challenges while it was attempting to deliver a superior digital banking experience for its clients. Its core banking applications run on the mainframe and Citizens was using legacy utilities to get the critical mainframe data to feed customer-facing channels, like call centers, web, and mobile. Ultimately, this led to higher operating costs (MIPS), delayed response times, and longer time to market.
Ever-changing customer expectations demand more modern digital experiences, and the bank needed to find a solution that could provide real-time data to its customer channels with low latency and operating costs. Join this session to learn how Citizens is leveraging Precisely to replicate mainframe data to its customer channels and deliver on their “modern digital bank” experiences.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfChart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Best 20 SEO Techniques To Improve Website Visibility In SERP
String Comparison Surprises: Did Postgres lose my data?
1. String Comparison Surprises:
Did Postgres Lose My Data?
Putting words in order
without losing your mind or your data
Jeremy Schneider
2. Jeremy Schneider
DB & Perf Engineer / AWS
Slack: pgtreats.info/slack-invite
Blog: ardentperf.com
Twitter/X: @jer_s
Organizer / Seattle PG User Group
• Engineering for Aurora and RDS Open Source: Aurora PostgreSQL first GA release (2017), ICU & collation, data durability & corruption protection, performance, fleet data collection and analysis, early logical replication work, new major versions, addressing complex operational issues, recruiting & training
• Launched an internal Amazon-wide dedicated PostgreSQL chat channel in 2018 which now has thousands of members and numerous technical discussions daily
• Founder of "RAC Attack", a community-driven Oracle cluster database workshop – almost 40 events across 15 countries between 2011 and 2016
• Participant, speaker and/or organizer at Linux and Oracle user groups & conferences since the early 2000s; PostgreSQL user groups & conferences since 2017
• Database blog since 2007, Oracle ACE Alumni, Oak Table
7. #PASSDataCommunitySummit
create table arabic_dictionary_research (
word text,
crossreferences text,
notes text
) partition by range (word);
create table arabic_dictionary_research_p1 partition of arabic_dictionary_research
for values from ('ا') to ('ح');
create table arabic_dictionary_research_p2 partition of arabic_dictionary_research
for values from ('ح') to ('س');
create table arabic_dictionary_research_p3 partition of arabic_dictionary_research
for values from ('س') to ('ل');
create table arabic_dictionary_research_p4 partition of arabic_dictionary_research
for values from ('ل') to ('ے');
create table arabic_dictionary_research_p5 partition of arabic_dictionary_research
default;
14.
aws ec2 run-instances \
  --instance-type t2.micro --key-name mac --tag-specifications \
  'ResourceType=instance,Tags=[{Key=Name,Value=research-db-hotstandby}]' \
  --image-id ami-0fd2c44049dd805b8 --region us-east-1

sudo apt install postgresql-common
sudo sh /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh
sudo apt install postgresql-15

# cut and paste instructions from
# https://ubuntu.com/server/docs/databases-postgresql
# to easily set up the hot standby database
19.
• Verify Backup and Log File Retention (long enough for investigation)
• Articulate and Write the Business Impact at Present
• Freeze Ongoing Changes (any dev teams)
• Inventory Copies of Data
• Safely Scan to Determine If There’s More Corruption
• Follow General Best Practices
• Two-person rule, rename/move not delete, verify/compare healthy neighboring data,
test remediations before applying on prod, document everything.
Checklist for Responding to Data Corruption
https://ardentperf.com/2019/11/08/postgresql-invalid-page-and-checksum-verification-failed/
30.
1978: JIS C 6226 (JIS X 0208), a “2-byte” character set, was developed to express hiragana and kanji characters (Japan); followed by GB/T 2312 (1980, China) and Big5 (1984, Taiwan)
1981: Xerox 8010 Information System released; the family included 16-bit encodings and 27 languages (sadly, it didn’t sell)
1985–1987: Discussions between Xerox and Apple engineers about the need for “one universal character set”, soon joined by engineers from many other companies (IBM, Microsoft, Sun, NeXT, Novell, etc.) as well as the University of Toronto
1991: Unicode Consortium incorporated to “standardize, extend and promote the Unicode character encoding, a fixed-width, 16-bit character encoding for over 60,000 graphic characters”
1991: First Unicode Standard is published
“…begin at 0 and add the next character”
43.
Putting Strings In Order
banqueta
baño
Baptisto
como
chorizo
Baptisto
banqueta
baño
chorizo
como
baño
banqueta
Baptisto
chorizo
como
select * from (values ('Baptisto'),('banqueta'),('baño'),('como'),('chorizo')) list(word)
order by word;
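One of the three orderings above — “Baptisto” first, “chorizo” before “como” — is plain codepoint order, which Python’s built-in sort reproduces without any locale library (a minimal sketch, independent of PostgreSQL):

```python
# Plain codepoint comparison — roughly what the "C" locale does.
# Uppercase letters sort before all lowercase letters, and "ñ" (U+00F1)
# sorts after every ASCII letter, pushing "baño" after "banqueta".
words = ['Baptisto', 'banqueta', 'baño', 'como', 'chorizo']
print(sorted(words))
# → ['Baptisto', 'banqueta', 'baño', 'chorizo', 'como']
```

The other orderings require linguistic collation rules, which is exactly where libc and ICU come in.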
44.
• Contractions: two (or more) characters sort as if they were a single base letter. In Table 4, CH acts like a single letter sorted after C.
• Expansions: a single character sorts as if it were a sequence of two (or more) characters. In Table 4, an Œ ligature sorts as if it were the sequence of O + E.
• Backwards Accents: in row 1 of Table 5, the first accent difference is on the o, so that is what determines the order. In some French dictionary ordering traditions, however, it is the last accent difference that determines the order, as shown in row 2.
Linguistic Collation is Complex
https://www.unicode.org/reports/tr10/
https://www.cybertec-postgresql.com/en/case-insensitive-pattern-matching-in-postgresql/
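A contraction like the traditional Spanish CH can be mimicked with a custom sort key. This is a toy sketch only — the ranks and the `traditional_spanish_key` name are invented here; real linguistic collation uses the multi-level Unicode Collation Algorithm:

```python
def traditional_spanish_key(word):
    # Toy sort key treating "ch" as a single letter ranked between
    # "c" and "d" — a hypothetical simplification of traditional
    # Spanish collation (real collation uses the multi-level UCA).
    ranks = {c: i * 2 for i, c in enumerate('abcdefghijklmnopqrstuvwxyz')}
    ranks['ch'] = ranks['c'] + 1  # contraction: one unit, after plain "c"
    key, i, w = [], 0, word.lower()
    while i < len(w):
        if w[i:i + 2] == 'ch':
            key.append(ranks['ch'])
            i += 2
        else:
            key.append(ranks.get(w[i], -1))
            i += 1
    return key

print(sorted(['como', 'chorizo', 'cubo'], key=traditional_spanish_key))
# → ['como', 'cubo', 'chorizo']
```

Note how “chorizo” lands after every other c-word, even “cubo” — the opposite of what plain codepoint comparison gives.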
47.
• 6.1 (1997) – FAQ instructions to set locale as user running PG (for formatting)
• 6.3.1 (1998) – Multibyte support at database level
• 7.3 (2002) – Multibyte & locale by default, remove --enable-multibyte
• 8.4 (2009) – Locale (LC_COLLATE) at database level
• 9.1 (2011) – Support collation per-column and COLLATE clause
• 10 (2017) – Support ICU at column level
• 13 (2020) – Track collation-related versions for some Operating Systems
• 15 (2022) – Support ICU collation at database/cluster level
• 16 (2023) – Support custom ICU collation rules, build ICU by default
• Future – Discussion: multiple ICU versions, initdb use ICU by default
Tatsuo Ishii, Oleg Bartunov, Oleg Broytmann, Josef Balatka, Radek Strnad, Heikki Linnakangas, Tom Lane, Peter Eisentraut, Jeff Davis, and more…
Localization Support in PostgreSQL
48.
• Formatting in TO_CHAR, TO_NUMBER, etc
• Automatic “recoding” of data and query results to client encoding
• Encoding and language/translation of messages (error, warning, etc)
• Character classification (letter, number, punctuation, whitespace, etc)
Flexible Collation:
• ORDER BY
• LIKE, regex
• =
• Partitions, Constraints, Generated Columns, Triggers…
• Any expression that involves strings (cf. COLLATE sql clause)
Localization Support in PostgreSQL
50. PostgreSQL does not include its own
collation code. Instead, it depends on
an external library installed and
managed separately.
PostgreSQL Collation Libraries
53.
Levels of Defaults:
• OS Environment (for initdb)
• Template0/1 (for database)
• Database
• Table/Column
• Data Type (for constants)
• Explicit in SQL statement
Collation Precedence in PostgreSQL
Conflict Resolution Rules:
1. Explicit > Implicit
2. Non-default > Default
3. Indeterminate collation only raises an error if collation is needed at runtime
Docs: Part III (Server Admin), Chapter 24 (Localization), Section 24.2 (Collation Support)
54.
• Case-insensitive comparison
• Comparison of base characters, ignoring accents
• Example: count rows where user input was Mexico, México, mexico, or méxico
• Compare digits by numeric value
• Example: id-45 < id-123
• Ignore whitespace, so that similar strings are kept close together
• By default glibc keeps similar strings close, but with ICU, whitespace can cause similar strings to sort far apart from each other
• Example: “full time” and “full-time” and “fulltime”
• May get extra performance by comparing without normalizing
• Safe for strings that are system-generated and guaranteed to be consistent, or that are pre-normalized
Advanced Collation Support with ICU
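The first two features — case- and accent-insensitive comparison, and numeric digit comparison — can be approximated in plain Python with the standard library. This is a rough sketch of what an ICU collation does, not the real algorithm, and the function names are invented:

```python
import re
import unicodedata

def insensitive_key(s):
    # Case- and accent-insensitive key: casefold, decompose (NFD),
    # then drop combining marks. A toy analogue of an ICU
    # "primary strength" comparison.
    decomposed = unicodedata.normalize('NFD', s.casefold())
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

def numeric_key(s):
    # Compare runs of digits by numeric value, so 'id-45' < 'id-123'
    # (similar in spirit to ICU's numeric ordering).
    return [int(t) if t.isdigit() else t for t in re.split(r'(\d+)', s)]

assert insensitive_key('México') == insensitive_key('mexico')
assert sorted(['id-123', 'id-45'], key=numeric_key) == ['id-45', 'id-123']
```

In PostgreSQL itself, these behaviors are configured declaratively on an ICU collation rather than coded by hand.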
58.
Version management remains a significant challenge for collation:
• With both libc and ICU, changing the version of the library can cause corruption that isn’t noticed until long afterwards.
• For OS upgrades, it’s safe to use dump-and-load or logical replication.
• It’s safe to install the old ICU version on a new OS, but you may need to build it from source.
• The “ALTER … REFRESH COLLATION” command is dangerous: it does not change anything; it instructs the database to permanently forget the warning.
• On a version mismatch, PostgreSQL unfortunately issues a WARNING instead of an ERROR, perhaps only to the server logs. It is the administrator’s responsibility to know that the OS/libraries shouldn’t change, and that a change means possible corruption.
Collation Challenges in PostgreSQL
59.
With both libc and ICU, changing the version of the library can cause corruption that isn’t noticed until long afterwards.
Can trigger a version change:
• OS Upgrade
• Failover and Hot Standby
• Patroni, Kubernetes, etc.
• Distributed Systems
Can be corrupted by a version change:
• Indexes – all types, not just btree
• Constraints – all types, not just unique/primary-key
• Partitions
• FDWs – e.g. mergejoin depends on the same local/remote ordering
• Maybe: un-refreshed materialized views, triggers, generated columns? (I’m not sure)
Collation Challenges in PostgreSQL
71.
Collation Torture Test
Data to answer the questions:
Is this really a problem?
How common are sort order changes?
• 10 years of historical versions
• Ubuntu and RHEL
• ICU and glibc
github.com/ardentperf/glibc-unicode-sorting
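The core comparison in such a torture test — did two library versions order the same strings differently? — can be sketched like this. The key functions below are arbitrary stand-ins for two collation versions, not the actual test harness from the repository:

```python
def count_order_changes(strings, key_v1, key_v2):
    # Sort the same strings under two "collation versions" and count
    # adjacent pairs (in v1 order) whose relative order flips under
    # v2 — a rough proxy for how much the sort order changed.
    v1 = sorted(strings, key=key_v1)
    rank_v2 = {s: i for i, s in enumerate(sorted(strings, key=key_v2))}
    return sum(1 for x, y in zip(v1, v1[1:]) if rank_v2[x] > rank_v2[y])

# Codepoint order vs. case-folded order as stand-in "versions":
print(count_order_changes(['a', 'B', 'b', 'A'], lambda s: s, str.casefold))
# → 1
```

Any nonzero count means an index built under one version is in the wrong order under the other.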
74.
• In summary: both glibc and ICU have regular collation changes. Both have had at least one release with a very large number of changes.
• The code to generate the 26 million test strings (a PL/pgSQL snippet) is published on GitHub:
https://github.com/ardentperf/glibc-unicode-sorting/blob/main/run-icu.sh#L65
Collation Torture Test
79. Key Takeaways
• Use collation features for more readable and elegant SQL when doing fuzzy comparisons or multi-lingual sorting
• ICU brings powerful new capabilities around linguistic collation
• Assume there are exotic, unexpected characters in your data
• Move toward using ICU in PostgreSQL (for both safety and capability)
• When upgrading your operating system: (1) dump-and-load, (2) use logical replication, or (3) use the old ICU version
• It’s not a great idea to under-pay your administrators. Give them lots of thanks and some extra vacation time. 🏝