Speaker: Joe Conway
There are many use cases for text search and pattern matching, and there are also a wide variety of techniques available in PostgreSQL to perform text search and pattern matching. Figuring out the best "match" between use case and technique can be confusing. This talk will review the possibilities and provide guidance regarding when to use what method, and especially how to properly deal with the related index methods to ensure speedy searches. This talk covers:
* The primary available search methods
* Examples illustrating when to use each
* Extensive discussion of index use
* Timing comparisons using realistic examples
Text mining and social network analysis of twitter data part 1Johan Blomme
Twitter is one of the most popular social networks through which millions of users share information and express views and opinions. The rapid growth of internet data is a driver for mining the huge amount of unstructured data that is generated to uncover insights from it.
In the first part of this paper we explore different text mining tools. We collect tweets containing the “#MachineLearning” hashtag, prepare the data and run a series of diagnostics to mine the text that is contained in tweets. We also examine the issue of topic modeling that allows to estimate the similarity between documents in a larger corpus.
Birds of a feather session on Nagios Plugins New Threshold Specification Syntax.
The presentation was given during the Nagios World Conference North America held Sept 20-Oct 2nd, 2013 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna
Natural Language Processing in R (rNLP)fridolin.wild
The introductory slides of a workshop given to the doctoral school at the Institute of Business Informatics of the Goethe University Frankfurt. The tutorials are available on http://crunch.kmi.open.ac.uk/w/index.php/Tutorials
Text mining and social network analysis of twitter data part 1Johan Blomme
Twitter is one of the most popular social networks through which millions of users share information and express views and opinions. The rapid growth of internet data is a driver for mining the huge amount of unstructured data that is generated to uncover insights from it.
In the first part of this paper we explore different text mining tools. We collect tweets containing the “#MachineLearning” hashtag, prepare the data and run a series of diagnostics to mine the text that is contained in tweets. We also examine the issue of topic modeling that allows to estimate the similarity between documents in a larger corpus.
Birds of a feather session on Nagios Plugins New Threshold Specification Syntax.
The presentation was given during the Nagios World Conference North America held Sept 20-Oct 2nd, 2013 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna
Natural Language Processing in R (rNLP)fridolin.wild
The introductory slides of a workshop given to the doctoral school at the Institute of Business Informatics of the Goethe University Frankfurt. The tutorials are available on http://crunch.kmi.open.ac.uk/w/index.php/Tutorials
Question 1 1 pts Skip to question text.As part of a bank account.docxamrit47
Question 1 1 pts Skip to question text.
As part of a bank account implementation, there is an account class and a checking account class. These two classes should be related by:
polymorphism
abstract classes
both composition and inheritance
inheritance
composition
Flag this Question
Question 2 1 pts
When using OOP, which of the following terms refers to a mechanism for a behavior, basically how it’s implemented?
composition
inheritance
polymorphism
dynamic binding
Flag this Question
Question 3 1 pts
To access an element in an Array object,
Use the ArrayList's get() method.
Use the ArrayList's element() method.
Individual elements in an ArrayList can’t be accessed without doing a sequential query getSequential(), returning every element up to and including the element requested.
Use square brackets around an index value.
Flag this Question
Question 4 1 pts
To access an element in an ArrayList object,
Use square brackets around an index value.
Use the ArrayList element() method.
Use the ArrayList get() method.
Individual elements in an ArrayList can’t be accessed without doing a sequential query getSequential(), returning every element up to and including the element requested.
Flag this Question
Question 5 1 pts
What term below is defined as a message that tells the program that something has happened?
an interaction
a listener
an action
an event
Flag this Question
Question 6 1 pts
Which item below is defined as an object?
A String
An Array
All of the above
An ArrayList
Flag this Question
Question 7 1 pts
When a text-box-enter event occurs, which method and parameter are required to handle this type of action? (1 point)
actionEvent with an actionperformed parameter
actionListener with an interfaceID parameter
windowListener with an eventID parameter
actionPerformed with an actionEvent parameter
Flag this Question
Question 8 1 pts
Which class includes the setTitle and setSize methods?
JFrame
JWindow
JBox
JOptionpane
Flag this Question
Question 9 1 pts
Before utilizing the binary search method, __________ must be done to the array?
indexing
splitting
sorting
importing
Flag this Question
Question 10 1 pts
Which layout manager implements a one-compartment layout scheme?
GridlessLayout
GridBagLayout
GridLayout
FlowLayout
BorderLayout
Flag this Question
Question 11 1 pts
What is the default layout manager for a JFrame window?
GridBagLayout
GridLayout
FlowLayout
GridlessLayout
BorderLayout
Flag this Question
Question 12 1 pts
To call the superclass constructor, super() must be the first line in a constructor.
True
False
Flag this Question
Question 13 1 pts
Method overriding is when a method has the same name, same sequence of parameter types, and the same return type as a method in a superclass.
True
False
Flag this Question
Question 14 1 pts
Type casting, also known as promotion is ...
/Regex makes me want to (weep|give up|(╯°□°)╯︵ ┻━┻)\.?/ibrettflorio
REGEX! Love it or hate it, sometimes you actually need it. And when that time comes, there's no reason to be afraid or to ask help from that one weirdo on your team who actually loves regular expressions. (I'm that weirdo, fwiw.)
This session is geared towards beginning and intermediate regex users, as well as experienced programmers and developers who just don't really grok regex. We'll cover the following topics using practical examples that you might encounter in your own projects. (ie. No matching against "dog" and "cat".)
* What is regex? How's it work? A brief history.
* Syntax, special characters, character classes
* Grouping, capturing, and common gotchas
* Use cases for matching, validating, and replacing
* More advanced topics like backreferences and lookarounds
COMM 166 Final Research Proposal GuidelinesThe proposal should.docxmonicafrancis71118
COMM 166 Final Research Proposal Guidelines
The proposal should contain well-developed sections (Put clear titles on the top of each section) of your outline that you submitted earlier. The proposal should have seven (7) major sections:
1. Introduction: A brief overview of all your sections. Approx. one page
2. A summary of the literature review. In this section you would summarize the previous research (summarize at least 8-10 scholarly research articles), and also your field data collection results (if it was connected to your proposal topic). Also indicate the gaps in the previous research, including your pilot study, and the need for your research study. Please devote around three pages in reviewing the previous research and finding the gaps.
3. Arising from the literature review, write the Purpose Statement of your research (purpose statement should have all its parts clearly written. Follow the examples from textbook).
4. Identify two to three main hypotheses or research questions (based on the quantitative/qualitative research design). Also give some of your supporting research questions. Follow the examples from textbook.
5. Describe the research strategy of inquiry and methods that you would use and why. The method part should be the substantial part of your paper, around three pages. Define your knowledge claims, strategies, and methods from the textbook (and cite), why you chose them, and how you will conduct the research in detail.
6. A page on the significance of your study.
7. A complete reference list of your sources in APA style.
The total length of the paper should be between 8-10 pages (excluding the reference and cover pages).
If you have further questions, please do not hesitate to contact me.
Best wishes
Dev
mportant notes about grading:
1. Compiler errors: All code you submit must compile. Programs that do not compile will receive an automatic zero. If you run out of time, it is better to comment out the parts that do not compile, than hand in a more complete file that does not compile.
2. Late assignments: You must submit your code before the deadline. Verify on Sakai that you have submitted the correct version. If you submit the incorrect version before the deadline and realize that you have done so after the deadline, we will only grade the version received before the deadline.
A Prolog interpreter
In this project, you will implement a Prolog interpreter in OCaml.
If you want to implement the project in Python, download the source code here and follow the README file. Parsing functions and test-cases are provided.
Pseudocode
Your main task is to implement the non-deterministic abstract interpreter covered in the lecture Control in Prolog. The pseudocode of the abstract interpreter is in the lecture note.
Bonus
There is also a bonus task for implementing a deterministic Prolog interpreter with support for backtracking (recover from bad choices) and choice points (produce multiple results). Please refer t.
COMM 166 Final Research Proposal GuidelinesThe proposal should.docxcargillfilberto
COMM 166 Final Research Proposal Guidelines
The proposal should contain well-developed sections (Put clear titles on the top of each section) of your outline that you submitted earlier. The proposal should have seven (7) major sections:
1. Introduction: A brief overview of all your sections. Approx. one page
2. A summary of the literature review. In this section you would summarize the previous research (summarize at least 8-10 scholarly research articles), and also your field data collection results (if it was connected to your proposal topic). Also indicate the gaps in the previous research, including your pilot study, and the need for your research study. Please devote around three pages in reviewing the previous research and finding the gaps.
3. Arising from the literature review, write the Purpose Statement of your research (purpose statement should have all its parts clearly written. Follow the examples from textbook).
4. Identify two to three main hypotheses or research questions (based on the quantitative/qualitative research design). Also give some of your supporting research questions. Follow the examples from textbook.
5. Describe the research strategy of inquiry and methods that you would use and why. The method part should be the substantial part of your paper, around three pages. Define your knowledge claims, strategies, and methods from the textbook (and cite), why you chose them, and how you will conduct the research in detail.
6. A page on the significance of your study.
7. A complete reference list of your sources in APA style.
The total length of the paper should be between 8-10 pages (excluding the reference and cover pages).
If you have further questions, please do not hesitate to contact me.
Best wishes
Dev
mportant notes about grading:
1. Compiler errors: All code you submit must compile. Programs that do not compile will receive an automatic zero. If you run out of time, it is better to comment out the parts that do not compile, than hand in a more complete file that does not compile.
2. Late assignments: You must submit your code before the deadline. Verify on Sakai that you have submitted the correct version. If you submit the incorrect version before the deadline and realize that you have done so after the deadline, we will only grade the version received before the deadline.
A Prolog interpreter
In this project, you will implement a Prolog interpreter in OCaml.
If you want to implement the project in Python, download the source code and follow the README file. Parsing functions and test-cases are provided.
Pseudocode
Your main task is to implement the non-deterministic abstract interpreter covered in the lecture Control in Prolog. The pseudocode of the abstract interpreter is in the lecture note.
Bonus
There is also a bonus task for implementing a deterministic Prolog interpreter with support for backtracking (recover from bad choices) and choice points (produce multiple results). Please refer to th.
COMM 166 Final Research Proposal GuidelinesThe proposal should.docxdrandy1
COMM 166 Final Research Proposal Guidelines
The proposal should contain well-developed sections (Put clear titles on the top of each section) of your outline that you submitted earlier. The proposal should have seven (7) major sections:
1. Introduction: A brief overview of all your sections. Approx. one page
2. A summary of the literature review. In this section you would summarize the previous research (summarize at least 8-10 scholarly research articles), and also your field data collection results (if it was connected to your proposal topic). Also indicate the gaps in the previous research, including your pilot study, and the need for your research study. Please devote around three pages in reviewing the previous research and finding the gaps.
3. Arising from the literature review, write the Purpose Statement of your research (purpose statement should have all its parts clearly written. Follow the examples from textbook).
4. Identify two to three main hypotheses or research questions (based on the quantitative/qualitative research design). Also give some of your supporting research questions. Follow the examples from textbook.
5. Describe the research strategy of inquiry and methods that you would use and why. The method part should be the substantial part of your paper, around three pages. Define your knowledge claims, strategies, and methods from the textbook (and cite), why you chose them, and how you will conduct the research in detail.
6. A page on the significance of your study.
7. A complete reference list of your sources in APA style.
The total length of the paper should be between 8-10 pages (excluding the reference and cover pages).
If you have further questions, please do not hesitate to contact me.
Best wishes
Dev
mportant notes about grading:
1. Compiler errors: All code you submit must compile. Programs that do not compile will receive an automatic zero. If you run out of time, it is better to comment out the parts that do not compile, than hand in a more complete file that does not compile.
2. Late assignments: You must submit your code before the deadline. Verify on Sakai that you have submitted the correct version. If you submit the incorrect version before the deadline and realize that you have done so after the deadline, we will only grade the version received before the deadline.
A Prolog interpreter
In this project, you will implement a Prolog interpreter in OCaml.
If you want to implement the project in Python, download the source code and follow the README file. Parsing functions and test-cases are provided.
Pseudocode
Your main task is to implement the non-deterministic abstract interpreter covered in the lecture Control in Prolog. The pseudocode of the abstract interpreter is in the lecture note.
Bonus
There is also a bonus task for implementing a deterministic Prolog interpreter with support for backtracking (recover from bad choices) and choice points (produce multiple results). Please refer to th.
JavaScript - Chapter 9 - TypeConversion and Regular Expressions WebStackAcademy
Type Conversion:
JavaScript is loosely typed language and most of the time operators automatically convert a value to the right type but there are also cases when we need to explicitly do type conversions.
While JavaScript provides numerous ways to convert data from one type to another but there are two most common data conversions :
Converting Values to String
Converting Values to Numbers
Regular Expressions:
A regular expression is an object that describes a pattern of characters.
The JavaScript RegExp class represents regular expressions, and both String and RegExp define methods that use regular expressions to perform powerful pattern-matching and search-and-replace functions on text.
c language faq's contains the frequently asked questions in c language and useful for all c programmers. This is an pdf document of Sterve_summit who is one of the best writer for c language
PGConf APAC 2018: Sponsored Talk by Fujitsu - The growing mandatory requireme...PGConf APAC
Speaker: Rajni Baliyan
As the volume of data of a personal nature and commodification of information collected and analysed increases; so is the focus on privacy and data security. Many countries are examining international and domestic laws in order to protect consumers and organisations alike.
The Australian Senate has recently passed a bill containing mandatory requirements to notify the privacy commissioner and consumers when data is at risk of causing serious harm in the case of a data breach occurring.
Europe has also announced new laws that allow consumers more control over their data. These laws allow consumers to tell companies to erase any data held about them.
These new laws will have a significant impact on organisations that store personal information.
This talk will examine some of these legislative changes and how specific PostgreSQL features can assist organisations in meeting their obligations and avoid heavy fines associated with breaching them.
While the physical replication in PostgreSQL is quite robust, however, it doesn’t fit well in the picture when:
- You need partial replication only
- You want to replicate between different major versions of PostgreSQL
- You need to replicate multiple databases to the same target
- Transformation of the data is needed
- You want to replicate in order to upgrade without downtime
The answer to these use cases is logical replication
This talk will discuss and cover these use cases followed by a logical replication demo.
More Related Content
Similar to PGConf APAC 2018 - Where's Waldo - Text Search and Pattern in PostgreSQL
Question 1 1 pts Skip to question text.As part of a bank account.docxamrit47
Question 1 1 pts Skip to question text.
As part of a bank account implementation, there is an account class and a checking account class. These two classes should be related by:
polymorphism
abstract classes
both composition and inheritance
inheritance
composition
Flag this Question
Question 2 1 pts
When using OOP, which of the following terms refers to a mechanism for a behavior, basically how it’s implemented?
composition
inheritance
polymorphism
dynamic binding
Flag this Question
Question 3 1 pts
To access an element in an Array object,
Use the ArrayList's get() method.
Use the ArrayList's element() method.
Individual elements in an ArrayList can’t be accessed without doing a sequential query getSequential(), returning every element up to and including the element requested.
Use square brackets around an index value.
Flag this Question
Question 4 1 pts
To access an element in an ArrayList object,
Use square brackets around an index value.
Use the ArrayList element() method.
Use the ArrayList get() method.
Individual elements in an ArrayList can’t be accessed without doing a sequential query getSequential(), returning every element up to and including the element requested.
Flag this Question
Question 5 1 pts
What term below is defined as a message that tells the program that something has happened?
an interaction
a listener
an action
an event
Flag this Question
Question 6 1 pts
Which item below is defined as an object?
A String
An Array
All of the above
An ArrayList
Flag this Question
Question 7 1 pts
When a text-box-enter event occurs, which method and parameter are required to handle this type of action? (1 point)
actionEvent with an actionperformed parameter
actionListener with an interfaceID parameter
windowListener with an eventID parameter
actionPerformed with an actionEvent parameter
Flag this Question
Question 8 1 pts
Which class includes the setTitle and setSize methods?
JFrame
JWindow
JBox
JOptionpane
Flag this Question
Question 9 1 pts
Before utilizing the binary search method, __________ must be done to the array?
indexing
splitting
sorting
importing
Flag this Question
Question 10 1 pts
Which layout manager implements a one-compartment layout scheme?
GridlessLayout
GridBagLayout
GridLayout
FlowLayout
BorderLayout
Flag this Question
Question 11 1 pts
What is the default layout manager for a JFrame window?
GridBagLayout
GridLayout
FlowLayout
GridlessLayout
BorderLayout
Flag this Question
Question 12 1 pts
To call the superclass constructor, super() must be the first line in a constructor.
True
False
Flag this Question
Question 13 1 pts
Method overriding is when a method has the same name, same sequence of parameter types, and the same return type as a method in a superclass.
True
False
Flag this Question
Question 14 1 pts
Type casting, also known as promotion is ...
/Regex makes me want to (weep|give up|(╯°□°)╯︵ ┻━┻)\.?/ibrettflorio
REGEX! Love it or hate it, sometimes you actually need it. And when that time comes, there's no reason to be afraid or to ask help from that one weirdo on your team who actually loves regular expressions. (I'm that weirdo, fwiw.)
This session is geared towards beginning and intermediate regex users, as well as experienced programmers and developers who just don't really grok regex. We'll cover the following topics using practical examples that you might encounter in your own projects. (ie. No matching against "dog" and "cat".)
* What is regex? How's it work? A brief history.
* Syntax, special characters, character classes
* Grouping, capturing, and common gotchas
* Use cases for matching, validating, and replacing
* More advanced topics like backreferences and lookarounds
COMM 166 Final Research Proposal GuidelinesThe proposal should.docxmonicafrancis71118
COMM 166 Final Research Proposal Guidelines
The proposal should contain well-developed sections (Put clear titles on the top of each section) of your outline that you submitted earlier. The proposal should have seven (7) major sections:
1. Introduction: A brief overview of all your sections. Approx. one page
2. A summary of the literature review. In this section you would summarize the previous research (summarize at least 8-10 scholarly research articles), and also your field data collection results (if it was connected to your proposal topic). Also indicate the gaps in the previous research, including your pilot study, and the need for your research study. Please devote around three pages in reviewing the previous research and finding the gaps.
3. Arising from the literature review, write the Purpose Statement of your research (purpose statement should have all its parts clearly written. Follow the examples from textbook).
4. Identify two to three main hypotheses or research questions (based on the quantitative/qualitative research design). Also give some of your supporting research questions. Follow the examples from textbook.
5. Describe the research strategy of inquiry and methods that you would use and why. The method part should be the substantial part of your paper, around three pages. Define your knowledge claims, strategies, and methods from the textbook (and cite), why you chose them, and how you will conduct the research in detail.
6. A page on the significance of your study.
7. A complete reference list of your sources in APA style.
The total length of the paper should be between 8-10 pages (excluding the reference and cover pages).
If you have further questions, please do not hesitate to contact me.
Best wishes
Dev
mportant notes about grading:
1. Compiler errors: All code you submit must compile. Programs that do not compile will receive an automatic zero. If you run out of time, it is better to comment out the parts that do not compile, than hand in a more complete file that does not compile.
2. Late assignments: You must submit your code before the deadline. Verify on Sakai that you have submitted the correct version. If you submit the incorrect version before the deadline and realize that you have done so after the deadline, we will only grade the version received before the deadline.
A Prolog interpreter
In this project, you will implement a Prolog interpreter in OCaml.
If you want to implement the project in Python, download the source code here and follow the README file. Parsing functions and test-cases are provided.
Pseudocode
Your main task is to implement the non-deterministic abstract interpreter covered in the lecture Control in Prolog. The pseudocode of the abstract interpreter is in the lecture note.
Bonus
There is also a bonus task for implementing a deterministic Prolog interpreter with support for backtracking (recover from bad choices) and choice points (produce multiple results). Please refer t.
COMM 166 Final Research Proposal GuidelinesThe proposal should.docxcargillfilberto
COMM 166 Final Research Proposal Guidelines
The proposal should contain well-developed sections (Put clear titles on the top of each section) of your outline that you submitted earlier. The proposal should have seven (7) major sections:
1. Introduction: A brief overview of all your sections. Approx. one page
2. A summary of the literature review. In this section you would summarize the previous research (summarize at least 8-10 scholarly research articles), and also your field data collection results (if it was connected to your proposal topic). Also indicate the gaps in the previous research, including your pilot study, and the need for your research study. Please devote around three pages in reviewing the previous research and finding the gaps.
3. Arising from the literature review, write the Purpose Statement of your research (purpose statement should have all its parts clearly written. Follow the examples from textbook).
4. Identify two to three main hypotheses or research questions (based on the quantitative/qualitative research design). Also give some of your supporting research questions. Follow the examples from textbook.
5. Describe the research strategy of inquiry and methods that you would use and why. The method part should be the substantial part of your paper, around three pages. Define your knowledge claims, strategies, and methods from the textbook (and cite), why you chose them, and how you will conduct the research in detail.
6. A page on the significance of your study.
7. A complete reference list of your sources in APA style.
The total length of the paper should be between 8-10 pages (excluding the reference and cover pages).
If you have further questions, please do not hesitate to contact me.
Best wishes
Dev
mportant notes about grading:
1. Compiler errors: All code you submit must compile. Programs that do not compile will receive an automatic zero. If you run out of time, it is better to comment out the parts that do not compile, than hand in a more complete file that does not compile.
2. Late assignments: You must submit your code before the deadline. Verify on Sakai that you have submitted the correct version. If you submit the incorrect version before the deadline and realize that you have done so after the deadline, we will only grade the version received before the deadline.
A Prolog interpreter
In this project, you will implement a Prolog interpreter in OCaml.
If you want to implement the project in Python, download the source code and follow the README file. Parsing functions and test-cases are provided.
Pseudocode
Your main task is to implement the non-deterministic abstract interpreter covered in the lecture Control in Prolog. The pseudocode of the abstract interpreter is in the lecture note.
Bonus
There is also a bonus task for implementing a deterministic Prolog interpreter with support for backtracking (recover from bad choices) and choice points (produce multiple results). Please refer to th.
COMM 166 Final Research Proposal GuidelinesThe proposal should.docxdrandy1
COMM 166 Final Research Proposal Guidelines
The proposal should contain well-developed sections (Put clear titles on the top of each section) of your outline that you submitted earlier. The proposal should have seven (7) major sections:
1. Introduction: A brief overview of all your sections. Approx. one page
2. A summary of the literature review. In this section you would summarize the previous research (summarize at least 8-10 scholarly research articles), and also your field data collection results (if it was connected to your proposal topic). Also indicate the gaps in the previous research, including your pilot study, and the need for your research study. Please devote around three pages in reviewing the previous research and finding the gaps.
3. Arising from the literature review, write the Purpose Statement of your research (purpose statement should have all its parts clearly written. Follow the examples from textbook).
4. Identify two to three main hypotheses or research questions (based on the quantitative/qualitative research design). Also give some of your supporting research questions. Follow the examples from textbook.
5. Describe the research strategy of inquiry and methods that you would use and why. The method part should be the substantial part of your paper, around three pages. Define your knowledge claims, strategies, and methods from the textbook (and cite), why you chose them, and how you will conduct the research in detail.
6. A page on the significance of your study.
7. A complete reference list of your sources in APA style.
The total length of the paper should be between 8-10 pages (excluding the reference and cover pages).
If you have further questions, please do not hesitate to contact me.
Best wishes
Dev
mportant notes about grading:
1. Compiler errors: All code you submit must compile. Programs that do not compile will receive an automatic zero. If you run out of time, it is better to comment out the parts that do not compile, than hand in a more complete file that does not compile.
2. Late assignments: You must submit your code before the deadline. Verify on Sakai that you have submitted the correct version. If you submit the incorrect version before the deadline and realize that you have done so after the deadline, we will only grade the version received before the deadline.
A Prolog interpreter
In this project, you will implement a Prolog interpreter in OCaml.
If you want to implement the project in Python, download the source code and follow the README file. Parsing functions and test-cases are provided.
Pseudocode
Your main task is to implement the non-deterministic abstract interpreter covered in the lecture Control in Prolog. The pseudocode of the abstract interpreter is in the lecture note.
Bonus
There is also a bonus task for implementing a deterministic Prolog interpreter with support for backtracking (recover from bad choices) and choice points (produce multiple results). Please refer to th.
JavaScript - Chapter 9 - TypeConversion and Regular Expressions WebStackAcademy
Type Conversion:
JavaScript is loosely typed language and most of the time operators automatically convert a value to the right type but there are also cases when we need to explicitly do type conversions.
While JavaScript provides numerous ways to convert data from one type to another but there are two most common data conversions :
Converting Values to String
Converting Values to Numbers
Regular Expressions:
A regular expression is an object that describes a pattern of characters.
The JavaScript RegExp class represents regular expressions, and both String and RegExp define methods that use regular expressions to perform powerful pattern-matching and search-and-replace functions on text.
c language faq's contains the frequently asked questions in c language and useful for all c programmers. This is an pdf document of Sterve_summit who is one of the best writer for c language
PGConf APAC 2018: Sponsored Talk by Fujitsu - The growing mandatory requireme...PGConf APAC
Speaker: Rajni Baliyan
As the volume of data of a personal nature and commodification of information collected and analysed increases; so is the focus on privacy and data security. Many countries are examining international and domestic laws in order to protect consumers and organisations alike.
The Australian Senate has recently passed a bill containing mandatory requirements to notify the privacy commissioner and consumers when data is at risk of causing serious harm in the case of a data breach occurring.
Europe has also announced new laws that allow consumers more control over their data. These laws allow consumers to tell companies to erase any data held about them.
These new laws will have a significant impact on organisations that store personal information.
This talk will examine some of these legislative changes and how specific PostgreSQL features can assist organisations in meeting their obligations and avoid heavy fines associated with breaching them.
While the physical replication in PostgreSQL is quite robust, however, it doesn’t fit well in the picture when:
- You need partial replication only
- You want to replicate between different major versions of PostgreSQL
- You need to replicate multiple databases to the same target
- Transformation of the data is needed
- You want to replicate in order to upgrade without downtime
The answer to these use cases is logical replication
This talk will discuss and cover these use cases followed by a logical replication demo.
PGConf APAC 2018 - A PostgreSQL DBAs Toolbelt for 2018PGConf APAC
There's no need to re-invent the wheel! Dozens of people have already tried...and succeeded. This talk is a categorized and illustrated overview on most popular and/or useful PostgreSQL specific scripts, utilities and whole toolsets that DBAs should be aware of for solving daily tasks. Inlcuding - performance monitoring, logs management/analyzis, identifying/fixing most common adminstration problems around areas of general performance metrics, tuning, locking, indexing, bloat, leaving out high-availability topics. Covered are venerable oldies from wiki.postgresql.org as well as my newer favourites from Github.
Speaker: Alexander Kukushkin
Kubernetes is a solid leader among different cloud orchestration engines and its adoption rate is growing on a daily basis. Naturally people want to run both their applications and databases on the same infrastructure.
There are a lot of ways to deploy and run PostgreSQL on Kubernetes, but most of them are not cloud-native. Around one year ago Zalando started to run HA setup of PostgreSQL on Kubernetes managed by Patroni. Those experiments were quite successful and produced a Helm chart for Patroni. That chart was useful, albeit a single problem: Patroni depended on Etcd, ZooKeeper or Consul.
Few people look forward to deploy two applications instead of one and support them later on. In this talk I would like to introduce Kubernetes-native Patroni. I will explain how Patroni uses Kubernetes API to run a leader election and store the cluster state. I’m going to live-demo a deployment of HA PostgreSQL cluster on Minikube and share our own experience of running more than 130 clusters on Kubernetes.
Patroni is a Python open-source project developed by Zalando in cooperation with other contributors on GitHub: https://github.com/zalando/patroni
PGConf APAC 2018 - High performance json postgre-sql vs. mongodbPGConf APAC
Speakers: Dominic Dwyer & Wei Shan Ang
This talk was presented in Percona Live Europe 2017. However, we did not have enough time to test against more scenario. We will be giving an updated talk with a more comprehensive tests and numbers. We hope to run it against citusDB and MongoRocks as well to provide a comprehensive comparison.
https://www.percona.com/live/e17/sessions/high-performance-json-postgresql-vs-mongodb
PGConf APAC 2018 - Monitoring PostgreSQL at ScalePGConf APAC
Speaker: Lukas Fittl
Your PostgreSQL database is one of the most important pieces of your architecture - yet the level of introspection available in Postgres is often hard to work with. Its easy to get very detailed information, but what should you really watch out for, send reports on and alert on?
In this talk we'll discuss how query performance statistics can be made accessible to application developers, critical entries one should monitor in the PostgreSQL log files, how to collect EXPLAIN plans at scale, how to watch over autovacuum and VACUUM operations, and how to flag issues based on schema statistics.
We'll also talk a bit about monitoring multi-server setups, first going into high availability and read standbys, logical replication, and then reviewing how monitoring looks like for sharded databases like Citus.
The talk will primarily describe free/open-source tools and statistics views readily available from within Postgres.
PGConf APAC 2018 - Managing replication clusters with repmgr, Barman and PgBo...PGConf APAC
Speaker: Ian Barwick
PostgreSQL and reliability go hand-in-hand - but your data is only truly safe with a solid and trusted backup system in place, and no matter how good your application is, it's useless if it can't talk to your database.
In this talk we'll demonstrate how to set up a reliable replication
cluster using open source tools closely associated with the PostgreSQL project. The talk will cover following areas:
- how to set up and manage a replication cluster with `repmgr`
- how to set up and manage reliable backups with `Barman`
- how to manage failover and application connections with `repmgr` and `PgBouncer`
Ian Barwick has worked for 2ndQuadrant since 2014, and as well as making various contributions to PostgreSQL itself, is lead `repmgr` developer. He lives in Tokyo, Japan.
PGConf APAC 2018 - PostgreSQL HA with Pgpool-II and whats been happening in P...PGConf APAC
Speaker: Muhammad Usama
Pgpool-II has been around to complement PostgreSQL over a decade and provides many features like connection pooling, failover, query caching, load balancing, and HA. High Availability (HA) is very critical to most enterprise application, the clients needs the ability to automatically reconnect with a secondary node when the master nodes goes down.
This is where Pgpool-II watchdog feature comes in, the core feature of Pgpool-II provides HA by eliminating the SPOF is the Watchdog. This watchdog feature has been around for a while but it went through major overhauling and enhancements in recent releases. This talk aims to explain the watchdog feature, the recent enhancements went into the watchdog and describe how it can be used to provide PostgreSQL HA and automatic failover.
Their is rising trend of enterprise deployment shifting to cloud based environment, Pgpool II can be used in the cloud without any issues. In this talk we will give some ideas how Pgpool-II is used to provide PostgreSQL HA in cloud environment.
Finally we will summarise the major features that have been added in the recent major release of Pgpool II and whats in the pipeline for the next major release.
PGConf APAC 2018 - PostgreSQL performance comparison in various cloudsPGConf APAC
Speaker: Oskari Saarenmaa
Aiven PostgreSQL is available in five different public cloud providers' infrastructure in more than 60 regions around the world, including 18 in APAC. This has given us a unique opportunity to benchmark and compare performance of similar configurations in different environments.
We'll share our benchmark methods and results, comparing various PostgreSQL configurations and workloads across different clouds.
About a year ago I was caught up in line-of-fire when a production system started behaving abruptly
- A batch process which would finish in 15minutes started taking 1.5 hours
- We started facing OLTP read queries on standby being cancelled
- We faced a sudden slowness on the Primary server and we were forced to do a forceful switch to standby.
We were able to figure out that some peculiarities of the application code and batch process were responsible for this. But we could not fix the application code (as it is packaged application).
In this talk I would like to share more details of how we debugged, what was the problem we were facing and how we applied a work around for it. We also learnt that a query returning in 10minutes may not be as dangerous as a query returning in 10sec but executed 100s of times in an hour.
I will share in detail-
- How to map the process/top stats from OS with pg_stat_activity
- How to get and read explain plan
- How to judge if a query is costly
- What tools helped us
- A peculiar autovacuum/vacuum Vs Replication conflict we ran into
- Various parameters to tune autvacuum and auto-analyze process
- What we have done to work-around the problem
- What we have put in place for better monitoring and information gathering
This presentation was used by Blair during his talk on Aurora and PostgreSQl compatibility for Aurora at pgDay Asia 2017. The talk was part of dedicated PostgreSQL track at FOSSASIA 2017
PostgreSQL is one of the most loved databases and that is why AWS could not hold back from offering PostgreSQL as RDS. There are some really nice features in RDS which can be good for DBA and inspiring for Enterprises to build resilient solution with PostgreSQL.
This ppt was used by Devrim at pgDay Asia 2017. He talked about some important facts about WAL - Transaction Logs or xlogs in PostgreSQL. Some of these can really come handy on a bad day
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Welcome to the first live UiPath Community Day Dubai! Join us for this unique occasion to meet our local and global UiPath Community and leaders. You will get a full view of the MEA region's automation landscape and the AI Powered automation technology capabilities of UiPath. Also, hosted by our local partners Marc Ellis, you will enjoy a half-day packed with industry insights and automation peers networking.
📕 Curious on our agenda? Wait no more!
10:00 Welcome note - UiPath Community in Dubai
Lovely Sinha, UiPath Community Chapter Leader, UiPath MVPx3, Hyper-automation Consultant, First Abu Dhabi Bank
10:20 A UiPath cross-region MEA overview
Ashraf El Zarka, VP and Managing Director MEA, UiPath
10:35: Customer Success Journey
Deepthi Deepak, Head of Intelligent Automation CoE, First Abu Dhabi Bank
11:15 The UiPath approach to GenAI with our three principles: improve accuracy, supercharge productivity, and automate more
Boris Krumrey, Global VP, Automation Innovation, UiPath
12:15 To discover how Marc Ellis leverages tech-driven solutions in recruitment and managed services.
Brendan Lingam, Director of Sales and Business Development, Marc Ellis
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
PGConf APAC 2018 - Where's Waldo - Text Search and Pattern in PostgreSQL
1. Where’s Waldo? - Text Search and Pattern
Matching in PostgreSQL
Joe Conway
joe.conway@crunchydata.com
mail@joeconway.com
Crunchy Data
March 22, 2018
2. Overview
Overview by Method
Use Cases
Questions
Intro
Agenda
Methods
Sample Data
Where’s Waldo?
Many potential methods
Usually best to use simplest method that fits use case
Might need to combine more than one method
Joe Conway PGConf APAC 2018 2/54
3. Overview
Overview by Method
Use Cases
Questions
Intro
Agenda
Methods
Sample Data
Agenda
Summary of methods
Overview by method
Example use cases
Joe Conway PGConf APAC 2018 3/54
4. Overview
Overview by Method
Use Cases
Questions
Intro
Agenda
Methods
Sample Data
Caveats
Full text search could easily fill a tutorial
⇒ this talk provides overview
Even other methods cannot be covered exhaustively
⇒ this talk provides overview
citext not covered, should be considered
Joe Conway PGConf APAC 2018 4/54
5. Overview
Overview by Method
Use Cases
Questions
Intro
Agenda
Methods
Sample Data
Text Search Methods
Standard Pattern Matching
LIKE operator
SIMILAR TO operator
POSIX-style regular expressions
PostgreSQL extensions
fuzzystrmatch
Soundex
Levenshtein
Metaphone
Double Metaphone
pg trgm
Full Text Search
Joe Conway PGConf APAC 2018 5/54
6. Overview
Overview by Method
Use Cases
Questions
Intro
Agenda
Methods
Sample Data
Note on Extensions
Extensions used are created as shown below
CREATE EXTENSION pg_trgm;
CREATE EXTENSION fuzzystrmatch;
Joe Conway PGConf APAC 2018 6/54
7. Overview
Overview by Method
Use Cases
Questions
Intro
Agenda
Methods
Sample Data
Sample Data
CREATE TABLE messages (
[...]
_from text NOT NULL,
_to text NOT NULL,
subject text NOT NULL,
bodytxt text NOT NULL,
fti tsvector NOT NULL,
[...]
);
select count(1) from messages;
count
---------
1086568
(1 row)
Joe Conway PGConf APAC 2018 7/54
8. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
LIKE Syntax
Expression returns TRUE if string matches pattern
Typically string comes from relation in FROM clause
Used as predicate in the WHERE clause to filter returned rows
LIKE is case sensitive
ILIKE is case insensitive
string LIKE pattern [ESCAPE escape-character]
string ~~ pattern [ESCAPE escape-character]
string ILIKE pattern [ESCAPE escape-character]
string ~~* pattern [ESCAPE escape-character]
lower(string) LIKE pattern [ESCAPE escape-character]
Joe Conway PGConf APAC 2018 8/54
9. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
Negating LIKE
To negate match, use the NOT keyword
Appropriate operator also works
string NOT LIKE pattern [ESCAPE escape-character]
string !~~ pattern [ESCAPE escape-character]
string NOT ILIKE pattern [ESCAPE escape-character]
string !~~* pattern [ESCAPE escape-character]
NOT (string LIKE pattern [ESCAPE escape-character])
NOT (string ILIKE pattern [ESCAPE escape-character])
Joe Conway PGConf APAC 2018 9/54
10. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
Wildcards
Pattern can contain wildcard characters
Underscore (”_”) matches any single character
Percent sign (”%”) matches zero or more characters
With no wildcards, expression acts like equals
To match literal wildcard chars, they must be escaped
Default escape char is backslash (””)
May be changed using ESCAPE clause
Match the literal escape char by doubling up
Joe Conway PGConf APAC 2018 10/54
11. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
Alternate Index Op Classes
varchar_pattern_ops, text_pattern_ops and bpchar_pattern_ops
Useful for anchored pattern matching, e.g. ”<pattern>%”
Used by LIKE, SIMILAR TO, or POSIX regex when not using ”C” locale
Also create ”normal” index for queries with <, <=, >, or >=
Does NOT work for ILIKE or ~~*
Expression index over lower(column)
pg trgm index operator class
Joe Conway PGConf APAC 2018 11/54
12. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
ESCAPE Example
SELECT 'AbC_%_dEf' LIKE 'AbC#_#%#_d%' ESCAPE '#';
?column?
----------
t
(1 row)
Joe Conway PGConf APAC 2018 12/54
13. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
SIMILAR TO Syntax
Equivalent to LIKE
Interprets pattern using SQL definition of regex
string SIMILAR TO pattern [ESCAPE escape-character]
string NOT SIMILAR TO pattern [ESCAPE escape-character]
Joe Conway PGConf APAC 2018 13/54
14. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
Wildcards
Same as LIKE
Also supports meta-characters borrowed from POSIX REs
pipe (”|”): either of two alternatives
asterisk (”*”): repetition >= 0 times
plus (”+”): repetition >= 1 time
question mark (”?”): repetition 0 or 1 time
”{m}”: repetition exactly m times
”{m,}”: repetition >= m times
”{m,n}”: repetition >= m and <= n times
parentheses (”()”): group items into a single logical item
Joe Conway PGConf APAC 2018 14/54
15. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
SIMILAR TO Examples
SELECT 'AbCdEf' SIMILAR TO 'AbC%' AS true,
'AbCdEf' SIMILAR TO 'Ab(C|c)%' AS true,
'Abcccdef' SIMILAR TO 'Abc{4}%' AS false,
'Abcccdef' SIMILAR TO 'Abc{3}%' AS true,
'Abccdef' SIMILAR TO 'Abc?d?%' AS true;
true | true | false | true | true
------+------+-------+------+------
t | t | f | t | t
(1 row)
Joe Conway PGConf APAC 2018 15/54
16. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
Regular Expression Syntax
Similar to LIKE and ILIKE
Allowed to match anywhere within string
⇒ unless RE is explicitly anchored
Interprets pattern using POSIX definition of regex
string ~ pattern -- matches RE, case sensitive
string ~* pattern -- matches RE, case insensitive
string !~ pattern -- not matches RE, case sensitive
string !~* pattern -- not matches RE, case insensitive
Joe Conway PGConf APAC 2018 16/54
17. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
Regular Expression Syntax
POSIX-style REs complex enough to deserve own talk
See: www.postgresql.org/docs/9.5/static/functions-matching.html#FUNCTIONS-POSIX-REGEXP
SELECT 'AbCdefzzzzdef' ~* 'Ab((C|c).*)?z+def.*' AS true,
'AbcabcAbc' ~ '^Ab.*bc$' AS true,
'AbcabcAbc' ~ '^Ab' AS true,
'AbcAbcAbc' ~* 'abc' AS true,
'AbcAbcAbc' ~* '^abc$' AS false;
true | true | true | true | false
------+------+------+------+-------
t | t | t | t | f
(1 row)
Joe Conway PGConf APAC 2018 17/54
18. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
Regular Expression Example
Really slow without an index
EXPLAIN ANALYZE SELECT date FROM messages
WHERE bodytxt ~* 'multixact';
QUERY PLAN
-------------------------------------------------------------------
Seq Scan on messages
(cost=0.00..197436.10 rows=108 width=8)
(actual time=6.435..26851.944 rows=2580 loops=1)
Filter: (bodytxt ~* 'multixact'::text)
Rows Removed by Filter: 1083988
Planning time: 1.682 ms
Execution time: 26852.410 ms
Joe Conway PGConf APAC 2018 18/54
19. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
Regular Expression Example
Use trigram GIN index
CREATE INDEX trgm_gin_bodytxt_idx
ON messages USING gin (bodytxt using gin_trgm_ops);
EXPLAIN ANALYZE SELECT date FROM messages
WHERE bodytxt ~* 'multixact';
QUERY PLAN
-------------------------------------------------------------------
Bitmap Heap Scan on messages
[...]
-> Bitmap Index Scan on trgm_gin_bodytxt_idx
(cost=0.00..124.81 rows=108 width=0)
(actual time=66.095..66.095 rows=2581 loops=1)
Index Cond: (bodytxt ~* 'multixact'::text)
Planning time: 3.680 ms
Execution time: 192.912 ms
Joe Conway PGConf APAC 2018 19/54
20. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
Regular Expression Example
Or use trigram GiST index . . . oops
CREATE INDEX trgm_gist_bodytxt_idx
ON messages USING gist (bodytxt using gist_trgm_ops);
ERROR: index row requires 8672 bytes, maximum size is 8191
Joe Conway PGConf APAC 2018 20/54
21. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
Regular Expression Compared to FTS
For the sake of comparison - with full text search
EXPLAIN ANALYZE SELECT date FROM messages
WHERE fti @@ 'multixact:D';
QUERY PLAN
-------------------------------------------------------------------
Bitmap Heap Scan on messages
[...]
-> Bitmap Index Scan on messages_fti_idx
(cost=0.00..64.75 rows=5433 width=0)
(actual time=1.085..1.085 rows=1475 loops=1)
Index Cond: (fti @@ '''multixact'':D'::tsquery)
Planning time: 0.504 ms
Execution time: 22.054 ms
Joe Conway PGConf APAC 2018 21/54
22. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
Soundex
soundex: converts string to four character code
difference: converts two strings, reports # matching positions
Generally finds similarity of English names
Part of fuzzystrmatch extension
SELECT soundex('Joseph'), soundex('Josef'),
difference('Joseph', 'Josef');
soundex | soundex | difference
---------+---------+------------
J210 | J210 | 4
Joe Conway PGConf APAC 2018 22/54
23. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
Levenshtein
Calculates Levenshtein distance between two strings
Comparisons case sensitive
Strings non-null, maximum 255 bytes
Part of fuzzystrmatch extension
SELECT levenshtein('Joseph','Josef') AS two,
levenshtein('John','Joan') AS one,
levenshtein('foo','foo') AS zero;
two | one | zero
-----+-----+------
2 | 1 | 0
Joe Conway PGConf APAC 2018 23/54
24. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
Metaphone
Constructs code for an input string
Comparisons case in-sensitive
Strings non-null, maximum 255 bytes
max_output_length arg sets max length of code
Part of fuzzystrmatch extension
SELECT metaphone('extensive',6) AS "EKSTNS",
metaphone('exhaustive',6) AS "EKSHST",
metaphone('ExTensive',3) AS "EKS",
metaphone('eXhaustivE',3) AS "EKS";
EKSTNS | EKSHST | EKS | EKS
--------+--------+-----+-----
EKSTNS | EKSHST | EKS | EKS
Joe Conway PGConf APAC 2018 24/54
25. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
Double Metaphone
Computes primary and alternate codes for string
Non-English names especially, can be different
Comparisons case in-sensitive
No length limit on the input strings
Part of fuzzystrmatch extension
SELECT dmetaphone('extensive') AS "AKST",
dmetaphone('exhaustive') AS "AKSS",
dmetaphone('Magnus') AS "MNS",
dmetaphone_alt('Magnus') AS "MKNS";
AKST | AKSS | MNS | MKNS
------+------+-----+------
AKST | AKSS | MNS | MKNS
Joe Conway PGConf APAC 2018 25/54
26. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
Trigram Matching
Functions and operators for determining similarity
Trigram is group of three consecutive characters from string
Similarity of two strings - count number of trigrams shared
Index operator classes supporting fast similar strings search
Support indexed searches for LIKE and ILIKE queries
Comparisons case in-sensitive
Part of pg trgm extension
Joe Conway PGConf APAC 2018 26/54
27. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
Trigram Matching Example
timing
SELECT set_limit(0.6); -- defaults to 0.3
SELECT DISTINCT _from, -- uses trgm_gist_idx
similarity(_from, 'Josef Konway <mail@joeconway.com>') AS sml
FROM messages WHERE _from % 'Josef Konway <mail@joeconway.com>'
ORDER BY sml DESC, _from;
_from | sml
------------------------------------+----------
Joseph Conway <mail@joeconway.com> | 0.724138
Joe Conway <mail@joeconway.com> | 0.703704
jconway <mail@joeconway.com> | 0.678571
"Joe Conway" <joe.conway@mail.com> | 0.62963
(4 rows)
Time: 502.002 ms
Joe Conway PGConf APAC 2018 27/54
28. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
Overview
Searches documents with potentially complex criteria
Superior to other methods in many cases because:
Offers linguistic support for derived words
Ignores stop words
Ranks results by relevance
Very flexibly uses indexes
Topic very complex - see:
http://www.postgresql.org/docs/9.5/static/textsearch.html
http://www.postgresql.org/docs/9.5/static/datatype-textsearch.html
http://www.postgresql.org/docs/9.5/static/functions-textsearch.html
http://www.postgresql.org/docs/9.5/static/textsearch-indexes.html
Joe Conway PGConf APAC 2018 28/54
29. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
Preprocessing
Convert text to tsvector
Store tsvector
Index tsvector
CREATE FUNCTION messages_fti_trigger_func()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN NEW.fti =
setweight(to_tsvector(coalesce(NEW.subject, '')), 'A') ||
setweight(to_tsvector(coalesce(NEW.bodytxt, '')), 'D');
RETURN NEW; END $$;
CREATE TRIGGER messages_fti_trigger BEFORE INSERT OR UPDATE
OF subject, bodytxt ON messages FOR EACH ROW
EXECUTE PROCEDURE messages_fti_trigger_func();
CREATE INDEX messages_fti_idx ON messages USING gin (fti);
Joe Conway PGConf APAC 2018 29/54
30. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
Weighting
Weights used in relevance ranking
Array specifies how heavily to weigh each category
{D-weight, C-weight, B-weight, A-weight}
defaults: {0.1, 0.2, 0.4, 1.0}
Joe Conway PGConf APAC 2018 30/54
31. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
Creating tsvector
Parse into tokens
Classes of tokens can be processed differently
Postgres has standard parser and predefined set of classes
Custom parsers can be created
Convert tokens into lexemes
Dictionaries used for this step
⇒ standard dictionaries provided
⇒ custom ones can be created
Normalized: different forms of same word made alike
⇒ fold upper-case letters to lower-case
⇒ removal of suffixes
⇒ elimination of stop words
Joe Conway PGConf APAC 2018 31/54
32. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
Writing tsquery
The pattern to be matched
Lexemes combined with boolean operators
& (AND)
| (OR)
! (NOT)
! (NOT) binds most tightly
& (AND) binds more tightly than | (OR)
Parentheses used to enforce grouping
Label with * to specify prefix matching
Supports weight labels
Joe Conway PGConf APAC 2018 32/54
33. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
Writing tsquery
to tsquery() to create
Normalizes tokens lexemes
Discards stop words
Can also cast to tsquery
Tokens taken at face value
No weight labels
SELECT to_tsquery('(hello:A & the:A) | writing:D') AS to_tsq,
'(hello & the) | writing'::tsquery AS cast_tsq
UNION ALL
SELECT to_tsquery('postgres:*'),
'postgres:*'::tsquery;
to_tsq | cast_tsq
-----------------------+-----------------------------
'hello':A | 'write':D | 'hello' & 'the' | 'writing'
'postgr':* | 'postgres':*
Joe Conway PGConf APAC 2018 33/54
34. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
Writing tsquery
Alternative function plainto tsquery()
Text parsed and normalized
& (AND) operator inserted between surviving words
Should use simple strings only
No boolean operators,
No weight labels,
No prefix-match labels
SELECT plainto_tsquery('(hello:B & the:B) | writing:D') AS tsq1,
plainto_tsquery('postgres:*') AS tsq2;
tsq1 | tsq2
-------------------------------------+----------
'hello' & 'b' & 'b' & 'write' & 'd' | 'postgr'
Joe Conway PGConf APAC 2018 34/54
35. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
Match Operator
Text search match operator @@
Returns true if tsvector (preprocessed document)
matches tsquery (search pattern)
Either maybe be written first
SELECT split_part(_from, '<', 1) AS name, date
FROM messages
WHERE fti @@ 'multixact:A & race:D & bug:D';
name | date
-----------------+------------------------
Alvaro Herrera | 2013-11-25 07:36:19-08
Andres Freund | 2013-11-25 08:26:55-08
Andres Freund | 2013-11-29 11:58:06-08
Joe Conway PGConf APAC 2018 35/54
36. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
Relevance Ranking
ts rank(): based on frequency of matching lexemes
ts rank cd(): lexeme proximity taken into consideration
WITH ts(q) AS
(
SELECT 'multixact:A & (crash:D | (data:D & loss:D))'::tsquery
)
SELECT ts_rank(m.fti, ts.q) as tsrank
FROM messages m, ts
WHERE m.fti @@ ts.q
ORDER BY tsrank DESC LIMIT 4;
tsrank
----------
0.999997
0.999997
0.999997
0.999997
Joe Conway PGConf APAC 2018 36/54
37. Overview
Overview by Method
Use Cases
Questions
LIKE Operator
SIMILAR TO Operator
POSIX-style Regular Expressions
Fuzzy
Full Text Search
Highlighting
ts headline(): returns excerpt with query terms highlighted
Apply in an outer query, after inner query LIMIT
⇒ avoids ts headline() overhead on eliminated rows
SELECT subject, tsrank, ts_headline(format('%s: %s', subject, bodytxt), q)
FROM (WITH ts(q) AS
(SELECT 'multixact:A & (crash:D | (data:D & loss:D))'::tsquery)
SELECT ts_rank(m.fti, ts.q) as tsrank, ts.q, m.subject, m.bodytxt
FROM messages m, ts WHERE m.fti @@ ts.q ORDER BY tsrank DESC LIMIT 4
) AS inner_query LIMIT 1;
-[ RECORD 1 ]--------------------------------------------------------------
subject | Is anyone aware of data loss causing MultiXact bugs in 9.3.2?
tsrank | 0.999997
ts_headline | <b>data</b> <b>loss</b> causing <b>MultiXact</b>
bugs in 9.3.2?: I've had multiple complaints
of apparent <b>data</b>
Joe Conway PGConf APAC 2018 37/54
38. Overview
Overview by Method
Use Cases
Questions
Summary
Equal
Anchored
Fuzzy
Complex
Pattern Matching: Example Use Cases
Equal
Anchored
Anchored case-insensitive
Reverse Anchored case-insensitive
Unanchored case-insensitive
Fuzzy
Complex Search with Relevancy Ranking
Joe Conway PGConf APAC 2018 38/54
39. Overview
Overview by Method
Use Cases
Questions
Summary
Equal
Anchored
Fuzzy
Complex
Equal
Find all the rows where column matches '<pattern>'
Equal operator with suitable index is best
Without an index
EXPLAIN ANALYZE SELECT date FROM messages
WHERE _from = 'Joseph Conway <mail@joeconway.com>';
QUERY PLAN
-------------------------------------------------------------------
Seq Scan on messages
(cost=0.00..197436.10 rows=61 width=8)
(actual time=49.192..527.343 rows=14 loops=1)
Filter: (_from = 'Joseph Conway <mail@joeconway.com>'::text)
Rows Removed by Filter: 1086554
Planning time: 0.256 ms
Execution time: 527.386 ms
Joe Conway PGConf APAC 2018 39/54
40. Overview
Overview by Method
Use Cases
Questions
Summary
Equal
Anchored
Fuzzy
Complex
Equal
With an index
CREATE INDEX from_idx ON messages(_from);
EXPLAIN ANALYZE SELECT date FROM messages
WHERE _from = 'Joseph Conway <mail@joeconway.com>';
QUERY PLAN
-------------------------------------------------------------------
Bitmap Heap Scan on messages
[...]
-> Bitmap Index Scan on from_idx
(cost=0.00..4.88 rows=61 width=0)
(actual time=0.051..0.051 rows=14 loops=1)
Index Cond: (_from = 'Joseph Conway <mail@joeconway.com>'::text)
Planning time: 0.267 ms
Execution time: 0.161 ms
Joe Conway PGConf APAC 2018 40/54
41. Overview
Overview by Method
Use Cases
Questions
Summary
Equal
Anchored
Fuzzy
Complex
Anchored
Find all the rows where column matches '<pattern>%'
LIKE operator with suitable index is best
This index does not do the job
CREATE INDEX from_idx ON messages(_from);
EXPLAIN ANALYZE SELECT date FROM messages
WHERE _from LIKE 'Joseph Conway%';
QUERY PLAN
-------------------------------------------------------------------
Seq Scan on messages
(cost=0.00..197436.10 rows=62 width=8)
(actual time=52.991..536.316 rows=14 loops=1)
Filter: (_from ~~ 'Joseph Conway%'::text)
Rows Removed by Filter: 1086554
Planning time: 0.264 ms
Execution time: 536.362 ms
Joe Conway PGConf APAC 2018 41/54
42. Overview
Overview by Method
Use Cases
Questions
Summary
Equal
Anchored
Fuzzy
Complex
Anchored
Note text_pattern_ops - this works
CREATE INDEX pattern_idx ON messages(_from using text_pattern_ops);
EXPLAIN ANALYZE SELECT date FROM messages
WHERE _from LIKE 'Joseph Conway%';
QUERY PLAN
-------------------------------------------------------------------
Index Scan using pattern_idx on messages
(cost=0.43..8.45 rows=62 width=8)
(actual time=0.043..0.082 rows=14 loops=1)
Index Cond: ((_from ~>=~ 'Joseph Conway'::text)
AND (_from ~<~ 'Joseph Conwaz'::text))
Filter: (_from ~~ 'Joseph Conway%'::text)
Planning time: 0.490 ms
Execution time: 0.133 ms
Joe Conway PGConf APAC 2018 42/54
43. Overview
Overview by Method
Use Cases
Questions
Summary
Equal
Anchored
Fuzzy
Complex
Anchored Case-Insensitive
Find all the rows where column matches '<pattern>%'
⇒ but in Case-Insensitive way
LIKE operator with suitable expression index is good
CREATE INDEX lower_pattern_idx ON messages(lower(_from) using text_pattern_ops);
EXPLAIN ANALYZE SELECT date FROM messages
WHERE lower(_from) LIKE 'joseph conway%';
QUERY PLAN
-------------------------------------------------------------------
Bitmap Heap Scan on messages [...]
-> Bitmap Index Scan on lower_pattern_idx
(cost=0.00..214.76 rows=5433 width=0)
(actual time=0.074..0.074 rows=14 loops=1)
Index Cond: ((lower(_from) ~>=~ 'joseph conway'::text)
AND (lower(_from) ~<~ 'joseph conwaz'::text))
Planning time: 0.505 ms
Execution time: 0.258 ms
Joe Conway PGConf APAC 2018 43/54
44. Overview
Overview by Method
Use Cases
Questions
Summary
Equal
Anchored
Fuzzy
Complex
Anchored Case-Insensitive
Can also use trigram GIN index with ILIKE
CREATE INDEX trgm_gin_idx
ON messages USING gin (_from using gin_trgm_ops);
EXPLAIN ANALYZE SELECT date FROM messages
WHERE _from ILIKE 'joseph conway%';
QUERY PLAN
-------------------------------------------------------------------
Bitmap Heap Scan on messages
[...]
-> Bitmap Index Scan on trgm_gin_idx
(cost=0.00..176.46 rows=62 width=0)
(actual time=92.980..92.980 rows=155 loops=1)
Index Cond: (_from ~~* 'joseph conway%'::text)
Planning time: 0.857 ms
Execution time: 93.473 ms
Joe Conway PGConf APAC 2018 44/54
45. Overview
Overview by Method
Use Cases
Questions
Summary
Equal
Anchored
Fuzzy
Complex
Anchored Case-Insensitive
Or a trigram GiST index with ILIKE
CREATE INDEX trgm_gist_idx
ON messages USING gist (_from using gist_trgm_ops);
EXPLAIN ANALYZE SELECT date FROM messages
WHERE _from ILIKE 'joseph conway%';
QUERY PLAN
-------------------------------------------------------------------
Bitmap Heap Scan on messages
[...]
-> Bitmap Index Scan on trgm_gist_idx
(cost=0.00..8.88 rows=62 width=0)
(actual time=53.080..53.080 rows=155 loops=1)
Index Cond: (_from ~~* 'joseph conway%'::text)
Planning time: 1.068 ms
Execution time: 53.604 ms
Joe Conway PGConf APAC 2018 45/54
46. Overview
Overview by Method
Use Cases
Questions
Summary
Equal
Anchored
Fuzzy
Complex
Reverse Anchored Case-Insensitive
Find all the rows where column matches '%<pattern>'
⇒ but in Case-Insensitive way
LIKE operator with suitable expression index is good
CREATE INDEX rev_lower_pattern_idx ON messages(lower(reverse(_from)) using text_pattern_ops);
EXPLAIN ANALYZE SELECT date FROM messages WHERE lower(reverse(_from))
LIKE reverse('%joeconway.com>');
QUERY PLAN
-------------------------------------------------------------------
Bitmap Heap Scan on messages [...]
-> Bitmap Index Scan on rev_lower_pattern_idx
(cost=0.00..214.76 rows=5433 width=0)
(actual time=1.357..1.357 rows=2749 loops=1)
Index Cond: ((lower(reverse(_from)) ~>=~ '>moc.yawnoceoj'::text)
AND (lower(reverse(_from)) ~<~ '>moc.yawnoceok'::text))
Planning time: 0.278 ms
Execution time: 17.491 ms
Joe Conway PGConf APAC 2018 46/54
47. Overview
Overview by Method
Use Cases
Questions
Summary
Equal
Anchored
Fuzzy
Complex
Reverse Anchored Case-Insensitive
Can also use trigram GIN index with ILIKE
CREATE INDEX trgm_gin_idx
ON messages USING gin (_from using gin_trgm_ops);
EXPLAIN ANALYZE SELECT date FROM messages
WHERE _from ILIKE '%joeconway.com>';
QUERY PLAN
-------------------------------------------------------------------
Bitmap Heap Scan on messages
[...]
-> Bitmap Index Scan on trgm_gin_idx
(cost=0.00..177.58 rows=2344 width=0)
(actual time=80.537..80.537 rows=2749 loops=1)
Index Cond: (_from ~~* '%joeconway.com>'::text)
Planning time: 0.915 ms
Execution time: 88.723 ms
Joe Conway PGConf APAC 2018 47/54
48. Overview
Overview by Method
Use Cases
Questions
Summary
Equal
Anchored
Fuzzy
Complex
Reverse Anchored Case-Insensitive
Or a trigram GiST index with ILIKE
CREATE INDEX trgm_gist_idx
ON messages USING gist (_from using gist_trgm_ops);
EXPLAIN ANALYZE SELECT date FROM messages
WHERE _from ILIKE '%joeconway.com>';
QUERY PLAN
-------------------------------------------------------------------
Bitmap Heap Scan on messages
[...]
-> Bitmap Index Scan on trgm_gist_idx
(cost=0.00..193.99 rows=2344 width=0)
(actual time=58.386..58.386 rows=2749 loops=1)
Index Cond: (_from ~~* '%joeconway.com>'::text)
Planning time: 0.921 ms
Execution time: 66.771 ms
Joe Conway PGConf APAC 2018 48/54
49. Overview
Overview by Method
Use Cases
Questions
Summary
Equal
Anchored
Fuzzy
Complex
Unanchored Case-Insensitive
Find all the rows where column matches '%<pattern>%'
⇒ but in Case-Insensitive way
This cannot use expression or pattern ops index
EXPLAIN ANALYZE SELECT date FROM messages
WHERE _from ILIKE '%Conway%';
QUERY PLAN
-------------------------------------------------------------------
Seq Scan on messages
(cost=0.00..197436.10 rows=5096 width=8)
(actual time=2.242..2002.998 rows=7402 loops=1)
Filter: (_from ~~* '%Conway%'::text)
Rows Removed by Filter: 1079166
Planning time: 0.860 ms
Execution time: 2003.667 ms
Joe Conway PGConf APAC 2018 49/54
50. Overview
Overview by Method
Use Cases
Questions
Summary
Equal
Anchored
Fuzzy
Complex
Unanchored Case-Insensitive
Use trigram GIN index with ILIKE
CREATE INDEX trgm_gin_idx
ON messages USING gin (_from using gin_trgm_ops);
EXPLAIN ANALYZE SELECT date FROM messages
WHERE _from ILIKE '%Conway%';
QUERY PLAN
-------------------------------------------------------------------
Bitmap Heap Scan on messages
[...]
-> Bitmap Index Scan on trgm_gin_idx
(cost=0.00..94.22 rows=5096 width=0)
(actual time=9.060..9.060 rows=7402 loops=1)
Index Cond: (_from ~~* '%Conway%'::text)
Planning time: 0.915 ms
Execution time: 30.567 ms
Joe Conway PGConf APAC 2018 50/54
51. Overview
Overview by Method
Use Cases
Questions
Summary
Equal
Anchored
Fuzzy
Complex
Unanchored Case-Insensitive
Or a trigram GiST index with ILIKE
CREATE INDEX trgm_gist_idx
ON messages USING gist (_from using gist_trgm_ops);
EXPLAIN ANALYZE SELECT date FROM messages
WHERE _from ILIKE '%Conway%';
QUERY PLAN
-------------------------------------------------------------------
Bitmap Heap Scan on messages
[...]
-> Bitmap Index Scan on trgm_gist_idx
(cost=0.00..422.63 rows=5096 width=0)
(actual time=128.881..128.881 rows=7402 loops=1)
Index Cond: (_from ~~* '%Conway%'::text)
Planning time: 0.871 ms
Execution time: 149.755 ms
Joe Conway PGConf APAC 2018 51/54
52. Overview
Overview by Method
Use Cases
Questions
Summary
Equal
Anchored
Fuzzy
Complex
Fuzzy
Find all the rows where column matches '<pattern>' ⇒ but in an inexact way
Use dmetaphone function with an expression index
Might also use Soundex, Levenshtein, Metaphone, or pg trgm
CREATE INDEX dmet_expr_idx ON messages(dmetaphone(_from));
EXPLAIN ANALYZE SELECT _from FROM messages
WHERE dmetaphone(_from) = dmetaphone('josef konwei');
QUERY PLAN
-------------------------------------------------------------------
Bitmap Heap Scan on messages [...]
-> Bitmap Index Scan on dmet_expr_idx
(cost=0.00..101.17 rows=5433 width=0)
(actual time=0.085..0.085 rows=108 loops=1)
Index Cond: (dmetaphone(_from) = 'JSFK'::text)
Planning time: 0.272 ms
Execution time: 0.445 ms
Joe Conway PGConf APAC 2018 52/54
53. Overview
Overview by Method
Use Cases
Questions
Summary
Equal
Anchored
Fuzzy
Complex
Complex Requirements
Full Text Search
Complex multi-word searching
Relevancy Ranking
EXPLAIN ANALYZE SELECT date FROM messages
WHERE fti @@ 'bug:A & deadlock:D & startup:D';
QUERY PLAN
-------------------------------------------------------------------
Bitmap Heap Scan on messages
[...]
-> Bitmap Index Scan on messages_fti_idx
(cost=0.00..52.02 rows=2 width=0)
(actual time=9.261..9.261 rows=93 loops=1)
Index Cond: (fti @@ '''bug'':A & ''deadlock'':D &
''startup'':D'::tsquery)
Planning time: 0.469 ms
Execution time: 12.614 ms
Joe Conway PGConf APAC 2018 53/54