What is Fuzzy Matching?
Fuzzy Matching, also called Approximate String Matching, is a technique that helps identify
two pieces of text, strings, or entries that are approximately similar but not exactly the
same.
How does Fuzzy Matching help in the real world?
There are many situations where the Fuzzy Matching technique can come in handy. Let’s look at some
real-world examples of using Fuzzy Matching.
1. Creating a Single Customer View: A large organization is bound to have a multitude of tables holding
customer data, which it could join to obtain a single customer view. This often requires fuzzy string matching.
2. Fraud Detection: A good fuzzy string matching algorithm can help in detecting fraud within an
organization. The FAA used fuzzy string matching to single out several pilots for exhibiting fraudulent behavior.
3. Data Accuracy: Fuzzy string matching can help improve data quality and accuracy through data deduplication,
identification of false positives, and so on.
How does Fuzzy Matching work?
Traditional logic is binary in nature, i.e., a statement is either true or false. In contrast, fuzzy logic
indicates the degree to which a statement is true.
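The contrast between a binary answer and a degree of truth can be sketched in a few lines of Python. This is only an illustration: the helper names exact_match and similarity are hypothetical, and the score comes from the standard library's difflib.SequenceMatcher, which is one of several possible similarity measures.

```python
from difflib import SequenceMatcher


def exact_match(a: str, b: str) -> bool:
    """Binary logic: the two strings are either equal or they are not."""
    return a == b


def similarity(a: str, b: str) -> float:
    """Fuzzy view: a score from 0.0 (nothing in common) to 1.0 (identical)."""
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical sample values, chosen only for illustration.
print(exact_match("Jonathan", "Jonathon"))  # a plain yes/no answer
print(similarity("Jonathan", "Jonathon"))   # a degree of similarity
```

A fuzzy matcher then turns the score back into a decision by comparing it with a threshold chosen for the data at hand.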
How does Fuzzy Name Matching work?
One of the most important use cases of fuzzy matching arises when we want to join tables using the
name field. Matching names requires a set of rules that can handle slight variations in the name field.
These rules are called fuzzy rules, and the process is called Fuzzy Name Matching.
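As a rough sketch of what such rules might look like, the Python snippet below normalizes names (case, punctuation, word order) and then accepts pairs whose similarity passes a threshold. The function names, the choice of rules, and the 0.85 threshold are all assumptions made for illustration, not rules prescribed by any particular tool.

```python
import re
from difflib import SequenceMatcher


def normalize_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and sort the words."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(sorted(cleaned.split()))


def names_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two names as the same when their normalized forms are close enough.

    The threshold of 0.85 is an arbitrary starting point and should be tuned.
    """
    score = SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()
    return score >= threshold


# Hypothetical name pairs, for illustration only.
print(names_match("Smith, John", "john smith"))
print(names_match("John Smith", "Jon Smith"))
print(names_match("John Smith", "Mary Jones"))
```

Sorting the words makes "Smith, John" and "John Smith" compare as equal, at the cost of ignoring word order where it might matter.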
How to perform Fuzzy Name Matching?
As with many computing techniques, there are popular algorithms that can be used to perform Fuzzy
Name Matching. The following are some popular Fuzzy Name Matching algorithms.
1. Levenshtein Distance: The Levenshtein distance is a metric used to measure the difference between
two strings. It is the minimum number of single-character insertions, deletions, or
substitutions required to change one string into the other.
2. The Soundex Algorithm: Soundex is a phonetic algorithm that is used to search for names that sound
similar but are spelled differently. It is most commonly used for genealogical database searches.
3. The Metaphone and Double Metaphone Algorithms: The Metaphone algorithm is an improvement
over the vanilla Soundex algorithm, while the double Metaphone algorithm builds upon the Metaphone
algorithm. The ‘double’ Metaphone algorithm returns two keys for words that have more than one
pronunciation.
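4. Cosine Similarity: The cosine similarity between two non-zero vectors is the cosine of the angle
between them. To compare strings, each string is first turned into a vector, for example of word or
character n-gram counts.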
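To make three of the algorithms above concrete, here is a minimal Python sketch of the Levenshtein distance, a simplified American Soundex, and cosine similarity. The function names are hypothetical, and representing strings as character-bigram counts for the cosine step is an assumption chosen for illustration. Production libraries are typically faster and handle more edge cases.

```python
from collections import Counter
from math import sqrt


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


# Soundex digit for each coded consonant; vowels, y, h, and w are not coded.
SOUNDEX_CODES = {c: d for d, letters in
                 {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                  "4": "l", "5": "mn", "6": "r"}.items() for c in letters}


def soundex(name: str) -> str:
    """Simplified American Soundex: first letter plus three digits."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]


def bigram_counts(text: str) -> Counter:
    """Represent a string as counts of its character bigrams (an assumption)."""
    text = text.lower()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the bigram-count vectors of two strings."""
    va, vb = bigram_counts(a), bigram_counts(b)
    dot = sum(va[g] * vb[g] for g in va)
    norm = (sqrt(sum(v * v for v in va.values()))
            * sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


print(levenshtein("kitten", "sitting"))        # 3
print(soundex("Robert"), soundex("Rupert"))    # R163 R163
print(cosine_similarity("night", "nacht"))
```

Levenshtein and cosine similarity compare spelling, while Soundex compares an approximation of pronunciation, so the three can disagree on the same pair of names.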
Implementing Fuzzy Matching...
Fuzzy Matching algorithms can be implemented in various programming languages.
1. Fuzzy String Matching Using Python: FuzzyWuzzy is a Python library that is used for fuzzy string
matching. The basic comparison metric used by the FuzzyWuzzy library is the Levenshtein distance.
2. Fuzzy String Matching Using Java: Things are a little tougher in Java, as it isn't specifically designed
for data science. However, there are many GitHub repositories available that perform fuzzy string
matching using Java.
3. Fuzzy String Matching Using Microsoft Excel: Microsoft also provides a Fuzzy Lookup Add-In for the
desktop version of Excel that performs fuzzy matching between columns.
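For the Python option, a short sketch might look like the following. It uses FuzzyWuzzy's fuzz.ratio, which returns a score from 0 to 100, when the package is installed, and otherwise falls back to a comparable score from the standard library's difflib. The helpers score and best_match and the sample names are hypothetical, and the fallback is an approximation rather than a drop-in replacement.

```python
try:
    # FuzzyWuzzy is a third-party package (pip install fuzzywuzzy).
    from fuzzywuzzy import fuzz

    def score(a: str, b: str) -> int:
        """Similarity from 0 to 100 as reported by FuzzyWuzzy."""
        return fuzz.ratio(a, b)
except ImportError:
    from difflib import SequenceMatcher

    def score(a: str, b: str) -> int:
        """Fallback: a 0 to 100 score from the standard library."""
        return round(100 * SequenceMatcher(None, a, b).ratio())


def best_match(query: str, choices: list) -> tuple:
    """Return the (choice, score) pair with the highest score for the query."""
    return max(((c, score(query, c)) for c in choices), key=lambda pair: pair[1])


# Hypothetical customer names, for illustration only.
customers = ["John Smith", "Jane Smyth", "Bob Stone"]
print(best_match("Jon Smith", customers))
```

FuzzyWuzzy also ships its own helpers for picking the best candidate from a list, so the hand-written best_match here is only meant to show the idea.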
Fuzzy Matching best practices
1. Fuzzy string matching is a widely researched area, and new algorithms and software are periodically
released, so it pays to keep your eyes and ears open for new developments.
2. Even after rigorous testing, you are bound to end up with a few false positives, so make sure that you
don't use fuzzy matching software to process sensitive data.
3. Fuzzy string matching pays the highest dividends when you have a lot of data that, if matched
correctly, results in a large upside, while false positives don't matter as much.
Learn more about Fuzzy Matching:
https://nanonets.com/blog/fuzzy-matching-fuzzy-logic/