Fuzzy Matching/Logic Explained
What is Fuzzy Matching?
Fuzzy Matching, also called Approximate String Matching, is a technique for identifying two pieces of text, strings, or entries that are approximately similar but not exactly the same.
How does Fuzzy Matching help in the real world?
There are many situations where the Fuzzy Matching technique can come in handy. Let’s look at some
real-world examples of using Fuzzy Matching.
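1. Creating a Single Customer View: A large organization typically stores customer data across many separate tables, which it could join to obtain a single customer view. Because the same customer is often recorded with slight differences in spelling or formatting, these joins often require fuzzy string matching.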
2. Fraud Detection: A good fuzzy string matching algorithm can help detect fraud within an organization. For example, the U.S. Federal Aviation Administration (FAA) used fuzzy string matching to single out several pilots for fraudulent behavior.
3. Data Accuracy: Fuzzy string matching can improve data quality and accuracy through data deduplication, identification of false positives, and similar tasks.
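The Data Accuracy use case often comes down to deduplication. The following is a minimal sketch, not a production approach: it assumes Python's standard-library difflib ratio as the similarity score and a hypothetical threshold of 0.85, and the helper names similarity and deduplicate are made up for this example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] from difflib's ratio, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def deduplicate(records: list[str], threshold: float = 0.85) -> list[str]:
    """Keep the first record of each group of near-duplicates (greedy, O(n^2) comparisons)."""
    kept: list[str] = []
    for record in records:
        # Keep the record only if it is not close to anything already kept.
        if all(similarity(record, k) < threshold for k in kept):
            kept.append(record)
    return kept


customers = ["Acme Corp.", "ACME Corp", "Globex Inc.", "Acme Corporation", "Globex, Inc"]
print(deduplicate(customers))  # ['Acme Corp.', 'Globex Inc.', 'Acme Corporation']
```

"Acme Corporation" survives because its score against "Acme Corp." (about 0.69) falls below the threshold, which shows how strongly the result depends on the chosen cutoff. Because every record is compared with every kept record, large tables usually need some form of blocking or indexing first.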
How does Fuzzy Matching work?
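Traditional logic is binary: a statement is either true or false. Fuzzy logic, by contrast, expresses the degree to which a statement is true, typically as a value between 0 and 1. Fuzzy matching applies this idea to text: instead of asking whether two strings are identical, it measures how similar they are.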
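To make the contrast concrete, here is a small Python sketch comparing a crisp (true/false) test with a graded score. It assumes difflib's SequenceMatcher ratio as the degree of truth; that choice is only for illustration, and any similarity measure scaled to [0, 1] would serve.

```python
from difflib import SequenceMatcher


def crisp_equal(a: str, b: str) -> bool:
    """Binary logic: the statement 'a equals b' is either True or False."""
    return a == b


def degree_of_match(a: str, b: str) -> float:
    """Fuzzy view: the degree (0.0 to 1.0) to which 'a matches b' is true.

    difflib's ratio is used here as one possible measure of that degree.
    """
    return SequenceMatcher(None, a, b).ratio()


for a, b in [("colour", "color"), ("colour", "colour"), ("colour", "banana")]:
    print(a, b, crisp_equal(a, b), round(degree_of_match(a, b), 2))
```

The crisp test rejects "colour" versus "color" outright, while the graded score stays close to 1 and leaves the final decision to a threshold chosen for the application.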
How does Fuzzy Name Matching work?
One of the most important use cases of fuzzy matching arises when we want to join tables on a name field. The same person's name often varies slightly between tables (misspellings, nicknames, punctuation, or word order), so matching requires a set of rules that tolerates such variations. These rules are called fuzzy rules, and the process is called Fuzzy Name Matching.
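A few fuzzy rules for names can be sketched in Python as follows. The specific rules, the NICKNAMES table, the 0.9 threshold, and the function names are hypothetical choices for illustration; real systems use much larger rule sets and tuned thresholds.

```python
import re
from difflib import SequenceMatcher

# Hypothetical nickname table; a real system would use a much larger curated list.
NICKNAMES = {"bob": "robert", "bill": "william", "liz": "elizabeth"}


def normalize_name(name: str) -> str:
    """Apply simple fuzzy rules so that trivial variations compare equal."""
    name = name.lower()
    name = re.sub(r"[^\w\s]", " ", name)            # rule 1: punctuation becomes whitespace
    tokens = name.split()                           # rule 2: collapse runs of whitespace
    tokens = [NICKNAMES.get(t, t) for t in tokens]  # rule 3: expand known nicknames
    return " ".join(sorted(tokens))                 # rule 4: ignore word order


def names_match(a: str, b: str, threshold: float = 0.9) -> bool:
    """Declare a match when the normalized names are at least `threshold` similar."""
    score = SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()
    return score >= threshold


print(names_match("Smith, Bob", "Robert Smith"))  # True: punctuation, order, nickname
print(names_match("Jon Smith", "John Smith"))     # True: one missing letter
print(names_match("Jane Smith", "John Smith"))    # False: a different first name
```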
How to perform Fuzzy Name Matching?
As with many computing techniques, there are popular algorithms for performing Fuzzy Name Matching. The following are some of the most common.
1. Levenshtein Distance: The Levenshtein distance is a metric that measures the difference between two strings. It is the minimum number of single-character insertions, deletions, or substitutions required to change one string into the other.
2. The Soundex Algorithm: Soundex is a phonetic algorithm that encodes names by how they sound, so that names which sound similar but are spelled differently receive the same code. It is most commonly used for genealogical database searches.
3. The Metaphone and Double Metaphone Algorithms: The Metaphone algorithm improves on the original Soundex algorithm, and the Double Metaphone algorithm builds on Metaphone. The ‘double’ refers to the fact that it returns two keys for words that have more than one pronunciation.
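4. Cosine Similarity: The cosine similarity between two non-zero vectors is the cosine of the angle between them. To compare strings, each string is first turned into a vector, for example of word or character n-gram counts; a score close to 1 indicates very similar strings.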
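The Levenshtein distance from item 1 can be computed with the standard dynamic-programming (Wagner-Fischer) recurrence. This is a minimal pure-Python sketch that keeps only two rows of the table; the function name levenshtein is our own.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions
    needed to turn `a` into `b` (Wagner-Fischer dynamic programming, two rows)."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric, so keep the inner row short
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("Jon", "John"))        # 1
```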
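A compact sketch of the American Soundex rules behind item 2 is shown here. It assumes the common variant in which h and w do not separate letters with the same code while vowels (and y) do; other Soundex variants differ in these details, so the codes may not match every implementation.

```python
# Soundex digit for each consonant; vowels, y, h and w have no code.
CODES = {
    letter: digit
    for digit, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r",
    }.items()
    for letter in letters
}


def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    last_code = CODES.get(letters[0], "")  # the first letter's code still blocks repeats
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters that share a code
        code = CODES.get(c, "")
        if code and code != last_code:
            result += code
            if len(result) == 4:
                break
        last_code = code  # a vowel resets last_code, so a repeated code after it counts again
    return result.ljust(4, "0")  # pad short codes with zeros, e.g. 'Lee' -> 'L000'


print(soundex("Robert"), soundex("Rupert"), soundex("Ashcraft"))  # R163 R163 A261
```

"Robert" and "Rupert" share a code, which is exactly the "sounds similar, spelled differently" behavior described above.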
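Cosine similarity (item 4) can be applied to strings once they are turned into vectors. This sketch assumes character bigram counts as the vector representation (one of several reasonable choices) and computes the cosine directly from its definition; char_ngrams and cosine_similarity are hypothetical helper names.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Bag of character n-grams (bigrams by default) as a sparse count vector."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(u: Counter, v: Counter) -> float:
    """cos(theta) = (u . v) / (|u| |v|); only defined for non-zero vectors."""
    dot = sum(count * v[key] for key, count in u.items())  # missing keys count as 0
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


print(cosine_similarity(char_ngrams("night"), char_ngrams("nacht")))  # 0.25
```

"night" and "nacht" share only the bigram "ht" out of four bigrams each, hence the score of 0.25.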
Implementing Fuzzy Matching...
Fuzzy Matching algorithms can be implemented in various programming languages.
1. Fuzzy String Matching Using Python: FuzzyWuzzy is a Python library for fuzzy string matching. Its basic comparison metric is the Levenshtein distance.
2. Fuzzy String Matching Using Java: This takes a little more effort in Java, which is not specifically designed for data science and has no fuzzy string matching in its standard library. However, many GitHub repositories implement fuzzy string matching in Java.
3. Fuzzy String Matching Using Microsoft Excel: Microsoft provides a Fuzzy Lookup Add-In for the desktop version of Excel that performs fuzzy matching between columns.
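The Python option above names FuzzyWuzzy, a third-party package. As a dependency-free alternative (a swap, not FuzzyWuzzy's API), Python's standard-library difflib module provides get_close_matches, which ranks candidates by difflib's own similarity ratio rather than by the Levenshtein distance. The candidate lists below are illustrative.

```python
from difflib import get_close_matches

# difflib ships with Python, so no third-party install is needed.
# n caps the number of results; cutoff (0 to 1) is the minimum similarity score.
print(get_close_matches("appel", ["ape", "apple", "peach", "puppy"]))
# ['apple', 'ape']

print(get_close_matches("Jon Smith", ["John Smith", "Joan Smyth", "Jane Doe"], n=1, cutoff=0.8))
# ['John Smith']
```

Results are returned best match first, so n=1 behaves like a simple fuzzy lookup of the closest candidate.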
Fuzzy Matching best practices
1. Fuzzy string matching is a widely researched area, and new algorithms and software are released regularly, so it pays to keep up with new developments.
2. Even after rigorous testing, you are likely to end up with a few false positives, so avoid using fuzzy matching software to process sensitive data.
3. Fuzzy string matching pays off most when you have a large volume of data, correct matches bring a large benefit, and occasional false positives matter relatively little.
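Best practice 2 can be checked empirically by counting errors on pairs whose true status is known. The labeled pairs below are hypothetical, and difflib's ratio stands in for whichever matcher you actually use; the point is only to show how the threshold trades false positives against false negatives.

```python
from difflib import SequenceMatcher

# Hypothetical labeled pairs: (name_a, name_b, same_person?)
LABELED_PAIRS = [
    ("Jon Smith", "John Smith", True),
    ("Catherine Lee", "Kathy Lee", True),
    ("Joan Smith", "John Smith", False),
    ("Jane Doe", "John Smith", False),
]


def count_errors(threshold: float) -> tuple[int, int]:
    """Return (false positives, false negatives) when matching at score >= threshold."""
    false_pos = false_neg = 0
    for a, b, same in LABELED_PAIRS:
        predicted = SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold
        false_pos += predicted and not same  # matched, but different people
        false_neg += same and not predicted  # same person, but not matched
    return false_pos, false_neg


for threshold in (0.6, 0.85, 0.95):
    print(threshold, count_errors(threshold))
```

On these four pairs no threshold removes both kinds of error: accepting "Kathy Lee" (score about 0.64) also accepts "Joan Smith" (0.90), a false positive.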
Learn more about Fuzzy Matching:
https://nanonets.com/blog/fuzzy-matching-fuzzy-logic/
