K Nearest Neighbor Algorithm


The document discusses the K-nearest neighbor (KNN) algorithm, a non-parametric lazy learning classification method. KNN stores all training examples and classifies new examples based on their similarity to stored examples, by finding the k most similar examples and basing the classification on those neighbors. The document provides examples of how KNN can be used for spam filtering by classifying emails based on their word counts and distances to stored labeled examples. The classification result may depend on the choice of k for the number of neighbors.

K-NEAREST NEIGHBOR ALGORITHM
Presented by Hien Nguyen

WHY DO WE CARE?
Amazon Prime Movie
Adaptive Text Retrieval
Spam Email filtering
Online course recommendation

WHAT IS K-NN ALGORITHM?
KNN is a non-parametric lazy learning algorithm that stores all available cases and classifies new cases based on a similarity measure.
[Diagram: Features, Comparator, Library, Recommendation]

K-NN CLASSIFICATION
Sir/Madam Occurrences   Word Count   Class
10                      100          Spam
5                       200          Good
25                      100          Spam
30                      400          Spam
20                      300          Spam
0                       400          Good
20                      500          Good
30                      600          Spam
10                      600          Good
15                      700          Spam
8                       700          Spam
2                       700          Good

K-NN CLASSIFICATION
$$D = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

K-NN CLASSIFICATION
Sir/Madam Occurrences   Word Count   Class   Distance
10                      100          Spam    100
5                       200          Good    5
25                      100          Spam    101.11
30                      400          Spam    200.99
20                      300          Spam    100.49
0                       400          Good    200.24
20                      500          Good    300.16
30                      600          Spam    400.49
10                      600          Good    400
15                      700          Spam    500.025
8                       700          Spam    500.004
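The distance column can be reproduced with a short Python sketch. The query email M = (10, 200) is an assumption: the slides never state it, but it is the point that matches every listed distance. The slides appear to truncate rather than round (101.11 for roughly 101.119), so the last digit may differ from the printed values.

```python
import math

# Training emails from the distance table:
# (Sir/Madam occurrences, word count, class).
TRAINING = [
    (10, 100, "Spam"), (5, 200, "Good"), (25, 100, "Spam"), (30, 400, "Spam"),
    (20, 300, "Spam"), (0, 400, "Good"), (20, 500, "Good"), (30, 600, "Spam"),
    (10, 600, "Good"), (15, 700, "Spam"), (8, 700, "Spam"),
]

# Assumed query email M, inferred from the distance column (not given in the slides).
QUERY = (10, 200)


def euclidean(p, q):
    """Euclidean distance between two 2-D points."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)


def distance_table(training, query):
    """Return (x, y, label, distance) rows in the original row order."""
    return [(x, y, label, euclidean((x, y), query)) for x, y, label in training]


for x, y, label, d in distance_table(TRAINING, QUERY):
    print(f"{x:>3} {y:>4} {label:<5} {d:8.3f}")
```

Note that the word count spans 100 to 700 while the Sir/Madam count spans 0 to 30, so the word-count difference contributes most of each distance.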

KNN – VOTING
Majority voting:
• All votes are equal. For each class, count how many of the k neighbours have that class. Return the class with the most votes.
Inverse distance-weighted voting:
• Closer neighbours get higher votes. While there are better-motivated methods, the simplest version is to take a neighbour’s vote to be the inverse of its distance to the new instance: $\text{vote} = 1 / D$, where $D$ is that neighbour’s distance.
• Then we sum the votes for each class and return the class with the highest total vote.
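The two voting schemes can be sketched in Python as follows. The function names and the small `eps` guard against division by zero are assumptions made for illustration; the neighbour list is the three closest rows from the spam example's distance column.

```python
from collections import defaultdict


def majority_vote(neighbours):
    """neighbours: list of (distance, label). Every neighbour casts one equal vote."""
    votes = defaultdict(float)
    for _, label in neighbours:
        votes[label] += 1.0
    return max(votes, key=votes.get)


def inverse_distance_vote(neighbours, eps=1e-9):
    """Each neighbour votes with weight 1 / distance.

    eps is an illustrative guard against a zero distance, not part of the slides.
    """
    votes = defaultdict(float)
    for distance, label in neighbours:
        votes[label] += 1.0 / (distance + eps)
    return max(votes, key=votes.get)


# The three nearest neighbours of the new email in the spam example.
nearest = [(5.0, "Good"), (100.0, "Spam"), (100.49, "Spam")]

print(majority_vote(nearest))          # Spam: 2 votes against 1
print(inverse_distance_vote(nearest))  # Good: 0.2 against about 0.02
```

On this example the two schemes disagree: the single very close Good email outweighs two distant Spam emails once votes are weighted by inverse distance.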

SUMMARY
KNN is conceptually simple, yet able to solve complex problems
Can work with relatively little information
Learning is simple (no learning at all!)
Memory and CPU cost
Feature selection problem
Sensitive to representation
WWW.ISMARTSOFT.COM

PRACTICE
Creature A Creature B
3-NN ? ?
5-NN ? ?

ACKNOWLEDGEMENT
The two examples (spam email and game) are from MIT OpenCourseWare midterm and final exam pages.


7-9. K-NN CLASSIFICATION (results for the new email M, using the distance table above)
K=1 => M: Good email
K=3 => M: Spam
K=5 => M: Spam
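The dependence on k can be checked with a minimal Python sketch using plain majority voting. The training rows come from the classification table; the query point M = (10, 200) and the function name `knn_classify` are assumptions, since the slides do not state M explicitly (it is inferred from the listed distances).

```python
import math
from collections import Counter

# (Sir/Madam occurrences, word count, class) rows from the classification table.
TRAINING = [
    (10, 100, "Spam"), (5, 200, "Good"), (25, 100, "Spam"), (30, 400, "Spam"),
    (20, 300, "Spam"), (0, 400, "Good"), (20, 500, "Good"), (30, 600, "Spam"),
    (10, 600, "Good"), (15, 700, "Spam"), (8, 700, "Spam"), (2, 700, "Good"),
]

# Assumed query email M, inferred from the distance column (not given in the slides).
QUERY = (10, 200)


def knn_classify(training, query, k):
    """Classify query by majority vote among its k nearest training rows."""
    ranked = sorted(training, key=lambda row: math.dist(row[:2], query))
    labels = [label for _, _, label in ranked[:k]]
    return Counter(labels).most_common(1)[0][0]


for k in (1, 3, 5):
    print(f"K={k} => M: {knn_classify(TRAINING, QUERY, k)}")
# K=1 => M: Good
# K=3 => M: Spam
# K=5 => M: Spam
```

With two classes, an odd k avoids tied votes; for even k a tie-breaking rule would have to be chosen.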


Editor's Notes

Nearest-neighbour methods have been used in statistical estimation and pattern recognition since the beginning of the 1970s (as non-parametric techniques). Dynamic Memory: A Theory of Reminding and Learning in Computers and People (Schank, 1982). People reason by remembering and learn by doing. Thinking is reminding and making analogies.
