
- 1. Classification Technique KNN in Data Mining, on the "Iris" dataset. Comp722 Data Mining, Kaiwen Qi, UNC, Spring 2012
- 2. Outline: dataset introduction, data processing, data analysis, KNN & implementation, testing
- 3. Dataset. Raw dataset: Iris (http://archive.ics.uci.edu/ml/datasets/Iris), 150 total records, 5 attributes: sepal length in cm (continuous number), sepal width in cm (continuous number), petal length in cm (continuous number), petal width in cm (continuous number), and class (nominal: Iris Setosa, Iris Versicolour, Iris Virginica), with 50 records for each of the three classes.
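A minimal loading sketch in Python, assuming the raw iris.data CSV file downloaded from the UCI URL above (the function name and the (features, label) tuple layout are illustrative choices, not part of the original slides):

```python
import csv

def load_iris(path="iris.data"):
    """Load the UCI iris.data file: 4 numeric features plus a class label per row."""
    records = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row:              # the raw file ends with a blank line
                continue
            features = [float(v) for v in row[:4]]   # sepal/petal lengths and widths in cm
            label = row[4]                           # e.g. "Iris-setosa"
            records.append((features, label))
    return records
```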
- 4. Classification: goal and task
- 5. Data Processing: original data
- 6. Data Processing: balanced distribution
- 7. Data Analysis: statistics
- 8. Data Analysis: histogram
- 9. Data Analysis: histogram
- 10. KNN: KNN algorithm. The unknown point (the green circle) is classified as a square when K is 5, because squares form the majority among its 5 nearest neighbors. The distance between two points p and q is the Euclidean distance d(p, q) = sqrt((p1 - q1)² + (p2 - q2)² + ... + (pn - qn)²).
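As a concrete check of the distance formula, a small Python sketch (the two sample records and their names are illustrative, chosen in the style of the iris data):

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Two 4-feature records: (sepal length, sepal width, petal length, petal width) in cm
setosa_like    = [5.1, 3.5, 1.4, 0.2]
virginica_like = [6.3, 3.3, 6.0, 2.5]
print(euclidean(setosa_like, virginica_like))   # ≈ 5.28
```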
- 11. KNN Advantages: simplicity of implementation; good at dealing with numeric attributes; builds no model up front and just imports the dataset, so the computational overhead is very low; does not need to compute a useful attribute subset; and, compared with naïve Bayesian, there is no need to worry about a lack of available probability data.
- 12. Implementation of KNN. Algorithm: KNN. Assign a classification label to an unlabeled tuple using the training data. Input: K, the number of neighbors, and a dataset that includes the training data. Output: a string that indicates the unknown tuple's class. Method: (1) create a distance array of size K; (2) initialize the array with the distances between the unlabeled tuple and the first K records in the dataset; (3) let i = K + 1; (4) calculate the distance between the unlabeled tuple and the i-th record in the dataset; if the distance is smaller than the largest distance in the array, replace that largest distance (and its label) with the new one, then set i = i + 1; (5) repeat step (4) until i is greater than the dataset size (150); (6) count the class labels in the array; the class with the largest count is the mining result.
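A minimal runnable sketch of this procedure, reusing the euclidean helper from the earlier sketch (the knn_classify name and the (features, label) record layout are assumptions for illustration, not the author's original code):

```python
def knn_classify(unknown, dataset, k):
    """Classify `unknown` by majority vote among its k nearest training records.

    `dataset` is a list of (features, label) pairs. Following the slide:
    seed the neighbor array with the first k records, then scan the rest,
    replacing the farthest kept neighbor whenever a closer record is found.
    """
    neighbors = [(euclidean(unknown, feats), label) for feats, label in dataset[:k]]
    for feats, label in dataset[k:]:
        d = euclidean(unknown, feats)
        worst = max(range(k), key=lambda i: neighbors[i][0])
        if d < neighbors[worst][0]:      # closer than the farthest kept neighbor
            neighbors[worst] = (d, label)
    votes = {}
    for _, label in neighbors:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)     # class with the largest count
```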
- 13. Implementation of KNN: UML diagram
- 14. Testing (K = 7, all 150 tuples)
- 15. Testing (K = 7, 60% of the data as training data)
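A sketch of a hold-out accuracy test along these lines, reusing load_iris and knn_classify from the sketches above (the random shuffle, the seed, and the exact split procedure are assumptions; the slides do not say how the 60% training portion was chosen):

```python
import random

def holdout_accuracy(dataset, k=7, train_fraction=0.6, seed=0):
    """Train on a random 60% of the records and report accuracy on the remaining 40%."""
    rng = random.Random(seed)
    shuffled = dataset[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    train, test = shuffled[:cut], shuffled[cut:]
    correct = sum(1 for feats, label in test
                  if knn_classify(feats, train, k) == label)
    return correct / len(test)

# Example: print(holdout_accuracy(load_iris(), k=7))
```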
- 16. Testing: input a randomly distributed dataset (random dataset); accuracy test results.
- 17. Performance Comparison. Decision tree: advantages are comprehensibility, the ability to construct a decision tree without any domain knowledge, handling of high-dimensional attributes, and simplified classification through eliminating unrelated attributes and pruning the tree; disadvantages are that it requires good-quality training data, usually runs in memory, and is not good at handling continuous numeric features. Naïve Bayesian: advantages are that it is relatively simple and classifies by simply calculating attribute frequencies from the training data, without any other operations (e.g. sort, search); disadvantages are that the independence assumption is often not valid and that probability data may not be available to calculate the probabilities.
- 18. Conclusion. KNN is a simple algorithm with high classification accuracy for datasets with continuous attributes. It shows high performance when balanced-distribution training data is used as input.
- 19. Thanks. Questions?
