Kaggle Projects - Digit Recognizer and Titanic Disaster


- 1. Sawinder Pal Kaur, PhD Kaggle Projects
- 2. Outline Problem Statement Methods used Results
- 3. Problem: Digit Recognizer Identify handwritten single digits 0-9 from grayscale images. Sample images
- 4. Statement Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels. Each pixel has a single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel-value is an integer between 0 and 255, inclusive. The pixels are laid out row by row:
  pixel0   pixel1   pixel2   ... pixel27
  pixel28  pixel29  pixel30  ... pixel55
  |        |        |        ... |
  pixel756 pixel757 pixel758 ... pixel783
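The row-by-row layout above means a flat pixel index and an image coordinate are interconvertible. A minimal sketch (the function names here are illustrative, not from the original slides):

```python
WIDTH = 28  # images are 28x28, so pixel index = row * 28 + col

def pixel_to_rc(index):
    """Map a flat pixel index (0-783) to (row, col)."""
    return divmod(index, WIDTH)

def rc_to_pixel(row, col):
    """Map (row, col) back to the flat pixel index."""
    return row * WIDTH + col

print(pixel_to_rc(29))      # pixel29 sits at (1, 1): second row, second column
print(rc_to_pixel(27, 27))  # bottom-right corner is pixel783
```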
- 5. Statement The training data set has 785 columns. The first column, called "label", is the digit that was drawn by the user. The rest of the columns contain the pixel-values of the associated image. The test data set is the same as the training set, except that it does not contain the "label" column. The goal of the problem is to predict the digit for each image in the test data set.
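Splitting that training file into labels and features is one line of preprocessing. A sketch assuming the Kaggle layout described above (header row, then "label" followed by pixel0..pixel783); a tiny inline sample stands in for the real train.csv:

```python
import csv
import io

# Two-pixel stand-in for the real 785-column train.csv (assumed layout).
sample = "label,pixel0,pixel1\n5,0,255\n3,128,64\n"

labels, features = [], []
for row in csv.DictReader(io.StringIO(sample)):
    labels.append(int(row["label"]))  # first column is the drawn digit
    features.append([int(v) for k, v in row.items() if k != "label"])

print(labels)    # [5, 3]
print(features)  # [[0, 255], [128, 64]]
```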
- 6. Methods used to solve the problem Random Forest Support Vector Machine (SVM) K-Nearest Neighbors (KNN)
- 7. Random Forest Ensemble of decision trees. Each tree is trained on a bootstrapped sample of the original data set. Each time a node is split, only a randomly chosen subset of the dimensions is considered for splitting. Each tree is fully grown and not pruned. When a new input is entered into the system, it is run down all of the trees. The result may be an average or weighted average of all of the terminal nodes that are reached, or, in the case of categorical variables, a majority vote.
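The two randomized ingredients on this slide, bootstrap sampling and the final majority vote, can be sketched in a few lines (illustrative only; the presentation's actual implementation is not shown):

```python
import random
from collections import Counter

def bootstrap(data, rng):
    """Sample with replacement, same size as the original data set;
    each tree in the forest trains on one such sample."""
    return [rng.choice(data) for _ in data]

def majority_vote(predictions):
    """For categorical targets, the forest outputs the class most
    trees voted for."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)
print(bootstrap(["a", "b", "c"], rng))  # 3 draws, duplicates allowed
print(majority_vote([3, 7, 3, 3, 1]))  # 3 wins the vote
```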
- 8. Random Forest
- 9. Support Vector Machine In an SVM model, original objects (training data) are treated as points in a space (the input space). These are mapped (rearranged) to a new space (the feature space) using mathematical functions called kernels. After mapping, objects of separate categories are divided by a clear gap that is as wide as possible.
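The RBF kernel used in the results slide is one common choice of kernel: it measures similarity between two points, decaying with their squared Euclidean distance. A minimal sketch (the gamma value is a placeholder, not taken from the slides):

```python
import math

def rbf_kernel(x, y, gamma=0.05):
    """RBF (Gaussian) kernel: exp(-gamma * ||x - y||^2).
    Equals 1.0 for identical points and approaches 0 as points diverge."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel([1, 2], [1, 2]))  # identical points -> 1.0
print(rbf_kernel([0, 0], [3, 4]))  # farther apart -> closer to 0
```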
- 10. K Nearest Neighbors Basic idea: if it walks like a duck and quacks like a duck, then it is probably a duck. There are three key elements: a set of labeled objects (e.g., a set of stored records); a distance or similarity metric to compute the distance between objects; and the value of k, the number of nearest neighbors. To classify an unlabeled object: the distance from this object to the labeled objects is computed, its k nearest neighbors are identified, and the class labels of these nearest neighbors are then used to determine the class label of the object.
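The three-step classification procedure above fits in one short function. A sketch using Euclidean distance and a majority vote over the k neighbors (illustrative; the toy points are invented for the example):

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """train: list of (features, label) pairs. Sort by Euclidean
    distance to the query, then vote among the k nearest labels."""
    by_dist = sorted(train, key=lambda pair: math.dist(pair[0], query))
    labels = [label for _, label in by_dist[:k]]
    return Counter(labels).most_common(1)[0][0]

points = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((5, 6), "b")]
print(knn_classify(points, (1, 1), 3))  # nearest neighbors are mostly "a"
```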
- 11. Results Random Forests with 500 trees gave 97% accuracy on the test data. SVM with an RBF kernel and C=1 gave 97.71% accuracy on the test data. KNN with k=10 gave 96% accuracy.
- 12. Titanic: Machine Learning from Disaster
- 13. Problem The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. One of the reasons the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper class. This project analyzes what sorts of people were likely to survive. In particular, the tools of machine learning are applied to predict which passengers survived the tragedy.
- 14. Statement The historical data has been split into two groups, a 'training set' and a 'test set'. For the training set, the outcome, whether or not each passenger survived the sinking (0 for deceased, 1 for survived), is provided. The goal of the problem is to predict the outcome for each passenger in the test set.
- 15. Methods used to solve the problem • Random Forest • Support Vector Machine (SVM)
- 16. Results Random Forests with 300 trees gave 77.9% accuracy on the test data. SVM with an RBF kernel and C=1 gave 77.7% accuracy on the test data.
