The document outlines Jayanti Prasad's presentation on machine learning with scikit-learn. The presentation covers an introduction to machine learning and scikit-learn, and examples of linear regression, logistic regression, and nearest neighbors algorithms using scikit-learn and datasets like Iris.
Efficient Bayesian Inference for Online Matrix Factorization in Bandit SettingsVijay Pal Jat
An online recommender system sequentially recommends one or more items at a time to a user and receives interactive feedback from the user. Based on the feedback it receives from the users, the recommender system adapts its underlying model (e.g., a low-rank matrix factorization model for the user-item ratings matrix) in order to improve its future recommendations. In this thesis, we develop algorithms for interactive recommendation where the underlying model is based on a probabilistic matrix factorization operating under the bandit setting. We present several sequential inference algorithms to efficiently update the posterior distributions of latent factors of users and items and leverage these distributions to acquire the most useful ratings for the next round. In particular, we propose online posterior distribution inference algorithms based on (1) The Stochastic Gradient Langevin Dynamics (SGLD) algorithm, which is almost as efficient as standard stochastic gradient descent style updates; and (2) P ́olya-gamma latent variable based online Gibbs sampling algorithm. These algorithms enable us to develop efficient Thompson sampling based selection strategies to balance exploration-exploitation trade-offs. Experiments on synthetic and real-world datasets demonstrate that our algorithms are faster and lead to small cumulative regrets as compared to state-of-the-art methods.
These are slides about the 2021 Fighting Game Artificial Intelligence Competition virtually presented at the 2021 IEEE Conference on Games (CoG), August 16-20, 2021. (updated on August 28, 2021)
These are slides about the 2017 Fighting Game Artificial Intelligence Competition presented at the 2017 IEEE Conference on Computational Intelligence and Games (CIG 2017) on August 22, 2017 in New York City, USA.
Efficient Bayesian Inference for Online Matrix Factorization in Bandit SettingsVijay Pal Jat
An online recommender system sequentially recommends one or more items at a time to a user and receives interactive feedback from the user. Based on the feedback it receives from the users, the recommender system adapts its underlying model (e.g., a low-rank matrix factorization model for the user-item ratings matrix) in order to improve its future recommendations. In this thesis, we develop algorithms for interactive recommendation where the underlying model is based on a probabilistic matrix factorization operating under the bandit setting. We present several sequential inference algorithms to efficiently update the posterior distributions of latent factors of users and items and leverage these distributions to acquire the most useful ratings for the next round. In particular, we propose online posterior distribution inference algorithms based on (1) The Stochastic Gradient Langevin Dynamics (SGLD) algorithm, which is almost as efficient as standard stochastic gradient descent style updates; and (2) P ́olya-gamma latent variable based online Gibbs sampling algorithm. These algorithms enable us to develop efficient Thompson sampling based selection strategies to balance exploration-exploitation trade-offs. Experiments on synthetic and real-world datasets demonstrate that our algorithms are faster and lead to small cumulative regrets as compared to state-of-the-art methods.
These are slides about the 2021 Fighting Game Artificial Intelligence Competition virtually presented at the 2021 IEEE Conference on Games (CoG), August 16-20, 2021. (updated on August 28, 2021)
These are slides about the 2017 Fighting Game Artificial Intelligence Competition presented at the 2017 IEEE Conference on Computational Intelligence and Games (CIG 2017) on August 22, 2017 in New York City, USA.
These are slides about the 2019 Fighting Game Artificial Intelligence Competition presented at the 2019 IEEE Conference on Games (CoG) on August 21, 2019 in London, UK.
These are slides about the 2020 Fighting Game Artificial Intelligence Competition virtually presented at the 2020 IEEE Conference on Games (CoG), August 24-27, 2020.
http://imatge-upc.github.io/vqa-2016-cvprw/
This thesis studies methods to solve Visual Question-Answering (VQA) tasks with a Deep Learning framework.As a preliminary step, we explore Long Short-Term Memory (LSTM) networks used in Natural Language Processing (NLP) to tackle Question-Answering (text based). We then modify the previous model to accept an image as an input in addition to the question. For this purpose, we explore the VGG-16 and K-CNN convolutional neural networks to extract visual features from the image. These are merged with the word embedding or with a sentence embedding of the question to predict the answer. This work was successfully submitted to the Visual Question Answering Challenge 2016, where it achieved a 53,62\% of accuracy in the test dataset. The developed software has followed the best programming practices and Python code style, providing a consistent baseline in Keras for different configurations.
These are slides about the 2018 Fighting Game Artificial Intelligence Competition presented at the 2018 IEEE Conference on Computational Intelligence and Games (CIG 2018) on August 15, 2018 in Maastricht, The Netherlands.
2015 Fighting Game Artificial Intelligence Competitionftgaic
These are the slides about the 2015 Fighting Game Artificial Intelligence Competition presented at the 2015 IEEE Conference on Computational Intelligence and Games (CIG 2015) on September 2, 2015 in Tainan, Taiwan.
A fun and learn Visual quiz with key answers on some of the core concepts of machine learning with key answers. The machine learning concepts covered are supervised learning, unsupervised learning, parametric approach, non-parametric approach, clustering, classification, regression, association rule mining, dimensionality reduction, feature selection, feature extraction, wrapper methods, PCA, LDA, overfitting, underfitting etc.
Top 5 Data Analytics course in Rajouri GardenSaanviJoshi3
Rajouri Garden, located in Delhi, is a thriving neighborhood that offers a range of opportunities for individuals seeking to enhance their data analytics skills. In today’s data-driven world, expertise in data analytics is highly valued across industries. If you’re looking for the best data analytics course in Rajouri Garden, this blog post will explore the top five data analytics courses in rajouri garden.
Mushrooms are a type of fungus that can be found all over the world. There are over 45000+ species of mushrooms and only a small fraction of them are edible.
•Mushroom classification is the process of identifying the mushrooms and classifying them into different categories based on their edibility.
CNN (Convolutional Neural Network) is a deep learning neural network specialized for image and video analysis.
Enhanced Privacy Preserving Access Control in Incremental Data using microagg...ravi sharma
The use of one or more techniques designed to make it impossible or at least more difficult to identify a particular individual from stored data related to them.
These are slides about the 2019 Fighting Game Artificial Intelligence Competition presented at the 2019 IEEE Conference on Games (CoG) on August 21, 2019 in London, UK.
These are slides about the 2020 Fighting Game Artificial Intelligence Competition virtually presented at the 2020 IEEE Conference on Games (CoG), August 24-27, 2020.
http://imatge-upc.github.io/vqa-2016-cvprw/
This thesis studies methods to solve Visual Question-Answering (VQA) tasks with a Deep Learning framework.As a preliminary step, we explore Long Short-Term Memory (LSTM) networks used in Natural Language Processing (NLP) to tackle Question-Answering (text based). We then modify the previous model to accept an image as an input in addition to the question. For this purpose, we explore the VGG-16 and K-CNN convolutional neural networks to extract visual features from the image. These are merged with the word embedding or with a sentence embedding of the question to predict the answer. This work was successfully submitted to the Visual Question Answering Challenge 2016, where it achieved a 53,62\% of accuracy in the test dataset. The developed software has followed the best programming practices and Python code style, providing a consistent baseline in Keras for different configurations.
These are slides about the 2018 Fighting Game Artificial Intelligence Competition presented at the 2018 IEEE Conference on Computational Intelligence and Games (CIG 2018) on August 15, 2018 in Maastricht, The Netherlands.
2015 Fighting Game Artificial Intelligence Competitionftgaic
These are the slides about the 2015 Fighting Game Artificial Intelligence Competition presented at the 2015 IEEE Conference on Computational Intelligence and Games (CIG 2015) on September 2, 2015 in Tainan, Taiwan.
A fun and learn Visual quiz with key answers on some of the core concepts of machine learning with key answers. The machine learning concepts covered are supervised learning, unsupervised learning, parametric approach, non-parametric approach, clustering, classification, regression, association rule mining, dimensionality reduction, feature selection, feature extraction, wrapper methods, PCA, LDA, overfitting, underfitting etc.
Top 5 Data Analytics course in Rajouri GardenSaanviJoshi3
Rajouri Garden, located in Delhi, is a thriving neighborhood that offers a range of opportunities for individuals seeking to enhance their data analytics skills. In today’s data-driven world, expertise in data analytics is highly valued across industries. If you’re looking for the best data analytics course in Rajouri Garden, this blog post will explore the top five data analytics courses in rajouri garden.
Mushrooms are a type of fungus that can be found all over the world. There are over 45000+ species of mushrooms and only a small fraction of them are edible.
•Mushroom classification is the process of identifying the mushrooms and classifying them into different categories based on their edibility.
CNN (Convolutional Neural Network) is a deep learning neural network specialized for image and video analysis.
Enhanced Privacy Preserving Access Control in Incremental Data using microagg...ravi sharma
The use of one or more techniques designed to make it impossible or at least more difficult to identify a particular individual from stored data related to them.
Learning Strategy with Groups on Page Based Students' Profilesaciijournal
Most of students desire to know about their knowledge level to perfect their exams. In learning environment
the fields of study overwhelm on page with collaboration or cooperation. Students can do their exercises
either individually or collaboratively with their peers. The system provides the guidelines for students'
learning system about interest fields as Java in this system. Especially the system feedbacks information
about exam to know their grades without teachers. The participants who answered the exam can discuss
with each others because of sharing e mail and list of them.
Learning strategy with groups on page based students' profilesaciijournal
Most of students desire to know about their knowledge level to perfect their exams. In learning environment the fields of study overwhelm on page with collaboration or cooperation. Students can do their exercises either individually or collaboratively with their peers. The system provides the guidelines for students' learning system about interest fields as Java in this system. Especially the system feedbacks information about exam to know their grades without teachers. The participants who answered the exam can discuss with each others because of sharing e mail and list of them.
An overview of Media Analytics outlining the evolution of image classification and knowledge extraction. The presentation offers an insight into the Big-Data Analytics for Media Management.
Opendatabay - Open Data Marketplace.pptxOpendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay AI-driven features streamline the data workflow. Finding the data you need shouldn't be a complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with a dedicated, AI-generated, synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...pchutichetpong
M Capital Group (“MCG”) expects to see demand and the changing evolution of supply, facilitated through institutional investment rotation out of offices and into work from home (“WFH”), while the ever-expanding need for data storage as global internet usage expands, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
As Europe's leading economic powerhouse and the fourth-largest hashtag#economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like hashtag#Russia and hashtag#China, hashtag#Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in hashtag#cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to hashtag#AdvancedPersistentThreats (hashtag#APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Machine learning and optimization techniques for electrical drives.pptx
Scikit-learn1
1. Machine Learning with Scikit learn
Jayanti Prasad
Inter-University Centre for Astronomy & Astrophysics (IUCAA)
PUNE - 411007
June 22, 2018
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 1 / 41
2. Plan of the Talk
Plan of the Talk
Introduction
Scikit-learn
(1) Linear Regression
(2) Logistic Regression
(3) Nearest Neighbors
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 2 / 41
3. Introduction
Machine Learing : Definitions
Arthur Samuel
A field of study that gives computers the ability to learn without being
explicitly programmed.
Tom M. Mitchell
A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance at
tasks in T, as measured by P, improves with the experience E.
Here we need to define :
Task (T), either one or more
Experience (E)
Performance (P)
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 3 / 41
4. Introduction
Machine Learning
Training Machines to see patterns in the data and learn from that - Ability
to learn in a changing environment.
INPUT
OUTPUT
ALGORITHM
INPUT
OUTPUT
ALGORITHM
NORMAL COMPUTING MACHINE LEARNING
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 4 / 41
5. Introduction
Machine Learning
DATA
LABELLED (X,y) UNLABELLED (X)
X-Features
y- Target
SUPERVISED
UNSUPERVISED
REGRESSION CLASSIFICATION
CLUSTERING
SIMILARITY
LINK PREDICTION
DATA REDUCTION
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 5 / 41
6. Introduction
Layman’s Machine Learning (Supervised) Recipes
1 Split the data into two parts: Training/Learning and Testing.
(X, y) = (XTrain, yTrain) + (XTest, yTest) (1)
2 Find a model f (X, θ) : X −→ y from the training data. This need
some mismatch function to be minimized.
χ2
(θ) =
N
i=1
(yi − f (Xi , θ))2
(2)
3 The model could be predictive (make predictions in the future), or
descriptive (gain knowledge from data), or both.
4 Apply the model on testing data and find the accuracy.
5 Once satisfied apply model for some unknown feature X and get the
target y for that.
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 6 / 41
8. scikit-learn
Scitkit-learn : What it is ?
Scikit-learn is a free machine learning library for the Python
programming language.
The scikit-learn project started as scikits.learn, a Google Summer of
Code project by David Cournapeau.
Its name stems from the notion that it is a "SciKit" (SciPy Toolkit), a
separately-developed and distributed third-party extension to SciPy.
Scikit-learn is largely written in Python, with some core algorithms
written in Cython to achieve performance.
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 8 / 41
9. scikit-learn
Scikit-learn : Why to learn ?
If you have problem with a lot of data and you want to draw insight
from that using some open source python library.
If you want to understand common problems of Machine Learning
such as regression, classification, clusters ...
If you want to develop new techniques or compare different techniques.
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 9 / 41
10. scikit-learn
Scikit-learn : What it has ?
Reference : http://scikit-learn.org/stable/index.html
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 10 / 41
11. scikit-learn
Scikit-learn : Installation
Scikit-learn requires:
Python (>= 2.7 or >= 3.3),
NumPy (>= 1.8.2),
SciPy (>= 0.13.3).
One can install it from the source by cloning the git repo:
git clone https://github.com/scikit-learn/scikit-learn.git
pip install -U scikit-learn
conda install scikit-learn
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 11 / 41
17. scikit-learn
Test datasets : Iris Flower species
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 17 / 41
18. scikit-learn
Test datasets : Iris Flower species
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 18 / 41
19. Linear Regression
Linear Regression : Theory
1 The data is fitted with a linear model :
yi =
N
j=1
wi xi = w0 + w1xi for one dimensional case. (3)
2 Minimize the mismatch (wrt w0 and w1) for training data :
χ2
(w, X) = |yi − (w0 + w1xi )|2
(4)
3 Apply the model on testing data and see the accuracy.
4 Once satisfied apply the model for unknown data.
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 19 / 41
20. Linear Regression Example 1
Linear Regression : Examples
[linear-regression-general.ipynb]
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 20 / 41
21. Linear Regression Example 1
Linear Regression: Examples [linear-regression-general.ipynb]
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 21 / 41
22. Linear Regression Example 1
Linear Regression : Ridge (Regularization)
Ordinary Least Square (OLS) can over-fit the data and in order to
avoid that we add an extra term in the mismatch function.
Q(w, X) = |yi − (w0 + w1xi )|2
+ λR(w, X), (5)
where λ is the regularization parameter and R is regularization
function.
There are many choices for R and one of those is the what is called L2
norm.
Q(w, X) = |yi − (w0 + w1xi )|2
+ λ||w||2
2 (6)
There are many ways to set the regularization parameter λ and one of
those is generalized Cross-Validation.
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 22 / 41
23. Linear Regression Example 2
Linear Regression : Example [linear-regression-ridge.ipynb]
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 23 / 41
24. Linear Regression Example 2
Linear Regression [linear-regression-ridge.ipynb]
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 24 / 41
25. Logistic Regression
Logistic Regression
It is used when the ’target’ is binary - ’0’ and ’1’ or ’yes’ and ’no’.
If the probability of ’1’ is ’p’ then the ’odd’ is defined as p/(1 − p) and
a function called ’logit’ is defined as :
l = ln
p
1 − p
(7)
We try to compute p by fitting a line to the ’logit’ function:
ln
p
1 − p
= w0 + w1 ∗ x, (8)
which gives:
p =
1
1 + e−(w0+w1∗x)
(9)
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 25 / 41
27. Logistic Regression Example 3
Logistic Regression : MNIST
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 27 / 41
28. Logistic Regression Example 3
Logistic Regression : MNIST
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 28 / 41
29. Logistic Regression Example 3
Logistic Regression: MNIST
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 29 / 41
30. Logistic Regression Example 4
Logistic Regression: Iris
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 30 / 41
31. Logistic Regression Example 4
Logistic Regression: Iris
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 31 / 41
32. Nearest neighbors
Nearest neighbors : Theory
Nearest Neighbors is one of the import techniques of ML which can be
used for supervised as well as un-supervised learning.
sklearn.neighbors package from scikit-learn can be used for this
purpose.
The principle behind nearest neighbor methods is to find a predefined
number of training samples closest in distance to the new point, and
predict the label from these.
The number of samples can be a user-defined constant (k-nearest
neighbor learning), or vary based on the local density of points
(radius-based neighbor learning).
The distance can, in general, be any metric measure: standard
Euclidean distance is the most common choice,
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 32 / 41
33. Nearest neighbors
Nearest neighbors : Scikit-learn
scikit-learn implements two different nearest neighbors classifiers:
KNeighborsClassifier: learning based on the k nearest neighbors of each
query point.
RadiusNeighborsClassifier: learning based on the number of neighbors
within a fixed radius r of each training point.
Weights to neighbours can be uniform or non-uniform.
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 33 / 41
34. Nearest neighbors
Nearest Neighbors : Regression
NN based regression can be used for the purpose when ’labels’ are
continuous in place of discrete.
The ’label’ for the query point can be found from the mean of the
labels of the nearest points.
NN based regression can be used for finding some missing/blurred area
in an image.
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 34 / 41
36. Nearest neighbors Example 5
Iris with NN: [knn-neighbours-iris.ipynb]
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 36 / 41
37. Nearest neighbors Example 5
Iris classification with Nearest Neighbors
We can consider only two features of the iris flowers.
Can make prediction for all the points on the two dimensional grid on
the basis of nearest neighbors.
The result can be shown by a plot with distinct regions for different
type of flowers.
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 37 / 41
38. Nearest neighbors Example
Iris KNN : [knn-iris.ipynb]
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 38 / 41
39. Nearest neighbors Example
Iris KNN : [knn-iris.ipynb]
Jayanti Prasad (IUCAA-PUNE) Machine Learning with scikit-learn June 22, 2018 39 / 41