The Naive Bayes Classifier is a machine learning technique that is exceedingly useful for addressing many classification problems. It is often used as a baseline classifier to benchmark results, and also as a standalone classifier for tasks such as spam filtering, where the naive assumption (conditional independence) made by the classifier seems reasonable. In this presentation we discuss the mathematical basis for Naive Bayes and illustrate it with examples.
2. Classification Problems with Naïve Bayes
• Two-step process:
• Build the model: this is the training step, where we estimate the model parameters
• Use the model: here, we predict the output class given the inputs
[Diagram: Training Data → Train → Model; Input to be classified → Model → Predict → Prediction]
3. When to use Naïve Bayes: Three Scenarios
• You have come up with a neat solution to an ML problem. Your manager wants you to do a quick demo to your CEO in the next couple of hours.
• You are assigned a classification problem similar to spam filtering. Your manager says: "We need this feature in our next release; a less accurate model is okay."
• You have come up with a sophisticated deep learning based model. You submitted it for review and are asked to benchmark your results against standard approaches.
4. Naïve Bayes Classifier
A simple classifier model that is:
• Based on the Bayes theorem
• Trained with supervised learning
• Easy to build
• Faster to train than most other models
• Often used as a baseline classifier for benchmarking
5. Foundation: Bayes Theorem
From Bayes' theorem, we have: $P(Y|X) = \frac{P(X|Y)\,P(Y)}{P(X)}$
Suppose $Y$ represents the class variable and $X_1, X_2, \dots, X_n$ are the inputs:
$$P(Y|X_1,\dots,X_n) = \frac{P(X_1,\dots,X_n|Y)\,P(Y)}{P(X_1,\dots,X_n)}$$
Assuming $X_i \perp X_j$ given $Y$ for all $i, j$ (the naïve assumption), we may write the above equation as:
$$P(Y|X_1,\dots,X_n) = \frac{P(X_1|Y)\,P(X_2|Y)\cdots P(X_n|Y)\,P(Y)}{P(X_1,\dots,X_n)}$$
$$P(Y|X_1,\dots,X_n) \propto P(Y)\prod_{i=1}^{n} P(X_i|Y)$$
$$\hat{Y} = \operatorname*{argmax}_{y}\; P(Y=y)\prod_{i=1}^{n} P(X_i|Y=y)$$
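To make the numbers concrete, here is a minimal Python sketch of Bayes' theorem on a made-up spam-filtering example (the prior and per-word likelihoods are invented for illustration, not from the slides):

```python
# A minimal numeric sketch of Bayes' theorem with made-up numbers:
# suppose 20% of mail is spam, and the word "offer" appears in 60% of
# spam and 5% of non-spam. What is P(spam | "offer")?

p_spam = 0.2                 # P(Y = spam), assumed prior
p_word_given_spam = 0.6      # P(X = "offer" | spam), assumed
p_word_given_ham = 0.05      # P(X = "offer" | not spam), assumed

# P(X) by the law of total probability
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' theorem: P(Y | X) = P(X | Y) P(Y) / P(X)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(f"P(spam | 'offer') = {p_spam_given_word:.3f}")   # 0.750
```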
6. What is Naïve about Naïve Bayes?
• In many applications, treating each element of the input ($X_i$) as independent of every other element ($X_j$ for all $j$), and thereby ignoring word order, is quite a strong assumption
• Why?
• The sentence "day great is today a" is a jumbled form of "Today is a great day". This suggests that word order matters for semantic interpretation. The NB classifier treats each word as independent and hence ignores order, an assumption that in many cases will not hold.
• Take a selfie of yourself; the picture looks great! What if we randomly shuffled the pixels throughout the image? Though all the pixels are still present in the modified image, their order is severely altered.
• But: despite the naïve assumption, the NB classifier still works and produces accurate results for a number of applications!
• Consider the problem of searching with keywords: does word order matter there?
7. Estimating the model parameters
• Naïve Bayes model: $\hat{Y} = \operatorname*{argmax}_{y}\; P(Y=y)\prod_{i=1}^{n} P(X_i|Y=y)$
• The model parameters are $P(Y)$ and $P(X_i|Y)$ for all values of $X_i$
• Given a dataset of training examples, each with an input $(X_1, \dots, X_n)$ and the expected target output $Y$, we need to "learn" the model
• We can use maximum likelihood estimates to determine the model parameters
9. Naïve Bayes Case Study (Ref: Kaggle)
10. Document Classification With Naïve Bayes
• Document (or, in our discussion today, text) classification assigns a class label to a given document. Formally:
• Given a document $d$ and a set of classes $C = \{c_1, c_2, \dots, c_n\}$, predict a class $c \in C$
• Example: Gmail categorizes incoming mail into Primary, Social, Promotions, and Junk. We can define $C = \{Primary, Social, Promotions, Junk\}$ and assign a class label $c \in C$ to every incoming mail $d$
• The term "document" could mean plain text or even a compound multimedia document; we use it to refer to any entity that can be classified
11. Can we build rule-based models to do this?
• For instance, in email categorization, one might apply a set of if-then-else rules to determine the class of an incoming mail
• With well-written rules one can generally get high precision, but often low recall
• Drafting a comprehensive set of rules is difficult and expensive, as it requires expert knowledge
• Example: suppose I receive a mail from Flipkart; should it be classified as a promotion or primary? It depends!
12. Supervised Machine Learning
• Input
• A document $d$, consisting of word tokens $w_1, w_2, \dots, w_j$
• A finite set of classes: $C = \{c_1, c_2, \dots, c_n\}$
• The training dataset: $D_{train} = \{(d_1, c_1), (d_2, c_2), \dots, (d_m, c_m)\}$
• Output
• A model $M$ such that $M: d \rightarrow c$
13. Bag of words representation
• A document can be considered an ordered sequence of words
• The Naïve Bayes classifier ignores word order and correlations, so we may represent a document $d$ as a bag of words, or unigrams
• In a typical English sentence, many words serve only grammatical purposes and may not contribute to the classification decision
• We can do some pre-processing to remove such words before sending the document to the classifier:
$M(\text{"I love my Samsung Galaxy Grand 2"}) = c$
$M(\text{"love Samsung Galaxy Grand"}) = c$
14. Text Classification with Multinomial Naïve Bayes
• Recall: $\hat{Y} = \operatorname*{argmax}_{y}\; P(Y=y)\prod_{i=1}^{n} P(X_i|Y=y)$, where the model parameters are $P(Y)$ and $P(X_i|Y)$ for all values of $X_i$
• For the document classification problem with a bag-of-words model, $X_i$ is a word in the document and $Y$ is the document class
• Estimate the model parameters as below and save the model as a table $T$:
• For each class $c \in C$, the MLE of the prior is:
$$P(C = c) = \frac{\text{number of documents labelled } c}{\text{total number of documents}}$$
• For each word $w_i \in V$ and class $c_j \in C$, the MLE of the likelihood is:
$$P(w_i|c_j) = \frac{count(w_i, c_j)}{\sum_{w \in V} count(w, c_j)}$$
• Prediction: given a new input, generate the word tokens using the same procedure used in training, retrieve the model values from the table $T$, and compute $\hat{Y}$ (a sketch follows below)
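A compact sketch of this train/predict procedure on made-up toy data. It uses the plain MLE counts from above, so (as the next slide shows) any word unseen in a class zeroes out that class's score:

```python
# Multinomial Naive Bayes with raw MLE counts -- no smoothing yet.
import math
from collections import Counter, defaultdict

train = [("great phone great screen", "pos"),
         ("I love this phone", "pos"),
         ("terrible phone", "neg"),
         ("I hate this screen", "neg")]

class_counts = Counter(c for _, c in train)   # documents per class, for P(c)
word_counts = defaultdict(Counter)            # count(w, c)
for doc, c in train:
    word_counts[c].update(doc.lower().split())

def predict(doc):
    scores = {}
    for c in class_counts:
        log_p = math.log(class_counts[c] / len(train))   # log P(c)
        total = sum(word_counts[c].values())
        for w in doc.lower().split():
            count = word_counts[c][w]
            if count == 0:                 # unseen (w, c) pair: MLE gives
                log_p = float("-inf")      # P(w|c) = 0, killing the class
                break
            log_p += math.log(count / total)
        scores[c] = log_p
    return max(scores, key=scores.get)

print(predict("great phone"))   # 'pos' (count('great', 'neg') is 0)
```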
15. Are we done? Not yet
• What happens if $count(w_i, c_j) = 0$?
• This can happen when a new, unseen input contains a word $w_i$ that was not encountered in the training data.
• Example: the training data contains "I love my Samsung Galaxy Grand 2" but not the word "adore". If the unseen input is "I adore my Samsung Galaxy Grand 2", the entire probability computation becomes zero!
16. Laplace Smoothing (Add 1)
• Assume every word in the vocabulary has occurred at least once
• This assumption results in the estimate:
$$P(w_i|c_j) = \frac{count(w_i, c_j) + 1}{\left(\sum_{w \in V} count(w, c_j)\right) + |V|}$$
• The above ensures that the probabilities never go to zero (see the sketch below)
• What happens when you encounter a word that is not in the vocabulary at all?
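A small sketch of the add-1 estimate with toy counts (numbers invented for illustration):

```python
# Add-1 smoothing for one class: (count + 1) / (total + |V|).
count = {"great": 2, "phone": 2, "screen": 1}   # count(w, c), toy numbers
V = {"great", "phone", "screen", "terrible", "hate", "love"}  # vocabulary
total = sum(count.values())                     # = 5

def p_laplace(w):
    # (count(w, c) + 1) / (sum over V of count(w, c) + |V|)
    return (count.get(w, 0) + 1) / (total + len(V))

print(round(p_laplace("great"), 3))      # 0.273 -> (2 + 1) / (5 + 6)
print(round(p_laplace("terrible"), 3))   # 0.091 -> (0 + 1) / (5 + 6), not zero
print(sum(p_laplace(w) for w in V))      # ~1.0: a proper distribution over V
```

As for the slide's closing question: the slides leave it open, but a common practical choice is to drop out-of-vocabulary words at prediction time, or to map them to a special UNK token that is itself smoothed.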
17. Variants of Naïve Bayes
• In the previous slides, the MLE probability computation was based on counts of words in each document. This is called the multinomial model.
• Multinomial is a natural fit for topic-classification problems, e.g. classifying a given article as Scientific, Business, or Sports.
• Sometimes just the presence or absence of a given word in a document is adequate for classification. We may choose a binarized Naïve Bayes in such cases.
• E.g. consider sentiment analysis: if the word "fantastic" appears, it doesn't need to be repeated in the same document for us to conclude the polarity of the sentiment.
• A number of applications involve real-valued features. We can use a Gaussian (or some other continuous) variant for these.
18. Multinomial Naïve Bayes
• In the multinomial classification model, the frequency of occurrence of each word in the document is taken into account (instead of just presence/absence)
• Compute the class priors using maximum likelihood estimates
• The algorithm to compute $P(w_k|c_j)$ is:
• Concatenate all documents of class $c_j$; call the result $text_j$
• Let $n$ be the number of tokens in $text_j$ and $\alpha$ the constant used for smoothing
• For each word $w_k$ in the vocabulary, let $n_k$ be the number of occurrences of $w_k$ in $text_j$:
$$P(w_k|c_j) = \frac{n_k + \alpha}{n + \alpha|V|}$$
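For practical use, scikit-learn's MultinomialNB implements this model. A sketch (toy data, and it assumes scikit-learn is installed) with alpha=1.0, i.e. the add-1 smoothing from slide 16:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["great phone great screen", "I love this phone",
        "terrible phone", "I hate this screen"]
labels = ["pos", "pos", "neg", "neg"]

vec = CountVectorizer()              # bag-of-words counts
X = vec.fit_transform(docs)
clf = MultinomialNB(alpha=1.0)       # alpha = 1.0 is add-1 smoothing
clf.fit(X, labels)

print(clf.predict(vec.transform(["great phone"])))   # ['pos']
```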
19. Binarized Multinomial Naïve Bayes
• In the binarized multinomial classification model, we count only the presence or absence of a given word (or feature) in the given document, as opposed to its frequency. That is, we clamp the count of a word $w$ in a document $j$ to 1
• Compute the class priors using maximum likelihood estimates
• The algorithm to compute $P(w_k|c_j)$ is:
• In each document $d$, keep only one instance of each word $w$ (remove duplicates)
• Concatenate all documents of class $c_j$; call the result $text_j$
• Let $n$ be the number of tokens in $text_j$ and $\alpha$ the constant used for smoothing
• For each word $w_k$ in the vocabulary, let $n_k$ be the number of occurrences of $w_k$ in $text_j$:
$$P(w_k|c_j) = \frac{n_k + \alpha}{n + \alpha|V|}$$
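Binarization only changes the counting step. A sketch with scikit-learn (assumed installed; toy data): CountVectorizer(binary=True) clamps per-document counts to 1, and everything downstream stays the same:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["fantastic fantastic fantastic camera", "awful battery awful screen"]
labels = ["pos", "neg"]

vec = CountVectorizer(binary=True)   # presence/absence instead of counts
X = vec.fit_transform(docs)          # 'fantastic' now counts once per doc
clf = MultinomialNB(alpha=1.0).fit(X, labels)

print(clf.predict(vec.transform(["fantastic battery"])))   # ['pos']
```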
20. Gaussian Naïve Bayes
• So far we have looked at text and dealt with word-occurrence counts, which are discrete values
• What happens when the features are continuous-valued, or even vectors of continuous values, e.g. images with RGB values?
• Gaussian Naïve Bayes is useful for classifying such inputs:
$$P(X_i = x_i \mid Y = y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)$$
• Estimate the parameters $\sigma_y$ and $\mu_y$ using maximum likelihood
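A minimal one-feature sketch (toy screen-size data, invented for illustration): fit the per-class mean and variance by maximum likelihood, then score a new point with the Gaussian log-density plus the class log-prior:

```python
import math

# feature: screen size in inches, by (assumed) price class
data = {"high_end": [6.5, 6.7, 6.8], "low_end": [5.0, 5.2, 5.5]}
n_total = sum(len(xs) for xs in data.values())

def gaussian_log_pdf(x, mu, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def predict(x):
    scores = {}
    for c, xs in data.items():
        mu = sum(xs) / len(xs)                           # MLE mean
        var = sum((v - mu) ** 2 for v in xs) / len(xs)   # MLE variance
        scores[c] = math.log(len(xs) / n_total) + gaussian_log_pdf(x, mu, var)
    return max(scores, key=scores.get)

print(predict(6.6))   # 'high_end'
```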
21. Estimating Parameters (Ref: T. Mitchell)
How many parameters must we estimate for Gaussian Naïve Bayes if $Y$ has $k$ possible values and the input is $X = \langle X_1, X_2, \dots, X_n \rangle$?
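For reference, the standard tally (the slide poses this as a question and does not spell out the answer):

```latex
% One mean and one variance per (feature, class) pair, plus the class
% prior, which needs only k - 1 numbers since probabilities sum to 1:
\underbrace{2nk}_{\mu_{iy},\;\sigma_{iy}^{2}}
\;+\; \underbrace{(k-1)}_{P(Y=y)}
\quad \text{parameters in total}
```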
22. Gaussian Naïve Bayes: Example
• Suppose we are required to predict the price range (high_end, mid_range, low_end) of a mobile phone given its specifications
• We observe that some elements of the specification (e.g. screen size) are continuous variables
• We can either discretize these elements and use a discrete NB classifier, or directly use a Gaussian NB
23. Practical Considerations
• Probability computations in a joint distribution involve multiplying many terms that are small fractions. This can cause underflow errors; use log probabilities to avoid the issue (see the sketch below)
• Use the distribution that is natural to the problem at hand
• The choice of distribution is your decision; there is no rule that says you should always use a Gaussian!
• You can discretize continuous variables so as to use a binarized, Bernoulli, or multinomial discrete Naïve Bayes, but you might lose fidelity in the discretization
• Exercise judgement when choosing features; you can reduce the data required by removing redundant features
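A quick sketch of why log probabilities help (the per-word probability is a made-up number):

```python
# Multiplying many small probabilities underflows to 0.0 in floating
# point; summing their logs stays perfectly representable.
import math

p_word = 1e-5          # a typical small per-word probability (assumed)
n_words = 100

product = 1.0
for _ in range(n_words):
    product *= p_word
print(product)         # 0.0 -- (1e-5)**100 = 1e-500 underflows float64

log_sum = sum(math.log(p_word) for _ in range(n_words))
print(log_sum)         # about -1151.3, no underflow
```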
24. Case Study: Accurate Searching on Twitter
• Twitter's search API allows keyword-based search
• If we search Twitter with the keyword "congress", we might get tweets pertaining to the Indian National Congress, the Mobile World Congress, the American Congress, the Science Congress, and so on
• A narrow search using an "exact" search phrase would improve precision but would miss many relevant and interesting tweets
• Is there a way to search Twitter such that we get precise matches without missing interesting and relevant tweets?
25. Summary
• Despite its naïve assumptions, the Naïve Bayes classifier is quite useful. Do not skip it in favour of complex models without evaluating it for your application; you may be in for a surprise!
• There are many variants of the Naïve Bayes classifier; what they have in common is that all are based on Bayes' theorem and make the same conditional-independence assumption.
• Choose the binarized model if the number of occurrences of a given word does not contribute to the classification decision.
• If the features are continuous variables, use Gaussian Naïve Bayes, or discretize them if that gives good accuracy.
Copyright 2016 JNResearch, All Rights Reserved