UNIT – III: CLASSIFICATION
Topic 1: A CHARACTERIZATION OF TEXT CLASSIFICATION
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SEMESTER – VIII
PROFESSIONAL ELECTIVE – IV
CS8080- INFORMATION RETRIEVAL TECHNIQUES
UNIT III
1. A Characterization of Text Classification
2. Unsupervised Algorithms: Clustering
3. Naïve Text Classification
4. Supervised Algorithms
5. Decision Tree
6. k-NN Classifier
7. SVM Classifier
8. Feature Selection or Dimensionality Reduction
9. Evaluation metrics
10. Accuracy and Error
11. Organizing the classes
12. Indexing and Searching
13. Inverted Indexes
14. Sequential Searching
15. Multi-dimensional Indexing
INTRODUCTION TO CLASSIFICATION
• Scientists became very serious about addressing the question:
• “Can we build a model that learns from available data and automatically makes the right decisions and predictions?”
• The answer can be found in the numerous applications emerging from the fields of
1. pattern classification,
2. machine learning, and
3. artificial intelligence.
• Data from various sensing devices, combined with powerful learning algorithms and domain knowledge, led to many inventions that we now take for granted in everyday life:
• Internet queries via search engines like Google,
• text recognition at the post office,
• barcode scanners at the supermarket,
• the diagnosis of diseases,
• speech recognition by Siri or Google Now on our mobile phones, just to name a few.
• Classification is:
• the data mining process of finding a model (or function) that describes and distinguishes data classes or concepts,
• for the purpose of being able to use the model to predict the class of objects whose class label is unknown.
• That is, it predicts categorical class labels (discrete or nominal).
• It constructs a model (the classifier) based on the training set.
• It predicts group membership for data instances.
What is CLASSIFICATION?
• Classification and prediction are two forms of data analysis that can be used to extract models describing important data classes or to predict future data trends.
• Together, classification and prediction help us better understand large data sets.
• Classification predicts categorical (discrete, unordered) labels.
• Prediction models continuous-valued functions.
INTRODUCTION TO CLASSIFICATION
• How can we classify?
• The trick here is machine learning, which makes classifications based on past observations (the learning part).
• We give the machine a set of texts, each tagged with a label, and let the model learn from this data; the trained model can then tell us which category a new input text belongs to.
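The idea of learning from labeled texts can be sketched in a few lines of Python. This is a minimal illustration only, assuming a multinomial Naïve Bayes model with add-one smoothing as the learner; the training sentences, the labels, and the function names `train` and `predict` are hypothetical and not taken from the slides.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled training texts (illustrative only).
TRAINING_DATA = [
    ("the match ended with a late goal", "sports"),
    ("the team won the league title", "sports"),
    ("the new phone has a faster chip", "technology"),
    ("the software update fixes a bug", "technology"),
]


def train(examples):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        # Log prior: share of training documents that carry this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING_DATA)
print(predict(model, "the team scored a goal"))
print(predict(model, "a bug in the phone software"))
```

With only four training sentences the model is a toy; a realistic classifier would need far more labeled text and proper tokenization.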
Applications of Classification
• Classification of (potential) customers for:
• Credit approval, risk prediction, selective marketing
• Performance prediction based on
• selected indicators
• Medical diagnosis based on symptoms or reactions to therapy
• Application areas:
• Credit approval
• Target marketing
• Medical diagnosis
• Treatment effectiveness analysis
• Performance prediction
When is classification needed?
• Scenarios:
• In each of these examples, the data analysis task is classification,
• where a model or classifier is constructed to predict categorical labels, such as
• “safe” or “risky” for the loan application data;
• “yes” or “no” for the marketing data; or
• “treatment A,” “treatment B,” or “treatment C” for the medical data.
• These categories can be represented by discrete values, where the ordering among values
has no meaning.
• For example,
• the values 1, 2, and 3 may be used to represent treatments A, B, and C,
• where there is no ordering implied among this group of treatment regimes.
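As a small illustration of representing nominal categories by discrete values, the sketch below maps the three treatment labels to the codes 1, 2, and 3 and back. The variable names are hypothetical; the point is only that the codes act as identifiers, not as quantities.

```python
# Treatment labels from the medical example; the variable names are hypothetical.
TREATMENTS = ["treatment A", "treatment B", "treatment C"]

# Map each nominal label to an integer code and back again.
label_to_code = {label: code for code, label in enumerate(TREATMENTS, start=1)}
code_to_label = {code: label for label, code in label_to_code.items()}

print(label_to_code)
# The codes are identifiers only: comparing or averaging them has no meaning,
# so a predicted code should always be decoded back to its label.
print(code_to_label[2])
```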
INTRODUCTION TO CLASSIFICATION
Aim: predict categorical class labels for new tuples/samples
Input: a training set of tuples/samples, each with a class label
Output: a model (a classifier) based on the training set and the class labels
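The aim, input, and output above can be made concrete with a very small sketch. It assumes a 1-nearest-neighbour rule as the model, which is one possible choice among many; the loan data, the feature meanings, and the name `build_classifier` are hypothetical.

```python
import math

# Hypothetical training set:
# (annual income in thousands, years of credit history) -> class label.
TRAINING_SET = [
    ((20.0, 1.0), "risky"),
    ((25.0, 2.0), "risky"),
    ((60.0, 8.0), "safe"),
    ((75.0, 10.0), "safe"),
]


def build_classifier(training_set):
    """Return a function that gives a new tuple its nearest neighbour's label."""
    def classify(sample):
        nearest = min(training_set, key=lambda pair: math.dist(pair[0], sample))
        return nearest[1]
    return classify


# Input: labeled tuples. Output: a classifier. Aim: label new tuples.
classifier = build_classifier(TRAINING_SET)
print(classifier((22.0, 1.5)))
print(classifier((70.0, 9.0)))
```

The features are left unscaled here for brevity; in practice, features measured on different scales are usually normalized before distances are compared.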
Why Classification?
• A classical problem extensively studied by
• statisticians and machine learning researchers
• Predicts categorical class labels.
• Produces a model (classifier).
Typical Applications of Classification
• Example:
• {credit history, salary} → credit approval (Yes/No)
• {Temp, Humidity} → Rain (Yes/No)
• A set of documents → sports, technology, etc.
• Another Example:
• If x >= 90 then grade = A.
• If 80 <= x < 90 then grade = B.
• If 70 <= x < 80 then grade = C.
• If 60 <= x < 70 then grade = D.
• If x < 60 then grade = F.
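The grading rules form a simple rule-based classifier, which might be written as follows. This is a sketch under one assumption: the lowest band is taken to cover every score below 60, so that no score is left without a grade. The function name `grade` is hypothetical.

```python
def grade(score):
    """Map a numeric score to a letter grade using fixed thresholds."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    # Assumption: everything below 60 is an F, so every score gets a class.
    return "F"


for score in (95, 85, 72, 60, 41):
    print(score, grade(score))
```

Unlike a learned model, these rules are written by hand; a classification algorithm would instead derive such thresholds from labeled examples.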
WHAT IS TEXT CLASSIFICATION?
• Text classification is a machine learning technique that assigns a set of predefined categories to open-ended text.
• Text classifiers can be used to organize, structure, and categorize almost any kind of text, from documents, medical studies, and files to content from all over the web.
What is meant by text classification?
• Text classification, or text categorization, is the activity of labeling natural language texts with relevant categories from a predefined set.
• In layman's terms, text classification is the process of extracting generic tags from unstructured text.
• These generic tags come from a set of predefined categories.
What is meant by text classification or document classification?
• Document classification, or document categorization, is a problem in library science, information science, and computer science.
• The task is to assign a document to one or more classes or categories.
• This may be done "manually" or algorithmically.
• Source: Wikipedia
What is meant by text classification?
• Text classification, also known as text tagging or text categorization, is the process of categorizing text into organized groups.
• By using Natural Language Processing (NLP), text classifiers can automatically analyze text and then assign a set of predefined tags or categories based on its content.
Text Classification Examples
• Text classification is becoming an increasingly important part of business, as it makes it easy to get insights from data and to automate business processes.
• Some of the most common examples and use cases for automatic text classification include the following:
a) Sentiment Analysis
b) Topic Detection
c) Language Detection
a) Sentiment Analysis: the process of understanding if a given text is
talking positively or negatively about a given subject
(e.g. for brand monitoring purposes).
b) Topic Detection: the task of identifying the theme or topic of a piece
of text
(e.g. know if a product review is about Ease of Use, Customer Support,
or Pricing when analyzing customer feedback).
c) Language Detection: the procedure of detecting the language of a
given text
(e.g. know if an incoming support ticket is written in English or Spanish for
automatically routing tickets to the appropriate team).
A Characterization of Text Classification
• For example,
• new articles can be organized by topics;
• support tickets can be organized by urgency;
• chat conversations can be organized by language;
• brand mentions can be organized by sentiment; and so on.
• Text classification is
• one of the fundamental tasks in natural language processing with broad applications such
as sentiment analysis, topic labeling, spam detection, and intent detection.
• Here’s an example of how it works:
• “The user interface is quite straightforward and easy to use.”
• A text classifier can take this phrase as an input, analyze its content, and then automatically
assign relevant tags, such as UI and Easy To Use.
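To illustrate how a sentence can receive several tags at once, here is a deliberately simple keyword-based tagger. It is a stand-in for a trained classifier, not how a production system would work; the keyword lists and the name `assign_tags` are hypothetical.

```python
# Hypothetical keyword lists per tag (illustrative, not a trained model).
TAG_KEYWORDS = {
    "UI": {"interface", "layout", "screen"},
    "Easy To Use": {"easy", "straightforward", "intuitive"},
    "Pricing": {"price", "expensive", "cheap"},
}


def assign_tags(text):
    """Return every tag whose keyword set overlaps the words of the text."""
    words = {word.strip(".,!?\"'").lower() for word in text.split()}
    return [tag for tag, keywords in TAG_KEYWORDS.items() if words & keywords]


print(assign_tags("The user interface is quite straightforward and easy to use."))
```

For the example sentence, the words "interface", "straightforward", and "easy" match the UI and Easy To Use keyword sets, so both tags are assigned.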
• A first tactic for categorizing documents is to assign a label to each document,
• but this solves the problem only when users know the labels of the documents they are looking for.
• This tactic does not solve the more general problem of finding documents on a specific topic or subject.
• In that case, a better solution is to group documents by common generic topics and label each group with a meaningful name.
• Each labeled group is called a category or class.
• Document classification is the process of categorizing documents under a given cluster or category using a fully supervised learning process.
Why is Text Classification Important?
• It’s estimated that around 80% of all information is unstructured, with text
being one of the most common types of unstructured data.
• Because of the messy nature of text, analyzing, understanding, organizing, and sorting through text data is hard and time-consuming, so most companies fail to use it to its full potential.
• This is where text classification with machine learning comes in.
• Using text classifiers, companies can automatically structure all manner of relevant text, from legal documents, social media, chatbots, surveys, and more, in a fast and cost-effective way.
• This allows companies to save time analyzing text data, automate business processes, and make data-driven business decisions.
Reasons Why Text Classification Is Important
a) Scalability
• Manually analyzing and organizing text is slow and much less accurate.
• Machine learning can automatically analyze millions of surveys, comments, emails,
etc., at a fraction of the cost, often in just a few minutes.
• Text classification tools are scalable to any business needs, large or small.
b) Real-time analysis
• There are critical situations that companies need to identify as soon as possible and
take immediate action (e.g., PR crises on social media).
• Machine learning text classification can follow your brand mentions constantly and in
real time, so you'll identify critical information and be able to take action right away.
c) Consistent criteria
• Human annotators make mistakes when classifying text data due to
distractions, fatigue, and boredom, and human subjectivity creates inconsistent
criteria.
• Machine learning, on the other hand, applies the same lens and criteria to all
data and results.
• Once a text classification model is properly trained, it applies those criteria consistently and can reach high accuracy.
A Characterization of Text Classification
• Classification could be performed
1. manually by domain experts or
2. automatically, using well-known and widely used classification algorithms such as decision trees and Naïve Bayes.
• Documents are classified according to their subjects or according to other attributes (e.g. author, document type, publishing year, etc.).
• There are two main kinds of subject classification of documents:
1. the content-based approach and
2. the request-based approach.
• In content-based classification, the weight given to particular subjects in a document decides the class to which the document is assigned.
• For example, it is a rule in some library classification schemes that at least 15% of the content of a book should be about the class to which the book is assigned.
• In automatic classification, the number of times given words appear in a document determines the class.
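Content-based classification by word counts could look like the sketch below. The indicator words per class and the function name `classify_by_term_counts` are hypothetical, and applying the 15% figure as a minimum share of words is an assumption made for illustration; the library rule mentioned above concerns book content, not word counts.

```python
from collections import Counter

# Hypothetical indicator words for each class (an assumption for illustration).
CLASS_TERMS = {
    "sports": {"match", "team", "goal", "player"},
    "technology": {"software", "chip", "network", "device"},
}


def classify_by_term_counts(document, min_share=0.15):
    """Pick the class whose terms occur most often, if they reach min_share."""
    words = document.lower().split()
    counts = Counter(words)
    totals = {
        label: sum(counts[term] for term in terms)
        for label, terms in CLASS_TERMS.items()
    }
    best = max(totals, key=totals.get)
    # Require that the winning class accounts for a minimum share of the words.
    if not words or totals[best] / len(words) < min_share:
        return None
    return best


print(classify_by_term_counts("the team scored a late goal in the match"))
print(classify_by_term_counts("the weather was pleasant all week"))
```

Returning `None` when no class reaches the threshold keeps the classifier from forcing a label onto a document that matches none of the classes.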
• In request-oriented classification, the anticipated requests from users influence how documents are classified.
• The classifier asks:
• “Under which description should this entity be found?” and
• “Think of all the possible queries and decide for which ones the entity at hand is relevant.”
Text Classification Applications
• With the help of text classification, businesses can make sense of large
amounts of data using techniques like
• aspect-based sentiment analysis to understand what people are talking about
and how they’re talking about each aspect.
• Text classification can help support teams provide a stellar experience
by
• automating tasks that are better left to computers, saving precious time that
can be spent on more important things.
• Text classification models can help you analyze survey results to discover patterns and insights like:
• What do people like about our product or service?
• What should we improve?
• What do we need to change?
• By combining both quantitative results and qualitative analyses,
• teams can make more informed decisions without having to spend hours
manually analyzing every single open-ended response.
• Text classification has thousands of use cases and is applied to a wide range
of tasks.
• In some cases, data classification tools work behind the scenes to enhance
app features we interact with on a daily basis (like email spam filtering).
• In some other cases, classifiers are used by marketers, product managers,
engineers, and salespeople to automate business processes and save
hundreds of hours of manual data processing.
• Some of the top applications and use cases of text classification include:
1. Detecting urgent issues
2. Automating customer support processes
3. Listening to the Voice of the Customer (VoC)
A Characterization of Text Classification
• Automatic document classification tasks can be divided into three types:
1. Unsupervised document classification (document clustering): the classification must be done entirely without reference to external information.
2. Semi-supervised document classification: part of the documents are labeled by an external method.
3. Supervised document classification: some external method (such as human feedback) provides information on the correct classification for documents.
Computational Supervised Learning
• Computational supervised learning, also called classification, aims to:
• learn from past experience, and
• use the learned knowledge to classify new data.
• The knowledge is learned by intelligent algorithms.
• Examples:
• Clinical diagnosis for patients
• Cell type classification
Overall Picture of Supervised Learning
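[Figure: overall picture of supervised learning. Labels recovered from the slide: data domains (Biomedical, Financial, Government, Scientific); learning methods (Decision trees, Emerging patterns, SVM, Neural networks); output: Classifiers (M-Doctors).]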
Unsupervised Learning
• Unsupervised learning is a machine learning technique in which models are not supervised using a labeled training dataset. Instead, the models themselves find hidden patterns and insights in the given data. It can be compared to the learning that takes place in the human brain when learning new things. It can be defined as:
• “Unsupervised learning is a type of machine learning in which models are trained using an unlabeled dataset and are allowed to act on that data without any supervision.”
Unsupervised learning cannot be directly applied to a regression or classification problem because, unlike in supervised learning, we have the input data but no corresponding output data.
The goal of unsupervised learning is to find the underlying structure of a dataset, group the data according to similarities, and represent the dataset in a compressed format.
Example: Suppose the unsupervised learning algorithm is given an input dataset containing images of different types of cats and dogs.
The algorithm is never trained on labels for this dataset, which means it has no prior knowledge of the dataset's features.
The task of the unsupervised learning algorithm is to identify the image features on its own.
• The unsupervised learning algorithm will perform this task by clustering the image dataset into groups according to similarities between images.
• Simply put, no labeled training data is provided.
• Examples:
• neural network models
• independent component analysis
• clustering
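Grouping unlabeled data by similarity can be sketched with a small k-means routine. This is an illustration under stated assumptions: the 2-D points stand in for feature vectors that would be extracted from images, the initial centroids are picked by hand, and the function name `k_means` is hypothetical. The slides do not prescribe a specific clustering algorithm here.

```python
import math

# Hypothetical 2-D feature vectors (e.g. two measurements per image); no labels.
POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to centroids and recomputing them."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return clusters, centroids


# Assumption: the first and last points serve as the initial centroids.
clusters, centroids = k_means(POINTS, [POINTS[0], POINTS[-1]])
print(clusters)
print(centroids)
```

No class labels are used anywhere: the two groups emerge only from the distances between points, which is the sense in which the algorithm finds structure on its own.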
Supervised vs. Unsupervised Learning
Classification vs. Clustering
• Supervised learning (classification)
• Supervision: The training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
• New data is classified based on the training set
• Unsupervised learning (clustering)
• The class labels of the training data are unknown
• Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
Any Questions?
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SEMESTER – VIII
PROFESSIONAL ELECTIVE – IV
CS8080- INFORMATION RETRIEVAL TECHNIQUES

More Related Content

What's hot

Machine Learning
Machine LearningMachine Learning
Machine Learning
Rahul Kumar
 
Vector space model in information retrieval
Vector space model in information retrievalVector space model in information retrieval
Vector space model in information retrieval
Tharuka Vishwajith Sarathchandra
 
Lecture #1: Introduction to machine learning (ML)
Lecture #1: Introduction to machine learning (ML)Lecture #1: Introduction to machine learning (ML)
Lecture #1: Introduction to machine learning (ML)
butest
 
Deep Learning With Neural Networks
Deep Learning With Neural NetworksDeep Learning With Neural Networks
Deep Learning With Neural Networks
Aniket Maurya
 
Anomaly detection Workshop slides
Anomaly detection Workshop slidesAnomaly detection Workshop slides
Anomaly detection Workshop slides
QuantUniversity
 
Machine learning
Machine learningMachine learning
Machine learning
Vatsal Gajera
 
Machine learning
Machine learning Machine learning
Machine learning
Saurabh Agrawal
 
Types of Machine Learning
Types of Machine LearningTypes of Machine Learning
Types of Machine Learning
Samra Shahzadi
 
The vector space model
The vector space modelThe vector space model
The vector space model
pkgosh
 
CS8080_IRT_UNIT - III T3 NAIVE TEXT CLASSIFICATION.pdf
CS8080_IRT_UNIT - III T3 NAIVE TEXT CLASSIFICATION.pdfCS8080_IRT_UNIT - III T3 NAIVE TEXT CLASSIFICATION.pdf
CS8080_IRT_UNIT - III T3 NAIVE TEXT CLASSIFICATION.pdf
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
 
. An introduction to machine learning and probabilistic ...
. An introduction to machine learning and probabilistic .... An introduction to machine learning and probabilistic ...
. An introduction to machine learning and probabilistic ...
butest
 
Association rule mining
Association rule miningAssociation rule mining
Association rule mining
Acad
 
Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and prediction
DataminingTools Inc
 
Ensemble learning
Ensemble learningEnsemble learning
Ensemble learning
Haris Jamil
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
Gopal Sakarkar
 
Machine learning ppt.
Machine learning ppt.Machine learning ppt.
Machine learning ppt.
ASHOK KUMAR
 
CS8080 IRT UNIT - III SLIDES IN PDF.pdf
CS8080  IRT UNIT - III  SLIDES IN PDF.pdfCS8080  IRT UNIT - III  SLIDES IN PDF.pdf
CS8080 IRT UNIT - III SLIDES IN PDF.pdf
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
 
Machine learning seminar ppt
Machine learning seminar pptMachine learning seminar ppt
Machine learning seminar ppt
RAHUL DANGWAL
 
Smart Data Slides: Machine Learning - Case Studies
Smart Data Slides: Machine Learning - Case StudiesSmart Data Slides: Machine Learning - Case Studies
Smart Data Slides: Machine Learning - Case Studies
DATAVERSITY
 
Machine learning clustering
Machine learning clusteringMachine learning clustering
Machine learning clustering
CosmoAIMS Bassett
 

What's hot (20)

Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Vector space model in information retrieval
Vector space model in information retrievalVector space model in information retrieval
Vector space model in information retrieval
 
Lecture #1: Introduction to machine learning (ML)
Lecture #1: Introduction to machine learning (ML)Lecture #1: Introduction to machine learning (ML)
Lecture #1: Introduction to machine learning (ML)
 
Deep Learning With Neural Networks
Deep Learning With Neural NetworksDeep Learning With Neural Networks
Deep Learning With Neural Networks
 
Anomaly detection Workshop slides
Anomaly detection Workshop slidesAnomaly detection Workshop slides
Anomaly detection Workshop slides
 
Machine learning
Machine learningMachine learning
Machine learning
 
Machine learning
Machine learning Machine learning
Machine learning
 
Types of Machine Learning
Types of Machine LearningTypes of Machine Learning
Types of Machine Learning
 
The vector space model
The vector space modelThe vector space model
The vector space model
 
CS8080_IRT_UNIT - III T3 NAIVE TEXT CLASSIFICATION.pdf
CS8080_IRT_UNIT - III T3 NAIVE TEXT CLASSIFICATION.pdfCS8080_IRT_UNIT - III T3 NAIVE TEXT CLASSIFICATION.pdf
CS8080_IRT_UNIT - III T3 NAIVE TEXT CLASSIFICATION.pdf
 
. An introduction to machine learning and probabilistic ...
. An introduction to machine learning and probabilistic .... An introduction to machine learning and probabilistic ...
. An introduction to machine learning and probabilistic ...
 
Association rule mining
Association rule miningAssociation rule mining
Association rule mining
 
Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and prediction
 
Ensemble learning
Ensemble learningEnsemble learning
Ensemble learning
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 
Machine learning ppt.
Machine learning ppt.Machine learning ppt.
Machine learning ppt.
 
CS8080 IRT UNIT - III SLIDES IN PDF.pdf
CS8080  IRT UNIT - III  SLIDES IN PDF.pdfCS8080  IRT UNIT - III  SLIDES IN PDF.pdf
CS8080 IRT UNIT - III SLIDES IN PDF.pdf
 
Machine learning seminar ppt
Machine learning seminar pptMachine learning seminar ppt
Machine learning seminar ppt
 
Smart Data Slides: Machine Learning - Case Studies
Smart Data Slides: Machine Learning - Case StudiesSmart Data Slides: Machine Learning - Case Studies
Smart Data Slides: Machine Learning - Case Studies
 
Machine learning clustering
Machine learning clusteringMachine learning clustering
Machine learning clustering
 

Similar to CS8080_IRT_UNIT - III T1 A CHARACTERIZATION OF TEXT CLASSIFICATION.pdf

CS8080_IRT_UNIT - III T8 FEATURE SELECTION OR DIMENSIONALITY REDUCTION.pdf
CS8080_IRT_UNIT - III T8  FEATURE SELECTION OR DIMENSIONALITY REDUCTION.pdfCS8080_IRT_UNIT - III T8  FEATURE SELECTION OR DIMENSIONALITY REDUCTION.pdf
CS8080_IRT_UNIT - III T8 FEATURE SELECTION OR DIMENSIONALITY REDUCTION.pdf
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
 
CS8080_IRT_UNIT - III T9 EVALUATION METRICS.pdf
CS8080_IRT_UNIT - III T9 EVALUATION METRICS.pdfCS8080_IRT_UNIT - III T9 EVALUATION METRICS.pdf
CS8080_IRT_UNIT - III T9 EVALUATION METRICS.pdf
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
 
CS8080_IRT_UNIT - III T10 ACCURACY AND ERROR.pdf
CS8080_IRT_UNIT - III T10  ACCURACY AND ERROR.pdfCS8080_IRT_UNIT - III T10  ACCURACY AND ERROR.pdf
CS8080_IRT_UNIT - III T10 ACCURACY AND ERROR.pdf
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
 
CS8080_IRT_UNIT - III T2 UNSUPERVISED ALGORITHMS -CLUSTERING.pdf
CS8080_IRT_UNIT - III T2 UNSUPERVISED ALGORITHMS -CLUSTERING.pdfCS8080_IRT_UNIT - III T2 UNSUPERVISED ALGORITHMS -CLUSTERING.pdf
CS8080_IRT_UNIT - III T2 UNSUPERVISED ALGORITHMS -CLUSTERING.pdf
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
 
CS8080_IRT_UNIT - III T5 DECISION TREES.pdf
CS8080_IRT_UNIT - III T1 A CHARACTERIZATION OF TEXT CLASSIFICATION.pdf

  • 1. P1WU UNIT – III: CLASSIFICATION Topic 1: A CHARACTERIZATION OF TEXT CLASSIFICATION AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING SEMESTER – VIII PROFESSIONAL ELECTIVE – IV CS8080- INFORMATION RETRIEVAL TECHNIQUES
  • 2. UNIT III 1. A Characterization of Text Classification 2. Unsupervised Algorithms: Clustering 3. Naïve Text Classification 4. Supervised Algorithms 5. Decision Tree 6. k-NN Classifier 7. SVM Classifier 8. Feature Selection or Dimensionality Reduction 9. Evaluation Metrics 10. Accuracy and Error 11. Organizing the Classes 12. Indexing and Searching 13. Inverted Indexes 14. Sequential Searching 15. Multi-dimensional Indexing
  • 3. INTRODUCTION TO CLASSIFICATION
  • 4. INTRODUCTION TO CLASSIFICATION • Scientists became very serious about addressing the question: “Can we build a model that learns from available data and automatically makes the right decisions and predictions?” • The answer can be found in numerous applications that are emerging from the fields of 1. pattern classification, 2. machine learning, and 3. artificial intelligence.
  • 5. INTRODUCTION TO CLASSIFICATION • Data from various sensing devices, combined with powerful learning algorithms and domain knowledge, has led to many great inventions that we now take for granted in everyday life: • Internet queries via search engines like Google, • text recognition at the post office, • barcode scanners at the supermarket, • the diagnosis of diseases, • speech recognition by Siri or Google Now on our mobile phones, just to name a few.
  • 6. INTRODUCTION TO CLASSIFICATION • Classification is the data mining process of finding a model (or function) that describes and distinguishes data classes or concepts, so that the model can be used to predict the class of objects whose class label is unknown. • That is, it predicts categorical class labels (discrete or nominal). • It constructs the model (the classifier) from a training set. • It predicts group membership for data instances.
  • 7. INTRODUCTION TO CLASSIFICATION What is CLASSIFICATION? • Classification and prediction are two forms of data analysis that can be used to extract models describing important data classes or to predict future data trends. • Classification and prediction help us understand large data sets better. • Classification predicts categorical (discrete, unordered) labels. • Prediction models continuous-valued functions.
  • 8. INTRODUCTION TO CLASSIFICATION • How can we classify? • The trick here is machine learning, which makes classifications based on past observations (the learning part). • We give the machine a set of texts, each tagged with a label, and let the model learn from this data; the trained model can then predict the category of any new text input we feed it.
  • 9. Applications of Classification • Classification of (potential) customers for credit approval, risk prediction, and selective marketing • Performance prediction based on selected indicators • Medical diagnosis based on symptoms or reactions to therapy • Application areas: • Credit approval • Target marketing • Medical diagnosis • Treatment effectiveness analysis • Performance prediction
  • 10. When is classification needed? • Scenarios: • In each of these examples, the data analysis task is classification, where a model or classifier is constructed to predict categorical labels, such as • “safe” or “risky” for the loan application data; • “yes” or “no” for the marketing data; or • “treatment A,” “treatment B,” or “treatment C” for the medical data. • These categories can be represented by discrete values, where the ordering among values has no meaning. • For example, the values 1, 2, and 3 may be used to represent treatments A, B, and C, where there is no ordering implied among this group of treatment regimes.
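The point about discrete values without ordering can be illustrated with a minimal sketch. The helper names `encode_labels` and `one_hot` are hypothetical and chosen only for this example; the integer codes follow the 1, 2, 3 convention from the slide, and the indicator vector is one common way to make the lack of ordering explicit.

```python
# Labels taken from the medical scenario on the slide.
TREATMENTS = ["treatment A", "treatment B", "treatment C"]


def encode_labels(labels):
    """Map each distinct label to an arbitrary integer code (1, 2, 3, ...)."""
    return {label: code for code, label in enumerate(labels, start=1)}


def one_hot(label, labels):
    """Represent a label as an indicator vector, which implies no ordering."""
    return [1 if label == candidate else 0 for candidate in labels]


codes = encode_labels(TREATMENTS)
print(codes)                               # {'treatment A': 1, 'treatment B': 2, 'treatment C': 3}
print(one_hot("treatment B", TREATMENTS))  # [0, 1, 0]
```

The integer codes are only names: comparing them (for example, 3 > 1) says nothing about the treatments themselves.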
  • 11. INTRODUCTION TO CLASSIFICATION • Aim: predict categorical class labels for new tuples/samples • Input: a training set of tuples/samples, each with a class label • Output: a model (a classifier) based on the training set and the class labels
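The aim, input, and output above can be sketched as a pair of functions: one that builds a model from a labelled training set and one that predicts a label for a new sample. The sketch below uses a small multinomial Naïve Bayes classifier with add-one smoothing as a stand-in; this choice of algorithm, the function names `train` and `predict`, and the four-sentence training set are all assumptions made for illustration, not something the slides prescribe.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Build a model from (text, label) pairs: per-class document and word counts."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_docs[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return {"class_docs": class_docs, "word_counts": word_counts, "vocab": vocab}


def predict(model, text):
    """Return the label with the highest log-probability under Naive Bayes."""
    total_docs = sum(model["class_docs"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_docs"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # class prior
        for word in text.lower().split():
            # Add-one (Laplace) smoothing avoids log(0) for unseen words.
            score += math.log((counts[word] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training set: each sample carries a class label.
training_set = [
    ("the team won the match", "sports"),
    ("the player scored a goal", "sports"),
    ("new phone with a faster chip", "technology"),
    ("the laptop has a new chip", "technology"),
]
model = train(training_set)
print(predict(model, "the player won"))   # sports
print(predict(model, "a faster laptop"))  # technology
```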
  • 12. Why Classification? • A classical problem extensively studied by statisticians and machine learning researchers • Predicts categorical class labels. • Produces a model (classifier).
  • 13. Typical Applications of Classification • Example: • {credit history, salary} → credit approval (Yes/No) • {Temp, Humidity} → Rain (Yes/No) • A set of documents → sports, technology, etc. • Another Example: • If x >= 90 then grade = A. • If 80 <= x < 90 then grade = B. • If 70 <= x < 80 then grade = C. • If 60 <= x < 70 then grade = D. • If x < 60 then grade = F.
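The grading example is a hand-written, rule-based classifier: it maps a numeric input to one of five categorical labels. A minimal sketch follows, assuming the last rule is meant to cover every score below 60 so that no score is left without a grade; the function name `grade` is hypothetical.

```python
def grade(score):
    """Map a numeric score to a letter grade using fixed threshold rules."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"  # assumption: everything below 60 is an F


for x in (95, 85, 72, 60, 55):
    print(x, grade(x))
```

Unlike a learned classifier, these thresholds are fixed by hand rather than inferred from a training set.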
  • 14. WHAT IS TEXT CLASSIFICATION? • Text classification is a machine learning technique that assigns a set of predefined categories to open-ended text. • Text classifiers can be used to organize, structure, and categorize almost any kind of text, including documents, medical studies, files, and content from all over the web.
  • 15. What is meant by text classification? • Text classification, or text categorization, is the activity of labeling natural language texts with relevant categories from a predefined set. • In layman's terms, text classification is a process of extracting generic tags from unstructured text. • These generic tags come from a set of predefined categories.
  • 16. What is meant by text classification or document classification? • Document classification, or document categorization, is a problem in library science, information science, and computer science. • The task is to assign a document to one or more classes or categories. • This may be done "manually" or algorithmically. (Source: Wikipedia)
  • 17. What is meant by text classification? • Text classification, also known as text tagging or text categorization, is the process of categorizing text into organized groups. • By using Natural Language Processing (NLP), text classifiers can automatically analyze text and then assign a set of predefined tags or categories based on its content.
  • 18. Text Classification Examples • Text classification is becoming an increasingly important part of business because it makes it easy to get insights from data and to automate business processes. • Some of the most common examples and use cases for automatic text classification include the following: a) Sentiment Analysis b) Topic Detection c) Language Detection
  • 19. Text Classification Examples a) Sentiment Analysis: the process of determining whether a given text talks positively or negatively about a given subject (e.g. for brand monitoring purposes). b) Topic Detection: the task of identifying the theme or topic of a piece of text (e.g. knowing whether a product review is about Ease of Use, Customer Support, or Pricing when analyzing customer feedback). c) Language Detection: the procedure of detecting the language of a given text (e.g. knowing whether an incoming support ticket is written in English or Spanish so that tickets are routed automatically to the appropriate team).
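The language detection use case, routing a support ticket to an English or Spanish team, can be sketched with a very simple heuristic: count how many common function words of each language appear in the text. This stop-word heuristic, the short word lists, and the function name `detect_language` are assumptions for illustration; production systems typically rely on trained models over character n-grams.

```python
import re

# Small, hypothetical stop-word lists; real lists are much longer.
STOPWORDS = {
    "English": {"the", "is", "and", "of", "to", "in", "my", "not"},
    "Spanish": {"el", "la", "es", "y", "de", "en", "mi", "no"},
}


def detect_language(text):
    """Return the language whose stop words occur most often, or None if none match."""
    words = re.findall(r"[a-záéíóúñü]+", text.lower())
    scores = {
        language: sum(word in stopwords for word in words)
        for language, stopwords in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(detect_language("My order is not in the system"))    # English
print(detect_language("Mi pedido no está en el sistema"))  # Spanish
```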
  • 20. A Characterization of Text Classification • For example, • new articles can be organized by topic; • support tickets can be organized by urgency; • chat conversations can be organized by language; • brand mentions can be organized by sentiment; and so on. • Text classification is one of the fundamental tasks in natural language processing, with broad applications such as sentiment analysis, topic labeling, spam detection, and intent detection. • Here’s an example of how it works: • “The user interface is quite straightforward and easy to use.” • A text classifier can take this phrase as input, analyze its content, and then automatically assign relevant tags, such as UI and Easy To Use.
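The tagging example above can be mimicked with a minimal keyword-based sketch. It is a hand-written rule set rather than a trained classifier, and the tag vocabulary in `TAG_KEYWORDS` and the function name `assign_tags` are hypothetical; a learned model would infer such associations from labelled examples instead.

```python
import re

# Hypothetical keyword lists for each predefined tag.
TAG_KEYWORDS = {
    "UI": {"interface", "ui", "layout", "design"},
    "Easy To Use": {"easy", "straightforward", "intuitive"},
    "Pricing": {"price", "pricing", "expensive", "cheap"},
}


def assign_tags(text):
    """Return every tag that has at least one keyword present in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [tag for tag, keywords in TAG_KEYWORDS.items() if words & keywords]


review = "The user interface is quite straightforward and easy to use."
print(assign_tags(review))  # ['UI', 'Easy To Use']
```

Because a text may match several keyword sets, the function returns a list, which reflects that one text can carry more than one tag.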
  • 21. A Characterization of Text Classification • A first tactic for categorizing documents is to assign a label to each document, but this solves the problem only when users know the labels of the documents they are looking for. • This tactic does not solve the more generic problem of finding documents on a specific topic or subject.
  • 22. A Characterization of Text Classification • For that case, a better solution is to group documents by common generic topics and label each group with a meaningful name. • Each labeled group is called a category or class. • Document classification is the process of categorizing documents under a given cluster or category using a fully supervised learning process.
  • 23. Why is Text Classification Important? • It’s estimated that around 80% of all information is unstructured, with text being one of the most common types of unstructured data. • Because of the messy nature of text, analyzing, understanding, organizing, and sorting through text data is hard and time-consuming, so most companies fail to use it to its full potential. • This is where text classification with machine learning comes in. • Using text classifiers, companies can automatically structure all manner of relevant text, from legal documents, social media, chatbots, surveys, and more, in a fast and cost-effective way. • This allows companies to save time analyzing text data, automate business processes, and make data-driven business decisions.
  • 24. Reasons Why Text Classification Is Important a) Scalability • Manually analyzing and organizing text is slow and much less accurate. • Machine learning can automatically analyze millions of surveys, comments, emails, etc., at a fraction of the cost, often in just a few minutes. • Text classification tools can scale to any business need, large or small. b) Real-time analysis • There are critical situations that companies need to identify as soon as possible so they can take immediate action (e.g., PR crises on social media). • Machine learning text classification can follow your brand mentions constantly and in real time, so you can identify critical information and take action right away.
  • 25. Reasons Why Text Classification Is Important c) Consistent criteria • Human annotators make mistakes when classifying text data due to distractions, fatigue, and boredom, and human subjectivity creates inconsistent criteria. • Machine learning, on the other hand, applies the same lens and criteria to all data and results. • Once a text classification model is properly trained, it can perform with high accuracy.
  • 26. A Characterization of Text Classification • Classification can be performed 1. manually by domain experts or 2. automatically using well-known and • widely used classification algorithms such as decision trees and Naïve Bayes. • Documents are classified according to • their subjects or according to other attributes (e.g., author, document type, publishing year).
  • 27. A Characterization of Text Classification • There are two main kinds of subject classification of documents: 1. the content-based approach and 2. the request-based approach. • In content-based classification, • the weight given to particular subjects in a document decides the class to which the document is assigned. • For example, it is a rule in some library classification that at least 15% of the content of a book should be about the class to which the book is assigned. • In automatic classification, the number of times given words appear in a document determines the class.
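The word-count idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the slides: the keyword lists in `CLASS_KEYWORDS` and the function names are hypothetical, and a practical classifier would learn its vocabulary from data rather than hard-code it.

```python
import re
from collections import Counter

# Hypothetical keyword lists per class, chosen only for illustration.
CLASS_KEYWORDS = {
    "sports": {"match", "team", "goal", "player"},
    "finance": {"bank", "stock", "market", "loan"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify_by_counts(text):
    """Return the class whose keywords occur most often, or None if none occur."""
    counts = Counter(tokenize(text))
    # Score each class by the total number of keyword occurrences.
    scores = {
        label: sum(counts[word] for word in keywords)
        for label, keywords in CLASS_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(classify_by_counts("The team scored a late goal in the match"))
```

Here the document is assigned to whichever class has the highest total keyword count, which mirrors the slide's statement that word frequencies determine the class.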
  • 28. A Characterization of Text Classification • In request-oriented classification, the anticipated requests from users influence how documents are classified. • The classifier asks: • “Under which description should this entity be found?” and • “think of all the possible queries and decide for which ones the entity at hand is relevant”.
  • 29. Text Classification Applications • With the help of text classification, businesses can make sense of large amounts of data using techniques like • aspect-based sentiment analysis to understand what people are talking about and how they’re talking about each aspect. • Text classification can help support teams provide a stellar experience by • automating tasks that are better left to computers, saving precious time that can be spent on more important things.
  • 30. Text Classification Applications • Text classification models can help you analyze survey results to discover patterns and insights like: • What do people like about our product or service? • What should we improve? • What do we need to change? • By combining both quantitative results and qualitative analyses, • teams can make more informed decisions without having to spend hours manually analyzing every single open-ended response.
  • 31. Text Classification Applications • Text classification has thousands of use cases and is applied to a wide range of tasks. • In some cases, data classification tools work behind the scenes to enhance app features we interact with on a daily basis (like email spam filtering). • In other cases, classifiers are used by marketers, product managers, engineers, and salespeople to automate business processes and save hundreds of hours of manual data processing. • Some of the top applications and use cases of text classification include: 1. Detecting urgent issues 2. Automating customer support processes 3. Listening to the Voice of the Customer (VoC)
  • 32. A Characterization of Text Classification • Automatic document classification tasks can be divided into three types: 1. Unsupervised document classification (document clustering): the classification must be done entirely without reference to external information. 2. Semi-supervised document classification: only some of the documents are labeled by an external method. 3. Supervised document classification: some external method (such as human feedback) provides information on the correct classification of documents.
  • 33. Computational Supervised Learning • Computational supervised learning, also called classification, aims to: • learn from past experience, and • use the learned knowledge to classify new data. • The knowledge is learned by intelligent algorithms. • Examples: • Clinical diagnosis for patients • Cell type classification
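The "learn from past experience, then classify new data" cycle can be illustrated with a small Naïve Bayes text classifier written with the standard library only. This is a sketch under stated assumptions: the training sentences, the labels "spam" and "ham", and the helper names `train` and `predict` are all invented for illustration, and add-one (Laplace) smoothing is one common choice among several.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_docs):
    """Learn class counts and per-class word counts from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability for the new text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Start from the log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Add-one smoothing avoids zero probabilities.
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled examples standing in for "past experience".
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
]
model = train(TRAINING)
print(predict(model, "claim your free prize"))
```

The labeled pairs play the role of past experience, and `predict` applies the learned word statistics to text the model has not seen before.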
  • 34. Overall Picture of Supervised Learning • [Figure labels: Biomedical, Financial, Government, Scientific; Decision trees, Emerging patterns, SVM, Neural networks; Classifiers (M-Doctors)]
  • 35. Unsupervised Learning • Unsupervised learning is a machine learning technique in which models are not supervised using a labeled training dataset. Instead, the models themselves find hidden patterns and insights in the given data. It can be compared to the learning that takes place in the human brain when learning new things. It can be defined as: • “Unsupervised learning is a type of machine learning in which models are trained using an unlabeled dataset and are allowed to act on that data without any supervision”.
  • 36. Unsupervised Learning • Unsupervised learning cannot be directly applied to a regression or classification problem because, unlike supervised learning, we have the input data but no corresponding output data. • The goal of unsupervised learning is to find the underlying structure of a dataset, group the data according to similarities, and represent the dataset in a compressed format.
  • 37. Unsupervised Learning • Example: Suppose the unsupervised learning algorithm is given an input dataset containing images of different types of cats and dogs. • The algorithm is never given labels for this dataset, which means it has no prior idea about the features of the dataset. • The task of the unsupervised learning algorithm is to identify the image features on its own.
  • 38. Unsupervised Learning • The unsupervised learning algorithm will • perform this task by clustering the image dataset into groups according to similarities between images. • Put simply, • no labeled training data is provided. • Examples: • neural network models • independent component analysis • clustering
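Grouping unlabeled items by similarity can be sketched with a simple single-pass ("leader") clustering routine over short texts. This is an illustrative stand-in, not the algorithm used in the slides: the sample documents, the Jaccard word-overlap similarity, and the `threshold` value of 0.2 are all assumptions made for the example.

```python
import re


def tokens(text):
    """Return the set of lowercase alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leader_cluster(docs, threshold=0.2):
    """Put each document in the first cluster whose leader is similar enough.

    If no existing leader reaches the threshold, the document starts a new
    cluster and becomes its leader. No class labels are used anywhere.
    """
    clusters = []  # each entry: (leader token set, list of document indices)
    for index, doc in enumerate(docs):
        doc_tokens = tokens(doc)
        for leader, members in clusters:
            if jaccard(doc_tokens, leader) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((doc_tokens, [index]))
    return [members for _, members in clusters]


# Hypothetical unlabeled documents.
DOCS = [
    "the cat chased the mouse",
    "stock prices fell on the market",
    "a cat and a mouse played",
    "the market saw stock prices rise",
]
print(leader_cluster(DOCS))
```

The result depends on document order and on the threshold, which is a known limitation of single-pass clustering; iterative methods such as k-means trade that simplicity for more stable groupings.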
  • 39. Supervised vs. Unsupervised Learning (Classification vs. Clustering) • Supervised learning (classification) • Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations. • New data is classified based on the training set. • Unsupervised learning (clustering) • The class labels of the training data are unknown. • Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.
  • 40. Any Questions?