Text Classification Using
Machine Learning
Understanding, Implementing, and Optimizing
Introduction to Text Classification
• Definition and purpose
• Also called text categorization or document classification
• A natural language processing (NLP) task
• Analyzes and categorizes textual data into predefined categories or classes
• Applications (e.g., spam detection, sentiment analysis, topic
categorization)
• Information Organization
• Document Filtering and Routing
• Sentiment Analysis
• Topic Categorization
• Spam Detection
• Language Identification
• Fraud Detection
• Legal and Compliance Analysis
• Customer Support and Ticket Routing
• Medical Document Classification
Importance of Text Classification
• Enhancing information retrieval
• Personalizing user experiences
• Automating decision-making processes
• Product reviews and feedback analysis
• Legal document analysis
• Fraud detection analysis
• Language analysis
Supervised Learning for Text
Classification
• Supervised learning in the context of text classification
• Labeled datasets with input texts and corresponding labels
Text Representation
• Bag of Words (BoW) model
• Term Frequency-Inverse Document Frequency (TF-IDF)
• Word Embeddings (e.g., Word2Vec, GloVe)
Bag of Words (BoW)
1. "The cat in the hat."
2. "The cat sat on the mat."
3. "The dog barked."
• Sentence 1: ["The", "cat", "in", "the", "hat"]
• Sentence 2: ["The", "cat", "sat", "on", "the", "mat"]
• Sentence 3: ["The", "dog", "barked"]
Vocabulary: ["The", "cat", "in", "hat", "sat", "on", "mat", "dog", "barked"]
• Sentence 1: [1, 1, 1, 1, 0, 0, 0, 0, 0]
• Sentence 2: [1, 1, 0, 0, 1, 1, 1, 0, 0]
• Sentence 3: [1, 0, 0, 0, 0, 0, 0, 1, 1]
• This model discards word order and structure but captures which words are present in each document; word frequency is not taken into account in this binary variant (see the sketch below).
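A minimal sketch of this binary bag-of-words representation with scikit-learn's CountVectorizer (its default lowercasing merges "The" and "the", and its vocabulary is sorted alphabetically, so the column order differs from the slide):

```python
from sklearn.feature_extraction.text import CountVectorizer

sentences = [
    "The cat in the hat.",
    "The cat sat on the mat.",
    "The dog barked.",
]

# binary=True records presence/absence instead of raw counts,
# matching the 0/1 vectors on this slide.
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(sentences)

print(vectorizer.get_feature_names_out())  # vocabulary (lowercased, alphabetical)
print(X.toarray())                         # one row of 0/1 indicators per sentence
```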
Term Frequency-Inverse Document
Frequency
• TF-IDF assigns higher weights to terms that are more specific to a particular document but less
frequent across the entire corpus.
1. "The cat in the hat."
2. "The cat sat on the mat."
3. "The dog barked."
• Document 1: ["The", "cat", "in", "the", "hat"]
• Document 2: ["The", "cat", "sat", "on", "the", "mat"]
• Document 3: ["The", "dog", "barked"]
• TF(Document 1): {"The": 2, "cat": 1, "in": 1, "hat": 1}
• TF(Document 2): {"The": 2, "cat": 1, "sat": 1, "on": 1, "mat": 1}
• TF(Document 3): {"The": 1, "dog": 1, "barked": 1}
TF-IDF
• IDF("The"): log(3/3) = 0
• IDF("cat"): log(3/2) ≈ 0.41
• IDF("in"): log(3/1) ≈ 0.48
• IDF("hat"): log(3/1) ≈ 0.48
• IDF("sat"): log(3/1) ≈ 0.48
• IDF("on"): log(3/1) ≈ 0.48
• IDF("mat"): log(3/1) ≈ 0.48
• IDF("dog"): log(3/1) ≈ 0.48
• IDF("barked"): log(3/1) ≈ 0.48
• TF-IDF(Document 1): {"The": 0, "cat": 0.41, "in": 0.48, "hat": 0.48}
• TF-IDF(Document 2): {"The": 0, "cat": 0.41, "sat": 0.48, "on": 0.48, "mat": 0.48}
• TF-IDF(Document 3): {"The": 0, "dog": 0.48, "barked": 0.48}
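A minimal sketch with scikit-learn's TfidfVectorizer; note that scikit-learn uses a smoothed IDF, ln((1 + N) / (1 + df)) + 1, and L2-normalizes each row, so its weights differ from the hand-computed base-10 values above, although frequent terms like "the" still score lowest:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "The cat in the hat.",
    "The cat sat on the mat.",
    "The dog barked.",
]

# Default settings: smoothed IDF and L2-normalized rows, so the exact
# weights differ from the hand calculation, but the ranking is similar.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

for term, idf in zip(vectorizer.get_feature_names_out(), vectorizer.idf_):
    print(f"{term}: idf={idf:.2f}")
print(X.toarray().round(2))  # one TF-IDF row per document
```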
Word Embeddings
• Word2vec
• Sentences:
• "I love natural language processing."
• "Word embeddings capture semantic relationships."
• Vocabulary:
• Unique words: "I", "love", "natural", "language", "processing", "Word", "embeddings", "capture", "semantic", "relationships"
Initialized vectors
• I: [0.2, 0.5]
• love: [0.8, 0.3]
• natural: [0.4, 0.7]
• language: [0.6, 0.2]
• processing: [0.1, 0.9]
• Word: [0.7, 0.4]
• embeddings: [0.3, 0.6]
• capture: [0.9, 0.2]
• semantic: [0.5, 0.8]
• relationships: [0.2, 0.7]
Word2Vec
• Let's train a skip-gram model with window size 1
• Updated vector for "natural" (learning rate 0.01): [0.4, 0.7] + 0.01 * ( [0.8, 0.3] + [0.6, 0.2] ) = [0.4, 0.7] + 0.01 * [1.4, 0.5] = [0.414, 0.705]
• Repeat this for every word over multiple epochs
• Similarity("love", "natural") = dot([0.8, 0.3], [0.414, 0.705]) / (magnitude([0.8, 0.3]) * magnitude([0.414, 0.705]))
• = (0.8 * 0.414 + 0.3 * 0.705) / (sqrt(0.8^2 + 0.3^2) * sqrt(0.414^2 + 0.705^2))
• ≈ 0.78
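The toy update and similarity can be checked with a few lines of NumPy; this mirrors only the slide's simplified arithmetic, not a full skip-gram implementation (which would use softmax or negative-sampling gradients):

```python
import numpy as np

love, language = np.array([0.8, 0.3]), np.array([0.6, 0.2])
natural = np.array([0.4, 0.7])
lr = 0.01  # learning rate used on the slide

# Simplified "update": nudge "natural" toward the sum of its window-1
# neighbours ("love" and "language"), as on the slide.
natural = natural + lr * (love + language)
print(natural)  # [0.414 0.705]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(round(cosine(love, natural), 2))  # ≈ 0.78
```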
GloVe (Global Vectors for Word Representation)
GloVe (Global Vectors for Word Representation) is an unsupervised learning algorithm for generating word embeddings. It operates on the co-occurrence statistics of words in a corpus and aims to capture
global semantic relationships between words. The key idea is to model the probability of word co-occurrences and learn word embeddings that reflect these probabilities.
• Corpus:
• "I love natural language processing."
• "Word embeddings capture semantic relationships."
• Vocabulary:
• "I", "love", "natural", "language", "processing", "Word", "embeddings", "capture", "semantic", "relationships."
Build Word-Word Co-occurrence Matrix (rows and columns follow the vocabulary order above; two words co-occur when they appear in the same sentence):
• I: [0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
• love: [1, 0, 1, 1, 1, 0, 0, 0, 0, 0]
• natural: [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
• language: [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
• processing: [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
• Word: [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
• embeddings: [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
• capture: [0, 0, 0, 0, 0, 1, 1, 0, 1, 1]
• semantic: [0, 0, 0, 0, 0, 1, 1, 1, 0, 1]
• relationships: [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
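A short sketch that rebuilds this co-occurrence matrix; the tokenizer (whitespace split with trailing punctuation stripped) is an assumption:

```python
from itertools import permutations

corpus = [
    "I love natural language processing.",
    "Word embeddings capture semantic relationships.",
]

vocab = ["I", "love", "natural", "language", "processing",
         "Word", "embeddings", "capture", "semantic", "relationships"]
index = {w: i for i, w in enumerate(vocab)}

# Co-occurrence: every ordered pair of distinct words within a sentence.
X = [[0] * len(vocab) for _ in vocab]
for sentence in corpus:
    tokens = [t.strip(".") for t in sentence.split()]
    for a, b in permutations(tokens, 2):
        X[index[a]][index[b]] += 1

for word, row in zip(vocab, X):
    print(f"{word:>13}: {row}")
```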
GloVe
Initialize word embeddings
• I: [0.2, 0.5]
• love: [0.8, 0.3]
• natural: [0.4, 0.7]
• language: [0.6, 0.2]
• processing: [0.1, 0.9]
• Word: [0.7, 0.4]
• embeddings: [0.3, 0.6]
• capture: [0.9, 0.2]
• semantic: [0.5, 0.8]
• relationships: [0.2, 0.7]
It aims to minimize the difference between the dot product of word vectors and the logarithm of their co-occurrence counts.
GloVe
J = Σ_i Σ_j f(X_ij) * (w_i^T w̃_j + b_i + b̃_j - log(X_ij))^2
where w_i is the word vector for word i, w̃_j the context vector for word j, b_i and b̃_j their bias terms, X_ij the co-occurrence count, and f a weighting function that down-weights rare co-occurrences.
Similarity("love", "natural") = dot([0.8, 0.3], [0.4, 0.7]) / (magnitude([0.8, 0.3]) * magnitude([0.4, 0.7]))
= (0.8 * 0.4 + 0.3 * 0.7) / (sqrt(0.8^2 + 0.3^2) * sqrt(0.4^2 + 0.7^2))
≈ 0.966
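A sketch of evaluating the objective J on the toy matrix and the initialized vectors, assuming the standard weighting f(x) = (x/x_max)^0.75 capped at 1, context vectors initialized equal to the word vectors, and zero biases; none of these choices are given on the slides:

```python
import numpy as np

vocab = ["I", "love", "natural", "language", "processing",
         "Word", "embeddings", "capture", "semantic", "relationships"]
W = np.array([[0.2, 0.5], [0.8, 0.3], [0.4, 0.7], [0.6, 0.2], [0.1, 0.9],
              [0.7, 0.4], [0.3, 0.6], [0.9, 0.2], [0.5, 0.8], [0.2, 0.7]])
W_ctx = W.copy()                  # assumption: context vectors start equal to word vectors
b = b_ctx = np.zeros(len(vocab))  # assumption: biases start at zero

# Block-diagonal co-occurrence matrix from the previous slide.
X = np.zeros((10, 10))
X[:5, :5] = 1 - np.eye(5)
X[5:, 5:] = 1 - np.eye(5)

def f(x, x_max=1.0, alpha=0.75):
    # GloVe weighting function: down-weights rare pairs, caps at 1.
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

# J = sum over X_ij > 0 of f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2
i, j = np.nonzero(X)
err = (W[i] * W_ctx[j]).sum(axis=1) + b[i] + b_ctx[j] - np.log(X[i, j])
J = float(np.sum(f(X[i, j]) * err ** 2))
print(f"J = {J:.3f}")  # gradient descent on W, W_ctx, b, b_ctx would lower this
```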
Text Classification Algorithms
• Naive Bayes
• Support Vector Machines (SVM)
• Logistic Regression
• Decision Trees and Random Forest
• Neural Networks (e.g., Recurrent Neural Networks, Long Short-Term Memory (LSTM) networks)
Naive Bayes
Naive Bayes is a probabilistic machine learning algorithm that is
commonly used for classification tasks. It is based on Bayes'
theorem and makes the assumption that features are
conditionally independent given the class label, which simplifies
the computation.
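A minimal sketch of a Multinomial Naive Bayes text classifier with scikit-learn; the tiny spam/ham training set is made up purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Illustrative toy training data: spam vs. ham.
texts = [
    "win a free prize now", "limited offer click here",
    "meeting at noon tomorrow", "please review the attached report",
]
labels = ["spam", "spam", "ham", "ham"]

# Bag-of-words counts feed directly into MultinomialNB, which applies
# Bayes' theorem under the conditional-independence assumption.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["free prize offer", "see the report before the meeting"]))
```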
SVM
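An SVM learns the hyperplane that separates the classes with the largest margin. A minimal sketch with TF-IDF features and scikit-learn's LinearSVC, again on made-up data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

texts = [
    "great product, works perfectly", "terrible quality, broke in a day",
    "really happy with this purchase", "waste of money, very disappointed",
]
labels = ["positive", "negative", "positive", "negative"]

# TF-IDF features + linear max-margin classifier.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)
print(model.predict(["happy with the quality", "broke and disappointed"]))
```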
Preprocessing Text Data
• Tokenization
• Removing stop words
• Stemming and Lemmatization
• Handling rare words and misspellings
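A minimal sketch of these steps with NLTK (resource names for the downloads can vary slightly between NLTK versions):

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize

# One-time resource downloads.
nltk.download("punkt")
nltk.download("punkt_tab")  # needed by newer NLTK releases
nltk.download("stopwords")
nltk.download("wordnet")
nltk.download("omw-1.4")

text = "The cats were sitting on the mats and barking happily."

tokens = word_tokenize(text.lower())                           # tokenization
stop = set(stopwords.words("english"))
tokens = [t for t in tokens if t.isalpha() and t not in stop]  # stop-word removal

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
print([stemmer.stem(t) for t in tokens])          # stemming: "sitting" -> "sit"
print([lemmatizer.lemmatize(t) for t in tokens])  # lemmatization: "cats" -> "cat"
```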
Building a Text Classification Model
• Data Splitting: Training and Testing Sets
• Vectorizing Text Data
• Model Training
• Evaluation Metrics (e.g., accuracy, precision, recall, F1 score)
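These steps can be sketched end to end with scikit-learn; the tiny corpus below is a placeholder for a real labeled dataset:

```python
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Placeholder labeled corpus; replace with real data.
texts = ["free prize inside", "win money now", "lunch at noon?",
         "project status update", "claim your reward", "notes from the meeting"]
labels = [1, 1, 0, 0, 1, 0]  # 1 = spam, 0 = ham

# 1. Data splitting: training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.33, random_state=42, stratify=labels)

# 2. Vectorizing text data
vec = TfidfVectorizer()
X_train_v = vec.fit_transform(X_train)
X_test_v = vec.transform(X_test)

# 3. Model training
clf = LogisticRegression()
clf.fit(X_train_v, y_train)

# 4. Evaluation: accuracy, precision, recall, F1
print(classification_report(y_test, clf.predict(X_test_v)))
```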
Model Evaluation
• Confusion Matrix
• ROC Curve and AUC-ROC
• Precision-Recall Curve
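Continuing the sketch from the previous slide (it assumes the fitted clf, X_test_v, and y_test from that code), these evaluation tools are available directly in scikit-learn:

```python
from sklearn.metrics import (confusion_matrix, roc_auc_score,
                             roc_curve, precision_recall_curve)

# Assumed to exist from the previous sketch: clf, X_test_v, y_test.
y_pred = clf.predict(X_test_v)
y_score = clf.predict_proba(X_test_v)[:, 1]  # probability of the positive class

print(confusion_matrix(y_test, y_pred))           # rows: true class, cols: predicted
print("AUC-ROC:", roc_auc_score(y_test, y_score))

fpr, tpr, _ = roc_curve(y_test, y_score)                        # ROC curve points
precision, recall, _ = precision_recall_curve(y_test, y_score)  # PR curve points
```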
Challenges in Text Classification
• Handling Imbalanced Classes
• Dealing with Multiclass Classification
• Overfitting and Underfitting
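Class imbalance is often the first of these to bite; one common mitigation, sketched below with scikit-learn, is class reweighting via class_weight="balanced". The same estimator also handles multiclass labels out of the box. The toy data is purely illustrative:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["refund my order", "love it", "great value",
         "amazing", "broken on arrival", "ok I guess"]
labels = ["complaint", "praise", "praise", "praise", "complaint", "neutral"]  # imbalanced, 3 classes

# class_weight="balanced" reweights the loss inversely to class frequency,
# which mitigates imbalance; multiclass labels are handled automatically.
model = make_pipeline(TfidfVectorizer(),
                      LogisticRegression(class_weight="balanced", max_iter=1000))
model.fit(texts, labels)
print(model.predict(["arrived broken, want a refund"]))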
Feature Engineering for Text
Classification
• N-grams
• Word Embeddings
• Feature Scaling
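A sketch of n-gram features with TfidfVectorizer; its default L2 normalization already scales each document vector, which is the usual form of scaling for sparse text features (interpreting the bullet that way is an assumption):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat on the mat", "the dog barked at the cat"]

# ngram_range=(1, 2): unigrams plus bigrams; sublinear_tf dampens raw counts,
# and the default L2 norm scales every document vector to unit length.
vec = TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True)
X = vec.fit_transform(docs)
print(vec.get_feature_names_out())  # includes bigrams like "the cat", "cat sat"
print(X.shape)
```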
Tools and Libraries
• Python libraries (e.g., NLTK, Scikit-learn, TensorFlow, PyTorch)
Model Interpretability
• Explaining the predictions of a text classification model
• The importance of model interpretability in certain applications
Best Practices
• Cross-validation
• Hyperparameter tuning
• Ensemble methods for improved performance
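A minimal sketch of cross-validation and hyperparameter tuning via scikit-learn's GridSearchCV (toy data; an ensemble such as a VotingClassifier could be dropped into the same pipeline):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

texts = ["spam offer", "win cash", "free prize",
         "team meeting", "status report", "lunch plans"]
labels = [1, 1, 1, 0, 0, 0]

pipe = Pipeline([("tfidf", TfidfVectorizer()), ("clf", LogisticRegression())])

# Grid search with 3-fold cross-validation over a small hyperparameter grid.
params = {"tfidf__ngram_range": [(1, 1), (1, 2)], "clf__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, params, cv=3, scoring="f1")
search.fit(texts, labels)
print(search.best_params_, search.best_score_)
```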
Future Trends
• Recent advancements in text classification
• Emerging technologies and methodologies
