Text Classification Using
Machine Learning
Understanding, Implementing, and Optimizing
Introduction to Text Classification
• Definition and purpose
• Also called text categorization or document classification
• A natural language processing (NLP) task
• Analyzes and categorizes textual data into predefined categories or classes
• Applications (e.g., spam detection, sentiment analysis, topic
categorization)
• Information Organization
• Document Filtering and Routing
• Sentiment Analysis
• Topic Categorization
• Spam Detection
• Language Identification
• Fraud Detection
• Legal and Compliance Analysis
• Customer Support and Ticket Routing
• Medical Document Classification
Importance of Text Classification
• Enhancing information retrieval
• Personalizing user experiences
• Automating decision-making processes
• Product reviews and feedback analysis
• Legal document analysis
• Fraud detection analysis
• Language analysis
Supervised Learning for Text
Classification
• Supervised learning in the context of text classification
• Labeled datasets with input texts and corresponding labels
Text Representation
• Bag of Words (BoW) model
• Term Frequency-Inverse Document Frequency (TF-IDF)
• Word Embeddings (e.g., Word2Vec, GloVe)
Bag of Words (BoW)
1. "The cat in the hat."
2. "The cat sat on the mat."
3. "The dog barked."
• Sentence 1: ["The", "cat", "in", "the", "hat"]
• Sentence 2: ["The", "cat", "sat", "on", "the", "mat"]
• Sentence 3: ["The", "dog", "barked"]
Vocabulary: ["The", "cat", "in", "hat", "sat", "on", "mat", "dog", "barked"]
• Sentence 1: [1, 1, 1, 1, 0, 0, 0, 0, 0]
• Sentence 2: [1, 1, 0, 0, 1, 1, 1, 0, 0]
• Sentence 3: [1, 0, 0, 0, 0, 0, 0, 1, 1]
• This model discards word order and structure but captures which words are present in each document; word frequency is not taken into account in this binary variant (see the sketch below).
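A minimal sketch of this binary bag-of-words representation with scikit-learn's CountVectorizer (its default lowercasing merges "The" and "the", and its vocabulary is sorted alphabetically, so the column order differs from the slide):

```python
from sklearn.feature_extraction.text import CountVectorizer

sentences = [
    "The cat in the hat.",
    "The cat sat on the mat.",
    "The dog barked.",
]

# binary=True records presence/absence instead of raw counts,
# matching the 0/1 vectors on this slide.
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(sentences)

print(vectorizer.get_feature_names_out())  # vocabulary (lowercased, alphabetical)
print(X.toarray())                         # one row of 0/1 indicators per sentence
```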
Term Frequency-Inverse Document
Frequency
• TF-IDF assigns higher weights to terms that are more specific to a particular document but less
frequent across the entire corpus.
1. "The cat in the hat."
2. "The cat sat on the mat."
3. "The dog barked."
• Document 1: ["The", "cat", "in", "the", "hat"]
• Document 2: ["The", "cat", "sat", "on", "the", "mat"]
• Document 3: ["The", "dog", "barked"]
• TF(Document 1): {"The": 2, "cat": 1, "in": 1, "hat": 1}
• TF(Document 2): {"The": 2, "cat": 1, "sat": 1, "on": 1, "mat": 1}
• TF(Document 3): {"The": 1, "dog": 1, "barked": 1}
TF-IDF
• IDF("The"): log(3/3) = 0
• IDF("cat"): log(3/2) ≈ 0.41
• IDF("in"): log(3/1) ≈ 0.48
• IDF("hat"): log(3/1) ≈ 0.48
• IDF("sat"): log(3/1) ≈ 0.48
• IDF("on"): log(3/1) ≈ 0.48
• IDF("mat"): log(3/1) ≈ 0.48
• IDF("dog"): log(3/1) ≈ 0.48
• IDF("barked"): log(3/1) ≈ 0.48
• TF-IDF(Document 1): {"The": 0, "cat": 0.41, "in": 0.48, "hat": 0.48}
• TF-IDF(Document 2): {"The": 0, "cat": 0.41, "sat": 0.48, "on": 0.48, "mat": 0.48}
• TF-IDF(Document 3): {"The": 0, "dog": 0.48, "barked": 0.48}
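A minimal sketch with scikit-learn's TfidfVectorizer; note that scikit-learn uses a smoothed IDF, ln((1 + N) / (1 + df)) + 1, and L2-normalizes each row, so its weights differ from the hand-computed base-10 values above, although frequent terms like "the" still score lowest:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "The cat in the hat.",
    "The cat sat on the mat.",
    "The dog barked.",
]

# Default settings: smoothed IDF and L2-normalized rows, so the exact
# weights differ from the hand calculation, but the ranking is similar.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

for term, idf in zip(vectorizer.get_feature_names_out(), vectorizer.idf_):
    print(f"{term}: idf={idf:.2f}")
print(X.toarray().round(2))  # one TF-IDF row per document
```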
Word Embeddings
• Word2vec
• Sentences:
• "I love natural language processing."
• "Word embeddings capture semantic relationships."
• Vocabulary:
• Unique words: "I", "love", "natural", "language", "processing", "Word", "embeddings", "capture", "semantic", "relationships"
Initialized vectors
• I: [0.2, 0.5]
• love: [0.8, 0.3]
• natural: [0.4, 0.7]
• language: [0.6, 0.2]
• processing: [0.1, 0.9]
• Word: [0.7, 0.4]
• embeddings: [0.3, 0.6]
• capture: [0.9, 0.2]
• semantic: [0.5, 0.8]
• relationships: [0.2, 0.7]
Word2Vec
• Let's train a skip-gram model with window size 1
• Updated vector for "natural" (learning rate 0.01): [0.4, 0.7] + 0.01 * ( [0.8, 0.3] + [0.6, 0.2] ) = [0.4, 0.7] + 0.01 * [1.4, 0.5] = [0.414, 0.705]
• Repeat this for every word over multiple epochs
• Similarity("love", "natural") = dot([0.8, 0.3], [0.414, 0.705]) / (magnitude([0.8, 0.3]) * magnitude([0.414, 0.705]))
• = (0.8 * 0.414 + 0.3 * 0.705) / (sqrt(0.8^2 + 0.3^2) * sqrt(0.414^2 + 0.705^2))
• ≈ 0.78
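The toy update and similarity can be checked with a few lines of NumPy; this mirrors only the slide's simplified arithmetic, not a full skip-gram implementation (which would use softmax or negative-sampling gradients):

```python
import numpy as np

love, language = np.array([0.8, 0.3]), np.array([0.6, 0.2])
natural = np.array([0.4, 0.7])
lr = 0.01  # learning rate used on the slide

# Simplified "update": nudge "natural" toward the sum of its window-1
# neighbours ("love" and "language"), as on the slide.
natural = natural + lr * (love + language)
print(natural)  # [0.414 0.705]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(round(cosine(love, natural), 2))  # ≈ 0.78
```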
GloVe (Global Vectors for Word Representation)
GloVe (Global Vectors for Word Representation) is an unsupervised learning algorithm for generating word embeddings. It operates on the co-occurrence statistics of words in a corpus and aims to capture
global semantic relationships between words. The key idea is to model the probability of word co-occurrences and learn word embeddings that reflect these probabilities.
• Corpus:
• "I love natural language processing."
• "Word embeddings capture semantic relationships."
• Vocabulary:
• "I", "love", "natural", "language", "processing", "Word", "embeddings", "capture", "semantic", "relationships."
Build Word-Word Co-occurrence Matrix (rows and columns follow the vocabulary order above; two words co-occur when they appear in the same sentence):
• I: [0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
• love: [1, 0, 1, 1, 1, 0, 0, 0, 0, 0]
• natural: [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
• language: [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
• processing: [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
• Word: [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
• embeddings: [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
• capture: [0, 0, 0, 0, 0, 1, 1, 0, 1, 1]
• semantic: [0, 0, 0, 0, 0, 1, 1, 1, 0, 1]
• relationships: [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
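A short sketch that rebuilds this co-occurrence matrix; the tokenizer (whitespace split with trailing punctuation stripped) is an assumption:

```python
from itertools import permutations

corpus = [
    "I love natural language processing.",
    "Word embeddings capture semantic relationships.",
]

vocab = ["I", "love", "natural", "language", "processing",
         "Word", "embeddings", "capture", "semantic", "relationships"]
index = {w: i for i, w in enumerate(vocab)}

# Co-occurrence: every ordered pair of distinct words within a sentence.
X = [[0] * len(vocab) for _ in vocab]
for sentence in corpus:
    tokens = [t.strip(".") for t in sentence.split()]
    for a, b in permutations(tokens, 2):
        X[index[a]][index[b]] += 1

for word, row in zip(vocab, X):
    print(f"{word:>13}: {row}")
```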
GloVe
Initialize word embeddings
• I: [0.2, 0.5]
• love: [0.8, 0.3]
• natural: [0.4, 0.7]
• language: [0.6, 0.2]
• processing: [0.1, 0.9]
• Word: [0.7, 0.4]
• embeddings: [0.3, 0.6]
• capture: [0.9, 0.2]
• semantic: [0.5, 0.8]
• relationships: [0.2, 0.7]
It aims to minimize the difference between the dot product of word vectors and the logarithm of their co-occurrence counts.
GloVe
J = Σ_i Σ_j f(X_ij) * (w_i^T w̃_j + b_i + b̃_j - log(X_ij))^2
where w_i is the word vector for word i, w̃_j the context vector for word j, b_i and b̃_j their bias terms, X_ij the co-occurrence count, and f a weighting function that down-weights rare co-occurrences.
Similarity("love", "natural") = dot([0.8, 0.3], [0.4, 0.7]) / (magnitude([0.8, 0.3]) * magnitude([0.4, 0.7]))
= (0.8 * 0.4 + 0.3 * 0.7) / (sqrt(0.8^2 + 0.3^2) * sqrt(0.4^2 + 0.7^2))
≈ 0.966
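A sketch of evaluating the objective J on the toy matrix and the initialized vectors, assuming the standard weighting f(x) = (x/x_max)^0.75 capped at 1, context vectors initialized equal to the word vectors, and zero biases; none of these choices are given on the slides:

```python
import numpy as np

vocab = ["I", "love", "natural", "language", "processing",
         "Word", "embeddings", "capture", "semantic", "relationships"]
W = np.array([[0.2, 0.5], [0.8, 0.3], [0.4, 0.7], [0.6, 0.2], [0.1, 0.9],
              [0.7, 0.4], [0.3, 0.6], [0.9, 0.2], [0.5, 0.8], [0.2, 0.7]])
W_ctx = W.copy()                  # assumption: context vectors start equal to word vectors
b = b_ctx = np.zeros(len(vocab))  # assumption: biases start at zero

# Block-diagonal co-occurrence matrix from the previous slide.
X = np.zeros((10, 10))
X[:5, :5] = 1 - np.eye(5)
X[5:, 5:] = 1 - np.eye(5)

def f(x, x_max=1.0, alpha=0.75):
    # GloVe weighting function: down-weights rare pairs, caps at 1.
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

# J = sum over X_ij > 0 of f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2
i, j = np.nonzero(X)
err = (W[i] * W_ctx[j]).sum(axis=1) + b[i] + b_ctx[j] - np.log(X[i, j])
J = float(np.sum(f(X[i, j]) * err ** 2))
print(f"J = {J:.3f}")  # gradient descent on W, W_ctx, b, b_ctx would lower this
```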
Text Classification Algorithms
• Naive Bayes
• Support Vector Machines (SVM)
• Logistic Regression
• Decision Trees and Random Forest
• Neural Networks (e.g., Recurrent Neural Networks, Long Short-Term Memory (LSTM) networks)
Naive Bayes
Naive Bayes is a probabilistic machine learning algorithm that is
commonly used for classification tasks. It is based on Bayes'
theorem and makes the assumption that features are
conditionally independent given the class label, which simplifies
the computation.
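A minimal sketch of a Multinomial Naive Bayes text classifier with scikit-learn; the tiny spam/ham training set is made up purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Illustrative toy training data: spam vs. ham.
texts = [
    "win a free prize now", "limited offer click here",
    "meeting at noon tomorrow", "please review the attached report",
]
labels = ["spam", "spam", "ham", "ham"]

# Bag-of-words counts feed directly into MultinomialNB, which applies
# Bayes' theorem under the conditional-independence assumption.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["free prize offer", "see the report before the meeting"]))
```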
SVM
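An SVM learns the hyperplane that separates the classes with the largest margin. A minimal sketch with TF-IDF features and scikit-learn's LinearSVC, again on made-up data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

texts = [
    "great product, works perfectly", "terrible quality, broke in a day",
    "really happy with this purchase", "waste of money, very disappointed",
]
labels = ["positive", "negative", "positive", "negative"]

# TF-IDF features + linear max-margin classifier.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)
print(model.predict(["happy with the quality", "broke and disappointed"]))
```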
Preprocessing Text Data
• Tokenization
• Removing stop words
• Stemming and Lemmatization
• Handling rare words and misspellings
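A minimal sketch of these steps with NLTK (resource names for the downloads can vary slightly between NLTK versions):

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize

# One-time resource downloads.
nltk.download("punkt")
nltk.download("punkt_tab")  # needed by newer NLTK releases
nltk.download("stopwords")
nltk.download("wordnet")
nltk.download("omw-1.4")

text = "The cats were sitting on the mats and barking happily."

tokens = word_tokenize(text.lower())                           # tokenization
stop = set(stopwords.words("english"))
tokens = [t for t in tokens if t.isalpha() and t not in stop]  # stop-word removal

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
print([stemmer.stem(t) for t in tokens])          # stemming: "sitting" -> "sit"
print([lemmatizer.lemmatize(t) for t in tokens])  # lemmatization: "cats" -> "cat"
```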
Building a Text Classification Model
• Data Splitting: Training and Testing Sets
• Vectorizing Text Data
• Model Training
• Evaluation Metrics (e.g., accuracy, precision, recall, F1 score)
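These steps can be sketched end to end with scikit-learn; the tiny corpus below is a placeholder for a real labeled dataset:

```python
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Placeholder labeled corpus; replace with real data.
texts = ["free prize inside", "win money now", "lunch at noon?",
         "project status update", "claim your reward", "notes from the meeting"]
labels = [1, 1, 0, 0, 1, 0]  # 1 = spam, 0 = ham

# 1. Data splitting: training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.33, random_state=42, stratify=labels)

# 2. Vectorizing text data
vec = TfidfVectorizer()
X_train_v = vec.fit_transform(X_train)
X_test_v = vec.transform(X_test)

# 3. Model training
clf = LogisticRegression()
clf.fit(X_train_v, y_train)

# 4. Evaluation: accuracy, precision, recall, F1
print(classification_report(y_test, clf.predict(X_test_v)))
```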
Model Evaluation
• Confusion Matrix
• ROC Curve and AUC-ROC
• Precision-Recall Curve
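Continuing the sketch from the previous slide (it assumes the fitted clf, X_test_v, and y_test from that code), these evaluation tools are available directly in scikit-learn:

```python
from sklearn.metrics import (confusion_matrix, roc_auc_score,
                             roc_curve, precision_recall_curve)

# Assumed to exist from the previous sketch: clf, X_test_v, y_test.
y_pred = clf.predict(X_test_v)
y_score = clf.predict_proba(X_test_v)[:, 1]  # probability of the positive class

print(confusion_matrix(y_test, y_pred))           # rows: true class, cols: predicted
print("AUC-ROC:", roc_auc_score(y_test, y_score))

fpr, tpr, _ = roc_curve(y_test, y_score)                        # ROC curve points
precision, recall, _ = precision_recall_curve(y_test, y_score)  # PR curve points
```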
Challenges in Text Classification
• Handling Imbalanced Classes
• Dealing with Multiclass Classification
• Overfitting and Underfitting
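Class imbalance is often the first of these to bite; one common mitigation, sketched below with scikit-learn, is class reweighting via class_weight="balanced". The same estimator also handles multiclass labels out of the box. The toy data is purely illustrative:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["refund my order", "love it", "great value",
         "amazing", "broken on arrival", "ok I guess"]
labels = ["complaint", "praise", "praise", "praise", "complaint", "neutral"]  # imbalanced, 3 classes

# class_weight="balanced" reweights the loss inversely to class frequency,
# which mitigates imbalance; multiclass labels are handled automatically.
model = make_pipeline(TfidfVectorizer(),
                      LogisticRegression(class_weight="balanced", max_iter=1000))
model.fit(texts, labels)
print(model.predict(["arrived broken, want a refund"]))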
Feature Engineering for Text
Classification
• N-grams
• Word Embeddings
• Feature Scaling
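A sketch of n-gram features with TfidfVectorizer; its default L2 normalization already scales each document vector, which is the usual form of scaling for sparse text features (interpreting the bullet that way is an assumption):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat on the mat", "the dog barked at the cat"]

# ngram_range=(1, 2): unigrams plus bigrams; sublinear_tf dampens raw counts,
# and the default L2 norm scales every document vector to unit length.
vec = TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True)
X = vec.fit_transform(docs)
print(vec.get_feature_names_out())  # includes bigrams like "the cat", "cat sat"
print(X.shape)
```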
Tools and Libraries
• Python libraries (e.g., NLTK, Scikit-learn, TensorFlow, PyTorch)
Model Interpretability
• Explaining the predictions of a text classification model
• The importance of model interpretability in certain applications
Best Practices
• Cross-validation
• Hyperparameter tuning
• Ensemble methods for improved performance
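A minimal sketch of cross-validation and hyperparameter tuning via scikit-learn's GridSearchCV (toy data; an ensemble such as a VotingClassifier could be dropped into the same pipeline):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

texts = ["spam offer", "win cash", "free prize",
         "team meeting", "status report", "lunch plans"]
labels = [1, 1, 1, 0, 0, 0]

pipe = Pipeline([("tfidf", TfidfVectorizer()), ("clf", LogisticRegression())])

# Grid search with 3-fold cross-validation over a small hyperparameter grid.
params = {"tfidf__ngram_range": [(1, 1), (1, 2)], "clf__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, params, cv=3, scoring="f1")
search.fit(texts, labels)
print(search.best_params_, search.best_score_)
```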
Future Trends
• Recent advancements in text classification
• Emerging technologies and methodologies
