Natural Language Processing for Beginners.pptx

Natural Language Processing for Beginners
Colleen M. Farrelly

About Me
• Data science lead/entrepreneur
• Geometry and NLP researcher
• Author
  • The Shape of Data (NSP, 2023)
  • Network science book (Packt, 2024)
• Artist and Calligrapher

Generative AI
A gentle introduction

What is generative AI?
• Set of algorithms that generate:
  • Images
  • Text samples
  • Videos
  • Audio content
• Guided by:
  • Training sample
  • User specifications
  • Deep learning architectures

GPT
• Generative Pre-trained Transformer 4
• Decoder-only transformer network
• Generates output sequences from input sequences while retaining long-range context
• Already blurring the line between human and AI composition

Tools that should work anywhere
Speech generation:
• https://play.ht/text-to-speech-voices
Text generation (OpenAI alternative, GPT-2):
• https://huggingface.co/tasks/text-generation
Image generation:
• https://creator.nightcafe.studio/create

Sentiment Analysis and Text Classifiers
Theory and Practice

Sentiment Analysis
• Understand positive/negative/neutral tone of text data
  • Customer feedback
  • Chatbot emotion regulation
  • Predicting patient outcomes or physician bias
• Expansion to other emotions:
  • Anger
  • Sadness
  • Surprise
• Some packages exist for some languages and applications.
• Other languages or emotions require custom code and dictionaries.
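
The "custom code and dictionaries" route can be sketched as a lexicon lookup: each word carries a score, and the sum decides the tone. The tiny LEXICON and the function names below are hypothetical, chosen only for illustration; a real project would likely use a published or domain-specific dictionary and handle negation, which this sketch ignores.

```python
import re

# Hypothetical mini-lexicon (assumption, for illustration only):
# +1 for positive words, -1 for negative words.
LEXICON = {
    "love": 1, "great": 1, "helpful": 1, "happy": 1,
    "hate": -1, "terrible": -1, "slow": -1, "angry": -1,
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text, lexicon=LEXICON):
    """Sum the lexicon scores of all tokens; unknown words count as 0."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))


def sentiment_label(text, lexicon=LEXICON):
    """Map the summed score to positive / negative / neutral."""
    score = sentiment_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for review in [
        "I love the great support team",
        "The app is slow and terrible",
        "The package arrived on Tuesday",
    ]:
        print(review, "->", sentiment_label(review))
```

Swapping in a different dictionary (for example, words scored for anger or sadness, or words in another language) changes what the same code measures.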

Classification
• Surgical outcomes based on physician notes
• Types of customer complaints from automated feedback form
• Tasks for chatbot to route from sales chatbot conversations
Wrangling Text to Numeric Matrix
• Document word counts/frequencies
• Binary, count, or weighted frequency/inverse frequency
• Sparse numeric matrix
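
A minimal sketch of this wrangling step, assuming a simple regex tokenizer and the plain log(N / document frequency) form of inverse-frequency weighting. The function names are hypothetical, and libraries such as scikit-learn use smoothed variants of the weighting, so their numbers will differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def document_term_matrix(docs, weighting="count"):
    """Return (vocabulary, matrix) with one row per document.

    weighting: "binary" (word present or not), "count" (raw counts),
    or "tfidf" (count * log(n_docs / document frequency)).
    """
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted({term for c in counts for term in c})
    n_docs = len(docs)
    # Number of documents in which each term appears at least once.
    doc_freq = {term: sum(1 for c in counts if term in c) for term in vocab}

    matrix = []
    for c in counts:
        row = []
        for term in vocab:
            tf = c[term]  # Counter returns 0 for missing terms
            if weighting == "binary":
                row.append(1 if tf > 0 else 0)
            elif weighting == "count":
                row.append(tf)
            elif weighting == "tfidf":
                row.append(tf * math.log(n_docs / doc_freq[term]))
            else:
                raise ValueError(f"unknown weighting: {weighting}")
        matrix.append(row)
    return vocab, matrix


if __name__ == "__main__":
    docs = ["the cat sat", "the cat saw the dog"]
    vocab, matrix = document_term_matrix(docs, weighting="count")
    print(vocab)
    for row in matrix:
        print(row)
```

With a realistic vocabulary of thousands of words, most entries in each row are zero, which is why the result is usually stored as a sparse matrix rather than as nested lists.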

Embeddings: High dimension to low dimension
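
To make "high dimension to low dimension" concrete, here is a minimal sketch that projects a small word-count matrix onto its first principal component using power iteration. This is a classical linear technique swapped in for illustration; learned embeddings such as those from neural networks are trained rather than computed this way. The function names and the toy matrix are assumptions.

```python
import math


def center_columns(matrix):
    """Subtract each column's mean so every feature is centered at zero."""
    n_rows = len(matrix)
    means = [sum(col) / n_rows for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


def top_component(centered, iterations=50):
    """Estimate the first principal direction by power iteration on X^T X."""
    n_features = len(centered[0])
    # Start on the first axis. An all-ones start can be orthogonal to the
    # data when every document has the same total word count.
    v = [1.0] + [0.0] * (n_features - 1)
    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        new_v = [
            sum(row[j] * s for row, s in zip(centered, scores))
            for j in range(n_features)
        ]
        norm = math.sqrt(sum(x * x for x in new_v))
        if norm == 0:
            raise ValueError("start vector is orthogonal to the data")
        v = [x / norm for x in new_v]
    return v


def project_1d(matrix):
    """Reduce each row (document) to a single coordinate."""
    centered = center_columns(matrix)
    direction = top_component(centered)
    return [sum(x * w for x, w in zip(row, direction)) for row in centered]


if __name__ == "__main__":
    # Toy counts over the columns: cat, dog, pet, stock, market, trade
    counts = [
        [2, 1, 1, 0, 0, 0],
        [1, 2, 1, 0, 0, 0],
        [0, 0, 0, 2, 1, 1],
        [0, 0, 0, 1, 2, 1],
    ]
    print(project_1d(counts))
```

In this toy example the six word-count columns collapse to one number per document, and the two animal documents land on the opposite side of zero from the two finance documents.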

Context Matters: Pretrained encoder/decoder neural networks

BERT Models
• Many BERT and RoBERTa models on HuggingFace
• Pre-trained neural networks
• Good context for English
• Fairly complex sentences
• Fantasy or trademark words
• Some domain-specific versions
• Easy to use in Python
• Requires computer storage space

Some Ethical Considerations
• Training data bias (pretrained embeddings or own data)
• Misuse of generative AI (fake news)
• Representation in models
• Language accuracy biases in multilingual models
• Misclassification biases
• Plagiarism bias against non-native English speakers
