Welcoming you to Journey
into Large Language Models!
Agenda of this Journey
Session 1: Intro to NLP
• Data Preprocessing
• Similarities
• Word Embeddings
• Visualization
Session 2: NLP Using Deep Learning
• RNN, Types of RNNs, LSTMs, GRU, Practical
Session 3: Advanced NLP
• Transformers, Types of Transformers, Transformer Architecture
Session 4: Practical
• Fine-tuning a BERT Transformer using the Hugging Face library
Introduction to NLP
What, Why, How?
Data Cleaning
• Tokenization
• Stopwords removal
• Stemming
• Lemmatization
• Morphological Segmentation
Vectorization/Embeddings.
Cosine Similarity,
Euclidean distance.
Types of text transformations
• OneHotEncoding (OHE)
• Bag of Words (BOW)
• Word2Vec, AvgWord2vec
Visualization of Word Vectors
• Using t-SNE
What is NLP?
Why NLP?
How does NLP work?
Data Preprocessing
Tokenization: converting text into tokens.
Ex: "GDSC is a university based community group for students."
LowerCasing:
Ex: SATYA → satya
Stopwords Removal: dropping very common words that carry little meaning.
Ex: is, a, the, etc.
Stemming: reducing words to a root form by stripping suffixes or
prefixes with simple rules; the result need not be a real word.
Lemmatization (Lemma): reducing words to their dictionary form (the lemma)
using vocabulary and part-of-speech information; the result is a real word.
Difference?
Ex: "I am riding my bicycle to the store."
Stem: "I am ride my bicycl to the store."
Lemma: "I be ride my bicycle to the store."
Morphological segmentation: dividing words into smaller parts called
morphemes.
Ex: untestably → "un", "test", "able" and "ly" as
morphemes (useful in language translation)
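The first three preprocessing steps can be sketched in plain Python. This is only an illustration: the `STOPWORDS` set is a small hand-picked list and the `tokenize` and `preprocess` helpers are hypothetical names, not part of the workshop material. Stemming and lemmatization are left out because they normally rely on a library such as NLTK or spaCy.

```python
import re

# Hypothetical, hand-picked stopword list for illustration only;
# real projects usually take one from a library such as NLTK or spaCy.
STOPWORDS = {"is", "a", "the", "for", "to", "my", "am", "i"}

def tokenize(text):
    """Split text into word tokens, dropping punctuation."""
    return re.findall(r"[A-Za-z0-9]+", text)

def preprocess(text):
    """Tokenize, lowercase, then drop stopwords."""
    tokens = [token.lower() for token in tokenize(text)]
    return [token for token in tokens if token not in STOPWORDS]

sentence = "GDSC is a university based community group for students."
print(tokenize(sentence))    # tokens with original casing
print(preprocess(sentence))  # lowercased tokens without stopwords
```

Lowercasing before the stopword check matters here, since the stopword set is stored in lowercase.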
Vectorization/Word Embedding
Frequency/Count Based
• OneHotEncoding
• Bag of Words
• CountVectorizer
• TF-IDF
• GloVe
Predictive Based
• Word2Vec
• CBOW
• Skip-Gram
• AvgWord2Vec
Bag of Words
Link: https://www.analyticsvidhya.com/blog/2020/02/quick-introduction-bag-of-words-bow-tf-idf/
Review 1: This movie is very scary and long
Review 2: This movie is not scary and is slow
Review 3: This movie is spooky and good
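Vocabulary (in order of first appearance): this, movie, is, very, scary, and, long, not, slow, spooky, good
Vector of Review 1: [1 1 1 1 1 1 1 0 0 0 0]
Vector of Review 2: [1 1 2 0 1 1 0 1 1 0 0]
Vector of Review 3: [1 1 1 0 0 1 0 0 0 1 1]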
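A minimal sketch of how such count vectors can be computed for the three reviews. It assumes the vocabulary is ordered by first appearance and lowercased (a different ordering would simply permute the columns); `build_vocabulary` and `bag_of_words` are illustrative helper names.

```python
from collections import Counter

reviews = [
    "This movie is very scary and long",
    "This movie is not scary and is slow",
    "This movie is spooky and good",
]

def build_vocabulary(documents):
    """Collect unique lowercase words in order of first appearance."""
    vocabulary = []
    for document in documents:
        for word in document.lower().split():
            if word not in vocabulary:
                vocabulary.append(word)
    return vocabulary

def bag_of_words(document, vocabulary):
    """Count how often each vocabulary word occurs in the document."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]

vocabulary = build_vocabulary(reviews)
vectors = [bag_of_words(review, vocabulary) for review in reviews]
print(vocabulary)
for vector in vectors:
    print(vector)
```

Each vector has one entry per vocabulary word, so word order inside a review is lost; that is the "bag" in Bag of Words.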
Word2Vec
CBOW
Skip-Gram
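The two Word2Vec variants differ in what they predict: CBOW predicts a center word from its surrounding context words, while Skip-Gram predicts each context word from the center word. The sketch below only builds the training examples each variant would see for a toy sentence; it does not train a model, and the helper names and the `window=1` setting are assumptions chosen for illustration.

```python
def context_windows(tokens, window=1):
    """Yield (target, context_words) for every position in the sentence."""
    for index, target in enumerate(tokens):
        left = tokens[max(0, index - window):index]
        right = tokens[index + 1:index + 1 + window]
        yield target, left + right

def cbow_examples(tokens, window=1):
    """CBOW: the context words are the input, the center word is the label."""
    return [(context, target) for target, context in context_windows(tokens, window)]

def skipgram_examples(tokens, window=1):
    """Skip-gram: the center word is the input, each context word is a label."""
    return [(target, word)
            for target, context in context_windows(tokens, window)
            for word in context]

tokens = "this movie is scary".split()
print(cbow_examples(tokens))      # (context list, center word) pairs
print(skipgram_examples(tokens))  # (center word, context word) pairs
```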
It’s fun time…huhu!!
3 men are fishing in their boat when a sudden monster wave sends them all
overboard and into the water. Only 1 man got his hair wet. How?
Which word in the dictionary is spelt incorrectly?
Which letter of the alphabet has the most water?
What did the lava say to his girlfriend?
What do you call a guy who’s really loud?
How can a machine know whether two words are similar?
Euclidean Distance
Cosine Similarity
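Both measures can be written in a few lines of plain Python. The vectors `u` and `v` below are made-up values chosen so the arithmetic is easy to follow: `v` points in the same direction as `u` but is twice as long, so the Euclidean distance is non-zero while the cosine similarity is 1.

```python
import math

def euclidean_distance(u, v):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

u = [1.0, 2.0, 2.0]
v = [2.0, 4.0, 4.0]  # same direction as u, twice the length
print(euclidean_distance(u, v))  # 3.0
print(cosine_similarity(u, v))   # 1.0
```

Cosine similarity ignores vector length and compares direction only, which is one reason it is a common choice for comparing word vectors.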
Featurized representation of word embedding
Interestingly, King - Man + Woman ≈ Queen!
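The analogy can be reproduced with a toy featurized embedding. The three features (gender, royalty, food) and every number in `EMBEDDINGS` are invented for illustration; real embeddings are learned from data and the analogy then holds only approximately.

```python
import math

# Made-up 3-feature vectors: (gender, royalty, food). Not learned values.
EMBEDDINGS = {
    "man":   [-1.0, 0.0, 0.0],
    "woman": [1.0, 0.0, 0.0],
    "king":  [-1.0, 1.0, 0.0],
    "queen": [1.0, 1.0, 0.0],
    "apple": [0.0, 0.0, 1.0],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c, embeddings):
    """Return the word closest to a - b + c, skipping the three query words."""
    target = [x - y + z
              for x, y, z in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = (word for word in embeddings if word not in (a, b, c))
    return max(candidates,
               key=lambda word: cosine_similarity(target, embeddings[word]))

print(analogy("king", "man", "woman", EMBEDDINGS))  # queen
```

Subtracting "man" removes the male gender component from "king" and adding "woman" puts the female one in, leaving the royalty feature untouched.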
Enough!! Enough talking!
Can we visualize this and do a
practical session?
Questions?
https://forms.gle/TCzWP8mQQ4qpB83r7
Feedback?
Thank You!

Beyond Words: Journey into Large Language Models(LLMs) - Day-1
