Beyond Words: Journey into Large Language Models (LLMs) - Day 1

SahithiGurlinka

Welcome to the Journey
into Large Language Models!

Agenda of this Journey
Session 1: Intro to NLP
• Data Preprocessing
• Similarities
• Word Embeddings
• Visualization
Session 2: NLP Using Deep Learning
• GRU, RNN, Types of RNNs, LSTMs, Practical
Session 3: Advanced NLP
• Transformers, Types of Transformers, Transformer Architecture
Session 4: Practical
• Practical on fine-tuning the BERT Transformer using the Hugging Face library

Introduction to NLP
What, Why, How?
Data Cleaning
• Tokenization
• Stopword removal
• Stemming
• Lemmatization
• Morphological Segmentation
Vectorization/Embeddings
Cosine Similarity, Euclidean Distance
Types of text transformations
• One-Hot Encoding (OHE)
• Bag of Words (BOW)
• Word2Vec, AvgWord2Vec
Visualization of Word Vectors
• Using t-SNE

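Data Preprocessing
Tokenization: Converting text into tokens (words and punctuation marks).
Ex: "GDSC is a university based community group for students." → "GDSC", "is", "a", "university", "based", "community", "group", "for", "students", "."
Lowercasing: Converting all text to lowercase.
Ex: SATYA → satya
Stopword removal: Dropping very common words that carry little meaning on their own.
Ex: is, a, the, etc.
Stemming: Reducing words to their base or root form by stripping suffixes or prefixes; the result need not be a real word.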
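The first three steps above can be sketched in a few lines of plain Python. This is only a minimal illustration: the regular expression and the tiny `STOPWORDS` set are assumptions made for this example, and a real pipeline would normally rely on a library tokenizer and a much longer stopword list.

```python
import re

# Small illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"is", "a", "the", "for", "and", "of", "to", "in"}

def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

def preprocess(text):
    """Tokenize, lowercase, then drop punctuation and stopwords."""
    tokens = [tok.lower() for tok in tokenize(text)]
    return [tok for tok in tokens if tok.isalnum() and tok not in STOPWORDS]

sentence = "GDSC is a university based community group for students."
raw_tokens = tokenize(sentence)
clean_tokens = preprocess(sentence)
print(raw_tokens)
print(clean_tokens)
```

Here `raw_tokens` keeps every word plus the final period, while `clean_tokens` is lowercased and no longer contains "is", "a", "for" or the punctuation.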

Data Preprocessing (continued)
Lemmatization (lemma): Reducing words to their dictionary form (lemma) using vocabulary and part-of-speech information, rather than just stripping affixes.
Difference?
Ex: I am riding my bicycle to the store.
Stem: "I am ride my bicycl to the store."
Lemma: "I be ride my bicycle to the store."
The stem "bicycl" is not a real word, while the lemma "bicycle" is; lemmatization also maps "am" to "be".
Morphological segmentation: Dividing words into smaller meaningful parts called morphemes.
Ex: "untestably" → "un", "test", "able", "ly" (useful in language translation)
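To make the stem/lemma difference concrete, here is a toy sketch. Both the suffix list and the lemma dictionary are hypothetical and far cruder than a real stemmer or lemmatizer, so the stems it produces will not match a production stemmer's output word for word; the point is only that stemming applies blind string rules while lemmatization looks words up.

```python
# Toy suffix rules, tried in order (hypothetical; real stemmers use many more rules).
SUFFIXES = ("ing", "ly", "es", "s", "e")

# Tiny hand-written lemma dictionary (hypothetical; real lemmatizers use a full lexicon).
LEMMAS = {"am": "be", "is": "be", "are": "be", "riding": "ride"}

def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def toy_lemmatize(word):
    """Look the word up in the dictionary; unknown words are returned unchanged."""
    return LEMMAS.get(word, word)

words = "i am riding my bicycle to the store".split()
stems = [toy_stem(w) for w in words]
lemmas = [toy_lemmatize(w) for w in words]
print(" ".join(stems))
print(" ".join(lemmas))
```

The stemmer happily produces non-words such as "bicycl", whereas the dictionary lookup returns real words and can handle irregular forms like "am" → "be".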

Vectorization/Word Embedding
Frequency/Count Based
• One-Hot Encoding
• Bag of Words
• CountVectorizer
• TF-IDF
• GloVe
Predictive Based
• Word2Vec (CBOW, Skip-Gram)
• AvgWord2Vec
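CBOW and Skip-Gram differ in what they predict: CBOW predicts a center word from its surrounding context, while Skip-Gram predicts each context word from the center word. The sketch below only builds the training examples each variant would see, not the neural network itself; the function names and the `window` parameter are choices made for this illustration.

```python
def context_windows(tokens, window=2):
    """Yield (center, context_words) for each position in the token list."""
    for i, center in enumerate(tokens):
        left = tokens[max(0, i - window): i]
        right = tokens[i + 1: i + 1 + window]
        yield center, left + right

def skipgram_pairs(tokens, window=2):
    """Skip-Gram: one (input=center, target=context word) pair per context word."""
    return [(center, ctx)
            for center, context in context_windows(tokens, window)
            for ctx in context]

def cbow_pairs(tokens, window=2):
    """CBOW: one (input=context words, target=center) example per position."""
    return [(context, center) for center, context in context_windows(tokens, window)]

tokens = "this movie is very scary".split()
sg = skipgram_pairs(tokens, window=1)
cb = cbow_pairs(tokens, window=1)
print(sg)
print(cb)
```

With `window=1`, the word "movie" contributes the Skip-Gram pairs ("movie", "this") and ("movie", "is"), and the single CBOW example (["this", "is"], "movie").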

Bag of Words
Link: https://www.analyticsvidhya.com/blog/2020/02/quick-introduction-bag-of-words-bow-tf-idf/
Review 1: This movie is very scary and long
Review 2: This movie is not scary and is slow
Review 3: This movie is spooky and good
Vocabulary (in order of first appearance): this, movie, is, very, scary, and, long, not, slow, spooky, good
Vector of Review 1: [1 1 1 1 1 1 1 0 0 0 0]
Vector of Review 2: [1 1 2 0 1 1 0 1 1 0 0]
Vector of Review 3: [1 1 1 0 0 1 0 0 0 1 1]
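A Bag of Words vector is just a per-document word count laid out in a fixed vocabulary order. The sketch below assumes the vocabulary is ordered by first appearance across the three reviews and that words are lowercased and split on whitespace; a library such as scikit-learn would typically sort the vocabulary alphabetically instead, which changes the column order but not the counts.

```python
from collections import Counter

reviews = [
    "This movie is very scary and long",
    "This movie is not scary and is slow",
    "This movie is spooky and good",
]

def build_vocabulary(docs):
    """Collect unique lowercased words in order of first appearance."""
    vocab = []
    for doc in docs:
        for word in doc.lower().split():
            if word not in vocab:
                vocab.append(word)
    return vocab

def bow_vector(doc, vocab):
    """Count how often each vocabulary word occurs in the document."""
    counts = Counter(doc.lower().split())
    return [counts[word] for word in vocab]

vocab = build_vocabulary(reviews)
vectors = [bow_vector(doc, vocab) for doc in reviews]
print(vocab)
for vec in vectors:
    print(vec)
```

Each vector sums to the number of words in its review (7, 8 and 6), which is a quick sanity check on any hand-computed Bag of Words table.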

3 men are fishing in their boat when a sudden monster wave sends them all
overboard and into the water. Only 1 man got his hair wet. How?

Which word in the dictionary is spelt incorrectly?

Which letter of the alphabet has the most water?

What did the lava say to his girlfriend?

What do you call a guy who’s really loud?

How can a machine know whether two words are similar?
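Once words are vectors, similarity becomes a calculation. Two common measures are Euclidean distance (how far apart the points are) and cosine similarity (how closely the directions agree). The sketch below uses made-up 2-D vectors purely for illustration; real word vectors have far more dimensions.

```python
import math

def euclidean_distance(u, v):
    """Straight-line distance: sqrt(sum((u_i - v_i)^2)). Smaller means closer."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: (u . v) / (|u| * |v|). Ranges from -1 to 1."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical vectors: b points the same way as a, c is perpendicular to a.
a = [1.0, 2.0]
b = [2.0, 4.0]
c = [2.0, -1.0]

print(euclidean_distance(a, b), cosine_similarity(a, b))
print(euclidean_distance(a, c), cosine_similarity(a, c))
```

Note that `a` and `b` are a fair distance apart yet have cosine similarity of about 1, because cosine similarity ignores vector length and compares direction only.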

Featurized representation of word embedding
It is interesting to know that King - Man + Woman ≈
Queen!
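The analogy can be reproduced with a hand-made "featurized" table. All numbers below are invented for illustration: each word gets four hypothetical features (gender, royal, age, food), whereas learned embeddings have hundreds of dimensions with no such readable labels. The sketch computes King - Man + Woman and finds the nearest remaining word by cosine similarity.

```python
import math

# Hypothetical feature values: [gender, royal, age, food]
EMBEDDINGS = {
    "man":    [-1.00,  0.01,  0.03, 0.09],
    "woman":  [ 1.00,  0.02,  0.02, 0.01],
    "king":   [-0.95,  0.93,  0.70, 0.02],
    "queen":  [ 0.97,  0.95,  0.69, 0.01],
    "apple":  [ 0.00, -0.01,  0.03, 0.95],
    "orange": [ 0.01,  0.00, -0.02, 0.97],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b and c."""
    target = [x - y + z for x, y, z in zip(EMBEDDINGS[a], EMBEDDINGS[b], EMBEDDINGS[c])]
    candidates = [w for w in EMBEDDINGS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(target, EMBEDDINGS[w]))

nearest = analogy("king", "man", "woman")
print(nearest)
```

Subtracting "man" and adding "woman" flips the gender feature while leaving the royal feature almost untouched, which is why the result lands next to "queen".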

Enough talk!
Can we visualize this and do a
practical session?

https://forms.gle/TCzWP8mQQ4qpB83r7
Feedback?

4. What is NLP? Why NLP? How NLP works?

9. Word2Vec CBOW

10. Skip-Gram

11. It’s fun time…huhu!!

18. Euclidean Distance

19. Cosine Similarity: joao Filix
