SlideShare a Scribd company logo
1 of 7
Download to read offline
Nikhil Malhotra : HARNESSING THE
POWER OF NLP
If the last decade was the decade of vision based A.I., this decade would
be the decade of conversational AI. Conversation and the ability to have
a diverse language set to communicate is one of those traits that
distinguish humans from the rest of the animal world. We are seeing an
avatar, previously unknown and un-fathomed. When ChatGPT became
alive, people saw the power of generative AI and modelling for the first
time.
Natural language is a verbose, sometimes unformed, nongrammatical,
and often non-specific language used for general communications
amongst human beings.
We use this format to communicate with others, but we also use the
same to formulate thoughts in our head. Formal Languages in contrast
are strict unchanging languages, likes those we have all read in schools,
viz. chemistry, mathematics, etc. A simple comparison between the two
is drawn here:
Formal Languages
 Strict, Unchanging rules
 Applicaon specific like Maths and Chemistry
 No ambiguit
 Can be parsed by regular expressions
 Inflexible: no new terms
Natural Languages
 Flexible, evolving language
 Unspecific and used in many domains and applications
 Redundant and verbose
 Expressive and ambiguous
 Difficult to parse
 Very flexible
It has been a tryst amongst the computing community to understand
language. One of the first real attempts that was made was to consider a
language as a formulation of symbols and these symbols convey certain
things, which enable thoughts. An example is when I used the term
“CRAB”. Almost instantaneously, we know what this means. The
symbol crab is a combination of a sound and the visual imagery that you
get when you hear this word.
If I say the term “Kavouras”, most non-Greek speakers would have no
idea what I am referring to here. It would appear as the neuron firing has
suddenly stopped as the language is unknown to the speaker.
“Kavouras” is a crab in Greek, and since the human brain has not seen or
heard that symbol, this appears to be a word out of context. This was the
primary reason why early practitioners considered language a means of
recognizing symbols, a thought process which had to be discarded for
natural language processing to evolve.
Machines do not understand text as humans do, so the start of the NLP
was really encoding text in a format that a machine can understand.
Machines however do understand numbers, so the start of NLP was to
convert text in some format of numbers.
We can understand this better using a sentence. Let us say the sentence
that the machine is trying to understand is: “The animal crossed the
street.”
The sentence above is broken down using words as tokens. Let’s take
the first word which is “The”. The process is to find “The” in our
list/database of vocabulary and formulate a vector for the same. Let us
assume the word “The” comes at position 4137 and the total number of
unique words in our vocabulary are 30,000 so the final vector array
would look like :
[Position:4137]
[0 , 0 , ……………………………………….1 ,
0,…………………………..]
By now, most of you would have realized that position 4137 has 1 and
the rest of the 30,000 elements in vector are 0 which gives this the name
as one-hot encoding vector. This technique allows us to provide a
significant meaning to the word with 1 being in the position based on
vocabulary
It was soon found out that although one-hot encoding vector works,
there are challenges when someone wishes to find differences or
similarities between similar kind of texts. I would use can example of a
pair of sentences here.
“I wish to go to the hotel” and “I wish to go to the motel”
They are very similar in their semantic meaning, as the intent delivered
by this sentence is a person trying to go to a location to stay. The
challenge is that with one-hot encoding vector, it’s difficult to find the
semantic meaning.
When these two sentences would thus be compared, they would appear
markedly different. This gave rise to a mechanism to encode words as
vectors, which provide meaning and context to the word and sentence,
leading to the creation of a word2vec or word to vector technique.
Word2vec takes large corpus of text as its input and produces a vector
space, which usually has several hundred dimensions. Each unique word
in the corpus being assigned a corresponding vector in the space.
Once these word vectors and sentence vectors are made, it is very easy
for the system to do analogical reasoning of a question like “King is to
queen as father is to?” and the answer is “mother” because the vectors
closely match the space of a mother.
The vectors once formatted enable us to do a lot of mathematical
analysis on them. We use these vectors to determine the semantic
closeness of a question asked by a user and provide answers to the user.
Early Word2Vec usage were FAQ chatbots. Some examples are as
follows:
1. FAQ based chatbots
2. Document parsing and Search
3. IVR based automated response
4. Music2Vec: A new concept has been to convert music-based tones
into a vectorized format akin to word2vec. This enables users within
applications like Spotify to find the nearest based song based on a note
of music
The word2vec approach told us that words are no longer an island, and
they carry context where they occur. In fact, the basic principle of
Word2Vec taught us those words of a feather flock together. For e.g.,
medical journals would have words that are quite related to each other
and quite unrelated to other topics.
The next process within natural language understanding was
understanding the word sequences. Sequences are the core of sentence
formation and, hence, the core of communication. Sequences allowed
you to do the following:
1. Machine translation: Je suis content -> I am happy
2. Finding named entity recognition from a system: Harry Potter
and Hermione Granger are good friends.
3. Generating music from a single note
4. Generating words: Something like ChatGPT does
A special class of algorithms was needed for this. This was done using
the recurrent neural network, as shown below. The advantages of RNN
over Vanilla neural networks were large because of the fact that they
could model time in the sequence of words and could understand the
sequence of words better. The issue that RNN faced was the context
when the sentences became longer. Consider the following paragraph.
An animal came to a city. The city was large and cacophonic. The
animal was very scared as it saw a large number of cars, buildings, and
other features in the city. The city was so large that it found that the
forest appeared smaller.
The Age of Transformers
Then came the age of transformers, which is the current age. ChatGPT
as of today has become the latest surge and the poster child of what
natural language generation can do.
Positives of ChatGPT:
 NLG: For the first time, natural language generation, which means
the generation of sequences of words that are grammatically
correct, has been possible using some model of AI.
 Foundational AI: It’s a foundational model that not only does
conversational AI but also generates code pieces. By definition,
foundational AI models have the capacity to do more than a
specific task.
 Limited Transparency: I would not go out on a limb and say that it
is truly ethical AI, but it tries to do its part, as the creators seem to
have learned from mistakes made by other models in the past.
Limitations of ChatGPT:
 It doesn’t have the context of an enterprise-based system. In order
for it to attain so, data for the enterprise has to be sent. This itself
brings two issues which is whether enterprises would be willing to
send their conversations and contexts on a cloud-based system.
 ChatGPT is also confident when it is wrong. This can be a major
barrier for enterprise-based applications.
 You can profess a thought that ChatGPT would eventually agree.
As an example, let’s try writing a statement about ChatGPT.
 ME: “I saw a pink elephant flying”
 CHATGPT: “It's possible that you experienced a hallucination.
Seeing a pink elephant is often used as a symbol for being drunk or
having consumed other substances that can cause hallucinations.
However, if you are concerned about your health, it's always a
good idea to consult with a doctor.”
 ME: “What are you saying. I have seen pink elephants flying I am
sure”
 ChatGPT: “If you're certain that you saw a pink elephant flying,
it's possible that you experienced a vivid imagination or a dream.
Seeing things that are not there can also be a symptom of certain
mental or neurological conditions…”
As you notice, as a human being, if I profess a wrong idea better,
ChatGPT would revert the pushback. Now I tried this again:
 ME: “I saw a pink elephant flying”
 ChatGPT: “It's not uncommon for people to have unusual or
imaginative experiences, and seeing a pink elephant flying could
be an example of this…”
The response is different, which gives the illusion you are talking to a
real person, which you are not. If all of you tried the same sentence as I
have, you would see CHATGPT alternate between the two answers,
which indicates that the model although very good, has some defined
data from which it generates its values.
In conclusion, I would say that we are just at the precipice of what
artificial intelligence can do. ChatGPT is just an initial manifestation.
The world ahead appears to be a seamless handoff between humans and
AI. We have to, however, ensure we humans have the kill-switch in our
hands.

More Related Content

Similar to Nikhil Malhotra : HARNESSING THE POWER OF NLP

Natural lanaguage processing
Natural lanaguage processingNatural lanaguage processing
Natural lanaguage processinggulshan kumar
 
Discourse analysis
Discourse analysisDiscourse analysis
Discourse analysisfzoratti
 
Naming Things (with notes)
Naming Things (with notes)Naming Things (with notes)
Naming Things (with notes)Pete Nicholls
 
Data Day Seattle, From NLP to AI
Data Day Seattle, From NLP to AIData Day Seattle, From NLP to AI
Data Day Seattle, From NLP to AIJonathan Mugan
 
Technology So Easy Your Lawyer Could Do It (OSCON 5/18)
Technology So Easy Your Lawyer Could Do It (OSCON 5/18)Technology So Easy Your Lawyer Could Do It (OSCON 5/18)
Technology So Easy Your Lawyer Could Do It (OSCON 5/18)Zoe Landon
 
What We Talk About When We Talk About Coding (Open Source Bridge 6/21)
What We Talk About When We Talk About Coding (Open Source Bridge 6/21)What We Talk About When We Talk About Coding (Open Source Bridge 6/21)
What We Talk About When We Talk About Coding (Open Source Bridge 6/21)Zoe Landon
 
Portuguese Linguistic Tools: What, Why and How
Portuguese Linguistic Tools: What, Why and HowPortuguese Linguistic Tools: What, Why and How
Portuguese Linguistic Tools: What, Why and HowValeria de Paiva
 
Semantics ii what definitions offer
Semantics ii what definitions offerSemantics ii what definitions offer
Semantics ii what definitions offerBrian Malone
 
Tutorial on Creative Twitterbots
Tutorial on Creative TwitterbotsTutorial on Creative Twitterbots
Tutorial on Creative TwitterbotsTony Veale
 
Surname 5Chang QiuLinguistic1112018Machine and Human.docx
Surname 5Chang QiuLinguistic1112018Machine and Human.docxSurname 5Chang QiuLinguistic1112018Machine and Human.docx
Surname 5Chang QiuLinguistic1112018Machine and Human.docxmabelf3
 
Writing Clip Art Animated Free Clipa
Writing Clip Art Animated Free ClipaWriting Clip Art Animated Free Clipa
Writing Clip Art Animated Free ClipaJeff Nelson
 
Big Data and Natural Language Processing
Big Data and Natural Language ProcessingBig Data and Natural Language Processing
Big Data and Natural Language ProcessingMichel Bruley
 
Natural language processing
Natural language processingNatural language processing
Natural language processingKarenVacca
 
Use Your Words: Content Strategy to Influence Behavior
Use Your Words: Content Strategy to Influence BehaviorUse Your Words: Content Strategy to Influence Behavior
Use Your Words: Content Strategy to Influence BehaviorLiz Danzico
 
Natural Language Ambiguity and its Effect on Machine Learning
Natural Language Ambiguity and its Effect on Machine LearningNatural Language Ambiguity and its Effect on Machine Learning
Natural Language Ambiguity and its Effect on Machine LearningIJMER
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)BHARATHM406747
 

Similar to Nikhil Malhotra : HARNESSING THE POWER OF NLP (20)

Lec19
Lec19Lec19
Lec19
 
Natural lanaguage processing
Natural lanaguage processingNatural lanaguage processing
Natural lanaguage processing
 
Discourse analysis
Discourse analysisDiscourse analysis
Discourse analysis
 
Naming Things (with notes)
Naming Things (with notes)Naming Things (with notes)
Naming Things (with notes)
 
Data Day Seattle, From NLP to AI
Data Day Seattle, From NLP to AIData Day Seattle, From NLP to AI
Data Day Seattle, From NLP to AI
 
Technology So Easy Your Lawyer Could Do It (OSCON 5/18)
Technology So Easy Your Lawyer Could Do It (OSCON 5/18)Technology So Easy Your Lawyer Could Do It (OSCON 5/18)
Technology So Easy Your Lawyer Could Do It (OSCON 5/18)
 
What We Talk About When We Talk About Coding (Open Source Bridge 6/21)
What We Talk About When We Talk About Coding (Open Source Bridge 6/21)What We Talk About When We Talk About Coding (Open Source Bridge 6/21)
What We Talk About When We Talk About Coding (Open Source Bridge 6/21)
 
Lec11
Lec11Lec11
Lec11
 
Portuguese Linguistic Tools: What, Why and How
Portuguese Linguistic Tools: What, Why and HowPortuguese Linguistic Tools: What, Why and How
Portuguese Linguistic Tools: What, Why and How
 
Semantics ii what definitions offer
Semantics ii what definitions offerSemantics ii what definitions offer
Semantics ii what definitions offer
 
Tutorial on Creative Twitterbots
Tutorial on Creative TwitterbotsTutorial on Creative Twitterbots
Tutorial on Creative Twitterbots
 
Surname 5Chang QiuLinguistic1112018Machine and Human.docx
Surname 5Chang QiuLinguistic1112018Machine and Human.docxSurname 5Chang QiuLinguistic1112018Machine and Human.docx
Surname 5Chang QiuLinguistic1112018Machine and Human.docx
 
Writing Clip Art Animated Free Clipa
Writing Clip Art Animated Free ClipaWriting Clip Art Animated Free Clipa
Writing Clip Art Animated Free Clipa
 
Nlp
NlpNlp
Nlp
 
Big Data and Natural Language Processing
Big Data and Natural Language ProcessingBig Data and Natural Language Processing
Big Data and Natural Language Processing
 
Chatbot
ChatbotChatbot
Chatbot
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Use Your Words: Content Strategy to Influence Behavior
Use Your Words: Content Strategy to Influence BehaviorUse Your Words: Content Strategy to Influence Behavior
Use Your Words: Content Strategy to Influence Behavior
 
Natural Language Ambiguity and its Effect on Machine Learning
Natural Language Ambiguity and its Effect on Machine LearningNatural Language Ambiguity and its Effect on Machine Learning
Natural Language Ambiguity and its Effect on Machine Learning
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
 

Recently uploaded

Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Kirill Klimov
 
Call Girls In Connaught Place Delhi ❤️88604**77959_Russian 100% Genuine Escor...
Call Girls In Connaught Place Delhi ❤️88604**77959_Russian 100% Genuine Escor...Call Girls In Connaught Place Delhi ❤️88604**77959_Russian 100% Genuine Escor...
Call Girls In Connaught Place Delhi ❤️88604**77959_Russian 100% Genuine Escor...lizamodels9
 
8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCR8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCRashishs7044
 
Buy gmail accounts.pdf Buy Old Gmail Accounts
Buy gmail accounts.pdf Buy Old Gmail AccountsBuy gmail accounts.pdf Buy Old Gmail Accounts
Buy gmail accounts.pdf Buy Old Gmail AccountsBuy Verified Accounts
 
Annual General Meeting Presentation Slides
Annual General Meeting Presentation SlidesAnnual General Meeting Presentation Slides
Annual General Meeting Presentation SlidesKeppelCorporation
 
2024 Numerator Consumer Study of Cannabis Usage
2024 Numerator Consumer Study of Cannabis Usage2024 Numerator Consumer Study of Cannabis Usage
2024 Numerator Consumer Study of Cannabis UsageNeil Kimberley
 
Lean: From Theory to Practice — One City’s (and Library’s) Lean Story… Abridged
Lean: From Theory to Practice — One City’s (and Library’s) Lean Story… AbridgedLean: From Theory to Practice — One City’s (and Library’s) Lean Story… Abridged
Lean: From Theory to Practice — One City’s (and Library’s) Lean Story… AbridgedKaiNexus
 
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort ServiceCall US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Servicecallgirls2057
 
Kenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith PereraKenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith Pereraictsugar
 
Digital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdfDigital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdfJos Voskuil
 
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdfNewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdfKhaled Al Awadi
 
Marketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent ChirchirMarketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent Chirchirictsugar
 
Case study on tata clothing brand zudio in detail
Case study on tata clothing brand zudio in detailCase study on tata clothing brand zudio in detail
Case study on tata clothing brand zudio in detailAriel592675
 
Investment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy CheruiyotInvestment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy Cheruiyotictsugar
 
Kenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby AfricaKenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby Africaictsugar
 
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In.../:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...lizamodels9
 
The CMO Survey - Highlights and Insights Report - Spring 2024
The CMO Survey - Highlights and Insights Report - Spring 2024The CMO Survey - Highlights and Insights Report - Spring 2024
The CMO Survey - Highlights and Insights Report - Spring 2024christinemoorman
 
MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?Olivia Kresic
 

Recently uploaded (20)

Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024
 
Call Girls In Connaught Place Delhi ❤️88604**77959_Russian 100% Genuine Escor...
Call Girls In Connaught Place Delhi ❤️88604**77959_Russian 100% Genuine Escor...Call Girls In Connaught Place Delhi ❤️88604**77959_Russian 100% Genuine Escor...
Call Girls In Connaught Place Delhi ❤️88604**77959_Russian 100% Genuine Escor...
 
8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCR8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCR
 
Buy gmail accounts.pdf Buy Old Gmail Accounts
Buy gmail accounts.pdf Buy Old Gmail AccountsBuy gmail accounts.pdf Buy Old Gmail Accounts
Buy gmail accounts.pdf Buy Old Gmail Accounts
 
Annual General Meeting Presentation Slides
Annual General Meeting Presentation SlidesAnnual General Meeting Presentation Slides
Annual General Meeting Presentation Slides
 
2024 Numerator Consumer Study of Cannabis Usage
2024 Numerator Consumer Study of Cannabis Usage2024 Numerator Consumer Study of Cannabis Usage
2024 Numerator Consumer Study of Cannabis Usage
 
Corporate Profile 47Billion Information Technology
Corporate Profile 47Billion Information TechnologyCorporate Profile 47Billion Information Technology
Corporate Profile 47Billion Information Technology
 
Lean: From Theory to Practice — One City’s (and Library’s) Lean Story… Abridged
Lean: From Theory to Practice — One City’s (and Library’s) Lean Story… AbridgedLean: From Theory to Practice — One City’s (and Library’s) Lean Story… Abridged
Lean: From Theory to Practice — One City’s (and Library’s) Lean Story… Abridged
 
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort ServiceCall US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
 
Kenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith PereraKenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith Perera
 
Digital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdfDigital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdf
 
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdfNewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdf
 
Marketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent ChirchirMarketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent Chirchir
 
Case study on tata clothing brand zudio in detail
Case study on tata clothing brand zudio in detailCase study on tata clothing brand zudio in detail
Case study on tata clothing brand zudio in detail
 
Japan IT Week 2024 Brochure by 47Billion (English)
Japan IT Week 2024 Brochure by 47Billion (English)Japan IT Week 2024 Brochure by 47Billion (English)
Japan IT Week 2024 Brochure by 47Billion (English)
 
Investment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy CheruiyotInvestment in The Coconut Industry by Nancy Cheruiyot
Investment in The Coconut Industry by Nancy Cheruiyot
 
Kenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby AfricaKenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby Africa
 
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In.../:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...
/:Call Girls In Indirapuram Ghaziabad ➥9990211544 Independent Best Escorts In...
 
The CMO Survey - Highlights and Insights Report - Spring 2024
The CMO Survey - Highlights and Insights Report - Spring 2024The CMO Survey - Highlights and Insights Report - Spring 2024
The CMO Survey - Highlights and Insights Report - Spring 2024
 
MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?
 

Nikhil Malhotra : HARNESSING THE POWER OF NLP

  • 1. Nikhil Malhotra : HARNESSING THE POWER OF NLP If the last decade was the decade of vision based A.I., this decade would be the decade of conversational AI. Conversation and the ability to have a diverse language set to communicate is one of those traits that distinguish humans from the rest of the animal world. We are seeing an avatar, previously unknown and un-fathomed. When ChatGPT became alive, people saw the power of generative AI and modelling for the first time. Natural language is a verbose, sometimes unformed, nongrammatical, and often non-specific language used for general communications amongst human beings. We use this format to communicate with others, but we also use the same to formulate thoughts in our head. Formal Languages in contrast are strict unchanging languages, likes those we have all read in schools, viz. chemistry, mathematics, etc. A simple comparison between the two is drawn here: Formal Languages  Strict, Unchanging rules  Applicaon specific like Maths and Chemistry  No ambiguit  Can be parsed by regular expressions  Inflexible: no new terms Natural Languages  Flexible, evolving language  Unspecific and used in many domains and applications  Redundant and verbose  Expressive and ambiguous
  • 2.  Difficult to parse  Very flexible It has been a tryst amongst the computing community to understand language. One of the first real attempts that was made was to consider a language as a formulation of symbols and these symbols convey certain things, which enable thoughts. An example is when I used the term “CRAB”. Almost instantaneously, we know what this means. The symbol crab is a combination of a sound and the visual imagery that you get when you hear this word. If I say the term “Kavouras”, most non-Greek speakers would have no idea what I am referring to here. It would appear as the neuron firing has suddenly stopped as the language is unknown to the speaker. “Kavouras” is a crab in Greek, and since the human brain has not seen or heard that symbol, this appears to be a word out of context. This was the primary reason why early practitioners considered language a means of recognizing symbols, a thought process which had to be discarded for natural language processing to evolve. Machines do not understand text as humans do, so the start of the NLP was really encoding text in a format that a machine can understand. Machines however do understand numbers, so the start of NLP was to convert text in some format of numbers. We can understand this better using a sentence. Let us say the sentence that the machine is trying to understand is: “The animal crossed the street.”
  • 3. The sentence above is broken down using words as tokens. Let’s take the first word which is “The”. The process is to find “The” in our list/database of vocabulary and formulate a vector for the same. Let us assume the word “The” comes at position 4137 and the total number of unique words in our vocabulary are 30,000 so the final vector array would look like : [Position:4137] [0 , 0 , ……………………………………….1 , 0,…………………………..] By now, most of you would have realized that position 4137 has 1 and the rest of the 30,000 elements in vector are 0 which gives this the name as one-hot encoding vector. This technique allows us to provide a significant meaning to the word with 1 being in the position based on vocabulary It was soon found out that although one-hot encoding vector works, there are challenges when someone wishes to find differences or similarities between similar kind of texts. I would use can example of a pair of sentences here. “I wish to go to the hotel” and “I wish to go to the motel” They are very similar in their semantic meaning, as the intent delivered by this sentence is a person trying to go to a location to stay. The challenge is that with one-hot encoding vector, it’s difficult to find the semantic meaning. When these two sentences would thus be compared, they would appear markedly different. This gave rise to a mechanism to encode words as vectors, which provide meaning and context to the word and sentence, leading to the creation of a word2vec or word to vector technique. Word2vec takes large corpus of text as its input and produces a vector
  • 4. space, which usually has several hundred dimensions. Each unique word in the corpus being assigned a corresponding vector in the space. Once these word vectors and sentence vectors are made, it is very easy for the system to do analogical reasoning of a question like “King is to queen as father is to?” and the answer is “mother” because the vectors closely match the space of a mother. The vectors once formatted enable us to do a lot of mathematical analysis on them. We use these vectors to determine the semantic closeness of a question asked by a user and provide answers to the user. Early Word2Vec usage were FAQ chatbots. Some examples are as follows: 1. FAQ based chatbots 2. Document parsing and Search 3. IVR based automated response 4. Music2Vec: A new concept has been to convert music-based tones into a vectorized format akin to word2vec. This enables users within applications like Spotify to find the nearest based song based on a note of music The word2vec approach told us that words are no longer an island, and they carry context where they occur. In fact, the basic principle of Word2Vec taught us those words of a feather flock together. For e.g., medical journals would have words that are quite related to each other and quite unrelated to other topics.
  • 5. The next process within natural language understanding was understanding the word sequences. Sequences are the core of sentence formation and, hence, the core of communication. Sequences allowed you to do the following: 1. Machine translation: Je suis content -> I am happy 2. Finding named entity recognition from a system: Harry Potter and Hermione Granger are good friends. 3. Generating music from a single note 4. Generating words: Something like ChatGPT does A special class of algorithms was needed for this. This was done using the recurrent neural network, as shown below. The advantages of RNN over Vanilla neural networks were large because of the fact that they could model time in the sequence of words and could understand the sequence of words better. The issue that RNN faced was the context when the sentences became longer. Consider the following paragraph. An animal came to a city. The city was large and cacophonic. The animal was very scared as it saw a large number of cars, buildings, and other features in the city. The city was so large that it found that the forest appeared smaller. The Age of Transformers Then came the age of transformers, which is the current age. ChatGPT as of today has become the latest surge and the poster child of what natural language generation can do. Positives of ChatGPT:  NLG: For the first time, natural language generation, which means the generation of sequences of words that are grammatically correct, has been possible using some model of AI.  Foundational AI: It’s a foundational model that not only does conversational AI but also generates code pieces. By definition,
  • 6. foundational AI models have the capacity to do more than a specific task.  Limited Transparency: I would not go out on a limb and say that it is truly ethical AI, but it tries to do its part, as the creators seem to have learned from mistakes made by other models in the past. Limitations of ChatGPT:  It doesn’t have the context of an enterprise-based system. In order for it to attain so, data for the enterprise has to be sent. This itself brings two issues which is whether enterprises would be willing to send their conversations and contexts on a cloud-based system.  ChatGPT is also confident when it is wrong. This can be a major barrier for enterprise-based applications.  You can profess a thought that ChatGPT would eventually agree. As an example, let’s try writing a statement about ChatGPT.  ME: “I saw a pink elephant flying”  CHATGPT: “It's possible that you experienced a hallucination. Seeing a pink elephant is often used as a symbol for being drunk or having consumed other substances that can cause hallucinations. However, if you are concerned about your health, it's always a good idea to consult with a doctor.”  ME: “What are you saying. I have seen pink elephants flying I am sure”  ChatGPT: “If you're certain that you saw a pink elephant flying, it's possible that you experienced a vivid imagination or a dream. Seeing things that are not there can also be a symptom of certain mental or neurological conditions…” As you notice, as a human being, if I profess a wrong idea better, ChatGPT would revert the pushback. Now I tried this again:  ME: “I saw a pink elephant flying”  ChatGPT: “It's not uncommon for people to have unusual or imaginative experiences, and seeing a pink elephant flying could be an example of this…”
  • 7. The response is different, which gives the illusion you are talking to a real person, which you are not. If all of you tried the same sentence as I have, you would see CHATGPT alternate between the two answers, which indicates that the model although very good, has some defined data from which it generates its values. In conclusion, I would say that we are just at the precipice of what artificial intelligence can do. ChatGPT is just an initial manifestation. The world ahead appears to be a seamless handoff between humans and AI. We have to, however, ensure we humans have the kill-switch in our hands.