What is Linguistics?
Definition: the scientific study of human language. Linguistics covers language form, language meaning, and language in context.
Linguistic Intelligence
• People who are good with words and languages fall into the category of linguistic intelligence. Examples: Alexander Graham Bell, Abraham Lincoln, William Shakespeare.
Linguistic Form
• Grammar
• Morphology
• Syntax
• Phonology
• Phonetics
Different Parts of Speech
Traditional grammar classifies words into eight parts of speech:
1. Verb
2. Noun
3. Pronoun
4. Adjective
5. Adverb
6. Preposition
7. Conjunction
8. Interjection
Parts of Speech
The Verb: A verb is a word that says something about a person or thing. Verbs are of three kinds:
• Transitive verb
• Intransitive verb
• Helping verb
Eg: I enjoy dancing.
The Noun: A noun is a word used as the name of a person, place, or thing. There are two kinds of noun: concrete nouns and abstract nouns.
Eg: The garage roof needs repairing.
The Pronoun: A pronoun is a word used in place of a noun.
Eg: Rita is my classmate. She is very intelligent.
• First-person pronouns: I, me, mine, myself
• Second-person pronouns: you, your, yours
• Third-person pronouns: he, she, it, himself, herself
The Adverb: An adverb is a word that tells how, when, or where an action is done.
Eg: She ran slowly. (how) He came today. (when)
The Adjective: An adjective qualifies a noun or pronoun.
Eg: She was self-assured, smiling and very successful.
The Preposition: A preposition is a word used before a noun or pronoun, which it governs.
• Eg: I will be at home on Friday morning.
The Conjunction: A conjunction joins words, phrases, or clauses, e.g. unless, but, as long as.
Eg: You can use my car as long as you drive carefully.
The Interjection: Interjections are short exclamations such as Oh!, Um, or Ah!
Eg: expressing hesitation, doubt, or disagreement: "Hmm. I'm not so sure."
expressing surprise: "Oh! You're here!"
Sentence Construction
The subject and the predicate are the key components of a sentence. The predicate is the part of the sentence, built around a verb, that expresses the subject's action or state of being.
Example
Subject | Predicate
Michael | drove the car.
Michael | drove the race car.
Phrases & Clauses
A phrase is a group of words that may contain nouns or verbs but has no subject performing a verb. Eg:
• between ignorance and intelligence
• broken into thousands of pieces
• because of her glittering smile
Clause: A clause is a group of words that has a subject actively performing a verb.
• because she smiled at him (this is also an example of a dependent clause)
Example of an independent clause:
• I drive a car.
Example of connecting independent clauses (here with a semicolon):
• Being rich is having money; being wealthy is having time.
Ambiguous sentences:
1. Jack came with Jill.
2. The lady hit the man with an umbrella.
3. He gave her cat food.
How to resolve the ambiguity? By constructing a syntactic tree, which makes the grouping of the words explicit.
A. Old men and women (ambiguous)
B. (Old men) and women, as opposed to old (men and women)
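To make the two groupings concrete, here is a minimal sketch in Python that enumerates every parse of "old men and women" with a CKY-style chart parser. The toy grammar (labels NP, ADJ, CONJ, CONJNP) and the function names are hypothetical, chosen only for illustration; a realistic grammar would be far larger.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (illustrative assumption).
LEXICON = {"old": ["ADJ"], "men": ["NP"], "women": ["NP"], "and": ["CONJ"]}
BINARY = [
    ("NP", "ADJ", "NP"),       # old + NP
    ("NP", "NP", "CONJNP"),    # NP + (and NP)
    ("CONJNP", "CONJ", "NP"),  # and + NP
]

def cky_trees(words):
    """Return every NP tree spanning the whole word list."""
    n = len(words)
    # chart[i][j] maps a label to the list of trees covering words[i:j]
    chart = [[defaultdict(list) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for label in LEXICON[word]:
            chart[i][i + 1][label].append(word)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between two sub-spans
                for parent, left, right in BINARY:
                    for lt, rt in product(chart[i][k][left], chart[k][j][right]):
                        chart[i][j][parent].append((parent, lt, rt))
    return chart[0][n]["NP"]

def bracket(tree):
    """Show a tree as nested parentheses, hiding the labels."""
    if isinstance(tree, str):
        return tree
    _, left, right = tree
    return "(" + bracket(left) + " " + bracket(right) + ")"

for tree in cky_trees("old men and women".split()):
    print(bracket(tree))
```

The loop prints two bracketings, `(old (men (and women)))` and `((old men) (and women))`; each distinct tree is one reading, and choosing a tree is what resolves the ambiguity.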
POS tagging
• Definition: "The process of assigning a part-of-speech or other lexical class marker to each word in a corpus" (Jurafsky and Martin)
Example sentence: Heat water in a large vessel.
An incorrect tagging: heat/N water/V in/P a/Det large/Adv vessel
Word | Tag
heat | verb
water | noun
in | preposition
a | determiner
large | adjective
vessel | noun
Advantages of POS taggers
• Useful in information retrieval, text-to-speech, and word-sense disambiguation
• Useful as a preprocessing step for parsing: assigning a unique tag to each word reduces the number of possible parses
POS tagging using a treebank
Parse tree for "Apples grow on trees":
(S (NP (N Apples)) (VP (V grow) (PP (P on) (NP (N trees)))))
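In a treebank, the POS tag of each word is simply the label directly above it in the tree (its preterminal). A minimal sketch, assuming trees are stored as bracketed strings like the one above; `parse_bracketed` and `tagged_words` are hypothetical helper names, not part of any treebank toolkit:

```python
import re

TREE = "(S (NP (N Apples)) (VP (V grow) (PP (P on) (NP (N trees)))))"

def parse_bracketed(text):
    """Parse '(LABEL child ...)' into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening parenthesis"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())    # nested constituent
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return parse()

def tagged_words(tree):
    """Collect (word, tag) pairs: a node whose only child is a word is a tag."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        pairs.extend(tagged_words(child))
    return pairs

print(tagged_words(parse_bracketed(TREE)))
```

This prints `[('Apples', 'N'), ('grow', 'V'), ('on', 'P'), ('trees', 'N')]`, i.e. the tagged sentence read off the tree's leaves.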
Tokenization
• Tokenization is the process of breaking a stream of text into words, phrases, symbols, or other meaningful elements called tokens.
• The list of tokens becomes the input for further processing such as parsing or text mining.
• Tokenization is useful both in linguistics (where it is a form of text segmentation) and in computer science (where it forms part of lexical analysis).
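A minimal sketch of a rule-based word tokenizer using Python's standard `re` module. The pattern is an assumption for illustration: runs of word characters (keeping an ASCII apostrophe contraction such as "I'm" together) become word tokens, and every other non-space character becomes its own punctuation token.

```python
import re

# Assumed pattern: a word with an optional 'suffix, or any single
# character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")

def tokenize(text):
    """Split text into word and punctuation tokens; whitespace is dropped."""
    return TOKEN_PATTERN.findall(text)

print(tokenize("Oh! You're here!"))
```

This prints `['Oh', '!', "You're", 'here', '!']`. Real tokenizers add many more rules (abbreviations, hyphens, URLs), which is why tokenization is usually its own processing step.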
POS tagging drawback: ambiguity
• "Plants/N need light and water."
• "Each one plant/V one."
• "Flies like a flower"
 – flies: noun or verb?
 – like: preposition, adverb, conjunction, noun, or verb?
 – a: article, noun, or preposition?
 – flower: noun or verb?
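The per-word ambiguities multiply: with 2, 5, 3 and 2 candidate tags, "Flies like a flower" already has 2 × 5 × 3 × 2 = 60 possible tag sequences. A small Python sketch that counts and enumerates them, using only the candidate tags listed above (the dictionary and function names are illustrative):

```python
from itertools import product
from math import prod

# Candidate tags per word, taken from the questions on the slide.
CANDIDATE_TAGS = {
    "flies": ["noun", "verb"],
    "like": ["preposition", "adverb", "conjunction", "noun", "verb"],
    "a": ["article", "noun", "preposition"],
    "flower": ["noun", "verb"],
}

SENTENCE = ["Flies", "like", "a", "flower"]

def count_tag_sequences(words):
    """Number of tag sequences = product of per-word candidate counts."""
    return prod(len(CANDIDATE_TAGS[w.lower()]) for w in words)

def tag_sequences(words):
    """Yield every combination of one candidate tag per word."""
    return product(*(CANDIDATE_TAGS[w.lower()] for w in words))

print(count_tag_sequences(SENTENCE))  # 60
```

The count grows multiplicatively with sentence length, which is why taggers need a way (such as probabilities from a corpus) to pick one sequence instead of listing them all.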
Stochastic (probabilistic) tagging
• Based on the probability of a certain tag occurring, given the various possibilities
• Requires a training corpus: a collection of sentences that have already been tagged
• Several such corpora exist. One of the best known is the Brown University Standard Corpus of Present-Day American English (the Brown Corpus): about 1,000,000 words from a wide variety of sources
Approach
• Assign each word its most likely POS tag.
• If word $w$ has possible tags $t_1, \dots, t_k$, estimate
$$P(t_i \mid w) = \frac{c(w, t_i)}{c(w, t_1) + \dots + c(w, t_k)}$$
where $c(w, t_i)$ is the number of times $w$ appears with tag $t_i$ in the corpus.
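A minimal sketch of this most-likely-tag approach in Python. It assumes a tiny hand-tagged toy corpus (hypothetical, not the Brown Corpus) and falls back to tag `N` for unseen words, which is also an assumption. Because the denominator of $P(t_i \mid w)$ is the same for every tag of $w$, picking the most probable tag is the same as picking the most frequent one.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged toy corpus (NOT the Brown Corpus).
TOY_CORPUS = [
    [("I", "Pron"), ("water", "V"), ("the", "Det"), ("plants", "N")],
    [("Heat", "V"), ("water", "N"), ("in", "P"), ("a", "Det"),
     ("large", "Adj"), ("vessel", "N")],
    [("Plants", "N"), ("need", "V"), ("light", "N"), ("and", "Conj"),
     ("water", "N")],
]

def train(tagged_sentences):
    """Return counts[w][t] = c(w, t), how often word w occurs with tag t."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return counts

def tag_probability(counts, word, tag):
    """P(t | w) = c(w, t) / (c(w, t1) + ... + c(w, tk))."""
    word_counts = counts.get(word.lower(), Counter())
    total = sum(word_counts.values())
    return word_counts[tag] / total if total else 0.0

def most_likely_tag(counts, word, default="N"):
    """argmax over t of P(t | w); unseen words get an assumed default tag."""
    word_counts = counts.get(word.lower())
    if not word_counts:
        return default
    return word_counts.most_common(1)[0][0]

def tag_sentence(counts, words):
    return [most_likely_tag(counts, w) for w in words]

counts = train(TOY_CORPUS)
print(tag_probability(counts, "water", "N"))  # 2 of 3 occurrences
print(tag_sentence(counts, "Heat water in a large vessel".split()))
```

Here "water" is tagged N twice and V once, so $P(N \mid water) = 2/3$ and the sentence is tagged `['V', 'N', 'P', 'Det', 'Adj', 'N']`. The design ignores context entirely: "water" would also be tagged N in "I water the plants", which is the weakness that context-aware taggers address.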
Sources
• Raymond Murphy, Intermediate English Grammar
• UPenn Treebank II word tags
• Other natural language processing websites