SlideShare a Scribd company logo
The Role of Natural Language Processingin Information Retrieval Searching for Meaning in Text Tony Russell-Rose, PhD 21-Mar-2011
Contents Introduction & Overview Background, Terminology NLP Fundamentals Paradigms & Perspectives NLP Tools & Techniques NLP Applications Text mining, Question Answering, etc. Conclusions & Further Reading 21-Mar-2011 2 NLP & IR
Introduction & Overview The Role of Natural Language Processing in Information Retrieval 21-Mar-2011 3 NLP & IR
Information Retrieval Models, techniques & frameworks for satisfying an information need Common paradigm: document retrieval 21-Mar-2011 4 NLP & IR
Document Retrieval Methods include Boolean, vector space, probabilistic Rely on index terms “bag of words” Stop lists + stemming But text is “unstructured” Information may be “hidden” 21-Mar-2011 5 NLP & IR
NLP – a solved problem? As humans we do it effortlessly ... don’t we? DRUNK GETS NINE YEARS IN VIOLIN CASE PROSTITUTES APPEAL TO POPE  STOLEN PAINTING FOUND BY TREE  RED TAPE HOLDS UP NEW BRIDGE DEER KILL 300,000 RESIDENTS CAN DROP OFF TREES INCLUDE CHILDREN WHEN BAKING COOKIES  MINERS REFUSE TO WORK AFTER DEATH   21-Mar-2011 6 NLP & IR
Problems with Text (1) Polysemy One word maps to many concepts e.g. bat Synonymy One concept maps to many words 21-Mar-2011 7 NLP & IR
Problems with Text (2) Word order Venetian blind vs. Blind venetian 21-Mar-2011 8 NLP & IR
Problems with Text (3) Language is generative Starbucks coffee is the best The place I like most when I need to feed my caffeine addiction is the company from Seattle with branches everywhere Many different ways to express a given idea Synonymy, paraphrase, metaphor, etc. 21-Mar-2011 9 NLP & IR
Problems with Text (4) ,[object Object]
The meaning of a sentence is completely determined by the meaning of its symbols and the syntax used to combine themLanguage is a form of communication All communication has a context: time and place of utterance, the writer, the reader, their backgroundknowledge, intentions, assumptions re the reader’s knowledge/intentions, etc. 21-Mar-2011 10 NLP & IR
Problems with Text (5) Language is changing I want to buy a mobile Ill-formed input “accomodation office” Co-ordination, negation, etc. This is not a talk about neuro-linguistic programming Multi-linguality Claudia Schiffer is on the cover of Elle Also sarcasm, irony, slang, jargon, etc. That was a wicked lecture Yep – the coffee break was the best part 21-Mar-2011 11 NLP & IR
Enter NLP / Text Analytics… Text Analytics: a set of linguistic, analytical and predictive techniques to extract structure and meaning from unstructured documents NLP: the academic term for Text Analytics analogous to “search” vs. “IR” Text Analytics ~= Natural Language Processing ~= Text Mining Text Mining -> Scientific / technical context, automated processing Text Analytics -> Business context, interactive apps 21-Mar-2011 12 NLP & IR
NLP Fundamentals The Role of Natural Language Processing in Information Retrieval 21-Mar-2011 13 NLP & IR
NLP Perspectives Computational Linguistics Use of computational techniques to study linguistic phenomena Cognitive science Study of human information processing (perception, language, reasoning, etc.) Computer science Theoretical foundations of computation and practical techniques for implementation  Information science Analysis, classification, manipulation, retrieval and dissemination of information 21-Mar-2011 14 NLP & IR
NLP Paradigms Symbolic approaches Rule-based, hand coded (by linguists) Knowledge-intensive Statistical approaches Shallow statistical models, trained on annotated corpora Data-intensive 21-Mar-2011 15 NLP & IR
NLP Fundamentals (word level) Language is AMBIGUOUS To find structure, we must remove ambiguity! Lexical analysis (tokenisation) The cat sat on the mat I can’t tokenise this sentence Stop word removal No definitive list The Who, The The, Take That… To be or not to be 21-Mar-2011 16 NLP & IR
NLP Fundamentals (word level) Stemming fishing, fished, fish, fisher -> fish Lemmatization Stemming + context Passing -> pass + ING Were -> be + PAST Delegate = de-leg-ate (?) Ratify = rat-ify (?) Morphology (prefixes, suffixes, etc.) Gebäudereinigungsfirmenangestellter -> Gebäude + Reinigung + Firma + Angestellter (building + cleaning + company + employee) 21-Mar-2011 17 NLP & IR
NLP Fundamentals (sentence level) Syntax (part of speech tagging) book -> NOUN, VERB that -> DETERMINER flight -> NOUN Book that flight -> VB DT NN Ambiguity problem Time flies like an arrow -> ? Fruit flies like a banana -> ? Eats shoots and leaves -> ? 21-Mar-2011 18 NLP & IR
NLP Fundamentals (sentence level) Parsing (grammar) I saw a venetian blind I saw a blind venetian I saw the man on the hill with a telescope Rugby is a game played by men with odd-shaped balls Sentence boundary detection Punctuation denotes the end of a sentence! “But not always!”, said Fred... 21-Mar-2011 19 NLP & IR
NLP Tools & Techniques The Role of Natural Language Processing in Information Retrieval 21-Mar-2011 20 NLP & IR
Early NLP Systems ELIZA Wiezenbaum 1966 Simple pattern matching PARRY Colby et al 1971 Pattern matching & planning SHRDLU Winograd 1972 Natural language understanding Comprehensive grammar of English 21-Mar-2011 21 NLP & IR
Word Prediction Assistive technologies Aurora Systems, TextHelp Google, Bing, Yahoo query suggestions 21-Mar-2011 22 NLP & IR
Spelling Correction Auto-correct Did You Mean 21-Mar-2011 23 NLP & IR
Text Categorization News agencies: classify incoming news stories Search engines: classify queries, e.g. search Google for ‘LOG313’ Identifying spam emails, e.g. http://www.paulgraham.com/spam.html Routing email or documents to appropriate people 21-Mar-2011 24 NLP & IR
Terminology Extraction Differentiate between useful index terms and ‘noise’ Help lexicographers identify new terminology: the C4 carbons of the nicotinamide rings X-ray therapy vs. X-ray therapies PAHO vs. Pan-American Health Organization Term extraction systems process scientific papers to identify terminology, possibly comparing it with a known list 21-Mar-2011 25 NLP & IR
Speech Recognition Spoken Dialogue Systems Directory enquiries (AT&T) Railways (Germany, Switzerland, Italy) Airlines (USA) iPhone Voice Search IBM Watson 21-Mar-2011 26 NLP & IR
Named Entity Recognition Identification of key concepts,  e.g. people, places, organisations, etc. Also postcodes, temporal/numerical expressions, etc. “Mexico has been trying to stage a recovery since the beginning of this year and it's always been getting ahead of itself in terms of fundamentals,” said Matthew Hickman of Lehman Brothers in New York.” 21-Mar-2011 27 NLP & IR
Named Entity Recognition: Applications Increase precision of IR New companies in York vs. Companies in New York Support navigation SMART tags, Apture, etc. Improve machine translation: “I live in a new house in new york” Speech synthesis, auto-summarisation, etc. 21-Mar-2011 28 NLP & IR
Named Entity Recognition Systems usually developed for specific domain Are typically language and domain-specific Accuracy > 90%  for newswire text Best system at MUC-7 scored 93.39% f-measure human annotators scored 97.60% and 96.95%. Lower for certain domains, e.g. life sciences (species, substances, proteins, genes, etc.)  extended naming conventions, extensive use of conjunction and disjunction, high occurrence of neologisms and abbreviations, etc. 21-Mar-2011 29 NLP & IR
NER (Newssift Example) 21-Mar-2011 30 NLP & IR
Machine Translation (before) 21-Mar-2011 31 NLP & IR
Machine Translation (after) 21-Mar-2011 32 NLP & IR
Summarisation Section 3 discusses two information access applications (text mining and question answering) closely associated with NLP.  NLP Techniques Named entity recognition  Information Extraction  Current document retrieval technologies could not identify information as specific as this within text.  Word Sense Disambiguation Text Mining Question Answering  “Wh” questions List questions  Examples include those mentioned here: text mining, question answering and cross-language information retrieval.  Ponte and Croft (1998) “A Language Modeling Approach to Information Retrieval” SIGIR.   Sanderson, M. (1994) “Word sense disambiguation and information retrieval” Proceedings of the 17th ACM SIGIR Conference   Sanderson, M. (2000) “Retrieving with Good Sense” Information Retrieval 2(1):49-69.   Question Answering (Section 3.2).   Summarization single-document vs. multi-document MS Word Bing 21-Mar-2011 33 NLP & IR
Information Extraction ,[object Object]
Based on pre-defined structures, e.g.movements of company executives victims of terrorist attacks corporate mergers / acquisitions interactions between genes and proteins Example (from MUC-6): Now, Mr. James is preparing to sail into the sunset, and Mr. Dooner is poised to rev up the engines to guide Interpublic Group's McCann-Erickson into the 21st century. Yesterday, McCann made official what had been widely anticipated: Mr. James, 57 years old, is stepping down as chief executive officer on July 1 and will retire as chairman at the end of the year. He will be succeeded by Mr. Dooner, 45. 21-Mar-2011 34 NLP & IR
Information Extraction <SUCCESSION_EVENT-2> :=     SUCCESSION_ORG:          <ORGANIZATION-1>     POST: "chairman"     IN_AND_OUT: <IN_AND_OUT-4>     VACANCY_REASON: DEPART_WORKFORCE   <IN_AND_OUT-4> :=     IO_PERSON: <PERSON-1>     NEW_STATUS: IN     ON_THE_JOB: NO     OTHER_ORG: <ORGANIZATION-1>     REL_OTHER_ORG: SAME_ORG   <ORGANIZATION-1> :=     ORG_NAME: "McCann-Erickson"     ORG_ALIAS: "McCann"     ORG_TYPE:  COMPANY   <PERSON-1> :=     PER_NAME: "John J. Dooner Jr."     PER_ALIAS: "John Dooner" "Dooner" 21-Mar-2011 35 NLP & IR
Information Extraction Can now use as metadata (for retrieval) Or store in DB and query against it, e.g. “show me all the events where a finance officer left a company to take up the position of CEO in another company” 21-Mar-2011 36 NLP & IR
IE Example: Yahoo Correlator 21-Mar-2011 37 NLP & IR
IE Example: Google Wonder Wheel 21-Mar-2011 38 NLP & IR
IE Example: Yahoo Search Assist 21-Mar-2011 39 NLP & IR
Information Extraction But IE is difficult when concepts are spread across multiple sentences: Pace American Group Inc. said it notified two top executives it intends to dismiss them because an internal investigation found evidence of “self-dealing” and “undisclosed financial relationships.” The executives are Don H. Pace, president and CEO; and Greg S. Kaplan, SVP and CFO. Event & org in first S, but names in second S Need to identify connection between “two top executives”  and “the executives”. 21-Mar-2011 40 NLP & IR
Information Extraction Anaphora resolution relies on context (semantic / pragmatic knowledge): We gave the bananas to the monkeys because they were hungry. We gave the bananas to the monkeys because they were ripe. We gave the bananas to the monkeys because they were here.  21-Mar-2011 41 NLP & IR
WordNet Semantic database for English  See also EuroWordNet Based on synsets car = automobile | railway car | elevator car | cable car 1stsynset = car, motorcar, machine, auto, automobile  Connected via relationships E.g. hypernymy (a kind of) separate hierarchies for nouns, verbs, adjectives and adverbs 21-Mar-2011 42 NLP & IR
Word Sense Disambiguation Resolve ambiguities due to polysemy Often characterised by syntax, e.g. Magnesium is a light metal (ADJ) The light in the kitchen is quite bright (NOUN) Can use a Part of Speech Tagger to resolve broad ambiguities PoS tags = NN, NNP, NNS, NNPS, etc. 21-Mar-2011 43 NLP & IR
Dictionary Definition of “set” set    /sɛt/ Show Spelled [set] Show IPA verb, set, set·ting, noun, adjective, interjection  –verb (used with object) 1. to put (something or someone) in a particular place: to set a vase on a table.  2. to place in a particular position or posture: Set thebaby on his feet.  3. to place in some relation to something or someone: We set a supervisor over the new workers.  4. to put into some condition: to set a house on fire.  5. to put or apply: to set fire to a house.  6. to put in the proper position: to set a chair back on its feet.  7. to put in the proper or desired order or condition for use: to set a trap.  8. to distribute or arrange china, silver, etc., for use on (a table): to set the table for dinner.  9. to place (the hair, especially when wet) on rollers, in clips, or the like, so that the hair will assume a particular style.  10. to put (a price or value) upon something: He set $7500 as the right amount for the car. The teacher sets a high value on neatness.  11. to fix the value of at a certain amount or rate; value: He set the car at $500. She sets neatness at a high value.  12. to post, station, or appoint for the purpose of performing some duty: to set spies on a person.  13. to determine or fix definitely: to set a time limit.  14. to resolve or decide upon: to set a wedding date.  15. to cause to pass into a given state or condition: to set one's mind at rest; to set a prisoner free.  16. to direct or settle resolutely or wishfully: to set one's mind to a task.  17. to present as a model; place before others as a standard: to set a good example.  18. to establish for others to follow: to set a fast pace.  19. to prescribe or assign, as a task.  20. to adjust (a mechanism) so as to control its performance.  ,[object Object]
22. to adjust (a timer, alarm of a clock, etc.) so as to sound when desired: He set the alarm for seven o'clock.
23. to fix or mount (a gem or the like) in a frame or setting.
24. to ornament or stud with gems or the like: a bracelet set with pearls.
25. to cause to sit; seat: to set a child in a highchair.
26. to put (a hen) on eggs to hatch them.
27. to place (eggs) under a hen or in an incubator for hatching.
28. to place or plant firmly: to set a flagpole in concrete.
29. to put into a fixed, rigid, or settled state, as the face, muscles, etc.
30. to fix at a given point or calibration: to set the dial on an oven; to set a micrometer.
31. to tighten (often followed by up ): to set nuts well up.
32. to cause to take a particular direction: to set one's course to the south.
33. Surgery . to put (a broken or dislocated bone) back in position.
34. (of a hunting dog) to indicate the position of (game) by standing stiffly and pointing with the muzzle.
35. Music . a. to fit, as words to music.
b. to arrange for musical performance.
c. to arrange (music) for certain voices or instruments.
36. Theater . a. to arrange the scenery, properties, lights, etc., on (a stage) for an act or scene.

More Related Content

What's hot

Bayes Belief Networks
Bayes Belief NetworksBayes Belief Networks
Bayes Belief Networks
Sai Kumar Kodam
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
Yuriy Guts
 
Topic Modeling - NLP
Topic Modeling - NLPTopic Modeling - NLP
Topic Modeling - NLP
Rupak Roy
 
NLP_KASHK:N-Grams
NLP_KASHK:N-GramsNLP_KASHK:N-Grams
NLP_KASHK:N-Grams
Hemantha Kulathilake
 
Syntactic analysis in NLP
Syntactic analysis in NLPSyntactic analysis in NLP
Syntactic analysis in NLP
kartikaVashisht
 
NLP Project Full Cycle
NLP Project Full CycleNLP Project Full Cycle
NLP Project Full Cycle
Vsevolod Dyomkin
 
NLP State of the Art | BERT
NLP State of the Art | BERTNLP State of the Art | BERT
NLP State of the Art | BERT
shaurya uppal
 
Lecture-12Evaluation Measures-ML.pptx
Lecture-12Evaluation Measures-ML.pptxLecture-12Evaluation Measures-ML.pptx
Lecture-12Evaluation Measures-ML.pptx
GauravSonawane51
 
Machine learning
Machine learningMachine learning
Machine learning
Shailja Tripathi
 
Artificial Intelligence - Reasoning in Uncertain Situations
Artificial Intelligence - Reasoning in Uncertain SituationsArtificial Intelligence - Reasoning in Uncertain Situations
Artificial Intelligence - Reasoning in Uncertain Situations
Laguna State Polytechnic University
 
Lecture: Question Answering
Lecture: Question AnsweringLecture: Question Answering
Lecture: Question Answering
Marina Santini
 
Text classification & sentiment analysis
Text classification & sentiment analysisText classification & sentiment analysis
Text classification & sentiment analysis
M. Atif Qureshi
 
Bayesian learning
Bayesian learningBayesian learning
Bayesian learning
Vignesh Saravanan
 
An introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTAn introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERT
Suman Debnath
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
Yogendra Tamang
 
Question answering
Question answeringQuestion answering
Question answering
Nafiseh Navabpour
 
Natural language processing (NLP) introduction
Natural language processing (NLP) introductionNatural language processing (NLP) introduction
Natural language processing (NLP) introduction
Robert Lujo
 
Artificial Intelligence Notes Unit 2
Artificial Intelligence Notes Unit 2Artificial Intelligence Notes Unit 2
Artificial Intelligence Notes Unit 2
DigiGurukul
 

What's hot (20)

Bayes Belief Networks
Bayes Belief NetworksBayes Belief Networks
Bayes Belief Networks
 
NLP
NLPNLP
NLP
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
 
Language models
Language modelsLanguage models
Language models
 
Topic Modeling - NLP
Topic Modeling - NLPTopic Modeling - NLP
Topic Modeling - NLP
 
NLP_KASHK:N-Grams
NLP_KASHK:N-GramsNLP_KASHK:N-Grams
NLP_KASHK:N-Grams
 
Syntactic analysis in NLP
Syntactic analysis in NLPSyntactic analysis in NLP
Syntactic analysis in NLP
 
NLP Project Full Cycle
NLP Project Full CycleNLP Project Full Cycle
NLP Project Full Cycle
 
NLP State of the Art | BERT
NLP State of the Art | BERTNLP State of the Art | BERT
NLP State of the Art | BERT
 
Lecture-12Evaluation Measures-ML.pptx
Lecture-12Evaluation Measures-ML.pptxLecture-12Evaluation Measures-ML.pptx
Lecture-12Evaluation Measures-ML.pptx
 
Machine learning
Machine learningMachine learning
Machine learning
 
Artificial Intelligence - Reasoning in Uncertain Situations
Artificial Intelligence - Reasoning in Uncertain SituationsArtificial Intelligence - Reasoning in Uncertain Situations
Artificial Intelligence - Reasoning in Uncertain Situations
 
Lecture: Question Answering
Lecture: Question AnsweringLecture: Question Answering
Lecture: Question Answering
 
Text classification & sentiment analysis
Text classification & sentiment analysisText classification & sentiment analysis
Text classification & sentiment analysis
 
Bayesian learning
Bayesian learningBayesian learning
Bayesian learning
 
An introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTAn introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERT
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Question answering
Question answeringQuestion answering
Question answering
 
Natural language processing (NLP) introduction
Natural language processing (NLP) introductionNatural language processing (NLP) introduction
Natural language processing (NLP) introduction
 
Artificial Intelligence Notes Unit 2
Artificial Intelligence Notes Unit 2Artificial Intelligence Notes Unit 2
Artificial Intelligence Notes Unit 2
 

Viewers also liked

Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...
Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...
Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...Kato Mivule
 
Basic supply-chain APMC bhaji market
Basic supply-chain APMC bhaji marketBasic supply-chain APMC bhaji market
Basic supply-chain APMC bhaji market
Chetan Kute
 
India food processing overview
India food processing overviewIndia food processing overview
India food processing overview
Yuvraj Kadam
 
Indian Food Processing Sector_2016
Indian Food Processing Sector_2016Indian Food Processing Sector_2016
Indian Food Processing Sector_2016
Asian Food Regulation Information Service
 
Food Processing Industry India
Food Processing Industry IndiaFood Processing Industry India
Food Processing Industry India
Ankit Agarwal
 
Food processing industry.
Food processing industry.Food processing industry.
Food processing industry.Rachana Tiwari
 

Viewers also liked (6)

Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...
Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...
Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...
 
Basic supply-chain APMC bhaji market
Basic supply-chain APMC bhaji marketBasic supply-chain APMC bhaji market
Basic supply-chain APMC bhaji market
 
India food processing overview
India food processing overviewIndia food processing overview
India food processing overview
 
Indian Food Processing Sector_2016
Indian Food Processing Sector_2016Indian Food Processing Sector_2016
Indian Food Processing Sector_2016
 
Food Processing Industry India
Food Processing Industry IndiaFood Processing Industry India
Food Processing Industry India
 
Food processing industry.
Food processing industry.Food processing industry.
Food processing industry.
 

Similar to The Role of Natural Language Processing in Information Retrieval

Natural Language Processing_in semantic web.pptx
Natural Language Processing_in semantic web.pptxNatural Language Processing_in semantic web.pptx
Natural Language Processing_in semantic web.pptx
AlyaaMachi
 
Nltk
NltkNltk
Nltk
Anirudh
 
NLP Introduction.ppt machine learning presentation
NLP  Introduction.ppt machine learning presentationNLP  Introduction.ppt machine learning presentation
NLP Introduction.ppt machine learning presentation
PriyankaRamavath3
 
AI_08_NLP.pptx
AI_08_NLP.pptxAI_08_NLP.pptx
AI_08_NLP.pptx
Yousef Aburawi
 
REPORT.doc
REPORT.docREPORT.doc
A DECADE OF USING HYBRID INFERENCE SYSTEMS IN NLP (2005 – 2015): A SURVEY
A DECADE OF USING HYBRID INFERENCE SYSTEMS IN NLP (2005 – 2015): A SURVEYA DECADE OF USING HYBRID INFERENCE SYSTEMS IN NLP (2005 – 2015): A SURVEY
A DECADE OF USING HYBRID INFERENCE SYSTEMS IN NLP (2005 – 2015): A SURVEY
ijaia
 
Natural language procssing
Natural language procssing Natural language procssing
Natural language procssing
Rajnish Raj
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
punedevscom
 
Introduction to natural language processing, history and origin
Introduction to natural language processing, history and originIntroduction to natural language processing, history and origin
Introduction to natural language processing, history and origin
Shubhankar Mohan
 
Semantic Search Component
Semantic Search ComponentSemantic Search Component
Semantic Search Component
Mario Flecha
 
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf EremyanDataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
rudolf eremyan
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
KarenVacca
 
NLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.pptNLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.ppt
OlusolaTop
 
Jawaharlal Nehru Technological University Natural Language Processing Capston...
Jawaharlal Nehru Technological University Natural Language Processing Capston...Jawaharlal Nehru Technological University Natural Language Processing Capston...
Jawaharlal Nehru Technological University Natural Language Processing Capston...
write5
 
Jawaharlal Nehru Technological University Natural Language Processing Capston...
Jawaharlal Nehru Technological University Natural Language Processing Capston...Jawaharlal Nehru Technological University Natural Language Processing Capston...
Jawaharlal Nehru Technological University Natural Language Processing Capston...
write4
 
Big Data and Natural Language Processing
Big Data and Natural Language ProcessingBig Data and Natural Language Processing
Big Data and Natural Language Processing
Michel Bruley
 
Text Analytics Overview, 2011
Text Analytics Overview, 2011Text Analytics Overview, 2011
Text Analytics Overview, 2011
Seth Grimes
 
An Introduction to Text Analytics: 2013 Workshop presentation
An Introduction to Text Analytics: 2013 Workshop presentationAn Introduction to Text Analytics: 2013 Workshop presentation
An Introduction to Text Analytics: 2013 Workshop presentation
Seth Grimes
 

Similar to The Role of Natural Language Processing in Information Retrieval (20)

Natural Language Processing_in semantic web.pptx
Natural Language Processing_in semantic web.pptxNatural Language Processing_in semantic web.pptx
Natural Language Processing_in semantic web.pptx
 
Nltk
NltkNltk
Nltk
 
NLP Introduction.ppt machine learning presentation
NLP  Introduction.ppt machine learning presentationNLP  Introduction.ppt machine learning presentation
NLP Introduction.ppt machine learning presentation
 
AI_08_NLP.pptx
AI_08_NLP.pptxAI_08_NLP.pptx
AI_08_NLP.pptx
 
REPORT.doc
REPORT.docREPORT.doc
REPORT.doc
 
A DECADE OF USING HYBRID INFERENCE SYSTEMS IN NLP (2005 – 2015): A SURVEY
A DECADE OF USING HYBRID INFERENCE SYSTEMS IN NLP (2005 – 2015): A SURVEYA DECADE OF USING HYBRID INFERENCE SYSTEMS IN NLP (2005 – 2015): A SURVEY
A DECADE OF USING HYBRID INFERENCE SYSTEMS IN NLP (2005 – 2015): A SURVEY
 
Natural language procssing
Natural language procssing Natural language procssing
Natural language procssing
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
NLP
NLPNLP
NLP
 
Introduction to natural language processing, history and origin
Introduction to natural language processing, history and originIntroduction to natural language processing, history and origin
Introduction to natural language processing, history and origin
 
Semantic Search Component
Semantic Search ComponentSemantic Search Component
Semantic Search Component
 
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf EremyanDataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
DataFest 2017. Introduction to Natural Language Processing by Rudolf Eremyan
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
NLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.pptNLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.ppt
 
Jawaharlal Nehru Technological University Natural Language Processing Capston...
Jawaharlal Nehru Technological University Natural Language Processing Capston...Jawaharlal Nehru Technological University Natural Language Processing Capston...
Jawaharlal Nehru Technological University Natural Language Processing Capston...
 
Jawaharlal Nehru Technological University Natural Language Processing Capston...
Jawaharlal Nehru Technological University Natural Language Processing Capston...Jawaharlal Nehru Technological University Natural Language Processing Capston...
Jawaharlal Nehru Technological University Natural Language Processing Capston...
 
Big Data and Natural Language Processing
Big Data and Natural Language ProcessingBig Data and Natural Language Processing
Big Data and Natural Language Processing
 
Text Analytics Overview, 2011
Text Analytics Overview, 2011Text Analytics Overview, 2011
Text Analytics Overview, 2011
 
An Introduction to Text Analytics: 2013 Workshop presentation
An Introduction to Text Analytics: 2013 Workshop presentationAn Introduction to Text Analytics: 2013 Workshop presentation
An Introduction to Text Analytics: 2013 Workshop presentation
 
The impact of standardized terminologies and domain-ontologies in multilingua...
The impact of standardized terminologies and domain-ontologies in multilingua...The impact of standardized terminologies and domain-ontologies in multilingua...
The impact of standardized terminologies and domain-ontologies in multilingua...
 

More from Tony Russell-Rose

Visual approaches to patent retrieval
Visual approaches to patent retrievalVisual approaches to patent retrieval
Visual approaches to patent retrieval
Tony Russell-Rose
 
Towards Explainability in Professional Search
Towards Explainability in Professional SearchTowards Explainability in Professional Search
Towards Explainability in Professional Search
Tony Russell-Rose
 
Putting search theory to work on large datasets
Putting search theory to work on large datasetsPutting search theory to work on large datasets
Putting search theory to work on large datasets
Tony Russell-Rose
 
NLP techniques for automated query suggestions
NLP techniques for automated query suggestionsNLP techniques for automated query suggestions
NLP techniques for automated query suggestions
Tony Russell-Rose
 
Think outside the search box: a AI-based approach to search strategy formulation
Think outside the search box: a AI-based approach to search strategy formulationThink outside the search box: a AI-based approach to search strategy formulation
Think outside the search box: a AI-based approach to search strategy formulation
Tony Russell-Rose
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
Tony Russell-Rose
 
Designing the Search Experience: The Language of Discovery
Designing the Search Experience: The Language of DiscoveryDesigning the Search Experience: The Language of Discovery
Designing the Search Experience: The Language of Discovery
Tony Russell-Rose
 
User requirements for complex search strategies
User requirements for complex search strategiesUser requirements for complex search strategies
User requirements for complex search strategies
Tony Russell-Rose
 
Sentiment analysis in healthcare
Sentiment analysis in healthcareSentiment analysis in healthcare
Sentiment analysis in healthcare
Tony Russell-Rose
 
A Model of Consumer Search Behaviour
A Model of Consumer Search BehaviourA Model of Consumer Search Behaviour
A Model of Consumer Search Behaviour
Tony Russell-Rose
 
A Taxonomy of Site Search
A Taxonomy of Site SearchA Taxonomy of Site Search
A Taxonomy of Site Search
Tony Russell-Rose
 
Patterns of Personalization
Patterns of PersonalizationPatterns of Personalization
Patterns of Personalization
Tony Russell-Rose
 
A taxonomy of search strategies and their design implications
A taxonomy of search strategies and their design implicationsA taxonomy of search strategies and their design implications
A taxonomy of search strategies and their design implications
Tony Russell-Rose
 
From search to discovery: Information search strategies and design solutions
From search to discovery: Information search strategies and design solutionsFrom search to discovery: Information search strategies and design solutions
From search to discovery: Information search strategies and design solutions
Tony Russell-Rose
 
From Search to Discovery
From Search to DiscoveryFrom Search to Discovery
From Search to Discovery
Tony Russell-Rose
 
Text Analytics: Yesterday, Today and Tomorrow
Text Analytics: Yesterday, Today and TomorrowText Analytics: Yesterday, Today and Tomorrow
Text Analytics: Yesterday, Today and Tomorrow
Tony Russell-Rose
 
UI Design Patterns for Search & Information Discovery
UI Design Patterns for Search & Information DiscoveryUI Design Patterns for Search & Information Discovery
UI Design Patterns for Search & Information Discovery
Tony Russell-Rose
 

More from Tony Russell-Rose (17)

Visual approaches to patent retrieval
Visual approaches to patent retrievalVisual approaches to patent retrieval
Visual approaches to patent retrieval
 
Towards Explainability in Professional Search
Towards Explainability in Professional SearchTowards Explainability in Professional Search
Towards Explainability in Professional Search
 
Putting search theory to work on large datasets
Putting search theory to work on large datasetsPutting search theory to work on large datasets
Putting search theory to work on large datasets
 
NLP techniques for automated query suggestions
NLP techniques for automated query suggestionsNLP techniques for automated query suggestions
NLP techniques for automated query suggestions
 
Think outside the search box: a AI-based approach to search strategy formulation
Think outside the search box: a AI-based approach to search strategy formulationThink outside the search box: a AI-based approach to search strategy formulation
Think outside the search box: a AI-based approach to search strategy formulation
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Designing the Search Experience: The Language of Discovery
Designing the Search Experience: The Language of DiscoveryDesigning the Search Experience: The Language of Discovery
Designing the Search Experience: The Language of Discovery
 
User requirements for complex search strategies
User requirements for complex search strategiesUser requirements for complex search strategies
User requirements for complex search strategies
 
Sentiment analysis in healthcare
Sentiment analysis in healthcareSentiment analysis in healthcare
Sentiment analysis in healthcare
 
A Model of Consumer Search Behaviour
A Model of Consumer Search BehaviourA Model of Consumer Search Behaviour
A Model of Consumer Search Behaviour
 
A Taxonomy of Site Search
A Taxonomy of Site SearchA Taxonomy of Site Search
A Taxonomy of Site Search
 
Patterns of Personalization
Patterns of PersonalizationPatterns of Personalization
Patterns of Personalization
 
A taxonomy of search strategies and their design implications
A taxonomy of search strategies and their design implicationsA taxonomy of search strategies and their design implications
A taxonomy of search strategies and their design implications
 
From search to discovery: Information search strategies and design solutions
From search to discovery: Information search strategies and design solutionsFrom search to discovery: Information search strategies and design solutions
From search to discovery: Information search strategies and design solutions
 
From Search to Discovery
From Search to DiscoveryFrom Search to Discovery
From Search to Discovery
 
Text Analytics: Yesterday, Today and Tomorrow
Text Analytics: Yesterday, Today and TomorrowText Analytics: Yesterday, Today and Tomorrow
Text Analytics: Yesterday, Today and Tomorrow
 
UI Design Patterns for Search & Information Discovery
UI Design Patterns for Search & Information DiscoveryUI Design Patterns for Search & Information Discovery
UI Design Patterns for Search & Information Discovery
 

Recently uploaded

Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 

Recently uploaded (20)

Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 

The Role of Natural Language Processing in Information Retrieval

  • 1. The Role of Natural Language Processingin Information Retrieval Searching for Meaning in Text Tony Russell-Rose, PhD 21-Mar-2011
  • 2. Contents Introduction & Overview Background, Terminology NLP Fundamentals Paradigms & Perspectives NLP Tools & Techniques NLP Applications Text mining, Question Answering, etc. Conclusions & Further Reading 21-Mar-2011 2 NLP & IR
  • 3. Introduction & Overview The Role of Natural Language Processing in Information Retrieval 21-Mar-2011 3 NLP & IR
  • 4. Information Retrieval Models, techniques & frameworks for satisfying an information need Common paradigm: document retrieval 21-Mar-2011 4 NLP & IR
  • 5. Document Retrieval Methods include Boolean, vector space, probabilistic Rely on index terms “bag of words” Stop lists + stemming But text is “unstructured” Information may be “hidden” 21-Mar-2011 5 NLP & IR
  • 6. NLP – a solved problem? As humans we do it effortlessly ... don’t we? DRUNK GETS NINE YEARS IN VIOLIN CASE PROSTITUTES APPEAL TO POPE STOLEN PAINTING FOUND BY TREE RED TAPE HOLDS UP NEW BRIDGE DEER KILL 300,000 RESIDENTS CAN DROP OFF TREES INCLUDE CHILDREN WHEN BAKING COOKIES MINERS REFUSE TO WORK AFTER DEATH   21-Mar-2011 6 NLP & IR
  • 7. Problems with Text (1) Polysemy One word maps to many concepts e.g. bat Synonymy One concept maps to many words 21-Mar-2011 7 NLP & IR
  • 8. Problems with Text (2) Word order Venetian blind vs. Blind venetian 21-Mar-2011 8 NLP & IR
  • 9. Problems with Text (3) Language is generative Starbucks coffee is the best The place I like most when I need to feed my caffeine addiction is the company from Seattle with branches everywhere Many different ways to express a given idea Synonymy, paraphrase, metaphor, etc. 21-Mar-2011 9 NLP & IR
  • 10.
  • 11. The meaning of a sentence is completely determined by the meaning of its symbols and the syntax used to combine themLanguage is a form of communication All communication has a context: time and place of utterance, the writer, the reader, their backgroundknowledge, intentions, assumptions re the reader’s knowledge/intentions, etc. 21-Mar-2011 10 NLP & IR
  • 12. Problems with Text (5) Language is changing I want to buy a mobile Ill-formed input “accomodation office” Co-ordination, negation, etc. This is not a talk about neuro-linguistic programming Multi-linguality Claudia Schiffer is on the cover of Elle Also sarcasm, irony, slang, jargon, etc. That was a wicked lecture Yep – the coffee break was the best part 21-Mar-2011 11 NLP & IR
  • 13. Enter NLP / Text Analytics… Text Analytics: a set of linguistic, analytical and predictive techniques to extract structure and meaning from unstructured documents NLP: the academic term for Text Analytics analogous to “search” vs. “IR” Text Analytics ~= Natural Language Processing ~= Text Mining Text Mining -> Scientific / technical context, automated processing Text Analytics -> Business context, interactive apps 21-Mar-2011 12 NLP & IR
  • 14. NLP Fundamentals The Role of Natural Language Processing in Information Retrieval 21-Mar-2011 13 NLP & IR
  • 15. NLP Perspectives Computational Linguistics Use of computational techniques to study linguistic phenomena Cognitive science Study of human information processing (perception, language, reasoning, etc.) Computer science Theoretical foundations of computation and practical techniques for implementation Information science Analysis, classification, manipulation, retrieval and dissemination of information 21-Mar-2011 14 NLP & IR
  • 16. NLP Paradigms Symbolic approaches Rule-based, hand coded (by linguists) Knowledge-intensive Statistical approaches Shallow statistical models, trained on annotated corpora Data-intensive 21-Mar-2011 15 NLP & IR
  • 17. NLP Fundamentals (word level) Language is AMBIGUOUS To find structure, we must remove ambiguity! Lexical analysis (tokenisation) The cat sat on the mat I can’t tokenise this sentence Stop word removal No definitive list The Who, The The, Take That… To be or not to be 21-Mar-2011 16 NLP & IR
  • 18. NLP Fundamentals (word level) Stemming fishing, fished, fish, fisher -> fish Lemmatization Stemming + context Passing -> pass + ING Were -> be + PAST Delegate = de-leg-ate (?) Ratify = rat-ify (?) Morphology (prefixes, suffixes, etc.) Gebäudereinigungsfirmenangestellter -> Gebäude + Reinigung + Firma + Angestellter (building + cleaning + company + employee) 21-Mar-2011 17 NLP & IR
  • 19. NLP Fundamentals (sentence level) Syntax (part of speech tagging) book -> NOUN, VERB that -> DETERMINER flight -> NOUN Book that flight -> VB DT NN Ambiguity problem Time flies like an arrow -> ? Fruit flies like a banana -> ? Eats shoots and leaves -> ? 21-Mar-2011 18 NLP & IR
  • 20. NLP Fundamentals (sentence level) Parsing (grammar) I saw a venetian blind I saw a blind venetian I saw the man on the hill with a telescope Rugby is a game played by men with odd-shaped balls Sentence boundary detection Punctuation denotes the end of a sentence! “But not always!”, said Fred... 21-Mar-2011 19 NLP & IR
  • 21. NLP Tools & Techniques The Role of Natural Language Processing in Information Retrieval 21-Mar-2011 20 NLP & IR
  • 22. Early NLP Systems ELIZA Wiezenbaum 1966 Simple pattern matching PARRY Colby et al 1971 Pattern matching & planning SHRDLU Winograd 1972 Natural language understanding Comprehensive grammar of English 21-Mar-2011 21 NLP & IR
  • 23. Word Prediction Assistive technologies Aurora Systems, TextHelp Google, Bing, Yahoo query suggestions 21-Mar-2011 22 NLP & IR
  • 24. Spelling Correction Auto-correct Did You Mean 21-Mar-2011 23 NLP & IR
  • 25. Text Categorization News agencies: classify incoming news stories Search engines: classify queries, e.g. search Google for ‘LOG313’ Identifying spam emails, e.g. http://www.paulgraham.com/spam.html Routing email or documents to appropriate people 21-Mar-2011 24 NLP & IR
  • 26. Terminology Extraction Differentiate between useful index terms and ‘noise’ Help lexicographers identify new terminology: the C4 carbons of the nicotinamide rings X-ray therapy vs. X-ray therapies PAHO vs. Pan-American Health Organization Term extraction systems process scientific papers to identify terminology, possibly comparing it with a known list 21-Mar-2011 25 NLP & IR
  • 27. Speech Recognition Spoken Dialogue Systems Directory enquiries (AT&T) Railways (Germany, Switzerland, Italy) Airlines (USA) iPhone Voice Search IBM Watson 21-Mar-2011 26 NLP & IR
  • 28. Named Entity Recognition Identification of key concepts, e.g. people, places, organisations, etc. Also postcodes, temporal/numerical expressions, etc. “Mexico has been trying to stage a recovery since the beginning of this year and it's always been getting ahead of itself in terms of fundamentals,” said Matthew Hickman of Lehman Brothers in New York.” 21-Mar-2011 27 NLP & IR
  • 29. Named Entity Recognition: Applications Increase precision of IR New companies in York vs. Companies in New York Support navigation SMART tags, Apture, etc. Improve machine translation: “I live in a new house in new york” Speech synthesis, auto-summarisation, etc. 21-Mar-2011 28 NLP & IR
  • 30. Named Entity Recognition Systems usually developed for specific domain Are typically language and domain-specific Accuracy > 90% for newswire text Best system at MUC-7 scored 93.39% f-measure human annotators scored 97.60% and 96.95%. Lower for certain domains, e.g. life sciences (species, substances, proteins, genes, etc.) extended naming conventions, extensive use of conjunction and disjunction, high occurrence of neologisms and abbreviations, etc. 21-Mar-2011 29 NLP & IR
  • 31. NER (Newssift Example) 21-Mar-2011 30 NLP & IR
  • 32. Machine Translation (before) 21-Mar-2011 31 NLP & IR
  • 33. Machine Translation (after) 21-Mar-2011 32 NLP & IR
  • 34. Summarisation Section 3 discusses two information access applications (text mining and question answering) closely associated with NLP. NLP Techniques Named entity recognition Information Extraction Current document retrieval technologies could not identify information as specific as this within text. Word Sense Disambiguation Text Mining Question Answering “Wh” questions List questions Examples include those mentioned here: text mining, question answering and cross-language information retrieval. Ponte and Croft (1998) “A Language Modeling Approach to Information Retrieval” SIGIR.   Sanderson, M. (1994) “Word sense disambiguation and information retrieval” Proceedings of the 17th ACM SIGIR Conference   Sanderson, M. (2000) “Retrieving with Good Sense” Information Retrieval 2(1):49-69.   Question Answering (Section 3.2).   Summarization single-document vs. multi-document MS Word Bing 21-Mar-2011 33 NLP & IR
  • 35.
  • 36. Based on pre-defined structures, e.g.movements of company executives victims of terrorist attacks corporate mergers / acquisitions interactions between genes and proteins Example (from MUC-6): Now, Mr. James is preparing to sail into the sunset, and Mr. Dooner is poised to rev up the engines to guide Interpublic Group's McCann-Erickson into the 21st century. Yesterday, McCann made official what had been widely anticipated: Mr. James, 57 years old, is stepping down as chief executive officer on July 1 and will retire as chairman at the end of the year. He will be succeeded by Mr. Dooner, 45. 21-Mar-2011 34 NLP & IR
  • 37. Information Extraction <SUCCESSION_EVENT-2> := SUCCESSION_ORG: <ORGANIZATION-1> POST: "chairman" IN_AND_OUT: <IN_AND_OUT-4> VACANCY_REASON: DEPART_WORKFORCE   <IN_AND_OUT-4> := IO_PERSON: <PERSON-1> NEW_STATUS: IN ON_THE_JOB: NO OTHER_ORG: <ORGANIZATION-1> REL_OTHER_ORG: SAME_ORG   <ORGANIZATION-1> := ORG_NAME: "McCann-Erickson" ORG_ALIAS: "McCann" ORG_TYPE: COMPANY   <PERSON-1> := PER_NAME: "John J. Dooner Jr." PER_ALIAS: "John Dooner" "Dooner" 21-Mar-2011 35 NLP & IR
  • 38. Information Extraction Can now use as metadata (for retrieval) Or store in DB and query against it, e.g. “show me all the events where a finance officer left a company to take up the position of CEO in another company” 21-Mar-2011 36 NLP & IR
  • 39. IE Example: Yahoo Correlator 21-Mar-2011 37 NLP & IR
  • 40. IE Example: Google Wonder Wheel 21-Mar-2011 38 NLP & IR
  • 41. IE Example: Yahoo Search Assist 21-Mar-2011 39 NLP & IR
  • 42. Information Extraction But IE is difficult when concepts are spread across multiple sentences: Pace American Group Inc. said it notified two top executives it intends to dismiss them because an internal investigation found evidence of “self-dealing” and “undisclosed financial relationships.” The executives are Don H. Pace, president and CEO; and Greg S. Kaplan, SVP and CFO. Event & org in first S, but names in second S Need to identify connection between “two top executives” and “the executives”. 21-Mar-2011 40 NLP & IR
  • 43. Information Extraction Anaphora resolution relies on context (semantic / pragmatic knowledge): We gave the bananas to the monkeys because they were hungry. We gave the bananas to the monkeys because they were ripe. We gave the bananas to the monkeys because they were here. 21-Mar-2011 41 NLP & IR
  • 44. WordNet Semantic database for English See also EuroWordNet Based on synsets car = automobile | railway car | elevator car | cable car 1stsynset = car, motorcar, machine, auto, automobile Connected via relationships E.g. hypernymy (a kind of) separate hierarchies for nouns, verbs, adjectives and adverbs 21-Mar-2011 42 NLP & IR
  • 45. Word Sense Disambiguation Resolve ambiguities due to polysemy Often characterised by syntax, e.g. Magnesium is a light metal (ADJ) The light in the kitchen is quite bright (NOUN) Can use a Part of Speech Tagger to resolve broad ambiguities PoS tags = NN, NNP, NNS, NNPS, etc. 21-Mar-2011 43 NLP & IR
  • 46.
  • 47. 22. to adjust (a timer, alarm of a clock, etc.) so as to sound when desired: He set the alarm for seven o'clock.
  • 48. 23. to fix or mount (a gem or the like) in a frame or setting.
  • 49. 24. to ornament or stud with gems or the like: a bracelet set with pearls.
  • 50. 25. to cause to sit; seat: to set a child in a highchair.
  • 51. 26. to put (a hen) on eggs to hatch them.
  • 52. 27. to place (eggs) under a hen or in an incubator for hatching.
  • 53. 28. to place or plant firmly: to set a flagpole in concrete.
  • 54. 29. to put into a fixed, rigid, or settled state, as the face, muscles, etc.
  • 55. 30. to fix at a given point or calibration: to set the dial on an oven; to set a micrometer.
  • 56. 31. to tighten (often followed by up ): to set nuts well up.
  • 57. 32. to cause to take a particular direction: to set one's course to the south.
  • 58. 33. Surgery . to put (a broken or dislocated bone) back in position.
  • 59. 34. (of a hunting dog) to indicate the position of (game) by standing stiffly and pointing with the muzzle.
  • 60. 35. Music . a. to fit, as words to music.
  • 61. b. to arrange for musical performance.
  • 62. c. to arrange (music) for certain voices or instruments.
  • 63. 36. Theater . a. to arrange the scenery, properties, lights, etc., on (a stage) for an act or scene.
  • 64. b. to prepare (a scene) for dramatic performance.
  • 65. 37. Nautical . to spread and secure (a sail) so as to catch the wind.
  • 66. 38. Printing . a. to arrange (type) in the order required for printing.
  • 67. b. to put together types corresponding to (copy); compose in type: to set an article.
  • 68. 39. Baking . to put aside (a substance to which yeast has been added) in order that it may rise.
  • 69. 40. to change into curd: to set milk with rennet.
  • 70. 41. to cause (glue, mortar, or the like) to become fixed or hard.
  • 71. 42. to urge, goad, or encourage to attack: to set the hounds on a trespasser.
  • 72. 43. Bridge . to cause (the opposing partnership or their contract) to fall short: We set them two tricks at four spades. Only perfect defense could set four spades.
  • 73. 44. to affix or apply, as by stamping: The king set his seal to the decree.
  • 74. 45. to fix or engage (a fishhook) firmly into the jaws of a fish by pulling hard on the line once the fish has taken the bait.
  • 75. 46. to sharpen or put a keen edge on (a blade, knife, razor, etc.) by honing or grinding.
  • 76. 47. to fix the length, width, and shape of (yarn, fabric, etc.).
  • 77. 48. Carpentry . to sink (a nail head) with a nail set.
  • 78. 49. to bend or form to the proper shape, as a saw tooth or a spring.
  • 79. 50. to bend the teeth of (a saw) outward from the blade alternately on both sides in order to make a cut wider than the blade itself.
  • 80. –verb (used without object) 51. to pass below the horizon; sink: The sun sets early in winter.
  • 82. 53. to assume a fixed or rigid state, as the countenance or the muscles.
  • 83. 54. (of the hair) to be placed temporarily on rollers, in clips, or the like, in order to assume a particular style: Long hair sets more easily than short hair.
  • 84. 55. to become firm, solid, or permanent, as mortar, glue, cement, or a dye, due to drying or physical or chemical change.
  • 85. 56. to sit on eggs to hatch them, as a hen.
  • 86. 57. to hang or fit, as clothes.
  • 87. 58. to begin to move; start (usually followed by forth, out, off,  etc.).
  • 88. 59. (of a flower's ovary) to develop into a fruit.
  • 89. 60. (of a hunting dog) to indicate the position of game. 21-Mar-2011 44 NLP & IR
  • 90.
  • 91. 77. an apparatus for receiving radio or television programs; receiver.
  • 92. 78. Philately . a group of stamps that form a complete series.
  • 93. 79. Tennis . a unit of a match, consisting of a group of not fewer than sixgames with a margin of at least two games between the winner and loser: He won the match in straight sets of 6–3, 6–4, 6–4.
  • 94. 80. a construction representing a place or scene in which the action takes place in a stage, motion-picture, or television production.
  • 95. 81. Machinery . a. the bending out of the points of alternate teeth of a saw in opposite directions.
  • 96. b. a permanent deformation or displacement of an object or part.
  • 97. c. a tool for giving a certain form to something, as a saw tooth.
  • 98. 82. a chisel having a wide blade for dividing bricks.
  • 99. 83. Horticulture . a young plant, or a slip, tuber, or the like, suitable for planting.
  • 100. 84. Dance . a. the number of couples required to execute a quadrille or the like.
  • 101. b. a series of movements or figures that make up a quadrille or the like.
  • 102. 85. Music . a. a group of pieces played by a band, as in a night club, and followed by an intermission.
  • 103. b. the period during which these pieces are played.
  • 104. 86. Bridge . a failure to take the number of tricks specified by one's contract: Our being vulnerable made the set even more costly.
  • 105. 87. Nautical . a. the direction of a wind, current, etc.
  • 106. b. the form or arrangement of the sails, spars, etc., of a vessel.
  • 107. c. suit ( def. 12 ) .
  • 108. 88. Psychology . a temporary state of an organism characterized by a readiness to respond to certain stimuli in a specific way.
  • 109. 89. Mining . a timber frame bracing or supporting the walls or roof of a shaft or stope.
  • 110. 90. Carpentry . nail set.
  • 111. 90. Carpentry . nail set.
  • 112. 91. Mathematics . a collection of objects or elements classed together.
  • 113. 92. Printing . the width of a body of type.
  • 114. 93. sett ( def. 3 ) .
  • 115. –adjective 94. fixed or prescribed beforehand: a set time; set rules.
  • 116. 95. specified; fixed: The hall holds a set number of people.
  • 117. 96. deliberately composed; customary: set phrases.
  • 118. 97. fixed; rigid: a set smile.
  • 119. 98. resolved or determined; habitually or stubbornly fixed: to be set in one's opinions.
  • 120. 99. completely prepared; ready: Is everyone set?
  • 121. –interjection 100. (in calling the start of a race): Ready! Set! Go! Also, get set! 21-Mar-2011 45 NLP & IR
  • 122. Word Sense Disambiguation Surely PoS tagging would help IR performance? Could index docs using words AND PoS tags Very sensitive to tagger performance Does WSD improve IR performance? Conflicting evidence …“Perfect” WSD engine may provide only marginal gain, e.g. >76% accuracy needed for homonymy >55% for polysemy 21-Mar-2011 46 NLP & IR
  • 123. NLP Applications The Role of Natural Language Processing in Information Retrieval 21-Mar-2011 47 NLP & IR
  • 124. NLP Applications Web & enterprise search Text classification Text summarisation Machine translation Human-computer interfaces (spelling correction etc.) Speech recognition & synthesis Natural language generation (e.g. in games) Text mining Question answering 21-Mar-2011 48 NLP & IR
  • 125. Finally becoming mainstream? ‘80% of corporate information is unstructured’ Entire value chain for some organisation (media / publishing etc.) Retail / eCommerce: Product reviews User generated content: blogs, forums, wikis Voice of the Customer: social media + sentiment analysis 161 billion gigabytes of digital information in 2006 approximately 988 exabytes by 2010 Audio / video still needs summaries & tags etc. 21-Mar-2011 49 NLP & IR
  • 126. Market Outlook Combine component technologies e.g. PoS tagging + NER, etc. Healthy growth: massive expansion in social media Lightweight NLP for buzz analysis, brand monitoring, etc. Dominant markets: Customer Experience (VoC), media/publishing, financial services & insurance, intelligence, life sciences, e-discovery Solutions still not standardized Need for self-service tuning & configuration Partner ecosystem developing Marketing services providers, platform vendors, CRM + call centre vendors, system integrators 21-Mar-2011 50 NLP & IR
  • 127. Text Mining Analogy with Data Mining Discover or infer new knowledge from unstructured text resources A <-> B and B <-> C Infer A <-> C? e.g. link between migraine headaches and magnesium deficiency Applications in life sciences, media/publishing, counter terrorism and competitive intelligence 21-Mar-2011 51 NLP & IR
  • 128. Text Mining Levels of text mining, e.g. for bioscience: Tagging entities: e.g. genes, proteins, diseases and chemical compounds Information extraction: identify meaningful structural relations within the text (e.g. interactions between proteins) Knowledge discovery: combine with other knowledge sources (such as external ontologies) to infer or discover new knowledge 21-Mar-2011 52 NLP & IR
  • 129. Question Answering Going beyond the document retrieval paradigm provide specific answers to specific questions Introduced as a TREC track in 1999 21-Mar-2011 53 NLP & IR
  • 130. Question Answering Yes/no questions “Is George W. Bush the current president of the USA?” “Is the Sea of Tranquility deep?” “Wh” questions “Who was the British Prime Minister before Margaret Thatcher?” “When was the Battle of Hastings?” List questions “Which football teams have won the Champions League this decade?” “Which roads lead to Rome?” Instruction-based questions “How do I cook lasagne?”, “What is the best way to build a bridge?” Explanation questions “Why did World War I start?” “How does a computer process floating point numbers?” Commands “Tell me the height of the Eiffel Tower.” “Name all the Kings of England” 21-Mar-2011 54 NLP & IR
  • 131. Question Answering Common use cases handled by major search engines Google, Yahoo!, Bing, Baidu, ... Specialist QA systems Wolfram Alpha: http://www.wolframalpha.com True Knowledge: http://www.trueknowledge.com Powerset: http://www.powerset.com/ IBM Watson 21-Mar-2011 55 NLP & IR
  • 132. Question Answering Basic approach: question analysis Predict type of answer expected Create query document retrieval answer extraction Based on output of (1) Varies from regex to deep linguistic analysis 21-Mar-2011 56 NLP & IR
  • 133. Evaluation How do we measure NLP accuracy? Compare against human annotation However: Humans often disagree Particularly for complex tasks Comparison can be measured in many ways “Bill Gates is chairman of Microsoft” -> “Gates” = person? 21-Mar-2011 57 NLP & IR
  • 134. Conclusions The Role of Natural Language Processing in Information Retrieval 21-Mar-2011 58 NLP & IR
  • 135. Conclusions NLP’s contribution to IR not always apparent Overemphasis on precision & recall? NLP provides benefits that go beyond quantitative metrics to support broader tasks & richer experiences Insight, analysis, visualisation, etc. NLP required for text mining, question answering & cross-language IR Bag of words model is insufficient Web becoming increasingly multi-lingual 21-Mar-2011 59 NLP & IR
  • 136. State of the Art Commodity technologies: Tokenisation, normalization, regular expression search Mainstream, but room for improvement: Lemmatization, spelling correction, speech synthesis Mainstream but needs improvement: PoS tagging, term extraction, summarization, speech recognition, text categorization, spoken dialogue systems Almost there: Information extraction, NL generation, simple speech translation Don’t hold your breath: Full machine translation, advanced dialogue systems 21-Mar-2011 60 NLP & IR
  • 137. Platforms & Toolkits Many commercialvendors Open source/access: GATE (Sheffield University) RapidMiner Stanford NLP OpenNLP OpenCalais LingPipe nltk 21-Mar-2011 61 NLP & IR
  • 138. Further Reading A. Goker & J. Davies, Information Retrieval: Searching in the 21st Century. Wiley-Blackwell, 2009 D. Jurafsky and J. Martin, Speech and Language Processing. Prentice-Hall, 2nd edition, 2008. Manning, C. and Schutze, H. Foundations of Statistical Language Processing. MIT Press, 1999. 21-Mar-2011 62 NLP & IR
  • 139. Thank you Tony Russell-Rose, PhD Vice-chair, BCS IRSG Chair, IEHF HCI Group Email: tgr@uxlabs.co.uk Blog: http://isquared.wordpress.com LinkedIn: http://www.linkedin.com/tonyrussellrose/ Twitter: @tonygrr

Editor's Notes

  1. Exercise: experiment with the 3 major search engines (Google, Bing, Yahoo) to see how they handle stop words. Try queries that are dominated by stop words, with and without quotes. How do they differ?
  2. Exercise: experiment with the 3 major search engines (Google, Bing, Yahoo) to see how they handle stemming. Try queries that are dominated by inflected forms, with and without quotes. How do they differ?
  3. Do as class exercise
  4. Demo: ELIZA http://nlp-addiction.com/eliza/
  5. Demo Google suggest
  6. Demo Auto-correct &amp; DYM
  7. Demo: try Google Translate (or Babelfish) with sentences containing ambiguous terms or NEs, e.g. “I live in a new house in new york”. Use the speech synthesizerhttp://translate.google.com/#en|de|I%20live%20in%20a%20new%20house%20in%20new%20york
  8. Demo Yahoo search assist + “explore concepts”
  9. Paired exercise: what is the most polysemous English word you can think of? How many senses do you think it has in an average dictionary? Think of some polysemous words (examples include “ball”, “bank”, “port” but there are many others). Use these as queries in a search engine and comment on what you notice about the results.
  10. Paired Exercise: try each of above queries in Google, Yahoo, Bing. Now try to answer the same questions with a normal web search (by entering a set of query terms instead of a natural language question). Are the results of this search any better or worse?
  11. Paired exercise: try previous queries on these sites