3. What is NLP?
• Natural Language Processing (NLP)
– Computers use (analyze, understand,
generate) natural language
– A somewhat applied field
• Computational Linguistics (CL)
– Computational aspects of the human
language faculty
– More theoretical
4. Why Study NLP?
• Human language is interesting & challenging
– NLP offers insights into language
• Language is the medium of the web
• Interdisciplinary: Ling, CS, psych, math
• Help in communication
– With computers (ASR, TTS)
– With other humans (MT)
• Ambitious yet practical
5. Goals of NLP
• Scientific Goal
– Identify the computational machinery
needed for an agent to exhibit various
forms of linguistic behavior
• Engineering Goal
– Design, implement, and test systems
that process natural languages for
practical applications
6. Applications
• speech processing: get flight information or book
a hotel over the phone
• information extraction: discover names of people
and events they participate in, from a document
• machine translation: translate a document from
one human language into another
• question answering: find answers to natural
language questions in a text collection or
database
• summarization: generate a short biography of
Noam Chomsky from one or more news articles
7. General Themes
• Ambiguity of Language
• Language as a formal system
• Computation with human language
• Rule-based vs. Statistical Methods
• The need for efficiency
9. Text to Speech – artificial voice
• Text Input
• Break text into phonemes
– Match phonemes to voice elements
– Concatenate voice elements
– Manipulate pitch and spacing
• Output results
• Research question: How can a human voice be
used to produce an artificial voice?
• Model Talker - opportunities for active, hands-on
research (http://www.modeltalker.com)
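The steps on this slide can be sketched as a toy concatenative pipeline. This is only an illustration under stated assumptions: LEXICON and UNITS are made-up data (not a real voice inventory and not how Model Talker works), and the "voice elements" are short lists of numbers standing in for recorded audio. Pitch manipulation is left out; only spacing between words is shown.

```python
# Toy concatenative pipeline: text -> phonemes -> voice elements -> one signal.
# LEXICON and UNITS are hypothetical illustration data, not a real voice.
LEXICON = {"hi": ["HH", "AY"], "no": ["N", "OW"]}
UNITS = {
    "HH": [0.1, 0.2],
    "AY": [0.5, 0.4, 0.3],
    "N": [0.2, 0.2],
    "OW": [0.6, 0.5],
}
PAUSE = [0.0]  # one silent sample inserted between words ("spacing")


def text_to_phonemes(text):
    """Break text into one phoneme list per word using the toy lexicon."""
    return [LEXICON[word] for word in text.lower().split()]


def synthesize(text):
    """Match phonemes to voice elements and concatenate them."""
    samples = []
    for i, word_phonemes in enumerate(text_to_phonemes(text)):
        if i > 0:
            samples.extend(PAUSE)
        for phoneme in word_phonemes:
            samples.extend(UNITS[phoneme])
    return samples
```

For example, synthesize("hi no") joins the units for HH, AY, a pause, N, and OW into a single list of samples.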
10. Speech Recognition
• Spoken Input
• Identify words and phonemes in speech
– Generate text for recognized word parts
– Concatenate text elements
– Perform spelling, grammar and context checking
• Output results
• Research question: How can speech recognition
assist a deaf student taking notes in class?
• VUST – Villanova University Speech Transcriber
(http://www.csc.villanova.edu/~tway/publications/wayAT08.pdf)
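The last two processing steps (concatenating text elements, then spell checking) can be sketched as follows. This is a minimal illustration, not the VUST implementation: VOCAB is a hypothetical word list, and the acoustic step that produces the text fragments is assumed to have already happened. Spelling correction here uses closest-match lookup from Python's standard difflib module.

```python
import difflib

# Hypothetical vocabulary for a classroom setting (assumption for this sketch).
VOCAB = ["lecture", "notes", "class", "student"]


def correct(word, vocab=VOCAB):
    """Replace a word with its closest vocabulary entry, if one is close enough."""
    matches = difflib.get_close_matches(word.lower(), vocab, n=1, cutoff=0.6)
    return matches[0] if matches else word


def assemble(parts_per_word):
    """Concatenate recognized text fragments into words, then spell-correct."""
    words = ["".join(parts) for parts in parts_per_word]
    return " ".join(correct(w) for w in words)
```

A real system would also use grammar and context, which this sketch does not attempt.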
11. Textual Analysis - Readability
• Text Input
• Analyze text & estimate “readability”
– Grade level of writing
– Consistency of writing
– Appropriateness for a given education level
• Output results
• Research question: How can a computer
analyze text and measure readability?
• Opportunities for hands-on research
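One common way to estimate the grade level of writing is the Flesch-Kincaid grade formula, which combines average sentence length with average syllables per word. The sketch below is one possible approach, not necessarily the one used in this research; the syllable counter is a rough vowel-group heuristic chosen for brevity.

```python
import re


def count_syllables(word):
    """Rough heuristic: each run of vowels counts as one syllable."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text):
    """Estimate the U.S. school grade level needed to read the text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

Short sentences made of one-syllable words score low (even below zero), while long sentences with many multi-syllable words score high.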
12. Plagiarism Detection
• Text Input
• Analyze text & locate “candidates”
– Find one or more passages that might be plagiarized
– Algorithm tries to do what a teacher does
– Search on Internet for candidate matches
• Output results
• Research question: What algorithms work like
humans when finding plagiarism?
• Experimental CS research
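A simple way to locate candidate passages is to compare overlapping word sequences (n-grams, sometimes called shingles) between a submitted text and a possible source. The sketch below is an assumption about one workable technique, not the algorithm from this research; the function names and the default n=3 are illustrative choices.

```python
import re


def shingles(text, n=3):
    """Return the set of word n-grams in the text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(candidate, source, n=3):
    """Fraction of the candidate's n-grams that also occur in the source."""
    cand = shingles(candidate, n)
    if not cand:
        return 0.0
    return len(cand & shingles(source, n)) / len(cand)
```

A passage with a high overlap score against some source is a candidate for closer human review; a low score does not rule out paraphrased copying.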
13. Intelligent Agents
• Example: ELIZA
• AIML: Artificial Intelligence Markup Language
• Human types something
• Computer parses, “understands”, and generates
response
• Response is viewed by human
• Research question: How can computers
“understand” and “generate” human writing?
• Also good area for experimentation
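The parse-and-respond loop above can be sketched in the spirit of ELIZA with a few pattern rules. The rules and replies below are hypothetical examples, not the original ELIZA script, and nothing here "understands" the input: the program only matches patterns and fills in templates.

```python
import re

# Hypothetical rules: (pattern to look for, response template).
RULES = [
    (re.compile(r"\bi need (.+)", re.I), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.I), "How long have you been {0}?"),
    (re.compile(r"\bbecause\b", re.I), "Is that the real reason?"),
]
DEFAULT = "Please tell me more."


def respond(utterance):
    """Return the reply for the first rule that matches, else a default reply."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    return DEFAULT
```

For example, respond("I need a vacation.") returns "Why do you need a vacation?".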
15. What is Image Processing?
• Digital Image Processing
– Analog transmission in 1920
– Early improvements in 1920s
– Required digital computer (1948)
– Rapid advancement since
16. Historical Background
The newspaper industry used the
Bartlane cable picture
transmission system to send
pictures by submarine cable
between London and New
York in the 1920s
The number of distinct gray
levels coded by the Bartlane
system was improved from 5
to 15 by the end of the 1920s
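To see what the number of gray levels means in practice, the sketch below reduces 8-bit gray values (0 to 255) to a chosen number of evenly spaced levels. It is a modern illustration only, not a model of how the Bartlane system encoded pictures; the function name and the 0 to 255 range are assumptions.

```python
def quantize(pixels, levels):
    """Map gray values in 0..255 to the nearest of `levels` evenly spaced values."""
    if levels < 2:
        raise ValueError("levels must be at least 2")
    step = 255 / (levels - 1)
    return [round(round(p / step) * step) for p in pixels]
```

With 5 levels every pixel becomes one of only five grays, so smooth shading turns into visible bands; with 15 levels the bands are finer.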
17. Digital Image Processing
• The images in previous slides are digital
(now), but they are NOT the result of DIP
• Digital Image Processing is
– Processing digital images by a digital
computer
• DIP requires a digital computer and other
supporting technologies (e.g., data storage,
display and transmission)
18. Cool Applications
The first picture of the moon
by a US spacecraft, Ranger 7,
on July 31, 1964 at
9:09 AM EDT
• Digitization
• Compression
• Error Recovery
Sir Godfrey N. Hounsfield and Prof.
Allan M. Cormack shared the 1979
Nobel Prize in Medicine for the
invention of CT
• Enhancement
• Edges, Contrast,
Brightness, etc.
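Two of the enhancement operations named above, contrast and edges, can be sketched on a small grayscale image stored as a list of rows. These are simplified illustrations with assumed function names, not the methods used in CT imaging: contrast is improved by linear stretching, and edges are found by differencing neighboring pixels.

```python
def stretch_contrast(image, lo=0, hi=255):
    """Linearly rescale gray values so the darkest becomes lo and the brightest hi."""
    flat = [p for row in image for p in row]
    pmin, pmax = min(flat), max(flat)
    if pmin == pmax:
        return [row[:] for row in image]  # flat image: nothing to stretch
    scale = (hi - lo) / (pmax - pmin)
    return [[round(lo + (p - pmin) * scale) for p in row] for row in image]


def horizontal_edges(image):
    """Absolute difference between each pixel and its right-hand neighbor."""
    return [[abs(row[x + 1] - row[x]) for x in range(len(row) - 1)]
            for row in image]
```

Large values in the output of horizontal_edges mark places where brightness changes sharply from one column to the next.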
19. Past 20 Years
• Acquisition
– Digital cameras, scanners
– MRI and Ultrasound imaging
– Infrared and microwave imaging
• Transmission
– Internet, wireless communication
• Display
– Printers, LCD monitor, digital TV
30. General Themes
• Human vision is limited
• Digital images contain more information
than humans perceive
• Computers can use algorithms to extract
more information from digital images
• Computers can acquire, manipulate,
compress, transmit and modify images
31. Topic Ideas
1. Biometrics – identifying faces & retinas
2. Target Acquisition – see a tank from space
3. Computer Vision – detect microscopic flaws in
manufacturing
4. Assistive Technology – convert visual images
into tactile or textual form
5. Entertainment – remove red eye, morph faces,
digital filmmaking, movie magic
6. Image Description – use 3D dictionary to
describe contents of 2D image