Introduction to NLP
What is Natural Language Processing?
Dan Jurafsky

Question Answering: IBM’s Watson
• Won Jeopardy on February 16, 2011!

WILLIAM WILKINSON’S “AN ACCOUNT OF THE PRINCIPALITIES OF WALLACHIA AND MOLDOVIA” INSPIRED THIS AUTHOR’S MOST FAMOUS NOVEL
Bram Stoker

Information Extraction

Subject: curriculum meeting
Date: January 15, 2012
To: Dan Jurafsky

Hi Dan, we’ve now scheduled the curriculum meeting.
It will be in Gates 159 tomorrow from 10:00-11:30.
-Chris

Create new Calendar entry
Event: Curriculum mtg
Date: Jan-16-2012
Start: 10:00am
End: 11:30am
Where: Gates 159
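
The email-to-calendar example above can be approximated with a few hand-written patterns. The following Python sketch is illustrative only: the function name `extract_event` and all of its regular expressions are hypothetical and cover just this one message's phrasing ("Date: <Month> <day>, <year>", "in <Building> <room>", "tomorrow from HH:MM-HH:MM"). It resolves the relative word "tomorrow" against the email's Date header to arrive at January 16.

```python
import re
from datetime import date, timedelta

EMAIL = """Subject: curriculum meeting
Date: January 15, 2012
To: Dan Jurafsky

Hi Dan, we've now scheduled the curriculum meeting.
It will be in Gates 159 tomorrow from 10:00-11:30.
-Chris"""

MONTHS = {name: i for i, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}


def extract_event(email: str) -> dict:
    """Pull one calendar entry out of an email with hand-written patterns.

    Hypothetical patterns for illustration: they only cover this message's wording.
    """
    subject = re.search(r"^Subject:\s*(.+)$", email, re.MULTILINE).group(1)
    m = re.search(r"^Date:\s*(\w+) (\d{1,2}), (\d{4})$", email, re.MULTILINE)
    sent = date(int(m.group(3)), MONTHS[m.group(1)], int(m.group(2)))
    # Resolve the relative expression "tomorrow" against the email's own date.
    day = sent + timedelta(days=1) if "tomorrow" in email else sent
    start, end = re.search(r"from (\d{1,2}:\d{2})-(\d{1,2}:\d{2})", email).groups()
    # A capitalized word followed by a number after "in", e.g. "in Gates 159".
    where = re.search(r"\bin ([A-Z]\w+ \d+)", email).group(1)
    return {"Event": subject, "Date": day.isoformat(),
            "Start": start, "End": end, "Where": where}


print(extract_event(EMAIL))
```

Patterns like these break as soon as the sender phrases things differently, which is one reason practical extraction systems learn from data rather than relying only on hand-written rules.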

Information Extraction & Sentiment Analysis

Attributes: zoom, affordability, size and weight, flash, ease of use

Size and weight
• nice and compact to carry! ✓
• since the camera is small and light, I won't need to carry around those heavy, bulky professional cameras either! ✓
• the camera feels flimsy, is plastic and very light in weight you have to be very delicate in the handling of this camera ✗
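
To see why attribute-level sentiment is only "making good progress", consider the simplest possible approach: counting cue words. The Python sketch below assumes a tiny hypothetical lexicon for the size-and-weight attribute (the word lists and the functions `score`/`label` are illustrative, not taken from any real system) and applies it to the three reviews above.

```python
import re

# Hypothetical hand-picked cue words for the "size and weight" attribute.
POSITIVE = {"nice", "compact", "small", "light"}
NEGATIVE = {"flimsy", "heavy", "bulky", "delicate"}

REVIEWS = [
    "nice and compact to carry!",
    "since the camera is small and light, I won't need to carry around "
    "those heavy, bulky professional cameras either!",
    "the camera feels flimsy, is plastic and very light in weight you have "
    "to be very delicate in the handling of this camera",
]


def score(sentence: str) -> int:
    """Bag-of-words score: positive cue words minus negative cue words."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def label(sentence: str) -> str:
    """Map the score to the slide's marks; a tie is left undecided."""
    s = score(sentence)
    return "✓" if s > 0 else "✗" if s < 0 else "?"


for review in REVIEWS:
    print(label(review), score(review), review[:40])
```

The first and third reviews come out as expected, but the second scores 0: "heavy, bulky" describe the cameras the reviewer no longer has to carry, and a word count cannot tell that apart from a complaint.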
  
Machine Translation
• Fully automatic
• Helping human translators

Enter Source Text: 这 不过 是 一 个 时间 的 问题 .
Translation from Stanford’s Phrasal: This is only a matter of time.

Language Technology

Mostly solved
• Spam detection: Let’s go to Agra! ✓ / Buy V1AGRA … ✗
• Part-of-speech (POS) tagging: Colorless/ADJ green/ADJ ideas/NOUN sleep/VERB furiously/ADV
• Named entity recognition (NER): Einstein/PERSON met with UN/ORG officials in Princeton/LOC

Making good progress
• Sentiment analysis: Best roast chicken in San Francisco! / The waiter ignored us for 20 minutes.
• Coreference resolution: Carter told Mubarak he shouldn’t run again.
• Word sense disambiguation (WSD): I need new batteries for my mouse.
• Parsing: I can see Alcatraz from the window!
• Machine translation (MT): 第13届上海国际电影节开幕… → The 13th Shanghai International Film Festival…
• Information extraction (IE): You’re invited to our dinner party, Friday May 27 at 8:30 → Party, May 27, add

Still really hard
• Question answering (QA): Q. How effective is ibuprofen in reducing fever in patients with acute febrile illness?
• Paraphrase: XYZ acquired ABC yesterday / ABC has been taken over by XYZ
• Summarization: The Dow Jones is up / The S&P500 jumped / Housing prices rose → Economy is good
• Dialog: Where is Citizen Kane playing in SF? → Castro Theatre at 7:30. Do you want a ticket?
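
The spam pair on this slide can often be separated with rough text features alone. A minimal Python sketch, assuming a hypothetical look-alike table (1→i, 0→o, 3→e, @→a, $→s) and a one-word blocklist; the name `looks_like_spam` is illustrative. It undoes digit-for-letter substitutions and then matches whole tokens, so "V1AGRA" is flagged while "Agra" is not.

```python
import re

# Hypothetical look-alike substitutions and blocklist, for illustration only.
LOOKALIKES = str.maketrans({"1": "i", "0": "o", "3": "e", "@": "a", "$": "s"})
BLOCKLIST = {"viagra"}


def looks_like_spam(message: str) -> bool:
    """Flag a message if any whole token, after undoing look-alike
    substitutions and stripping punctuation, is on the blocklist."""
    tokens = re.findall(r"\S+", message.lower().translate(LOOKALIKES))
    words = {re.sub(r"[^a-z]", "", t) for t in tokens}
    return bool(words & BLOCKLIST)


print(looks_like_spam("Let’s go to Agra!"))
print(looks_like_spam("Buy V1AGRA …"))
```

Matching whole normalized tokens rather than substrings is the design choice that keeps the place name "Agra" from triggering the filter.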
  
Ambiguity makes NLP hard: “Crash blossoms”
Violinist Linked to JAL Crash Blossoms
Teacher Strikes Idle Kids
Red Tape Holds Up New Bridges
Hospitals Are Sued by 7 Foot Doctors
Juvenile Court to Try Shooting Defendant
Local High School Dropouts Cut in Half

Ambiguity is pervasive
Fed raises interest rates
New York Times headline (17 May 2000)
Fed raises interest rates
Fed raises interest rates 0.5%
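
One way to see how pervasive the ambiguity is: each word of the headline can be read as a noun or a verb, so before any syntax is considered a tagger faces 2^4 = 16 candidate tag sequences, only one of which is the intended reading. The Python sketch below enumerates them using a simplified, hypothetical mini-lexicon (two tags per word, no context).

```python
from itertools import product

# Hypothetical mini-lexicon: each word's possible parts of speech, out of context.
LEXICON = {
    "Fed": ["NOUN", "VERB"],       # the Federal Reserve / past tense of "feed"
    "raises": ["NOUN", "VERB"],    # pay raises / increases
    "interest": ["NOUN", "VERB"],  # interest on a loan / to interest someone
    "rates": ["NOUN", "VERB"],     # interest rates / evaluates
}


def tag_sequences(sentence: str) -> list:
    """Every combination of per-word tags, as lists of (word, tag) pairs."""
    words = sentence.split()
    return [list(zip(words, tags)) for tags in product(*(LEXICON[w] for w in words))]


seqs = tag_sequences("Fed raises interest rates")
print(len(seqs))  # 16
```

Choosing among these candidates requires combining knowledge about which tag sequences are likely, which is the job of the probabilistic taggers and parsers discussed later.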
In-video quizzes!
• Most lectures will include a little quiz
• Just to check basic understanding
• Simple, multiple-choice.
• You can retake them if you get them wrong

Why else is natural language understanding difficult?

non-standard English
• Great job @justinbieber! Were SOO PROUD of what youve accomplished! U taught us 2 #neversaynever & you yourself should never give up either♥

segmentation issues
• the New York-New Haven Railroad
• the New York-New Haven Railroad

idioms
• dark horse
• get cold feet
• lose face
• throw in the towel

neologisms
• unfriend
• Retweet
• bromance

world knowledge
• Mary and Sue are sisters.
• Mary and Sue are mothers.

tricky entity names
• Where is A Bug’s Life playing …
• Let It Be was recorded …
• … a mutation on the for gene …

But that’s what makes it fun!

Making progress on this problem…
• The task is difficult! What tools do we need?
  • Knowledge about language
  • Knowledge about the world
  • A way to combine knowledge sources
• How we generally do this:
  • probabilistic models built from language data
    • P(“maison” → “house”) high
    • P(“L’avocat général” → “the general avocado”) low
  • Luckily, rough text features can often do half the job.
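
A "probabilistic model built from language data" can be as simple as relative frequencies. The Python sketch below estimates translation probabilities by maximum likelihood, P(english | french) = count(french, english) / count(french), from a table of aligned phrase-pair counts. The counts are made up for illustration only, and the name `translation_prob` is hypothetical.

```python
from collections import Counter

# Hypothetical counts of (French phrase, English phrase) pairs in an aligned corpus.
PAIR_COUNTS = Counter({
    ("maison", "house"): 90,
    ("maison", "home"): 10,
    ("L'avocat général", "the advocate general"): 49,
    ("L'avocat général", "the general avocado"): 1,
})


def translation_prob(french: str, english: str) -> float:
    """Relative-frequency (maximum likelihood) estimate of P(english | french)."""
    total = sum(c for (f, _), c in PAIR_COUNTS.items() if f == french)
    return PAIR_COUNTS[(french, english)] / total if total else 0.0


print(translation_prob("maison", "house"))  # 0.9
print(translation_prob("L'avocat général", "the general avocado"))  # 0.02
```

With real data the counts come from large parallel corpora, and the same estimate assigns "house" a high probability and "the general avocado" a low one, matching the intuition on the slide.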
  
This class
• Teaches key theory and methods for statistical NLP:
  • Viterbi
  • Naïve Bayes, Maxent classifiers
  • N-gram language modeling
  • Statistical Parsing
  • Inverted index, tf-idf, vector models of meaning
• For practical, robust real-world applications
  • Information extraction
  • Spelling correction
  • Information retrieval
  • Sentiment analysis

Skills you’ll need
• Simple linear algebra (vectors, matrices)
• Basic probability theory
• Java or Python programming
• Weekly programming assignments