Published in: Technology, Education

  • The impulse to do this came from what Anna did on Russian word processor manuals in a class I taught.
  • Jan made the Czech National Corpus and the Prague Dependency Treebank. He is Eva’s son. Eva has, inter alia, worked for the Spielberg Foundation on collecting and indexing the memories of Shoah survivors. No large annotated corpora of Russian exist in the public domain, but large annotated corpora of Czech do, largely thanks to Jan Hajič and his predecessors in Prague.
  • Roman Jakobson was alleged to speak 9 languages, all of them in Russian.
  • A model of the way parts of speech succeed each other and the way words come out of parts of speech.
  • The same model, in Russian: the way parts of speech succeed each other and the way words come out of parts of speech. Since Russian has no articles, this could also mean “The red book”.
  • Russian and Czech are sufficiently similar that the tags can be the same (a mild lie). But the words are often different.
  • Jiri Hana helped us make a morphological analyser for Russian. It doesn’t choose which possibility is right; it just lays out the possibilities. We assign probabilities to them, but they are not very accurate, and they don’t need to be, because the other probabilities (the tag-to-tag ones) do the work.
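The notes above describe a Markov model in which tags follow tags and words come out of tags, with a morphological analyser supplying the candidate tags for each Russian word. The following is a minimal sketch of that idea, not the authors' implementation. The transition numbers, the three-tag tagset, the `ANALYSES` lookup table, and the uniform word scores are all assumptions made for illustration. The ambiguous word печь ("stove" as a noun, "to bake" as a verb) is resolved by the tag-to-tag probabilities alone.

```python
import math

ALL_TAGS = ["Adj-Nom", "Nn-Nom", "Vb"]  # hypothetical, tiny tagset

# Hypothetical tag-to-tag probabilities, standing in for estimates taken
# from a tagged Czech corpus. "<s>" marks the start of a sentence.
TRANSITIONS = {
    "<s>":     {"Adj-Nom": 0.6, "Nn-Nom": 0.3, "Vb": 0.1},
    "Adj-Nom": {"Adj-Nom": 0.1, "Nn-Nom": 0.8, "Vb": 0.1},
    "Nn-Nom":  {"Adj-Nom": 0.1, "Nn-Nom": 0.2, "Vb": 0.7},
    "Vb":      {"Adj-Nom": 0.4, "Nn-Nom": 0.4, "Vb": 0.2},
}

# Toy stand-in for a morphological analyser: it only lists the possible
# tags of each word and does not choose between them.
ANALYSES = {
    "красная": ["Adj-Nom"],
    "книга": ["Nn-Nom"],
    "печь": ["Nn-Nom", "Vb"],  # "stove" (noun) or "to bake" (verb)
}


def word_score(word, tag):
    """Crude word score: uniform over the analyser's candidate tags."""
    tags = ANALYSES.get(word, ALL_TAGS)  # unknown word: allow every tag
    return 1.0 / len(tags) if tag in tags else 0.0


def viterbi(words):
    """Return the most probable tag sequence for a list of words."""
    if not words:
        return []
    # best[i][tag] = (log score of best path ending in tag at i, previous tag)
    best = [{}]
    for tag in ALL_TAGS:
        p = TRANSITIONS["<s>"][tag] * word_score(words[0], tag)
        if p > 0:
            best[0][tag] = (math.log(p), None)
    for i in range(1, len(words)):
        column = {}
        for tag in ALL_TAGS:
            e = word_score(words[i], tag)
            if e == 0:
                continue  # the analyser rules this tag out
            column[tag] = max(
                (score + math.log(TRANSITIONS[prev][tag] * e), prev)
                for prev, (score, _) in best[i - 1].items()
            )
        best.append(column)
    # Follow the back-pointers from the best final tag.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


print(viterbi(["красная", "печь"]))
```

Even though the word scores for печь are a flat 0.5 for each reading, the adjective-to-noun transition outweighs the adjective-to-verb one, so the noun reading wins. That is the sense in which rough word probabilities can be good enough.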
  • Transcript

    • 1. NLP for minority languages
      • Chris Brew
      • CSE and Linguistics
    • 2. Anna Feldman
      • faculty at Montclair State University
      • Russian native speaker. Slavic language expertise
      • Learnt to program while at OSU
      • Double major BA in English and East Asian Studies, HUJ
    • 3. Russian POS tagging
      • Jan Hajič
      • Eva Hajičová
    • 4. Resource light tagging
      • All human languages are related in some way.
      • Some of them closely related.
      • We used that fact
    • 5.  
    • 6. Markov Model
      • Tags: Det Adj Noun
      • Words: A red book
    • 7. Markov Model
      • Tags: Adj-Nom Nn-Nom
      • Words: красная книга
    • 8. Markov Model
      • Shared tags: Adj-Nom Nn-Nom
      • Words: Červená kniha
    • 9. Markov Model
      • Shared tags: Adj-Nom Nn-Nom
      • Words: Červená kniha
    • 10. Markov Model
      • Need something to get Russian words
      • Shared tags: Adj-Nom Nn-Nom
      • Words: Červená kniha
    • 11. Text
    • 12.  
    • 13. Text
    • 14.
      • Kirk Baker
      • Phonetician/phonologist, worked with Chip Gerfen at UNC
      • Now working at Collexis in DC area as computational linguist.
    • 15. Heuristics for animacy
    • 16. Tagalog: not yet in Google Translate
    • 17. Credo
      • Language provides computer science with very interesting problems
      • Theoretical ideas like language relatedness can have real technological impact
      • Machine learning has a kind of eldritch magic
      • Teach me about your beautiful language...
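As an illustration of the "shared" part of the Markov model slides: if the tags are the same for Czech and Russian, the tag-to-tag probabilities can be counted from a tagged Czech corpus and reused for Russian. The sketch below assumes a three-sentence toy corpus (`CZECH_TAGGED`) and add-one smoothing. Both are hypothetical choices for illustration and are not taken from the talk.

```python
from collections import Counter, defaultdict

# Hypothetical toy tagged Czech corpus: each sentence is a list of
# (word, tag) pairs. A real corpus would be far larger.
CZECH_TAGGED = [
    [("Červená", "Adj-Nom"), ("kniha", "Nn-Nom")],
    [("Nová", "Adj-Nom"), ("kniha", "Nn-Nom")],
    [("kniha", "Nn-Nom")],
]


def estimate_transitions(tagged_sentences, smoothing=1.0):
    """Estimate P(tag | previous tag) from tag bigram counts.

    Only the tags are used; the Czech words are ignored, which is why
    the result can be carried over to a closely related language.
    """
    counts = defaultdict(Counter)
    tagset = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start marker
        for _word, tag in sentence:
            counts[prev][tag] += 1
            tagset.add(tag)
            prev = tag
    tags = sorted(tagset)
    probs = {}
    for prev in ["<s>"] + tags:
        # Additive smoothing keeps unseen transitions above zero.
        total = sum(counts[prev].values()) + smoothing * len(tags)
        probs[prev] = {t: (counts[prev][t] + smoothing) / total for t in tags}
    return probs


transitions = estimate_transitions(CZECH_TAGGED)
print(transitions["Adj-Nom"])
```

The words themselves never enter the estimate, so the remaining job for Russian is the one slide 10 names: something to supply the Russian words for each tag.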