isd312-05-wordnet
Published in Technology
Transcript

  • 1. Meeting 5: WordNet, 8 November 2011
  • 2. WordNet; Exercises; Open Problems. WordNet – ISD312, NLTK and Python
  • 3. WordNet is a manually developed English dictionary and thesaurus. Relations between words:
      - synonym
      - hyponym, hypernym
      - meronym, holonym
      - antonym
  • 4. Using the nltk package:
        from nltk.corpus import wordnet as wn
      Accessing the synonym set (synset) of a word:
        wn.synsets('motorcar')
        wn.synset('car.n.01').lemma_names()
        wn.synset('car.n.01').lemmas()
        wn.synset('car.n.01').definition()
        wn.synset('car.n.01').examples()
  • 5.
        >>> from nltk.corpus import wordnet as wn
        >>> wn.synsets('motorcar')
        [Synset('car.n.01')]
      motorcar is a member of the synonym set car.n.01. The other members of car.n.01:
        >>> wn.synset('car.n.01').lemma_names()
        ['car', 'auto', 'automobile', 'machine', 'motorcar']
  • 6. Lemma: a synset name paired with a word
      - synset: car.n.01
      - word: car
      - lemma: car.n.01.car
      Getting all lemmas of the synonym set car.n.01:
        >>> wn.synset('car.n.01').lemmas()
        [Lemma('car.n.01.car'), Lemma('car.n.01.auto'), Lemma('car.n.01.automobile'),
         Lemma('car.n.01.machine'), Lemma('car.n.01.motorcar')]
  • 7.
        >>> wn.synset('car.n.01').definition()
        'a motor vehicle with four wheels; usually propelled by an internal combustion engine'
        >>> wn.synset('car.n.01').examples()
        ['he needs a car to get to work']
      Or, to get every lemma for the word car:
        >>> wn.lemmas('car')
        [Lemma('car.n.01.car'), Lemma('car.n.02.car'), Lemma('car.n.03.car'),
         Lemma('car.n.04.car'), Lemma('cable_car.n.01.car')]
  • 8. The word car belongs to several different synsets:
        >>> wn.synsets('car')
        [Synset('car.n.01'), Synset('car.n.02'), Synset('car.n.03'), Synset('car.n.04'),
         Synset('cable_car.n.01')]
  • 9.
        >>> for synset in wn.synsets('car'):
        ...     print(synset.lemma_names())
        ...
        ['car', 'auto', 'automobile', 'machine', 'motorcar']
        ['car', 'railcar', 'railway_car', 'railroad_car']
        ['car', 'gondola']
        ['car', 'elevator_car']
        ['cable_car', 'car']
  • 10. Generic-specific relations (hypernym and hyponym):
      - animal, cat
      - vehicle, bicycle
        >>> motorcar = wn.synset('car.n.01')
        >>> types_of_motorcar = motorcar.hyponyms()
        >>> len(types_of_motorcar)
        31
        >>> types_of_motorcar[26]
        Synset('ambulance.n.01')
  • 11.
        >>> sorted([lemma.name() for synset in types_of_motorcar for lemma in synset.lemmas()])
        ['Model_T', 'S.U.V.', 'SUV', 'Stanley_Steamer', 'ambulance', 'beach_waggon', ...]
  • 12.
        >>> motorcar.hypernyms()
        [Synset('motor_vehicle.n.01')]
        >>> paths = motorcar.hypernym_paths()
        >>> len(paths)
        2
        >>> [synset.name() for synset in paths[0]]
        ['entity.n.01', 'physical_entity.n.01', 'object.n.01', 'whole.n.02', 'artifact.n.01',
         'instrumentality.n.03', 'container.n.01', 'wheeled_vehicle.n.01',
         'self-propelled_vehicle.n.01', 'motor_vehicle.n.01', 'car.n.01']
  • 13.
        >>> [synset.name() for synset in paths[1]]
        ['entity.n.01', 'physical_entity.n.01', 'object.n.01', 'whole.n.02', 'artifact.n.01',
         'instrumentality.n.03', 'conveyance.n.03', 'vehicle.n.01', 'wheeled_vehicle.n.01',
         'self-propelled_vehicle.n.01', 'motor_vehicle.n.01', 'car.n.01']
      Root hypernym:
        >>> motorcar.root_hypernyms()
        [Synset('entity.n.01')]
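The two paths above arise because one synset on the way up has two hypernyms. A minimal sketch of that idea on a toy graph follows; the dictionary `HYPERNYMS` and the helper functions are hypothetical illustrations, the node names only loosely echo the car.n.01 example, and none of it is WordNet data or NLTK's implementation.

```python
# Toy hypernym graph: each node maps to its direct hypernyms (parents).
# Hypothetical data for illustration only, NOT taken from WordNet.
HYPERNYMS = {
    "entity": [],
    "artifact": ["entity"],
    "container": ["artifact"],
    "vehicle": ["artifact"],
    "wheeled_vehicle": ["container", "vehicle"],  # two parents -> two paths
    "motor_vehicle": ["wheeled_vehicle"],
    "car": ["motor_vehicle"],
}


def hypernym_paths(node, graph=HYPERNYMS):
    """Return every path from a root down to `node`, root first."""
    parents = graph[node]
    if not parents:
        return [[node]]
    paths = []
    for parent in parents:
        for path in hypernym_paths(parent, graph):
            paths.append(path + [node])
    return paths


def root_hypernyms(node, graph=HYPERNYMS):
    """Return the distinct roots reachable from `node`, sorted by name."""
    return sorted({path[0] for path in hypernym_paths(node, graph)})


car_paths = hypernym_paths("car")
for path in car_paths:
    print(" > ".join(path))
print(root_hypernyms("car"))
```

Both paths end at the same root, which mirrors why `root_hypernyms()` returns a single synset even though `hypernym_paths()` returns two paths.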
  • 14. A trunk is part of a tree:
        >>> wn.synset('tree.n.01').part_meronyms()
        [Synset('burl.n.02'), Synset('crown.n.07'), Synset('stump.n.01'),
         Synset('trunk.n.01'), Synset('limb.n.02')]
        >>> wn.synset('tree.n.01').substance_meronyms()
        [Synset('heartwood.n.01'), Synset('sapwood.n.01')]
        >>> wn.synset('tree.n.01').member_holonyms()
        [Synset('forest.n.01')]
  • 15.
        >>> for synset in wn.synsets('mint', wn.NOUN):
        ...     print(synset.name() + ':', synset.definition())
        ...
        batch.n.02: (often followed by `of') a large number or amount or extent
        mint.n.02: any north temperate plant of the genus Mentha with aromatic leaves and small mauve flowers
        mint.n.03: any member of the mint family of plants
        mint.n.04: the leaves of a mint plant used fresh or candied
        mint.n.05: a candy that is flavored with a mint oil
        mint.n.06: a plant where money is coined by authority of the government
  • 16.
        >>> wn.synset('mint.n.04').part_holonyms()
        [Synset('mint.n.02')]
        >>> wn.synset('mint.n.04').substance_holonyms()
        [Synset('mint.n.05')]
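Meronym and holonym are two directions of the same part-whole link: if a trunk is a part meronym of a tree, the tree is a part holonym of the trunk. A small sketch, assuming a plain dictionary stands in for WordNet: `PART_MERONYMS` and `invert` are hypothetical names, the tree entry copies the slide output, and the mint.n.02 entry is inferred from the `part_holonyms()` result above rather than queried.

```python
from collections import defaultdict

# Whole -> parts. Hypothetical stand-in for WordNet, built from the slide output.
PART_MERONYMS = {
    "tree.n.01": ["burl.n.02", "crown.n.07", "stump.n.01", "trunk.n.01", "limb.n.02"],
    "mint.n.02": ["mint.n.04"],  # inferred from mint.n.04's part holonym
}


def invert(relation):
    """Turn a whole -> parts mapping into a part -> wholes mapping."""
    inverse = defaultdict(list)
    for whole, parts in relation.items():
        for part in parts:
            inverse[part].append(whole)
    return dict(inverse)


PART_HOLONYMS = invert(PART_MERONYMS)
print(PART_HOLONYMS["trunk.n.01"])
print(PART_HOLONYMS["mint.n.04"])
```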
  • 17.
        >>> wn.synset('walk.v.01').entailments()
        [Synset('step.v.01')]
        >>> wn.synset('eat.v.01').entailments()
        [Synset('swallow.v.01'), Synset('chew.v.01')]
        >>> wn.synset('tease.v.03').entailments()
        [Synset('arouse.v.07'), Synset('disappoint.v.01')]
  • 18.
        >>> wn.lemma('supply.n.02.supply').antonyms()
        [Lemma('demand.n.02.demand')]
        >>> wn.lemma('rush.v.01.rush').antonyms()
        [Lemma('linger.v.04.linger')]
        >>> wn.lemma('horizontal.a.01.horizontal').antonyms()
        [Lemma('vertical.a.01.vertical'), Lemma('inclined.a.02.inclined')]
        >>> wn.lemma('staccato.r.01.staccato').antonyms()
        [Lemma('legato.r.01.legato')]
  • 19. Synsets are connected by lexical relations.
      - Given a synset, traverse WordNet to find semantically similar synsets.
      - Important for building an index.
      - Important for processing queries: the query vehicle should also retrieve documents about limousine.
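The query example on this slide can be sketched as hyponym expansion: before matching documents, the query term is widened to include everything below it in the taxonomy. The graph `HYPONYMS`, the collection `DOCS`, and both functions are hypothetical and hand-made for illustration; a real system would read the hyponyms from WordNet and use a proper index.

```python
from collections import deque

# Toy hyponym graph (term -> more specific terms). Hypothetical, not WordNet data.
HYPONYMS = {
    "vehicle": ["car", "bicycle"],
    "car": ["limousine", "ambulance"],
    "bicycle": [],
    "limousine": [],
    "ambulance": [],
}

# Toy document collection. Hypothetical.
DOCS = {
    "d1": "the limousine waited outside",
    "d2": "a novel about whales",
    "d3": "bicycle lanes in the city",
}


def expand_query(term, graph=HYPONYMS):
    """Return the term followed by all its transitive hyponyms (breadth-first)."""
    seen = {term}
    order = []
    queue = deque([term])
    while queue:
        current = queue.popleft()
        order.append(current)
        for child in graph.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order


def search(term, docs=DOCS):
    """Return ids of documents containing the term or any of its hyponyms."""
    terms = set(expand_query(term))
    return sorted(doc_id for doc_id, text in docs.items()
                  if terms & set(text.lower().split()))


print(expand_query("vehicle"))
print(search("vehicle"))
```

Without expansion, the query vehicle would match none of the three toy documents, since none of them contains that exact word.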
  • 20. The shorter the path between two synsets, the more similar their meanings:
        >>> right = wn.synset('right_whale.n.01')
        >>> orca = wn.synset('orca.n.01')
        >>> minke = wn.synset('minke_whale.n.01')
        >>> tortoise = wn.synset('tortoise.n.01')
        >>> novel = wn.synset('novel.n.01')
  • 21.
        >>> right.lowest_common_hypernyms(minke)
        [Synset('baleen_whale.n.01')]
        >>> right.lowest_common_hypernyms(orca)
        [Synset('whale.n.02')]
        >>> right.lowest_common_hypernyms(tortoise)
        [Synset('vertebrate.n.01')]
        >>> right.lowest_common_hypernyms(novel)
        [Synset('entity.n.01')]
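The lowest common hypernym is the deepest node that both synsets share on their way up to the root. A minimal sketch on a toy, single-parent taxonomy: `PARENT`, `ancestors`, and `lowest_common_hypernym` are hypothetical names, and the tree is far shallower than WordNet, keeping only enough levels to reproduce the four answers above.

```python
# Toy taxonomy: node -> its single hypernym (None for the root).
# Hypothetical and heavily simplified; WordNet has many more levels.
PARENT = {
    "entity": None,
    "vertebrate": "entity",
    "novel": "entity",
    "whale": "vertebrate",
    "tortoise": "vertebrate",
    "baleen_whale": "whale",
    "toothed_whale": "whale",
    "right_whale": "baleen_whale",
    "minke_whale": "baleen_whale",
    "orca": "toothed_whale",
}


def ancestors(node, parent=PARENT):
    """Return the chain from `node` up to the root, starting with `node`."""
    chain = []
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain


def lowest_common_hypernym(a, b, parent=PARENT):
    """Return the first node on a's upward chain that is also above b."""
    above_b = set(ancestors(b, parent))
    for node in ancestors(a, parent):
        if node in above_b:
            return node
    return None


for other in ["minke_whale", "orca", "tortoise", "novel"]:
    print(other, "->", lowest_common_hypernym("right_whale", other))
```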
  • 22.
        >>> wn.synset('baleen_whale.n.01').min_depth()
        14
        >>> wn.synset('whale.n.02').min_depth()
        13
        >>> wn.synset('vertebrate.n.01').min_depth()
        8
        >>> wn.synset('entity.n.01').min_depth()
        0
  • 23.
        >>> right.path_similarity(minke)
        0.25
        >>> right.path_similarity(orca)
        0.16666666666666666
        >>> right.path_similarity(tortoise)
        0.076923076923076927
        >>> right.path_similarity(novel)
        0.043478260869565216
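These scores can be read as path lengths. Assuming NLTK's path similarity is 1 / (d + 1), where d is the number of edges on the shortest hypernym/hyponym path between the two synsets, each score above can be inverted to recover d. The two function names below are hypothetical helpers, and the scores are copied from the session above.

```python
def similarity_from_distance(distance):
    """Path similarity for a shortest path of `distance` edges: 1 / (d + 1)."""
    return 1.0 / (distance + 1)


def distance_from_similarity(score):
    """Invert the formula: recover the edge count from a similarity score."""
    return round(1.0 / score - 1)


# Scores copied from the slide (right_whale compared with each synset).
scores = {
    "minke_whale": 0.25,
    "orca": 0.16666666666666666,
    "tortoise": 0.076923076923076927,
    "novel": 0.043478260869565216,
}

distances = {name: distance_from_similarity(s) for name, s in scores.items()}
for name, d in distances.items():
    print(name, "is", d, "edges away from right_whale")
```

Under that assumption, right whale and minke whale are 3 edges apart, while right whale and novel are 22 edges apart, which is why the score drops so sharply.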
  • 24.
      - WordNet for Indonesian (Bahasa Indonesia)
      - Indonesian thesaurus: Kateglo
      - Building a WordNet automatically: detecting the emergence of slang, for example "sesuatu banget", "rempong", "jablay", "lebay"
      - VerbNet: nltk.corpus.verbnet
  • 25. Find all senses of the word dish:
      - according to your own command of English
      - using the method discussed in class
  • 26. Exercise 27: The polysemy of a word is the number of senses it has. Using WordNet, we can determine that the noun dog has seven senses with len(wn.synsets('dog', 'n')). Compute the average polysemy of nouns, verbs, adjectives, and adverbs according to WordNet.
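As a starting point for this exercise, average polysemy is the total number of senses divided by the number of distinct words, computed separately for each part of speech. The sketch below runs on a three-word toy table instead of WordNet: `SENSE_COUNTS` and `average_polysemy` are hypothetical names, and the counts are the ones that appear in these slides (dog 7, car 5, mint 6). The real exercise needs the same calculation over every lemma of each part of speech.

```python
# Noun sense counts taken from the slides; a hypothetical stand-in for
# len(wn.synsets(word, 'n')) over the full WordNet noun vocabulary.
SENSE_COUNTS = {
    "n": {"dog": 7, "car": 5, "mint": 6},
}


def average_polysemy(counts):
    """Mean number of senses per word for one part of speech."""
    if not counts:
        raise ValueError("no words for this part of speech")
    return sum(counts.values()) / len(counts)


noun_average = average_polysemy(SENSE_COUNTS["n"])
print(noun_average)
```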
  • 27. References:
      - NLTK book: http://www.nltk.org/book
      - Kateglo (KAmus, TEsaurus, dan GLOsarium: dictionary, thesaurus, and glossary): http://bahtera.org/kateglo
      - http://www.sinonimkata.com/
      - http://tjerdastangkas.blogspot.com/search/label/isd312
  • 28. Lexical richness
      - Ratio of the number of tokens to the number of unique words: len(text1) / len(set(text1))
      - In Python 2, / on two integers is integer division; enable true division with from __future__ import division
      - Number of occurrences of a token: text1.count('whale')
      - As a percentage of the text: 100 * text1.count('whale') / len(text1)
  • 29.
        >>> from __future__ import division
        >>> from nltk.book import text1, text5
        >>> def lexicalDiversity(text):
        ...     return len(text) / len(set(text))
        ...
        >>> def percentage(count, total):
        ...     return 100 * count / total
        ...
        >>> lexicalDiversity(text5)
        >>> percentage(text1.count('whale'), len(text1))