isd312-09-summarization

Published in: Technology, Education


Transcript

  • 1. Session 9: Summarization, 12 December 2011
  • 2.  Summarization Diberikan sebuah dokumen (korpus), ringkas dalam kata-kata yang mewakili isinya Extractive summarization  kata-kata kunci Generative summarization  Kalimat ringkasan Information Retrieval – ISD312 Summarization 2
  • 3. Simple statistics: most frequent words.
        from __future__ import division  # must come before any other statement
        import nltk
        from nltk.book import *
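  • 4.
        from __future__ import division  # must be the first statement in the module
        import nltk
        from nltk.book import *

        def kataKunci(df, ambang):
            # kataKunci = "keywords". df maps each word to its frequency
            # (e.g. an nltk.FreqDist); ambang is a threshold between 0 and 1.
            if not df:
                return
            top = max(df.values())  # frequency of the most common word
            # Print every word whose frequency relative to the top exceeds the threshold.
            print(" ".join(vocab for vocab in df if df[vocab] / top > ambang))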
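
The relative-frequency threshold used by kataKunci on slide 4 can be tried without NLTK. The following is a minimal sketch using only the standard library; the function name `keywords_by_relative_frequency` and the sample sentence are illustrative assumptions, not part of the slides.

```python
from collections import Counter


def keywords_by_relative_frequency(tokens, threshold):
    """Return the words whose count divided by the highest count exceeds threshold."""
    counts = Counter(tokens)
    if not counts:
        return []
    top = max(counts.values())  # count of the most frequent word
    # Sort so that the result order is deterministic.
    return sorted(word for word, n in counts.items() if n / top > threshold)


# Hypothetical sample text: "the" occurs 3 times, "cat" twice, the rest once.
tokens = "the cat sat on the mat and the cat slept".split()
print(keywords_by_relative_frequency(tokens, 0.5))  # ['cat', 'the']
```

Note that the top-ranked word here is a function word ("the"). In practice a stopword filter would probably be applied before counting, which the slides do not show.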
  • 5. Phrases and groups of words: collocations, and the network of words within a document.
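
One common way to find collocations is to rank adjacent word pairs by pointwise mutual information (PMI). The sketch below is an assumption about how slide 5's idea could be implemented with the standard library only; the slides do not say which measure is used, and the function name, parameters, and sample text are hypothetical. NLTK also offers a ready-made `Text.collocations()` method.

```python
import math
from collections import Counter


def top_collocations(tokens, min_count=2, n=5):
    """Rank adjacent word pairs by pointwise mutual information (PMI)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)
    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        # PMI = log2(P(w1, w2) / (P(w1) * P(w2))), with all probabilities
        # approximated as count / total.
        pmi = math.log2(count * total / (unigrams[w1] * unigrams[w2]))
        scored.append((pmi, w1, w2))
    # Highest PMI first; ties are broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [(w1, w2) for _, w1, w2 in scored[:n]]


# Hypothetical sample text in which "new york" recurs.
tokens = "new york is big and new york is busy but york alone is not new".split()
print(top_collocations(tokens))
```

The `min_count` filter matters: without it, pairs that occur only once tend to dominate the ranking because PMI rewards rarity.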
  • 6. Generating sentences. Simple statistics: a table of word-occurrence statistics; Bayesian statistics, namely the probability of a word appearing at the start of a sentence and the probability of a word following another word. Other methods: N-grams, POS tags.
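
The two probabilities named on slide 6 (a word starting a sentence, and a word following another word) are enough to build a simple bigram sentence generator. The following is a minimal sketch under that reading; all function names, the `</s>` end marker, and the three training sentences are assumptions made for illustration, not taken from the slides.

```python
import random
from collections import Counter, defaultdict

END = "</s>"  # hypothetical end-of-sentence marker


def train_bigram_model(sentences):
    """Count sentence-initial words and word-to-next-word transitions."""
    starts = Counter()
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        if not words:
            continue
        starts[words[0]] += 1
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
        follows[words[-1]][END] += 1  # record where sentences stop
    return starts, follows


def weighted_choice(counter, rng):
    """Pick a key with probability proportional to its count."""
    words = sorted(counter)  # sorted so a seeded rng gives reproducible results
    return rng.choices(words, weights=[counter[w] for w in words], k=1)[0]


def generate_sentence(starts, follows, rng, max_words=20):
    """Sample a first word, then repeatedly sample a follower until END."""
    sentence = [weighted_choice(starts, rng)]
    while len(sentence) < max_words:
        word = weighted_choice(follows[sentence[-1]], rng)
        if word == END:
            break
        sentence.append(word)
    return " ".join(sentence)


starts, follows = train_bigram_model(["the cat sat", "the dog sat", "a cat slept"])
print(generate_sentence(starts, follows, random.Random(0)))
```

Because each word depends only on the one before it, the output can mix sentences ("a cat sat") but never produces a word pair that was absent from the training text.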
  • 7. The rapid growth of the Internet has resulted in enormous amounts of information that has become more difficult to access efficiently. Internet users require tools to help manage this vast quantity of information. The primary goal of this research is to create an efficient and effective tool that is able to summarize large documents quickly. This research presents a linear time algorithm for calculating lexical chains which is a method of capturing the “aboutness” of a document. This method is compared to previous, less efficient methods of lexical chain extraction. We also provide alternative methods for extracting and scoring lexical chains. We show that our method provides similar results to previous research, but is substantially more efficient. This efficiency is necessary in Internet search applications where many large documents may need to be summarized at once, and where the response time to the end user is extremely important.
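  • 8.
        import os
        import importlib

        os.chdir(pathtotugas)    # pathtotugas: assumed to be the directory holding tugas.py
        import tugas             # tugas = "assignment", the module being worked on
        importlib.reload(tugas)  # re-import after editing; in Python 2 use the builtin reload(tugas)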
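  • 9.
        import nltk

        # The sample sentence means "An example sentence to be analyzed using NLTK".
        data = "Sebuah contoh kalimat yang ingin dianalisis menggunakan NLTK"
        tokens = nltk.word_tokenize(data)  # split the string into word tokens
        text = nltk.Text(tokens)           # wrap the tokens for NLTK's text-exploration methods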
  • 10. http://www.nltk.org/book http://tjerdastangkas.blogspot.com/search/label/isd312
  • 11. Monday, 12 December 2011
