
Mining the social web ch8 - 1


  1. Mining The Social Web Ch8 Blogs et al.: Natural Language Processing (and Beyond) Ⅰ. Presenter: 김연기, Naver café "People Who Dream of Becoming Architects" (네이버 아키텍트를 꿈꾸는 사람들) http://Cafe.naver.com/architect1
  2. Natural Language Processing • Let's split text into sentences at periods!
  3. Natural Language Processing • Let's split text into sentences at periods!
  4. NLP Pipeline With NLTK: end-of-sentence detection → tokenization (splitting words) → POS tagging → assigning meaning to words (chunking) → extraction
  5. Natural Language Processing • Finding the end of a sentence (EOS Detection)
  6. Natural Language Processing • Finding the end of a sentence (EOS Detection)
  7. Natural Language Processing • Part-of-speech tagging (POS Tagging)
  8. Natural Language Processing
  9. Natural Language Processing • Extraction
  10. Natural Language Processing
  11. Natural Language Processing
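  12. Natural Language Processing

        import feedparser                # third-party RSS/Atom feed parser
        from bs4 import BeautifulSoup    # BeautifulSoup 4; nltk.clean_html no longer works in NLTK 3

        def cleanHtml(html):
            # Strip tags and decode entities such as &amp; into plain text
            return BeautifulSoup(html, "html.parser").get_text()

        fp = feedparser.parse(FEED_URL)  # FEED_URL (the blog's feed) is defined elsewhere
        # Count the entries themselves, not the characters of the first title
        print("Fetched %s entries from %s" % (len(fp.entries), fp.feed.title))

        blog_posts = []
        for e in fp.entries:
            blog_posts.append({"title": e.title,
                               "content": cleanHtml(e.content[0].value),
                               "link": e.links[0].href})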
  13. Natural Language Processing

        # Basic stats for one post. fdist is an nltk.FreqDist of the post's words;
        # post, sentences and stop_words are defined earlier in the script.
        num_words = sum(fdist.values())
        num_unique_words = len(fdist.keys())
        # Hapaxes are words that appear only once
        num_hapaxes = len(fdist.hapaxes())
        # most_common() returns words by decreasing frequency (NLTK 3's items() does not)
        top_10_words_sans_stop_words = [w for w in fdist.most_common()
                                        if w[0] not in stop_words][:10]

        print(post["title"])
        print("\tNum Sentences:".ljust(25), len(sentences))
        print("\tNum Words:".ljust(25), num_words)
        print("\tNum Unique Words:".ljust(25), num_unique_words)
        print("\tNum Hapaxes:".ljust(25), num_hapaxes)
        print("\tTop 10 Most Frequent Words (sans stop words):\n\t\t",
              "\n\t\t".join(["%s (%s)" % (w[0], w[1])
                             for w in top_10_words_sans_stop_words]))
        print()
  14. Natural Language Processing
  15. Natural Language Processing

        import numpy

        # scored_sentences: list of (sentence_index, score) pairs computed earlier;
        # TOP_SENTENCES (how many sentences to keep) is defined elsewhere.

        # Summarization Approach 1:
        # Filter out non-significant sentences by using the average score plus a
        # fraction of the std dev as a filter
        avg = numpy.mean([s[1] for s in scored_sentences])
        std = numpy.std([s[1] for s in scored_sentences])
        mean_scored = [(sent_idx, score) for (sent_idx, score) in scored_sentences
                       if score > avg + 0.5 * std]

        # Summarization Approach 2:
        # Another approach would be to return only the top N ranked sentences
        top_n_scored = sorted(scored_sentences, key=lambda s: s[1])[-TOP_SENTENCES:]
        # Re-sort by index so the summary keeps the original sentence order
        top_n_scored = sorted(top_n_scored, key=lambda s: s[0])
  16. Natural Language Processing
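  17. Natural Language Processing – Luhn’s Summarization Algorithm • Score = $\frac{(\text{significant words})^2}{\text{total words}}$, counted within a cluster of significant words in the sentence; the sentence takes the score of its best cluster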
  18. Natural Language Processing – Luhn’s Summarization Algorithm • Score =
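Slides 2 and 3 propose splitting text into sentences at periods. A minimal sketch of that naive rule (the function name `split_on_periods` is hypothetical, not from the slides) shows why it breaks on abbreviations such as "Mr.":

```python
def split_on_periods(text):
    """Naive sentence splitting: treat every '.' as a sentence boundary."""
    pieces = text.split(".")
    # Drop empty fragments (e.g. after the final period) and restore the period.
    return [piece.strip() + "." for piece in pieces if piece.strip()]


print(split_on_periods("Mr. Smith arrived. He sat down."))
# ['Mr.', 'Smith arrived.', 'He sat down.']
```

The abbreviation "Mr." becomes a "sentence" of its own, which is why the pipeline needs a dedicated end-of-sentence detection step.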
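Slides 5 and 6 cover end-of-sentence (EOS) detection. NLTK's `sent_tokenize` relies on the trained Punkt model for this; the sketch below is not Punkt but a simpler heuristic, assuming a hand-written abbreviation list (`ABBREVIATIONS` and `detect_sentences` are hypothetical names, and the list is deliberately tiny):

```python
import re

# Hypothetical, deliberately tiny abbreviation list for illustration only.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "e.g.", "i.e.", "etc."}


def detect_sentences(text):
    """Split after '.', '?' or '!' followed by whitespace, unless the token
    ending in that punctuation mark is a known abbreviation."""
    sentences, start = [], 0
    for match in re.finditer(r"[.?!]\s+", text):
        end = match.start() + 1  # keep the punctuation mark in the sentence
        last_token = text[start:end].split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue  # "Mr." and friends do not end a sentence
        sentences.append(text[start:end].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:  # final sentence (no trailing whitespace after its punctuation)
        sentences.append(tail)
    return sentences


print(detect_sentences("Mr. Smith arrived. He sat down."))
# ['Mr. Smith arrived.', 'He sat down.']
```

A fixed list cannot cover every abbreviation, which is the gap a trained model such as Punkt is meant to close.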
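Slide 4's second stage splits each sentence into words (tokenization). A minimal regex-based sketch, assuming that runs of word characters and single punctuation marks are good enough tokens (`tokenize` is a hypothetical name; NLTK's `word_tokenize` follows Treebank conventions instead and, for example, splits "It's" into "It" and "'s"):

```python
import re

# Words (runs of letters, digits, underscore) or single punctuation marks.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(sentence):
    """Return the sentence's tokens in order of appearance."""
    return TOKEN_PATTERN.findall(sentence)


print(tokenize("Hello, world! It's 2010."))
# ['Hello', ',', 'world', '!', 'It', "'", 's', '2010', '.']
```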
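Slide 7 introduces POS tagging, which labels each token with a part of speech. NLTK's `pos_tag` uses a trained statistical tagger; as a much simpler stand-in, the sketch below looks tags up in a tiny hypothetical lexicon and falls back to a guess (`NNP` for capitalized unknown words, `NN` otherwise), using Penn Treebank tag names. `LEXICON` and `lookup_pos_tag` are illustrative names, not NLTK APIs:

```python
# Hypothetical toy lexicon with Penn Treebank tags; real taggers are trained on a corpus.
LEXICON = {
    "the": "DT", "a": "DT", "is": "VBZ", "wrote": "VBD",
    "about": "IN", "and": "CC", ".": ".", ",": ",",
}


def lookup_pos_tag(tokens, default="NN"):
    """Return (token, tag) pairs using LEXICON plus a capitalization fallback."""
    tagged = []
    for token in tokens:
        tag = LEXICON.get(token.lower())
        if tag is None:
            # Unknown word: guess proper noun if capitalized, else the default.
            tag = "NNP" if token[:1].isupper() else default
        tagged.append((token, tag))
    return tagged


print(lookup_pos_tag(["Tim", "wrote", "a", "post", "about", "NLTK", "."]))
# [('Tim', 'NNP'), ('wrote', 'VBD'), ('a', 'DT'), ('post', 'NN'), ('about', 'IN'), ('NLTK', 'NNP'), ('.', '.')]
```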
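Slide 9 reaches extraction, but the slide text does not show how entities are pulled out. One simple heuristic, sketched here as an assumption, treats each run of consecutive proper nouns (`NNP`) from the tagging step as a candidate entity. NLTK's trained alternative is `ne_chunk`; `extract_proper_noun_phrases` is a hypothetical helper:

```python
def extract_proper_noun_phrases(tagged):
    """Group consecutive NNP-tagged tokens into candidate entity strings."""
    entities, current = [], []
    for word, tag in tagged:
        if tag == "NNP":
            current.append(word)
        elif current:
            # A non-NNP token ends the current run of proper nouns.
            entities.append(" ".join(current))
            current = []
    if current:  # flush a run that ends the input
        entities.append(" ".join(current))
    return entities


tagged = [("Ada", "NNP"), ("Lovelace", "NNP"), ("met", "VBD"),
          ("Charles", "NNP"), ("Babbage", "NNP"), (".", ".")]
print(extract_proper_noun_phrases(tagged))
# ['Ada Lovelace', 'Charles Babbage']
```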
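Slide 12 strips HTML from feed entries before analysis. Its original `nltk.clean_html` call no longer works in NLTK 3; if adding BeautifulSoup is undesirable, a standard-library sketch built on `html.parser.HTMLParser` does the same job for simple markup. `strip_html` and `_TextCollector` are hypothetical names, and this version does not skip `<script>` or `<style>` contents:

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text between tags; convert_charrefs decodes &amp; etc."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(markup):
    parser = _TextCollector()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    # Join fragments with spaces, then collapse runs of whitespace.
    return " ".join(" ".join(parser.parts).split())


print(strip_html("<p>Fish &amp; chips</p><p>are <b>great</b></p>"))
# Fish & chips are great
```

Joining fragments with spaces keeps words from adjacent paragraphs apart, at the cost of splitting a word that is interrupted by an inline tag (`<b>great</b>ly` becomes "great ly").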
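Slide 13 computes per-post statistics from an NLTK `FreqDist`. Because `FreqDist` subclasses `collections.Counter` in NLTK 3, the same numbers can be sketched with the standard library alone. `basic_stats` and its result keys are hypothetical, and lowercasing words before counting is an added assumption that the slide's code does not make:

```python
from collections import Counter


def basic_stats(words, stop_words):
    """Word counts, vocabulary size, hapaxes and top non-stop words."""
    fdist = Counter(w.lower() for w in words)  # lowercasing is an assumption
    return {
        "num_words": sum(fdist.values()),
        "num_unique_words": len(fdist),
        # Hapaxes are words that appear exactly once.
        "num_hapaxes": sum(1 for count in fdist.values() if count == 1),
        # most_common() sorts by descending count (ties keep first-seen order).
        "top_10_sans_stop_words": [
            (w, c) for w, c in fdist.most_common() if w not in stop_words
        ][:10],
    }


print(basic_stats("The cat and the hat".split(), stop_words={"the", "and"}))
# {'num_words': 5, 'num_unique_words': 4, 'num_hapaxes': 3, 'top_10_sans_stop_words': [('cat', 1), ('hat', 1)]}
```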
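Slide 15 describes two ways to pick summary sentences from `(sentence_index, score)` pairs. A standard-library sketch of both, assuming scores were already computed (`summarize_by_threshold` and `summarize_top_n` are hypothetical names). It uses `statistics.pstdev`, the population standard deviation that matches `numpy.std`'s default, and a descending sort so that `n=0` yields an empty summary instead of the whole list:

```python
import statistics


def summarize_by_threshold(scored_sentences, k=0.5):
    """Approach 1: keep sentences scoring above mean + k * std dev."""
    scores = [score for _, score in scored_sentences]
    # pstdev is the population std dev, matching numpy.std's default (ddof=0).
    cutoff = statistics.mean(scores) + k * statistics.pstdev(scores)
    return [(idx, score) for idx, score in scored_sentences if score > cutoff]


def summarize_top_n(scored_sentences, n):
    """Approach 2: keep the n best-scoring sentences, in document order."""
    top = sorted(scored_sentences, key=lambda s: s[1], reverse=True)[:n]
    return sorted(top, key=lambda s: s[0])


scored = [(0, 1.0), (1, 2.0), (2, 3.0), (3, 10.0)]
print(summarize_by_threshold(scored))  # [(3, 10.0)]
print(summarize_top_n(scored, 2))      # [(2, 3.0), (3, 10.0)]
```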
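Slide 17 gives Luhn's score as (significant words)² / (total words). Luhn applies that ratio to clusters, which are runs of significant words separated by fewer than a threshold number of positions, and a sentence takes its best cluster's score. A minimal sketch, assuming the set of significant words is already known and using a gap threshold of 3 as an illustrative default (`luhn_sentence_score` is a hypothetical name):

```python
def luhn_sentence_score(words, significant_words, cluster_threshold=3):
    """Score one sentence with Luhn's cluster heuristic.

    words: the sentence's tokens; significant_words: a set of lowercase
    words considered important (how they are chosen is outside this sketch).
    """
    positions = [i for i, w in enumerate(words) if w.lower() in significant_words]
    if not positions:
        return 0.0
    # A new cluster starts when the gap to the previous significant word
    # reaches cluster_threshold.
    clusters = [[positions[0]]]
    for pos in positions[1:]:
        if pos - clusters[-1][-1] < cluster_threshold:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    best = 0.0
    for cluster in clusters:
        significant = len(cluster)
        total = cluster[-1] - cluster[0] + 1  # words spanned by the cluster
        best = max(best, significant * significant / total)
    return best


print(luhn_sentence_score("NLTK makes NLP with Python easy".split(),
                          {"nltk", "nlp", "python"}))
# 1.8
```

Here all three significant words fall in one cluster spanning five words, so the score is 3² / 5 = 1.8.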
