6. PROCESSING RAW TEXT
WITH NLTK
(http://www.nltk.org/book/)
After extracting the text from an HTML document on the web, I extract keywords from the text with NLTK.
7. EXAMPLES
"A 'multicultural Korean society' that turns its back on migrant children"
(http://www.huffingtonpost.kr/kyongwhan-ahn/story_b_6927970.html?utm_hp_ref=korea)
[('', 65), ('(', 9), (')', 9), ('한다', 6), ("'", 6), ('있다', 5),
('아동', 5), ('큰', 5), ('모든', 5), ('일', 5), ('국제', 4),
('대한민국', 4), ('나라', 4), ('땅', 4), ('국제사회', 4),
('인권', 4), ('의원', 3), ('세계', 3), ('여의', 3), ('수', 3),
('안', 3), ('강한', 3), ('불문', 2), ('이주', 2), ('법무부', 2)]
8. 1. HTML TO RAW TEXT
# -*- coding: utf-8 -*-
from urllib import request
import nltk, re, pprint
from nltk import word_tokenize
from bs4 import BeautifulSoup
url = 'http://www.huffingtonpost.kr/kyongwhan-ahn/story_b_6927970.html?utm_hp_ref=korea'
html = request.urlopen(url).read().decode('utf8')
raw = BeautifulSoup(html, 'html.parser').get_text()
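The get_text() step above can also be sketched with the standard library alone; a minimal stand-in using html.parser, in case BeautifulSoup is not installed (this is an illustration, not the slides' actual code):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects character data and ignores tags - a rough stand-in for
    BeautifulSoup's get_text(). Unlike BeautifulSoup, it does not skip
    <script> or <style> content."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def text(self):
        return ''.join(self.chunks)

p = TextExtractor()
p.feed('<html><body><p>이주 아동</p></body></html>')
print(p.text())  # 이주 아동
```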
10. 2. RAW TEXT TO LIST
raw = raw[30123:32364]
print(type(raw))
-> <class 'str'>
tokens = word_tokenize(raw)
print(type(tokens))
-> <class 'list'>
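The (word, count) pairs on the EXAMPLES slide look like the output of a frequency count over these tokens. A minimal sketch with collections.Counter, whose most_common() matches the interface of nltk.FreqDist (the sample tokens here are illustrative, not the article's):

```python
from collections import Counter

# Sample tokens standing in for the tokenized article text.
tokens = ['아동', '인권', '아동', '국제', '아동']

# Counter/FreqDist both return (word, count) pairs sorted by frequency.
freq = Counter(tokens)
print(freq.most_common(2))  # [('아동', 3), ('인권', 1)]
```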
11. 3. LIST TO VOCABULARIES
words = Trial.NounExtractor(token)
13. 3. LIST TO VOCABULARIES
token = ['철수는', '동생에게', '자전거를', '빌려주었다']
words = Trial.NounExtractor(token)
words = ['철수', '동생', '자전거', '빌려주었다']
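The implementation of Trial.NounExtractor is not shown in these slides; a hypothetical sketch of what such an extractor might do is to strip common Korean case particles from the end of each token. The particle list and the suffix rule below are assumptions for illustration only, reproducing the example above:

```python
# Hypothetical particle list - NOT the real Trial.NounExtractor logic.
PARTICLES = ('는', '은', '이', '가', '을', '를', '에게', '에서')

def noun_extract(tokens):
    """Strip a trailing case particle from each token, longest match first.
    Tokens without a listed particle (e.g. verbs) pass through unchanged."""
    out = []
    for t in tokens:
        for p in sorted(PARTICLES, key=len, reverse=True):
            if t.endswith(p) and len(t) > len(p):
                t = t[:-len(p)]
                break
        out.append(t)
    return out

token = ['철수는', '동생에게', '자전거를', '빌려주었다']
print(noun_extract(token))  # ['철수', '동생', '자전거', '빌려주었다']
```

A real Korean morphological analyzer (e.g. the taggers in the KoNLPy package) would handle irregular particles and conjugation far more robustly than this suffix rule.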