홍은기 (EunGi Hong)
PYTHON LEARNING FOR NATURAL LANGUAGE PROCESSING (2ND)
CONTENTS
1. Learning Sequence
2. Lists and Functions
3. Loops
4. Processing Raw Text with NLTK
LEARNING SEQUENCE
(WWW.CODECADEMY.COM)
• 1. Python Syntax
• 2. Strings and Console Output
• 3. Conditionals and Control Flow
• 4. Functions
• 5. Lists & Dictionaries
• 6. Student Becomes the Teacher (test)
• 7. Lists and Functions
• 8. Loops
• 9. Exam Statistics (test)
• 10. Advanced Topics in Python
• 11. Introduction to Classes
• 12. File Input and Output
LISTS AND FUNCTIONS
LOOPS
PROCESSING RAW TEXT WITH NLTK
(http://www.nltk.org/book/)
After extracting the text of an HTML document on the web, I extracted keywords from that text with NLTK.
EXAMPLES
이주 아동 외면하는 '다문화 한국사회' ("The 'Multicultural Korean Society' That Turns Its Back on Migrant Children")
(http://www.huffingtonpost.kr/kyongwhan-ahn/story_b_6927970.html?utm_hp_ref=korea)
[('', 65), ('(', 9), (')', 9), (' 한다 ', 6), ("'", 6), (' 있다 ', 5),
(' 아동 ', 5), (' 큰 ', 5), (' 모든 ', 5), (' 일 ', 5), (' 국제 ', 4),
(' 대한민국 ', 4), (' 나라 ', 4), (' 땅 ', 4), (' 국제사회 ', 4),
(' 인권 ', 4), (' 의원 ', 3), (' 세계 ', 3), (' 여의 ', 3), (' 수 ', 3),
(' 안 ', 3), (' 강한 ', 3), (' 불문 ', 2), (' 이주 ', 2), (' 법무부 ', 2)]
1. HTML TO RAW TEXT
# -*- coding: utf-8 -*-
from urllib import request

import nltk
from bs4 import BeautifulSoup
from nltk import FreqDist, word_tokenize

url = "http://www.huffingtonpost.kr/kyongwhan-ahn/story_b_6927970.html?utm_hp_ref=korea"
# Download the page and decode the response bytes as UTF-8
html = request.urlopen(url).read().decode('utf8')
# Parse the HTML (naming the parser avoids bs4's "no parser specified" warning) and keep only its text
raw = BeautifulSoup(html, 'html.parser').get_text()
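The slides use BeautifulSoup for this step. As a hedged, dependency-free alternative (not the author's code), the text can also be collected with the standard library's `html.parser.HTMLParser`. `TextExtractor` and `html_to_text` are hypothetical names, and skipping `<script>`/`<style>` contents and joining the non-blank pieces with a newline are choices made for this sketch.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True (the default) turns entities such as &amp; into "&"
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html, separator="\n"):
    """Return the non-blank text pieces of `html`, stripped and joined by `separator`."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return separator.join(p.strip() for p in parser.parts if p.strip())


sample = (
    "<html><head><title>기사</title><style>p { color: red; }</style></head>"
    "<body><p>이주 아동 &amp; 인권</p><script>var x = 1;</script></body></html>"
)
print(html_to_text(sample))  # prints "기사" and "이주 아동 & 인권" on two lines
```

For the article page, `raw = html_to_text(html)` could take the place of the BeautifulSoup line; the result still contains navigation and other page text around the article body.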
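2. RAW TEXT TO LIST
# Keep only the article body (hard-coded character offsets that apply to this page only)
raw = raw[30123:32364]
print(type(raw))
-> <class 'str'>
# word_tokenize requires NLTK's Punkt tokenizer data, e.g. nltk.download('punkt')
tokens = word_tokenize(raw)
print(type(tokens))
-> <class 'list'>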
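The slice `raw[30123:32364]` only fits this one page; any change to the page layout shifts the offsets. A sketch of a sturdier alternative cuts the text between two marker strings instead. `slice_between` and `simple_tokenize` are hypothetical helpers, the markers below are made up (real ones would have to be chosen by inspecting the page), and `simple_tokenize` is a rough regex stand-in for `word_tokenize` (handy when the Punkt data is not installed), not NLTK's algorithm.

```python
import re


def slice_between(text, start_marker, end_marker):
    """Return the stripped text between two marker strings (hypothetical helper).

    str.index raises ValueError when a marker is missing, so a changed page
    fails loudly instead of silently yielding the wrong span.
    """
    start = text.index(start_marker) + len(start_marker)
    end = text.index(end_marker, start)
    return text[start:end].strip()


def simple_tokenize(text):
    """Split text into runs of word characters and single punctuation marks.

    A rough regex stand-in for nltk.word_tokenize, not NLTK's algorithm.
    In Python 3, \\w matches Hangul, so Korean words stay whole.
    """
    return re.findall(r"\w+|[^\w\s]", text)


page = "메뉴 [본문] 모든 아동은 (국제사회) 보호를 받는다. [끝] 광고"
body = slice_between(page, "[본문]", "[끝]")
print(simple_tokenize(body))
```

Like `word_tokenize`, this leaves particles attached to Korean nouns (아동은, 보호를), which is why a separate noun-extraction step follows.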
3. LIST TO VOCABULARIES
words = Trial.NounExtractor(tokens)
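For example, for the sentence 철수는 동생에게 자전거를 빌려주었다 ("Cheolsu lent a bicycle to his younger sibling"), the noun extractor removes the particles (는, 에게, 를) attached to the nouns and leaves the verb 빌려주었다 unchanged:
tokens = ['철수는', '동생에게', '자전거를', '빌려주었다']
words = Trial.NounExtractor(tokens)
-> ['철수', '동생', '자전거', '빌려주었다']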
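`Trial.NounExtractor` is not shown in the slides, so how it works is unknown. To illustrate the input/output behavior of the example above, here is a toy sketch that strips one trailing case particle from each token. The particle list and the function names are hypothetical, and plain suffix matching is only an approximation: it wrongly cuts nouns that happen to end in a particle-like syllable (아이 "child" becomes 아), which a real Korean morphological analyzer would handle.

```python
# Hypothetical, deliberately tiny list of Korean case particles (josa).
# Sorted longest first so that "에게" is tried before "에".
PARTICLES = sorted(
    ["은", "는", "이", "가", "을", "를", "에", "에게", "에서", "의", "도", "와", "과", "로", "으로"],
    key=len,
    reverse=True,
)


def strip_particle(eojeol):
    """Remove one trailing particle, if any, keeping at least one syllable."""
    for particle in PARTICLES:
        if eojeol.endswith(particle) and len(eojeol) > len(particle):
            return eojeol[: -len(particle)]
    return eojeol


def extract_nouns(tokens):
    """Toy stand-in for Trial.NounExtractor: strip particles token by token."""
    return [strip_particle(t) for t in tokens]


tokens = ["철수는", "동생에게", "자전거를", "빌려주었다"]
print(extract_nouns(tokens))  # ['철수', '동생', '자전거', '빌려주었다']
```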
4. FREQUENCY DISTRIBUTION
# Count each word and print the 25 most frequent (word, count) pairs
fdist = FreqDist(words)
print(fdist.most_common(25))
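In both example outputs the top entry is an empty string (65 and 63 occurrences), and punctuation such as '(' and '!' also ranks high, although neither is a keyword. NLTK's `FreqDist` is a subclass of `collections.Counter`, so the counting can be sketched with `Counter` plus a filter that drops blank and punctuation-only tokens. The filter rule and the names `is_keyword_candidate` and `top_keywords` are assumptions for this sketch, and it does not remove frequent function words such as 수 or 것.

```python
import unicodedata
from collections import Counter


def is_keyword_candidate(word):
    """Reject blank tokens and tokens made up only of punctuation characters."""
    stripped = word.strip()
    if not stripped:
        return False
    # Unicode categories starting with "P" (Pc, Pd, Ps, Pe, Pi, Pf, Po) are punctuation
    return not all(unicodedata.category(ch).startswith("P") for ch in stripped)


def top_keywords(words, n=25):
    """Count stripped candidate words and return the n most common (word, count) pairs."""
    counts = Counter(w.strip() for w in words if is_keyword_candidate(w))
    return counts.most_common(n)


words = ["", "(", ")", "아동", " 아동 ", "인권", "'", "아동", "인권", "국제", "!"]
print(top_keywords(words, 2))  # [('아동', 3), ('인권', 2)]
```

With NLTK, the same filter can be applied before counting: `FreqDist(w.strip() for w in words if is_keyword_candidate(w)).most_common(25)`.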
EXAMPLES
나를 끌어내린 롯데월드 ("The Lotte World That Dragged Me Off")
(http://www.huffingtonpost.kr/seungjoon-ahn/story_b_6928016.html?utm_hp_ref=korea)
[('', 63), (' 그 ', 19), (' 것 ', 12), (' 우리 ', 10), ('!', 8), (' 없 ', 8),
(' 놀이기구 ', 8), (' 직원 ', 8), (' 수 ', 8), (' 시각장애인 ', 8), (' 안 ', 6),
(' 내 ', 5), (' 있었다 ', 5), (' 않 ', 5), (' 매뉴얼 ', 5), (' 근거 ', 5), (' 사람 ', 5),
(' 롯데월드 ', 5), (' 다른 ', 5), (' 있던 ', 4), (' 한 ', 4), (' 장애인 ', 4),
(' 설명 ', 4), (' 때 ', 4), (' 상황 ', 4)]
POS TAGGED
Thank_VB You_PRP !_.
