
Web Scraping the Ministry of Land, Infrastructure and Transport (MOLIT) Real Transaction Price Site with R

5,194 views

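Using R to retrieve the following through an Open API or web scraping:

- Ministry of Land, Infrastructure and Transport (MOLIT) housing transaction data
- Federal Reserve Bank of St. Louis (FRED) data
- Bank of Korea Economic Statistics System (ECOS) data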

Published in: Data & Analytics


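  1. Overview
     • Definitions of Open API and web scraping
     • Introduction to the essential technologies for implementation
     • Retrieving FRED and ECOS data through APIs in R
     • Retrieving apartment transaction prices by scraping in R
     • Model analyses using the retrieved data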
  2. Open API and web scraping
     • Traditional ways of obtaining data
  3. Open API and web scraping

     Open API (often written OpenAPI) describes sets of technologies that let websites interact with each other using REST, SOAP, JavaScript, and other web technologies. Its use is not limited to web-based applications, but it is increasingly common in so-called Web 2.0 applications.

     Web scraping (also called web harvesting or web data extraction) is a software technique for extracting information from websites. Such programs usually simulate human browsing of the World Wide Web, either by implementing low-level Hypertext Transfer Protocol (HTTP) requests or by embedding a full web browser such as Internet Explorer or Mozilla Firefox.
  4. Open API and web scraping
  5. Open API
  6. Web scraping

          http://rt.molit.go.kr/rtApt.do?cmd=getTradeAptLocal&dongCode=1168010600&danjiCode=ALL&srhYear=2014&srhPeriod=2&gubunRadio2=1
  7. Essential technologies: R
     • R base packages
     • RStudio
  8. Essential technologies: JSON (JavaScript Object Notation)

     The records as XML:

          <root>
            <ZFPMember> <name>문**</name> </ZFPMember>
            <ZFPMember> <name>박**</name> </ZFPMember>
            <ZFPMember> <name>김**</name> </ZFPMember>
            <ZFPMember> <name>최**</name> </ZFPMember>
          </root>

     The same records as JSON:

          {
            "ZFPMember": [
              {"name": "문**"},
              {"name": "박**"},
              {"name": "김**"},
              {"name": "최**"}
            ]
          }
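  9. Retrieving ECOS and FRED data through APIs in R

     Five steps for retrieving data from an API:
     ① Check whether an API key is required
     ② Install the required package (jsonlite)
     ③ Build the query
     ④ Retrieve the data
     ⑤ Parse the response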
  10. Checking whether an API key is required
  11. Installing the required package (jsonlite)

          > install.packages("jsonlite")
          > library(jsonlite)
  12. Building the query (FRED)

          http://api.stlouisfed.org/fred/series/observations?series_id=CPIAUCSL&api_key=b55d00cc4e7ea4483038c2f6edad____&file_type=json
  13. Retrieving the data (FRED)

          library(jsonlite)

          series_id <- "CPIAUCSL"
          api_key   <- "b55d00cc4e7ea4483038c2f6edad____"
          file_type <- "json"

          # Assemble the query URL from its parameters
          url <- paste0("http://api.stlouisfed.org/fred/series/observations",
                        "?series_id=", series_id,
                        "&api_key=", api_key,
                        "&file_type=", file_type)

          # Read the raw JSON response
          raw.data <- readLines(url, warn = FALSE, encoding = "UTF-8")
  14. Processing the data (FRED)

          > dat <- fromJSON(raw.data)
          > str(dat)
          List of 13
           $ realtime_start   : chr "2014-06-12"
           $ realtime_end     : chr "2014-06-12"
           $ observation_start: chr "1776-07-04"
             :
           $ limit            : num 1e+05
           $ observations     :'data.frame': 808 obs. of 4 variables:
            ..$ realtime_start: chr [1:808] "2014-06-12" "2014-06-12" ...
            ..$ realtime_end  : chr [1:808] "2014-06-12" "2014-06-12" ...
            ..$ date          : chr [1:808] "1947-01-01" "1947-02-01" ...
            ..$ value         : chr [1:808] "21.48" "21.62" "22.0" "22.0" ...
          > dat$observations$value
  15. Building the query (ECOS)

          http://ecos.bok.or.kr/api/StatisticTableList/SCES3Y78SI__/xml/kr/1/10

  16. Building the query (ECOS)

          http://ecos.bok.or.kr/api/StatisticItemList/sample/xml/kr/1/10/021Y123/
          http://ecos.bok.or.kr/api/StatisticSearch/SCES3Y78SI__/xml/kr/1/1000/021Y123/MM/196501/201405/0/
  17. Retrieving the data (ECOS)

          library(jsonlite)

          # Each path segment carries its own trailing slash
          api_key    <- "SCES3Y78SI4P/"
          file_type  <- "json/"
          lang_type  <- "kr/"
          start_no   <- "1/"
          end_no     <- "100/"
          stat_code  <- "021Y123/"
          cycle_type <- "MM/"
          start_date <- "196501/"
          end_date   <- "201405/"
          item_no    <- "0"

          url <- paste0("http://ecos.bok.or.kr/api/StatisticSearch/",
                        api_key, file_type, lang_type, start_no, end_no,
                        stat_code, cycle_type, start_date, end_date, item_no)
          raw.data <- readLines(url, warn = FALSE, encoding = "UTF-8")
  18. Processing the data (ECOS)

          > raw.data <- readLines(url, warn = FALSE, encoding = "UTF-8")
          > dat <- fromJSON(raw.data)
          > str(dat)
          List of 1
           $ StatisticSearch:List of 2
            ..$ list_total_count: num 25
            ..$ row             :'data.frame': 10 obs. of 8 variables:
            .. ..$ UNIT_NAME : chr [1:10] "십억원 " "십억원 ..
            .. ..$ STAT_NAME : chr [1:10] "1.1.주요 통화금융지 ..
            .. ..$ STAT_CODE : chr [1:10] "010Y002" "010Y002" "010Y002" ..
            .. ..$ ITEM_NAME1: chr [1:10] "화폐발행잔액(말잔)" "화폐발 ..
            .. ..$ ITEM_NAME2: chr [1:10] " " " " " " " ..
            .. ..$ DATA_VALUE: chr [1:10] "49777.5" "50528" "50226. ..
            .. ..$ ITEM_NAME3: chr [1:10] " " " " " " " " ..
            .. ..$ TIME      : chr [1:10] "201204" "201205"
          > dat$StatisticSearch$row$DATA_VALUE
  19. Analyzing the data (FRED)

          library(jsonlite)
          library(zoo)

          # consumer price index, unemployment rate, policy rate
          lst_series <- c("CPIAUCSL", "UNRATE", "FEDFUNDS")
          api_key <- "b55d00cc4e7ea4483038c2f6edad____"
          file_type <- "json"

          for (i in seq_along(lst_series)) {
            url <- paste0("http://api.stlouisfed.org/fred/series/observations",
                          "?series_id=", lst_series[i],
                          "&api_key=", api_key,
                          "&file_type=", file_type)
            raw.data <- readLines(url, warn = FALSE, encoding = "UTF-8")
            dat <- fromJSON(raw.data)
            temp <- zoo(as.numeric(dat$observations$value),
                        as.Date(dat$observations$date))
            if (i == 1) {
              ts <- temp
            } else {
              # merge on date and carry the last observation forward over gaps
              ts <- na.locf(merge(ts, temp))
              colnames(ts)[i] <- lst_series[i]
            }
          }
          colnames(ts)[1] <- lst_series[1]  # set the first column name
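  20. Analyzing the data (FRED)

          # Remove rows with NA values
          ts <- ts[!is.na(ts[, 3]), ]

          # First difference
          ts.diff1 <- diff(ts, lag = 1)

          # ACF (autocorrelation) plot
          acf(as.numeric(ts.diff1[, 1]), main = colnames(ts)[1])

          # Change relative to the previous period
          # (divides by the lagged level; the slide divided by the current level)
          ts.rate <- ts.diff1 / lag(ts, k = -1)

          # Convert to a data frame
          df <- data.frame(ts)

          # Draw the plot
          plot(x = as.Date(rownames(df)), y = df[, 1], type = "l",
               xlab = "date", ylab = colnames(df)[1])

          # Regression
          summary(lm(CPIAUCSL ~ UNRATE + FEDFUNDS, data = df))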
  21. Web scraping (MOLIT)
  22. Web scraping (MOLIT)

          library(jsonlite)

          dongCode    <- "1168010600"
          danjiCode   <- "ALL"
          srhYear     <- "2014"
          srhPeriod   <- "1"
          gubunRadio2 <- "1"

          url <- paste0("http://rt.molit.go.kr/rtApt.do?cmd=getTradeAptLocal&dongCode=",
                        dongCode, "&danjiCode=", danjiCode, "&srhYear=", srhYear,
                        "&srhPeriod=", srhPeriod, "&gubunRadio2=", gubunRadio2)
          raw.data <- readLines(url, warn = FALSE, encoding = "UTF-8")
          dat <- fromJSON(raw.data)
          str(dat)

          # Keep the four fields of interest as named columns
          df <- data.frame(APT_CODE = dat$detailList$APT_CODE,
                           AREA     = dat$detailList$AREA,
                           MONTH    = dat$detailList$MONTH,
                           SUM_AMT  = dat$detailList$SUM_AMT)
          write.csv(df, file = "aptTrans.csv")
  23. Web scraping (MOLIT): bulk retrieval

          library(jsonlite)

          dongCode    <- "1168010600"
          danjiCode   <- "ALL"
          gubunRadio2 <- "1"

          dft <- data.frame()
          for (i in 2006:2014) {    # year
            for (j in 1:4) {        # search period within the year
              url <- paste0("http://rt.molit.go.kr/rtApt.do?cmd=getTradeAptLocal&dongCode=",
                            dongCode, "&danjiCode=", danjiCode, "&srhYear=", i,
                            "&srhPeriod=", j, "&gubunRadio2=", gubunRadio2)
              raw.data <- readLines(url, warn = FALSE, encoding = "UTF-8")
              dat <- fromJSON(raw.data)
              df <- data.frame(APT_CODE = dat$detailList$APT_CODE,
                               AREA     = dat$detailList$AREA,
                               MONTH    = dat$detailList$MONTH,
                               SUM_AMT  = dat$detailList$SUM_AMT)
              dft <- rbind(dft, df)  # append this period's rows
            }
          }
  24. But, quantmod: provides data from Yahoo! Finance, FRED, Google Finance, Oanda, and The Currency Site through function calls - http://www.quantmod.com/
  25. And, Quandl: provides data from more than 9 million datasets through function calls - http://www.quandl.com/
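
The JSON structure on slide 8 and the parsing step on slide 14 can be tried without a network connection or an API key. The sketch below is a minimal illustration, assuming the jsonlite package is installed. The `sample_json` payload is hand-written to mimic the shape of the response shown on slide 14; the third observation, with `"."` standing in for a missing value, is an assumption added for illustration.

```r
library(jsonlite)

# Hand-written payload shaped like the FRED response on slide 14.
# The "." entry is an assumed placeholder for a missing observation.
sample_json <- '{
  "observation_start": "1776-07-04",
  "observations": [
    {"date": "1947-01-01", "value": "21.48"},
    {"date": "1947-02-01", "value": "21.62"},
    {"date": "1947-03-01", "value": "."}
  ]
}'

dat <- fromJSON(sample_json)

# An array of same-shaped JSON objects is simplified into a data frame.
obs <- dat$observations

# Values arrive as strings; non-numeric entries become NA on conversion.
obs$value <- suppressWarnings(as.numeric(obs$value))
obs$date  <- as.Date(obs$date)

str(obs)
```

Converting the columns right after parsing keeps later steps such as `zoo()` or `lm()` from silently working on character data.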

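The query building on slides 12 and 22 and the nested loop on slide 23 can also be written with a small helper and `lapply`. This is only a sketch under stated assumptions: `build_query`, `fetch_quarter`, and `fake_reader` are hypothetical names introduced here, the real reader assumes the endpoint still returns JSON with a `detailList` field as on slide 22, and `fake_reader` is an offline stand-in so the example runs without network access.

```r
# Hypothetical helper: build a query URL from a base address and named parameters.
build_query <- function(base, params) {
  encoded <- vapply(params,
                    function(p) utils::URLencode(as.character(p), reserved = TRUE),
                    character(1))
  pairs <- paste0(names(params), "=", encoded)
  # Use "&" when the base already carries a query string, "?" otherwise.
  sep <- if (grepl("?", base, fixed = TRUE)) "&" else "?"
  paste0(base, sep, paste(pairs, collapse = "&"))
}

base <- "http://rt.molit.go.kr/rtApt.do?cmd=getTradeAptLocal"

# One row per (year, period) combination instead of nested for-loops.
grid <- expand.grid(srhPeriod = 1:4, srhYear = 2006:2014)

urls <- mapply(function(y, q) {
  build_query(base, list(dongCode = "1168010600", danjiCode = "ALL",
                         srhYear = y, srhPeriod = q, gubunRadio2 = "1"))
}, grid$srhYear, grid$srhPeriod)

# Hypothetical fetcher: returns NULL instead of stopping when a request fails.
fetch_quarter <- function(url,
                          reader = function(u) jsonlite::fromJSON(u)$detailList) {
  tryCatch(reader(url), error = function(e) NULL)
}

# Offline stand-in for the real reader (assumption: not real MOLIT data).
fake_reader <- function(u) {
  data.frame(url = u, SUM_AMT = "0", stringsAsFactors = FALSE)
}

parts <- lapply(urls[1:3], fetch_quarter, reader = fake_reader)
parts <- Filter(Negate(is.null), parts)  # drop failed requests
dft <- do.call(rbind, parts)             # bind all periods in one step
```

Collecting the pieces in a list and binding once avoids growing the data frame inside the loop, and a failed request drops one period rather than stopping the whole run.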