Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
5
2020 2 1
1 1
2 1
3 Excel 2
3.1 GDP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
3.2 Excel . ....
5
1
• Excel, CSV, Stata JSON, XML
•
•
• R
2
• Excel : Microsoft Excel .xls .xlsx
• CSV : Comma Separated Values
• Stata : ...
3.1 GDP 5
3.1 GDP
GDP
(1) =⇒ =⇒
(2) 23 2008SNA 2018 30
2011 2008SNA
I 3
3.2 Excel 5
(3) IV. 1
30ffm1rn_jp.xlsx
3.2 Excel
R R
readxl Excel
readxl
> install.packages("readxl")
readxl
> library(rea...
3.3 Excel 5
> gdp <- read_excel(" .xls") #
3.3.1
R R working
directory R getwd()
> getwd() #
[1] "/Users/shito/Documents/g...
3.3 Excel 5
> gdp <- read_excel("data/30ffm1rn_jp.xlsx", skip=6, col_names=TRUE)
> dim(gdp) #
[1] 62 26
gdp 1–3
1–4
> gdp[...
3.4 5
$ 2010: num 286647 280524 279478 1983 942 ...
$ 2011: num 288797 282050 280876 1864 690 ...
$ 2012: num 293397 28611...
3.4 5
1 Excel
NA read_excel()
trim=TRUE GDP
> gdp[,1] # Excel NA
3.4.1
NA
> gdp <- gdp[!is.na(gdp[,1]),] # NA
> dim(gdp) #...
3.4 5
> 1:dim(gdp)[1]-3 # (1:dim(gdp)[1]) - 3
[1] -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
[...
3.4 5
Excel
cn
> gdp <- gdp[cn!=" ", ]
> cn <- cn[cn!=" "]
> dim(gdp) # 2 55
[1] 52 26
> (1:length(cn))[duplicated(cn)] #
...
5
> paste("Y", colnames(gdp), sep="") #
[1] "Y1994" "Y1995" "Y1996" "Y1997" "Y1998" "Y1999" "Y2000" "Y2001" "Y2002" "Y2003...
5
save() load()
.RData
> save(gdp, file="ch05-GDP.RData") # gdp ch05-GDP.RData
> save(gdp, cn, file="ch05-GDP.RData") # gd...
5
> rownames(gdp)[c(37, 40)]
[1] " " " "
2015
> gdp[c(" ", " "), "Y2015"]
[1] 517223.3 524004.4
Excel
3
GDE
GDP GDP
> max(...
5
1995 2000 2005 2010 2015
440000460000480000500000520000
Year
GDP
plot() plot(x, y) x y
x 1994 2018 1994:2018
y GDP y
2 t...
7.1 5
7.1
(1) URL
http://www.nstac.go.jp/services/ippan-microdata.html
(2)
I 15
7.2 CSV 5
(3) 21 ip-
pan_2009zensho.zip
(4) ippan_2009zensho
2 csv xls ippan_2009zensho_z_dataset.csv
7.2 CSV
CSV read.csv...
5
head()
tail()
> head(microdata) #
> tail(microdata) #
RData
> save(microdata, file="Microdata.RData") #
8 Stata
Stata
St...
5
PPP GDP
GDP GDP
GDP GDP per capita
(1) World Bank GDP ppp per capita
http://data.worldbank.org/indicator/NY.GDP.PCAP.PP....
Upcoming SlideShare
Loading in …5
×

of

第5回 様々なファイル形式の読み込みとデータの書き出し Slide 1 第5回 様々なファイル形式の読み込みとデータの書き出し Slide 2 第5回 様々なファイル形式の読み込みとデータの書き出し Slide 3 第5回 様々なファイル形式の読み込みとデータの書き出し Slide 4 第5回 様々なファイル形式の読み込みとデータの書き出し Slide 5 第5回 様々なファイル形式の読み込みとデータの書き出し Slide 6 第5回 様々なファイル形式の読み込みとデータの書き出し Slide 7 第5回 様々なファイル形式の読み込みとデータの書き出し Slide 8 第5回 様々なファイル形式の読み込みとデータの書き出し Slide 9 第5回 様々なファイル形式の読み込みとデータの書き出し Slide 10 第5回 様々なファイル形式の読み込みとデータの書き出し Slide 11 第5回 様々なファイル形式の読み込みとデータの書き出し Slide 12 第5回 様々なファイル形式の読み込みとデータの書き出し Slide 13 第5回 様々なファイル形式の読み込みとデータの書き出し Slide 14 第5回 様々なファイル形式の読み込みとデータの書き出し Slide 15 第5回 様々なファイル形式の読み込みとデータの書き出し Slide 16 第5回 様々なファイル形式の読み込みとデータの書き出し Slide 17 第5回 様々なファイル形式の読み込みとデータの書き出し Slide 18
Upcoming SlideShare
What to Upload to SlideShare
Next
Download to read offline and view in fullscreen.

1 Like

Share

Download to read offline

第5回 様々なファイル形式の読み込みとデータの書き出し

Download to read offline

西南学院大学経済学部 演習1 講義ノート
講義ページ: http://courses.wshito.com/semi1/2020-datascience/index.html

第5回 様々なファイル形式の読み込みとデータの書き出し

  1. 1. 5 2020 2 1 1 1 2 1 3 Excel 2 3.1 GDP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 3.2 Excel . . . . . . . . . . . . . . . . . . 3 3.3 Excel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 3.3.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 3.3.2 . . . . . . . . . . . . . . . . . . . . . . . . . 4 3.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3.4.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.4.2 . . . . . . . . . . . . . . . . . . 8 3.4.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3.4.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 4 10 5 11 6 —GDP 11 7 CSV 13 7.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 7.2 CSV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 8 Stata 16 9 16 1
  2. 2. 5 1 • Excel, CSV, Stata JSON, XML • • • R 2 • Excel : Microsoft Excel .xls .xlsx • CSV : Comma Separated Values • Stata : Stata Stata • JSON: JavaScript JavaScript • XML: XML Extensible Markup Language Web HTML Hyper Text Markup Language XML R Excel CSV Stata 3 3 Excel GDP Excel GDP R I 2
  3. 3. 3.1 GDP 5 3.1 GDP GDP (1) =⇒ =⇒ (2) 23 2008SNA 2018 30 2011 2008SNA I 3
  4. 4. 3.2 Excel 5 (3) IV. 1 30ffm1rn_jp.xlsx 3.2 Excel R R readxl Excel readxl > install.packages("readxl") readxl > library(readxl) Excel read_excel() > help(read_excel) 3.3 Excel Excel Excel 3 read_excel() 1 1 read_excel() Excel I 4
  5. 5. 3.3 Excel 5 > gdp <- read_excel(" .xls") # 3.3.1 R R working directory R getwd() > getwd() # [1] "/Users/shito/Documents/git-repositories/R-programming-lecnote/handout" setwd() > setwd("/Users/shito") # /Users/shito Mac Windows E seminer > setwd("E:/seminer") 30ffm1rn_jp.xlsx Windows E > gdp <- read_excel("E:/ /30ffm1rn_jp.xlsx") # Windows E 30ffm1rn jp.xlsx data > gdp <- read_excel("data/30ffm1rn_jp.xlsx") 3.3.2 Excel 8 7 6 skip=6 1 col_names TRUE col_names TRUE I 5
  6. 6. 3.3 Excel 5 > gdp <- read_excel("data/30ffm1rn_jp.xlsx", skip=6, col_names=TRUE) > dim(gdp) # [1] 62 26 gdp 1–3 1–4 > gdp[1:3, 1:4] # A tibble: 3 x 4 ...1 `1994` `1995` `1996` <chr> <dbl> <dbl> <dbl> 1 245684. 251970. 258152. 2 241526. 247606. 253738. 3 238193. 243885. 250301. A tibble: 3 x 4 gdp Tibble Tibble > is.data.frame(gdp) # [1] TRUE > gdp <- as.data.frame(gdp) # as. Excel R str() > str(gdp) 'data.frame': 62 obs. of 26 variables: $ ...1: chr " " " " " " $ 1994: num 245684 241526 238193 3841 220 ... $ 1995: num 251970 247606 243885 4436 224 ... $ 1996: num 258152 253738 250301 4083 326 ... $ 1997: num 255782 251424 248477 3457 349 ... $ 1998: num 256658 251557 248824 3160 328 ... $ 1999: num 260436 254945 251806 3532 264 ... $ 2000: num 263972 259136 256157 3314 262 ... $ 2001: num 268881 263698 261270 2602 268 ... $ 2002: num 271953 266957 264282 2995 357 ... $ 2003: num 273850 268450 266218 2814 665 ... $ 2004: num 277097 271716 269038 3435 815 ... $ 2005: num 281427 275868 273860 2808 873 ... $ 2006: num 283494 277848 276444 2092 726 ... $ 2007: num 285850 280312 279124 1977 815 ... $ 2008: num 280055 274560 273435 1898 789 ... $ 2009: num 282488 276710 275792 1752 826 ... I 6
  7. 7. 3.4 5 $ 2010: num 286647 280524 279478 1983 942 ... $ 2011: num 288797 282050 280876 1864 690 ... $ 2012: num 293397 286118 285226 1846 954 ... $ 2013: num 301514 294138 294092 1360 1290 ... $ 2014: num 293681 286783 287605 1162 1882 ... $ 2015: num 295661 288039 289877 1088 2767 ... $ 2016: num 295534 287605 289346 1280 2954 ... $ 2017: num 298875 290958 293426 1194 3534 ... $ 2018: num 299047 291331 294214 1288 4058 ... 1 62 obs. of 26 variables 62 26 $ 1 ...1 character 2 1994 numeric Excel 1 1 2 1994 2018 3.4 1 gdp 1 rownames() > rownames(gdp) <- gdp[, 1] # <-- ’row.names’ b Excel I 7
  8. 8. 3.4 5 1 Excel NA read_excel() trim=TRUE GDP > gdp[,1] # Excel NA 3.4.1 NA > gdp <- gdp[!is.na(gdp[,1]),] # NA > dim(gdp) # 62 57 [1] 57 26 Excel 3 gdp 2 NA 3 > gdp <- gdp[1:(dim(gdp)[1]-3), ] # 3 > dim(gdp) # 57 54 [1] 54 26 3 1:(dim(gdp)[1]-3) I 8
  9. 9. 3.4 5 > 1:dim(gdp)[1]-3 # (1:dim(gdp)[1]) - 3 [1] -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 [28] 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 > 1:(dim(gdp)[1]-3) # [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 [28] 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 3.4.2 gsub() global substitution gsub(" ", " ", ) > cn <- gsub(" ", "", gdp[, 1]) # > cn[1:7] # [1] " " [2] " " [3] " " [4] " " [5] " ) " [6] " " [7] " " > cn <- gsub(" ", "", cn) # cn gdp duplicated() TRUE 1 10 5 1 duplicated() 2 1 5 10 TRUE > n <- 1:10 # 1 10 > n[c(5, 10)] <- 1 # 5 10 1 > n [1] 1 2 3 4 1 6 7 8 9 1 > duplicated(n) # 5 10 TRUE [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE > n[duplicated(n)] # 2 [1] 1 1 duplicated() cn > cn[duplicated(cn)] [1] " " " " " " " " I 9
  10. 10. 3.4 5 Excel cn > gdp <- gdp[cn!=" ", ] > cn <- cn[cn!=" "] > dim(gdp) # 2 55 [1] 52 26 > (1:length(cn))[duplicated(cn)] # [1] 18 19 27 > n <- (1:length(cn))[duplicated(cn)][1:2] # 2 > cn[n] # [1] " " " " > cn[n]<- paste(cn[n], " ", sep="") # paste > cn[n] # [1] " " " " 3.4.3 Excel cn gdp 3.4.4 > colnames(gdp) [1] "1994" "1995" "1996" "1997" "1998" "1999" "2000" "2001" "2002" "2003" "2004" [12] "2005" "2006" "2007" "2008" "2009" "2010" "2011" "2012" "2013" "2014" "2015" [23] "2016" "2017" "2018" $ 1994 > gdp$1994 # <-- 1994 > gdp$"1994" Y R I 10
  11. 11. 5 > paste("Y", colnames(gdp), sep="") # [1] "Y1994" "Y1995" "Y1996" "Y1997" "Y1998" "Y1999" "Y2000" "Y2001" "Y2002" "Y2003" [11] "Y2004" "Y2005" "Y2006" "Y2007" "Y2008" "Y2009" "Y2010" "Y2011" "Y2012" "Y2013" [21] "Y2014" "Y2015" "Y2016" "Y2017" "Y2018" > colnames(gdp) <- paste("Y", colnames(gdp), sep="") > gdp$Y1994 # $ [1] 245683.7 241525.9 238192.7 3841.3 219.6 204412.5 36801.1 4189.9 72309.6 [10] 284892.9 33267.7 128139.8 127861.1 86091.4 26920.2 60788.2 43137.2 1435.2 [19] 9977.2 31758.1 -79.6 -512.4 -9.6 -356.8 -84.9 2.4 721.2 [28] 21.2 730.5 -10606.3 34762.1 29701.9 5098.4 45368.4 31786.3 12578.1 [37] NA 426889.1 -11552.5 19152.4 446041.4 3802.2 14738.2 10936.0 449843.6 [46] 446414.9 331638.4 114933.2 421532.9 237272.1 34446.7 45303.1 1994 NA NA NA > rownames(gdp)[is.na(gdp$Y1994)] # NA [1] "" > gdp <- gdp[rownames(gdp) != "", ] # gdp > gdp$Y1994 [1] 245683.7 241525.9 238192.7 3841.3 219.6 204412.5 36801.1 4189.9 72309.6 [10] 284892.9 33267.7 128139.8 127861.1 86091.4 26920.2 60788.2 43137.2 1435.2 [19] 9977.2 31758.1 -79.6 -512.4 -9.6 -356.8 -84.9 2.4 721.2 [28] 21.2 730.5 -10606.3 34762.1 29701.9 5098.4 45368.4 31786.3 12578.1 [37] 426889.1 -11552.5 19152.4 446041.4 3802.2 14738.2 10936.0 449843.6 446414.9 [46] 331638.4 114933.2 421532.9 237272.1 34446.7 45303.1 Excel 3 4 > gdp[1:3, 1:4] Y1994 Y1995 Y1996 Y1997 245683.7 251970.1 258151.7 255781.6 241525.9 247606.3 253738.1 251423.5 238192.7 243885.2 250301.4 248476.6 4 R ls() > ls() # list [1] "cn" "gdp" "n" I 11
  12. 12. 5 save() load() .RData > save(gdp, file="ch05-GDP.RData") # gdp ch05-GDP.RData > save(gdp, cn, file="ch05-GDP.RData") # gdp cn 5 load() gdp cn rm() remove > rm(gdp, cn) # gdp cn remove > ls() # gdp cn [1] "n" load() > load("ch05-GDP.RData") > ls() [1] "cn" "gdp" "n" > gdp[1:3, 1:4] Y1994 Y1995 Y1996 Y1997 245683.7 251970.1 258151.7 255781.6 241525.9 247606.3 253738.1 251423.5 238192.7 243885.2 250301.4 248476.6 6 —GDP R GDP GDP GDI GDE 1 37 40 1 I I 12
  13. 13. 5 > rownames(gdp)[c(37, 40)] [1] " " " " 2015 > gdp[c(" ", " "), "Y2015"] [1] 517223.3 524004.4 Excel 3 GDE GDP GDP > max(gdp[" ",]) # GDP [1] 533667.9 > min(gdp[" ",]) # GDP [1] 426889.1 Excel 10 100 GDP GDP GDP y > y <- gdp[" ",] > colnames(gdp)[y == max(y)] # y y [1] "Y2018" > colnames(gdp)[y == min(y)] # y y [1] "Y1994" GDP 2018 1994 GDP > plot(1994:2018, y, type="l", xlab="Year", ylab="GDP") I 13
  14. 14. 5 1995 2000 2005 2010 2015 440000460000480000500000520000 Year GDP plot() plot(x, y) x y x 1994 2018 1994:2018 y GDP y 2 type="l" line xlab ylab x y GDP 4 GDP 2009 2008 7 CSV CSV Comma Separated Values CSV CSV 5 I 14
  15. 15. 7.1 5 7.1 (1) URL http://www.nstac.go.jp/services/ippan-microdata.html (2) I 15
  16. 16. 7.2 CSV 5 (3) 21 ip- pan_2009zensho.zip (4) ippan_2009zensho 2 csv xls ippan_2009zensho_z_dataset.csv 7.2 CSV CSV read.csv() > help(read.csv) Usage ippan_2009zensho_z_dataset.csv 5 6 7 skip=5 header=TRUE Excel Shift JIS R UTF8 encoding="Shift_JIS" > microdata <- read.csv("data/ippan_2009zensho_z_dataset.csv", # + skip=5, header=TRUE, encoding="Shift_JIS") > dim(microdata) # [1] 45811 20 I 16
  17. 17. 5 head() tail() > head(microdata) # > tail(microdata) # RData > save(microdata, file="Microdata.RData") # 8 Stata Stata Stata .dta Stata Stata R Stata read.dta() foreign > library(foreign) > library(help=foreign) # foreign > help(read.dta) # read.dta Stata read.dta() > stata.data <- read.dta(" .dta") # Stata dta 9 GDP GDP R GDP GDP GDP 10 10 GDP Purchasing Power Parity Rate PPP I 17
  18. 18. 5 PPP GDP GDP GDP GDP GDP per capita (1) World Bank GDP ppp per capita http://data.worldbank.org/indicator/NY.GDP.PCAP.PP.CD (2) Download CSV (3) API_NY API_NY.GDP.PCAP.PP.CD_DS2_en_csv_v2_######.csv ###### R ppp (4) (5) ppp 1 1 (6) Indicator.Name Indicator.Code (7) 2015 GDP (8) GDP GDP (9) ppp WorldBank_GDP.RData (10) ppp (11) WorldBank_GDP.RData ppp I 18
  • JinseiHamamoto

    Apr. 29, 2020

西南学院大学経済学部 演習1 講義ノート 講義ページ: http://courses.wshito.com/semi1/2020-datascience/index.html

Views

Total views

130

On Slideshare

0

From embeds

0

Number of embeds

2

Actions

Downloads

4

Shares

0

Comments

0

Likes

1

×