This is the notebook I used for the Ann Arbor Chapter of the
American Statistical Association class
'Up and Running with Python'.
A copy is available at: https://drive.google.com/open?id=1-lgkJ9pilNsvQ1NaJTR3Xf8j9_u0xp07
This is a standard module import list:
In [2]: # For system information:
import sys
import os
# For data wrangling and statistics:
import pandas as pd
import pandasql as sqla # for SQL work.
import numpy as np
import scipy
import statsmodels
# For graphs (the plotting functions live in the pyplot submodule):
import matplotlib.pyplot as plt
import seaborn as sns
# For dates and times:
import datetime
import time
import math as mth
# This causes matplotlib graphs to appear inside the Jupyter notebook, rather than
# in a separate window:
%matplotlib inline
First Step - Import a Data Set from .CSV
Pandas (the name comes from 'panel data') is the major Python module for data wrangling. Its central object is the
DataFrame (corresponding to the R data frame or SAS data set). Pandas has a large number of import functions. We will use
read_csv() to import the data set.
In [3]: pwt9 = pd.read_csv('Datapwt9.0.csv')
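read_csv() also accepts file-like objects and an 'index_col' argument. A minimal sketch, assuming a tiny made-up CSV (the rows below are invented for illustration and are not taken from the pwt9 file), that sets the row labels at import time:

```python
import io
import pandas as pd

# Hypothetical three-row CSV standing in for a real file on disk.
csv_text = """id,country,year,pop
AAA-2000,Atlantis,2000,1.5
AAA-2001,Atlantis,2001,
BBB-2000,Borduria,2000,3.25
"""

# index_col=0 uses the first column as the row labels;
# the empty 'pop' field is read in as NaN.
demo = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(demo.shape)
print(demo.dtypes)
```

Passing a path string instead of the StringIO object reads from disk in the same way.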
Up and Running with Python http://localhost:8888/nbconvert/html/Documents/Python/Projects/ASA U...
1 of 13 2/28/2020, 10:52 AM
Explore the object, using various functions:
In [4]: print(type(pwt9))
In [5]: pwt9.shape
In [6]: pwt9.head()
In [7]: pwt9.tail()
'Index' is the type Pandas DataFrames use for the row and column labels. If no row labels are specified, the
row number (starting at 0) is used. Note that the '[...]' shown inside Index(...) is a list.
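A small sketch of the default labels, using a made-up two-column frame (the name 'toy' and its values are assumptions for illustration, not the pwt9 data):

```python
import pandas as pd

toy = pd.DataFrame({"country": ["Aruba", "Zimbabwe"], "year": [1950, 2014]})

print(toy.index)          # default row labels: RangeIndex(start=0, stop=2, step=1)
print(toy.columns)        # the column labels are an Index as well
print(list(toy.columns))  # converted to a plain Python list: ['country', 'year']
```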
<class 'pandas.core.frame.DataFrame'>
Out[5]: (11830, 48)
Out[6]:
   Unnamed: 0  country  isocode  year  currency        rgdpe  rgdpo  pop  emp  avh  ...  csh_g  csh_x  csh_m  csh_r
0  ABW-1950    Aruba    ABW      1950  Aruban Guilder  NaN    NaN    NaN  NaN  NaN  ...  NaN    NaN    NaN    NaN
1  ABW-1951    Aruba    ABW      1951  Aruban Guilder  NaN    NaN    NaN  NaN  NaN  ...  NaN    NaN    NaN    NaN
2  ABW-1952    Aruba    ABW      1952  Aruban Guilder  NaN    NaN    NaN  NaN  NaN  ...  NaN    NaN    NaN    NaN
3  ABW-1953    Aruba    ABW      1953  Aruban Guilder  NaN    NaN    NaN  NaN  NaN  ...  NaN    NaN    NaN    NaN
4  ABW-1954    Aruba    ABW      1954  Aruban Guilder  NaN    NaN    NaN  NaN  NaN  ...  NaN    NaN    NaN    NaN
5 rows × 48 columns
Out[7]:
       Unnamed: 0  country   isocode  year  currency   rgdpe         rgdpo         pop        emp       avh
11825  ZWE-2010    Zimbabwe  ZWE      2010  US Dollar  20652.718750  21053.855469  13.973897  6.298438  NaN
11826  ZWE-2011    Zimbabwe  ZWE      2011  US Dollar  20720.435547  21592.298828  14.255592  6.518841  NaN
11827  ZWE-2012    Zimbabwe  ZWE      2012  US Dollar  23708.654297  24360.527344  14.565482  6.248271  NaN
11828  ZWE-2013    Zimbabwe  ZWE      2013  US Dollar  27011.988281  28157.886719  14.898092  6.287056  NaN
11829  ZWE-2014    Zimbabwe  ZWE      2014  US Dollar  28495.554688  29149.708984  15.245855  6.499974  NaN
5 rows × 48 columns
In [8]: pwt9.columns
You can copy the column names into a list, to change them or to use them in formulae:
In [9]: column_names = list(pwt9.columns)
I wrapped them in list() because the column names are stored in a Pandas data type called an 'Index', which is
immutable: assigning to a single element of it raises a TypeError.
In [10]: type(column_names)
In [11]: column_names[0] = 'Unique_ID'
We can then rename the columns by assigning the edited list back:
In [12]: pwt9.columns = column_names
In [13]: pwt9.columns
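As an alternative to editing a list, the rename() method changes only the columns you name. A minimal sketch on a made-up one-row frame (the name 'toy' is an assumption for illustration), which also shows the error raised when an Index is edited directly:

```python
import pandas as pd

toy = pd.DataFrame({"Unnamed: 0": ["ABW-1950"], "country": ["Aruba"]})

# An Index does not support item assignment.
try:
    toy.columns[0] = "Unique_ID"
except TypeError as err:
    print("TypeError:", err)

# rename() returns a new DataFrame in which only the listed columns are renamed.
renamed = toy.rename(columns={"Unnamed: 0": "Unique_ID"})
print(list(renamed.columns))
```

The original frame is left untouched, so the result has to be assigned to a name to be kept.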
In the printouts above, Pandas used the row number (starting at 0) because no row labels had been assigned;
Pandas calls these labels the row 'Index'. You can assign one; for this data set, use the variable 'Unique_ID':
In [14]: pwt9.set_index('Unique_ID', inplace=True)
Out[8]: Index(['Unnamed: 0', 'country', 'isocode', 'year', 'currency', 'rgdpe',
'rgdpo', 'pop', 'emp', 'avh', 'hc', 'ccon', 'cda', 'cgdpe', 'cgdpo',
'ck', 'ctfp', 'cwtfp', 'rgdpna', 'rconna', 'rdana', 'rkna', 'rtfpna',
'rwtfpna', 'labsh', 'delta', 'xr', 'pl_con', 'pl_da', 'pl_gdpo',
'i_cig', 'i_xm', 'i_xr', 'i_outlier', 'cor_exp', 'statcap', 'csh_c',
'csh_i', 'csh_g', 'csh_x', 'csh_m', 'csh_r', 'pl_c', 'pl_i', 'pl_g',
'pl_x', 'pl_m', 'pl_k'],
dtype='object')
Out[10]: list
Out[13]: Index(['Unique_ID', 'country', 'isocode', 'year', 'currency', 'rgdpe', 'rgdpo',
'pop', 'emp', 'avh', 'hc', 'ccon', 'cda', 'cgdpe', 'cgdpo', 'ck',
'ctfp', 'cwtfp', 'rgdpna', 'rconna', 'rdana', 'rkna', 'rtfpna',
'rwtfpna', 'labsh', 'delta', 'xr', 'pl_con', 'pl_da', 'pl_gdpo',
'i_cig', 'i_xm', 'i_xr', 'i_outlier', 'cor_exp', 'statcap', 'csh_c',
'csh_i', 'csh_g', 'csh_x', 'csh_m', 'csh_r', 'pl_c', 'pl_i', 'pl_g',
'pl_x', 'pl_m', 'pl_k'],
dtype='object')
In [15]: pwt9.head()
The 'inplace=True' argument makes Pandas modify the original DataFrame, rather than return a modified copy.
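For comparison, a short sketch of set_index() without 'inplace=True', on a made-up two-row frame (the names 'toy' and 'indexed' are assumptions for illustration):

```python
import pandas as pd

toy = pd.DataFrame({"Unique_ID": ["ABW-1950", "ABW-1951"], "year": [1950, 1951]})

# Without inplace=True, set_index() returns a new DataFrame and leaves 'toy' unchanged.
indexed = toy.set_index("Unique_ID")

print(list(toy.columns))      # ['Unique_ID', 'year']
print(list(indexed.columns))  # ['year']
print(indexed.index.name)     # Unique_ID
```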
You can select items in a DataFrame by row/column position, or by row/column label:
the first uses the 'iloc[]' indexer (integer location), the second uses the 'loc[]' indexer:
In [16]: pwt9.iloc[0,0]
The 'loc[]' cell (In [17]) selects the row 'ZWE-2010'; the ',:' selects all columns.
Out[15]:
           country  isocode  year  currency        rgdpe  rgdpo  pop  emp  avh  hc   ...  csh_g  csh_x  csh_m
Unique_ID
ABW-1950   Aruba    ABW      1950  Aruban Guilder  NaN    NaN    NaN  NaN  NaN  NaN  ...  NaN    NaN    NaN
ABW-1951   Aruba    ABW      1951  Aruban Guilder  NaN    NaN    NaN  NaN  NaN  NaN  ...  NaN    NaN    NaN
ABW-1952   Aruba    ABW      1952  Aruban Guilder  NaN    NaN    NaN  NaN  NaN  NaN  ...  NaN    NaN    NaN
ABW-1953   Aruba    ABW      1953  Aruban Guilder  NaN    NaN    NaN  NaN  NaN  NaN  ...  NaN    NaN    NaN
ABW-1954   Aruba    ABW      1954  Aruban Guilder  NaN    NaN    NaN  NaN  NaN  NaN  ...  NaN    NaN    NaN
5 rows × 47 columns
Out[16]: 'Aruba'
In [17]: pwt9.loc['ZWE-2010',:]
Out[17]: country Zimbabwe
isocode ZWE
year 2010
currency US Dollar
rgdpe 20652.7
rgdpo 21053.9
pop 13.9739
emp 6.29844
avh NaN
hc 2.3726
ccon 21862.8
cda 26023
cgdpe 20537.3
cgdpo 21236.8
ck 34274.3
ctfp 0.234834
cwtfp 0.277434
rgdpna 19295.2
rconna 20977.2
rdana 25035.1
rkna 78953.3
rtfpna 0.932604
rwtfpna 0.863711
labsh 0.555796
delta 0.0373084
xr 1
pl_con 0.442739
pl_da 0.458783
pl_gdpo 0.443672
i_cig interpolated
i_xm benchmark
i_xr market
i_outlier no
cor_exp NaN
statcap 45.5556
csh_c 0.902229
csh_i 0.195897
csh_g 0.127251
csh_x 0.214657
csh_m -0.454497
csh_r 0.0144622
pl_c 0.44717
pl_i 0.5431
pl_g 0.411316
pl_x 0.701797
pl_m 0.606324
pl_k 1.01514
Name: ZWE-2010, dtype: object
In [18]: pwt9.iloc[0,0:10]
In [19]: pwt9.iloc[0:10,]
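One difference between the two indexers is worth a sketch: a position slice in 'iloc[]' excludes its end point, while a label slice in 'loc[]' includes it. The three-row frame below is made up for illustration:

```python
import pandas as pd

toy = pd.DataFrame(
    {"country": ["Aruba", "Aruba", "Aruba"], "year": [1950, 1951, 1952]},
    index=["ABW-1950", "ABW-1951", "ABW-1952"],
)

by_position = toy.iloc[0:2, :]                # rows at positions 0 and 1 (end excluded)
by_label = toy.loc["ABW-1950":"ABW-1951", :]  # both end labels included

print(by_position.equals(by_label))  # True
print(toy.iloc[0, 0])                # Aruba
print(toy.loc["ABW-1952", "year"])   # 1952
```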
To select a column, put the column name in quotes inside brackets; to select several columns, pass a list of names.
Out[18]: country Aruba
isocode ABW
year 1950
currency Aruban Guilder
rgdpe NaN
rgdpo NaN
pop NaN
emp NaN
avh NaN
hc NaN
Name: ABW-1950, dtype: object
Out[19]:
           country  isocode  year  currency        rgdpe  rgdpo  pop  emp  avh  hc   ...  csh_g  csh_x  csh_m
Unique_ID
ABW-1950   Aruba    ABW      1950  Aruban Guilder  NaN    NaN    NaN  NaN  NaN  NaN  ...  NaN    NaN    NaN
ABW-1951   Aruba    ABW      1951  Aruban Guilder  NaN    NaN    NaN  NaN  NaN  NaN  ...  NaN    NaN    NaN
ABW-1952   Aruba    ABW      1952  Aruban Guilder  NaN    NaN    NaN  NaN  NaN  NaN  ...  NaN    NaN    NaN
ABW-1953   Aruba    ABW      1953  Aruban Guilder  NaN    NaN    NaN  NaN  NaN  NaN  ...  NaN    NaN    NaN
ABW-1954   Aruba    ABW      1954  Aruban Guilder  NaN    NaN    NaN  NaN  NaN  NaN  ...  NaN    NaN    NaN
ABW-1955   Aruba    ABW      1955  Aruban Guilder  NaN    NaN    NaN  NaN  NaN  NaN  ...  NaN    NaN    NaN
ABW-1956   Aruba    ABW      1956  Aruban Guilder  NaN    NaN    NaN  NaN  NaN  NaN  ...  NaN    NaN    NaN
ABW-1957   Aruba    ABW      1957  Aruban Guilder  NaN    NaN    NaN  NaN  NaN  NaN  ...  NaN    NaN    NaN
ABW-1958   Aruba    ABW      1958  Aruban Guilder  NaN    NaN    NaN  NaN  NaN  NaN  ...  NaN    NaN    NaN
ABW-1959   Aruba    ABW      1959  Aruban Guilder  NaN    NaN    NaN  NaN  NaN  NaN  ...  NaN    NaN    NaN
10 rows × 47 columns
In [20]: pwt9['country']
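The type of the result depends on what goes inside the brackets. A brief sketch on a made-up one-row frame (names and values are assumptions for illustration):

```python
import pandas as pd

toy = pd.DataFrame({"country": ["Aruba"], "year": [1950], "pop": [0.04]})

one_column = toy["country"]             # a single name gives a Series
two_columns = toy[["country", "year"]]  # a list of names gives a DataFrame

print(type(one_column).__name__)   # Series
print(type(two_columns).__name__)  # DataFrame
```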
Missing Values (https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html)
Pandas uses 'NaN' and 'NA' as missing value symbols. You can also have '-inf' and 'inf' treated as missing values
with the setting "pandas.options.mode.use_inf_as_na = True" (this option is deprecated in recent Pandas releases).
Python itself uses 'None'.
There are other type-specific markers, such as 'NaT' for datetime64[ns].
To test whether a value is missing, use the 'isna()' and 'notna()' functions rather than '==':
Out[20]: Unique_ID
ABW-1950 Aruba
ABW-1951 Aruba
ABW-1952 Aruba
ABW-1953 Aruba
ABW-1954 Aruba
...
ZWE-2010 Zimbabwe
ZWE-2011 Zimbabwe
ZWE-2012 Zimbabwe
ZWE-2013 Zimbabwe
ZWE-2014 Zimbabwe
Name: country, Length: 11830, dtype: object
In [21]: pwt9.loc['ABW-1950',]
In [22]: pwt9.loc['ABW-1950','rgdpe']
In [23]: pwt9.loc['ABW-1950','rgdpe'] == 'nan'
In [24]: pd.isna(pwt9.loc['ABW-1950','rgdpe'])
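The comparison with 'nan' returns False for two separate reasons, sketched here on a bare NaN value rather than on the pwt9 data:

```python
import numpy as np
import pandas as pd

value = np.nan

print(value == "nan")   # False: a float is never equal to a string
print(value == np.nan)  # False: NaN is not equal to anything, itself included
print(pd.isna(value))   # True
print(pd.isna(None), pd.isna(pd.NaT))  # True True
```

So even the "right" comparison against np.nan fails, which is why isna() and notna() are needed.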
Out[21]: country Aruba
isocode ABW
year 1950
currency Aruban Guilder
rgdpe NaN
rgdpo NaN
pop NaN
emp NaN
avh NaN
hc NaN
ccon NaN
cda NaN
cgdpe NaN
cgdpo NaN
ck NaN
ctfp NaN
cwtfp NaN
rgdpna NaN
rconna NaN
rdana NaN
rkna NaN
rtfpna NaN
rwtfpna NaN
labsh NaN
delta NaN
xr NaN
pl_con NaN
pl_da NaN
pl_gdpo NaN
i_cig NaN
i_xm NaN
i_xr NaN
i_outlier NaN
cor_exp NaN
statcap NaN
csh_c NaN
csh_i NaN
csh_g NaN
csh_x NaN
csh_m NaN
csh_r NaN
pl_c NaN
pl_i NaN
pl_g NaN
pl_x NaN
pl_m NaN
pl_k NaN
Name: ABW-1950, dtype: object
Out[22]: nan
Out[23]: False
Out[24]: True
The usual caveats about propagation of missing values apply.
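A short sketch of what that means in Pandas, using a made-up three-element Series: element-wise arithmetic passes NaN through, while reductions such as sum() and mean() skip it unless told otherwise.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

print((s + 1).tolist())     # [2.0, nan, 4.0]: arithmetic propagates NaN
print(s.sum())              # 4.0: reductions skip NaN by default
print(s.sum(skipna=False))  # nan
print(s.mean())             # 2.0
```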
Group By: split-apply-combine
In [25]: grouped_pwt9 = pwt9.groupby(['isocode','year'])
In [26]: grouped_pwt9.count()
You can also chain the groupby() and count() methods in a single expression, as in In [27].
Out[26]:
country currency rgdpe rgdpo pop emp avh hc ccon cda ... csh_g csh_x csh_m csh_r
isocode year
ABW
1950 1 1 0 0 0 0 0 0 0 0 ... 0 0 0
1951 1 1 0 0 0 0 0 0 0 0 ... 0 0 0
1952 1 1 0 0 0 0 0 0 0 0 ... 0 0 0
1953 1 1 0 0 0 0 0 0 0 0 ... 0 0 0
1954 1 1 0 0 0 0 0 0 0 0 ... 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
ZWE
2010 1 1 1 1 1 1 0 1 1 1 ... 1 1 1
2011 1 1 1 1 1 1 0 1 1 1 ... 1 1 1
2012 1 1 1 1 1 1 0 1 1 1 ... 1 1 1
2013 1 1 1 1 1 1 0 1 1 1 ... 1 1 1
2014 1 1 1 1 1 1 0 1 1 1 ... 1 1 1
11830 rows × 45 columns
In [27]: pwt9.groupby('isocode').count()
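count() is only one of the functions that can follow groupby(). A minimal sketch, on a made-up four-row frame (the values are invented for illustration), of the difference between count() and size() and of agg() with several functions:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {
        "isocode": ["ABW", "ABW", "ZWE", "ZWE"],
        "year": [2013, 2014, 2013, 2014],
        "pop": [np.nan, 0.1, 14.9, 15.2],
    }
)

# count() reports non-missing values per group; size() reports rows per group.
print(toy.groupby("isocode")["pop"].count())
print(toy.groupby("isocode").size())

# agg() applies several summary functions in one call.
summary = toy.groupby("isocode")["pop"].agg(["count", "mean", "max"])
print(summary)
```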
Summary Statistics (see: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html)
In [28]: pwt9.describe()
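By default describe() summarizes only the numeric columns, which is why text columns such as 'country' are absent from its output. A small sketch on a made-up frame showing the 'include' argument:

```python
import pandas as pd

toy = pd.DataFrame(
    {"country": ["Aruba", "Aruba", "Zimbabwe"], "year": [1950, 1951, 2014]}
)

numeric_only = toy.describe()            # summarizes 'year' only
everything = toy.describe(include="all")  # adds unique/top/freq rows for 'country'

print(numeric_only)
print(everything)
```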
Out[27]:
country year currency rgdpe rgdpo pop emp avh hc ccon ... csh_g csh_x csh_m csh_r
isocode
ABW 65 65 65 45 45 45 24 0 0 45 ... 45 45 45 45
AGO 65 65 65 45 45 45 45 0 45 45 ... 45 45 45 45
AIA 65 65 65 45 45 45 29 0 0 45 ... 45 45 45 45
ALB 65 65 65 45 45 45 45 0 45 45 ... 45 45 45 45
ARE 65 65 65 45 45 45 45 0 45 45 ... 45 45 45 45
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
VNM 65 65 65 45 45 45 45 45 45 45 ... 45 45 45 45
YEM 65 65 65 26 26 26 26 0 26 26 ... 26 26 26 26
ZAF 65 65 65 65 65 65 65 12 65 65 ... 65 65 65 65
ZMB 65 65 65 60 60 60 60 0 60 60 ... 60 60 60 60
ZWE 65 65 65 61 61 61 35 0 61 61 ... 61 61 61 61
182 rows × 46 columns
Out[28]:
year rgdpe rgdpo pop emp avh hc
count 11830.000000 9.439000e+03 9.439000e+03 9439.000000 8244.000000 3319.000000 7867.000000 9.439000e+03
mean 1982.000000 2.530908e+05 2.508545e+05 30.090573 14.218857 1995.704047 2.032653 1.856605e+05
std 18.762456 9.973281e+05 9.899123e+05 111.489127 56.500008 271.514641 0.708940 7.308969e+05
min 1950.000000 1.854156e+01 1.108702e+00 0.004377 0.001180 1362.503252 1.007038 1.459605e+01
25% 1966.000000 5.847668e+03 6.044927e+03 1.593420 0.941974 1816.496296 1.408157 4.988803e+03
50% 1982.000000 2.458191e+04 2.487366e+04 5.985658 2.976475 1979.000000 1.916717 1.975266e+04
75% 1998.000000 1.304465e+05 1.324746e+05 19.533204 8.403848 2176.902193 2.608796 9.644857e+04
max 2014.000000 1.708030e+07 1.713595e+07 1369.435670 798.367798 3042.446905 3.734285 1.368595e+07
8 rows × 40 columns
In [29]: pwt9.groupby('isocode').describe()
Plotting, using Matplotlib
Matplotlib provides a number of plotting functions; cell In [30] uses plt.hist() to draw a histogram.
Out[29]:
year rgdpe ... pl_m
count mean std min 25% 50% 75% max count mean ... 75%
isocode
ABW 65.0 1982.0 18.90767 1950.0 1966.0 1982.0 1998.0 2014.0 45.0 2524.893969 ... 0.648430
AGO 65.0 1982.0 18.90767 1950.0 1966.0 1982.0 1998.0 2014.0 45.0 55150.395095 ... 0.572490
AIA 65.0 1982.0 18.90767 1950.0 1966.0 1982.0 1998.0 2014.0 45.0 207.284460 ... 0.566618
ALB 65.0 1982.0 18.90767 1950.0 1966.0 1982.0 1998.0 2014.0 45.0 14805.517654 ... 0.580224
ARE 65.0 1982.0 18.90767 1950.0 1966.0 1982.0 1998.0 2014.0 45.0 255698.718403 ... 0.560910
... ... ... ... ... ... ... ... ... ... ... ... ...
VNM 65.0 1982.0 18.90767 1950.0 1966.0 1982.0 1998.0 2014.0 45.0 153119.746441 ... 0.550852
YEM 65.0 1982.0 18.90767 1950.0 1966.0 1982.0 1998.0 2014.0 26.0 41752.590952 ... 0.642233
ZAF 65.0 1982.0 18.90767 1950.0 1966.0 1982.0 1998.0 2014.0 65.0 285953.341587 ... 0.552549
ZMB 65.0 1982.0 18.90767 1950.0 1966.0 1982.0 1998.0 2014.0 60.0 16092.146663 ... 0.572661
ZWE 65.0 1982.0 18.90767 1950.0 1966.0 1982.0 1998.0 2014.0 61.0 23284.501889 ... 0.542417
182 rows × 320 columns
In [30]: import matplotlib.pyplot as plt
import numpy as np
plt.xlabel('Log(Population)')
plt.title('Histogram of Log(Population)')
# Adding the option 'density=True' would give the density.
plt.hist(np.log(pwt9['pop']), bins=30)
plt.show()
The command '%matplotlib inline' in the first cell at the top of the notebook causes Matplotlib to put the plot in the
notebook, rather than in a new window.
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py:853: RuntimeWarning: invalid value encountered in log
  result = getattr(ufunc, method)(*inputs, **kwargs)
C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\histograms.py:824: RuntimeWarning: invalid value encountered in greater_equal
  keep = (tmp_a >= first_edge)
C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\histograms.py:825: RuntimeWarning: invalid value encountered in less_equal
  keep &= (tmp_a <= last_edge)
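These RuntimeWarnings most likely come from the missing values in 'pop' reaching np.log() and plt.hist(). One way to avoid them, sketched here on a made-up Series rather than on the pwt9 data, is to drop the missing values before taking the logarithm:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up population values with one missing entry.
pop = pd.Series([0.5, 2.0, np.nan, 15.0, 120.0])

# Drop missing values first so that NaN never reaches hist().
log_pop = np.log(pop.dropna())

fig, ax = plt.subplots()
counts, bin_edges, _ = ax.hist(log_pop, bins=4)
ax.set_xlabel("Log(Population)")
ax.set_ylabel("Count")
ax.set_title("Histogram of Log(Population)")
plt.show()
```

With the real data, the same pattern would be np.log(pwt9['pop'].dropna()).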
Up and Running with Python http://localhost:8888/nbconvert/html/Documents/Python/Projects/ASA U...
12 of 13 2/28/2020, 10:52 AM
In [ ]:
Up and Running with Python http://localhost:8888/nbconvert/html/Documents/Python/Projects/ASA U...
13 of 13 2/28/2020, 10:52 AM

More Related Content

What's hot

python高级内存管理
python高级内存管理python高级内存管理
python高级内存管理rfyiamcool
 
Byterun, a Python bytecode interpreter - Allison Kaptur at NYCPython
Byterun, a Python bytecode interpreter - Allison Kaptur at NYCPythonByterun, a Python bytecode interpreter - Allison Kaptur at NYCPython
Byterun, a Python bytecode interpreter - Allison Kaptur at NYCPythonakaptur
 
Node.js API pitfalls
Node.js API pitfallsNode.js API pitfalls
Node.js API pitfallsTorontoNodeJS
 
Hypercritical C++ Code Review
Hypercritical C++ Code ReviewHypercritical C++ Code Review
Hypercritical C++ Code ReviewAndrey Karpov
 
Shift Remote FRONTEND: Reactivity in Vue.JS 3 - Marko Boskovic (Barrage)
Shift Remote FRONTEND: Reactivity in Vue.JS 3 - Marko Boskovic (Barrage)Shift Remote FRONTEND: Reactivity in Vue.JS 3 - Marko Boskovic (Barrage)
Shift Remote FRONTEND: Reactivity in Vue.JS 3 - Marko Boskovic (Barrage)Shift Conference
 
Massively Distributed Backups at Facebook Scale - Shlomo Priymak, Facebook - ...
Massively Distributed Backups at Facebook Scale - Shlomo Priymak, Facebook - ...Massively Distributed Backups at Facebook Scale - Shlomo Priymak, Facebook - ...
Massively Distributed Backups at Facebook Scale - Shlomo Priymak, Facebook - ...DevOpsDays Tel Aviv
 
Diving into byte code optimization in python
Diving into byte code optimization in python Diving into byte code optimization in python
Diving into byte code optimization in python Chetan Giridhar
 

What's hot (10)

python高级内存管理
python高级内存管理python高级内存管理
python高级内存管理
 
Byterun, a Python bytecode interpreter - Allison Kaptur at NYCPython
Byterun, a Python bytecode interpreter - Allison Kaptur at NYCPythonByterun, a Python bytecode interpreter - Allison Kaptur at NYCPython
Byterun, a Python bytecode interpreter - Allison Kaptur at NYCPython
 
Node.js API pitfalls
Node.js API pitfallsNode.js API pitfalls
Node.js API pitfalls
 
Hypercritical C++ Code Review
Hypercritical C++ Code ReviewHypercritical C++ Code Review
Hypercritical C++ Code Review
 
Shift Remote FRONTEND: Reactivity in Vue.JS 3 - Marko Boskovic (Barrage)
Shift Remote FRONTEND: Reactivity in Vue.JS 3 - Marko Boskovic (Barrage)Shift Remote FRONTEND: Reactivity in Vue.JS 3 - Marko Boskovic (Barrage)
Shift Remote FRONTEND: Reactivity in Vue.JS 3 - Marko Boskovic (Barrage)
 
Quantum neural network
Quantum neural networkQuantum neural network
Quantum neural network
 
Massively Distributed Backups at Facebook Scale - Shlomo Priymak, Facebook - ...
Massively Distributed Backups at Facebook Scale - Shlomo Priymak, Facebook - ...Massively Distributed Backups at Facebook Scale - Shlomo Priymak, Facebook - ...
Massively Distributed Backups at Facebook Scale - Shlomo Priymak, Facebook - ...
 
MongoDB
MongoDBMongoDB
MongoDB
 
Diving into byte code optimization in python
Diving into byte code optimization in python Diving into byte code optimization in python
Diving into byte code optimization in python
 
Eta
EtaEta
Eta
 

Similar to Up and running with python

Données hospitalières relatives à l'épidémie de COVID-19 FRANCE
Données hospitalières relatives à l'épidémie de COVID-19 FRANCEDonnées hospitalières relatives à l'épidémie de COVID-19 FRANCE
Données hospitalières relatives à l'épidémie de COVID-19 FRANCENalron
 
A Map of the PyData Stack
A Map of the PyData StackA Map of the PyData Stack
A Map of the PyData StackPeadar Coyle
 
Pandas+postgre sql 實作 with code
Pandas+postgre sql 實作 with codePandas+postgre sql 實作 with code
Pandas+postgre sql 實作 with codeTim Hong
 
Ns2: OTCL - PArt II
Ns2: OTCL - PArt IINs2: OTCL - PArt II
Ns2: OTCL - PArt IIAjit Nayak
 
How to generate a 100+ page website using parameterisation in R
How to generate a 100+ page website using parameterisation in RHow to generate a 100+ page website using parameterisation in R
How to generate a 100+ page website using parameterisation in RPaul Bradshaw
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
Unit-5 Time series data Analysis.pptx
Unit-5 Time series data Analysis.pptxUnit-5 Time series data Analysis.pptx
Unit-5 Time series data Analysis.pptxSheba41
 
Cassandra Hadoop Integration at HUG France by Piotr Kołaczkowski
Cassandra Hadoop Integration at HUG France by Piotr KołaczkowskiCassandra Hadoop Integration at HUG France by Piotr Kołaczkowski
Cassandra Hadoop Integration at HUG France by Piotr KołaczkowskiModern Data Stack France
 
Poker, packets, pipes and Python
Poker, packets, pipes and PythonPoker, packets, pipes and Python
Poker, packets, pipes and PythonRoger Barnes
 
Tokyo APAC Groundbreakers tour - The Complete Java Developer
Tokyo APAC Groundbreakers tour - The Complete Java DeveloperTokyo APAC Groundbreakers tour - The Complete Java Developer
Tokyo APAC Groundbreakers tour - The Complete Java DeveloperConnor McDonald
 
FØCAL Boston AiR - Computer Vision Tracing and Hardware Simulation
FØCAL Boston AiR - Computer Vision Tracing and Hardware SimulationFØCAL Boston AiR - Computer Vision Tracing and Hardware Simulation
FØCAL Boston AiR - Computer Vision Tracing and Hardware SimulationFØCAL
 
Seaborn graphing present
Seaborn graphing presentSeaborn graphing present
Seaborn graphing presentYilin Zeng
 
Data Mining & Analytics for U.S. Airlines On-Time Performance
Data Mining & Analytics for U.S. Airlines On-Time Performance Data Mining & Analytics for U.S. Airlines On-Time Performance
Data Mining & Analytics for U.S. Airlines On-Time Performance Mingxuan Li
 
PVS-Studio team experience: checking various open source projects, or mistake...
PVS-Studio team experience: checking various open source projects, or mistake...PVS-Studio team experience: checking various open source projects, or mistake...
PVS-Studio team experience: checking various open source projects, or mistake...Andrey Karpov
 
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介Masayuki Matsushita
 
Perth APAC Groundbreakers tour - SQL Techniques
Perth APAC Groundbreakers tour - SQL TechniquesPerth APAC Groundbreakers tour - SQL Techniques
Perth APAC Groundbreakers tour - SQL TechniquesConnor McDonald
 

Similar to Up and running with python (20)

Données hospitalières relatives à l'épidémie de COVID-19 FRANCE
Données hospitalières relatives à l'épidémie de COVID-19 FRANCEDonnées hospitalières relatives à l'épidémie de COVID-19 FRANCE
Données hospitalières relatives à l'épidémie de COVID-19 FRANCE
 
A Map of the PyData Stack
A Map of the PyData StackA Map of the PyData Stack
A Map of the PyData Stack
 
Pandas+postgre sql 實作 with code
Pandas+postgre sql 實作 with codePandas+postgre sql 實作 with code
Pandas+postgre sql 實作 with code
 
Ns2: OTCL - PArt II
Ns2: OTCL - PArt IINs2: OTCL - PArt II
Ns2: OTCL - PArt II
 
How to generate a 100+ page website using parameterisation in R
How to generate a 100+ page website using parameterisation in RHow to generate a 100+ page website using parameterisation in R
How to generate a 100+ page website using parameterisation in R
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Unit-5 Time series data Analysis.pptx
Unit-5 Time series data Analysis.pptxUnit-5 Time series data Analysis.pptx
Unit-5 Time series data Analysis.pptx
 
Cluto presentation
Cluto presentationCluto presentation
Cluto presentation
 
Ns network simulator
Ns network simulatorNs network simulator
Ns network simulator
 
Cassandra Hadoop Integration at HUG France by Piotr Kołaczkowski
Cassandra Hadoop Integration at HUG France by Piotr KołaczkowskiCassandra Hadoop Integration at HUG France by Piotr Kołaczkowski
Cassandra Hadoop Integration at HUG France by Piotr Kołaczkowski
 
Poker, packets, pipes and Python
Poker, packets, pipes and PythonPoker, packets, pipes and Python
Poker, packets, pipes and Python
 
Tokyo APAC Groundbreakers tour - The Complete Java Developer
Tokyo APAC Groundbreakers tour - The Complete Java DeveloperTokyo APAC Groundbreakers tour - The Complete Java Developer
Tokyo APAC Groundbreakers tour - The Complete Java Developer
 
FØCAL Boston AiR - Computer Vision Tracing and Hardware Simulation
FØCAL Boston AiR - Computer Vision Tracing and Hardware SimulationFØCAL Boston AiR - Computer Vision Tracing and Hardware Simulation
FØCAL Boston AiR - Computer Vision Tracing and Hardware Simulation
 
Seaborn graphing present
Seaborn graphing presentSeaborn graphing present
Seaborn graphing present
 
Data Mining & Analytics for U.S. Airlines On-Time Performance
Data Mining & Analytics for U.S. Airlines On-Time Performance Data Mining & Analytics for U.S. Airlines On-Time Performance
Data Mining & Analytics for U.S. Airlines On-Time Performance
 
PVS-Studio team experience: checking various open source projects, or mistake...
PVS-Studio team experience: checking various open source projects, or mistake...PVS-Studio team experience: checking various open source projects, or mistake...
PVS-Studio team experience: checking various open source projects, or mistake...
 
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
 
Mona cheatsheet
Mona cheatsheetMona cheatsheet
Mona cheatsheet
 
Bash tricks
Bash tricksBash tricks
Bash tricks
 
Perth APAC Groundbreakers tour - SQL Techniques
Perth APAC Groundbreakers tour - SQL TechniquesPerth APAC Groundbreakers tour - SQL Techniques
Perth APAC Groundbreakers tour - SQL Techniques
 

More from Barry DeCicco

Easy HTML Tables in RStudio with Tabyl and kableExtra
Easy HTML Tables in RStudio with Tabyl and kableExtraEasy HTML Tables in RStudio with Tabyl and kableExtra
Easy HTML Tables in RStudio with Tabyl and kableExtraBarry DeCicco
 
Introduction to r studio on aws 2020 05_06
Introduction to r studio on aws 2020 05_06Introduction to r studio on aws 2020 05_06
Introduction to r studio on aws 2020 05_06Barry DeCicco
 
Beginning text analysis
Beginning text analysisBeginning text analysis
Beginning text analysisBarry DeCicco
 
Using RStudio on AWS
Using RStudio on AWSUsing RStudio on AWS
Using RStudio on AWSBarry DeCicco
 
Calling python from r
Calling python from rCalling python from r
Calling python from rBarry DeCicco
 
Draft sas and r and sas (may, 2018 asa meeting)
Draft sas and r and sas (may, 2018 asa meeting)Draft sas and r and sas (may, 2018 asa meeting)
Draft sas and r and sas (may, 2018 asa meeting)Barry DeCicco
 
Calling r from sas (msug meeting, feb 17, 2018) revised
Calling r from sas (msug meeting, feb 17, 2018)   revisedCalling r from sas (msug meeting, feb 17, 2018)   revised
Calling r from sas (msug meeting, feb 17, 2018) revisedBarry DeCicco
 

More from Barry DeCicco (7)

Easy HTML Tables in RStudio with Tabyl and kableExtra
Easy HTML Tables in RStudio with Tabyl and kableExtraEasy HTML Tables in RStudio with Tabyl and kableExtra
Easy HTML Tables in RStudio with Tabyl and kableExtra
 
Introduction to r studio on aws 2020 05_06
Introduction to r studio on aws 2020 05_06Introduction to r studio on aws 2020 05_06
Introduction to r studio on aws 2020 05_06
 
Beginning text analysis
Beginning text analysisBeginning text analysis
Beginning text analysis
 
Using RStudio on AWS
Using RStudio on AWSUsing RStudio on AWS
Using RStudio on AWS
 
Calling python from r
Calling python from rCalling python from r
Calling python from r
 
Draft sas and r and sas (may, 2018 asa meeting)
Draft sas and r and sas (may, 2018 asa meeting)Draft sas and r and sas (may, 2018 asa meeting)
Draft sas and r and sas (may, 2018 asa meeting)
 
Calling r from sas (msug meeting, feb 17, 2018) revised
Calling r from sas (msug meeting, feb 17, 2018)   revisedCalling r from sas (msug meeting, feb 17, 2018)   revised
Calling r from sas (msug meeting, feb 17, 2018) revised
 

Recently uploaded

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 

Up and running with python

  • 1. Here is the notebook which I used for the Ann Arbor Chapter of the American Statistical Association Class 'Up and Running with Python' A copy is available at: https://drive.google.com/open?id=1-lgkJ9pilNsvQ1NaJTR3Xf8j9_u0xp07 (https://drive.google.com/open?id=1-lgkJ9pilNsvQ1NaJTR3Xf8j9_u0xp07) This is a standard module import list: In [2]: # For system information: import sys import os import pandas as pd import pandasql as sqla # for SQL work. import numpy as np import scipy import statsmodels # For graphs: import matplotlib as plt import seaborn as sns # For dates and times: import datetime import time import math as mth # this causes matplotlib graphs to be inside the jupyter notebook, rather than appe aring in a separate window: %matplotlib inline In [ ]: First Step - Import a Data Set from .CSV Pandas (Panel DAta Sets) is the major Python module for data wrangling. The major object is the Pandas DataFrame (corresponding to the R data frame or SAS data set). Pandas has a large number of import funtions. We will use read_csv() to import the data set. In [3]: pwt9 = pd.read_csv('Datapwt9.0.csv') Up and Running with Python http://localhost:8888/nbconvert/html/Documents/Python/Projects/ASA U... 1 of 13 2/28/2020, 10:52 AM
  • 2. Explore the object, using various functions: In [4]: print(type(pwt9)) In [5]: pwt9.shape In [6]: pwt9.head() In [7]: pwt9.tail() 'Index' is what Pandas DataFrames use for the names of columns and rows. If there is no row names specified, the row number is used. Note that '[...]' is a list. <class 'pandas.core.frame.DataFrame'> Out[5]: (11830, 48) Out[6]: Unnamed: 0 country isocode year currency rgdpe rgdpo pop emp avh ... csh_g csh_x csh_m csh_r 0 ABW-1950 Aruba ABW 1950 Aruban Guilder NaN NaN NaN NaN NaN ... NaN NaN NaN NaN 1 ABW-1951 Aruba ABW 1951 Aruban Guilder NaN NaN NaN NaN NaN ... NaN NaN NaN NaN 2 ABW-1952 Aruba ABW 1952 Aruban Guilder NaN NaN NaN NaN NaN ... NaN NaN NaN NaN 3 ABW-1953 Aruba ABW 1953 Aruban Guilder NaN NaN NaN NaN NaN ... NaN NaN NaN NaN 4 ABW-1954 Aruba ABW 1954 Aruban Guilder NaN NaN NaN NaN NaN ... NaN NaN NaN NaN 5 rows × 48 columns Out[7]: Unnamed: 0 country isocode year currency rgdpe rgdpo pop emp avh 11825 ZWE-2010 Zimbabwe ZWE 2010 US Dollar 20652.718750 21053.855469 13.973897 6.298438 NaN 11826 ZWE-2011 Zimbabwe ZWE 2011 US Dollar 20720.435547 21592.298828 14.255592 6.518841 NaN 11827 ZWE-2012 Zimbabwe ZWE 2012 US Dollar 23708.654297 24360.527344 14.565482 6.248271 NaN 11828 ZWE-2013 Zimbabwe ZWE 2013 US Dollar 27011.988281 28157.886719 14.898092 6.287056 NaN 11829 ZWE-2014 Zimbabwe ZWE 2014 US Dollar 28495.554688 29149.708984 15.245855 6.499974 NaN 5 rows × 48 columns Up and Running with Python http://localhost:8888/nbconvert/html/Documents/Python/Projects/ASA U... 2 of 13 2/28/2020, 10:52 AM
  • 3. In [8]: pwt9.columns You can assign column names to a list, for changing them or using them in formulae: In [9]: column_names = list(pwt9.columns) The reason I put them into a list function was that the column names are a data type in Pandas called an 'Index', and Python will not allow editing it. In [10]: type(column_names) In [11]: column_names[0] = 'Unique_ID' We can rename columns by assignment: In [12]: pwt9.columns = column_names In [13]: pwt9.columns In the printouts above, Python used the row number (starting at 0), becaue there was no row label variable assigned, which Pandas calls a 'Row Index'. You can assign one - for this data set, the variable 'Unique_ID': In [14]: pwt9.set_index('Unique_ID', inplace=True) Out[8]: Index(['Unnamed: 0', 'country', 'isocode', 'year', 'currency', 'rgdpe', 'rgdpo', 'pop', 'emp', 'avh', 'hc', 'ccon', 'cda', 'cgdpe', 'cgdpo', 'ck', 'ctfp', 'cwtfp', 'rgdpna', 'rconna', 'rdana', 'rkna', 'rtfpna', 'rwtfpna', 'labsh', 'delta', 'xr', 'pl_con', 'pl_da', 'pl_gdpo', 'i_cig', 'i_xm', 'i_xr', 'i_outlier', 'cor_exp', 'statcap', 'csh_c', 'csh_i', 'csh_g', 'csh_x', 'csh_m', 'csh_r', 'pl_c', 'pl_i', 'pl_g', 'pl_x', 'pl_m', 'pl_k'], dtype='object') Out[10]: list Out[13]: Index(['Unique_ID', 'country', 'isocode', 'year', 'currency', 'rgdpe', 'rgdpo', 'pop', 'emp', 'avh', 'hc', 'ccon', 'cda', 'cgdpe', 'cgdpo', 'ck', 'ctfp', 'cwtfp', 'rgdpna', 'rconna', 'rdana', 'rkna', 'rtfpna', 'rwtfpna', 'labsh', 'delta', 'xr', 'pl_con', 'pl_da', 'pl_gdpo', 'i_cig', 'i_xm', 'i_xr', 'i_outlier', 'cor_exp', 'statcap', 'csh_c', 'csh_i', 'csh_g', 'csh_x', 'csh_m', 'csh_r', 'pl_c', 'pl_i', 'pl_g', 'pl_x', 'pl_m', 'pl_k'], dtype='object') Up and Running with Python http://localhost:8888/nbconvert/html/Documents/Python/Projects/ASA U... 3 of 13 2/28/2020, 10:52 AM
  • 4. In [15]: pwt9.head() The 'inplace=True' makes Pandas alter the original data set, rather than making a copy. You can select items in a DataFrame by row/column number, or row/column name: the first uses the iloc[] command (for integer location?), the second used the 'loc[]' command: In [16]: pwt9.iloc[0,0] This selects the row 'ZWE-2010', the ',:' selects all columns. Out[15]: country isocode year currency rgdpe rgdpo pop emp avh hc ... csh_g csh_x csh_m Unique_ID ABW-1950 Aruba ABW 1950 Aruban Guilder NaN NaN NaN NaN NaN NaN ... NaN NaN NaN ABW-1951 Aruba ABW 1951 Aruban Guilder NaN NaN NaN NaN NaN NaN ... NaN NaN NaN ABW-1952 Aruba ABW 1952 Aruban Guilder NaN NaN NaN NaN NaN NaN ... NaN NaN NaN ABW-1953 Aruba ABW 1953 Aruban Guilder NaN NaN NaN NaN NaN NaN ... NaN NaN NaN ABW-1954 Aruba ABW 1954 Aruban Guilder NaN NaN NaN NaN NaN NaN ... NaN NaN NaN 5 rows × 47 columns Out[16]: 'Aruba' Up and Running with Python http://localhost:8888/nbconvert/html/Documents/Python/Projects/ASA U... 4 of 13 2/28/2020, 10:52 AM
  • 5. In [17]: pwt9.loc['ZWE-2010',:] Out[17]: country Zimbabwe isocode ZWE year 2010 currency US Dollar rgdpe 20652.7 rgdpo 21053.9 pop 13.9739 emp 6.29844 avh NaN hc 2.3726 ccon 21862.8 cda 26023 cgdpe 20537.3 cgdpo 21236.8 ck 34274.3 ctfp 0.234834 cwtfp 0.277434 rgdpna 19295.2 rconna 20977.2 rdana 25035.1 rkna 78953.3 rtfpna 0.932604 rwtfpna 0.863711 labsh 0.555796 delta 0.0373084 xr 1 pl_con 0.442739 pl_da 0.458783 pl_gdpo 0.443672 i_cig interpolated i_xm benchmark i_xr market i_outlier no cor_exp NaN statcap 45.5556 csh_c 0.902229 csh_i 0.195897 csh_g 0.127251 csh_x 0.214657 csh_m -0.454497 csh_r 0.0144622 pl_c 0.44717 pl_i 0.5431 pl_g 0.411316 pl_x 0.701797 pl_m 0.606324 pl_k 1.01514 Name: ZWE-2010, dtype: object Up and Running with Python http://localhost:8888/nbconvert/html/Documents/Python/Projects/ASA U... 5 of 13 2/28/2020, 10:52 AM
  • 6. In [18]: pwt9.iloc[0,0:10] In [19]: pwt9.iloc[0:10,] To select a column, just use the column name(s) within quotes and brackets. Out[18]: country Aruba isocode ABW year 1950 currency Aruban Guilder rgdpe NaN rgdpo NaN pop NaN emp NaN avh NaN hc NaN Name: ABW-1950, dtype: object Out[19]: country isocode year currency rgdpe rgdpo pop emp avh hc ... csh_g csh_x csh_m Unique_ID ABW-1950 Aruba ABW 1950 Aruban Guilder NaN NaN NaN NaN NaN NaN ... NaN NaN NaN ABW-1951 Aruba ABW 1951 Aruban Guilder NaN NaN NaN NaN NaN NaN ... NaN NaN NaN ABW-1952 Aruba ABW 1952 Aruban Guilder NaN NaN NaN NaN NaN NaN ... NaN NaN NaN ABW-1953 Aruba ABW 1953 Aruban Guilder NaN NaN NaN NaN NaN NaN ... NaN NaN NaN ABW-1954 Aruba ABW 1954 Aruban Guilder NaN NaN NaN NaN NaN NaN ... NaN NaN NaN ABW-1955 Aruba ABW 1955 Aruban Guilder NaN NaN NaN NaN NaN NaN ... NaN NaN NaN ABW-1956 Aruba ABW 1956 Aruban Guilder NaN NaN NaN NaN NaN NaN ... NaN NaN NaN ABW-1957 Aruba ABW 1957 Aruban Guilder NaN NaN NaN NaN NaN NaN ... NaN NaN NaN ABW-1958 Aruba ABW 1958 Aruban Guilder NaN NaN NaN NaN NaN NaN ... NaN NaN NaN ABW-1959 Aruba ABW 1959 Aruban Guilder NaN NaN NaN NaN NaN NaN ... NaN NaN NaN 10 rows × 47 columns Up and Running with Python http://localhost:8888/nbconvert/html/Documents/Python/Projects/ASA U... 6 of 13 2/28/2020, 10:52 AM
  • 7. In [20]: pwt9['country'] Missing Values (https://pandas.pydata.org/pandas-docs/stable/user_guide /missing_data.html (https://pandas.pydata.org/pandas-docs/stable/user_guide /missing_data.html)) Pandas uses 'NaN' and 'NA' as missing value symbols. You can also set '-inf' and 'inf' to be treated as missing values with the command: "pandas.options.mode.use_inf_as_na = True." Python uses 'None'. There are other types, such as 'NaT' for datetime64[ns] When comparing/testing values, use the functions 'isna()' and 'notna()' functions: Out[20]: Unique_ID ABW-1950 Aruba ABW-1951 Aruba ABW-1952 Aruba ABW-1953 Aruba ABW-1954 Aruba ... ZWE-2010 Zimbabwe ZWE-2011 Zimbabwe ZWE-2012 Zimbabwe ZWE-2013 Zimbabwe ZWE-2014 Zimbabwe Name: country, Length: 11830, dtype: object Up and Running with Python http://localhost:8888/nbconvert/html/Documents/Python/Projects/ASA U... 7 of 13 2/28/2020, 10:52 AM
  • 8. In [21]: pwt9.loc['ABW-1950',] In [22]: pwt9.loc['ABW-1950','rgdpe'] In [23]: pwt9.loc['ABW-1950','rgdpe'] == 'nan' In [24]: pd.isna(pwt9.loc['ABW-1950','rgdpe']) Out[21]: country Aruba isocode ABW year 1950 currency Aruban Guilder rgdpe NaN rgdpo NaN pop NaN emp NaN avh NaN hc NaN ccon NaN cda NaN cgdpe NaN cgdpo NaN ck NaN ctfp NaN cwtfp NaN rgdpna NaN rconna NaN rdana NaN rkna NaN rtfpna NaN rwtfpna NaN labsh NaN delta NaN xr NaN pl_con NaN pl_da NaN pl_gdpo NaN i_cig NaN i_xm NaN i_xr NaN i_outlier NaN cor_exp NaN statcap NaN csh_c NaN csh_i NaN csh_g NaN csh_x NaN csh_m NaN csh_r NaN pl_c NaN pl_i NaN pl_g NaN pl_x NaN pl_m NaN pl_k NaN Name: ABW-1950, dtype: object Out[22]: nan Out[23]: False Out[24]: True Up and Running with Python http://localhost:8888/nbconvert/html/Documents/Python/Projects/ASA U... 8 of 13 2/28/2020, 10:52 AM
  • 9. The usual caveats about propagation of missing values applies. Group By: split-apply-combine In [25]: grouped_pwt9 = pwt9.groupby(['isocode','year']) In [26]: grouped_pwt9.count() You can also add the groupby() and count() methods directly. Out[26]: country currency rgdpe rgdpo pop emp avh hc ccon cda ... csh_g csh_x csh_m csh_r isocode year ABW 1950 1 1 0 0 0 0 0 0 0 0 ... 0 0 0 1951 1 1 0 0 0 0 0 0 0 0 ... 0 0 0 1952 1 1 0 0 0 0 0 0 0 0 ... 0 0 0 1953 1 1 0 0 0 0 0 0 0 0 ... 0 0 0 1954 1 1 0 0 0 0 0 0 0 0 ... 0 0 0 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ZWE 2010 1 1 1 1 1 1 0 1 1 1 ... 1 1 1 2011 1 1 1 1 1 1 0 1 1 1 ... 1 1 1 2012 1 1 1 1 1 1 0 1 1 1 ... 1 1 1 2013 1 1 1 1 1 1 0 1 1 1 ... 1 1 1 2014 1 1 1 1 1 1 0 1 1 1 ... 1 1 1 11830 rows × 45 columns Up and Running with Python http://localhost:8888/nbconvert/html/Documents/Python/Projects/ASA U... 9 of 13 2/28/2020, 10:52 AM
  • 10. In [27]: pwt9.groupby('isocode').count() Sumary Statistics (see: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html)) In [28]: pwt9.describe() Out[27]: country year currency rgdpe rgdpo pop emp avh hc ccon ... csh_g csh_x csh_m csh_r isocode ABW 65 65 65 45 45 45 24 0 0 45 ... 45 45 45 45 AGO 65 65 65 45 45 45 45 0 45 45 ... 45 45 45 45 AIA 65 65 65 45 45 45 29 0 0 45 ... 45 45 45 45 ALB 65 65 65 45 45 45 45 0 45 45 ... 45 45 45 45 ARE 65 65 65 45 45 45 45 0 45 45 ... 45 45 45 45 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... VNM 65 65 65 45 45 45 45 45 45 45 ... 45 45 45 45 YEM 65 65 65 26 26 26 26 0 26 26 ... 26 26 26 26 ZAF 65 65 65 65 65 65 65 12 65 65 ... 65 65 65 65 ZMB 65 65 65 60 60 60 60 0 60 60 ... 60 60 60 60 ZWE 65 65 65 61 61 61 35 0 61 61 ... 61 61 61 61 182 rows × 46 columns Out[28]: year rgdpe rgdpo pop emp avh hc count 11830.000000 9.439000e+03 9.439000e+03 9439.000000 8244.000000 3319.000000 7867.000000 9.439000e+03 mean 1982.000000 2.530908e+05 2.508545e+05 30.090573 14.218857 1995.704047 2.032653 1.856605e+05 std 18.762456 9.973281e+05 9.899123e+05 111.489127 56.500008 271.514641 0.708940 7.308969e+05 min 1950.000000 1.854156e+01 1.108702e+00 0.004377 0.001180 1362.503252 1.007038 1.459605e+01 25% 1966.000000 5.847668e+03 6.044927e+03 1.593420 0.941974 1816.496296 1.408157 4.988803e+03 50% 1982.000000 2.458191e+04 2.487366e+04 5.985658 2.976475 1979.000000 1.916717 1.975266e+04 75% 1998.000000 1.304465e+05 1.324746e+05 19.533204 8.403848 2176.902193 2.608796 9.644857e+04 max 2014.000000 1.708030e+07 1.713595e+07 1369.435670 798.367798 3042.446905 3.734285 1.368595e+07 8 rows × 40 columns Up and Running with Python http://localhost:8888/nbconvert/html/Documents/Python/Projects/ASA U... 10 of 13 2/28/2020, 10:52 AM
  • 11. In [29]: pwt9.groupby('isocode').describe() Plotting, using Matplotlib There are a number of plotting functions Out[29]: year rgdpe ... pl_m count mean std min 25% 50% 75% max count mean ... 75% isocode ABW 65.0 1982.0 18.90767 1950.0 1966.0 1982.0 1998.0 2014.0 45.0 2524.893969 ... 0.648430 AGO 65.0 1982.0 18.90767 1950.0 1966.0 1982.0 1998.0 2014.0 45.0 55150.395095 ... 0.572490 AIA 65.0 1982.0 18.90767 1950.0 1966.0 1982.0 1998.0 2014.0 45.0 207.284460 ... 0.566618 ALB 65.0 1982.0 18.90767 1950.0 1966.0 1982.0 1998.0 2014.0 45.0 14805.517654 ... 0.580224 ARE 65.0 1982.0 18.90767 1950.0 1966.0 1982.0 1998.0 2014.0 45.0 255698.718403 ... 0.560910 ... ... ... ... ... ... ... ... ... ... ... ... ... VNM 65.0 1982.0 18.90767 1950.0 1966.0 1982.0 1998.0 2014.0 45.0 153119.746441 ... 0.550852 YEM 65.0 1982.0 18.90767 1950.0 1966.0 1982.0 1998.0 2014.0 26.0 41752.590952 ... 0.642233 ZAF 65.0 1982.0 18.90767 1950.0 1966.0 1982.0 1998.0 2014.0 65.0 285953.341587 ... 0.552549 ZMB 65.0 1982.0 18.90767 1950.0 1966.0 1982.0 1998.0 2014.0 60.0 16092.146663 ... 0.572661 ZWE 65.0 1982.0 18.90767 1950.0 1966.0 1982.0 1998.0 2014.0 61.0 23284.501889 ... 0.542417 182 rows × 320 columns Up and Running with Python http://localhost:8888/nbconvert/html/Documents/Python/Projects/ASA U... 11 of 13 2/28/2020, 10:52 AM
  • 12. In [30]: import matplotlib.pyplot as plt import numpy as np plt.xlabel('Log(Population)') plt.title('Histogram of Log(Population)') # Adding the option 'density=True' would give the density. plt.hist(np.log(pwt9['pop']), bins=30) plt.show() The command '%matplotlib inline' in the first cell at the top of the notebook causes Matplotlib to put the plot in the notebook, rather than in a new window. In [ ]: In [ ]: In [ ]: In [ ]: In [ ]: In [ ]: In [ ]: In [ ]: In [ ]: C:ProgramDataAnaconda3libsite-packagespandascoreseries.py:853: RuntimeWar ning: invalid value encountered in log result = getattr(ufunc, method)(*inputs, **kwargs) C:ProgramDataAnaconda3libsite-packagesnumpylibhistograms.py:824: RuntimeW arning: invalid value encountered in greater_equal keep = (tmp_a >= first_edge) C:ProgramDataAnaconda3libsite-packagesnumpylibhistograms.py:825: RuntimeW arning: invalid value encountered in less_equal keep &= (tmp_a <= last_edge) Up and Running with Python http://localhost:8888/nbconvert/html/Documents/Python/Projects/ASA U... 12 of 13 2/28/2020, 10:52 AM
  • 13. In [ ]: Up and Running with Python http://localhost:8888/nbconvert/html/Documents/Python/Projects/ASA U... 13 of 13 2/28/2020, 10:52 AM