INDIAN INSTITUTE OF TECHNOLOGY ROORKEE
Lecture 1: Python – Fundamentals
Dr. A. Ramesh
DEPARTMENT OF MANAGEMENT
IIT ROORKEE
Learning objectives
1. Installing Python
2. Fundamentals of Python
3. Data Visualisation
Python Installation Process
Step 1: Type https://www.anaconda.com in the address bar of a web browser.
Step 2: Click the download button.
Step 3: Download the Python 3.8 version for Windows OS.
Step 4: Double-click the downloaded file to run the installer.
Step 5: Follow the instructions until the installation process is complete.
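Once the installation finishes, one simple way to confirm that Python runs is to print the interpreter version. This is a minimal sketch, not part of the original slides; the variable name major is only illustrative.

```python
import sys

# Print the version string of the Python interpreter running this code
print(sys.version)

# The steps above install Python 3.8; any Python 3 release has major version 3
major = sys.version_info.major
print("Major version:", major)
```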
Why Jupyter Notebook?
• Edit code in a web browser
• Easy documentation
• Easy demonstration
• User-friendly interface
Python and Jupyter
• Python is the programming language; Jupyter is the application.
• The software package contains both Python and the Jupyter application.
About Jupyter Notebook
• Cell: press the Enter key to access (edit) the selected cell.
• Input field: green indicates edit mode; blue indicates command mode.
• Markdown cell: contains documentation; its text is not executed as code.
• Command mode allows you to edit the notebook as a whole.
• To leave edit mode, press the Escape key.
• A comment line is written with a preceding # symbol.
• Execution (three ways)
o Ctrl + Enter (runs the cell and keeps it selected)
o Shift + Enter (runs the cell and moves to the next cell)
o Run button on the Jupyter interface
• Important shortcut keys (in command mode)
o A -> create a cell above
o B -> create a cell below
o D, D (press D twice) -> delete the cell
o M -> change the cell to a markdown cell
o Y -> change the cell to a code cell
Fundamentals of Python
• Loading a simple delimited data file
• Counting how many rows and columns were loaded
• Determining which type of data was loaded
• Looking at different parts of the data by subsetting rows and columns
Importing Different Files in Jupyter Notebook
• Importing a text file
• Importing a tabular file
• Importing an Excel file
• Importing a Zip file
• Importing a PDF file
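The code on these slides did not survive extraction, so the following is only a sketch of how three of these imports might look. The file names and contents are hypothetical and are created in a temporary folder so the example runs anywhere. Reading Excel files with pd.read_excel needs an Excel engine installed, and reading PDF files needs a third-party package, so neither is run here.

```python
import tempfile
import zipfile
from pathlib import Path

import pandas as pd

# Hypothetical sample files, written to a temporary folder
folder = Path(tempfile.mkdtemp())
(folder / "notes.txt").write_text("hello from a text file\n")
(folder / "table.tsv").write_text("name\tscore\nA\t1\nB\t2\n")

# Text file: plain Python file handling
with open(folder / "notes.txt") as handle:
    text = handle.read()

# Tabular (delimited) file: pandas.read_csv with the separator spelled out
table = pd.read_csv(folder / "table.tsv", sep="\t")

# Zip file: the standard-library zipfile module writes and lists archive members
with zipfile.ZipFile(folder / "archive.zip", "w") as archive:
    archive.write(folder / "notes.txt", arcname="notes.txt")
with zipfile.ZipFile(folder / "archive.zip") as archive:
    members = archive.namelist()

print(text)
print(table)
print(members)
```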
Loading a simple delimited data file
• The head method shows only the first 5 rows by default.
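A minimal sketch of loading a tab-delimited file and calling head. The data file used in the lecture is not named in the text, so a small hypothetical sample with made-up values is read from memory instead; the column names country, continent, year and lifeExp follow the names mentioned later in the slides.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample (illustrative values, not the lecture data)
header = "country\tcontinent\tyear\tlifeExp"
rows = [f"C{i}\tX\t{2000 + i}\t{50.0 + i}" for i in range(7)]
raw = "\n".join([header] + rows)

# sep="\t" tells pandas that the file is tab-delimited
df = pd.read_csv(io.StringIO(raw), sep="\t")

# head returns the first 5 rows unless another number is passed
first_rows = df.head()
print(first_rows)
```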
Get the number of rows and columns
Get the column names
Get the dtype of each column
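These three checks can be sketched with the shape, columns and dtypes attributes of a DataFrame. The small DataFrame below is a hypothetical stand-in with made-up values.

```python
import pandas as pd

# Hypothetical stand-in data (illustrative values only)
df = pd.DataFrame({
    "country": ["A", "B", "C"],
    "year": [2000, 2005, 2010],
    "lifeExp": [50.5, 61.0, 72.25],
})

# shape is an attribute (no parentheses): (number of rows, number of columns)
n_rows, n_cols = df.shape

# columns holds the column names
column_names = list(df.columns)

# dtypes lists the data type stored in each column
column_types = df.dtypes

print(n_rows, n_cols)
print(column_names)
print(column_types)
```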
Pandas Types Versus Python Types
Get more information about the data
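A sketch of the info method on a hypothetical DataFrame. In a notebook, df.info() prints its summary directly; here the output is captured in a buffer only so that it can be inspected as a string.

```python
import io

import pandas as pd

# Hypothetical stand-in data (illustrative values only)
df = pd.DataFrame({
    "country": ["A", "B", "C"],
    "year": [2000, 2005, 2010],
    "lifeExp": [50.5, 61.0, 72.25],
})

# info reports the index range, column names, non-null counts and dtypes
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print(summary)
```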
Looking at Columns, Rows, and Cells
• # get the country column and save it to its own variable
• # show the first 5 observations
• # show the last 5 observations
• # look at country, continent, and year
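The comments above can be sketched as follows. The DataFrame is a hypothetical stand-in with made-up values, and the variable names country_df and subset are only illustrative.

```python
import pandas as pd

# Hypothetical stand-in data (illustrative values only)
df = pd.DataFrame({
    "country": [f"C{i}" for i in range(8)],
    "continent": ["X"] * 8,
    "year": [2000 + i for i in range(8)],
    "lifeExp": [50.0 + i for i in range(8)],
})

# get the country column and save it to its own variable
country_df = df["country"]

# show the first 5 observations
print(country_df.head())

# show the last 5 observations
print(country_df.tail())

# look at country, continent, and year: pass a list of column names
subset = df[["country", "continent", "year"]]
print(subset.head())
```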
Subset Rows by Index Label: loc
• # get the first row
  # Python counts from 0
• # get the 100th row
  # Python counts from 0
• # get the last row
Subsetting Multiple Rows
• # select the first, 100th, and 1000th rows
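A sketch of row selection with loc, assuming the default integer index labels 0, 1, 2, and so on. The DataFrame is synthetic, with 1,200 made-up rows so that the 1000th row exists.

```python
import pandas as pd

# Hypothetical synthetic data: 1,200 rows with the default 0..1199 index
n = 1200
df = pd.DataFrame({
    "country": [f"Country{i}" for i in range(n)],
    "lifeExp": [40.0 + (i % 40) for i in range(n)],
})

# get the first row (Python counts from 0)
first_row = df.loc[0]

# get the 100th row (label 99, because counting starts at 0)
hundredth_row = df.loc[99]

# get the last row: loc needs the actual label, so compute it from the row count
last_label = df.shape[0] - 1
last_row = df.loc[last_label]

# select the first, 100th, and 1000th rows by passing a list of labels
several_rows = df.loc[[0, 99, 999]]
print(several_rows)
```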
Subset Rows by Row Number: iloc
• # get the 2nd row
• # get the 100th row
• # using -1 to get the last row
With iloc, we can pass in -1 to get the last row, something we couldn't do with loc.
• # get the first, 100th, and 1000th rows
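A sketch of the same selections with iloc, again on synthetic made-up data with the default integer index. The try/except at the end illustrates that loc treats -1 as a label, which does not exist in this index.

```python
import pandas as pd

# Hypothetical synthetic data: 1,200 rows with the default 0..1199 index
n = 1200
df = pd.DataFrame({
    "country": [f"Country{i}" for i in range(n)],
    "lifeExp": [40.0 + (i % 40) for i in range(n)],
})

# get the 2nd row (position 1)
second_row = df.iloc[1]

# get the 100th row (position 99)
hundredth_row = df.iloc[99]

# using -1 to get the last row
last_row = df.iloc[-1]

# get the first, 100th, and 1000th rows
several_rows = df.iloc[[0, 99, 999]]

# loc looks for a row labelled -1, which is not in this index
try:
    df.loc[-1]
    loc_accepts_minus_one = True
except KeyError:
    loc_accepts_minus_one = False
print(loc_accepts_minus_one)
```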
Subsetting Columns
• The Python slicing syntax uses a colon, :
• A colon on its own selects everything along that axis.
• So, if we just want to get the first column using the loc or iloc syntax, we can write something like df.loc[:, [columns]] to subset the column(s).
• # subset columns with loc
  # note the position of the colon
  # it is used to select all rows
• # subset columns with iloc
  # iloc will allow us to use integers
  # -1 will select the last column
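A sketch of column subsetting with loc and iloc. The DataFrame is a hypothetical stand-in; the column names pop and gdpPercap are assumptions added to give the example six columns.

```python
import pandas as pd

# Hypothetical stand-in data (illustrative values; pop and gdpPercap are assumed names)
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "continent": ["X", "X", "Y", "Y"],
    "year": [2000, 2005, 2000, 2005],
    "lifeExp": [50.0, 52.0, 60.0, 64.0],
    "pop": [1000, 1100, 2000, 2100],
    "gdpPercap": [500.0, 550.0, 900.0, 950.0],
})

# subset columns with loc: the colon before the comma selects all rows
subset_loc = df.loc[:, ["year", "pop"]]

# subset columns with iloc: integer positions, and -1 selects the last column
subset_iloc = df.iloc[:, [2, 4, -1]]

print(subset_loc)
print(subset_iloc)
```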
Subsetting Columns by Range
• # create a range of integers from 0 to 4 inclusive
• # subset the dataframe with the range
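A sketch of subsetting by a range of column positions. The DataFrame is a hypothetical stand-in; pop and gdpPercap are assumed column names.

```python
import pandas as pd

# Hypothetical stand-in data (illustrative values; pop and gdpPercap are assumed names)
df = pd.DataFrame({
    "country": ["A", "B"],
    "continent": ["X", "Y"],
    "year": [2000, 2005],
    "lifeExp": [50.0, 60.0],
    "pop": [1000, 2000],
    "gdpPercap": [500.0, 900.0],
})

# create a range of integers from 0 to 4 inclusive (range stops before 5)
small_range = list(range(5))

# subset the dataframe with the range: all rows, first five columns
subset = df.iloc[:, small_range]
print(subset)
```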
Subsetting Rows and Columns
• # using loc
• # using iloc
Subsetting Multiple Rows and Columns
• # get the 1st, 100th, and 1000th rows
  # from the 1st, 4th, and 6th columns
• # if we use the column names directly,
  # it makes the code a bit easier to read
  # note now we have to use loc, instead of iloc
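A sketch of selecting rows and columns together. The data is synthetic and made up, the column names pop and gdpPercap are assumptions, and row 42 is an arbitrary choice for the single-cell example.

```python
import pandas as pd

# Hypothetical synthetic data: 1,200 rows, six columns
n = 1200
df = pd.DataFrame({
    "country": [f"Country{i}" for i in range(n)],
    "continent": ["X"] * n,
    "year": [1952 + 5 * (i % 12) for i in range(n)],
    "lifeExp": [40.0 + (i % 40) for i in range(n)],
    "pop": [1000 + i for i in range(n)],
    "gdpPercap": [500.0 + i for i in range(n)],
})

# using loc: row label, column name
cell_loc = df.loc[42, "country"]

# using iloc: row position, column position
cell_iloc = df.iloc[42, 0]

# get the 1st, 100th, and 1000th rows from the 1st, 4th, and 6th columns
block_iloc = df.iloc[[0, 99, 999], [0, 3, 5]]

# the same selection with column names, which requires loc instead of iloc
block_loc = df.loc[[0, 99, 999], ["country", "lifeExp", "gdpPercap"]]
print(block_loc)
```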
Grouped Means
• # For each year in our data, what was the average life expectancy?
# To answer this question,
# we need to split our data into parts by year;
# then we get the 'lifeExp' column and calculate the mean
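The split, select and average steps described above can be sketched with groupby. The values below are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in data (illustrative values only)
df = pd.DataFrame({
    "country": ["A", "B", "A", "B"],
    "year": [2000, 2000, 2005, 2005],
    "lifeExp": [50.0, 60.0, 52.0, 64.0],
})

# split the data by year, take the lifeExp column, and calculate the mean
mean_by_year = df.groupby("year")["lifeExp"].mean()
print(mean_by_year)
```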
• If you need to “flatten” the dataframe, you can use the
reset_index method.
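A sketch of flattening a grouped result with reset_index, assuming the grouping is by two columns (year and continent). The values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in data (illustrative values only)
df = pd.DataFrame({
    "year": [2000, 2000, 2005, 2005],
    "continent": ["X", "Y", "X", "Y"],
    "lifeExp": [50.0, 60.0, 52.0, 64.0],
})

# grouping by two columns gives a result indexed by (year, continent)
grouped = df.groupby(["year", "continent"])["lifeExp"].mean()

# reset_index turns the index levels back into ordinary columns
flat = grouped.reset_index()
print(flat)
```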
Grouped Frequency Counts
• Use nunique to count the unique values in a Pandas Series.
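A sketch of a grouped count of unique values, assuming the question is how many distinct countries appear in each continent. The values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in data; country "A" appears twice on purpose
df = pd.DataFrame({
    "continent": ["X", "X", "X", "Y", "Y", "Y"],
    "country": ["A", "A", "B", "C", "D", "E"],
})

# nunique counts distinct values, so the repeated "A" is counted once
countries_per_continent = df.groupby("continent")["country"].nunique()
print(countries_per_continent)
```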
Basic Plot
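The plot on this slide did not survive extraction. One plausible basic plot, sketched here under that assumption, draws the yearly mean life expectancy with the plot method of a Pandas Series. The data is made up, and the Agg backend line is only needed when running outside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # draw without a display; not needed inside Jupyter
import pandas as pd

# Hypothetical stand-in data (illustrative values only)
df = pd.DataFrame({
    "year": [2000, 2000, 2005, 2005, 2010, 2010],
    "lifeExp": [50.0, 60.0, 52.0, 64.0, 56.0, 66.0],
})

# average life expectancy per year, then a line plot of the result
mean_by_year = df.groupby("year")["lifeExp"].mean()
ax = mean_by_year.plot()
ax.set_xlabel("year")
ax.set_ylabel("mean lifeExp")
ax.set_title("Average life expectancy by year")
```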
Visual Representation of the Data
• Histogram -- vertical bar chart of frequencies
• Frequency Polygon -- line graph of frequencies
• Ogive -- line graph of cumulative frequencies
• Pie Chart -- proportional representation for categories of a whole
• Stem and Leaf Plot
• Pareto Chart
• Scatter Plot
Methods of visual presentation of data
• Table
        1st Qtr   2nd Qtr   3rd Qtr   4th Qtr
East    20.4      27.4      90        20.4
West    30.6      38.6      34.6      31.6
North   45.9      46.9      45        43.9
• Graphs
[Figure: quarterly values for East, West, and North; horizontal axis 1st Qtr to 4th Qtr, vertical axis 0 to 90]
• Pie chart
[Figure: pie chart with slices for 1st Qtr, 2nd Qtr, 3rd Qtr, and 4th Qtr]
• Multiple bar chart
[Figure: bars for East, West, and North in each quarter; value axis 0 to 100]
• Simple pictogram
[Figure: quarterly values for East, North, and West; horizontal axis 1st Qtr to 4th Qtr, vertical axis 0 to 100]
Frequency distributions
• Frequency tables
Observation Table
Class Interval   Frequency   Cumulative Frequency
< 20             13          13
< 40             18          31
< 60             25          56
< 80             15          71
< 100            9           80
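The cumulative frequency column is a running total of the frequency column. A minimal sketch using the frequencies from the table above:

```python
from itertools import accumulate

# Class intervals and frequencies taken from the observation table
class_intervals = ["< 20", "< 40", "< 60", "< 80", "< 100"]
frequency = [13, 18, 25, 15, 9]

# accumulate returns the running total: 13, 13 + 18, 13 + 18 + 25, ...
cumulative = list(accumulate(frequency))

for label, freq, cum in zip(class_intervals, frequency, cumulative):
    print(label, freq, cum)
```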
Frequency diagrams
[Figure: Frequency by class interval (< 20 to < 100); vertical axis 0 to 30]
[Figure: Cumulative Frequency by class interval (< 20 to < 100); vertical axis 0 to 90]
Histogram
Class Interval   Frequency
20-under 30      6
30-under 40      18
40-under 50      11
50-under 60      11
60-under 70      3
70-under 80      1
[Figure: histogram; horizontal axis Years (0 to 80), vertical axis Frequency (0 to 20)]
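A sketch of drawing this histogram from the class frequencies with matplotlib. Because the data is already grouped, each class is drawn as a bar of width 10 starting at its lower bound. The Agg backend line is only needed when running outside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # draw without a display; not needed inside Jupyter
import matplotlib.pyplot as plt

# Lower class bounds and frequencies from the table above
lower_bounds = [20, 30, 40, 50, 60, 70]
frequency = [6, 18, 11, 11, 3, 1]

fig, ax = plt.subplots()
# align="edge" starts each bar at the lower bound; width=10 makes the bars touch
bars = ax.bar(lower_bounds, frequency, width=10, align="edge", edgecolor="black")
ax.set_xlim(0, 80)
ax.set_xlabel("Years")
ax.set_ylabel("Frequency")
ax.set_title("Histogram")
```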
Histogram Construction
[Figure: histogram built from the class intervals and frequencies above; horizontal axis Years (0 to 80), vertical axis Frequency (0 to 20)]
Frequency Polygon
[Figure: line graph of the same class frequencies (6, 18, 11, 11, 3, 1); horizontal axis Years (0 to 80), vertical axis Frequency (0 to 20)]
Ogive
Class Interval   Cumulative Frequency
20-under 30      6
30-under 40      24
40-under 50      35
50-under 60      46
60-under 70      49
70-under 80      50
[Figure: ogive; horizontal axis Years (0 to 80), vertical axis Frequency (0 to 60)]
Relative Frequency Ogive
Class Interval   Cumulative Relative Frequency
20-under 30      .12
30-under 40      .48
40-under 50      .70
50-under 60      .92
60-under 70      .98
70-under 80      1.00
[Figure: relative frequency ogive; horizontal axis Years (0 to 80), vertical axis Cumulative Relative Frequency (0.00 to 1.00)]
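The cumulative and cumulative relative frequencies plotted in the ogives follow from the class frequencies of the histogram. A minimal sketch of that arithmetic:

```python
from itertools import accumulate

# Class frequencies from the histogram table (20-under 30 up to 70-under 80)
frequency = [6, 18, 11, 11, 3, 1]

# Running total gives the cumulative frequency used in the ogive
cumulative = list(accumulate(frequency))

# Dividing by the total count gives the cumulative relative frequency
total = cumulative[-1]
cumulative_relative = [round(value / total, 2) for value in cumulative]

print(cumulative)
print(cumulative_relative)
```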
Pareto Chart
[Figure: Pareto chart for the categories Poor Wiring, Short in Coil, Defective Plug, and Other; left axis Frequency (0 to 100), right axis cumulative percentage (0% to 100%)]
Scatter Plot
Registered Vehicles (1000's)   Gasoline Sales (1000's of Gallons)
5                              60
15                             120
9                              90
15                             140
7                              60
[Figure: scatter plot; horizontal axis Registered Vehicles (0 to 20), vertical axis Gasoline Sales (0 to 200)]
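A sketch of drawing this scatter plot with matplotlib, using the five pairs from the table above. The Agg backend line is only needed when running outside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # draw without a display; not needed inside Jupyter
import matplotlib.pyplot as plt

# Data from the table: one point per (vehicles, gasoline) pair
registered_vehicles = [5, 15, 9, 15, 7]
gasoline_sales = [60, 120, 90, 140, 60]

fig, ax = plt.subplots()
points = ax.scatter(registered_vehicles, gasoline_sales)
ax.set_xlim(0, 20)
ax.set_ylim(0, 200)
ax.set_xlabel("Registered Vehicles (1000's)")
ax.set_ylabel("Gasoline Sales (1000's of Gallons)")
ax.set_title("Scatter Plot")
```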
Principles of Excellent Graphs
• The graph should not distort the data
• The graph should not contain unnecessary adornments (sometimes
referred to as chart junk)
• The scale on the vertical axis should begin at zero
• All axes should be properly labeled
• The graph should contain a title
• The simplest possible graph should be used for a given set of data
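Several of these principles can be applied directly in matplotlib. The sketch below plots the East row of the quarterly table shown earlier as a plain bar chart with a title, labeled axes, and a vertical axis that begins at zero. The axis label "Value" is an assumption, because the slides do not state the units.

```python
import matplotlib

matplotlib.use("Agg")  # draw without a display; not needed inside Jupyter
import matplotlib.pyplot as plt

# East row of the quarterly table
quarters = ["1st Qtr", "2nd Qtr", "3rd Qtr", "4th Qtr"]
east = [20.4, 27.4, 90, 20.4]

fig, ax = plt.subplots()
ax.bar(quarters, east)          # simplest chart that fits the data, no adornments
ax.set_ylim(bottom=0)           # vertical axis begins at zero
ax.set_xlabel("Quarter")        # all axes labeled
ax.set_ylabel("Value")          # assumed label; units are not given in the slides
ax.set_title("East: quarterly values")
```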
Graphical Errors: Chart Junk
Bad Presentation
[Figure: decorated Minimum Wage graphic labeled 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80]
Good Presentation
[Figure: plain Minimum Wage chart; horizontal axis 1960 to 1990, vertical axis $ from 0 to 4]
Graphical Errors: Compressing the Vertical Axis
Bad Presentation
[Figure: Quarterly Sales, Q1 to Q4; vertical axis $ from 0 to 200]
Good Presentation
[Figure: Quarterly Sales, Q1 to Q4; vertical axis $ from 0 to 50]
Graphical Errors: No Zero Point on the Vertical Axis
Graphing the first six months of sales
Bad Presentation
[Figure: Monthly Sales, J to J; vertical axis $ starting at 36 (36, 39, 42, 45)]
Good Presentations
[Figure: Monthly Sales, J to J; vertical axis $ starting at 0 (0, 36, 39, 42, 45)]