2. Pandas
● Pandas is an open-source library built on top of the NumPy library.
● It is a Python package that offers various data structures and operations for
manipulating numerical data and time series.
● It is popular mainly because it makes importing and analyzing data much easier.
● Pandas is fast and offers high performance and productivity for users.
3. Installation
● The first step in working with pandas is to check whether it is installed in your
Python environment.
● If it is not, install it using the pip command.
● Command:
○ pip install pandas
● After pandas has been installed, import the library:
○ import pandas as pd
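● A quick way to confirm that the import worked is to print the installed version. This is a minimal sketch; the value printed depends on your environment.

```python
import pandas as pd

# pd.__version__ is a string; its exact value depends on the installation
version = pd.__version__
print(version)
```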
4. Pandas Data Structures
● Pandas provides two main data structures for manipulating data.
They are:
○ Series
○ DataFrame
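● The relationship between the two structures can be sketched as follows. The names and values are made up for illustration.

```python
import pandas as pd

# A Series is a single labeled column of values (example data is made up)
ages = pd.Series([25, 32, 47], name="age")

# A DataFrame is a table made of one or more columns
people = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "age": [25, 32, 47]})

# Selecting one column of a DataFrame gives back a Series
age_column = people["age"]

print(type(ages).__name__)        # Series
print(type(people).__name__)      # DataFrame
print(type(age_column).__name__)  # Series
```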
5. Series
● A Pandas Series is a one-dimensional labeled array capable of holding data of any type
(integer, string, float, Python objects, etc.).
● The axis labels are collectively called the index.
● A Pandas Series is comparable to a single column in an Excel sheet.
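● As a small illustration of values and labels living side by side, the sketch below builds a Series with string labels. The data is invented for the example.

```python
import pandas as pd

# Made-up example: three values with string labels
prices = pd.Series([1.5, 0.8, 2.0], index=["apple", "banana", "cherry"])

print(prices.index.tolist())   # ['apple', 'banana', 'cherry']
print(prices.values.tolist())  # [1.5, 0.8, 2.0]
print(prices.dtype)            # float64
```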
6. Creating a series
● In practice, a Pandas Series is usually created by loading a dataset
from existing storage, such as a SQL database, a CSV file, or an Excel file.
● A Pandas Series can also be created from a list, a dictionary,
a scalar value, and so on.
import pandas as pd
import numpy as np
# Creating an empty series (an explicit dtype avoids a warning in some pandas versions)
ser = pd.Series(dtype="object")
print(ser)
# simple array
data = np.array(['s', 'e', 'r', 'i', 'e', 's'])
ser = pd.Series(data)
print(ser)
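● The dictionary and scalar cases mentioned above can be sketched like this. The variable names and values are illustrative only.

```python
import pandas as pd

# From a dictionary: the keys become the index labels
from_dict = pd.Series({"a": 10, "b": 20, "c": 30})

# From a scalar: the value is repeated for every label in the index
from_scalar = pd.Series(5, index=["x", "y", "z"])

print(from_dict)
print(from_scalar)
```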
7. Creating a series from lists
● To create a Series from a list, first
create the list, then pass it to
pd.Series().
import pandas as pd
# a simple list (named "letters" to avoid shadowing the built-in list type)
letters = ['s', 'e', 'r', 'i', 'e', 's']
# create series from a list
ser = pd.Series(letters)
print(ser)
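● When no index is given, pandas assigns integer labels starting at 0. A minimal sketch that inspects the index of a Series built from a list:

```python
import pandas as pd

letters = ['s', 'e', 'r', 'i', 'e', 's']
ser = pd.Series(letters)

# Without an explicit index, pandas assigns integer labels 0..n-1
print(ser.index)  # RangeIndex(start=0, stop=6, step=1)
print(len(ser))   # 6
```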
8. Accessing element of Series
There are two ways to access an element of a Series:
● Accessing an element by position
● Accessing an element by label (index)
Accessing an Element from a Series by Position:
● To access a Series element by position, refer to its index number.
● Use the index operator [ ] to access an element in a Series.
● The index must be an integer. To access multiple elements from a Series,
use the slice operation.
9. Example
# import pandas and numpy
import pandas as pd
import numpy as np
# creating simple array
data = np.array(['s','e','r','i','e','s',
'i','n','P','y','t','h','o','n'])
ser = pd.Series(data)
# retrieve the first five elements
print(ser[:5])
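● With the default integer index, positions and labels coincide. The iloc accessor makes positional access explicit regardless of the labels. A minimal sketch using the same letters as the example:

```python
import pandas as pd

ser = pd.Series(list("seriesinPython"))

first = ser.iloc[0]          # single element at position 0
first_five = ser.iloc[:5]    # slice: positions 0 to 4
last_three = ser.iloc[-3:]   # negative positions count from the end

print(first)                # s
print(first_five.tolist())  # ['s', 'e', 'r', 'i', 'e']
print(last_three.tolist())  # ['h', 'o', 'n']
```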
10. Accessing Element Using Label (index)
● To access an element of a Series by
label, pass its index label to the index
operator [ ].
● A Series is like a fixed-size dictionary
in that you can get and set values by
index label.
import pandas as pd
import numpy as np
# creating simple array
data = np.array(['s','e','r','i','e','s', 'i','n','P','y','t','h','o','n'])
ser = pd.Series(data, index=[10, 11, 12, 13, 14, 15, 16, 17, 18,
                             19, 20, 21, 22, 23])
# accessing an element using its index label
print(ser[16])
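● The dictionary-like behavior can be sketched with the loc accessor, which always works on labels. The index values 10 to 23 follow the example above; the replacement value "I" is arbitrary.

```python
import pandas as pd

ser = pd.Series(list("seriesinPython"), index=range(10, 24))

print(ser.loc[16])  # i  (the element whose label is 16)
print(16 in ser)    # True: membership tests check the labels, like dictionary keys
print(3 in ser)     # False: 3 is a valid position but not a label

ser.loc[16] = "I"   # set a value by label
print(ser.loc[16])  # I
```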
11. Indexing and Selecting Data in Series
● Indexing in pandas simply means selecting
particular data from a Series.
● Indexing can mean selecting all of the data or
only some of the data from particular columns.
Indexing is also known as subset
selection.
import pandas as pd
# reading the CSV file into a DataFrame
df = pd.read_csv("/Users/maik/Documents/DCSE/Courses/IMS/DMT/nba.csv")
# taking the "Name" column as a Series
ser = pd.Series(df['Name'])
# selecting the first 10 entries
data = ser.head(10)
print(data)
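● Subset selection can also be tried without a file. The sketch below uses a small hand-made Series as a hypothetical stand-in for a "Name" column; the names are invented.

```python
import pandas as pd

# Hypothetical stand-in for a "Name" column loaded from a file
names = pd.Series(["Avery", "Jae", "John", "R.J.", "Jonas"])

first_three = names.head(3)                       # first three rows
picked = names.loc[[0, 4]]                        # a list of labels selects several rows
starts_with_j = names[names.str.startswith("J")]  # boolean mask keeps matching rows

print(first_three.tolist())    # ['Avery', 'Jae', 'John']
print(picked.tolist())         # ['Avery', 'Jonas']
print(starts_with_j.tolist())  # ['Jae', 'John', 'Jonas']
```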
12. Data Frames
● A Pandas DataFrame is a two-dimensional
data structure, like a two-dimensional
array or a table with rows and columns.
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45] }
#load data into a DataFrame object:
df = pd.DataFrame(data)
print(df)
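● The row-and-column structure can be inspected directly. A minimal sketch using the same data:

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data)

print(df.shape)             # (3, 2): three rows, two columns
print(df.columns.tolist())  # ['calories', 'duration']

calories = df["calories"]   # a single column is a Series
print(calories.tolist())    # [420, 380, 390]
```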
13. Locate Row
● As the result above shows, the DataFrame is like a table with rows and columns.
● Pandas uses the loc attribute to return one or more specified row(s).
○ print(df.loc[0])
○ print(df.loc[[0, 1]])
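● The two calls return different types: a single label gives a Series, while a list of labels gives a DataFrame. A short sketch using the same data:

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

row = df.loc[0]        # one label: the row comes back as a Series
rows = df.loc[[0, 1]]  # a list of labels: the rows come back as a DataFrame

print(type(row).__name__)   # Series
print(type(rows).__name__)  # DataFrame
print(row["calories"])      # 420
```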
14. Named Indexes
● With the index argument, you can assign your own labels to the rows.
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
print(df)
15. Locate Named Indexes
● Use the named index in the loc attribute to return the specified row(s).
○ print(df.loc["day2"])
Note: use to_string() to print the entire DataFrame.
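● A minimal sketch of loc with named indexes and of to_string(), using the same data. Note that a label slice such as "day1":"day2" includes both endpoints.

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

day2 = df.loc["day2"]         # one named row, returned as a Series
span = df.loc["day1":"day2"]  # label slices include both endpoints

# to_string() renders every row and column as one string
text = df.to_string()
print(text)
```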
16. Example
import pandas as pd
# reading csv file
data = pd.read_csv("/Users/maik/Documents/DCSE/Courses/IMS/DMT/nba.csv")
# storing dtype before converting
before = data.dtypes
# converting dtypes using astype
# (astype(int) raises an error if the column contains missing values)
data["Salary"] = data["Salary"].astype(int)
data["Number"] = data["Number"].astype(str)
# storing dtype after converting
after = data.dtypes
# printing to compare
print("BEFORE CONVERSION\n", before, "\n")
print("AFTER CONVERSION\n", after, "\n")
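● The same conversion can be tried without the CSV file. The sketch below uses a small made-up DataFrame whose column names mirror the example; the values, and the choice to drop rows with missing data before converting, are assumptions for illustration.

```python
import pandas as pd

# Made-up stand-in for the CSV data; column names mirror the example above
data = pd.DataFrame({
    "Number": [0.0, 99.0, 30.0],
    "Salary": [7730337.0, 6796117.0, None],
})

before = data.dtypes

# astype(int) cannot convert NaN, so rows with missing values are dropped first
data = data.dropna()
data["Salary"] = data["Salary"].astype(int)
data["Number"] = data["Number"].astype(str)

after = data.dtypes

print("BEFORE CONVERSION\n", before, "\n")
print("AFTER CONVERSION\n", after, "\n")
```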