WHAT IS PANDAS?
• Pandas is an open-source data analysis and manipulation library for Python.
• It provides data structures like Series (1D) and
DataFrame (2D).
• Used for cleaning, analyzing, and visualizing data.
USES OF PANDAS
• High-performance data analysis tool.
• Works with large datasets (as long as they fit in memory).
• Supports many file formats.
• Handles missing data.
• Reads and writes data files (CSV, Excel, etc.).
• Filters, groups, and aggregates data.
• Supports time series analysis.
• Essential for data analysis, machine learning, and visualization.
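A minimal sketch of several of these uses together: reading CSV data, handling a missing value, filtering, grouping, and a daily time series summary. It assumes a small hypothetical CSV held in a string (via io.StringIO) instead of a real file; the column names and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk (assumption for illustration)
csv_text = """date,city,sales
2024-01-01,Pune,100
2024-01-02,Pune,
2024-01-01,Delhi,80
2024-01-02,Delhi,120
"""

# Read: parse_dates converts the 'date' column to datetime values
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Missing data: the empty sales cell is read as NaN; replace it with 0
df["sales"] = df["sales"].fillna(0)

# Filter: keep only rows with sales above 50
high = df[df["sales"] > 50]

# Group and aggregate: total sales per city
totals = df.groupby("city")["sales"].sum()
print(totals)

# Time series: total sales per day across all cities
daily = df.set_index("date")["sales"].resample("D").sum()
print(daily)

# Write: to_csv() returns the CSV text when no file path is given
out = totals.to_csv()
print(out)
```

With a real file, the same calls would take a path instead, e.g. pd.read_csv("sales.csv"), where "sales.csv" is a hypothetical file name.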
PANDAS SERIES
• A one-dimensional labeled array capable of holding any data type.
• Each element has a label (index).
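A short sketch of what "labeled" means in practice: each value can be reached by its label or by its position. The labels and marks below are made-up example values; the second Series shows that mixed types are allowed (stored with the generic object dtype).

```python
import pandas as pd

# A Series with explicit string labels (illustrative values)
marks = pd.Series([90, 85, 78], index=["A", "B", "C"])

print(marks["B"])       # label-based access
print(marks.iloc[0])    # position-based access (first element)

# A Series can hold values of different types; pandas then uses the object dtype
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)
```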
SERIES FUNCTIONS
• s.head(n) -> Returns the first n elements (default 5)
• s.tail(n) -> Returns the last n elements (default 5)
• s.sum() -> Sum of all elements
• s.mean() -> Mean of the elements
• s.index -> Returns the index labels
• s.values -> Returns the values as a NumPy array
SERIES FUNCTION EXAMPLE
import pandas as pd
s = pd.Series([10, 20, 30, 40])
print(s.sum()) # Output: 100
print(s.mean()) # Output: 25.0
print(s.head(2)) # Output: first two elements with their labels (0 -> 10, 1 -> 20)
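The example above covers sum(), mean() and head(); this sketch exercises the remaining functions from the list (tail(), index, values) on the same Series.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])

last_two = s.tail(2)      # last two elements; they keep their labels 2 and 3
labels = list(s.index)    # default labels are the positions 0, 1, 2, 3
values = s.values         # the underlying NumPy array, without labels

print(last_two)
print(labels)
print(values)
```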
PANDAS DATAFRAME
• A 2D labeled data structure with columns of potentially different types.
• Like a table with rows and columns.
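A small sketch of "columns of potentially different types": each column keeps its own dtype, which df.dtypes reports. The column names and values are illustrative assumptions.

```python
import pandas as pd

# Each column has its own type (illustrative values)
df = pd.DataFrame({
    "Name": ["A", "B"],         # text
    "Marks": [90, 85],          # integers
    "Passed": [True, False],    # booleans
})

print(df.dtypes)   # one dtype per column
print(df.shape)    # (rows, columns)
```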
CREATING A DATAFRAME
import pandas as pd
data = {'Name': ['A', 'B'], 'Marks': [90, 85]}
df = pd.DataFrame(data)
print(df)
Output:
  Name  Marks
0    A     90
1    B     85
DATAFRAME FUNCTIONS - EXAMPLES
• df.head(n) -> First n rows (default 5)
• df.tail(n) -> Last n rows (default 5)
• df.describe() -> Statistical summary of numeric columns
• df.info() -> Column names, non-null counts, and data types
• df['col'] -> Access a column
• df.iloc[0] -> Access a row by integer position
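DATAFRAME FUNCTION EXAMPLE
import pandas as pd
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
print(df.head()) # Output: entire DataFrame (only 2 rows, fewer than the default 5)
print(df['A']) # Output: column A as a Series
print(df.describe()) # Summary stats (count, mean, std, min, quartiles, max)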
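The example above does not use tail(), info() or iloc from the function list; this sketch applies them to the same two-row DataFrame. Note that info() prints its report directly rather than returning it.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

last_row = df.tail(1)      # last row only, still a DataFrame
print(last_row)

df.info()                  # prints column names, non-null counts and dtypes

first_row = df.iloc[0]     # first row by integer position, returned as a Series
print(first_row)
```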
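Accessing columns and rows:
import pandas as pd
# Assumed example data with three rows, so that df.loc[2] and the 3-value Salary list work
df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Marks': [90, 85, 78]})
print(df['Name']) # Access column
print(df.iloc[1]) # Access second row by integer position
print(df.loc[2]) # Access row by index label
• Adding a new column
df['Salary'] = [50000, 60000, 70000] # one value per row
print(df)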
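With the default index, labels and positions coincide, so loc and iloc look interchangeable. This sketch uses a deliberately reordered index (an assumption chosen only to expose the difference): loc looks up the label, iloc counts positions.

```python
import pandas as pd

# Index labels deliberately differ from row positions
df = pd.DataFrame({"Name": ["A", "B", "C"]}, index=[2, 1, 0])

by_label = df.loc[0]       # the row whose label is 0 (the last row here)
by_position = df.iloc[0]   # the first row, whatever its label

print(by_label["Name"])
print(by_position["Name"])
```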
HANDLING MISSING DATA IN PANDAS
• Missing data is common in real-world datasets.
• Pandas provides functions such as isnull(), fillna(), and dropna() to detect, fill, or remove missing values.
• Missing values are typically represented as NaN (Not a
Number).
EXAMPLE: HANDLING MISSING DATA
import pandas as pd
data = {'A': [1, None, 3], 'B': [4, 5, None]}
df = pd.DataFrame(data)
print(df.isnull()) # True for NaNs
print(df.fillna(0)) # Replace NaNs with 0 (returns a new DataFrame; df is unchanged)
print(df.dropna()) # Drop rows containing any NaN
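Filling with 0 is only one option. A minimal sketch of two other common variants on the same data: counting NaNs per column, filling each column with its own mean, and dropping columns (axis=1) instead of rows. Whether mean-filling is appropriate depends on the data; it is shown here only as an illustration.

```python
import pandas as pd

data = {'A': [1, None, 3], 'B': [4, 5, None]}
df = pd.DataFrame(data)

nan_counts = df.isnull().sum()    # number of NaNs in each column
print(nan_counts)

filled = df.fillna(df.mean())     # each column's NaNs replaced by that column's mean
print(filled)

cols_kept = df.dropna(axis=1)     # drop columns (not rows) that contain a NaN
print(cols_kept.shape)            # both columns contain a NaN, so none remain
```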
PANDAS IN ML, DATA ANALYSIS & VISUALIZATION
• Pandas is widely used in:
  - Machine Learning (data preprocessing, feature engineering)
  - Data Analysis (exploratory data analysis)
  - Data Visualization (with libraries like matplotlib and seaborn)
FUNCTIONS USED IN ML AND ANALYSIS
• df.drop() -> Drop columns/rows (e.g., irrelevant features)
• df.fillna() -> Handle missing data
• df.astype() -> Convert data types (e.g., to float or int)
• df['col'].map() -> Transform values (e.g., label encoding)
• df.groupby() -> Group and summarize data
• df.corr() -> Correlation matrix for numeric features (pass numeric_only=True if the DataFrame also has text columns)
• df.describe() -> Statistical summary of data
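A minimal preprocessing sketch that chains several of these functions on a toy dataset. The column names, values, and the S/L-to-0/1 mapping are hypothetical, chosen only to show drop(), astype(), map() as simple label encoding, groupby(), corr() and describe().

```python
import pandas as pd

# Toy dataset; all names and values are hypothetical
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "size": ["S", "L", "S", "L"],
    "price": ["10", "30", "12", "28"],   # numbers stored as text
    "weight": [1.0, 3.0, 1.2, 2.9],
})

df = df.drop(columns=["id"])                          # drop an irrelevant feature
df["price"] = df["price"].astype(float)               # convert text to float
df["size_code"] = df["size"].map({"S": 0, "L": 1})    # simple label encoding

mean_price = df.groupby("size")["price"].mean()       # average price per size
print(mean_price)

corr = df[["price", "weight", "size_code"]].corr()    # correlations between numeric features
print(corr)

print(df.describe())                                  # summary of the numeric columns
```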
FUNCTIONS USED IN VISUALIZATION
• df.plot() -> Quick plot of data
• df.hist() -> Histogram of numeric columns
• df.boxplot() -> Boxplot for distribution and outliers
• df.plot(kind='bar') -> Bar chart
• df.plot(kind='line') -> Line chart for time series
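These plotting functions draw with matplotlib, so it must be installed. A minimal sketch on made-up daily sales values; it assumes the non-interactive "Agg" backend so it runs without a display. The box plot gets its own figure via plt.subplots(), because DataFrame.boxplot() otherwise draws on the current axes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is opened
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative daily sales figures (made-up values)
df = pd.DataFrame(
    {"sales": [100, 120, 90, 150]},
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)

ax_line = df.plot(kind="line", title="Daily sales")   # line chart over the date index
ax_bar = df.plot(kind="bar", legend=False)            # bar chart, one bar per day
hist_axes = df.hist()                                 # array of Axes, one per numeric column

fig, ax = plt.subplots()    # separate figure for the box plot
df.boxplot(ax=ax)           # box plot of the 'sales' column

# fig.savefig("boxplot.png") would write the figure to a file
plt.close("all")
```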