Pandas easy to ;learn l;ibrary ffff.pptx

INTRODUCTION TO PANDAS
WITH EXAMPLES
SERIES AND DATAFRAME FUNCTIONS WITH EXAMPLE
PROGRAMS AND OUTPUTS

WHAT IS PANDAS?
• Pandas is an open-source data analysis and
manipulation library for Python.
• It provides data structures like Series (1D) and
DataFrame (2D).
• Used for cleaning, analyzing, and visualizing data.

USES OF PANDAS
• High-performance data analysis tool
• Works with large datasets
• Loads files in many different formats
• Handles missing data
• Reads and writes data from files (CSV, Excel, etc.)
• Filters, groups, and aggregates data
• Supports time series analysis
• Essential for data analysis, machine learning, and visualization
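A minimal sketch of a few of these uses (reading CSV data, grouping and aggregating, and writing CSV back out). The CSV content and the column names `name`, `dept`, and `marks` are hypothetical, and the data is kept in memory so that no file on disk is required.

```python
import io
import pandas as pd

# Hypothetical CSV content, held in memory instead of a file on disk.
csv_text = "name,dept,marks\nA,X,90\nB,Y,85\nC,X,70\n"

# Read CSV data into a DataFrame (a file path works in place of the buffer).
df = pd.read_csv(io.StringIO(csv_text))

# Group and aggregate: mean marks per department.
mean_marks = df.groupby("dept")["marks"].mean()
print(mean_marks)

# Write the DataFrame back out as CSV text (index=False omits the row labels).
out_text = df.to_csv(index=False)
print(out_text)
```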

PANDAS SERIES
• A one-dimensional labeled array capable of
holding any data type.
• Each element has a label (index).
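To illustrate what a labeled index means, the sketch below builds a Series with custom labels. The labels `'a'`, `'b'`, `'c'` and the second Series are illustrative assumptions, not part of the original slides.

```python
import pandas as pd

# Illustrative labels; any hashable values can serve as an index.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(s["b"])         # access by label
print(s.iloc[0])      # access by integer position
print(list(s.index))  # the labels themselves

# A Series can hold other data types too, such as strings.
names = pd.Series(["x", "y"], index=[100, 200])
print(names[100])     # 100 is a label here, not a position
```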

CREATING A SERIES
Example:
import pandas as pd
s = pd.Series([10, 20, 30])
print(s)
Output:
0    10
1    20
2    30
dtype: int64

SERIES FUNCTIONS
• s.head(n) -> Returns first n elements (default n = 5)
• s.tail(n) -> Returns last n elements (default n = 5)
• s.sum() -> Sum of all elements
• s.mean() -> Mean of elements
• s.index -> Returns index labels
• s.values -> Returns array of values

SERIES FUNCTION EXAMPLE
s = pd.Series([10, 20, 30, 40])
print(s.sum()) # Output: 100
print(s.mean()) # Output: 25.0
print(s.head(2)) # Output: 10, 20
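The remaining Series functions from the list (tail, index, values) can be tried on the same data. This is a small sketch; `.tolist()` is used only to make the printed output compact.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])

last_two = s.tail(2)       # last two elements, with labels 2 and 3
print(last_two.tolist())   # [30, 40]
print(list(s.index))       # [0, 1, 2, 3]
print(s.values.tolist())   # [10, 20, 30, 40]
```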

PANDAS DATAFRAME
• A 2D labeled data structure with columns of
potentially different types.
• Like a table with rows and columns.
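A short sketch of "columns of potentially different types": each column keeps its own data type. The `Passed` column is a hypothetical addition for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B"],       # text
    "Marks": [90, 85],        # integers
    "Passed": [True, True],   # booleans (hypothetical column)
})

print(df.dtypes)  # one data type per column
print(df.shape)   # (number of rows, number of columns)
```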

CREATING A DATAFRAME
data = {'Name': ['A', 'B'], 'Marks': [90, 85]}
df = pd.DataFrame(data)
print(df)
Output:
  Name  Marks
0    A     90
1    B     85

DATAFRAME FUNCTIONS - EXAMPLES
df.head(n) -> First n rows (default n = 5)
df.tail(n) -> Last n rows (default n = 5)
df.describe() -> Statistical summary
df.info() -> Info about DataFrame
df['col'] -> Access column
df.iloc[0] -> Access row by integer position

DATAFRAME FUNCTION EXAMPLE
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
print(df.head()) # Output: entire DataFrame (it has fewer than 5 rows)
print(df['A']) # Output: Column A
print(df.describe()) # Summary stats

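Accessing columns and rows:
# Sample data assumed for illustration: three rows with an Age column
df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Age': [25, 30, 35]})
print(df['Name']) # Access column
print(df.iloc[1]) # Access second row by integer position
print(df.loc[2]) # Access row by index label (here, label 2)
Adding a new column:
df['Salary'] = [50000, 60000, 70000]
print(df)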

FILTERING DATA
# Get all people older than 28
print(df[df['Age'] > 28])
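A sketch of how this filter works: the comparison produces a boolean Series (a mask), and indexing with it keeps only the rows marked True. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"Name": ["A", "B", "C"], "Age": [25, 30, 35]})

mask = df["Age"] > 28   # boolean Series: False, True, True
older = df[mask]        # keeps the rows where the mask is True
print(older)

# Combine conditions with & (and) or | (or); each condition needs parentheses.
between = df[(df["Age"] > 28) & (df["Age"] < 33)]
print(between)
```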

HANDLING MISSING DATA IN PANDAS
• Missing data is common in real-world datasets.
• Pandas provides several functions to handle missing data.
• Missing values are typically represented as NaN (Not a
Number).

FUNCTIONS TO HANDLE MISSING DATA
• isnull() -> Detect missing values
• notnull() -> Detect non-missing values
• dropna() -> Remove missing values
• fillna() -> Replace missing values
• interpolate() -> Fill values using interpolation

EXAMPLE: HANDLING MISSING DATA
import pandas as pd
data = {'A': [1, None, 3], 'B': [4, 5, None]}
df = pd.DataFrame(data)
print(df.isnull()) # True for NaNs
print(df.fillna(0)) # Replace NaNs with 0
print(df.dropna()) # Drop rows with NaNs
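The example above does not cover notnull() or interpolate(). A minimal sketch on a hypothetical Series, where interpolate() uses its default linear method to fill each gap from the neighbouring values:

```python
import pandas as pd

# Hypothetical data with two missing values.
s = pd.Series([1.0, None, 3.0, None, 7.0])

print(s.notnull().tolist())  # [True, False, True, False, True]
print(s.isnull().sum())      # 2 missing values

filled = s.interpolate()     # default method is linear
print(filled.tolist())       # [1.0, 2.0, 3.0, 5.0, 7.0]
```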

PANDAS IN ML, DATA ANALYSIS & VISUALIZATION
• Pandas is widely used in:
  - Machine Learning (data preprocessing, feature engineering)
  - Data Analysis (exploratory data analysis)
  - Data Visualization (with libraries like matplotlib, seaborn)

FUNCTIONS USED IN ML AND ANALYSIS
• df.drop() -> Drop columns/rows (e.g., irrelevant features)
• df.fillna() -> Handle missing data
• df.astype() -> Convert data types (e.g., to float or int)
• df['col'].map() -> Transform values (e.g., label encoding)
• df.groupby() -> Group and summarize data
• df.corr() -> Correlation matrix for numeric features
• df.describe() -> Statistical summary of data
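The functions above could be combined into a small preprocessing pass, roughly as sketched below. The dataset, its column names, and the choice to fill missing scores with the column mean are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical dataset.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "grade": ["low", "high", "low", "high"],
    "hours": ["1", "2", "3", "4"],        # numbers stored as text
    "score": [50.0, 70.0, None, 90.0],    # one missing value
})

df = df.drop(columns=["id"])                          # drop an irrelevant feature
df["hours"] = df["hours"].astype(int)                 # convert text to int
df["score"] = df["score"].fillna(df["score"].mean())  # fill the gap with the mean
df["grade_code"] = df["grade"].map({"low": 0, "high": 1})  # simple label encoding

by_grade = df.groupby("grade")["score"].mean()        # summarize per group
print(by_grade)
print(df[["hours", "score", "grade_code"]].corr())    # correlation matrix
print(df.describe())                                  # statistical summary
```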

FUNCTIONS USED IN VISUALIZATION
• df.plot() -> Quick plot of data
• df.hist() -> Histogram of numeric columns
• df.boxplot() -> Boxplot for distribution and outliers
• df.plot(kind='bar') -> Bar chart
• df.plot(kind='line') -> Line chart for time series
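A sketch of these plotting calls, assuming matplotlib is installed (pandas uses it for plotting). The data is hypothetical, and the non-interactive "Agg" backend is chosen here only so the code runs without opening a window.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"A": [1, 3, 2, 4], "B": [2, 2, 3, 5]})

line_ax = df.plot(kind="line")  # one line per numeric column
bar_ax = df.plot(kind="bar")    # grouped bars, one group per row
hist_axes = df.hist()           # one histogram per numeric column

plt.figure()                    # fresh figure so the boxplot gets its own axes
box_ax = df.boxplot()           # one box per numeric column

# In a script, plt.show() or a savefig call would come before closing.
plt.close("all")
```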
