INTRODUCTION TO PANDAS
WITH EXAMPLES
SERIES AND DATAFRAME FUNCTIONS WITH EXAMPLE
PROGRAMS AND OUTPUTS
WHAT IS PANDAS?
• Pandas is an open-source data analysis and
manipulation library for Python.
• It provides data structures like Series (1D) and
DataFrame (2D).
• Used for cleaning, analyzing, and visualizing data.
USES OF PANDAS
• High-performance data analysis tool.
• Works with large datasets.
• Loads files in many different formats.
• Handles missing data.
• Reads and writes data from files (CSV, Excel, etc.).
• Filters, groups, and aggregates data.
• Supports time series analysis.
• Essential for data analysis, machine learning, and visualization.
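A minimal sketch of several of these uses together: loading CSV data, handling a missing value, filtering, and grouping. The CSV text, the column names (city, sales) and the variable names are made up for illustration, and io.StringIO stands in for a real file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV content; io.StringIO stands in for a file on disk.
csv_text = """city,sales
Delhi,100
Mumbai,150
Delhi,
Mumbai,50
"""

# Read data: the empty 'sales' cell is loaded as NaN.
sales = pd.read_csv(io.StringIO(csv_text))

# Handle missing data: replace NaN with 0.
sales["sales"] = sales["sales"].fillna(0)

# Filter: keep only rows with sales above zero.
nonzero = sales[sales["sales"] > 0]

# Group and aggregate: total sales per city.
totals = sales.groupby("city")["sales"].sum()
print(totals)

# Write data: to_csv() returns a string when no path is given.
csv_out = sales.to_csv(index=False)
print(csv_out)
```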
PANDAS SERIES
• A one-dimensional labeled array capable of
holding any data type.
• Each element has a label (index).
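As a small illustration of labels, the sketch below builds a Series with custom index labels and reads values by label and by position. The labels 'a', 'b', 'c' and the variable names are assumptions chosen for the example.

```python
import pandas as pd

# A Series with explicit index labels instead of the default 0, 1, 2.
marks = pd.Series([90, 85, 70], index=["a", "b", "c"])

print(marks["b"])     # access by label
print(marks.iloc[0])  # access by integer position

# A Series can hold mixed types; pandas then stores them as generic objects.
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)
```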
CREATING A SERIES
Example:
import pandas as pd
s = pd.Series([10, 20, 30])
print(s)
Output:
0    10
1    20
2    30
dtype: int64
SERIES FUNCTIONS
• s.head(n) -> Returns first n elements (default n = 5)
• s.tail(n) -> Returns last n elements (default n = 5)
• s.sum() -> Sum of all elements
• s.mean() -> Mean of elements
• s.index -> Returns index labels
• s.values -> Returns array of values
SERIES FUNCTION EXAMPLE
s = pd.Series([10, 20, 30, 40])
print(s.sum()) # Output: 100
print(s.mean()) # Output: 25.0
print(s.head(2)) # Output: first two elements (10 and 20) with their index labels
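The remaining Series functions from the list (tail, index, values) can be tried the same way. This is a short sketch; the variable names last_two and long_series are made up for illustration.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])

# tail(2) keeps the last two elements and their original index labels.
last_two = s.tail(2)
print(last_two)

# index holds the labels, values holds the underlying NumPy array.
print(s.index)
print(s.values)

# With no argument, head() returns at most five elements.
long_series = pd.Series(range(10))
print(len(long_series.head()))
```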
PANDAS DATAFRAME
• A 2D labeled data structure with columns of
potentially different types.
• Like a table with rows and columns.
CREATING A DATAFRAME
data = {'Name': ['A', 'B'], 'Marks': [90, 85]}
df = pd.DataFrame(data)
print(df)
Output:
  Name  Marks
0    A     90
1    B     85
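To see that each column can have its own type, the sketch below adds a boolean column to the same data and inspects the result. The 'Passed' column is an assumption added for illustration.

```python
import pandas as pd

# 'Passed' is a hypothetical extra column, added to show a third data type.
data = {'Name': ['A', 'B'], 'Marks': [90, 85], 'Passed': [True, True]}
df = pd.DataFrame(data)

print(df.dtypes)  # one data type per column
print(df.shape)   # (number of rows, number of columns)
```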
DATAFRAME FUNCTIONS - EXAMPLES
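• df.head(n) -> First n rows (default n = 5)
• df.tail(n) -> Last n rows (default n = 5)
• df.describe() -> Statistical summary of numeric columns
• df.info() -> Column names, data types, and non-null counts
• df['col'] -> Access a column
• df.iloc[0] -> Access a row by integer position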
DATAFRAME FUNCTION
EXAMPLE
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
print(df.head()) # Output: entire DataFrame (it has only two rows)
print(df['A']) # Output: column A as a Series
print(df.describe()) # Summary stats (count, mean, std, min, quartiles, max)
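The summary returned by describe() is itself a DataFrame, so individual statistics can be looked up by name. A minimal sketch using the same two-column data; the variable name stats is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# describe() returns a DataFrame indexed by statistic name.
stats = df.describe()
print(stats.loc['mean'])            # mean of each column
print(stats.loc[['min', 'max']])    # smallest and largest value per column

# info() prints its report directly and returns None.
df.info()
```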
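Accessing columns and rows:
# Sample data assumed for this slide (the original does not define it)
df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Age': [25, 30, 35]})
print(df['Name']) # Access column
print(df.iloc[1]) # Access second row by integer position
print(df.loc[2]) # Access row by index label (here, the third row)
• Adding a new column
df['Salary'] = [50000, 60000, 70000] # One value per row
print(df)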
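The difference between iloc (position) and loc (label) is easier to see when the index labels are not 0, 1, 2. The sketch below uses made-up sample data with labels 10, 20, 30; the names people, by_position and by_label are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data with non-default index labels.
people = pd.DataFrame(
    {'Name': ['A', 'B', 'C'], 'Age': [25, 30, 35]},
    index=[10, 20, 30],
)

by_position = people.iloc[1]  # second row, counted from zero
by_label = people.loc[20]     # row whose index label is 20
print(by_position['Name'], by_label['Name'])  # both refer to the same row

# A new column needs one value per existing row.
people['Salary'] = [50000, 60000, 70000]
print(people)
```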
FILTERING DATA
# Get all people older than 28 (assumes df has a numeric 'Age' column)
print(df[df['Age'] > 28])
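Filtering works by building a True/False mask and passing it to the DataFrame. A self-contained sketch with made-up sample data (the people table and its values are assumptions) that also shows how to combine two conditions:

```python
import pandas as pd

# Hypothetical sample data for illustration.
people = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
})

# The comparison produces a boolean Series, one value per row.
mask = people['Age'] > 28
older = people[mask]
print(older)

# Combine conditions with & (and) or | (or); each condition needs parentheses.
older_and_lower_paid = people[(people['Age'] > 28) & (people['Salary'] < 65000)]
print(older_and_lower_paid)
```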
HANDLING MISSING DATA IN PANDAS
• Missing data is common in real-world datasets.
• Pandas provides several functions to handle missing data.
• Missing values are typically represented as NaN (Not a
Number).
FUNCTIONS TO HANDLE MISSING DATA
• isnull() -> Detect missing values
• notnull() -> Detect non-missing values
• dropna() -> Remove missing values
• fillna() -> Replace missing values
• interpolate() -> Fill values using interpolation
EXAMPLE: HANDLING MISSING DATA
import pandas as pd
data = {'A': [1, None, 3], 'B': [4, 5, None]}
df = pd.DataFrame(data)
print(df.isnull()) # True for NaNs
print(df.fillna(0)) # Replace NaNs with 0
print(df.dropna()) # Drop rows with NaNs
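The functions not covered above (notnull and interpolate) can be sketched on the same data. Filling with the column mean is shown as one possible strategy, not a recommendation; the variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, None, 3], 'B': [4, 5, None]})

# notnull() is the opposite of isnull(): True where a value is present.
print(df.notnull())

# Count missing values per column.
missing_counts = df.isnull().sum()
print(missing_counts)

# Linear interpolation: the gap in column A between 1 and 3 becomes 2.
filled = df.interpolate()
print(filled['A'])

# Alternative: replace missing values in B with the mean of the known values.
b_mean_filled = df['B'].fillna(df['B'].mean())
print(b_mean_filled)
```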
PANDAS IN ML, DATA ANALYSIS &
VISUALIZATION
• Pandas is widely used in:
  - Machine Learning (data preprocessing, feature engineering)
  - Data Analysis (exploratory data analysis)
  - Data Visualization (with libraries like matplotlib, seaborn)
FUNCTIONS USED IN ML AND
ANALYSIS
• df.drop() -> Drop columns/rows (e.g., irrelevant features)
• df.fillna() -> Handle missing data
• df.astype() -> Convert data types (e.g., to float or int)
• df['col'].map() -> Transform values (e.g., label encoding)
• df.groupby() -> Group and summarize data
• df.corr() -> Correlation matrix for features
• df.describe() -> Statistical summary of data
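A rough sketch of how these functions might appear in a preprocessing step. The table, its column names (id, grade, hours, score) and the label mapping are all invented for illustration.

```python
import pandas as pd

# Hypothetical raw data: 'hours' was read as text, 'grade' is a category.
raw = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'grade': ['low', 'high', 'low', 'high'],
    'hours': ['2', '8', '3', '9'],
    'score': [40, 90, 50, 95],
})

# drop(): remove a column that carries no information for a model.
features = raw.drop(columns=['id'])

# astype(): convert text digits to integers.
features['hours'] = features['hours'].astype(int)

# map(): simple label encoding of a categorical column.
features['grade_code'] = features['grade'].map({'low': 0, 'high': 1})

# groupby(): average score per grade.
mean_by_grade = features.groupby('grade')['score'].mean()
print(mean_by_grade)

# corr(): correlation matrix of the numeric columns.
corr = features[['hours', 'score', 'grade_code']].corr()
print(corr)
```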
FUNCTIONS USED IN
VISUALIZATION
• df.plot() -> Quick plot of data
• df.hist() -> Histogram of numeric columns
• df.boxplot() -> Boxplot for distribution and outliers
• df.plot(kind='bar') -> Bar chart
• df.plot(kind='line') -> Line chart for time series
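A minimal plotting sketch, assuming matplotlib is installed (pandas uses it for df.plot()). The data, chart title and output file name are made up; the Agg backend is chosen only so the script runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import pandas as pd

# Hypothetical marks for four students.
df = pd.DataFrame({'Marks': [90, 85, 70, 60]}, index=['A', 'B', 'C', 'D'])

# df.plot() returns a matplotlib Axes object that can be customised further.
ax = df.plot(kind='bar', title='Marks by student')
ax.set_ylabel('Marks')

# Save the chart to a temporary folder instead of showing it on screen.
out_path = os.path.join(tempfile.gettempdir(), 'marks_bar.png')
ax.figure.savefig(out_path)
print(out_path)
```

In an interactive session, calling matplotlib.pyplot.show() instead of saving would display the chart.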
