Business
Analytics
Eng. Tasneem Mohammed Helil
Groupby
groupby() splits a DataFrame into groups based on one or more
columns, applies a function to each group, and combines the results.
Use the groupby() method on the Store column and calculate the total sales by
summing the Amount column
import pandas as pd
df_sales = pd.DataFrame({
    'Store': ['Store A', 'Store A', 'Store B', 'Store C', 'Store B', 'Store C'],
    'Amount': [100, 200, 150, 300, 250, 100]
})
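A minimal sketch of the task above, assuming pandas is imported as pd. The variable name total_sales is only illustrative and is not part of the original slides.

```python
import pandas as pd

df_sales = pd.DataFrame({
    'Store': ['Store A', 'Store A', 'Store B', 'Store C', 'Store B', 'Store C'],
    'Amount': [100, 200, 150, 300, 250, 100],
})

# Split the rows by Store, then sum the Amount column inside each group.
# The result is a Series indexed by store name.
total_sales = df_sales.groupby('Store')['Amount'].sum()
print(total_sales)
```

Selecting the Amount column before calling sum() keeps the aggregation limited to the numeric column of interest.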
Exercise
You have a DataFrame that contains grades for different subjects. Each
row represents a student's grade in a specific subject.
Find the average grade for each subject
df_grades = pd.DataFrame({
'Subject': ['Math', 'Science', 'Math', 'English', 'Science', 'English'],
'Grade': [85, 90, 78, 88, 92, 81]
})
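One possible approach to the grades exercise, shown as a sketch rather than the only solution. The name avg_grades is an assumption made for illustration.

```python
import pandas as pd

df_grades = pd.DataFrame({
    'Subject': ['Math', 'Science', 'Math', 'English', 'Science', 'English'],
    'Grade': [85, 90, 78, 88, 92, 81],
})

# Group the rows by Subject and take the mean of the Grade column per group.
avg_grades = df_grades.groupby('Subject')['Grade'].mean()
print(avg_grades)
```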
Try reading the data from the gapminder-FiveYearData Excel file
Merge
Merging combines two DataFrames based on a key or common column(s).
This is similar to SQL joins, where you combine data from different
sources based on a shared identifier, which can span more than one column.
df_employees = pd.DataFrame({'EmployeeID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df_salaries = pd.DataFrame({'EmployeeID': [1, 2], 'Salary': [50000, 55000]})
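The two DataFrames above could be merged on EmployeeID as sketched below. The names inner and left are illustrative. The how parameter selects the join type: 'inner' keeps only keys present in both tables, while 'left' keeps every row of the left table.

```python
import pandas as pd

df_employees = pd.DataFrame({'EmployeeID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df_salaries = pd.DataFrame({'EmployeeID': [1, 2], 'Salary': [50000, 55000]})

# Inner merge: only employees that also appear in df_salaries.
inner = pd.merge(df_employees, df_salaries, on='EmployeeID', how='inner')

# Left merge: every employee is kept; a missing salary becomes NaN.
left = pd.merge(df_employees, df_salaries, on='EmployeeID', how='left')

print(inner)
print(left)
```

In the left merge, Charlie has no matching salary row, so the Salary value for that row is NaN.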
Exercise
Create a table that includes only employees who have a valid department,
using these DataFrames:
df_employees = pd.DataFrame({
'EmployeeID': [101, 102, 103, 104],
'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'DepartmentID': [1, 2, 1, 4]
})
df_departments = pd.DataFrame({
'DepartmentID': [1, 2, 3, 5],
'DepartmentName': ['HR', 'IT', 'Sales', 'Marketing']
})
Try it with an outer merge, then drop
the rows that contain NaN
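A sketch of both routes for this exercise, assuming "valid department" means a DepartmentID that exists in df_departments. The names valid and outer_clean are illustrative.

```python
import pandas as pd

df_employees = pd.DataFrame({
    'EmployeeID': [101, 102, 103, 104],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'DepartmentID': [1, 2, 1, 4],
})
df_departments = pd.DataFrame({
    'DepartmentID': [1, 2, 3, 5],
    'DepartmentName': ['HR', 'IT', 'Sales', 'Marketing'],
})

# Route 1: an inner merge keeps only DepartmentIDs found in both tables.
valid = pd.merge(df_employees, df_departments, on='DepartmentID', how='inner')

# Route 2: an outer merge keeps everything, then dropna() removes rows
# where either side had no match (those rows contain NaN).
outer = pd.merge(df_employees, df_departments, on='DepartmentID', how='outer')
outer_clean = outer.dropna()

print(valid)
print(outer_clean)
```

Both routes leave the same employees. Note that after the outer merge, EmployeeID is stored as a float column because NaN was introduced before the rows were dropped.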
Join
The join() method in pandas combines two DataFrames based
on their index (row labels) rather than on columns. It is useful when
both DataFrames already share the same index.
Exercise
Ensure all students in the df_students DataFrame are included, even if
there is no corresponding score in df_scores
df_students = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [20, 21, 22, 23]},
    index=['S001', 'S002', 'S003', 'S004'])
df_scores = pd.DataFrame(
    {'Math': [85, 90, 78, 88], 'Science': [92, 80, 85, 91]},
    index=['S001', 'S002', 'S003', 'S005'])
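One way to approach this exercise is a left join on the index, sketched below. The name students_with_scores is an assumption for illustration.

```python
import pandas as pd

df_students = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [20, 21, 22, 23]},
    index=['S001', 'S002', 'S003', 'S004'])
df_scores = pd.DataFrame(
    {'Math': [85, 90, 78, 88], 'Science': [92, 80, 85, 91]},
    index=['S001', 'S002', 'S003', 'S005'])

# join() aligns rows on the index. how='left' keeps every row of df_students;
# students without a matching index in df_scores get NaN scores.
students_with_scores = df_students.join(df_scores, how='left')
print(students_with_scores)
```

Student S004 appears with NaN in Math and Science, and S005 is dropped because it exists only in df_scores.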
Concat
Concatenation in pandas stacks two or more DataFrames along
a chosen axis: vertically (axis=0, one on top of
another) or horizontally (axis=1, side by side).
df_jan = pd.DataFrame({'Store': ['A', 'B'], 'Sales': [200, 150]})
df_feb = pd.DataFrame({'Store': ['A', 'B'], 'Sales': [250, 100]})
df_combined = pd.concat([df_jan, df_feb], ignore_index=True)
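The example above stacks rows. As a complementary sketch, the same data can be placed side by side with axis=1. Setting Store as the index and the 'Jan'/'Feb' labels passed through keys are choices made here for illustration, not part of the original slides.

```python
import pandas as pd

df_jan = pd.DataFrame({'Store': ['A', 'B'], 'Sales': [200, 150]})
df_feb = pd.DataFrame({'Store': ['A', 'B'], 'Sales': [250, 100]})

# Vertical: rows of df_feb go under rows of df_jan; ignore_index renumbers 0..3.
stacked = pd.concat([df_jan, df_feb], ignore_index=True)

# Horizontal: align on the Store index and label each block by month.
side_by_side = pd.concat(
    [df_jan.set_index('Store'), df_feb.set_index('Store')],
    axis=1,
    keys=['Jan', 'Feb'],
)

print(stacked)
print(side_by_side)
```

With keys, the resulting columns form a two-level index such as ('Jan', 'Sales'), which keeps the two Sales columns distinguishable.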
Pivot
Pivoting reshapes data by turning unique values from one column into
separate columns. It’s useful for reorganizing data for easy analysis or
visualization.
Determine each product's sales in each month
df_sales = pd.DataFrame({
    'Product': ['A', 'A', 'B', 'B'],
    'Region': ['North', 'South', 'North', 'South'],
    'Month': ['Jan', 'Jan', 'Feb', 'Feb'],
    'Sales': [100, 150, 200, 250]
})
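A sketch of the task above. Because each product appears twice in the same month (once per region), plain pivot() would raise an error on the duplicate Product/Month pairs, so this sketch uses pivot_table() with a sum instead. Summing across regions and filling empty cells with 0 are assumptions about what "sales in each month" should mean here.

```python
import pandas as pd

df_sales = pd.DataFrame({
    'Product': ['A', 'A', 'B', 'B'],
    'Region': ['North', 'South', 'North', 'South'],
    'Month': ['Jan', 'Jan', 'Feb', 'Feb'],
    'Sales': [100, 150, 200, 250],
})

# Rows become products, columns become months, and duplicate
# Product/Month pairs are combined by summing their Sales.
sales_by_month = df_sales.pivot_table(
    index='Product', columns='Month', values='Sales',
    aggfunc='sum', fill_value=0,
)
print(sales_by_month)
```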
Exercise
Transform this data to show the total sales per store on specific dates
df_sales = pd.DataFrame({
    'store': ['Store A', 'Store A', 'Store B', 'Store B', 'Store C', 'Store C'],
    'date': ['2024-10-01', '2024-10-02', '2024-10-01', '2024-10-02', '2024-10-01', '2024-10-02'],
    'product': ['A', 'A', 'B', 'B', 'A', 'B'],
    'sales': [100, 120, 150, 160, 90, 110]
})
Identify the Index Column: In this case, we want each store to be the index, so store will be the index.
Identify the Columns to Expand: We want each date to become a column in the resulting DataFrame.
Values to Fill: We want to fill each cell with the sales amount for each store on each date.
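The three steps above can be sketched as follows. The name sales_wide is illustrative. Plain pivot() works here because every store/date pair occurs exactly once in this data.

```python
import pandas as pd

df_sales = pd.DataFrame({
    'store': ['Store A', 'Store A', 'Store B', 'Store B', 'Store C', 'Store C'],
    'date': ['2024-10-01', '2024-10-02', '2024-10-01', '2024-10-02', '2024-10-01', '2024-10-02'],
    'product': ['A', 'A', 'B', 'B', 'A', 'B'],
    'sales': [100, 120, 150, 160, 90, 110],
})

# index   -> one row per store
# columns -> one column per date
# values  -> the sales figure placed in each store/date cell
sales_wide = df_sales.pivot(index='store', columns='date', values='sales')
print(sales_wide)
```

If a store could have several rows for the same date, pivot_table() with aggfunc='sum' would be the safer choice, since it totals the duplicates instead of raising an error.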
Thank You

Lab 4 lab4 file for python for 4 grade.pptx
