Lab 4 lab4 file for python for 4 grade.pptx

Business Analytics
Eng. Tasneem Mohammed Helil

Groupby
groupby() is used to split a DataFrame into groups based on the values of one or more columns and then apply a function (for example, sum or mean) to each group.
Use the groupby() method on the Store column and calculate the total sales by summing the Amount column:
import pandas as pd
df_sales = pd.DataFrame({'Store': ['Store A', 'Store A', 'Store B', 'Store C', 'Store B', 'Store C'],
                         'Amount': [100, 200, 150, 300, 250, 100]})
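A minimal sketch of the groupby step described above, assuming pandas is installed. The name total_sales is illustrative and not taken from the original slide.

```python
import pandas as pd

# Same sample data as on the slide
df_sales = pd.DataFrame({
    'Store': ['Store A', 'Store A', 'Store B', 'Store C', 'Store B', 'Store C'],
    'Amount': [100, 200, 150, 300, 250, 100],
})

# Split the rows by Store, then sum the Amount column within each group
total_sales = df_sales.groupby('Store')['Amount'].sum()
print(total_sales)
```

The result is a Series indexed by Store; calling .reset_index() on it turns it back into a two-column DataFrame.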

Exercise
You have a DataFrame that contains grades for different subjects. Each
row represents a student's grade in a specific subject.
Find the average grade for each subject.
df_grades = pd.DataFrame({
    'Subject': ['Math', 'Science', 'Math', 'English', 'Science', 'English'],
    'Grade': [85, 90, 78, 88, 92, 81]
})
Also try reading the data from the gapminder-FiveYearData Excel file (for example, with pd.read_excel()).
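One possible solution to the grades exercise, sketched under the assumption that the DataFrame is built exactly as given. It follows the same groupby pattern as the sales example, but aggregates with mean() instead of sum(). The gapminder file is not included here, so that part is not shown.

```python
import pandas as pd

df_grades = pd.DataFrame({
    'Subject': ['Math', 'Science', 'Math', 'English', 'Science', 'English'],
    'Grade': [85, 90, 78, 88, 92, 81],
})

# Group the rows by Subject and average the Grade column within each group
avg_grades = df_grades.groupby('Subject')['Grade'].mean()
print(avg_grades)
```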

Merge
Merging combines two DataFrames based on one or more key (common) columns.
This is similar to SQL joins: data from different sources is combined based on
a shared identifier, which can be more than one column/feature.
df_employees = pd.DataFrame({'EmployeeID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df_salaries = pd.DataFrame({'EmployeeID': [1, 2], 'Salary': [50000, 55000]})
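A short sketch of merging the two DataFrames above on EmployeeID, contrasting the default inner merge with a left merge. The variable names inner and left are illustrative.

```python
import pandas as pd

df_employees = pd.DataFrame({'EmployeeID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df_salaries = pd.DataFrame({'EmployeeID': [1, 2], 'Salary': [50000, 55000]})

# Inner merge (the default): keep only EmployeeIDs present in both tables
inner = pd.merge(df_employees, df_salaries, on='EmployeeID')

# Left merge: keep every employee; Charlie gets NaN because he has no salary row
left = pd.merge(df_employees, df_salaries, on='EmployeeID', how='left')
print(inner)
print(left)
```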

Exercise
Create a table that includes only employees who have a valid department,
using these DataFrames:
df_employees = pd.DataFrame({
    'EmployeeID': [101, 102, 103, 104],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'DepartmentID': [1, 2, 1, 4]
})
df_departments = pd.DataFrame({
    'DepartmentID': [1, 2, 3, 5],
    'DepartmentName': ['HR', 'IT', 'Sales', 'Marketing']})
Try it again with an outer merge, then drop
the rows that contain NaN.
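A sketch of both approaches to the exercise, assuming the DataFrames exactly as given. An inner merge keeps only matching DepartmentIDs directly. An outer merge keeps everything and relies on dropna() to discard the unmatched rows. Row order can differ between the two results, so compare them by content.

```python
import pandas as pd

df_employees = pd.DataFrame({
    'EmployeeID': [101, 102, 103, 104],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'DepartmentID': [1, 2, 1, 4],
})
df_departments = pd.DataFrame({
    'DepartmentID': [1, 2, 3, 5],
    'DepartmentName': ['HR', 'IT', 'Sales', 'Marketing'],
})

# Approach 1: an inner merge keeps only employees whose DepartmentID
# also appears in df_departments (David's department 4 does not)
valid_inner = pd.merge(df_employees, df_departments, on='DepartmentID', how='inner')

# Approach 2: an outer merge keeps every row from both sides and fills the
# gaps with NaN; dropna() then removes every row that has a missing value
outer = pd.merge(df_employees, df_departments, on='DepartmentID', how='outer')
valid_outer = outer.dropna()
print(valid_inner)
print(valid_outer)
```

Because the outer merge introduces NaN into EmployeeID, that column becomes float; valid_outer['EmployeeID'].astype(int) converts it back after the NaN rows are gone.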

join
The join() method in pandas combines two DataFrames based
on their index (row labels) rather than on columns. It is useful when both
DataFrames already share the same index. By default, join() performs a left join.
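A minimal illustration of index-based joining. The prices and stock DataFrames and their product-code labels are hypothetical example data, not from the slides.

```python
import pandas as pd

# Hypothetical example data: both DataFrames use product codes as the index
prices = pd.DataFrame({'Price': [10, 20, 30]}, index=['P1', 'P2', 'P3'])
stock = pd.DataFrame({'Qty': [5, 0]}, index=['P1', 'P2'])

# join() aligns rows by index label; the default how='left' keeps all rows of prices
left_joined = prices.join(stock)

# how='inner' keeps only the index labels found in both DataFrames
inner_joined = prices.join(stock, how='inner')
print(left_joined)
print(inner_joined)
```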

Exercise
Ensure all students in the df_students DataFrame are included, even if
there is no corresponding score in df_scores.
df_students = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [20, 21, 22, 23]},
                           index=['S001', 'S002', 'S003', 'S004'])
df_scores = pd.DataFrame({'Math': [85, 90, 78, 88], 'Science': [92, 80, 85, 91]},
                         index=['S001', 'S002', 'S003', 'S005'])
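One possible solution, sketched with the exercise data: a left join keeps every student ID from df_students. The name result is illustrative; how='left' is written out explicitly even though it is the default.

```python
import pandas as pd

df_students = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                            'Age': [20, 21, 22, 23]},
                           index=['S001', 'S002', 'S003', 'S004'])
df_scores = pd.DataFrame({'Math': [85, 90, 78, 88],
                          'Science': [92, 80, 85, 91]},
                         index=['S001', 'S002', 'S003', 'S005'])

# Left join on the index: S004 (David) has no scores, so Math and Science
# are NaN for him; S005 exists only in df_scores and is left out
result = df_students.join(df_scores, how='left')
print(result)
```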

Concat
Concatenation in pandas stacks DataFrames along a particular axis:
either vertically (axis=0, the default, one on top of another) or
horizontally (axis=1, side by side).
df_jan = pd.DataFrame({'Store': ['A', 'B'], 'Sales': [200, 150]})
df_feb = pd.DataFrame({'Store': ['A', 'B'], 'Sales': [250, 100]})
df_combined = pd.concat([df_jan, df_feb], ignore_index=True)
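The slide code shows only vertical stacking. The sketch below repeats it and adds a horizontal (axis=1) variant. For that variant it assumes each month's Sales should sit side by side, aligned on Store; the names jan, feb and df_side_by_side are illustrative.

```python
import pandas as pd

df_jan = pd.DataFrame({'Store': ['A', 'B'], 'Sales': [200, 150]})
df_feb = pd.DataFrame({'Store': ['A', 'B'], 'Sales': [250, 100]})

# Vertical (axis=0, the default): df_feb's rows go underneath df_jan's;
# ignore_index=True renumbers the rows 0..3
df_combined = pd.concat([df_jan, df_feb], ignore_index=True)

# Horizontal (axis=1): align on Store and put each month's Sales side by side.
# The Series are renamed so the resulting columns are 'Jan' and 'Feb'.
jan = df_jan.set_index('Store')['Sales'].rename('Jan')
feb = df_feb.set_index('Store')['Sales'].rename('Feb')
df_side_by_side = pd.concat([jan, feb], axis=1)
print(df_combined)
print(df_side_by_side)
```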

Pivot
Pivoting reshapes data by turning the unique values of one column into
separate columns. It is useful for reorganizing data for easier analysis or
visualization.
Determine each product's sales in each month
df_sales = pd.DataFrame({ 'Product': ['A', 'A', 'B', 'B'], 'Region': ['North', 'South',
'North', 'South'], 'Month': ['Jan', 'Jan', 'Feb', 'Feb'], 'Sales': [100, 150, 200, 250] })
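A sketch of getting each product's sales per month from the data above. Product A has two January rows (North and South), so the (Product, Month) pairs are not unique and plain pivot() raises a ValueError. This sketch therefore swaps in pivot_table() with a sum, on the assumption that regional sales should be added together.

```python
import pandas as pd

df_sales = pd.DataFrame({
    'Product': ['A', 'A', 'B', 'B'],
    'Region': ['North', 'South', 'North', 'South'],
    'Month': ['Jan', 'Jan', 'Feb', 'Feb'],
    'Sales': [100, 150, 200, 250],
})

# Rows = products, columns = months, cells = summed Sales;
# months with no sales for a product are left as NaN
monthly = df_sales.pivot_table(index='Product', columns='Month',
                               values='Sales', aggfunc='sum')
print(monthly)
```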

Exercise
Transform this data to show the total sales per store on specific dates:
df_sales = pd.DataFrame({'store': ['Store A', 'Store A', 'Store B', 'Store B', 'Store C', 'Store C'],
'date': ['2024-10-01', '2024-10-02', '2024-10-01', '2024-10-02', '2024-10-01', '2024-10-02'],
'product': ['A', 'A', 'B', 'B', 'A', 'B'],
'sales': [100, 120, 150, 160, 90, 110]})
Identify the Index Column: we want one row per store, so store will be the index.
Identify the Columns to Expand: each date should become a column in the resulting DataFrame.
Values to Fill: each cell holds the sales amount for that store on that date (each store/date pair has exactly one product row).
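The three steps above map directly onto the index, columns and values arguments of pivot(). This is a minimal sketch; the name sales_by_date is illustrative. If the data contained repeated store/date pairs, pivot_table(..., aggfunc='sum') would be needed instead.

```python
import pandas as pd

df_sales = pd.DataFrame({
    'store': ['Store A', 'Store A', 'Store B', 'Store B', 'Store C', 'Store C'],
    'date': ['2024-10-01', '2024-10-02', '2024-10-01', '2024-10-02', '2024-10-01', '2024-10-02'],
    'product': ['A', 'A', 'B', 'B', 'A', 'B'],
    'sales': [100, 120, 150, 160, 90, 110],
})

# index: one row per store; columns: one column per date; values: sales.
# Each (store, date) pair occurs exactly once, so plain pivot() works.
sales_by_date = df_sales.pivot(index='store', columns='date', values='sales')
print(sales_by_date)
```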
