Unit 3_Numpy_VP.pptx
1. Unit 3: Basics of Numpy
21BCA2T452 : Python Programming
Prof. Vishnu Priya P M
Assistant Professor Dept. of Computer Science
Kristu Jayanti College, Autonomous
(Reaccredited A++ Grade by NAAC with CGPA 3.78/4)
Bengaluru – 560077, India
2. NUMPY BASICS: ARRAYS AND VECTORIZED COMPUTATION
NumPy (Numerical Python) is a fundamental library in Python for numerical and scientific
computing. It provides support for arrays (multi-dimensional, homogeneous data structures)
and a wide range of mathematical functions to perform vectorized computations efficiently.
Installing NumPy
Before using NumPy, you need to make sure it's installed. You can install it using pip:
pip install numpy
VISHNU PRIYA P M 2
3. Importing NumPy
To use NumPy in your Python code, you should import it:
import numpy as np
By convention, it's common to import NumPy as np for brevity.
Creating NumPy Arrays
You can create NumPy arrays using various methods:
1. From Python Lists:
arr = np.array([1, 2, 3, 4, 5])
2. Using NumPy Functions:
zeros_arr = np.zeros(5) # Creates an array of zeros with 5 elements
ones_arr = np.ones(3) # Creates an array of ones with 3 elements
rand_arr = np.random.rand(3, 3) # Creates a 3x3 array with random values between 0 and 1
4. 3. Using NumPy's Range Function:
range_arr = np.arange(0, 10, 2) # Creates an array with values [0, 2, 4, 6, 8]
5. BASIC ARRAY OPERATIONS
Once you have NumPy arrays, you can perform various operations on them:
1. Element-wise Operations:
NumPy allows you to perform element-wise operations, like addition, subtraction, multiplication,
and division:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = a + b # Element-wise addition: [5, 7, 9]
d = a * b # Element-wise multiplication: [4, 10, 18]
6. 2. Indexing and Slicing:
You can access individual elements and slices of NumPy arrays using indexing and slicing:
arr = np.array([0, 1, 2, 3, 4, 5])
element = arr[2] # Access element at index 2 (value: 2)
sub_array = arr[2:5] # Slice from index 2 to 4 (values: [2, 3, 4])
3. Array Shape and Reshaping:
You can check and change the shape of NumPy arrays:
arr = np.array([[1, 2, 3], [4, 5, 6]])
shape = arr.shape # Get the shape (2, 3)
reshaped = arr.reshape(3, 2) # Reshape the array to (3, 2)
4. Aggregation Functions:
NumPy provides functions to compute statistics on arrays:
arr = np.array([1, 2, 3, 4, 5])
mean = np.mean(arr) # Calculate the mean (average)
max_val = np.max(arr) # Find the maximum value
min_val = np.min(arr) # Find the minimum value
7. VECTORIZED COMPUTATION
Vectorized computation in Python refers to performing operations on entire arrays or sequences
of data without the need for explicit loops. This approach leverages highly optimized, low-level
code to achieve faster and more efficient computations. The primary library for vectorized
computation in Python is NumPy.
Traditional Loop-Based Computation
In traditional Python programming, you might use explicit loops to perform operations on arrays
or lists. For example:
# Using loops to add two lists element-wise
list1 = [1, 2, 3]
list2 = [4, 5, 6]
result = []
for i in range(len(list1)):
    result.append(list1[i] + list2[i])
# Result: [5, 7, 9]
8. Vectorized Computation with NumPy
NumPy allows you to perform operations on entire arrays, making code more concise and efficient. Here's
how you can achieve the same result using NumPy:
import numpy as np
# Using NumPy for element-wise addition
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result = arr1 + arr2
# Result: array([5, 7, 9])
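As a rough check that the loop-based and vectorized approaches agree, the sketch below runs both on a larger input. The input size and the variable names are illustrative assumptions, not part of the original slides.

```python
import numpy as np

# Illustrative input: two sequences of 1,000 integers (size chosen arbitrarily)
list1 = list(range(1000))
list2 = list(range(1000, 2000))

# Loop-based element-wise addition
loop_result = []
for a, b in zip(list1, list2):
    loop_result.append(a + b)

# Vectorized element-wise addition: one expression, no explicit loop
vec_result = np.array(list1) + np.array(list2)

# Both approaches produce the same values
same = np.array_equal(vec_result, np.array(loop_result))
print(same)
```

The vectorized form does the looping inside NumPy's compiled code, which is where the speed advantage on large arrays comes from.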
9. INTRODUCTION TO PANDAS DATA STRUCTURES
Pandas is a popular Python library for data manipulation and analysis. It provides two primary data structures:
the DataFrame and the Series. These data structures are designed to handle structured data, making it easier
to work with datasets in a tabular format.
DataFrame:
A DataFrame is a 2-dimensional, labeled data structure that resembles a spreadsheet or SQL table.
It consists of rows and columns, where each column can have a different data type (e.g., integers, floats,
strings, or even custom data types).
You can think of a DataFrame as a collection of Series objects, where each Series is a column.
DataFrames are highly versatile and are used for a wide range of data analysis tasks, including data
cleaning, exploration, and transformation.
10. Here's a basic example of how to create a DataFrame using Pandas:
import pandas as pd
# Creating a DataFrame from a dictionary of data
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)
# Displaying the DataFrame
print(df)
11. Series:
A Series is a one-dimensional labeled array that can hold data of any data type.
It is like a column in a DataFrame or a single variable in statistics.
Series objects are commonly used for time series data, as well as other one-dimensional data.
Key characteristics of a Pandas Series:
Single Data Type: Like a NumPy array, a Pandas Series has one data type (dtype) for all of its values. For
example, if you create a Series with integer values, all values within that Series will be integers. A Series built
from mixed values falls back to the general object dtype.
Labeled Data: Series have two parts: the data itself and an associated index. The index provides labels or
names for each data point in the Series. By default, Series have a numeric index starting from 0, but you can
specify custom labels if needed.
Size and Shape: A Series has a size (the number of elements) and shape (1-dimensional) but does not have
columns or rows like a DataFrame.
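The characteristics above can be seen in a short sketch. The names and scores used here are made-up sample values.

```python
import pandas as pd

# Custom labels instead of the default 0, 1, 2, ... index
scores = pd.Series([90, 85, 77], index=['alice', 'bob', 'charlie'])

print(scores['bob'])              # label-based access to one value
print(scores.index)               # the index holding the labels
print(scores.dtype)               # one dtype for the whole Series
print(scores.size, scores.shape)  # number of elements and 1-D shape

# Mixed values are stored under the general 'object' dtype
mixed = pd.Series([1, 'two', 3.0])
print(mixed.dtype)
```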
12. import pandas as pd
# Create a Series from a list
data = [10, 20, 30, 40, 50]
series = pd.Series(data)
# Display the Series
print(series)
0 10
1 20
2 30
3 40
4 50
dtype: int64
13. Some common tasks you can perform with Pandas:
Data Loading: Pandas can read data from various sources, including CSV files, Excel spreadsheets, SQL
databases, and more.
Data Cleaning: You can clean and preprocess data by handling missing values, removing duplicates, and
transforming data types.
Data Selection: Easily select specific rows and columns of interest using various indexing techniques.
Data Aggregation: Perform groupby operations, calculate statistics, and aggregate data based on specific
criteria.
Data Visualization: You can use Pandas in conjunction with visualization libraries like Matplotlib and
Seaborn to create informative plots and charts.
14. A DataFrame in Python typically refers to a two-dimensional, size-mutable, and potentially heterogeneous
tabular data structure provided by the popular library called Pandas. It is a fundamental data structure for data
manipulation and analysis in Python.
Here's how you can work with DataFrames in Python using Pandas:
1. Import Pandas:
First, you need to import the Pandas library.
import pandas as pd
2. Creating a DataFrame:
You can create a DataFrame in several ways. Here are a few
common methods:
From a dictionary:
data = {'Column1': [value1, value2, ...],
'Column2': [value1, value2, ...]}
df = pd.DataFrame(data)
15. • From a list of lists:
data = [[value1, value2],
[value3, value4]]
df = pd.DataFrame(data, columns=['Column1', 'Column2'])
• From a CSV file:
df = pd.read_csv('file.csv')
3. Viewing Data:
You can use various methods to view and explore your DataFrame:
df.head(): Displays the first few rows of the DataFrame.
df.tail(): Displays the last few rows of the DataFrame.
df.shape: Returns the number of rows and columns.
df.columns: Returns the column names.
df.info(): Provides information about the DataFrame, including data types and non-null counts.
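A minimal sketch of these viewing methods on a concrete DataFrame, reusing the sample names, ages, and cities from the earlier example:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

print(df.head(2))        # first 2 rows (the default is 5)
print(df.tail(1))        # last row
print(df.shape)          # (rows, columns)
print(list(df.columns))  # column names as a plain list
df.info()                # prints dtypes and non-null counts
```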
16. 4. Selecting Data:
You can select specific columns or rows from a DataFrame using indexing or filtering. For example:
df['Column1'] # Select a specific column
df[['Column1', 'Column2']] # Select multiple columns
df[df['Column1'] > 5] # Filter rows based on a condition
5. Modifying Data:
You can modify the DataFrame by adding or modifying columns, updating values, or appending rows. For
example:
df['NewColumn'] = [new_value1, new_value2, ...] # Add a new column
df.at[index, 'Column1'] = new_value # Update a specific value
df = pd.concat([df, pd.DataFrame([{'Column1': value1, 'Column2': value2}])], ignore_index=True) # Append a new row (DataFrame.append() was removed in pandas 2.0)
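The selection and modification steps can be tried on a small concrete DataFrame. The sketch below uses made-up sample values and appends the row with pd.concat, which works in current pandas releases.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Select a column and filter rows by a condition
ages = df['Age']
over_26 = df[df['Age'] > 26]

# Add a column, then update one cell by row label and column name
df['City'] = ['New York', 'San Francisco']
df.at[0, 'Age'] = 26

# Append a row by concatenating a one-row DataFrame
new_row = pd.DataFrame([{'Name': 'Charlie', 'Age': 35, 'City': 'Los Angeles'}])
df = pd.concat([df, new_row], ignore_index=True)
print(df)
```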
17. 6. Data Analysis:
Pandas provides various functions for data analysis, such
as describe(), groupby(), agg(), and more.
7. Saving Data:
You can save the DataFrame to a CSV file or other formats:
df.to_csv('output.csv', index=False)
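The analysis functions named above (describe(), groupby(), agg()) and the CSV export can be sketched together. The department names and sales figures are hypothetical, and the CSV is written to an in-memory buffer rather than a file so the example leaves nothing on disk.

```python
import io
import pandas as pd

df = pd.DataFrame({'Dept': ['East', 'East', 'West', 'West'],
                   'Sales': [10, 20, 30, 50]})

# Summary statistics: count, mean, std, min, quartiles, max
print(df['Sales'].describe())

# Group by department and compute two aggregates per group
summary = df.groupby('Dept')['Sales'].agg(['sum', 'mean'])
print(summary)

# Write CSV text to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
summary.to_csv(buffer)
csv_text = buffer.getvalue()
print(csv_text)
```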
18. INDEX OBJECTS-INDEXING, SELECTION, AND FILTERING
In Pandas, the Index object is a fundamental component of both Series and DataFrame data
structures. It provides the labels or names for the rows or columns of your data. You can use
indexing, selection, and filtering techniques with these indexes to access specific data points or
subsets of your data. Here's how you can work with index objects in Pandas:
1. Indexing:
Indexing allows you to access specific elements or rows in your data using labels. You can use .loc[] for label-
based indexing and .iloc[] for integer-based indexing.
• Label-based indexing:
df.loc['label'] # Access a specific row by its label
df.loc['label', 'column_name'] # Access a specific element by label and column name
19. • Integer-based indexing:
df.iloc[0] # Access the first row
df.iloc[0, 1] # Access an element by row and column index
2. Selection:
You can use various methods to select specific data based on conditions or criteria.
• Select rows based on a condition:
df[df['Column'] > 5] # Select rows where 'Column' is greater than 5
• Select rows by multiple conditions:
df[(df['Column1'] > 5) & (df['Column2'] < 10)] # Rows where 'Column1' > 5 and 'Column2' < 10
20. 3. Filtering:
Filtering allows you to create a boolean mask based on a condition and then apply that mask to your
DataFrame to select rows meeting the condition.
Create a boolean mask:
condition = df['Column'] > 5
Apply the mask to the DataFrame:
filtered_df = df[condition]
4. Setting a New Index:
You can set a specific column as the index of your DataFrame using the .set_index() method.
df.set_index('Column_Name', inplace=True)
21. 5. Resetting the Index:
If you've set a column as the index and want to revert to the default integer-based index, you can use the
.reset_index() method.
df.reset_index(inplace=True)
6. Multi-level Indexing:
You can create DataFrames with multi-level indexes, allowing you to work with more complex hierarchical data
structures.
df.set_index(['Index1', 'Index2'], inplace=True)
Index objects in Pandas are versatile and powerful for working with data because they enable you to
access and manipulate your data in various ways, whether it's for data retrieval, filtering, or
restructuring.
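The indexing, filtering, and index-setting steps above can be combined in one small sketch. The column names and values are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'Score': [8, 4, 9]})

# Use the 'Name' column as the row labels
df = df.set_index('Name')

by_label = df.loc['Bob', 'Age']   # label-based: row 'Bob', column 'Age'
by_position = df.iloc[0, 1]       # position-based: first row, second column

# Boolean mask combining two conditions
mask = (df['Age'] > 26) & (df['Score'] > 5)
filtered = df[mask]

# Restore the default integer index; 'Name' becomes a column again
restored = df.reset_index()

print(by_label, by_position)
print(filtered)
print(restored)
```

Assigning the result of set_index() or reset_index() back to a variable, as done here, is an alternative to passing inplace=True.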
22. ARITHMETIC AND DATA ALIGNMENT IN PANDAS
Arithmetic and data alignment in Pandas refer to how mathematical operations are performed between Series and
DataFrames when they have different shapes or indices. Pandas automatically aligns data based on the labels of
the objects involved in the operation, which ensures that the result of the operation maintains data integrity and is
aligned correctly. Here are some key aspects of arithmetic and data alignment in Pandas:
1. Automatic Alignment:
When you perform mathematical operations (e.g., addition, subtraction, multiplication, division) between two
Series or DataFrames, Pandas aligns the data based on their labels (index or column names). It aligns the data
based on common labels and performs the operation only on matching labels.
series1 = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
series2 = pd.Series([4, 5, 6], index=['B', 'C', 'D'])
result = series1 + series2
In this example, the result Series will have NaN values for the 'A' and 'D' labels because those labels don't match
between series1 and series2.
23. 2. Missing Data (NaN):
When labels don't match, Pandas fills in the result with NaN (Not-a-Number) to indicate missing values.
3. DataFrame Alignment:
The same principles apply to DataFrames when performing operations between them. The alignment occurs both
for rows (based on the index) and columns (based on column names).
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['X', 'Y'])
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]}, index=['Y', 'Z'])
result = df1 + df2
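In this case, result will have NaN values throughout columns 'A' and 'C' because those columns don't exist in both
df1 and df2. Column 'B' has a value only for row 'Y' (4 + 5 = 9), the one row label the two DataFrames share; rows
'X' and 'Z' are NaN.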
4. Handling Missing Data:
You can use methods like .fillna() to replace NaN values with a specific value or use .dropna() to remove rows or
columns with missing data.
result_filled = result.fillna(0) # Replace NaN with 0
result_dropped = result.dropna() # Remove rows containing any NaN (pass axis=1 to drop columns instead)
24. 5. Alignment with Broadcasting:
Pandas allows you to perform operations between a Series and a scalar value, and it broadcasts the scalar to
match the shape of the Series.
series = pd.Series([1, 2, 3])
scalar = 2
result = series * scalar
In this example, result will be a Series with values [2, 4, 6].
Automatic alignment in Pandas is a powerful feature that simplifies data manipulation and allows you to work
with datasets of different shapes without needing to manually align them. It ensures that operations are
performed in a way that maintains the integrity and structure of your data.
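To make the alignment behaviour concrete, the sketch below prints the result of the Series example and shows one way to avoid NaN for unmatched labels. The use of the add() method with fill_value is an addition to the slides, shown as one option among several.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
series2 = pd.Series([4, 5, 6], index=['B', 'C', 'D'])

# Labels are matched before adding: only 'B' and 'C' exist in both
aligned = series1 + series2
print(aligned)

# add() with fill_value treats a label missing from one side as 0
filled = series1.add(series2, fill_value=0)
print(filled)
```

Note that the aligned result is of float type, because NaN is a floating-point value.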
25. ARITHMETIC AND DATA ALIGNMENT IN NUMPY
Unlike Pandas, NumPy does not align data by label. Arithmetic between arrays is positional and element-wise, and
NumPy is primarily focused on numerical computations with homogeneous arrays (arrays of the same data type).
Here's how arithmetic works when array shapes differ:
Broadcasting:
When two arrays have different but compatible shapes, NumPy broadcasts the smaller array across the larger
one so that the operation can still be applied element-wise. If the shapes are not compatible, NumPy raises a
ValueError instead of padding or truncating.
import numpy as np
arr1 = np.array([[1, 2, 3], [4, 5, 6]]) # Shape (2, 3)
arr2 = np.array([10, 20, 30]) # Shape (3,)
result = arr1 + arr2
In this example, NumPy broadcasts arr2 across each row of arr1, resulting in [[11, 22, 33], [14, 25, 36]]. Adding
np.array([1, 2, 3]) and np.array([4, 5]) would fail, because shapes (3,) and (2,) cannot be broadcast together.
26. Broadcasting Rules:
NumPy follows specific rules when broadcasting arrays:
If the arrays have a different number of dimensions, pad the smaller shape with ones on the left side.
Compare the shapes element-wise, starting from the right. If dimensions are equal or one of them is 1, they are
compatible.
If the dimensions are incompatible, NumPy raises a "ValueError: operands could not be broadcast together" error.
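The rules above can be checked with a small sketch: one compatible pair of shapes and one incompatible pair. The array values are arbitrary.

```python
import numpy as np

column = np.array([[0], [10], [20]])  # shape (3, 1)
row = np.array([1, 2, 3])             # shape (3,), treated as (1, 3)

# (3, 1) and (1, 3) are compatible: each size-1 dimension is stretched
grid = column + row
print(grid.shape)
print(grid)

# Shapes (3,) and (2,) are incompatible: the trailing dimensions are 3 and 2
try:
    np.array([1, 2, 3]) + np.array([4, 5])
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```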
Handling Missing Data:
NumPy does not insert NaN when shapes differ, the way Pandas does for unmatched labels. If you perform
operations between arrays with mismatched shapes, NumPy will either broadcast or raise an error, depending on
whether broadcasting is possible. NaN itself is available in NumPy as the floating-point value np.nan.
Element-Wise Operations:
NumPy performs arithmetic operations element-wise by default. This means that each element in the resulting
array is the result of applying the operation to the corresponding elements in the input arrays.
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result = arr1 * arr2
In this case, result will be [4, 10, 18].
27. APPLYING FUNCTIONS AND MAPPING
In NumPy, you can apply functions and perform element-wise operations on arrays using several techniques:
built-in vectorized functions, np.apply_along_axis(), and np.vectorize(), which can also be used for mapping
operations. Here's an overview of these approaches:
Vectorized Functions:
NumPy is designed to work efficiently with vectorized operations, meaning you can apply functions to entire
arrays or elements of arrays without the need for explicit loops. NumPy provides built-in functions that can be
applied element-wise to arrays.
import numpy as np
arr = np.array([1, 2, 3, 4])
# Applying a function element-wise
result = np.square(arr) # Square each element
In this example, the np.square() function is applied element-wise to the arr array.
28. np.apply_along_axis():
You can use the np.apply_along_axis() function to apply a function along a specified axis of a multi-dimensional
array. This is useful when you want to apply a function to each row or column of a 2D array.
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
# Apply a function along the rows (axis=1)
def sum_of_row(row):
    return np.sum(row)
result = np.apply_along_axis(sum_of_row, axis=1, arr=arr)
In this example, sum_of_row is applied to each row along axis=1, resulting in a new 1D array.
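For a simple reduction such as a sum, the same result is available through the axis argument of the built-in function. The sketch below compares the two, assuming the same 2x3 array as above.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# apply_along_axis calls the function once per 1-D slice along the given axis
row_sums = np.apply_along_axis(np.sum, axis=1, arr=arr)
col_sums = np.apply_along_axis(np.sum, axis=0, arr=arr)

# For built-in reductions, the axis argument gives the same result directly
print(row_sums, arr.sum(axis=1))
print(col_sums, arr.sum(axis=0))
```

np.apply_along_axis() is most useful when the per-row or per-column function has no built-in equivalent.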
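29. np.vectorize():
The np.vectorize() function allows you to create a vectorized version of a Python function, which can then be
applied element-wise to NumPy arrays. It is provided mainly for convenience: internally it still calls the Python
function once per element, so it is not as fast as a built-in vectorized function.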
import numpy as np
arr = np.array([1, 2, 3, 4])
# Define a Python function
def my_function(x):
    return x * 2
# Create a vectorized version of the function
vectorized_func = np.vectorize(my_function)
# Apply the vectorized function to the array
result = vectorized_func(arr)
This approach is useful when you have a custom function that you want to apply to an array.
30. Mapping with np.vectorize():
You can use np.vectorize() to map a function to each element of an array.
import numpy as np
arr = np.array([1, 2, 3, 4])
# Define a Python function
def my_function(x):
    return x * 2
# Create a vectorized version of the function
vectorized_func = np.vectorize(my_function)
# Map the function to each element
result = vectorized_func(arr)
This is the same technique as in the previous example: np.vectorize() maps the function over every element, and
the function body can contain more complex logic, such as conditionals.
These methods allow you to apply functions and perform mapping operations on NumPy arrays. Built-in
vectorized functions are the most efficient option when one is available.
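As an illustration of a mapping with a conditional, the sketch below applies a hypothetical function, double_if_large, through np.vectorize() and then expresses the same mapping with np.where(), which works on whole arrays. The function and its threshold are assumptions made for the example.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

def double_if_large(x):
    # Hypothetical scalar function with a branch: 0 below 3, doubled otherwise
    return 0 if x < 3 else x * 2

# np.vectorize wraps the scalar function so it can be called on an array
mapped = np.vectorize(double_if_large)(arr)

# np.where expresses the same mapping with array operations only
mapped_where = np.where(arr < 3, 0, arr * 2)

print(mapped)
print(mapped_where)
```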
31. SORTING AND RANKING
Sorting and ranking are common data manipulation operations in data analysis and are widely supported in
Python through libraries like NumPy and Pandas. These operations help organize data in a desired order or
rank elements based on specific criteria. Here's how to perform sorting and ranking in both libraries:
Sorting in NumPy:
In NumPy, you can sort NumPy arrays using the np.sort() and np.argsort() functions.
np.sort(): This function returns a new sorted array without modifying the original array.
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
sorted_arr = np.sort(arr)
32. np.argsort(): This function returns the indices that would sort the array. You can use these indices to sort the
original array.
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
indices = np.argsort(arr)
sorted_arr = arr[indices]
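One practical use of np.argsort() is reordering a second array by the values of the first. The sketch below assumes two parallel arrays of sample names and ages.

```python
import numpy as np

names = np.array(['Alice', 'Bob', 'Charlie', 'David'])
ages = np.array([25, 30, 22, 35])

order = np.argsort(ages)            # indices that sort ages in ascending order
names_by_age = names[order]         # names from youngest to oldest
ages_descending = ages[order][::-1] # reversing the sorted array gives descending order

print(names_by_age)
print(ages_descending)
```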
Sorting in Pandas:
In Pandas, you can sort Series and DataFrames using the sort_values() method. You can specify the column(s)
to sort by and the sorting order.
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 22, 35]}
df = pd.DataFrame(data)
# Sort by 'Age' column in ascending order
sorted_df = df.sort_values(by='Age', ascending=True)
33. Ranking in NumPy:
NumPy doesn't have a built-in ranking function, but you can use np.argsort() to get the ranking of elements.
Applying np.argsort() twice gives each element's position in the sorted order. Tied values receive distinct
consecutive ranks rather than a shared rank.
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
indices = np.argsort(arr)
ranked_arr = np.argsort(indices) + 1 # Add 1 to start ranking from 1 instead of 0
Ranking in Pandas:
In Pandas, you can rank data using the rank() method. You can specify the sorting order and how to handle
ties (e.g., assigning the average rank to tied values).
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 22, 30]}
df = pd.DataFrame(data)
# Rank by 'Age' column in descending order and assign average rank to tied values
df['Rank'] = df['Age'].rank(ascending=False, method='average')
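To compare how ties are handled, the sketch below ranks a small Series with several rank() methods. The sample values are illustrative and were chosen to include one tie.

```python
import pandas as pd

ages = pd.Series([25, 22, 22, 30])

average_rank = ages.rank(method='average')  # ties share the mean of their ranks
min_rank = ages.rank(method='min')          # ties share the lowest rank
dense_rank = ages.rank(method='dense')      # like 'min', but leaves no gaps afterwards
first_rank = ages.rank(method='first')      # ties ranked in order of appearance

print(pd.DataFrame({'age': ages, 'average': average_rank, 'min': min_rank,
                    'dense': dense_rank, 'first': first_rank}))
```

The 'first' method behaves most like the double np.argsort() approach, since every element receives its own rank.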
34. SUMMARIZING AND COMPUTING DESCRIPTIVE STATISTICS
1. Summary Statistics:
NumPy provides functions to compute summary statistics directly on arrays.
import numpy as np
data = np.array([25, 30, 22, 35, 28])
mean = np.mean(data)
median = np.median(data)
std_dev = np.std(data)
variance = np.var(data)
35. 2. Percentiles and Quartiles:
You can compute specific percentiles and quartiles using the np.percentile() function.
percentile_25 = np.percentile(data, 25)
percentile_75 = np.percentile(data, 75)
3. Correlation and Covariance:
You can compute correlation and covariance between arrays using np.corrcoef() and np.cov(). Here, data1 and
data2 stand for any two one-dimensional arrays of equal length.
correlation_matrix = np.corrcoef(data1, data2)
covariance_matrix = np.cov(data1, data2)
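One detail worth knowing about the summary statistics is the ddof parameter: np.std() and np.var() divide by n by default (population statistics), while ddof=1 divides by n - 1 (sample statistics). The sketch below shows this on the same data array, along with computing several percentiles in one call.

```python
import numpy as np

data = np.array([25, 30, 22, 35, 28])

population_var = np.var(data)      # default ddof=0: divide by n
sample_var = np.var(data, ddof=1)  # ddof=1: divide by n - 1
sample_std = np.std(data, ddof=1)

# Several percentiles at once: 25th, 50th (the median), and 75th
q1, q2, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1                      # interquartile range

print(population_var, sample_var, sample_std)
print(q1, q2, q3, iqr)
```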
36. CORRELATION AND COVARIANCE
In NumPy, you can compute correlation and covariance between arrays using the np.corrcoef() and np.cov()
functions, respectively. These functions are useful for analyzing relationships and dependencies between
variables. Here's how to use them:
Computing Correlation Coefficient (Correlation):
The correlation coefficient measures the strength and direction of a linear relationship between two variables.
It ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear
correlation.
import numpy as np
# Create two arrays representing variables
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 5, 6])
37. # Compute the correlation coefficient between x and y
correlation_matrix = np.corrcoef(x, y)
# The correlation coefficient is in the (0, 1) element of the matrix
correlation_coefficient = correlation_matrix[0, 1]
In this example, correlation_coefficient will contain the Pearson correlation coefficient between x and y.
38. Computing Covariance:
Covariance measures the degree to which two variables change together. Positive values indicate a positive
relationship (both variables increase or decrease together), while negative values indicate an inverse
relationship (one variable increases as the other decreases).
import numpy as np
# Create two arrays representing variables
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 5, 6])
# Compute the covariance between x and y
covariance_matrix = np.cov(x, y)
# The covariance is in the (0, 1) element of the matrix
covariance = covariance_matrix[0, 1]
In this example, covariance will contain the covariance between x and y.
Both np.corrcoef() and np.cov() also accept a single 2-D array, in which each row is treated as one variable by
default (pass rowvar=False if the variables are in columns). This lets you compute the correlation matrix or
covariance matrix for all pairs of variables in a dataset at once.
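A minimal sketch of the multi-variable case, assuming three made-up variables stored as the rows of one 2-D array:

```python
import numpy as np

# Each row is one variable, each column one observation (NumPy's default layout)
data = np.array([[1, 2, 3, 4, 5],   # x
                 [2, 3, 4, 5, 6],   # y = x + 1
                 [5, 4, 3, 2, 1]])  # z = 6 - x

corr = np.corrcoef(data)  # 3x3 matrix of pairwise correlation coefficients
cov = np.cov(data)        # 3x3 matrix; the diagonal holds each variable's variance

print(corr)
print(cov)
```

Note that np.cov() uses the sample covariance (dividing by n - 1) by default, unlike np.var(), which divides by n unless ddof=1 is passed.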
39. HANDLING MISSING DATA
Handling missing data in NumPy is an important aspect of data analysis and manipulation. NumPy provides
several ways to work with missing or undefined values, typically represented as NaN (Not-a-Number). Here
are some common techniques for handling missing data in NumPy:
Using np.nan: NumPy represents missing data using np.nan. You can create arrays with missing values like this:
import numpy as np
arr = np.array([1.0, 2.0, np.nan, 4.0])
Now, arr contains a missing value represented as np.nan.
40. Checking for Missing Data: You can check for missing values using the np.isnan() function. For example:
np.isnan(arr) # Returns a boolean array indicating which elements are NaN.
Filtering Missing Data: To filter out missing values from an array, you can use boolean indexing. For example:
arr[~np.isnan(arr)] # Returns an array without NaN values.
Replacing Missing Data: You can replace missing values with a specific value using boolean indexing or
np.nan_to_num(). For example:
arr[np.isnan(arr)] = 0 # Replace NaN with 0
Or, as an alternative applied to the original array, replace NaN with the mean of the non-missing values, which
np.nanmean() computes:
mean = np.nanmean(arr)
arr[np.isnan(arr)] = mean
41. Ignoring Missing Data: Sometimes, you may want to perform operations while ignoring missing values. You
can use functions like np.nanmax(), np.nanmin(), np.nansum(), etc., which ignore NaN values when computing
the result.
Interpolation: If you have a time series or ordered data, you can use interpolation methods to fill missing
values. NumPy provides functions like np.interp() for this purpose.
Masked Arrays: NumPy also supports masked arrays (numpy.ma) that allow you to work with missing data
more explicitly by creating a mask that specifies which values are missing. This can be useful for certain
computations.
Handling Missing Data in Multidimensional Arrays: If you're working with multidimensional arrays, you can
apply the above techniques along a specific axis. The NaN-aware functions such as np.nanmean() accept an axis
parameter, and the boolean result of np.isnan() can be reduced with np.any() or np.all() along an axis to
handle missing data along specific dimensions.
Keep in mind that the specific method you choose to handle missing data depends on your data
analysis goals and the context of your data. Some methods may be more appropriate than others,
depending on your use case.
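Several of the techniques listed above can be sketched on one small array. The values are made up, and the interpolation step assumes the array positions themselves serve as the x-coordinates.

```python
import numpy as np

arr = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

# Ignore NaN while aggregating, or replace it with a fixed value
total = np.nansum(arr)
zero_filled = np.nan_to_num(arr, nan=0.0)

# Linear interpolation: fill each NaN from the neighbouring valid values
positions = np.arange(arr.size)
valid = ~np.isnan(arr)
interpolated = np.interp(positions, positions[valid], arr[valid])

# Masked array: NaN entries are hidden from computations
masked = np.ma.masked_invalid(arr)
masked_mean = masked.mean()

# 2-D case: find rows with at least one NaN, and take NaN-aware column means
grid = np.array([[1.0, 2.0], [np.nan, 4.0]])
rows_with_nan = np.isnan(grid).any(axis=1)
column_means = np.nanmean(grid, axis=0)

print(total, zero_filled, interpolated, masked_mean)
print(rows_with_nan, column_means)
```

NaN can only be stored in floating-point arrays, which is why the sample arrays are written with float values.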
42. HIERARCHICAL INDEXING
Hierarchical indexing, often referred to as "MultiIndexing", is a Pandas feature rather than a NumPy one. It lets
a Series or DataFrame carry an index with multiple levels of labels, which is particularly useful when you want
to represent higher-dimensional data with more complex hierarchical structures in tabular form.
You can create a MultiIndex using the pd.MultiIndex class. Here's a basic example:
import numpy as np
import pandas as pd
# Create a MultiIndex with three levels
multi_index = pd.MultiIndex.from_arrays([['A', 'A', 'B', 'B'],
                                         [1, 2, 1, 2],
                                         ['X', 'Y', 'X', 'Y']])
# Create a random data array
data = np.random.rand(4, 3)
# Create a DataFrame with MultiIndex
df = pd.DataFrame(data, index=multi_index, columns=['Value1', 'Value2', 'Value3'])
43. In this example, we've created a MultiIndex with three levels: 'A' and 'B' as the first level, 1 and 2 as the
second level, and 'X' and 'Y' as the third level. Then, we've created a DataFrame with this MultiIndex and some
random data.
You can access data from this DataFrame using hierarchical indexing. For example:
# Accessing data using hierarchical indexing
value_A1_X = df.loc[('A', 1, 'X'), 'Value1'] # Access Value1 for 'A', 1, 'X'
44. Some common operations with hierarchical indexing include:
Slicing: You can perform slices at each level of the index, allowing you to select specific subsets of the data.
Stacking and Unstacking: You can stack or unstack levels to convert between a wide and long format, which
can be useful for different types of analyses.
Swapping Levels: You can swap levels to change the order of the levels in the index.
Grouping and Aggregating: You can group data based on levels of the index and perform aggregation
functions like mean, sum, etc.
Reordering Levels: You can change the order of levels in the index.
Resetting Index: You can reset the index to move the hierarchical index levels back to columns.
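The operations listed above can be sketched on a small two-level example. The level names 'group' and 'number' and the values are assumptions made for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_product([['A', 'B'], [1, 2]], names=['group', 'number'])
df = pd.DataFrame({'value': [10, 20, 30, 40]}, index=index)

group_a = df.loc['A']                      # slicing: all rows under first-level label 'A'
wide = df['value'].unstack('number')       # unstacking: inner level becomes columns
swapped = df.swaplevel('group', 'number')  # swapping: reverse the order of the two levels
totals = df.groupby(level='group').sum()   # grouping: aggregate per first-level label
flat = df.reset_index()                    # resetting: levels become ordinary columns

print(wide)
print(totals)
print(flat)
```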
45. Hierarchical indexing is especially valuable when dealing with multi-dimensional data, such as panel
data or data with multiple categorical variables. It allows for more expressive data organization and
manipulation. The pd.MultiIndex class provides various methods for creating and manipulating MultiIndex
objects, such as from_arrays(), from_tuples(), and from_product().