Unit 3: Basics of Numpy
21BCA2T452 : Python Programming
Prof. Vishnu Priya P M
Assistant Professor Dept. of Computer Science
Kristu Jayanti College, Autonomous
(Reaccredited A++ Grade by NAAC with CGPA 3.78/4)
Bengaluru – 560077, India
NUMPY BASICS: ARRAYS AND VECTORIZED COMPUTATION
NumPy (Numerical Python) is a fundamental library in Python for numerical and scientific
computing. It provides support for arrays (multi-dimensional, homogeneous data structures)
and a wide range of mathematical functions to perform vectorized computations efficiently.
Installing NumPy
Before using NumPy, you need to make sure it's installed. You can install it using pip:
pip install numpy
VISHNU PRIYA P M 2
Importing NumPy
To use NumPy in your Python code, you should import it:
import numpy as np
By convention, it's common to import NumPy as np for brevity.
Creating NumPy Arrays
You can create NumPy arrays using various methods:
1. From Python Lists:
arr = np.array([1, 2, 3, 4, 5])
2. Using NumPy Functions:
zeros_arr = np.zeros(5) # Creates an array of zeros with 5 elements
ones_arr = np.ones(3) # Creates an array of ones with 3 elements
rand_arr = np.random.rand(3, 3) # Creates a 3x3 array with random values between 0 and 1
3. Using NumPy's Range Function:
range_arr = np.arange(0, 10, 2) # Creates an array with values [0, 2, 4, 6, 8]
BASIC ARRAY OPERATIONS
Once you have NumPy arrays, you can perform various operations on them:
1. Element-wise Operations:
NumPy allows you to perform element-wise operations, like addition, subtraction, multiplication,
and division:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = a + b # Element-wise addition: [5, 7, 9]
d = a * b # Element-wise multiplication: [4, 10, 18]
2. Indexing and Slicing:
You can access individual elements and slices of NumPy arrays using indexing and slicing:
arr = np.array([0, 1, 2, 3, 4, 5])
element = arr[2] # Access element at index 2 (value: 2)
sub_array = arr[2:5] # Slice from index 2 to 4 (values: [2, 3, 4])
3. Array Shape and Reshaping:
You can check and change the shape of NumPy arrays:
arr = np.array([[1, 2, 3], [4, 5, 6]])
shape = arr.shape # Get the shape (2, 3)
reshaped = arr.reshape(3, 2) # Reshape the array to (3, 2)
4. Aggregation Functions:
NumPy provides functions to compute statistics on arrays:
arr = np.array([1, 2, 3, 4, 5])
mean = np.mean(arr) # Calculate the mean (average)
max_val = np.max(arr) # Find the maximum value
min_val = np.min(arr) # Find the minimum value
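The aggregation functions above also accept an axis argument on multi-dimensional arrays. The following is a minimal sketch, assuming a small 2x3 array chosen only for illustration:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

total = arr.sum()              # 21, computed over all elements
col_sums = arr.sum(axis=0)     # [5 7 9], one value per column
row_means = arr.mean(axis=1)   # [2. 5.], one value per row
```

Here axis=0 collapses the rows (giving one result per column) and axis=1 collapses the columns (giving one result per row).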
VECTORIZED COMPUTATION
Vectorized computation in Python refers to performing operations on entire arrays or sequences
of data without the need for explicit loops. This approach leverages highly optimized, low-level
code to achieve faster and more efficient computations. The primary library for vectorized
computation in Python is NumPy.
Traditional Loop-Based Computation
In traditional Python programming, you might use explicit loops to perform operations on arrays
or lists. For example:
# Using loops to add two lists element-wise
list1 = [1, 2, 3]
list2 = [4, 5, 6]
result = []
for i in range(len(list1)):
    result.append(list1[i] + list2[i])
# Result: [5, 7, 9]
Vectorized Computation with NumPy
NumPy allows you to perform operations on entire arrays, making code more concise and efficient. Here's
how you can achieve the same result using NumPy:
import numpy as np
# Using NumPy for element-wise addition
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result = arr1 + arr2
# Result: array([5, 7, 9])
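To see the efficiency difference for yourself, one option is to time both approaches with the standard timeit module. This is only a rough sketch: the array size n and the helper names add_with_loop and add_vectorized are illustrative, and the measured times depend on your machine.

```python
import timeit
import numpy as np

n = 100_000
list1 = list(range(n))
list2 = list(range(n))
arr1 = np.arange(n)
arr2 = np.arange(n)

def add_with_loop():
    # Explicit Python-level loop over list elements
    return [list1[i] + list2[i] for i in range(n)]

def add_vectorized():
    # One NumPy operation over the whole array
    return arr1 + arr2

loop_time = timeit.timeit(add_with_loop, number=10)
vec_time = timeit.timeit(add_vectorized, number=10)
print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")
```

Both functions compute the same values; only the time taken differs.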
INTRODUCTION TO PANDAS DATA STRUCTURES
Pandas is a popular Python library for data manipulation and analysis. It provides two primary data structures:
the DataFrame and the Series. These data structures are designed to handle structured data, making it easier
to work with datasets in a tabular format.
DataFrame:
• A DataFrame is a 2-dimensional, labeled data structure that resembles a spreadsheet or SQL table.
• It consists of rows and columns, where each column can have a different data type (e.g., integers, floats,
strings, or even custom data types).
• You can think of a DataFrame as a collection of Series objects, where each Series is a column.
• DataFrames are highly versatile and are used for a wide range of data analysis tasks, including data
cleaning, exploration, and transformation.
Here's a basic example of how to create a DataFrame using Pandas:
import pandas as pd
# Creating a DataFrame from a dictionary of data
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)
# Displaying the DataFrame
print(df)
Series:
• A Series is a one-dimensional labeled array that can hold data of any data type.
• It is like a column in a DataFrame or a single variable in statistics.
• Series objects are commonly used for time series data, as well as other one-dimensional data.
Key characteristics of a Pandas Series:
• Single Data Type: Like a NumPy array and unlike a Python list, a Pandas Series has one dtype for all of
its values. For example, if you create a Series with integer values, its dtype will be an integer type such as
int64. Mixed values are still allowed, but they are stored under the generic object dtype.
• Labeled Data: Series have two parts: the data itself and an associated index. The index provides labels or
names for each data point in the Series. By default, Series have a numeric index starting from 0, but you can
specify custom labels if needed.
• Size and Shape: A Series has a size (the number of elements) and a one-dimensional shape, but it does not
have separate rows and columns like a DataFrame.
import pandas as pd
# Create a Series from a list
data = [10, 20, 30, 40, 50]
series = pd.Series(data)
# Display the Series
print(series)
0 10
1 20
2 30
3 40
4 50
dtype: int64
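The index and dtype of a Series can be inspected directly. The sketch below assumes a custom string index ('a', 'b', 'c') and a second Series named mixed, both introduced here purely for illustration:

```python
import pandas as pd

series = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

by_label = series['b']         # label-based access -> 20
by_position = series.iloc[0]   # position-based access -> 10
shape = series.shape           # (3,)

# Values of different types fall back to the generic object dtype
mixed = pd.Series([1, 'two', 3.0])
print(series.dtype, mixed.dtype)
```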
Some common tasks you can perform with Pandas:
• Data Loading: Pandas can read data from various sources, including CSV files, Excel spreadsheets, SQL
databases, and more.
• Data Cleaning: You can clean and preprocess data by handling missing values, removing duplicates, and
transforming data types.
• Data Selection: Easily select specific rows and columns of interest using various indexing techniques.
• Data Aggregation: Perform groupby operations, calculate statistics, and aggregate data based on specific
criteria.
• Data Visualization: You can use Pandas in conjunction with visualization libraries like Matplotlib and
Seaborn to create informative plots and charts.
DataFrame
A DataFrame in Python typically refers to a two-dimensional, size-mutable, and potentially heterogeneous
tabular data structure provided by the Pandas library. It is a fundamental data structure for data
manipulation and analysis in Python.
Here's how you can work with DataFrames in Python using Pandas:
1. Import Pandas:
First, you need to import the Pandas library.
import pandas as pd
2. Creating a DataFrame:
You can create a DataFrame in several ways. Here are a few common methods:
• From a dictionary:
data = {'Column1': [value1, value2, ...],
        'Column2': [value1, value2, ...]}
df = pd.DataFrame(data)
• From a list of lists:
data = [[value1, value2],
[value3, value4]]
df = pd.DataFrame(data, columns=['Column1', 'Column2'])
• From a CSV file:
df = pd.read_csv('file.csv')
3. Viewing Data:
You can use various methods to view and explore your DataFrame:
df.head(): Displays the first few rows of the DataFrame.
df.tail(): Displays the last few rows of the DataFrame.
df.shape: Returns the number of rows and columns.
df.columns: Returns the column names.
df.info(): Provides information about the DataFrame, including data types and non-null counts.
4. Selecting Data:
You can select specific columns or rows from a DataFrame using indexing or filtering. For example:
df['Column1'] # Select a specific column
df[['Column1', 'Column2']] # Select multiple columns
df[df['Column1'] > 5] # Filter rows based on a condition
5. Modifying Data:
You can modify the DataFrame by adding or modifying columns, updating values, or appending rows. For
example:
df['NewColumn'] = [new_value1, new_value2, ...] # Add a new column
df.at[index, 'Column1'] = new_value # Update a specific value
df = pd.concat([df, pd.DataFrame([{'Column1': value1, 'Column2': value2}])], ignore_index=True) # Append a new row (DataFrame.append() was removed in pandas 2.0)
6. Data Analysis:
Pandas provides various functions for data analysis, such
as describe(), groupby(), agg(), and more.
7. Saving Data:
You can save the DataFrame to a CSV file or other formats:
df.to_csv('output.csv', index=False)
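The analysis functions named in step 6 (describe(), groupby(), agg()) can be sketched as follows. The City and Sales columns and their values are made up for illustration and are not part of the original slides:

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({'City': ['Delhi', 'Delhi', 'Pune', 'Pune'],
                   'Sales': [100, 150, 80, 120]})

# count, mean, std, min, quartiles and max of one column
summary = df['Sales'].describe()

# One row per city, with the sum and mean of Sales in each group
by_city = df.groupby('City')['Sales'].agg(['sum', 'mean'])
print(by_city)
```

groupby() splits the rows by the values of the grouping column, and agg() then reduces each group with the listed functions.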
INDEX OBJECTS: INDEXING, SELECTION, AND FILTERING
In Pandas, the Index object is a fundamental component of both Series and DataFrame data
structures. It provides the labels or names for the rows or columns of your data. You can use
indexing, selection, and filtering techniques with these indexes to access specific data points or
subsets of your data. Here's how you can work with index objects in Pandas:
1. Indexing:
Indexing allows you to access specific elements or rows in your data using labels or integer positions. You can
use .loc[] for label-based indexing and .iloc[] for integer-based indexing.
• Label-based indexing:
df.loc['label'] # Access a specific row by its label
df.loc['label', 'column_name'] # Access a specific element by label and column name
• Integer-based indexing:
df.iloc[0] # Access the first row
df.iloc[0, 1] # Access an element by row and column index
2. Selection:
You can use various methods to select specific data based on conditions or criteria.
• Select rows based on a condition:
df[df['Column'] > 5] # Select rows where 'Column' is greater than 5
• Select rows by multiple conditions:
df[(df['Column1'] > 5) & (df['Column2'] < 10)] # Rows where 'Column1' > 5 and 'Column2' < 10
3. Filtering:
Filtering allows you to create a boolean mask based on a condition and then apply that mask to your
DataFrame to select rows meeting the condition.
Create a boolean mask:
condition = df['Column'] > 5
Apply the mask to the DataFrame:
filtered_df = df[condition]
4. Setting a New Index:
You can set a specific column as the index of your DataFrame using the .set_index() method.
df.set_index('Column_Name', inplace=True)
5. Resetting the Index:
If you've set a column as the index and want to revert to the default integer-based index, you can use the
.reset_index() method.
df.reset_index(inplace=True)
6. Multi-level Indexing:
You can create DataFrames with multi-level indexes, allowing you to work with more complex hierarchical data
structures.
df.set_index(['Index1', 'Index2'], inplace=True)
Index objects in Pandas are versatile and powerful for working with data because they enable you to
access and manipulate your data in various ways, whether it's for data retrieval, filtering, or
restructuring.
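As a concrete illustration of the indexing techniques above, the following sketch reuses the Name/Age/City data from the earlier DataFrame example. It calls set_index() without inplace=True so the original DataFrame stays unchanged; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

indexed = df.set_index('Name')           # returns a new DataFrame; df itself is unchanged
bob_age = indexed.loc['Bob', 'Age']      # label-based access -> 30
first_city = indexed.iloc[0, 1]          # position-based: row 0, column 1 -> 'New York'
up_to_bob = indexed.loc['Alice':'Bob']   # label slices include the end label (2 rows)

restored = indexed.reset_index()         # 'Name' becomes an ordinary column again
```

Note that a label slice such as 'Alice':'Bob' includes both endpoints, unlike an integer slice with .iloc[].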
ARITHMETIC AND DATA ALIGNMENT IN PANDAS
Arithmetic and data alignment in Pandas refer to how mathematical operations are performed between Series and
DataFrames when they have different shapes or indices. Pandas automatically aligns data based on the labels of
the objects involved in the operation, which ensures that the result of the operation maintains data integrity and is
aligned correctly. Here are some key aspects of arithmetic and data alignment in Pandas:
1. Automatic Alignment:
When you perform mathematical operations (e.g., addition, subtraction, multiplication, division) between two
Series or DataFrames, Pandas aligns the data based on their labels (index or column names). It aligns the data
based on common labels and performs the operation only on matching labels.
series1 = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
series2 = pd.Series([4, 5, 6], index=['B', 'C', 'D'])
result = series1 + series2
In this example, the result Series will have NaN values for the 'A' and 'D' labels because those labels don't match
between series1 and series2.
2. Missing Data (NaN):
When labels don't match, Pandas fills in the result with NaN (Not-a-Number) to indicate missing values.
3. DataFrame Alignment:
The same principles apply to DataFrames when performing operations between them. The alignment occurs both
for rows (based on the index) and columns (based on column names).
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['X', 'Y'])
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]}, index=['Y', 'Z'])
result = df1 + df2
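In this case, only the cell at row 'Y', column 'B' holds a value (4 + 5 = 9), because 'Y' and 'B' are the only row and
column labels present in both df1 and df2. Every other cell in result is NaN: columns 'A' and 'C' each exist in
only one DataFrame, and rows 'X' and 'Z' each exist in only one index.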
4. Handling Missing Data:
You can use methods like .fillna() to replace NaN values with a specific value or use .dropna() to remove rows or
columns with missing data.
result_filled = result.fillna(0) # Replace NaN with 0
result_dropped = result.dropna() # Remove rows that contain NaN (pass axis=1 to remove columns instead)
5. Alignment with Broadcasting:
Pandas allows you to perform operations between a Series and a scalar value, and it broadcasts the scalar to
match the shape of the Series.
series = pd.Series([1, 2, 3])
scalar = 2
result = series * scalar
In this example, result will be a Series with values [2, 4, 6].
Automatic alignment in Pandas is a powerful feature that simplifies data manipulation and allows you to work
with datasets of different shapes without needing to manually align them. It ensures that operations are
performed in a way that maintains the integrity and structure of your data.
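The Series alignment behaviour described above can be checked directly. This sketch reuses series1 and series2 from the example and also shows the add() method with fill_value, which treats a label missing from one side as 0 instead of producing NaN:

```python
import pandas as pd

series1 = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
series2 = pd.Series([4, 5, 6], index=['B', 'C', 'D'])

result = series1 + series2
# A    NaN
# B    6.0
# C    8.0
# D    NaN

filled = series1.add(series2, fill_value=0)
# A    1.0
# B    6.0
# C    8.0
# D    6.0
```

The result is a float Series because NaN cannot be stored in an integer dtype.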
ARITHMETIC AND DATA ALIGNMENT IN NUMPY
NumPy also performs arithmetic element-wise, but unlike Pandas it has no labels to align on: elements are
matched purely by position. NumPy is primarily focused on numerical computations with homogeneous
arrays (arrays of the same data type). Here's how arithmetic and shape matching work in NumPy:
Shape Matching (Broadcasting):
When two arrays have the same shape, NumPy combines them element by element. When the shapes differ,
NumPy tries to broadcast the smaller array across the larger one. This works only if the shapes are
compatible under the broadcasting rules.
import numpy as np
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([10, 20, 30])
result = arr1 + arr2
In this example, NumPy broadcasts arr2 (shape (3,)) across each row of arr1 (shape (2, 3)), resulting in
[[11, 22, 33], [14, 25, 36]]. Arrays with incompatible shapes, such as np.array([1, 2, 3]) and np.array([4, 5]),
cannot be added and raise a ValueError.
Broadcasting Rules:
NumPy follows specific rules when broadcasting arrays:
If the arrays have a different number of dimensions, pad the smaller shape with ones on the left side.
Compare the shapes element-wise, starting from the right. If dimensions are equal or one of them is 1, they are
compatible.
If the dimensions are incompatible, NumPy raises a "ValueError: operands could not be broadcast together" error.
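The broadcasting rules can be tried out on small arrays. This is a minimal sketch; the names col, row and grid are illustrative:

```python
import numpy as np

col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3])           # shape (3,), padded on the left to (1, 3)

grid = col + row                    # (3, 1) and (1, 3) broadcast to (3, 3)
print(grid)
# [[ 1  2  3]
#  [11 12 13]
#  [21 22 23]]

try:
    np.array([1, 2, 3]) + np.array([4, 5])   # shapes (3,) and (2,): 3 != 2 and neither is 1
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

In the first case every dimension pair is either equal or contains a 1, so broadcasting succeeds. In the second case the trailing dimensions are 3 and 2, so NumPy raises the error.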
Handling Missing Data:
Because NumPy matches elements by position rather than by label, it never inserts NaN to mark unmatched
labels the way Pandas does. If you perform operations between arrays with mismatched shapes, NumPy will
either broadcast or raise an error, depending on whether broadcasting is possible.
Element-Wise Operations:
NumPy performs arithmetic operations element-wise by default. This means that each element in the resulting
array is the result of applying the operation to the corresponding elements in the input arrays.
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result = arr1 * arr2
In this case, result will be [4, 10, 18].
APPLYING FUNCTIONS AND MAPPING
In NumPy, you can apply functions and perform element-wise operations on arrays using various techniques,
including built-in vectorized functions, np.apply_along_axis(), and np.vectorize(), which can also be used for
mapping operations. Here's an overview of these approaches:
Vectorized Functions:
NumPy is designed to work efficiently with vectorized operations, meaning you can apply functions to entire
arrays or elements of arrays without the need for explicit loops. NumPy provides built-in functions that can be
applied element-wise to arrays.
import numpy as np
arr = np.array([1, 2, 3, 4])
# Applying a function element-wise
result = np.square(arr) # Square each element
In this example, the np.square() function is applied element-wise to the arr array.
np.apply_along_axis():
You can use the np.apply_along_axis() function to apply a function along a specified axis of a multi-dimensional
array. This is useful when you want to apply a function to each row or column of a 2D array.
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
# Apply a function along the rows (axis=1)
def sum_of_row(row):
    return np.sum(row)
result = np.apply_along_axis(sum_of_row, axis=1, arr=arr)
In this example, sum_of_row is applied to each row along axis=1, resulting in a new 1D array.
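For reductions that NumPy already provides, passing axis to the built-in function gives the same result as np.apply_along_axis(). The sketch below compares the two and then applies a custom per-column function; the lambda computing max minus min is an illustrative choice, not from the slides:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

via_apply = np.apply_along_axis(np.sum, axis=1, arr=arr)   # [ 6 15]
via_axis = arr.sum(axis=1)                                 # [ 6 15]

# A custom function applied to each column (axis=0): max minus min
col_range = np.apply_along_axis(lambda col: col.max() - col.min(), axis=0, arr=arr)   # [3 3 3]
```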
np.vectorize():
The np.vectorize() function allows you to create a vectorized version of a Python function, which can then be
applied element-wise to NumPy arrays.
import numpy as np
arr = np.array([1, 2, 3, 4])
# Define a Python function
def my_function(x):
    return x * 2
# Create a vectorized version of the function
vectorized_func = np.vectorize(my_function)
# Apply the vectorized function to the array
result = vectorized_func(arr)
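This approach is useful when you have a custom function that you want to apply to an array. Note that
np.vectorize() is provided for convenience rather than speed: internally it still calls the Python function once
per element.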
Mapping with np.vectorize():
You can use np.vectorize() to map a function to each element of an array.
import numpy as np
arr = np.array([1, 2, 3, 4])
# Define a Python function
def my_function(x):
    return x * 2
# Create a vectorized version of the function
vectorized_func = np.vectorize(my_function)
# Map the function to each element
result = vectorized_func(arr)
This approach is similar to applying a function element-wise but can be used for more complex
mapping operations.
These methods allow you to apply functions and perform mapping operations efficiently on NumPy
arrays, making it a powerful library for numerical and scientific computing tasks.
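When the custom function contains a branch, np.vectorize() lets you keep the plain Python logic, while np.where() expresses the same mapping as a true array operation. The sketch below compares the two; the function name clip_negative and the sample values are illustrative:

```python
import numpy as np

arr = np.array([-2, -1, 1, 2])

def clip_negative(x):
    # Plain Python function with a branch; it handles one scalar at a time
    return x if x > 0 else 0

via_vectorize = np.vectorize(clip_negative)(arr)   # [0 0 1 2]
via_where = np.where(arr > 0, arr, 0)              # [0 0 1 2]
```

Both give the same values. The np.where() version avoids calling a Python function per element, which usually matters on large arrays.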
SORTING AND RANKING
Sorting and ranking are common data manipulation operations in data analysis and are widely supported in
Python through libraries like NumPy and Pandas. These operations help organize data in a desired order or
rank elements based on specific criteria. Here's how to perform sorting and ranking in both libraries:
Sorting in NumPy:
In NumPy, you can sort NumPy arrays using the np.sort() and np.argsort() functions.
np.sort(): This function returns a new sorted array without modifying the original array.
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
sorted_arr = np.sort(arr)
np.argsort(): This function returns the indices that would sort the array. You can use these indices to sort the
original array.
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
indices = np.argsort(arr)
sorted_arr = arr[indices]
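A common use of np.argsort() is to reorder one array by the values of another. The sketch below assumes two parallel arrays, names and ages, with values borrowed from the Pandas examples in this unit:

```python
import numpy as np

names = np.array(['Alice', 'Bob', 'Charlie', 'David'])
ages = np.array([25, 30, 22, 35])

order = np.argsort(ages)          # [2 0 1 3]
names_by_age = names[order]       # ['Charlie' 'Alice' 'Bob' 'David']
ages_desc = np.sort(ages)[::-1]   # [35 30 25 22], descending via a reversed slice
```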
Sorting in Pandas:
In Pandas, you can sort Series and DataFrames using the sort_values() method. You can specify the column(s)
to sort by and the sorting order.
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 22, 35]}
df = pd.DataFrame(data)
# Sort by 'Age' column in ascending order
sorted_df = df.sort_values(by='Age', ascending=True)
Ranking in NumPy:
NumPy doesn't have a built-in ranking function, but you can apply np.argsort() twice to get the rank of each
element. With this approach, tied values receive distinct consecutive ranks.
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
indices = np.argsort(arr)
ranked_arr = np.argsort(indices) + 1 # Add 1 to start ranking from 1 instead of 0
Ranking in Pandas:
In Pandas, you can rank data using the rank() method. You can specify the sorting order and how to handle
ties (e.g., assigning the average rank to tied values).
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 22, 30]}
df = pd.DataFrame(data)
# Rank by 'Age' column in descending order and assign average rank to tied values
df['Rank'] = df['Age'].rank(ascending=False, method='average')
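The two libraries treat ties differently, which the following sketch makes visible. The four-element array is an illustrative choice, and kind='stable' is passed so that tied values keep their original order:

```python
import numpy as np
import pandas as pd

arr = np.array([30, 10, 30, 20])

order = np.argsort(arr, kind='stable')
ordinal_ranks = np.argsort(order) + 1                   # [3 1 4 2]: the two 30s get different ranks

average_ranks = pd.Series(arr).rank(method='average')   # [3.5, 1.0, 3.5, 2.0]
min_ranks = pd.Series(arr).rank(method='min')           # [3.0, 1.0, 3.0, 2.0]
```

If tied values should share a rank, the Pandas rank() method is the more direct tool.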
SUMMARIZING AND COMPUTING DESCRIPTIVE STATISTICS
1. Summary Statistics:
NumPy provides functions to compute summary statistics directly on arrays.
import numpy as np
data = np.array([25, 30, 22, 35, 28])
mean = np.mean(data)
median = np.median(data)
std_dev = np.std(data)
variance = np.var(data)
2. Percentiles and Quartiles:
You can compute specific percentiles and quartiles using the np.percentile() function.
percentile_25 = np.percentile(data, 25)
percentile_75 = np.percentile(data, 75)
3. Correlation and Covariance:
You can compute correlation and covariance between arrays using np.corrcoef() and np.cov().
# data1 and data2 are two 1-D arrays of equal length
correlation_matrix = np.corrcoef(data1, data2)
covariance_matrix = np.cov(data1, data2)
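The summary statistics above can be combined in a short worked example. This sketch reuses the data array from the summary statistics step and adds the interquartile range and the ddof parameter of np.std(), neither of which appears in the original slides:

```python
import numpy as np

data = np.array([25, 30, 22, 35, 28])

q1, median, q3 = np.percentile(data, [25, 50, 75])   # 25.0, 28.0, 30.0
iqr = q3 - q1                                        # 5.0

population_std = np.std(data)        # divides by n (ddof=0, the default)
sample_std = np.std(data, ddof=1)    # divides by n - 1
```

np.std() and np.var() compute the population statistic by default. Pass ddof=1 when the data is a sample and you want the sample estimate.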
CORRELATION AND COVARIANCE
In NumPy, you can compute correlation and covariance between arrays using the np.corrcoef() and np.cov()
functions, respectively. These functions are useful for analyzing relationships and dependencies between
variables. Here's how to use them:
Computing Correlation Coefficient (Correlation):
The correlation coefficient measures the strength and direction of a linear relationship between two variables.
It ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear
correlation.
import numpy as np
# Create two arrays representing variables
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 5, 6])
# Compute the correlation coefficient between x and y
correlation_matrix = np.corrcoef(x, y)
# The correlation coefficient is in the (0, 1) element of the matrix
correlation_coefficient = correlation_matrix[0, 1]
In this example, correlation_coefficient will contain the Pearson correlation coefficient between x and y.
Computing Covariance:
Covariance measures the degree to which two variables change together. Positive values indicate a positive
relationship (both variables increase or decrease together), while negative values indicate an inverse
relationship (one variable increases as the other decreases).
import numpy as np
# Create two arrays representing variables
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 5, 6])
# Compute the covariance between x and y
covariance_matrix = np.cov(x, y)
# The covariance is in the (0, 1) element of the matrix
covariance = covariance_matrix[0, 1]
In this example, covariance will contain the covariance between x and y.
Both np.corrcoef() and np.cov() can also accept a single 2-D array in which, by default, each row is one variable
and each column is one observation. This lets you compute the correlation matrix or covariance matrix for all
pairs of variables in a dataset simultaneously.
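A minimal sketch of the multi-variable case, assuming three made-up variables stored as the rows of one 2-D array:

```python
import numpy as np

# Each row is one variable, each column one observation (the default, rowvar=True)
data = np.array([[1, 2, 3, 4, 5],
                 [2, 3, 4, 5, 6],
                 [5, 4, 3, 2, 1]])

corr = np.corrcoef(data)   # 3x3 matrix: corr[0, 1] is about 1.0, corr[0, 2] is about -1.0
cov = np.cov(data)         # 3x3 matrix: cov[0, 0] is the sample variance of row 0, 2.5
```

The diagonal of the covariance matrix holds each variable's variance. np.cov() uses the sample formula (dividing by n - 1) by default.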
HANDLING MISSING DATA
Handling missing data in NumPy is an important aspect of data analysis and manipulation. NumPy provides
several ways to work with missing or undefined values, typically represented as NaN (Not-a-Number). Here
are some common techniques for handling missing data in NumPy:
Using np.nan: NumPy represents missing data using np.nan. You can create arrays with missing values like this:
import numpy as np
arr = np.array([1.0, 2.0, np.nan, 4.0])
Now, arr contains a missing value represented as np.nan.
Checking for Missing Data: You can check for missing values using the np.isnan() function. For example:
np.isnan(arr) # Returns a boolean array indicating which elements are NaN.
Filtering Missing Data: To filter out missing values from an array, you can use boolean indexing. For example:
arr[~np.isnan(arr)] # Returns an array without NaN values.
Replacing Missing Data: You can replace missing values with a specific value using boolean indexing or
np.nan_to_num(). For example:
arr[np.isnan(arr)] = 0 # Replace NaN with 0 in place
Or, as an alternative, replace NaN with the mean of the non-missing values (np.nanmean() ignores NaN when averaging):
mean = np.nanmean(arr)
arr[np.isnan(arr)] = mean
Ignoring Missing Data: Sometimes, you may want to perform operations while ignoring missing values. You
can use functions like np.nanmax(), np.nanmin(), np.nansum(), etc., which ignore NaN values when computing
the result.
Interpolation: If you have a time series or ordered data, you can use interpolation methods to fill missing
values. NumPy provides functions like np.interp() for this purpose.
Masked Arrays: NumPy also supports masked arrays (numpy.ma) that allow you to work with missing data
more explicitly by creating a mask that specifies which values are missing. This can be useful for certain
computations.
Handling Missing Data in Multidimensional Arrays: If you're working with multidimensional arrays, you can
apply the above techniques along a specific axis, for example by passing the axis parameter to functions such
as np.nanmean(), or by combining np.isnan() with np.any() along an axis to find the rows or columns that contain
missing data.
Keep in mind that the specific method you choose to handle missing data depends on your data
analysis goals and the context of your data. Some methods may be more appropriate than others,
depending on your use case.
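Several of the techniques listed above can be sketched together on one small array. The variable names are illustrative, and the interpolation step assumes the values are ordered and evenly spaced by position:

```python
import numpy as np

arr = np.array([1.0, 2.0, np.nan, 4.0])

total = np.nansum(arr)    # 7.0, the NaN is ignored
mean = np.nanmean(arr)    # 7.0 / 3

# Linear interpolation: estimate the NaN from its neighbours by position
positions = np.arange(arr.size)
missing = np.isnan(arr)
interpolated = arr.copy()
interpolated[missing] = np.interp(positions[missing], positions[~missing], arr[~missing])
# interpolated -> [1. 2. 3. 4.]

# Masked array: the mask marks the NaN entry, and mean() skips masked values
masked = np.ma.masked_invalid(arr)
masked_mean = masked.mean()

# 2-D case: find the rows that contain at least one NaN
grid = np.array([[1.0, np.nan], [3.0, 4.0]])
rows_with_nan = np.isnan(grid).any(axis=1)   # [ True False]
```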
HIERARCHICAL INDEXING
Hierarchical indexing, often referred to as "MultiIndexing", is a Pandas feature rather than a NumPy one. It
allows the index of a Series or DataFrame to have multiple levels of labels, which is particularly useful when you
want to represent higher-dimensional data with more complex hierarchical structures in a two-dimensional table.
You can create a MultiIndex using the pd.MultiIndex class, for example from lists or NumPy arrays. Here's a basic example:
import numpy as np
import pandas as pd
# Create a MultiIndex with three levels, one list per level
arrays = [['A', 'A', 'B', 'B'], [1, 2, 1, 2], ['X', 'Y', 'X', 'Y']]
multi_index = pd.MultiIndex.from_arrays(arrays)
# Create a random data array
data = np.random.rand(4, 3)
# Create a DataFrame with MultiIndex
df = pd.DataFrame(data, index=multi_index, columns=['Value1', 'Value2', 'Value3'])
In this example, we've created a MultiIndex with three levels: 'A' and 'B' as the first level, 1 and 2 as the second
level, and 'X' and 'Y' as the third level. Then, we've created a DataFrame with this MultiIndex and some random data.
You can access data from this DataFrame using hierarchical indexing. For example:
# Accessing data using hierarchical indexing
value_A1_X = df.loc[('A', 1, 'X'), 'Value1'] # Access Value1 for 'A', 1, 'X'
Some common operations with hierarchical indexing include:
Slicing: You can perform slices at each level of the index, allowing you to select specific subsets of the data.
Stacking and Unstacking: You can stack or unstack levels to convert between a wide and long format, which
can be useful for different types of analyses.
Swapping Levels: You can swap levels to change the order of the levels in the index.
Grouping and Aggregating: You can group data based on levels of the index and perform aggregation
functions like mean, sum, etc.
Reordering Levels: You can change the order of levels in the index.
Resetting Index: You can reset the index to move the hierarchical index levels back to columns.
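The operations listed above can be sketched on a small two-level example built with pd.MultiIndex.from_arrays(). The level names 'group' and 'item' and the values are hypothetical, chosen only for illustration:

```python
import pandas as pd

index = pd.MultiIndex.from_arrays([['A', 'A', 'B', 'B'], [1, 2, 1, 2]],
                                  names=['group', 'item'])
df = pd.DataFrame({'Value': [10, 20, 30, 40]}, index=index)

group_a = df.loc['A']                               # slicing: all rows of group 'A'
wide = df['Value'].unstack('item')                  # unstacking: rows = group, columns = item
swapped = df.swaplevel('group', 'item')             # swapping: index order becomes (item, group)
totals = df.groupby(level='group')['Value'].sum()   # grouping by a level: A -> 30, B -> 70
flat = df.reset_index()                             # resetting: levels become ordinary columns
```

Each call returns a new object, so df itself keeps its original two-level index.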
Hierarchical indexing is especially valuable when dealing with multi-dimensional data, such as panel
data or data with multiple categorical variables. It allows for more expressive data organization and
manipulation. The pd.MultiIndex class from the pandas library provides the functionality for working
with hierarchical data structures, including various methods for creating and manipulating
MultiIndex objects.

More Related Content

Similar to Unit 3_Numpy_VP.pptx

XII IP New PYTHN Python Pandas 2020-21.pptx
XII IP New PYTHN Python Pandas 2020-21.pptxXII IP New PYTHN Python Pandas 2020-21.pptx
XII IP New PYTHN Python Pandas 2020-21.pptxlekha572836
 
Data Analysis with Python Pandas
Data Analysis with Python PandasData Analysis with Python Pandas
Data Analysis with Python PandasNeeru Mittal
 
Lecture 3 intro2data
Lecture 3 intro2dataLecture 3 intro2data
Lecture 3 intro2dataJohnson Ubah
 
Python Pandas
Python PandasPython Pandas
Python PandasSunil OS
 
b09e9e67-aeb9-460b-9f96-cfccb318d3a7.pptx
b09e9e67-aeb9-460b-9f96-cfccb318d3a7.pptxb09e9e67-aeb9-460b-9f96-cfccb318d3a7.pptx
b09e9e67-aeb9-460b-9f96-cfccb318d3a7.pptxUtsabDas8
 
PPT on Data Science Using Python
PPT on Data Science Using PythonPPT on Data Science Using Python
PPT on Data Science Using PythonNishantKumar1179
 
Meetup Junio Data Analysis with python 2018
Meetup Junio Data Analysis with python 2018Meetup Junio Data Analysis with python 2018
Meetup Junio Data Analysis with python 2018DataLab Community
 
python-numpyandpandas-170922144956 (1).pptx
python-numpyandpandas-170922144956 (1).pptxpython-numpyandpandas-170922144956 (1).pptx
python-numpyandpandas-170922144956 (1).pptxAkashgupta517936
 
pandasppt with informative topics coverage.pptx
pandasppt with informative topics coverage.pptxpandasppt with informative topics coverage.pptx
pandasppt with informative topics coverage.pptxvallarasu200364
 
Matplotlib adalah pustaka plotting 2D Python yang menghasilkan gambar berkual...
Matplotlib adalah pustaka plotting 2D Python yang menghasilkan gambar berkual...Matplotlib adalah pustaka plotting 2D Python yang menghasilkan gambar berkual...
Matplotlib adalah pustaka plotting 2D Python yang menghasilkan gambar berkual...HendraPurnama31
 
Pandas Dataframe reading data Kirti final.pptx
Pandas Dataframe reading data  Kirti final.pptxPandas Dataframe reading data  Kirti final.pptx
Pandas Dataframe reading data Kirti final.pptxKirti Verma
 
Python - Numpy/Pandas/Matplot Machine Learning Libraries
Python - Numpy/Pandas/Matplot Machine Learning LibrariesPython - Numpy/Pandas/Matplot Machine Learning Libraries
Python - Numpy/Pandas/Matplot Machine Learning LibrariesAndrew Ferlitsch
 
Lecture 1 Pandas Basics.pptx machine learning
Lecture 1 Pandas Basics.pptx machine learningLecture 1 Pandas Basics.pptx machine learning
Lecture 1 Pandas Basics.pptx machine learningmy6305874
 
Data Science Using Scikit-Learn
Data Science Using Scikit-LearnData Science Using Scikit-Learn
Data Science Using Scikit-LearnDucat India
 




Unit 3_Numpy_VP.pptx

  • 1. Unit 3: Basics of Numpy 21BCA2T452 : Python Programming Prof. Vishnu Priya P M Assistant Professor Dept. of Computer Science Kristu Jayanti College, Autonomous (Reaccredited A++ Grade by NAAC with CGPA 3.78/4) Bengaluru – 560077, India
  • 2. NUMPY BASICS: ARRAYS AND VECTORIZED COMPUTATION NumPy (Numerical Python) is a fundamental library in Python for numerical and scientific computing. It provides support for arrays (multi-dimensional, homogeneous data structures) and a wide range of mathematical functions to perform vectorized computations efficiently. Installing NumPy Before using NumPy, you need to make sure it's installed. You can install it using pip: pip install numpy VISHNU PRIYA P M 2
  • 3. Importing NumPy
    To use NumPy in your Python code, import it:
        import numpy as np
    By convention, NumPy is imported as np for brevity.
    Creating NumPy Arrays
    You can create NumPy arrays using various methods:
    1. From Python Lists:
        arr = np.array([1, 2, 3, 4, 5])
    2. Using NumPy Functions:
        zeros_arr = np.zeros(5)  # Creates an array of zeros with 5 elements
        ones_arr = np.ones(3)  # Creates an array of ones with 3 elements
        rand_arr = np.random.rand(3, 3)  # Creates a 3x3 array with random values in [0, 1)
  • 4. 3. Using NumPy's Range Function:
        range_arr = np.arange(0, 10, 2)  # Creates an array with values [0, 2, 4, 6, 8]
  • 5. BASIC ARRAY OPERATIONS
    Once you have NumPy arrays, you can perform various operations on them:
    1. Element-wise Operations: NumPy allows you to perform element-wise operations, like addition, subtraction, multiplication, and division:
        a = np.array([1, 2, 3])
        b = np.array([4, 5, 6])
        c = a + b  # Element-wise addition: [5, 7, 9]
        d = a * b  # Element-wise multiplication: [4, 10, 18]
  • 6. 2. Indexing and Slicing: You can access individual elements and slices of NumPy arrays using indexing and slicing:
        arr = np.array([0, 1, 2, 3, 4, 5])
        element = arr[2]  # Access element at index 2 (value: 2)
        sub_array = arr[2:5]  # Slice from index 2 to 4 (values: [2, 3, 4])
    3. Array Shape and Reshaping: You can check and change the shape of NumPy arrays:
        arr = np.array([[1, 2, 3], [4, 5, 6]])
        shape = arr.shape  # Get the shape (2, 3)
        reshaped = arr.reshape(3, 2)  # Reshape the array to (3, 2)
    4. Aggregation Functions: NumPy provides functions to compute statistics on arrays:
        arr = np.array([1, 2, 3, 4, 5])
        mean = np.mean(arr)  # Calculate the mean (average)
        max_val = np.max(arr)  # Find the maximum value
        min_val = np.min(arr)  # Find the minimum value
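As a small sketch of how slicing, reshaping, and aggregation combine, the example below also aggregates along a single axis, which the slide does not cover. The variable names (grid, col_sums, row_means) are illustrative only.

```python
import numpy as np

arr = np.arange(6)             # array([0, 1, 2, 3, 4, 5])
sub_array = arr[2:5]           # the stop index 5 is excluded
grid = arr.reshape(2, 3)       # two rows, three columns

# Aggregations can run over the whole array or along one axis.
total = grid.sum()             # sum of all six elements
col_sums = grid.sum(axis=0)    # one sum per column
row_means = grid.mean(axis=1)  # one mean per row

print(sub_array, grid.shape, total, col_sums, row_means)
```

Here axis=0 collapses the rows (giving one value per column) and axis=1 collapses the columns (giving one value per row).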
  • 7. VECTORIZED COMPUTATION
    Vectorized computation in Python refers to performing operations on entire arrays or sequences of data without the need for explicit loops. This approach leverages highly optimized, low-level code to achieve faster and more efficient computations. The primary library for vectorized computation in Python is NumPy.
    Traditional Loop-Based Computation
    In traditional Python programming, you might use explicit loops to perform operations on arrays or lists. For example:
        # Using loops to add two lists element-wise
        list1 = [1, 2, 3]
        list2 = [4, 5, 6]
        result = []
        for i in range(len(list1)):
            result.append(list1[i] + list2[i])
        # Result: [5, 7, 9]
  • 8. Vectorized Computation with NumPy
    NumPy allows you to perform operations on entire arrays, making code more concise and efficient. Here's how you can achieve the same result using NumPy:
        import numpy as np
        # Using NumPy for element-wise addition
        arr1 = np.array([1, 2, 3])
        arr2 = np.array([4, 5, 6])
        result = arr1 + arr2
        # Result: array([5, 7, 9])
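To see the efficiency claim for yourself, a rough timing sketch like the following can help. The array size n and the helper names add_with_loop and add_vectorized are assumptions made for illustration, and the measured times depend on your machine, so no figures are quoted here.

```python
import timeit

import numpy as np

n = 100_000
list1 = list(range(n))
list2 = list(range(n))
arr1 = np.arange(n)
arr2 = np.arange(n)


def add_with_loop():
    # Explicit Python-level iteration over both lists.
    return [list1[i] + list2[i] for i in range(n)]


def add_vectorized():
    # A single array expression; the loop runs inside NumPy.
    return arr1 + arr2


# Both approaches must produce the same values.
assert add_with_loop() == add_vectorized().tolist()

loop_time = timeit.timeit(add_with_loop, number=10)
vec_time = timeit.timeit(add_vectorized, number=10)
print(f"loop: {loop_time:.4f} s, vectorized: {vec_time:.4f} s")
```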
  • 9. INTRODUCTION TO PANDAS DATA STRUCTURES
    Pandas is a popular Python library for data manipulation and analysis. It provides two primary data structures: the DataFrame and the Series. These data structures are designed to handle structured data, making it easier to work with datasets in a tabular format.
    DataFrame:
    • A DataFrame is a 2-dimensional, labeled data structure that resembles a spreadsheet or SQL table.
    • It consists of rows and columns, where each column can have a different data type (e.g., integers, floats, strings, or even custom data types).
    • You can think of a DataFrame as a collection of Series objects, where each Series is a column.
    • DataFrames are highly versatile and are used for a wide range of data analysis tasks, including data cleaning, exploration, and transformation.
  • 10. Here's a basic example of how to create a DataFrame using Pandas:
        import pandas as pd
        # Creating a DataFrame from a dictionary of data
        data = {'Name': ['Alice', 'Bob', 'Charlie'],
                'Age': [25, 30, 35],
                'City': ['New York', 'San Francisco', 'Los Angeles']}
        df = pd.DataFrame(data)
        # Displaying the DataFrame
        print(df)
  • 11. Series:
    • A Series is a one-dimensional labeled array that can hold data of any data type.
    • It is like a column in a DataFrame or a single variable in statistics.
    • Series objects are commonly used for time series data, as well as other one-dimensional data.
    Key characteristics of a Pandas Series:
    • Single Data Type: Like a NumPy array, and unlike a Python list, a Series has one dtype for all of its values. A Series created from integers has an integer dtype; if the values are of mixed types, Pandas falls back to the generic object dtype.
    • Labeled Data: A Series has two parts: the data itself and an associated index. The index provides labels or names for each data point in the Series. By default, a Series has a numeric index starting from 0, but you can specify custom labels if needed.
    • Size and Shape: A Series has a size (the number of elements) and a one-dimensional shape, but it does not have columns like a DataFrame.
  • 12.
        import pandas as pd
        # Create a Series from a list
        data = [10, 20, 30, 40, 50]
        series = pd.Series(data)
        # Display the Series
        print(series)
    Output:
        0    10
        1    20
        2    30
        3    40
        4    50
        dtype: int64
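The Series characteristics listed above (one dtype, an index of labels, a one-dimensional shape) can be checked with a short sketch. The fruit labels and the names prices and mixed are made up for illustration.

```python
import pandas as pd

# A Series with custom labels instead of the default 0, 1, 2, ... index.
prices = pd.Series([10.5, 20.0, 30.25], index=["apple", "banana", "cherry"])

print(prices.dtype)      # one dtype for the whole Series: float64
print(prices["banana"])  # lookup by label
print(prices.iloc[0])    # lookup by position
print(prices.size, prices.shape)

# Values of mixed types fall back to the generic object dtype.
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)
```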
  • 13. Some common tasks you can perform with Pandas:
    • Data Loading: Pandas can read data from various sources, including CSV files, Excel spreadsheets, SQL databases, and more.
    • Data Cleaning: You can clean and preprocess data by handling missing values, removing duplicates, and transforming data types.
    • Data Selection: Easily select specific rows and columns of interest using various indexing techniques.
    • Data Aggregation: Perform groupby operations, calculate statistics, and aggregate data based on specific criteria.
    • Data Visualization: You can use Pandas in conjunction with visualization libraries like Matplotlib and Seaborn to create informative plots and charts.
  • 14. DataFrame
    A DataFrame in Python typically refers to a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure provided by the Pandas library. It is a fundamental data structure for data manipulation and analysis in Python. Here's how you can work with DataFrames in Python using Pandas:
    1. Import Pandas: First, you need to import the Pandas library.
        import pandas as pd
    2. Creating a DataFrame: You can create a DataFrame in several ways. Here are a few common methods:
    • From a dictionary:
        data = {'Column1': [value1, value2, ...], 'Column2': [value1, value2, ...]}
        df = pd.DataFrame(data)
  • 15. • From a list of lists:
        data = [[value1, value2], [value3, value4]]
        df = pd.DataFrame(data, columns=['Column1', 'Column2'])
    • From a CSV file:
        df = pd.read_csv('file.csv')
    3. Viewing Data: You can use various methods to view and explore your DataFrame:
    • df.head(): Displays the first few rows of the DataFrame.
    • df.tail(): Displays the last few rows of the DataFrame.
    • df.shape: Returns the number of rows and columns.
    • df.columns: Returns the column names.
    • df.info(): Provides information about the DataFrame, including data types and non-null counts.
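The viewing methods above can be tried on a tiny DataFrame built from a list of lists. The sample rows and column names below are illustrative stand-ins for the value1, value2 placeholders on the slide.

```python
import pandas as pd

df = pd.DataFrame(
    [["Alice", 25], ["Bob", 30], ["Charlie", 35]],
    columns=["Name", "Age"],
)

print(df.head(2))        # first two rows
print(df.tail(1))        # last row
print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column names
df.info()                # dtypes and non-null counts, printed directly
```

Note that df.shape and df.columns are attributes, so they are used without parentheses, while head(), tail(), and info() are methods.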
  • 16. 4. Selecting Data: You can select specific columns or rows from a DataFrame using indexing or filtering. For example:
        df['Column1']  # Select a specific column
        df[['Column1', 'Column2']]  # Select multiple columns
        df[df['Column1'] > 5]  # Filter rows based on a condition
    5. Modifying Data: You can modify the DataFrame by adding or modifying columns, updating values, or appending rows. DataFrame.append() was removed in pandas 2.0, so rows are appended with pd.concat(). For example:
        df['NewColumn'] = [new_value1, new_value2, ...]  # Add a new column
        df.at[index, 'Column1'] = new_value  # Update a specific value
        new_row = pd.DataFrame([{'Column1': value1, 'Column2': value2}])
        df = pd.concat([df, new_row], ignore_index=True)  # Append a new row
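A runnable sketch of the selection and modification steps, with concrete values in place of the slide's placeholders. The column names and data are invented for the example, and the row is appended with pd.concat rather than DataFrame.append, which no longer exists in pandas 2.x.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

# Filter rows with a boolean condition.
over_28 = df[df["Age"] > 28]

# Add a column computed from an existing one.
df["Senior"] = df["Age"] >= 35

# Update a single cell by row label and column name.
df.at[0, "Age"] = 26

# Append a row by concatenating a one-row DataFrame.
new_row = pd.DataFrame([{"Name": "David", "Age": 40, "Senior": True}])
df = pd.concat([df, new_row], ignore_index=True)

print(over_28)
print(df)
```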
  • 17. 6. Data Analysis: Pandas provides various functions for data analysis, such as describe(), groupby(), agg(), and more.
    7. Saving Data: You can save the DataFrame to a CSV file or other formats:
        df.to_csv('output.csv', index=False)
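As a minimal sketch of the analysis functions named above, the following groups a small table by one column and aggregates another. The City and Sales data are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["Delhi", "Delhi", "Mumbai", "Mumbai"],
    "Sales": [100, 150, 200, 60],
})

# describe() reports count, mean, std, min, quartiles, and max.
print(df["Sales"].describe())

# groupby() splits the rows by City; agg() computes several statistics per group.
summary = df.groupby("City")["Sales"].agg(["sum", "mean"])
print(summary)
```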
  • 18. INDEX OBJECTS: INDEXING, SELECTION, AND FILTERING
    In Pandas, the Index object is a fundamental component of both Series and DataFrame data structures. It provides the labels or names for the rows or columns of your data. You can use indexing, selection, and filtering techniques with these indexes to access specific data points or subsets of your data. Here's how you can work with index objects in Pandas:
    1. Indexing: Indexing allows you to access specific elements or rows in your data using labels. You can use .loc[] for label-based indexing and .iloc[] for integer-based indexing.
    • Label-based indexing:
        df.loc['label']  # Access a specific row by its label
        df.loc['label', 'column_name']  # Access a specific element by label and column name
  • 19. • Integer-based indexing:
        df.iloc[0]  # Access the first row
        df.iloc[0, 1]  # Access an element by row and column index
    2. Selection: You can use various methods to select specific data based on conditions or criteria.
    • Select rows based on a condition:
        df[df['Column'] > 5]  # Select rows where 'Column' is greater than 5
    • Select rows by multiple conditions:
        df[(df['Column1'] > 5) & (df['Column2'] < 10)]  # Rows where 'Column1' > 5 and 'Column2' < 10
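The difference between label-based and integer-based access is easiest to see on a DataFrame whose index is not the default 0, 1, 2. The names and numbers below are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"Age": [25, 30, 35], "Score": [8, 4, 9]},
    index=["alice", "bob", "charlie"],
)

by_label = df.loc["bob", "Age"]  # row label "bob", column "Age"
by_position = df.iloc[0, 1]      # first row, second column

# Each condition needs its own parentheses when combined with & or |.
both = df[(df["Age"] > 26) & (df["Score"] > 5)]

print(by_label, by_position)
print(both)
```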
  • 20. 3. Filtering: Filtering allows you to create a boolean mask based on a condition and then apply that mask to your DataFrame to select rows meeting the condition.
    • Create a boolean mask:
        condition = df['Column'] > 5
    • Apply the mask to the DataFrame:
        filtered_df = df[condition]
    4. Setting a New Index: You can set a specific column as the index of your DataFrame using the .set_index() method.
        df.set_index('Column_Name', inplace=True)
  • 21. 5. Resetting the Index: If you've set a column as the index and want to revert to the default integer-based index, you can use the .reset_index() method.
        df.reset_index(inplace=True)
    6. Multi-level Indexing: You can create DataFrames with multi-level indexes, allowing you to work with more complex hierarchical data structures.
        df.set_index(['Index1', 'Index2'], inplace=True)
    Index objects in Pandas are versatile and powerful for working with data because they enable you to access and manipulate your data in various ways, whether it's for data retrieval, filtering, or restructuring.
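A short sketch of setting, using, and resetting a multi-level index. The Year, City, and Sales columns are invented for the example. It uses the non-inplace form, which returns a new DataFrame and leaves the original untouched; this is a stylistic choice, not a requirement.

```python
import pandas as pd

df = pd.DataFrame({
    "Year": [2023, 2023, 2024],
    "City": ["Delhi", "Mumbai", "Delhi"],
    "Sales": [100, 200, 150],
})

# Two columns become a two-level row index.
indexed = df.set_index(["Year", "City"])

# A full key is a tuple with one entry per level.
mumbai_2023 = indexed.loc[(2023, "Mumbai"), "Sales"]

# A partial key selects every row under the first level.
all_2023 = indexed.loc[2023]

# reset_index() turns the index levels back into ordinary columns.
restored = indexed.reset_index()

print(mumbai_2023)
print(all_2023)
print(restored)
```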
  • 22. ARITHMETIC AND DATA ALIGNMENT IN PANDAS
    Arithmetic and data alignment in Pandas refer to how mathematical operations are performed between Series and DataFrames when they have different shapes or indices. Pandas automatically aligns data based on the labels of the objects involved in the operation, which ensures that the result of the operation maintains data integrity and is aligned correctly. Here are some key aspects of arithmetic and data alignment in Pandas:
    1. Automatic Alignment: When you perform mathematical operations (e.g., addition, subtraction, multiplication, division) between two Series or DataFrames, Pandas aligns the data based on their labels (index or column names) and performs the operation only on matching labels.
        series1 = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
        series2 = pd.Series([4, 5, 6], index=['B', 'C', 'D'])
        result = series1 + series2
    In this example, the result Series will have NaN values for the 'A' and 'D' labels because those labels don't match between series1 and series2.
  • 23. 2. Missing Data (NaN): When labels don't match, Pandas fills in the result with NaN (Not-a-Number) to indicate missing values.
    3. DataFrame Alignment: The same principles apply to DataFrames when performing operations between them. The alignment occurs both for rows (based on the index) and columns (based on column names).
        df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['X', 'Y'])
        df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]}, index=['Y', 'Z'])
        result = df1 + df2
    In this case, result has rows 'X', 'Y', 'Z' and columns 'A', 'B', 'C'. Columns 'A' and 'C' are entirely NaN because each exists in only one of the two DataFrames. Column 'B' is NaN in rows 'X' and 'Z', which appear in only one DataFrame, so the only non-missing value is 9.0 at row 'Y', column 'B' (4 + 5).
    4. Handling Missing Data: You can use methods like .fillna() to replace NaN values with a specific value or use .dropna() to remove rows or columns with missing data.
        result_filled = result.fillna(0)  # Replace NaN with 0
        result_dropped = result.dropna()  # Drop rows containing any NaN (axis=1 drops columns instead)
    Here every row of result contains at least one NaN, so result.dropna() returns an empty DataFrame.
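An alternative to calling fillna() after the operation is to supply a fill value during it. The sketch below uses Series.add with its fill_value parameter, which the slides do not mention; it treats a label missing from one side as the given value instead of producing NaN.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["A", "B", "C"])
s2 = pd.Series([4, 5, 6], index=["B", "C", "D"])

plain = s1 + s2                    # 'A' and 'D' become NaN
filled = s1.add(s2, fill_value=0)  # the missing side is treated as 0

print(plain)
print(filled)
```

Note the difference from plain.fillna(0): fillna replaces the NaN result with 0, whereas fill_value keeps the value that was present on one side (so 'A' is 1.0 and 'D' is 6.0).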
  • 24. 5. Alignment with Broadcasting: Pandas allows you to perform operations between a Series and a scalar value, and it broadcasts the scalar to match the shape of the Series.
        series = pd.Series([1, 2, 3])
        scalar = 2
        result = series * scalar
    In this example, result will be a Series with values [2, 4, 6].
    Automatic alignment in Pandas is a powerful feature that simplifies data manipulation and allows you to work with datasets of different shapes without needing to manually align them. It ensures that operations are performed in a way that maintains the integrity and structure of your data.
  • 25. ARITHMETIC AND DATA ALIGNMENT IN NUMPY
    Unlike Pandas, NumPy has no index labels, so it does not align data by label. NumPy is primarily focused on numerical computations with homogeneous arrays (arrays whose elements share one data type), and arithmetic pairs elements purely by position. Here's how this works in NumPy:
    Broadcasting: NumPy arrays perform element-wise operations. If you perform an operation between two arrays of different but compatible shapes, NumPy broadcasts the smaller array to match the shape of the larger one.
        import numpy as np
        arr1 = np.array([1, 2, 3])
        arr2 = np.array([4])
        result = arr1 + arr2
    In this example, NumPy broadcasts arr2 (shape (1,)) to match arr1 (shape (3,)), resulting in [5, 6, 7]. If arr2 were np.array([4, 5]), the shapes (3,) and (2,) would be incompatible and NumPy would raise a ValueError instead.
  • 26. Broadcasting Rules: NumPy follows specific rules when broadcasting arrays:
    • If the arrays have a different number of dimensions, pad the smaller shape with ones on the left side.
    • Compare the shapes dimension by dimension, starting from the right. Two dimensions are compatible if they are equal or one of them is 1.
    • If any pair of dimensions is incompatible, NumPy raises "ValueError: operands could not be broadcast together".
    Handling Missing Data: NumPy never inserts NaN to reconcile mismatched data the way Pandas does for unmatched labels. (np.nan exists as an ordinary floating-point value, but shape mismatches do not produce it.) If you perform operations between arrays with mismatched shapes, NumPy will either broadcast or raise an error, depending on whether broadcasting is possible.
    Element-Wise Operations: NumPy performs arithmetic operations element-wise by default. This means that each element in the resulting array is the result of applying the operation to the corresponding elements in the input arrays.
        arr1 = np.array([1, 2, 3])
        arr2 = np.array([4, 5, 6])
        result = arr1 * arr2
    In this case, result will be [4, 10, 18].
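The broadcasting rules above can be traced with one compatible and one incompatible pair of shapes. The array names and the failed flag are illustrative.

```python
import numpy as np

col = np.array([[0], [10], [20]])  # shape (3, 1)
row = np.array([1, 2, 3])          # shape (3,), padded on the left to (1, 3)

# Comparing from the right: 1 vs 3 is compatible, 3 vs 1 is compatible,
# so both arrays are stretched to shape (3, 3).
table = col + row
print(table)

# Shapes (3,) and (2,) differ in the last dimension and neither is 1.
failed = False
try:
    np.array([1, 2, 3]) + np.array([4, 5])
except ValueError as err:
    failed = True
    print("broadcast failed:", err)
```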
  • 27. APPLYING FUNCTIONS AND MAPPING
    In NumPy, you can apply functions and perform element-wise operations on arrays using several techniques: built-in vectorized functions, np.apply_along_axis(), and np.vectorize(), which can also be used for mapping a custom function over each element. Here's an overview of these approaches:
    Vectorized Functions: NumPy is designed to work efficiently with vectorized operations, meaning you can apply functions to entire arrays without the need for explicit loops. NumPy provides built-in functions that can be applied element-wise to arrays.
        import numpy as np
        arr = np.array([1, 2, 3, 4])
        # Applying a function element-wise
        result = np.square(arr)  # Square each element: [1, 4, 9, 16]
    In this example, the np.square() function is applied element-wise to the arr array.
  • 28. np.apply_along_axis(): You can use the np.apply_along_axis() function to apply a function along a specified axis of a multi-dimensional array. This is useful when you want to apply a function to each row or column of a 2D array.
        import numpy as np
        arr = np.array([[1, 2, 3], [4, 5, 6]])
        # Apply a function along the rows (axis=1)
        def sum_of_row(row):
            return np.sum(row)
        result = np.apply_along_axis(sum_of_row, axis=1, arr=arr)
    In this example, sum_of_row is applied to each row along axis=1, resulting in the new 1D array [6, 15].
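When the per-row function can be written with NumPy's own axis-aware operations, calling them directly with axis= gives the same result without a Python-level call per row. The sketch below compares both forms for a hypothetical value_range helper.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 9]])


def value_range(row):
    # Difference between the largest and smallest value in one row.
    return row.max() - row.min()


# The function is called once per row.
ranges = np.apply_along_axis(value_range, axis=1, arr=arr)

# The same computation using the axis argument of built-in reductions.
builtin = arr.max(axis=1) - arr.min(axis=1)

print(ranges, builtin)
```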
  • 29. np.vectorize(): The np.vectorize() function allows you to create a vectorized version of a Python function, which can then be applied element-wise to NumPy arrays.
        import numpy as np
        arr = np.array([1, 2, 3, 4])
        # Define a Python function
        def my_function(x):
            return x * 2
        # Create a vectorized version of the function
        vectorized_func = np.vectorize(my_function)
        # Apply the vectorized function to the array
        result = vectorized_func(arr)
    This approach is useful when you have a custom function that you want to apply to an array. Note that np.vectorize() is provided for convenience rather than performance: internally it still calls the Python function once per element.
  • 30. Mapping with np.vectorize(): You can use np.vectorize() to map a function to each element of an array.
        import numpy as np
        arr = np.array([1, 2, 3, 4])
        # Define a Python function
        def my_function(x):
            return x * 2
        # Create a vectorized version of the function
        vectorized_func = np.vectorize(my_function)
        # Map the function to each element
        result = vectorized_func(arr)
    This approach is similar to applying a function element-wise but can be used for more complex mapping operations.
    These methods allow you to apply functions and perform mapping operations efficiently on NumPy arrays, making it a powerful library for numerical and scientific computing tasks.
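np.vectorize() is most useful for functions that contain Python branching and therefore cannot be applied to a whole array directly. The sketch below maps a hypothetical clip_negative function over an array and shows an equivalent written with np.where, which avoids the per-element Python call. Passing otypes is optional; it is set here only to fix the output dtype explicitly.

```python
import numpy as np

arr = np.array([-2, -1, 0, 3])


def clip_negative(x):
    # A plain Python branch; works on one scalar at a time.
    return x if x > 0 else 0


mapped = np.vectorize(clip_negative, otypes=[int])(arr)

# The same mapping expressed with a built-in array operation.
with_where = np.where(arr > 0, arr, 0)

print(mapped, with_where)
```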
  • 31. SORTING AND RANKING
    Sorting and ranking are common data manipulation operations in data analysis and are widely supported in Python through libraries like NumPy and Pandas. These operations help organize data in a desired order or rank elements based on specific criteria. Here's how to perform sorting and ranking in both libraries:
    Sorting in NumPy: You can sort NumPy arrays using the np.sort() and np.argsort() functions.
    np.sort(): This function returns a new sorted array without modifying the original array.
        import numpy as np
        arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
        sorted_arr = np.sort(arr)
  • 32. np.argsort(): This function returns the indices that would sort the array. You can use these indices to sort the original array.
        import numpy as np

        arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
        indices = np.argsort(arr)
        sorted_arr = arr[indices]
    Sorting in Pandas: In Pandas, you can sort Series and DataFrames using the sort_values() method. You can specify the column(s) to sort by and the sorting order.
        import pandas as pd

        data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                'Age': [25, 30, 22, 35]}
        df = pd.DataFrame(data)

        # Sort by 'Age' column in ascending order
        sorted_df = df.sort_values(by='Age', ascending=True)
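One common use of np.argsort() is reordering a second array by the values of the first. The sketch below reuses the names and ages from the Pandas example as two plain NumPy arrays; the variable names are illustrative.

```python
import numpy as np

names = np.array(["Alice", "Bob", "Charlie", "David"])
ages = np.array([25, 30, 22, 35])

# Indices that would sort ages in ascending order
order = np.argsort(ages)

# The same index array reorders the companion array
names_by_age = names[order]

# Descending order: sort ascending, then reverse
ages_desc = np.sort(ages)[::-1]
```

Here names_by_age runs from the youngest person (Charlie) to the oldest (David).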
  • 33. Ranking in NumPy: NumPy doesn't have a built-in ranking function, but you can apply np.argsort() twice to get the rank of each element. In this scheme tied values receive distinct consecutive ranks rather than a shared rank.
        import numpy as np

        arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
        indices = np.argsort(arr)             # positions that would sort arr
        ranked_arr = np.argsort(indices) + 1  # Add 1 to start ranking from 1 instead of 0
    Ranking in Pandas: In Pandas, you can rank data using the rank() method. You can specify the sorting order and how to handle ties (e.g., assigning the average rank to tied values).
        import pandas as pd

        data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                'Age': [25, 30, 22, 30]}
        df = pd.DataFrame(data)

        # Rank by 'Age' column in descending order and assign average rank to tied values
        # Bob and David (both 30) share rank 1.5
        df['Rank'] = df['Age'].rank(ascending=False, method='average')
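If you need tied values to share an average rank in pure NumPy, one possible approach (a sketch, not the only way) is to combine np.unique() with cumulative counts. The ordinal ranking uses kind="stable" so that ties are ranked by position.

```python
import numpy as np

arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])

# Ordinal ranks: ties get consecutive ranks in order of appearance
ordinal_rank = np.argsort(np.argsort(arr, kind="stable")) + 1

# Average ranks: every tied value gets the mean of the ranks it occupies
_, inverse, counts = np.unique(arr, return_inverse=True, return_counts=True)
last_rank = np.cumsum(counts)          # highest ordinal rank held by each distinct value
first_rank = last_rank - counts + 1    # lowest ordinal rank held by each distinct value
average_rank = ((first_rank + last_rank) / 2)[inverse]
```

For this array the two 1s both get 1.5 and the two 3s both get 4.5, matching what rank(method='average') does in Pandas.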
  • 34. SUMMARIZING AND COMPUTING DESCRIPTIVE STATISTICS
    1. Summary Statistics: NumPy provides functions to compute summary statistics directly on arrays.
        import numpy as np

        data = np.array([25, 30, 22, 35, 28])
        mean = np.mean(data)
        median = np.median(data)
        std_dev = np.std(data)
        variance = np.var(data)
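One detail worth knowing about these functions: np.std() and np.var() divide by N by default (population statistics). Passing ddof=1 divides by N - 1 instead, which gives the sample estimates. A brief sketch using the same data:

```python
import numpy as np

data = np.array([25, 30, 22, 35, 28])

# Default ddof=0: divide the sum of squared deviations by N
population_var = np.var(data)        # 98 / 5 = 19.6
population_std = np.std(data)

# ddof=1: divide by N - 1 (sample variance and standard deviation)
sample_var = np.var(data, ddof=1)    # 98 / 4 = 24.5
sample_std = np.std(data, ddof=1)
```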
  • 35. 2. Percentiles and Quartiles: You can compute specific percentiles and quartiles using the np.percentile() function.
        # data is the array defined in the Summary Statistics example
        percentile_25 = np.percentile(data, 25)  # first quartile
        percentile_75 = np.percentile(data, 75)  # third quartile
    3. Correlation and Covariance: You can compute correlation and covariance between arrays using np.corrcoef() and np.cov().
        # data1 and data2 stand for any two 1D arrays of equal length
        correlation_matrix = np.corrcoef(data1, data2)
        covariance_matrix = np.cov(data1, data2)
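np.percentile() also accepts a list of percentiles, which makes it easy to get all quartiles in one call. The sketch below uses the data array from the summary statistics slide; the interquartile range is an added illustration.

```python
import numpy as np

data = np.array([25, 30, 22, 35, 28])

# Several percentiles at once: first quartile, median, third quartile
q1, q2, q3 = np.percentile(data, [25, 50, 75])

# Interquartile range: spread of the middle 50% of the values
iqr = q3 - q1
```

With the default linear interpolation this gives q1 = 25, q2 = 28, and q3 = 30 for the sorted values [22, 25, 28, 30, 35].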
  • 36. CORRELATION AND COVARIANCE
    In NumPy, you can compute correlation and covariance between arrays using the np.corrcoef() and np.cov() functions, respectively. These functions are useful for analyzing relationships and dependencies between variables. Here's how to use them:
    Computing Correlation Coefficient (Correlation): The correlation coefficient measures the strength and direction of a linear relationship between two variables. It ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear correlation.
        import numpy as np

        # Create two arrays representing variables
        x = np.array([1, 2, 3, 4, 5])
        y = np.array([2, 3, 4, 5, 6])
  • 37.
        # Compute the correlation coefficient between x and y
        correlation_matrix = np.corrcoef(x, y)

        # The correlation coefficient is in the (0, 1) element of the matrix
        correlation_coefficient = correlation_matrix[0, 1]
    In this example, correlation_coefficient will contain the Pearson correlation coefficient between x and y.
  • 38. Computing Covariance: Covariance measures the degree to which two variables change together. Positive values indicate a positive relationship (both variables increase or decrease together), while negative values indicate an inverse relationship (one variable increases as the other decreases).
        import numpy as np

        # Create two arrays representing variables
        x = np.array([1, 2, 3, 4, 5])
        y = np.array([2, 3, 4, 5, 6])

        # Compute the covariance between x and y
        covariance_matrix = np.cov(x, y)

        # The covariance is in the (0, 1) element of the matrix
        covariance = covariance_matrix[0, 1]
    In this example, covariance will contain the covariance between x and y. By default np.cov() returns the sample covariance, normalized by N - 1.
    Both np.corrcoef() and np.cov() also accept a single 2D array holding several variables, allowing you to compute correlations and covariances for all pairs of variables at once. By default each row is treated as a variable; if your dataset stores variables in columns, pass rowvar=False.
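The two quantities are directly related: the Pearson coefficient is the covariance divided by the product of the two standard deviations. The sketch below checks this for x and y and then builds full matrices for a three-column dataset; the array z and the name dataset are illustrative additions.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 5, 6])
z = np.array([5, 4, 3, 2, 1])  # illustrative third variable that decreases as x increases

# Pearson r = cov(x, y) / (std(x) * std(y)), using sample statistics (ddof=1) throughout
cov_xy = np.cov(x, y)[0, 1]
r_manual = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

# Variables stored as columns: shape (5 observations, 3 variables)
dataset = np.column_stack([x, y, z])

# rowvar=False tells NumPy that each column is a variable
corr = np.corrcoef(dataset, rowvar=False)  # 3x3 correlation matrix
cov = np.cov(dataset, rowvar=False)        # 3x3 covariance matrix
```

Entry [i, j] of each matrix relates column i to column j, so corr[0, 2] is close to -1 because z falls exactly as x rises.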
  • 39. HANDLING MISSING DATA
    Handling missing data in NumPy is an important aspect of data analysis and manipulation. NumPy provides several ways to work with missing or undefined values, typically represented as NaN (Not-a-Number). Here are some common techniques for handling missing data in NumPy:
    Using np.nan: NumPy represents missing data using np.nan. You can create arrays with missing values like this:
        import numpy as np

        arr = np.array([1.0, 2.0, np.nan, 4.0])
    Now, arr contains a missing value represented as np.nan.
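A few properties of np.nan explain why the dedicated functions in this section are needed. The sketch below is illustrative; the variable names are not from the slides.

```python
import numpy as np

arr = np.array([1.0, 2.0, np.nan, 4.0])

# NaN never compares equal to anything, including itself,
# so arr == np.nan cannot be used to find missing values
nan_equals_itself = np.nan == np.nan

# NaN propagates through ordinary arithmetic and reductions
plain_sum = arr.sum()

# np.isnan is the reliable test; summing the boolean mask counts missing entries
nan_count = int(np.isnan(arr).sum())

# NaN is a floating-point value, so mixing it with integers yields a float array
mixed_dtype = np.array([1, 2, np.nan]).dtype
```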
  • 40. Checking for Missing Data: You can check for missing values using the np.isnan() function. For example:
        np.isnan(arr)  # Returns a boolean array indicating which elements are NaN.
    Filtering Missing Data: To filter out missing values from an array, you can use boolean indexing. For example:
        arr[~np.isnan(arr)]  # Returns an array without NaN values.
    Replacing Missing Data: You can replace missing values with a specific value using boolean indexing or np.nan_to_num(). For example:
        arr[np.isnan(arr)] = 0  # Replace NaN with 0
    Or, to replace NaN with the mean of the non-missing values, compute that mean with np.nanmean(), which ignores NaN:
        mean = np.nanmean(arr)
        arr[np.isnan(arr)] = mean
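The assignments above modify arr in place. If you would rather keep the original array, a sketch along the following lines returns filled copies instead. It assumes a NumPy version in which np.nan_to_num() accepts the nan keyword (1.17 or later).

```python
import numpy as np

arr = np.array([1.0, 2.0, np.nan, 4.0])

# Filled copy with NaN replaced by 0.0; arr itself is unchanged
zero_filled = np.nan_to_num(arr, nan=0.0)

# Filled copy with NaN replaced by the mean of the non-missing values (7 / 3)
mean_filled = np.where(np.isnan(arr), np.nanmean(arr), arr)
```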
  • 41. Ignoring Missing Data: Sometimes, you may want to perform operations while ignoring missing values. You can use functions like np.nanmax(), np.nanmin(), np.nansum(), etc., which ignore NaN values when computing the result.
    Interpolation: If you have a time series or ordered data, you can use interpolation to fill missing values. NumPy provides np.interp() for one-dimensional linear interpolation.
    Masked Arrays: NumPy also supports masked arrays (numpy.ma) that allow you to work with missing data more explicitly by creating a mask that specifies which values are missing. This can be useful for certain computations.
    Handling Missing Data in Multidimensional Arrays: If you're working with multidimensional arrays, you can apply the above techniques along a specific axis. np.isnan() itself has no axis parameter; instead, reduce its boolean result along an axis (for example np.isnan(arr).any(axis=1)) or pass axis to the NaN-aware functions such as np.nanmean().
    Keep in mind that the specific method you choose to handle missing data depends on your data analysis goals and the context of your data. Some methods may be more appropriate than others, depending on your use case.
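The techniques listed above can be sketched as follows. The arrays series and grid are made-up examples, and linear interpolation is only one of several reasonable ways to fill gaps.

```python
import numpy as np

# Interpolation: fill gaps in ordered data from the neighbouring known values
series = np.array([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])
positions = np.arange(series.size)
missing = np.isnan(series)
filled = series.copy()
filled[missing] = np.interp(positions[missing], positions[~missing], series[~missing])

# Masked arrays: NaN entries are masked and excluded from computations
masked = np.ma.masked_invalid(series)
masked_mean = masked.mean()  # mean of 1, 3 and 6

# Multidimensional arrays: work along an axis
grid = np.array([[1.0, np.nan, 3.0],
                 [4.0, 5.0, np.nan]])
rows_with_nan = np.isnan(grid).any(axis=1)  # True for each row that contains a NaN
col_means = np.nanmean(grid, axis=0)        # per-column mean, ignoring NaN
```

In this example the interpolated series becomes 1, 2, 3, 4, 5, 6 because the known points lie on a straight line.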
  • 42. HIERARCHICAL INDEXING
    Hierarchical indexing, often referred to as "MultiIndexing", is a feature of the pandas library rather than of NumPy itself (NumPy has no MultiIndex class). It lets you label the rows or columns of a Series or DataFrame with several index levels, which is particularly useful when you want to represent higher-dimensional data with more complex hierarchical structures. You can create a MultiIndex using the pd.MultiIndex class and fill the DataFrame with NumPy data. Here's a basic example:
        import numpy as np
        import pandas as pd

        # Create a MultiIndex with three levels from parallel lists
        arrays = [['A', 'A', 'B', 'B'], [1, 2, 1, 2], ['X', 'Y', 'X', 'Y']]
        multi_index = pd.MultiIndex.from_arrays(arrays)

        # Create a random data array
        data = np.random.rand(4, 3)

        # Create a DataFrame with MultiIndex
        df = pd.DataFrame(data, index=multi_index, columns=['Value1', 'Value2', 'Value3'])
  • 43. In this example, the index has three levels: 'A' and 'B' as the first level, 1 and 2 as the second level, and 'X' and 'Y' as the third level. The DataFrame combines this MultiIndex with some random data. You can access data from this DataFrame using hierarchical indexing. For example:
        # Accessing data using hierarchical indexing
        value_A1_X = df.loc[('A', 1, 'X'), 'Value1']  # Access Value1 for 'A', 1, 'X'
  • 44. Some common operations with hierarchical indexing include:
    Slicing: You can perform slices at each level of the index, allowing you to select specific subsets of the data.
    Stacking and Unstacking: You can stack or unstack levels to convert between a wide and long format, which can be useful for different types of analyses.
    Swapping Levels: You can swap levels to change the order of the levels in the index.
    Grouping and Aggregating: You can group data based on levels of the index and perform aggregation functions like mean, sum, etc.
    Reordering Levels: You can change the order of levels in the index.
    Resetting Index: You can reset the index to move the hierarchical index levels back to columns.
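The operations listed above can be sketched with pandas as follows. The level names "group" and "number" and the values are illustrative assumptions, and a two-level index is used to keep the example small.

```python
import pandas as pd

# Two-level index built from every combination of group and number
index = pd.MultiIndex.from_product([["A", "B"], [1, 2]], names=["group", "number"])
df = pd.DataFrame({"value": [10.0, 20.0, 30.0, 40.0]}, index=index)

# Slicing: select everything under 'A', or a cross-section on the inner level
group_a = df.loc["A"]
number_1 = df.xs(1, level="number")

# Unstacking: move the 'number' level into columns (long to wide);
# the stack() method performs the reverse conversion
wide = df["value"].unstack(level="number")

# Swapping levels, then sorting so the new outer level is ordered
swapped = df.swaplevel("group", "number").sort_index()

# Grouping and aggregating by an index level
group_means = df.groupby(level="group")["value"].mean()

# Resetting the index: the levels become ordinary columns again
flat = df.reset_index()
```

Naming the levels makes each call easier to read than referring to levels by position.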
  • 45. Hierarchical indexing is especially valuable when dealing with multi-dimensional data, such as panel data or data with multiple categorical variables. It allows for more expressive data organization and manipulation. The pd.MultiIndex class from the pandas library provides this functionality, including constructors such as from_arrays(), from_tuples(), and from_product() for creating MultiIndex objects and various methods for manipulating them.