UNIT 3: BASICS OF NUMPY
NUMPY BASICS: ARRAYS AND VECTORIZED COMPUTATION
NumPy (Numerical Python) is a fundamental library in Python for numerical and scientific
computing. It provides support for arrays (multi-dimensional, homogeneous data structures)
and a wide range of mathematical functions to perform vectorized computations efficiently. This
guide will cover some of the basics of working with NumPy arrays and performing vectorized
computations.
Installing NumPy
Before using NumPy, you need to make sure it's installed. You can install it using pip:
pip install numpy
Importing NumPy
To use NumPy in your Python code, you should import it:
import numpy as np
By convention, it's common to import NumPy as np for brevity.
Creating NumPy Arrays
You can create NumPy arrays using various methods:
1. From Python Lists:
arr = np.array([1, 2, 3, 4, 5])
2. Using NumPy Functions:
zeros_arr = np.zeros(5) # Creates an array of zeros with 5 elements
ones_arr = np.ones(3) # Creates an array of ones with 3 elements
rand_arr = np.random.rand(3, 3) # Creates a 3x3 array with random values between 0 and 1
3. Using NumPy's Range Function:
range_arr = np.arange(0, 10, 2) # Creates an array with values [0, 2, 4, 6, 8]
BASIC ARRAY OPERATIONS
Once you have NumPy arrays, you can perform various operations on them:
1. Element-wise Operations:
NumPy allows you to perform element-wise operations, like addition, subtraction, multiplication,
and division:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = a + b # Element-wise addition: [5, 7, 9]
d = a * b # Element-wise multiplication: [4, 10, 18]
2. Indexing and Slicing:
You can access individual elements and slices of NumPy arrays using indexing and slicing:
arr = np.array([0, 1, 2, 3, 4, 5])
element = arr[2] # Access element at index 2 (value: 2)
sub_array = arr[2:5] # Slice from index 2 to 4 (values: [2, 3, 4])
3. Array Shape and Reshaping:
You can check and change the shape of NumPy arrays:
arr = np.array([[1, 2, 3], [4, 5, 6]])
shape = arr.shape # Get the shape (2, 3)
reshaped = arr.reshape(3, 2) # Reshape the array to (3, 2)
4. Aggregation Functions:
NumPy provides functions to compute statistics on arrays:
arr = np.array([1, 2, 3, 4, 5])
mean = np.mean(arr) # Calculate the mean (average)
max_val = np.max(arr) # Find the maximum value
min_val = np.min(arr) # Find the minimum value
VECTORIZED COMPUTATION
Vectorized computation in Python refers to performing operations on entire arrays or sequences
of data without the need for explicit loops. This approach leverages highly optimized, low-level
code to achieve faster and more efficient computations. The primary library for vectorized
computation in Python is NumPy.
Traditional Loop-Based Computation
In traditional Python programming, you might use explicit loops to perform operations on arrays
or lists. For example:
# Using loops to add two lists element-wise
list1 = [1, 2, 3]
list2 = [4, 5, 6]
result = []
for i in range(len(list1)):
    result.append(list1[i] + list2[i])
# Result: [5, 7, 9]
Vectorized Computation with NumPy
NumPy allows you to perform operations on entire arrays, making code more concise and efficient. Here's
how you can achieve the same result using NumPy:
import numpy as np
# Using NumPy for element-wise addition
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result = arr1 + arr2
# Result: array([5, 7, 9])
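To make the comparison concrete, the sketch below runs both versions side by side and confirms they agree. The variable names loop_result and vec_result are illustrative and not part of the original slides.

```python
import numpy as np

list1 = [1, 2, 3]
list2 = [4, 5, 6]

# Loop-based version: one Python-level addition per element
loop_result = []
for x, y in zip(list1, list2):
    loop_result.append(x + y)

# Vectorized version: a single array expression
vec_result = np.array(list1) + np.array(list2)

print(loop_result)           # [5, 7, 9]
print(vec_result.tolist())   # [5, 7, 9]

# Note: "+" on plain Python lists concatenates instead of adding
print(list1 + list2)         # [1, 2, 3, 4, 5, 6]
```

The last line shows why the conversion to arrays matters: the same operator means something different for lists.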
INTRODUCTION TO PANDAS DATA STRUCTURES
Pandas is a popular Python library for data manipulation and analysis. It provides two primary data structures:
the DataFrame and the Series. These data structures are designed to handle structured data, making it easier
to work with datasets in a tabular format.
DataFrame:
• A DataFrame is a 2-dimensional, labeled data structure that resembles a spreadsheet or SQL table.
• It consists of rows and columns, where each column can have a different data type (e.g., integers, floats,
strings, or even custom data types).
• You can think of a DataFrame as a collection of Series objects, where each Series is a column.
• DataFrames are highly versatile and are used for a wide range of data analysis tasks, including data
cleaning, exploration, and transformation.
Here's a basic example of how to create a DataFrame using Pandas:
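A minimal sketch of such an example follows. The column names and values are made up for illustration and are not taken from the original slides.

```python
import pandas as pd

# Illustrative data: each dictionary key becomes a column
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 22],
    "Score": [88.5, 92.0, 79.5],
}
df = pd.DataFrame(data)

print(df)
print(df.dtypes)  # each column keeps its own data type
```

Here the Name column holds strings, Age holds integers, and Score holds floats, which illustrates that columns of one DataFrame can differ in type.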
Series:
• A Series is a one-dimensional labeled array that can hold data of any data type.
• It is like a column in a DataFrame or a single variable in statistics.
• Series objects are commonly used for time series data, as well as other one-dimensional data.
Key characteristics of a Pandas Series:
• Homogeneous Data: Like a NumPy array, and unlike a Python list, a Pandas Series has a single data type
(dtype) for all of its values. For example, if you create a Series with integer values, the Series has an
integer dtype. Mixed values can still be stored, but the Series then falls back to the generic object dtype.
• Labeled Data: Series have two parts: the data itself and an associated index. The index provides labels or
names for each data point in the Series. By default, Series have a numeric index starting from 0, but you can
specify custom labels if needed.
• Size and Shape: A Series has a size (the number of elements) and a one-dimensional shape, but it does not
have columns like a DataFrame.
import pandas as pd
# Create a Series from a list
data = [10, 20, 30, 40, 50]
series = pd.Series(data)
# Display the Series
print(series)
0    10
1    20
2    30
3    40
4    50
dtype: int64
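The default index above runs from 0 to 4. As a small sketch of the "Labeled Data" point, the example below builds a Series with custom labels; the labels "a", "b", "c" are illustrative.

```python
import pandas as pd

# Custom labels instead of the default 0..n-1 index
series = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(series["b"])                 # access by label: 20
print(series.iloc[0])              # access by position: 10
print(series.size, series.shape)   # 3 (3,)
```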
Some common tasks you can perform with Pandas:
• Data Loading: Pandas can read data from various sources, including CSV files, Excel spreadsheets, SQL
databases, and more.
• Data Cleaning: You can clean and preprocess data by handling missing values, removing duplicates, and
transforming data types.
• Data Selection: Easily select specific rows and columns of interest using various indexing techniques.
• Data Aggregation: Perform group by operations, calculate statistics, and aggregate data based on specific
criteria.
• Data Visualization: You can use Pandas in conjunction with visualization libraries like Matplotlib and
Seaborn to create informative plots and charts.
DataFrame
A DataFrame in Python typically refers to a two-dimensional, size-mutable, and potentially heterogeneous
tabular data structure provided by the popular library called Pandas. It is a fundamental data structure for data
manipulation and analysis in Python.
Here's how you can work with DataFrames in Python using Pandas:
1. Import Pandas:
First, you need to import the Pandas library.
import pandas as pd
2. Creating a DataFrame:
You can create a DataFrame in several ways. Here are a few common methods:
• From a dictionary:
data = {'Column1': [value1, value2, ...],
        'Column2': [value1, value2, ...]}
df = pd.DataFrame(data)
• From a list of lists:
data = [[value1, value2],
        [value3, value4]]
df = pd.DataFrame(data, columns=['Column1', 'Column2'])
• From a CSV file:
df = pd.read_csv('file.csv')
3. Viewing Data:
You can use various methods to view and explore your DataFrame:
df.head(): Displays the first few rows of the DataFrame.
df.tail(): Displays the last few rows of the DataFrame.
df.shape: Returns the number of rows and columns.
df.columns: Returns the column names.
df.info(): Provides information about the DataFrame, including data types and non-null counts.
4. Selecting Data:
You can select specific columns or rows from a DataFrame using indexing or filtering. For example:
df['Column1'] # Select a specific column
df[['Column1', 'Column2']] # Select multiple columns
df[df['Column1'] > 5] # Filter rows based on a condition
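The three selection forms above return different kinds of objects. The sketch below uses the column names from the text with made-up values to show this.

```python
import pandas as pd

# Illustrative data using the column names from the text
df = pd.DataFrame({"Column1": [3, 8, 12], "Column2": [1, 2, 3]})

one_column = df["Column1"]                 # a Series
two_columns = df[["Column1", "Column2"]]   # a DataFrame
filtered = df[df["Column1"] > 5]           # rows where the condition is True

print(type(one_column).__name__)    # Series
print(type(two_columns).__name__)   # DataFrame
print(filtered)
```

Note that the filtered rows keep their original index labels (1 and 2 here) rather than being renumbered.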
5. Modifying Data:
You can modify the DataFrame by adding or modifying columns, updating values, or appending rows. For
example:
df['NewColumn'] = [new_value1, new_value2, ...]  # Add a new column
df.at[index, 'Column1'] = new_value  # Update a specific value
# Append a new row (DataFrame.append() was removed in pandas 2.0; use pd.concat() instead)
new_row = pd.DataFrame([{'Column1': value1, 'Column2': value2}])
df = pd.concat([df, new_row], ignore_index=True)
6. Data Analysis:
Pandas provides various functions for data analysis, such
as describe(), groupby(), agg(), and more.
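As a brief sketch of these functions, the example below summarizes a small DataFrame. The column names Team and Points and their values are hypothetical.

```python
import pandas as pd

# Illustrative data; the column names are hypothetical
df = pd.DataFrame({
    "Team": ["A", "A", "B", "B"],
    "Points": [10, 20, 30, 50],
})

summary = df["Points"].describe()                           # count, mean, std, min, quartiles, max
totals = df.groupby("Team")["Points"].sum()                 # one total per team
stats = df.groupby("Team")["Points"].agg(["mean", "max"])   # several statistics at once

print(summary["mean"])   # 27.5
print(totals)
print(stats)
```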
7. Saving Data:
You can save the DataFrame to a CSV file or other formats:
df.to_csv('output.csv', index=False)
INDEX OBJECTS-INDEXING, SELECTION, AND FILTERING
In Pandas, the Index object is a fundamental component of both Series and DataFrame data
structures.
It provides the labels or names for the rows or columns of your data. You can use indexing,
selection, and filtering techniques with these indexes to access specific data points or subsets of
your data. Here's how you can work with index objects in Pandas:
1. Indexing:
Indexing allows you to access specific elements or rows in your data using labels or integer positions. You can
use .loc[] for label-based indexing and .iloc[] for integer-based indexing.
• Label-based indexing:
df.loc['label'] # Access a specific row by its label
df.loc['label', 'column_name'] # Access a specific element by label and column name
• Integer-based indexing:
df.iloc[0] # Access the first row
df.iloc[0, 1] # Access an element by row and column index
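The following sketch applies both styles of indexing to the same small DataFrame. The row labels and column names are illustrative.

```python
import pandas as pd

# Illustrative DataFrame with string row labels
df = pd.DataFrame(
    {"price": [1.5, 2.0, 3.25], "qty": [10, 20, 30]},
    index=["apple", "banana", "cherry"],
)

row_by_label = df.loc["banana"]            # Series for the 'banana' row
value_by_label = df.loc["banana", "qty"]   # 20
first_row = df.iloc[0]                     # Series for the first row ('apple')
value_by_pos = df.iloc[0, 1]               # row 0, column 1: 10

print(value_by_label, value_by_pos)
```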
2. Selection:
You can use various methods to select specific data based on conditions or criteria.
• Select rows based on a condition:
df[df['Column'] > 5] # Select rows where 'Column' is greater than 5
• Select rows by multiple conditions:
df[(df['Column1'] > 5) & (df['Column2'] < 10)] # Rows where 'Column1' > 5 and 'Column2' < 10
3. Filtering:
Filtering allows you to create a boolean mask based on a condition and then apply that mask to your
DataFrame to select rows meeting the condition.
Create a boolean mask:
condition = df['Column'] > 5
Apply the mask to the DataFrame:
filtered_df = df[condition]
4. Setting a New Index:
You can set a specific column as the index of your DataFrame using the .set_index() method.
df.set_index('Column_Name', inplace=True)
5. Resetting the Index:
If you've set a column as the index and want to revert to the default integer-based index, you can use the
.reset_index() method.
df.reset_index(inplace=True)
6. Multi-level Indexing:
You can create DataFrames with multi-level indexes, allowing you to work with more complex hierarchical data
structures.
df.set_index(['Index1', 'Index2'], inplace=True)
Index objects in Pandas are versatile and powerful for working with data because they enable you to
access and manipulate your data in various ways, whether it's for data retrieval, filtering, or
restructuring.
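The multi-level case is the least obvious of the operations above, so here is a minimal sketch of it. The column names Index1 and Index2 follow the text, while the values are made up. This version uses the non-inplace form, which returns a new DataFrame.

```python
import pandas as pd

# Illustrative data; 'Index1' and 'Index2' follow the column names used in the text
df = pd.DataFrame({
    "Index1": ["x", "x", "y"],
    "Index2": [1, 2, 1],
    "Value": [10, 20, 30],
})

indexed = df.set_index(["Index1", "Index2"])   # new DataFrame with a two-level index
print(indexed.loc[("x", 2), "Value"])          # 20
print(indexed.loc["x"])                        # all rows whose first-level label is 'x'

restored = indexed.reset_index()               # back to the default integer index
```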
ARITHMETIC AND DATA ALIGNMENT IN PANDAS
Arithmetic and data alignment in Pandas refer to how mathematical operations are performed between Series and
DataFrames when they have different shapes or indices. Pandas automatically aligns data based on the labels of
the objects involved in the operation, which ensures that the result of the operation maintains data integrity and
is aligned correctly. Here are some key aspects of arithmetic and data alignment in Pandas:
1. Automatic Alignment:
When you perform mathematical operations (e.g., addition, subtraction, multiplication, division) between two
Series or DataFrames, Pandas aligns the data based on their labels (index or column names) and performs the
operation only on matching labels.
series1 = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
series2 = pd.Series([4, 5, 6], index=['B', 'C', 'D'])
result = series1 + series2
In this example, the result Series will have NaN values for the 'A' and 'D' labels because those labels don't match
between series1 and series2. The matching labels 'B' and 'C' hold 6.0 (2 + 4) and 8.0 (3 + 5).
2. Missing Data (NaN):
When labels don't match, Pandas fills in the result with NaN (Not-a-Number) to indicate missing values.
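The sketch below shows the NaN-filled result for two Series with partly overlapping labels, and one way to avoid the NaN values: Series.add() with fill_value, which treats a missing entry as the given value before adding.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3], index=["A", "B", "C"])
series2 = pd.Series([4, 5, 6], index=["B", "C", "D"])

# Plain "+" leaves NaN wherever a label exists in only one Series
result = series1 + series2
print(result)

# With fill_value=0, a label missing from one Series counts as 0
filled = series1.add(series2, fill_value=0)
print(filled)
```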
3. DataFrame Alignment:
The same principles apply to DataFrames when performing operations between them. The alignment occurs both
for rows (based on the index) and columns (based on column names).
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['X', 'Y'])
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]}, index=['Y', 'Z'])
result = df1 + df2
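In this case, result has rows 'X', 'Y', 'Z' and columns 'A', 'B', 'C'. Columns 'A' and 'C' are entirely NaN because
those columns don't exist in both df1 and df2, and rows 'X' and 'Z' are NaN because each appears in only one
DataFrame. The only non-missing value is 9.0 at row 'Y', column 'B' (4 + 5).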
4. Handling Missing Data:
You can use methods like .fillna() to replace NaN values with a specific value or use .dropna() to remove rows or
columns with missing data.
result_filled = result.fillna(0) # Replace NaN with 0
result_dropped = result.dropna() # Remove rows or columns with NaN values
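Applied to two small DataFrames whose row and column labels only partly overlap, these methods behave as sketched below. The axis and how arguments in the last line are standard dropna() parameters, shown here as one possible alternative to the default.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["X", "Y"])
df2 = pd.DataFrame({"B": [5, 6], "C": [7, 8]}, index=["Y", "Z"])
result = df1 + df2

print(result)                             # only ('Y', 'B') is non-missing
print(result.fillna(0))                   # NaN replaced with 0
print(result.dropna())                    # empty: every row contains a NaN
print(result.dropna(axis=1, how="all"))   # drops the all-NaN columns 'A' and 'C'
```

By default dropna() removes any row containing a NaN, which here removes everything, so it is worth checking how sparse a result is before dropping.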
5. Alignment with Broadcasting:
Pandas allows you to perform operations between a Series and a scalar value, and it broadcasts the scalar to
match the shape of the Series.
series = pd.Series([1, 2, 3])
scalar = 2
result = series * scalar
In this example, result will be a Series with values [2, 4, 6].
Automatic alignment in Pandas is a powerful feature that simplifies data manipulation and allows you to work
with datasets of different shapes without needing to manually align them. It ensures that operations are
performed in a way that maintains the integrity and structure of your data.
ARITHMETIC AND DATA ALIGNMENT IN NUMPY
NumPy arrays have no row or column labels, so NumPy does not align data by label the way Pandas does. Arithmetic
is performed element-wise by position on homogeneous arrays (arrays of the same data type), and arrays of
different shapes are combined through broadcasting. Here's how this works in NumPy:
Broadcasting Instead of Alignment:
NumPy arrays perform element-wise operations. If you perform an operation between two NumPy arrays of
different shapes, NumPy tries to broadcast the smaller array across the larger one. This succeeds only when
the shapes are compatible under the broadcasting rules.
import numpy as np
arr1 = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
arr2 = np.array([10, 20, 30])  # shape (3,)
result = arr1 + arr2
In this example, NumPy broadcasts arr2 across each row of arr1, resulting in [[11, 22, 33], [14, 25, 36]]. By
contrast, adding np.array([1, 2, 3]) and np.array([4, 5]) raises a ValueError, because the shapes (3,) and (2,)
are not compatible.
Broadcasting Rules:
NumPy follows specific rules when broadcasting arrays:
If the arrays have a different number of dimensions, pad the smaller shape with ones on the left side.
Compare the shapes element-wise, starting from the right. If dimensions are equal or one of them is 1, they are
compatible.
If the dimensions are incompatible, NumPy raises a "ValueError: operands could not be broadcast together" error.
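The rules above can be checked directly. This sketch assumes NumPy 1.20 or later, where np.broadcast_shapes() is available; the variable error_message is illustrative.

```python
import numpy as np

# (2, 3) with (3,): the shorter shape is padded to (1, 3), then stretched to (2, 3)
print(np.broadcast_shapes((2, 3), (3,)))     # (2, 3)

# (3, 1) with (1, 4): each size-1 dimension is stretched
print(np.broadcast_shapes((3, 1), (1, 4)))   # (3, 4)

# (3,) with (2,): the trailing dimensions differ and neither is 1
try:
    np.array([1, 2, 3]) + np.array([4, 5])
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```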
Handling Missing Data:
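Because NumPy has no label alignment, an operation never introduces missing values the way Pandas does. (NumPy
does provide np.nan for floating-point arrays, along with NaN-aware functions such as np.nanmean().) If you
perform operations between arrays with mismatched shapes, NumPy will either broadcast or raise an error,
depending on whether broadcasting is possible.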
Element-Wise Operations:
NumPy performs arithmetic operations element-wise by default. This means that each element in the resulting
array is the result of applying the operation to the corresponding elements in the input arrays.
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result = arr1 * arr2
In this case, result will be [4, 10, 18].
WHAT IS VECTORIZATION?
• Vectorization speeds up Python code by replacing explicit loops with operations on whole arrays.
• Using vectorized functions can substantially reduce the running time of the code.
• Common vector operations include the dot product (also known as the scalar product, because it produces a
single number), the outer product (which, for two vectors of length n, results in a square n x n matrix), and
element-wise multiplication (which multiplies the elements at the same index and leaves the dimensions
unchanged).
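A short sketch of the three vector operations named above, using two illustrative vectors of length 3:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

dot = np.dot(a, b)       # 1*4 + 2*5 + 3*6 = 32, a single number
outer = np.outer(a, b)   # 3 x 3 matrix with outer[i, j] = a[i] * b[j]
elementwise = a * b      # [4, 10, 18], same shape as the inputs

print(dot)
print(outer)
print(elementwise)
```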
APPLYING FUNCTIONS AND MAPPING
In NumPy, you can apply functions and perform element-wise operations on arrays using various techniques,
including built-in vectorized functions, np.apply_along_axis(), and np.vectorize(), which can also be used
for mapping operations. Here's an overview of these approaches:
Vectorized Functions:
NumPy is designed to work efficiently with vectorized operations, meaning you can apply functions to entire
arrays or elements of arrays without the need for explicit loops. NumPy provides built-in functions that can be
applied element-wise to arrays.
import numpy as np
arr = np.array([1, 2, 3, 4])
# Applying a function element-wise
result = np.square(arr) # Square each element
In this example, the np.square() function is applied element-wise to the arr array.
HOW TO CREATE YOUR OWN UFUNC
To create your own ufunc (universal function), you define a function as you would any normal Python
function, then convert it into a NumPy ufunc with the frompyfunc() function.
• ufuncs are used to implement vectorization in NumPy, which is much faster than iterating over elements.
• They also provide broadcasting and additional methods like reduce, accumulate etc. that are very helpful for
computation.
• ufuncs also take additional arguments, like:
The frompyfunc() function takes the following arguments:
1. function - the Python function to convert.
2. inputs - the number of input arguments (arrays).
3. outputs - the number of output arrays.
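A minimal sketch of these three arguments in use. The function my_add is a hypothetical example. One caveat: a ufunc built this way still calls the Python function once per element and returns an array of dtype object, so it does not gain the speed of NumPy's built-in ufuncs.

```python
import numpy as np

def my_add(x, y):
    """Ordinary Python function taking two scalars."""
    return x + y

# 2 inputs, 1 output
my_add_ufunc = np.frompyfunc(my_add, 2, 1)

result = my_add_ufunc([1, 2, 3, 4], [5, 6, 7, 8])
print(result)                        # [6 8 10 12]
print(result.dtype)                  # object
print(type(my_add_ufunc).__name__)   # ufunc

# Cast when a numeric dtype is needed
result_int = result.astype(np.int64)
```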
np.apply_along_axis():
You can use the np.apply_along_axis() function to apply a function along a specified axis of a multi-dimensional
array. This is useful when you want to apply a function to each row or column of a 2D array.
import numpy as np
arr = np.array([[1, 2, 3],
                [4, 5, 6]])
# Apply a function along the rows (axis=1)
def sum_of_row(row):
    return np.sum(row)
result = np.apply_along_axis(sum_of_row, axis=1, arr=arr)
In this example, sum_of_row is applied to each row along axis=1, resulting in a new 1D array.
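For a simple reduction like this one, the axis argument of the built-in method gives the same answer without a Python-level function call per row. The sketch below compares the two; via_apply and via_axis are illustrative names.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

def sum_of_row(row):
    return np.sum(row)

via_apply = np.apply_along_axis(sum_of_row, axis=1, arr=arr)
via_axis = arr.sum(axis=1)   # built-in reduction over each row

print(via_apply)   # [ 6 15]
print(via_axis)    # [ 6 15]
```

np.apply_along_axis() remains useful when the per-row function has no built-in equivalent.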
np.vectorize():
The np.vectorize() function allows you to create a vectorized version of a Python function, which can then be
applied element-wise to NumPy arrays.
import numpy as np
arr = np.array([1, 2, 3, 4])
# Define a Python function
def my_function(x):
    return x * 2
# Create a vectorized version of the function
vectorized_func = np.vectorize(my_function)
# Apply the vectorized function to the array
result = vectorized_func(arr)
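This approach is useful when you have a custom function that you want to apply to an array. Note that
np.vectorize() is provided mainly for convenience rather than performance, because it still calls the Python
function once per element.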
Mapping with np.vectorize():
You can use np.vectorize() to map a function to each element of an array.
import numpy as np
arr = np.array([1, 2, 3, 4])
# Define a Python function
def my_function(x):
    return x * 2
# Create a vectorized version of the function
vectorized_func = np.vectorize(my_function)
# Map the function to each element
result = vectorized_func(arr)
This approach is similar to applying a function element-wise but can be used for more complex
mapping operations.
These methods allow you to apply functions and perform mapping operations efficiently on NumPy
arrays, making it a powerful library for numerical and scientific computing tasks.
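As a sketch of a mapping that involves a branch, the example below applies a hypothetical function clip_negative through np.vectorize() and compares it with the built-in np.maximum(), which expresses the same mapping as a true array operation.

```python
import numpy as np

arr = np.array([-2, -1, 0, 3])

def clip_negative(x):
    # Plain Python branch on a single scalar value
    return x if x > 0 else 0

mapped = np.vectorize(clip_negative)(arr)

# The same mapping written with a built-in element-wise function
mapped_builtin = np.maximum(arr, 0)

print(mapped)           # [0 0 0 3]
print(mapped_builtin)   # [0 0 0 3]
```

When a built-in equivalent exists, it is usually the better choice; np.vectorize() is the fallback for logic that has none.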
SORTING AND RANKING
Sorting and ranking are common data manipulation operations in data analysis and are widely supported in
Python through libraries like NumPy and Pandas. These operations help organize data in a desired order or
rank elements based on specific criteria. Here's how to perform sorting and ranking in both libraries:
Sorting in NumPy:
In NumPy, you can sort NumPy arrays using the np.sort() and np.argsort() functions.
np.sort(): This function returns a new sorted array without modifying the original array.
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
sorted_arr = np.sort(arr)
• np.sort() returns the sorted array, whereas np.argsort() returns an array of the corresponding indices.
For example, sorting the unsorted array [10, 6, 8, 2, 5, 4, 9, 1] produces the sorted
array [1, 2, 4, 5, 6, 8, 9, 10].
np.argsort(): This function returns the indices that would sort the array. You can use these indices to sort the
original array.
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
indices = np.argsort(arr)
sorted_arr = arr[indices]
Sorting in Pandas:
In Pandas, you can sort Series and DataFrames using the sort_values() method. You can specify the column(s)
to sort by and the sorting order.
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 22, 35]}
df = pd.DataFrame(data)
# Sort by 'Age' column in ascending order
sorted_df = df.sort_values(by='Age', ascending=True)
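The sketch below shows what the sorted result looks like for this data, and a descending variant. Using reset_index(drop=True) to renumber the rows is one optional follow-up step, not something the original example does.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie", "David"],
                   "Age": [25, 30, 22, 35]})

by_age = df.sort_values(by="Age", ascending=True)
print(by_age["Name"].tolist())   # ['Charlie', 'Alice', 'Bob', 'David']
print(by_age.index.tolist())     # original row labels are kept: [2, 0, 1, 3]

# Descending order with a fresh 0..n-1 index
by_age_desc = df.sort_values(by="Age", ascending=False).reset_index(drop=True)
print(by_age_desc["Name"].tolist())   # ['David', 'Bob', 'Alice', 'Charlie']
```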
Ranking in NumPy:
NumPy doesn't have a built-in ranking function, but you can apply np.argsort() twice to get the ranking of
elements. You can then use these rankings to create a ranked array. With this approach, tied values receive
distinct consecutive ranks.
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
indices = np.argsort(arr)
ranked_arr = np.argsort(indices) + 1 # Add 1 to start ranking from 1 instead of 0
Ranking in Pandas:
In Pandas, you can rank data using the rank() method. You can specify the sorting order and how to handle
ties (e.g., assigning the average rank to tied values).
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 22, 30]}
df = pd.DataFrame(data)
# Rank by 'Age' column in descending order and assign average rank to tied values
df['Rank'] = df['Age'].rank(ascending=False, method='average')
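The two ranking approaches treat ties differently, which the sketch below makes visible on the same array. Passing kind="stable" to np.argsort() is an assumption added here so that tied values are ranked in their original order; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])

# NumPy: argsort applied twice gives ordinal ranks (ties get distinct ranks)
order = np.argsort(arr, kind="stable")
ordinal_ranks = np.argsort(order) + 1
print(ordinal_ranks.tolist())   # [4, 1, 6, 2, 7, 10, 3, 9, 8, 5]

# pandas: tied values share the average of the ranks they occupy
average_ranks = pd.Series(arr).rank(method="average")
print(average_ranks.tolist())   # [4.5, 1.5, 6.0, 1.5, 7.5, 10.0, 3.0, 9.0, 7.5, 4.5]

# The 'Age' example from the text: the two 30s share rank 1.5
age_ranks = pd.Series([25, 30, 22, 30]).rank(ascending=False, method="average")
print(age_ranks.tolist())       # [3.0, 1.5, 4.0, 1.5]
```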
SUMMARIZING AND COMPUTING DESCRIPTIVE STATISTICS
1. Summary Statistics:
NumPy provides functions to compute summary statistics directly on arrays.
import numpy as np
data = np.array([25, 30, 22, 35, 28])
mean = np.mean(data)
median = np.median(data)
std_dev = np.std(data)
variance = np.var(data)
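One detail worth knowing about these functions: np.std() and np.var() divide by n by default (population statistics). The sketch below shows the values for this data and the ddof=1 variant, which divides by n - 1 (sample statistics).

```python
import numpy as np

data = np.array([25, 30, 22, 35, 28])

mean = np.mean(data)                # 28.0
median = np.median(data)            # 28.0
pop_var = np.var(data)              # divides by n (ddof=0, the default)
sample_var = np.var(data, ddof=1)   # divides by n - 1
sample_std = np.std(data, ddof=1)

print(mean, median)
print(pop_var, sample_var, sample_std)
```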
2. Percentiles and Quartiles:
You can compute specific percentiles and quartiles using the np.percentile() function.
percentile_25 = np.percentile(data, 25)
percentile_75 = np.percentile(data, 75)
3. Correlation and Covariance:
You can compute correlation and covariance between arrays using np.corrcoef() and np.cov().
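data1 = np.array([1, 2, 3, 4, 5])  # illustrative values
data2 = np.array([2, 3, 4, 5, 6])  # illustrative values
correlation_matrix = np.corrcoef(data1, data2)
covariance_matrix = np.cov(data1, data2)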
CORRELATION AND COVARIANCE
In NumPy, you can compute correlation and covariance between arrays using the np.corrcoef() and np.cov()
functions, respectively. These functions are useful for analyzing relationships and dependencies between
variables. Here's how to use them:
Computing Correlation Coefficient (Correlation):
The correlation coefficient measures the strength and direction of a linear relationship between two variables.
It ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear
correlation.
import numpy as np
# Create two arrays representing variables
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 5, 6])
# Compute the correlation coefficient between x and y
correlation_matrix = np.corrcoef(x, y)
# The correlation coefficient is in the (0, 1) element of the matrix
correlation_coefficient = correlation_matrix[0, 1]
In this example, correlation_coefficient will contain the Pearson correlation coefficient between x and y. Because
y is exactly x + 1, the value is 1.0 (up to floating-point rounding).
Computing Covariance:
Covariance measures the degree to which two variables change together. Positive values indicate a positive
relationship (both variables increase or decrease together), while negative values indicate an inverse
relationship (one variable increases as the other decreases).
import numpy as np
# Create two arrays representing variables
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 5, 6])
# Compute the covariance between x and y
covariance_matrix = np.cov(x, y)
# The covariance is in the (0, 1) element of the matrix
covariance = covariance_matrix[0, 1]
In this example, covariance will contain the covariance between x and y. By default, np.cov() computes the sample
covariance (dividing by n - 1), so the value here is 2.5.
Both np.corrcoef() and np.cov() also accept a 2D array in which each row holds the observations of one variable
(or each column, with rowvar=False), allowing you to compute correlations and covariances for multiple
variables simultaneously. For example, if you have a dataset with multiple columns, you can compute the
correlation matrix or covariance matrix for all pairs of variables.
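A minimal sketch of the multi-variable case, assuming three made-up variables stored one per row of a 2D array:

```python
import numpy as np

# Three illustrative variables, one per row, five observations each
data = np.array([
    [1, 2, 3, 4, 5],
    [2, 3, 4, 5, 6],
    [5, 4, 3, 2, 1],
])

corr = np.corrcoef(data)   # 3 x 3 correlation matrix
cov = np.cov(data)         # 3 x 3 sample covariance matrix

print(np.round(corr, 2))
print(cov)
```

Entry [i, j] of each matrix relates variable i to variable j, so the first two rows correlate at 1 and the first and third at -1.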

 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxUnboundStockton
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 

Recently uploaded (20)

EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
MICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxMICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptx
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docx
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 

Unit 3_Numpy_Vsp.pptx

  • 1. UNIT 3: BASICS OF NUMPY
  • 2. NUMPY BASICS: ARRAYS AND VECTORIZED COMPUTATION
    NumPy (Numerical Python) is a fundamental Python library for numerical and scientific computing. It provides arrays (multi-dimensional, homogeneous data structures) and a wide range of mathematical functions for efficient vectorized computation. This guide covers the basics of working with NumPy arrays and performing vectorized computations.
    Installing NumPy
    Before using NumPy, make sure it is installed. You can install it using pip:
    pip install numpy
  • 3. Importing NumPy
    To use NumPy in your Python code, import it:
    import numpy as np
    By convention, NumPy is imported as np for brevity.
    Creating NumPy Arrays
    You can create NumPy arrays in several ways:
    1. From Python Lists:
    arr = np.array([1, 2, 3, 4, 5])
    2. Using NumPy Functions:
    zeros_arr = np.zeros(5)  # Creates an array of zeros with 5 elements
    ones_arr = np.ones(3)  # Creates an array of ones with 3 elements
    rand_arr = np.random.rand(3, 3)  # Creates a 3x3 array of random values in [0, 1)
  • 4. 3. Using NumPy's Range Function:
    range_arr = np.arange(0, 10, 2)  # Creates an array with values [0, 2, 4, 6, 8]
  • 5. BASIC ARRAY OPERATIONS
    Once you have NumPy arrays, you can perform various operations on them:
    1. Element-wise Operations:
    NumPy lets you perform element-wise operations such as addition, subtraction, multiplication, and division:
    a = np.array([1, 2, 3])
    b = np.array([4, 5, 6])
    c = a + b  # Element-wise addition: [5, 7, 9]
    d = a * b  # Element-wise multiplication: [4, 10, 18]
  • 6. 2. Indexing and Slicing:
    You can access individual elements and slices of NumPy arrays using indexing and slicing:
    arr = np.array([0, 1, 2, 3, 4, 5])
    element = arr[2]  # Access element at index 2 (value: 2)
    sub_array = arr[2:5]  # Slice from index 2 to 4 (values: [2, 3, 4])
    3. Array Shape and Reshaping:
    You can check and change the shape of NumPy arrays:
    arr = np.array([[1, 2, 3], [4, 5, 6]])
    shape = arr.shape  # Get the shape (2, 3)
    reshaped = arr.reshape(3, 2)  # Reshape the array to (3, 2)
    4. Aggregation Functions:
    NumPy provides functions to compute statistics on arrays:
    arr = np.array([1, 2, 3, 4, 5])
    mean = np.mean(arr)  # Calculate the mean (average)
    max_val = np.max(arr)  # Find the maximum value
    min_val = np.min(arr)  # Find the minimum value
  • 7. VECTORIZED COMPUTATION
    Vectorized computation in Python means performing operations on entire arrays or sequences of data without explicit loops. This approach relies on highly optimized, low-level code to achieve faster and more efficient computations. The primary library for vectorized computation in Python is NumPy.
    Traditional Loop-Based Computation
    In traditional Python programming, you might use explicit loops to perform operations on arrays or lists. For example:
    # Using loops to add two lists element-wise
    list1 = [1, 2, 3]
    list2 = [4, 5, 6]
    result = []
    for i in range(len(list1)):
        result.append(list1[i] + list2[i])
    # Result: [5, 7, 9]
  • 8. Vectorized Computation with NumPy
    NumPy lets you perform operations on entire arrays, making code more concise and efficient. Here's how you can achieve the same result using NumPy:
    import numpy as np
    # Using NumPy for element-wise addition
    arr1 = np.array([1, 2, 3])
    arr2 = np.array([4, 5, 6])
    result = arr1 + arr2
    # Result: array([5, 7, 9])
  • 9. INTRODUCTION TO PANDAS DATA STRUCTURES
    Pandas is a popular Python library for data manipulation and analysis. It provides two primary data structures: the DataFrame and the Series. These data structures are designed to handle structured data, making it easier to work with datasets in a tabular format.
    DataFrame:
    • A DataFrame is a 2-dimensional, labeled data structure that resembles a spreadsheet or SQL table.
    • It consists of rows and columns, where each column can have a different data type (e.g., integers, floats, strings, or even custom data types).
    • You can think of a DataFrame as a collection of Series objects, where each Series is a column.
    • DataFrames are highly versatile and are used for a wide range of data analysis tasks, including data cleaning, exploration, and transformation.
  • 10. Here's a basic example of how to create a DataFrame using Pandas:
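A minimal sketch of such an example, building a DataFrame from a dictionary of columns. The column names and values are made-up sample data, chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data: each dictionary key becomes a column.
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 22],
    "City": ["Pune", "Delhi", "Chennai"],
}
df = pd.DataFrame(data)

print(df)         # Tabular view with a default integer index 0, 1, 2
print(df.dtypes)  # Each column keeps its own dtype
```

Each column here is a Series, which is why columns can hold different data types while the table as a whole stays aligned on one shared index.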
  • 11. Series:
    • A Series is a one-dimensional labeled array that can hold data of any data type.
    • It is like a column in a DataFrame or a single variable in statistics.
    • Series objects are commonly used for time series data, as well as other one-dimensional data.
    Key characteristics of a Pandas Series:
    • Homogeneous Data: Like a NumPy array, and unlike a Python list, a Series has a single dtype that applies to all of its values. For example, a Series created from integer values has an integer dtype. Values of mixed types are stored under the generic object dtype.
    • Labeled Data: A Series has two parts: the data itself and an associated index. The index provides a label for each data point. By default, a Series has an integer index starting from 0, but you can specify custom labels if needed.
    • Size and Shape: A Series has a size (the number of elements) and a one-dimensional shape, but it does not have rows and columns like a DataFrame.
  • 12. import pandas as pd
    # Create a Series from a list
    data = [10, 20, 30, 40, 50]
    series = pd.Series(data)
    # Display the Series
    print(series)
    Output:
    0    10
    1    20
    2    30
    3    40
    4    50
    dtype: int64
  • 13. Some common tasks you can perform with Pandas:
    • Data Loading: Pandas can read data from various sources, including CSV files, Excel spreadsheets, SQL databases, and more.
    • Data Cleaning: You can clean and preprocess data by handling missing values, removing duplicates, and transforming data types.
    • Data Selection: Easily select specific rows and columns of interest using various indexing techniques.
    • Data Aggregation: Perform group-by operations, calculate statistics, and aggregate data based on specific criteria.
    • Data Visualization: You can use Pandas together with visualization libraries like Matplotlib and Seaborn to create informative plots and charts.
  • 14. DataFrame
    A DataFrame in Python typically refers to the two-dimensional, size-mutable, and potentially heterogeneous tabular data structure provided by the Pandas library. It is a fundamental data structure for data manipulation and analysis in Python. Here's how you can work with DataFrames using Pandas:
    1. Import Pandas: First, you need to import the Pandas library.
    import pandas as pd
    2. Creating a DataFrame: You can create a DataFrame in several ways. Here are a few common methods:
    • From a dictionary:
    data = {'Column1': [value1, value2, ...], 'Column2': [value1, value2, ...]}
    df = pd.DataFrame(data)
  • 15. • From a list of lists:
    data = [[value1, value2], [value3, value4]]
    df = pd.DataFrame(data, columns=['Column1', 'Column2'])
    • From a CSV file:
    df = pd.read_csv('file.csv')
    3. Viewing Data: You can use various methods to view and explore your DataFrame:
    • df.head(): Displays the first few rows of the DataFrame.
    • df.tail(): Displays the last few rows of the DataFrame.
    • df.shape: Returns the number of rows and columns.
    • df.columns: Returns the column names.
    • df.info(): Provides information about the DataFrame, including data types and non-null counts.
  • 16. 4. Selecting Data: You can select specific columns or rows from a DataFrame using indexing or filtering. For example:
    df['Column1']  # Select a specific column
    df[['Column1', 'Column2']]  # Select multiple columns
    df[df['Column1'] > 5]  # Filter rows based on a condition
    5. Modifying Data: You can modify the DataFrame by adding or modifying columns, updating values, or appending rows. DataFrame.append() was removed in pandas 2.0, so rows are appended with pd.concat(). For example:
    df['NewColumn'] = [new_value1, new_value2, ...]  # Add a new column
    df.at[index, 'Column1'] = new_value  # Update a specific value
    new_row = pd.DataFrame([{'Column1': value1, 'Column2': value2}])
    df = pd.concat([df, new_row], ignore_index=True)  # Append a new row
  • 17. 6. Data Analysis: Pandas provides various functions for data analysis, such as describe(), groupby(), agg(), and more.
    7. Saving Data: You can save the DataFrame to a CSV file or other formats:
    df.to_csv('output.csv', index=False)
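The analysis functions named above can be sketched as follows. The DataFrame and its column names (Dept, Score) are hypothetical sample data, not part of the original slides.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "Dept": ["A", "B", "A", "B"],
    "Score": [80, 90, 70, 100],
})

# describe() returns count, mean, std, min, quartiles and max for a column.
summary = df["Score"].describe()

# groupby() splits rows by Dept; agg() computes several statistics per group.
by_dept = df.groupby("Dept")["Score"].agg(["mean", "max"])

print(summary)
print(by_dept)
```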
  • 18. INDEX OBJECTS: INDEXING, SELECTION, AND FILTERING
    In Pandas, the Index object is a fundamental component of both Series and DataFrame data structures. It provides the labels for the rows or columns of your data. You can use indexing, selection, and filtering techniques with these indexes to access specific data points or subsets of your data. Here's how you can work with index objects in Pandas:
    1. Indexing: Indexing lets you access specific elements or rows in your data. Use .loc[] for label-based indexing and .iloc[] for integer-position-based indexing.
    • Label-based indexing:
    df.loc['label']  # Access a specific row by its label
    df.loc['label', 'column_name']  # Access a specific element by label and column name
  • 19. • Integer-based indexing:
    df.iloc[0]  # Access the first row
    df.iloc[0, 1]  # Access an element by row and column position
    2. Selection: You can use various methods to select specific data based on conditions or criteria.
    • Select rows based on a condition:
    df[df['Column'] > 5]  # Select rows where 'Column' is greater than 5
    • Select rows by multiple conditions:
    df[(df['Column1'] > 5) & (df['Column2'] < 10)]  # Rows where 'Column1' > 5 and 'Column2' < 10
  • 20. 3. Filtering: Filtering means creating a boolean mask from a condition and then applying that mask to your DataFrame to select the rows that meet the condition.
    Create a boolean mask:
    condition = df['Column'] > 5
    Apply the mask to the DataFrame:
    filtered_df = df[condition]
    4. Setting a New Index: You can set a specific column as the index of your DataFrame using the .set_index() method.
    df.set_index('Column_Name', inplace=True)
  • 21. 5. Resetting the Index: If you've set a column as the index and want to revert to the default integer-based index, you can use the .reset_index() method.
    df.reset_index(inplace=True)
    6. Multi-level Indexing: You can create DataFrames with multi-level indexes, allowing you to work with more complex hierarchical data structures.
    df.set_index(['Index1', 'Index2'], inplace=True)
    Index objects make it possible to retrieve, filter, and restructure data by label rather than by position alone.
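The indexing steps above can be tried end to end on a small table. This is a sketch with hypothetical sample data; it uses the non-inplace form of set_index() and reset_index(), which returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 22]})

indexed = df.set_index("Name")          # 'Name' becomes the row labels
age_bob = indexed.loc["Bob", "Age"]     # Label-based lookup
first_age = indexed.iloc[0, 0]          # Position-based lookup (row 0, column 0)
over_23 = indexed[indexed["Age"] > 23]  # Boolean-mask filtering
restored = indexed.reset_index()        # Back to the default integer index

print(age_bob, first_age)
print(over_23)
print(restored)
```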
  • 22. ARITHMETIC AND DATA ALIGNMENT IN PANDAS
    Arithmetic and data alignment in Pandas refer to how mathematical operations are performed between Series and DataFrames when they have different shapes or indices. Pandas automatically aligns data based on the labels of the objects involved in the operation, which ensures that the result maintains data integrity and is aligned correctly. Here are some key aspects of arithmetic and data alignment in Pandas:
    1. Automatic Alignment: When you perform mathematical operations (e.g., addition, subtraction, multiplication, division) between two Series or DataFrames, Pandas aligns the data based on their labels (index or column names). The result contains the union of the labels, and the operation is computed only where labels match.
    series1 = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
    series2 = pd.Series([4, 5, 6], index=['B', 'C', 'D'])
    result = series1 + series2
    In this example, the result Series will have NaN values for the 'A' and 'D' labels because those labels don't match between series1 and series2.
  • 23. 2. Missing Data (NaN): When labels don't match, Pandas fills in the result with NaN (Not-a-Number) to indicate missing values.
    3. DataFrame Alignment: The same principles apply to DataFrames when performing operations between them. The alignment occurs both for rows (based on the index) and columns (based on column names).
    df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['X', 'Y'])
    df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]}, index=['Y', 'Z'])
    result = df1 + df2
    In this case, only the cell at row 'Y', column 'B' has a value (4 + 5 = 9), because 'Y' and 'B' are the only labels present in both df1 and df2. Every other cell is NaN, including all of columns 'A' and 'C' and all of rows 'X' and 'Z'.
    4. Handling Missing Data: You can use methods like .fillna() to replace NaN values with a specific value or use .dropna() to remove rows or columns with missing data.
    result_filled = result.fillna(0)  # Replace NaN with 0
    result_dropped = result.dropna()  # Drop rows containing any NaN (pass axis=1 to drop columns instead)
    Because every row of this particular result contains a NaN, result.dropna() returns an empty DataFrame.
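The Series alignment described above can be checked directly. This sketch reuses series1 and series2 from the slides; the use of Series.add() with fill_value is an addition here, shown as one way to treat a label missing on one side as 0 instead of producing NaN.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3], index=["A", "B", "C"])
series2 = pd.Series([4, 5, 6], index=["B", "C", "D"])

# Plain addition: labels are unioned, unmatched labels become NaN.
aligned = series1 + series2

# add() with fill_value substitutes 0 for a label missing on one side.
filled = series1.add(series2, fill_value=0)

print(aligned)
print(filled)
```

Note that the result is a float Series in both cases, because NaN is a floating-point value and forces the integer data to be upcast.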
  • 24. 5. Alignment with Broadcasting: Pandas lets you perform operations between a Series and a scalar value, and it broadcasts the scalar to match the shape of the Series.
    series = pd.Series([1, 2, 3])
    scalar = 2
    result = series * scalar
    In this example, result will be a Series with values [2, 4, 6].
    Automatic alignment simplifies data manipulation: you can combine datasets of different shapes without aligning them manually, and operations preserve the labels and structure of your data.
  • 25. ARITHMETIC AND DATA ALIGNMENT IN NUMPY
    NumPy also performs element-wise arithmetic on arrays, but unlike Pandas it has no labels to align on: elements are matched by position. NumPy is primarily focused on numerical computations with homogeneous arrays (arrays whose elements share one data type). Here's how arithmetic and alignment work in NumPy:
    Automatic Alignment (Broadcasting): When two arrays have different but compatible shapes, NumPy broadcasts the smaller array to the shape of the larger one and then applies the operation element-wise.
    import numpy as np
    arr1 = np.array([1, 2, 3])
    arr2 = np.array([4])
    result = arr1 + arr2
    In this example, NumPy broadcasts arr2 (shape (1,)) to the shape of arr1 (shape (3,)), resulting in [5, 6, 7]. With arr2 = np.array([4, 5]) the shapes (3,) and (2,) are incompatible, and the addition raises a ValueError instead of producing a result.
  • 26. Broadcasting Rules: NumPy follows specific rules when broadcasting arrays:
    1. If the arrays have a different number of dimensions, pad the smaller shape with ones on the left side.
    2. Compare the shapes dimension by dimension, starting from the right. Two dimensions are compatible if they are equal or one of them is 1.
    3. If any pair of dimensions is incompatible, NumPy raises a "ValueError: operands could not be broadcast together" error.
    Handling Missing Data: NumPy never inserts NaN to reconcile mismatched arrays the way Pandas does for unmatched labels. If you perform operations between arrays with mismatched shapes, NumPy will either broadcast or raise an error, depending on whether broadcasting is possible. (np.nan exists as a floating-point value, but you have to place it in an array yourself.)
    Element-Wise Operations: NumPy performs arithmetic operations element-wise by default. This means that each element in the resulting array is the result of applying the operation to the corresponding elements in the input arrays.
    arr1 = np.array([1, 2, 3])
    arr2 = np.array([4, 5, 6])
    result = arr1 * arr2
    In this case, result will be [4, 10, 18].
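The broadcasting rules above can be seen with one compatible and one incompatible pair of shapes. The array values are arbitrary and chosen only for illustration.

```python
import numpy as np

row = np.array([1, 2, 3])      # shape (3,)
col = np.array([[10], [20]])   # shape (2, 1)

# (3,) is padded to (1, 3); (1, 3) and (2, 1) are compatible -> result (2, 3).
grid = row + col

# Shapes (3,) and (2,): trailing dimensions 3 and 2 differ and neither is 1.
try:
    np.array([1, 2, 3]) + np.array([4, 5])
    broadcast_ok = True
except ValueError:
    broadcast_ok = False

print(grid)
print(broadcast_ok)
```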
  • 27. WHAT IS VECTORIZATION?
    • Vectorization speeds up Python code by replacing explicit loops with operations on whole arrays.
    • Using vectorized functions can reduce the running time of code considerably.
    • Common operations on vectors include the dot product (also known as the scalar product, because it produces a single number), the outer product (which for two vectors of length n produces an n x n matrix), and element-wise multiplication (which multiplies the elements at the same index and leaves the shape unchanged).
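The three vector operations listed above correspond to the following NumPy calls. The vectors a and b are arbitrary sample values.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

dot = np.dot(a, b)      # Scalar product: 1*4 + 2*5 + 3*6
outer = np.outer(a, b)  # 3 x 3 matrix with outer[i, j] = a[i] * b[j]
elementwise = a * b     # Same shape as the inputs

print(dot)
print(outer)
print(elementwise)
```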
  • 28. APPLYING FUNCTIONS AND MAPPING
    In NumPy, you can apply functions and perform element-wise operations on arrays using several techniques: built-in vectorized functions, np.apply_along_axis(), and np.vectorize(), which can also be used for mapping operations. Here's an overview of these approaches:
    Vectorized Functions: NumPy is designed to work efficiently with vectorized operations, meaning you can apply functions to entire arrays without explicit loops. NumPy provides built-in functions that are applied element-wise to arrays.
    import numpy as np
    arr = np.array([1, 2, 3, 4])
    # Applying a function element-wise
    result = np.square(arr)  # Square each element
    In this example, the np.square() function is applied element-wise to the arr array.
  • 30. HOW TO CREATE YOUR OWN UFUNC
    To create your own ufunc (universal function), define an ordinary Python function and then convert it into a ufunc with NumPy's frompyfunc() function.
    • ufuncs are used to implement vectorization in NumPy, which is much faster than iterating over elements in Python.
    • They also provide broadcasting and additional methods like reduce and accumulate that are very helpful for computation.
    • ufuncs also take additional arguments, such as where, dtype, and out.
    The frompyfunc() function takes the following arguments:
    1. function - the function to convert.
    2. inputs - the number of input arguments (arrays).
    3. outputs - the number of output arrays.
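A minimal sketch of the frompyfunc() call described above. The function name my_add and the input values are illustrative assumptions. A ufunc built this way still calls the Python function once per element and returns an array of dtype object, so it offers ufunc features such as broadcasting rather than the speed of NumPy's compiled ufuncs.

```python
import numpy as np

# An ordinary Python function taking two inputs and returning one output.
def my_add(x, y):
    return x + y

# Arguments: the function, number of inputs (2), number of outputs (1).
add_ufunc = np.frompyfunc(my_add, 2, 1)

result = add_ufunc([1, 2, 3, 4], [5, 6, 7, 8])

print(type(add_ufunc))
print(result, result.dtype)
```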
  • 32. np.apply_along_axis(): You can use the np.apply_along_axis() function to apply a function along a specified axis of a multi-dimensional array. This is useful when you want to apply a function to each row or column of a 2D array.
    import numpy as np
    arr = np.array([[1, 2, 3], [4, 5, 6]])
    # Apply a function along the rows (axis=1)
    def sum_of_row(row):
        return np.sum(row)
    result = np.apply_along_axis(sum_of_row, axis=1, arr=arr)
    In this example, sum_of_row is applied to each row along axis=1, resulting in a new 1D array.
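As a sketch of when np.apply_along_axis() is worth using: for a plain row sum the built-in axis argument gives the same answer more directly, while a custom per-row function with no built-in equivalent is the case that needs it. The helper value_range below is a hypothetical example function.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Built-in reductions accept an axis argument, so no helper is needed here.
row_sums = arr.sum(axis=1)

# Hypothetical custom function: spread between largest and smallest value.
def value_range(row):
    return row.max() - row.min()

# Arguments: the 1-D function, the axis, then the array.
ranges = np.apply_along_axis(value_range, 1, arr)

print(row_sums)
print(ranges)
```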
  • 33. np.vectorize(): The np.vectorize() function creates a vectorized version of a Python function, which can then be applied element-wise to NumPy arrays.
    import numpy as np
    arr = np.array([1, 2, 3, 4])
    # Define a Python function
    def my_function(x):
        return x * 2
    # Create a vectorized version of the function
    vectorized_func = np.vectorize(my_function)
    # Apply the vectorized function to the array
    result = vectorized_func(arr)
    This approach is useful when you have a custom function that you want to apply to an array. Note that np.vectorize() is provided for convenience rather than performance: internally it is essentially a loop over the elements.
  • 34. Mapping with np.vectorize(): You can use np.vectorize() to map a function to each element of an array.
    import numpy as np
    arr = np.array([1, 2, 3, 4])
    # Define a Python function
    def my_function(x):
        return x * 2
    # Create a vectorized version of the function
    vectorized_func = np.vectorize(my_function)
    # Map the function to each element
    result = vectorized_func(arr)
    This is the same mechanism as applying a function element-wise; it is most helpful for mapping functions that have no direct NumPy equivalent.
    These methods let you apply functions and perform mapping operations on NumPy arrays, making NumPy a powerful library for numerical and scientific computing tasks.
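A sketch of a mapping that involves a branch, where np.vectorize() is convenient, next to an equivalent written with NumPy's own np.where(). The function clip_negative and the sample values are illustrative assumptions.

```python
import numpy as np

arr = np.array([-2, -1, 0, 1, 2])

# Hypothetical scalar function with a branch: negatives become 0.
def clip_negative(x):
    return x if x > 0 else 0

# Map the scalar function over every element.
mapped = np.vectorize(clip_negative)(arr)

# The same result using a built-in array operation, with no Python-level loop.
native = np.where(arr > 0, arr, 0)

print(mapped)
print(native)
```

When a built-in equivalent such as np.where() exists, it is usually the better choice on large arrays, since np.vectorize() still calls the Python function once per element.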
  • 35. SORTING AND RANKING
    Sorting and ranking are common data manipulation operations in data analysis and are widely supported in Python through libraries like NumPy and Pandas. These operations organize data in a desired order or rank elements based on specific criteria. Here's how to perform sorting and ranking in both libraries:
    Sorting in NumPy: You can sort NumPy arrays using the np.sort() and np.argsort() functions.
    np.sort(): This function returns a new sorted array without modifying the original array.
    import numpy as np
    arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
    sorted_arr = np.sort(arr)
  • 36. • np.sort() returns the sorted array, whereas np.argsort() returns an array of the corresponding indices. The figure shows how the algorithm transforms an unsorted array [10, 6, 8, 2, 5, 4, 9, 1] into a sorted array [1, 2, 4, 5, 6, 8, 9, 10].
  • 37. np.argsort(): This function returns the indices that would sort the array. You can use these indices to sort the original array.
    import numpy as np
    arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
    indices = np.argsort(arr)
    sorted_arr = arr[indices]
    Sorting in Pandas: In Pandas, you can sort Series and DataFrames using the sort_values() method. You can specify the column(s) to sort by and the sorting order.
    import pandas as pd
    data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [25, 30, 22, 35]}
    df = pd.DataFrame(data)
    # Sort by 'Age' column in ascending order
    sorted_df = df.sort_values(by='Age', ascending=True)
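Two common variations on the sorting calls above, sketched with made-up sample data: a descending sort in NumPy (np.sort() has no descending option, so the sorted array is reversed) and a Pandas sort on several columns with a different order for each.

```python
import numpy as np
import pandas as pd

arr = np.array([3, 1, 4, 1, 5])
desc = np.sort(arr)[::-1]  # Sort ascending, then reverse

# Hypothetical sample data with a tie in 'Age'.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 22, 30],
})

# Oldest first; ties in 'Age' are broken alphabetically by 'Name'.
sorted_df = df.sort_values(by=["Age", "Name"], ascending=[False, True])

print(desc)
print(sorted_df)
```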
  • 38. Ranking in NumPy: NumPy doesn't have a built-in ranking function, but you can apply np.argsort() twice to get the rank of each element. Tied values receive different consecutive ranks with this method.
    import numpy as np
    arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
    indices = np.argsort(arr)
    ranked_arr = np.argsort(indices) + 1  # Add 1 to start ranking from 1 instead of 0
    Ranking in Pandas: In Pandas, you can rank data using the rank() method. You can specify the sorting order and how to handle ties (e.g., assigning the average rank to tied values).
    import pandas as pd
    data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [25, 30, 22, 30]}
    df = pd.DataFrame(data)
    # Rank by 'Age' column in descending order and assign average rank to tied values
    df['Rank'] = df['Age'].rank(ascending=False, method='average')
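The difference in tie handling between the two ranking approaches can be seen on a short array with one repeated value. The sample values are arbitrary; kind="stable" is passed here as an assumption-free way to make the order of tied elements follow their original positions.

```python
import numpy as np
import pandas as pd

arr = np.array([3, 1, 4, 1, 5])

# Double argsort: tied values get consecutive ranks in order of appearance.
order = np.argsort(arr, kind="stable")
ordinal = np.argsort(order, kind="stable") + 1

# Pandas rank with method='average': tied values share the mean of their ranks.
average = pd.Series(arr).rank(method="average").to_numpy()

print(ordinal)
print(average)
```

Here the two 1s receive ranks 1 and 2 from the NumPy approach and 1.5 each from Pandas.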
  • 39. SUMMARIZING AND COMPUTING DESCRIPTIVE STATISTICS
    1. Summary Statistics: NumPy provides functions to compute summary statistics directly on arrays.
    import numpy as np
    data = np.array([25, 30, 22, 35, 28])
    mean = np.mean(data)
    median = np.median(data)
    std_dev = np.std(data)  # Population standard deviation (ddof=0 by default)
    variance = np.var(data)  # Population variance (ddof=0 by default)
  • 40. 2. Percentiles and Quartiles: You can compute specific percentiles and quartiles using the np.percentile() function.
    percentile_25 = np.percentile(data, 25)
    percentile_75 = np.percentile(data, 75)
    3. Correlation and Covariance: You can compute correlation and covariance between arrays using np.corrcoef() and np.cov(). Here data1 and data2 stand for two 1-D arrays of equal length.
    correlation_matrix = np.corrcoef(data1, data2)
    covariance_matrix = np.cov(data1, data2)
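A short worked sketch of the statistics above on the same data array. It also shows the ddof parameter, which switches np.var() from the population variance (divide by n) to the sample variance (divide by n - 1); the variable names are illustrative.

```python
import numpy as np

data = np.array([25, 30, 22, 35, 28])

# Several percentiles can be requested in one call.
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1  # Interquartile range

pop_var = np.var(data)             # Divides by n
sample_var = np.var(data, ddof=1)  # Divides by n - 1

print(q1, median, q3, iqr)
print(pop_var, sample_var)
```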
  • 41. CORRELATION AND COVARIANCE
    In NumPy, you can compute correlation and covariance between arrays using the np.corrcoef() and np.cov() functions, respectively. These functions are useful for analyzing relationships and dependencies between variables. Here's how to use them:
    Computing Correlation Coefficient (Correlation): The correlation coefficient measures the strength and direction of a linear relationship between two variables. It ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear correlation.
    import numpy as np
    # Create two arrays representing variables
    x = np.array([1, 2, 3, 4, 5])
    y = np.array([2, 3, 4, 5, 6])
  • 42. # Compute the correlation coefficient between x and y
    correlation_matrix = np.corrcoef(x, y)
    # The correlation coefficient is in the (0, 1) element of the matrix
    correlation_coefficient = correlation_matrix[0, 1]
    In this example, correlation_coefficient will contain the Pearson correlation coefficient between x and y.
  • 43. Computing Covariance: Covariance measures the degree to which two variables change together. Positive values indicate a positive relationship (both variables increase or decrease together), while negative values indicate an inverse relationship (one variable increases as the other decreases).
    import numpy as np
    # Create two arrays representing variables
    x = np.array([1, 2, 3, 4, 5])
    y = np.array([2, 3, 4, 5, 6])
    # Compute the covariance between x and y
    covariance_matrix = np.cov(x, y)
    # The covariance is in the (0, 1) element of the matrix
    covariance = covariance_matrix[0, 1]
    In this example, covariance will contain the covariance between x and y.
    Both np.corrcoef() and np.cov() can also take a 2-D array in which each row is one variable, so you can compute correlations and covariances for many variables at once. For example, if you have a dataset with multiple columns, you can compute the correlation matrix or covariance matrix for all pairs of variables.
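A sketch of the multi-variable case described above, using x and y from the slides plus a third, hypothetical variable z that decreases as x increases. The arrays are stacked so that each row is one variable, which is the layout np.cov() and np.corrcoef() expect by default. Note that np.cov() computes the sample covariance (dividing by n - 1) unless told otherwise.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 5, 6])
z = np.array([5, 4, 3, 2, 1])  # Hypothetical third variable

variables = np.vstack([x, y, z])  # Shape (3, 5): one row per variable

cov_matrix = np.cov(variables)        # 3 x 3 covariance matrix
corr_matrix = np.corrcoef(variables)  # 3 x 3 correlation matrix

print(cov_matrix)
print(corr_matrix)
```

Entry [i, j] of each matrix describes the pair formed by variable i and variable j, so the diagonal of the covariance matrix holds each variable's own variance.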