2. NumPy: NumPy is a library for the Python programming language, adding
support for large, multi-dimensional arrays and matrices, along with a
collection of mathematical functions to operate on these arrays.
Key Features:
Array creation and manipulation
Mathematical operations on arrays
Linear algebra operations
Fourier transforms
Random number generation
Applications:
Scientific computing
Data analysis and manipulation
Machine learning
3. How to install NumPy on Jupyter?
Open a Jupyter notebook and run the following code:
!pip install numpy
import numpy as np
Then create an array and print it:
n = np.array((1, 2, 3))
print(n)
Type of object:
print(type(n))
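Beyond creating a single array, each NumPy feature listed above (array manipulation, mathematical operations, linear algebra, Fourier transforms, random numbers) can be tried in a few lines. The sketch below uses only standard NumPy calls; the arrays and the small linear system are made-up illustrations, not data from these notes.

```python
import numpy as np

# Array creation and manipulation
a = np.arange(6)            # [0 1 2 3 4 5]
m = a.reshape(2, 3)         # 2 rows x 3 columns
print(m.shape)              # (2, 3)

# Element-wise mathematical operations and reductions
print(m * 2)                # every element doubled
print(m.sum(axis=0))        # column sums: [3 5 7]

# Linear algebra: solve the system 2x + y = 5, x + 3y = 10
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
x = np.linalg.solve(A, b)
print(x)                    # [1. 3.]

# Fourier transform of a short signal
spectrum = np.fft.fft(np.array([1.0, 0.0, -1.0, 0.0]))
print(spectrum)

# Reproducible random numbers (fixed seed)
rng = np.random.default_rng(seed=42)
print(rng.integers(0, 10, size=5))
```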
4. OpenCV (Open Source Computer Vision Library):
OpenCV is an open-source computer vision and machine learning software
library. It provides a wide range of functionalities for real-time computer vision,
including image and video processing, object detection, face recognition, and
more.
Key Features:
Image and video I/O
Image processing algorithms
Object detection and tracking
Machine learning algorithms for computer vision tasks
Applications:
Robotics
Augmented reality
Surveillance systems
Medical image analysis
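5. How to install OpenCV on Jupyter?
Open a Jupyter notebook and run the following code:
!pip install opencv-python
import cv2
img = cv2.imread("img1.png")   # returns None if the file cannot be read
if img is None:
    raise FileNotFoundError("Could not read img1.png")
cv2.imshow("MRK", img)         # opens the image in a separate window
cv2.waitKey(10000)             # wait up to 10,000 ms (10 s) or until a key is pressed
cv2.destroyAllWindows()        # close the window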
6. Matplotlib is a comprehensive library for creating static, animated, and
interactive visualizations in Python. It provides a MATLAB-like interface and
supports a wide variety of plots and graphs.
Key Features:
Line plots, scatter plots, and histograms
2D and 3D plotting
Customization of plots
Integration with NumPy arrays
Applications:
Data visualization
Scientific plotting
Statistical analysis
7. How to install Matplotlib on Jupyter?
Open a Jupyter notebook and run the following code:
!pip install matplotlib
import matplotlib.pyplot as plt   # "as" gives the module a short alias (plt)
import numpy as np
xpts = np.array([0, 4])
ypts = np.array([0, 6])
plt.plot(xpts, ypts)   # straight line from (0, 0) to (4, 6)
plt.show()
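The line plot above uses the default look. The sketch below shows two other features listed for Matplotlib, scatter plots and histograms, together with basic customization (labels, titles, colours, legend) and NumPy integration. The data are random values generated with a fixed seed, purely for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + rng.normal(0, 2, size=x.size)   # noisy straight line

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: scatter plot of the samples plus the underlying line
ax1.scatter(x, y, label="samples")
ax1.plot(x, 2 * x, color="red", label="y = 2x")
ax1.set_xlabel("x")
ax1.set_ylabel("y")
ax1.set_title("Scatter and line plot")
ax1.legend()

# Right panel: histogram of 1000 normally distributed values
ax2.hist(rng.normal(0, 1, 1000), bins=30, color="gray", edgecolor="black")
ax2.set_title("Histogram")

fig.tight_layout()
plt.show()
```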
8. scikit-image, commonly abbreviated as skimage, is an open-source image
processing library for Python.
Image Processing: It provides a collection of algorithms for image segmentation,
feature extraction, image filtering, and other image processing tasks.
Integration: It integrates with other scientific Python libraries such
as NumPy, SciPy, and Matplotlib (images are represented as NumPy arrays),
allowing for efficient image manipulation and analysis.
User-Friendly API
Community Support: skimage benefits from an active community of developers
and users.
9. Installing scikit-image library:
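!pip install scikit-image
import skimage
from skimage import io
import matplotlib.pyplot as plt
# Load an image from a file (returned as a NumPy array)
image = io.imread('example_image.jpg')
# Display the image (recent scikit-image releases deprecate io.imshow/io.show,
# so Matplotlib is used instead)
plt.imshow(image)
plt.axis('off')
plt.show()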
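The processing features listed for skimage (filtering, segmentation, feature extraction) can be chained on any NumPy array. This sketch builds a synthetic noisy image instead of reading a file (an assumption for self-containment), smooths it with a Gaussian filter, computes a Sobel edge map, segments it with Otsu's automatic threshold, and counts the connected regions.

```python
import numpy as np
from skimage import filters, measure

# Synthetic 100x100 grayscale image: a bright square on a dark background
image = np.zeros((100, 100), dtype=float)
image[30:70, 30:70] = 1.0
rng = np.random.default_rng(0)
image += rng.normal(0, 0.05, image.shape)      # add mild Gaussian noise

smoothed = filters.gaussian(image, sigma=2)    # filtering: Gaussian blur
edges = filters.sobel(smoothed)                # feature extraction: edge strength
thresh = filters.threshold_otsu(smoothed)      # automatic global threshold
mask = smoothed > thresh                       # segmentation: foreground mask
labels = measure.label(mask)                   # label connected regions

print("Otsu threshold:", round(float(thresh), 3))
print("Regions found:", labels.max())
```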
10. Pillow is a fork of the Python Imaging Library (PIL) that adds extensive image
processing capabilities to Python. It supports opening, manipulating, and saving many
different image file formats.
Image Manipulation: Pillow offers a wide range of image handling functions such
as resizing, cropping, rotating, filtering, and enhancing images.
Image File Support: It supports many image file formats, including JPEG, PNG, and GIF,
making it suitable for handling varied image data.
Integration: Pillow integrates with other Python libraries such as NumPy
and Matplotlib (for example, np.array(img) converts an image to an array), enabling
easy interoperability with scientific computing and data visualization tools.
Ease of Use: Pillow provides a simple and intuitive API for working with images,
making it accessible to users with varying levels of programming experience.
Active Maintenance: Pillow is actively maintained and updated, ensuring compatibility
with the latest Python versions and continued support for new features and improvements.
11. Installing Pillow library:
!pip install pillow
from PIL import Image
# Open an image file
original_image = Image.open("example.jpg")
# Display basic information about the image
print("Original Image Format:", original_image.format)
print("Original Image Size:", original_image.size)   # (width, height) in pixels
# Resize the image: halve width and height (integer division)
new_size = (original_image.size[0] // 2, original_image.size[1] // 2)
resized_image = original_image.resize(new_size)
# Display new size
print("Resized Image Size:", resized_image.size)
# Save the resized image with a new name
resized_image.save("resized_example.jpg")
# Close the original and resized images
original_image.close()
resized_image.close()
print("Resized image saved successfully!")
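The resize example covers one of the manipulations listed under Image Manipulation. A short sketch of the others (cropping, rotating, filtering, enhancing) plus grayscale conversion follows; it creates a plain coloured image with Image.new instead of opening example.jpg, so the size and colour are illustrative assumptions.

```python
from PIL import Image, ImageEnhance, ImageFilter

# Synthetic 200x100 RGB image (width x height) instead of a file
img = Image.new("RGB", (200, 100), color=(200, 120, 40))

cropped = img.crop((0, 0, 100, 100))                    # box = (left, upper, right, lower)
rotated = img.rotate(90, expand=True)                   # expand=True keeps the whole image
blurred = img.filter(ImageFilter.GaussianBlur(radius=2))
brighter = ImageEnhance.Brightness(img).enhance(1.5)    # 1.0 = unchanged, >1 = brighter
gray = img.convert("L")                                 # single-channel grayscale

print(cropped.size)   # (100, 100)
print(rotated.size)   # (100, 200)
print(gray.mode)      # L

gray.save("example_gray.png")   # output format is chosen from the file extension
```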
12. Pandas is a powerful Python library for data manipulation and analysis. It
offers data structures and functions to efficiently work with structured data like
time series, tabular, and heterogeneous data.
Data Structures: Pandas provides two main data structures: Series (1D labeled
array) and DataFrame (2D labeled data structure), which offer powerful data
manipulation capabilities.
Data Handling: It offers functions for reading and writing data in various
formats such as CSV, Excel, and SQL databases (for example, read_csv(), read_excel(), read_sql()).
Data Analysis: Pandas supports data analysis tasks including data cleaning,
filtering, grouping, merging, and reshaping, making it indispensable for
exploratory data analysis.
Integration: It seamlessly integrates with other Python libraries such as
NumPy, Matplotlib, and scikit-learn, enhancing its capabilities in scientific
computing and machine learning tasks.
13. Installing Pandas library:
!pip install pandas
If pip reports that a newer version of pip is available, upgrade it
from a terminal (in a notebook cell, prefix the command with !):
python -m pip install --upgrade pip
import pandas as pd
# Read a CSV file into a DataFrame
df = pd.read_csv("example.csv")
# Display the first few rows of the DataFrame
print("First few rows of the DataFrame:")
print(df.head())
# Display summary information about the DataFrame
# (info() prints its report itself and returns None, so it is not wrapped in print)
print("\nSummary information:")
df.info()
# Display basic statistics of numerical columns
print("\nBasic statistics:")
print(df.describe())
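The example above only inspects a file. The analysis tasks listed for Pandas (filtering, grouping, merging, reshaping) are sketched below on a small made-up sales table; the column names and values are hypothetical and chosen only so the example needs no CSV file.

```python
import pandas as pd

# Hypothetical inline data
sales = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South"],
    "product": ["A", "A", "B", "B", "B"],
    "units": [10, 4, 7, 3, 8],
})
prices = pd.DataFrame({"product": ["A", "B"], "price": [2.5, 4.0]})

# Filtering: rows where more than 5 units were sold
big_orders = sales[sales["units"] > 5]

# Merging: attach each product's price, then compute revenue
merged = sales.merge(prices, on="product", how="left")
merged["revenue"] = merged["units"] * merged["price"]

# Grouping: total revenue per region
per_region = merged.groupby("region")["revenue"].sum()

# Reshaping: regions as rows, products as columns, summed units as values
table = merged.pivot_table(index="region", columns="product",
                           values="units", aggfunc="sum", fill_value=0)

print(big_orders)
print(per_region)
print(table)
```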
14. Definition: scikit-learn is a versatile machine learning library for Python. It offers
simple and efficient tools for data mining and data analysis, implementing a wide
range of machine learning algorithms.
Machine Learning Algorithms: scikit-learn provides implementations for various
machine learning algorithms including classification, regression, clustering,
dimensionality reduction, and model selection.
Model Evaluation: It offers tools for model evaluation, cross-validation, and
hyperparameter tuning, facilitating the development of robust and accurate machine
learning models.
Integration: scikit-learn seamlessly integrates with other Python libraries such as
NumPy, SciPy, and Pandas, enabling easy preprocessing, training, and evaluation of
machine learning models.
Scalability: It is built on optimized NumPy and SciPy routines, making it efficient on
fairly large datasets, as long as they fit in memory.
15. Installing scikit-learn library:
!pip install scikit-learn
import sklearn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
# Load the Iris dataset
iris = load_iris()
X = iris.data  # Features
y = iris.target  # Target variable
# Split the dataset into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize the Random Forest classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
# Train the classifier
rf_classifier.fit(X_train, y_train)
# Predict on the test set
y_pred = rf_classifier.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
# Display classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))
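A single train/test split gives only one accuracy estimate. The model evaluation tools mentioned earlier (cross-validation and hyperparameter tuning) can be sketched with cross_val_score and GridSearchCV; the choice of a k-nearest-neighbours classifier and of the k values tried is an illustrative assumption, not part of the example above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation of a k-nearest-neighbours classifier with k = 5
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
print("CV accuracy per fold:", scores)
print("Mean CV accuracy:", scores.mean())

# Hyperparameter tuning: each candidate k is scored by 5-fold cross-validation
grid = GridSearchCV(KNeighborsClassifier(),
                    param_grid={"n_neighbors": [1, 3, 5, 7, 9]},
                    cv=5)
grid.fit(X, y)
print("Best k:", grid.best_params_["n_neighbors"])
print("Best mean CV accuracy:", grid.best_score_)
```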
16. Seaborn is a Python library for creating attractive statistical graphics.
Statistical Visualization: Seaborn excels in generating plots like scatter plots,
bar charts, and heatmaps for effective data exploration.
Integration with Pandas: It seamlessly works with Pandas DataFrames,
making data visualization straightforward.
Customization: Users can easily customize plot aesthetics to suit their
preferences.
Statistical Analysis: Seaborn offers tools for visualizing relationships between
variables and conducting statistical analysis.
Community and Documentation: Supported by an active community and
comprehensive documentation for easy learning.
17. Installing seaborn library:
!pip install seaborn
import seaborn as sns
import matplotlib.pyplot as plt
# Load the Iris dataset as a DataFrame
# (load_dataset downloads it from the online seaborn-data repository)
iris_df = sns.load_dataset("iris")
# Create a pairplot using Seaborn, colouring points by species
sns.pairplot(iris_df, hue='species', palette='Set1')
# Add a title, raised slightly so it does not overlap the top row of plots
plt.suptitle("Pairplot of Iris Dataset", y=1.02)
# Show the plot
plt.show()
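A heatmap is another of the statistical plots listed for Seaborn. The sketch below draws the correlation matrix of the four Iris measurements; it builds the DataFrame with scikit-learn's load_iris(as_frame=True), an assumption that avoids the network download used by sns.load_dataset.

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

# Iris measurements as a DataFrame (works offline)
iris = load_iris(as_frame=True)
features = iris.frame.drop(columns="target")

# Pairwise Pearson correlations between the four measurements
corr = features.corr()

ax = sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm",
                 vmin=-1, vmax=1, square=True)
ax.set_title("Correlation matrix of Iris features")
plt.tight_layout()
plt.show()
```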
18. Plotly is a Python library for creating interactive and publication-quality graphs.
Interactive Visualization: Plotly allows users to interactively explore data
through zooming and hovering over data points.
Online Platform: It offers an online platform for hosting and sharing interactive
plots.
Chart Types: Supports a wide range of chart types including scatter plots, line
plots, and 3D surface plots.
Integration: Easily integrates with other Python libraries for seamless data
manipulation and visualization.
Customization: Provides extensive options for customizing plot appearance for
tailored visualizations.
19. Installing plotly library:
!pip install plotly
import plotly.graph_objects as go
# Sample data
x_values = [1, 2, 3, 4, 5]
y_values = [2, 3, 5, 7, 11]
# Create a line plot
fig = go.Figure(data=go.Scatter(x=x_values, y=y_values,
mode='lines'))
# Add title and axis labels
fig.update_layout(title='Simple Line Plot',
xaxis_title='X-axis',
yaxis_title='Y-axis')
# Show the plot
fig.show()
20. Data Preprocessing:
Data preprocessing is a critical step in machine learning pipelines.
It refers to the techniques and procedures used to prepare raw
data for analysis.
It involves several tasks such as importing and exporting data,
cleaning and formatting data, handling missing values, and feature
scaling.
Importing and Exporting Data:
• Importing data involves loading datasets into the machine learning
environment.
• This can be done with libraries such as Pandas in Python, using functions like
read_csv() for CSV files, read_excel() for Excel files, etc.
import pandas as pd
df = pd.read_csv('ML.csv')
print(df.shape)   # number of rows and columns
df.describe()     # summary statistics: count, mean, std (SD), min, quartiles, max
21. Exporting the Data:
import pandas as pd
# Example DataFrame
data = {
'Name': ['John', 'Alice', 'Bob'],
'Age': [25, 30, 35],
'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
# Export DataFrame to CSV
df.to_csv('output.csv', index=False)
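Exporting and importing are inverses of each other, so a quick way to check that no information is lost is to write a DataFrame out and read it back. This sketch uses an in-memory io.StringIO buffer in place of output.csv (an assumption so nothing is written to disk); to_csv and read_csv accept such buffers as well as file paths.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Alice", "Bob"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# Export: write CSV text into an in-memory buffer instead of a file
buffer = io.StringIO()
df.to_csv(buffer, index=False)   # index=False: do not write the row index

# Import: rewind the buffer and read it back, as read_csv would from a file
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored.shape)        # (3, 3)
print(restored.equals(df))   # True
```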
22. Cleaning and Formatting Data:
Cleaning data involves identifying and handling anomalies, inconsistencies,
and errors in the dataset.
This may include removing duplicates, correcting data types, dealing with
outliers, etc.
Formatting data involves ensuring that data is in the appropriate format for
analysis.
For example, converting categorical variables into numerical representations,
standardizing date formats, etc.
23. Example: cleaning and formatting a small dataset
import pandas as pd
# Load the dataset
data = {
'Name': ['John', 'Alice', 'Bob', 'Anna', 'Mike', 'Emily'],
'Age': [25, 30, None, 35, 40, ''],
'City': ['New York', 'Los Angeles', 'Chicago', 'San Francisco', '', 'Seattle'],
'Gender': ['Male', 'Female', 'Male', '', 'Male', 'Female'],
'Salary': ['$50000', '$60000', '$70000', '$80000', '90000', '$100000']
}
df = pd.DataFrame(data)
# Display the original DataFrame
print("Original DataFrame:")
print(df)
print()
# Clean and format the data
# 1. Convert Age to numeric ('' becomes NaN) and fill missing values with the median age
df['Age'] = pd.to_numeric(df['Age'], errors='coerce')
median_age = df['Age'].median()  # Calculate median age
df['Age'] = df['Age'].fillna(median_age)  # Fill missing values with median
# 2. Remove rows with missing or empty values in the City and Gender columns
df = df[df['City'].notna() & (df['City'] != '') &
        df['Gender'].notna() & (df['Gender'] != '')].copy()
# 3. Remove dollar signs (and any commas) from Salary, then convert it to numeric
df['Salary'] = df['Salary'].replace('[$,]', '', regex=True).astype(float)
# Display the cleaned and formatted DataFrame
print("Cleaned and Formatted DataFrame:")
print(df)
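The example above does not cover three tasks listed under Cleaning and Formatting Data: removing duplicates, standardizing dates, and converting categorical variables into numbers. A minimal sketch of all three follows; the records and the day/month/year date format are hypothetical.

```python
import pandas as pd

# Hypothetical records: one duplicated row, a text category, day/month/year dates
df = pd.DataFrame({
    "Name": ["John", "Alice", "Alice", "Bob"],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Joined": ["05/01/2023", "17/03/2022", "17/03/2022", "30/11/2021"],
})

# 1. Remove exact duplicate rows (the first occurrence is kept)
df = df.drop_duplicates().reset_index(drop=True)

# 2. Standardize dates: parse day/month/year strings into datetime values
df["Joined"] = pd.to_datetime(df["Joined"], format="%d/%m/%Y")

# 3. Convert the categorical Gender column into 0/1 indicator (one-hot) columns
df = pd.get_dummies(df, columns=["Gender"], dtype=int)

print(df)
print(df.dtypes)
```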
24. Handling Missing Values:
Missing values are common in datasets and can significantly affect the
performance of machine learning models if not handled properly.
Techniques for handling missing values include:
Imputation: Replacing missing values with a calculated or estimated value
(e.g., mean, median, mode).
Deletion: Removing rows or columns with missing values.
Advanced techniques like predictive modeling to estimate missing values
based on other features.
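The previous example applies two of these techniques: imputation (missing ages are
filled with the median) and deletion (rows with an empty City or Gender are removed).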
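The three families of techniques listed above can be compared side by side on a small table. The sketch below uses made-up values; KNNImputer from scikit-learn serves as one example of estimating missing values from other features (it fills each gap with the average of the most similar rows), not as the only model-based option.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical data with one missing value per column
df = pd.DataFrame({
    "Age": [25, np.nan, 35, 40],
    "Score": [80, 85, np.nan, 95],
    "City": ["Boston", "Denver", None, "Denver"],
})
print(df.isna().sum())   # missing values per column

# Deletion: drop every row that has at least one missing value
dropped = df.dropna()

# Imputation: mean for Age, median for Score, mode (most frequent value) for City
imputed = df.copy()
imputed["Age"] = imputed["Age"].fillna(imputed["Age"].mean())
imputed["Score"] = imputed["Score"].fillna(imputed["Score"].median())
imputed["City"] = imputed["City"].fillna(imputed["City"].mode()[0])

# Model-based estimate: fill numeric gaps from the 2 most similar rows
knn = KNNImputer(n_neighbors=2)
numeric_filled = knn.fit_transform(df[["Age", "Score"]])

print(dropped)
print(imputed)
print(numeric_filled)
```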
25. Feature Scaling:
Feature scaling is the process of standardizing or normalizing the range of
independent variables or features in the dataset.
It is essential for algorithms that are sensitive to the scale of the input
features, such as gradient descent-based algorithms (e.g., linear regression,
logistic regression) or distance-based algorithms (e.g., k-nearest neighbors,
support vector machines).
Common techniques for feature scaling include:
Min-Max Scaling: Scaling features to a fixed range, usually [0, 1].
Standardization (Z-score normalization): Scaling features so that they have a mean
of 0 and a standard deviation of 1; this shifts and rescales the values but does not
make their distribution normal.
Robust Scaling: Scaling features using statistics that are robust to outliers,
such as the median and interquartile range.
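The three techniques correspond to the formulas (x - min) / (max - min), (x - mean) / std, and (x - median) / IQR. The sketch below applies scikit-learn's MinMaxScaler, StandardScaler, and RobustScaler to one made-up feature that contains an outlier (100), which makes the difference between them visible.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# One feature (a single column) with an outlier at 100
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

minmax = MinMaxScaler().fit_transform(X)      # (x - min) / (max - min), range [0, 1]
standard = StandardScaler().fit_transform(X)  # (x - mean) / std
robust = RobustScaler().fit_transform(X)      # (x - median) / IQR

print("Min-Max: ", minmax.ravel().round(3))
print("Standard:", standard.ravel().round(3))
print("Robust:  ", robust.ravel().round(3))   # [-1. -0.5 0. 0.5 48.5]
```

With Min-Max scaling the outlier squeezes the four ordinary values into roughly [0, 0.03], while robust scaling keeps them spread between -1 and 0.5. In a real pipeline the scaler is fitted on the training data only (fit or fit_transform) and then applied to the test data with transform, so that test-set statistics do not leak into training.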