2. What is Machine Learning?
Machine learning is the use and development of computer systems that are able to learn and
adapt without following explicit instructions, by using algorithms and statistical
models to analyze and draw inferences from patterns in data.
It is the science of getting computers to learn and act as humans do by feeding them data
and information, without being explicitly programmed.
3. Real-World Examples of Machine Learning
Facial recognition
Voice recognition (Ex: Siri & Cortana)
Healthcare industry
Weather forecasting
Producing a web series
7. Sample Machine Learning Using Decision Trees (Gathering Data)
Situation: Imagine we have an online music store.
Note: In the gender column, 1 stands for male and 0 for female.
8. Code for loading the data into Jupyter
import pandas as pd
music_data = pd.read_csv('music.csv')
music_data
Preparing the Data
11. sklearn library
Scikit-learn, often abbreviated as sklearn, is a popular Python library for
machine learning and data analysis. It provides a wide range of tools and
algorithms for various tasks related to machine learning and data mining.
Some of the key functions and features of scikit-learn include:
Data Preprocessing: Scikit-learn offers tools for data preprocessing, such as
data cleaning, scaling, encoding categorical variables, and feature selection.
This is essential for preparing data for machine learning models.
Supervised Learning: Scikit-learn supports a wide range of supervised
learning algorithms, including linear and logistic regression, support vector
machines, decision trees, random forests, k-nearest neighbors, and more.
These algorithms are used for tasks like classification and regression.
12. 1.Unsupervised Learning: It also provides algorithms for unsupervised
learning, such as clustering (e.g., K-Means clustering) and dimensionality
reduction (e.g., Principal Component Analysis or PCA).
2.Model Selection: Scikit-learn includes tools for model selection and
hyperparameter tuning, like cross-validation, grid search, and randomized
search. These help in finding the best model and its associated
hyperparameters for a given problem.
3.Model Evaluation: You can use scikit-learn to evaluate the performance of
machine learning models through various metrics like accuracy, precision,
recall, F1-score, and ROC curves, among others.
4.Feature Extraction and Engineering: The library offers tools for feature
extraction and feature engineering, including techniques like TF-IDF and
bag-of-words counts for text, polynomial features, and more.
13. Pipeline Building: Scikit-learn allows you to create machine learning
pipelines, which are sequences of data preprocessing steps, feature selection,
and model training. This helps streamline the machine learning workflow.
Ensemble Methods: You can build ensemble models like Random Forests
and Gradient Boosting using scikit-learn, which often improve predictive
performance.
Integration with NumPy and pandas: Scikit-learn integrates seamlessly with
other popular Python libraries like NumPy and pandas for data manipulation
and handling.
Community and Documentation: Scikit-learn has an active community of
users and developers, and it provides extensive documentation, tutorials, and
examples to help users get started with machine learning tasks.
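A few of the features listed above (preprocessing, pipelines, and model evaluation) can be sketched together in a short example. This is only a minimal illustration: the data is synthetic, generated with make_classification, and the choice of StandardScaler and LogisticRegression is an assumption made for the demo, not something taken from these slides.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only (not the music store data).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Pipeline: first scale the features, then fit a classifier.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])

# Model evaluation: 5-fold cross-validation, one accuracy value per fold.
scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
print(scores.mean())
```

Because the scaler sits inside the pipeline, it is re-fitted on the training part of each fold, so the held-out part never influences the preprocessing.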
14. We add this import to our code
from sklearn.tree import DecisionTreeClassifier
DecisionTreeClassifier is a class in scikit-learn (sklearn) used to create
and train decision tree classifiers, which are a type
of supervised machine learning model. Decision
trees in general are used for both classification and
regression tasks; for regression, scikit-learn provides DecisionTreeRegressor.
15. How to use DecisionTreeClassifier
Initialization: First import the class and create a model object:
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()
Training: You can train the decision tree classifier on your dataset using the
fit() method. It takes the features (X) and target labels (y) as input:
model.fit(X, y)
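As a rough sketch of initialization and training together, the example below builds a tiny table in place of music.csv. The rows and the genre names are made up for illustration, and the column names age, gender, and genre are assumed from the description in these slides.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical sample rows; the real music.csv is not reproduced here.
music_data = pd.DataFrame({
    "age": [20, 25, 30, 20, 25, 30],
    "gender": [1, 1, 1, 0, 0, 0],  # 1 = male, 0 = female
    "genre": ["HipHop", "Jazz", "Classical", "Dance", "Acoustic", "Classical"],
})

X = music_data[["age", "gender"]]  # input set (features)
y = music_data["genre"]            # output set (target labels)

model = DecisionTreeClassifier()
model.fit(X, y)                    # learn the patterns in the data

print(model.classes_)              # the genres the model has seen
```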
16. Prediction:
After training, you can use the trained model to make
predictions on new data points using the predict()
method:
predictions = model.predict(new_data)
or, in our case, for a 21-year-old male and a 22-year-old female:
predictions = model.predict([[21,1],[22,0]])
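A self-contained sketch of the prediction step might look like this. The training rows and genre names are invented for the example (the real music.csv is not shown here), and the new data is wrapped in a DataFrame with the same assumed column names so that it matches the training input.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training rows standing in for music.csv.
music_data = pd.DataFrame({
    "age": [20, 25, 30, 20, 25, 30],
    "gender": [1, 1, 1, 0, 0, 0],  # 1 = male, 0 = female
    "genre": ["HipHop", "Jazz", "Classical", "Dance", "Acoustic", "Classical"],
})

model = DecisionTreeClassifier()
model.fit(music_data[["age", "gender"]], music_data["genre"])

# New people who are not in the table: a 21-year-old male and a 22-year-old female.
new_data = pd.DataFrame([[21, 1], [22, 0]], columns=["age", "gender"])
predictions = model.predict(new_data)
print(predictions)
```

predict() returns one label per input row, in the same order as the rows of new_data.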
18. Training our Data
train_test_split - a function from the sklearn.model_selection module in
scikit-learn, commonly used for splitting a dataset into
subsets, typically for the purpose of training and testing a machine learning
model.
It randomly divides the dataset into two
portions: one for training the model (the training set) and the other for
evaluating the model's performance (the testing/validation set).
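To see what the split looks like in practice, here is a small sketch on ten made-up samples. The arrays are placeholders, and random_state=42 is an arbitrary choice that only makes the shuffle repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten made-up samples with two features each, plus one label per sample.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# test_size=0.2 holds out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```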
sklearn.metrics - a module in scikit-learn that provides a wide range of
functions for evaluating
the performance of machine learning models. These metrics are essential
for assessing how well your models perform on various tasks, such
as classification, regression, and clustering.
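As a small illustration of sklearn.metrics, the sketch below compares made-up true labels with made-up predictions. The label lists are invented for the example; accuracy_score and confusion_matrix are two of the functions the module provides.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels: what actually happened versus what a model predicted.
y_true = ["Jazz", "HipHop", "Dance", "Jazz", "HipHop"]
y_pred = ["Jazz", "HipHop", "Jazz", "Jazz", "Dance"]

# Accuracy is the fraction of predictions that match the true labels.
accuracy = accuracy_score(y_true, y_pred)

# Rows are true labels, columns are predicted labels, in the order given.
matrix = confusion_matrix(y_true, y_pred, labels=["Dance", "HipHop", "Jazz"])

print(accuracy)  # 0.6 (3 of 5 correct)
print(matrix)
```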
In this data set there are no null values or duplicates, but we need to separate the columns: two columns are the input set and the other is the output set.
We need to separate the columns because we need to train a model. In our case the output set is the genre, which is what we want to predict. If you look at the table, we don't have a row for a 21-year-old male, so we don't know which genre he would prefer; we will ask our model to make that prediction.
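The check and the column separation could be sketched as follows. The table here is a made-up stand-in for music.csv, and the column names age, gender, and genre are assumptions based on the description in these slides.

```python
import pandas as pd

# Hypothetical rows standing in for music.csv.
music_data = pd.DataFrame({
    "age": [20, 25, 30, 20, 25, 30],
    "gender": [1, 1, 1, 0, 0, 0],  # 1 = male, 0 = female
    "genre": ["HipHop", "Jazz", "Classical", "Dance", "Acoustic", "Classical"],
})

# Confirm there are no missing values and no duplicate rows.
null_count = int(music_data.isnull().sum().sum())
duplicate_count = int(music_data.duplicated().sum())
print(null_count, duplicate_count)

# Input set: every column except genre. Output set: the genre column.
X = music_data.drop(columns=["genre"])
y = music_data["genre"]
```

drop() returns a new table and leaves music_data unchanged, so the original data is still available afterwards.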
Overall, scikit-learn is a powerful and versatile library that simplifies many aspects of machine learning in Python, making it accessible to both beginners and experienced data scientists and machine learning practitioners.
Rule of thumb: allocate 70-80% of our data for training and 20-30% for testing.
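Putting the pieces together with an 80/20 split, a whole run might look like the sketch below. The ten rows and their genres are invented for illustration, random_state=0 is arbitrary, and with so few rows the accuracy value is not meaningful beyond showing the workflow.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical rows standing in for music.csv.
music_data = pd.DataFrame({
    "age": [20, 23, 25, 26, 29, 30, 20, 21, 25, 26],
    "gender": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],  # 1 = male, 0 = female
    "genre": ["HipHop", "HipHop", "HipHop", "Jazz", "Jazz", "Jazz",
              "Dance", "Dance", "Dance", "Acoustic"],
})
X = music_data[["age", "gender"]]
y = music_data["genre"]

# 80% of the rows for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = DecisionTreeClassifier()
model.fit(X_train, y_train)            # train only on the training set

predictions = model.predict(X_test)    # predict the held-out rows
score = accuracy_score(y_test, predictions)
print(len(X_train), len(X_test), score)
```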