Explore how our student team used data science to forecast power consumption, supporting smarter energy management and sustainability initiatives. Visit for more: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
3. Introduction
• The power sector is an essential sector, strongly influenced by technological advancements, changing consumer preferences, and a competitive market.
• Power consumption, the electricity that users draw from the supply, poses unique challenges and opportunities. Changes in power consumption can seriously affect the production and transmission of electricity.
• Machine learning, with its predictive capabilities, offers a transformative approach
to understanding and mitigating the challenges posed by changes in power
consumption.
4. Power Domain And Importance
The power sector is growing rapidly, and it presents its own distinct set of challenges and opportunities.
I chose the power domain for my capstone project because:
• Efficiency of Supply: An efficient supply of power to consumers creates a stable system of operations, which lets the system produce power according to consumer demand.
• Costs Matter: The cost of supplying, transmitting, and distributing power to consumers should be optimized so that providers don't incur losses.
• Chain of Operation: Different production units (hydro, thermal, wind) are used, so these units should be brought online according to demand.
• Tech Is Always Changing: New technologies keep emerging, especially in the power industry. Figuring out how to use these innovations to help consumers is part of the adventure.
5. Problem Statement & Explanation
[Diagram: Electricity Production from Various Units -> Sub Station -> Zone_1, Zone_2, Zone_3]
Goal: predict the power consumption of Zone_3 from the following weather conditions and potential emission diffusion metrics:
1. Temperature
2. Humidity
3. Wind Speed
4. Diffuse Flows
5. General Diffuse Flows
6. Dataset Information
Here are the key details about the dataset used in this project:
• Number of records: The dataset contains 52,416 records, each representing a unique entry.
• Features/Columns: The dataset has 9 features/columns covering climatic conditions, diffuse flows, and power consumption in three zones. These form the basis of our predictive modeling.
• Source of the Data: The data comes from our partner, a leading Moroccan renewable energy company committed to providing efficient and sustainable energy solutions. The company wants to develop a robust tool for optimizing energy usage in Agadir, a critical region for its operations.
Columns/Features
• Datetime
• Temperature
• Humidity
• Wind Speed
• General Diffuse Flows
• Diffuse Flows
• PowerConsumption_Zone1
• PowerConsumption_Zone2
• PowerConsumption_Zone3
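A minimal sketch of loading the dataset with pandas and checking that the nine columns listed above are present. The file name `powerconsumption.csv`, the helper name `load_power_data`, and the exact header spellings (`WindSpeed`, `GeneralDiffuseFlows`, `DiffuseFlows`, without spaces) are assumptions; adjust them to match the actual file.

```python
import pandas as pd

# Assumed header spellings for the 9 columns listed above (hypothetical; check the real CSV).
EXPECTED_COLUMNS = [
    "Datetime", "Temperature", "Humidity", "WindSpeed",
    "GeneralDiffuseFlows", "DiffuseFlows",
    "PowerConsumption_Zone1", "PowerConsumption_Zone2", "PowerConsumption_Zone3",
]


def load_power_data(path: str) -> pd.DataFrame:
    """Load the power consumption CSV and verify that all expected columns exist."""
    df = pd.read_csv(path, parse_dates=["Datetime"])  # parse timestamps into datetime64
    missing = set(EXPECTED_COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")
    return df[EXPECTED_COLUMNS]  # fixed column order


# Usage (hypothetical file name):
# df = load_power_data("powerconsumption.csv")
# df.shape  # the slides report 52,416 records and 9 columns
```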
7. Exploratory Data Analysis (EDA)
• Exploring the data gave us a comprehensive overview of its structure. It uncovered potential patterns and helped us identify key trends and essential insights in the dataset.
• Throughout the EDA process, we analyzed the distribution of individual features, investigated correlations, and explored inherent relationships between variables.
• Visualizations also played a crucial role by representing the data clearly, offering insights into consumption patterns and the factors that may influence power consumption.
1. First, we checked the dataset for null values and duplicate rows. Fortunately, there were none, so the dataset was clean to begin with.
2. Some columns won't contribute useful information to the predictions: Datetime is not used as a predictor, and PowerConsumption_Zone1 and PowerConsumption_Zone2 are strongly correlated with the target (see the heatmap analysis). Therefore, we will drop the following columns during preprocessing: Datetime, PowerConsumption_Zone1, PowerConsumption_Zone2.
3. The target variable, PowerConsumption_Zone3, is continuous.
4. The independent variables Temperature, Humidity, Wind Speed, General Diffuse Flows and Diffuse Flows are also continuous.
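The null, duplicate and data-type checks in this list can be reproduced with a few pandas calls. This is an illustrative sketch rather than the project's exact code; the helper name `basic_quality_report` is hypothetical.

```python
import pandas as pd


def basic_quality_report(df: pd.DataFrame) -> dict:
    """Count nulls and duplicate rows, and list the numeric columns."""
    return {
        # Total number of missing cells across all columns
        "null_values": int(df.isnull().sum().sum()),
        # Rows that exactly repeat an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
        # Numeric columns: the candidates for continuous variables
        "numeric_columns": df.select_dtypes(include="number").columns.tolist(),
    }


# Usage on the loaded DataFrame (the slides report 0 nulls and 0 duplicates):
# report = basic_quality_report(df)
```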
11. Visualizations
Upon inspecting the heatmap, we see a strong positive correlation among PowerConsumption_Zone1, PowerConsumption_Zone2 and PowerConsumption_Zone3. As a result, PowerConsumption_Zone1 and PowerConsumption_Zone2 will be dropped.
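The heatmap is a picture of the correlation matrix, so strongly correlated pairs can also be listed programmatically. The sketch below uses a hypothetical helper `highly_correlated_pairs` and a threshold of 0.8 for "strong" (our assumption; the slide gives no number). The commented-out lines show one common way to draw the heatmap itself with seaborn.

```python
import pandas as pd


def highly_correlated_pairs(df: pd.DataFrame, threshold: float = 0.8) -> list:
    """Return (col_a, col_b, r) for numeric column pairs with Pearson r >= threshold."""
    corr = df.select_dtypes(include="number").corr()  # Pearson correlation by default
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle only, skipping the diagonal
            r = corr.iloc[i, j]
            if r >= threshold:
                pairs.append((cols[i], cols[j], float(r)))
    return pairs


# Drawing the heatmap (requires seaborn and matplotlib):
# import seaborn as sns
# sns.heatmap(df.select_dtypes(include="number").corr(), annot=True, cmap="coolwarm")
```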
12. Preprocessing
• First, the Datetime, PowerConsumption_Zone1 and PowerConsumption_Zone2 columns were dropped, as they weren't useful for our predictions.
• We also confirmed that the dataset contained no null values or duplicate rows.
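A minimal pandas sketch of these preprocessing steps, using a hypothetical `preprocess` helper. It runs the clean-data checks before dropping any columns, because once Datetime is removed, two readings taken at different times could look like duplicates.

```python
import pandas as pd

# Columns dropped before modeling, as listed on the slides
DROP_COLUMNS = ["Datetime", "PowerConsumption_Zone1", "PowerConsumption_Zone2"]


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Confirm the data is clean, then drop the columns not used for prediction."""
    # Check full rows (Datetime included) so distinct timestamps keep rows distinct.
    if df.isnull().any().any():
        raise ValueError("dataset contains null values")
    if df.duplicated().any():
        raise ValueError("dataset contains duplicate rows")
    return df.drop(columns=DROP_COLUMNS)
```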
Splitting the data into X and y
• Next, we partition the dataset into two components: X and y.
• X contains all the independent variables, i.e. the features used to make predictions.
• y contains the dependent (target) variable, the outcome we aim to predict.
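In pandas, this partition amounts to dropping the target column for X and selecting it for y. A minimal sketch, assuming the preprocessed DataFrame still names the target `PowerConsumption_Zone3`; the helper `split_features_target` is hypothetical.

```python
import pandas as pd

TARGET = "PowerConsumption_Zone3"


def split_features_target(df: pd.DataFrame):
    """Return X (every column except the target) and y (the target column)."""
    X = df.drop(columns=[TARGET])  # independent variables
    y = df[TARGET]                 # dependent / target variable
    return X, y
```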
13. Train-Test Split
• We then split the dataset into training and testing data using an 80:20 split, so the test size is set to 0.2.
• We set the random state to 42, which makes the split reproducible across different runs.
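With scikit-learn, this split is a single `train_test_split` call. The arrays below are small illustrative stand-ins for the real X and y, not project data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative stand-ins for the real X and y: 50 rows, 5 features
X = np.arange(250, dtype=float).reshape(50, 5)
y = np.arange(50, dtype=float)

# 80:20 split; random_state=42 fixes the shuffle, so every run produces the same split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)  # (40, 5) (10, 5)
```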
MinMax Scaler
• We used MinMaxScaler to normalize the features of the dataset.
• This puts all features on a consistent scale.
• MinMaxScaler rescales each feature to the range [0, 1].
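For each feature, MinMaxScaler applies x' = (x - min) / (max - min), with min and max learned from the training data. The sketch below uses toy values (not project data) and checks the scikit-learn result against that formula by hand.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy training and test data: 2 features
X_train = np.array([[10.0, 60.0], [20.0, 80.0], [30.0, 70.0]])
X_test = np.array([[25.0, 65.0]])

scaler = MinMaxScaler()                          # default feature_range=(0, 1)
X_train_scaled = scaler.fit_transform(X_train)   # learn per-column min/max from training data
X_test_scaled = scaler.transform(X_test)         # reuse the training min/max on test data
print(X_test_scaled)  # [[0.75 0.25]]

# The same computation by hand, column by column
col_min, col_max = X_train.min(axis=0), X_train.max(axis=0)
manual = (X_test - col_min) / (col_max - col_min)
```

Fitting on the training split only keeps test-set information out of the scaling; as a consequence, test values outside the training range map outside [0, 1].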
14. Applying Machine Learning Algorithms
This power consumption prediction task is a regression problem, since the target is continuous.
Models used:
• Linear Regression: Linear regression is the simplest statistical regression method used for predictive analysis in machine learning. It models a linear relationship between the independent (predictor) variables and the dependent (output) variable.
• Decision Tree Regression: Decision tree regression learns a tree of feature-based splits from the training data and predicts a continuous value at each leaf. For power consumption prediction, it splits on the independent variables to estimate consumption.
• Random Forest Regression: Random forest regression is a supervised, bagging-based ensemble learning method. Its trees are built independently of one another, so they can be trained in parallel, and their predictions are averaged.
• Gradient Boost Regression: Gradient boosting regression is an ensemble of decision trees built sequentially, with each new tree fitted to the errors of the trees before it. Within each tree, a prediction starts at the root, branches according to feature conditions, and ends at a leaf whose value is the prediction.
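A minimal scikit-learn sketch of fitting the four models named above. The hyperparameters (library defaults plus `random_state=42`), the `fit_all` helper, and the synthetic stand-in data are assumptions for illustration, not the project's actual configuration or dataset.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree Regression": DecisionTreeRegressor(random_state=42),
    # n_jobs=-1 trains the independent trees in parallel on all CPU cores
    "Random Forest Regression": RandomForestRegressor(random_state=42, n_jobs=-1),
    # Trees are added one at a time, each fitted to the current residuals
    "Gradient Boost Regression": GradientBoostingRegressor(random_state=42),
}


def fit_all(models: dict, X_train, y_train) -> dict:
    """Fit every model in place on the same training data."""
    for model in models.values():
        model.fit(X_train, y_train)
    return models


# Synthetic stand-in data: 5 features, partly non-linear target
rng = np.random.default_rng(42)
X_demo = rng.uniform(size=(300, 5))
y_demo = 3 * X_demo[:, 0] + np.sin(6 * X_demo[:, 1]) + rng.normal(scale=0.1, size=300)
fit_all(models, X_demo, y_demo)
predictions = {name: m.predict(X_demo[:5]) for name, m in models.items()}
```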
16. Model Selection and Considerations
• Random Forest Regression outperforms Linear Regression, Decision Tree Regression and Gradient Boost Regression on all metrics, with a higher R² score and lower MSE and RMSE. It is a promising model for our task.
• Based on these metrics, Random Forest Regression is the best-performing model overall.
• Hence, we chose Random Forest Regression as our final model, since it gave the best predictions for our power consumption task.
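The comparison rests on three metrics: R², MSE, and RMSE (the square root of MSE). A minimal sketch of computing them per model and picking a winner, assuming selection by highest R² (our choice for illustration). The helper names and the toy values are hypothetical, not the project's results.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def regression_metrics(y_true, y_pred) -> dict:
    """R², MSE and RMSE for one model's test-set predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {"r2": r2_score(y_true, y_pred), "mse": mse, "rmse": float(np.sqrt(mse))}


def best_model(results: dict) -> str:
    """Name of the model with the highest R² score."""
    return max(results, key=lambda name: results[name]["r2"])


# Usage with toy values (not the project's results):
y_true = np.array([1.0, 2.0, 3.0])
results = {
    "Model A": regression_metrics(y_true, np.array([1.0, 2.0, 3.0])),  # perfect fit
    "Model B": regression_metrics(y_true, np.array([2.0, 2.0, 2.0])),  # always predicts the mean
}
print(best_model(results))  # Model A
```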
17. Conclusion
• Using the insights, patterns and trends we found in the data, we applied machine learning to predict power consumption in Zone 3.
• This project offers significant benefits to electricity providers:
• By predicting power consumption, electricity providers can take proactive measures to produce power at the required rate. This means producing the right amount of electricity, reducing transmission losses and ensuring a proper supply to consumers.
• By focusing their efforts on periods of high consumption, electricity providers can streamline operations, reduce production costs and improve overall efficiency.
• Understanding the factors that influence power consumption enables providers to supply power efficiently to meet individual needs. This level of personalization fosters stronger consumer relationships and makes electricity supply more efficient, with fewer losses.