1) The document discusses the fundamentals of time series analysis, including characteristics of time series data, data cleaning, time-based indexing, visualization, grouping, and resampling.
2) Methods discussed include plotting univariate time series data from a random walk model and using Open Power System Data to analyze electricity consumption trends over time in Germany.
3) The data is cleaned, indexed by date, and grouped to visualize patterns such as seasonal effects and differences between weekdays and weekends.
Unit-5 Time series data Analysis.pptx
AD3301 DATA EXPLORATION AND VISUALIZATION
Unit 5: TIME SERIES ANALYSIS
Fundamentals of TSA – Characteristics of time series data – Data Cleaning – Time-based indexing – Visualizing – Grouping – Resampling.
Time series data
Time series data includes timestamps and is generated while monitoring an industrial process or tracking business metrics. An ordered sequence of timestamped values at equally spaced intervals is referred to as a time series.
Analysis of a time series is used in many applications, such as sales forecasting, utility studies, budget analysis, economic forecasting, and inventory studies. There are many methods that can be used to model and forecast a time series.
Fundamentals of TSA
1. We can generate a dataset using the numpy library:

import os
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
import seaborn as sns

zero_mean_series = np.random.normal(loc=0.0, scale=1., size=50)
print(zero_mean_series)
2. Next, we use the seaborn library to plot the time series data:

plt.figure(figsize=(16, 8))
g = sns.lineplot(data=zero_mean_series)
g.set_title('Zero mean model')
g.set_xlabel('Time index')
plt.show()

We plotted the time series graph using the seaborn.lineplot() function, which is a built-in method provided by the seaborn library. The output of the preceding code is given here: [Figure: line plot of the zero mean model]
3. We can perform a cumulative sum over the series and then plot the data using a time series plot; the plot gives more interesting results:

random_walk = np.cumsum(zero_mean_series)
print(random_walk)

It generates an array of the cumulative sums, as shown here:

[ 0.91315139 1.43270997 0.40098944 -0.32421356 1.56512255 1.1688074
1.88838045 1.90611164 0.0224164 0.64514216 -0.57903366 -0.97109746
-0.51869892 -0.36149331 -0.24264069 -1.21204774 -2.42202265 -1.48999747
-1.86245958 -0.75111634 -0.5947768 -1.1387184 -0.96996227 -0.68733947
-0.10438789 0.22012962 0.64998637 1.62499367 1.87220386 1.41535986
0.8318829 0.14436192 0.97258843 0.25077455 0.64568416 -1.14704284
-1.77078204 -2.01722767 -2.24674902 0.81636651 -2.24108755 -0.86213759
-1.25766759 -1.52125784 -1.73784212 -1.09963977 -2.87373147 -2.20701359
-3.09731306 -2.69971764]

Note that each value in random_walk is the sum of all the values of zero_mean_series up to and including that index.
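To make this relationship concrete, differencing the random walk should recover the original increments. A minimal check, assuming the two arrays defined above:

# differencing the cumulative sum recovers the original increments
print(np.allclose(np.diff(random_walk), zero_mean_series[1:]))  # prints True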
4. Now, if we plot the list using a time series plot, we get an interesting graph that shows the change in values over time:

plt.figure(figsize=(16, 8))
g = sns.lineplot(data=random_walk)
g.set_title('Random Walk')
g.set_xlabel('Time index')
plt.show()

The output of the preceding code is given here: [Figure: random walk line plot showing the change of values over time]
Univariate time series
• When we capture a sequence of observations for the same variable over a particular duration of time, the series is referred to as a univariate time series.
• In general, in a univariate time series, the observations are taken over regular time periods.
• (E.g.) The change in temperature throughout a day.
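For instance, a day of hourly temperature readings forms a univariate series. A small illustrative sketch in pandas (the temperature values below are made up for demonstration):

import numpy as np
import pandas as pd

# hypothetical hourly temperatures for a single day (made-up values)
hours = pd.date_range('2021-06-01 00:00', periods=24, freq='h')
temperature = pd.Series(20 + 5 * np.sin(np.linspace(0, 2 * np.pi, 24)), index=hours)
print(temperature.head())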
Characteristics of time series data
• Trend: When looking at time series data, it is essential to see if there is any trend. Observing a trend means that the average measurement values seem either to decrease or increase over time.
• Outliers: Time series data may contain a notable number of outliers. These outliers can be spotted when the data is plotted on a graph.
• Seasonality: Some time series data tends to repeat in certain patterns over a fixed interval. We refer to such repeating patterns as seasonality.
• Abrupt changes: Sometimes, there is an uneven change in time series data. We refer to such uneven changes as abrupt changes. Observing abrupt changes in time series is essential, as they can reveal underlying phenomena.
• Constant variance over time: It is essential to look at the time series data and see whether or not the data exhibits constant variance over time.
A synthetic example illustrating several of these characteristics is sketched below.
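The following sketch builds a made-up series that combines these characteristics (all numbers are invented for illustration): an upward trend, a roughly 30-day seasonal cycle, constant-variance noise, and one abrupt level shift:

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

idx = pd.date_range('2020-01-01', periods=365, freq='D')
trend = np.linspace(0, 10, 365)                          # upward trend
seasonal = 3 * np.sin(2 * np.pi * np.arange(365) / 30)   # ~30-day seasonality
noise = np.random.normal(scale=1.0, size=365)            # constant variance
series = pd.Series(trend + seasonal + noise, index=idx)
series.loc['2020-07-01':] += 5                           # abrupt change mid-year
series.plot(figsize=(10, 4), title='Trend + seasonality + abrupt change')
plt.show()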
Time Series Analysis (TSA) with Open Power System Data
We can use the Open Power System Data dataset to discover how electricity consumption and production varied over time in Germany.
Importing the dataset:

# load time series dataset
df_power = pd.read_csv("https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv")
print(df_power.columns)

The output of the preceding code is given here:

Index(['Date', 'Consumption', 'Wind', 'Solar', 'Wind+Solar'], dtype='object')

The columns of the dataframe are described here:
• Date: The date, in the format yyyy-mm-dd.
• Consumption: This indicates electricity consumption in GWh.
• Wind: This indicates wind power production in GWh.
• Solar: This indicates solar power production in GWh.
• Wind+Solar: This represents the sum of wind and solar power production in GWh.
Data cleaning
1. We can start by checking the shape of the dataset:

df_power.shape

The output of the preceding code is given here:

(4383, 5)

The dataframe contains 4,383 rows and 5 columns.
2. We can also check a few entries inside the dataframe. Let's examine the last 10 entries:

print(df_power.tail(10))
3. Next, let's review the data types of each column in our df_power dataframe:

print(df_power.dtypes)

The output of the preceding code is given here:

Date            object
Consumption    float64
Wind           float64
Solar          float64
Wind+Solar     float64
dtype: object

4. Note that the Date column has a data type of object. This is not correct, so the next step is to convert the Date column, as shown here:

# convert object to datetime format
df_power['Date'] = pd.to_datetime(df_power['Date'])

5. This should convert the Date column to datetime format. We can verify this again:

print(df_power.dtypes)

The output of the preceding code is given here:

Date           datetime64[ns]
Consumption           float64
Wind                  float64
Solar                 float64
Wind+Solar            float64
dtype: object

Note that the Date column has been changed into the correct data type.
6. Let's next change the index of our dataframe to the Date column:

df_power = df_power.set_index('Date')
df_power.tail(3)

The output of the preceding code is given here: [table: last three rows of df_power] Note that the Date column has now been set as the DatetimeIndex.
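As an aside, pandas can parse the dates and set the index in a single step at load time; a minimal sketch using the parse_dates and index_col options of pd.read_csv (same URL as above):

import pandas as pd

# parse Date while reading and use it directly as the index
df_power = pd.read_csv(
    "https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv",
    parse_dates=['Date'], index_col='Date')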
7. We can simply verify this by using the code snippet given here:

print(df_power.index)

The output of the preceding code is given here:

DatetimeIndex(['2006-01-01', '2006-01-02', '2006-01-03',
               '2006-01-04', '2006-01-05', '2006-01-06', '2006-01-07',
               '2006-01-08', '2006-01-09', '2006-01-10',
               ...
               '2017-12-22', '2017-12-23', '2017-12-24', '2017-12-25',
               '2017-12-26', '2017-12-27', '2017-12-28', '2017-12-29',
               '2017-12-30', '2017-12-31'],
              dtype='datetime64[ns]', name='Date', length=4383, freq=None)

8. Since our index is a DatetimeIndex object, we can now use it to analyze the dataframe. To make this easier, let's add Year, Month, and Weekday Name columns:

# Add columns with year, month, and weekday name
df_power['Year'] = df_power.index.year
df_power['Month'] = df_power.index.month
df_power['Weekday Name'] = df_power.index.day_name()
9. Let's display five random rows from the dataframe:

# Display a random sampling of 5 rows
print(df_power.sample(5, random_state=0))

The output of this code is given here: [table: five randomly sampled rows] Note that we added three more columns: Year, Month, and Weekday Name. Adding these columns makes the analysis of the data easier.
Time-based indexing
Time-based indexing is a very powerful feature of the pandas library. With time-based indexing, we can use a formatted string to select data.
See the following code, for example:

print(df_power.loc['2015-10-02'])

The output of the preceding code is given here:

Consumption                1391.05
Wind                        81.229
Solar                      160.641
Wind+Solar                  241.87
Year                          2015
Month                           10
Weekday Name                Friday
Name: 2015-10-02 00:00:00, dtype: object

Note that we used the pandas dataframe loc accessor. In the preceding example, we used a date as a string to select a row. We can use all sorts of techniques to access rows, just as we can with a normal dataframe index.
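Partial-string indexing extends this idea: a string naming only a month or a year selects every row in that period, and date strings also work as slice endpoints. A short sketch, assuming df_power as defined above:

# every row in October 2015
print(df_power.loc['2015-10'])

# an explicit date range (both endpoints inclusive)
print(df_power.loc['2015-10-02':'2015-10-08'])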
Visualizing time series
Let's visualize the time series dataset. We will continue using the same df_power dataframe:
1. The first step is to import the seaborn and matplotlib libraries:

import matplotlib.pyplot as plt
import seaborn as sns
sns.set(rc={'figure.figsize': (11, 4)})
plt.rcParams['figure.figsize'] = (8, 5)
plt.rcParams['figure.dpi'] = 150

2. Next, let's generate a line plot of the full time series of Germany's daily electricity consumption:

df_power['Consumption'].plot(linewidth=0.5)
The output of the preceding code is given here: [Figure: daily electricity consumption, 2006-2017] The y-axis shows the electricity consumption and the x-axis shows the year. However, with roughly twelve years of daily data compressed into one plot, it is hard to see the details within any single year.
3. Let's use dots to plot the data for all the other columns:

cols_to_plot = ['Consumption', 'Solar', 'Wind']
axes = df_power[cols_to_plot].plot(marker='.', alpha=0.5,
                                   linestyle='None', figsize=(14, 6), subplots=True)
for ax in axes:
    ax.set_ylabel('Daily Totals (GWh)')

The output of the preceding code is given here: [Figure: dot plots of Consumption, Solar, and Wind]
The output shows that electricity consumption can be broken down into two distinct patterns:
• One cluster roughly at 1,400 GWh and above
• Another cluster roughly below 1,400 GWh
Moreover, solar production is higher in summer and lower in winter. Over the years, there seems to have been a strong increasing trend in the output of wind power.
4. We can further investigate a single year to have a closer look. Check the code given here:

ax = df_power.loc['2016', 'Consumption'].plot()
ax.set_ylabel('Daily Consumption (GWh)');

The output of the preceding code is given here: [Figure: daily consumption for 2016] We can clearly see the consumption of electricity for 2016. The graph shows a drastic decrease in consumption at the end of the year (December) and during August.
Let's examine the month of December 2016 with the following code block:

ax = df_power.loc['2016-12', 'Consumption'].plot(marker='o', linestyle='-')
ax.set_ylabel('Daily Consumption (GWh)');

The output of the preceding code is given here: [Figure: daily consumption for December 2016] As shown in the graph, electricity consumption is higher on weekdays and lowest at the weekends. We can see the consumption for each day of the month, and we can zoom in further to see how consumption plays out in the last week of December.
In order to indicate a particular week of December, we can supply a specific date range as shown here:

ax = df_power.loc['2016-12-23':'2016-12-30', 'Consumption'].plot(marker='o', linestyle='-')
ax.set_ylabel('Daily Consumption (GWh)');

As illustrated in the preceding code, we want to see the electricity consumption between 2016-12-23 and 2016-12-30. The output is given here: [Figure: daily consumption, 2016-12-23 to 2016-12-30] Electricity consumption was lowest on the day of Christmas, probably because people were busy partying. After Christmas, consumption increased.
Grouping time series data
1. We can first group the data by month and then use box plots to visualize it:

fig, axes = plt.subplots(3, 1, figsize=(8, 7), sharex=True)
for name, ax in zip(['Consumption', 'Solar', 'Wind'], axes):
    sns.boxplot(data=df_power, x='Month', y=name, ax=ax)
    ax.set_ylabel('GWh')
    ax.set_title(name)
    if ax != axes[-1]:
        ax.set_xlabel('')

The output of the preceding code is given here: [Figure: monthly box plots of Consumption, Solar, and Wind]

2. Next, we can group the consumption of electricity by the day of the week, and present it in a box plot:

sns.boxplot(data=df_power, x='Weekday Name', y='Consumption');

The output of the preceding code is given here: [Figure: box plot of consumption by weekday] It shows that electricity consumption is higher on weekdays than on weekends. Interestingly, there are more outliers on the weekdays.
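The weekday effect seen in the box plot can also be summarized numerically; a quick sketch using groupby on the Weekday Name column added earlier (assuming df_power as above):

# mean daily consumption per day of the week
# (rows come out in alphabetical order of the day names)
print(df_power.groupby('Weekday Name')['Consumption'].mean().round(1))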
Resampling time series data
It is often necessary to resample a dataset at a lower or higher frequency. This resampling is done based on aggregation or grouping operations. For example, we can resample the data to a weekly mean time series as follows:
1. We can use the code given here to resample our data:

columns = ['Consumption', 'Wind', 'Solar', 'Wind+Solar']
power_weekly_mean = df_power[columns].resample('W').mean()
power_weekly_mean

The output of the preceding code is given here: [table: weekly mean values] Each row contains the mean of the data for the week ending on the labeled date; the first row, labeled 2006-01-01, covers only that single day, since the series starts on a Sunday. We can plot the daily and weekly time series together to compare the dataset over a six-month period.
2. Let's look at the first six months of 2016. Let's start by initializing the start and end variables:

start, end = '2016-01', '2016-06'

3. Next, let's plot the graph using the code given here:

fig, ax = plt.subplots()
ax.plot(df_power.loc[start:end, 'Solar'],
        marker='.', linestyle='-', linewidth=0.5, label='Daily')
ax.plot(power_weekly_mean.loc[start:end, 'Solar'],
        marker='o', markersize=8, linestyle='-', label='Weekly Mean Resample')
ax.set_ylabel('Solar Production (GWh)')
ax.legend();

The output of the preceding code is given here: [Figure: daily vs. weekly mean solar production, Jan-Jun 2016] The plot shows that the weekly mean time series increases over this period and is much smoother than the daily time series.
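The same resample mechanism works at other frequencies. A hedged sketch of a monthly aggregation (assuming df_power and the columns list from above; 'M' labels each bin with the month end, and min_count makes months with too few daily values come out as NaN rather than misleading small sums):

# monthly totals, requiring at least 28 daily observations per month
power_monthly_sum = df_power[columns].resample('M').sum(min_count=28)
print(power_monthly_sum.head(3))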