AD3301
DATA EXPLORATION AND VISUALIZATION
Unit 5
TIME SERIES ANALYSIS
Fundamentals of TSA – Characteristics of time
series data – Data Cleaning – Time-based
indexing – Visualizing – Grouping –
Resampling.
Time series data
Time series data includes timestamps and is generated while monitoring industrial processes or tracking business metrics.
An ordered sequence of values recorded at equally spaced time intervals is referred to as a time series.
Time series analysis is used in many applications such as sales forecasting, utility studies, budget analysis, economic forecasting, and inventory studies.
There are many methods that can be used to model and forecast time series.
Fundamentals of TSA
1. We can generate the dataset using the
numpy library:
import os
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
import seaborn as sns
zero_mean_series = np.random.normal(loc=0.0, scale=1.,
size=50)
print(zero_mean_series)
The output of the preceding code is given here:
[ 0.91315139 0.51955858 -1.03172053 -0.725203 1.88933611 -0.39631515
0.71957305 0.01773119 -1.88369523 0.62272576 -1.22417583 -0.3920638
0.45239854 0.15720562 0.11885262 -0.96940705 -1.20997492 0.93202519
-0.37246211 1.11134324 0.15633954 -0.5439416 0.16875613 0.2826228
0.58295158 0.3245175 0.42985676 0.97500729 0.24721019 -0.45684401
-0.58347696 -0.68752098 0.82822652 -0.72181389 0.39490961 -1.792727
-0.6237392 -0.24644562 -0.22952135 3.06311553 -3.05745406 1.37894995
-0.39553 -0.26359025 -0.21658428 0.63820235 -1.7740917 0.66671788
-0.89029947 0.39759542]
2. Next, we use the seaborn library to plot the
time series data.
plt.figure(figsize=(16, 8))
g = sns.lineplot(data=zero_mean_series)
g.set_title('Zero mean model')
g.set_xlabel('Time index')
plt.show()
We plotted the time series graph using the seaborn.lineplot() function, a built-in plotting function provided by the seaborn library. The output of the preceding code is given here:
3. We can perform a cumulative sum over the list and then plot the data using a time series plot. The plot gives more interesting results:
random_walk = np.cumsum(zero_mean_series)
print(random_walk)
It generates an array of the cumulative sum as shown here:
[ 0.91315139 1.43270997 0.40098944 -0.32421356 1.56512255 1.1688074
1.88838045 1.90611164 0.0224164 0.64514216 -0.57903366 -0.97109746
-0.51869892 -0.36149331 -0.24264069 -1.21204774 -2.42202265 -1.48999747
-1.86245958 -0.75111634 -0.5947768 -1.1387184 -0.96996227 -0.68733947
-0.10438789 0.22012962 0.64998637 1.62499367 1.87220386 1.41535986
0.8318829 0.14436192 0.97258843 0.25077455 0.64568416 -1.14704284
-1.77078204 -2.01722767 -2.24674902 0.81636651 -2.24108755 -0.86213759
-1.25766759 -1.52125784 -1.73784212 -1.09963977 -2.87373147 -2.20701359
-3.09731306 -2.69971764]
Note that each value in random_walk is the sum of all the values that precede it in the original series, which is why the result is called a random walk.
4. Now, if we plot the list using the time series plot as
shown here, we get an interesting graph that shows
the change in values over time:
plt.figure(figsize=(16, 8))
g = sns.lineplot(data=random_walk)
g.set_title('Random Walk')
g.set_xlabel('Time index')
plt.show()
The output of the preceding code is given here:
Note the graph shown in the preceding diagram: it shows how the cumulative values change over time.
Univariate time series
• When we capture a sequence of observations for the same variable over a particular duration of time, the series is referred to as a univariate time series.
• In general, in a univariate time series, the observations are taken at regular time intervals.
• (E.g.) The change in temperature throughout a day, as in the sketch below.
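As a quick illustration (hypothetical data, not from the dataset used later in this unit), such a univariate series can be represented as a pandas Series indexed by time:
import numpy as np
import pandas as pd

# Hypothetical univariate series: 24 hourly temperature readings for one day
hours = pd.date_range('2021-06-01', periods=24, freq='H')
temps = 20 + 5 * np.sin(np.linspace(0, 2 * np.pi, 24)) + np.random.normal(0, 0.5, 24)
temperature = pd.Series(temps, index=hours, name='Temperature')
print(temperature.head())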
Characteristics of time series data
• Trend: When looking at time series data, it is essential to see if there
is any trend. Observing a trend means that the average measurement
values seem either to decrease or increase over time.
• Outliers: Time series data may contain a notable number of outliers. These outliers stand out when the data is plotted on a graph.
• Seasonality: Some time series data tends to repeat in patterns over a certain interval. We refer to such repeating patterns as seasonality.
• Abrupt changes: Sometimes, there is an uneven change in time series data. We refer to such uneven changes as abrupt changes. Observing abrupt changes in a time series is important because they can reveal underlying phenomena.
• Constant variance over time: It is essential to look at the time series
data and see whether or not the data exhibits constant variance over
time.
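To make these characteristics concrete, here is a minimal sketch (synthetic data invented purely for illustration) that combines a trend, a seasonal pattern, constant-variance noise, and one injected outlier, and then plots the result:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic two-year daily series: trend + yearly seasonality + noise + one outlier
idx = pd.date_range('2015-01-01', periods=730, freq='D')
trend = np.linspace(0, 10, len(idx))                        # slowly increasing average
seasonality = 5 * np.sin(2 * np.pi * idx.dayofyear / 365.0) # repeating yearly pattern
noise = np.random.normal(0, 1, len(idx))                    # constant variance over time
series = pd.Series(trend + seasonality + noise, index=idx)
series.iloc[400] += 25                                      # an abrupt change / outlier

series.plot(figsize=(12, 4), linewidth=0.8)
plt.title('Synthetic series with trend, seasonality, and an outlier')
plt.show()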
Time Series Analysis (TSA) with Open
Power System Data
• We can use the Open Power System dataset to discover how electricity consumption and production vary over time in Germany.
• Importing the dataset
import pandas as pd

# load time series dataset
df_power = pd.read_csv("https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv")
print(df_power.columns)
The output of the preceding code is given here:
Index(['Date', 'Consumption', 'Wind', 'Solar', 'Wind+Solar'], dtype='object')
The columns of the dataframe are described here:
• Date: The date is in the format yyyy-mm-dd.
• Consumption: This indicates electricity consumption in GWh.
• Wind: This indicates wind power production in GWh.
• Solar: This indicates solar power production in GWh.
• Wind+Solar: This represents the sum of wind and solar power production in GWh.
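As an optional variation (not part of the original walkthrough), pandas can parse the Date column and set it as the index in a single read_csv call, which would let us skip some of the cleaning steps shown next:
import pandas as pd

# Parse the Date column and use it as the index while loading the CSV
url = "https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv"
df_power = pd.read_csv(url, parse_dates=['Date'], index_col='Date')
print(df_power.index.dtype)  # datetime64[ns]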
Data cleaning
1. We can start by checking the shape of the dataset:
df_power.shape
The output of the preceding code is given here:
(4383, 5)
The dataframe contains 4,383 rows and 5 columns.
2. We can also check a few entries inside the dataframe.
Let's examine the last 10 entries:
print(df_power.tail(10))
The output of the preceding code is given here:
3. Next, let's review the data types of
each column in our df_power dataframe:
print(df_power.dtypes)
The output of the preceding code is given here:
Date object
Consumption float64
Wind float64
Solar float64
Wind+Solar float64
dtype: object
4. Note that the Date column has a data type of object. This is not
correct. So, the next step is to correct the Date column, as shown
here:
#convert object to datetime format
df_power['Date'] = pd.to_datetime(df_power['Date'])
5. This converts the Date column to datetime format. We can verify this again:
print(df_power.dtypes)
The output of the preceding code is given here:
Date datetime64[ns]
Consumption float64
Wind float64
Solar float64
Wind+Solar float64
dtype: object
Note that the Date column has been changed into the correct data
type.
6. Let's next change the index of our dataframe
to the Date column:
df_power = df_power.set_index('Date')
df_power.tail(3)
The output of the preceding code is given
here:
Note from the preceding output that the Date column has now been set as the DatetimeIndex.
7. We can simply verify this by using the code snippet given here:
print(df_power.index)
The output of the preceding code is given here:
DatetimeIndex(['2006-01-01', '2006-01-02', '2006-01-03',
'2006-01-04', '2006-01-05', '2006-01-06', '2006-01-07',
'2006-01-08', '2006-01-09', '2006-01-10', ... '2017-12-22',
'2017-12-23', '2017-12-24', '2017-12-25', '2017-12-26',
'2017-12-27', '2017-12-28', '2017-12-29', '2017-12-30',
'2017-12-31'],dtype='datetime64[ns]', name='Date', length=4383,
freq=None)
8. Since our index is now a DatetimeIndex object, we can use it to analyze the dataframe. Let's add more columns to our dataframe to make analysis easier. Let's add Year, Month, and Weekday Name:
# Add columns with year, month, and weekday name
df_power['Year'] = df_power.index.year
df_power['Month'] = df_power.index.month
df_power['Weekday Name'] = df_power.index.day_name()
9. Let's display five random rows from the dataframe:
# Display a random sampling of 5 rows
print(df_power.sample(5, random_state=0))
The output of this code is given here:
Note that we added three more columns—Year, Month, and
Weekday Name. Adding these columns helps to make the
analysis of data easier.
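The steps above do not address missing values, which is another common part of data cleaning; a minimal sketch (assuming the df_power dataframe prepared above) is given here:
# Count missing values in each column
print(df_power.isnull().sum())

# One simple option: fill short gaps by carrying the last valid value forward
df_power['Consumption'] = df_power['Consumption'].ffill()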
Time-based indexing
Time-based indexing is a very powerful feature of the pandas library. With a DatetimeIndex, we can select rows using a formatted date string.
See the following code, for example:
print(df_power.loc['2015-10-02'])
The output of the preceding code is given here:
Consumption 1391.05
Wind 81.229
Solar 160.641
Wind+Solar 241.87
Year 2015
Month 10
Weekday Name Friday
Name: 2015-10-02 00:00:00, dtype: object
Note that we used the pandas dataframe loc accessor. In the preceding
example, we used a date as a string to select a row. We can use all sorts of
techniques to access rows just as we can do with a normal dataframe
index.
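Partial-string indexing also works for coarser periods and date ranges; for example (continuing with the same df_power dataframe), we can select an entire month or a slice between two dates:
# All rows for October 2015 (partial-string indexing on the DatetimeIndex)
print(df_power.loc['2015-10'].head())

# A slice of dates; with .loc both endpoints are included
print(df_power.loc['2015-10-02':'2015-10-06', 'Consumption'])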
Visualizing time series
Let's visualize the time series dataset. We will continue using the
same df_power dataframe:
1. The first step is to import the seaborn and matplotlib libraries:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(rc={'figure.figsize':(11, 4)})
plt.rcParams['figure.figsize'] = (8,5)
plt.rcParams['figure.dpi'] = 150
2. Next, let's generate a line plot of the full time series of Germany's
daily electricity consumption:
df_power['Consumption'].plot(linewidth=0.5)
The output of the preceding code is given here:
As depicted in the preceding plot, the y-axis shows the electricity consumption and the x-axis shows the year. However, the daily values span so many years that the plot is too dense to reveal much detail.
3. Let's use the dots to plot the data for all the other columns:
cols_to_plot = ['Consumption', 'Solar', 'Wind']
axes = df_power[cols_to_plot].plot(marker='.', alpha=0.5,
linestyle='None',figsize=(14, 6), subplots=True)
for ax in axes:
ax.set_ylabel('Daily Totals (GWh)')
The output of the preceding code is given here:
The output shows that electricity consumption can be broken down into two distinct patterns:
• One cluster of values roughly at 1,400 GWh and above
• Another cluster roughly below 1,400 GWh
Moreover, solar production is higher in summer and lower in winter. Over the years,
there seems to have been a strong increasing trend in the output of wind power.
4. We can further investigate a single year to have a closer look.
Check the code given here:
ax = df_power.loc['2016', 'Consumption'].plot()
ax.set_ylabel('Daily Consumption (GWh)');
The output of the preceding code is given here:
From the preceding screenshot, we can see clearly the
consumption of electricity for 2016.
The graph shows a drastic decrease in the consumption of electricity at the end of the year (December) and during August.
Let's examine the month of December 2016 with the following
code block:
ax = df_power.loc['2016-12',
'Consumption'].plot(marker='o', linestyle='-')
ax.set_ylabel('Daily Consumption (GWh)');
The output of the preceding code is given here:
As shown in the preceding graph, electricity consumption is higher
on weekdays and lowest at the weekends. We can see the
consumption for each day of the month. We can zoom in further to
see how consumption plays out in the last week of December.
In order to indicate a particular week of December, we can supply a specific date
range as shown here:
ax = df_power.loc['2016-12-23':'2016-12-30',
'Consumption'].plot(marker='o', linestyle='-')
ax.set_ylabel('Daily Consumption (GWh)');
As illustrated in the preceding code, we want to see the electricity consumption
between 2016-12-23 and 2016-12-30. The output of the preceding code is given here:
As illustrated in the preceding screenshot, electricity consumption was lowest
on the day of Christmas, probably because people were busy partying. After
Christmas, the consumption increased.
Grouping time series data
1. We can first group the data by months and then use the
box plots to visualize the data:
fig, axes = plt.subplots(3, 1, figsize=(8, 7), sharex=True)
for name, ax in zip(['Consumption', 'Solar', 'Wind'], axes):
    sns.boxplot(data=df_power, x='Month', y=name, ax=ax)
    ax.set_ylabel('GWh')
    ax.set_title(name)
    if ax != axes[-1]:
        ax.set_xlabel('')
The output of the preceding code is given here:
2. Next, we can group the consumption of electricity by the
day of the week, and present it in a box plot:
sns.boxplot(data=df_power, x='Weekday Name',
y='Consumption');
The output of the preceding code is given here:
The preceding screenshot shows that electricity consumption is higher on
weekdays than on weekends. Interestingly, there are more outliers on the
weekdays.
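The box plots show the distribution of values within each group; if we also want the group averages themselves, a short sketch using pandas groupby (with the same df_power dataframe) could look like this:
# Mean daily consumption for each day of the week
weekday_means = df_power.groupby('Weekday Name')['Consumption'].mean()
print(weekday_means.sort_values(ascending=False))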
Resampling time series data
It is often required to resample the dataset at lower or higher frequencies. This
resampling is done based on aggregation or grouping operations. For example, we can
resample the data based on the weekly mean time series as follows:
1. We can use the code given here to resample our data:
columns = ['Consumption', 'Wind', 'Solar', 'Wind+Solar']
power_weekly_mean = df_power[columns].resample('W').mean()
power_weekly_mean
The output of the preceding code is given here:
As shown in the preceding screenshot, the first row, labeled 2006-01-01, contains the mean of the data that falls in the first weekly bin. We can plot the daily and weekly time series to compare the dataset over a six-month period.
2. Let's look at the first six months of 2016. Let's start by initializing the variables:
start, end = '2016-01', '2016-06'
3. Next, let's plot the graph using the code given here:
fig, ax = plt.subplots()
ax.plot(df_power.loc[start:end, 'Solar'],
    marker='.', linestyle='-', linewidth=0.5, label='Daily')
ax.plot(power_weekly_mean.loc[start:end, 'Solar'],
    marker='o', markersize=8, linestyle='-', label='Weekly Mean Resample')
ax.set_ylabel('Solar Production (GWh)')
ax.legend();
The output of the preceding code is given here:
The preceding screenshot shows that the weekly mean
time series is increasing over time and is much smoother
than the daily time series.
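Resampling is not limited to weekly means; as a further illustration (an extension of the example above, not part of the original slides), we could resample to monthly frequency and aggregate with a sum:
# Monthly totals for consumption and production (GWh)
columns = ['Consumption', 'Wind', 'Solar', 'Wind+Solar']
power_monthly_sum = df_power[columns].resample('M').sum()
print(power_monthly_sum.tail(3))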