Web Traffic Time Series Forecasting
Submitted by: Korivi Sravan Kumar
Introduction:
The data set contains daily views of Wikipedia articles: each row is an individual page together with its daily view counts. The total number of pages in the data set is 145k. Training data set 1 contains daily views from July 1st, 2015 to December 31st, 2016, a total of 550 days.
The forecast models are tested on data from January 1st, 2017 up to and including March 1st, 2017, which is 60 days.
Training data set 2 extends the data up to September 1st, 2017; the test set is created from training data set 2 to evaluate forecast accuracy.
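As a quick arithmetic check of the stated window lengths (a minimal sketch in base R, using only the dates given above):
# verify the lengths of the training and test windows
as.numeric(as.Date("2016-12-31") - as.Date("2015-07-01")) + 1 # 550 days in training set 1
as.numeric(as.Date("2017-03-01") - as.Date("2017-01-01")) + 1 # 60 days in the test window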
Importing libraries:
All the libraries needed for data manipulation, time series handling and forecasting are imported.
Data input and creation of training and test data sets:
The data is split into training and test sets based on the train_1 and train_2 data sets. Columns of the train_2 data set from January 1st, 2017 to March 1st, 2017 (including March 1st) are selected as the test set.
library(forecast) #working with time series
library(fpp2) #working with time series
library(dplyr) # data manipulation
library(tidyverse) #data manipulation
library(lubridate) # easily work with dates and times
library(zoo) # working with time series data
setwd("D:/Assignment-2/") #Set the working directory
train <- read.csv("train_1.csv") #Read train_1 csv file
dim(train) # Rows = 145063; Columns = 551
rows_count = nrow(train) #No. of rows
cols_count = ncol(train) #No. of columns
train2 <- read.csv("train_2.csv") #Read train_2 csv file
dim(train2)
test <- train2[, (cols_count+1):(cols_count+60)] # columns 552..611 = the 60 test days
After splitting the data into train and test sets, each page's data needs to be converted into a time series for forecasting.
To make the code easier to follow, we selected a random row using sample() and use row number 70772 to walk through the conversion to a time series, the application of the different forecasting models, and the evaluation methodology.
In practice, all of the code below is run in a loop to produce a forecast for each page, as in the Kaggle 'Web Traffic Time Series Forecasting' competition; the full loop is provided at the end of the document.
Converting to time series
trainsep = train[70772,]
testsep = test[70772,]
sum = sum(trainsep[,2:cols_count]) # NA check for the selected page
if(!is.na(sum)){
f = t(trainsep[,-1]) # drop the Page column and transpose to a single column of visits
f_test = t(testsep)
# substr(row.names(f), 2, 11) converts column names X(yyyy.mm.dd) into dates (yyyy.mm.dd)
f = data.frame(f,substr(row.names(f),2,11))
colnames(f) = c("visits","dat")
f_test = data.frame(f_test,substr(row.names(f_test),2,11))
colnames(f_test) = c("visits","dat")
#---------------------Rest of the code is in the if condition------------------------
}
f.ts = ts(f$visits, frequency = 7) # create a time series object with weekly seasonality (daily data from 2015-07-01)
f.ts = tsclean(f.ts) # identify and replace outliers and missing values in the time series
ggAcf(f.ts) # ACF plot of the series
A Box-Ljung test is performed to check whether the time series is white noise. As the p-value is < 0.05, the series is not white noise.
> Box.test(f.ts, lag = 10, fitdf = 0, type = "Lj")
Box-Ljung test
data: f.ts
X-squared = 5260.9, df = 10, p-value < 2.2e-16
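For reference, the statistic reported above is the Ljung-Box Q,
Q = n(n+2) \sum_{k=1}^{h} \frac{\hat{\rho}_k^2}{n-k},
where \hat{\rho}_k is the lag-k sample autocorrelation; under the white-noise null hypothesis, Q is approximately chi-squared distributed with h - fitdf degrees of freedom (here h = 10 lags and fitdf = 0).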
Forecasting models:
Naïve, seasonal naïve, moving-average (mean), simple exponential smoothing, Holt's and Holt-Winters forecasts are applied to the data to produce forecasts for the next 60 days.
1. Naïve forecast:
The naïve forecast (last observed value carried forward) is applied to the training time series.
fcnaive_ts = naive(f.ts, 60) # naive forecast, 60 days ahead
summary(fcnaive_ts)
autoplot(fcnaive_ts)
checkresiduals(fcnaive_ts)
2. Seasonal naïve forecast:
The seasonal naïve forecast is applied in the same way (see the calls below). Upon checking the residuals and performing the Box test, the p-value is < 0.05, suggesting that the residuals are not white noise.
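The corresponding calls mirror the batch code at the end of this report:
fcsnaive_ts = snaive(f.ts, 60) # seasonal naive: repeat the value from the same day of the previous week
summary(fcsnaive_ts)
autoplot(fcsnaive_ts)
checkresiduals(fcsnaive_ts)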
3. Moving average:
autoplot(f.ts, series = "Data") +
autolayer(ma(f.ts, 7), series = "1 week MA") +
autolayer(ma(f.ts, 31), series = "1 month MA") +
autolayer(ma(f.ts, 91), series = "3 month MA") +
autolayer(ma(f.ts, 183), series = "6 month MA") +
xlab("Date") +
ylab("visits")
4. Simple exponential smoothing:
fcses_ts <- ses(f.ts, alpha = .2, h = 60) # simple exponential smoothing
summary(fcses_ts)
autoplot(fcses_ts) # plot
checkresiduals(fcses_ts) # check whether the residuals are white noise
Output:
> checkresiduals(fcses_ts)
	Ljung-Box test
data:  Residuals from Simple exponential smoothing
Q* = 908.14, df = 108, p-value < 2.2e-16
Model df: 2.   Total lags used: 110
As the p-value of the Box test is < 0.05, the residuals are not white noise: the data contains both trend and seasonality, which simple exponential smoothing does not capture.
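The smoothing parameter alpha can also be tuned by grid search against the test RMSE, exactly as in the batch code at the end of this report:
alpha <- seq(.01, .99, by = .01)
RMSE <- NA
for(i in seq_along(alpha)) {
fit <- ses(f.ts, alpha = alpha[i], h = 60)
RMSE[i] <- accuracy(fit, f_test$visits)[2,2] # test-set RMSE for this alpha
}
alpha.fit <- data_frame(alpha, RMSE)
alpha.min <- filter(alpha.fit, RMSE == min(RMSE))
fcses_ts <- ses(f.ts, alpha = alpha.min$alpha, h = 60) # refit with the tuned alpha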
5. Holt's smoothing:
fcholt_ts <- holt(f.ts, h = 60) # Holt's linear trend method
summary(fcholt_ts)
autoplot(fcholt_ts)
checkresiduals(fcholt_ts)
Output:
> checkresiduals(fcholt_ts)
	Ljung-Box test
data:  Residuals from Holt's method
Q* = 1002, df = 106, p-value < 2.2e-16
Model df: 4.   Total lags used: 110
The beta (trend) parameter is then tuned by grid search:
# identify optimal beta parameter
beta <- seq(.0001, .5, by = .001)
RMSE <- NA
for(i in seq_along(beta)) {
fit <- holt(f.ts, beta = beta[i], h = 60)
RMSE[i] <- accuracy(fit, f_test$visits)[2,2]
}
# convert to a data frame and identify the minimum-RMSE beta value
beta.fit <- data_frame(beta, RMSE)
beta.min <- filter(beta.fit, RMSE == min(RMSE))
# plot RMSE vs. beta
ggplot(beta.fit, aes(beta, RMSE)) +
geom_line() +
geom_point(data = beta.min, aes(beta, RMSE), size = 2, color = "blue")
fcholt_ts <- holt(f.ts, h = 60, beta = beta.min$beta) # refit with the tuned beta
6. Holt-Winters smoothing:
The additive decomposition of the time series is plotted with autoplot(decompose(f.ts)), and an ETS model with automatically selected components is fitted:
hw.ts <- ets(f.ts, model = "ZZZ") # "ZZZ": error, trend and seasonal components selected automatically
checkresiduals(hw.ts)
autoplot(hw.ts)
summary(hw.ts)
> summary(hw.ts)
ETS(M,N,M)
Call:
 ets(y = f.ts, model = "ZZZ")
Smoothing parameters:
  alpha = 0.6672
  gamma = 0.0364
Initial states:
  l = 194.5145
  s = 1.1697 1.0074 0.9371 0.9015 0.9571 1.0013 1.0259
sigma: 0.1116
     AIC     AICc      BIC
8362.877 8363.286 8405.977
Training set error measures:
                   ME     RMSE      MAE        MPE     MAPE      MASE        ACF1
Training set 2.652725 88.74605 61.03587 -0.1384452 7.216793 0.6028258 -0.01053619
ETS(M,N,M) denotes multiplicative errors, no trend and multiplicative (weekly) seasonality. This Holt-Winters model has residuals with a higher Box-test p-value than the other models.
Evaluating the different forecast models:
Every model is evaluated by its RMSE on the test data. On the basis of the lowest test RMSE, Holt's method is selected and used to forecast.
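As a small sketch of how each comparison below is read (mirroring the batch code, which stores the test-set RMSE per model):
act <- accuracy(fcholt_ts, f_test$visits) # 2 x 7 matrix: rows = Training/Test set, cols = ME, RMSE, MAE, ...
act[2, 2] # test-set RMSE, the model-selection criterion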
> accuracy(fcnaive_ts, f_test$visits)
                     ME     RMSE       MAE        MPE      MAPE     MASE       ACF1
Training set   1.967213 102.6296  68.49265 -0.2271511  7.699251 1.000000 -0.1835412
Test set     283.950000 419.6527 302.65000 15.8649924 17.797103 4.418722         NA
> accuracy(fcsnaive_ts, f_test$visits)
                    ME     RMSE      MAE       MPE     MAPE     MASE      ACF1
Training set  16.93582 145.0159 101.2496 1.3114613 11.96902 1.478255 0.6341429
Test set      46.02771 315.4499 181.7056 0.2651809 11.04817 2.652921        NA
> accuracy(mean_fc, f_test$visits)
                       ME     RMSE      MAE        MPE      MAPE      MASE    ACF1
Training set 4.291307e-14 751.6030 694.2092 -119.25917 156.48239 10.135528 0.98933
Test set     4.602121e+02 554.3247 466.6034   27.59744  28.31075  6.812459      NA
> accuracy(fcses_ts, f_test$visits)
                    ME     RMSE       MAE       MPE      MAPE     MASE      ACF1
Training set 22.432887 128.3612  87.42038  1.732784  9.373469 1.276347 0.6310515
Test set     -3.173597 309.0159 188.65375 -3.246674 11.869201 2.754365        NA
> accuracy(fcholt_ts, f_test$visits)
                    ME     RMSE       MAE        MPE      MAPE      MASE       ACF1
Training set -3.993354  99.2416  66.82666 -1.6738377  7.629988 0.9756764 0.08924597
Test set     28.649399 308.8983 193.02659 -0.9831642 11.896523 2.8182087         NA
> accuracy(fcets_ts, f_test$visits)
                     ME      RMSE       MAE        MPE      MAPE      MASE        ACF1
Training set   2.652725  88.74605  61.03587 -0.1384452  7.216793 0.8911302 -0.01053619
Test set     114.850686 314.26532 173.62511  4.8900676 10.024239 2.5349451          NA
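Collecting the test-set RMSE values above for comparison:
Method    Test RMSE
naive        419.65
snaive       315.45
mean         554.32
ses          309.02
holt         308.90
ets          314.27
Holt's method has the lowest test RMSE and is therefore selected.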
R code to run all 145k pages automatically:
#Library
library(forecast) #working with time series
library(fpp2) #working with time series
library(dplyr) # data manipulation
library(tidyverse) #data manipulation
library(lubridate) # easily work with dates and times
library(zoo) # working with time series data
#train data
train <- read.csv("train_1.csv")
dim(train)
# head(train)
rows_count = nrow(train)
cols_count = ncol(train)
train2 <- read.csv("train_2.csv")
dim(train2)
#Creation of test data from training data set
test <- train2[, (cols_count+1):(cols_count+60)]
dim(test)
for(j in 1:nrow(train)){
trainsep = train[j,]
testsep = test[j,]
sum = sum(train[j,2:cols_count]) # NA check for the current page; skip pages with missing values
if(!is.na(sum)){
#Matrix to store RMSE of training and test data set accuracy of forecasts
accur <- matrix(NA, nrow = 6, ncol = 2)
#Data imputations
f = t(trainsep[,-1]) # drop the Page column and transpose
f_test = t(testsep)
head(f_test)
f = data.frame(f,substr(row.names(f),2,11))
colnames(f) = c("visits","dat")
f_test = data.frame(f_test,substr(row.names(f_test),2,11))
colnames(f_test) = c("visits","dat")
head(f)
head(f_test)
#Creation of timeseries data after cleaning using ts and tsclean
f.ts =tsclean(ts(f$visits,frequency = 7))
head(f.ts, 45)
#Data Exploration
autoplot(f.ts)
gglagplot(f.ts)
acf(f.ts)
Box.test(f.ts, lag = 10, fitdf = 0, type = "Lj")
#Removing trend and to check for the seasonality
f.ts.dif = diff(f.ts)
gglagplot(f.ts.dif)
ggAcf(f.ts.dif)
autoplot(f.ts.dif)
f_test.dif <- diff(f_test$visits)
Box.test(f.ts.dif, lag = 10, fitdf = 0, type = "Lj")
ggAcf(f.ts)
#Naive test
fcnaive_ts = naive(f.ts, 60)
summary(fcnaive_ts)
autoplot(fcnaive_ts)
checkresiduals(fcnaive_ts)
act = accuracy(fcnaive_ts, f_test$visits)
accur[1,1] = act[2,2] #test RMSE accuracy
accur[1,2] = act[1,2] #train RMSE accuracy
#seasonal naive test
fcsnaive_ts = snaive(f.ts,60)
summary(fcsnaive_ts)
autoplot(fcsnaive_ts)
checkresiduals(fcsnaive_ts)
act = accuracy(fcsnaive_ts, f_test$visits)
accur[2,1] = act[2,2] #test RMSE accuracy
accur[2,2] = act[1,2] #train RMSE accuracy
#mean forecast
mean_fc <- meanf(f.ts, h = 60)
act = accuracy(mean_fc, f_test$visits)
accur[3,1] = act[2,2] #test RMSE accuracy
accur[3,2] = act[1,2] #train RMSE accuracy
#SES(Simple Exponential smoothing)
fcses_ts <- ses(f.ts, alpha = .2, h = 60)
summary(fcses_ts)
autoplot(fcses_ts)
checkresiduals(fcses_ts)
accuracy(fcses_ts,f_test$visits)
fces_ts1 <-ses(f.ts.dif, alpha = .2, h = 60)
autoplot(fces_ts1)
summary(fces_ts1)
autoplot(f.ts.dif)
checkresiduals(fces_ts1)
accuracy(fces_ts1,f_test.dif)
alpha <- seq(.01, .99, by = .01)
RMSE <- NA
for(i in seq_along(alpha)) {
fit <- ses(f.ts, alpha = alpha[i], h = 60)
RMSE[i] <- accuracy(fit, f_test$visits)[2,2]
}
alpha.fit <- data_frame(alpha, RMSE)
alpha.min <- filter(alpha.fit, RMSE == min(RMSE))
ggplot(alpha.fit, aes(alpha, RMSE)) +
geom_line() +
geom_point(data = alpha.min, aes(alpha, RMSE), size = 2, color = "blue")
fcses_ts <- ses(f.ts, alpha = alpha.min$alpha, h = 60)
autoplot(fcses_ts)
act = accuracy(fcses_ts,f_test$visits)
accur[4,1] = act[2,2] #test RMSE accuracy
accur[4,2] = act[1,2] #train RMSE accuracy
fcholt_ts <- holt(f.ts, h = 60)
summary(fcholt_ts)
autoplot(fcholt_ts)
checkresiduals(fcholt_ts)
act = accuracy(fcholt_ts,f_test$visits)
accur[5,1] = act[2,2] #test RMSE accuracy
accur[5,2] = act[1,2] #train RMSE accuracy
# identify optimal beta parameter
beta <- seq(.0001, .5, by = .001)
RMSE <- NA
for(i in seq_along(beta)) {
fit <- holt(f.ts, beta = beta[i], h = 60)
RMSE[i] <- accuracy(fit, f_test$visits)[2,2]
}
# convert to a data frame and identify the minimum-RMSE beta value
beta.fit <- data_frame(beta, RMSE)
beta.min <- filter(beta.fit, RMSE == min(RMSE))
# plot RMSE vs. beta
ggplot(beta.fit, aes(beta, RMSE)) +
geom_line() +
geom_point(data = beta.min, aes(beta, RMSE), size = 2, color = "blue")
fcholt_ts <- holt(f.ts, h = 60, beta = beta.min$beta)
act = accuracy(fcholt_ts,f_test$visits)
accur[5,1] = act[2,2] #test RMSE accuracy (tuned Holt overwrites row 5)
accur[5,2] = act[1,2] #train RMSE accuracy
autoplot(decompose(f.ts)) # decomposition of additive time series
#HoltWinters seasonal model
hw.ts <- ets(f.ts, model = "ZZZ")
checkresiduals(hw.ts)
autoplot(hw.ts)
summary(hw.ts)
fcets_ts <- forecast(hw.ts, h = 60)
act= accuracy(fcets_ts, f_test$visits)
accur[6,1] = act[2,2] #test RMSE accuracy
accur[6,2] = act[1,2] #train RMSE accuracy
#Model evaluation using RMSE of test data
method = c("naive","snaive","mean", "ses","holts","aes")
accur1 = data_frame(method = method, RMSE_TEST = accur[,1])
minimum <- filter(accur1, RMSE_TEST == min(RMSE_TEST))
#Select and print the best model's forecast for this page
#(inside a for loop, objects must be printed explicitly)
if (minimum$method == "naive"){
print(fcnaive_ts)
}else if(minimum$method == "snaive"){
print(fcsnaive_ts)
}else if(minimum$method == "mean"){
print(mean_fc)
}else if(minimum$method == "ses"){
print(fcses_ts)
}else if(minimum$method == "holts"){
print(fcholt_ts)
}else if(minimum$method == "ets"){
print(fcets_ts)
}
}
}
Conclusion:
Each series has a different forecast depending on the trend, seasonality and error terms in its daily page visits. Some pages have no trend, some have both trend and seasonality, and some have seasonality without trend. Data exploration is used to understand each time series: ACF and lag plots help in understanding the autocorrelation, and moving-average plots are used to smooth the data.
Different forecast models are fitted: naïve, seasonal naïve, simple exponential smoothing, Holt's smoothing and Holt-Winters smoothing. For each model, residual plots are checked to verify that the errors are centred around 0, that the residual ACF stays within the significance bounds, and that the Box-test p-value is > 0.05.
RMSE is used to evaluate the different models. The model with the lowest test RMSE is selected to predict the next 60 days of page visits.