1. PORTLAND OREGON RIDERS MONTHLY DATA USING R
2. CONTRIBUTION
ISHITA MATHUR – CODING IN R
MANSI MARWAHA – ANALYSIS FOR PPT
RIYA SEHGAL – MODEL BUILDING AND ANALYSIS FOR PPT
MEGHNA BAID – ANALYSIS FOR PPT
3. WHAT IS A TIME SERIES?
• ANY METRIC THAT IS MEASURED OVER REGULAR TIME INTERVALS FORMS A
TIME SERIES. ANALYSIS OF TIME SERIES IS COMMERCIALLY IMPORTANT
BECAUSE OF INDUSTRIAL NEED AND RELEVANCE, ESPECIALLY W.R.T.
FORECASTING (DEMAND, SALES, SUPPLY, ETC.).
• EACH DATA POINT AT TIME t IN A TIME SERIES CAN BE EXPRESSED AS
EITHER A SUM OR A PRODUCT OF 3 COMPONENTS, NAMELY, SEASONALITY
(S_t), TREND (T_t) AND ERROR (E_t) (A.K.A. WHITE NOISE).
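The sum and product forms can be sketched in R as follows. All component values here are hypothetical and chosen only for illustration; they are not taken from the Portland data.

```r
# Hypothetical components for two years of monthly data (illustrative values only)
time_idx <- 1:24
trend    <- 100 + 2 * time_idx                  # trend: slow upward drift
seasonal <- 10 * sin(2 * pi * time_idx / 12)    # seasonality: repeats every 12 months
set.seed(1)
error    <- rnorm(24, sd = 3)                   # error: white noise

# Additive form: observation = trend + seasonality + error
additive <- trend + seasonal + error

# Multiplicative form: seasonality and error act as factors around 1
seasonal_factor <- 1 + seasonal / 100
error_factor    <- 1 + error / 100
multiplicative  <- trend * seasonal_factor * error_factor

# Taking logs turns the product into a sum of log-components
log_sum <- log(trend) + log(seasonal_factor) + log(error_factor)
print(all.equal(log(multiplicative), log_sum))
```

The last line illustrates why a multiplicative series is often log-transformed before an additive method is applied.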
4. AIM OF THE ANALYSIS
• TO ANALYSE TREND AND SEASONALITY OF THE RIDERS IN PORTLAND,
OREGON
• TO DEPICT INSIGHTS USING R
• TO FORECAST THE VALUES FOR THE FUTURE
5. UNDERSTANDING THE DATASET
• IT IS A MONTHLY COUNT OF RIDERS FOR THE PORTLAND PUBLIC
TRANSPORTATION SYSTEM. THE WEBSITE STATES THAT IT IS FROM JANUARY
1960 THROUGH JUNE 1969
• IT CONTAINS 2 COLUMNS: ONE REPRESENTS THE MONTH AND THE OTHER
REPRESENTS THE TOTAL NUMBER OF RIDERS FOR THE PORTLAND PUBLIC
TRANSPORTATION SYSTEM
• FOLLOWING IS A LINK TO THE CASE STUDY
• HTTPS://WWW.KAGGLE.COM/HSANKESARA/PORTLAND-OREGON-AVG-RIDER-MONTHLY-DATA/DATA#PORTLAND-OREGON-AVERAGE-MONTHLY-.CSV
6. METHODOLOGY
• EXPLORATORY DATA ANALYSIS
• TIME SERIES AND DECOMPOSING PHASE
• HYPOTHESIS TEST FOR STATIONARY VS. NON-STATIONARY SERIES
• CHECKING ACCURACY
• FORECASTING FOR THE FUTURE
7. CONVERTING THE DATA SET INTO TIME SERIES
WHEN WORKING ON A TIME SERIES ANALYSIS, IT IS VERY IMPORTANT TO
UNDERSTAND THE TREND, SEASONALITY AND ERROR IN THE DATA SET.
FOR THAT, WE NEED TO CONVERT THE DATA SET INTO A TIME SERIES OBJECT SO THAT
FURTHER TIME SERIES FUNCTIONS CAN BE APPLIED.
HERE WE HAVE USED THE TS() FUNCTION TO CONVERT THE DATA INTO A TIME
SERIES.
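A minimal sketch of the conversion, assuming monthly data starting in January 1960. The data frame `portland`, its column names `month` and `riders`, and the values in it are hypothetical stand-ins for the CSV, generated so that the example runs on its own.

```r
# Hypothetical stand-in for the CSV: 114 monthly rows with illustrative values
set.seed(42)
n <- 114
portland <- data.frame(
  month  = format(seq(as.Date("1960-01-01"), by = "month", length.out = n), "%Y-%m"),
  riders = round(700 + 6 * seq_len(n) + 80 * sin(2 * pi * seq_len(n) / 12) + rnorm(n, sd = 30))
)

# frequency = 12 marks the data as monthly; start = c(1960, 1) is January 1960
portlandts <- ts(portland$riders, start = c(1960, 1), frequency = 12)

print(start(portlandts))      # first period of the series
print(end(portlandts))        # last period of the series
print(frequency(portlandts))  # observations per year
```

With 114 monthly observations starting in January 1960, the series ends in June 1969, which matches the period described for the dataset.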
8. EXPLORATORY DATA ANALYSIS
• IT IS USED TO CHECK THE OVERALL STRUCTURE AND SUMMARY OF THE DATA TO
GET AN OVERALL PICTURE OF THE DATA.
• IT IS USEFUL FOR FINDING BASIC VALUES AND OUTLIERS
• HERE WE FOUND OUT:
THE STRUCTURE OF THE DATA SET USING THE STR() FUNCTION
THE FIRST 6 ROWS OF DATA USING THE HEAD() FUNCTION
THE NAMES OF THE COLUMNS USING THE NAMES() FUNCTION
NA VALUES IN THE DATA USING THE IS.NA() FUNCTION
RANGE: 613 – 1558
AVERAGE COUNT OF PEOPLE IS 1120
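The checks listed above can be sketched with base R as follows. The data frame `portland` and its contents are hypothetical stand-ins, so the printed range and mean will not match the figures reported for the real dataset.

```r
# Hypothetical stand-in for the CSV (column names and values are assumptions)
set.seed(42)
n <- 114
portland <- data.frame(
  month  = format(seq(as.Date("1960-01-01"), by = "month", length.out = n), "%Y-%m"),
  riders = round(700 + 6 * seq_len(n) + 80 * sin(2 * pi * seq_len(n) / 12) + rnorm(n, sd = 30))
)

str(portland)            # structure: column types and dimensions
print(head(portland))    # first 6 rows
print(names(portland))   # column names

na_count    <- sum(is.na(portland))    # total number of missing values
rider_range <- range(portland$riders)  # minimum and maximum
rider_mean  <- mean(portland$riders)   # average monthly count

print(na_count)
print(rider_range)
print(rider_mean)
```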
9. VISUAL REPRESENTATION OF TIME SERIES
• VISUAL REPRESENTATION IS OFTEN CONSIDERED A BETTER WAY TO
UNDERSTAND THE DATA
• HERE WE HAVE USED THE FUNCTION PLOT() TO CREATE A PLOT TO
VISUALLY IDENTIFY THE TRENDS AND SEASONALITY
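A minimal plotting sketch on a synthetic monthly series (the values are assumptions, not the Portland data). The moving-average overlay is an addition for illustration and is not described in the slides.

```r
# Synthetic monthly series with a trend and a yearly cycle (illustrative values)
set.seed(42)
n <- 114
riders <- 700 + 6 * seq_len(n) + 80 * sin(2 * pi * seq_len(n) / 12) + rnorm(n, sd = 30)
portlandts <- ts(riders, start = c(1960, 1), frequency = 12)

# Line plot of the series against time
plot(portlandts, xlab = "Year", ylab = "Monthly riders",
     main = "Synthetic monthly ridership")

# Centered 2x12 moving average: averages out the yearly cycle so the trend stands out
ma_weights <- c(0.5, rep(1, 11), 0.5) / 12
trend_ma <- stats::filter(portlandts, ma_weights, sides = 2)
lines(trend_ma, col = "red")
```

A repeating pattern around the red line suggests seasonality, while the slope of the line itself suggests the trend.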
10. OUTLIER DETECTION
The box plot shows that there is no outlier in the
dataset, i.e. the monthly count of riders for the
Portland public transportation system.
In case there are outliers in a dataset, we
have two methods to treat them.
IQR Test
This method flags a value as an outlier when it is
greater than (or equal to) Q3+1.5*(Q3-Q1) or
less than (or equal to) Q1-1.5*(Q3-Q1).
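The IQR rule can be sketched as follows. The vector `riders` is a small hypothetical sample with one deliberately planted extreme value, used only to show the mechanics.

```r
# Hypothetical sample; the last value is planted as an obvious outlier
riders <- c(613, 700, 820, 950, 1100, 1120, 1250, 1400, 1558, 3000)

# First and third quartiles (R's default quantile definition)
q   <- unname(quantile(riders, c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Fences at 1.5 * IQR beyond the quartiles
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[2] + 1.5 * iqr

is_outlier <- riders <= lower_fence | riders >= upper_fence
print(riders[is_outlier])

# boxplot() draws points beyond the whiskers, which follow a similar 1.5 * IQR rule
boxplot(riders, main = "Hypothetical rider counts")
```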
11. DECOMPOSITION OF TIME SERIES COMPONENTS
• SEASONAL AND TREND
DECOMPOSITION USING LOESS (STL) IS
AN ALGORITHM THAT DIVIDES
A TIME SERIES INTO THREE
COMPONENTS, NAMELY:
1.TREND
2.SEASONALITY
3.REMAINDER
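A minimal sketch of the decomposition using `stl()` from base R's stats package. The series is synthetic, and `s.window = "periodic"` (a seasonal pattern that is the same every year) is an assumption; the slides do not state which setting was used.

```r
# Synthetic monthly series (illustrative values, not the Portland data)
set.seed(42)
n <- 114
riders <- 700 + 6 * seq_len(n) + 80 * sin(2 * pi * seq_len(n) / 12) + rnorm(n, sd = 30)
portlandts <- ts(riders, start = c(1960, 1), frequency = 12)

# STL decomposition; "periodic" forces an identical seasonal pattern each year
fit <- stl(portlandts, s.window = "periodic")

components <- fit$time.series            # columns: seasonal, trend, remainder
residual   <- components[, "remainder"]  # what is left after trend and seasonality

plot(fit)  # panels for the data, seasonal, trend and remainder

# STL is additive: the three components sum back to the original series
recombined <- as.numeric(rowSums(components))
print(all.equal(recombined, as.numeric(portlandts)))
```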
12. PROPERTIES OF RESIDUAL
• THE RESIDUAL MUST BE STATIONARY (ITS STATISTICAL PROPERTIES MUST BE CONSTANT OVER TIME)
• STATIONARITY MEANS THE MEAN IS CONSTANT, THE VARIANCE IS CONSTANT AND
THE AUTOCOVARIANCE DEPENDS ONLY ON THE LAG, NOT ON TIME
• AUGMENTED DICKEY FULLER TEST
• H0: SERIES IS NOT STATIONARY
• H1: SERIES IS STATIONARY
• FUNCTION USED: ADF.TEST(RESIDUAL)
• HERE, P VALUE = 0.01 < 0.05, THEREFORE H0 IS REJECTED
• THEREFORE, THE RESIDUALS ARE STATIONARY.
• IN CASE THE RESIDUALS ARE NOT STATIONARY, WE DIFFERENCE THE SERIES
(SUBTRACT THE PREVIOUS VALUE FROM EACH VALUE) UNTIL IT BECOMES STATIONARY.
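A sketch of the stationarity check and of differencing. `adf.test()` is assumed to come from the tseries package, so the call is guarded in case it is not installed. The series and its STL remainder are synthetic stand-ins.

```r
# Synthetic monthly series and its STL remainder (illustrative values)
set.seed(42)
n <- 114
riders <- 700 + 6 * seq_len(n) + 80 * sin(2 * pi * seq_len(n) / 12) + rnorm(n, sd = 30)
portlandts <- ts(riders, start = c(1960, 1), frequency = 12)
residual <- stl(portlandts, s.window = "periodic")$time.series[, "remainder"]

# Augmented Dickey-Fuller test; H0: the series is not stationary
adf_p <- NA_real_
if (requireNamespace("tseries", quietly = TRUE)) {
  adf_result <- tseries::adf.test(residual)
  adf_p <- adf_result$p.value
  print(adf_result)
}

# If a series is not stationary, difference it: y[t] - y[t-1]
diffed <- diff(portlandts, differences = 1)
print(length(portlandts))
print(length(diffed))   # one observation is lost per round of differencing
```

A p-value below 0.05 leads to rejecting H0. Note that `adf.test()` reports 0.01 whenever the true p-value is smaller than that.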
13. ARIMA MODEL
• ‘AUTO REGRESSIVE INTEGRATED MOVING AVERAGE’ (ARIMA) IS A CLASS OF MODELS THAT
‘EXPLAINS’ A GIVEN TIME SERIES BASED ON ITS OWN PAST VALUES, THAT IS, ITS OWN
LAGS AND THE LAGGED FORECAST ERRORS, SO THAT THE EQUATION CAN BE USED TO
FORECAST FUTURE VALUES. AN ARIMA MODEL IS CHARACTERIZED BY 3 TERMS: P, D, Q
• WHERE,
P IS THE ORDER OF THE AR TERM
Q IS THE ORDER OF THE MA TERM
D IS THE NUMBER OF DIFFERENCES REQUIRED TO MAKE THE TIME SERIES STATIONARY
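To make the roles of P and Q concrete, the sketch below simulates a series with a known ARMA structure and then fits a model of the same order. The coefficients 0.6 and 0.3 and the series length are hypothetical choices for illustration.

```r
# Simulate a stationary series with one AR term (0.6) and one MA term (0.3)
set.seed(7)
sim <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 500)

# Fit ARIMA(p = 1, d = 0, q = 1): one lag of the series, no differencing,
# one lagged forecast error
fit <- arima(sim, order = c(1, 0, 1), method = "ML")

# Estimated coefficients: ar1, ma1 and the intercept (series mean)
print(coef(fit))
```

With enough data the estimated `ar1` and `ma1` should land near the values used in the simulation, which is a quick sanity check that the order was specified as intended.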
14. PLOT OF PACF CHART
Partial autocorrelation can be imagined as the correlation
between the series and its lag, after excluding the
contributions from the intermediate lags.
15. PLOT OF ACF CHART
The ACF plot shows the total correlation between
the series and its lagged values, one bar per lag.
16. FINDING ORDER OF THE ARMA MODEL
• THE REQUIRED NUMBER OF AR TERMS IS FOUND BY INSPECTING THE PARTIAL
AUTOCORRELATION (PACF) PLOT. (HERE, P=1.5 IS REPORTED, ALTHOUGH AN AR
ORDER MUST BE A NON-NEGATIVE INTEGER)
• THE RIGHT ORDER OF DIFFERENCING IS THE MINIMUM DIFFERENCING
REQUIRED TO GET A NEAR-STATIONARY SERIES (HERE, D=0)
• THE ACF TELLS HOW MANY MA TERMS ARE REQUIRED TO REMOVE ANY
AUTOCORRELATION IN THE SERIES. (HERE, Q=1)
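A sketch of reading candidate orders from the ACF and PACF. The series is a simulated AR(1) process (a hypothetical choice), and comparing each value with an approximate 95% band of ±1.96/sqrt(n) is a common rule of thumb rather than the procedure stated in the slides.

```r
# Simulated AR(1) series as a stand-in for the residual (illustrative values)
set.seed(11)
residual <- arima.sim(model = list(ar = 0.6), n = 114)
n <- length(residual)

# Compute the correlations without plotting; call acf()/pacf() with the
# default plot = TRUE to get the charts instead
acf_vals  <- acf(residual, lag.max = 24, plot = FALSE)
pacf_vals <- pacf(residual, lag.max = 24, plot = FALSE)

# Approximate 95% significance band for a white-noise series
band <- 1.96 / sqrt(n)

# Lags whose correlation falls outside the band
sig_acf_lags  <- which(abs(acf_vals$acf[-1]) > band)  # drop lag 0, which is always 1
sig_pacf_lags <- which(abs(pacf_vals$acf) > band)

print(sig_pacf_lags)  # significant PACF lags suggest candidate values of p
print(sig_acf_lags)   # significant ACF lags suggest candidate values of q
```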
17. MODEL BUILDING AND ACCURACY
NOW THAT THE RESIDUALS ARE STATIONARY AND THE ORDER OF THE
ARIMA MODEL HAS BEEN FOUND, THE MODEL CAN BE BUILT.
MODEL IS BUILT BY - resid_model <- arima(portlandts, order = c(1.5, 0, 1))
ACCURACY
AS THE MEAN ABSOLUTE PERCENTAGE ERROR (MAPE) IS 4.14, WHICH IS LESS
THAN 7, THE MODEL IS CONSIDERED GOOD.
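A sketch of fitting the model and computing MAPE by hand. The series is a synthetic stand-in, the order `c(1, 0, 1)` uses integer values as an assumption (ARIMA orders are counts of terms), and `method = "ML"` is a choice made here for a stable fit. The resulting MAPE will therefore differ from the 4.14 reported above.

```r
# Synthetic stationary series around 1120 (illustrative values only)
set.seed(123)
noise <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 114, sd = 40)
portlandts <- ts(1120 + as.numeric(noise), start = c(1960, 1), frequency = 12)

# Fit ARIMA(1, 0, 1) by maximum likelihood
resid_model <- arima(portlandts, order = c(1, 0, 1), method = "ML")

# In-sample one-step-ahead fitted values: actual minus model residual
fitted_vals <- portlandts - residuals(resid_model)

# MAPE: mean of |actual - fitted| / |actual|, expressed in percent
mape <- mean(abs((portlandts - fitted_vals) / portlandts)) * 100
print(mape)
```

MAPE is only meaningful when the actual values are well away from zero, which holds for rider counts of this size.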
18. 3 MONTHS FORECAST OF MONTHLY COUNT OF RIDERS FOR
THE PORTLAND PUBLIC TRANSPORTATION SYSTEM.
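A sketch of producing a 3-month forecast with base R's `predict()`. The series, the integer order `c(1, 0, 1)` and the 95% interval width are assumptions for illustration, so the numbers do not reproduce the forecast shown in the presentation.

```r
# Synthetic stationary series ending in June 1969 (illustrative values only)
set.seed(123)
noise <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 114, sd = 40)
portlandts <- ts(1120 + as.numeric(noise), start = c(1960, 1), frequency = 12)
resid_model <- arima(portlandts, order = c(1, 0, 1), method = "ML")

# Forecast the next 3 months (July to September 1969)
fc <- predict(resid_model, n.ahead = 3)

# Point forecasts with an approximate 95% interval from the standard errors
forecast_table <- data.frame(
  forecast = as.numeric(fc$pred),
  lower    = as.numeric(fc$pred - 1.96 * fc$se),
  upper    = as.numeric(fc$pred + 1.96 * fc$se)
)
print(forecast_table)
```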
21. LJUNG-BOX STATISTIC
• H0: NO AUTOCORRELATION
• H1: AUTOCORRELATION
• HERE THE P-VALUE IS 0.9, WHICH IS MORE THAN 0.05.
THEREFORE, THE NULL IS NOT REJECTED: THERE IS NO EVIDENCE OF AUTOCORRELATION
• THE MODEL IS THEREFORE CONSIDERED GOOD
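A sketch of the Ljung-Box check using `Box.test()` from base R, applied to the residuals of a fitted model. The series and the order are synthetic stand-ins, and `lag = 12` with `fitdf = 2` (one AR plus one MA coefficient) are assumptions, since the slides do not state the settings used.

```r
# Synthetic stationary series and fitted model (illustrative values only)
set.seed(123)
noise <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 114, sd = 40)
portlandts <- ts(1120 + as.numeric(noise), start = c(1960, 1), frequency = 12)
resid_model <- arima(portlandts, order = c(1, 0, 1), method = "ML")

# Ljung-Box test on the model residuals; H0: no autocorrelation up to lag 12
# fitdf removes the degrees of freedom used by the fitted AR and MA coefficients
lb <- Box.test(residuals(resid_model), lag = 12, type = "Ljung-Box", fitdf = 2)
print(lb)

# A p-value above 0.05 means H0 is not rejected
no_evidence_of_autocorrelation <- lb$p.value > 0.05
print(no_evidence_of_autocorrelation)
```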