M140 Introducing Statistics
Answer:
ARIMA
Slide 1:
One of the most important questions addressed in this presentation is “What is data analysis?”.
Data analysis is crucial in any business firm. It is the process of examining a given data set
in order to draw the most appropriate conclusion about the key question being asked. It is
usually a continuous process in which data is collected and analyzed while it is still under
scrutiny, because research normally tries to identify the patterns present in the whole of the
collected data.
Tools such as MS Excel, Python and RStudio are commonly used in data analysis projects.
In the current project Python has been selected as the data analysis tool. Python has become
a very important tool that researchers prefer when they want to conduct analysis of any
given data set, and this is because the software is flexible and the language is easy for
analysts to understand.
An ARIMA model will be used in order to forecast a given set of data.
Slide 2:
An ARIMA model is normally specified by three terms: p, d and q. Each term has a specific
meaning and is essential to the model:
p – the order of the Autoregressive (AR) term, i.e. the number of lagged observations used.
d – the number of times the series must be differenced to make it stationary; for a series
that is already stationary, d = 0.
q – the order of the Moving Average (MA) term, i.e. the number of lagged forecast errors
that should be in the ARIMA model.
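The role of d can be illustrated with a minimal sketch of differencing. The function name and the sales figures below are hypothetical and are not taken from the project data; the series is given a quadratic trend so that two rounds of differencing are needed to remove it.

```python
def difference(series, d=1):
    """Apply first-order differencing d times and return the new list."""
    values = list(series)
    for _ in range(d):
        # Each pass replaces the series by the change between neighbours.
        values = [curr - prev for prev, curr in zip(values, values[1:])]
    return values

# Hypothetical monthly sales with a quadratic trend (illustrative numbers only).
sales = [t * t + 3 for t in range(1, 9)]

first_diff = difference(sales, d=1)   # still increasing, so a trend remains
second_diff = difference(sales, d=2)  # constant, so d = 2 removes the trend
```

Each round of differencing shortens the series by one observation, which is one reason to use the smallest d that gives a stationary series.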
Slide 3:
1) To investigate the concepts of time series and forecasting. The project will show how
time series and forecasting are intertwined. Forecasting uses past information about the
subject under study to predict its future outcome, and time series forecasting in particular
draws on historical trends, cyclical analysis, and seasonality. These concepts will therefore
be used in the project to forecast future business outcomes using data from the Light IT
consultancy firm, and so to determine whether implementing an ARIMA model will be
effective in the business sector.
2) To perform requirements gathering for system development. The project will gather the
requirements needed to develop the ARIMA model effectively in Python 3. For the model to
work, a system has to be developed that ensures the time series is stationary. This phase
also shows whether the project team understands what the project needs, because it is here
that the most important requirements are discovered. The project therefore aims to carry out
requirements gathering properly, to avoid issues that would cause the project to fail or be
delayed.
3) To design and model the proposed ARIMA system, focusing on business customer data
from the Light IT consultancy firm. The project aims to apply the ARIMA model in the
business sector, so a well-designed system is needed for the model to work effectively. The
system will ensure that the data analysis is done with precision and that the results are
interpreted appropriately, so that entrepreneurs can make sound decisions for their
businesses. The data from the consultancy firm will be analyzed in Python with the ARIMA
model, which covers both time series analysis and forecasting.
4) To code the system in the Python programming language. Python allows one to work
quickly and to integrate systems effectively, and it lets analysts and programmers write
clear, logical code. That is why the project uses Python: to finish the analysis faster and
obtain the most effective results.
5) To test and validate the ARIMA system code against the requirements using business
customer data from the Light IT firm. In this part the project will analyze the data with the
proposed model so that clear interpretations can be made. The aim is to determine whether
the model is effective in the data analysis process and to show that it can be used in the
business sector to predict the future outcome of a business through time series forecasting.
Slide 4:
The challenges involved in the project are:
Hardware crash. Hardware such as computers can crash because of a malfunctioning internal
drive, which can lead to the loss of data or other important information needed for the
project. External storage devices can also be lost or infected with malicious software.
Power shortage. The data analysis software and machines depend on electricity, and whenever
there is a power shortage these processes cannot continue. That is why a constant supply of
power is necessary for this project.
Data loss. This problem sometimes arises from poor storage of the data set: it may be lost
because it was stored on a device infected by viruses, or because the storage device was
stolen. No data has been lost in this project so far, and several backup plans have been put
in place to keep it that way.
Poor management of change. Change is inevitable in any project, and when team members do not
accept change quickly, ideas that might help the project are not generated. This problem was
experienced at one point in this project: a difference in ideology resulted in chaos.
Although it was resolved, it might happen again, and to avoid that, communication among team
members has been improved.
Slide 5:
This is the general equation for an ARIMA(p,d,q) model, where:
β0 is a constant.
β1 … βp are the autoregression parameters to be estimated.
Xt is the observed value of the time series at time t (month t = 1, 2, 3…).
θ1 … θq are the moving average parameters to be estimated.
Xt-1 … Xt-p are the observed values of the time series at times t-1 up to t-p, and
εt-1 … εt-q are the corresponding lagged error terms.
εt is the error term, with mean 0 and constant variance.
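For reference, a standard way of writing the model with the symbols listed above is given here. It is a sketch under two assumptions: X_t stands for the series after it has been differenced d times, and the moving-average terms enter with a plus sign (some texts use a minus sign instead). The lagged errors are written as ε.

```latex
X_t = \beta_0 + \beta_1 X_{t-1} + \dots + \beta_p X_{t-p}
      + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}
```

Setting q = 0 leaves a pure autoregression, and setting p = 0 leaves a pure moving-average model.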
Slide 6:
Before proceeding with an ARIMA model, it is necessary to check the stationarity of the data
involved in the study. Tests that help check stationarity include the Augmented Dickey-Fuller
test and the Phillips-Perron unit root test. The null hypothesis is the same for both tests
and states that the data is not stationary, i.e. a unit root is present in the data. The null
hypothesis is rejected when the p-value is less than 0.05.
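The idea behind a unit root test can be sketched with the simple, non-augmented Dickey-Fuller regression. This is an illustration only, not the procedure used in the project: it regresses the first difference on the lagged level and compares the t-statistic with a hard-coded asymptotic 5% critical value of -2.86 (constant, no trend) instead of computing a p-value. In practice a library routine such as `adfuller` in statsmodels would normally be used. All names and the simulated data are hypothetical.

```python
import math
import random

# Asymptotic 5% critical value of the Dickey-Fuller t-statistic for a
# regression with a constant and no trend (hard-coded assumption).
DF_CRITICAL_5PCT = -2.86

def dickey_fuller_t(series):
    """t-statistic of gamma in: diff(y)_t = alpha + gamma * y_(t-1) + error."""
    x = series[:-1]                                   # lagged levels y_(t-1)
    dy = [b - a for a, b in zip(series, series[1:])]  # first differences
    n = len(x)
    mean_x = sum(x) / n
    mean_dy = sum(dy) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (di - mean_dy) for xi, di in zip(x, dy))
    gamma = sxy / sxx
    alpha = mean_dy - gamma * mean_x
    rss = sum((di - alpha - gamma * xi) ** 2 for xi, di in zip(x, dy))
    se_gamma = math.sqrt(rss / (n - 2) / sxx)
    return gamma / se_gamma

def looks_stationary(series):
    """Reject the unit-root null when the statistic is below the critical value."""
    return dickey_fuller_t(series) < DF_CRITICAL_5PCT

rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(200)]          # simulated stationary series
trending = [0.5 * t + e for t, e in enumerate(noise)]  # simulated upward trend
```

Calling `looks_stationary(noise)` rejects the unit-root null, while `looks_stationary(trending)` does not, which mirrors the decision rule described above.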
Slide 7:
From the graph it can be seen that the series is not stationary, because the mean, variance,
and covariance are not constant over the given period. The differing means and variances show
in the peaks and troughs of the graph, which are not at the same level: the data has a trend,
meaning that sales increase over time. Because of this, the time series analysis cannot be
performed on the series as it stands.
Slide 8:
From the ACF results it can be seen that the autocorrelation drops to a negative value
immediately after lag zero, which is a sign of stationary data. The new graph of the
stationary data also shows no trend, meaning that the mean and variance are now constant and
the data is stationary. The data is now suitable for the analysis needed to find the best
ARIMA model for this series.
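How an ACF value is obtained can be sketched in a few lines. The function below computes the usual sample autocorrelation; the two short series are hypothetical stand-ins for a trending series and a differenced one, not the project data.

```python
def acf(series, max_lag):
    """Sample autocorrelation of a series for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    denom = sum(c * c for c in centred)
    result = []
    for lag in range(max_lag + 1):
        num = sum(centred[t] * centred[t - lag] for t in range(lag, n))
        result.append(num / denom)
    return result

trend = list(range(1, 21))   # hypothetical trending series
diffed = [1, -1] * 10        # hypothetical differenced series with no trend

trend_acf = acf(trend, 3)    # stays high at lag 1: a sign of non-stationarity
diffed_acf = acf(diffed, 3)  # negative at lag 1: no persistent trend
```

A slowly decaying, positive ACF is typical of a trending series, while a series that has been differenced enough shows autocorrelations that fall away quickly.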
Slide 9:
The best ARIMA model found for this series is of order (9, 2, 0). It is the best-fitting
model identified in this analysis and can be implemented in Python 3 by anyone in the
business sector. The Light IT firm can also use it for any other analysis it wishes to carry
out. This does not mean that it is the best model in every case: other models can also be
used, depending on the analyst's preference.
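With d = 2, the autoregressive part of the model describes the twice-differenced series, so its forecasts have to be integrated twice to get back to the original scale. A minimal sketch of that last step follows; the function name and the numbers are hypothetical.

```python
def integrate_twice(second_diff_forecasts, last_level, last_first_diff):
    """Rebuild level forecasts from forecasts of the second difference (d = 2)."""
    levels = []
    level, first_diff = last_level, last_first_diff
    for dd in second_diff_forecasts:
        first_diff += dd      # undo the second difference
        level += first_diff   # undo the first difference
        levels.append(level)
    return levels

# Hypothetical history 4, 7, 12, 19, 28: the last level is 28 and the last
# first difference is 28 - 19 = 9. Suppose the model forecasts a second
# difference of 2 for each of the next three periods.
forecast = integrate_twice([2, 2, 2], last_level=28, last_first_diff=9)
```

Here `forecast` continues the quadratic pattern of the history, which shows that the integration step carries both the level and the slope forward.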
The research showed that using an ARIMA model in Python 3 is a highly effective form of data
analysis. When the data from the Light IT consultancy firm was tested with the model, the
results were pleasing because the forecast was obtained in a straightforward way: the
analysis showed that the data was not stationary, so differencing had to be applied to make
it stationary. This brought the realization that firms in the business sector should use the
same model for their data analysis to keep their businesses on track.
With the advancement of technology, everybody wants to work in a coding language that they
understand well. Software these days works with code, which avoids long structured
sentences that take plenty of time to write. This research showed that coding languages such
as Python might look difficult to handle but are among the best to use, which implies that
any company should consider having an expert who is good at coding to facilitate fast
manipulation of the company’s data.
From the extensive discussion conducted in this study, it was realized that the ARIMA model
is the best time series model, and the one many people use when they want to conduct time
series research; the others are good, but not as good as the ARIMA model. The model works
well because it uses lagged moving averages to smooth the time series under study, and it
rests on the assumption that the future depends on the past. It is widely used in statistical
and technical analysis to forecast data.
Slide 10:
A few recommendations can be made based on the results:
Companies should adopt software to boost their businesses, because the world is advancing
and analogue methods of data storage and analysis are almost forgotten. Good software handles
a level of detail in a firm that cannot be handled manually, or that would take much longer
if done manually. Tools such as Python not only carry out the analysis but also keep the
results in store for future reference.
The business sector should use coding languages to facilitate faster manipulation of data.
Since computers use artificial intelligence, coding is an easier way to tell the computer
what you would like it to do. This makes such languages a very important asset for all
businesses.
It is wise for all companies to store all their sales records for future reference. For
example, a time might come when a company wants to know how it has been faring since it was
started. It is the stored data that makes this possible, because without data the company
cannot know whether it is doing well or where it needs to make adjustments.
Regression
Slide 1:
Regression analysis is the most widely used technique for fitting models to data. When a
regression model is fit using ordinary least squares, we get a few statistics to describe a
large set of data. These statistics can be highly influenced by a small set of data that is
different from the bulk of the data. These points could be y-type outliers (vertical outliers)
that do not follow the general model of the data or x-type outliers (leverage points) that are
systematically different from the rest of the explanatory data. We can also have points that
are both leverage points and vertical outliers, which are sometimes referred to as bad
leverage points. Collectively, we call points of any of these kinds outliers.
There are several techniques for estimating the parameters of a regression; one of them is
Ordinary Least Squares (OLS). Estimating parameters with OLS requires certain assumptions to
be satisfied: the errors must be mutually independent and normally distributed with mean 0
and variance σ².
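The sensitivity of OLS to a single vertical outlier can be sketched for the one-predictor case. The data below are hypothetical: ten points lying exactly on y = 1 + 2x, then the same points with the last response shifted upward by 50.

```python
def ols_fit(x, y):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

x = list(range(10))
y_clean = [1 + 2 * xi for xi in x]              # points exactly on y = 1 + 2x
y_outlier = y_clean[:-1] + [y_clean[-1] + 50]   # one vertical outlier at x = 9

clean_fit = ols_fit(x, y_clean)      # recovers intercept 1 and slope 2
outlier_fit = ols_fit(x, y_outlier)  # slope is pulled well above 2
```

The outlier sits at the largest x value, so it acts as a bad leverage point and more than doubles the fitted slope.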
The assumptions involved in regression concern:
Autocorrelation: measures how the lagged version of a variable is related to its original
version when time series data is considered.
Heteroscedasticity: refers to situations where the variance of the residuals is unequal over
the range of measured values. In a regression analysis, heteroscedasticity results in an
unequal scatter of the residuals.
Multicollinearity: a statistical concept where several independent variables in a model are
correlated. Two variables are considered perfectly collinear if their correlation
coefficient is +/- 1.0. Multicollinearity among independent variables results in less
reliable statistical inferences.
Normality: means that the residuals of the model must follow a normal distribution.
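Two of these checks can be sketched with simple statistics: the Durbin-Watson statistic for first-order autocorrelation in the residuals, and the Pearson correlation coefficient between two predictors as a first look at collinearity. These particular diagnostics are chosen here for illustration and are not named in the slides; the inputs are hypothetical.

```python
import math

def durbin_watson(residuals):
    """Near 2: no first-order autocorrelation; near 0: positive; near 4: negative."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(r * r for r in residuals)
    return num / den

def pearson_r(u, v):
    """Correlation coefficient between two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

alternating = [1, -1, 1, -1]                   # hypothetical residuals
dw = durbin_watson(alternating)                # 3.0, pointing to negative autocorrelation
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])      # perfectly collinear predictors
```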
Slide 2:
The objectives and deliverables under consideration are:
To define data analytics from the business and data usage perspectives: Defining data
analytics gives a broader view of what the topic of discussion is about. Here, data
analytics is described as the analysis of data to reach a substantive conclusion on a given
topic of study (Mehta and Pandit, 2018). This will be achieved through research and reading
to get a precise understanding of the role of data analytics in data processing and
business decision-making.
To justify the need for data analytics in the business decision-making process: This
objective identifies the key gaps in conventional data processing and presents analytics as
a solution, by comparing the efficiency of traditional data processing with that of data
analytics in business decision-making.
To identify the regression data analytics techniques and algorithms: Identifying the diverse
techniques used in regression analytics is key to steering the research toward its overall
goal as specified in the topic of study.
To demonstrate the benefits of regression algorithms in predictive analytics: The benefits
of regression analytics will be identified through a hands-on demonstration of one or two
regression algorithms used in data analytics.
Slide 3:
Regression analysis is one of the most important and commonly used statistical tools for
examining the relationship between a dependent variable and one or more independent
variables, with wide applications in finance, economics, medicine, and psychology. A
regression model is generally defined as
Y = Xβ + ε (1)
in which Y is the dependent variable, ε is the true residual vector, and X is the n × p
design matrix. Let β̂ be the estimator of β; the fitted residuals are then
r = Y − Xβ̂ (2)
Regression analysis ordinarily uses the least-squares method to estimate the model
parameters under certain assumptions, such as normality of the errors with zero mean and
constant variance, i.e., ε ∼ N(0, σ²).
Slide 4:
Outliers are inconsistent observations that deviate widely from the majority of the
observations in the data. They need proper handling because they pose a serious threat to the
regression model and its estimated coefficients and can therefore give misleading results
(Werner, 2019). Two kinds of outliers can occur in a regression data set. Observations with
extremely large values in the response are referred to as vertical outliers, while
observations with extremely large values in the explanatory variables are called leverage
points.
Robust regression is an approach for overcoming the problem of outliers and influential
observations in data and limiting their influence on the regression results. The vast
majority of so-called robust regression techniques lack this characteristic. The basic goal of
robust estimation is to provide reliable estimates and inferences for the unknown parameters
in the presence of outliers. Robust methods replace the OLS sum of squared residuals with
some other function that is less affected by unusual observations. These methods first fit a
regression to the data and then identify outliers as observations with large residuals.
Efficiency, breakdown point, and bounded influence are three desirable properties of robust
methods. The breakdown point is the smallest fraction of contaminated observations that can
cause an estimator to give an arbitrarily wrong result.
Least Trimmed Squares (LTS) is a highly robust and reasonably efficient estimator among the
robust estimators available in the literature, and it is obtained by minimizing the trimmed
sum of the squared residuals. The LTS estimator is a modified version of the LS estimator
that fits the bulk of the observations while ignoring the extreme observations in the
ordered data.
Slide 5:
This is the result of the residual normality test, which was carried out using the
Kolmogorov-Smirnov test. The p-values are very low. The null hypothesis is therefore
rejected, and it can be inferred that the residuals of the classical linear regression
models are not normally distributed. Residuals that are not normally distributed can be
caused by outliers in the data.
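A minimal sketch of the Kolmogorov-Smirnov check on residuals follows. It computes the largest gap between the empirical distribution of the residuals and a normal distribution fitted to them, then compares it with the asymptotic 5% critical value 1.36/√n rather than computing a p-value. Two caveats apply: that critical value is a large-sample approximation, and because the mean and standard deviation are estimated from the same residuals, Lilliefors-corrected critical values would be more appropriate. The residuals are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def ks_normal_statistic(residuals):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    n = len(residuals)
    fitted = NormalDist(mean(residuals), stdev(residuals))
    d = 0.0
    for i, r in enumerate(sorted(residuals), start=1):
        cdf = fitted.cdf(r)
        # The empirical CDF jumps from (i - 1) / n to i / n at each point.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

def rejects_normality(residuals):
    """Compare D with the asymptotic 5% critical value 1.36 / sqrt(n)."""
    return ks_normal_statistic(residuals) > 1.36 / math.sqrt(len(residuals))

n = 50
bell_shaped = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
with_outlier = bell_shaped[:-1] + [40.0]   # hypothetical gross outlier
```

Here `rejects_normality(bell_shaped)` is false and `rejects_normality(with_outlier)` is true, which illustrates how a single outlier can be enough to make residuals fail a normality test.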
Slide 6:
Outlier detection was done using TRES detection, and this result was obtained. The results
in the table show that all of the malnutrition data from 2012 to 2017 contain outliers.
Slide 7:
These figures show the standardized residuals versus fitted values and the robust distances
for India, respectively. A thorough investigation reveals that both population growth and
foreign direct investment (FDI) inflows play a significant role in Pakistan's economic
growth. Nonetheless, FDI inflows are closely linked to population growth, and economic
growth is closely linked to population growth, even if the influence of gross investment is
minor.
Slide 8:
The current study examined the impact of FDI inflows, annual population growth, and gross
investment on the GDP per capita of Pakistan and India using least squares (LS) and high
breakdown robust least trimmed squares (LTS) regression techniques. FDI has a small but
positive impact on the economic growth of both Pakistan and India within the LS framework;
however, once the LTS approach is applied, FDI becomes a decisive and vital part of
Pakistan's economic growth model. Nonetheless, after the removal of 5 and 2 outliers from
the data of Pakistan and India, respectively, FDI has a negligible impact on the economy of
India. Population growth contributes to GDP per capita in the same way for the two
economies. Both methods reveal that rapid population growth adversely affects the economic
growth of the two countries; it is therefore a significant issue for the economic growth of
both economies and requires prompt attention.

8 Tips for Effective Working Capital Management8 Tips for Effective Working Capital Management
8 Tips for Effective Working Capital Management
 
An Overview of the Odoo 17 Knowledge App
An Overview of the Odoo 17 Knowledge AppAn Overview of the Odoo 17 Knowledge App
An Overview of the Odoo 17 Knowledge App
 
An overview of the various scriptures in Hinduism
An overview of the various scriptures in HinduismAn overview of the various scriptures in Hinduism
An overview of the various scriptures in Hinduism
 
24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...
24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...
24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...
 
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
 
MOOD STABLIZERS DRUGS.pptx
MOOD     STABLIZERS           DRUGS.pptxMOOD     STABLIZERS           DRUGS.pptx
MOOD STABLIZERS DRUGS.pptx
 
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
 
How To Create Editable Tree View in Odoo 17
How To Create Editable Tree View in Odoo 17How To Create Editable Tree View in Odoo 17
How To Create Editable Tree View in Odoo 17
 
Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...
 
How to Send Pro Forma Invoice to Your Customers in Odoo 17
How to Send Pro Forma Invoice to Your Customers in Odoo 17How to Send Pro Forma Invoice to Your Customers in Odoo 17
How to Send Pro Forma Invoice to Your Customers in Odoo 17
 
VAMOS CUIDAR DO NOSSO PLANETA! .
VAMOS CUIDAR DO NOSSO PLANETA!                    .VAMOS CUIDAR DO NOSSO PLANETA!                    .
VAMOS CUIDAR DO NOSSO PLANETA! .
 
demyelinated disorder: multiple sclerosis.pptx
demyelinated disorder: multiple sclerosis.pptxdemyelinated disorder: multiple sclerosis.pptx
demyelinated disorder: multiple sclerosis.pptx
 
The Story of Village Palampur Class 9 Free Study Material PDF
The Story of Village Palampur Class 9 Free Study Material PDFThe Story of Village Palampur Class 9 Free Study Material PDF
The Story of Village Palampur Class 9 Free Study Material PDF
 
Basic Civil Engineering notes on Transportation Engineering & Modes of Transport
Basic Civil Engineering notes on Transportation Engineering & Modes of TransportBasic Civil Engineering notes on Transportation Engineering & Modes of Transport
Basic Civil Engineering notes on Transportation Engineering & Modes of Transport
 

M140 Introducing Statistics.docx

M140 Introducing Statistics
Answer:
ARIMA
Slide 1:
One of the most important questions in this presentation is "What is data analysis?" Data analysis is crucial in any business firm. It is the process of examining a data set in order to draw the most appropriate conclusion about the key question being asked. It is usually a continuous process in which data is collected and analyzed while still under scrutiny, because research normally tries to identify the patterns present in the data that has been collected.
Tools such as MS Excel, Python and RStudio are commonly used in data analysis projects. In the current project Python has been selected as the data analysis tool. Python has become a tool that researchers prefer for analyzing data sets because it is flexible and the language is easy for analysts to understand.
An ARIMA model will be used to forecast a given set of data.
Slide 2:
An ARIMA model has three terms: p, d and q. Each is essential to the model:
p – the order of the autoregressive (AR) term.
d – the number of times the series must be differenced to make it stationary; for a series that is already stationary, d = 0.
q – the order of the moving average (MA) term, that is, the number of lagged errors included in the ARIMA model.
Slide 3:
1) Investigation of the concept of time series and forecasting. The project will show how time series and forecasting are intertwined. Forecasting uses past information about the matter under study to predict a future outcome, and time series forecasting draws on historical trends, cyclical analysis and seasonality. These concepts will be used in the project to predict future business outcomes using data from the Light IT consultancy firm, and to determine whether implementing an ARIMA model will be effective in the business sector.
2) To perform the requirements gathering for system development. The project will make sure that the system needed to build the ARIMA model in Python 3 is adequately specified. A system has to be developed for the model to work; this system will ensure that the time series is stationary. Requirements gathering also shows whether the project team understands what the project needs, because it is in this phase that the most important requirements are discovered. The project therefore aims to do the requirements gathering properly, to avoid issues that could cause the project to fail or be delayed.
3) To design and model the proposed ARIMA system, focusing on business customer data from the Light IT consultancy firm. The project aims to apply the ARIMA model in the business sector, so a good system is needed for the model to work effectively. The system will ensure that the data analysis is done with precision and that the results are interpreted appropriately, so that entrepreneurs can make sound decisions for their businesses.
The data from the consultancy firm will be analyzed in Python using the ARIMA model, which involves time series analysis and forecasting.
4) Coding the system in the Python programming language. Python allows one to work quickly and to integrate systems effectively, and it lets analysts and programmers write clear, logical code. That is why the project uses Python: to finish the analysis faster and obtain the most effective results.
5) To test and validate the ARIMA system code against the requirements using business customer data from the Light IT firm. In this part the project will analyze the data and apply the suggested model so that clear interpretations can be made. The project wants to determine whether the model is effective in the data analysis process, and to show that it can be used in the business sector to analyze data and predict future business outcomes using the forecasting technique of time series analysis.
Slide 4:
The challenges involved in the project are:
Hardware crash. Hardware such as computers can crash because of an internal drive malfunction, which can lead to the loss of data or other important information used in the project. Other hardware at risk includes external storage devices, which might get lost or be infected with malware.
Power shortage. The data analysis software and machines depend on electricity, and whenever there is a power shortage these processes cannot continue. A constant supply of power is therefore necessary for this project.
Data loss. This problem sometimes arises from poor storage of the data set, for example when it is stored on a device that has been infected by viruses. In other cases the data may be lost because the storage device was stolen. No data has been lost in this project so far, and several backup plans have been put in place to keep it that way.
Poor management of change. Change is inevitable in any project, and when team members do not accept change quickly, ideas that could help the project are not generated. This problem was experienced at one point in this project: a difference in ideology led to chaos. It was resolved, but it might happen again, so communication among team members has been strengthened to avoid a repeat.
Slide 5:
This is the general equation for an ARIMA(p,d,q) model, where:
β0 is a constant.
β1, …, βp are the autoregression parameters to be estimated.
Xt is the observed value of the time series at time t (month t = 1, 2, 3, …).
θ1, …, θq are the moving average parameters to be estimated.
Xt-1, …, Xt-p are the observed values of the time series at times t-1 up to t-p, and et-1, et-2, …, et-q are the corresponding lagged error terms.
εt is the error term, with mean 0 and variance 1.
Slide 6:
Before proceeding with an ARIMA model, it is necessary to check the stationarity of the data involved in the study. Tests such as the Augmented Dickey-Fuller test and the Phillips-Perron unit root test help with this check. The null hypothesis is the same for both tests: the data is not stationary, i.e. a unit root is present. The null hypothesis is rejected when the p-value is less than 0.05.
Slide 7:
The graph shows that the series is not stationary, because the mean, variance and covariance are not constant over the period. The differing means and variances can be seen in the peaks and troughs of the graph, which are not of the same size. This is because the data has a trend: sales increase over time. Because of this, the time series analysis cannot be performed on the data as it stands.
Slide 8:
The ACF results show that the autocorrelations decay immediately, dropping from zero to a negative value with one lag before proceeding to the next, which is a sign of stationary data. The new graph for the stationary data also shows no trend, meaning that the mean and variance are now constant and the data is stationary. The data is now suitable for the analysis needed to find the best ARIMA model for this series.
Slide 9:
The best ARIMA model was found to be the model of order (9, 2, 0). This is the model recommended for implementation in Python 3 in the business sector, and the Light IT firm can also use it for any other analysis it wishes to carry out. Choosing this model does not rule out the others; other orders can also be used, depending on the analyst's preference.
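The differencing idea behind the d term can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `sales` series is made-up data with a built-in trend (it is not the Light IT data), and `difference` is a hypothetical helper name. It shows how differencing twice, as in the d = 2 of the order (9, 2, 0), removes a curved trend.

```python
def difference(series, d=1):
    """Apply first-order differencing d times (the 'd' in ARIMA(p, d, q))."""
    out = list(series)
    for _ in range(d):
        # Each pass replaces the series by the change between neighbours.
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


# Hypothetical monthly sales with a quadratic upward trend (illustrative only).
sales = [t * t + 3 * t + 10 for t in range(8)]

once = difference(sales, d=1)   # still trending upward
twice = difference(sales, d=2)  # constant: the trend has been removed

print("original:", sales)
print("d = 1:   ", once)
print("d = 2:   ", twice)
```

In practice a unit root test such as the Augmented Dickey-Fuller test would be run on the differenced series to confirm stationarity; that step is left out here to keep the sketch free of external libraries.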
The research showed that using the ARIMA model in Python 3 is a highly effective form of data analysis. When the data from the Light IT consultancy firm was tested with the model the results were pleasing, because the forecast was obtained easily. The analysis showed that the data was not stationary, so differencing had to be applied to make it stationary. This led to the conclusion that firms in the business sector should use the same model for their data analysis to keep their businesses on track.
With advances in technology, everybody wants to work in a coding language they understand well. Software today works with code, which avoids long structured sentences that take a great deal of time to write. The research showed that coding languages such as Python might look difficult to handle but are the best to use, which implies that any company should consider having an expert who is good at coding to allow fast manipulation of the company's data.
From the extensive discussion in this study, it was concluded that the ARIMA model is the best time series model and the one many people use when they conduct time series research. The others are good, but not as good as the ARIMA model. The model is the best because it uses lagged moving averages to smooth the time series under study and works on the assumption that the future depends on the past. It is widely used in statistical and technical analysis to forecast data.
Slide 10:
A few recommendations can be made based on the results:
Companies should adopt software to boost their businesses, because the world is advancing and the analogue method of data storage and analysis is almost forgotten. Good software handles many details in a firm that cannot be handled manually or that would take far longer by hand. Tools such as Python not only carry out the analysis but also store the results for future reference.
The business sector should use coding languages to allow faster manipulation of data. Since computers use artificial intelligence, coding is an easier way to tell the computer what you would like help with. This makes machine language a very important aspect of every business.
All companies would be wise to store their sales records for future reference. For example, a time might come when a company wants to know how it has been faring since it was started. Only stored data makes this possible, because without data the company cannot know whether it is doing well or where it needs to make adjustments.
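Slide 5 lists the symbols of the ARIMA(p,d,q) equation, but the equation itself does not appear in the text. A commonly used form, applied to the series after it has been differenced d times, is Xt = β0 + β1 Xt-1 + … + βp Xt-p + εt + θ1 et-1 + … + θq et-q. The sketch below assumes that form; the function name `arma_one_step` and all numbers are hypothetical and are not the fitted values from the project. It computes a one-step forecast by setting the unknown future error εt to its mean of zero.

```python
def arma_one_step(x, e, beta0, beta, theta):
    """One-step forecast from the assumed ARMA form.

    x     : past (already differenced) observations, oldest first
    e     : past error terms, oldest first
    beta0 : constant
    beta  : autoregressive coefficients beta_1 .. beta_p
    theta : moving average coefficients theta_1 .. theta_q
    """
    # Autoregressive part: beta_i multiplies the observation i steps back.
    ar_part = sum(b * x[-i] for i, b in enumerate(beta, start=1))
    # Moving average part: theta_j multiplies the error j steps back.
    ma_part = sum(th * e[-j] for j, th in enumerate(theta, start=1))
    return beta0 + ar_part + ma_part


# Illustrative values only (p = 2, q = 1).
forecast = arma_one_step(
    x=[1.0, 2.0, 3.0],
    e=[0.5, -0.5],
    beta0=1.0,
    beta=[0.5, 0.25],
    theta=[0.4],
)
print(round(forecast, 6))
```

For the order (9, 2, 0) reported in Slide 9, `beta` would hold nine coefficients, `theta` would be empty, and the forecast would have to be un-differenced twice to return to the scale of the original sales data.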
Regression
Slide 1:
Regression analysis is the most widely used technique for fitting models to data. When a regression model is fitted using ordinary least squares, we get a few statistics that describe a large set of data. These statistics can be highly influenced by a small set of points that differ from the bulk of the data. Such points may be y-type outliers (vertical outliers), which do not follow the general model of the data, or x-type outliers (leverage points), which are systematically different from the rest of the explanatory data. Points can also be both leverage points and vertical outliers; these are sometimes called bad leverage points. Collectively, we call points of any of these kinds outliers.
There are several techniques for estimating the parameters of a regression; one of them is Ordinary Least Squares (OLS). Estimating parameters with OLS requires certain assumptions to be satisfied, namely that the errors are mutually independent and normally distributed with mean 0 and variance σ².
The assumptions involved in regression are:
Autocorrelation: measures how a lagged version of a variable is related to the original version when time series data is considered.
Heteroscedasticity: refers to situations where the variance of the residuals is unequal over a range of measured values. In a regression analysis, heteroscedasticity results in an unequal scatter of the residuals.
Multicollinearity: a statistical concept in which several independent variables in a model are correlated. Two variables are perfectly collinear if their correlation coefficient is +/- 1.0. Multicollinearity among independent variables results in less reliable statistical inferences.
Normality: the data under study must follow a normal distribution.
Slide 2:
The objectives and deliverables under consideration are:
To define data analytics from the business and data usage perspectives: Defining exactly what data analytics is gives a broader view of the topic under discussion. Here, data analytics is described as the analysis of data to reach a substantive conclusion on a given topic of study (Mehta and Pandit, 2018).
This will be achieved through research and reading, to gain a precise understanding of what data analytics is in data processing and in business decision-making.
To justify the need for data analytics in the business decision-making process: This objective identifies the key gaps in conventional data processing and presents analytics as a solution, by comparing traditional data processing and its efficiency with data analytics and its efficiency in business decision-making.
To identify the regression data analytics techniques and algorithms: Identifying the diverse techniques used in regression analytics is key to steering the research towards its overall goal as specified in the topic of study.
To demonstrate the benefits of regression algorithms in predictive analytics: Demonstrating regression analytics will bring out its benefits, which are the specifics to be achieved under this objective. This is done through a hands-on demonstration of one or two regression algorithms used in data analytics.
Slide 3:
Regression analysis is one of the most important and commonly used statistical tools for examining the relationship between a dependent variable and one or more independent variables, with wide applications in finance, economics, medicine and psychology. A regression model is generally defined as
Y = Xβ + ε (1)
where Y is the dependent variable, ε is the vector of true residuals, and X is the n × p design matrix. Let β̂ denote the estimator of β; the fitted residuals are then
e = Y − Xβ̂ (2)
Regression analysis ordinarily uses the least-squares method to estimate the model parameters, under certain assumptions that must be satisfied, such as normality of the errors with zero mean and constant variance, i.e. ε ∼ N(0, σ²).
Slide 4:
Outliers are inconsistent observations that deviate widely from the majority of the observations in the data. They need proper handling because they pose a real threat to the regression model and its estimated coefficients and can therefore give misleading results (Werner, 2019). Two kinds of outliers can occur in a regression data set. Observations with extremely large values in the response are referred to as vertical outliers, while observations with extremely large values in the explanatory variable are called leverage points. Robust regression is an approach for overcoming the problem of outliers and influential observations in data and limiting their influence on the regression results. The vast majority of so-called robust regression techniques lack this characteristic.
The basic goal of robust estimation is to provide reliable estimates and inferences for the unknown parameters while keeping outliers at bay. The robust approach replaces the OLS sum of squared residuals with another function that is less affected by unusual observations. These methods first fit a regression to the data and then identify outliers as the observations with large residuals. Efficiency, breakdown point and bounded influence are three desirable properties of robust methods. The breakdown point is the smallest fraction of outlying observations that an estimator can face before it gives an incorrect result. Least Trimmed Squares (LTS) is a highly robust and reasonably efficient estimator among the robust estimators available in the literature, and it is obtained by minimizing the trimmed sum of the squared residuals. The LTS estimator is a modified version of the LS estimator that fits the bulk of the data while ignoring the extreme observations in the ordered data.
Slide 5:
This is the result of the residual normality test, carried out using the Kolmogorov-Smirnov test. The p-values are very low, so the null hypothesis is rejected, and it can be inferred that the residuals of the classical linear regression models are not normally distributed. Residuals that are not normally distributed can be caused by outliers in the data.
Slide 6:
Outlier detection was done using TRES detection, and this result was obtained. The results in the table show that the malnutrition data for 2012-2017 contains outliers.
Slide 7:
These figures show the standardized residuals versus fitted values and the robust distances for India, respectively. A thorough investigation reveals that both population growth and foreign direct investment (FDI) inflows play a significant role in Pakistan's economic growth. Nonetheless, FDI inflows are closely linked to population growth, and economic growth is closely linked to population growth, even if the influence of gross investment is minor.
Slide 8:
The current study examined the impact of FDI inflows, annual population growth and gross investment on the GDP per capita of Pakistan and India, using least squares (LS) and high-breakdown robust least trimmed squares (LTS) regression techniques. FDI has a small but positive impact on the economic growth of both Pakistan and India under LS; once the LTS approach is applied, however, FDI becomes a decisive and vital part of Pakistan's economic growth model.
Nonetheless, after the removal of 5 and 2 outliers from the data of Pakistan and India, respectively, FDI has a negligible impact on the economy of India. Population growth affects GDP per capita in the same way in both economies. Both methods reveal that rapid population growth adversely affects the economic growth of the two countries. It is therefore a significant issue for both economies and requires prompt attention.
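The contrast between LS and LTS drawn in the regression slides can be illustrated with a small, self-contained sketch. It rests on assumptions that are not in the original study: the data are made up (seven points on the line y = 2x + 1 plus one vertical outlier), the helper names `ols` and `lts` are hypothetical, and LTS is computed by brute force over all subsets of size h, which is only practical for very small samples.

```python
from itertools import combinations


def ols(points):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def lts(points, h):
    """Least trimmed squares by exhaustive search.

    Fits OLS to every subset of h points and keeps the fit whose
    sum of squared residuals on its own subset is smallest.
    """
    best = None
    for subset in combinations(points, h):
        a, b = ols(subset)
        rss = sum((y - (a + b * x)) ** 2 for x, y in subset)
        if best is None or rss < best[0]:
            best = (rss, a, b)
    return best[1], best[2]


# Made-up data: a clean linear relationship plus one vertical outlier.
data = [(x, 2.0 * x + 1.0) for x in range(1, 8)] + [(8, 60.0)]

ols_intercept, ols_slope = ols(data)
lts_intercept, lts_slope = lts(data, h=7)  # trim one observation

print("OLS slope:", round(ols_slope, 3))
print("LTS slope:", round(lts_slope, 3))
```

In this toy case the single outlier pulls the OLS slope well above 2, while LTS recovers the slope of the clean points because the outlier is trimmed. Production implementations use faster approximate algorithms rather than exhaustive search.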