Project Report on 
Developing statistical models of demand 
forecasting for domestic trade market 
operations of Torrent Pharmaceuticals Ltd. 
In partial fulfillment of requirements of 
Master of Business Administration (2006-08) 
Submitted BY 
SUCHIT SHAH 
Roll no.51 
MBA-I 
SUBMITTED TO 
AES PGIBM 
Undertaken at 
Torrent Pharmaceuticals Limited
ACKNOWLEDGEMENT 
I hereby wish to take the opportunity to express my gratitude to Mr. 
K G Ramchandran, General Manager Human Resources, for allowing me 
to undertake my summer training at a well-reputed organization like 
Torrent Pharmaceuticals Limited, and to Ms. Miti Randeri and Ms. 
Mallika Priyadarshini, Assistant Managers HR, for taking care of all our 
official requirements. 
I express my sincere thanks and gratitude to Mr. Vipul Patel- General 
Manager Supply Chain Management and Mr. Chandan Chatterjee- 
AGM Supply Chain Management for guiding and encouraging me to 
carry out my project work successfully. 
I wish to convey my deepest regards and thanks to my project guides, 
Mr. Bhavesh Nainani, Manager DPC, and Mr. Deep Vyas, Manager 
DPC, for all their help, timely guidance and feedback in spite of a 
very busy schedule. They always managed to find time to sit with me 
and provide the necessary guidance and ideas. 
I also wish to express my sincere thanks to Mr. Bhavin Shah- 
Assistant Manager DPC and Mr. Jayant Nikhare- Assistant Manager 
DPC for their help and support in every possible way. 
Finally, I wish to thank all the staff of the Supply Chain Management 
Department for their kind cooperation during the tenure of my 
project. 
Suchit Shah 
Summer Trainee 
May-July (2007) 
AES Post Graduate Institute of Business Management 
Ahmedabad 
TABLE OF CONTENTS 
Executive Summary 
Torrent Group: Overview 
Mission, Vision & Values 
Objective 
Salient Features 
Project Constraints 
Assumptions made during the project 
Project Overview 
Benefits Expected 
Demand Forecasting: Introduction 
The basic steps in a forecasting task 
Company network 
Demand Planning @SCM Dept 
Exponential Smoothing 
Triple Exponential Smoothing 
Multiple Regression with ‘n’ factors 
Multiple Regression with MS Excel 
LINEST function 
Fitting Multiple Regression Model 
Methodology 
Regional level forecasting 
Findings 
Recommendations 
Future scope of the model 
Limitations 
References 
Executive Summary 
This report first studies how the planning system works at the Domestic Demand 
Planning Cell (DPC), SCM dept., Torrent Pharmaceuticals Ltd., in order to get 
acquainted with its systems and processes. 
It examines the reports prepared by the DPC, what data are maintained and in what 
form, and the existing demand forecasting procedure. An exploratory analysis is 
then carried out. 
In the initial phase, the report studies the product basket. Products are classified 
primarily by their sales behavior, and all the factors which may directly or indirectly 
affect the sales of an SKU are identified and defined. 
After getting acquainted with the product basket and its behavior, the report defines 
the problem. Demand forecasting is the process of determining what products are 
needed where, when, and in what quantities. Sales need to be forecast so that the 
forecasts show less fluctuation against actual sales. The report explores the topics 
related to demand forecasting. 
The present forecasting system is well structured and well defined, yet it has not 
produced accurate forecasts. It cannot quantify the fluctuation in actual sales, which 
is why a statistical model is needed. The report then explores the alternative models 
available. 
With 24 months of sales data available, a triple exponential smoothing model is 
applied under its assumptions. It did not produce the desired output and was 
therefore rejected. 
Actual sales are affected by many parameters, all of which should be accounted for 
in the forecast. Given this complexity, the report applies a multiple regression model 
with the shortlisted parameters. It takes all the assumptions of multiple regression for 
granted. In the model, sales are treated as the dependent variable and the affecting 
parameters as the independent variables. 
A database is built for the included parameters. Data are collected from SAP and 
ORG-Marg: primary sales and institutional sales are obtained from internal data 
sources, and tertiary sales from ORG data. 
Using MS Excel (2003), a multiple regression is fitted to the data and forecasts are 
generated for future months. Forecast accuracy is calculated as per the DPC 
methodology, and the results are compared with those of the existing system. 
The model shows a significant increase in forecast accuracy. 
A model for demand forecasting at the regional level is also built, but it could be 
neither analyzed nor validated owing to time constraints. 
It is recommended that the model be implemented for demand forecasting with 
personnel intervention and the addition of due insights. 
Torrent Group: Overview 
It all began with the inspired efforts of one enterprising individual, Shri U N Mehta, when 
he ventured on his own to create history in the Indian pharmaceutical industry by 
successfully implementing the concept of niche marketing. With the launch of 
Trinicalm Plus, an effective tranquilizer, the foundation of the company was laid 
as ‘Trinity Laboratories’, which was later renamed ‘Torrent’. Today Torrent is one 
of the leading pharmaceutical companies of India. Torrent is a multifaceted and 
dynamic group dedicated to transforming life by serving two of its most critical 
needs: healthcare and energy. 
In the power sector, the Torrent Group remains the most experienced private 
sector player in the state of Gujarat. Torrent has just launched a mega project, the 
1100 MW SUGEN CCPP, which is being set up at an investment of Rs. 3096 crores. 
It is a backward integration move by Torrent Power to secure a reliable source of 
supply for its Ahmedabad and Surat distribution areas. 
The project is strategically located close to the River Tapi, National Highway No. 8 and 
gas supply infrastructure comprising LNG terminals and main gas trunk lines. The plant 
would comprise 3 advanced class gas turbines with a high operating efficiency. 
The environmental and social impact of this project is minimal owing to the use of 
eco-friendly natural gas. 
The flagship company of Torrent group, Torrent Pharmaceuticals Limited, is a dominant 
player in the therapeutic areas of cardiovascular (CV) and central nervous system (CNS) 
and has achieved significant presence in gastro-intestinal, diabetology, anti-infective and 
pain management segments. 
To cater to new niche segments and sharpen its focus among customers, Torrent Pharma 
has 11 marketing divisions, each catering to a defined therapeutic segment. Torrent 
Pharma’s competitive advantage as a manufacturer stems from its world-class 
manufacturing facilities. Its manufacturing facilities at Indrad, Gujarat, comply with 
USFDA, WHO, cGMP, MHRA and TGA norms and have received ISO 9001, ISO 14001, 
OHSAS 18001 (Occupational Health and Safety Management System) and ISO/IEC 17025 
certifications. 
To cater to its growth requirements, Torrent Pharma commissioned a new 
state-of-the-art formulations manufacturing facility at Baddi, Himachal Pradesh, in 
November 2005. The facility has the capacity to manufacture 3600 million tablets, 
400 million capsules and 18 million oral liquid bottles per annum and would cater to 
the domestic formulations requirement. 
Torrent has a modern and well-equipped state-of-the-art R&D Centre, built with an 
investment of US $ 40 million. It is manned by more than 525 highly qualified scientists, with 
a combined experience of over 2500 scientific man-years in Drug Discovery and 
Development. Torrent Pharma has earmarked 9% of sales year-after-year for R&D 
advancement. 
In the International operations arena, Torrent Pharma exports to more than 50 countries 
around the world with over 1000 product registrations. The international business has been 
broadly divided into five zones- USA, Latin America, Russia and CIS, Western Europe and 
CEE and Rest of the World (ROW). For its export excellence in International Business, 
Torrent Pharma has won several prestigious export awards. 
Torrent Pharma is now gearing up to enter the advanced highly regulated international 
markets. Torrent Pharma has incorporated Zao Torrent Pharma in Russia, Torrent Do Brasil 
Ltda in Brazil, Torrent Pharma GmbH in Germany, Torrent Pharma Inc. in USA and Torrent 
Pharma Philippines Inc. in Philippines. These wholly owned subsidiaries will become a 
springboard for entry into several regulated and less regulated international markets. 
TORRENT PHARMACEUTICALS LIMITED 
Mission: 
We commit ourselves to total customer care by delivering world-class products and 
services. 
Vision 
To be the leader in the pharmaceutical industry 
Values 
A set of core values continues to guide us through the process of transforming the 
conglomerate into a high-performing and caring organization for our customers, 
employees, shareholders and society. 
· Improving quality of life of our customers, as we believe quality is a way of life. 
· Creating value for our shareholders, for the trust bestowed on us. 
· Building an empowered and ethical Torrent family, as the foundation for a 
bright future. 
· Responsibility towards the society and environment, as we owe our existence 
to them. 
· Being innovative in solutions, for being different, counts. 
· Striving for excellence in whatever we do, to follow the exclusive path to 
leadership. 
· Flexibility and speed shall be our oars for navigating the turbulent seas. 
Objective of Project 
To develop a statistical model of demand forecasting for domestic trade 
market operations of Torrent Pharmaceuticals Ltd. 
· At gross level 
· At regional level 
Salient Features 
The salient features of the project are as follows: 
· It aims to improve the existing demand forecasting process by using a 
statistical tool 
· It tries to cover all the quantitative and qualitative factors which affect 
the actual sales 
· It takes uncertain fluctuations into account and captures them 
· It discusses product-specific sales behavior 
· It makes the whole forecasting procedure a dynamic one 
· It gives a clear picture of the pharma sector, from the drug-specific to the 
macro level 
Project Constraints 
· The project is based on the tertiary sales made available from 
ORG-Marg data, which may be inaccurate to some extent 
· The project doesn’t include secondary sales data 
· The project considers only the parameters for which data are 
available 
· The project tries to estimate the future values of the parameters 
Assumptions Made during the project 
· The data collected are accurate. 
· Future estimates of the parameters are true. 
· The parameters taken into consideration are least correlated with one another 
· Data collection horizon ranges from May’05 to June’05 
Project overview 
To study how demand planning works at the SCM dept. 
To develop a statistical tool for demand forecasting 
· Identifying and defining the parameters 
· Applying various statistical tools 
· Matching the output of the model with past sales 
· Comparing the existing system with the suggested system 
· Checking the robustness of the tool 
· Implementing the tool for all or some of the SKUs, if all the criteria are fulfilled 
Benefits expected 
· Minimization of overstocking 
· Reducing the gap between orders and actual sales 
· No opportunity loss, which may result in growth 
· Better inventory control and hence better cash flow 
· Better utilization of resources 
· Dispatch efficiency 
· Smooth operation flow from demand planning to order execution 
· Prior planning of recruitment and changes in workforce 
· Proper allocation of promotional budget 
DEMAND FORECASTING- a brief overview 
Introduction 
What does the word forecast mean? 
The word “fore” means “watch out” in golf and is shouted as a warning to anyone who 
could be in the path of a misplaced golf ball. The word “cast”, to an angler, means 
“throw out”. Putting the two words together gives “forecast”, that is, “watch out and 
throw out”. 
Forecast management is the process of making, checking, correcting and using 
forecasts. It also includes determination of the forecast horizon. 
Forecast- An estimate of future demand. A forecast can be determined by 
mathematical means using historical data. It can be created subjectively by using 
estimates from informal sources, or it can represent a combination of both techniques. 
Forecasting involves making projections about future performance on the basis of 
historical and current data. 
Forecast methods can be divided into history-based and future-based ones. 
· History-based demand forecasts are analytic methods based on consumption 
statistics. They can be further divided into mathematical and graphic methods. 
· Future-based demand forecasts use already existing information about future 
demand, e.g. offers, confirmed orders in a contracting phase and interviews on 
customer behavior (Schönsleben, 1998). 
In this study, conditional variance models are used for quantifying the demand 
process uncertainty. The uncertainty can, for example, depend on the level of 
demand, the previous shocks and the historic level of the variance process. 
Understanding customer demand is key for any manufacturer to make and keep 
sufficient inventory so that customer orders can be met correctly. The discipline that 
helps a supply chain forecast and plan well is called demand planning. 
Accurate and timely demand plans are a vital component of an effective supply chain. 
Inaccurate demand forecasts typically would result in supply imbalances. Although 
revenue forecast accuracy is important for corporate planning, forecast accuracy at the 
SKU level is critical for proper allocation of resources. 
Types of forecasting 
· Quantitative forecasting is used when sufficient quantitative information is 
available. 
· Qualitative forecasting is used when little quantitative information is available, but 
sufficient qualitative knowledge exists. 
Quantitative forecasting can be applied when three conditions exist: 
1. Information about the past is available 
2. This information can be quantified in the form of numerical data. 
3. It can be assumed that some aspects of the past pattern will continue into 
the future. 
Under quantitative forecasting methods, there are two major types of forecasting 
models: 
· Explanatory models 
· Time series forecasting 
Explanatory models assume that the variable to be forecasted exhibits an explanatory 
relationship with one or more independent variables. 
Time series forecasting deals with the past data only. It makes no attempt to discover 
the factors affecting its behavior. The objective of time series forecasting methods is to 
discover the pattern in the historical data series and extrapolate that pattern into the 
future. 
Forecast Management 
While designing a forecasting system, the policy issues of what to forecast, why 
forecast is needed, and who does the forecasting must be addressed. A forecast is 
meaningful only in relation to planning and decision making in some area of business 
application. Thus, an important aspect of any forecasting system is knowing and 
planning how it will be used in business planning, budgeting, and the operations 
aspects of master scheduling and inventory planning. Different attributes of the 
forecasting system are of varying levels of concern and interest to people in each of 
these areas. 
The basic steps in a forecasting task 
When quantitative data is available, forecasting is a sequential process of five steps. 
Step 1: Problem definition 
Step 2: Gathering information 
Step 3: Preliminary (exploratory) analysis 
Step 4: Choosing and fitting models 
Step 5: Using and evaluating a forecasting model 
Problem definition 
The definition of the problem involves developing a deep understanding of how the 
forecasts will be used, who requires the forecasts, and how the forecasting function fits 
within the organization. It is worth spending time talking to everyone who will be involved 
in collecting data, maintaining databases, and using the forecasts for future planning. 
A forecaster has a great deal of work to do to properly define the forecasting problem 
before any answers can be provided. One needs to know exactly what products are 
stored, who uses them, how long it takes to produce each item, what level of unsatisfied 
demand the company is prepared to bear, and so on. 
Gathering information 
The information available can be mainly of two types: 
1. Statistical data 
2. The accumulated judgment and expertise of key personnel 
Exploratory analysis 
Exploratory analysis starts by calculating simple statistics such as the mean, standard 
deviation, correlation, minimum, maximum and percentiles associated with each set of 
data. When more than one series of historical data is available, descriptive statistics 
can also be used to explore how the series relate to one another. 
The purpose of doing this at this stage is to get a feel for the data. Do they follow 
consistent patterns? Is there evidence of the presence of business cycles? Are there 
any outliers in the data that need to be explained by those with expert knowledge? How 
strong are the relationships among the variables available for analysis? 
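As a minimal sketch of this exploratory step, the snippet below computes the simple statistics named above for two monthly series. The values are the June '05 to May '06 sales and orders for Alprax 0.5 tabs taken from the CODIS data in this report. The function names `describe` and `pearson` and the use of Python's standard `statistics` module are illustrative assumptions; the DPC itself works in MS Excel.

```python
import statistics

# Monthly sales and orders (units), June '05 to May '06, from the CODIS data in this report.
sales = [562903, 499151, 525658, 590579, 443185, 499145,
         515552, 459476, 411027, 432582, 613461, 604213]
orders = [878172, 755927, 795987, 885868, 665098, 749917,
          773628, 691974, 616661, 686613, 940231, 913461]

def describe(series):
    """Return simple descriptive statistics for one data series."""
    quartiles = statistics.quantiles(series, n=4)  # 25th, 50th, 75th percentiles
    return {
        "mean": statistics.mean(series),
        "stdev": statistics.stdev(series),  # sample standard deviation
        "min": min(series),
        "max": max(series),
        "p25": quartiles[0],
        "median": quartiles[1],
        "p75": quartiles[2],
    }

def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

summary = describe(sales)
corr = pearson(sales, orders)
```

A correlation close to 1 between orders and sales would suggest that the two series move together; outliers show up as values far from the quartile range.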
Choosing and fitting models 
After the exploratory analysis, it becomes clear how to handle the data: what pattern 
and behavior are being observed, and which factors affect actual sales. 
This is the stage at which the model to be fitted is chosen. One interprets the 
characteristics of the actual past data and matches the assumptions of the candidate 
models against the data. After choosing the model, one fits it to the data and, if 
necessary, modifies it accordingly. 
Using and evaluating a forecasting model 
After the model is fitted to the actual data, inferences can be drawn and forecasts 
generated for future periods. The model should be checked by holding out one month 
of actual data and generating a forecast for that month. The forecast is then compared 
with the held-out data, and the forecast effectiveness (forecast accuracy) is 
calculated. If it is better than that of the present system, the model should be used. 
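The hold-out check described above can be sketched as follows. The three-month moving average is only a placeholder model chosen for brevity, not the model developed in this report, and the function names are hypothetical. The sales figures (Jan '07 to May '07) and the May '07 demand plan of the existing system are taken from the CODIS data in this report.

```python
def moving_average_forecast(history, window=3):
    """Placeholder model: forecast the next month as the mean of the last `window` months."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def pct_deviation(actual, forecast):
    """Deviation of actual sales from the forecast, as a fraction of the forecast."""
    return (actual - forecast) / forecast

# Monthly sales (units), Jan '07 to May '07, from the CODIS data in this report.
sales = [527857, 533434, 420533, 668147, 580078]

# Hold out the latest month and use only the earlier months to forecast it.
train, held_out_actual = sales[:-1], sales[-1]
candidate_forecast = moving_average_forecast(train)

# Demand plan of the existing system for the held-out month (May '07).
existing_forecast = 525000

candidate_dev = pct_deviation(held_out_actual, candidate_forecast)
existing_dev = pct_deviation(held_out_actual, existing_forecast)

# Prefer the candidate only if its absolute deviation is the smaller one.
use_candidate = abs(candidate_dev) < abs(existing_dev)
```

A single held-out month is a weak test; repeating the comparison over several months gives a fairer picture before any model is preferred.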
SCM @ TORRENT 
Supply Chain Management coordinates entire channel from supplier to customer. 
Supply Chain Management is the management of the entire value-added chain, from 
the supplier to manufacturer right through to the retailer and the final customer. Supply 
chain management coordinates almost all the departments of the company. It links the 
departments and smoothens the whole system. 
SCM has three primary goals: 
· Reduce inventory, 
· Increase the transaction speed by exchanging data in real-time, 
· Increase sales by implementing customer requirements more efficiently. 
Planning done at SCM is the indicator for all the other departments, i.e. production, 
finance, marketing and HR. 
Torrent’s supply chain management is mainly bifurcated into two divisions: 
· Domestic operations division 
· International operations division 
The domestic operations division is further divided into the following: 
· C&FA Cell 
· Demand Planning Cell 
· Indrad Warehouse 
· Zirakpur Warehouse 
· PPC: Indrad & Baddi 
The supply chain management department is well equipped with the necessary 
infrastructure, including modern software and hardware. 
To handle the company’s large multipoint network across India and the world, SAP is 
implemented at the TORRENT SCM dept. The MM module (Material Management 
module) and the PP module (Production Planning module) are used by the department 
personnel. 
Microsoft Excel is used extensively in the department. Various MIS reports are 
prepared in MS Excel from SAP data. 
Company network 
Torrent’s corporate office is based at Ahmedabad. All the supporting activities are 
conducted from the HO (based at Ahmedabad). 
Two plants are situated at Indrad (Gujarat) and Baddi (Himachal Pradesh). 
Most of the domestic requirement is served by the Baddi plant. The company has set up 
its warehouse at Zirakpur (Punjab). Products produced at the Baddi PPC are stored at 
the Zirakpur warehouse, and all dispatches are made from the warehouse. 
The company has 25 Carrying and Forwarding Agents (C&FAs) across India. 
C&FAs are responsible for the primary sales in their allocated regions. 
They are the agents which sell the products to the stockists. They get orders from 
the stockists and pass them on to the supply chain department at HO. 
All these activities are in turn coordinated by the Supply Chain Department. 
The company engages mainly in two types of selling: 
· Trade sales 
· Institutional sales 
Trade sales are the sales made through the C&FA channels, while institutional sales 
are the sales to institutions such as hospitals, railways and the army. 
[Figure: Company network. Nodes: SCM@HO (demand planning); Baddi PPC and Indrad PPC; 
Indrad Warehouse and Zirakpur Warehouse; C&FAs (with inter-C&FA stock transfer); 
stockists and institutions; retailers; customers. Flows: warehouse to C&FA; primary 
sales (C&FA to stockist or institution); secondary sales (stockist to retailer); 
tertiary sales (retailer to customer).] 
Products 
The company has a product basket of 500+ products, produced in the form of tablets, 
capsules, liquids and injections. Each product is allocated a unique 7-digit product code. 
From the marketing point of view, the drugs are allocated to 11 divisions: 
Sensa, Mind, Axon, Neuron, Azuca, Psycan, Omega, Delta, Prima, Vista, Alfa 
These divisions are in turn classified into three groups: PVA, APOD and SMAN 
Where; PVA= Prima, Vista, Alpha (Anti Infective segment) 
APOD= Azuca, Psycan, Omega, Delta (Cardiology and Diabetology) 
SMAN=Sensa, Mind, Axon, Neuron (Central Nervous System) 
Product Classification 
These products show different behaviors in selling quantity. Accordingly, they should 
be classified as: 
· Matured (stable) products 
· Seasonal products 
· New products 
Matured products are products which have been in the market for a long time. 
They show a particular pattern without significant deviation, so their fluctuation can 
be understood and they clearly reflect a stable pattern. 
E.g. Nikoran 5, Deplatt tab, Antidep 
Seasonal products are products which show particular seasonal behavior: sales go up 
in a particular season, i.e. in particular months. With data for more than one cycle 
(i.e. more than a year), the size of the hike due to the season can also be estimated. 
E.g. Quintor Infusion has shown high sales in the months of April and May. 
New products are products launched within the last 6 months. With so little data, one 
cannot capture the trend or the amount of deviation, so it is not easy to estimate 
their behavior or the fluctuation in selling quantity. 
E.g. Rimofit and Rimoslim were launched in May’07. 
Demand Planning @SCM Dept., 
Demand planning is the process through which an organization generates a forecast of 
market demand for its products on a regular basis. This allows the organization to 
calculate a historically based statistical forecast for each point (that is, part 
number/warehouse combination). Some key output variables include demand in pieces, 
demand in customer orders, pieces per customer order; standard (forecast) deviation, 
and pieces per deviation. 
At Torrent, there is a separate demand planning cell under the SCM dept., which 
conducts the demand planning on the basis of 4 months rolling plan. Under the rolling 
plan, planning is done 4 months prior to the corresponding month. Planning includes 
the demand forecasting, production planning, Supply planning, and dispatch 
planning. 
The demand plan is first given by the marketing department and is then reviewed by 
the demand planning cell. For every product in each division, the demand plan is 
reviewed and corrected if needed. 
On the 20th day of every month M, the demand planning cell carries out demand 
planning for the month M+3. 
Once the demand plan is decided, it is executed by the related departments of the 
company in a sequential and structured way. All other planning, such as production 
planning, financial planning, dispatch planning and procurement planning, is done 
accordingly. 
The company produces most of its products at in-house facilities, i.e. at the Indrad and 
Baddi plants. For certain products, the company has P2P and LLM arrangements. 
P2P is a principal-to-principal arrangement, in which products made by other 
companies are marketed by TORRENT. The drug license and manufacturing license must 
be held by the manufacturing company; Torrent need not hold them. Approx. 230 
products are received through P2P. 
LLM is Loan License Manufacturing, in which TORRENT uses the plant and facilities of 
other companies. Torrent must hold the drug license and manufacturing license for the 
particular drug. Approx. 38 products are received through LLM. 
It is very complex task to forecast for the products which are not produced in house, as 
it has a longer lead time than the products produced in-house. 
The demand planning cell deals with these arrangements. It is responsible for getting 
the products in time and for planning their demand and dispatches at the right time 
and at the least cost. 
Under certain circumstances, it is not possible to execute all the orders received from 
the stockists. There are situations in which the stock cannot be connected (supplied) 
as per the order. 
This generally happens because of non-availability of raw material, machine 
breakdown, transportation problems or sudden excess demand. 
In some cases, it is known in advance that a particular product may not be available 
in the coming month. That product is then declared a Non Available Product, 
abbreviated as NAP. These orders could have been genuine sales, had demand planning 
been proper. 
At the beginning of every month, every aspect of the past month is analyzed and 
justified. Reports such as the Gap report, NAP report, inventory analysis report and 
connectivity report are prepared. 
Planning Horizon: 4 months rolling plan 

Month       Index   Status       Plan 
July        M       Solid rock   Frozen 
August      M+1     Solid        Frozen 
September   M+2     Slushy       Tentative 
October     M+3     Liquid       Tentative 

Consider June’07 as the reference point. Under the 4 months rolling plan, demand 
planning is done in June for the coming 4 months. As it is a continuous process, a new 
month is added every month and the status of the subsequent months changes. 
The next two consecutive months are considered frozen. That is, in the month ‘M-1’ 
the demand plan is fixed for the next two months, ‘M’ and ‘M+1’. A plan in Solid rock 
or Solid status cannot be changed. 
The plan for the 3rd and 4th months is tentative. The status of these months is Slushy 
and Liquid, and in the tentative plan demand can be changed as per the constraints. 
In the same manner, the months ‘M’ and ‘M+1’ were themselves tentative in ‘M-3’. 
The status of every month changes on arrival of a new month. 
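The way statuses roll forward can be sketched in a few lines. The function name `rolling_plan` and the month labels are illustrative assumptions; only the four statuses and the frozen/tentative split come from the description above.

```python
# Status of each month in the 4 months rolling plan, in planning order.
STATUSES = ["Solid rock", "Solid", "Slushy", "Liquid"]
FROZEN = {"Solid rock", "Solid"}  # frozen months; Slushy and Liquid are tentative

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def rolling_plan(planning_month, planning_year):
    """Return (month label, status, is_frozen) for the 4 months after the planning month.

    planning_month is 1-12 and corresponds to 'M-1' in the notation above.
    """
    plan = []
    for offset, status in enumerate(STATUSES, start=1):
        index = planning_month - 1 + offset   # zero-based month index, may pass 11
        year = planning_year + index // 12    # roll over into the next year
        label = f"{MONTHS[index % 12]} {year}"
        plan.append((label, status, status in FROZEN))
    return plan

june_plan = rolling_plan(6, 2007)
july_plan = rolling_plan(7, 2007)
```

Comparing `june_plan` with `july_plan` shows the roll: September is Slushy (tentative) when planned in June and becomes Solid (frozen) when July arrives, while November enters the horizon as Liquid.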
Existing system of forecasting 
Torrent’s domestic operations work on a make-to-stock basis. Products are 
manufactured before orders are received at the C&FAs. Hence there is a need to 
forecast sales in advance, so that the optimum quantity is produced to serve the market. 
The existing forecasting system at Torrent doesn’t use a specific statistical tool. 
Forecasting is based on past data: statistics such as average, minimum and maximum 
sales, along with orders, are taken into consideration. 
The division-wise demand plan is prepared by the marketing department on the basis of 
field targets. This plan is reviewed by the demand planning cell, and the demand plan 
is finalized according to the schedule of the rolling plan. 
Demands (forecasts) are generally predicted on the basis of past data. The behavior 
of recent months, along with the general trend, is considered. Field targets given to 
the sales force are also taken into consideration. These are the quantitative data. 
Certain factors such as epidemics, seasonal effects and some visible factors are also 
taken care of. Visible factors include competitors’ moves, market behavior and 
regulatory factors. These factors are qualitative data, which should be quantified in 
a defined manner. 
Considering all these factors, forecasts are put forward. 
The present system relies largely on judgment; no particular statistical tool is applied. 
It has therefore not been able to capture all these factors precisely. Fluctuations 
cannot be quantified in the proper proportion, and there may be bias in the estimation 
and quantification of these parameters. 
As a result, the forecasts do not match the actual sales closely. 
Poor forecast accuracy results in: 
· Dispatch inefficiencies 
· Loss of genuine sales 
· High inventory, and hence blockage of working capital 
· High lead time 
Good forecast accuracy is a must. Forecast accuracy is currently low and needs to be 
improved. 
Hence, there is a need to develop a system (model) which takes care of all the 
concerned factors. All the factors need to be understood and properly quantified, 
including how a single factor affects different SKUs in different ways. 
The demand planning cell prepares a file named CODIS, which stands for Correlation 
among Orders, Demand, Inventory and Sales. For every product, SAP provides the 
demand, orders received, sales and total availability. 
This file is used to analyze the actual scenario, i.e. to what extent orders are executed. 
The % variation of demand to sales and the % variation of demand to orders are 
calculated. These show how close the demand is to, or how far it is from, the actual 
sales and orders. 
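A minimal sketch of this calculation is given below, assuming the variation is measured relative to demand as (value - demand) / demand; this sign convention reproduces the percentages plotted in this report's variation chart. The row layout and the function name `pct_variation` are illustrative, and the figures are the first three months of the Alprax 0.5 tabs data.

```python
def pct_variation(value, demand):
    """Variation of `value` (sales or orders) relative to planned demand, in percent."""
    return (value - demand) / demand * 100

# One CODIS row per month: (month, demand, orders, sales), in units.
codis_rows = [
    ("June '05", 950010, 878172, 562903),
    ("July '05", 900120, 755927, 499151),
    ("Aug '05", 850000, 795987, 525658),
]

variation = [
    (month,
     round(pct_variation(sales, demand), 2),   # % variation, demand to sales
     round(pct_variation(orders, demand), 2))  # % variation, demand to orders
    for month, demand, orders, sales in codis_rows
]
# variation[0] is ("June '05", -40.75, -7.56)
```

A negative value means that sales (or orders) fell short of the demand plan; a positive value means they exceeded it.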
The two graphs that follow show, first, the status of orders, demand, sales and stock 
and, second, the % variation of demand to sales and of demand to orders, with the 
corresponding trend lines. 
The graphs are for the product Alprax 0.5 tabs, which contains the molecule 
Alprazolam, belonging to the class of tranquilizers. 
Forecast Accuracy Jun'05 - May'07 
[Chart: Demand, Orders, Sales and TA @ CFA by month; vertical axis: Quantity (Units), 0 to 2000000] 

Month       Demand   Orders   Sales    TA @ CFA 
June '05    950010   878172   562903   1431263 
July '05    900120   755927   499151   1449613 
Aug '05     850000   795987   525658   1413207 
Sept '05    860000   885868   590579   1364700 
Oct '05     772000   665098   443185   1322452 
Nov '05     855000   749917   499145   1577114 
Dec '05     820000   773628   515552   1412617 
Jan '06     730000   691974   459476   1138279 
Feb '06     750000   616661   411027    933865 
March '06   700000   686613   432582   1319114 
April '06   850000   940231   613461   1727856 
May '06     800000   913461   604213   1282504 
June '06    900000   962695   621979   1235664 
July '06    950000   839190   557786   1275416 
Aug '06     900000   855761   568361   1552737 
Sept '06    900130   741450   476396    767560 
Oct '06     670000   490813   486713    687486 
Nov '06     600130   538989   535429    387987 
Dec '06     500130   492931   491120    792249 
Jan '07     500000   534135   527857    261937 
Feb '07     450000   558444   533434    667944 
Mar '07     375000   428911   420533    610092 
Apr '07     525000   693125   668147    893657 
May '07     525000   607044   580078    904710 
June '07    550000   596129   574958    734085 
Tracking Forecast Accuracy 
(Chart data; % variation by month, with a linear trend line for each series) 
Month       Series1 (sales vs. demand)   Series2 (orders vs. demand) 
June '05    -40.75%                      -7.56% 
July '05    -44.55%                      -16.02% 
Aug '05     -38.16%                      -6.35% 
Sept '05    -31.33%                      3.01% 
Oct '05     -42.59%                      -13.85% 
Nov '05     -41.62%                      -12.29% 
Dec '05     -37.13%                      -5.66% 
Jan '06     -37.06%                      -5.21% 
Feb '06     -45.20%                      -17.78% 
March '06   -38.20%                      -1.91% 
April '06   -27.83%                      10.62% 
May '06     -24.47%                      14.18% 
June '06    -30.89%                      6.97% 
July '06    -41.29%                      -11.66% 
Aug '06     -36.85%                      -4.92% 
Sept '06    -47.07%                      -17.63% 
Oct '06     -27.36%                      -26.74% 
Nov '06     -10.78%                      -10.19% 
Dec '06     -1.80%                       -1.44% 
Jan '07     5.57%                        6.83% 
Feb '07     18.54%                       24.10% 
Mar '07     12.14%                       14.38% 
Apr '07     27.27%                       32.02% 
May '07     10.49%                       15.63% 
The above graph shows the % variation of sales and orders to demand. 
Forecast accuracy at TORRENT with the present system 
A forecast is said to be accurate if: 
Sales = 90% to 110% of the forecast 
At the beginning of each month, the demand planning cell at TORRENT calculates the 
forecast accuracy for the past month, at both the gross level and the C&FA level. 
The actual sales of the last month are compared with the projected demand for the 
corresponding product and month, and the deviation of actual sales from the demand 
is calculated. 
For a product ‘X’, let the actual sales be ‘Yt’ and the corresponding forecast be ‘Ft’. 
The deviation is then calculated by the formula (Yt-Ft)/Ft, which gives the % 
deviation of sales from demand. 
At TORRENT, a range is defined for specifying forecast accuracy. 
A forecast is considered a HIT if the deviation lies within the +/-10% range; 
otherwise it is a MISS. 
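The HIT/MISS rule above can be sketched in a few lines of Python. This is only an illustration, not the Excel workbook used by the demand planning cell; the function names and the `tolerance_pct` parameter are assumptions introduced here.

```python
def percent_deviation(actual_sales, forecast):
    """Return (Yt - Ft) / Ft expressed as a percentage."""
    return (actual_sales - forecast) / forecast * 100.0


def classify_forecast(actual_sales, forecast, tolerance_pct=10.0):
    """Return 'HIT' if sales lie within +/- tolerance_pct of the forecast, else 'MISS'."""
    deviation = percent_deviation(actual_sales, forecast)
    return "HIT" if abs(deviation) <= tolerance_pct else "MISS"


def hit_rate(pairs, tolerance_pct=10.0):
    """Share of (actual_sales, forecast) pairs that are classified as HIT."""
    hits = sum(classify_forecast(a, f, tolerance_pct) == "HIT" for a, f in pairs)
    return hits / len(pairs)


# Two months of the Alprax 0.5 data quoted earlier: (sales, demand)
june_05 = (562903, 950010)
dec_06 = (491120, 500130)
print(classify_forecast(*june_05), classify_forecast(*dec_06))
```

With the Alprax figures, June '05 deviates by roughly -40.75% and is a MISS, while Dec '06 deviates by roughly -1.80% and is a HIT.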
MS Excel is used for this purpose. The present system has produced forecasts of low 
accuracy, so there is a need to work on the demand forecasting. 
The present system is efficient for stable, fast-moving and mature products. 
It takes care of products that have shown high skewness due to promotions and 
schemes. 
It can estimate well the sales of products that are about to be launched. 
The demand planning cell also interacts with the marketing team about product 
behavior on line extension, and it does a good job of estimating the fluctuations 
caused by forthcoming events that are known in advance. 
The present system has its own unique features. 
The system is well defined and well designed. It is a foolproof system. 
Defining parameters 
Parameters are the factors that directly or indirectly affect the actual sales. 
These factors need to be identified: which types of factors affect the actual sales? 
Factors with both direct and indirect effects should be explored, which yields a list of 
candidate parameters. The list then needs to be narrowed down to the parameters that 
have a significant impact on sales. Statistical methods exist for checking the 
significance of the various parameters on the actual sales. 
The following factors can affect the actual sales: 
· Trend 
· Total availability of the SKU 
· Seasonal factors, i.e. for months 
· Promotions and schemes 
· Price sensitivity 
· Market share 
· Market growth of the SKU 
· Market growth of the molecule 
· Market growth of the brand 
· Market growth of the molecule class 
· Market growth expected by the organization 
· Additional duties and taxes levied by the government 
· Introduction of new drugs by competitors in the same segment 
· Line extension by the company 
· Introduction of new drugs by the company in the same segment 
· Regional factor 
· Drugs with the same molecule 
· Same products in different strengths 
· Institutional sales 
· Sales force 
· Field targets 
· Secondary sales 
· No. of stockists 
· Tertiary sales 
· No. of retailers 
· Miscellaneous factors (epidemic, billing channel, government factors, 
availability, orders, etc.) 
The factors stated above can have a significant impact on the actual sales. 
The parameters must be selected properly to obtain accurate forecasts. 
Matching the results 
After an appropriate model has been developed, it should be applied to the past data: 
forecasts should be generated for past periods and compared with the actual past 
data to verify the reliability and validity of the model. Various other statistical tools can 
be used for the same purpose. 
Comparing it with the present system 
After the models have been developed, they must be compared with the present system 
to see whether they give better results. The comparison should cover various aspects: 
Does the model give reliable and consistent results? Does it have an impact 
on inventory level? Does it have an impact on profitability? Can it make the whole 
system smoother? 
Is it Robust? 
The model should give accurate results in any situation; only then should it be 
implemented. It should capture fluctuations and react to adjustments made in 
anticipation of certain factors. The model has to be robust: flexible towards the 
changes made, and reacting to them accordingly. 
Implementation 
After all the criteria have been inspected, the model should be validated. If it gives 
reliable, consistent and precise results and has a significant impact on the topics of 
concern, it should be implemented and used for future forecasting. 
Statistical tools 
The tools which can be considered are: 
· Time series 
· Exponential smoothing 
· Multiple regression 
Many forecasting methods are based on the concept that when an underlying pattern 
exists in a data series, that pattern can be distinguished from randomness by smoothing 
(averaging) past values. The effect of smoothing is to eliminate randomness so the 
pattern can be broken down into sub patterns that identify each component of the time 
series separately. Such a breakdown can frequently aid in better understanding the 
behavior of the series, which facilitates improved accuracy in forecasting. 
Time series decomposition splits the data into these sub-patterns. It analyzes the data 
and separates the effects of the components. 
Data = pattern + error 
= f (trend-cycle, seasonality, error) 
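As a small illustration of smoothing by averaging, the sketch below computes a trailing moving average in Python. It is not part of the method used in this project; the function name `moving_average` and the toy numbers are assumptions made for the example.

```python
def moving_average(values, window):
    """Trailing moving average: the mean of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# A purely seasonal toy series: the same 4-period pattern repeated twice.
pattern = [8, 12, 10, 14]
series = pattern + pattern

# Averaging over one full cycle removes the seasonal swing entirely.
print(moving_average(series, 4))
```

When the window equals the length of the seasonal cycle, every average covers exactly one full cycle, so the seasonal component cancels out and only the level (and any trend) remains.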
But here at Torrent, the product basket contains 500+ products, each behaving 
differently, and several factors affect the overall picture. Time series decomposition 
alone is not enough, as it captures only the trend, seasonality and error. 
To analyze and determine the trend, seasonality and level followed by the data, 
Triple Exponential Smoothing is applied. On the basis of the assumptions and 
methodology of the model, the model is fitted to the past data, and the forecasts for 
the coming periods are obtained from it. 
Data Availability 
24 months of data are available, giving the monthly primary sales from June’05 to 
May’07. The data thus cover two complete cycles, which is the minimum required 
to apply triple exponential smoothing. Primary sales are the sales made through the 
CFA channels. They also include institutional sales, which are to be netted out later. 
Data for institutional sales are obtained from SAP as a dump for the same period. 
Exponential Smoothing 
This model is an extension of the moving average method and uses a weighted moving 
average. Weights are assigned to the observations so that they decrease exponentially 
as the observations get older. Recent values are therefore given relatively more weight 
in forecasting than older observations. 
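The exponentially decreasing weights can be made concrete with a minimal Python sketch of simple (single) exponential smoothing. The function names are illustrative, and initialising the smoothed series with the first observation is an assumption of this sketch, since the text does not specify it.

```python
def exponential_weights(alpha, n):
    """Weights on the n most recent observations, newest first: alpha * (1 - alpha)**k."""
    return [alpha * (1 - alpha) ** k for k in range(n)]


def simple_exponential_smoothing(values, alpha):
    """Smoothed series; the first smoothed value is set to the first observation."""
    smoothed = [values[0]]
    for y in values[1:]:
        # New estimate = alpha * latest observation + (1 - alpha) * previous estimate
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed


print(exponential_weights(0.5, 3))                        # newest observation weighs most
print(simple_exponential_smoothing([10, 20, 30], 0.5))
```

With a smoothing constant of 0.5, each observation receives half the weight of the one that follows it.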
Triple Exponential Smoothing (Holt Winters multiplicative model) 
Holt’s method of exponential smoothing was extended by Winters (1960) to capture 
seasonality. 
It considers (1) Deseasonalized level 
(2) Trend (growth) level 
(3) Seasonality 
The notation used is: 
Original data, i.e. monthly sales: Yt 
Deseasonalized level: Rt 
Trend factor (growth factor): Gt 
Seasonal factor: St 
Forecast: Ft 
As monthly data are available for 24 months, from June’05 to May’07, we have two 
complete cycles. These data appear in the 3rd column of the table given on the next 
page. 
To get the level and trend, linear regression is applied. In the linear regression 
equation, 
Y = a + bX; Y = actual sales 
a = intercept (Rt) 
b = growth (Gt) 
After the deseasonalized level and growth factor have been obtained, the seasonal 
factor is calculated: 
Seasonal factor = (Actual sales of the corresponding month) / (Sales forecasted for 
the same month by linear regression) 
A seasonal factor greater than 1 indicates that sales are higher by that proportion due 
to the season; a factor less than 1 indicates that sales are lower by that proportion due 
to the season. 
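The regression step and the seasonal factor can be sketched as follows. This is a minimal illustration assuming the months are numbered X = 1, 2, ..., n; the report does not state the numbering used in its spreadsheet, and the function names are introduced here only for the example.

```python
def fit_line(y):
    """Least-squares fit of y = a + b*x with x = 1, 2, ..., len(y). Returns (a, b)."""
    n = len(y)
    x = range(1, n + 1)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx                 # growth per month (Gt)
    a = mean_y - b * mean_x       # intercept (Rt)
    return a, b


def seasonal_factors(y):
    """Seasonal factor of each month: actual sales / regression estimate."""
    a, b = fit_line(y)
    return [yi / (a + b * x) for x, yi in enumerate(y, start=1)]


a, b = fit_line([3, 5, 7, 9])
print(a, b)   # a perfectly linear series has all seasonal factors equal to 1
```

As a check against the tables that follow, sales of 5580 against a regression estimate of 7763.043 give a seasonal factor of about 0.719.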
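The equations for the Holt-Winters method are as follows: 
Level: $R_t = \alpha \frac{Y_t}{S_{t-s}} + (1-\alpha)(R_{t-1} + G_{t-1})$ 
Trend: $G_t = \beta (R_t - R_{t-1}) + (1-\beta) G_{t-1}$ 
Seasonal: $S_t = \gamma \frac{Y_t}{R_t} + (1-\gamma) S_{t-s}$ 
Forecast: $F_{t+x} = (R_t + x G_t) S_{t-s+x}$ 
Here s is the length of the seasonal cycle (12 for monthly data), x is the number of 
periods ahead being forecast, and α, β, γ are the smoothing constants, 0 < α, β, γ < 1. 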
These values are chosen by the forecaster to suit the data, so the choice of the 
smoothing constants can introduce a bias. It has been observed that 
α = β = γ = 0.5 gives favorable results. 
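The multiplicative Holt-Winters recursion can be sketched in Python as below. This is an illustrative implementation, not the spreadsheet used in the project: the function name, the argument names, and the requirement that the caller supplies initial values for the level, trend and one cycle of seasonal factors are all assumptions of the sketch. The forecast for x periods ahead uses the most recent seasonal factor of the matching month.

```python
def holt_winters_multiplicative(y, season_length, level0, trend0, seasonals0,
                                alpha=0.5, beta=0.5, gamma=0.5, horizon=1):
    """Run the level/trend/seasonal updates over y and forecast `horizon` periods ahead.

    seasonals0 holds one initial seasonal factor per position in the cycle.
    """
    level, trend = level0, trend0
    seasonals = list(seasonals0)
    for t, obs in enumerate(y):
        prev_level = level
        season = seasonals[t % season_length]      # factor from one cycle earlier
        level = alpha * obs / season + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonals[t % season_length] = gamma * obs / level + (1 - gamma) * season
    n = len(y)
    return [
        (level + x * trend) * seasonals[(n + x - 1) % season_length]
        for x in range(1, horizon + 1)
    ]


# Toy data: level 100, growth 2 per period, quarterly seasonal pattern.
pattern = [0.8, 1.2, 1.0, 1.0]
history = [(100 + 2 * (t + 1)) * pattern[t % 4] for t in range(8)]
print(holt_winters_multiplicative(history, 4, 100, 2, pattern, horizon=2))
```

Because the toy series follows the model exactly, the level, trend and seasonal factors stay unchanged through the updates, and the two forecasts are close to 118 × 0.8 and 120 × 1.2.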
To remove this initialization bias, the method is modified so that it gives the same 
results as the above calculation. 
The modified method is as follows: 
Rather than smoothing the trend, level and seasonal factors with the above equations, 
the trend and level factors are fixed at the values obtained by the linear regression. 
They are held constant for every month, i.e. for the past months as well as the coming 
months. 
For the seasonal indices of the future months, the average of the seasonal factors of 
the corresponding months of the past cycles is taken. 
This makes the calculations easy, with all the smoothing constants valued at 0.5. 
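The modified method can be sketched in Python as follows. The function name and arguments are illustrative assumptions. The level and trend are passed in as fixed values rather than re-estimated, and the sample figures are the sales, level (7530.833) and trend (232.21) of the second table that follows, with months numbered from 1.

```python
def modified_forecast(sales, level, trend, season_length=12, horizon=12):
    """Forecast with fixed regression level/trend and averaged seasonal indices."""
    n = len(sales)
    # Regression estimate and seasonal factor for every past month (period 1..n)
    estimates = [level + trend * t for t in range(1, n + 1)]
    factors = [y / e for y, e in zip(sales, estimates)]
    # Seasonal index = average factor of the same calendar month over past cycles
    indices = []
    for m in range(season_length):
        same_month = factors[m::season_length]
        indices.append(sum(same_month) / len(same_month))
    return [
        (level + trend * (n + h)) * indices[(n + h - 1) % season_length]
        for h in range(1, horizon + 1)
    ]


sales = [5580, 8250, 8250, 9165, 8280, 9310, 9835, 9820, 7760, 6212, 15723, 12660,
         11641, 12445, 11024, 9374, 9365, 11971, 10704, 12848, 10647, 9829, 16700, 13010]
forecasts = modified_forecast(sales, level=7530.833, trend=232.21)
print(round(forecasts[0], 1))   # June'07
```

Under these assumptions the sketch gives about 12150.8 for June'07 and about 22363.4 for Apr'08, in line with the forecasted demand column of that table.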
The forecasts for the two drugs Nikoran 5 Mg tab and Torleva 500 are given below. 
The last column gives the % variation between the forecast and the actual sales. 
For the past months, the variation is small, i.e. about +/-10%. 
Nikoran 5 Mg Tab 
Torleva 500 
Rt (level) = 108762.3; Gt (trend) = 420.073 
Past months: 
Month       Sales (Yt)   Yt^ (deseasonalized)   Seasonal factor   Forecasted demand (Ft)   % variation 
June '05    117586       109633                 1.072542          118193.3                 -0.51651 
July '05    106675       109991.6               0.969846          113104.7                 -6.02741 
Aug '05     110095       110350.3               0.997687          113275.7                 -2.88902 
Sept '05    108912       109205.4               0.997314          108723.6                 0.173017 
Oct '05     97303        111067.5               0.876071          101156                   -3.95981 
Nov '05     111810       112921.2               0.99016           110736.4                 0.96022 
Dec '05     100237       111784.7               0.896697          106306.8                 -6.05549 
Jan '06     116430       112143.3               1.038225          119867.4                 -2.95235 
Feb '06     106153       112501.9               0.943566          107077.4                 -0.87078 
March '06   87058        119773.7               0.726854          79559.33                 8.613413 
April '06   147642       113219.2               1.304037          139645.1                 5.416415 
May '06     126534       113577.8               1.114074          123260.7                 2.586905 
June '06    124478       113936.4               1.092522          123650.3                 0.664976 
July '06    125046       114295                 1.094063          118306.7                 5.389457 
Aug '06     120342       113375.1               1.06145           118465.6                 1.559224 
Sept '06    111741       115012.2               0.971557          113686                   -1.74062 
Oct '06     109466       115370.9               0.948818          105755.5                 3.389605 
Nov '06     115732       115729.5               1.000022          106138.8                 8.289126 
Dec '06     112807       112057.2               1.006691          111104.2                 1.509473 
Jan '07     128082       116446.7               1.09992           125256.5                 2.206027 
Feb '07     112052       116805.3               0.959306          111873.4                 0.159366 
Mar '07     79875        117163.9               0.681737          83109.6                  -4.04958 
Apr '07     136233       117522.5               1.159207          145853.6                 -7.06184 
May '07     124027       117881.2               1.052136          128720.5                 -3.78424 
Coming months: 
Month       Yt^ (deseasonalized)   Seasonal index   Forecasted demand (Ft) 
June '07    118239.8               1.082532         129107.2 
July '07    118598.4               1.031955         123508.7 
Aug '07     118957                 1.029568         123655.6 
Sept '07    119315.6               0.984436         118648.4 
Oct '07     119674.2               0.912445         110355.1 
Nov '07     120032.8               0.995091         120768.7 
Dec '07     120391.5               0.951694         115901.6 
Jan '08     120750.1               1.069072         130645.6 
Feb '08     121108.7               0.951436         116669.5 
Mar '08     121467.3               0.704296         86659.9 
Apr '08     121825.9               1.231622         152062.1 
May '08     122184.5               1.083105         134180.3 
Rt (level) = 7530.833; Gt (trend) = 232.21 
Past months: 
Month       Sales (Yt)   Yt^ (deseasonalized)   Seasonal factor   Forecasted demand (Ft)   % variation 
June '05    5580         7763.043               0.71879           7073.096                 -26.758 
July '05    8250         7995.253               1.031862          8739.312                 -5.93105 
Aug '05     8250         8227.463               1.002739          8242.473                 0.091238 
Sept '05    9165         8459.673               1.083375          8108.181                 11.53103 
Oct '05     8280         8691.883               0.952613          7685.767                 7.176727 
Nov '05     9310         8924.093               1.043243          9216.261                 1.006867 
Dec '05     9835         9156.303               1.074123          9020.762                 8.278982 
Jan '06     9820         9388.513               1.045959          9863.728                 -0.4453 
Feb '06     7760         9620.723               0.806592          8007.905                 -3.19465 
March '06   6212         9852.933               0.630472          6937.039                 -11.6716 
April '06   15723        10085.14               1.559026          14403.85                 8.389929 
May '06     12660        10317.35               1.227059          11451.72                 9.544068 
June '06    11641        10549.56               1.103458          9611.962                 17.4301 
July '06    12445        10781.77               1.154263          11785.15                 5.30211 
Aug '06     11024        11013.98               1.000909          11034.08                 -0.0914 
Sept '06    9374         11246.19               0.833527          10778.92                 -14.9874 
Oct '06     9365         11478.4                0.81588           10149.74                 -8.37947 
Nov '06     11971        11710.61               1.022235          12094.01                 -1.02756 
Dec '06     10704        11942.82               0.896271          11766.03                 -9.92184 
Jan '07     12848        12175.03               1.055274          12791.29                 0.441369 
Feb '07     10647        12407.24               0.858128          10327.29                 3.002793 
Mar '07     9829         12639.45               0.777644          8898.912                 9.462696 
Apr '07     16700        12871.66               1.297424          18383.63                 -10.0816 
May '07     13010        13103.87               0.992836          14544.61                 -11.7956 
Coming months: 
Month       Seasonal index   Forecasted demand (Ft) 
June '07    0.911124         12150.83 
July '07    1.093063         14830.99 
Aug '07     1.001824         13825.68 
Sept '07    0.958451         13449.67 
Oct '07     0.884246         12613.71 
Nov '07     1.032739         14971.76 
Dec '07     0.985197         14511.3 
Jan '08     1.050617         15718.86 
Feb '08     0.83236          12646.68 
Mar '08     0.704058         10860.78 
Apr '08     1.428225         22363.41 
May '08     1.109948         17637.5 
Domstal Tab 
Rt (level) = 908464.975; Gt (trend) = -19904.11 
Past months: 
Month       Sales (Y)   Y^ (deseasonalized)   Seasonal factor   Forecasted demand   % variation 
June '05    1601814     888560.8633           1.802             2159685             -35% 
July '05    1127094     868656.752            1.297             1150926             -2% 
Aug '05     754860      848752.6407           0.889             1247370             -65% 
Sept '05    780661      828848.5294           0.941             486365              38% 
Oct '05     366338      808944.4181           0.452             314832              14% 
Nov '05     519499      789040.3068           0.658             980747              -89% 
Dec '05     1153666     769136.1955           1.499             1007405             13% 
Jan '06     93867       749232.0842           0.125             232877              -148% 
Feb '06     161767      729327.9729           0.221             241359              -49% 
March '06   207495      709423.8616           0.292             273507              -32% 
April '06   551717      689519.7503           0.800             680936              -23% 
May '06     521824      669615.639            0.779             849166              -63% 
June '06    1987064     649711.5277           3.058             1579151             21% 
July '06    851742      629807.4164           1.352             834463              2% 
Aug '06     1250256     609903.3051           2.049             896345              28% 
Sept '06    136720      589999.1938           0.231             346209              -153% 
Oct '06     185576      570095.0825           0.325             221874              -20% 
Nov '06     1005490     550190.9712           1.827             683866              32% 
Dec '06     593723      530286.8599           1.119             694563              -17% 
Jan '07     253332      510382.7486           0.496             158637              37% 
Feb '07     215842      490478.6372           0.440             162316              25% 
Mar '07     225209      470574.5259           0.478             181422              19% 
Apr '07     529518      450670.4146           1.174             445060              16% 
May '07     756852      430766.3033           1.756             546272              28% 
Coming months: 
Month       Seasonal index   Forecasted demand 
June '07    2.43054243       998618 
July '07    1.32494925       518000 
Aug '07     1.46965034       545320 
Sept '07    0.58679561       206053 
Oct '07     0.38918846       128917 
Nov '07     1.24296128       386986 
Dec '07     1.30978815       381721 
Jan '08     0.31082059       84398 
Feb '08     0.33093342       83273 
Mar '08     0.38553344       89338 
Apr '08     0.98755149       209184 
May '08     1.26813932       243377 
A graph showing correlation among forecast, sales, orders, stock available for the 
Domstal tab. 
Forecast accuracy June'05-May'07 (sales & orders) 
(Chart data for Domstal tab; quantity in units) 
Month       Sales     Forecasted demand (sales)   Orders    Forecasted demand (order) 
June '05    1601814   2159685                     2025287   2477246 
July '05    1127094   1150926                     1328359   1292622 
Aug '05     754860    1247370                     782084    1315619 
Sept '05    780661    486365                      781741    495389 
Oct '05     366338    314832                      371538    330171 
Nov '05     519499    980747                      526339    1086651 
Dec '05     1153666   1007405                     1163220   1051971 
Jan '06     93867     232877                      93867     253379 
Feb '06     161767    241359                      161863    269189 
March '06   207495    273507                      207591    292252 
April '06   551717    680936                      556421    751399 
May '06     521824    849166                      525592    951116 
June '06    1987064   1579151                     2037925   1723485 
July '06    851742    834463                      864498    889079 
Aug '06     1250256   896345                      1256448   893926 
Sept '06    136720    346209                      140196    332245 
Oct '06     185576    221874                      191010    218369 
Nov '06     1005490   683866                      1073078   708007 
Dec '06     593723    694563                      603124    674449 
Jan '07     253332    158637                      260148    159645 
Feb '07     215842    162316                      232799    166439 
Mar '07     225209    181422                      228318    177034 
Apr '07     529518    445060                      560602    445103 
May '07     756852    546272                      795742    549776 
A graph showing the variation of forecast to sale and orders 
Tracking forecasting accuracy 
(Chart data; % variation by month, with linear trend lines for the two "Vs Demand" series) 
Month       Sales vs demand   Orders vs demand   Sales vs past forecast   Orders vs past forecast 
June '05    -35%              -22%               52%                      62% 
July '05    -2%               3%                 -46%                     -24% 
Aug '05     -65%              -68%               -59%                     -53% 
Sept '05    38%               37%                -28%                     -28% 
Oct '05     14%               11%                -118%                    -115% 
Nov '05     -89%              -106%              -54%                     -52% 
Dec '05     13%               10%                -4%                      -3% 
Jan '06     -148%             -170%              -220%                    -220% 
Feb '06     -49%              -66%               -24%                     -24% 
March '06   -32%              -41%               13%                      13% 
April '06   -23%              -35%               0%                       1% 
May '06     -63%              -81%               -5%                      -5% 
June '06    21%               15%                30%                      31% 
July '06    2%                -3%                -41%                     -39% 
Aug '06     28%               29%                -12%                     -11% 
Sept '06    -153%             -137%              -47%                     -43% 
Oct '06     -20%              -14%               -8%                      -5% 
Nov '06     32%               34%                10%                      16% 
Dec '06     -17%              -12%               -136%                    -132% 
Jan '07     37%               39%                41%                      42% 
Feb '07     25%               29%                31%                      36% 
Mar '07     19%               22%                33%                      34% 
Apr '07     16%               21%                6%                       11% 
May '07     28%               31%                34%                      37% 
For the above SKU, the past forecasts have shown large fluctuations. 
Moreover, this model rests on some basic assumptions and hence has limitations: 
It needs data for two complete cycles, but TORRENT has many products that were 
launched more recently. The method therefore fails for products with less data. 
The method concentrates on only 3 parameters, which is very few, as there are many 
other probable factors that affect the actual sales. So the method will not be able to 
give accurate results. 
The method may also contain certain biases, as the constants are initialized by the 
forecaster. 
So it is not advisable to carry on with the triple exponential smoothing method for 
forecasting. A more robust, flexible and inclusive model needs to be chosen and fitted 
to the data. 
Need of another method 
Another method must be applied, which can include every parameter affecting the 
actual sales. 
· A method which can be adjusted to any change in the parameters. 
· One which gives statistically significant results. 
· One which gives elaborate explanations of the steps taken. 
· One which gives less error. 
· One which increases the forecast accuracy and effectiveness to a significant 
level. 
· One which is inclusive: when a new parameter is identified later, the method 
should be able to take it into account. 
Multiple Regressions with ‘n’ factors 
General Purpose 
The general purpose of multiple regression (the term was first used by Pearson, 1908) 
is to learn more about the relationship between several independent or predictor 
variables and a dependent or criterion variable. 
Overview 
Multiple regression, a time-honored technique going back to Pearson's 1908 use of it, is 
employed to account for (predict) the variance in an interval dependent, based on linear 
combinations of interval, dichotomous, or dummy independent variables. Multiple 
regression can establish that a set of independent variables explains a proportion of the 
variance in a dependent variable at a significant level (through a significance test of R2), 
and can establish the relative predictive importance of the independent variables (by 
comparing beta weights). Power terms can be added as independent variables to 
explore curvilinear effects. Cross-product terms can be added as independent variables 
to explore interaction effects. One can test the significance of difference of two R2's to 
determine if adding an independent variable to the model helps significantly. Using 
hierarchical regression, one can see how much variance in the dependent can be 
explained by one or a set of new independent variables, over and above that explained 
by an earlier set. Of course, the estimates (b coefficients and constant) can be used to 
construct a prediction equation and generate predicted scores on a variable for further 
analysis. 
The multiple regression equation takes the form y = b1x1 + b2x2 + ... + bnxn + c. The b's 
are the regression coefficients, representing the amount the dependent variable y 
changes when the corresponding independent changes 1 unit. The c is the constant, 
where the regression line intercepts the y axis, representing the amount the dependent 
y will be when all the independent variables are 0. The standardized version of the b 
coefficients is the beta weights, and the ratio of the beta coefficients is the ratio of the 
relative predictive power of the independent variables. Associated with multiple 
regression is R2, multiple correlation, which is the percent of variance in the dependent 
variable, explained collectively by all of the independent variables. 
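To show how the b coefficients and the constant of such an equation can be estimated, here is a minimal pure-Python sketch that solves the least-squares normal equations. It is an illustration only and is not how MS Excel or SPSS compute the estimates; the function name and the toy data are assumptions made for the example.

```python
def fit_multiple_regression(rows, y):
    """Least-squares estimates for y = c + b1*x1 + ... + bn*xn.

    rows: one list [x1, ..., xn] per case; y: observed dependent values.
    Returns (c, [b1, ..., bn]).
    """
    design = [[1.0] + list(r) for r in rows]       # leading 1 carries the constant
    k = len(design[0])
    # Normal equations (X'X) beta = X'y, stored as an augmented matrix
    aug = [
        [sum(d[i] * d[j] for d in design) for j in range(k)]
        + [sum(d[i] * yi for d, yi in zip(design, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("independent variables are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    return beta[0], beta[1:]


# Toy data generated from y = 2 + 3*x1 - 1*x2 (no error term)
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [3, 7, 7, 11, 11]
print(fit_multiple_regression(rows, y))
```

Because the toy data contain no error term, the fit recovers a constant close to 2 and coefficients close to 3 and -1.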
Multiple regression shares all the assumptions of correlation: linearity of relationships, 
the same level of relationship throughout the range of the independent variable 
("homoscedasticity"), interval or near-interval data, absence of outliers, and data whose 
range is not truncated. In addition, it is important that the model being tested is correctly 
specified. The exclusion of important causal variables or the inclusion of extraneous 
variables can change markedly the beta weights and hence the interpretation of the 
importance of the independent variables. 
Key Terms and Concepts 
The regression equation takes the form 
Y = b0 + b1*x1 + b2*x2 + e 
where Y is the true dependent, 
the b's are the regression coefficients for the corresponding x (independent) terms, 
b0 is the constant or intercept (written c above), 
e is the error term reflected in the residuals. 
Sometimes this is expressed more simply as 
y = b0 + b1*x1 + b2*x2 
where y is the estimated dependent and 
b0 is the constant (which includes the error term). 
Equations such as that above, with no interaction effects (see below), are called main 
effects models. In MS Excel, select Tools, Data Analysis, Regression. In SPSS, select 
Analyze, Regression, Linear; select your dependent and independent variables; click 
Statistics; select Estimates, Confidence Intervals, Model Fit; continue; OK. 
Predicted values, also called fitted values, are the values of each case based on using 
the regression equation for all cases in the analysis. In SPSS, dialog boxes use the 
term PRED to refer to predicted values and ZPRED to refer to standardized predicted 
values. Click the Save button in SPSS to add and save these as new variables in your 
dataset. 
Adjusted predicted values are the values of each case based on using the regression 
equation for all cases in the analysis except the given case. 
Residuals are the difference between the observed values and those predicted by the 
regression equation. 
Interaction effects are sometimes called moderator effects because the interacting 
third variable which changes the relation between two original variables is a moderator 
variable which moderates the original relationship. For instance, the relation between 
income and conservatism may be moderated depending on the level of education. 
The regression coefficient, b, is the average amount the dependent increases when 
the independent increases one unit and other independents are held constant. Put 
another way, the b coefficient is the slope of the regression line: the larger the b, the 
steeper the slope, the more the dependent changes for each unit change in the 
independent. The b coefficient is the unstandardized simple regression coefficient for 
the case of one independent. When there are two or more independents, the b 
coefficient is a partial regression coefficient, though it is common simply to call it a 
"regression coefficient" also. In SPSS, Analyze, Regression, Linear; click the Statistics 
button; make sure Estimates is checked to get the b coefficients (the default). 
b coefficients compared to partial correlation coefficients. The b coefficient is a 
semi-partial coefficient, in contrast to partial coefficients as found in partial correlation. 
The partial coefficient for a given independent variable removes the variance explained 
by control variables from both the independent and the dependent, then assesses the 
remaining correlation. In contrast, a semi-partial coefficient removes the variance only 
from the independent. That is, where semi-partial coefficients look at total variance of the 
dependent variable, partial coefficients look at the variance in the dependent after 
variance accounted for by control variables is removed. Thus the b coefficients, as 
semi-partial coefficients, reflect the unique (independent) contributions of each 
independent variable to explaining the total variance in the dependent variable. 
Dynamic inference is drawing the interpretation that the dependent changes b units 
because the independent changes one unit. That is, one assumes that there is a 
change process (a dynamic) which directly relates unit changes in x to b changes in y. 
This assumption implies two further assumptions which may or may not be true: (1) b is 
stable for all sub samples or the population (cross-unit invariance) and thus is not an 
artificial average which is often unrepresentative of particular groups; and (2) b is stable 
across time when later re-samples of the population are taken (cross-time invariance). 
t-tests are used to assess the significance of individual b coefficients, specifically 
testing the null hypothesis that the regression coefficient is zero. A common rule of 
thumb is to drop from the equation all variables not significant at the .05 level or better. 
Note that restricted variance of the independent variable in the particular sample at 
hand can be a cause of a finding of no significance. Like all significance tests, the t-test 
assumes randomly sampled data. In SPSS, Analyze, Regression, Linear; click the 
Statistics button; make sure Estimates is checked to get t and the significance of b. 
Level-importance is the b coefficient times the mean for the corresponding 
independent variable. The sum of the level importance contributions for all the 
independents, plus the constant, equals the mean of the dependent variable. Achen 
(1982: 72) notes that the b coefficient may be conceived as the "potential influence" of 
the independent on the dependent, while level importance may be conceived as the 
"actual influence." This contrast is based on the idea that the higher the b, the more y 
will change for each unit increase in the independent, but the lower the mean for the given 
independent, the fewer actual unit changes will be expected. By taking both the 
magnitude of b and the magnitude of the mean value into account, level importance is a 
better indicator of expected actual influence of the independent on the dependent. Level 
importance is not computed by SPSS. 
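Since level importance is simple arithmetic on the b coefficients and the means of the independents, it can be computed by hand or with a short script. The sketch below is illustrative; the function name and the toy data (generated from y = 2 + 3*x1 - 1*x2) are assumptions.

```python
def level_importance(coefficients, rows):
    """Level importance of each independent: b coefficient times the mean of that variable."""
    n = len(rows)
    means = [sum(r[j] for r in rows) / n for j in range(len(coefficients))]
    return [b * m for b, m in zip(coefficients, means)]


rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]    # cases: [x1, x2]
y = [3, 7, 7, 11, 11]                              # y = 2 + 3*x1 - 1*x2
constant, b = 2.0, [3.0, -1.0]

contributions = level_importance(b, rows)
# The contributions plus the constant reproduce the mean of the dependent
print(contributions, constant + sum(contributions), sum(y) / len(y))
```

In this toy case the contributions are about 9 and -3.2, and adding the constant 2 gives 7.8, the mean of y.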
The beta weights are the regression (b) coefficients for standardized data. Beta is the 
average amount the dependent increases when the independent increases one 
standard deviation and other independent variables are held constant. If an independent 
variable has a beta weight of .5, this means that when other independents are held 
constant, the dependent variable will increase by half a standard deviation (.5) for 
each increase of one standard deviation in that independent. The 
ratio of the beta weights is the ratio of the estimated unique predictive importance of the 
independents. Note that the betas will change if variables or interaction terms are added 
or deleted from the equation. Reordering the variables without adding or deleting will not 
affect the beta weights. That is, the beta weights help assess the unique importance of 
the independent variables relative to the given model embodied in the regression 
equation. Note that adding or subtracting variables from the model can cause the b and 
beta weights to change markedly, possibly leading the researcher to conclude that an 
independent variable initially perceived as unimportant is actually an important 
variable. In SPSS, Analyze, Regression, Linear; click the Statistics button; make sure 
Estimates is checked to get the beta coefficients (the default). 
Note that the betas reflect the unique contribution of each independent variable. Joint 
contributions contribute to R-square but are not attributed to any particular independent 
variable. The result is that the betas may underestimate the importance of a variable 
which makes strong joint contributions to explaining the dependent variable but which 
does not make a strong unique contribution. Thus when reporting relative betas, one 
must also report the correlation of the independent variable with the dependent variable 
as well, to acknowledge if it has a strong correlation with the dependent variable. 
Standardized means that for each datum the mean is subtracted and the result divided 
by the standard deviation. The result is that all variables have a mean of 0 and a 
standard deviation of 1. This enables comparison of variables of differing magnitudes 
and dispersions. Only standardized b-coefficients (beta weights) can be compared to 
judge relative predictive power of independent variables. 
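The link between b coefficients and beta weights can be sketched as follows. The conversion beta = b × (standard deviation of x) / (standard deviation of y) is the standard relationship and is equivalent to running the regression on standardized data; the function names are illustrative assumptions.

```python
from statistics import mean, stdev


def standardize(values):
    """Subtract the mean from each datum and divide by the standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def beta_weight(b, x_values, y_values):
    """Convert an unstandardized b coefficient into a beta weight."""
    return b * stdev(x_values) / stdev(y_values)


x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]          # y = 1 + 2*x, so b = 2
z = standardize(x)
print(round(mean(z), 10), round(stdev(z), 10))   # mean 0, standard deviation 1
print(beta_weight(2.0, x, y))
```

With a single independent and a perfect linear fit, the beta weight comes out at 1 even though b is 2, which shows why only the standardized coefficients are comparable across variables measured on different scales.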
Note some authors use "b" to refer to sample regression coefficients, and "beta" to refer 
to regression coefficients for population data. They then refer to "standardized beta" for 
what is simply called the "beta weight" here. 
Correlation: 
Pearson's r2 is the percent of variance in the dependent explained by the given 
independent when (unlike the beta weights) all other independents are allowed to vary. 
The result is that the magnitude of r2 reflects not only the unique covariance it shares 
with the dependent, but uncontrolled effects on the dependent attributable to covariance 
the given independent shares with other independents in the model. A rule of thumb is 
that multicollinearity may be a problem if a correlation is > .90 or several are >.7 in the 
correlation matrix formed by all the independents. 
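The first part of this rule of thumb can be applied with a short script. The sketch below computes Pearson's r for every pair of independents and flags pairs above .90. It does not implement the "several above .7" part of the rule, and the function names, variable names and toy data are assumptions made for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def collinearity_flags(columns, threshold=0.90):
    """Return (name1, name2, r) for every pair of independents with |r| > threshold."""
    names = list(columns)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson_r(columns[names[i]], columns[names[j]])
            if abs(r) > threshold:
                flagged.append((names[i], names[j], r))
    return flagged


# Hypothetical independents: 'orders' moves almost in step with 'trend'
columns = {
    "trend": [1, 2, 3, 4, 5],
    "orders": [2, 4, 6, 8, 10.5],
    "scheme": [5, 1, 4, 2, 3],
}
print(collinearity_flags(columns))
```

Only the trend/orders pair is flagged here, so one of the two would be a candidate for removal from the model.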
The intercept, 
Variously expressed as e, c, or x-sub-0, is the estimated Y value when all the 
independents have a value of 0. Sometimes this has real meaning and sometimes it 
doesn’t — that is, sometimes the regression line cannot be extended beyond the range 
of observations, either back toward the Y axis or forward toward infinity. In SPSS, 
Analyze, Regression, Linear; click the Statistics button; make sure Estimates is checked 
to get the intercept, labeled the "constant" (the default). 
MS EXCEL allows the researcher to check a box to not have an intercept. This is 
equivalent to forcing the regression line to run through the origin. In rare cases the 
researcher may know the relation is linear and that the dependent variable is zero when 
all the independents are zero, in which case the option may be selected. 
R2, also called multiple correlation or the coefficient of multiple determination, is the 
percent of the variance in the dependent explained uniquely or jointly by the 
independents. R-squared can also be interpreted as the proportionate reduction in error 
in estimating the dependent when knowing the independents. That is, R2 is based on the 
ratio of the error made when using the regression model to guess the value of the 
dependent to the total error made when using only the dependent's mean as 
the basis for estimating all cases. Mathematically, R2 = (1 - (SSE/SST)), where SSE = 
error sum of squares = SUM ((Yi - EstYi) squared), where Yi is the actual value of Y for 
the ith case and EstYi is the regression prediction for the ith case; and where SST = total 
sum of squares = SUM ((Yi - MeanY) squared). The "residual sum of squares" in SPSS 
output is SSE and reflects regression error. Thus R-square is 1 minus regression error 
as a percent of total error and will be 0 when regression error is as large as it would be 
if you simply guessed the mean for all cases of Y. Put another way, the regression sum 
of squares/total sum of squares = R-square, where the regression sum of squares = 
total sum of squares - residual sum of squares. In SPSS, Analyze, Regression, Linear; 
click the Statistics button; make sure Model fit is checked to get R2. 
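The two ways of writing R2 given above can be checked numerically. This is a sketch on hypothetical data with a single independent; the names sse, sst and ssr follow the definitions in the text:

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.3, 4.1, 6.2, 7.8, 10.1, 11.9]  # hypothetical observations

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least-squares slope and intercept for one independent.
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x
est_y = [slope * xi + intercept for xi in x]

sse = sum((yi - ei) ** 2 for yi, ei in zip(y, est_y))  # error (residual) sum of squares
sst = sum((yi - mean_y) ** 2 for yi in y)              # total sum of squares
ssr = sst - sse                                        # regression sum of squares

r2 = 1 - sse / sst
r2_alt = ssr / sst

print(r2, r2_alt)
```

Both expressions give the same value because, for a least-squares fit with an intercept, the total sum of squares splits exactly into the regression and residual parts.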
Maximizing R2 by adding variables is inappropriate unless variables are added to the 
equation for sound theoretical reason. At an extreme, when n-1 variables are added to a 
regression equation, R2 will be 1, but this result is meaningless. Adjusted R2 is used as 
a conservative reduction to R2 to penalize for adding variables and is required when the 
number of independent variables is high relative to the number of cases or when 
comparing models with different numbers of independents.
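The text does not state the adjustment itself. A sketch using the standard formula, adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1), where n is the number of cases and k the number of independents, is shown below. The function name and the example figures are illustrative only:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 for n cases and k independents (constant not counted)."""
    if n - k - 1 <= 0:
        raise ValueError("need more cases than estimated coefficients")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Same raw R2, same number of cases, different numbers of independents.
few = adjusted_r2(0.90, n=24, k=2)
many = adjusted_r2(0.90, n=24, k=7)

print(few, many)
```

The penalty grows with the number of independents, so the model with seven variables ends up with a lower adjusted R2 than the model with two, even though the raw R2 is identical.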
Standard Error of Estimate (SEE), confidence intervals, and prediction intervals. 
Confidence intervals around the mean are discussed in the section on significance. In 
regression, however, the confidence refers to more than one thing. Note the confidence 
and prediction intervals will improve (narrow) if sample size is increased, or the 
confidence level is decreased (ex., from 95% to 90%). 
For large samples, SEE approximates the standard error of a predicted value. SEE is 
the standard deviation of the residuals. In a good model, SEE will be markedly less than 
the standard deviation of the dependent variable. In a good model, the mean of the 
dependent variable will be greater than 1.96 times SEE. 
The confidence interval of the regression coefficient. Based on t-tests, the 
confidence interval is the plus/minus range around the observed sample regression 
coefficient, within which we can be, say, 95% confident the real regression coefficient 
for the population regression lies. Confidence limits are relevant only to random sample 
datasets. If the confidence interval includes 0, then there is no significant linear 
relationship between x and y. We then do not reject the null hypothesis that x is 
independent of y. In SPSS, Analyze, Regression, Linear; click Statistics; check 
Confidence Limits to get t and confidence limits on b. 
The confidence interval of y (the dependent variable) is also called the standard error 
of mean prediction. Some 95 times out of a hundred, the true mean of y will be within 
the confidence limits around the observed mean of n sampled cases. That is, the 
confidence interval is the upper and lower bounds for the mean predicted response. 
Note the confidence interval of y deals with the mean, not an individual case of y. 
Moreover, the confidence interval is narrower than the prediction interval, which deals 
with individual cases. Note a number of textbooks do not distinguish between 
confidence and prediction intervals and confound this difference. In SPSS, select 
Analyze, Regression, Linear; click Save; under "Prediction intervals" check "Mean" and 
under "Confidence interval" set the confidence level you want (ex., 95%). Note SPSS 
calls this a prediction interval for the mean. 
The prediction interval of y. For the 95% confidence limits, the prediction interval on a
fitted value is the estimated value plus or minus 1.96 times SQRT (SEE squared + Sy
squared), where Sy is the standard error of the mean prediction. Prediction intervals are
upper and lower bounds for the prediction of the dependent variable for a single case. 
Thus some 95 times out of a hundred; a case with the given values on the independent 
variables would lie within the computed prediction limits. The prediction interval will be 
wider (less certain) than the confidence interval, since it deals with an interval estimate 
of cases, not means. In SPSS, select Analyze, Regression, Linear; click Save; under 
"Prediction intervals" check "Individual" and under "Confidence interval" set the 
confidence level you want (ex., 95%). 
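The difference between the two intervals can be sketched for a regression with one independent. The data are hypothetical, and the multiplier 1.96 is the large-sample value used in the text (for small samples a t value would normally replace it). The standard error of the mean prediction at x0 is taken as SEE * SQRT(1/n + (x0 - mean x) squared / Sxx), and the prediction standard error combines it with SEE:

```python
from math import sqrt

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]  # hypothetical observations

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
intercept = mean_y - slope * mean_x

residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
see = sqrt(sum(e * e for e in residuals) / (n - 2))  # standard error of estimate


def intervals(x0, z=1.96):
    """Return (confidence interval, prediction interval) at x0."""
    fit = slope * x0 + intercept
    se_mean = see * sqrt(1 / n + (x0 - mean_x) ** 2 / sxx)  # mean response
    se_pred = sqrt(see ** 2 + se_mean ** 2)                 # single new case
    return ((fit - z * se_mean, fit + z * se_mean),
            (fit - z * se_pred, fit + z * se_pred))


conf_int, pred_int = intervals(5.0)
print(conf_int, pred_int)
```

The prediction interval is always the wider of the two, and both widen as x0 moves away from the mean of the observed x values.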
F test: The F test is used to test the significance of R, which is the same as testing the 
significance of R2, which is the same as testing the significance of the regression model 
as a whole. If prob(F) < .05, then the model is considered significantly better than would 
be expected by chance and we reject the null hypothesis of no linear relationship of y to 
the independents. F is a function of R2, the number of independents and the number of 
cases. F is computed with k and (n - k - 1) degrees of freedom, where k = number of 
terms in the equation not counting the constant. 
F = [R2/k]/[(1 - R2 )/(n - k - 1)]. 
In MS EXCEL, the F test appears in the ANOVA table, which is part of regression 
output. Note that the F test is too lenient for the stepwise method of estimating 
regression coefficients and an adjustment to F is recommended.
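The F formula above translates directly into a small function. The function name and the example figures are hypothetical; obtaining prob(F) would additionally need an F distribution table or a statistics package, which this sketch leaves out:

```python
def f_statistic(r2, n, k):
    """F = [R2/k] / [(1 - R2)/(n - k - 1)] for n cases and k independents."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Example: R2 of 0.5 from 13 cases and 2 independents,
# i.e. 2 and 10 degrees of freedom.
f_value = f_statistic(0.5, n=13, k=2)
print(f_value)
```

For a fixed R2, F rises with the number of cases and falls as more independents are added, which matches the statement that F is a function of R2, the number of independents and the number of cases.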
Outliers are data points which lie outside the general linear pattern of which the midline 
is the regression line. A rule of thumb is that outliers are points whose standardized 
residual is greater than 3.3 (corresponding to the .001 alpha level). The removal of 
outliers from the data set under analysis can at times dramatically affect the 
performance of a regression model. Outliers should be removed if there is reason to 
believe that other variables not in the model explain why the outlier cases are unusual -- 
that is, these cases need a separate model. Alternatively, outliers may suggest that 
additional explanatory variables need to be brought into the model (that is, the model 
needs respecification). Another alternative is to use robust regression, whose algorithm 
gives less weight to outliers but does not discard them. 
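As a rough sketch of the 3.3 rule of thumb, the residuals listed in the RESIDUAL OUTPUT table later in this report can be standardized and screened. The assumption here is the simplest definition of a standardized residual, namely the raw residual divided by the sample standard deviation of all residuals; statistical packages may use a slightly different denominator, so the result is indicative only:

```python
from statistics import stdev

# Residuals (Yt - Ft) for observations 1 to 24, copied from the RESIDUAL OUTPUT table.
residuals = [
    704.6626, -280.247, -281.312, -62.7746, -83.1435, -241.879,
    -57.3114, 10.49856, -69.035, 9.64867, 102.9695, -76.1371,
    140.9353, -32.2203, 76.05105, -5.18338, 176.6968, 125.4875,
    88.14016, -179.894, 117.7834, -205.29, -25.0449, 11.72969,
]

sd = stdev(residuals)
standardized = [r / sd for r in residuals]

# Observation numbers (1-based) whose standardized residual exceeds 3.3 in size.
flagged = [i + 1 for i, z in enumerate(standardized) if abs(z) > 3.3]
print(flagged)
```

Under this definition only observation 1 crosses the threshold. Whether such a case should be removed, modelled separately or handled by robust regression is the judgment discussed above.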
Multicollinearity is the intercorrelation of independent variables. R2's near 1 violate the
assumption of no perfect collinearity, while high R2's increase the standard error of the
beta coefficients and make assessment of the unique role of each independent difficult
or impossible. While simple correlations tell something about multicollinearity, the
preferred method of assessing multicollinearity is to regress each independent on all the
other independents.
Assumptions 
Proper specification of the model: If relevant variables are omitted from the model, 
the common variance they share with included variables may be wrongly attributed to 
those variables, and the error term is inflated. If causally irrelevant variables are 
included in the model, the common variance they share with included variables may be 
wrongly attributed to the irrelevant variables. The more the correlation of the irrelevant 
variable(s) with other independents, the greater the standard errors of the regression 
coefficients for these independents. Omission and irrelevancy can both affect 
substantially the size of the b and beta coefficients. This is one reason why it is better to 
use regression to compare the relative fit of two models rather than to seek to establish 
the validity of a single model. 
Linearity. Regression analysis is a linear procedure. To the extent nonlinear 
relationships are present, conventional regression analysis will underestimate the 
relationship. That is, R-square will underestimate the variance explained overall and the 
betas will underestimate the importance of the variables involved in the non-linear 
relationship. Substantial violation of linearity thus means regression results may be 
more or less unusable. Minor departures from linearity will not substantially affect the 
interpretation of regression output. Checking that the linearity assumption is met is an 
essential research task when use of regression models is contemplated. 
Nonlinear transformations. When nonlinearity is present, it may be possible to remedy 
the situation through use of exponential or interactive terms. Nonlinear transformation of 
selected variables may be a pre-processing step, but beware that this runs the danger 
of overfitting the model to what are, in fact, chance variations in the data. Power and 
other transform terms should be added only if there is a theoretical reason to do so. 
Adding such terms runs the risk of introducing multicollinearity in the model. A guard 
against this is to use centering when introducing power terms (subtract the mean from 
each score). Correlation and unstandardized b coefficients will not change as the result 
of centering. 
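The effect of centering on a power term can be seen in a short sketch. The values are hypothetical; the point is only that x and x squared are almost collinear when x is all positive, and much less so once the mean is subtracted first:

```python
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / sqrt(saa * sbb)


x = [float(v) for v in range(1, 11)]  # hypothetical scores 1..10
mean_x = sum(x) / len(x)
x_centered = [v - mean_x for v in x]  # subtract the mean from each score

r_raw = pearson_r(x, [v ** 2 for v in x])
r_centered = pearson_r(x_centered, [v ** 2 for v in x_centered])

print(r_raw, r_centered)
```

For these evenly spaced scores the correlation between the centered variable and its square drops to zero; with skewed data the reduction is smaller but usually still substantial.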
Partial regression plots are often used to assess nonlinearity. These are simply plots 
of each independent on the x axis against the dependent on the y axis. Curvature in the 
pattern of points in a partial regression plot shows if there is a nonlinear relationship 
between the dependent and any one of the independents taken individually. Note, 
however, that whereas partial regression plots are preferred for illuminating cases with 
high leverage, partial residual plots (below) are preferred for illuminating nonlinearities. 
Simple residual plots also show nonlinearity but do not distinguish monotone from 
nonmonotone nonlinearity. These are usually plots of standardized residuals against 
standardized estimates of Y, the dependent variable. The plot should show a random 
pattern, with no nonlinearity or heteroscedasticity. In jargon, this will show the error 
vector is orthogonal to the estimate vector. Non-linearity is, of course, shown when 
points form a curve. Non-normality is shown when points are not equally above and 
below the Y axis 0 line. Non-homoscedasticity is shown when points form a funnel or 
other shape showing variance differs as one moves along the Y axis.
Non-recursivity. The dependent cannot also be a cause of one or more of the 
independents. This is also called the assumption of non-simultaneity or absence of joint 
dependence. Violation of this assumption causes regression estimates to be biased and 
means significance tests will be unreliable. 
No overfitting. The researcher adds variables to the equation while hoping that adding 
each significantly increases R-squared. However, there is a temptation to add too many 
variables just to increase R-squared by trivial amounts. Such overfitting trains the model 
to fit noise in the data rather than true underlying relationships. Subsequent application 
of the model to other data may well see substantial drops in R-squared. 
Cross-validation is a strategy to avoid overfitting. Under cross-validation, a sample 
(typically 60% to 80%) is taken for purposes of training the model, then the hold-out 
sample (the other 20% to 40%) is used to test the stability of R-squared. This may be 
done iteratively for each alternative model until stable results are achieved. 
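A minimal sketch of the hold-out idea, assuming a 75%/25% split and a single independent. The data, the seed and the helper names are hypothetical; the model is fitted on the training cases only and R-squared is then computed separately on the hold-out cases:

```python
import random


def fit_line(x, y):
    """Least-squares slope and intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    m = (sum((a - mx) * (c - my) for a, c in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
    return m, my - m * mx


def r_squared(x, y, m, b):
    """R-squared of the line y = m*x + b on the given cases."""
    mean_y = sum(y) / len(y)
    sse = sum((c - (m * a + b)) ** 2 for a, c in zip(x, y))
    sst = sum((c - mean_y) ** 2 for c in y)
    return 1 - sse / sst


xs = list(range(1, 21))
ys = [3 * x + 5 + (0.2 if x % 2 else -0.2) for x in xs]  # hypothetical series

rng = random.Random(42)                              # fixed seed for repeatability
train_idx = set(rng.sample(range(len(xs)), k=15))    # 75% for training
train = [(xs[i], ys[i]) for i in range(len(xs)) if i in train_idx]
hold = [(xs[i], ys[i]) for i in range(len(xs)) if i not in train_idx]

train_x, train_y = zip(*train)
hold_x, hold_y = zip(*hold)
m, b = fit_line(train_x, train_y)

r2_train = r_squared(train_x, train_y, m, b)
r2_hold = r_squared(hold_x, hold_y, m, b)
print(r2_train, r2_hold)
```

A hold-out R-squared far below the training R-squared would be the sign of overfitting described above; here the two stay close because the hypothetical data contain a genuine linear relationship.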
Unbounded data are an assumption. That is, the regression line produced by OLS can 
be extrapolated in both directions but is meaningful only within the upper and lower 
natural bounds of the dependent. 
Data are not censored, sample selected, or truncated. There are as many 
observations of the independents as for the dependents. Collapsing an interval variable 
into fewer categories leads to attenuation and will reduce R2. 
Absence of perfect multicollinearity. When there is perfect multicollinearity, there is 
no unique regression solution. Perfect multicollinearity occurs if independents are linear 
functions of each other (ex., age and year of birth), when the researcher creates dummy 
variables for all values of a categorical variable rather than leaving one out, and when 
there are fewer observations than variables. 
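The dummy variable case can be illustrated with a short sketch. The category values are hypothetical, loosely modelled on product forms; creating a dummy for every level makes the dummies add up to the constant column, which is exactly the perfect linear dependence described above:

```python
# Hypothetical categorical variable with four levels.
categories = ["tablet", "capsule", "vial", "tablet", "bottle", "capsule"]
levels = sorted(set(categories))  # ['bottle', 'capsule', 'tablet', 'vial']

# One dummy per level: each row contains exactly one 1.
full_dummies = [[1 if c == level else 0 for level in levels] for c in categories]

# Every row sums to 1, so the dummies together reproduce the intercept column.
row_sums = [sum(row) for row in full_dummies]

# Leaving one level out (here the first) removes the dependence;
# the omitted level becomes the reference category.
reduced_dummies = [row[1:] for row in full_dummies]

print(row_sums)
print(reduced_dummies)
```

With the reduced set, a row of all zeros simply means the case belongs to the reference level, and each remaining dummy coefficient is read as a difference from that level.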
Absence of high partial multicollinearity. When there is high but imperfect 
multicollinearity, a solution is still possible but as the independents increase in 
correlation with each other, the standard errors of the regression coefficients will 
become inflated. High multicollinearity does not bias the estimates of the coefficients, 
only their reliability. This means that it becomes difficult to assess the relative 
importance of the independent variables using beta weights. It also means that a small 
number of discordant cases potentially can affect results strongly. The importance of 
this assumption depends on the type of multicollinearity. In the discussion below, the 
term "independents" refers to variables on the right-hand side of the regression 
equation other than control variables. 
Normally distributed residual error: Error, represented by the residuals, should be 
normally distributed for each set of values of the independents. A histogram of 
standardized residuals should show a roughly normal curve. An alternative for the same 
purpose is the normal probability plot, with the observed cumulative probabilities of 
occurrence of the standardized residuals on the Y axis and of expected normal 
probabilities of occurrence on the X axis, such that a 45-degree line will appear when 
observed conforms to normally expected. The F test is relatively robust in the face of 
small to medium violations of the normality assumption. The central limit theorem
implies that even when error is not normally distributed, when sample size is large,
the sampling distribution of the b coefficient will still be normal. Therefore violations of 
this assumption usually have little or no impact on substantive conclusions for large 
samples, but when sample size is small, tests of normality are important. 
Additivity. Likewise, regression does not account for interaction effects, although 
interaction terms (usually products of standardized independents) may be created as 
additional variables in the analysis. As in the case of adding nonlinear transforms, 
adding interaction terms runs the danger of overfitting the model to what are, in fact, 
chance variations in the data. Such terms should be added only when there are 
theoretical reasons for doing so. That is, significant but small interaction effects from 
interaction terms not added on a theoretical basis may be artifacts of overfitting. Such 
artifacts are unlikely to be replicable on other datasets. 
Homoscedasticity: The researcher should test to assure that the residuals are 
dispersed randomly throughout the range of the estimated dependent. Put another way, 
the variance of residual error should be constant for all values of the independent(s). If 
not, separate models may be required for the different ranges. Also, when the 
homoscedasticity assumption is violated "conventionally computed confidence intervals 
and conventional t-tests for OLS estimators can no longer be justified" (Berry, 1993: 81). 
However, moderate violations of homoscedasticity have only minor impact on 
regression estimates (Fox, 2005: 516). 
No outliers. Outliers are a form of violation of homoscedasticity. Detected in the 
analysis of residuals and leverage statistics, these are cases representing high 
residuals (errors) which are clear exceptions to the regression explanation. Outliers can 
affect regression coefficients substantially. The set of outliers may suggest/require a 
separate explanation. Some computer programs allow an option of listing outliers 
directly, or there may be a "casewise plot" option which shows cases more than 2 s.d.
from the estimate. To deal with outliers, the researcher may remove them from analysis 
and seek to explain them on a separate basis, or transforms may be used which tend to 
"pull in" outliers. These include the square root, logarithmic, and inverse (x = 1/x) 
transforms. 
Reliability: Reliability is reduced by measurement error and, since all variables have 
some measurement error, by having a large number of independent variables. To the 
extent there is random error in measurement of the variables, the regression 
coefficients will be attenuated. To the extent there is systematic error in the 
measurement of the variables, the regression coefficients will be simply wrong. (In 
contrast to OLS regression, structural equation modeling involves explicit modeling of 
measurement error, resulting in coefficients which, unlike regression coefficients, are 
unbiased by measurement error.) Note measurement error terms are not to be confused 
with residual error of estimate, discussed below. 
Population error is uncorrelated with each of the independents.
This is the "assumption of mean independence": that the mean error is independent of 
the x independent variables. This is a critical regression assumption which, when 
violated, may lead to substantive misinterpretation of output. 
The (population) error term, which is the difference between the actual values of the 
dependent and those estimated by the population regression equation, should be 
uncorrelated with each of the independent variables. Since the population regression 
line is not known for sample data, the assumption must be assessed by theory. 
Specifically, one must be confident that the dependent is not also a cause of one or 
more of the independents, and that the variables not included in the equation are not 
causes of Y and correlated with the variables which are included. Either circumstance 
would violate the assumption of uncorrelated error. One common type of correlated 
error occurs due to selection bias with regard to membership in the independent 
variable "group" (representing membership in a treatment vs. a comparison group): 
measured factors such as gender, race, education, etc., may cause differential selection 
into the two groups and also can be correlated with the dependent variable. When there 
is correlated error, conventional computation of standard deviations, t-tests, and 
significance are biased and cannot be used validly. Note that residual error -- the 
difference between observed values and those estimated by the sample regression 
equation -- will always be uncorrelated and therefore the lack of correlation of the 
residuals with the independents is not a valid test of this assumption. 
Independent observations (absence of autocorrelation) leading to uncorrelated 
error terms. Current values should not be correlated with previous values in a data 
series. This is often a problem with time series data, where many variables tend to 
increment over time such that knowing the value of the current observation helps one 
estimate the value of the previous observation. Spatial autocorrelation can also be a 
problem when units of analysis are geographic units and knowing the value for a given 
area helps one estimate the value of the adjacent area. That is, each observation 
should be independent of each other observation if the error terms are not to be 
correlated, which would in turn lead to biased estimates of standard deviations and 
significance. 
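The text does not name a specific check for autocorrelation. One widely used diagnostic, introduced here as an assumption rather than as the method used in the project, is the Durbin-Watson statistic computed on the residuals in time order. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation:

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical residual series in time order.
persistent = [1.0] * 5 + [-1.0] * 5   # errors keep their sign for long runs
alternating = [1.0, -1.0] * 5         # errors flip sign every period

dw_persistent = durbin_watson(persistent)
dw_alternating = durbin_watson(alternating)
print(dw_persistent, dw_alternating)
```

Monthly sales series often behave like the first case, with several months in a row above or below the fitted line, which is why the assumption deserves attention in time series work.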
Having accepted all the assumptions and understood the technicalities of the multiple
regression model, it was unanimously decided that the multiple regression model
should be used. Demand for pharma products is affected by various parameters to a
greater or lesser degree, so it was decided to construct a multiple regression model
for demand forecasting.
Certain steps then had to be taken. First of all, suitable software had to be selected
to apply the multiple regression model to the product basket of 500+ products.
It was found that MS Excel can apply multiple regression using a certain number of
parameters. Let us first learn how to use the Multiple Regression function in MS Excel.
Multiple Regression with MS Excel 
To do regression in Excel, you need the Analysis ToolPak add-in to be installed in
Excel. This was an option when you installed Excel, but you might not have selected it.
If you didn't install it, Excel will ask you for the CD when you try to add the ToolPak.
Check that the add-in is installed, and added in, by choosing Add-ins from the Tools
menu (as shown below).
Then ensure that "Analysis ToolPak" is selected, as shown below. 
You can now use the data analysis functions in Excel, which include multiple 
regression. 
The example that we will work through is taken from dataset 6.1b in the book "Applying
regression and correlation" (if you jumped straight in here, that is what these web pages
are about).
To get to the data analysis function in Excel, you select the Tools menu, and then 
choose Data Analysis. 
This gives the following dialog; click on Regression and then click OK.
The following dialog appears: 
In here, we tell Excel about the data that we would like to analyze. 
The first box is the input Y range. Here, we tell Excel about our dependent variable. 
The dependent variable must be a column, 1 cell wide and N cells long (where N is the 
number of individuals that we are analyzing). 
In the dataset we are using, the dependent variable is An, which is the column that
runs from cell D1 to cell D41. You can either type this information in directly as
D1:D41, or you can select the appropriate data from the spreadsheet.
Because we have included row 1, which includes the variable name, we are going to 
have to tell Excel this, by clicking on the "Labels" checkbox. 
The next stage is to input the independent variables. The independent variables must 
be a block of data, of k columns (where k is the number of independent variables) and N 
rows (where N is still the number of people). In the dataset we are using we have three 
independent variables: hassles, hassles2 and hassles3. (These represent the linear, 
quadratic and cubic effects of hassles - we are analyzing a non-linear relationship here.)
These are held in rows 1 - 41 of columns A, B and C. Again, we can type in A1:C41 or 
select the data from the spreadsheet - it will have the same effect. 
Next we tell Excel where we want the results to be written. It is best to ask for a new 
sheet - you don't want to accidentally overwrite some of your precious data, and have to 
go to all of the effort of restoring it from a backup, do you? (You do have a backup, 
don't you?) 
We can ask for residuals and standardized residuals to be saved - these will be new
columns of numbers created in the new spreadsheet. 
Two types of graphs will be drawn automatically if you ask for them. 
· A residual plot will draw scatter plots of each independent variable on the x-axis, 
and the residual on the y-axis. 
· A line fit plot will draw scatter plots of each independent variable on the x-axis, 
and the predicted and actual values of the dependent variable on the y axis. 
You cannot, as far as I have been able to determine, automatically have a scatter plot
with the predicted values on the x-axis, and the residuals on the y-axis (although you
can calculate these values and save them).
You can also request a normal probability plot. This appears to be a plot of the 
dependent variable, which is a curious thing to plot - regression analysis does not 
assume normal distribution of the dependent variable. The usual plot of this type would 
be the residuals, but this is not possible in Excel. 
The dialog box now looks like this: 
So, finally, we click OK. 
And we get a lot of output, written to a new sheet. A note about this output - output from 
analysis in Excel is usually "live" that is to say, the data are linked to the output. If you 
change the data, you will change the output. This is not the case for this type of output 
in Excel. The results of the analysis are "dead" and will not change. 
Regression Statistics 
The first part of the output is the regression statistics. These are standard statistics 
which are given by most programs. 
ANOVA 
The ANOVA table comes next. This gives a test of significance of the R2. Note that
Excel uses scientific notation by default, so when it says 2.22E-08 it means 2.22 * 10^-8
(i.e. 0.0000000222).
The summary output given by the regression function in MS EXCEL is shown below.
Summary Output 
Regression Statistics 
Multiple R 
R Square 
Adjusted R Square 
Standard Error 
Observations 
ANOVA 
df SS MS F Significance F 
Regression 
Residual 
Total 
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 
X1 
X2 
X3 
X4 
X5 
X6 
X7 
… 
RESIDUAL OUTPUT 
Observation Ft (forecast) Residuals (Yt-Ft)
1 98559.34 704.6626 
2 108155.6 -280.247 
3 116368.6 -281.312 
4 123269.4 -62.7746 
5 94083.14 -83.1435 
6 110911.2 -241.879 
7 102224 -57.3114 
8 107602.8 10.49856 
9 95990.37 -69.035 
10 85130.35 9.64867 
11 157103.7 102.9695 
12 144048.8 -76.1371 
13 119017.1 140.9353 
14 129806.9 -32.2203 
15 112633.6 76.05105 
16 112319.5 -5.18338 
17 134574.3 176.6968 
18 121418.5 125.4875 
19 89674.86 88.14016 
20 112742.9 -179.894 
21 98022.22 117.7834 
22 74801.29 -205.29 
23 127936 -25.0449 
24 94617.27 11.72969 
Coefficients 
The next stage is the coefficients. (Note that here I have converted the numbers to 2
decimal places to save space.) It gives the coefficient for each parameter, including
the intercept (the constant). The standard errors, and the t-values follow (the t-value is 
the coefficient divided by the standard error). Next comes the p-value associated with 
the variable, and the confidence intervals of the parameter estimates (Excel gave these 
to me twice, even though I didn't ask for them.) 
Residuals 
The final part of the output is the residual information. The observation in the left-hand
column is the case number - although Excel never told us about this, it has labeled the 
first person Observation 1, the second Observation 2, etc. (Note that this is NOT the 
original row number - Observation 1 was row 2). 
The predicted anxiety score is the score that was predicted from the regression 
equation. The residual is the raw residual - that is the difference between the predicted 
score and the actual score on the dependent variable. The final value is the 
standardized residual (the residuals adjusted to ensure that they have a standard 
deviation of 1; they have a mean of zero already). 
Graphs 
Finally, we will have a quick look at the graphs.
The first graph is an example of the residual plots - it has hassles on the x-axis and the
unstandardized residual on the y-axis.
The second graph shows the predicted and actual anxiety scores plotted against hassles3.
By using MS Excel it is possible to apply the Multiple Regression function, as stated 
above. 
Limitation of Regression function 
The Regression function writes its results to a sheet which does not change. It is known
as a dead sheet. This does not fit our criteria.
A dynamic function is needed whose output changes as the data change.
Through a validation list, the data are switched from one SKU to another.
With the Regression function, we do not get an output which changes with the SKU.
It is not possible to create summary output for the entire product basket. 
Hence, another function is used to get the changing output. 
A function called LINEST (Linear Estimation) is used. 
LINEST 
Calculates the statistics for a line by using the "least squares" method to calculate a 
straight line that best fits your data, and returns an array that describes the line. 
Because this function returns an array of values, it must be entered as an array formula. 
The equation for the line is: 
y = mx + b or 
y = m1x1 + m2x2 + ... + b (if there are multiple ranges of x-values) 
Where the dependent y-value is a function of the independent x-values. The m-values 
are coefficients corresponding to each x-value, and b is a constant value. Note that y, x, 
and m can be vectors. The array that LINEST returns is {mn,mn-1,...,m1,b}. LINEST can 
also return additional regression statistics. 
Syntax 
LINEST(known_y's,known_x's,const,stats) 
Known_y's is the set of y-values you already know in the relationship y = mx + b. 
If the array known_y's is in a single column, then each column of known_x's is 
interpreted as a separate variable. 
If the array known_y's is in a single row, then each row of known_x's is interpreted as a 
separate variable. 
Known_x's is an optional set of x-values that you may already know in the relationship 
y = mx + b.
The array known_x's can include one or more sets of variables. If only one variable is 
used, known_y's and known_x's can be ranges of any shape, as long as they have 
equal dimensions. If more than one variable is used, known_y's must be a vector (that 
is, a range with a height of one row or a width of one column). 
If known_x's is omitted, it is assumed to be the array {1, 2,3,...} that is the same size as 
known_y's. 
Const is a logical value specifying whether to force the constant b to equal 0. 
If const is TRUE or omitted, b is calculated normally. 
If const is FALSE, b is set equal to 0 and the m-values are adjusted to fit y = mx. 
Stats is a logical value specifying whether to return additional regression statistics.
If stats is TRUE, LINEST returns the additional regression statistics, so the returned 
array is {mn,mn-1,...,m1,b;sen,sen-1,...,se1,seb;r2,sey;F,df;ssreg,ssresid}. 
If stats is FALSE or omitted, LINEST returns only the m-coefficients and the constant b. 
The additional regression statistics are as follows. 
Statistic Description 
se1,se2,...,sen The standard error values for the coefficients m1,m2,...,mn. 
seb The standard error value for the constant b (seb = #N/A when const is 
FALSE). 
r2 The coefficient of determination. Compares estimated and actual y-values, 
and ranges in value from 0 to 1. If it is 1, there is a perfect 
correlation in the sample— there is no difference between the 
estimated y-value and the actual y-value. At the other extreme, if the 
coefficient of determination is 0, the regression equation is not helpful 
in predicting a y-value. For information about how r2 is calculated, see 
"Remarks" later in this topic. 
sey The standard error for the y estimate. 
F The F statistic or the F-observed value. Use the F statistic to determine 
whether the observed relationship between the dependent and 
independent variables occurs by chance. 
df The degrees of freedom. Use the degrees of freedom to help you find 
F-critical values in a statistical table. Compare the values you find in 
the table to the F statistic returned by LINEST to determine a 
confidence level for the model. For information about how df is 
calculated, see "Remarks" later in this topic. Example 4 below shows 
use of F and df. 
SSreg The regression sum of squares. 
SSresid The residual sum of squares. For information about how ssreg and 
ssresid are calculated, see "Remarks" later in this topic. 
The following illustration shows the order in which the additional regression statistics are 
returned. 
Statistics given by function 
coeff(n) coeff(n-1) coeff(n-2) coeff(n-3) …… 
se(n) se(n-1) se(n-2) se(n-3) …… 
coeff of det S.E 
F stats d.f. 
SS reg SS resid 
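The layout above can be mirrored outside Excel. What follows is only a sketch of a LINEST-like calculation, not Excel's own implementation: the function names linest and invert and the sample data are hypothetical, the normal equations are solved with a small Gauss-Jordan routine, and with const set to False the total sum of squares is taken about zero rather than about the mean.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfect multicollinearity?")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def linest(known_y, known_x, const=True):
    """known_x holds one row of x-values per observation."""
    n, k = len(known_y), len(known_x[0])
    # Design matrix: the x columns, plus a trailing column of ones if const is True.
    X = [list(row) + ([1.0] if const else []) for row in known_x]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(X, known_y)) for i in range(p)]
    inv = invert(xtx)
    coef = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in X]
    ss_resid = sum((y - f) ** 2 for y, f in zip(known_y, fitted))
    centre = sum(known_y) / n if const else 0.0
    ss_total = sum((y - centre) ** 2 for y in known_y)
    ss_reg = ss_total - ss_resid
    df = n - p                      # residual degrees of freedom
    mse = ss_resid / df
    se = [(mse * inv[i][i]) ** 0.5 for i in range(p)]
    # Coefficients and standard errors are reported in reverse order: mn, ..., m1, b.
    m, se_m = coef[:k][::-1], se[:k][::-1]
    b, se_b = (coef[k], se[k]) if const else (0.0, None)
    return [m + [b], se_m + [se_b], [ss_reg / ss_total, mse ** 0.5],
            [(ss_reg / k) / mse, df], [ss_reg, ss_resid]]


known_x = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]  # hypothetical x1, x2
known_y = [9.1, 7.8, 19.1, 18.1, 28.8, 28.1]                # hypothetical y
result = linest(known_y, known_x)
for row in result:
    print(row)
```

Each printed row corresponds to one row of the layout: coefficients, standard errors, coefficient of determination with the standard error of y, the F statistic with degrees of freedom, and the two sums of squares. Because the calculation is an ordinary function of its inputs, it can be re-run whenever the underlying data change, which is the property the dead Regression sheet lacks.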
Fitting Multiple Regression Model 
At the SCM department, DPC deals with a vast and scattered product basket. The product
basket contains various drugs in the form of tablets, capsules, vials and bottles. Various
drugs are combinations of different molecules, and products belong to different molecule
classes. Having discussed and identified a certain number of parameters which can affect
actual sales, each parameter has to be checked for its impact on actual sales.
The question is which parameters to include in the model as independent
parameters.
One should check the significance and validity of each parameter. After assessing all
those criteria, a decision should be taken as to which parameters should be included.
Assumptions Made 
The parameters taken into consideration are minimally correlated with one another. 
The multiple regression model follows all the assumptions of correlation. 
The data collected are accurate. 
Future estimates of the parameters are true. 
No intercept is considered. 
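Two of these assumptions can be checked or applied directly in code. The sketch below, in Python with NumPy, is illustrative only: the function names max_abs_correlation and fit_through_origin and the sample numbers are hypothetical and are not taken from the project data. The first function reports the largest absolute pairwise correlation between the candidate parameters (it needs at least two of them), and the second fits a regression with the intercept fixed at zero.

```python
import numpy as np

def max_abs_correlation(X):
    """Largest absolute pairwise correlation between the columns of X.

    X must have at least two columns (one per candidate parameter).
    """
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    off_diagonal = corr[~np.eye(corr.shape[0], dtype=bool)]
    return float(np.max(np.abs(off_diagonal)))

def fit_through_origin(y, X):
    """Least-squares coefficients with no intercept term."""
    coeffs, *_ = np.linalg.lstsq(
        np.asarray(X, dtype=float), np.asarray(y, dtype=float), rcond=None
    )
    return coeffs

# Hypothetical example: two parameters observed over four periods.
X = np.array([[1.0, 1.0],
              [2.0, -1.0],
              [3.0, -1.0],
              [4.0, 1.0]])
y = 2.0 * X[:, 0] + 3.0 * X[:, 1]   # sales built from known coefficients

print("largest |correlation| between parameters:", max_abs_correlation(X))
print("coefficients without intercept:", fit_through_origin(y, X))
```

A value close to zero from the first function supports the assumption that the parameters are minimally correlated; a value close to one suggests that one of the two parameters could be dropped.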
Data Sources 
SAP data files: these are the files extracted from SAP. 
SAP contains all the data on sales, orders, availability, field targets, 
institutional sales and more, and it holds past data in every form in which it is 
needed. Because these data have already been entered into SAP, the data files 
extracted from it are used as the data source. SAP data files are thus the internal 
source of data. 
ORG-MARG DATA 
ORG-MARG is a market research company. It collects sales data from retail 
counters. The data collected by ORG are product specific, company specific, industry 
specific and market specific. 
The data used for this project are for the pharmaceuticals sector. 
TORRENT is a subscriber to the ORG data and uses it for market research and 
analysis. 
A separate cell at TORRENT deals with the ORG data. ORG-MARG updates the data 
every month for the most recent past month. 
ORG-MARG has dedicated software that is used to obtain the data in the form in which 
it is needed. 
ORG data is available on a market basis. 
The available data follow the hierarchy shown below in the graph. 
ORG data is available on a monthly as well as a yearly basis. 
ORG data is available in units (strips) and in value. It also gives the company's market 
share, company market growth, molecule growth, molecule class growth, and the company's 
share in a particular sector. It also provides statistics by year, such as how much 
market share the company has and how much it has gained or lost. 
ORG provides the data a month later; for example, in June it provides the data for 
May. 
ORG data hierarchy (levels with examples): 
Market: Pharmaceuticals 
Molecule class: Tranquilizers 
Molecule: Alprazolam 
Pack wise: Alprax .5 tab, Alprax Sr 0.5 Tab 
Strength wise: Alprax (0.5) (all the products of 0.5 strength) 
Brand: Alprax
  • 1. Project Report on Developing statistical models of demand forecasting for domestic trade market operations of Torrent Pharmaceuticals Ltd. In partial fulfillment of requirements of Master of Business Administration (2006-08) Submitted BY SUCHIT SHAH Roll no.51 MBA-I SUBMITTED TO AES PGIBM Undertaken at Torrent Pharmaceuticals Limited
  • 2. ACKNOWLEDGEMENT I here by wish to take the opportunity to express my gratitude to Mr. K G Ramchandran- General Manager Human Resources ; for allowing to undertake my summer training at a well reputed organization like Torrent Pharmaceuticals Limited and Ms. MIti Randeri and Ms. Mallika Priyadarshini- Assistant Manager HR for taking care of all our official requirements. I express my sincere thanks and gratitude to Mr. Vipul Patel- General Manager Supply Chain Management and Mr. Chandan Chatterjee- AGM Supply Chain Management for guiding and encouraging me to carry out my project work successfully. I wish to convey my deepest regards and thanks to my project guide, Mr. Bhavesh Nainani-Manager DPC and Mr. Deep Vyas -Manager DPC for their all help, timely guidance and feedback in spite of a very busy schedule. They always managed to find time to sit with me and provide the necessary guideline and ideas. I also wish to express my sincere thanks to Mr. Bhavin Shah- Assistant Manager DPC and Mr. Jayant Nikhare- Assistant Manager DPC for their help and support in every possible way. Finally I wish to thank all staff of Supply Chain Management Department for their kind co operation during the tenure of my project. Suchit Shah Summer Trainee May-July (2007) AES Post Graduate Institute of Business Management Ahmedabad 2
  • 3. TABLE OF CONTENTS Heading Page No. Executive Summary 04 Torrent Group-overview 06 Mission Vision & Values 06 Objective 07 Salient Features 07 Project Constraint 07 Assumptions made during the project 07 Project Overview 09 Benefits Expected 10 Demand Forecasting-Introduction 10 The basic steps in a forecasting task 11 Company network 14 Demand Planning @SCM Dept 17 Exponential Smoothing 25 Triple Exponential Smoothing 25 Multiple Regressions with ‘n’ factors 33 Multiple Regression with MS Excel 42 LINEST function 49 Fitting Multiple Regression Model 46 Methodology 56 Regional level forecasting 75 Findings 79 Recommendations 83 3
  • 4. · Future scopes of the model
      · Limitations
      · References
      Executive Summary
      This report first studies how the planning system works at the Domestic Demand Planning Cell (DPC) of the SCM dept. at Torrent Pharmaceuticals Ltd, with a view to getting acquainted with its systems and processes. It looks at the reports prepared by DPC, the types of data maintained and the form in which they are kept, and the existing demand forecasting procedure. An exploratory analysis follows. The product basket is studied, products are classified according to their sales behavior, and all the factors that may directly or indirectly affect the sales of an SKU are identified and defined. With this understanding of the product basket and its behavior, the problem is defined. Demand forecasting is the process of determining what products are needed where, when, and in what quantities. Sales need to be forecast in a way that shows less fluctuation against actual sales. The present forecasting system is well structured and well defined, yet it has not been able to produce accurate forecasts, because it cannot quantify the fluctuation in actual sales. That is why the need for a statistical model arises. The report then explores the alternative models available. Using 24 months of sales data, a triple exponential smoothing model was applied under its assumptions. It did not give the desired output and was rejected. Actual sales are affected by many parameters, and the effect of each on sales should be accounted for. Given this complexity, a multiple regression model was applied with the shortlisted parameters. All the assumptions of multiple regression are taken for granted. 
Under the model, sales are treated as the dependent variable and the affecting parameters as the independent variables.
  • 5. A database was built to hold the data for the included parameters. Data were collected from SAP and ORG-Marg: primary sales and institutional sales were obtained from internal data sources, and tertiary sales from ORG data. Using MS Excel (2003), a multiple regression was fitted to the data and forecasts were generated for the future months. Forecast accuracy was calculated as per the DPC methodology, and the results were compared with those of the existing system. The model showed a significant increase in forecast accuracy. A model for demand forecasting at the regional level was also built, but it could be neither analyzed nor validated because of time constraints. It is recommended that the model be implemented for demand forecasting, with personnel intervention and the addition of due insights.
  • 6. Torrent Group: Overview
      It all began with the inspired efforts of one enterprising individual, Shri U N Mehta, when he ventured on his own to create history in the Indian pharmaceutical industry by successfully implementing the concept of niche marketing. With the launch of Trinicalm Plus, an effective tranquilizer, the foundation of the company was laid as ‘Trinity Laboratories’, which was later renamed ‘Torrent’. Today Torrent is one of the leading pharmaceutical companies of India. Torrent is a multifaceted and dynamic group dedicated to transforming life by serving two of its most critical needs: healthcare and energy.
      In the power sector, the Torrent Group remains the most experienced private sector player in the state of Gujarat. Torrent has just launched a mega project, the 1100 MW SUGEN CCPP. Being set up at an investment of Rs. 3096 crores, it is a backward integration move by Torrent Power to secure a reliable source of supply for its Ahmedabad and Surat distribution areas. The project is strategically located: it is close to the River Tapi, National Highway No. 8, and gas supply infrastructure comprising LNG terminals and main gas trunk lines. The plant would comprise 3 advanced class gas turbines with a high operating efficiency. The environmental and social impact of this project is minimal due to the use of eco-friendly natural gas.
      The flagship company of the Torrent group, Torrent Pharmaceuticals Limited, is a dominant player in the therapeutic areas of cardiovascular (CV) and central nervous system (CNS) and has achieved a significant presence in the gastro-intestinal, diabetology, anti-infective and pain management segments. To cater to new niche segments and sharpen its focus among customers, Torrent Pharma has 11 marketing divisions, each catering to a defined therapeutic segment. Torrent Pharma’s competitive advantage as a manufacturer stems from its world-class manufacturing facilities. Its manufacturing facilities at Indrad, Gujarat, comply with USFDA, WHO, cGMP, MHRA and TGA norms and have received ISO 9001, ISO 14001, OHSAS 18001 (Occupational Health and Safety Management System) and ISO/IEC 17025 certifications. To cater to its growth requirements, Torrent Pharma commissioned a new state-of-the-art formulations manufacturing facility at Baddi, Himachal Pradesh, in November 2005. The facility has the capacity to manufacture 3600 million tablets, 400 million capsules and 18 million oral liquid bottles per annum and would cater to the domestic formulations requirement.
      Torrent has a modern, well-equipped R&D Centre, built with an investment of US $ 40 million. It is manned by more than 525 highly qualified scientists, with a combined experience of over 2500 scientific man-years in drug discovery and development. Torrent Pharma has earmarked 9% of sales year after year for R&D advancement.
  • 7. In the international operations arena, Torrent Pharma exports to more than 50 countries around the world, with over 1000 product registrations. The international business has been broadly divided into five zones: USA, Latin America, Russia and CIS, Western Europe and CEE, and Rest of the World (ROW). For its export excellence in international business, Torrent Pharma has won several prestigious export awards. Torrent Pharma is now gearing up to enter the advanced, highly regulated international markets. It has incorporated Zao Torrent Pharma in Russia, Torrent Do Brasil Ltda in Brazil, Torrent Pharma GmbH in Germany, Torrent Pharma Inc. in USA and Torrent Pharma Philippines Inc. in Philippines. These wholly owned subsidiaries will become a springboard for entry into several regulated and less regulated international markets.
      TORRENT PHARMACEUTICALS LIMITED
      Mission: We commit ourselves to total customer care by delivering world-class products and services.
      Vision: To be the leader in the pharmaceutical industry.
      Values: A set of core values continues to guide us through the process of transforming the conglomerate into a high-performing and caring organization for our customers, employees, shareholders and society.
      · Improving quality of life of our customers, as we believe quality is a way of life.
      · Creating value for our shareholders, for the trust bestowed on us.
      · Building an empowered and ethical Torrent family, as the foundation for a bright future.
      · Responsibility towards the society and environment, as we owe our existence to them.
      · Being innovative in solutions, for being different counts.
      · Striving for excellence in whatever we do, to follow the exclusive path to leadership.
      · Flexibility and speed shall be our oars for navigating the turbulent seas.
  • 8. Objective of Project
      To develop a statistical model of demand forecasting for the domestic trade market operations of Torrent Pharmaceuticals Ltd.
      · @Gross level
      · @Regional level
      Salient Features
      Following are the salient features of the project:
      · It aims to improve the existing demand forecasting process by using a statistical tool
      · It tries to cover all the quantitative and qualitative factors which affect the actual sales
      · It takes the uncertain fluctuations into consideration and captures them
      · It discusses the product-specific sales behavior
      · It makes the whole forecasting procedure a dynamic one
      · It gives a clear picture of the Pharma sector from the drug-specific to the macro level
      Project Constraints
      · The project is based on the tertiary sales made available from ORG-Marg data, which may be inaccurate to some extent
      · The project doesn’t include the secondary sales data
      · The project considers only the parameters for which data are available
      · The project has to estimate the future values of the parameters
      Assumptions made during the project
      · The data collected are accurate.
      · Future estimates of the parameters are true.
      · The parameters taken into consideration are least correlated with one another.
      · Data collection horizon ranges from May’05 to June’05
  • 9. Project overview
      [Flowchart: steps of the project, in order]
      · To study how the demand planning works at SCM dept.
      · To develop a statistical tool for demand forecasting
      · Identifying and defining the parameters
      · Applying various statistical tools
      · Matching output of the model with the past sales
      · Comparison with existing system and suggested system
      · Checking the robustness of the tool
      · Implementation, if all the criteria are fulfilled, for all or a partial number of SKUs
  • 10. Benefits expected
      · Minimization of overstocking
      · Reducing the gap between orders and actual sales
      · No opportunity loss, which may result in growth
      · Better inventory control and hence better cash flow
      · Better utilization of resources
      · Dispatch efficiency
      · Smooth operation flow from demand planning to order execution
      · Prior planning of recruitment and changes in workforce
      · Proper allocation of promotional budget
      DEMAND FORECASTING - a brief overview
      Introduction
      What does the word forecast mean? The word “fore” means “watch out” in golf and is shouted as a warning to anyone who could potentially be in the path of a misplaced golf ball. The word “cast” to an angler means “throw out”. Putting the two words together gives “forecast”, that is, “watch out and throw out”. Forecast management is the process of making, checking, correcting and using forecasts. It also includes determination of the forecast horizon.
      Forecast: an estimate of future demand. A forecast can be determined by mathematical means using historical data, it can be created subjectively by using estimates from informal sources, or it can represent a combination of both techniques. Forecasting involves making projections about future performance on the basis of historical and current data. Forecast methods can be divided into history-based and future-based ones.
      · History-based demand forecasts are analytic methods based on consumption statistics. They can be further divided into mathematical and graphic methods.
      · Future-based demand forecasts use already existing information about future demand, e.g. offers, confirmed orders in a contracting phase and interviews on customer behavior. (Schönsleben, 1998)
      In this study, conditional variance models are used for quantifying the demand process uncertainty. The uncertainty can, for example, depend on the level of demand, the previous shocks and the historic level of the variance process. Understanding customer demand is key for any manufacturer to make and keep sufficient inventory so that customer orders can be correctly met. The discipline that helps a supply chain forecast and plan well is called demand planning.
  • 11. Accurate and timely demand plans are a vital component of an effective supply chain. Inaccurate demand forecasts typically result in supply imbalances. Although revenue forecast accuracy is important for corporate planning, forecast accuracy at the SKU level is critical for proper allocation of resources.
      Types of forecasting
      · Quantitative forecasting is used when sufficient quantitative information is available.
      · Qualitative forecasting is used when little quantitative information is available, but sufficient qualitative knowledge exists.
      Quantitative forecasting can be applied when three conditions exist:
      1. Information about the past is available.
      2. This information can be quantified in the form of numerical data.
      3. It can be assumed that some aspects of the past pattern will continue into the future.
      Under quantitative forecasting methods, there are two major types of forecasting models:
      · Explanatory models
      · Time series forecasting
      Explanatory models assume that the variable to be forecast exhibits an explanatory relationship with one or more independent variables. Time series forecasting deals with the past data only. It makes no attempt to discover the factors affecting the behavior of the series. The objective of time series forecasting methods is to discover the pattern in the historical data series and extrapolate that pattern into the future.
      Forecast Management
      Forecast management is the process of making, checking, correcting and using forecasts. It also includes determination of the forecast horizon. While designing a forecasting system, the policy issues of what to forecast, why the forecast is needed, and who does the forecasting must be addressed. A forecast is meaningful only in relation to planning and decision making in some area of business application. Thus, an important aspect of any forecasting system is knowing and planning how it will be used in business planning, budgeting, and the operational aspects of master scheduling and inventory planning. Different attributes of the forecasting system are of varying levels of concern and interest to people in each of these areas.
      The basic steps in a forecasting task
      Forecasting, where quantitative data is available, is a sequential process of five steps.
      Step 1: Problem definition
      Step 2: Gathering information
      Step 3: Preliminary (exploratory) analysis
      Step 4: Choosing and fitting models
      Step 5: Using and evaluating a forecasting model
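      As a small illustration of the time series idea described above (find the pattern in past data and extrapolate it), the following sketch fits a straight-line trend by least squares and projects it one period ahead. It is only an example under stated assumptions: the function names `fit_linear_trend` and `extrapolate` and the `history` figures are hypothetical, and a linear trend is not the model this report finally adopts.

```python
def fit_linear_trend(series):
    """Least-squares intercept and slope for y against t = 0, 1, 2, ..."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def extrapolate(series, steps_ahead=1):
    """Project the fitted trend line steps_ahead periods past the last point."""
    intercept, slope = fit_linear_trend(series)
    t_future = len(series) - 1 + steps_ahead
    return intercept + slope * t_future


# Hypothetical monthly sales, for illustration only (not Torrent data).
history = [100, 110, 120, 130]
next_month = extrapolate(history)
print(f"projected next period: {next_month:.1f}")
```

      An explanatory model would instead relate sales to other variables, which is the route the report takes later with multiple regression.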
  • 12. Problem definition
      Defining the problem involves developing a deep understanding of how the forecasts will be used, who requires the forecasts, and how the forecasting function fits within the organization. It is worth spending time talking to everyone who will be involved in collecting data, maintaining databases, and using the forecasts for future planning. A forecaster has a great deal of work to do to define the forecasting problem properly before any answers can be provided. One needs to know exactly what products are stored, who uses them, how long it takes to produce each item, what level of unsatisfied demand the company is prepared to bear, and so on.
      Gathering information
      The information available can be mainly of two types:
      1. Statistical data
      2. The accumulated judgment and expertise of key personnel
      Exploratory analysis
      This step calculates simple statistics such as the mean, standard deviation, correlation, minimum, maximum and percentiles for each set of data. With more than one series of historical data, one can use descriptive statistics for exploration. The purpose of doing this at this stage is to get a feel for the data. Do they follow consistent patterns? Is there evidence of business cycles? Are there any outliers in the data that need to be explained by those with expert knowledge? How strong are the relationships among the variables available for analysis?
      Choosing and fitting models
      After the exploratory analysis, one understands how to handle the data: what pattern and behavior are being observed, and what affects the actual sales. This is the stage at which the model to be fitted is chosen. One interprets the characteristics of the actual past data and matches the assumptions of the candidate models with the data. After choosing the model, one should fit it to the data. If necessary, the model should then be modified accordingly.
      Using and evaluating a forecasting model
      After fitting the model to the actual data, inferences can be drawn and forecasts can be generated for future periods. The model should be checked by holding back one month of actual data, forecasting that month, and comparing the forecast with the held-back data. Forecast effectiveness (forecast accuracy) should be calculated. If it is better than that of the present system, the model should be used.
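      As a rough illustration of the exploratory analysis step above, the sketch below computes the simple statistics it names (mean, standard deviation, minimum, maximum, quartiles and a correlation). The two series are the June'05 to Nov'05 orders and sales for Alprax 0.5 tabs, taken from the CODIS data shown later in this report. The use of Python and the helper names `describe` and `pearson` are assumptions made for this example; the report itself works in MS Excel.

```python
import statistics


def describe(series):
    """Return simple descriptive statistics for a list of numbers."""
    q1, q2, q3 = statistics.quantiles(series, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(series),
        "stdev": statistics.stdev(series),  # sample standard deviation
        "min": min(series),
        "max": max(series),
        "q1": q1,
        "median": q2,
        "q3": q3,
    }


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# June'05 to Nov'05, Alprax 0.5 tabs (units), from the CODIS data in this report.
orders = [878172, 755927, 795987, 885868, 665098, 749917]
sales = [562903, 499151, 525658, 590579, 443185, 499145]

summary = describe(sales)
r = pearson(orders, sales)
print(summary)
print(f"correlation(orders, sales) = {r:.3f}")
```

      A correlation close to 1 between orders and sales would suggest that the two series move together, which is the kind of relationship this step is meant to surface before a model is chosen.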
  • 13. SCM @ TORRENT
      Supply chain management is the management of the entire value-added chain, from the supplier to the manufacturer right through to the retailer and the final customer. It coordinates almost all the departments of the company, links them and smooths the whole system. SCM has three primary goals:
      · Reduce inventory,
      · Increase the transaction speed by exchanging data in real time,
      · Increase sales by implementing customer requirements more efficiently.
      Planning done at SCM is the indicator for all the other departments, i.e. production, finance, marketing and HR. Torrent’s supply chain management is mainly divided into two divisions:
      · Domestic operations division
      · International operations division
      The domestic operations division is further divided into the following:
      · C&FA Cell
      · Demand Planning Cell
      · Indrad Warehouse
      · Zirakpur Warehouse
      · PPC - Indrad & Baddi
      The supply chain management department is well equipped with the necessary infrastructure, including modern software and hardware. To handle the company’s large multipoint network across India and the world, SAP is implemented at the TORRENT SCM dept. The MM module (Material Management module) and the PP module (Production Planning module) are used by the department personnel. Microsoft Excel is used extensively at the department, and various MIS reports are prepared in MS Excel from SAP data.
      Company network
      Torrent’s corporate office is based at Ahmedabad. All the supporting activities are conducted from the HO (based at Ahmedabad). Two plants are situated at Indrad (Gujarat) and Baddi (Himachal Pradesh). Most of the domestic requirement is served by the Baddi plant. The company has set up its warehouse at Zirakpur (Punjab). Products produced at the Baddi plant are stored at the Zirakpur warehouse, and all dispatches are made from the warehouse. The company has 25 carrying and forwarding agents (C&FAs) across India. C&FAs are responsible for the primary sales in their allocated region. They are the agents who sell the products to the stockists. They receive the orders from the stockists and pass them on to the supply chain department at HO.
  • 14. Again, all these activities are coordinated by the Supply Chain Department. The company engages mainly in two types of selling:
      · Trade sales
      · Institutional sales
      Trade sales are the sales made through the C&FA channel, while institutional sales are the sales to institutions such as hospitals, railways, the army, etc.
  • 15. [Figure: Company network. Elements shown: SCM@HO (demand planning); Baddi PPC and Indrad PPC; Indrad Warehouse and Zirakpur Warehouse; warehouse to C&FA; C&FAs, with stock transfer inter C&FA; primary sales to stockists and institutions; secondary sales to retailers; tertiary sales to customers.]
  • 16. Products
      The company has a product basket of 500+ products, produced in the form of tablets, capsules, liquids and injections. Each product is allocated a unique 7-digit product code. From a marketing point of view there are 11 divisions, and the drugs are allocated to the divisions accordingly: Sensa, Mind, Axon, Neuron, Azuca, Psycan, Omega, Delta, Prima, Vista, Alfa. These divisions are in turn classified into three groups, PVA, APOD and SMAN, where:
      · PVA = Prima, Vista, Alpha (Anti Infective segment)
      · APOD = Azuca, Psycan, Omega, Delta (Cardiology and Diabetology)
      · SMAN = Sensa, Mind, Axon, Neuron (Central Nervous System)
      Product Classification
      These products behave differently in the quantities sold. Accordingly, they should be classified as:
      · Matured (stable) products
      · Seasonal products
      · New products
      Matured products are products which have been in the market for a long time. They follow a clear, stable pattern without significant deviation, so their fluctuation can be understood. E.g. Nikoran 5, Deplatt tab, Antidep.
      Seasonal products are products which show a particular seasonal behavior: sales go up in a particular season, i.e. in particular months. With data for more than one cycle (a year), the size of the seasonal hike can also be estimated. E.g. Quintor Infusion has shown high sales in the months of April and May.
      New products are products launched within the last 6 months. With so little data, one cannot capture the trend or the amount of deviation, so it is not easy to estimate their behavior or the fluctuation in the quantity sold. E.g. Rimofit and Rimoslim were launched in May’07.
  • 17. Demand Planning @SCM Dept.
      Demand planning is the process through which an organization generates a forecast of market demand for its products on a regular basis. This allows the organization to calculate a historically based statistical forecast for each point (that is, each part number/warehouse combination). Some key output variables include demand in pieces, demand in customer orders, pieces per customer order, standard (forecast) deviation, and pieces per deviation.
      At Torrent, there is a separate demand planning cell under the SCM dept., which conducts demand planning on the basis of a 4 months rolling plan. Under the rolling plan, planning is done 4 months prior to the corresponding month. Planning includes demand forecasting, production planning, supply planning and dispatch planning. The demand plan is first given by the marketing department and is then reviewed by the demand planning cell. For every product in each division, the demand plan is reviewed and corrected if needed. On the 20th day of every month M, demand planning is done by the demand planning cell for the month M+3. Once the demand plan is decided, it is executed by the related departments of the company in a sequential and structured way. All other planning, such as production planning, financial planning, dispatch planning and procurement planning, is made accordingly.
      The company produces most of its products at in-house facilities, i.e. at the Indrad and Baddi plants. For certain products, the company has P2P and LLM arrangements. P2P is a principal-to-principal arrangement, in which products made by other companies are marketed by TORRENT. The drug license and manufacturing license must be held by the other company; Torrent need not hold them. There are approx. 230 products which are received from P2P. LLM is loan license manufacturing, in which TORRENT uses the plant and facilities of other companies. Here Torrent must hold the drug license and manufacturing license for the particular drug. There are approx. 38 products which are received from LLM. Forecasting for products which are not produced in-house is a very complex task, as they have a longer lead time than the products produced in-house.
  • 18. The demand planning cell deals with these arrangements. It is responsible for getting the products in time and for planning their demand and dispatches at the right point in time at the least cost. Under certain circumstances, it is not possible to execute all the orders received from the stockists, because the stock cannot be connected to the order. This generally happens because of non-availability of raw material, machine breakdown, transportation problems, or sudden excess demand. In some cases, it is known in advance that a particular product may not be available in the coming month. That product is then declared a non-available product, abbreviated as NAP. Such orders could have been genuine sales with proper demand planning. At the beginning of every month, every aspect of the past month is analyzed and justified. Reports such as the Gap report, NAP report, inventory analysis report, connectivity report, etc. are prepared.
      Planning Horizon - 4 months rolling plan
      Status        Month       Offset
      Solid rock    July        M
      Solid         August      M+1
      Slushy        September   M+2
      Liquid        October     M+3
      (Slushy and Liquid form the tentative plan.)
      Let’s consider the month of June’07 as a reference point. According to the 4 months rolling plan, in the month of June demand planning is done for the coming 4 months. As it is a continuous process, a new month is added every month and the status of the subsequent months changes. The next two consecutive months are considered frozen. That means that in the month ‘M-1’, the demand plan is fixed for the next two months, i.e. ‘M’ and ‘M+1’. A plan in Solid rock or Solid status cannot be changed. The plan for the 3rd and 4th months is tentative; the status of these months is Slushy and Liquid. In the tentative plan, demand can be changed as per the constraints. In the same manner, the status of months ‘M’ and ‘M+1’ was tentative in ‘M-3’. The status of every month changes on arrival at the new month.
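      The rolling plan above can be sketched in a few lines of code. The example below lists, for a given planning month, the four months that follow together with their status and whether the plan for that month is frozen. It follows the June’07 example in the text (planning in June covers July to October); the function names `add_months` and `rolling_plan` and the tuple layout are assumptions made for illustration, not part of the DPC system.

```python
from datetime import date

# Status labels for the four planned months, nearest month first (from the report).
STATUSES = ["Solid rock", "Solid", "Slushy", "Liquid"]
FROZEN = {"Solid rock", "Solid"}  # the first two months cannot be changed


def add_months(d, n):
    """Return the first day of the month n months after date d."""
    index = d.year * 12 + (d.month - 1) + n
    return date(index // 12, index % 12 + 1, 1)


def rolling_plan(planning_month):
    """List (month as YYYY-MM, status, frozen) for the four months after planning_month."""
    plan = []
    for offset, status in enumerate(STATUSES, start=1):
        month = add_months(planning_month, offset)
        plan.append((f"{month:%Y-%m}", status, status in FROZEN))
    return plan


# Planning done in June 2007 covers July to October 2007.
for row in rolling_plan(date(2007, 6, 1)):
    print(row)
```

      Calling `rolling_plan` again for the following month shifts every status by one month, which mirrors the statement that the status of every month changes on arrival at the new month.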
  • 19. Existing system of forecasting
      Torrent’s domestic operations work on a make-to-stock basis. Products are manufactured before the orders are received at the C&FAs. Hence there is a need to forecast sales in advance, so that the optimum quantity is produced to serve the market. At Torrent, the existing system of forecasting doesn’t use a specific statistical tool. Forecasting is based on past data and statistics: average, minimum and maximum sales, and orders, are taken into consideration. A division-wise demand plan is prepared by the marketing department on the basis of field targets. This plan is reviewed by the demand planning cell, and the demand plan is then made according to the schedule of the rolling plan.
      Demands (forecasts) are generally predicted on the basis of past data. The behavior of recent months is considered along with the general trend. The field targets given to the sales force are also taken into consideration. These are the quantitative data. Certain factors such as epidemics, seasonal effects and some visible factors are also taken care of. Visible factors include competitors’ moves, market behavior, and regulatory factors. These are the qualitative data, which need to be quantified in a particular manner. Considering all these factors, forecasts are put forward.
      The present system works more on judgment; no particular statistical tool is applied. So it has not been able to capture all these factors precisely. Fluctuations cannot be quantified in the proper proportion, and there may be a bias in the estimation and quantification of these parameters. All this results in forecasts which do not match the actual sales. Poor forecast accuracy will result in:
      · Dispatch inefficiencies
      · Loss of genuine sales
      · High inventory, and hence blockage of working capital
      · High lead time
      Good forecast accuracy is a must. Forecast accuracy here is low and needs to be improved. Hence, there is a need to develop a system (model) which takes care of all the concerned factors. All the factors need to be understood and quantified properly, including how a single factor affects different SKUs in different ways.
  • 20. The demand planning cell prepares a file named CODIS, which stands for Correlation among Orders, Demand, Inventory and Sales. For every product, SAP provides data on the demand, orders received, sales, and total availability. This file is used to analyze the actual scenario, i.e. to what extent orders are executed. The % variation of sales to demand and the % variation of orders to demand are calculated. These show how close or how far the demand is from the actual sales and orders. The graphs on the next two pages show, first, the status of orders, demand, sales and stock, and second, the % variation of sales to demand and of orders to demand with the corresponding trend lines. The graphs are for the product Alprax 0.5 tabs, which contains the molecule Alprazolam, belonging to the class Tranquilizers.
  • 21. Forecast Accuracy Jun'05 - May'07
      [Chart: Demand, Orders, Sales and TA @ CFA by month. Y-axis: Quantity (Units), 0 to 2,000,000.]
      Month       Demand    Orders    Sales     TA @ CFA
      June'05     950010    878172    562903    1431263
      July'05     900120    755927    499151    1449613
      Aug'05      850000    795987    525658    1413207
      Sept'05     860000    885868    590579    1364700
      Oct'05      772000    665098    443185    1322452
      Nov'05      855000    749917    499145    1577114
      Dec'05      820000    773628    515552    1412617
      Jan'06      730000    691974    459476    1138279
      Feb'06      750000    616661    411027     933865
      March'06    700000    686613    432582    1319114
      April'06    850000    940231    613461    1727856
      May'06      800000    913461    604213    1282504
      June'06     900000    962695    621979    1235664
      July'06     950000    839190    557786    1275416
      Aug'06      900000    855761    568361    1552737
      Sept'06     900130    741450    476396     767560
      Oct'06      670000    490813    486713     687486
      Nov'06      600130    538989    535429     387987
      Dec'06      500130    492931    491120     792249
      Jan'07      500000    534135    527857     261937
      Feb'07      450000    558444    533434     667944
      Mar'07      375000    428911    420533     610092
      Apr'07      525000    693125    668147     893657
      May'07      525000    607044    580078     904710
      June'07     550000    596129    574958     734085
  • 22. Tracking Forecast Accuracy
      [Chart: % variation by month, with linear trend lines for both series. Y-axis: % Variation, -60% to 40%. X-axis: Month.]
      Month       Series1 (sales vs demand)   Series2 (orders vs demand)
      June'05     -40.75%                     -7.56%
      July'05     -44.55%                     -16.02%
      Aug'05      -38.16%                     -6.35%
      Sept'05     -31.33%                     3.01%
      Oct'05      -42.59%                     -13.85%
      Nov'05      -41.62%                     -12.29%
      Dec'05      -37.13%                     -5.66%
      Jan'06      -37.06%                     -5.21%
      Feb'06      -45.20%                     -17.78%
      March'06    -38.20%                     -1.91%
      April'06    -27.83%                     10.62%
      May'06      -24.47%                     14.18%
      June'06     -30.89%                     6.97%
      July'06     -41.29%                     -11.66%
      Aug'06      -36.85%                     -4.92%
      Sept'06     -47.07%                     -17.63%
      Oct'06      -27.36%                     -26.74%
      Nov'06      -10.78%                     -10.19%
      Dec'06      -1.80%                      -1.44%
      Jan'07      5.57%                       6.83%
      Feb'07      18.54%                      24.10%
      Mar'07      12.14%                      14.38%
      Apr'07      27.27%                      32.02%
      May'07      10.49%                      15.63%
      The graph above shows the % variation of sales and of orders to demand.
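      The two series in this chart can be reproduced from the demand, orders and sales figures on the previous slide. The sketch below does so for the first three months of the Alprax 0.5 tabs data; it assumes, consistent with the figures, that each variation is measured relative to the demand plan. The function name `pct_variation` is introduced here for illustration only.

```python
def pct_variation(actual, demand):
    """Percentage variation of an actual figure relative to the demand plan."""
    return (actual - demand) / demand * 100


# June'05 to Aug'05 figures for Alprax 0.5 tabs, copied from the CODIS data.
months = ["June'05", "July'05", "Aug'05"]
demand = [950010, 900120, 850000]
orders = [878172, 755927, 795987]
sales = [562903, 499151, 525658]

sales_var = [pct_variation(s, d) for s, d in zip(sales, demand)]
orders_var = [pct_variation(o, d) for o, d in zip(orders, demand)]

for m, sv, ov in zip(months, sales_var, orders_var):
    print(f"{m}: sales vs demand {sv:.2f}%, orders vs demand {ov:.2f}%")
```

      For June'05 this gives -40.75% for sales and -7.56% for orders, matching the first entries of the two plotted series.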
  • 23. Forecast accuracy at TORRENT with the present system
A forecast is said to be accurate if sales fall within 90% to 110% of the forecast. The demand planning cell at TORRENT calculates the forecast accuracy at the beginning of each month for the past month, at both the gross level and the C&FA level. Actual sales during the last month are compared with the projected demand for the corresponding product and month, and the deviation of actual sales from the demand is calculated. Consider a product 'X' with actual sales 'Yt' and forecast 'Ft'. The deviation is then calculated by the formula (Yt-Ft)/Ft, which gives the % deviation of sales from demand. At TORRENT, a range is defined for forecast accuracy: a forecast is considered a HIT if sales stay within +/-10% of it, otherwise a MISS. MS Excel is used for the purpose.
The present system has shown less accurate results than required, so there is a need to work on demand forecasting. Even so, it has its strengths. It is efficient for stable, fast-moving and mature products. It takes care of products that have shown high skewness due to promotions and schemes. It can estimate well the sales of products that are about to be launched. The demand planning cell also interacts with the marketing people about product behavior on line extension, and it does a good job of estimating the effect of forthcoming events that can be known in advance. The present system has its own unique features. It is well defined and well designed. It is a foolproof system.
Defining parameters
Parameters are the factors which directly or indirectly affect the actual sales. These factors need to be identified: what types of factors affect the actual sales? Factors with a direct effect and those with an indirect effect should both be explored. The process of exploration yields a list of parameters, which then needs to be sorted to find the parameters that have a significant impact on actual sales. There are statistical methods to check the significance of the various parameters. The following are the factors which can affect the actual sales.
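The HIT/MISS rule described for the present system can be sketched as below. This is only an illustration: the function names and the monthly figures are hypothetical and are not taken from TORRENT's Excel sheets. The deviation follows the (Yt-Ft)/Ft formula from the text.

```python
def percent_deviation(actual, forecast):
    """Deviation of actual sales from forecast, (Yt - Ft) / Ft, in percent."""
    return (actual - forecast) * 100.0 / forecast


def is_hit(actual, forecast, tolerance_pct=10.0):
    """A forecast is a HIT when sales stay within +/- tolerance_pct of it."""
    return abs(percent_deviation(actual, forecast)) <= tolerance_pct


# Hypothetical monthly figures for one product: (actual sales Yt, forecast Ft).
history = [(10500, 10000), (8700, 10000), (10900, 10000), (11500, 10000)]

hits = [is_hit(y, f) for y, f in history]
hit_rate = sum(hits) / len(hits)  # share of months in which the forecast was a HIT
```

Tracking the hit rate over several months gives a single figure for comparing the present system against any new model.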
  • 24.
· Trend
· Total availability of the SKU
· Seasonal factors, i.e. for months
· Promotions and schemes
· Price sensitivity
· Market share
· Market growth of the SKU
· Market growth of the molecule
· Market growth of the brand
· Market growth of the molecule class
· Market growth expected by the organization
· Additional duties and taxes levied by government
· Introduction of new drugs by a competitor in the same segment
· Line extension by the company
· Introduction of new drugs by the company in the same segment
· Regional factor
· Drugs with the same molecule
· Same products with different strengths
· Institutional sales
· Sales force
· Field targets
· Secondary sales
· No. of stockists
· Tertiary sales
· No. of retailers
· Miscellaneous factors (epidemic, billing channel, government factors, availability, orders etc.)
The factors stated above can have a significant impact on the actual sales. The parameters must be selected properly to obtain accurate and close forecasts.
Matching the results
After developing an appropriate model, it should be applied to the past data. A forecast for the past periods should be made and compared with the actual past data to verify the reliability and validity of the model. Various other statistical tools can be used for the same purpose.
Comparing it with the present system
After developing the models, it is necessary to compare them with the present system and check whether they give better results. The comparison should cover various aspects, and the model should give reliable and consistent results. Does it have an impact on inventory level? Does it have an impact on profitability? Can it make the whole system smoother?
  • 25. Is it Robust?
The model should give accurate results in any situation. If it gives a proper forecast in any situation, then it should be implemented. The model should capture fluctuations and react to adjustments made on foreseeing certain factors. The model has to be robust: it should be flexible towards changes and react accordingly.
Implementation
After inspecting all the criteria, one should validate the model. If it gives reliable, consistent and precise results and has a significant impact on the topics of concern, then it should be implemented and used for the future.
Statistical tool
Tools which can be considered are:
· Time series
· Exponential smoothing
· Multiple regression
Many forecasting methods are based on the concept that when an underlying pattern exists in a data series, that pattern can be distinguished from randomness by smoothing (averaging) past values. The effect of smoothing is to eliminate randomness so the pattern can be broken down into sub-patterns that identify each component of the time series separately. Such a breakdown can frequently aid in better understanding the behavior of the series, which facilitates improved accuracy in forecasting. Time series decomposition breaks the data into these sub-patterns. It analyzes the data and separates the effects of the components:
Data = pattern + error = f(trend-cycle, seasonality, error)
But at Torrent there is a product basket of 500+ products, each behaving differently, and several factors affect the overall picture. Time series decomposition alone is not enough, as it captures only the trend, seasonality and error. To analyze and determine the trend, seasonality and level followed by the data, Triple Exponential Smoothing is applied. On the basis of the assumptions and the methodology of the model, one can fit the model to the past data, and the forecasts for the coming period are then obtained from the past data.
Data Availability
24 months of data are available, giving the monthly primary sales from June'05 to May'07. The data cover two complete cycles, which is the minimum requirement for applying triple exponential smoothing. Primary sales are the sales made through the CFA channels. They also include institutional sales, which are to be netted out later. Data for institutional sales are obtained from SAP as a dump for the same period.
  • 26. Exponential Smoothing
Exponential smoothing is an extension of the moving average method and uses a weighted moving average. Weights are allocated to both the past data and the recent data. It is a class of methods that apply exponentially decreasing weights as the observations get older, so that recent values are given relatively more weight in forecasting than older observations.
Triple Exponential Smoothing (Holt-Winters multiplicative model)
Holt's method of exponential smoothing was extended by Winters (1960) to capture seasonality. It considers (1) the deseasonalized level, (2) the trend (growth) and (3) seasonality. Let us denote:
Original data, i.e. monthly sales: Yt
Deseasonalized level: Rt
Trend factor (growth factor): Gt
Seasonal factor: St
Forecast: Ft
As monthly data is available for 24 months, from June'05 to May'07, we have two complete cycles. The sales (Yt) column of the table on the next page shows these data. To get the level and trend, one should apply linear regression. In the linear regression equation Y = a + bX: Y = actual sales, a = intercept (Rt), b = growth (Gt). After getting the deseasonalized level and growth factor, the seasonal factor is calculated:
Seasonal factor = (Actual sales of the corresponding month) / (Forecasted sales for the same month by linear regression)
If the seasonal factor is greater than 1, sales are higher by that proportion due to the season. If it is less than 1, sales are lower by that proportion due to the season. The equations for the Holt-Winters method are as follows:
Level: $R_t = \alpha \frac{Y_t}{S_{t-s}} + (1-\alpha)(R_{t-1} + G_{t-1})$
Trend: $G_t = \beta (R_t - R_{t-1}) + (1-\beta) G_{t-1}$
  • 27. Seasonal: $S_t = \gamma \frac{Y_t}{R_t} + (1-\gamma) S_{t-s}$
Forecast: $F_{t+x} = (R_t + x G_t) S_{t-s+x}$
Here s is the length of the seasonal cycle (12 months), x is the number of periods ahead, and α, β, γ are the smoothing constants, with 0 < α, β, γ < 1. These values are chosen by the forecaster to suit the data, so there can be a bias in initializing them. It has been observed that α = β = γ = 0.5 gives favorable results. To remove the initialization bias, the method is modified so that it gives the same results as the above calculation. The modified method is as follows: rather than using the smoothing equations above for the level, trend and seasonal factors, one should fix the level and the trend at the values obtained by the linear regression. They are held constant for every month, i.e. for the past months as well as the coming months. For the seasonal indices of the future months, one should take the average of the seasonal factors of the same months in the past cycles. This makes the calculations easy when all the smoothing constants are 0.5.
The forecasts for the two drugs, Nikoran 5 Mg tab and Torleva 500, are given below. The last column indicates the % variation between the forecast and the actual sales. For the past months it has shown very little variation, i.e. +/-10%.
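The modified method described above (a fixed regression line for level and trend, with seasonal indices averaged over past cycles) can be sketched as follows. This is a minimal illustration under stated assumptions, not the spreadsheet actually used in the project: the function names are hypothetical, the sample data are invented, and a 4-period cycle is used instead of 12 months to keep the example short.

```python
def fit_line(y):
    """Least-squares line y = a + b*t for t = 1..len(y); returns (a, b)."""
    n = len(y)
    t = range(1, n + 1)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    b = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y)) / sum(
        (ti - t_mean) ** 2 for ti in t
    )
    a = y_mean - b * t_mean
    return a, b


def seasonal_regression_forecast(sales, season_length, horizon):
    """Forecast `horizon` periods ahead from a fixed trend line and averaged seasonal indices."""
    a, b = fit_line(sales)
    n = len(sales)
    # Seasonal factor of each past period: actual sales / value on the trend line.
    factors = [sales[i] / (a + b * (i + 1)) for i in range(n)]
    # Seasonal index per position in the cycle: mean factor over the past cycles.
    indices = []
    for pos in range(season_length):
        same_pos = factors[pos::season_length]
        indices.append(sum(same_pos) / len(same_pos))
    # Extend the trend line and multiply by the index of the matching position.
    return [
        (a + b * (n + h)) * indices[(n + h - 1) % season_length]
        for h in range(1, horizon + 1)
    ]


# Hypothetical sales over two full cycles of four periods each.
sales = [120, 88, 132, 117, 168, 120, 176, 153]
forecast = seasonal_regression_forecast(sales, season_length=4, horizon=4)
```

For monthly data such as the 24 months used in the report, `season_length` would be 12 and `horizon` 12.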
  • 28. Nikoran 5 Mg Tab 28
  • 29. 29
  • 30. Torleva 500
Rt (level) = 108762.3, Gt (trend) = 420.073
Month | Sales (Yt) | Yt^ (deseasonalized) | Seasonal factor | Seasonal index | Forecasted demand (Ft) | % variation
June '05 | 117586 | 109633 | 1.072542 | | 118193.3 | -0.51651
July '05 | 106675 | 109991.6 | 0.969846 | | 113104.7 | -6.02741
Aug '05 | 110095 | 110350.3 | 0.997687 | | 113275.7 | -2.88902
Sept '05 | 108912 | 109205.4 | 0.997314 | | 108723.6 | 0.173017
Oct '05 | 97303 | 111067.5 | 0.876071 | | 101156 | -3.95981
Nov '05 | 111810 | 112921.2 | 0.99016 | | 110736.4 | 0.96022
Dec '05 | 100237 | 111784.7 | 0.896697 | | 106306.8 | -6.05549
Jan '06 | 116430 | 112143.3 | 1.038225 | | 119867.4 | -2.95235
Feb '06 | 106153 | 112501.9 | 0.943566 | | 107077.4 | -0.87078
March '06 | 87058 | 119773.7 | 0.726854 | | 79559.33 | 8.613413
April '06 | 147642 | 113219.2 | 1.304037 | | 139645.1 | 5.416415
May '06 | 126534 | 113577.8 | 1.114074 | | 123260.7 | 2.586905
June '06 | 124478 | 113936.4 | 1.092522 | | 123650.3 | 0.664976
July '06 | 125046 | 114295 | 1.094063 | | 118306.7 | 5.389457
Aug '06 | 120342 | 113375.1 | 1.06145 | | 118465.6 | 1.559224
Sept '06 | 111741 | 115012.2 | 0.971557 | | 113686 | -1.74062
Oct '06 | 109466 | 115370.9 | 0.948818 | | 105755.5 | 3.389605
Nov '06 | 115732 | 115729.5 | 1.000022 | | 106138.8 | 8.289126
Dec '06 | 112807 | 112057.2 | 1.006691 | | 111104.2 | 1.509473
Jan '07 | 128082 | 116446.7 | 1.09992 | | 125256.5 | 2.206027
Feb '07 | 112052 | 116805.3 | 0.959306 | | 111873.4 | 0.159366
Mar '07 | 79875 | 117163.9 | 0.681737 | | 83109.6 | -4.04958
Apr '07 | 136233 | 117522.5 | 1.159207 | | 145853.6 | -7.06184
May '07 | 124027 | 117881.2 | 1.052136 | | 128720.5 | -3.78424
June '07 | | 118239.8 | | 1.082532 | 129107.2 |
July '07 | | 118598.4 | | 1.031955 | 123508.7 |
Aug '07 | | 118957 | | 1.029568 | 123655.6 |
Sept '07 | | 119315.6 | | 0.984436 | 118648.4 |
Oct '07 | | 119674.2 | | 0.912445 | 110355.1 |
Nov '07 | | 120032.8 | | 0.995091 | 120768.7 |
Dec '07 | | 120391.5 | | 0.951694 | 115901.6 |
Jan '08 | | 120750.1 | | 1.069072 | 130645.6 |
Feb '08 | | 121108.7 | | 0.951436 | 116669.5 |
Mar '08 | | 121467.3 | | 0.704296 | 86659.9 |
Apr '08 | | 121825.9 | | 1.231622 | 152062.1 |
May '08 | | 122184.5 | | 1.083105 | 134180.3 |
  • 31. 31
  • 32.
Rt (level) = 7530.833, Gt (trend) = 232.21
Month | Sales (Yt) | Yt^ (deseasonalized) | Seasonal factor | Seasonal index | Forecasted demand (Ft) | % variation
June '05 | 5580 | 7763.043 | 0.71879 | | 7073.096 | -26.758
July '05 | 8250 | 7995.253 | 1.031862 | | 8739.312 | -5.93105
Aug '05 | 8250 | 8227.463 | 1.002739 | | 8242.473 | 0.091238
Sept '05 | 9165 | 8459.673 | 1.083375 | | 8108.181 | 11.53103
Oct '05 | 8280 | 8691.883 | 0.952613 | | 7685.767 | 7.176727
Nov '05 | 9310 | 8924.093 | 1.043243 | | 9216.261 | 1.006867
Dec '05 | 9835 | 9156.303 | 1.074123 | | 9020.762 | 8.278982
Jan '06 | 9820 | 9388.513 | 1.045959 | | 9863.728 | -0.4453
Feb '06 | 7760 | 9620.723 | 0.806592 | | 8007.905 | -3.19465
March '06 | 6212 | 9852.933 | 0.630472 | | 6937.039 | -11.6716
April '06 | 15723 | 10085.14 | 1.559026 | | 14403.85 | 8.389929
May '06 | 12660 | 10317.35 | 1.227059 | | 11451.72 | 9.544068
June '06 | 11641 | 10549.56 | 1.103458 | | 9611.962 | 17.4301
July '06 | 12445 | 10781.77 | 1.154263 | | 11785.15 | 5.30211
Aug '06 | 11024 | 11013.98 | 1.000909 | | 11034.08 | -0.0914
Sept '06 | 9374 | 11246.19 | 0.833527 | | 10778.92 | -14.9874
Oct '06 | 9365 | 11478.4 | 0.81588 | | 10149.74 | -8.37947
Nov '06 | 11971 | 11710.61 | 1.022235 | | 12094.01 | -1.02756
Dec '06 | 10704 | 11942.82 | 0.896271 | | 11766.03 | -9.92184
Jan '07 | 12848 | 12175.03 | 1.055274 | | 12791.29 | 0.441369
Feb '07 | 10647 | 12407.24 | 0.858128 | | 10327.29 | 3.002793
Mar '07 | 9829 | 12639.45 | 0.777644 | | 8898.912 | 9.462696
Apr '07 | 16700 | 12871.66 | 1.297424 | | 18383.63 | -10.0816
May '07 | 13010 | 13103.87 | 0.992836 | | 14544.61 | -11.7956
June '07 | | | | 0.911124 | 12150.83 |
July '07 | | | | 1.093063 | 14830.99 |
Aug '07 | | | | 1.001824 | 13825.68 |
Sept '07 | | | | 0.958451 | 13449.67 |
Oct '07 | | | | 0.884246 | 12613.71 |
Nov '07 | | | | 1.032739 | 14971.76 |
Dec '07 | | | | 0.985197 | 14511.3 |
Jan '08 | | | | 1.050617 | 15718.86 |
Feb '08 | | | | 0.83236 | 12646.68 |
Mar '08 | | | | 0.704058 | 10860.78 |
Apr '08 | | | | 1.428225 | 22363.41 |
May '08 | | | | 1.109948 | 17637.5 |
  • 33. Domstal Tab
Rt (level) = 908464.975, Gt (trend) = -19904.11
Month | Sales (Y) | Y^ (deseasonalized) | Seasonal factor | Seasonal index | Forecasted demand | % variation
June '05 | 1601814 | 888560.8633 | 1.802 | | 2159685 | -35%
July '05 | 1127094 | 868656.752 | 1.297 | | 1150926 | -2%
Aug '05 | 754860 | 848752.6407 | 0.889 | | 1247370 | -65%
Sept '05 | 780661 | 828848.5294 | 0.941 | | 486365 | 38%
Oct '05 | 366338 | 808944.4181 | 0.452 | | 314832 | 14%
Nov '05 | 519499 | 789040.3068 | 0.658 | | 980747 | -89%
Dec '05 | 1153666 | 769136.1955 | 1.499 | | 1007405 | 13%
Jan '06 | 93867 | 749232.0842 | 0.125 | | 232877 | -148%
Feb '06 | 161767 | 729327.9729 | 0.221 | | 241359 | -49%
March '06 | 207495 | 709423.8616 | 0.292 | | 273507 | -32%
April '06 | 551717 | 689519.7503 | 0.800 | | 680936 | -23%
May '06 | 521824 | 669615.639 | 0.779 | | 849166 | -63%
June '06 | 1987064 | 649711.5277 | 3.058 | | 1579151 | 21%
July '06 | 851742 | 629807.4164 | 1.352 | | 834463 | 2%
Aug '06 | 1250256 | 609903.3051 | 2.049 | | 896345 | 28%
Sept '06 | 136720 | 589999.1938 | 0.231 | | 346209 | -153%
Oct '06 | 185576 | 570095.0825 | 0.325 | | 221874 | -20%
Nov '06 | 1005490 | 550190.9712 | 1.827 | | 683866 | 32%
Dec '06 | 593723 | 530286.8599 | 1.119 | | 694563 | -17%
Jan '07 | 253332 | 510382.7486 | 0.496 | | 158637 | 37%
Feb '07 | 215842 | 490478.6372 | 0.440 | | 162316 | 25%
Mar '07 | 225209 | 470574.5259 | 0.478 | | 181422 | 19%
Apr '07 | 529518 | 450670.4146 | 1.174 | | 445060 | 16%
May '07 | 756852 | 430766.3033 | 1.756 | | 546272 | 28%
June '07 | | | | 2.43054243 | 998618 |
July '07 | | | | 1.32494925 | 518000 |
Aug '07 | | | | 1.46965034 | 545320 |
Sept '07 | | | | 0.58679561 | 206053 |
Oct '07 | | | | 0.38918846 | 128917 |
Nov '07 | | | | 1.24296128 | 386986 |
Dec '07 | | | | 1.30978815 | 381721 |
Jan '08 | | | | 0.31082059 | 84398 |
Feb '08 | | | | 0.33093342 | 83273 |
Mar '08 | | | | 0.38553344 | 89338 |
Apr '08 | | | | 0.98755149 | 209184 |
May '08 | | | | 1.26813932 | 243377 |
A graph showing correlation among forecast, sales, orders, stock available for the Domstal tab.
  • 34. [Figure: Forecast accuracy June'05-May'07 (sales & orders). Chart by month (June '05 to May '07), y-axis 0 to 3,000,000, with four series: sales, forecasted demand (sales), orders, forecasted demand (order).] A graph showing the variation of forecast to sales and orders.
  • 35. [Figure: Tracking forecasting accuracy. Line chart of % variation (y-axis, -250% to 100%) by month (x-axis, June '05 to May '07) with four series: % variation sales vs demand, % variation orders vs demand, % variation sales vs past forecast, % variation orders vs past forecast, plus linear trend lines for the first two.] For the above SKU, the past forecasts have shown much fluctuation. But this model works on some basic assumptions and hence has limitations:
  • 36. It needs data of two cycles, but TORRENT has many products that were launched more recently, so the method fails for products with less data. The method concentrates on only 3 parameters, which is too few, as there are many other probable factors which affect the actual sales. So the method will not be able to give accurate results. The method may also contain certain biases, as the constants are initialized by the forecaster. So it is not advisable to carry on with the triple exponential smoothing method for forecasting. A more robust, flexible and inclusive model needs to be chosen and fitted to the data.
Need of another method
Another method must be applied, which can include every parameter affecting the actual sales:
· A method which is adjustable to any change regarding the parameters.
· One which gives very significant results.
· One which gives elaborate explanations about the steps taken.
· One which gives less error.
· One which increases the forecast accuracy and effectiveness to a significant level.
· An inclusive method: when a new parameter is identified later, it should be able to take it into account.
Multiple Regression with 'n' factors
General Purpose
The general purpose of multiple regression (the term was first used by Pearson, 1908) is to learn more about the relationship between several independent or predictor variables and a dependent or criterion variable.
Overview
Multiple regression, a time-honored technique going back to Pearson's 1908 use of it, is employed to account for (predict) the variance in an interval dependent, based on linear combinations of interval, dichotomous, or dummy independent variables.
Multiple regression can establish that a set of independent variables explains a proportion of the variance in a dependent variable at a significant level (through a significance test of R2), and can establish the relative predictive importance of the independent variables (by comparing beta weights). Power terms can be added as independent variables to explore curvilinear effects. Cross-product terms can be added as independent variables to explore interaction effects. One can test the significance of difference of two R2's to determine if adding an independent variable to the model helps significantly. Using hierarchical regression, one can see how most variance in the dependent can be explained by one or a set of new independent variables, over and above that explained by an earlier set. Of course, the estimates (b coefficients and constant) can be used to construct a prediction equation and generate predicted scores on a variable for further analysis. The multiple regression equation takes the form y = b1x1 + b2x2 + ... + bnxn + c. The b's are the regression coefficients, representing the amount the dependent variable y changes when the corresponding independent changes 1 unit. The c is the constant, 36
  • 37. where the regression line intercepts the y axis, representing the amount the dependent y will be when all the independent variables are 0. The standardized versions of the b coefficients are the beta weights, and the ratio of the beta coefficients is the ratio of the relative predictive power of the independent variables. Associated with multiple regression is R2, the squared multiple correlation, which is the percent of variance in the dependent variable explained collectively by all of the independent variables. Multiple regression shares all the assumptions of correlation: linearity of relationships, the same level of relationship throughout the range of the independent variable ("homoscedasticity"), interval or near-interval data, absence of outliers, and data whose range is not truncated. In addition, it is important that the model being tested is correctly specified. The exclusion of important causal variables or the inclusion of extraneous variables can change markedly the beta weights and hence the interpretation of the importance of the independent variables.
Key Terms and Concepts
The regression equation takes the form Y = b0 + b1*x1 + b2*x2 + e, where Y is the true dependent, the b's are the regression coefficients for the corresponding x (independent) terms, b0 is the constant or intercept, and e is the error term reflected in the residuals. Sometimes this is expressed more simply as y = b0 + b1*x1 + b2*x2, where y is the estimated dependent and b0 is the constant (which includes the error term). Equations such as that above, with no interaction effects (see below), are called main effects models. In MS Excel, select Tools, Data Analysis, Regression. In SPSS, select Analyze, Regression, Linear; select your dependent and independent variables; click Statistics; select Estimates, Confidence Intervals, Model Fit; continue; OK.
Predicted values, also called fitted values, are the values of each case based on using the regression equation for all cases in the analysis.
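To make the regression equation and the idea of predicted (fitted) values concrete, the sketch below fits Y = b0 + b1*x1 + b2*x2 by ordinary least squares using only the Python standard library. It is an illustration, not the Excel or SPSS procedure: the helper names `solve`, `fit_ols` and `predict` and the data are hypothetical.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_ols(xs, y):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + list(x) for x in xs]  # leading 1 carries the intercept b0
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, x):
    """Predicted value b0 + b1*x1 + ... + bk*xk for one case."""
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], x))


# Hypothetical data: two independents (x1, x2) per case and the dependent y.
xs = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [13.5, 11.8, 23.2, 21.7, 30.1]

coefs = fit_ols(xs, y)                      # [b0, b1, b2]
fitted = [predict(coefs, x) for x in xs]    # predicted (fitted) values
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # observed minus predicted
```

With an intercept in the model, the residuals sum to zero up to rounding, which is a quick sanity check on any fit.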
In SPSS, dialog boxes use the term PRED to refer to predicted values and ZPRED to refer to standardized predicted values. Click the Save button in SPSS to add and save these as new variables in your dataset. Adjusted predicted values are the values of each case based on using the regression equation for all cases in the analysis except the given case. Residuals are the difference between the observed values and those predicted by the regression equation. Interaction effects are sometimes called moderator effects because the interacting third variable which changes the relation between two original variables is a moderator variable which moderates the original relationship. For instance, the relation between income and conservatism may be moderated depending on the level of education. The regression coefficient, b, is the average amount the dependent increases when the independent increases one unit and other independents are held constant. Put another way, the b coefficient is the slope of the regression line: the larger the b, the steeper the slope, the more the dependent changes for each unit change in the independent. The b coefficient is the unstandardized simple regression coefficient for the case of one independent. When there are two or more independents, the b 37
  • 38. coefficient is a partial regression coefficient, though it is common simply to call it a "regression coefficient" also. In SPSS, Analyze, Regression, Linear; click the Statistics button; make sure Estimates is checked to get the b coefficients (the default).
b coefficients compared to partial correlation coefficients. The b coefficient is a semi-partial coefficient, in contrast to partial coefficients as found in partial correlation. The partial coefficient for a given independent variable removes the variance explained by control variables from both the independent and the dependent, then assesses the remaining correlation. In contrast, a semi-partial coefficient removes the variance only from the independent. That is, where semi-partial coefficients look at the total variance of the dependent variable, partial coefficients look at the variance in the dependent after variance accounted for by control variables is removed. Thus the b coefficients, as semi-partial coefficients, reflect the unique (independent) contributions of each independent variable to explaining the total variance in the dependent variable.
Dynamic inference is drawing the interpretation that the dependent changes b units because the independent changes one unit. That is, one assumes that there is a change process (a dynamic) which directly relates unit changes in x to b changes in y. This assumption implies two further assumptions which may or may not be true: (1) b is stable for all subsamples of the population (cross-unit invariance) and thus is not an artificial average which is often unrepresentative of particular groups; and (2) b is stable across time when later re-samples of the population are taken (cross-time invariance).
t-tests are used to assess the significance of individual b coefficients, specifically testing the null hypothesis that the regression coefficient is zero.
A common rule of thumb is to drop from the equation all variables not significant at the .05 level or better. Note that restricted variance of the independent variable in the particular sample at hand can be a cause of a finding of no significance. Like all significance tests, the t-test assumes randomly sampled data. In SPSS, Analyze, Regression, Linear; click the Statistics button; make sure Estimates is checked to get t and the significance of b.
Level-importance is the b coefficient times the mean for the corresponding independent variable. The sum of the level importance contributions for all the independents, plus the constant, equals the mean of the dependent variable. Achen (1982: 72) notes that the b coefficient may be conceived as the "potential influence" of the independent on the dependent, while level importance may be conceived as the "actual influence." This contrast is based on the idea that the higher the b, the more y will change for each unit increase in x, but the lower the mean for the given independent, the fewer actual unit changes will be expected. By taking both the magnitude of b and the magnitude of the mean value into account, level importance is a better indicator of the expected actual influence of the independent on the dependent. Level importance is not computed by SPSS.
The beta weights are the regression (b) coefficients for standardized data. Beta is the average amount, in standard deviations, by which the dependent increases when the independent increases one standard deviation and other independent variables are held constant. If an independent variable has a beta weight of .5, this means that when it increases by one standard deviation and other independents are held constant, the dependent variable will increase by half (.5) a standard deviation. The ratio of the beta weights is the ratio of the estimated unique predictive importance of the independents. Note that the betas will change if variables or interaction terms are added or deleted from the equation.
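Level importance and beta weights can both be derived from b coefficients that have already been estimated. The sketch below assumes hypothetical data constructed so that y = 5 + 2*x1 + 3*x2 exactly, which means the b coefficients are known without running a fit. The conversion beta = b * sd(x) / sd(y) is the standard relation between raw and standardized coefficients and is not spelled out in the text above.

```python
from statistics import mean, stdev

# Hypothetical data built so that y = 5 + 2*x1 + 3*x2 holds exactly.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [5 + 2 * a + 3 * b for a, b in zip(x1, x2)]
constant, b1, b2 = 5.0, 2.0, 3.0  # assumed to be the already-estimated coefficients

# Level importance: b coefficient times the mean of its independent variable.
level_importance = {"x1": b1 * mean(x1), "x2": b2 * mean(x2)}

# The contributions plus the constant add up to the mean of the dependent.
reconstructed_mean = sum(level_importance.values()) + constant

# Beta weight: b rescaled by the ratio of standard deviations.
beta = {
    "x1": b1 * stdev(x1) / stdev(y),
    "x2": b2 * stdev(x2) / stdev(y),
}
```

Here x1 and x2 have the same spread, so the ratio of the beta weights equals the ratio of the b coefficients; with differently scaled variables the two rankings can diverge, which is why only betas are compared across variables.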
Reordering the variables without adding or deleting will not affect the beta weights. That is, the beta weights help assess the unique importance of the independent variables relative to the given model embodied in the regression equation. Note that adding or subtracting variables from the model can cause the b and 38
  • 39. beta weights to change markedly, possibly leading the researcher to conclude that an independent variable initially perceived as unimportant is actually an important variable. In SPSS, Analyze, Regression, Linear; click the Statistics button; make sure Estimates is checked to get the beta coefficients (the default). Note that the betas reflect the unique contribution of each independent variable. Joint contributions contribute to R-square but are not attributed to any particular independent variable. The result is that the betas may underestimate the importance of a variable which makes strong joint contributions to explaining the dependent variable but which does not make a strong unique contribution. Thus when reporting relative betas, one must also report the correlation of the independent variable with the dependent variable, to acknowledge whether it has a strong correlation with the dependent variable.
Standardized means that for each datum the mean is subtracted and the result divided by the standard deviation. The result is that all variables have a mean of 0 and a standard deviation of 1. This enables comparison of variables of differing magnitudes and dispersions. Only standardized b-coefficients (beta weights) can be compared to judge the relative predictive power of independent variables. Note some authors use "b" to refer to sample regression coefficients, and "beta" to refer to regression coefficients for population data. They then refer to "standardized beta" for what is simply called the "beta weight" here.
Correlation: Pearson's r2 is the percent of variance in the dependent explained by the given independent when (unlike the beta weights) all other independents are allowed to vary. The result is that the magnitude of r2 reflects not only the unique covariance it shares with the dependent, but also uncontrolled effects on the dependent attributable to covariance the given independent shares with other independents in the model.
A rule of thumb is that multicollinearity may be a problem if a correlation is > .90 or several are > .7 in the correlation matrix formed by all the independents.
The intercept, variously expressed as c or b0, is the estimated Y value when all the independents have a value of 0. Sometimes this has real meaning and sometimes it doesn't; that is, sometimes the regression line cannot be extended beyond the range of observations, either back toward the Y axis or forward toward infinity. In SPSS, Analyze, Regression, Linear; click the Statistics button; make sure Estimates is checked to get the intercept, labeled the "constant" (the default). MS EXCEL allows the researcher to check a box to not have an intercept. This is equivalent to forcing the regression line to run through the origin. In rare cases the researcher may know the relation is linear and that the dependent variable is zero when all the independents are zero, in which case the option may be selected.
R2, also called the squared multiple correlation or the coefficient of multiple determination, is the percent of the variance in the dependent explained uniquely or jointly by the independents. R-squared can also be interpreted as the proportionate reduction in error in estimating the dependent when knowing the independents. That is, 1 - R2 is the ratio of the squared error made when using the regression model to guess the value of the dependent to the total squared error made when using only the dependent's mean as the basis for estimating all cases. Mathematically, R2 = (1 - (SSE/SST)), where SSE = error sum of squares = SUM ((Yi - EstYi) squared), where Yi is the actual value of Y for the ith case and EstYi is the regression prediction for the ith case; and where SST = total sum of squares = SUM ((Yi - MeanY) squared). The "residual sum of squares" in SPSS output is SSE and reflects regression error. Thus R-square is 1 minus regression error
  • 40. as a percent of total error and will be 0 when regression error is as large as it would be if you simply guessed the mean for all cases of Y. Put another way, the regression sum of squares/total sum of squares = R-square, where the regression sum of squares = total sum of squares - residual sum of squares. In SPSS, Analyze, Regression, Linear; click the Statistics button; make sure Model fit is checked to get R2. Maximizing R2 by adding variables is inappropriate unless variables are added to the equation for sound theoretical reason. At an extreme, when n-1 variables are added to a regression equation, R2 will be 1, but this result is meaningless. Adjusted R2 is used as a conservative reduction to R2 to penalize for adding variables and is required when the number of independent variables is high relative to the number of cases or when comparing models with different numbers of independents Standard Error of Estimate (SEE), confidence intervals, and prediction intervals. Confidence intervals around the mean are discussed in the section on significance. In regression, however, the confidence refers to more than one thing. Note the confidence and prediction intervals will improve (narrow) if sample size is increased, or the confidence level is decreased (ex., from 95% to 90%). For large samples, SEE approximates the standard error of a predicted value. SEE is the standard deviation of the residuals. In a good model, SEE will be markedly less than the standard deviation of the dependent variable. In a good model, the mean of the dependent variable will be greater than 1.96 times SEE. The confidence interval of the regression coefficient. Based on t-tests, the confidence interval is the plus/minus range around the observed sample regression coefficient, within which we can be, say, 95% confident the real regression coefficient for the population regression lies. Confidence limits are relevant only to random sample datasets. 
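The fit measures discussed above can be computed directly from actual and predicted values. The sketch below is illustrative only: the function name and data are hypothetical. The text does not give formulas for adjusted R2 or SEE, so the usual degrees-of-freedom conventions are assumed here: adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1) and SEE = SQRT(SSE/(n - k - 1)), with k the number of independents.

```python
from math import sqrt


def fit_statistics(actual, predicted, k):
    """Return (R2, adjusted R2, SEE) for a regression with k independent variables."""
    n = len(actual)
    mean_y = sum(actual) / n
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # error sum of squares
    sst = sum((y - mean_y) ** 2 for y in actual)                # total sum of squares
    r2 = 1 - sse / sst
    # Assumed convention: penalize R2 by the degrees of freedom used.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    # Assumed convention: residual standard deviation with n - k - 1 degrees of freedom.
    see = sqrt(sse / (n - k - 1))
    return r2, adj_r2, see


# Hypothetical actual values and regression predictions from a one-variable model.
actual = [10, 12, 14, 16, 18, 20]
predicted = [11, 11, 15, 15, 19, 19]

r2, adj_r2, see = fit_statistics(actual, predicted, k=1)
```

Adjusted R2 is always at most R2, and the gap widens as more independents are added relative to the number of cases.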
If the confidence interval includes 0, then there is no significant linear relationship between x and y. We then do not reject the null hypothesis that x is independent of y. In SPSS, Analyze, Regression, Linear; click Statistics; check Confidence Limits to get t and confidence limits on b.
The confidence interval of y (the dependent variable) is based on the standard error of mean prediction. Some 95 times out of a hundred, the true mean of y will be within the confidence limits around the observed mean of n sampled cases. That is, the confidence interval is the upper and lower bounds for the mean predicted response. Note the confidence interval of y deals with the mean, not an individual case of y. Moreover, the confidence interval is narrower than the prediction interval, which deals with individual cases. Note a number of textbooks do not distinguish between confidence and prediction intervals and confound this difference. In SPSS, select Analyze, Regression, Linear; click Save; under "Prediction intervals" check "Mean" and under "Confidence interval" set the confidence level you want (ex., 95%). Note SPSS calls this a prediction interval for the mean.
The prediction interval of y. For the 95% confidence limits, the prediction interval on a fitted value is the estimated value plus or minus 1.96 times SQRT (SEE squared + S2y), where S2y is the squared standard error of the mean prediction. Prediction intervals are upper and lower bounds for the prediction of the dependent variable for a single case. Thus some 95 times out of a hundred, a case with the given values on the independent variables would lie within the computed prediction limits. The prediction interval will be wider (less certain) than the confidence interval, since it deals with an interval estimate of cases, not means.
In SPSS, select Analyze, Regression, Linear; click Save; under "Prediction intervals" check "Individual" and under "Confidence interval" set the confidence level you want (ex., 95%).
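The fit statistics and the two kinds of interval discussed above can be illustrated with a small calculation. The following is only a sketch for a single independent variable, written in Python rather than SPSS or Excel; the data values, the function name and the point x0 are made up for illustration, and the 1.96 multiplier is the large-sample approximation used in the text (an exact interval would use the t distribution).

```python
import math

def simple_regression_summary(x, y, x0, z=1.96):
    """Fit y = a + b*x by least squares and report fit statistics at x0."""
    n = len(x)
    k = 1  # number of independent variables
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar

    fitted = [a + b * xi for xi in x]
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)

    r2 = 1 - ss_resid / ss_total
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalty for added variables
    see = math.sqrt(ss_resid / (n - k - 1))        # standard error of estimate

    # Standard error of the mean prediction at x0, then of a single case at x0
    se_mean = see * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    se_pred = math.sqrt(see ** 2 + se_mean ** 2)

    y0 = a + b * x0
    return {
        "r2": r2,
        "adj_r2": adj_r2,
        "see": see,
        "fit": y0,
        "confidence_interval": (y0 - z * se_mean, y0 + z * se_mean),
        "prediction_interval": (y0 - z * se_pred, y0 + z * se_pred),
    }

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
summary = simple_regression_summary(x, y, x0=4.5)
for name, value in summary.items():
    print(name, value)
```

Whatever data are used, the prediction interval comes out wider than the confidence interval, because it adds the residual variance (SEE squared) to the variance of the mean prediction.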
  • 41. F test: The F test is used to test the significance of R, which is the same as testing the significance of R², which is the same as testing the significance of the regression model as a whole. If prob(F) < .05, then the model is considered significantly better than would be expected by chance and we reject the null hypothesis of no linear relationship of y to the independents. F is a function of R², the number of independents and the number of cases. F is computed with k and (n - k - 1) degrees of freedom, where k = number of terms in the equation not counting the constant: F = [R²/k] / [(1 - R²)/(n - k - 1)]. In MS Excel, the F test appears in the ANOVA table, which is part of the regression output. Note that the F test is too lenient for the stepwise method of estimating regression coefficients, and an adjustment to F is recommended.
Outliers are data points which lie outside the general linear pattern of which the midline is the regression line. A rule of thumb is that outliers are points whose standardized residual is greater than 3.3 in absolute value (corresponding to the .001 alpha level). The removal of outliers from the data set under analysis can at times dramatically affect the performance of a regression model. Outliers should be removed if there is reason to believe that other variables not in the model explain why the outlier cases are unusual; that is, these cases need a separate model. Alternatively, outliers may suggest that additional explanatory variables need to be brought into the model (that is, the model needs respecification). Another alternative is to use robust regression, whose algorithm gives less weight to outliers but does not discard them.
Multicollinearity is the intercorrelation of independent variables. R²'s among the independents that are near 1 violate the assumption of no perfect collinearity, while high R²'s increase the standard error of the beta coefficients and make assessment of the unique role of each independent difficult or impossible.
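As a worked instance of the F formula given above, the short Python sketch below computes F from R², the number of cases n and the number of independents k. The function name and the input values (R² = 0.75, n = 24, k = 3) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def f_statistic(r2, n, k):
    """F = [R2 / k] / [(1 - R2) / (n - k - 1)], with k and n - k - 1 degrees of freedom."""
    if not 0 <= r2 < 1:
        raise ValueError("r2 must be in the range [0, 1)")
    if n - k - 1 <= 0:
        raise ValueError("need more cases than terms in the equation")
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Hypothetical inputs: R2 = 0.75 from 24 cases and 3 independents
f_value = f_statistic(r2=0.75, n=24, k=3)
print(f_value)  # (0.75 / 3) / (0.25 / 20) = 20.0
```

The resulting value would then be compared with the critical F for 3 and 20 degrees of freedom, or read against the "Significance F" column of the Excel ANOVA table.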
While simple correlations tell something about multicollinearity, the preferred method of assessing multicollinearity is to regress each independent on all the other independents.
Assumptions
Proper specification of the model: If relevant variables are omitted from the model, the common variance they share with included variables may be wrongly attributed to those variables, and the error term is inflated. If causally irrelevant variables are included in the model, the common variance they share with included variables may be wrongly attributed to the irrelevant variables. The greater the correlation of the irrelevant variable(s) with other independents, the greater the standard errors of the regression coefficients for these independents. Omission and irrelevancy can both substantially affect the size of the b and beta coefficients. This is one reason why it is better to use regression to compare the relative fit of two models than to seek to establish the validity of a single model.
Linearity. Regression analysis is a linear procedure. To the extent nonlinear relationships are present, conventional regression analysis will underestimate the relationship. That is, R-square will underestimate the variance explained overall and the betas will underestimate the importance of the variables involved in the non-linear relationship. Substantial violation of linearity thus means regression results may be more or less unusable. Minor departures from linearity will not substantially affect the interpretation of regression output. Checking that the linearity assumption is met is an essential research task when use of regression models is contemplated.
Nonlinear transformations. When nonlinearity is present, it may be possible to remedy the situation through use of exponential or interaction terms.
  • 42. Nonlinear transformation of selected variables may be a pre-processing step, but beware that this runs the danger of overfitting the model to what are, in fact, chance variations in the data. Power and other transform terms should be added only if there is a theoretical reason to do so. Adding such terms runs the risk of introducing multicollinearity in the model. A guard against this is to use centering when introducing power terms (subtract the mean from each score). Correlation and unstandardized b coefficients will not change as the result of centering.
Partial regression plots are often used to assess nonlinearity. These are simply plots of each independent on the x axis against the dependent on the y axis. Curvature in the pattern of points in a partial regression plot shows whether there is a nonlinear relationship between the dependent and any one of the independents taken individually. Note, however, that whereas partial regression plots are preferred for illuminating cases with high leverage, partial residual plots (below) are preferred for illuminating nonlinearities.
Simple residual plots also show nonlinearity but do not distinguish monotone from nonmonotone nonlinearity. These are usually plots of standardized residuals against standardized estimates of Y, the dependent variable. The plot should show a random pattern, with no nonlinearity or heteroscedasticity. In jargon, this will show that the error vector is orthogonal to the estimate vector. Non-linearity is, of course, shown when points form a curve. Non-normality is shown when points are not equally above and below the Y axis 0 line. Non-homoscedasticity is shown when points form a funnel or other shape showing that variance differs as one moves along the Y axis.
Non-recursivity. The dependent cannot also be a cause of one or more of the independents. This is also called the assumption of non-simultaneity or absence of joint dependence. Violation of this assumption causes regression estimates to be biased and means significance tests will be unreliable.
No overfitting.
The researcher adds variables to the equation while hoping that adding each significantly increases R-squared. However, there is a temptation to add too many variables just to increase R-squared by trivial amounts. Such overfitting trains the model to fit noise in the data rather than true underlying relationships. Subsequent application of the model to other data may well see substantial drops in R-squared. Cross-validation is a strategy to avoid overfitting. Under cross-validation, a sample (typically 60% to 80%) is taken for purposes of training the model, then the hold-out sample (the other 20% to 40%) is used to test the stability of R-squared. This may be done iteratively for each alternative model until stable results are achieved.
Unbounded data are an assumption. That is, the regression line produced by OLS can be extrapolated in both directions but is meaningful only within the upper and lower natural bounds of the dependent.
Data are not censored, sample selected, or truncated. There are as many observations of the independents as for the dependents. Collapsing an interval variable into fewer categories leads to attenuation and will reduce R².
Absence of perfect multicollinearity. When there is perfect multicollinearity, there is no unique regression solution. Perfect multicollinearity occurs if independents are linear functions of each other (ex., age and year of birth), when the researcher creates dummy variables for all values of a categorical variable rather than leaving one out, and when there are fewer observations than variables.
Absence of high partial multicollinearity. When there is high but imperfect multicollinearity, a solution is still possible, but as the independents increase in correlation with each other, the standard errors of the regression coefficients will become inflated.
  • 43. High multicollinearity does not bias the estimates of the coefficients, only their reliability. This means that it becomes difficult to assess the relative importance of the independent variables using beta weights. It also means that a small number of discordant cases can potentially affect results strongly. The importance of this assumption depends on the type of multicollinearity. In the discussion below, the term "independents" refers to variables on the right-hand side of the regression equation other than control variables.
Normally distributed residual error: Error, represented by the residuals, should be normally distributed for each set of values of the independents. A histogram of standardized residuals should show a roughly normal curve. An alternative for the same purpose is the normal probability plot, with the observed cumulative probabilities of occurrence of the standardized residuals on the Y axis and the expected normal probabilities of occurrence on the X axis, such that a 45-degree line will appear when the observed distribution conforms to the normally expected one. The F test is relatively robust in the face of small to medium violations of the normality assumption. The central limit theorem implies that even when error is not normally distributed, the sampling distribution of the b coefficient will still be approximately normal when sample size is large. Therefore violations of this assumption usually have little or no impact on substantive conclusions for large samples, but when sample size is small, tests of normality are important.
Additivity. Likewise, regression does not account for interaction effects, although interaction terms (usually products of standardized independents) may be created as additional variables in the analysis. As in the case of adding nonlinear transforms, adding interaction terms runs the danger of overfitting the model to what are, in fact, chance variations in the data. Such terms should be added only when there are theoretical reasons for doing so.
That is, significant but small interaction effects from interaction terms not added on a theoretical basis may be artifacts of overfitting. Such artifacts are unlikely to be replicable on other datasets. Homoscedasticity: The researcher should test to assure that the residuals are dispersed randomly throughout the range of the estimated dependent. Put another way, the variance of residual error should be constant for all values of the independent(s). If not, separate models may be required for the different ranges. Also, when the homoscedasticity assumption is violated "conventionally computed confidence intervals and conventional t-tests for OLS estimators can no longer be justified" (Berry, 1993: 81). However, moderate violations of homoscedasticity have only minor impact on regression estimates (Fox, 2005: 516). No outliers. Outliers are a form of violation of homoscedasticity. Detected in the analysis of residuals and leverage statistics, these are cases representing high residuals (errors) which are clear exceptions to the regression explanation. Outliers can affect regression coefficients substantially. The set of outliers may suggest/require a separate explanation. Some computer programs allow an option of listing outliers directly, or there may be a "case wise plot" option which shows cases more than 2 s.d. from the estimate. To deal with outliers, the researcher may remove them from analysis and seek to explain them on a separate basis, or transforms may be used which tend to "pull in" outliers. These include the square root, logarithmic, and inverse (x = 1/x) transforms. Reliability: Reliability is reduced by measurement error and, since all variables have some measurement error, by having a large number of independent variables. To the extent there is random error in measurement of the variables, the regression coefficients will be attenuated. 
To the extent there is systematic error in the measurement of the variables, the regression coefficients will be simply wrong.
  • 44. (In contrast to OLS regression, structural equation modeling involves explicit modeling of measurement error, resulting in coefficients which, unlike regression coefficients, are unbiased by measurement error.) Note that measurement error terms are not to be confused with residual error of estimate, discussed below.
Population error is uncorrelated with each of the independents. This is the "assumption of mean independence": that the mean error is independent of the x independent variables. This is a critical regression assumption which, when violated, may lead to substantive misinterpretation of output. The (population) error term, which is the difference between the actual values of the dependent and those estimated by the population regression equation, should be uncorrelated with each of the independent variables. Since the population regression line is not known for sample data, the assumption must be assessed by theory. Specifically, one must be confident that the dependent is not also a cause of one or more of the independents, and that the variables not included in the equation are not causes of Y and correlated with the variables which are included. Either circumstance would violate the assumption of uncorrelated error. One common type of correlated error occurs due to selection bias with regard to membership in the independent variable "group" (representing membership in a treatment vs. a comparison group): measured factors such as gender, race, education, etc., may cause differential selection into the two groups and can also be correlated with the dependent variable. When there is correlated error, conventional computations of standard deviations, t-tests, and significance are biased and cannot be used validly.
Note that residual error (the difference between observed values and those estimated by the sample regression equation) will always be uncorrelated with the independents, and therefore the lack of correlation of the residuals with the independents is not a valid test of this assumption.
Independent observations (absence of autocorrelation) leading to uncorrelated error terms. Current values should not be correlated with previous values in a data series. This is often a problem with time series data, where many variables tend to increment over time such that knowing the value of the current observation helps one estimate the value of the previous observation. Spatial autocorrelation can also be a problem when units of analysis are geographic units and knowing the value for a given area helps one estimate the value of the adjacent area. That is, each observation should be independent of each other observation if the error terms are not to be correlated, which would in turn lead to biased estimates of standard deviations and significance.
Having accepted all the assumptions and understood the technicalities of the multiple regression model, it was unanimously decided that the multiple regression model should be used, since demand for pharma products is affected by various parameters to a greater or lesser degree. It was therefore decided to construct a multiple regression model for demand forecasting. Certain steps had to be taken. First of all, suitable software had to be selected to apply the multiple regression model to the product basket of 500+ products. It was found that MS Excel has the facility to apply multiple regression with a certain number of parameters. Let’s first learn how to use the Multiple Regression function in MS Excel.
  • 45. Multiple Regression with MS Excel
To do regression in Excel, you need the Analysis ToolPak add-in to be installed in Excel. This was an option when you installed Excel, but you might not have selected it. If you didn't install it, Excel will ask you for the CD when you try to add the ToolPak. Check that the add-in is installed, and added in, by choosing Add-ins from the Tools menu (as shown below). Then ensure that "Analysis ToolPak" is selected, as shown below. You can now use the data analysis functions in Excel, which include multiple regression. The example that we will work through is taken from dataset 6.1b in the book "Applying regression and correlation" (if you jumped straight in here, that is what these web pages are about). To get to the data analysis function in Excel, you select the Tools menu, and then choose Data Analysis.
  • 46. This gives the following dialog; click on Regression and then click OK. The following dialog appears. In here, we tell Excel about the data that we would like to analyze. The first box is the Input Y Range. Here, we tell Excel about our dependent variable. The dependent variable must be a column, 1 cell wide and N cells long (where N is the number of individuals that we are analyzing). In the dataset we are using, the dependent variable is An, which is the column that goes from cell D1 to cell D41. You can either type this information in directly as D1:D41, or you can select the appropriate data from the spreadsheet. Because we have included row 1, which contains the variable name, we have to tell Excel this by clicking on the "Labels" checkbox.
  • 47. The next stage is to input the independent variables. The independent variables must be a block of data, of k columns (where k is the number of independent variables) and N rows (where N is still the number of people). In the dataset we are using we have three independent variables: hassles, hassles2 and hassles3. (These represent the linear, quadratic and cubic effects of hassles; we are analyzing a non-linear relationship here.) These are held in rows 1 - 41 of columns A, B and C. Again, we can type in A1:C41 or select the data from the spreadsheet; it will have the same effect.
Next we tell Excel where we want the results to be written. It is best to ask for a new sheet: you don't want to accidentally overwrite some of your precious data, and have to go to all of the effort of restoring it from a backup, do you? (You do have a backup, don't you?) We can ask for residuals and standardized residuals to be saved; these will be new columns of numbers created in the new spreadsheet.
Two types of graphs will be drawn automatically if you ask for them.
· A residual plot will draw scatter plots of each independent variable on the x-axis, and the residual on the y-axis.
· A line fit plot will draw scatter plots of each independent variable on the x-axis, and the predicted and actual values of the dependent variable on the y-axis.
You cannot, as far as I have been able to determine, automatically have:
· A scatter plot with the predicted values on the x-axis, and the residuals on the y-axis (although you can calculate these values and save them).
You can also request a normal probability plot. This appears to be a plot of the dependent variable, which is a curious thing to plot, since regression analysis does not assume normal distribution of the dependent variable. The usual plot of this type would be of the residuals, but this is not possible in Excel. The dialog box now looks like this:
  • 48. So, finally, we click OK, and we get a lot of output, written to a new sheet. A note about this output: output from analysis in Excel is usually "live", that is to say, the data are linked to the output. If you change the data, you will change the output. This is not the case for this type of output in Excel. The results of the analysis are "dead" and will not change.
Regression Statistics. The first part of the output is the regression statistics. These are standard statistics which are given by most programs.
ANOVA. The ANOVA table comes next. This gives a test of significance of the R². Note that Excel uses scientific notation by default, so when it says 2.22E-08 it means 2.22 × 10^-8 (i.e. 0.0000000222).
On the next page is shown the summary output given by the regression function in MS Excel.
  • 49. Summary Output

Regression Statistics
Multiple R
R Square
Adjusted R Square
Standard Error
Observations

ANOVA
              df    SS    MS    F    Significance F
Regression
Residual
Total

  • 50.
              Coefficients    Standard Error    t Stat    P-value    Lower 95%    Upper 95%    Lower 95.0%    Upper 95.0%
Intercept
X1
X2
X3
X4
X5
X6
X7
…
  • 51. RESIDUAL OUTPUT

Observation    Ft (forecast)    Residuals (Yt-Ft)
1              98559.34         704.6626
2              108155.6         -280.247
3              116368.6         -281.312
4              123269.4         -62.7746
5              94083.14         -83.1435
6              110911.2         -241.879
7              102224           -57.3114
8              107602.8         10.49856
9              95990.37         -69.035
10             85130.35         9.64867
11             157103.7         102.9695
12             144048.8         -76.1371
13             119017.1         140.9353
14             129806.9         -32.2203
15             112633.6         76.05105
16             112319.5         -5.18338
17             134574.3         176.6968
18             121418.5         125.4875
19             89674.86         88.14016
20             112742.9         -179.894
21             98022.22         117.7834
22             74801.29         -205.29
23             127936           -25.0449
24             94617.27         11.72969
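The residuals listed above can be screened with the checks described earlier in the report (outliers and autocorrelation). The Python sketch below is one possible way to do this and is not part of the Excel output. It standardizes each residual by the sample standard deviation of the residuals, which is a simplification that ignores leverage, and it uses the lag-1 correlation of the residual series as a rough autocorrelation check; both choices, and all function names, are assumptions made for illustration.

```python
import math

# (forecast Ft, residual Yt - Ft) pairs copied from the residual output table
residual_output = [
    (98559.34, 704.6626), (108155.6, -280.247), (116368.6, -281.312),
    (123269.4, -62.7746), (94083.14, -83.1435), (110911.2, -241.879),
    (102224, -57.3114), (107602.8, 10.49856), (95990.37, -69.035),
    (85130.35, 9.64867), (157103.7, 102.9695), (144048.8, -76.1371),
    (119017.1, 140.9353), (129806.9, -32.2203), (112633.6, 76.05105),
    (112319.5, -5.18338), (134574.3, 176.6968), (121418.5, 125.4875),
    (89674.86, 88.14016), (112742.9, -179.894), (98022.22, 117.7834),
    (74801.29, -205.29), (127936, -25.0449), (94617.27, 11.72969),
]

residuals = [r for _, r in residual_output]
n = len(residuals)
mean_resid = sum(residuals) / n
sd_resid = math.sqrt(sum((r - mean_resid) ** 2 for r in residuals) / (n - 1))

def flag_outliers(threshold=3.3):
    """Observation numbers whose standardized residual exceeds the threshold in absolute value."""
    return [i for i, r in enumerate(residuals, start=1)
            if abs((r - mean_resid) / sd_resid) > threshold]

def lag1_autocorrelation(values):
    """Correlation of each value with the previous one in the series."""
    m = sum(values) / len(values)
    num = sum((values[t] - m) * (values[t - 1] - m) for t in range(1, len(values)))
    den = sum((v - m) ** 2 for v in values)
    return num / den

# Mean absolute percentage error, using actual Yt = Ft + residual
mape = 100 * sum(abs(r) / (f + r) for f, r in residual_output) / n

print("outliers beyond 2 s.d.:", flag_outliers(2.0))
print("lag-1 autocorrelation:", lag1_autocorrelation(residuals))
print("MAPE (%):", mape)
```

With the 2 s.d. casewise cut-off, only observation 1 stands out in this table; the 3.3 rule of thumb can be applied by calling flag_outliers() with its default threshold.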
  • 52. Coefficients. The next stage is the coefficients. (Note that here I have converted the numbers to 2 decimal places to save space.) The table gives the coefficient for each parameter, including the intercept (the constant). The standard errors and the t-values follow (the t-value is the coefficient divided by the standard error). Next come the p-value associated with the variable, and the confidence intervals of the parameter estimates (Excel gave these to me twice, even though I didn't ask for them).
Residuals. The final part of the output is the residual information. The observation in the left-hand column is the case number; although Excel never told us about this, it has labeled the first person Observation 1, the second Observation 2, etc. (Note that this is NOT the original row number: Observation 1 was row 2.) The predicted anxiety score is the score that was predicted from the regression equation. The residual is the raw residual, that is, the difference between the predicted score and the actual score on the dependent variable. The final value is the standardized residual (the residuals adjusted to ensure that they have a standard deviation of 1; they have a mean of zero already).
Graphs. Finally, we will have a quick look at the graphs. The first graph is an example of the residual plots: it has hassles on the x-axis and the unstandardized residual on the y-axis. The second graph shows the predicted and actual anxiety scores plotted against hassles3.
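The relationship between a coefficient, its standard error, the t-value and the confidence interval described under "Coefficients" above can be written out in a few lines of Python. This is a sketch only: the coefficient and standard error values are hypothetical, the function name is made up, and 1.96 is used as a large-sample critical value, whereas Excel uses the t distribution with the residual degrees of freedom.

```python
def coefficient_summary(coef, std_err, critical_value=1.96):
    """Return t-value, confidence limits, and whether the interval excludes 0."""
    t_stat = coef / std_err
    lower = coef - critical_value * std_err
    upper = coef + critical_value * std_err
    significant = not (lower <= 0 <= upper)
    return t_stat, lower, upper, significant

# Hypothetical coefficient of 2.5 with a standard error of 1.0
t_stat, lower, upper, significant = coefficient_summary(2.5, 1.0)
print(t_stat, lower, upper, significant)
```

If the interval includes 0, as it would for a coefficient of 1.0 with the same standard error, the coefficient is not significantly different from zero at that confidence level.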
  • 53. By using MS Excel it is possible to apply the Multiple Regression function, as stated above.
Limitation of Regression function. The Regression function produces a sheet which doesn’t change. It is known as a dead sheet. This doesn’t fit our criteria. A dynamic function is needed whose output changes as the data changes. A validation list is used to change the data along with the SKU selected. With the Regression function, we do not get an output which changes with the SKU, and it is not possible to create summary output for the entire product basket. Hence, another function is used to get the changing output: a function called LINEST (Linear Estimation).
LINEST calculates the statistics for a line by using the "least squares" method to calculate a straight line that best fits your data, and returns an array that describes the line. Because this function returns an array of values, it must be entered as an array formula. The equation for the line is:
y = mx + b or
y = m1x1 + m2x2 + ... + b (if there are multiple ranges of x-values)
where the dependent y-value is a function of the independent x-values. The m-values are coefficients corresponding to each x-value, and b is a constant value. Note that y, x, and m can be vectors. The array that LINEST returns is {mn,mn-1,...,m1,b}. LINEST can also return additional regression statistics.
Syntax
LINEST(known_y's,known_x's,const,stats)
Known_y's is the set of y-values you already know in the relationship y = mx + b.
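The ordering of the array {mn,mn-1,...,m1,b} is easy to get wrong, so a small illustration may help. The Python sketch below is not Excel's implementation; it is a minimal least-squares fit, solved through the normal equations, that returns its coefficients in the same reversed order as LINEST with stats omitted. The function names and the data are hypothetical, and the y-values were constructed as y = 2*x1 + 3*x2 + 5 so that the expected result is known.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: x columns may be perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def linest_like(known_y, known_x, const=True):
    """Least-squares fit returning [mn, ..., m1, b], the order LINEST uses.

    known_x is a list of rows, one row of x-values per observation.
    When const is False the constant b is forced to 0.
    """
    k = len(known_x[0])
    rows = [list(r) + ([1.0] if const else []) for r in known_x]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, known_y)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    slopes = beta[:k]
    intercept = beta[-1] if const else 0.0
    return slopes[::-1] + [intercept]

# Hypothetical data built from y = 2*x1 + 3*x2 + 5
known_x = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
known_y = [13, 12, 23, 22, 33, 32]
result = linest_like(known_y, known_x)
print(result)  # approximately [3.0, 2.0, 5.0]: m2 first, then m1, then b
```

Note that the coefficient of the last x column comes first in the result, which matters when LINEST output is read back into a forecasting formula.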
  • 54. If the array known_y's is in a single column, then each column of known_x's is interpreted as a separate variable. If the array known_y's is in a single row, then each row of known_x's is interpreted as a separate variable.
Known_x's is an optional set of x-values that you may already know in the relationship y = mx + b. The array known_x's can include one or more sets of variables. If only one variable is used, known_y's and known_x's can be ranges of any shape, as long as they have equal dimensions. If more than one variable is used, known_y's must be a vector (that is, a range with a height of one row or a width of one column). If known_x's is omitted, it is assumed to be the array {1,2,3,...} that is the same size as known_y's.
Const is a logical value specifying whether to force the constant b to equal 0. If const is TRUE or omitted, b is calculated normally. If const is FALSE, b is set equal to 0 and the m-values are adjusted to fit y = mx.
Stats is a logical value specifying whether to return additional regression statistics. If stats is TRUE, LINEST returns the additional regression statistics, so the returned array is {mn,mn-1,...,m1,b;sen,sen-1,...,se1,seb;r2,sey;F,df;ssreg,ssresid}. If stats is FALSE or omitted, LINEST returns only the m-coefficients and the constant b.
The additional regression statistics are as follows.
Statistic: Description
se1,se2,...,sen: The standard error values for the coefficients m1,m2,...,mn.
seb: The standard error value for the constant b (seb = #N/A when const is FALSE).
r2: The coefficient of determination. Compares estimated and actual y-values, and ranges in value from 0 to 1. If it is 1, there is a perfect correlation in the sample: there is no difference between the estimated y-value and the actual y-value. At the other extreme, if the coefficient of determination is 0, the regression equation is not helpful in predicting a y-value. For information about how r2 is calculated, see "Remarks" later in this topic.
sey: The standard error for the y estimate.
F: The F statistic or the F-observed value. Use the F statistic to determine whether the observed relationship between the dependent and independent variables occurs by chance.
df: The degrees of freedom. Use the degrees of freedom to help you find F-critical values in a statistical table. Compare the values you find in the table to the F statistic returned by LINEST to determine a confidence level for the model. For information about how df is calculated, see "Remarks" later in this topic. Example 4 below shows use of F and df.
ssreg: The regression sum of squares.
ssresid: The residual sum of squares.
  • 55. For information about how ssreg and ssresid are calculated, see "Remarks" later in this topic.
The following illustration shows the order in which the additional regression statistics are returned.
Statistics given by function
coeff(n)        coeff(n-1)    coeff(n-2)    coeff(n-3)    ……
se(n)           se(n-1)       se(n-2)       se(n-3)       ……
coeff of det    S.E
F stats         d.f.
SS reg          SS resid
Fitting Multiple Regression Model
At the SCM department, DPC deals with a vast and scattered product basket. The product basket contains various drugs in the form of tablets, capsules, vials and bottles. Various drugs are combinations of different molecules, and products belong to different molecule classes. We have discussed and identified a certain number of parameters which can affect actual sales; each parameter has to be checked for its impact on actual sales. The question is which parameters to include in the model as independent parameters. One should check the significance and validity of each parameter. After weighing these criteria, a decision should be taken as to which parameters should be included.
ASSUMPTIONS made
Parameters taken into consideration are least correlated.
The multiple regression model follows all the assumptions of correlation.
Data which are collected are accurate.
Future estimates of the parameters are true.
There is no intercept considered.
Data Sources
SAP data files. SAP data files are the files which are extracted from SAP. SAP contains all the data regarding sales, orders, availability, field targets, institutional sales and more, and it holds past data in every form in which it is needed. These data have been fed into SAP in the past. So, to get the data, SAP is used and the extracted data files serve as the data source. Thus, SAP data files are the internal source of the data.
ORG-MARG DATA. ORG-MARG is a market research company. They collect sales data from retail counters. Data collected by ORG is product specific, company specific, industry specific and market specific.
  • 56. Data used for the project is of the Pharmaceuticals sector. TORRENT is a subscriber of the ORG data and uses it for market research and analysis purposes. There is a separate cell at TORRENT which deals with the ORG data. ORG-MARG refreshes the data every month for the most recent past month. ORG-MARG has dedicated software which is used to get the data in the form in which it is needed. ORG data is available on a market basis. The data follows the hierarchy shown below. ORG data is available on a monthly as well as a yearly basis, in units (strips) and in value. ORG also gives the company market share, company market growth, molecule growth, molecule class growth, and the company’s share in the particular sector. It also provides statistics in terms of years: how much market share does the company have, and how much has it gained or lost? ORG provides the data a month later, i.e. in the month of June it provides the data for the month of May.
[Figure: hierarchy of ORG data]
Market: Pharmaceuticals
Molecule class: Tranquilizers
Molecule: Alprazolam
Pack wise: Alprax .5 tab, Alprax Sr 0.5 Tab
Strength wise: Alprax (0.5) (all the products consisting of 0.5 strength)
Brand: Alprax