MS Excel Macros/VBA Project Report


Effectiveness of Macros/VBA (Visual Basic for Applications) in MS Excel for Data Analysis
Executive Summary

Due primarily to its widespread availability, Microsoft Excel is often the de facto choice of engineers and scientists in need of software for measurement data analysis and manipulation. Microsoft Excel lends itself well to extremely simple test and measurement applications and the financial uses for which it was designed; however, in an era when companies are forced to do more with less, choosing the appropriate tools to maximize efficiency (and thereby reduce costs) is imperative.

One thing virtually every reader has in common is the need to automate some aspect of Excel. That is what VBA is all about.

Microsoft Excel, used primarily for the management, inspection, analysis, and reporting of acquired or simulated engineering and scientific data, offers efficiency gains and scalability through the features of data post-processing applications.

Visual Basic for Applications (VBA) can be used in conjunction with Microsoft Excel to automate data analysis tasks. In particular, VBA can be used to locate and parse business data, then automatically engage Excel's data processing tools to perform the analysis tasks you want. When the analysis is completed, VBA can then be used to automatically create reports on Excel worksheets or in other applications such as Word and PowerPoint. VBA can also be used to create your own custom data analysis tools.

You're probably aware that people use Excel for thousands of different tasks. Here are just a few examples:

 Keeping lists of things, such as customer names, students' grades, or holiday gift ideas
 Budgeting and forecasting
 Analyzing scientific data
 Creating invoices and other forms
 Developing charts from data
The list could go on and on, but you get the idea. Excel is used for a wide variety of things, and everyone reading this has different needs and expectations regarding Excel.

VBA is a fantastic tool for empowering analysts to build their own solutions to problems. It gives analysts the power to create innovative new bits of kit without learning the sort of heavyweight programming that is the preserve of full-time coders with computer science degrees. What analysts produce in VBA (and I speak from personal experience here) is quite often horrifying to their IT departments. Even very good code by analyst standards is a world away from the way that a good programmer might choose to solve a problem. For one thing, no programmer worth the name would have started their build in VBA.

The thing is, even with that coding deficiency, VBA works. It makes an awful lot of businesses run. And along the way, it has completely hamstrung Microsoft with Excel upgrades, because it is too embedded in too many places to change now.

VBA survived and it prospered. It did that because it met a need, and it is a need that large companies in particular go out of their way to prevent arising with other bits of software. Excel was the Trojan horse that put IT capability into the hands of people who aren't supposed to have it. As VBA starts to show its age, we're in danger of drifting towards a world where centralized IT departments control access to data and to the tools that can work with it.

For example, you might create a VBA program to format and print your month-end sales report. After developing and testing the program, you can execute the macro with a single command, causing Excel to automatically perform many time-consuming procedures. Rather than struggle through a tedious sequence of commands, you can grab a cup of coffee and let your computer do the work, which is how it's supposed to be, right?
Objective of Study

 Detailed study of Macros/VBA (Visual Basic for Applications) and their effectiveness in data analysis.
 The project would enable the reader to gain important insights into MS Excel and VBA with regard to their evolution and future growth.
 The study would also reflect upon the various methods and functions of Excel and VBA needed to stay ahead of the curve.
 The study would also delve into the technological aspects of data analysis with regard to its network spread, communication technologies, emerging technologies, etc.
 The project report would seek to shed light on the futuristic scope of VBA as a development language for businesses, and its impact.

Broad Action Plan

 Detailed exploration of the helpfulness of VBA/macros in data analysis, and of the life cycle of a macro and the inventorization process in ABC Company.
 To gain a historical perspective of MS Excel.
 To study VBA from the perspective of its effectiveness in analyzing data.
 To identify the challenges faced while analyzing the data.
 To analyze future trends and thereby cite business opportunities for effective usage of MS Excel and VBA.
TABLE OF CONTENTS

1. Background
2. Introduction to MS Excel
3. History
4. Introduction to Visual Basic (VB)
   4.1 What is Visual Basic?
5. Introduction to VBA
   5.1 Why VBA?
   5.2 Calculations without VBA
   5.3 Advantages of Using VBA
6. Miscellany
7. Simulation Example
   7.1 What is the algorithm?
   7.2 VBA code for this example
   7.3 A trick to speed up the calculations
8. Reporting in Excel
9. Attributes of Good VBA Models
   9.1 Documenting VBA Models
10. Caveats
    10.1 General Issues
    10.2 Results of Analyses
    10.3 Additional Analyses
    10.4 Requesting Many Analyses
    10.5 Working with Many Columns
11. Beyond VBA
12. Conclusion
13. Recommendations
14. Scope for Future Study
15. References
1. Background

I have experience with measurement, statistics, and data analysis courses spanning the past five years. By and large, my colleagues have come from non-technical fields, particularly education and finance. I generally find that people in education often have a negative attitude toward automation, while people in finance seek out automated methods, a fact that has at times added more than a small bit of extra challenge.

My work has always had a practical bent. I make an effort to automate data research and analysis, and I fully integrate technology into my work life. I cannot always assume that people are as technology-literate as I would like; at times it is necessary to set aside instructional time in order to deal with specific technology topics, as I will mention below.
Main Areas of Work

[Graph: main areas of work, by number of employees]

The graph above displays the main areas of work, on the basis of employees, in which software is used. Today every sector uses software to a greater or lesser extent; the graph shows the percentage of software usage across the main areas of work.
Percentage of Respondents Using Each Package

[Graph: percentage of respondents using each software package]

Organizations use a range of software packages; the graph above shows the percentage of respondents using each one.
2. Introduction to MS Excel

Excel is the backbone of any custom-built financial model, and building one requires good technical Excel skills.

By connecting to almost any type of database (Oracle, IBM, SQL Server, OLAP), Excel can retrieve data from your corporate databases and files; you don't have to retype the data that you want to analyze in Excel. You can also refresh your financial spreadsheets and summaries automatically from the original source database whenever the database is updated with new information.

A powerful and easy-to-use operational or financial model in Excel provides decision makers with analytical capabilities to assess the outcomes of a range of scenarios.

Good financial management and financial governance are at the core of good management. They help to drive performance by supporting effective decision making, aiding the efficient running of organizations, and maximizing the effective use of resources. Good financial management is also essential to maintain the stewardship and accountability of public funds. The way government bodies collect, analyze, and utilize financial management information directly impacts the performance of their organizations and the delivery of their objectives.

Financial Modeling with Excel

A financial model is a complex spreadsheet that is structured, dynamic, and flexible. It contains a set of variable assumptions, inputs, outputs, calculations, and scenarios. The objective is, by changing input data, to explore relationships between several variables and to test the effects of these changes on the outputs of an ad hoc scenario. It allows you to simulate a wide range of scenarios and run sensitivity analyses in a short period of time. And this can be done faster and more easily in an Excel spreadsheet than in any analytics application (SAP, SAS, Siebel, etc.).
Concretely, a financial model can be used to meet different needs or objectives. It can be a business case, a profitability analysis, a budget, a reporting tool, or a forecasting study. All of these are efficient tools that help executives and managers monitor, manage, and run their business. Because they are present at all levels and in all departments of a company, Excel spreadsheets are the natural solution. The challenge is to design low-maintenance, user-friendly reporting tools that automatically consolidate, analyze, transform, update, and present the information needed.

 Business Cases

A business case has to translate business ideas from vague concepts into a concrete set of numbers, and to score high in credibility, accuracy, and practical value in order to be the financial backbone of the project's concepts.

A business case is based on what-if analysis and scenario management, essential to answer typical business case analysis questions, and has a strong focus on cash-flow evaluation of strategic business decisions in order to assess the financial feasibility of the project.

A business case is referred to frequently during the project, to determine whether it is currently on track. And at the end of the project, success is measured against the ability to meet the objectives defined in the business case. So the completion of a business case is critical to the success of the project.

 Cost and Profitability Analysis

A cost and profitability analysis helps to determine where resources should be allocated to maximize profit. This type of analysis not only serves as a tool to make more informed decisions, but can also identify ways to improve business processes. For example, it helps to identify the most profitable customers in order to focus on them. The well-known 80/20 rule states that 80% of your profit usually comes from 20% of your customers, but which 20%?

A profitability analysis may be based not only on customers, but also on products or activities.

 Budgeting

Creating, monitoring, and managing a budget is key to business success. It should help you allocate resources where they are needed. It is extremely important to know how much money you have to spend, and where you are spending it. A budget is the most effective way to control your cash flow and to keep your business, and its finances, on track, allowing you to invest in new opportunities at the appropriate time.

If your business is growing, you may not always be able to be hands-on with every part of it. You may have to split your budget up between different areas or departments such as sales, production, marketing, administration, etc.

You'll find that money starts to move in many different directions through your organization; budgets are a vital tool in ensuring that you stay in control of expenditure. A budget is a plan to:

 control your finances
 ensure you can continue to fund your current commitments
 enable you to make confident financial decisions and meet your objectives
 ensure you have enough money for your future projects

You should stick to your budget as far as possible, but review and revise it as needed. Successful businesses often have a rolling budget.
 Management Reporting

Management reporting keeps track of all changes, comparing historical data versus actuals, or original projections versus reality. We assist in setting up and monitoring effective and timely management reports, using financial and non-financial data, to:

 Measure the business weekly, monthly, and annually
 Effectively handle market changes and manage associated costs
 Set up internal business practices and structures for reporting internally
 Pinpoint problem areas

 Forecasting

Once you have created a budget and related it to actual numbers, you can create a dynamic rolling forecast which can be updated on a regular basis. This can be done on a weekly or even daily basis to give you a more accurate and up-to-date picture of cash flow and profit and loss.

Before you start forecasting, remember that revenue projections are only as meaningful as your baseline data. Make sure the data is complete, correct, and ordered. There needs to be enough historical sales data to perform an accurate analysis, typically seven to ten time periods; the longer the forecast timeline, the more accurate the forecast. The data must be ordered from oldest to newest. If there is any missing data for a time period, estimate the number as accurately as possible. The time periods need to be uniform; for example, compare months to months or years to years.
3. History
4. Introduction to Visual Basic (VB)

Visual Basic is one of the most popular programming languages in the market today. Microsoft has positioned it to fit multiple purposes in development. The language ranges from lightweight VBScript programming to application-specific programming with VB for Applications.

4.1 What is Visual Basic?

The "visual" part refers to the method used to create the GUI (graphical user interface). Rather than writing numerous lines of code to describe the appearance and location of interface elements, we simply add prebuilt objects into place on the screen.

VB is a high-level programming language that evolved from an earlier DOS-era language called BASIC. VB is event-driven: VB programs are made up of many subprograms, each with its own program code; each can be executed independently, and at the same time each can be linked to the others in one way or another.

VB is designed to deploy applications across the enterprise and to scale to any size needed. The ability to develop object models, database integration, server components, and Internet/intranet applications provides an extensive range of capabilities and tools to the developer. In particular, VB lets us add menus, textboxes, command buttons, option buttons, check boxes, scroll bars, and file and directory boxes to blank windows. We can communicate with other Windows applications and, perhaps most importantly, we have an easy method to let users control and access databases.
5. Introduction to VBA

Visual Basic for Applications, Excel's powerful built-in programming language, permits you to easily incorporate user-written functions into a spreadsheet. A user can easily calculate Black-Scholes and binomial option prices, for example.

In case you think VBA is something esoteric which you will never otherwise need to know: VBA is now the core macro language for all Microsoft Office products, including Word. It has also been incorporated into software from other vendors. You need not write complicated programs using VBA in order for it to be useful to you. At the very least, knowing VBA will make it easier for you to analyze relatively complex problems for yourself.

This document presumes that you have a basic knowledge of Excel, including the use of built-in functions and named ranges. I do not presume that you know anything about writing macros or programming. The examples here are mostly related to option pricing, but the principles apply generally to any situation where you use Excel as a tool for numerical analysis.

The Windows version of Excel supports programming through Microsoft's Visual Basic for Applications (VBA), which is a dialect of Visual Basic. Programming with VBA allows spreadsheet manipulation that is awkward or impossible with standard spreadsheet techniques. Programmers may write code directly using the Visual Basic Editor (VBE), which includes a window for writing and debugging code and a code-module organization environment. The user can implement numerical methods, as well as automate tasks such as formatting or data organization, in VBA, and guide the calculation using any desired intermediate results reported back to the spreadsheet.

VBA was removed from Mac Excel 2008, as the developers did not believe that a timely release would allow porting the VBA engine natively to Mac OS X. VBA was restored in the next version, Mac Excel 2011.
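As a first taste of what VBA code looks like, here is a minimal subroutine that reads and writes cells on a worksheet (the sheet name and cell addresses are illustrative, not from the report):

```vb
' Read a value from one cell, compute with it, and write the result back.
Sub DoubleCell()
    Dim x As Double
    x = Worksheets("Sheet1").Range("A1").Value        ' read from the sheet
    Worksheets("Sheet1").Range("B1").Value = 2 * x    ' write the result back
End Sub
```

Everything else in this report, from user-defined functions to Monte Carlo simulation, builds on this basic pattern of reading from and writing to ranges.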
A common and easy way to generate VBA code is by using the Macro Recorder. The Macro Recorder records the actions of the user and generates VBA code in the form of a macro. These actions can then be repeated automatically by running the macro. The macros can also be linked to different trigger types such as keyboard shortcuts, a command button, or a graphic. The actions in the macro can be executed from these trigger types or from the generic toolbar options. The VBA code of the macro can also be edited in the VBE. Certain features, such as loop functions and screen prompts with their own properties, and some graphical display items, cannot be recorded but must be entered into the VBA module directly by the programmer. Advanced users can employ user prompts to create an interactive program, or react to events such as sheets being loaded or changed.

Users should be aware that macro-recorded code may not be compatible from one version of Excel to another. Some code that is used in Excel 2010 cannot be used in Excel 2003. Making a macro that changes cell colors, or making changes to other aspects of cells, may not be backward compatible.

VBA code interacts with the spreadsheet through the Excel Object Model, a vocabulary identifying spreadsheet objects, and a set of supplied functions or methods that enable reading from and writing to the spreadsheet and interaction with its users (for example, through custom toolbars or command bars and message boxes). User-created VBA subroutines execute these actions and operate like macros generated using the Macro Recorder, but are more flexible and efficient.

5.1 Why VBA?

Macros have been used as a development tool since the early days of the Microsoft Office product line. Microsoft Access macros incorporate generalized database functions using existing Microsoft Access capabilities. Errors in a macro can be easily resolved by
using the Microsoft-supplied Help function. The ease with which you can generate macros makes macro development seem easy to accomplish.

You can generate macros by selecting database operations and commands in the Macro window. These macros can then be converted to Microsoft Access VBA. In most cases, you need only make minor edits to the saved code in order to have a functional program. All syntax, spacing, and functionality are included in the saved file, which contains VBA code specific to the particular application being recorded. Unskilled programmers are able to interpret the code and learn how to generate code to accomplish specific tasks. In the process, the novice programmer may gain a useful introduction to VBA code. Building macros can be easier and faster than writing VBA code for simple applications and for making global key assignments; however, more advanced and complex applications are not so easily accomplished using macros.

People tend to consider macros because VBA code is perceived to be more programmatic, offering a variety of options that appear confusing and time-consuming to understand. These options, however, provide developers with tools to extend Microsoft Access capabilities beyond those packaged with the Microsoft Access software. If building or generating macros comes easily and does not consume great amounts of your time, you may want to consider their use, particularly if you want to accomplish rather simple tasks. If, however, you find macros to be time-consuming and tedious, as many have attested, you may want to consider building VBA code. By learning and building upon VBA skills, you acquire a programming skill set that is applicable and portable to various other applications. Macros, on the other hand, are used in many applications, but they are specific to a particular application. Macros, in most cases, are not portable to other applications.

VBA is one of the more easy-to-learn programming languages. It does not require the complex programming techniques that are necessary to program in C++ or other high-level languages. VBA provides a user-friendly, forms-based interface to assign variables and
simplify code development. VBA is a widely used application, so help is available from a variety of sources. A second party would have to know and understand your particular application in order to assist you with building a macro.

VBA can be used to perform any operation that a macro can perform. VBA also allows you to perform a multitude of more advanced operations, including the following:

 Incorporate error-handling modules to assist in the running of your applications
 Integrate Word and Excel features in your database
 Present users with professional forms-based layouts to interface with your database
 Process data in the background
 Create multi-purpose forms
 Perform conditional looping

5.2 Calculations without VBA

Suppose you wish to compute the Black-Scholes formula in a spreadsheet. Suppose also that you have named cells for the stock price (s), strike price (k), interest rate (r), time to expiration (t), volatility (v), and dividend yield (d). You could enter the following into a cell:

=s*EXP(-d*t)*NORMSDIST((LN(s/k)+(r-d+v^2/2)*t)/(v*t^0.5)) - k*EXP(-r*t)*NORMSDIST((LN(s/k)+(r-d-v^2/2)*t)/(v*t^0.5))

Typing this formula is cumbersome, though of course you can copy the formula wherever you would like it to appear. It is possible to use Excel's data table feature to create a table of Black-Scholes prices, but this is cumbersome and inflexible. If you want to calculate option Greeks (e.g. delta, gamma, etc.) you must again enter or copy the formulas into each cell where you want a calculation to appear. And if you decide to
change some aspect of your formula, you have to hunt down all occurrences and make the changes. When the same formula is copied throughout a worksheet, that worksheet potentially becomes harder to modify in a safe and reliable fashion. When the worksheet is to be used by others, maintainability becomes even more of a concern.

Spreadsheet construction becomes even harder if you want to, for example, compute a price for a finite-lived American option. There is no way to do this in one cell, so you must compute the binomial tree in a range of cells and copy the appropriate formulas for the stock price and the option price.

It is not so bad with a 3-step binomial calculation, but for 100 steps you will spend quite a while setting up the spreadsheet. You must do this separately for each time you want a binomial price to appear in the spreadsheet. And if you decide you want to set up a put pricing tree, there is no easy way to edit your call tree to price puts. Of course you can make the formulas quite flexible and general by using lots of "if" statements. But things would become much easier if you could create your own formulas within Excel. You can, with Visual Basic for Applications.

5.3 Advantages of Using VBA

VBA, or Visual Basic for Applications, is the simple programming language that can be used within Excel 2007 (and earlier versions, though there are a few changes that were implemented with the Office 2007 release) to develop macros and complex programs. Its advantages are:

 The ability to do what you normally do in Excel, but a thousand times faster
 The ease with which you can work with enormous sets of data
 The ability to develop analysis and reporting programs downstream from large central databases such as Sybase and SQL Server, and from accounting, financial, and production programs such as Oracle, SAP, and others
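The Black-Scholes calculation from section 5.2 is exactly the kind of formula that benefits from being wrapped in a user-defined function. A sketch of such a function follows; the function name and layout are illustrative, but the arithmetic mirrors the cell formula above:

```vb
' Black-Scholes price of a European call, as a user-defined function.
' Once this sits in a VBA module, a cell formula such as
' =BSCall(s, k, r, t, v, d) works like any built-in function.
Function BSCall(s As Double, k As Double, r As Double, _
                t As Double, v As Double, d As Double) As Double
    Dim d1 As Double, d2 As Double
    d1 = (Log(s / k) + (r - d + v ^ 2 / 2) * t) / (v * Sqr(t))
    d2 = d1 - v * Sqr(t)
    BSCall = s * Exp(-d * t) * WorksheetFunction.NormSDist(d1) _
           - k * Exp(-r * t) * WorksheetFunction.NormSDist(d2)
End Function
```

With the formula defined once in one place, changing it means editing one function rather than hunting down every copy in the worksheet.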
Macros save keystrokes by automating frequently used sequences of commands, and developers use macros to integrate Office with enterprise applications, for example to extract customer data automatically from Outlook e-mails, to look up related information in CRM systems, or to generate Excel spreadsheets from data extracted from enterprise resource planning (ERP) systems.

To create an Excel spreadsheet with functionality beyond the standard defaults, you write code. Microsoft Visual Basic is a programming environment that uses a computer language to do just that. Although VBA is a language of its own, it is in reality derived from the larger Visual Basic computer language developed by Microsoft, which is now the core macro language for all Microsoft applications.

To take advantage of the functionality of the Microsoft Visual Basic environment, there are many suggestions you can use or should follow. Below we will take a look at a few hints and tips for VBA security and protection in Excel, a more in-depth understanding of which can be gained by attending a VBA Excel 2007 course delivered by a Microsoft certified trainer.

 Password protecting the code

As a VBA Excel user you may want to protect your code so that nobody can modify it, and to protect against the loss of intellectual property if people access the source code without permission. This is easily achieved in the VBE by going to Tools > VBAProject Properties > Protection. Check the box and enter a password.

 Hiding worksheets

In any of your Excel workbooks you might want to hide a worksheet that contains sensitive or confidential information from the view of other users of the workbook. If you just hide the worksheet in the standard way, the next user will be able to simply un-hide it; but by using a VBA method to hide and password-protect a
worksheet, without protecting the entire workbook, you will be able to allow other users access without affecting the confidentiality of the data.

 Protecting workbooks

There are different levels of protection for workbooks, from not allowing anyone access to the workbook to not allowing any changes to be made to it, i.e. setting the security to read-only so that no changes can be made to the templates you have created.
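The worksheet-hiding technique described above is usually done through the sheet's Visible property. A minimal sketch (the sheet name and password here are illustrative):

```vb
' Hide a sheet so it does not appear in the normal Unhide dialog,
' and protect the workbook structure so it cannot be re-shown casually.
Sub HideSensitiveSheet()
    ' xlSheetVeryHidden removes the sheet from the Format > Unhide list;
    ' it can only be made visible again from VBA or the VBE.
    Worksheets("Salaries").Visible = xlSheetVeryHidden
    ' Protecting the structure stops users adding/removing/unhiding sheets;
    ' combine this with a VBA project password so the VBE route is closed too.
    ThisWorkbook.Protect Password:="example-password", Structure:=True
End Sub
```

Note that, as the text says, this protects confidentiality of one sheet without locking other users out of the rest of the workbook.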
6. Miscellany

 Getting Excel to generate your macros for you

Suppose you want to perform a task and you don't have a clue how to program it in VBA. For example, suppose you want to create a subroutine to set up a graph. You can set up a graph manually and tell Excel to record the VBA commands which accomplish the same thing. You then examine the result and see how it works. To do this, select Tools | Record Macro | Record New Macro. Excel will record all your actions in a new module located at the end of your workbook, i.e. following Sheet16. You stop the recording by clicking the stop button which should have appeared on your spreadsheet when you started recording. Macro recording is an extremely useful tool for understanding how Excel and VBA work and interact; this is in fact how the Excel experts learn how to write macros which control Excel's actions.

For example, here is the macro code Excel generates if you use the chart wizard to set up a chart using data in the range A2:C4. You can see, among other things, that the selected graph style was the fourth line graph in the graph gallery, and that the chart was titled "Here is the Title". Also, each data series is in a column, and the first column was used as the x-axis (CategoryLabels:=1).

    ' Macro1 Macro
    ' Macro recorded <Date> by <UserName>
    Sub Macro1()
        Range("A2:C4").Select
        ActiveSheet.ChartObjects.Add(196.5, 39, 252.75, 162).Select
        ActiveChart.ChartWizard Source:=Range("A2:C4"), Gallery:=xlLine, _
            Format:=4, PlotBy:=xlColumns, CategoryLabels:=1, SeriesLabels:=0, _
            HasLegend:=1, Title:="Here is the Title", CategoryTitle:="X-Axis", _
            ValueTitle:="Y-Axis", ExtraTitle:=""
    End Sub
 Using multiple modules

You can split up your functions and subroutines among as many modules as you like; functions from one module can call another, for example. Using multiple modules is often convenient for clarity. If you put everything in one module you will spend a lot of time scrolling around.

 Recalculation speed

One unfortunate drawback of VBA, and of most macro code in most applications, is that it is slow. When you are using built-in functions, Excel performs clever internal checking to know whether something requires recalculation (you should be aware that on occasion it appears that this clever checking goes awry and something which should be recalculated isn't). When you write a custom function, however, Excel is not able to perform its checking on your functions, and it therefore tends to recalculate everything. This means that if you have a complicated spreadsheet, you may find very slow recalculation times. This is a problem with custom functions and not one you can do anything about.

There are tricks for speeding things up. Here are two:

• If you are looping a great deal, be sure to declare your looping index variables as integers. This will speed up Excel's handling of these variables. For example, if you use i as the index in a for loop, use the statement

    Dim i As Integer

• While a lengthy subroutine is executing, you may wish to turn off Excel's screen updating. You do this with

    Application.ScreenUpdating = False
This will only work in subroutines, not in functions. If you want to check the progress of your calculations, you can turn screen updating off at the beginning of your subroutine. Whenever you would like to see your calculation's progress (for example, every 100th iteration) you can turn it on and then immediately turn it off again. This will update the display.

• Finally, here is a good thing to know: Ctrl-Break will (usually) stop a recalculation! Remember this. Someday you will thank me for it. Ctrl-Break is more reliable if your macro writes output to the screen or spreadsheet.

 Debugging

We will not go into details here, but VBA has very sophisticated debugging capabilities. For example, you can set breakpoints (i.e. lines in your routine where Excel will stop calculating to give you a chance to see what is happening) and watches (which means that you can look at the values of variables at different points in the routine). Look up "debugging" in the online help.

 Creating an Add-in

Suppose you have written a useful set of option functions and wish to make them broadly available in your spreadsheets. You can make the functions automatically available in any spreadsheet you write by creating an add-in. To do this, you simply switch to a macro module and then Tools | Make Add-in. Excel will create a file with the XLA extension which contains your functions. You can then make these functions automatically available via Tools | Add-ins, browsing to locate your own add-in module if it does not appear on the list.

Any functions available through an add-in will automatically show up in the function list under the set of "user-defined" functions.
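The screen-updating trick described above can be sketched as a small subroutine (the loop bound, cell address, and reporting interval are illustrative):

```vb
' Turn off screen updating during a long loop, but refresh the display
' every 100th iteration so the user can see that progress is being made.
Sub LongCalculation()
    Dim i As Integer
    Application.ScreenUpdating = False
    For i = 1 To 2000
        ' ... the real work of the subroutine would go here ...
        If i Mod 100 = 0 Then
            Range("A1").Value = i               ' report progress to the sheet
            Application.ScreenUpdating = True   ' flush the display once
            Application.ScreenUpdating = False  ' then switch it back off
        End If
    Next i
    Application.ScreenUpdating = True           ' restore normal updating
End Sub
```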
7. Simulation Example

Suppose you have a large amount of money to invest, and that at the end of the next five years you wish to have it fully invested in stocks. It is often asserted in the popular press that it is preferable to invest in the market gradually, rather than all at once. In particular, consider the strategy of each quarter taking a pro rata share of what is left and investing it in stocks. So the first quarter invest 1/20th in stocks, the second quarter invest 1/19th of the money remaining, and so on. It is obvious that the strategy in which we invest in stocks over time should have a smaller average return and a lower standard deviation than a strategy in which we plunge into stocks, but how much lower and smaller? Monte Carlo simulation is a natural tool to address a question like this. We will first see how to structure the problem and then analyze it in Excel. You may not understand the details of how the random stock price is generated. That does not matter for purposes of this example; rather, the important thing is to understand how the problem is structured and how that structure is translated into VBA.

7.1 What is the algorithm?

To begin, we describe the investment strategy and the evolution over time of the portfolio. Suppose we initially have $100 invested in bonds and nothing invested in stock. Let the variables BONDS and STOCK denote the amount invested in each. Let h be the fraction of a year between investments in stock (so, for example, if h = 0.25, there are 4 transfers per year from bonds to stock), and let r, µ, and σ denote the risk-free rate, the expected return on the stock, and the volatility of the stock.

Suppose we switch from bonds to stock 20 times, once a quarter for 5 years. Let n = the number of times we will switch. We need to know the stock price each time we switch. Denote these prices by PRICE(0), PRICE(1), PRICE(2), ..., PRICE(20).
  27. The following example is considerably more complicated than those that precede it. It is designed to illustrate many of the basic concepts in a non-trivial fashion. You may wish to skip it initially, and return to it once you have had some experience with VBA.

If you are thinking about option pricing, you might expect this example to be computed using the risk-neutral distribution. Instead, we will compare the actual payoff distributions of the two strategies in order to compare their means and standard deviations. If we wished to value the two strategies, we would substitute the risk-neutral distribution by replacing the 15% expected rate of return with the 10% risk-free rate. After making this substitution, both strategies would have the same expected payoff of about $161 (100 × 1.1^5). Since both strategies entail buying assets at a fair price, there is no need to perform a valuation! Both will be worth the initial investment of $100.

At the beginning of each period, we first switch some funds from bonds to stock. At the end of the period, we figure out how much we earned over the period. If we wish to switch a roughly constant proportion each period, we could switch 1/20 the first period, 1/19 with 19 periods to go, and so forth. This suggests that at the beginning of period j,

bonds(j) = bonds(j-1) * (1 - 1/(n+1-j))
stock(j) = stock(j-1) + bonds(j-1)/(n+1-j)

At the end of the period we have

stock(j) = stock(j) * price(j)/price(j-1)
bonds(j) = bonds(j) * exp(r * h)

In words, during period j we earn interest on the bonds and capital gains on the stock. We can think of the STOCK(j) and BONDS(j) on the right-hand side as denoting beginning-of-period values after we have allocated some dollars from bonds to stock, and the values on the left-hand side as the end-of-period values after we have earned interest and capital gains. We compute capital gains on the stock by 27
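Before translating these equations into VBA, it can help to sanity-check the allocation rule. The short sketch below is ours, written in Python purely as an illustration (it is not part of the spreadsheet model); it applies the beginning-of-period allocation with the stock price held constant and confirms that after n = 20 switches the bond balance is exhausted and the full $100 sits in stock:

```python
# Check of the allocation recursion with the stock price held constant:
#   stock(j) = stock(j-1) + bonds(j-1) / (n + 1 - j)
#   bonds(j) = bonds(j-1) * (1 - 1 / (n + 1 - j))
n = 20
bonds, stock = 100.0, 0.0
for j in range(1, n + 1):
    transfer = bonds / (n + 1 - j)   # 1/20th of bonds first, then 1/19th, ...
    stock += transfer
    bonds -= transfer
print(bonds, stock)  # → 0.0 100.0
```

A side observation the sketch makes visible: with the price constant, every transfer is exactly $5, which is what switching "a roughly constant proportion each period" is designed to achieve.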
  28. price(j) = price(j-1) * Exp((mu - 0.5 * v ^ 2) * h + v * h ^ 0.5 * WorksheetFunction.NormSInv(Rnd()))

As mentioned above, it is not important if you do not understand this expression. It is the standard way to create a random lognormally-distributed stock price, where the expected return on the stock is mu, the volatility is v, and the length of a period is h. At the end, when j = n, we will invest all remaining bonds in stock and earn returns for one final period.

This describes the stock and bond calculations for a single set of randomly-drawn lognormal prices. Now we want to repeat this process many times. Each time, we will save the results of the trial and use them to compute the distribution.

7.2 VBA code for this example

We will set this up as a subroutine. The first several lines in the routine simply activate the worksheet where we will write the data, and then clear the area. We need two columns: one to store the terminal portfolio value if we invest fully in stock at the outset, the other to store the terminal value if we invest slowly. Note that we have set it up to run 2000 trials, and we also clear 2000 rows. We tell VBA that the variables "bonds", "stock", and "price" are going to be arrays of type Double, but we do not yet know what size to make the arrays. The statement Worksheets("Invest Output").Activate makes the "Invest Output" worksheet the default worksheet, so that all reading and writing will be done to it unless another worksheet is specified.

 1 Sub Monte_invest()
 2   Dim bonds() As Double
 3   Dim stock() As Double
 4   Dim price() As Double
 5   Worksheets("Invest Output").Activate
 6   Range("a1:b2000").Select
 7   Selection.Clear
 8   'number of Monte Carlo trials
 9   iter = 2000
28
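Readers who want to see that the price expression behaves as claimed can reproduce it outside Excel. The following Python fragment is an illustrative stand-in, not part of the VBA model; `random.gauss` plays the role of `WorksheetFunction.NormSInv(Rnd())`. It draws many one-period returns and checks that the average log return is close to (mu - 0.5*v^2)*h:

```python
import math
import random

random.seed(42)
mu, v, h = 0.15, 0.3, 0.25   # same parameters as the example
log_returns = []
price = 100.0
for _ in range(200_000):
    z = random.gauss(0.0, 1.0)   # standard normal draw, like NormSInv(Rnd())
    new_price = price * math.exp((mu - 0.5 * v**2) * h + v * math.sqrt(h) * z)
    log_returns.append(math.log(new_price / price))
mean_log = sum(log_returns) / len(log_returns)
print(round(mean_log, 4))   # should be close to (mu - 0.5*v**2)*h = 0.02625
```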
  29. Now we set the parameters. The risk-free rate, mean return on the stock, and volatility are all annual numbers. We invest each quarter, so h = 0.25. There are 20 periods to keep track of, since we invest each quarter for 5 years. Note that once we specify 20 periods, we can dimension the bonds, stock, and price variables to run from 0 to 20. We do this using the ReDim command.

10  'number of reinvestment periods
11  n = 20
12  'reset the dimension of the bonds, stock, and price variables
13  ReDim bonds(0 To n), stock(0 To n), price(0 To n)
14  'length of each period
15  h = 0.25
16  'expected return on stock
17  mu = 0.15
18  'risk-free interest rate
19  r = 0.1
20  'volatility
21  v = 0.3

Now we have an outer loop. Each time through this outer loop, we have one trial; i.e., we draw a series of 20 random stock prices and see what the terminal payoff is from our 2 strategies.

Note that before we run through a single trial we have to initialize our variables: the initial stock price is 100, we have $100 of bonds and no stock, and price(0), the initial stock price, is set to 100.

22  'each time through this loop is one complete iteration
23  For i = 1 To iter: price(0) = 100
24    bonds(0) = 100
25    stock(0) = 0

This is the heart of the program. Each period for 20 periods we perform our allocation as above. Note that we draw a new random stock price using our standard lognormal expression. 29
  30.
26  For j = 1 To n
27    'allocate 1/(n+1-j) of bonds to stock
28    stock(j) = stock(j-1) + bonds(j-1) / (n + 1 - j)
29    bonds(j) = bonds(j-1) * (1 - 1 / (n + 1 - j))
30
31    'draw a new lognormal stock price
32    price(j) = price(j-1) * Exp((mu - 0.5 * v ^ 2) * h + _
33        v * h ^ 0.5 * WorksheetFunction.NormSInv(Rnd()))
34
35    'earn returns on bonds and stock
36    bonds(j) = bonds(j) * Exp(r * h)
37    stock(j) = stock(j) * (price(j) / price(j-1))
38
39  Next j

Once through this loop, all that remains is to write the results to the "Invest Output" sheet. The following two statements do that, writing the terminal price to column 1, row i, and the value of the terminal stock position to column 2, row i.

40  ActiveSheet.Cells(i, 1) = price(n)
41  ActiveSheet.Cells(i, 2) = stock(n)
42  Next i
43  End Sub

Note that you could also write the data across in rows: you would do this by writing

ActiveSheet.Cells(1, i) = price(n)

This would write the terminal price across the first row.

7.3 A trick to speed up the calculations

Modify the outer loop by adding the two lines referring to ScreenUpdating:

'each time through this loop is one complete iteration
For i = 1 To iter
    Application.ScreenUpdating = False
    ...
    If (i Mod 100 = 0) Then Application.ScreenUpdating = True
    ActiveSheet.Cells(i, 1) = price(n)
30
  31.
    ActiveSheet.Cells(i, 2) = stock(n)
Next i

The first added line prevents Excel from updating the display as the subroutine is run. It turns out that it takes Excel quite a lot of time to redraw the spreadsheet and graphs when numbers are added.

The second added line redraws the spreadsheet every 100 iterations. The Mod function returns the remainder from dividing the first number by the second. Thus, i Mod 100 will equal 0 whenever i is evenly divisible by 100. So on iteration numbers 100, 200, and so on, the spreadsheet will be redrawn. This cuts the calculation time approximately in half.

Note that Application.ScreenUpdating is an example of a command which only works within a subroutine. It will not work within a function. 31
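For readers who want to check the whole routine's logic outside Excel, here is a rough Python translation of the simulation (our illustrative sketch, not the author's code). It runs the gradual strategy of Section 7 alongside a lump-sum strategy on the same random price paths and reports the mean and standard deviation of each terminal value:

```python
import math
import random
import statistics

def run_trials(iter_=2000, n=20, h=0.25, mu=0.15, r=0.10, v=0.30, seed=1):
    """Return (lump-sum terminal values, gradual-strategy terminal values)."""
    random.seed(seed)
    lump, gradual = [], []
    for _ in range(iter_):
        price = [100.0]
        bonds, stock = 100.0, 0.0
        for j in range(1, n + 1):
            # allocate 1/(n+1-j) of remaining bonds to stock
            stock += bonds / (n + 1 - j)
            bonds *= 1 - 1 / (n + 1 - j)
            # draw a new lognormal stock price
            z = random.gauss(0.0, 1.0)
            price.append(price[-1] * math.exp((mu - 0.5 * v**2) * h
                                              + v * math.sqrt(h) * z))
            # earn returns on bonds and stock
            bonds *= math.exp(r * h)
            stock *= price[-1] / price[-2]
        lump.append(100.0 * price[-1] / price[0])   # fully in stock from day one
        gradual.append(stock)
    return lump, gradual

lump, gradual = run_trials()
print(round(statistics.mean(lump)), round(statistics.pstdev(lump)))
print(round(statistics.mean(gradual)), round(statistics.pstdev(gradual)))
```

With these parameters the lump-sum payoffs come out with both a higher mean and a noticeably higher standard deviation than the gradual strategy, which is exactly the trade-off the example sets out to measure.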
  32. 8. Reporting in Excel

Excel was designed specifically to provide powerful, easy-to-use tools for transforming quantitative analysis into visual representations, and it remains an excellent and extremely efficient way for business analysts to share the results of their work. Familiar, accessible, and widely available, Excel makes it relatively easy to generate attractive, flexible presentations that can be widely distributed. Reports created in Excel also make it easy for others to access the underlying data, cut and paste it into their own spreadsheets, and make full use of it in subsequent work.

Integrating analysis into Excel is fast, easy, and reliable and can be done in any of three different ways, depending on how the results will be used.

• Scheduled Reports: For static reports that are updated on a regular basis (daily, weekly, or monthly, for example), a file-based solution is ideal. The process begins by integrating data sources into Excel. Once the computation is complete, VBA scripts automatically generate tables and graphics. They are delivered to Excel in comma-separated files and Windows metafiles, using a simple VB script to embed the results directly into a preformatted report.

• Interactive Desktop Applications: Where more interactivity is required, an Excel add-in can be created that includes menus and dialogs for controlling the parameters of the report and the data to be analyzed. A VB script is created to run the analysis based on the chosen parameters. A call from Excel to VB is then made using a COM API that initiates the script and inserts the results into the report. This option is best suited to dynamic reports that are distributed to relatively small numbers of end users.

• Client-Server Applications: The server-based option is ideal in situations where interactive reports are created or accessed by larger numbers of users and where 32
  33. the ability to change the underlying analytics quickly, and distribute them widely, is desired. Similar to the client-based solution, an Excel add-in is created that includes menus and dialogs for controlling parameters of the analysis. Excel then uses an HTTP API to call a remote server where the VB script is run. Results are then inserted into Excel. This server-based approach enables organizations to take advantage of the power of server-based distributed technology to generate and disseminate analytics. 33
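To make the "comma-separated files" hand-off under Scheduled Reports concrete, here is a minimal Python sketch of the idea. The table contents are invented placeholders, and any language with a CSV library (including VBA writing a text file) would do the same job:

```python
import csv
import io

# Invented placeholder rows; in a real pipeline these would come from
# the analysis step rather than being typed in by hand.
summary = [
    ["strategy", "mean_payoff", "std_dev"],
    ["lump sum", 211.7, 159.6],
    ["gradual", 186.8, 90.2],
]

buf = io.StringIO()
csv.writer(buf).writerows(summary)
csv_text = buf.getvalue()   # this text is what gets embedded into the report
print(csv_text)
```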
  34. 9. Attributes of Good VBA Models

While VBA models can be widely different from one another, all good ones need to have certain common attributes. In this section I briefly describe the attributes that you should try to build into your models. Some of these apply to Excel models as well. I am including them both here and under Excel so that you can have comprehensive lists of the attributes in both places.

• Realistic

Most models you develop will be directly or indirectly used to make some decisions. The output of the model must therefore be realistic. This means that the assumptions, mathematical relationships, and inputs you use in the model must be realistic. For most "real-world" models, making sure of this takes a lot of time and effort, but this is not a place where you should cut corners. If a model does not produce realistic outputs, it does not matter how good its outputs look or how well it works otherwise.

• Error-Free

It is equally important that a model be error-free. You must test a model extensively to make sure of this. It is generally much easier to fix a problem when a model just does not work or produces obviously wrong answers. It is much harder to find errors that are more subtle and occur for only certain combinations of input values. See the chapter on debugging for help on making your models error-free.

• Flexible

The more different types of questions a model can answer, the more useful it is. In the planning stage, you should try to anticipate the different types of questions the model is likely to be used to answer. You then do not have to make major changes every time someone tries to use it for something slightly different. 34
  35. • Easy to Provide Inputs

Most VBA models need inputs from the user, and the easier it is for the user to provide the inputs, the better. Generally, a VBA model can get inputs either through input dialog boxes (that is, through the InputBox function) or by reading them in from a spreadsheet (or database).

Using input dialog boxes to get input data works well when there are only a few inputs (probably five or fewer). If the model needs more inputs, it is better to set up an input area in a spreadsheet (or, for large models, even a separate input spreadsheet) where the user can enter the input data before running the model.

This approach is particularly helpful if the user is likely to change only one or two inputs from one run to the next. If a model uses a large number of input dialog boxes, the user will have to enter data in each of them every time he runs the model, even if he wants to change only one or two inputs. However, if the user has to provide some input (based on some intermediate outputs) while a procedure is running, then using input dialog boxes is the only option.

If the model uses input dialog boxes, the prompt should provide enough specific information to help the user enter the right data in the right format. Similarly, if the input data is to be provided in certain cells in a spreadsheet, then there should be enough information in the adjacent cells (or nearby) to help the user enter the right data in the right format.

• Good Output Production

A model that does not produce good outputs to get its major results across persuasively is not as useful as it can be. Producing reports with VBA models is generally a two-step process: the model produces outputs on spreadsheets, and then parts or all of the spreadsheets have to be printed out. For printed outputs, good models should include built-in reports (in Excel) that any user can produce easily. The spreadsheet 35
  36. outputs produced by a VBA model should be such that they do not require too much manipulation before creating printed reports. These reports should be attractive, easy to read, and uncluttered. Avoid trying to squeeze too much information onto one page. If a report has to include a lot of data, organize it in layers so that people can start by looking at summary results and then dig into additional details as necessary. One of the advantages of VBA compared to other programming languages is that it can produce excellent graphical outputs using Excel's charting features. VBA models should include graphical outputs wherever they will enhance the usefulness of the models.

Another thing to keep in mind is that, unlike an Excel model, a VBA model does not show intermediate results (except through message boxes, spreadsheet outputs, or charts). The modeler should therefore anticipate what output, intermediate and final, the user may want to see and provide for it in the model.

• Data Validations

It is generally more important to provide thorough data validation in VBA models than it is in Excel models. If the user accidentally enters invalid data, most of the time the model simply will not run; worse, it will not provide any useful information on what the problem is, leaving the user in a helpless situation.

You can, of course, have the VBA code check input data for various possible errors before using them. A simple alternative approach is to have the input data read in from spreadsheets and provide data validation for the input cells on the spreadsheet using Excel's Data Validation feature. (To keep the code short and to avoid repeating the same lines of code, I have generally omitted data validation in the models in this book. Instead of writing data validation code repeatedly, you can create and keep a few Sub procedures for the types of data validation you need for the models you work with most often and call them as needed.) 36
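The pattern described above (check every input before running, and tell the user exactly what is wrong) can be sketched in a few lines. The fragment below is a Python illustration of the idea with made-up parameter names; in a VBA model the same checks would sit at the top of the Sub procedure and report through a MsgBox:

```python
def validate_inputs(params):
    """Collect all validation problems instead of stopping at the first one."""
    problems = []
    if params.get("volatility", -1) <= 0:
        problems.append("volatility must be a positive number")
    if not 0 <= params.get("risk_free_rate", -1) < 1:
        problems.append("risk-free rate should be a decimal between 0 and 1")
    if params.get("periods", 0) < 1:
        problems.append("number of periods must be at least 1")
    return problems

# A bad rate is caught and described, rather than crashing the model later
msgs = validate_inputs({"volatility": 0.3, "risk_free_rate": 1.5, "periods": 20})
print(msgs)  # → ['risk-free rate should be a decimal between 0 and 1']
```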
  37. • Judicious Formatting

The formatting here refers to the formatting of the model's output. Poor, haphazard formatting reduces a model's usefulness because it is distracting. Use formatting (fonts, borders, patterns, colors, etc.) judiciously to make your model's outputs easier to understand and use. (As much as possible, create the formatting parts of the code by recording them with the Macro Recorder.)

• Appropriate Numbers Formatting

In the model's outputs, you should format numbers with the minimum number of decimal places necessary. Using too many decimal places makes numbers difficult to read and often gives a false sense of precision as well. Make the formatting of similar numbers uniform throughout the output. (Remember that displaying numbers with fewer decimal places does not reduce the accuracy of the model in any way, because internally Excel and VBA continue using the same number of significant digits.)

Wherever appropriate, make numbers more readable by using special formatting to show them in thousands or millions.

• Well Organized and Easy to Follow

The better organized a model is, the easier it is to follow and update. The key to making your code organized is to break it down into segments, each of which carries out one distinct activity or set of computations. One way to accomplish this is to use separate Sub procedures and Function procedures for many such segments, especially the ones that will be repeated many times. In the extreme, the main Sub procedure may simply consist of calls to other Sub procedures and Function procedures. An additional advantage of this approach is that you can develop a number of Sub procedures and Function procedures to do things that you often need to do and incorporate them in other code as needed. 37
  38. Using structured programming also makes code easier to follow. In a structured program, the procedure is segmented into a number of stand-alone units, each of which has only one entry and one exit point. Control does not jump into or exit from the middle of these units.

The proper visual design of code can also make it easier to follow. For example, statements should be properly indented to show clearly how they fit into the various If, For, and other structures. Similarly, each major and minor segment of the code should be separated by blank lines or other means and informatively labeled. (The easiest way to learn these techniques is by imitating well-written code.)

• Statements Are Easy to Read and Understand

Experienced programmers try to make their code as concise as possible, often using obscure features of the programming language. Such code may be admired by other experienced programmers, but it often baffles beginners.

With the high speed of modern PCs, code does not usually have to be concise or highly efficient. It is best to aim for code that is easy to understand, even if that means it has more lines than absolutely necessary. Avoid writing long equations whenever you can. Break them up by doing long calculations in easily understandable steps. Make all variable names short but descriptive, not cryptic. If in a large model you decide to use a naming scheme, try to make it intuitive and provide an explanation of the scheme in the documentation.

• Robust

"Robust" here refers to code that is resistant to "crashing." It often takes significant extra work to make code "bulletproof," and that time and effort may not be justified for much of the code you will write. Nonetheless, the code should guard against obvious problems. For example, unless specified otherwise, VBA code always works with the currently active worksheet. So throughout a code you should make sure that 38
  39. the right worksheet is active at the right time, or else precede cell addresses, and so on, with the appropriate worksheet reference. Using effective data validation for the input data is another way of making your code robust.

• Minimum Hard Coding

Hard-coded values are difficult to change, especially in large models, because there is always the danger of missing them in a few places. It is best to set up any value that may have to be changed later as a variable and use the variable in all equations.

Even for values that are not going to change, it is better to define constants and then use them in the equations. This makes equations easier to read and guards against possible mistakes of typing in the wrong number.

• Good Documentation

Good documentation is key to understanding VBA models and is a must for all but trivial ones. For hints on producing good documentation, see the next section.

9.1 Documenting VBA Models

Documenting a model means putting in writing, diagrams, flowcharts, and so on, the information someone else (or you in the future) will need to figure out what the model does, how it is structured, what assumptions are built into it, and so forth. A user can then make changes to (update) it if necessary. The documentation should also include, for example, notes on any shortcuts you may have taken for now that should be fixed later and any assumptions or data you have used now that may need to be updated later.

There is no standard format or structure for documenting a model. You have to be guided by the objectives mentioned above. Here are some common approaches to documenting your VBA models. Every model needs to be documented differently, and everyone does documentation differently. Over time you will develop your own style. 39
  40. • Including Comments in the Code

The most useful documenting tool in VBA is comments. Comments are notes and reminders you include at various places in the VBA code. You indicate a comment with an apostrophe. Except when it occurs in text within quotation marks, VBA interprets an apostrophe as the beginning of a comment and ignores the rest of the line. You can use an entire line or blocks of lines for comments, or you can put a comment after a statement in a line (for example, to explain something about the statement).

You should include in your code all the comments that may be helpful, but do not go overboard and include comments to explain things that are obvious. Including a lot of superfluous comments can make code harder rather than easier to read. Here are some ideas on the types of comments you may want to include in your code:

• At the beginning of a procedure, include a brief description of what the code does. At times it may also be useful to list the key inputs and outputs and some other information as well.
• Every time significant changes are made to the code, insert comments near the beginning of the code, below the code description, to keep track of the change date, the important changes made at that time, and who made the changes.
• Sometimes it also helps to insert additional comments above or next to the statement(s) that have been changed to explain what was changed and why. Also record who made the change and when.
• If the procedure uses a particular variable naming scheme, then use comments to explain it.
• Use distinctive comment lines (for example, *********) to break down long procedures into sections, and at the beginning of each section include a short name or description of the section.
• Use comments next to a variable to explain what it stands for, where its value came from, and anything else that may be helpful. 40
  41. You can get more ideas about what kinds of comments to include in your code from the examples in this and other books. Over time you will develop your own style of providing comments in code.

Make sure you insert comments as you code. If you put it off until later, your comments may not be as useful, inserting them may take longer because you may have to spend time trying to remember things, and, worst of all, you may never get around to it. If you do not include good comments in your code, modifying it a few months later may take much longer.

• Documenting Larger Models

If you are developing a large model and saving different versions of the workbook as I have suggested, then the workbook should include a worksheet titled "Version Description." In this worksheet, list against each version number the major changes you made to the code in that version. Every time you save your work under a new version name, start a new row of description under that version number in the Version Description worksheet and keep adding to it as you make major changes. The key is to do this as you go along and not wait until later, when you may forget some of the changes you made. This is essentially the history (log) of the model's development. If you ever want to go back to an earlier stage and go in a different direction from there, the log will save you a lot of time. Also, you may want to have several different versions of a model. You can document here how they differ from each other.

For large models, you may also need to create a book of formal documentation (which will include information on why and how certain modeling decisions were made, flowcharts for the model, etc.) and a user's manual. For most of your work, however, documentation of the type I discussed should be adequate. 41
  42. 10. Caveats

We used Excel to do some basic data analysis tasks to see whether it is a reasonable alternative to using a statistical package for the same tasks. We concluded that Excel is a poor choice for statistical analysis beyond textbook examples, the simplest descriptive statistics, or more than a very few columns. The problems we encountered that led to this conclusion fall into four general areas:

• Missing values are handled inconsistently, and sometimes incorrectly.
• Data organization differs according to analysis, forcing you to reorganize your data in many ways if you want to do many different analyses.
• Many analyses can only be done on one column at a time, making it inconvenient to do the same analysis on many columns.
• Output is poorly organized, sometimes inadequately labeled, and there is no record of how an analysis was accomplished.

Excel is convenient for data entry, and for quickly manipulating rows and columns prior to statistical analysis. However, when you are ready to do the statistical analysis, we recommend the use of a statistical package such as SAS, SPSS, Stata, Systat, or Minitab.

Excel is probably the most commonly used spreadsheet for PCs. Newly purchased computers often arrive with Excel already loaded. It is easily used to do a variety of calculations, includes a collection of statistical functions, and has a Data Analysis ToolPak. As a result, if you suddenly find you need to do some statistical analysis, you may turn to it as the obvious choice. We decided to do some testing to see how well Excel would serve as a data analysis application.

To present the results, we will use a small example. The data for this example is fictitious. It was chosen to have two categorical and two continuous variables, so that we could test a variety of basic statistical techniques. Since almost all real data sets have at 42
  43. least a few missing data points, and since the ability to deal with missing data correctly is one of the features that we take for granted in a statistical analysis package, we introduced two empty cells in the data:

Treatment   Outcome   X      Y
1           1         10.2   9.9
1           1         9.7
2           1         10.4   10.2
1           2         9.8    9.7
2           1         10.3   10.1
1           2         9.6    9.4
2           1         10.6   10.3
1           2         9.9    9.5
2           2         10.1   10
2           2                10.2

Each row of the spreadsheet represents a subject. The first subject received Treatment 1, and had Outcome 1. X and Y are the values of two measurements on each subject. We were unable to get a measurement for Y on the second subject, or for X on the last subject, so these cells are blank. The subjects are entered in the order that the data became available, so the data is not ordered in any particular way.

We used this data to do some simple analyses and compared the results with a standard statistical package. The comparison considered the accuracy of the results as well as the ease with which the interface could be used for bigger data sets (i.e., more columns). We used SPSS as the standard, though any of the statistical packages listed would do equally well for this purpose. In this article, when we say "a statistical package," we mean SPSS, SAS, Stata, Systat, or Minitab.

Most of Excel's statistical procedures are part of the Data Analysis ToolPak, which is in the Tools menu. It includes a variety of choices, including simple descriptive statistics, t-tests, correlations, 1- or 2-way analysis of variance, regression, etc. If you do not have a 43
  44. Data Analysis item on the Tools menu, you need to install the Data Analysis ToolPak. Search in Help for "Data Analysis Tools" for instructions on loading the ToolPak.

Two other Excel features are useful for certain analyses, but the Data Analysis ToolPak is the only one that provides reasonably complete tests of statistical significance. Pivot Table, in the Data menu, can be used to generate summary tables of means, standard deviations, counts, etc. Also, you could use functions to generate some statistical measures, such as a correlation coefficient. Functions generate a single number, so using functions you will likely have to combine bits and pieces to get what you want. Even so, you may not be able to generate all the parts you need for a complete analysis.

Unless otherwise stated, all statistical tests using Excel were done with the Data Analysis ToolPak. In order to check a variety of statistical tests, we chose the following tasks:

• Get means and standard deviations of X and Y for the entire group, and for each treatment group.
• Get the correlation between X and Y.
• Do a two-sample t-test to test whether the two treatment groups differ on X and Y.
• Do a paired t-test to test whether X and Y are statistically different from each other.
• Compare the number of subjects with each outcome by treatment group, using a chi-squared test.

All of these tasks are routine for a data set of this nature, and all of them could be easily done using any of the above-listed statistical packages. 44
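For comparison, the first two tasks can be done with a few lines of Python on the example data (our sketch, not part of the original tests). The point is the treatment of the two blank cells: each mean uses all available values of its column, while the correlation uses only the eight rows where both X and Y are present (pairwise deletion):

```python
import math
import statistics

# None marks the two empty cells in the example data sheet
x = [10.2, 9.7, 10.4, 9.8, 10.3, 9.6, 10.6, 9.9, 10.1, None]
y = [9.9, None, 10.2, 9.7, 10.1, 9.4, 10.3, 9.5, 10.0, 10.2]

mean_x = statistics.mean(v for v in x if v is not None)   # 9 values
mean_y = statistics.mean(v for v in y if v is not None)   # 9 values

# correlation uses only the 8 rows where both X and Y are present
pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
xs, ys = zip(*pairs)
mx, my = statistics.mean(xs), statistics.mean(ys)
cov = sum((a - mx) * (b - my) for a, b in pairs)
r = cov / math.sqrt(sum((a - mx) ** 2 for a in xs)
                    * sum((b - my) ** 2 for b in ys))

print(round(mean_x, 3), round(mean_y, 3), round(r, 3), len(pairs))
```

A correct treatment of the missing cells gives means near 10.07 and 9.92 and a strong positive correlation based on 8 pairs; note that the number of pairs is exactly the figure Excel's correlation output fails to report.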
  45. 10.1 General Issues

Enable the Analysis ToolPak

The Data Analysis ToolPak is not installed with the standard Excel setup. Look in the Tools menu. If you do not have a Data Analysis item, you will need to install the Data Analysis tools. Search Help for "Data Analysis Tools" for instructions.

Missing Values

A blank cell is the only way for Excel to deal with missing data. If you have any other missing value codes, you will need to change them to blanks.

Data Arrangement

Different analyses require the data to be arranged in various ways. If you plan on a variety of different tests, there may not be a single arrangement that will work. You will probably need to rearrange the data several ways to get everything you need.

Dialog Boxes

Choose Tools/Data Analysis, and select the kind of analysis you want to do. The typical dialog box will have the following items:

Input Range: Type the upper left and lower right corner cells, e.g., A1:B100. You can only choose adjacent rows and columns. Unless there is a checkbox for grouping data by rows or columns (and there usually is not), all the data is considered as one glop.

Labels: There is sometimes a box you can check to indicate that the first row of your sheet contains labels. If you have labels in the first row, check this box, and your output MAY be labeled with your label. Then again, it may not.

Output location: New Sheet is the default. Or, type in the cell address of the upper left corner of where you want to place the output in the current sheet. New Workbook is 45
  46. another option, which I have not tried. Ramifications of this choice are discussed below. Other items appear depending on the analysis.

Output location

The output from each analysis can go to a new sheet within your current Excel file (this is the default), or you can place it within the current sheet by specifying the upper left corner cell where you want it placed. Either way is a bit of a nuisance. If each output is in a new sheet, you end up with lots of sheets, each with a small bit of output. If you place them in the current sheet, you need to position them appropriately, leave room for adding comments and labels, and accept that changes you make to format one output properly may affect another output adversely. Example: Output from Descriptives has a column of labels such as Standard Deviation, Standard Error, etc. You will want to make this column wide in order to be able to read the labels. But if a simple Frequency output is right underneath, then the column displaying the values being counted, which may just contain small integers, will also be wide.

10.2 Results of Analyses

Descriptive Statistics

The quickest way to get means and standard deviations for an entire group is using Descriptives in the Data Analysis tools. You can choose several adjacent columns for the Input Range (in this case the X and Y columns), and each column is analyzed separately. The labels in the first row are used to label the output, and the empty cells are ignored. If you have more, non-adjacent columns you need to analyze, you will have to repeat the process for each group of contiguous columns. The procedure is straightforward, can manage many columns reasonably efficiently, and empty cells are treated properly.

To get the means and standard deviations of X and Y for each treatment group requires the use of Pivot Tables (unless you want to rearrange the data sheet to separate the two groups). After selecting the (contiguous) data range, in the Pivot Table Wizard's 46
  47. 47. Layout option, drag Treatment to the Row variable area, and X to the Data area. Doubleclick on ―Count of X‖ in the Data area, and change it to Average. Drag X into the Databox again, and this time change Count to StdDev. Finally, drag X in one more time,leaving it as Count of X. This will give us the Average, standard deviation and numberof observations in each treatment group for X. Do the same for Y, so we will get theaverage, standard deviation and number of observations for Y also. This will put a totalof six items in the Data box (three for X and three for Y). As you can see, if you want toget a variety of descriptive statistics for several variables, the process will get tedious. A statistical package lets you choose as many variables as you wish for descriptivestatistics, whether or not they are contiguous. You can get the descriptive statistics for allthe subjects together, or broken down by a categorical variable such as treatment. Youcan select the statistics you want to see once, and it will apply to all variables chosen.Correlations Using the Data Analysis tools, the dialog for correlations is much like the one fordescriptives - you can choose several contiguous columns, and get an output matrix ofall pairs of correlations. Empty cells are ignored appropriately. The output does NOTinclude the number of pairs of data points used to compute each correlation (which canvary, depending on where you have missing data), and does not indicate whether any ofthe correlations are statistically significant. If you want correlations on non-contiguouscolumns, you would either have to include the intervening columns, or copy the desiredcolumns to a contiguous location. A statistical package would permit you to choose non-contiguous columns for yourcorrelations. The output would tell you how many pairs of data points were used tocompute each correlation, and which correlations are statistically significant. 47
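The contiguous-columns restriction applies only to the Data Analysis dialog, not to the underlying worksheet functions, which VBA can call directly. A minimal sketch, assuming (hypothetically) that the two variables sit in columns A and D of Sheet1, rows 2 through 101; the sheet name, columns, and row range are illustrative only:

```vba
' Sketch: correlate two non-contiguous columns without copying them together.
' The layout (Sheet1, A2:A101 and D2:D101) is a hypothetical example.
Sub CorrelNonContiguous()
    Dim r As Double
    With Worksheets("Sheet1")
        ' CORREL accepts any two equal-sized ranges, adjacent or not.
        r = Application.WorksheetFunction.Correl(.Range("A2:A101"), .Range("D2:D101"))
    End With
    MsgBox "r = " & Format(r, "0.000")
End Sub
```

Note that this shares the limitation described above: CORREL returns only the coefficient, not the number of pairs used or any significance test.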
Two-Sample t-test
This test can be used to check whether the two treatment groups differ on the values of either X or Y. In order to do the test, you need to enter a cell range for each group. Since the data were not entered by treatment group, we first need to sort the rows by treatment. Be sure to take all the other columns along with Treatment, so that the data for each subject remain intact. After the data are sorted, you can enter the range of cells containing the X measurements for each treatment. Do not include the row with the labels, because the second group has no label row; consequently your output will not be labeled to indicate that it is for X. If you want the output labeled, you have to copy the cells corresponding to the second group to a separate column, and enter a row with a label for the second group. If you also want to do the t-test for the Y measurements, you'll need to repeat the process. The empty cells are ignored and, other than the problems with labeling the output, the results are correct.

A statistical package would do this task without any need to sort the data or copy it to another column, and the output would always be properly labeled to the extent that you provide labels for your variables and treatment groups. It would also allow you to choose more than one variable at a time for the t-test (e.g. X and Y).

Paired t-test
The paired t-test is a method for testing whether the difference between two measurements on the same subject is significantly different from 0. In this example, we wish to test the difference between X and Y measured on the same subject. The important feature of this test is that it compares the measurements within each subject. If you scan the X and Y columns separately, they do not look obviously different. But if you look at each X-Y pair, you will notice that in every case, X is greater than Y. The paired t-test should be sensitive to this difference. In the two cases where either X or Y is missing, it is not possible to compare the two measures on a subject. Hence, only 8 rows are usable for the paired t-test.
When you run the paired t-test on this data, you get a t-statistic of 0.09, with a 2-tail probability of 0.93. The test does not find any significant difference between X and Y. Looking at the output more carefully, we notice that it says there are 9 observations. As noted above, there should be only 8. It appears that Excel has failed to exclude the observations that did not have both X and Y measurements. To get the correct results, copy X and Y to two new columns and remove the data in the cells that have no value for the other measure. Now re-run the paired t-test. This time the t-statistic is 6.14817 with a 2-tail probability of 0.000468. The conclusion is completely different!

Of course, this is an extreme example. But the point is that Excel does not calculate the paired t-test correctly when some observations have one of the measurements but not the other. Although it is possible to get the correct result, you would have no reason to suspect the results you get unless you are sufficiently alert to notice that the number of observations is wrong. There is nothing in online Help that would warn you about this issue.

Interestingly, there is also a TTEST function, which gives the correct results for this example. Apparently the functions and the Data Analysis tools are not consistent in how they deal with missing cells. Nevertheless, I cannot recommend the use of functions in preference to the Data Analysis tools, because the result of using a function is a single number: in this case, the 2-tail probability of the t-statistic. The function does not give you the t-statistic itself, the degrees of freedom, or any of the other items you would want to see when doing a statistical test.
A statistical package will correctly exclude the cases with one of the measurements missing, and will provide all the supporting statistics you need to interpret the output.

Cross Tabulation and Chi-Squared Test of Independence
Our final task is to count the two outcomes in each treatment group, and use a chi-square test of independence to test for a relationship between treatment and outcome. In order to count the outcomes by treatment group, you need to use Pivot Tables. In the Pivot Table Wizard's Layout option, drag Treatment to Row, and Outcome to Column and also to Data. The Data area should say "Count of Outcome"; if not, double-click on it and select "Count". If you want percents, double-click "Count of Outcome", and click Options; in the "Show Data As" box which appears, select "% of row". If you want both counts and percents, you can drag the same variable into the Data area twice, and use it once for counts and once for percents.

Getting the chi-square test is not so simple, however. It is only available as a function, and the input needed for the function is the observed counts in each combination of treatment and outcome (which you have in your pivot table), and the expected counts in each combination. Expected counts? What are they? How do you get them? If you have sufficient statistical background to know how to calculate the expected counts, and can do Excel calculations using relative and absolute cell addresses, you should be able to navigate through this. If not, you're out of luck.

Assuming that you surmounted the problem of expected counts, you can use the CHITEST function to get the probability of observing a chi-square value bigger than the one for this table. Again, since we are using functions, you do not get many other necessary pieces of the calculation, notably the value of the chi-square statistic itself or its degrees of freedom.

No statistical package would require you to provide the expected values before computing a chi-square test of independence. Further, the results would always include the chi-square statistic and its degrees of freedom, as well as its probability. Often you will get some additional statistics as well.
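For the record, the expected count for each cell is (row total × column total) / grand total. A minimal sketch of computing them in VBA, assuming (hypothetically) the observed counts from the pivot table sit in B2:C3 of Sheet1 as a 2x2 table; it also reports the chi-square statistic and degrees of freedom that CHITEST alone does not give you:

```vba
' Sketch: expected counts and a chi-square test from a table of observed counts.
' The layout (observed 2x2 counts in Sheet1!B2:C3) is a hypothetical example.
Sub ChiSquareFromObserved()
    Dim obs As Range, i As Long, j As Long
    Dim rowTot() As Double, colTot() As Double, grand As Double
    Dim expArr() As Double, e As Double, chiSq As Double
    Set obs = Worksheets("Sheet1").Range("B2:C3")
    ReDim rowTot(1 To obs.Rows.Count)
    ReDim colTot(1 To obs.Columns.Count)
    ' Accumulate the row, column, and grand totals.
    For i = 1 To obs.Rows.Count
        For j = 1 To obs.Columns.Count
            rowTot(i) = rowTot(i) + obs.Cells(i, j).Value
            colTot(j) = colTot(j) + obs.Cells(i, j).Value
            grand = grand + obs.Cells(i, j).Value
        Next j
    Next i
    ReDim expArr(1 To obs.Rows.Count, 1 To obs.Columns.Count)
    For i = 1 To obs.Rows.Count
        For j = 1 To obs.Columns.Count
            ' Expected count = row total * column total / grand total.
            e = rowTot(i) * colTot(j) / grand
            expArr(i, j) = e
            chiSq = chiSq + (obs.Cells(i, j).Value - e) ^ 2 / e
        Next j
    Next i
    MsgBox "chi-square = " & Format(chiSq, "0.000") & _
           ", df = " & (obs.Rows.Count - 1) * (obs.Columns.Count - 1) & _
           ", p = " & Application.WorksheetFunction.ChiTest(obs, expArr)
End Sub
```

This is still more work than any statistical package would ask of you, which is the point of the comparison above.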
10.3 Additional Analyses

The remaining analyses were not done on this data set, but some comments about them are included for completeness.

Simple Frequencies
You can use Pivot Tables to get simple frequencies. (See Cross Tabulation for more about how to get Pivot Tables.) Using Pivot Tables, each column is considered a separate variable, and labels in row 1 will appear on the output. You can only do one variable at a time.

Another possibility is to use the FREQUENCY function. The main advantage of this method is that once you have defined the function for one column, you can use Copy/Paste to get it for other columns. First, you will need to enter a column with the values you want counted (bins). If you intend to do the frequencies for many columns, be sure to enter values for the column with the most categories; e.g., if three columns have values of 1 or 2, and the fourth has values of 1, 2, 3, 4, you will need to enter the bin values as 1, 2, 3, 4. Now select enough empty cells in one column to store the results (4 in this example, even if the current column has only 2 values). Next choose Insert/Function/Statistical/FREQUENCY on the menu. Fill in the input range for the first column you want to count using relative addresses (e.g. A1:A100). Fill in the Bin Range using the absolute addresses of the locations where you entered the values to be counted (e.g. $M$1:$M$4). Click Finish. Note the box above the column headings of the sheet, where the formula is displayed. It starts with "=FREQUENCY(". Place the cursor to the left of the = sign in the formula, and press Ctrl-Shift-Enter. The frequency counts now appear in the cells you selected.

To get the frequency counts of other columns, select the cells with the frequencies in them, and choose Edit/Copy on the menu. If the next column you want to count is one column to the right of the previous one, select the cell to the right of the first frequency cell, and choose Edit/Paste (Ctrl-V). Continue moving to the right and pasting for each column you want to count. Each time you move one column to the right of the original frequency cells, the column to be counted shifts one column right of the first column you counted.

If you want percents as well, you'll have to use the SUM function to compute the sum of the frequencies, and define the formula to get the percent for one cell. Select the cell to store the first percent, and type the formula into the formula box at the top of the sheet, e.g. =N1*100/N$5, where N1 is the cell with the frequency for the first category, and N5 is the cell with the sum of the frequencies. Use Copy/Paste to get the formula for the remaining cells of the first column. Once you have the percents for one column, you can Copy/Paste them to the other columns. You'll need to be careful about the use of relative and absolute addresses! In the example above, we used N$5 for the denominator, so when we copy the formula down to the next frequency in the same column, it will still look for the sum in row 5; but when we copy the formula right to another column, it will shift to the frequencies in the next column.

Finally, you can use Histogram on the Data Analysis menu. You can only do one variable at a time. As with the FREQUENCY function, you must enter a column with "bin" boundaries. To count the number of occurrences of 1 and 2, you need to enter 0, 1, and 2 in three adjacent cells, and give the range of these three cells as the Bins on the dialog box. The output is not labeled with any labels you may have in row 1, nor even with the column letter. If you do frequencies on lots of variables, you will have difficulty knowing which frequency belongs to which column of data.

Linear Regression
Since regression is one of the more frequently used statistical analyses, we tried it out even though we did not do a regression analysis for this example.
The Regression procedure in the Data Analysis tools lets you choose one column as the dependent variable, and a set of contiguous columns for the independents. However, it does not tolerate any empty cells anywhere in the input ranges, and you are limited to 16 independent variables. Therefore, if you have any empty cells, you will need to copy all the columns involved in the regression to new columns, and delete any rows that contain any empty cells. Larger models, with more than 16 predictors, cannot be done at all.

Analysis of Variance
In general, Excel's ANOVA features are limited to a few special cases rarely found outside textbooks, and require a lot of data rearrangement.

One-Way ANOVA
Data must be arranged in separate and adjacent columns (or rows) for each group. Clearly, this is not conducive to doing one-way ANOVAs on more than one grouping. If you have labels in row 1, the output will use the labels.

Two-Factor ANOVA without Replication
This only handles the case with one observation per cell (i.e. no within-cell error term). The input range is a rectangular arrangement of cells, with rows representing levels of one factor, columns the levels of the other factor, and the cell contents the one value in that cell.

Two-Factor ANOVA with Replicates
This does a two-way ANOVA with equal cell sizes. Input must be a rectangular region with columns representing the levels of one factor, and rows representing replicates within levels of the other factor. The input range MUST also include an additional row at the top, and a column on the left, with labels indicating the factors. However, these labels are not used to label the resulting ANOVA table. Click Help on the ANOVA dialog for a picture of what the input range must look like.
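The array-formula steps described under Simple Frequencies can also be driven from VBA, where the FREQUENCY worksheet function returns the counts directly, with no Ctrl-Shift-Enter required. A minimal sketch, assuming (hypothetically) values 1 through 4 in column A of Sheet1, rows 2 through 101:

```vba
' Sketch: frequency counts for one column via the FREQUENCY worksheet function.
' The layout (Sheet1, values 1-4 in A2:A101) is a hypothetical example.
Sub FrequencyCounts()
    Dim bins As Variant, counts As Variant, i As Long
    bins = Array(1, 2, 3, 4)   ' same role as the "bin" column on the sheet
    With Worksheets("Sheet1")
        ' Transpose flattens the returned column array for simple indexing.
        counts = Application.Transpose( _
            Application.WorksheetFunction.Frequency(.Range("A2:A101"), bins))
    End With
    ' FREQUENCY returns one more element than there are bins; the last
    ' element counts any values above the top bin.
    For i = 1 To UBound(counts)
        Debug.Print "bin " & i & ": " & counts(i)
    Next i
End Sub
```

Looping this subroutine over several columns sidesteps the one-variable-at-a-time restriction of both the dialog and the Histogram tool.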
10.4 Requesting Many Analyses

If you had a variety of different statistical procedures that you wanted to perform on your data, you would almost certainly find yourself doing a lot of sorting, rearranging, copying, and pasting. This is because each procedure requires that the data be arranged in a particular way, often different from the way another procedure wants the data arranged. In our small test, we had to sort the rows in order to do the t-test, and copy some cells in order to get labels for the output. We had to clear the contents of some cells in order to get the correct paired t-test, but did not want those cells cleared for other tests. And we were only doing five tasks. It does not get better when you try to do more. There is no single arrangement of the data that would allow you to do many different analyses without making many different copies of the data. The need to manipulate the data in many ways greatly increases the chance of introducing errors.

Using a statistical program, the data would normally be arranged with the rows representing the subjects, and the columns representing variables (as they are in our sample data). With this arrangement you can do any of the analyses discussed here, and many others as well, without having to sort or rearrange your data in any way. Only much more complex analyses, beyond the capabilities of Excel and the scope of this article, would require data rearrangement.

10.5 Working with Many Columns

What if your data had not 4, but 40 columns, with a mix of categorical and continuous measures? How easily do the above procedures scale to a larger problem?

At best, some of the statistical procedures can accept multiple contiguous columns for input, and interpret each column as a different measure. The Descriptives and Correlations procedures are of this type, so you can request descriptive statistics or correlations for a large number of continuous variables, as long as they are entered in adjacent columns. If they are not adjacent, you need to rearrange columns or use copy and paste to make them adjacent.

Many procedures, however, can only be applied to one column at a time. T-tests (either independent or paired), simple frequency counts, the chi-square test of independence, and many other procedures are in this class. This becomes a serious drawback if you have more than a handful of columns, even if you use cut and paste or macros to reduce the work. In addition to having to repeat the request many times, you have to decide where to store the results of each, and make sure each output is properly labeled so you can easily locate and identify it.

Finally, Excel does not give you a log or other record to track what you have done. This can be a serious drawback if you want to be able to repeat the same (or similar) analysis in the future, or even if you've simply forgotten what you've already done.

Using a statistical package, you can request a test for as many variables as you need at once. Each one will be properly labeled and arranged in the output, so there is no confusion as to what's what. You can also expect to get a log, and often a set of commands as well, which can be used to document your work or to repeat an analysis without having to go through all the steps again.
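This repetition problem is exactly where VBA earns its keep: a short macro can apply the same calculation to any list of columns, adjacent or not, and the macro itself serves as a record of what was done. A minimal sketch, assuming (hypothetically) data on Sheet1 in rows 2 through 101; the column list is illustrative only:

```vba
' Sketch: descriptive statistics for many columns, adjacent or not,
' without rearranging the sheet. The column list and row range are hypothetical.
Sub DescribeColumns()
    Dim cols As Variant, c As Variant, rng As Range
    cols = Array("A", "C", "F", "K")   ' any mix of non-adjacent columns
    For Each c In cols
        With Worksheets("Sheet1")
            Set rng = .Range(.Cells(2, c), .Cells(101, c))
        End With
        ' The worksheet functions ignore empty cells, so blanks (missing
        ' data) are handled correctly here.
        Debug.Print c & ": n=" & Application.WorksheetFunction.Count(rng) & _
            "  mean=" & Format(Application.WorksheetFunction.Average(rng), "0.000") & _
            "  sd=" & Format(Application.WorksheetFunction.StDev(rng), "0.000")
    Next c
End Sub
```

Saved in the workbook, the macro can be re-run whenever the data change, which partly answers the lack of a log noted above.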
11. Beyond VBA

When you say "Visual Basic," most developers, particularly those reading this magazine, will think of the Visual Basic development environment that has been the topic of these columns for several years now. So what do I mean by "beyond" Visual Basic? I am interested in exploring the capabilities of Visual Basic wherever it leads me, and that sometimes means going outside the traditional Visual Basic development environment. You'll be surprised at the programming power you'll find.

I am, of course, talking about Visual Basic for Applications, or VBA: the "macro" language that is supported by many Microsoft application programs. I put "macro" in quotes because, while VBA may have its roots in the keyboard-macro tools of the past, which permitted recording and playback of keystroke sequences, it has evolved into something entirely different. In fact, VBA is essentially the regular Visual Basic language modified for use in controlling existing applications rather than creating stand-alone applications. You have the same rich set of language constructs, data types, control statements, and so on available to you. From the perspective of the language itself, a programmer would have trouble telling Visual Basic and VBA apart. Even so, VBA programs are still referred to as macros.

VBA is embedded in many Microsoft applications, most notably those that are part of Microsoft Office: Word, Excel, Access, Outlook, PowerPoint, and FrontPage. VBA has also been licensed by Microsoft to some other publishers of Windows software. You can use VBA in a keyboard-macro mode in which you start recording, perform some actions in the program, and then save the recorded macro to be played back later as needed. While recording macros only scratches the surface of VBA's capabilities, it is nonetheless an extremely useful technique that I use on a daily basis.
It is important to note that a recorded macro is not saved as a sequence of keystrokes, as was the case in some older programs. Rather, it is saved as a Visual Basic subroutine, and the statements that carry out the recorded actions consist primarily of manipulations of the properties and methods of the application's objects.
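A sketch of the kind of code the recorder typically produces (the macro name, range, and formatting choices here are invented for illustration, not taken from any particular recording):

```vba
' Representative of what the macro recorder produces: an ordinary VBA
' subroutine that manipulates object properties and methods, not keystrokes.
Sub FormatHeader()
    Range("A1:D1").Select
    Selection.Font.Bold = True
    With Selection.Interior
        .ColorIndex = 15
        .Pattern = xlSolid
    End With
    Columns("A:D").EntireColumn.AutoFit
End Sub
```

Because the recording is plain Visual Basic, you can open it in the editor and generalize it by hand, which is how recording becomes a starting point rather than the whole story.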
12. Conclusion

Although Excel is a fine spreadsheet, it is not a statistical data analysis package. In all fairness, it was never intended to be one. Keep in mind that the Data Analysis ToolPak is an "add-in": an extra feature that enables you to do a few quick calculations. So it should not be surprising that that is just what it is good for: a few quick calculations. If you attempt to use it for more extensive analyses, you will encounter difficulties due to any or all of the following limitations:

- Potential problems with analyses involving missing data. These can be insidious, in that the unwary user is unlikely to realize that anything is wrong.
- Lack of flexibility in the analyses that can be done, due to its expectations regarding the arrangement of data. This results in the need to cut, paste, sort, and otherwise rearrange the data sheet in various ways, increasing the likelihood of errors.
- Output scattered in many different worksheets, or all over one worksheet, which you must take responsibility for arranging in a sensible way.
- Output that may be incomplete or may not be properly labeled, increasing the possibility of misidentifying output.
- The need to repeat requests for the same analysis multiple times in order to run it for multiple variables, or to request multiple options.
- The need to do some things by defining your own functions or formulae, with the attendant risk of errors.
- No record of what you did to generate your results, making it difficult to document your analysis, or to repeat it at a later time, should that be necessary.

If you have more than about 10 or 12 columns, and/or want to do anything beyond descriptive statistics and perhaps correlations, you should be using a statistical package. There are several suitable ones available by site license through OIT, or you can use them in any of the OIT PC labs. If you have Excel on your own PC, and don't want to pay for a statistical program, by all means use Excel to enter the data (with rows representing the subjects, and columns for the variables). All the mentioned statistical packages can read Excel files, so you can do the (time-consuming) data entry at home, and go to the labs to do the analysis.

I have found Excel to be eminently suitable for use in my measurement and data analysis classes. It is not only suitable, but a very effective and readily available tool for introducing students to contemporary data analysis methods. Excel's fundamental data-table design, coupled with useful chart capabilities, easily leads students down paths which will pave the way for their later application of such systems as SPSS and SAS.