Value and Applicability of Re-Sampling Techniques
by Edgardo Donovan
RES 601 – Dr. Roger Rensvold
Module 5 – Case Analysis
Monday, September 15, 2008
Complexity is the disease. Resampling (drawing repeated samples from the given data, or population suggested by the data) is a proven cure. Bootstrap, permutation, and other computer-intensive procedures have revolutionized statistics. Resampling is now the method of choice for confidence limits, hypothesis tests, and other everyday inferential problems.
ANONYMOUS, Resample.com, 2008

Despite the technological advances that have made re-sampling tools more accessible to the general public, formal statistical methodology remains the prevailing technique in the majority of research projects, given re-sampling’s limited applicability to small, well-defined, unambiguous sets of data.

Whereas typical research projects involve a single random sampling of a large group of data in an attempt to infer characteristics that apply across its spectrum, re-sampling involves numerous repeated samples drawn from the same, usually small, body of data in an attempt to define the characteristics of the data universally. Only recently has this become easy to do. Re-sampling can involve hundreds of thousands of calculations and was less prevalent when personal computing technology was in its infancy and still rather expensive.

Organizations promoting re-sampling, such as the ones represented at Resample.com, believe that re-sampling eliminates much of the complexity inherent in traditional research methods. They argue that attempting to extend a series of parametric and non-parametric tests from a small sample to better understand greater
phenomena is inferior to re-sampling, which enables analysis of the totality of most sorts of data. What is troubling is that one cannot find any instance of analysis on their web site that examines the perceived advantages and disadvantages of the two methodologies. Rather than provide analysis as to why re-sampling is superior beyond what is discussed above, re-sampling proponents go as far as stating that the growing stream of scientific articles using re-sampling techniques, both as a basic tool and for difficult applications, testifies to re-sampling’s value (Resample.com).

Re-sampling has become increasingly popular as a tool for testing mediation because it does not require the normality assumption to be met, and because it can be used effectively with small samples of fewer than 20 units (Wikipedia). One of the challenges of traditional research, which emphasizes formal hypotheses and significance testing of null hypotheses, is that extreme data variances are in most cases undesirable and can detract from the overall applicability of the research model. Re-sampling smooths out the degree of data variance because it resamples the same groups of data hundreds or even thousands of times. The end result is a more streamlined representation of results. By eliminating the need to ensure that prospective data sets fall within an acceptable range of results, re-sampling mediation renders research less complex. This can be an attractive approach for those who seek to define a full range of possible results accurately and universally.

Mankind has always longed to make sense of the surrounding world and has attempted to categorize social and natural phenomena within a series of artificial constructs based on an array of logical formulae. The beauty of formal hypotheses and significance testing of
null hypotheses is that it does not attempt to define the totality of an environment but instead attempts to derive behavior patterns and predispositions through the thorough analysis of mostly random samples. Some research confines itself to better understanding certain phenomena within very specific contexts and retains its validity for many years. Other research, which attempts to universally define predictable dynamics at both a micro and macro level with little to no context, is usually less successful.

Unfortunately, re-sampling, despite its practical applications in a few areas, is usually used to pursue the latter objective. The main problem with re-sampling is that it is practical only in a few mono-dimensional areas where data set behavior patterns can be universally defined within a handful of parameters. Rather than further illuminate the infinite complexity of the world around us, re-sampling proponents believe that complexity is the problem and that it has to be circumvented (Resample.com).

Chong Ho Yu, in his 2003 paper titled “Resampling methods: concepts, applications, and justification”, states that the obstacles in computing resources and mathematical logic have been removed and that perhaps now researchers will pay more attention to the philosophical justification of re-sampling. In making his case he brings up the “Monte Carlo simulation”, in which researchers make up data and draw conclusions based on many possible scenarios. The name “Monte Carlo” comes from an analogy to the gambling houses on the French Riviera. Years ago some gamblers studied how they could maximize their chances of winning by using simulations to check the probability of occurrence for each possible case in games of chance. Gaming statistical analysis geared towards improving the success of players was actually pioneered by Ed
Thorp in his acclaimed 1962 book “Beat the Dealer”. He devised a somewhat successful statistical methodology based on re-sampling designed towards that end. The contextual basis of his method was the game of Blackjack, which provided a small, contained statistical data set in the form of a deck or two of unshuffled cards. His methodology provided “hit” or “stay” indicators based on which cards had already been dealt and the probability of desirable cards appearing. This method, also known as card-counting, was heralded as a breakthrough but ceased to work once casinos caught on and started to involve three or more decks of continuously shuffled cards in the game. The added level of complexity eliminated the previous 1% advantage of the card-counter and turned the odds back overwhelmingly in favor of the house. Other experts added to the critique of re-sampling vis-à-vis card counting by pointing to the chance of a three-of-a-kind hand. They recognized that the event does not happen very often, and that it would take many hands from an unshuffled deck of cards to estimate its probability (Simon).

Once again we see that complexity is the chief enemy of the re-sampling technique. Re-sampling may work well in small, mono-dimensional, controlled data set environments but loses its efficacy once multidimensional or “complex” variables are added to the equation. The attempt to define multidimensional complex phenomena is the basis for most scientific research, and it is hard to imagine succeeding in that endeavor if one chooses to ignore complexity.

Despite the many weaknesses of the re-sampling methodology, one of the reasons for its continued limited popularity is that it appeals to that facet of the human psyche that longs to render the surrounding world less mysterious, more discernible, and less
unpredictable so that it can be managed more effectively (Levin). However, one can attempt to achieve the latter by operationalizing concepts into qualitative variables, extending that process into quantitative data-gathering, and conducting null-hypothesis analysis, which conveys a sense of order to what may otherwise seem to be abstract ideas or theories.

There may be a more promising future for re-sampling in the area of game theory. The latter is an accepted technique for measuring the likelihood of outcomes in mono-dimensional environments. There are potential extensions of game-theory techniques based on re-sampling in the areas of corporate risk management and military war gaming. Although the latter two still involve complex environments, re-sampling can be used to better define gain/loss propositions as long as the analysis is done at a highly contextualized micro level. For example, a military campaign may attempt to war-game a specific number of similarly modeled aircraft without taking into account other factors such as air superiority, anti-aircraft resources, weather variances, proximity to support bases, pilot ability, etc. In the investment world, one could attempt to resample scenarios based on the past performance of stocks in relation to mono-dimensional variations of inflation, interest rates, etc.

Despite the technological advances that have made re-sampling tools more accessible to the general public, formal statistical methodology remains the prevailing technique in the majority of research projects, given re-sampling’s limited applicability to small, well-defined, unambiguous sets of data.
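To make concrete what the re-sampling procedure discussed in this paper involves, the following is a minimal sketch of a percentile bootstrap, one common form of re-sampling: a small data set is resampled with replacement thousands of times, and the spread of the resampled means is used as a confidence interval. The function name bootstrap_mean_ci, its parameters, and the twelve sample values are assumptions made up for illustration; they do not come from any of the cited sources.

```python
import random
import statistics


def bootstrap_mean_ci(data, n_resamples=10_000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`.

    Hypothetical helper written for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(data)
    # Draw n_resamples samples of size n, with replacement, from the data itself,
    # and record the mean of each resample in ascending order.
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Trim an equal share of resampled means from the lower and upper tails.
    tail = (1.0 - confidence) / 2.0
    lower = means[round(tail * n_resamples)]
    upper = means[round((1.0 - tail) * n_resamples) - 1]
    return lower, upper


# Made-up sample of 12 observations (fewer than 20 units, as discussed above).
sample = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 7.2, 3.9, 5.1, 4.4, 6.3, 5.0]
low, high = bootstrap_mean_ci(sample)
print(f"sample mean = {statistics.fmean(sample):.2f}")
print(f"95% bootstrap interval for the mean: ({low:.2f}, {high:.2f})")
```

The sketch handles a single variable only, which is consistent with the mono-dimensional settings described above; it makes no claim about how re-sampling behaves once additional interacting variables are introduced.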
Bibliography

Anonymous. (2008). Bootstrapping (statistics). Retrieved on 11 August 2008 from http://en.wikipedia.org/wiki/Bootstrapping_(statistics)

Anonymous. (2008). Resampling stats. Retrieved on 11 August 2008 from http://www.resample.com/

Howell, David. (2008). Resampling statistics: randomization and the bootstrap. University of Vermont.

Levin, Joel. (1998). What if there were no more bickering about statistical significance tests? Research in the Schools, 5(2), 43-53.

Simon, Julian. (2008). Why the formal method in statistics is usually theoretically inferior. Retrieved on 11 August 2008 from http://www.graduate.tuiu.com/

Yu, Chong Ho. (2003). Resampling methods: concepts, applications, and justification. Practical Assessment, Research & Evaluation, 8(19). Retrieved September 10, 2008 from http://PAREonline.net/getvn.asp?v=8&n=19