Morningstar ratings and fund performance blake morey
 


JOURNAL OF FINANCIAL AND QUANTITATIVE ANALYSIS, VOL. 35, NO. 3, SEPTEMBER 2000
COPYRIGHT 2000, SCHOOL OF BUSINESS ADMINISTRATION, UNIVERSITY OF WASHINGTON, SEATTLE, WA 98195

Morningstar Ratings and Mutual Fund Performance

Christopher R. Blake and Matthew R. Morey*

Abstract

This study examines the Morningstar rating system as a predictor of mutual fund performance for U.S. domestic equity funds. We also compare the predictive abilities of the Morningstar rating system with those of alternative predictors. The results indicate findings that are robust across different samples, ages and styles of funds, and performance measures. First, low ratings from Morningstar generally indicate relatively poor future performance. Second, there is little statistical evidence that Morningstar's highest-rated funds outperform the next-to-highest and median-rated funds. Third, Morningstar ratings, at best, do only slightly better than the alternative predictors in forecasting future fund performance.

* Blake, Graduate School of Business Administration, Fordham University, 113 West 60th St., New York, NY 10023; Morey, Lubin School of Business, Pace University, 1 Pace Plaza, New York, NY 10038. Morey acknowledges financial support from the Economic and Pension Research Department of TIAA-CREF. We thank Will Goetzmann and Charles Trzcinka for data and Stephen Brown (the editor), Edwin Elton, Steve Foerster, Doug Fore, Martin Gruber, Mark Hulbert, Zoran Ivković (the referee), Richard C. Morey, Derrick Reagle, Emily Rosenbaum, H. D. Vinod, Mark Warshawsky, and seminar participants at the Securities and Exchange Commission and the 1999 European Finance Association Meetings (Helsinki) for helpful comments and suggestions.

I. Introduction

In recent years, increasing attention has been paid to the persistence of mutual fund performance in the finance literature. Yet, to date, there has been considerably less attention devoted to the predictive abilities of the Morningstar five-star mutual fund rating service that many investors use as a guide in their mutual fund selections. This study attempts to fill that void by examining the ability of the Morningstar ratings to predict both unadjusted and risk-adjusted returns in comparison with performance metrics common in the performance literature.

^1 For example, see Hendricks, Patel, and Zeckhauser (1993), Goetzmann and Ibbotson (1994), Malkiel (1995), Brown and Goetzmann (1995), Elton, Gruber, and Blake (1996a), and Carhart (1997).

The question of whether Morningstar ratings predict out-of-sample performance is an important one, given that several studies in the performance literature have documented that new cash flows from investors are related to past performance ratings (see, e.g., Sirri and Tufano (1998) and Gruber (1996)). In fact, there is evidence that high-rated funds experience cash inflows that are far greater
in size than the cash outflows experienced by low-rated funds (see, e.g., Sirri and Tufano (1998) and Goetzmann and Peles (1997)). Hence, examining performance across funds grouped by Morningstar rankings will indicate if these cash flows are justified by subsequent relative performance.

As evidence of the importance of the Morningstar five-star rating service (where a five-star rating is the best and a one-star rating is the worst), consider a recent study, reported in both the Boston Globe and The Wall Street Journal,^2 which found that 97% of the money flowing into no-load equity funds between January and August 1995 was invested into funds that were rated as five- or four-star funds by Morningstar, while funds with less than three stars suffered a net outflow of funds during the same period. Moreover, the heavy use of Morningstar ratings in mutual fund advertising suggests that mutual fund companies believe that investors care about Morningstar ratings. Indeed, in some cases, the only mention of return performance in the mutual fund advertisement is the Morningstar rating. Finally, the importance of Morningstar ratings has been underscored by some recent high-profile publications (e.g., Blume (1998) and Sharpe (1998)) that have investigated the underlying properties of the Morningstar rating system.

^2 Charles Jaffe, "Rating the Raters: Flaws Found in Each Service," Boston Globe, Aug. 27, 1995, p. 78. The same survey was also reported by Karen Damato, "Morningstar Edges Toward One-Year Ratings," The Wall Street Journal, April 5, 1996.

Despite its importance, there is, to our knowledge, only one extant academic study on the predictive abilities of the Morningstar ratings service. Khorana and Nelling (1998) examine the question of persistence of the Morningstar ratings and compare the Morningstar ratings from a group of funds in December 1992 to the June 1995 ratings of those same funds. Although they find evidence of persistence, in that high-rated funds are still high rated and low-rated funds are still low rated, there are a number of problems with the study. First, there is a survivorship bias problem, since the funds were selected at the end of the sample period rather than at the beginning. Hence, any fund that had merged, liquidated, or changed its name between the beginning and ending of the sample period was not included. Second, because Morningstar uses a 10-year risk-adjusted return as a major component of its ratings, and because there are only two and one-half years of data between the beginning and end of the Khorana and Nelling sample, the ratings are based on overlapping data. Consequently, the findings of persistence in the ratings are endemic to the data. Finally, Khorana and Nelling only examine performance persistence as measured by Morningstar ratings; they do not examine how well Morningstar ratings predict more standard measures of performance.^3

^3 It should be noted that Morningstar reports an in-house study, conducted by Laura Lallos in 1997, in which 45% of the five-star funds in 1987 receive five stars in 1997. However, no other comparisons are provided and few details of the study are reported.

In this paper, we examine whether the Morningstar five-star system has any predictive power for the future performance of funds. Our data and methodology are sensitive to the following key issues in mutual fund research.
i) We use a mutual fund data set generated at the time the funds were actually rated by Morningstar and then follow the out-of-sample performance of these funds. This methodology allows us to circumvent the well-known survivorship bias problem described by Brown, Goetzmann, Ibbotson, and Ross (1992), Elton, Gruber, and Blake (1996b), and others.

ii) We adjust returns for front-end and deferred loads, because the Morningstar rating system also adjusts for loads.

iii) We compare the predictive abilities of the Morningstar ratings with those of alternative predictors: in-sample historical average monthly returns, one- and four-index in-sample alphas, and in-sample Sharpe (1966) ratios.

iv) We examine out-of-sample one-, three-, and five-year horizons, so that we can give both short- and long-term analyses of the predictive abilities of Morningstar ratings and the alternative predictors. Moreover, these time horizons are consistent with the historical returns that prospective investors are often provided when considering a mutual fund.

v) We examine the predictive abilities of the Morningstar ratings and the alternative predictors at different times to observe how well they predict in up and down markets.

vi) We address the issue of performance predictability by separating domestic equity funds according to investment style (i.e., aggressive growth, equity-income, growth, growth-income, and small company funds) at the time they were rated (see Brown (1999), Brown and Goetzmann (1997), Elton, Gruber, Das, and Hlavka (1993), and Goetzmann and Ibbotson (1994)).

vii) We explore whether the age of a fund affects performance predictability by separating funds into young (three to five years), middle-aged (five to 10 years), and seasoned (10 years or more) groups.

viii) We measure out-of-sample performance using several well-known performance metrics, including the Sharpe ratio, mean monthly excess returns, a modified version of Jensen's alpha (1968), and a four-index alpha (Elton, Gruber, and Blake (1996a)).

ix) We analyze the results using parametric and non-parametric tests.

The paper is organized as follows. Section II describes the data and relates how the funds were chosen, how Morningstar calculates its ratings, and how the returns data were collected and calculated. Section III describes the methodology of the paper. Section IV presents the Morningstar rating results. Section V presents the alternative predictor results, and Section VI provides the conclusion.

II. Data

A. Sample Groups and Fund Selection Criteria

We examine two broad sample groups in this study. For simplicity, we term these samples seasoned funds 1992-1997 and complete funds 1993.
1. Seasoned Funds 1992-1997

For the first sample group, we use the beginning-of-the-year Morningstar On-Disk or Principia programs from 1992 to 1997 to select mutual funds.^4 We use the beginning-of-the-year disks to simplify the data so that we always examine calendar years. Moreover, we start at the beginning of the year 1992 since this corresponds to the first beginning-of-the-year On-Disk program.

^4 These correspond to the January 1992 On-Disk, January 1993 On-Disk, January 1994 On-Disk, January 1995 On-Disk, January 1996 On-Disk, and the January 1997 Principia. (In October 1996 On-Disk changed to Principia.) The On-Disks begin in October 1991.

By using the actual Morningstar disks, we know all the funds available to investors selecting funds based on Morningstar ratings at the time of the Morningstar evaluation, thus circumventing any possible survivorship bias problems. Data prior to the beginning of the On-Disk program are available from Morningstar on a proprietary basis; however, these data include only the surviving funds. Funds that were rated at the time of the Morningstar rating that were merged or liquidated at some later date are not available.^5 Since the use of such data would introduce a severe survivorship bias, they are not used in our study.

^5 We thank Peter Carrillo of Morningstar for this point.

From the beginning-of-the-year disks we then select funds based on three criteria. First, we select only "domestic equity" funds as identified by Morningstar's "Investment Class." From the domestic equity funds, we then select all funds within each of the following five Morningstar "investment objectives" (styles): aggressive growth, equity-income, growth, growth and income, and small company. This allows us to examine whether there is a "style effect" on fund performance predictability. It is important to note that the designation for the "investment objective" is determined by Morningstar, usually based on the wording in the fund's prospectus. However, in some cases, Morningstar may give a fund an investment objective different from that implied by the fund's name or by the fund's prospectus if Morningstar determines that the fund invests in a way that is inconsistent with the wording in its prospectus.

Since we examine the out-of-sample performance, we also examine if the funds retain their classifications by Morningstar in the out-of-sample periods. In every sample examined, at least 85% of the funds retain their style classification at the end of the sample period.^6 Hence, according to Morningstar, the vast majority of funds do not change their style of management.

^6 To obtain this percentage, we examined only the funds in the sample that did not merge or liquidate during the out-of-sample period.

Second, each fund had to have at least 10 years of returns at the time it was rated by Morningstar; e.g., funds rated by Morningstar in January 1993 had to have returns data starting from, at the latest, January 1983. We use the 10-year cutoff because the 10-year in-sample period utilizes Morningstar's base-line rating system. As stated earlier, Morningstar provides each mutual fund with a one- to five-star summary rating. To obtain this summary rating, Morningstar takes a weighted average of the three-, five-, and 10-year risk-adjusted returns, where the weights are 20% on the three-year return, 30% on the five-year return,
and 50% on the 10-year return. Due to its importance in Morningstar's rankings, we used the 10-year time period as a criterion in selecting funds.

Third, funds had to be open at the time they were rated by Morningstar. Any fund that was closed to new investors at the time of the rating was excluded from our analysis to maintain a sample of funds that could actually be invested in at the time of the ratings.^8

^8 The number of closed funds that met our other criteria was as follows: January 1992 sample: 11 funds; January 1993 sample: 11 funds; January 1994 sample: 19 funds; January 1995 sample: 19 funds; January 1996 sample: 28 funds; January 1997 sample: 37 funds.

2. Complete 1993 Sample

The above sample group excludes many Morningstar-rated funds simply because of our criterion that funds must have 10 years of in-sample data at the time they are rated. While its base-line rating system uses a combination of the three-, five-, and 10-year returns, Morningstar still rates funds with less than 10 years of returns. So long as a fund has at least three years of returns history, it can receive a summary star rating from Morningstar. Funds with three to five years of returns history are rated using a system that puts a 100% weighting on their three-year past performance; funds with five to 10 years of returns history are rated using a system that puts a 40% weighting on the three-year return and a 60% weighting on the five-year return.

By excluding younger funds, we miss out on another interesting aspect of the Morningstar rating system. Since younger funds are rated only on short-term returns (i.e., the three-year return) whereas older funds are rated on a combination of the three-, five-, and 10-year returns, the younger fund ratings are particularly sensitive to the overall performance of the market. In a bull market (as in the late 1990s), young equity funds could receive higher ratings not because of better short-term performance, but rather because the rating system only evaluates them during a time when the market is doing exceptionally well.^9 This could alter the predictive ability of the ratings.

^9 Blume (1998), in a study utilizing only 1996 data, provides some evidence that there is a relatively high percentage of young funds that are classified as five-star or one-star funds.

The problem with including younger funds in our first sample group is that the number of funds to examine is onerous. As a tradeoff, we create another sample in which we include virtually all open aggressive growth, equity-income, growth, growth-income, and small company funds that were rated by Morningstar in January 1993.^10 By using the 1993 data, we only have to examine the out-of-sample performance of 635 funds as opposed to well over 2000 funds if we were to use the 1996 or 1997 On-Disk/Principia programs. Furthermore, by using January 1993 rated funds, we are still able to follow out-of-sample performance out to five years.

^10 Twenty-four funds met our other criteria, yet were listed as closed funds in January 1993; we excluded them from the sample.

In summary, the complete funds 1993 sample includes all open funds rated by Morningstar and listed as aggressive growth, equity-income, growth, growth-income, or small company. Hence, this includes young funds (between three and five years of return history at the time they were rated), middle-aged funds (between five and 10 years of historical returns at the time they were rated), and seasoned funds (10 or more years of returns at the time they were rated).

As with the seasoned funds samples, we also examine the number of funds that change their investment objective in the out-of-sample periods. Similar to the seasoned funds sample, at least 85% of the funds do not change their Morningstar investment objective over the course of the out-of-sample period.^11

B. Problem Funds

To examine the out-of-sample forecast ability of Morningstar ratings, we obtain the out-of-sample monthly returns of these funds. For a majority of the funds, obtaining the out-of-sample returns is simply a matter of following the previously rated fund. However, because a minority of funds have either gone through a name change, a merger, a combination of both, or because they have liquidated, identifying out-of-sample returns for those funds is more complicated. We describe how we handle these problematic funds.

We use the Morningstar data^12 and The Wall Street Journal to identify the name changes. We then simply use the newly named fund's returns as the out-of-sample returns.

For the merged funds, we used the Morningstar data and The Wall Street Journal to ascertain the month of the fund merger. However, when these two sources did not provide the necessary information, we called the individual mutual fund companies. Once the merger month was identified, we then collected the out-of-sample returns by the following procedure. First, until the fund merges, we simply use the out-of-sample returns of the fund in question. After the fund merges into its partner fund, we assume the investor randomly reinvests into one of the other surviving funds with the same investment objective as the merged fund in our sample.
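The merged-fund procedure can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the argument layout, and the zero-based month index are hypothetical, and the equally weighted peer average stands in for the expected outcome of random reinvestment among surviving funds with the same investment objective.

```python
from statistics import mean


def splice_merged_fund_returns(own_returns, peer_returns, merger_month):
    """Build an out-of-sample return series for a fund that merges.

    own_returns  : monthly returns of the fund itself, available for
                   months 0 .. merger_month - 1 (hypothetical layout).
    peer_returns : dict mapping fund name -> full-horizon list of monthly
                   returns for surviving funds with the same objective.
    merger_month : zero-based index of the first month after the merger.
    """
    horizon = len(next(iter(peer_returns.values())))
    spliced = []
    for t in range(horizon):
        if t < merger_month:
            # Before the merger: follow the fund itself.
            spliced.append(own_returns[t])
        else:
            # From the merger month on: equally weighted average of peers.
            spliced.append(mean([series[t] for series in peer_returns.values()]))
    return spliced


# Toy numbers, not data from the study.
peers = {"A": [0.00, 0.00, 0.04, 0.02], "B": [0.00, 0.00, 0.02, 0.00]}
series = splice_merged_fund_returns([0.01, 0.02], peers, merger_month=2)
```

In the toy call, the first two months come from the merging fund and the last two from the peer average.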
Hence, the out-of-sample returns from the merger month onward are equally-weighted monthly averages of the returns of all the other surviving funds in our sample with the same investment objective as the merged fund. The assumption of random reinvestment into any surviving fund regardless of its ranking may at first blush seem irrational, given that investors should prefer superior funds. But we are examining Morningstar predictability, not only for superior, but also for inferior performance. Forcing random reinvestment into only high-ranked funds could bias the predictability results. Furthermore, an investor may be interested in using Morningstar rankings not only for investment in superior funds, but also for avoiding investment in inferior funds. (Nevertheless, we did examine the results obtained by assuming an investor randomly chose a surviving fund from those rated only three stars or better; the results were virtually identical.)^13

^11 As with our seasoned funds sample, we examined only the funds in the sample that did not merge or liquidate during the out-of-sample period to obtain this percentage.

^12 The Morningstar On-Disk and Principia disks (after 1993) each provide a list of funds that have recently undergone name changes, mergers, and liquidations.

^13 An additional issue in using our "random reinvestment" assumption is that an investor may incur capital gains or losses when closing out an investment in a merged fund and then investing in another fund, inducing a tax effect. However, even for surviving funds, potential tax effects exist because funds must by law distribute all income distributions and internally generated capital gains to their
investors, and, even if an investor chooses to automatically reinvest a distribution back into the fund, it is a taxable event (unless the fund is held in a tax-sheltered plan such as an IRA or Keogh plan). This study (and most other extant studies on fund performance) analyzes pre-tax returns, so any tax effects of our reinvestment assumption do not affect our results. An alternative approach that does not have the potential tax effect of our reinvestment approach for merged funds is the "follow-the-money" approach introduced in Elton, Gruber, and Blake (1996b), where a merged fund's returns are spliced to its "merge partner" fund's returns to form a complete time series. (The investor essentially stays invested without having to close out and then reinvest.) But because of the way we calculate our out-of-sample performance alphas for disappearing funds (see Section III for details), we would require a complete in-sample time series of returns for the "merge partner" fund, and in some cases the partner fund did not exist long enough to obtain such a series.

For the liquidated funds, we first determined when the fund was liquidated. Again, this information was obtained from Morningstar or The Wall Street Journal. As with the merged funds, from the month of liquidation and onward, we assume the investor randomly re-invests in the current sample of funds with the same investment objective as the merged fund.

C. Morningstar Ratings

To calculate its ratings, Morningstar first classifies funds into one of four categories: domestic equity, foreign equity, municipal bond, and taxable bond. The ratings are then based upon an aggregation of the three-, five-, and 10-year risk-adjusted return for funds with 10 years or more of return history, three- and five-year risk-adjusted returns for funds with five to less than 10 years of return data, and three-year risk-adjusted returns for funds with three to less than five years of return data. To calculate the risk-adjusted return, Morningstar first calculates a load-adjusted return for the fund by adjusting the returns for expenses such as 12b-1 fees, management fees, and other costs automatically taken out of the fund, and then by adjusting for front-end and deferred loads.^15 Next, Morningstar calculates a "Morningstar return" in which the expense- and load-adjusted excess return is divided by the higher of two variables: the excess average return of the fund category (domestic stock, international stock, taxable bond, or municipal bond) or the average 90-day U.S. T-bill rate,

(1)  $\text{Morningstar Return} = \dfrac{\text{Expense- and Load-Adjusted Return on the Fund} - \text{T-Bill}}{\max\left[(\text{Average Category Return} - \text{T-Bill}),\ \text{T-Bill}\right]}$

Morningstar divides through by one of these two variables to prevent distortions caused by having low or negative average excess returns in the denominator of equation (1). Such a situation might occur in a protracted down market.^16

^14 Originally Morningstar used only three categories: domestic equity, municipal bond, and taxable bond. The foreign equity funds were placed in the domestic equity category. The foreign equity category was started in 1996. The category definitions we describe in this paper are from Morningstar's 1998 manual.

^15 Blume (1998), pp. 4-5, provides an excellent description of how Morningstar accounts for loads in the Morningstar returns. The load adjustment process is the following. Assume L is the load adjustment. If there is no load of any type, then L is equal to 1. If there is a load, L is less than one; e.g., a 4% front-end load would make L equal to 0.96. The load-adjusted return is then equal to the return of the fund times L. Note that the front-end load is always assumed to be the maximum possible load. The deferred load adjustment is reduced as the holding period is increased. In Section II, we explain in more detail how we adjust the return data for loads.

^16 Principia Manual, p. 97.
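A minimal sketch of equation (1) and of the load factor L from the Blume (1998) footnote may help fix ideas. The function names are hypothetical. The code also assumes that "the return of the fund times L" refers to the gross return (1 + r), which the text does not state explicitly; treat that reading as an assumption.

```python
def load_adjusted_return(fund_return, front_end_load=0.0):
    """Apply the load factor L = 1 - load to a fund's return.

    Assumption (not stated in the source): L multiplies the gross return
    (1 + r). With no load, L = 1 and the return is unchanged; a 4%
    front-end load gives L = 0.96.
    """
    load_factor = 1.0 - front_end_load
    return (1.0 + fund_return) * load_factor - 1.0


def morningstar_return(adjusted_fund_return, category_return, tbill_return):
    """Equation (1): excess fund return divided by the larger of the
    category's average excess return and the T-bill return itself."""
    denominator = max(category_return - tbill_return, tbill_return)
    return (adjusted_fund_return - tbill_return) / denominator


# Toy numbers, not data from the study.
up_market = morningstar_return(0.15, category_return=0.12, tbill_return=0.05)
down_market = morningstar_return(0.00, category_return=0.02, tbill_return=0.05)
```

In the down-market call the category's excess return is negative, so the T-bill return becomes the denominator. This is the distortion the `max` in equation (1) is meant to prevent.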
Morningstar then calculates a "Morningstar risk" measure, which is calculated differently from traditional risk measures, such as beta and standard deviation, that both see greater-than- and less-than-expected returns as added volatility. Morningstar believes that most investors' greatest fear is losing money, which Morningstar defines as underperforming the risk-free rate of return an investor can earn from the 90-day Treasury bill. Hence, its risk measure only focuses on downside risk. To calculate risk, Morningstar plots monthly returns in relation to T-bill returns, adds up the amounts by which the fund trails the T-bill return each month, and then divides that total by the time horizon's total number of months. This number, the average monthly underperformance statistic, is then compared with those of other funds in the same broad investment category to assign the risk scores. The resultant Morningstar risk score expresses how risky the fund is relative to the average fund in its category.^18

To calculate a fund's summary star rating, Morningstar calculates the three-, five-, and 10-year Morningstar return and risk. For each time horizon, the Morningstar risk scores are then subtracted from the Morningstar return scores. The three numbers (one for each time horizon) are then given subjective weights.^19 The three-year number receives a 20% weighting, the five-year a 30% weighting, and the 10-year a 50% weighting. As stated above, in the case of young funds (funds with three to less than five years of return data), the three-year number receives a 100% weighting; in the case of middle-aged funds (funds with five to less than 10 years of return data), the three-year number receives a 40% weighting and the five-year number receives a 60% weighting. With these weights, Morningstar calculates the weighted average of the numbers. The resulting number is then plotted along a bell curve to determine the fund's star rating.
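The steps of the summary rating can be sketched as follows. All function names are hypothetical. Two details are assumptions, since the text does not settle them: how a score that falls exactly on a band boundary is treated, and the percentile-from-the-top input convention. The band widths themselves (top 10%, next 22.5%, middle 35%, next 22.5%, bottom 10%) are the ones the paper reports for the bell-curve step.

```python
def average_monthly_underperformance(fund_returns, tbill_returns):
    """Sum the monthly shortfalls versus T-bills, divide by all months."""
    shortfalls = [max(t - f, 0.0) for f, t in zip(fund_returns, tbill_returns)]
    return sum(shortfalls) / len(fund_returns)


def horizon_weights(years_of_history):
    """Weights on the 3-, 5-, and 10-year numbers by fund age."""
    if years_of_history >= 10:
        return {3: 0.20, 5: 0.30, 10: 0.50}
    if years_of_history >= 5:
        return {3: 0.40, 5: 0.60}
    if years_of_history >= 3:
        return {3: 1.00}
    raise ValueError("funds with under three years of history are not rated")


def summary_score(return_scores, risk_scores, years_of_history):
    """Weighted average of (return score - risk score) across horizons."""
    weights = horizon_weights(years_of_history)
    return sum(w * (return_scores[h] - risk_scores[h]) for h, w in weights.items())


def stars_from_percentile(pct_from_top):
    """Map a category percentile (0 = best) to one to five stars.

    Boundary handling (strict '<') is an assumption.
    """
    if pct_from_top < 10.0:
        return 5
    if pct_from_top < 32.5:
        return 4
    if pct_from_top < 67.5:
        return 3
    if pct_from_top < 90.0:
        return 2
    return 1


# Toy numbers, not data from the study.
seasoned = summary_score({3: 1.2, 5: 1.0, 10: 0.8}, {3: 0.9, 5: 1.0, 10: 1.1}, 12)
young = summary_score({3: 1.5}, {3: 0.5}, 4)
risk = average_monthly_underperformance([0.02, -0.01, 0.00], [0.004] * 3)
```

The sketch takes the return and risk scores as given. Converting the underperformance statistic into a category-relative risk score is not reproduced here, because the text describes that step only in general terms.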
If the fund scores in the top 10% of its broad investment category, it receives a rating of five stars; if the fund falls in the next 22.5%, it receives four stars; if it falls in the middle 35%, it receives three stars; if it lies in the next 22.5%, the fund receives two stars, and if it is in the bottom 10%, it receives one star. Morningstar, with a few minor exceptions, has used this same summary rating system throughout its history.^20

^17 Focusing only on downside risk is neither unique to Morningstar nor new; it was explored by Markowitz (1959) and incorporated into an asset-pricing model by Bawa and Lindenberg (1977).

^18 Principia Manual, p. 98.

^19 Morey and Morey (1999) present a methodology that endogenously determines these weights.

^20 The Morningstar technical staff verified this point. See Blume (1998), p. 3, for more on this issue.

Several features of the data should be noted.^21

i) For the seasoned fund subsamples, the number of funds in each sample grows. This is not surprising, since with each year the number of funds that meet the criteria grows.

ii) There are more five-star funds than one-star funds and the average star rating of each sample is above three. This skewness in the ratings of the sample indicates that aggressive growth, equity-income, growth, growth-income, and small company funds with 10 years or more of returns performed slightly better than other funds in the Morningstar domestic equity category.^22

iii) The standard deviation of the ratings is about the same in each sample, indicating that the distribution of the ratings does not differ much from one sample to another.

iv) For the load funds, most have front-end loads and relatively few have deferred loads.

v) Most of the funds are grouped within the growth and growth-income investment objectives.

^21 Detailed tables describing the breakdowns and averages of star ratings, loads, styles, and ages for all of our data samples are available from the authors upon request.

^22 The higher average star ratings could be due to seasoned funds performing slightly better, or it could be a result of other investment objectives (styles), besides those used in this study, being grouped into the domestic equity category. These other investment objectives include domestic hybrid funds, convertible bond funds, funds termed by Morningstar to be miscellaneous funds, and even international funds up until 1996. Blume (1998) has documented that these other investment objective funds generally have lower performance and are rated lower than the aggressive growth, equity-income, growth, growth-income, and small company funds.

When we examined the average star ratings by style, we found that, for most of our sample years, aggressive growth and small company funds have fewer funds and lower averages than the other styles. Moreover, in many samples, the aggressive growth, equity-income, and small company styles have few, if any, funds in the lowest or highest star categories. Also, the standard deviations are about the same in each subsample within each investment style, with the notable exception being the 1993 aggressive growth subsample.

For the complete fund 1993 sample, as with the seasoned funds sample, the average star rating is above three and there are relatively more five-star funds than there are one-star funds. The average rating exceeding three stars is a result of other investment objectives being grouped into the domestic equity category (see footnote 22). Other interesting features of the complete fund 1993 sample follow.

i) Most of the funds are in the seasoned and middle-aged category; only 14% of the funds are young funds.

ii) More than one-half of the funds are load funds. Again, loads seem to be an important factor to consider.

iii) As with the seasoned funds sample, most of the funds are clustered in the growth and growth-income styles.

iv) Aggressive growth funds fare worse than the other investment objectives in terms of star ratings.

v) There are no substantial differences in the average star ratings of young, middle-aged, and seasoned funds, yet the respective distributions are quite different. In fact, there are no one-star young funds. As stated above, young funds receive their stars based upon the past three years of returns, so it is possible that the three years prior to January 1993 did not drive any new funds into the bottom rating category.
D. Morningstar Scores

Since January 1994, Morningstar has provided the three-, five-, and 10-year Morningstar return and risk numbers for all the mutual funds it evaluates. With that information, plus the subjective weights (20%, 30%, and 50% for the three-, five-, and 10-year horizons), we can calculate the resultant scores and numerically rank the funds evaluated here. These scores then allow us to conduct non-parametric rank correlation tests. (Since the data are not provided before 1994, we do not conduct these tests either for the seasoned 1992 and 1993 samples, or for the complete funds 1993 sample.)

E. Alternative Predictors

We compare the Morningstar rankings and scores with those of four alternative predictors. Each alternative predictor is calculated during the in-sample period just prior to fund selection, either during the 10-year period prior to the out-of-sample evaluation periods when examining the seasoned funds 1992-1997 sample, or during the three-year period prior to the out-of-sample evaluation periods when examining the complete funds 1993 sample only. For a naive predictor, we use the fund's average monthly in-sample return. The second alternative predictor we use is the in-sample Sharpe ratio,

(2)  $\text{Sharpe}_i = \dfrac{\bar{R}_i - \bar{R}_F}{\sigma_i}$

where $\bar{R}_i - \bar{R}_F$ is the mean excess (net of the 30-day T-bill rate) monthly return for the $i$th mutual fund during the in-sample period, and $\sigma_i$ is the standard deviation of the excess monthly returns for the $i$th mutual fund during the in-sample period.
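The first two alternative predictors are simple enough to sketch directly. The function names are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, stdev


def naive_predictor(fund_returns):
    """Average monthly in-sample return of the fund."""
    return mean(fund_returns)


def in_sample_sharpe(fund_returns, tbill_returns):
    """Equation (2): mean monthly excess return over its standard deviation.

    Assumption: stdev() is the sample (n - 1) estimator.
    """
    excess = [f - t for f, t in zip(fund_returns, tbill_returns)]
    return mean(excess) / stdev(excess)


# Toy numbers, not data from the study.
fund = [0.03, 0.01, 0.02]
tbill = [0.005, 0.005, 0.005]
sharpe = in_sample_sharpe(fund, tbill)
average = naive_predictor(fund)
```

Funds can then be ranked on either number over the in-sample window and compared with their star ratings, which is the comparison this section sets up.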
For two additional alternative predictors, we use Jensen single-index and four-index alphas, obtained from the following time-series regression model,

(3)  $R_{it} = \alpha_i + \sum_{k=1}^{K} \beta_{ik} I_{kt} + \varepsilon_{it}$

where
$R_{it}$ = the excess total return (net of the 30-day T-bill return) for fund $i$ in in-sample month $t$;
$\alpha_i$ = the alpha for fund $i$, used as a performance predictor;
$\beta_{ik}$ = the sensitivity of fund $i$'s excess return to index $k$;
$I_{kt}$ = the return for index $k$ in in-sample month $t$; and
$\varepsilon_{it}$ = the random error for fund $i$ in in-sample month $t$.

For Jensen alphas, $K = 1$ and $I_{1t}$ = the excess total return of the S&P 500 in month $t$. For the four-index alphas, $K = 4$, $I_{1t}$ = the excess total return of the S&P 500 in month $t$, $I_{2t}$ = the excess total return of the Lehman Aggregate Bond Index in month $t$, $I_{3t}$ = the difference in return between a small-cap and large-cap stock portfolio based on Prudential Bache indexes in month $t$, and $I_{4t}$ = the difference
    • in return between a growth and value stock portfolio based on Prudential Bache indexes in month $t$. We utilize the four-index model because, as Elton, Gruber, and Blake (1996a) show, this model provides better risk adjustment for mutual funds than the single-index model.

F. Out-of-Sample Evaluation Periods

When evaluating performance, investors are typically presented with the one-, three-, five-, and (if available) 10-year past performance windows. Similarly, we use one-, three-, and five-year periods to examine the out-of-sample forecasting ability of Morningstar's ratings. (The 10-year window is outside the bounds of our sample.) This provides 12 subsamples for performance evaluation for our seasoned funds 1992-1997 sample and three additional subsamples for our complete funds 1993 sample.

G. Returns Data and Load Adjustments

For the out-of-sample and in-sample returns used in the alternative predictors, the data consist of monthly returns from the Morningstar On-Disk and Principia programs. These returns data are adjusted for management, administrative, 12b-1 fees, and other costs automatically taken out of fund assets. However, unlike the Morningstar risk-adjusted ratings, the monthly return data do not adjust for sales charges such as front-end and deferred loads. Consequently, if we use the monthly return data for the out-of-sample returns, the returns on load funds will be overstated.

Very little attention in the mutual fund performance literature is given to the treatment of loads in return data. Although some authors (e.g., Gruber (1996)) have presented results separately for load and no-load funds, most studies (e.g., Hendricks, Patel, and Zeckhauser (1993), Elton, Gruber, and Blake (1996a), Malkiel (1995), and Carhart (1997)) provide no direct adjustment for loads in their returns data. Loads may be important, especially in this paper since the Morningstar ratings encompass load-adjusted returns, but the question is how to deal with them.
Should one use front-end loads, deferred loads, or both? When and for how long should one apply the load? What if the mutual fund has reduced its load over time (especially the deferred load)? Should one use an average load adjustment for each month or an annualized load? If one decides to use an annualized load, what interest rate should one use to discount the load factor?

In light of these issues, we adjust the monthly returns of each mutual fund using an approach similar to Rea and Reid (1998). For both front-end and deferred loads, we consider an investor who buys and holds the load shares for a fixed number of months, i.e., 12 months (one year), 36 months (three years), or 60 months (five years). For front-end loads, the investor buying the fund pays the

Footnote: See Elton, Gruber, and Blake (1996a) for a detailed description of the Prudential Bache portfolios used in the four-index model.
Footnote: A detailed table showing the number of funds, number of merged and liquidated funds, and number of funds that changed their Morningstar styles for all out-of-sample periods and both sample groups is available from the authors upon request.
Footnote: Principia Manual (1998), p. 107.
    • load in a lump sum at the time of purchase. To spread the front-end load across the period that the shares are held, we use Rea and Reid's assumption that the investor borrows the amount necessary to pay the load up front and then repays the loan as an annuity in equal, monthly installments during the holding period. Hence, the monthly load adjustment reflects the amount that was borrowed and the interest on the loan. Mathematically, our front-end load adjustment process is

(4) $f = \sum_{j=1}^{h} \dfrac{f^{m}}{(1+r)^{j}}$,

where $r$ is the monthly interest rate (the monthly geometric average of the one-, three-, or five-year Treasury yield over the holding period), $f$ is the front-end load (expressed as a percent), $h$ is the number of months the fund is held, and $f^{m}$ is the monthly front-end load adjustment. Hence, the front-end-load-adjusted returns are

(5) $R^{FL}_{it} = R_{it} - f^{m}$,

where $R_{it}$ is the monthly return of fund $i$ in month $t$ and $R^{FL}_{it}$ is the monthly front-end-load-adjusted return of fund $i$ in month $t$.

As an example of the adjustment, consider a one-year investment in Fidelity's Magellan fund starting in January 1992, when Magellan had a front-end load of 3% and the one-year Treasury yield was 3.84%, giving a monthly average rate of 0.31%. Therefore, for the one-year holding (out-of-sample) period, $f = 3\%$, $r = 0.0031$, and $h = 12$, giving $f^{m} = 0.255\%$. We then subtract 0.255% from each of the Magellan fund's 12 monthly returns during 1992 to obtain the load-adjusted returns.

For the deferred-load adjustment, the process is slightly different in that the payment of the deferred load does not occur until the end of the holding period. To convert the deferred load into a monthly payment, the investor is assumed to have prepaid the load in equal monthly installments. The amount of the monthly prepayment reflects the deferred load less the interest earned on the prepayments.
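The front-end adjustment and the Magellan example can be checked numerically. The sketch below treats the load as a loan repaid by a level monthly annuity, which is the assumption attributed to Rea and Reid in the text; the function names are hypothetical, and the monthly rate is rounded to four decimals only to match the 0.31% quoted in the example.

```python
def monthly_rate(annual_yield):
    """Monthly geometric average rate implied by an annual yield (given as a decimal)."""
    return (1.0 + annual_yield) ** (1.0 / 12.0) - 1.0


def front_end_adjustment(load_pct, r, months):
    """Level monthly payment f_m that repays a loan of load_pct over `months`
    at monthly rate r, i.e. load_pct = sum over j = 1..months of f_m / (1 + r)**j."""
    annuity_factor = sum(1.0 / (1.0 + r) ** j for j in range(1, months + 1))
    return load_pct / annuity_factor


def load_adjusted_returns(monthly_returns_pct, f_m):
    """Subtract the monthly load adjustment from every monthly return (in percent)."""
    return [ret - f_m for ret in monthly_returns_pct]


# Inputs taken from the worked example in the text.
r = round(monthly_rate(0.0384), 4)      # one-year Treasury yield of 3.84%
f_m = front_end_adjustment(3.0, r, 12)  # 3% front-end load held for 12 months
print(f"r = {r:.4f}, f_m = {f_m:.3f}%")
```

Under these assumptions the computed adjustment agrees with the 0.255% reported in the text to three decimal places.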
Thus, the equation for the monthly deferred-load adjustment is

(6) $d = \sum_{j=1}^{h} d^{m} (1+r)^{j}$,

where $d$ is the deferred load (expressed as a percent) and $d^{m}$ is the monthly deferred-load adjustment. Hence, the deferred-load-adjusted returns are

(7) $R^{DL}_{it} = R_{it} - d^{m}$,

where $R_{it}$ is the monthly return of fund $i$ in month $t$ and $R^{DL}_{it}$ is the monthly deferred-load-adjusted return of fund $i$ in month $t$.

As with the front-end loads, we use the monthly geometric average of the one-, three-, or five-year Treasury yield over the holding period for the interest
    • rate. However, in contrast to the front-end load adjustment, we reduce the amount of the deferred load as the holding period, $h$, increases. We do this because Morningstar also reduces the deferred load as the holding period increases. Hence, for a holding period of 12 months, the full amount of the deferred load is imposed. For the 36-month holding period, we apply only half of the original deferred load and in the 60-month holding period, the deferred load completely disappears.

III. Methodology

To measure out-of-sample performance we use four performance metrics. To examine out-of-sample performance of the Morningstar ratings and the alternative predictors, we use two methods: dummy variable regression analysis and the nonparametric Spearman-Rho rank correlation test.

A. Out-of-Sample Performance Measurement

We use four performance metrics from the existing performance literature to measure out-of-sample performance: the Sharpe (1966) ratio, the mean monthly excess return, a modified version of Jensen's (1968) alpha, and a modified version of a four-index alpha (Elton, Gruber, and Blake (1996a)). For each performance metric, we examine both non-load-adjusted and load-adjusted versions. We report results only for the load-adjusted Sharpe ratio, the load-adjusted mean monthly excess return, the non-load-adjusted modified Jensen alpha, and the non-load-adjusted modified four-index alpha.

The load-adjusted Sharpe ratio for fund $i$ is

(8) $\mathrm{Sharpe}_i = \dfrac{\bar{R}^{LA}_i - \bar{R}_F}{\sigma_i}$,

where $\bar{R}^{LA}_i - \bar{R}_F$ is the mean excess (net of the 30-day T-bill rate) load-adjusted monthly return for the $i$th mutual fund during the evaluation (out-of-sample) period and $\sigma_i$ is the standard deviation of the excess load-adjusted monthly returns for the $i$th mutual fund during the evaluation period. The non-load-adjusted Sharpe ratio is essentially the same as equation (2), except that it uses the out-of-sample period.
The load-adjusted mean monthly excess return is $\bar{R}^{LA}_i - \bar{R}_F$. The non-load-adjusted mean monthly excess return is $\bar{R}_i - \bar{R}_F$.

The non-load-adjusted modified Jensen and four-index alphas are calculated using a methodology similar to Elton, Gruber, and Blake (1996a). Specifically, for each seasoned funds subsample, we utilize a time-series period of monthly non-load-adjusted returns going back 10 years from the selection date and forward to the end of the out-of-sample evaluation period to obtain an estimate of the intercept from either the single-index or four-index model regression (equation (3)). For our complete funds 1993 sample group, we utilize a time-series period of

Footnote: The results for the metrics that are not reported in this paper, i.e., those for the non-load-adjusted Sharpe ratio, the non-load-adjusted mean monthly excess return, the load-adjusted modified Jensen alpha, and the load-adjusted modified four-index alpha, are essentially the same as their load/non-load counterparts. These results are available from the authors upon request.
    • monthly non-load-adjusted returns going back three years from the selection date and forward to the end of the out-of-sample evaluation period to obtain an estimate of the intercept from either the single-index or four-index model regression (equation (3)).

To obtain the alphas, we add the average monthly residual during the evaluation period to the intercept. For example, to obtain a modified Jensen alpha for a seasoned fund's one-year out-of-sample performance measure in the 1992 subsample, we run the one-index model on monthly returns starting in January 1982 and ending in December 1992 (11 years) to obtain an estimate of the intercept. We then add the average of the fund's residuals during the one year after the selection date (the evaluation period) to the estimated intercept to obtain the fund's modified Jensen alpha.

To obtain alphas for funds that merged or liquidated during the evaluation period, we first run two regressions: i) a regression using the fund's returns going back either 10 or three years from the selection date and ending in the month prior to the fund's disappearance, and ii) a regression run over the entire regression period using the returns on an equally-weighted portfolio formed each month from the existing funds in the sample. We then form a weighted average of: i) the fund's estimated intercept plus the fund's average residual during the time it survived in the evaluation period, and ii) the estimated intercept plus the average residual during the remaining time in the evaluation period of the equally-weighted portfolio, where the fund's weight is the fraction of the evaluation period it survived and the equally-weighted portfolio's weight is the remaining fraction. This provides a performance measure for an investor who buys a remaining fund in the sample at random if the original fund merges or liquidates.
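The modified alpha described above can be sketched for the single-index (Jensen) case as follows. This is an illustration under stated assumptions, not the authors' code: the function names and the return series are hypothetical, the series are assumed to run from the start of the in-sample window to the end of the evaluation period, and the four-index version would replace the one-regressor fit with a multiple regression.

```python
from statistics import mean


def single_index_fit(fund_excess, index_excess):
    """OLS intercept and slope of fund excess returns on one index's excess returns."""
    x_bar, y_bar = mean(index_excess), mean(fund_excess)
    sxx = sum((x - x_bar) ** 2 for x in index_excess)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(index_excess, fund_excess))
    beta = sxy / sxx
    return y_bar - beta * x_bar, beta


def modified_alpha(fund_excess, index_excess, eval_months):
    """Intercept from the full-period regression plus the mean residual over the
    final `eval_months` observations, which are taken to be the evaluation period."""
    alpha, beta = single_index_fit(fund_excess, index_excess)
    residuals = [y - alpha - beta * x for x, y in zip(index_excess, fund_excess)]
    return alpha + mean(residuals[-eval_months:])


def blended_alpha(fund_alpha, portfolio_alpha, months_survived, eval_months):
    """Weighted average used when a fund merges or liquidates during the evaluation
    period: the fund's weight is the fraction of the period it survived."""
    weight = months_survived / eval_months
    return weight * fund_alpha + (1.0 - weight) * portfolio_alpha


# Hypothetical monthly excess returns in percent (illustrative numbers only).
index = [1.0, -2.0, 3.0, 0.5, -1.5, 2.0, 0.8, -0.3]
fund = [1.4, -2.1, 3.9, 0.6, -1.9, 2.8, 1.3, 0.1]
print(round(modified_alpha(fund, index, eval_months=2), 4))
print(round(blended_alpha(0.2, 0.1, months_survived=6, eval_months=12), 4))
```

Because the evaluation-period residuals enter the measure directly, a fund that beats its fitted line after the selection date receives a modified alpha above the regression intercept.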
For the load-adjusted modified Jensen and four-index alphas, we do not use load-adjusted returns, since we use both out-of-sample and in-sample data for these measures. We could apply loads to the in-sample data; however, doing so would raise a number of problems. First, the loads during the in-sample period may be quite different from those in the out-of-sample period. Second, and more importantly, it is not clear how we should deal with loads before an investor owns a fund. Again, our assumption in this paper is that the investor selects the funds at the time they are rated by Morningstar. Moreover, our load adjustment depends upon how long the investor holds the fund. If we were to assume that the investor already owned the fund before the out-of-sample period started, and paid loads during the in-sample period, it would be difficult to determine the correct load to assess for the out-of-sample period.

As an alternative, we adjust the single-index and four-index alphas for loads by using an added (0,1) dummy variable in equation (9), where 1 = load funds and 0 = no-load funds.

B. Dummy Variable Regression Analysis

The first method we use to examine out-of-sample predictive performance is a cross-sectional dummy variable regression analysis that allows us to examine the Morningstar star ranking group differences in performance predictability.
    • To make the results for the alternative predictors comparable to those for the Morningstar star groups, we divide the funds into five subgroups after ranking them in descending order by each of their alternative predictors. These five alternative predictor subgroups are not quintiles, since we wanted to preserve the same number of funds in each alternative predictor subgroup as we have in each of the five Morningstar star groups. As an example, consider our January 1992 seasoned fund subsamples. The same 263 funds are in each of these subsamples: 18 five-star funds, 93 four-star funds, 111 three-star funds, 33 two-star funds, and 8 one-star funds. Therefore, for our 1992 seasoned fund subsamples, for any one of our alternative predictors, group five has the 18 funds with the highest alternative predictor, group four has the next highest 93 funds, etc.

We estimate the following equation for each of the 12 subsamples for the seasoned funds and for each of the three subsamples for the 1993 complete set,

(9) $S_i = \gamma_0 + \gamma_1 D4_i + \gamma_2 D3_i + \gamma_3 D2_i + \gamma_4 D1_i + u_i$,

where
$S_i$ = out-of-sample performance metric for fund $i$, i.e., the load-adjusted Sharpe ratio, the load-adjusted mean monthly return, the non-load-adjusted single-index alpha, or the non-load-adjusted four-index alpha;
$D4_i$ = 1 if a four-star fund or if in alternative predictor group four, 0 if not;
$D3_i$ = 1 if a three-star fund or if in alternative predictor group three, 0 if not;
$D2_i$ = 1 if a two-star fund or if in alternative predictor group two, 0 if not;
$D1_i$ = 1 if a one-star fund or if in alternative predictor group one, 0 if not;
$i$ = 1 through $N$, where $N$ is the total number of funds in the subsample.
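The grouping rule and equation (9) can be sketched as below. This is a minimal illustration, not the authors' implementation: the function names and the sample values are hypothetical, the regression is solved through the normal equations in plain Python, and the sketch assumes every star group contains at least one fund (an empty group would make the system singular, so its dummy would have to be dropped). It reports coefficients only and does not compute t-statistics.

```python
def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[a] * row[b] for row in X) for b in range(k)]
         + [sum(row[a] * yi for row, yi in zip(X, y))] for a in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def star_dummies(star):
    """Regressor row [1, D4, D3, D2, D1]; five-star funds are the reference group."""
    return [1.0] + [1.0 if star == s else 0.0 for s in (4, 3, 2, 1)]


def groups_matching_star_counts(predictor, star_counts):
    """Rank funds by an alternative predictor (descending) and cut the ranking into
    groups 5..1 whose sizes equal the number of funds holding each star rating."""
    order = sorted(range(len(predictor)), key=lambda i: predictor[i], reverse=True)
    groups, start = [0] * len(predictor), 0
    for group in (5, 4, 3, 2, 1):
        for i in order[start:start + star_counts[group]]:
            groups[i] = group
        start += star_counts[group]
    return groups


# Hypothetical star ratings and out-of-sample Sharpe ratios (illustrative only).
stars = [5, 5, 4, 4, 3, 3, 2, 2, 1, 1]
sharpe = [0.30, 0.20, 0.25, 0.15, 0.20, 0.10, 0.05, -0.05, -0.10, -0.20]
gammas = ols([star_dummies(s) for s in stars], sharpe)
print([round(g, 3) for g in gammas])
```

With only group dummies on the right-hand side, the fitted constant is the mean of the five-star group and each remaining coefficient is that group's mean minus the five-star mean, which is why the coefficients can be read directly as performance gaps relative to the top-rated funds.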
In equation (9), the five-star fund group or the alternative predictor group five is the reference group for the dummy variable regression. Hence, when using the load-adjusted Sharpe ratio as the out-of-sample performance measure, the coefficient $\gamma_0$ represents the expected load-adjusted Sharpe ratio when all the dummy variables are equal to 0 (i.e., for the reference group), and coefficients $\gamma_1$ through $\gamma_4$ represent the differences in expected performance between each lower-rated group and the reference group. The t-statistics on the coefficients provide a test of the significance of the difference between an individual dummy group and the reference group.

We use the five-star funds or alternative predictor group five as a reference group because they provide a ceiling against which we can compare the performance of the lower group funds. If the star ratings or alternative predictors accurately forecast out-of-sample performance, we should see increasingly negative (and significant) coefficients as we move from $\gamma_1$ to $\gamma_4$.

C. Spearman-Rho Rank Correlation Test

As a final test, we use the two-tailed Spearman-Rho rank correlation test to examine the rank correlations of both the Morningstar scores and the alternative

Footnote: We also performed all of the dummy variable regressions using the three-star funds or the alternative predictor group 3 as the reference group. The results, which did not change when using this reference group, are available from the authors.
    • predictors with the out-of-sample performance measures. Since Morningstar provides the data to rank the funds beginning in 1994, we only examine this test for samples that begin in 1994 or later. The Spearman-Rho has a null hypothesis of no correlation between the two rankings and is a nonparametric test.

For this test, we follow the methodology of Elton, Gruber, and Blake (1996a). For each fund in the sample, we examine the four different out-of-sample measures: the (load-adjusted) Sharpe ratios, the (load-adjusted) mean monthly excess returns, the Jensen alphas, and the four-index alphas. We first sort all the funds in descending order by either their in-sample Morningstar scores or, in the case of the alternative predictors, by their in-sample predictor's performance. Next, we organize the data into deciles and compute the average for each decile. Our goal is then to examine whether the decile ranking given by either the Morningstar scores or by the alternative predictors corresponds to the decile rankings of the four out-of-sample performance measures. If the Morningstar system or the alternative predictors forecast well out-of-sample, then there should be close correlation between the in-sample and the out-of-sample rankings.

IV. Morningstar Rating Results

We present the predictive ability of the Morningstar ratings in two broad sections. First, we report the results using the 1992-1997 seasoned funds subsamples. In Section IV.A, we discuss the dummy variable results for the overall samples, the dummy variable results for the samples organized by style groups, the Spearman-Rho rank correlation results for the overall samples, and the Spearman-Rho rank correlation test for the samples organized by style groups. In Section IV.B, we report the results of the complete funds 1993 sample.
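The decile-based rank correlation procedure of Section III.C can be sketched as follows. The function names and the data are hypothetical, the sketch assumes there are no ties among the decile averages and at least as many funds as deciles, and it computes only the correlation coefficient, not the two-tailed significance test.

```python
from statistics import mean


def decile_averages(in_sample, out_sample, n_groups=10):
    """Sort funds by the in-sample score (descending), split them into n_groups
    nearly equal groups, and return the per-group averages of both series."""
    order = sorted(range(len(in_sample)), key=lambda i: in_sample[i], reverse=True)
    n = len(order)
    bounds = [round(g * n / n_groups) for g in range(n_groups + 1)]
    groups = [order[bounds[g]:bounds[g + 1]] for g in range(n_groups)]
    return ([mean(in_sample[i] for i in grp) for grp in groups],
            [mean(out_sample[i] for i in grp) for grp in groups])


def ranks(values):
    """Rank 1 = largest value. Assumes no ties, which keeps the sketch short."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rho(a, b):
    """Spearman rank correlation, 1 - 6 * sum(d^2) / (n * (n^2 - 1)); valid without ties."""
    n = len(a)
    d2 = sum((ra - rb) ** 2 for ra, rb in zip(ranks(a), ranks(b)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Hypothetical in-sample scores and out-of-sample Sharpe ratios for 20 funds.
scores = [float(x) for x in range(20, 0, -1)]
sharpes = [0.02 * x + (0.05 if x % 2 else -0.05) for x in scores]
score_avg, sharpe_avg = decile_averages(scores, sharpes)
print(round(spearman_rho(score_avg, sharpe_avg), 3))           # all 10 deciles
print(round(spearman_rho(score_avg[:5], sharpe_avg[:5]), 3))   # top five deciles
print(round(spearman_rho(score_avg[5:], sharpe_avg[5:]), 3))   # bottom five deciles
```

Computing the coefficient separately on the top five and bottom five deciles is what lets the analysis distinguish the ability to identify future winners from the ability to identify future losers.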
All the regressions in Section IV were tested for heteroskedasticity using the White (1980) test. None of the regression residuals exhibited evidence of heteroskedasticity at the 10% level.

A. 1992-1997 Seasoned Funds Sample Results

In this subsection, we discuss the dummy variable regression results on our seasoned funds groups, both for the overall samples and for the samples broken further down into style subgroups. For the overall samples, we do not examine the mean monthly return results, since funds with different styles will likely have different mean monthly returns.

1. Dummy Variable Regression Analysis on Overall Seasoned Fund Samples

The Load-Adjusted Sharpe Ratio. Table 1 presents the dummy variable regression analysis in which we examine how well the Morningstar stars predict out-of-sample fund performance, as measured by the load-adjusted Sharpe ratio, for the overall seasoned fund samples. First, the Sharpe ratio results in Table 1 show that the $\gamma_0$ coefficients, the constants in the dummy variable regressions, differ from sample to sample. The 1992 constant is close to zero and insignificant, the 1994 constant is well below zero and significant, and the 1993 and 1995-1997 constants are all positive and significant. These results indicate that the reference
    • group (the five-star funds) performs quite differently in different years. The up-and-down performance of the five-star Sharpe ratios is consistent with the performance of the S&P 500 index's mean monthly excess returns. Second, the results show that the four- and three-star funds do not diverge from the five-star funds in terms of out-of-sample performance. Only three of the 24 coefficients ($\gamma_1$ and $\gamma_2$ for the 12 samples) are significant, indicating no significant difference in out-of-sample performance of median- and top-rated funds. In many cases, even the signs on the coefficients are the opposite of what one would expect. Third, there is some evidence that the Morningstar ratings predict the low-performing funds. The $\gamma_3$ and $\gamma_4$ coefficients are generally negative and significant (12 of the 24 $\gamma_3$ and $\gamma_4$ coefficients), indicating that the performance of one- and two-star funds is significantly worse than that of the five-star funds. Fourth, the $R^2$ and F-statistic values for the samples differ dramatically: the 1992 one-year sample has an $R^2$ of 0.02 while the 1997 one-year sample has an $R^2$ of 0.17.
TABLE 1
Dummy Variable Regressions Using Morningstar Stars

Sample      γ0 (constant) γ1 (4-star)   γ2 (3-star)   γ3 (2-star)   γ4 (1-star)   R²     F-Stat.
One-Year
1992         0.06          0.11          0.07          0.06          0.01         0.02   1.13
            (1.09)        (1.76)        (1.12)        (0.78)        (0.10)
1993         0.26*        -0.03         -0.05         -0.11         -0.21         0.02   1.27
            (4.13)        (0.42)        (0.78)        (1.51)        (1.40)
1994        -0.18*        -0.05         -0.05         -0.09         -0.32*        0.05   4.11*
            (3.92)        (1.02)        (1.02)        (1.69)        (3.74)
1995         0.69*         0.14*         0.10         -0.03         -0.42*        0.11   10.31*
            (9.29)        (1.77)        (1.37)        (0.35)        (3.51)
1996         0.25*         0.05          0.02         -0.03         -0.15*        0.06   5.82*
            (8.19)        (1.51)        (0.55)        (0.92)        (2.67)
1997         0.39*        -0.03         -0.07*        -0.15*        -0.35*        0.17   20.28*
            (12.69)       (0.91)        (2.23)        (4.36)        (6.72)
Three-Year
1992         0.01          0.05          0.04          0.06          0.01         0.02   1.01
            (0.07)        (1.50)        (1.36)        (1.84)        (0.21)
1993         0.28*        -0.03         -0.06         -0.14*        -0.12         0.08   5.51*
            (8.99)        (1.02)        (1.70)        (3.66)        (1.57)
1994         0.26*        -0.01         -0.02         -0.08*        -0.32*        0.15   12.88*
            (8.87)        (0.01)        (0.61)        (2.23)        (5.68)
1995         0.42*         0.03          0.01         -0.04         -0.26*        0.12   11.66*
            (12.61)       (0.74)        (0.34)        (1.10)        (4.79)
Five-Year
1992         0.24*         0.01         -0.01         -0.01         -0.11*        0.03   2.30
            (9.91)        (0.18)        (0.32)        (0.31)        (2.50)
1993         0.34*        -0.04         -0.06*        -0.13*        -0.08         0.08   6.09*
            (12.65)       (1.31)        (2.06)        (4.03)        (1.25)

Sample: Funds with 10 years or more of in-sample returns (Seasoned Funds 1992-1997 Sample Group).
Out-of-Sample Performance Measure: Load-Adjusted Sharpe Ratio.
t-statistics are in parentheses.
* indicates significance at the 5% level.

The Modified Jensen Alpha and Four-Index Alpha. Results for the modified Jensen and four-index alphas continue to demonstrate the same patterns as the Sharpe ratio: little, if any, significant difference between the five-, four-, and three-star rated funds (with the 1993 five-year sample providing the only evidence of significance in the right direction), some evidence of negative and significant
    • differences between the low-rated and the five-star funds, and wide swings in the constant and $R^2$ values. In addition, the one- and four-index models show that, in most cases, the five-star funds have negative (and sometimes significant) alphas (the $\gamma_0$ coefficient).

2. Dummy Variable Regression Analysis on Samples Organized by Style Groups

We also examined the ability of the Morningstar stars to predict out-of-sample performance when the samples are broken into the five style groups. The results are very similar across out-of-sample performance measures and also to the dummy variable analysis on the unbroken sample.

First, there is very little ability to predict significant negative differences between the five-, four-, and three-star funds. In fact, for the out-of-sample load-adjusted mean monthly return, 33 of the 60 coefficients for $\gamma_2$ (the three-star fund) are positive. Second, the growth and growth-income styles' ratings show some ability to predict low-performing funds. However, this result does not extend to the other styles: the low ratings of aggressive growth, equity-income, and small company funds show relatively little ability to detect significant differences in out-of-sample performance. Third, there are vast differences in the constant term across styles and samples. For example, using the 1994 one-year sample, the small company five-star funds post a solid gain, while every other style shows a negative value for the constant.

The results for the aggressive growth and equity-income styles, and to a lesser extent for small company funds, should be interpreted carefully since there are relatively few of these funds in the seasoned funds samples. In fact, for a number of samples, the equity-income style does not have a single one-star fund. The small sample may be the reason that the stars for growth and growth-income funds predict low future performance better than the other styles.

3.
Spearman-Rho Rank Correlation Tests for Overall Seasoned Funds Samples

Table 2 displays the Spearman-Rho rank correlation test results for the overall seasoned fund samples. For each of the six samples in which we have available Morningstar scores, Table 2 shows the Spearman-Rho rank correlations of the in-sample Morningstar scores with the out-of-sample decile averages of the performance measures across all 10 deciles, and across both the top five deciles and the bottom five deciles. The results show the same basic pattern found in the dummy regression analysis on the overall sample: the low scores predict poor future performance and the high scores have, at best, only mixed ability to predict future performance. In examining the rank correlation coefficients on all 10 deciles, several performance measures are relatively well correlated with the in-sample Morningstar scores. In fact, in four of the six samples for the Sharpe

Footnote: Tables on the results for the modified Jensen and four-index alphas are available from the authors. Also, since the samples are not divided by investment objective, we do not report the load-adjusted mean monthly return results in this section. Funds with different styles will likely have different mean monthly returns. In the sections in which we organize samples into their respective style groups, we use the load-adjusted mean monthly return as one of the out-of-sample measures.
Footnote: Detailed tables of the results for the out-of-sample performance measures when the samples are broken into style groups are available upon request.
    • ratio, one of the six samples for the Jensen single-index alpha, and two of the six samples for the four-index alpha, we can reject the null hypothesis of no correlation in the rankings at the 95% confidence level. However, examining the correlation coefficients of the top five and bottom five deciles, we see that overall rank correlation results are largely based on the ability of the low scores to predict poor future performance. In most cases, the correlation coefficients for the bottom five deciles are much larger than those for the top five deciles. Generally, the rank correlation coefficients for the top five deciles are actually negative, indicating that high scores do not accurately predict superior future performance.

TABLE 2
Spearman-Rho Rank Correlations Using Decile Averages of In-Sample Morningstar Scores with Decile Averages of Out-of-Sample Performance Measures

            Rank Correlation  Load-Adjusted   Non-Load-Adjusted  Non-Load-Adjusted
Sample      Deciles           Sharpe Ratio    Jensen Alpha       4-Index Alpha
One-Year
1994        All                0.806*          0.600              0.806*
            Top 5             -0.200          -0.700             -0.400
            Bottom 5           0.600           0.600              0.800
1995        All                0.430           0.467              0.648*
            Top 5             -0.700          -0.500              0.600
            Bottom 5           1.000*          0.700              0.300
1996        All                0.758*          0.370              0.079
            Top 5             -0.400          -0.900*            -0.900*
            Bottom 5           0.900*          0.700              0.300
1997        All                0.927*          0.648*            -0.297
            Top 5              0.500          -0.700             -0.900*
            Bottom 5           0.900*          1.000*             0.700
Three-Year
1994        All                0.685*          0.673              0.442
            Top 5             -0.100          -0.200             -0.700
            Bottom 5           0.700           0.700              0.600
1995        All                0.491           0.382              0.261
            Top 5             -1.000*         -0.600             -0.100
            Bottom 5           1.000*          1.000*             0.400

All funds have 10 or more years of in-sample returns (seasoned funds 1992-1997 group).
* indicates significance at the 5% level.

4.
Spearman-Rho Rank Correlation Tests for Samples Organized by Style Groups

We also examined the results of the Spearman-Rho rank correlation tests for the samples when broken into their respective style groups. As with the dummy variable results for the samples broken into style groups, we examined the out-of-sample load-adjusted mean monthly return, the load-adjusted Sharpe ratio, the non-load-adjusted single-index alpha, and the non-load-adjusted four-index alpha. The results mirror the dummy variable results when the samples are organized by style. For the aggressive growth, equity-income, and, to a lesser extent, the small company sample groups, we do not see much positive correlation between the Morningstar scores and the out-of-sample metrics. This is true for the rank tests of the overall 10 deciles, the top five deciles, and the bottom five deciles.

Footnote: A detailed table containing the results of that examination is available from the authors.
    • Again, the reason for this may be that the sample sizes are not large. However, with the growth and growth-income samples, we see the pattern suggested by the overall Spearman-Rho rank correlation tests in Table 2: low correlations using the top five deciles and higher correlations using the bottom five deciles. In fact, in the growth fund sample, every bottom five decile rank correlation is higher in value than the top five decile rank correlation. The results again suggest that the Morningstar scores are weak in terms of predicting high future performance and yet have some ability to predict underperforming funds.

B. Complete Funds 1993 Sample Results

1. Dummy Variable Regression Analysis on the Overall Complete Funds 1993 Samples

Table 3 presents the results from the dummy variable regressions for our overall complete funds 1993 sample and shows the same patterns detected in the seasoned funds overall sample results. First, there is a relatively strong ability to predict low-performing funds, especially in the longer out-of-sample terms. Of the 18 coefficients for $\gamma_3$ (two-star) and $\gamma_4$ (one-star), 15 are negative and significant, indicating that low-rated funds do perform significantly worse in terms of risk-adjusted performance. Second, there is only weak ability to predict high-performing funds. Only two of the nine coefficients for $\gamma_2$ (three-star) and zero of the nine coefficients for $\gamma_1$ (four-star) are negative and significant. In fact, only five of the nine $\gamma_1$ (four-star) coefficients have the "correct" negative sign. Third, the Morningstar stars do a slightly better job of predicting out-of-sample performance when using the four-index alpha.

2. Dummy Variable Regressions for Samples Organized by Age

Panels A-C of Table 4 present the results for the dummy variable regressions in which we use samples organized by age.
Table 4, Panel A reports the results for young funds (three to less than five years of in-sample returns); Panel B reports the middle-aged funds (five to less than 10 years of in-sample returns); Panel C reports the seasoned funds (10 or more years of in-sample returns). There is evidence of an ability to predict poor future performance, especially among seasoned and middle-aged funds. Using the four-index alpha for the middle-aged funds shows that the Morningstar stars have a strong ability to predict weak performance, as most of the coefficients for the lower rated funds are negative and strongly significant. Among the young funds, we do not see much evidence of ability to predict weak performance, but this is probably because there are no one-star funds in the young funds sample group and, in general, relatively few young funds in the sample.

In predicting high-performing funds, the Morningstar stars are, at best, mildly successful. In the middle-aged and seasoned fund subsamples, only five of the 18 coefficients for $\gamma_2$ (three-star) are significant and negative, yet two of the 18 are

Footnote: For the equity-income sample, we did not perform the test over many samples since there were not enough observations to create the deciles. We required that there be at least 20 observations so that each decile would have at least two observations.
    • TABLE 3
Dummy Variable Regressions Using Morningstar Stars

Sample      γ0 (constant) γ1 (4-star)   γ2 (3-star)   γ3 (2-star)   γ4 (1-star)   R²     F-Stat.
Out-of-Sample Performance Measure: Load-Adjusted Sharpe Ratio
1 year       0.25*         0.01         -0.03         -0.08         -0.07         0.01   2.14
            (6.98)        (0.36)        (0.66)        (1.72)        (0.70)
3 year       0.26*        -0.01         -0.02         -0.08*        -0.13*        0.05   7.90*
            (16.48)       (0.25)        (1.14)        (3.79)        (2.72)
5 year       0.28*         0.02          0.01         -0.05*        -0.08         0.08   6.00*
            (20.86)       (1.10)        (0.49)        (2.66)        (1.94)
Out-of-Sample Performance Measure: Non-Load-Adjusted Jensen Index Alpha
1 year       0.26*        -0.01         -0.05         -0.10          0.01         0.01   0.59
            (3.37)        (0.01)        (0.57)        (1.04)        (0.01)
3 year      -0.10*         0.02         -0.04         -0.16*        -0.34*        0.04   6.72*
            (2.30)        (0.35)        (0.76)        (2.97)        (2.67)
5 year      -0.21*         0.07          0.03         -0.10*        -0.26*        0.04   7.31*
            (5.45)        (1.70)        (0.65)        (1.94)        (2.29)
Out-of-Sample Performance Measure: Non-Load-Adjusted 4-Index Alpha
1 year       0.13*        -0.02         -0.04         -0.20*        -0.58*        0.07   4.47*
            (2.00)        (0.23)        (0.59)        (2.36)        (2.96)
3 year       0.05         -0.06         -0.10*        -0.20*        -0.41*        0.04   6.20*
            (1.18)        (1.20)        (2.22)        (3.67)        (3.26)
5 year       0.06*        -0.04         -0.09*        -0.15*        -0.31*        0.04   7.05*
            (1.99)        (1.25)        (2.64)        (3.88)        (3.35)

Sample: All funds from the complete funds 1993 sample group (635 funds).
t-statistics are in parentheses.
* indicates significance at the 5% level.

significant and positive, indicating that the three-star funds perform better out-of-sample than the five-star funds. Among the young funds, many of the coefficients have the predicted negative signs, yet there is little ability to detect significantly different performance between median and high-rated funds.

3. Dummy Variable Regressions for Samples Organized by Style

We also examined the results from the dummy variable regressions when the samples are organized by style. The results for low-performing funds are very similar to the seasoned fund samples results when the samples are organized by style.
Only in the growth and growth-income styles is there a relatively strong ability to predict low performance, as many of the coefficients for γ3 (two-star) and γ4 (one-star) are negative and significant, particularly in the longer out-of-sample periods.

Footnote: A detailed table of those results is available from the authors.

In predicting high-performing funds, the small company, aggressive growth, and equity income funds do not demonstrate much ability. However, for the growth and growth-income funds, there is evidence of ability to predict winning funds. For both the growth and growth-income subsamples, almost all coefficients show the postulated negative sign, and many of the coefficients are significant for the growth-income subsample. Of course, the success of these subsamples may be largely related to the sample period. In our earlier analysis of the seasoned funds samples, the 1993 subsamples provide some of the strongest support (albeit not that strong) for Morningstar stars predicting high-performing funds, but it is questionable whether these results would carry over to other sample periods.

Footnote: The 1993 seasoned fund samples show more predictability than most other samples whether using the Morningstar stars or the alternative predictors (see Section V).

TABLE 4
Dummy Variable Regressions Using Morningstar Stars Organized by Age

Sample/Age  γ0 (constant)  γ1 (4-star)  γ2 (3-star)  γ3 (2-star)  γ4 (1-star)   R2     F-Stat.

Panel A. Young Funds (less than 5 years of in-sample returns)

Out-of-Sample Performance Measure: Load-Adjusted Sharpe Ratio
One-year      0.33*         -0.05        -0.04        -0.25*        NA         0.07    2.06
             (3.33)         (0.44)       (0.40)       (1.99)
Three-year    0.28*         -0.08         0.01        -0.06         NA         0.06    1.97
             (7.27)         (1.31)       (0.02)       (1.31)
Five-year     0.29*         -0.01         0.03        -0.01         NA         0.03    0.89
             (8.39)         (0.15)       (0.88)       (0.06)

Out-of-Sample Performance Measure: Non-Load-Adjusted Jensen Alpha
One-year      0.42*         -0.16        -0.18        -0.31         NA         0.02    0.44
             (1.99)         (0.61)       (0.75)       (1.14)
Three-year   -0.04          -0.10        -0.01        -0.07         NA         0.02    0.72
             (0.46)         (0.92)       (0.01)       (0.63)
Five-year    -0.15          -0.01         0.07         0.07         NA         0.02    0.50
             (1.79)         (0.03)       (0.74)       (0.64)

Out-of-Sample Performance Measure: Non-Load-Adjusted 4-Index Alpha
One-year      0.24          -0.13        -0.11        -0.35         NA         0.04    1.10
             (1.43)         (0.64)       (0.60)       (1.63)
Three-year    0.09          -0.14        -0.11        -0.17         NA         0.03    0.83
             (0.99)         (1.30)       (1.12)       (1.52)
Five-year     0.12          -0.09        -0.14        -0.13         NA         0.04    1.34
             (1.75)         (1.10)       (1.89)       (1.56)

Panel B. Middle-Aged Funds (greater than 5 and less than 10 years of in-sample returns)

Out-of-Sample Performance Measure: Load-Adjusted Sharpe Ratio
One-year      0.21           0.08         0.01         0.01         0.05       0.02    1.24
             (4.45)         (1.42)       (0.05)       (0.22)       (0.33)
Three-year    0.24*          0.03        -0.01        -0.04        -0.14*      0.07    5.09*
            (12.38)         (1.44)       (0.01)       (1.46)       (2.49)
Five-year     0.25*          0.05*        0.04*       -0.01        -0.09       0.08    6.25*
            (15.14)         (2.74)       (2.06)       (0.52)       (1.77)

Out-of-Sample Performance Measure: Non-Load-Adjusted Jensen Alpha
One-year      0.23*          0.07        -0.03        -0.09         0.10       0.01    0.69
             (2.23)         (0.57)       (0.23)       (0.66)       (0.34)
Three-year   -0.13*          0.07        -0.01        -0.11        -0.40*      0.06    4.42*
             (2.39)         (1.08)       (0.24)       (1.53)       (2.53)
Five-year    -0.30*          0.17*        0.13*       -0.04        -0.27       0.09    7.03*
             (5.73)         (2.86)       (2.25)       (0.62)       (1.79)

Out-of-Sample Performance Measure: Non-Load-Adjusted 4-Index Alpha
One-year      0.13           0.02        -0.08        -0.24        -0.49       0.04    2.53*
             (1.36)         (0.15)       (0.73)       (1.90)       (1.82)
Three-year    0.08          -0.04        -0.14*       -0.20*       -0.47*      0.07    4.86*
             (1.42)         (0.65)       (2.25)       (2.77)       (2.98)
Five-year     0.07          -0.01        -0.10*       -0.16*       -0.35*      0.09    7.11*
             (1.77)         (0.12)       (2.37)       (3.10)       (3.08)

Panel C. Seasoned Funds (10 years or more of in-sample returns)

Out-of-Sample Performance Measure: Load-Adjusted Sharpe Ratio
One-year      0.26*         -0.03        -0.05        -0.11        -0.19       0.02    1.15
             (4.13)         (0.42)       (0.78)       (1.48)       (1.28)
Three-year    0.28*         -0.03        -0.06        -0.14*       -0.11       0.07    5.28*
             (8.97)         (1.02)       (1.69)       (3.62)       (1.38)
Five-year     0.34*         -0.04        -0.06*       -0.13*       -0.08       0.08    6.00*
            (12.61)         (1.30)       (2.04)       (4.01)       (1.22)

Out-of-Sample Performance Measure: Non-Load-Adjusted Jensen Index Alpha
One-year      0.20           0.01         0.01        -0.01        -0.04       0.01    0.01
             (1.45)         (0.10)       (0.04)       (0.01)       (0.12)
Three-year   -0.08          -0.01        -0.09        -0.28*       -0.25       0.06    4.18*
             (0.86)         (0.09)       (0.96)       (2.61)       (1.13)
Five-year    -0.09          -0.06        -0.15        -0.29*       -0.27       0.07    5.03*
             (1.18)         (0.71)       (1.85)       (3.22)       (1.46)

Out-of-Sample Performance Measure: Non-Load-Adjusted 4-Index Alpha
One-year      0.07           0.02         0.05        -0.06        -0.62*      0.03    1.88
             (0.58)         (0.16)       (0.37)       (0.44)       (2.17)
Three-year   -0.03          -0.01        -0.03        -0.18        -0.28       0.03    2.08
             (0.32)         (0.11)       (0.29)       (1.71)       (1.32)
Five-year     0.01          -0.04        -0.03        -0.14        -0.21       0.03    1.70
             (0.13)         (0.55)       (0.46)       (1.77)       (1.35)

All from complete funds 1993 sample group. t-statistics are in parentheses.
*indicates significance at the 5% level.

C. Load/No Load Counterparts for Out-of-Sample Data

All results in Section IV were also calculated using the load/no-load counterparts of the out-of-sample performance measures, i.e., non-load-adjusted Sharpe ratios, non-load-adjusted mean monthly excess returns, and load-adjusted (using a dummy variable for loads in equation (9)) modified Jensen and four-index alphas. The results were generally the same as those reported above and are available upon request.

V. Alternative Predictor Results

The results so far indicate that Morningstar ratings do not generally predict superior fund performance but do have some predictive power for poor-performing funds. Can an investor do as well by choosing funds based on alternative predictors? To answer this question, we examine a naive predictor that uses in-sample mean monthly returns, an in-sample Sharpe ratio, an in-sample single-index alpha, and an in-sample four-index alpha.
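As a rough illustration of the in-sample predictors just listed, the sketch below computes a mean monthly return, a Sharpe ratio, and a single-index (Jensen) alpha from monthly return series. It is only a sketch under stated assumptions: the function names and all return numbers are hypothetical, the Sharpe ratio is taken to be the mean monthly excess return divided by its sample standard deviation, and the alpha is the intercept of an ordinary least squares regression of fund excess returns on the excess returns of one market index. The exact definitions used in the paper may differ. A four-index alpha would follow the same idea with four index series as regressors.

```python
from statistics import mean, stdev


def mean_monthly_return(fund):
    """Naive predictor: the plain average of monthly returns."""
    return mean(fund)


def sharpe_ratio(fund, riskfree):
    """Mean monthly excess return divided by its sample standard deviation
    (an assumed definition, used here for illustration)."""
    excess = [r - f for r, f in zip(fund, riskfree)]
    return mean(excess) / stdev(excess)


def single_index_alpha(fund, market, riskfree):
    """Intercept of an OLS regression of fund excess returns
    on market excess returns (Jensen-style alpha)."""
    y = [r - f for r, f in zip(fund, riskfree)]
    x = [m - f for m, f in zip(market, riskfree)]
    x_bar, y_bar = mean(x), mean(y)
    beta = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - beta * x_bar


# Hypothetical monthly returns in percent (illustrative only, not from the paper).
riskfree = [0.3, 0.3, 0.3, 0.3, 0.3, 0.3]
market = [1.3, -0.7, 2.3, 0.3, -1.7, 3.3]
# Built as 0.2 + 1.5 * (market excess return) + risk-free rate,
# so the single-index alpha is 0.2 by construction.
fund = [0.2 + 1.5 * (m - f) + f for m, f in zip(market, riskfree)]

naive = mean_monthly_return(fund)
sharpe = sharpe_ratio(fund, riskfree)
alpha = single_index_alpha(fund, market, riskfree)
```

Ranking funds on any one of these numbers, rather than on the Morningstar score, is what turns it into an alternative predictor in the tests that follow.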
As with the Morningstar star and score tests, we use the alternative predictors on both the seasoned funds 1992-1997 sample group and the complete funds 1993 sample group. Hence, again we have two different sets of results. Since presenting all of the results for each alternative predictor would result in an unwieldy number of additional tables, we summarize our results in Tables 5-8 for the seasoned funds 1992-1997 sample and Tables 9-12 for the complete funds 1993 sample. All the regression results reported in Section V were tested for heteroskedasticity using the White (1980) test. None of the regression residuals exhibited evidence of heteroskedasticity at the 10% level.

Footnote: We thank Stephen Brown for suggesting an examination of that question.

Footnote: Section V's results are primarily for the overall samples. Except in Table 12, we do not report the results for the samples that are organized by style or age. The alternative predictor results for the samples organized by style and age were generally similar to those presented in Section IV. They are available from the authors.

A. 1992-1997 Seasoned Funds Sample Results

1. Dummy Variable Regression Results

We rank the funds based on the in-sample alternative predictor and then put them into five groups that match the number of funds in each of the five Morningstar star groups. This way we can construct the same dummy variable regression analysis we used for the Morningstar stars.

Although there are nominally four alternative predictors, we actually have five since we use two variants for the naive predictor. The first variant allocates the rankings on the basis of the in-sample mean monthly returns: if there were 15 five-star funds and 25 four-star funds, the highest 15 funds according to their in-sample mean monthly return would receive fives and the next 25 would receive fours. In the second variant, we first examine how many Morningstar stars are given within each style group and then rank order the funds by their in-sample mean monthly return within their various style groups. For example, in the 1992 sample there are 24 aggressive growth funds of which zero funds received five stars, five funds received four stars, eight funds received three stars, six funds received two stars, and five funds received one star. Hence, we would rank the 24 aggressive growth funds by their in-sample mean monthly return and then give the top five aggressive growth funds fours, the next eight aggressive growth funds threes, etc. This way we use mean monthly returns as an alternative predictor and yet can still be sensitive to style differences.

For each of the five alternative predictors, equation (9) is then estimated for the 12 samples and for the three different out-of-sample performance measures. Hence, for each alternative predictor we calculate results that are similar in form to those in Table 1.

Footnote: We summarize these results in tables available from the authors.

Table 5 summarizes the significance level results. The left-hand column reports the number of times out of 144 coefficients (four coefficients, 12 samples, and three out-of-sample performance measures) that the predictor produces a significantly negative coefficient for γ1, γ2, γ3, or γ4. The next column reports the number of times that the predictor produces a significantly positive coefficient for γ1, γ2, γ3, or γ4. Hence, high numbers in the first column indicate a considerable amount of predictive ability for the predictor and high numbers in the second column indicate that the predictor is not very successful. The other columns indicate which of the coefficients, γ1, γ2, γ3, or γ4, are significantly negative.

Table 5's results show several interesting findings. First, the Morningstar stars are in the middle in terms of predicting future performance. The naive predictor that uses the styles and the single-index alpha predictor have very similar predictive performance to the Morningstar stars. The naive predictor, in which no adjustment is made for styles, and the four-index alpha generally do worse, and the Sharpe ratio does considerably better than the Morningstar stars. Second, for every predictor, including the Morningstar stars, the ability to predict high-performing funds is quite weak, yet the ability to predict low-performing funds is quite high. This result is consistent with those found in some other studies on performance predictability (see, e.g., Carhart (1997)) that show it is generally possible to predict losers (but not winners) in terms of mutual fund performance.

TABLE 5
Summary of the Ability of the Alternative Predictor Stars and Morningstar Stars to Forecast Out-of-Sample Performance: Significance Levels

Column (1): # of times (out of 144) the predictor produces a significantly* negative coefficient for γ1, γ2, γ3, or γ4.
Column (2): # of times (out of 144) the predictor produces a significantly* positive coefficient for γ1, γ2, γ3, or γ4.
Columns (3)-(6): # of cases (out of 36) in which the coefficient γ1 (4-star), γ2 (3-star), γ3 (2-star), or γ4 (1-star), respectively, is significant* and negative.

Predictor                                          (1)   (2)   (3)   (4)   (5)   (6)
10-year mean returns; stars allocated by style      35    14     1     2     8    24
10-year mean returns; no adjustment for styles      39    38     5     5     7    22
10-year Sharpe ratio                                49     1     3     8     9    29
10-year single-index alpha                          36     7     0     3     6    27
10-year 4-index alpha                               27    48     1     2     5    19
Morningstar                                         39    10     0     4    13    22

Samples Examined: The 12 Seasoned Fund Samples (not broken up into style categories), i.e., 1992 (1-, 3-, and 5-year), 1993 (1-, 3-, and 5-year), 1994 (1- and 3-year), 1995 (1- and 3-year), 1996 (1-year), and 1997 (1-year).
Out-of-Sample Metrics Examined: Load-Adjusted Sharpe Ratio, Non-Load-Adjusted Single-Index Alpha, Non-Load-Adjusted 4-Index Alpha.
Test Examined: The alternative predictor is used to allocate the stars. The frequency of the stars is the same as with the Morningstar stars except that they are allocated on the basis of the alternative predictor rather than the Morningstar method. With these stars, we then examine equation (9). Hence, there are 36 equations estimated (12 samples, 3 out-of-sample performance metrics) of which each equation has 4 coefficients, γ1, γ2, γ3, γ4 (not including the constant). Hence, there are 144 coefficients examined for each alternative predictor.
*Significant at the 10% level.

Tables 6 and 7 complement Table 5. Table 6 provides information on where the negative and significant cases are located with respect to the out-of-sample measures. In general, the results are spread out relatively evenly among the three out-of-sample performance measures.

Table 7 examines the relative coefficient signs instead of the significance levels, and specifically reports the number of times that the coefficient sign for highly rated funds is greater than that for funds that are two levels worse in terms of ratings. That is, it examines the number of cases (out of a total of 36) where γ0 (5-star) > γ2 (3-star), γ1 (4-star) > γ3 (2-star), or γ2 (3-star) > γ4 (1-star).

TABLE 6
Summary of the Ability of the Alternative Predictor Stars and Morningstar Stars to Forecast Out-of-Sample Performance: Significance Levels Organized by Out-of-Sample Performance Measure

Each column reports the # of times out of 48 (12 samples; 4 coefficients) the predictor produces a significantly* negative coefficient for γ1, γ2, γ3, or γ4, using the named out-of-sample performance metric.

                                                   Load-adjusted   Non-load-adjusted     Non-load-adjusted
Predictor                                          Sharpe ratio    single-index alpha    4-index alpha
10-year mean returns; stars allocated by style           9                11                   15
10-year mean returns; no adjustment for styles           7                 9                   23
10-year Sharpe ratio                                    17                21                   11
10-year single-index alpha                              11                15                   10
10-year 4-index alpha                                    5                 5                   17
Morningstar                                             15                13                   11

Samples Examined: Same as Table 5.
Out-of-Sample Metrics Examined: Same as Table 5.
Test Examined: Same as Table 5.
*Significant at the 10% level.

TABLE 7
Summary of the Ability of the Alternative Predictor Stars and Morningstar Stars to Forecast Out-of-Sample Performance: Coefficient Signs

Each column reports the # of cases out of 36 (12 samples; 3 out-of-sample performance metrics) in which the coefficient signs satisfy the stated condition.

                                                   γ0 (5-star) >   γ1 (4-star) >   γ2 (3-star) >
Predictor                                          γ2 (3-star)     γ3 (2-star)     γ4 (1-star)
10-year mean returns; stars allocated by style       14 (2)          28 (8)          34 (27)
10-year mean returns; no adjustment for styles       14 (5)          21 (7)          33 (28)
10-year Sharpe ratio                                 30 (9)          33 (18)         35 (28)
10-year single-index alpha                           20 (3)          35 (16)         33 (30)
10-year 4-index alpha                                10 (2)          20 (13)         35 (29)
Morningstar                                          18 (4)          29 (19)         36 (24)

Samples Examined: Same as Table 5.
Out-of-Sample Metrics Examined: Same as Table 5.
Test Examined: Same as Table 5.
Parentheses indicate the number of the cases in which the coefficients were as indicated and the difference in the coefficients was significant at the 10% level using a Wald Test.

The results in Tables 6 and 7 are similar to those presented in the rest of the paper. First, on the basis of these coefficient signs, the Morningstar stars do not illustrate significantly better predictive ability than the other predictors. The Morningstar stars are again in the middle in terms of their success at predicting future performance. Second, all the predictors, regardless of what type, have more ability to predict low-performing funds. In at least 90% of the cases, the γ2 (3-star) > γ4 (1-star) condition is satisfied for every predictor. Third, all predictors, with the notable exception of the Sharpe ratio, have problems in predicting high-performing funds. For most predictors, the γ0 (5-star) > γ2 (3-star) condition is satisfied 50% of the time or less.
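The allocation and regression steps used in this subsection can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation. It assumes that equation (9) regresses an out-of-sample performance measure on dummies for the four-, three-, two-, and one-star groups, with the five-star group as the omitted category (which is how the coefficients are interpreted in the text), and it relies on the fact that such a regression reduces to differences in group means. The reading of the three sign conditions in terms of the coefficients is likewise an assumption. All fund names and numbers are hypothetical.

```python
from statistics import mean


def allocate_stars(predictor, star_counts):
    """Rank funds by an in-sample predictor (higher is better) and hand out
    stars so that each star group has the same size as the Morningstar group."""
    ranked = sorted(predictor, key=predictor.get, reverse=True)
    stars, start = {}, 0
    for star in (5, 4, 3, 2, 1):
        count = star_counts.get(star, 0)
        for fund in ranked[start:start + count]:
            stars[fund] = star
        start += count
    return stars


def dummy_regression(stars, performance):
    """OLS of performance on 4-, 3-, 2-, and 1-star dummies, five-star omitted.
    With only group dummies, gamma0 is the five-star mean and each
    gamma_k is that group's mean minus gamma0."""
    groups = {}
    for fund, star in stars.items():
        groups.setdefault(star, []).append(performance[fund])
    gamma0 = mean(groups[5])  # requires at least one five-star fund
    coefs = {"gamma0": gamma0}
    for k, star in enumerate((4, 3, 2, 1), start=1):
        if star in groups:  # a group may be empty, e.g. no one-star young funds
            coefs[f"gamma{k}"] = mean(groups[star]) - gamma0
    return coefs


def sign_conditions(coefs):
    """Assumed reading of the three comparisons reported in Table 7."""
    return {
        "5-star > 3-star": coefs["gamma2"] < 0,
        "4-star > 2-star": coefs["gamma1"] > coefs["gamma3"],
        "3-star > 1-star": coefs["gamma2"] > coefs["gamma4"],
    }


# Hypothetical example: ten funds, star group sizes copied from a rating service.
predictor = {f"fund{i}": 1.0 - 0.1 * i for i in range(10)}
star_counts = {5: 1, 4: 2, 3: 4, 2: 2, 1: 1}
performance = dict(zip(predictor, [0.30, 0.28, 0.32, 0.25, 0.27,
                                   0.29, 0.31, 0.20, 0.24, 0.15]))

stars = allocate_stars(predictor, star_counts)
coefs = dummy_regression(stars, performance)
conditions = sign_conditions(coefs)
```

Repeating this for each sample and each out-of-sample measure, and counting how often each condition holds, gives tallies of the kind summarized in Tables 5-7; the significance counts additionally require standard errors, which this sketch omits.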
2. Spearman-Rho Rank Correlation Results

Table 8 summarizes the Spearman-Rho rank correlation results for the alternative predictors. The Spearman-Rho rank correlation tests are the same as those in Section IV.A.3, except that we use the alternative predictors to rank the funds instead of the Morningstar scores. Again we use the decile averages described in Section III.D. There are six samples and three out-of-sample performance metrics. Table 8's results show essentially the same findings as the dummy variable results for the alternative predictors. The Morningstar scores are similar in predictive ability to other alternative predictors. All the predictors, with the exception of the Sharpe ratio, have much higher Spearman-Rho rank correlations in the bottom five deciles than in the top five deciles, indicating that the predictors forecast low-performing funds better than high-performing funds.

Footnote: We do not use the predictor in which we allocate rankings using mean monthly returns by their style because it is impossible to rank order all the funds using this method.

TABLE 8
Spearman-Rho Summary Results for Alternative Predictors and Morningstar Scores

Each column reports the # of cases (out of 18) in which the Spearman-rho rank test is greater than 0.5 across the stated deciles.

                                                      All 10     Top 5      Bottom 5
Predictor                                             deciles    deciles    deciles
10-year mean monthly returns; no adjustment for styles    2          1          5
10-year Sharpe ratio                                     13         10          9
10-year single-index alpha                                9          4         15
10-year 4-index alpha                                     5          0          5
Morningstar score                                         9          1         15

Samples Examined: Post-1993 Seasoned Fund Samples: 1994 (1- and 3-year), 1995 (1- and 3-year), 1996 (1-year), 1997 (1-year).
Out-of-Sample Metrics Examined: Same as Table 5.
Test Examined: Spearman-Rho tests based on decile averages. We test all 10 deciles, the top 5 deciles, and the bottom 5 deciles. Since there are 6 samples with 3 out-of-sample performance metrics, there are 18 total tests for each predictor.

B. Complete Funds 1993 Sample Results

Dummy Variable Regression Analysis

For the complete funds 1993 sample, the alternative predictors are the same as those used above except that we use three years of in-sample data because the young and middle-aged funds do not have the necessary 10 years of in-sample data and all the funds must have a minimum of three years of historical returns to be rated by Morningstar. The results for the alternative predictors using the complete funds 1993 sample are presented in Tables 9-12, which provide the same kind of information that Tables 5-7 provide for the 1992-1997 seasoned funds sample. However, the number of cases is much smaller since we only have three samples (rather than 12). The results show that, unlike those from the seasoned funds sample, the Morningstar star method does significantly better than the alternative predictors at predicting future performance. Table 9 reports that, even though the alternative predictors have roughly the same number of significantly negative coefficients as the Morningstar stars, they generally produce many more significantly positive coefficients. The Morningstar method may be superior, since it does not produce nearly as many prediction errors.

TABLE 9
Summary of the Ability of the Alternative Predictor Stars and Morningstar Stars to Forecast Out-of-Sample Performance: Significance Levels

Column (1): # of times (out of 36) where the predictor produces a significantly* negative coefficient for γ1, γ2, γ3, or γ4.
Column (2): # of times (out of 36) where the predictor produces a significantly* positive coefficient for γ1, γ2, γ3, or γ4.
Columns (3)-(6): # of cases (out of 9) where the coefficient γ1 (4-star), γ2 (3-star), γ3 (2-star), or γ4 (1-star), respectively, is significant* and negative.

Predictor                                          (1)   (2)   (3)   (4)   (5)   (6)
3-year mean returns; stars allocated by style       16     6     5     3     3     5
3-year mean returns; no adjustment for styles       19     7     4     4     4     7
3-year Sharpe ratio                                 18     3     5     3     3     7
3-year single-index alpha                           19     6     4     4     4     7
3-year 4-index alpha                                20     4     4     4     4     8
Morningstar                                         17     1     0     2     8     7

*Significant at the 10% level.
Samples Examined: The 3 Complete Funds 1993 Samples (all ages included), i.e., 1993 (1-, 3-, and 5-year).
Out-of-Sample Metrics Examined: Load-Adjusted Sharpe Ratio, Non-Load-Adjusted Single-Index Alpha, Non-Load-Adjusted Four-Index Alpha.
Test Examined: The alternative predictor is used to allocate the stars. The frequency of the stars is the same as with the Morningstar stars except that they are allocated on the basis of the alternative predictor rather than the Morningstar method. With these stars, we then examine equation (9). Hence, there are 9 equations estimated (3 samples, 3 out-of-sample performance metrics) of which each equation has 4 coefficients, γ1, γ2, γ3, γ4 (not including the constant). Hence, there are 36 coefficients examined for each alternative predictor.

Moreover, Table 10 shows the significant and negative coefficients generated by the alternative predictors tend to be clustered when using the non-load-adjusted four-index out-of-sample performance metric. The Morningstar stars, by contrast, have significantly negative coefficients spread more evenly across the three out-of-sample measures.

TABLE 10
Summary of the Ability of the Alternative Predictor Stars and Morningstar Stars to Forecast Out-of-Sample Performance: Significance Levels Organized by Out-of-Sample Performance Metric

Each column reports the # of times out of 12 (3 samples; 4 coefficients) the predictor produces a significantly* negative coefficient for γ1, γ2, γ3, or γ4, using the named out-of-sample performance metric.

                                                   Load-adjusted   Non-load-adjusted     Non-load-adjusted
Predictor                                          Sharpe ratio    single-index alpha    4-index alpha
3-year mean returns; stars allocated by style            3                 2                   11
3-year mean returns; no adjustment for styles            2                 4                   12
3-year Sharpe ratio                                      3                --                   12
3-year single-index alpha                                2                 5                   12
3-year 4-index alpha                                     2                 6                   12
Morningstar                                              5                 4                    8

Samples Examined: Same as Table 9.
Out-of-Sample Metrics Examined: Same as Table 9.
Test Examined: Same as Table 9.
*Significant at the 10% level.
-- indicates an entry that is illegible in the source.

Table 11 further demonstrates the apparent success of the Morningstar stars system. The Morningstar stars system produces coefficient signs that are in line with what one would expect if they had predictive ability. On the other hand, the alternative predictors do not have such strong results, particularly in the γ0 (5-star) > γ2 (3-star) and γ1 (4-star) > γ3 (2-star) cases.

TABLE 11
Summary of the Ability of the Alternative Predictor Stars and Morningstar Stars to Forecast Out-of-Sample Performance: Coefficient Signs

Each column reports the # of cases out of 9 (3 samples; 3 out-of-sample performance metrics) in which the coefficient signs satisfy the stated condition.

                                                   γ0 (5-star) >   γ1 (4-star) >   γ2 (3-star) >
Predictor                                          γ2 (3-star)     γ3 (2-star)     γ4 (1-star)
3-year mean returns; stars allocated by style         4 (3)           3 (0)           8 (6)
3-year mean returns; no adjustment for styles         4 (4)           3 (1)           9 (8)
3-year Sharpe ratio                                   5 (3)           3 (2)           9 (8)
3-year single-index alpha                             4 (4)           3 (0)           9 (7)
3-year 4-index alpha                                  4 (4)           4 (1)           9 (7)
Morningstar                                           7 (2)           9 (8)           8 (7)

Samples Examined: Same as Table 9.
Out-of-Sample Metrics Examined: Same as Table 9.
Test Examined: Same as Table 9.
Parentheses indicate the number of the cases in which the coefficients were as indicated and the difference in the coefficients was significant at the 10% level using a Wald Test.

A natural question arises at this stage: why does the Morningstar method fare better against the alternative predictors in the complete funds 1993 sample, when its predictive abilities are very similar to the alternative predictors in the 1992-1997 seasoned funds sample? A possible answer is that the Morningstar stars are based on up to 10 years of return data: a fund that has 10 years of data or more will be judged not only on its three-year returns, but also on its five- and 10-year returns; a fund with more than five years of return data will be judged on the three- and five-year returns. However, our alternative predictors in the complete funds 1993 sample are all based on just three years of return data. Hence, for the majority of the funds (545 out of 635), Morningstar uses more information to allocate their stars than our alternative predictors.

TABLE 12
Comparison of Morningstar Stars Against an Alternative Predictor Organized by Age: Young and Seasoned Funds

Column (1): # of times out of 36 (3 samples; 3 out-of-sample performance metrics; 4 coefficients) where the predictor produces a significantly* negative coefficient for γ1, γ2, γ3, or γ4.
Column (2): # of times out of 36 where the predictor produces a significantly* positive coefficient for γ1, γ2, γ3, or γ4.
Columns (3)-(5): # of cases out of 9 (3 samples; 3 out-of-sample performance metrics) where the coefficient signs are such that γ0 (5-star) > γ2 (3-star), γ1 (4-star) > γ3 (2-star), and γ2 (3-star) > γ4 (1-star), respectively.

Predictor                                            (1)    (2)    (3)      (4)      (5)

Panel A. Young Funds (n = 90) (Funds with less than 5 years of returns as of Jan. 1993)
3-year mean returns; stars allocated by style          2      0    7 (1)    6 (0)    NA
Morningstar Stars                                      2      0    6 (1)    5 (1)    NA

Panel B. Seasoned Funds (n = 269) (Funds with 10 or more years of returns as of Jan. 1993)
3-year mean returns; stars allocated by style (a)     --     --    2 (0)    1 (0)    9 (9)
10-year mean returns; stars allocated by style (b)    --     --    8 (2)    7 (6)    9 (5)
Morningstar                                           10     --    7 (3)    7 (6)    9 (1)

*Significant at the 10% level.
Samples Examined: Same as Table 9.
Out-of-Sample Metrics Examined: Same as Table 9.
Test Examined: Same as Table 9.
(a) Star allocation based on 635 funds of which only the young or seasoned funds are tested.
(b) Star allocation based on 269 funds (seasoned fund sample).
Parentheses indicate the number of the cases in which the coefficients were as indicated and the difference in the coefficients was significant at the 10% level using a Wald Test.
-- indicates an entry that is illegible in the source.
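The table notes above refer to Wald tests of whether two dummy-variable coefficients differ. A minimal sketch of such a test follows. It is an illustration under stated assumptions rather than the authors' procedure: it assumes the dummy-variable regression has one parameter per rating group and homoskedastic errors, so the variance of the difference between two coefficients is the residual variance times the sum of the inverse group sizes, and it compares the statistic with a chi-square distribution with one degree of freedom. The function name and all numbers are hypothetical.

```python
import math
from statistics import mean


def wald_equal_coefficients(group_a, group_b, all_groups):
    """Wald statistic and p-value for H0: two rating groups have the same
    dummy coefficient, i.e. the same expected out-of-sample performance.

    all_groups lists the observations of every rating group; it supplies
    the residual variance of the dummy-variable regression.
    """
    n_obs = sum(len(g) for g in all_groups)
    n_params = len(all_groups)  # constant plus one dummy per remaining group
    ssr = sum((y - mean(g)) ** 2 for g in all_groups for y in g)
    resid_var = ssr / (n_obs - n_params)
    diff = mean(group_a) - mean(group_b)
    var_diff = resid_var * (1 / len(group_a) + 1 / len(group_b))
    wald = diff ** 2 / var_diff
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(wald / 2))
    return wald, p_value


# Hypothetical out-of-sample Sharpe ratios by star group (illustrative only).
five_star = [0.30, 0.31]
four_star = [0.28, 0.32, 0.30, 0.30]
three_star = [0.27, 0.29]
two_star = [0.20, 0.24, 0.22, 0.22]
one_star = [0.15, 0.16]
groups = [five_star, four_star, three_star, two_star, one_star]

# Is the four-star coefficient different from the two-star coefficient?
wald, p_value = wald_equal_coefficients(four_star, two_star, groups)
reject_at_10_percent = p_value < 0.10
```

A case would be counted in the parenthesized figures of Tables 7, 11, and 12 only when the sign condition holds and a test of this kind rejects equality at the 10% level.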
To explore this issue further, we constructed Table 12. In Panel A (young funds), we examine the ability of the Morningstar stars and the three-year mean return predictor that utilizes style differences to produce significantly negative coefficients. When examining only young funds, the Morningstar stars do not have an informational advantage since they use the same three years of return data history. Panel A, Table 12 shows clearly that there is very little difference in predictive ability between the Morningstar stars and the three-year mean return predictor when examining only the young funds.

In Panel B (seasoned funds), we examine the predictive ability of the Morningstar stars, the three-year mean return predictor (in which we use three years of in-sample return history), and the 10-year mean return predictor (in which we use 10 years of in-sample return history). The results illustrate and support our hypothesis. The alternative predictor that uses just three years of in-sample return data fares quite poorly relative to the Morningstar stars at predicting future performance. In nine of the 36 cases, it produces a significantly positive coefficient as compared to none for the Morningstar stars. However, when we compare the Morningstar stars to the alternative predictor that utilizes 10 years of in-sample return data, the results are quite similar. Hence, it appears that the superior ability of the Morningstar stars reported in Tables 9-11 is related to the fact that the Morningstar stars use more information than the alternative predictors based on three years of in-sample data.

C. Load/No Load Counterparts for Out-of-Sample Data

As mentioned in Section III.A, all results in Section V were also calculated using the load/no-load counterparts of the out-of-sample performance measures, i.e., non-load-adjusted Sharpe ratios, non-load-adjusted mean monthly excess returns, and load-adjusted (using a dummy variable for loads in equation (9)) modified Jensen and four-index alphas. The results were generally the same as those reported above and are available upon request.

VI. Conclusions

This paper investigates the degree to which the well-known Morningstar five-star rating system is a predictor of out-of-sample mutual fund performance. This is an important issue because several studies (e.g., Sirri and Tufano (1998) and Goetzmann and Peles (1997)) have shown that highly ranked funds attract the greatest investor cash inflow. We use a data set, based on domestic equity mutual funds, that is free from survivorship bias, adjusted for load fees, and which allows us to examine the predictive abilities of the rating system over different time horizons, periods, fund investment styles, fund ages, and with different out-of-sample performance metrics. We also compare the predictive abilities of the Morningstar rating system with those of alternative predictors: a naive predictor of in-sample historical average monthly returns, one- and four-index in-sample alphas, and in-sample Sharpe ratios.

Our investigation results in several main findings.
First, Morningstar is able to "predict" low-performing funds. Funds with fewer than three stars generally have much worse future performance than other groups. This result is relatively robust over different samples, ages of funds, styles of funds, out-of-sample performance measures, and whether load-adjusted or non-load-adjusted returns are used for the out-of-sample returns. Second, there is only weak statistical evidence that the five-star (highest-rated) funds outperform the four- and three-star funds (next-to-highest and median-rated funds). Again, these results are robust over different samples, ages, out-of-sample performance measures, load assumptions, and styles. Third, the Morningstar ratings, at best, do only slightly better than alternative predictors in foretelling future fund performance. These alternative predictors include ones that are relatively naive, such as those that use mean monthly returns, as well as Sharpe ratios, and Jensen and four-index alphas. These results suggest that other approaches to developing predictors, such as the "style" approach (e.g., Brown and Goetzmann (1997) and Sharpe (1992)), may be more informative.
Our first two results are broadly consistent with much of the mutual fund performance persistence literature: while it is relatively easy to predict poor performance, it is much more difficult to predict superior performance. Our results also suggest that investors should be very cautious about associating a highly rated fund with superior future performance. Although previous studies have shown that highly rated funds attract the bulk of investor cash inflows, our results suggest that those cash inflows are not necessarily justified by subsequent performance.

Finally, our results do not refute the Morningstar rating system. In almost all of their publications, Morningstar states that the star ratings are not predictors of future performance, but rather "achievement" marks. Many investors and mutual funds nevertheless use the ratings as indicators of future performance. Studies show that high Morningstar ratings are strongly related to large capital inflows and are well used in marketing mutual funds to the public. In summary, this research answers an important question that investors should ask: Do the star ratings actually predict out-of-sample performance?

References

Bawa, V. S., and E. B. Lindenberg. "Capital Market Equilibrium in a Mean-Lower Partial Moment Framework." Journal of Financial Economics, 5 (1977), 189-200.
Blume, M. "An Anatomy of Morningstar Ratings." Financial Analysts Journal (March/April 1998), 19-27.
Brown, S. J. "Mutual Fund Styles." Conference Proceedings from Computational Finance Conference, New York Univ. (Jan. 6-8, 1999).
Brown, S. J., and W. Goetzmann. "Performance Persistence." Journal of Finance, 50 (1995), 679-698.
________. "Mutual Fund Styles." Journal of Financial Economics, 43 (1997), 373-399.
Brown, S. J.; W. M. Goetzmann; R. G. Ibbotson; and S. Ross. "Survivorship Bias in Performance Studies." Review of Financial Studies, 5 (1992), 553-580.
Carhart, M. M. "On The Persistence Of Mutual Fund Performance." Journal of Finance, 52 (1997), 57-82.
Damato, K. "Morningstar Edges Toward One-Year Ratings." Wall Street Journal (April 5, 1996), C1.
Elton, E. J.; M. J. Gruber; and C. R. Blake. "The Persistence of Risk-Adjusted Mutual Fund Performance." Journal of Business, 69 (1996a), 133-157.
________. "Survivorship Bias and Mutual Fund Performance." Review of Financial Studies, 9 (1996b), 1097-1120.
Elton, E. J.; M. J. Gruber; S. Das; and M. Hlavka. "Efficiency with Costly Information: A Reinterpretation of Evidence from Managed Portfolios." Review of Financial Studies, 6 (1993), 1-22.
Goetzmann, W. N., and R. G. Ibbotson. "Do Winners Repeat?" Journal of Portfolio Management (Winter 1994), 9-18.
Goetzmann, W. N., and N. Peles. "Cognitive Dissonance and Mutual Fund Investors." Journal of Financial Research, 20 (1997), 145-158.
Gruber, M. J. "Another Puzzle: The Growth in Actively Managed Mutual Funds." Journal of Finance, 51 (1996), 783-810.
Hendricks, D.; J. Patel; and R. Zeckhauser. "Hot Hands in Mutual Funds: Short-Run Persistence of Relative Performance, 1974-1988." Journal of Finance, 48 (1993), 93-130.
Jaffe, C. "Rating the Raters: Flaws Found in Each Service." Boston Globe (Aug. 27, 1995), 78.
Jensen, M. "The Performance of Mutual Funds in the Period 1945-1964." Journal of Finance, 23 (1968), 389-416.
Khorana, A., and E. Nelling. "The Determinants and Predictive Ability of Mutual Fund Ratings." The Journal of Investing (Nov. 1998), 61-66.
Lallos, L. Morningstar Newsletter. Chicago, IL: Morningstar Inc. (1997).
Malkiel, B. G. "Returns from Investing in Equity Mutual Funds 1971 to 1991." Journal of Finance, 51 (1995), 783-810.
Markowitz, H. M. Portfolio Selection: Efficient Diversification of Investments. Cambridge, MA: Basil Blackwell, Inc. (1959).
Morey, M. R., and R. C. Morey. "Mutual Fund Performance Appraisals: A Multi-Horizon Perspective with Endogenous Benchmarking." Omega: The International Journal of Management Science, 27 (1999), 241-258.
Morningstar Principia Manual. Chicago, IL: Morningstar Publications (1998).
Rea, J. D., and B. K. Reid. "Trends in the Ownership Cost of Equity Mutual Funds." Investment Company Institute Perspective, 4 (Nov. 1998), 1-15.
Sharpe, W. "Mutual Fund Performance." Journal of Business, 39 (1966), 119-138.
________. "Asset Allocation: Management Style and Performance Measurement." Journal of Portfolio Management (Winter 1992), 7-19.
________. "Morningstar Performance Measures." Financial Analysts Journal (July/Aug. 1998), 21-33.
Sirri, E. R., and P. Tufano. "Costly Search and Mutual Fund Flows." Journal of Finance, 53 (1998), 1589-1622.
White, H. "A Heteroskedasticity-Consistent Covariance Matrix and a Direct Test for Heteroskedasticity." Econometrica, 48 (1980), 817-838.