Journal of Econometrics (2014), accepted manuscript.

Improved Inference in the Evaluation of Mutual Fund Performance using Panel Bootstrap Methods


David Blake*

Tristan Caulfield**

Christos Ioannidis*** and

Ian Tonks****


Two new methodologies are introduced to improve inference in the evaluation of mutual fund performance against benchmarks. First, the benchmark models are estimated using panel methods with both fund and time effects. Second, the non-normality of individual mutual fund returns is accounted for by using panel bootstrap methods. We also augment the standard benchmark factors with fund-specific characteristics, such as fund size. Using a dataset of UK equity mutual fund returns, we find that fund size has a negative effect on the average fund manager’s benchmark-adjusted performance. Further, when we allow for time effects and the non-normality of fund returns, we find that there is no evidence that even the best performing fund managers can significantly outperform the augmented benchmarks after fund management charges are taken into account.

Keywords: mutual funds, unit trusts, open-ended investment companies, performance measurement, factor benchmark models, panel methods, bootstrap methods

JEL: C15, C58, G11, G23

* Pensions Institute, Cass Business School, City University London; ** University College London; *** Department of Economics, University of Bath; **** School of Management, University of Bath

The dataset used in this paper was constructed while Ian Tonks was an ESRC Business Fellow at the UK’s Financial Services Authority in 2009 (RES-186-27-0014), and he is grateful to the FSA’s Economics of Regulation Unit for hosting this visit. The authors are grateful to George Kapetanios for invaluable advice on implementing the bootstrap methodology and to Alok Bhargava (the Guest Editor), John Nolan, and three anonymous referees for highly constructive comments that greatly improved and shortened the paper.


1. Introduction

Evidence collected over an extended period on the performance of (open-ended) mutual funds in the US (Jensen, 1968; Malkiel, 1995; Barras, Scaillet and Wermers, 2010) and unit trusts and open-ended investment companies (OEICs) in the UK (Blake and Timmermann, 1998; Cuthbertson, Nitzsche and O'Sullivan, 2008) has found that, on average, a fund manager cannot outperform the market benchmark and that any outperformance is more likely to be due to “luck” rather than “skill”. The standard approach for evaluating fund manager performance is to test it against an appropriate factor benchmark model and assess the significance of the abnormal returns from this model (Carhart, 1997). Recent evidence in Chen, Hong, Huang and Kubik (2004) (hereafter CHHK) finds that fund size has a negative effect on performance due to diseconomies of scale at the fund level (in line with Berk and Green, 2004). CHHK’s analysis applies the Fama and MacBeth (1973) method of estimating a series of cross-sectional regressions (one for each time period), averaging the estimated coefficients and testing for significance using the time-series variation in these estimates.
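To fix ideas, the Fama-MacBeth two-pass procedure can be sketched in a few lines of numpy. This is a minimal illustration on simulated data, not the paper's code or dataset; the panel dimensions and the single "size" characteristic are assumptions for the toy example.

```python
import numpy as np

def fama_macbeth(y, X):
    """Fama-MacBeth (1973) two-pass estimator on a balanced panel.

    y : (T, N) fund returns; X : (T, N, K) regressors (e.g. a constant
    and a fund characteristic such as size). One cross-sectional OLS is
    run per period; the T coefficient vectors are then averaged, with
    standard errors taken from their time-series variation.
    """
    T, N, K = X.shape
    betas = np.empty((T, K))
    for t in range(T):
        # cross-sectional regression for period t
        betas[t], *_ = np.linalg.lstsq(X[t], y[t], rcond=None)
    mean = betas.mean(axis=0)
    # NB: these standard errors are downward biased when fund effects
    # are present (the Petersen, 2009 critique discussed below)
    se = betas.std(axis=0, ddof=1) / np.sqrt(T)
    return mean, se, mean / se

# toy panel: 60 periods, 100 funds, intercept + one characteristic
rng = np.random.default_rng(0)
T, N = 60, 100
size = rng.normal(size=(T, N))
X = np.stack([np.ones((T, N)), size], axis=-1)
y = 0.5 - 0.2 * size + rng.normal(scale=0.5, size=(T, N))
coef, se, tstat = fama_macbeth(y, X)
```

The negative coefficient on the size regressor in the toy data mimics the sign CHHK report, but the magnitudes here are arbitrary.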

However, Petersen (2009) has shown that this methodology yields downward-biased standard errors in the presence of fund effects. He explains how to estimate standard errors in the presence of both fund and time effects: either parametrically, by including a time dummy for each period and then clustering standard errors by fund; or non-parametrically, by clustering on fund and time simultaneously.[1]
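The non-parametric route Petersen describes is commonly implemented with the two-way clustered covariance of Cameron, Gelbach and Miller: the fund-clustered and time-clustered covariances are added and the heteroskedasticity-robust (fund-by-time intersection) covariance subtracted. A rough numpy sketch follows; the data-generating process and all variable names are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def cluster_cov(X, u, groups):
    """One-way cluster-robust (Liang-Zeger) covariance for OLS."""
    XtX_inv = np.linalg.inv(X.T @ X)
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        score = X[groups == g].T @ u[groups == g]  # summed scores in cluster g
        meat += np.outer(score, score)
    return XtX_inv @ meat @ XtX_inv

def two_way_cluster_cov(X, u, fund, time):
    """Cameron-Gelbach-Miller: V_fund + V_time - V_intersection."""
    inter = np.array([f"{f}_{t}" for f, t in zip(fund, time)])
    return (cluster_cov(X, u, fund) + cluster_cov(X, u, time)
            - cluster_cov(X, u, inter))

# toy panel with both fund and time components in the errors
rng = np.random.default_rng(1)
N, T = 50, 20
fund = np.repeat(np.arange(N), T)
time = np.tile(np.arange(T), N)
x = rng.normal(size=N * T) + rng.normal(size=N)[fund]  # fund-correlated regressor
u = rng.normal(size=N * T) + rng.normal(size=N)[fund] + rng.normal(size=T)[time]
X = np.column_stack([np.ones(N * T), x])
y = 1.0 + 0.3 * x + u
b, *_ = np.linalg.lstsq(X, y, rcond=None)
V = two_way_cluster_cov(X, y - X @ b, fund, time)
se = np.sqrt(np.diag(V))
```

In practice one would use a library implementation (e.g. two-way clustering in a panel-regression package) rather than this loop, but the sketch makes the additive structure of the estimator explicit.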

Kosowski, Timmermann, Wermers and White (2006, hereafter KTWW) have argued that it is necessary to assess the statistical significance of fund manager performance using bootstrap methods, since the returns of individual mutual funds typically exhibit non-normal distributions (see also Fama and French, 2010, hereafter FF).
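The KTWW-style residual bootstrap can be sketched as follows: fit each fund's benchmark regression, rebuild pseudo-returns under the null of zero alpha by recombining the fitted factor component with resampled residuals, and re-estimate the alpha t-statistics on each pseudo-sample. This is a simplified illustration under assumed toy data (fat-tailed errors, zero true alpha); resampling whole time periods, as below, preserves cross-fund correlation in the spirit of FF (2010) rather than resampling each fund independently as in KTWW.

```python
import numpy as np

def bootstrap_alpha_tstats(returns, factors, n_boot=500, seed=0):
    """Residual bootstrap of fund alphas under the null alpha = 0.

    returns : (T, N) fund excess returns; factors : (T, K) benchmark factors.
    Returns the actual alpha t-stats and bootstrap draws of the
    cross-sectional maximum t-stat (for 'best fund' inference).
    """
    rng = np.random.default_rng(seed)
    T, N = returns.shape
    X = np.column_stack([np.ones(T), factors])

    def alpha_t(R):
        b, *_ = np.linalg.lstsq(X, R, rcond=None)     # (K+1, N) coefficients
        e = R - X @ b
        s2 = (e ** 2).sum(axis=0) / (T - X.shape[1])
        var_a = s2 * np.linalg.inv(X.T @ X)[0, 0]     # variance of each alpha
        return b[0] / np.sqrt(var_a)

    beta, *_ = np.linalg.lstsq(X, returns, rcond=None)
    resid = returns - X @ beta
    fitted_null = X[:, 1:] @ beta[1:]                  # factor part, alpha set to 0

    t_actual = alpha_t(returns)
    max_t = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, T, size=T)               # resample time periods
        max_t[i] = alpha_t(fitted_null + resid[idx]).max()
    return t_actual, max_t

# toy data: 3-factor benchmark, fat-tailed errors, zero true alpha
rng = np.random.default_rng(2)
T, N, K = 120, 50, 3
factors = rng.normal(size=(T, K))
returns = factors @ rng.normal(size=(K, N)) + rng.standard_t(df=5, size=(T, N))
t_actual, max_t = bootstrap_alpha_tstats(returns, factors)
p_max = (max_t >= t_actual.max()).mean()  # bootstrap p-value for the best fund
```

Comparing the best fund's actual t-stat against the bootstrap distribution of the maximum t-stat is what separates "skill" from the "luck" expected among the best of many funds under the null.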

In this paper, we will assess the performance of a panel of mutual funds, allowing for the role of fund-specific characteristics, such as fund size, fund charges, and fund family membership.

We estimate a panel model using fixed effects and time dummies with standard errors clustered by fund. In addition, acknowledging that fund returns are not normally distributed, we generate a series of (non-parametric and parametric) bootstrap returns from the benchmark models to allow for appropriate statistical inference in the presence of non-normal fund returns.

[1] Standard errors have been correctly computed in the econometrics literature for decades under different assumptions on the stochastic properties of the errors (e.g., Hsiao, 1986; Bhargava, 1987).

The structure of the paper is as follows. Section 2 reviews the existing approaches to measuring mutual fund performance and shows how these approaches can be improved using bootstrap methods in a panel framework. Section 3 discusses the dataset we will be using. The results are presented in Section 4, while Section 5 concludes.

2. Measuring mutual fund performance

2.1 Measuring performance using alternative benchmark models