In Defense Of (Intelligent) Backtesting

Backtesting is a necessary evil, but how evil is it, and how should necessity shape the way we use it?

These and related questions come into sharp relief after reading a recent article in Significance, a magazine published by the Royal Statistical Society. “Financial investment strategies are often designed and tested using historical market data. But this can frequently give rise to ‘optimal’ strategies that are statistical mirages and perform poorly out in the real world,” write the authors, two professors with backgrounds in science and the mathematical sciences, in “How ‘backtest overfitting’ in finance leads to false discoveries.”

The article lays out a skeptical overview of backtesting, citing several well-known traps for practitioners. One of the central risks is overfitting, which is essentially fine-tuning a model so that the results closely match the desired outcome. The authors warn that “trawling again and again through historical market data in a bid to identify an ‘optimal’ approach will often lead to a dead end.”
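The mechanics of backtest overfitting are easy to reproduce. The sketch below is a hypothetical illustration, not drawn from the article: it generates purely random “strategies” on purely random market data, then selects the one with the best in-sample record. The apparent skill typically evaporates out of sample, which is the trap the authors describe.

```python
# Illustrative sketch of backtest overfitting (all names and parameters
# here are assumptions for the demo, not from the Significance article).
import numpy as np

rng = np.random.default_rng(42)

n_days, n_strategies = 500, 200
# Daily returns of a market with no exploitable pattern: pure noise.
market = rng.normal(0.0, 0.01, size=n_days)
# Each "strategy" is a random long/short position on each day.
positions = rng.choice([-1, 1], size=(n_strategies, n_days))
strat_returns = positions * market  # shape: (n_strategies, n_days)

def sharpe(r):
    """Annualized Sharpe ratio of a daily return series."""
    return r.mean() / r.std() * np.sqrt(252)

# Split the sample and pick the "optimal" strategy in-sample.
in_sample = strat_returns[:, :250]
out_sample = strat_returns[:, 250:]
best = max(range(n_strategies), key=lambda i: sharpe(in_sample[i]))

print(f"best in-sample Sharpe:   {sharpe(in_sample[best]):.2f}")
print(f"same rule out-of-sample: {sharpe(out_sample[best]):.2f}")
```

Because the best of 200 random records is selected after the fact, the in-sample Sharpe ratio looks impressive by construction, while the out-of-sample figure is just another random draw.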

No one should dismiss or minimize the risk. Indeed, it’s likely that only a small fraction of backtests delivers results that genuinely inform. But with careful thought, and a clear understanding of what’s likely to work and what isn’t, backtesting can be a productive research tool. This line of analysis, in other words, isn’t nearly as bleak as the authors suggest.

To be sure, backtesting can easily be abused and it surely is, perhaps routinely in some circles. Backtesting per se isn’t the issue; rather, it’s how you backtest that matters. But some of the more egregious mistakes are easily avoided. Common sense helps, as do realistic expectations for working in a realm where blockbuster ideas are probably no longer possible.

In developing realistic expectations, it helps to consider when backtesting shines brightest as a tool for developing informed perspective. That starts with what is perhaps the simplest of investing backtests: developing a market index that tracks some corner of the financial markets.

Take the S&P 500 Index, for instance. Once upon a time it was unclear, if not wholly unknown, what returns could be generated with an investment in the “stock market” through time. By some accounts, earning a profit through buying and holding “the market” was destined to fail.

That began to change in the 1920s, when Edgar Lawrence Smith published a study that tested performance for broad measures of equities and fixed-income securities.

Market analysis went full-blown academic and quantitative with a seminal backtest in the early 1960s, when professors James Lorie and Lawrence Fisher forever changed expectations for equity market performance. In an early use of computing power to crunch the numbers, they demonstrated quantitatively, for the first time, that stocks outperformed bonds over the long run.

On the basis of that discovery, a critical piece of asset-allocation insight emerged, one based on historical evidence rather than intuition, guesswork, and manual efforts a la Smith. As Peter Bernstein explained in Capital Ideas: The Improbable Origins of Modern Wall Street, Lorie and Fisher reported the results in the Journal of Business in 1964.

“The article was a bombshell,” Bernstein writes. “Academics and practitioners alike were astonished to find that an investor who had put $1,000 into the market in 1926, had reinvested all dividends received, had paid no taxes, and had held on until the end of 1960 would have seen the original $1,000 grow to nearly $30,000 – a gain on the order of 9 percent a year.”
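The compounding arithmetic behind Bernstein’s figures is easy to check. A quick sketch, assuming a 35-year horizon (the start of 1926 through the end of 1960); the implied annual rate works out slightly above 10 percent, in the same ballpark as the rounded “on the order of 9 percent”:

```python
# Sanity check of the compound-growth arithmetic in Bernstein's example.
# Assumption: a 35-year horizon (start of 1926 through end of 1960).
start, end, years = 1_000, 30_000, 35
cagr = (end / start) ** (1 / years) - 1
print(f"implied annual return: {cagr:.1%}")  # roughly 10%
```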

As a rule of thumb, the idea that stocks will outperform bonds in the long run remains a staple for investment design – an insight drawn from a famous 1964 backtest, and one that’s held up rather convincingly in out-of-sample testing ever since.

Does that mean that all backtesting is infallible? Hardly. In fact, it’s fair to say that most investment-related backtesting is misguided at best. By contrast, Lorie and Fisher picked the low-hanging fruit of their age, when insight into markets was limited and so the opportunity for deep discovery was relatively high.

Backtesting has advanced by light years since then, and not always for the better. The sheer number of backtests run, and the insights already identified, all but assures that the opportunity to advance investment knowledge recedes by the year.

That doesn’t mean you should avoid backtesting. At a basic level it can be enormously useful, particularly when building customized portfolios that incorporate a particular set of rules and assumptions.
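As a hypothetical sketch of what such a rules-based exercise might look like, the example below runs a simple 200-day moving-average rule on synthetic prices. The rule, the window, and every parameter are illustrative assumptions, not a recommendation:

```python
# Minimal sketch of a rules-based backtest on synthetic data -- the kind
# of customized, assumption-driven exercise described above. The 200-day
# moving-average rule and all parameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
# Synthetic price path: geometric random walk with mild upward drift.
prices = 100 * np.exp(np.cumsum(rng.normal(0.0003, 0.01, 2_000)))

window = 200
# Rolling mean of the trailing `window` prices (inclusive of today).
ma = np.convolve(prices, np.ones(window) / window, mode="valid")
# Rule: hold the asset when price sits above its moving average, else cash.
signal = (prices[window - 1:] > ma).astype(float)

daily_ret = np.diff(prices[window - 1:]) / prices[window - 1:-1]
strategy_ret = signal[:-1] * daily_ret  # yesterday's signal, today's return

print(f"buy-and-hold total return: {np.prod(1 + daily_ret) - 1:.1%}")
print(f"MA-rule total return:      {np.prod(1 + strategy_ret) - 1:.1%}")
```

The point isn’t the result on this particular random path; it’s that the rules and assumptions are explicit, which is what separates a disciplined backtest from a data-mined mirage.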

Backtesting, in short, remains a powerful, productive tool, assuming it’s used wisely, in part by avoiding the temptation to extract econometric blood out of statistical stones.

In a follow-up piece I’ll outline some best practices for using backtesting sensibly. As a preview, expecting to find a silver bullet is expecting too much. Fortunately, there’s still room for valid research if you hew to some basic rules and don’t confuse quantitative analytics with a magic wand.

Learn To Use R For Portfolio Analysis
Quantitative Investment Portfolio Analytics In R:
An Introduction To R For Modeling Portfolio Risk and Return

By James Picerno