Backtests have become the weapon of choice for rationalizing various forms of tactical asset allocation, which has grown increasingly popular as a risk-management tool since the 2008 crash. The hazards of backtesting (studying how a strategy performed in the past) are well known, which leads some folks to shun the concept entirely. But that’s going too far.
In some respects, every investment plan owes a debt to some type of backtesting. That’s true even for a buy-and-hold strategy, which assumes that the future will deliver gains on par with what was earned in the past. The proper lesson is to design robust backtests, which demands close attention to detail. Easier said than done, of course, in part because the pitfalls can be subtle. Here are three that routinely trip up the novice and perhaps even some experienced investors:
1) the use of total-return prices for technical signals
2) failing to lag signals, which introduces look-ahead bias
3) overlooking the importance of neutral signals for computing backtest results
The good news is that these traps are easily avoided. But there’s a catch: you have to be aware of the hazards. With that in mind, let’s briefly review these backtesting snares with some simple examples.
Total return data. Imagine that you’ve created what you think of as a winning investment strategy that’s based on two signals: a) the ratio for a set of short and long moving averages; b) the trailing return for a rolling x-day window. The results look encouraging, but the upbeat outcome may be an illusion if the calculations use total return prices.
Why? Consider a mutual fund that’s unchanged on the day but dispenses a hefty distribution at the close of trading. Imagine that this fund is priced at $10 a share and it spits out a 50-cent-per-share payout. Although the underlying portfolio value was unchanged on the day, the mutual fund’s price falls by 50 cents to $9.50 to compensate for the distribution. The net result for shareholders: the value of their holdings in the fund is unchanged on the day. The 50-cent-per-share drop is offset by a 50-cent distribution. In short, a net wash.
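The arithmetic of that net wash can be sketched in a few lines of Python. This is only an illustration of the $10 example above; the function names are made up for the purpose.

```python
def price_return(prev_close, close):
    """Return based on the quoted share price alone."""
    return close / prev_close - 1


def total_return(prev_close, close, distribution):
    """Return that counts the cash distribution as part of the day's result."""
    return (close + distribution) / prev_close - 1


# The fund from the example: $10.00 before, $9.50 after a $0.50 payout.
price_only = price_return(10.00, 9.50)
with_payout = total_return(10.00, 9.50, 0.50)

print(f"price-only return: {price_only:.2%}")
print(f"return including the payout: {with_payout:.2%}")
```

The price-only figure shows a loss of roughly 5% even though the shareholder is no worse off.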
It’s a routine affair in day-to-day market activity but it’s a trap if you’re looking at a fund’s technical profile without adjusting for distributions. Let’s say that the 50-cent price decline pushes the fund into negative territory in terms of the short/long moving-average ratio and trailing x-day return. On the surface, this looks like a sell signal when in fact it’s nothing of the sort since the fund’s portfolio value hasn’t changed.
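Here is a minimal sketch of how that false signal can arise. The price history, the 2-day and 4-day moving-average windows, and the 3-day trailing window are all hypothetical choices for illustration, and the adjustment shown (adding cumulative distributions back to the price, with no reinvestment) is just one simple way to neutralize a payout.

```python
def moving_average(values, window):
    """Average of the most recent `window` values."""
    return sum(values[-window:]) / window


def ma_ratio(values, short, long):
    """Short moving average divided by long moving average (below 1 = bearish)."""
    return moving_average(values, short) / moving_average(values, long)


def trailing_return(values, days):
    """Return over the last `days` observations."""
    return values[-1] / values[-1 - days] - 1


# Hypothetical fund: flat at $10, then a $0.50 distribution on the last day.
prices = [10.00, 10.00, 10.00, 10.00, 10.00, 9.50]
distributions = [0.0, 0.0, 0.0, 0.0, 0.0, 0.50]

# Add cumulative distributions back so the payout no longer looks like a loss.
adjusted = []
paid_so_far = 0.0
for price, payout in zip(prices, distributions):
    paid_so_far += payout
    adjusted.append(price + paid_so_far)

raw_ratio = ma_ratio(prices, short=2, long=4)
raw_trailing = trailing_return(prices, days=3)
adj_ratio = ma_ratio(adjusted, short=2, long=4)
adj_trailing = trailing_return(adjusted, days=3)

print("unadjusted:", round(raw_ratio, 4), round(raw_trailing, 4))
print("adjusted:  ", round(adj_ratio, 4), round(adj_trailing, 4))
```

On the unadjusted series both indicators turn bearish on the payout date; on the adjusted series neither moves, which matches the fact that the portfolio value didn’t change.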
The solution is to use price data that strips out distributions. If you don’t make that adjustment, your backtests using technical signals are probably faulty. Keep in mind too that the total return price histories aren’t real in the sense that the prices have been retroactively adjusted down to compensate for dividends, capital gains, etc. In other words, total return prices weren’t available in real time through history. Ignoring this issue runs the risk that your backtests are telling lies.
Lagged signals and look-ahead bias. Look-ahead bias is another common mistake, one that can turn a sow’s ear into a silk purse, if only on paper. There are many variations of this trap, depending on the complexity of the strategy, but the basic form can be illustrated with a simple example.
Take a strategy that issues a “sell” signal when price falls below an x-day moving average and a “buy” when price rises above that average. Let’s also assume that we’re using end-of-day closing prices. You test the strategy and discover that it delivers a strong performance through time. But you forget one small item: the end-of-day signals aren’t available until after the market closes. In other words, calculating returns for a real-world version of the strategy requires using lagged “buy” and “sell” signals.
One solution: assume a one-day lag. A “sell” signal is issued at Monday’s close, which translates to assuming that the security was sold at the following day’s close.
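A rough sketch of the difference, using made-up closing prices and a 3-day moving average. Several things here are assumptions for illustration: a “sell” means moving to cash, the `lag` parameter is hypothetical, and returns are measured close to close. With that convention, `lag=0` is the look-ahead version (today’s signal earns today’s return), `lag=1` assumes you can trade at the close that generates the signal, and `lag=2` corresponds to trading at the following day’s close.

```python
def sma(values, window):
    """Simple moving average; None until `window` observations exist."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def signals(closes, window):
    """1 = 'buy' (close above its average), 0 = 'sell' (otherwise, or no average yet)."""
    averages = sma(closes, window)
    return [1 if a is not None and c > a else 0 for c, a in zip(closes, averages)]


def strategy_returns(closes, window, lag):
    """Daily strategy returns, applying the signal from `lag` days earlier."""
    sig = signals(closes, window)
    rets = []
    for t in range(1, len(closes)):
        day_ret = closes[t] / closes[t - 1] - 1
        source = t - lag
        position = sig[source] if source >= 0 else 0  # in cash before any signal exists
        rets.append(position * day_ret)
    return rets


def cumulative(rets):
    """Compound a list of periodic returns into one total return."""
    total = 1.0
    for r in rets:
        total *= 1 + r
    return total - 1


# Hypothetical closes, chosen only to make the effect visible.
closes = [100, 102, 101, 104, 103, 106, 102, 105, 107, 104]

no_lag = cumulative(strategy_returns(closes, window=3, lag=0))
one_day = cumulative(strategy_returns(closes, window=3, lag=1))
two_day = cumulative(strategy_returns(closes, window=3, lag=2))

for label, value in (("lag=0", no_lag), ("lag=1", one_day), ("lag=2", two_day)):
    print(f"{label}: {value:.2%}")
```

On this toy series the unlagged version shows a gain while the one-day-lag version shows a loss. The size and even the direction of the gap will vary with real data, so run the comparison on the strategy you’re actually testing.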
How much difference will such a seemingly minor change make in a strategy’s results? A lot. Indeed, many strategies that look wonderful in backtests turn into dogs after correcting for look-ahead bias.
Neutral signals. This is an especially subtle problem because it’s counterintuitive in some respects.
The problem arises when there’s a gray area with one or more trading signals. For instance, let’s say you’re using two signals to determine if the current climate for an asset is bullish or bearish. A “buy” is when both signals are bullish; a “sell” is when both are bearish. If there’s a split decision (one is bullish, the other bearish), the signal is neutral, which is to say that the previous signal holds until both signals indicate a decisive change, one way or the other.
As an example, suppose both signals issued a “buy” on the first trading day of the month. Two weeks later one of the signals turns bearish but there’s no confirmation in the other signal, which continues to align with a bullish reading. The net result: we no longer have a “buy” signal, but there’s no “sell” signal either. In that case, the previous signal, a “buy,” remains in force until a “sell” signal arrives.
Obvious? Well, sure, once we spell it out and are aware of the subtlety. But designing this nuance into the code can trip up a rookie. The solution: generate a historical record of “buy” and “sell” signals and monitor the net result via a “position” signal. A standard system is to generate a “1” for “buy”, “0” for neutral, and “-1” for “sell” in the “position” data. By contrast, a common mistake is to only calculate the “buy” signals and assume that the absence of a “buy” is the equivalent of “sell”. Not necessarily, but that won’t be obvious unless you compute a separate set of “sell” and “neutral” signals.
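One way to sketch the idea, assuming two hypothetical bullish/bearish indicators stored as lists of booleans. The function names are invented for illustration, and carrying the last decisive signal forward through neutral days is the rule described above rather than the only possible design.

```python
def raw_signal(a_bullish, b_bullish):
    """1 = both bullish ('buy'), -1 = both bearish ('sell'), 0 = split (neutral)."""
    if a_bullish and b_bullish:
        return 1
    if not a_bullish and not b_bullish:
        return -1
    return 0


def positions(raw, start=0):
    """Carry the last decisive signal forward through neutral (0) days."""
    held = start
    out = []
    for s in raw:
        if s != 0:
            held = s
        out.append(held)
    return out


def naive_positions(raw):
    """The common mistake: anything that isn't a 'buy' is treated as a 'sell'."""
    return [1 if s == 1 else -1 for s in raw]


# Hypothetical indicator readings over five days (True = bullish).
signal_a = [True, True, False, False, True]
signal_b = [True, True, True, False, False]

raw = [raw_signal(a, b) for a, b in zip(signal_a, signal_b)]
held = positions(raw)
naive = naive_positions(raw)

print("raw:     ", raw)
print("position:", held)
print("naive:   ", naive)
```

The two position series disagree on the third and fifth days, when the indicators split: the carry-forward version keeps the prior position while the naive version treats each split as a “sell.” The fifth day only looks the same because the position already held is a “sell.”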
What’s the relevance? Results. A backtest that lumps “neutral” in with “buy” or “sell” signals can and usually does dispense substantially different results vs. a test that recognizes the distinction. OK, maybe you want to blur the lines for tactical reasons. That’s fine. The danger arises when the analyst doesn’t spot the difference in advance.
These are hardly the only pitfalls in backtesting, but they’re relatively common and easily avoided. The question is whether these quantitative stumbles have skewed results in some of the more influential backtests that have found a wide audience in recent years. The answer: unclear until (and unless) we can reproduce the research. Unfortunately, most of the backtests that make the rounds these days don’t provide the accompanying code. That’s one more reason why it’s essential to crunch the numbers directly before making substantial monetary commitments to a given strategy.
As President Reagan famously advised, Trust but Verify. That’s a good policy for geopolitical negotiations and for backtesting investment strategies.