Replicating Indexes In R With Style Analysis: Part I

In the quest for clarity in portfolio analytics, Professor Bill Sharpe’s introduction of returns-based style analysis was a revelation. By applying statistical techniques to reverse engineer investment strategies using historical performance data, style analysis offers a powerful, practical tool for understanding the source of risk and return in portfolios. The same analytical framework can be used to replicate indexes with ETFs and other securities, providing an intriguing way to invest in strategies that may otherwise be unavailable.

Imagine that there’s a hedge fund or managed futures portfolio that you’d like to own, but for one reason or another it’s inaccessible. Perhaps the minimum investment is too high or the fund is closed. Or maybe you prefer to build your own to keep costs down or maintain tighter control over risk. If the returns are published, even with a short lag, you can still jump on the bandwagon by using style analysis to create a rough statistical approximation of the strategy’s asset allocation.

Any index, in theory, can be replicated, which opens up a world of opportunity. Even if you’re not interested in investing per se, decomposing key indexes through style analysis offers valuable tactical and strategic information. For example, deconstructing widely followed hedge fund or CTA benchmarks provides the basis for quasi-real-time analysis of investment trends in the alternative investment space. In turn, the analysis can offer useful perspective on how manager preferences for asset classes evolve in global macro or managed futures strategies.

Let’s run through a simple example of how to estimate weights for an index through style analysis. To illustrate the process clearly in Part I of this two-part series, I’ll start by reverse engineering an index that’s already fully transparent: the S&P 500.

From a practical standpoint there’s no need to decompose the S&P since its components are widely known and you can readily invest in the index through low-cost proxy ETFs and mutual funds. But let’s pretend that the S&P 500 is an exotic benchmark and its design rules are a mystery. All we have to work with: the S&P’s daily returns and a vague understanding that 11 equity sectors (financials, energy, etc.) drive the S&P’s risk and return profile.

Fortunately, we have access to ETF proxies for those 11 sectors. And thanks to style analysis, these puzzle pieces can be combined into a replicated version of the S&P 500 built from the 11 funds.

The basic procedure is to regress the S&P’s historical returns on a set of relevant reference indexes. To keep the result long-only and unlevered, we constrain the coefficients (the portfolio weights) to be non-negative and to sum to 1.
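In symbols (the notation here is assumed for illustration, not taken from the original analysis), with $r_t$ the target index’s return on day $t$, $x_{i,t}$ the return of reference fund $i$ on that day, and $w_i$ that fund’s weight, the constrained regression is:

```latex
\min_{w_1,\dots,w_N} \; \sum_{t=1}^{T} \Big( r_t - \sum_{i=1}^{N} w_i \, x_{i,t} \Big)^2
\quad \text{subject to} \quad
\sum_{i=1}^{N} w_i = 1, \qquad w_i \ge 0 \;\; (i = 1, \dots, N)
```

Writing $X$ for the $T \times N$ matrix of fund returns and $r$ for the vector of target returns, the objective expands to $w^\top X^\top X w - 2\, r^\top X w + r^\top r$. It is quadratic in $w$ and every constraint is linear, which is why a quadratic programming solver is a natural fit.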

There are several ways to crunch the numbers, including off-the-shelf software packages that do all the heavy lifting for you. If you prefer to go behind the curtain to 1) understand how the analytics work and 2) gain more control over the results, it’s time to fire up R. (Much of what follows, by the way, is inspired and facilitated by the FactorAnalytics package.)

There are a number of ways to estimate weights via style analysis. In this example I use quadratic programming via the solve.QP function from the quadprog package. If you’re curious, here’s a basic R setup I wrote for a one-period analysis.
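For readers who want something to run right away, here is a minimal sketch of a one-period estimate. It is not the author’s linked setup. It assumes the CRAN package quadprog (which provides solve.QP) is installed, and the helper name style_weights and the synthetic fund returns are hypothetical. solve.QP minimizes -d'w + 1/2 w'Dw subject to t(A) %*% w >= b, so setting D = X'X and d = X'r turns it into the constrained least-squares problem described above.

```r
# Sketch: one-period returns-based style analysis via quadratic programming.
# Assumes the CRAN package quadprog (which provides solve.QP) is installed.
library(quadprog)

# Hypothetical helper: long-only, fully invested weights that minimize the
# sum of squared differences between target returns r and X %*% w.
style_weights <- function(r, X) {
  X <- as.matrix(X)
  n <- ncol(X)
  # solve.QP minimizes -d'w + 1/2 w'Dw. With D = X'X and d = X'r this equals
  # half of ||r - Xw||^2 minus a constant, so both have the same minimizer.
  Dmat <- crossprod(X)
  dvec <- drop(crossprod(X, r))
  # Constraints are t(Amat) %*% w >= bvec; the first meq = 1 is an equality.
  # Column 1: sum(w) == 1 (fully invested). Columns 2..n+1: w_i >= 0 (long-only).
  Amat <- cbind(rep(1, n), diag(n))
  bvec <- c(1, rep(0, n))
  sol <- solve.QP(Dmat, dvec, Amat, bvec, meq = 1)
  # Clip tiny negative values caused by round-off, then renormalize to 1.
  w <- pmax(sol$solution, 0)
  setNames(w / sum(w), colnames(X))
}

# Usage on synthetic data (not real ETF returns): four hypothetical funds.
set.seed(1)
n_days <- 252
X <- matrix(rnorm(n_days * 4, mean = 0.0004, sd = 0.01), ncol = 4,
            dimnames = list(NULL, c("fundA", "fundB", "fundC", "fundD")))
true_w <- c(0.4, 0.3, 0.2, 0.1)
# The target is a fixed mix of the funds plus a little idiosyncratic noise.
r <- drop(X %*% true_w) + rnorm(n_days, sd = 0.001)
w_hat <- style_weights(r, X)
print(round(w_hat, 3))
```

Because the synthetic target is built from known weights, the estimates should land close to 0.4, 0.3, 0.2 and 0.1; with real index data there is no such answer key.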

In terms of ETFs, the target index is represented by SPDR S&P 500 (SPY); you can find a list of the 11 sector funds here.

For this example I used daily returns from the end of 2010 through last week’s close (Oct. 6), with the first asset-mix estimate arriving a year later. From there, I re-estimated the weights once a year (every 252 trading days). Here’s how the replicated SPY portfolio compares with the genuine article:
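The annual re-estimation schedule can be sketched as a walk-forward loop. Two details are assumptions, since the text doesn’t spell them out: each year’s weights are estimated on the trailing 252 days and then held out of sample for the next 252 days, and the replica is rebalanced daily to those fixed weights. The function names and synthetic data are hypothetical, and quadprog must be installed.

```r
# Sketch: annual walk-forward re-estimation of style weights.
# Assumes the CRAN package quadprog is installed.
library(quadprog)

# Hypothetical helper: long-only, fully invested style weights via solve.QP.
style_weights <- function(r, X) {
  n <- ncol(X)
  sol <- solve.QP(Dmat = crossprod(X), dvec = drop(crossprod(X, r)),
                  Amat = cbind(rep(1, n), diag(n)), bvec = c(1, rep(0, n)),
                  meq = 1)
  w <- pmax(sol$solution, 0)
  setNames(w / sum(w), colnames(X))
}

# Hypothetical helper: estimate weights on the trailing `window` days, hold
# them (rebalanced daily) for the next `step` days, then re-estimate.
rolling_replica <- function(r, X, window = 252, step = 252) {
  X <- as.matrix(X)
  n_obs <- length(r)
  replica <- rep(NA_real_, n_obs)  # no replica return before the first estimate
  weights <- list()
  start <- window + 1              # first day that can use an estimate
  while (start <= n_obs) {
    est <- (start - window):(start - 1)
    w <- style_weights(r[est], X[est, , drop = FALSE])
    hold <- start:min(start + step - 1, n_obs)
    replica[hold] <- drop(X[hold, , drop = FALSE] %*% w)
    weights[[length(weights) + 1]] <- w
    start <- start + step
  }
  list(replica = replica, weights = do.call(rbind, weights))
}

# Usage on synthetic data: four years of daily returns for three funds.
set.seed(42)
n_days <- 252 * 4
X <- matrix(rnorm(n_days * 3, mean = 0.0003, sd = 0.01), ncol = 3,
            dimnames = list(NULL, c("fundA", "fundB", "fundC")))
r <- drop(X %*% c(0.5, 0.3, 0.2)) + rnorm(n_days, sd = 0.002)
out <- rolling_replica(r, X)
print(round(out$weights, 3))           # one row per annual estimate
print(cor(r, out$replica, use = "complete.obs"))
```

With four years of data, the first year is used only for estimation, so the sketch produces three annual weight estimates and a replica series that starts on day 253. Holding weights fixed and rebalancing daily is a simplification; other choices, such as letting weights drift between re-estimates, are equally reasonable.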

It’s not perfect, but it’s close. The correlation between the daily returns of the two indexes is 0.72 (a perfect match would put the correlation at 1.0; a complete absence of correlation would put it at 0.0). Looking back over the sample period, the estimated weight for any one of the 11 sector funds ranged from 0 to roughly 22%.
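The two diagnostics quoted above, the correlation of daily returns and the range of estimated weights, take only a few lines of base R. The helper below is a hypothetical illustration run on synthetic data, not a reproduction of the SPY series behind the 0.72 figure.

```r
# Hypothetical helper: the two fit diagnostics quoted in the text.
summarize_replica <- function(target, replica, weights) {
  ok <- complete.cases(target, replica)  # drop days without a replica return
  list(
    # 1 = perfect co-movement, 0 = no linear relationship
    correlation = cor(target[ok], replica[ok]),
    weight_range = range(weights)        # smallest and largest weight seen
  )
}

# Usage on synthetic data (not the SPY series discussed above).
set.seed(7)
target <- rnorm(500, mean = 0, sd = 0.01)
replica <- c(rep(NA, 100), target[101:500] + rnorm(400, sd = 0.005))
weights <- rbind(c(0.00, 0.10, 0.22, 0.68),
                 c(0.05, 0.15, 0.20, 0.60))
res <- summarize_replica(target, replica, weights)
print(res)
```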

Keep in mind that this replication example was the financial-engineering equivalent of shooting fish in a barrel. That was intentional, to illustrate the process for an outcome we generally knew in advance. In this case, it was clear from the get-go that 11 sector funds would explain the lion’s share of the S&P 500’s risk and return variation. Replicating other indexes, however, requires more work.

Estimating weights for, say, a hedge fund index that’s opaque beyond its performance history requires subjective decisions about which set of benchmarks or funds to use in the regression. Fortunately, there’s a wide range of ETFs that provide the raw material to replicate most strategies. Nonetheless, it’s fair to say that this process generally requires a mix of art and science.

In the example above, most of the effort was science. In Part II of this series I’ll tackle a more ambitious subject that requires more art by attempting to replicate a hedge fund index via a set of ETFs.
