Statistical arbitrage strategies often seek to exploit temporary mispricings between assets that, over time, tend to exhibit stable relationships. Triplets trading, a variation of traditional pairs trading, extends this approach by analyzing the joint dynamics of three assets. The core idea is to build a linear combination of their prices that yields a synthetic spread expected to display mean-reverting behavior. Trading decisions rest on the assumption that deviations from this equilibrium will eventually correct, offering opportunities for profit.
In this project, the analysis focuses on three exchange-traded funds: the SPDR S&P 500 ETF Trust (SPY), the iShares Core S&P 500 ETF (IVV), and the Vanguard S&P 500 ETF (VOO), all of which track the S&P 500 index. Despite having the same underlying benchmark, these ETFs are issued by different providers and can exhibit slight pricing discrepancies due to differences in expense ratios, liquidity, and dividend treatment. The strategy aims to identify moments when the price divergence between these ETFs exceeds typical levels, signaling potential opportunities for statistical arbitrage.
The project is structured as follows:
- Time period selection and preliminary data analysis;
- Implementation of the Johansen multivariate approach to cointegration;
- Implementation of the trading strategy.
A crucial component of the model’s development is the selection of an appropriate time period for analysis. Since the statistical relationships between the considered ETFs are not constant over time, the stability of the spread, and therefore the reliability of the trading signals, is highly dependent on the timeframe chosen for model calibration.
The period considered for the analysis runs from June 1, 2016 to December 30, 2023. This period spans roughly seven and a half years and includes the outbreak of the Covid-19 pandemic. Even so, no evident sign was detected of structural breaks or regime shifts that would undermine the stability of the relationships among the three series.
The sample was divided into a training set and a test set, with the split on September 7, 2021. Data up to this date was used to estimate the models that define the trading spreads. The data after it was used to test the performance of the triplets trading strategy out-of-sample.
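The date-based split can be sketched with pandas. The price frame below is a hypothetical placeholder and not the data used in the analysis. It is built from a simulated common random walk plus small ETF-specific noise, on business days, ignoring exchange holidays. Assigning September 7, 2021 itself to the training set is an assumption based on the phrase "data up to this date".

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder prices on business days over the sample period;
# in practice these would be the adjusted closes of SPY, IVV and VOO.
dates = pd.bdate_range("2016-06-01", "2023-12-30")
rng = np.random.default_rng(0)
common = 200 + np.cumsum(rng.normal(0, 1, len(dates)))  # shared "index" random walk
prices = pd.DataFrame(
    {
        "SPY": common + rng.normal(0, 0.05, len(dates)),
        "IVV": 1.003 * common + rng.normal(0, 0.05, len(dates)),
        "VOO": 0.92 * common + rng.normal(0, 0.05, len(dates)),
    },
    index=dates,
)

split_date = pd.Timestamp("2021-09-07")
train = prices[prices.index <= split_date]  # calibration sample (split date included, by assumption)
test = prices[prices.index > split_date]    # out-of-sample evaluation
print(train.index[-1].date(), test.index[0].date(), len(train), len(test))
```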
The cointegration analysis is based on Johansen's approach, which requires the time series to be I(1). The stationarity of the price series was therefore assessed by inspecting the ACF and PACF plots and, more formally, by means of the Augmented Dickey-Fuller (ADF) test.
Given that all three series contain a unit root, the Johansen cointegration method was applied to test for the existence of stationary linear combinations among them. More precisely, the following steps were carried out:
- Estimation of a Vector Autoregressive (VAR) model for the series to identify the appropriate lag length $p$, which captures the short-run dynamics among the variables;
- Computation of the trace test and the maximum eigenvalue test to determine the rank of the cointegration matrix, for different specifications of the cointegration relationship;
- Estimation of the Vector Error Correction Models (VECMs) for the specifications whose cointegration matrix has rank $r > 0$, and computation of the stationary time series implied by the estimated cointegration relationships.
The first step consists of fitting a VAR($p$) model to the time series:
$$X_t = \Phi_1 X_{t-1} + \Phi_2 X_{t-2} + \dots + \Phi_p X_{t-p} + u_t$$
Where:
- $X_t \in \mathbb{R}^3$ is the vector of time series at time $t$;
- $\Phi_i \in \mathbb{R}^{3 \times 3}$ are the autoregressive coefficient matrices;
- $u_t \sim \text{i.i.d.}(0, \Sigma_u)$ is a white noise error term.
To select the appropriate lag length
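The Johansen cointegration test is based on the VECM representation of a VAR($p$) model, obtained by rewriting the VAR in first differences. The VECM formulation is given by:
$$\Delta X_t = \Pi X_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta X_{t-i} + \varepsilon_t$$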
Where:
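- $\Delta X_t = X_t - X_{t-1}$ is the first-differenced process;
- $\Pi \in \mathbb{R}^{3 \times 3}$ is the long-run coefficient matrix, with $\Pi = -\left(I_3 - \Phi_1 - \dots - \Phi_p\right)$;
- $\Gamma_i \in \mathbb{R}^{3 \times 3}$ are the short-run adjustment coefficient matrices, with $\Gamma_i = -\left(\Phi_{i+1} + \dots + \Phi_p\right)$;
- $\varepsilon_t \sim \text{i.i.d.}(0, \Sigma)$ is a white noise innovation.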
The long-run coefficient matrix can be decomposed as:
$$\Pi = \alpha \beta^\top$$
Where:
- $\alpha \in \mathbb{R}^{3 \times r}$ contains the adjustment speed coefficients;
- $\beta \in \mathbb{R}^{3 \times r}$ contains the cointegrating vectors;
- $r \leq 3$ is the cointegration rank.
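A small numerical illustration of the decomposition $\Pi = \alpha \beta^\top$ is given below. The values of $\alpha$ and $\beta$ are purely hypothetical, chosen for a rank-one case. The product $\alpha\beta^\top$ has rank $r = 1$. Multiplying $\Pi$ by $X_{t-1}$ equals the spread $\beta^\top X_{t-1}$ scaled by the adjustment speeds in $\alpha$, which is how the error-correction term pulls each series back toward equilibrium.

```python
import numpy as np

# Hypothetical rank-1 example: a single cointegrating vector (r = 1)
beta = np.array([[1.0], [-0.5], [-0.5]])   # spread = X1 - 0.5*X2 - 0.5*X3
alpha = np.array([[-0.2], [0.1], [0.1]])   # adjustment speed of each series
Pi = alpha @ beta.T                        # 3x3 long-run matrix of rank 1
print(np.linalg.matrix_rank(Pi))           # 1

x_prev = np.array([101.0, 100.0, 100.0])   # X_{t-1}: first series 1.0 above equilibrium
spread = (beta.T @ x_prev).item()          # beta' X_{t-1}
print(spread)                              # 1.0
# Error-correction term: Pi @ X_{t-1} equals alpha * spread,
# so X1 is pushed down while X2 and X3 are pushed up
print(Pi @ x_prev)
```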
The term
To estimate the cointegration rank, two tests are considered within the Johansen approach:
- Trace test: a joint test of the null hypothesis that the number of cointegrating vectors is less than or equal to $r$, against the alternative that it is greater than $r$;
- Maximum eigenvalue test: tests the null hypothesis that the number of cointegrating vectors is exactly $r$, against the alternative that it is $r + 1$.
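For reference, the standard Johansen statistics are defined as follows. Let $T$ be the effective sample size and $\hat\eta_1 > \hat\eta_2 > \hat\eta_3$ the estimated eigenvalues of the Johansen reduced-rank problem. These are written $\hat\eta_i$ here to avoid a clash with the trend coefficient $\lambda$ introduced below. The two statistics for a given $r$ are:
$$LR_{\text{trace}}(r) = -T \sum_{i=r+1}^{3} \ln\left(1 - \hat\eta_i\right), \qquad LR_{\max}(r) = -T \ln\left(1 - \hat\eta_{r+1}\right)$$
The rank is chosen sequentially. Test $r = 0, 1, 2$ in turn and stop at the first $r$ whose null hypothesis is not rejected. If all three are rejected, $r = 3$, which would indicate that the series are themselves stationary.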
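Such tests were performed on a more general version of the VECM, which allows for deterministic terms inside the cointegration relation:
$$\Delta X_t = \alpha\left(\beta^\top X_{t-1} + \mu + \lambda t\right) + \sum_{i=1}^{p-1} \Gamma_i \Delta X_{t-i} + \varepsilon_t$$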
Where:
- $\mu \in \mathbb{R}^{r \times 1}$ is a constant term in the cointegration relation;
- $\lambda \in \mathbb{R}^{r \times 1}$ is a linear trend coefficient in the cointegration relation.
The following model specifications were considered:
- No intercept and no deterministic trend: $\mu = 0$, $\lambda = 0$;
- Intercept and no deterministic trend: $\mu \neq 0$, $\lambda = 0$;
- Intercept and linear deterministic trend: $\mu \neq 0$, $\lambda \neq 0$.
Three possible trading spreads,
To retrieve the required parameters, the VECMs corresponding to these spreads were estimated. More precisely:
- For the spread $s^n_t$, the estimated VECM is:
- For the spread $s^c_t$, the estimated VECM is:
- For the spread $s^{cl}_t$, the estimated VECM is:
The stationarity of the retrieved spreads was assessed via the ADF test.
Having defined a stationary spread, the trading strategy is built on the following principle. A position on the three ETFs is opened when a trading signal is detected, that is, when the observed value of the spread deviates significantly from its expected value (typically zero). The position is closed when the spread reverts to equilibrium (i.e., approaches zero).
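The open/close rule can be sketched as a small state machine, taking the signal series as given however it is produced. The function name, the sign convention and the choice to ignore new signals while a position is open are assumptions. In the sign convention, +1 means the spread is judged too low, so the trader goes long the spread; -1 means it is judged too high, so the trader goes short.

```python
import numpy as np

def positions_from_signals(spread, signal, level=0.0):
    """Hypothetical entry/exit rule: open on a non-zero signal, close when the spread reaches `level`.

    signal[t] = +1: spread judged too low, go long the spread;
    signal[t] = -1: spread judged too high, go short the spread.
    Returns the position held at the end of each step (+1 long, -1 short, 0 flat).
    """
    spread = np.asarray(spread, dtype=float)
    signal = np.asarray(signal, dtype=int)
    pos = np.zeros(len(spread), dtype=int)
    current = 0
    for t in range(len(spread)):
        if current == 0 and signal[t] != 0:
            current = int(signal[t])                  # open a new position
        elif current == 1 and spread[t] >= level:     # long spread has reverted upwards
            current = 0
        elif current == -1 and spread[t] <= level:    # short spread has reverted downwards
            current = 0
        pos[t] = current
    return pos

spread = [0.0, -1.2, -0.6, 0.1, 0.9, 1.3, 0.4, -0.2]
signal = [0, 1, 0, 0, 0, -1, 0, 0]
print(positions_from_signals(spread, signal).tolist())  # [0, 1, 1, 0, 0, -1, -1, 0]
```

A signal that arrives while a position is already open is ignored. This keeps at most one spread position at a time.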
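To define trading signals rigorously, an ARMA process is fitted to the spread computed on the training set. In the general ARMA($p$, $q$) model below, $p$ and $q$ are the autoregressive and moving-average orders and are unrelated to the VAR lag length above:
$$s_t - \mu = \sum_{i=1}^{p} \phi_i \left(s_{t-i} - \mu\right) + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$$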
Where:
- $s_t$ is the spread at time $t$;
- $\mu$ is the mean;
- $\phi_i$ and $\theta_j$ are the AR and MA coefficients;
- $\varepsilon_t$ is a white noise error term.
For each time step in the test set, the observed spread is compared to the one-step-ahead forecast generated by the fitted ARMA model. A trading signal is triggered when the absolute difference between the forecasted and observed spread exceeds the forecast's standard deviation, indicating a statistically significant deviation from equilibrium. In general, the strength of the trading signal can be tuned by replacing the standard deviation of the forecast
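After each forecast, the ARMA model's internal state is updated with the newly observed value of the spread. Every forecast therefore remains a one-step-ahead forecast conditioned on all data observed so far. As a result, it tracks the spread more closely, and its standard error does not grow with the horizon as it would for multi-step forecasts issued from the end of the training set. Note, however, that the model is not refit: its parameters remain fixed at the values estimated on the training data.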
This approach ensures that the trading strategy is reactive to new information, while maintaining the stability and interpretability of the original model fit.
The strategy was tested by examining how its performance metric, the out-of-sample return, changes under varying assumptions about market frictions and signal-strength thresholds.
Since successful triplet strategies involve an initial positive inflow of capital and no outflow upon closure, standard return metrics like linear returns
Instead, the method proposed by Gatev, Goetzmann, and Rouwenhorst (Review of Financial Studies, 2006) is adopted: if there is no open position at time t, the return
Where:
- $S_t$ is the value of the spread at time $t$;
- $V_t^S$ is the market value of the spread position;
- $f$ is the transaction cost parameter.
The overall return over the out-of-sample period is then calculated as:
Such metric was computed for different values of transaction costs