This blog was originally posted via the Centre for Evaluation, London School of Hygiene & Tropical Medicine.
On 22nd September a half-day event at the London School of Hygiene & Tropical Medicine launched a series of papers on Stepped Wedge Trials (hereafter referred to as SWTs) and brought together some of the researchers at the forefront of the development of these complex trials.
I was lucky that the start of my PhD studying these designs coincided with this project taking a systematic look at how SWTs were being used. The result of this project was a collection of six papers summarizing a review of recent SWTs and the issues these raised. The launch of these papers gave rise to some interesting discussions.
The first SWT began in 1986 and is still ongoing. The study, looking at the long-term impacts of the hepatitis B vaccine in The Gambia, was developed in collaboration with Peter Smith, who gave us a background to the trial.
Unlike more recent SWTs, the length of individual follow-up in this study (30–40 years) was much longer than the rollout of the vaccine (4 years). The design was chosen to aid implementation and because it seemed more ethical, with only a small loss of power compared to a parallel trial.
So why are stepped wedge trials becoming so popular?
Audrey Prost took us through the commonly given reasons, and debate arose about their validity. There were many criticisms of the claim that SWTs are more ethical; the most convincing to me was that if a parallel trial is unethical because the intervention is already thought to be effective, it is equally unethical to withhold the intervention until it is a cluster's turn to switch.
Claims that SWTs are easier to implement seem to be overstated, and several audience members recalled additional complications caused by the trial design.
A more convincing reason arises when an SWT 'piggy-backs' on the roll-out of an intervention that is already going to occur. Here an SWT may be the best available option, with a simple before-and-after comparison being the only alternative and a clearly inferior choice.
Next, Andrew Copas described areas of potential bias, such as a 'carryover effect' similar to but broader than survivor bias, where participants with unstable conditions may be sicker by the time they receive the intervention. The audience shared experiences of bias from the cluster populations changing over time, such as a hospital's patient population changing because of changes to the services that the hospital provided.
In his talk, Gianluca Baio suggested using simulation studies to do power calculations for SWTs; these are already used for other trial designs and involve specifying what you expect your data to look like, then checking how much power you have if those guesses are true. You can incorporate almost any design choice, and the calculations align neatly with the final analysis.
So while, in comparison with formulaic methods, they are more difficult to set up and take longer to run, they are incredibly versatile. Karla Hemming expressed interest in what this could tell us about when formulaic methods are adequate, and work is ongoing in this area.
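The idea can be sketched in a few dozen lines. The example below is a minimal illustration, not any presenter's actual method: it assumes a standard stepped wedge layout (one cluster crossing over per step), a continuous outcome with a cluster effect and a linear time trend, and a simplified analysis using ordinary least squares with cluster and period fixed effects rather than the usual mixed model. All parameter values are made up for illustration.

```python
import numpy as np

def simulate_swt_power(n_clusters=6, n_periods=7, m=20,
                       effect=0.3, cluster_sd=0.2, resid_sd=1.0,
                       time_trend=0.05, n_sims=500, seed=42):
    """Estimate the power of a stepped wedge trial by simulation.

    One cluster crosses to the intervention at each step (all clusters
    start in control at period 0). Each simulated dataset is analysed
    with OLS including cluster and period fixed effects plus a
    treatment indicator; power is the fraction of simulations in which
    the treatment effect is significant (normal approximation).
    """
    rng = np.random.default_rng(seed)

    # One row per individual observation.
    cluster_ids = np.repeat(np.arange(n_clusters), n_periods * m)
    period_ids = np.tile(np.repeat(np.arange(n_periods), m), n_clusters)
    # Cluster c switches to the intervention at period c + 1.
    treat = (period_ids >= cluster_ids + 1).astype(float)

    # Design matrix: intercept, cluster dummies, period dummies, treatment.
    X = np.column_stack([
        np.ones_like(treat),
        *(cluster_ids == c for c in range(1, n_clusters)),
        *(period_ids == t for t in range(1, n_periods)),
        treat,
    ]).astype(float)

    rejections = 0
    for _ in range(n_sims):
        cluster_re = rng.normal(0, cluster_sd, n_clusters)[cluster_ids]
        y = (cluster_re + time_trend * period_ids
             + effect * treat + rng.normal(0, resid_sd, len(treat)))
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / (len(y) - X.shape[1])
        cov = sigma2 * np.linalg.inv(X.T @ X)
        t_stat = beta[-1] / np.sqrt(cov[-1, -1])
        if abs(t_stat) > 1.96:
            rejections += 1
    return rejections / n_sims
```

Because the simulated data and the analysis model are both spelled out, swapping in a different rollout schedule, outcome type, or analysis is a matter of editing a few lines — which is exactly the versatility (and the extra effort) discussed above.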
In his talk on the analysis of SWTs, Calum Davey asked us: "Are we treating SWTs as trials?" This struck a chord with many audience members.
SWTs contain what Calum referred to as 'mini-trials': randomized comparisons, at each step, between the clusters that have switched and those that have not. There are also non-randomized comparisons within each cluster, but these are confounded with changes in the outcome over time. The most common analysis method uses both of these comparisons to estimate an intervention effect, making assumptions about how the outcome is changing over time.
The trouble with this is: how do you know you have made the right assumptions about time? It all gets a bit complicated, and we are reminded why we use randomized comparisons.
We may be able to overcome these problems by only using the ‘mini-trial’ intervention comparisons; the hepatitis vaccine trial was designed with this in mind.
Whilst this will have less statistical power than current methods, perhaps that is something that must be accepted; you don't get an increase in power for nothing, and if the necessary assumptions do not hold, you have to make do with less power. The difficulty lies in how these 'mini-trials' should be combined without reintroducing assumptions.
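To make the 'mini-trial' idea concrete, here is a small sketch of a vertical-only estimate. It is an illustration under assumed inputs (a table of cluster-period means and each cluster's switch period), not a method from the papers: each period in which both arms are present contributes one randomized contrast, and the contrasts are averaged here with equal weight — the weighting choice being exactly where assumptions can creep back in.

```python
import numpy as np

def vertical_estimate(cluster_period_means, switch_period):
    """Combine an SWT's within-period ('mini-trial') comparisons.

    cluster_period_means: (n_clusters, n_periods) array of observed means.
    switch_period[c]: first period in which cluster c is on the intervention.

    Each period with clusters in both arms contributes one contrast:
    mean over intervention clusters minus mean over control clusters.
    Contrasts are averaged with equal weight for simplicity.
    """
    n_clusters, n_periods = cluster_period_means.shape
    contrasts = []
    for t in range(n_periods):
        on = [c for c in range(n_clusters) if switch_period[c] <= t]
        off = [c for c in range(n_clusters) if switch_period[c] > t]
        if on and off:  # periods with only one arm contribute nothing
            contrasts.append(cluster_period_means[on, t].mean()
                             - cluster_period_means[off, t].mean())
    return float(np.mean(contrasts))
```

Because every contrast compares clusters measured in the same period, a shared secular time trend cancels out of each one — which is the appeal of using only the randomized comparisons, at the cost of the power discussed above.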
Richard Hayes pointed out that it is not just the time trends we should be concerned about. In many trials it is common (but not necessarily sensible) to assume that the intervention effect is the same in all clusters. In an SWT this can mean assigning too much weight to the within-cluster comparisons.
The overall consensus from this thought-provoking half day boiled down to “where are you starting from?” If you think a parallel trial is an option perhaps this blog and the series of papers will encourage you in that direction.
For those who think it isn't an option: think again — are you sure? If you still think an SWT is your best option then perhaps it is, but this is a design that should be handled with care; we are only just beginning to uncover the pitfalls.