Background
In the past two decades, there has been a dramatic increase in the development of therapies, and therefore in clinical trials, for patients with rare diseases. The number of publications in this area rose more than fivefold between 2003 and 2018 [1]. Furthermore, the number of molecules in clinical development increased fivefold between 2013 and 2018 [2]. However, there are many challenges related to conducting clinical studies on rare diseases, such as heterogeneity in pathophysiology or clinical presentation, a window of treatment opportunity that may already have passed in some conditions, and the difficulty of conducting adequately powered trials when the number of patients is low. The challenge of recruiting the large population needed to conduct a placebo-controlled study may lead to a broadening of inclusion criteria and thus to greater heterogeneity of the population.
In the context of neuromuscular diseases, several strategies have been used to gain regulatory approval of orphan drugs without conducting large, placebo-controlled trials. These include the use of surrogate markers such as dystrophin in the clinical trial of eteplirsen in patients with Duchenne muscular dystrophy [3] and globotriaosylceramide in the clinical trial of agalsidase beta in patients with Fabry disease [4]. Historical controls were used during clinical testing of onasemnogene abeparvovec-xioi in spinal muscular atrophy [5] and alglucosidase alfa in Pompe disease [6], and a priori-designed natural history studies were used in the clinical evaluation of risdiplam in patients with spinal muscular atrophy type 2 [7]. However, these strategies have limitations.
Surrogate markers (also called surrogate endpoints) are substitute outcomes that are studied when a desired primary clinical endpoint such as overall survival takes too long to observe or is ethically unjustifiable [8]. In the cases of eteplirsen and golodirsen, two antisense oligonucleotides that induce skipping of exons 51 and 53, respectively, of the dystrophin pre-mRNA, the surrogate nonclinical endpoint was the induction of truncated dystrophin production [9, 10]. The U.S. Food and Drug Administration (FDA) approved the drugs in 2016 and 2019, respectively, under the accelerated approval program after concluding that enhanced dystrophin expression was reasonably likely to result in clinical benefit. Neither of these two drugs was, however, approved by the European Medicines Agency (EMA), which requires that a surrogate endpoint first be validated by showing a correlation between the surrogate endpoint and a clinical benefit [11]. Furthermore, not all diseases have sensible surrogate endpoints and, in some cases, surrogate endpoints have been called into question based on later studies. In the early stages of the HIV/AIDS epidemic, change in CD4+ T cell count was used as a surrogate marker in several trials, but this measure was eventually shown to be only weakly associated with survival [12].
In the examples of historical cohorts cited above, survival and need for ventilation support of patients with Pompe disease treated with alglucosidase alfa [13], and motor function scores and motor milestones in spinal muscular atrophy patients treated with onasemnogene abeparvovec-xioi [14] were compared to the natural history of the diseases. To show the efficacy of a drug in a retrospective cohort study, however, the drug’s effects must be substantial, since other factors, such as a difference in standard of care or a placebo effect, may also explain differences between treated patients and historical cohorts.
The FDA has acknowledged that rare disease clinical trials necessitate innovative designs that make use of, for example, external control patients and information on disease progression from natural history studies to improve the analytical model [15]. Accordingly, an alternative strategy is the use of Bayesian statistics [16]. Bayesian methods reallocate probability across a previously selected set of possible explanations as new data are acquired. The Bayesian approach permits the borrowing of strength from additional information sources including, for example, historical controls from earlier randomized studies, data from disease registries, natural history studies and other nonrandomized sources, and expert medical opinion. Although non-Bayesian methods may also be used to compare treated patients with an existing dataset or to follow the evolution of disease at a population level, the Bayesian framework is more convenient for deriving prediction intervals with complex stochastic processes such as the beta distribution at both the population and individual levels [16]. The result is a gain in study power and a corresponding reduction in the sample size needed. There is also a corresponding modest increase in overall Type I error (i.e., the false positive rate) that is typically estimated and controlled via pre-trial simulation studies. Bayesian methods have been discussed by both the EMA [17, 18] and the FDA [15, 19–21] with the conclusion that Bayesian methods offer a statistically acceptable approach, especially in rare and paediatric disease settings.
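To make the idea of borrowing strength concrete, the following is a minimal sketch, not the model used in this article. It assumes a simple binary outcome, a conjugate Beta prior, and a fixed discount factor applied to historical data (a power-prior-style weighting). The function names, the discount value and all counts are hypothetical.

```python
def discounted_beta_posterior(prior_a, prior_b, hist_events, hist_n,
                              trial_events, trial_n, discount):
    """Return (a, b) of the Beta posterior for an event rate.

    Historical counts are multiplied by `discount`
    (0 = ignore them, 1 = pool them fully with the trial data).
    """
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must lie in [0, 1]")
    a = prior_a + discount * hist_events + trial_events
    b = prior_b + discount * (hist_n - hist_events) + (trial_n - trial_events)
    return a, b


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1))


# Hypothetical counts: 6 events among 30 historical controls,
# 3 events among 10 concurrent trial controls, flat Beta(1, 1) prior.
no_borrowing = discounted_beta_posterior(1, 1, 6, 30, 3, 10, discount=0.0)
half_borrowing = discounted_beta_posterior(1, 1, 6, 30, 3, 10, discount=0.5)

print(beta_mean(*no_borrowing), beta_variance(*no_borrowing))
print(beta_mean(*half_borrowing), beta_variance(*half_borrowing))
```

In this toy setting the posterior variance shrinks when historical data are partly borrowed, which is the mechanism behind the gain in power. The price is that the posterior is pulled toward the historical rate, which is why the resulting Type I error has to be checked by simulation.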
In this article, we use Bayesian methods to supplement recorded trial outcomes with each trial enrollee’s own natural history study (NHS) data. Our NHS was a prospective international study on X-linked and autosomal dominant centronuclear myopathy, which followed Good Clinical Practice and systematic source data verification. Patient forced expiratory volume in 1 s (FEV1), assessed according to EU and US recommendations, and time on ventilator were recorded at every visit [22]. The Bayesian model reduced the necessary sample size while controlling overall Type I error. Importantly, the model is hierarchical, in that it can estimate both individual and overall population-level outcome trajectories and treatment effects.
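The way a hierarchical model shares information between the individual and population levels can be illustrated with a deliberately simplified sketch. It assumes a normal-normal model with known variances, which is not the model fitted in this article. The function name and all numbers are hypothetical.

```python
def shrunken_individual_mean(observations, population_mean,
                             between_patient_var, within_patient_var):
    """Posterior mean of one patient's level under a normal-normal model.

    The patient's own average is pulled toward the population mean;
    the pull weakens as the patient contributes more visits.
    """
    n = len(observations)
    if n == 0:
        # No personal data: fall back on the population-level estimate.
        return population_mean
    own_mean = sum(observations) / n
    weight_own = between_patient_var / (between_patient_var + within_patient_var / n)
    return weight_own * own_mean + (1.0 - weight_own) * population_mean


# Hypothetical scores on an arbitrary scale.
few_visits = shrunken_individual_mean([0.5, 0.7], 0.4, 0.04, 0.02)
many_visits = shrunken_individual_mean([0.5, 0.7] * 4, 0.4, 0.04, 0.02)
print(few_visits, many_visits)
```

With two visits the estimate sits between the patient's own average (0.6) and the population mean (0.4). With eight visits it moves closer to the patient's own average, which is the behaviour expected of a model that estimates individual trajectories while borrowing from the population.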
We applied our hierarchical Bayesian modelling approach to the field of centronuclear myopathies (CNMs). This is a group of rare congenital myopathies with a highly variable clinical presentation and substantial genetic heterogeneity. Because of rarity and high variability, the incidence of centronuclear myopathies is not well known. However, the incidence of its most frequent and severe form, X-linked myotubular myopathy (XLMTM), is approximately 1 in 50 000 newborn males [23]. The diagnosis is suggested by the central position of nuclei in muscle biopsies and clinical features. X-linked, autosomal recessive and autosomal dominant forms of CNM have been identified. The X-linked form is usually more severe, and symptoms are present at birth, yet a broad clinical heterogeneity is observed. The main causal mutations are distributed throughout the genes encoding myotubularin (MTM1) for XLMTM (OMIM: 310400) [24], dynamin 2 (DNM2) [25] and amphiphysin 2 (BIN1) [26] for the autosomal dominant form (OMIM: 160150), and amphiphysin 2 (BIN1) [27] for the autosomal recessive form (OMIM: 255200). The clinical traits of CNM include hypotonia, external ophthalmoplegia, and respiratory deficiency, which can be severe and life-threatening in the XLMTM congenital form [22, 28]. Patients who survive beyond the neonatal period live with a high disease burden: a majority require the use of a wheelchair, feeding tube, and ventilation support. Respiratory function is also altered in patients who do not need ventilator support, and respiratory complications are the most frequent cause of death [29, 30]. Despite their rarity and their heterogeneous genotype and phenotype, CNMs are currently the targets of several clinical and pre-clinical development efforts that make them a paradigm for the need for alternative statistical strategies in clinical trials [31].
Discussion
We have reported the application of Bayesian statistics to model the future natural history of a rare disease, centronuclear myopathy. To build this model, we used data from a 4-year follow-up of 59 patients carrying mutations in the MTM1 or DNM2 genes. The model predicts individual patient trajectories for several endpoint measure scores based on the observations of a natural history study, and its quality of fit suggests that it adequately represents the natural evolution of the disease.
Bayesian statistics offer the opportunity to compare the outcomes of patients at a given time after treatment to the simulated endpoint scores at the same given time without a treatment. Having predicted an individual trajectory with a certain probability allows us to estimate the probability that an observed deviation from that predicted trajectory would have happened without intervention. Consequently, our Bayesian incorporation of auxiliary data (NHS and run-in) offers an alternative to the comparison of treated patients with an untreated group, which can be challenging in small and heterogeneous cohorts, two conditions characteristic of clinical research in rare disease.
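The reasoning above, asking how likely an observed deviation would be without intervention, can be sketched as a tail probability under a predictive distribution. The example below is an illustration only. It assumes a score expressed as a proportion, a normal predictive distribution on the logit scale, and hypothetical values for the predictive mean and standard deviation. The article's own predictive distribution may differ.

```python
from math import log
from statistics import NormalDist


def logit(p):
    """Map a proportion in (0, 1) to the logit scale."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return log(p / (1.0 - p))


def prob_at_least_without_treatment(observed, pred_mean_logit, pred_sd_logit):
    """P(untreated score >= observed) under the assumed predictive distribution."""
    predictive = NormalDist(pred_mean_logit, pred_sd_logit)
    return 1.0 - predictive.cdf(logit(observed))


# Hypothetical patient: predicted score centred on 0.5 (logit 0), sd 0.3 on the logit scale.
p_small_change = prob_at_least_without_treatment(0.52, 0.0, 0.3)
p_large_change = prob_at_least_without_treatment(0.70, 0.0, 0.3)
print(p_small_change, p_large_change)
```

A small probability for the observed post-treatment score indicates that the deviation is unlikely to reflect natural evolution alone. A probability near 0.5 indicates that the score is compatible with the predicted untreated trajectory.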
Bayesian statistics have previously been used in the field of rare diseases: Quintana et al. developed a Bayesian model of disease progression in GNE myopathy based on quantitative muscle strength data [34]. Ramanan et al. proposed a Bayesian design for a phase 2 trial to compare adalimumab versus pamidronate in chronic nonbacterial osteomyelitis [35]. In the currently ongoing Sarcome-13 trial, a phase 2 trial of mifamurtide in newly diagnosed high-risk osteosarcoma, a Bayesian analysis is planned that will incorporate available historical data into the trial [36]. A review of Bayesian methods in rare disease settings has recently been published [37].
The natural evolution of CNMs consists of an overall stability of the patients’ parameters [28], and, consistent with this, the model shows non-progression of the disease. Therefore, this type of analysis will not identify a stabilising effect in centronuclear myopathy. This will be true whatever the design of the trial. As long as the conditions of the patients do not deteriorate over the period of time during which a trial can be organised and completed (generally 2 years), it will not be possible to demonstrate a long-term stabilising effect of the treatment.
One of the main limitations of our study is that it relies on a small and heterogeneous sample. However, this small and heterogeneous sample is actually the driving rationale of the Bayesian approach, as small sample sizes are typical in the field of rare diseases [38, 39]. In the field of centronuclear myopathy, the study described is by far the largest prospective cohort to date. Indeed, the existing studies that we found on the natural history, genotype, and phenotype of patients living with centronuclear myopathy that used larger cohorts of up to 120 patients were mostly retrospective or involved only a one-time intervention to identify mutations [40–42].
An additional limitation resides in the heterogeneous follow-up. Some patients were followed for 1 year, some for 4 years. The model could be used for a therapeutic trial with an even more limited follow-up, however, as clinical trials generally last about 1 year. Therefore, not all of the model’s trajectories derive from the observed results, but the trajectories of the run-in patients borrow information from the natural history study of patients with longer follow-up times. Again, this limitation is intrinsic to the nature of a very rare disease. The inclusion of all patients at the same time is extremely challenging, in part due to geography (the whole of Europe in the present study). As the borrowing of data is limited and the Type I error is well controlled, this borrowing is justified. The Type I error is the rejection of a true null hypothesis (i.e., a false positive). In this case, a Type I error results in the conclusion that a treatment is effective when the patient’s response is in fact natural, and apparent improvement is due merely to visit-to-visit variability. To limit this, our method adapts the threshold difference between the predicted and observed response rates that is required to declare a treatment effect.
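Adapting a decision threshold so that the Type I error stays below a target can be illustrated with a small pre-trial simulation. This is a generic sketch under strong simplifying assumptions, not the calibration procedure used in this study. It assumes that each untreated patient independently meets a response definition with a known predicted probability. The sample size, predicted rate, target level and function names are hypothetical.

```python
import random


def simulate_null_differences(n_patients, predicted_rate, n_sims, rng):
    """Observed minus predicted response rate in trials with no treatment effect."""
    diffs = []
    for _ in range(n_sims):
        responders = sum(rng.random() < predicted_rate for _ in range(n_patients))
        diffs.append(responders / n_patients - predicted_rate)
    return diffs


def false_positive_rate(null_diffs, threshold):
    """Share of null trials whose difference strictly exceeds the threshold."""
    return sum(d > threshold for d in null_diffs) / len(null_diffs)


def calibrate_threshold(null_diffs, alpha):
    """Smallest simulated difference whose exceedance rate is at most alpha."""
    for candidate in sorted(set(null_diffs)):
        if false_positive_rate(null_diffs, candidate) <= alpha:
            return candidate
    return max(null_diffs)  # Defensive fallback; the loop always returns.


rng = random.Random(2024)  # Fixed seed so the sketch is reproducible.
alpha = 0.05
null_diffs = simulate_null_differences(20, 0.2, 5000, rng)
threshold = calibrate_threshold(null_diffs, alpha)

# Check the calibrated threshold on a fresh set of simulated null trials.
fresh_diffs = simulate_null_differences(20, 0.2, 5000, rng)
print(threshold, false_positive_rate(fresh_diffs, threshold))
```

A treatment effect would then be declared only when the observed response rate exceeds the predicted rate by more than the calibrated threshold. In a real design the null trials would be generated from the fitted natural history model rather than from a fixed response probability.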
Because the natural evolution of the patients is not necessarily linearly correlated to previous history, the reliability of the model’s prediction will decrease with time. As is shown by the comparison of patients with considerable NHS data to those with run-in data, the more data available, the more stable the predictive distribution. Although the reliability of the model is therefore limited by the duration of the natural history study, the same limitation is found in a randomised controlled trial, which can only show treatment efficacy for the duration of the trial. We acknowledge that by avoiding randomization and blind assessment in order to minimize sample size, we are forfeiting some protection against possible systematic biases that could result from the assumption that the natural evolution is linear, and from the possibility of a placebo effect. In particular, the study’s Type I error and power will be affected by model misspecification. Manifestations of a placebo effect could include a positive adjustment to the intercept starting at the time of the intervention, a slight increase in slope, or even a departure from linearity (all on the logistic scale). Investigations of such changes and their impacts were discussed in the recent poster presentation by Monseur et al. [43] and will be the subject of a future manuscript.
Though it is currently unclear how large the placebo effect can be in patients with CNMs, the placebo effect in other neuromuscular disorders has been described as mild and transient for spinal muscular atrophy and as non-existent for Duchenne muscular dystrophy [44]. The placebo effect observed in double-blind placebo-controlled studies in spinal muscular atrophy patients, for instance, is limited in duration to up to 6 months [45]. In recent trials involving Duchenne muscular dystrophy patients, it has been demonstrated that natural history study data are highly comparable to data from patients treated with placebo [44]. The quality of fit of the model shows an adequate prediction of evolution over the time of the NHS. Therefore, it can be assumed that, during the clinical trial period, variations in evolution compared to the natural history predictions will be due to a treatment effect rather than a lack of reliability of the model. However, the model does not predict rare events, such as a lower respiratory tract infection that would require hospitalisation and could induce a significant functional decline. Mitigating this, the probability of these rare events is known. In a prospective study of a 33-patient cohort over a 1-year period, 17 (52%) patients required a visit to the hospital for acute care, with a total of 38 visits (an annual rate of 1.15 visits per patient). Of visits to the emergency room, 47% were due to fever or infection, and of the 34% that resulted in hospitalisation, 69% were due to fever or infection [29].
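The cohort figures quoted above can be reproduced directly from the reported counts. The short check below uses only numbers stated in the text; the variable names are illustrative.

```python
cohort_size = 33            # patients followed for 1 year
patients_with_visit = 17    # patients with at least one acute-care hospital visit
total_visits = 38           # acute-care visits over the year

share_with_visit = patients_with_visit / cohort_size
annual_visit_rate = total_visits / cohort_size

print(f"{share_with_visit:.0%} of patients had at least one visit")
print(f"{annual_visit_rate:.2f} visits per patient per year")
```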
Bayesian incorporation of auxiliary data can reduce the number of patients necessary to conduct a study and the number of patients who must be given placebo, but cannot deliver a conclusion with the same level of evidence as a full two-arm blinded study. Indeed, the model is constructed on the basis of NHS data in which no placebo effect is expected, from either the patient or the evaluator perspective, and the evolution of patients after a given intervention could differ from the predicted trajectory due to a placebo effect. Having a limited number of patients on placebo or progressively switching patients from placebo to active treatment may overcome or mitigate this issue. In addition, a limited number of placebo-treated patients could also help to verify that untreated patients actually follow model predictions. A similar design is being used in Audentes’ ASPIRO trial, an open-label trial for gene transfer therapy in XLMTM (NCT03199469). In the extension phase of this trial, subjects who were in the delayed treatment control group are administered the drug on trial after having completed their last visit as a control at Week 24, when the primary efficacy endpoint measures will have been assessed.
As in any clinical trial occurring in a very heterogeneous and rare population, selection biases may arise. To minimise biases in the present study, all the centres were contacted when recruiting participants and a single physiotherapist was hired to travel and visit the European patients. Social bias was also avoided by covering the patients’ costs. Furthermore, data heterogeneity is already very large, with patients covering the whole spectrum of possible values on the scale under investigation. Although the risk of bias cannot be entirely mitigated, no fundamental differences are expected between the patients currently enrolled and those not.
The aim of the Bayesian approach is not to lower the level of evidence required for drug approval in rare diseases, but rather to benchmark this level as close as possible to that of drug development in more common diseases, taking into account the limited existing population and the heterogeneity in terms of age, genotype, and severity. In using an individual patient’s trajectory and borrowing information on day-to-day variability from the population to more reliably predict the individual’s course of disease and then defining a response as deviation from this course, our model can estimate treatment efficacy across patients with different disease severities. Demonstrating efficacy, even if moderate, in post-symptomatic patients may also justify moving to younger or pre-symptomatic patients where the effect can be much more dramatic, given the better state of the targeted tissue. This has been clearly demonstrated in spinal muscular atrophy [46]: the effect demonstrated in a double-blind placebo-controlled study in post-symptomatic patients [47] was much more dramatic in a pre-symptomatic population [48], leading to newborn screening programs across the world [49] and a dramatic improvement in patients’ conditions. Similarly, the ability to demonstrate even a mild effect in a post-symptomatic population where an analysis between treated and placebo-controlled patients cannot be conducted for practical reasons can provide evidence supporting use of a therapy in a younger or pre-symptomatic population that cannot be initially targeted by clinical development but who are likely to benefit the most and who are likely to have the best benefit-to-cost ratio from a payer perspective [50].
Acknowledgements
The authors wish to thank all investigators, PT, and CRA of the study. The authors also wish to thank the patients and parents for participating in the NHS. They also thank Jackie Wyatt for scientific editing.
The NatHis-MTM Study Group: Teresa Gidaro, Elena Gargaun, Virginie Chê, Ulrike Schara, Andrea Gangfuß, Adele D’Amico, James J. Dowling, Basil T. Darras, Aurore Daron, Arturo Hernandez, Capucine de Lattre, Jean-Michel Arnal, Michèle Mayer, Jean-Marie Cuisset, Carole Vuillerot, Stéphanie Fontaine, Rémi Bellance, Valérie Biancalana, Ana Buj-Bello, Jean-Yves Hogrel, Hal Landy, Kimberly Amburgey, Barbara Andres, Enrico Bertini, Ruxandra Cardas, Séverine Denis, Dominique Duchêne, Virginie Latournerie, Nacera Reguiba, Etsuko Tsuchiya, and Carina Wallgren-Pettersson.