Background
Rare diseases (RD) are diseases with a prevalence of less than one in 2,000 in the general population. Although individually rare, they are a major public health concern because they are collectively common: 2–3% of births and 7–8% of adults are or will be affected by an RD [
1]. More than three million French people and about 25 million Europeans are affected by one of the 7,000 currently recognized RD. In half of all cases, RD affect children under 5 years, and they are responsible for 10% of deaths in children aged 1 to 5 years [
1]. Eighty percent of RD are of genetic origin. Most often, they are severe chronic diseases, and they can also be progressive. They considerably affect patients' quality of life, causing motor, sensory or intellectual deficits in 50% of cases, and total dependency in 9% of cases [
1]. There is a crucial lack of treatment for RD, since only 5% of these disorders have an available treatment [
2]. For these reasons, three French national plans (Plan National Maladies Rares or PNMR) have been successively established for RD since 2004, enabling France to play a leading role in the field of RD in Europe [
3]. The first PNMR structured a national network of 131 multidisciplinary reference centers for RD (RCRD) and more than 500 centers of expertise for RD (CERD), which was then revised with the 3rd PNMR, resulting in a total of 387 RCRD and 1,800 CERD. The RCRD form a network of national excellence centers with extensive geographical coverage. The CERD provide RCRD expertise to local hospitals. This network gives patients the opportunity to access comprehensive clinical work-ups and regular follow-up as close as possible to their homes. The interactions between the RCRD and expert clinical laboratories, research laboratories, patient support groups, and the other various medico-social specialties in the patient care pathways have been structured into 23 thematic networks for RD (each of them encompassing RCRD and CERD for one group of diseases, accredited by the 2nd PNMR) [
4]. Their objectives are to optimize the supply of care, improve education and training, and stimulate the development of research and innovation in the field of RD. The way in which patients with RD are managed in France strongly inspired the creation of European Reference Networks by the European Commission [
5].
The PNMR have focused on improving knowledge about the epidemiology of RD through the constitution of a dedicated registry collecting information from the rare disease network. In order to fulfill this objective, a population-based registry, called CEMARA, was launched in 2007. It collects epidemiological information about RD and related medical activities from RCRD and CERD on a national level. The goal of CEMARA was to improve the understanding of the burden of disease for rare conditions, to determine the resources needed for healthcare and social services, and to identify patients eligible for natural history studies and clinical trials [
6]. A minimum dataset (MDS) [
31] has been set up (Additional file
1: Table S1). Physicians and paramedical workers (psychologists, genetic counsellors, and social workers) enter data from the RD centers, and the system allows longitudinal follow-up of individual patients. The French Data Protection Authority authorized CEMARA in 2007, and the registry is compliant with the European General Data Protection Regulation (GDPR). A Scientific Committee has validated the studies based on CEMARA data. The CEMARA project has registered 500,000 RD patients from 151 RCRD (out of 387) and 412 CERD, and has recorded over 4,000 RD.
Among the 23 accredited health networks, AnDDI-Rares (Anomalies du Développement avec ou sans Déficiences Intellectuelles de causes Rares) is the network of medical genetic services based in university hospitals. It focuses on individuals with developmental abnormalities (malformations), with or without intellectual disability (ID), and covers more than 5,000 distinct rare monogenic diseases and a large number of chromosomal abnormalities [
7]. These diseases have a prevalence of 3% (about 1.8 million people and 40,000 new cases per year in France). These disorders share common characteristics: (i) an often difficult diagnosis requiring clinical and biological expertise, (ii) a high rate of patients with no diagnosis, (iii) coordinated care that often relies on multidisciplinary therapy facilities and special-needs schooling, requiring multiple interactions between hospital and non-hospital partners, and (iv) the need for epidemiological, clinical and translational research regarding the natural history and pathophysiology of developmental abnormalities, with a focus on long-awaited therapeutic solutions (often requiring multicenter cohort studies). Initially, according to the first PNMR, the AnDDI-Rares network included 22 constitutive RCRD grouped under the supervision of 8 coordinator RCRD, and 7 CERD. Currently, AnDDI-Rares includes 20 constitutive university hospitals grouped under the coordination of six RCRD (one per large French inter-region), and 29 further CERD (Additional file
2: Figure S1A). Besides the facilities for care and treatment, AnDDI-Rares includes diagnostic laboratories (38 for molecular genetics, 44 for cytogenetics, 48 fetal pathology units), 32 research teams, and over 60 family support groups. The 26 departments forming the AnDDI-Rares RCRD (which receive an operating grant from the French state) have contributed to the registry since 2007. Participation of the CERD (which do not receive a grant) was optional.
Here, to gain knowledge about patients with developmental disorders and their care pathway in France, we studied the cohort of patients followed up in the AnDDI-Rares network for developmental disorders, using data from the first 10 years of CEMARA data collection. We then focused on four sub-cohorts of patients, each diagnosed with a specific disease, to study their characteristics and follow-up. Lastly, we focused on the sub-cohort of patients with chromosomal anomalies.
Discussion
We present here the organization of the registry for rare developmental disorders, with or without intellectual disability, as part of the AnDDI-Rares network, and provide an analysis of what we have learned from the first 10 years of using the CEMARA database.
Information and knowledge about RD is usually the result of data collection and registries implemented with academic and/or commercial interests and with a limited scope. Interestingly, the online rare disease database Orphanet [
12] indexes a total of more than 700 registries and databases on RD involving European research, and it can thus be used to estimate disease prevalence [
13]. These registries and databases have a variety of aims and differ in their organization, quality and database structure, usually monitoring one disease or a group of related diseases [
14‐
16]. In order to encourage the development of knowledge about RD, several countries have launched national initiatives to build registries covering all RD and encouraging international cooperation, in particular within the European Reference Networks [
17]. These registries, if properly implemented with accurate and high-quality clinical data and long-term support, can facilitate health service planning, epidemiological research and clinical trial recruitment. Nevertheless, the collected data must be congruent with the aims of the registry. Registries are particularly important for rare or poorly understood diseases, which are characterized by small numbers of patients, complex and delayed diagnoses, a propensity for variable standards of care, and limited treatment options.
The first French RD initiative, CEMARA, has been collecting information on RD epidemiology and related medical activities from RCRD and CERD at the national level since 2007. To date, the data entered in CEMARA have already been used by some networks for RD [
18]. Other publications have also been facilitated by CEMARA’s infrastructure [
19]. More specifically, data on age, sex ratio, type of care, median distances travelled by patients, the most frequent type of referrals, and diagnosis categories or precise diagnoses (when available) can be obtained.
Interestingly, when compared with the data obtained for the Head and Neck Network, in which nearly 80% of patients are required to visit Paris hospitals to obtain diagnosis, care or follow-up [
18], the distribution of RCRD/CERD within the AnDDI-Rares network (Additional file
2: Figure S1A) has optimized the distance patients must travel to obtain specialist care for RD.
Unlike most registries collecting detailed information on specific rare diseases, the main aim of this nationwide database is not to improve knowledge of the natural history of diseases. This is because the scope of CEMARA covers all rare diseases, which is too vast to achieve such a goal. Even so, some aspects of a disease’s natural history can be analyzed directly (e.g., age at first signs, age at death), while others can be inferred (e.g., from the care pathway or age at diagnosis). A deep phenotypic description is possible through HPO terminology, but we observed that not all physicians code this optional information extensively.
In addition, we demonstrated the information that can be collected on specific topics using the examples of four well-known, easily recognizable diseases, including two chromosomal abnormalities (22q11 microdeletion and Williams syndromes) and two Mendelian diseases (Cornelia de Lange (CdLS) and Rubinstein-Taybi (RTS) syndromes). In these examples, the large number of patients in the database makes it possible to compare with other initiatives [
20‐
23]. CEMARA includes a greater number of patients than the largest available national cohort for three of the four diseases chosen herein. Only the Children’s Hospital of Philadelphia’s cohort on Cornelia de Lange syndrome was larger than CEMARA’s.
In the field of RD, patient organizations are usually the best resource for reaching out to a significant number of patients for a given disease. However, this national database proves to be an even more efficient tool for collecting patient data since it includes information on every patient seen in an RCRD and in some CERD. It can thus provide larger cohorts than those currently found in the literature for most RD. Similarly, over time the database has accumulated a vast number of patients with various chromosomal abnormalities diagnosed by karyotype, FISH or array-CGH. Data on chromosomal abnormalities associated with developmental phenotypes can be of great interest, yet there have been no extensive epidemiological references in the literature since the use of chromosomal microarrays became more common. Researchers can now solicit the network if they want to focus on a certain disease and contact the referring clinicians all over the country for additional information. This possibility will be valuable for international collaborations on the increasing numbers of ultra-RD, but also for long-term follow-up of well-known diseases.
The CEMARA registry is comparable with other national projects that have been published in the literature. In Europe, CEMARA most closely resembles its Italian counterpart, which was launched in 2001 as a government baseline project to support health policy decision-making in the field of RD [
24,
25]. They established a national registry of RD as a network of regional networks through 247 formally designated centers with recognized expertise, reaching full coverage of the country by 2011. After a common data set was defined for the country, different quality control processes were performed at regional and national levels. One of the main issues was tracking duplicate records. Up to June 2012, 110,841 patients had been recorded. Data were carefully monitored through a validation process using formal criteria, and issues in the data were corrected by the data sources. Data on age at onset and sex distribution were provided for about 400 diseases, and incidence and/or birth prevalence were provided for 275 diseases and 47 disease groups, which, altogether, comprise a substantial part of the known RD. The main difference lies in the fact that CEMARA was launched as a national project, allowing nationwide common data collection from the outset and thus greater hindsight than the Italian project. Both projects share similarities regarding the type of data collected, which may pave the way for comparative and/or pooled-data studies.
Other initiatives exist outside of Europe. In the majority of cases, the strategy was to create alliances of existing RD registries, with the creation of a central repository aiming to improve consistency, harmonize data, support the development of knowledge on RD, share data, enhance research collaboration, improve interoperability, and reduce costs. The US National Institutes of Health launched an initiative to create a Global RD Patient Registry and Data Repository in 2010 [
26], but unlike CEMARA, contribution to this RD hub was voluntary. In China, a nationwide RD registry has been set up along with a biobank of genomic data to provide standardization and create research collaborations, both domestic and international [
27,
28]. In 2017, Japan decided to combine data from 300 RD projects through a cross-sectional data integration platform (RADDAR-J) [
29], aiming to promote data sharing and secondary use for research and collaboration. This Japanese initiative focused on only 300 RD and therefore covers fewer diseases than CEMARA. A global observatory for rare diseases could be achieved through the combination of these various initiatives, to the great benefit of patients: given the small number of cases in each country, it is of paramount importance that data be analyzed on the widest possible scale.
This work provides information on the functioning of the database over its first 10 years. We have identified many important limitations that we wish to share with other countries attempting to implement nationwide epidemiological projects. Epidemiological information regarding RD is challenging to collect for a number of reasons, including the coding and classification of RD. In our case, this difficulty was overcome with the implementation of a unique disease identifier, the OrphaCode, resulting from the exhaustive work of the online rare disease database Orphanet on the labeling of diseases. While exhaustiveness is usually a challenge for any registry, public funding conditional on participation in the CEMARA project will remain a significant incentive. Unfortunately, such an incentive is not in place for CERD, even though the French Ministry of Health is providing other operational support in order to facilitate inclusion. Indeed, unlike the CERD, the RCRD have an obligation to enter all of their activity into the CEMARA database to keep their funding. As a result, most CERD do not collect patient data, which limits the epidemiological work that can be carried out. Another, more minor, limitation is that of duplicates: patients can consult in different RCRD/CERD, each of which creates a new file, so any multicenter analysis requires the identification of potential duplicates. A further limitation is the lack of homogeneity in the way data are entered in the different RCRD/CERD, since the definition of items may not always be straightforward. For instance, one physician may consider a diagnosis as confirmed based on clinical evidence, while another may consider that confirmation is achieved only after genetic confirmation. Some improvements have been made to overcome these issues, including a frame of reference to homogenize the way data are entered, and communication in meetings to stress the importance of epidemiology in France.
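The duplicate issue described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each record carries a sex, a birth date and an OrphaCode; these field names, the placeholder codes, and the `find_potential_duplicates` helper are hypothetical and do not describe the actual CEMARA schema or its matching procedure. Records from different centers that share the same key are flagged for manual review rather than merged automatically.

```python
from collections import defaultdict
from typing import NamedTuple


class Record(NamedTuple):
    """Hypothetical, simplified patient record (not the CEMARA schema)."""
    record_id: str
    center: str
    sex: str
    birth_date: str  # ISO format, e.g. "2010-05-17"
    orpha_code: str  # placeholder values below, not real OrphaCodes


def find_potential_duplicates(records):
    """Group records by (sex, birth date, OrphaCode).

    Returns the sorted record IDs of every group that spans more than
    one center, i.e. candidates for manual duplicate review.
    """
    groups = defaultdict(list)
    for rec in records:
        # Normalize sex so that "f" and "F" fall into the same group.
        key = (rec.sex.strip().upper(), rec.birth_date, rec.orpha_code)
        groups[key].append(rec)
    return [
        sorted(r.record_id for r in recs)
        for recs in groups.values()
        if len({r.center for r in recs}) > 1
    ]


records = [
    Record("A1", "Center A", "F", "2010-05-17", "ORPHA:0001"),
    Record("B7", "Center B", "f", "2010-05-17", "ORPHA:0001"),
    Record("A2", "Center A", "M", "2012-01-03", "ORPHA:0002"),
]
print(find_potential_duplicates(records))  # [['A1', 'B7']]
```

Such a coarse key over-flags by design (two unrelated patients may share a birth date, sex and diagnosis), which is why the sketch only proposes candidates for review.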
A major issue is the surveillance of patients with no diagnosis, which is considered a priority of the third national RD plan [
30]. Indeed, the database does not make it possible to identify age at clinical diagnosis, age at diagnosis of a category, or a precise clinical diagnosis established by chromosomal/molecular confirmation. This issue will be addressed in the next version of the database, since it is part of the vast national epidemiological surveillance project for undiagnosed patients. It is also difficult to ensure that the data are updated when a diagnosis is made, particularly when the results are not delivered in the context of a new referral to the RCRD/CERD. Despite the limited amount of information collected, specific studies could be performed within the network by identifying the exact number of patients with a disease of interest in each RCRD/CERD in France. This would enable national studies to be performed or, through linkage with other sources, make it possible to seek data that could be used to improve the management of RD, facilitate research (such as phenotype/genotype correlations or drug surveillance), or expose economic issues such as the burden of RD. New perspectives are currently emerging with the launch of a registry of patients with no diagnosis, which will make it possible to better identify undiagnosed patients to whom new research programs could be proposed. Also, with therapeutic projects in RD on the horizon, the database will allow the selection of potential candidates for a therapeutic trial according to their demographic characteristics. For this purpose, although individual sites cannot access data from other sites, it is possible to ask the project coordination team, in agreement with the network, for the number of people meeting selected criteria and their referring center. In this way, applicants can contact their colleagues in the framework of their project.
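As an illustration of the kind of request described above, the following sketch returns only the number of matching patients per referring center, never patient-level data. The record layout (`center`, `orpha_code`, `age`), the placeholder codes and the `count_by_center` function are assumptions made for this example; they do not reflect the actual CEMARA data model or the coordination team's tooling.

```python
from collections import Counter


def count_by_center(records, orpha_code, min_age=None, max_age=None):
    """Return {referring center: number of matching patients}.

    Only aggregate counts are returned, so a requesting site learns
    which centers to contact without seeing other sites' records.
    """
    counts = Counter()
    for rec in records:
        if rec["orpha_code"] != orpha_code:
            continue
        if min_age is not None and rec["age"] < min_age:
            continue
        if max_age is not None and rec["age"] > max_age:
            continue
        counts[rec["center"]] += 1
    return dict(counts)


# Hypothetical records with placeholder disease codes.
records = [
    {"center": "Center A", "orpha_code": "ORPHA:0001", "age": 4},
    {"center": "Center A", "orpha_code": "ORPHA:0001", "age": 12},
    {"center": "Center B", "orpha_code": "ORPHA:0001", "age": 7},
    {"center": "Center B", "orpha_code": "ORPHA:0002", "age": 9},
]
print(count_by_center(records, "ORPHA:0001", max_age=10))
# {'Center A': 1, 'Center B': 1}
```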