Viral Proxies
Depending on data availability, viral proxies can be derived from hospital or surveillance data. The testing frequency of viral surveillance systems has been predominantly driven by influenza activity rather than RSV (or respiratory virus activity as a whole). Therefore, such systems may underestimate the circulation of RSV by testing less frequently during peak RSV activity. Also, many of these systems are based on influenza-like illness case definitions requiring fever, which is less commonly seen in RSV. As our model primarily aims to estimate medically attended RSV burden, we use hospital-based viral proxies where possible, as this allows the proxy to be directly derived from the healthcare system whose outcomes we are assessing and avoids any geographic mismatch that might arise from the use of sentinel viral surveillance data. The proxies seek to track the relative level of viral activity in the community, so the absolute value of the activity is less important than consistent measurement across the year. On this basis, as has been done in other studies [6, 29, 31], we use pediatric RSV activity for the RSV activity proxy as testing is frequent among young children, allowing for consistent measurement of RSV activity. The RSV proxy is represented by the number of RSV-related hospitalizations (ICD-10 codes: B97.4, J21.0, J12.1, J20.5, J21.9 or ICD-9 codes: 079.6, 466.11, 480.1, 466.1) in children < 2 years. Because the vast majority of bronchiolitis cases and hospitalizations in children < 2 years are related to RSV [34, 35], the more generic bronchiolitis code (J21.9 or 466.1) is included in the RSV proxy to account for the reduction in RSV testing during the peak and tail of the season, which we have observed in administrative databases in several countries. For influenza, the largest burden and most consistent testing are among older adults, so we use influenza-specific hospitalizations (ICD-10 codes: J09-J11 or ICD-9 codes: 487, 488) in adults ≥ 65 years as has been done in other studies [13].
Time lags of 0 up to 4 weeks between the viral proxy and the outcome of interest are considered during model building to account for delays between changes in viral proxy detection and the number of events. For GP visits, potential time lags are shortened (0–2 weeks) to reflect the expectation that this would be the first source of care in most cases.
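As an illustration only, the candidate lagged proxies considered during model building could be constructed as in the sketch below. The function names and the toy weekly counts are hypothetical and are not taken from the study scripts; weeks for which no lagged value exists are marked as missing.

```python
def lag_series(values, lag):
    """Shift a weekly proxy series by `lag` weeks.

    Position t of the result holds the proxy value observed at week t - lag;
    the first `lag` positions have no value and are set to None.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    values = list(values)
    return ([None] * lag + values)[:len(values)]


def candidate_lags(proxy, max_lag):
    """Return {lag: lagged series} for every lag from 0 to max_lag."""
    return {m: lag_series(proxy, m) for m in range(max_lag + 1)}


# Hypothetical weekly RSV proxy counts (illustrative values only).
rsv_proxy = [3, 5, 9, 14, 11, 6, 2]

# Lags of 0-4 weeks for ED visits, hospitalizations and deaths;
# lags of 0-2 weeks for GP visits.
hospital_lags = candidate_lags(rsv_proxy, max_lag=4)
gp_lags = candidate_lags(rsv_proxy, max_lag=2)
```

In practice, the weeks with missing lagged values at the start of the series would have to be dropped or covered by proxy data from before the study period; the protocol text does not specify which.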
Stratifying Variables
Proposed age groups are 18–44 years, 45–64 years, 65–79 years and ≥ 80 years, but can be adapted to country-specific vaccination recommendations and data availability.
High-risk status for RSV is defined as the presence of at least one comorbidity code (Supplementary Materials, Table 3) within 1 year prior to the event. Low risk is defined as the absence of any comorbidity codes. Due to limited knowledge of risk factors for severe RSV disease [3], risk factors for influenza are used to develop the set of comorbidity codes [36]. Data on the risk status should be obtained from the same database as the outcome data.
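The risk classification can be sketched as follows. This is a minimal example under stated assumptions: the look-back window is taken as 365 days, codes recorded on the event date itself are excluded, and the function and argument names are hypothetical. The actual comorbidity codes are those listed in Supplementary Materials, Table 3, and are not reproduced here.

```python
from datetime import date, timedelta


def risk_status(event_date, comorbidity_dates, lookback_days=365):
    """Classify an event as 'high' or 'low' risk.

    `comorbidity_dates` holds the dates on which any qualifying comorbidity
    code was recorded for the individual. Assumption: the window covers the
    `lookback_days` days before the event and excludes the event date.
    """
    window_start = event_date - timedelta(days=lookback_days)
    has_code = any(window_start <= d < event_date for d in comorbidity_dates)
    return "high" if has_code else "low"


# Hypothetical event with one comorbidity code recorded 8.5 months earlier.
status = risk_status(date(2020, 3, 1), [date(2019, 6, 15)])
```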
Planned Outcomes
The generic protocol proposes four types of events: GP visits, ED visits, hospitalizations and deaths. A GP visit is defined as a visit to a GP. An ED visit is defined as a visit to a medical treatment facility specialized in emergency medicine, not leading to hospitalization. A hospitalization is defined as an overnight stay in a hospital.
The protocol proposes four primary outcomes: all cardiorespiratory (broad), selected cardiorespiratory (narrow), all respiratory and all cardiovascular events (see Supplementary Materials, Table 1). Both broad and narrow cardiorespiratory event definitions are considered to differentiate between the full group of respiratory and cardiovascular events and the selected cardiorespiratory codes most likely to be associated with RSV, as recommended by experts and existing literature [3, 4]. For the ICD outcome grouping, both primary and secondary diagnoses are used, as has been done in other studies [18, 22, 23, 31], to obtain a more comprehensive assessment of RSV-attributable events. This strategy is chosen because the use of the primary diagnosis only has been shown to underestimate the LRTI burden [37]. For deaths, outcome groups are defined using the underlying cause of death. If data on all reported causes of death are available, a sensitivity analysis could be conducted in which outcome groups are defined using all reported causes of death.
In addition to the primary outcomes, nine secondary outcomes are selected, based on literature review and usefulness for policy assessment, to provide more specific estimates that can be used for economic evaluation [4]. The following secondary outcomes, each a subset of the primary outcomes and defined by ICD code groups, are incorporated: influenza or pneumonia, bronchitis or bronchiolitis, chronic lower respiratory diseases, upper respiratory diseases, chronic heart failure exacerbations, ischaemic heart diseases, arrhythmias, cerebrovascular diseases and myocarditis (see Supplementary Materials, Table 2).
Data Requirements
The minimum data to be collected from each country-specific study include (1) outcomes (as defined above); (2) viral proxies (as defined above); (3) age group, for stratification by age groups; (4) risk status (if available), for further stratification by risk status.
Data are obtained from diverse sources, including national/regional registries or claims/EHRs from different care settings, such as GP/outpatient, ED, hospital and death registries.
Outcome data should be aggregated at least monthly, ensuring sufficient variability for seasonal modelling, and should have a well-defined catchment population (denominator) accurately representing the region/country studied, enabling incidence calculations. If the system does not capture all events in the catchment area, well-delineated adjustment factors should exist (e.g., a scaling factor to weight specific age groups up or down). Risk status information should be extracted from the data sources from which outcome data are obtained, as the prevalence of risk factors is expected to differ by event type. To obtain risk-specific incidence rates, the catchment population should also be available stratified by risk status.
Preparation of Time Series Data
Data are aggregated weekly (or monthly) by age group and (if applicable) risk status. If cells with low counts (i.e., below the country-specific limit to guarantee anonymity, usually 5) are suppressed, a random number within the suppressed range (e.g., 1–4) is imputed to complete the time series. A shell table for constructing time series of the outcome data is given in the Supplementary Materials, Table 4.
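The imputation of suppressed cells can be sketched as follows. This is an illustrative example, not the study code: suppressed cells are assumed to be encoded as `None`, the suppressed range is assumed to be 1–4, and the function name and seed are hypothetical. Fixing the seed makes the completed time series reproducible.

```python
import random


def impute_suppressed(counts, low=1, high=4, seed=None):
    """Replace suppressed cells (None) with a random integer in [low, high].

    Observed counts, including true zeros, are left unchanged.
    """
    rng = random.Random(seed)
    return [rng.randint(low, high) if c is None else c for c in counts]


# Hypothetical weekly counts for one stratum; None marks a suppressed cell.
weekly = [12, None, 7, None, 0, 9]
completed = impute_suppressed(weekly, seed=2024)
```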
Viral proxy data are extracted from hospital registries as discussed above and should be aggregated at the same level as the modelled outcome data (i.e., weekly or monthly). Shell tables for the weekly and monthly viral proxy data are given in the Supplementary Materials, Tables 5 and 6.
Data Analysis
Each country-specific study should establish a Statistical Analysis Plan (SAP) adapted to the country-specific data before initiating data analysis. An example of a country-specific SAP, for Spain, is provided in the Supplementary Materials. Quality control of the analysis scripts is planned before the data are analysed. An example of country-specific scripts can be obtained from the authors upon request.
Descriptive statistics summarize the number of events for each year, both overall and stratified by age group and (if applicable) risk status. The observed number of events is plotted over time for each outcome stratified by age group and risk status (e.g., respiratory hospitalizations for adults aged 18–44 years with high-risk conditions) to evaluate whether a seasonal trend is visually present, hence qualifying the data for seasonal modelling.
The weekly (or monthly) number of events is modelled separately for each outcome and each stratum (age group and, when applicable, risk status) using a quasi-Poisson regression model to allow for potential overdispersion. The identity link function is chosen to reflect the most plausible biological relation between viral circulation and event occurrence. Seasonal variations in the number of events are captured by periodic time trends, represented by sine and cosine terms with a period of one year (52.143 for weekly data or 12 for monthly data). Aperiodic time trends are represented by a polynomial of up to the fourth order. The seasonal terms are included to model the outcome (e.g., all respiratory events), not the viral proxy (e.g., RSV). RSV enters the model as a covariate; therefore, the regional pattern of RSV should not affect the suitability of this modelling approach. Viral activity is represented by appropriately lagged viral proxies for RSV and influenza. Although we anticipate a shorter lag for influenza than for RSV, we allow the model to select the most suitable time lag for each pathogen.
Assume that the (weekly or monthly) number of events follows a Poisson distribution, \(\text{Nr.events}_{t} \sim \text{Poisson}\left(\lambda_{t} \cdot \theta\right)\), with \(t = 1, 2, 3, \ldots, T\) the running week and \(T\) the total number of weeks in the study period. The expected number of events is then \(E\left(\text{Nr.events}_{t}\right) = \lambda_{t}\) and the variance is \(\text{Var}\left(\text{Nr.events}_{t}\right) = \lambda_{t} \cdot \theta\), with \(\theta\) the overdispersion parameter. For weekly data, \(\lambda_{t}\) is specified as follows:
$$\lambda_{t} = \beta_{0} + \sum_{k=1}^{4} \beta_{k} \cdot t^{k} + \beta_{5} \cdot \sin\left(\frac{2\pi \cdot t}{52.143}\right) + \beta_{6} \cdot \cos\left(\frac{2\pi \cdot t}{52.143}\right) + \beta_{7} \cdot \sin\left(\frac{4\pi \cdot t}{52.143}\right) + \beta_{8} \cdot \cos\left(\frac{4\pi \cdot t}{52.143}\right) + \sum_{l=1}^{L} \beta_{(8+l)} \cdot \text{VP}_{l,\,t-m_{l}}$$
where \(\beta_{0}\) is the expected number of baseline events, \(\beta_{k}\ (k = 1, \ldots, 4)\) are coefficients associated with aperiodic time trends, and \(\beta_{q}\ (q = 5, \ldots, 8)\) are coefficients corresponding to yearly and half-yearly time trends. The effect of pathogen \(l\) (\(l = 1, \ldots, L\), with \(L\) the total number of pathogens under consideration) is represented by the coefficient \(\beta_{8+l}\) associated with the appropriately lagged activity of pathogens \(\text{VP}_{1}, \ldots, \text{VP}_{L}\), with \(m_{l} = 0, 1, \ldots, M\) and \(M\) the maximally allowed time lag (2 or 4, depending on the outcome).
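To make the structure of the weekly linear predictor concrete, the sketch below builds the covariate row for a given week and evaluates \(\lambda_{t}\) for a given coefficient vector. It covers only the evaluation of the mean, not the estimation of the coefficients. The function names, coefficient values and proxy values are hypothetical and chosen for illustration.

```python
import math

WEEKS_PER_YEAR = 52.143


def design_row(t, lagged_proxies, poly_order=4, period=WEEKS_PER_YEAR):
    """Covariates for week t, ordered as the beta coefficients:
    intercept, t^1..t^poly_order, yearly sin/cos, half-yearly sin/cos,
    then one already-lagged proxy value per pathogen."""
    w = 2 * math.pi * t / period
    row = [1.0]
    row += [float(t) ** k for k in range(1, poly_order + 1)]
    row += [math.sin(w), math.cos(w), math.sin(2 * w), math.cos(2 * w)]
    row += [float(vp) for vp in lagged_proxies]
    return row


def expected_events(beta, row):
    """Identity link: the expected count is the plain linear combination."""
    if len(beta) != len(row):
        raise ValueError("beta and row must have the same length")
    return sum(b * x for b, x in zip(beta, row))


# Hypothetical coefficients for L = 2 pathogens (RSV, influenza).
beta = [100.0, 0.1, 0.0, 0.0, 0.0, 5.0, 10.0, 0.0, 0.0, 2.0, 1.5]
row = design_row(10, lagged_proxies=[8, 4])  # week 10, lagged proxy counts 8 and 4
lam = expected_events(beta, row)
```

If the polynomial order is reduced during model building, `poly_order` changes and the positions of the later coefficients shift accordingly.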
The expected number of monthly events is specified as follows:
$$\lambda_{t} = \beta_{0} + \sum_{k=1}^{4} \beta_{k} \cdot t^{k} + \beta_{5} \cdot \sin\left(\frac{2\pi \cdot t}{12}\right) + \beta_{6} \cdot \cos\left(\frac{2\pi \cdot t}{12}\right) + \sum_{l=1}^{L} \beta_{(6+l)} \cdot \text{VP}_{l,\,t}$$
where \(\beta_{0}\), \(\beta_{k}\ (k = 1, \ldots, 4)\) and \(\beta_{q}\ (q = 5, 6)\) are defined as above, and the effect of pathogens \(\text{VP}_{1}, \ldots, \text{VP}_{L}\) is represented by the coefficients \(\beta_{6+l}\ (l = 1, \ldots, L)\).
The model-building procedure consists of two steps: first, identify the appropriate order of the aperiodic time trend; second, determine the proper lag of the viral proxies. In the first step, the model is fitted with only time trends (periodic time trends and aperiodic time trends with all four polynomial terms). The periodic time trends are fixed to reflect the biological plausibility of seasonal trends in the data, while the order of the aperiodic time trend (\(\sum_{k=1}^{4} \beta_{k} \cdot t^{k}\)) can be reduced down to first order (α = 0.05). In the second step, each (lagged) proxy variable is added to the model from step 1, one at a time. The variable with the highest test statistic is selected for inclusion in the model. Selecting on the test statistic rather than the P value is preferred because the test statistic retains the direction of the association, in line with the assumption that it is biologically implausible for viral activity to protect against the outcomes of interest [16, 38]. Once a pathogen is included in the model, this step is repeated with the variables corresponding to the remaining pathogens until one (lagged) variable for each pathogen is included in the final model.
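The second step can be sketched as a forward selection loop. In this minimal example, `fit_statistic` is a hypothetical callable standing in for refitting the quasi-Poisson model and returning the signed test statistic of the candidate proxy; the toy statistics are invented for illustration and do not come from any fitted model.

```python
def select_lags(pathogens, max_lag, fit_statistic):
    """Choose one lag per pathogen by repeated forward selection.

    fit_statistic(selected, candidate) must return the signed test statistic
    of `candidate` = (pathogen, lag) when it is added to the step-1 model plus
    the already selected proxies in `selected` ({pathogen: lag}).
    """
    selected = {}
    remaining = list(pathogens)
    while remaining:
        candidates = [(p, m) for p in remaining for m in range(max_lag + 1)]
        # The highest signed statistic wins, so negative (protective)
        # associations rank below any positive one.
        best = max(candidates, key=lambda c: fit_statistic(dict(selected), c))
        selected[best[0]] = best[1]
        remaining.remove(best[0])
    return selected


# Toy stand-in for model fitting: a fixed lookup of test statistics.
toy_stats = {
    ("RSV", 0): 2.1, ("RSV", 1): 3.4, ("RSV", 2): 1.0,
    ("influenza", 0): 4.2, ("influenza", 1): 2.9, ("influenza", 2): -0.5,
}
chosen = select_lags(["RSV", "influenza"], max_lag=2,
                     fit_statistic=lambda selected, cand: toy_stats[cand])
```

With a real fitting routine, the statistic of each remaining candidate would generally change once another pathogen has entered the model, which is why the callable receives the current selection.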
Model fit is assessed visually by inspecting plots of the observed versus estimated number of events over time, as there is no readily available numeric goodness-of-fit measure. The number of events attributable to RSV is calculated as the difference between the expected number of events from the full model and that from the model without the RSV term (i.e., with the coefficient associated with RSV set to zero). The yearly incidence rates (IRs) of events attributable to RSV are calculated as the annual number of events attributable to RSV divided by the corresponding denominators (multiplied by 100,000). Depending on the database, the denominators are either the age- and risk-specific (if applicable) census population or the number of individuals captured within the registered nationally representative databases. The confidence intervals around the estimates are obtained using residual bootstrapping with 1000 bootstrapped samples [39]. Given the large number of bootstrapped samples, the IRs are assumed to be normally distributed with a mean equal to the estimated IR. Results are presented as IRs with their 95% confidence intervals (CIs). When risk- and age-specific analyses are performed, the corresponding IR in combined age- and/or risk-specific populations is calculated as the sum of the number of events attributable to RSV across risk groups and age groups divided by the sum of the corresponding population sizes (multiplied by 100,000).
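Because the model uses the identity link, the difference between the full-model expectation and the expectation with the RSV coefficient set to zero reduces, for each time point, to the RSV coefficient multiplied by the lagged RSV proxy. The sketch below illustrates this calculation together with the incidence rate and a normal-approximation interval. It is one possible reading of the procedure, in which the standard deviation of the bootstrapped IRs is used as the standard error; that choice, the function names and all numbers are assumptions made for illustration.

```python
import statistics


def attributable_events(beta_rsv, lagged_rsv_proxy):
    """Identity link: full model minus model with beta_RSV = 0 is beta_RSV * VP."""
    return [beta_rsv * vp for vp in lagged_rsv_proxy]


def incidence_rate(events, population, per=100_000):
    """Attributable events divided by the denominator, per 100,000."""
    return sum(events) / population * per


def normal_ci(estimate, bootstrap_estimates, z=1.96):
    """95% CI centred on the estimate; the SD of the bootstrapped IRs is
    used as the standard error (an assumption of this sketch)."""
    sd = statistics.stdev(bootstrap_estimates)
    return estimate - z * sd, estimate + z * sd


def combined_rate(events_by_stratum, population_by_stratum, per=100_000):
    """IR across strata: summed attributable events over summed populations."""
    return sum(events_by_stratum) / sum(population_by_stratum) * per


# Hypothetical values for one stratum and one year.
events = attributable_events(0.5, [10, 20, 30])
ir = incidence_rate(events, population=60_000)
boot = [48.0, 52.0, 50.0, 49.0, 51.0]  # stands in for 1000 bootstrapped IRs
lower, upper = normal_ci(ir, boot)
combined = combined_rate([30.0, 10.0], [60_000, 40_000])
```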
To give a broad overview of the disease burden, the yearly percentage of events attributable to RSV is calculated as the yearly number of events attributable to RSV divided by the yearly number of observed events.
While the primary analysis for this study is based on the frequentist framework in all countries, the analysis could also be conducted in a Bayesian framework. This framework has the advantage of easily incorporating prior knowledge in parameter estimation (e.g., constraining the RSV coefficient to be non-negative to reflect the assumption that RSV is not likely to protect against the outcomes of interest) and of yielding the posterior mean with its 95% credible interval without the need for additional analyses such as bootstrapping. However, a Bayesian model may be difficult to bring to convergence and, when convergence is obtained, may require a considerable runtime. Given the potential benefit of such analyses, weighed against their limitations, a Bayesian model similar to that proposed by Zheng et al. [31] is used as a sensitivity analysis to assess the impact of the selected framework on the primary outcomes in the first two countries.