2.1 Automatic Selection of Controls
In this article, we present a dual approach that combines the Anatomical Therapeutic Chemical (ATC) classification system code and the approved therapeutic indication in the US Summary of Product Characteristics (SmPC) of galcanezumab to automate the selection of controls for disproportionality analysis in FAERS. The ATC code is a unique code assigned to each active ingredient and consists of five levels: (1) the anatomical main group, (2) the therapeutic subgroup, (3) the pharmacological subgroup, (4) the chemical subgroup, and (5) the chemical substance [7]. Using the fourth level of the ATC code (i.e., the chemical subgroup), we identified all active ingredients with the same therapeutic target as galcanezumab (i.e., CGRP antagonists). Additional controls were identified by selecting active ingredients outside the chemical subgroup of galcanezumab but with the same approved therapeutic indication, to avoid masking due to a drug class effect [8] and confounding by indication. Drug class effect masking can occur when only drugs within the same chemical subgroup are used as controls in disproportionality analysis, because an AE shared by the whole class is also frequently reported for the controls and therefore does not stand out [8]. Confounding by indication is a type of bias that can occur when drugs used for different therapeutic indications are compared in disproportionality analysis, because differences in reporting may then reflect the underlying diseases rather than the drugs [9].
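As a minimal sketch of the first selection step, the snippet below splits a seven-character ATC code into its five levels and returns every ingredient that shares the target's fourth level (chemical subgroup). The small ATC index is illustrative only and should be checked against the WHO ATC index; the function names are hypothetical, and this is not the authors' code.

```python
def atc_levels(code: str) -> dict[int, str]:
    """Split a 7-character ATC code into its five hierarchical levels."""
    code = code.strip().upper()
    if len(code) != 7:
        raise ValueError(f"expected a 7-character ATC code, got {code!r}")
    return {
        1: code[:1],  # anatomical main group, e.g. 'N' (nervous system)
        2: code[:3],  # therapeutic subgroup, e.g. 'N02' (analgesics)
        3: code[:4],  # pharmacological subgroup, e.g. 'N02C' (antimigraine preparations)
        4: code[:5],  # chemical subgroup, e.g. 'N02CD' (CGRP antagonists)
        5: code,      # chemical substance, e.g. 'N02CD02' (galcanezumab)
    }


def same_chemical_subgroup(target: str, atc_index: dict[str, str]) -> list[str]:
    """Ingredients other than `target` that share the target's 4th ATC level."""
    subgroup = atc_levels(atc_index[target])[4]
    return sorted(
        name
        for name, code in atc_index.items()
        if name != target and atc_levels(code)[4] == subgroup
    )


# Illustrative subset of the ATC index (verify codes against the WHO ATC/DDD index).
atc_index = {
    "galcanezumab": "N02CD02",
    "erenumab": "N02CD01",
    "fremanezumab": "N02CD03",
    "topiramate": "N03AX11",
    "propranolol": "C07AA05",
}

print(same_chemical_subgroup("galcanezumab", atc_index))  # ['erenumab', 'fremanezumab']
```

Working on string prefixes keeps the sketch independent of any particular ATC data source; in practice the index would come from a licensed copy of the ATC classification.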
To identify controls with the same approved therapeutic indication but outside the chemical subgroup of galcanezumab, we used DrugBank, a comprehensive online database containing information on active pharmaceutical ingredients and their pharmacological targets. DrugBank provides a structured field known as “associated condition,” a term from the drug indication section that refers to any specific state or medical condition for which a drug is indicated. Searching this field returns all active ingredients with an approved therapeutic indication for the medical condition under investigation, which, in the case of galcanezumab, is “migraine prophylaxis.” To test the reliability of our approach for selecting controls in disproportionality analysis, we computed the proportion of drug classes correctly retrieved among the drug classes mentioned in two recently published reviews [10, 11]: (number of drug classes retrieved with our approach / number of drug classes mentioned in the two reviews) × 100%. For each active ingredient, the drug class was defined as the fourth level of the ATC code.
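Combining the two selection routes and the reliability metric might look like the sketch below. The DrugBank “associated condition” search is mocked as a hard-coded list, and the reference drug classes are likewise hypothetical. The metric counts a reference class as retrieved when it appears among the classes of the selected controls, which is one reading of “correctly retrieved”; all function names are hypothetical.

```python
def chemical_subgroup(code: str) -> str:
    """4th ATC level (chemical subgroup): the first five characters of the code."""
    return code[:5]


def select_controls(target, atc_index, same_indication):
    """Union of (a) ingredients in the target's chemical subgroup and
    (b) same-indication ingredients from other subgroups, excluding the target."""
    subgroup = chemical_subgroup(atc_index[target])
    class_controls = {
        name for name, code in atc_index.items()
        if name != target and chemical_subgroup(code) == subgroup
    }
    indication_controls = {
        name for name in same_indication
        if name != target and chemical_subgroup(atc_index[name]) != subgroup
    }
    return class_controls | indication_controls


def retrieval_percentage(retrieved_classes, reference_classes):
    """(reference classes that were also retrieved / reference classes) x 100."""
    reference = set(reference_classes)
    if not reference:
        raise ValueError("the reference set of drug classes is empty")
    return 100.0 * len(set(retrieved_classes) & reference) / len(reference)


# Illustrative ATC codes (verify against the WHO ATC/DDD index).
atc_index = {
    "galcanezumab": "N02CD02",
    "erenumab": "N02CD01",
    "fremanezumab": "N02CD03",
    "topiramate": "N03AX11",
    "propranolol": "C07AA05",
}
# Hypothetical result of an "associated condition" search (not real DrugBank output).
same_indication = ["galcanezumab", "erenumab", "topiramate", "propranolol"]
# Hypothetical drug classes named in the reference reviews.
reference_classes = {"N02CD", "N03AX", "C07AA", "M03AX"}

controls = select_controls("galcanezumab", atc_index, same_indication)
retrieved = {chemical_subgroup(atc_index[name]) for name in controls}
print(sorted(controls))  # ['erenumab', 'fremanezumab', 'propranolol', 'topiramate']
print(retrieval_percentage(retrieved, reference_classes))  # 75.0
```

Here three of the four hypothetical reference classes are covered, so the reliability is 75%.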
2.2 Data Sources and Management
ICSRs in which galcanezumab or the selected controls (see Sect. 2.1) were reported as suspected drugs were extracted from FAERS, the spontaneous reporting database maintained by the FDA to support its post-marketing safety surveillance. The database collects spontaneous reports of AEs, product quality complaints, and medication errors submitted by health care professionals and consumers. ICSRs in FAERS follow the structure defined by the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH) E2B guideline. The same structure is used by the spontaneous reporting databases of all countries participating in the World Health Organization’s Programme for International Drug Monitoring, which covers 99% of countries worldwide [12]. All available zipped ASCII quarterly data extract files from the fourth quarter of 2012 to the third quarter of 2021 were downloaded from the OpenFDA website. A local database was set up in R (version 4.1.2, R Development Core Team), using the primary identifier of each ICSR as the linkage key, based on the relational structure described by Kass-Hout and colleagues [13].
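The linkage step can be illustrated with a small sketch. It assumes the FAERS quarterly ASCII layout, in which each table (e.g., DRUG, REAC) is a `$`-delimited text file sharing the `primaryid` column, and in which the DRUG table codes each drug's role in `role_cod` (`PS` primary suspect, `SS` secondary suspect, `C` concomitant, `I` interacting). The toy rows use only a subset of the real columns and are invented; exact matching on the `prod_ai` (active ingredient) column is a simplification, since real extracts need drug-name normalization. The analysis itself was set up in R; Python and the helper names here are for illustration only.

```python
import csv
import io
from collections import defaultdict


def read_faers_table(text):
    """Parse one '$'-delimited FAERS ASCII table into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="$"))


def suspect_report_ids(drug_rows, ingredients, roles=("PS", "SS")):
    """primaryid of every report listing one of `ingredients` as a suspect drug."""
    wanted = {name.lower() for name in ingredients}
    return {
        row["primaryid"]
        for row in drug_rows
        if row["role_cod"] in roles and row["prod_ai"].strip().lower() in wanted
    }


def link_reactions(reac_rows, report_ids):
    """Map each selected primaryid to the set of MedDRA preferred terms reported."""
    linked = defaultdict(set)
    for row in reac_rows:
        if row["primaryid"] in report_ids:
            linked[row["primaryid"]].add(row["pt"])
    return dict(linked)


# Invented toy extracts (column subset only; not real FAERS records).
DRUG_TXT = """primaryid$caseid$drug_seq$role_cod$drugname$prod_ai
1001$11$1$PS$EMGALITY$GALCANEZUMAB
1001$11$2$C$TOPAMAX$TOPIRAMATE
1002$12$1$PS$AIMOVIG$ERENUMAB
1003$13$1$C$EMGALITY$GALCANEZUMAB
"""
REAC_TXT = """primaryid$caseid$pt
1001$11$Injection site pain
1002$12$Constipation
1003$13$Headache
"""

drug_rows = read_faers_table(DRUG_TXT)
reac_rows = read_faers_table(REAC_TXT)
galca_ids = suspect_report_ids(drug_rows, ["galcanezumab"])
print(galca_ids)  # {'1001'}: report 1003 lists galcanezumab only as concomitant
print(link_reactions(reac_rows, galca_ids))
```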
2.4 The Rationale for Using Conditional Inference Trees
A conditional inference tree is a type of decision tree that builds a prediction model by relating characteristics of cases and controls (i.e., the predictors) to the outcome (i.e., the AE). In a decision tree, the predictors and the outcome under investigation can be binary (e.g., true or false, yes or no).
A conditional inference tree consists of a root, branches, and leaves. The predictor with the strongest association with the outcome forms the root node of the tree. Classical decision trees (e.g., CART) rank predictors with impurity measures such as the Gini impurity index, which assess how well a predictor classifies individuals as having or not having the outcome: the smaller the impurity, the better the classification. Conditional inference trees instead select predictors with statistical tests of independence between each predictor and the outcome (for binary variables, tests of the chi-squared type), with adjustment for multiple testing. At each node, the predictor with the smallest p-value is chosen for the split, and splitting stops when no predictor is significantly associated with the outcome. For numerical predictors, the cut-off point at which the node is split is likewise chosen with a test statistic.
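To make the selection-and-stopping logic concrete, here is a simplified sketch of a single node. It assumes binary predictors and a binary outcome and substitutes a Pearson chi-squared test with a Bonferroni adjustment for the permutation-test statistics used by reference implementations such as `ctree()` in the R package partykit. It omits the split-point search and the recursion, and the function names and toy data are hypothetical.

```python
import math


def chi2_2x2(x, y):
    """Pearson chi-squared statistic and 1-df p-value for two 0/1 lists."""
    n = len(x)
    table = [[0, 0], [0, 0]]
    for xi, yi in zip(x, y):
        table[xi][yi] += 1
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row_sums or 0 in col_sums:
        return 0.0, 1.0  # a constant variable carries no information
    stat = sum(
        (table[i][j] - row_sums[i] * col_sums[j] / n) ** 2 / (row_sums[i] * col_sums[j] / n)
        for i in range(2)
        for j in range(2)
    )
    # Upper tail of a chi-squared(1) distribution: P(X > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def select_split_variable(predictors, outcome, alpha=0.05):
    """One node of a ctree-like procedure: Bonferroni-adjust the p-values of all
    predictors and split on the smallest one, or stop (return None) if none is
    significant at level alpha."""
    k = len(predictors)
    adjusted = {
        name: min(1.0, chi2_2x2(values, outcome)[1] * k)
        for name, values in predictors.items()
    }
    best = min(adjusted, key=adjusted.get)
    return (best if adjusted[best] < alpha else None), adjusted


# Invented toy data: the outcome is strongly associated with drug_1 only.
outcome = [1] * 10 + [0] * 10
predictors = {
    "drug_1": [1] * 9 + [0] * 1 + [0] * 8 + [1] * 2,
    "drug_2": [1, 0] * 10,
}
chosen, p_values = select_split_variable(predictors, outcome)
print(chosen)  # drug_1
```

Returning `None` mirrors the stopping rule described above: when no adjusted p-value falls below `alpha`, the node becomes a leaf.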
In this study, conditional inference trees were used to identify co-reported drugs that were disproportionally reported between cases and controls reporting the AE under investigation. Compared with the approach currently used in disproportionality analysis (i.e., stratification), this technique has the advantage that it can explore potential non-linear relationships between the predictors and the outcome. The rationale for using conditional inference trees is also epidemiological: when cases and controls are restricted to reports of the AE under investigation, a co-reported drug whose proportion differs significantly between cases and controls may be a potential confounder of the investigated signal.
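The epidemiological rationale can be illustrated with a plain two-proportion z-test, used here only as a stand-in for the tree: among reports of the AE, it compares how often a given drug is co-reported with galcanezumab versus the controls. The data and the drug name `drug_X` are invented, and this is not the authors' analysis, which relies on conditional inference trees.

```python
import math


def share_co_reporting(reports, drug):
    """Proportion of reports (each a set of co-reported drugs) that mention `drug`."""
    return sum(drug in report for report in reports) / len(reports)


def co_report_imbalance(case_reports, control_reports, drug):
    """Two-proportion z-test comparing how often `drug` is co-reported among
    cases and controls that all report the AE of interest."""
    n1, n2 = len(case_reports), len(control_reports)
    p1 = share_co_reporting(case_reports, drug)
    p2 = share_co_reporting(control_reports, drug)
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return p1, p2, 1.0  # drug never (or always) co-reported: nothing to test
    z = (p1 - p2) / se
    return p1, p2, math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value


# Invented example: among reports with the AE, "drug_X" is co-reported in
# 30% of galcanezumab reports but in only 10% of control reports.
cases = [{"drug_X"}] * 30 + [set()] * 70
controls = [{"drug_X"}] * 10 + [set()] * 90
p_case, p_control, p_value = co_report_imbalance(cases, controls, "drug_X")
print(p_case, p_control, p_value < 0.05)  # 0.3 0.1 True
```

A drug flagged this way is a candidate confounder: part of the disproportionality observed for galcanezumab could be driven by the co-reported drug rather than by galcanezumab itself.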
2.5 Framework Validation Using Simulated Data
The objective of the simulation was to assess the generalizability of conditional inference trees for identifying confounders among co-reported drugs in disproportionality signals. In three different scenarios, the method correctly identified the disproportionally co-reported drugs acting as confounders of the disproportionality signals.
Scenario 1: We simulated 1000 cases of a hypothetical AE, 500 for galcanezumab and 500 for a control, from a trivariate normal distribution [26] with the following marginals: age ~ N(40, 1), weight ~ N(75, 1), and number of co-reported drugs ~ N(2, 1). The correlation between age and weight was −0.8, the correlation between age and the number of co-reported drugs was 0.7, and the correlation between weight and the number of co-reported drugs was −0.9. We then generated three binary variables (co-reported drug 1, co-reported drug 2, and co-reported drug 3), with marginal probabilities of 0.25–0.75, 0.75–0.25, and 0.5–0.5, respectively, together with a binary variable named outcome. The co-reported drug variables were generated by thresholding a normal distribution; the threshold was the median of the numerical variable generated from the normal distribution, as shown in Electronic Supplementary Material (ESM) 1. The correlations among the components were specified as a correlation matrix of the binary distribution [27]. Co-reported drug 1 was correlated with co-reported drug 2 (r = 0.514), whereas co-reported drug 3 was essentially uncorrelated with co-reported drugs 1 and 2 (r = −0.020 and 0.048, respectively). The outcome variable was correlated with co-reported drug 1 (r = −0.628) and co-reported drug 2 (r = −0.370) but not with co-reported drug 3 (r = −0.026).
This setup represents a scenario in which co-reported drugs 1 and 2 are correlated with the outcome, as they are disproportionally co-reported between galcanezumab and control cases with the hypothetical AE. In addition, co-reported drugs 1 and 2 are correlated with each other and have different strengths of correlation with the outcome. We expected the framework to select co-reported drug 1 in the decision tree, as it is the co-reported drug with the strongest correlation with the outcome.
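A minimal sketch of how data like those of Scenario 1 could be generated follows. It reads N(40, 1), N(75, 1), and N(2, 1) as having a standard deviation of 1 and draws the continuous variables through a Cholesky factor of the stated correlation matrix. The binary variables are produced by thresholding correlated latent normal variables at cut-offs that match assumed probabilities; these probabilities and latent correlations are illustrative values, not the parameters behind ESM 1, and the helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def draw_mvn(means, sds, corr, n, rng):
    """n draws from a multivariate normal with the given means, SDs and correlations."""
    d = len(means)
    cov = [[corr[i][j] * sds[i] * sds[j] for j in range(d)] for i in range(d)]
    lower = cholesky(cov)
    draws = []
    for _ in range(n):
        e = [rng.gauss(0.0, 1.0) for _ in range(d)]
        draws.append([means[i] + sum(lower[i][k] * e[k] for k in range(i + 1)) for i in range(d)])
    return draws


def draw_binary(probs, latent_corr, n, rng):
    """Correlated 0/1 variables with P(variable = 1) = probs[i], obtained by
    thresholding latent standard normals. The resulting binary correlations
    are weaker than the latent ones in `latent_corr`."""
    cutoffs = [NormalDist().inv_cdf(1.0 - p) for p in probs]
    latent = draw_mvn([0.0] * len(probs), [1.0] * len(probs), latent_corr, n, rng)
    return [[int(z > c) for z, c in zip(row, cutoffs)] for row in latent]


rng = random.Random(2021)

# Age, weight, number of co-reported drugs: means and correlations from Scenario 1.
continuous = draw_mvn(
    means=[40.0, 75.0, 2.0],
    sds=[1.0, 1.0, 1.0],
    corr=[[1.0, -0.8, 0.7], [-0.8, 1.0, -0.9], [0.7, -0.9, 1.0]],
    n=1000,
    rng=rng,
)

# Co-reported drugs 1-3 and outcome: assumed P(=1) and latent correlations.
binary = draw_binary(
    probs=[0.25, 0.75, 0.5, 0.5],
    latent_corr=[
        [1.0, 0.7, 0.0, -0.8],
        [0.7, 1.0, 0.0, -0.5],
        [0.0, 0.0, 1.0, 0.0],
        [-0.8, -0.5, 0.0, 1.0],
    ],
    n=1000,
    rng=rng,
)
```

The binary pattern mimics Scenario 1 qualitatively: drugs 1 and 2 are positively correlated, both are negatively correlated with the outcome (drug 1 more strongly), and drug 3 is independent of everything else.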
Scenario 2: We used the same setting as in scenario 1 but reduced the sample size to 50 cases. This setup represents a scenario in which only a limited number of cases is available for disproportionality analysis. We expected the framework to select co-reported drug 1 in the decision tree, as it is the co-reported drug with the strongest correlation with the outcome.
Scenario 3: We used the same setting as in scenario 1, but we dropped co-reported drug 1 from the dataset and changed the marginal probabilities of co-reported drug 2 to obtain a 20% disproportionality between galcanezumab and control cases. This setup represents a scenario with a relatively small disproportionality between cases and controls for co-reported drug 2. We expected the framework to select co-reported drug 2 in the decision tree, as it is the co-reported drug with the strongest correlation with the outcome.
In all scenarios, we performed a descriptive analysis by presenting frequencies and percentages of the binary variables and by plotting the kernel density estimates of age, weight, and number of co-reported drugs, as well as the correlation matrix of the variables in the analytical datasets. Additionally, we fitted a conditional inference tree to identify co-reported drugs that were disproportionally reported between galcanezumab and the control reporting the AE under investigation.