Background
Methods
Setting and participants
Instruments
Data analysis
Psychometric property | Definition/criteria for acceptability |
---|---|
Acceptability | Assessed by data quality and targeting. Data quality refers to the completeness of item- and scale-level data; the criterion is missing data <10% [20]. Targeting is the extent to which the range of the variable measured by a scale matches the range of that variable in the study sample. Assessed by: maximum endorsement frequencies <80% [17], aggregate endorsement frequenciesa >10% [17], skewness statistic −1 to +1 [35–37], proximity of scale mean score to scale midpointb (no fixed criterion but closer matches indicate better targeting) [38], and acceptable distribution of ACTS Burdens scoresc (no fixed criterion but closer to 100% indicates better targeting) [39] |
Scaling assumptions | Tests of scaling assumptions assess the extent to which it is legitimate to sum a set of items, without weighting or standardisation, to produce a single total score. This criterion is satisfied when items have adequate corrected item-total correlations (≥0.30) [38, 40] and the proposed grouping of items in each subscale is correct. Assessed using two complementary approaches: principal components analysis (factor loadings >0.30, cross-loadings <0.20) and item convergent and discriminant validity (item own-scale correlations >0.30 and exceeding correlations with other scales by >2 standard errors) |
Reliability | Reliability is the extent to which scale scores are not associated with random error |
Internal consistency reliability | |
Test-retest reproducibility | This is based on the agreement between respondents' scores at screening and baseline, and estimates the ability of components and scales to produce stable scores [34]. For adequate test-retest reproducibility, scale-level intraclass correlation coefficients ≥0.80 [40] and item-level intraclass correlation coefficients ≥0.50 [43] should be achieved |
Validity | Validity is the extent to which a scale measures the construct that it is intended to measure |
Validity (within scale) | Evidence that a scale measures a single construct, and that items can be combined to form a summary score. Assessed on the basis of internal consistency reliability (Cronbach’s alpha ≥ 0.80) and factor analysis (factor loadings >0.30, cross-loadings < 0.20) |
Validity (correlations between scales) | Correlations between ACTS scales: moderate correlations (0.30–0.70) expected. Correlations between TSQM IId[44] and ACTS scales: low correlations (<0.30) expected between TSQM II Effectiveness and ACTS Burdens/ACTS Benefits; low correlations (<0.30) expected between TSQM II Side-effects and ACTS Benefits; moderate correlations (0.30–0.70) expected between TSQM II Side-effects and ACTS Burdens; moderate correlations (0.30–0.70) expected between TSQM II Convenience and ACTS Burdens/ACTS Benefits; moderate correlations (0.30–0.70) expected between TSQM II Global Satisfaction and ACTS Burdens/ACTS Benefits |
Discriminant validity | Evidence that a scale is not correlated with other measures of different constructs. Assessed on the basis of correlations between the ACTS and age and gender; low correlations (<0.30) expected between ACTS scores and age and gender |
Known-groups validity/hypothesis testing | Ability of a scale to detect hypothesised differences between known subgroups. Assessed by testing the hypothesis that known groups defined on the basis of high vs low ACTS global scores for: i) Burdens (Q13) and ii) Benefits (Q17) will differ significantly (in the expected direction) on ACTS Burdens and Benefits scale scores; based on ANOVA (p<0.05) |
Responsiveness | The ability of the ACTS Burdens and Benefits scales to detect significant change over time. Assessed by examining scores at two or more time points of surgery and calculating an effect size statistic: the mean change in scores from time point 1 to time point 2 divided by the standard deviation of the time point 1 scores [44]. Clinically, moderate effect sizes that increase over time would be expected, reflecting improved treatment satisfaction. Effect sizes were interpreted as follows: 0.20 (small change), 0.50 (moderate change) and >0.80 (large change) [45] |
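The effect size statistic described in the Responsiveness row can be illustrated with a minimal Python sketch. The function names, the direction of the change score (time 2 minus time 1), the use of the sample standard deviation, and the treatment of values between the quoted benchmarks are assumptions made for illustration; the scores are hypothetical and are not study data.

```python
from statistics import mean, stdev

def effect_size(time1, time2):
    """Mean change (time 2 minus time 1) divided by the SD of the time-1 scores."""
    if len(time1) != len(time2):
        raise ValueError("paired scores are required")
    changes = [after - before for before, after in zip(time1, time2)]
    return mean(changes) / stdev(time1)

def interpret(es):
    """Label magnitude with the 0.20 / 0.50 / >0.80 benchmarks.

    How values between the benchmarks are banded is an assumption.
    """
    size = abs(es)
    if size > 0.80:
        return "large"
    if size >= 0.50:
        return "moderate"
    if size >= 0.20:
        return "small"
    return "below small"

# Hypothetical scale scores for five respondents (not study data).
baseline = [40, 44, 48, 52, 56]
follow_up = [43, 47, 51, 55, 59]

es = effect_size(baseline, follow_up)
print(f"effect size = {es:.2f} ({interpret(es)})")
```

Because the denominator is the baseline standard deviation rather than the standard deviation of the change scores, the statistic expresses change in units of between-person variability at time point 1.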
Results
Sample
Psychometric properties: scale level by study/language version (Dutch, Italian, French, German, English)
Acceptability: data quality and targeting
Acceptability | Targeting | Reliability | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
Scale | Item missing data (%)a | Possible range (midpoint) | Actual score range | Mean score (SD) | Floor/ceiling effects (%)b | Skewness | Cronbach’s alpha | Test-retestc | Mean IICd | Range IIC |
EINSTEIN DVT dataset | ||||||||||
Dutch version (n=332) | ||||||||||
ACTS Burdens | 4 | 12–60 (36) | 34–60 | 53.4 (5.45) | 0/10 | −0.99 | 0.79 | 0.95 | 0.24 | 0.03–0.69 |
ACTS Benefits | 1 | 3–15 (9) | 3–15 | 10.5 (2.20) | 1/3 | −0.77 | 0.80 | 0.72 | 0.56 | 0.46–0.66 |
Italian version (n=217) | ||||||||||
ACTS Burdens | 4 | 12–60 (36) | 33–60 | 52.4 (6.26) | 0/12 | −0.78 | 0.90 | 0.94 | 0.43 | 0.25–0.83 |
ACTS Benefits | 1 | 3–15 (9) | 3–15 | 10.7 (2.15) | 1/8 | 0.03 | 0.90 | 0.94 | 0.75 | 0.68–0.79 |
French version (n=222) | ||||||||||
ACTS Burdens | 4 | 12–60 (36) | 33–60 | 55.6 (4.97) | 0/14 | −1.35 | 0.89 | 0.95 | 0.40 | 0.23–0.70 |
ACTS Benefits | 3 | 3–15 (9) | 3–15 | 11.5 (2.40) | 2/12 | −0.89 | 0.86 | 0.76 | 0.68 | 0.60–0.72 |
German version (n=243) | ||||||||||
ACTS Burdens | 5 | 12–60 (36) | 34–60 | 52.0 (5.91) | 0/7 | −0.86 | 0.84 | 0.98 | 0.29 | 0.04–0.86 |
ACTS Benefits | 1 | 3–15 (9) | 3–15 | 12.2 (2.15) | 2/12 | −1.97 | 0.82 | 0.91 | 0.60 | 0.56–0.67 |
English version (n=322)e | ||||||||||
ACTS Burdens | 0 | 12–60 (36) | 23–60 | 51.6 (6.78) | 0/7 | −1.12 | 0.86 | 0.98 | 0.36 | 0.12–0.67 |
ACTS Benefits | 0 | 3–15 (9) | 3–15 | 11.4 (2.50) | 1/14 | −0.62 | 0.87 | 0.91 | 0.70 | 0.59–0.74 |
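Two of the targeting criteria (skewness between −1 and +1, and proximity of the scale mean to the scale midpoint) can be checked mechanically against the values in the table above. The following Python sketch is only an illustration: the means, skewness values and midpoints are copied from the table, while the variable and function names are hypothetical and the check does not represent the authors' own interpretation.

```python
# Scale midpoints as given in the "Possible range (midpoint)" column.
MIDPOINT = {"ACTS Burdens": 36, "ACTS Benefits": 9}

# (language version, scale, mean score, skewness) copied from the table above.
rows = [
    ("Dutch", "ACTS Burdens", 53.4, -0.99),
    ("Dutch", "ACTS Benefits", 10.5, -0.77),
    ("Italian", "ACTS Burdens", 52.4, -0.78),
    ("Italian", "ACTS Benefits", 10.7, 0.03),
    ("French", "ACTS Burdens", 55.6, -1.35),
    ("French", "ACTS Benefits", 11.5, -0.89),
    ("German", "ACTS Burdens", 52.0, -0.86),
    ("German", "ACTS Benefits", 12.2, -1.97),
    ("English", "ACTS Burdens", 51.6, -1.12),
    ("English", "ACTS Benefits", 11.4, -0.62),
]

def skew_ok(skewness):
    """Targeting criterion from the Methods table: skewness within -1 to +1."""
    return -1.0 <= skewness <= 1.0

# Scales whose skewness falls outside the -1 to +1 range.
outside = [(lang, scale) for lang, scale, _, skew in rows if not skew_ok(skew)]

# Distance of each mean from the scale midpoint (no fixed criterion applies).
offsets = {(lang, scale): round(m - MIDPOINT[scale], 1) for lang, scale, m, _ in rows}

for lang, scale in outside:
    print(f"{lang} {scale}: skewness outside -1 to +1")
```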
Reliability: internal consistency, test-retest and homogeneity coefficients
Psychometric properties: item and scale level by combined language versions (EINSTEIN DVT pooled language datasets)
Acceptability: data quality and targeting
Data quality | Scaling assumptions | Targeting | Reliability | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Item/scale | Item missing data (%)a | Possible range (midpoint) | Actual score range | Mean score | SD | CITC | Floor/ceiling effects (%)b | Skewness | Cronbach’s alpha | Test-retestc | Mean IICd | Range IIC |
Burden items | ||||||||||||
Q1 Bleeding/vigorous activities | 1 | 1–5 | 1–5 | 4.26 | 1.03 | 0.50 | 3/57 | −1.41 | - | - | - | - |
Q2 Bleeding/usual activities | 1 | 1–5 | 1–5 | 4.47 | 0.86 | 0.50 | 1/65 | −1.79 | - | - | - | - |
Q3 Bruising | 0 | 1–5 | 1–5 | 4.39 | 0.87 | 0.45 | 1/58 | −1.49 | - | - | - | - |
Q4 Avoid other medicines | 0 | 1–5 | 1–5 | 4.54 | 0.80 | 0.39 | 1/69 | −1.84 | - | - | - | - |
Q5 Limit eat/drink | 0 | 1–5 | 1–5 | 4.55 | 0.78 | 0.46 | 1/69 | −1.87 | - | - | - | - |
Q6 Hassle/daily | 0 | 1–5 | 1–5 | 4.41 | 0.78 | 0.60 | 0/56 | −1.22 | - | - | - | - |
Q7 Hassle/occasional | 1 | 1–5 | 1–5 | 4.02 | 0.98 | 0.57 | 2/38 | −0.85 | - | - | - | - |
Q8 Difficult to follow your ACT | 0 | 1–5 | 1–5 | 4.71 | 0.58 | 0.52 | 0/77 | −2.19 | - | - | - | - |
Q9 Time-consuming ACT | 0 | 1–5 | 1–5 | 4.52 | 0.69 | 0.43 | 0/62 | −1.40 | - | - | - | - |
Q10 Worry about ACT | 1 | 1–5 | 1–5 | 4.07 | 0.91 | 0.56 | 1/37 | −0.84 | - | - | - | - |
Q11 Frustrating ACT | 0 | 1–5 | 1–5 | 4.45 | 0.87 | 0.61 | 1/64 | −1.76 | - | - | - | - |
Q12 Burden of ACT | 1 | 1–5 | 1–5 | 4.49 | 0.75 | 0.65 | 0/62 | −1.52 | - | - | - | - |
ACTS Burdens scale | 4 | 12–60 (36) | 23–60 | 52.9 | 6.1 | - | 0/11 | −1.08 | 0.85 | 0.97 | 0.32 | 0.13–0.66 |
Benefit items | ||||||||||||
Q14 Confident in ACT | 1 | 1–5 | 1–5 | 3.77 | 0.91 | 0.71 | 3/19 | −0.88 | - | - | - | - |
Q15 Reassured by ACT | 1 | 1–5 | 1–5 | 3.71 | 0.87 | 0.80 | 3/15 | −0.84 | - | - | - | - |
Q16 Satisfied with ACT | 0 | 1–5 | 1–5 | 3.79 | 0.90 | 0.69 | 3/19 | −0.92 | - | - | - | - |
ACTS Benefits scale | 1 | 3–15 (9) | 3–15 | 11.3 | 2.4 | - | 1/10 | −0.80 | 0.86 | 0.86 | 0.67 | 0.59–0.73 |
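The CITC and Cronbach's alpha columns above correspond to standard statistics that can be sketched in a few lines of Python. This is a minimal illustration using the textbook formulas and hypothetical responses (not study data); the function names are assumptions, and the authors' software and exact computational choices are not stated in this section.

```python
from math import sqrt
from statistics import mean, variance

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ssx * ssy)

def cronbach_alpha(items):
    """items: one list of responses per item, respondents in the same order."""
    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

def corrected_item_total(items, index):
    """Correlate one item with the sum of the remaining items in its scale."""
    rest = [sum(resp) - resp[index] for resp in zip(*items)]
    return pearson(items[index], rest)

# Hypothetical 1-5 responses from six respondents to a three-item scale.
items = [
    [5, 4, 4, 3, 2, 1],
    [5, 5, 4, 3, 2, 2],
    [4, 4, 5, 2, 3, 1],
]

alpha = cronbach_alpha(items)
citc = [corrected_item_total(items, i) for i in range(len(items))]
print(f"alpha = {alpha:.2f}; CITC = {[round(c, 2) for c in citc]}")
```

The correction in the item-total correlation removes the item from its own total, so an item is not credited for correlating with itself.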
Psychometric properties: scaling assumptions
Item (EINSTEIN DVT, N=1336) | Component 1 | Component 2 |
---|---|---|
ACTS Burdens | | |
Q1 Bleeding/vigorous activities | 0.58 | −0.06 |
Q2 Bleeding/usual activities | 0.57 | 0.03 |
Q3 Bruising | 0.54 | 0.06 |
Q4 Avoid other medicines | 0.48 | 0.07 |
Q5 Limit eat/drink | 0.55 | 0.02 |
Q6 Hassle/daily | 0.69 | 0.10 |
Q7 Hassle/occasional | 0.66 | 0.08 |
Q8 Difficult to follow your ACT | 0.62 | 0.11 |
Q9 Time-consuming ACT | 0.53 | 0.12 |
Q10 Worry about ACT | 0.65 | 0.11 |
Q11 Frustrating ACT | 0.71 | 0.09 |
Q12 Burden of ACT | 0.74 | 0.09 |
ACTS Benefits | | |
Q14 Confident in ACT | 0.03 | 0.88 |
Q15 Reassured by ACT | 0.07 | 0.91 |
Q16 Satisfied with ACT | 0.19 | 0.83 |
Item (EINSTEIN DVT, N=1336) | ACTS Burdens scale | ACTS Benefits scale |
---|---|---|
ACTS Burdens | | |
Q1 Bleeding/vigorous activities | 0.50 | 0.06 |
Q2 Bleeding/usual activities | 0.50 | 0.14 |
Q3 Bruising | 0.45 | 0.13 |
Q4 Avoid other medicines | 0.39 | 0.12 |
Q5 Limit eat/drink | 0.46 | 0.10 |
Q6 Hassle/daily | 0.60 | 0.17 |
Q7 Hassle/occasional | 0.57 | 0.15 |
Q8 Difficult to follow your ACT | 0.52 | 0.16 |
Q9 Time-consuming ACT | 0.43 | 0.15 |
Q10 Worry about ACT | 0.56 | 0.18 |
Q11 Frustrating ACT | 0.61 | 0.17 |
Q12 Burden of ACT | 0.65 | 0.16 |
ACTS Benefits | | |
Q14 Confident in ACT | 0.13 | 0.71 |
Q15 Reassured by ACT | 0.17 | 0.80 |
Q16 Satisfied with ACT | 0.26 | 0.69 |
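The item convergent and discriminant validity criterion (own-scale correlation >0.30 and exceeding the other-scale correlation by more than 2 standard errors) can be applied to the correlations in the table above. In the Python sketch below, the correlations and N are copied from the table; approximating the standard error of a correlation as 1/√N is an assumption, since the section does not state how the standard error was obtained, and the names are hypothetical.

```python
from math import sqrt

N = 1336  # pooled EINSTEIN DVT sample size from the table header

# item: (own-scale correlation, other-scale correlation), copied from the table.
items = {
    "Q1": (0.50, 0.06), "Q2": (0.50, 0.14), "Q3": (0.45, 0.13),
    "Q4": (0.39, 0.12), "Q5": (0.46, 0.10), "Q6": (0.60, 0.17),
    "Q7": (0.57, 0.15), "Q8": (0.52, 0.16), "Q9": (0.43, 0.15),
    "Q10": (0.56, 0.18), "Q11": (0.61, 0.17), "Q12": (0.65, 0.16),
    "Q14": (0.71, 0.13), "Q15": (0.80, 0.17), "Q16": (0.69, 0.26),
}

def meets_criterion(own, other, n):
    """Own-scale r > 0.30 and larger than the other-scale r by > 2 SE."""
    se = 1 / sqrt(n)  # assumed approximation to the SE of a correlation
    return own > 0.30 and (own - other) > 2 * se

failures = [q for q, (own, other) in items.items()
            if not meets_criterion(own, other, N)]
print("items failing the criterion:", failures or "none")
```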
Psychometric properties: internal consistency reliability
Psychometric properties: validity
Scale | ACTS Burdens | ACTS Benefits | TSQEFF | TSQSIDE | TSQCON |
---|---|---|---|---|---|
ACTS Burdens | 1.00 | | | | |
ACTS Benefits | 0.33 | 1.00 | | | |
TSQEFF | 0.16 | 0.18 | 1.00 | | |
TSQSIDE | 0.35 | 0.14 | 0.29 | 1.00 | |
TSQCON | 0.32 | 0.24 | 0.53 | 0.29 | 1.00 |
TSQGLO | 0.32 | 0.27 | 0.53 | 0.36 | 0.74 |
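The observed correlations above can be compared with the hypothesised bands from the Methods table (low <0.30, moderate 0.30–0.70). The Python sketch below is a mechanical comparison only and is not the authors' interpretation. It assumes that TSQEFF, TSQSIDE, TSQCON and TSQGLO denote the TSQM II Effectiveness, Side-effects, Convenience and Global Satisfaction scales; the "high" label for values above 0.70 is not named in the source.

```python
# Hypothesised bands taken from the Methods table.
EXPECTED = {
    ("ACTS Benefits", "ACTS Burdens"): "moderate",
    ("TSQEFF", "ACTS Burdens"): "low",
    ("TSQEFF", "ACTS Benefits"): "low",
    ("TSQSIDE", "ACTS Burdens"): "moderate",
    ("TSQSIDE", "ACTS Benefits"): "low",
    ("TSQCON", "ACTS Burdens"): "moderate",
    ("TSQCON", "ACTS Benefits"): "moderate",
    ("TSQGLO", "ACTS Burdens"): "moderate",
    ("TSQGLO", "ACTS Benefits"): "moderate",
}

# Observed correlations copied from the table above.
OBSERVED = {
    ("ACTS Benefits", "ACTS Burdens"): 0.33,
    ("TSQEFF", "ACTS Burdens"): 0.16,
    ("TSQEFF", "ACTS Benefits"): 0.18,
    ("TSQSIDE", "ACTS Burdens"): 0.35,
    ("TSQSIDE", "ACTS Benefits"): 0.14,
    ("TSQCON", "ACTS Burdens"): 0.32,
    ("TSQCON", "ACTS Benefits"): 0.24,
    ("TSQGLO", "ACTS Burdens"): 0.32,
    ("TSQGLO", "ACTS Benefits"): 0.27,
}

def band(r):
    """Classify a correlation by absolute size."""
    size = abs(r)
    if size < 0.30:
        return "low"
    if size <= 0.70:
        return "moderate"
    return "high"  # label for >0.70 is an assumption, not from the source

# Pairs whose observed band differs from the hypothesised band.
mismatches = [pair for pair, expected in EXPECTED.items()
              if band(OBSERVED[pair]) != expected]
for pair in mismatches:
    print(pair, OBSERVED[pair], "observed:", band(OBSERVED[pair]),
          "expected:", EXPECTED[pair])
```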
Psychometric properties: responsiveness
Pair | Comparison | Mean | SD | t | P | ES |
---|---|---|---|---|---|---|
Pair 1 | ACTS Burdens day 15 to ACTS Burdens month 1 | −0.84 | 3.98 | 7.30 | <0.001 | −0.14 |
Pair 2 | ACTS Burdens day 15 to ACTS Burdens month 2 | −1.25 | 4.75 | 8.95 | <0.001 | −0.20 |
Pair 3 | ACTS Burdens day 15 to ACTS Burdens month 3 | −1.35 | 4.65 | 9.80 | <0.001 | −0.22 |
Pair 4 | ACTS Burdens day 15 to ACTS Burdens month 6 | −1.65 | 4.88 | 10.39 | <0.001 | −0.27 |
Pair 5 | ACTS Burdens day 15 to ACTS Burdens month 12 | −2.00 | 4.40 | 5.78 | <0.001 | −0.37 |
Pair 1 | ACTS Benefits day 15 to ACTS Benefits month 1 | −0.06 | 2.22 | −1.03 | 0.304 | −0.03 |
Pair 2 | ACTS Benefits day 15 to ACTS Benefits month 2 | −0.16 | 2.45 | −2.30 | 0.021 | −0.07 |
Pair 3 | ACTS Benefits day 15 to ACTS Benefits month 3 | −0.18 | 2.46 | −2.58 | 0.010 | −0.08 |
Pair 4 | ACTS Benefits day 15 to ACTS Benefits month 6 | −0.29 | 2.49 | −3.71 | <0.001 | −0.12 |
Pair 5 | ACTS Benefits day 15 to ACTS Benefits month 12 | −0.77 | 2.60 | −3.84 | <0.001 | −0.33 |
Discussion
Measurement property | Test | Methods used in testing the ACTS |
---|---|---|
Reliability | Test-retest | ✓ |
Internal consistency | Whether the items in a domain are intercorrelated, as evidenced by an internal consistency statistic (e.g., coefficient alpha) | ✓ |
Inter-interviewer reproducibility (for interviewer-administered PROs only) | Agreement between responses when the PRO is administered by two or more different interviewers | NA |
Validity | Content-related | ✓a |
Ability to measure the concept (also known as construct-related validity; can include tests for discriminant, convergent and known-groups validity) | Whether relationships among items, domains and concepts conform to what is predicted by the conceptual framework for the PRO instrument itself and its validation hypotheses | ✓ |
Ability to predict future outcomes (also known as predictive validity) | Whether future events or status can be predicted by changes in the PRO scores | x |
Ability to detect change | Includes calculations of effect size and standard error of measurement among others | ✓ |
Interpretability | Smallest difference that is considered to be clinically important; this can be a specified difference (the minimum important difference) or, in some cases, any detectable difference. The minimum important difference is used as a benchmark to interpret mean score differences between treatment arms in a clinical trial | ✓b |
Responder definition – used to identify responders in clinical trials for analysing differences in the proportion of responders between treatment arms | Change in score that would be clear evidence that an individual patient experienced a treatment benefit. Can be based on experience with the measure using a distribution-based approach, a clinical or non-clinical anchor, an empirical rule, or a combination of approaches | NA |