Introduction
Myocardial strain allows for a quantitative measurement of myocardial deformation. Strain analysis using cardiovascular magnetic resonance (CMR) can be performed either by tissue tagging or by direct feature tracking (FT) on standard cine images. While CMR tagging has been validated and has evolved into various techniques (e.g., (fast) Strain Encoding Magnetic Resonance imaging (SENC) or displacement encoding with stimulated echoes (DENSE)) [1–4], it still has the drawback of requiring special sequences and additional scan time. In contrast, FT is a promising tool as it allows for the assessment of segmental and global strains in longitudinal, circumferential, and radial directions (LS, CS, RS, GLS, GCS, and GRS, respectively) from standard cine images, which are usually acquired during clinical routine [5]. Left ventricular (LV) strain analysis using CMR has been applied in a wide array of clinical diseases ranging from chemotherapy-induced cardiotoxicity [6, 7] to ischemic heart disease [8–10] and even non-ischemic heart diseases like hypertrophic cardiomyopathies (HCMs) [11, 12] or cases of acute myocarditis [13, 14]. Despite its wide utility and its power to detect myocardial changes even in states with preserved function, FT still lacks standardization and consensus about the methodological process. A previous study analyzed different factors that have the potential to influence strain values, such as the post-processing software used, slice selection, and 2D or 3D analysis [15]. Additionally, the time-consuming manual contouring process as well as the reader's level of expertise and training must be taken into consideration, as they also impact strain evaluation [16]. One potential approach to reduce the influence of manually derived contours is the use of artificial intelligence (AI)-derived contours. AI-based segmentation and strain evaluation have previously been applied and validated in a large cohort with commercially available software [17] as well as with non-commercial software [18]. These advances may further streamline strain assessment and help to reach a consensus about a standardized approach to feature tracking in clinical routine and "big data" studies.
Our study aimed to evaluate and compare manual and AI-based approaches to strain assessment by FT, both in terms of the quantitative strain metrics used in clinical routine and on the contour level, in healthy volunteers and patients with different cardiac diseases, in order to identify the strengths and weaknesses of these methods.
Materials and methods
The ethics review board approved all studies and all participants gave written informed consent.
Study population
In the healthy volunteer cohort, 67 subjects, retrospectively recruited from a previous study [15], were included. For the final analysis, 7 volunteers had to be excluded due to lack of a short-axis (SAX) stack covering the entire ventricle and one due to significant respiratory artifacts, which ultimately resulted in a healthy cohort consisting of 60 subjects. For the clinical validation, a cohort of 76 patients was assembled from previous studies [19–21]. It comprised cases of left ventricular hypertrophy (LVH) (n = 46, consisting of 8 patients with arterial hypertension (AHT), 24 with aortic stenosis (AS), and 14 with HCM) and cases of chronic myocardial infarction (CMI) (n = 30, consisting of 10 patients with preserved left ventricular ejection fraction (LVEF), no wall motion abnormalities (WMA), and focal fibrosis (CMI-F), 10 with reduced LVEF, regional WMA, and focal fibrosis (CMI-WF), and 10 with reduced LVEF with dilated LVs, global WMA, and focal fibrosis (CMI-EWF)).
Imaging protocol
CMR was performed on either a 1.5-T scanner (MAGNETOM Avanto−FIT, Siemens Healthineers) or a 3-T scanner (MAGNETOM Verio, Siemens Healthineers). Steady-state free precession-based cine images were acquired for 3 long-axis (LAX) views including a 2 chamber view (cv), a 3 cv, and a 4 cv as well as one SAX stack covering the entire left ventricle. Sequence parameters for the SAX stack at the 1.5-T scanner were as follows: time of repetition 2.8–3.31 ms, slice thickness 7 mm with no gap, flip angle 80°, echo time 1.2–1.44 ms, field of view 340–380 × 276–308.75 mm², matrix 192 × 156, voxel size 1.4–2.0 × 1.4–2.0, 30 cardiac phases; and for the 3-T scanner: time of repetition 3.1 ms, slice thickness 6 mm with no gap, flip angle 45°, echo time 1.3 ms, field of view 340 × 276 mm², matrix 192 × 156, voxel size 1.4 × 1.4, 30 cardiac phases.
Manual segmentations
Manual segmentation was performed with dedicated software (Circle CVI42 version 5.14.7, Circle Cardiovascular Imaging Inc.). Manual endo- and epicardial contours were drawn in end-diastole (ED), determined by the phase with the largest LV volume in SAX as well as in the 2 cv, 3 cv, and 4 cv. We were particularly attentive during segmentation to avoid contouring phases with the left ventricular outflow tract (LVOT) still visible in diastole and/or systole. Papillary muscles were not separately contoured, as recently published [15]. Reference points for the delineation of segments were manually placed at the subepicardial border at the anterior intersection of the left and right ventricle.
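The ED criterion described above (the phase with the largest LV volume) can be illustrated with a minimal Python sketch. The function name and the volume values are hypothetical and serve only as an illustration; this is not the logic of the software used in the study.

```python
def end_diastolic_phase(lv_volumes_ml):
    """Return the index of the cardiac phase with the largest LV volume.

    lv_volumes_ml: one LV volume (ml) per cardiac phase, in temporal order.
    """
    if not lv_volumes_ml:
        raise ValueError("at least one cardiac phase is required")
    return max(range(len(lv_volumes_ml)), key=lambda i: lv_volumes_ml[i])


# Hypothetical volume curve over 10 phases (illustrative values only).
volumes = [148.0, 150.5, 139.2, 110.7, 82.3, 64.9, 71.4, 98.6, 127.0, 144.1]
ed_index = end_diastolic_phase(volumes)
print(ed_index)
```

If several phases share the maximum volume, this sketch returns the earliest one; a different tie-breaking rule would be an equally valid assumption.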
AI-generated segmentations
Similarly to the manual strain assessment, AI contours were derived in ED. The AI segmented slices with visible LVOT using open LV endo- and epicardial contours, which were disregarded for strain analysis. Reference points were automatically set by the AI; however, each point was manually validated to obtain comparable segmental values. The AI segmentation algorithms employed in the Circle CVI42 software comprise different deep convolutional neural network models trained to perform SAX and LAX CMR image segmentation. A model architecture similar to that of the standard U-Net is adopted for this purpose, along with various data augmentation techniques to enhance the generalizability of the trained model. The model was trained on the UK Biobank data as well as on datasets that include patient data with pathological conditions including tetralogy, cardiomyopathy, and hypertension [22]. These models operate solely on image pixel data and image header information such as image dimensions and pixel spacing.
Strain assessment
After segmentation, an FT algorithm provided strain values. The algorithm uses myocardial points and tracks them along the cardiac cycle [23, 24]. On a quantitative level, the manual and the AI approach were compared for strain assessment in CS and RS retrieved from SAX and LS retrieved from LAX views. All strain values were derived for global as well as segmental values according to the 17-segment model of the American Heart Association (AHA) for CMR without the apical segment [25]. Correct FT was assessed by either mesh analysis or by tracking the myocardial points through the phases. Improper tracking was defined as mesh overlay or myocardial points not following the extent of the contours [15, 26]. To enable comparability between the segmentations regarding the strain analysis, we verified that the AI algorithm chose the proper ED phase.
Statistical analysis
All continuous variables are presented as mean and standard deviation (SD). Normal distribution was assessed visually using QQ plots. A mixed model was used to assess measurement differences segmentally and globally between the modalities for healthy volunteers and patients combined. In the mixed model, a global test was applied to test for any differences. In the case of a significant global test, pairwise comparisons were performed. Additionally, we tested whether a difference found between the AI and manual segmentations was homogeneous over all groups or whether a certain group showed major deviations.
Additionally, both segmentation approaches were compared using the "Lazy Luna" (LL) tool, which allows the similarity between the contours of an experienced reader and those of an AI to be assessed on the contour level via reproducibility validation metrics [27]. We chose the Dice similarity coefficient (DSC) and the Hausdorff distance (HD) to compare the agreement between the manual contours and the AI approach. DSC scores were calculated based on the myocardial class, which was derived from the intersection of the endo- and epicardial contours placed manually or by the AI. High DSC values signify a substantial overlap of the segmented areas, whereas low values indicate incongruences. The opposite holds true for the HD metric. In order to compare the placement of the insertion point, the LL tool additionally compared the manually placed and the AI-placed insertion points based on their angular difference relative to the left ventricular centroid. As some SAX acquisitions were acquired after contrast media application (post-CM), GRS and GCS as well as DSC and HD metrics were compared with acquisitions before contrast media application (pre-CM). Statistical analysis was performed using dedicated software (SPSS version 26, International Business Machines and SAS version 9.4, SAS Institute Inc.). The segmentation comparison tool "Lazy Luna" and the bulls-eye plots were created in Python (version 3.8, Python Software Foundation) [27].
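The two contour-level metrics can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Lazy Luna implementation: masks are represented as sets of pixel coordinates, distances are in pixel units rather than millimetres, and the helper `ring_mask` and its parameters are hypothetical stand-ins for a myocardial segmentation.

```python
import math


def dice(mask_a, mask_b):
    """Dice similarity coefficient of two masks given as sets of (row, col)."""
    if not mask_a and not mask_b:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))


def hausdorff(points_a, points_b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(points_a, points_b), directed(points_b, points_a))


def ring_mask(center, r_endo, r_epi, size=64):
    """Hypothetical myocardial mask: pixels between endo- and epicardial radius."""
    cy, cx = center
    return {
        (r, c)
        for r in range(size)
        for c in range(size)
        if r_endo <= math.hypot(r - cy, c - cx) < r_epi
    }


manual = ring_mask((32, 32), r_endo=10, r_epi=16)
ai = ring_mask((32, 33), r_endo=10, r_epi=16)  # same ring, shifted by one pixel

dsc = dice(manual, ai)
hd = hausdorff(manual, ai)
print(round(dsc, 3), hd)
```

A high DSC together with a low HD indicates good agreement. A small DSC change can still coincide with a large HD if a single contour point lies far off, which is one reason to report both.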
Discussion
The main results of our study are as follows: strain analysis by FT on cine images based on AI-derived contours is feasible and results in equivalent global and segmental strain values, with the exception of lateral segments in hypertrophied ventricles. This difference, however, is attributable to tracking errors, as the spatial overlap metric shows good agreement between the methods.
CMR has become the gold standard for LV and right ventricular volume and mass quantification [28] with a standardized approach for analysis and post-processing of images [29]. Yet, there are no consensus recommendations on how to quantify LV myocardial tissue dynamics and deformation by applying CMR. Therefore, we explored how strain assessment by FT can potentially become more standardized by involving AI-powered approaches. In our study, we could demonstrate that AI-generated contours for strain assessment by FT are reliable and result in equivalent global and segmental values. In a previous study by Ruijsink et al, DSC between manual and AI-based segmentation was 93% for the endocardial segmentation and 84% for the epicardial segmentation [18]. We found a similar DSC for the myocardial class in our study. Other segmentation algorithms, which were tested on the "Automatic Cardiac Diagnosis Challenge" (ACDC) dataset, achieved DSC scores of up to 96% for the LV [30]. These scores are higher than the one presented here; however, the underlying dataset in the challenge is an important factor. The ACDC dataset included pathologies similar to those in this study, such as HCM, CMI, and dilated cardiomyopathies; however, whether scans were carried out after contrast administration was not clearly stated. Images obtained after contrast media application pose an additional challenge as myocardial fibrosis might be mistaken for the blood pool by AI algorithms as well as by human readers. In this study, we found the overall lowest segmentation overlap, indicated by DSC and HD metrics, in the CMI group. As all scans in the CMI group were carried out after contrast administration and each case included at least one focal fibrosis, these scans posed a challenge for the algorithm. This was also evidenced by the lower DSC and HD in the comparison of pre-CM and post-CM images. In the subset of LVH, we found higher DSC scores in comparison to the healthy and CMI cohorts. We believe that this paradox can be explained by the larger LVM in the LVH cohort, as the overall differences in segmentations are divided by a larger area. When considering the HD metric, the healthy cohort showed the lowest value regarding the SAX segmentations.
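The area dependence of the DSC can be made concrete with an idealized Python sketch. The assumptions are strong and stated plainly: the myocardium is modeled as a concentric ring, the second segmentation is the first one with both contours moved outward by the same offset, and all radii are hypothetical.

```python
import math


def ring_area(r_inner, r_outer):
    """Area of a ring between two concentric circles (mm^2)."""
    return math.pi * (r_outer ** 2 - r_inner ** 2)


def dsc_offset_rings(r_endo, r_epi, offset):
    """DSC of a ring and the same ring with both radii enlarged by `offset`.

    Valid for 0 <= offset < (r_epi - r_endo), so that the rings still overlap.
    """
    area_a = ring_area(r_endo, r_epi)
    area_b = ring_area(r_endo + offset, r_epi + offset)
    overlap = ring_area(r_endo + offset, r_epi)
    return 2.0 * overlap / (area_a + area_b)


# Same 1 mm contour disagreement, two hypothetical wall thicknesses.
dsc_normal = dsc_offset_rings(r_endo=25.0, r_epi=33.0, offset=1.0)       # 8 mm wall
dsc_hypertrophy = dsc_offset_rings(r_endo=25.0, r_epi=41.0, offset=1.0)  # 16 mm wall
print(round(dsc_normal, 4), round(dsc_hypertrophy, 4))
```

With an identical 1 mm disagreement, the thicker wall yields the higher DSC in this toy model, which is consistent with the explanation offered above for the LVH cohort.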
When segmental values, which are defined according to the AHA model, are compared, the insertion point has to be taken into consideration. In order to verify the proper insertion, visual analysis can be carried out; however, for large datasets, this is tedious. We therefore propose a comparison based on angular differences as outlined in the methods. The highest angular differences (8.7°) were seen in the CMI-F cohort. Comparing this to the general division of the AHA segments (60°), we feel that these are negligible; however, the relevance of angular differences and their impact on AHA segments ought to be investigated further.
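The angular comparison of insertion points can be sketched as follows. This is a minimal illustration, assuming 2D image coordinates and an absolute angle wrapped to the range 0–180°; the function name and the example coordinates are hypothetical and do not reproduce the Lazy Luna code.

```python
import math


def insertion_angle_difference_deg(centroid, manual_point, ai_point):
    """Absolute angle (degrees) between two insertion points seen from the centroid."""
    angle_manual = math.atan2(manual_point[1] - centroid[1],
                              manual_point[0] - centroid[0])
    angle_ai = math.atan2(ai_point[1] - centroid[1],
                          ai_point[0] - centroid[0])
    diff = math.degrees(angle_ai - angle_manual)
    # Wrap to [-180, 180) so that 359 degrees counts as 1 degree.
    return abs((diff + 180.0) % 360.0 - 180.0)


centroid = (0.0, 0.0)
manual_point = (30.0, 0.0)
theta = math.radians(8.7)  # reproduces the largest mean difference reported above
ai_point = (30.0 * math.cos(theta), 30.0 * math.sin(theta))

difference = insertion_angle_difference_deg(centroid, manual_point, ai_point)
fraction_of_segment = difference / 60.0  # relative to a 60-degree AHA segment
print(round(difference, 1), round(fraction_of_segment, 3))
```

Expressed this way, an 8.7° difference corresponds to roughly 15% of a 60° segment, which gives a sense of how much myocardium could be reassigned to a neighboring segment.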
Interestingly, the study by Ruijsink et al used 3 SAX slices as well as 2cv and 4cv for strain assessment with diastolic contours. This might potentially impact the strain analysis of CS and RS values [15]. Previous studies reported normal values for FT on the full SAX coverage [5, 31]. To additionally achieve a more streamlined post-processing of CMR images, AI algorithms that place contours in ED and ES for functional assessment can "recycle" these contours for strain assessment by FT. Further studies employing this approach, potentially in clinical routine scenarios, are needed to verify whether it is feasible in a real-world setting. We analyzed not only a healthy population but also one with different clinical entities, which were chosen to pose a challenge for the segmentation (post-contrast cines with fibrosis), for the FT algorithm (wall motion abnormalities), or for both (LVH subgroup, CMI-EWF). In addition, we compared all cases 1:1 and not only selected cases, which minimizes the possibility of errors. The LVH group, especially AS and HCM cases, was challenging for the AI on a contour and tracking level. Studies about AS and CMR FT have been previously published but none covered AI-derived contours or analysis of tracking errors [32–37]. As the majority of tracking issues in the LVH group were related to the anterolateral and inferolateral segments, papillary muscle hypertrophy could have played a role [38, 39]. As the FT algorithm relies on recognition of voxel-based features, which are in general sparse or even absent in the normal myocardium [40], hypertrophy of the myocardial tissue can further deteriorate tracking. We found the highest CS and LS as well as the lowest RS values in the abovementioned segments in the healthy cohort. Andre et al presented similar results, with the highest variance in these segments. Potentially, the movement of the papillary muscle can have an impact on the values in these segments even in non-pathologic states, which would be further exacerbated by hypertrophy of the myocardium and its appendages. A previous study found statistically significant differences between healthy male and female volunteers in AHA segment 5 of LS analysis [15]. Another study reported a significantly larger papillary muscle mass in males compared to females [41], which might explain the previous findings and the ones presented here. In concordance with the identified segments that pose a challenge for the FT algorithm, the AI-derived strain values show the only significant differences compared to manual contours in these segments. Interestingly, the differences between the methods were due to strain values for AS and HCM cases (Supplementary Material 2). These differences, however, do not arise on the contour level, as evidenced by the DSC. Potential influencing factors might relate to the LAX extent or the acquisition itself. In comparison to the other pathologies, we frequently found the 3cv slice location in this group at a narrower angle. This finding might be related to a prominent and dilated ascending aortic root in AS patients impairing proper 3cv acquisitions [42, 43].
The intersegmental differences are a drawback for FT-derived strain values. In a head-to-head comparison between fast-SENC, tagging, and FT, all techniques had good reproducibility; however, in a segmental inter-study comparison, FT showed the lowest agreement [3]. This was confirmed by other studies reporting a rather large variation across segments, rendering comparisons rather unfeasible and additionally demonstrating that segmental analysis with FT is complex and its clinical implications uncertain [31, 44]. A potential solution might be the use of regional instead of segmental values [45]. In contrast to FT, other techniques such as DENSE have a higher reproducibility of segmental strain values [46, 47]. In addition, segmental strain values provided by DENSE have been shown to carry a prognostic implication in patients after an acute myocardial infarction [48]. Regarding the segmental approach, SENC-derived strain values similarly show a better intersegmental agreement [3, 49]. The fast SENC technique seems highly reproducible, even across different sites [50, 51]. Clinical application of this technique has shown clinical merit; however, more research is needed regarding segmental values [52]. In general, all strain values derived from commercially available software are potentially limited in their comparability as new versions may yield different values; hence, reporting the software version used is of great importance.
Lastly, we want to comment on the effect of contrast media application on FT-derived strain values. On the one hand, we noticed that the segmentation becomes more challenging for the AI algorithm; on the other hand, post-CM strain values are lower. This is in line with previous literature [53]. As SAX acquisitions are now mostly acquired after contrast media application, segmentation challenges and AI segmentation networks have to take this into consideration.
Limitations
This is a single-center study with a limited number of cases but reflects different disease entities. This limits potential statistical power in the detection of significant differences in the pairwise comparisons. Another limitation is that we did not compare cardiac contours along the cardiac cycle in order to depict the proper tracking of the FT algorithm. Furthermore, we want to point out that we did not use another vendor, which reduces generalizability.