Introduction
The histopathological examination of tissue is the cornerstone of cancer diagnosis globally. It is based on the staining of tissue samples with histochemical dyes, such as haematoxylin and eosin (H&E), to highlight cellular components for visual interpretation by pathologists. This process has not changed for over a century, and it is well understood that there are variations in the method [1–5]. Staining variation is widely seen in clinical pathology practice, both within and between laboratories [5–7]. Although staining variation is not often highlighted as a clinical risk, detailed evidence in this area is lacking. Professional guidelines and laboratory practice emphasise the need to maintain stain quality and reduce variation through internal and external quality assessment, but routine quantitative assessment of H&E staining has to date been unachievable [8–10].
The need to quantify and control stain quality is given greater impetus by the increasing use of digital pathology. This is the process of scanning a glass pathology slide with a whole slide imaging system to produce a digital image. The technology has been promoted and adopted as it has the potential to improve workflow and quality in pathology services [11–13]. Its utilisation is growing due to the increasing maturity of whole slide imaging systems, displays, data handling and storage, a significant clinical need for pathology services globally, and the use of artificial intelligence (AI) to augment human diagnosis [10, 14, 15].
Image quality, specifically colour, is an important parameter for AI as differences in colour are used to set thresholds to detect objects and patterns, meaning variation in the stained colour of tissue can affect AI algorithm performance. An increasing number of papers highlight the importance of colour stability for AI [5–7, 16–19]. To help mitigate the effect of stain variation, computer-assisted methods can be employed, such as stain normalisation, the digital normalisation of an image’s colour, and data augmentation, where computer-simulated images with variable staining are introduced to training datasets to improve AI robustness [20–23]. Comparisons of AI accuracy before and after stain normalisation have shown significant improvements in AI performance [19, 23–27]. Examples include improving colorectal cancer classification and prostate cancer detection accuracy by 20% and 9% respectively [26, 27]. Other work has found that prostate cancer classification performance suffered when using images from different institutions and scanners, and that application of stain normalisation to a variable-quality dataset improved AI performance by 5% [24]. Inter-institutional staining characteristics can be distinguishable by AI and have the potential to bias accuracy, even with application of stain normalisation [7]. Importantly, a recent study also found that stain normalisation significantly improved pathologist perception of stain colour quality, diagnostic confidence, and time to diagnosis [28]. However, the same study found that normalisation reduced inter-pathologist agreement. Although this involved only two pathologists, it suggests that normalisation may improve perceived colour and pathologist confidence, but that the normalisation process may be a variable in its own right that could negatively affect inter-observer agreement. Stain normalisation can improve image standardisation, AI performance and generalisability; however, image manipulation is relative and can introduce artefacts, lead to loss of information, or bias the training data [23, 29–31].
An alternative approach to reducing variation between images is to reduce stain variation at its source, through laboratory quality control (QC). Strict protocols are maintained within histopathology laboratories and reagents are replenished regularly to minimise variation, with most laboratories adopting automated staining instruments for improved precision. Methods of routine QC have changed little over the years: both internal and external quality assessments are based on subjective, qualitative observations [5, 32–35]. Although human qualitative assessment is important for assessing quality, it is subject to observer bias and relies on assessing stained control tissue which, due to intrinsic biological differences, can be variable between sections. Control tissue blocks are finite, being exhausted after a few hundred control sections have been cut, necessitating new controls which may have different morphological appearances and staining characteristics. Tissue staining may also be confounded by other variables prior to staining, such as fixation or section thickness variation [36–39]. These limitations mean that using tissue-based QC approaches alone may not be sufficient as a control method for stain quality assessment over time, or across institutions.
There has been research into the use of quantitative controls for immunohistochemistry staining in histopathology, and a consortium has recently been launched to improve immunohistochemistry reproducibility [40–44]. However, there is limited research focusing on quantitative QC methods for H&E staining, which accounts for the majority of stained slides in laboratories worldwide. Gray et al. [5] and Chlipala et al. [45] have developed digital methods of quantifying H&E staining from whole slide images of stained control tissue. Although effective, these methods of quantifying stain can be affected by confounding variables as they use tissue as a control and rely on accurate colour reproduction during digitisation.
In this paper we propose a method for absolute quantification of H&E staining in the laboratory environment, using stain assessment slides. Stain assessment slides comprise a biopolymer film applied as a label to standard pathology glass slides. The biopolymer film is highly receptive to stain due to its hydrophilicity and porous structure. We characterise the stain assessment slides, compare the stain response with tissue, and validate the use of this methodology as routine QC testing for H&E staining within a clinical laboratory. This technique has the potential to offer truly objective and quantitative QC of H&E staining, to augment current QC processes in laboratories.
Discussion
We have proposed that improving stain QC and standardisation is a practical and logical approach to ensuring consistency of traditional laboratory stain quality and the resultant digital data set.
We evaluated a novel method of stain QC in a series of experiments. Experiment 1 characterised the biopolymer film on stain assessment slides, stained with H&E (separately and combined) and found a linear relationship between stain duration and stained colour of the biopolymer, with r values of 0.99 for all stain techniques. This demonstrated that the stain assessment slides take up H&E stain linearly over time and were an effective, quantitative measure of staining, based on purposefully altering stain duration.
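As an illustration of the kind of analysis described for Experiment 1, the following minimal sketch fits a least-squares line to stain duration against measured colour and reports the Pearson correlation coefficient r. The function name `linear_fit_with_r` and all readings are hypothetical and chosen only for illustration; they are not data from this study.

```python
from math import sqrt


def linear_fit_with_r(xs, ys):
    """Return (slope, intercept, r) for an ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r


# Hypothetical readings (NOT study data): stain duration in minutes and
# total absorbance of the stained biopolymer film.
stain_minutes = [1, 2, 4, 6, 8, 10]
absorbance = [0.11, 0.20, 0.41, 0.59, 0.82, 1.00]

slope, intercept, r = linear_fit_with_r(stain_minutes, absorbance)
print(f"slope={slope:.3f} per minute, intercept={intercept:.3f}, r={r:.3f}")
```

An r value close to 1 indicates that stain uptake tracks duration linearly over the range measured; it says nothing about behaviour outside that range.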
Experiment 2 compared the H&E staining characteristics of the biopolymer with sections of human liver tissue, to contrast the performance of the system with the conventional use of tissue-based controls. There was a strong correlation between mean biopolymer and liver staining (r values between 0.98 and 0.99) indicating that biopolymer stain uptake was linearly comparable to human liver tissue within the stain durations measured. The linear relationship was non-proportional (y intercept ≠ 0) due to the biopolymer film having an increased thickness (24.4 μm biopolymer vs. 5 μm tissue sections), permitting higher sensitivity of the biopolymer to detect variations in staining.
Experiment 3 implemented stain assessment slides within a clinical laboratory to establish the clinical utility of the method. Experiment 3a assessed variation at one point in time and found that the intra-instrument variation was 6–9%, a similar level to the average variation found across stain durations in Experiments 1 and 2. This suggests that this variation was dominated by intra-batch variation in the stain assessment slides rather than variation within the staining instruments, although it was not possible to discriminate between the two. The inter-instrument variation at one point in time was 8%, which was found to be statistically significant (p = 0.0003). This indicated that, despite different instruments using the same protocol, inter-instrument variations are present. Varying levels of slide throughput may have contributed to this, e.g. a higher throughput of slides may equate to a higher likelihood of reagents becoming diluted or contaminated. There may also be variations between different H&E stain batches that could contribute to the variation measured. The stain assessment slides offer a simple method of quantifying variation and characterising staining instruments on a periodic basis. However, although the difference between instruments was statistically significant, only 8% variation was measured at one point in time, which was a low level of variation (similar to the baseline level of variation within the stain assessment slides), particularly considering that the biopolymer has an increased sensitivity to stain compared to human tissue.
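To make the distinction between intra- and inter-instrument variation concrete, the sketch below computes both as a percentage coefficient of variation. This is an assumption for illustration: the exact definition of percentage variation used in the experiments is not restated in this section, and the readings and the helper name `percent_cv` are hypothetical.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Coefficient of variation: sample standard deviation as a % of the mean."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical absorbance readings (NOT study data) from stain assessment
# slides run on three staining instruments at one point in time.
readings = {
    "Stainer-1": [0.98, 1.05, 0.93, 1.01],
    "Stainer-2": [1.10, 1.04, 1.15, 1.08],
    "Stainer-3": [0.95, 0.90, 1.02, 0.97],
}

# Intra-instrument: spread of repeat slides within each instrument.
intra_instrument = {name: percent_cv(vals) for name, vals in readings.items()}

# Inter-instrument: spread of the per-instrument means.
inter_instrument = percent_cv([mean(vals) for vals in readings.values()])

for name, cv in intra_instrument.items():
    print(f"{name}: intra-instrument variation {cv:.1f}%")
print(f"inter-instrument variation {inter_instrument:.1f}%")
```

Testing whether an inter-instrument difference is statistically significant would need a separate test, such as a one-way analysis of variance, which this sketch does not attempt.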
Experiment 3b assessed variation across five days and found an average intra-instrument variation of between 23 and 28%. This is approximately 2.5–4.5 times higher than the level of variation found in Experiment 3a at one point in time, which highlights the increased variation present across five days. The daily variation reached as high as 47% on one day (Stainer-3, day 2). The inter-instrument variation was 27% but was not found to be significant, although this may be due to paucity of data. The variation was likely caused by dilution of reagents and high throughput of slides over the course of one week. Daily quantitative QC would have a strong potential to limit this variation by setting thresholds of normal operation; this would also provide onward benefits for AI by providing more consistent data for both training and utilisation. A limitation of this experiment was that information was not collected on the frequency of stain reagent changes. As one of the potential benefits of the stain assessment slides would be to optimise reagent use, that information is important and should be included in future work. If less frequent reagent changes can be identified, this could be of financial benefit to laboratories; either way, this information could inform future guidelines or standards.
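One way that thresholds of normal operation might be set is sketched below: control limits are derived from a baseline run and later daily readings are flagged when they fall outside them. The choice of mean ± k standard deviations, the value of k, the function names and all readings are assumptions for illustration only; this study does not define specific thresholds.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (low, high) limits as baseline mean +/- k sample standard deviations."""
    centre = mean(baseline)
    spread = stdev(baseline)
    return centre - k * spread, centre + k * spread


def out_of_range(daily_readings, limits):
    """Return the labels of readings that fall outside the control limits."""
    low, high = limits
    return [label for label, value in daily_readings.items()
            if not (low <= value <= high)]


# Hypothetical baseline absorbance readings (NOT study data) taken with
# fresh reagents, and one reading per day over a working week.
baseline = [1.00, 1.04, 0.97, 1.02, 0.99, 1.01]
week = {"day 1": 1.01, "day 2": 0.62, "day 3": 0.98, "day 4": 0.95, "day 5": 0.80}

limits = control_limits(baseline, k=3.0)
print(f"normal operation: {limits[0]:.3f} to {limits[1]:.3f}")
print("flagged:", out_of_range(week, limits))
```

In practice the limits would need to reflect the baseline variability of the stain assessment slides themselves, so that slide-to-slide variation is not mistaken for instrument drift.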
Additional limitations of this study include the variability in stain uptake by the biopolymer at 6–14%. For context, this variability was subjectively barely perceivable compared to the staining instrument variation found across five days, which was readily noticeable at 23–28%. It is thought that the variation was largely due to the high sensitivity of the biopolymer, the hand-made nature of constructing stain assessment slides, and the use of a manual staining process in Experiments 1 and 2. As such, automated manufacture and staining processes should improve this. A further limitation was the use of different techniques to measure the colour of the biopolymer film. In Experiments 1 and 3, colour was measured spectrally (total absorbance), to characterise the absolute stained colour in the biopolymer. Experiment 2 differed in that colour was measured digitally (RGB values) to characterise the relative relationship between the biopolymer and tissue stain uptake. Accurate colour measurement from whole slide images relies on accurate colour reproduction of the imaging system. The whole slide images were manually checked for quality, but the AT 2 scanner was not specifically colour-calibrated prior to use, other than the out-of-factory calibration, setup and yearly calibration by the manufacturers following their standard procedure. Because we scanned the biopolymer and tissue in the same scanner at the same time, we can determine from previous experimental work that this scanner would have an expected variation in colour measurement of 0.47%, which is an order of magnitude lower than the stain variation being measured in the stain assessment slides and tissue [5]. There was no direct comparison between the spectral and digital colour measurements, and future work will compare these methods.
The H&E characterisation in this paper was based on an intensity measurement of H&E staining with equal time for each stain (1:1 ratio), so additional analysis is needed to understand the biopolymer response to disproportionate H&E stain durations. Early work suggests that this will be proportional to the time-stain uptake curves shown in Fig. 2b. The relative uptake of H&E stains needs to be reported to inform practical instrument optimisation in the laboratory. Digital methods do exist to do this already, for example, stain deconvolution by Ruifrok et al. [49]. The impact of varying H&E types/brands also needs to be fully characterised, as well as determination of the level of variation in stain assessment slides that equates to visually or diagnostically noticeable differences in different tissue types.
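The general idea behind colour deconvolution can be sketched for a single pixel as follows: RGB values are converted to optical density, which is then expressed as a combination of a haematoxylin vector, an eosin vector and a residual vector. This is a simplified sketch rather than the published implementation. The stain vectors are approximate, commonly quoted values that should be treated as placeholders to be measured for the stains and scanner in use, and taking the residual channel as the cross product of the two stain vectors is one common convention assumed here.

```python
from math import log10, sqrt

# Placeholder stain vectors in optical-density space (assumed values).
H_VECTOR = (0.65, 0.70, 0.29)  # haematoxylin
E_VECTOR = (0.07, 0.99, 0.11)  # eosin


def normalise(v):
    """Scale a 3-vector to unit length."""
    length = sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def det3(a, b, c):
    """Determinant of the 3x3 matrix with rows a, b, c."""
    return sum(x * y for x, y in zip(a, cross(b, c)))


def rgb_to_od(rgb, background=255.0):
    """Convert RGB intensities to optical density (Beer-Lambert), clipping at 1."""
    return tuple(-log10(max(ch, 1.0) / background) for ch in rgb)


def unmix_pixel(rgb):
    """Return (haematoxylin, eosin, residual) amounts for one RGB pixel."""
    h = normalise(H_VECTOR)
    e = normalise(E_VECTOR)
    res = normalise(cross(h, e))  # residual channel, orthogonal to both stains
    od = rgb_to_od(rgb)
    # Solve od = c_h*h + c_e*e + c_res*res using Cramer's rule.
    d = det3(h, e, res)
    return (det3(od, e, res) / d, det3(h, od, res) / d, det3(h, e, od) / d)


# Hypothetical pixel (NOT study data).
c_h, c_e, c_res = unmix_pixel((120, 60, 170))
print(f"haematoxylin={c_h:.3f}, eosin={c_e:.3f}, residual={c_res:.3f}")
```

The ratio of the haematoxylin and eosin amounts gives a per-pixel estimate of relative stain uptake, which could then be averaged over a region of the biopolymer film or tissue.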
Further work will develop an operational process to allow stain assessment slides to be readily deployed and utilised in an operational environment. The use of a spectrophotometer is impractical in an operational pathology workflow; however, if a laboratory has already been digitalised, a whole slide imager could practically be used to collect stain data. There are two potential limitations of this: not all laboratories have gone digital, and a time lag is introduced between staining and the returned quantitative data, which may limit the utility of the stain assessment tool as a near-time quality control. To address this, we are developing a small, laboratory-friendly device to measure colour directly from the stain assessment slides that can fit easily into the laboratory workflow and provide immediate feedback. It is important to note that the stain assessment slides allow quantification of the stain delivered to tissue. We accept that there are complex relationships between haematoxylin, eosin and tissue presentation. The use of stain assessment slides is not for assessing the impact on clinical presentation, but to indicate whether the staining instrument is performing within pre-defined parameters, as a departure from these may have consequences for the clinical presentation.
In summary, this work presents a novel method using a biopolymer as a quantitative H&E stain assessment tool that:
- demonstrates linear staining with H&E,
- shows comparable stain uptake to control tissue slides,
- has demonstrable clinical utility in measuring stain variation.
If adopted into routine practice, the presented QC tool could improve stain consistency and optimise reagent use by removing subjectivity in stain assessment. This technique can be used as a periodic point-in-time test for staining instruments, to be used alongside laboratory internal and external qualitative assessment protocols. An added benefit of quantifying stain variability is the potential cost-saving by optimising stain replenishment and reducing reagent use. There are also clinical and operational benefits from reducing the need to re-section and re-stain tissue if stain quality drops. These benefits will not only help optimise the speed and quality of diagnosis but also help to produce consistent digital whole slide images and to help facilitate AI in digital pathology in future.