Introduction
Since the novel Coronavirus Disease 2019 (COVID-19) spread worldwide and became a pandemic at the beginning of 2020, not only the acute infection and its mortality but also its long-term consequences have become a subject of public and scientific interest [1]. A proportion of COVID-19 survivors, both hospitalized and non-hospitalized, have symptoms that persist for more than 30, 60, or even 90 days after primary infection, a condition referred to as “long haul COVID,” post-acute sequelae of COVID-19 (PASC), or long COVID [2]. The World Health Organization (WHO) has published a definition of the post COVID-19 condition, specifying persisting symptoms such as fatigue, shortness of breath, and cognitive dysfunction that last for at least 2 months [3]. This definition is to be sharpened by further research and may come to include several long COVID syndromes. Several studies, reviews, and meta-analyses investigating symptoms more than 28 days after a severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection show fatigue as the most frequent and most pronounced symptom. Dyspnea, impaired concentration, and arthromyalgia are also regularly reported, depending mostly on the time since infection [2, 4, 5].
Cluster analysis, an exploratory data analysis method, has been used in several studies to investigate the co-prevalence of symptoms. Using this approach to characterize post-viral syndromes associated with COVID-19 can help to discover similarities to previously described syndromes and to derive novel hypotheses about pathogenetic pathways and possible treatment approaches. Furthermore, our results may inform health services researchers on how to design multimodal treatment regimens that address clusters instead of single symptoms and thus improve patient care. Previous cluster analysis studies focused on symptoms of acute COVID-19 or on psychosomatic symptoms associated with the pandemic situation [6, 7]. Cluster analyses conducted on long COVID symptoms differ methodologically in the included participants, the symptom selection, and the clustering method [8]. Kenny et al. identified three distinct long COVID clusters in a cohort of 233 individuals: cluster A with pain-related symptoms, cluster B with cardiorespiratory symptoms, and cluster C characterized by a significantly lower number of reported symptoms per individual [9]. Goldhaber et al. used exploratory factor analysis to identify five non-overlapping clusters among 16 symptoms in a study comprising 999 participants [10]: (1) gastrointestinal, (2) neurocognitive, (3) musculoskeletal, (4) airway, and (5) cardiopulmonary symptoms. Tsuchida et al. performed a hierarchical cluster analysis and also identified five clusters among 23 symptoms in 500 participants [11].
The aim of this study is to identify symptom clusters in people with long COVID and to describe their prevalence. We use cross-sectional data from a large long COVID online survey to cluster self-reported symptoms.
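To illustrate what clustering self-reported symptoms can look like in practice, the following is a minimal sketch of an agglomerative (hierarchical) clustering of symptoms. It is not the procedure of this study: the distance measure (1 minus the Pearson correlation of severity ratings), the average linkage, the function names, and the ratings are all assumptions chosen for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long rating lists.

    Raises ZeroDivisionError if one list has zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def cluster_symptoms(ratings, n_clusters):
    """Merge symptoms bottom-up until `n_clusters` groups remain.

    Assumed distance: 1 - Pearson correlation. Assumed linkage: average.
    """
    names = list(ratings)
    dist = {
        (a, b): 1 - pearson(ratings[a], ratings[b])
        for a in names for b in names if a != b
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        return sum(dist[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge the second into the first.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical severity ratings (0-10) of six participants per symptom.
example_ratings = {
    "fatigue": [8, 7, 2, 1, 9, 3],
    "joint pain": [7, 8, 1, 2, 8, 2],
    "dyspnea": [1, 2, 8, 9, 3, 7],
    "cough": [2, 1, 9, 8, 2, 8],
}
symptom_clusters = cluster_symptoms(example_ratings, n_clusters=2)
```

In this toy data set, symptoms whose ratings rise and fall together across participants end up in the same group. A real analysis would more likely rely on an established statistics package and would need a justified choice of distance, linkage, and number of clusters.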
Discussion
In our cohort of mostly non-hospitalized COVID-19 survivors, we could sharpen the description of long-term symptoms by grouping them into three symptom clusters. Fatigue and brain fog were the most frequent long-term symptoms after a SARS-CoV-2 infection, confirming results from other studies [5, 19]. In a recent meta-analysis of 18 follow-up studies on long COVID symptoms, the most prevalent symptoms included fatigue and weakness with a pooled prevalence of 28%, cognitive impairments (19%), depression symptoms (23%), and dyspnea (18%) [5]. Our study shows a similar ranking of symptoms, with respiratory and neurological symptoms being the most frequent. Despite the similar ranking, the participants with long COVID in our study show a higher prevalence of most symptoms than described in the literature. This is especially interesting because our cutoff for a present symptom already uses the rather conservative median of each individual symptom's strength. The higher frequency could originate in the self-reported assessment of symptoms. Even with a rather conservative cutoff to determine symptom prevalence, we could show that long COVID symptoms are prevalent not only after COVID-19 requiring hospitalization but also after “mild” acute disease.
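The median-based cutoff mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual procedure: it assumes that a symptom counts as present when a participant's rating is strictly above the median rating of that symptom across all participants, and the function name, the rating scale, and the example values are hypothetical.

```python
from statistics import median


def symptom_prevalence(ratings):
    """Return, per symptom, the share of participants rated above the median.

    `ratings` maps a symptom name to a list of severity ratings, one per
    participant. The strict ">" comparison is an assumption.
    """
    prevalence = {}
    for symptom, values in ratings.items():
        cutoff = median(values)  # symptom-specific cutoff
        present = sum(1 for v in values if v > cutoff)
        prevalence[symptom] = present / len(values)
    return prevalence


# Hypothetical ratings on a 0-10 scale for five participants.
example = {
    "fatigue": [8, 7, 9, 3, 6],
    "dyspnea": [0, 0, 2, 5, 0],
}
result = symptom_prevalence(example)
```

With a strict comparison, at most half of the participants can count as having a given symptom, which is one way to read the cutoff as conservative.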
In a meta-analysis on long COVID symptoms, 13 of 18 studies examined only hospitalized patients [5]. In a recently published observational study from Mexico, 4670 participants were followed up by telephone after being hospitalized because of COVID-19 [20]. In that study, headache and cough were the most frequent symptoms. The authors classified the assessed symptoms into body system clusters but, unlike our study, did not carry out a hierarchical cluster analysis to show associations.
To better understand the co-prevalence of symptoms, to find keys to potential pathogenic mechanisms, and to derive more specific treatment approaches for long COVID subtypes, several research groups have applied cluster analysis to long COVID symptoms [6, 8, 21]. The inter-cluster overlap observed in our study suggests that treatment approaches might need to be more holistic, addressing multiple symptom clusters simultaneously.
Our analysis of overlapping clusters also raises questions about the potential risk factors and pathophysiological mechanisms underlying these symptom combinations, and about their treatment. One example of using long COVID symptom clusters in treatment is the development of an occupational therapy intervention. There, only people with fatigue and concentration difficulties were included, and the intervention was tailored to this target group. In cluster analyses, these symptoms often occur together with other neurological symptoms in long COVID patients [22].
Very similar to our approach, Kenny et al. defined three distinct symptom clusters using hierarchical clustering [9]. They used a long COVID definition identical to ours and included participants with an age distribution similar to that of our study. Kenny et al. show fatigue as the most frequent symptom, clustered together with pain and musculoskeletal symptoms, similar to cluster A in our analysis. Based on a much larger sample size, our study confirms the cardiorespiratory cluster. Additionally, we found an interesting association of cardiorespiratory symptoms with neurological symptoms such as loss of smell/taste and confusion. Our analysis also shows a more differentiated picture of the symptom clusters, as we included not only the binary presence of symptoms but also symptom strength.
Our study has several limitations. As our data rely on self-reported information, the assessments and symptom severity ratings are inherently subjective and cannot be verified. Additionally, as participants were self-recruited via the internet, our study might not be representative of the general population. Also, as the controls without long COVID in our study were not asked about symptoms and symptom severity, we could not make comparisons to differentiate clusters. On the other hand, the broad distribution of our questionnaire allowed us to reach participants in regions far from university medical centers and to gather data about long-term symptoms after mild COVID-19 courses that did not require hospitalization or intensive care.
Regarding the distribution of clusters within the study population, our clusters overlap: symptoms from two or all three clusters can occur in one participant. With the applied method, participants cannot be strictly assigned to only cluster A, B, or C. Nevertheless, it will be interesting to find out more about risk factors and serological markers associated with the different symptom clusters. In addition, future research could investigate patient-reported outcomes such as health-related quality of life across the defined clusters.
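The overlap described above can be made concrete with a small sketch that counts how many participants have symptoms from each combination of clusters. The symptom-to-cluster mapping below is a simplified illustration that uses only symptoms named in this discussion, and the function names and participant data are hypothetical, not study data.

```python
from collections import Counter

# Illustrative, incomplete mapping of symptoms to clusters (assumption).
CLUSTERS = {
    "A": {"fatigue", "joint pain"},
    "B": {"dyspnea", "loss of smell/taste", "confusion"},
    "C": {"fever"},
}


def clusters_of(present_symptoms):
    """Return the sorted labels of all clusters with at least one present symptom."""
    return tuple(
        sorted(
            label
            for label, members in CLUSTERS.items()
            if members & present_symptoms
        )
    )


def overlap_distribution(participants):
    """Count participants per combination of symptom clusters."""
    return Counter(clusters_of(set(p)) for p in participants)


# Hypothetical participants, each given as a set of present symptoms.
participants = [
    {"fatigue"},
    {"fatigue", "dyspnea"},
    {"joint pain", "confusion", "fever"},
    {"dyspnea", "fatigue"},
]
distribution = overlap_distribution(participants)
```

Here one participant falls into cluster A only, two into both A and B, and one into all three, which shows why participants cannot be assigned to exactly one cluster.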
Having identified certain symptom associations, it would be interesting to link our results to existing scientific knowledge about post-infectious syndromes and their pathophysiology. The COVID Human Genetic Effort consortium recently presented an overview of mechanisms hypothetically explaining key symptoms of long COVID [23]. They postulate that, for example, spikes of fever (part of our Cluster C) are caused by persistent viral reservoirs, while autoimmunity targeting G-protein-coupled receptors on neurons is responsible for neurological symptoms like loss of smell and taste or confusion (as in our Cluster B) [14]. A review comparing the symptomatology of long COVID and Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) sheds light on the large overlap between symptoms reported in long COVID studies and the major criteria symptoms for ME/CFS (fatigue, reduced daily activity, and post-exertional malaise) [24]. The authors suggest a high degree of similarity and discuss the possible origin of both syndromes in a dysregulated immune response and hyperinflammation. As our Cluster A shows a co-prevalence of fatigue with symptoms that typically occur in rheumatologic and autoinflammatory diseases, such as joint pain, we consider that our cluster analysis may add a further piece to the puzzle of distinguishing one or several pathomechanisms of long COVID sequelae.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.