Summary of main results
In this study, we were able to demonstrate good reliability and confirmed several aspects of validity for the GEDI.
Specifically, we observed excellent internal consistency and moderate to good test-retest and interrater reliability. Moreover, we confirmed face validity by conducting expert interviews with preschool teachers. Additionally, our findings suggest that most items included in the GEDI are valid indicators for key developmental domains. Testing the GEDI against validated instruments currently in use in Germany showed good correlations between corresponding domains. Bland-Altman plots overall revealed good and acceptable agreement between GEDI and SDQ/DESK-measured domains of child development, respectively. However, some variation in agreement existed across the distribution of scores and between age groups, with the GEDI underestimating scores in younger age groups and overestimating in older age groups. In addition, associations between GEDI scores and specific characteristics such as SES and sex were comparable to previous work [
24,
26,
28,
32,
53], suggesting good convergent validity. Lastly, our density plots displayed left-skewed distributions of GEDI scores across domains, as might be expected. Scores in the lowest 10th percentile were largely similar between the German and the original EDI in all domains except LAN.
Reliability of items
Although internal consistency was generally good, some exceptions indicate a need for careful consideration. Two domains in particular should be given a closer look: PHY and LAN.
In terms of content, the domain PHY might contain more than a single latent variable; however, the original item structure of the EDI, which we used as a template, treated this as one domain. Within this domain, four items loaded below 0.3: (i) the loading for Sucks thumb/finger was near zero and the item-total correlation was quite low (α = 0.077). Similarly, the original EDI report [
51] as well as a report from Hagquist et al. (2013) showed poor loading (0.401) of this item in factor analyses and identified it as the most poorly fitting item in the domain [
54]. Excluding this item from our analyses, however, did not result in significantly higher internal consistency as the item response category used by 92% of our sample was “never or not true”. One potential explanation for the latter finding is that the vast majority of children in our sample came from families with a higher SES, a setting which might provide greater emotional stability [
55]. Given that theories of developmental psychology consider this item reflective of emotional conditions in children such as anxiety or depression [
52,
56], we recommend shifting the item to the domain EMO. Similarly, three items within the domain PHY showed a lack of variability in responses: (ii)
child arrives hungry, (iii)
is independent in washroom habits most of the time, and (iv)
shows an established hand preference. We attribute this to selection issues in our study, with the majority of our sample coming from higher SES households. Previous work suggests that children in high-SES families have more of the resources needed to support their positive development than those from lower SES households [
57]. Correlations between SES and child development in our study are to be interpreted with caution, however, and future studies implementing the GEDI or an adaptation of it should ensure that children from diverse social backgrounds are sampled.
While many studies show that a secure and organized parent-child attachment is positively associated with the social, emotional and cognitive skills of children [
58], only a few items that relate to these skills exist in the GEDI and the EDI as originally reported. Thus, including additional items in the GEDI covering aspects of familial support may, as others have suggested [
59], improve the GEDI.
According to our interpretation, the poor performance of three items within the LAN domain across all age groups (
is able to read complex words, is able to read simple sentences, and is able to write simple sentences; response of “never or not true” in almost all cases: 96%, 96%, and 97%, respectively) might be related to differences in the structure and learning objectives in German versus Canadian preschools. In Canada, the context in which the original instrument was developed, preschools focus from the outset on promoting advanced language and math skills, whereas German preschools emphasize free play and introduce children to basic numeracy and literacy skills only in the last year, for the oldest children. Advanced reading and writing abilities are not pedagogical aims in German preschools, which may explain why children neither have developed these skills nor are expected to have them before the first grade of elementary school. A previous study [
54] from Sweden, where preschools take a similar approach to that of Germany [
60], reports similar results for two of the items mentioned. Moreover, children enrolled in German preschools typically range from 3 to 6 years of age (mean of our sample 4.7), while the age range for children participating in the sample used to develop the original instrument was higher (4 years, 11 months to 6 years, 4 months) [
50]. This suggests that preschool children in Germany would be even less likely to show these competencies. These items might therefore have to be excluded in order to produce a reliable, contextually appropriate instrument documenting early development and vulnerability in Germany.
Validity assessment
Our content validity results and a factor structure very similar to that reported in the original study [
31] underscore one of the key tenets of developmental psychology, namely that children’s development is universal [
44]. Thus, most of the GEDI items seem to be transferable to children across industrialized countries.
Significant negative correlations between two domains of the SDQ (i.e., SOC and EMO) and comparable domains in the GEDI indicate acceptable concurrent validity. Significant positive correlations of GEDI domains with corresponding DESK domains (all p < 0.05) indicate good validity in younger children, who comprised a majority of the subsample (74%, n = 29), whereas older children had very low and in some cases negative correlations in about half of the corresponding DESK domains (5/11). We attribute this either to the small sample size of older children or to the fact that those DESK domains mainly include group performance tasks, while GEDI items are based on teacher report [
39].
Results from Bland-Altman plots show good to moderate agreement between the GEDI and SDQ/DESK, though with some variation across the distribution of scores. This was expected, as the measures capture constructs that are similar but not identical. For SDQ, agreement values are derived from the total sample. Acceptable agreement only in the highest scores of GEDI-SDQ domain pairs might be related to the concentration of scores in the higher ranges of the latent constructs, as shown in the density plots. Furthermore, based on the mean differences, preschool teachers tended to underestimate younger children’s development (age groups 3 and 4 years) and to overestimate older children’s development (age groups 5 and 6 years) with the GEDI compared to assessments applying the SDQ. This finding suggests that GEDI assessment in Germany should be administered using age-specific questionnaires. In the small subsample in which DESK was administered, we found similar results, except for domain pair GEMO/PHY_3, where the mean difference for three-year-olds was near zero and for four-year-olds was almost half a standard deviation below zero. This inconsistency might be attributed to methodological discrepancies and should be interpreted with caution. Taken together, these findings suggest that further adaptation will be necessary for future use of the GEDI, especially for enabling valid measurement of development in middle and lower score ranges of the latent construct.
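The agreement statistics referred to above can be illustrated with a minimal sketch. It assumes paired, standardized scores from two instruments and the conventional 95% limits of agreement (mean difference ± 1.96 standard deviations of the differences). The function name and the example values are hypothetical and are not taken from the study data.

```python
from statistics import mean, stdev

def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired scores a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)   # mean difference; negative if a scores lower than b
    sd = stdev(diffs)    # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical standardized scores for five children (not study data)
gedi = [0.2, -0.5, 1.0, 0.3, -0.1]
sdq = [0.4, -0.2, 0.7, 0.5, 0.1]
bias, lower, upper = bland_altman(gedi, sdq)
```

A negative bias within an age group would correspond to the GEDI yielding lower scores than the comparison instrument in that group. Computing the statistic separately per age group is one way to examine the age pattern described above.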
Despite a factor structure very similar to the one reported for the EDI [
51] (16 instead of 15 factors/subdomains), we observed loadings that were generally lower. This might be due to the smaller sample size and the younger mean age of our sample. Therefore, a future study to confirm convergent validity of an adapted GEDI should aim for a larger sample and a more uniform age distribution.
Sample distribution compared to representative data
The density plots reveal distributions as expected in the domains PHY, SOC, EMO and COM. For the domain PHY (which also contains items that cover the general state of health and motor abilities of children), our results are in line with national statistics on the general health status of children in this age range [
61], in which 57.1, 38.6 and 4.3% report having a very good, good or poor health status, respectively. Another nationally representative study reported that 5 to 11% of preschool children have noticeable problems in their motor development [
62]. This statistic is consistent with findings in our sample, in which a majority of children (> 90%) scored in the upper range of the specific latent trait continuum (left-skewed distribution).
Further, around 17% of children in Germany are affected by psychological health issues [
61,
63]. This is consistent with our results from domains SOC and EMO, in which approximately 75% of our sample scored higher than 8.7 and 6.8, respectively.
For COM, only 10% scored below 5 in our sample, a finding consistent with representative monitoring data, in which 3 to 16% of children showed difficulties in the development of communication skills such as vocabulary, speech comprehension, articulation, and oral fluency [
64].
For the domain LAN, the picture is different: if we had used the score threshold applied in the report of the original EDI, 50% of the children would have scored below 5 and would have been identified as vulnerable. A report from a previous study conducted in Germany, however, suggests that only 20.7% showed deficits in language and cognitive development [
12]. We interpret this as a lack of contextual appropriateness of several items in the domain LAN, as discussed previously.
Opportunities and barriers of the current version of the GEDI
Our results show that the GEDI was acceptable in the German preschool setting and that most items were valid. Thus, with further adaptations, the GEDI promises to offer a useful tool for monitoring child development at the population level in Germany [
31]. In terms of detecting developmentally delayed children, one of the biggest advantages of the GEDI is that it is designed for teacher proxy reports, which makes assessment independent of parental availability, which may be limited by language barriers or other factors. Moreover, the instrument allows the preschool teacher to reflect on the development of the individual child. If concerns regarding developmental delay arise from the GEDI assessment, preschool teachers are well-positioned to determine the relevance of the issue for the specific child in question given their own professional capacity.
Cut-off scores to determine vulnerability rates for German preschoolers would have to be generated with a version of the GEDI adapted to the German context in the ways we describe.
In fact, previous studies on the EDI in other countries [
24‐
26,
28] computed vulnerability rates by classifying children in their study population as developmentally vulnerable if they scored in the lowest 10th percentile in at least one domain. In these countries, the age ranges of children in preschools were somewhat narrower (4–6 years), or they assessed only children from one age group, making comparison with the original EDI sample easier.
Taken together, to establish vulnerability cut-off scores that are contextually appropriate for the German system, the adaptation of the GEDI has to account for both (i) the pedagogical objectives of the German preschool context and (ii) age-appropriateness.
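The percentile-based rule used in the studies cited above can be sketched as follows. This is an illustration only. It assumes that cut-offs are computed within the sample itself, that a score at or below the 10th percentile counts as being in the lowest decile, and that all data shown are hypothetical. The function names are ours.

```python
from statistics import quantiles

def domain_cutoffs(children, domains):
    """10th-percentile score per domain, computed within the sample itself."""
    return {d: quantiles([c[d] for c in children], n=10)[0] for d in domains}

def is_vulnerable(child, cutoffs):
    """True if the child scores at or below the cut-off in at least one domain."""
    return any(child[d] <= cut for d, cut in cutoffs.items())

# Hypothetical sample of 20 children with two domain scores each (not study data)
children = [{"PHY": float(i), "LAN": float(21 - i)} for i in range(1, 21)]
cutoffs = domain_cutoffs(children, ["PHY", "LAN"])
rate = sum(is_vulnerable(c, cutoffs) for c in children) / len(children)
```

Because a child is classified as vulnerable when falling below the cut-off in any single domain, the overall rate can exceed 10%. With a wide age range, cut-offs computed per age group (an extension not shown here) would be one way to address age-appropriateness.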
Limitations of our study and future directions
We provide the first evidence of the reliability and validity of a German translation based closely on the internationally renowned EDI instrument. Despite this strength, we acknowledge several limitations to our work. For example, while psychometric evaluations of existing instruments do not require representative samples, a selection bias in our sample makes it difficult to derive reference values and describe child development at the population level. In Germany, a legal requirement exists for active instead of passive consent from children or their parents [
65]. While similar selection biases exist in other studies [
66,
67], the net result is that participation is greatest among higher SES parents. However, to be useful as a population-based measure, future data should be anonymized and routinely collected in preschools.
While most of our measurements reflected moderate to high reliability, long intervals between the first and second measurement points of our test-retest and interrater reliability check are also a limitation of our study. These were related to the competing time demands of preschool teachers. Nevertheless, correlation values between test and retest were moderate to high, indicating good consistency of data over time. With regard to internal consistency, we followed the recommendations of the COSMIN checklist [
68] and were able to show good Cronbach’s alpha values similar to those of the developers [
32]. Nevertheless, Cronbach’s alpha has some limitations in assessing the internal consistency of latent variables [
69].
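For readers less familiar with the statistic, a minimal sketch of Cronbach’s alpha and of the “alpha if item deleted” check follows. It uses the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The ratings are hypothetical, the function names are ours, and the sketch does not reproduce the software used in the study.

```python
from statistics import variance

def cronbach_alpha(rows):
    """rows: one list of item scores per child; all rows have equal length."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two children")
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def alpha_if_deleted(rows, index):
    """Alpha recomputed after dropping the item at position `index`."""
    reduced = [row[:index] + row[index + 1:] for row in rows]
    return cronbach_alpha(reduced)

# Hypothetical ratings (0-2) of four items for six children (not study data);
# the last item barely varies, as with an item almost always rated "never"
ratings = [
    [2, 2, 1, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
    [2, 2, 2, 0],
    [1, 0, 1, 0],
    [2, 1, 2, 1],
]
alpha_all = cronbach_alpha(ratings)
alpha_without_last = alpha_if_deleted(ratings, 3)
```

An item with almost no response variance contributes little to either the item variances or the total variance, which is one reason why removing such an item may change alpha only slightly.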
While we performed a comprehensive EFA [
68], the sample size precluded a confirmatory factor analysis. Moreover, given our small subsample of children assessed with the DESK, we were not able to draw any definitive conclusions on concurrent validity using the DESK. However, comparing the GEDI domains “social competence” and “emotional maturity” with the SDQ in the whole sample yielded good correlations.
In order to develop an adapted, contextually and age-appropriate version of the GEDI, we suggest the application of Item Response Theory (IRT) [
70]. This method has been used for adapting the Swedish [
54] and Australian [
71] versions of the EDI and may also be suitable for a successful adaptation of the GEDI. We expect IRT analysis to result in a shorter instrument, including only those items with a high information value for age-specific latent trait scopes. This could increase the feasibility of the GEDI for population monitoring in Germany.
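The notion of item information can be illustrated with a short sketch. It assumes a two-parameter logistic (2PL) model for a dichotomized item, which is a simplification chosen here for illustration and not necessarily the model appropriate for the GEDI or the one used in the cited adaptations. The parameter values are hypothetical.

```python
import math

def p_endorse(theta, a, b):
    """2PL probability of a positive rating at trait level theta.

    a: discrimination, b: difficulty (both hypothetical in this sketch).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P * (1 - P)."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)

# An item whose difficulty matches the child's trait level versus an item
# that is far too difficult for that level (e.g. an advanced reading item)
info_matched = item_information(0.0, a=1.0, b=0.0)
info_too_hard = item_information(0.0, a=1.0, b=3.0)
```

Under this model, an item is most informative near its difficulty parameter. An item that almost no child in an age group can pass therefore provides little information for that group, which is consistent with the expectation that item selection by information value would shorten the instrument.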