New study on didactic tests from a quantitative perspective
Michal Místecký and his colleagues have published the study More similar than needed: Czech exam texts from the perspective of quantitative linguistics in the journal Glottotheory.
Abstract:
The paper compares texts which are part of Czech language didactic tests, the examinations used as the acceptance prerequisite for secondary schools (eight-year [E8], six-year [E6], and four-year [E4] ones) or as the secondary school final check (called “maturita” [M] in Czech) of the level of knowledge of the pupils’ mother tongue. The corpus comprises 334 texts in 56 tests (more than 66,000 tokens) and covers the timespan of 2019–2023. The indexes employed in the comparison are average token length (ATL), activity (Q), moving-average type–token ratio (MATTR), moving-average morphological richness (MAMR), and verb distances (VD). Statistically significant differences are searched for using the Kruskal–Wallis test and the Dwass–Steel–Critchlow–Fligner test. It has been found that the texts differ as to ATL, Q, and VD, but further statistical testing declared mostly the E8–M difference as statistically significant. The correlation analysis has confirmed that the three indexes are correlated, which implies that the difference is a product of a single verbalisation-to-nominalisation tendency. Finally, we performed a k-means cluster analysis (preceded by t-SNE, used to reduce the number of dimensions), which divided the texts into two groups – “A” (easy) and “B” (difficult). These two groups are distributed rather equally in E8, E6, and E4, but in M, the B-texts considerably prevail.
Místecký, Michal, Radková, Lucie, Stiborská, Žaneta and Hrubá, Darina. “More similar than needed: Czech exam texts from the perspective of quantitative linguistics.” Glottotheory.
https://www.degruyterbrill.com/document/doi/10.1515/glot-2025-2014/html
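
For readers unfamiliar with the indexes named in the abstract, two of them, ATL and MATTR, can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the regex tokenizer, the measurement of token length in characters, the function names, and the default window size of 100 are all assumptions that do not come from the study.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumption: a simple regex tokenizer; the study's own tokenization
    is not described in the abstract.
    """
    return re.findall(r"\w+", text.lower())


def average_token_length(tokens):
    """ATL: mean token length, measured here in characters (an assumption)."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    return sum(len(token) for token in tokens) / len(tokens)


def mattr(tokens, window=100):
    """MATTR: mean type-token ratio over all windows of `window` tokens.

    Assumption: window=100 is a hypothetical default. If the text is not
    longer than the window, the plain type-token ratio is returned.
    """
    n = len(tokens)
    if n == 0:
        raise ValueError("tokens must not be empty")
    if window < 1:
        raise ValueError("window must be at least 1")
    if n <= window:
        return len(set(tokens)) / n
    # Slide the window one token at a time and average the per-window TTRs.
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(n - window + 1)
    ]
    return sum(ratios) / len(ratios)
```

Because MATTR averages over fixed-size windows, it depends much less on text length than the plain type-token ratio, which matters when comparing exam texts of different lengths.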