A study on syntactic functions in stylometry
Our colleague M. Kubát, together with J. Mačutek, R. Čech, and M. Nogolová, has published the study "Automatic Genre Classification of Czech Texts Based on Syntactic Functions" in the book New Frontiers in Textual Data Analysis, published by Springer.
Abstract:
Although there has been research conducted on text classification based on syntactic features for decades, the recent development of accurate automatic syntactic taggers has enabled scholars to apply the methods to much larger and more diverse datasets than before. This study aims to classify various text types in Czech language using relative frequencies of syntactic functions (as they are defined in the Prague Dependency Treebank (PDT)). A large balanced corpus of contemporary written Czech SYN2020 is used as the language material. The distances between texts are calculated by the Cosine Delta method and then hierarchical cluster analysis is performed. The results indicate that syntactic functions can contribute to automatic genre classification based on large empirical language data.
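
The pipeline outlined in the abstract (relative frequencies, Cosine Delta distances, hierarchical clustering) can be illustrated with a minimal sketch. This is not the authors' implementation. The text names, the three feature columns, and all numbers are invented toy values. Cosine Delta is taken here to mean the cosine distance between z-scored feature vectors. The use of the population standard deviation and of average linkage are assumptions, since the abstract does not specify either.

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical relative frequencies of three syntactic-function labels per text.
# These are invented toy numbers, not data from the study or from SYN2020.
texts = {
    "news_1":    [0.30, 0.20, 0.10],
    "news_2":    [0.31, 0.19, 0.11],
    "fiction_1": [0.20, 0.30, 0.15],
    "fiction_2": [0.21, 0.29, 0.16],
}

def z_scores(rows):
    """Standardise each feature (column) across all texts."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    # Assumption: population standard deviation; a constant column gets 1.0
    # so that the division below is always defined.
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]

def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)

def average_linkage(names, dist):
    """Naive agglomerative clustering; returns the merges in order."""
    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                d = mean(dist[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

names = list(texts)
z = dict(zip(names, z_scores([texts[n] for n in names])))

# Cosine Delta distance matrix, stored as a dict of dicts.
dist = {a: {b: cosine_distance(z[a], z[b]) for b in names} for a in names}

merges = average_linkage(names, dist)
for left, right, d in merges:
    print(f"{left} + {right} at distance {d:.3f}")
```

With these toy values the two news texts and the two fiction texts pair up before the two groups are joined. For real data, an established library routine such as SciPy's hierarchical clustering would be a more practical choice than this quadratic pure-Python loop.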