Our article has been published in Corpus Linguistics and Linguistic Theory

Miroslav Kubát and Radek Čech, together with other colleagues, have published a study titled The lexical context in a style analysis: A word embeddings approach. The article appeared in the prestigious journal Corpus Linguistics and Linguistic Theory, which ranks in the first quartile (Q1) of the Web of Science (WoS).

The lexical context in a style analysis: A word embeddings approach

Miroslav Kubát, Jan Hůla, Xinying Chen, Radek Čech and Jiří Milička


This is a pilot study of the usability of the Context Specificity measure for stylometric purposes. Specifically, the Word2vec word embedding approach, based on measuring lexical context similarity between lemmas, is applied to the analysis of texts that belong to different styles. Three types of Czech texts are investigated: fiction, non-fiction, and journalism. In total, forty lemmas were observed (10 lemmas each for verbs, nouns, adjectives, and adverbs). The aim of the present study is to introduce the concept of Context Specificity and to test whether this measure is sensitive to different styles. The results show that the proposed method, Closest Context Specificity (CCS), is independent of corpus size and shows promise for analyzing different styles.

Keywords: neural networks; word embedding; word2vec; stylometry; style
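
To give a rough feel for what "measuring lexical context similarity between lemmas" can look like in practice, here is a minimal sketch. It is not the method from the paper: the abstract does not spell out how CCS is computed, so the function `closest_context_score`, the parameter `k`, and the toy two-dimensional vectors are all hypothetical. The sketch simply assumes that each lemma already has an embedding vector (for example from a Word2vec model) and averages the cosine similarity between a target lemma and its `k` most similar lemmas.

```python
import numpy as np


def cosine_similarities(target, others):
    """Cosine similarity between one vector and each row of a matrix."""
    target = target / np.linalg.norm(target)
    others = others / np.linalg.norm(others, axis=1, keepdims=True)
    return others @ target


def closest_context_score(vectors, lemma, k=2):
    """Mean cosine similarity between `lemma` and its k nearest neighbours.

    Hypothetical illustration only; the paper's actual CCS definition
    may differ (this is an assumption, not the authors' formula).
    """
    names = [name for name in vectors if name != lemma]
    matrix = np.array([vectors[name] for name in names], dtype=float)
    sims = cosine_similarities(np.asarray(vectors[lemma], dtype=float), matrix)
    top_k = np.sort(sims)[::-1][:k]  # k highest similarities
    return float(top_k.mean())


# Toy vectors standing in for embeddings trained on a corpus (made up).
toy_vectors = {
    "a": [1.0, 0.0],
    "b": [1.0, 0.0],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}

score = closest_context_score(toy_vectors, "a", k=2)
print(score)
```

Under this assumed definition, a lemma whose nearest neighbours are all very similar to it gets a high score, and one could compare such scores for the same lemma across embeddings trained separately on fiction, non-fiction, and journalism.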