OnomOs – The Ostrava Corpus of Proper Names
SGS02/FF/2023, University of Ostrava (2023)
Jaroslav David, Tereza Klemensová, Jana Davidová Glogarová, Michal Místecký, Agata Reclik, Jarmila Mádrová, Kristýna Březinová
The goal of the one-year project OnomOs – The Ostrava Corpus of Proper Names is to create a specialised language corpus for proper names research with the participation of students and academics. The planned corpus corresponds to the research topics addressed at the proponent’s workplace (quantitative text analysis, stylometry, onomastics) and to the current GAČR project Quantitative Onomastics: backgrounds, concepts, applications. However, in contrast to the above-mentioned GAČR project, which focuses on the behaviour of proper names in text and on the use of quantitative and corpus approaches in their analysis, the proposed SGS project aims at the creation of the corpus itself. It aims to build a specific, period-based corpus, which will be compiled in cooperation with the Institute of the Czech National Corpus of the Faculty of Arts, Charles University in Prague, and to explore the possibilities and problems of annotating proper names (via the named entity annotation method).
Analysis of the syntactic complexity of texts of the corpus CzeSL-SGT
SGS08/FF/2023, University of Ostrava (2023)
Miroslav Kubát, Radek Čech, Michaela Hanušková, Michaela Nogolová
The project will analyse how the syntactic complexity of texts written by non-native speakers of Czech develops across language levels (A1–C1 according to CEFR). Syntactic complexity will be measured by two methods: a) mean dependency distance (MDD) and b) mean length of linear dependency segments (LDS). The research will be based on the learner corpus CzeSL-SGT, which contains more than 8,000 texts written by non-native speakers of Czech.
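As an illustration of the first measure, the following is a minimal sketch of how mean dependency distance can be computed for a single sentence. It assumes the common definition (the average absolute difference between the linear position of each word and the position of its head, with the root excluded) and a CoNLL-U-style list of head indices; the function name and the example sentence are hypothetical and are not taken from the project.

```python
def mean_dependency_distance(heads):
    """Return the mean dependency distance of one sentence.

    heads[i] is the 1-based position of the head of the word at
    position i + 1; the value 0 marks the root, which is skipped
    because it has no head inside the sentence.
    """
    distances = [
        abs(head - position)
        for position, head in enumerate(heads, start=1)
        if head != 0
    ]
    if not distances:
        # A one-word sentence contains only the root.
        return 0.0
    return sum(distances) / len(distances)


# Hypothetical parse of "The student wrote a letter":
# The -> student, student -> wrote, wrote = root, a -> letter, letter -> wrote
example_heads = [2, 3, 0, 5, 3]
print(mean_dependency_distance(example_heads))  # 1.25
```

In practice the head indices would come from a dependency parser; a corpus-level value is then usually obtained by aggregating over all sentences of a text.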
Quantitative Syntactic Stylistics of Contemporary Written Czech
Miroslav Kubát, Xinying Chen, Radek Čech
The project focuses on syntactic features of different styles in contemporary written Czech. The research is based on the quantitative analysis of SYN2020, a syntactically annotated, representative corpus of contemporary written Czech consisting of 100 million tokens. Various syntactic features are analyzed, such as mean sentence length, sentence types, word order, modality, the distribution of syntactic functions, and indicators of attributivity and subjectivity. Our aim is to enrich previous research on the style of Czech texts with new perspectives. First, we focus on the syntactic part of Czech stylistics, which usually lies outside the main interest of scholars. Second, our analysis is based on quantitative methods.
Quantitative Onomastics: Starting Points, Concepts, Applications
Jaroslav David, Michal Místecký, Tereza Klemensová
The project will focus on the theoretical and practical possibilities of quantitative onomastics, a new discipline aimed at mathematically based research into proper names. In the theoretical area, the project focuses on incorporating proper names into the context of general linguistic diversification tendencies and on research into proper names from a diachronic perspective by means of Piotrowski’s law. The verification and utilization of the theoretical conclusions and methods focus on the quantification of the dynamics in developing onymic areas (for example, literary onomastics, as well as the linking of onomastics and critical discourse analysis). Attention is also paid to the role of proper names in contemporary social-political contexts. In terms of methodology, the project is based on corpus research, making use of a wide range of available tools and collections of texts, first and foremost those of the Czech National Corpus. The basic types of scholarly approaches are frequency, stylometric and collocation analysis.
Quantitative analysis of texts of CzeSL-SGT corpus
SGS06/FF/2022, University of Ostrava (2022)
Miroslav Kubát, Radek Čech, Michaela Hanušková, Michaela Nogolová, Markéta Guńková
The project will focus on the quantitative analysis of the texts of the CzeSL-SGT corpus in order to obtain data on texts of individual language levels, to model the development of these texts and to analyze the process of learning Czech as a foreign language. This corpus contains over 8000 texts written by learners of Czech as a foreign language at all language levels. We will analyze the texts using the QuitaUP and UDPipe software, which allow us to compute various properties of the texts. In particular, we will be interested in the average length of tokens, the descriptivity of the text, the verb distances, the length of sentences, lexical richness, the number of clauses in a sentence, syntactic characteristics of dependency trees.
Between Etymology and Landscape: Topics of the 21st Century Czech Onomastics
SGS03/FF/2022, University of Ostrava (2022)
Jaroslav David, Kristýna Kovářová, Tereza Klemensová, Jana Davidová Glogarová, Michal Místecký, Agata Reclik, Jarmila Mádrová, Kristýna Březinová
The goal of the one-year project Between Etymology and Landscape: Topics of the 21st Century Czech Onomastics is to complete and publish a monograph resulting from cooperation between university students and teachers. The planned monograph is based on research topics addressed at the proposer’s workplace (language landscape, quantitative analysis of texts, social aspects of the usage of proper names, urbanonymy and hydronymy research). It showcases perspectives of current research, which makes use of the methodology of stylometry, textual linguistics, geography, and anthropology. In more than twenty intended chapters, it presents these topics and thus provides a comprehensive picture of the current state of onomastic research, with an emphasis on the role that the Ostrava Department of Czech Language plays, at least in the Central European context. In style and form, the monograph has the potential to appeal to both experts and an educated audience. Other outputs of the project will include thematically related studies, conference presentations, and chapters in diploma and dissertation theses.
Reflection of Language and Linguistic Issues in Non-Linguistic Texts
SGS (2020-2021), SGS01/FF/2020-2021
Jaroslav David, Michal Místecký, Tereza Klemensová, Agata Rupińska
The goal of the two-year project is to present the way non-linguistic texts, such as opinion journalism, fiction, and internet discussion forums (i.e., texts which are not written for the linguistic community and whose authors are not experts in linguistics), treat topics that are also dealt with in academic linguistics. The project focuses on the following topics:
1) approaches to language correctness and the criteria for it in the perspective of non-linguists;
2) discussions about language purism, the concept of the good author, new rules of Czech orthography, and the decline of language;
3) reflection of the political discourse in the books by Czech authors (e.g., Karel Čapek, Karel Poláček, Václav Havel);
4) reflection of the language component of proper names – approaches to creating feminine forms; approaches to given names; changes of geographical names and the argumentation employed, as illustrated with the example of the Czech post-war borderlands and the contemporary Těšín region and Slovakia; the issue of the name Czechia in the perspective of collocation analysis;
5) assessment of the quality of languages on the basis of opinion journalism in the Czech National Corpus (collocation analysis via association measures);
6) reflection of the language of a visited country, as illustrated with the Czech travelogues to China.
The structure of the grant team: 3 university teachers (academics), 6 students in the MA study programme, and 1 student in the PhD study programme. These include 3 foreign students.
Language Devices of the Participants in Military Missions Abroad II.
SGS (2021), SGS05/FF/2021
Lucie Radková, Jarmila Mádrová
The project is, by its nature, of interdisciplinary character, and is connected to one of the three main research areas pursued at the Department of Czech Language, Faculty of Arts, University of Ostrava – namely, to the studies of sociolects. The topic of the project, which develops the SGS05/FF/2020 one, is field research among the participants in international military operations and observation missions in Afghanistan and Mali. The primary goal of the project investigators is to compile the up-to-date word-stock used by Czech soldiers located at the NATO bases and to find out what military language means to the mission participants, what its role is in their communication, and what rules it follows. It is a topic which, due to its scope and demanding research, has yet to be investigated thoroughly in Czech linguistics. The researchers are going to continue with the tasks started in 2020 and finish the field research, which has been complicated by the current pandemic situation.
Language Devices of the Participants in Military Missions Abroad
SGS (2020), SGS05/FF/2020
The project is, by its nature, of interdisciplinary character, and is connected to one of the three main research areas pursued at the Department of Czech Language, Faculty of Arts, University of Ostrava, namely, to the studies of sociolects. The object of the project is field research among the participants in international military operations and observation missions in Afghanistan and Mali. The primary goal of the project investigators is to compile the up-to-date word-stock used by Czech soldiers located at the NATO bases and to find out what military language means to the mission participants, what its role is in their communication, and what rules it follows. It is a topic which, due to its scope and demanding research, has yet to be investigated thoroughly in Czech linguistics. Both qualitative and quantitative methods will be employed in the investigation. The field research on the function of military language will have an international character – it will also cover the army contingents coming from other nations, mostly the American and the Spanish ones. In future, the acquired data will be connected to research carried out in the other sections of the Army of the Czech Republic (e.g., the Military Police, the Active Reserves), and will be made use of in the monograph on the language of law enforcement groups (the Police of the Czech Republic, the Army of the Czech Republic).
Online Media Discourse on Motherhood
SGS (2019), SGS02/FF/2019
Zuzana Černá, Radek Čech
The aim of the project is to analyze and interpret the discourse on motherhood present in the online versions of the Czech daily newspapers with the highest audience rates, i.e. Metro, MF Dnes, Blesk and Deník. In agreement with the representatives of critical discourse analysis, we consider discourse to be a way of referring to an issue from a particular perspective. We will combine qualitative and quantitative research strategies, i.e. the approach of critical discourse analysis and the methodological approaches of corpus linguistics, within Corpus-Assisted Discourse Studies.
We aim to find out whether the discourse of chosen texts is neutral or persuasive (in the latter case, we try to identify its purpose and social context). We will also be interested in the categories of motherhood and identities ascribed to mothers. The partial aims will be the analysis of the lexical means (names for motherhood, mothers, relationship between mothers and non-mothers, unborn as well as born children, situations connected with pregnancy and motherhood, collocations and concordances), grammatical means, graphical means and intertextuality.
Reality Creation through Language – A Qualitative Analysis of Modern Czech Texts
SGS (2018–2019), SGS02/FF/2018–2019
Jaroslav David, Jana Davidová Glogarová, Kristýna Bílková, Zuzana Kaňáková, Tereza Klemensová, Agata Rupińska, Tinglin Sun
The goal of the two-year project is to present language as a device for creating reality in selected types of communicative situations and texts. The research is divided into the following topics: 1) constructing Old Czech language (in the sense of ‘pretending to be authentic’) in modern fiction, films and marketing texts, 2) thematization (by way of etymologization, interpretation, re-semantization) of proper names in fiction (novel, travel writing) and opinion journalism (political speeches). The project examines original topics that have not yet been elaborated in Czech linguistics; the analysis is based on varied texts of different genres.
Application of Neural Networks in Diachronic and Synchronic Semantic Analysis of Texts
SGS, 2018, SGS01/UVAFM/18
Miroslav Kubát, Radek Čech, Jan Hůla, David Číž, Kateřina Pelegrinová
The project follows up on the previous SGS project Application of Neural Networks in Diachronic and Synchronic Semantic Analysis of Texts. The first analysis showed that this approach has considerable potential. The main goal is to extend the functionality of the developed software and to discover possible applications of the proposed method in linguistic research. Specifically, our method makes it possible to measure the context specificity of a lemma (CSL), i.e. the degree to which a lemma is tied to specific contexts; the method is based on the Word2vec technique.
Development of the Czech Pronominal (en)Clitics
GA ČR, 2017–2019
Radek Čech, Pavel Kosek (Faculty of Arts, Masaryk University Brno)
The project focuses on the development of the word order of Czech pronominal (en)clitics mi, si, ti; ho, mu, sě, tě. The analysis is based on representative parts of Old and Middle Czech Bibles (created in the 14th–18th centuries). The word order of pronominal (en)clitics is investigated: 1. in the phrase of finite verb, 2. in the infinitive, participle, (deverbative) adjective, and (deverbative) substantive phrase. The research deals especially with the competition between the second position and the contact (verb adjacent) position of the (en)clitics, with the (en)clitic cluster, with the change of originally orthotonic pronominal forms ho, mu, sě, tě to ‘constant’ (en)clitics and with the proclitization of pronominal (en)clitics. The project methodology draws on the tradition of Czech dependency and functional syntax. As the analysis of historical development of (en)clitics is also based on frequency characteristics of the observed phenomena, methods of quantitative linguistics are used for a further interpretation of the data.
Application of Neural Networks in Diachronic and Synchronic Semantic Analysis of Texts
SGS, 2017, SGS02/UVAFM/2017
Radek Čech, Miroslav Kubát, Jan Hůla, Vojtěch Molek
The aim of the project is to apply the contemporary methods based on neural networks in textology. Semantic changes in a Czech corpus are analyzed from synchronic and diachronic viewpoints. More specifically, (a) we examine the development of the political and social discourse from 1990 to 2014, and (b) we investigate the effectiveness of this method for genre classification. The project reflects the research topics of the Department of Czech Language (quantitative linguistics) and the Institute for Research and Applications of Fuzzy Modeling of the University of Ostrava (neural networks).
New Trends in Toponymy Research – Illustrated with Particular Localities of Moravia and Silesia Regions
SGS, project No SGS01/FF, 2016–2017
The project is aimed at research of current place names, their knowledge and usage in particular localities of Moravia and Silesia (Čeladná, Dolní Údolí, Horní Údolí, Karlova Studánka, Libavá, Ostrava-Poruba, Ostrava-Svinov, Ostrava-Třebovice, Ostravice, Rejvíz, Rusava; the industrial regions of Karviná and Ostrava). The research is realized in localities with specific demographic, historical and social developments during the 20th century. In the course of the project, new approaches and concepts are applied to place names – e.g., linguistic landscape, new regionalism, place branding, and place marketing.
The project is aimed at the spoken language of Czech police officers (Policie ČR), predominantly at the vocabulary of their professional slang. The research was based on a field survey carried out mostly among the elite police forces. Emphasis was also put on the cryptic function of their communication.
Place Names – Endangered Cultural Heritage
SGS, project No SGS1/FF, 2014–2015
The project was aimed at research of place names in localities that were demolished or are in danger due to human activity, predominantly coal mining and the building of large plants. These include the localities of Most and Karviná (coal mining); Staré Hamry, Nové Heřminovy (dam building); Ostrava-Hrušov (a locality in decay caused by a flood).
Outputs of this project include several articles, chapters in collections of studies, and also an exhibition presentation.
Vocabulary Used by Drug Addicts II
SGS, SGS13/FF/2014, 2014
The project was focused on the specificity of communication – predominantly of vocabulary – used within a subculture of users, distributors and producers of illegal drugs. This addresses two key goals. Firstly, it aims to offer readers the largest possible collection of original, previously unpublished, authentic, contemporary linguistic material taken from the communication of users, distributors, and producers of illegal drugs. Secondly, it aims to determine whether the members of the drug subculture attempt to conceal meanings in their communication by using specific vocabulary. The research also examines the word-formation processes that are active in certain specific expressions – especially those related to the production of methamphetamine – and traces the development of and the situation with regard to illegal drug use in the Czech Republic. The outputs of this project include the book Mluva uživatelů a výrobců drog (Language Used by Drug Users and Producers; written with Jana Rausová) and several articles and conference papers.
Vocabulary Used by Drug Addicts
SGS, SGS2/FF/2013, 2013
The research was based on a field survey carried out within a subculture of Czech users, distributors and producers of illegal drugs. The project was the first step in gathering language material for further analysis aimed predominantly at slang vocabulary. The final outputs of this project include the book Mluva uživatelů a výrobců drog (Language Used by Drug Users and Producers; written with Jana Rausová) and several articles and conference papers.
QUITA (Quantitative Index Text Analyzer) – Software Measuring Vocabulary Richness and Other Quantitative Features of Texts
IGA (no. FF_2013_031)
Radek Čech, Vladimír Matlach, Miroslav Kubát
Quantitative Index Text Analyzer (QUITA) covers the most common indicators, especially those connected with the frequency structure of a text. In addition to computing the indicators, QUITA also provides statistical testing and graphical visualization of the obtained data. QUITA is a versatile tool designed for researchers from various disciplines (linguistics, literary criticism, history, sociology, psychology, politics, biology, etc.). The program offers basic text processing functions, such as creating word lists, lemmatizing texts, or creating n-grams. It also provides more advanced tools, such as a random text creator or a binary file translator. However, the core of the software is indicator computation. Although the authors focused mainly on indicators connected with the frequency structure of a text (e.g., h-point, entropy, repeat rate, adjusted modulus, Gini’s coefficient, lambda), there are also several other characteristics, such as thematic concentration, activity & descriptivity, or writer’s view. More information about the software can be found in the book QUITA – Quantitative Index Text Analyzer and in the diploma thesis Kvantitativně lingvistický software.
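To give an idea of what frequency-structure indicators look like, the following minimal sketch computes two of the indicators named above, entropy and repeat rate, from the relative word frequencies of a text. It uses the standard textbook formulas (Shannon entropy in bits and the sum of squared relative frequencies); the function name, the lowercasing, and the whitespace tokenization are assumptions made for illustration, and QUITA's own implementation may differ in detail.

```python
import math
from collections import Counter


def frequency_indicators(text):
    """Compute simple frequency-structure indicators for a text.

    Tokenization is a deliberate simplification: the text is
    lowercased and split on whitespace.
    """
    tokens = text.lower().split()
    counts = Counter(tokens)
    n = len(tokens)
    # Relative frequency of each word type.
    probs = [count / n for count in counts.values()]
    # Shannon entropy in bits: H = -sum(p * log2(p)).
    entropy = -sum(p * math.log2(p) for p in probs)
    # Repeat rate: RR = sum(p^2).
    repeat_rate = sum(p * p for p in probs)
    return {
        "tokens": n,
        "types": len(counts),
        "entropy": entropy,
        "repeat_rate": repeat_rate,
    }


result = frequency_indicators("a b a b")
print(result["tokens"], result["types"])  # 4 2
print(result["entropy"], result["repeat_rate"])  # 1.0 0.5
```

Higher entropy and a lower repeat rate both indicate that the tokens of a text are spread more evenly over its word types.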
Place Names as a Significant Component of Cultural Heritage and an Important Source of Local, Regional, and National Identity. Preparing Guidelines for the Preservation of Place Names.
NAKI, project No DF11P01OVV022, 2011–2014
Jaroslav David, Přemysl Mácha (Faculty of Sciences, University of Ostrava)
The goal of this project was to contribute to the preservation of place names as significant components of cultural heritage and important sources of local, regional, and national identity. The way to this goal included the preparation of certified guidelines for the preservation and presentation of place names to be used by town and village authorities and other institutions (schools, museums, libraries, archives, etc.), the exhibitions (the cities of Ostrava and Havířov), and the preparation of maps and cartographic guidelines for place names. Further outputs of this project included the book Názvy míst. Paměť, identita, kulturní dědictví (Place Names. Memory, Identity, Cultural Heritage), several articles and conference papers, and the database Názvy míst (Place Names) collecting popular place names forms.
GAČR, project No GAP406/11/0268, 2011–2013
Jaroslav David, Radek Čech, Jana Davidová Glogarová, Lucie Radková, Hana Šústková
The project Historical Semantics focused on historical/diachronic semantics. Czech historical semantics has an existing scholarly methodology for lexical analysis, and the medieval period has been studied in depth. However, the 19th century and the modern era have not received any systematic study. The semantic changes in Modern Czech are illustrated not only with an analysis of the material in dictionaries, thesauruses, and newspapers (e.g., texts by Karel Čapek, Ladislav Jehlička, Jaroslav Durych, New Year’s presidential speeches, etc.), but also with the language material in the Czech National Corpus. The research into historical semantics in modern times – known as Begriffsgeschichte, History of Concepts, Political Semantics, and Critical Discourse Analysis – emanates from an interdisciplinary approach to semantics and lexis. The outputs of this project include the book Slovo a text v historickém kontextu (Words and Texts in Historical Contexts), and several articles and conference papers.
Components of Transitivity Analysis of Czech Sentences (Emergent Grammar Approach)
GAČR 405/08/P157, 2008–2010
The project focuses on the analysis of components of high and low transitivity and the relationship among them. The emergent grammar approach is adopted for the inquiry into transitivity. In the framework of emergent grammar, transitivity is viewed as a matter of the grammar of the entire clause, and it comprises component parameters which are displayed in morphosyntax or semantics. The analysis will cover the texts stored in the Czech National Corpus. The goal of the project is a more precise understanding of transitivity in general – both of its inner structure (the relationship among its components) and of the dependence of transitivity and its components on grammar-external factors (language form, types of texts, etc.). The project will also deal with the possibilities of using the findings in mother tongue teaching. The results of the research will be published in linguistic journals and presented at conferences.
Folk Etymology – its Specificity and Functions (as Illustrated with the Place Names)
GAČR, project No 405/07/P144, 2007–2009
The project was aimed at several issues reflecting the usage of folk/popular etymology in the presentation of proper names, predominantly place names – e.g., popular etymology used as a rhetorical figure, its usage in political discourse and fiction, etc. The outputs of this project include several articles and conference papers. The research topic was also elaborated in specialized chapters of the books Neviditelní svědkové minulosti. Místní a pomístní jména na Vysočině (Invisible Witnesses of the Past. Official and Unofficial Place Names in the Vysočina Region) and Smrdov, Brežněves a Rychlonožkova ulice. Kapitoly z moderní české toponymie (Smrdov, Brežněves, and Rychlonožka Street. Chapters from Modern Czech Toponymy).