THE DEVELOPMENT OF VOCABULARIES OF HISTORICAL PERIOD NAMES FROM WEB ACQUIRED CORPORA

Authors

  • Maria S. Mouroutsou, National and Kapodistrian University of Athens, School of Philosophy, Dept. of Philology, Panepistimiopolis, 15784 Ilissia, Greece
  • Stella Markantonatou, ILSP/“Athena” RIC, Artemidos 6 & Epidavrou, GR-151 25 Maroussi, Greece
  • Vasilis Papavasiliou, ILSP/“Athena” RIC, Artemidos 6 & Epidavrou, GR-151 25 Maroussi, Greece

Keywords:

periodization, time period name, Focused Monolingual Crawler, unstructured Web data

Abstract

Periodization is a universal and very popular system of organizing History (Petras, et al., 2006) by arbitrarily dividing time into periods, such as “Δικτατορία” (dictatorship), in a way that is specific to places and communities. Structured collections of time period names and timelines are considered very useful in cultural content documentation and temporal information extraction. However, to the best of our knowledge, this is the first report on the systematic collection of period names of Greek History. New period names are constantly created, while others fall out of use. Aiming to capture this combination of dispersed specificity and constant evolution, we used the Focused Monolingual Crawler (FMC) (Mastropavlos, et al., 2011) and an initial list of 25 “seed-terms” to develop corpora dense in period names from Web-retrieved documents. Period names were manually retrieved from the accumulated corpora and annotated for a set of features: the allomorphs that occurred in the collected corpora; whether the term denoted a fact, a time period or something else; and the persons, places and other period names related to the term. The linguistic environments in which the terms occurred were identified, and some of them were fed to the FMC as new “seed-terms”. This cycle was repeated three times and yielded 78 period names, with an average of 16 paradigms per term, and a corpus consisting of 3020 valid XML documents. Some first observations on the strategies employed by Greek communities to coin time period names are reported.
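
The crawl, annotate and re-seed cycle described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline and does not use the FMC interface: `crawl` and `extract_terms` are hypothetical stand-ins for the crawler and for the manual annotation step, the toy documents and the names `period_a`, `period_b`, `period_c` are invented for illustration, and feeding newly found terms back as seeds simplifies the paper's use of selected linguistic environments as new seed-terms.

```python
from __future__ import annotations

from typing import Callable, Iterable


def bootstrap_vocabulary(
    seed_terms: Iterable[str],
    crawl: Callable[[set[str]], list[str]],
    extract_terms: Callable[[str], set[str]],
    rounds: int = 3,
) -> tuple[set[str], list[str]]:
    """Grow a term vocabulary and a corpus over a fixed number of cycles."""
    seeds = set(seed_terms)
    vocabulary: set[str] = set()
    corpus: list[str] = []
    for _ in range(rounds):
        # Keep only documents that are not already in the corpus.
        documents = [doc for doc in crawl(seeds) if doc not in corpus]
        corpus.extend(documents)
        # Collect candidate terms from the newly retrieved documents.
        found: set[str] = set()
        for doc in documents:
            found |= extract_terms(doc)
        new_terms = found - vocabulary
        vocabulary |= new_terms
        if not new_terms:
            break  # nothing new was found, so another cycle adds nothing
        seeds = new_terms  # assumption: new terms become the next seeds
    return vocabulary, corpus


# Toy stand-ins (hypothetical): a tiny "Web" keyed by seed term, and an
# "annotator" that recognises a fixed set of placeholder period names.
TOY_WEB = {
    "period_a": ["period_a and period_b"],
    "period_b": ["period_b after period_c"],
    "period_c": ["period_c"],
}
KNOWN_NAMES = {"period_a", "period_b", "period_c"}


def toy_crawl(seeds: set[str]) -> list[str]:
    # Return every toy document associated with any of the seeds.
    return [doc for seed in sorted(seeds) for doc in TOY_WEB.get(seed, [])]


def toy_extract(doc: str) -> set[str]:
    # Stand-in for manual annotation: keep tokens that are known names.
    return set(doc.split()) & KNOWN_NAMES


if __name__ == "__main__":
    vocabulary, corpus = bootstrap_vocabulary(["period_a"], toy_crawl, toy_extract)
    print(sorted(vocabulary), len(corpus))
```

In this sketch the cycle stops early when a round yields no new terms; the study itself ran a fixed three cycles with human judgement at the extraction and re-seeding steps.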

Published

2023-07-28

Section

Articles