Ingestion: clib #3953

Open
wants to merge 3 commits into
base: master
129 changes: 129 additions & 0 deletions data/xml/2014.clib.xml
@@ -0,0 +1,129 @@
<?xml version='1.0' encoding='UTF-8'?>
<collection id="2014.clib">
<volume id="1" ingest-date="2024-10-11" type="proceedings">
<meta>
<booktitle>Proceedings of the First International Conference on Computational Linguistics in Bulgaria (CLIB 2014)</booktitle>
<publisher>Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences</publisher>
<address>Sofia, Bulgaria</address>
<month>September</month>
<year>2014</year>
<url hash="82f47cf0">2014.clib-1</url>
<venue>clib</venue>
</meta>
<frontmatter>
<pages>110</pages>
<url hash="cd607317">2014.clib-1.0</url>
<bibkey>clib-2014-1</bibkey>
</frontmatter>
<paper id="1">
<title>Electronic Language Resources in Teaching Mathematical Linguistics</title>
<author><first>Ivan</first><last>Derzhanski</last></author>
<author><first>Rositsa</first><last>Dekova</last></author>
<pages>1–5</pages>
<abstract>The central role of electronic language resources in education is widely recognised (cf. Brinkley et al., 1999; Bennett, 2010; Derzhanski et al., 2007, among others). The variety and ease of access of such resources predetermine their extensive use in both research and education. With regard to teaching mathematical linguistics, electronic dictionaries and annotated corpora play a particularly important part, being an essential source of information for composing linguistic problems and presenting linguistic knowledge. This paper discusses the need for electronic resources, especially for less studied or low-resource languages, their creation and various uses in teaching linguistics to secondary school students, with examples mostly drawn from our practical work.</abstract>
<url hash="e0a18d34">2014.clib-1.1</url>
<bibkey>derzhanski-dekova-2014-electronic</bibkey>
</paper>
<paper id="2">
<title>Harnessing Language Technologies in Multilingual Information Channelling Services</title>
<author><first>Diman</first><last>Karagiozov</last></author>
<pages>6–13</pages>
<abstract>Scientists and industry have put significant efforts in creating suitable tools to analyze information flows. However, up to now there are no successful solutions for 1) dynamic modeling of the user-defined interests and further personalization of the results, 2) effective cross-language information retrieval, and 3) processing of multilingual content. As a consequence, much of the potentially relevant and otherwise accessible data from the media stream may elude users’ grasp. We present a multilingual information channeling system, MediaTalk, which offers broad integration between language technologies and advanced data processing algorithms for annotation, analysis and classification of multilingual content. As a result, the system not only provides an all-in-one monitoring service that covers both traditional and social media, but also offers dynamic modeling of user profiles, personalization of obtained data and cross-language information retrieval. Bulgarian and English press clipping services relying on this system implement advanced functionalities such as identification of emerging topics, forecasting and trend prediction, all of which allow the users to monitor their standing reputation, events and relations. The architecture of the system is robust, extensible and adheres to the Big Data paradigm.</abstract>
<url hash="09d9e2ea">2014.clib-1.2</url>
<bibkey>karagiozov-2014-harnessing</bibkey>
</paper>
<paper id="3">
<title>Automatic Semantic Filtering of Morphosemantic Relations in <fixed-case>W</fixed-case>ord<fixed-case>N</fixed-case>et</title>
<author><first>Svetlozara</first><last>Leseva</last></author>
<author><first>Ivelina</first><last>Stoyanova</last></author>
<author><first>Borislav</first><last>Rizov</last></author>
<author><first>Maria</first><last>Todorova</last></author>
<author><first>Ekaterina</first><last>Tarpomanova</last></author>
<pages>14–22</pages>
<abstract>In this paper we present a method for automatic assignment of morphosemantic relations between derivationally related verb–noun pairs of synsets in the Bulgarian WordNet (BulNet) and for semantic filtering of those relations. The filtering process relies on the meaning of noun suffixes and the semantic compatibility of verb and noun taxonomic classes. We use the taxonomic labels assigned to all the synsets in the Princeton WordNet (PWN) – one label per synset – which denote their general semantic class. In the first iteration we employ the pairs &lt;noun suffix : noun label&gt; to filter out part of the relations. In the second iteration, which uses as input the output of the first one, we apply a stronger semantic filter. It makes use of the taxonomic labels of the noun-verb synset pairs observed for a given morphosemantic relation. In this way we manage to reliably filter out impossible or unlikely combinations. The results of the performed experiment may be applied to enrich BulNet with morphosemantic relations and new synsets semi-automatically, while facilitating the manual work and reducing its cost.</abstract>
<url hash="87e722fc">2014.clib-1.3</url>
<bibkey>leseva-etal-2014-automatic</bibkey>
</paper>
<paper id="4">
<title>Noun-Verb Derivation in the <fixed-case>B</fixed-case>ulgarian and the <fixed-case>R</fixed-case>omanian <fixed-case>W</fixed-case>ord<fixed-case>N</fixed-case>et – A Comparative Approach</title>
<author><first>Ekaterina</first><last>Tarpomanova</last></author>
<author><first>Svetlozara</first><last>Leseva</last></author>
<author><first>Maria</first><last>Todorova</last></author>
<author><first>Tsvetana</first><last>Dimitrova</last></author>
<author><first>Borislav</first><last>Rizov</last></author>
<author><first>Verginica</first><last>Barbu Mititelu</last></author>
<author><first>Elena</first><last>Irimia</last></author>
<pages>23–31</pages>
<abstract>Romanian and Bulgarian are Balkan languages with rich derivational morphology that, if introduced into their respective wordnets, can aid broadening of the wordnet content and the possible NLP applications. In this paper we present a joint work on introducing derivation into the Bulgarian and the Romanian WordNets, BulNet and RoWordNet, respectively, by identifying and subsequently labelling the derivationally and semantically related noun-verb pairs. Our research aims at providing a framework for a comparative study on derivation in the two languages and offering training material for the automatic identification and assignment of derivational and morphosemantic relations needed in various applications.</abstract>
<url hash="5480494c">2014.clib-1.4</url>
<bibkey>tarpomanova-etal-2014-noun</bibkey>
</paper>
<paper id="5">
<title>Semi-Automatic Detection of Multiword Expressions in the <fixed-case>S</fixed-case>lovak Dependency Treebank</title>
<author><first>Daniela</first><last>Majchrakova</last></author>
<author><first>Ondrej</first><last>Dusek</last></author>
<author><first>Jan</first><last>Hajic</last></author>
<author><first>Agata</first><last>Karcova</last></author>
<author><first>Radovan</first><last>Garabik</last></author>
<pages>32–39</pages>
<abstract>We describe a method for semi-automatic extraction of Slovak multiword expressions (MWEs) from a dependency treebank. The process uses an automatic conversion from dependency syntactic trees to deep syntax and automatic tagging of verbal argument nodes based on a valency dictionary. Both the valency dictionary and the treebank conversion were adapted from the corresponding Czech versions; the automatically translated valency dictionary has been manually proofread and corrected. There are two main achievements – a valency dictionary of Slovak MWEs with direct links to corresponding expressions in the Czech dictionary, PDT-Vallex, and a method of extraction of MWEs from the Slovak Dependency Treebank. The extraction reached very high precision but lower recall in a manual evaluation. This is a work in progress, the overall goal of which is twofold: to create a Slovak language valency dictionary paralleling the Czech one, with bilingual links; and to use the extracted verbal frames in a collocation dictionary of Slovak verbs.</abstract>
<url hash="17f10a1b">2014.clib-1.5</url>
<bibkey>majchrakova-etal-2014-semi</bibkey>
</paper>
<paper id="6">
<title>Automatic Categorisation of Multiword Expressions and Named Entities in <fixed-case>B</fixed-case>ulgarian</title>
<author><first>Ivelina</first><last>Stoyanova</last></author>
<pages>40–48</pages>
<abstract>This paper describes an approach for automatic categorisation of various types of multiword expressions (MWEs) with a focus on multiword named entities (MNEs), which compose a large portion of MWEs in general. The proposed algorithm is based on a refined classification of MWEs according to their idiomaticity. While MWE categorisation can be considered as a separate and independent task, it complements the general task of MWE recognition. After outlining the method, we set up an experiment to demonstrate its performance. We use the corpus Wiki1000+ that comprises 6,311 annotated Wikipedia articles of 1,000 or more words each, amounting to 13.4 million words in total. The study also employs a large dictionary of 59,369 MWE noun phrases (out of more than 85,000 MWEs), labelled with their respective types. The dictionary is compiled automatically and verified semi-automatically. The research presented here is based on Bulgarian although most of the ideas, the methodology and the analysis are applicable to other Slavic and possibly other European languages.</abstract>
<url hash="9289693d">2014.clib-1.6</url>
<bibkey>stoyanova-2014-automatic</bibkey>
</paper>
<paper id="7">
<title>Temporal Adverbs and Adverbial Expressions in a Corpus of <fixed-case>B</fixed-case>ulgarian and <fixed-case>U</fixed-case>krainian Parallel Texts</title>
<author><first>Ivan</first><last>Derzhanski</last></author>
<author><first>Olena</first><last>Siruk</last></author>
<pages>49–54</pages>
<abstract>This paper presents a comparative bilingual corpus-based study of the use of several frequent temporal adverbs and adverbial expressions (‘always’, ‘sometimes’, ‘never’ and their synonyms) in Bulgarian and Ukrainian. The Ukrainian items were selected with the aid of synonym dictionaries of words and of set expressions, the corpus was used to identify their most common Bulgarian counterparts, and the frequencies of the correspondences were compared and scrutinised for possibly informative regularities.</abstract>
<url hash="53907c66">2014.clib-1.7</url>
<bibkey>derzhanski-siruk-2014-temporal</bibkey>
</paper>
<paper id="8">
<title>Historical Corpora of <fixed-case>B</fixed-case>ulgarian Language and Second Position Markers</title>
<author><first>Tsvetana</first><last>Dimitrova</last></author>
<author><first>Andrej</first><last>Boyadzhiev</last></author>
<pages>55–63</pages>
<abstract>This paper demonstrates how historical corpora can be used in researching language phenomena. We exemplify the advantages and disadvantages through exploring three of the available corpora that contain textual sources of Old and Middle Bulgarian language to shed light on some aspects of the development of two words of ambiguous class. We discuss their behaviour to outline certain conditions for diachronic change they have undergone. The three corpora are accessible online (and offline – for downloading search results, xml files, etc.).</abstract>
<url hash="e9adefa8">2014.clib-1.8</url>
<bibkey>dimitrova-boyadzhiev-2014-historical</bibkey>
</paper>
<paper id="9">
<title>Machine Translation Based on <fixed-case>W</fixed-case>ord<fixed-case>N</fixed-case>et and Dependency Relations</title>
<author><first>Luchezar</first><last>Jackov</last></author>
<pages>64–72</pages>
<abstract>The proposed machine translation (MT) approach uses WordNet (Fellbaum, 1998) as a base for concepts. It identifies the concepts and dependency relations using context-free grammars (CFGs) enriched with features, role markers and dependency markers. Multiple interpretation hypotheses are generated and then are scored using a knowledge base for the dependency relations. The hypothesis with the best score is used for generating the translation. The approach has already been implemented in an MT system for seven languages, namely Bulgarian, English, French, Spanish, Italian, German, and Turkish, and also for Chinese on experimental level.</abstract>
<url hash="2c1e5f52">2014.clib-1.9</url>
<bibkey>jackov-2014-machine</bibkey>
</paper>
<paper id="10">
<title>Recognize the Generality Relation between Sentences Using Asymmetric Association Measures</title>
<author><first>Sebastiao</first><last>Pais</last></author>
<author><first>Gael</first><last>Dias</last></author>
<author><first>Rumen</first><last>Moraliyski</last></author>
<pages>73–81</pages>
<abstract>In this paper we focus on a particular case of entailment, namely entailment by generality. We argue that there exist various types of implication, a range of different levels of entailment reasoning, based on lexical, syntactic, logical and common sense clues, at different levels of difficulty. We introduce the paradigm of Textual Entailment (TE) by Generality, which can be defined as the entailment from a specific statement towards a relatively more general statement. In this context, the Text T entails the Hypothesis H, and at the same time H is more general than T. We propose an unsupervised and language-independent method to recognize TE by Generality given a case of Text − Hypothesis or T − H where an entailment relation holds.</abstract>
<url hash="30f9e32b">2014.clib-1.10</url>
<bibkey>pais-etal-2014-recognize</bibkey>
</paper>
<paper id="11">
<title>Unsupervised and Language Independent Method to Recognize Textual Entailment by Generality</title>
<author><first>Sebastiao</first><last>Pais</last></author>
<author><first>Gael</first><last>Dias</last></author>
<author><first>Joao</first><last>Cordeiro</last></author>
<author><first>Rumen</first><last>Moraliyski</last></author>
<pages>82–90</pages>
<abstract>In this work we introduce a particular case of textual entailment (TE), namely Textual Entailment by Generality (TEG). In text, there are different kinds of entailment yielded from different types of implicative reasoning (lexical, syntactic, common sense based), but here we focus just on TEG, which can be defined as an entailment from a specific statement towards a relatively more general one. Therefore, we have T (G)→ H whenever the premise T entails the hypothesis H, the hypothesis being more general than the premise. We propose an unsupervised and language-independent method to recognize TEGs, given a pair T, H in an entailment relation. We have evaluated our proposal through two experiments: (a) Test on T (G)→ H English pairs, where we know that TEG holds; (b) Test on T → H Portuguese pairs, randomly selected with 60% of TEGs and 40% of TE without generality dependency (TEnG).</abstract>
<url hash="9897e644">2014.clib-1.11</url>
<bibkey>pais-etal-2014-unsupervised</bibkey>
</paper>
</volume>
</collection>