Commit

ingested workshop ctt.
anthology-assist committed Sep 16, 2024
1 parent 986e530 commit 498243f
Showing 2 changed files with 90 additions and 0 deletions.
88 changes: 88 additions & 0 deletions data/xml/2024.ctt.xml
@@ -0,0 +1,88 @@
<?xml version='1.0' encoding='UTF-8'?>
<collection id="2024.ctt">
<volume id="1" ingest-date="2024-09-16" type="proceedings">
<meta>
<booktitle>Proceedings of the 1st Workshop on Creative-text Translation and Technology</booktitle>
<editor><first>Bram</first><last>Vanroy</last></editor>
<editor><first>Marie-Aude</first><last>Lefer</last></editor>
<editor><first>Lieve</first><last>Macken</last></editor>
<editor><first>Paola</first><last>Ruffo</last></editor>
<publisher>European Association for Machine Translation</publisher>
<address>Sheffield, United Kingdom</address>
<month>June</month>
<year>2024</year>
<url hash="6b2d1a5c">2024.ctt-1</url>
<venue>ctt</venue>
</meta>
<frontmatter>
<url hash="f52b61ac">2024.ctt-1.0</url>
<bibkey>ctt-2024-1</bibkey>
</frontmatter>
<paper id="1">
<title>Using a multilingual literary parallel corpus to train <fixed-case>NMT</fixed-case> systems</title>
<author><first>Bojana</first><last>Mikelenić</last></author>
<author><first>Antoni</first><last>Oliver</last><affiliation>Universitat Oberta de Catalunya</affiliation></author>
<pages>1-9</pages>
<abstract>This article presents an application of a multilingual and multidirectional parallel corpus composed of literary texts in five Romance languages (Spanish, French, Italian, Portuguese, Romanian) and a Slavic language (Croatian), with a total of 142,000 segments and 15.7 million words. After combining it with very large freely available parallel corpora, this resource is used to train NMT systems tailored to literature. A total of five NMT systems have been trained: Spanish-French, Spanish-Italian, Spanish-Portuguese, Spanish-Romanian and Spanish-Croatian. The trained systems were evaluated using automatic metrics (BLEU, chrF2 and TER) and a comparison with a rule-based MT system (Apertium) and a neural system (Google Translate) is presented. As a main conclusion, we can highlight that the use of this literary corpus has been very productive, as the majority of the trained systems achieve comparable, and in some cases even better, values of the automatic quality metrics than a widely used commercial NMT system.</abstract>
<url hash="fc158c42">2024.ctt-1.1</url>
<bibkey>mikelenic-oliver-2024-using</bibkey>
</paper>
<paper id="2">
<title>‘Can make mistakes’. Prompting <fixed-case>C</fixed-case>hat<fixed-case>GPT</fixed-case> to Enhance Literary <fixed-case>MT</fixed-case> output</title>
<author><first>Gys-Walt</first><last>Egdom</last><affiliation>Utrecht University</affiliation></author>
<author><first>Christophe</first><last>Declercq</last><affiliation>Utrecht University</affiliation></author>
<author><first>Onno</first><last>Kosters</last><affiliation>Utrecht University</affiliation></author>
<pages>10-20</pages>
<abstract>Operating at the intersection of generative AI (artificial intelligence), machine translation (MT), and literary translation, this paper examines to what extent prompt-driven post-editing (PE) can enhance the quality of machine-translated literary texts. We assess how different types of instruction influence PE performance, particularly focusing on literary nuances and author-specific styles. Situated within posthumanist translation theory, which often challenges traditional notions of human intervention in translation processes, the study explores the practical implementation of generative AI in multilingual workflows. While the findings suggest that prompted PE can improve translation output to some extent, its effectiveness varies, especially in literary contexts. This highlights the need for a critical review of prompt engineering approaches and emphasizes the importance of further research to navigate the complexities of integrating AI into creative translation workflows effectively.</abstract>
<url hash="ad7fd2f1">2024.ctt-1.2</url>
<bibkey>egdom-etal-2024-make</bibkey>
</paper>
<paper id="3">
<title><fixed-case>L</fixed-case>it<fixed-case>PC</fixed-case>: A set of tools for building parallel corpora

from literary works</title>
<author><first>Antoni</first><last>Oliver</last><affiliation>Universitat Oberta de Catalunya</affiliation></author>
<author><first>Sergi</first><last>Alvarez-Vidal</last><affiliation>Universitat Pompeu Fabra and Universitat Oberta de Catalunya</affiliation></author>
<pages>21-31</pages>
<abstract>In this paper, we describe the LitPC toolkit, a variety of tools and methods designed for the quick and effective creation of parallel corpora derived from literary works. This toolkit can be a useful resource due to the scarcity of curated parallel texts for this domain. We also feature a case study describing the creation of a Russian-English parallel corpus based on the literary works by Leo Tolstoy. Furthermore, an augmented version of this corpus is used to both train and assess neural machine translation systems specifically adapted to the author’s style.</abstract>
<url hash="8e036e4b">2024.ctt-1.3</url>
<bibkey>oliver-alvarez-vidal-2024-litpc</bibkey>
</paper>
<paper id="4">
<title>Prompting Large Language Models for Idiomatic Translation</title>
<author><first>Antonio</first><last>Castaldo</last><affiliation>Universita’ di Pisa, University of Pisa and University of Naples L’Orientale</affiliation></author>
<author><first>Johanna</first><last>Monti</last><affiliation>University of Naples L’Orientale</affiliation></author>
<pages>32-39</pages>
<abstract>Large Language Models (LLMs) have demonstrated impressive performance in translating content across different languages and genres. Yet, their potential in the creative aspects of machine translation has not been fully explored. In this paper, we seek to identify the strengths and weaknesses inherent in different LLMs when applied to one of the most prominent features of creative works: the translation of idiomatic expressions. We present an overview of their performance in the EN<tex-math>\rightarrow</tex-math>IT language pair, a context characterized by an evident lack of bilingual data tailored for idiomatic translation. Lastly, we investigate the impact of prompt design on the quality of machine translation, drawing on recent findings which indicate a substantial variation in the performance of LLMs depending on the prompts utilized.</abstract>
<url hash="fcb674c9">2024.ctt-1.4</url>
<bibkey>castaldo-monti-2024-prompting</bibkey>
</paper>
<paper id="5">
<title>An Analysis of Surprisal Uniformity in Machine and Human Translations</title>
<author><first>Josef</first><last>Jon</last><affiliation>Charles University Prague</affiliation></author>
<author><first>Ondřej</first><last>Bojar</last><affiliation>Charles University Prague</affiliation></author>
<pages>40-56</pages>
<abstract>This study examines neural machine translation (NMT) and its performance on texts that diverge from typical standards, focusing on how information is organized within sentences. We analyze surprisal distributions in source texts, human translations, and machine translations across several datasets to determine if NMT systems naturally promote a uniform density of surprisal in their translations, even when the original texts do not adhere to this principle. The findings reveal that NMT tends to align more closely with source texts in terms of surprisal uniformity compared to human translations. We analyzed absolute values of the surprisal uniformity measures as well, expecting that human translations will be less uniform. In contradiction to our initial hypothesis, we did not find comprehensive evidence for this claim, with some results suggesting this might be the case for very diverse texts, like poetry.</abstract>
<url hash="c72e78a7">2024.ctt-1.5</url>
<bibkey>jon-bojar-2024-analysis</bibkey>
</paper>
<paper id="6">
<title>Impact of translation workflows with and without <fixed-case>MT</fixed-case> on textual characteristics in literary translation</title>
<author><first>Joke</first><last>Daems</last><affiliation>Universiteit Gent</affiliation></author>
<author><first>Paola</first><last>Ruffo</last></author>
<author><first>Lieve</first><last>Macken</last><affiliation>Universiteit Gent</affiliation></author>
<pages>57-64</pages>
<abstract>The use of machine translation is increasingly being explored for the translation of literary texts, but there is still a lot of uncertainty about the optimal translation workflow in these scenarios. While overall quality is quite good, certain textual characteristics can be different in a human translated text and a text produced by means of machine translation post-editing, which has been shown to potentially have an impact on reader perceptions and experience as well. In this study, we look at textual characteristics from short story translations from B.J. Novak’s One more thing into Dutch. Twenty-three professional literary translators translated three short stories, in three different conditions: using Word, using the classic CAT tool Trados, and using a machine translation post-editing platform specifically designed for literary translation. We look at overall text characteristics (sentence length, type-token ratio, stylistic differences) to establish whether translation workflow has an impact on these features, and whether the three workflows lead to very different final translations or not.</abstract>
<url hash="7775effe">2024.ctt-1.6</url>
<bibkey>daems-etal-2024-impact</bibkey>
</paper>
<paper id="7">
<title>Machine Translation Meets Large Language Models: Evaluating <fixed-case>C</fixed-case>hat<fixed-case>GPT</fixed-case>’s Ability to Automatically Post-Edit Literary Texts</title>
<author><first>Lieve</first><last>Macken</last><affiliation>Universiteit Gent</affiliation></author>
<pages>65-81</pages>
<abstract>Large language models such as GPT-4 have been trained on vast corpora, giving them excellent language understanding. This study explores the use of ChatGPT for post-editing machine translations of literary texts. Three short stories, machine translated from English into Dutch, were post-edited by 7-8 professional translators and ChatGPT. Automatic metrics were used to evaluate the number and type of edits made, and semantic and syntactic similarity between the machine translation and the corresponding post-edited versions. A manual analysis classified errors in the machine translation and changes made by the post-editors. The results show that ChatGPT made more changes than the average post-editor. ChatGPT improved lexical richness over machine translation for all texts. The analysis of editing types showed that ChatGPT replaced more words with synonyms, corrected fewer machine errors and introduced more problems than professionals.</abstract>
<url hash="118eff4e">2024.ctt-1.7</url>
<bibkey>macken-2024-machine</bibkey>
</paper>
</volume>
</collection>
2 changes: 2 additions & 0 deletions data/yaml/venues/ctt.yaml
@@ -0,0 +1,2 @@
acronym: CTT
name: Workshop on Creative-text Translation and Technology
