Add NAACL 2024 DOIs (#3704)
anthology-assist authored Jul 28, 2024
1 parent 35a3859 commit b154ef1
Showing 15 changed files with 1,503 additions and 0 deletions.
28 changes: 28 additions & 0 deletions data/xml/2024.americasnlp.xml

Large diffs are not rendered by default.

67 changes: 67 additions & 0 deletions data/xml/2024.clinicalnlp.xml

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions data/xml/2024.dash.xml
@@ -40,6 +40,7 @@
<abstract>Prompt engineering is an iterative procedure that often requires extensive manual effort to formulate suitable instructions for effectively directing large language models (LLMs) in specific tasks. Incorporating few-shot examples is a vital and effective approach to provide LLMs with precise instructions, leading to improved LLM performance. Nonetheless, identifying the most informative demonstrations for LLMs is labor-intensive, frequently entailing sifting through an extensive search space. In this demonstration, we showcase a human-in-the-loop tool called APE (Active Prompt Engineering) designed for refining prompts through active learning. Drawing inspiration from active learning, APE iteratively selects the most ambiguous examples for human feedback, which will be transformed into few-shot examples within the prompt.</abstract>
<url hash="48327f7a">2024.dash-1.1</url>
<bibkey>qian-etal-2024-ape</bibkey>
<doi>10.18653/v1/2024.dash-1.1</doi>
</paper>
<paper id="2">
<title>Towards Optimizing and Evaluating a Retrieval Augmented <fixed-case>QA</fixed-case> Chatbot using <fixed-case>LLM</fixed-case>s with Human-in-the-Loop</title>
@@ -51,6 +52,7 @@
<abstract>Large Language Models have found application in various mundane and repetitive tasks including Human Resource (HR) support. We worked with the domain experts of a large multinational company to develop an HR support chatbot as an efficient and effective tool for addressing employee inquiries. We inserted a human-in-the-loop in various parts of the development cycles such as dataset collection, prompt optimization, and evaluation of generated output. By enhancing the LLM-driven chatbot’s response quality and exploring alternative retrieval methods, we have created an efficient, scalable, and flexible tool for HR professionals to address employee inquiries effectively. Our experiments and evaluation conclude that GPT-4 outperforms other models and can overcome inconsistencies in data through internal reasoning capabilities. Additionally, through expert analysis, we infer that reference-free evaluation metrics such as G-Eval and Prometheus demonstrate reliability closely aligned with that of human evaluation.</abstract>
<url hash="77ea7298">2024.dash-1.2</url>
<bibkey>afzal-etal-2024-towards</bibkey>
<doi>10.18653/v1/2024.dash-1.2</doi>
</paper>
<paper id="3">
<title>Evaluation and Continual Improvement for an Enterprise <fixed-case>AI</fixed-case> Assistant</title>
@@ -69,6 +71,7 @@
<abstract>The development of conversational AI assistants is an iterative process with many components involved. As such, the evaluation and continual improvement of these assistants is a complex and multifaceted problem. This paper introduces the challenges in evaluating and improving a generative AI assistant for enterprise that is under active development and how we address these challenges. We also share preliminary results and discuss lessons learned.</abstract>
<url hash="80fe204a">2024.dash-1.3</url>
<bibkey>maharaj-etal-2024-evaluation</bibkey>
<doi>10.18653/v1/2024.dash-1.3</doi>
</paper>
<paper id="4">
<title>Mini-<fixed-case>DA</fixed-case>: Improving Your Model Performance through Minimal Data Augmentation using <fixed-case>LLM</fixed-case></title>
@@ -80,6 +83,7 @@
<abstract>When performing data augmentation using large language models (LLMs), the common approach is to directly generate a large number of new samples based on the original dataset, and then the model is trained on the combination of the augmented dataset and the original dataset. However, data generation demands extensive computational resources. In this study, we propose Mini-DA, a minimized data augmentation method that leverages the feedback from the target model during the training process to select only the most challenging samples from the validation set for augmentation. Our experimental results show that, in the text classification task, by using as little as 13 percent of the original augmentation volume, Mini-DA can achieve performance comparable to full data augmentation on the intent detection task, significantly improving the efficiency of data and computational resource utilization.</abstract>
<url hash="caf8af5e">2024.dash-1.4</url>
<bibkey>yang-etal-2024-mini</bibkey>
<doi>10.18653/v1/2024.dash-1.4</doi>
</paper>
<paper id="5">
<title><fixed-case>CURATRON</fixed-case>: Complete and Robust Preference Data for Rigorous Alignment of Large Language Models</title>
@@ -90,6 +94,7 @@
<abstract>This paper addresses the challenges of aligning large language models (LLMs) with human values via preference learning (PL), focusing on incomplete and corrupted data in preference datasets. We propose a novel method for robustly and completely recalibrating values within these datasets to enhance LLMs’ resilience against the issues. In particular, we devise a guaranteed polynomial time ranking algorithm that robustifies several existing models, such as the classic Bradley–Terry–Luce (BTL) model and certain generalizations of it. To the best of our knowledge, our present work is the first to propose an algorithm that provably recovers an <tex-math>\epsilon</tex-math>-optimal ranking with high probability while allowing as large as <tex-math>O(n)</tex-math> perturbed pairwise comparison results per model response. Furthermore, we show robust recovery results in the partially observed setting. Our experiments confirm that our algorithms handle adversarial noise and unobserved comparisons well in LLM preference dataset settings. This work contributes to the development and scaling of more reliable and ethically aligned AI models by equipping the dataset curation pipeline with the ability to handle missing and maliciously manipulated inputs.</abstract>
<url hash="1c3f42e7">2024.dash-1.5</url>
<bibkey>nguyen-etal-2024-curatron</bibkey>
<doi>10.18653/v1/2024.dash-1.5</doi>
</paper>
</volume>
</collection>
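Every `<doi>` added in this diff follows one pattern: the ACL Anthology DOI prefix `10.18653/v1/` concatenated with the paper's Anthology ID, which is exactly the text of its `<url>` element (e.g. `2024.dash-1.1` becomes `10.18653/v1/2024.dash-1.1`). A minimal sketch of how such entries could be generated is below; the `ElementTree`-based helper is an illustrative assumption, not the Anthology project's actual tooling:

```python
import xml.etree.ElementTree as ET

DOI_PREFIX = "10.18653/v1/"  # ACL Anthology DOI prefix, as seen in the diff

def add_missing_dois(volume: ET.Element) -> int:
    """Append a <doi> to every <paper> that has a <url> but no <doi> yet.

    The DOI is derived from the Anthology ID stored in <url>, matching the
    pattern of this commit. Returns the number of <doi> elements added.
    """
    added = 0
    for paper in volume.iter("paper"):
        if paper.find("doi") is not None:
            continue  # DOI already present; leave the record untouched
        url = paper.find("url")
        if url is None or not url.text:
            continue  # no Anthology ID to derive a DOI from
        doi = ET.SubElement(paper, "doi")
        doi.text = DOI_PREFIX + url.text
        added += 1
    return added

# Tiny fragment mirroring one entry from the diff above
xml = """<volume id="1">
  <paper id="1">
    <url hash="48327f7a">2024.dash-1.1</url>
    <bibkey>qian-etal-2024-ape</bibkey>
  </paper>
</volume>"""
root = ET.fromstring(xml)
print(add_missing_dois(root))       # 1
print(root.find("paper/doi").text)  # 10.18653/v1/2024.dash-1.1
```

Skipping papers that already carry a `<doi>` makes the pass idempotent, which matches the "N additions and 0 deletions" shape of this commit.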