From 944b877523d30583ae77de8d666ceb979356daf0 Mon Sep 17 00:00:00 2001
From: nataliaElv
Date: Fri, 22 Nov 2024 12:09:38 +0100
Subject: [PATCH] Updated images and banners

---
 chapters/en/chapter10/1.mdx |  7 +++++++
 chapters/en/chapter10/2.mdx |  9 ++++++++-
 chapters/en/chapter10/3.mdx | 11 +++++++++--
 chapters/en/chapter10/4.mdx |  9 +++++++--
 chapters/en/chapter10/5.mdx |  9 ++++++++-
 chapters/en/chapter10/6.mdx |  7 ++++++-
 chapters/en/chapter10/7.mdx |  2 +-
 7 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/chapters/en/chapter10/1.mdx b/chapters/en/chapter10/1.mdx
index ec0239e1b..4301620cc 100644
--- a/chapters/en/chapter10/1.mdx
+++ b/chapters/en/chapter10/1.mdx
@@ -1,9 +1,16 @@
 # Introduction to Argilla[[introduction-to-argilla]]
 
+
+
 In Chapter 5 you learnt how to build a dataset using the 🤗 Datasets library and in Chapter 6 you explored how to fine-tune models for some common NLP tasks. In this chapter, you will learn how to use [Argilla](https://argilla.io) to **annotate and curate datasets** that you can use to train and evaluate your models.
 
 The key to training models that perform well is to have high-quality data. Although there are some good datasets in the Hub that you could use to train and evaluate your models, these may not be relevant for your specific application or use case. In this scenario, you may want to build and curate a dataset of your own. Argilla will help you to do this efficiently.
 
+Argilla sign in page.
+
 With Argilla you can:
 
 - turn unstructured data into **structured data** to be used in NLP tasks.
diff --git a/chapters/en/chapter10/2.mdx b/chapters/en/chapter10/2.mdx
index 7baaf1dd7..fc4739d3a 100644
--- a/chapters/en/chapter10/2.mdx
+++ b/chapters/en/chapter10/2.mdx
@@ -1,4 +1,11 @@
-# Set up your Argilla instance
+# Set up your Argilla instance[[set-up-your-argilla-instance]]
+
+
 
 To start using Argilla, you will need to set up your own Argilla instance first.
 Then you will need to install the Python SDK so that you can manage Argilla using Python code.
diff --git a/chapters/en/chapter10/3.mdx b/chapters/en/chapter10/3.mdx
index 55b40192c..3d8a5153b 100644
--- a/chapters/en/chapter10/3.mdx
+++ b/chapters/en/chapter10/3.mdx
@@ -1,4 +1,11 @@
-# Load your dataset to Argilla
+# Load your dataset to Argilla[[load-your-dataset-to-argilla]]
+
+
 
 Depending on the NLP task that you're working with and the specific use case or application, your data and the annotation task will look differently. For this section of the course, we'll use [a dataset collecting news](https://huggingface.co/datasets/SetFit/ag_news) to complete two tasks: a text classification on the topic of each text and a token classification to identify the named entities mentioned.
 
@@ -33,7 +40,7 @@ We can now think about the settings of our dataset in Argilla. These represent t
 from datasets import load_dataset
 
 data = load_dataset("SetFit/ag_news", split="train")
-data.features()
+data.features
 ```
 
 These are the features of our dataset:
diff --git a/chapters/en/chapter10/4.mdx b/chapters/en/chapter10/4.mdx
index 710d0ce39..59c845502 100644
--- a/chapters/en/chapter10/4.mdx
+++ b/chapters/en/chapter10/4.mdx
@@ -1,4 +1,9 @@
-# Annotate your dataset
+# Annotate your dataset[[annotate-your-dataset]]
+
+
 
 Now it is time to start working from the Argilla UI to annotate our dataset.
 
@@ -25,7 +30,7 @@ Sometimes, you want to have more than one submitted response per record, for exa
 
 When you open your dataset, you will realize that the first question is already filled in with some suggested labels. That's because in the previous section we mapped our question called `label` to the `label_text` column in the dataset, so that we simply need to review and correct the already existing labels:
 
-Screenshot of the dataset in Argilla.
+Screenshot of the dataset in Argilla.
 
 For the token classification, we'll need to add all labels manually, as we didn't include any suggestions.
 This is how it might look after the span annotations:
diff --git a/chapters/en/chapter10/5.mdx b/chapters/en/chapter10/5.mdx
index d889d51a7..7a3b01f1b 100644
--- a/chapters/en/chapter10/5.mdx
+++ b/chapters/en/chapter10/5.mdx
@@ -1,4 +1,11 @@
-# Use your annotated dataset
+# Use your annotated dataset[[use-your-annotated-dataset]]
+
+
 
 We will learn now how to export and use the annotated data that we have in Argilla.
 
diff --git a/chapters/en/chapter10/6.mdx b/chapters/en/chapter10/6.mdx
index d5b061832..a65b9e3c8 100644
--- a/chapters/en/chapter10/6.mdx
+++ b/chapters/en/chapter10/6.mdx
@@ -1,4 +1,9 @@
-# Argilla, check!
+# Argilla, check![[argilla-check]]
+
+
 
 That's all! Congrats! 👏
 
diff --git a/chapters/en/chapter10/7.mdx b/chapters/en/chapter10/7.mdx
index 90b923e8b..1e0b82173 100644
--- a/chapters/en/chapter10/7.mdx
+++ b/chapters/en/chapter10/7.mdx
@@ -3,7 +3,7 @@
 # End-of-chapter quiz[[end-of-chapter-quiz]]