Commit

Updated images and banners
nataliaElv committed Nov 22, 2024
1 parent 20f7313 commit 944b877
Showing 7 changed files with 46 additions and 8 deletions.
7 changes: 7 additions & 0 deletions chapters/en/chapter10/1.mdx
@@ -1,9 +1,16 @@
# Introduction to Argilla[[introduction-to-argilla]]

<CourseFloatingBanner
chapter={10}
classNames="absolute z-10 right-0 top-0"
/>

In Chapter 5 you learnt how to build a dataset using the 🤗 Datasets library and in Chapter 7 you explored how to fine-tune models for some common NLP tasks. In this chapter, you will learn how to use [Argilla](https://argilla.io) to **annotate and curate datasets** that you can use to train and evaluate your models.

The key to training models that perform well is to have high-quality data. Although there are some good datasets on the Hub that you could use to train and evaluate your models, these may not be relevant to your specific application or use case. In this scenario, you may want to build and curate a dataset of your own. Argilla will help you do this efficiently.

<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter10/signin-hf-page.png" alt="Argilla sign in page."/>

With Argilla you can:

- turn unstructured data into **structured data** to be used in NLP tasks.
9 changes: 8 additions & 1 deletion chapters/en/chapter10/2.mdx
@@ -1,4 +1,11 @@
# Set up your Argilla instance
# Set up your Argilla instance[[set-up-your-argilla-instance]]

<CourseFloatingBanner chapter={10}
classNames="absolute z-10 right-0 top-0"
notebooks={[
{label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter10/section2.ipynb"},
{label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter10/section2.ipynb"},
]} />

To start using Argilla, you will need to set up your own Argilla instance first. Then you will need to install the Python SDK so that you can manage Argilla using Python code.

11 changes: 9 additions & 2 deletions chapters/en/chapter10/3.mdx
@@ -1,4 +1,11 @@
# Load your dataset to Argilla
# Load your dataset to Argilla[[load-your-dataset-to-argilla]]

<CourseFloatingBanner chapter={10}
classNames="absolute z-10 right-0 top-0"
notebooks={[
{label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter10/section3.ipynb"},
{label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter10/section3.ipynb"},
]} />

Depending on the NLP task that you're working with and the specific use case or application, your data and the annotation task will look different. For this section of the course, we'll use [a dataset collecting news](https://huggingface.co/datasets/SetFit/ag_news) to complete two tasks: a text classification to predict the topic of each text and a token classification to identify the named entities mentioned.

@@ -33,7 +40,7 @@ We can now think about the settings of our dataset in Argilla. These represent t
from datasets import load_dataset

data = load_dataset("SetFit/ag_news", split="train")
data.features()
data.features
```

These are the features of our dataset:
9 changes: 7 additions & 2 deletions chapters/en/chapter10/4.mdx
@@ -1,4 +1,9 @@
# Annotate your dataset
# Annotate your dataset[[annotate-your-dataset]]

<CourseFloatingBanner
chapter={10}
classNames="absolute z-10 right-0 top-0"
/>

Now it is time to start working from the Argilla UI to annotate our dataset.

@@ -25,7 +30,7 @@ Sometimes, you want to have more than one submitted response per record, for exa
When you open your dataset, you will realize that the first question is already filled in with some suggested labels. That's because in the previous section we mapped our question called `label` to the `label_text` column in the dataset, so that we simply need to review and correct the already existing labels:

<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter10/argilla_initial%20dataset.png" alt="Screenshot of the dataset in Argilla."/>
<img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter10/argilla_initial_dataset.png" alt="Screenshot of the dataset in Argilla."/>

For the token classification, we'll need to add all labels manually, as we didn't include any suggestions. This is how it might look after the span annotations:

9 changes: 8 additions & 1 deletion chapters/en/chapter10/5.mdx
@@ -1,4 +1,11 @@
# Use your annotated dataset
# Use your annotated dataset[[use-your-annotated-dataset]]

<CourseFloatingBanner chapter={10}
classNames="absolute z-10 right-0 top-0"
notebooks={[
{label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter10/section5.ipynb"},
{label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter10/section5.ipynb"},
]} />

Now we will learn how to export and use the annotated data that we have in Argilla.

7 changes: 6 additions & 1 deletion chapters/en/chapter10/6.mdx
@@ -1,4 +1,9 @@
# Argilla, check!
# Argilla, check![[argilla-check]]

<CourseFloatingBanner
chapter={10}
classNames="absolute z-10 right-0 top-0"
/>

That's all! Congrats! 👏

2 changes: 1 addition & 1 deletion chapters/en/chapter10/7.mdx
@@ -3,7 +3,7 @@
# End-of-chapter quiz[[end-of-chapter-quiz]]

<CourseFloatingBanner
chapter={9}
chapter={10}
classNames="absolute z-10 right-0 top-0"
/>
