Update README.md
mauriceweber authored Nov 20, 2023
1 parent c2d4707 commit 26c5417
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions README.md
@@ -60,6 +60,12 @@ have a docker and apptainer installation.
The pipeline is composed of three steps, namely 1) preparing artifacts, 2) computing quality signals, and 3)
deduplication.

**Important:** If you are not running steps (1) and (2) with the provided scripts (i.e., in docker containers built with the provided Dockerfile), make sure to set the `PYTHONHASHSEED` environment variable to a fixed value (e.g., 42) using
```bash
export PYTHONHASHSEED=42
```
This keeps the hash functions used in the computation of DSIR weights consistent across runs and processes.
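
The effect of the variable can be checked with a minimal sketch like the one below. It assumes the hashing in question goes through Python's built-in `hash()` on strings, which is randomized per interpreter process unless `PYTHONHASHSEED` is set. The helper name `string_hash_in_subprocess` is hypothetical and not part of this repository.

```python
import os
import subprocess
import sys
from typing import Optional


def string_hash_in_subprocess(text: str, seed: Optional[str]) -> int:
    """Return hash(text) as computed by a freshly started Python interpreter.

    Hypothetical helper for illustration only. If `seed` is None, the child
    process runs without PYTHONHASHSEED, so string hashing is randomized.
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)  # start from a clean state
    if seed is not None:
        env["PYTHONHASHSEED"] = seed
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    # Same seed in two separate processes: the hashes agree.
    first = string_hash_in_subprocess("example ngram", "42")
    second = string_hash_in_subprocess("example ngram", "42")
    print("seeded runs agree:", first == second)

    # No seed: the two hashes will almost certainly differ.
    a = string_hash_in_subprocess("example ngram", None)
    b = string_hash_in_subprocess("example ngram", None)
    print("unseeded runs agree:", a == b)
```

Note that the interpreter reads `PYTHONHASHSEED` at startup, so assigning `os.environ["PYTHONHASHSEED"]` inside an already running script does not change its hashing. The variable has to be exported before Python is launched.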

### 1. Create Artifacts

This part of the pipeline creates the artifacts that are used in the subsequent steps. This includes building quality