add callouts, fix exercises, formatting
fredjaya committed Sep 24, 2024
1 parent ebcca62 commit 2b433b8
Showing 3 changed files with 113 additions and 30 deletions.
13 changes: 13 additions & 0 deletions docs/part2/01_salmon_idx.md
@@ -62,6 +62,12 @@ It contains:
* The empty `output:` block for us to define the output data for the process.
* The `script:` block prefilled with the command that will be executed.

!!! info

The process [`script`](https://www.nextflow.io/docs/latest/process.html#script)
block is executed as a Bash script by default. In Part 2 of the workshop, we will
only be using Nextflow variables within the `script` block.
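To make "Nextflow variables within the `script` block" concrete, the plain-Groovy sketch below (run with `groovy`, not `nextflow`) mimics the interpolation step. `${transcriptome}` is filled in with the input's value before the text is run by Bash, while an escaped `\$` is left for Bash to expand. The value `transcriptome.fa` and the `salmon index` command line are hypothetical stand-ins, not the exact command from `00_index.sh`.

```groovy
// Plain-Groovy sketch (not Nextflow itself) of how a process `script`
// block is rendered into Bash text before it runs.

// Hypothetical value bound to the process input `transcriptome`
def transcriptome = 'transcriptome.fa'

// Triple-quoted GString: the transcriptome variable is filled in by Groovy,
// while the escaped HOME variable stays literal so Bash can expand it later.
String renderedScript = """
salmon index -t ${transcriptome} -i salmon_index
echo "Bash expands \$HOME itself"
"""

// Show the Bash text that would be executed
println renderedScript.trim()
```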

Next, we will edit the `input` and `output` definitions to match the specific
data and results for this process. In the `00_index.sh` script, the relevant
information is:
@@ -285,6 +291,13 @@ params.transcriptome_file = "$projectDir/data/ggal/transcriptome.fa"

We will use [`$projectDir`](https://www.nextflow.io/docs/latest/script.html#configuration-implicit-variables) to refer to the directory of the `main.nf` script. Nextflow defines this implicit variable as the directory where `main.nf` is located.
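As a minimal sketch of how the default path string is assembled, the plain-Groovy snippet below replaces Nextflow's implicit `projectDir` with a hypothetical, hard-coded `/home/user1/part2` directory. In a real run, Nextflow supplies this value itself.

```groovy
import java.nio.file.Path
import java.nio.file.Paths

// Hypothetical stand-in for Nextflow's implicit projectDir (the directory
// containing main.nf); Nextflow sets the real value when the workflow runs.
Path projectDir = Paths.get('/home/user1/part2')

// Same interpolation as the params definition: the Path is rendered as
// text and the transcriptome's location relative to it is appended.
String transcriptomeFile = "$projectDir/data/ggal/transcriptome.fa"

println transcriptomeFile
```

Because the default is anchored to the directory of `main.nf`, it resolves to the same file regardless of where the workflow is launched from, and it can still be overridden at launch, for example with `nextflow run main.nf --transcriptome_file <path>`.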

!!! info "The `params` and `process` names do not need to match!"

In the `INDEX` process, we defined the input as a path called `transcriptome`, whereas
the parameter is called `transcriptome_file`. The names do not need to match because
they are used in different scopes (the `INDEX` process scope and the `workflow` scope,
respectively).
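As an analogy only (plain Groovy, not Nextflow syntax), the same scoping rule applies to an ordinary method. The hypothetical `index` method calls its argument `transcriptome`, while the caller keeps the value under `transcriptome_file` in a map named `params`.

```groovy
// Analogy only: a plain-Groovy method standing in for the INDEX process.
// Inside the method, the input is known only by its own name.
String index(String transcriptome) {
    return "indexing ${transcriptome}"
}

// Caller (workflow-like) scope: the value is stored under a different name
def params = [transcriptome_file: 'data/ggal/transcriptome.fa']

// Arguments bind by position, so the caller's name does not need to match
def result = index(params.transcriptome_file)
println result
```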

Recall that [parameters](https://www.nextflow.io/docs/latest/module.html#module-parameters)
are inputs and options that can be customised when the workflow is
executed. They allow you to control things like file paths and options for
126 changes: 96 additions & 30 deletions docs/part2/02_fastqc.md
@@ -293,9 +293,9 @@ definition of `tuple val(sample_id), path(reads_1), path(reads_2)`:
[gut, /home/setup2/hello-nextflow/part2/data/ggal/gut_1.fq, /home/setup2/hello-nextflow/part2/data/ggal/gut_2.fq]
```

!!! quote "Checkpoint"
!!! quote "How's it going?"

Zoom react Y/N
Once you have run the workflow, select the **"Yes"** react on Zoom.

Next, we need to assign the channel we created to a variable so it can be passed to the `FASTQC`
process. Assign it to a variable called `reads_in`, and remove the `.view()`
@@ -363,45 +363,111 @@ executor > local (1)
If you inspect `results/fastqc_gut_logs`, you will find an `.html` and a `.zip` file
for each of the `.fastq` files.

> Need to revisit the Advanced exercise
!!! example "Advanced exercise"

??? example "Advanced exercise"
This advanced exercise walks through inspecting the output of the intermediate
operators in the `reads_in` channel:

Inspect what the `.fromPath()` and `.splitCsv()` commands do by using `.view()`
- `Channel.fromPath`
- `.splitCsv`

The current workflow block should look like:

```groovy title="main.nf"
// Define the workflow
workflow {
Channel
.fromPath(params.reads)
.view()

index_ch = INDEX(params.transcriptome_file)
```

```console title="Output"
Launching `main.nf` [hungry_lalande] DSL2 - revision: 587b5b70d1

[de/fef8c4] INDEX [100%] 1 of 1, cached: 1 ✔
/home/setup2/hello-nextflow/part2/data/samplesheet.csv
// Run the index step with the transcriptome parameter
INDEX(params.transcriptome_file)

```

```groovy title="main.nf"
workflow {
Channel
.fromPath(params.reads)
// Define the fastqc input channel
reads_in = Channel.fromPath(params.reads)
.splitCsv(header: true)
.view()
.map { row -> [row.sample, file(row.fastq_1), file(row.fastq_2)] }

index_ch = INDEX(params.transcriptome_file)
// Run the fastqc step with the reads_in channel
FASTQC(reads_in)
}
```

**`Channel.fromPath`**

1. In the workflow scope, comment out the lines for `.splitCsv`, `.map`, and `FASTQC()`
2. Add `.view()` on the line after `Channel.fromPath` and before the commented `.splitCsv`
3. Run the workflow with `-resume`

??? note "Solution"

```groovy title="main.nf" hl_lines="9-11 14"
// Define the workflow
workflow {
// Run the index step with the transcriptome parameter
INDEX(params.transcriptome_file)
// Define the fastqc input channel
reads_in = Channel.fromPath(params.reads)
.view()
//.splitCsv(header: true)
//.map { row -> [row.sample, file(row.fastq_1), file(row.fastq_2)] }
// Run the fastqc step with the reads_in channel
//FASTQC(reads_in)

}
```
The `Channel.fromPath(params.reads)` step emits the path to the samplesheet:

```console title="Output"
Launching `main.nf` [hungry_lalande] DSL2 - revision: 587b5b70d1
[de/fef8c4] INDEX [100%] 1 of 1, cached: 1 ✔
/home/user1/part2/data/samplesheet.csv
```

**`.splitCsv`**

1. In the workflow scope, *un*comment the line for `.splitCsv`
2. Move `.view()` to the line after `.splitCsv` (before the commented `.map` line)
3. Run the workflow with `-resume`

```console title="Output"
Launching `main.nf` [tiny_yonath] DSL2 - revision: 22c2c9d28f
[de/fef8c4] INDEX | 1 of 1, cached: 1 ✔
[sample:gut, fastq_1:data/ggal/gut_1.fq, fastq_2:data/ggal/gut_2.fq]

```
??? note "Solution"

```groovy title="main.nf" hl_lines="9-10"
// Define the workflow
workflow {
// Run the index step with the transcriptome parameter
INDEX(params.transcriptome_file)
// Define the fastqc input channel
reads_in = Channel.fromPath(params.reads)
.splitCsv(header: true)
.view()
//.map { row -> [row.sample, file(row.fastq_1), file(row.fastq_2)] }
// Run the fastqc step with the reads_in channel
//FASTQC(reads_in)

}
```

`.splitCsv(header: true)` reads the samplesheet at the path emitted by `.fromPath` and emits each data row as a map.
Each value in the map is keyed by the header of its column in the samplesheet:

```console title="Output"
Launching `main.nf` [tiny_yonath] DSL2 - revision: 22c2c9d28f
[de/fef8c4] INDEX | 1 of 1, cached: 1 ✔
[sample:gut, fastq_1:data/ggal/gut_1.fq, fastq_2:data/ggal/gut_2.fq]
```
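A plain-Groovy sketch of what `.splitCsv(header: true)` emits, assuming the samplesheet holds a header line and the single `gut` row suggested by the output above. It splits naively on commas, so unlike Nextflow's operator it does not handle quoted fields or other `splitCsv` options.

```groovy
// Plain-Groovy sketch (not Nextflow's implementation) of .splitCsv(header: true):
// each data row becomes a map whose keys come from the header line.

// Assumed samplesheet contents (inferred from the output above)
def csvText = '''sample,fastq_1,fastq_2
gut,data/ggal/gut_1.fq,data/ggal/gut_2.fq'''

def lines = csvText.readLines()
def header = lines.head().split(',') as List

def rows = lines.tail().collect { line ->
    def values = line.split(',') as List
    // Pair each column name with the value in the same column
    [header, values].transpose().collectEntries { pair -> [pair[0], pair[1]] }
}

// Prints: [sample:gut, fastq_1:data/ggal/gut_1.fq, fastq_2:data/ggal/gut_2.fq]
rows.each { row -> println row }
```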

The `.map` step then takes each row map and formats it into the tuple emitted by `reads_in`.
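Continuing in plain Groovy, the closure below has the same shape as the workflow's `.map` step. The `file()` method here is a hypothetical stand-in for Nextflow's `file()` helper that simply builds an absolute `java.nio.file.Path` (matching the absolute paths in the earlier output); it is not Nextflow's implementation.

```groovy
import java.nio.file.Path
import java.nio.file.Paths

// Hypothetical stand-in for Nextflow's file() helper: it only converts the
// string to an absolute Path and does not check that the file exists.
Path file(String path) {
    return Paths.get(path).toAbsolutePath()
}

// One row map, as emitted by .splitCsv(header: true)
def row = [sample: 'gut', fastq_1: 'data/ggal/gut_1.fq', fastq_2: 'data/ggal/gut_2.fq']

// Same closure shape as the workflow's .map step
def toReadsTuple = { r -> [r.sample, file(r.fastq_1), file(r.fastq_2)] }

def readsTuple = toReadsTuple(row)
println readsTuple
```

The sample name stays a plain value, matching `val(sample_id)` in the `FASTQC` input definition, while the two FASTQ locations become paths, matching `path(reads_1)` and `path(reads_2)`.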

Before proceeding, make sure to *un*comment the `.map` and `FASTQC` lines and remove `.view()`.


!!! abstract "Summary"

4 changes: 4 additions & 0 deletions docs/part2/03_quant.md
@@ -189,6 +189,10 @@ process QUANTIFICATION {

You have just defined a process with multiple inputs!

!!! quote "How's it going?"

Once you have defined the `process` block, select the **"Yes"** react on Zoom.

### 5. Call the process in the `workflow` scope

Recall that the inputs for the `QUANTIFICATION` process are emitted by the