Consider not providing the names for top_markers to the model #2

maxim-h · 2024-07-29T16:50:00Z

Hello!

I was curious how your framework would perform on our data, so the first thing I did was retrieve the markers of cell populations that had already been manually annotated. I was very happy with the performance until I read the Reasons and realized that the model seemed to be making inferences not fully explained by the markers provided.

This led me to realize that the names of top_genes are used when the prompt is created. So if the markers were identified based on a pre-existing annotation, the model will see those labels in the prompt.

ceLLama/R/ceLLama.R

Lines 25 to 29 in 2af357a

    
    annotation_data <- lapply(names(top_genes), function(cluster) {
      up_genes <- paste(top_genes[[cluster]]$up, collapse = ", ")
      down_genes <- paste(top_genes[[cluster]]$down, collapse = ", ")
      prompt <- paste(
        "This cell cluster (", cluster, ") has up-regulated genes:", up_genes,

Perhaps it might be better to anonymize the cluster names within the function or point out in the tutorial that the marker.list must have anonymized names.
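To make the suggestion concrete, here is a minimal sketch of what anonymizing the names could look like on the user side, using only base R. The helper `anonymize_cluster_names()` and the toy gene lists are hypothetical and not part of ceLLama. It replaces the list names with neutral IDs and keeps a lookup table so the original labels can be compared with the predictions afterwards.

```r
# Hypothetical helper (not part of ceLLama): replace list names with neutral
# IDs and keep a lookup table to restore the original labels afterwards.
anonymize_cluster_names <- function(marker_list, prefix = "cluster_") {
  original <- names(marker_list)
  if (is.null(original)) {
    # Unnamed list: fall back to positional labels.
    original <- as.character(seq_along(marker_list))
  }
  neutral <- paste0(prefix, seq_along(marker_list))
  names(marker_list) <- neutral
  list(
    markers = marker_list,
    key = data.frame(id = neutral, label = original, stringsAsFactors = FALSE)
  )
}

# Toy input standing in for a marker list split by cluster (made-up content).
toy <- list(
  "CD8 T cells" = data.frame(gene = c("CD8A", "GZMK")),
  "B cells"     = data.frame(gene = c("MS4A1", "CD79A"))
)

anon <- anonymize_cluster_names(toy)
names(anon$markers)  # "cluster_1" "cluster_2"
anon$key             # maps each neutral ID back to its manual label
```

Under this assumption, `anon$markers` would be the list passed on for annotation, and `anon$key` would only be consulted afterwards to check the predictions against the manual labels.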


eonurk · 2024-07-29T17:19:25Z

I am not sure I follow, but that might also be the insomnia.

maxim-h · 2024-07-29T17:31:18Z

Hehe, no worries.

We have a Seurat object with manually annotated cell types in the field Manual_annotation.
Here's what I did:

library(Seurat)
library(ceLLama)

## Use the manual annotation as the active identity.
Idents(seurat) <- "Manual_annotation"

markers <- FindAllMarkers(seurat, min.pct = .5)
markers.list <- split(markers, markers$cluster)
## At this point `names(markers.list)` holds the manually annotated cluster names,
## which are inserted into the prompt in the lines that I linked in the last message.

res <- ceLLama(markers.list, temperature = 0, seed = 101, n_genes = 30)

As a result, the prompt contains our manual annotation, so the model just comes up with a "random" justification for whatever labels we already provided to it.
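The leak can be reproduced without a Seurat object or a model call. The sketch below is only an illustration. The `top_genes` content is made up, and the `paste()` call mirrors just the first part of the prompt visible in the snippet linked earlier, not the full prompt that ceLLama builds.

```r
# Toy stand-in for the list that reaches the prompt-building code; the gene
# lists are made up for illustration.
top_genes <- list(
  "Regulatory T cells" = list(up = c("FOXP3", "IL2RA"), down = c("CD8A"))
)

prompts <- vapply(names(top_genes), function(cluster) {
  up_genes <- paste(top_genes[[cluster]]$up, collapse = ", ")
  # Mirrors only the visible start of the prompt in the linked snippet.
  paste("This cell cluster (", cluster, ") has up-regulated genes:", up_genes)
}, character(1))

# TRUE: the manual label is part of the text the model would see.
grepl("Regulatory T cells", prompts[[1]], fixed = TRUE)
```

Any list name ends up verbatim in the prompt text, so a descriptive name acts as a hint regardless of which genes follow it.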

eonurk · 2024-07-29T18:04:42Z

Oh I see. That's cheating! 😄 I will think about this, maybe overriding could be an option.
