feat: get cds overlap given chromosome, start, + stop #217

Merged: 11 commits, Nov 7, 2023

@korikuzma korikuzma commented Nov 3, 2023

Addresses #216 for 0.1.x. This is needed for the variation manuscript.

Notes:

  • Add class (FeatureOverlap) for getting cds overlap
    • Only supports GRCh38 input
    • Returns VRS Sequence Locations
  • get_mane_summary --> get_mane
    • Now can additionally download MANE RefSeq GFF file
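For readers unfamiliar with GFF files, here is a minimal sketch of how CDS rows could be pulled out of GFF3 text using only the standard library. It is not the code in this PR: `iter_cds_rows`, `SAMPLE_GFF`, the gene name `GENE1`, and all coordinates are hypothetical and exist only for illustration. The only thing taken as given is the GFF3 layout itself (nine tab-separated columns, 1-based inclusive coordinates).

```python
from typing import Dict, Iterator

# The nine tab-separated columns defined by the GFF3 format.
GFF_COLUMNS = [
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
]

# Made-up sample rows (hypothetical gene and coordinates, not real MANE data).
SAMPLE_GFF = (
    "##gff-version 3\n"
    "NC_000001.11\tBestRefSeq\tgene\t100\t500\t.\t+\t.\tID=gene-GENE1;gene=GENE1\n"
    "NC_000001.11\tBestRefSeq\tCDS\t150\t200\t.\t+\t0\tID=cds-1;gene=GENE1\n"
)


def iter_cds_rows(gff_text: str) -> Iterator[Dict[str, object]]:
    """Yield one dict per CDS row; coordinates stay 1-based and inclusive."""
    for line in gff_text.splitlines():
        # Skip blank lines, directives, and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != len(GFF_COLUMNS):
            continue  # malformed row
        row = dict(zip(GFF_COLUMNS, fields))
        if row["type"] != "CDS":
            continue
        # Attributes are ';'-separated key=value pairs.
        attrs = dict(
            item.split("=", 1)
            for item in row["attributes"].split(";")
            if "=" in item
        )
        yield {
            "seqid": row["seqid"],
            "start": int(row["start"]),
            "end": int(row["end"]),
            "gene": attrs.get("gene"),
        }


cds_rows = list(iter_cds_rows(SAMPLE_GFF))
```

With the sample above, only the CDS row survives the filter; the gene row is skipped.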

@korikuzma korikuzma added enhancement New feature or request priority:high High priority labels Nov 3, 2023
@korikuzma korikuzma self-assigned this Nov 3, 2023
identifier: Optional[str] = None,
residue_mode: ResidueMode = ResidueMode.RESIDUE,
) -> Optional[Dict]:
"""Get feature overlap for GRCh38 genomic data

Would someone (Kori? @austinant?) be willing -- for my sake -- to add a descriptive sentence or two here? I can also paraphrase something from the manuscript if there's a specific relevant section

(still a mess but hopefully more descriptive)

For input CNV, specified as a sequence location by chromosome, start, stop, this function returns all coding exons (CDS regions) with which the variant has nonzero base pair overlap. Result is a dictionary keyed by genes which overlap with the input variant, mapping to a list of the overlapping exons in each gene with the beginning and end of the variant's overlap with each.
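The description above can be sketched roughly as follows. This is a simplified illustration, not the implementation in this PR: it assumes coordinates are already inter-residue (0-based, half-open), represents regions as plain tuples rather than VRS Sequence Locations, and uses hypothetical names (`interval_overlap`, `cds_overlap_by_gene`, `GENE_A`, `GENE_B`) and made-up coordinates.

```python
from typing import Dict, List, Optional, Tuple

# Inter-residue (0-based, half-open) interval: (start, end).
Interval = Tuple[int, int]


def interval_overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the shared interval of a and b, or None if they share no bases."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    # start == end means the intervals only touch, i.e. zero base pair overlap.
    return (start, end) if start < end else None


def cds_overlap_by_gene(
    query: Interval, cds_by_gene: Dict[str, List[Interval]]
) -> Dict[str, List[Dict[str, Interval]]]:
    """Map each overlapping gene to its overlapping CDS regions and overlaps."""
    result: Dict[str, List[Dict[str, Interval]]] = {}
    for gene, regions in cds_by_gene.items():
        hits = []
        for cds in regions:
            overlap = interval_overlap(query, cds)
            if overlap is not None:
                hits.append({"cds": cds, "overlap": overlap})
        if hits:  # genes with no overlapping CDS are left out entirely
            result[gene] = hits
    return result


# Hypothetical CDS regions on a single chromosome.
example_cds = {
    "GENE_A": [(100, 200), (300, 400)],
    "GENE_B": [(1000, 1100)],
}
example_result = cds_overlap_by_gene((150, 350), example_cds)
```

Here `example_result` contains only `GENE_A`, with overlaps `(150, 200)` and `(300, 350)`. A query of `(200, 300)` touches both `GENE_A` regions without sharing a base, so it yields an empty dictionary.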

@korikuzma korikuzma Nov 4, 2023

@jsstevenson updated. Let me know if that clears things up for you

@austinant how about

        """Given GRCh38 genomic data, find the overlapping MANE features (gene and cds).
        The genomic data is specified as a sequence location by `chromosome`, `start`,
        `end`. All CDS regions with which the input sequence location has nonzero base
        pair overlap will be returned.

        :return: MANE feature (gene/cds) overlap data represented as a dict. The
            dictionary will be keyed by genes which overlap the input sequence location.
            Each gene contains a list of the overlapping CDS regions with the beginning
            and end of the input sequence location's overlap with each
            {
                gene: [
                    {
                        'cds': VRS Sequence Location,
                        'overlap': VRS Sequence Location
                    }
                ]
            }
        """

vrs seq loc, inter-residue, docstring
@korikuzma

Need to look over this with fresh brain/eyes before re-requesting reviews

@korikuzma korikuzma requested a review from ahwagner November 6, 2023 12:09
@korikuzma

> Need to look over this with fresh brain/eyes before re-requesting reviews

Ready for review!

@wesleygoar

Good catch @ahwagner

@korikuzma

> Good catch @ahwagner

We have to keep @ahwagner on his toes

@korikuzma korikuzma merged commit 26f57a1 into 0.1.x Nov 7, 2023
2 checks passed
@korikuzma korikuzma deleted the issue-216-1-x branch November 7, 2023 13:04
5 participants