Probe Seek is a tool for quickly finding viable probes for any DNA sequence. Want to run an experiment like probe hybridisation capture on a specific region of DNA, but can't be bothered to spend time manually BLASTing sequences and checking Tm and sequence quality? Here is a script that rapidly generates candidate probes for target enrichment.
- Customisable: simply input your sequence of interest (tested up to 10 kb, though much larger inputs will likely work) and adjust the parameters: probe length, distance between probes, ideal Tm, allowed GC content, homopolymer length and more!
- Probe Seek breaks your sequence into small chunks (default: 50 bp with 25 bp overlap between chunks) and runs blastn locally to identify all viable regions that are unique or semi-unique. This requires a local BLAST setup (follow directions online or ask ChatGPT to set this up on your computer).
- Once all unique regions are mapped (green), probes are designed by scanning across your viable sequences, adjusting length and spacing to meet your desired probe parameters. The script automatically alternates between forward probes and probes that are the reverse complement of your target.
- The final probes are exported, along with a pretty little chunk uniqueness map showing your probes.
Map colours:
- Green = unique (ideal; the script checks Tm, GC, etc.)
- Orange = 2-3 instances in the reference genome (the script avoids these within 30 bp of the 3' end of a probe, and they must make up <50% of the probe sequence)
- Red = >3 instances (script avoids)
- Purple = 0 instances (script avoids); this happens if you have a deletion/indel or the wrong reference genome
- Forward probes = black
- Reverse probes = blue
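The chunking and colour coding described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names `chunk_sequence` and `classify_chunk` are hypothetical, and dropping a trailing partial chunk is an assumption made here for simplicity.

```python
def chunk_sequence(seq, size=50, overlap=25):
    """Yield (start, chunk) pairs; consecutive chunks share `overlap` bases.

    Assumption: trailing bases that do not fill a whole chunk are dropped.
    """
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than size")
    last_start = max(len(seq) - size, 0)
    for start in range(0, last_start + 1, step):
        yield start, seq[start:start + size]


def classify_chunk(hit_count):
    """Map the number of reference-genome hits for a chunk to a map colour."""
    if hit_count == 0:
        return "purple"  # not found in the reference (avoided)
    if hit_count == 1:
        return "green"   # unique (ideal)
    if hit_count <= 3:
        return "orange"  # 2-3 instances (restricted use)
    return "red"         # more than 3 instances (avoided)


if __name__ == "__main__":
    for start, chunk in chunk_sequence("ACGT" * 25):
        print(start, len(chunk))
```

With the defaults, each new chunk starts 25 bp after the previous one, so every interior base is covered by two chunks.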
Blast Search:
- I used the hg38 human genome, but you can BLAST your sequence against any genome you want
- Evalue: 1e-5
- word_size: 11
- Dust: "no"
- Chunk size: 50
- Chunk overlap: 25
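For reference, here is one way the search settings above could be passed to a local BLAST+ `blastn` from Python. It is a sketch only: the script may invoke BLAST differently, the helper names and file paths are hypothetical, and it assumes a BLAST+ nucleotide database has already been built for your genome. Tabular output (`-outfmt 6`) is also an assumption, chosen because it is easy to count hits from.

```python
import subprocess


def build_blastn_command(query_fasta, db_path):
    """Return a blastn argument list using the search settings listed above."""
    return [
        "blastn",
        "-query", query_fasta,   # FASTA file of sequence chunks (hypothetical path)
        "-db", db_path,          # local BLAST database (hypothetical path)
        "-evalue", "1e-5",
        "-word_size", "11",
        "-dust", "no",           # disable low-complexity filtering
        "-outfmt", "6",          # tabular output; an assumption of this sketch
    ]


def run_blastn(query_fasta, db_path):
    """Run blastn and return its stdout; requires BLAST+ on your PATH."""
    result = subprocess.run(
        build_blastn_command(query_fasta, db_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(" ".join(build_blastn_command("chunks.fasta", "hg38_db")))
```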
Probe design:
- Probe length: 100 bp
- Probe spacing: 250 bp
- Probe melting temperature: 75-80 °C, based roughly on the NEB Tm Calculator for Q5 Hot Start polymerase (this can easily be adjusted for your desired kit; just modify mt.Tm_NN)
- GC content: 40-60%
- Max homopolymer length: 5
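The GC and homopolymer limits above could be checked with something like the following. This is a hedged sketch with hypothetical function names, not the script's own filter. It assumes "max homopolymer length: 5" means runs of up to 5 identical bases are allowed, and it leaves out the Tm window, which the script handles through mt.Tm_NN.

```python
def gc_percent(seq):
    """Return the GC content of `seq` as a percentage."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def longest_homopolymer(seq):
    """Return the length of the longest run of a single repeated base."""
    longest = run = 0
    previous = ""
    for base in seq.upper():
        run = run + 1 if base == previous else 1
        longest = max(longest, run)
        previous = base
    return longest


def passes_sequence_filters(seq, gc_min=40.0, gc_max=60.0, max_homopolymer=5):
    """Check a candidate probe against the GC and homopolymer defaults."""
    return (
        gc_min <= gc_percent(seq) <= gc_max
        and longest_homopolymer(seq) <= max_homopolymer
    )


if __name__ == "__main__":
    candidate = "ACGT" * 25  # 100 bp toy probe
    print(gc_percent(candidate), longest_homopolymer(candidate))
    print(passes_sequence_filters(candidate))
```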
If you have trouble designing probes:
- Double check your Tm range (might not be able to find viable sequences in your temperature range)
- Adjust your probe size and spacing
- Check your blastn parameters to make sure sequences are being categorised correctly as unique
- Flip your sequence around and check the reverse complement (this usually finds almost identical probes).
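If you want to flip your input yourself before re-running, a reverse complement takes only a few lines of plain Python. The helper name `reverse_complement` is hypothetical, and this sketch only handles A, C, G and T; any other character (such as N) is left unchanged.

```python
# Translation table for the four standard bases, in upper and lower case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
```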
- J. Alexander Chalk
Happy probing!