For CodonPartition, we error if the sequence length does not match the partition; for other DiscretePartitions we don't (a warning is commented out).
Optimization
If we decide to enforce that the sequence length matches the partition, some optimizations become possible, since we can then modify the partition in place. Here is my suggestion for such an optimization for CodonPartition (the idea is applicable to other DiscretePartitions):
function obs2partition!(dest::CodonPartition, seq::String; code = universal_code)
    problem_codons = String[]
    if mod(length(seq), 3) != 0
        error("Codon sequences must be divisible by 3")
    end
    if length(seq) / 3 != dest.sites
        error("Sequence length does not match partition")
    end
    @views for j in axes(dest.state, 2)
        c = seq[3*(j-1)+1:3*(j-1)+3]
        cod_ind = get(code.string2sense, c, -1)
        if cod_ind == -1
            fill!(dest.state[:, j], 1.0)
            push!(problem_codons, c)
        else
            fill!(dest.state[:, j], 0.0)
            dest.state[cod_ind, j] = 1.0
        end
    end
    return countmap(problem_codons)
end
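To illustrate how the same in-place pattern might carry over to a non-codon discrete partition, here is a minimal, self-contained sketch. It is not MolecularEvolution.jl code: `ToyPartition`, `toy_obs2partition!`, and the `nuc_index` lookup are hypothetical stand-ins. It assumes one character per site, and that the scaling vector can be reset in place rather than reallocated. Unlike the function above, it returns the unrecognized characters as a plain vector instead of a `countmap`, to avoid a dependency.

```julia
# Hypothetical stand-in for a discrete partition; not the MolecularEvolution.jl type.
struct ToyPartition
    state::Matrix{Float64}    # states × sites
    scaling::Vector{Float64}  # one entry per site
end

ToyPartition(states::Int, sites::Int) = ToyPartition(zeros(states, sites), zeros(sites))

# Assumed lookup from character to row index (illustrative only).
const nuc_index = Dict('A' => 1, 'C' => 2, 'G' => 3, 'T' => 4)

function toy_obs2partition!(dest::ToyPartition, seq::String; dic = nuc_index)
    # Enforce the length match, mirroring the CodonPartition check.
    if length(seq) != size(dest.state, 2)
        error("Sequence length does not match partition")
    end
    problem_chars = Char[]
    @views for (j, c) in enumerate(seq)
        ind = get(dic, c, -1)
        if ind == -1
            # Unknown character: mark every state as possible at this site.
            fill!(dest.state[:, j], 1.0)
            push!(problem_chars, c)
        else
            fill!(dest.state[:, j], 0.0)
            dest.state[ind, j] = 1.0
        end
    end
    # Reset scaling in place instead of allocating zeros(length(seq)).
    fill!(dest.scaling, 0.0)
    return problem_chars
end

part = ToyPartition(4, 5)
problems = toy_obs2partition!(part, "ACGTN")
```

Because the length check guarantees that `dest.state` and `dest.scaling` already have the right size, nothing in the loop needs to allocate a new array.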
And here is a benchmark of this suggestion:
# Simulate a sequence
function sim_seq(n)
    nucleotides = collect(MolecularEvolution.nucstring)
    return join([rand(nucleotides) for _ = 1:(3*n)])
end

n = 100_000
seq = sim_seq(n)
cpart = CodonPartition(n)
@benchmark obs2partition!(cpart, seq)
Main
BenchmarkTools.Trial: 387 samples with 1 evaluation.
Range (min … max): 11.382 ms … 15.878 ms ┊ GC (min … max): 0.00% … 10.13%
Time (median): 12.950 ms ┊ GC (median): 9.26%
Time (mean ± σ): 12.944 ms ± 542.715 μs ┊ GC (mean ± σ): 8.53% ± 2.75%
▄▄▇█ ▃▁▄▂▃
▃▁▃▃▄▃▂▃▃▄▂▄▁▁▁▁▃▁▁▃▁▃▃▇▅▄▄▆████▇█████▇▇▆▄▃▄▄▃▂▃▂▃▄▁▂▃▁▁▁▁▁▂ ▃
11.4 ms Histogram: frequency by time 14.3 ms <
Memory estimate: 54.29 MiB, allocs estimate: 200012.
Suggestion
BenchmarkTools.Trial: 1461 samples with 1 evaluation.
Range (min … max): 3.287 ms … 4.815 ms ┊ GC (min … max): 0.00% … 28.54%
Time (median): 3.412 ms ┊ GC (median): 0.00%
Time (mean ± σ): 3.420 ms ± 79.042 μs ┊ GC (mean ± σ): 0.13% ± 1.57%
▄▅█▇▅▃▃▃▁
▂▁▁▁▂▁▁▁▁▁▁▂▁▁▁▂▂▂▄▃▄▅█████████▇▅▅▄▃▃▂▂▂▂▂▂▁▁▂▂▂▁▁▁▁▂▁▁▁▂▂ ▃
3.29 ms Histogram: frequency by time 3.56 ms <
Memory estimate: 230.77 KiB, allocs estimate: 4624.
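As a rough way to read the two trials side by side, the ratios can be computed from the reported figures. This is only arithmetic on the numbers printed above (median time, memory estimate, allocation count). The variable names are illustrative.

```julia
# Figures copied from the two BenchmarkTools trials above.
median_main_ms = 12.950
median_suggestion_ms = 3.412

mem_main_kib = 54.29 * 1024   # 54.29 MiB expressed in KiB
mem_suggestion_kib = 230.77

allocs_main = 200012
allocs_suggestion = 4624

speedup = median_main_ms / median_suggestion_ms
mem_ratio = mem_main_kib / mem_suggestion_kib
alloc_ratio = allocs_main / allocs_suggestion

println("median speedup:   ", round(speedup, digits = 1), "x")
println("memory reduction: ", round(mem_ratio, digits = 1), "x")
println("alloc reduction:  ", round(alloc_ratio, digits = 1), "x")
```

On these figures the suggestion is a little under four times faster at the median, with most of the gain coming from avoiding allocation.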
Good. This looks like an improvement both in performance and in preventing possible bugs. If there is a use case where the sequence length doesn't match the partition, users should populate their own partition anyway.
Inconsistencies
The assignment dest.scaling = zeros(length(seq)) does not occur for CodonPartition, but does for other DiscretePartitions.