Small variations in sonalyze output in runs on the same data #600

Open
lars-t-hansen opened this issue Sep 17, 2024 · 1 comment
Labels
component:sonalyze sonalyze/* task:bug Something isn't working task:ux User experience, both web and command line

Comments

@lars-t-hansen
Collaborator

There are two variations I've observed:

  • records are transposed in the output, i.e. the sort order is not total
  • there are small variations in the calculated results; I see this in peak RAM use for jobs, for example

The first one is more a matter of taste than anything, but fixing it comes down to making the predicate used by sortableSummaries in jobs/print.go define a total order. For testing purposes we can pipe the output through sort to get something more stable.
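As a rough illustration of what a total-order predicate could look like, here is a minimal sketch. The jobSummary type, its fields, and the choice of tie-breakers are all hypothetical; the real record type and predicate in jobs/print.go will differ.

```go
package main

import (
	"fmt"
	"sort"
)

// jobSummary is a hypothetical stand-in for the records printed by `jobs`;
// it is NOT the actual type used in jobs/print.go.
type jobSummary struct {
	JobID uint32
	User  string
	Host  string
}

// lessTotal compares on a primary key and then breaks ties on every
// remaining field, so two records compare equal only when they are identical.
func lessTotal(a, b jobSummary) bool {
	if a.User != b.User {
		return a.User < b.User
	}
	if a.JobID != b.JobID {
		return a.JobID < b.JobID
	}
	return a.Host < b.Host
}

// sortSummaries sorts in place. Because lessTotal is a total order on
// distinct records, the result does not depend on the input order.
func sortSummaries(xs []jobSummary) {
	sort.Slice(xs, func(i, j int) bool { return lessTotal(xs[i], xs[j]) })
}

func main() {
	xs := []jobSummary{
		{JobID: 2, User: "ec-b", Host: "c1-9"},
		{JobID: 1, User: "ec-a", Host: "int-3"},
		{JobID: 2, User: "ec-b", Host: "c1-11"},
	}
	sortSummaries(xs)
	fmt.Println(xs)
}
```

With a predicate that only looked at User, the two ec-b records could come out in either order from run to run, which matches the transposition described above.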

The second one is more worrisome. Here's the diff from two adjacent runs of jobs (the outputs have been post-sorted to avoid the first problem):

2139c2139
< 2141439   ec-veronsua                 0d 3h35m   int-3    1690     2866      25       34        0        0         0           0            STAR,gzip,sh,sh <defunct>
---
> 2141439   ec-veronsua                 0d 3h35m   int-3    1690     2865      25       34        0        0         0           0            STAR,gzip,sh,sh <defunct>
7692c7692
< 610538    ec-bthj                     0d 1h50m   c1-11    840      867       20       22        0        0         0           0            features,kromosynth,kromosynth-gRPC,kromosynth-rend,quality_mood
---
> 610538    ec-bthj                     0d 1h50m   c1-11    840      866       20       22        0        0         0           0            features,kromosynth,kromosynth-gRPC,kromosynth-rend,quality_mood
57631c57631
< 685540    ec-milas                    0d 2h15m   c1-26    1591     1917      60       79        0        0         0           0            java,mutect2_v3.sh
---
> 685540    ec-milas                    0d 2h15m   c1-26    1591     1918      60       79        0        0         0           0            java,mutect2_v3.sh
57721c57721
< 685670    ec-milas                    1d11h30m   c1-9     3439     50489     144      226       0        0         0           0            java,mutect2_v3.sh,perl
---
> 685670    ec-milas                    1d11h30m   c1-9     3439     50490     144      226       0        0         0           0            java,mutect2_v3.sh,perl
60543c60543
< 691952    ec-edwardfb                 0d14h10m   gpu-5    1945     4249      84       120       350      376       94          94           python
---
> 691952    ec-edwardfb                 0d14h10m   gpu-5    1945     4250      84       120       350      376       94          94           python
70421c70421
< 708211    ec-milas                    0d15h30m   c1-22    3301     37799     116      207       0        0         0           0            java,mutect2_v3.sh,perl
---
> 708211    ec-milas                    0d15h30m   c1-22    3301     37800     116      207       0        0         0           0            java,mutect2_v3.sh,perl
72660c72660
< 711767    ec-koenvg                   0d10h 0m   c1-19    567      631       30       31        0        0         0           0            python,python3.11
---
> 711767    ec-koenvg                   0d10h 0m   c1-19    567      630       30       31        0        0         0           0            python,python3.11

(Command: sonalyze jobs -data-dir ~/sonar/data/fox.educloud.no -u - -from 2024-05-01 -to 2024-06-30)

There's usually a small difference (one ULP) in the memory readings; in each pair shown above the value differs by exactly one in the last printed digit. This is probably some kind of numeric instability, which may in turn come down to the order in which records are processed, but it would be good to verify that, and if so, to fix it.
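To make the suspected mechanism concrete: floating-point addition is not associative, so accumulating the same values in a different order can change the last bit of the result. The sketch below only illustrates that general effect and one possible mitigation (fixing the processing order); it is not a diagnosis of sonalyze, and the function names are made up for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// sumInOrder adds the values in the order given.
func sumInOrder(xs []float64) float64 {
	s := 0.0
	for _, x := range xs {
		s += x
	}
	return s
}

// sumSorted copies the values, sorts the copy, and then adds them, so the
// result is the same for any permutation of the same input values.
func sumSorted(xs []float64) float64 {
	ys := append([]float64(nil), xs...)
	sort.Float64s(ys)
	return sumInOrder(ys)
}

func main() {
	a := []float64{0.1, 0.2, 0.3}
	b := []float64{0.3, 0.2, 0.1} // same values, different arrival order

	// The two order-dependent sums differ in the last bit.
	fmt.Println(sumInOrder(a) == sumInOrder(b))
	// The order-normalized sums are identical.
	fmt.Println(sumSorted(a) == sumSorted(b))
}
```

If the variation does come from record order, a deterministic processing order would be enough to make runs reproducible, even though it would not make the result any more accurate.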

@lars-t-hansen lars-t-hansen added task:bug Something isn't working task:ux User experience, both web and command line component:sonalyze sonalyze/* labels Sep 17, 2024
@lars-t-hansen
Collaborator Author

The best idea I've had so far is that the field list should imply a sort order, and output is always sorted according to all the requested fields. This is not a panacea: field selection could be such that two records are indistinguishable. So the bigger hammer is to sort according to the requested fields first, and then according to all the other fields, in some predictable order.
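A minimal sketch of that "bigger hammer", under assumptions: records are modeled as string maps, allFields is an invented fixed list of field names, and every field is compared as a string (so numeric fields would sort lexicographically here). None of this reflects the actual sonalyze data structures.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// record is a hypothetical representation: field name -> formatted value.
type record map[string]string

// allFields is an assumed, fixed ordering of every known field. It supplies
// the predictable tie-break order for fields the user did not request.
var allFields = []string{"job", "user", "host", "cmd"}

// sortKeyOrder returns the requested fields first, followed by all
// remaining fields in the fixed allFields order, without duplicates.
func sortKeyOrder(requested []string) []string {
	seen := make(map[string]bool)
	var order []string
	for _, f := range requested {
		if !seen[f] {
			seen[f] = true
			order = append(order, f)
		}
	}
	for _, f := range allFields {
		if !seen[f] {
			seen[f] = true
			order = append(order, f)
		}
	}
	return order
}

// sortRecords sorts by the requested fields, then by every other field, so
// records that print identically are still ordered deterministically.
func sortRecords(rs []record, requested []string) {
	order := sortKeyOrder(requested)
	sort.SliceStable(rs, func(i, j int) bool {
		for _, f := range order {
			if c := strings.Compare(rs[i][f], rs[j][f]); c != 0 {
				return c < 0
			}
		}
		return false
	})
}

func main() {
	rs := []record{
		{"job": "2", "user": "ec-a", "host": "c1-9", "cmd": "python"},
		{"job": "1", "user": "ec-a", "host": "c1-11", "cmd": "python"},
	}
	// Only "user" is requested, so the rows look identical on that field;
	// the hidden fields still decide the order.
	sortRecords(rs, []string{"user"})
	fmt.Println(rs[0]["job"], rs[1]["job"])
}
```

Records that are identical in every field remain interchangeable, but at that point their printed rows are identical too, so the output is still stable.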

lars-t-hansen pushed a commit that referenced this issue Oct 29, 2024
lars-t-hansen added a commit that referenced this issue Oct 29, 2024
lars-t-hansen added a commit that referenced this issue Oct 29, 2024
For #600 - print cluster aliases in deterministic order