-
Notifications
You must be signed in to change notification settings - Fork 441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xORMQR and xLARFB discrepancy with documentation #1011
Comments
For LARFB, you are correct. We could change: "The elements equal to 1 are not stored; the corresponding array elements are modified but restored on exit. The rest of the array is not used." to "The upper triangular part of V (including its diagonal) is not referenced." Thanks. Feel welcome to provide a PR for LARFB. For ORMQR, you are correct too. If we use the "unblocked" version of the code, we use ORM2R and then "the diagonal elements of A are modified but restored on exit". And then after line 96 in ORMQR write "A is modified by the routine but restored on exit." Also, in ORM2R and ORMQR, A could be labelled as "input/output" and not "input". Well, I am not sure about this. A is the same in input and in output, but it is changed and that would help for thread safety. However we should really "fix" ORM2R (by fixing LARF) so that LARF does not require the "1" on the diagonal of V so that ORM2R does not "modify and restore on exit" entries of V. Maybe we create a new routine for this, such as LARF1. Or just change LARF proper. Another fix for ORMQR would be to remove the call to ORM2R and always call the "blocked code". So remove IF( NB.LT.NBMIN .OR. NB.GE.K ) THEN
*
* Use unblocked code
*
CALL DORM2R( SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK,
$ IINFO )
ELSE
*
* Use blocked code
*
***** [ KEEP THIS PART ] *****
END IF So that we never use ORM2R in ORMQR. That would be do the trick. If you want to propose a PR for the documentation of ORMQR, this is OK. My preference is to fix LARF, I can work on this but this will have to wait for a month or so. |
…tation fix LARFB documentation, #1011
@langou, about ORM2R and LARF. Looks like the simple solution is this:
Also, I think that removing of unblocked algorithm ORM2R from ORMQR is not so bad idea. But, may be there are some reasons that can damaged some tests. I don't know. |
Thanks for sharing ideas. Some comments: () Increasing necessary workspace is not a good idea in general and we have been reluctant to do it for backward compatibility. () ORM2R will be fixed if we fix LARF. (*) For LARF, we should write a version that assumes that the first or the last component of v is a 1. LARF1F (1 first) and LARF1L (1 last) for example. And then propagate to ORM2R. I will try to get this done by this Summer.
This is a good idea but I am leery to go in this direction. This kind of change is up to people tuning the code. I think the code as is fine. The condition to call ORM2R (inside ORMQR) is Line 275 in 4a8ed6b
and I agree that this can be changed. This should not damage any test. (As long as this is done correctly.) I agree that changing this line of code could bring some speedup. That being said, I do not think we should change this for now. Julien. |
@jprhyne just pushed his implementation in #1020 I am sorry for the duplication of work. I did not expect it. It seems like both implementations are suffering from corner cases. While we need to get these corner cases working, I do not want to create stress and start a race on who finishes first. It would be great if both of you are able contribute to LAPACK through your work. Maybe we can agree that you will work on LARF1F, and the other work on LARF1L. Or someone does C and S and the other one does Z and D. If that's acceptable. I think the split {C,S} for one and {Z,D} for the other is the best for now. The two codes (@jprhyne's #1020 and @EduardFedorenkov #1019) look similar-ish that I feel it is possible to merge both work and it would be great to have both of you contributing to the software. In any case, it is great to have contributions and a start of resolve for the issue. Thanks to both of you for the work. |
Hi, @langou and @jprhyne! Great idea. I support your suggestions! I will try to find the solution for corner cases as well as @jprhyne |
Related, not necessarily urgent, but it would be good to push the change to more subroutines than only ORM2R. For example GEQR2 as the same "guilty" lines: Lines 184 to 191 in dd2e5ef
Let us fix ORM2R first. I did a grep of LARF and found all these routines: Note: the xOR prefixes are for real arithmetic {S,D} and so the corresponding complex arithmetic subroutines would be under xUN for complex arithmetic {C,Z}. |
Hi everyone, just a passing by comment for this since I use these extensively in the guts of SciPy; are we expecting any breaking changes in the behavior of Very tiny markup nitpick; @EduardFedorenkov if you put the relevant issue number in the PR description and GitHub will associate the relevant PR in the commit history of the file or in git blame. Then you don't need the issue number in the commit messages which typically paint the issues as above. |
No. Behavior of 'un/orm2r' should be unchanged. 'un/orm2r' will be thread-safe though, so you will be able to call in shared-memory in parallel a few 'un/orm2r' at once on the same piece of A (after this PR, A will be read only) to update multiple pieces of C at once. As of now, this is not permitted. Since, currently, values of A are changed during the execution, albeit the values are unchanged in output. In addition, 'un/orm2r' will be easier to read and look cleaner. The impacted routine should not be only 'un/orm2r'. The list of potentially impacted routines is at: #1011 (comment) But no behavior should change. |
Why are you mentioning 'larfgp'? (1) Are you using it? - I am being curious, (2) 'larfgp' generate a Householder reflector. So 'larfgp' and 'larfg' are not relevant to this change. They generate a Householder reflector. We are working on 'larf' which uses the Householder reflector. (3) Maybe you meant 'larf'? If that is the case, then yes and no. In short the plan is to leave 'larf' as is but introduce new routines 'larf1f' and 'larf1l' to replace 'larf'. I think if we do it right 'larf' will not be used any longer by LAPACK, but the plan is to leave 'larf' in LAPACK. |
Apologies, muscle memory I guess. I meant |
I think this PR for {S, C} is about 95% done. I look over one more time and add some small fixes. I have some notes to discus:
|
Related to @EduardFedorenkov point about lastv with larf1l. I have 2 questions/ideas about how to proceed. We could edit: fixed incorrect file names |
After some prototyping, I don't think option 2 is the best idea. I'm not too familiar with how the negative incv values should behave, and the ideas that I can think of to use this are harder to read than the current implementation so I'll be not working on that anymore, but maybe someone can look at it at some point and revisit this idea. Until then I'll be working on the 1l with a "firstv" variable instead of a lastv edit: Fixed formatting accidentally referencing another post |
I had not thought about this. Oh my. @jprhyne is making a good point and proposing to introduce a new feature that is not in LARF and that is an advantage of this LARF1L / LARF1F. The current LARF code always scan and look for zeros from the bottom (last entry of the vector). However when the "one" is at the bottom, it makes more sense to scan from start of the vector. What @jprhyne is proposing would be a new feature. (If I understand correctly.) And this makes sense. |
I think that the better way is to change the set of the arguments in LARF1F and LARF1L. What about to add parameter that store index in which ONE is assumed to be. No need to compute lastv or firstv in this case. What do you think @jprhyne, @langou? Because even the current realization of LARF1F failed the tests if in place where we expect ONE will be the pure ZERO. |
I like this idea, but I don't see how it will be different for larf1f The current idea behind my attempt is to assume v(1) = 1 without referencing it. Something isn't playing nice with that scheme and is probably a simple indexing issue with the lastv scanning or something else that I need to just sit down and go through slowly with many cases, so adding another parameter shouldn't change anything about the algorithm itself. We would just do
or maybe
And later on we would instead of referencing Which is exactly what the form of the Then this raises another question of since we're adding another parameter why not have a character input |
…rf1l-in-lapack develop DLARF1F and implement in ORM2R, #1011
Issue resolved by #1020 and #1019. Thanks @EduardFedorenkov and @jprhyne |
Let's suppose that we have a task to apply vectors from A (returned by xGEQRF) to the set of matrices by xORMQR in parallel. I found that current LAPACK has discrepancy with documentation that introduce doubts when you try to solve this task.
Maybe it's worth doing some corrections in LAPACK documentations for xLARFB and xORMQR?
The text was updated successfully, but these errors were encountered: