Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xORMQR and xLARFB discrepancy with documentation #1011

Closed
EduardFedorenkov opened this issue Apr 18, 2024 · 17 comments
Closed

xORMQR and xLARFB discrepancy with documentation #1011

EduardFedorenkov opened this issue Apr 18, 2024 · 17 comments

Comments

@EduardFedorenkov
Copy link
Contributor

EduardFedorenkov commented Apr 18, 2024

Let's suppose that we have a task to apply vectors from A (returned by xGEQRF) to the set of matrices by xORMQR in parallel. I found that current LAPACK has discrepancy with documentation that introduce doubts when you try to solve this task.

  1. xORMQR : According to documentation the unblocked code branch - xORM2R modify diagonal of A matrix and restore it on the exit. But xORMQR has no such remark about A matrix. So, in this case we can't use xORMQR in parallel because of rewriting the diagonal of A.
  2. xLARFB : According to documentation vectors in V are modified and restore on the exit. But according to the code GEMM and TRMM with diag='Unit' don't touch the diagonal elements of V.
    image

Maybe it's worth doing some corrections in LAPACK documentations for xLARFB and xORMQR?

@langou
Copy link
Contributor

langou commented Apr 18, 2024

For LARFB, you are correct. We could change: "The elements equal to 1 are not stored; the corresponding array elements are modified but restored on exit. The rest of the array is not used." to "The upper triangular part of V (including its diagonal) is not referenced." Thanks. Feel welcome to provide a PR for LARFB.

For ORMQR, you are correct too. If we use the "unblocked" version of the code, we use ORM2R and then "the diagonal elements of A are modified but restored on exit". And then after line 96 in ORMQR write "A is modified by the routine but restored on exit."

Also, in ORM2R and ORMQR, A could be labelled as "input/output" and not "input". Well, I am not sure about this. A is the same in input and in output, but it is changed and that would help for thread safety.

However we should really "fix" ORM2R (by fixing LARF) so that LARF does not require the "1" on the diagonal of V so that ORM2R does not "modify and restore on exit" entries of V. Maybe we create a new routine for this, such as LARF1. Or just change LARF proper.

Another fix for ORMQR would be to remove the call to ORM2R and always call the "blocked code". So remove

      IF( NB.LT.NBMIN .OR. NB.GE.K ) THEN
*
*        Use unblocked code
*
         CALL DORM2R( SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK,
     $                IINFO )
      ELSE
*
*        Use blocked code
*
***** [ KEEP THIS PART ] *****
      END IF

So that we never use ORM2R in ORMQR. That would be do the trick.

If you want to propose a PR for the documentation of ORMQR, this is OK. My preference is to fix LARF, I can work on this but this will have to wait for a month or so.

EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue May 20, 2024
langou added a commit that referenced this issue May 21, 2024
@EduardFedorenkov
Copy link
Contributor Author

@langou, about ORM2R and LARF. Looks like the simple solution is this:

  1. ORM2R: double workspace for ORM2R. One part is for copy of V and another for storing C**T * V. Finaly remove code which modifies A.
  2. LARF: just add COPY in LARF to copy V inside work with something like work(1) = 1;.

Also, I think that removing of unblocked algorithm ORM2R from ORMQR is not so bad idea. But, may be there are some reasons that can damaged some tests. I don't know.

@langou
Copy link
Contributor

langou commented May 21, 2024

Hi @EduardFedorenkov,

about ORM2R and LARF. Looks like the simple solution is this:
(1) ORM2R: double workspace for ORM2R. One part is for copy of V and another for storing C**T * V. Finaly remove code which modifies A.
(2) LARF: just add COPY in LARF to copy V inside work with something like work(1) = 1;.

Thanks for sharing ideas. Some comments: () Increasing necessary workspace is not a good idea in general and we have been reluctant to do it for backward compatibility. () ORM2R will be fixed if we fix LARF. (*) For LARF, we should write a version that assumes that the first or the last component of v is a 1. LARF1F (1 first) and LARF1L (1 last) for example. And then propagate to ORM2R. I will try to get this done by this Summer.

Also, I think that removing of unblocked algorithm ORM2R from ORMQR is not so bad idea. But, may be there are some reasons that can damaged some tests. I don't know.

This is a good idea but I am leery to go in this direction. This kind of change is up to people tuning the code. I think the code as is fine. The condition to call ORM2R (inside ORMQR) is

IF( NB.LT.NBMIN .OR. NB.GE.K ) THEN

and I agree that this can be changed. This should not damage any test. (As long as this is done correctly.) I agree that changing this line of code could bring some speedup. That being said, I do not think we should change this for now.

Julien.

EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue May 29, 2024
@langou
Copy link
Contributor

langou commented May 29, 2024

@jprhyne just pushed his implementation in #1020

I am sorry for the duplication of work. I did not expect it.

It seems like both implementations are suffering from corner cases.

While we need to get these corner cases working, I do not want to create stress and start a race on who finishes first. It would be great if both of you are able contribute to LAPACK through your work. Maybe we can agree that you will work on LARF1F, and the other work on LARF1L. Or someone does C and S and the other one does Z and D. If that's acceptable. I think the split {C,S} for one and {Z,D} for the other is the best for now.

The two codes (@jprhyne's #1020 and @EduardFedorenkov #1019) look similar-ish that I feel it is possible to merge both work and it would be great to have both of you contributing to the software.

In any case, it is great to have contributions and a start of resolve for the issue. Thanks to both of you for the work.

@EduardFedorenkov
Copy link
Contributor Author

EduardFedorenkov commented May 30, 2024

Hi, @langou and @jprhyne!
It's OK, let's work together!

Great idea. I support your suggestions! I will try to find the solution for corner cases as well as @jprhyne
let's split {Z,D} and {C,S}. Also, we need to align code. So, @jprhyne can review my PR and I can review #1020. When we all agree about code we can just merge #1019 with {Z,D} and #1020 with {C,S} for example.

EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue May 30, 2024
@langou
Copy link
Contributor

langou commented May 30, 2024

Related, not necessarily urgent, but it would be good to push the change to more subroutines than only ORM2R. For example GEQR2 as the same "guilty" lines:

lapack/SRC/dgeqr2.f

Lines 184 to 191 in dd2e5ef

*
* Apply H(i) to A(i:m,i+1:n) from the left
*
AII = A( I, I )
A( I, I ) = ONE
CALL DLARF( 'Left', M-I+1, N-I, A( I, I ), 1, TAU( I ),
$ A( I, I+1 ), LDA, WORK )
A( I, I ) = AII

Let us fix ORM2R first.

I did a grep of LARF and found all these routines:
xGEBD2
xGEHD2
xGELQ2
xGEQL2
xGEQP3RK
xGEQR2
xGEQR2P
xGERQ2
xLAQP2
xLAQP2RK
xLAQR2
xLAQR3
xLARFX
xOPMTR
xORBDB
xORBDB1
xORBDB2
xORBDB3
xORBDB4
xORG2L
xORG2R
xORGL2
xORGR2
xORM2L
xORM2R
xORML2
xORMR2

Note: the xOR prefixes are for real arithmetic {S,D} and so the corresponding complex arithmetic subroutines would be under xUN for complex arithmetic {C,Z}.

EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue May 31, 2024
EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 3, 2024
EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 3, 2024
EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 3, 2024
EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 3, 2024
EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 5, 2024
EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 5, 2024
EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 6, 2024
EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 6, 2024
EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 6, 2024
@ilayn
Copy link
Contributor

ilayn commented Jun 6, 2024

Hi everyone, just a passing by comment for this since I use these extensively in the guts of SciPy; are we expecting any breaking changes in the behavior of un/orm2r and larfgp as a result?

Very tiny markup nitpick; @EduardFedorenkov if you put the relevant issue number in the PR description and GitHub will associate the relevant PR in the commit history of the file or in git blame. Then you don't need the issue number in the commit messages which typically paint the issues as above.

@langou
Copy link
Contributor

langou commented Jun 6, 2024

are we expecting any breaking changes in the behavior of 'un/orm2r'?

No. Behavior of 'un/orm2r' should be unchanged.

'un/orm2r' will be thread-safe though, so you will be able to call in shared-memory in parallel a few 'un/orm2r' at once on the same piece of A (after this PR, A will be read only) to update multiple pieces of C at once. As of now, this is not permitted. Since, currently, values of A are changed during the execution, albeit the values are unchanged in output.

In addition, 'un/orm2r' will be easier to read and look cleaner.

The impacted routine should not be only 'un/orm2r'. The list of potentially impacted routines is at: #1011 (comment)

But no behavior should change.

@langou
Copy link
Contributor

langou commented Jun 6, 2024

are we expecting any breaking changes in the behavior of 'larfgp' as a result?

Why are you mentioning 'larfgp'? (1) Are you using it? - I am being curious, (2) 'larfgp' generate a Householder reflector. So 'larfgp' and 'larfg' are not relevant to this change. They generate a Householder reflector. We are working on 'larf' which uses the Householder reflector. (3) Maybe you meant 'larf'? If that is the case, then yes and no. In short the plan is to leave 'larf' as is but introduce new routines 'larf1f' and 'larf1l' to replace 'larf'. I think if we do it right 'larf' will not be used any longer by LAPACK, but the plan is to leave 'larf' in LAPACK.

@ilayn
Copy link
Contributor

ilayn commented Jun 6, 2024

Apologies, muscle memory I guess. I meant larf indeed. Thanks for the heads up, I'll take a note of this.

@EduardFedorenkov
Copy link
Contributor Author

EduardFedorenkov commented Jun 7, 2024

Hello, @langou, @jprhyne!

I think this PR for {S, C} is about 95% done. I look over one more time and add some small fixes. I have some notes to discus:

  1. We talk with @jprhyne about zero vector as input for xLARF1F and decide that the smallest LASTV must be limited by 1. But unfortunately I have no idea how to deal with this issue in xLARF1L (I mean zero vector input)
  2. Interesting case is in xLARFX. Looks like changes in xLAEXC is needed in this case. And maybe xLARFX must be spited by xLARFX1F and xLARFX1L as well

EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 7, 2024
EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 7, 2024
EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 7, 2024
EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 7, 2024
@jprhyne
Copy link
Contributor

jprhyne commented Jun 7, 2024

Related to @EduardFedorenkov point about lastv with larf1l. I have 2 questions/ideas about how to proceed.

We could
1: Since we are assuming v(lastv)=1 (in fact we guaranteed this was the case in the previous paradigm) we could replace it with a firstv parameter and start scanning at the start of v.
2: scrap the larf1l.f files and instead call larf1f.f with a negative incv (I have not really tested this so I am not too sure how feasible this is). This has the benefit of one less extra file but could lead to some headaches depending on if there are edge cases that don't play nice with this paradigm. My naive attempts of just dropping in dlarf1f where dlarf1l was and negating incv so far have added some failures but it may just need us to think differently about the incv < 0 case.

edit: fixed incorrect file names

@jprhyne
Copy link
Contributor

jprhyne commented Jun 9, 2024

After some prototyping, I don't think option 2 is the best idea. I'm not too familiar with how the negative incv values should behave, and the ideas that I can think of to use this are harder to read than the current implementation so I'll be not working on that anymore, but maybe someone can look at it at some point and revisit this idea.

Until then I'll be working on the 1l with a "firstv" variable instead of a lastv

edit: Fixed formatting accidentally referencing another post

@langou
Copy link
Contributor

langou commented Jun 10, 2024

the 1l with a "firstv" variable

I had not thought about this. Oh my.

@jprhyne is making a good point and proposing to introduce a new feature that is not in LARF and that is an advantage of this LARF1L / LARF1F. The current LARF code always scan and look for zeros from the bottom (last entry of the vector). However when the "one" is at the bottom, it makes more sense to scan from start of the vector. What @jprhyne is proposing would be a new feature. (If I understand correctly.) And this makes sense.

@EduardFedorenkov
Copy link
Contributor Author

I think that the better way is to change the set of the arguments in LARF1F and LARF1L.

What about to add parameter that store index in which ONE is assumed to be. No need to compute lastv or firstv in this case. What do you think @jprhyne, @langou?

Because even the current realization of LARF1F failed the tests if in place where we expect ONE will be the pure ZERO.

@jprhyne
Copy link
Contributor

jprhyne commented Jun 10, 2024

I like this idea, but I don't see how it will be different for larf1f

The current idea behind my attempt is to assume v(1) = 1 without referencing it. Something isn't playing nice with that scheme and is probably a simple indexing issue with the lastv scanning or something else that I need to just sit down and go through slowly with many cases, so adding another parameter shouldn't change anything about the algorithm itself. We would just do

DO WHILE(LASTV.GT.FIRSTV_ARGUMENT .AND. V(I) = ZERO)
   LASTV = LASTV-1
   I = I - INCV
END DO

or maybe

DO WHILE(LASTV.GE.FIRSTV_ARGUMENT+1 .AND. V(I) = ZERO)
   LASTV = LASTV-1
   I = I - INCV
END DO

And later on we would instead of referencing V(1) and C(1,1) we would have
V(1 + (FIRSTV_ARGUMENT-1) * INCV) and C(FIRSTV_ARGUMENT,1) (and the C parameters flipped when SIDE is changed)

Which is exactly what the form of the FIRSTV scanning I was discussing above would do. (Which is when I noticed a bug I pushed a small fix for earlier by removing this scanning entirely before continuing to work on it). I'm really not sure how helpful this would be as personally these scanning parameters are not intuitive and our tests are designed around this behavior not existing. So I will be trying to get everything else done before that functionality.

Then this raises another question of since we're adding another parameter why not have a character input STOREV (like how is done in xLARFT) to merge these together. I'm not sure if that's the best of ideas as at this point we're making xLARF1{F,L} somewhat more complicated than it needs to be for applying a single householder reflector with a small change in structure. But, I'm not sure what the correct move is for this.

EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 11, 2024
EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 13, 2024
langou added a commit that referenced this issue Jun 19, 2024
…rf1l-in-lapack

develop DLARF1F and implement in ORM2R, #1011
@langou
Copy link
Contributor

langou commented Jun 20, 2024

Issue resolved by #1020 and #1019. Thanks @EduardFedorenkov and @jprhyne

@langou langou closed this as completed Jun 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants