xORMQR and xLARFB discrepancy with documentation #1011

EduardFedorenkov · 2024-04-18T05:27:58Z

Let's suppose that we have a task to apply vectors from A (returned by xGEQRF) to the set of matrices by xORMQR in parallel. I found that current LAPACK has discrepancy with documentation that introduce doubts when you try to solve this task.

xORMQR : According to documentation the unblocked code branch - xORM2R modify diagonal of A matrix and restore it on the exit. But xORMQR has no such remark about A matrix. So, in this case we can't use xORMQR in parallel because of rewriting the diagonal of A.
xLARFB : According to documentation vectors in V are modified and restore on the exit. But according to the code GEMM and TRMM with diag='Unit' don't touch the diagonal elements of V.

Maybe it's worth doing some corrections in LAPACK documentations for xLARFB and xORMQR?

langou · 2024-04-18T15:52:40Z

For LARFB, you are correct. We could change: "The elements equal to 1 are not stored; the corresponding array elements are modified but restored on exit. The rest of the array is not used." to "The upper triangular part of V (including its diagonal) is not referenced." Thanks. Feel welcome to provide a PR for LARFB.

For ORMQR, you are correct too. If we use the "unblocked" version of the code, we use ORM2R and then "the diagonal elements of A are modified but restored on exit". And then after line 96 in ORMQR write "A is modified by the routine but restored on exit."

Also, in ORM2R and ORMQR, A could be labelled as "input/output" and not "input". Well, I am not sure about this. A is the same in input and in output, but it is changed and that would help for thread safety.

However we should really "fix" ORM2R (by fixing LARF) so that LARF does not require the "1" on the diagonal of V so that ORM2R does not "modify and restore on exit" entries of V. Maybe we create a new routine for this, such as LARF1. Or just change LARF proper.

Another fix for ORMQR would be to remove the call to ORM2R and always call the "blocked code". So remove

      IF( NB.LT.NBMIN .OR. NB.GE.K ) THEN
*
*        Use unblocked code
*
         CALL DORM2R( SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK,
     $                IINFO )
      ELSE
*
*        Use blocked code
*
***** [ KEEP THIS PART ] *****
      END IF

So that we never use ORM2R in ORMQR. That would be do the trick.

If you want to propose a PR for the documentation of ORMQR, this is OK. My preference is to fix LARF, I can work on this but this will have to wait for a month or so.

…tation fix LARFB documentation, #1011

EduardFedorenkov · 2024-05-21T04:59:23Z

@langou, about ORM2R and LARF. Looks like the simple solution is this:

ORM2R: double workspace for ORM2R. One part is for copy of V and another for storing C**T * V. Finaly remove code which modifies A.
LARF: just add COPY in LARF to copy V inside work with something like work(1) = 1;.

Also, I think that removing of unblocked algorithm ORM2R from ORMQR is not so bad idea. But, may be there are some reasons that can damaged some tests. I don't know.

langou · 2024-05-21T17:56:40Z

Hi @EduardFedorenkov,

about ORM2R and LARF. Looks like the simple solution is this:
(1) ORM2R: double workspace for ORM2R. One part is for copy of V and another for storing C**T * V. Finaly remove code which modifies A.
(2) LARF: just add COPY in LARF to copy V inside work with something like work(1) = 1;.

Thanks for sharing ideas. Some comments: () Increasing necessary workspace is not a good idea in general and we have been reluctant to do it for backward compatibility. () ORM2R will be fixed if we fix LARF. (*) For LARF, we should write a version that assumes that the first or the last component of v is a 1. LARF1F (1 first) and LARF1L (1 last) for example. And then propagate to ORM2R. I will try to get this done by this Summer.

Also, I think that removing of unblocked algorithm ORM2R from ORMQR is not so bad idea. But, may be there are some reasons that can damaged some tests. I don't know.

This is a good idea but I am leery to go in this direction. This kind of change is up to people tuning the code. I think the code as is fine. The condition to call ORM2R (inside ORMQR) is

lapack/SRC/dormqr.f

Line 275 in 4a8ed6b

IF( NB.LT.NBMIN .OR. NB.GE.K ) THEN

and I agree that this can be changed. This should not damage any test. (As long as this is done correctly.) I agree that changing this line of code could bring some speedup. That being said, I do not think we should change this for now.

Julien.

langou · 2024-05-29T21:27:05Z

@jprhyne just pushed his implementation in #1020

I am sorry for the duplication of work. I did not expect it.

It seems like both implementations are suffering from corner cases.

While we need to get these corner cases working, I do not want to create stress and start a race on who finishes first. It would be great if both of you are able contribute to LAPACK through your work. Maybe we can agree that you will work on LARF1F, and the other work on LARF1L. Or someone does C and S and the other one does Z and D. If that's acceptable. I think the split {C,S} for one and {Z,D} for the other is the best for now.

The two codes (@jprhyne's #1020 and @EduardFedorenkov #1019) look similar-ish that I feel it is possible to merge both work and it would be great to have both of you contributing to the software.

In any case, it is great to have contributions and a start of resolve for the issue. Thanks to both of you for the work.

EduardFedorenkov · 2024-05-30T04:23:42Z

Hi, @langou and @jprhyne!
It's OK, let's work together!

Great idea. I support your suggestions! I will try to find the solution for corner cases as well as @jprhyne
let's split {Z,D} and {C,S}. Also, we need to align code. So, @jprhyne can review my PR and I can review #1020. When we all agree about code we can just merge #1019 with {Z,D} and #1020 with {C,S} for example.

langou · 2024-05-30T15:22:40Z

Related, not necessarily urgent, but it would be good to push the change to more subroutines than only ORM2R. For example GEQR2 as the same "guilty" lines:

lapack/SRC/dgeqr2.f

Lines 184 to 191 in dd2e5ef

    
           * 
        
           *           Apply H(i) to A(i:m,i+1:n) from the left 
        
           * 
        
                       AII = A( I, I ) 
        
                       A( I, I ) = ONE 
        
                       CALL DLARF( 'Left', M-I+1, N-I, A( I, I ), 1, TAU( I ), 
        
                $                  A( I, I+1 ), LDA, WORK ) 
        
                       A( I, I ) = AII

Let us fix ORM2R first.

I did a grep of LARF and found all these routines:
xGEBD2
xGEHD2
xGELQ2
xGEQL2
xGEQP3RK
xGEQR2
xGEQR2P
xGERQ2
xLAQP2
xLAQP2RK
xLAQR2
xLAQR3
xLARFX
xOPMTR
xORBDB
xORBDB1
xORBDB2
xORBDB3
xORBDB4
xORG2L
xORG2R
xORGL2
xORGR2
xORM2L
xORM2R
xORML2
xORMR2

Note: the xOR prefixes are for real arithmetic {S,D} and so the corresponding complex arithmetic subroutines would be under xUN for complex arithmetic {C,Z}.

…e-LAPACK#1011

ilayn · 2024-06-06T12:12:46Z

Hi everyone, just a passing by comment for this since I use these extensively in the guts of SciPy; are we expecting any breaking changes in the behavior of un/orm2r and larfgp as a result?

Very tiny markup nitpick; @EduardFedorenkov if you put the relevant issue number in the PR description and GitHub will associate the relevant PR in the commit history of the file or in git blame. Then you don't need the issue number in the commit messages which typically paint the issues as above.

langou · 2024-06-06T12:23:03Z

are we expecting any breaking changes in the behavior of 'un/orm2r'?

No. Behavior of 'un/orm2r' should be unchanged.

'un/orm2r' will be thread-safe though, so you will be able to call in shared-memory in parallel a few 'un/orm2r' at once on the same piece of A (after this PR, A will be read only) to update multiple pieces of C at once. As of now, this is not permitted. Since, currently, values of A are changed during the execution, albeit the values are unchanged in output.

In addition, 'un/orm2r' will be easier to read and look cleaner.

The impacted routine should not be only 'un/orm2r'. The list of potentially impacted routines is at: #1011 (comment)

But no behavior should change.

langou · 2024-06-06T12:27:00Z

are we expecting any breaking changes in the behavior of 'larfgp' as a result?

Why are you mentioning 'larfgp'? (1) Are you using it? - I am being curious, (2) 'larfgp' generate a Householder reflector. So 'larfgp' and 'larfg' are not relevant to this change. They generate a Householder reflector. We are working on 'larf' which uses the Householder reflector. (3) Maybe you meant 'larf'? If that is the case, then yes and no. In short the plan is to leave 'larf' as is but introduce new routines 'larf1f' and 'larf1l' to replace 'larf'. I think if we do it right 'larf' will not be used any longer by LAPACK, but the plan is to leave 'larf' in LAPACK.

ilayn · 2024-06-06T12:31:48Z

Apologies, muscle memory I guess. I meant larf indeed. Thanks for the heads up, I'll take a note of this.

…LAPACK#1011

EduardFedorenkov · 2024-06-07T07:54:01Z

Hello, @langou, @jprhyne!

I think this PR for {S, C} is about 95% done. I look over one more time and add some small fixes. I have some notes to discus:

We talk with @jprhyne about zero vector as input for xLARF1F and decide that the smallest LASTV must be limited by 1. But unfortunately I have no idea how to deal with this issue in xLARF1L (I mean zero vector input)
Interesting case is in xLARFX. Looks like changes in xLAEXC is needed in this case. And maybe xLARFX must be spited by xLARFX1F and xLARFX1L as well

jprhyne · 2024-06-07T13:24:38Z

Related to @EduardFedorenkov point about lastv with larf1l. I have 2 questions/ideas about how to proceed.

We could
1: Since we are assuming v(lastv)=1 (in fact we guaranteed this was the case in the previous paradigm) we could replace it with a firstv parameter and start scanning at the start of v.
2: scrap the larf1l.f files and instead call larf1f.f with a negative incv (I have not really tested this so I am not too sure how feasible this is). This has the benefit of one less extra file but could lead to some headaches depending on if there are edge cases that don't play nice with this paradigm. My naive attempts of just dropping in dlarf1f where dlarf1l was and negating incv so far have added some failures but it may just need us to think differently about the incv < 0 case.

edit: fixed incorrect file names

jprhyne · 2024-06-09T19:35:50Z

After some prototyping, I don't think option 2 is the best idea. I'm not too familiar with how the negative incv values should behave, and the ideas that I can think of to use this are harder to read than the current implementation so I'll be not working on that anymore, but maybe someone can look at it at some point and revisit this idea.

Until then I'll be working on the 1l with a "firstv" variable instead of a lastv

edit: Fixed formatting accidentally referencing another post

langou · 2024-06-10T07:44:32Z

the 1l with a "firstv" variable

I had not thought about this. Oh my.

@jprhyne is making a good point and proposing to introduce a new feature that is not in LARF and that is an advantage of this LARF1L / LARF1F. The current LARF code always scan and look for zeros from the bottom (last entry of the vector). However when the "one" is at the bottom, it makes more sense to scan from start of the vector. What @jprhyne is proposing would be a new feature. (If I understand correctly.) And this makes sense.

EduardFedorenkov · 2024-06-10T08:14:35Z

I think that the better way is to change the set of the arguments in LARF1F and LARF1L.

What about to add parameter that store index in which ONE is assumed to be. No need to compute lastv or firstv in this case. What do you think @jprhyne, @langou?

Because even the current realization of LARF1F failed the tests if in place where we expect ONE will be the pure ZERO.

jprhyne · 2024-06-10T15:28:44Z

I like this idea, but I don't see how it will be different for larf1f

The current idea behind my attempt is to assume v(1) = 1 without referencing it. Something isn't playing nice with that scheme and is probably a simple indexing issue with the lastv scanning or something else that I need to just sit down and go through slowly with many cases, so adding another parameter shouldn't change anything about the algorithm itself. We would just do

DO WHILE(LASTV.GT.FIRSTV_ARGUMENT .AND. V(I) = ZERO)
   LASTV = LASTV-1
   I = I - INCV
END DO

or maybe

DO WHILE(LASTV.GE.FIRSTV_ARGUMENT+1 .AND. V(I) = ZERO)
   LASTV = LASTV-1
   I = I - INCV
END DO

And later on we would instead of referencing V(1) and C(1,1) we would have
V(1 + (FIRSTV_ARGUMENT-1) * INCV) and C(FIRSTV_ARGUMENT,1) (and the C parameters flipped when SIDE is changed)

Which is exactly what the form of the FIRSTV scanning I was discussing above would do. (Which is when I noticed a bug I pushed a small fix for earlier by removing this scanning entirely before continuing to work on it). I'm really not sure how helpful this would be as personally these scanning parameters are not intuitive and our tests are designed around this behavior not existing. So I will be trying to get everything else done before that functionality.

Then this raises another question of since we're adding another parameter why not have a character input STOREV (like how is done in xLARFT) to merge these together. I'm not sure if that's the best of ideas as at this point we're making xLARF1{F,L} somewhat more complicated than it needs to be for applying a single householder reflector with a small change in structure. But, I'm not sure what the correct move is for this.

…rf1l-in-lapack develop DLARF1F and implement in ORM2R, #1011

langou · 2024-06-20T12:49:35Z

Issue resolved by #1020 and #1019. Thanks @EduardFedorenkov and @jprhyne

EduardFedorenkov added the Type: Question label Apr 18, 2024

EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue May 20, 2024

fix LARFB documentation, Reference-LAPACK#1011

201a5c4

EduardFedorenkov mentioned this issue May 20, 2024

fix LARFB documentation, #1011 #1016

Merged

1 task

langou added a commit that referenced this issue May 21, 2024

Merge pull request #1016 from EduardFedorenkov/1011-fix-larfb-documen…

4a8ed6b

…tation fix LARFB documentation, #1011

EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue May 29, 2024

develop DLARF1F and implement in ORM2R, Reference-LAPACK#1011

b8a6443

EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue May 30, 2024

fix DLARF1F in case lastv = 1, Reference-LAPACK#1011

0d2bff7

EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue May 31, 2024

align DLARF1F versions, Reference-LAPACK#1011

a4698c3

EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 3, 2024

remove dlarf1f prototype and add slarf1f, slarf1l, Reference-LAPACK#1011

e4f5ae0

EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 3, 2024

remove dlarf1f prototype and add slarf1f, slarf1l, Reference-LAPACK#1011

797ed89

EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 3, 2024

remove dlarf1f prototype and add slarf1f, slarf1l, Reference-LAPACK#1011

a32aef5

EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 3, 2024

remove dlarf1f prototype and add slarf1f, slarf1l, Reference-LAPACK#1011

5e7dad3

EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 4, 2024

update single precision routines to use slarf1f and slarf1l, Referenc…

8dd7e13

…e-LAPACK#1011

EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 5, 2024

implement clarf1f, Reference-LAPACK#1011

6c0a98f

EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 5, 2024

try clarf1f in cunm2r, Reference-LAPACK#1011

5889e3e

EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 6, 2024

fix lastv possible range in slarf1f and slarf1l, Reference-LAPACK#1011

1d4010e

EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 6, 2024

fix lastv possible range in clarf1f, Reference-LAPACK#1011

ea943fc

EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 6, 2024

implement clarf1l, Reference-LAPACK#1011

b8b9771

EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 7, 2024

update single complex routines to use clarf1f and clarf1l, Reference-…

8ed1ab5

…LAPACK#1011

EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 7, 2024

small fix in larf1f and larf1l, Reference-LAPACK#1011

b579759

EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 7, 2024

define larf1f and larf1l in lapack_64.h, Reference-LAPACK#1011

cbd638d

EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 7, 2024

small fix in routines to use larf1f and larf1l, Reference-LAPACK#1011

05f087f

EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 7, 2024

small fix in routines to use larf1f and larf1l, Reference-LAPACK#1011

ba27bf0

EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 11, 2024

add firstv param in larf1l, Reference-LAPACK#1011

690067c

EduardFedorenkov added a commit to EduardFedorenkov/lapack that referenced this issue Jun 13, 2024

code style small fixes, Reference-LAPACK#1011

c8b1a51

langou added a commit that referenced this issue Jun 19, 2024

Merge pull request #1019 from EduardFedorenkov/1011-add-larf1f-and-la…

256c836

…rf1l-in-lapack develop DLARF1F and implement in ORM2R, #1011

langou closed this as completed Jun 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

xORMQR and xLARFB discrepancy with documentation #1011

xORMQR and xLARFB discrepancy with documentation #1011

EduardFedorenkov commented Apr 18, 2024 •

edited

Loading

langou commented Apr 18, 2024

EduardFedorenkov commented May 21, 2024

langou commented May 21, 2024

langou commented May 29, 2024

EduardFedorenkov commented May 30, 2024 •

edited

Loading

langou commented May 30, 2024 •

edited

Loading

ilayn commented Jun 6, 2024

langou commented Jun 6, 2024 •

edited

Loading

langou commented Jun 6, 2024

ilayn commented Jun 6, 2024

EduardFedorenkov commented Jun 7, 2024 •

edited

Loading

jprhyne commented Jun 7, 2024 •

edited

Loading

jprhyne commented Jun 9, 2024 •

edited

Loading

langou commented Jun 10, 2024 •

edited

Loading

EduardFedorenkov commented Jun 10, 2024

jprhyne commented Jun 10, 2024

langou commented Jun 20, 2024

xORMQR and xLARFB discrepancy with documentation #1011

xORMQR and xLARFB discrepancy with documentation #1011

Comments

EduardFedorenkov commented Apr 18, 2024 • edited Loading

langou commented Apr 18, 2024

EduardFedorenkov commented May 21, 2024

langou commented May 21, 2024

langou commented May 29, 2024

EduardFedorenkov commented May 30, 2024 • edited Loading

langou commented May 30, 2024 • edited Loading

ilayn commented Jun 6, 2024

langou commented Jun 6, 2024 • edited Loading

langou commented Jun 6, 2024

ilayn commented Jun 6, 2024

EduardFedorenkov commented Jun 7, 2024 • edited Loading

jprhyne commented Jun 7, 2024 • edited Loading

jprhyne commented Jun 9, 2024 • edited Loading

langou commented Jun 10, 2024 • edited Loading

EduardFedorenkov commented Jun 10, 2024

jprhyne commented Jun 10, 2024

langou commented Jun 20, 2024

EduardFedorenkov commented Apr 18, 2024 •

edited

Loading

EduardFedorenkov commented May 30, 2024 •

edited

Loading

langou commented May 30, 2024 •

edited

Loading

langou commented Jun 6, 2024 •

edited

Loading

EduardFedorenkov commented Jun 7, 2024 •

edited

Loading

jprhyne commented Jun 7, 2024 •

edited

Loading

jprhyne commented Jun 9, 2024 •

edited

Loading

langou commented Jun 10, 2024 •

edited

Loading