Update to use libblastrampoline #3

Open
ViralBShah opened this issue Mar 7, 2021 · 21 comments

@ViralBShah

It would be great to use the same mechanism that MKL.jl uses now, and leverage libblastrampoline.

https://github.com/JuliaLinearAlgebra/MKL.jl/blob/master/src/MKL.jl

@xrq-phys
Collaborator

xrq-phys commented Mar 7, 2021

Hi.

Thanks for reaching out.
I'm not familiar with libblastrampoline, but what I want to highlight is that BLIS provides a more flexible API than standard BLAS (e.g. generic strides and mixed precision), and I want to make use of it.

At the moment, simply substituting the backend seems insufficient for that.

@ViralBShah
Author

Right, BLIS provides a more flexible API. We should also be able to provide a way for BLIS to replace the underlying Julia BLAS with only one line of code. I will try this out and report findings - but we first need to do some more work on the LAPACK front.

@xrq-phys
Collaborator

xrq-phys commented Mar 7, 2021

I once mimicked MKL.jl and created this toy.

I could put the switcher code directly inside this repo, but trying out libblastrampoline seems more interesting.

@ViralBShah
Author

ViralBShah commented Mar 7, 2021

Basically, lbt_forward in current Julia master (1.7-dev) allows you to switch the underlying BLAS for all routines to a new one, such as MKL or potentially BLIS, without having to rebuild the system image.

The only thing is that both OpenBLAS and MKL provide the full LAPACK, but when we use BLIS, we probably want to compile our own LAPACK from source and provide it in BinaryBuilder.

cc @staticfloat

@xrq-phys
Collaborator

xrq-phys commented Mar 7, 2021 via email

@ViralBShah
Author

ViralBShah commented Mar 7, 2021

In order to use BLIS in Julia, we will just have LAPACK link to BLIS' BLAS API through LBT. Then all packages that need BLAS can use BLIS and we can see how it performs.

Separately this package and FLAME.jl in the future can explore further capabilities as you articulate.

@ViralBShah
Author

@carstenbauer
Member

Just wanted to drop a +1 here. Getting MKL via a simple using MKL is awesome. Would be great for BLIS too!

@carstenbauer
Member

carstenbauer commented Apr 22, 2022

FWIW,

using blis_jll
using LinearAlgebra
BLAS.lbt_forward(blis)

seems to work nicely (up to the fact that the remaining LAPACK doesn't use / link against BLIS as mentioned by @ViralBShah above, see also here). I would really like to have a MKL.jl-like package for BLIS that does this simple BLAS switching via LBT. As I understand it from the comments above, the package here (BLIS.jl) currently has a different goal / approach. Is this correct (@xrq-phys)? Should I therefore create a new package, say, "BLISBLAS.jl"?

Side comment: I realized that for the stacked OpenBLAS + BLIS (example above) the function BLAS.set_num_threads(N) sets the number of OpenBLAS threads. Is there a way to also set the BLIS threads or, more generally, the threads of a specific BLAS/LAPACK in the LBT stack (cc @staticfloat)? For now I use

blis_get_num_threads() = @ccall blis.bli_thread_get_num_threads()::Cint
blis_set_num_threads(nthreads) = @ccall blis.bli_thread_set_num_threads(nthreads::Cint)::Cvoid

@staticfloat
Member

It looks to me like LBT should already know how to deal with BLIS.

There is a completely generic way in which you can register get/set_numthreads functions for your own BLAS library, but BLIS should already be handled natively.

@carstenbauer
Member

carstenbauer commented Apr 22, 2022

Thanks for the info, that's good to know. But what if I have multiple BLAS/LAPACK libraries stacked on top of each other? Unless I'm missing something, BLAS.get_num_threads/BLAS.set_num_threads doesn't allow me to specify the library. Do we need to extend the API here or is there another way to access the registered get/set_num_threads functions?

UPDATE: According to the doc strings for lbt_get/set_num_threads I should get/set the num threads of all libraries at the same time. But that doesn't seem to be the case?

julia> using LinearAlgebra

julia> BLAS.get_num_threads()
8

julia> using blis_jll

julia> BLAS.lbt_forward(blis; clear=false)
157

julia> BLAS.get_num_threads()
8

julia> blis_get_num_threads() = @ccall blis.bli_thread_get_num_threads()::Cint;

julia> blis_set_num_threads(nthreads) = @ccall blis.bli_thread_set_num_threads(nthreads::Cint)::Cvoid;

julia> blis_get_num_threads()
-1

julia> blis_set_num_threads(2)

julia> blis_get_num_threads()
2

julia> BLAS.get_num_threads()
8

julia> BLAS.set_num_threads(3)

julia> blis_get_num_threads()
2
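
For reference, the libraries currently stacked in LBT can also be listed programmatically. This is a minimal sketch that only uses the LinearAlgebra standard library (Julia 1.7 or later); the helper name `lbt_stack` is hypothetical, and the snippet only inspects the stack, it does not set per-library thread counts.

```julia
using LinearAlgebra

# Hypothetical helper: one (file name, interface) pair per library that
# libblastrampoline currently forwards to. `loaded_libs`, `libname` and
# `interface` are fields of the config object returned by BLAS.get_config().
function lbt_stack()
    return [(basename(lib.libname), lib.interface)
            for lib in BLAS.get_config().loaded_libs]
end

for (name, interface) in lbt_stack()
    println(name, " => ", interface)
end
```

After a `BLAS.lbt_forward(blis; clear=false)` call, one would expect both the default BLAS and BLIS to show up in this list.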

@carstenbauer
Member

FYI: https://github.com/carstenbauer/BLISBLAS.jl

@xrq-phys
Collaborator

@carstenbauer I think LBT's failure to set the number of threads is due to this line. libblastrampoline appends the 64_ suffix to all library subroutines, not just the BLAS ones, while BLIS is built with the suffix only on the latter.

@xrq-phys
Collaborator

Sorry not really.

BLIS DOES have a 64_ suffix, but in the form of bli_thread_set_num_threads_64_ instead of bli_thread_set_num_threads64_.

I suppose in this case we should amend libblastrampoline, since BLIS in the 32-bit case also yields bli_thread_set_num_threads_.

@staticfloat
Member

staticfloat commented Apr 22, 2022

You can teach LBT about your thread function name with the following Julia code:

julia> using Libdl, blis_jll, libblastrampoline_jll
       getter = Libdl.dlsym(blis_jll.blis_handle, "bli_thread_get_num_threads_64_")
       setter = Libdl.dlsym(blis_jll.blis_handle, "bli_thread_set_num_threads_64_")
       @ccall libblastrampoline.lbt_register_thread_interface(getter::Ptr{Cvoid}, setter::Ptr{Cvoid})::Cvoid

Note that the 32-bit version of BLIS calls its thread setter function bli_thread_set_num_threads; no trailing underscore. I think there may be a small naming incongruity here.

EDIT: Whoops, I mis-read my own API, this code chunk is wrong.

@xrq-phys
Collaborator

xrq-phys commented Apr 23, 2022

Sorry I made a mistake.

In BLIS only the setter has F77 interface:

  • bli_thread_set_num_threads_64_ for 64-bit.
  • bli_thread_set_num_threads_ for 32-bit.

while bli_thread_set_num_threads is presented as C interface. So there's no incongruity here.

The problem is that bli_thread_get_num_threads doesn't have an F77-style counterpart, i.e. it is only accessible via a C-style call.

Another issue: while Julia deploys 64-bit BLAS by default, the thread-num setter always passes in 32-bit integers. In contrast, bli_thread_set_num_threads_ is LP64/ILP64 aware. I fear that the upper 32 bits of the value lbt_set_num_threads() passes in would break the library. The thread-setting routine used by lbt_set_num_threads is void (int) while bli_thread_set_num_threads_ is an F77 interface void (int *), while the C interface bli_thread_set_num_threads takes 64-bit integers instead of 32-bit ones.

Btw line#14 and line#21 seem to have reversed setter and getter.

@xrq-phys
Collaborator

Perhaps, at least the generic registration method should support thread-num setter with and without the 64_ extension, while preferring the one with an extension.

@staticfloat
Member

In BLIS only the setter has F77 interface:

  • bli_thread_set_num_threads_64_ for 64-bit.
  • bli_thread_set_num_threads_ for 32-bit.

while bli_thread_set_num_threads is presented as C interface. So there's no incongruity here.

I'm a little confused here; is bli_thread_set_num_threads supposed to have a trailing underscore or not? Here's what I see from the blis_jll that I can download right now:

julia> using blis_jll
       run(`/bin/bash -c "nm $(blis_jll.blis_path) | grep bli_thread_set_num_threads"`)
0000000000a95520 T bli_thread_set_num_threads
0000000000a703e0 T bli_thread_set_num_threads_64_

So what I see here is that one symbol has no trailing underscore, whereas another does have the trailing underscore. I call this a trailing underscore because the ILP64 symbol suffix that the BLIS library uses (as detected by LBT) is 64_. You can see this with the following:

julia> using LinearAlgebra, blis_jll
       BLAS.lbt_forward(blis_jll.blis_path; verbose=true)
Generating forwards to /home/sabae/.julia/artifacts/b548e034d149feec83ed78f22ab942fea1ac3d12/lib/libblis.so
 -> Autodetected symbol suffix "64_"
 -> Autodetected interface ILP64 (64-bit)
 -> Autodetected gfortran calling convention
Processed 4945 symbols; forwarded 157 symbols with 64-bit interface and mangling to a suffix of "64_"
157

This symbol suffix is detected by probing for a few F77 names with a few suffixes, and if we look at the names for those symbols that are exported from BLIS:

julia> using blis_jll
       run(`/bin/bash -c "nm $(blis_jll.blis_path) | grep isamax"`);
0000000000a61340 T isamax_64_

We see that the canonical name isamax_ has 64_ suffixed to it. Now, for consistency's sake (and to allow for loading of libraries that export BOTH ILP64 and LP64 interfaces in a single .so!) LBT expects all exported names to follow a consistent naming rule, which is that the "canonical" names (whether C or FORTRAN) are suffixed reliably. This means that, for instance, if your LP64 symbol is called bli_thread_set_num_threads, then the ILP64 symbol is named bli_thread_set_num_threads64_. Otherwise, LBT has no hope of automatically finding all the different symbols. This is what I mean when I say that there is a symbol naming inconsistency.
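
The naming rule described here can be sketched in a few lines of plain Julia. The helper `expected_ilp64_name` is hypothetical and not part of LBT; it only illustrates "append the detected suffix to the canonical name", using the symbol names quoted in this thread.

```julia
# Hypothetical helper: the ILP64 name expected under the rule above is simply
# the canonical (LP64) name with the detected suffix appended.
expected_ilp64_name(canonical::AbstractString, suffix::AbstractString = "64_") =
    canonical * suffix

# An F77 BLAS symbol: the canonical name already ends in an underscore.
blas_name = expected_ilp64_name("isamax_")

# The C-style BLIS thread setter: the canonical name has no trailing underscore.
setter_name = expected_ilp64_name("bli_thread_set_num_threads")

# The symbol that `nm` actually reports for BLIS in this thread.
exported_by_blis = "bli_thread_set_num_threads_64_"

println(blas_name)
println(setter_name)
println(setter_name == exported_by_blis)
```

The last comparison is where the mismatch shows up: the name built by the rule lacks the extra underscore that the exported BLIS symbol carries before `64_`.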

The thread-setting routine used by lbt_set_num_threads is void (int) while bli_thread_set_num_threads_ is an F77 interface void (int *), while the C interface bli_thread_set_num_threads takes 64-bit integers instead of 32-bit ones.

Are you using a different version of libblis than I am? I do not have both bli_thread_set_num_threads and bli_thread_set_num_threads_ in my version. I'm using v0.9.0+0 of the JLL. In any case, if there were a C interface that takes 64-bit integers that's fine, as C passes arguments through registers, so when we pass a 32-bit integer it gets zero-extended. The FORTRAN interface would indeed be a problem though.
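
To make the difference between the two calling styles concrete, here is a self-contained sketch that does not touch BLIS at all. The two Julia functions are hypothetical stand-ins: one receives the thread count by value, like a C-style `void (int)` setter, and the other receives a pointer to it, like an F77-style `void (int *)` setter. They return the value they saw so the result can be checked.

```julia
# Hypothetical stand-ins, NOT BLIS symbols.
c_style(n::Cint)::Cint = n                       # by value, like f(int)
f77_style(p::Ptr{Cint})::Cint = unsafe_load(p)   # by reference, like f(int *)

# Turn both into C-callable function pointers.
c_ptr   = @cfunction(c_style, Cint, (Cint,))
f77_ptr = @cfunction(f77_style, Cint, (Ptr{Cint},))

# By value: the integer itself is passed.
by_value = ccall(c_ptr, Cint, (Cint,), 4)

# By reference: Ref{Cint} makes ccall pass the address of a boxed Cint.
by_ref = ccall(f77_ptr, Cint, (Ref{Cint},), 4)

println((by_value, by_ref))
```

Passing a plain integer to the by-reference variant would make the callee treat that integer as an address, which is why a caller that only knows the `void (int)` shape cannot safely drive an F77-style setter.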

Btw line#14 and line#21 seem to have reversed setter and getter.

Good catch! Swapped in JuliaLinearAlgebra/libblastrampoline@145bb64

Perhaps, at least the generic registration method should support thread-num setter with and without the 64_ extension, while preferring the one with an extension.

The generic registration method doesn't pay any attention to names; it relies on you to do the dlsym() manually, then just pass in raw function pointer addresses. So you can do what I mentioned in the code snippet in my previous message and use that directly (with the C interface version of the symbols) and things should "just work".

@xrq-phys
Collaborator

This line seems to work only on strings?

@xrq-phys
Collaborator

@staticfloat To answer your question: the current BLIS configuration builds bli_thread_set_num_threads_ for 32-bit machines and bli_thread_set_num_threads_64_ for 64-bit machines, while bli_thread_set_num_threads (the one without an underscore) is always built as a BLIS-defined C interface.

Anyway, since libblastrampoline does not pass in pointers, I'd stick to bli_thread_set_num_threads without an underscore and manually create a bli_thread_set_num_threads64_ counterpart.

@jd-foster

The issue observed above (#3 (comment)) should be fixed with the latest update to the Yggdrasil recipe (JuliaPackaging/Yggdrasil#7448).
@carstenbauer As verification, it seems to work now in tandem with the direct calls wrapped in BLISBLAS.jl:

julia> import BLISBLAS
[ Info: Precompiling BLISBLAS [6f275bd8-fec0-4d39-945b-7e95a765fa1e]

julia> using LinearAlgebra

julia> BLAS.get_num_threads()
6

julia> BLAS.get_config()
LinearAlgebra.BLAS.LBTConfig
Libraries:
├ [ILP64] libopenblas64_.0.3.21.dylib
└ [ILP64] libblis.4.0.0.dylib

julia> BLAS.set_num_threads(42)

julia> BLISBLAS.get_num_threads()
42
