Problem
When running llama.cpp on RHEL9-like OS, it is noticed that a large portion of time is single-threaded, which is unexpected.
Attaching with GDB shows that it is running sgemm
provided by OpenBLAS; but OpenBLAS should run with multiple threads.
Analyze
Search engine gives this post: Red Hat Bugzilla – Bug 1589823 - The OpenBLAS package that comes with RHEL does not have multithreading capabilities
Don’t use the sequential library if you want the parallel one. Use -lopenblaso for OpenMP and -lopenblasp for pthreads.
This behavior is different from Debian-like OSs.
e.g. in Ubuntu 24.04, the OpenBLAS variant used is mananged by update-alternatives
, and only libopenblas.so
is provided.
Moreover, CMake checks libopenblas.so
only: FindBLAS.cmake @ 321b71c, line 751-805
Solution
Not sure whether package policy or FindBLAS should change. An issue is reported: cmake/cmake#26659.
Workaround is simple:
sudo mv libopenblas.so libopenblas.so.bak
sudo mv libopenblas.so.0 libopenblas.so.0.bak
sudo ln -s libopenblasp.so libopenblas.so
sudo ln -s libopenblasp.so.0 libopenblas.so.0