Problem

When running llama.cpp on a RHEL 9-like OS, a large portion of the run time turns out to be single-threaded, which is unexpected.

Attaching GDB shows that the process is inside sgemm provided by OpenBLAS, yet OpenBLAS is expected to run with multiple threads.

Analysis

A web search turns up this report: Red Hat Bugzilla – Bug 1589823 - The OpenBLAS package that comes with RHEL does not have multithreading capabilities

The answer given there: don’t use the sequential library if you want the parallel one. Use -lopenblaso for OpenMP and -lopenblasp for pthreads.
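To see which variant a given binary actually resolves to, the output of ldd can be classified by library name. The following is a minimal sketch, not a definitive tool: the function name openblas_variant is made up for illustration, the binary path in the usage comment is a placeholder, and the mapping of names to threading models simply follows the naming quoted above.

```shell
#!/bin/sh
# Classify the OpenBLAS variant named in `ldd` output read from stdin.
# Assumed naming (from the advice quoted above):
#   libopenblas -> sequential, libopenblaso -> OpenMP, libopenblasp -> pthreads
# Usage (binary path is a placeholder): ldd ./llama-cli | openblas_variant
openblas_variant() {
    # Take the resolved path (third field) of the first OpenBLAS line.
    path=$(awk '/libopenblas/ && $2 == "=>" { print $3; exit }')
    [ -n "$path" ] || { echo "not linked"; return 1; }
    # Follow symlinks, so a libopenblas.so.0 that points elsewhere
    # is reported as the variant it really is.
    real=$(readlink -f "$path")
    case "$(basename "$real")" in
        libopenblasp*) echo "pthreads" ;;
        libopenblaso*) echo "openmp" ;;
        libopenblas*)  echo "sequential" ;;
        *)             echo "unknown" ;;
    esac
}
```

Because the path is resolved with readlink -f, the answer depends on the file that is really loaded and not on the name the binary was linked against.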

This behavior differs from Debian-like OSs. For example, in Ubuntu 24.04 the OpenBLAS variant in use is managed by update-alternatives, and only libopenblas.so is provided.

Moreover, CMake's FindBLAS module only looks for libopenblas.so, so on RHEL a CMake-based build picks up the sequential library: FindBLAS.cmake @ 321b71c, lines 751-805

Solution

It is not clear whether the packaging policy or FindBLAS should change. An issue has been filed: cmake/cmake#26659.

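The workaround is simple. In the directory that holds the OpenBLAS libraries (typically /usr/lib64 on RHEL-like systems), point the libopenblas names at the pthreads variant: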

sudo mv libopenblas.so libopenblas.so.bak
sudo mv libopenblas.so.0 libopenblas.so.0.bak
sudo ln -s libopenblasp.so libopenblas.so
sudo ln -s libopenblasp.so.0 libopenblas.so.0
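Since the commands above keep the originals as .bak files, the change can be reverted by moving them back. A minimal sketch of that, assuming the backups sit next to the replaced links; restore_openblas is a hypothetical helper name, and it has to run with root privileges on a real system:

```shell
#!/bin/sh
# Undo the workaround in the directory given as $1 by restoring the
# .bak files created above. Hypothetical helper, shown for illustration.
restore_openblas() {
    dir=$1
    for name in libopenblas.so libopenblas.so.0; do
        # Only act when a backup exists (-L also catches dangling symlinks).
        if [ -e "$dir/$name.bak" ] || [ -L "$dir/$name.bak" ]; then
            rm -f "$dir/$name"                 # drop the link to libopenblasp
            mv "$dir/$name.bak" "$dir/$name"   # put the original back
        fi
    done
}
```

Names without a backup are left untouched, so calling it on a directory where the workaround was never applied changes nothing.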