TL;DR: this is a stack overflow caused by a large array on the stack, so raise the stack size limit with ulimit -s <stack size in KiB>

Incident

An error occurred while processing raw baseband data from a radio telescope with dspsr:

$ dspsr -b 4194304 -D 56.716 -A -L 1.073741824 -c 1.073741824  -O ${file}_128 -e rf -F 128:D ${file}.bin -U 4096

Only single polarization detection available
dspsr: Single archive with multiple sub-integrations
dspsr: dedispersion filter length=131072 (minimum=8192) complex samples
dspsr: 128 channel dedispersing filterbank requires 33554432 samples
dspsr: blocksize=330382096 samples or 4096 MB
dsp::Fold::choose_nbin WARNING Requested nbin=4194304 > sensible nbin=2097152.  Where:
  sampling period     = 0.000256 ms and
  requested bin width = 0.000256 ms

dsp::Archiver::finish archive '13835058401541322426_128.rf' with 1 integrations
62305 Segmentation Fault (Core dumped)

Analyze

Recompile dspsr and psrchive with debug info (CFLAGS=-g, CXXFLAGS=-g), then run dspsr under gdb. The segmentation fault is triggered again:

Program received signal SIGSEGV, Segmentation fault.
0x00000000008188d4 in fcompwrite (nvals=4194304, vals=0x7fffd193c010, fptr=0xcb5be0) at fcomp.C:103
103       if (scale == 0.0 || isnan(scale)) {
=> 0x00000000008188d4 <+433>:   call   0x4fdc93 <_ZSt5isnanf>

with the following register values:

(gdb) info reg
rax            0x4a3c0d03          1245449475
rbx            0x7fffffffd500      140737488344320
rcx            0x10                16
rdx            0xfffffc            16777212
rsi            0x7fffd193c010      140736709509136
rdi            0x400000            4194304
rbp            0x7fffffffd560      0x7fffffffd560
rsp            0x7fffff7fd500      0x7fffff7fd500
r8             0x400000            4194304
r9             0x0                 0
r10            0x400000            4194304
r11            0x0                 0
r12            0x0                 0
r13            0x7fffffffdc50      140737488346192
r14            0x0                 0
r15            0x0                 0
rip            0x8188d4            0x8188d4 <fcompwrite(unsigned int, float const*, _IO_FILE*)+433>
eflags         0x10202             [ IF RF ]
cs             0x33                51
ss             0x2b                43
...

Strange: why would calling isnan() cause a segmentation fault? Yet evaluating it in gdb fails as well:

(gdb) print isnan(scale)
Cannot access memory at address 0x7fffff7fd47f

The only value that looks related to this address is rsp = 0x7fffff7fd500, but then what?

… if you know little about x86 assembly

Scanning the code around fcomp.C:103, one array definition stands out:

unsigned short int packed_buf [nvals];

where, in this context, nvals = 4194304.

This array lives on the stack, and a large array often does not fit there (depending on the runtime configuration), so this segmentation fault may actually be a stack overflow.

… if you know a little about x86 assembly

Register rsp points to the current top of the stack, and the stack grows downward. The inaccessible address 0x7fffff7fd47f is rsp - 0x81, so it lies just below the current stack pointer. If that location cannot be accessed, it is outside the stack area, which points to a stack overflow. The register dump also shows rbp - rsp = 0x800060, so the current frame of fcompwrite is slightly larger than 8 MiB.
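The address arithmetic can be checked directly in a shell. This is only a small sketch: the three values are copied from the gdb session above, and the variable names (rsp, rbp, fault, gap, frame) are made up for illustration.

```shell
# Values copied from the gdb output above.
rsp=0x7fffff7fd500      # stack pointer at the time of the crash
rbp=0x7fffffffd560      # frame pointer of fcompwrite
fault=0x7fffff7fd47f    # address gdb could not access

# Distance from the stack pointer down to the inaccessible address.
gap=$(( $rsp - $fault ))

# Size of the current frame (locals, including packed_buf, sit between rsp and rbp).
frame=$(( $rbp - $rsp ))

printf 'rsp - fault = 0x%x\n' "$gap"
printf 'rbp - rsp   = 0x%x (%d KiB + %d bytes)\n' \
    "$frame" "$(( frame / 1024 ))" "$(( frame % 1024 ))"
```

The first line should print 0x81 and the second 0x800060, i.e. 8192 KiB plus 96 bytes, which matches one 8 MiB array plus a handful of other locals.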

Verify

This machine has a default stack size of 8 MiB:

$ ulimit -s
8192

but the size of an unsigned short int [nvals] array is 2 bytes * 4194304 = 8388608 bytes = 8 MiB, which alone uses up the whole stack limit, so pushing another stack frame (calling isnan() or any other function) causes a segmentation fault immediately.
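The same comparison as a quick shell calculation. This is a sketch under stated assumptions: elem_bytes=2 assumes sizeof(unsigned short int) is 2 bytes on this platform, stack_kib is the value reported by ulimit -s above, and all variable names are hypothetical.

```shell
nvals=4194304       # number of elements in packed_buf for this run
elem_bytes=2        # assumed sizeof(unsigned short int)
stack_kib=8192      # value printed by `ulimit -s` on this machine

array_bytes=$(( nvals * elem_bytes ))
array_kib=$(( array_bytes / 1024 ))

echo "packed_buf: ${array_bytes} bytes = ${array_kib} KiB"
echo "stack limit: ${stack_kib} KiB"

if [ "$array_kib" -ge "$stack_kib" ]; then
    verdict="array alone fills or exceeds the stack limit"
else
    verdict="array fits below the stack limit"
fi
echo "$verdict"
```

Replacing stack_kib with "$(ulimit -s)" makes the check use the live limit, provided the limit is a number rather than "unlimited".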

Solution / Expedient

Raise the stack size limit with ulimit -s <stack size in KiB> before running dspsr with these parameters.
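A minimal sketch of applying the workaround in a subshell, so the raised limit affects only that one run. The value 65536 KiB is an arbitrary assumption chosen to be well above the 8 MiB array, and new_limit_kib and applied_kib are hypothetical names.

```shell
# Hypothetical new soft limit in KiB; anything comfortably above 8192 should do here.
new_limit_kib=65536

# The subshell keeps the change local; child processes started inside inherit it.
applied_kib=$(
    if ulimit -s "$new_limit_kib" 2>/dev/null; then
        ulimit -s
        # Run the failing dspsr command from the Incident section here.
    fi
)

if [ -n "$applied_kib" ]; then
    echo "stack limit inside subshell: ${applied_kib} KiB"
else
    echo "could not raise the stack limit (hard limit too low?)" >&2
fi
```

An unprivileged user can raise the soft limit only up to the hard limit, which ulimit -H -s reports.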