I solved my problem in the meantime.
The segmentation fault was caused by an access to the memory address 0xffffffff. This happened when the rmu driver (rmu_driver.c) tried to mmap some CPU registers. The mmap fails and (correctly) returns MAP_FAILED, i.e. (void *)-1, which the driver fails to notice: it only checks the result for 0/NULL, not for MAP_FAILED/-1. So it carries on and tries to access the memory at -1/0xffffffff.
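To illustrate the missing check, here is a minimal sketch of how the mmap result could be tested in userspace. It is not the code from rmu_driver.c; the helper name map_registers and its parameters are hypothetical and only serve the example.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

/*
 * Hypothetical helper: map `len` bytes of `fd` starting at `offset`.
 * Returns NULL on failure, so callers can keep a plain NULL check.
 */
static void *map_registers(int fd, size_t len, off_t offset)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);

    /* mmap reports failure with MAP_FAILED ((void *)-1), never with NULL. */
    if (p == MAP_FAILED)
        return NULL;

    return p;
}
```

With a wrapper like this, a caller that only tests for NULL no longer dereferences (void *)-1 when the mapping is refused.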
So I looked at the mmap mechanism behind rmu_driver.c. It lives in the kernel source file uio.c, which is the generic userspace I/O driver. The mmap call from rmu_driver.c ends up there because the rmu kernel module fsl_rmu_uio.c does not have its own mmap implementation. The uio_mmap function in uio.c checks a few things, among them the requested memory size against the allowed memory size of the rmu unit (specifically message units 0 and 1 and the doorbell unit). This is where the mmap fails: the allowed size is set to 0x100 for the rmu message units and 0x80 for the doorbell unit, but a call to mmap always requests AT LEAST one page (4096 bytes). Since 4096 is bigger than both 0x100 and 0x80, the check fails and mmap returns MAP_FAILED.
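The size comparison can be modelled in plain C. This is a simplified userspace sketch of the check as I understand it, assuming a 4096-byte page; it is not the actual code from uio.c, and the names PAGE_SIZE_BYTES, round_up_to_page and mmap_request_fits are made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed page size; the real value depends on the kernel configuration. */
#define PAGE_SIZE_BYTES 4096u

/* A mapping always covers whole pages, so round the request up. */
static size_t round_up_to_page(size_t len)
{
    return (len + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES * PAGE_SIZE_BYTES;
}

/* Model of the check: the page-rounded request must not exceed the region. */
static bool mmap_request_fits(size_t requested_len, size_t region_size)
{
    return round_up_to_page(requested_len) <= region_size;
}
```

Under these assumptions, mmap_request_fits(0x100, 0x100) and mmap_request_fits(0x80, 0x80) are both false, because the request is rounded up to 4096 bytes, while mmap_request_fits(0x100, 0x1000) is true.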
The mem sizes (0x100 and 0x80) are set via the device tree binding for the rmu unit, more specifically the "reg" property of the message unit nodes and the doorbell unit node. Increasing the size there to 0x1000 is one option to let the mmap call succeed, but I am not sure if that is wise. There might be other kernel/driver components that read the device tree and use those sizes for other things, so I cannot judge the consequences of this possible solution.
My current solution is to add an mmap function to the fsl_rmu_uio.c kernel module that does not perform this size check. It works fine, and I can now send and receive doorbells between two P4080 boards. However, I am not sure if this is an optimal solution.
I want to investigate some more and see if there is a more elegant solution, but it works now and I can use the rmu application as well as the fsl_rmu_uio.ko kernel module just fine.
Any input from you guys is appreciated. Maybe this helps someone else as well.
Best regards,
Andre