In the end, we found that the GPU crash was caused by a CMA allocation failure. The failure itself came from the randomized (KASLR) placement of the kernel code/data: the original CMA alloc-range was only 1 GB, while CMA needs 640 MB of contiguous memory. If the kernel code/data happens to land in the middle of that 1 GB range, neither the space below it nor the space above it is large enough, and CMA cannot allocate the required 640 MB block.
Both Yocto 4.2 and Yocto 5.0 have KASLR enabled, but I still cannot figure out why this issue does not occur with Yocto 4.2. For Yocto 4.0, the issue does not occur because KASLR is disabled due to the lack of a seed.
As a workaround, either increase the CMA alloc-range size to 2 GB or disable KASLR.
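
To make the arithmetic concrete, here is a rough Python sketch of why a 1 GB range can fail while a 2 GB range cannot. It is only a simplified model, not what the kernel actually does: it treats the kernel image as a single pinned block inside the alloc-range and ignores alignment requirements and any other reserved regions. The 32 MB kernel footprint, the 2 MB placement step, and the function names are assumptions made for illustration; only the 1 GB, 2 GB, and 640 MB figures come from the investigation above.

```python
MB = 1 << 20

CMA_SIZE = 640 * MB      # contiguous block CMA needs (from the report)
KERNEL_SIZE = 32 * MB    # ASSUMED kernel code/data footprint, illustration only
STEP = 2 * MB            # ASSUMED granularity of the randomized placement


def largest_free_block(window_size, kernel_offset, kernel_size):
    """Largest contiguous free span in a window that contains one pinned block."""
    below = kernel_offset                                # free space before the kernel
    above = window_size - (kernel_offset + kernel_size)  # free space after the kernel
    return max(below, above)


def cma_fits(window_size, kernel_offset, kernel_size=KERNEL_SIZE, cma_size=CMA_SIZE):
    """True if the CMA block still fits somewhere in the window."""
    return largest_free_block(window_size, kernel_offset, kernel_size) >= cma_size


def worst_case_free(window_size, kernel_size=KERNEL_SIZE, step=STEP):
    """Smallest 'largest free block' over every possible kernel offset."""
    last_offset = window_size - kernel_size
    return min(
        largest_free_block(window_size, offset, kernel_size)
        for offset in range(0, last_offset + 1, step)
    )


if __name__ == "__main__":
    for window_mb in (1024, 2048):
        window = window_mb * MB
        worst = worst_case_free(window)
        verdict = "always fits" if worst >= CMA_SIZE else "can fail"
        print(f"{window_mb} MB range: worst-case free block {worst // MB} MB, CMA {verdict}")
```

Under these assumptions, a kernel placed near the middle of the 1 GB range leaves less than 640 MB on either side, whereas with a 2 GB range even the worst placement leaves more than 640 MB. Disabling KASLR avoids the problem in a different way, by making the placement fixed instead of random.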