We have an odd situation and would appreciate any suggestions. We've been building i.MX8M Mini-based boards for a while, running Yocto Linux with kernel 5.15.32. Part numbers:
CPU: MIMX8MM6DVTLZAA (quad-core i.MX8M Mini)
RAM: Micron MT53E256M32D2DS-053 AAT:B (LPDDR4, 1 GB, x32, single chip)
Recently we've had a batch of boards that crash within minutes with nothing but the Linux OS running: no application, at room temperature. If we run 4 instances of memtester (one on each CPU), we also get memtester errors. That looks like a simple RAM problem, right?
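For anyone trying to reproduce this kind of per-CPU load without memtester, here is a minimal sketch, assuming Linux with glibc and a build with `-pthread`. It is not memtester and is far less thorough: it starts one thread per online CPU, pins each thread with `sched_setaffinity`, and runs a simple pattern/complement write-and-read-back check on a private buffer. The names `pattern_check`, `run_per_cpu_check`, the patterns and the buffer size are hypothetical choices for illustration, and it assumes online CPU IDs are contiguous from 0. To actually stress DRAM rather than the caches, the per-CPU buffer would need to be much larger than the caches.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Write an address-dependent pattern, then its complement, over buf and
   count read-back mismatches. volatile keeps the compiler from skipping
   the memory accesses. */
static size_t pattern_check(volatile uint32_t *buf, size_t words, uint32_t pattern)
{
    assert(buf != NULL);
    size_t errors = 0;
    for (int pass = 0; pass < 2; pass++) {
        uint32_t p = pass ? ~pattern : pattern;
        for (size_t i = 0; i < words; i++)
            buf[i] = p ^ (uint32_t)i;
        for (size_t i = 0; i < words; i++)
            if (buf[i] != (p ^ (uint32_t)i))
                errors++;
    }
    return errors;
}

struct worker {
    int cpu;        /* CPU this thread tries to pin itself to */
    size_t words;   /* buffer size in 32-bit words */
    size_t errors;  /* mismatches found, SIZE_MAX if allocation failed */
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(w->cpu, &set);
    /* pid 0 means the calling thread. Best effort: in a restricted cpuset
       this can fail, and the thread then simply runs unpinned. */
    (void)sched_setaffinity(0, sizeof set, &set);

    uint32_t *buf = malloc(w->words * sizeof *buf);
    if (!buf) {
        w->errors = SIZE_MAX;
        return NULL;
    }
    w->errors = pattern_check(buf, w->words, 0xA5A5A5A5u)
              + pattern_check(buf, w->words, 0x00000000u);
    free(buf);
    return NULL;
}

/* Run one worker per online CPU (assumes CPU IDs 0..n-1).
   Returns the total mismatch count, or SIZE_MAX on a setup failure. */
static size_t run_per_cpu_check(size_t words_per_cpu)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    if (n < 1)
        n = 1;
    struct worker *w = calloc((size_t)n, sizeof *w);
    pthread_t *t = calloc((size_t)n, sizeof *t);
    if (!w || !t) {
        free(w);
        free(t);
        return SIZE_MAX;
    }

    long started = 0;
    for (; started < n; started++) {
        w[started].cpu = (int)started;
        w[started].words = words_per_cpu;
        if (pthread_create(&t[started], NULL, worker_main, &w[started]) != 0)
            break;
    }

    size_t total = (started < n) ? SIZE_MAX : 0;  /* not all workers started */
    for (long i = 0; i < started; i++) {
        pthread_join(t[i], NULL);
        if (w[i].errors == SIZE_MAX)
            total = SIZE_MAX;
        else if (total != SIZE_MAX)
            total += w[i].errors;
    }
    free(w);
    free(t);
    return total;
}
```

On a board booted with `nr_cpus=2`, `sysconf(_SC_NPROCESSORS_ONLN)` should report 2, so the same call would exercise only two CPUs, roughly mirroring the two configurations compared in this question.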
Here's the question. If we add the kernel argument:
nr_cpus=2
so that the OS uses only 2 CPUs, everything magically becomes stable. The boards run for days, and we can run 2 instances of memtester (one on each CPU) indefinitely.
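One way to confirm that `nr_cpus=2` took effect is to read `/sys/devices/system/cpu/online`, which the kernel reports in its "cpulist" format (for example `0-3` or `0,2-3`). Below is a minimal sketch in C that parses that format and counts the listed CPUs; the function names `count_cpulist` and `online_cpus_from_sysfs` are hypothetical, and the parser only handles plain comma-separated numbers and ranges.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Count CPUs in a kernel cpulist string such as "0-3" or "0,2-3".
   Stops at end of string or newline. Returns -1 on a malformed entry. */
static int count_cpulist(const char *s)
{
    assert(s != NULL);
    int count = 0;
    while (*s && *s != '\n') {
        char *end;
        long lo = strtol(s, &end, 10);
        if (end == s)
            return -1;                 /* no number where one was expected */
        long hi = lo;
        if (*end == '-') {             /* range "lo-hi" */
            s = end + 1;
            hi = strtol(s, &end, 10);
            if (end == s || hi < lo)
                return -1;
        }
        count += (int)(hi - lo + 1);
        s = end;
        if (*s == ',')
            s++;                       /* next entry */
    }
    return count;
}

/* Read /sys/devices/system/cpu/online; returns the CPU count or -1. */
static int online_cpus_from_sysfs(void)
{
    char line[256];
    FILE *f = fopen("/sys/devices/system/cpu/online", "r");
    if (!f)
        return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok ? count_cpulist(line) : -1;
}
```

With `nr_cpus=2` on a quad-core i.MX8M Mini, the file would be expected to read `0-1`, so `online_cpus_from_sysfs()` should return 2 there.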
There's only a single RAM interface and a single RAM chip. So why would the RAM be stable with 2 CPUs but not with 4? What is different about 4 CPUs accessing RAM compared with 2, given that we are hammering RAM with multiple memtester instances in both cases?
Thanks.
I checked the crash log and ran the same memory test with the same kernel version. There was no such issue with 4 memtester instances.
From the crash log, the sync error is related to page and cache swapping, which could point to either the DDR or the kernel. Can you try other kernel versions, such as 5.15.71, 6.1.55 and 6.6.3?