We have an odd situation and would appreciate any suggestions. We've been building iMX8 Mini based boards for a while, running Yocto Linux 5.15.32. Part numbers:
CPU: MIMX8MM6DVTLZAA (quad core iMX8 mini)
RAM: Micron MT53E256M32D2DS-053 AAT:B (LPDDR4 1 GB, x32 wide, single chip)
Recently we've had a batch of boards that crash within minutes while just running the Linux OS, with no application running, at room temperature. If we run 4 instances of memtester (one on each of the CPUs), we can get memtester errors. Looks like a simple RAM problem, right?
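For anyone wanting to reproduce the test, here is a minimal sketch of launching one memtester per CPU. It assumes `taskset` and `memtester` are on the PATH; the helper names, the 64M size, and the loop count are placeholders for illustration, not our exact settings.

```python
import subprocess


def memtester_commands(cpus, size="64M", loops=1):
    """Build one CPU-pinned memtester command line per CPU index.

    size and loops are illustrative defaults (assumptions).
    """
    return [
        ["taskset", "-c", str(cpu), "memtester", size, str(loops)]
        for cpu in cpus
    ]


def run_pinned(cpus, size="64M", loops=1):
    """Start all instances in parallel; return {cpu: exit_code}."""
    cpus = list(cpus)
    procs = {}
    for cpu, cmd in zip(cpus, memtester_commands(cpus, size, loops)):
        # Each instance is pinned to a single core via taskset.
        procs[cpu] = subprocess.Popen(cmd)
    # Wait for every instance; a nonzero exit code indicates a failed run.
    return {cpu: proc.wait() for cpu, proc in procs.items()}
```

Calling `run_pinned(range(4))` would start the four pinned instances at once, which is the point: all cores hit the single LPDDR4 interface at the same time.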
Here's the question. If we add the kernel argument:
nr_cpus=2
With only 2 CPUs used by the OS, everything magically becomes stable. The boards will run for days, and we can run 2 instances of memtester (one on each CPU) indefinitely.
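To confirm the argument actually took effect on a given board, something like the following could be used. It is only a sketch: the function names are made up for illustration, and it assumes the standard `/proc/cmdline` and `/sys/devices/system/cpu/online` files are available.

```python
def nr_cpus_from_cmdline(cmdline):
    """Return the nr_cpus= value from a kernel command line, or None."""
    for token in cmdline.split():
        if token.startswith("nr_cpus="):
            return int(token.split("=", 1)[1])
    return None


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as '0-1' or '0,2-3' into indices."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def report(cmdline_path="/proc/cmdline",
           online_path="/sys/devices/system/cpu/online"):
    """Read the requested CPU limit and the CPUs that are online."""
    with open(cmdline_path) as f:
        limit = nr_cpus_from_cmdline(f.read())
    with open(online_path) as f:
        online = parse_cpu_list(f.read())
    return limit, online
```

On a board booted with `nr_cpus=2`, `report()` should return a limit of 2 and two online CPUs; if it reports four online CPUs, the argument did not reach the kernel.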
There's only a single RAM interface, and a single RAM chip. So why would the RAM be stable for 2 CPUs and yet not for 4? What's the difference between 2 CPUs accessing RAM versus 4 CPUs accessing RAM (even when hammering RAM using multiple memtesters)?
Thanks.