We are using the NXP LX2160A with LSDK v20.04, and we have a problem:
When we run the “poweroff” command in a terminal, the system shuts down, but after a while it reboots automatically.
2. The preliminary reason:
In the end, we found the cause:
In the QorIQ LX2160A Reference Manual, section 4.3.1.2, Reset Request Mask Register (RSTRQMR1), the table in section 4.3.1.2.4 shows:
17 MBEE_MSK
Multi-bit ECC error reset request mask
0b - Multi-bit ECC error event can cause a reset request
1b - Multi-bit ECC error event cannot cause a reset request. MBEE_MSK field must be preset to 1
to not cause a reset request due to multi-bit ECC error on boot due to some IP reading memories
before they are initialized.
3. What we did:
I masked bit 17 (set MBEE_MSK to 1), and the problem was solved.
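For readers following along, the masking step amounts to setting one bit in the RSTRQMR1 value. Below is a minimal sketch, assuming bit 17 is counted from the least significant bit (please check the bit-numbering convention in the reference manual). The helper names are hypothetical, and the code only computes register values; it does not touch hardware.

```python
# Assumption: bit 17 uses LSB-0 numbering (bit 0 is the least significant bit).
MBEE_MSK_BIT = 17
MBEE_MSK = 1 << MBEE_MSK_BIT


def mask_mbee(rstrqmr1):
    """Return the 32-bit value with MBEE_MSK set to 1 (event masked)."""
    return (rstrqmr1 | MBEE_MSK) & 0xFFFFFFFF


def unmask_mbee(rstrqmr1):
    """Return the 32-bit value with MBEE_MSK cleared to 0 (event unmasked)."""
    return rstrqmr1 & ~MBEE_MSK & 0xFFFFFFFF


def mbee_is_masked(rstrqmr1):
    """True if a multi-bit ECC error cannot cause a reset request."""
    return bool(rstrqmr1 & MBEE_MSK)
```

For example, mask_mbee(0) gives 0x00020000 under this numbering assumption.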
4. The ultimate question:
What is bit 17 used for, and why did it cause a reboot after running the “poweroff” command?
“multi-bit ECC error on boot due to some IP reading memories before they are initialized.”
What does "IP" stand for here? How does this happen?
Thank you!
Please refer to the following update from the SE team.
I reproduced the reset issue after executing poweroff on the LX2160ARDB.
Are there any logs or steps from the customer?
The reset value of MBEE_MSK is "1". When I cleared it (set it to 0, which unmasks the event), the board rebooted immediately, so I could not execute poweroff.
How did the customer do it?
If MBEE_MSK is "0", the system restarts after the “poweroff” or "shutdown -h now" command.
The log is:
[20:51:30] [ OK ] Reached target Final Step.
[20:51:30] Starting Power-Off...
[20:51:30] [ 18.673954] reboot: Power down
After this, the system restarts.
If MBEE_MSK is "1", the system does not restart after the “poweroff” or "shutdown -h now" command.
If MBEE_MSK = 0 and the system encounters an MBEE event while it is running, the RESET_REQ_B signal is asserted. If that signal is connected to HRESET on the customer board, for example, it then causes a reboot. As for the sentence that mentions "some IP", it means any entity that can perform internal ECC-monitored transactions, such as a core, DMA, etc.
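The chain described above can be written down as a tiny truth function. This is only a sketch of the logic as stated in this reply, not of the actual hardware; the function and parameter names are hypothetical.

```python
def board_reboots(mbee_event, mbee_msk, reset_req_wired_to_hreset):
    """Model of the reply above: an MBEE event asserts RESET_REQ_B only
    when MBEE_MSK is 0, and the board reboots only if RESET_REQ_B is
    connected to HRESET (a board-level design choice)."""
    reset_req_asserted = bool(mbee_event) and mbee_msk == 0
    return reset_req_asserted and bool(reset_req_wired_to_hreset)
```

With MBEE_MSK = 1 the function returns False for every input, which matches the observation that masking the bit stops the reboot.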
The reset value is 1, so by default a multi-bit ECC error event cannot cause a reset request.
I ran a trial: if the value is set to 0, the board reboots immediately and I cannot enter poweroff.
The "IP" in the MBEE_MSK description refers to the IP blocks in which a multi-bit ECC error can trigger a RESET_REQ assertion,
such as PCIe and eSDHC, as far as I know.
The PCI Express internal transmit and receive buffers are ECC protected. In order for ECC checking to perform correctly the space in the buffers must be written to first before ever performing a read to that space. This write initializes the buffers to create correct ECC syndromes. However, due to this erratum, these buffers are continually read once the PCI Express receives clocks. Because these buffers are being read before they are written, a false multi-bit ECC error is likely to occur. The multi-bit ECC error is flagged in RSTRQSR1[MBEE_RR] = 1.
The false multi-bit ECC error can trigger a RESET_REQ assertion (RSTRQSR1[MBEE_RR] = 1) if RSTRQMR1[MBEE_MSK] does not mask the assertion.
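To illustrate why reading an ECC-protected buffer before it has been written tends to raise a false multi-bit error, here is a toy sketch. It uses a small (8,4) SECDED Hamming code chosen only for illustration; the actual ECC scheme of the PCI Express buffers is not described in this thread, and all names below are hypothetical.

```python
from itertools import product


def encode(nibble):
    """Return an 8-bit SECDED codeword (list of bits) for a 4-bit value."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p4 = d[1] ^ d[2] ^ d[3]
    word = [p1, p2, d[0], p4, d[1], d[2], d[3]]  # Hamming positions 1..7
    overall = 0
    for bit in word:
        overall ^= bit
    return word + [overall]  # the last bit is the overall parity


def check(word):
    """Classify a stored 8-bit word as 'ok', 'single-bit' or 'multi-bit'."""
    syndrome = 0
    for pos in range(1, 8):
        if word[pos - 1]:
            syndrome ^= pos
    overall = 0
    for bit in word:
        overall ^= bit
    if syndrome == 0 and overall == 0:
        return "ok"
    if overall == 1:
        return "single-bit"  # correctable
    return "multi-bit"       # uncorrectable


def uninitialized_multi_bit_count():
    """Treat every possible 8-bit pattern as power-up garbage and count
    how many of them a read would report as a multi-bit error."""
    patterns = list(product([0, 1], repeat=8))
    bad = sum(1 for w in patterns if check(list(w)) == "multi-bit")
    return bad, len(patterns)
```

A word that was written through encode() always checks as "ok". In this toy code, 112 of the 256 possible unwritten patterns are reported as multi-bit errors, so a read of a never-written location is quite likely to flag one. The exact proportion for a real, wider code will differ.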
Please refer to the following update from the SE team.
I got another LX2160ARDB in the ATX board farm.
There is no reboot after executing poweroff.
Does the customer use the LX2160ARDB or their own board?
Thanks for your detailed reply. Our board is designed based on the LX2160ARDB.
Do you mean that during poweroff or shutdown, some entity such as PCIe or DMA is still trying to read RAM?
Because the power is off, the memory is not initialized, so a multi-bit ECC error occurs.
But why can this happen? Normally, a read cannot be executed when the power is off.