Why would a DSB instruction never complete?


8,443 Views
chriscowdery
Contributor V

Hi All,

MIMXRT1021DAF5A, IAR compiler, FreeRTOS, code in Serial Flash (SQPI)

 I have a DSB instruction that never completes. This causes a processor 'lockup'. This is the sequence of events:

1. Code executing normally

2. SysTick interrupt occurs.

3. System 'stops'. (I do not know why it stops - I guess the pipeline has just fetched the DSB instruction)

4. Using J-Link Commander, I 'halt' and read register contents. I can confirm PC = first instruction of SysTick handler. Stack & registers OK, and consistent.

5. Using J-Link Commander, I 'single step' :

6. PUSH {R7,LR} - OK

7. MOVS R0,#32 - OK

8. MSR BASEPRI,R0 - OK

9. DSB #15 - Any J-Link Commander command after this reports 'CPU is not halted!'. Trying to 'halt' the CPU says 'WARNING: CPU could not be halted'

I think this means the DSB instruction never completed. I think the CPU is waiting for an input to allow DSB to complete which never comes.
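
For readers following the disassembly: `MOVS R0,#32` / `MSR BASEPRI,R0` / `DSB` looks like the interrupt-masking prologue that the FreeRTOS Cortex-M ports place at the start of the tick handler, where the value written to BASEPRI is a priority level shifted into the implemented high-order bits. A minimal sketch of that arithmetic, assuming 4 implemented priority bits (an assumption, check `__NVIC_PRIO_BITS` in the device header); `basepri_from_level` is a hypothetical helper, not an SDK or FreeRTOS function:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a logical priority level into the raw
 * value written to BASEPRI. Cortex-M implements only the top
 * `prio_bits` bits of each 8-bit priority field. */
static uint8_t basepri_from_level(uint8_t level, uint8_t prio_bits)
{
    assert(prio_bits >= 1 && prio_bits <= 8);
    assert(level < (1u << prio_bits));
    return (uint8_t)(level << (8u - prio_bits));
}
```

Under that assumption, level 2 gives 0x20 (decimal 32), which matches the `MOVS R0,#32` above: interrupts whose priority value is 0x20 or numerically higher are masked while the handler runs.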

Why would a DSB instruction never complete?

Thanks,

Chris.

31 Replies

5,515 Views
chriscowdery
Contributor V

Bit more information - does not look to be DSB specific.

I just single-stepped these lines in J-Link Commander:

60021546:  FB F7 34 FA        BL        #-0x4B98
J-Link>s
6001C9B2:  F8 B5              PUSH      {R3-R7,LR}
J-Link>s
6001C9B4:  00 25              MOVS      R5, #0
J-Link>s
6001C9B6:  CC 48              LDR       R0, [PC, #+0x330]
J-Link>s
6001C9B8:  00 68              LDR       R0, [R0]
J-Link>s
6001C9BA:  00 28              CMP       R0, #0
J-Link>s
6001C9BC:  6B D1              BNE       #+0xD6
J-Link>s
6001C9BE:  DF F8 AC 03        LDR       R0, [PC, #+0x3AC]

****** Error: Communication timed out: Requested 4 bytes, received -2 bytes !
J-Link>s

****** Error: Cannot read register 15 (R15) while CPU is running

****** Error: CPU is not halted
J-Link>

There are two LDR instructions - why would the first one execute OK, but not the second one?!?!?!??

Chris.


5,515 Views
mjbcswitzerland
Specialist V

Chris

What is the [R0] pointing to? If it is a misaligned address it could be a fault taking place.
I have had a couple of incidents with the GCC version used by MCUXpresso where the compiler did not respect alignment correctly when -Os was used.

Regards

Mark

[uTasker project developer for Kinetis and i.MX RT]

5,515 Views
chriscowdery
Contributor V

Hi,

 I have run the same sequence again, reading out more information (note this code is part of SysTick so has run millions of times before failing. This leads me to conclude the compiler output must be correct...)

J-Link>s
6001C9BA:  00 28              CMP       R0, #0
J-Link>s
6001C9BC:  6B D1              BNE       #+0xD6
J-Link>regs
PC = 6001C9BE, CycleCnt = F798A73B
R0 = 00000000, R1 = 6004A51C, R2 = 00000000, R3 = 00000000
R4 = 00000000, R5 = 00000000, R6 = 4039C000, R7 = 00000001
R8 = 00000008, R9 = 09090909, R10= 10101010, R11= 11111111
R12= 33EC8413
SP(R13)= 2000FFC0, MSP= 2000FFC0, PSP= 20206710, R14(LR) = 6002154B
XPSR = 6100000F: APSR = nZCvq, EPSR = 01000000, IPSR = 00F (INTISR0)
CFBP = 00002000, CONTROL = 00, FAULTMASK = 00, BASEPRI = 20, PRIMASK = 00

FPS0 = 00000000, FPS1 = 00000000, FPS2 = 00000000, FPS3 = 00000000
FPS4 = 00000000, FPS5 = 00000000, FPS6 = 00000000, FPS7 = 00000000
FPS8 = 00000000, FPS9 = 00000000, FPS10= 00000000, FPS11= 00000000
FPS12= 00000000, FPS13= 00000000, FPS14= 00000000, FPS15= FFFFFFFF
FPS16= 00000000, FPS17= 00000000, FPS18= 00000000, FPS19= 00000000
FPS20= 00000000, FPS21= 00000000, FPS22= 00000000, FPS23= 00000000
FPS24= 00000000, FPS25= 00000000, FPS26= 00000000, FPS27= 00000000
FPS28= 00000000, FPS29= 00000000, FPS30= 00000000, FPS31= FFFFFFFF
FPSCR= 02000000
J-Link>mem8 6001cd68 16
6001CD68 = 10 BD 00 00 30 C3 00 20 10 B5 04 00 00 2C 07 D1
6001CD78 = 20 20 80 F3 11 88
J-Link>s
6001C9BE:  DF F8 AC 03        LDR       R0, [PC, #+0x3AC]

****** Error: Communication timed out: Requested 4 bytes, received -2 bytes !
J-Link>

So, the LDR instruction is (if I understand it correctly), loading R0 with the contents of memory location 0x6001c9be + 0x3ac == 0x6001cd6a.

According to the mem8 command, that should be 0xc3300000 although I fully imagine the pipeline delay means the value it should read is 0x2000c330.

That is pretty innocuous, it shouldn't cause an issue.
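
A note on the address arithmetic above, as a sketch rather than a definitive answer: for a Thumb literal load, the architecture defines the base as the instruction address plus 4, rounded down to a multiple of 4, so the offset is not added to the raw instruction address. The helper name `thumb_literal_addr` is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Address read by a Thumb "LDR Rt, [PC, #imm]" located at insn_addr.
 * In Thumb state the PC operand reads as the instruction address + 4,
 * and for literal loads it is rounded down to a multiple of 4. */
static uint32_t thumb_literal_addr(uint32_t insn_addr, uint32_t imm)
{
    assert((insn_addr & 1u) == 0u);        /* instructions are halfword aligned */
    uint32_t pc = (insn_addr + 4u) & ~3u;  /* Align(PC, 4) */
    return pc + imm;
}
```

For the failing instruction, `thumb_literal_addr(0x6001C9BE, 0x3AC)` gives 0x6001CD6C. The `mem8` dump shows `30 C3 00 20` at that address, which is 0x2000C330 in little-endian order, the same value guessed above.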

kerryzhou - I can't reproduce the issue on an EVK. It takes 37 minutes of run time with FreeRTOS operating an RF stack and the TCP/IP stack before it gets into this state. If I removed enough code for it to operate on an EVK, it would not fail.

I believe something has gone wrong in the MPU somewhere that is causing the core to lock up, but it is hard to tell what it is...!

Thanks,

Chris.


5,515 Views
mjbcswitzerland
Specialist V

Chris

If the failure is infrequent it is not compiler related.
Do you always see R0 set to 0 when it operates generally or only when the failure occurs?
Can the SysTick interrupt be interrupted by other interrupts and are the stacks being used sufficient to handle worst case interrupt nesting? Basically, when you step the "linear" code is it possible that interrupts and/or task switching is taking place so you are seeing the result of other code rather than just the instructions in the IRQ?

Regards

 

Mark

[uTasker project developer for Kinetis and i.MX RT]

5,515 Views
chriscowdery
Contributor V

Hi Mark,

 I will check that out. Unfortunately this morning each time it stops, I can't break into it using J-Link. When J-Link does get hold of the DAP, the MCU is in the ROM bootloader (address 0x0020xxxx) somewhere, having reset the core.

I don't think the possibility you identify about the single stepping being interrupted is the case - I can check the core cycle counter is only incrementing by a very small number each time I make a step. If interrupts were occurring, I would expect a large increase for each step. But I will confirm.

But SysTick can definitely be interrupted - the ENET and GPT1 can both cause interrupts for a start. Although ENET just sets a FreeRTOS notification so the handling task will wake up on the next SysTick.

Chris.


5,515 Views
chriscowdery
Contributor V

Hi Mark,

 OK, I managed to get a look-in with the J-Link. I can confirm that the cycle counter indicates that no other execution is going on as it increments by the expected amount for each instruction.

J-Link>regs
PC = 6001C9B2, CycleCnt = 92391F33, R0 = 00000020, R1 = 00000000, R2 = 2020676C, R3 = 00000000
J-Link>s
6001C9B2:  F8 B5              PUSH      {R3-R7,LR}
J-Link>regs
PC = 6001C9B4, CycleCnt = 92391F3A, R0 = 00000020, R1 = 00000000, R2 = 2020676C, R3 = 00000000
J-Link>s
6001C9B4:  00 25              MOVS      R5, #0
J-Link>regs
PC = 6001C9B6, CycleCnt = 92391F3B, R0 = 00000020, R1 = 00000000, R2 = 2020676C, R3 = 00000000
J-Link>s
6001C9B6:  CC 48              LDR       R0, [PC, #+0x330]
J-Link>regs
PC = 6001C9B8, CycleCnt = 92391F3D, R0 = 2000C354, R1 = 00000000, R2 = 2020676C, R3 = 00000000
J-Link>s
6001C9B8:  00 68              LDR       R0, [R0]
J-Link>regs
PC = 6001C9BA, CycleCnt = 92391F3F, R0 = 00000000, R1 = 00000000, R2 = 2020676C, R3 = 00000000
J-Link>s
6001C9BA:  00 28              CMP       R0, #0
J-Link>regs
PC = 6001C9BC, CycleCnt = 92391F40, R0 = 00000000, R1 = 00000000, R2 = 2020676C, R3 = 00000000
J-Link>s
6001C9BC:  6B D1              BNE       #+0xD6
J-Link>regs
PC = 6001C9BE, CycleCnt = 92391F41, R0 = 00000000, R1 = 00000000, R2 = 2020676C, R3 = 00000000
J-Link>s
6001C9BE:  DF F8 AC 03        LDR       R0, [PC, #+0x3AC]

****** Error: Communication timed out: Requested 4 bytes, received -2 bytes !
J-Link>

Everything looks perfectly sensible until it goes boom!  I have to confess as to being at a bit of a loss!
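
The cycle-counter check used in the log above can be written down as a small sketch. The function names and the 16-cycle threshold in the usage note are illustrative assumptions, not anything taken from the J-Link tools:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cycles elapsed between two 32-bit cycle-counter samples. Unsigned
 * subtraction is modulo 2^32, so a single counter wrap is handled. */
static uint32_t cycle_delta(uint32_t before, uint32_t after)
{
    return after - before;
}

/* Returns 1 if every step between consecutive samples costs at most
 * `limit` cycles, i.e. no handler appears to have run in between. */
static int steps_look_uninterrupted(const uint32_t *samples, size_t count,
                                    uint32_t limit)
{
    assert(samples != NULL && count >= 2);
    for (size_t i = 1; i < count; i++) {
        if (cycle_delta(samples[i - 1], samples[i]) > limit) {
            return 0;
        }
    }
    return 1;
}
```

Feeding in the seven `CycleCnt` values from the log (0x92391F33 through 0x92391F41) with a limit of 16 returns 1: the largest single step is the `PUSH` at 7 cycles, far less than an interrupt entry and handler would cost.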

Chris.


5,515 Views
mjbcswitzerland
Specialist V

Chris

Is it possible that your FlexSPI speed is set slightly too high for the QSPI flash that your code is running from? And that the problem is due to an instruction read with a transfer/bit error?

Try slightly reducing the speed or relocating the interrupt to RAM to see whether the problem stops or moves to another piece of code.

Regards

Mark

[uTasker project developer for Kinetis and i.MX RT]

5,515 Views
chriscowdery
Contributor V

Hi Mark,

 I just ran a test with QSPI set to 100MHz, and it made no difference - still crashed after 37 minutes.

Previously I have wondered about memory integrity so I wrote a soak test that had 20,000 NOPs (which is enough to cause the I-Cache to be flushed) and that ran for a few days. It also did a pseudo-random write and read of OCRAM, ITCM and DTCM to check they were reliable too. They were.
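
For anyone wanting to repeat the RAM part of such a soak test, here is a minimal sketch of a pseudo-random write and read-back check. It is not the test actually used in this thread; `xorshift32` is Marsaglia's well-known generator, and `ram_pattern_test` is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* xorshift32 pseudo-random generator; the seed must be non-zero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fill `words` 32-bit words with a pseudo-random pattern, then regenerate
 * the same sequence and compare. Returns the number of mismatching words. */
static size_t ram_pattern_test(volatile uint32_t *base, size_t words,
                               uint32_t seed)
{
    assert(base != NULL && seed != 0u);
    uint32_t s = seed;
    for (size_t i = 0; i < words; i++) {
        base[i] = xorshift32(&s);
    }
    s = seed;
    size_t errors = 0;
    for (size_t i = 0; i < words; i++) {
        if (base[i] != xorshift32(&s)) {
            errors++;
        }
    }
    return errors;
}
```

One caveat when running this on cacheable memory such as OCRAM: with the D-Cache enabled the read-back may be served from the cache, so the test exercises the RAM itself only if the range is cleaned and invalidated between the write and read passes.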

It is very mysterious.

Chris.


5,515 Views
kerryzhou
NXP TechSupport

Hi  Chris Cowdery,

  Have you enabled the cache?

   Try disabling the cache and see whether there is any improvement.

Best Regards,

Kerry

5,515 Views
chriscowdery
Contributor V

Hi Kerry,

 Yes I have tried that test:

Turn off I-Cache - OK

Turn off D-Cache - OK

So if I turn off one cache, the system is OK. It does not matter which cache I turn off.

What does that mean?

Thanks,


Chris.


5,515 Views
kerryzhou
NXP TechSupport

Hi Chris Cowdery

    So your issue only happens when the cache is enabled, right? Cache usage needs to be handled very carefully.

    NXP has an official document about it, please read it first; some data need to put in the non-cache area.

    https://www.nxp.com/docs/en/application-note/AN12042.pdf 

If you still have issues about it, please kindly let me know.

Best Regards,

Kerry

 

-------------------------------------------------------------------------------
Note:
- If this post answers your question, please click the "Mark Correct" button. Thank you!

 

- We are following threads for 7 weeks after the last post, later replies are ignored
Please open a new thread and refer to the closed one, if you have a related question at a later point in time.
-------------------------------------------------------------------------------

5,515 Views
chriscowdery
Contributor V

Hi Kerry,

 I have checked the Application Note - thank you for sharing it with me.

We are using the same memory configuration as the EVK - because changing it breaks the debugger (the .flash file that the J-Link uses assumes the standard configuration).

Please explain "some data need to put in the non-cache area."

Having data in the 'wrong' FlexRAM region (e.g. stack in OCRAM) will have a performance penalty. But it will still work right?

Is there something that can cause a 'lockup' if we have it in a cacheable area?

Thanks,

Chris.


5,515 Views
kerryzhou
NXP TechSupport

Hi Chris Cowdery

   Regarding the non-cache area: for example, the USB data buffer, the Ethernet data buffer and SD card related data. For more details, you can check the AN.

  About the 'wrong' data placement: have you tried putting it in the ITCM or DTCM, and does it still have issues?

Best Regards,

Kerry

 


5,515 Views
chriscowdery
Contributor V

Hi Kerry,

 It is just Ethernet that uses DMA. We use fsl_enet.c/h from NXP. I have #define FSL_SDK_ENABLE_DRIVER_CACHE_CONTROL which takes care of cache on DMA buffers.
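
One thing worth checking alongside driver-level cache control is whether each DMA buffer owns whole cache lines, because cache maintenance operates on complete lines and a line shared with other variables can be cleaned or invalidated along with the buffer. The sketch below only does the address arithmetic, assuming the 32-byte L1 line size of the Cortex-M7; the type and function names are hypothetical and not part of `fsl_enet`:

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE 32u  /* Cortex-M7 L1 cache line size in bytes */

typedef struct {
    uint32_t start;  /* line-aligned start address */
    uint32_t size;   /* size in bytes, a multiple of CACHE_LINE */
} cache_range_t;

/* Expand [addr, addr + len) to the whole cache lines it touches. */
static cache_range_t cache_line_span(uint32_t addr, uint32_t len)
{
    assert(len > 0u);
    uint32_t start = addr & ~(CACHE_LINE - 1u);
    uint32_t end = (addr + len + CACHE_LINE - 1u) & ~(CACHE_LINE - 1u);
    cache_range_t r = { start, end - start };
    return r;
}

/* A buffer can be invalidated safely only if it owns whole lines. */
static int owns_whole_lines(uint32_t addr, uint32_t len)
{
    return (addr % CACHE_LINE == 0u) && (len % CACHE_LINE == 0u);
}
```

For example, a 100-byte buffer at 0x20200010 spans the 128 bytes from 0x20200000, so maintenance on it also touches 28 bytes that belong to its neighbours.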

It is hard for me to move RAM allocation. We are using FreeRTOS, and have the heap in OCRAM (128K). We need the heap to be that size, so we cannot move it to DTCM even though DTCM would be better!

But it does not matter right? (apart from speed)

Chris.


5,515 Views
kerryzhou
NXP TechSupport

Hi Chris Cowdery,

  From this block diagram:

pastedImage_1.png

We can see that the OCRAM, SIM_EMS (FlexSPI) and SEMC all use the interconnect, which is connected to the I-Cache and D-Cache, so if the related modules use the cache at the same time, it may have conflict, and may cause your lock issues.

BTW, you also can refer to the AN chapter 5, Constraint Speculative Prefetch. The Cortex-M7 supports speculative prefetch, which can make speculative accesses to memory locations with the Normal memory attribute at any time, and if prefetching happens on invalid address, it will generate bus fault, so this must be avoided.
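
Such a constraint is usually applied through the MPU: a low-numbered background region marks the whole address space as Strongly-ordered, no-access and execute-never, and higher-numbered regions (which take priority on overlap) then open up the memories that really exist. The sketch below only computes `MPU_RASR` values using the ARMv7-M field layout. `mpu_rasr` and the macro names are hypothetical, and the region sizes in the usage note are illustrative assumptions rather than values from the AN:

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-M MPU_RASR fields used below. */
#define RASR_ENABLE     (1u << 0)
#define RASR_SIZE_SHIFT 1u
#define RASR_B          (1u << 16)   /* bufferable */
#define RASR_C          (1u << 17)   /* cacheable */
#define RASR_AP_SHIFT   24u
#define RASR_AP_NONE    0u           /* no access */
#define RASR_AP_FULL    3u           /* read/write at any privilege */
#define RASR_XN         (1u << 28)   /* execute never */

/* Encode an enabled region of 2^log2_size bytes.
 * The SIZE field holds log2_size - 1. */
static uint32_t mpu_rasr(uint32_t log2_size, uint32_t ap, uint32_t attrs)
{
    assert(log2_size >= 5u && log2_size <= 32u);  /* 32 bytes .. 4 GB */
    assert(ap <= 7u);
    return attrs | (ap << RASR_AP_SHIFT)
         | ((log2_size - 1u) << RASR_SIZE_SHIFT) | RASR_ENABLE;
}
```

A 4 GB background region with `RASR_AP_NONE` and `RASR_XN` (TEX, C and B all zero, which is Strongly-ordered) encodes as 0x1000003F. A 128 KB write-back cacheable region with full access encodes as 0x03030021.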

Best Regards,

Kerry

5,515 Views
chriscowdery
Contributor V

Hi Kerry,

 1. "it may have conflict, and may cause your lock issues." - we use I-Cache, D-Cache, FlexSPI and OCRAM. Are you saying we cannot use OCRAM and FlexSPI at the same time? Surely the bus fabric must arbitrate?

2. "BTW, you also can refer to the AN chapter 5" - which application note? AN12077 only has 4 chapters.

3. "if prefetching happens on invalid address, it will generate bus fault," - this causes an exception right? So we know if it happens? We do not see any BUSFAULT exceptions, just a lockup.

Thanks,

Chris.


5,515 Views
chriscowdery
Contributor V

I just found the AN - AN12042. Using the i.MXRT L1 Cache.

Chris.


5,515 Views
kerryzhou
NXP TechSupport

Hi Chris Cowdery,

  So sorry, the previous AN link was wrong. You are right, I meant to give you the AN12042 cache application note.

  About the lockup: as far as I know, if the ARM core gets a bus fault that is not cleared and a new fault then arrives, it will enter the lockup state.

  Please check the cache application note again.

  Any updated information, please kindly let me know.

Best Regards,

Kerry

5,514 Views
chriscowdery
Contributor V

Hi Kerry,

 As far as I can tell, we are doing everything in the AN. We do write to flash, but use ram_funcs for that, and flush the cache afterwards.

We are using "Cache maintenance in SDK driver" for Ethernet.

We do not have SDRAM.

You say busfault - if a busfault occurs, it should go into the busfault handler right? It does not go into a busfault handler.

Surely all faults should go to the respective handler? Then the debugger can halt the processor and we can detect the fault.
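
On that point, the ARMv7-M architecture does not guarantee a handler runs for every fault: a fault taken while the core is already executing at HardFault or NMI priority puts the core into lockup instead, and a BusFault is escalated to HardFault when `SHCSR.BUSFAULTENA` is clear. When the debug port is still responsive, three registers help to tell these cases apart. The sketch below decodes values read by hand (for example with `mem32` in J-Link Commander); the bit positions are architectural, while the struct and function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DHCSR_S_LOCKUP    (1u << 19)  /* DHCSR at 0xE000EDF0 */
#define CFSR_IBUSERR      (1u << 8)   /* CFSR at 0xE000ED28, BFSR byte */
#define CFSR_PRECISERR    (1u << 9)
#define CFSR_IMPRECISERR  (1u << 10)
#define SHCSR_BUSFAULTENA (1u << 17)  /* SHCSR at 0xE000ED24 */

typedef struct {
    int locked_up;       /* core is in architectural lockup */
    int bus_fault_seen;  /* an instruction or data bus error was recorded */
    int escalates;       /* BusFault disabled, so it becomes a HardFault */
} fault_summary_t;

static void summarize_faults(uint32_t dhcsr, uint32_t cfsr, uint32_t shcsr,
                             fault_summary_t *out)
{
    assert(out != NULL);
    out->locked_up = (dhcsr & DHCSR_S_LOCKUP) != 0u;
    out->bus_fault_seen =
        (cfsr & (CFSR_IBUSERR | CFSR_PRECISERR | CFSR_IMPRECISERR)) != 0u;
    out->escalates = (shcsr & SHCSR_BUSFAULTENA) == 0u;
}
```

If the debugger cannot read these registers at all, as in the 'CPU could not be halted' case earlier in the thread, that in itself hints at a stalled bus transaction rather than an architectural lockup, although this is an inference and not something the logs prove.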

Chris.


5,512 Views
kerryzhou
NXP TechSupport

Hi Chris Cowdery

   Thanks for your updated information.

   So now, even after doing all the related items in AN12042, you still see the lockup issue?

   Previously, I had another customer who also met a lockup issue, and in the end it was caused by ARM speculative prefetch.

pastedImage_3.png

   As for why it is a lockup rather than a bus fault:

  You may find some information in the ARM core documentation.

pastedImage_1.png

   pastedImage_2.png

   This issue should also be related to the ARM core, so please check more details in the ARM Cortex-M7 documentation:

Documentation – Arm Developer 

Best Regards,

Kerry

 
