Failure on BX LR instruction in redlib

dh1 · ‎04-14-2022

Hello

We are using the RT1060 processor. For a long time, we have been experiencing intermittent problems in the standard library. We are using the redlib library. There are often hard faults when using strlen, memcmp or, in the current case, fabs().

The problems often appear when code is changed in a completely different part of the application, and they disappear if, for example, a single line of unrelated code is added.

The debugging of the problem shows that the processor does not return to the calling code on the BX LR instruction. Instead, it just continues to the next address in the code.

Here is a debug session showing the problem:

Start. Here, the function fabs() is called:

This is the situation two steps later, after the execution of the BL.
It's a bit strange that the disassembly shows the code to be in the asin() function, while the PC shows address 0x600aff652. So I think this is a debugger problem? Or could the problem be that the call goes to 0x600aff652, but the fabs address is 0x600aff653?

The next three steps seem normal:

But the next single step jumps into the fp_round() function instead of going back:

From there, it continues in the fp_round() function until it crashes.

I wonder if the standard library could be compiled differently than our project, so it does not fit together.
It is strange that we have various such problems with the standard library.

Any help is appreciated!

Regards,
Daniel

dh1 · ‎05-13-2022

After two days of debugging, I think I am close to the root cause. The debugging is very difficult because very small changes to the code made the problems disappear.
In the project, we do a configuration of the FlexSPI parameters. The function looks like this:

__RAMFUNC(RAM2) void fwflash_nor_flash_init(FLEXSPI_Type *base)
{
    flexspi_config_t config;

    /* Get FLEXSPI default settings and configure the flexspi. */
    FLEXSPI_GetDefaultConfig(&config);

    /* Set AHB buffer size for reading data through AHB bus. */
    config.ahbConfig.enableAHBPrefetch    = true;
    config.ahbConfig.enableAHBBufferable  = true;
    config.ahbConfig.enableReadAddressOpt = true;
    config.ahbConfig.enableAHBCachable    = true;
    config.rxSampleClock                  = kFLEXSPI_ReadSampleClkLoopbackFromDqsPad;

    FLEXSPI_Init(base, &config);

    /* Configure flash settings according to serial flash feature. */
    FLEXSPI_SetFlashConfig(base, &deviceconfig, kFLEXSPI_PortA1);

    /* Update LUT table. */
    FLEXSPI_UpdateLUT(base, 0, customLUT, CUSTOM_LUT_LENGTH);

    /* Do software reset. */
    FLEXSPI_SoftwareReset(base);
}

I have now found out that if I step over the FLEXSPI_SoftwareReset call or remove it, the problem disappears.
This line seems to cause different kinds of effects. It caused the problem with fabs() explained above, but also a failing memcmp() described in this thread: memcmp-fails-on-RT1060

I made sure that all the code handling the FlexSPI settings is running in RAM.

Is there an explanation for these effects?

Regards,
Daniel

crist_xu · ‎05-13-2022

Please also check the location of the LUT table: if it is declared const, is it stored in flash?
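One way to answer this from the map file or a debugger is to compare the table's address with the flash window. The sketch below assumes the FlexSPI flash is mapped at 0x60000000, which matches the code addresses quoted in this thread, and an 8 MiB device; both values and the helper name are assumptions and should be taken from the project's linker script.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumptions: take the real values from the linker script or map file. */
#define ASSUMED_FLASH_BASE UINT32_C(0x60000000)
#define ASSUMED_FLASH_SIZE UINT32_C(0x00800000) /* hypothetical 8 MiB device */

/* Hypothetical helper: true if the address lies inside the assumed flash window. */
bool address_is_in_flash(uint32_t address)
{
    return (address >= ASSUMED_FLASH_BASE) &&
           ((address - ASSUMED_FLASH_BASE) < ASSUMED_FLASH_SIZE);
}
```

On the target, the argument would be something like (uint32_t)(uintptr_t)customLUT; a true result would mean the table is read from the same flash that is being reconfigured.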

crist_xu · ‎05-10-2022

Hi,

Please have a look at the address of the function when the error occurs. The address is usually aligned to 4 bytes or 2 bytes, because Thumb instructions are 16 or 32 bits wide. So a function cannot be compiled to an odd address.

In other words, every function address must be an even number, such as the address of fp_round:

Please check whether the library was compiled with another compiler or for another Arm core. Also, you said:

"And it disappears, if e.g. only a line of random code is added."

I think the added line may have moved the address to an even one. You could also check whether the error goes away when all the addresses are even or aligned to 2/4 bytes.
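As general Arm background rather than a diagnosis of this case: for Thumb code, a function symbol or function pointer value normally has bit 0 set to select Thumb state, while the instructions themselves are stored at the even address just below it. A symbol value ending in 3 and a PC ending in 2 can therefore describe the same function. A minimal sketch of that relationship, with hypothetical helper names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of a branch target or function symbol selects Thumb state. */
#define THUMB_STATE_BIT UINT32_C(1)

/* True if a symbol or function pointer value carries the Thumb bit. */
bool has_thumb_bit(uint32_t symbol_value)
{
    return (symbol_value & THUMB_STATE_BIT) != 0u;
}

/* Address where the first instruction is actually stored (bit 0 cleared). */
uint32_t code_address(uint32_t symbol_value)
{
    return symbol_value & ~THUMB_STATE_BIT;
}

/* Thumb instructions are halfword aligned, so a code address is even. */
bool is_halfword_aligned(uint32_t address)
{
    return (address & 1u) == 0u;
}
```

For example, code_address(0x600af653) gives 0x600af652, the fabs() address reported later in this thread.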

Regards,

Crist

dh1 · ‎05-11-2022

Hi Crist

Thanks for your reply.
I checked the address of fabs(), the failing function, and it is 0x600af652 in the current test case.

Here is some information I found with objdump about the library:

fabsf.o: file format elf32-littlearm

Contents of section .comment:
 0000 00474343 3a202847 4e552041 726d2045  .GCC: (GNU Arm E
 0010 6d626564 64656420 546f6f6c 63686169  mbedded Toolchai
 0020 6e203130 2d323032 302d7134 2d6d616a  n 10-2020-q4-maj
 0030 6f722920 31302e32 2e312032 30323031  or) 10.2.1 20201
 0040 31303320 2872656c 65617365 2900      103 (release).

fabs.o: file format elf32-littlearm
architecture: armv7e-m, flags 0x00000010:
HAS_SYMS
start address 0x00000000

I further checked the alignment of the function fabs(). There seems to be no forced alignment to an even address.

Here it is on an even address:

Then I add a NOP at a random place in code. This moves the function to an odd address:

Is this unexpected?

Regards,
Daniel

crist_xu · ‎05-13-2022

Hi,

By default, the compiler inserts NOPs to keep functions aligned. But this situation is strange, so please check which compiler you are using. If possible, could you please share your project, or a simple project that reproduces the error?

Regards,

Crist

danielchen · ‎05-05-2022

It seems to be a memory issue. Have you checked for a memory overflow or misaligned memory accesses?
I would suggest you narrow down the problem by disabling sections of code with

#if 0

#endif

dh1 · ‎05-05-2022

I don't understand how a memory overflow can lead to a failure of a BX LR instruction? Can you be more specific?

However, I checked that the stack did not overflow.
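The thread does not say how the stack was checked. One common technique is stack painting: fill the stack region with a known pattern at startup, then count how many words still hold the pattern later. A minimal sketch, assuming a descending stack that grows toward its lowest address; the function names and the fill pattern are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fill pattern written to the stack before it is used. */
#define STACK_FILL_PATTERN UINT32_C(0xA5A5A5A5)

/* Fill the whole stack region with the pattern (call before the stack is in use). */
void stack_paint(uint32_t *stack_low, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        stack_low[i] = STACK_FILL_PATTERN;
    }
}

/* Count untouched words, starting at the lowest address of the region.
 * A result of 0 suggests the stack reached, or ran past, its limit. */
size_t stack_unused_words(const uint32_t *stack_low, size_t words)
{
    size_t unused = 0;
    while (unused < words && stack_low[unused] == STACK_FILL_PATTERN) {
        unused++;
    }
    return unused;
}
```

This only detects overflow after the fact, and it can miss a write that happens to match the pattern.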

Or how could the code be unaligned? I understand that data can be unaligned, which usually leads to an unaligned exception. But that is not the case.

Your approach of #if 0 does not work, because every minor change in the code makes the problem go away (only to reappear some time later after another random change). That makes it very hard to find the cause this way.

Regards,
Daniel

danielchen · ‎05-05-2022

Did you try another IDE?

dh1 · ‎05-05-2022

What is the point of this question? Do you suspect the debugger does something wrong?

The crash also happens without a debugger. Do you mean with the same or with a different compiler? I am almost sure the problem will go away (temporarily) if we use a different compiler. But that does not help to find the root cause.