S32K144: different link address of a function causes different behavior

Solved

dsfire
Contributor III

Hello, guys,

I am using the S32K144 for my project, and I have run into a strange problem with S32 Design Studio for ARM v1.3 (build id: 170119).
The following code implements an approximate millisecond delay:

void DelayMS (INT32U num)
{
    INT32U i,j;

    for (i = 0,j = 0; i < num; i++){
        while(j++ < 9000) __asm__ ("NOP");
        j = 0;
    }
}
When the function DelayMS is linked at an address ending in 8 (e.g. 0x00020078), the delay is approximately 1.5 * num milliseconds.
When I modify some code in other functions so that DelayMS is linked at an address ending in 0 (e.g. 0x00020080, 0x00020090), 4 (e.g. 0x00020084), or C (e.g. 0x0002008C), the delay is approximately num milliseconds.

No project settings were changed, and the disassembly of DelayMS is identical before and after the other code was modified.
I am also sure that no interrupt occurs while DelayMS is executing.

Why does a different link address cause different behavior?

Can somebody help me with this strange problem?

Thanks.

Accepted solution
lukaszadrapa
NXP TechSupport

Hi,

It's caused by the way code is fetched from flash memory.

Reading the physical flash array is not very fast; it does not complete in one cycle. There is a 128-bit data bus between the flash array and the flash controller. If the data are already present in a buffer, they can be read in one cycle. If they are not, the physical flash array has to be read, which takes more time. See the "Flash Memory Controller" section in the reference manual for more details.

So it depends on how your code is aligned across the 128-bit flash lines.
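
As a rough illustration of what "alignment across 128-bit flash lines" means for the addresses in the question, the sketch below computes which flash line an address falls in and its byte offset within that line. The helper names are made up for this example, and the 16-byte line size is simply the 128-bit width mentioned above.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_LINE_BYTES 16u  /* 128-bit flash line, per the explanation above */

/* Index of the 128-bit flash line that contains addr (hypothetical helper). */
static uint32_t flash_line_index(uint32_t addr)
{
    return addr / FLASH_LINE_BYTES;
}

/* Byte offset of addr inside its flash line (hypothetical helper). */
static uint32_t flash_line_offset(uint32_t addr)
{
    return addr % FLASH_LINE_BYTES;
}

/* Nonzero if the `size` bytes starting at addr touch more than one line. */
static int crosses_flash_line(uint32_t addr, uint32_t size)
{
    assert(size > 0u);
    return flash_line_index(addr) != flash_line_index(addr + size - 1u);
}
```

With these helpers, 0x00020078 sits at offset 8 of its line and 0x00020080 at offset 0, so a 16-byte span starting at the first address crosses a line boundary while the same span at the second does not. Where exactly the inner loop sits inside DelayMS is not shown in the thread, so this only illustrates the arithmetic.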

To eliminate this effect, the instruction cache has to be enabled. The loop will be cached during the first iteration, and after that all instruction fetches complete in a single cycle.

Or, when you need an accurate delay, use a timer.
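
A minimal sketch of the timer-based approach, assuming some free-running 32-bit up-counter is available. The thread does not say which timer to use, so the counter is passed in as a function pointer; tick_source_fn, ticks_elapsed, delay_ticks and fake_ticks are hypothetical names for this example, not part of any NXP SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook returning a free-running 32-bit up-counter. */
typedef uint32_t (*tick_source_fn)(void);

/* True once at least `wait` ticks have passed since `start`.
 * Unsigned subtraction is modulo 2^32, so a counter wraparound between
 * `start` and `now` is handled correctly. */
static int ticks_elapsed(uint32_t start, uint32_t now, uint32_t wait)
{
    return (uint32_t)(now - start) >= wait;
}

/* Busy-wait until `wait` ticks have elapsed on the given counter. */
static void delay_ticks(tick_source_fn read_ticks, uint32_t wait)
{
    assert(read_ticks != NULL);
    uint32_t start = read_ticks();
    while (!ticks_elapsed(start, read_ticks(), wait)) {
        /* spin: the wait is measured by the counter, not by instruction count */
    }
}

/* Host-side stand-in for a hardware counter, for trying the code on a PC:
 * advances by 7 ticks on every read. */
static uint32_t fake_ticks(void)
{
    static uint32_t t = 0u;
    t += 7u;
    return t;
}
```

On the target, fake_ticks would be replaced by a function that reads a real hardware counter, and the tick count would be derived from that counter's clock frequency.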

Regards,

Lukas

dsfire
Contributor III

Hi Lukas,

Thanks for your reply.

I will try enabling the instruction cache and test again.

Yes, a timer is the best option, but I want to know and understand the root cause.

How should I understand "it depends on alignment of your code across 128-bit flash lines" with respect to the picture below?

It seems to me that there is no particular difference between the four parts of the picture.

disassembly.png

lukaszadrapa
NXP TechSupport

Hi,

I will use a simple example only. Let's assume we have a very short loop of just four 32-bit instructions. That is 128 bits in total, which is the width of the flash line buffer. Let's also assume that speculative reads are not enabled (see section 35.5.2, "Speculative reads", in the reference manual for more details).

If this loop is aligned to 128 bits, the whole loop fits in one flash line. Once that line is loaded into the flash line buffer, it can be read in a single cycle. So when the core fetches the instructions again and again in the loop, every fetch takes a single cycle.

If we shift the loop so that two instructions are in one 128-bit line and the next two are in the following 128-bit line, the physical flash array has to be read during execution. The first two instructions are executed, then there is a buffer miss and the next flash line is loaded; the next two instructions are executed and the code jumps back, but that is a buffer miss too, and so on, again and again.

So the cache memory helps a lot here.
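
The simplified model above (one flash line buffer, no speculative reads, four 32-bit instructions) can be sketched as a small host-side simulation that counts buffer misses. This is only a toy model under those stated assumptions, not a cycle-accurate description of the S32K144 flash controller, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_LINE_BYTES 16u  /* 128-bit flash line */
#define INSTR_BYTES      4u   /* 32-bit instructions, as in the example above */

/* Count line-buffer misses for a loop of n_instr instructions starting at
 * `start`, executed `iterations` times, with a buffer holding one line. */
static uint32_t count_line_buffer_misses(uint32_t start, uint32_t n_instr,
                                         uint32_t iterations)
{
    assert(n_instr > 0u);
    uint32_t buffered_line = UINT32_MAX;  /* nothing buffered yet */
    uint32_t misses = 0u;

    for (uint32_t it = 0u; it < iterations; it++) {
        for (uint32_t k = 0u; k < n_instr; k++) {
            uint32_t line = (start + INSTR_BYTES * k) / FLASH_LINE_BYTES;
            if (line != buffered_line) {
                buffered_line = line;  /* reload the buffer from flash */
                misses++;
            }
        }
    }
    return misses;
}
```

In this model, a four-instruction loop starting at 0x00020080 misses once in total over 1000 iterations, while the same loop starting at 0x00020088 misses twice per iteration, 2000 times in total.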

Regards,

Lukas

dsfire
Contributor III

Hi Lukas,

Thanks for your quick reply.

I got the point.

1. Can we regard the flash line buffer as another kind of cache?

2. Could you please provide a demo showing how to enable the instruction and data cache? I could not find anything about cache configuration in the community.

3. I used __attribute__((section(".m_data"))) on the function DelayMS to place it in RAM, but it seems to have no effect.

Thanks again.

Best Regards

Liu

lukaszadrapa
NXP TechSupport

Hi Liu,

1. Yes, but it's a very simple one that holds just one flash line.

2. Search for the cache control register PCCCR in the reference manual for more details:

pastedImage_1.png

And then write this value to the register (from reference manual, page 711):

pastedImage_2.png

3. You need to use:

__attribute__ ((section(".code_ram")))
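
A hedged sketch of how that attribute might be applied to the DelayMS function from the question. It assumes the project's linker file defines a .code_ram section and that the startup code copies it into RAM, neither of which is shown in this thread. The RAM_FUNC macro name and the INT32U typedef are assumptions added so the example is self-contained.

```c
#include <stdint.h>

typedef uint32_t INT32U;  /* assumption: INT32U is a 32-bit unsigned type */

/* Apply the section attribute only when building for the ARM target, so the
 * file still compiles on a host PC. RAM_FUNC is a hypothetical macro name. */
#if defined(__arm__)
#define RAM_FUNC __attribute__ ((section(".code_ram")))
#else
#define RAM_FUNC
#endif

/* The attribute on the declaration also applies to the definition below. */
RAM_FUNC void DelayMS(INT32U num);

void DelayMS(INT32U num)
{
    INT32U i, j;

    for (i = 0, j = 0; i < num; i++) {
        while (j++ < 9000) {
            __asm__ volatile ("nop");
        }
        j = 0;
    }
}
```

After building, the map file should show whether DelayMS ended up at a RAM address.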

Regards,

Lukas

dsfire
Contributor III

Hi, Lukas,

Thanks again.

Regards,

Liu
