S32K144: function different link address caused different behavior

dsfire
Contributor III

Hello, guys,

I am using the S32K144 in my project and have run into a strange problem with S32 Design Studio for ARM v1.3 (id: 170119).
The following code implements an approximate millisecond delay.

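/* Busy-wait for roughly num milliseconds. */
void DelayMS (INT32U num)
{
    INT32U i,j;

    for (i = 0,j = 0; i < num; i++){
        /* Inner loop of NOPs, intended to take about 1 ms. */
        while(j++ < 9000) __asm__ ("NOP");
        j = 0;
    }
}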
When the function DelayMS is linked at an address ending in 8 (e.g. 0x00020078), the delay is approximately 1.5 * num milliseconds.
When I modify some code in other functions so that DelayMS is linked at an address ending in 0 (e.g. 0x00020080, 0x00020090), 4 (e.g. 0x00020084), or C (e.g. 0x0002008C), the delay is approximately num milliseconds.

No project settings were changed, and the disassembly of DelayMS is identical before and after the changes to the other code.
I am also sure that no interrupt occurs while DelayMS is executing.

Why does a different link address cause different behavior?

Can somebody help me with this strange problem?

Thanks.

1 Solution
lukaszadrapa
NXP TechSupport

Hi,

This is caused by the way code is fetched from flash memory.

Reading the physical flash array is not fast; it takes more than one cycle. There is a 128-bit data bus between the flash array and the flash controller. If the requested data is already present in a buffer, it can be read in one cycle. If it is not, the physical flash array has to be read, which takes more time. See the "Flash Memory Controller" section of the reference manual for details.

So the timing depends on how your code is aligned relative to the 128-bit flash lines.

To eliminate this effect, enable the instruction cache. The loop will be cached during the first iteration, and after that every instruction fetch completes in one cycle.

Or, if you need an accurate delay, use a timer.
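
As a rough illustration of the timer suggestion, here is a minimal sketch of a delay that counts ticks of a free-running counter instead of counting loop iterations, so its duration does not depend on where the code is linked. All names here (tick_fn, delay_ticks, delay_ms, sim_now) are hypothetical. The target-side part assumes the Cortex-M DWT cycle counter is available and uses the architectural register addresses; ticks_per_ms depends on your clock setup and must be supplied by you. The simulated counter is only there so the logic can be tried on a PC.

```c
#include <stdint.h>

/* Hypothetical tick source: returns a free-running 32-bit counter value. */
typedef uint32_t (*tick_fn)(void);

/* Busy-wait until at least `ticks` counter increments have elapsed.
 * Unsigned subtraction keeps the comparison correct across wraparound. */
static void delay_ticks(tick_fn now, uint32_t ticks)
{
    uint32_t start = now();
    while ((uint32_t)(now() - start) < ticks) {
        /* spin */
    }
}

/* Millisecond wrapper. ticks_per_ms is an assumption that depends on
 * the clock feeding the counter. */
static void delay_ms(tick_fn now, uint32_t ticks_per_ms, uint32_t ms)
{
    while (ms-- > 0u) {
        delay_ticks(now, ticks_per_ms);
    }
}

/* Simulated counter for host-side experiments: advances 3 ticks per read. */
static uint32_t sim_counter;
static uint32_t sim_now(void)
{
    sim_counter += 3u;
    return sim_counter;
}

#if defined(__arm__)
/* Target side (assumption: Cortex-M core with the DWT cycle counter). */
#define DEMCR      (*(volatile uint32_t *)0xE000EDFCu)
#define DWT_CTRL   (*(volatile uint32_t *)0xE0001000u)
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004u)

static void cycle_counter_init(void)
{
    DEMCR |= (1u << 24);  /* TRCENA: enable the DWT unit */
    DWT_CYCCNT = 0u;
    DWT_CTRL |= 1u;       /* CYCCNTENA: start counting core cycles */
}

static uint32_t cycle_counter_read(void)
{
    return DWT_CYCCNT;
}
#endif
```

On the target, one would call cycle_counter_init() once and then something like delay_ms(cycle_counter_read, core_clock_hz / 1000u, 10u), where core_clock_hz is your own clock figure. A peripheral timer such as the LPIT would serve the same purpose; the DWT counter is used here only because it needs no peripheral setup.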

Regards,

Lukas

dsfire
Contributor III

Hi, Lukas,

Thanks for your reply.

I will try enabling the instruction cache and test again.

Yes, a timer is the best solution, but I want to know and understand the root cause.

How should I understand "it depends on alignment of your code across 128-bit flash lines" with respect to the picture below?

It seems to me that there is no significant difference between the four parts of the picture.

disassembly.png

lukaszadrapa
NXP TechSupport

Hi,

I will use a simple example. Let's assume we have a very short loop of only four 32-bit instructions. That is 128 bits in total, which is the width of the flash line buffer. Let's also assume that speculative reads are not enabled (see section 35.5.2, "Speculative reads", in the reference manual for details).

If this loop is aligned to 128 bits, the whole loop fits in one flash line. Once it is loaded into the flash line buffer, it can be read in a single cycle. So when the core fetches the instructions again and again in the loop, each fetch takes a single cycle.

If we shift the loop so that two instructions are in one 128-bit line and the other two are in the next 128-bit line, the physical flash array has to be read during execution. The first two instructions are executed, then there is a buffer miss, so the next flash line is loaded. The next two instructions are executed and the code jumps back, but that is a buffer miss as well, and this repeats again and again.

So the cache helps a lot here.
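
The aligned and shifted cases described above can be checked with a small host-side model. This is only a sketch of the simplified picture in this post: a single buffer holding one 128-bit flash line, no speculative reads, no cache, and 32-bit instructions. The function name count_misses and the addresses in the usage note are hypothetical; the model says nothing about how many cycles a miss actually costs.

```c
#include <stdint.h>

#define LINE_BYTES 16u  /* one 128-bit flash line */

/* Count line-buffer misses while a loop of `insn_count` 32-bit
 * instructions starting at `loop_addr` executes `iterations` times.
 * Model: one buffer holding exactly one flash line. */
static unsigned count_misses(uint32_t loop_addr, unsigned insn_count,
                             unsigned iterations)
{
    uint32_t buffered_line = UINT32_MAX;  /* nothing buffered yet */
    unsigned misses = 0u;

    for (unsigned it = 0u; it < iterations; it++) {
        for (unsigned i = 0u; i < insn_count; i++) {
            /* Index of the flash line that holds this instruction. */
            uint32_t line = (loop_addr + 4u * i) / LINE_BYTES;
            if (line != buffered_line) {
                buffered_line = line;  /* reload the buffer from flash */
                misses++;
            }
        }
    }
    return misses;
}
```

With a four-instruction loop run 1000 times, count_misses(0x00020080u, 4u, 1000u) gives a single miss (the initial load), while count_misses(0x00020088u, 4u, 1000u) gives 2000 misses, two per iteration. The real DelayMS loop has a different length and layout, so this model is not expected to reproduce the observed 1.5x factor exactly.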

Regards,

Lukas

dsfire
Contributor III

Hi, Lukas,

Thanks for your quick reply.

I got the point.

1. Can we regard the flash line buffer as another kind of cache?

2. Could you please provide a demo showing how to enable the instruction and data caches? I could not find anything about cache configuration in the community.

3. I used __attribute__((section(".m_data"))) on the function DelayMS to place it in RAM, but it does not seem to have any effect.

Thanks again.

Best Regards

Liu

lukaszadrapa
NXP TechSupport

Hi Liu,

1. Yes, but it's a very simple one that holds just one flash line.

2. Search for the cache control register PCCCR in the reference manual for details:

pastedImage_1.png

Then write this value to the register (from the reference manual, page 711):

pastedImage_2.png

3. You need to use:

__attribute__ ((section(".code_ram")))
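
Points 2 and 3 might look roughly like the sketch below. The screenshots are not reproduced in this thread, so the PCCCR bit positions here are an assumption based on the usual LMEM register layout (ENCACHE, INVW0, INVW1, GO), not a copy of the value in the image; check them, and the register address, against the reference manual. The names code_cache_enable and RAM_FUNC are hypothetical. Placing a function in .code_ram also assumes that the project's linker file defines that section and that the startup code copies it from flash to RAM.

```c
#include <stdint.h>

/* Assumed LMEM PCCCR bit positions; verify against the reference manual. */
#define PCCCR_ENCACHE (1u << 0)   /* enable the code cache */
#define PCCCR_INVW0   (1u << 24)  /* invalidate way 0 */
#define PCCCR_INVW1   (1u << 26)  /* invalidate way 1 */
#define PCCCR_GO      (1u << 31)  /* start the cache command */

/* Invalidate both ways and enable the cache with a single write.
 * The register pointer is a parameter so the logic can be tried on a PC. */
static void code_cache_enable(volatile uint32_t *pcccr)
{
    *pcccr = PCCCR_INVW1 | PCCCR_INVW0 | PCCCR_GO | PCCCR_ENCACHE;
}

/* Only apply the section attribute when building for the ARM target. */
#if defined(__arm__)
#define RAM_FUNC __attribute__((section(".code_ram")))
#else
#define RAM_FUNC
#endif

/* The delay loop from the first post, placed in RAM on the target so that
 * its timing no longer depends on flash line alignment. */
RAM_FUNC void DelayMS(uint32_t num)
{
    uint32_t i, j;

    for (i = 0u, j = 0u; i < num; i++) {
        while (j++ < 9000u) {
            __asm__ volatile ("nop");
        }
        j = 0u;
    }
}
```

On the target, the call would be something like code_cache_enable((volatile uint32_t *)0xE0082000u), where the address is my assumption for LMEM PCCCR on this part. The combined value written by this sketch is 0x85000001.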

Regards,

Lukas

dsfire
Contributor III

Hi, Lukas,

Thanks again.

Regards,

Liu
