MKV5x, bad performance when loading data from flash.

arimendes · ‎06-07-2016

I wrote the following code in IAR Assembler to test the MKV5x:

...

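        SET_PIN_PB23              ;MACRO (start of timed region)
        MOV32    R0,#FloatTable   ;R0 = base address of FloatTable
        VLDR.F32 S2,[R0,#0]       ;S2 = FloatTable[0]
        VMLA.F32 S0,S1,S2         ;S0 = S0 + S1*S2
        VLDR.F32 S2,[R0,#4]       ;S2 = FloatTable[1]
        VMLA.F32 S0,S1,S2
        VLDR.F32 S2,[R0,#8]       ;S2 = FloatTable[2]
        VMLA.F32 S0,S1,S2
        VLDR.F32 S2,[R0,#12]      ;S2 = FloatTable[3]
        VMLA.F32 S0,S1,S2
        VLDR.F32 S2,[R0,#16]      ;S2 = FloatTable[4]
        VMLA.F32 S0,S1,S2
        VLDR.F32 S2,[R0,#20]      ;S2 = FloatTable[5]
        VMLA.F32 S0,S1,S2
        VLDR.F32 S2,[R0,#24]      ;S2 = FloatTable[6]
        VMLA.F32 S0,S1,S2
        VLDR.F32 S2,[R0,#28]      ;S2 = FloatTable[7]
        VMLA.F32 S0,S1,S2
        VLDR.F32 S2,[R0,#32]      ;S2 = FloatTable[8]
        VMLA.F32 S0,S1,S2
        VLDR.F32 S2,[R0,#36]      ;S2 = FloatTable[9]
        VMLA.F32 S0,S1,S2
        CLEAR_PIN_PB23            ;MACRO (end of timed region)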

...

        DATA
        alignram 2
FloatTable
        DF32    0.1
        DF32    0.2
        DF32    0.3
        DF32    0.4
        DF32    0.5
        DF32    0.6
        DF32    0.7
        DF32    0.8
        DF32    0.9
        DF32    1.0

The address of FloatTable is 0x100013A0.

This code takes 1.05 us with the CPU clock at 200 MHz, i.e. 210 cycles, which is far too many.

When FloatTable is replaced with a table located in SRAM, the same code takes only 0.275 us, or 55 cycles.
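
The cycle counts follow directly from the measured times and the 200 MHz clock. Below is a minimal C sketch of that arithmetic. The helper names are made up for illustration, and the per-pair figure is only an average, since the timed region also contains the MOV32 and the two pin macros.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a measured interval (ns) to core clock cycles, rounded to nearest.
   Hypothetical helper, not part of any SDK. */
static uint32_t cycles_from_ns(uint32_t interval_ns, uint32_t clock_mhz)
{
    assert(clock_mhz > 0u);
    return (uint32_t)(((uint64_t)interval_ns * clock_mhz + 500u) / 1000u);
}

/* Average cost of one VLDR/VMLA pair, in tenths of a cycle. */
static uint32_t tenths_of_cycle_per_pair(uint32_t total_cycles, uint32_t pairs)
{
    assert(pairs > 0u);
    return (total_cycles * 10u) / pairs;
}
```

With the numbers above, `cycles_from_ns(1050, 200)` gives 210 and `cycles_from_ns(275, 200)` gives 55, so each of the 10 VLDR/VMLA pairs costs on average about 21 cycles from flash versus about 5.5 from SRAM.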

If the chip has 8 KB of data cache, why is the performance so bad when loading from flash?

How can I solve it?

ivadorazinova · ‎06-21-2016

Hi Ari.

The KV58 is based on the Cortex-M7, i.e. it has Harvard-architecture buses:

- CODE bus (optimized for instruction fetch), which provides access to the flash (through the instruction cache) and to the I-TCM (something like RAM)

- SYSTEM bus (optimized for data access), which provides access to the D0-TCM, D1-TCM, peripherals, etc.

Both buses are cached, i.e. both can cache the contents of the slower, cacheable memories (everything except the TCM memories). Data is stored in the cache only under certain circumstances.

Depending on how often and how repeatedly the data is accessed, it is possible to lock content in some of the cache memories, i.e. to cache specific code/data and then lock that cache region.

It looks like the code whose performance you are measuring is not executing from the cache but from flash memory (i.e. if there are any branch instructions, wait states are inserted).

The code is executed from flash memory because it runs only once. It may be cached while it runs (and it stays in the cache until some other event evicts it), but because it is executed only once, it behaves as if it ran purely from flash.

I discussed your issue with the Application team. To get the best performance out of the M7 core, we suggest storing critical code in the I-TCM (a fast memory intended exclusively for code: Instruction Tightly-Coupled Memory), static data in the D0-TCM, and dynamic data (the stack) in the D1-TCM. This gives the maximum performance: the CODE bus fetches instructions stored in the I-TCM while the SYSTEM bus accesses data stored in the D0-TCM or D1-TCM.
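
As a rough illustration of the TCM suggestion, here is a minimal C sketch that pins the table and the multiply-accumulate routine to named linker sections. Everything named here is an assumption: the section names `.dtcm_data` and `.itcm_text` are hypothetical and must match blocks you define yourself in the linker configuration, the startup code must copy their contents from flash into the TCMs, and the GNU attribute syntax is used only as an example (IAR has its own placement mechanisms, such as `#pragma location` and the `@` operator).

```c
#include <assert.h>
#include <stddef.h>

/* Section placement macros. The section names are hypothetical; on a
   non-ARM host build they expand to nothing so the sketch still compiles. */
#if defined(__GNUC__) && defined(__arm__)
#define IN_DTCM __attribute__((section(".dtcm_data")))
#define IN_ITCM __attribute__((section(".itcm_text"), noinline))
#else
#define IN_DTCM
#define IN_ITCM
#endif

#define TABLE_LEN 10u

/* Static data intended for D0-TCM (needs a startup copy from flash). */
IN_DTCM static float float_table[TABLE_LEN] = {
    0.1f, 0.2f, 0.3f, 0.4f, 0.5f, 0.6f, 0.7f, 0.8f, 0.9f, 1.0f
};

/* Critical code intended for I-TCM: acc += scale * table[i] for each
   entry, the same operation as the VLDR/VMLA sequence in the question. */
IN_ITCM float mac_table(float acc, float scale)
{
    assert(TABLE_LEN == sizeof float_table / sizeof float_table[0]);
    for (size_t i = 0; i < TABLE_LEN; ++i) {
        acc += scale * float_table[i];
    }
    return acc;
}
```

Calling `mac_table(0.0f, 1.0f)` returns approximately 5.5, the sum of the table entries. Whether the compiler emits VLDR/VMLA for the loop depends on the FPU and optimization settings.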

If the code is stored in flash memory, the I-CACHE must be enabled, and the code then has to be repeated, i.e. executed more than once, to benefit from it. Another option is to execute the code once deliberately, just to get it cached, and then lock the cache; subsequent executions then do not have to access the flash and run directly from the cache memory.

Also, if data from peripherals is often used directly in the calculation, the D-CACHE needs to be enabled as well (although it can be useless in some cases: for example, data read from an ADC usually has to be refreshed anyway).
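
The "enable the caches, then run the code more than once" advice could be sketched in C as follows. This assumes the KV5x caches are the standard Cortex-M7 L1 caches controlled through the CMSIS-Core functions `SCB_EnableICache()` and `SCB_EnableDCache()` from `core_cm7.h`; please confirm this against the reference manual. `measure_warm()` and the `demo_*` stubs are hypothetical names, and the stubs exist only so the sketch can be exercised off-target.

```c
#include <assert.h>

/* On the target, include the device header before this code. It pulls in
   core_cm7.h and defines __ICACHE_PRESENT / __DCACHE_PRESENT. Without it
   (e.g. on a host build) this function is a no-op. */
static void enable_core_caches(void)
{
#if defined(__ICACHE_PRESENT) && (__ICACHE_PRESENT == 1U)
    SCB_EnableICache();
#endif
#if defined(__DCACHE_PRESENT) && (__DCACHE_PRESENT == 1U)
    SCB_EnableDCache();
#endif
}

typedef void (*hook_fn)(void);

/* Run the kernel once untimed to fill the caches, then run it again
   between the pin toggles so the timed pass can hit in the caches. */
static void measure_warm(hook_fn kernel, hook_fn pin_set, hook_fn pin_clear)
{
    assert(kernel && pin_set && pin_clear);
    enable_core_caches();
    kernel();      /* warm-up pass, not timed */
    pin_set();     /* e.g. SET_PIN_PB23 */
    kernel();      /* timed pass */
    pin_clear();   /* e.g. CLEAR_PIN_PB23 */
}

/* Host stand-ins: replace with the real kernel and pin macros on target. */
static unsigned kernel_calls;
static unsigned pin_events;
static void demo_kernel(void)    { kernel_calls++; }
static void demo_pin_set(void)   { pin_events++; }
static void demo_pin_clear(void) { pin_events++; }
```

A call such as `measure_warm(demo_kernel, demo_pin_set, demo_pin_clear)` runs the kernel twice and toggles the pin around the second pass only. Cache locking, mentioned above, is not shown here.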

In case of any issue, please let me know.

Best regards,

Iva

arimendes · ‎06-27-2016

Iva,

How do I enable the D-CACHE?

When will the new KSDK 2.0 for Kinetis KV5x be available?

Thanks,

Ari.

ivadorazinova · ‎06-28-2016

Hello Ari,

Please see chapter 17 in the RM: http://cache.nxp.com/files/32bit/doc/ref_manual/KV5XP144M220RM.pdf?fsrch=1&sr=1&pageNum=1#d162e5a131...

KSDK 2.0 support for the KV5x is expected by mid-July; see Introducing Kinetis SDK v2 | NXP Community.

I hope this helps.

Best Regards,

Iva

arimendes · ‎06-10-2016

I tested it using the PKV58F1M0VLL22.
