If I lock L1 I cache of MPC7410 ...

yongsungkwon · ‎03-25-2016

If I lock the L1 I cache of an MPC7410 (with the L1 I and L1 D caches enabled, L2 disabled), do system bus transactions occur the same way as when the L1 I cache is unlocked? Do 32-byte burst reads still occur?

I have encountered a system crash when a certain event occurs with the L1 I and L1 D caches enabled and L2 disabled.

But the system does not crash with the L1 I cache disabled.

A fellow engineer suggested that I run an intense memory test with the L1 cache locked.

If I do that, would bus activity still be as heavy as with the L1 cache unlocked?

If the system does not crash with the L1 I cache locked, can I conclude that the bus transactions (e.g. burst reads) are not the problem?

LPP · ‎03-29-2016

>does system bus transaction occur same as when L1 I cache unlocked

In either case, instruction fetches to cacheable memory use 32-byte burst reads. On a miss in a locked Icache, the bus request is still a 32-byte burst read, but the cache is not loaded with the data.

Instruction fetches to cache-inhibited memory (WIMG = x1xx) in MPX mode use 16-byte reads.

>do intense memory test with L1 cache locked

I doubt the crash is caused by accesses to the L1 cache itself. More likely it is data corruption during external memory accesses.

The idea is to run a memory test that puts a heavy load on the memory but does not itself crash on data corruption. For this purpose, the test code should be compact enough to stay resident in the L1 cache, and the test variables should be kept in GPRs. Locking the Icache is not mandatory but may be useful.
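As a rough way to check the "compact enough" condition, the sketch below counts how many 32-byte lines a code region spans and compares that with the Icache capacity. It assumes the 32-Kbyte L1 instruction cache with 32-byte lines listed for the MPC7410; the names `lines_spanned` and `fits_in_l1i` are hypothetical helpers, and the start address and size would come from your linker map.

```c
#include <stddef.h>
#include <stdint.h>

#define L1_LINE_BYTES 32u            /* assumed L1 line size */
#define L1I_BYTES     (32u * 1024u)  /* assumed L1 Icache size */

/* Number of 32-byte lines touched by [start, start + size).
   Each line costs one burst read when it is first fetched. */
static size_t lines_spanned(uintptr_t start, size_t size)
{
    if (size == 0)
        return 0;
    uintptr_t first = start / L1_LINE_BYTES;
    uintptr_t last  = (start + size - 1) / L1_LINE_BYTES;
    return (size_t)(last - first + 1);
}

/* Nonzero if a contiguous code region can be held in the Icache at once. */
static int fits_in_l1i(uintptr_t start, size_t size)
{
    return lines_spanned(start, size) <= L1I_BYTES / L1_LINE_BYTES;
}
```

For example, `lines_spanned(0x1010, 32)` is 2, because a 32-byte routine that is not line-aligned straddles two lines.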

> would bus operation still be heavy same as L1 cache unlocked

There is no difference between the two cases if the code is resident in the L1 Icache.

> still be heavy

Use a memory stress test.

For example:

a. initialize a memory block (try sizes both smaller and larger than the L1 cache) with a test pattern and calculate its CRC

b. copy this block to a new location (vary the memory offset)

c. check the CRC of the target

d. repeat from (b), using the target as the new source
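Steps (a) to (d) could be sketched in C roughly as follows. This is only an illustration under my own assumptions, not a tested MPC7410 routine: the function names, the xorshift fill pattern, the bitwise CRC-32 and the ping-pong layout of the arena are all hypothetical choices. It counts CRC mismatches instead of stopping, so a corrupted copy is reported rather than causing a crash.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Table-free CRC-32 (reflected polynomial 0xEDB88320); slow, but small. */
static uint32_t crc32_block(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* (a) Fill the block with a pseudo-random pattern (xorshift32). */
static void fill_pattern(uint8_t *p, size_t n, uint32_t seed)
{
    uint32_t x = seed ? seed : 1u;
    for (size_t i = 0; i < n; i++) {
        x ^= x << 13; x ^= x >> 17; x ^= x << 5;
        p[i] = (uint8_t)x;
    }
}

/* Runs `rounds` copy/check cycles inside `arena`, which is split into two
   halves used alternately as source and target. Each half must hold
   block + 32 bytes. Returns the mismatch count, or -1 on bad arguments. */
static int stress_copy(uint8_t *arena, size_t arena_size,
                       size_t block, unsigned rounds)
{
    size_t half = arena_size / 2;
    if (block == 0 || half < block + 32)
        return -1;

    uint8_t *src = arena;
    fill_pattern(src, block, 0x12345678u);
    uint32_t want = crc32_block(src, block);        /* (a) reference CRC */

    int errors = 0;
    for (unsigned r = 0; r < rounds; r++) {
        size_t base = (r & 1u) ? 0 : half;          /* alternate halves */
        uint8_t *dst = arena + base + (r * 4u) % 32u; /* (b) vary offset */
        memcpy(dst, src, block);
        if (crc32_block(dst, block) != want)        /* (c) check target */
            errors++;
        src = dst;                                  /* (d) target is next source */
    }
    return errors;
}
```

Because the target becomes the next source, a single corrupted copy keeps failing in every later round, so the round of the first mismatch is the interesting one. On the target, `memcpy` would be replaced by the optimized copy loop to raise the bus load.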

PS. Optimize memcpy to increase the bus load. Below is a snippet from an IBM application note on memcpy optimization. The optimized code is reported to run about 4 times faster than the initial code:

cache_copy_loop:
    dcbt  r4,r6       // touch 2 cache lines ahead
    lwzu  r0,4(r4)    // load 8 registers from cache
    lwzu  r5,4(r4)
    lwzu  r7,4(r4)
    lwzu  r8,4(r4)
    lwzu  r9,4(r4)
    lwzu  r10,4(r4)
    lwzu  r11,4(r4)
    lwzu  r12,4(r4)
    dcbz  r3,r6       // zero 2 lines ahead
    stwu  r0,4(r3)    // store 8 registers to cache
    stwu  r5,4(r3)
    stwu  r7,4(r3)
    stwu  r8,4(r3)
    stwu  r9,4(r3)
    stwu  r10,4(r3)
    stwu  r11,4(r3)
    stwu  r12,4(r3)
    bdnz  cache_copy_loop
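For comparison, a C version of the same idea (one 32-byte line per iteration, eight loads followed by eight stores) might look like the sketch below. It is my own approximation, not part of the appnote: `copy_lines` is a hypothetical name, `__builtin_prefetch` is the GCC/Clang builtin standing in for dcbt, and there is no portable equivalent of dcbz, so that part is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_WORDS 8u   /* 32-byte line = 8 x 32-bit words */

/* Copy `lines` 32-byte lines from src to dst. Both pointers are assumed
   to be 32-byte aligned so that one iteration handles one cache line. */
static void copy_lines(uint32_t *dst, const uint32_t *src, size_t lines)
{
    while (lines--) {
#if defined(__GNUC__)
        /* Read hint for the line two ahead, only if it is in the buffer. */
        if (lines >= 2)
            __builtin_prefetch(src + 2 * LINE_WORDS, 0);
#endif
        /* Load a whole line first, then store it, like the lwzu/stwu groups. */
        uint32_t w0 = src[0], w1 = src[1], w2 = src[2], w3 = src[3];
        uint32_t w4 = src[4], w5 = src[5], w6 = src[6], w7 = src[7];
        dst[0] = w0; dst[1] = w1; dst[2] = w2; dst[3] = w3;
        dst[4] = w4; dst[5] = w5; dst[6] = w6; dst[7] = w7;
        src += LINE_WORDS;
        dst += LINE_WORDS;
    }
}
```

The compiler is free to reorder or merge these accesses, so for a real bus-load test the assembly loop gives more predictable behaviour; the C form is mainly useful as a baseline.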

Have a great day,
Pavel
