Pegasus711 KnightRider wrote:
Hmmm... thanks Scott and Simbu for your inputs.
So Scott, a better way would be to replace dcbz, dcbza and dcbzep with their long equivalents for processes that do require the hardware default cache size that the core offers? If not, then have this bit set in L1CSR0 (for the d-cache), follow the sync requirements mentioned in the manuals, and be done with it?
In the kernel, yes, though I wouldn't look at it as "requires the hardware default cache size" but "does not have a legacy requirement for a simulated 32-byte cache block". The best way is to avoid the need for DCBZ32 at all.
I do not recommend trying to selectively patch up userspace to use dcbzl. Either implement a per-process mechanism or lie to userspace and say that cache blocks are 32 bytes.
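As a rough illustration of the "lie to userspace" option: on Linux, a program can ask what data cache block size the kernel reports through the ELF auxiliary vector, and code that loops over dcbz would normally size its stride from that value. The sketch below assumes that this auxiliary vector entry (AT_DCACHEBSIZE) is the channel meant here; the function name and the fallback parameter are hypothetical and only for illustration.

```c
#include <assert.h>
#include <elf.h>       /* AT_DCACHEBSIZE */
#include <sys/auxv.h>  /* getauxval() */

/*
 * Return the data cache block size the kernel reports to this process.
 * getauxval() returns 0 when the entry is absent (for example on
 * architectures that do not export it), so the caller supplies a
 * non-zero fallback. The function name is hypothetical.
 */
static unsigned long reported_dcache_block_size(unsigned long fallback)
{
    unsigned long size;

    assert(fallback != 0);
    size = getauxval(AT_DCACHEBSIZE);
    return size ? size : fallback;
}
```

If the kernel reported 32 here while DCBZ32 is set, userspace that trusts this value would step through memory in 32-byte strides and would not need patching.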
What I am unable to get my head around is how setting this bit on a per-process basis, with a process being more of a kernel construct, works with the exact same register at the hardware level, with some processes requiring the entire native cache line size while others need only 32 bytes.
To make myself clear: consider two processes, P1 and P2. P1 doesn't need this bit set and P2 does. Now the scheduler has P1 scheduled before P2. P1 goes ahead, accesses a certain memory location, finds that the dcache does not hold it, brings it in from RAM, updates the cache block (line) and sets the bit marking the cache line as valid. Now the scheduler schedules P2, which, ironically, wants the same memory and wants this memory area cleared in the cache (though why it would want the same memory area is beyond me). So although this requirement may sound stupid, if it needs to work on the same cache line for some strange reason, and in addition needs it to be 32 bytes only, then what happens if it goes and sets that bit in this register? What would happen to the other 32 bytes in that line?
Again, DCBZ32 does not alter the actual structure of cache. A cache line is 64 bytes on e500mc, always. DCBZ32 just changes the behavior of instructions like dcbz. Instead of allocating and zeroing a cache line, it becomes a "zero out 32 bytes at this address" instruction. The performance benefit of dcbz is lost.
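To make the difference concrete, here is a small software model of the two behaviors. It is only a sketch, not the hardware: model_dcbz and its parameters are hypothetical names, and the assumption that the zeroed 32-byte region is aligned to a 32-byte boundary is mine and should be checked against the core manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE_BYTES 64u  /* e500mc cache line size, as stated above */
#define DCBZ32_BYTES     32u  /* region zeroed when DCBZ32 is set */

/*
 * Hypothetical model of dcbz acting on a plain byte array.
 * dcbz32 == 0: zero the whole 64-byte line containing ea.
 * dcbz32 != 0: zero only the 32-byte block containing ea
 *              (assumed to be 32-byte aligned).
 * Returns the number of bytes zeroed, or 0 if the block would
 * fall outside the modelled memory.
 */
static size_t model_dcbz(uint8_t *mem, size_t mem_len, size_t ea, int dcbz32)
{
    size_t block = dcbz32 ? DCBZ32_BYTES : CACHE_LINE_BYTES;
    size_t start = ea & ~(size_t)(block - 1);  /* round down to block start */

    assert(mem != NULL);
    if (start + block > mem_len)
        return 0;

    memset(mem + start, 0, block);
    return block;
}
```

In this model, a dcbz at offset 70 with DCBZ32 set clears bytes 64 to 95 and leaves bytes 96 to 127 of the same 64-byte line untouched. The line itself is still 64 bytes; only the amount of data the instruction zeroes changes.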
To Simbu: I believe the flag, as Scott said above, was process based (although it seems it has to be a global flag). I wonder if adding a separate config option to the Kconfig files under powerpc would be right? Any comments on this, Scott?
Additionally, Simbu, you said your toolchain is giving you problems. Is there a way to specify this to GCC? I mean the cache line size? If yes, why would you want to specify it during compilation, linking, loading, or translation?
Hoping to hear from you fellas.
I didn't say that it is process based normally -- that was something that Simbu was trying to implement. The current state of the kernel is that DCBZ32 is not supported.