Galcore module and cache-coherency

todd_blanchard · ‎03-27-2020

Please advise on the following regarding CMA and galcore cache-flushing. This is on an iMX6q, kernel version 4.9.88.

The galcore module allocates a large CMA region, sized by contiguousSize, at module load. This region is MEM/BUFFERABLE/WC (uncached) in the kernel. The same region is then mmapped via _CMAFSLMapUser(), and the resulting user virtual address range is 'flushed' via gckOS_CacheFlush(). This cache flush takes upwards of 800 ms and is done for each process that mmaps the region (5 in our case).

This adds significant time to application start. Here are my questions/comments:

1. I see no configuration that completely disables the cache flushing, because the command buffers are allocated via the gfp allocator and are cacheable. Setting CACHE_FUNCTION_UNIMPLEMENTED causes the command buffer operations to fail in obscure ways.
2. Flushing the mmapped area seems unnecessary, since the region is memset/flushed by the kernel during dma_alloc_wc() and is uncached.

Is there a configuration option, short of a code change, that lets the mmapped CMA region bypass the cache flushing while keeping it for the command buffers allocated from the gfp area? In addition, why would the CMA allocator perform cache flushing at all, given that CMA memory is not cacheable?

Thanks,
Todd

Bio_TICFSL · ‎04-02-2020

Hello Todd,

We believe that your application requires more memory than CMA has allocated. We infer this because the application calls gckOS_CacheFlush(), which we assume is called only for cacheable memory. However, according to the i.MX Graphics User's Guide, CMA allocates non-cacheable memory, and only if CMA cannot satisfy a request does the system allocator provide the required (cacheable) memory to the application. So it would be the system allocator, not the CMA allocator, that performs the cache flush.

Can you check after increasing the CMA size?

Regards

todd_blanchard · ‎04-02-2020

We have more than enough CMA. Only a little over half is used. There is over 100 MB CMA free.

Can you look at the code? In gc_hal_kernel_allocator_cma:CMAFSLAlloctorInit()

allocator->capability = gcvALLOC_FLAG_CONTIGUOUS
| gcvALLOC_FLAG_DMABUF_EXPORTABLE;

The CMA allocator cannot allocate NON_CONTIGUOUS memory.

USE_KERNEL_VIRTUAL_BUFFERS is explicitly set in gc_hal_options.h:

#ifndef USE_KERNEL_VIRTUAL_BUFFERS
#if defined(UNDER_CE)
# define USE_KERNEL_VIRTUAL_BUFFERS 1
#else
# define USE_KERNEL_VIRTUAL_BUFFERS 1
#endif
#endif

This results in:

gckCOMMAND_Construct()->gckKERNEL_AllocateVirtualCommandBuffer()

There the flag is set to gcvALLOC_FLAG_NON_CONTIGUOUS, so the GFP allocator is used instead, because the CMA allocator is not configured to allocate NON_CONTIGUOUS memory. I suspect it would be a bad idea to allocate the command objects from CMA anyway, due to potential fragmentation from many smaller allocations, though I am not 100% sure of this.
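To make the selection logic concrete, here is a minimal user-space model of it, not the driver code. The flag values, the allocator_id enum, pick_allocator(), and the capability set given to the GFP allocator are all hypothetical and chosen only for illustration; only the CMA capability set (CONTIGUOUS | DMABUF_EXPORTABLE) is taken from the snippet quoted above.

```c
/* Hypothetical flag values; the real gcvALLOC_FLAG_* constants are defined
 * in the driver headers and may differ. */
enum {
    FLAG_CONTIGUOUS        = 1u << 0,
    FLAG_NON_CONTIGUOUS    = 1u << 1,
    FLAG_DMABUF_EXPORTABLE = 1u << 2
};

/* Allocators in the order they are tried: CMA first, then GFP. */
enum allocator_id { ALLOC_NONE = -1, ALLOC_CMA = 0, ALLOC_GFP = 1, ALLOC_COUNT = 2 };

static const unsigned capability[ALLOC_COUNT] = {
    FLAG_CONTIGUOUS | FLAG_DMABUF_EXPORTABLE,  /* ALLOC_CMA, as quoted above */
    FLAG_CONTIGUOUS | FLAG_NON_CONTIGUOUS      /* ALLOC_GFP, assumed for this sketch */
};

/* Return the first allocator whose capability covers every requested flag,
 * or ALLOC_NONE if no allocator can satisfy the request. */
static enum allocator_id pick_allocator(unsigned request)
{
    int id;

    for (id = ALLOC_CMA; id < ALLOC_COUNT; id++) {
        if ((capability[id] & request) == request)
            return (enum allocator_id)id;
    }
    return ALLOC_NONE;
}
```

In this model a NON_CONTIGUOUS request (the command buffer case) skips CMA and lands on GFP, while a plain CONTIGUOUS request is served by CMA.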

This is not the real issue, however. The CMA allocator explicitly flushes the caches in _CMAFSLMapUser(), which seems unnecessary altogether. It is this cache flush that takes extraordinarily long, and seems like it should just be removed:

gc_hal_kernel_allocator_cma.c:

if (gcmIS_SUCCESS(status)) {
    gcmkONERROR(gckOS_CacheFlush(
        Allocator->os,
        _GetProcessID(),
        Mdl,
        gcvINVALID_ADDRESS,
        userLogical,
        Mdl->numPages * PAGE_SIZE
        ));

    *UserLogical = userLogical;
}

What is the reason for the above call to gckOS_CacheFlush()?
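For discussion, the behaviour I would expect looks roughly like the user-space sketch below: flush on map only when the backing memory is cacheable. This is not the driver's code. struct mapping, its cacheable field, fake_cache_flush(), and map_user() are hypothetical stand-ins, and whether skipping the flush is actually safe for this driver is exactly the open question.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor; the driver's real Mdl structure has different fields. */
struct mapping {
    size_t num_pages;
    bool   cacheable;  /* false for a write-combined (uncached) CMA region */
};

static unsigned flush_calls;  /* number of simulated flushes so far */

/* Stand-in for gckOS_CacheFlush(): it only records that a flush happened. */
static void fake_cache_flush(const struct mapping *m)
{
    (void)m;
    flush_calls++;
}

/* Model of a map-to-user step that flushes only when the CPU caches
 * could hold lines for the region, i.e. when it is cacheable. */
static void map_user(const struct mapping *m)
{
    if (m->cacheable)
        fake_cache_flush(m);
}
```

With this guard, mapping the uncached CMA region costs no flush, while cacheable buffers from the gfp area are still flushed.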

Bio_TICFSL · ‎04-03-2020

Since you are using kernel version 4.9.88, can you move to kernel version 4.14.78 or 4.14.98? In those versions the call to gckOS_CacheFlush() has been removed from _CMAFSLMapUser().