If we run entirely in the kernel and use kmalloc to allocate two 1 MB buffers, copying one to the other takes about 2 ms. If we run entirely in userspace and use malloc to allocate the same 1 MB buffers, the copy takes about 0.5 ms. We tried both the standard memcpy and a NEON-optimized copy in both environments, with no difference. Changing the buffer size made no difference either: the kernel copy always runs at about 1/4 the speed of the userspace copy.
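For reference, the userspace half of this measurement can be reproduced with a small harness along the following lines. This is a minimal sketch, not the test code used above: the `time_memcpy_ms` and `elapsed_ms` names, the iteration count, and the use of `clock_gettime(CLOCK_MONOTONIC)` are assumptions made for illustration. Both buffers are written once before timing so that first-touch page faults are not counted, and a GCC/Clang compiler barrier keeps the repeated copies from being optimized away.

```c
#define _POSIX_C_SOURCE 199309L /* needed for clock_gettime and CLOCK_MONOTONIC */

#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Elapsed time between two timestamps, in milliseconds. */
static double elapsed_ms(const struct timespec *start, const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec) * 1e3 +
           (double)(end->tv_nsec - start->tv_nsec) / 1e6;
}

/*
 * Hypothetical helper: allocate two len-byte buffers with malloc, copy src
 * into dst `iterations` times, and return the average time per copy in ms.
 * Returns -1.0 if allocation fails or the final copy does not match.
 */
double time_memcpy_ms(size_t len, int iterations)
{
    assert(len > 0 && iterations > 0);

    unsigned char *src = malloc(len);
    unsigned char *dst = malloc(len);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch every page of both buffers so page faults on first access
     * are not included in the timed loop. */
    memset(src, 0xA5, len);
    memset(dst, 0, len);

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; i++) {
        memcpy(dst, src, len);
        /* GCC/Clang compiler barrier: forces each copy to actually happen. */
        __asm__ __volatile__("" ::: "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double avg = elapsed_ms(&start, &end) / iterations;
    int ok = memcmp(dst, src, len) == 0;

    free(src);
    free(dst);
    return ok ? avg : -1.0;
}
```

Calling `time_memcpy_ms(1024 * 1024, 100)` from `main` and printing the result gives a per-copy figure in milliseconds that can be set against the timings quoted above.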
Why is kmalloc'ed memory slower than malloc'ed memory? Is the memory cached differently?
We are running an iMX6 DL Sabre SDP board with the latest Freescale LTIB-generated kernel (126.96.36.199-2039).