Why is access to V4L2 buffers slower than other memory?

The reason is that the memory used by HW blocks such as the IPU and VPU must be non-cached.  As a result, CPU operations on these buffers are much slower than the same operations on cached memory.  One way to get around this is to allocate the memory as cached and then use cache maintenance routines to ensure coherency between the CPU and the hardware.
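
The ordering of the cache maintenance calls is the part that is easy to get wrong, so here is a minimal sketch of it.  The names cache_invalidate_range(), cache_clean_range(), cpu_read_frame() and cpu_write_frame() are hypothetical and are not part of V4L2 or any BSP; the two cache routines are stubs that only record the call so the sequence can be checked on any host.  In a Linux kernel driver the equivalent work is normally done through the DMA API (dma_sync_single_for_cpu() and dma_sync_single_for_device()), and whether your driver can hand out cached buffers at all is something you would need to verify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the platform's cache maintenance routines.
 * They only log which operation was requested; real code would invalidate
 * or clean the CPU cache lines covering [addr, addr + len). */
enum cache_op { CACHE_INVALIDATE, CACHE_CLEAN };

#define MAX_OPS 8
enum cache_op op_log[MAX_OPS];
size_t op_count;

void cache_invalidate_range(void *addr, size_t len)
{
    (void)addr;
    (void)len;
    if (op_count < MAX_OPS)
        op_log[op_count++] = CACHE_INVALIDATE;
}

void cache_clean_range(void *addr, size_t len)
{
    (void)addr;
    (void)len;
    if (op_count < MAX_OPS)
        op_log[op_count++] = CACHE_CLEAN;
}

/* The hardware has finished writing the buffer and the CPU wants to read it:
 * invalidate first so the CPU does not see stale cached lines. */
uint32_t cpu_read_frame(uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    cache_invalidate_range(buf, len);

    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* The CPU fills the buffer and the hardware will read it next:
 * clean (write back) afterwards so the data actually reaches memory. */
void cpu_write_frame(uint8_t *buf, size_t len, uint8_t value)
{
    assert(buf != NULL);
    memset(buf, value, len);
    cache_clean_range(buf, len);
}
```

In short: invalidate before the CPU reads what the hardware wrote, and clean after the CPU writes what the hardware will read.  On real hardware the buffer should also start and end on cache line boundaries, otherwise an invalidate can discard unrelated data that shares a line with the buffer.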