Inefficient code in nadk_memcpy.h

clemenseisserer · 02-17-2016

Hi,

I just had a look at nadk_memcpy.h, delivered as part of ls2085a-sdk-ear5, and the code seems rather inefficient to me.
It appears to be based on a hand-optimized memcpy routine for SSE-capable Intel CPUs, which defines helper functions to perform block-wise SIMD moves. In the Freescale version, these have been replaced with calls to memcpy:

[code]

static inline void nadk_mov64(uint8_t *dst, const uint8_t *src) { memcpy(dst, src, 64); }

static inline void nadk_mov128(uint8_t *dst, const uint8_t *src) { memcpy(dst, src, 128); }

[/code]

Later, those helpers are called from within nadk_memcpy_func(), which handles all the alignment issues one would have to care about if the helpers were actually *real* assembler. While the generated code most likely isn't as horrible as the C code suggests, I still don't understand why nadk_memcpy doesn't simply redirect to memcpy.
memcpy is most likely already SIMD-optimized, and for small copies the compiler can use fast inline versions. At the very least, redirecting would remove a lot of code which most likely doesn't do what it was designed for (to provide a fast and efficient version of memcpy, I presume).
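
To illustrate what I mean by redirecting, here is a minimal sketch. The names nadk_memcpy_simple and nadk_mov64_simple are my own placeholders and do not come from the SDK; I am only assuming that a plain forward to the C library would be an acceptable replacement:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical replacement (not taken from the SDK): forward straight to
 * the C library. For small constant sizes the compiler is free to inline
 * the copy; for larger ones libc can choose its own optimized path. */
static inline void *nadk_memcpy_simple(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* A fixed-size helper then needs no alignment handling of its own.
 * Also a placeholder name, shown only for comparison with nadk_mov64. */
static inline void nadk_mov64_simple(uint8_t *dst, const uint8_t *src)
{
    nadk_memcpy_simple(dst, src, 64);
}
```

Whether this is actually faster than the current code on the LS2085A would of course have to be measured.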

Best regards

clemenseisserer · 03-16-2016

(When) will this be fixed?
