Hi,
I just had a look at nadk_memcpy.h, delivered as part of ls2085a-sdk-ear5, and the code seems rather inefficient to me.
It seems to be based on a hand-optimized memcpy routine for Intel SSE-capable CPUs, which defines helpers to perform block-wise SIMD moves. In the Freescale version, the bodies of those helpers have been replaced with calls to memcpy:
[code]
static inline void nadk_mov64(uint8_t *dst, const uint8_t *src) { memcpy(dst, src, 64); }
static inline void nadk_mov128(uint8_t *dst, const uint8_t *src) { memcpy(dst, src, 128); }
[/code]
Later, those helpers are called from within nadk_memcpy_func(), which handles all the alignment issues one would have to care about if the helpers were actually *real* assembler. While the generated code most likely isn't as horrible as the C code suggests, I still don't understand why nadk_memcpy doesn't simply redirect to memcpy.
memcpy is most likely already SIMD-optimized, and for small copies the compiler can use fast inline versions. At the very least, redirecting would remove a lot of code that most likely doesn't do what it was designed for (providing a fast and efficient version of memcpy, I presume).
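To illustrate what I mean, here is a minimal sketch of the redirect. The signature of nadk_memcpy below is my assumption (I modelled it on plain memcpy and did not copy it from the SDK header), so treat it as a suggestion and not as the actual NADK API:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical replacement for the hand-rolled copy loop.
 * The signature is assumed here; the real header may differ.
 */
static inline void *nadk_memcpy(void *dst, const void *src, size_t n)
{
    /*
     * Delegate to the C library. With optimization enabled, compilers
     * typically expand small constant-size copies inline and call the
     * library routine for everything else.
     */
    return memcpy(dst, src, n);
}
```

With something like this, the alignment handling in nadk_memcpy_func() and the nadk_mov* helpers would no longer be needed, since the C library already deals with unaligned heads and tails itself.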
Best regards
(When) will this be fixed?