Here's that same question in this forum from 9 years ago:
https://community.nxp.com/message/56006?commentID=56006#comment-56006
What functions you end up using depends on whether raw speed is important for this function in your application.
If you have access to the Linux sources (and we all do, just download them) there are plenty of examples of how to do operations like this on different CPUs.
They also have "generic, works in C, slowly on any CPU" definitions like the one Fang pointed you to in the linked post. If you're only doing this in one-shot startup code (or code that can be written to do that), then use that. If your code is deep in the middle of processing a lot of data fast, then you may have to look for faster alternatives. As an example of where this matters, the SKHA hardware in these chips has all the bytes reversed in the control registers, so to get any real performance out of it I had to use byterev.
Here's arch/m68k/include/asm/swab.h:
#ifndef _M68K_SWAB_H
#define _M68K_SWAB_H

#include <linux/types.h>
#include <linux/compiler.h>

#define __SWAB_64_THRU_32__

#if defined (__mcfisaaplus__) || defined (__mcfisac__)
static inline __attribute_const__ __u32 __arch_swab32(__u32 val)
{
    __asm__("byterev %0" : "=d" (val) : "0" (val));
    return val;
}
#define __arch_swab32 __arch_swab32
#elif !defined(__mcoldfire__)
static inline __attribute_const__ __u32 __arch_swab32(__u32 val)
{
    __asm__("rolw #8,%0; swap %0; rolw #8,%0" : "=d" (val) : "0" (val));
    return val;
}
#define __arch_swab32 __arch_swab32
#endif

#endif /* _M68K_SWAB_H */
Here's that code in the current tree:
https://elixir.bootlin.com/linux/latest/source/arch/m68k/include/uapi/asm/swab.h#L11
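And here's the bad news. Note the "!defined(__mcoldfire__)" line above? That's there because the M68k can perform word rotates (the "rolw" in the second version), but the ColdFire can't, so it can't use that trick. A ColdFire core that also lacks byterev (anything that isn't ISA A+ or ISA C) gets neither version and falls back to the generic C code.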
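For reference, the generic fallback is just shifts and masks. This is a minimal sketch of that approach, not the kernel's exact macro; the name swab32_generic is made up for illustration:

```c
#include <stdint.h>

/* Generic 32-bit byte reversal using only shifts, masks and ORs.
 * No rotates or inline asm, so it builds for any CPU.
 * swab32_generic is a hypothetical name, not a kernel symbol. */
static inline uint32_t swab32_generic(uint32_t val)
{
    return ((val & 0x000000FFu) << 24) |   /* byte 0 -> byte 3 */
           ((val & 0x0000FF00u) <<  8) |   /* byte 1 -> byte 2 */
           ((val & 0x00FF0000u) >>  8) |   /* byte 2 -> byte 1 */
           ((val & 0xFF000000u) >> 24);    /* byte 3 -> byte 0 */
}
```

Calling swab32_generic(0x12345678) gives 0x78563412, and applying it twice gets you back the original value. How many instructions it turns into depends on the compiler and optimisation level, so look at the generated assembly before deciding it is too slow.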
Here are some other tricks (and someone who doesn't know the difference between bits and bytes):
https://stackoverflow.com/questions/22012483/how-to-reverse-the-4-bytes-of-an-unsigned-integer
Byte reversal functions are required for writing network code on little-endian machines. That code uses htons() and htonl() (host-to-network order, short/long) and their counterparts ntohs() and ntohl(), which perform the identical operation. They usually only generate code on little-endian machines and compile out on big-endian ones like ColdFire, but there may be some neat ideas in code like that.
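To show how those functions behave, here is a small sketch assuming a POSIX-style <arpa/inet.h>. The helper names put_be32 and host_is_little_endian are made up for this example:

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl(), ntohl(); assumes a POSIX-style environment */

/* Hypothetical helper: returns 1 if the host stores the low byte first. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Hypothetical helper: store a host-order value into a buffer in
 * network (big-endian) order. On a big-endian host htonl() leaves the
 * value unchanged; on a little-endian host it reverses the bytes. */
static void put_be32(unsigned char *buf, uint32_t host_val)
{
    uint32_t wire = htonl(host_val);
    memcpy(buf, &wire, sizeof wire);
}
```

Whatever the host is, put_be32(buf, 0x12345678) leaves 0x12 in buf[0] and 0x78 in buf[3]. That also means htonl() is no use as a byte reverser on a big-endian part like the ColdFire, since it does nothing there.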
You can always perform this function in one instruction if you have a 16 Gigabyte lookup table.
Tom