Hi Caesarl,
Per the MD5 spec, the raw message must be padded before it is fed to the hash function. Please refer to the following for details:
MD5 processes a variable-length message into a fixed-length output of 128 bits. The input message is broken up into chunks of 512-bit blocks (sixteen 32-bit words); the message is padded so that its length is divisible by 512. The padding works as follows: first a single bit, 1, is appended to the end of the message. This is followed by as many zeros as are required to bring the length of the message up to 64 bits fewer than a multiple of 512. The remaining bits are filled up with 64 bits representing the length of the original message, modulo 2^64.
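As a quick sanity check on the rule above, here is a minimal sketch of how many 64-byte blocks a padded message occupies. The name md5_padded_blocks() is hypothetical and not part of the mmCAU library; it only assumes the message length is given in bytes. The padding always adds at least 9 bytes (one 0x80 byte plus the 8-byte length), so the total is rounded up to the next multiple of 64.

```c
#include <stddef.h>

/* Hypothetical helper (not an mmCAU function): number of 64-byte blocks
   needed to hold a message of nbytes bytes plus its MD5 padding.
   The padding is at least 9 bytes: the 0x80 byte and the 64-bit length. */
size_t md5_padded_blocks(size_t nbytes)
{
    return (nbytes + 9 + 63) / 64;
}
```

For example, a 55-byte message still fits in one block, while a 56-byte message needs two, because the 0x80 byte and the 8-byte length no longer fit in the first block.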
Here is a pad() function for that purpose, which you may use in your application. Please also note that if you use mmcau_md5_update() instead, there is no need to call cau_md5_initialize_output() first.
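/* Pads an MD5 message in place.
   p        : buffer of 64*num_blks bytes holding the message at p[0..nbytes-1]
   num_blks : number of 64-byte blocks; the caller must ensure
              nbytes + 9 <= 64*num_blks
   nbytes   : length of the original message in bytes */
void pad(char *p, int num_blks, int nbytes)
{
    int i, j;
    unsigned int n;

    /* Start right after the message. Using nbytes here, rather than
       scanning for the first zero byte, also handles binary messages. */
    i = nbytes;

    /* Append the single 1 bit, followed by seven 0 bits. */
    p[i++] = (char)0x80;

    /* Zero-fill up to the last 8 bytes of the final block. */
    for (j = i; j < ((64 * num_blks) - 8); j++) {
        p[j] = 0;
    }

    /* Append the message length in bits, 64-bit little-endian. */
    n = (unsigned int)nbytes << 3;
    p[j++] = (char)n;
    p[j++] = (char)(n >> 8);
    p[j++] = (char)(n >> 16);
    p[j++] = (char)(n >> 24);

    /* The upper 32 bits are zero for messages shorter than 512 MB. */
    p[j++] = 0;
    p[j++] = 0;
    p[j++] = 0;
    p[j] = 0;
}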
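If the hash still comes out wrong, it can help to check the buffer layout after calling pad(). The following is only a sketch of such a check: md5_padding_is_valid() is a hypothetical helper, not part of the mmCAU library, and it assumes the buffer holds the message in the minimum number of 64-byte blocks.

```c
#include <stdint.h>

/* Hypothetical debugging helper (not an mmCAU function): returns 1 if buf
   holds a message of nbytes bytes followed by correct MD5 padding in
   exactly num_blks 64-byte blocks, otherwise 0. */
int md5_padding_is_valid(const unsigned char *buf, int num_blks, int nbytes)
{
    int total, i;
    uint64_t bits;

    /* The padded message must use the minimum number of blocks:
       one extra block if the last partial block has fewer than 56 bytes,
       otherwise two. */
    if (nbytes < 0 || num_blks != nbytes / 64 + ((nbytes % 64) < 56 ? 1 : 2))
        return 0;

    total = 64 * num_blks;
    bits = (uint64_t)nbytes * 8u;

    /* The byte right after the message must be 0x80. */
    if (buf[nbytes] != 0x80)
        return 0;

    /* Everything up to the last 8 bytes must be zero. */
    for (i = nbytes + 1; i < total - 8; i++) {
        if (buf[i] != 0)
            return 0;
    }

    /* The last 8 bytes hold the bit length, least significant byte first. */
    for (i = 0; i < 8; i++) {
        if (buf[total - 8 + i] != (unsigned char)(bits >> (8 * i)))
            return 0;
    }
    return 1;
}
```

For the 3-byte message "abc" in a single block, a valid buffer has 0x80 at offset 3, 0x18 (24 bits) at offset 56, and zeros everywhere else after the message.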
I have also attached my test project here; please refer to it for more details.

Hope that helps,
Have a great day!
B.R
Kan !