Hello Charles,
Hm, didn't older CW for MCUs have an option to erase/program only the used flash/EEPROM areas? I remember having to use something like that in the CW debugger settings to recover from an S12Z machine exception caused by an EEPROM ECC fault. I can't find such an option in CW for MCUs 11.1. But if you had it enabled, it could also lead to problems verifying the checksum. You calculate the checksum assuming all unused bytes are 0xFF; then, if the program gets shorter so that not all previously used sectors are erased, the checksum check will fail.
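To illustrate the stale-sector effect described above, here is a minimal host-side sketch in C. It is only a model: FLASH_SIZE, the function names and the byte-wise 16-bit sum are my assumptions, sector granularity is ignored, and it is not how CW or your build tool actually computes anything.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define FLASH_SIZE 16u  /* hypothetical tiny region, for illustration only */

/* 16-bit additive checksum over a whole region (assumed algorithm). */
static uint16_t region_sum(const uint8_t *mem, size_t len)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint16_t)(sum + mem[i]);
    return sum;
}

/* Checksum the build step expects: image followed by erased (0xFF) bytes.
   image_len must not exceed FLASH_SIZE. */
static uint16_t expected_sum(const uint8_t *image, size_t image_len)
{
    uint8_t mem[FLASH_SIZE];
    memset(mem, 0xFF, sizeof mem);
    memcpy(mem, image, image_len);
    return region_sum(mem, sizeof mem);
}

/* Checksum of what is really in flash when only the range used by the
   new image is rewritten and the tail of the old, longer image stays.
   old_len and new_len must not exceed FLASH_SIZE. */
static uint16_t actual_sum_partial_erase(const uint8_t *old_image, size_t old_len,
                                         const uint8_t *new_image, size_t new_len)
{
    uint8_t mem[FLASH_SIZE];
    memset(mem, 0xFF, sizeof mem);
    memcpy(mem, old_image, old_len);  /* previous program */
    memcpy(mem, new_image, new_len);  /* new program covers only its own range */
    return region_sum(mem, sizeof mem);
}
```

With an 8-byte old image and a 4-byte new one, the last 4 bytes of the old image remain in place of 0xFF, so the two sums differ.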
My asm pseudocode uses a 32-bit checksum. It is easy to adapt to 16 bits, though it gets a bit longer.
Yes, C is always convenient and often as efficient as assembler, provided your algorithm is efficient and CPU friendly, and the C code leaves the optimizer little or, even better, nothing to do.
The first line of your loop body 1) shifts the index left, 2) adds the result to the pointer, then 3) dereferences the pointer and 4) stores the flash dword to some file or global scope variable. Then there are 5), 6) two adds. A global scope variable reduces the optimizer's freedom to drop that store as unnecessary and reuse the value in CPU registers instead. A local variable on the stack should be better, I think. Then the for loop 7) increments the index, 8) compares it to the limit and 9) branches to the start of the loop body. The optimizer may group and eliminate some of these steps, but not all. My code 1) dereferences the pointer, reads and adds the data to the checksum and advances the pointer to the next location, all in one step; for a 16-bit checksum there would be an identical step 2). Then it 3) compares the pointer to the limit and 4) branches to the start. Quite a bit faster, isn't it?
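In C, the difference between the two loop shapes looks roughly like the sketch below. This is not your actual code; g_word, sum_indexed and sum_pointer are names I made up, and how much the compiler can simplify the first form depends on the optimizer settings.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical file-scope temporary, standing in for the global
   variable the indexed loop stores each flash dword into. */
uint32_t g_word;

/* Indexed form: the index is scaled and added to the base on every
   pass, and the fetched value goes through a global variable. */
uint32_t sum_indexed(const uint32_t *base, size_t count)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        g_word = base[i];
        sum += g_word;
    }
    return sum;
}

/* Pointer form: fetch, add and advance in one expression, then a
   compare against the end pointer. Everything can stay in registers. */
uint32_t sum_pointer(const uint32_t *p, const uint32_t *end)
{
    uint32_t sum = 0;
    while (p < end)
        sum += *p++;
    return sum;
}
```

Both functions return the same sum; the second one simply gives the optimizer less to untangle.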
Here's a routine for you to calculate a 16-bit sum:
__asm unsigned short cksum(unsigned long start, unsigned long top)
{
// load start to X pointer register
// first 32bits argument is passed in D6
TFR D6, X
// load Y with end address
// top is assumed to be word aligned,
// start-top 0x100..0x110 will return identical result for 0x100..0x10F
// +1 because Y is 24 bits pointer, not 32 like top
LD Y, top+1
CLR D2 // initialize 16 bits checksum = 0
L1: ADD D2, (X+) // add and advance pointer
// ... ADD D2, (X+)
CMP X,Y // compare with top
BLO L1 // keep looping while X < Y
// return value is in D2 already
}
You could also try uncommenting the second ADD, provided start and top differ by a multiple of 4. You could also replace the two ADDs with a 32-bit fetch, a shift and two adds, to check whether 32-bit access is faster. Try replacing the code from L1 down with this:
L1: LD D6, (X+) // 32 bits fetch
ADD D2, D6 // add lo word
LSR D6, D6, #16 // get high word
ADD D2, D6 // add hi word
CMP X,Y // compare with top
BLO L1 // keep looping while X < Y
// return value is in D2 already
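If you want to cross-check on a PC that both variants give the same 16-bit result, a plain C model like the one below should do. It is only a sketch under my assumptions: cksum_words and cksum_dwords are made-up names, the length is given in bytes and must be a multiple of 4, and unlike the asm loops the C loops do nothing for an empty range.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Model of the first loop: one 16-bit add per word. */
uint16_t cksum_words(const uint8_t *mem, size_t len)
{
    uint16_t sum = 0;
    for (size_t i = 0; i + 2 <= len; i += 2) {
        uint16_t w;
        memcpy(&w, mem + i, sizeof w);  /* alignment-safe 16-bit fetch */
        sum = (uint16_t)(sum + w);
    }
    return sum;
}

/* Model of the second loop: one 32-bit fetch, then add the low word
   and the high word separately. */
uint16_t cksum_dwords(const uint8_t *mem, size_t len)
{
    uint16_t sum = 0;
    for (size_t i = 0; i + 4 <= len; i += 4) {
        uint32_t d;
        memcpy(&d, mem + i, sizeof d);                /* alignment-safe 32-bit fetch */
        sum = (uint16_t)(sum + (uint16_t)d);          /* add lo word */
        sum = (uint16_t)(sum + (uint16_t)(d >> 16));  /* add hi word */
    }
    return sum;
}
```

Since addition modulo 2^16 does not care about the order of the words, both functions return the same value on the same host. The numeric value itself depends on byte order, so compare against the target only with that in mind.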
Edward