I didn't know how many LDD-STD pairs you actually have. Yes, indexed addressing with a 5-bit offset (IDX) is sometimes faster than indexed addressing with a 9-bit offset (IDX1). There's also IDX2 (16-bit offset), which sometimes takes even longer. The assembler usually picks the shortest and fastest form of the instruction.
LDD (IDX) takes 3 cycles (see the CPU12 Reference Manual, Section 6, "Instruction Glossary")
LDD (IDX1) also takes 3 cycles
STD (IDX) takes 2 cycles
STD (IDX1) takes 3 cycles
So when the offset is between -16 and +15, the assembler chooses the more compact IDX addressing. That's why the first 8 pairs take 5 cycles each (3 for LDD + 2 for STD), and from the ninth pair on they take 6 cycles (3 + 3).
You could rewrite your code using postincrement indexed addressing, like this:
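If you want to check the arithmetic for a given number of pairs, here is a rough sketch in C. It only models the cycle counts quoted above, and it assumes your STD offsets are 0, 2, 4, ... from X (I'm guessing at your buffer layout). The names pair_cycles and total_cycles are made up for this example, and only the IDX and IDX1 offset ranges are covered.

```c
#include <assert.h>

/* Cycle counts as quoted in this post (from the CPU12 reference manual). */
enum { LDD_CYCLES = 3, STD_IDX_CYCLES = 2, STD_IDX1_CYCLES = 3 };

/* Cycles for one LDD/STD pair when STD uses the given constant offset.
   Only the IDX (-16..15) and IDX1 (-256..255) ranges are modelled. */
int pair_cycles(int offset)
{
    assert(offset >= -256 && offset <= 255);
    int std_cycles = (offset >= -16 && offset <= 15) ? STD_IDX_CYCLES
                                                     : STD_IDX1_CYCLES;
    return LDD_CYCLES + std_cycles;
}

/* Total cycles for n_pairs pairs storing to offsets 0, 2, 4, ...
   (assumed layout; n_pairs must be at most 128 to stay in IDX1 range). */
int total_cycles(int n_pairs)
{
    int total = 0;
    for (int i = 0; i < n_pairs; i++)
        total += pair_cycles(2 * i);
    return total;
}
```

For example, total_cycles(10) works out to 8*5 + 2*6 = 52 cycles under these assumptions.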
LDD _PORTAB // Load 16-bit word to register
STD 2,X+ // Copy register to buffer AND ADVANCE POINTER to the buffer in register X
LDD _PORTAB // etc...
STD 2,X+
LDD _PORTAB
STD 2,X+
...
Postincrement indexed STD takes the same number of cycles as STD with a 5-bit offset, so each pair should take the same 5 cycles.
The CPU12 reference manual contains all the answers to such questions, and it's a free download.
Regards