Variable PORTAB read/store execution time in NE64?

nerdboy · 07-18-2007

We have an NE64 connected to an FPGA for data transfer. We transfer 16-bit words from the FPGA to the NE64 via PORTAB, using our own handshaking scheme implemented in software on PORTE. When the handshaking completes, the FPGA just sends 16-bit word after word, every X clock cycles. The FPGA uses the ECLK from the NE64 for the transfer. The NE64 is in Normal Single Chip mode with NECLK = 0.

The code on the NE64 is the following (minus the handshaking):

    asm
    {
      SEI                // Disable interrupts
      LDX   PointerTo16bitWideBuffer // Load buffer address
      LDD   _PORTAB      // Load 16-bit word to register
      STD   0,X          // Copy register to buffer
      LDD   _PORTAB      //   etc...
      STD   2,X
      LDD   _PORTAB
      STD   4,X
      ...
    }

Given that it's just a bunch of LDD and STD instructions, I'd expect each of these instruction pairs to take a FIXED number of clock cycles. But this doesn't seem to be the case. If for example the FPGA sends (0, 1, 2, 3, 4, 5, 6, 7, etc.) leaving each unchanged for three clock cycles, the NE64 "stores" this as (0, 1, 2, 3, 4, 5, 7, etc.), with 6 missing.

If I repeat the same procedure with the FPGA leaving the data unchanged for 1, 2, 3, 4, 5, 6, or 7 clock cycles, I always run into missing or duplicated codes. No constant-cycle transfer scheme works!

A hack I made that works keeps each code active for the following numbers of clock cycles:
3, 6, 2, 6, 6, 6, 6 Aaaaah!

So my question is: what is going on? Is the NE64 really taking a variable number of clock cycles to execute the LDD and STD pairs?

I've tried playing with the bus cycle stretching settings without success (they shouldn't do anything anyway).

Thanks!

Jeff

kef · 07-18-2007

Do you expect each LDD-STD pair to take a fixed number of cycles, or each LDD and each STD to take the same fixed number of cycles? In your code, LDD ext takes 3 cycles and STD IDX takes 2 cycles, so each LDD-STD pair takes 5 cycles. I don't understand how, with the FPGA output held for only 3 cycles, you are receiving 6 good numbers from the FPGA.

BTW, in your code the SEI instruction comes way too late for a reliable protocol. It should be moved somewhere into the middle of the handshaking procedure.

The only place I see a variable number of cycles is here:

      LDX   PointerTo16bitWideBuffer // Load buffer address

LDX of a misaligned pointer variable in flash takes 1 cycle more than LDX of a pointer variable in internal RAM or a word-aligned variable in flash. But that's just one cycle.

nerdboy · 07-18-2007

Thanks for the quick reply kef.

I was expecting every LDD-STD pair to take a fixed number of cycles. I've done a lot of debugging, and this is what I've found:

The first nine LDD-STD pairs take 5 clock cycles, and then all of those thereafter take 6 clock cycles. If I adjust my FPGA to change the data according to this standard, everything works fine.

Could it be that the STD #,X instructions take an extra cycle if the # is greater than a certain value? Different addressing mode?

Would love to know why this occurs, but at least now I can write less-hacked code for my FPGA.

Thanks again,

Jeff

kef · 07-18-2007

I didn't know how many LDD-STD pairs you actually have. Yes, an instruction with a 5-bit indexed offset (IDX) sometimes takes fewer cycles than one with a 9-bit indexed offset (IDX1). There's also IDX2 (16-bit offset), which sometimes takes even more. The assembler usually picks the shortest and fastest form of the instruction.

LDD (IDX) takes 3 cycles (see the CPU12 Reference Manual, Section 6, "Instruction Glossary").

LDD (IDX1) also takes 3 cycles.

STD (IDX) takes 2 cycles.

STD (IDX1) takes 3 cycles.

So when the offset is between -16 and +15, the assembler chooses the more compact IDX addressing. That's why the first 8 pairs (offsets 0 to 14) take 5 cycles each, and from the ninth pair (offset 16) onward they take 6 cycles.

You could rewrite your code using the post-increment indexed addressing mode, like this:

      LDD   _PORTAB      // Load 16-bit word to register
      STD   2,X+          // Copy register to buffer AND ADVANCE POINTER to the buffer in register X
      LDD   _PORTAB      //   etc...
      STD   2,X+
      LDD   _PORTAB
      STD   2,X+
      ...

A post-increment indexed STD takes the same number of cycles as an STD with a 5-bit offset, so each pair should take a constant 5 cycles.

The CPU reference manual contains the answers to all such questions, and it's a free download :)

Regards

nerdboy · 07-19-2007

Thanks for the code, kef. I tested it, and it does indeed execute at a constant 5 cycles per pair. It also makes my VHDL code much less awkward.

I actually did have a good look through the manual last year when I wrote most of the (hacked) code, but I have to admit I've never really warmed to this architecture. I work with all sorts of PICs in parallel, and, well, it takes a lot less effort to write good assembly code for them.

For instance, I'm disappointed that the instruction description pages of the CPU manual give no indication of execution time per addressing mode (unless it's given very cryptically?).

But I'm very happy to have learned this architecture better, thanks to this forum. Thanks again, guys!

Jeff

Alban · 07-18-2007

EDIT: I see I talked about the S08 and not the S12. The same principle still applies, but the CPU12 reference manual is the one to look at.

Hi guys,

I've attached part of the CPU reference manual.
For each addressing mode, it gives the access details and the number of cycles.
I hope this helps.

The document is HCS08RMV1.pdf

Cheers,
Alban.

Message Edited by Alban on 2007-07-18 09:35 PM
