Application: each suite in a multi-storey hotel has its own control unit that handles lighting, air conditioning, access control (door locks) and other features involving interaction with hotel guests.
The control unit implements TCP/IP networking over Ethernet so that it can exchange data continuously with a central server in the hotel. The network is closed (not accessible from the Internet), but there can be several hundred control units in the system. These form local area networks, typically one per floor. Thus in general there are two hops between a control unit and the central server: one to the LAN router and one from the router to the central server.
Control unit uses an MC9S12XDT512 processor interfaced to a 10Mb/s Ethernet controller, Microchip type ENC28J60. The interface uses the SPI2 port of the processor, which in the 80-pin device does not have the Slave Select line bonded to an external pin. The SS function for the ENC28J60 therefore uses a GPIO port line (PTS.3).
Problem: when reading data from the ENC28J60 via SPI2 (which can involve several hundred bytes in one operation), the monitoring loop which tests SPTEF and the one that tests SPIF appear to 'hang', showing all the appearances of waiting for a flag that never becomes set.
A block of data is read from the ENC28J60 using the following subroutine. The number of bytes to be read is passed in Accum D, and the starting address of a receiving buffer in HCS12 RAM is passed in Index Reg X:
read_mem: PSHD                      ; Save byte count
rdmem01:  LDAB  SPI2SR              ; Check SPTEF
          BEQ   rdmem01             ; Keep looping until set
          BCLR  PTS, mSS2_EN        ; Assert ENC28J60 CS line
          NOP                       ; Delay 2 T(bus)
          MOVB  #RBM_CMD, SPI2DR    ; Send Read Memory command byte
rdmem02:  LDAB  SPI2SR              ; Check SPIF
          BEQ   rdmem02             ; Keep looping until set
          LDAB  SPI2DR              ; Read returned byte, discard
rdmem03:  LDAB  SPI2SR              ; Check SPTEF
          BEQ   rdmem03             ; Keep looping until set
          CLR   SPI2DR              ; Send null byte
rdmem04:  LDAB  SPI2SR              ; Check SPIF
          BEQ   rdmem04             ; Keep looping until set
          MOVB  SPI2DR, 1,X+        ; Get returned byte, copy to receiving memory
          DECW  0, SP               ; Decrement counter and loop
          BNE   rdmem03             ; More bytes to read
          LEAS  2, SP               ; Adjust stack
          BSET  PTS, mSS2_EN        ; De-assert CS line
          RTS                       ; Return to caller
The above issues a 'read memory' command (RBM_CMD) to the ENC28J60, after which the latter supplies data from consecutive locations of its own memory buffer as SPI clocks are generated.
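One detail that may be worth checking, assuming the listing is exactly what gets assembled: LDAB followed by BEQ tests the whole status register for zero, not one particular flag. The C sketch below models the two kinds of test on a host so the difference is easy to see. The bit positions are those of the S12 SPI status register (SPIF in bit 7, SPTEF in bit 5); the function names are hypothetical and only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* SPI status register bit positions on the S12 SPI module. */
#define SPIF  0x80u  /* bit 7: transfer complete, received byte available */
#define SPTEF 0x20u  /* bit 5: transmit data register empty */

/* Model of "LDAB SPI2SR / BEQ loop": the loop exits as soon as
 * ANY bit in the status register is set. */
static bool exits_on_any_bit(uint8_t sr)
{
    return sr != 0u;
}

/* Model of a masked test: the loop exits only when the named flag is set. */
static bool exits_on_flag(uint8_t sr, uint8_t mask)
{
    return (sr & mask) != 0u;
}
```

With a status value of 0x20 (SPTEF set, SPIF clear), `exits_on_any_bit` returns true while `exits_on_flag(sr, SPIF)` returns false. In assembly, a masked test could be written with BITB or BRCLR against the relevant bit mask.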
The NOPs to delay 2 bus clock periods just after the ENC28J60 CS line is asserted are included because there is a certain amount of decoding logic (as well as a 5V/3.3V level translator) between the PTS.3 pin and the actual enable pin of the ENC28J60, and the device Slave Select should be stable before SPI transactions begin. Processor bus clock frequency is 39.3216 MHz (period = 25.4 nsec).
Diagnostic instructions to turn a row of four PCB LEDs on in certain combinations were inserted before and after each of the loops that test SPTEF and SPIF, and these indicate the point at which the program gets stuck.
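A possible complement to the LED diagnostics is to bound each polling loop, so that a stuck flag returns an error together with the last status value seen instead of hanging. The following is only a host-runnable C sketch of that idea, not the firmware itself: `wait_for_flag`, `wait_result`, `fake_spi` and `fake_read_status` are hypothetical names, and the status register is read through a callback that on the target would simply read SPI2SR.

```c
#include <stdbool.h>
#include <stdint.h>

#define SPIF  0x80u  /* bit 7: transfer complete */
#define SPTEF 0x20u  /* bit 5: transmit data register empty */

/* Hypothetical accessor for the SPI status register. */
typedef uint8_t (*read_status_fn)(void *ctx);

typedef struct {
    bool     ok;           /* true if the flag was seen before the limit */
    uint8_t  last_status;  /* last status value read, useful for diagnosis */
    uint32_t polls;        /* number of status reads performed */
} wait_result;

/* Poll until (status & mask) is non-zero or max_polls reads have been made. */
static wait_result wait_for_flag(read_status_fn read_status, void *ctx,
                                 uint8_t mask, uint32_t max_polls)
{
    wait_result r = { false, 0u, 0u };
    while (r.polls < max_polls) {
        r.last_status = read_status(ctx);
        r.polls++;
        if ((r.last_status & mask) != 0u) {
            r.ok = true;
            break;
        }
    }
    return r;
}

/* Host-side stand-in for the hardware: reports SPIF from the
 * set_after-th read onwards (set_after == 0 means never). */
typedef struct {
    uint32_t reads;
    uint32_t set_after;
} fake_spi;

static uint8_t fake_read_status(void *ctx)
{
    fake_spi *f = (fake_spi *)ctx;
    f->reads++;
    if (f->set_after != 0u && f->reads >= f->set_after) {
        return SPIF | SPTEF;
    }
    return SPTEF;  /* transmit register empty, no byte received yet */
}
```

On a timeout the caller could de-assert the CS line, log `last_status`, and reinitialise the SPI port or the ENC28J60, which would also show whether the unit recovers or the fault persists.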
Tests consist of attaching about 30 control units to a local network via Ethernet switches. A computer running the central server software (as well as DHCP server software) is also on the LAN. On power-up, each control unit successfully acquires a DHCP assigned IP address from the server and begins normal operation. Normal operation consists of relatively low volume traffic, each control unit receiving a poll from the server at 20-30 second intervals. After about 10 minutes the control units begin to go off line and cease to respond to Ping messages. After 1-2 hours, more than half of the units have ceased to operate, and in all cases the diagnostic LEDs indicate hanging at one or other of the test loops for SPTEF and SPIF.
(i) SPI clock was originally set at the maximum of half bus clock frequency. But I later noticed that this must be derated above 35 MHz, so I reduced SPI clock frequency to one-sixth of bus clock (approx 6.5 MHz). This made no discernible difference to the behaviour.
(ii) The SPI0 port of the same processor is also being used for something. Its SPI clock frequency is lower (approx 1.6 MHz), but otherwise operates in pretty well exactly the same way from a firmware point of view. No problems have come to light with this however. On the face of it, the next step should be to reduce SPI2 clock further, even though it is now well within spec.
(iii) The processor executes a very short (~1usec) interrupt routine every 10msec. I tried disabling this during execution of the above subroutine, but this made no difference.
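The SPI clock figures quoted in the points above can be cross-checked against the S12 SPI baud rate formula, divisor = (SPPR + 1) * 2^(SPR + 1). The C sketch below does that arithmetic for the 39.3216 MHz bus clock. The SPPR/SPR pairs used in the usage note are assumptions chosen to give divisors of 2, 6 and 24; the actual register settings are not stated in the post.

```c
#include <stdint.h>

/* S12 SPI baud rate divisor: (SPPR + 1) * 2^(SPR + 1),
 * with SPPR and SPR each in the range 0..7. */
static uint32_t spi_divisor(uint8_t sppr, uint8_t spr)
{
    return ((uint32_t)sppr + 1u) << (spr + 1u);
}

/* Resulting SPI clock in Hz for a given bus clock. */
static uint32_t spi_clock_hz(uint32_t bus_hz, uint8_t sppr, uint8_t spr)
{
    return bus_hz / spi_divisor(sppr, spr);
}
```

For a bus clock of 39321600 Hz, `spi_clock_hz(bus, 0, 0)` gives 19660800 Hz (half bus clock), `spi_clock_hz(bus, 2, 0)` gives 6553600 Hz (one-sixth, the "approx 6.5 MHz" setting), and `spi_clock_hz(bus, 2, 2)` gives 1638400 Hz, which matches the "approx 1.6 MHz" quoted for SPI0 if its divisor is 24.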
Are there known issues with the SPI ports of the MC9S12X series, in particular when operating at higher frequencies? If not, can anyone spot a weakness in the above code or make any further suggestions?