Flash programming for GB60 and compatibles

tonyp · ‎04-01-2006

Hi all,

I'm trying to minimize the RAM size needed to program the flash of a GB60, QG8 or compatibles. So, I made some changes to working code, and it no longer works. But I want to understand why.

From various documentation (GB60, QG8, HCS08) about flash programming, it seems that a flash command is actually 'launched' when a 1 is written to the FCBEF bit in FSTAT. It should then be possible to do all previous steps (i.e., latching the address and data with "sta ,x", and writing to FCMD) from Flash, not RAM. Only the steps of writing FCBEF in FSTAT (which launches the command) and waiting for FCCF to become set (which marks the end of a command) should need to be executed from RAM.

It isn't very clear when a Flash programming session actually begins, that is, the point from which Flash is no longer available for running code. Any ideas, or a better reference that explains the steps in greater detail?

Thanks

tonyp@acm.org

rocco · ‎04-02-2006

Hi, Tony:

My understanding, and it's based on vague communications with Motorola tech support (remember them?) on a GP20 issue, was that the address latch is affected by both reads and writes to flash. So the instruction fetch cycles would interfere with programming.

Not sure how reliable that information is today, however. Working with flash was a struggle back then.

Message Edited by rocco on 04-02-2006 02:00 PM

Alban · ‎04-05-2006

Hi Rocco & Tony,

The problem on the GP affected some members of the HC08 family.
QG and GB are S08 and use a completely different technology from HC08.

Therefore we can't make any comparison between these.
I have never heard of any problem on S08 Flash yet.

Cheers,
Alban.

rocco · ‎04-05-2006

Hi, Alban:

I don't think that Tony is implying that there is a problem; he is just attempting to push the envelope a little.

And though the technologies are different between HC08 and HCS08, the programming algorithms are so similar that I would bet the address latch has the same behavior. Not that this behavior is a problem; this is simply an exercise in understanding the mechanisms behind flash programming.

Alban · ‎04-07-2006

Hi guys,

nah, I'm all right and wasn't getting nervous at all

Still, I think you can compare the S08 to the S12 for the Flash, but not really to the HC08. First reason being the state machine it uses with commands instead of timings for you to manage...

Alban

rocco · ‎04-10-2006

Alban wrote:
. . . First reason being the state machine it uses with commands instead of timings for you to manage...

Ah, yes, I see your point now, after reading over the flash programming chapters for both parts. I guess I will have to experiment some, as I also need to reduce the ram footprint of my boot loaders.

tonyp · ‎04-06-2006

That's right. I never said there was a problem with Flash programming, just that I was looking for a way to reduce the RAM required (especially for the QG8), and that the manuals leave unanswered the question of 'when exactly' flash access is inhibited.

Anyway, I managed to shorten the example code given in Fig. 4-12 of the HCS08 Family Reference Manual from 24 bytes to 21 bytes + 2 for the JSR/BSR.

This was done by making these changes (RAM portion of code listed below):

1. Preload A with the value to write just before calling the routine.

2. Use immediate addressing mode for the loading of the command byte.

3. Have the loader patch the address in STA FLASH with the actual address, thereby eliminating the need to load HX inside the routine and then use STA ,X. (We couldn't possibly avoid using HX for calling the RAM routine because any other method would mean extra stack used by the loader portion, so the benefit would be lost.)

But I had hoped to do even better by being able to do the "sta FLASH" and "lda...sta FCMD" outside this routine.

tonyp@acm.org

;*******************************************************************************
; Purpose: RAM routine to do the job we can't do from Flash
; Input  : A = value to program
; Note(s): This routine is modified in RAM by its loader at @2,3 and @5
;        : Stack needed: 21 bytes + 2 for JSR/BSR

?RAM_Execute       sta       FLASH               ;FLASH (@2,@3) is replaced

                   lda       #mByteProg          ;mByteProg (@5) is replaced
                   sta       FCMD                ;Step 2 - Write command to FCMD

                   lda       #FCBEF_
                   sta       FSTAT               ;Step 3 - Write FCBEF_ in FSTAT
                   nop                           ;required delay

?RAM_Execute.Loop  lda       FSTAT               ;Step 4 - Wait for completion
                   lsla                          ;check FCCF_ for completion
                   bpl       ?RAM_Execute.Loop
?RAM_Execute_End   rts                           ;on exit, A has non-zero if error

?RAM_Needed        equ       *-?RAM_Execute
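
The loader side is not shown in this post. As a rough illustration only, the stacking and patching it describes might look like the sketch below. Everything here is an assumption rather than the author's actual loader: `Target` (2 bytes) and `Value` (1 byte) are hypothetical zero-page RAM variables, the `?Stack.Loop` label is invented, and the `@2,3` and `@5` positions in the header are taken to be 1-based byte positions (so offsets 1, 2 and 4 from the start of the copy). Clearing any pending FSTAT error flags beforehand and re-enabling interrupts afterwards are left out.

```asm
;Hypothetical loader sketch (not from the original post)

                   sei                           ;vectors are in Flash: no interrupts
                   ldhx      #?RAM_Execute_End   ;HX -> last byte of the image (RTS)
?Stack.Loop        lda       ,x
                   psha                          ;stack grows down: push last byte first
                   aix       #-1
                   cphx      #?RAM_Execute
                   bhs       ?Stack.Loop         ;until the first byte has been pushed
                   tsx                           ;HX -> first byte of the stacked copy

                   lda       Target              ;patch address operand of STA (@2,@3)
                   sta       1,x
                   lda       Target+1
                   sta       2,x
                   lda       #mByteProg          ;patch command operand of LDA (@5)
                   sta       4,x

                   lda       Value               ;A = value to program
                   jsr       ,x                  ;run the stacked copy
                   ais       #?RAM_Needed        ;release the stacked copy
```

Because the call is made with JSR ,X, the only stack used beyond the copy itself is the 2-byte return address, which matches the "+ 2 for JSR/BSR" in the header.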

tonyp · ‎04-20-2006

Here's an improved version, one byte shorter. Instead of NOP we use LSRA, which provides the same delay but also leaves A with only the FCCF bit set (the FCBEF mask shifted right by one). Later, instead of "LOOP: LDA FSTAT, LSLA, BPL LOOP" we use the mask already in A to do "LOOP: BIT FSTAT, BEQ LOOP".

We can check for errors once we exit this RAM routine, from the Flash portion, by loading FSTAT and checking for FPVIOL and FACCERR.

tonyp@acm.org

;*******************************************************************************
; Purpose: RAM routine to do the job we can't do from Flash
; Input  : A = value to program
; Output : None
; Note(s): This routine is modified in RAM by its loader at @2,3 and @5
;        : Stack needed: 20 bytes + 2 for JSR/BSR

?RAM_Execute       sta       EEPROM              ;Step 1 - Latch data/address
                                                 ;EEPROM (@2,@3) replaced
                   lda       #mByteProg          ;mByteProg (@5) replaced
                   sta       FCMD                ;Step 2 - Write command to FCMD

                   lda       #FCBEF_
                   sta       FSTAT               ;Step 3 - Write FCBEF_ in FSTAT
                   lsra                          ;min delay before checking FSTAT
                                                 ;(FCBEF -> FCCF for later BIT)
?RAM_Execute.Loop  bit       FSTAT               ;Step 4 - Wait for completion
                   beq       ?RAM_Execute.Loop   ;check FCCF_ for completion
?RAM_Execute_End   rts

;after exit, check FSTAT for FPVIOL and FACCERR

?RAM_Needed        equ       *-?RAM_Execute
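
A minimal sketch of that post-exit check, for illustration. `FPVIOL_` and `FACCERR_` are assumed mask names following the `FCBEF_` convention used above (your include file may already define them), with values taken from the FSTAT bit positions documented for the GB60 and QG8 (bit 5 and bit 4). `?Failed` is a hypothetical error handler.

```asm
;Sketch only: runs from Flash, after the RAM routine has returned

FPVIOL_            equ       %00100000           ;FSTAT bit 5 (assumed name)
FACCERR_           equ       %00010000           ;FSTAT bit 4 (assumed name)

                   lda       FSTAT
                   and       #FPVIOL_+FACCERR_   ;keep only the two error flags
                   bne       ?Failed             ;either flag set: command failed
```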

Message Edited by tonyp on 04-20-2006 03:24 PM

rocco · ‎04-20-2006

Hi, Tony:

I noticed that you were not using the H:X registers for anything. Is there a reason?

I tried using it for the address to program, and squeezed out two bytes, for a total of 18.

PS: I had to use a .PDF attachment, as I can't prevent the board from munging the formatting on the code, and the board won't accept .txt, .asm, etc. . . .

Message Edited by rocco on 04-20-2006 03:45 PM

rocco · ‎04-20-2006

But then I tried the same thing for Flash register addressing, and saved 5 bytes, for a total of 15.

PS: I did a second post because the board doesn't appear to allow two attachments in one post

Message Edited by rocco on 04-20-2006 03:45 PM

tonyp · ‎04-20-2006

OK, both your squeezes are already known, but here's why they won't do...

The first one won't really save you any RAM since to use HX inside the routine you must find a different (other than JSR ,X) way of calling the stacked routine. One way is to do this:

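; this is right after stacking the routine to RAM

                   LDHX      #RETURN_ADDRESS
                   PSHHX                         ;stack the return address
                   TSX                           ;HX -> stacked return address
                   AIX       #2                  ;skip return address
                   PSHHX                         ;stack the routine's address
                   LDHX      #FSTAT              ;HX is now free for the routine's use
                   STA       COPCTL
                   SEI                           ;disable interrupts
                   RTS                           ;call routine

RETURN_ADDRESS:
                   ...etc...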

So, what you save inside the routine, you lose outside before calling it. And it evens out. The comment should read: "Stack Needed: 18 bytes + 4 for JSR emulation"

As for the second squeeze: although I've tested it several times in practice and it seems to work, I can't 'officially' trust it yet because, for the time being at least, it violates the timing requirement of four cycles between writing to FSTAT (with STA ,X) and checking it (with BIT ,X). So, if there is any possibility of FSTAT giving a wrong response, the test may fail unpredictably, and the routine will be unreliable.

But, if Freescale can verify that there is no problem (since in reality it 'appears' to work OK), then I'll be fine with it, and it's the shortest possibility found to date.

tonyp@acm.org

Message Edited by tonyp on 04-21-2006 02:04 AM

Message Edited by tonyp on 04-21-2006 02:34 AM

rocco · ‎04-21-2006

Ahh, You WERE using H:X already! I didn't notice that.

I also didn't notice the four cycle delay. In my mind, that is a deal breaker. I would not violate the delay without express, written permission from Freescale and its insurance company. You don't want the flash failing in the field.

My flash routine sits in a dedicated portion of ram, not on the stack, so I call it with an extended address. The flash routine never runs when the firmware is operational, so I just overlay expendable data.
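
For comparison, a rough sketch of what such a fixed-RAM arrangement could look like. This is not rocco's actual code (his routine was posted only as an attachment); it reuses the `?RAM_Execute` and `?RAM_Needed` labels from Tony's listing purely for illustration, and `FlashRoutine` is a hypothetical RAM address that would have to overlay data known to be expendable.

```asm
;Hypothetical fixed-RAM variant (illustration only)

FlashRoutine       equ       $0100               ;assumed RAM address

                   ldhx      #?RAM_Needed        ;H = 0, X = byte count (must be < 256)
?Copy.Loop         lda       ?RAM_Execute-1,x    ;copy from the Flash image...
                   sta       FlashRoutine-1,x    ;...to the fixed RAM location
                   dbnzx     ?Copy.Loop          ;last byte first, down to offset 0

                   ;patch operands and load A as required, then:
                   jsr       FlashRoutine        ;extended call, no pointer in HX
```

Since the routine is reached with an extended JSR, H:X does not have to hold the routine's address at the time of the call.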
