Flashing Made Simple Min Ram Redux HCS08 Family

JimDon · ‎02-28-2008

Well, thanks to the help of several people, and the realization that the app. notes sort of misrepresented certain aspects of flashing, we now have it down to 10 bytes of ram.

It turns out that in non-burst mode, the only code that must be in ram is writing the start command bit and the wait loop, which cut another 8 bytes out of ram.

If you notice in the asm code that does not go in RAM, I seemingly loaded X with 'page' when I did not need to. This is not an oversight: it is bad design to "know" where the compiler puts things, as other compilers may or may not pass things this way, or this compiler may change.

According to the app. note that this was taken from (HCS08RUG) this is for the HCS08 family.
It started out at 59 bytes in ram.

Code:

volatile unsigned char PGM[10]  = {
0xc7,0x18,0x25,     // sta       FSTAT
0x44,               // lsra -  delay and convert to FCCF bit
0xc5,0x18,0x25,     // Bit fstat
0x27,0xfb,          // BEQ *-3
0x81                // RTS
};
byte FlashErasePage(word page)
{
   asm {
      TPA         ; Get status to A
      PSHA        ; Save current status
      SEI         ; Disable interrupts
      LDA  #0x30
      STA  FSTAT  ; Clear FACCERR & FPVIOL flags
      LDX  page
      STA ,X      ; Save the data
      LDA  #$40   ; Erase command
      STA  FCMD
      LDA  #FSTAT_FCBEF_MASK
      JSR  PGM
      PULA        ; Restore previous status
      TAP
   }

   if (FSTAT&0x30){                     //check to see if FACCERR or FVIOL are set
    return 0xFF;                       //if so, error.
   }
   return 0;
}

byte FlashProgramByte(word address, byte data)
{
   asm{
      TPA
      PSHA        ; Save current status
      SEI         ; Disable interrupts
      LDA  #0x30
      STA  FSTAT  ; Clear FACCERR & FPVIOL flags
      LDX  address
      LDA  data
      STA ,X      ; Save the data
      LDA  #$20   ; Burn command
      STA  FCMD
      LDA  #FSTAT_FCBEF_MASK
      JSR   PGM
      PULA        ; Restore previous status
      TAP
   }

   if (FSTAT&0x30){                     //check to see if FACCERR or FVIOL are set
    return 0xFF;                         //if so, error.
   }
   return 0;
}

JimDon · ‎02-28-2008

Ya, I thought of that, but at 2 AM I was too lazy to figure out whether you can do BIT ,X or not.
I also introduced a bug late last night, which is now fixed (I actually had LDHX originally, but for some reason changed it).
If possible, maybe one of you can delete the first post, or even correct it.
Since none of you spotted it, I don't feel as bad :--)

Code:

volatile unsigned char PGM[6]  = { 
0xf7,                // sta ,X      FSTAT 
0x44,               // lsra -  delay and convert to FCCF bit
0xf5,               // Bit ,X       FSTAT
0x27,0xfd,          // BEQ *-1
0x81                // RTS
};
byte FlashErasePage(word page)
{
  
   asm {
      TPA         ; Get status to A
      PSHA        ; Save current status 
      SEI         ; Disable interrupts
      LDA  #0x30
      STA  FSTAT  ; Clear FACCERR & FPVIOL flags
      LDHX  page
      STA ,X      ; Save the data
      LDA  #$40   ; Erase command
      STA  FCMD
      LDA  #FSTAT_FCBEF_MASK
      LDHX @FSTAT
      JSR  PGM
      PULA        ; Restore previous status
      TAP
   }
  
   return (FSTAT & 0x30); 
}

byte FlashProgramByte(word address, byte data)
{
  
   asm{
      TPA 
      PSHA        ; Save current status 
      SEI         ; Disable interrupts
      LDA  #0x30
      STA  FSTAT  ; Clear FACCERR & FPVIOL flags
      LDHX  address
      LDA  data
      STA ,X      ; Save the data
      LDA  #$20   ; Burn command
      STA  FCMD
      LDA  #FSTAT_FCBEF_MASK
      LDHX @FSTAT
      JSR   PGM
      PULA        ; Restore previous status
      TAP
   }
  
   return (FSTAT & 0x30);
 
}

Message Edited by JimDon on 2008-02-27 09:29 PM

bigmac · ‎02-28-2008

Hello,

Actually, the delay to the read cycle is now only 2 cycles, rather than 4. I chose to put in an extra dummy read to get sufficient delay, giving a total of 7 bytes.

void PGM( void)
{
   __asm {
      STA ,X  ; wp   Commence flash command; delay 5 cycles until next test
      LSRA     ; p    FCCF mask (0x40)
      BIT ,X  ; rfp Dummy read
L1:   BIT ,X  ; rfp Test for command complete
      BEQ L1  ;      Loop if not
   }
}
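The LSRA in these routines does double duty: it adds a cycle of delay and turns the FCBEF mask already in A into the FCCF mask used by the wait loop. A minimal host-side sketch of that relationship follows. The macro and function names are illustrative only; the mask values are the ones used in this thread ($80 written to FSTAT, 0x30 for the two error flags).

```c
#include <stdint.h>

/* FSTAT masks as used in this thread. Names are illustrative. */
#define FCBEF_MASK     0x80u  /* written to FSTAT to launch the command */
#define FCCF_MASK      0x40u  /* polled by BIT in the wait loop */
#define FLASH_ERR_MASK 0x30u  /* FPVIOL and FACCERR together */

/* Same effect as LSRA on the accumulator: logical shift right by one. */
static uint8_t lsra(uint8_t a)
{
    return (uint8_t)(a >> 1);
}

/* Nonzero when either error flag is set in a given FSTAT value. */
static uint8_t flash_error_bits(uint8_t fstat)
{
    return (uint8_t)(fstat & FLASH_ERR_MASK);
}
```

So after `STA ,X` launches the command with A = 0x80, a single LSRA leaves A = 0x40, ready for `BIT ,X` without reloading the accumulator.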

Regards,

Mac

CarlFST60L · ‎04-26-2013

I was just looking for a version with COP reset and was unable to find one. Here is the updated RAM function that will work on all processors, now with COP reset. My GB32 was getting reset by the COP even on the slowest setting, so it was required.

//------------------------------------------------------------------------------------------------------------------------
//ASM code to write to flash from RAM
#pragma DATA_SEG MY_ZEROPAGE //003
#define Use_COP //Use COP reset, you may need to modify the address of SRS for the selected processor

#ifdef Use_COP
volatile unsigned char PGM[10] = {
#else
volatile unsigned char PGM[7] = {
#endif
0xf7, // sta ,X FSTAT
0x44, // lsra - delay and convert to FCCF bit
0xf5, // Bit ,X FSTAT
#ifdef Use_COP
0xC7,0x18,0x00, // sta SRS (Clear cop), SRS is at 1800 in this processor
#endif
0xf5, // Bit ,X FSTAT //Added dummy read to allow all masked version to function correctly
#ifndef Use_COP
0x27,0xfd, // BEQ *-1
#else
0x27,0xfa, // BEQ *-4
#endif
0x81 // RTS
};
#pragma DATA_SEG DEFAULT //003
//------------------------------------------------------------------------------------------------------------------------
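Because these routines are hand-assembled into byte arrays, the BEQ offset byte has to be recomputed whenever an instruction is added inside the loop. A relative branch stores a signed 8-bit offset measured from the address of the instruction that follows the 2-byte branch, while the `*-n` comments count from the branch opcode itself. The helper below is a hypothetical host-side check, not part of anyone's posted code; indices are byte positions within the PGM array.

```c
#include <stdint.h>

/* Offset byte for a 2-byte relative branch whose opcode sits at index
   branch_at and which jumps to index target in the same byte array.
   The offset is relative to the instruction after the branch. */
static uint8_t branch_offset(int branch_at, int target)
{
    return (uint8_t)(target - (branch_at + 2));
}
```

For the COP variant above, BEQ sits at index 7 and loops back to the `sta SRS` at index 3, giving 0xfa. Without the COP write, BEQ sits at index 4 and loops to the second `Bit ,X` at index 3, giving 0xfd.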

CompilerGuru · ‎02-28-2008

any reason not to preload H:X with @FSTAT before calling into RAM?
That would reduce it from 10 to 6 RAM bytes.
Daniel

bigmac · ‎02-28-2008

Hello Jim,

JimDon wrote:
It turns out that in non-burst mode, the only code that must be in ram is writing the start command bit and the wait loop, which cut another 8 bytes out of ram.

Congratulations - everyone had missed this up until now, but it makes sense that access to flash would remain up until the FCBEF flag is written, and the command commences. Probably a case of "not seeing the wood for the trees".

If you notice in the asm code that does not go in RAM, I seemingly loaded X with 'page' when I did not need to. This is not an oversight: it is bad design to "know" where the compiler puts things, as other compilers may or may not pass things this way, or this compiler may change.

This also avoids a compiler warning about an unused parameter.

Within your erase and program functions, I might suggest that the statements -

if (FSTAT & 0x30){  // check to see if FACCERR or FVIOL are set
   return 0xFF;     // if so, error.
}
return 0;

be replaced with -

return (FSTAT & 0x30);

The return will still be zero when there is no error.
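A small host-side sketch of the two forms, taking the FSTAT value as a parameter so it can be tried off-target. The function names are made up for illustration. Both return zero on success; the short form returns the raw error bits instead of 0xFF, so callers should test for nonzero rather than compare against 0xFF.

```c
#include <stdint.h>

#define FLASH_ERR_MASK 0x30u  /* FACCERR and FPVIOL, as in the thread */

/* Original form: collapse any error to 0xFF. */
static uint8_t status_long(uint8_t fstat)
{
    if (fstat & FLASH_ERR_MASK) {
        return 0xFF;
    }
    return 0;
}

/* Suggested form: return the error bits themselves. */
static uint8_t status_short(uint8_t fstat)
{
    return (uint8_t)(fstat & FLASH_ERR_MASK);
}
```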

Regards,

Mac

tonyp · ‎02-28-2008

bigmac wrote:

Congratulations - everyone had missed this up until now, but it makes sense that access to flash would remain up until the FCBEF flag is written, and the command commences. Probably a case of "not seeing the wood for the trees".

Actually, this idea has some history.

But in those days it didn't seem to work (can't remember if I had actually run tests or relied on the fact that the app note indicated otherwise), and if I remember correctly one explanation against it was that the address latching is affected by both reads and writes to flash. But this may have been fixed in the meantime.

I'll run my own tests first chance I get, but for the record has anyone actually tested this (i.e., only the writing to FSTAT up until the FCCF bit is set needs to be in RAM) and it works, or are we dealing with theory once again?

Rereading my original post, it is evident that I had tested this and it didn't work at the time.

Message Edited by tonyp on 2008-02-28 12:52 PM

peg · ‎02-28-2008

Hello,

Well I was just reading these posts and I was thinking...... "Tony is going to have something to say about this" Lo and behold he posts a few minutes later.

I must admit I have been wanting to check this out myself for a long time. Freescale/Motorola are well known for their ultra conservative approach to this sort of thing, often offering the safe cover-all approach over the exact, technically based, fully optimised approach.

I have discovered many cases where only by experimentation can you find the real technical details behind these sort of things that are glossed over in the documentation.

JimDon · ‎02-28-2008

Well, I did test it. I am not one to armchair these things if I have the item right in front of me.
It seems a given that if you post code you have not tried, it will be wrong.

However I will recheck things, but it seems to work on QG8, AW16 and QE128.
It is easy enough to check things wrong, so I will try it all again.

The test is erase a sector, write a sector with a count pattern, check the data.
I will say that as far as when the flash goes off line, when you get this wrong it is a hard failure - crash and burn, as of course if execution memory can't be read things go bad fast.
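The test described here (erase a sector, write a count pattern, check the data) can be sketched as below. This is a host-side model only: `sim_flash`, `sim_erase_page` and `sim_program_byte` are stand-ins invented for illustration, and on the target they would be replaced by the real FlashErasePage and FlashProgramByte calls on an actual 512-byte page.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u

static uint8_t sim_flash[SECTOR_SIZE];  /* stand-in for one flash page */

/* Model of a page erase: every byte reads back as 0xFF. */
static uint8_t sim_erase_page(void)
{
    memset(sim_flash, 0xFF, sizeof sim_flash);
    return 0;
}

/* Model of byte programming: bits can only be cleared, never set. */
static uint8_t sim_program_byte(uint16_t offset, uint8_t data)
{
    if (offset >= SECTOR_SIZE) {
        return 0xFF;
    }
    sim_flash[offset] &= data;
    return 0;
}

/* Erase, check blank, write a count pattern, verify.
   Returns 0 on success or a negative step code on failure. */
static int count_pattern_test(void)
{
    uint16_t i;

    if (sim_erase_page() != 0) return -1;
    for (i = 0; i < SECTOR_SIZE; i++) {
        if (sim_flash[i] != 0xFF) return -2;
    }
    for (i = 0; i < SECTOR_SIZE; i++) {
        if (sim_program_byte(i, (uint8_t)i) != 0) return -3;
    }
    for (i = 0; i < SECTOR_SIZE; i++) {
        if (sim_flash[i] != (uint8_t)i) return -4;
    }
    return 0;
}
```

A readback match only shows the data is correct right after programming; it says nothing about retention.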

As peg mentions, Freescale leaves us to guess as to the actual logic, and sample code has to take into account errata (published and unpublished). It does make sense that the flash would not go offline until the command is started. I played with the burst mode, which leaves the charge pump on, and it would seem that as long as the charge pump is on, no flash.

To be honest, the 20 something byte one was fine by me, but unless there is some compelling evidence to the contrary, what the heck.

As far as the extra BIT ,X, can you give any particular reason why the BUSY bit would not go true after you set the command bit and execute an instruction? Originally it was 4 NOPs, but I am thinking that all that was needed was some extra clocks to propagate the busy bit through some logic, and there is no variation there, as the single cycle seems to have worked every time. Perhaps an early version truly required 4.

Again, speculation, but having designed similar logic and put such requirements on it, it makes sense. On the other hand, if we are waiting for the charge pump to come up before busy is set, it could depend on FCLK and its relation to BCLK.
If you believe the latter, the original 4 cycles should probably be there. I currently believe the former, as that is what the current data seems to indicate.

Would you try it on the GB60? I suppose it is possible you had some other problem.
There could be odd timing problems I have not seen.

Message Edited by JimDon on 2008-02-28 09:16 AM

tonyp · ‎02-28-2008

OK, you had me going. So, I took out my GB60 demo board and another QG8 board we use for a product (both known to work without any problems if loaded with correct code), and ran some tests on both. For both boards, the exact same code was used except for the inevitable differences in MCU specific initialization. All tests were performed in run mode (no debugger attached) to eliminate possible side effects of using BDM.

I first tested with working Flash code that I've been using in numerous and diverse applications without any problems, just to verify my test code was setup correctly for both MCUs. Everything worked perfectly, I could erase, re-write, etc. The original (working) RAM routine is the familiar one:

;*******************************************************************************
; Purpose: RAM routine to do the job we can't do from Flash
; Input : A = value to program
; Output : None
; Note(s): This routine is modified in RAM by its loader at @2,3 and @5
; : Stack needed: 20 bytes + 2 for JSR/BSR

?RAM_Execute       sta       EEPROM              ;Step 1 - Latch data/address
                                                 ;EEPROM (@2,@3) replaced
                   lda       #mByteProg          ;mByteProg (@5) replaced
                   sta       FCMD                ;Step 2 - Write command to FCMD

                   lda       #FCBEF_
                   sta       FSTAT               ;Step 3 - Write FCBEF_ in FSTAT
                   lsra                          ;min delay before checking FSTAT
                                                 ;(FCBEF -> FCCF for later BIT)
?RAM_Execute.Loop  bit       FSTAT               ;Step 4 - Wait for completion
                   beq       ?RAM_Execute.Loop   ;check FCCF_ for completion
                   rts

;after exit, check FSTAT for FPVIOL and FACCERR

?RAM_Needed equ *-?RAM_Execute

Next, I moved Step1, Step2, and half of Step3 outside this routine, just prior to calling it with this code (at the dotted point, A has the data and HX the address):

;*******************************************************************************
; Purpose: RAM routine to do the job we can't do from Flash
; Input : A = FCBEF bit mask
; Output : None
; Note(s): This routine is modified in RAM by its loader at @2,3 and @5
; : Stack needed: 10 bytes + 2 for JSR/BSR

?RAM_Execute       sta       FSTAT               ;Step 3 - Write FCBEF_ in FSTAT
                   lsra                          ;min delay before checking FSTAT
                                                 ;(FCBEF -> FCCF for later BIT)
?RAM_Execute.Loop  bit       FSTAT               ;Step 4 - Wait for completion
                   beq       ?RAM_Execute.Loop   ;check FCCF_ for completion
                   rts

;after exit, check FSTAT for FPVIOL and FACCERR

?RAM_Needed equ *-?RAM_Execute

                   ...

                   sta       ,x                  ;Step 1: Latch the data/address
                   lda       #mByteProg          ;command to use
                   sta       FCMD                ;Step 2 - Write command to FCMD
                   lda       #FCBEF_             ;prepare for Step 3
                   sei                           ;disable interrupts
                   tsx
                   sta       COP                 ;reset COP
                   jsr       ,x                  ;execute RAM routine to perform Flash command
                   ais       #?RAM_Needed        ;de-allocate stacked routine
                   lda       FSTAT
                   bit       #FPVIOL_|FACCERR_   ;check for errors
                   beq       ?Success
?Error            sec
                   rts

Both boards gave consistent errors in all programming (or erasing) attempts. This was not the crash-type error one would expect if Flash were momentarily unavailable: the routine returned with an error condition, and in all cases code continued to run from Flash correctly. After checking the code over and over for even the tiniest possible violation of Freescale's published steps for Flash programming, I couldn't find any.

But, I still didn't give up. I thought if JimDon doesn't see a problem I shouldn't either, so we must be doing something differently. After juggling several things around trying this and that, I had some progress.

The last thing I tried (isn't it always the last place you look?) was to place the COP resetting instruction before the STA ,X that latches the data/address pair (where the dots are in the code above.) I then tried it on the QG8 and, guess what, SUCCESS!

I had finally killed the beast that was hunting me for nearly two years, and I went back and tried it again, this time on the GB60. Unfortunately, not the same results here. It kept failing consistently like it did before this change. It seems I had only wounded the beast, but it was still loose.

Now, keep in mind that COP is the SRS register (at $1800 in both chips). It is NOT flash, so technically, there is no violation by writing to COP after latching with STA ,X. But, surprisingly, it made a difference even if only for the QG8.

Conclusion:

This method is unreliable. Whether it works seems to depend on mask revisions or other hidden differences, and the fact that writing to COP alone makes a difference is disturbing. (In some early QT's, writing to COP at $FFFF, which was also part of Flash, caused serious problems, but in this case there is no excuse.)

Regarding another issue, the 4 cycle delay, and whether we should obey it or not: Like I said here (second to last paragraph), my experience shows that indeed it works even without it. But, unless one knows the exact internal logic, it is unsafe. Isn't it possible, for example, that just one of the error conditions takes four cycles, while all others take just one cycle? Your code should be able to handle them all. Only reliable source for this is Freescale, unless someone can come up with a test for each and every possible error condition. And given today's results with unexpected and undocumented behavior, I would be very hesitant to use code that works only on certain people's birthdays.

I'll stick with what is working 100% for now, until there is brighter light on this issue.

Message Edited by tonyp on 2008-02-28 08:21 PM

JimDon · ‎02-28-2008

Let me ask you this, was the flash correctly programmed?
Meaning was the data programmed into the flash the same values?
I realize this is not the final test, but that is what I was, perhaps erroneously, looking for.

I was pretty clear about that, and I will agree that there is more to it, but I am curious, as it was not entirely clear from your post if this was the case or not.

"but in all cases code continued to run from Flash correctly": does this imply that the data was correctly programmed? Did you verify that all bits were set? As I have agreed, there are other considerations, like how long it will stay set, but getting the correct data would be the first qualification as to whether it programmed correctly.

At this point I am just curious.

BTW I appreciate all of you jumping in here. I will say my goal was not really to cut this down to min size, and perhaps I should not have titled the post that way. I am still using the 59 byter verbatim from the app note, because I don't have the equipment to really test this, which means a temperature controlled oven and 100 chips to see if the method ages well. However, the 20 something one looks pretty good, and I guess the GB60 is an acid test.
Thanks tony for your time and expertise.

There were other issues, like not having to change the PRM and correctly dealing with interrupts.

Message Edited by JimDon on 2008-02-28 02:24 PM

tonyp · ‎02-28-2008

JimDon wrote:
Let me ask you this, was the flash correctly programmed?

In all cases, when there was success, I got the correct data programmed. When there was failure, no change in the flash contents.

By "continued to run" I meant there was no crash or other such problem. When it failed, it just failed to alter the contents of Flash (and correctly reported this as an error.)

JimDon · ‎02-28-2008

It was that way for me. Either it worked 100%, or not at all.
In my tests, it seemed to work, but I will look at all again.

As I mentioned, I wrote a 512 byte sector count pattern, and when it worked the data was always 100% correct.
I did do it with the debugger, so I will see if that makes a difference.
I will devise a harder test.

I just was not sure what you meant by that statement.

I will run the tests again tonight, but as I said, unless I had the means to do an aging test, I would stick with a more conservative approach. If you read the detailed data on these technologies, you can think you programmed it bit for bit correctly, but the retention may be bad if you did not program it "correctly". The temptation, though, is to believe that if the hardware says OK, then it is good (unlike external programmers of the past, which could have incorrect algorithms).

Freescale publishes data for aging tests - so many degrees for so long equals so many years.
In the early days of EEPROMs we had equipment to do this, but not anymore.
Again, it would be interesting to see.

I'm thinking the 18 byte model in the last thread was good, where the STA ,X was in ram. I guess we kinda went nuts after that. As I said, I was delighted with the 35 byte model, as the app note was 59.

Also, if you could, post your code with a bigger font. I have to increase the font size, then things don't fit so well. Thanks, and again tony, thanks for helping out (you too peg).

Message Edited by JimDon on 2008-02-28 03:02 PM

peg · ‎03-01-2008

Hello Jim,

Previously, when experimenting with bit at a time flash programming I was tricked by a phenomenon where the flash appeared to programme but it did not survive a power down.
Perhaps you could do a power cycle after you believe it works, as part of your "harder" test.
I don't remember exactly how I was tricked, but it looked like it was programmed, but when you cycled power it read back as $FF. Wish I had some spare time to check this out for myself.

JimDon · ‎03-01-2008

Peg,
Thanks for the tip.

It seems that is not a problem, and it also seems that the Advanced Programming /Preserve Range feature does not work.

I wrote some code that would burn, but not burn again if the data was already there, so I could cycle power and go back and look. I even exited Hiwave and restarted it on the off chance it somehow cached it.
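The "burn once, then only look" idea can be sketched roughly as follows. This is a host-side model under stated assumptions: `sim_cell` stands in for a single flash byte and `burn_once` is a hypothetical helper, not the code that was actually run. On the target, the programming step would be a FlashProgramByte call, and the first run after a power cycle should take the "already there" path if the data survived.

```c
#include <stdint.h>

static uint8_t sim_cell = 0xFF;  /* stand-in for one erased flash byte */

/* Program the cell only if it does not already hold the value.
   Returns 1 if a programming step was performed, 0 if the data was
   already present (the case to look for after cycling power). */
static int burn_once(uint8_t value)
{
    if (sim_cell == value) {
        return 0;
    }
    sim_cell &= value;  /* programming can only clear bits */
    return 1;
}
```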

I can't find anything wrong with the 6 byte method on a QG4.
I did several patterns (address in address, all 0's, AA 55) and it seems to work.
It must be that the GB60 has some oddness, because as tony agrees, we are following the spec., and it does make sense that the array would not go offline until the command is actually started.

And tony did see it work ok on his QG.

peg · ‎03-02-2008

Hello all,

Well I finally got to test it out.
I can do everything from flash up to the write to FSTAT ($80) with a
QG8 Mask 3M77B
and a
GB60 Mask 3L31R
no errors indicated and flash is properly erased or programmed.
Tested with a modified version of DoOnStack with COP disabled.

P.S. Tony, my birthday was a week and a half ago, so that is ruled out.

JimDon · ‎03-02-2008

Peg,

Thanks Peg! I tested the heck out of it as well, and found no problems.

I tested on a QG8 3M77B, pre release QE128 OM11J, QG4 ZRNL 0711 ?? , AW16 5M75B, Pre release JM 60 OM36H.

The best news is that you had a GB60. I wonder what mask set tony was using?

tonyp · ‎03-03-2008

JimDon wrote:
I wonder what mask set tony was using?

GB60 mask: 1L31R whose errata doesn't mention anything related to this issue.

This came on an AXM-0313 Rev. B M68DEMO9S08GB60 board. This is the same board I had run the original tests on about two years ago.

Your collective results indicate it's possible whatever problem I'm seeing has been fixed in later revisions. If it turns out to be a specific mask issue and given that many older masks may still be out there for sale, how safe is it to use in a general-purpose library that will end up loaded here and there? No, thanks. I would like to have Freescale's OK first, even in the form of an updated App Note or updated errata that tells us which masks have this "problem."

bigmac · ‎03-04-2008

Hello,

The use of the RAM based code within this thread, and also for a previous thread detailing a 19/20 byte routine, does rely entirely on ANSI initialisation of the global variable containing the routine, and will not work if initialisation is bypassed.

I recall there have been discussions within previous threads concerning the long term reliability, and the possibility of data corruption over time. This application would seem to be an instance where the data must remain uncorrupted for an indefinite period. Particularly so for non-volatile storage just prior to power-down.

It would appear to me that, for optimum reliability, this approach should additionally provide the means to explicitly load the routine code to the variable, just prior to the use of the routine.
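One way this could look is to keep a constant master image and refresh the RAM copy immediately before each flash command. The sketch below is host-side and illustrative: `PGM_image`, `load_pgm` and `pgm_is_intact` are made-up names, and the bytes are the 6-byte routine posted earlier in the thread. On the target, `PGM_image` would live in flash and `load_pgm()` would be called just before the JSR to PGM.

```c
#include <stdint.h>

/* Master copy of the routine (const, so it would sit in flash). */
static const uint8_t PGM_image[6] = {
    0xf7,        /* sta ,X */
    0x44,        /* lsra   */
    0xf5,        /* bit ,X */
    0x27, 0xfd,  /* beq back to bit ,X */
    0x81         /* rts    */
};

/* RAM buffer that is actually executed. */
static volatile uint8_t PGM[6];

/* Refresh the RAM copy from the master image. */
static void load_pgm(void)
{
    uint8_t i;
    for (i = 0; i < sizeof PGM_image; i++) {
        PGM[i] = PGM_image[i];
    }
}

/* Returns 1 if the RAM copy still matches the master image. */
static int pgm_is_intact(void)
{
    uint8_t i;
    for (i = 0; i < sizeof PGM_image; i++) {
        if (PGM[i] != PGM_image[i]) {
            return 0;
        }
    }
    return 1;
}
```

Reloading unconditionally costs a handful of cycles per flash command, which is small next to the programming time itself, and it removes the dependence on startup initialisation.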

Regards,

Mac

JimDon · ‎03-04-2008

Well there have been discussions about many things, but I know of no actual data or evidence that points to the "C" initialization code not working properly. If this is the case, then it must be fixed as it is a fundamental assumption, and anyone noting this should submit a service request.

As for long term reliability of RAM, again there is no evidence of this, and for Freescale's sake I hope this is just an idle allegation.

If anyone has any evidence of either of these, please, I would like to hear of it. It needs to be investigated, as if there is any truth at all to these ideas, then there will be serious problems.

Now, having said that, if there are bugs in your code and you overwrite ram, then yes this could be a problem. On the other hand, presumably you will be crashing in other places as well, and you need to fix your bugs.

Of course you could copy it to RAM each time; if you do believe that the Freescale "C" startup code is bugged, or that Freescale RAM is faulty (or that your code is bugged), this would solve that problem. You will need to copy it each time, otherwise I don't see that you have done much to solve this perceived problem. If you use DoOnStack, then of course you are copying each time.

peg · ‎03-04-2008

Hi Jim,

I believe you have missed the point.
What I believe Mac is saying is not that anything is done wrongly. Rather, he is questioning whether it is good practice to rely on the contents of RAM that was written an indeterminate time before it is used,
i.e. has it been corrupted by noise or power supply fluctuations in the indefinite period between when it was written and when it is used? There is nothing to file an SR on here.
