flash reprogramming thru the SCI

zigbertsschecht · ‎10-06-2006

I am using the MC9S08GT16 and have working/stable 'flash erase', 'flash byte' and 'flash burst' routines (thank you Technoman64 for the head start).

I am assembling a network of 1600 processors that are inaccessible once installed on location. I have an RS485 link to this network for command and control.

Has anyone had experience modularizing a section of code such that it could be replaced on the fly using the SCI interface (like a boot_loader that targets only parts of the code)?

For instance, I have an ISR that is triggered by the TPM comparator. It is less than 500 bytes so it could reside on a single flash page by itself. That way a bug fix could occur by replacing the ISR page. AND if I give all 1600 processors a common "bug fix" address, I could do the repair to all of the devices at once.

I know there can be a lot of pitfalls so I am hoping someone might have a few suggestions and recommendations.

zigbertsschecht · ‎10-10-2006

This is all great advice...thanks.

Since I have a host computer I will use it to program all 1600 devices one at a time. It will be easy to implement a script to do the programming.

I think I can break up my code into small chunks. Right now I have:
main, init, drivers, isr totaling 400 bytes data, 1.8K bytes code. No module is larger than 500 bytes. I can isolate the flash interface drivers and SCI stuff then give each of the other code sections 2 pages (1K) of space. I can't imagine a bug that would require over 500 bytes to fix in this application.

As for the nuts and bolts of doing this, it seems like a bookkeeping problem. I need to set and keep track of global variable locations and function entry points so that the non-modified code can share information after re-programming.

This project is also a servo motor project but the servo loop is inactive most of the time. The only other isr's are the SCI and watchdog_timer. This brings me to a dilemma.

If I disable interrupts I lose SCI (unless I start polling...blah)
If I don’t disable interrupts then the watchdog may time out. I guess I need to pay particular attention to the timing...

Questions:
Since my SCI data parser likes to deal with ascii I would like to use the old *.hex ascii based programming file or an equivalent. Working with CodeWarrior can I generate an ascii based file?

Can I compile individual code segments or do I need to do the entire project and cut and paste the section I want?

rocco · ‎10-09-2006

Hi, Zigbert:

I have a similar application, but not nearly on the scale of yours. I have an RS485 network with up to 128 nodes, although typically there are only around 20 nodes. I need to update the firmware in-situ, over the network, since the nodes are not physically available once they are installed. The nodes provide a variety of functions, mainly servo-controllers, camera-controllers and acquisition-controllers, so the firmware varies among the nodes.

I have a bootloader that sits in the low 512 bytes of flash, can communicate on the network, and can update the rest of the firmware. It is designed to never overwrite itself, and to ensure that the reset vector always points to itself.

On reset, it performs a checksum on the entire firmware space, and enters bootstrap mode if it finds the firmware to be corrupt. That way, it can always recover from a failed firmware load. It will also enter bootstrap when commanded from the network.

I use a jump table at a fixed location to allow control to pass between the firmware and the bootloader. The checksum is also at a fixed location.
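The reset-time check described above could look roughly like the sketch below. The post does not say which checksum algorithm is used, so the 16-bit additive sum here is an assumption, and the names fw_checksum and fw_is_valid are hypothetical. On the target, the image pointer would be the start of the firmware space and the stored value would be read from the fixed checksum location.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed algorithm: 16-bit sum of all firmware bytes, wrapping modulo 65536. */
uint16_t fw_checksum(const uint8_t *image, size_t len)
{
    uint16_t sum = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        sum = (uint16_t)(sum + image[i]);
    }
    return sum;
}

/* Returns 1 if the computed sum matches the stored value, else 0.
 * A bootloader would stay in bootstrap mode when this returns 0. */
int fw_is_valid(const uint8_t *image, size_t len, uint16_t stored)
{
    return fw_checksum(image, len) == stored;
}
```

A plain sum will not catch every corruption (swapped bytes, for example), so a CRC may be a better fit if the bootloader has room for it.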

As Tony and Mac suggested, I don't update more than one node at a time. It takes just under a minute to re-flash 32k of firmware on a GP32. That is a considerable amount of time with 1600 processors. I also shut the axes down when updating firmware, and the bootloader disables all interrupts and operates in a very single-minded polled fashion.

bigmac · ‎10-08-2006

Hello zigbert,

I totally agree with tonyp about not attempting to update all slave units simultaneously, and for the same reasons that he stated. I also agree that the solution should be as "general" as possible.

Obviously, all your SCI communications and the flash modification functions would need to be held in protected flash, and not subject to any modification. It would also potentially simplify the re-programming process if all normal slave operations from unprotected flash could be suspended until the update is complete. On this basis, I would envisage commands (data packets) to "suspend" and "resume" normal slave operation.

The download of the new data might be done by embedding standard S-record format within a data packet. This is not the most efficient format, but does include starting address information, the number of data bytes within the record, has error checking capability, and uses all ASCII characters. It also provides a specific record format to indicate the end of the downloaded data. If the data to be downloaded is not large, and does not have to be done frequently, this method may be quite satisfactory.
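As an illustration of the record format, here is a minimal sketch of a parser for S1 records (16-bit address), which is the record type that covers the address range of a 16K part. The function name srec_parse_s1 and its interface are hypothetical. The checksum rule is the standard one: the byte count, address, data and checksum bytes must sum to 0xFF in the low byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert one ASCII hex digit to its value, or -1 if it is not a hex digit. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Convert two ASCII hex digits to a byte, or -1 on any invalid character.
 * The second character is only read if the first one was valid, so a
 * truncated line is rejected without reading past its terminator. */
static int hex_byte(const char *s)
{
    int hi = hex_nibble(s[0]);
    int lo = (hi < 0) ? -1 : hex_nibble(s[1]);
    return (lo < 0) ? -1 : ((hi << 4) | lo);
}

/* Parse one NUL-terminated S1 record.
 * Returns the number of data bytes stored in data[], or -1 on a format,
 * length or checksum error. *addr and data[] are only meaningful when
 * the return value is >= 0. */
int srec_parse_s1(const char *line, uint16_t *addr,
                  uint8_t *data, size_t max_data)
{
    int count, i, b;
    uint8_t sum;

    if (line[0] != 'S' || line[1] != '1') return -1;

    count = hex_byte(line + 2);          /* address + data + checksum bytes */
    if (count < 3) return -1;            /* needs 2 address bytes + checksum */
    if ((size_t)(count - 3) > max_data) return -1;

    sum = (uint8_t)count;
    for (i = 0; i < count; i++) {
        b = hex_byte(line + 4 + 2 * i);
        if (b < 0) return -1;
        sum = (uint8_t)(sum + b);
        if (i == 0)             *addr = (uint16_t)(b << 8);
        else if (i == 1)        *addr = (uint16_t)(*addr | b);
        else if (i < count - 1) data[i - 2] = (uint8_t)b;
    }
    return (sum == 0xFF) ? count - 3 : -1;
}
```

For example, the record S1078000DEADBEEF40 carries four data bytes for address 0x8000. A slave that gets -1 back would ask the host to re-send that packet.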

In practice, the S-record might cater for 32 bits at a time, giving a suitable packet size for re-send, when necessary. The slave could itself determine whether the flash page needed to be erased, or had already been erased, as each S-record was received.
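One way the slave might decide whether an erase is needed is a blank check on the page that the record's address falls in. The sketch below assumes the 512-byte page size implied earlier in the thread ("2 pages (1K)"). The helper names are hypothetical, and on the target the pointer passed in would be the page's address in flash.

```c
#include <stddef.h>
#include <stdint.h>

#define FLASH_PAGE_SIZE 512u   /* assumed page size for this part */

/* Base address of the flash page that contains addr. */
uint16_t flash_page_base(uint16_t addr)
{
    return (uint16_t)(addr & (uint16_t)~(FLASH_PAGE_SIZE - 1u));
}

/* Returns 1 if every byte of the page reads 0xFF (erased), else 0. */
int flash_page_is_blank(const uint8_t *page)
{
    size_t i;

    for (i = 0; i < FLASH_PAGE_SIZE; i++) {
        if (page[i] != 0xFF) {
            return 0;
        }
    }
    return 1;
}
```

If the page is not blank, the slave would erase it before programming the first record that lands in it, and skip the erase for later records in the same page.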

Regards,

Mac

tonyp · ‎10-07-2006

zigbert wrote:
Has anyone had experience modularizing a section of code such that it could be replaced on the fly using the SCI interface (like a boot_loader that targets only parts of the code)?

For instance, I have an ISR that is triggered by the TPM comparator. It is less than 500 bytes so it could reside on a single flash page by itself. That way a bug fix could occur by replacing the ISR page. AND if I give all 1600 processors a common "bug fix" address, I could do the repair to all of the devices at once.

There's really no end to the possibilities one can come up with, and here are some:

Don't count on any scheme based on the current ISR's size. A possible future bug fix could make the code larger than a Flash page, and your original strategy might fail.

Are you interested in being able to update only the specific ISR, or any part of your code? If any part of the code is a candidate, it won't be easy to break your code into about 32 distinct sections (your MCU's max Flash pages, I assume) to support your current strategy.

Is this going to be a 'live' update (meaning, while the rest of the code is executing)? In this case, if you want to replace a single ISR for example, you can have a RAM pointer to it, and always make calls to it thru that pointer. Once a possible update is finished, you only need to write the new entry address to that pointer (with an atomic instruction, such as STHX), and your code will continue running with the new version. This way, the updated code can be located anywhere in Flash, not just in a fixed page, and be any size (within availability of unused Flash). The original code space now becomes available again for another possible update. This method means you need as many pointers as the sections of code which are predefined as upgradeable.
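A minimal sketch of the RAM-pointer idea in portable C follows. All names are hypothetical, and the compiler-specific interrupt declaration for the stub is left out. The stub is the only routine the vector ever points to, and it would sit in flash that is never updated.

```c
#include <stddef.h>

typedef void (*isr_handler_t)(void);

/* Hypothetical RAM-resident pointer to the current TPM handler.
 * RAM contents are lost at reset, so startup code must set it again. */
static volatile isr_handler_t tpm_handler = NULL;

/* Counters so the effect of each handler version can be observed. */
volatile unsigned v1_calls = 0;
volatile unsigned v2_calls = 0;

/* Two versions of the handler; v2 stands in for the bug-fixed code
 * that was written to a different flash location. */
void tpm_isr_v1(void) { v1_calls++; }
void tpm_isr_v2(void) { v2_calls++; }

/* Switch to a new handler. On the target this store has to be atomic
 * with respect to the interrupt: either a single 16-bit store such as
 * STHX, or a store made with interrupts masked. */
void tpm_handler_install(isr_handler_t h)
{
    tpm_handler = h;
}

/* Fixed stub that the interrupt vector points to. */
void tpm_isr_stub(void)
{
    isr_handler_t h = tpm_handler;   /* read the pointer once */

    if (h != NULL) {
        h();
    }
}
```

After the new code has been programmed and verified, a call such as tpm_handler_install(tpm_isr_v2) makes the next interrupt run the new version. Since the pointer lives in RAM, the chosen entry address also has to be recorded somewhere non-volatile so that it survives a reset.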

By far, the hardest part is doing a concurrent update to all modules. Sending code to all of them is certainly easy, but guaranteeing that all modules will receive it without errors and with no need for 'packet' retransmits is a problem. It's like UDP broadcasts. You can send the packets but there are no guarantees they will be received by all devices. You need something like TCP, but that means one at a time, if you don't want to end up with 1600 'sockets' open at once.

Assuming they do not all need to be synchronized with the new code, but you're only trying to simplify the process by sending the update once and have all devices get updated, I'd go a different route. Send the update to device 1 (using some protocol which, like TCP, 'guarantees' [timeouts notwithstanding] the successful delivery of your data), and once that device is done updating itself, it will be responsible for sending the update to device 2 (again using the same protocol), and so on. The update packets should be designated as such so the device will know to pass them on, rather than consume them and stop. The cascade might take a while but you'll only worry about sending the update once. If there is a failure at some node (e.g., comms timeout), you can have it report this back to the main PC (where the update began), so you can try again (once the comms problem is fixed) starting with that node's address. I've actually done this scheme with an HC11 based network, and it worked very well (under my RTOS).
