K70 Internal flash writing not reliable with PEx IFsh1 driver & MQX

theobarker
Contributor III

Summary:

When executing from internal flash Block 1 and writing 1KiB sectors of Block 0 flash for an application update, using the PEx-generated IFsh1 driver function IFsh1_SetBlockFlash(), data isn't always written, or ERR_SPEED is returned.

Processor: K70FN1M0VMJ12

Toolset: CodeWarrior 10.6

Processor Expert version: 10.6.3.RT6_b1446-0504

MQX version: 4.1

Binary image being written - size: ~479KiB

Start address: 0x00008000

Component Configuration in Processor Expert:

Write method: Destructive write

Interrupt Service: disabled

Wait in RAM: yes

Virtual Page: disabled

Initialization: Events enabled in init: yes; Wait enabled in init: yes

CPU clock/speed selection: High speed mode - enabled, others disabled

All sector writes are IntFlashLdd1_ERASABLE_UNIT_SIZE using IFsh1_SetBlockFlash(), except for the last one (remainder).

Application execution from image starting address: 0x00080000 (Block 1)

Tasks: 6 ready or blocked

From looking at the code in IntFlashLdd1.c, I see that SafeRoutineCaller() calls _int_disable() before calling the SafeRoutine and _int_enable() afterwards. Thus there should not be a context switch during the actual write operation.

When I added routines to verify each sector immediately after writing it to flash, many bytes remained in the erased state (0xFF). When I added a routine to calculate the MD5 sum of the just-written image using the mmCAU MQX library, I ended up with bytes in the last sector remaining in the erased state. Obviously, the MD5 sum then does not match the expected value.

When not attempting to verify or calculate the MD5 sum, I occasionally get a return code of 0x01 from IFsh1_SetBlockFlash(). According to the function header, 0x01 is ERR_SPEED. (BTW, the error description in the source header makes no sense here, i.e. it is not applicable.) Tracing the source of the bit analytically, it would have to be FTFE_FSTAT.MGSTAT0. The K70P256M150SF3RM reference manual, section 30.34.1, describes it as follows: "The MGSTAT0 status flag is set if an error is detected during execution of an FTFE command or during the flash reset sequence."
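To make the bit tracing above concrete, here is a minimal sketch of decoding a raw FTFE_FSTAT value on the host. The bit masks follow the FSTAT register layout in the reference manual; the helper names fstat_error_bits() and fstat_error_name() are hypothetical and are not part of the PEx driver.

```c
#include <stdint.h>

/* FTFE_FSTAT bit masks (register layout per the Kinetis reference manual). */
#define FSTAT_CCIF     0x80u  /* command complete interrupt flag */
#define FSTAT_RDCOLERR 0x40u  /* read collision error */
#define FSTAT_ACCERR   0x20u  /* access error */
#define FSTAT_FPVIOL   0x10u  /* protection violation */
#define FSTAT_MGSTAT0  0x01u  /* error detected during command execution */

/* Hypothetical helper: keep only the error bits of a raw FSTAT value. */
uint8_t fstat_error_bits(uint8_t fstat)
{
    return (uint8_t)(fstat & (FSTAT_RDCOLERR | FSTAT_ACCERR |
                              FSTAT_FPVIOL | FSTAT_MGSTAT0));
}

/* Hypothetical helper: name the highest error bit that is set, or "none". */
const char *fstat_error_name(uint8_t fstat)
{
    if (fstat & FSTAT_RDCOLERR) return "RDCOLERR";
    if (fstat & FSTAT_ACCERR)   return "ACCERR";
    if (fstat & FSTAT_FPVIOL)   return "FPVIOL";
    if (fstat & FSTAT_MGSTAT0)  return "MGSTAT0";
    return "none";
}
```

With this masking, a raw value of 0x81 (CCIF plus MGSTAT0) reduces to 0x01, which would be consistent with the 0x01 return code described above if the driver passes the masked error bits straight through. That pass-through is an assumption, not something confirmed from the generated source.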

None of the other error bits are set.

HELP!

1 Solution
theobarker
Contributor III

So this has been a failure.

From what I can tell, the FTFE_FSTAT bit is set by the FTFE before the flash has quiesced into its final (erased or written) state, and therefore it cannot be trusted. As a result, the code generated by PEx cannot be relied on to hold off returning until the content of flash has quiesced.

Therefore, my workaround has been the following:

On starting the operation:

  • Erase the first sector (4KiB) to be written
  • Parse the content that will be written to the first 1KiB block in that sector
  • Write the first sector
  • Erase the second sector 

With each subsequent 1KiB block:

  • If ((DestinationAddress & IntFlashLdd1_ERASABLE_UNIT_MASK) == 0)
    • Call EraseSector(DestinationAddress + IntFlashLdd1_ERASABLE_UNIT_SIZE)
    • This erases sector N+1 after the first block of sector N has been written
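
For illustration, the erase-ahead scheme in the two lists above could look roughly like the sketch below. It runs against a RAM array instead of real flash: sim_erase_sector() and sim_write() are hypothetical stand-ins for the driver's erase and write calls, the 4096-byte sector size and the mask being size minus one are assumptions, and the check that skips erasing past the end of the image is an added assumption rather than part of the steps above.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 4096u                 /* assumed erasable unit size */
#define SECTOR_MASK (SECTOR_SIZE - 1u)    /* assumed: mask = size - 1 */
#define BLOCK_SIZE  1024u                 /* size of each write */
#define SIM_SIZE    (4u * SECTOR_SIZE)

uint8_t  sim_flash[SIM_SIZE];             /* RAM stand-in for flash */
unsigned sim_erase_count;

/* Stand-in for a sector erase: the whole sector goes to 0xFF. */
void sim_erase_sector(uint32_t addr)
{
    memset(&sim_flash[addr & ~SECTOR_MASK], 0xFF, SECTOR_SIZE);
    sim_erase_count++;
}

/* Stand-in for programming: bits can only go from 1 to 0. */
void sim_write(uint32_t addr, const uint8_t *src, uint32_t len)
{
    for (uint32_t i = 0; i < len; i++)
        sim_flash[addr + i] &= src[i];
}

/* Write an image in 1 KiB blocks; after the first block of sector N
 * has been written, erase sector N+1 if the image still needs it.
 * Assumes 'start' is sector aligned. */
void write_image_erase_ahead(uint32_t start, const uint8_t *img, uint32_t len)
{
    sim_erase_sector(start);
    for (uint32_t off = 0; off < len; off += BLOCK_SIZE) {
        uint32_t addr = start + off;
        uint32_t n = (len - off < BLOCK_SIZE) ? (len - off) : BLOCK_SIZE;
        sim_write(addr, img + off, n);
        if ((addr & SECTOR_MASK) == 0u && addr + SECTOR_SIZE < start + len)
            sim_erase_sector(addr + SECTOR_SIZE);
    }
}
```

The point of the ordering is that every erase has at least three block writes' worth of time to settle before the first write into that sector.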

At the end of the writing process:

  • _time_delay(500)
  • Call the Kinetis CAU cau_md5_hash_n() to calculate the MD5 sum of the image
  • Check against build server calculated MD5
  • If they don't match, try again at least 5 times
  • If they still don't match, declare failure
  • If they do match, write the binary MD5 sum to the next phrase boundary in flash after the image

Writing the MD5 sum after the image proved another challenge. The image tended to end on a longword boundary, mid-phrase. In spite of what the PEx code would lead you to believe, you cannot reliably write mid-phrase.
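
As a small worked example of the alignment involved: with 8-byte phrases, the address for the MD5 sum can be found by rounding the first free address after the image up to a multiple of 8. The helper name next_phrase_boundary() is hypothetical, and the addresses in the note below are made-up examples.

```c
#include <stdint.h>

#define PHRASE_SIZE 8u   /* K70 program flash is written in 8-byte phrases */

/* Round 'addr' (the first free byte after the image) up to the next
 * phrase boundary. An already aligned address is returned unchanged. */
uint32_t next_phrase_boundary(uint32_t addr)
{
    return (addr + (PHRASE_SIZE - 1u)) & ~(uint32_t)(PHRASE_SIZE - 1u);
}
```

An image ending on a longword boundary mid-phrase, say with the first free byte at 0x0007FC04, would get its MD5 sum at 0x0007FC08, leaving the four bytes in between untouched.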

Hypothesis that could easily be falsified:

  • Kinetis only tested their PEx code with pre-erased flash.
  • When testing the destructive-write code, they were writing (essentially) the same content as what the flash already contained

Please falsify this hypothesis with repeatable data.


13 Replies
mjbcswitzerland
Specialist V

Hi

I don't know the routines that you are using, but is it possible that they only write to Flash when you write 8-byte blocks (due to the K70's requirement for phrase programming), so that if you write blocks of less than 8 bytes, or a quantity of bytes not divisible by 8, the phrase write (or final phrase write) will not be committed?

For comparison, the uTasker Flash driver will buffer incomplete phrases until it is possible to commit them when further (linear) data is ready, or when the user expressly demands that (incomplete) data is committed (filling in with 0xff for each uncollected byte).
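
The buffering idea described above can be sketched as follows. This is not the uTasker implementation, just a minimal host-side illustration of the technique: the phrase_writer_t type and the pw_* functions are hypothetical, and the memcpy() in pw_commit() stands in for a real program-phrase command.

```c
#include <stdint.h>
#include <string.h>

#define PHRASE_SIZE 8u

typedef struct {
    uint8_t *flash;                 /* stand-in for the programming target */
    uint32_t next_addr;             /* phrase-aligned offset of next commit */
    uint8_t  pending[PHRASE_SIZE];  /* bytes collected for the next phrase */
    uint32_t fill;                  /* number of valid bytes in 'pending' */
} phrase_writer_t;

void pw_init(phrase_writer_t *pw, uint8_t *flash, uint32_t start)
{
    pw->flash = flash;
    pw->next_addr = start;
    pw->fill = 0;
}

/* Commit one complete phrase (a real driver would program it here). */
void pw_commit(phrase_writer_t *pw)
{
    memcpy(&pw->flash[pw->next_addr], pw->pending, PHRASE_SIZE);
    pw->next_addr += PHRASE_SIZE;
    pw->fill = 0;
}

/* Accept linear data of any length; only whole phrases are committed. */
void pw_write(phrase_writer_t *pw, const uint8_t *data, uint32_t len)
{
    for (uint32_t i = 0; i < len; i++) {
        pw->pending[pw->fill++] = data[i];
        if (pw->fill == PHRASE_SIZE)
            pw_commit(pw);
    }
}

/* Force out an incomplete phrase, padding the missing bytes with 0xFF. */
void pw_flush(phrase_writer_t *pw)
{
    if (pw->fill == 0u)
        return;
    memset(&pw->pending[pw->fill], 0xFF, PHRASE_SIZE - pw->fill);
    pw_commit(pw);
}
```

Writing 5 bytes and then 6 bytes through pw_write() commits one phrase and holds 3 bytes back; pw_flush() then commits those 3 bytes followed by five 0xFF pad bytes.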

Regards

Mark

Kinetis for professionals: http://www.uTasker.com

theobarker
Contributor III

As I stated in the original posting, the writes are all 1 KiB = IntFlashLdd1_ERASABLE_UNIT_SIZE, except for the last one. The failures almost always occur long before the end. So no, it is not a size < phrase problem.

EDIT: IntFlashLdd1_ERASABLE_UNIT_SIZE = 4096 for the K70.

mjbcswitzerland
Specialist V

Hi

I never experienced any Flash programming issues on a K70, so it sounds as though, since you confirm it is not a use-case error, you have encountered a task switching issue or a driver bug in the generated code.

I would read back each phrase after it has been written to catch the place where it happens, in order to get a first hold of where to start debugging and correcting.
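
A read-back check of this kind might look like the sketch below, which compares phrase by phrase and reports where the first difference is. The function name first_bad_phrase() is hypothetical; on the target, the 'flash' pointer would point at the memory-mapped flash, while here it is just a buffer.

```c
#include <stdint.h>
#include <string.h>

#define PHRASE_SIZE 8u

/* Compare 'len' bytes phrase by phrase. Returns the byte offset of the
 * first phrase that differs from the expected data, or -1 if all match. */
int32_t first_bad_phrase(const uint8_t *expected, const uint8_t *flash,
                         uint32_t len)
{
    for (uint32_t off = 0; off < len; off += PHRASE_SIZE) {
        uint32_t n = (len - off < PHRASE_SIZE) ? (len - off) : PHRASE_SIZE;
        if (memcmp(expected + off, flash + off, n) != 0)
            return (int32_t)off;
    }
    return -1;
}
```

Calling it right after each write returns, and logging the returned offset, gives a first hold on whether failures cluster at particular addresses or alignments.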

ERR_SPEED sounds like something left over from code taken maybe from a ColdFire project, where the Flash programming settings needed to be correctly set. If you get an error you can usually get quite detailed information in the status registers pointing to why it failed. Since everything is self-timed, it can only really be due to setting up bad programming addresses, trying to program protected sectors, or trying to program phrases which are already programmed (i.e. not erased beforehand).

Regards

Mark

theobarker
Contributor III

Thanks for the replies, Mark. You might want to take a look at the code that Processor Expert generates... That said, you bring up a good point. I did write a sector-checking routine that attempted to verify each sector after writing it. It showed lots of failures very quickly, sometimes in the first sector. The weird thing was this: the routine copies internal flash back to RAM immediately after IFsh1_SetBlockFlash() returns, and when I hit a breakpoint and compared the RAM copy, the flash contents and the original, none of them matched. I.e. during the delay for the breakpoint and the JTAG memory viewing time, some but not all of the flash would quiesce to the written values.

NOTE: a critical aspect of this problem is the combination of NXP/Freescale-provided Processor Expert and MQX RTOS. I do not have this problem using the same code on a bare-metal Processor Expert project.

I've read Erich Styger's blog post "Serial Bootloader for the Freedom Board with Processor Expert" (https://mcuoneclipse.com/2013/04/28/serial-bootloader-for-the-freedom-board-with-processor-expert/) along with many others. I had hoped BlackNight, Jorge Gonzalez or other NXP/Freescale experts would respond to this.

mjbcswitzerland
Specialist V

Theo

If you have no problems with PE-generated code in one case, I would check whether the code generated in the other case is the same or not. I have no idea whether the generated code has any version management, or whether it changes (sometimes for the better and sometimes for the worse) depending on which exact processor type is defined or whether other components or an OS are specified. You may also find the generated code changing if you use different versions of the IDE (and its components). Basically it could be a bit hit or miss, so not necessarily advisable for professional work or real products (?)

Assuming you can confirm that the generated code is always the same and seriously managed in some way, you will be able to ascertain that any changes in behavior are likely to be due to usage or OS behavior, and you will then be able to concentrate your search there.

Regards

Mark

theobarker
Contributor III

Mark, you may be on to something here.

When using PEx generated code with MQX (which fails to write):

  • PRIMASK is 0
  • _int_disable() is called before the SafeRoutine() RAM execution loop
  • _int_disable() only sets BASEPRI to 0x0040

While the code generated by PEx for bare metal projects (which work correctly):

  • Calls EnterCritical()
  • Which sets BASEPRI to 0x0000
  • and executes CPSID f
  • Which sets FAULTMASK to 0x00000001

However, modifying the PEx SafeRoutineCaller code on the MQX project to use EnterCritical() instead of _int_disable() still fails occasionally.
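
To spell out why the two schemes above differ, here is a small host-side model of the Cortex-M4 masking rules for a configurable-priority interrupt. The function irq_is_masked() is a hypothetical illustration, not an MQX or PEx API. It encodes the architectural rules: PRIMASK or FAULTMASK set blocks every configurable-priority interrupt, while a non-zero BASEPRI blocks only interrupts whose priority value is numerically greater than or equal to BASEPRI.

```c
#include <stdint.h>
#include <stdbool.h>

/* Model of Cortex-M4 exception masking for an interrupt with a
 * configurable priority (lower value means higher priority). */
bool irq_is_masked(uint8_t irq_prio, uint8_t basepri,
                   bool primask, bool faultmask)
{
    if (primask || faultmask)
        return true;               /* all configurable interrupts blocked */
    if (basepri != 0u && irq_prio >= basepri)
        return true;               /* same or lower priority than BASEPRI */
    return false;                  /* interrupt can still pre-empt */
}
```

Under this model, BASEPRI = 0x40 with PRIMASK = 0 leaves any interrupt with a priority value below 0x40 able to run during the flash operation. Whether such an interrupt exists in this particular MQX project is not established here, and the occasional failures with EnterCritical() suggest that masking alone is not the whole story.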

Which leads to the question:

Has anyone at Freescale/NXP tested the PEx-generated IFsh driver with MQX, writing Block 0 while executing from Block 1?

Jorge_Gonzalez​? BlackNight​?

theobarker
Contributor III

To no avail... see above

BlackNight
NXP Employee

Hi Theo,

a few tips, in case they help:

- have a look at the stack size: are you sure there is no stack overflow? Try increasing the task stack size and the size for the MSP (main stack pointer)

- I was suspecting that MQX still runs some interrupts. Can you verify/check that upon entering the flash programming routines all interrupts are disabled (PRIMASK I bit set)?

- quickly looking through the code, there is no place where it would assign the ERR_SPEED return code. So this makes me think that you have some corrupted stack or similar.

PS: I had strange flash programming behavior on a K22, where the problem occurred if the base address and the number of blocks programmed were not aligned. I had to align them to an 8 KByte block. The strange thing was that it worked some of the time, but not always. Always programming 8 KByte blocks with 8k alignment solved the problem on my side.

I hope this helps,

Erich

theobarker
Contributor III

I wrote a new macro that sets PRIMASK to 1.

I call it before SafeRoutine is called (in place of _int_disable()).

I verified that it actually does set PRIMASK to 1.

I still get the LDD_FLASH_MULTIPLE_WRITE_ERROR condition.

theobarker
Contributor III

LuisCasado‌? danielchen@fsl‌? jia-ding‌? Kan‌? Any ideas based on the above and below information? 

theobarker
Contributor III

BTW, it appears that the MQX _int_disable() only sets the BASEPRI register to 0x00000040 on the K70 CM4. PRIMASK remains 0.

Also, SP_MAIN has 304 of 352 bytes remaining.

theobarker
Contributor III

Jorge_Gonzalez​? BlackNight​? Any ideas given the further detail?
