If you re-read section 29.4.12.8 (and also look at the error codes that the command can return) you should find that all the information is actually there.
On the other hand it always helps to experiment - the sequences can be performed with any debugger (just type the sequences into the FTFL registers and watch what happens) so if anything is unclear it can be verified in practice in a few minutes.
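When experimenting like this, most of what you "watch" is the FSTAT register after the command has been launched. As a rough aid, the following sketch decodes it; the bit positions are the ones I know from the Kinetis flash module documentation (check them against your part's manual), and the helper name fnCheckFlashStatus() is purely hypothetical and not part of the uTasker driver.

```c
#include <stdint.h>

/* FSTAT flag bits (assumed from the Kinetis FTFL documentation - verify for your part) */
#define FSTAT_CCIF     0x80u /* command complete interrupt flag - set when the command has finished */
#define FSTAT_RDCOLERR 0x40u /* read collision - flash was read while a command was running */
#define FSTAT_ACCERR   0x20u /* access error - illegal command, address or alignment */
#define FSTAT_FPVIOL   0x10u /* protection violation - target area is protected */
#define FSTAT_MGSTAT0  0x01u /* error detected during command execution (e.g. verify failed) */

#define FSTAT_ERROR_MASK (FSTAT_RDCOLERR | FSTAT_ACCERR | FSTAT_FPVIOL | FSTAT_MGSTAT0)

/* hypothetical helper: interpret a value read from FSTAT                */
/* returns -1 if the command is still busy, 0 if it completed cleanly,   */
/* otherwise the mask of error flags that are set                        */
static int fnCheckFlashStatus(uint8_t ucFSTAT)
{
    if ((ucFSTAT & FSTAT_CCIF) == 0) {
        return -1; /* command still in progress */
    }
    return (int)(ucFSTAT & FSTAT_ERROR_MASK);
}
```

For example, a value of 0xA0 read back after a program command means that the command has terminated with an access error, which is what a badly aligned address typically gives.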
In the uTasker project there is a Flash driver that automatically switches between phrase and section programming depending on flash alignment, boundary locations etc. (when allowed - see #if defined USE_SECTION_PROGRAMMING below).
There is however a disadvantage with section programming that I didn't mention in the previous post. Programming 1k using phrase writes may take about twice as long as a section write, but it is split into 128 write operations lasting typically 70us each. Usually interrupts are blocked during each phrase write operation since the Flash can't be used for code execution (Flash program routines always execute from RAM). This means that interrupts can be handled between the individual phrase writes and so it is still possible to receive data streams in parallel without overruns (for example a 57600 Baud UART stream - assuming interrupt driven and not DMA driven). When programming 1k with the section command, interrupts will be blocked for typically 5ms and so communication streams using interrupts that need to be received in parallel with flash writing will start to fail (overruns) even at low data rates, unless additional interrupt handling is also run from RAM.
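To put rough numbers on this, here is a small sketch of the arithmetic. It assumes 8N1 framing (10 bits per character) and a receiver that overruns roughly when interrupts stay blocked for a full character time or longer; the function names are hypothetical and only serve the illustration.

```c
#include <stdint.h>

/* time for one UART character in microseconds, assuming 8N1 framing (10 bits per character) */
static uint32_t fnCharTime_us(uint32_t ulBaud)
{
    return ((10u * 1000000u) / ulBaud);
}

/* total time spent programming ulBytes using individual phrase writes */
static uint32_t fnPhraseTotal_us(uint32_t ulBytes, uint32_t ulPhraseSize, uint32_t ulPhraseTime_us)
{
    return ((ulBytes / ulPhraseSize) * ulPhraseTime_us);
}

/* non-zero if an interrupt driven receiver is at risk of an overrun when       */
/* interrupts are blocked for ulBlocked_us (simplified: one character of slack) */
static int fnOverrunRisk(uint32_t ulBaud, uint32_t ulBlocked_us)
{
    return (ulBlocked_us >= fnCharTime_us(ulBaud));
}
```

With the typical values above, 1k as 128 phrases of 8 bytes gives 128 x 70us = 8.96ms in total, but each blocked period is only 70us against a character time of about 173us at 57600 Baud. A single 5ms section write exceeds the character time at anything faster than about 2000 Baud.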
As reference, the write routine is shown below. The actual command interface is not shown: it is in fnFlashNow(), which controls blocking interrupts and execution from RAM where needed, and it also simulates the internal Flash operation to allow incorrect use to be detected before moving to target level debugging. The routine allows writing any size of data (1 byte to max. possible) and also works on any K, KE or KL part (it adapts itself to write sizes, flash granularity etc. - note that some K parts can write up to 2k of data in a single section write command). It has a local phrase buffer allowing users to write part phrases so that the flash programming rules are not 'fully' exposed at the higher level.
static int fnWriteInternalFlash(unsigned long ulFlashAddress, unsigned char *ucData, MAX_FILE_LENGTH Length)
{
    static unsigned char *ptrOpenBuffer = 0;
    unsigned char *ptrFlashBuffer;
    unsigned long ulBufferOffset;
    MAX_FILE_LENGTH BufferCopyLength;
    if (ucData == 0) { // close an open buffer
        ulBufferOffset = ((CAST_POINTER_ARITHMETIC)ptrOpenBuffer & (FLASH_ROW_SIZE - 1));
        if (ulBufferOffset == 0) {
            return 0; // no open buffer so nothing to do
        }
        ulBufferOffset = FLASH_ROW_SIZE; // cause the open buffer to be saved without copying any input data
        ptrOpenBuffer = (unsigned char *)((CAST_POINTER_ARITHMETIC)ptrOpenBuffer & ~(FLASH_ROW_SIZE - 1));
    }
    else {
        ptrOpenBuffer = (unsigned char *)(ulFlashAddress & ~(FLASH_ROW_SIZE - 1)); // set to start of long word or phrase that the address is in
        ulBufferOffset = (ulFlashAddress & (FLASH_ROW_SIZE - 1)); // offset in the long word or phrase
    }
    do { // handle each byte to be programmed
#if defined USE_SECTION_PROGRAMMING
        if (ulBufferOffset == 0) { // if the data start is aligned there is a possibility of using accelerated section programming
            MAX_FILE_LENGTH SectionLength = (Length & ~(FLASH_ROW_SIZE - 1)); // round down to full long words/phrases
            if (SectionLength > (FLASH_ROW_SIZE * 2)) { // from 2 long words or phrases
                unsigned char *ptrEnd = (unsigned char *)((CAST_POINTER_ARITHMETIC)ptrOpenBuffer & ~(FLASH_GRANULARITY - 1));
                ptrEnd += FLASH_GRANULARITY; // pointer to the next flash sector
                if ((ptrOpenBuffer + SectionLength) >= ptrEnd) { // end of write is past a sector boundary
                    SectionLength = (ptrEnd - ptrOpenBuffer); // limit
                }
                if (SectionLength > (FLASH_ROW_SIZE * 2)) { // if still at least 2 long words or phrases to make section write worthwhile
                    if (SectionLength > FLEXRAM_MAX_SECTION_COPY_SIZE) {
                        SectionLength = FLEXRAM_MAX_SECTION_COPY_SIZE;
                    }
                    uMemcpy((void *)FLEXRAM_START_ADDRESS, ucData, SectionLength); // copy the data to the accelerator RAM
                    ulFlashRow[0] = (SectionLength/FLASH_ROW_SIZE); // the number of long words/phrases to be written to the section
                    if ((fnFlashNow(FCMD_PROGRAM_SECTOR, (unsigned long *)ptrOpenBuffer, &ulFlashRow[0])) != 0) { // write section
                        return 1; // error
                    }
                    ptrOpenBuffer += SectionLength;
                    Length -= SectionLength;
                    ucData += SectionLength;
                    if (Length == 0) {
                        return 0;
                    }
                    continue;
                }
            }
        }
#endif
        BufferCopyLength = (FLASH_ROW_SIZE - ulBufferOffset); // remaining buffer space to end of present backup buffer
        if (BufferCopyLength > Length) { // limit in case the amount of bytes to be programmed is less than the long word or phrase involved
            BufferCopyLength = Length;
        }
        ptrFlashBuffer = (unsigned char *)ulFlashRow + ulBufferOffset; // pointer set in FLASH row backup buffer
        uMemcpy(ptrFlashBuffer, ucData, BufferCopyLength); // copy the input data to the FLASH row backup buffer
        ucData += BufferCopyLength;
        Length -= BufferCopyLength; // remaining data length
        ptrFlashBuffer += BufferCopyLength; // next copy location
        if (ptrFlashBuffer >= ((unsigned char *)ulFlashRow + FLASH_ROW_SIZE)) { // a complete backup buffer is ready to be copied to FLASH
            ptrFlashBuffer = (unsigned char *)ulFlashRow; // set pointer to start of FLASH row backup buffer
            ulBufferOffset = 0;
            if ((fnFlashNow(FCMD_PROGRAM, (unsigned long *)ptrOpenBuffer, &ulFlashRow[0])) != 0) { // write long word/phrase
                return 1; // error
            }
            ptrOpenBuffer += FLASH_ROW_SIZE;
            uMemset(ulFlashRow, 0xff, FLASH_ROW_SIZE); // flush the intermediate buffer
        }
        else { // incomplete buffer collected
            ptrOpenBuffer += BufferCopyLength;
        }
    } while (Length != 0);
    return 0;
}
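
To get a feel for how the routine above divides up a request, the following host-side sketch mirrors only its decision logic and counts the operations instead of programming anything. It is a simplified illustration and not part of the driver: the constants (8 byte phrase, 2k sector, 1k maximum section copy) are assumed values for a typical K part, and WRITE_PLAN and fnPlanWrite() are hypothetical names.

```c
#define ROW_SIZE     8ul    /* assumed phrase size (FLASH_ROW_SIZE) */
#define SECTOR_SIZE  2048ul /* assumed sector size (FLASH_GRANULARITY) */
#define MAX_SECTION  1024ul /* assumed FLEXRAM_MAX_SECTION_COPY_SIZE */

typedef struct {
    unsigned long ulRowWrites;     /* number of single phrase program commands */
    unsigned long ulSectionWrites; /* number of section program commands */
    unsigned long ulPendingBytes;  /* bytes left in the open buffer until it is closed */
} WRITE_PLAN;

/* hypothetical planner: count the commands needed to write ulLength bytes to ulAddress */
static WRITE_PLAN fnPlanWrite(unsigned long ulAddress, unsigned long ulLength)
{
    WRITE_PLAN plan = {0, 0, 0};
    while (ulLength != 0) {
        unsigned long ulOffset = (ulAddress & (ROW_SIZE - 1)); /* offset in the phrase */
        unsigned long ulCopy;
        if (ulOffset == 0) { /* aligned start so a section write may be possible */
            unsigned long ulSection = (ulLength & ~(ROW_SIZE - 1)); /* full phrases only */
            unsigned long ulSectorEnd = ((ulAddress & ~(SECTOR_SIZE - 1)) + SECTOR_SIZE);
            if ((ulAddress + ulSection) > ulSectorEnd) {
                ulSection = (ulSectorEnd - ulAddress); /* never cross a sector boundary */
            }
            if (ulSection > MAX_SECTION) {
                ulSection = MAX_SECTION; /* limited by the accelerator RAM size */
            }
            if (ulSection > (ROW_SIZE * 2)) { /* more than 2 phrases makes it worthwhile */
                plan.ulSectionWrites++;
                ulAddress += ulSection;
                ulLength -= ulSection;
                continue;
            }
        }
        ulCopy = (ROW_SIZE - ulOffset); /* space remaining in the present phrase */
        if (ulCopy > ulLength) {
            ulCopy = ulLength;
        }
        if ((ulOffset + ulCopy) == ROW_SIZE) { /* phrase completed so it is programmed */
            plan.ulRowWrites++;
            plan.ulPendingBytes = 0;
        }
        else { /* incomplete phrase stays in the buffer */
            plan.ulPendingBytes = ulCopy;
        }
        ulAddress += ulCopy;
        ulLength -= ulCopy;
    }
    return plan;
}
```

Under these assumptions an aligned 1k write at 0x1000 becomes a single section command, whereas 30 bytes at 0x1003 become one phrase write (to reach alignment), one section write of 24 bytes and 1 byte left in the open buffer, which is only committed when the buffer is closed by calling the write routine with ucData set to 0.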