Hello hliu,
The complexity of the process will depend on how many times, within the life of the product, the power is removed and re-applied. The flash memory has a limited number of erase and write cycles before "wear out" occurs. The simplest case is to erase a 64-byte block, and then write to the same address within the block, each time the value needs to be saved.
As the number of times that the parameter needs to be saved increases, the single byte value will need to be programmed to successive locations within the block. Whenever the block becomes full, the block would be erased, and the process started again. This spreads the wear over the whole block, since the block is erased only once for every 64 saves.
The MCU you are using contains erase and programming routines within ROM to simplify the process. The MCU datasheet mentions that their use is described within application note AN2635. The datasheet also references some other application notes that provide for EEPROM emulation with spread wear. As these represent a general approach, the associated code will be quite complex.
If your project should require the wear spreading method, it should be possible to simplify this process, since the active byte values used can most likely avoid the unprogrammed value ($FF) for the flash. The most recent active value within the block can then be determined very easily: it is the last byte before the first location that still reads $FF, or the final byte of the block if the block is full.
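As a rough illustration of the simplified method, here is a minimal C sketch. It is not the code from the application notes, and it does not call the actual ROM routines. The block is simulated by a RAM array so that the logic can be tried on a PC, and the names flash_erase_block, flash_program_byte, save_value and load_value are made up for this example. On the real MCU the two flash_ functions would be replaced by calls to the ROM erase and program routines.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 64u     /* erase block size */
#define ERASED     0xFFu   /* unprogrammed flash value, never used as data */

/* RAM stand-in for one flash block (assumption: simulation only). */
static uint8_t  block[BLOCK_SIZE];
static unsigned erase_count;          /* counts erases, to show the wear */

/* Hypothetical placeholder for the ROM block erase routine. */
static void flash_erase_block(void)
{
    memset(block, ERASED, BLOCK_SIZE);
    erase_count++;
}

/* Hypothetical placeholder for the ROM byte program routine. */
static void flash_program_byte(unsigned offset, uint8_t value)
{
    block[offset] &= value;           /* programming can only clear bits */
}

/* Offset of the first unprogrammed byte, or BLOCK_SIZE if the block is full. */
static unsigned first_free(void)
{
    unsigned i = 0;
    while (i < BLOCK_SIZE && block[i] != ERASED) {
        i++;
    }
    return i;
}

/* Fetch the most recently saved value. Returns 0 if nothing is stored yet. */
static int load_value(uint8_t *value)
{
    unsigned i = first_free();
    if (i == 0) {
        return 0;                     /* block is fully erased */
    }
    *value = block[i - 1];            /* last byte before the first $FF */
    return 1;
}

/* Save a value to the next free location, erasing only when the block is full.
   Returns 0 if the value is $FF, which is reserved as the "empty" marker. */
static int save_value(uint8_t value)
{
    unsigned i;
    if (value == ERASED) {
        return 0;
    }
    i = first_free();
    if (i == BLOCK_SIZE) {
        flash_erase_block();
        i = 0;
    }
    flash_program_byte(i, value);
    return 1;
}
```

In the simulation, flash_erase_block() needs to be called once before the first save, so that the array starts out as all $FF. After that, load_value() would be called at power-up to recover the last saved byte, and save_value() each time the parameter changes. The block is then erased only once for every 64 saves.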
Regards,
Mac