This question is about the MPC5xxx. The benefit of the PowerPC small data area (SDA) feature is that each global variable access (load or store) needs only a "lwz/stw" rather than a "lis" instruction followed by a "lwz/stw", thus saving a 1-cycle instruction. The load/store is addressed relative to the r2 or r13 anchor register, which means, however, that the displacement from this register is limited to 64 KB (a signed 16-bit offset, i.e. ±32 KB around the anchor).
However, the 16-bit VLE load/store instructions have a much smaller immediate field, allowing a displacement of only about 64 bytes from the base register. Hence, it makes no sense to do a 16-bit VLE load/store relative to the r2/r13 anchor registers, since the small data area reachable that way would be only about 64 bytes.
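To make the two reach limits concrete, here is a minimal sketch in C. It assumes that the 32-bit load/store forms take a signed 16-bit displacement and that the 16-bit VLE forms (se_lwz/se_stw and friends) take an unsigned 4-bit displacement scaled by the access size, which is my reading of the encoding. The helper names `fits_d16` and `fits_sd4` are hypothetical, just for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 32-bit D-form load/store (e.g. lwz/stw): signed 16-bit displacement. */
bool fits_d16(int32_t offset)
{
    return offset >= INT16_MIN && offset <= INT16_MAX;
}

/* 16-bit VLE load/store (assumed SD4 form): unsigned 4-bit field,
   scaled by the access size in bytes (1, 2 or 4). */
bool fits_sd4(int32_t offset, int32_t size)
{
    assert(size == 1 || size == 2 || size == 4);
    return offset >= 0 && offset % size == 0 && offset / size <= 15;
}
```

Under these assumptions a word access reaches offsets 0 to 60 with the 16-bit form (`fits_sd4(60, 4)` holds, `fits_sd4(64, 4)` does not), while `fits_d16` accepts anything from -32768 to 32767.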
What should a compiler do if both VLE and SDA are enabled? Should it use a single 32-bit VLE load/store instruction when accessing SDA variables? Or is overall performance better if it ignores the SDA nature of the variable and uses 16-bit instructions to form the address and load/store the data?
Let's say VLE alone gives a 10% performance improvement and SDA alone gives a 10% performance improvement: should we expect around 20% improvement if both are combined, or will it stay around 10%?
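For reference, the "around 20%" figure is only the naive upper bound. If "improvement" is read as a reduction in execution time and the two effects were fully independent (both are assumptions on my part), the gains would compound rather than add. A small sketch in C, with `combined_reduction` as a hypothetical helper:

```c
#include <assert.h>

/* Combined fractional reduction in run time when two reductions a and b
   apply independently: the remaining time is (1 - a) * (1 - b). */
double combined_reduction(double a, double b)
{
    assert(a >= 0.0 && a < 1.0 && b >= 0.0 && b < 1.0);
    return 1.0 - (1.0 - a) * (1.0 - b);
}
```

With these assumptions, `combined_reduction(0.10, 0.10)` comes out at about 0.19, i.e. roughly 19%. Whether the two effects really are independent is exactly what I am asking.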