Nice programming and record, I was only able to toggle it at 3.6 MHz! You can optimize away a few cycles at 1 GHz, but at some point you hit a limit, because every access has to go through the caches, a couple of crossbars and a peripheral bridge running at 66 MHz. The trick to reproduce it with the Platform SDK (and QNX, they should be aware of this) was to enable the caches and the MMU, and to configure the MMU entry for those peripherals as a standard Device entry instead of Strongly Ordered, which is the default. That allows the accesses to be buffered:
+ mmu_enable();
+ // Enable L2 Cache
+ _l2c310_cache_setup();
+ _l2c310_cache_invalidate();
+ _l2c310_cache_enable();
- mmu_map_l1_range(0x00a00000, 0x00a00000, 0x0f600000, kStronglyOrdered,kShareable, kRWAccess); // More peripherals
+ mmu_map_l1_range(0x00a00000, 0x00a00000, 0x0f600000, kDevice, kShareable, kRWAccess); // More peripherals
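
For anyone wondering what that one-word change does at the page-table level, here is a rough sketch based on the ARMv7-A short-descriptor section format, not on the SDK's actual implementation. It assumes TEX remap is disabled (SCTLR.TRE = 0), domain 0 and full read/write access. The names `mem_type` and `l1_section_entry` are made up for the illustration. With those assumptions, Strongly Ordered and shareable Device differ only in the B (bufferable) bit:

```c
#include <stdint.h>

/* Hypothetical names for illustration only; these are not the SDK's types. */
enum mem_type { MEM_STRONGLY_ORDERED, MEM_SHARED_DEVICE };

/*
 * Build an ARMv7-A short-descriptor L1 "section" entry (1 MB mapping).
 * Assumptions: TEX remap disabled, domain 0, S bit left at 0.
 */
static uint32_t l1_section_entry(uint32_t phys, enum mem_type type)
{
    uint32_t entry = (phys & 0xFFF00000u)  /* section base address, bits [31:20] */
                   | (0x3u << 10)          /* AP[1:0] = 0b11: read/write access */
                   | 0x2u;                 /* bits [1:0] = 0b10: section descriptor */

    /* TEX = 000 and C = 0 in both cases; only B differs. */
    if (type == MEM_SHARED_DEVICE)
        entry |= 1u << 2;                  /* B = 1: shareable Device memory */

    return entry;
}
```

For the first section of the range in the diff (0x00a00000) this gives 0x00A00C02 for Strongly Ordered and 0x00A00C06 for Device, so the whole difference is bit 2.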