I have made some more checks. Setting the pin high and then immediately low takes 4 asm instructions, for a total of at most 5 clock cycles. I measured 255 ns, which is consistent with a clock of about 20 MHz (at 30 MHz I would have expected ca. 167 ns).
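For reference, the arithmetic behind those numbers can be written as a small host-side sketch. The function names are made up for illustration, and the 5-cycle count is my estimate from the instruction listing, not a measured value.

```c
#include <assert.h>

/* Time in nanoseconds taken by `cycles` core clocks at `core_hz`. */
double cycles_to_ns(unsigned cycles, double core_hz)
{
    assert(core_hz > 0.0);
    return (double)cycles * 1e9 / core_hz;
}

/* Effective core clock (Hz) implied by `cycles` clocks taking `measured_ns`. */
double effective_hz(unsigned cycles, double measured_ns)
{
    assert(measured_ns > 0.0);
    return (double)cycles * 1e9 / measured_ns;
}
```

With these, cycles_to_ns(5, 30e6) comes out at about 166.7 ns, while effective_hz(5, 255.0) comes out at about 19.6 MHz, which is why the measurement looks like a 20 MHz core rather than a 30 MHz one.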
So I routed the main clock to a pin as CLKOUT, and I see a clean 30 MHz on the scope.
The system is configured for a 30 MHz FRO as main clock, and SYSAHBCLKDIV is set to 1 (one), so there is no division.
Could it be that the core is running at a lower clock frequency even though the main clock, at the CLKOUT output (enabled via SYSCON), shows 30 MHz?
Or is something slowing down the core?
... I think I've found part of the problem: the core is slowed down by the flash access time. Changing the flash access time gives a big improvement.
But I haven't seen any details in the datasheet about the minimum guaranteed access time, or its conditions.
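In case it helps anyone, here is a minimal sketch of how the new flash configuration value can be computed. It assumes an LPC8xx-style FLASHCFG register where the access-time field (FLASHTIM) sits in bits 1:0 and is encoded as "clocks minus one", and that all other bits must be written back unchanged. The function name and mask are hypothetical; please check the user manual of your part before using it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the flash access-time field occupies bits 1:0 of FLASHCFG. */
#define FLASHTIM_MASK 0x3u

/* Return the value to write back to FLASHCFG for the requested access time,
 * keeping every bit outside the access-time field as it was read.
 * Assumption: the field is encoded as (system clocks - 1), 1 or 2 clocks. */
uint32_t flashcfg_with_access_clocks(uint32_t old_cfg, unsigned access_clocks)
{
    assert(access_clocks >= 1u && access_clocks <= 2u);
    uint32_t field = (uint32_t)(access_clocks - 1u);
    return (old_cfg & ~(uint32_t)FLASHTIM_MASK) | field;
}
```

On the target this would be used as read, modify, write: read the register, pass the value through the helper, and write the result back. The register address is deliberately left out here because it depends on the part.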
The instruction time has improved and the interrupt latency is better (now 5 µs with a flash access time of 2 system clocks), but it is still far from what I expected.
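To put that latency in perspective, it helps to convert it to core clocks. A small sketch (the function name is made up, and it assumes the core really runs at the 30 MHz seen on CLKOUT):

```c
#include <assert.h>

/* Number of core clocks that elapse during `latency_us` microseconds
 * when the core runs at `core_hz`. */
double latency_in_clocks(double latency_us, double core_hz)
{
    assert(latency_us >= 0.0 && core_hz > 0.0);
    return latency_us * 1e-6 * core_hz;
}
```

latency_in_clocks(5.0, 30e6) gives roughly 150 clocks between the interrupt event and the first visible reaction, which is the figure to compare against the interrupt entry time documented for the core plus whatever the handler does before touching the pin.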