Well, I've been researching this issue over the last few days and finally came to a conclusion.
The SSP0 signal waveforms are really bad on our custom board and also on the EVK (which comes with 0 ohm series resistors). We observed severe ringing on both boards, with undershoot and overshoot leading to a Vp-p of around 7V at worst.
To improve the signal and reduce ringing we tried series resistors in the range of 33 to 100 ohm.
However, that is not the main issue. It turned out that the ROM bootloader incorrectly sets up the SSP clock with inverted polarity. Why the hardware peripheral allows this is beyond me, since the SD specifications only support one clock polarity. It must be an SPI flag that is not ignored in SD mode.
Freescale has recently confirmed this in the chip errata and released a "patch" that you can load from an additional EEPROM memory to fix the clock polarity and finally boot from the SD card.
Unfortunately this happened way after we designed our board.
From our measurements with a scope, the CMD line changes on the rising edge of the clock, which is the edge SD cards use to sample data and commands. So how can it work? Because the CMD transition actually leads the clock edge by a few nanoseconds. With slow rising edges, some cards do not reply. But if you burn the OTP flag to enable 12mA drive strength, the edges are faster and it magically works. That of course comes with no guarantees at all and no safety margin.
Well, this is the end of the story. We investigated whether we could just place a logic inverter on the clock line, but every SD card spec we saw had a max output delay time of 14ns from the rising edge of the clock. That means with a 50MHz clock, a period of 20ns, you have only 6ns to play with, minus 2.5ns for the setup time of the SSP signals. That leaves only 3.5ns, and the fastest inverter (74LVC1G04) has a propagation delay of 3.3ns at 3.3V, so the remaining margin is about 0.2ns, and that is without accounting for clock jitter.
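For anybody who wants to redo the inverter arithmetic with their own numbers, here is a rough sketch in Python. The function and parameter names are just mine for illustration, and it only models the simple budget described above (clock period minus card output delay, minus SSP setup time, minus inverter delay). It ignores jitter, trace delay and edge rates, so treat the result as an optimistic upper bound.

```python
def inverter_margin_ns(clock_hz, card_output_delay_ns, host_setup_ns, inverter_delay_ns):
    """Timing margin (ns) left after adding an inverter on the clock line.

    Simplified budget, names are illustrative only:
    period - card output delay - host setup time - inverter delay.
    """
    period_ns = 1e9 / clock_hz
    # Time left in one clock period once the card has driven its output
    # and the host setup requirement is reserved.
    slack_ns = period_ns - card_output_delay_ns - host_setup_ns
    # The inverter's propagation delay eats into that slack.
    return slack_ns - inverter_delay_ns


# Numbers from the post: 50MHz clock, 14ns max card output delay,
# 2.5ns SSP setup time, 3.3ns inverter propagation delay.
margin = inverter_margin_ns(50e6, 14.0, 2.5, 3.3)
print(f"margin with inverter: {margin:.1f} ns")
```

With the numbers above this comes out at roughly 0.2ns, which is why we dropped the inverter idea.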
The safest option is to boot from EEPROM, which fixes the clock polarity and then continues the boot from SSP0. Freescale released the binary image to fix boot from SSP0 and SSP1 (although they call them 1 and 2). We tested this solution and verified that the clock is good and that boot is successful with the problematic SD cards we identified.
It also works fine on a board we left with blank OTP. The fast edges are only really needed with a fast clock. Since the boot default is an SSP clock of 12MHz, the default drive strength is fine. After Linux has been loaded and booted, its SD/MMC driver correctly programs the clock and increases the drive strength for the 48MHz clock rate.
I hope this is useful to anybody who plans to boot from SD/MMC cards.