So far I haven't succeeded: I get garbage all the time, and the behavior does not really match the RM. I have tried both the 16-bit and 12-bit configurations. I'll keep trying, but I suspect this is not really possible. The 12-bit configuration is completely unusable, so I won't cover that case.
For the 16-bit case:
There is a nice equation on page 2729 of the RM (Rev. 1), Chapter 37.4.2.7:
ADDR = EBA + (XB + SX)*BPP + (YB + SY)*(SL + 1)
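To make the expectation concrete, here is a minimal sketch of that equation in Python. The units are my assumption, not something I have confirmed in the RM: BPP is taken as bytes per pixel and SL + 1 as the line stride in bytes. The base address and the 640-pixel width are made-up values for illustration only.

```python
def pixel_address(eba, x, y, bpp_bytes, stride_bytes, xb=0, yb=0):
    """Byte address per ADDR = EBA + (XB + SX)*BPP + (YB + SY)*(SL + 1).

    Assumptions (not confirmed against the RM): BPP is bytes per pixel,
    and SL + 1 is the line stride in bytes, so SL = stride_bytes - 1.
    """
    sl = stride_bytes - 1
    return eba + (xb + x) * bpp_bytes + (yb + y) * (sl + 1)


# Hypothetical values: a 640-pixel-wide, 16 bpp buffer at an arbitrary base.
EBA = 0x10000000
BPP = 2               # 16 bits per pixel = 2 bytes
STRIDE = 640 * BPP    # bytes per line, assuming no padding

addr_00 = pixel_address(EBA, 0, 0, BPP, STRIDE)  # pixel (0, 0)
addr_10 = pixel_address(EBA, 1, 0, BPP, STRIDE)  # pixel (1, 0)
addr_01 = pixel_address(EBA, 0, 1, BPP, STRIDE)  # pixel (0, 1)
```

Under these assumptions, consecutive pixels in a line land BPP bytes apart and consecutive lines land one stride apart, with nothing in between.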
However, the system behaves differently. According to this, pixel (x = 0, y = 0) should be placed at EBA, and pixel (x = 1, y = 0) at EBA + 1*BPP (EBA + 16 bits in the 16 bpp case). That is not what happens: for the first frame I get 32 bytes of packed data followed by 32 bytes of garbage (burst size is 32 pixels). Afterwards I get interleaved lines. To simplify:
A = 32 bytes of valid data,
G = 32 bytes of garbage,
X = 32 bytes of what seems to be valid data, but isn't (lines from the previous frame or similar).
Memory looks like:
Frame 1: AGAGAGAGAGAGAGAG
Frames 2..n: AXAXAXAXAXAXAXAX
This all suggests that the DMA bursts 32 bytes of packed 12-bit data, but for the next line it bursts whatever is in IPU memory back to system memory. That is why the first frame contains garbage, while later frames contain leftovers from the previous frame(s). All of this applies to the first run after boot. When started for the second, third or n-th time, it always shows the AXAXAX pattern (no garbage per se), which makes sense, as IPU memory is already full of 'valid' data from the previous application run.
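If the AXAX reading of the dump is right, the valid data would be every other 32-byte block. A minimal sketch of that split, using a synthetic buffer rather than a real capture (the 0xAA and 0x55 fill values are placeholders I picked, and the function names are hypothetical):

```python
CHUNK = 32  # bytes per block, as observed in the memory dump


def split_chunks(buf, chunk=CHUNK):
    """Split a byte buffer into consecutive chunk-sized blocks."""
    return [buf[i:i + chunk] for i in range(0, len(buf), chunk)]


def keep_valid_chunks(buf, chunk=CHUNK):
    """Keep only the even-numbered blocks (the 'A' blocks) and join them."""
    return b"".join(split_chunks(buf, chunk)[0::2])


# Synthetic stand-in for a dump: 'A' blocks are 0xAA, 'X' blocks are 0x55.
dump = (b"\xAA" * CHUNK + b"\x55" * CHUNK) * 4
valid = keep_valid_chunks(dump)
```

This only helps with inspecting a dump. It does not explain why the second block of each pair is written in the first place.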
For the 16-bit case with a burst size of 1 pixel, it gets really weird: I get 130 bytes from the RG line and then 130 bytes from the GB line (?!?). I have no idea what is going on here. It could be that the IPU-to-system-memory bandwidth is too low and pixels are somehow being missed, but I could easily be wrong.
OK, so if you had the nerve to read this essay :), do you have any further suggestions on how to put 12-bit packed data into 16-bit memory?
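For clarity, this is roughly what I mean in software terms: a sketch that expands packed 12-bit pixels into 16-bit words. The byte layout is an assumption on my side (two pixels per three bytes, high bits first, low nibbles shared in the third byte). The actual packing coming from the sensor or IPU may differ, so treat the bit positions as placeholders.

```python
import struct


def unpack12_to_16(packed):
    """Expand 3-byte groups into two 12-bit pixels, each held in a 16-bit value.

    Assumed layout (hypothetical, adjust to the real one):
      byte0 = P0[11:4], byte1 = P1[11:4], byte2 = (P1[3:0] << 4) | P0[3:0]
    """
    if len(packed) % 3 != 0:
        raise ValueError("packed length must be a multiple of 3 bytes")
    pixels = []
    for i in range(0, len(packed), 3):
        b0, b1, b2 = packed[i], packed[i + 1], packed[i + 2]
        pixels.append((b0 << 4) | (b2 & 0x0F))  # P0: high byte plus low nibble
        pixels.append((b1 << 4) | (b2 >> 4))    # P1: high byte plus high nibble
    return pixels


packed = bytes([0xAB, 0xCD, 0xEF])
pixels = unpack12_to_16(packed)

# Store as little-endian 16-bit words, one pixel per word.
as_bytes = struct.pack("<%dH" % len(pixels), *pixels)
```

Doing this on the CPU is the fallback I would like to avoid, which is why I am asking whether the IPU can do the equivalent on its own.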