DCU at 1024x768 and 32bpp (XGA resolution)

valterm
Contributor II

I'm working on the Vybrid display driver on Windows Embedded Compact and I have issues supporting the standard XGA resolution (which is reported as the maximum resolution supported by the processor) at 32 bits per pixel. We are using our own custom modules and developed the Windows CE BSP from scratch. We also ported the Linux BSP and currently run a 3.2 kernel.

On our 400MHz/500MHz platform the system works fine at 16bpp, but increasing the bits per pixel leads to corrupted images with FIFO overruns. Using Linux on the 500MHz version the image quality is better, but the on-screen image is still corrupted when the device is accessing USB, SD, flash memory, etc.

The reference manual reports XGA as the maximum resolution, but gives no specs about the bits per pixel. On the other hand, the DCU chapter in the reference manual states that up to 4 layers can be used when the framebuffer is stored in DDR (we need to keep it in DDR because it's too big for the internal RAM), and I'm currently using just one layer. I tried both setting and clearing the DCU_MODE[DDR_MODE] bit, but the behaviour is the same. It seems that the DDR bus has enough bandwidth to support this resolution, but it also seems that the display's internal pipeline is not able to collect data fast enough to deliver the expected output.

For XGA mode the pixel clock is 65MHz. I use the PLL1PFD2 clock (452MHz) as the source for the DCU, configuring it in CCM_CSCMR1. The first divisor is 1, so the DCU also runs at 452MHz, and the second divisor is 7, leading to a pixel clock of about 64.6MHz. This respects the suggestion that the DCU must run at at least 4x the pixel clock.

I enabled test mode for the DCU and I can see the color stripes, so I think that the video output timings are OK. I also have no issues running the same resolution at 16bpp (same pixel clock, data also in DDR and same output parameters), or driving the display at 32bpp at SVGA resolution (800x600 with a 40MHz pixel clock, same memory layout for layer data). That's why I suspect a memory bandwidth issue.

I'm checking the BSPs to understand why the issue is more noticeable on Windows CE than it is on Linux, but I still need to solve it on both operating systems. Is there a way to increase the bus bandwidth of the DCU controller to support high resolutions at 32bpp?
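
As a rough sanity check of the clock figures in this post, the divider chain can be sketched in a few lines of Python. This is only an illustration: the function names are made up for the example, and it assumes the second divisor is applied to the DCU clock produced by the first one, as the description above suggests.

```python
def dcu_clocks(source_hz, dcu_div, pixel_div):
    """Return (dcu_clock_hz, pixel_clock_hz) for a two-stage divider chain.

    Assumption: the pixel divider is applied to the already-divided DCU clock.
    """
    dcu_clock = source_hz / dcu_div
    pixel_clock = dcu_clock / pixel_div
    return dcu_clock, pixel_clock


def meets_4x_rule(dcu_clock_hz, pixel_clock_hz):
    """True if the DCU clock is at least four times the pixel clock."""
    return dcu_clock_hz >= 4 * pixel_clock_hz


SOURCE_HZ = 452e6  # PLL1PFD2 frequency as quoted in the post

dcu_clk, pix_clk = dcu_clocks(SOURCE_HZ, dcu_div=1, pixel_div=7)
print(f"DCU clock:   {dcu_clk / 1e6:.1f} MHz")
print(f"Pixel clock: {pix_clk / 1e6:.1f} MHz")
print("4x rule met:", meets_4x_rule(dcu_clk, pix_clk))
```

With these inputs the pixel clock comes out slightly below the nominal 65MHz of XGA, and the DCU clock is 7x the pixel clock, so the 4x guideline is satisfied.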

6 Replies

ioseph_martinez
NXP Employee

Hi Valter,

Thanks for your responses. Also try running the DCU input clock synchronous to the DDR clock (regardless of the pixel clock); maybe that helps (i.e. 400MHz, the same clock as the DDR).

From what you say, your part does not have an L2 cache... I am guessing that such a part may perform better, since it will access memory more efficiently.

Regarding the six layers: no, it should just use the layers you enable. You could test this: start adding more layers, perhaps pointing to the same location, and you may see the failure even when the CPU is not running on the Linux version.

valterm
Contributor II

As I said, we need to support this resolution under Windows CE; having a working Linux implementation would be useful to check how the DCU and other components are initialized.

At the FTF 2014 in Dallas people from Freescale confirmed that this resolution/refresh rate should be supported by the Vybrid DCU, but I found this document that states that 800x600 is the maximum resolution supported.

http://cache.freescale.com/files/microcontrollers/doc/app_note/AN4651.pdf

On the other hand, all the calculations in this document are done considering 6 merged layers per pixel (the maximum supported number of overlapping layers).

I don't need that.

One layer can be enough, two would be perfect.

Taking numbers from that document, the bandwidth between the DCU and DDR is 1GB/s (less than the 1.6GB/s DDR bandwidth). Using one layer requires 1024x768x4 (32bpp) x60 (refresh rate) ~= 189MB/s, less than 20% of the available bandwidth.

For six layers this would exceed memory bandwidth (~1.2GB/s), but the only method the application note suggests to avoid this is to reduce the pixel clock.
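
The per-layer estimate above can be reproduced with a short Python sketch. It assumes a very simple model that is not taken from the application note: every enabled layer is full-screen and is fetched exactly once per frame, with no blanking or burst overhead. The function name and the `DCU_DDR_LIMIT` constant are illustrative only; the 1GB/s value is the DCU-to-DDR figure quoted in this post.

```python
def framebuffer_bandwidth(width, height, bytes_per_pixel, refresh_hz, layers=1):
    """Bytes per second needed to fetch `layers` full-screen layers each frame."""
    return width * height * bytes_per_pixel * refresh_hz * layers


DCU_DDR_LIMIT = 1_000_000_000  # 1GB/s, as quoted from the application note

for layers in (1, 2, 6):
    bw = framebuffer_bandwidth(1024, 768, 4, 60, layers)
    share = 100 * bw / DCU_DDR_LIMIT
    print(f"{layers} layer(s): {bw / 1e6:.0f} MB/s ({share:.0f}% of the quoted limit)")
```

Under this model one or two layers stay well below the quoted 1GB/s figure, while six full-screen layers go above it.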

Why does reducing the number of enabled layers not change the figures for the required bandwidth?

What happens if I enable just one or two layers?

I tried setting BLEND_ITER in the DCU_MODE register to 6 (max layers), 4 (suggested if you also enable DDR_MODE, which is what I do), and 2 (minimum value), but nothing changes.

I reset all the layer configuration registers to 0, clearing the enable bit (EN, bit 31) in the CTRLDESCLX_3 register for every layer except the one I'm currently using (layer 0).

Does the DCU still load data from 6 layers!?

Where does it take the addresses from, since all registers are set to 0?

Does it load 6 copies of the same data from DDR (from the only enabled layer, which has a meaningful memory address configured in CTRLDESCLX_3)?

ioseph_martinez
NXP Employee

Hi Valter,

A few comments on the calculations:

The required bandwidth on the DCU is actually pix_clk * bpp/8 per layer, so in this case it is 260MB/s. But you are right, that is way below the available bandwidth on the DDR.
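
To put that formula next to the configurations mentioned earlier in the thread, here is a minimal Python sketch. The function name is made up for the example; the pixel clocks and colour depths are the ones reported in the original post (nominal 65MHz for XGA, 40MHz for SVGA).

```python
def dcu_fetch_rate(pixel_clock_hz, bits_per_pixel, layers=1):
    """Bytes per second fetched while scanning out: pix_clk * bpp/8 per layer."""
    return pixel_clock_hz * bits_per_pixel // 8 * layers


# (description, pixel clock in Hz, bits per pixel), taken from the thread
modes = [
    ("XGA 16bpp (reported working)", 65_000_000, 16),
    ("SVGA 32bpp (reported working)", 40_000_000, 32),
    ("XGA 32bpp (reported corrupted)", 65_000_000, 32),
]

for name, pix_clk, bpp in modes:
    rate = dcu_fetch_rate(pix_clk, bpp)
    print(f"{name}: {rate / 1e6:.0f} MB/s")
```

By this estimate the two working modes need 130MB/s and 160MB/s, while the failing one needs 260MB/s. That does not prove where the bottleneck is, but it shows the failing mode is the only one above 160MB/s.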

What other peripherals are using DDR? Are the CA5 and/or the M4 running out of DDR? Do you still see corruption on the DCU if you stop execution? Different masters accessing DDR add complexity to the system, and it is not all about bandwidth but also about latencies. Having the A5 and/or M4 sharing the DDR bus may lead to some issues at such high frequencies.

I know there is some latency penalty when the clocks are not the same on master and slave. Try running the DCU at the same frequency as the DDR to see if that helps. Also, according to the manual, the valid frequencies for the DCU pixel clock are 5-60MHz.

Also, don't use DDR mode on the DCU; it won't help and it will make things worse.

Regarding this: The DCU still loads data from 6 layers!?

I don't understand, are you seeing the 6 layers? How do you know it is loading the 6 layers? (besides the behavior of the BLEND_ITER register)

Regarding this: I reset all the layer configuration registers to 0

Do you set ALL the descriptors of ALL layers to 0, or only CTRLDESCLX_3 bit 31 of all layers? I just want to double-check, given your comment on the 6 layers.

What entity is rendering the content? How often? What type of content?

That is what I can think of for now...

valterm
Contributor II

Ciao Ioseph,

I'll try changing the clock frequency (60MHz may be in the accepted range). By the way, I'm already using 65MHz in 16-bit mode and it works.
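
For reference, the effect of a lower pixel clock on the refresh rate can be estimated as below. This is a sketch that assumes the standard VESA XGA totals (1344 pixels per line and 806 lines per frame, blanking included); the panel actually in use may have different timings, so the constants should be treated as placeholders.

```python
# Assumption: standard VESA 1024x768@60Hz totals, blanking included.
XGA_H_TOTAL = 1344
XGA_V_TOTAL = 806


def refresh_rate(pixel_clock_hz, h_total=XGA_H_TOTAL, v_total=XGA_V_TOTAL):
    """Frames per second for a given pixel clock and total frame size."""
    return pixel_clock_hz / (h_total * v_total)


for pix_clk in (65e6, 60e6):
    print(f"{pix_clk / 1e6:.0f} MHz -> {refresh_rate(pix_clk):.2f} Hz")
```

With these totals, dropping from 65MHz to 60MHz lowers the refresh rate from about 60Hz to roughly 55.4Hz, which many panels tolerate, though that depends on the display.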

In the meantime, here are my replies to your questions.

What other peripherals are using DDR? Are the CA5 and/or the M4 running out of DDR? Do you still see corruption on the DCU if you stop execution?

The A5 is running out of DDR, and most of the memory buffers used by other devices (Eth, USB, SD) are stored there. The M4 is not running. The plan is to use the internal RAM for the M4 (if available) or for some buffers that can fit in the 1.5MB we have (not the framebuffer). On Linux the image is OK when the system is idle; we have issues when it's running. On CE it is always bad (the CPU is not put in an idle state at the moment).

Also, don't use DDR mode on the DCU; it won't help and it will make things worse.

Good to know, but it's not what the documentation suggests :)

I don't understand, are you seeing the 6 layers? How do you know it is loading the 6 layers? (besides the behavior of the BLEND_ITER register)

Regarding this: I reset all the layer configuration registers to 0

Do you set ALL the descriptors of ALL layers to 0, or only CTRLDESCLX_3 bit 31 of all layers? I just want to double-check, given your comment on the 6 layers.


I was asking about the 6 layers because the app note I referenced always uses 6 layers in its calculations, so I was wondering if there are always 6 "pipelines" loading data from memory.

I set all registers to 0, which obviously also clears the enable bit. Only one layer has the bit set to 1 and valid parameters in the other registers. I reset everything to 0 to avoid the risk that a bootloader enabling the DCU to show a splash screen might leave some layers enabled. This is not the case right now; the bootloader does not access the DCU.

timesyssupport
Senior Contributor II

Hi Valter,

We are not aware of any ways to increase the DCU controller's bus bandwidth. Perhaps the Vybrid Design team would have a comment on this.

As you are working with custom hardware and Linux kernel version, this would fall outside the scope of Timesys standard support, and would need to be approached under a services agreement. Please contact sales@timesys.com if this is something you would be interested in.

Thanks,

Timesys Support

karina_valencia
NXP Apps Support

timesyssupport, can you attend to this case?
