Hello Yiping Wang,
Thank you for paying attention to my post.
I did not make any changes to the qdma node in arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi as provided by LSDK19.03:
qdma: qdma@8380000 {
	compatible = "fsl,ls1046a-qdma", "fsl,ls1021a-qdma";
	reg = <0x0 0x8380000 0x0 0x1000>, /* Controller regs */
	      <0x0 0x8390000 0x0 0x10000>, /* Status regs */
	      <0x0 0x83a0000 0x0 0x40000>; /* Block regs */
	interrupts = <0 153 0x4>,
		     <0 39 0x4>,
		     <0 40 0x4>,
		     <0 41 0x4>,
		     <0 42 0x4>;
	interrupt-names = "qdma-error", "qdma-queue0",
			  "qdma-queue1", "qdma-queue2", "qdma-queue3";
	channels = <8>;
	block-number = <1>;
	block-offset = <0x10000>;
	queues = <2>;
	status-sizes = <64>;
	queue-sizes = <64 64>;
	big-endian;
};
I would like to ask for further explanation of your comment. You said: "The performance data will be increased when increasing the channels number. The channels number could be defined in qdma device node in the dts file."
Do you recommend increasing the "channels" property to <64>? I don't really understand how the properties of the qdma node should be set to achieve better performance.
From my point of view, the "block-number" property should be <4> instead of <1>, but does "block-number" depend on the "channels" property?
Similarly, the "queues" property should be <8> instead of <2>, but does "queues" also depend on the "channels" property?
Regarding "status-sizes" and "queue-sizes", do those properties need to have the same value? How should they be set properly?
Regarding your test results, can you specify the unit of the transfer figures? Can you provide the .dts used to achieve them (is it the same as mine, with only the "channels" property changed)? Can you also describe the link speed and grade of the PCIe endpoint used in your test, and the payload size of this PCIe device?
I also took a look at the "Layerscape Software Development Kit User Guide". In chapter "7.2.10 Queue Direct Memory Access Controller (qDMA)", the dmatest output shows the following:
dmatest: dma0chan0-copy3: summary 1000 tests, 0 failures 4078 iops 33474 KB/s (0)
dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 3024 iops 24486 KB/s (0)
dmatest: dma0chan0-copy2: summary 1000 tests, 0 failures 2881 iops 23588 KB/s (0)
Can you comment on these performance results? I know they were obtained on an LS1043a SoC but, in my opinion, the figures should be pretty much the same on the LS1046a, and they look slow.
I am sorry for asking so many questions, but this is still confusing to me.
Again, thank you for your time and consideration.
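To make the "slow" remark concrete, here is a rough back-of-the-envelope check of the three summary lines quoted above, written as a small Python sketch. The regular expression and the helper name parse_summaries are my own, not part of dmatest, and I am assuming that the KB/s column is the amount of data copied per second by each thread.

```python
import re

# The three dmatest summary lines quoted from the LSDK User Guide.
DMATEST_LOG = """\
dmatest: dma0chan0-copy3: summary 1000 tests, 0 failures 4078 iops 33474 KB/s (0)
dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 3024 iops 24486 KB/s (0)
dmatest: dma0chan0-copy2: summary 1000 tests, 0 failures 2881 iops 23588 KB/s (0)
"""

# Hypothetical pattern written for these lines only; other kernel
# versions may print the summary in a different layout.
SUMMARY_RE = re.compile(
    r"dmatest: (\S+): summary (\d+) tests, (\d+) failures "
    r"(\d+) iops (\d+) KB/s"
)


def parse_summaries(log):
    """Return one dict per dmatest summary line found in `log`."""
    rows = []
    for match in SUMMARY_RE.finditer(log):
        thread, tests, failures, iops, kb_per_s = match.groups()
        rows.append({
            "thread": thread,
            "tests": int(tests),
            "failures": int(failures),
            "iops": int(iops),
            "kb_per_s": int(kb_per_s),
        })
    return rows


rows = parse_summaries(DMATEST_LOG)

for row in rows:
    # Average size of one transfer = throughput / operations per second.
    avg_kb = row["kb_per_s"] / row["iops"]
    print(f'{row["thread"]}: {avg_kb:.1f} KB per transfer on average')

# All three threads run on the same channel (dma0chan0), so their
# throughputs add up to the total moved through that channel.
total_kb_per_s = sum(row["kb_per_s"] for row in rows)
print(f"aggregate on dma0chan0: {total_kb_per_s} KB/s")
```

If my assumption about the unit is right, each thread moves about 8 KB per transfer and the three threads together reach roughly 80 MB/s on a single channel, which is why I consider these figures low.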
Have a nice day,
Romain Gallais