LS1046A DMA performances

Contributor I

Hi everyone,

I would like to discuss LS1046A DMA performance and the issues I am running into.

I am having trouble with the DMA performance achieved by my platform, which is based on an LS1046A SoC.

I would like to achieve high throughput for PCIe transfers using the DMA engines provided by the SoC.

By using QDMA with DPAA1:

I can't get above 25 MB/s with qDMA when writing to a PCIe area, or above 40 kB/s when reading from a PCIe area.

According to the "Layerscape Software Development Kit User Guide", the performance reported there (from dmatest) looks similar to mine, except for reads.

I tried using eDMA, but I failed to make it work. I may have misconfigured my transfer, but I didn't find any relevant example showing how to use it correctly.

Am I missing something? Are there any guidelines or examples for using those drivers more effectively?

Does anyone have experience with this?

I am using the Linux kernel 4.19 from LSDK19.03.

Thank you for your time !

3 Replies

NXP TechSupport

Hello Romain Gallais,

I requested LS1046 PCIe qDMA write and read test data from the Linux SDK development team; please refer to the following test configurations and results. The performance figures increase as the number of channels increases. The number of channels can be defined in the qdma device node in the dts file.

1 block, 1 queue (memory -> pcie(ep))

channels    64B      512B     1KB      2KB      4KB      16KB     1M
8           25250    165703   322148   106088   168108   315478   452810
32          53347    421814   774816   445189   448636   450857   453467
64          76392    603313   889618   445849   448599   450566   453594

1 block, 1 queue (pcie(rc) -> memory)

channels    64B      512B     1KB      2KB      4KB      16KB     1M
8           25118    171422   322148   113659   162014   309375   437318
32          53656    421814   774816   435680   436031   435994   438145
64          76392    603313   889618   436083   436125   35917    438111
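To make the effect of the channel count easier to see, here is a minimal Python sketch that computes how much each column of the memory -> pcie(ep) table changes between 8 and 64 channels. The numbers are copied from the table above; the unit is not stated in this thread, so only ratios are computed. The names `SIZES`, `MEM_TO_PCIE` and `scaling` are made up for this illustration.

```python
# Figures copied from the memory -> pcie(ep) table in this thread.
# The unit is not stated, so only ratios between rows are meaningful here.
SIZES = ["64B", "512B", "1KB", "2KB", "4KB", "16KB", "1M"]
MEM_TO_PCIE = {
    8: [25250, 165703, 322148, 106088, 168108, 315478, 452810],
    32: [53347, 421814, 774816, 445189, 448636, 450857, 453467],
    64: [76392, 603313, 889618, 445849, 448599, 450566, 453594],
}


def scaling(table, base, target):
    """Return {transfer size: target/base ratio} for two channel counts."""
    return {
        size: table[target][i] / table[base][i]
        for i, size in enumerate(SIZES)
    }


if __name__ == "__main__":
    for size, ratio in scaling(MEM_TO_PCIE, 8, 64).items():
        print(f"{size:>5}: x{ratio:.2f} going from 8 to 64 channels")
```

In these figures the 64B column gains roughly a factor of three between 8 and 64 channels, while the 1M column is essentially unchanged, which suggests the channel count matters mostly for small and medium transfers.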

If your problem remains, please describe your test scenario in detail and provide your kernel configuration and dts file.


Have a great day,

- If this post answers your question, please click the "Mark Correct" button. Thank you!

- We are following threads for 7 weeks after the last post, later replies are ignored
Please open a new thread and refer to the closed one, if you have a related question at a later point in time.

Contributor I

Hello Yiping Wang,

Thank you for paying attention to my post.

I didn't make any changes to the qdma node in arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi provided by LSDK19.03:

        qdma: qdma@8380000 {
            compatible = "fsl,ls1046a-qdma", "fsl,ls1021a-qdma";
            reg = <0x0 0x8380000 0x0 0x1000>, /* Controller regs */
                  <0x0 0x8390000 0x0 0x10000>, /* Status regs */
                  <0x0 0x83a0000 0x0 0x40000>; /* Block regs */
            interrupts = <0 153 0x4>,
                         <0 39 0x4>,
                         <0 40 0x4>,
                         <0 41 0x4>,
                         <0 42 0x4>;
            interrupt-names = "qdma-error", "qdma-queue0",
                              "qdma-queue1", "qdma-queue2", "qdma-queue3";
            channels = <8>;
            block-number = <1>;
            block-offset = <0x10000>;
            queues = <2>;
            status-sizes = <64>;
            queue-sizes = <64 64>;

I would like to ask for further explanation of your comment. You said "The performance data will be increased when increasing the channels number. The channels number could be defined in qdma device node in the dts file.".

Do you recommend increasing the "channels" property to <64>? I don't really understand how the properties in the qdma node should be set to achieve better performance.

From my point of view, the property "block-number" should be <4> instead of <1>, but does "block-number" depend on the "channels" property?

Similarly, the property "queues" should be <8> instead of <2>, but does "queues" depend on the "channels" property?

Regarding "status-sizes" and "queue-sizes", do those properties need to be the same? How should they be set properly?

Regarding your test results, can you specify the unit of the figures? Can you provide the associated .dts used to obtain them (is it the same as mine, with only the "channels" property changed)? Can you also describe the link speed and grade of the PCIe endpoint used in your test, and the payload size of this PCIe device?

I also took a look at the "Layerscape Software Development Kit User Guide"; in chapter "7.2.10 Queue Direct Memory Access Controller (qDMA)", dmatest reports the following:

dmatest: dma0chan0-copy3: summary 1000 tests, 0 failures 4078 iops 33474 KB/s (0)

dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 3024 iops 24486 KB/s (0)

dmatest: dma0chan0-copy2: summary 1000 tests, 0 failures 2881 iops 23588 KB/s (0)

Can you comment on these performance results? I know they were obtained with an LS1043A SoC, but in my opinion the LS1046A should behave pretty much the same, and this is slow.
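To put numbers on the three summary lines above, here is a minimal Python sketch that parses them and derives the average amount of data per operation and the combined throughput of the three threads. The regular expression only targets the summary format quoted in this post; the names `LOG`, `PATTERN` and `parse` are made up for this illustration.

```python
import re

# Summary lines quoted above from the LSDK user guide (LS1043A run).
LOG = """\
dmatest: dma0chan0-copy3: summary 1000 tests, 0 failures 4078 iops 33474 KB/s (0)
dmatest: dma0chan0-copy0: summary 1000 tests, 0 failures 3024 iops 24486 KB/s (0)
dmatest: dma0chan0-copy2: summary 1000 tests, 0 failures 2881 iops 23588 KB/s (0)
"""

# Matches only the summary format shown above.
PATTERN = re.compile(
    r"dmatest: (?P<thread>\S+): summary (?P<tests>\d+) tests, "
    r"(?P<failures>\d+) failures (?P<iops>\d+) iops (?P<kbps>\d+) KB/s"
)


def parse(log):
    """Return one dict per summary line with the derived KB per operation."""
    rows = []
    for m in PATTERN.finditer(log):
        iops = int(m["iops"])
        kbps = int(m["kbps"])
        rows.append({
            "thread": m["thread"],
            "iops": iops,
            "kbps": kbps,
            "kb_per_op": kbps / iops,  # average data moved per operation
        })
    return rows


if __name__ == "__main__":
    rows = parse(LOG)
    for row in rows:
        print(f'{row["thread"]}: {row["kb_per_op"]:.2f} KB per operation')
    print("combined:", sum(row["kbps"] for row in rows), "KB/s")
```

Each thread averages a little over 8 KB per operation, and the three threads share the same channel (dma0chan0), so the per-thread figures add up to roughly 81.5 MB/s on that channel rather than 24 to 33 MB/s.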

I am sorry for asking so many questions, but this is still confusing to me.

Again, thank you for your time and consideration,

Have a nice day,

Romain Gallais

NXP TechSupport

Please try setting "channels" to <64> in the dts file.

Please refer to Documentation/devicetree/bindings/dma/fsl-qdma.txt for details.
