LS1046a: How do I force PCIe and dmatest to use qdma instead of edma?


1,630 Views
AbelianMeme
Contributor III

We are getting extremely poor performance out of PCIe on our LS1046A. It is configured as a 4-lane Gen3 root complex, and our application is heavily I/O-bound through this interface.

To troubleshoot the problem, I downloaded and compiled the recommended PCIe dmatest I found elsewhere on this forum (dma-performance-test.tar.gz). Unfortunately, I cannot get dmatest to run. It prints:

[ 113.171176] dmatest: current_mask 0x1
[ 113.171292] dmatest: len 5001 save_mask f tmp_mask 2
[ 113.171320] dmatest: pcie_phy_base : 4841800000 pcie_virt_base : 0000000067846f89

But it never proceeds any further and never returns results.

After a little investigating to figure out why the fsl-qdma routines weren't getting called, I had dmatest.c print out the following value:

printk("%pF\n",chan->device->device_issue_pending);

I found that this function pointer actually points to fsl_edma_issue_pending.

This implies that the DMA controller being assigned to the PCIe bus is the eDMA controller, not the QDMA controller. How do I force the PCIe bus to use the QDMA controller? I tried assigning a dmas property to the pcie@3400000 DTS node, but that made no difference: the dmatest device still tries to use eDMA. On my system, under /sys/class/dma, I can see the system created 32 eDMA channels on dma0 and 64 QDMA channels on dma1. How do I force PCIe (and dmatest) to use only DMA channels from dma1?
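Since the entries under /sys/class/dma are named dma<N>chan<M> (dma0chan0 and so on), restricting a test to one controller comes down to matching on the controller index. Below is a minimal sketch of that name check in plain C, assuming this naming pattern and assuming dma1 is the QDMA controller as on this system. The helper name chan_on_controller is hypothetical; the same comparison could sit inside a dma_filter_fn passed to dma_request_channel() in a kernel test module.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 if a dmaengine channel name such as
 * "dma1chan0" belongs to controller index `ctrl`, 0 otherwise.
 * Assumes the "dma<N>chan<M>" naming seen under /sys/class/dma.
 */
static int chan_on_controller(const char *name, unsigned int ctrl)
{
    unsigned int dev_idx, chan_idx;
    int consumed = 0;

    assert(name != NULL);

    /* %n records how many characters were consumed, to reject trailing junk */
    if (sscanf(name, "dma%uchan%u%n", &dev_idx, &chan_idx, &consumed) != 2)
        return 0;
    if (name[consumed] != '\0')
        return 0;

    return dev_idx == ctrl;
}
```

With the layout described above, chan_on_controller("dma1chan0", 1) accepts a QDMA channel and chan_on_controller("dma0chan5", 1) rejects an eDMA one. Mainline dmatest also takes a channel name through its channel module parameter; whether the dma-performance-test variant keeps that parameter is not confirmed here.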

 

Thank you for any assistance,

-----------------------------------------------------------------------------------------------------

For reference, here are the relevant parts of my .dts tree:

pcie1: pcie@3400000 {
    compatible = "fsl,ls1046a-pcie";
    reg = <0x00 0x03400000 0x0 0x00100000 /* controller registers */
           0x40 0x00000000 0x0 0x00200000>; /* configuration space */
    reg-names = "regs", "config";
    interrupts = <GIC_SPI 118 IRQ_TYPE_LEVEL_HIGH>, /* controller interrupt */
                 <GIC_SPI 117 IRQ_TYPE_LEVEL_HIGH>; /* PME interrupt */
    interrupt-names = "aer", "pme";
    #address-cells = <3>;
    #size-cells = <2>;
    device_type = "pci";
    dma-coherent;
    dmas = <&qdma 0 &qdma 1 &qdma 2 &qdma 3 &qdma 4 &qdma 5 &qdma 6 &qdma 7
            &qdma 8 &qdma 9 &qdma 10 &qdma 11 &qdma 12 &qdma 13 &qdma 14 &qdma 15
            &qdma 16 &qdma 17 &qdma 18 &qdma 19 &qdma 20 &qdma 21 &qdma 22 &qdma 23
            &qdma 24 &qdma 25 &qdma 26 &qdma 27 &qdma 28 &qdma 29 &qdma 30 &qdma 31
            &qdma 32 &qdma 33 &qdma 34 &qdma 35 &qdma 36 &qdma 37 &qdma 38 &qdma 39
            &qdma 40 &qdma 41 &qdma 42 &qdma 43 &qdma 44 &qdma 45 &qdma 46 &qdma 47
            &qdma 48 &qdma 49 &qdma 50 &qdma 51 &qdma 52 &qdma 53 &qdma 54 &qdma 55
            &qdma 56 &qdma 57 &qdma 58 &qdma 59 &qdma 60 &qdma 61 &qdma 62 &qdma 63>;

    iommu-map = <0 &smmu 0 1>; /* update by bootloader */
    num-viewport = <32>;
    bus-range = <0x0 0xff>;
    ranges = <0x81000000 0x0 0x00000000 0x40 0x00200000 0x0 0x00200000 /* downstream I/O */
              0x82000000 0x0 0x40000000 0x40 0x40000000 0x0 0x40000000>; /* non-prefetchable memory */
    msi-parent = <&msi1>, <&msi2>, <&msi3>;
    #interrupt-cells = <1>;
    interrupt-map-mask = <0 0 0 7>;
    interrupt-map = <0000 0 0 1 &gic GIC_SPI 110 IRQ_TYPE_LEVEL_HIGH>,
                    <0000 0 0 2 &gic GIC_SPI 110 IRQ_TYPE_LEVEL_HIGH>,
                    <0000 0 0 3 &gic GIC_SPI 110 IRQ_TYPE_LEVEL_HIGH>,
                    <0000 0 0 4 &gic GIC_SPI 110 IRQ_TYPE_LEVEL_HIGH>;
    big-endian;
    status = "okay";
};

qdma: dma-controller@8380000 {
    compatible = "fsl,ls1046a-qdma", "fsl,ls1021a-qdma";
    reg = <0x0 0x8380000 0x0 0x1000>, /* Controller regs */
          <0x0 0x8390000 0x0 0x10000>, /* Status regs */
          <0x0 0x83a0000 0x0 0x40000>; /* Block regs */
    interrupts = <GIC_SPI 153 IRQ_TYPE_LEVEL_HIGH>,
                 <GIC_SPI 39 IRQ_TYPE_LEVEL_HIGH>,
                 <GIC_SPI 40 IRQ_TYPE_LEVEL_HIGH>,
                 <GIC_SPI 41 IRQ_TYPE_LEVEL_HIGH>,
                 <GIC_SPI 42 IRQ_TYPE_LEVEL_HIGH>;
    interrupt-names = "qdma-error", "qdma-queue0",
                      "qdma-queue1", "qdma-queue2", "qdma-queue3";
    #dma-cells = <1>;
    dma-channels = <64>;
    block-number = <1>;
    block-offset = <0x10000>;

    fsl,dma-queues = <4>;
    status-sizes = <8192>;
    queue-sizes = <4096 4096 4096 4096>;
    big-endian;
};

edma0: edma@2c00000 {
    #dma-cells = <2>;
    compatible = "fsl,vf610-edma";
    reg = <0x0 0x2c00000 0x0 0x10000>,
          <0x0 0x2c10000 0x0 0x10000>,
          <0x0 0x2c20000 0x0 0x10000>;
    interrupts = <GIC_SPI 103 IRQ_TYPE_LEVEL_HIGH>,
                 <GIC_SPI 103 IRQ_TYPE_LEVEL_HIGH>;
    interrupt-names = "edma-tx", "edma-err";
    dma-channels = <32>;
    big-endian;
    clock-names = "dmamux0", "dmamux1";
    clocks = <&clockgen 4 1>,
             <&clockgen 4 1>;
};

 

 

5 Replies

1,618 Views
AbelianMeme
Contributor III

An update on this problem.

I still have not figured out how to get dmatest to use QDMA rather than eDMA; however, in order to proceed, I completely removed eDMA from the device tree. Now I only get the QDMA channels in /sys/class/dma, and dmatest does seem to choose dma0chan0, which is now QDMA.

Unfortunately, that is as far as it goes. dmatest still does not work, and it appears the QDMA driver is broken. I was using the fsl-qdma driver from the 6.0.9 mainline kernel. After adding some debugging statements, what I saw was:

1) dmatest.c calls fsl_qdma_enqueue_desc (with block = 0, queue = 0, chan = 0) several hundred times.

2) The interrupt handler, fsl_qdma_queue_handler is *NEVER* called

3) The dma queue fills until the X_OFF threshold is reached

4) Eventually, fsl_qdma_enqueue_desc hangs in a spin loop forever

 

That's not good. So I upgraded to the latest version of the fsl-qdma driver (available in v6.2-rc4). Now dmatest no longer hangs, but it does terminate with an error. After queueing 4106 memcpy operations, QDMA finally returns an error here:

fsl_qdma_request_enqueue_desc()

The newest driver version apparently adds a timeout, so it fails gracefully rather than hanging forever.

 

But it still doesn't solve my problem. Why is QDMA never retiring DMA commands or calling the IRQ handler? Is there something that needs to be done to make this block functional?  Any advice is appreciated. Other than commenting out eDMA, I have not changed the dts files previously attached.

 

 

 


1,612 Views
AbelianMeme
Contributor III
Here is what I can see the fsl_qdma driver is setting up in the registers. I don't see anything obviously wrong with this. Perhaps someone more skilled than I can see the problem. Why would the DMA not be working and why am I not receiving interrupts?

(BTW, while doing this I noticed a bug in the latest version of the fsl-qdma.c driver. It does not affect anything on my setup, so I ignored it, but the maintainer may wish to correct it. I have marked with (**) the two registers that are set in the loop for every block. However, the LS1046A Reference Manual (section 28) says that, unlike the other registers, these two registers exist only on block 0. Either the driver is wrong or the manual is wrong; whichever one it is should be corrected.)

[ 14.210941] fsl_qdma_reg_init : B0CQ0MR = 0x80860000
[ 14.216264] fsl_qdma_reg_init : B0CQ1MR = 0x80860000
[ 14.221581] fsl_qdma_reg_init : B0CQ2MR = 0x80860000
[ 14.226898] fsl_qdma_reg_init : B0CQ3MR = 0x80860000
[ 14.232226] fsl_qdma_reg_init : SQCCMR = 0x00200000 (**)
[ 14.239368] fsl_qdma_reg_init : B0SQEPAR = 0xe2d80000
[ 14.246510] fsl_qdma_reg_init : B0SQDPAR = 0xe2d80000
[ 14.253651] fsl_qdma_reg_init : B0CQ0IER = 0x00008000
[ 14.260793] fsl_qdma_reg_init : B0SQICR = 0x80058000
[ 14.267847] fsl_qdma_reg_init : CQIER = 0x80000001 (**)
[ 14.275336] fsl_qdma_reg_init : B0SQMR = 0x80060000
[ 14.280564] fsl_qdma_reg_init : B0SQSR = 0x00020000
[ 14.285791] fsl_qdma_reg_init : FSL_QDMA_DMR = 0x00000000
[ 14.291540] fsl_qdma_reg_init : FSL_QDMA_DEDR = 0x00000000
[ 14.297377] fsl_qdma_reg_init : FSL_QDMA_DEIER = 0xff000000
[ 14.303299] driver: 'fsl-qdma': driver_bound: bound to device '8380000.dma-controller'
[ 14.303334] bus: 'platform': really_probe: bound device 8380000.dma-controller to driver fsl-qdma
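As a cross-check on the first four lines of this dump, the B0CQnMR value can be recomputed from the queue size. The sketch below assumes the bit layout and ilog2-based encoding that mainline fsl-qdma.c appears to use (enable in bit 31, EI in bit 30, CD_THLD starting at bit 20, CQ_SIZE starting at bit 16). The macro and function names are local stand-ins, not the driver's own.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit layout of BnCQmMR; verify against the reference manual */
#define BCQMR_EN          0x80000000u            /* queue enable */
#define BCQMR_EI          0x40000000u            /* "Set EI": enqueue increment */
#define BCQMR_CD_THLD(x)  ((uint32_t)(x) << 20)  /* threshold field */
#define BCQMR_CQ_SIZE(x)  ((uint32_t)(x) << 16)  /* ring size field */

/* Integer log2 for a non-zero value */
static unsigned int ilog2_u32(uint32_t v)
{
    unsigned int r = 0;

    while (v >>= 1)
        r++;
    return r;
}

/* Value expected in BnCQmMR right after init for a ring of n_cq entries */
static uint32_t bcqmr_init_value(uint32_t n_cq)
{
    unsigned int lg;

    /* n_cq must be a power of two and at least 64 for the encoding to fit */
    assert(n_cq >= 64 && (n_cq & (n_cq - 1)) == 0);

    lg = ilog2_u32(n_cq);
    return BCQMR_EN | BCQMR_CD_THLD(lg - 4) | BCQMR_CQ_SIZE(lg - 6);
}
```

Under these assumptions, bcqmr_init_value(4096) gives 0x80860000, the value logged for B0CQ0MR through B0CQ3MR with queue-sizes = <4096 ...>, and the EI bit is clear at init as expected.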

1,606 Views
AbelianMeme
Contributor III

This will be the last update I make for a while. I am hoping someone can offer me some guidance from here. I now have a handle on *WHAT* is happening. I have no idea *WHY* it is happening. I added some more debugging information in fsl-qdma, and was able to discover the following:

The first thing to understand: the "Set EI" command (described under register B0CQ0MR in section 28.3.1.25.4 of the LS1046A Reference Manual) is what causes QDMA to advance the enqueue pointer and process the commands. I have printed out the important registers before and after issuing this command to see what the QDMA block is doing.

The problem: the system dies after processing 4 commands, and never even completes those 4 commands.

Question:  What is causing this behavior?

 

QUICK EDIT HERE:  The likely reason it is stopping at 4 is mentioned in section 28.1.4:

• Supports single dequeue from command queues, store up to 4 prefetched command descriptors.

But apparently it never executes any of these prefetched command descriptors.

 

Below is the log I created from the function fsl_qdma_enqueue_desc. Lines marked with ** are annotations I added by hand.

Note: "Set EI" occurs at the end of fsl_qdma_enqueue_desc at the following lines:
   reg = qdma_readl(fsl_chan->qdma, block + FSL_QDMA_BCQMR(fsl_queue->id));
   reg |= FSL_QDMA_BCQMR_EI;
   qdma_writel(fsl_chan->qdma, reg, block + FSL_QDMA_BCQMR(fsl_queue->id));

 

-----
** Put the following command into the queue Q[0]:
[   60.439507] fsl_qdma_enqueue_desc :
    cmd@0000 = { .status = 40000000, .cfg = 20000000, .data = 00000000801f8010 }
-----
** Set EI, EPAR -> EPAR+1 (0x10)
[   60.439514] fsl_qdma_enqueue_desc : Before EI - B0CQ0DPAR = 0xe2d90000
[   60.439519] fsl_qdma_enqueue_desc : Before EI - B0CQ0EPAR = 0xe2d90000
[   60.439524] fsl_qdma_enqueue_desc : After EI - B0CQ0DPAR = 0xe2d90000
[   60.439529] fsl_qdma_enqueue_desc : After EI - B0CQ0EPAR = 0xe2d90010
-----
** Put the following command into the queue Q[1]
[   60.439535] fsl_qdma_enqueue_desc :
    cmd@0010 = { .status = 40000000, .cfg = 20000000, .data = 00000000801f8050 }
-----
** Q[0] was processed. DPAR -> DPAR+1. (0x10) Set EI. EPAR -> EPAR+1 (0x20)
[   60.439541] fsl_qdma_enqueue_desc : Before EI - B0CQ0DPAR = 0xe2d90010
[   60.439545] fsl_qdma_enqueue_desc : Before EI - B0CQ0EPAR = 0xe2d90010
[   60.439550] fsl_qdma_enqueue_desc : After EI - B0CQ0DPAR = 0xe2d90010
[   60.439554] fsl_qdma_enqueue_desc : After EI - B0CQ0EPAR = 0xe2d90020
-----
** Put the following command into the queue Q[2]
[   60.439559] fsl_qdma_enqueue_desc :
    cmd@0020 = { .status = 40000000, .cfg = 20000000, .data = 00000000801f8090 }
-----
** Q[1] was processed. DPAR -> DPAR+1. (0x20) Set EI. EPAR -> EPAR+1 (0x30)
[   60.439565] fsl_qdma_enqueue_desc : Before EI - B0CQ0DPAR = 0xe2d90020
[   60.439569] fsl_qdma_enqueue_desc : Before EI - B0CQ0EPAR = 0xe2d90020
[   60.439573] fsl_qdma_enqueue_desc : After EI - B0CQ0DPAR = 0xe2d90020
[   60.439577] fsl_qdma_enqueue_desc : After EI - B0CQ0EPAR = 0xe2d90030
-----
** Put the following command into the queue Q[3]
[   60.439582] fsl_qdma_enqueue_desc :
    cmd@0030 = { .status = 40000000, .cfg = 20000000, .data = 00000000801f80d0 }
-----
** Q[2] was processed. DPAR -> DPAR+1. (0x30) Set EI. EPAR -> EPAR+1 (0x40)
[   60.439587] fsl_qdma_enqueue_desc : Before EI - B0CQ0DPAR = 0xe2d90030
[   60.439591] fsl_qdma_enqueue_desc : Before EI - B0CQ0EPAR = 0xe2d90030
[   60.439595] fsl_qdma_enqueue_desc : After EI - B0CQ0DPAR = 0xe2d90030
[   60.439599] fsl_qdma_enqueue_desc : After EI - B0CQ0EPAR = 0xe2d90040
-----
** Put the following command into the queue Q[4]
[   60.439605] fsl_qdma_enqueue_desc :
    cmd@0040 = { .status = 40000000, .cfg = 20000000, .data = 00000000801f8110 }
-----
** Q[3] was processed. DPAR -> DPAR+1. (0x40) Set EI. EPAR -> EPAR+1 (0x50)
[   60.439610] fsl_qdma_enqueue_desc : Before EI - B0CQ0DPAR = 0xe2d90040
[   60.439614] fsl_qdma_enqueue_desc : Before EI - B0CQ0EPAR = 0xe2d90040
[   60.439618] fsl_qdma_enqueue_desc : After EI - B0CQ0DPAR = 0xe2d90040
[   60.439622] fsl_qdma_enqueue_desc : After EI - B0CQ0EPAR = 0xe2d90050
-----
** Put the following command into the queue Q[5]
[   60.439627] fsl_qdma_enqueue_desc :
    cmd@0050 = { .status = 40000000, .cfg = 20000000, .data = 00000000801f8150 }
-----
********************************************************************************
** OOPS! Q[4] was not processed. DPAR never changes again. No completion IRQs ever arrive.   <======
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[   60.439632] fsl_qdma_enqueue_desc : Before EI - B0CQ0DPAR = 0xe2d90040
[   60.439636] fsl_qdma_enqueue_desc : Before EI - B0CQ0EPAR = 0xe2d90050
[   60.439640] fsl_qdma_enqueue_desc : After EI - B0CQ0DPAR = 0xe2d90040
[   60.439643] fsl_qdma_enqueue_desc : After EI - B0CQ0EPAR = 0xe2d90060
-----
[   60.439649] fsl_qdma_enqueue_desc :
    cmd@0060 = { .status = 40000000, .cfg = 20000000, .data = 00000000801f8190 }
-----
[   60.439654] fsl_qdma_enqueue_desc : Before EI - B0CQ0DPAR = 0xe2d90040
[   60.439657] fsl_qdma_enqueue_desc : Before EI - B0CQ0EPAR = 0xe2d90060
[   60.439661] fsl_qdma_enqueue_desc : After EI - B0CQ0DPAR = 0xe2d90040
[   60.439665] fsl_qdma_enqueue_desc : After EI - B0CQ0EPAR = 0xe2d90070
-----
[   60.439671] fsl_qdma_enqueue_desc :
    cmd@0070 = { .status = 40000000, .cfg = 20000000, .data = 00000000801f81d0 }
-----
[   60.439676] fsl_qdma_enqueue_desc : Before EI - B0CQ0DPAR = 0xe2d90040
[   60.439679] fsl_qdma_enqueue_desc : Before EI - B0CQ0EPAR = 0xe2d90070
[   60.439684] fsl_qdma_enqueue_desc : After EI - B0CQ0DPAR = 0xe2d90040
[   60.439687] fsl_qdma_enqueue_desc : After EI - B0CQ0EPAR = 0xe2d90080
-----
...
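The pointer values in this log can be turned into descriptor counts with a little arithmetic. A sketch, assuming a 16-byte descriptor stride (the cmd@0000, cmd@0010, ... offsets above) and a ring base of 0xe2d90000 (the first logged DPAR/EPAR value); the function names are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define DESC_SIZE 16u  /* stride seen in the log: cmd@0000, cmd@0010, ... */

/* Index of the descriptor that a queue pointer register refers to */
static uint32_t ring_index(uint32_t base, uint32_t ptr)
{
    return (ptr - base) / DESC_SIZE;
}

/* Descriptors enqueued by software but not yet dequeued by hardware */
static uint32_t ring_pending(uint32_t epar, uint32_t dpar, uint32_t n_desc)
{
    /* the modulo handles pointer wrap-around only for power-of-two rings */
    assert(n_desc != 0 && (n_desc & (n_desc - 1)) == 0);

    return ((epar - dpar) / DESC_SIZE) % n_desc;
}
```

For the last entry shown, ring_index(0xe2d90000, 0xe2d90040) is 4 and ring_pending(0xe2d90080, 0xe2d90040, 4096) is 4. In other words, DPAR froze after exactly four descriptors were dequeued, which lines up with the four-entry prefetch limit quoted from section 28.1.4, while the backlog behind it keeps growing by one per enqueue.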

 

 


1,530 Views
khushbur
NXP TechSupport

Hi @AbelianMeme 

 

Can you please try testing with the latest LSDK 21.08 and let me know the results?

 

Thanks

Khushbu


1,571 Views
stadium_aquino
Contributor IV

I don't know what's causing your problem, but you can try posting your findings to the Linux ARM mailing list. I emailed them about a DMA problem I had recently and they were very helpful in debugging it.
