
PCIe xHCI support with i.MX6

Question asked by jerome bolduc on Apr 23, 2018
Latest reply on Apr 24, 2018 by igorpadykov



I am using a Variscite VAR-SOM-SOLO module with the i.MX6 Solo, running their latest Yocto release.

>> Linux var-som-mx6 4.9.11-02300-ga1ac172-dirty #46 SMP PREEMPT Mon Apr 23 16:25:08 EDT 2018 armv7l armv7l armv7l GNU/Linux


We want to capture from USB 3.0 cameras using GStreamer. I enabled the xHCI module in the kernel and I can see our USB 3.0 camera. We are using a Renesas UPD720202 chipset as a PCIe-to-USB 3.0 bridge. The device is detected correctly and the link enumerates as Gen2. PCIe boot log:

[ 0.472685] OF: PCI: host bridge /soc/pcie@0x01000000 ranges:
[ 0.472699] OF: PCI: No bus range found for /soc/pcie@0x01000000, using [bus 00-ff]
[ 0.689975] imx6q-pcie 1ffc000.pcie: link up
[ 0.689988] imx6q-pcie 1ffc000.pcie: link up
[ 0.690000] imx6q-pcie 1ffc000.pcie: Link up, Gen2
[ 0.690203] imx6q-pcie 1ffc000.pcie: PCI host bridge to bus 0000:00
[ 0.720570] pcieport 0000:00:00.0: Signaling PME through PCIe PME interrupt
[ 0.720591] pcie_pme 0000:00:00.0:pcie001: service driver pcie_pme loaded
[ 0.720730] aer 0000:00:00.0:pcie002: service driver aer loaded
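For reference, I believe the xHCI and PCIe support mentioned above corresponds to kernel options along these lines (option names taken from mainline 4.9; the exact set may differ in the Variscite BSP defconfig):

```
# Kernel config fragment (verify against the BSP defconfig)
CONFIG_PCI_IMX6=y
CONFIG_USB_XHCI_HCD=y
CONFIG_USB_XHCI_PCI=y
```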


Using the following pipeline, which renders the video into a fakesink:

>> gst-launch-1.0 v4l2src ! video/x-raw,width=1920,height=1080,framerate=60/1 ! fpsdisplaysink video-sink=fakesink text-overlay=false -v

The result is that I only get frames from the USB 3.0 camera at about 37 fps, which corresponds to a bandwidth of about 150 MB/s. That is far below the PCIe Gen2 line rate of 5 GT/s = 625 MB/s; even after removing ~20% for the 8b/10b encoding overhead, we should get about 500 MB/s theoretically. I am concerned whether the link is really running at Gen2 speed.
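As a quick sanity check on those numbers (assuming an uncompressed 16-bit-per-pixel format such as YUYV, which is a guess, since the pixel format is not shown in the pipeline caps):

```shell
# Back-of-the-envelope USB bandwidth at the observed 37 fps,
# assuming 1920x1080 at 2 bytes/pixel (e.g. YUYV) - an assumption.
awk 'BEGIN {
  frame = 1920 * 1080 * 2          # bytes per frame
  printf "%.1f MB/s\n", frame * 37 / 1e6
}'
# prints: 153.4 MB/s
```

That matches the ~150 MB/s figure above, so the 37 fps cap does look like a bandwidth ceiling rather than a camera limit.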


What is the actual PCIe bandwidth limit of the i.MX6? I have seen the material about needing an external reference clock for a Gen2 link, but my understanding is that this only matters for PCIe compliance, and we should be able to get Gen2 working without the external clock generator. Is that right?
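One way to confirm the negotiated speed from userspace is lspci, assuming pciutils is available on the target (the 01:00.0 address for the UPD720202 below is a guess, check `lspci` output first):

```shell
# Extract the negotiated link speed from lspci's "LnkSta" line, e.g.:
#   lspci -vv -s 01:00.0 | parse_lnksta      # run on the target
parse_lnksta() {
  sed -n 's/.*LnkSta:.*Speed \([0-9.]*GT\/s\).*/\1/p'
}
# Demonstration with a captured (hypothetical) line of lspci output:
echo "LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+" | parse_lnksta
# prints: 5GT/s
```

If this reports 2.5GT/s rather than 5GT/s, the link trained at Gen1 and that alone would explain a ceiling well below 500 MB/s.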


Regarding memory bandwidth, I am using /unit_tests/MMDC/mmdc2 to measure memory utilization. With the GStreamer pipeline running, here is what I get:

i.MX6DL detected.

MMDC new Profiling results:
Measure time: 548ms
Total cycles count: 217041312
Busy cycles count: 171558534
Read accesses count: 6400933
Write accesses count: 3639669
Read bytes count: 351126122
Write bytes count: 200725584
Avg. Read burst size: 54
Avg. Write burst size: 55
Read: 611.06 MB/s / Write: 349.32 MB/s Total: 960.38 MB/s
Utilization: 40%
Overall Bus Load: 79%
Bytes Access: 54


That seems quite high. If I monitor only the ARM core:

MMDC new Profiling results:
Measure time: 555ms
Total cycles count: 219713502
Busy cycles count: 172663185
Read accesses count: 4262270
Write accesses count: 2752165
Read bytes count: 213796440
Write bytes count: 87786328
Avg. Read burst size: 50
Avg. Write burst size: 31
Read: 367.37 MB/s / Write: 150.85 MB/s Total: 518.22 MB/s
Utilization: 21%
Overall Bus Load: 78%
Bytes Access: 42
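Subtracting the ARM-only profile from the full profile gives a rough estimate of what the other bus masters (the PCIe/xHCI DMA among them) are moving; the two runs have slightly different measurement windows, so this is only approximate:

```shell
# Difference between the overall MMDC profile and the ARM-only profile,
# using the MB/s figures reported by mmdc2 above.
awk 'BEGIN {
  printf "non-ARM read:  %.2f MB/s\n", 611.06 - 367.37
  printf "non-ARM write: %.2f MB/s\n", 349.32 - 150.85
  printf "non-ARM total: %.2f MB/s\n", 960.38 - 518.22
}'
# prints:
# non-ARM read:  243.69 MB/s
# non-ARM write: 198.47 MB/s
# non-ARM total: 442.16 MB/s
```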


It looks like the CPU itself is handling the PCIe packets. Is there a way to use some kind of DMA access instead?


Let me know what you think.