PCIe DMA Transfer failure


alessandrocamel
Contributor I

Hi all,

 

I'm working with the T2080RDB and the QorIQ-SDK-V2.0-20160527, and I have plugged a PCIe device into the slot on the board.

The device driver loads correctly and I can read and write all the BARn registers and the device's memory.

The problems start when using DMA transfers (in both directions).

 

The error is the following:

 

Oct 16 11:00:48 t2080rdb kernel: PCIe error(s) detected
Oct 16 11:00:48 t2080rdb kernel: PCIe ERR_DR register: 0x00100000
Oct 16 11:00:48 t2080rdb kernel: PCIe ERR_CAP_STAT register: 0x80000001
Oct 16 11:00:48 t2080rdb kernel: PCIe ERR_CAP_R0 register: 0x00000800
Oct 16 11:00:48 t2080rdb kernel: PCIe ERR_CAP_R1 register: 0x00000000
Oct 16 11:00:48 t2080rdb kernel: PCIe ERR_CAP_R2 register: 0x00000000
Oct 16 11:00:48 t2080rdb kernel: PCIe ERR_CAP_R3 register: 0x00000000

 

I tried to decode the message above by reading the T2080RM reference manual, and I found this:

 

PNM: PCI Express no map. A no-map transaction was detected in RC mode.

 

What does it mean?

I can't tell whether the issue is on the host side or on the endpoint side.

 

Could someone kindly help me?

 

Attached to this question are the syslog and all the messages printed by the board on the console.

 

Any kind of help will be appreciated.

Original Attachment has been moved to: boot.log.txt.zip

Original Attachment has been moved to: syslog.zip

9 Replies

ufedor
NXP Employee

> I didn't understand if the issue is on the host or Endpoint side.

ERR_DR[PNM]=1 means that the T2080 PCIe controller detected a request from an external master trying to access an address that is not mapped to any inbound window.
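As a rough illustration only (the function and names below are made up, not taken from your driver), the address programmed into the endpoint's DMA engine has to be the dma_addr_t produced by the Linux DMA API, so that the endpoint's memory reads and writes fall inside an inbound window of the Root Complex:

#include <linux/pci.h>
#include <linux/dma-mapping.h>

/* Hypothetical sketch: map a data buffer so the endpoint can master it. */
static int example_map_buffer(struct pci_dev *pdev, void *buf, size_t len,
                              dma_addr_t *bus_addr)
{
        /* Declare which addresses the endpoint is able to generate. */
        if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32)))
                return -EIO;

        /* Map the buffer; *bus_addr is the address the device must use. */
        *bus_addr = dma_map_single(&pdev->dev, buf, len, DMA_TO_DEVICE);
        if (dma_mapping_error(&pdev->dev, *bus_addr))
                return -ENOMEM;

        /* Program *bus_addr (not buf, not virt_to_phys(buf)) into the
         * endpoint's DMA registers. An address outside the inbound windows
         * is what raises ERR_DR[PNM] on the T2080 side. */
        return 0;
}

On x86 the bus address often happens to equal the physical address, so a driver that confuses the two can still appear to work there.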

alessandrocamel
Contributor I

Hi Ufedor

Thanks for your fast reply.

The event you describe is quite strange; I have just one device connected to the PCIe interface.

Just one host and just one endpoint.

Furthermore, the same board with the same driver works well if I use a standard PC (x86 or x86_64) as the host.

Could it be an endianness issue?

In other words, the host reads data from the device plugged into the PCIe slot with the wrong endianness and then uses this (byte-swapped) data to configure the DMA engine.
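For what it's worth, this is how I understand the endianness handling should look on a big-endian PowerPC host talking to a little-endian PCIe device (just a sketch, not code from the actual driver):

#include <linux/io.h>
#include <linux/types.h>
#include <asm/byteorder.h>

static u32 read_device_reg(void __iomem *bar, unsigned long off)
{
        /* ioread32()/readl() already byte-swap on big-endian PowerPC,
         * because PCI MMIO registers are little-endian by convention. */
        return ioread32(bar + off);
}

static void fill_descriptor_field(__le32 *field, u32 value)
{
        /* Data structures the device reads via DMA are not swapped by
         * any accessor, so in-memory fields need explicit conversion. */
        *field = cpu_to_le32(value);
}

If the driver used __raw_readl()/__raw_writel(), or wrote descriptor fields in CPU byte order, the values would indeed look byte-swapped from the device's point of view.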

Thanks in advance

ufedor
NXP Employee

> the same driver works well if I use a standard PC (x86 or x86_64) as a host.

Please confirm the driver capabilities with its developers.

alessandrocamel
Contributor I

Hi ufedor

Thanks for the suggestion, I'll try.

I still have some other doubts about the DMA capability of the QorIQ T208x platform.

At boot time I see these messages:

platform ffe240000.pcie:pcie@0: Invalid size 0xfffff9 for dma-range

platform ffe250000.pcie:pcie@0: Invalid size 0xfffff9 for dma-range

platform ffe270000.pcie:pcie@0: Invalid size 0xfffff9 for dma-range

Are these messages a symptom of a wrong configuration in the DTB file?
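For reference, this is roughly what I would expect a valid dma-ranges entry to look like according to the standard PCI binding (the node and all values below are purely illustrative, not taken from the real t2080rdb device tree):

/* Child (PCI) address = 3 cells, parent (CPU) address = 2 cells, size = 2 cells. */
pcie@0 {
        #address-cells = <3>;
        #size-cells = <2>;
        /* 32-bit memory space, PCI address 0x0 -> CPU address 0x0, size 4 GiB */
        dma-ranges = <0x02000000 0x0 0x00000000
                      0x0 0x00000000
                      0x1 0x00000000>;
};

The boot warning suggests the size cells in my DTB are being decoded as 0xfffff9, which does not look like a sensible window size, so the property is probably malformed or mis-sized.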

And, last but not least, is the DMA memory contiguous?

Regards

ufedor
NXP Employee

The DTB seems to be incorrect.

According to my understanding it is not absolutely required for the DMA memory ranges to be contiguous, but this could simplify the system memory map.

alessandrocamel
Contributor I

Hi ufedor

As you suggested, I rechecked the driver, but unfortunately I didn't find anything useful.

I attached a screenshot of the differences I see when comparing the syslog obtained with the driver on an x86_64 system and on the T2080RDB.

x86_vs_ppc64.png

All the addresses used (virtual or otherwise) are, to my understanding, correct.

Could you please suggest any other actions to perform?

Regards

ufedor
NXP Employee

It would be useful to attach a PCIe bus analyzer and capture a trace containing the problem transaction.

alessandrocamel
Contributor I

Hi Ufedor

I studied the device driver more deeply and I think I have understood the problem.

My device works in this way:

The DMA Subsystem for PCIe uses a linked list of descriptors that specify the source, destination, and length of the DMA transfers. Descriptor lists are created by the driver and stored in host memory. The DMA channel is initialized by the driver with a few control registers to begin fetching the descriptor lists and executing the DMA operations.

Descriptors describe the memory transfers that the DMA Subsystem for PCIe should perform.

After the channel is enabled, the descriptor channel begins to fetch descriptors from the initial address.

The host-to-device flow sequence is as follows:

1. Open the device and initialize the DMA.

2. The user program reads the data file, allocates a buffer pointer, and passes the pointer to write function with the specific device and data size.

3. The driver creates descriptors based on the input data/size and initializes the DMA with the descriptor start address and the number of adjacent descriptors, if any.

4. The driver writes a control register to start the DMA transfer.

5. The DMA reads the descriptors from the host and starts processing each one.

6. The DMA fetches data from the host and sends the data to the user side. After all data is transferred based on the settings, the DMA generates an interrupt to the host.

7. The ISR driver processes the interrupt to find out which engine is sending the interrupt and checks the status to see if there are any errors. It also checks how many descriptors are processed.

8. If the status is good, the driver returns the transferred byte length to the user side so it can be checked.

----------------------------------------------------------------------------------------------------------------------------------------------------------

I think the PCIe transfer stops working at step 5: the PCIe controller does not permit the device to fetch the descriptors from host memory.

In my opinion this is the cause of the unmapped inbound transaction detected by the PCIe controller.
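To make my understanding concrete, here is a generic sketch of how I believe the descriptor list has to be set up on the host side (this is not the actual descriptor layout of my device, just an illustration using the standard kernel DMA API):

#include <linux/pci.h>
#include <linux/dma-mapping.h>
#include <linux/types.h>
#include <linux/gfp.h>
#include <asm/byteorder.h>

struct example_desc {                 /* illustrative field layout */
        __le64 src_addr;              /* source bus address */
        __le64 dst_addr;              /* destination bus address */
        __le32 len;                   /* transfer length in bytes */
        __le32 control;               /* flags: last descriptor, IRQ, ... */
        __le64 next;                  /* bus address of the next descriptor */
};

static int example_alloc_desc_ring(struct pci_dev *pdev, int n,
                                   struct example_desc **ring,
                                   dma_addr_t *ring_bus)
{
        int i;

        /* Coherent, physically contiguous memory the endpoint can fetch. */
        *ring = dma_alloc_coherent(&pdev->dev, n * sizeof(**ring),
                                   ring_bus, GFP_KERNEL);
        if (!*ring)
                return -ENOMEM;

        /* Link the descriptors using bus addresses, never virtual pointers. */
        for (i = 0; i < n - 1; i++)
                (*ring)[i].next =
                        cpu_to_le64(*ring_bus + (i + 1) * sizeof(**ring));
        (*ring)[n - 1].next = 0;

        /* The "descriptor start address" written to the device must be
         * *ring_bus; any other address may fall outside the RC inbound
         * windows and produce exactly the ERR_DR[PNM] error I am seeing. */
        return 0;
}

If the driver instead hands the device a raw physical or virtual address for the descriptors, the fetch at step 5 would hit an unmapped inbound address on the T2080 even though it happens to work on x86_64.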

Why is the T2080's behaviour different from that of an x86_64 PC?

Can I change the behaviour of the host PCIe controller in order to permit this access?

If yes, what do I have to do?

Regards

ufedor
NXP Employee

I believe that it is reasonable to consult with the driver developers.
