DMA copy from an LS1046A EP in a guest VM


zy_mooncity
Contributor III

Hi folks,

In my scenario, two LS1046A boards are cross-connected for data transfer. One LS1046A acts as RC, the other as EP. The RC can read and write a specific memory region in the EP through a BAR, from a VM running VxWorks. But the performance is not good. To improve the data transfer performance, I want to perform the transfers with DMA through the eDMA controller.

The SMMU should be used to do so, both for eDMA and PCI. The eDMA is configured with an ICID and works well in a local-memory-to-local-memory test. For eDMA, only the eDMA ICID register in SCFG and an SMMU map entry need to be configured.

But the PCI controller is not configured through an SCFG register. I found that the PEX LUT entries might be used for this configuration.

But I did not find any guide on how to configure these registers to associate them with an SMMU entry, in either the LS1046A reference manual or the SMMUv2 specification.

My questions are:

  1. What are the steps to configure the PCIe controller to associate it with an SMMU?
  2. What is the relationship among the REQID, ICID (in a PEX LUT entry) and StreamID in the SMMU? What are the rules for choosing the REQID and ICID for PCIe and the StreamID for the SMMU?
  3. Does only the RC need to be configured, or does the LS1046A EP also need to be configured?

Thanks in advance.

Regards

Yun


yipingwang
NXP TechSupport

Please refer to the suggestion from the AE team:

Most likely, we will have to reassign this issue to the SW team, since the solution would come from them. I just want to get some clarification from the HW and PCIe point of view to understand the customer’s application, assuming this is for VxWorks software development done by our third party, Wind River.

 

So the fundamental requirement is to boost the performance of PCIe transactions by using qDMA, is that right? The customer used the term eDMA; I guess this is a typo. Please double-check that they meant qDMA, since eDMA is not tied to PCIe. BTW, if the qDMA transaction is outbound, I don’t think we need to involve the StreamID. I think our SW team can confirm that.

 

Does the customer have to use a VM? Normally, qDMA/PCIe transactions don’t need a VM, which implies that we do NOT need a unique StreamID to tag each EP/Function’s transactions. In other words, more naturally, we can assign one StreamID to ALL downstream PCIe peripherals/EPs, which still fulfills the qDMA usage model. The reason for this SW model is that, as described in the LS1046A reference manual (LS1046ARM) and as the customer noticed, the version of the SMMU implemented in the LS1046 supports only 8 StreamID bits, which cannot even uniquely identify PCIe peripherals sitting on different Bus#. Therefore, hardware-wise, we are short of StreamIDs to uniquely identify different EPs, especially when a VM is used. Our SW team would be able to tell us how many StreamIDs they can actually allocate for PCIe, since StreamIDs are shared among all peripherals. The other common use of StreamIDs is when DPAA is used.

 

To work around the limitation stated above, MSI interrupts are a good example of how we can identify different PCIe devices. Luckily, our MSI service is provided by SCFG instead of the ITS. With that, we can let all PCIe devices share the same StreamID. This works well as long as the customer has no requirement to run a virtual machine.

 

Not sure if the above answer addresses some of the customer’s questions. If not, let me try to answer them directly:

  1. What are the steps to configure the PCIe controller to associate it with an SMMU?

I would leave this one to be answered by our SW team, since that’s their territory.

 

  2. What is the relationship among the REQID, ICID (in a PEX LUT entry) and StreamID in the SMMU? What are the rules for choosing the REQID and ICID for PCIe and the StreamID for the SMMU?

This can be explained from the HW side. Let me try. I think the customer is familiar with the PCIe base spec and the Requester ID definition in the PCIe TLP header: Bytes 4 and 5 of the header, with Byte 4’s 8 bits holding the Bus# and Byte 5’s 8 bits shared by the Device Number (bits [7:3]) and the Function Number (bits [2:0]). In short, to uniquely identify transactions from all possible Bus#/Dev#/Fun# downstream peripherals, we need 16-bit Requester IDs to cover all the B/D/F combinations.
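The bit layout described above can be sketched in C as follows. This is only an illustration of the Bus/Device/Function packing; the function names are hypothetical and do not come from any NXP or VxWorks API.

```c
#include <stdint.h>

/* Pack Bus/Device/Function into a 16-bit PCIe Requester ID:
 * bits [15:8] = bus, bits [7:3] = device, bits [2:0] = function.
 * Out-of-range device/function values are masked to their field width. */
static inline uint16_t pcie_requester_id(uint8_t bus, uint8_t dev, uint8_t fn)
{
    return (uint16_t)(((uint16_t)bus << 8) | ((dev & 0x1Fu) << 3) | (fn & 0x07u));
}

/* Field extractors for the reverse direction. */
static inline uint8_t pcie_rid_bus(uint16_t rid) { return (uint8_t)(rid >> 8); }
static inline uint8_t pcie_rid_dev(uint16_t rid) { return (uint8_t)((rid >> 3) & 0x1Fu); }
static inline uint8_t pcie_rid_fn(uint16_t rid)  { return (uint8_t)(rid & 0x07u); }
```

For example, an EP seen by the RC at bus 1, device 0, function 0 would carry Requester ID 0x0100 under this layout.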

 

For our Layerscape SoCs, the StreamIDs allocated to inbound PCIe transactions are based on the ICIDs and a couple of other AXI side-band signals passed from the PCIe controller’s PEX_LUT block to the SMMU. Different SoCs have different numbers of StreamID bits defined in the SMMU chapter of the RM. The LS1046 has 8 bits. The LS2088 and LS1088 adopt 10 bits, as defined by ARM MMU-500 r1p0 and below, while the latest LX2160 adopts the full 16-bit StreamID width defined by ARM MMU-500 r2p0 and newer.
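To make the shortage concrete, here is a small arithmetic sketch comparing the StreamID space at the bit widths quoted above with the 16-bit Requester ID space. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCIE_REQUESTER_ID_BITS 16u

/* Number of distinct IDs representable with the given bit width (bits < 32). */
static inline uint32_t id_space(unsigned bits)
{
    return (uint32_t)1u << bits;
}

/* True if every possible Requester ID could get its own StreamID. */
static inline bool streamid_covers_all_bdf(unsigned streamid_bits)
{
    return id_space(streamid_bits) >= id_space(PCIE_REQUESTER_ID_BITS);
}
```

With 8 bits there are 256 StreamIDs for 65,536 possible Requester IDs, and those 256 are shared with all other peripherals, so several requesters have to share one StreamID.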

 

As described in every Layerscape SoC’s PCI Express chapter, the PEX_LUT logic block is responsible for generating the required mapping between a StreamID and a particular transaction’s Requester ID. Without getting into too much detail, there are 32 PEX_LUT entries defined for each PEX controller. In each entry, a predefined PEXLnUDR[MASK] is used to match an inbound PEX request’s Requester ID bit-wise against PEXLnUDR[REQID]. If there is a match, the PEXLnLDR[PL, BMT, ICID] bit fields are driven on the AXI side-band signals passed to the SMMU for further processing.
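The lookup described above can be modeled in software roughly as follows. This is a sketch under several assumptions, not the hardware register layout: the struct is hypothetical, a set mask bit is assumed to mean "compare this bit" (the real polarity must be checked in the reference manual), the first enabled matching entry is assumed to win, and the PL and BMT fields are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PEX_LUT_ENTRIES 32  /* entries per PEX controller, as stated above */

/* Hypothetical software model of one LUT entry (not the register format). */
struct pex_lut_entry {
    bool     enabled;
    uint16_t reqid;  /* value compared against the inbound Requester ID */
    uint16_t mask;   /* ASSUMPTION: 1 = compare this bit, 0 = ignore it */
    uint8_t  icid;   /* ICID passed toward the SMMU on a match */
};

/* Return true and store the ICID of the first enabled entry that matches. */
static bool pex_lut_lookup(const struct pex_lut_entry lut[PEX_LUT_ENTRIES],
                           uint16_t rid, uint8_t *icid)
{
    for (size_t i = 0; i < PEX_LUT_ENTRIES; i++) {
        /* XOR exposes differing bits; the mask keeps only compared bits. */
        if (lut[i].enabled && ((rid ^ lut[i].reqid) & lut[i].mask) == 0) {
            *icid = lut[i].icid;
            return true;
        }
    }
    return false;
}
```

In this model, an entry with reqid 0x0100 and mask 0xFF00 gives every device and function on bus 1 the same ICID, which corresponds to the "one StreamID for all downstream EPs" approach mentioned earlier in this reply.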

 

Based on our SoC architecture definition, the full 16-bit Requester ID required for PCIe transactions is implemented. Therefore, the limitation on StreamID bits is not within the PCI Express controller IP and its wrapper blocks.

 

  3. Does only the RC need to be configured, or does the LS1046A EP also need to be configured?

I think just RC, unless I missed something from SW side.

 

Let us know if this addresses some of the customer’s questions. Please elaborate a little bit on their application needs so that we can get the SW team involved for further help. Thanks!

zy_mooncity
Contributor III

Thanks so much for the response. 

I finally succeeded in performing the DMA using qDMA.
