How to Disable Memory Barrier in SDMA


turgaypamuklu
Contributor II

Hi,

On my i.MX6 chip, I wrote an SDMA script that transfers data between ARM memory and the EIM bus (with an FPGA on the other side). When I ran the script, I observed that each transfer takes approximately 5 EIM clock cycles (address + 1 DWord of data + wait time), while the gaps between transfers are around 25 EIM clock cycles. Suspecting a performance problem, I performed the same operation from a kernel thread on the ARM CPU. The timings were the same! In the end, I realized that the memory barriers in the write and read operations caused this large gap between operations. So I switched to writel_relaxed and readl_relaxed, which do not call the barrier functions, and the gap dropped to only 4 EIM clock cycles. I therefore think the same problem occurs in SDMA operations, because the gap durations are exactly the same.
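To make the difference between the two access styles concrete, here is a minimal userspace sketch in C. It is only a model: `mock_writel`, `mock_writel_relaxed`, and `mock_barrier` are hypothetical stand-ins for the kernel's `writel`, `writel_relaxed`, and a barrier such as `wmb()`, and `barrier_count` exists only to count how many barriers each strategy issues. It assumes the ARM behavior where `writel` issues a barrier before the store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Counts barriers issued so the two strategies can be compared. */
static unsigned long barrier_count;

/* Hypothetical stand-in for a kernel barrier; NOT the kernel implementation. */
static void mock_barrier(void)
{
    atomic_thread_fence(memory_order_seq_cst);
    barrier_count++;
}

/* Stand-in for writel(): barrier first, then the store (assumed ARM ordering). */
static void mock_writel(uint32_t value, volatile uint32_t *addr)
{
    mock_barrier();
    *addr = value;
}

/* Stand-in for writel_relaxed(): the store only, no barrier. */
static void mock_writel_relaxed(uint32_t value, volatile uint32_t *addr)
{
    *addr = value;
}

/* One barrier per word: n barriers for n words. */
void copy_with_barriers(volatile uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        mock_writel(src[i], &dst[i]);
}

/* Relaxed stores with one barrier before and one after: 2 barriers total. */
void copy_relaxed(volatile uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    mock_barrier();
    for (size_t i = 0; i < n; i++)
        mock_writel_relaxed(src[i], &dst[i]);
    mock_barrier();
}
```

In this model, copying n words costs n barriers in the first variant and a constant two in the second, which is why the per-access gap shrinks on the CPU side. It says nothing about what the SDMA engine does internally.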

In conclusion, I would like to know whether there is any configuration or method to disable the memory barrier in SDMA. I see this gap in every SDMA functional unit (peripheral and burst) and in every type of operation (prefetch, copy mode, read and write modes...).

Best Regards.

Turgay Pamuklu.

3 Replies

turgaypamuklu
Contributor II

I have attached the Chipscope waveforms of three situations to this post. The first one shows the signal values of EIM operations between the CPU and the FPGA without barriers (relaxed mode). CS2 is the chip select for the operation. It is an active-low signal; while it is low, the CPU is allowed to perform an EIM operation. CS2 is high between the X and O vertical lines, and this gap is around 5 EIM clock cycles in relaxed mode.

relaxed.png

The second image (below) shows the signal values of EIM operations between the CPU and the FPGA with barriers (normal mode). The gap between the X and O vertical lines is 25 EIM clock cycles, which means that in barrier mode the utilization of the EIM bus is around 20%. I solved this utilization problem by reading and writing in relaxed mode during the EIM operations and adding only one barrier at the beginning and one at the end of the overall operation. However, the CPU cannot run in burst mode without SDMA, so I need to solve this utilization problem in SDMA.

normal.png
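The utilization figure can be checked with a short calculation. The sketch below assumes utilization is simply transfer cycles divided by transfer plus gap cycles; the function name `bus_utilization` is illustrative only.

```c
#include <assert.h>

/*
 * Fraction of bus time spent transferring, given the cycles per transfer
 * and the idle gap cycles between consecutive transfers.
 */
double bus_utilization(unsigned transfer_cycles, unsigned gap_cycles)
{
    /* At least one cycle must elapse per period, otherwise the ratio is undefined. */
    assert(transfer_cycles + gap_cycles > 0);
    return (double)transfer_cycles / (double)(transfer_cycles + gap_cycles);
}
```

With the measured values, `bus_utilization(5, 25)` is 5/30, about 17%, in line with the roughly 20% quoted above. With the 4 to 5 cycle gap reported for relaxed mode, the same formula gives about 50% to 56%.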

The last image (below) shows the signal values of EIM operations between the SDMA and the FPGA. The gap between the X and O vertical lines is 25 EIM clock cycles, the same as in the barrier-mode communication between the CPU and the FPGA.

sdma.png

I hope someone knows the reason for this gap between EIM operations in the SDMA. I think the reason is barrier-mode operation in the SDMA, because the second and last images show the same gap, but I do not know how to prove this or how to disable it.

Best Regards.

Turgay Pamuklu.

igorpadykov
NXP Employee

Hi turgay

SDMA itself is not affected by memory barriers; however, initiating an SDMA transfer is done as a usual Linux command, and that may be affected by barriers. I suggest posting this on kernel.org, as this seems to be a kernel-specific issue.

Best regards

igor

turgaypamuklu
Contributor II

Hi igorpadykov,

In my script, the SDMA works independently of the processor and Linux. The script reads from and writes to the EIM bus periodically. The following code is an example of my peripheral operation:

------------------------------------------

#Reg[4] = FPGA_BASE_ADDR;// EIM_BUS_ADDRESS
#Reg[7] = 0xFFFFFFFF;// for always loop

getdescription:

################################################### CHECK READY FLAG

  stf r5, 0xc7        # Get the descriptor (to MSA, incremented mode, 32-bit data width)
  ldf r0, 0xc8        # Copy first DWord of the SDMA descriptor (pd,)
  btsti r7, 0         # always loop
  bf getdescription   # always loop

------------------------------------------

As I said before, with this code the SDMA has no dependency on the main processor. It is an infinite loop that reads 1 DWord from the EIM bus, and it causes huge gaps between the read operations.

I think this gap is caused by a memory barrier for two reasons. First, the gap duration in the SDMA operation is the same as that of the main processor's read operation with a memory barrier. Second, the ARM site states that "A barrier is required between a CPU memory access and a DMA operation".

Anyway, I may be totally wrong about the "memory barrier", but I cannot understand why the SDMA needs these huge gaps between its operations. The problem is not the EIM bus itself, because when I run a kthread concurrently with the SDMA operation, I can access the EIM bus and complete a read or write operation in between the SDMA operations!

Regards.

Turgay.
