i.MX6Q eCSPI RXFIFO overflow during DMA transfer

huang_dexiang · ‎08-05-2014

We are developing a device based on the i.MX6Q. We use the SPI interface (working in slave mode) to communicate with other devices. The CSPI driver in the BSP (L3.0.35_4.1.0_130816) does not support DMA transfers, so we wrote a driver that does. With it, data can be read from SPI directly through the driver. To test the driver we also wrote a test program that receives data from SPI and saves it to a TF card, but we found that the SPI RXFIFO overflows.

Attached are the source code of the driver and the test program.

When running the program, the callback function (spidev_rxdma_callback() in the driver) will usually report that the RXFIFO overflowed. Comparing the received data with the source data, we found that some data is always missing.

BTW, we also used the same kind of driver on the i.MX31, and it didn't overflow.

Can anyone give me some tips about how to fix it?

Original Attachment has been moved to: driver.rar

Original Attachment has been moved to: test.rar

jonashippold · ‎09-29-2014

Hi,

we had a similar project, using an imx6 quad core as SPI slave. We also used a buffer approach, but no DMA. With high data rates we would also see occasional FIFO overflows.

The solution was pretty simple: Instruct the CPU to handle the SPI's interrupts on another core than the rest of the system. Refer for example to this document on how to do that:

https://cs.uwaterloo.ca/~brecht/servers/apic/SMP-affinity.txt

Rates of 20 Mbit/s and more were possible then. (The imx6's Linux Manual states that the maximum data rate of the SPI interface is 12 Mbit/s ;-) )
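A minimal sketch of the affinity change described above, assuming a Linux system that exposes /proc/irq/<n>/smp_affinity as the linked document describes. The helper names are made up for illustration, and the IRQ number has to be looked up in /proc/interrupts on the actual board.

```c
#include <stdio.h>

/* Build the hex CPU mask that smp_affinity expects:
 * bit k set means CPU k may service the interrupt.
 * Only valid for cpu < 32 (the i.MX6Q has four cores). */
static unsigned int cpu_to_affinity_mask(unsigned int cpu)
{
    return 1u << cpu;
}

/* Write the mask for `cpu` to the given smp_affinity file.
 * `path` is a parameter so that no IRQ number is hard-coded here.
 * Returns 0 on success, -1 on failure. */
static int set_irq_affinity(const char *path, unsigned int cpu)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%x\n", cpu_to_affinity_mask(cpu)) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, if /proc/interrupts listed the eCSPI interrupt as IRQ 63 (a made-up number), set_irq_affinity("/proc/irq/63/smp_affinity", 3) would steer it to the fourth core. Writing to that file needs root privileges.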

So, I can't check your code for correctness, but according to the rest of this thread you might be able to make it work this way.

igorpadykov · ‎08-05-2014

Hi huang

please look at links below and verify your driver

Gmane Loom

Linux Kernel - [PATCH V1] spi: imx: add dma support for ecspi

The i.MX6 has a different SPI module than the i.MX31.

Best regards

chip

huang_dexiang · ‎08-06-2014

I did review the patch linked under Gmane Loom, and configured the eCSPI registers as the patch does. But with src_addr_width set to DMA_SLAVE_BUSWIDTH_4_BYTES, the RX FIFO overflow always occurs.

igorpadykov · ‎08-06-2014

Hi huang

probably first you can verify solution with SDK (it has ecspi test)

i.MX 6Series Platform SDK :

Best regards

chip

huang_dexiang · ‎08-06-2014

Hi, chip.

I did review the code of the SPI test in the SDK released on the FSL website, but the SPI test program was no help. The RX FIFO still overflows.

Our test case is that the master side (i.MX31) of the SPI link keeps sending data in 8-bit, maximum-speed mode, while the slave (i.mx6dq) works in 32-bit mode and receives data by DMA. The slave side always misses some data.

I tried configuring the slave side to work in 8-bit mode, but then we get only the first byte of every word. For example, if the master sends 0x12 0x34 0x56 0x78, 0xab 0xcd 0xa1 0xa2, 0xa3 0xa4 0xa5 0xa6, ..., the slave side only gets 0x12, 0xab, 0xa3, ...

I don't know how to receive the data without losing any.

chip, can you take a look at the source code I uploaded and give me some hints?

igorpadykov · ‎08-06-2014

Hi huang

not sure if it is possible to help with that, since SDMA applications are very special. Also, this use case is not supported in Freescale BSPs. If you have a commercial project, I would suggest applying to Freescale Professional Services to develop it:

http://www.freescale.com/webapp/sps/site/overview.jsp?code=CW_PROFESSIONAL

Best regards

chip

huang_dexiang · ‎08-10-2014

Hi, chip.

The eCSPI won't overflow if we don't store the data to the TF card.

But we tested the speed of the TF card on the MCIMX6Q-SDB, and the result is as follows.

arm# dd if=/dev/zero of=/mnt/sdcard/test.data bs=2K count=15360
15360+0 records in
15360+0 records out
31457280 bytes (30.0MB) copied, 0.320764 seconds, 93.5MB/s
arm#

I don't know what causes the problem when data is received from the eCSPI by DMA and written to the TF card.

Could it be an SDMA issue? Is there a conflict between the eCSPI channel and the MMC channel?

igorpadykov · ‎08-11-2014

I think you can test without SDMA - if this works, then this is probably an SDMA issue.

Also, I think testing with the SDK would help to shed additional light on this problem.

i.MX 6Series Platform SDK :

Best regards

chip

huang_dexiang · ‎08-11-2014

I did a test receiving data without DMA, using interrupts instead; the RXFIFO still overflows.

TomE · ‎08-11-2014

What clock rate are you running SPI at? If you're getting FIFO overruns, then something is delaying by 64 bytes, or 64*4 bytes worth of time.

> I try to config the slave side to work in 8-bit mode,

Is the master sending 8 bits per assertion of chip-select, 32 bits per assertion, or hundreds of bits with chip select held down? That may be related to this problem. It looks like a problem with configuration of the "burst length" somewhere.

As for the FIFO overruns, you may be seeing an L2 cache flush problem.

I remember reading about this somewhere, but can't find the reference. Something like this is noted in the following:

http://www.xenomai.org/pipermail/xenomai/2013-September/029249.html

The problem I remember was that code was writing to the framebuffer driver that was then flushing the caches to get the written data into the physical memory. Flushing the (shared) L2 cache locked out all cores, so even if you have tasks shared across cores they ALL get blocked. This could be locking out the DMA as well.

Here's a paper from ARM stating "Measurements taken on a dual-core Cortex-A9 at 1GHz with a 256K L2 cache showed that cache flushing can take of the order of 100us.". So multiply that by four for a 1M L2 cache.

http://www.arm.com/files/pdf/cachecoherencywhitepaper_6june2011.pdf

Your TF card driver may be causing this.

I'd suggest writing to the TF driver in small chunks (512 bytes) and then "fflush" after every write. That may get you less sustained delays.

SPI Slaves are very difficult to write as there's no flow control back to the master. If there are unexpected delays in the slave (and Linux is full of unexpected delays) you need some sort of "flow control protocol" layered on top of SPI to handle this. Do you have anything like that?

We had a lot of trouble with the CAN driver because it used NAPI, and delayed all I/O until the NAPI thread. Unfortunately the FEC driver didn't do this, would hog the CPU in the interrupt routine and the CAN hardware would overflow. I had to rewrite the CAN driver to buffer in the interrupts to fix this.

Your TF driver may be the problem here.

Do you manage to have SPI DMA enabled all the time, or do you have to switch DMA buffers over using interrupts? The interrupt latency might be the problem.

Does your SPI protocol allow any "negotiated flow control"? You might want the master to send a small (fit in the FIFO) command, and then wait until the slave has a large enough DMA transfer set up to take the data burst.

The main problem may be that "Linux can't do real time". You will get long unexpected stalls. If your system can't deal with and recover from this it can't work.

Tom

huang_dexiang · ‎08-14-2014

We also ran the following tests.

Testcase 1: Receive data from SPI but save it to a temporary buffer; after about 30MB of data has been received, stop reading SPI and store the data to the TF card.

Result: RXFIFO of eCSPI didn't overflow while reading SPI, and the received data is correct.

Testcase 2: Receive data from SPI and store it to a USB disk.

Result: RXFIFO of eCSPI overflows.

Testcase 3: Create two threads and set the CPU affinity of each, so that thread one runs on core 1 and thread two runs on core 2. Thread one only reads data from SPI and stores it in a ring buffer; thread two reads data from the ring buffer and stores it to the TF card.

Result: RXFIFO of eCSPI overflows.

Testcase 4: Launch two processes. Process one is a background process that only reads SPI data and does not store it to any disk. Process two is a foreground process that decodes an H.264 video file located on the TF card (or, alternatively, writes random data to the TF card).

Result: RXFIFO of eCSPI overflows.

Testcase 5: Make the SD device driver work in PIO mode and the eCSPI in DMA mode; receive data from SPI and store it to the SD card.

Result: RXFIFO of eCSPI overflows.

Testcase 6: Make the eCSPI driver work in interrupt mode instead of DMA mode, with the SD card in DMA mode; receive data from SPI and store it to the TF card.

Result: RXFIFO of eCSPI overflows.
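For readers following Testcase 3, here is a minimal sketch of a single-producer/single-consumer ring buffer of the kind described there. It is not the code from the attached test program; the names, the buffer size and the use of C11 atomics are assumptions made for illustration. Note that Testcase 3 still overflowed, so a buffer like this only decouples the two threads and does not by itself bound the latency of the SPI side.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 4096u  /* arbitrary choice; must be a power of two */

struct ring {
    uint8_t data[RING_SIZE];
    atomic_size_t head;  /* total bytes written, advanced only by the producer */
    atomic_size_t tail;  /* total bytes read, advanced only by the consumer */
};

/* Producer side (the SPI reader thread): store up to len bytes.
 * Returns the number stored; a short count means the consumer is behind. */
static size_t ring_put(struct ring *r, const uint8_t *src, size_t len)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    size_t space = RING_SIZE - (head - tail);
    if (len > space)
        len = space;
    for (size_t i = 0; i < len; i++)
        r->data[(head + i) & (RING_SIZE - 1)] = src[i];
    atomic_store_explicit(&r->head, head + len, memory_order_release);
    return len;
}

/* Consumer side (the storage writer thread): fetch up to len bytes.
 * Returns the number fetched; 0 means the buffer is empty. */
static size_t ring_get(struct ring *r, uint8_t *dst, size_t len)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    size_t avail = head - tail;
    if (len > avail)
        len = avail;
    for (size_t i = 0; i < len; i++)
        dst[i] = r->data[(tail + i) & (RING_SIZE - 1)];
    atomic_store_explicit(&r->tail, tail + len, memory_order_release);
    return len;
}
```

A struct ring with static storage starts out zeroed, which is the empty state. Pinning each thread to a core, as in the test case, would be done separately (for example with pthread_setaffinity_np() on glibc).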

TomE · ‎08-14-2014

Have you tried VERY small writes to the TF card? You didn't say you'd tried this.

> Result: RXFIFO of eCSPI overflow.

The conclusion is that the FIFO is always going to overflow if Linux is allowed to do anything else. Or if it feels like doing something else you may not expect it to do.

What I said previously still applies:

> The main problem may be that "Linux can't do real time". You will get long unexpected stalls.

> If your system can't deal with and recover from this it can't work.

SPI Slaves need to be tiny, dumb hardware-based devices, or at least (usually) small 8-bit micros in a tight code loop doing nothing else.

Can you make the i.MX the MASTER SPI device and make the other end the Slave?

You'll either have to implement a proper flow-control protocol (hardware and/or software), have the sender send LESS than a FIFO-full, and then not send again until told to, or it will have to keep copies of blocks of data and keep resending until it is received correctly.
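One way to picture the "send less than a FIFO-full, then wait until told" option is a credit scheme on the master side. The sketch below is only an illustration: the struct, the function names and the 64-byte grant are assumptions, and how the slave actually signals a grant (a GPIO line, a reply word, ...) is left open.

```c
#include <stddef.h>

/* Assumed grant size: small enough to fit the slave's RXFIFO
 * even if only one byte is stored per FIFO entry. */
#define GRANT_BYTES 64u

struct sender {
    size_t remaining;  /* bytes still to be sent */
    size_t credit;     /* bytes the slave has said it can accept right now */
};

/* Called when the slave signals that it is ready for `bytes` more. */
static void sender_grant(struct sender *s, size_t bytes)
{
    s->credit = bytes;
}

/* Returns how many bytes the master may clock out now.
 * Returns 0 when it has to wait for the next grant (or is finished). */
static size_t sender_next_burst(struct sender *s)
{
    size_t n = s->remaining < s->credit ? s->remaining : s->credit;
    s->remaining -= n;
    s->credit -= n;
    return n;
}
```

The point of the design is that the master never has more data in flight than the slave has explicitly asked for, so a stall on the Linux side delays the transfer instead of losing data.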

Or work out how to get the DMA controller to guarantee the minimum latency for you, because the main CPU can't.

The alternative is to play "Whack-a-Mole" with all the operations and other drivers everywhere else in the system that are causing the delays. If you find and fix one, then another one will pop up.

> The master running at rate of 11MHz

So 256 bytes (the FIFO size) takes 186 us. Or 46 us if there's only one byte stored per FIFO entry.
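The arithmetic behind those two figures, as a small sketch. It assumes the clock runs continuously at 11 MHz and ignores any gaps between chip-select assertions, so the real time to fill the FIFO can only be longer. The function name is made up for illustration.

```c
/* Microseconds needed to clock `bytes` bytes over SPI at `sclk_hz`,
 * one bit per clock, with no idle time between words. */
static double fifo_fill_time_us(unsigned int bytes, double sclk_hz)
{
    return (double)bytes * 8.0 / sclk_hz * 1e6;
}
```

fifo_fill_time_us(256, 11e6) gives roughly 186 us (64 entries of 4 bytes each), and fifo_fill_time_us(64, 11e6) gives roughly 46.5 us (64 entries holding one byte each).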

> We also used the same kind driver in i.MX31

It might be worth finding out what is different there. Why does it work on the i.MX31? Have you run the other tests (TF, USB, Video decode) on that platform?

Tom

huang_dexiang · ‎08-14-2014

I did try that testcase. One thread reads 2KB of data from SPI and puts it in a ring buffer, and another thread writes 512 bytes at a time from the ring buffer to the SD card. The eCSPI RXFIFO still overflows.

TomE · ‎08-14-2014

> I did try that testcase.

There are three different ways to code this. Which one (or ones) did you try?

fwrite(fp);
fwrite(fp); fflush(fp);
write(fd);

The last two should be the same, but should be different to the first one.
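A minimal sketch of the third variant, writing in small pieces with write(2) so that nothing sits in a stdio buffer. The function name and the chunk size are illustrative, not taken from the attached test program. Keep in mind that write() only hands the data to the kernel; whether and when it reaches the card is still up to the kernel unless fsync() is called as well.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write len bytes to fd in pieces of at most `chunk` bytes.
 * Retries on short writes and on EINTR.
 * Returns 0 on success, -1 on error or if chunk is 0. */
static int write_chunked(int fd, const unsigned char *buf, size_t len, size_t chunk)
{
    if (chunk == 0)
        return -1;
    while (len > 0) {
        size_t want = len < chunk ? len : chunk;
        ssize_t n = write(fd, buf, want);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before writing anything: retry */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}
```

With a 512-byte chunk, write_chunked(fd, data, 2048, 512) issues four separate write() calls instead of one large one.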

> write 512 bytes

You may only have 46 us before the overflow. If the SD card is using a slow SPI channel it might take a lot longer than that to write 512 bytes.

It would be worth measuring the interrupt latency. Take a timestamp on every exit from the SPI RX ISR. When it overflows, grab another timestamp, subtract the first one and "printk()" the result. That will show how long it was between interrupts.
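The bookkeeping for that measurement could look roughly like the sketch below. It is written so that it can be tried in user space: timestamps are passed in as plain nanosecond values, whereas in a driver they would come from a kernel clock such as ktime_get() and the result would go to printk(). The struct and function names are made up.

```c
#include <stdint.h>

struct isr_gap {
    uint64_t last_exit_ns;  /* timestamp taken at the previous ISR exit */
    uint64_t max_gap_ns;    /* largest gap seen between two ISR exits */
    int primed;             /* 0 until the first exit has been recorded */
};

/* Call at the very end of each SPI RX interrupt handler. */
static void isr_gap_mark_exit(struct isr_gap *g, uint64_t now_ns)
{
    if (g->primed && now_ns - g->last_exit_ns > g->max_gap_ns)
        g->max_gap_ns = now_ns - g->last_exit_ns;
    g->last_exit_ns = now_ns;
    g->primed = 1;
}

/* Call when an overflow is detected.
 * Returns the time since the last ISR exit, or 0 if none was recorded. */
static uint64_t isr_gap_on_overflow(const struct isr_gap *g, uint64_t now_ns)
{
    return g->primed ? now_ns - g->last_exit_ns : 0;
}
```

If the value returned on overflow is much larger than the time the FIFO takes to fill, something kept the handler from running for that long.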

Better still, build a kernel with FTRACE enabled.

I had all the details on how to do this typed in, then "Jive" locked up and lost the lot when I tried to paste some text and it blew up.

ftrace - Wikipedia, the free encyclopedia

That tells you to read the following, which is also with your kernel sources:

https://www.kernel.org/doc/Documentation/trace/ftrace.txt

If you're lucky and your kernel already has this in it, try "mount -t debugfs nodev /sys/kernel/debug".

If this isn't enabled, basically "make menuconfig", then turn stuff on in "kernel hacking / enable tracers select tracers".

Then interesting stuff shows up in "/sys/kernel/debug/tracing/", including stuff to find what's locking out interrupts. This is the PROPER way to trace down these things.

Not that you'll be then able to rewrite enough of everything to fix it, but it is very educational.

You should get something like this (I'm about to try another paste, it will probably go horribly wrong again...)

# tracer: irqsoff
#
# irqsoff latency trace v1.1.5 on 3.8.0-test+
# --------------------------------------------------------------------
# latency: 259 us, #4/4, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
# -----------------
# | task: ps-6143 (uid:0 nice:0 policy:0 rt_prio:0)
# -----------------
# => started at: __lock_task_sighand
# => ended at: _raw_spin_unlock_irqrestore
#
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| / delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
  ps-6143 2d... 0us!: trace_hardirqs_off <-__lock_task_sighand
  ps-6143 2d..1 259us+: trace_hardirqs_on <-_raw_spin_unlock_irqrestore
  ps-6143 2d..1 263us+: time_hardirqs_on <-_raw_spin_unlock_irqrestore
  ps-6143 2d..1 306us : <stack trace>
=> trace_hardirqs_on_caller
=> trace_hardirqs_on
=> _raw_spin_unlock_irqrestore
=> do_task_stat
=> proc_tgid_stat
=> proc_single_show
=> seq_read
=> vfs_read
=> sys_read
=> system_call_fastpath

(Edit) Not horribly wrong, but the multiple spaces didn't survive, and only the first four lines after "||||" survived.. What is it with a "Show Code" filter that removes multiple spaces and destroys the formatting!

Let's see if I can use raw HTML, plus lots of hacking and editing...

# tracer: irqsoff # # irqsoff latency trace v1.1.5 on 3.4.0-rc3-karo # -------------------------------------------------------------------- # latency: 559 us, #374/374, CPU#0 | (M:server VP:0, KP:0, SP:0 HP:0) #    ----------------- #    | task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0) #    ----------------- #  => started at: trace_hardirqs_off #  => ended at:   trace_hardirqs_on # # #                  _------=> CPU# #                 / _-----=> irqs-off #                | / _----=> need-resched #                || / _---=> hardirq/softirq #                ||| / _--=> preempt-depth #                |||| /     delay #  cmd     pid   ||||| time  |   caller #     \   /      |||||  \    |   / # [idle]-0       0d...    1us : __irq_svc There was meant to be lots of other interesting stuff in here, but "Jive" trips over it and kills it.

That looked fine on my screen. Then it got destroyed so it is now all on one line!

Maybe the only way to show code samples in here is to insert a video or a photo of it, or to attach a separate file that people have to click on to open?

So here's a JPEG PICTURE of the text!

Tom

huang_dexiang · ‎08-22-2014

We ran a test on the RIoTboard (a board based on the i.MX6 Solo). The eCSPI RXFIFO overflows after receiving about 620MB of data, while with the same driver and programs running on the MCIMX6Q-SDB board, the eCSPI RXFIFO overflows after receiving about 8MB of data.

TomE · ‎08-25-2014

> The eCSPI RXFIFO overflows after receiving about 620MB

Is that "every 8M on average" versus "every 620M on average"? How many times have you run that test to get those figures? If it is repeatable, then you're probably measuring the period of some other task that is causing the CPU to stall, causing the overrun.

So it may be down to the OS build and the other processes running on those two boards, unless you know they're the same (or close).

I think the higher-spec chip has a larger L2 cache, and would be likely to suffer a longer stall should some code flush the L2. Of course a flush only takes time if the cache is dirty, so it depends on what the CPU and the processes have been doing prior to the flush.

You should try and run the above trace I mentioned. You can put calls in your code so that the eCSPI driver can make a call to STOP the trace when it overflows. Then you know that the last thing the CPU was doing before the trace stopped was the thing that caused the overflow.
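Inside the kernel the usual call for this is tracing_off(). For a quick experiment from a user-space test program, the same effect can be had by writing "0" to the tracing_on file under the debugfs mount mentioned earlier. A minimal sketch, with a made-up function name and the path passed in as a parameter:

```c
#include <stdio.h>

/* Usual location when debugfs is mounted on /sys/kernel/debug. */
#define TRACING_ON_PATH "/sys/kernel/debug/tracing/tracing_on"

/* Stop ftrace recording by writing "0" to the tracing_on file.
 * Returns 0 on success, -1 if the file could not be written. */
static int trace_stop(const char *path)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs("0\n", f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

The test program would call trace_stop(TRACING_ON_PATH) as soon as the driver reports an overflow. Stopping from inside the driver is closer to the event, so less unrelated activity ends up at the tail of the trace.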

Tom

huang_dexiang · ‎09-02-2014

During the last few days, we did some tests.

After receiving data from SPI, we passed the data to the VPU for decoding instead of storing it to the SD card. The RXFIFO still overflows.

We also measured the interrupt latency as you suggested above. The delay is not constant: sometimes 3ms, sometimes 20ms.

saurabh206 · ‎11-24-2014

Hi

Huang

Were you able to fix this issue?

Saurabh

huang_dexiang · ‎03-10-2015

Hi Saurabh,

Sorry for the late reply. We still have not been able to fix the SPI slave issue.

How about yours?

huang_dexiang · ‎08-11-2014

Hi Tom,

The master is running at a rate of 11MHz, and it sends 8 bits per assertion of chip select. There is no flow control during the transfer: the slave can't give feedback to the master, and the master just keeps sending data without any interaction.

I attached the source code of my driver earlier; I have to switch the DMA buffer in the DMA callback.

As for the L2 cache you mentioned, I guess the issue is most probably caused by flushing the L2 cache, but I don't know how to confirm that.
