mcf5441x uart dma?

dciliske · ‎03-07-2013

We've been having some issues which we initially thought were "can't be"s (as in, "that can't be happening") with regards to dropping characters on receive over the UARTs on the MCF54415. Relatedly, we've seen some odd, rare cases where interrupts can take ~150us to enter the ISR. I've so far implemented a DMA-driven DSPI driver (20-100% better throughput than IRQ-driven) and recently made an example periodic-sampling ADC app using DMA timers. I'm looking into adding DMA to our UART driver to give us bigger buffers than the 4-byte FIFO in the hardware module, but I can't seem to find anything in the reference manual (version 3 or 4) about how to enable DMA requests instead of interrupts on the UART. Does anyone know the secret I'm missing?

hemanth_mansi · ‎05-19-2014

Hello Dan,

In your original post above you mentioned "dropping characters on receive over the uarts on the MCF54415". I would like to know about this issue in detail. Can you please provide more information, or point me to the correct post? I am currently using eDMA for UART1 on the MCF54415 in my project, and I am seeing a single character dropped during reads.

Please help me with this query. Thanks in advance.

thanks,

Hemanth

TomE · ‎05-19-2014

Hemanth sent me a private message pointing to this thread.

> I wanted to know the about this issue in detail.

It is fully explained in this thread already. There's also a link (which still works) to a blog post that details the problem.

I suggest you read them thoroughly before posting again.

It is very unlikely you'll have anything like the same problem though. This was a very specific problem related to prefetching code at an unmapped location, causing a long stall.

Reading characters from a UART is difficult enough on its own without throwing a DMA controller into the mix. This code is always prone to bugs unless you really know what you're doing (and then you make very complicated bugs that are harder to find and fix :--). I expect that's the explanation for your current problem.

Tom

Monica · ‎03-25-2013

Hey Dan!

Were these suggestions helpful?

We'd like to know, keep us posted!

Regards!

TomE · ‎03-09-2013

> I'm looking into adding DMA to our uart driver.

A simpler solution is to move your UART interrupts to IPL6. Then if you're still suffering the delays you really do have to find what is locking out interrupts for so long. You should do this anyway. Search your entire code base for everything that is disabling interrupts completely (to IPL7). Then fix that code so it only raises to the minimum level necessary. If the code is a device driver protecting itself against being interrupted by its associated interrupt at IPL3 (say) then that mainline code should only set itself to IPL3 and not IPL7.

Every piece of code that totally disables interrupts should be considered to be buggy or very lazy programming unless proved otherwise. And then add copious comments saying why it has to be that way. At the very least the code shouldn't be more than a few lines long and shouldn't call any functions.
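As a rough illustration of "only raise to the minimum level necessary", here is a minimal sketch of the mask arithmetic. It assumes the usual 68k/ColdFire layout where the interrupt priority mask sits in SR bits 10-8. The function names are hypothetical, and the privileged read and write of SR itself (a `move.w` from and to `%sr` on the target) is deliberately left out so the logic can be checked on a host.

```c
#include <stdint.h>

/* Interrupt priority mask field of the status register: SR[10:8]. */
#define SR_IPL_SHIFT 8u
#define SR_IPL_MASK  (0x7u << SR_IPL_SHIFT)

/* Extract the interrupt priority level (0..7) from an SR value. */
unsigned sr_get_ipl(uint16_t sr)
{
    return (sr & SR_IPL_MASK) >> SR_IPL_SHIFT;
}

/* Return sr with the mask raised to at least `level`.
   The mask is never lowered, so nested critical sections stay safe. */
uint16_t sr_raise_ipl(uint16_t sr, unsigned level)
{
    if (level > 7u)
        level = 7u;
    if (sr_get_ipl(sr) >= level)
        return sr;
    return (uint16_t)((sr & ~SR_IPL_MASK) | (level << SR_IPL_SHIFT));
}
```

A driver whose interrupt runs at IPL3 would save the old SR, write back `sr_raise_ipl(old_sr, 3)`, do its short critical section, then restore the saved value, leaving IPL4 to IPL6 interrupts free to run.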

If you really want to get desperate (and "fix" your current problem while leaving the code in a worse state for future maintenance) then change the UART interrupts to IPL7. This makes the UART driver a lot trickier to write, especially as these UARTs don't allow separate receive and transmit interrupts. Not only can't you protect the mainline against the interrupt, but the interrupt routine has to be re-entrant!

Tom

dciliske · ‎04-25-2013

This is the approach we ended up taking (since it was the correct thing to do). What we found should be a cautionary tale about something not mentioned in the reference manual... The whole story can be found on my boss's blog (Unreasonable Rocket: Finding a Firmware bug...). The short of it is that we had a bus timeout occurring on accesses to memory address 0x0000_0000, because we had specifically unmapped that block (all of our platforms do that, so we can catch null pointers). We also had no FlexBus chip select configured for that memory range, so the timeout took the maximum time possible: 64K ticks at 125MHz, or about 524us.

The problem is that we never tried to access 0x0000_0000; the processor did that on its own. It happens because of speculative loading combined with the following code structure. We have a fair bit of code that looks like this:

if (pFunc) pFunc();

This ends up compiling into assembly that looks like:

    moveal pFunc,%a0
    tstl %a0
    beqw skip
    jsr %a0@
skip:

What happens is that if pFunc is null, the branch is taken and the jsr is skipped and never executed. However, since the processor doesn't know this until the instruction immediately before the jsr, it attempts to be helpful and loads from that memory address before the call occurs. In our case, that triggers the unmapped memory access, even though the software never tried to do so. And the problem here is that nowhere that we can find does the manual mention that unmapping 0x0000_0000 will lead to latency issues due to speculative loading...

The fix for this is simple: establish a valid chip select for that memory range and set it to perform an internal transfer acknowledge with a 0 cycle delay.
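The size of the stall follows directly from the timeout count and the bus clock. A minimal sketch of that arithmetic, where the function name is hypothetical and the only inputs are the tick counts and the 125MHz clock mentioned in this thread:

```c
#include <stdint.h>

/* Duration of a bus-monitor timeout, in microseconds, for a given
   timeout count (in bus clock ticks) and bus clock frequency (in Hz). */
double bus_timeout_us(uint32_t ticks, uint32_t bus_hz)
{
    return (double)ticks * 1e6 / (double)bus_hz;
}
```

With 65536 ticks at 125MHz this gives roughly 524us, far longer than the ~150us interrupt latency that started the investigation.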

TomE · ‎04-25-2013

Well found, Dan. That's a nasty trap.

> The fix for this is simple: establish a valid chip select for that memory range ...

It is a shame to get rid of the ability to automatically trap null pointers. I like to have that available too. It is almost worth learning how to set up the MMU to get that ability. Almost...

I can think of a few other possibilities.

1 - Change the Bus Monitor Timeout from 64k to 512. That reduces the "lockout" from roughly 520us to about 4us. That's still a very long time for a CPU like this (but have you measured how long it takes to read or write a GPIO pin lately?).

2 - Change all problem code sequences from "if (pFunc) pFunc();" to "if (dummy && pFunc) pFunc();" where "dummy" is a global flag that is always set. That might be enough to avoid the prefetch, although this CPU looks very smart (target caches and so on).

3 - (2) but go via a function call like "if (pFunc) myCallFunc(pFunc)". That should stop it.

4 - I've (ab)used the Debug Module Program Counter Breakpoint registers on an MPC860 to trap "all accesses below 64k" and "all write accesses below the bottom of the stack", which was set up to be the top of the code and read-only data section. That caught a LOT of problems! The Debug module in the MCF5441x looks like it might be able to do the same, and can be accessed from running code via the WDEBUG instruction.

You'd use (4) in combination with enabling mapping at address zero to have both the real-time null-pointer checks and the code running properly.
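Option (3) might look something like the sketch below. `myCallFunc` is the name used in the list above; `call_if_set`, `count_call` and `g_calls` are hypothetical names added only so the example is self-contained. The `noinline` attribute is GCC/Clang-specific and is an assumption here: without it the compiler could inline the helper and recreate the original test-then-jsr sequence. Whether this actually avoids the speculative access on real hardware is untested.

```c
#include <stddef.h>

typedef void (*callback_t)(void);

/* Performs the indirect call. Kept out of line so the jsr through the
   pointer does not sit right behind the null test in the caller. */
__attribute__((noinline)) void myCallFunc(callback_t fn)
{
    fn();
}

/* The null test stays in the caller; the indirect call moves into the helper. */
void call_if_set(callback_t pFunc)
{
    if (pFunc)
        myCallFunc(pFunc);
}

/* Demo callback: counts how many times it has been invoked. */
int g_calls;
void count_call(void)
{
    g_calls++;
}
```

Calling `call_if_set(NULL)` does nothing, while `call_if_set(count_call)` bumps `g_calls` by one.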

Tom

dciliske · ‎02-03-2015

Tom,

This is a bit of an older post, but I'll reply anyways: with regards to suggestion (4), I've implemented NULL catching using the debug module for reads and the cache controller for writes to addresses below 1MB.

The code for the NULL write traps is as follows:

// CatchNullWrites catches writes to the low 1MB of memory by using the cache
// to disallow writes to that memory space.
asm(".text");
asm(".global CatchNullWrites");
asm("CatchNullWrites:");
asm("  move.l #0x0000E04C,%d0");
asm("  movec  %d0, %ACR0");
asm("  nop;");
asm("  rts;");

-Dan

TomE · ‎04-27-2013

I wrote:

> Options 1, 2, 3, 4.

Here's some more.

5 - Instead of using NULL function pointers, write a "designated empty function". Initialise all "empty" function pointers to point to this dummy one. Then you can either use "if (pFunc != pDummyFunc) pFunc();" or simply call the function and have it do nothing, or log that it was called (telling you about some calls you haven't converted to the new form yet).

6 - Initialise the "empty" function pointers to ONE instead of zero. Use a #define to make the purpose clear (and let you change the definition of the "invalid function pointer" if the code is ported to something else). This will involve the usual fun casting, but you can hide it in the #define. Then write "if (VALID_FUNC(pFunc)) pFunc();". The "speculative prefetch" should throw an illegal address trap before it performs the read and gets the bus timeout stall.

I'd suggest "#define VALID_FUNC(pFunc) (pFunc > ((void (*)())1))" as that will skip both NULL and "one" pointers.
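Options (5) and (6) could be sketched roughly as follows. `VALID_FUNC` comes from the post; `INVALID_FUNC`, `dummy_func`, `call_checked`, `demo_cb` and `demo_calls` are hypothetical names. This variant compares the pointers as integers via `uintptr_t` instead of using `>` on function pointers directly; converting between integers and function pointers is implementation-defined, which is fine for GCC on this kind of target but is still an assumption.

```c
#include <stdint.h>

typedef void (*callback_t)(void);

/* Option 5: a designated empty function to point at instead of NULL. */
void dummy_func(void)
{
}

/* Option 6: sentinel "invalid" pointer with value 1, an odd and therefore
   misaligned instruction address. */
#define INVALID_FUNC ((callback_t)(uintptr_t)1)

/* True for anything other than NULL (0) or the sentinel (1). */
#define VALID_FUNC(p) ((uintptr_t)(p) > (uintptr_t)1)

/* Demo callback: counts how many times it has been invoked. */
int demo_calls;
void demo_cb(void)
{
    demo_calls++;
}

/* Call pFunc only when it is neither NULL nor the sentinel. */
void call_checked(callback_t pFunc)
{
    if (VALID_FUNC(pFunc))
        pFunc();
}
```

Hiding the cast inside the two macros means the definition of "invalid" can be changed in one place if the code is ported.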

Tom

TomE · ‎03-08-2013

> Relatedly, we've seen some odd, rare cases where interrupts can be take ~150us to enter the ISR

So your code is doing something you don't understand and don't expect. That's always dangerous.

I wouldn't try to work around a problem like that, but would try to find the buggy code that is locking the CPU out for that long.

This is very easy to do - it's about 10 minutes of programming to find the culprit, and an hour or two to implement a really useful solution you can leave in there that enhances the system for future monitoring and testing.

As part of your basic system design (stating what it is and what it does) you should have a line stating "Interrupt latency must be less than so-many microseconds". You may be a bit more sophisticated and separately list the required IPL0 latency, IPL1 latency, all the way up to IPL6 latency.

Then you have to enforce these requirements by actively measuring the interrupt latency, and declaring any code that disobeys those requirements to be broken, and then fix it.

The toughest one to measure is the IPL0 latency - the maximum time IPL0 can be locked out by interrupts. You need intrusive loop-testing to do this. Basically, replace your "waiting for something to do" loop with code that reads a DMA timer and measures the maximum time it has been locked out for. To find out what locked you out you throw a breakpoint, and then check the stack FORWARDS (not back) to see what the ISR pushed onto the stack. That should let you know which one did it. But you really need "interrupt logging" to track this down (see later).

All the other levels can be measured with a periodic timer interrupt that reads the timer in its ISR, calculates the latency and then throws a breakpoint when it detects a "latency violation". Usually you'll find the interrupt has landed on a line of code that just re-enabled interrupts, and then you know what function ran for a long time with interrupts locked out (that you probably didn't intend).
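The idle-loop measurement could be sketched along these lines. All names here are hypothetical, and the timer value is passed in as a plain number so the logic can be tried on a host; on the target it would come from a free-running DMA timer counter.

```c
#include <stdint.h>

/* Tracks the longest gap between consecutive polls of a free-running timer. */
typedef struct {
    uint32_t last;     /* timer value at the previous poll */
    uint32_t max_gap;  /* largest gap observed so far, in timer ticks */
    int      primed;   /* 0 until the first sample has been taken */
} gap_monitor_t;

/* Feed one timer sample. Returns 1 if the gap since the previous sample
   exceeded limit_ticks (a "latency violation"), otherwise 0. */
int gap_monitor_sample(gap_monitor_t *m, uint32_t now, uint32_t limit_ticks)
{
    uint32_t gap;

    if (!m->primed) {
        m->last = now;
        m->primed = 1;
        return 0;
    }
    gap = now - m->last;   /* unsigned subtraction copes with timer wraparound */
    m->last = now;
    if (gap > m->max_gap)
        m->max_gap = gap;
    return gap > limit_ticks;
}
```

The idle loop would call this as fast as it can and hit a breakpoint whenever it returns 1; `max_gap` then holds the worst lockout seen so far.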

In the more complicated case of a higher interrupt service routine taking too long (and causing the latency), you write "ISR_LOG_START" and "ISR_LOG_FINISH" macros that you add to the start and finish of all interrupt service routines. They keep a log of start and finish times, and that then provides the evidence of what is taking too long.

I have those functions recording the total and maximum times taken at each IPL (automatically subtracting time taken at higher IPLs that interrupted the lower ones) and also the IPL0 non-idle time, and can track which execution levels are taking how much of the CPU.
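A much-simplified model of that bookkeeping is sketched below. Only the macro names `ISR_LOG_START` and `ISR_LOG_FINISH` come from the post; the structures, function names and arguments are assumptions. Timestamps are passed in so it can be exercised on a host, and it assumes strictly nested interrupts (at most one active ISR per level). On the target the bookkeeping itself would also need to be made safe against being interrupted partway through, which this sketch does not attempt.

```c
#include <stdint.h>

#define NUM_IPL 8

typedef struct {
    uint32_t total;  /* ticks spent at this level, excluding higher-level ISRs */
    uint32_t max;    /* longest single ISR run at this level, same exclusion */
} ipl_stats_t;

ipl_stats_t isr_stats[NUM_IPL];

/* One frame per currently active (possibly nested) ISR. */
typedef struct {
    uint32_t start;   /* timestamp at ISR entry */
    uint32_t nested;  /* ticks consumed by ISRs that interrupted this one */
} isr_frame_t;

static isr_frame_t isr_stack[NUM_IPL];
static int isr_depth;

void isr_log_start(uint32_t now)
{
    isr_stack[isr_depth].start = now;
    isr_stack[isr_depth].nested = 0;
    isr_depth++;
}

void isr_log_finish(unsigned ipl, uint32_t now)
{
    isr_frame_t *f = &isr_stack[--isr_depth];
    uint32_t elapsed = now - f->start;   /* wall time, including nested ISRs */
    uint32_t own = elapsed - f->nested;  /* time spent at this level only */

    isr_stats[ipl].total += own;
    if (own > isr_stats[ipl].max)
        isr_stats[ipl].max = own;
    if (isr_depth > 0)                   /* charge the whole run to the ISR we interrupted */
        isr_stack[isr_depth - 1].nested += elapsed;
}

#define ISR_LOG_START(now)       isr_log_start(now)
#define ISR_LOG_FINISH(ipl, now) isr_log_finish((ipl), (now))
```

For example, an IPL3 handler that runs from tick 100 to tick 150 and is interrupted by an IPL5 handler from 110 to 130 ends up with 30 ticks charged to IPL3 and 20 to IPL5.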

Of course if you're using a third-party USB stack, and it is causing the problems, you're in for a lot of work trying to fix it.

Rant over, back to your original question...

> I can't seem to find anything in the reference manual (version 3 or 4) about how

> to enable DMA requests instead of interrupts on the UART.

I just started in the UART chapter and searched for every instance of "DMA":

41.3 Memory Map/Register Definition
NOTE: Interrupt can mean an interrupt request asserted to the CPU or a DMA request.

41.4.2.3 FIFO
The RXRDY or FFULL bit can be selected to cause an interrupt and TXRDY or RXRDY can be used to generate a DMA request.

41.5.1 Interrupt and DMA Request Initialization
41.5.1.2 Setting up the UART to Request DMA Service

Table 41-15. UART DMA Requests
UISRn  1  Receive DMA request
UISRn  0  Transmit DMA request

Well that was a bust. As you found, that didn't help answer your question at all!

Let's try searching the eDMA chapter for "UART":

19.4.3 eDMA Enable Request Registers (EDMA_ERQH, EDMA_ERQL)
The EDMA_ERQ{H,L} registers provide a bit map for the 64 implemented channels to enable the request

2  UISR0[FFULL/RXRDY]  UART0 Receive
3  UISR0[TXRDY]        UART0 Transmit
4  UISR1[FFULL/RXRDY]  UART1 Receive
5  UISR1[TXRDY]        UART1 Transmit
6  UISR2[FFULL/RXRDY]  UART2 Receive
7  UISR2[TXRDY]        UART2 Transmit

And so on. So it looks like you enable the UART Interrupts within the eDMA module.
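To make the channel mapping concrete, here is a minimal host-side sketch. The channel numbers for UART0 to UART2 are the ones listed above; everything else is an assumption: the struct is just a stand-in for the two registers (no real addresses), the function names are hypothetical, and it assumes bit n of EDMA_ERQL enables channel n and bit n of EDMA_ERQH enables channel 32+n. Any UART-side setup the manual requires is not shown.

```c
#include <stdint.h>

/* Stand-ins for the EDMA_ERQH / EDMA_ERQL registers (not real addresses). */
typedef struct {
    uint32_t erqh;  /* request enables for channels 32..63 (assumed layout) */
    uint32_t erql;  /* request enables for channels 0..31 (assumed layout) */
} edma_erq_t;

/* DMA request channel for UARTn receive, n = 0..2 only; -1 otherwise. */
int uart_rx_dma_channel(unsigned uart)
{
    return uart <= 2u ? (int)(2u + 2u * uart) : -1;
}

/* DMA request channel for UARTn transmit, n = 0..2 only; -1 otherwise. */
int uart_tx_dma_channel(unsigned uart)
{
    return uart <= 2u ? (int)(3u + 2u * uart) : -1;
}

/* Set the enable-request bit for one channel. Returns 0 on success,
   -1 if the channel number is out of range. */
int edma_enable_request(edma_erq_t *erq, int channel)
{
    if (channel < 0 || channel > 63)
        return -1;
    if (channel < 32)
        erq->erql |= (uint32_t)1 << channel;
    else
        erq->erqh |= (uint32_t)1 << (channel - 32);
    return 0;
}
```

Enabling UART1 receive, for instance, would be `edma_enable_request(&erq, uart_rx_dma_channel(1))`, which sets bit 4 of the low register in this model.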

Tom
