USB mass storage class (MSC) hang-ups

lpcware · 06-15-2016

Content originally posted in LPCWare by DannyBergh on Thu Jul 31 01:45:43 MST 2014
For the last few weeks I have been trying to make my application run stably, but I seem to have some trouble with the USBD ROM code.

I am using an LPC4370 whose M0App core starts the USB device the same way the USB MSC example code from the LPCOpen library does. The M4 core does some intense calculations and uses DMA to write directly to the memory that is allocated for the USB MSC disk. The PC (Linux) accesses this device as a block device and uses the same addresses (offsets) to read from the correct places.

Simple scenario:
1. The PC sends a data "command" to block 0 of the USB MSC disk.
1a. The PC continuously polls block 0 to check whether the data is ready (continues at step 5).
2. The LPC sees this command and starts the calculation using DMA.
3. The result is written directly to block 1 of the USB MSC disk.
4. After the calculations are done, the LPC updates block 0 to inform the PC that the data is ready.
5. Once block 0 confirms that the transfer is ready, the PC reads block 1 to get the data.
6. Repeat from step 1.
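The handshake above can be simulated on a PC to check the ordering of the writes. This is a minimal sketch, assuming a hypothetical layout for block 0 (a command sequence number written by the PC and a "done" sequence number written by the LPC); the thread does not say what the real block 0 contains. A plain array stands in for the MSC disk memory and `memcpy` stands in for the M4's DMA transfer; all function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 512u  /* MSC logical block size */
#define NUM_BLOCKS 4u    /* tiny stand-in for the real disk */

/* Hypothetical layout of block 0 (not from the original project). */
typedef struct {
    uint32_t cmd_seq;   /* bumped by the PC for every new command */
    uint32_t done_seq;  /* copied from cmd_seq by the LPC when block 1 is valid */
} status_block_t;

/* Host stand-in for the RAM/SDRAM that backs the USB MSC disk. */
static uint8_t disk[NUM_BLOCKS][BLOCK_SIZE];

static status_block_t read_status(void) {
    status_block_t s;
    memcpy(&s, disk[0], sizeof s);
    return s;
}

static void write_status(const status_block_t *s) {
    memcpy(disk[0], s, sizeof *s);
}

/* Step 1, PC side: post a new command in block 0. */
static void pc_post_command(void) {
    status_block_t s = read_status();
    s.cmd_seq++;
    write_status(&s);
}

/* Steps 2-4, LPC side: handle a pending command. The result goes to
 * block 1 first (memcpy stands in for the DMA transfer); only then is
 * block 0 updated, so the PC never sees "ready" before the data.
 * Returns 1 if a command was handled, 0 if none was pending, -1 on error. */
static int lpc_poll(const uint8_t *result, size_t len) {
    status_block_t s = read_status();
    if (len > BLOCK_SIZE)
        return -1;                 /* result must fit in block 1 */
    if (s.done_seq == s.cmd_seq)
        return 0;                  /* no new command */
    memcpy(disk[1], result, len);
    s.done_seq = s.cmd_seq;
    write_status(&s);
    return 1;
}

/* Steps 1a and 5, PC side: block 1 is ready once done_seq has caught
 * up with the last command. */
static int pc_result_ready(void) {
    status_block_t s = read_status();
    return s.cmd_seq != 0 && s.done_seq == s.cmd_seq;
}
```

On the real hardware, the "block 1 before block 0" ordering would additionally require the DMA transfer to have completed before the LPC writes block 0.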

While this is running, some other blocks in the memory are used for other communication tasks that run less frequently (and run on the M0Sub core).

The problem I see is that the system responds well for roughly a minute (sometimes 5 seconds, sometimes 2 minutes; mostly random), and then the PC tries to read from the device but gets no answer. This crashes the application, since this read should never fail. When I hook up the debugger and watch what the M0App core (the one that started the USB ROM stack) is doing, I see the address 0x10401e28, almost every time I try this. The address is inside the USBD ROM stack, for which I have no code to debug. It seems to be waiting for something there, but what?

Is it not allowed to access the USB memory from multiple cores at the same time? What code lives at the magical 0x10401e28 address?
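One quick way to answer "what lives at this address" is to compare the address against the chip's memory map. This is a minimal sketch, assuming region bases and sizes from my reading of the LPC43xx user manual (UM10503) plus the SPIFI flash and 64 MB SDRAM ranges used in this thread; verify them for your exact part. Under these assumptions 0x10401e28 falls in the 64 KB boot ROM at 0x1040 0000, which holds the USBD ROM driver. The table and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t base;
    uint32_t size;
    const char *name;
} mem_region_t;

/* Assumed LPC4370 regions; check the user manual for your part. */
static const mem_region_t lpc43_regions[] = {
    { 0x10000000u, 0x00020000u, "local SRAM (128 KB)" },
    { 0x10080000u, 0x00012000u, "local SRAM (72 KB)" },
    { 0x10400000u, 0x00010000u, "boot ROM (incl. USBD ROM driver)" },
    { 0x14000000u, 0x04000000u, "SPIFI flash" },
    { 0x18000000u, 0x00004800u, "M0SUB SRAM" },
    { 0x20000000u, 0x0000C000u, "AHB SRAM" },
    { 0x28000000u, 0x04000000u, "SDRAM on DYCS0 (64 MB fitted)" },
};

/* Returns the name of the region containing addr. */
static const char *classify_address(uint32_t addr) {
    for (size_t i = 0; i < sizeof lpc43_regions / sizeof lpc43_regions[0]; i++) {
        const mem_region_t *r = &lpc43_regions[i];
        /* Unsigned subtraction wraps when addr < base, so this single
         * comparison also rejects addresses below the region. */
        if (addr - r->base < r->size)
            return r->name;
    }
    return "other (peripheral or unmapped)";
}
```

The same check is handy for the question of where each core's code and data actually execute from: feed it the PC or a symbol address shown by the debugger.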

As far as I can see in the debugger, the ROM versions are: Core = 1, Hardware Interface version = 1, MSC class = 1, DFU class = 1, HID class = 1, CDC class = 1.
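The six numbers above look like the nibble fields of the single `version` word in the USBD ROM API table. This is a minimal sketch, assuming the `0x0CHDMhCC` layout that, as far as I recall, the LPCOpen `usbd_rom_api.h` header documents for that field (CDC, HID, DFU, MSC and hardware-interface nibbles, plus an 8-bit core version); `decode_usbd_version` is a hypothetical helper. On the target, the input would be the `version` word read from the ROM API table.

```c
#include <stdint.h>

/* Fields of the USBD ROM version word, assumed layout 0x0CHDMhCC. */
typedef struct {
    unsigned core;  /* bits  7:0  */
    unsigned hw;    /* bits 11:8  */
    unsigned msc;   /* bits 15:12 */
    unsigned dfu;   /* bits 19:16 */
    unsigned hid;   /* bits 23:20 */
    unsigned cdc;   /* bits 27:24 */
} usbd_version_t;

static usbd_version_t decode_usbd_version(uint32_t v) {
    usbd_version_t out;
    out.core = v & 0xFFu;
    out.hw   = (v >> 8)  & 0xFu;
    out.msc  = (v >> 12) & 0xFu;
    out.dfu  = (v >> 16) & 0xFu;
    out.hid  = (v >> 20) & 0xFu;
    out.cdc  = (v >> 24) & 0xFu;
    return out;
}
```

Under this assumed layout, the versions listed above correspond to a version word of 0x01111101.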

Thank you,
Danny

Content originally posted in LPCWare by DannyBergh on Thu Aug 07 05:33:33 MST 2014
Good news: in the meantime we found the problem. It was the memory bus, as I suspected a few days ago. The M4 was monopolizing the memory and flash bus, so the M0App could not fetch the code for its ISR.

Conclusion: always run all code from RAM.
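For individual functions, the usual GCC way to follow this advice is to place them in a RAM section that the startup code copies from flash. This is a minimal sketch, assuming a linker script that provides an executable `.ramfunc` section in RAM (the section name is an assumption; LPCXpresso also ships section macros for this purpose). `USB0_IRQHandler`, `usb_isr_install` and the handle type are illustrative names; the real handler name depends on your startup file. Off target the macro expands to nothing, and a fake ROM ISR lets the stub run on a PC.

```c
#include <stddef.h>

/* On the target, put the function in a RAM section; ".ramfunc" is an
 * assumption and must match your linker script. */
#if defined(__arm__) && defined(__GNUC__)
#define RAM_FUNC __attribute__((section(".ramfunc"), noinline, used))
#else
#define RAM_FUNC /* host build: ordinary .text */
#endif

typedef void *usb_handle_t;                  /* hypothetical stack handle */
typedef void (*rom_isr_fn)(usb_handle_t h);  /* shape of the ROM ISR entry */

static rom_isr_fn   usb_rom_isr;             /* taken from the ROM API table */
static usb_handle_t usb_stack_handle;

/* Called once after the ROM USB stack has been initialised. */
void usb_isr_install(rom_isr_fn isr, usb_handle_t h) {
    usb_stack_handle = h;
    usb_rom_isr = isr;
}

/* The IRQ stub runs from RAM and jumps straight into the boot ROM, so
 * it does not need the (busy) flash bus. */
RAM_FUNC void USB0_IRQHandler(void) {
    if (usb_rom_isr != NULL)
        usb_rom_isr(usb_stack_handle);
}

#if !(defined(__arm__) && defined(__GNUC__))
/* Host-only stand-in for the ROM ISR, so the stub can be exercised. */
static unsigned fake_rom_isr_calls;
static void fake_rom_isr(usb_handle_t h) { (void)h; fake_rom_isr_calls++; }
#endif
```

Moving only the stub is not enough on its own: the ROM MSC stack calls back into application code (the read/write callbacks), so those callbacks and the vector table the core fetches the handler address from also need to be reachable without going through the contended flash path.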

Content originally posted in LPCWare by DannyBergh on Wed Aug 06 09:43:05 MST 2014
Enabling instruction trace just causes a hard fault and crashes the M4 application. After the hard fault, though, I can trace it with the instruction trace: something crashes in the clock library file, with or without the specially declared array.

I reviewed our code with a colleague today (peer review), and we are both speechless at what we see. We now know for sure that the stack is not the issue; there is plenty of space left. We also tested different code in the M4 application's main loop. We just can't figure out why the M0App USB depends on the M4 application. Today we saw that after the USB MSC stops responding, all cores are still just running through their main code: nothing for the M0App and M0Sub, and some dummy code for the M4 application. By changing the M4 main-loop code, the USB reacts differently, and repeatably. It either crashes while formatting the disk or copying the first or second file, or it keeps working for more than 20 files. There is nothing in between, and a simple memset in the M4 main loop can cause this!

Apparently we can't debug the ROM code, according to this: http://www.lpcware.com/content/forum/how-debug-when-using-rom-drivers, so we are stuck with trial and error.

We are contacting NXP, since this is really stalling our development. We cannot solve this issue on our own; the only explanations we can think of are a collision on the memory/data bus, or interrupts being missed for some reason. Maybe the M0App core cannot run the USB code reliably. It just feels wrong, and it certainly can't be ignored.

Content originally posted in LPCWare by whitecoe on Wed Aug 06 01:02:08 MST 2014
You might find instruction trace helpful to work out what is going on, on the M4 side at least.

HTH!

Content originally posted in LPCWare by DannyBergh on Tue Aug 05 22:33:45 MST 2014
The unstable example is just the MSC example, with the PC as the only one writing/reading data on the device, so there is only one user. Eventually the MSC memory will be written and read from two places (the M0 core and the PC side), but the device will be used as a raw block device, without a file system.

Both the M4 and M0 code run from flash. I changed the memory mapping per project, since the regions were overlapping:

M4:
   Ram            Location: 0x1008 0000   Size: 0x7800
   RamFunctions   Location: 0x1008 7800   Size: 0x800
   Flash          Location: 0x1400 0000   Size: 0x50000

M0App:
   Ram1           Location: 0x1008 8000   Size: 0x4000
   Ram2           Location: 0x1800 0000   Size: 0x3800
   RamFunctions   Location: 0x1800 3800   Size: 0x800
   Flash          Location: 0x1405 0000   Size: 0x28000

M0Sub:
   Ram            Location: 0x1008 c000   Size: 0x5000
   RamFunctions   Location: 0x1800 4000   Size: 0x800
   Flash          Location: 0x1407 8000   Size: 0x28000
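Because the three images share SRAM and the SPIFI flash, it is easy to reintroduce an overlap after editing one project. This is a minimal host-side sketch that checks the map above for overlapping regions; the table is copied from the map, the USB stack (original 64 KB location) and USB disk entries come from the next paragraph, and the function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t base;
    uint32_t size;
} region_t;

/* Regions from the post, plus the USB stack and USB disk. */
static const region_t core_regions[] = {
    { "M4 Ram",             0x10080000u, 0x7800u    },
    { "M4 RamFunctions",    0x10087800u, 0x800u     },
    { "M4 Flash",           0x14000000u, 0x50000u   },
    { "M0App Ram1",         0x10088000u, 0x4000u    },
    { "M0App Ram2",         0x18000000u, 0x3800u    },
    { "M0App RamFunctions", 0x18003800u, 0x800u     },
    { "M0App Flash",        0x14050000u, 0x28000u   },
    { "M0Sub Ram",          0x1008C000u, 0x5000u    },
    { "M0Sub RamFunctions", 0x18004000u, 0x800u     },
    { "M0Sub Flash",        0x14078000u, 0x28000u   },
    { "USB stack",          0x10000000u, 0x10000u   },
    { "USB disk (DYCS0)",   0x28000000u, 0x4000000u },
};

/* Half-open ranges [base, base + size) overlap iff each one starts
 * before the other ends. 64-bit ends avoid wrap-around at 4 GB. */
static int regions_overlap(const region_t *a, const region_t *b) {
    uint64_t a_end = (uint64_t)a->base + a->size;
    uint64_t b_end = (uint64_t)b->base + b->size;
    return a->base < b_end && b->base < a_end;
}

/* Counts overlapping pairs and reports the first pair found. */
static size_t count_overlaps(const region_t *r, size_t n,
                             size_t *first_i, size_t *first_j) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (!regions_overlap(&r[i], &r[j]))
                continue;
            if (count == 0 && first_i != NULL && first_j != NULL) {
                *first_i = i;
                *first_j = j;
            }
            count++;
        }
    }
    return count;
}
```

With the values above, no pair overlaps; growing "M4 Ram" to 0x8000, for example, would make it collide with "M4 RamFunctions".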

The USB stack was at 0x1000 0000 with a size of 64 KB. For testing, it was moved to 0x2000 0000 with a size of 16 KB; both give the same results. Note that 0x2000 0000 is the AHB memory.
The USB disk starts at 0x2800 0000 with a size of 64 MB (DYCS0).
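With the disk at a fixed SDRAM address, the PC's logical block N maps to 0x2800 0000 + N × 512, which is how both sides can agree on "the same addresses (offsets)". This is a minimal sketch of that translation with bounds checks, assuming 512-byte blocks; `msc_block_addr` and `msc_range_addr` are hypothetical helpers, not the ROM's MSC callback signatures.

```c
#include <stdint.h>

#define MSC_BLOCK_SIZE 512u
#define MSC_DISK_BASE  0x28000000u            /* SDRAM on DYCS0 */
#define MSC_DISK_SIZE  (64u * 1024u * 1024u)  /* 64 MB */
#define MSC_NUM_BLOCKS (MSC_DISK_SIZE / MSC_BLOCK_SIZE)

/* Address of logical block `lba`; returns 0 on success, -1 if the
 * block lies outside the disk. */
static int msc_block_addr(uint32_t lba, uint32_t *addr) {
    if (lba >= MSC_NUM_BLOCKS)
        return -1;
    *addr = MSC_DISK_BASE + lba * MSC_BLOCK_SIZE;
    return 0;
}

/* Same check for a byte range [offset, offset + len), written so that
 * offset + len is never computed and therefore cannot overflow. */
static int msc_range_addr(uint32_t offset, uint32_t len, uint32_t *addr) {
    if (offset > MSC_DISK_SIZE || len > MSC_DISK_SIZE - offset)
        return -1;
    *addr = MSC_DISK_BASE + offset;
    return 0;
}
```

In the scenario from the first post, block 1 (the DMA destination) would then be at 0x2800 0200.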

Oh, and the ROM code was something I was hoping to get from here.

Content originally posted in LPCWare by rocketdawg on Tue Aug 05 12:12:12 MST 2014
Too many variables. Where are the M4 and M0 code executing from, RAM or flash? Where are the M4 and M0 data? Where is the stack?
It would never block forever, since the bus has arbitration, but it could slow things down.

A hard fault handler finds things like stack overflows:
http://www.lpcware.com/content/faq/lpcxpresso/debugging-hard-fault

If it gets stuck in the USB ROM code, then it is waiting for some event. But what?
You might want to ask NXP whether the source of the USB ROM code is available, and place that code into your app just to debug it.

That brings up a good question: how does one debug the ROM drivers anyway?

Content originally posted in LPCWare by rocketdawg on Tue Aug 05 11:52:46 MST 2014
You may have a problem with the host.
The MSD cannot serve two masters, and the host is the master.
The USB MSD device cannot change the contents of the storage.
The host is the only one that can change the content of the MSD.
Does DMA directly to the storage medium cause conflicts with the FAT or exFAT file checksums?

You might get it to work, but it is a hack.

Content originally posted in LPCWare by DannyBergh on Tue Aug 05 11:19:09 MST 2014
The code I have written does not use malloc. I'm not sure what the requirements of the USBD ROM MSC application are, but it has its own stack. The USB stack is now set to 64 KB to make sure this is not an issue.
The fact that the USB device stops responding is just mind-blowing. Why is the M4 causing the M0App to fail? Are they sharing resources?

I haven't added any hard fault exception handling, but shouldn't the core be looping in the default handler when I look with the debugger?
I also did some testing with manual stack allocation: creating a fixed-size array and changing the startup code to use it as the stack. Unfortunately, this didn't help.
The extra function call was also a red herring: the compiler had simply optimized the call away, and with a volatile bool the system was unstable as well.
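To check how much of a fixed-size stack array is actually used, a common technique is stack painting: fill the array with a known pattern before it is used, and later count how many words were never overwritten. This is a minimal sketch, assuming a downward-growing stack (as on Cortex-M) placed in a hypothetical array `app_stack`; on the target the fill has to happen before the stack is in use, e.g. from the startup code.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 1024u
#define STACK_PAINT 0xDEADBEEFu

/* Hypothetical stack array; the startup code would point SP at its end. */
static uint32_t app_stack[STACK_WORDS];

/* Paint the whole stack. Must run before the stack is in use. */
static void stack_paint(uint32_t *stack, size_t words) {
    for (size_t i = 0; i < words; i++)
        stack[i] = STACK_PAINT;
}

/* The stack grows down from stack[words - 1], so untouched words sit at
 * the low end. Returns the peak usage in words (the high-water mark). */
static size_t stack_high_water(const uint32_t *stack, size_t words) {
    size_t untouched = 0;
    while (untouched < words && stack[untouched] == STACK_PAINT)
        untouched++;
    return words - untouched;
}
```

Reading the high-water mark in the debugger after the USB hang shows directly whether the stack ever came close to its limit. A pushed value that happens to equal the paint pattern would be miscounted, so the result is a slight underestimate at worst.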

What I was thinking: could it be that the M4 main loop is accessing memory and thereby blocking the memory bus for the M0App USB code?

Content originally posted in LPCWare by rocketdawg on Tue Aug 05 11:05:29 MST 2014
Did you include a hard fault exception handler in your project?
Joseph Yiu has one, and there are other versions; search for them.
This may tell you where in your code you crashed the system.
Generate a map file; it will tell you the sizes and placement of objects.
It sure sounds like a stack problem.
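A handler in the style described above reads the exception frame the core pushed on entry and records the faulting PC. This is a minimal sketch for the Cortex-M4, assuming GCC; the frame decoding is plain C and can be tried on a PC, while the naked assembly stub and the fault status register read are only compiled for ARMv7-M targets. The Cortex-M0 cores need a different stub, because ARMv6-M has no IT blocks and no `tst` with an immediate. The names `hard_fault_handler_c` and `last_fault_frame` are illustrative.

```c
#include <stdint.h>

/* Registers pushed automatically by the core on exception entry,
 * in stacking order. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exc_frame_t;

static exc_frame_t decode_exception_frame(const uint32_t *sp) {
    exc_frame_t f;
    f.r0 = sp[0];  f.r1 = sp[1];  f.r2 = sp[2];  f.r3 = sp[3];
    f.r12 = sp[4]; f.lr = sp[5];  f.pc = sp[6];  f.xpsr = sp[7];
    return f;
}

#if defined(__ARM_ARCH_7EM__) || defined(__ARM_ARCH_7M__)
/* Kept in globals so the debugger can inspect them after the fault. */
volatile exc_frame_t last_fault_frame;
volatile uint32_t last_cfsr;

void hard_fault_handler_c(const uint32_t *sp) {
    exc_frame_t f = decode_exception_frame(sp);
    last_fault_frame = f;
    last_cfsr = *(volatile uint32_t *)0xE000ED28u;  /* SCB->CFSR */
    for (;;) { }  /* halt here; last_fault_frame.pc is the faulting PC */
}

/* Pick MSP or PSP according to EXC_RETURN bit 2, then call the C part. */
__attribute__((naked)) void HardFault_Handler(void) {
    __asm volatile(
        "tst lr, #4              \n"
        "ite eq                  \n"
        "mrseq r0, msp           \n"
        "mrsne r0, psp           \n"
        "b hard_fault_handler_c  \n");
}
#endif
```

Feeding the recorded PC into the map file then shows which function faulted.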

Using a heap in an embedded project is always a concern.
Heap fragmentation can cause malloc() to fail and return NULL. Do you check for it, and then what do you do?
If you don't check it and start writing to address NULL, well...
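One way to make sure no call site forgets the check is to route every allocation through a wrapper. This is a minimal sketch; `xmalloc`, `out_of_memory_hook` and the failure counter are hypothetical names, and on the target the hook would more likely record the failure and halt than return.

```c
#include <stddef.h>
#include <stdlib.h>

/* Counts allocation failures; on the target this hook would more likely
 * record the request and halt, so the bug is caught at its source. */
static unsigned alloc_failures;

static void out_of_memory_hook(size_t requested) {
    (void)requested;
    alloc_failures++;
}

/* malloc() that never lets a NULL slip through unnoticed. */
static void *xmalloc(size_t n) {
    void *p = malloc(n);
    if (p == NULL)
        out_of_memory_hook(n);
    return p;
}
```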

The core is fine: we have one dev board that has been running a simple RTOS test app with 10 threads, interrupts and DMA for a month now.

Content originally posted in LPCWare by DannyBergh on Tue Aug 05 04:05:41 MST 2014
One problem fewer: the SDRAM seems to work quite well now. The MSC configuration has been changed to use the SDRAM, and the disk size is now 64 MB. Using the USB MSC just as in the example, it is not very stable. The code I described above is disabled; only the plain MSC code, unrelated to that scheme, is running now.

It seems that the more code I run on the M4 core, the less stable it gets. Without any code in the main loop, I can copy a 64 MB file to the disk more than 10 times. With an extra function call in the main loop that just returns false, the example struggles and can barely complete one copy. The stack and heap sizes are dynamic, right?

Content originally posted in LPCWare by DannyBergh on Sun Aug 03 22:53:48 MST 2014
Does anyone have any ideas? In the meantime I got the external SDRAM working on our hardware board, so maybe later this week it will be hooked up to the USBD MSC device.

LPC43xx