SPIFI boot vs. USB/JTAG boot: differences

lpcware · 06-15-2016

Content originally posted in LPCWare by mch0 on Wed Sep 24 03:47:22 MST 2014
Hello,

I recently wanted to boot my almost completed application directly from SPIFI ("cold").
Not only did it fail to boot, it also left the chip in a state where the JTAG debugger could no longer connect.
The only way to bring it back to life was to boot into USB.

The application was working perfectly under JTAG debugging (also from SPIFI).

I have now got it into working order (cold boot), but I'm still not happy.

1. Oscillator
One major difference is the clock setup.
With USB the BOOT ROM enables the XTAL oscillator, so when the app. starts, it's already configured and running.
When booting from SPIFI the clock is generated by the IRC and the CPU is supposed to run at 96 MHz.
So the app. has to enable the XTAL oscillator on its own. No problem, my app did that anyway, but I think the LPCOpen 2.12 supplied function void Chip_Clock_EnableCrystal(void) may contain two flaws.
1. The high-speed (HF) bit is not set correctly when Fosc < 20 MHz, because the function never clears it and it is already set by default. The fix is to replace the line
OldCrystalConfig &= (~1); with OldCrystalConfig &= (~5);.
2. The final wait may be too short. Since the CPU might run at up to 200 MHz at this point, an empty loop counting 0...1000 seems too short.
I'd rather use 10000 or so.
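
For illustration, the two suggested changes can be sketched as plain C helpers that run on a host PC. This is a minimal sketch, not the LPCOpen code: the helper names, the bit-position macros (as I understand the XTAL_OSC_CTRL layout), the 250 µs settling target and the 4 cycles per loop iteration are assumptions made for the example. Only the ~5 mask and the 20 MHz threshold come from the post above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit layout of the crystal oscillator control register.
   Check the user manual of your part before relying on it. */
#define XTAL_ENABLE_BIT (1u << 0) /* assumed: 1 = oscillator powered down */
#define XTAL_BYPASS_BIT (1u << 1) /* assumed: 1 = bypass mode */
#define XTAL_HF_BIT     (1u << 2) /* assumed: 1 = high-frequency mode */

/* Hypothetical helper: derive the new register value from the old one.
   Clearing ENABLE and HF together is the "&= ~5" fix; HF is set again
   only for crystals at or above 20 MHz. */
uint32_t xtal_osc_ctrl_value(uint32_t old_cfg, uint32_t fosc_hz)
{
    uint32_t cfg = old_cfg;

    cfg &= ~XTAL_BYPASS_BIT;                 /* leave bypass mode */
    cfg &= ~(XTAL_ENABLE_BIT | XTAL_HF_BIT); /* same mask as ~5 */
    if (fosc_hz >= 20000000u) {
        cfg |= XTAL_HF_BIT;
    }
    return cfg;
}

/* Hypothetical helper: number of empty-loop iterations needed to wait
   delay_us microseconds at cpu_hz, given an assumed cost per iteration. */
uint32_t delay_loop_count(uint32_t cpu_hz, uint32_t delay_us,
                          uint32_t cycles_per_iter)
{
    assert(cycles_per_iter > 0u);

    uint64_t cycles = (uint64_t)cpu_hz * delay_us / 1000000u;
    return (uint32_t)((cycles + cycles_per_iter - 1u) / cycles_per_iter);
}
```

Under these assumptions, 1000 iterations at 200 MHz cover only about 20 µs (1000 x 4 cycles / 200 MHz), while a 250 µs wait needs roughly 12500 iterations, which is in line with the suggested 10000 or so.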

2. malloc()
This is for me the more serious of the two differences I have found.
I use the Redlib no-host variant. I also used the variant where Redlib uses malloc() to print "all-in-one", not char-by-char.
When booting "cold" from SPIFI the code crashes, probably within malloc(). The same code started "warm" from USB/JTAG does not crash. The start of heap looks OK, it's the same anyway.
When I switch to "char-by-char" the problem goes away and I can cold boot.

Although it works, I'm not happy because I do not know why the first version does not work.
It could be sheer luck that it works now and that's not good enough for the field :(

Anyone any idea about the malloc()-problem and/or where to look for remedy?

Mike

Content originally posted in LPCWare by mch0 on Thu Oct 02 07:58:25 MST 2014
Done.

Your new version also works for the real (big) project.
I will now continue to use it for more extensive testing.
So no (further) news is good news :).

I may still finally switch to "nanolib", because the original memory footprint advantage of "redlib" vs. "newlib" mostly vanished with "nanolib".

But until then I'll use "redlib" with your alloc.o. This should give your improvement more test time (for other users).

Again thanks + best regards,

Mike

Content originally posted in LPCWare by lpcxpresso-support on Thu Oct 02 07:27:10 MST 2014
Please try again with the new version attached and let me know how you get on. This should fix your issue, as well as the one I introduced in the previous attachment 0:)

Regards,
LPCXpresso Support

Content originally posted in LPCWare by mch0 on Thu Oct 02 05:07:17 MST 2014
Hi,

a very quick test with the real setup seems to indicate: problem gone!

I did not alter my source code, just went back to redlib/nohost/"buffered" printf().

With the original alloc.o in the redlib library: immediate crash.
With your new alloc.o: no crash, cold boot OK.

Thanks!

Mike

Update:
Just saw your new comment.
OK, then I'd better continue with "nanolib" for now.
It was important for me, though, that this particular problem was not within my code.
Therefore I appreciated your fast response, even if the final solution may take some time.

Content originally posted in LPCWare by lpcxpresso-support on Thu Oct 02 04:50:15 MST 2014
Having done some more testing locally, I've spotted a new problem introduced with my fix for your issue. I've thus removed my previously posted alloc test object. I'll post an updated version once I've fixed and done more extensive testing.

Regards,
LPCXpresso Support

Content originally posted in LPCWare by lpcxpresso-support on Thu Oct 02 01:15:00 MST 2014
I now have a test build of an updated Redlib malloc() that I would like you to evaluate. This can be found in the attached ZIP file.

To use this, select your project in the Project Explorer view, right click to display the context sensitive menu and choose New -> Folder, calling the new folder 'objects'.

Now extract alloc.o from the ZIP file and place it into the 'objects' folder that you just created.

Now open Project -> Properties and go to

C/C++ Build -> Settings -> MCU Linker -> Miscellaneous

and click on the Add button in the "Other objects" box, choose the "Workspace" option, then navigate to alloc.o. Repeat as appropriate for other Build Configurations.

Then do a full clean and rebuild. Please report back as to whether this solves your problems.

Note - the alloc.o supplied is suitable for use with Cortex-M4, configured for "FPv4-SP (Soft ABI)" only, so works fine with the test project you supplied. But it won't work with other CPU or FPU settings.

Regards,
LPCXpresso Support

Content originally posted in LPCWare by lpcxpresso-support on Wed Oct 01 01:18:29 MST 2014
Thanks for your test case, which has allowed us to quickly reproduce the problem.

Anyway, this appears to be a very subtle (and normally completely benign) issue in the way the Redlib memory allocator works. It is **only** triggered because of the side effects of the call to spifi_HW_ResetController(), which is why this has not been spotted previously.

We need to do some more investigation and testing, but we will fix this in our next product release.

Regards,
LPCXpresso Support

Content originally posted in LPCWare by mch0 on Tue Sep 30 04:41:42 MST 2014
Hi,

I have found some time earlier than expected.
Please find attached the stripped-down project that still exhibits the strange behaviour.

I have also changed the code such that it runs on a second LPCLink2, so that you can directly download the code.

The setup here is now:
- one LPCLink2 as debug probe
- one LPCLink2, configured for SPIFI boot, as target

On USART2 TxD you can see the output of the printf(), if needed.
This is not necessary for the actual problem to occur; it just serves as proof.

Right now the code is compiled with the buffered printf(), ready for download.
It will execute the first printf() and die on the second.

If you then compile with CR_PRINTF_CHAR defined, it will execute both printf().
See also my comments in main().

For cold boot, the same:
The first version dies, the second runs as expected.

The "real" project is much larger, so you might see unused stubs or functionality in the code.
I have removed the "nanolib" branch, since the problem is triggered already.
With nanolib it also works (no problem).

Best regards,

Mike

Content originally posted in LPCWare by mch0 on Mon Sep 29 13:17:14 MST 2014
Hi,

thanks for the offer to look into the problem.
It will take some time to get the project back to the state when it failed with redlib.

Indeed, I do not terminate (some) text output with "\n". This is no problem for me; I just need to add an fflush().

Yes, most of the time I could get away with copying only some routines into the internal RAM.
But since I also need to be able to erase ALL of the external SPIFI, it's easier to become completely independent of the SPIFI. So I copy everything and don't have to think about whether I must disable some interrupts just because the vector table or some called function isn't available when an interrupt arrives while the SPIFI is in command mode.
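
The copy-everything approach boils down to a word-wise copy from the SPIFI load address to the SRAM run address before any code that depends on it executes. A minimal host-runnable sketch, assuming a hypothetical section table (in a real project the addresses and lengths would come from the linker script; the names here are illustrative and are not the LPCXpresso startup code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one region that must run from RAM. */
struct copy_section {
    const uint32_t *load_addr; /* where the image is stored (e.g. SPIFI) */
    uint32_t *run_addr;        /* where it executes (e.g. internal SRAM) */
    size_t len_bytes;          /* assumed to be a multiple of 4 */
};

/* Copy every listed section word by word. On a target this would have to
   run from the one routine that stays in flash, before anything that lives
   in RAM is called. */
void copy_sections_to_ram(const struct copy_section *table, size_t count)
{
    assert(table != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        const uint32_t *src = table[i].load_addr;
        uint32_t *dst = table[i].run_addr;

        for (size_t n = 0; n < table[i].len_bytes; n += sizeof(uint32_t)) {
            *dst++ = *src++;
        }
    }
}
```

Once the vector table and all called functions are part of such a table, nothing needs to be fetched from the SPIFI while it is in command mode.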

Your assumption about LPCLink2 as a probe is correct.

I'll come back with a hopefully minimal project showing the effect later, may take several days.

Best regards,

Mike

Content originally posted in LPCWare by lpcxpresso-support on Thu Sep 25 11:39:15 MST 2014

Quote: mch0

However the problem does not show up at all with nanolib.
Nanolib seems to buffer printf() internally also, because I had to add a fflush(stdout) to get the last line actually printed (to my _sys_write()).
So nanolib also seems to use some sort of malloc(), but does not cause problems so far. Cold boot OK.

Yes, Nanolib does use malloc. However, its allocator is different from Redlib's. The fflush() issue makes it sound like you did not write a newline (i.e. a "\n") at the end of your printf output.
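
As a minimal standard C sketch of this point: output that does not end in "\n" can sit in the stdio buffer until it is flushed explicitly. The helper name is hypothetical, and whether stdout is line buffered at all depends on the C library and the retargeting layer in use.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: print text that has no trailing newline and push it
   out immediately. Returns 0 on success, a negative value or EOF on error. */
int print_partial_line(const char *text)
{
    assert(text != NULL);

    if (printf("%s", text) < 0) {
        return -1;
    }
    /* Without this call the text may stay buffered until a later "\n",
       a later fflush(), or program exit. */
    return fflush(stdout);
}
```

Ending each message with "\n" avoids the explicit flush on a line-buffered stream; the fflush(stdout) call covers the cases where a partial line has to appear right away.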

Quote:

Program executes entirely from internal SRAM (local SRAM 0x10000000), only routine left in SPIFI is reset_isr().

That's probably more than you really ought to copy. I suggest you take a look at the example at:

http://www.lpcware.com/content/forum/running-code-ram#comment-1135399

This is for an LPC4337 with internal flash, but the principles port over to external SPIFI.

Anyway, given your description, I'm now wondering if things are working under the debugger because of something a redlink script is doing (I assume you are debugging via an LPC-Link2). Again, I need a project, or at least map files and debug logs to investigate further.

Regards,
LPCXpresso Support

Content originally posted in LPCWare by mch0 on Thu Sep 25 11:03:23 MST 2014
I'll try to reconfigure the system back to the failing state, which may take some time.
In the meantime I have altered the project to use nanolib, simply because I can get sources there and in case of a failure I can try to find the cause.
However the problem does not show up at all with nanolib.
Nanolib seems to buffer printf() internally also, because I had to add a fflush(stdout) to get the last line actually printed (to my _sys_write()).
So nanolib also seems to use some sort of malloc(), but does not cause problems so far. Cold boot OK.

General info:

LPC4370
LPCXpresso 7.4.0 (=latest)
Custom Board (only external memory is SPIFI)
SPIFI: S25FL064P
Program executes entirely from internal SRAM (local SRAM 0x10000000), only routine left in SPIFI is reset_isr().

I'll hopefully be able to provide more info later.

Mike

Content originally posted in LPCWare by lpcxpresso-support on Thu Sep 25 00:35:30 MST 2014

Quote: mch0

2. malloc()
This is for me the more serious of the two differences I have found.
I use the redlib() no-host variant. I also used the variant where redlib uses malloc() to print "all-in-one", not char-by-char.
When booting "cold" from SPIFI the code crashes, probably within malloc(). The same code started "warm" from USB/JTAG does not crash. The start of heap looks OK, it's the same anyway.
When I switch to "char-by-char" the problem goes away and I can cold boot.

Although it works, I'm not happy because I do not know why the first version does not work.
It could be sheer luck that it works now and that's not good enough for the field :(

Anyone any idea about the malloc()-problem and/or where to look for remedy?

Nothing springs to mind as to what might trigger your problems.

The main difference between the two printf variants here is that one uses malloc and the other doesn't. However, if you are linking with the nohost library variant you will still get malloc pulled in (even if your app doesn't use it), as it is used for grabbing memory for the file I/O streams.

Could you clarify the actual circumstances of the failure, though? Are you saying that when you debug your image (programming it into SPIFI flash), it runs correctly, but if you then disconnect the debugger and probe and power your board off and on, your application does not run correctly?

One thing you could also try after power cycling your board is starting an attach-only debug connection to see if this provides any clues.

http://www.lpcware.com/content/faq/lpcxpresso/debugging-running-system

If you need further assistance, could you provide an actual project that shows up the problem?

http://www.lpcware.com/content/faq/lpcxpresso/how-importexport-projects

Or at least post the linker map files created for both the "all-in-one" and the "char-by-char" printf variant builds. Debug logs from when you start a debug session for each might be interesting too:

http://www.lpcware.com/content/faq/lpcxpresso/debug-log

Also, please can you confirm the exact part you are using, which SPIFI device, and also whether this is your own board or an off-the-shelf one (in which case, which one)? And finally, please confirm your LPCXpresso version.

Regards,
LPCXpresso Support