Vybrid QSPI boot flow

ogj
Contributor IV

I'm hoping someone out there is familiar with the boot flow of the Vybrid processor. I have an OTS SOM that will boot the A5 core from QSPI flash (located on the SOM) in DDR x2 mode, with the code ending up in DDR memory. I’m trying to fill in the data structures and understand the boot flow. I understand the QuadSPI Configuration Parameter block (Table 7-21) and the basics of the IVT.

My understanding of the first part of the boot flow with QSPI is:

  • The QSPI clock is configured to run at 18 MHz. What speed is the processor clock?
  • The QSPI pins are configured.
  • A basic QSPI read is done starting at flash location 0 (318 bytes) to get the configuration parameters.
  • The QSPI controller is re-configured per the parameters.
  • The clock is re-configured to run the QSPI controller at 66 MHz (from a parameter). What does this do to the processor clock?

I assume that the following (or something like it) is what happens next:

  • The boot loader reads the first 4 KB(??) from flash, starting at location 0, into OCRAM starting at 0x3f00_0000(??). It then knows that the IVT will be at 0x3f00_0400.
  • It executes any instructions in the DCD (set the processor clock to 396 MHz and enable the DDR controller).
  • Using the info in the BOOT data struct, Table 7-54 (start and length), “length” bytes (the code image) are loaded into memory starting at location “start”. How does the boot loader know where the image is in flash memory?
  • The boot loader “jumps” to the entry point “entry” given in the IVT.

Questions

How much of the above is actually correct?

When the QSPI clock is at 66 MHz, what speed is the A5 core running at? (I assume 396 MHz)

What location is the “boot stuff” copied into in OCRAM? If it is 0x3f00_0000, what happens when the app code is also loaded into the same space? How many bytes are read in?

How does the boot loader know where the app code is in flash memory?

Can the QSPI controller function properly in DDR mode during boot? I haven’t seen any of the examples use it. They all use SDR mode.

kef2
Senior Contributor V

Hi,

I was quite familiar with it a few years ago, but I can still give some hints.

Using non-XIP mode (copy everything to RAM) makes sense with a big DDR RAM. With the quite small on-chip SRAM I use XIP mode, because most initialization routines and non-time-critical code can run happily from QSPI, saving RAM space for data. Time-critical code, as well as things like the code that writes data back to QSPI memory, is copied to RAM by the C startup routine. You may fill the DCD struct with DDR RAM initialization commands, but if you have doubts about how QSPI boot initializes the clocks, and even more doubts about how this may interfere with DDR RAM setup, then you should rethink which is better for you: XIP boot followed by clock reconfiguration for the best possible CPU and DDR RAM speeds, or non-XIP boot with suboptimal clocks. I don't know whether it's safe to reconfigure DDR without losing code already copied to DDR.

-The QSPI clock is configured to run at 18 MHz. What speed is the processor clock?

 

Well, in XIP mode it doesn't matter, I reconfigure clocks in my code.

  • -The QSPI pins are configured.
  • -Do basic QSPI read operation starting at flash location 0 (318 bytes) to get configuration parameters

1k (0x400) is read, containing QSPI configuration settings such as memory sizes, two-chip parallel or single-chip mode, CS hold/setup times, etc. With a parallel setup, this 1k is read from chip A.

  • -Re-configure the QSPI controller per the parameters

Yes

  • -Re-configure the clock to run the QSPI controller at 66 MHz (from parameter). What does this do to the processor clock?

Again, it doesn't matter for XIP boot.

   I assume that the following (or something like it) is what happens next:

-The boot loader reads the first 4KB(??) from flash starting at location 0 into OCRAM memory starting at 0x3f00_0000(??). It then knows that the IVT will be at 0x3f00_0400.

It's not clear whether it is copied to OCRAM or handled on the fly, but the DCD table must be processed next; otherwise, how could you perform, for example, DDR RAM setup before copying data to it? BTW, you may try using the DCD to reconfigure the device clocks for an optimal setup!

  • -Executes any instructions in the DCD (set processor clock to 396 MHz and enable the DDR cntlr).

-Using the info in the BOOT data struct Table 7-54 (start and length), “length” bytes (the code image) are loaded into memory starting at location “start”. How does the boot loader know where the image is in flash memory?

  • -Boot loader “jumps” to the entry point “entry” given in the IVT.

 

Yes, something like this.

When the QSPI clock is at 66 MHz, what speed is the A5 core running at? (I assume 396 MHz)

I don't know, but it doesn't matter in XIP mode. Also you may craft your DCD table to reconfigure it like you wish.

You may have problems debugging the DCD; try using the device clock monitor pins. Yes, you need to set them up from the same DCD table. Since I used XIP, my DCD has 0 commands, just the table size and DCD version.

What location is the “boot stuff” copied into in OCRAM? If it is 0x3f00_0000, what happens when the app code is also loaded into the same space? How many bytes are read in?

Since the datasheets don't specify reserved boot locations in OCRAM, you shouldn't worry about it. I'm sure all RAM is available for your purposes.

How does the boot loader know where the app code is in flash memory?

The source is the lowest address of QSPI memory. The destination is specified in the Boot data structure. (There is a small complication with the source image in parallel mode. The first kilobyte is read from device A, 0..0x3FF. The rest is read in parallel mode, which means device addresses 0x400..0x4FF correspond to destination offsets 0x??400..0x??5FF, 2x more data.)

Can the QSPI controller function properly in DDR mode during boot? I haven’t seen any of the examples use it. They all use SDR mode.

Yes, why not. You just need to set up the configuration struct accordingly. I tried to squeeze as much as possible out of my QSPI memory: it has quad pin mode and also supports DDR mode, but at a slower clock speed. 99 MHz SDR performed faster than the fastest available DDR mode.

Hope this helps

Edward

ogj
Contributor IV

I took a closer look at the Vybrid QuadSPI in the Datasheet (Rev 9 1/2018). The max frequency given in Table 49 (SDR mode) of 80 MHz and Table 51 (DDR mode) of 45 MHz are for writes to the flash - not reads. Obviously writes are slower than reads. The read time for DDR mode is:

The Tck shown in this diagram is not the same frequency as the one shown in the write diagram. My SOM has 80 MHz devices so I think running at 80 MHz is not out of the question. Just have to slow the clock for writes.

kef2
Senior Contributor V

Good eye! But what about the address and command phases of an instruction? Those are still output/write, which is what the table title "QuadSPI Output/Write timing (xDR mode)" talks about. I would love a better 1/Tck limit for reads. If DDR with learning works up to 108 MHz on the Vybrid Tower Board, I'd love 80 or even 66 MHz.

BTW I was testing it in parallel mode.

  •    I also noticed that I was only using 2 dummy cycles in DDR x4 mode. I increased that to 4 and DDR x4 started working.

You should not guess these figures but check your memory datasheet. It is not easy to work out, though, even after reading and scrolling through the memory datasheet several times. For the Spansion S25FL128S the numbers of dummy cycles are specified in four tables:

tables810-811.png

tables812-813.png

There are two tables for "high performance" and two tables for "enhanced performance". Which pair applies depends on the part, which you can determine from the ordering code (with or without the enhanced feature) or by reading the part information from the chip. LC in the tables is configurable: you may reprogram a register inside the memory chip(s) for one LC setting or another, and thus for more or fewer dummy cycles. But you can't pick just any LC setting; you must follow the frequency requirements in these tables. Also, keeping the DLP feature in mind, only settings with more than 4 cycles will work with DLP, so you need 5, 6, ... dummy cycles. It was not clear to me whether the S25s on the Tower board are enhanced high performance or not; fortunately the ED/EE commands have the same dummy cycles in both chip variants. The default LC=00 setting is already fine for DLP. Of course it is a wonder that DDR reading was working up to 108 MHz, while the memory tops out at 66.

You said you used PLL1 PFD3 and /3/2/2 dividers. 528/3/2/2 = 44, so it looks like you are not using the PFD but bypassing it? Why not lower the divider and use the finer granularity of the PFD clock setting to reach 45 MHz? I used the second suggestion for QSPI in Table 6-5, Typical PFD Configuration: PLL3 PFD4. This is the pll3_pfd4 fraction setting:

ANADIG->PLL3_PFD =   (ANADIG->PLL3_PFD & ~ANADIG_PLL3_PFD_PFD4_FRAC_MASK)
                | ANADIG_PLL3_PFD_PFD4_FRAC(31);

Fpll3 = 480MHz.

Fpfd4 = Fpll3 * 18 /  pfd4_frac; where pfd4_frac ranges from 12 to 35

Dividers 1/2/2 should be fine up to and even above 480/1/2/2 = 120 MHz. Fine tuning of the QSPI clock is done by changing PFD4_FRAC. PFD4_FRAC = 21, 20, 19 means 102.9, 108, 113.6 MHz. With the higher PLL1 clock (528 MHz) the steps land on slightly different frequencies, so perhaps I could reach a slightly faster DDR clock? Not really: 113.1 vs 113.6 MHz, still a step of over 5 MHz from 108.
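The PFD arithmetic above is easy to check with a few lines of C. This is only a minimal sketch: the function names are made up for illustration, and the three divider arguments simply model the /1/2/2 chain mentioned above rather than any particular register fields.

```c
#include <stdint.h>

/* PFD output: Fpfd = Fpll * 18 / frac, with frac limited to 12..35
 * (formula and range as quoted in the post). Returns 0 for an invalid frac. */
uint32_t pfd_hz(uint32_t pll_hz, uint32_t frac)
{
    if (frac < 12u || frac > 35u) {
        return 0u;
    }
    /* 64-bit intermediate so that pll_hz * 18 cannot overflow. */
    return (uint32_t)(((uint64_t)pll_hz * 18u) / frac);
}

/* QSPI clock after a chain of three integer dividers (hypothetical model). */
uint32_t qspi_clk_hz(uint32_t pll_hz, uint32_t frac,
                     uint32_t d1, uint32_t d2, uint32_t d3)
{
    if (d1 == 0u || d2 == 0u || d3 == 0u) {
        return 0u;
    }
    return pfd_hz(pll_hz, frac) / (d1 * d2 * d3);
}
```

With a 480 MHz PLL and dividers 1/2/2, frac values 21, 20 and 19 give roughly 102.9, 108 and 113.7 MHz, matching the figures quoted above to within rounding.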

Edward

ogj
Contributor IV

Thanks for the feedback. I dropped the speed to 44MHz (PLL1:PFD4 @ 528 MHz /3 /2 /2 = 44 MHz), and I also noticed that I was only using 2 dummy cycles in DDR x4 mode. I increased that to 4 and DDR x4 started working. I know I had this working at 66 MHz at one time but in changing so many things, I must have changed the number of dummy cycles. It's very tempting to run faster, but for now I'm going to leave it at 44 MHz. I am using 0x34 as the DLP value. Since I have two identical parts I might try parallel in the future. Thanks again.

ogj
Contributor IV

I finally tracked this down: the SOM that I am using (manufactured by Emcraft) can't reliably run at 66 MHz DDR x4. The best it can do (at 66 MHz) is x2, even though the flash devices are rated at 80 MHz. I don't know whether this is a data transfer issue or what, but I don't have any more time to spend on it. I attached a couple of pictures of the data I'm seeing. Let me know what you think. Were you running the flash at 66 MHz or some other speed?

Correct.jpg

Incorrect.jpg

kef2
Senior Contributor V

Hi,

Well, the first thing users should do is read the specifications :-). And vybridsec.pdf rev9 specifies max QSPI 1/Tck = 80 MHz for SDR mode and max 1/Tck = 45 MHz for DDR mode. An almost 2x worse DDR clock spec kind of negates the usefulness of DDR mode on Vybrid.
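To put numbers on that, here is a rough peak-throughput comparison. It is a simplified sketch that counts only the data phase (command, address and dummy cycles are ignored), and the function name is invented for illustration.

```c
#include <stdint.h>

/* Peak data-phase rate in bytes per second.
 * SDR moves one bit per pin per clock, DDR moves two.
 * Command, address and dummy overhead is deliberately ignored. */
uint32_t qspi_peak_bytes_per_sec(uint32_t clk_hz, uint32_t data_pins, int ddr)
{
    uint64_t bits_per_clock = (uint64_t)data_pins * (ddr ? 2u : 1u);
    return (uint32_t)(((uint64_t)clk_hz * bits_per_clock) / 8u);
}
```

For four data pins this gives 40 MB/s at 80 MHz SDR and 45 MB/s at 45 MHz DDR, so at the specified limits DDR buys only a small gain on paper.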

Yes, your pictures look like the DDR sampling points are bad. Did you try changing the DDR sampling point settings? Perhaps this could help.

Edward

ogj
Contributor IV

That explains a lot. Since I didn’t design the SOM, I didn’t pay enough attention to the specs. I thought I had seen 80MHz for DDR mode and 132 MHz for SDR mode. Turns out that was for the flash. It didn’t dawn on me to check the processor spec’s. I’m using DLP when in DDR mode. I’m under the impression that DLP sets the sample point automatically. Do you know if that is true?

kef2
Senior Contributor V

Yes, DLP should help set the DDRSMP sample point automatically, but:

1) Are you sure your memory chip is transmitting the DLP? S25FLxxx parts have a configurable latency code setting, and not all settings have enough dummy clocks to allow a DLP transfer. There's also the DLP pattern setting: are you sure it is something useful like 0x34 (it should toggle on clock edges as well as in the middle between two clock edges) and not zero or 0xFF?

2) Did you check the Vybrid QSPI status bit DLPFF? Once the QuadSPI module executes the DATA_LEARN sequence (what you show in your picture), it should set the DLPFF bit if there is a problem determining DDRSMP.

(Some) smaller QSPI memories don't have DDR instructions, so I decided to go for better availability: my code works well with bigger and smaller memories, so no DDR.

Regards

Edward

kef2
Senior Contributor V

You've made me curious to try DATA_LEARN. I got the Tower board reading QSPI in 4x DDR mode at 45 MHz. Then I enabled DATA_LEARN and easily raised the frequency to 108 MHz with no data failures. QuadSPI-SR-DLPSMP changes by a few units up and down with frequency. With the DATA_LEARN feature disabled I can't go above 49 MHz without struggling with the DDR sampling settings. Nice feature, but the specifications still don't allow going above 45. Perhaps it is an error, I don't know.

With DATA_LEARN enabled I see read failures even at 45 MHz until the Vybrid and memory DLP patterns match. A DLP match is not enough. It is also necessary for the DATA_LEARN number-of-pins argument to match the real number of pins. This is a bit weird. The Spansion QSPI memory says it transfers the same DLP on all pins, so an 8-bit DLP should take the same 4 clocks to transfer in all of the 1x, 2x and 4x pin modes. So perhaps Vybrid checks all pins or something, I don't know.

This is an excerpt from the LUT initialization. The #if 1 case is for DATA_LEARN disabled and the #else case is for DATA_LEARN enabled. The Spansion Quad I/O read command (0xED/0xEE) with the default LC==0 latency code setting has the same amount of dummy bits (6) in both the HPLC and EHPLC tables. So you can see two LUT variants: 6 dummy bits without learning, and 2 dummy + 4 data learn bits:

  // SEQID 8 - Quad DDR Read
  // quad ddr read - 24 bit addresses
  QSPI->LUT[32] =   LUTi0(lCMD,        pOne,  0xED)
                  | LUTi1(lADDR_DDR,   pFour, 24);
  QSPI->LUT[33] =   LUTi0(lMODE_DDR,   pFour, 0x55) // <--- complementary nibbles = continue (ex. 0xA5)
#if 1
                  | LUTi1(lDUMMY,      pFour, 6);
  QSPI->LUT[34] =   LUTi0(lREAD_DDR,   pFour, 128)  // read 128 bytes    // 24013a80
                  | LUTi1(lJMP_ON_CS,  pOne,  0);
  QSPI->LUT[35] = 0;
#else
                  | LUTi1(lDUMMY,      pFour, 2);
  QSPI->LUT[34] =   LUTi0(lDATA_LEARN, pFour, 0x34)
                  | LUTi1(lREAD_DDR,   pFour, 128); // read 128 bytes
  QSPI->LUT[35] =   LUTi0(lJMP_ON_CS,  pOne,  0);
#endif
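
The LUTi0/LUTi1 macros and the l.../p... constants are not shown in the excerpt. A possible definition is sketched below, assuming the LUT word layout used by the Linux spi-fsl-qspi driver: two 16-bit instructions per 32-bit register, each with a 6-bit opcode, a 2-bit pad field and an 8-bit operand. The opcode numbers and pad encodings here follow that assumption and should be checked against the reference manual before use.

```c
#include <stdint.h>

/* ASSUMED pad encodings: 1 pin = 0, 2 pins = 1, 4 pins = 2. */
enum { pOne = 0, pTwo = 1, pFour = 2 };

/* ASSUMED opcode numbers (only the ones used in the excerpt). */
enum {
    lCMD        = 1,
    lDUMMY      = 3,
    lJMP_ON_CS  = 9,
    lADDR_DDR   = 10,
    lMODE_DDR   = 11,
    lREAD_DDR   = 14,
    lDATA_LEARN = 16
};

/* One 16-bit LUT instruction: opcode[15:10] | pad[9:8] | operand[7:0]. */
#define LUT_INSTR(op, pad, opr) \
    ((uint32_t)((((op) & 0x3F) << 10) | (((pad) & 0x3) << 8) | ((opr) & 0xFF)))

/* Instruction 0 sits in the low half-word, instruction 1 in the high one. */
#define LUTi0(op, pad, opr) (LUT_INSTR(op, pad, opr))
#define LUTi1(op, pad, opr) (LUT_INSTR(op, pad, opr) << 16)

/* Example: the read + jump-on-CS word from the non-learning variant. */
uint32_t lut_read_ddr_then_jump(void)
{
    return LUTi0(lREAD_DDR, pFour, 128) | LUTi1(lJMP_ON_CS, pOne, 0);
}
```

Under these assumptions the example word evaluates to 0x24003A80.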

Edward

ogj
Contributor IV

I ran into an interesting phenomenon recently. As you know, I am trying to write a "loader" program that gets loaded by the Vybrid ROM on reset. The loader program (executing from OCRAM) initializes the clocks and DDR controller and, using an optimized QSPI driver, copies my main program (sitting in QSPI flash) into DDR memory. This avoids using the DCD to set up the QSPI, DDR, or clocks.

In doing this, I've written a QSPI driver based on the one in u-boot. One of the problems I'm having is that it produces different results depending on whether I use it with my loader app (which is bare metal) or with my main app (which runs under MQX). I don't use the QSPI driver built into MQX because of some problems it has. The biggest issue is that when I use my driver under MQX and look at the data using IAR's examine-memory function, it appears the way it should. When I use the exact same driver in my bare metal app, the data is all messed up (nibbles out of order). If I halt my loader program and examine the flash memory (0x20000000+), the nibbles are out of order. When I halt my main app and examine the same area, it looks fine. I even checked the endian bit, but it's the same in both apps (little endian). I examined the memory with J-Flash, which is independent of my code, and it shows the flash contents to be correct. There is no difference in the driver between the two apps. The only difference I can see is the memory copy function used in the driver. In the bare metal app, memory copy (as supplied by IAR) is done using load/store multiple instructions. Under MQX, memory copy (as supplied by Freescale) is done using the Neon coprocessor.

The IAR debugger has an examine-memory function. The really crazy thing is that when I use this function to examine flash memory (0x20000000+) while running the two apps (loader and main), the results are different. Nothing is being reprogrammed (of course I have to initialize the QSPI controller, but I checked, and the registers are set up the same way in both apps).

I can program my loader to manually rearrange the nibbles to make them correct, but that takes way too much time. Any ideas?

kef2
Senior Contributor V

Do you really see nibbles swapped and not bytes? Are you using QSPI DDR mode? If you had 4-pin mode + DDR, then perhaps a bad phase delay could produce a >>4 or <<4 shift and a memory view that looks like swapped nibbles?

I don't know what else could swap nibbles. You can't swap nibbles even in parallel QUAD mode; for that the Vybrid pin muxes would have to allow swapping pins QSPI_IOn_A with QSPI_IOn_B, which is not available.

Well, out of curiosity I tried all QSPI modes in the past: 1x, 2x, 4x, with and without DDR. All worked well. Bad things may of course happen if you fail to set up the QSPI lookup table (QSPI commands) properly, but I have no idea if and how that could lead to a nibble swap.

Ah, do you see these problems when debugging from the IDE or when booting from QSPI? If the problems occur only when booting, this may indicate that some part of your initialization assumes the reset default state, and something more is needed to reinitialize after the boot ROM. (The DS-5 debugger helped there. After QSPI was programmed, I unchecked the run-until-_main option, loaded only debug symbols, clicked reset in the debugger, set a HW breakpoint at the program entry point, and clicked continue. This let the boot ROM code execute and stop at my program entry point, which allowed me to figure out all the troubles.)

Edward

ogj
Contributor IV

Thanks for the reply. I do need to end up with the code in DDR memory for speed reasons. There are several plans of attack that I see:

1 – use the DCD to reconfigure the clocks to set the A5 core to 396 MHz and initialize the DDR (I have 512MB) and set up the BOOT DATA to copy my code from the flash to DDR

2 – forget the DCD and BOOT DATA structs and run a small routine that executes XIP and resets the clocks and reconfigures the QSPI controller to DDR x4 @ 66 MHz

My first question with both of these approaches is “can you reconfigure the QSPI controller while you’re running from it in XIP mode?”

Table 7-22 indicates that on bootup, the fastest speed of the QSPI controller in DDR mode is 18 MHz. I don’t know why it’s limited to that. Have you found otherwise? For speed reasons, it would be nice to run in DDR mode, but 18 MHz is a non-starter. My guess is that changing the processor clock speed while running in XIP mode is probably OK, but I don’t think you can change the QSPI controller setup while running in XIP mode. So I don’t think method 2 will work. You’re stuck with the fastest rate being SDR x4 @ 74 MHz which can be set up via the Configuration Parameters at powerup.

I’m still trying to figure out how method 1 works. I have a set of DCD parameters that I think will work for both clock setup and DDR configuration. The problem is I don’t understand how the BOOT DATA (Table 7-54) thing functions. I believe that the functionality it is “supposed” to provide is to copy (at least) the app code from flash and write it into physical memory. The issue is, I see a start address and a size, but that’s not enough to do a memcpy function. You need both a from address (in this case an address in flash) and a to address (in physical memory). The RM states that the “start” address in BOOT DATA is the absolute address of the image. Is this the “from” address or the “to” address and what is the value of the missing address? If I can figure this out, I can copy all of my app to DDR memory and then jump to the entry address in the IVT (although the copy routine speed might be limited to SDR x4 74 MHz).

Another approach might be to have the start address in BOOT DATA point to a “load routine” that gets copied into SRAM and can reconfigure the clocks and QSPI controller, and then load the main app at the higher transfer speed using my own memcpy routine. This has the disadvantage of course of taking the time to load the load routine, short as it may be.

Your thoughts?

kef2
Senior Contributor V

My first question with both of these approaches is “can you reconfigure the QSPI controller while you’re running from it in XIP mode?”

Good question. Of course you can’t shoot out your own tire while driving. After reinitializing the PLL and core clocks I execute QSPIinit() in OCRAM. This init routine first of all waits until QSPI is idle:

   // wait while busy before switching clk divider
   while(QSPI->SR & (QuadSPI_SR_BUSY_MASK | QuadSPI_SR_AHB_ACC_MASK | QuadSPI_SR_IP_ACC_MASK | QuadSPI_SR_AHBTRN_MASK | QuadSPI_SR_AHBGNT_MASK))
   {
   }

Then the init routine reconfigures QSPI for the best speed using the new PLL clock settings. Before leaving the init routine and returning from OCRAM to QSPI XIP, I do two more steps: 1) trigger an AHB read from an unrelated QSPI address and 2) invalidate the I-cache. I can’t explain why these are required; as I remember, my code was unable to operate properly without them.
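
The ordering of those steps can be written down as a small host-side model. This is only a sketch of the sequence, not the actual QSPIinit(): the status register and the three steps are simulated with plain variables, and every name below (sim_status, qspi_reinit_sketch, the STEP_ constants) is invented for illustration. On a real target the routine would have to be linked into RAM and would access the QuadSPI and cache registers instead.

```c
#include <stdint.h>

enum {
    STEP_RECONFIGURE       = 1, /* reprogram QSPI for the new clocks      */
    STEP_DUMMY_READ        = 2, /* read from an unrelated QSPI address    */
    STEP_INVALIDATE_ICACHE = 3  /* drop instructions fetched before this  */
};

/* Simulated hardware state: two "busy" bits that clear after a few polls. */
uint32_t sim_status = 0x3u;
int step_log[3];
int step_count = 0;

/* Stand-in for reading the QSPI status register. */
uint32_t read_status(void)
{
    uint32_t s = sim_status;
    sim_status >>= 1;   /* the simulated controller gradually goes idle */
    return s;
}

void log_step(int id)
{
    if (step_count < 3) {
        step_log[step_count++] = id;
    }
}

/* Must run from RAM on real hardware, never from the QSPI being changed. */
void qspi_reinit_sketch(uint32_t busy_mask)
{
    while (read_status() & busy_mask) {
        /* wait until the controller is idle */
    }
    log_step(STEP_RECONFIGURE);
    log_step(STEP_DUMMY_READ);
    log_step(STEP_INVALIDATE_ICACHE);
}
```

The point of the model is only the order: wait for idle, reconfigure, then do the dummy read and the I-cache invalidate before returning to code that executes from QSPI.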

And if you go the DCD way, hm, well, it's not as easy as I thought. The question is how the DCD is read from QSPI and processed: is it read into OCRAM and then processed, or is it read and processed on the fly? If on the fly, then most likely we are in trouble. I doubt it is possible to reconfigure QSPI without upsetting the readability of QSPI, but I'm not sure. My QSPI init routine involves disabling and re-enabling the QSPI controller. This of course is not an option for step-by-step DCD reads and writes to QSPI registers. Perhaps it's possible without the disable and re-enable steps, I don't know.

 

Table 7-22 indicates that on bootup, the fastest speed of the QSPI controller in DDR mode is 18 MHz. I don’t know why it’s limited to that. Have you found otherwise? For speed reasons, it would be nice to run in DDR mode, but 18 MHz is a non-starter. My guess is that changing the processor clock speed while

It is 18 MHz (the lowest possible?) because Vybrid doesn’t know in advance how cheap your QSPI memory is, whether it supports 2x, 4x or DDR, or even which read commands your memory chip uses in a specific 2x/4x/etc. mode. All these fast settings are read from the configuration struct in the bottom 1k of your QSPI memory.

Running in XIP mode is probably OK, but I don’t think you can change the QSPI controller setup while running in XIP mode. So I don’t think method 2 will work. You’re stuck with the fastest rate being SDR x4 @ 74 MHz which can be set up via the Configuration Parameters at powerup.

Well, I think I answered that above. With XIP boot I not only use a routine in OCRAM to reconfigure QSPI, I also occasionally store some NV parameters back to QSPI memory. As you may guess, the QSPI write routines also have to execute from RAM.

 

  

I’m still trying to figure out how method 1 works. I have a set of DCD parameters that I think will work for both clock setup and DDR configuration. The problem is I don’t understand how the BOOT DATA (Table 7-54) thing functions. I believe that the functionality it is “supposed” to provide is to copy (at least) the app code from flash and write it into physical memory. The issue is, I see a start address and a size, but that’s not enough to do a memcpy function. You need both a from address (in this case an address in flash) and a to address (in physical memory). The RM states that the “start” address in BOOT DATA is the absolute address of the image. Is this the “from” address or the “to” address and what is the value of the missing address? If I can figure this out, I can copy all of my app to DDR memory and then jump to the entry address in the IVT (although the copy routine speed might be limited to SDR x4 74 MHz).

Hm, which document are you referring to? In the Vybrid Reference Manual Rev. 7 it’s Table 19-40 and also Figure 19-20. The start field in the boot data specifies where to copy to. Yes, the boot data doesn’t specify the source argument for memcpy; when booting from QSPI0 the memcpy source is the bottom of QSPI0 (0x20000000). Yes, Figure 19-20 is confusing: it shows not only the IVT, as advertised in the figure title, but also the boot data fields. It is also not clear what the bottom address of Dest Memory in Figure 19-20 is. It would help to have a simple example with addresses given in the figure, the IVT and boot data struct settings, and the absolute addresses of the structs and fields.
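
As a rough illustration of that reading of the boot data, the copy can be modelled as below. This is a sketch of the interpretation given in this post, not confirmed ROM behaviour: the struct holds only the two fields discussed in the thread (start and length), and the type and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define QSPI0_BASE 0x20000000u  /* copy source: bottom of QSPI0, per the post */

/* Hypothetical mirror of the two boot data fields discussed in the thread. */
typedef struct {
    uint32_t start;   /* destination address of the image */
    uint32_t length;  /* number of bytes to copy          */
} boot_data_sketch;

/* Where a byte at a given QSPI address ends up if the copy starts at
 * QSPI0_BASE and writes to bd->start, i.e. offsets are preserved. */
uint32_t dest_addr_for_flash_addr(const boot_data_sketch *bd, uint32_t flash_addr)
{
    return bd->start + (flash_addr - QSPI0_BASE);
}

/* Host-side model of the copy: 'flash' stands in for memory at QSPI0_BASE
 * and 'dest' for memory at bd->start. */
void model_boot_copy(uint8_t *dest, const uint8_t *flash, const boot_data_sketch *bd)
{
    memcpy(dest, flash, bd->length);
}
```

With start = 0x3F000000, a structure stored at QSPI address 0x20000400 would land at 0x3F000400, which is the IVT address guessed in the original question.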

 

 

Another approach might be to have the start address in BOOT DATA point to a “load routine” that gets copied into SRAM and can reconfigure the clocks and QSPI controller, and then load the main app at the higher transfer speed  using my own memcpy routine. This has the disadvantage of course of taking the time to load the load routine, short as it may be.

Yes, it should work. But why is it a disadvantage? The amount of data to load is the same. If your load routine optimizes the speed settings, then overall it should boot faster.

 

Edward

 

ogj
Contributor IV

Thanks for your comments. BTW: I’m using the Vybrid RM 10/2016. I’m getting close to figuring this out. I think I’m going to go with a separate “bare metal” load routine. The routine will configure the A5 clock to 396 MHz (to match the DDR), set up the DDR memory, copy my app from QSPI to 0x80000000 via DDR x4, and then jump to it. My main app is already set up to load the M4 code from QSPI. This makes it easier to update my apps as I can put everything on sector boundaries for easy erasing. I’ll skip using the DCD and have the load routine do everything. I know the IVT and DCD are big endian; are the QSPI Configuration block (at location 0 in flash) and BOOT DATA also big endian?

kef2
Senior Contributor V

OK, I must have outdated docs.

The device config, IVT, boot data and DCD all have ARM-native endianness (little). Perhaps you found a documentation bug?

Edward

ogj
Contributor IV

On checking further and looking at some of the examples I found, it’s just the IVT header that is big endian (for some weird reason). For the DCD, the RM states:

The ROM determines the location of the DCD table based on information located in the Image Vector Table (IVT). See Image Vector Table and Boot Data for more details. The DCD table shown below is a big endian byte array of the allowable DCD commands.

Do you know if all of the “stuff” from flash location 0 through the app code (in my case the load routine) can be copied into 0x3F00_0000, or does it have to start at 0x3F04_0000?

kef2
Senior Contributor V

I took IVT header setting from some examples:

#define IVT_MAJOR_VERSION           0x4
#define IVT_MAJOR_VERSION_SHIFT     0x4
#define IVT_MAJOR_VERSION_MASK      0xF
#define IVT_MINOR_VERSION           0x1
#define IVT_MINOR_VERSION_SHIFT     0x0
#define IVT_MINOR_VERSION_MASK      0xF

#define IVT_VERSION(major, minor)   \
  ((((major) & IVT_MAJOR_VERSION_MASK) << IVT_MAJOR_VERSION_SHIFT) |  \
  (((minor) & IVT_MINOR_VERSION_MASK) << IVT_MINOR_VERSION_SHIFT))


#define byte_swap16(x) ((((x)>>8) & 0xFF) | (((x) << 8) & 0xFF00))

#define IVT_TAG_HEADER        (0xD1)       /**< Image Vector Table */
#define IVT_SIZE              byte_swap16(sizeof(ivt))
#define IVT_PAR               IVT_VERSION(IVT_MAJOR_VERSION, IVT_MINOR_VERSION)

#define IVT_HEADER          (IVT_TAG_HEADER | (IVT_SIZE << 8) | (IVT_PAR << 24))
#define IVT_RSVD            (uint32_t)(0x00000000)

Yes, the docs say that the tag field is the most significant byte of the header and that the header is big endian. IVT_HEADER above puts the tag in the lowest byte of the little-endian word (so it is first in memory) and byte-swaps the length, which more or less confirms what the docs say. Arghh, why not just fix the docs to use native endianness?...

As for the DCD, I considered it but never used it, because all the arguments were against it. Perhaps the header is big endian as stated, but I hope the address + data pairs are little endian.
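The byte layout that IVT_HEADER produces can be checked on a host machine. The following is a minimal sketch, not taken from the sample code: `ivt_header()` and `byte_in_memory()` are hypothetical helpers that mirror the macros above, and the 32-byte IVT size used in the note below is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Swap the two bytes of a 16-bit value (same idea as byte_swap16 above). */
static uint16_t swap16(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}

/* Hypothetical helper mirroring IVT_HEADER: builds the 32-bit header word
 * as it is held in a register on a little-endian CPU.
 *   tag     - header tag (0xD1 in the macros above)
 *   length  - structure size in bytes
 *   version - version byte (0x41 from IVT_VERSION(4, 1))
 */
static uint32_t ivt_header(uint8_t tag, uint16_t length, uint8_t version)
{
    return (uint32_t)tag
         | ((uint32_t)swap16(length) << 8)
         | ((uint32_t)version << 24);
}

/* Return byte n (0 = lowest address) of a word stored little-endian. */
static uint8_t byte_in_memory(uint32_t word, unsigned n)
{
    assert(n < 4);
    return (uint8_t)(word >> (8u * n));
}
```

For a 32-byte IVT, `ivt_header(0xD1, 0x20, 0x41)` gives 0x412000D1, which sits in memory as D1 00 20 41: the tag first, then the length with its high byte first, then the version.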

Regards,

Edward

Update:

I forgot about 0x3F00_0000. Why do you think there are restrictions here? I believe it should be usable. If you are worried that the boot ROM has to store configuration data somewhere, look at Figure 19-20, Image Vector Table (sorry, still RM Rev 7). Whether or not the boot ROM copies the config, IVT and DCD in non-XIP mode, you are safe, because the offsets from the bottom of QSPI and from the bottom of the destination memory are preserved: 0 for the config, 0x400 for the IVT, and so on. Your code will not be overwritten.
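The offset-preserving layout described here comes down to simple address arithmetic. This is only a sketch of that reasoning: the QSPI base (0x20000000) and the IVT offset (0x400) are the values mentioned in this thread, while `ram_address_of()` is a hypothetical helper and the load base stands for whatever address the boot data names as the start of the copy.

```c
#include <assert.h>
#include <stdint.h>

#define QSPI_BASE   0x20000000u  /* memory-mapped QSPI base used in this thread */
#define IVT_OFFSET  0x400u       /* IVT offset from the start of flash */

/* Hypothetical helper: map an address inside the QSPI image to the address
 * it would have after a non-XIP copy, assuming the image is copied as one
 * contiguous block so that every offset from the base is preserved. */
static uint32_t ram_address_of(uint32_t flash_addr, uint32_t load_base)
{
    assert(flash_addr >= QSPI_BASE);
    return load_base + (flash_addr - QSPI_BASE);
}
```

With a load base of 0x3F000000, the configuration block maps to 0x3F000000 and the IVT at `QSPI_BASE + IVT_OFFSET` maps to 0x3F000400, so code linked above those structures does not collide with them.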

kef2
Senior Contributor V

Hi Ken,

I verified the endianness of the DCD. Indeed, and unfortunately, everything in the DCD is big endian, including the address and data/mask. Here's a DCD struct example which successfully writes some data to 0x3F400000:

// Write single byte/word/dword command struct.
// Please note that the data field is 32 bits wide for all of byte/word/dword.
// For more data in a single write command you need multiple adr+d pairs
// within the same struct; it would save some DCD space:
// 3*N vs 1+2*N dwords per N writes.
// The struct is used to keep track of the proper length setting in hdr.

typedef struct {
 uint32_t hdr;
 void *adr;
 uint32_t d;
} dcdwrcmd1;

const struct _ {
   uint32_t dcdhdr;
   dcdwrcmd1 c1;
   dcdwrcmd1 c2;
   dcdwrcmd1 c3;
} device_config_data = {
   (uint32_t)(DCD_TAG_HEADER |
     (byte_swap16(sizeof(device_config_data)) << 8) |
     (DCD_VERSION << 24)),

   {
      0xCC | byte_swap16(sizeof(dcdwrcmd1)) << 8 | 1/*byte*/ << 24,
      (void*)0x0100403F,
      0x51000000             // *(byte*)0x3F400001 = 0x51
   },

   {
      0xCC | byte_swap16(sizeof(dcdwrcmd1)) << 8 | 2/*word*/ << 24,
      (void*)0x0600403F,
      0x15160000             // *(word*)0x3F400006 = 0x1615
   },

   {
      0xCC | byte_swap16(sizeof(dcdwrcmd1)) << 8 | 4/*dword*/ << 24,
      (void*)0x0800403F,
      0x33229988             // *(dword*)0x3F400008 = 0x88992233
   },
};
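
Writing the swapped constants by hand is error-prone. One option, sketched here as an assumption rather than something from the sample code, is a hypothetical `BE32()` macro that converts a natural value into the form the struct above needs on a little-endian core. It is a constant expression, so it can be used in static initializers. The `dcd_pair` type is also hypothetical and only holds the address/data part of a write command.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: reverse the byte order of a 32-bit constant so that
 * a little-endian CPU stores it in big-endian byte order. */
#define BE32(x) \
    ((((uint32_t)(x) & 0x000000FFu) << 24) | \
     (((uint32_t)(x) & 0x0000FF00u) <<  8) | \
     (((uint32_t)(x) & 0x00FF0000u) >>  8) | \
     (((uint32_t)(x) & 0xFF000000u) >> 24))

/* Compile-time check against the first hand-swapped address above. */
static_assert(BE32(0x3F400001u) == 0x0100403Fu, "BE32 must reverse bytes");

/* Address/data pair as it would appear in a DCD write command. Addresses
 * are kept as uint32_t here so the sketch also builds on a 64-bit host. */
typedef struct {
    uint32_t adr;
    uint32_t d;
} dcd_pair;

/* The three writes from the example above, with the values in readable form. */
static const dcd_pair pairs[] = {
    { BE32(0x3F400001u), BE32(0x00000051u) },
    { BE32(0x3F400006u), BE32(0x00001615u) },
    { BE32(0x3F400008u), BE32(0x88992233u) },
};
```

The macro produces the same constants as the hand-written ones (for example, `BE32(0x00001615u)` is 0x15160000), which makes a mismatch between a value and its comment easier to spot.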

Hope this helps

Edward

kef2
Senior Contributor V

One more note. I said previously that it should be possible to reconfigure QSPI from the DCD. Unfortunately, DCD addresses are filtered, and the QSPI registers aren't listed in Table 7-61, Valid DCD Peripheral Address Ranges (latest RM, Rev 0). So no: you can reconfigure the clocks and the DDRMC from the DCD, but not the QSPI settings. The solution for faster QSPI is to reconfigure it from a function copied to RAM.

Edward

ogj
Contributor IV

So now I’m trying to write the load routine app, which will have three functions: do whatever it takes to get the A5 core running (not sure what all that is yet), including setting the clocks up for 396 MHz operation; initialize the DDR and QSPI controllers; and copy my real app into DDR memory. The load routine is what the boot ROM will load (together with the QSPI configuration parameters, IVT, …) into 0x3F04_0000. I know how to set up the clocks and controllers. What else is needed to get the A5 core going well enough to do the above? I don’t need interrupts. I’m using MQX on the A5, which will set everything else up when it starts running. Have you done anything like this before?
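
The copy-and-jump part of such a load routine can be sketched as below. Everything here is an assumption for illustration: `copy_image()`, `jump_to()` and `entry_fn` are hypothetical names, the copy uses plain 32-bit accesses on memory-mapped flash, and the clock, DDR and QSPI setup is assumed to have been done already. Cache maintenance, which may be needed before the jump if caches are enabled, is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Entry point of the loaded application (hypothetical signature). */
typedef void (*entry_fn)(void);

/* Copy an image word by word from memory-mapped flash to RAM.
 * len_bytes is rounded up to a whole number of 32-bit words.
 * Returns the number of words copied. */
static size_t copy_image(volatile uint32_t *dst,
                         const volatile uint32_t *src,
                         size_t len_bytes)
{
    size_t words = (len_bytes + 3u) / 4u;

    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < words; i++) {
        dst[i] = src[i];
    }
    return words;
}

/* Jump to the copied application; it is not expected to return. */
static void jump_to(entry_fn entry)
{
    entry();
}
```

On the target, the destination would be the DDR address mentioned earlier in the thread (0x80000000), and the source and length would be wherever the app sits in QSPI; those two values are not given here, so they are left to the reader.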

kef2
Senior Contributor V

Hi,

Like this? Booting from QSPI: yes. Using DDR RAM from bare metal: no, OCRAM was enough for me for bare metal. Vybrid with DDR RAM and Linux: yes.

Did you see and try the Vybrid Sample Code (VSC)? https://www.nxp.com/webapp/sps/download/license.jsp?colCode=VYBRID_SAMPLE_CODE_SBCH

If they didn't change anything, to start booting from QSPI you need two projects, hello_world and quadspi_load. Set the hello_world build configuration to QuadSPI_XIP. The quadspi_load project includes the code image from hello_world (hello_world_output.c) and programs it into QSPI. Once it is programmed, reconfigure the Tower board jumpers to boot from QSPI to get XIP working.

For non-XIP mode you need to clone the hello_world QuadSPI_XIP target, then: 1) Edit the (new) scatter file from QuadSPI_XIP to replace all 0x20xxx addresses with 0x3Fxxx addresses; the RAM section address should be moved so that it stays above the code section. 2) Edit quadspi_boot.c: in boot_data, replace FLASH_BASE (0x20000000) with your selected 0x3Fxxxx address, and change the load size accordingly so that the boot ROM copies the conf, IVT, DCD, boot data and all required code to RAM.

If you are using IAR, this may be useful as well: https://community.nxp.com/docs/DOC-339559 (QuadSPI XIP and NON-XIP boot)

MQX perhaps offers something like VSC; I don't know.
