PowerQuad private RAM problem above 8kB

Mateo
Contributor II

Hello,

I use the PowerQuad private RAM on the LPCXpresso55S69 development board to accelerate computation.
According to the documentation, the PowerQuad private RAM is 16 kB in size, but I have a problem addressing the upper half. Data in the lower half of the private RAM seems to be overwritten when the upper half is used.
I have attached a sample project in which a write to the upper half of the private RAM causes data to be overwritten. When the upper half is not accessed, the example works correctly.

/*******************************************************************************
 * Definitions
 ******************************************************************************/
#define DEMO_POWERQUAD POWERQUAD

#define EXAMPLE_ASSERT_TRUE(x)            \
    if (!(x))                             \
    {                                     \
        PRINTF("%s error\r\n", __func__); \
        while (1)                         \
        {                                 \
        }                                 \
    }

#define EXAMPLE_FIR_DATA_LEN     256
#define EXAMPLE_FIR_TAP_LEN      256
/*
 * Power Quad driver uses the first 4K private RAM, the RAM starts from 0xE0001000
 * could be used for other purpose.
 */
#define EXAMPLE_PRIVATE_RAM_1 ((float *)0xE0001000)
#define EXAMPLE_PRIVATE_RAM_3 ((float *)0xE0003000)


/* Float FIR */
static void PQ_FIRFloatExample(void)
{
    uint32_t i;
    pq_config_t pqConfig;

    PQ_GetDefaultConfig(&pqConfig);
    PQ_SetConfig(DEMO_POWERQUAD, &pqConfig);

    // Copy the first table into private RAM at offset 4 kB; it is copied back and compared later
    PQ_MatrixScale(DEMO_POWERQUAD, POWERQUAD_MAKE_MATRIX_LEN(16, EXAMPLE_FIR_TAP_LEN / 16, 0), 1.0f, s_firTaps,
                   EXAMPLE_PRIVATE_RAM_1);
    PQ_WaitDone(DEMO_POWERQUAD);

    // Copy the second table into private RAM at offset 12 kB
    // If this block is commented out, the example works and nothing is overwritten
    PQ_MatrixScale(DEMO_POWERQUAD, POWERQUAD_MAKE_MATRIX_LEN(16, EXAMPLE_FIR_TAP_LEN / 16, 0), 1.0f, s_firOutputRef,
                   EXAMPLE_PRIVATE_RAM_3);
    PQ_WaitDone(DEMO_POWERQUAD);

    // Copy the first table back from private RAM at offset 4 kB
    PQ_MatrixScale(DEMO_POWERQUAD, POWERQUAD_MAKE_MATRIX_LEN(16, EXAMPLE_FIR_TAP_LEN / 16, 0), 1.0f , EXAMPLE_PRIVATE_RAM_1,
        PowerQuadOutput);
    PQ_WaitDone(DEMO_POWERQUAD);

    // The data copied back to main memory comes from the second table, not the first, so the assert fails
    for (i = 0; i < EXAMPLE_FIR_DATA_LEN; i++)
    {
        EXAMPLE_ASSERT_TRUE(fabs(PowerQuadOutput[i] - s_firTaps[i]) < 0.00001);
    }
}
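
The assert macro above only reports that some element differs. As a minimal sketch, a helper that returns the index of the first mismatch could show where the overwritten data begins. The name first_mismatch is hypothetical and not part of the SDK; the tolerance is supplied by the caller.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not an SDK function): returns the index of the first
 * element where |actual - expected| is not below the tolerance, or -1 if
 * every element matches. NaN values are treated as mismatches.
 */
long first_mismatch(const float *actual, const float *expected,
                    size_t len, float tolerance)
{
    assert(actual != NULL && expected != NULL);

    for (size_t i = 0; i < len; i++)
    {
        float diff = actual[i] - expected[i];
        if (diff < 0.0f)
        {
            diff = -diff; /* absolute value without needing math.h */
        }
        if (!(diff < tolerance))
        {
            return (long)i;
        }
    }
    return -1;
}
```

On the board this might be called as first_mismatch(PowerQuadOutput, s_firTaps, EXAMPLE_FIR_DATA_LEN, 0.00001f), with the returned index printed through PRINTF.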

Could you please check this and help with the problem?

Best regards,
Mateusz

10 Replies

Mateo
Contributor II

Hi xiangjun_rong,

I modified the SDK example "lpcxpresso55s69_powerquad_fir_fast" to demonstrate the problem. I applied every suggestion you provided:

1. TMPBASE is placed in shared memory:

  pqConfig.tmpBase        = (uint32_t *) PowerQuadTemp;
  PQ_SetConfig(DEMO_POWERQUAD, &pqConfig);

2. I verified that SRAM4 is not used.
Linker output:
" SRAM_4: 0 GB 16 KB 0.00%"
Map file:
SRAM_4 0x20040000 0x00004000 xrw

3. Only user-accessible addresses are used: 0xE000_1000 and 0xE000_3000.

// The first 4K is not used because it is reserved for PowerQuad
#define EXAMPLE_PRIVATE_RAM_0 ((float *)0xE0000000)
// Memory available to the user, divided into 4K blocks
#define EXAMPLE_PRIVATE_RAM_1 ((float *)0xE0001000)
#define EXAMPLE_PRIVATE_RAM_2 ((float *)0xE0002000)
#define EXAMPLE_PRIVATE_RAM_3 ((float *)0xE0003000)

  // Copy the first data table into private RAM at offset 4 kB; it is copied back and compared later
  PQ_MatrixScale(DEMO_POWERQUAD, POWERQUAD_MAKE_MATRIX_LEN(16, EXAMPLE_DATA_LEN / 16, 0), 1.0f, s_firstData,
                 EXAMPLE_PRIVATE_RAM_1);
  PQ_WaitDone(DEMO_POWERQUAD);

  // Copy the second data table into private RAM at offset 12 kB
  // If this block is commented out, the example works and nothing is overwritten
  PQ_MatrixScale(DEMO_POWERQUAD, POWERQUAD_MAKE_MATRIX_LEN(16, EXAMPLE_DATA_LEN / 16, 0), 1.0f, s_secondData,
                 EXAMPLE_PRIVATE_RAM_3);
  PQ_WaitDone(DEMO_POWERQUAD);

  // Copy the first data table back from private RAM at offset 4 kB
  PQ_MatrixScale(DEMO_POWERQUAD, POWERQUAD_MAKE_MATRIX_LEN(16, EXAMPLE_DATA_LEN / 16, 0), 1.0f , EXAMPLE_PRIVATE_RAM_1,
      PowerQuadOutput);
  PQ_WaitDone(DEMO_POWERQUAD);

The example copies two different tables into two separate regions of the PowerQuad private memory, EXAMPLE_PRIVATE_RAM_1 and EXAMPLE_PRIVATE_RAM_3. It then copies the first table back from EXAMPLE_PRIVATE_RAM_1 into another shared buffer and checks that it matches the data originally written.

The example works if the use of "EXAMPLE_PRIVATE_RAM_3" is commented out: the table copied back matches the original. If the copy of the second table to EXAMPLE_PRIVATE_RAM_3 is added, the comparison fails.
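
To rule out a plain out-of-range access, the block addresses can be derived and checked against the 16 kB window mentioned in this thread. The following is only a sketch: the macro and function names are hypothetical, and the 16 kB size and the reserved first 4 kB are taken from the posts and SDK comment quoted here, not from a reference manual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values as stated in this thread (assumptions, not verified against a manual). */
#define PRIVATE_RAM_BASE  0xE0000000u
#define PRIVATE_RAM_SIZE  0x4000u /* 16 kB in total */
#define PRIVATE_RAM_BLOCK 0x1000u /* 4 kB blocks; block 0 is used by the driver */

/* Address of 4 kB block n, e.g. block 1 is 0xE0001000 and block 3 is 0xE0003000. */
uint32_t private_ram_block_addr(unsigned block)
{
    assert(block < PRIVATE_RAM_SIZE / PRIVATE_RAM_BLOCK);
    return PRIVATE_RAM_BASE + block * PRIVATE_RAM_BLOCK;
}

/* True if `count` floats starting at `addr` stay inside the user part of the window. */
bool float_buffer_fits(uint32_t addr, size_t count)
{
    uint64_t end = (uint64_t)addr + (uint64_t)count * sizeof(float);
    return addr >= PRIVATE_RAM_BASE + PRIVATE_RAM_BLOCK &&
           end <= (uint64_t)PRIVATE_RAM_BASE + PRIVATE_RAM_SIZE;
}
```

With a table of 256 floats (1 kB), both 0xE0001000 and 0xE0003000 pass this check, so the failure described above does not look like a simple bounds problem.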

If you have any more questions, please ask.

I have attached the project; maybe it will help to investigate the problem.

Best Regards,
Mateusz Litwin

xiangjun_rong
NXP TechSupport

Hi,

I think there are four base addresses that define the memory allocation: OUTBASE, INABASE, INBBASE, and TMPBASE. They are configured explicitly by the application code, and they must not overlap. In PQ_GetDefaultConfig(), TMPBASE is set to 0xE000_0000, so the user cannot use that space.

Hope it can help you

BR

Xiangjun Rong

xiangjun_rong_0-1633769230873.png

void PQ_GetDefaultConfig(pq_config_t *config)
{
    config->inputAFormat   = kPQ_Float;
    config->inputAPrescale = 0;
    config->inputBFormat   = kPQ_Float;
    config->inputBPrescale = 0;
    config->outputFormat   = kPQ_Float;
    config->outputPrescale = 0;
    config->tmpFormat      = kPQ_Float;
    config->tmpPrescale    = 0;
    config->machineFormat  = kPQ_Float;
    config->tmpBase        = (uint32_t *)0xE0000000U;
}
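
The non-overlap rule described in this reply can be checked with a small range test. This is a sketch under stated assumptions: region_t and regions_overlap are hypothetical names, not SDK types or functions, and the region sizes (4 kB for the driver's temporary area, 1 kB for a table of 256 floats) come from the code and comments quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor for a memory region handed to PowerQuad. */
typedef struct
{
    uint32_t base;  /* start address */
    uint32_t bytes; /* length in bytes */
} region_t;

/* True if the half-open ranges [base, base + bytes) intersect. */
bool regions_overlap(region_t a, region_t b)
{
    if (a.bytes == 0u || b.bytes == 0u)
    {
        return false; /* an empty region overlaps nothing */
    }
    /* 64-bit arithmetic avoids wrap-around near the top of the address space. */
    uint64_t a_end = (uint64_t)a.base + a.bytes;
    uint64_t b_end = (uint64_t)b.base + b.bytes;
    return ((uint64_t)a.base < b_end) && ((uint64_t)b.base < a_end);
}
```

For the addresses in this thread, a 4 kB temporary area at 0xE0000000 and 1 kB tables at 0xE0001000 and 0xE0003000 do not overlap by this test.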

Mateo
Contributor II

Hi,

1. I checked the SDK example "lpcxpresso55s69_powerquad_fir_fast". It uses memory at 0xE0001000 to accelerate computation, with TMPBASE set to 0xE0000000.

/*
 * Power Quad driver uses the first 4K private RAM, the RAM starts from 0xE0001000
 * could be used for other purpose.
 */
#define EXAMPLE_PRIVATE_RAM ((void *)0xE0001000)
...
    /*
     * Fast method
     *
     * The input data B is convert and saved to private RAM, thus the PQ could
     * fetch data through two path. The input data B is converted to float format
     * and saved to private ram.
     */
    pqConfig.inputAFormat   = kPQ_Float;
    pqConfig.inputAPrescale = 0;
    pqConfig.inputBFormat   = kPQ_Float;
    pqConfig.inputBPrescale = 0;
    pqConfig.outputFormat   = kPQ_Float;
    pqConfig.outputPrescale = 0;
    pqConfig.tmpFormat      = kPQ_Float;
    pqConfig.tmpPrescale    = 0;
    pqConfig.machineFormat  = kPQ_Float;
    pqConfig.tmpBase        = (uint32_t *)0xE0000000;

    PQ_SetConfig(DEMO_POWERQUAD, &pqConfig);

    PQ_MatrixScale(DEMO_POWERQUAD, POWERQUAD_MAKE_MATRIX_LEN(16, EXAMPLE_FIR_TAP_LEN / 16, 0), 1.0, tap,
                   EXAMPLE_PRIVATE_RAM);
    PQ_WaitDone(POWERQUAD);

2. The comment in this example also indicates that only the first 4K of private RAM is used by TMPBASE and that the rest can be used by the user.

3. If I use only the private memory at 0xE0001000, the example works correctly even though TMPBASE is set to 0xE0000000, but if I also use 0xE0003000, it breaks.

4. I also tested changing TMPBASE to a buffer in shared memory, and this did not help.

Best Regards,

Mateusz Litwin

xiangjun_rong
NXP TechSupport

Hi, Mateo

The AE team said that the address range 0xE000_0000 to 0xE000_3FFF shares the same memory cells with SRAM4 (0x2004_0000 to 0x2004_3FFF). Please check whether you use SRAM4.

BR

Xiangjun Rong

xiangjun_rong
NXP TechSupport

Hi,

I tried filling the private memory from 0xE000_0000 to 0xE000_3FFF with 0xFFFF_FFFF. After the operation finished, I read the private memory back, but I could not read the values correctly.

If I fill SRAM4, I can write and read correctly.
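
A fill-and-read-back test of the kind described here might look like the sketch below. The function name is hypothetical and this is not the code that was actually run; it only illustrates the technique of writing a pattern to every word and counting the words that read back differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: writes `pattern` to every 32-bit word of the region,
 * then reads the region back and returns the number of words that differ.
 * The pointer is volatile so the compiler cannot skip the memory accesses.
 */
size_t count_readback_mismatches(volatile uint32_t *base, size_t words,
                                 uint32_t pattern)
{
    assert(base != NULL);

    size_t mismatches = 0;

    for (size_t i = 0; i < words; i++)
    {
        base[i] = pattern;
    }
    for (size_t i = 0; i < words; i++)
    {
        if (base[i] != pattern)
        {
            mismatches++;
        }
    }
    return mismatches;
}
```

On the target, the SRAM4 range from this thread would correspond to a base of (volatile uint32_t *)0x20040000 and 0x4000 / 4 words. Whether CPU accesses through the 0xE000_0000 range behave like ordinary memory is exactly the point in question.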

In conclusion, the private memory cannot be used to save application data.

In AN12383, the private memory from 0xE000_0000 to 0xE000_3FFF is used for temporary data only, and it is assigned to the PowerQuad->TMPBASE register.

BTW, the private memory is mentioned in neither the reference manual nor the data sheet. Please do not use this memory unless it is used as TMPBASE.

Hope it can help you

BR

XiangJun Rong

xiangjun_rong_0-1633936595828.png

Mateo
Contributor II

1. PowerQuad maps the private memory address space 0xE000_0000 to 0xE000_3FFF onto the SRAM4 address space 0x2004_0000 to 0x2004_3FFF and uses the SRAM in an interleaved way. I checked that the data is saved correctly in SRAM4; screenshots are below.

If you try to access the 0xE000_0000 to 0xE000_3FFF address space from the ARM core, you are addressing a different peripheral.

PQ_memory_1.png PQ_memory_2.png PQ_memory_3.png PQ_memory_4.png

2. I have attached a project that shows what the problem is and what works correctly.

3. https://community.nxp.com/t5/LPC-Microcontrollers/Use-Powerquad-Private-RAM/td-p/1069956 states: "We have powerquad_fir_fast SDK demo include powerquad private ram operation that we recommend. Please check it." So I should be able to use this memory, as that answer indicates. I use exactly the matrix scale operation to access the PowerQuad private RAM.

4. What is the point of materials like https://community.nxp.com/pwmxy87654/attachments/pwmxy87654/tech-days/319/1/AMF-SMH-T3513_Hands-on_3... ? This is a lab from NXP showing that FIR using the private RAM is faster.
PQ_private_RAM.png

EDIT:

- This lab from NXP indicates that private RAM can be used to improve FIR, convolve, and correlate operations. It explicitly describes placing input data B there. So private RAM can be used for more than TMPBASE.

5. I have cited several application notes, such as https://www.nxp.com/docs/en/application-note/AN12383.pdf and https://www.nxp.com/docs/en/application-note/AN12282.pdf, which indicate that the PowerQuad private RAM can be used by the user. What is the point of publishing this information if the memory should not be used by the user?

Mateo
Contributor II

Hi xiangjun_rong,

As SRAM4 is explicitly reserved for PowerQuad, I don't use this memory.

If needed, I can prepare and provide a modified SDK example "lpcxpresso55s69_powerquad_fir_fast" that causes the problem on my board, to allow a full investigation.

Best Regards,
Mateusz Litwin

Mateo
Contributor II

Also, AN12282 and AN12383 indicate that TMPBASE is used only for FFT or matrix inversion.

AN12282_2.png

AN12383_2.png

I don't use either of these operations, so the TEMP area should not be in use.

Mateo
Contributor II

Hi,

Thanks for the response. I took this information from several places; examples are below:

- AN12383

[Image: AN12383]

- AN12282

[Image: AN12282]

- SDK example "lpcxpresso55s69_powerquad_fir_fast"

/*
 * Power Quad driver uses the first 4K private RAM, the RAM starts from 0xE0001000
 * could be used for other purpose.
 */
#define EXAMPLE_PRIVATE_RAM ((void *)0xE0001000)

I use this memory for performance reasons, and it works very well up to the 8 KB boundary (0xE0002000), but these notes say that it should be possible to use more of the PowerQuad private memory.

If more information is needed to investigate this, please ask.

Best Regards,

Mateusz Litwin

xiangjun_rong
NXP TechSupport

Hi,

Unfortunately, I have not found where the memory at addresses 0xE000_3000 and 0xE000_1000 is described. Can you point to any documentation that says the user can use this memory as private memory?

I have checked UM11126.pdf, and I have not seen it there.

If you declare an array, for example resultArray[], and call the scale function, what is the result?

float resultArray[N];

static void PQ_FIRFloatExample(void)
{
    uint32_t i;
    pq_config_t pqConfig;

    PQ_GetDefaultConfig(&pqConfig);
    PQ_SetConfig(DEMO_POWERQUAD, &pqConfig);

    // Copy the data into an ordinary array in system RAM instead of the private RAM
    PQ_MatrixScale(DEMO_POWERQUAD, POWERQUAD_MAKE_MATRIX_LEN(16, EXAMPLE_FIR_TAP_LEN / 16, 0), 1.0f, s_firTaps,
                   resultArray);
    PQ_WaitDone(DEMO_POWERQUAD);
}

Sorry if I have misunderstood you.

BR

XiangJun Rong