iMXRT1060 and HyperRAM transaction length

g_volokh · ‎04-10-2020

Hi All,

We are working with iMXRT1060 and connecting HyperRAM to the FlexSPI. Everything works properly.

We are using HyperRAM access via AHB.

As I understand it, a HyperRAM transaction should not be very long, because the HyperRAM must perform an internal refresh between transactions (or at the beginning of one). The HyperRAM datasheet recommends that a transaction last no longer than 4 us. At 166 MHz this means the transaction length should be about 1 KB, not more.
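As a rough sanity check of the 1 KB figure, the sketch below estimates how many data bytes fit into a transaction of a given maximum duration. It assumes an 8-bit DDR HyperBus (2 bytes per clock). The name max_burst_bytes and the overhead value used in the note below (3 command/address clocks plus a doubled 6-clock initial latency, 15 in total) are illustrative assumptions, not values taken from this thread or from a specific datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Largest data payload (bytes) that fits in one HyperBus transaction lasting
 * at most t_max_ns, on a DDR x8 bus that moves 2 bytes per clock.
 * overhead_clocks (command/address + initial latency) is an ASSUMED input;
 * take the real value from the HyperRAM datasheet and latency setting. */
static uint32_t max_burst_bytes(uint32_t clk_hz, uint32_t t_max_ns,
                                uint32_t overhead_clocks)
{
    /* Whole clock cycles available inside the time budget. */
    uint64_t total_clocks = (uint64_t)clk_hz * t_max_ns / 1000000000ULL;

    assert(total_clocks > overhead_clocks);

    /* Remaining clocks carry data, 2 bytes each. */
    return (uint32_t)(total_clocks - overhead_clocks) * 2u;
}
```

With these assumptions, 166 MHz and a 4 us limit give 1298 bytes, so a 1024-byte burst fits with some margin.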

The question: can we vary the transaction length via AHB? If yes, how?

The idea is to make transactions much longer (to increase speed), but not longer than 4 us.

Thanks in advance.

Regards,

George Volokh.

g_volokh · ‎04-11-2020

I have done some additional research on the FlexSPI settings in the iMXRT1060.

If I set

AHBCR.PREFETCHEN=1

AHBCR.CACHABLEEN=1

AHBRXBUF0CR0.PREFETCHEN=1

AHBRXBUF0CR0.BUFSZ=0x80

reading works fine.

The data length of a read transaction becomes 1024 bytes, matching the AHB read buffer size (128 x 64 bits = 1024 bytes). Reading seems fine and is fast enough: reading 1024 bytes at 166 MHz takes around 3.5 us.
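The relationship between BUFSZ and the burst length, together with a rough estimate of the burst duration, can be sketched as follows. The function names and the overhead_clocks parameter are hypothetical. The estimate assumes an 8-bit DDR bus (2 bytes per clock) and ignores any delay on the AHB side, so it will come out shorter than a measured value.

```c
#include <assert.h>
#include <stdint.h>

/* BUFSZ is expressed in 64-bit (8-byte) units, as in the post above:
 * 0x80 * 8 = 1024 bytes. */
static uint32_t ahb_rx_bufsz_to_bytes(uint32_t bufsz)
{
    return bufsz * 8u;
}

/* Estimated duration in ns of one read burst on a DDR x8 bus.
 * overhead_clocks (command/address + latency) is an ASSUMED input. */
static uint32_t read_burst_ns(uint32_t bytes, uint32_t clk_hz,
                              uint32_t overhead_clocks)
{
    assert(clk_hz > 0u && (bytes % 2u) == 0u);

    /* Two bytes are transferred per clock, plus the fixed overhead. */
    uint64_t clocks = (uint64_t)bytes / 2u + overhead_clocks;

    return (uint32_t)(clocks * 1000000000ULL / clk_hz);
}
```

For a 1024-byte buffer at 166 MHz with an assumed 15 overhead clocks, the estimate is about 3.17 us, in the same range as the 3.5 us measured above.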

Writing, however, is much worse. Writing 1024 bytes to HyperRAM takes around 9.5 us, which is almost 3 times slower. When I check the write transaction, I see that the data length is only 32 bytes (16 clocks).

I have tried setting

AHBCR.BUFFERABLEEN=1

or

AHBCR.BUFFERABLEEN=0

The result is the same: the data length in the write transaction is 32 bytes.

I see in the FlexSPI description that the AHB TX buffer size is 8 x 64 bits = 64 bytes.

My question is: what should I change in the FlexSPI settings, or anywhere else, to use the whole 64-byte buffer?

Using the full 64-byte buffer could double the write speed.

What can you suggest?

Thanks in advance.

Regards, George.

Hui_Ma · ‎04-14-2020

Hi George,

There is an application note, AN12239, about using HyperRAM with RT products.

Chapter 5.2 covers performance and the related analysis.

It contains the description below:

So the write performance when transferring 32 bytes is higher.

There is no register for configuring the AHB TX buffer; the FlexSPI AHB TX buffer size of 64 bytes is defined by the SoC.

Have a great day,
Mike

-------------------------------------------------------------------------------
Note:
- If this post answers your question, please click the "Mark Correct" button. Thank you!

- We are following threads for 7 weeks after the last post, later replies are ignored
Please open a new thread and refer to the closed one, if you have a related question at a later point in time.
-------------------------------------------------------------------------------

g_volokh · ‎04-15-2020

Dear Mike,

Thanks a lot for your answer.

I see that the IP FIFO is 128 bytes, so using IP commands and accessing the FIFO via AHB could be faster.

Did you check that? Can you give me an example of how to use IP commands with DMA?

Thanks in advance.

George.

Hui_Ma · ‎04-16-2020

Hi George,

Please check the difference in bus width between the AHB and IPS interfaces to FlexSPI:

I couldn't find an example of DMA using IP commands to access FlexSPI memory.

Please check the AN12239SW example, which provides the API functions below:

flexspi_hyper_ram_ipcommand_write_data();

flexspi_hyper_ram_ipcommand_read_data();

Thanks for the attention.
Have a great day,
Mike


g_volokh · ‎04-17-2020

Dear Mike,

The goal is to increase the write speed.

Ordinary writing through AHB is very slow: around 10..11 us for writing 1024 bytes.

That means the speed is 100 MB/s or even less. For reading (if we increase the buffer size to 1024 bytes) the time is around 4..5 us, which means 200..250 MB/s.

The reason for such slow writing is that the physical transaction size to the HyperRAM is only 8 bytes per transaction.

The main idea is to increase this size as much as we can (for reading, the transaction size can be increased up to 1024 bytes).

The IP engine has a 128-byte FIFO for reading and one for writing. Additionally, these FIFOs can be accessed via the AHB bus.

I have made small modifications to the AN12239SW example:

- increased the watermark levels: IPRXFCR.RXWMRK=15 (128 bytes), IPTXFCR.TXWMRK=15 (128 bytes),

- set MCR0.ATDFEN=1 // Enable AHB bus write to IP TX FIFO

- made some small modifications to the working example (below):

Everything works fine whether the source buffer is located in DTCM or in OCM.

The speed roughly doubles.
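The watermark arithmetic used in the list above can be written out as a small helper. This is only a sketch: wmrk_to_bytes and fifo_fill_count are hypothetical names, and the (field + 1) x 8 bytes relation is inferred from the values quoted in this post (15 gives 128 bytes), so it should be checked against the reference manual.

```c
#include <assert.h>
#include <stdint.h>

/* Watermark level in bytes for a given RXWMRK/TXWMRK field value.
 * ASSUMED relation: (field + 1) * 64 bits, so 15 -> 128 bytes. */
static uint32_t wmrk_to_bytes(uint32_t wmrk_field)
{
    assert(wmrk_field <= 15u); /* 128-byte FIFO, as quoted in the thread */
    return (wmrk_field + 1u) * 8u;
}

/* Number of watermark-sized FIFO fills needed to move size bytes,
 * rounding up for a final partial fill. */
static uint32_t fifo_fill_count(uint32_t size, uint32_t wmrk_field)
{
    uint32_t chunk = wmrk_to_bytes(wmrk_field);
    return (size + chunk - 1u) / chunk;
}
```

With the watermark field at 15, a 1024-byte write needs 8 FIFO fills; with the field at 0 it would need 128, which is one reason a larger watermark helps.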

But if I make a minor modification, in fact just using memcpy instead of the for() loop, it works fine if the source buffer is in DTCM, but doesn't work if it is in OCM (location 0x2020'0000).

I have tried with the cache enabled and disabled; it doesn't affect anything.

The read data:

I have checked which part is working incorrectly, reading or writing. It's writing (I have checked the read procedures many times).

Can you explain that? What am I doing wrong?

Thanks for your support.

Best regards,

George Volokh.

Hui_Ma · ‎04-20-2020

Hi George,

DTCM is a non-cacheable memory range, while OCRAM is a cacheable memory range.

When two masters access the same memory range in OCRAM, in fact both masters will access the cache instead of accessing the OCRAM memory range directly. The cached data can then differ from the actual data in OCRAM, so cache maintenance is needed in the actual application.

I also found the erratum below in the RT1060 errata document:

Please also check the post my colleague shared: <iMXRTxxxx Memory Performance: ITCM / DTCM / OCRAM / SDRAM / FlexSPI (QSPI / HyperFLASH)>.

Thanks for the attention.
Have a great day,
Mike


melissa_hunter · ‎04-20-2020

Hi George,

The FlexSPI controller is optimized for reads. Most customers are using it to interface to serial flash vs. RAM. As you've already seen there is a large prefetch buffer used for reads (intended to boost XIP code execution from FlexSPI). For writes we don't have as large a buffer available, but at least there is buffering. In general it is much easier to get read performance instead of write performance (as you've already seen).

For the incorrect data issue you are seeing...as Mike pointed out, the OCRAM space is cached by default by the SDK setup which might be causing some issues (DTCM is already single cycle access so it is never cached).

I don't think the cache settings would be the problem for your case, because the core is moving the data from the OCRAM to the FlexSPI2 AHB address. I'm guessing you have another code modification that advances fptr to a new AHB address in the main while (0U != size) loop?

I'm also interested to know more about your setup. From the comments in your code modifications, it sounds like you are running on your own hardware where the HyperRAM is on FlexSPI2. Is this correct? Where is your code stored? The code location and cache settings for that area can also affect the read bandwidth that you get, so knowing your setup for that might be helpful.

Regards,

Melissa

g_volokh · ‎04-20-2020

Dear Melissa,

Thanks a lot for your answer.

I do see that the FlexSPI controller is optimized for reading rather than writing.

But I believe writing is important too, and practically everything is in place; we only need to get all the settings right.

As I have already written, our main goals are:

1. to increase the write speed as much as possible (now we have around 90 MB/s using the AHB bus),

2. to use DMA, freeing the MCU for other tasks.

When we use the AHB bus for writing, I see on the oscilloscope that the data transaction size is only 8 bytes.

The AHB buffer is 64 bytes. Can we use the whole 64-byte buffer for one transaction?

If yes, I think it could increase the speed a lot. How can we do that?

So far we have not been able to increase the write speed through the AHB bus, so we decided to use IP commands and write to the IP FIFO through the AHB bus.

This is the code from fsl_flexspi.c -> status_t FLEXSPI_WriteBlocking(FLEXSPI_Type *base, uint32_t *buffer, size_t size)

This code is working:

melissa_hunter · ‎04-20-2020

Hi George,

When you use the AHB bus for write transactions, the size of the internal bus requests determines how many bytes are written. We have a 64-bit bus interface to the FlexSPI controller, so that is why you are seeing 8 bytes/64-bits on a normal write from the core. The core typically does not request burst cycles, so it does the accesses as singles of the bus width.
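To illustrate why small write transactions cost so much, here is a simplified timing model. It is a sketch under stated assumptions, not a description of the controller: every bus transaction pays a fixed overhead_clocks (command/address plus latency, an assumed input), data moves at 2 bytes per clock on an 8-bit DDR bus, and chip-select gaps and AHB-side delays are ignored. The name write_time_ns is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Estimated time in ns to write total_bytes when every bus transaction
 * carries bytes_per_txn bytes. Each transaction costs overhead_clocks
 * (ASSUMED input) plus its data clocks at 2 bytes per clock. */
static uint32_t write_time_ns(uint32_t total_bytes, uint32_t bytes_per_txn,
                              uint32_t clk_hz, uint32_t overhead_clocks)
{
    assert(clk_hz > 0u);
    assert(bytes_per_txn >= 2u && (total_bytes % bytes_per_txn) == 0u);

    uint64_t txns = total_bytes / bytes_per_txn;
    uint64_t clocks = txns * (overhead_clocks + bytes_per_txn / 2u);

    return (uint32_t)(clocks * 1000000000ULL / clk_hz);
}
```

With an assumed 9 overhead clocks at 166 MHz, the model gives roughly 10.0 us for writing 1024 bytes in 8-byte transactions, 4.8 us in 32-byte transactions and 4.0 us in 64-byte transactions. The absolute numbers are illustrative only; the point is that the fixed per-transaction overhead dominates when transactions are short.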

If you use the DMA for the write instead (which is where you want to go long term), then the access is still going out on a 64-bit wide internal bus, but the DMA can request burst cycles. In order to get the maximum number of bytes in a single transaction make sure that the DSIZE (in the DMA_TCDn_ATTR) is configured for the 32-byte burst. This is the maximum amount of data the DMA can write in a single transaction.
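A minimal sketch of how the ATTR value for a 32-byte burst could be built is shown below. The size encodings and the field positions (SSIZE in bits 10:8, DSIZE in bits 2:0) are assumptions based on the usual eDMA TCD layout and should be verified against the reference manual; tcd_attr and burst32_ok are hypothetical helper names, not SDK functions.

```c
#include <assert.h>
#include <stdint.h>

/* Transfer-size encodings for the SSIZE/DSIZE fields (ASSUMED from the
 * eDMA TCD ATTR description; verify against the reference manual). */
enum {
    DMA_XFER_8BIT         = 0,
    DMA_XFER_16BIT        = 1,
    DMA_XFER_32BIT        = 2,
    DMA_XFER_64BIT        = 3,
    DMA_XFER_32BYTE_BURST = 5
};

/* Build a TCD ATTR value with the modulo fields (SMOD, DMOD) left at 0. */
static uint16_t tcd_attr(uint8_t ssize, uint8_t dsize)
{
    assert(ssize <= 7u && dsize <= 7u);
    return (uint16_t)(((uint16_t)ssize << 8) | dsize);
}

/* A 32-byte burst needs a 32-byte aligned address and a non-zero byte
 * count that is a multiple of 32. */
static int burst32_ok(uint32_t addr, uint32_t nbytes)
{
    return (addr % 32u == 0u) && (nbytes != 0u) && (nbytes % 32u == 0u);
}
```

For a write to the HyperRAM, the destination side would use DMA_XFER_32BYTE_BURST. The source size can be chosen to suit the source memory.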

The drawback to the DMA approach is that cache management can become an issue. If the source data and/or the HyperRAM is cached, then you'll need some cache management code (or temporarily disable cache while you are preparing the buffer to write to the HyperRAM and writing the HyperRAM).
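Cache maintenance on the Cortex-M7 works on whole 32-byte cache lines, so a buffer range usually has to be widened to line boundaries first. The helper below is a sketch of that step only; cache_range_t and cache_align_range are hypothetical names. On the target, the resulting range would typically be passed to the CMSIS-Core routines SCB_CleanDCache_by_Addr() (before the DMA reads a buffer the core has written) or SCB_InvalidateDCache_by_Addr() (before the core reads a buffer the DMA has written).

```c
#include <assert.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u /* Cortex-M7 data cache line size in bytes */

typedef struct {
    uintptr_t start; /* line-aligned start address */
    uint32_t  size;  /* size in bytes, multiple of the line size */
} cache_range_t;

/* Expand [addr, addr + size) so that it covers whole cache lines. */
static cache_range_t cache_align_range(uintptr_t addr, uint32_t size)
{
    assert(size > 0u);

    uintptr_t mask  = ~(uintptr_t)(DCACHE_LINE_SIZE - 1u);
    uintptr_t start = addr & mask;
    uintptr_t end   = (addr + size + DCACHE_LINE_SIZE - 1u) & mask;

    cache_range_t r = { start, (uint32_t)(end - start) };
    return r;
}
```

Note that invalidating a widened range discards any other dirty data sharing those lines, so DMA buffers are best placed in their own 32-byte aligned, line-sized storage.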

Regards,

Melissa

g_volokh · ‎04-22-2020

Dear Melissa,

Using DMA with a 32-byte burst works fine for writing to the HyperRAM.

But when I try to use DMA for reading from the HyperRAM (reading from address 0x7000'0000 to the OCRAM, with a 4-byte transfer size for source and destination), the DMA reports a source bus error: ES.CHNL=4 (I am using the 4th channel), ES.SBE=1, and the DMA counter and the source and destination addresses are still at their initial values.

I believe this means the error occurred on the first step.
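The error fields quoted above can be pulled out of the DMA error status value with a few masks. The bit positions below (VLD in bit 31, error channel in bits 12:8, SBE in bit 1, DBE in bit 0) are assumptions based on the usual eDMA ES register layout and should be checked against the reference manual; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED bit positions in the eDMA Error Status (ES) register. */
#define DMA_ES_VLD (1u << 31) /* at least one error is logged */
#define DMA_ES_SBE (1u << 1)  /* bus error on a source read */
#define DMA_ES_DBE (1u << 0)  /* bus error on a destination write */

/* Channel number of the last recorded error; only meaningful if VLD is set. */
static uint8_t dma_es_error_channel(uint32_t es)
{
    assert((es & DMA_ES_VLD) != 0u);
    return (uint8_t)((es >> 8) & 0x1Fu);
}

static bool dma_es_source_bus_error(uint32_t es)
{
    return (es & DMA_ES_SBE) != 0u;
}

static bool dma_es_dest_bus_error(uint32_t es)
{
    return (es & DMA_ES_DBE) != 0u;
}
```

A source bus error means the read from the source address (here the FlexSPI AHB region) was terminated with an error response, which points at the slave side and not at the DMA configuration itself.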

Why does DMA work fine for writing but not for reading over the AHB bus?

Thanks in advance.

Best regards,

George Volokh.

melissa_hunter · ‎04-22-2020

Hi George,

Are you seeing any errors reported from the FlexSPI module? The DMA configuration looks fine. The bus error indicates a problem on the FlexSPI side, so there should be an error indicator there that will hopefully give a better idea of what went wrong.

Regards,

Melissa

g_volokh · ‎04-22-2020

Melissa,

You are right, I see the AHB bus timeout.

What can be the reason?

melissa_hunter · ‎04-22-2020

Hi George,

It is an AHB bus timeout as opposed to an AHB grant timeout. This means that the FlexSPI was able to start the cycle, but it didn't complete. So it sounds like the HyperRAM is not responding for some reason (no DQS pulses coming from the memory). Can you try reading from 0x70000000 using a memory window just to verify that the HyperRAM is still working? I suspect whatever happened is a temporary condition that resolved itself (I expect reading from the memory window will probably work). If that works, then you could try manually re-triggering the DMA from a register window by writing the channel number (4) to the DMA_SSRT register.

Regards,

Melissa

g_volokh · ‎04-22-2020

Dear Melissa,

Everything works if I read the HyperRAM with

memcpy(buf, (void*)0x70000000, 1024);

But whenever I try to use DMA, I get the fault with the result I sent to you.

You wrote "I expect reading from the memory window will probably work". Sorry, I didn't understand; what do you mean?

Thanks for your support.

Regards,

George Volokh.

melissa_hunter · ‎04-23-2020

Hi George,

This is very strange. A read from the DMA should look pretty much the same as a read from any other master. So there is no obvious reason why memcpy is working when the DMA read is not (even stranger because the DMA writes are working).

In my previous response, I was suggesting to open a memory window within the debugger to look at the HyperRAM memory area. This is what I expect will work.

Were you able to try starting the DMA manually by modifying the SSRT register from a debugger register window as I suggested? Is that included in your statement that you always get a fault when reading with the DMA?

Do you have access to probe any of the FlexSPI signals with a scope/logic analyzer? In particular it would be interesting to see what the DQS and slave select signals are doing during an attempted DMA read vs. memcpy.

Regards,

Melissa

g_volokh · ‎04-23-2020

Dear Melissa,

It really does seem strange. With memcpy, reading works nicely.

I can see all the signals on the oscilloscope.

For reading I see a 1024-byte transaction length, matching the FlexSPI buffer size.

Moreover, memcmp shows that the data read back fully matches the data written.

This means that the hardware interface works correctly.

DMA writing works correctly, but DMA reading from the HyperRAM doesn't work at all. I have tried different source and destination transfer sizes (from 1 to 32 bytes).

The result is the same. Maybe some FlexSPI settings need to be changed. I can send you any information you need, including my whole project (it is made for IAR) or all the FlexSPI SFR settings after initialization.

Thanks a lot for your support.

Best regards,

George Volokh.

melissa_hunter · ‎04-23-2020

Hi George,

Using the scope to monitor the FlexSPI lines during the DMA read, do you see any activity? Do you see the slave select assert at least? As I said before, the error on the FlexSPI does indicate that it at least thinks it started the cycle. So I expect there to be some activity on the lines.

Regards,

Melissa

g_volokh · ‎04-24-2020

Melissa,

You are absolutely right. After starting the DMA to read the HyperRAM I see activity on the HyperRAM bus, but only one very small transaction: a read of a single 16-bit word.

I am attaching oscilloscope captures:

- CLK_RWDS - the CLK and RWDS signals for DMA,

- CLK_SS - the CLK and SS signals for DMA.

As I wrote before, memcpy works fine both before the DMA read and right after it.

What is wrong?

Thanks and regards,

George Volokh.

melissa_hunter · ‎04-24-2020

Hi George,

And those scope shots were obtained with the DMA SSIZE configured for 32-bit? If so, it is very strange. As you've said, the two DQS toggles should get the FlexSPI controller 16 bits of data, but the FlexSPI controller should continue to toggle the clock until it receives the full amount of data it is expecting.

In fact, I know that the FlexSPI will actually drive the clock even longer (after all the expected data has been driven onto the bus), because it takes several clocks for the data to actually reach the FlexSPI logic from the pins and then for the FlexSPI to tell the clock pad to stop driving. There should be 3 pulses on RWDS above and beyond the data actually requested.

I have most of the FlexSPI registers from after the error happened from one of your earlier posts, but not all of the registers fit. Can you send me a screenshot showing the values in the rest of the FlexSPI registers (after the AHBWAIT error) starting with STS0?

Thanks,

Melissa

g_volokh · ‎04-25-2020

Dear Melissa,

Yes, the scope shots were taken with DMA SSIZE = 32 bits. I am including the DMA registers after the error.

These are the FlexSPI registers after the AHBWAIT error:

Thanks a lot for your support.

Best regards,

George Volokh.
