i.MXRT1060 SEMC SDRAM Data Corruption

Specialist I

We are having apparent SDRAM Corruption problems. These are very intermittent, usually once every few HOURS of running. We are wondering if this is happening to anyone else, or if it matches any known problems.

We have this happening on our boards, but have been able to get it to fail on the NXP MIMXRT1060-EVK board as well.

We have an 8MiB DRAM at 0x80000000-0x80800000 and have configured the MPU to override the default WT cache attribute with WBWA to avoid ARM erratum 1259864. So we're using "Write-Back" mode rather than the default "Write-Through".
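For readers who want to see what such an MPU override might look like, here is a minimal sketch. It only builds the ARMv7-M MPU_RASR value for a Normal, Write-Back Write-Allocate region and stores it through pointers. The helper names, the choice of full-access permissions and the region number used in the usage note are assumptions for illustration, not the poster's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-M MPU_RASR field positions. */
#define RASR_ENABLE   (1u << 0)
#define RASR_SIZE(n)  ((uint32_t)(n) << 1)   /* region size = 2^(n+1) bytes */
#define RASR_B        (1u << 16)
#define RASR_C        (1u << 17)
#define RASR_TEX(n)   ((uint32_t)(n) << 19)
#define RASR_AP_FULL  (3u << 24)             /* read/write at any privilege (assumed) */

/* RASR value for a Normal, Write-Back Write-Allocate, non-shareable
 * region (TEX=0b001, C=1, B=1) covering 2^size_log2 bytes. */
static inline uint32_t mpu_rasr_wbwa(unsigned size_log2)
{
    assert(size_log2 >= 5 && size_log2 <= 32);   /* 32 bytes .. 4 GiB */
    return RASR_AP_FULL | RASR_TEX(1) | RASR_C | RASR_B |
           RASR_SIZE(size_log2 - 1u) | RASR_ENABLE;
}

/* Hypothetical helper: program one region through pointers to MPU_RNR,
 * MPU_RBAR and MPU_RASR (0xE000ED98, 0xE000ED9C, 0xE000EDA0 on target). */
static inline void mpu_set_region(volatile uint32_t *rnr,
                                  volatile uint32_t *rbar,
                                  volatile uint32_t *rasr,
                                  uint32_t region, uint32_t base,
                                  uint32_t rasr_value)
{
    *rnr  = region;      /* select the region to program */
    *rbar = base;        /* VALID bit left clear, so RNR picks the region */
    *rasr = rasr_value;
}
```

For the 8MiB window described above, the value would be mpu_rasr_wbwa(23) with a base of 0x80000000. On real hardware the MPU would normally be disabled while the region is changed, and re-enabled followed by DSB and ISB barriers.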

The problem seems to be that sometimes data fails to be written to SDRAM. We have full traces (using a high speed debug pod with instruction and data trace) showing "0" being written to and read from an SDRAM location and 20,800 trace lines later, a different value is written, but then the location is read back as "0". This tracing is from the CPU's view, and can't show if and how that data got flushed from the cache back to the SDRAM, or if it was written back at all.

We have checked and changed all the SEMC SDRAM timing parameters, making them tighter, and then more lenient. We've used the NXP SDK DCD SEMC settings. We've changed the port settings to higher and lower impedances, and with different slew rates. None of those changes affect this corruption.

We are not running the SDK sample code because it can't support the simultaneous operations we require. We are transferring 5MB/second over USB and writing to an eMMC at the same time, as well as having the CPU about 70% busy. The code is in SDRAM, as is most of the data. We are using the I-TCM, D-TCM and OCRAM2 as well.

The USB and EMMC are using their own DMA to and from the SDRAM, overlapping with the CPU. Our drivers are performing all the Cache Flush and Invalidation operations required for this.
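The driver code is not shown in the thread. As a rough illustration of one thing such cache maintenance has to get right, the sketch below rounds a DMA buffer out to whole 32-byte Cortex-M7 data cache lines before it would be handed to a by-address clean or invalidate routine (for example the CMSIS SCB_CleanDCache_by_Addr and SCB_InvalidateDCache_by_Addr functions). The type and function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE 32u   /* Cortex-M7 data cache line size in bytes */

typedef struct {
    uintptr_t start;    /* rounded down to a cache line boundary */
    size_t    length;   /* rounded up to a whole number of lines */
} cache_range_t;

/* Expand [addr, addr+len) to the smallest line-aligned range covering it. */
static inline cache_range_t cache_align_range(uintptr_t addr, size_t len)
{
    cache_range_t r;
    uintptr_t end = addr + len;
    assert(end >= addr);                           /* no wrap-around */
    r.start  = addr & ~(uintptr_t)(DCACHE_LINE - 1u);
    end      = (end + DCACHE_LINE - 1u) & ~(uintptr_t)(DCACHE_LINE - 1u);
    r.length = (size_t)(end - r.start);
    return r;
}

/* True when a buffer already starts and ends on cache line boundaries.
 * Invalidating a buffer that fails this check can discard dirty data
 * belonging to whatever shares its first or last line. */
static inline int cache_range_is_line_aligned(uintptr_t addr, size_t len)
{
    return (addr % DCACHE_LINE == 0u) && (len % DCACHE_LINE == 0u);
}
```

A common design choice is to allocate DMA buffers so that cache_range_is_line_aligned() always holds, which makes the rounding a no-op and keeps invalidation from touching neighbouring variables.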

This failure can be made to happen more often by increasing the SEMC SDRAM refresh rate to very high rates. It fails every few minutes when configured like that.

We have found two ways to stop the corruption.

Setting the fn_mod register of the m_b_0 (Cortex-M7) port of the NIC-301 interconnect to limit the write issuing capability to one (write '2' to address 0x41442108) appears to stop it from happening.

Setting the "DISRAMODE: Disables dynamic read allocate mode for Write-Back Write-Allocate memory regions" in the ARM CPU also stops it. This is achieved by setting 0xe000e008 to 0x00001800.
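The two workarounds above could be expressed roughly as follows. The addresses and values are the ones quoted in the post; the macro and function names are made up for this sketch, and the bit positions (write_iss_override as bit 1 of fn_mod, DISRAMODE as bit 11 of ACTLR) are my reading of the NIC-301 and Cortex-M7 documentation, so treat them as assumptions. Note that the post writes ACTLR as a whole (0x00001800), whereas this sketch only ORs in the DISRAMODE bit.

```c
#include <assert.h>
#include <stdint.h>

/* Addresses and values as quoted in the post. */
#define NIC301_M_B_0_FN_MOD_ADDR  0x41442108u
#define CM7_ACTLR_ADDR            0xE000E008u
#define ACTLR_VALUE_FROM_POST     0x00001800u

#define FN_MOD_WRITE_ISS_OVERRIDE (1u << 1)    /* the '2' written in the post */
#define ACTLR_DISRAMODE           (1u << 11)   /* assumed bit position */

/* The value from the post must at least contain the DISRAMODE bit. */
static_assert((ACTLR_VALUE_FROM_POST & ACTLR_DISRAMODE) != 0u,
              "0x1800 is expected to include DISRAMODE");

/* Workaround 1: limit the Cortex-M7 port to one outstanding write. */
static inline void limit_m7_write_issuing(volatile uint32_t *fn_mod)
{
    *fn_mod = FN_MOD_WRITE_ISS_OVERRIDE;
}

/* Workaround 2: disable dynamic read allocate mode for WBWA regions. */
static inline void disable_dynamic_read_allocate(volatile uint32_t *actlr)
{
    *actlr |= ACTLR_DISRAMODE;
}
```

On the target these would be called with (volatile uint32_t *)NIC301_M_B_0_FN_MOD_ADDR and (volatile uint32_t *)CM7_ACTLR_ADDR respectively, early in start-up and followed by DSB and ISB barriers. Only one of the two is needed according to the post.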

Both of these affect the maximum rate that the CPU can burst to the SDRAM, and it is this high burst rate (from the CPU through the caches, the AXI, NIC, SEMC and to the SDRAM) that seems to be triggering this problem. Increasing the SDRAM Refresh rate probably applies "backpressure" on the memory system and the write pipeline.

We have this failing on the EVK, but there's no way we could generate a program that demonstrates this based solely on the SDK code, so please don't ask for that.

Tom


Specialist I

That App Note doesn't mention DMA at all.

The latest Errata (Chip Errata for the i.MX RT1060 (REV 1.1)) doesn't detail any Core/DMA conflict. It only mentions SEMC/NAND problems.

Is there Errata or a document detailing this problem?

Is this a problem in the CPU, the NIC-301 or the SEMC?

The DMA and M7 access looks to be mediated by the NIC-301. Are there any register changes in there that might help? I've already swapped the CPU and DMA priority, and it didn't seem to help.

There are also registers in the SEMC (BMCR0, BMCR1) that look to control "8 entries command reordering in Queue B; QoS priority, latency and efficiency adjustable arbitration scheme.". There's nothing in the Reference Manual detailing what they do or how to use them. There's nothing in the App Note you referenced that does either. Is anything available detailing what these do and how to set them? Could they change this problem?

Disabling data cache on the SDRAM would slow the system down terribly. We've already found two different ways to "reduce the bandwidth" that makes the problem go away, but we'd prefer a fix that doesn't slow the system down.

Tom

 

NXP TechSupport

Hi Tom,

I double checked with local i.MX RT product team about this issue.

They have the following suggestion for you to try first:

Please try changing the SEMC registers BMCR0 and BMCR1 to 0x81.
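A minimal sketch of that register change is shown below. The struct mirrors only the first four SEMC registers; the offsets (BMCR0 at 0x08, BMCR1 at 0x0C) and the SEMC base address of 0x402F0000 are assumptions taken from my reading of the RT1060 memory map and should be checked against the device header before use. The names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Head of the SEMC register block (layout assumed, see lead-in). */
typedef struct {
    volatile uint32_t MCR;     /* 0x00 */
    volatile uint32_t IOCR;    /* 0x04 */
    volatile uint32_t BMCR0;   /* 0x08 */
    volatile uint32_t BMCR1;   /* 0x0C */
} semc_head_t;

#define SEMC_BASE_ADDR_ASSUMED 0x402F0000u
#define SEMC_BMCR_SUGGESTED    0x00000081u   /* value suggested in this thread */

static_assert(offsetof(semc_head_t, BMCR0) == 0x08, "BMCR0 offset");
static_assert(offsetof(semc_head_t, BMCR1) == 0x0C, "BMCR1 offset");

/* Write the suggested value to both bus master control registers. */
static inline void semc_apply_bmcr_workaround(semc_head_t *semc)
{
    semc->BMCR0 = SEMC_BMCR_SUGGESTED;
    semc->BMCR1 = SEMC_BMCR_SUGGESTED;
}
```

On the target this would be called as semc_apply_bmcr_workaround((semc_head_t *)SEMC_BASE_ADDR_ASSUMED). If the SDRAM is initialised from a DCD or a debugger script, the BMCR values there would need the same change, otherwise later code has to override them.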

Please let us know the test result. Thanks.

best regards,

Mike

Specialist I

> Please try changing the SEMC registers BMCR0 and BMCR1 to 0x81.

We tried that last night with three of our units and with the NXP Evaluation board.

All ran reliably with this modification, so that change fixes this problem.

What did that value change actually do? We would like to have some understanding of the fix.

I notice that these register values have been changed in the past. This document details a previous problem, without saying what it was and what the fix was:

https://mcuxpresso.nxp.com/api_doc/dev/1891_doc/MCUXpresso%20SDK%20Release%20Notes%20for%20EVK-MIMXR...

2.0.4
Bug Fixes
* Fixed the SEMC queueA and queueB weight configuration issue

One of the difficulties we had was that the boards could run for many hours before this problem showed up with something that we noticed, usually a Crash through one of the Exceptions, or with the "Asserts" we have in the code. We have no idea how many "undetected corruptions" we were getting, if any. We had to find ways to make these errors more frequent so we could characterize them, and test changes (like the BMCRn change).

Anyone else having intermittent problems that look like ours might like to know how to make them more frequent to help with their tests.

We found that making the SDRAM Refresh extremely frequent made the SDRAM corrupt more often. Slowing the SEMC clock down also helped make it fail. We changed CCM_CBCDR[SEMC_PODF] from "2" to "7" (166MHz down to 62MHz) and changed SDRAM_CR3 (Refresh) from the usual "0x3c1e0b09" to "0x0a09010f". That is trying to trigger an 8-burst refresh every 9 clocks. That usually gets us a failure within a minute, but with the BMCRn changes it ran all night.
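To make those stress settings easier to read, the sketch below splits an SDRAMCR3 value into its fields and rescales a clock for a new divider. The field layout (REN in bit 0, REBL in bits 3:1, PRESCALE in bits 15:8, RT in bits 23:16, UT in bits 31:24) is an assumption from my reading of the reference manual, as is the divider being PODF + 1; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded SEMC SDRAMCR3 (refresh control) fields; layout assumed. */
typedef struct {
    unsigned ren;        /* bit 0: refresh enable */
    unsigned rebl;       /* bits 3:1: refresh burst length, encoded as N-1 */
    unsigned prescale;   /* bits 15:8: prescaler period */
    unsigned rt;         /* bits 23:16: refresh timer period */
    unsigned ut;         /* bits 31:24: urgent refresh threshold */
} sdramcr3_t;

static inline sdramcr3_t sdramcr3_decode(uint32_t v)
{
    sdramcr3_t f;
    f.ren      = v & 0x1u;
    f.rebl     = (v >> 1) & 0x7u;
    f.prescale = (v >> 8) & 0xFFu;
    f.rt       = (v >> 16) & 0xFFu;
    f.ut       = (v >> 24) & 0xFFu;
    return f;
}

/* Clock after changing a 3-bit PODF divider field, assuming the
 * divide ratio is PODF + 1 and the source clock is unchanged. */
static inline uint32_t rescale_clock_khz(uint32_t khz, unsigned old_podf,
                                         unsigned new_podf)
{
    assert(old_podf <= 7u && new_podf <= 7u);
    return (uint32_t)(((uint64_t)khz * (old_podf + 1u)) / (new_podf + 1u));
}
```

Decoding 0x0a09010f under this layout gives REBL = 7 (an 8-burst refresh), RT = 9 and PRESCALE = 1, which is consistent with the description above, and rescale_clock_khz(166000, 2, 7) gives 62250 kHz, matching the quoted 62MHz.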

Tom

 

NXP TechSupport

Hi Tom,

Glad to know the issue was fixed.

I am checking with the i.MX RT product team about the explanation (why setting BMCR0 and BMCR1 to 0x81 fixes the issue).

I will update here when there is any feedback.

Thanks for the patience.

Mike

Specialist I

Could you also please advise the MCUXpresso Team to look at the values that they recommend in the SDK and to make any required changes.

I find it a little confusing as there are three very different sets of values for these registers in the SDK. To that we can add the fourth set that you have just relayed to us. The field values range widely and are very different from each other.

To detail these, in the "SDK_2.8.2_MIMXRT1062xxxxA.zip" file I retrieved today, there are:

  1. 213 "dcd.c" files:        BMCR0 = 0x00030524, BMCR1 = 0x06030524
  2. 160 ".jlinkscript" files: BMCR0 = 0x00030524, BMCR1 = 0x06030524
  3. 10 ".mex" files:          BMCR0 = 0x00030524, BMCR1 = 0x06030524
  4. bl_semc.c:                BMCR0 = 0x00404085, BMCR1 = 0x00400085
  5. fsl_semc.c:               BMCR0 = 0x00104085, BMCR1 = 0x40246085
  6. Today's advice:           BMCR0 = 0x00000081, BMCR1 = 0x00000081

(4) is middleware/mcu-boot/src/drivers/semc/bl_semc.c
(5) is devices/MIMXRT1062/drivers/fsl_semc.c
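One way to compare these sets is to split each value into its fields. The sketch below uses a generic split of bits 3:0, 7:4, 15:8, 23:16 and 31:24. My understanding is that these correspond to WQOS, WAGE, WSH (BMCR0) or WPH (BMCR1), WRWS and WBR (BMCR1 only), but that naming and layout is an assumption and is not confirmed anywhere in this thread. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Generic BMCRn field split; field meanings are assumed (see lead-in). */
typedef struct {
    unsigned wqos;     /* bits 3:0   */
    unsigned wage;     /* bits 7:4   */
    unsigned w15_8;    /* bits 15:8  (WSH in BMCR0, WPH in BMCR1) */
    unsigned w23_16;   /* bits 23:16 (WRWS) */
    unsigned w31_24;   /* bits 31:24 (WBR, BMCR1 only) */
} bmcr_fields_t;

static inline bmcr_fields_t bmcr_decode(uint32_t v)
{
    bmcr_fields_t f;
    f.wqos   = v & 0xFu;
    f.wage   = (v >> 4) & 0xFu;
    f.w15_8  = (v >> 8) & 0xFFu;
    f.w23_16 = (v >> 16) & 0xFFu;
    f.w31_24 = (v >> 24) & 0xFFu;
    return f;
}

static inline uint32_t bmcr_encode(bmcr_fields_t f)
{
    assert(f.wqos <= 0xFu && f.wage <= 0xFu);
    assert(f.w15_8 <= 0xFFu && f.w23_16 <= 0xFFu && f.w31_24 <= 0xFFu);
    return (uint32_t)f.wqos | ((uint32_t)f.wage << 4) |
           ((uint32_t)f.w15_8 << 8) | ((uint32_t)f.w23_16 << 16) |
           ((uint32_t)f.w31_24 << 24);
}
```

Under this split, 0x00000081 leaves every field above bit 7 at zero, while the dcd.c value 0x00030524 has non-zero values in bits 15:8 and 23:16.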

Tom

 

NXP TechSupport

Hi Tom,

Thanks for the patience.

I think you have already received the attached explanation from our local i.MX RT product team.

The SEMC module has a re-order feature, which can cause issues when multiple AXI masters access the SDRAM with large/burst data operations (back-to-back operations).

Please check the attached PDF file for the detailed info.

Regarding the BMCRx register values, we have submitted a request to the SDK software team to change both BMCRx registers to 0x81 to avoid similar issues.

Thanks for the attention.

Mike

Specialist I

Thank you.

That explanation matches the corruption we were seeing.

That and the explanation of how the workaround functions give us confidence that this problem won't come back.

Tom

 

Specialist I

When the SEMC is used as intended (programming the queues for "best operation"), it performs the operations in the wrong order and corrupts the memory. It fails like this when used as intended.

It fails like this when programmed by the SDK, or using the SDK as an example. All of the example code that is "out there" has it enabled.

The workaround effectively disables the function of the queue.

This matches the criteria for documenting this as an Errata Item. Detail the problem, give the workaround and document which version of the SDK has the workaround applied.

I would hope to see an updated Errata item soon.

Tom

 

NXP TechSupport

Hi Tom,

There is an application note, AN12437, about i.MX RT series performance optimization.

The SDRAM data corruption during write operations was caused by the ARM Cortex-M7 core and DMA conflicting when writing to SDRAM at the same time. Slowing down the Cortex-M7 core's write bandwidth by setting the fn_mod register of the m_b_0 (Cortex-M7) port of the NIC-301 interconnect to limit the write issuing capability could reduce the possibility of conflicts.

Could you try disabling the D-cache for the SDRAM memory range and check whether that fixes the issue?

Thanks for the attention.

Mike 

 
