Cache Corruption on MX6UL(L)

a_fatoum · ‎09-03-2019

During work on the barebox bootloader, we noticed that a particular sequence
of events reliably triggers D-Cache and I-Cache corruption on the
i.MX6 UltraLite and UltraLiteLite. We are asking NXP to take a look at it.

Attached is a self-contained binary that, when loaded at 0x9fe00000, parses
an embedded device tree and then at $pc = 0x9fe66ffc resets the SoC by writing
to the watchdog at 0x020bc000. The binary doesn't do any MMIO accesses besides
accessing the serial port at 0x02020000 and the watchdog at 0x020bc000 after a
successful run.
It doesn't do any cache maintenance and shouldn't need to:
it doesn't relocate itself, it does no MMU reconfiguration and no DMA.
This can also be verified by running it in user mode under Linux using the
attached linux-loader.c.

The binary is called after data caches have been flushed and instruction caches
were invalidated, but with the MMU enabled.

Observations on the MCIMX6ULL-EVK:
- When run under Linux, the binary reaches the expected location.
- When run from U-Boot with data caches _off_, the binary reaches the expected
location.
- When run from U-Boot with data caches _on_, the binary experiences instruction
and data cache corruption. User visible effects can vary:

* system hangs without serial output
* corrupted strings are printed to the serial console then system hangs
* the U-Boot exception handler is triggered with data abort or undefined
instruction and system resets
* the U-Boot exception handler is triggered, but experiences corrupted
instructions itself and system locks up. Even issuing a CPU halt over JTAG
fails in this case

Steps to reproduce:

1) Flash an SD card with the 6ul-corruption.sdcard image in the attached zip file.
This image contains the NXP U-Boot as bootloader, as well as two binaries in
the FAT partition: "corruption-yes" and "corruption-no".

 host$ dd if=6ul-corruption.sdcard of=/dev/sdc

2) Load "corruption-yes" with U-Boot and wait until the system hangs:

 => fatload mmc 1:1 0x9fe00000 corruption-yes
 reading corruption-yes
 870648 bytes read in 160 ms (5.2 MiB/s)
 => dcache flush
 => icache flush
 => go 0x9fe00000
 ## Starting application at 0x9FE00000 ...
 start.c: runtime offset at 0x00000000, text 0x9fe00000, barebox_base=0x9fe00000
 start.c: memory at 0x9f800000, size 0x00800000
 astart.c: initializing malloc pool at 0x9fb00000 (end 0x9fe00000)
 start.c: starting barebox...
 
 >core
 uaaaasing boarddata provided DTB
 start.c: barebox_arm_boot_dtb: using barebox_boarddata
 using boarddata provided DTB
 Will either experience cache corruption or continue...
 usf1
 error

(Note the gibberish in the line before "error", which originates from corrupted data.)

3) Reset and boot with data cache off and the output is as expected:

 => fatload mmc 1:1 0x9fe00000
 reading corruption-yes
 870648 bytes read in 160 ms (5.2 MiB/s)
 => dcache flush
 => dcache off
 => icache flush
 => go 0x9fe00000
 ## Starting application at 0x9FE00000 ...
 start.c: runtime offset at 0x00000000, text 0x9fe00000, barebox_base=0x9fe00000
 start.c: memory at 0x9f800000, size 0x00800000
 astart.c: initializing malloc pool at 0x9fb00000 (end 0x9fe00000)
 start.c: starting barebox...
 
 >core
 uaaaasing boarddata provided DTB
 start.c: barebox_arm_boot_dtb: using barebox_boarddata
 using boarddata provided DTB
 Will either experience cache corruption or continue...
 1
 2
 3
 4
 Reached end successfully
 
 U-Boot 2017.03-00887-g5a61b28d205f (Aug 27 2019 - 08:55:30 +0200)

(The system has reset after printing the success message to the console.)

----

When it's possible to halt the SoC via JTAG, the fact that cache corruption
occurred can sometimes be observed with a JTAG debugger:

- read the memory region at 0x9fe00000+0x80000 from the CPU's viewpoint
(utilizing the caches)
- clean the data cache to the point of coherence
- again, read the memory region at 0x9fe00000+0x80000 from the CPU's viewpoint.

With OpenOCD on an i.MX6UL:

 host$ openocd --log_output 6ul-cache-corruption.log
 host$ telnet 127.0.0.1 4444
 Open On-Chip Debugger
 > halt
 > imx6.cpu.0 cache auto 0
 > mdw 0x9fe00000 0x80000
 > echo "-----clean-----"
 > imx6.cpu.0 cache l1 d clean 0x9fe00000 0x80000
 > mdw 0x9fe00000 0x80000

The observation is that the memory dumps sometimes differ in a cache line. This
implies that after cleaning, there remained non-dirty cache lines whose content
differs from what's in a lower-level cache, i.e. they had been corrupted:

 --- mem-pre-clean 2019-08-22 16:00:13.040244665 +0200
 +++ mem-post-clean 2019-08-22 16:00:12.068260582 +0200
 
 -0x9fe00100: 4606fffb f06fb928 46500a0b e8bdb009 46018ff0 f03e4628 4683f899 d0392800 
 -0x9fe00120: 4606fffb f06fb928 46500a0b e8bdb009 46018ff0 f03e4628 4683f899 d0392800 
 +0x9fe00100: 49214a20 f80cf03e b9e04682 463b68e2 f113a904 d21a33ff 46436922 33fff113 
 +0x9fe00120: 2301d21a 9300aa04 9b024658 f03d4917 4682ffbc 4240b1c8 fdbcf000 46024914
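
For anyone who wants to automate the comparison of the two dumps, the following is a minimal sketch in C, not part of the attached files. The function name diff_cache_lines and its parameters are hypothetical, and the line size is left to the caller (64 bytes should match the Cortex-A7 L1 D-Cache, but treat that as an assumption to check against the TRM). It only reports which line-sized chunks differ between the pre-clean and post-clean buffers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Compare two dumps of the same memory region, one taken before and one
 * after cleaning the D-Cache, and record the offsets of the cache lines
 * whose contents differ.
 *
 * line_size is chosen by the caller (assumption: 64 bytes for the
 * Cortex-A7 L1 D-Cache). Returns the total number of differing lines;
 * at most max_out offsets are stored in out[].
 */
size_t diff_cache_lines(const uint8_t *pre, const uint8_t *post, size_t len,
                        size_t line_size, size_t *out, size_t max_out)
{
    size_t count = 0;

    if (line_size == 0)
        return 0;

    for (size_t off = 0; off < len; off += line_size) {
        /* The last chunk may be shorter than a full line. */
        size_t chunk = len - off < line_size ? len - off : line_size;

        if (memcmp(pre + off, post + off, chunk) != 0) {
            if (count < max_out)
                out[count] = off;
            count++;
        }
    }

    return count;
}
```

Each reported offset is relative to the start of the dumped region, so it has to be added to 0x9fe00000 to get the address shown by mdw.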

When dumping L1 I-Cache or L1 D-Cache with MCR p15, 3, <Rt>, c15, c4,
unexpected invalid instructions can also be observed.

Also in the FAT partition is corruption-no, which differs in that just two
instructions have been swapped:

 --- corruption-yes.thumb 2019-08-27 13:00:11.119154084 +0200
 +++ corruption-no.thumb 2019-08-27 13:00:11.359152538 +0200
 @@ -124494,6 +124494,6 @@
 3e662: 4614 mov r4, r2
 3e664: bebe bkpt 0x00be
 - 3e666: 2201 movs r2, #1
 - 3e668: fab3 f383 clz r3, r3
 + 3e666: fab3 f383 clz r3, r3
 + 3e66a: 2201 movs r2, #1
 3e66c: bebe bkpt 0x00be
 3e66e: 0320 lsls r0, r4, #12

This transformation should always be safe, especially as these two
instructions are unreachable and never executed. On the i.MX6UL(L), however, it
alters the program flow and lets the program terminate successfully, as if the
caches were off.

Placing hardware breakpoints to cover 0x9fe3e666-0x9fe3e66b has the effect
of "correcting" the runtime behavior, and the binary runs to completion.
Neither breakpoints nor watchpoints at this location are triggered, however.

This issue is reproducible when invoking the binary from U-Boot
imx_v2017.03_4.9.88_2.0.0_ga, U-Boot 2019.12.0-rc2 as well as barebox v2019.07.0.
To avoid collision between binary load address and U-Boot reserved memory,
two U-Boot patches are attached. Apply the first to uboot-imx and both to
upstream U-Boot. The U-Boot config used is mx6ull_14x14_evk_defconfig.

a_fatoum · ‎10-10-2019

Correctly enforcing the ARMv7 eXecute Never (XN) attribute on the MMDC region following the SDRAM makes the issue disappear. The attached patch can be used to verify this. This indicates that the instruction prefetcher was speculating beyond the end of SDRAM. Reading from this region makes the system hang, whereas speculating into it eventually leaves us with the corrupted cache reported above.
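
To illustrate what setting XN on that region means at the page table level, here is a minimal sketch in C. It is not the attached patch. It assumes a flat (virtual = physical) first-level table in the ARMv7 short-descriptor format with 4096 one-MiB section entries, and it assumes the SDRAM starts at 0x80000000 so that the MMDC space after the 512M ends up at 0xa0000000. The macro and function names are hypothetical.

```c
#include <stdint.h>

/* ARMv7-A short-descriptor first-level "section" entry (1 MiB each). */
#define PMD_TYPE_SECT       (2u << 0)   /* bits[1:0] = 0b10: section */
#define PMD_SECT_B          (1u << 2)   /* TEX=0, C=0, B=1: Device memory */
#define PMD_SECT_XN         (1u << 4)   /* eXecute Never */
#define PMD_SECT_DOMAIN(d)  (((uint32_t)(d) & 0xfu) << 5)
#define PMD_SECT_AP_RW      (3u << 10)  /* AP[1:0] = 0b11: read/write */
#define SECTION_SHIFT       20

/* Build a section descriptor for device memory with XN set. */
static uint32_t device_section_xn(uint32_t phys, unsigned domain)
{
    return (phys & 0xfff00000u) | PMD_SECT_AP_RW |
           PMD_SECT_DOMAIN(domain) | PMD_SECT_XN |
           PMD_SECT_B | PMD_TYPE_SECT;
}

/*
 * Flat-map every 1 MiB section touching [start, end] (inclusive) as
 * device memory with XN. ttb must point to a 4096-entry table.
 * Cache and TLB maintenance after the update is not shown here.
 */
void map_device_xn(uint32_t *ttb, uint32_t start, uint32_t end,
                   unsigned domain)
{
    uint32_t first = start >> SECTION_SHIFT;
    uint32_t last = end >> SECTION_SHIFT;

    for (uint32_t i = first; i <= last; i++)
        ttb[i] = device_section_xn(i << SECTION_SHIFT, domain);
}
```

Under the assumptions above, a call like map_device_xn(ttb, 0xa0000000, 0xffffffff, 0) would cover the region behind the SDRAM. Note that XN only takes effect if the domain used is a client domain in the DACR.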

The proper fix is to map all device memory with the XN attribute. I've drafted this patchset for barebox to do so:

[PATCH 0/3] ARMv7: mmu: fix setting eXecute Never for device memory — Barebox

U-Boot would need to do the same. It currently seems to do so only for OMAP. See the attached patch's commit message for more details:

 The ARM Architecture Reference Manual notes[1]:
 > When using the Short-descriptor translation table format, the XN     
 > attribute is not checked for domains marked as Manager.
 > Therefore, the system must not include read-sensitive memory in
 > domains marked as Manager, because the XN bit does not prevent  
 > speculative fetches from a Manager domain.
 
 To avoid speculative access to read-sensitive memory-mapped peripherals
 on ARMv7, we'll need U-Boot to use client domain permissions, so the XN
 bit can function.
 
 This issue has come up before and was fixed in de63ac278
 ("ARM: mmu: Set domain permissions to client access") for OMAP2 only.
 It's equally applicable to all ARMv7-A platforms where caches are    
 enabled.
 
 The proper fix would be to have this set for all uncached memory. The
 purpose of this patch is to only demonstrate how doing this for the  
 MMDC regions after the i.MX6ULL-EVK's 512M SDRAM fixes the hangs due
 to speculative execution on the i.MX6ULL-EVK reported at [2].

[1]: B3.7.2 - Execute-never restrictions on instruction fetching
[2]: "Cache Corruption on MX6UL(L)": https://community.nxp.com/thread/511925

Linux, both decompressor and kernel, sets the DACR correctly.
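
As a rough illustration of what "client" versus "manager" means for the DACR value, here is a small C sketch. It is not taken from U-Boot, barebox or Linux; the enum and function names are hypothetical. The DACR holds a 2-bit field for each of the 16 domains, and only the client setting has the MMU check the AP and XN bits of the translation table entry.

```c
#include <stdint.h>

/* Encoding of each 2-bit domain field in the ARMv7 DACR. */
enum domain_access {
    DOMAIN_NO_ACCESS = 0x0, /* any access generates a domain fault */
    DOMAIN_CLIENT    = 0x1, /* accesses are checked against AP and XN */
    DOMAIN_MANAGER   = 0x3, /* permission bits, including XN, are ignored */
};

/*
 * Return a copy of dacr with the field for the given domain (0..15)
 * replaced by the requested access mode. On the target, the result
 * would then be written with: mcr p15, 0, <Rt>, c3, c0, 0
 */
uint32_t dacr_set_domain(uint32_t dacr, unsigned domain,
                         enum domain_access access)
{
    unsigned shift = (domain & 0xfu) * 2;

    dacr &= ~(UINT32_C(3) << shift);
    dacr |= (uint32_t)access << shift;

    return dacr;
}
```

For example, starting from an all-manager value of 0xffffffff, switching domain 0 to client yields 0xfffffffd.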