What is the correct way to reserve the lower 2MB of DDR for use with the M4 on the i.MX7? I am having trouble getting the kernel to decompress successfully at an address outside the first 2MB of RAM (0x00208000, to be exact). When the kernel boots,
The motivation for relocating the kernel comes from this previous question: IMX7 M4 caching and execution speed
Summary: The M4 Cache only works with the first 2MB of DDR address space. In order to run the M4 out of DDR with the best performance, the binary should be loaded into this cache-able space.
The steps I have taken to use the lower 2MB:
2. Usually, ARM kernels decompress themselves at an offset of 0x8000 from the beginning of memory, which conflicts with the M4 cache-able zone. By changing textofs-y to 0x00208000 (arch/arm/Makefile:139), I am able to boot the kernel at a higher address, with a couple of BUG()s. Is this the correct way to change the kernel entry point, i.e. where the kernel decompresses itself?
3. Here are the related nodes of the device tree:
memory {
device_type = "memory";
reg = <0x80000000 0x20000000>;
linux,usable-memory = <0x80200000 0x3ff00000>;
};
reserved-memory {
#address-cells = <0x1>;
#size-cells = <0x1>;
ranges;
linux,cma {
compatible = "shared-dma-pool";
reusable;
size = <0x14000000>;
linux,cma-default;
};
m4@80000000 {
reg = <0x80000000 0x100000>;
};
};
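The conflict described in step 2 can be checked with a little arithmetic. The sketch below uses Python purely for illustration. It assumes the i.MX7 DDR base of 0x80000000 and models the address calculation as `(pc & 0xf8000000) + TEXT_OFFSET`, which is how `arch/arm/boot/compressed/head.S` derives the address when no fixed zreladdr is used. The `pc` value is a hypothetical zImage load address, and `M4_WINDOW` is simply the 2MB cache-able region from the summary above.

```python
DDR_BASE = 0x80000000                        # start of DDR on the i.MX7
M4_WINDOW = (DDR_BASE, DDR_BASE + 0x200000)  # first 2MB, cache-able by the M4


def decompress_address(pc: int, text_offset: int) -> int:
    """Model of: mov r4, pc; and r4, r4, #0xf8000000; add r4, r4, #TEXT_OFFSET."""
    return (pc & 0xF8000000) + text_offset


def in_m4_window(addr: int) -> bool:
    """True if addr falls inside the region we want to keep for the M4."""
    start, end = M4_WINDOW
    return start <= addr < end


pc = 0x80800000  # hypothetical zImage load address, for illustration only
for text_offset in (0x00008000, 0x00208000):
    addr = decompress_address(pc, text_offset)
    print(f"TEXT_OFFSET {text_offset:#010x} -> kernel at {addr:#010x}, "
          f"inside M4 window: {in_m4_window(addr)}")
```

With the default offset the kernel lands inside the M4 window; with 0x00208000 it starts just above it. This only models where the kernel text begins, not what else the kernel places below it.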
The full dmesg output is attached
Original Attachment has been moved to: dmesg.txt.zip
I've determined that the problem is not with modifying the text offset, but with how the device tree describes the reserved memory. Am I violating an alignment requirement for ARM kernels and memory nodes?
The following device tree causes the bug below it.
memory {
device_type = "memory";
reg = <0x80000000 0x40000000>;
linux,usable-memory = <0x80200000 0x3fdf0000>;
};
reserved-memory {
#address-cells = <0x1>;
#size-cells = <0x1>;
ranges;
linux,cma {
compatible = "shared-dma-pool";
reusable;
size = <0x14000000>;
linux,cma-default;
};
m4@80000000 {
reg = <0x80000000 0x200000>;
};
rpmsg@BFFF0000 {
reg = <0xbfff0000 0x10000>;
};
};
BUG: Bad page state in process swapper pfn:bff8c
page:bfd5a180 count:0 mapcount:0 mapping: (null) index:0x0
flags: 0x70860253(locked|error|dirty|active|arch_1|mappedtodisk|reclaim)
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
bad because of flags:
flags: 0x41(locked|active)
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 4.1.15+ #2
Hardware name: Freescale i.MX7 Dual (Device Tree)
[<80215e78>] (unwind_backtrace) from [<802127dc>] (show_stack+0x10/0x14)
[<802127dc>] (show_stack) from [<80729910>] (dump_stack+0x84/0xc4)
[<80729910>] (dump_stack) from [<802b31ec>] (bad_page+0xcc/0x11c)
[<802b31ec>] (bad_page) from [<802b3464>] (free_pages_prepare+0x228/0x28c)
[<802b3464>] (free_pages_prepare) from [<802b5338>] (free_hot_cold_page+0x34/0x198)
[<802b5338>] (free_hot_cold_page) from [<802b56fc>] (free_highmem_page+0x28/0x78)
[<802b56fc>] (free_highmem_page) from [<809451a8>] (mem_init+0x210/0x404)
[<809451a8>] (mem_init) from [<8093facc>] (start_kernel+0x1fc/0x3a8)
[<8093facc>] (start_kernel) from [<8020807c>] (0x8020807c)
This change to the device tree removes the bug from the kernel, but breaks userspace programs (e.g. gdbserver):
memory {
device_type = "memory";
reg = <0x80000000 0x40000000>;
- linux,usable-memory = <0x80200000 0x3fdf0000>;
+ linux,usable-memory = <0x80200000 0x3fd00000>;
}
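To see how the two sizes relate to the BUG report, it helps to convert the reported pfn into a physical address. The sketch below (Python, for illustration only) assumes 4 KiB pages, which is an assumption about this kernel configuration rather than something stated in the log. It only locates the page; it does not explain why its flags are bad.

```python
PAGE_SHIFT = 12           # assumption: 4 KiB pages
BAD_PFN = 0xBFF8C         # "pfn:bff8c" from the BUG report
USABLE_BASE = 0x80200000  # start of linux,usable-memory


def pfn_to_phys(pfn: int) -> int:
    """Physical address of the first byte of the given page frame."""
    return pfn << PAGE_SHIFT


def usable_end(size: int) -> int:
    """First address past the usable range for a given size."""
    return USABLE_BASE + size


phys = pfn_to_phys(BAD_PFN)
for size in (0x3FDF0000, 0x3FD00000):
    end = usable_end(size)
    inside = USABLE_BASE <= phys < end
    print(f"size {size:#010x}: usable range ends at {end:#010x}, "
          f"page {phys:#010x} inside: {inside}")
```

Under that assumption the offending page sits at 0xBFF8C000, inside the usable range for the first size and outside it for the second, which is consistent with the BUG disappearing after the change.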
Hi Ryan,
Please try
memory {
device_type = "memory";
reg = <0x80000000 0x40000000>;
- linux,usable-memory = <0x80200000 0x3fdf0000>;
+ linux,usable-memory = <0x80200000 0x3fbf0000>;
}
I think the problem is that you are not subtracting the offset from the whole size, so the final size is larger than the real memory size and the MCU may consider some memory space out of bounds.
Regards,
Carlos
Hi Carlos,
The original size is correct. RAM goes from address 0x80000000 to 0xC0000000:
0x80000000 - 0x801FFFFF is for the M4
0x80200000 - 0xBFEFFFFF is for the Linux OS
0xBFFF0000 - 0xBFFFFFFF is for RPMSG
In the device tree, 0x80200000 + 0x3FDF0000 = 0xBFFF0000.
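The sizes discussed so far can be compared directly. The following is a small Python sketch, for illustration only, using the base addresses from the device tree above; the constant names are made up for this example.

```python
RAM_BASE, RAM_SIZE = 0x80000000, 0x40000000  # memory node "reg"
USABLE_BASE = 0x80200000                     # start of linux,usable-memory
RPMSG_BASE = 0xBFFF0000                      # rpmsg@BFFF0000 reserved node


def usable_end(size: int) -> int:
    """First address past the Linux-usable range for a given size."""
    return USABLE_BASE + size


for size in (0x3FDF0000, 0x3FD00000, 0x3FBF0000):
    end = usable_end(size)
    gap = RPMSG_BASE - end  # unused bytes between Linux and the RPMSG buffer
    print(f"size {size:#010x}: Linux ends at {end:#010x}, "
          f"gap before RPMSG: {gap:#x}")
```

A size of 0x3FDF0000 ends exactly where the RPMSG region starts, while 0x3FBF0000 stops 2MB short of it.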
I've uploaded our current solution, which seems to mostly work. It isn't currently under moderator review.
In your document the first method of reserving the first 2MB from linux creates userspace regressions. In the TI document you linked earlier (http://processors.wiki.ti.com/index.php/HOWTO_Change_the_Linux_Kernel_Start_Address ) as well as AN5127 it is stated that the kernel requires start addresses to be a multiple of 16MB. Is this something you tried?
Hi Allen,
Are the userspace regressions you refer to when using the "linux,usable-memory" device tree node? We haven't seen any userspace errors since switching to the "reserved-memory" approach.
No, I haven't tried having the start of memory be 16MB aligned.
I was referring to the example where you just modify the "linux,usable-memory" node, the first attempt you discuss. I wondered whether, if you had reserved the full 16MB, you could have avoided the lengths you described, since there might not have been any userspace bugs.
It looks like I cannot change the text offset by 16MB as described by the makefile modification of "textofs-$(CONFIG_ARCH_MXC)" or I get an assembly error unfortunately. Any thoughts?
evanthompson got this far and modified the assembly to fix the error by hardcoding the offset. The kernel compiled, but did not boot. The 16MB offset solution was abandoned after this.
Here are the changes he made, for reference:
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -198,7 +198,9 @@ not_angel:
mov r4, pc
and r4, r4, #0xf8000000
/* Determine final kernel image address. */
- add r4, r4, #TEXT_OFFSET
+ add r4, r4, #0x00008000
+ add r4, r4, #0x00800000
+// add r4, r4, #TEXT_OFFSET
#else
ldr r4, =zreladdr
#endif
--- a/arch/arm/kernel/head.S
+++ b/arch/arm/kernel/head.S
@@ -52,7 +52,9 @@
.equ swapper_pg_dir, KERNEL_RAM_VADDR - PG_DIR_SIZE
.macro pgtbl, rd, phys
- add \rd, \phys, #TEXT_OFFSET
+// add \rd, \phys, #TEXT_OFFSET
+ add \rd, \phys, #0x00008000
+ add \rd, \rd, #0x00800000
sub \rd, \rd, #PG_DIR_SIZE
.endm
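One plausible reason the offset is split into two `add` instructions in the diff above is the ARM data-processing immediate encoding: an immediate operand must be an 8-bit value rotated right by an even number of bits. Whether this is the exact assembly error seen here is an assumption, since the error message is not quoted in the thread. The Python sketch below checks which of the constants discussed in this thread fit in a single immediate; `is_arm_immediate` is a helper written for this example.

```python
def is_arm_immediate(value: int) -> bool:
    """True if value is an 8-bit constant rotated right by an even amount (ARM mode)."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating left by `rot` undoes a rotate-right by `rot`.
        rotated = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if rotated < 0x100:
            return True
    return False


for const in (0x00008000, 0x00800000, 0x00208000, 0x00808000, 0x01008000):
    print(f"{const:#010x}: single immediate = {is_arm_immediate(const)}")
```

The two constants used in the diff, 0x00008000 and 0x00800000, each fit on their own, as does the 2MB offset 0x00208000. Their sum 0x00808000 and a 16MB-based offset of 0x01008000 do not, so they need two instructions or a literal load.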
I was able to get it to work and boot with the attached patch. I am using the Toradex Colibri dev board, but the same changes to the Colibri dts could be made to the imx7d-sdb-m4.dtsi.
So far I have not tried to load the firmware from linux as I have only reserved the memory in the dts but not configured it with the shared-dma-pool compatible property.
I have since tried compiling this patch with the kernel provided with the Compulab CL-SOM-iMX7 and found that it no longer works with the movw and movt instructions. When I tried using the ldr instruction instead to load a full 32-bit constant, the kernel still halted prior to booting with a 16MB offset. With a 2MB offset the kernel has no issue booting using the ldr instruction, so there is some additional issue booting the Compulab kernel with the 16MB offset.
Status update:
I had relied on a combination of the method described in the patch and the method described in the reserve_m4_memory.docx. Unfortunately after months of success with the combo I hit a final roadblock related to module relocation and getuser(). In the end I found that AN5127_R1.pdf (just google it) describes what seems to be the best method and avoids a lot of issues documented in the reserve_m4_memory.docx. The method in the app note along with a change of the dts to set the linux usable memory to the correct offset essentially "hides" the first 16MB of ram. This seems preferable to using the CMA and changing the TEXT_OFFSET because it requires no patching to the kernel code, just build configuration.
DTS Modification:
/* usable memory */
memory {
- linux,usable-memory = <0x80000000 0x1ff00000>,
+ linux,usable-memory = <0x81000000 0x1ef00000>,
<0xa0000000 0x1ff00000>;
};
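The numbers in this DTS change can be sanity-checked as follows. This is a Python sketch for illustration only; the tuple names are made up for the example. Moving the start of the first bank up by 16MB while shrinking it by the same amount should leave the end of the bank unchanged.

```python
MB = 1 << 20

old_bank = (0x80000000, 0x1FF00000)  # (base, size) before the change
new_bank = (0x81000000, 0x1EF00000)  # (base, size) after the change


def bank_end(bank: tuple) -> int:
    """First address past the end of a (base, size) bank."""
    base, size = bank
    return base + size


hidden = new_bank[0] - old_bank[0]
print(f"hidden from Linux: {hidden // MB} MB")
print(f"old end {bank_end(old_bank):#010x}, new end {bank_end(new_bank):#010x}")
print(f"new base 16MB aligned: {new_bank[0] % (16 * MB) == 0}")
```

The second bank starting at 0xa0000000 is untouched by the change, so it is not modelled here.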
Could a moderator review my document and approve it?
Hi Ryan,
I found your document very useful, since I have the same needs.
I've just started working with the i.MX7 and I have a question about your doc: the "Kernel Module to Reload M4" chapter is only needed to reload the M4 firmware while Linux is running. If I just update the M4 file in "mnt/mmcboot" (in this case I load from eMMC) and then reboot, there is no need to change FORCE_MAX_ZONEORDER. Is this correct?
Thanks,
Simone
That is correct, you don't need to change FORCE_MAX_ZONEORDER if you are not reloading the M4 from Linux.
Hi Ryan,
Don't modify anything in the kernel Makefile if you want to relocate kernel address.
Please modify CONFIG_LOADADDR and CONFIG_SYS_TEXT_BASE in U-Boot to satisfy your requirement. Also, you need to reserve the first 2MB of DDR memory in the device tree.
Best regards,
Carlos
CONFIG_LOADADDR is where the zImage is placed in memory, not the final destination of the decompressed kernel.
CONFIG_SYS_TEXT_BASE is the location that u-boot is initially placed in memory. U-boot will then relocate itself to the top of RAM. (linux kernel - Understand U-Boot memory footprint - Stack Overflow ).
This does not change the final location of the kernel *after decompression*. Subsequently, when execution is handed to the kernel, it will decompress itself to 0x80008000, which conflicts with the memory reserved for the M4. Adding a reserved memory node to the device tree does not change this.
This link is an example of what I am trying to accomplish with the i.MX7: HOWTO Change the Linux Kernel Start Address - Texas Instruments Wiki
Hi Ryan,
I understand your point, I have just one question. Where did you see that "when execution is handed to the kernel, it will decompress itself to 0x80008000". I cannot find this code.
Regards,
Carlos
Hi Carlos,
Thanks for helping out. Here is a link to where I read that the kernel is decompressed to 0x80008000 (the RAM base address of the i.MX7 plus the 0x8000 offset): Booting ARM Linux. The location of kernel code and data can also be found with `cat /proc/iomem`.
I believe the code that actually does the decompression is in arch/arm/boot/compressed/decompress.c, which calls __decompress from the listed includes depending on the type of compression. Here is a link to the overview of the kernel startup process: What is the Linux boot sequence in case of ARM processor - Quora
Best,
Ryan