i.MX 8M Quad synchronous EA on GPC access from EL0

stefankalkowski · ‎07-07-2020

Hi all,

I am currently implementing an i.MX 8M platform driver for the component-based Genode OS framework. Genode is a microkernel architecture, and drivers are simple applications executed in EL0. I have already used the driver successfully to configure, e.g., the Clock Control Module (CCM). However, when I access certain registers of the General Power Controller (GPC), I receive a synchronous external data abort. I can successfully access the general configuration registers of the same GPC MMIO region (everything between 0x303a0000 and 0x303a0250), but if I read or write the Power Gating Controller (PGC) registers of any power domain (>= 0x303a0800), I encounter a synchronous EA.
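For reference, the address split described above can be written down as a small classifier, for example to label probe results when narrowing down which offsets abort. This is only a sketch: the three addresses come from this post, while `GPC_MMIO_END` (a 64 KiB window) and all names are assumptions made for illustration, not values from the reference manual.

```c
#include <stdint.h>

/* Addresses quoted in the post above. */
#define GPC_MMIO_BASE 0x303a0000ULL /* start of the GPC MMIO region          */
#define GPC_CFG_LAST  0x303a0250ULL /* last offset reported to work from EL0 */
#define GPC_PGC_FIRST 0x303a0800ULL /* first offset reported to abort in EL0 */

/* Assumption for illustration only: a 64 KiB window for the GPC. */
#define GPC_MMIO_END  0x303b0000ULL

enum gpc_region {
    GPC_OUTSIDE,        /* not inside the (assumed) GPC window          */
    GPC_GENERAL_CONFIG, /* general configuration registers              */
    GPC_UNTESTED,       /* between both ranges, not covered by the post */
    GPC_PGC             /* Power Gating Controller registers            */
};

/* Map a physical address to the region it falls into. */
static enum gpc_region gpc_classify(uint64_t paddr)
{
    if (paddr < GPC_MMIO_BASE || paddr >= GPC_MMIO_END)
        return GPC_OUTSIDE;
    if (paddr <= GPC_CFG_LAST)
        return GPC_GENERAL_CONFIG;
    if (paddr < GPC_PGC_FIRST)
        return GPC_UNTESTED;
    return GPC_PGC;
}
```

With these ranges, only addresses classified as `GPC_PGC` are the ones that abort when accessed from EL0.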

The curious thing is that whenever I do the same accesses in EL1 in the kernel, I can power the corresponding domains up and down without failure. I therefore suspected that one of the security modules in the SoC, such as AIPSTZ1-4 or the Central Security Unit (CSU), prohibits the EL0 access. However, the ARM Trusted Firmware (ATF) configures them to allow any access, including normal-world, unprivileged accesses. I can read those CSU and AIPSTZ registers and see that they are locked but allow any access.

Of course, I can build a workaround that either does the GPC access in the kernel or calls the ATF from the kernel. However, that would compromise the otherwise clean design.

But foremost, I would like to understand which mechanism in the SoC returns the bus-error in my use-case?

Any clarification is highly appreciated. Thank you in advance.

nxf63969 · ‎07-27-2020

I apologize for the late reply.

To understand which mechanism in the SoC caused the SEA, it is important to understand what an SEA is.
It is an abort that results from an invalid or unsuccessful memory access.
External aborts are caused by errors flagged by the AXI interfaces when the request goes external to the processor. You can read the ARM Data Fault Status Register (DFSR) to determine the fault type (there are bit fields related to the type of abort, e.g. 0b01000 is SEA, but you could also check the SD bit, which is valid only for external aborts and indicates the AXI/AHB error that caused it).

The typical causes of an SEA are: some AXI mechanism, clocking, or TrustZone (when accessing protected memory).

I know you have ruled out some of them. Another possible cause, which would explain why accesses in EL1 work while accesses in EL0 are prohibited, is that you did not set up the AP[2:1] bit field.
The data access permission controls are:
AP[2] selects between read-only and read/write access.
AP[1] selects between Application level (EL0) and System level (EL1) control.
This provides four permission settings for data accesses. One of them is "Read/write at EL1, no access by software executing at EL0" (0b00). You probably have this configuration.
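The four AP[2:1] settings can be sketched as a small decoder. This is a minimal illustration, assuming a stage 1 block or page descriptor of the EL1&0 translation regime, where AP[2:1] sits in descriptor bits [7:6]; it ignores hierarchical APTable bits and other controls, and the struct and function names are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Effective data access permissions encoded by AP[2:1]. */
struct ap_perms {
    bool el1_write; /* EL1 may write (EL1 may always read)    */
    bool el0_read;  /* EL0 may read                           */
    bool el0_write; /* EL0 may write                          */
};

/* Decode AP[2:1] from a stage 1 block/page descriptor. */
static struct ap_perms decode_ap(uint64_t desc)
{
    unsigned ap        = (unsigned)((desc >> 6) & 0x3); /* bits [7:6]  */
    bool     read_only = (ap & 0x2) != 0;               /* AP[2]       */
    bool     el0       = (ap & 0x1) != 0;               /* AP[1]       */

    struct ap_perms p;
    p.el1_write = !read_only;
    p.el0_read  = el0;
    p.el0_write = el0 && !read_only;
    return p;
}
```

For AP[2:1] = 0b00 the decoder reports read/write at EL1 and no access at EL0, which is the configuration suspected above; 0b01 grants read/write to both levels.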

For more information, check the following link:

https://armv8-ref.codingbelief.com/en/chapter_d4/d44_1_memory_access_control.html

Best regards,

Luis Pérez

stefankalkowski · ‎08-10-2020

Thanks, Luis, for your reply.

Luis Perez wrote:
I apologize for the late reply.

To understand which mechanism in the SoC caused the SEA, it is important to understand what an SEA is.
It is an abort that results from an invalid or unsuccessful memory access.
External aborts are caused by errors flagged by the AXI interfaces when the request goes external to the processor. You can read the ARM Data Fault Status Register (DFSR) to determine the fault type (there are bit fields related to the type of abort, e.g. 0b01000 is SEA, but you could also check the SD bit, which is valid only for external aborts and indicates the AXI/AHB error that caused it).

I've used ESR_EL1 instead of the DFSR, because I run in AArch64 state and not in AArch32. In any case, I do not know which SD bit in the DFSR you are referring to. Maybe you mean the ExT bit [12], which should indicate whether an external abort is an AXI decode or slave error? But is it meaningful in AArch64 state too? I did not find a similar bit in the ESR_ELx encoding.

The typical causes of an SEA are: some AXI mechanism, clocking, or TrustZone (when accessing protected memory).

I know you have ruled out some of them. Another possible cause, which would explain why accesses in EL1 work while accesses in EL0 are prohibited, is that you did not set up the AP[2:1] bit field.
The data access permission controls are:
AP[2] selects between read-only and read/write access.
AP[1] selects between Application level (EL0) and System level (EL1) control.
This provides four permission settings for data accesses. One of them is "Read/write at EL1, no access by software executing at EL0" (0b00). You probably have this configuration.
For more information, check the following link:
https://armv8-ref.codingbelief.com/en/chapter_d4/d44_1_memory_access_control.html
Best regards,
Luis Pérez

Well, apart from the fact that I've checked all MMU-related settings several times, the AP fields should not be the issue, otherwise I would not observe a "Synchronous external abort, not on translation table walk" as the ESR_EL1 register suggests, but a permission fault.
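The distinction I am drawing can be sketched as a small ESR_EL1 decoder. This is only an illustration with made-up names, based on the architectural encodings as I understand them: EC in bits [31:26] (0x24/0x25 for data aborts from a lower/the same exception level) and DFSC in bits [5:0] (0b010000 for a synchronous external abort not on a translation table walk, 0b0011xx for permission faults, 0b0101xx for external aborts on a table walk).

```c
#include <stdint.h>

enum abort_kind {
    NOT_DATA_ABORT,              /* EC is not a data abort class           */
    PERMISSION_FAULT,            /* MMU denied the access (AP bits etc.)   */
    SYNC_EXTERNAL_ABORT,         /* error came back from outside the core  */
    SYNC_EXTERNAL_ABORT_ON_WALK, /* external error during table walk       */
    OTHER_FAULT                  /* any other data fault status code       */
};

/* Exception class, ESR_ELx bits [31:26]. */
static unsigned esr_ec(uint64_t esr)   { return (unsigned)((esr >> 26) & 0x3f); }

/* Data fault status code, ESR_ELx bits [5:0]. */
static unsigned esr_dfsc(uint64_t esr) { return (unsigned)(esr & 0x3f); }

/* Classify a syndrome value read from ESR_EL1. */
static enum abort_kind classify_data_abort(uint64_t esr)
{
    unsigned ec = esr_ec(esr);
    if (ec != 0x24 && ec != 0x25)
        return NOT_DATA_ABORT;

    unsigned dfsc = esr_dfsc(esr);
    if (dfsc == 0x10)                /* 0b010000 */
        return SYNC_EXTERNAL_ABORT;
    if ((dfsc & 0x3c) == 0x0c)       /* 0b0011xx, xx = level */
        return PERMISSION_FAULT;
    if ((dfsc & 0x3c) == 0x14)       /* 0b0101xx, xx = level */
        return SYNC_EXTERNAL_ABORT_ON_WALK;
    return OTHER_FAULT;
}
```

A wrong AP configuration would end up in the `PERMISSION_FAULT` branch, whereas the syndrome I observe for the PGC accesses corresponds to `SYNC_EXTERNAL_ABORT`.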

What I wanted to know is whether I can identify the part of the chip that denies the access. According to my observations, it is not a CPU-internal part but some other TrustZone or bus controller logic, which makes it difficult, if not impossible, for me to identify the source.

Regards

Stefan Kalkowski

nxf63969 · ‎07-15-2020

Hi,

Synchronous external aborts are caused by a failed read or write access to a protected area.

More details on external aborts can be found in Section D4.5 "External Aborts" of the ARMv8-A Architecture Reference Manual.

And, as you said, the denied access is probably related to the Central Security Unit (CSU), so I suggest you check whether the memory range (>= 0x303a0800) is allowed there.

Best regards,

Luis Pérez

stefankalkowski · ‎07-16-2020

Thank you for your reply and the reference to the ARMv8-A manual, which I'm aware of. My question was not so much about what a synchronous external abort is, but about which SoC mechanism has caused it.

Your suggestion to check the CSU settings astonishes me, because I already explained in my first post that I did so. Moreover, the CSU offers no way to restrict access at a granularity finer than a page. You can restrict access to the I/O registers of a whole peripheral, e.g., the whole GPC, but not to just parts of it.

To sum it up, the CSU and AIPSTZ settings are _not_ related to the fault behaviour. They allow everything for both normal and secure world, and for both EL0 and EL1. That is why I asked here in this forum what other reason could cause the fault.

Best regards,

Stefan Kalkowski

i.MX 8M | i.MX 8M Mini | i.MX 8M Nano