GPU affinity and Vulkan

2,382 Views
lavusedu
Contributor II

Hi,

I'm experimenting with the imx8qmmek board and running into issues with GPU affinity support in Vulkan.

In OpenCL I can use the VIV_MGPU_AFFINITY environment variable to choose either or both of the chips to execute my compute workload. This makes a measurable difference in performance and power consumption, so I am confident it is working.

However, when I execute an equivalent compute workload through Vulkan, the performance matches that of OpenCL with affinity set to a single chip. The environment variable does not seem to affect the computation at all. I also cannot see the devices separately in Vulkan, neither as physical devices nor through the vkEnumeratePhysicalDeviceGroupsKHR extension function, which the drivers support.

I tested this on the imx-5.10.72-2.2.0 and imx-5.15.32-2.0.0 Yocto builds, and neither of them seems to allow me to use both chips at once in Vulkan. Is there something I'm missing that would let me execute a compute workload in Vulkan on both chips, the same as in OpenCL?

7 Replies

2,374 Views
Bio_TICFSL
NXP TechSupport

Hello lavusedu,

I looked into how the combined/independent mode setting is handled for Vulkan. This is what I found:

    option->affinityMode = __VK_MGPU_AFFINITY_COMBINE;
    option->affinityCoreID = 0;

    gcoOS_GetEnv(gcvNULL, "VIV_MGPU_AFFINITY", &affinity);
    if (affinity)
    {
        gctSIZE_T length;
        gcoOS_StrLen(affinity, &length);
        if (length >= 1)
        {
            if (affinity[0] == '0')
            {
                option->affinityMode = __VK_MGPU_AFFINITY_COMBINE;
            }
            else if ((affinity[0] == '1') && (affinity[1] == ':'))
            {
                if ((affinity[2] != '0') || (affinity[2] != '1'))
                {
                    option->affinityMode = __VK_MGPU_AFFINITY_INDEPENDENT;
                    option->affinityCoreID = affinity[2] - '0';
                }
            }
        }
    }

I think the code has a bug in the line if ((affinity[2] != '0') || (affinity[2] != '1')). I think it should be:

if ((affinity[2] == '0') || (affinity[2] == '1'))
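The difference between the two conditions can be sketched in isolation. This is a minimal stand-alone illustration, not the driver source: the type and function names (affinity_mode, affinity_option, original_check, corrected_check, parse_affinity) are hypothetical, and the three-character length guard is an addition that the quoted snippet does not have.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the driver's mode constants. */
typedef enum { AFFINITY_COMBINE, AFFINITY_INDEPENDENT } affinity_mode;

typedef struct {
    affinity_mode mode;
    int core_id;
} affinity_option;

/* The condition as quoted: true for every character, because no
 * character can be equal to both '0' and '1' at the same time. */
static bool original_check(char c)
{
    return (c != '0') || (c != '1');
}

/* The suggested replacement: true only for '0' and '1'. */
static bool corrected_check(char c)
{
    return (c == '0') || (c == '1');
}

/* Parse a VIV_MGPU_AFFINITY-style value with the given core-ID check.
 * Defaults mirror the quoted snippet: combined mode, core 0.
 * Assumption: values shorter than three characters keep the defaults. */
static affinity_option parse_affinity(const char *s, bool (*check)(char))
{
    affinity_option opt = { AFFINITY_COMBINE, 0 };

    if (s == NULL || strlen(s) < 3) {
        return opt;
    }
    if (s[0] == '1' && s[1] == ':' && check(s[2])) {
        opt.mode = AFFINITY_INDEPENDENT;
        opt.core_id = s[2] - '0';
    }
    return opt;
}
```

In this sketch, parse_affinity("1:2", original_check) selects independent mode with core ID 2, while parse_affinity("1:2", corrected_check) keeps the combined-mode defaults. The values "1:0" and "1:1" select independent mode with either check.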

 

It seems to me that the code above will always set the GPU to combined mode if you follow the variable setup in the graphics manual.

 

Can you try

export VIV_MGPU_AFFINITY=1:2

to see whether the application runs only on GPU 1?

 

In the meantime, the developers will work on this issue.

 

Regards


2,371 Views
lavusedu
Contributor II

Hello,

thank you for your answer.

Unfortunately, setting the environment variable to a higher number (e.g. VIV_MGPU_AFFINITY=1:2) causes both OpenCL and Vulkan to segfault. In addition, with Vulkan this causes the kernel to report an internal error in the logs, with a call trace:

Call trace:
gckKERNEL_Dispatch+0x57c/0x13f0
gckDEVICE_Dispatch+0x84/0x1d0
drv_ioctl+0x2b8/0x460
__arm64_sys_ioctl+0xac/0xf0
invoke_syscall+0x48/0x114
el0_svc_common.constprop.0+0xd4/0xfc
do_el0_svc+0x2c/0x94
el0_svc+0x28/0x80
el0t_64_sync_handler+0xa8/0x130
el0t_64_sync+0x1a0/0x1a4


2,364 Views
Bio_TICFSL
NXP TechSupport

I changed the code and rebuilt the Vulkan library for you to try. Can you try the attached Vulkan lib? It is based on 5.10.72-2.2.0.

Regards


2,359 Views
lavusedu
Contributor II

I tried the library you sent by replacing the path in `/etc/vulkan/icd.d/`. Even with this library the problem persisted in my benchmark: the GPU took the same amount of time to complete the benchmark with all three settings, `VIV_MGPU_AFFINITY=0`, `VIV_MGPU_AFFINITY=1:0` and `VIV_MGPU_AFFINITY=1:1`.

It no longer segfaulted with `VIV_MGPU_AFFINITY=1:2`.

However, putting some load on the GPU and running my benchmark multiple times in a row resulted in a kernel driver crash, with the kernel logging a memory dump. This does not happen with the current driver at all.


2,351 Views
Bio_TICFSL
NXP TechSupport

You should not use export VIV_MGPU_AFFINITY=1:2. It is not a valid config. Last time I asked you to set it only as a test, to see whether the GPU can go into independent mode. Forget this.

1. With my libvulkan.so, do you get the GPU dump with all three of these configs: VIV_MGPU_AFFINITY=0, VIV_MGPU_AFFINITY=1:0 and VIV_MGPU_AFFINITY=1:1? That would not make sense, since I changed only one line, so that Vulkan goes into independent mode when VIV_MGPU_AFFINITY=1:0 or VIV_MGPU_AFFINITY=1:1.

2. What is your benchmark application?

Regards

 


2,341 Views
lavusedu
Contributor II

OK, never mind the GPU dump. My main problem is that the GPU performs the same in both combined and independent mode when running Vulkan compute. In OpenCL it performs better in combined mode (because it uses both chips), but in Vulkan the combined mode is as slow as if it were using only one chip.


2,345 Views
Bio_TICFSL
NXP TechSupport

Hello,

If you have any more questions, please open a case at www.nxp.com.

Regards

 
