Hello,
I've hit a weird bug with no simple reproducer (yet). Could someone who can see the source take a first look and check whether anything depends on the PID of the current process? That is the only difference I can see between the failing and working cases.
My setup just starts a single TensorFlow Lite/libneuralnetworks application in a container like this, and it segfaults 100% of the time during init:
podman run -ti --name camera --replace --env=XDG_RUNTIME_DIR=/run/xdg_home --device=/dev/video3 --device=/dev/galcore --volume=/tmp/xdg_home:/run/xdg_home localhost/demo python3 /root/demo_app/detect_object.py
Whereas just wrapping the command in sh works (note that using 'exec python3' inside the sh wrapper also segfaults, so it really seems related to being PID 1):
podman run -ti --name camera --replace --env=XDG_RUNTIME_DIR=/run/xdg_home --device=/dev/video3 --device=/dev/galcore --volume=/tmp/xdg_home:/run/xdg_home localhost/demo sh -c "python3 /root/demo_app/detect_object.py"
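One way to test the PID 1 hypothesis from inside the script, without the sh wrapper, would be to fork before any GPU library is initialized, so the real work runs in a child whose PID is not 1. The following is only a sketch: `run_not_as_pid1` and the `main` callable are hypothetical names, not part of the demo application.

```python
import os
import sys
import traceback


def run_not_as_pid1(main, getpid=os.getpid):
    """Call main() directly, or in a forked child if we are PID 1.

    Returns an exit code. `getpid` is injectable only so the fork
    path can be exercised outside a container.
    """
    if getpid() != 1:
        return main() or 0

    # Flush buffered output so the child does not write it a second time.
    sys.stdout.flush()
    sys.stderr.flush()

    child = os.fork()
    if child == 0:
        # Child: has a PID other than 1 in the container's PID namespace.
        try:
            code = main() or 0
        except BaseException:
            traceback.print_exc()
            code = 1
        sys.stdout.flush()
        sys.stderr.flush()
        os._exit(code)

    # Parent (PID 1): wait for the child and pass on how it ended.
    _, status = os.waitpid(child, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return 128 + os.WTERMSIG(status)
```

In the script this would be called as `sys.exit(run_not_as_pid1(main))`, with the tflite imports done inside `main` so the libraries are only loaded in the child. If the segfault disappears with this wrapper, that points at PID 1 rather than at something sh does to the environment. Running the original command with podman's `--init` option should be a similar check, since the application is then started as a child of a small init process.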
Here's a backtrace and some limited traces (registers, disassembly around the fault address). Note that we're using imx-gpu-viv-6.4.3.p1.2-aarch64 (from the 5.10.9_1.0.0 release) on Debian, as newer versions require symbols from glibc 2.33, which is not available to us. The kernel is (similar to) lf-5.10.72-2.2.0.
#0 0x0000ffffa6f36c40 in ?? () from /usr/lib/aarch64-linux-gnu/libGAL.so
#1 0x0000ffffa6f36e94 in ?? () from /usr/lib/aarch64-linux-gnu/libGAL.so
#2 0x0000ffffa6f6cb6c in ?? () from /usr/lib/aarch64-linux-gnu/libGAL.so
#3 0x0000ffffa6f2595c in gcoVX_CreateHW () from /usr/lib/aarch64-linux-gnu/libGAL.so
#4 0x0000ffffa6f25b50 in gcoVX_Construct () from /usr/lib/aarch64-linux-gnu/libGAL.so
#5 0x0000ffffa6f25d7c in gcoVX_SwitchContext () from /usr/lib/aarch64-linux-gnu/libGAL.so
#6 0x0000ffffa804af20 in ?? () from /usr/lib/aarch64-linux-gnu/libOpenVX.so.1
#7 0x0000ffffa8290358 in vsi_nn_CreateContext () from /usr/lib/aarch64-linux-gnu/libovxlib.so.1.1.0
#8 0x0000ffffa86238d4 in nnrt::Execution::Execution(nnrt::Compilation*) () from /usr/lib/aarch64-linux-gnu/libnnrt.so.1.1.9
#9 0x0000ffffa8742454 in ANeuralNetworksExecution_create () from /usr/lib/aarch64-linux-gnu/libneuralnetworks.so.1
#10 0x0000ffffa8ce5204 in tflite::delegate::nnapi::NNAPIDelegateKernel::Invoke(TfLiteContext*, TfLiteNode*, int*) ()
from /usr/lib/python3/dist-packages/tflite_runtime/_pywrap_tensorflow_interpreter_wrapper.so
#11 0x0000ffffa8d02e80 in tflite::Subgraph::Invoke() () from /usr/lib/python3/dist-packages/tflite_runtime/_pywrap_tensorflow_interpreter_wrapper.so
#12 0x0000ffffa8c2c094 in tflite::Interpreter::Invoke() () from /usr/lib/python3/dist-packages/tflite_runtime/_pywrap_tensorflow_interpreter_wrapper.so
#13 0x0000ffffa8c0ee3c in tflite::interpreter_wrapper::InterpreterWrapper::Invoke() ()
from /usr/lib/python3/dist-packages/tflite_runtime/_pywrap_tensorflow_interpreter_wrapper.so
#14 0x0000ffffa8c134d8 in ?? () from /usr/lib/python3/dist-packages/tflite_runtime/_pywrap_tensorflow_interpreter_wrapper.so
#15 0x0000ffffa8c251b8 in ?? () from /usr/lib/python3/dist-packages/tflite_runtime/_pywrap_tensorflow_interpreter_wrapper.so
#16 0x00000000004cac54 in ?? ()
#17 0x00000000004a5300 in _PyObject_MakeTpCall ()
#18 0x00000000004c6cc8 in ?? ()
#19 0x000000000049c258 in _PyEval_EvalFrameDefault ()
#20 0x00000000004b1a48 in _PyFunction_Vectorcall ()
#21 0x0000000000498218 in _PyEval_EvalFrameDefault ()
#22 0x00000000004b1a48 in _PyFunction_Vectorcall ()
#23 0x0000000000498064 in _PyEval_EvalFrameDefault ()
#24 0x00000000004b1a48 in _PyFunction_Vectorcall ()
#25 0x0000000000498064 in _PyEval_EvalFrameDefault ()
#26 0x00000000004964f8 in ?? ()
#27 0x0000000000496290 in _PyEval_EvalCodeWithName ()
#28 0x00000000005976fc in PyEval_EvalCode ()
#29 0x00000000005c850c in ?? ()
#30 0x00000000005c2520 in ?? ()
#31 0x00000000005c8458 in ?? ()
#32 0x00000000005c7c38 in PyRun_SimpleFileExFlags ()
#33 0x00000000005b7afc in Py_RunMain ()
#34 0x0000000000587638 in Py_BytesMain ()
#35 0x0000ffffb0a96218 in __libc_start_main (main=0x587538 <_start+56>, argc=11, argv=0xffffc90bd6e8, init=<optimized out>, fini=<optimized out>,
rtld_fini=<optimized out>, stack_end=<optimized out>) at ../csu/libc-start.c:308
#36 0x0000000000587534 in _start ()
(gdb) info proc map
Mapped address spaces:
...
0xffffa6ec8000 0xffffa7067000 0x19f000 0x0 /usr/lib/aarch64-linux-gnu/libGAL.so
0xffffa7067000 0xffffa7076000 0xf000 0x19f000 /usr/lib/aarch64-linux-gnu/libGAL.so
0xffffa7076000 0xffffa7078000 0x2000 0x19e000 /usr/lib/aarch64-linux-gnu/libGAL.so
0xffffa7078000 0xffffa7089000 0x11000 0x1a0000 /usr/lib/aarch64-linux-gnu/libGAL.so
(gdb) info reg
x0 0x801028a 134283914
x1 0x0 0
x2 0x28a 650
x3 0x2270b600 577811968
x4 0xffffc90bbba8 281474054732712
x5 0x4f3bf83c 1329330236
x6 0x8d4d90 9260432
x7 0x0 0
x8 0x21f5d010 569757712
x9 0x2270b7f0 577812464
x10 0x0 0
x11 0x0 0
x12 0x0 0
x13 0x1 1
x14 0x2 2
x15 0x20 32
x16 0xffffa7076de0 281473484025312
x17 0xffffa6ef19e0 281473482430944
x18 0x0 0
x19 0x22724940 577915200
x20 0x28a 650
x21 0x228f8650 579831376
x22 0x1 1
x23 0x1 1
x24 0x2286a1a0 579248544
x25 0xffff79cfc3a0 281472725402528
x26 0xffffc90bbbdc 281474054732764
x27 0x0 0
x28 0x0 0
x29 0xffffc90bbb20 281474054732576
x30 0xffffa6f36c10 281473482714128
sp 0xffffc90bbb20 0xffffc90bbb20
pc 0xffffa6f36c40 0xffffa6f36c40
cpsr 0x60001000 [ EL=0 SSBS C Z ]
fpsr 0x8000010 134217744
fpcr 0x0 0
(gdb) disass 0x0000ffffa6f36c40, 0x0000ffffa6f36c80
Dump of assembler code from 0xffffa6f36c40 to 0xffffa6f36c80:
=> 0x0000ffffa6f36c40: str w0, [x25], #4
0x0000ffffa6f36c44: adrp x26, 0xffffa703e000
0x0000ffffa6f36c48: add x1, x26, #0xe98
0x0000ffffa6f36c4c: add x0, sp, #0x90
0x0000ffffa6f36c50: add w24, w22, w20
0x0000ffffa6f36c54: sub x23, x5, #0x4
0x0000ffffa6f36c58: mov x26, x25
0x0000ffffa6f36c5c: mov w27, #0xc // #12
0x0000ffffa6f36c60: str x0, [sp, #104]
0x0000ffffa6f36c64: str x1, [sp, #120]
0x0000ffffa6f36c68: b 0xffffa6f36c94
0x0000ffffa6f36c6c: add x0, x21, x2
0x0000ffffa6f36c70: ldr w1, [sp, #100]
0x0000ffffa6f36c74: str w20, [x21, x2]
0x0000ffffa6f36c78: stp w1, w4, [x0, #4]
0x0000ffffa6f36c7c: ldr w1, [x19, #12]
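For anyone matching this against the library, the runtime addresses above can be turned into offsets within libGAL.so using the 'info proc map' lines. A small sketch with the values copied from the gdb output (`file_offset` is just an illustrative helper; it assumes the usual shared-library layout where the first mapping starts at file offset 0):

```python
# Values copied from the gdb output above.
PC = 0xffffa6f36c40            # faulting instruction (frame #0)
LR = 0xffffa6f36c10            # x30, return address
STORE_TARGET = 0xffff79cfc3a0  # x25, destination of `str w0, [x25], #4`

# (start, end, file offset) of the libGAL.so mappings from `info proc map`.
LIBGAL_MAPS = [
    (0xffffa6ec8000, 0xffffa7067000, 0x0),
    (0xffffa7067000, 0xffffa7076000, 0x19f000),
    (0xffffa7076000, 0xffffa7078000, 0x19e000),
    (0xffffa7078000, 0xffffa7089000, 0x1a0000),
]


def file_offset(addr, maps=LIBGAL_MAPS):
    """Return the file offset for addr, or None if it is outside the maps."""
    for start, end, offset in maps:
        if start <= addr < end:
            return addr - start + offset
    return None


print(hex(file_offset(PC)))       # 0x6ec40
print(hex(file_offset(LR)))       # 0x6ec10
print(file_offset(STORE_TARGET))  # None: the store goes outside libGAL.so
```

So the faulting store is at offset 0x6ec40 in libGAL.so, and the address it writes to (x25) is not inside the library mappings listed above.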
When I have time I'll try to reproduce this on a Yocto build with updated versions, but that might take a while. I'm sorry I can't share the Python code either, so I will have to start by writing a reproducer I can share.
Anyway, please let me know if you have an idea!