The FW crash issue still exists in Generic_PCIE-WLAN-UART-BT-8997-LNX_6_6_3-IMX8-16.92.21.p119.2-16.92.21.p119.2-MM6X16437.P3-GPL.
This follows a previous case:
and is reopened by this case.
Hi, @ArthurC
Our internal team reviewed our local logs and your logs, and observed the identical error below before the scan timeout.
[74019.137550] uap0:
[74019.137567] CMD_RESP (74019.136043): 802_11_SCAN_EXT [0x8107], result 4, len 142, seqno 0x10d5
[74019.148288] CMD_RESP: cmd 0x107 error, result=0x4
Based on our analysis:
We have sanitized the patch; the test is still running and no error has been observed on our local setup so far.
The patch is attached here.
Our suggestion is: please switch to the latest release, Q2-24 (https://www.nxp.com/webapp/sps/download/license.jsp?colCode=GENPCIE16.92.21.p119.3MM6X16437P21GPL&ap...), and apply the patch on top of the Q2-24 release. We recommend always using the latest release for stability.
Can you please help us understand how you start the Soft AP on uap0, and whether nmcli creates two instances of wpa_supplicant (for mlan0 and uap0)?
Also, please monitor whether you see the error “cmd 0x107 error, result=0x4” during the test; our handling should take effect once this error occurs.
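One way to monitor for this error is to search a saved kernel log for the exact string. The snippet below is only a minimal sketch: it assumes the log was saved to a file first (for example with `dmesg > dmesg.log`), and the function name `count_cmd_0x107_errors` is hypothetical, not part of the NXP driver package.

```shell
# Count occurrences of the "cmd 0x107 error" line in a saved kernel log.
# count_cmd_0x107_errors is a hypothetical helper name, not an NXP tool.
#   $1 = path to the saved log file
count_cmd_0x107_errors() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the function usable under "set -e".
    grep -c 'cmd 0x107 error, result=0x4' "$1" || true
}
```

Usage: `count_cmd_0x107_errors dmesg.log` prints 0 while the error has not been seen.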
Best regards,
Christine.
Hi, @ArthurC
I am very sorry that you feel uncomfortable and dissatisfied with our products.
We are still working actively to resolve the issue.
Please see below:
Regarding the disconnection issue,
Please check the screenshot below from the given log.
We can see that the disconnect happened because NetworkManager issued a Disconnect to the supplicant; the highlighted reason is a new activation request from NetworkManager.
A new activation request causes a disconnect and reconnect; this is not an issue. This disconnect was not triggered by any NXP driver or FW issue.
Can you please check whether you issued a new activation on the same network using nmcli? Can you please also check the nmcli path?
Regarding the 0x107 scan timeout,
We are in discussion with the internal team and will update you soon with the next steps and debug approach.
Best regards,
Christine.
Hello @Christine_Li,
Regarding the disconnection issue,
Here the disconnection does not seem to be an NXP DRV/FW issue; please check/verify your network utility.
-> Why would the module disconnect by itself? We don't think it would. In the same setup, other vendors' modules work OK, such as the Broadcom BCM4356. It looks like the 88W8997 is not a stable product. If so, we will not use this solution any more.
It looks like we wasted our time on it.
Hi, @ArthurC
Please see our internal team's feedback below:
==========
Regarding the disconnection issue,
Here the disconnection does not seem to be an NXP DRV/FW issue; please check/verify your network utility.
Regarding the 0x107 scan timeout issue,
Thanks for confirming that the 0x107 timeout is only observed when auto_fw_reload is enabled. This is the actual issue that requires debugging, and we will focus on it in this thread.
Please help with the queries below for further debugging.
Best regards,
Christine.
Hello @Christine_Li ,
On our platform, without auto_fw_reload=0 in the driver probe parameters, the system hangs every time when drvdbg=0xa0037 or 0x80037.
Hi, @ArthurC
I am so sorry if our reply confused you.
We will continue to check the given logs carefully and respond to your queries one by one.
At the same time, can you please confirm whether "We think the system hang and 0x107 timeout are caused by auto_fw_reload when issue occurred" happens every time?
This might be another lead. We should not give up on any possible cause or point worth tracking.
Best regards,
Christine.
Hello @Christine_Li,
Why does the IP disappear during the iperf3 test?
[Christine]
- Does it disappear before the driver receives the debug_dump command, or after? If after, this is expected behavior as explained in point 1.
-> Before we sent the debug_dump command.
- Also, during the re-connection process there is a chance that the IP disappears.
-> Why can't it reconnect to recover?
- In any case, IP handling is done by NetworkManager; the FW and driver do not do anything here. Customers always debug this in their NetworkManager path.
-> Why does this issue not occur with other vendors' modules?
Please don't confuse the point.
BTW, we provided 3 logs for 3 states: initial, issue occurred, and fw_dump.
Please compare the first and second (initial and issue occurred) and explain.
Hi, @ArthurC
Please find my inline responses.
Why is there "Block woal_cfg80211_scan in abnormal driver state" in dmesg?
[Christine] This message appears when the driver receives the debug_dump command. At that point the driver starts its hang handling, because we need to read register values from the FW. As part of this process we get the message “Block woal_cfg80211_scan in abnormal driver state”, which is expected behavior when the FW dump command is executed.
Please see the timestamps in the dmesg logs below:
=========
Line 8913: [ 222.932159] wlan: Received disassociation request on wlan0, reason: 3 //Christine
Line 8917: [ 222.950041] QUEUE_CMD: 802_11_DEAUTHENTICATE [0x24] is queued //Christine
Line 9900: [ 226.251348] wlan: HostMlme wlan0 Connected to bssid bc:XX:XX:XX:03:1a successfully //Christine
Line 43302: [ 910.377127] Recevie debug_dump command //Christine
Line 43449: [ 930.816864] ====PCIE DEBUG MODE OUTPUT END: 930.815566 ==== //Christine
Line 43457: [ 930.834633] Block woal_cfg80211_del_key in abnormal driver state //Christine
Line 43478: [ 930.904799] IOCTL is not allowed while the device is not present or hang //Christine
Line 43479: [ 930.946102] Block woal_cfg80211_scan in abnormal driver state //Christine
=============
Why does the IP disappear during the iperf3 test?
[Christine]
- Does it disappear before the driver receives the debug_dump command, or after? If after, this is expected behavior as explained in point 1.
- Also, during the re-connection process there is a chance that the IP disappears.
- In any case, IP handling is done by NetworkManager; the FW and driver do not do anything here. Customers always debug this in their NetworkManager path.
Why is the Wi-Fi feature broken and no longer working?
[Christine] After the debug_dump command, Wi-Fi features will break; if auto_fw_reload is enabled, the FW is reset and Wi-Fi works again. But in the given logs, we did not see any Wi-Fi feature break before the FW dump was executed manually.
How did NXP conclude that there is no issue?
[Christine] As per the log analysis, we do not see any issue from the driver and FW perspective. The DUT is able to re-connect to the same AP after the disconnection.
Is the firmware sending cheating commands to the software?
[Christine] Sorry, I could not understand this question. What we concluded above is based on the given logs.
One point from your earlier comment: you mentioned “We think the system hang and 0x107 timeout are caused by auto_fw_reload when issue occurred”. Is this behavior observed every time? If yes, we can discuss this point with our internal team.
Best regards,
Christine.
Hello @Christine_Li,
We confirmed on our platform.
A photo of our platform GUI, with terminals showing dmesg, iperf3 and ifconfig, is attached.
If there is no issue:
Why is there "Block woal_cfg80211_scan in abnormal driver state" in dmesg?
Why does the IP disappear during the iperf3 test?
Why is the Wi-Fi feature broken and no longer working?
How did NXP conclude that there is no issue?
Please explain.
Is the firmware sending cheating commands to the software?
Hi, @ArthurC
I had a discussion with the internal team, and from the detailed analysis of the log journal_issue_occured.log we concluded that there is no issue. Please find the analysis below.
Initially the DUT connected to AP bc:a5:11:ad:03:1a:
Line 7233: Jul 04 16:53:18 tbox100 wpa_supplicant[788]: wlan0: CTRL-EVENT-CONNECTED - Connection to bc:a5:11:ad:03:1a completed [id=0 id_str=]
Then a DEAUTH was received from the AP:
Jul 04 16:54:04 tbox100 wpa_supplicant[788]: wlan0: CTRL-EVENT-DISCONNECTED bssid=bc:a5:11:ad:03:1a reason=3 locally_generated=1
Then the DUT successfully re-connected to the AP:
Jul 04 16:54:07 tbox100 wpa_supplicant[788]: wlan0: CTRL-EVENT-CONNECTED - Connection to bc:a5:11:ad:03:1a completed [id=0 id_str=]
After the connection I did not find any suspicious message; the device was up and running. I could not understand what issue was observed after the disconnect, or what could not recover.
From the logs, it looks like there is no issue; you just executed the manual FW dump command while the DUT was in a working state.
There is no 0x107 timeout and no re-connection issue either. The device successfully re-connected to the same AP.
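The connect/disconnect timeline quoted above can be pulled out of a journal log with a short filter. This is a rough sketch only: `wpa_events` is a hypothetical helper name, and it assumes raw syslog-style lines (month, day, time, host, process) without the "Line NNNN:" prefix that the viewer adds in the excerpt.

```shell
# Print "<time> <event>" for every wpa_supplicant connect/disconnect line.
# wpa_events is a hypothetical helper name.
# Assumed line layout: Mon DD HH:MM:SS host wpa_supplicant[pid]: iface: CTRL-EVENT-...
#   $1 = path to the journal log file
wpa_events() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^CTRL-EVENT-(CONNECTED|DISCONNECTED)$/) {
                # Field 3 is the HH:MM:SS time in the assumed layout.
                print $3, $i
                break
            }
        }
    }' "$1"
}
```

Usage: `wpa_events journal_issue_occured.log` lists one line per event, which makes a gap between a DISCONNECTED and the next CONNECTED easy to spot.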
Can you please help to confirm on your side?
Best regards,
Christine.
Hello @Christine_Li ,
The issue is that Wi-Fi disconnected unexpectedly during the test and could not be recovered.
We dumped the log ourselves when the Wi-Fi connection broke.
We think the system hang and 0x107 timeout are caused by auto_fw_reload when issue occurred.
Please fix the issue of the unexpected Wi-Fi disconnection that cannot be recovered.
Hi, @ArthurC
We checked the latest logs; this time we do not see the 0x107 timeout in the dmesg logs.
But we found the info below:
1. Line 8913: [ 222.932159] wlan: Received disassociation request on wlan0, reason: 3
Line 8914: [ 222.938666] wlan: REASON: (Deauth) Sending STA is leaving (or has left) IBSS or ESS
Line 8915: [ 222.946367] Deauth: bc:XX:XX:XX:03:1a
2. Line 43302: [ 910.377127] Recevie debug_dump command
We would like to confirm the following with you:
1. Did you take the dump manually?
2. If the device was not in a hang state and no 0x107 timeout was observed, in which state did you take the dump? Was any issue observed?
Or did you find that Wi-Fi had disconnected by itself and then take the dump manually?
From the logs, it looks like the device was in a working state when the dump was captured.
Wi-Fi disassociated at about: Jul 04 16:54:04
The dump was captured at about: Jul 04 17:04, about [910-222]=688 seconds after the Wi-Fi disconnection occurred.
So we would like to confirm this with you.
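The elapsed time between two dmesg lines can also be computed from the full timestamps instead of whole seconds. This is a minimal sketch; `dmesg_ts` and `dmesg_elapsed` are hypothetical helper names, and the lines are assumed to carry the usual "[ seconds.microseconds]" prefix.

```shell
# Extract the seconds-since-boot timestamp from a dmesg line such as
# "[  222.932159] wlan: ...". dmesg_ts is a hypothetical helper name.
dmesg_ts() {
    printf '%s\n' "$1" | sed -n 's/^[^[]*\[ *\([0-9][0-9]*\.[0-9][0-9]*\)\].*/\1/p'
}

# Print the time between two dmesg lines in seconds, to one decimal place.
#   $1 = earlier line, $2 = later line
dmesg_elapsed() {
    awk -v a="$(dmesg_ts "$1")" -v b="$(dmesg_ts "$2")" \
        'BEGIN { printf "%.1f\n", b - a }'
}
```

For the disassociation line (222.932159) and the debug_dump line (910.377127) this gives 687.4 seconds, in line with the rough 688-second estimate.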
Thanks,
Christine.
Hello @Christine_Li,
Fortunately, our platform did not hang when the issue occurred this time, with both drvdbg=0xa0037 and auto_fw_reload=0 set as driver load parameters.
The requested logs and fw_dump files are attached.
Please help to fix the issue.
Thank You.
Hi, @ArthurC
We are blocked on further debugging because we do not have the FW dumps: the issue reproduced at our end only once, and because auto_fw_reload was enabled we were not able to capture the dumps.
We have tried to re-create the issue locally, but the setup ran for more than 2 days and the issue was not seen.
As the issue is seen quickly at your end, can you please re-run the test and share the dump, which is necessary to identify the root cause? The dump files were not available in the previously shared logs.
In parallel, we keep trying to re-create the issue locally.
Please make sure to collect the logs with the parameters below while running the scenario.
- Make sure to use drvdbg=0xa0037 and auto_fw_reload=0 in the driver load parameters.
- Please share the FW dump files, which should be generated automatically; the driver reports the dump file path in dmesg.
If the dump is not generated automatically, please follow the steps below to get it manually.
echo "debug_dump" >/proc/mwlan/adapter0/config
cat /proc/mwlan/adapter0/drv_dump > file_drv_dump
cat /proc/mwlan/adapter0/fw_dump > file_fw_dump
Please make sure the FW dump is generated, because debugging this issue depends on the FW dump.
Besides, I understand that with drvdbg=0xa0037 or drvdbg=0x80037 your device sometimes hangs, especially when the issue is reproduced. So I still hope you can send us one board; it would be appreciated. If sending a board to Mainland China is too difficult, how about sending it directly to our colleagues in India?
Please think about it, so that we can move forward together.
Otherwise, we will remain blocked here for a long time.
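The three manual steps above can be wrapped in a small function so that an empty dump is noticed right away. This is just a sketch: `collect_dumps` is a hypothetical name, the adapter path defaults to the /proc/mwlan/adapter0 path used in the steps, and the empty-file check is an added assumption rather than something the driver documentation requires.

```shell
# Trigger a manual dump and save the driver and firmware dump files.
# collect_dumps is a hypothetical wrapper around the three commands above.
#   $1 = adapter proc directory (default: /proc/mwlan/adapter0)
#   $2 = output directory       (default: current directory)
collect_dumps() {
    proc_dir=${1:-/proc/mwlan/adapter0}
    out_dir=${2:-.}

    mkdir -p "$out_dir" || return 1
    echo "debug_dump" > "$proc_dir/config" || return 1
    cat "$proc_dir/drv_dump" > "$out_dir/file_drv_dump" || return 1
    cat "$proc_dir/fw_dump" > "$out_dir/file_fw_dump" || return 1

    # An empty firmware dump is of no use for debugging, so flag it.
    if [ ! -s "$out_dir/file_fw_dump" ]; then
        echo "warning: $out_dir/file_fw_dump is empty" >&2
        return 1
    fi
}
```

Usage: `collect_dumps /proc/mwlan/adapter0 ./dumps` writes file_drv_dump and file_fw_dump into ./dumps and returns non-zero if any step fails.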
Best regards,
Christine.
Hello @Christine_Li,
Regarding feature support in our product, there is a Wi-Fi network sharing use case.
If we configure STA only, with drv_mode=1, how can we switch the working mode to AP at runtime?
Is there any design support for switching between STA and AP mode at runtime?
If not, is there any FW adjustment for this issue?
Hi, @ArthurC
From detailed analysis of your shared logs and our local test observations, we found that nmcli creates two instances of wpa_supplicant, for uap0 and wlan0/mlan0. As the issue is related to the scan timeout, and nmcli normally scans very aggressively on both interfaces, we suspect this is the culprit.
As your use case is STA mode only, we ran the STA-only scenario using drv_mode=1 for more than 2 days and the issue was not observed.
Can you please try this solution at your end and let us know whether you still face the issue?
Please also take care of the points below before starting the test.
The driver load parameters should be as below.
cfg80211_wext=0xf max_vir_bss=1 cal_data_cfg=none ps_mode=1 auto_ds=1 host_mlme=1 drv_mode=1 auto_fw_reload=0 drvdbg=0xa0037
If the issue is seen, please share the FW dump files, which should be generated automatically; the driver reports the dump file path in dmesg.
If the dump is not generated automatically, please follow the steps below to get it manually.
echo "debug_dump" >/proc/mwlan/adapter0/config
cat /proc/mwlan/adapter0/drv_dump > file_drv_dump
cat /proc/mwlan/adapter0/fw_dump > file_fw_dump
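Before starting a long test run, the parameter string can be sanity-checked for the options that matter for this debug session. The sketch below is illustrative only: `has_param` and `check_load_params` are hypothetical names, and the string is assumed to be the same space-separated list that is passed when loading the driver.

```shell
# Return 0 if the exact key=value token appears in the parameter string.
#   $1 = full space-separated parameter string, $2 = token to look for
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report any debug-relevant parameter that is missing or set differently.
# Returns 0 when drv_mode=1, auto_fw_reload=0 and drvdbg=0xa0037 are all present.
check_load_params() {
    missing=0
    for want in drv_mode=1 auto_fw_reload=0 drvdbg=0xa0037; do
        if ! has_param "$1" "$want"; then
            echo "missing: $want"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `check_load_params "cfg80211_wext=0xf max_vir_bss=1 cal_data_cfg=none ps_mode=1 auto_ds=1 host_mlme=1 drv_mode=1 auto_fw_reload=0 drvdbg=0xa0037"` prints nothing and returns 0 for the string given above.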
Regards,
Christine.
Hi, @ArthurC
Thank you for the confirmation.
We are actively debugging to identify the cause of the issue.
As the "cmd 0x107 error" issue has been reproduced locally, we are trying to capture the dump files to help the engineering team identify the cause.
We will update you soon.
Best regards,
Christine.