88W8997 module command timeout issue (interface PCIE+UART, Host: iMX8MQ) - reopen


25,658 Views
yao_feng
Contributor III

The FW crash issue still exists in Generic_PCIE-WLAN-UART-BT-8997-LNX_6_6_3-IMX8-16.92.21.p119.2-16.92.21.p119.2-MM6X16437.P3-GPL.

This follows the previous case:

https://community.nxp.com/t5/Wireless-Connectivity/88W8997-module-command-timeout-issue-interface-PC...

which is reopened by this case.

@ArthurC
@Christine_Li
88W8997 

169 Replies

1,815 Views
Christine_Li
NXP TechSupport

Hi, @ArthurC 

Do you have any interim update, if you have started testing?

 

Best regards,

Christine.


1,833 Views
Christine_Li
NXP TechSupport

Hi, @ArthurC 

Did you get a chance to test this issue and verify our patch based on the Q2-24 release?

Please let me know if you have any updates.

 

Best regards,

Christine.


1,882 Views
Christine_Li
NXP TechSupport

Hi, @ArthurC 

Our internal team reviewed our local logs and your logs, and observed the identical error below before the scan timeout.

[74019.137550] uap0:

[74019.137567] CMD_RESP (74019.136043): 802_11_SCAN_EXT [0x8107], result 4, len 142, seqno 0x10d5

[74019.148288] CMD_RESP: cmd 0x107 error, result=0x4

Based on our analysis:

  • The suspected cause of this error is that two instances of wpa_supplicant, on mlan0 and uap0, may be affecting the behavior. It looks like nmcli creates these two instances and the second supplicant instance initiates a scan on the uap0 interface.
  • Because of this error, the scan time is exceeded and a scan timeout results. We have therefore added handling for the identified suspect in the attached driver patch.

We have sanitized the patch; the test is still running and the error has not been observed on our local setup.

The patch is attached here.

Our suggestion: please switch to the latest release, Q2-24 (https://www.nxp.com/webapp/sps/download/license.jsp?colCode=GENPCIE16.92.21.p119.3MM6X16437P21GPL&ap...), and apply the patch on top of it. We recommend always using the latest release for stability.

 

Can you please help us understand how you start the Soft AP on uap0, if nmcli is creating two instances of wpa_supplicant (on mlan0 and uap0)?

Also, please monitor whether you see the error “cmd 0x107 error, result=0x4” during the test; our handling takes effect after this error.
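For monitoring, a minimal sketch of two helpers may be useful. The function names are hypothetical and not part of the NXP driver package; they only count text lines fed on stdin, so the log source and the process listing are up to the platform.

```shell
# Hypothetical helper: count kernel log lines that carry the scan-command
# failure quoted in this thread. Reads log text on stdin, prints the count.
count_scan_errors() {
    # grep -c prints 0 and returns non-zero when nothing matches,
    # so "|| true" keeps the function usable under "set -e".
    grep -c 'cmd 0x107 error, result=0x4' || true
}

# Hypothetical helper: count wpa_supplicant processes from a list of
# command names (one per line) on stdin.
count_supplicants() {
    grep -cx 'wpa_supplicant' || true
}
```

Typical usage would be `dmesg | count_scan_errors` and `ps -e -o comm= | count_supplicants`, assuming the `ps` on the target supports `-o comm=`. A supplicant count of 2 would match the two-instance situation described above.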

 

Best regards,

Christine.


1,892 Views
Christine_Li
NXP TechSupport

Hi, @ArthurC 

I am very sorry that you feel uncomfortable and dissatisfied with our products.

However, we are still actively working to resolve the issue.

Please see below:

Regarding disconnection issue,

Please check below screenshot from the given log.

 

Christine_Li_1-1720583856511.png

 

We can see that the disconnect occurred because NetworkManager issued a Disconnect to the supplicant; the highlighted reason is a new activation request from NetworkManager.

A new activation request causes a disconnect and reconnect, which is not an issue. This disconnect was not triggered by any NXP driver or FW issue.

 

Can you please check whether you issued a new activation on the same network using nmcli? Please also check the nmcli path.

 

Regarding 0x107 scan timeout,

We are in discussion with internal team and will update you soon with the next steps and debug approach.

 

Best regards,

Christine.


1,949 Views
ArthurC
Contributor III

Hello @Christine_Li,

 

Regarding disconnection issue,

  • As I mentioned, from the latest customer logs we can conclude that there is no issue with the Wi-Fi FW or driver. The deauth was received from the AP, and after the deauth the DUT STA was able to reconnect successfully. Even in the shared screenshot we can see that the wlan0 interface is active.
  • The customer's main issue here is why the wlan0 interface is not able to get an IP address, which could be caused by their network manager utility rather than by our driver/FW.

The disconnection here does not seem to be an NXP DRV/FW issue; please check and verify your network utility.

 

-> Why would the module disconnect by itself? We don't think so, because with the same settings other vendors' modules work fine, such as the Broadcom BCM4356. It looks like the 88W8997 is not a stable product. If so, we will not use this solution any more.

It looks like we wasted our time on it.


1,947 Views
Christine_Li
NXP TechSupport

Hi, @ArthurC 

Please see our internal team's feedback as below:
==========

Regarding disconnection issue,

  • As I mentioned, from the latest customer logs we can conclude that there is no issue with the Wi-Fi FW or driver. The deauth was received from the AP, and after the deauth the DUT STA was able to reconnect successfully. Even in the shared screenshot we can see that the wlan0 interface is active.
  • The customer's main issue here is why the wlan0 interface is not able to get an IP address, which could be caused by their network manager utility rather than by our driver/FW.

The disconnection here does not seem to be an NXP DRV/FW issue; please check and verify your network utility.

 

Regarding 0x107 scan timeout issue,

Thanks for confirming that the 0x107 timeout is only observed when auto_fw_reload is enabled. This is the actual issue that needs debugging, and we will focus on it in this thread.

Please help with the queries below for further debugging.

  • How frequently does the 0x107 timeout occur? We observed it locally only once.
  • Can you please share the model of the Netgear AP used and its configuration, so that we can replicate the same setup at our end?
  • In previous communication, you told us that you were not seeing the scan timeout with the MM Q1-24 release and only had the disconnection issue. How are you now observing the scan timeout again with the same Q1-24 release?
    • What is different in your steps/environment/configuration now that the scan timeout is observed?
  • Can you please check whether you are available this week for a debug session call, where you can show us your setup and reproduce the issue live?

 

Best regards,

Christine.


1,859 Views
ArthurC
Contributor III

Hello @Christine_Li ,

 

On our platform, unless auto_fw_reload=0 is set in the driver probe parameters, the system hangs every time when drvdbg=0xa0037 or 0x80037.


1,863 Views
Christine_Li
NXP TechSupport

Hi, @ArthurC 

I am sorry if our reply confused you.

We will continue to check the given logs carefully and respond to your queries one by one.

At the same time, can you please confirm whether your observation "We think the system hang and 0x107 timeout are caused by auto_fw_reload when issue occurred" holds every time the issue occurs?

This might be another breakthrough point. We should not give up on any possible cause or lead.

Best regards,

Christine.


1,863 Views
ArthurC
Contributor III

Hello @Christine_Li,

 

Why does the IP disappear during the iperf3 test?

[Christine]

- Did it disappear before the driver received the debug_dump command, or after? If after, this is expected behavior as explained in point 1.

-> Before we sent the debug_dump command.

 

- Also, during the reconnection process there is a chance that the IP disappears.

-> Why can't it reconnect to recover?

- Also, this IP handling is done by NetworkManager; the FW and driver do not do anything. The customer should debug this in their NetworkManager path.

-> Why didn't other vendors' modules hit this issue?

 

Please don't confuse the points.

BTW, we provided 3 logs covering 3 states: initial, issue occurred, and fw_dump.

Please compare the first and second logs (initial and issue occurred) and explain.

 

 


1,828 Views
Christine_Li
NXP TechSupport

Hi, @ArthurC 

Please find my inline response.

 

Why is there "Block woal_cfg80211_scan in abnormal driver state" in dmesg?

[Christine] This message appears when the driver receives the debug_dump command. At that point the driver starts its hang processing, because we need to read register values from the FW. As part of this process we get the message “Block woal_cfg80211_scan in abnormal driver state”, which is expected behavior when the FW dump command is executed.

Please see the timestamps in the dmesg log below:

=========

Line 8913: [ 222.932159] wlan: Received disassociation request on wlan0, reason: 3 //Christine
Line 8917: [ 222.950041] QUEUE_CMD: 802_11_DEAUTHENTICATE [0x24] is queued //Christine
Line 9900: [ 226.251348] wlan: HostMlme wlan0 Connected to bssid bc:XX:XX:XX:03:1a successfully //Christine
Line 43302: [ 910.377127] Recevie debug_dump command //Christine
Line 43449: [ 930.816864] ====PCIE DEBUG MODE OUTPUT END: 930.815566 ==== //Christine
Line 43457: [ 930.834633] Block woal_cfg80211_del_key in abnormal driver state //Christine
Line 43478: [ 930.904799] IOCTL is not allowed while the device is not present or hang //Christine
Line 43479: [ 930.946102] Block woal_cfg80211_scan in abnormal driver state //Christine

=============
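To pull the same kind of timeline out of a long dmesg capture, a small filter could look like the sketch below. The function name is hypothetical, and the patterns are simply the message fragments quoted in this thread; other driver versions may word them differently.

```shell
# Hypothetical helper: keep only the driver events discussed in this thread.
# Reads kernel log text on stdin and prints the matching lines in order.
filter_wifi_events() {
    # "|| true" avoids a failure status when no line matches.
    grep -E 'Received disassociation request|802_11_DEAUTHENTICATE|Connected to bssid|debug_dump command|abnormal driver state|not allowed while the device|cmd 0x107 error' || true
}
```

For example, `dmesg | filter_wifi_events` would show whether "abnormal driver state" messages appear before or after the debug_dump command in a given capture.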

Why does the IP disappear during the iperf3 test?

[Christine]

- Did it disappear before the driver received the debug_dump command, or after? If after, this is expected behavior as explained in point 1.

- Also, during the reconnection process there is a chance that the IP disappears.

- Also, this IP handling is done by NetworkManager; the FW and driver do not do anything. The customer should debug this in their NetworkManager path.

Why is the Wi-Fi feature broken and no longer working?

[Christine] After the debug_dump command, Wi-Fi features will break; if auto_fw_reload is enabled, it will reset the FW and Wi-Fi will work again. But in the given logs, we did not see any Wi-Fi feature break before the FW dump was executed manually.

How did NXP conclude that there is no issue?

[Christine] As per the log analysis, we are not seeing any issue from the driver or FW perspective. The DUT was able to reconnect to the same AP after the disconnection.

Is the firmware sending cheating commands to the software?

[Christine] Sorry, I could not understand this question. What we concluded above is based on the given logs.

One point from your earlier comment: you mentioned “We think the system hang and 0x107 timeout are caused by auto_fw_reload when issue occurred”. Is this behavior observed every time? If yes, we can discuss this point with our internal team.

 

Best regards,

Christine.


1,833 Views
ArthurC
Contributor III

Hello @Christine_Li,

 

We confirmed this on our platform.

A photo of our platform GUI, with terminals showing dmesg, iperf3 and ifconfig, is attached.

 

If there is no issue:

Why is there "Block woal_cfg80211_scan in abnormal driver state" in dmesg?

Why does the IP disappear during the iperf3 test?

Why is the Wi-Fi feature broken and no longer working?

 

How did NXP conclude that there is no issue?

Please explain.

Is the firmware sending cheating commands to the software?

 

 


1,837 Views
Christine_Li
NXP TechSupport

Hi, @ArthurC 

I had a discussion with the internal team, and from a detailed analysis of the log journal_issue_occured.log we concluded that there is no issue. Please find the analysis below.

 

Initially DUT connected with AP: bc:a5:11:ad:03:1a

Line 7233: Jul 04 16:53:18 tbox100 wpa_supplicant[788]: wlan0: CTRL-EVENT-CONNECTED - Connection to bc:a5:11:ad:03:1a completed [id=0 id_str=]

 

Then DEAUTH received from AP

 Jul 04 16:54:04 tbox100 wpa_supplicant[788]: wlan0: CTRL-EVENT-DISCONNECTED bssid=bc:a5:11:ad:03:1a reason=3 locally_generated=1

 

Then DUT successfully re-connected to AP

Jul 04 16:54:07 tbox100 wpa_supplicant[788]: wlan0: CTRL-EVENT-CONNECTED - Connection to bc:a5:11:ad:03:1a completed [id=0 id_str=]
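The connect/disconnect sequence above can be extracted from a journal capture with a short filter. This is only a sketch: the function name is hypothetical, and it assumes the default journal line layout shown above, where the third field is the time of day. It also prints the `locally_generated` field when present, which in wpa_supplicant's event format marks a disconnect initiated on the local side.

```shell
# Hypothetical helper: summarize wpa_supplicant connect/disconnect events.
# Reads journal text on stdin; prints "<time> <event> [reason=.. locally_generated=..]".
supplicant_events() {
    awk '/CTRL-EVENT-(CONNECTED|DISCONNECTED)/ {
        ev = ($0 ~ /CTRL-EVENT-DISCONNECTED/) ? "DISCONNECTED" : "CONNECTED"
        extra = ""
        # Collect the reason code and the locally_generated flag if present.
        for (i = 1; i <= NF; i++)
            if ($i ~ /^(reason|locally_generated)=/) extra = extra " " $i
        print $3, ev extra
    }'
}
```

Fed with the journal lines quoted above, this would list the 16:54:04 disconnect with its reason code followed by the 16:54:07 reconnect.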

 

After the connection I did not find any suspicious message; the device was up and running. I could not understand which issue was observed, or what is meant by not being able to recover after the disconnect.

From the logs, it looks like there is no issue; you simply executed the manual FW dump command while the DUT was in a working state.

There is no 0x107 timeout and no reconnection issue either. The device successfully reconnected to the same AP.

Can you please confirm this on your side?

 

Best regards,

Christine.


1,901 Views
ArthurC
Contributor III

Hello @Christine_Li ,

 

The issue is that Wi-Fi disconnects unexpectedly during the test and cannot be recovered.

We dumped the log ourselves when the Wi-Fi connection broke.

We think the system hang and 0x107 timeout are caused by auto_fw_reload when issue occurred.

Please fix the issue of the Wi-Fi connection breaking unexpectedly and not recovering.

 


1,860 Views
Christine_Li
NXP TechSupport

Hi, @ArthurC 

We checked the latest logs; this time we are not seeing the 0x107 timeout in the dmesg logs.

But we found the information below:

1. Line 8913: [ 222.932159] wlan: Received disassociation request on wlan0, reason: 3

   Line 8914: [ 222.938666] wlan: REASON: (Deauth) Sending STA is leaving (or has left) IBSS or ESS
   Line 8915:[ 222.946367] Deauth: bc:XX:XX:XX:03:1a

 

2. Line 43302: [ 910.377127] Recevie debug_dump command

 

We would like to confirm the following with you:

1. Did you take the dump manually?

2. If the device was not in a hang state and no 0x107 timeout was observed, in which state did you take the dump? Was any issue observed?

Or did you find that Wi-Fi had disconnected by itself and then take the dump manually?

We ask because, from the logs, it looks like the device was in a working state when the dump was captured.

Wi-Fi disassociated at about: Jul 04 16:54:04

The dump was captured at about: Jul 04 17:04, roughly [910-222]=688 seconds after the Wi-Fi disconnection occurred.

So we would like to confirm this with you.
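The elapsed time between two dmesg lines can also be computed from the full timestamps rather than the rounded integers. The helper names below are hypothetical; the sketch assumes the usual bracketed seconds-since-boot timestamp at the start of each kernel log line.

```shell
# Hypothetical helper: extract the bracketed kernel timestamp from one dmesg line.
dmesg_ts() {
    printf '%s\n' "$1" | sed -n 's/^[^[]*\[ *\([0-9][0-9]*\.[0-9][0-9]*\)\].*/\1/p'
}

# Hypothetical helper: print the seconds elapsed between two dmesg lines.
elapsed_seconds() {
    awk -v a="$(dmesg_ts "$1")" -v b="$(dmesg_ts "$2")" \
        'BEGIN { printf "%.1f\n", b - a }'
}
```

With the two lines quoted in this post (222.932159 and 910.377127), `elapsed_seconds` prints 687.4, which agrees with the rough 688-second figure above.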

 

Thanks,

Christine.

 


1,884 Views
Christine_Li
NXP TechSupport

Hi, @ArthurC 

Thanks for your efforts to repro and provide us the requested logs.

We will keep you posted once we have any updates.

 

Best regards,

Christine.


1,873 Views
ArthurC
Contributor III

Hello @Christine_Li,

 

Fortunately, our platform did not hang when the issue occurred this time, with both drvdbg=0xa0037 and auto_fw_reload=0 set as driver load parameters.

The needed logs and fw_dump files are attached.

Please help to fix the issue.

 

Thank You.


1,841 Views
Christine_Li
NXP TechSupport

Hi, @ArthurC 

We are blocked on further debugging here because we do not have the FW dumps: the issue reproduced at our end only once, and because auto_fw_reload was enabled we were not able to capture the dumps.

We have tried to re-create the issue locally, but the setup ran for more than 2 days without the issue being seen.

As the issue is seen quickly at your end, can you please re-run the test and share the dump, which is necessary to identify the root cause? The dump files are not available in the previously shared logs.

In parallel, we keep trying to re-create the issue locally.

Please make sure to collect the logs with the parameters below while running the scenario.

- Make sure to use drvdbg=0xa0037 and auto_fw_reload=0 in the driver load parameters.

- Please share the FW dump files, which should be generated automatically; the driver reports the dump file path in dmesg.

 

If the automatic dump is not generated, please follow the steps below to get the dump manually.

echo "debug_dump" >/proc/mwlan/adapter0/config

cat /proc/mwlan/adapter0/drv_dump > file_drv_dump

cat /proc/mwlan/adapter0/fw_dump > file_fw_dump
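The three manual steps can be wrapped in one function so that a failed step is not missed. This is a sketch only: `collect_dumps` and its two arguments are hypothetical, and the default path is the one given in the commands above.

```shell
# Hypothetical wrapper around the three manual dump steps.
# Usage: collect_dumps [PROC_DIR] [OUT_DIR]
collect_dumps() {
    proc_dir=${1:-/proc/mwlan/adapter0}
    out_dir=${2:-.}
    # Stop early if the driver's control file is not there or not writable.
    [ -w "$proc_dir/config" ] || { echo "cannot write $proc_dir/config" >&2; return 1; }
    mkdir -p "$out_dir" || return 1
    echo "debug_dump" > "$proc_dir/config" || return 1
    cat "$proc_dir/drv_dump" > "$out_dir/file_drv_dump" || return 1
    cat "$proc_dir/fw_dump" > "$out_dir/file_fw_dump" || return 1
    # An empty FW dump is of no use for debugging, so report it as a failure.
    [ -s "$out_dir/file_fw_dump" ] || { echo "fw_dump is empty" >&2; return 1; }
}
```

Running `collect_dumps /proc/mwlan/adapter0 /tmp/wifi_dumps` as root would leave file_drv_dump and file_fw_dump in the output directory, or return non-zero if any step failed.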

 

Please make sure the FW dump is generated, because debugging this issue relies on the FW dump.

Besides, I also know that with drvdbg=0xa0037 or drvdbg=0x80037 your device sometimes hangs, especially when the issue is reproduced. So I still hope you can send us one board; it would be appreciated. If you think sending a board to mainland China is too difficult, how about sending it directly to our colleagues in India?

Please think about it, so that we can move forward together.

Otherwise we remain blocked here, as we have been for a long time.

Best regards,

Christine.


1,867 Views
Christine_Li
NXP TechSupport

Hi, @ArthurC 

Thanks for your quick reply.

OK, from your test scenarios we thought you only needed STA mode in your product.

Now we understand your point: you need both STA and AP mode.

Let me discuss further actions internally.

 

Best regards,

Christine.


1,855 Views
ArthurC
Contributor III

Hello @Christine_Li,

 

Regarding feature support in our product, Wi-Fi network sharing is one of the use cases.

If we configure STA only with drv_mode=1, how can we switch the working mode to AP at runtime?

Is there any design support for switching between STA and AP mode at runtime?

If not, is there any FW adjustment for this issue?

 


1,843 Views
Christine_Li
NXP TechSupport

Hi, @ArthurC 

From a detailed analysis of your shared logs and our local test observations, we found that nmcli creates two instances of wpa_supplicant, for uap0 and wlan0/mlan0. As the issue is related to the scan timeout and nmcli normally scans very aggressively on both interfaces, we suspect this is the culprit.

As your use case is STA mode only, we ran the STA-only scenario using drv_mode=1 for more than 2 days, and the issue was not observed.

Can you please try this solution at your end and let us know whether you still face the issue?

Please also take care of the points below before starting the test.

The driver load parameters should be as below.

cfg80211_wext=0xf max_vir_bss=1 cal_data_cfg=none ps_mode=1 auto_ds=1 host_mlme=1 drv_mode=1 auto_fw_reload=0 drvdbg=0xa0037
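Before starting a long test run, it may help to check that the parameter string actually in use carries the three settings this test depends on. The function below is a hypothetical sketch that only inspects a string passed as its argument; where that string comes from (a module parameter file, a boot script) depends on the platform.

```shell
# Hypothetical check: verify that a driver parameter string contains the
# settings requested for this test. Prints each setting that is missing or
# set to a different value; returns 1 if any is.
check_mod_params() {
    params=" $1 "
    status=0
    for want in drv_mode=1 auto_fw_reload=0 drvdbg=0xa0037; do
        case "$params" in
            *" $want "*) ;;   # exact key=value token found
            *) echo "missing or different: $want"; status=1 ;;
        esac
    done
    return "$status"
}
```

Passing the full parameter line above prints nothing and returns 0; a line with auto_fw_reload=1, for example, would be reported.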

 

If the issue is seen, please share the FW dump files, which should be generated automatically; the driver reports the dump file path in dmesg.

If the automatic dump is not generated, please follow the steps below to get the dump manually.

 

echo "debug_dump" >/proc/mwlan/adapter0/config

cat /proc/mwlan/adapter0/drv_dump > file_drv_dump

cat /proc/mwlan/adapter0/fw_dump > file_fw_dump

 

Regards,

Christine.


2,007 Views
Christine_Li
NXP TechSupport

Hi, @ArthurC 

Thank you for the confirmation.

We are actively debugging to identify the suspected cause of the issue.

As the "cmd 0x107 error" issue has been reproduced locally, we are trying to capture the dump files to help the engineering team identify the suspected cause.

We will update you soon.

 

Best regards,

Christine.
