ethernet-1:00: port 2 failed to delete

alexsv
Contributor II

Dear NXP engineers,

Since L5.15.71_2.2.0, we have been seeing new errors in the log during DSA switch configuration:

[   23.671830] device lan0 left promiscuous mode
[   23.683340] sys1: port 1(lan0) entered disabled state
[   23.724036] LAN9303_MDIO 5b050000.ethernet-1:00: port 2 failed to delete 00:a0:03:25:0d:cf vid 1 from fdb: -2
[   24.438938] sys1: port 1(lan0) entered blocking state
[   24.447414] sys1: port 1(lan0) entered disabled state
[   24.458661] device lan0 entered promiscuous mode
[   24.463747] sys1: port 1(lan0) entered blocking state
[   24.466529] LAN9303_MDIO 5b050000.ethernet-1:00: port 2 failed to add 00:a0:03:25:0d:cf vid 1 to fdb: -95
[   24.468883] sys1: port 1(lan0) entered forwarding state
[   25.145881] device lan0 left promiscuous mode
[   25.155739] sys1: port 1(lan0) entered disabled state
[   25.194089] LAN9303_MDIO 5b050000.ethernet-1:00: port 2 failed to delete 00:a0:03:25:0d:cf vid 1 from fdb: -2
[   25.391185] sys1: port 1(lan1) entered blocking state
[   25.402125] sys1: port 1(lan1) entered disabled state
[   25.413279] device lan1 entered promiscuous mode
[   25.419397] LAN9303_MDIO 5b050000.ethernet-1:00: port 1 failed to add 00:a0:03:25:0d:cf vid 1 to fdb: -95
[   25.471755] sys1: port 2(lan0) entered blocking state
[   25.477173] sys1: port 2(lan0) entered disabled state
[   25.484444] LAN9303_MDIO 5b050000.ethernet-1:00: port 2 failed to add 00:a0:03:25:0d:cf vid 1 to fdb: -95
[   25.505081] device lan0 entered promiscuous mode
[   25.510165] sys1: port 2(lan0) entered blocking state
[   25.515282] sys1: port 2(lan0) entered forwarding state

Or with Marvell switches:

[   28.612012] device lan0 left promiscuous mode
[   28.623071] sys1: port 1(lan0) entered disabled state
[   28.677660] mv88e6085 5b050000.ethernet-1:00: port 2 failed to delete 00:a0:03:25:77:a6 vid 1 from fdb: -95
[   29.303068] sys1: port 1(lan0) entered blocking state
[   29.309124] sys1: port 1(lan0) entered disabled state
[   29.327602] device lan0 entered promiscuous mode
[   29.361424] sys1: port 1(lan0) entered blocking state
[   29.366557] sys1: port 1(lan0) entered forwarding state
[   29.844204] device lan0 left promiscuous mode
[   29.852886] sys1: port 1(lan0) entered disabled state
[   30.137652] sys1: port 1(lan1) entered blocking state
[   30.143013] sys1: port 1(lan1) entered disabled state
[   30.171619] device lan1 entered promiscuous mode
[   30.243102] sys1: port 2(lan0) entered blocking state
[   30.248468] sys1: port 2(lan0) entered disabled state
[   30.277409] device lan0 entered promiscuous mode
[   30.284984] sys1: port 2(lan0) entered blocking state
[   30.290085] sys1: port 2(lan0) entered forwarding state

This is probably caused by the following commit:

Author: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: Wed Mar 2 21:14:10 2022 +0200

net: dsa: install secondary unicast and multicast addresses as host FDB/MDB

Could you please advise what is missing in the LAN9303 and mv88e6085 DSA drivers, so that we can avoid these errors in conjunction with the new feature above?

vladimir_oltean
NXP Employee

Hello,

The error messages in the lan9303 and mv88e6xxx drivers do not have the same root cause, and neither of them is caused, or even exposed, by commit 5e8a1e03aa4d ("net: dsa: install secondary unicast and multicast addresses as host FDB/MDB").

The change set "RX filtering in DSA", which has exposed sub-par implementations of the FDB and MDB operations in drivers, had a simple purpose.
https://patchwork.kernel.org/project/netdevbpf/cover/20210629140658.2510288-1-olteanv@gmail.com/
Bridged traffic destined towards the CPU would also be flooded towards other bridge ports, because of a lack of address learning on the switches' CPU port. Try to put lan1 and lan2 under br0 and ping br0 from a station connected to lan1. You will see the ICMP requests on a station connected to lan2 as well!
By detecting the bridge "local" FDB entries and automatically installing them on the DSA CPU ports, this flooding is prevented.

On the other hand, the change you suggested as "probable" - commit 5e8a1e03aa4d ("net: dsa: install secondary unicast and multicast addresses as host FDB/MDB") - is not to blame here. It serves a different purpose: on standalone (non-bridged) ports, we can prevent flooding unknown traffic to the CPU, and deliver only traffic to a MAC address that software has expressed an interest in. As opposed to the "RX filtering in DSA" patch set, this is just an optimization, so it has been made opt-in, and is guarded by the dsa_switch_supports_uc_filtering() and dsa_switch_supports_mc_filtering() checks, which return false for your (and for most) hardware.

In the case of mv88e6xxx, the driver returns -EOPNOTSUPP in port_fdb_del() because VLAN 1 was removed earlier from the VTU, so mv88e6xxx_port_db_load_purge() can no longer find a VID -> FID mapping.

I am noticing that the error you have posted comes from a bridge deletion code path. So this patch from the mainline kernel will fix that:

https://patchwork.kernel.org/project/netdevbpf/patch/20220211174506.3874409-1-vladimir.oltean@nxp.co...

However, there are other circumstances in which the mv88e6xxx driver may still incorrectly return -EOPNOTSUPP when attempting to add an FDB entry; those are solved by this other patch set, which also describes the problem in detail:

https://patchwork.kernel.org/project/netdevbpf/cover/20220215170218.2032432-1-vladimir.oltean@nxp.co...

The easiest way to get access to these bug fixes / structural improvements is to switch to a 6.1 LTS kernel. I do not recommend attempting to backport core DSA, switchdev and bridge changes.


As for lan9303, its lan9303_port_fdb_add() and lan9303_port_fdb_del() methods return -EOPNOTSUPP whenever vid != 0. So it is no wonder you see errors when vid=1, as in your log. This is new behavior only to the extent that DSA now automatically adds some FDB entries; the restriction itself comes from commit 0620427ea0d6 ("net: dsa: lan9303: Add fdb/mdb manipulation"), dated 2017. If you add an FDB entry manually, for example with "bridge fdb add dev lan1 00:01:02:03:04:05 vid 1 master static", you will surely see the same kind of message.

The lan9303 errors were "silent" until kernel commit 2fd186501b1c ("net: dsa: be louder when a non-legacy FDB operation fails").

The only way to get rid of the lan9303 errors is to correctly implement FDB and MDB operations for entries with non-zero VID, and not just VLAN-unaware FDB/MDB entries. Otherwise, if you don't care about these addresses being offloaded to hardware, you can ignore the errors - worst case, it will be just as before. It was a deliberate decision to make the errors loud, to attract attention that things are not perfect, and hopefully, some of the people with the hardware can improve the state of things.
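
To sort a captured log by driver, operation and error code, a small script can help. The following is only a sketch: the message layout is inferred from the log lines quoted in the question, the function name parse_fdb_failures() is hypothetical, and the errno table assumes the generic Linux numbering (2 = ENOENT, 95 = EOPNOTSUPP), which may differ on some architectures.

```python
import re

# Assumption: generic Linux errno numbering, covering only the two
# values that appear in the logs quoted in this thread.
ERRNO_NAMES = {2: "ENOENT", 95: "EOPNOTSUPP"}

# Assumption: layout inferred from the quoted messages, e.g.
# "<driver> <device>: port 2 failed to add <mac> vid 1 to fdb: -95"
FDB_FAILURE = re.compile(
    r"(?P<driver>\S+) (?P<device>\S+): port (?P<port>\d+) failed to "
    r"(?P<op>add|delete) (?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}) "
    r"vid (?P<vid>\d+) (?:to|from) fdb: -(?P<errno>\d+)"
)


def parse_fdb_failures(log_text):
    """Return one dict per 'failed to add/delete ... fdb' line in log_text."""
    failures = []
    for line in log_text.splitlines():
        match = FDB_FAILURE.search(line)
        if match is None:
            continue  # bridge state changes and other unrelated lines
        code = int(match.group("errno"))
        failures.append({
            "driver": match.group("driver"),
            "port": int(match.group("port")),
            "op": match.group("op"),
            "mac": match.group("mac"),
            "vid": int(match.group("vid")),
            # Fall back to the raw number for codes not in the table.
            "error": ERRNO_NAMES.get(code, f"errno {code}"),
        })
    return failures


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python3 this_script.py
    for failure in parse_fdb_failures(sys.stdin.read()):
        print(failure)
```

Entries reported with a non-zero vid and EOPNOTSUPP are the ones that match the explanations given above for both drivers.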

alexsv
Contributor II

Thanks a lot Vladimir for your time and explanations! I'll try to implement your suggestions! Have a nice weekend!

Sanket_Parekh
NXP TechSupport
Hi @alexsv,

I hope you are doing well.

Are you using NXP demo images? If not, kindly try to reproduce the issue on the EVK board with the demo images of the same BSP version from the Embedded Linux for i.MX Applications Processors.
 
Thanks & Regards,
Sanket Parekh
alexsv
Contributor II

Dear Sanket, thank you for the quick reply!
I suppose the NXP EVK does not have the affected switch models.

Would it be possible for you to seek advice from Vladimir, the author of the patch in question?

Best regards!
