lock of qman_fq_lookup_table

jiafang · ‎01-17-2024

Hi,

We have a question about the qman_fq_lookup_table.

From the code, we can see:

 259 static int find_empty_fq_table_entry(u32 *entry, struct qman_fq *fq)
 260 {
 261         u32 i;
 262
 263         spin_lock(&fq_hash_table_lock);
 264         /* Can't use index zero because this has special meaning
 265          * in context_b field. */
 266         for (i = 1; i < qman_fq_lookup_table_size; i++) {
 267                 if (qman_fq_lookup_table[i] == NULL) {
 268                         *entry = i;
 269                         qman_fq_lookup_table[i] = fq;
 270                         spin_unlock(&fq_hash_table_lock);
 271                         return 0;
 272                 }
 273         }
 274         spin_unlock(&fq_hash_table_lock);
 275         return -ENOMEM;
 276 }

 278 static void clear_fq_table_entry(u32 entry)
 279 {
 280         spin_lock(&fq_hash_table_lock);
 281         BUG_ON(entry >= qman_fq_lookup_table_size);
 282         qman_fq_lookup_table[entry] = NULL;
 283         spin_unlock(&fq_hash_table_lock);
 284 }
 285
 286 static inline struct qman_fq *get_fq_table_entry(u32 entry)
 287 {
 288         BUG_ON(entry >= qman_fq_lookup_table_size);
 289         return qman_fq_lookup_table[entry];
 290 }

When we find_empty_fq_table_entry(), we have a fq_hash_table_lock;

When we clear_fq_table_entry(), we also have a fq_hash_table_lock;

But when we get_fq_table_entry(), we don't have a fq_hash_table_lock.

Do you think if we should use a lock for get_fq_table_entry ?

yipingwang · ‎02-05-2024

No. I do not think it is necessary to use lock here.

jiafang · ‎02-05-2024

We ask this as we meet an issue that handle kernel NULL pointer at 0
[ 184.234396] PC is at qman_p_poll_dqrr+0x238/0x288
[ 184.237793] LR is at qman_p_poll_dqrr+0x240/0x288
[ 184.241189] pc : [<ffff0000088f6088>] lr : [<ffff0000088f6090>] pstate: 80000145
..
[ 184.389893] [<ffff0000088f6088>] qman_p_poll_dqrr+0x238/0x288
[ 184.394336] [<ffff00000879137c>] dpaa_eth_poll+0x2c/0x68
[ 184.398342] [<ffff000008936488>] net_rx_action+0x150/0x3f8
[ 184.402522] [<ffff000008081be4>] __do_softirq+0x12c/0x374
[ 184.406616] [<ffff0000080add9c>] irq_exit+0x84/0x88

objdump code:
ffff0000088f6068: b9440f21 ldr w1, [x25, #1036] //dq->contextB
ffff0000088f606c: f9400660 ldr x0, [x19, #8] //qman_fq_lookup_table_size
ffff0000088f6070: eb00003f cmp x1, x0 // BUG_ON(entry >= qman_fq_lookup_table_size);
ffff0000088f6074: 540001e2 b.cs ffff0000088f60b0 <qman_p_poll_dqrr+0x260> // b.hs, b.nlast
fff0000088f6078: f9400a63 ldr x3, [x19, #16]
ffff0000088f607c: aa1a03e0 mov x0, x26
ffff0000088f6080: f8617863 ldr x3, [x3, x1, lsl #3]
ffff0000088f6084: aa0303e1 mov x1, x3
ffff0000088f6088: f9400063 ldr x3, [x3] // x3 is 0
ffff0000088f608c: d63f0060 blr x3

It seems like the failed on:
__poll_portal_fast()
...
res = fq->cb.dqrr(p, fq, dq); // crash here

Do you have any idea how this dqrr is changed to 0 ?

yipingwang · ‎02-05-2024

We will investigate your problem.

jiafang · ‎02-27-2024

Do you have any idea of this issue ?

yipingwang · ‎02-28-2024

Can it be reproduced with the default code? Any modification? Can it be reproduced every time?
From the objdump to see, it seems the fq is NULL, so could you check the entry index when this issue occurred? Then check whether this entry has been filled correctly.

jiafang · ‎02-29-2024

Regarding the entry value, the code has the checking:
BUG_ON(entry >= qman_fq_lookup_table_size);
So it seems like the entry should be a correct value since no BUG_ON here.

jiafang · ‎02-28-2024

It is very hard to reproduce and it happened with the default code.

yipingwang · ‎02-28-2024

OK, thx