I seem to be getting multiple exceptions in tcp_transmit_skb(); two are shown below. We have 44 boards running, and only a couple are failing. The boards are new instances of an old design that has been working for several years. We recently changed U-Boot, but I can't say what changed because we don't have the source code for the previous U-Boot. I haven't touched the kernel.
I have to believe there is information in these logs that would help us debug the problem, but I can't find any documentation on how to read them. For example, what does "TRAP: 0700" mean?
The problem only occurs when the board is being heavily used and the ambient air temperature is around 40C.
Jan 3 21:51:56 freescale user.emerg kernel: skb_under_panic: text:c02190d0 len:74 put:32 head:d8aaf400 data:d8aaf3e0 tail:0xd8aaf42a end:0xd8aaf4a0 dev:<NULL>
Jan 3 21:51:56 freescale user.emerg kernel: ------------[ cut here ]------------
Jan 3 21:51:56 freescale user.crit kernel: Kernel BUG at c01d51a0 [verbose debug info unavailable]
Jan 3 21:51:56 freescale user.warn kernel: Oops: Exception in kernel mode, sig: 5 [#1]
Jan 3 21:51:56 freescale user.warn kernel: SMP NR_CPUS=2 P2020 DS
Jan 3 21:51:56 freescale user.warn kernel: Modules linked in:
Jan 3 21:51:56 freescale user.warn kernel: NIP: c01d51a0 LR: c01d51a0 CTR: c0188888
Jan 3 21:51:56 freescale user.warn kernel: REGS: d7e4fb50 TRAP: 0700 Not tainted (2.6.32-svn5914)
Jan 3 21:51:56 freescale user.warn kernel: MSR: 00029000 <EE,ME,CE> CR: 22442424 XER: 20000000
Jan 3 21:51:56 freescale user.warn kernel: TASK = dbe9a600[599] 'python' THREAD: d7e4e000 CPU: 1
Jan 3 21:51:56 freescale user.warn kernel: GPR00: c01d51a0 d7e4fc00 dbe9a600 00000079 00021000 ffffffff c0189440 00000000
Jan 3 21:51:56 freescale user.warn kernel: GPR08: 000028b6 c03149ac 00000073 00465000 22442422 101a3bfc 00000000 00000000
Jan 3 21:51:56 freescale user.warn kernel: GPR16: 00000003 00010000 00000000 00004000 d7e4fce8 dbe7a7dc c03464e4 00000000
Jan 3 21:51:56 freescale user.warn kernel: GPR24: dbe7a7dc 00000020 c03390a4 00000020 d7e4fc30 d7592fec d7592fc8 d7592fc8
Jan 3 21:51:56 freescale user.warn kernel: NIP [c01d51a0] skb_under_panic+0x48/0x5c
Jan 3 21:51:56 freescale user.warn kernel: LR [c01d51a0] skb_under_panic+0x48/0x5c
Jan 3 21:51:56 freescale user.warn kernel: Call Trace:
Jan 3 21:51:56 freescale user.warn kernel: [d7e4fc00] [c01d51a0] skb_under_panic+0x48/0x5c (unreliable)
Jan 3 21:51:56 freescale user.warn kernel: [d7e4fc10] [c01d71a4] skb_push+0x58/0x60
Jan 3 21:51:56 freescale user.warn kernel: [d7e4fc20] [c02190d0] tcp_transmit_skb+0xdc/0x760
Jan 3 21:51:56 freescale user.warn kernel: [d7e4fc80] [c021bfb8] tcp_write_xmit+0x1fc/0x480
Jan 3 21:51:56 freescale user.warn kernel: [d7e4fcd0] [c021c2a8] __tcp_push_pending_frames+0x38/0xb8
Jan 3 21:51:56 freescale user.warn kernel: [d7e4fce0] [c020e474] tcp_sendmsg+0x1bc/0xc04
Jan 3 21:51:56 freescale user.warn kernel: [d7e4fd60] [c01d0178] sock_sendmsg+0xb4/0xec
Jan 3 21:51:56 freescale user.warn kernel: [d7e4fe40] [c01d050c] sys_sendto+0xbc/0xf0
Jan 3 21:51:56 freescale user.warn kernel: [d7e4ff10] [c01d1090] sys_socketcall+0x1c0/0x238
Jan 3 21:51:56 freescale user.warn kernel: [d7e4ff40] [c000facc] ret_from_syscall+0x0/0x3c
Jan 3 21:51:56 freescale user.warn kernel: Instruction dump:
Jan 3 21:51:56 freescale user.warn kernel: 2f800000 80e30098 8103009c 81230090 81430094 419e0024 3c60c02c 90010008
Jan 3 21:51:56 freescale user.warn kernel: 7ca42b78 38637a98 7d655b78 480819b1 <0fe00000> 48000000 3c80c02a 3804131c
Jan 3 21:51:56 freescale user.warn kernel: ---[ end trace d0a44476c96c002e ]---
Jan 4 00:53:19 freescale auth.info login[662]: root login on 'pts/0'
Jan 4 01:39:22 freescale user.emerg kernel: skb_under_panic: text:c02190d0 len:95 put:32 head:d8975a00 data:d89759e0 tail:0xd8975a3f end:0xd8975aa0 dev:<NULL>
Jan 4 01:39:22 freescale user.emerg kernel: ------------[ cut here ]------------
Jan 4 01:39:23 freescale user.crit kernel: Kernel BUG at c01d51a0 [verbose debug info unavailable]
Jan 4 01:39:23 freescale user.warn kernel: Oops: Exception in kernel mode, sig: 5 [#2]
Jan 4 01:39:23 freescale user.warn kernel: SMP NR_CPUS=2 P2020 DS
Jan 4 01:39:23 freescale user.warn kernel: Modules linked in:
Jan 4 01:39:23 freescale user.warn kernel: NIP: c01d51a0 LR: c01d51a0 CTR: c0188888
Jan 4 01:39:23 freescale user.warn kernel: REGS: dbee3800 TRAP: 0700 Tainted: G D (2.6.32-svn5914)
Jan 4 01:39:23 freescale user.warn kernel: MSR: 00029000 <EE,ME,CE> CR: 24422424 XER: 20000000
Jan 4 01:39:23 freescale user.warn kernel: TASK = dbe9af80[160] 'SIO3_SuperAppli' THREAD: dbee2000 CPU: 0
Jan 4 01:39:23 freescale user.warn kernel: GPR00: c01d51a0 dbee38b0 dbe9af80 00000079 00021000 ffffffff c0189440 00000000
Jan 4 01:39:23 freescale user.warn kernel: GPR08: 00002f5a c03149ac 00000073 00455000 24422422 1010c278 00000000 00000000
Jan 4 01:39:23 freescale user.warn kernel: GPR16: 00000003 00010000 00000000 000005a8 dbee3998 dbe7893c c03464e4 00000000
Jan 4 01:39:23 freescale user.warn kernel: GPR24: dbe7893c 00000020 c03390a4 00000020 dbee38e0 d47d07ac d47d0788 d47d0788
Jan 4 01:39:23 freescale user.warn kernel: NIP [c01d51a0] skb_under_panic+0x48/0x5c
Jan 4 01:39:23 freescale user.warn kernel: LR [c01d51a0] skb_under_panic+0x48/0x5c
Jan 4 01:39:23 freescale user.warn kernel: Call Trace:
Jan 4 01:39:23 freescale user.warn kernel: [dbee38b0] [c01d51a0] skb_under_panic+0x48/0x5c (unreliable)
Jan 4 01:39:23 freescale user.warn kernel: [dbee38c0] [c01d71a4] skb_push+0x58/0x60
Jan 4 01:39:23 freescale user.warn kernel: [dbee38d0] [c02190d0] tcp_transmit_skb+0xdc/0x760
Jan 4 01:39:23 freescale user.warn kernel: [dbee3930] [c021bfb8] tcp_write_xmit+0x1fc/0x480
Jan 4 01:39:23 freescale user.warn kernel: [dbee3980] [c021c2a8] __tcp_push_pending_frames+0x38/0xb8
Jan 4 01:39:23 freescale user.warn kernel: [dbee3990] [c020e474] tcp_sendmsg+0x1bc/0xc04
Jan 4 01:39:23 freescale user.warn kernel: [dbee3a10] [c01d0178] sock_sendmsg+0xb4/0xec
Jan 4 01:39:23 freescale user.warn kernel: [dbee3af0] [c01d0578] kernel_sendmsg+0x2c/0x44
Jan 4 01:39:23 freescale user.warn kernel: [dbee3b00] [c010ad38] smb_sendv+0x104/0x304
Jan 4 01:39:23 freescale user.warn kernel: [dbee3b80] [c010b000] SendReceive2+0xc8/0x4f4
Jan 4 01:39:23 freescale user.warn kernel: [dbee3bc0] [c00f6fb0] CIFSSMBRead+0x16c/0x320
Jan 4 01:39:23 freescale user.warn kernel: [dbee3c10] [c010356c] T.1018+0xf8/0x2ac
Jan 4 01:39:23 freescale user.warn kernel: [dbee3c70] [c01037b4] cifs_readpage_worker+0x94/0x1ec
Jan 4 01:39:23 freescale user.warn kernel: [dbee3ca0] [c0103a9c] cifs_write_begin+0x190/0x210
Jan 4 01:39:23 freescale user.warn kernel: [dbee3ce0] [c0065958] generic_perform_write+0xc0/0x1e8
Jan 4 01:39:23 freescale user.warn kernel: [dbee3d40] [c0067708] generic_file_buffered_write+0x64/0xec
Jan 4 01:39:23 freescale user.warn kernel: [dbee3d80] [c0067d0c] __generic_file_aio_write+0x33c/0x50c
Jan 4 01:39:23 freescale user.warn kernel: [dbee3df0] [c0067f4c] generic_file_aio_write+0x70/0xf0
Jan 4 01:39:23 freescale user.warn kernel: [dbee3e20] [c00ee61c] cifs_file_aio_write+0x20/0x50
Jan 4 01:39:23 freescale user.warn kernel: [dbee3e30] [c00921bc] do_sync_write+0xc4/0x138
Jan 4 01:39:23 freescale user.warn kernel: [dbee3ef0] [c00922e4] vfs_write+0xb4/0x10c
Jan 4 01:39:23 freescale user.warn kernel: [dbee3f10] [c0092424] sys_write+0x4c/0x90
Jan 4 01:39:23 freescale user.warn kernel: [dbee3f40] [c000facc] ret_from_syscall+0x0/0x3c
Jan 4 01:39:23 freescale user.warn kernel: Instruction dump:
Jan 4 01:39:23 freescale user.warn kernel: 2f800000 80e30098 8103009c 81230090 81430094 419e0024 3c60c02c 90010008
Jan 4 01:39:23 freescale user.warn kernel: 7ca42b78 38637a98 7d655b78 480819b1 <0fe00000> 48000000 3c80c02a 3804131c
Jan 4 01:39:23 freescale user.warn kernel: ---[ end trace d0a44476c96c002f ]---
Jan 4 02:37:38 freescale user.debug kernel: prune_queue: c=70c81daf
Jan 4 03:28:45 freescale user.debug kernel: prune_queue: c=70c81daf
Jan 4 04:18:53 freescale user.debug kernel: prune_queue: c=70c81daf
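You should turn on CONFIG_DEBUG_BUGVERBOSE to get better reporting of such events (the file and line of the BUG() instead of "[verbose debug info unavailable]"), but in this case the preceding message already tells you that the BUG() was hit in skb_under_panic(), called from skb_push() in tcp_transmit_skb(). Since the failure depends on temperature, it's probably a hardware problem rather than a software one.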
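The trap number tells you which exception was taken. 0x0700 is a program exception (see "EXC_XFER_STD(0x0700, program_check_exception)" in arch/powerpc/kernel/head_fsl_booke.S). This program exception was deliberately triggered by BUG() in order to generate a backtrace: on PowerPC, BUG() compiles to an unconditional trap instruction, which is the word shown in angle brackets in the "Instruction dump", and the "sig: 5" in the Oops line is SIGTRAP.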