crash during caam async hash generation


markusstockhaus
Contributor I

Hi,

I'm trying to use the caamhash driver on a TP-Link WDR4900 v1 (e500v2 P1014).

The Linux kernel is 3.10.49 and loading the modules works fine, but when I try to test an async sha256 hash I get a crash dump.

I narrowed it down to the following statement in the function ahash_set_sh_desc() in caamhash.c:

        /* Load data and write to result or context */
        ahash_append_load_str(desc, ctx->ctx_len);

        ctx->sh_desc_update_dma = dma_map_single(jrdev, desc, desc_bytes(desc),
                                                 DMA_TO_DEVICE); /* <<< CRASH HERE */

dmesg shows:

[   76.672773] platform ffe31000.jr: failed to flush job ring 0
[   76.684773] platform ffe32000.jr: failed to flush job ring 1
[   76.696765] platform ffe33000.jr: failed to flush job ring 2
[   76.708772] platform ffe34000.jr: failed to flush job ring 3
[   76.715288] caam ffe30000.crypto: device ID = 0x0a14010000000000 (Era 3)
[   76.722023] caam ffe30000.crypto: job rings = 4, qi = 0
[   76.797922] testing speed of async sha256
[   76.817173] Unable to handle kernel paging request for data at address 0x00000010
[   76.824655] Faulting instruction address: 0xc98b04a8
[   76.829620] Oops: Kernel access of bad area, sig: 11 [#1]
[   76.835009] Freescale P1014
[   76.837794] Modules linked in: tcrypt(+) caamhash caam ath9k ath9k_common pppoe ppp_async iptable_nat ath9k_hw ath pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv4 mac80211 ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_id xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT slhc nf_nat_irc nf_nat_ftp nf_nat nf_defrag_ipv4 nf_conntrack_irc nf_conntrack_ftp iptable_raw iptable_mangle iptable_filter ipt_REJECT ip_tables gpio_keys crc_ccitt compat booke_wdt ip6t_REJECT ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables nf_conntrack_ipv6 nf_conntrack nf_defrag_ipv6 dm_crypt dm_mirror dm_region_hash dm_log dm_mod ipv6 algif_skcipher algif_hash af_alg md5 arc4 crypto_blkcipher crypto_hash leds_gpio ehci_platform ehci_hcd fsl_mph_dr_of button_hotplug input_core usbcore nls_base usb_common
[   76.913294] CPU: 0 PID: 2641 Comm: insmod Not tainted 3.10.49 #3
[   76.919293] task: c46b9860 ti: c46e8000 task.ti: c46e8000
[   76.924683] NIP: c98b04a8 LR: c98b0478 CTR: c016f074
[   76.929639] REGS: c46e9bc0 TRAP: 0300   Not tainted  (3.10.49)
[   76.935460] MSR: 00029000 <CE,EE,ME>  CR: 40002082  XER: 20000000
[   76.941558] DEAR: 00000010, ESR: 00000000
[   76.945557]
[   76.945557] GPR00: c98b0478 c46e9c70 c46b9860 0000001e c901c501 c0310000 00000018 c0317b7c
[   76.945557] GPR08: 00000000 00000000 00000000 00000115 00000000 1001aae0 0000fff2 0000fff1
[   76.945557] GPR16: c0057b8c 00000000 00000124 000004b0 0000001e c0317800 c98e4904 c4c0a690
[   76.945557] GPR24: c98e4054 00000000 00000020 00000000 c0330000 c4a26610 c469a05c c469a000
[   76.975255] NIP [c98b04a8] 0xc98b04a8
[   76.978909] LR [c98b0478] 0xc98b0478
[   76.982474] Call Trace:
[   76.984914] [c46e9c70] [c98b0478] 0xc98b0478 (unreliable)
[   76.990319] [c46e9cb0] [c01152d8] crypto_create_tfm+0x88/0xd4
[   76.996061] [c46e9cd0] [c01153b8] crypto_alloc_tfm+0x94/0xe4
[   77.001717] [c46e9d00] [c98dfc18] init_module+0x25c18/0x28988 [tcrypt]
[   77.008239] [c46e9dc0] [c98e1cf8] init_module+0x27cf8/0x28988 [tcrypt]
[   77.014761] [c46e9de0] [c98ba084] init_module+0x84/0x138 [tcrypt]
[   77.020851] [c46e9e00] [c00021b4] do_one_initcall+0xe0/0x1a0
[   77.026512] [c46e9e30] [c005a730] load_module+0x18ac/0x1cc0
[   77.032079] [c46e9ee0] [c005ac3c] SyS_init_module+0xf8/0x10c
[   77.037734] [c46e9f40] [c000c23c] ret_from_syscall+0x0/0x3c
[   77.043312] --- Exception: c01 at 0x48061aa8
[   77.043312]     LR = 0x10001a10
[   77.050699] Instruction dump:
[   77.053659] 80df005c 7fa90034 5529d97e 2e090000 54c615fa 4092000c 813d0080 48000008
[   77.061413] 39200000 39400000 0f0a0000 3f80c033 <81290010> 815cc1d8 3c9e4000 5484c9f4
[   77.069342] ---[ end trace fba9b540001cbc0b ]---
[   77.073949]

Can anybody advise me where to look next?

Thanks in advance.

Markus

lunminliang
NXP Employee

Hi,

Which SDK are you using?

There is a bug and a patch that might be pertinent to this error message:

crypto: caam - remove duplicated sg copy functions

Replace equivalent (and partially incorrect) scatter-gather functions with ones from crypto-API.

The replacement is motivated by page-faults in sg_copy_part triggered by successive calls to crypto_hash_update. The following fault appears after calling crypto_ahash_update twice, first with 13 and then with 285 bytes:

Unable to handle kernel paging request for data at address 0x00000008
Faulting instruction address: 0xf9bf9a8c
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=8 CoreNet Generic
Modules linked in: tcrypt(+) caamhash caam_jr caam tls
CPU: 6 PID: 1497 Comm: cryptomgr_test Not tainted 3.12.19-rt30-QorIQ-SDK-V1.6+g9fda9f2 #75
task: e9308530 ti: e700e000 task.ti: e700e000
NIP: f9bf9a8c LR: f9bfcf28 CTR: c0019ea0
REGS: e700fb80 TRAP: 0300 Not tainted (3.12.19-rt30-QorIQ-SDK-V1.6+g9fda9f2)
MSR: 00029002 <CE,EE,ME> CR: 44f92024 XER: 20000000
DEAR: 00000008, ESR: 00000000
GPR00: f9bfcf28 e700fc30 e9308530 e70b1e55 00000000 ffffffdd e70b1e54 0bebf888
GPR08: 902c7ef5 c0e771e2 00000002 00000888 c0019ea0 00000000 00000000 c07a4154
GPR16: c08d0000 e91a8f9c 00000001 e98fb400 00000100 e9c83028 e70b1e08 e70b1d48
GPR24: e992ce10 e70b1dc8 f9bfe4f4 e70b1e55 ffffffdd e70b1ce0 00000000 00000000
NIP [f9bf9a8c] sg_copy+0x1c/0x100 [caamhash]
LR [f9bfcf28] ahash_update_no_ctx+0x628/0x660 [caamhash]
Call Trace:
[e700fc30] [f9bf9c50] sg_copy_part+0xe0/0x160 [caamhash] (unreliable)
[e700fc50] [f9bfcf28] ahash_update_no_ctx+0x628/0x660 [caamhash]
[e700fcb0] [f954e19c] crypto_tls_genicv+0x13c/0x300 [tls]
[e700fd10] [f954e65c] crypto_tls_encrypt+0x5c/0x260 [tls]
[e700fd40] [c02250ec] __test_aead.constprop.9+0x2bc/0xb70
[e700fe40] [c02259f0] alg_test_aead+0x50/0xc0
[e700fe60] [c02241e4] alg_test+0x114/0x2e0
[e700fee0] [c022276c] cryptomgr_test+0x4c/0x60
[e700fef0] [c004f658] kthread+0x98/0xa0
[e700ff40] [c000fd04] ret_from_kernel_thread+0x5c/0x64

sdk/linux.git - Freescale PowerPC Linux Tree

Regards

markusstockhaus
Contributor I

Hello,

Sorry for not mentioning the software environment. For fear of bricking my router, I'm "only" using an OpenWrt firmware image based on the stock 3.10.49 kernel, so it may lack some recent developments.

Nevertheless, I applied your patch, but it did not help. So I dug deeper into the cause of the message "platform ffe31000.jr: failed to flush job ring 0" and found some hints:

1. The job ring initialization code inside caam_jr_init() calls caam_reset_hw_jr() for each detected job ring.

2. caam_reset_hw_jr() tries to flush the requested job ring.

3. That fails with the above message and the function quits with -EIO.

4. caam_jr_init() returns immediately afterwards, so the DMA initialization that follows (dma_alloc_coherent) is skipped entirely.

5. It seems obvious that this is what bites when dma_map_single() is called later.

But:

1. /proc/interrupts still shows the registered job ring interrupts

2. /proc/crypto still shows the registered hash algorithms

A quick guess would be: I have no (enabled) TSEC hardware in my chip.

But how can I check that, especially since the driver seems to detect some information about the hardware?

Markus

lunminliang
NXP Employee

Hi,

May I ask how you tested the async sha256?


Have a great day,
Lunmin

-----------------------------------------------------------------------------------------------------------------------
Note: If this post answers your question, please click the Correct Answer button. Thank you!
-----------------------------------------------------------------------------------------------------------------------

markusstockhaus
Contributor I

Hi Lunmin,

Thanks a lot for your help, and sorry for my naive view of all this; I'm not a developer. Nevertheless, I have set up my own OpenWrt Barrier Breaker build environment, so I can flash any patched Linux 3.10 kernel onto the device.

Regarding the "cfg_io_ports[0:1] selection of eTSEC1" statement: maybe I do not understand it. Where would I make those settings?

Regarding the test: for the software synchronous implementation, which works well, I'm just doing:

> insmod crypto_hash
> insmod sha256_generic
> insmod tcrypt sec=3 mode=304

This will run several SHA256 test cases. Logs can be seen via dmesg.

Almost the same applies to CAAM testing:

> insmod crypto_hash
> insmod caam
> insmod caamhash
> insmod tcrypt sec=3 mode=404 <<< crash here

Mode 404 is the async hash test.

lunminliang
NXP Employee

Hi,

For your question "A quick shot would be: I have no (enabled) TSEC hardware in my chip.", do you mean TSEC or SEC?

If you do not have SEC, how can you test SHA? Please note that the CAAM driver gets the SEC configuration from the device tree; it does not detect SEC directly.

We cannot reproduce your test unless we have the same hardware/software configuration or know how to simulate it on our reference boards.

CAAM consists of multiple kernel modules. The job ring module is the basic one, since all cryptographic operations go through it. The issue is that the job ring flush either failed or timed out during reset, so the input/output ring buffers were never allocated. As a result, no subsequent cryptographic operation can be executed, because SEC descriptors must be copied into the input ring buffer and results copied out of the output ring buffer.

This can also be seen from ‘dmesg’:

[  76.837794] Modules linked in: tcrypt(+) caamhash caam ath9k ath9k_common pppoe ppp_async iptable_nat ath9k_hw ath pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv4 mac80211 ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_id xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT slhc nf_nat_irc nf_nat_ftp nf_nat nf_defrag_ipv4 nf_conntrack_irc nf_conntrack_ftp iptable_raw iptable_mangle iptable_filter ipt_REJECT ip_tables gpio_keys crc_ccitt compat booke_wdt ip6t_REJECT ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables nf_conntrack_ipv6 nf_conntrack nf_defrag_ipv6 dm_crypt dm_mirror dm_region_hash dm_log dm_mod ipv6 algif_skcipher algif_hash af_alg md5 arc4 crypto_blkcipher crypto_hash leds_gpio ehci_platform ehci_hcd fsl_mph_dr_of button_hotplug input_core usbcore nls_base usb_common

The caamhash and caam modules are loaded, but no caam_jr module is loaded, since ‘caam_jr_init’ failed.

The job ring module source code is straightforward. For ‘reset’, it just writes to the job ring command register and waits on the job ring status register.

If this part of the code had an issue, it would have been noticed before, since job ring reset is the foundation for all cryptographic operations on SEC.

/proc/interrupts still shows the registered job ring interrupts.

>>The job ring interrupts are requested before the job ring reset; a failure while loading the job ring module does not dispose of the interrupt mapping.

/proc/crypto still shows the registered hash algorithms

>>The hash algorithms from SEC were registered with the crypto API while the hash module (caam) was loading, and that succeeded.

1: Enable ‘CRYPTO_DEV_FSL_CAAM_DEBUG’ in Kconfig. It prints debug messages in addition to the error messages.

2: Please send the CAAM source code you are using.

3: What does ‘no (enabled) TSEC hardware in my chip’ mean? The CAAM drivers need SEC to work.


Have a great day,
Lunmin

markusstockhaus
Contributor I

Thanks a lot for your patience Lunmin.

Maybe we have a slight misunderstanding: my SHA-1/SHA-256 tests were done with the software algorithms. But back to the errors during job ring initialization.

As mentioned, I have a TP-LINK WDR4900 router, so I guess this is hardware whose exact specs neither of us knows. I'm running OpenWrt Barrier Breaker with the stock Linux 3.10 kernel on it.


To simplify the discussion: caam_reset_hw_jr() does not work on my hardware. It fails in this loop:

/* Send out job ring reset command */
wr_reg32(&jrp->rregs->jrcommand, JRCR_RESET);

/* Wait until reset is complete */
while (((rd_reg32(&jrp->rregs->jrintstatus) & JRINT_ERR_HALT_MASK) ==
        JRINT_ERR_HALT_INPROGRESS) && --timeout)
        cpu_relax();

/* If reset failed, kick out with error */
if ((rd_reg32(&jrp->rregs->jrintstatus) & JRINT_ERR_HALT_MASK) !=
    JRINT_ERR_HALT_COMPLETE || timeout == 0) {
        dev_err(dev, "failed to flush job ring %d\n", jrp->ridx);
        return -EIO;
}

That is the basis for my two questions:

- Do you think this happens because TSEC is optional and not available in the WDR4900 SoC?

- If it is not built in, how can I check that?

Markus

lunminliang
NXP Employee

Hi,

You asked: "Do you think that this might come because TSEC is optional and is not available in the WDR4900 SoC?" What do you mean by this?

As I wrote before: "If you do not have SEC, how can you test SHA? Please note that the CAAM driver gets the SEC configuration from the device tree; it does not detect SEC directly."

Regards

markusstockhaus
Contributor I

Hi Lunmin,

According to AN4938, there are several versions of the P1010/P1014 SoCs, some with security features and some without. I read out the SVR on the TP-Link WDR4900 and it returns 0x80f10110, i.e. a version without security.

So I think this is why the job rings cannot be initialized. Can you confirm that my assumption is right?

Markus

lunminliang
NXP Employee

Hi,

Thanks for the information.

Are you using the supported OpenWrt version, Barrier Breaker 14.07, as specified here:

Index of /barrier_breaker/14.07/mpc85xx/generic/

In my opinion, the TSEC hardware is enabled, but I do not know the implementation details of the OpenWrt image.

The relevant settings are the reset configuration field cfg_io_ports[0:1], which selects eTSEC1 and enables/disables the eTSEC2 controller, and the POR device status register GUTS_PORDEVSR1.

Regards
