I2C transaction gets lost


axels
Contributor I

Dear All,

We are experiencing a problem with the I2C and DMA hardware of the i.MX28. For some unknown reason, the write that sets the page number is occasionally never sent on the I2C bus, and the driver never reports any problem with this transaction.

While communicating with an EEPROM attached to the i.MX28, we noticed that the data read back from the EEPROM was sometimes corrupted. This happens only occasionally, which makes it hard to debug.
After attaching a logic analyzer to an automated test setup, we were able to see that the page number was never written to the EEPROM, causing it to return values starting from a random address.
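To illustrate why a lost address write produces data from the wrong place, here is a minimal model of a serial EEPROM's internal address counter. It is only a sketch: the struct, the function names, the 256-byte size and the single-byte address are assumptions made for illustration, not details of the actual part on our board.

```c
#include <stddef.h>
#include <stdint.h>

#define EEPROM_SIZE 256u /* assumed size, for illustration only */

/* Hypothetical model of a serial EEPROM. */
struct eeprom_model {
    uint8_t data[EEPROM_SIZE];
    uint8_t addr; /* internal address counter, kept between transactions */
};

/* Write transaction that carries only the address byte: sets the counter. */
void eeprom_set_addr(struct eeprom_model *e, uint8_t addr)
{
    e->addr = addr;
}

/* Read transaction: returns bytes from the current counter onwards. */
void eeprom_read(struct eeprom_model *e, uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = e->data[e->addr++]; /* the 8-bit counter wraps at 256 */
}
```

If eeprom_set_addr() is skipped, as when the write never reaches the bus, eeprom_read() simply continues from wherever the counter was left by the previous access, and that looks like a random address to the caller.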

The strange thing is that the driver (code is located in build_dir/linux-imx28_customer/linux-2.6.35.3/drivers/i2c/busses/i2c-mxs.c) never returned an error code, so as far as the software was concerned the transaction completed successfully.
Next, our parser tried to go through the returned data and failed.

While going through the kernel driver code, we noticed that there are two interrupt handlers and that the default value of the error code is success (0). Both the DMA ISR (mxs_i2c_dma_isr) and the I2C ISR (mxs_i2c_isr) handle part of the transaction, and both fire when it finishes (probably always the I2C one before the DMA one, although this might be different in the failing case). Both ISRs wake the blocked thread that is waiting for the transaction to complete (via the cmd_complete flag), which means that whichever comes first wakes the thread.
Only the I2C ISR changes the cmd_err flag; the DMA ISR does not modify it. So if the I2C ISR does not fire, or fires much later than the DMA ISR so that the blocked thread finishes before cmd_err is set (this is a guess), cmd_err is never updated and keeps its default value of success.
We have now changed the code to start from a pessimistic value (-ENETDOWN, a code that should never otherwise occur for I2C), which the I2C ISR has to set to 0. This is currently our way of detecting the occurrence, and it lets us act on it. It is in no way a fix for the root cause of our problem.
The DMA ISR seems to run always (in both passing and failing cases), but the work it performs is limited to finishing up the DMA transfer. This does mean, however, that the blocked thread is always woken, so the 1 s timeout is never triggered.
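The interaction between the two handlers and the default error value can be modelled in a few lines of plain C. This is a simplified user-space sketch, not the code from i2c-mxs.c: the struct and function names are hypothetical, and only cmd_complete and cmd_err correspond to the flags mentioned above.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's per-transfer state. */
struct xfer_state {
    bool cmd_complete; /* set by whichever handler runs first */
    int cmd_err;       /* written only by the I2C handler */
};

/* Arm a transfer with the chosen default error value. */
void xfer_start(struct xfer_state *s, int initial_err)
{
    s->cmd_complete = false;
    s->cmd_err = initial_err;
}

/* Models the DMA handler: wakes the waiter, leaves cmd_err alone. */
void model_dma_isr(struct xfer_state *s)
{
    s->cmd_complete = true;
}

/* Models the I2C handler: records the bus status, then wakes the waiter. */
void model_i2c_isr(struct xfer_state *s, int bus_status)
{
    s->cmd_err = bus_status;
    s->cmd_complete = true;
}

/* What the blocked thread would return at this point in time. */
int xfer_result(const struct xfer_state *s)
{
    return s->cmd_complete ? s->cmd_err : -ETIMEDOUT;
}
```

With xfer_start(&s, 0), a call to model_dma_isr() alone makes xfer_result() return 0, which is the silent false success. With xfer_start(&s, -ENETDOWN), the same sequence returns -ENETDOWN, so the missing I2C interrupt becomes visible to the caller.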
The recovery mechanism consists of checking for our -ENETDOWN value and, when it occurs, retrying for at most 10 attempts. So far, our logs show that only one retry is ever required.
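A rough sketch of such a retry wrapper is shown here. The callback type, the function names and the reading of "at most 10 attempts" as 10 transfers in total are assumptions made for this example; the code in our tree differs in detail.

```c
#include <errno.h>

#define MAX_ATTEMPTS 10 /* assumed to mean 10 transfers in total */

/* Hypothetical transfer callback: returns 0 or a negative errno value. */
typedef int (*xfer_fn)(void *ctx);

/*
 * Repeat the transfer only while it reports the -ENETDOWN sentinel.
 * Any other result, success or a real error, is returned at once.
 * The number of attempts made is stored in *attempts when non-NULL.
 */
int xfer_with_retry(xfer_fn xfer, void *ctx, int *attempts)
{
    int ret = -ENETDOWN;
    int n = 0;

    while (n < MAX_ATTEMPTS) {
        ret = xfer(ctx);
        n++;
        if (ret != -ENETDOWN)
            break;
    }
    if (attempts)
        *attempts = n;
    return ret;
}

/* Example callback: reports the sentinel until the counter in ctx hits 0. */
int flaky_xfer(void *ctx)
{
    int *silent_failures = ctx;

    if (*silent_failures > 0) {
        (*silent_failures)--;
        return -ENETDOWN;
    }
    return 0;
}
```

Note that a blind retry re-sends the whole transaction, which is only safe for devices that tolerate seeing the same message twice.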

For a moment it seemed that this recovery mechanism gave us a way to deal with the problem. Unfortunately, other devices on the I2C bus really don't like the retries and get stuck. So we are forced to keep looking for the root cause, or to find another way to handle the I2C communication!

Has anybody had any similar experience with the I2C and DMA hardware of the i.MX28?

We are running a 2.6.35 kernel.

Thanks in advance,

Axel

Message was edited by: Axel Schollaert. We had to go back on the idea that the recovery mechanism would save us. It turns out that it now causes problems with other components on the I2C bus.

4 Replies

fabio_estevam
NXP Employee

Please try a 3.13 kernel.

Regards,

Fabio Estevam

MarekVasut
Senior Contributor I

I second that.

axels
Contributor I

Dear All,

Further investigation of the failure mechanism, and comparison with more recent kernels, has shown us that there were two interrupt handlers and a worker thread all trying to handle the I2C transfer. As a result, it is possible for a transaction to get aborted.

We can close this topic now.

Thanks,


Axel

Yuri
NXP Employee

You wrote that "the page number was never written to the EEPROM".
Does that mean that the actual transaction sequence for EEPROM data access is not as described in section 27.2.2.2 (Typical EEPROM Transactions) of the i.MX28 Reference Manual?
