VIU3 to I2C SCL interference?

kef2 · ‎01-23-2014

Hi

I have a weird I2C2 SCL clock issue. Please see the attached oscillogram.

Digital chn 0 and analogue chn 1 is I2C2 SCL signal

Digital chn 1 - I2C2 SDA

Digital chn 2 - one of VIU3 pixel data inputs

Digital chn 3 - VIU3 HSYNC input

Digital chn 5 - VIU3 PIXEL_CLK input

It seems the SCL high pulse can be cut short by VIU3 pixel data activity. Whether the pixel clock is 10 MHz or 40 MHz, SCL may still get broken. I have a 3rd-party CMOS camera module attached to a TWR-VF65GS10 card. For some reason this camera module passes SCL through an integrating RC circuit with R = 22 Ohm, C = 680 pF. That's why the SCL rising edges are quite round. The issue persists even if I reduce the SCL frequency to <10 kHz (which I think should make the rise time too short, relative to the clock period, to be relevant). Luckily I have a workaround for this problem: making the SCL pin fully driven (push-pull) instead of open drain solves the issue.

What's worse, once such a short SCL pulse is generated, the I2C module seems to enter some weird state. For example, sometimes I see bus busy in the status register, though SDA and SCL are both idling high and MSSL=0.

Am I wrong in thinking that once the I2C master wins arbitration, nothing on SCL or SDA should be able to shorten SCL-high pulses? SCL-low can be stretched by a slave device, but SCL-high shouldn't be affected, right?

If it matters, here's the connection list:

// cmos_pixclk pci_primary_a8 ptb15

// cmos_hsync pci_primary_b40 ptb6 ? J23 remove jumper from pin 4, SCI2_TX; or keep unused

// cmos_vsync pci_primary_b39 ptb7 J24 remove jumper from pin 4, SCI2_RX; or keep unused

// cmos_d0 pci_primary_a19 ptc3

// cmos_d1 pci_primary_a20 ptc4

// cmos_d2 pci_primary_b13 ptc5

// cmos_d3 pci_primary_b19 ptc6

// cmos_d4 pci_primary_b20 ptc7

// cmos_d5 pci_primary_b46 ptb19

// cmos_d6 pci_primary_b44 ptb20

// cmos_d7 pci_primary_b45 ptb21

// cmos_d8 pci_primary_a22 pta16 don't enable trace d0 14

// cmos_d9 pci_primary_a39 ptb1 yellow LED 19

// cmos_d10 pci_primary_a38 ptb2 yel/grn LED

// cmos_d11 pci_primary_a37 ptb3 or/red LED 21

I2C2 SCL PTA22

I2C2 SDA PTA23

Any ideas?

Thanks

kef2 · ‎06-18-2014

"Since my prior post, I did try Edward's work-around of selecting high drive strength"

There's no HDRS bit or ISC0 register on Vybrid. There are 3 drive strength selection bits (DSE) in each IOMUXC pad control register. But no, high drive strength didn't work for me either. What works for me is setting up the SCL pin as push-pull instead of open drain (ODE=0 in the pad control register). This would pose problems with slaves that use I2C clock stretching (a slow slave may hold SCL low while the master tries to drive it high), but in my case it works well. I added a series resistor to limit the SCL current in case Vybrid drives SCL high while the slave drives SCL low at the same time.
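The push-pull workaround described above comes down to clearing one bit in the SCL pad control word. The following is a minimal sketch, not taken from the thread: it assumes ODE sits at bit 10 and the 3-bit DSE field at bits 8..6 of the IOMUXC pad control register, which should be checked against the reference manual. The helper names are hypothetical, and the functions work on a plain value so the caller decides which register to read and write back.

```c
#include <stdint.h>

/* Assumed Vybrid IOMUXC pad control field positions; verify them against
 * the reference manual for the actual part before use. */
#define PAD_CTL_ODE        (1u << 10)               /* open-drain enable */
#define PAD_CTL_DSE_SHIFT  6u
#define PAD_CTL_DSE_MASK   (7u << PAD_CTL_DSE_SHIFT) /* 3 drive strength bits */

/* Return the pad control value with open drain disabled (push-pull). */
static uint32_t pad_set_push_pull(uint32_t pad)
{
    return pad & ~PAD_CTL_ODE;
}

/* Return the pad control value with open drain enabled. */
static uint32_t pad_set_open_drain(uint32_t pad)
{
    return pad | PAD_CTL_ODE;
}

/* Return the pad control value with the DSE field set to dse (0..7). */
static uint32_t pad_set_drive_strength(uint32_t pad, uint32_t dse)
{
    return (pad & ~PAD_CTL_DSE_MASK) | ((dse & 7u) << PAD_CTL_DSE_SHIFT);
}
```

On real hardware the result would be written back to the pad control register for PTA22 (I2C2 SCL); with push-pull selected, a series resistor as mentioned above limits the current if a slave pulls SCL low.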

"I get these glitches at random intervals, but on the average they are a little more frequent than once a second. I attempt to read from the bus every 25 ms, so I get a glitch that causes a bad read about every 35 reads, or a bit less than once every 2800 clock cycles."

That's not entirely clear to me. Could you recover once a glitch happens? I found no way to recover except resetting the whole chip... Perhaps our slave devices behave differently. I use both reads and writes to the slave, and writes are more important. I'd like to recover after a write glitch happens and try again, but I found no way to do it. It is a bit scary, and this is why I keep using the ODE=0 workaround. It is not clear how long I would need to test the SCL filter solution before I could tell myself that the problem won't ever happen...

avm · ‎06-19-2014

Edward,

Thanks for the reply and additional information. No HDRS bit? Then perhaps the I2C controllers aren't completely similar between devices.

I looked closer at the Kinetis port pin control registers, and I do see the drive strength and open collector control bits in there. (I knew one could set drive strength, but I didn't realize one could manually control open collector for any pin - nice feature!) I'll give those a try tonight after dinner -- maybe I'll be able to cancel tomorrow's debug session with the board designer, that would save me over an hour of driving time (not to mention fuel!)

My slave device does not lock up after a read error. At least not most of the time. This all came about based on reports from my customer that every so often the touchscreen will lock up and be unresponsive (on the order of weeks between events.) We were finally able to get on the phone together while this was happening, and after a long remote debugging session I had convinced myself that it was a hardware problem, not software. So I started digging deeper, realized there were read errors, realized there were glitched clock pulses, and the rest is history.

Another team of ours using a very similar board and screen setup has occasionally seen bus lockups, and have been able to successfully reset the slave chip to recover. I tried that, but I am seeing so many bus errors that I was spending more time resetting the chip than I was running, and it brought my system to its knees.

I'd like to get the frequency of these errors knocked down. Then, if low enough it may be safe to reset whenever there is an error, or perhaps I will count errors and if there are too many per second I'll assume the bus finally locked up and reset it.
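The "count errors and reset only if there are too many per second" idea can be sketched as a small windowed counter. This is an illustration only: the window length, the threshold, and all names are hypothetical, and the caller is assumed to supply a millisecond tick.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy values; tune them for the real system. */
#define ERR_WINDOW_MS  1000u
#define ERR_THRESHOLD  5u

typedef struct {
    uint32_t window_start_ms;   /* start of the current counting window */
    uint32_t errors_in_window;  /* bus errors seen in that window */
} i2c_err_monitor;

static void err_monitor_init(i2c_err_monitor *m, uint32_t now_ms)
{
    m->window_start_ms = now_ms;
    m->errors_in_window = 0;
}

/* Record one bus error at time now_ms. Returns true when more than
 * ERR_THRESHOLD errors fell into one window, i.e. the caller should
 * assume a lock-up and reset the slave or the bus. */
static bool err_monitor_record(i2c_err_monitor *m, uint32_t now_ms)
{
    /* Unsigned subtraction keeps this correct across tick wrap-around. */
    if ((uint32_t)(now_ms - m->window_start_ms) >= ERR_WINDOW_MS) {
        m->window_start_ms = now_ms;
        m->errors_in_window = 0;
    }
    m->errors_in_window++;
    if (m->errors_in_window > ERR_THRESHOLD) {
        /* Start a fresh window after signalling a reset. */
        m->window_start_ms = now_ms;
        m->errors_in_window = 0;
        return true;
    }
    return false;
}
```

With roughly one bad read per second, as reported earlier in the thread, a threshold of this size would leave isolated glitches to a simple retry and reserve the reset for bursts.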

Again, thanks for the tips, and thanks for starting this thread!

-- Adam

avm · ‎06-19-2014

Well, I was going to try this tonight, but my fly-wire on the SCL line fell off, and after staring at this computer screen for almost 10 hours today, I'm too bleary-eyed to squint at the tiny parts and re-solder the connection. But at least I did confirm that the current setup on the pin (according to the port control register) is low drive strength and open collector. So it's definitely worth giving this a try.

Besides, I share Edward's fear of the processor trying to pull the clock high while the slave is trying to extend it low. I'm going to keep my appointment with the board designer: there is currently a zero-ohm resistor in-line with the SCL line, near the slave device. So if nothing else, I'll swap that out with an appropriate series resistor (I don't have a stock of 0603 resistors here at home.) Then, if it works, I'll be able to discuss it with the hardware guy and figure out how he'll update the 5 prototype units that are at the customer's site at the other end of the state. (I love it when something becomes someone else's problem!)

naoumgitnik · ‎07-14-2014

Dear Adam and kef2,

If not too late, we have some news for you regarding the potential workaround for the issue:

...a noise spike can cause the I2C bus to lock up...

A problematic scenario can arise if the processor/I2C module gets reset while it is in the middle of mastering a transfer. In this scenario, the external slave might be holding SDA low to transmit a 0 (or ACK). In this case, it will not release SDA until it gets another falling edge on SCL. Even in this case it is not until it tries to transmit a '1' that it will actually release SDA after seeing SCL fall. The end result is that the bus will hang. If the I2C tries to initiate a new transfer, it will hit an "arbitration lost" condition because SDA won't match the address it's sending. There are a couple ways to recover from this scenario:

1: For devices that mux the SCL/SDA pins with GPIO (e.g., Vybrid), the easiest thing is to configure the pins for GPIO operation and toggle SCL until the slave releases SDA. At this point you should be able to resume normal operation.

2: Many devices don't mux SCL/SDA with GPIO, since the I2C I/O cells are often special open-drain cells. A workaround has been reported to work even on these devices. By configuring the I2C for "free data format" and then reading a byte, the I2C will immediately start sending clocks to input data (rather than trying to send an address). This can be used to free up the bus.

As of now, we have confirmation that workaround #1 helped in at least one similar-looking case.
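Workaround #1 can be sketched as follows. This is a minimal illustration, not vendor code: the pin access goes through hypothetical hooks (the pins are assumed to be muxed to GPIO already, with SCL as an output and SDA as an input), and a small fake slave is included only so the loop can be exercised on a host. Nine pulses is a common upper bound for this kind of bus clear.

```c
#include <stdbool.h>

/* Hypothetical board hooks; real code would access the GPIO registers. */
typedef struct {
    void (*scl_write)(void *ctx, bool level);
    bool (*sda_read)(void *ctx);
    void (*delay_half_period)(void *ctx);
    void *ctx;
} i2c_gpio_hooks;

/* Toggle SCL until the slave releases SDA.
 * Returns the number of pulses used, or -1 if SDA is still low
 * after max_pulses pulses. */
static int i2c_bus_unstick(const i2c_gpio_hooks *io, int max_pulses)
{
    int pulses = 0;
    while (!io->sda_read(io->ctx)) {
        if (pulses >= max_pulses)
            return -1;
        io->scl_write(io->ctx, false);
        io->delay_half_period(io->ctx);
        io->scl_write(io->ctx, true);
        io->delay_half_period(io->ctx);
        pulses++;
    }
    return pulses;
}

/* Host-side test double: a slave that holds SDA low until it has seen
 * edges_left falling edges on SCL. */
typedef struct {
    int edges_left;
    bool scl;
} fake_slave;

static void fake_scl_write(void *ctx, bool level)
{
    fake_slave *s = ctx;
    if (s->scl && !level && s->edges_left > 0)
        s->edges_left--;            /* count one falling edge */
    s->scl = level;
}

static bool fake_sda_read(void *ctx)
{
    const fake_slave *s = ctx;
    return s->edges_left == 0;      /* SDA high once released */
}

static void fake_delay(void *ctx)
{
    (void)ctx;                      /* no real timing on the host */
}
```

After the function reports success, the pins would be switched back to the I2C function before normal transfers resume.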

Sincerely, Naoum Gitnik.

kef2 · ‎07-14-2014

Naoum.

thanks for trying, but what you provided is a known I2C issue, common to (nearly?) all I2C devices, which has nothing to do with my issue. Doing #1 or #2 is not enough to recover from a slave-is-driving-SDA-low condition caused by a missing SCL pulse. Even when I disconnected the slave, Vybrid could still report bus busy and refuse to communicate...

naoumgitnik · ‎07-15-2014

Thanks for your quick reply, Edward.

Just wanted to let you know, just in case…

[update]

BTW, regarding your "no way to recover except resetting whole chip... " phrase - another user was able to recover from the I2C "lock-up" by applying one of the workarounds I proposed (yes, the known ones for virtually any I2C block on the market) + resetting the I2C block (not the entire chip!) by using the MDIS bit in the I2Cx_IBCR register.

(If this does not work for you, then the operation corruption in your case is much more severe...)
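The block-level reset mentioned above could look roughly like the sketch below. It is an assumption-laden illustration: MDIS is taken to be bit 7 of IBCR and the bus-busy flag (IBB) bit 5 of IBSR, both to be verified against the reference manual, and the registers are passed in as pointers so the same code can run against fake variables on a host. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed bit positions; check the reference manual for the actual part. */
#define IBCR_MDIS  0x80u   /* module disable */
#define IBSR_IBB   0x20u   /* bus busy flag */

typedef struct {
    volatile uint8_t *ibcr;   /* I2Cx_IBCR control register */
    volatile uint8_t *ibsr;   /* I2Cx_IBSR status register */
} i2c_regs;

/* Disable and re-enable the I2C block via MDIS.
 * Returns 0 if the bus is reported idle afterwards, -1 if still busy. */
static int i2c_block_reset(const i2c_regs *r)
{
    *r->ibcr = (uint8_t)(*r->ibcr | IBCR_MDIS);   /* disable the module */
    *r->ibcr = (uint8_t)(*r->ibcr & ~IBCR_MDIS);  /* enable it again */
    return (*r->ibsr & IBSR_IBB) ? -1 : 0;
}
```

A real driver would probably also rewrite the clock divider and other configuration after re-enabling the block, and combine this with one of the bus-release workarounds described earlier.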

/Naoum.

naoumgitnik · ‎06-18-2014

Dear Adam,

As per our investigation for Vybrid, the only cure here is low-pass filtering - RC, LC, or series ferrite bead, depending on your design specifics.

Regards, Naoum Gitnik.

avm · ‎06-18-2014

Naoum,

Thanks for the fast reply. I was afraid you were going to say that. But at least I take it that you think it's the same issue?

Since my prior post, I did try Edward's work-around of selecting high drive strength (setting the HDRS bit in ISC0_C2) but sadly it made no difference for me.

-- Adam

naoumgitnik · ‎06-18-2014

Hello Adam,

Unfortunately, I am not an IC designer, so I cannot comment on whether the 2 processors share the I2C block design (they might, BTW…).

As a board-level design engineer currently acting as a Vybrid Applications/Support person, I can only, in addition to the previous comment, advise you to better separate the I2C signals from the LCD ones. For example, if you are using a ribbon cable, you surely have GND return wires in it; try placing one (or even several) of them as a separator.

The Kinetis family is supported by another team (Alistair N Muir, it might make sense for you to know about this issue).

Adam, you might think about posting this issue in the Kinetis Community space to possibly get Kinetis-specific feedback from all the Community members, including the customers.

Sincerely, Naoum Gitnik.

kef2 · ‎02-12-2014

Hi Richard,

Thanks. Yes, I know that a slow edge may generate a bunch of pulses behind the digital input buffer, but I'm sure the I2C hardware simply must not cut SCL-high short, no matter what happens. My final HW will have steeper rising SCL edges. Even if that makes the problem look cured, I won't leave SCL open drain, because I have no guarantee of what maximum rise time is 100% safe. Please let me know if you find such a figure or an explanation for what's wrong with Vybrid I2C. Thanks and best regards

Edward

richard_stulens · ‎02-12-2014

Hi Edward,

Indeed, the SCL high cycle should not be cut short. On the other hand, that I2C module is used on many other parts and is in millions of products. Never seen such an issue.

We have too little information to perform a good analysis. We do not know what is triggering this; we do not know if it is a glitch. I have been assuming so, but it has not been seen on a scope image. It may be the remote device that pulls SCL low instead of a glitch. There are multiple possibilities and I have no way of knowing which it is. I do not have a setup that behaves in this way.

We do know that it has to do with pixel data, but we do not know if Vybrid has a problem with that or if it is the slave device.

For example, we have a parallel display with 24-bit RGB data on the tower system. That also generates pixel data and simultaneously switching outputs. I2C is used for the touchpad, but we have not had any reports of I2C issues.

So, I have to rely on your inputs.

I appreciate any help you can give in debugging this issue, but I will also understand if you cannot spend more time on this. Just let me know and I'll stop asking.

If you have more than 1 setup that has this issue, could we have one to measure and investigate further?

Thanks & regards,

Richard

kef2 · ‎02-12-2014

Hi Richard

It can't be the remote device! Yes, a remote device could pull SCL low, but it can't make the master (Vybrid) produce a standard-length SCL-low after a cancelled, short SCL-high. SCL-low would need to be about 2x longer than what I see after the incomplete SCL-high. It also doesn't explain why I can't recover after a cancelled SCL-high. After a wrong SCL pulse appears, the status register may report bus busy, though open-drain SCL and SDA idle high! I saw different odd states, but in all cases I couldn't send more data until reset. Unfortunately I can't spend more time on this, unless you have some good new idea.

I'm curious to know what would happen if you added a capacitor to SCL in your parallel display design. If it is a slow rise time issue, I guess you'll see the same problem. And if so, then there must be some minimum time, though it's odd that by lowering various clocks I found no way to avoid the issue.

Regarding my setup I'll write you personal message.

Thanks

Edward

richard_stulens · ‎02-12-2014

Hi Edward,

I really appreciate the time you spent on investigating this. I have no new ideas to test at this time.

I will check with the team on how we want to proceed. Maybe we can setup a simulation to verify some theories.

If we find a problem, it will be published in the Errata.

Thanks & regards,

Richard

kef2 · ‎02-19-2014

Hi Richard,

I have a small update; perhaps it matters, no good news though. I found that the camera chip has an output slew rate control register, which was left in the middle speed position. I tried the slowest rate setting, which seems to be too slow, since image quality is poor with it. The next-to-slowest setting clocks the image in properly. Neither slew rate setting makes things better for I2C. I didn't mention it above, but in addition to the slew rate control settings, all camera data, sync and pixel clock wires have series 22 Ohm resistors.

richard_stulens · ‎02-19-2014

Hi Edward,

Thanks for the feedback. Looks like the camera is built to try to avoid cross coupling.

We have found through simulation that a glitch (or spike) can actually trigger the observed behavior. We are now investigating further.

If you ever see an opportunity to capture the rising edge of SCL for the error case, that would still be helpful as it will give us some data to match against.

Best regards,

Richard

kef2 · ‎02-20-2014

Hi Richard,

thanks. A capture could be taken, but I see nothing interesting or new in it, just an RC step response. RC components are as follows.

Best regards,

Edward

naoumgitnik · ‎02-20-2014

Gentlemen,

Sorry for jumping in with my 2 cents, but the R6*C2 time-constant is about 1 microsecond. Might it influence the interface performance or not..?

Regards, Naoum Gitnik.

kef2 · ‎02-20-2014

Naoum,

Yes, most likely it influences the results. But something tells me that a shorter R*C will just lower the probability of the I2C module locking up, but won't eliminate it completely. Anyway, a <<10 kHz SCL clock should be OK with a 1 us rise time, but unfortunately it isn't.
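The scale of the numbers being discussed can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: a single-pole RC, a 10% to 90% rise time of tau * ln(9), and a 50% SCL duty cycle; the function names are hypothetical.

```c
/* ln(9): 10% to 90% rise time of a single-pole RC in units of tau. */
#define LN_9 2.1972245773

/* Time constant of an RC stage, in seconds. */
static double rc_time_constant_s(double r_ohm, double c_farad)
{
    return r_ohm * c_farad;
}

/* 10% to 90% rise time for a given time constant, in seconds. */
static double rc_rise_time_10_90_s(double tau_s)
{
    return tau_s * LN_9;
}

/* Rise time as a fraction of the SCL high phase, assuming 50% duty cycle. */
static double rise_fraction_of_high_phase(double rise_s, double scl_hz)
{
    return rise_s / (0.5 / scl_hz);
}
```

Taking the roughly 1 us time constant quoted above, the 10% to 90% rise time is about 2.2 us, which is under 5% of the 50 us high phase of a 10 kHz clock. For comparison, the 22 Ohm and 680 pF values from the first post give only about 15 ns on their own, so the 1 us figure presumably involves other components of the schematic (an assumption, since the schematic is not reproduced here).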

Regards,

Edward

richard_stulens · ‎02-20-2014

Edward, Naoum,

I think it will probably help filter glitches on the camera side, but because it reduces the edge rate, it makes the Vybrid side more sensitive.

It may help to put a similar cap on the Vybrid side; even though the edge will be even slower, it may keep the glitches out.

Best regards,

Richard