Endless active error frames on CAN bus by XMC4500 node

User20457
Level 1
We have multiple XMC4500 microcontrollers connected to a single CAN bus that is controlled by an embedded PC. Under certain stress conditions (in particular when we cycle the ambient temperature between room temperature and 100 °C), some of the microcontrollers send out CAN error active frames endlessly. According to the CAN specification, bus nodes should go from error active to error passive and finally to bus off, depending on the error count. This is not what I see, and I wonder why.

My understanding is that the XMC implements the error handling for CAN in hardware according to the CAN specification. How can it be that the XMC sends out error active frames as if there's no error counter? I also couldn't find an explanation for this behavior in our firmware. How can I make the microcontroller stop sending error frames when that happens? Could this behavior be caused by a hardware defect?
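
For reference, the fault-confinement rule described above can be sketched in a few lines of C. This is only an illustration of the thresholds in the CAN specification (error passive when either counter exceeds 127, bus off when the transmit counter exceeds 255), not XMC firmware; the type and function names are hypothetical.

```c
#include <stdint.h>

/* Fault-confinement states defined by the CAN specification. */
typedef enum {
    CAN_ERROR_ACTIVE,
    CAN_ERROR_PASSIVE,
    CAN_BUS_OFF
} can_fault_state_t;

/* Classify a node from its transmit (TEC) and receive (REC) error counts.
 * Hypothetical helper: counts are plain integers here. A real controller
 * may report bus off only through a status flag, because its TEC field
 * is often just 8 bits wide. */
can_fault_state_t can_fault_state(unsigned tec, unsigned rec)
{
    if (tec > 255u) {
        return CAN_BUS_OFF;
    }
    if (tec > 127u || rec > 127u) {
        return CAN_ERROR_PASSIVE;
    }
    return CAN_ERROR_ACTIVE;
}
```

With these thresholds, a node with TEC = 0 and REC = 136 would be classified as error passive, and only the transmit counter can drive a node to bus off.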
jferreira
Employee
Hi,

Did you have a look at the content of 18.3.3.3 Error Handling Unit for debugging the issue?
What do you mean by "How can it be that the XMC sends out error active frames as if there's no error counter?" ?

Regards,
Jesus
DRubeša
Employee
Hello,

can you please provide a snapshot of the values of the following registers once the issue occurs: NCRx, NSRx, NECNT. It would also be helpful if you could provide an oscilloscope/logic analyzer capture that shows the error active frame on the bus.
User20457
Level 1
Thanks for your responses.

jferreira wrote:
Hi,

Did you have a look at the content of 18.3.3.3 Error Handling Unit for debugging the issue?
What do you mean by "How can it be that the XMC sends out error active frames as if there's no error counter?" ?

Regards,
Jesus


Yes, and the error counters mentioned there are what I'm referring to. It seems that neither of the two error counters reaches the error-passive limit of 128, and I don't understand why, because the microcontroller never stops sending error-active frames once it enters the error condition (probably a hardware fault). It just keeps sending those error-active frames and disturbs the whole bus by doing so.

Regarding the register values and the snapshot: I'm working on that now.
User20457
Level 1
Sorry for the late reply ... I don't have the register state yet, but I can provide you the scope snapshot already: 4778.attach
User20457
Level 1
Again, sorry for the late reply. It wasn't easy to capture the register state while this issue was happening, but I have now managed to do it, so I can provide detailed information about the register states when the error occurs.

First the captured logs from when errors get handled as expected (all values in hex):

LEC Stuf NCR 00C NSR 019 NECNT 00600001 STAT0 200016E0 STAT32 20001AE0 FGPR0 00000000 FGPR32 20202020 FCR0 00040001 FCR32 06040002
LEC Stuf NCR 00C NSR 019 NECNT 0060000C STAT0 200016E0 STAT32 20001AE0 FGPR0 00000000 FGPR32 20202020 FCR0 00040001 FCR32 06040002
LEC Stuf NCR 00C NSR 019 NECNT 00600018 STAT0 200016E0 STAT32 20001AE0 FGPR0 00000000 FGPR32 20202020 FCR0 00040001 FCR32 06040002
LEC Stuf NCR 00C NSR 019 NECNT 00600023 STAT0 200016E0 STAT32 20001AE0 FGPR0 00000000 FGPR32 20202020 FCR0 00040001 FCR32 06040002
LEC Stuf NCR 00C NSR 019 NECNT 0060002F STAT0 200016E0 STAT32 20001AE0 FGPR0 00000000 FGPR32 20202020 FCR0 00040001 FCR32 06040002
LEC Stuf NCR 00C NSR 019 NECNT 0060003A STAT0 200016E0 STAT32 20001AE0 FGPR0 00000000 FGPR32 20202020 FCR0 00040001 FCR32 06040002

One can clearly see how NECNT changes its value. After some time the uC returns to error-active mode and bus communication resumes.

Now the case of the "endless error frames". The microcontroller seems to report them as LEC = stuffing error. However, once NECNT reaches 00600088 (REC = 0x88 = 136, TEC = 0), it stays there and no longer changes. The same happens with the other XMC microcontrollers that are listening on the CAN bus.

LEC Stuf NCR 00C NSR 019 NECNT 00600001 STAT0 200016E0 STAT32 20001AE0 FGPR0 00000000 FGPR32 20202020 FCR0 00040001 FCR32 05040002
LEC Stuf NCR 00C NSR 079 NECNT 00600087 STAT0 200016E0 STAT32 20001AE0 FGPR0 00000000 FGPR32 20202020 FCR0 00040001 FCR32 05040002
LEC Stuf NCR 00C NSR 079 NECNT 00600088 STAT0 200016E0 STAT32 20001AE0 FGPR0 00000000 FGPR32 20202020 FCR0 00040001 FCR32 05040002
(1x more)
Alert NCR 00C NSR 079 NECNT 00600088 STAT0 200016E0 STAT32 20001AE0 FGPR0 00000000 FGPR32 20202020 FCR0 00040001 FCR32 05040002
REC_TEC_EWRN NCR 00C NSR 059 NECNT 00600088 STAT0 200016E0 STAT32 20001AE0 FGPR0 00000000 FGPR32 20202020 FCR0 00040001 FCR32 05040002
LEC Stuf NCR 00C NSR 059 NECNT 00600088 STAT0 200016E0 STAT32 20001AE0 FGPR0 00000000 FGPR32 20202020 FCR0 00040001 FCR32 05040002
(2x more)
LEC Stuf NCR 00C NSR 059 NECNT 00600088 STAT0 200016E0 STAT32 20001AE0 FGPR0 00000000 FGPR32 20202020 FCR0 00040001 FCR32 05040002
(1x more)

...

LEC Stuf NCR 00C NSR 059 NECNT 00600088 STAT0 200016E0 STAT32 20001AE0 FGPR0 00000000 FGPR32 20202020 FCR0 00040001 FCR32 05040002
(1x more)
LEC Stuf NCR 00C NSR 059 NECNT 00600088 STAT0 200016E0 STAT32 20001AE0 FGPR0 00000000 FGPR32 20202020 FCR0 00040001 FCR32 05040002
(2x more)
LEC Stuf NCR 00C NSR 059 NECNT 00600088 STAT0 200016E0 STAT32 20001AE0 FGPR0 00000000 FGPR32 20202020 FCR0 00040001 FCR32 05040002
(1x more)
LEC Stuf NCR 00C NSR 059 NECNT 00600088 STAT0 200016E0 STAT32 20001AE0 FGPR0 00000000 FGPR32 20202020 FCR0 00040001 FCR32 05040002
(1x more)
LEC Stuf NCR 00C NSR 059 NECNT 00600088 STAT0 200016E0 STAT32 20001AE0 FGPR0 00000000 FGPR32 20202020 FCR0 00040001 FCR32 05040002
(1x more)
LEC Stuf NCR 00C NSR 059 NECNT 00600088 STAT0 200016E0 STAT32 20001AE0 FGPR0 00000000 FGPR32 20202020 FCR0 00040001 FCR32 05040002
(1x more)
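
To make the NECNT values in the logs above easier to read, here is a minimal C sketch that splits the register into its fields. It assumes the layout REC = bits 7:0, TEC = bits 15:8, EWRNLVL = bits 23:16, LETD = bit 24 and LEINC = bit 25, which should be checked against the XMC4500 reference manual; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of the node error counter register NECNT (assumed layout). */
typedef struct {
    uint8_t rec;      /* receive error counter,  bits 7:0   */
    uint8_t tec;      /* transmit error counter, bits 15:8  */
    uint8_t ewrnlvl;  /* error warning level,    bits 23:16 */
    uint8_t letd;     /* last error transfer direction, bit 24 */
    uint8_t leinc;    /* last error increment,   bit 25     */
} necnt_fields_t;

/* Split a raw NECNT value into its fields. Hypothetical helper. */
necnt_fields_t necnt_decode(uint32_t necnt)
{
    necnt_fields_t f;
    f.rec     = (uint8_t)(necnt & 0xFFu);
    f.tec     = (uint8_t)((necnt >> 8) & 0xFFu);
    f.ewrnlvl = (uint8_t)((necnt >> 16) & 0xFFu);
    f.letd    = (uint8_t)((necnt >> 24) & 0x1u);
    f.leinc   = (uint8_t)((necnt >> 25) & 0x1u);
    return f;
}
```

Under this assumed layout, 00600088 decodes to REC = 136, TEC = 0 and a warning level of 96, which matches the values quoted above.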


When a microcontroller tries to transmit something, the TEC rises quickly after it has managed to send one frame:


LEC Stuf NCR 00C NSR 059 NECNT 00600088 STAT0 200016E0 STAT32 20001EA0 FGPR0 00000000 FGPR32 20202020 FCR0 00040001 FCR32 08040002
NO_ERROR NCR 00C NSR 058 NECNT 0060007F STAT0 200016E0 STAT32 20001EA0 FGPR0 00000000 FGPR32 20202020 FCR0 02040001 FCR32 08040002
LEC Stuf NCR 00C NSR 059 NECNT 00600080 STAT0 200016E0 STAT32 20001EA0 FGPR0 00000000 FGPR32 20202020 FCR0 02040001 FCR32 08040002
LEC Stuf NCR 00C NSR 059 NECNT 00600080 STAT0 200016E0 STAT32 20001FE0 FGPR0 00000000 FGPR32 20202020 FCR0 00040001 FCR32 06040002
LEC Bit1 NCR 00C NSR 05C NECNT 03605080 STAT0 200016E0 STAT32 20001FE0 FGPR0 00000000 FGPR32 20202020 FCR0 00040001 FCR32 06040002
LEC Bit1 NCR 00C NSR 07C NECNT 0360B080 STAT0 200016E0 STAT32 20001FE0 FGPR0 00000000 FGPR32 20202020 FCR0 00040001 FCR32 06040002
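
The NSR values in these logs can be decoded in the same way. The sketch below assumes the bit positions LEC = bits 2:0, TXOK = bit 3, RXOK = bit 4, ALERT = bit 5, EWRN = bit 6 and BOFF = bit 7, and the LEC encoding 1 = stuff, 2 = form, 3 = ack, 4 = Bit1, 5 = Bit0, 6 = CRC; both should be verified against the reference manual. The macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed NSR bit positions; verify against the reference manual. */
#define NSR_LEC_MASK 0x7u
#define NSR_TXOK     (1u << 3)
#define NSR_RXOK     (1u << 4)
#define NSR_ALERT    (1u << 5)
#define NSR_EWRN     (1u << 6)
#define NSR_BOFF     (1u << 7)

/* Extract the 3-bit last error code. */
unsigned nsr_lec(uint32_t nsr)
{
    return (unsigned)(nsr & NSR_LEC_MASK);
}

/* Map the last error code to a short name (assumed encoding). */
const char *nsr_lec_name(uint32_t nsr)
{
    static const char *const names[8] = {
        "no error", "stuff", "form", "ack",
        "bit1", "bit0", "crc", "written by CPU"
    };
    return names[nsr_lec(nsr)];
}

/* Return 1 if the given status flag is set, 0 otherwise. */
int nsr_flag(uint32_t nsr, uint32_t mask)
{
    return (nsr & mask) != 0u;
}
```

Under these assumptions, NSR 059 decodes to a stuff error with TXOK, RXOK and EWRN set and BOFF clear, while 05C decodes to a Bit1 error with the same flags, which agrees with the labels in the logs.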


Apart from the stuffing errors, I also sometimes see Bit1 and Bit0 errors. The result is an endless pattern of error frames on the bus that only stops when I reset the microcontrollers. Whenever this happens, the receive error counter REC stays at 0x88 (136) while all these stuffing errors are detected. Shouldn't it increment? And which node sends all these error frames that get interpreted as stuffing errors?