XMC4500 MultiCAN, getting MSGLST in an RX FIFO configuration


User6793
Level 4
Hello.

Just wanted to check if anyone else is seeing this problem.

We have set up a gateway node connecting 2 CAN networks.
Since the HW gateway has its quirks (see https://www.infineonforums.com/threads/5360-Problems-with-FIFO-Gateway-on-XMCs-4400-Multi...), we opted for setting up 4 RX FIFOs and letting the IRQ routines do the data shuffling.

This scheme works almost OK, but we are experiencing MSGLST on an MO in an RX FIFO even though the FIFO is not full.
The CAN_RXOF IRQ is enabled (and tested OK), but it is not asserted in these MSGLST cases.
The fault rate is about 1-2 messages in roughly 40 million, so the FIFOs wrap around numerous times. We also have a FIFO high-water-mark monitor, and it shows that the FIFO holds at most 2 unread messages at the time MSGLST is set.
FIFO sizes vary from 8 to 20 MOs.
We have also implemented a test command where the RX IRQ routine can skip n readouts, thereby filling the FIFO, and that also works fine: the next IRQ that reads empties the FIFO.
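
For reference, our high-water-mark figure is derived from the FIFO base object's CUR pointer. Below is a minimal sketch of the idea, assuming the FIFO slave MOs occupy one contiguous MO range; the names fifoFillLevel, fifoStart, fifoLen and readIndex are illustrative, not our production code:

#include "XMC4500.h" // device header: CAN_MO_TypeDef and CAN_MO_MOFGPR_* masks

// Sketch of a FIFO fill-level / high-water-mark monitor. Assumes the FIFO
// occupies the contiguous MO range [fifoStart, fifoStart + fifoLen - 1] and
// readIndex is the MO number the software will service next.
static uint32_t fifoFillLevel(CAN_MO_TypeDef* fifoBase, uint32_t fifoStart,
                              uint32_t fifoLen, uint32_t readIndex)
{
    // CUR of the base object points at the MO the next received frame lands in.
    uint32_t cur = (fifoBase->MOFGPR & CAN_MO_MOFGPR_CUR_Msk) >> CAN_MO_MOFGPR_CUR_Pos;

    // Unread elements sit between the software read pointer and CUR.
    // Note: a completely full FIFO also reads as 0 here; the RXOF IRQ
    // covers that case.
    return ((cur - fifoStart) + fifoLen - (readIndex - fifoStart)) % fifoLen;
}

The RX IRQ then just tracks the maximum of this value.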

We are using the MSIMASK/MSID HW registers to find out which FIFO to read from.

In all the MultiCAN examples we have come across, there are no examples of how to empty multiple RX FIFOs using the MSIMASK/MSID HW registers.

Hoping someone can chime in on this one.

Here are our IRQ routines:

bool receive(CanMessage* msg)
{
    uint32_t pendingIndex;
    int i;
    // printf("%s:\n", m_owner);
    // printf("MSPND: %08x %08x\n", CAN->MSPND[1], CAN->MSPND[0]);
    // printf("MSIMASK: %08x %08x\n", m_msimask[1], m_msimask[0]);

    if (m_irqOwner != (uint32_t)(PPB->ICSR & PPB_ICSR_VECTACTIVE_Msk) - 16) {
        Can* can = Can::getInstance();
        can->s_wrongIRQ++;
        return false;
    }

    // Scan both MSPND banks (MOs 0-31 and MOs 32-63) for a pending FIFO element.
    for (i = 0; i < 2; i++) {
        CAN->MSIMASK = m_msimask[i];
        pendingIndex = CAN->MSID[i];
        // printf("%-15s[%d] %d\n", "pendingIndex", i, pendingIndex);

        if (pendingIndex == CAN_MSI_NO_PENDING) {
            if (i == 1) {
                return false;
            }
            continue;
        }
        CLR_BIT(CAN->MSPND[i], pendingIndex);
        pendingIndex += 32 * i; // convert bank-relative index to absolute MO number
        break;
    }
    // printf("%-15s %d\n", "pendingIndex", pendingIndex);
    uint8_t baseIndex = m_messageObjects.front().getNumber();
    XMC_CAN_MO_t* baseMoPtr = m_messageObjects.front().getMoPtr();
    XMC_CAN_FIFO_SetSELMO(baseMoPtr, pendingIndex);
    // printf("%-15s %d\n", "baseIndex", baseIndex);
    return m_messageObjects[pendingIndex - baseIndex].receive(msg);
}

And the per-MO readout routine that it dispatches to:

bool receive(CanMessage* msg)
{
    uint32_t mo_message_lost = (uint32_t)((m_xmcCanMo.can_mo_ptr->MOSTAT) & CAN_MO_MOSTAT_MSGLST_Msk) >> CAN_MO_MOSTAT_MSGLST_Pos;

checkAgain:
    m_xmcCanMo.can_mo_ptr->MOCTR = CAN_MO_MOCTR_RESNEWDAT_Msk | CAN_MO_MOCTR_RESRXPND_Msk; // reset NEWDAT & RXPND

    if ((((m_xmcCanMo.can_mo_ptr->MOAR) & CAN_MO_MOAR_IDE_Msk) >> CAN_MO_MOAR_IDE_Pos) == 1U) { // 29-bit ID
        uint32_t identifier = (m_xmcCanMo.can_mo_ptr->MOAR & CAN_MO_MOAR_ID_Msk);
        newCanMessage(msg, identifier & 0x000000ff);
        msg->size = (uint8_t)((uint32_t)((m_xmcCanMo.can_mo_ptr->MOFCR) & CAN_MO_MOFCR_DLC_Msk) >> CAN_MO_MOFCR_DLC_Pos);
        msg->identifier = identifier;
        msg->nodeNum = (identifier & 0x0000ff00) >> 8;
        uint32_t* dataPtr = reinterpret_cast<uint32_t*>(msg->data);
        if (msg->size > 0) {
            dataPtr[0] = m_xmcCanMo.can_mo_ptr->MODATAL;
        }
        if (msg->size > 4) {
            dataPtr[1] = m_xmcCanMo.can_mo_ptr->MODATAH;
        }
    }

    uint32_t mo_new_data_available = (uint32_t)((m_xmcCanMo.can_mo_ptr->MOSTAT) & CAN_MO_MOSTAT_NEWDAT_Msk) >> CAN_MO_MOSTAT_NEWDAT_Pos;
    uint32_t mo_recepcion_ongoing = (uint32_t)((m_xmcCanMo.can_mo_ptr->MOSTAT) & CAN_MO_MOSTAT_RXUPD_Msk) >> CAN_MO_MOSTAT_RXUPD_Pos;

    // A frame may have arrived while we were reading: if NEWDAT or RXUPD is
    // set again, read the MO once more.
    if (mo_new_data_available || mo_recepcion_ongoing) {
        Can::s_checkedAgain++;
        goto checkAgain;
    }

    if (mo_message_lost) {
        m_xmcCanMo.can_mo_ptr->MOCTR = CAN_MO_MOCTR_RESMSGLST_Msk; // reset lost bit
        if (Can::s_gatewayEnabled) {
            if ((m_number >= CAN_RX12_MOSTART) && (m_number <= CAN_RX12_MOEND)) {
                Can::s_lostCntr1++;
            } else if ((m_number >= CAN_RX2_MOSTART) && (m_number <= CAN_RX2_MOEND)) {
                Can::s_lostCntr2++;
            }
        } else {
            if ((m_number >= CAN_RX_MOSTART) && (m_number <= CAN_RX_MOEND)) {
                Can::s_lostCntr1++;
            }
        }
        // panic_printf(PANIC_USER, "CAN MSG LOST, MO%02d, idle %d%%", m_number, getIdlePercent());
        // panic_printf(PANIC_USER, "LOST MO%02d %08x %08x", m_number, CAN->MSPND[1], CAN->MSPND[0]);
        // printf("\n\nLOST MO%02d %08x %08x\n\n", m_number, CAN->MSPND[1], CAN->MSPND[0]);
        // return false;
    }

    // if (mo_recepcion_ongoing) {
    //     panic_printf(PANIC_USER, "CAN MSG RX BUSY, MO%02d", m_number);
    // }

    return true;
}


(Also opened a support case: Case:3740999)
Not applicable
I've done a bunch of testing of CAN code, running packets for days straight, without issues, although I'm not using the DAVE libraries; we're using our own code. We haven't seen dropped packets.

One thing that's noticeable about your code is that you're doing a lot in the ISR, including changing variables which seem to be part of other classes. I assume they're declared volatile? It makes me wonder if what you're seeing is a race condition. Is it possible that in some rare condition the ISR triggers at just the right time to corrupt some variable (i.e., in the middle of a read-modify-write operation), and that's producing the symptom you're seeing?
User6793
Level 4
Seems like we smoked this one out 🙂
The problem was our FIFO MO allocation combined with the use of the MSIMASK register to obtain a pending index from MSID.
We had 4 RX FIFOs (and 2 TX MOs, MO0 & MO32); 2 FIFOs had 8 MOs allocated and 2 had 20.
They were allocated sequentially from the pool, so one FIFO ended up with MOs both in the upper part of the 0-31 range and in the lower part of the 32-63 range.
Hence we had to check twice for a pending index, and that is where it went wrong: some messages (~1 per million) were not read by the IRQ routine and produced a MSGLST when the FIFO wrapped into them.
Simply re-allocating the MOs so that no FIFO spans both the 0-31 and the 32-63 range fixed the problem; MSIMASK then produces a consistent pending index from just one MSID register.
Our FIFO allocation now looks like this (a single-bank readout sketch follows the listing):

CAN RX FIFO: CAN_RX    length:  8, CAN1, filterID: 03002300, filterMask: FFFFFF80, msiMask: 00000000 000001FE, irq: 7, irqOF: 6
CAN RX FIFO: CAN_RX2   length: 23, CAN2, filterID: 00000000, filterMask: 00000000, msiMask: 00000000 FFFFFE00, irq: 1, irqOF: 6
CAN RX FIFO: CAN_RXGL  length:  8, CAN1, filterID: 03000000, filterMask: FFFFFF80, msiMask: 000001FE 00000000, irq: 4, irqOF: 6
CAN RX FIFO: CAN_RX12  length: 20, CAN1, filterID: 03000000, filterMask: FFFF0080, msiMask: 1FFFFE00 00000000, irq: 3, irqOF: 6
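
With this layout every FIFO lives entirely within one MSPND bank, so the two-pass loop from the first post degenerates into a single MSIMASK/MSID query per IRQ. A minimal sketch, where m_bank (0 or 1) is an illustrative per-FIFO member alongside the per-bank mask:

// Single-bank pending-index lookup; m_bank (0 or 1) and m_msimask are
// illustrative per-FIFO members, not necessarily our production names.
CAN->MSIMASK = m_msimask[m_bank];          // restrict MSID to this FIFO's MOs
uint32_t pendingIndex = CAN->MSID[m_bank]; // lowest pending MO in this bank

if (pendingIndex != CAN_MSI_NO_PENDING) {
    CLR_BIT(CAN->MSPND[m_bank], pendingIndex); // acknowledge the pending bit
    pendingIndex += 32u * m_bank;              // absolute MO number (0-63)
    // ... select the slave MO via XMC_CAN_FIFO_SetSELMO() and read it out as before ...
}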
Not applicable
Good catch.
User6793
Level 4
This was not the silver bullet we hoped for.
Our problem with 'false' MSG_LOST continued.
Then we realised we were on version 2.1.8 of XMClib.
Stepping up to 2.1.18, with the change introduced to void XMC_CAN_MO_Config(const XMC_CAN_MO_t *const can_mo), seemed to do the trick.

The problem was that TX events were enabled on RX MOs and vice versa. That is why this was only an issue on our gateway nodes: they have a lot of continuous TX & RX on both the CAN1 & CAN2 modules.
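
For anyone stuck on an older XMClib, the effect of that change can be approximated with a post-configuration fixup along these lines. This is only a sketch using the device-header bit masks, not the actual XMClib 2.1.18 patch:

// Sketch: ensure RX MOs only have RX events enabled and TX MOs only TX
// events, which is what the XMClib 2.1.18 change addresses for us.
static void fixupMoEventEnables(CAN_MO_TypeDef* mo, bool isReceiveMo)
{
    if (isReceiveMo) {
        mo->MOFCR &= ~CAN_MO_MOFCR_TXIE_Msk; // no TX event on a receive MO
        mo->MOFCR |= CAN_MO_MOFCR_RXIE_Msk;  // keep the RX event enabled
    } else {
        mo->MOFCR &= ~CAN_MO_MOFCR_RXIE_Msk; // no RX event on a transmit MO
        mo->MOFCR |= CAN_MO_MOFCR_TXIE_Msk;  // keep the TX event enabled
    }
}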
User6793
Level 4
And still we are getting MSG_LOST a few times every day (SW FIFO scheme).
Unsure of what to try next.

We have a FIFO high-water-mark logging scheme, and it never shows more than 1 MO in use.
We have a test where we can skip n IRQ readings of the FIFO so that it fills up; it is emptied OK on the (n+1)th IRQ.
We have an IRQ time measurement; it is always in the 14-20 us range.

The fault is only seen on the gateway nodes, with 'simultaneous' TX and RX on both CAN1 & CAN2.
On nodes with just CAN1 it is not a problem.

We use systems with 4-7 gateway nodes, and the problem appears on a random node (or nodes) and stays on that node. The other nodes are OK.

Any suggestions on how to proceed would be most welcome.
SunYajun
Employee
When using a gateway with FIFO, normally one message object is configured as the RxMO on the gateway source side, and a couple of message objects are configured as TxFIFO MOs on the gateway destination side.
The 4 RxFIFOs mentioned here should be 4 TxFIFOs on the gateway destination side, right?
Regarding your problem "... experiencing MSGLST on an MO in the RX FIFO even though the FIFO is not full", please see the replies in https://www.infineonforums.com/threads/5360-Problems-with-FIFO-Gateway-on-XMCs-4400-Multican?
Each message object has 2 interrupt triggers (the TxOk and RxOk interrupts). When a TxFIFO structure is used, the overflow interrupt (water-level interrupt) of the TxFIFO base object is generated on the RxOk interrupt of this base object, so 2 interrupt sources share one interrupt trigger line.
Extract from the XMC user's manual: "If bit field MOFCRn.OVIE ("Overflow Interrupt Enable") of the FIFO base object is set and the current pointer CUR becomes equal to MOFGPRn.SEL, a FIFO overflow interrupt request is generated. The interrupt request is generated on interrupt node RXINP of the base object after postprocessing of the received frame. Receive interrupts are still generated for the Transmit FIFO base object if bit RXIE is set." (See the register sketch at the end of this reply.)
MSGLST indicates a lost/overwritten message, but it cannot trigger an interrupt.
The AppNote AP32300 has a detailed description of FIFO/Gateway with init code; maybe it can help: https://www.infineon.com/dgdl/Infineon-MultiCAN-XMC4000-AP32300-AN-v01_00-EN.pdf?fileId=5546d4624e76...
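
To illustrate the manual extract above, arming the base object's overflow (water-level) interrupt at register level could look roughly like this; base and watermark are illustrative names, not code from the AppNote:

// Sketch based on the quoted manual text: raise an interrupt request on the
// base object's RXINP node when the current pointer CUR reaches SEL.
static void armFifoOverflowIrq(CAN_MO_TypeDef* base, uint32_t watermark)
{
    // SEL is the MO number at which CUR == SEL triggers the overflow request.
    base->MOFGPR = (base->MOFGPR & ~CAN_MO_MOFGPR_SEL_Msk)
                 | ((watermark << CAN_MO_MOFGPR_SEL_Pos) & CAN_MO_MOFGPR_SEL_Msk);

    base->MOFCR |= CAN_MO_MOFCR_OVIE_Msk; // enable the overflow interrupt
}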
User6793
Level 4
Thanks for chiming in on this:

When using a gateway with FIFO, normally one message object is configured as the RxMO on the gateway source side, and a couple of message objects are configured as TxFIFO MOs on the gateway destination side.
The 4 RxFIFOs mentioned here should be 4 TxFIFOs on the gateway destination side, right?


We are not using the MultiCAN HW gateway feature. We use 3 RX FIFOs on CAN1 (which we call the main bus), each with a different acceptance mask, and 1 RX FIFO on CAN2 (the sub-bus). CAN1 and CAN2 each use just a single TX MO, where the buffering is done using FreeRTOS queues.
Each RX FIFO base MO is set to generate its own IRQ, so we have 4 RX IRQs, 2 TX IRQs and the RXOF & ALERT IRQs, 8 in total.

Regarding your problem "... experiencing MSGLST on an MO in the RX FIFO even though the FIFO is not full", please see the replies in https://www.infineonforums.com/threa...4400-Multican?


I will check that out.
Edit: this was HW gateway related, and the reason why we abandoned it and went for a 'manual' data shuffling scheme (what we call the SW gateway).

Each message object has 2 interrupt triggers (the TxOk and RxOk interrupts). When a TxFIFO structure is used, the overflow interrupt (water-level interrupt) of the TxFIFO base object is generated on the RxOk interrupt of this base object, so 2 interrupt sources share one interrupt trigger line.


We are not using a TxFIFO structure as such, just a single TX MO for CAN1 and one for CAN2.

Extract from the XMC user's manual: "If bit field MOFCRn.OVIE ("Overflow Interrupt Enable") of the FIFO base object is set and the current pointer CUR becomes equal to MOFGPRn.SEL, a FIFO overflow interrupt request is generated. The interrupt request is generated on interrupt node RXINP of the base object after postprocessing of the received frame. Receive interrupts are still generated for the Transmit FIFO base object if bit RXIE is set."
MSGLST indicates a lost/overwritten message, but it cannot trigger an interrupt.


We have enabled both the RXOF & ALERT IRQs, and they never bark. On every RX IRQ we check the FIFO high-water mark, and it is always just 1. Still, since every received message steps the FIFO's CUR pointer through all of its MOs, sooner or later one of them indicates MSGLST. It can happen on both CAN1 and CAN2. It never happens on nodes using just CAN1. It can work fine for hours with varying bus load, but when MSGLST does happen, we have confirmed that the message was actually lost.

The AppNote AP32300 has a detailed description of FIFO/Gateway with init code; maybe it can help: https://www.infineon.com/dgdl/Infine...4ed91d6be32110


I will check this out too.
Edit: I have been through these examples before and could not find a good example applicable to our setup.
Keep in mind that it can work OK for hours, with 60-70% bus load, between MSGLSTs.
What we fear is that this is a HW problem in the chip, caused by simultaneous events on CAN1 and CAN2 operating on the common MO pool.