Nov 21, 2019
03:31 PM
Here's a list of bugs I have found in the iLLD Ethernet driver:
1. If the interface is configured as RMII, the code fails to configure the SMI (MDC/MDIO) pins.
2. The pin mux tables contain incorrect mux data for Ethernet for some pins.
So if the SMI interface is not working, manually verify the pin mux table entries for the pins you are using.
3. IfxEth_wakeupTransmitter() and IfxEth_wakeupReceiver() are broken.
They do not start the transmitter or receiver when it is in the STOPPED state after reset.
4. The TriCore cores have a data cache, and the Ethernet GMAC uses DMA to access its descriptors and buffers. That DMA is not cache-coherent, and the Ethernet drivers do not account for this.
Helpful tip:
1. You MUST clear ALL bits in the ETH_STATUS register before exiting from the Ethernet interrupt handler.
If ANY bits are left set, it will prevent the interrupt from being triggered again.
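The interrupt-exit rule above can be sketched roughly as follows. This is not iLLD code: the flag names and the write-1-to-clear behavior are assumptions for illustration, and the real bit layout lives in the device header.

```c
#include <stdint.h>

/* Hypothetical flag positions -- substitute the real ETH_STATUS bit
 * definitions from your device header. */
#define ETH_STATUS_RX_DONE  (1u << 6)
#define ETH_STATUS_TX_DONE  (1u << 0)

/* Service the pending flags and return the write-1-to-clear mask the
 * ISR must store back to ETH_STATUS before returning. Note that even
 * flags we did not explicitly handle are included in the mask, so no
 * bit is left set to block the next interrupt. */
static uint32_t eth_service_status(uint32_t pending)
{
    uint32_t handled = 0u;

    if (pending & ETH_STATUS_RX_DONE) {
        /* ... drain the RX descriptor ring here ... */
        handled |= ETH_STATUS_RX_DONE;
    }
    if (pending & ETH_STATUS_TX_DONE) {
        /* ... reap completed TX descriptors here ... */
        handled |= ETH_STATUS_TX_DONE;
    }

    /* Acknowledge everything, serviced or not. */
    handled |= pending;
    return handled;
}
```

The ISR itself would then end with something like `*eth_status_reg = eth_service_status(*eth_status_reg);` before returning.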
This should provide a starting point for fixing the Ethernet code if you are using it.
I do not work for Infineon, so do not post or send me questions about this.
I will not provide tech support for any problems mentioned in this post.
Toshi
5 Replies
Nov 25, 2019
10:41 AM
Hi Toshi. It's important to note that DMA is never cache-coherent on the AURIX. That holds true for Ethernet DMA and the regular DMA controller.
I usually suggest that CPUx_PMA0 should be set to 0x100 (cache PFLASH segment 8), rather than the default 0x300 (segment 8 and LMU segment 9). That way, access to constants in PFLASH is still boosted with the data cache, but there are no cache coherency issues between CPU cores or with DMA.
Nov 26, 2019
12:33 AM
You can also use segment 0xB for CPU accesses to the LMU. This segment is uncached, so every access through segment 0xB goes to the destination memory regardless of the cache settings.
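As a sketch of this trick: per the AURIX memory map, segment 0x9 is the cached view of LMU RAM and segment 0xB is its uncached alias, so the CPU can bypass the cache by swapping the top segment nibble (verify the segment numbers against your device's manual).

```c
#include <stdint.h>

/* Map a cached LMU address (segment 0x9) to its uncached alias in
 * segment 0xB by replacing the top segment nibble. Segment numbers
 * follow the AURIX memory map; double-check them for your device. */
static inline uintptr_t lmu_uncached_alias(uintptr_t cached_addr)
{
    return (cached_addr & 0x0FFFFFFFu) | 0xB0000000u;
}
```

A DMA descriptor sitting at 0x90001234, for example, would then be accessed by the CPU through 0xB0001234, so no cache maintenance is needed for it.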
Nov 26, 2019
01:45 PM
UC wrangler wrote:
> Hi Toshi. It's important to note that DMA is never cache-coherent on the AURIX.
> That holds true for Ethernet DMA and the regular DMA controller.
The DMA not being cache-coherent is not an AURIX architecture issue per se.
This is due to a design choice made on current AURIX implementations not to support bus snooping on DMA access.
So stating "DMA is never cache-coherent on the AURIX" is an overly broad blanket statement.
There is nothing in the AURIX architecture which precludes future implementations from supporting cache coherency.
> I usually suggest that CPUx_PMA0 should be set to 0x100
If I understand correctly, this disables caching of RAM.
Since LMU/EMEM accesses go over the crossbar, this probably increases load/store latency by two or three clocks due to external bus access and crossbar arbitration.
Based on my previous experience performing dynamic instruction set analysis at a previous job, the typical dynamic instruction mix for non-numeric applications is about 20% branches, 25% load/stores, and 55% ALU instructions.
Assuming a 25% load/store instruction mix and that each load/store takes three clocks instead of one when uncached, the relative execution time is (75% x 1) + (25% x 3) = 150% of baseline.
Stated otherwise, the code will probably run about 1.5x slower if caching is disabled for RAM as you recommend.
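That back-of-the-envelope model can be written out as a quick calculation (the 25% mix and three-clock load/store cost are the assumptions stated above, not measured numbers):

```c
/* Relative execution time under a simple two-class instruction model:
 * non-load/store instructions take one clock, load/stores take
 * 'ldst_clocks' clocks and make up 'ldst_frac' of the dynamic mix. */
static double relative_runtime(double ldst_frac, double ldst_clocks)
{
    return (1.0 - ldst_frac) * 1.0 + ldst_frac * ldst_clocks;
}
```

With `relative_runtime(0.25, 3.0)` the model gives 1.5, i.e. the ~1.5x slowdown quoted above.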
So in my opinion, this is a sloppy way of solving the cache coherency issue because the performance penalty is very high.
It is much better to selectively place the data structures which are shared between the CPU and DMA in an uncached address range to avoid incurring this performance penalty.
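One way to do that selective placement is a dedicated linker section for DMA-shared data only. The section name and descriptor layout below are hypothetical; the linker script must locate the section in an uncached address range (e.g. the 0xB segment alias of LMU RAM).

```c
#include <stdint.h>

/* Illustrative GMAC-style 4-word DMA descriptor. */
typedef struct {
    uint32_t des0, des1, des2, des3;
} EthDescriptor;

/* Only the CPU/DMA-shared structures go into the special section
 * (".dma_uncached" is a made-up name -- match your linker script).
 * Everything else stays in normal cached memory at full speed. */
__attribute__((section(".dma_uncached"), aligned(32)))
static EthDescriptor rx_ring[8];
```

The alignment keeps each descriptor ring on a cache-line boundary, which also helps if a partially cached variant of this scheme is ever used.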
Toshi
Nov 27, 2019
07:46 AM
> There is nothing in the AURIX architecture which precludes future implementations from supporting cache coherency.
I am constraining my discussion to AURIX variants that actually exist 🙂
> So in my opinion, this is a sloppy way of solving the cache coherency issue because the performance penalty is very high.
> It is much better to selectively place the data structures which are shared between the CPU and DMA in an uncached address range to avoid incurring this performance penalty.
I agree with your statement in principle - but over time, as variables are added to applications by developers who are not aware of the AURIX architecture, eventually someone puts something in the wrong place. In my experience, tracking down cache coherency problems is not worth the trouble. Your mileage may vary.
Dec 03, 2019
01:09 PM
A few more bugs I have found in the iLLD code:
5. The iLLD Ethernet driver configures the Ethernet MAC to drop packets < 64 bytes long.
Ethernet ARP replies are usually 42 bytes long, so this causes ARP to fail.
6. The iLLD Ethernet demo does not configure a valid MAC address.
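Related to bug 5: the usual transmit-side counterpart is to pad short frames up to the 60-byte minimum (64 bytes on the wire once the MAC appends the 4-byte FCS), so peers with a similar undersize filter still accept them. This is a generic sketch, not iLLD code:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimum Ethernet frame length excluding the 4-byte FCS. */
#define ETH_MIN_FRAME_NO_FCS 60u

/* Zero-pad an outgoing frame to the legal minimum. The buffer must
 * have room for ETH_MIN_FRAME_NO_FCS bytes. Returns the length to
 * hand to the transmit descriptor. */
static size_t eth_pad_frame(uint8_t *frame, size_t len)
{
    if (len < ETH_MIN_FRAME_NO_FCS) {
        memset(frame + len, 0, ETH_MIN_FRAME_NO_FCS - len);
        len = ETH_MIN_FRAME_NO_FCS;
    }
    return len;
}
```

A 42-byte ARP reply queued through this helper goes out as a legal 60-byte frame; the receive-side bug itself still needs the MAC's undersize filtering configuration to be corrected.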