Dec 07, 2018
03:25 AM
Hi,
My goal is to evaluate how fast I can toggle an I/O on the XMC1100.
For this, I bought the cute XMC 2Go kit and installed DAVE4.
The starting point was the XMC_2Go_Initial_Start_v1.3 example.
I can change the frequency of the blinking LED. The toolchain works. Fine.
Then I added the line
P0_5_toggle();
in a while(1) loop.
Toggling works, and the time from rise to fall is about 3 µs.
Really slow.
According to the comments in the example, the CPU clock is running at 8 MHz.
I am not sure how the clock setup works, but I need 32 MHz.
I changed the configuration to:
SCU_CLK->CLKCR = 0x0FFC0100UL;
This resulted in a toggle time of about 1.2 µs.
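The step from 8 MHz to 32 MHz can be checked with a little host-side arithmetic. The sketch below is only an illustration under stated assumptions: it assumes that SCU_CLK->CLKCR holds FDIV in bits 7:0 and IDIV in bits 15:8, that the DCO1 clock is 64 MHz, and that MCLK = 64 MHz / (2 * (IDIV + FDIV/256)). These details are not stated in the example and should be verified against the XMC1100 reference manual. The helper name mclk_hz is hypothetical, and IDIV = 0 is not modeled.

```c
#include <stdint.h>

/* Assumed DCO1 output frequency (not taken from the example). */
#define DCO1_HZ 64000000ULL

/* Hypothetical helper: estimate MCLK from a CLKCR value.
 * Assumed layout: FDIV = bits 7:0, IDIV = bits 15:8.
 * Assumed relation: MCLK = DCO1 / (2 * (IDIV + FDIV / 256)).
 * Returns 0 for IDIV == 0, which this sketch does not model. */
static uint32_t mclk_hz(uint32_t clkcr)
{
    uint32_t fdiv = clkcr & 0xFFU;
    uint32_t idiv = (clkcr >> 8) & 0xFFU;

    if (idiv == 0U) {
        return 0U;
    }

    /* Work in units of 1/256 so the fractional divider stays in integers. */
    uint64_t divider_x256 = (uint64_t)idiv * 256U + fdiv;
    return (uint32_t)((DCO1_HZ * 256ULL) / (2ULL * divider_x256));
}
```

Under these assumptions, mclk_hz(0x0FFC0100UL) gives 32000000, and a value with IDIV = 4 such as 0x3FF00400UL (assumed here to be the reset value) gives 8000000, which matches the 8 MHz mentioned in the example's comments.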
After this, I replaced the function call to P0_5_toggle() with its contents:
PORT0->OMR = 0x00200020UL;
This resulted in an improvement to about 530 ns.
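For readers wondering where 0x00200020 comes from: it is 0x10001 shifted left by the pin number, so that both the "set" bit (lower half-word) and the "reset" bit (upper half-word) for pin 5 are written at once, which the port interprets as a toggle. The following host-side sketch models that behavior; the function names omr_toggle_mask and omr_apply are hypothetical, and the model of the OMR semantics (set only, reset only, both = toggle) is an assumption that should be checked against the reference manual.

```c
#include <stdint.h>

/* Hypothetical helper: OMR value that toggles one pin (pin must be 0..15).
 * Sets the PS bit (bits 15:0) and the PR bit (bits 31:16) for that pin. */
static uint32_t omr_toggle_mask(uint8_t pin)
{
    return (uint32_t)0x10001UL << pin;
}

/* Host-side model (assumption) of one write to OMR:
 *   PS=1, PR=0 -> pin set;  PS=0, PR=1 -> pin cleared;
 *   PS=1, PR=1 -> pin toggled;  PS=0, PR=0 -> unchanged.
 * 'out' is the current 16-bit output state; the new state is returned. */
static uint32_t omr_apply(uint32_t out, uint32_t omr)
{
    uint32_t ps = omr & 0xFFFFU;
    uint32_t pr = (omr >> 16) & 0xFFFFU;

    uint32_t toggle = ps & pr;
    uint32_t set    = ps & ~pr;
    uint32_t clear  = pr & ~ps;

    return (((out | set) & ~clear) ^ toggle) & 0xFFFFU;
}
```

With this model, omr_toggle_mask(5) equals 0x00200020, and applying it twice to an output state of 0 gives 0x20 and then 0 again. No read-modify-write is needed, which is why a single store per edge is enough.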
Oops! The tooling ignores the inline directive of the P0_5_toggle() function. No idea why.
The next step is to execute the toggle code from RAM.
Therefore, I moved the toggling code to a separate function in a separate file (header + .c file).
In the function declaration in the header file, I added the famous __attribute__((section(".ram_code"))).
However, the tooling also ignores this directive, and the code is still executed from flash.
Does anybody know a solution?
It seems to be a tooling issue.
I tried to understand the linker script, but I did not see anything strange.
Thanks,
Lodewijk
--
An investigation of the 530 ns:
P0_5_toggle() compiles to 3 assembly instructions (2 loads and 1 store):
LDR: 2 cycles
LDR: 2 cycles
STR: 2 cycles
+ a B(ranch) for the while loop, good for 3 cycles
So, this is 9 cycles. If we assume 2 wait cycles per instruction for reading from flash, the 4 instructions add 8 cycles.
In total 17 cycles. At 32 MHz one cycle takes about 31 ns, so 17 cycles * 31 ns = 527 ns.
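The same estimate can be written as a small host-side calculation. This is only a sketch of the arithmetic above: the function names loop_cycles and cycles_to_ns are hypothetical, and the 2 wait cycles per instruction fetch are the assumption made in this post, not a measured or documented figure.

```c
#include <stdint.h>

/* Hypothetical helper: total cycles for one loop iteration, given the
 * core execution cycles, the number of instructions fetched from flash,
 * and an assumed number of wait cycles per fetch. */
static uint32_t loop_cycles(uint32_t core_cycles,
                            uint32_t instructions,
                            uint32_t wait_per_fetch)
{
    return core_cycles + instructions * wait_per_fetch;
}

/* Hypothetical helper: convert a cycle count to nanoseconds. */
static double cycles_to_ns(uint32_t cycles, double clock_hz)
{
    return (double)cycles * 1e9 / clock_hz;
}
```

For example, loop_cycles(9, 4, 2) gives 17, and cycles_to_ns(17, 32e6) gives 531.25 ns when the unrounded cycle time of 31.25 ns is used, which agrees with the measured value of about 530 ns.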
2 Replies
Dec 11, 2018
12:51 AM
Hi,
The inline directive is ignored if the compiler optimization level is left at its default, -O0. You can either use __STATIC_FORCEINLINE or use at least -O1.
See the code snippet below. As you can see, I have also placed main in RAM, since we are inlining the P0_5_toggle() function.
#include
__STATIC_INLINE __attribute__ ((section (".ram_code"))) void P0_5_toggle(void);

void P0_5_toggle(void)
{
  XMC_GPIO_ToggleOutput(P0_5);
}

__attribute__ ((section (".ram_code"))) int main(void)
{
  XMC_GPIO_SetMode(P0_5, XMC_GPIO_MODE_OUTPUT_PUSH_PULL);

  /* Placeholder for user application code. The while loop below can be replaced with user application code. */
  while(1U)
  {
    P0_5_toggle();
  }
}
The assembly generated using -O1 is shown here (you can also experiment with other optimization levels):
20000520:
{
XMC_GPIO_ToggleOutput(P0_5);
}
__attribute__ ((section (".ram_code"))) int main(void)
{
20000520: b508 push {r3, lr}
XMC_GPIO_SetMode(P0_5, XMC_GPIO_MODE_OUTPUT_PUSH_PULL);
20000522: 4804 ldr r0, [pc, #16] ; (20000534 <__data_end+0x14>)
20000524: 2105 movs r1, #5
20000526: 2280 movs r2, #128 ; 0x80
20000528: f000 f80a bl 20000540 <__XMC_GPIO_SetMode_veneer>
__STATIC_INLINE void XMC_GPIO_ToggleOutput(XMC_GPIO_PORT_t *const port, const uint8_t pin)
{
XMC_ASSERT("XMC_GPIO_ToggleOutput: Invalid port", XMC_GPIO_CHECK_OUTPUT_PORT(port));
port->OMR = 0x10001U << pin;
2000052c: 4a01 ldr r2, [pc, #4] ; (20000534 <__data_end+0x14>)
2000052e: 4b02 ldr r3, [pc, #8] ; (20000538 <__data_end+0x18>)
20000530: 6053 str r3, [r2, #4]
20000532: e7fd b.n 20000530
20000534: 40040000 .word 0x40040000
20000538: 00200020 .word 0x00200020
2000053c: 00000000 .word 0x00000000
Dec 19, 2018
05:43 AM
Thanks for the tip that compiler optimizations should be enabled to use compiler directives.
Consecutive toggling of the same pin requires the execution of one STR operation only, and is now possible with a pulse width of about 62 ns (2 clock cycles).