Assembly extended register

User15397
Level 1
Hello,

I am using the Free Entry Toolchain and want to use some inline assembly.

static void maddq(int64_t *out, uint32_t a, uint32_t b){
__asm__ ("madd.q %0,%0,%1,%2" : "+d" (*out) : "d" (a), "d" (b) );
}


The compiler tells me: "Opcode/operand mismatch: madd.q %d15,%d15,%d2,%d3"

Instead of %d15,%d15 I guess there should be an extended register, %e2 for example.

If I try to compile it like this (with madd.u, because with madd.q this also does not compile):

static void maddq(int64_t *out, uint32_t a, uint32_t b){
__asm__ ("madd.u %%e2,%%e2,%1,%2" : "+d" (*out) : "d" (a), "d" (b) );
}

it compiles, but it does not work: I want the result of the MAC assigned to out, and I also want to use madd.q.

How can I accomplish this?
User15397
Level 1
I forgot one parameter for the madd.q function. It now at least compiles but I had to rewrite it such that it works:


static void maddq(int64_t out, uint32_t a, uint32_t b){
__asm__ ("ld.d %%e6,%0 \n\t"
"madd.q %%e6,%%e6,%1,%2,0 \n\t"
"st.d %0,%%e6"
: "+m" (c)
: "r" (a), "r" (b)
);
}


Isn't this possible by just constraining (c) to an extended register pair?

Or is there an intrinsic function, not listed in my manual, that multiply-accumulates using madd.q?
User13290
Level 5
Hi Jonas,

jonnyx wrote:
I forgot one parameter for the madd.q function. It now at least compiles but I had to rewrite it such that it works:
Isn't this possible by just constraining (c) to an extended register pair?


You'd need the A constraint for that, which you'll find explained in chapter 6.2.3. But perhaps inline assembly can be avoided for your use case. Take a look at this example:

#include <stdint.h>

int32_t a = 2;
int32_t b = 2;

int64_t c = 0;
int64_t d = 1;

int main(void) {
c = d + (int64_t)a*b;
return c != 5;
}
If I compile this and then inspect its disassembly, it shows the following:

int main(void) {
c = d + (int64_t)a*b;
0: 91 00 00 f0 movh.a %a15,0

20: 03 f2 6a 24 madd %e2,%e4,%d2,%d15
So you probably don't need it. Of course you must apply the proper promotion early enough, which is why I use the (int64_t) cast: it binds to a, so the multiplication of a and b is itself carried out in 64 bits. By the way, since you hardcoded e6 in your inline assembly example, you will have to add it to the clobber list so the compiler knows it must not allocate variables into this register prior to calling your inline function. Also see chapter 6.3 in the user manual.
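
To illustrate the promotion point, here is a small portable sketch. It is not taken from the manual, and the two function names are made up for illustration. It uses unsigned operands so that the narrow product wraps in a well-defined way, and it assumes a 32-bit int, as on TriCore and most hosts:

```c
#include <stdint.h>

/* Hypothetical helper: widen one operand first, so the multiply
 * itself is performed in 64 bits and no product bits are lost. */
static uint64_t mul_widen_first(uint32_t a, uint32_t b) {
    return (uint64_t)a * b;
}

/* Hypothetical helper: multiply in 32 bits, then widen. The upper
 * half of the product is already gone when the cast is applied
 * (assumes int is 32 bits wide). */
static uint64_t mul_widen_late(uint32_t a, uint32_t b) {
    return (uint64_t)(a * b);
}
```

With a = b = 0x10000, mul_widen_first returns 0x100000000 while mul_widen_late returns 0.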

Best regards,

Henk-Piet Glas
Principal Technical Specialist
User15397
Level 1
Thank you, but how could I rewrite the C code to force the use of the madd.q instruction?

And yeah, I had some values in e6 overwritten sometimes (thanks for the hint)!

Best regards


edit: I just tried the "A" constraint, but the compiler tells me: "impossible register constraint in 'asm'"
edit2: I found out that A is not used as a constraint but as a modifier when referencing the operand: "madd %A0,%A0,%1,%2" : "+r"(test) : "r" (test1), "r" (test2)
edit3: Big thank you, I got it working with only one line of assembler now.
User13290
Level 5
Hi Jonas,

Good to read you've nailed it down to one line of inline assembly. Nice use of the + constraint modifier. I'm using a slightly different approach, but it pretty much boils down to the same:

/*
* Synopsis: inline Q format multiply and add
*
* --- Copyright HighTec EDV-Systeme GmbH 1982-2018 ---
*/

#include <stdint.h>

typedef int32_t Q31_t;
typedef int64_t Q63_t;

#define __INLINE __inline __attribute__((always_inline))

static __INLINE Q63_t __maddq(Q63_t Ed, const Q31_t Da, const Q31_t Db) {
__asm__ __volatile__(
"madd.q\t%A0,%A0,%2,%3,1"
: "=d"(Ed)
: "0"(Ed), "d"(Da), "d"(Db)
:);
return Ed;
}

Q63_t d = INT64_C(1)<<62; /* 0.5 */

Q31_t a = INT32_C(1)<<30; /* 0.5 */
Q31_t b = INT32_C(1)<<30; /* 0.5 */

Q63_t c = INT64_C(3)<<61; /* 0.75 */

int main(void) {
return c != __maddq(d,a,b);
}

When I run this via command line TSIM (included in the product) it passes the bar as follows:

standalone test completed
Total cycles : 2311
Instruction delay count : 1982
test passed (A14 = 0x0000900d)

I was mistaken in my initial reply. You won't be able to do this from C since there currently is no datatype that supports the Q format.
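
For checking such results on a host, the arithmetic of the test above can be modelled in portable C. This is only a sketch under assumptions: maddq_model is a made-up name, and it models the 64-bit, 32x32 form of madd.q as result = acc + ((a * b) << n) with n being 0 or 1, which matches the values in the test above. Saturating variants and any status flags are not modelled.

```c
#include <stdint.h>

typedef int32_t Q31_t;
typedef int64_t Q63_t;

/*
 * Host-side model (an assumption, not toolchain code) of the
 * non-saturating 64-bit madd.q:  result = acc + ((a * b) << n).
 * The shift and the add are done on unsigned values so the model
 * wraps instead of invoking undefined behaviour. Converting the
 * wrapped value back to int64_t is implementation-defined, but
 * two's complement on common compilers.
 */
static Q63_t maddq_model(Q63_t acc, Q31_t a, Q31_t b, unsigned n) {
    uint64_t prod = (uint64_t)((int64_t)a * (int64_t)b); /* exact: fits in 64 bits */
    return (Q63_t)((uint64_t)acc + (prod << (n & 1u)));  /* n limited to 0 or 1 */
}
```

Called with the values from the test, maddq_model(INT64_C(1)<<62, INT32_C(1)<<30, INT32_C(1)<<30, 1) yields INT64_C(3)<<61, that is 0.5 + 0.5 * 0.5 = 0.75 in Q63.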

Best regards,


Henk-Piet Glas
Principal Technical Specialist
User15397
Level 1
Hello,

Very helpful, thank you!
I didn't know about TSIM. I will definitely try this out.

Best regards,
Jonas
User13290
Level 5
Hi Jonas,

Let me know if you need help with that. Typically it boils down to copying an MConfig template from the BSP folder into your project output folder and then calling tsim with a suitable set of command line options. Generally I call it as follows (for TC297 in my case):

tsim16p_e -H -B -disable-watchdog -o foobar.elf


If I subsequently run into issues, I typically add -e to generate an instruction trace:

tsim16p_e -H -B -disable-watchdog -e -o foobar.elf


The instruction trace has helped me out on several occasions. The tsim user guide can be found in the docs folder, in case you wish to explore. You can add tsim to your project as an advanced post-build step, but you can also add it as a user target (user targets can be added to the makefile.targets placeholder). To be able to build user targets, they must additionally be added to the "target build configuration..." menu.

Best regards,

Henk-Piet Glas
Principal Technical Specialist

User15397
Level 1
Hi,

Great, thanks. Yes, I tried to call it using my iRam.ld memory file, but that did not work. I've got a TC297 and a TC275.

Best regards,
Jonas
User15397
Level 1
How can I get tricore-gcov code coverage output using tsim? More specifically, I want to generate .gcda files by executing the program so that I can run tricore-gcov.

I run a while() loop and at some point call exit(0), as stated in the user manual. The simulator exits after reaching the cycle count (-x) and not when exit(0) is called.
The simulator is started like this:
tsim1311 -B -p -x 400000 -disable-watchdog -H -o ./coverage.elf


When I run the command you gave me (it worked in previous tests, but not here):
tsim16_p -H -p -B -x 400000 -disable-watchdog -o ./coverage.elf
it always aborts immediately, and the log says
"Simulation stopped: exit pc: 0x8000e096"
"Simulation complete"

With the -e flag the trace looks like this:
(_start+0x0)
...
...
(_board_init + 0x0)
..
..
(_disable_wdt + 0x0)
..
..
(_init_csa +0x0)
..
..
(__clear_table_func + 0x0)
..
..
(__board_init + 0xe2)
..
..
(_exit + 0x0)
User13290
Level 5
Hi Jonas,

jonnyx wrote:
it always aborts immediately and in the log it says
"Simulation stopped: exit pc: 0x8000e096"
"Simulation complete"


Your second scenario probably hits a trap and ends up in one of the default trap handlers that can be found in the startup code crt0-tc2x.S. The usual suspect is an MConfig that is either wrong or missing. For 1st generation AURIX you need to copy this one:

/bsp/tricore/common/tsim/tc161/MConfig


MConfig reflects your memory map. If it is missing, or it is the wrong one, the memory map that tsim uses is likely to be out of sync with that of your application. This increases the chance that the application will trap during CSA, copy table or clear table initialisation, because it might end up poking into memory that tsim has not allocated.

As a side note, I noticed that both our manual and I have been using -B although it doesn't appear to be documented. After investigation it turns out that this option dates from 2005 and emulates some silicon bugs that were current back then. These days you no longer need it, so you may drop it from your command line.

Best regards,


Henk-Piet Glas
Principal Technical Specialist
User13290
Level 5
Hi Jonas,

Note that when enabling code coverage you will need to add at least 20K to your current user stack size, because some fairly large structures are pushed onto it when the histogram is built. If its size does not accommodate that, the stack will corrupt whatever precedes it in your memory map. Most of that will no longer be alive (the histogram is built during the exit of your program), so sometimes you may just get lucky. I, however, was one of the unlucky ones (or lucky, depending on how you look at it) and had a batch that broke. As a side effect, the file handles for my .gcda files didn't close. While my hint about MConfig is definitely not wasted information, it will probably not have solved your case. Adjusting the user stack size should.

Best regards,


Henk-Piet Glas
Principal Technical Specialist