Reverse Engineering: Conditional Jumps

Posted on 2017-12-21 Edited on 2022-08-08 In re


Comparing numeric values

Main program

#include<stdio.h>
void f_signed(int a,int b)
{
	if(a>b)
		printf("a>b\n");
	if(a==b)
		printf("a==b\n");
	if(a<b)
		printf("a<b\n");
}
void f_unsigned(unsigned int a,unsigned int b)
{
	if(a>b)
		printf("a>b\n");
	if(a==b)
		printf("a==b\n");
	if(a<b)
		printf("a<b\n");
}
int main()
{
	f_signed(1,2);
	f_unsigned(1,2);
	return 0;
}

x86

x86 + MSVC

With optimization turned off, MSVC produces the following for f_signed():

_a$ = 8
_b$ = 12
_f_signed PROC
	push ebp
	mov ebp,esp
	mov eax,DWORD PTR _a$[ebp]
	cmp eax,DWORD PTR _b$[ebp]
	jle SHORT $LN3@f_signed
	push OFFSET $SG737   ;'a>b'
	call _printf
	add esp,4
$LN3@f_signed:
	mov ecx,DWORD PTR _a$[ebp]
	cmp ecx,DWORD PTR _b$[ebp]
	jne SHORT $LN2@f_signed
	push OFFSET $SG739    ;'a==b'
	call _printf
	add esp,4
$LN2@f_signed:
	mov edx,DWORD PTR _a$[ebp]
	cmp edx,DWORD PTR _b$[ebp]
	jge SHORT $LN4@f_signed
	push OFFSET $SG741    ;'a<b'
	call _printf
	add esp,4
$LN4@f_signed:
	pop ebp
	ret 0

_f_signed ENDP

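The first conditional jump is JLE, "Jump if Less or Equal". If the first operand of the preceding CMP is less than or equal to (that is, not greater than) the second, JLE transfers control to the address given in the instruction; otherwise execution falls through to the next instruction, which in this example leads to the printf() call. The second conditional jump is JNE, "Jump if Not Equal": it jumps when the two operands of the preceding CMP differ.
The third conditional jump is JGE, "Jump if Greater or Equal": it jumps when the first operand of CMP is greater than or equal to (not less than) the second. If all three jumps were taken, printf() would never be called; short of deliberate tampering, that cannot happen.
Now let us look at f_unsigned(). It is largely the same as f_signed(); the difference lies in the conditional jumps: f_unsigned() uses JBE and JAE where f_signed() uses JLE and JGE.
The f_unsigned() listing follows. It is labeled as GCC output, though the notation (PROC/ENDP, $LN labels) is the one MSVC emits.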

_a$=8   ;size=4
_b$=12  ;size=4
_f_unsigned PROC
	push ebp
	mov ebp,esp
	mov eax,DWORD PTR _a$[ebp]
	cmp eax,DWORD PTR _b$[ebp]
	jbe SHORT $LN3@f_unsigned
	push OFFSET $SG2761  ; 'a>b'
	call _printf
	add esp,4
$LN3@f_unsigned:
	mov ecx,DWORD PTR _a$[ebp]
	cmp ecx,DWORD PTR _b$[ebp]
	jne SHORT $LN2@f_unsigned
	push OFFSET $SG2763  ; 'a==b'
	call _printf
	add esp, 4
$LN2@f_unsigned:
	mov edx,DWORD PTR _a$[ebp]
	cmp edx,DWORD PTR _b$[ebp]
	jae SHORT $LN4@f_unsigned
	push OFFSET $SG2765   ; 'a<b'
	call _printf
	add esp, 4
$LN4@f_unsigned:
	pop ebp
	ret 0
_f_unsigned ENDP

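GCC's result is essentially the same as MSVC's.
In the compiled f_unsigned(), the conditional jumps are JBE (Jump if Below or Equal, the unsigned counterpart of JLE) and JAE (Jump if Above or Equal, the unsigned counterpart of JGE). JA/JAE/JB/JBE differ from JG/JGE/JL/JLE in the flags they test: the former check the carry/borrow flag CF (1 means "below") and the zero flag ZF (1 means "equal"), while the latter check SF XOR OF (1 means "less" in the signed sense) and ZF. In other words, the former suit CMP results on unsigned data and the latter suit signed data. So the conditional jump that follows a CMP tells us directly whether the compared variables are signed or unsigned.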
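The flag rules above can be modelled in a few lines of C. This is only a sketch: the type `flags_t` and the helpers `cmp32`, `jbe_taken`, `jae_taken`, `jle_taken` and `jge_taken` are names made up for illustration, and only the four flags discussed here are modelled.

```c
#include <stdint.h>
#include <stdbool.h>

/* Flags that CMP a, b leaves behind (it computes a - b and discards the result). */
typedef struct {
    bool cf; /* carry/borrow: set when a < b as unsigned numbers */
    bool zf; /* zero: set when a == b */
    bool sf; /* sign: top bit of the 32-bit difference */
    bool of; /* overflow: the signed difference does not fit in 32 bits */
} flags_t;

static flags_t cmp32(uint32_t a, uint32_t b)
{
    uint32_t diff = a - b; /* wraps modulo 2^32, as the hardware does */
    flags_t f;
    f.cf = a < b;
    f.zf = diff == 0;
    f.sf = (diff >> 31) != 0;
    /* signed overflow: a and b differ in sign, and diff's sign differs from a's */
    f.of = (((a ^ b) & (a ^ diff)) >> 31) != 0;
    return f;
}

/* Unsigned conditions test CF and ZF. */
static bool jbe_taken(flags_t f) { return f.cf || f.zf; }
static bool jae_taken(flags_t f) { return !f.cf; }

/* Signed conditions test SF, OF and ZF. */
static bool jle_taken(flags_t f) { return f.zf || (f.sf != f.of); }
static bool jge_taken(flags_t f) { return f.sf == f.of; }
```

For the bit pattern 0xFFFFFFFF compared with 1, `jae_taken` holds (4294967295 is above 1) while `jle_taken` also holds (-1 is less than 1): the same CMP, read through two different families of jumps.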

Next, let us look at the assembly for main():

_main PROC
	push ebp
	mov ebp,esp
	push 2
	push 1
	call _f_signed
	add esp,8
	push 2
	push 1
	call _f_unsigned
	add esp,8
	xor eax,eax
	pop ebp
	ret 0
_main ENDP

x86 + MSVC + OllyDbg

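OllyDbg lets us watch how instructions affect the flags register. We start with f_unsigned(), which compares unsigned numbers. The function executes CMP three times on the same pair of unsigned values. Since the operands are the same each time, CMP sets the flags identically each time.
As the first screenshot shows, on reaching the first conditional jump the flags are C=1, P=1, A=1, Z=0, S=1, T=0, D=0, O=0. OllyDbg abbreviates each flag to its first letter.
In its lower-left pane OllyDbg hints that the JBE condition is met and that the jump will be taken. The prediction is right: JBE triggers when CF=1 or ZF=1, and that expression is true here, so the jump is taken.

As the next screenshot shows, at the second conditional jump, JNZ, ZF=0, so OllyDbg again reports that the jump will be taken.

At the third conditional jump, JNB, the jump would be taken only if CF=0. Here CF=1, so the condition is false, the jump is not taken, and the third printf() call executes.

Now let us debug f_signed(), whose arguments are signed.
While f_signed() runs, the flags are the same as before: after CMP, C=1, P=1, A=1, Z=0, S=1, T=0, D=0, O=0.
The first conditional jump, JLE, is taken, as the screenshot shows:

JLE triggers when ZF=1 or SF≠OF. Here SF≠OF holds. Since ZF=0, the second conditional jump, JNZ, is taken as well, as the screenshot shows:

The third conditional jump, JGE, is not taken: it requires SF=OF, which does not hold here, as the screenshot shows: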

GCC

Non-optimizing GCC

With optimization turned off, GCC produces essentially the same code as MSVC, except that it replaces printf() with puts().

Optimizing GCC

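If CMP compares the same values every time and leaves the same flags behind, why compare them more than once? Perhaps MSVC cannot be that clever, but with optimization enabled GCC 4.8.1 does perform this deeper optimization.
Listing: GCC 4.8.1 f_signed()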

f_signed:
	mov eax, DWORD PTR [esp+8]
	cmp DWORD PTR [esp+4], eax
	jg .L6
	je .L7
	jge .L1
	mov DWORD PTR [esp+4], OFFSET FLAT:.LC2 ; "a<b"
	jmp puts
.L6:
	mov DWORD PTR [esp+4], OFFSET FLAT:.LC0 ; "a>b"
	jmp puts
.L1:
	ret
.L7:
	mov DWORD PTR [esp+4], OFFSET FLAT:.LC1 ; "a==b"
	jmp puts

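Clearly, it uses a JMP to puts in place of the bulkier CALL puts / RETN sequence.
Such code is fairly rare on x86; MSVC 2012 does not optimize this far.
On the other hand, assembly programmers do know that several Jcc instructions can be chained after a single CMP. So if you meet code this compact and can establish that GCC did not produce it, it was most likely written by hand.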
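At the C level, the closest analogue to "compare once, then dispatch" is to write the three cases as one mutually exclusive chain. The sketch below uses a made-up helper, `describe_order`, that returns the message instead of printing it. Whether a given compiler then emits a single CMP followed by stacked Jcc instructions is not guaranteed and should be checked in the disassembly.

```c
/* Hypothetical helper: returns the message f_signed() would print for a and b. */
const char *describe_order(int a, int b)
{
    /* The else-if chain states that exactly one case applies, so all three
       outcomes can in principle be decided from one comparison of a and b. */
    if (a > b)
        return "a>b";
    else if (a == b)
        return "a==b";
    else
        return "a<b";
}
```

Because each `return` is the last action on its path, a caller that only forwards the string to puts() also gives the compiler a chance to turn the call into a tail jump, as in the listing above.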
Even with the same optimization level, the code for f_unsigned() is not as elegant:

f_unsigned:
	push esi
	push ebx
	sub esp,20
	mov esi,DWORD PTR [esp+32]
	mov ebx,DWORD PTR [esp+36]
	cmp esi,ebx
	ja .L13
	cmp esi,ebx    ; instruction may be removed
	je .L14
.L10:
	jb .L15
	add esp, 20
	pop ebx
	pop esi
	ret
.L15:
	mov DWORD PTR [esp+32], OFFSET FLAT:.LC2 ; "a<b"
	add esp, 20
	pop ebx
	pop esi
	jmp puts
.L13:
	mov DWORD PTR [esp], OFFSET FLAT:.LC0 ; "a>b"
	call puts
	cmp esi, ebx
	jne .L10
.L14:
	mov DWORD PTR [esp+32], OFFSET FLAT:.LC1 ; "a==b"
	add esp, 20
	pop ebx
	pop esi
	jmp puts

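Each execution path runs two CMP instructions rather than three, so at least one comparison has been optimized away. Even so, GCC 4.8.1's optimizer evidently has room for improvement.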

ARM

32-bit ARM

Optimizing Keil 6/2013 (ARM mode)

Listing: Optimizing Keil 6/2013 (ARM mode)

.text:000000B8				EXPORT f_signed
.text:000000B8		f_signed 		; CODE XREF: main+C
.text:000000B8 70 40 2D E9	STMFD SP!, {R4-R6,LR}
.text:000000BC 01 40 A0 E1	MOV R4, R1
.text:000000C0 04 00 50 E1	CMP R0, R4
.text:000000C4 00 50 A0 E1	MOV R5, R0
.text:000000C8 1A 0E 8F C2	ADRGT R0, aAB     ; "a>b\n"
.text:000000CC A1 18 00 CB	BLGT __2printf
.text:000000D0 04 00 55 E1	CMP R5, R4
.text:000000D4 67 0F 8F 02	ADREQ R0, aAB_0; "a==b\n"
.text:000000D8 9E 18 00 0B	BLEQ __2printf
.text:000000DC 04 00 55 E1	CMP R5,R4
.text:000000E0 70 80 BD A8	LDMGEFD SP!, {R4-R6,PC}
.text:000000E4 70 40 BD E8	LDMFD SP!, {R4-R6,LR}
.text:000000E8 19 0E 8F E2	ADR R0, aAB_1 ; "a<b\n"
.text:000000EC 99 18 00 EA	B __2printf
.text:000000EC ; End of function f_signed

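Most ARM-mode instructions have conditionally executed forms. These predicated variants execute only when particular flags are in the required state, so they appear only after an instruction that sets the flags, such as a comparison.
For instance, the addition instruction ADD is really ADDAL, where AL stands for "always": ADDAL executes unconditionally. In 32-bit ARM instructions the condition is encoded in the top 4 bits of the instruction word, the condition field. Even the unconditional branch B carries a condition field in its top 4 bits; structurally it is still a conditional branch whose condition happens to be AL. As the name suggests, AL means ignore the flags and always execute.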
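The condition field can be read straight off the opcode bytes in the listing. The sketch below assembles the four little-endian bytes into a word and looks up bits 31..28; the helper names `arm_word` and `arm_cond` are invented for this illustration.

```c
#include <stdint.h>

/* Condition mnemonics indexed by bits 31..28 of an ARM-mode instruction word. */
static const char *const arm_cond_names[16] = {
    "EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
    "HI", "LS", "GE", "LT", "GT", "LE", "AL",
    "NV" /* 0xF: historically "never"; later reused for unconditional encodings */
};

/* Combine the four bytes, in the order the listing prints them, into one word. */
static uint32_t arm_word(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
    return (uint32_t)b0 | ((uint32_t)b1 << 8) | ((uint32_t)b2 << 16) | ((uint32_t)b3 << 24);
}

/* Return the mnemonic of the condition field. */
static const char *arm_cond(uint32_t insn)
{
    return arm_cond_names[insn >> 28];
}
```

Feeding in the bytes of ADRGT from the listing, `arm_cond(arm_word(0x1A, 0x0E, 0x8F, 0xC2))` gives "GT", while the bytes of the final B instruction (`99 18 00 EA`) give "AL".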
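GT in ADRGT stands for "greater than". The instruction consults the result of the preceding CMP to decide whether to perform the address load (ADR): it does so if and only if the first value compared was greater than the second.
The following BLGT works the same way: it performs BL if and only if the first value compared by CMP was greater than the second. When that holds, ADRGT has already placed the address of the string "a>b\n" in R0 as the argument for printf(), and BLGT makes the call. So the instructions with the -GT suffix run only when R0 (variable a) is greater than R4 (variable b); they clearly form one related group.
The ADREQ and BLEQ that follow perform ADR and BL only when the operands of the most recent CMP were equal. "CMP R5, R4" appears twice in a row because the printf() call in between may clobber the flags.
LDMGEFD performs LDMFD (Load Multiple Full Descending) under the condition "Greater or Equal". The instruction "LDMGEFD SP!, {R4-R6,PC}" thus serves as a function epilogue, but one that ends the function only when a>=b.
If that condition does not hold, that is, when a<b, the next instruction, "LDMFD SP!, {R4-R6,LR}", executes instead. It is also part of the epilogue: it restores R4-R6 and LR but not PC, so it does not return from the function.
The last two instructions pass the string "a<b\n" to printf() and branch to it. Because control reaches printf() through a plain branch rather than a call, the return from printf() goes straight back to the caller of f_signed(), ending this function as well.
f_unsigned() is very similar to f_signed(). The difference is that it uses ADRHI, BLHI and LDMCSFD. The HI suffix stands for "unsigned higher" and CS for "carry set" (unsigned greater than or equal). The argument types differ, so the instructions differ accordingly.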
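To see why HI and CS express unsigned comparisons while GT and GE express signed ones, the ARM flags can be modelled in C. This is a sketch with invented names (`nzcv_t`, `arm_cmp`, `cond_hi` and so on); it relies on the fact that ARM sets C after a subtraction when no borrow occurred, the opposite of x86's CF.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { bool n, z, c, v; } nzcv_t;

/* Flags after "CMP a, b" on ARM, which computes a - b and discards the result. */
static nzcv_t arm_cmp(uint32_t a, uint32_t b)
{
    uint32_t diff = a - b; /* wraps modulo 2^32 */
    nzcv_t f;
    f.n = (diff >> 31) != 0;
    f.z = diff == 0;
    f.c = a >= b; /* set when NO borrow occurred */
    f.v = (((a ^ b) & (a ^ diff)) >> 31) != 0; /* signed overflow */
    return f;
}

static bool cond_hi(nzcv_t f) { return f.c && !f.z; }        /* unsigned >  */
static bool cond_cs(nzcv_t f) { return f.c; }                /* unsigned >= */
static bool cond_gt(nzcv_t f) { return !f.z && f.n == f.v; } /* signed >    */
static bool cond_ge(nzcv_t f) { return f.n == f.v; }         /* signed >=   */
```

With a=0xFFFFFFFF and b=1, `cond_hi` and `cond_cs` hold but `cond_gt` and `cond_ge` do not, which is exactly the difference between the suffixes used in f_unsigned() and f_signed().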
The assembly for main() follows.
Listing: main()

.text:00000128			EXPORT main
.text:00000128		main
.text:00000128 10 40 2D E9	STMFD SP!, {R4,LR}
.text:0000012C 02 10 A0 E3	MOV R1, #2
.text:00000130 01 00 A0 E3	MOV R0, #1
.text:00000134 DF FF FF EB	BL f_signed
.text:00000138 02 10 A0 E3	MOV R1, #2
.text:0000013C 01 00 A0 E3	MOV R0, #1
.text:00000140 EA FF FF EB	BL f_unsigned
.text:00000144 00 00 A0 E3	MOV R0, #0
.text:00000148 10 80 BD E8	LDMFD SP!, {R4,PC}
.text:00000148		; End of function main

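As we can see, ARM-mode code can do without conditional branches altogether.
What is the advantage? ARM processors, being RISC designs, are pipelined. Put simply, such processors handle jumps relatively poorly, which is why branch predictor units matter so much to overall performance. On a pipelined processor, the fewer jumps a program performs (conditional or unconditional), the faster it tends to run. Conditionally executed instructions help precisely because they reduce the number of jumps.
x86 has very little in the way of conditional execution apart from CMOVcc, a MOV that takes effect only when the relevant flags (usually set by CMP) satisfy its condition.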

Calculating absolute value

Program

int my_abs (int i)
{
	if (i<0)
		return -i;
	else
		return i;
};

Optimizing MSVC

Listing: Optimizing MSVC 2012 x64

i$ = 8
my_abs PROC
; ECX = input
	test   ecx, ecx
; check for sign of input value
; skip NEG instruction if sign is positive
	jns SHORT $LN2@my_abs
; negate value
	neg ecx
$LN2@my_abs:
; prepare result in EAX:
	mov eax,ecx
	ret 0
my_abs ENDP

GCC 4.9 produces almost the same code.

Optimizing Keil 6/2013: Thumb mode

Listing: Optimizing Keil 6/2013: Thumb mode

my_abs PROC
	CMP r0,#0
; is input value equal to zero or greater than zero?
; skip RSBS instruction then
	BGE |L0.6|
; subtract input value from 0:
	RSBS r0,r0,#0
|L0.6|
	BX lr
	ENDP

ARM has no dedicated negate instruction, so Keil uses "Reverse Subtract" (a subtraction with the operands swapped) to subtract the value from zero, which flips the sign just the same.

Optimizing Keil 6/2013: ARM mode

ARM mode has conditionally executed instructions, so with optimization enabled we get the following.
Listing: Optimizing Keil 6/2013: ARM mode

my_abs PROC
	CMP r0,#0
; execute "Reverse Subtract" instruction only if input value is less than 0:
	RSBLT r0,r0,#0
	BX lr
	ENDP

The same result is achieved without any conditional branch.
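The same branch-free idea can be written directly in C. The sketch below is one common bit trick, not what Keil generates; the name `abs_branchless` is made up, and it returns an unsigned value so that the most negative int has a well-defined result.

```c
/* Absolute value without an if: build a mask from the sign, then negate conditionally. */
static unsigned int abs_branchless(int i)
{
    unsigned int u = (unsigned int)i;
    /* (i < 0) is 0 or 1, so mask is all zeros or all ones */
    unsigned int mask = 0u - (unsigned int)(i < 0);
    /* negative: (u ^ ~0) - ~0 == ~u + 1 == -u; otherwise u is left unchanged */
    return (u ^ mask) - mask;
}
```

All arithmetic is done on unsigned values, so nothing here relies on signed overflow. A compiler may still choose to implement the `i < 0` test however it likes.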

Non-optimizing GCC 4.9 (ARM64)

ARM64 has a NEG instruction for negation.
Listing: Non-optimizing GCC 4.9 (ARM64)

my_abs:
	sub sp, sp, #16
	str w0, [sp,12]
	ldr w0, [sp,12]
; compare input value with contents of WZR register
; (which always holds zero)
	cmp w0, wzr
	bge .L2
	ldr w0, [sp,12]
	neg w0, w0
	b .L3
.L2:
	ldr w0, [sp,12]
.L3:
	add sp, sp, 16
	ret

MIPS

Listing: Optimizing GCC 4.4.5 (IDA)

my_abs:
; jump if $a0<0:
	bltz $a0, locret_10
; just return input value ($a0) in $v0:
	move $v0, $a0
	jr $ra
	or $at, $zero ; branch delay slot, NOP
locret_10:
; negate input value and store it in $v0:
	jr $ra
; this is pseudoinstruction. in fact, this is "subu $v0,$zero,$a0" ($v0=0-$a0)
	negu $v0, $a0

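Two new instructions appear here: BLTZ (Branch if Less Than Zero) and the pseudo-instruction NEGU, which subtracts its operand from zero. The U suffix in SUBU and NEGU nominally means "unsigned"; its practical effect is that no exception is raised on integer overflow.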
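In C terms, NEGU behaves like negation on an unsigned 32-bit value: the result wraps modulo 2^32 and nothing traps. A minimal sketch, with `negu` as an illustrative name:

```c
#include <stdint.h>

/* What "negu $v0, $a0" (that is, subu $v0, $zero, $a0) computes: 0 - x modulo 2^32. */
static uint32_t negu(uint32_t x)
{
    return (uint32_t)(0u - x);
}
```

The interesting input is 0x80000000, the bit pattern of the most negative 32-bit integer: its negation wraps back to 0x80000000, and no exception occurs.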

Conditional operator

Program

 const char* f (int a)
{
	return a==10 ? "it is ten" : "it is not ten";
};

x86

Older, non-optimizing compilers handle a statement with the conditional operator just as they would an if/else.
Listing: Non-optimizing MSVC 2008

$SG746 DB 'it is ten', 00H
$SG747 DB 'it is not ten', 00H

tv65 = -4 ; this will be used as a temporary variable
_a$ = 8
_f	PROC
	push ebp
	mov ebp, esp
	push ecx
; compare input value with 10
	cmp DWORD PTR _a$[ebp], 10
; jump to $LN3@f if not equal
	jne SHORT $LN3@f
; store pointer to the string into temporary variable:
	mov DWORD PTR tv65[ebp], OFFSET $SG746 ; 'it is ten'
; jump to exit
	jmp SHORT $LN4@f
$LN3@f:
; store pointer to the string into temporary variable:
	mov DWORD PTR tv65[ebp], OFFSET $SG747 ; 'it is not ten'
$LN4@f:
; this is exit. copy pointer to the string from temporary variable to EAX.
	mov eax, DWORD PTR tv65[ebp]
	mov esp, ebp
	pop ebp
	ret 0
_f	ENDP

Listing: Optimizing MSVC 2008

$SG792 DB 'it is ten', 00H
$SG793 DB 'it is not ten', 00H
_a$ = 8 ; size = 4
_f	PROC
; compare input value with 10
	cmp DWORD PTR _a$[esp-4], 10
	mov eax, OFFSET $SG792 ; 'it is ten'
; jump to $LN4@f if equal
	je SHORT $LN4@f
	mov eax, OFFSET $SG793 ; 'it is not ten'
$LN4@f:
	ret 0
_f	ENDP

Newer compilers generate more concise code.
Listing: Optimizing MSVC 2012 x64

$SG1355 DB		'it is ten', 00H
$SG1356 DB		'it is not ten', 00H

a$	= 8
f	PROC
; load pointers to the both strings
	lea rdx, OFFSET FLAT:$SG1355 ; 'it is ten'
	lea rax, OFFSET FLAT:$SG1356 ; 'it is not ten'
; compare input value with 10
	cmp ecx, 10
; if equal, copy value from RDX ("it is ten")
; if not, do nothing. pointer to the string "it is not ten" is still in RAX as for now.
	cmove rax, rdx
	ret 0
f	ENDP

With optimization enabled, GCC 4.8 for x86 also uses CMOVcc. Without optimization, GCC 4.8 compiles the conditional operator with conditional jumps.

ARM

With optimization enabled, Keil's ARM-mode code uses the conditionally executed ADRcc:

f PROC
; compare input value with 10
	CMP r0, #0xa
; if comparison result is EQual, copy pointer to the "it is ten" string into R0
	ADREQ r0,|L0.16| ; "it is ten"
; if comparison result is Not Equal, copy pointer to the "it is not ten" string into R0
	ADRNE r0,|L0.28| ; "it is not ten"
	BX lr
	ENDP
|L0.16|
	DCB "it is ten",0
|L0.28|
	DCB "it is not ten",0

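Barring outside interference, ADREQ and ADRNE can never both execute during the same call.
For Thumb mode, Keil has to fall back on conditional branches even with optimization enabled, since Thumb has no load instructions that can be predicated on flags.
Listing: Optimizing Keil 6/2013 (Thumb mode)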

f PROC
; compare input value with 10
	CMP r0,#0xa
; jump to |L0.8| if EQual
	BEQ |L0.8|
	ADR r0,|L0.12| ; "it is not ten"
	BX lr
|L0.8|
	ADR r0,|L0.28| ; "it is ten"
	BX lr
	ENDP
|L0.12|
	DCB "it is not ten",0
|L0.28|
	DCB "it is ten",0

ARM64

With optimization enabled, GCC (Linaro) 4.9 for ARM64 also implements the conditional operator with conditional branches.
Listing: Optimizing GCC (Linaro) 4.9

f:
	cmp x0, 10
	beq .L3     ; branch if equal
	adrp x0, .LC1 ; "it is not ten"
	add x0, x0, :lo12:.LC1
	ret 
.L3:
	adrp x0, .LC0 ; "it is ten"
	add x0, x0, :lo12:.LC0
	ret
.LC0:
	.string "it is ten"
.LC1:
	.string "it is not ten"

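This is because ARM64 has no simple load instruction predicated on flags, unlike the 32-bit ARM instruction set or x86 with its CMOVcc. ARM64 does have the "conditional select" instruction CSEL, but GCC 4.9 does not appear to use it for this code.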

MIPS

Unfortunately, GCC 4.4.5 for MIPS is not very clever here either.
Listing: Optimizing GCC 4.4.5 (assembly output)

$LC0:
	.ascii "it is not ten\000"
$LC1:
	.ascii "it is ten\000"
f:
	li $2,10     # 0xa
; compare $a0 and 10, jump if equal:
	beq $4,$2,$L2
	nop ; branch delay slot

; leave address  of "it is not ten" string in $v0 and return:
	lui $2,%hi($LC0)
	j $31
	addiu $2,$2,%lo($LC0)
$L2:
; leave address of "it is ten" string in $v0 and return:
	lui $2,%hi($LC1)
	j $31
	addiu $2,$2,%lo($LC1)

Rewriting it with if/else

const char* f (int a)
{
	if (a==10)
		return "it is ten";
	else
		return "it is not ten";
};

With optimization enabled, GCC 4.8 for x86 manages to use CMOVcc here as well.
Listing: Optimizing GCC 4.8

.LC0:
	.string "it is ten"
.LC1:
	.string "it is not ten"
f:
.LFB0:
; compare input value with 10
	cmp DWORD PTR [esp+4], 10
	mov edx, OFFSET FLAT:.LC1 ; "it is not ten"
	mov eax, OFFSET FLAT:.LC0 ; "it is ten"
; if comparison result is Not Equal, copy EDX value to EAX
; if not, do nothing
	cmovne eax, edx
	ret

Conclusion

With optimization enabled, compilers avoid conditional jumps wherever they can.
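A programmer can push in the same direction by hand. One way, sketched below under the assumption that both strings may live in a small table, is to index the table with the result of the comparison. The name `f_lookup` is made up, and whether this actually beats the CMOVcc version is something to measure rather than assume.

```c
/* Select the string by indexing a table instead of branching on a == 10. */
const char *f_lookup(int a)
{
    static const char *const messages[2] = { "it is not ten", "it is ten" };
    return messages[a == 10]; /* the comparison evaluates to 0 or 1 */
}
```

The comparison is still there, but it now produces a value (typically through SETcc on x86) rather than steering control flow.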

Getting minimum and maximum values

32-bit

Program

int my_max(int a, int b)
{
	if (a>b)
		return a;
	else
		return b;
};
int my_min(int a, int b)
{
	if (a<b)
		return a;
	else
		return b;
};

Listing: Non-optimizing MSVC 2013

_a$ = 8
_b$ = 12
_my_min PROC
	push ebp
	mov ebp, esp
	mov eax, DWORD PTR _a$[ebp]
; compare A and B:
	cmp eax, DWORD PTR _b$[ebp]
; jump, if A is greater or equal to B:
	jge SHORT $LN2@my_min
; reload A to EAX if otherwise and jump to exit
	mov eax, DWORD PTR _a$[ebp]
	jmp SHORT $LN3@my_min
	jmp SHORT $LN3@my_min ; this is redundant JMP
$LN2@my_min:
; return B
	mov eax, DWORD PTR _b$[ebp]
$LN3@my_min:
	pop ebp
	ret 0
_my_min ENDP
_a$ = 8
_b$ = 12
_my_max PROC
	push ebp
	mov ebp, esp
	mov eax, DWORD PTR _a$[ebp]
; compare A and B:
	cmp eax, DWORD PTR _b$[ebp]
; jump if A is less or equal to B:
	jle SHORT $LN2@my_max
; reload A to EAX if otherwise and jump to exit
	mov eax, DWORD PTR _a$[ebp]
	jmp SHORT $LN3@my_max
	jmp SHORT $LN3@my_max ; this is redundant JMP
$LN2@my_max:
; return B
	mov eax, DWORD PTR _b$[ebp]
$LN3@my_max:
	pop ebp
	ret 0
_my_max ENDP

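The two functions differ only in the conditional jump: the first uses JGE (Jump if Greater or Equal), the second JLE (Jump if Less or Equal). Each function also contains one redundant JMP, presumably left in by MSVC by mistake.

**Branchless code**
The Thumb-mode code Keil generates resembles the x86 code.

Listing: Optimizing Keil 6/2013 (Thumb mode)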

my_max PROC
; R0=A
; R1=B
; compare A and B:
	CMP r0,r1
; branch if A is greater than B:
	BGT |L0.6|
; otherwise (A<=B) return R1 (B):
	MOVS r0,r1
|L0.6|
; return
	BX lr
	ENDP
my_min PROC
; R0=A
; R1=B
; compare A and B:
	CMP r0,r1
; branch if A is less than B:
	BLT |L0.14|
; otherwise (A>=B) return R1 (B):
	MOVS r0,r1
|L0.14|
; return
	BX lr
	ENDP

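The two functions differ in the branch instruction: one uses BGT, the other BLT.
In ARM mode the compiler can use conditionally executed (predicated) instructions, which makes the code shorter still. Here Keil compiles the condition with MOVcc.

Listing: Optimizing Keil 6/2013 (ARM mode)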

my_max PROC
; R0=A
; R1=B
; compare A and B:
	CMP r0,r1
; return B instead of A by placing B in R0
; this instruction will trigger only if A<=B (hence, LE - Less or Equal)
; if instruction is not triggered (in case of A>B), A is still in R0 register
	MOVLE r0,r1
	BX lr
	ENDP
my_min PROC
; R0=A
; R1=B
; compare A and B:
	CMP r0,r1
; return B instead of A by placing B in R0
; this instruction will trigger only if A>=B (hence, GE - Greater or Equal)
; if instruction is not triggered (in case of A<B), A value is still in R0 register
	MOVGE r0,r1
	BX lr
	ENDP

With optimization enabled, both GCC 4.8.1 and MSVC 2013 can use CMOVcc, the x86 analogue of ARM's MOVcc.

Listing: Optimizing MSVC 2013

my_max:
	mov edx, DWORD PTR [esp+4]
	mov eax, DWORD PTR [esp+8]
; EDX=A
; EAX=B
; compare A and B:
	cmp edx, eax
; if A>=B, load A value into EAX
; the instruction idle if otherwise (if A<B)
	cmovge eax, edx
	ret
my_min:
	mov edx, DWORD PTR [esp+4]
	mov eax, DWORD PTR [esp+8]
; EDX=A
; EAX=B
; compare A and B:
	cmp edx, eax
; if A<=B, load A value into EAX
; the instruction idle if otherwise (if A>B)
	cmovle eax, edx
	ret

64-bit

Program

#include <stdint.h>
int64_t my_max(int64_t a, int64_t b)
{
	if (a>b)
		return a;
	else
		return b;
};
int64_t my_min(int64_t a, int64_t b)
{
	if (a<b)
		return a;
	else
		return b;
};

The compiled code shuffles data around needlessly, but what it does is easy to follow.

Listing: Non-optimizing GCC 4.9.1 ARM64

my_max:
	sub sp,sp,#16
	str x0,[sp,8]
	str x1,[sp]
	ldr x1,[sp,8]
	ldr x0,[sp]
	cmp x1,x0
	ble .L2
	ldr x0,[sp,8]
	b .L3
.L2:	
	ldr x0,[sp]
.L3:
	add sp,sp,#16
	ret
my_min:
	sub sp,sp,#16
	str x0,[sp,8]
	str x1,[sp]
	ldr x1,[sp,8]
	ldr x0,[sp]
	cmp x1,x0
	bge .L5
	ldr x0,[sp,8]
	b .L6
.L5:
	ldr x0,[sp]
.L6:
	add sp,sp,16
	ret

Branchless code
Since the arguments already sit in registers, there is no need to access them through the stack.

Listing: Optimizing GCC 4.9.1 x64

my_max:
; RDI=A
; RSI=B
; compare A and B:
	cmp rdi, rsi
; prepare B in RAX for return:
	mov rax, rsi
; if A>=B, put A (RDI) in RAX for return.
; this instruction is idle if otherwise (if A<B)
	cmovge rax, rdi
	ret
my_min:
; RDI=A
; RSI=B
; compare A and B:
	cmp rdi, rsi
; prepare B in RAX for return:
	mov rax, rsi
; if A<=B, put A (RDI) in RAX for return.
; this instruction is idle if otherwise (if A>B)
	cmovle rax, rdi
	ret

MSVC 2013 does almost the same. ARM64 has the CSEL instruction, which plays the role of MOVcc in ARM and CMOVcc in x86 under a different name: "Conditional SELect".

Listing: Optimizing GCC 4.9.1 ARM64

my_max:
; X0=A
; X1=B
; compare A and B:
	cmp x0, x1
; select X0 (A) to X0 if X0>=X1 or A>=B (Greater or Equal)
; select X1 (B) to X0 if A<B
	csel x0, x0, x1, ge
	ret
my_min:
; X0=A
; X1=B
; compare A and B:
	cmp x0, x1
; select X0 (A)  to X0 if X0<=X1 or A<=B (Less or Equal)
; select X1 (B) to X0 if A>B
	csel x0, x0, x1, le
	ret
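What CSEL and CMOVcc do can be imitated in portable C with a bit mask. The following is a sketch with invented names (`select64`, `my_max_sel`, `my_min_sel`); it assumes the usual two's complement conversion from uint64_t back to int64_t, which is implementation-defined in older C standards but is what mainstream compilers do.

```c
#include <stdint.h>

/* Returns x when take_x is 1 and y when take_x is 0, without an if statement. */
static int64_t select64(int take_x, int64_t x, int64_t y)
{
    uint64_t mask = 0u - (uint64_t)take_x; /* 1 -> all ones, 0 -> all zeros */
    return (int64_t)(((uint64_t)x & mask) | ((uint64_t)y & ~mask));
}

/* Same conditions as "csel x0, x0, x1, ge" and "csel x0, x0, x1, le". */
static int64_t my_max_sel(int64_t a, int64_t b) { return select64(a >= b, a, b); }
static int64_t my_min_sel(int64_t a, int64_t b) { return select64(a <= b, a, b); }
```

An optimizing compiler will usually produce CSEL or CMOVcc from the plain if/else version anyway, so this form is mainly useful for seeing what the instruction computes.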

MIPS

Unfortunately, GCC 4.4.5 for MIPS is less clever.
Listing: Optimizing GCC 4.4.5 (IDA)

my_max:
; set $v1 if $a1<$a0, or clear otherwise (if $a1>=$a0):
	slt $v1, $a1, $a0
; jump, if $v1 is 0 (or $a1>=$a0):
	beqz $v1, locret_10
; this is branch delay slot
; prepare $a1 in $v0 in case of branch triggered:
	move $v0, $a1
; no branch triggered, prepare $a0 in $v0:
	move $v0, $a0
locret_10:
	jr $ra
	or $at, $zero ; branch delay slot, NOP
; the min() function is same, but input operands in SLT instruction are swapped:
my_min:
	slt $v1, $a0, $a1
	beqz $v1, locret_28
	move $v0, $a1
	move $v0, $a0
locret_28:
	jr $ra
	or $at, $zero ; branch delay slot, NOP

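Note the branch delay slots: the first MOVE executes "before" BEQZ takes effect, that is, regardless of whether the branch is taken, while the second MOVE executes only when the branch is not taken.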
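The order of effects in my_max can be spelled out as a step-by-step C model. This is a sketch with a made-up function name; the variables are named after the MIPS registers they stand for.

```c
/* Model of the MIPS my_max listing, one statement per instruction. */
int my_max_mips_model(int a0, int a1)
{
    int v1 = (a1 < a0); /* slt  $v1, $a1, $a0 */
    int v0 = a1;        /* move $v0, $a1: delay slot, runs whether or not BEQZ branches */
    if (v1 != 0) {      /* beqz $v1, locret_10: falls through only when $v1 != 0 */
        v0 = a0;        /* move $v0, $a0 */
    }
    return v0;          /* jr $ra */
}
```

When a1 >= a0 the branch is taken and the value written in the delay slot is the result; otherwise the second MOVE overwrites it with a0.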

Conclusion

Conditional jumps generally take the following shape.

x86

Listing: x86

CMP register, register/value
Jcc true ; cc=condition code
false:
... some code to be executed if comparison result is false ...
JMP exit
true:
... some code to be executed if comparison result is true ...
exit:

ARM

Listing: ARM

CMP register, register/value
Bcc true ; cc=condition code
false:
... some code to be executed if comparison result is false ...
B exit
true:
... some code to be executed if comparison result is true ...
exit:

MIPS

Listing: Branch if zero

BEQZ REG, label
...

Listing: Branch if less than zero

BLTZ REG, label
...

Listing: Branch if equal

BEQ REG1, REG2, label
...

Listing: Branch if not equal

BNE REG1, REG2, label
...

Listing: Branch if the first value is less than the second (signed)

SLT REG1, REG2, REG3
BNEZ REG1, label ; REG1 is 1 when REG2 < REG3
...

Listing: Branch if the first value is less than the second (unsigned)

SLTU REG1, REG2, REG3
BNEZ REG1, label ; REG1 is 1 when REG2 < REG3 (unsigned)
...

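Branchless

If the conditional statement is very short, the compiler may use conditionally executed instructions:

MOVcc when compiling for ARM mode.
CSEL when compiling for ARM64.
CMOVcc when compiling for x86.

#### ARM
In ARM mode the compiler may replace conditional branches with conditionally executed instructions.
Listing: ARM (ARM mode)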

CMP register, register/value
instr1_cc ; some instruction will be executed if condition code is true
instr2_cc ; some other instruction will be executed if other condition code is true
... etc ...

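Any number of conditionally executed instructions may follow, as long as none of them modifies the flags.
Thumb mode has the IT instruction, which attaches conditions to up to four following instructions, each running either when the condition is true or when it is false.

Listing: ARM (Thumb mode)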

CMP register, register/value
ITEEE EQ ; set these suffixes: if-then-else-else-else
instr1	;instruction will be executed if condition is true
instr2	;instruction will be executed if condition is false
instr3	;instruction will be executed if condition is false
instr4	;instruction will be executed if condition is false
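The then/else pattern of an IT block can be modelled very simply in C. The sketch below only illustrates the semantics; real hardware encodes the pattern in a mask field rather than a string, and the name `it_executes` is invented.

```c
#include <stdbool.h>

/*
 * pattern holds one letter per instruction in the IT block:
 * 'T' (then) or 'E' (else). ITEEE corresponds to "TEEE"; the first letter is always 'T'.
 * Returns true when instruction number idx (0-based) would execute.
 */
static bool it_executes(const char *pattern, int idx, bool cond_true)
{
    return (pattern[idx] == 'T') ? cond_true : !cond_true;
}
```

For "ITEEE EQ" with the Z flag set, only instr1 runs; with Z clear, instr2 through instr4 run instead.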