In this exercise, you will define the syntax and execution model for a small language inspired by the x86-64 assembly language.
The goal is to formalize how code is written, parsed, and executed within a constrained environment.
Your assembly code will be written inside a special construct:
let program := assemble!(
-- assembly code here
)
This construct must produce a program, which simulates a function that executes the assembly code and returns a value.
This program is then invoked using the following syntax, where a, b, c, etc. are arguments:
let result := program(a, b, c, ...)
All arguments and the return value are 64-bit signed integers. A maximum of 6 arguments is allowed.
The return value and all arguments are passed in registers. Your language must support the following registers:
| register | role |
|---|---|
| rdi | 1st argument |
| rsi | 2nd argument |
| rdx | 3rd argument |
| rcx | 4th argument |
| r8 | 5th argument |
| r9 | 6th argument |
| rax | return value |
Registers start with a default value of 0, unless they are used to pass arguments to the function.
For example, in the following call, every register has a value of 0 except rdi, which is initialized with 10, and rsi, which is initialized with -20:
program(10, -20)
The return value of the program is always stored in rax.
Register names are case-insensitive.
This means that rax, RAX and rAx all refer to the same register.
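The register rules above can be sketched in a few lines of Python. This is only an illustration of the initial register state, not a prescribed implementation; the names `ARG_REGISTERS`, `init_registers`, and `normalize_register` are hypothetical and do not come from the exercise.

```python
# Argument registers in calling order, as listed in the register table.
ARG_REGISTERS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]

def init_registers(*args):
    """Return the initial register values for a call with the given arguments."""
    if len(args) > len(ARG_REGISTERS):
        raise ValueError("at most 6 arguments are allowed")
    # Every register, including rax, starts at 0.
    registers = {name: 0 for name in ARG_REGISTERS + ["rax"]}
    # Arguments overwrite the default in order: rdi, rsi, rdx, ...
    for name, value in zip(ARG_REGISTERS, args):
        registers[name] = value
    return registers

def normalize_register(name):
    """Register names are case-insensitive, so look them up in lower case."""
    return name.lower()

regs = init_registers(10, -20)
print(regs["rdi"], regs["rsi"], regs[normalize_register("RAX")])  # 10 -20 0
```

Storing names in lower case and normalizing on lookup is one simple way to make rax, RAX and rAx resolve to the same entry.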
Each line in the assembly code takes one of two forms: a label or an instruction.
Unless modified by some instruction, the execution flow of the program proceeds linearly: each line is fully executed before the next line starts.
Labels have the following syntax:
<label>:
They do not alter the value of any register or have any effect on the program other than marking specific places of the code so they can be used by instructions.
Labels are case-sensitive.
This means that Start, start and sTart are all different labels.
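One possible way to handle labels is a first pass that records, for each label, the index of the instruction that follows it. The sketch below is in Python and rests on assumptions the exercise does not state: `collect_labels` is a hypothetical helper, blank lines are skipped, and a line is treated as a label exactly when it ends with a colon.

```python
def collect_labels(lines):
    """Map each label name to the index of the instruction that follows it."""
    labels = {}
    instructions = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        if line.endswith(":"):
            # Label names are stored verbatim, which keeps them case-sensitive.
            labels[line[:-1]] = len(instructions)
        else:
            instructions.append(line)
    return labels, instructions

labels, instructions = collect_labels(["Start:", "mov rax, 1", "start:", "add rax, 2"])
print(labels)        # {'Start': 0, 'start': 1}
print(instructions)  # ['mov rax, 1', 'add rax, 2']
```

Because labels are removed from the instruction list, they have no effect on execution beyond naming a position.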
Most instructions have the following syntax:
<opcode> <destination>, <source>
The opcode indicates the operation being executed, using the values of the destination and source operands.
The result of the operation is stored in the destination operand.
For example, the instruction add sums the values in the destination and the source operands and stores this result in the destination operand.
The destination operand is always a register, whereas the source operand may be a register or a literal number.
This is a list of instructions your program must support:
| opcode | operation performed |
|---|---|
| mov | destination := source |
| add | destination := destination + source |
| sub | destination := destination - source |
| mul | destination := destination * source |
| div | destination := destination / source |
| xor | destination := destination ^^^ source |
| and | destination := destination &&& source |
| or | destination := destination ||| source |
| shl | destination := destination <<< source |
| shr | destination := destination >>> source |
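The table of two-operand instructions maps naturally onto a dispatch table. The Python sketch below is illustrative and makes several assumptions the exercise leaves open: results wrap around to the signed 64-bit range on overflow, div truncates toward zero, shr preserves the sign, and shift counts are taken modulo 64. The names `OPERATIONS`, `to_signed64`, `div_trunc`, and `execute_binary` are hypothetical.

```python
MASK = (1 << 64) - 1

def to_signed64(value):
    """Wrap a Python int into the signed 64-bit range (assumed overflow rule)."""
    value &= MASK
    return value - (1 << 64) if value >= (1 << 63) else value

def div_trunc(a, b):
    # Assumption: division truncates toward zero. Dividing by zero raises
    # ZeroDivisionError here; the exercise does not say what should happen.
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

OPERATIONS = {
    "mov": lambda dst, src: src,
    "add": lambda dst, src: dst + src,
    "sub": lambda dst, src: dst - src,
    "mul": lambda dst, src: dst * src,
    "div": div_trunc,
    "xor": lambda dst, src: dst ^ src,
    "and": lambda dst, src: dst & src,
    "or": lambda dst, src: dst | src,
    # Assumption: shift counts are reduced modulo 64.
    "shl": lambda dst, src: dst << (src & 63),
    # Assumption: shr is an arithmetic shift, so the sign is preserved.
    "shr": lambda dst, src: dst >> (src & 63),
}

def execute_binary(opcode, dst, src):
    """Return the new destination value for a two-operand instruction."""
    return to_signed64(OPERATIONS[opcode.lower()](dst, src))

print(execute_binary("add", 2, 3))          # 5
print(execute_binary("DIV", -7, 2))         # -3
print(execute_binary("add", 2**63 - 1, 1))  # -9223372036854775808
```

Lower-casing the opcode before the lookup is what makes add, ADD and aDd equivalent in this sketch.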
In addition to these, your program must support the two-operand cmp instruction.
This instruction does not modify any register, but instead compares the destination and the source operands and sets an internal state of the program to one of three options:
- destination > source
- destination == source
- destination < source
How you keep track internally of this state is up to you.
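Since the representation of the comparison state is left open, one option is a three-valued enumeration. The Python sketch below is just one such choice; `CmpState` and `compare` are hypothetical names.

```python
from enum import Enum

class CmpState(Enum):
    GREATER = "destination > source"
    EQUAL = "destination == source"
    LESS = "destination < source"

def compare(destination, source):
    """Return the state that cmp would record; no register is modified."""
    if destination > source:
        return CmpState.GREATER
    if destination == source:
        return CmpState.EQUAL
    return CmpState.LESS

print(compare(3, 5).name)   # LESS
print(compare(5, 5).name)   # EQUAL
print(compare(-1, -2).name) # GREATER
```

An interpreter would keep the most recent result of `compare` alongside the registers and consult it later.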
There are some instructions that alter the flow of the program, transferring execution to another point of the code. They are called jumping instructions.
They all take just one operand, which is a label that indicates the target of the jump, i.e., the point of the code where execution will continue:
<opcode> <label>
The jmp instruction always makes the jump to the target label.
Other jumping instructions make the jump only if the internal state of the program is set to a specific value.
This means they are usually preceded by a cmp instruction.
These are the jumping instructions your program must support:
| instruction | operation performed |
|---|---|
| jmp | unconditional jump. The jump is always performed |
| je | jumps if the internal state is equal |
| jl | jumps if the internal state is less than |
| jg | jumps if the internal state is greater than |
Instruction opcodes, be they two-operand or one-operand, are case-insensitive.
This means that add, ADD and aDd all refer to the same instruction.
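To show how the pieces fit together, here is a minimal end-to-end interpreter sketch in Python. It is not the intended solution to the exercise, only an illustration of the execution model, and it relies on assumptions the text does not fix: results wrap to signed 64 bits, div truncates toward zero, shr preserves the sign, shift counts are taken modulo 64, literals are decimal integers, and a conditional jump before any cmp is not taken. The function `run` and all helper names are hypothetical.

```python
MASK = (1 << 64) - 1
ARG_REGISTERS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]

def to_signed64(value):
    value &= MASK
    return value - (1 << 64) if value >= (1 << 63) else value

def div_trunc(a, b):
    q = abs(a) // abs(b)  # assumption: truncate toward zero
    return q if (a < 0) == (b < 0) else -q

BINARY_OPS = {
    "mov": lambda d, s: s, "add": lambda d, s: d + s,
    "sub": lambda d, s: d - s, "mul": lambda d, s: d * s,
    "div": div_trunc, "xor": lambda d, s: d ^ s,
    "and": lambda d, s: d & s, "or": lambda d, s: d | s,
    "shl": lambda d, s: d << (s & 63), "shr": lambda d, s: d >> (s & 63),
}
# Required comparison state per jump: -1 less, 0 equal, 1 greater, None always.
JUMP_CONDITIONS = {"jmp": None, "je": 0, "jl": -1, "jg": 1}

def run(source, *args):
    if len(args) > len(ARG_REGISTERS):
        raise ValueError("at most 6 arguments are allowed")
    registers = {name: 0 for name in ARG_REGISTERS + ["rax"]}
    registers.update(zip(ARG_REGISTERS, args))
    # First pass: separate labels from instructions.
    labels, program = {}, []
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            labels[line[:-1]] = len(program)  # label names stay case-sensitive
        else:
            parts = line.split(None, 1)
            operands = [p.strip() for p in parts[1].split(",")]
            program.append((parts[0].lower(), operands))
    # Second pass: execute line by line until the end of the program.
    state, pc = None, 0
    while pc < len(program):
        opcode, operands = program[pc]
        pc += 1
        if opcode in JUMP_CONDITIONS:
            wanted = JUMP_CONDITIONS[opcode]
            if wanted is None or state == wanted:
                pc = labels[operands[0]]
            continue
        dst, src_text = operands[0].lower(), operands[1].lower()
        src = registers[src_text] if src_text in registers else int(src_text)
        if opcode == "cmp":
            state = (registers[dst] > src) - (registers[dst] < src)
        else:
            registers[dst] = to_signed64(BINARY_OPS[opcode](registers[dst], src))
    return registers["rax"]

SUM_TO_N = """
    mov rax, 0
loop:
    cmp rdi, 0
    je done
    ADD rax, RDI
    sub rdi, 1
    jmp loop
done:
"""
print(run(SUM_TO_N, 4))  # 10
```

The two-pass structure means a jump can target a label that appears later in the code, such as `done` in the example.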