In this tutorial we will learn how to write a nanoprocessor instruction set.
Part I. Syntax and machine code
Suppose we want to create a nanoprocessor performing XOR, MOVR, STA, LDA, IN and OUT operations with the following semantics:
LDA n | saves the value `n' into the accumulator |
STA addr | saves the content of the accumulator into the memory at address `addr' |
MOVR addr1, addr2 | saves the value stored at address `addr2' into the memory at address `addr1' |
XOR addr1, addr2 | performs the xor operation between the values stored at addresses `addr1' and `addr2' and saves the result into the accumulator |
IN | reads a number from the input stream and saves it into the accumulator |
OUT | writes the content of the accumulator into the output stream |
Our nanoprocessor will access ACCUMULATOR (processor register) and a linear memory.
Now let us create a file with the description of the nanoprocessor in nanoassembler, i.e., instruction.def.
First, we can specify the name of the nanoprocessor by writing
#name "XOR_MACHINE"
This directive is optional.
It is necessary to specify the length of instructions in bits by the `#instrlen' directive
#instrlen 20
Some constants (macros to be more precise) will be put to use. These can be defined by the `#define' directive
#define regaddr_length 8
Before we can write definitions of instructions, we have to know their format in machine code. Consider the following:
sta a: 0111xxxxxxxxaaaaaaaa
lda n: 0011xxxxxxxxnnnnnnnn
movr b, a: 0001bbbbbbbbaaaaaaaa
xor b, a: 1111bbbbbbbbaaaaaaaa
in: 1011xxxxxxxxxxxxxxxx
out: 1001xxxxxxxxxxxxxxxx
For example, the sequence 0111xxxxxxxxaaaaaaaa describes an instruction twenty bits long with one parameter `a'. This parameter is eight bits long and is stored in the lowest eight bits of the instruction. The next eight bits (the sequence of letters `x') are so-called `don't care' bits, that is, bits with an unspecified value. The sequence 0111 is the opcode (operation code) of the instruction.
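To make the bit layout concrete, here is a minimal C sketch that pulls the opcode and the parameters out of a 20-bit instruction word. The function names are hypothetical and are not part of nanoassembler or nsim. We assume the word is held in the low 20 bits of a 32-bit integer.

```c
#include <stdint.h>

/* Field widths taken from the format table above. */
#define OPCODE_BITS 4
#define PARAM_BITS  8

/* Opcode: the top four bits of the 20-bit word (bits 16..19). */
static unsigned decode_opcode(uint32_t word) {
    return (word >> (2 * PARAM_BITS)) & ((1u << OPCODE_BITS) - 1u);
}

/* First parameter (`b' in movr/xor): bits 8..15. */
static unsigned decode_b(uint32_t word) {
    return (word >> PARAM_BITS) & ((1u << PARAM_BITS) - 1u);
}

/* Last parameter (`a' or `n'): the lowest eight bits. */
static unsigned decode_a(uint32_t word) {
    return word & ((1u << PARAM_BITS) - 1u);
}
```

Under these assumptions, `xor 3, 5' corresponds to the word 0xF0305: decode_opcode gives 0xF, decode_b gives 3 and decode_a gives 5.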
In nanoassembler, instructions may be defined by the `#define' directive
#define sta $a :0111 $x[regaddr_length] $a[regaddr_length]
#define xor $b, $a :1111 $b[regaddr_length] $a[regaddr_length]
On the left side (preceding the character `:') is the format of the instruction as it will appear in a nanoprogram. On the right side is the machine code specification. Tokens $a and $b represent instruction parameters. $b[regaddr_length] $a[regaddr_length] specify the length and position of both parameters in the machine code.
Token $x describes `don't care' bits.
It is possible to write more `wild' definitions (if they make sense) like
#define wild $a, $b, $c :0 $a[11] 10 $b[1] $c[4] $x[1]
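The following C sketch shows how such a definition could be turned into a 20-bit word. It assumes that tokens are laid out from the most significant bit to the least significant bit, as in the `sta' example, and that `don't care' bits are emitted as 0. The second point is only an assumption, since the tutorial does not say which value the assembler picks. The helper names are hypothetical.

```c
#include <stdint.h>

/* Appends the lowest `width' bits of `value' to the low end of `word'. */
static uint32_t append_bits(uint32_t word, uint32_t value, unsigned width) {
    uint32_t mask = (1u << width) - 1u;
    return (word << width) | (value & mask);
}

/* Encodes  wild $a, $b, $c :0 $a[11] 10 $b[1] $c[4] $x[1]  */
static uint32_t encode_wild(uint32_t a, uint32_t b, uint32_t c) {
    uint32_t w = 0;
    w = append_bits(w, 0x0, 1);  /* literal bit 0 */
    w = append_bits(w, a, 11);   /* $a[11] */
    w = append_bits(w, 0x2, 2);  /* literal bits 10 */
    w = append_bits(w, b, 1);    /* $b[1] */
    w = append_bits(w, c, 4);    /* $c[4] */
    w = append_bits(w, 0x0, 1);  /* $x[1]: don't care, 0 by assumption */
    return w;
}
```

The widths add up to 1 + 11 + 2 + 1 + 4 + 1 = 20 bits, which matches the `#instrlen 20' directive.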
Back to our sample nanoprocessor. The syntax of the XOR instruction has been described. Definitions of remaining instructions follow
#define movr $b, $a :0001 $b[regaddr_length] $a[regaddr_length]
#define sta $a :0111 $x[regaddr_length] $a[regaddr_length]
#define lda $n :0011 $x[regaddr_length] $n[regaddr_length]
#define in :1011 $x[16]
#define out :1001 $x[16]
Now we have the machine code of our instruction set defined. If we only wanted to compile programs, without interpreting or debugging them, we could stop here.
Part II. Semantics
If we want to interpret or debug our nanoprograms, the instruction semantics still has to be written.
Before we proceed with the tutorial, we have to understand how nsim works.
The whole semantics of instructions is written in the C language. If we want
to define variables, macros, C directives, the initial part of the interpreter
and instruction semantics, we use a powerful feature of nanoassembler:
the embedded C code. There are two ways to use it:
`#define {embedded C code}'
to define variables, macros, C directives and
the initial part of the interpreter.
`#define instruction $param : opcode $param[length] {embedded C code}'
to define the semantics of an instruction.
Prior to nanoprogram interpretation, nsim inserts the embedded C code into
an interpreter template creating a custom interpreter. The interpreter is then
used to simulate and/or debug the nanoprogram.
Note: There are three possible forms of the `#define' directive. The first form
#define my_macro 0xfe, const+1
defines the macro `my_macro', which can be used anywhere in the code. This form is often used, e.g., to simplify or parametrize the notation of instruction definitions (see `regaddr_length' in the previous text).
The second form
#define {#include<sys/stat.h>}
#define {#define my_const 0xfe}
#define {char* title;}
instructs nsim to include the C code enclosed in braces into the template of the generated custom interpreter. A `#define{...commands...}' containing C commands is treated as the initial part of the interpreter.
The third form
#define sta $a :0111 $x[const_length] $a[const_length] {\
areg = &(memory[(code1 & CONST_MASK)]);\
*areg = *acc;\
ip ++;\
}
is used to define the semantics of instructions.
NOTE: Each line of the C block with multiple lines MUST be ended by the backslash character `\'.
As we have mentioned before, our instruction set will use the special register ACCUMULATOR and a memory array. First, we should define some helpful constants and include needed libraries.
#define{#include<stdio.h>}
#define{#include<sys/types.h>}
#define {#define ACCUMULATOR 256}
#define {#define MEMORY_SIZE 256}
#define {#define CONST_MASK 0xff}
#define {#define REGADDR_MASK 0xff}
#define {#define REGADDR_SHIFT 0x8}
Let us create a memory of the size MEMORY_SIZE, accumulator and two pointers into the memory.
#define {\
u_int8_t memory[MEMORY_SIZE + 1];\
u_int8_t *acc = &(memory[ACCUMULATOR]);\
u_int8_t *areg = NULL;\
u_int8_t *breg = NULL;\
}
Notice that the accumulator points into the memory at address 256.
Let us write the semantics of the XOR instruction. We will extend the basic instruction definitions that we have described in Part I:
#define xor $b, $a :1111 $b[regaddr_length] $a[regaddr_length]
The code describing the behavior of the instruction has to be enclosed in braces `{}' and appended to the basic description.
#define xor $b, $a :1111 $b[regaddr_length] $a[regaddr_length] {\
breg = &(memory[((code1 >> REGADDR_SHIFT) & REGADDR_MASK)]);\
areg = &(memory[(code1 & REGADDR_MASK)]);\
*acc = (*areg ^ *breg);\
ip ++;\
}
There is an automatic variable `code1' holding the binary representation of the current instruction. Therefore the easiest way to get the value of the parameter `a' is to take the lowest eight bits of `code1'. The pointer `areg' is then set to the memory cell at address `a'.
areg = &(memory[(code1 & REGADDR_MASK)]);\
Note that we have previously set the C macro REGADDR_MASK to the value 0xff.
The following command sets the pointer `breg' to the memory cell at address `b'.
breg = &(memory[((code1 >> REGADDR_SHIFT) & REGADDR_MASK)]);\
The next command performs the xor operation and stores the result in the accumulator:
*acc = (*areg ^ *breg);\
And what is the purpose of the command `ip ++;\'? The variable `ip' is an automatic variable representing the so-called `instruction pointer'. The command `ip ++;\' therefore instructs the interpreter to continue with the instruction that directly follows in the nanoprogram.
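To try the XOR semantics outside nsim, the same statements can be wrapped in an ordinary C function. This is only a sketch: `exec_xor' is a hypothetical name, `ip' is a plain global standing in for nsim's automatic variable, and `uint8_t' from <stdint.h> replaces `u_int8_t' for portability.

```c
#include <stdint.h>

#define ACCUMULATOR   256
#define MEMORY_SIZE   256
#define REGADDR_MASK  0xff
#define REGADDR_SHIFT 0x8

/* 256 memory cells plus one extra cell holding the accumulator. */
static uint8_t memory[MEMORY_SIZE + 1];
static uint8_t *acc = &memory[ACCUMULATOR];
static unsigned ip = 0;  /* stand-in for nsim's automatic variable */

/* Executes one xor instruction given its 20-bit machine code. */
static void exec_xor(uint32_t code1) {
    uint8_t *breg = &memory[(code1 >> REGADDR_SHIFT) & REGADDR_MASK];
    uint8_t *areg = &memory[code1 & REGADDR_MASK];
    *acc = (uint8_t)(*areg ^ *breg);
    ip++;  /* continue with the next instruction */
}
```

For example, with the value 0x0F at address 3 and 0x3C at address 5, executing the word 0xF0305 (`xor 3, 5') leaves 0x33 in the accumulator and advances `ip' by one.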
NOTE: If we want to implement a simple jump instruction, we can define the following `jmp' instruction:
#define jmp $a :1101 $x[5] $a[jaddr_length] {\
ip = (code1 & JMP_MASK);\
}
In real applications, we usually develop nanoprocessors that perform input/output operations. It is necessary to include the corresponding instructions in the instruction set.
We need to implement IN and OUT instructions with semantics described in the beginning of Part I. The following code will do the trick
#define in :1011 $x[16] {\
if (feof(inp)) {\
fprintf(outp, "\n");\
if (inp != stdin) fclose(inp);\
if (outp != stdout) fclose(outp);\
force_finish = 1;\
} else fscanf(inp, "%hhu", acc);\
ip ++;\
}; reads a value from the input file
#define out :1001 $x[16] {\
fprintf(outp, "%d ", *acc);\
ip ++;\
}; writes the value of the ACCUMULATOR into the output file
The code is straightforward. The only new and interesting thing is the statement `force_finish = 1;\'. The `force_finish' variable is an automatic variable. If an instruction sets `force_finish' to 1, nsim finishes interpreting that instruction and then terminates the execution of the nanoprogram.
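The interplay of `ip', `code1' and `force_finish' can be pictured with a small fetch-and-execute loop. The real interpreter template of nsim is not shown in this tutorial, so the sketch below is only a guess at its general shape. The function `run', the array-based input and the handling of unknown opcodes are all hypothetical, and only the IN and STA instructions are modelled.

```c
#include <stddef.h>
#include <stdint.h>

#define ACCUMULATOR 256
#define MEMORY_SIZE 256

static uint8_t memory[MEMORY_SIZE + 1];
static uint8_t *acc = &memory[ACCUMULATOR];

/* Hypothetical stand-ins for nsim's automatic variables. */
static unsigned ip;
static int force_finish;

/* Runs `program' until it ends or an instruction sets force_finish.
   The array `input' plays the role of the input stream.
   Returns the number of instructions executed. */
static unsigned run(const uint32_t *program, size_t length,
                    const uint8_t *input, size_t input_length) {
    size_t next_input = 0;
    unsigned executed = 0;
    ip = 0;
    force_finish = 0;
    while (ip < length && !force_finish) {
        uint32_t code1 = program[ip];
        switch ((code1 >> 16) & 0xF) {
        case 0xB:  /* in: read a value, or stop when input is exhausted */
            if (next_input == input_length) force_finish = 1;
            else *acc = input[next_input++];
            ip++;  /* the rest of the instruction still runs */
            break;
        case 0x7:  /* sta a: store the accumulator at address a */
            memory[code1 & 0xff] = *acc;
            ip++;
            break;
        default:   /* opcode not modelled in this sketch: stop */
            force_finish = 1;
            break;
        }
        executed++;
    }
    return executed;
}
```

The flag is tested only at the top of the loop, so an instruction that sets `force_finish' still runs to its end, as described above.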
We went through the definitions of the XOR, JMP, IN and OUT instructions. The implementation of the remaining instructions is located in the file `instruction.def' that should accompany this tutorial.
Part III. Nanoprocessor parameters (interpreter options)
We have implemented IN and OUT instructions. Each of these instructions reads/writes a value from/into the input/output stream. Therefore we need to specify these two streams by some parameters. Let us use the parameter `-i' to specify the input file and the parameter `-o' to specify the output file.
All nanoprocessor parameters are passed by nsim to the nanoprocessor via the automatic variables `int argc' and `char *argv[]'. Their semantics is the same as in C programs.
Processing of parameters is done by the initial part of the interpreter:
#define {\
int i;\
int ai;\
int ao;\
FILE *inp, *outp;\
ai = ao = 0;\
inp = NULL;\
outp = NULL;\
for (i = 0; i <= MEMORY_SIZE; i++) memory[i] = 0;\
for (i = 0; i < argc; i++) {\
if (argv[i][0] == '-') {\
if (argv[i][1] == 'o') ao = 1;\
else if (argv[i][1] == 'i') ai = 1;\
else if (argv[i][1] == 'h'){\
printf("\nXOR machine interpreter syntax:\nnsim <program> [-n] -i [options]\n");\
printf("\t-n\tno debug (inputs are converted to outputs only)\n");\
printf("options:\n");\
printf("\t-h\tprints this help screen\n");\
printf("\t-i file\tdefines input stream (text of decimal numbers)\n");\
printf("\t-o file\tdefines output stream (text of decimal numbers)\n");\
force_finish=1;\
}\
} else {\
if (ao && outp == NULL) {\
outp = fopen(argv[i], "w");\
if (outp == NULL) {\
fprintf(stderr, "error: output file %s cannot be opened\n", argv[i]);\
}\
}\
if (ai && inp == NULL) {\
inp = fopen(argv[i], "r");\
if (inp == NULL) {\
fprintf(stderr, "error: input file %s cannot be opened\n", argv[i]);\
}\
}\
}\
}\
if (inp == NULL) inp = stdin;\
if (outp == NULL) outp = stdout;\
}
Once the instructions and the initial part of the interpreter are defined, it is possible to run the nanoprocessor with its specific parameters by typing
nsim sample_nanoprogram2 -n -i -i numbers.txt -o xor_numbers.txt
Which parameters belong to nsim and which belong to the nanoprocessor? Note that the text following the first `-i' parameter is passed by nsim to the nanoprocessor as its own command line. This implies that
-i numbers.txt -o xor_numbers.txt
is the command line of the nanoprocessor (interpreter options).
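The splitting rule can be sketched in C as follows. How nsim actually parses its command line is not shown in this tutorial, so the function below, with the hypothetical name `nanoprocessor_args_start', only illustrates the rule that everything after the first `-i' belongs to the nanoprocessor.

```c
#include <string.h>

/* Returns the index in argv of the first argument that belongs to the
   nanoprocessor, or argc when no `-i' is present. argv[0] is the name
   of the program and is skipped. */
static int nanoprocessor_args_start(int argc, char *argv[]) {
    int i;
    for (i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-i") == 0) return i + 1;
    }
    return argc;
}
```

For the command line above, the function returns 4, so the nanoprocessor receives the four remaining arguments, starting with the second `-i'.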
Part IV. Debugging directives
Displaying memory contents is an important part of debugging. However, it is necessary to specify the memory arrays that should be accessible via debugger commands. The nsim debugger supports three different linear memories.
The sample nanoprocessor includes only one memory. To let the debugger know about the memory, the following lines have to be put into the nanoassembler code
#define { #define PRINT_MEM memory}
#define { #define PRINT_TYPE u_int8_t}
#define { #define PRINT_MIN 0}
#define { #define PRINT_MAX MEMORY_SIZE}
It is recommended that any nanoprocessor registers, e.g., the ACCUMULATOR, be mapped into the memory. Note that our sample nanoprocessor has a memory of 256 items, but the corresponding definition (earlier in this tutorial) creates an array with 257 items. The extra item is the ACCUMULATOR. This code enables us to use debugger commands such as
print 1
print 10, 15
print ACCUMULATOR


