In hardware development, many aspects have to be taken into account, e.g., performance, complexity, scalability, design effort and price. The right strategy has to be selected on the basis of a thorough analysis of these criteria. Compared to software development, hardware design and development is a long-term process, and bug fixes consume far more resources. Hardware/software codesign is one of the modern hardware design approaches. It starts with a software solution of the task and moves the most demanding algorithms into hardware step by step, increasing the speed of the solution with each step.
Thus, there are two basic stages in this scenario: software design and hardware design. When developers switch from one to the other, there is a natural demand for similar principles and design methods. Algorithms should be described in a programming language and be easy to debug. These requirements are fulfilled by programmable hardware technology, which allows algorithms to be described in a specialized hardware-oriented programming language and the resulting code to be simulated. The typical implementation of a hardware algorithm is a state machine. Programmable hardware usually utilizes Field-Programmable Gate Array (FPGA) technology.
However, constructing state machines in a hardware-oriented programming language and debugging them in a simulation that displays signal values changing over time is not the way a programmer prefers to work. In open source systems, contributions from as many developers as possible are very desirable. At this point, we face the problem of development being too expensive, in terms of either the required programmer knowledge or the price of development tools. That is why Liberouter has introduced a new idea into programmable hardware technology. Complicated single-purpose state machines are replaced by simple specialized processors that represent a transition between state machines and full-featured processors and are therefore called nanoprocessors. This idea divides the task of hardware algorithm implementation into two parts. The first one is the nanoprocessor, which has to be capable of efficient nanoprogram execution. The second one is the development of the algorithm itself in the form of a program that runs on a nanoprocessor, a so-called nanoprogram.
The need for development tools
The introduction implies that HW/SW codesign projects are very broad: programming is needed on many levels.
Considering a project of HW/SW codesign for a PC platform, where the hardware implementation utilizes nanoprocessors, there are three basic development areas:
- PC programs,
- programmable hardware,
- nanoprograms.
There is no need to list all the development tools available for PCs. They are numerous; open source projects prefer C, C++ and Java. There are also many tools for developing designs for programmable hardware. The best-known hardware-oriented programming languages include VHDL and Verilog, and complex development environments exist for them.
By the definition above, a nanoprogram is a sequence of ones and zeros in the machine code of the given nanoprocessor. Of course, it is possible to create nanoprograms on this elementary level. However, practical work showed that this approach does not lead to satisfactory results. Errors occur, and each change in the instruction set forces large-scale patching of existing nanoprograms or their complete rewriting. Another disadvantage of this approach is that the interpretation of a nanoprogram requires a functioning version of the nanoprocessor (at least in the form of its source code).
That is why it was decided to build a development system for nanoprograms that would include:
- compiler,
- disassembler,
- debugger,
- interpreter.
The standard set of functions suggests an analogy with the development of PC programs. However, there is a difference that even strengthens the need for a development environment for nanoprograms. Nanoprograms are not developed for a ready-made platform; the target platform, the nanoprocessor, is developed in parallel. Independent development environments for nanoprograms and nanoprocessors enable easier debugging and let both programmers work in parallel without slowing each other down. What is more, the nanoprocessor designer gains early feedback from the programmer of the nanoprogram. This opens up space for tuning the instruction set, so the solution may be optimized well before the hardware design becomes fixed.
Let's sum up the reasons for a development system for nanoprograms:
- programmer's efficiency,
- cross-platform development: the target platform is not ready to use or is not available all the time,
- nanoprocessors are under construction, the instruction set is not final.
Other goals of the development system:
- concurrent development of nanoprograms and nanoprocessors,
- cooperation on the optimization of the final solution,
- independent debugging,
- programmer of the nanoprogram does not have to know the hardware implementation,
- development environment that suits the programmer.
Nanoprocessor Simulator: nsim
An important issue in the design of the development system is the decision whether to develop a separate system for each nanoprocessor or to try developing a generic one. The answer came when the need for genericity was recognized even in a system for a single nanoprocessor. The main reason is the changing instruction set. That is why a generic tool was designed: the Nanoprogram Simulator, nsim. This PC program is a complete development environment for nanoprograms.
Nanoassembler
Nanoassembler is a simple assembly-class programming language. This kind of programming language was chosen due to speed constraints and highly specialized instruction sets.
The use of a high-level programming language, e.g., C, would be possible for some nanoprocessors. However, a compiler from C to the nanoassembler would have to be written for each nanoprocessor. Moreover, nanoprograms would be too far removed from the specialized instruction set and would not exploit its speed.
Nanoassembler is based on standard assembler syntax and C-style directives. Here is an example:
#define L2_REG 0x60
#define L3_REG 0x61
#define VLAN_PRESENT 14, L3_REG
#define UNKN_ETH_PROTO 4, L2_REG
start:
clr L2_REG
clr L3_REG
movc 1
movr INDD, ACC
out 63, DRAMADDR
movr LC, MAC_ADDR_LENGTH
mac_addr:
outi DIN
loop mac_addr
movr ACC, DIN
cmp L2_VLAN_TYPE
jmpz l2_vlan
Apart from the standard constructions for instructions, labels and directives for conditional compilation, the Nanoassembler specification includes the instruction set description. The basic instruction definition consists of the mnemonic, a description of the operands and the machine code format. This is enough information for nsim to perform compilation. Before a nanoprogram can be simulated, the instruction definitions must be supplemented with semantics written in the C programming language.
Here is an example that demonstrates an instruction definition.
#define jmp $iadr :0001000 $iadr[11]{\
ip=code1&IADDR_MASK;\
};Jump to iadr
This code defines an unconditional jump. The mnemonic is jmp. There is exactly one operand, iadr. The occurrence of the operand on the left side describes the instruction syntax; the right side defines the machine code format. The colon is followed by the opcode of the instruction, then the operand and its length in bits. The optional part begins with the brace; it is the instruction semantics. The semantics of an unconditional jump is simple: the operand is assigned to ip (the instruction pointer). The ip is one of the automatic variables that are available for describing semantics. A backslash denotes that the current line continues on the next line. A semicolon begins a comment. The instruction jmp may be used anywhere in the Nanoassembler code that follows this example definition. The programmer can write, e.g., jmp 3. Such a line would be compiled into 000100000000000011, i.e., the 7-bit opcode 0001000 followed by the operand 3 in 11 bits.
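The encoding of jmp 3 can be reproduced in a few lines of C. The following is only a minimal sketch and not nsim code: the helper names encode_jmp and word_to_bits are hypothetical, and the sketch assumes that the opcode occupies the most significant bits of the instruction word, as the compiled example suggests.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Field widths taken from the jmp definition in the text:
 * a 7-bit opcode followed by an 11-bit operand. */
#define JMP_OPCODE      0x08u   /* binary 0001000 */
#define JMP_OPCODE_BITS 7
#define IADR_BITS       11
#define JMP_WORD_BITS   (JMP_OPCODE_BITS + IADR_BITS)

/* Hypothetical helper: pack opcode and operand into one word,
 * with the opcode in the most significant bits (an assumption). */
uint32_t encode_jmp(uint32_t iadr)
{
    assert(iadr < (1u << IADR_BITS));   /* operand must fit in 11 bits */
    return (JMP_OPCODE << IADR_BITS) | iadr;
}

/* Hypothetical helper: render the low `width` bits of `word` as a
 * string of '0' and '1', most significant bit first.
 * `out` must have room for width + 1 characters. */
void word_to_bits(uint32_t word, unsigned width, char *out)
{
    assert(width >= 1 && width <= 32);
    memset(out, '0', width);
    for (unsigned i = 0; i < width; i++) {
        if ((word >> (width - 1 - i)) & 1u)
            out[i] = '1';
    }
    out[width] = '\0';
}
```

With these helpers, word_to_bits(encode_jmp(3), JMP_WORD_BITS, buf) fills buf with 000100000000000011, the same bit string as in the text.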
nsim
The program nsim is a generic compiler, simulator and debugger. It processes source code written in Nanoassembler. This code must contain or refer to an instruction set definition. Depending on the command line parameters, nsim nanoprogram_file [options], either compilation or simulation of a nanoprogram is performed. The simulator consists of an interpreter and a debugger, which accepts gdb-style commands. The user interface is a command line; enhanced interfaces are currently under construction.
The development system generates outputs for:
- nanoprogram debugging,
- programmable hardware simulator (nanoprocessor development),
- target hardware.
This sequence is the logical order of stages in nanoprogram development. The first stage requires the most functionality from the development system. Nsim allows the algorithm to be tested on a PC with no additional software or hardware equipment. Debugging involves standard operations such as viewing memory contents, stepping, breakpoints, etc. In the second stage, the nanoprogram is compiled and used in programmable hardware simulators. At this point, the nanoprogram should already be debugged, so that it forms a reliable part of a testbench for the nanoprocessor. The nanoprogram and the nanoprocessor are now tested together. Note that the nanoprogram is again interpreted by software, this time not by nsim but by a simulator of programmable hardware working from the nanoprocessor (not nanoprogram) source code. The life cycle of a nanoprogram is completed in the third stage. Having been compiled by nsim, it is loaded into the target hardware and interpreted by the nanoprocessor (which has also been compiled by a relevant tool). Stages two and three are almost identical for nsim. Compilation is the major part and only the outputs differ slightly: raw binary code is generated for the target hardware, whilst programmable hardware simulators usually prefer text files.
Implementation of nsim
Nsim was designed for UNIX systems that conform to the POSIX standards. The program has been written in the ANSI C programming language and has been tested under BSD, Linux and Windows (where a UNIX emulation layer, e.g., cygwin, has to be used).
A nanoprogram is compiled in two passes. The first pass is carried out by the preprocessor; the second pass is the compilation itself.
The interpreter works with the compiled nanoprogram in the binary code of a nanoprocessor. Because nsim runs on a PC, the simulation process involves intermediate code interpretation. Genericity of the interpreter is provided by the following technique: nsim generates the variable parts of the interpreter (C code) from the instruction set definition, joins them with the ready-made parts, compiles the resulting interpreter (using the GNU gcc compiler) and finally uses this customized interpreter to execute the nanoprogram. All these actions are transparent to the user; the nsim interface refers to the user's nanoassembler source only.
More information on nsim is available in the CESNET technical report 5/2003.
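To illustrate how instruction semantics written in C can be joined with a ready-made interpreter core, here is a minimal sketch. It is not the actual nsim code: the switch-based dispatch, the function name run, the value of IADDR_MASK, the 18-bit instruction layout and the OP_HALT instruction are all assumptions made for this example. Only the jmp opcode and its semantics line come from the instruction definition shown earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IADDR_BITS 11
#define IADDR_MASK ((1u << IADDR_BITS) - 1u)   /* assumed: low 11 bits */

#define OP_JMP  0x08u   /* 0001000, from the jmp definition */
#define OP_HALT 0x7Fu   /* hypothetical opcode, invented for this sketch */

/* Ready-made part: fetch/decode loop. Runs until OP_HALT, an unknown
 * opcode, the end of the program, or max_steps executed instructions.
 * Returns the instruction pointer at which execution stopped. */
uint32_t run(const uint32_t *prog, size_t len, unsigned max_steps)
{
    uint32_t ip = 0;

    assert(prog != NULL);
    while (max_steps-- > 0 && ip < len) {
        uint32_t code1 = prog[ip];               /* fetch */
        uint32_t opcode = code1 >> IADDR_BITS;   /* decode */
        ip++;                                    /* default: next instruction */

        switch (opcode) {
        /* Variable part: one case per instruction, with the body
         * copied from the semantics in the instruction definition. */
        case OP_JMP:
            ip = code1 & IADDR_MASK;
            break;
        /* End of variable part. */
        case OP_HALT:
        default:
            return ip - 1;   /* stop at the halting or unknown instruction */
        }
    }
    return ip;
}
```

In this sketch only the case bodies depend on the instruction set, so a change to the instruction definitions would regenerate that part and leave the loop untouched.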
Evaluation
Nsim solves the task of nanoprogram simulation in a smart way. The instruction semantics is described in the C programming language, which is well known among software developers, and nsim itself is written in C, too. Because both use the same language, the semantics can be compiled directly into the interpreter, which yields the efficient solution described above. Simulation of one instruction of a nanoprogram requires approximately 100 instructions of the host machine. In other words, assuming roughly one host instruction per clock cycle, nsim on a 2 GHz PC simulates a 20 MHz nanoprocessor in real time.
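The estimate above is a simple division, sketched below in C. The function name simulated_hz is hypothetical, and the calculation assumes that the host executes about one instruction per clock cycle and that the nanoprocessor executes one nanoinstruction per cycle; neither assumption is stated in the measurements.

```c
#include <assert.h>

/* Estimated clock rate (Hz) of a nanoprocessor that nsim can simulate in
 * real time. Assumes one host instruction per host clock cycle and one
 * nanoinstruction per nanoprocessor clock cycle. */
double simulated_hz(double host_hz, double host_instr_per_nanoinstr)
{
    assert(host_instr_per_nanoinstr > 0.0);
    return host_hz / host_instr_per_nanoinstr;
}
```

For a 2 GHz host and 100 host instructions per nanoinstruction, simulated_hz(2e9, 100.0) gives 2e7, i.e., 20 MHz.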
Fast simulation opens up new possibilities. For example, nsim with extended versions of nanoprograms may be used for a part of the software implementation of the algorithm; or, nsim may be utilized to emulate a part of hardware that is out of order.
As for the genericity of nsim, there are some known limitations that arise from the data types used. However, the system is considered generic enough for today's and future applications within the Liberouter project. Nsim is continually being improved and its next generation is under development.
Nsim allows the programmer to develop a nanoprogram using the means that are usual in software development. This solution brings all the advantages described in the introductory section.
Conclusion
The idea of nanoprocessors extends the usual conception of programmable hardware. State machines are replaced by simple specialized processors called nanoprocessors. As a result, the development is divided into two parts: nanoprocessors and nanoprograms. Implementing an algorithm in hardware becomes simpler, since most algorithm changes require only patches to the corresponding nanoprogram instead of reprogramming the whole programmable device.
Nanoprocessors are representatives of custom computing in the world of programmable hardware. Such processors are usually programmed in machine code. The Liberouter project has released nsim, a complete development system. The system includes a programming language that enables the user to specify an instruction set. Nsim is actively used within the Liberouter project. The development of nsim itself continues in order to reach full genericity and to deploy new features such as maintaining timing information. A graphical interface is a great challenge, too.


