System Support Group

Author: Tomas Podermanski
Contact: tpoder@hawk.cis.vutbr.cz
Date: 4.4.2007

Abstract

This report describes the activities of the System support group in the Liberouter project. It lists the group's initial tasks together with the solution adopted for each.

Introduction

A project the size of Liberouter brings with it many subsidiary problems that are not directly linked to the project's aims. The project leadership is aware of this and does not underestimate the importance of these problems, which led to the establishment of the System support group. The group's main task is to take over routine administration that would otherwise hinder developers in their work. Specifically, one of its tasks was to build data centers that provide a development environment for the developers. Although the members of the System support group also help maintain data centers at their home universities, an environment for programmable hardware differs in many ways from the usual requirements.

Initial Tasks

  • The Linux and NetBSD operating systems have to be installed on all development PCs. Any developer must be able to switch the operating system.
  • It has to be possible to reboot a PC remotely, ideally by switching its power supply off and on.
  • The environment of the development PCs has to be unified, meaning that the same versions of the operating systems, development applications etc. run on all PCs. Any changes have to be distributed automatically.
  • Each developer has to be able to temporarily remove a particular PC from the so-called unified administration service. This is needed for extensive development or testing tasks. Returning the PC to the unified administration service has to be automatic.
  • Possible file system damage (e.g. due to frequent system failures during debugging) has to be repairable within hours.
  • Home directories and the project data store have to be shared across all development PCs.
  • Accounts on the PCs have to be tied to the AAI mechanisms of CESNET, while each developer has to be able to access the PCs as root.
  • It has to be possible to work on the PCs even in so-called offline mode, i.e. without network connectivity.
  • Tasks that must be done directly in the computer rooms have to be minimized, because access is difficult to arrange (supervised access, weekend visits etc.).
  • A working plan has to be created for any operations done directly in the work areas (integration of new sets, card exchange, modification of test interconnections etc.).
  • Part of the software (program packages and firmware created within the project) has to be modifiable by the developers without intervention by members of the System support group.
  • The environment has to be unified across all 4 work areas and transparent to developers.
  • Backup of important data and monitoring of the whole system must be supported.

Solution

Connecting of Data Centers

The work areas are located in different places, so all data centers had to be connected into a single project network. The sites are the computer room and laboratory of the Computational technology department of Masaryk University and the computer room and education laboratory of the Faculty of Information Technologies. After analyzing the requirements on the network infrastructure, a solution based on link-layer tunnels was chosen. Its advantages are that the security rules of both schools are easy to respect, individual services are simpler to configure, address management is simpler, and devices are independent of their location (moving a device requires no further configuration changes). The connection is built from dedicated metallic links and virtual networks. The tunnel endpoints are 1U servers running the vtun software (http://www.vtun.org) on each side of the tunnel.

User Accounts Management

Implementing user account management raised a problem. The existing authentication mechanisms of CESNET had to be used, yet work in offline mode also had to be possible. In addition, the root account had to be available to every developer. Based on these requirements, a distributed database of user accounts and passwords was created. Each developer can create or change his or her own password on private web pages. The system is implemented with MySQL and custom synchronization scripts written in Perl.
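The offline requirement can be illustrated with a minimal sketch. It is written in Python for brevity (the project's own scripts are in Perl) and is not the project's implementation. The `Account` record, the `refresh_local_cache` function and the idea of a per-PC cache that is replaced whenever the central database is reachable are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Account:
    """Hypothetical account record as it might be stored centrally."""
    login: str
    uid: int
    password_hash: str  # a hash produced centrally, never the plain password


def refresh_local_cache(
    local: Dict[str, Account],
    fetch_central: Callable[[], List[Account]],
) -> Dict[str, Account]:
    """Return the account cache a PC should use after a sync attempt.

    If the central database cannot be reached, the last synchronized
    copy is kept, so logins keep working without network connectivity.
    """
    try:
        central = fetch_central()
    except ConnectionError:
        return dict(local)  # offline mode: keep the previous copy
    return {account.login: account for account in central}
```

In this sketch a password changed on the web pages reaches a PC on its next successful synchronization, while a disconnected PC continues to accept the last password it received.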

Remote Installation/Reinstallation of the Development PCs

A complete reinstallation of the operating system on a development PC is needed quite often. Previously this was done by copying the whole disk, but as the number of maintained computers grew, this method became impractical. The current system allows a complete reinstallation of the operating system to be performed entirely remotely. Two things had to be in place to achieve this:

A) Remote Access to Console of Development PCs via Network

Remote console access allows the keyboard and mouse to be controlled remotely while the VGA output is sent to the user. A fairly large number of devices offer this functionality, but choosing one that met all our requirements was not easy. In the end, Municom IP Smart Link was chosen, mainly because of positive operational experience, and it was installed in all work areas. In addition, all project PCs had to be interconnected via KVM switches. The ATEN CS 1016 was chosen for this purpose.

B) Booting via Network

Remote control of a PC is not sufficient for installing or reinstalling the operating system. A mechanism for booting over the network (loading the operating system after the PC is turned on or restarted) was therefore set up. Network booting is commonly used for diskless computers. In our case it loads a small part of the system, which then installs the whole operating system onto the disk. A complete reinstallation is thus performed within several hours of a fault being reported. PXE was used as the network boot mechanism. It is currently supported by most Intel network cards and by most network cards integrated on mainboards.
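As a rough illustration of how a single PC might be switched between "reinstall" and "boot from disk", the sketch below generates a per-host boot configuration. It assumes the PXELINUX boot loader, which the report does not name, and the kernel and initrd paths are placeholders. PXELINUX looks for a per-host file named `01-` followed by the MAC address in lowercase with dashes.

```python
def pxelinux_config_name(mac: str) -> str:
    """Return the per-host PXELINUX config file name for a MAC address."""
    octets = mac.lower().replace("-", ":").split(":")
    if len(octets) != 6 or any(len(octet) != 2 for octet in octets):
        raise ValueError(f"not a MAC address: {mac!r}")
    # "01" is the ARP hardware type for Ethernet.
    return "01-" + "-".join(octets)


def pxelinux_config(reinstall: bool) -> str:
    """Return config text: network installer, or fall through to local disk."""
    if reinstall:
        return (
            "DEFAULT install\n"
            "LABEL install\n"
            "  KERNEL installer/vmlinuz\n"            # placeholder path
            "  APPEND initrd=installer/initrd.img\n"  # placeholder path
        )
    return (
        "DEFAULT local\n"
        "LABEL local\n"
        "  LOCALBOOT 0\n"
    )
```

Writing the "reinstall" variant for one host and rebooting it would start the installer on that host only, and restoring the "local" variant afterwards lets the freshly installed system boot from disk.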

OS Maintenance of Development PCs

A synchronization mechanism was created for maintaining the operating systems. Most existing tools for this purpose were judged unsuitable (dependence on a single operating system, insufficient configurability etc.). The system for so-called unified administration is based on rsync and auxiliary Perl scripts. The following features had to be implemented: support for PCs with more than one operating system, the possibility of removing a PC from the unified administration service, and non-destructive handling of data that would be erased or modified by synchronization.
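A minimal sketch of how such a synchronization could be driven is given below, in Python rather than the Perl the project uses. It only builds the rsync command line. The function names, paths and host names are hypothetical, and the choice of rsync's `--backup` and `--backup-dir` options for the non-destructive handling is an assumption, not a description of the project's scripts.

```python
from typing import Iterable, List, Set


def hosts_to_sync(all_hosts: Iterable[str], released: Set[str]) -> List[str]:
    """Skip PCs that were removed from the unified administration service."""
    return [host for host in all_hosts if host not in released]


def build_rsync_command(
    master_tree: str,
    host: str,
    backup_dir: str,
    exclude_file: str,
    dry_run: bool = False,
) -> List[str]:
    """Build an rsync call that mirrors master_tree onto host.

    --delete removes files that are not in the master tree, while
    --backup/--backup-dir move every overwritten or deleted file aside
    on the target instead of destroying it.
    """
    command = [
        "rsync",
        "-a",
        "--delete",
        "--backup",
        f"--backup-dir={backup_dir}",
        f"--exclude-from={exclude_file}",
    ]
    if dry_run:
        command.append("--dry-run")  # report changes without applying them
    # The trailing slash makes rsync copy the contents of the tree.
    command += [master_tree.rstrip("/") + "/", f"{host}:/"]
    return command
```

A PC with more than one operating system could be handled by keeping one master tree and one exclude list per system and calling the builder once for each.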

Power Supply

Along with remote reboot of the development PCs, their power supply had to be addressed. APC switched power strips were chosen, which allow the power supply of the development PCs to be controlled via the SNMP protocol. In addition, a script was written that takes its data from the internal database of development PCs, so a system can be rebooted or turned on or off simply by typing a command on the command line.
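The lookup-then-SNMP idea can be sketched as follows. This is an illustration under stated assumptions, not the project's script: the `OUTLETS` table stands in for the internal database, the host names and community string are made up, and the OID and action values are the outlet control object commonly documented in APC's PowerNet MIB, which should be verified against the MIB for the actual device model. The command built is a call to Net-SNMP's `snmpset`.

```python
from typing import Dict, List, Tuple

# Assumed outlet control OID (sPDUOutletCtl) from APC's PowerNet MIB;
# verify against the MIB shipped with the device before use.
APC_OUTLET_CTL_OID = "1.3.6.1.4.1.318.1.1.4.4.2.1.3"
ACTIONS: Dict[str, int] = {"on": 1, "off": 2, "reboot": 3}

# Hypothetical stand-in for the internal database of development PCs:
# PC name -> (power strip address, outlet number).
OUTLETS: Dict[str, Tuple[str, int]] = {
    "devel1": ("pdu-a.example.org", 3),
    "devel2": ("pdu-a.example.org", 4),
}


def build_snmpset(pc: str, action: str, community: str = "private") -> List[str]:
    """Build the snmpset call that switches the outlet feeding a PC."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    pdu, outlet = OUTLETS[pc]
    return [
        "snmpset", "-v1", "-c", community, pdu,
        f"{APC_OUTLET_CTL_OID}.{outlet}",
        "i", str(ACTIONS[action]),  # "i" marks an INTEGER value
    ]
```

The returned list can be passed to `subprocess.run`, so the operator only needs to name the PC and the action.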

Resources Reservations

To let developers use the resources as efficiently as possible, a reservation system was implemented. Each developer can reserve a particular computer. The maximum length of a reservation is limited, for instance to prevent a computer from being reserved for a whole year. When creating a reservation, one can choose a so-called exclusive reservation. In that case the PC is removed from the unified administration service and from the monitoring system. Exclusive reservations are useful, for example, when testing installations of operating systems that are not supported by the unified administration system.
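The reservation rules described above can be sketched roughly as follows. The 14-day limit, the `Reservation` record and both function names are assumptions chosen for illustration. The report states only that a limit exists and that exclusive reservations release a PC from unified administration and monitoring.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Set

MAX_DURATION = timedelta(days=14)  # hypothetical limit, not from the report


@dataclass(frozen=True)
class Reservation:
    pc: str
    user: str
    start: datetime
    end: datetime
    exclusive: bool = False


def can_reserve(new: Reservation, existing: Iterable[Reservation]) -> bool:
    """Accept a reservation if it is short enough and does not overlap."""
    if new.end <= new.start or new.end - new.start > MAX_DURATION:
        return False
    # Two intervals overlap when each one starts before the other ends.
    return not any(
        r.pc == new.pc and r.start < new.end and new.start < r.end
        for r in existing
    )


def released_hosts(reservations: Iterable[Reservation], now: datetime) -> Set[str]:
    """PCs under an active exclusive reservation, to be skipped by sync and monitoring."""
    return {r.pc for r in reservations if r.exclusive and r.start <= now < r.end}
```

The set returned by `released_hosts` is the kind of input a synchronization or monitoring job would consult before touching a PC, and once the reservation ends the PC drops out of the set and is handled automatically again.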

Backup and Archiving

A large amount of data is created within the units described above, and this data needs to be backed up safely. A disk RAID array with a capacity of 2 TB was therefore purchased, and all backups are stored there once a day. The backup strategy uses incremental backups, which allows the data to be kept for 90 days.
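The 90-day retention can be illustrated with a small sketch that selects which daily backups have aged out. The function name and the representation of backups as plain dates are assumptions. The report does not describe how the incremental backups are stored or expired.

```python
from datetime import date, timedelta
from typing import Iterable, List

RETENTION = timedelta(days=90)  # retention period stated in the report


def expired_backups(backup_dates: Iterable[date], today: date) -> List[date]:
    """Return the backup dates older than the retention period, oldest first."""
    cutoff = today - RETENTION
    return sorted(d for d in backup_dates if d < cutoff)
```

A real incremental scheme would also have to keep whatever earlier backup the retained increments depend on, which this sketch deliberately leaves out.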

Together with the backup system, a mechanism for data archiving was created. It copies the key parts of the project, namely the CVS tree, the created firmware and the binary packages. Archiving is performed once a year onto DVD, which is then stored in a safe place.

Conclusion

The Liberouter project succeeded in creating an environment in which developers can work smoothly. The environment evolves constantly and has to be adjusted from time to time, but the experience gained allows such changes to be made relatively quickly.
