Technical Information
This page discusses the technical aspects of the linuxzoo site.
I will add to this over time, and it is partially a guide to people
interested in how the site works, and partially a reference source
for myself.
Overview
This site makes use of User Mode Linux. This allows complete Linux
installations to run as a process within another Linux installation
(kind of a linux-in-linux scheme). The parent Linux is called the HOST,
and the child process-based Linux is known as a GUEST. Users are allocated
guest machines on a queueing basis.
The main server of this site just handles management and communications.
Other servers act as HOSTS, and are linked together using a virtual network
built using tunneling software. Each HOST is controlled by a daemon, which
starts up and stops the GUESTS on that HOST. Each time you connect to the
server, it speaks to your HOST's daemon, and controls your GUEST remotely.
The GUEST machines each seem to have disk space to run from, but actually
this is just some files in the HOST. The daemon sets these files up for
you, and deletes them as and when required. In this way you can start with
a freshly-installed system at the touch of a button, which is perfect
for system administration tutorials where it is all too easy to mess your system
up!
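As a rough illustration of the "fresh system at the touch of a button" idea, the sketch below replaces a guest's disk file with a pristine copy of a master image. It is not the site's actual daemon code: the function name, the `root_fs` file name and the directory layout are all hypothetical, and a full copy is used only for simplicity (the Future Work list mentions COW images, which suggests the real system stores differences rather than whole copies).

```python
import shutil
import tempfile
from pathlib import Path


def reset_guest_disk(master_image: Path, guest_dir: Path) -> Path:
    """Replace a guest's disk file with a pristine copy of the master image.

    Hypothetical helper: names and layout are assumptions for illustration.
    """
    guest_dir.mkdir(parents=True, exist_ok=True)
    disk = guest_dir / "root_fs"
    if disk.exists():
        disk.unlink()  # discard whatever state the old system was left in
    shutil.copyfile(master_image, disk)
    return disk


def demo() -> bytes:
    """Break a guest disk, reset it, and return the restored contents."""
    with tempfile.TemporaryDirectory() as tmp:
        master = Path(tmp) / "master_root_fs"
        master.write_bytes(b"pristine install")
        disk = reset_guest_disk(master, Path(tmp) / "guest1")
        disk.write_bytes(b"rm -rf went wrong")  # the user wrecks the system
        disk = reset_guest_disk(master, Path(tmp) / "guest1")
        return disk.read_bytes()
```

Because the guest only ever sees a file owned by the HOST, a reset is just a file operation on the HOST and needs no cooperation from the (possibly broken) guest.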
The nice thing about this architecture is its reliability and its capacity
to manage itself. Machines can go down, GUEST or HOST machines can crash,
networks can fail, but the system (should) regenerate itself over time. It
is self-monitoring, and problems can usually be detected within a minute
and corrective action completed within 3 minutes. If things get really bad
HOSTS become isolated from the system, and the affected users are requeued
for the next available GUEST machine.
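The isolate-and-requeue behaviour described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the site's monitoring code: the heartbeat timestamps, the 60 second timeout, the data structures and the choice to put displaced users at the front of the queue are all hypothetical.

```python
from collections import deque

HEARTBEAT_TIMEOUT = 60  # seconds without contact before a HOST is isolated (assumed)


def isolate_dead_hosts(last_seen, allocations, queue, now,
                       timeout=HEARTBEAT_TIMEOUT):
    """Isolate silent HOSTS and requeue the users who were running on them.

    last_seen:   dict of host name -> time of last successful contact
    allocations: dict of user -> host name (modified in place)
    queue:       deque of users waiting for a GUEST (modified in place)
    Returns the set of hosts considered dead.
    """
    dead = {host for host, seen in last_seen.items() if now - seen > timeout}
    displaced = [user for user, host in allocations.items() if host in dead]
    for user in displaced:
        del allocations[user]
    # Assumption: displaced users go to the front, keeping their relative order.
    queue.extendleft(reversed(displaced))
    return dead
```

For example, with `last_seen = {"uml1": 100, "uml2": 10}` and `now = 120`, only `uml2` is treated as dead, and any users allocated to it move ahead of those already waiting.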
The architecture was designed to be scalable. When this document was written
we had 80 GUESTS running over 8 different machines, and no sign of
server bottleneck. Our plan is to have 100 GUESTS by September 2005.
Machine topology: April 2005
| Name          | Role                   | 10.200.0.x address | 146.176.x.x address | CPU, RAM         | 10.0.x.254 address | UMLs     | Machines |
| linuxzoo.net  | gateway and web server | 10.200.0.1         | 146.176.166.1       | 2.4GHz dual, 1GB |                    |          |          |
| uml2          | hub                    | 10.200.0.4         | 146.176.162.82      | 2.4GHz, 0.5GB    | 10.0.2.254         | 10.0.2.x | 7        |
| uml1          | hub                    | 10.200.0.3         | 146.176.162.83      | 2.0GHz, 2GB      | 10.0.1.254         | 10.0.1.x | 15       |
| linuxzoo1     | hub                    | 10.200.0.6         | 146.176.166.11      | 2.4GHz, 1GB      | 10.0.5.254         | 10.0.5.x | 15       |
| linuxzoo2     | hub                    | 10.200.0.7         | 146.176.166.9       | 2.4GHz, 1GB      | 10.0.6.254         | 10.0.6.x | 15       |
| linuxzoo3     | hub (free users)       | 10.200.0.8         | 146.176.166.10      | 2.4GHz, 1GB      | 10.0.7.254         | 10.0.7.x | 15       |
Each UML is connected to its hub via a "tap" device. In turn each hub is
connected to linuxzoo.net via an openvpn encrypted tunnel. From linuxzoo.net,
packets then travel across the internet.
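Given the April 2005 addressing, working out which hub a UML sits behind is a simple subnet lookup. The sketch below uses Python's standard `ipaddress` module. It assumes each "10.0.N.x" range is a /24 network, which the topology implies but does not state, and the function name is made up for illustration.

```python
import ipaddress

# Hub name -> UML subnet, taken from the April 2005 topology.
# Assumption: each "10.0.N.x" range is a /24.
HUB_SUBNETS = {
    "uml1": ipaddress.ip_network("10.0.1.0/24"),
    "uml2": ipaddress.ip_network("10.0.2.0/24"),
    "linuxzoo1": ipaddress.ip_network("10.0.5.0/24"),
    "linuxzoo2": ipaddress.ip_network("10.0.6.0/24"),
    "linuxzoo3": ipaddress.ip_network("10.0.7.0/24"),
}


def hub_for_guest(addr: str):
    """Return the name of the hub whose subnet contains addr, or None."""
    ip = ipaddress.ip_address(addr)
    for hub, network in HUB_SUBNETS.items():
        if ip in network:
            return hub
    return None
```

So a packet for 10.0.6.12 would travel from linuxzoo.net down the tunnel to linuxzoo2, and from there over that UML's tap device.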
Security
The site was designed with security in mind, yet the focus was really on
trackability rather than limiting what users could do. However, there are
some firewall rules in place to stop some activities, including sending
emails from the GUESTS.
Tracking
The site is currently based at Napier University. Here we have two hardware
firewalls between us and the real world. One of these firewalls has full
packet logging which gives us perfect network logging. On the gateway we also
have significant logging capabilities. The gateway logs are sufficient
to link a user's IP with any network action which leaves or enters the gateway.
If a user tries to hide the browser IP, then the system will not
recognise that user when they try to log into their machine. The system will
also handle NAT firewall users, although when multiple users connect from a
single NAT, trackability is reduced slightly under some circumstances. We also
track web server requests and login requests. These logs are processed
automatically and are accessible by the user in question through their login.
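To make the NAT caveat concrete, here is a small sketch of attributing a logged network action to users by browser IP and time. The record layout and function name are hypothetical and are not the site's log format. The point is only that when two sessions share one address, the lookup returns more than one candidate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """One login session (hypothetical record layout)."""
    user: str
    browser_ip: str
    start: float  # seconds since some epoch
    end: float


def candidates(sessions, source_ip, when):
    """Return the users who were logged in from source_ip at time `when`."""
    return sorted({
        s.user
        for s in sessions
        if s.browser_ip == source_ip and s.start <= when <= s.end
    })
```

With sessions for "ann" and "bob" both coming from 192.0.2.1 and overlapping in time, a lookup inside the overlap returns both names, whereas a lookup for an address used by only one session gives a single, unambiguous user.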
Future Work
This is a list of the things I have in mind to do on the system.
- Give users a CPU and network quota for the week.
- Store the student performance on the quiz questions.
- Build the packet log continuously rather than every n days.
- Provide an incremental assessment system in addition to tutorials.
- Kick users who hold a machine but are not using it.
- Convert COW images to tar files (and back again) for efficient user storage of files.
- Transport user changes to images between machines.
- Provide Fedora Core 3 as an image option.
- Provide Gentoo as an image option.
- Give users a COW disk quota.
Changelog:
Rather than version numbers and incremental changes, this changelog
uses the date of the change.
26th April 2005
21st April 2005
- Tidying up the code so that if (or when) it is available for redistribution the installer can configure the system with a single configuration file. Up to this point some things were hard-coded, and others spread over a few data files. Should be much simpler now, but I was a bit surprised how many places broke when I was working on this! As this was quite low-level, I had no real choice but to put this into the mainstream directly. Sorry if it has caused any problems.
- My beta interface now supports multiple images. At the moment it allows
you to choose either Fedora Core 2 or 3. Once the hard-coded details of this
have been moved to the main conf file, I will put this into mainstream.
- Plenty of logging added to the system, and some security changes. Should keep our security manager off my back for a while...
29th March 2005
- Added public keys to the root user. At the moment the tutorial checker
logs in via telnet, but our long-term plan is to move to logging in via ssh.
This change means that we can log in without a password to the VMs. Not
that the password is hard to get mind you!
- Firewall settings changed. Now no VM can log into another VM, unless
a user disables the firewall rules. This simplifies my liability worries...
- CGI scripts involved in the notes generation are now disguised as static
pages. They return reasonable Last-Modified dates and Content-Length, and
have had the .cgi extension removed. G**gle just would not scan them
otherwise. Weird.
- YUM update of Core 2.
- Bug fix: an image update which occurs while you are running a machine
will be saved, but from that point on will silently fail to boot. You have
to reboot with a fresh image. This is fixed for all users except those
on at the time I noticed the problem. I will fix these by hand.
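The "disguised as static pages" change in the list above amounts to sending the headers a static file would carry. A minimal Python sketch of the idea follows. It is not the site's CGI code: the function name and the choice to use a source file's modification time are assumptions.

```python
from email.utils import formatdate


def static_looking_response(body: bytes, source_mtime: float) -> bytes:
    """Build a CGI response carrying the headers a static page would have.

    source_mtime is a Unix timestamp, e.g. the modification time of the
    file the page was generated from (an assumption for this sketch).
    """
    headers = [
        "Content-Type: text/html",
        f"Content-Length: {len(body)}",
        # formatdate(..., usegmt=True) yields an HTTP-style date string.
        f"Last-Modified: {formatdate(source_mtime, usegmt=True)}",
    ]
    return ("\r\n".join(headers) + "\r\n\r\n").encode("ascii") + body
```

A script would write the returned bytes to standard output. The important detail is that the date reflects when the content really changed, not the time of the request, so a crawler sees a stable page.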
25th March 2005
- Virtual machine control panel is now embedded on almost all the pages available on the site. Thus you now do not need to leave an additional browser window open to keep your virtual machine running.
- New layout. If only I knew a web designer!
- Online quiz and forums added.
- Users can now see the connection and action logs for their user account
- Improved filesystem security for site datafiles.
7th March 2005
- Over the weekend I discovered many security exploits which would
allow you to break out of your linux guest machines and into the real
host machine. This included discovering you can still load modules even
when the kernel is compiled without module support! Today I have spent
a few hours hardening the system. I am not saying that it is impossible
to hack into the host, but once again I don't know how to do it myself!!!
- I now monitor for machines which have been booked and are running
but have no telnet or ssh connection. I don't approve of booking a machine
and not actively using it for the tutorials. In the next week or so I will
be logging users out automatically from the system when they appear to be doing
nothing.
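The idle check described in the list above could look something like the sketch below: a machine counts as in use only if there is an established connection to its telnet or ssh port. How the real monitor collects its connection list is not stated, so the input format and names here are hypothetical.

```python
REMOTE_LOGIN_PORTS = {22, 23}  # ssh and telnet


def idle_guests(running_guests, connections):
    """Return running guests that nobody is logged into.

    running_guests: iterable of guest IP addresses currently booted
    connections:    iterable of (destination_ip, destination_port) pairs
                    for established connections (hypothetical input)
    """
    in_use = {ip for ip, port in connections if port in REMOTE_LOGIN_PORTS}
    return sorted(set(running_guests) - in_use)
```

Note that a guest with only, say, a web connection on port 80 still counts as idle under this rule, which matches the telnet-or-ssh test described above.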
24th Feb 2005
- Tunnels between machines now use persistent tun devices. This should mean
that a tunnel reset does not automatically terminate your telnet or ssh
connections.
- Pipes between hub managers and the linux guests were left in blocking
mode. Why the system still worked is a wonder, but it is fixed now. Could
have been related to strange lockups which seemed to happen every so
often.
- Fault with 10.0.6.* hub. The bad network performance on this hub was traced
to a faulty network patch cable, and this has been fixed. There is still some
packet corruption at high network loads, but this occurs only under stress-testing.
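For readers wondering why blocking pipes matter: a manager that reads from a blocking pipe stalls until the guest writes something, and while stalled it can do nothing else. The sketch below shows the general fix in Python using `os.set_blocking`. It illustrates the technique only and is not the hub manager's code. The helper names are made up.

```python
import os


def make_nonblocking_pipe():
    """Create a pipe whose read and write ends never block the caller."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)
    os.set_blocking(write_fd, False)
    return read_fd, write_fd


def read_available(fd, size=4096):
    """Return pending bytes, b"" at end of file, or None if nothing is ready."""
    try:
        return os.read(fd, size)
    except BlockingIOError:
        return None  # no data yet; the caller can get on with other work
```

With the read end in non-blocking mode, a manager can poll many guests in one loop without a single quiet guest freezing the rest.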
17th Feb 2005
- Hub managers now monitor your CPU usage, and will renice your linux
system if you are using more than the average. Should help average performance.
- IMAGE UPDATE fedora 2: Lockups after 70 minutes of login time were caused
by anacron kicking in. For this reason all cron and anacron jobs have been disabled. I also disabled a few other boring services.
- IMAGE UPDATE fedora 2: YUM update.
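The CPU rule in the list above (renice anyone using more than the average) can be sketched as follows. How the hub managers actually measure usage is not described, so the input here, a mapping of guest process ID to CPU seconds over some interval, is an assumption, as are the function names and the niceness value of 10.

```python
import os


def above_average(cpu_seconds):
    """Return the PIDs whose CPU use exceeds the mean across all guests.

    cpu_seconds: dict of guest PID -> CPU seconds used in the last interval
    """
    if not cpu_seconds:
        return []
    mean = sum(cpu_seconds.values()) / len(cpu_seconds)
    return sorted(pid for pid, used in cpu_seconds.items() if used > mean)


def renice(pids, niceness=10):
    """Lower the scheduling priority of the given processes (Unix only)."""
    for pid in pids:
        os.setpriority(os.PRIO_PROCESS, pid, niceness)
```

For instance, with usage of 5, 50 and 5 seconds the mean is 20, so only the middle guest is reniced and the two light users keep their normal priority.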
12th Feb 2005
- IMAGE UPDATE fedora 2: /etc/securetty increased root logins to 400.
- IMAGE UPDATE fedora 2: /tmp had bad permissions.
- IMAGE UPDATE fedora 2: YUM update.
Feb 2005