UGCS 4.0 buildout
2007-08-22 05:11 - ugcs geek
I spent Friday through Monday back at Caltech with the other sysadmins participating in the full push to get the next iteration of UGCS finished. SURFs ended this past week, so the other three sysadmins were leaving after the weekend, and the next time we’d all be in one place would be after school starts in October. So, it was a choice of getting it done now, or waiting until Thanksgiving break to get it done. Ideally, my goal is to have things substantially ready by the first week or two of school so that we can push UGCS usage to the new freshmen and encourage a much higher cluster adoption rate among the undergraduate population.
Things went fairly well, although we had some snafus with regard to getting our hardware delivered; regardless, we made substantial progress towards having the cluster up and running. We came into the weekend with only two finished servers, each with six 750 GB hard disks in RAID 5, that had been configured to run AFS, LDAP, and a Kerberos KDC. We had an early setback when we discovered one of the hard disks had failed and that the machine would need to be taken down to prevent any additional failures that would result in data loss. I got very little sleep over the weekend and was so focused and engaged that I neglected certain bodily needs.
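As an aside on why a single dead disk was enough to pull a machine: RAID 5 spends one disk's worth of space on parity and survives exactly one disk failure, so a degraded array has no margin left. Here is a minimal back-of-the-envelope sketch of that arithmetic for one of these servers. The function name and the returned fields are made up for illustration, and the numbers are nominal disk sizes, not what the filesystem actually reported.

```python
def raid5_summary(disk_count, disk_size_gb, failed_disks=0):
    """Rough RAID 5 numbers: one disk of parity, one tolerated failure.

    Hypothetical helper for illustration only; sizes are nominal.
    """
    if disk_count < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return {
        # Capacity left for data after one disk's worth of parity.
        "usable_gb": (disk_count - 1) * disk_size_gb,
        # How many more disks can fail before data is lost.
        "failures_left": max(0, 1 - failed_disks),
        # More than one failed disk means the array is gone.
        "data_lost": failed_disks > 1,
    }


healthy = raid5_summary(6, 750)
degraded = raid5_summary(6, 750, failed_disks=1)

print(healthy["usable_gb"])       # 3750
print(degraded["failures_left"])  # 0
```

With one disk already gone, the next failure takes the whole array with it, which is why we took the machine down rather than let it keep running.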
We finished up with three Cisco switches (2× 2950, 1× 2970) configured as a cluster; segregation of servers by VLAN, with automatic VLAN assignment based on MAC address; transparent bridging, firewalling, intrusion detection, and filtering on a new bridge machine; a temporary mailserver running Postfix, Dovecot, SpamAssassin, and Mailman and delivering to AFS Maildir folders with the help of some very small patches to Postfix; working DNS and DHCP/netboot; a partially set up new Kerberos/LDAP isolation machine; and a much better understanding of what is left to be done. With luck, we’ll be able to get to the stage of performing the actual switch-over the weekend before classes start.
I intend to write up what we’ve done after the build-out is complete so that others building similar clusters can learn from the pitfalls we fell into and extricated ourselves from. I found other blogs and HOWTOs extremely instructive throughout the entire process and want to reciprocate.

