Terminally Incoherent

Utterly random, incoherent and disjointed rants and ramblings...

Friday, August 26, 2005

Server Downtimes

Pegasus is down again! It has been down all day. When pegasus goes down on Friday, it usually means that it will stay down through Tuesday (if you are lucky that is). I have no clue what happened, because the only person I could ask is gone. So I have no clue if this was yet another power outage, or if they are doing something with the network.

I moved my newly created stylesheets to another server. They work for now. You will probably notice the lack of the Fully Cully banner, my picture and the nicetitles script. All that stuff is on pegasus, and I don't feel like recreating it... If it comes back up, I'll move that stuff over, but otherwise, screw it.

For those of you who don't know what pegasus is, let me tell you a story. The MSU network is set up as an NFS network composed of various Sun SPARC machines. Pegasus was the login server, a mail server and a bunch of other things. There was also an application server (called smile, I think), a file server, and a few workstations providing various services (such as an Oracle server, etc.).

Anyways, pegasus had a bunch of ports open to the outside world, including ftp, telnet and a few others. It was not very smart, but hey - try teaching windows sheeple how to ssh. Our IT was not up to this task. But I digress. If you wanted to do anything on the network, you had to either log into pegasus or use one of the SPARC machines on campus. All these machines (including pegasus) had their /bin and /usr/bin mounted from smile, and /home was mounted from the file server - so you could sit at any terminal, and you would have instant access to your home directory. Btw, I think all the workstations on campus authenticate via pegasus... Which made perfect sense when we had 30+ SPARC machines in the lab... They are gone now.

Which makes this setup an instant recipe for disaster. If any part of the network goes, all the SPARC machines crash horribly. If I'm wrong on the exact details here, please correct me. I never really investigated this thoroughly - I simply saw the side effects of an interrupted connection to smile or pegasus. They were not pretty.
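
To show what I mean by "if any part goes, everything goes", here is a rough Python sketch. The dependency map is my guess, pieced together from the description above (the names "workstation" and "fileserver" and the function broken_by are made up for illustration), so treat it as a toy model and not as a picture of the real network.

```python
# Hypothetical dependency map: which hosts each host needs in order to work.
# This is an assumption based on the description above, not the real config.
DEPENDS_ON = {
    "workstation": ["pegasus", "smile", "fileserver"],  # auth, /bin, /home
    "pegasus": ["smile", "fileserver"],                 # /bin, /home
    "smile": [],
    "fileserver": [],
}


def broken_by(failed, depends_on=DEPENDS_ON):
    """Return every host that stops working when `failed` goes down."""
    down = {failed}
    changed = True
    # Keep sweeping until no new host gets dragged down.
    while changed:
        changed = False
        for host, deps in depends_on.items():
            if host not in down and any(dep in down for dep in deps):
                down.add(host)
                changed = True
    return sorted(down)


if __name__ == "__main__":
    for host in DEPENDS_ON:
        print(host, "->", broken_by(host))
```

In this toy model, losing smile or the file server takes pegasus and the workstations with it, and losing pegasus alone still takes out the workstations.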

Besides, who cares how exactly the network was set up. It was a mess - a house of cards. IT didn't care because this was not their priority. The SPARC network was used exclusively by the CS department, meaning that they could safely ignore the complaints and do the bare minimum as long as no one complained to the Dean.

As a side effect, pegasus - our overworked login/mail/authentication server - got totally pwn3d a few years ago. Someone installed a rootkit and a few trojans. The IT guys did not touch it. For several YEARS someone else owned the server many professors used for research, and students used for email and homework. The professors knew about it, the Dean knew about it and the more clueful students knew about it too. No one cared though. Besides, with the crazy NFS setup, it was too much hassle to even touch pegasus.

The problem unexpectedly solved itself in June (I think) when a big campus-wide power outage completely fried pegasus. It had a spectacular crash, and never came back up. IT scrapped it. The whole NFS network was fucked, and they really had to work hard to get things working again. Of course they did not have a replacement ready so students were cut off from their email and network access for most of the summer.

Recently they resurrected pegasus as a Linux machine which does logins and email. Once you ssh to it, it slogins you into a SPARC station called freddie. All my RSA keys went to shit, and the first time I connected I nearly had a heart attack thinking someone was pulling a man-in-the-middle on me. But I was happy to see it back.

And now it is down once again. I have no clue when it will come back, but I'm sure as hell it won't happen over the weekend.

