Monday, June 16, 2008

Fileserver

Before I left for San Diego, my fileserver's northbridge fan and heatsink became separated. It's really worse than a divorce, in many ways, because things are all fucked up, but there's still hope, which means uncertainty, lots of work, and very careful handling of electronics. OK, so I don't know much about divorces.

The lack of heat transfer away from the chip kept my fileserver from booting. Since I've already complained about my loud northbridge fan, I chose to take the opportunity to replace that noisy piece of hardware. I already had a spare fan from somewhere, so I went to the hardware store to find screws long enough to secure it. At the hardware store, I had the bright idea of comparing the screw holes of the two fans and found that they didn't line up. That's when I made the coolest discovery of the week: Krazy Glue comes in a small bottle with a tiny brush built into the lid! I glued the fan to the heatsink, and it stuck almost immediately.

When I got home, I couldn't find my isopropyl alcohol. I was able to get some last Friday night, though, and I set out to clean the bottom of the heatsink and the top of the northbridge heat spreader. I remember isopropyl alcohol working much better than it did this time; for the most part, I had to cut the thermal paste off the heatsink. Perhaps thermal paste is just that horrible compared to thermal grease. The important thing is that I could now keep my northbridge chip cool.

Doh! The power cord for the fan doesn't reach! I had to pry the fan off the heatsink and re-glue it, rotated ninety degrees. A little thermal grease later, and I was ready to plug my motherboard back into the rest of the computer.

Everything is mostly working. The only weird things are related to mdadm and the RAID, I think. The automatic resync operation seems to fail. The first night, I had "watch -n 1 cat /proc/mdstat" running, but I woke up to stack traces being dumped to the screen every second instead of the nice md status. I tried again, but came back later to find the same stack traces showing up every once in a while. So I loaded the graphical interface to try to catch more information, but the error stopped. The resync also stopped at 8% and refused to go any further.

I decided that some write activity might wake it up, and started up some bittorrents. They worked well for a few tens of MB, but now my fileserver seems to have rebooted to an initramfs prompt and automatically started a resync. It's currently at 6.9%. I'll see what happens at 8% before I go to bed, and will update as things progress.

Update: 1:49 AM: resync is at 9.6% and going strong.
Update: 9:21 AM: resync is at 64.9% and going strong.
Update: 12:14 PM: resync is at 85.9% and going strong. I'm getting bored waiting for this.
Update: 3:16 PM: resync is done. Things have rebooted, and I will begin some bittorrent stress testing shortly.
Update: 10:23 AM: Fileserver is still up after a couple days of heavy I/O. I'm happy!

1 comment:

Lord Hughes said...

Hope it stays running. Nothing worse than not having access to all your files.