When I started using a Power Mac G4 running Mac OS X as my desktop machine a year ago, I decided to use NIS and NFS on it. That would allow me to share data easily and keep my personal files on my server, which uses RAID and gets backed up.
Getting NIS and NFS to work wasn’t very difficult using Marcel Bresink’s excellent instructions. The first problem I encountered was poor NFS performance, about 2MB/sec over Gigabit Ethernet. Following the advice of a fellow NetBSD developer, I tried using NFS over UDP. While this is usually slower and less reliable, it fixed the problem in this case. Reading a large file via NFS now runs at 30MB/sec. The only remaining problem was that I occasionally could not log in after booting the machine. This happened about once a week, and restarting the machine via the login window usually fixed it.
Unfortunately, the problem got a lot worse when I upgraded the hardware to a Power Mac G5. I wasn’t able to log in after one out of three (re)boots. On at least one occasion it took half a dozen reboots before I could finally use the machine. I also experienced a new problem where my account would work but the home directory couldn’t be mounted. This error required logging in as a local user and removing the bogus home directory that had been created because NFS didn’t work. Otherwise the automounter would not mount my home directory, even if NIS worked fine.
The situation became unbearable and I began to analyze the problem. I tried modifying the NIS startup script, with little success. After a while I realized that lookupd was causing the problems with NIS. It sometimes failed to talk to the NIS server for no apparent reason. The result was that either the NIS accounts were not available, or the automounter couldn’t load the NIS mount map and the home directories weren’t accessible. I finally figured out the sequence of steps to get my Mac working when it was in that dodgy state:
- Log in using a local account.
- Open a Terminal window and use sudo zsh to get system administrator privileges.
- Force a restart of lookupd with killall lookupd.
- Wait a moment and tell the automounter to reload its configuration via killall -HUP automount.
I became tired of doing that manually, of course, and finally wrote a shell script that does the job automatically. The script gets started from /etc/rc.local like this:
nohup /usr/local/sbin/fix-nis 25 >/tmp/fix-nis.log 2>&1 &
Using that brute force approach fixed the problem. If I can’t log in after booting the machine, I just wait a few seconds until the script teaches lookupd a lesson, and then I can finally log in and access my home directory.
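For illustration, here is a minimal sketch of what such a script could look like. It is not the original fix-nis script. It assumes that the argument (25 in the rc.local line above) is a delay in seconds, that looking up a NIS-only account with id is a good enough test for whether lookupd can talk to the NIS server, and that a five second pause between the two killall commands is sufficient. The account name nisuser and the DRY_RUN switch are made up for this example.

```shell
#!/bin/sh
# Sketch of a fix-nis style helper. NOT the original script.
# Assumptions: the first argument is a delay in seconds, and NIS_USER
# names an account that exists only in NIS ("nisuser" is hypothetical).

DELAY="${1:-25}"
NIS_USER="${NIS_USER:-nisuser}"
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

# Run a command, or just print it when DRY_RUN is 1.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Succeeds when the NIS account can be resolved.
nis_works() {
    id "$NIS_USER" >/dev/null 2>&1
}

# The manual recovery steps from the list above.
fix_nis() {
    run killall lookupd          # lookupd gets restarted after being killed
    run sleep 5                  # "wait a moment" (assumed: 5 seconds)
    run killall -HUP automount   # make the automounter reload its maps
}

main() {
    run sleep "$DELAY"           # give the boot process time to settle
    if nis_works; then
        echo "NIS lookup for $NIS_USER works, nothing to do"
    else
        echo "NIS lookup for $NIS_USER failed, restarting lookupd"
        fix_nis
    fi
}

main
```

By default the sketch only prints what it would do. To let it actually restart lookupd, it would have to run as root with DRY_RUN=0 and NIS_USER set to a real NIS account.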
I nevertheless wanted to know what causes those problems and posted an article in a German newsgroup about Mac OS X networking. In the resulting discussion somebody pointed out that Marcel Bresink has added a section about Mac OS X Tiger related NIS bugs to his instructions. It seems that Apple introduced quite a lot of bugs with the integration of launchd into Mac OS X Tiger. I remember that Solaris 10 on my company laptop at a previous job had similar problems, because Sun had also introduced a parallelised system startup with that operating system release.
So the good news is that my NIS setup at home isn’t broken. But the bad news is that there is no better solution than my brute force shell script. Let’s hope that Apple fixes these problems in Mac OS X Leopard.