The Adventures of Systems Boy!: Tiger Lab Migration Part 9: More Problems and Solutions

Tiger Lab Migration Part 9: More Problems and Solutions

Well, when last we visited this issue, we were having problems saving, among other things, Final Cut Pro files. After working long and hard with the Panasas folks, I was confident I'd found a solution. And I had, but the solution has created new problems. I want to kill Apple for Tiger.

To refresh, the original problem was that Final Cut was trying to write files across filesystems, and the OS was choking on this operation. This was happening because our RAID exports its /home volume -- and this is what we mount -- but each user account is also an individual volume. FCP writes to the /home level, and then has to cross filesystems to save the file to the user's home account, itself a separate volume. Our solution was, rather than mounting all of /home, to mount each user's home account individually. This required a whole bunch of scripting and launchd voodoo to really work the way we wanted, particularly with regard to adding new users. But we got it working, and it fixed the Final Cut problem. I implemented it, tested it, and emailed the community about the fix.
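If you want a quick way to check from a client whether two paths sit on the same mounted filesystem (and so whether a save between them would cross a mount boundary), something like the sketch below works. It's a hypothetical helper, not part of our actual setup: it asks `df -P` which mount point each path lives on, and it assumes POSIX `df -P` output and mount points without spaces.

```sh
#!/bin/sh
# Hypothetical helper: do two paths live on the same mounted filesystem?
# Assumes POSIX `df -P` output, whose last column is the mount point.
# Mount points containing spaces are not handled.

# Print the mount point that contains the given path (nothing if it doesn't exist).
mount_point_of() {
    df -P "$1" 2>/dev/null | awk 'NR == 2 { print $NF }'
}

# Succeed (exit status 0) only if both paths resolve to the same mount point.
same_filesystem() {
    a=$(mount_point_of "$1")
    b=$(mount_point_of "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example (hypothetical user "jdoe"):
#   same_filesystem /home /home/jdoe || echo "saving here crosses a mount boundary"
```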

Unfortunately, I recently discovered that using this method causes some new, less ugly, but certainly annoying problems: the sidebar home account link no longer works, causing untold confusion among users who suddenly think their home account no longer exists (as per the alert message that appears when clicking this link); but, most alarmingly, users can no longer open documents from their home accounts. And it's not just users: even root, running via cron, fails to open documents. This is a big problem, as our "Scratch" partition gets deleted every Friday, and the users rely on an alert message to warn them of the oncoming doom. This message no longer launches, and that's a real bummer.

So I was in the middle of drafting an email to the community informing them of the latest bugs and workarounds, when I was struck with extreme embarrassment and shame. How can I continue to expect users to work around these bugs when each week brings a new set? Sure, it's not my fault that Tiger is broken. It's not my fault that the Panasas is set up the way it is. But it is my responsibility to create a user environment that works seamlessly and properly for the user. So I resolved to do everything in my power to implement a solution. I stayed at work on Friday night until 2:30 AM and learned much, but left with no solution in hand.

What I learned on Friday, essentially, is that automount in Tiger is totally fucked. I already knew that NFS in Tiger was fucked in that it can't cross filesystems. But it turns out there have been some major changes to the way automount is handled in the GUI, and thus, for all intents and purposes -- or at least for our intents and purposes -- it is indeed fucked. Hard. The inability to follow sidebar references is a direct result of automount's new Finder behavior: automount mounts in the Finder are now invisible! Unless the path is explicitly called, the user accounts, mounted as they are, are quite invisible, both to the user, and apparently to the Finder's sidebar. I also discovered that the Finder's inability to launch files from the user's home account has something to do with automount, or at least how automount is implemented in the GUI. This behavior only exhibits itself from these invisible NFS mounts; it goes away if we mount the user's home account in /home, as we used to do. And it goes away if we use a different command to mount user homes.

Enter mount_nfs.

The mount_nfs command is used to directly and statically mount NFS exports, and it has its own set of peculiarities. First and foremost, mount_nfs grafts mounts onto existing folders, which means that mount_nfs requires a folder named for the user who wants to log in. Secondly, and equally important in our scenario, mount_nfs magically mounts the export in the Finder. Or, rather, it shows the mount in the Finder (and this probably has more to do with how the Finder treats mounts made by mount_nfs). Lastly, since the mounts are static, they remain mounted and appear in the Finder until they are explicitly unmounted. Mounting via mount_nfs is equivalent to mounting NFS exports in the Finder with Command-K. It also, you may notice, works in exactly the opposite way from automount.
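Because mount_nfs mounts stick around until someone explicitly unmounts them, a script that uses it may want to check first whether a folder already has something mounted on it. Here's a minimal sketch that does that by scanning the output of `mount`, whose lines read "device on /mount/point ..." on both Mac OS X and Linux. The function name `is_mounted` is hypothetical, and it assumes no spaces in device names or mount points.

```sh
#!/bin/sh
# Hypothetical helper: is something currently mounted on this exact path?
# `mount` prints lines like "server:/export on /mount/point ..." on both
# Mac OS X and Linux; assumes no spaces in device names or mount points.
is_mounted() {
    mount | awk -v p="$1" '$2 == "on" && $3 == p { found = 1 } END { exit !found }'
}

# Example: only mount if nothing is there yet (hypothetical server and user).
#   is_mounted /home/jdoe || mount_nfs homeserver:/home/jdoe /home/jdoe
```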

To do what we've been doing -- which is to automount home accounts at boot -- using mount_nfs instead would require us to mount every home account at boot on every machine, and those 200+ mounts would be present in the Finder at all times. This simply would not work. So, mount_nfs at boot: not good. You thinking what I'm thinking?

Enter loginhooks.

To use mount_nfs at boot would be disastrous. What we need in this case is something that will dynamically mount and unmount users' home accounts at login and logout, respectively. And that's just what loginhooks and logouthooks do. Getting loginhooks to work in Tiger was again an exercise in frustration, but this time it was due to my own poor understanding of the technology. There is a great reference on login hooks at Mike Bombich's site, and a decent one in Apple's Knowledge Base as well. Between these two resources, I was finally able to cobble together a solution over the weekend that I think will work.

Briefly, there are three things you need to do to implement loginhooks, and some info you need to know.

The info first:
1) Scripts called by loginhooks run as root (good)
2) They run before login to the GUI takes place, and after authentication (also good)
3) The user requesting login can be represented in your script by the variable $1 (excellent!)
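Here's a bare-bones loginhook that demonstrates all three points. It's a sketch rather than anything from our lab: it just logs the user name it received as $1, along with the account it's running as (which, under loginwindow, should be root). The log location, LOGFILE, is a hypothetical path; point it wherever you like.

```sh
#!/bin/sh
# Bare-bones loginhook sketch. loginwindow runs it as root, after
# authentication and before the Finder starts, passing the user name in $1.
# LOGFILE is a hypothetical location.
LOGFILE=${LOGFILE:-/var/log/loginhook.log}

log_login() {
    user=$1
    if [ -z "$user" ]; then
        echo "loginhook: no user name in \$1" >&2
        return 1
    fi
    # `id -un` names the account the hook runs as (root under loginwindow).
    echo "$(date '+%Y-%m-%d %H:%M:%S') login: $user (hook running as $(id -un))" >> "$LOGFILE"
}

# loginwindow always supplies $1; run by hand with no argument, do nothing.
if [ "$#" -gt 0 ]; then
    log_login "$1"
fi
```

Running it by hand with `sudo /path/to/script jdoe` and then checking the log is a quick way to confirm the hook sees the user name the way you expect.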

What to do:
1) Write a script to be executed at login, make it executable, put it somewhere accessible
2) Run this command:
sudo defaults write com.apple.loginwindow LoginHook /path/to/loginScript
3) Test the script! This is imperative! You can test it as any user you want by running it thusly:
sudo /path/to/script User

For "User," specify a user who would actually log in, and who is available to the system. "User" here will be interpreted in your script with the $1 variable, just as it will at login. If it works in your test, it should work in practice as a loginhook.

Setting up a logouthook works the same way, except you run the command:
sudo defaults write com.apple.loginwindow LogoutHook /path/to/logoutScript
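A logout script to pair with a mount_nfs-based login script can be tiny. This sketch forcibly unmounts /home/<user> with `umount -f`. HOME_ROOT, DRY_RUN, and the function names are hypothetical; setting DRY_RUN=1 prints the command instead of running it, which makes testing it by hand a little safer.

```sh
#!/bin/sh
# LogoutHook sketch: forcibly unmount the home account that a mount_nfs-based
# loginhook mounted at $HOME_ROOT/<user>. Runs as root; user name arrives in $1.
# HOME_ROOT and DRY_RUN are hypothetical knobs: DRY_RUN=1 prints, doesn't run.
HOME_ROOT=${HOME_ROOT:-/home}

# Run a command, or just echo it when DRY_RUN is set.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

logout_hook() {
    user=$1
    # Refuse empty names or names that could point outside HOME_ROOT.
    case "$user" in
        "" | */* | . | ..) echo "logouthook: bad user name '$user'" >&2; return 1 ;;
    esac
    # -f forces the unmount even if files on it are still in use.
    run umount -f "$HOME_ROOT/$user"
}

# loginwindow always supplies $1; run by hand with no argument, do nothing.
if [ "$#" -gt 0 ]; then
    logout_hook "$1"
fi
```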

And, BTW, to check that the loginhook (or logouthook, or both) has been successfully added, run this command; you should see your script(s) listed:
sudo defaults read com.apple.loginwindow

Finally, to disable the login (or logout) hooks, run:
sudo defaults delete com.apple.loginwindow LoginHook
and:
sudo defaults delete com.apple.loginwindow LogoutHook
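Since these are easy to mistype (write versus delete, LoginHook versus LogoutHook), it may help to wrap them in a tiny helper. This is a hypothetical sketch: the script and its install/remove/show subcommands are made up for illustration, and you'd run it with sudo, just like the commands above. The DEFAULTS variable exists only so the script can be dry-run by pointing it at echo.

```sh
#!/bin/sh
# Hypothetical wrapper around the `defaults` commands for login/logout hooks.
# Run with sudo. Set DEFAULTS=echo to print the commands instead of running them.
DEFAULTS=${DEFAULTS:-defaults}
DOMAIN=com.apple.loginwindow

hooks() {
    case "$1" in
        install)
            # $2 = login script, $3 = logout script; both must be executable.
            [ -x "$2" ] && [ -x "$3" ] || { echo "hooks: scripts must exist and be executable" >&2; return 1; }
            "$DEFAULTS" write "$DOMAIN" LoginHook "$2" &&
            "$DEFAULTS" write "$DOMAIN" LogoutHook "$3"
            ;;
        remove)
            "$DEFAULTS" delete "$DOMAIN" LoginHook
            "$DEFAULTS" delete "$DOMAIN" LogoutHook
            ;;
        show)
            "$DEFAULTS" read "$DOMAIN"
            ;;
        *)
            echo "usage: hooks install LOGIN_SCRIPT LOGOUT_SCRIPT | remove | show" >&2
            return 1
            ;;
    esac
}

if [ "$#" -gt 0 ]; then
    hooks "$@"
fi
```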

So, what I have now are two scripts. The login one creates a folder in /home named for the user who's logging in, and then uses mount_nfs to request the user's home account from the server and mount it in this folder. The logout one forcibly unmounts the user's home account. And that is all.
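A login script along these lines might look like the sketch below. It is not the actual script from our lab: the server name and export path (NFS_SERVER, NFS_EXPORT) and HOME_ROOT are hypothetical placeholders, and it assumes each user's home is exported from the server as <export>/<user>. Setting DRY_RUN=1 prints the commands instead of running them, so you can test it as any user first, e.g. `sudo env DRY_RUN=1 /path/to/loginScript jdoe`.

```sh
#!/bin/sh
# LoginHook sketch: create /home/<user>, then mount_nfs the user's home onto it.
# loginwindow runs it as root, passing the short user name as $1.
# ASSUMPTIONS (hypothetical values): NFS_SERVER, NFS_EXPORT, HOME_ROOT.
NFS_SERVER=${NFS_SERVER:-homeserver.example.edu}
NFS_EXPORT=${NFS_EXPORT:-/home}
HOME_ROOT=${HOME_ROOT:-/home}

# Run a command, or just echo it when DRY_RUN is set.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

login_hook() {
    user=$1
    # Refuse empty names or names that could escape HOME_ROOT.
    case "$user" in
        "" | */* | . | ..) echo "loginhook: bad user name '$user'" >&2; return 1 ;;
    esac
    mount_point="$HOME_ROOT/$user"
    # mount_nfs needs an existing folder to mount onto.
    run mkdir -p "$mount_point" || return 1
    run mount_nfs "$NFS_SERVER:$NFS_EXPORT/$user" "$mount_point"
}

# loginwindow always supplies $1; run by hand with no argument, do nothing.
if [ "$#" -gt 0 ]; then
    login_hook "$1"
fi
```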

I've only tested this at home, but it seems to work brilliantly here, and is fast even over a wireless connection. I will be trying it at work on Monday morning and will post back with my results, success or failure.

Wish me luck.

UPDATE 1:
So far, so good. There were some snags, though. I came in a little early this morning (in hopes of starting before the lab became overrun with students) to implement and test my loginhook plan, only to find the home account server had crashed. So I had to spend the first hour restarting and troubleshooting the Panasas. By the time I'd finished, the lab was filling up, so I had to do everything piecemeal, and it took a bit longer than I'd hoped. That was snag one.

The next snag was that, after one user logged in, the next user would be locked out of his/her home account. This turned out to be a simple matter of having the loginhook set permissions on the /home directory so that this no longer happened.
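The exact permissions aren't given here, so treat this as one plausible shape for that fix rather than the real thing: keep /home itself readable and searchable by everyone (mode 755), and hand each freshly created mount point to the user who's logging in. The modes, HOME_ROOT, and the function name are all assumptions.

```sh
#!/bin/sh
# Hypothetical permissions step for a loginhook (modes are ASSUMPTIONS).
# Runs as root; $1 is the user name (or numeric uid) that is logging in.
HOME_ROOT=${HOME_ROOT:-/home}

prepare_home_root() {
    user=$1
    case "$user" in
        "" | */* | . | ..) echo "loginhook: bad user name '$user'" >&2; return 1 ;;
    esac
    # Keep the parent readable and searchable by everyone, so one user's
    # login never leaves /home closed off to the next user.
    mkdir -p "$HOME_ROOT" && chmod 755 "$HOME_ROOT" || return 1
    mkdir -p "$HOME_ROOT/$user" || return 1
    # Hand the (still empty) mount point to the user logging in.
    chown "$user" "$HOME_ROOT/$user"
}

if [ "$#" -gt 0 ]; then
    prepare_home_root "$1"
fi
```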

There are some advantages and disadvantages to this new mounting method. The greatest benefit is that any changes to the mount method are made via very simple shell scripts, and changes don't require a reboot to take effect. Just change the script and you're done. Another big plus is that, if the home account server goes down, the machines aren't affected to the extent they once were. If no one's logged in, there's no server access happening, so the machines are fine. Only if someone's logged in is it a problem, but then, that's always a problem. It occurs to me, though, that I may want to add a -hard option to my mount script sometime soon. Another advantage is that, since only individual user accounts are mounted, and not the entire home account server, a command like sudo rm -rf /home can't wipe out the entire server as it could before. The final advantage is that reboots are now much faster, since nothing happens at boot time.

The disadvantages are twofold, minor, and solvable: First, since the entire home account server is never mounted now, users no longer have easy access to each other's home accounts. To access another user's home account, they must now connect to the home account server via the Finder's "Connect to Server..." command (or Command-K). Secondly, any user who wants to ssh to one of the other Macs, say, from a laptop, will not have access to his home account, since the mount only occurs when logging in to the GUI. Both issues could be solved by simply mounting the home account server somewhere other than /home, but I think we'll try to live without that for now, as I kind of like not having the whole RAID mounted.

Anyway, no complaints thus far. I'll be convinced this is a go after a week or so without problems. But I'm cautiously optimistic at this point.

UPDATE 2:
Day two, and all's quiet on the Western front, as it were. I'm obsessively monitoring the situation, but it's looking very good at this point. Fingers crossed.

Labels: Lab, Systems, Tiger, TigerLabMigration

This entry was posted on Sunday, October 23, 2005 at 12:34 PM.
