Carousel is a lie!

And now it's May
8th May 2012

Prompted by fierce internecine rivalry with Tampa Bay Breakfasts, I'm finally putting in an update. My supervisor is my four-year-old son, who's busy reading "You are the first kid on Mars" beside me while holding on to Power Ranger and Terl action figures.

Work: I've got a summer student. She was at one of the labs I work with for the last 8 months, and showed a real aptitude for computers. My boss agreed to pick up the bill for her salary, so here we are.

It's working out really, really well. She's got a lot to learn (basic networking, for example) but it is SUCH A WONDERFUL THING to have someone to send off on jobs. "Hey, have you got a minute to..." "She'll take care of it." She can help with what she knows, and what she doesn't she takes careful notes on. I've even had a chance to work on other, larger projects for, like, an hour or two at a time. It's great.

I'm going away for three weeks in June/July, and there's a lot to teach her before then. Fortunately, there are a couple of other sysadmins who can help out, and a couple of other technical folk in the lab who can take on some duties. But it's been a real wake-up for me, realizing how much could be made easier for someone else. It'd be nice, for example, to have something that'd let people reboot machines easily when they get stuck. Right now, I SSH to the ILOM and reset it there; what about a web page? It'd be its own set of problems, of course, and I'm not going to code something up between now and June, but it's something to think about. Or at least coming up with some handy wrapper around the ipmipower/console commands.
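For what it's worth, a wrapper like that could start out very small. This is only a sketch under assumptions: the function name bmc_power, the "-ilom" hostname suffix, the IPMI_USER default and the IPMI_DRY_RUN switch are all made up for illustration, and it assumes FreeIPMI's ipmipower is what's installed.

```shell
#!/bin/sh
# Hypothetical wrapper around FreeIPMI's ipmipower.
# Assumptions (not from any real setup): each machine's BMC answers at
# "<host>-ilom", and IPMI_USER names an account allowed to control power.
IPMI_USER=${IPMI_USER:-admin}

bmc_power() {
    host=$1
    action=${2:-stat}
    # Only allow the handful of actions a helper should need.
    case "$action" in
        stat|reset|cycle|on|off) ;;
        *)
            echo "usage: bmc_power HOST [stat|reset|cycle|on|off]" >&2
            return 2
            ;;
    esac
    # -P makes ipmipower prompt for the password instead of taking it
    # on the command line.
    set -- ipmipower -h "${host}-ilom" -u "$IPMI_USER" -P "--$action"
    if [ -n "${IPMI_DRY_RUN:-}" ]; then
        echo "$*"      # dry run: just show what would be executed
    else
        "$@"
    fi
}
```

With IPMI_DRY_RUN=1 set, "bmc_power node01 reset" only prints the ipmipower command it would have run, which makes it safe to hand to someone who is still learning the ropes.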

Home: The weather is at last, AT LAST becoming sunny and springlike. I took the telescope out on Saturday -- full moon, so I spent most of my time looking at Saturn. And holy crap, was it amazing! I saw the Cassini division for the first time, the C ring (!) and five moons. I'm starting to regret (a little) having sold the 4.3mm eyepiece; the 7.5mm is nice but does badly in the Barlow, which I suspect says more about the Barlow than anything else. (Also that night: tried looking for M65 and M66, just to see if I could find them in the suburbs under a full moon. Negative.)

I'm trying to port an astronomical utility to Rockbox; it will show altitude and azimuth for planets, Messier and NGC objects. My intention is to use it with manual setting circles on my Dob. The interesting part is that Rockbox has no floating point arithmetic, so it's not a straightforward port at all. Thus I've had to learn about fixed point arithmetic, lookup tables and the like. My trig and bitwise arithmetic are, how do you say, weak from underuse, so this is a bit of a slog. But I'm hopeful.
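To give a flavour of the fixed point and lookup table business, here is a toy sketch in shell, whose arithmetic is integer-only in much the same way. It is not the port itself (that would be C); the scale factor of 10000, the 10-degree table spacing and the function names are all choices made just for this illustration.

```shell
#!/bin/sh
# Toy fixed-point sine: values are integers scaled by 10000, so 0.5
# is stored as 5000. The table holds sin(0), sin(10), ... sin(90).
SIN_TABLE="0 1736 3420 5000 6428 7660 8660 9397 9848 10000"

# table_entry N: print entry N (0..9) of SIN_TABLE.
table_entry() {
    set -- $(( $1 + 1 )) $SIN_TABLE
    shift "$1"
    echo "$1"
}

# fixed_sin DEGREES: print sin(DEGREES) * 10000 for whole degrees,
# using quadrant symmetry plus linear interpolation between entries.
fixed_sin() {
    deg=$(( $1 % 360 ))
    if [ "$deg" -lt 0 ]; then deg=$(( deg + 360 )); fi
    sign=1
    if [ "$deg" -ge 180 ]; then deg=$(( deg - 180 )); sign=-1; fi
    if [ "$deg" -gt 90 ]; then deg=$(( 180 - deg )); fi
    idx=$(( deg / 10 ))
    frac=$(( deg % 10 ))
    lo=$(table_entry "$idx")
    if [ "$frac" -eq 0 ]; then
        hi=$lo
    else
        hi=$(table_entry $(( idx + 1 )))
    fi
    echo $(( sign * (lo + (hi - lo) * frac / 10) ))
}
```

"fixed_sin 30" prints 5000, i.e. 0.5. The coarse table makes in-between angles only approximate, which is exactly the trade-off: a bigger table costs memory, a smaller one costs accuracy.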

And now my other supervisor is coming for a status report. Time to go!

Tags: astronomy, programming, work.
The Fried Rice Manifesto
23rd April 2012

Pretty awesome:

http://www.busydadblog.com/entries/the-fried-rice-manifesto.html

Tags: cooking.
Debugging Bacula FileSet exclusions -- an example
20th April 2012

A user at $WORK was running a series of jobs on the cluster -- dozens at any moment. Other users have their quota set to 60 GB, but this user's was not (long story). His home directory is at 400 GB, but it was closer to a terabyte not so long ago....right when we had a hard drive and a tape drive fail at the same time on our backup server.

We do backups every night to tape using Bacula. Most backups are incremental (whatever changed since the last backup, usually the day before) and are small...maybe tens of GB per day. But backups for this user, because of the proliferation of logs from his jobs, were closer to the size of his home directory every day -- simply because all these log files were being updated as each job progressed.

Ordinarily this wouldn't be a problem, but the cluster of hardware failures has really fucked things up; they're better now, but I'm very slowly playing catchup backups. Eating a tape or more every day is not in my budget right this moment.

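I asked him if any of the log files could be excluded from backups without any great loss. After talking it over with him, we came to this agreement: under /home/example/projects/output, only files with "rep0" in their names would be backed up, and everything else in his home directory would be backed up as before.

This would exclude lots of other files like "1rep2.foo", "8rep9.log", etc, and would cut out about 200 GB of useless churn every day.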

Bacula has the ability to do this sort of thing...but I found its methods somewhat counterintuitive, so I want to set down what I did and how I tested it.

First off, the original, let's-include-everything FileSet looked like this:

FileSet {
  Name = "example"
  Include {
    File = /home/example
    Options {
      signature = SHA1
    }
  }
  Exclude {
    File = /proc
    File = /tmp
    File = /.journal
    File = /.fsck
    File = /.zfs
  }
}

We back up everything under /home/example, we keep SHA1 signatures, and we exclude a handful of directories (most of which are boilerplate, applied to every FileSet by default).

In order to get Bacula to change the FileSet definition, you have to get the director to reload its configuration file. But some errors -- not all -- cause a running bacula-dir process to die. So before I started fiddling around, I added a Makefile to the /opt/bacula/etc directory that looked like this:

test:
        @/opt/bacula/sbin/bacula-dir -t && echo "bacula-dir.conf looks good" || { echo "problem with bacula-dir.conf" ; exit 1 ; }

reload: test
        echo "reload" | /opt/bacula/sbin/bconsole

Whenever I made a change, I'd run "make reload", which would test the configuration first; if it failed, bacula would not be reloaded. (The "@" symbol, in a Makefile, stops make from echoing the command line itself before running it.)

Next, I needed a listing of what we were backing up now, before I started fiddling with things:

    echo "estimate job=fileserver-example listing" | bconsole > /tmp/listing-before

The "estimate" command gets Bacula to estimate how big the job is; the "listing" argument tells it to list the files it'd back up. By default it gives you the info for a full backup. (You can also append a joblevel, so you can see how big a Differential or Incremental would be; I didn't need that here, but it's worth remembering for next time.)
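As a sketch of that joblevel variant: the "estimate" command takes a "level=" keyword. The function name estimate_listing and the BCONSOLE variable below are made up for illustration; BCONSOLE is only there so the console binary can be swapped out.

```shell
#!/bin/sh
# Hypothetical helper: ask the director for a file listing at a given level.
# BCONSOLE is an assumption of this sketch; it defaults to plain "bconsole".
BCONSOLE=${BCONSOLE:-bconsole}

# estimate_listing JOB [LEVEL]: LEVEL is Full, Differential or Incremental.
estimate_listing() {
    job=$1
    level=${2:-Full}
    echo "estimate job=$job listing level=$level" | $BCONSOLE
}
```

For example, "estimate_listing fileserver-example Incremental > /tmp/listing-incr" would capture what tonight's incremental is expected to pick up.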

After that, I made another Makefile that looked like this:

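test: estimate shouldwork shouldfail

# Write the post-change listing, then show line counts for both listings.
estimate:
        @echo "estimate job=fileserver-example listing" | bconsole > /tmp/listing-after ; wc -l /tmp/listing*

# Every rep0 file under projects/output must still be in the new listing.
shouldwork: estimate
        grep rep0 /tmp/listing-before | grep projects/output | while read -r i ; do grep -qF -- "$$i" /tmp/listing-after || exit 1 ; done

# No rep2 file under projects/output may still be in the new listing.
shouldfail: estimate
        grep rep2 /tmp/listing-before | grep projects/output | while read -r i ; do if grep -qF -- "$$i" /tmp/listing-after ; then exit 1 ; fi ; done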

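This is a little hackish, so in detail: "estimate" writes the new listing to /tmp/listing-after and prints line counts for both listings as a quick sanity check. "shouldwork" takes every rep0 file under projects/output in the old listing and fails if any of them is missing from the new one. "shouldfail" does the reverse for rep2 files, which stand in for everything that should now be excluded: any that still show up in the new listing count as a failure.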

Anyhow: after each change, I'd run "make reload" as root to make sure that the syntax worked. After that, I'd run "make test" as an ordinary user (no need for root privileges) to make sure that I was on the right track. After a while, I got this:

FileSet {
  Name = "example"
  Include {
      File = /home/example
      Include {
        Options {
          signature = SHA1
          Wilddir = /home/example/projects/output
          Exclude = yes
        }
      }
  }
  Include {
    File = /home/example/projects/output
    Options {
      WildFile = "*rep0*"
      Signature = SHA1
    }
    Options {
      Exclude = yes
      RegexFile = ".*"
    }
  }
  Exclude {
    File = /proc
    File = /tmp
    File = /.journal
    File = /.fsck
    File = /.zfs
  }
}

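Again, this is a little counterintuitive to me, so here's how it works out. The first Include backs up /home/example, but its Options block matches the projects/output directory (WildDir) and, because of "Exclude = yes", leaves it out. The second Include then picks up projects/output on its own terms. Options blocks are checked in order and the first one that matches a file wins: files matching "*rep0*" are backed up with SHA1 signatures, and the catch-all RegexFile ".*" with "Exclude = yes" throws away everything else.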
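One way to keep the ordering straight is to model it as a chain of pattern matches where the first hit decides. The sketch below is a simplified stand-in, not Bacula's real matcher: it only considers regular files, the function name would_back_up is made up, and the example file names are hypothetical.

```shell
#!/bin/sh
# Simplified model of the FileSet for regular files: first match wins.
# would_back_up PATH: succeed if PATH would end up in the backup.
would_back_up() {
    case "$1" in
        /home/example/projects/output/*)
            # Second Include: Options blocks are tried in order.
            case "$1" in
                *rep0*) return 0 ;;   # WildFile = "*rep0*"  -> kept
                *)      return 1 ;;   # RegexFile = ".*", Exclude = yes
            esac
            ;;
        /home/example/*)
            return 0 ;;               # first Include: rest of the home dir
        *)
            return 1 ;;               # outside the FileSet entirely
    esac
}
```

So a file like /home/example/projects/output/5rep0.log is kept, /home/example/projects/output/1rep2.foo is dropped, and anything elsewhere under /home/example is backed up as before.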

After I was confident that I had the right set of files excluded, I sent the user a list of files to confirm that all was well:

cat /tmp/listing-before | while read -r i ; do grep -qF -- "$i" /tmp/listing-after || echo "$i" ; done > /tmp/excluded

Now, I'm the first to admit that that is ugly. Diff, useless use of cat...lots of objections to raise. But it's been a long day and I got what I wanted. I pointed the user at it, made sure it was okay, and committed the changes.
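For the record, a less ugly way to get the same list is to sort both listings and let comm do the comparison. This is a sketch, not what was actually run; the function name list_excluded is made up, and it assumes mktemp is available.

```shell
#!/bin/sh
# list_excluded BEFORE AFTER: print lines that are in BEFORE but not in AFTER.
list_excluded() {
    before_sorted=$(mktemp) || return 1
    after_sorted=$(mktemp) || { rm -f "$before_sorted" ; return 1 ; }
    # comm needs both inputs sorted the same way; LC_ALL=C keeps the
    # ordering consistent between sort and comm.
    LC_ALL=C sort "$1" > "$before_sorted"
    LC_ALL=C sort "$2" > "$after_sorted"
    # -23 suppresses lines unique to AFTER and lines common to both.
    LC_ALL=C comm -23 "$before_sorted" "$after_sorted"
    rm -f "$before_sorted" "$after_sorted"
}
```

Usage would be "list_excluded /tmp/listing-before /tmp/listing-after > /tmp/excluded". Unlike the loop, this reads each listing once instead of grepping the second file once per line, which matters when the listings run to hundreds of thousands of files.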

All in all, this gave me a good loop for testing: it caught fatal errors before they happened, it let me be sure I was excluding the right things, and I was able to work in a stepwise fashion to get where I wanted.

Tags: backups, bacula.
In case this doesn't make it to Slashdot
20th April 2012

Ordinarily, I wouldn't do this...but it's funny because sob.

(EDIT: Accepted!)

No tags
Fixing BeautifulSoup/Venus errors in Debian
19th April 2012

I use Venus to aggregate a number of blogs I read. It works well, but I kept getting lots of complaints about errors in different feeds; it led to some blogs being kept off the page entirely.

Turns out this is a known problem with older versions of BeautifulSoup, the parser used by Venus. Today I was finally motivated to fix it, and I think it's a sign of the sad, sad decline that happens to you after you're 40 that, rather than try something from Debian's experimental branch, or running dpkg-rebuild --no-beautiful-soup errors (sadly undocumented)...I simply followed the suggestions here and copied newer versions of the files into place.

The good: now I can read Matt's blog again. The bad: Matt will be so disappointed in me that he may not even let me buy him beer at LISA this year.

No tags
Downloading MBSA from Microsoft with IE Enhanced Security
18th April 2012

This took a while to figure out...While trying to download MBSA from Microsoft, using IE 9 on a freshly patched install of MS Server 2008, I kept getting the error message "Your current security settings do not allow this file to be downloaded." The solution turned out to be to (temporarily!) add "http://download.microsoft.com" to the list of trusted sites (Internet Options -> Security -> Trusted Sites -> Sites).

No tags
Detailed setup for OrgMode
18th April 2012

http://doc.norang.ca/org-mode.html is an excellent tutorial on customizing Org Mode to the nth degree. I keep forgetting the link, so I'm writing it down here...and I highly recommend checking it out.

Tags: emacs, org.
Brewday!
15th April 2012

Got to brew yesterday, and my oldest son helped out:

Decanting

Gonna be an amazingly bitter, session hefeweizen.

Tags: beer.
Linker error: cannot find -lg2c
12th April 2012

I ran into a problem today trying to compile an old Fortran program. Everything was working until the final link:

gcc -o ./DAlphaBall.f77 -O DAlphaBall.o sos_minor_gmp.o alf_tools_gmp.o adjust.o alfcx.o alfcx_tools.o delcx.o truncate_real.o measure_dvol.o dsurfvol_tools.o vector.o write_simplices.o  -lgmp -lg2c -lm
/usr/bin/ld: cannot find -lg2c
collect2: ld returned 1 exit status

The strange thing is that libg2c.so.0 was installed:

$ ls -l /usr/lib64/libg2c*
lrwxrwxrwx 1 root root     15 2010-12-30 12:43 /usr/lib64/libg2c.so.0 -> libg2c.so.0.0.0
-rwxr-xr-x 1 root root 127368 2010-07-05 04:57 /usr/lib64/libg2c.so.0.0.0

After some searching, it seems that libg2c belongs to g77, the older GNU Fortran compiler that gfortran replaced. My problem was that I was compiling with gfortran (part of the gcc-4 series) and not g77. On this system, the old version was installed as gcc33-*, and changing the Fortran compiler and CC/LD variables to the appropriate version worked a treat.
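A note on why the installed libg2c.so.0 didn't help: when given -lg2c, ld looks for a file named libg2c.so or libg2c.a in its search directories, and the versioned libg2c.so.0 is only the name used at run time. A quick check along those lines might look like this; the function name has_link_name is made up for the sketch, and it only checks one directory rather than the linker's full search path.

```shell
#!/bin/sh
# has_link_name DIR NAME: succeed if DIR holds a file that "-lNAME"
# could resolve to at link time (libNAME.so or libNAME.a).
has_link_name() {
    [ -e "$1/lib$2.so" ] || [ -e "$1/lib$2.a" ]
}
```

For example, "has_link_name /usr/lib64 g2c || echo 'no link-time libg2c here'" would have flagged the directory listing above, since it contains only libg2c.so.0 and libg2c.so.0.0.0.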

Oh, and here's some good technical background on linkers and names.

Tags: programming.
Why I'm starting to hate Bacula
11th April 2012

This is an attempt to lay out my problems with Bacula, and to be explicit about what I hope to achieve by replacing it (if, in fact, I do go ahead with that). If I'm wrong, correct me.

Too many long jobs monopolize spool space, storage job slots, and generally hold up production.

My largest jobs right now are around 1-2 TB -- and in order to accomplish that, I need to manually split up filesystems using a messy syntax. A job running that long will cycle through spooling and despooling many, many times. During spooling, it ties up spool space and a storage job slot. During despooling, no other job can despool to that tape drive. Often, this ends up holding up a lot of other jobs. If there's a problem, I'm faced with a choice between killing a job that's been running for days, or letting lots of other stuff go without backups until/unless it finishes.

More generally, I'm faced with a choice between letting everything run forever at the beginning of the month (because it's simplest to schedule fulls for the first Saturday or some such), or juggling schedules manually to stagger things (which I'm doing now, and leads to schedules like FullBackupSecondSundayAfterLent).

Possible fixes:

Bacula seems to get confused easily about what tapes are available for use.

Bacula's storage daemon seems to often hold on to outdated info about what tapes are in what state.

Example: the daily pool is full, so jobs are halted. Status storage shows it's waiting for a drive to be created for the daily pool. I move a volume from another pool, then have to attempt to mount it manually in the appropriate drive -- the storage daemon doesn't pick up on this change automatically.

Sometimes this works, and sometimes it doesn't. Sometimes both drives are waiting for a tape from the same pool; creating one doesn't let the jobs queued up on the other drive run on that new tape, but rather you need to create a second new tape and mount it. On top of that, sometimes the jobs hang around on the storage daemon still waiting for a new tape -- or something...because they don't get out of the way, and let other jobs run in their place, unless they're cancelled (and sometimes only when bacula-sd is restarted).

This may be fixed with the upgrade to 5.2.6. However....

The new version of Bacula crashes when I run too many jobs at once.

That's 5.2.6, which I upgraded to from 5.0.2 (time got away on me, yes). And by too many I mean, like, 50. That's not too many! I'm not sure what the hell's going on, though at least now I have a backtrace. I'm seriously pissed off about this point. Yes, I'll file a bug, but this is annoying.

All in all, I spend far too much time babysitting Bacula.

It's extremely high maintenance, and that's pissing me off. Understand, this is coming after a long weekend spent babysitting it, trying to make sure some jobs got written. There are other problems at work, yes, but this is not meant to be so hard.

Tags: backups, bacula.
