I wrote “On Languages and Models” some time ago; the timestamp on the file is May 11, 2005.

It was surprising to find it by way of a Google search. I don’t know how it was exposed to the world, but it was. C’est la vie.

There are a few ideas and things I want to hold on to there—perhaps I’ll just repost it at some point as a weblog post.

I’ve long feared the use of CD-R and DVD-R as backup media. They are fundamentally unstable technologies. The burn might look good, and it might verify, but a year later I might not be able to read the data. I know tape is a great deal more stable, but I’ve often been put off by the high cost of tape drives.

Now I’m about to put a new machine on-line. Along with two friends/colleagues, I’ve purchased a 1U rack mount server. It’s a dual PIII 1.4GHz machine with 2GB of RAM and a mirrored pair of 250GB drives. We’ve replaced all the moving parts in the server (the drives are new), and are being quite fastidious in our design and setup of the system. Our goal is to make administration as simple as possible while allowing for a diverse range of users and uses. Based on our experiences with our Bytemark virtual machine, I’m confident that we’ll have a great box in place.

What’s the missing link? Backups. We’ve got complete monitoring on the machine, so if one of the drives in the RAID array tanks, we’ll know by email, SMS, and every other messaging mechanism known to us that something’s gone wrong. I’m developing a little language for specifying backups of the filesystem, databases (both Postgres and MySQL databases run on the machine), Subversion, and anything else that comes up needing to be backed up. Those backups will go first to disk, and then off-site.
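The backup language itself isn’t described here. As a rough illustration of the idea only, here is a minimal Python sketch of a declarative spec that expands into shell commands: one local dump per source, then a single off-site copy. The `Source` type, the kind names, and the file layout are hypothetical; the commands use standard options of `tar`, `pg_dump`, `mysqldump`, `svnadmin hotcopy`, and `rsync`.

```python
from dataclasses import dataclass


@dataclass
class Source:
    kind: str    # "files", "postgres", "mysql" or "svn" (hypothetical vocabulary)
    name: str    # label used for the local dump
    target: str  # directory, database name, or repository path


def commands(sources, staging, offsite):
    """Expand a spec into argv lists: a dump to disk per source, then off-site."""
    plan = []
    for s in sources:
        out = f"{staging}/{s.name}"
        if s.kind == "files":
            plan.append(["tar", "-czf", out + ".tar.gz", s.target])
        elif s.kind == "postgres":
            # -Fc: pg_dump's custom archive format, restorable with pg_restore
            plan.append(["pg_dump", "-Fc", "-f", out + ".dump", s.target])
        elif s.kind == "mysql":
            plan.append(["mysqldump", "--result-file=" + out + ".sql", s.target])
        elif s.kind == "svn":
            plan.append(["svnadmin", "hotcopy", s.target, out + ".svn"])
        else:
            raise ValueError(f"unknown source kind: {s.kind}")
    # Second stage: copy the whole staging directory off-site.
    plan.append(["rsync", "-a", staging + "/", offsite])
    return plan
```

Each argument list could then be run in order with `subprocess.run(argv, check=True)`, so a failed dump stops the run before anything is copied off-site.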

But where then? Even with a clean, automated framework for moving data off the server, I still need somewhere safe to store the backups. This post is just me exploring the cost and reliability of several different options.

Solution            Drive cost  £/GB (low)  £/GB (high)  Shelf life  Notes    Total cost
DVD                 £40         0.02        0.09         6m – 2y     –        £50
Tape (camcorder)    £280        0.17        0.50         3y – 10y    4, 5     £300
Tape drive          £900        0.22        0.30         3y – 20y    1, 2, 3  £950
Hard drive (RAID1)  £200        –           1.20         2y – 4y     –        £200

Notes

  1. I’ve given tapes a 3 to 20 year life, as good tape practice assumes re-tensioning every 2-3 years.
  2. Apple Store link for VXA2
  3. Exabyte page
  4. Carts for VXA2
  5. Coolatoola homepage for DV Backup
  6. Apple page for DV Backup
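A rough way to compare the removable-media options is one-off drive cost plus media cost per gigabyte. This sketch uses the low and high £/GB figures from the table; it leaves out the hard-drive row, on the assumption that its £/GB figure already describes the drives themselves rather than separate media, and it ignores shelf life entirely.

```python
# Drive cost (£) and media cost range (£/GB low, high), copied from the table.
# The hard-drive row is deliberately omitted (see the note above).
OPTIONS = {
    "DVD": (40, 0.02, 0.09),
    "Tape (camcorder)": (280, 0.17, 0.50),
    "Tape drive": (900, 0.22, 0.30),
}


def cost_range(option, gigabytes):
    """Return (low, high) total cost in £ for storing `gigabytes` of backups."""
    drive, low, high = OPTIONS[option]
    return drive + gigabytes * low, drive + gigabytes * high


def cheapest(gigabytes):
    """Option with the lowest worst-case (high-estimate) total cost."""
    return min(OPTIONS, key=lambda name: cost_range(name, gigabytes)[1])


for name in OPTIONS:
    low, high = cost_range(name, 500)
    print(f"{name:18} £{low:8.2f} to £{high:8.2f}")
```

Because DVD has both the lowest drive cost and the lowest media cost per GB in the table, it wins on price at any volume; the case for tape rests on the shelf-life column, which this sketch does not model.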

I’ve been through this before, but it was worth seeing again.

So, if I want my backups to last a long time, I need to put them on tape. I can do that most cheaply with software like DV Backup, which allows me to store data on miniDV tape, turning a consumer miniDV camcorder into a digital tape drive. This is almost certainly the most stable format I can affordably purchase.

A real tape drive is simply out of the budget at this time. And while I could use a hard drive, the shelf life of a pair of drives isn’t actually that great. Tape will last much longer, although it presents issues of data migration. Still, I’d rather (and can more easily afford to) re-tension a miniDV tape every 2-3 years than buy new drives every 2-3 years.

For the short term, I think I’ll be using DVD. Our backups are intended for restoration in case of catastrophic failure, and therefore we’re only interested in restoring the most recent weekly or monthly backup. As soon as I can afford it, I’ll purchase a miniDV camcorder and begin using that as a destination for monthly (full) backups, and will keep those for a relatively long period of time.

It’s clear: I’ll need an NXT, and more importantly, I really hope we can get one at Kent early so we can port the Transterpreter to it. There seem to be so many absolutely cool things we can do with it as a substrate.

And, since our runtime should work perfectly well on both the original Mindstorms and the NXT, we’re in a good position to have a nice bridge between the two. Yummy!

Lisp, as they say, is not Scheme.

http://mailsteward.com/

MailSteward does something I once started writing my own software to do: it archives email.

  1. It looks at what mail accounts you have in your Mail.app setup.
  2. It downloads all your email.
  3. It lets you search your mail, and export it in any number of ways.

Most importantly, it can export plain text, MBOX, or (this is fun) an SQL file that can be imported directly into MySQL. If you’re just a normal person, you won’t necessarily need these features, but it means that the application does not hold your email hostage. Instead, it lets you export your mail in a variety of open, standard formats.

Why do open standards matter? For example, say I back up all my mail from the server, and then delete it. If I ever want to load it back in, all I have to do is dump it out of MailSteward in (say) an MBOX format, and open it up with most any mail client. Blammo! It Just Works.
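Because MBOX is an open format, ordinary tools can read it too, not just mail clients. Python’s standard `mailbox` module is one example. A minimal sketch, assuming the export has been saved as an ordinary mbox file (the function name and the path are hypothetical):

```python
import mailbox


def list_messages(path):
    """Return (From, Subject) pairs for every message in an mbox file."""
    # create=False: fail loudly if the export file is missing
    box = mailbox.mbox(path, create=False)
    try:
        return [(msg.get("From", ""), msg.get("Subject", "")) for msg in box]
    finally:
        box.close()
```

For example, `for sender, subject in list_messages("export.mbox"): print(sender, subject)` prints one line per archived message.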

I bought it. The ability to easily search all of my email is too good to pass up. Yes, I know Mail.app integrates with Spotlight, and all these other things… but MailSteward just stores, sorts, and searches email. And that’s great.

I’ll have more to say about it when I’ve spent more time using it. Initial impressions are excellent.

I wrote a little Scheme script to recursively traverse directories and find out how “big” they are.

[Lyra] mcj4 > dir-size analysis/
1240389 files, 8252 directories
30GB 472MB 596KB

[Lyra] mcj4 > dir-size Music/
1918 files, 635 directories
10GB 243MB 469KB

[Lyra] mcj4 > dir-size docs/
56065 files, 2683 directories
7GB 505MB 570KB

[Lyra] mcj4 > dir-size Pictures
4211 files, 269 directories
3GB 262MB 312KB
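The Scheme script itself isn’t shown, so here is a rough Python equivalent, not the original code. It assumes the sizes above are in 1024-based units and that symbolic links should be skipped rather than followed; both are guesses, so its numbers may not match the output above exactly.

```python
import os


def dir_size(root):
    """Walk `root` recursively; return (files, subdirectories, total bytes)."""
    files = dirs = total = 0
    # os.walk does not descend into symlinked directories by default
    for dirpath, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks (assumption)
            try:
                total += os.path.getsize(path)
            except OSError:
                continue  # vanished or unreadable; leave it out of the count
            files += 1
    return files, dirs, total


def human(nbytes):
    """Format as 'XGB YMB ZKB' in 1024-based units, dropping leftover bytes."""
    gb, rest = divmod(nbytes, 1024 ** 3)
    mb, rest = divmod(rest, 1024 ** 2)
    return f"{gb}GB {mb}MB {rest // 1024}KB"


def report(root):
    """Two-line summary in the same shape as the dir-size output."""
    files, dirs, total = dir_size(root)
    return f"{files} files, {dirs} directories\n{human(total)}"
```

Running `print(report("Music"))` produces output in the same shape as the examples above.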

My analysis directory contains the data from my dissertation, the code that manipulates that data, and the output from that code: 30 gigabytes of data in over 1.2 million files! The documents directory contains my writing, programming, and other projects. There is not much in the way of “media” there; it’s almost entirely text, PDFs, and the like. Still, at roughly 7.5GB, there must be a lot of something in there.

It’s interesting to get a sense of where the 80GB of space on my PowerBook has gone… clearly, my dissertation is to blame.

One thing I really like about the x86 architecture is the availability of so many cool virtualization technologies.

At work, I run Debian. I don’t even have a graphical interface configured; the machine just stores files, hosts databases, and runs long-running processes. It’s a server, essentially. However, it is a 2.4GHz Pentium 4 with 512MB of RAM, which makes it fast enough for reasonably interesting work.

For example, the release of VMWare’s Player application opens up a number of opportunities. While it cannot create a virtual machine image, it can play an existing one. That means I can do a number of interesting things, and letting others play along is no longer difficult.

  1. The CSCS group is interested in Linux. I’d love to provide a VMWare image that lets them run Linux on a Windows machine without having to build a Linux machine themselves.
  2. Similarly, the CSCS group expressed an interest in learning some things about Linux security. I’d rather they were doing such experiments on a VM, and not a machine that I use for my research.
  3. I’d like to experiment with different *NIXes, but I can’t afford the time to keep nuking a machine to install them. Having a VM allows me to easily install (say) Ubuntu. Or SUSE.
  4. It would be incredibly useful to have a Windows machine around, as I occasionally encounter something for a handheld device, small robotics platform, etc., that only comes as a Windows executable. However, I don’t have any desire to actually use Windows every day, nor do I want to replace my rock-solid Debian server. A VM would make my life easy.
  5. Relatedly, it would be great to have Windows XP, Windows 2000, and Windows 98 installations so that the Transterpreter could easily be installed and tested on a clean machine every time we make a new release. Also, there were some genuine difficulties handling environment variables under XP, and neither Christian nor I had an XP machine; we both have Linux machines next to our desks. Makes it kinda difficult to guarantee portability of a binary installer for an OS you don’t actually have or use.
  6. Occasionally, I’d like to be able to play a game or two—nothing cutting edge, mind you, just some old games I grew up with. These are rarely available for the Mac, and almost always available for Windows. A VM running Windows 98 would be perfect in these situations.

OK, so the last item isn’t exactly work related, but it is at least honest. Point is, having a machine where I can store a number of 5-10GB disk images would be great. The Linux images could easily be provided to students for them to download and experiment with, and the Windows images would make it so much easier to occasionally boot “Windows-in-a-Window” to use an old or specialized piece of software that, otherwise, would be a real pain.

I’d love to wait until Apple releases an Intel box to build a machine that does this. The thought of being able to run OSX in a dual-boot configuration with Debian or Ubuntu, and therefore be able to run VMWare on the Linux side, is really quite exciting. It is even possible that VMWare will release an OSX port of their virtualization software, meaning I could use OSX as a VM host for Linux and Windows. That, really, would be wonderful.

In the meantime, I have to wonder if it is worth building a small machine that can do all of these things (so I can take the machine with me after I leave Kent), or if I just want to ask the department for a drive and another gig of RAM to give it a go, and do some cool stuff related to my work and teaching.

How does one archive email?

Sending everything to Gmail isn’t really an option; they might pack up shop any day of the week. Sticking with one client isn’t an option, because most of them are awful. I mean, I don’t want to save things with Thunderbird, or Pine, or MH, or Apple’s Mail.app, or any other tool that binds my data to a particular folder structure, place, etc. I don’t want something to export my mail as HTML, as that leaves me with readable, searchable mail that is not actually well-structured enough to be processed automatically.

I think I would settle for one of two kinds of tool:

  1. An IMAP/POP to XML email archiver. I’d be happy if all my email ended up in a mess of XML documents; I could easily process and display those in a variety of ways (Scheme scripts, XSLT).
  2. A Postgres or MySQL datastore. Perhaps an SQLite file would be better; in any case, if the tables are well documented, I can (again) process it any way I want.
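As an illustration of the second option, here is a minimal sketch of a documented SQLite archive using Python’s standard `sqlite3` and `email` modules. The table layout and function names are hypothetical, not taken from any existing tool; it keeps the raw message alongside a few parsed headers, so nothing is lost even if the parsed columns prove inadequate later.

```python
import sqlite3
from email import message_from_bytes
from email.policy import default

SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id         INTEGER PRIMARY KEY,
    message_id TEXT UNIQUE,  -- Message-ID header; used to skip duplicates
    sender     TEXT,
    recipients TEXT,
    sent_date  TEXT,         -- Date header, stored verbatim
    subject    TEXT,
    body       TEXT,         -- plain-text part only (a simplification)
    raw        BLOB          -- the complete original message
)
"""


def open_archive(path):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def _header(msg, name):
    value = msg[name]
    return None if value is None else str(value)


def archive_message(conn, raw):
    """Parse one RFC 822 message (bytes) and store it; duplicates are ignored."""
    msg = message_from_bytes(raw, policy=default)
    part = msg.get_body(preferencelist=("plain",))
    body = part.get_content() if part is not None else ""
    with conn:  # commits on success
        conn.execute(
            "INSERT OR IGNORE INTO messages "
            "(message_id, sender, recipients, sent_date, subject, body, raw) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)",
            (_header(msg, "Message-ID"), _header(msg, "From"), _header(msg, "To"),
             _header(msg, "Date"), _header(msg, "Subject"), body, raw),
        )


def search(conn, term):
    """(sender, subject) of messages whose subject or body contains `term`."""
    pattern = f"%{term}%"
    return conn.execute(
        "SELECT sender, subject FROM messages WHERE subject LIKE ? OR body LIKE ?",
        (pattern, pattern),
    ).fetchall()
```

Fetching the messages from IMAP or POP is left out; Python’s `imaplib` or `poplib` could supply the raw bytes.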

Surprisingly, there doesn’t seem to be a tool that does this. Nothing that is affordable ($20-75), cross-platform (Mac/Windows/Linux), capable of running on the local client or as a service, and most importantly, that leaves my data in an open, manipulable format. Zoe is interesting and all, but it is a nightmare to use; I never have any idea what it is doing, and that scares me.

Perhaps I’ll do a bit more research; it seems like a simple, fundamental service that would matter to a lot of people, but as far as I can tell, it doesn’t exist.

When developing a software product, do you want to hire someone with expertise designing, developing, and maintaining software? Or, do you want to hire someone with domain expertise?

Consider this advert from jobs.phds.org:

I’m looking for a C++ expert who has in their previous experience undergone a PhD or numerical/mathematical research and seeking to build their career into this space as a Jr Quant Developer. This role needs an individual who has spent the last 2-4 years of their experience applying advanced C++ ideally within a mathematical programming framework. You will go beyond pure development and be involved in enhanncing functionality, design and research of an option based pricing system. You will join a small team of individuals with the expectation of growing into the role and potentially creating your own pricing tool/library for the business in the long-run. PhD disciplines will include Physics, Applied Mathematics or related courses with a heavy C++ slant to your experience. Knowledge of derivatives pricing will assure an excellent stepping-stone into the field of financial mathematics. Sound like you? … This role can be based either in London or New york.

(emphasis mine)

The advert clearly states that they want a hard-core C++ developer to develop numerical software. However, it also implies that the developer will be responsible for larger pieces of the software development cycle: research, design, and product evolution, with an eye towards creating a library or application with a reasonably long shelf life. That, to me, sounds like something an experienced software engineer would be challenged by, and a C++ hack would do a really, really awful job at… simply because most physicists who are C/C++ hackers “on the side” write really foul, ugly, unmaintainable code. (No, RB, I’ve not yet read your code.)

But this is where the CS discipline is going: domain experts learning to hack, instead of expert designers and developers working with domain experts to capture their knowledge in code. A dangerous, error-prone future lies ahead of us. Remind me not to trust the quant software from that company.

Reduce the risk, hire from open source:

In the wake of open source, traditional hiring practices seem like an unnecessarily risky way to hire new employees. Especially for small teams where each hire can make it or break it. Why bet the composure of your collective on abstract indicators, hearsay, and a biased bio?

Read the rest of the post; it’s interesting.