Via BoingBoing:

Spanish copyright society hounds Uni teacher out of job:

Cory Doctorow:

I just got an email from my friend Jorge Cortell, a copyfighter and academic in Spain, whom I met at the Creative Commons España launch this year.

Jorge teaches “Intellectual Property” in the Master’s program at the Polytechnic University of Valencia (UPV). He proposed to give a talk on the benefits of P2P and on the law relating to P2P and copyright in Spain. He proposed to demo the sorts of legal uses one could make of copyrighted works from P2P networks, and informed the Spanish collecting society, the national police and the attorney general to let them know what he was up to.

They responded by leaning on the Dean, who cancelled Jorge’s venue. Jorge booked another venue, and the Dean cancelled it. So Jorge moved his talk to the cafeteria, and delivered a five hour session to a packed house.

On May 4, the Dean ordered the director of Jorge’s program to demand his resignation, which he tendered. The Vice-Dean then added insult to injury by issuing a statement saying that Jorge had never taught at the university (!), in a surreal, Stalinist purge (Jorge has taught at the University for five years).

This is a shameful act of censorship and a betrayal of the principles of academic freedom. It’s a national shame that Spain’s powerful collecting societies can simply order the termination of any university teacher who teaches things that displease them.

Stuart Kent posted some of his reflections on the software engineering process, now that he’s fully on the other side of the fence—practicing what he used to teach.

Reflections on the spec process:

Back in the days when I was an academic and researcher, I used to teach Software Engineering…

I have been collecting data regarding novice programmers here at Kent for two years; quite a bit of interesting data, if I may say so. Now, I’ve been analyzing this data for some time using exploratory techniques, relying on the fact that simple analyses have, by and large, had a lot to say.

As I look towards reporting some of this data, I feel I should do more than just say “Hey, look at that histogram!” Instead, I feel I should say “Hey, look at that histogram! That distribution of data seems to be most closely related to x, which might mean y in this context!”

Well, I feel I should say something. So, consider this first piece of data. In the Fall of 2003, I studied 62 students while they were programming in class. A simple question would be to ask “How many times did they press the compile button?” Figure 1 shows the answer.

Figure 1: Total number of compilation events per student, Fall 2003

As raw data, this is:

> comp
 [1]  55  58  69  40  68  30  27  71  64  17  90 107  52  61   8   7
[17]  38  18  32  55  15  50  54  35  90  61  17 196  52  38  46  23
[33]  15  64  19  40  50 154  37  55  58  14 114  81  25  57  29 180
[49]  60  28 112  35 149  36  49  19  30  44  15  34  60  35

Neither of these views is particularly meaningful; a histogram, however, makes more sense for univariate data of this sort (Figure 2).


Figure 2: The distribution of the data in Figure 1.

When viewed this way, things make a bit more sense; we can see that the majority of the subjects from the first semester only generated around 30-70 compilation events, total. This means that, for some analyses, I will be most interested in a handful of students, and not the entire population.

However, I’m currently interested in this distribution; I don’t think it is normal. From what I’ve been able to discover (but have had a hard time learning how to characterize), this looks like it is in the Weibull family—a distribution most often encountered in failure rates (MTBF) and in some branches of medicine.

Here’s what I don’t like. If I ask R to fit this distribution for me, this is what I get:

> fitdistr(comp,"weibull")
     shape        scale
   1.5076407   60.1879954
 ( 0.1395277) ( 5.3731381)

If these are the shape and scale parameters (alpha and beta) of a Weibull distribution, they sure don’t look like they fit my distribution at all (Figure 3). In fact, I seriously doubt the scale parameter, as seen below.


Figure 3: The Weibull distribution with alpha = 1.507 and beta = 60.187
Generated with ‘plot(function(x) dweibull(x, shape=1.507, scale=60.187))’

However, before I started using my head, I had already written a Scheme routine that placed my data into bins of size twenty. This gives me a dataset that is distributed similarly, but is ready for counting and plotting as a histogram. (Specifically, if you apply a transformation like (map (lambda (x) (add1 (quotient x 20))) comp) over the above array, you get my “binned” data. In R, this would be expressed as (comp %/% 20) + 1, since %/% is already vectorized.)

> binned
 [1]  3  3  4  3  4  2  2  4  4  1  5  6  3  4  1  1  2  1  2  3  1  3
[23]  3  2  5  4  1 10  3  2  3  2  1  4  1  3  3  8  2  3  3  1  6  5
[45]  2  3  2 10  4  2  6  2  8  2  3  1  2  3  1  2  4  2
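For readers without Scheme or R handy, the same width-20 binning can be sketched in Python, where integer division (`//`) plays the role of Scheme’s `quotient` and R’s `%/%`. The helper name `bin_index` and the text histogram at the end are illustrative assumptions, not the author’s script; the sketch simply reproduces the `binned` vector above from `comp`.

```python
from collections import Counter

# Compilation events per student, Fall 2003 (the `comp` vector above).
comp = [55, 58, 69, 40, 68, 30, 27, 71, 64, 17, 90, 107, 52, 61, 8, 7,
        38, 18, 32, 55, 15, 50, 54, 35, 90, 61, 17, 196, 52, 38, 46, 23,
        15, 64, 19, 40, 50, 154, 37, 55, 58, 14, 114, 81, 25, 57, 29, 180,
        60, 28, 112, 35, 149, 36, 49, 19, 30, 44, 15, 34, 60, 35]


def bin_index(x, width=20):
    """1-based bin number: 0-19 -> 1, 20-39 -> 2, and so on."""
    return x // width + 1


binned = [bin_index(x) for x in comp]

# Tally students per bin and print a crude sideways histogram.
counts = Counter(binned)
for b in range(1, max(binned) + 1):
    lo, hi = (b - 1) * 20, b * 20 - 1
    print(f"{lo:3d}-{hi:3d} | {'#' * counts[b]}")
```

Using the remainder instead of the quotient would scatter every value across bins 1 to 20 and destroy the shape of the distribution, which is why the integer quotient is the right operation here.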

If I plot this data directly, I get back essentially the same histogram as the one above (I’d have to tweak the size of the breaks in that histogram, but it is effectively the same distribution). What bothers me, though, is this:

> fitdistr(binned,"weibull")
     shape       scale
  1.6953604   3.5683620
 (0.1563823) (0.2835319)

If I plot a Weibull with these parameters against my original histogram, I get Figure 4. So, when I ask ‘fitdistr’ to estimate parameters on the raw data, I get a scale parameter that is completely whacked (at least, I think it is). If I ask ‘fitdistr’ to fit the data that I have already broken into bins, it gives me back a sensible set of parameters.


Figure 4: The original data against a Weibull distribution with parameters derived from… a derived dataset.

Now, this whole exploration is just that: an exploration. I am somewhat lost in the wild here, as I only have one text that deals with these kinds of distributions (“Reliability Modelling” by Linda Wolstenholme), and it isn’t written for statisticians at my level of competency. Yesterday, I ordered “Introductory Statistics with R” by Peter Dalgaard, which I probably should have ordered long ago… hopefully, it will lend some insight as well.

My point? I’m using an expert tool while wielding inexpert knowledge. (And I know that a lot of questions to the R-help list fall into this category.) Between the two, I can’t answer the following questions, and would appreciate any insights anyone might have. (I’ve posted these questions, in particular, to the R-help mailing list.)

  1. From my reading of the various PDFs, on-line documentation, and mailing-list archives, my approach seems sane… except for the (to me) inconsistencies in parameter reporting of these two distributions. Does this look reasonable to you so far, or would you have done something differently?
  2. Is it appropriate to apply the transformation that I did to my data? Or is the data significantly ‘different’ once I have applied it? Is an invalid transformation the source of my problem?
  3. If ‘fitdistr’ is an appropriate tool to use in this case, why does it seem to return unreasonable parameters on my raw data, but reasonable parameters on my ‘processed’ data? That is my interpretation anyway; it certainly looks like something is wrong with the parameters returned by ‘fitdistr’ on my raw data.
  4. This is a more general question, and one which I don’t have a good sense for: how do I report my use of R in my writing? Is it appropriate to include the code I used to generate my plots and results as part of an appendix? What is considered “due diligence” in this situation? Am I expected to provide proof of the appropriateness of the techniques I applied, or just reasonable argument citing relevant literature?

I think I can answer the last question for myself; however, the others are puzzling me a great deal.

The “comment” link for this weblog is email sent to my gmail account, ‘jadudm at gmail’. Any insights would be appreciated.

Update: A little bit of gardening later
From Andy at 3M:

Matt,

There is nothing wrong here.  I reran your example and got the same
parameters.  Your only "problem" is that you let the density plot use the
default limits on x, which are not reasonable here since your data extends
to over 200.  Try this:

hist(dd, freq=FALSE)  #dd is your original data
xx <- seq(0, 300, by=.1)
yy <- dweibull(xx, shape=1.5079, scale=60.2139)
lines(xx, yy, col=2)

BTW, when you binned your data you also scaled it down by 20, so your scale
parameter changed accordingly.

Cheers,

Andy

Many thanks! This is the danger in using expert tools, but the benefit of a helpful community. Which reminds me—I should make a point of being a better Scheme citizen, and participate more on the lists I’m on.
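Andy’s remark about scale can be checked directly. The sketch below is a hypothetical, stdlib-only Weibull maximum-likelihood fit in Python (not `fitdistr` itself, which maximizes the likelihood numerically in R); the function name `weibull_mle` and its bisection bounds are assumptions. It fits the raw counts and the same counts divided by 20.

```python
import math

# Compilation events per student, Fall 2003 (the `comp` vector from the post).
comp = [55, 58, 69, 40, 68, 30, 27, 71, 64, 17, 90, 107, 52, 61, 8, 7,
        38, 18, 32, 55, 15, 50, 54, 35, 90, 61, 17, 196, 52, 38, 46, 23,
        15, 64, 19, 40, 50, 154, 37, 55, 58, 14, 114, 81, 25, 57, 29, 180,
        60, 28, 112, 35, 149, 36, 49, 19, 30, 44, 15, 34, 60, 35]


def weibull_mle(xs, lo=1e-3, hi=50.0, iters=200):
    """Maximum-likelihood Weibull (shape, scale) for strictly positive data.

    The shape k solves the profile-likelihood equation
        sum(x^k ln x) / sum(x^k) - 1/k - mean(ln x) = 0,
    whose left side increases with k, so bisection finds the unique root.
    The scale is then mean(x^k) ** (1/k).
    """
    m = max(xs)
    ys = [x / m for x in xs]          # rescaled so y**k cannot overflow
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(xs)

    def profile(k):
        w = [y ** k for y in ys]      # proportional to x**k
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1 / k - mean_log

    for _ in range(iters):
        mid = (lo + hi) / 2
        if profile(mid) < 0:
            lo = mid                  # root lies above mid
        else:
            hi = mid
    k = (lo + hi) / 2
    scale = m * (sum(y ** k for y in ys) / len(ys)) ** (1 / k)
    return k, scale


k_raw, scale_raw = weibull_mle(comp)
k_div, scale_div = weibull_mle([x / 20 for x in comp])
print(f"raw data:  shape={k_raw:.4f}  scale={scale_raw:.3f}")
print(f"data / 20: shape={k_div:.4f}  scale={scale_div:.3f}")
```

Dividing by 20 leaves the shape unchanged and divides the scale by exactly 20. The binned fit (shape 1.70, scale 3.57) differs a bit further because the binning also floors each value and adds one, which shifts the data up by roughly half a bin on average.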

Update: A little bit more gardening later
From Gabor at gmail:

Note that the gamma distribution seems to work well here.  After running
Andy's code run this to see both superimposed on the same graph:

gparms <- fitdistr(dd, "gamma")[[1]]
yyg <- dgamma(xx, gparms[1], gparms[2])
lines(xx, yyg, col=3)

Well! If I plot just the results of what Andy sent along, I get Figure 5.


Figure 5: The result of Andy’s comments. What he said seems obvious,
in hindsight, but I wouldn’t have realized it.
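Andy’s point about plotting limits is easy to check numerically. Called on a function with no `from` or `to`, R’s `plot` draws it over x from 0 to 1, so Figure 3 showed only the first sliver of a curve whose peak sits near x ≈ 29. The Python sketch below uses a hand-written `dweibull` (mirroring R’s shape/scale parameterization, not R itself) to compare the largest density visible on [0, 1] with the largest on [0, 300]; the helper `peak_density` is a hypothetical name.

```python
import math


def dweibull(x, shape, scale):
    """Weibull density for x >= 0, same parameterization as R's dweibull."""
    if x < 0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


SHAPE, SCALE = 1.507, 60.187   # the parameters plotted in Figure 3


def peak_density(lo, hi, step):
    """Largest density value on an evenly spaced grid from lo to hi."""
    n = int(round((hi - lo) / step))
    return max(dweibull(lo + i * step, SHAPE, SCALE) for i in range(n + 1))


mode = SCALE * ((SHAPE - 1) / SHAPE) ** (1 / SHAPE)   # peak location when shape > 1
print(f"mode at x = {mode:.1f}")
print(f"max density on [0, 1]:   {peak_density(0, 1, 0.001):.4f}")
print(f"max density on [0, 300]: {peak_density(0, 300, 0.1):.4f}")
```

On [0, 1] the curve climbs to only about a quarter of its true height, which is why the fitted parameters looked so wrong in the default plot window.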

Now, if we run Gabor’s code as well, we get Figure 6, and true enough, the gamma distribution does fit the data better.


Figure 6: The Weibull distribution (red, lower peak) and
gamma distribution (green, higher peak) overlaid on the same distribution.
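Gabor’s visual comparison can be complemented with a number. Both fitted models have two parameters, so their maximized log-likelihoods can be compared directly (higher is better) as a rough check alongside Figure 6. The sketch below is an assumption-laden stand-in for `fitdistr(dd, "gamma")`: a stdlib-only Python gamma maximum-likelihood fit (shape by bisection on ln(a) - digamma(a) = ln(mean) - mean(ln x), rate = shape / mean) with a hand-rolled digamma. The Weibull parameters are the ones `fitdistr` reported earlier in the post; all function names are hypothetical.

```python
import math

comp = [55, 58, 69, 40, 68, 30, 27, 71, 64, 17, 90, 107, 52, 61, 8, 7,
        38, 18, 32, 55, 15, 50, 54, 35, 90, 61, 17, 196, 52, 38, 46, 23,
        15, 64, 19, 40, 50, 154, 37, 55, 58, 14, 114, 81, 25, 57, 29, 180,
        60, 28, 112, 35, 149, 36, 49, 19, 30, 44, 15, 34, 60, 35]


def digamma(x):
    """psi(x) for x > 0: shift up using psi(x) = psi(x + 1) - 1/x,
    then apply the asymptotic series."""
    result = 0.0
    while x < 6:
        result -= 1 / x
        x += 1
    inv2 = 1 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252)))


def gamma_mle(xs, lo=1e-3, hi=1e3, iters=200):
    """ML gamma (shape, rate); shape solves ln(a) - psi(a) = ln(mean) - mean(ln x)."""
    n = len(xs)
    mean = sum(xs) / n
    target = math.log(mean) - sum(math.log(x) for x in xs) / n
    for _ in range(iters):
        mid = (lo + hi) / 2
        if math.log(mid) - digamma(mid) > target:
            lo = mid              # ln(a) - psi(a) decreases in a: root is above mid
        else:
            hi = mid
    shape = (lo + hi) / 2
    return shape, shape / mean    # rate = shape / mean at the optimum


def gamma_loglik(xs, shape, rate):
    return sum(shape * math.log(rate) - math.lgamma(shape)
               + (shape - 1) * math.log(x) - rate * x for x in xs)


def weibull_loglik(xs, shape, scale):
    return sum(math.log(shape / scale) + (shape - 1) * math.log(x / scale)
               - (x / scale) ** shape for x in xs)


g_shape, g_rate = gamma_mle(comp)
print(f"gamma:   shape={g_shape:.3f} rate={g_rate:.4f} "
      f"logLik={gamma_loglik(comp, g_shape, g_rate):.2f}")
print(f"weibull: shape=1.508 scale=60.188 "
      f"logLik={weibull_loglik(comp, 1.5076407, 60.1879954):.2f}")
```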

Mathworld has this to say about the gamma distribution (note also the ESH):

A gamma distribution is a general type of statistical distribution that is related to the beta distribution and arises naturally in processes for which the waiting times between Poisson distributed events are relevant.

This is good, because it opened my eyes a bit to other possibilities, and I must admit it is interesting. The sense-making is hard, though, and it’s where I’m most worried about my reporting: what if I’m wrong? Or, more precisely, what if I draw conclusions that are not the best possible?

Eh. Just keep writing.

Thanks again to Andy and Gabor!


I’m slowly getting my writing space in order; after two weeks of upper back pain, I found a better chair, and propped my laptop up on (no, Noel, not a couple of books) an 8x CD-ROM drive. The drive still has its foamy protection bits on it, which raise the laptop roughly 7 inches off the desk, so my PowerBook sits at a good viewing height.

I have a borrowed Microsoft keyboard, which isn’t so great, but I realized that 9/10ths of my problem was posture. So, I can live with the keyboard until I get something with scissor-action keys.

Now, I’m discovering that after plugging in the keyboard and my external drive, I have no USB ports left. So, the Palm and Shuffle don’t get charged. This is a bit annoying, as… well, they don’t get charged.

I can find the Belkin powered 7-port for £29 in the UK, delivered. The nice thing is that it has the two ports on top, making it easy to drop the Shuffle into. Is there anything better out there that can act as a docking station for an external hard drive, keyboard, and host of other little peripherals? I’ll have to poke around a bit more, but I’m thinking something like this might be a good choice.

If you lived in the UK right now, the new Honda Element would cost you around £9000 to purchase (assuming you could get the US price for it). Of course, you would actually have to purchase it from an importer, so it would be marked up by a factor of 2, meaning the vehicle would cost £18,000, or $33,100.

The 15.9 (US) gallon tank in the Honda Element is around 13 imperial gallons, or approximately 60 litres. Since one US gallon is 3.785 litres, and it gets roughly 25 miles to the gallon, it must get around 25 miles to 3.785 litres, or 6.6 miles to the litre.

A litre of petrol (gas) in the UK right now costs around 89p, or £0.89. One British pound converts to 1.83985 dollars at the moment (wholesale, not with transaction and conversion fees). So, one litre of fuel costs $1.64. Put another way, you can travel about six and a half miles in Britain for $1.64 in a Honda Element.

A full tank on the Honda Element costs $98, in the UK.

If you think the USA won’t reach these kinds of prices, just wait. Current US prices are around $2/US gallon, or $0.53/litre, or about 29p/litre. But I’m confident that the USA will get there: petrol prices will eventually converge with the rest of the world’s, and if the war and the systematic gutting of Social Security don’t work wonders on the US economy, then realistic fuel prices will. That 20-mile commute to work will suddenly cost about $5, each way.
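The per-litre conversion and the commute figure can be reproduced with a few lines of Python; the function name `trip_cost` is hypothetical, and the constants are the prices and exchange rate quoted in the post. With the unrounded economy of about 6.6 miles per litre, a 20-mile trip comes to just under $5; rounding the economy down to six miles per litre gives about $5.46.

```python
LITRES_PER_US_GALLON = 3.785
USD_PER_GBP = 1.83985                  # exchange rate quoted in the post
UK_USD_PER_LITRE = 0.89 * USD_PER_GBP  # 89p per litre, in dollars

# US price of $2 per US gallon, expressed per litre in dollars and pence.
us_usd_per_litre = 2.00 / LITRES_PER_US_GALLON
us_pence_per_litre = 100 * us_usd_per_litre / USD_PER_GBP


def trip_cost(miles, miles_per_litre, usd_per_litre=UK_USD_PER_LITRE):
    """Fuel cost in dollars for one trip at UK petrol prices."""
    return miles / miles_per_litre * usd_per_litre


element_mpl = 25 / LITRES_PER_US_GALLON   # about 6.6 miles per litre
print(f"US petrol: ${us_usd_per_litre:.3f}/litre = {us_pence_per_litre:.1f}p/litre")
print(f"20 miles at {element_mpl:.1f} mi/litre: ${trip_cost(20, element_mpl):.2f}")
print(f"20 miles at 6.0 mi/litre: ${trip_cost(20, 6.0):.2f}")
```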

And then the US will wish the government had invested in public transport instead of fuel subsidies.

Note: You can simulate this cost, right now, in the USA, by driving a Hummer H2. It has a 32 gallon tank ($198 to fill in the UK), and gets 9.6 miles to the gallon, or 2.5 miles to the litre. For $1.64, you can drive 2.5 miles in the UK in a Hummer H2.
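The tank and fill-up arithmetic for both vehicles can be checked with a small Python helper. `uk_running_costs` is a hypothetical name; the constants are the figures used in the post (3.785 litres per US gallon, 89p per litre, $1.83985 per pound) plus the standard 4.546 litres per imperial gallon. It reproduces the roughly $98 Element fill-up (about $98.55 unrounded) and the $198 Hummer fill-up.

```python
LITRES_PER_US_GALLON = 3.785
LITRES_PER_IMPERIAL_GALLON = 4.546
USD_PER_LITRE = 0.89 * 1.83985    # 89p per litre at $1.83985 to the pound


def uk_running_costs(tank_us_gallons, mpg_us):
    """Tank size, economy, and fill-up cost at the UK prices quoted in the post."""
    tank_litres = tank_us_gallons * LITRES_PER_US_GALLON
    return {
        "tank_litres": tank_litres,
        "tank_imperial_gallons": tank_litres / LITRES_PER_IMPERIAL_GALLON,
        "miles_per_litre": mpg_us / LITRES_PER_US_GALLON,
        "fill_up_usd": tank_litres * USD_PER_LITRE,
    }


for name, tank, mpg in [("Honda Element", 15.9, 25), ("Hummer H2", 32, 9.6)]:
    c = uk_running_costs(tank, mpg)
    print(f"{name}: {c['tank_litres']:.1f} L "
          f"({c['tank_imperial_gallons']:.1f} imp gal), "
          f"{c['miles_per_litre']:.2f} mi/L, fill-up ${c['fill_up_usd']:.2f}")
```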

I can walk 2.5 miles in around 30 minutes.

My quest for the perfect photobook app might be over.

Plasq produces Comic Life, which seems to be marketed as an app for creating a comic book of yourself from digital photos (through iPhoto) or your iSight camera. In truth, it seems to do exactly what I want. That is:

  • It has an integrated thumbnail browser, so I can quickly skim through my iPhotos, selecting the entire library or specific albums
  • I can drag a template onto a page, and then drag photos into the template.
  • I can resize and rotate the image within the template, filling my “comic” frame.
  • I can easily add comments (in nifty comic bubbles).
  • It is trivial to export to a number of formats, most importantly to print to PDF (which means it’s perfect for uploading to Lulu.com).

The app does exactly what it needs to do, and no more. I hope they get some good press, a lot of sales, and keep their software simple and effective. I’ll play with it more when I have some more time, but it seems like the ideal way to assemble a photobook. The only question I have at this point is how I create my own templates and save them as part of the template library. That is the only missing element, and that may just be because I haven’t figured out how to do it yet.

Via BoingBoing, I found this wonderful quote from John Scalzi, science fiction author:

Let’s ask: Who are pirates? They are people who won’t pay for things (i.e., dickheads), or they’re people who can’t pay for things (i.e., cash-strapped college students and others). The dickheads have ever been with us; they wouldn’t pay even if they had the money. I don’t worry about them, I just hope they fall down an abandoned well, break their legs and die of gangrene after several excruciatingly painful days of misery and dehydration, and then I hope the rats chew the marrow from their bones and shit back down the hollows. And that’s that for them.

As for the people who can’t pay for things, well, look. I grew up poor and made music tapes off the radio; my entire music collection from ages 11 to 14 consisted of tapes that had songs missing their first ten seconds and whose final ten seconds had DJ chatter on them; from 14 to 18, I taped off my friends; from 18 to 22 I reviewed music so I could get it for free. And then after that, once I had money, I bought my music. Because I could. As for books, I bought secondhand paperbacks through my teen and college years. Now I buy hardbacks. Again, because I can. Now, being a writer, you can argue that I’m more self-interested in paying for creative work than others, but I have to honestly say that I don’t know anyone who can pay for a book or a CD or a DVD or whatever who doesn’t, far more often than not.

I don’t see the people who can’t pay as pirates. I see them as people who will pay, once they can. Until then, I think of it as I’m floating them a loan. Nor is it an entirely selfless act. I’m cultivating a reader — someone who thinks of books as a legitimate form of entertainment — and since I want to be a writer until I croak, that’s a good investment for me. More specifically, I’m cultivating a reader of me, someone who will at some point in the future see a book of mine on the shelf, go “Scalzi! I love that dude!” and then take the book off the shelf and take it to the register.

Why did I like it so much? Because it’s true. In particular, I think about how much software I buy now that I’m on my own (and significantly in debt). I have no problem paying for good software. Perhaps it is because I know what it takes to make software come into being. Perhaps I’m just more responsible than I used to be. But it makes me feel good to know that I’ve helped support someone who, by writing a good piece of software, has made my life… easier, in a way.

Ecto. OmniGraffle Pro. OmniOutliner Pro. PDFPen (I use this to annotate all the electronic versions of the articles I make use of in my research—invaluable). Snapz Pro. Salling Clicker. iLife ’04. Someday, OS X 10.4.

Yes, I am glad people write free software. But I’m also glad to support people who write good tools that I can use to do cool stuff.

Setting: UK
Characters: US hero, international girl, international flatmates
Premise:

Our US hero, having broken up with (well, been dumped by) his girlfriend from the States, is clearly having no luck navigating the romantic waters of the UK. One of his housemates, who is always on the phone, has one friend in particular whom our hero gets to know… but only because he’s there to answer the phone. They develop their own friendship through these short conversations, giving us glimpses into their respective characters and pasts. When this ‘friend of a friend’ comes to England to help the flatmate move, sparks fly, but it seems impossible they’ll ever be together. But he changes his travel plans, risks it all, and they clearly live happily ever after. Or something.

I haven’t quite worked out the ending, but it’s clearly insipid and oh-so-chickflicky! I’m quitting my PhD right now, and going into script writing. :)

(I am not enjoying the reworking of the lit review.)

So, I wrote two little Scheme scripts this evening: ‘fsplit’, and ‘fjoin’. The first splits a file into pieces of a given size, and the latter rejoins those pieces.

Then, I figured out how to use ‘split’ from the command line. A common UNIX tool, split lets you chop a file into pieces of whatever size you choose. For example, I have an 8GB tarball I want to burn to DVD. I issued the command

split -d -a 4 -b 4500000000 FirstRun.tar split-FirstRun.

which says “Split the file ‘FirstRun.tar’ into 4.5GB chunks. When you output the chunks, name them ‘split-FirstRun.’ followed by a four-digit number that increases for each new chunk.”

Now, I can burn those chunks to DVDs. To rejoin them, all I have to do is

cat split-FirstRun.* > FirstRun.tar

(after copying them back off the DVDs, of course).
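The Scheme ‘fsplit’ and ‘fjoin’ scripts aren’t shown, so as a rough stand-in here is a hypothetical Python sketch of the same idea that mimics the `split -d -a 4` naming. The fixed-width, zero-padded suffixes are what make the `cat split-FirstRun.*` trick safe: the shell expands the glob in alphabetical order, which then matches chunk order. The function names, the 1 MiB copy buffer, and the small demo file are all assumptions.

```python
import os
import shutil
import tempfile

BUF = 1 << 20   # copy in 1 MiB pieces so a 4.5GB chunk never sits in memory


def fsplit(path, chunk_bytes, prefix, digits=4):
    """Write `path` as prefix0000, prefix0001, ... of at most chunk_bytes each."""
    parts = []
    with open(path, "rb") as src:
        while True:
            first = src.read(min(BUF, chunk_bytes))
            if not first:
                break                                  # end of input
            part = f"{prefix}{len(parts):0{digits}d}"
            with open(part, "wb") as dst:
                dst.write(first)
                remaining = chunk_bytes - len(first)
                while remaining > 0:
                    data = src.read(min(BUF, remaining))
                    if not data:
                        break
                    dst.write(data)
                    remaining -= len(data)
            parts.append(part)
    return parts


def fjoin(parts, out_path):
    """Concatenate chunks in sorted name order, like `cat prefix* > out`."""
    with open(out_path, "wb") as dst:
        for part in sorted(parts):
            with open(part, "rb") as src:
                shutil.copyfileobj(src, dst)


with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "FirstRun.tar")
    with open(original, "wb") as f:
        f.write(os.urandom(10_000))          # small stand-in for the real tarball
    parts = fsplit(original, 4_500, os.path.join(d, "split-FirstRun."))
    rejoined = os.path.join(d, "FirstRun-rejoined.tar")
    fjoin(parts, rejoined)
    names = [os.path.basename(p) for p in parts]
    with open(original, "rb") as a, open(rejoined, "rb") as b:
        identical = a.read() == b.read()
print(names, identical)
```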

I was looking for a solution to this earlier today, but almost everything seemed to rely on some kind of “proprietary” solution. I was leery of leaving my data in the hands of a 3rd party app, but GNU command-line utilities like ‘split’ and ‘cat’ aren’t going anywhere anytime soon.

I wrote a little Scheme script to render out images for every minute of the day. Then, I set up a cron job to run some AppleScript to change my desktop picture… every minute of the day.

The result?


The two circles fill as the hours (left) and minutes (right) increase; the hour circle transitions from blue to red, and the minute circle from red to blue. The time is rendered in the lower left-hand corner.
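The Scheme rendering script isn’t shown, so this is only a guess at its bookkeeping, sketched in Python: for each of the 1,440 minutes in a day it computes a fill fraction and a blended color for each circle. The linear RGB blend, the `HH-MM` frame names, and dividing by 23 and 59 (so the last minute of the day shows full circles) are all assumptions; the actual drawing and the cron/AppleScript step are left out.

```python
BLUE = (0, 0, 255)
RED = (255, 0, 0)


def lerp_color(a, b, t):
    """Linear blend between RGB colors a and b, for t in [0, 1]."""
    return tuple(round(ca + (cb - ca) * t) for ca, cb in zip(a, b))


def frame_spec(hour, minute):
    """Fill fraction and color for the hour (left) and minute (right) circles."""
    h_frac = hour / 23          # assumption: 0:xx empty, 23:xx full
    m_frac = minute / 59        # assumption: :00 empty, :59 full
    return {
        "name": f"{hour:02d}-{minute:02d}",           # hypothetical file name
        "hours": (h_frac, lerp_color(BLUE, RED, h_frac)),
        "minutes": (m_frac, lerp_color(RED, BLUE, m_frac)),
        "label": f"{hour:02d}:{minute:02d}",          # text for the lower-left corner
    }


frames = [frame_spec(h, m) for h in range(24) for m in range(60)]
print(len(frames), frames[0]["hours"], frames[-1]["minutes"])
```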

A quick tap on the F11 key slides everything out of the way and lets me glance at the desktop. Yes, I know I could just leave the clock in the menu bar, but I wanted to free up some space for other things.

I thought it was cool, anyway. :)