Open scientific code

Today, scientists who write and release code often get little recognition for their work. Someone who has created a terrific open source software program that’s used by thousands of other scientists is likely to get little credit from peers. “It’s just software” is the response many scientists have to such work. From a career point of view, the author of the code would have been better off spending their time writing a few minor papers that no one reads. This is crazy: a lot of scientific knowledge is far better expressed as code than in the form of a scientific paper.

Reinventing Discovery: The New Era of Networked Science
by Michael Nielsen

Obfuscated code (again)

Perl is notorious for its obscure syntax. Many people actually like Perl because of its syntax! In fact, it is possible to write very complex code in Perl with just a few characters, and at the same time, it is possible to write very simple programs in very obscure ways. Of course, there are many more people that hate Perl for the very same reason!

So, it should be no surprise to anyone that somebody came up with a library module for generating strange programs.
I will just show you one simple example of what it is possible to do with it: the following Perl program was generated by Daniele, who also informed me of the existence of EyeDrops.

                       ''=~+(
                    '('.'?'.'{'
                  .('`'|'%').("\["^
        ((   '-'))).('`'|'!').("\`"|
      ',').'"'.('['^'+').('['^')').(
  '`'|')').('`'|'.').('['^'/').('{'
  ^'[').'\\'.'"'.('['^'-').('`'|
 ')').('['^'(').('`'|')').('['
 ^'/').('{'^'[').('`'|'(').('['
^'/').('['^'/').('['^'+')."\:".
'/'.'/'.('`'|'!').('`'|"\,").(
    '`'|"'"    ).('`'|('/')).(
     '`'|        ',').('`'|'!')
     .(           '`'|'.').("\`"|
                  '$').'.'.('['^','
                   ).('`'|'/').('['^
                   ')').('`'|('$')).(
              (     '[')^'+').('['^')'
           ).+(       '`'|'%').('['^'('
          ).''.         ('['^'(').('.').(
          "\`"|           '#').('`'|"\/").(
          "\`"|             '-').'/'.'\\'.'"'.';'
           .+(                '!'^'+').'"'."\}".
                                ')');$:='.'^'~';$~
             =(                     '@')|'(';$^="\)"^
        "\[";$/=                      '`'|'.';$,='('^'}'
        ;$\='`'|                        '!';$:=')'^"\}";$~=
         '*'|'`'                          ;$^='+'^'_'  ;($/)
         =('&')|                            "\@";$,=      '['
         &'~';$\                               =','        ^+
         '|';$:=                                '.'
         ^'~';$~                                ='@'|
         '('; $^                                 =')'^
          ((                                     '['))
                                                ;$/=
                                               '`'|
                               ( ((          '.'))
                              );$,='('^'}';$\= ((
                              '`'))|('!');$:=
                                 ')'^'}';$~=
                                    '*'|'`';
                                       ($^)=
                                         '+'

Save it as italy.pl, and then run it as perl italy.pl. Nice, uh? (Thanks a lot, Daniele!)
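If you wonder how a program like this can work at all: in Perl, the | and ^ operators applied to two strings combine them bit by bit, so each parenthesised pair in the listing is just a letter in disguise. The sketch below redoes that arithmetic for the first four pairs of the listing in Python. The choice of Python and the helper names bit_or and bit_xor are mine, purely for illustration; they are not part of EyeDrops.

```python
def bit_or(a: str, b: str) -> str:
    """Bitwise OR of two single characters, like Perl's 'a'|'b' on strings."""
    return chr(ord(a) | ord(b))


def bit_xor(a: str, b: str) -> str:
    """Bitwise XOR of two single characters, like Perl's 'a'^'b' on strings."""
    return chr(ord(a) ^ ord(b))


# The first four pairs from the listing: ('`'|'%'), ("\["^'-'), ('`'|'!'), ("\`"|',')
word = bit_or("`", "%") + bit_xor("[", "-") + bit_or("`", "!") + bit_or("`", ",")
print(word)
```

The remaining pairs can be decoded in the same way, one character at a time.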

RTSched: my first LaTeX package

This is a short post, just to let you know that my rtsched LaTeX package is now on CTAN.

If you are a real-time researcher, student or professor, you may appreciate it. It allows you to draw things like this:

with a few lines of LaTeX code. For example, the above diagram has been produced by the following LaTeX text:

\begin{RTGrid}{2}{20}
  \multido{\n=0+4}{5}{
    \TaskArrDead{1}{\n}{4}
    \TaskExecDelta{1}{\n}{1}}
  \multido{\n=0+6}{3}{
    \TaskArrDead{2}{\n}{6}}
  \TaskRespTime{2}{0}{4}  % draws the hatched rectangle in [0,4]
  \TaskExecution{2}{1}{4} % draws execution (over the previous rectangle)
  \TaskRespTime{2}{6}{4}  % draws the hatched rectangle in [6,10]
  \TaskExecution{2}{6}{8} % draws execution
  \TaskExecution{2}{9}{10} % draws execution
  \TaskRespTime{2}{12}{4} % draws the hatched rectangle in [12,16]
  \TaskExecution{2}{13}{16} % draws execution
\end{RTGrid}

Of course, you will find a list of examples and a little documentation coming along with the package. rtsched can also be used together with Beamer for making slides.

The package should already be available on MiKTeX. Have fun!

Is scientific publishing about to change?

Scientific publishers are not very popular today. Everybody on the web is now talking about how wrong the current system is: journals cost way too much, the number of scientific journals keeps increasing, and the quality of publications keeps decreasing.

One interesting contribution that I read recently is by Tim Gowers on his blog: How might we get to a new model of mathematical publishing? I think his ideas are quite interesting, although futuristic. Maybe we could get there step by step, but I have to admit that I would like very much to see a system like the one described in his post in the not-so-distant future.

One note: everybody is talking about the need to change the current system, but it looks like the scientific community is not taking any action. Except Princeton! Have a look at their policy about Open Access.

I think that, politically, this may be the right way to go: large academic institutions putting pressure on the publishing system. Will it work?

(Hat tip: .mau.)

How a journal should be

Recently, thanks to my friend Peppe Liberti, I read this interesting paper by Jason Priem and Bradley M. Hemminger, two researchers at UNC, about the “decoupled journal”. The paper is interesting because it is about a topic I have been thinking about lately myself. Now I want to share my thoughts with a larger audience.

There is something in the scientific publication system that is just not going in the right direction. Most researchers are focusing their attention on the peer-review model for selecting papers. True, peer-review is less than perfect, and maybe there is a better method to select good papers from bad papers. However, I agree with Priem and Hemminger that this is not the real problem. The real problem is with the system of journals for disseminating scientific knowledge.

The problem is that journals cost too much; many researchers, especially from poor countries, cannot afford to pay a huge amount of money for subscriptions to journals. For my university, we are talking about roughly 1 million euros for subscribing to the most relevant journals in all disciplines, and even then, for example, they subscribed only to IEEE and not to ACM. That is a lot of money. Moreover, closed access to publications reduces the visibility of a paper; therefore, most researchers now just put the pdf of their papers on their web site for free download, bypassing or just ignoring the copyright (which has been duly transferred to the editor of the journal).

However, if revenues from subscriptions are completely cancelled by allowing a “free for all” self-publishing rule, who is going to pay the editors? How can the whole process be supported? The whole system is in danger!

The Open Access model

The solution that has been proposed by some editors is “the authors of the paper should pay”. Therefore, with Open Access, the publication process is modified as follows:

  • authors of prospective papers can submit to the Open Access journal freely
  • the paper follows a regular, traditional review process
  • if the paper is accepted, the authors must pay a publication fee, usually proportional to the number of pages
  • once the paper is published, it is made available for free forever on the editor’s web site

I don’t like Open Access journals, because in my opinion there are several problems with this approach. The first one is that the fee is quite high: typically, I have seen approximately 100$ per page, therefore a paper with 10 or 15 pages costs up to about 1500$. The editor I have been in contact with uses a small “page” and a large character size, in order to maximize their income (of course). This cost must be charged to research funds, increasing the overall cost of doing research. So, now researchers from poor countries will be able to read freely, but will have problems publishing: it does not look like a great advancement.

The second problem concerns the goal. These editors are naturally interested in publishing as much as they can, because their revenues are proportional to the number of papers they publish, not to the number of readers. This goes in the direction of encouraging publishing at all costs. Will we have a lot of papers that nobody reads?

As you may know, government agencies that fund universities are now asking for “quality” measures, and most of these are based on counting the number of papers. As an example, the “Legge Gelmini” that reforms Italian universities will require a would-be researcher to have published a certain number of journal papers in order to access the selection process. Therefore, young researchers will be encouraged to publish more and more, just for the sake of passing the limits and gaining access to the profession. And guess what is happening? Editors of Open Access journals are actively pushing young researchers to submit papers, and senior researchers to be guest editors of special issues. I personally receive in my e-mail Inbox an average of one invitation per day to submit to an Open Access journal.

I don’t think all of this goes in the right direction. So, what to do?

My ideal

In my opinion, the old model (the reader pays) is not that bad, and we can still use it. So I came up with the following considerations:

  1. Let’s get rid of paper copies entirely. Every researcher has a printer in his office or in his lab.
  2. Let’s get rid (eventually) of the pdf format. A paper is much more than a sequence of characters; it has links to other papers, data to be analysed and compared and re-used by other researchers, code snippets or simulation code, etc. Let’s make every paper a hypertext, for example using ePub as a common format (or any other standard, it would be just fine).
  3. The main service provided by a journal is to store and catalog papers, make sure they do not change, assign them a unique number (a DOI), and assess their quality with peer review. All of this can be done at a very low cost, if everything is managed electronically. In fact, the hardest work (reviewing) is done for free by the research community!
  4. Therefore, we can reduce the cost a lot. For example, a subscription to a set of journals should cost as little as a few dollars per month. If a service like lastfm costs 2$ per month, an editor can probably offer the basic services for an amount of the same order.

But a modern journal can offer much more!

  1. Publishing of (amended) reviews, to understand how and why the paper was accepted;
  2. The possibility to download additional material (data sets, graphs, code, etc.);
  3. A continuous interaction between the public and the authors, using for example a public comment forum for each paper, where only registered (and paying) users can comment;
  4. A ping-back service, with which authors are notified when someone is citing their work;
  5. RSS feeds of recent papers, Editor’s pick, commentaries, etc.

Basically, the idea is to apply some of the techniques used in social networks (with obvious care). Then, imagination is the only limit.

The low cost of access should make the issue of authors publishing their own work on the web a non-issue: searching on the Internet is not the same as searching on a dedicated web site, which can provide many more services. Eventually, I think many researchers will just subscribe for such a low amount.

A dream?

Of course, there is still a long way to go. The technicalities are not completely solved yet (for example, it is not possible to properly format equations on ePub: the only way to do it is to produce a small image to be embedded in the file). Also, I think many big editors will not easily give up the big money they are making on us.

I am convinced that, eventually, there will be something like this out there. How long will we have to wait?

The genius and the h-index

Last week my friend Sanjoy came to Pisa to visit us and give a three-day seminar. At dinner with a few colleagues, we started discussing academic careers in Italy, and how difficult it is to obtain a position (currently, there is none open). My younger colleagues were discussing “how many papers you need to get a position”, a common “game” among young researchers, and Marco observed that as recently as 10 years ago, the average requirements (and the expectations) were so much lower than today: a couple of journal papers were enough for becoming an assistant professor, 6 journal papers for associate, 12 for full professor. Now, 10 journal papers may not be enough for an assistant position! It seems that people are publishing much more, and much more frequently, and correspondingly the limits are getting higher and higher. I will not get into the discussion of why this is happening and whether it is good or bad (maybe in a future post).

Inevitably, we ended up talking of the Hirsch index (or h-index) for evaluating researcher performance. This index is very popular, although it has received a lot of criticism. The definition is:

A scientist has index h if h of [his/her] Np papers have at least h citations each, and the other (Np − h) papers have at most h citations each.

In practice, you need to count the citations to each one of your papers; then sort the papers in decreasing order of citations; then find the largest position h such that the h-th paper has no fewer than h citations, while the (h+1)-th has fewer than h+1.
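The procedure just described can be sketched in a few lines of Python. This is only a minimal illustration: the function name h_index and the sample citation counts are made up for the example, and real databases may count citations differently.

```python
def h_index(citations):
    """Return the h-index for a list of per-paper citation counts."""
    # Sort the papers in decreasing order of citations.
    ranked = sorted(citations, reverse=True)
    h = 0
    # Walk down the ranking: the paper at position `rank` (1-based)
    # still counts as long as it has at least `rank` citations.
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical researcher with five papers.
sample = [3, 10, 5, 8, 4]
print(h_index(sample))  # 4: four papers have at least 4 citations each
```

Note that the loop stops at the first paper whose citation count is lower than its rank, which is exactly the (h+1)-th paper of the definition.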

The popularity of this index is probably due to the fact that it is easy to calculate and easy to understand: many on-line databases offer a service for calculating it automatically. There are also many critics of the h-index, and I am one of them: it depends on the researcher's age, so it tends to underestimate the performance of young researchers; it tends to overestimate people who publish a lot; it strongly depends on the research area; it also depends on the database [1].

Many other performance indexes have been proposed and many more will be in the future. Why? Why so much effort in trying to measure the performance of academic researchers?

One of the main reasons is exogenous to the academic world. Politicians try to allocate money to the best researchers and to the best groups, so it is important for them (who have no specific background to directly evaluate researchers) to obtain an “index”, something that they can use right away to compare individuals, groups, departments and universities. The Italian government, in particular, is finally building up a national evaluation process for universities and departments, and a good, robust performance metric (if such a thing existed) would be of great help.

Let’s focus on measuring the performance of a researcher. An important question is: should we consider the h-index a good measure of academic performance? For example, if a researcher has published only 3 papers with a large impact, with 1000 citations each, the h-index will be just 3. On the other hand, a researcher who has 20 papers, each one with 20 citations, has an h-index of exactly 20. Therefore, this index seems to favour researchers with a lot of good papers, although maybe none very fundamental.
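To make the comparison concrete, here is a small Python sketch that computes both the total number of citations and the h-index for the two hypothetical researchers above. The function and variable names are only illustrative, and the compact one-line count relies on the list being sorted in decreasing order.

```python
def h_index(citations):
    """Count the papers whose citation count is at least their rank."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, count in enumerate(ranked, start=1) if count >= rank)


few_big_papers = [1000] * 3    # 3 papers, 1000 citations each
many_good_papers = [20] * 20   # 20 papers, 20 citations each

for name, cites in [("3 papers x 1000", few_big_papers),
                    ("20 papers x 20", many_good_papers)]:
    print(f"{name}: total citations = {sum(cites)}, h-index = {h_index(cites)}")
```

The first researcher collects 3000 citations in total against 400 for the second, yet the h-index ranks them the other way round (3 against 20).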

It is the old difficult question: quality or quantity? Then, Sanjoy pointed me to this article. Here is an extract:

The psychologist Dean Simonton argues that fecundity is often at the heart of what distinguishes the truly gifted. The difference between Bach and his forgotten peers isn’t necessarily that he had a better ratio of hits to misses. The difference is that the mediocre might have a dozen ideas, while Bach, in his lifetime, created more than a thousand full-fledged musical compositions. A genius is a genius, Simonton maintains, because he can put together such a staggering number of insights, ideas, theories, random observations, and unexpected connections that he almost inevitably ends up with something great. “Quality,” Simonton writes, “is a probabilistic function of quantity.”

Yes, I think that quality is a probabilistic function of quantity (the key is in the probability). It was true for Bach, Leonardo Da Vinci, Mozart, Newton, Gauss and Euler. However, sometimes it is not true; Einstein is maybe the best example: he published a relatively low number of papers with an extraordinary impact. Also, many mathematicians fall in this category (with the notable exception of Erdős). In conclusion, I think we may find many examples of genius for which quality = quantity, and many examples for which quality != quantity. I think that Simonton concentrates on one very specific aspect of genius. But this concept is difficult to define, capture, encapsulate.

Going back to the h-index: if we are in search of the pure genius, then the h-index is probably of no help; an academic genius (especially a young one) can be recognised by his peers without any index, and can be missed by any index. A performance index is probably more necessary to distinguish mediocre researchers from the bad ones (and we also need mediocre researchers!); the problem is to find the “perfect index” (if such a thing exists…).

[1] Lutz Bornmann and Hans-Dieter Daniel, “The state of h index research. Is the h index the ideal way to measure research performance?” DOI: 10.1038/embor.2008.233.