The genius and the h-index

Last week my friend Sanjoy came to Pisa to visit us and give a three-day seminar. At dinner with a few colleagues, we started discussing academic careers in Italy, and how difficult it is to obtain a position (currently, there is none open). My younger colleagues were debating “how many papers you need to get a position”, a common “game” among young researchers, and Marco observed that just 10 years ago the average requirements (and the expectations) were much lower than today: a couple of journal papers were enough to become an assistant professor, 6 journal papers for associate, 12 for full professor. Now, 10 journal papers may not be enough for an assistant position! It seems that people are publishing much more, and much more frequently, and correspondingly the bar is getting higher and higher. I will not get into the discussion of why this is happening and whether it is good or bad (maybe in a future post).

Inevitably, we ended up talking of the Hirsch index (or h-index) for evaluating researcher performance. This index is very popular, although it has received a lot of criticism. The definition is:

A scientist has index h if h of [his/her] Np papers have at least h citations each, and the other (Np − h) papers have at most h citations each.

In practice, you count the citations to each one of your papers, sort the papers in decreasing order of citations, and then find the largest rank h such that the h-th paper has at least h citations (the (h+1)-th paper, if any, has fewer than h+1 citations).
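Just as an illustration, here is a minimal sketch of that computation in Python (the function name and the assumption that the per-paper citation counts have already been collected from some database are mine):

```python
def h_index(citations):
    """Compute the h-index from a list of per-paper citation counts."""
    # Sort citation counts in decreasing order.
    ranked = sorted(citations, reverse=True)
    h = 0
    # h is the largest rank (1-based) such that the paper at that rank
    # still has at least that many citations.
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h
```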

The popularity of this index is probably due to the fact that it is easy to calculate and easy to understand: many on-line databases offer a service for calculating it automatically. There are also many critics of the h-index, and I am one of them: it depends on the researcher’s age, so it tends to underestimate the performance of young researchers; it tends to overestimate people that publish a lot; it strongly depends on the research area; and it also depends on the database [1].

Many other performance indexes have been proposed and many more will be in the future. Why? Why so much effort in trying to measure the performance of academic researchers?

One of the main reasons is exogenous to the academic world. Politicians try to allocate money to the best researchers and to the best groups, so it is important for them (who have no specific background for evaluating researchers directly) to obtain an “index”, something that they can use right away to compare individuals, groups, departments and universities. The Italian government, in particular, is finally building up a national evaluation process for universities and departments, and a good, robust performance metric (if such a thing existed) would be of great help.

Let’s focus on measuring the performance of a researcher. An important question is: should we consider the h-index a good measure of academic performance? For example, if a researcher has published only 3 papers with a large impact, with 1000 citations each, the h-index will be just 3. On the other hand, for a researcher that has 20 papers, each one with 20 citations, the h-index will be no less than 20. Therefore, this index seems to favour researchers with a lot of good papers, even if maybe none is truly fundamental.
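Using the small h_index sketch above, these two hypothetical profiles give exactly those values:

```python
print(h_index([1000, 1000, 1000]))  # 3 highly cited papers -> h-index 3
print(h_index([20] * 20))           # 20 papers with 20 citations each -> h-index 20
```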

It is the old difficult question: quality or quantity? Then, Sanjoy pointed me to this article. Here is an extract:

The psychologist Dean Simonton argues that fecundity is often at the heart of what distinguishes the truly gifted. The difference between Bach and his forgotten peers isn’t necessarily that he had a better ratio of hits to misses. The difference is that the mediocre might have a dozen ideas, while Bach, in his lifetime, created more than a thousand full-fledged musical compositions. A genius is a genius, Simonton maintains, because he can put together such a staggering number of insights, ideas, theories, random observations, and unexpected connections that he almost inevitably ends up with something great. “Quality,” Simonton writes, “is a probabilistic function of quantity.”

Yes, I think that quality is a probabilistic function of quantity (the key is in the probability). It was true for Bach, Leonardo Da Vinci, Mozart, Newton, Gauss and Euler. However, sometimes it is not true; Einstein is maybe the best example: he published a relatively small number of papers with an extraordinary impact. Also, many mathematicians fall in this category (with the notable exception of Erdos). In conclusion, I think we may find many examples of geniuses for whom quality = quantity, and many examples for whom quality != quantity. I think that Simonton concentrates on one very specific aspect of genius. But this concept is difficult to define, capture, encapsulate.

Going back to the h-index: if we are in search of the pure genius, then the h-index is probably of no help; an academic genius (especially a young one) can be recognised by his peers without any index, and can be missed by any index. A performance index is probably more necessary to distinguish mediocre researchers from bad ones (and we also need mediocre researchers!); the problem is to find the “perfect index” (if such a thing exists…)

[1] Lutz Bornmann and Hans-Dieter Daniel, “The state of h index research. Is the h index the ideal way to measure research performance?”, EMBO Reports, DOI: 10.1038/embor.2008.233.


7 thoughts on “The genius and the h-index”

  1. An interesting observation: “A performance index is probably more necessary to distinguish mediocre researchers from bad ones (and we also need mediocre researchers!); the problem is to find the ‘perfect index’ (if such a thing exists…)”

    I wonder about the effect the use of a less-than-perfect index has on the choices made by the mediocre researchers…

  2. I agree that “a genius can be recognized by his peers without any index”, but it is often the case (at least here in Italy) that a genius (or a mediocre researcher like me) will be judged by people who do not share the same scientific background. Therefore, synthetic parameters that are easy to compute and understand can help them discriminate between mediocre researchers and geniuses, especially when a selection has to be made in a reasonable amount of time among a large number of candidates (again, Italy’s case). As mentioned in the linked article, “the use of citation indices is often necessary, owing to time constraints, but not ideal”. Further studies are needed to derive the best (combination of) indices.

    About the h-index, let me reply to the main drawbacks you pointed out:
    1- it depends on the researcher’s age, so it tends to underestimate the performance of young researchers: indeed, but isn’t research experience a value that should be taken into account in the evaluation process? At least as long as a scientist keeps publishing citable units…
    2- it tends to overestimate people that publish a lot: I do not fully agree. I know many people that have hundreds of papers that are never cited, resulting in an h-index below 10. Even self-citations are difficult to use to artificially increase the index above 10, although they can significantly bias the index for smaller values.
    3- it strongly depends on the research area: absolutely true. But most selection processes are typically limited to a particular research area. When this is not the case, it shouldn’t be difficult to normalize the index among the various areas. I think this is one of the problems that ANVUR (the Italian agency for the evaluation of academic research) is trying to solve. I’m also participating in an interesting discussion about how to do that for a ranking of the top Italian scientists that appeared in many Italian newspapers ( http://www.topitalianscientists.org ). From that ranking it is clear that scientists in the fields of medicine and biology have a higher h-index than computer scientists and mathematicians (BTW, from that ranking you can also see that almost all of the most famous Italian scientists have a very high h-index).
    4- it also depends on the database: true. But I believe that Google Scholar is able to derive the correct index with a very small approximation. Of course, there can be errors that need to be solved manually (double citations, homonymies, irrelevant citations). But databases are becoming more precise every day in detecting such anomalies.

    To summarize, the main drawbacks of the h-index are in my opinion related to the differences among research fields (typical number of authors per paper, page length of the least publishable unit, average number of citations in the bibliographies, etc.). But inside the same research field, it is the best you can use for a time-constrained evaluation. Of course, if you have time you can go on and read the best 10 journal papers of each one of the 40 candidates of the selection process…

  3. @Sanjoy: this will be the topic of my next post!

    To both: here is the reference to the “fake” researcher, Ike Antkare. Try to search for him on Google Scholar 🙂

    However, the trick is so stupid that it can be detected “easily”. Here is a reference to a statistical analysis that identifies fake researchers like Ike Antkare.

    However, raising the number of citations “honestly” in a community is not that difficult. My observation is that small closed communities will start to use dirty tricks to artificially raise the number of citations. For example, the editors of some journals in our scientific area are already putting psychological pressure on authors to cite papers from the same journal… Of course, this may eventually cause a wild race to raise citations at all costs!
    But, as I said, this is an argument for the next post.

  4. I did a quick web-search for your (Peppe’s) h-index; one site lists it as 26, another as 18. That’s quite a range!

    • I guess you tried Google Scholar and Microsoft Academic Search. They are both generic search engines, that is, they look for papers on the Internet, so they typically do not check the authenticity of the source, or maybe they do only a simple filtering.
      On Scopus (the database of Elsevier, where all papers and citations are actually checked, or filtered with some complex filtering), I get an index of 14, or 12 excluding self-citations. The difference is quite evident.
      I guess these same figures are true for most CS researchers. Did you try your index?

  5. Quoting Marko’s post: “But inside the same research field, it is the best you can use for a time-constrained evaluation.”
    What do you mean by “best metric”? I guess, among the available ones that are automatically computed and (unfortunately) widely known.
    I strongly agree with Peppe’s criticisms:

    1) it depends on the researcher’s age, so it tends to underestimate the performance of young researchers;
    2) it tends to overestimate people that publish a lot;
    3) it strongly depends on the research area;
    4) it also depends on the database.

    And the above posts about Microsoft Academic versus Google Scholar versus Scopus are an empirical proof of 4).

    Even if we remove 3) by focusing on a single research area, I’d add further drawbacks:

    5) it merely depends on how many times a paper has been cited, but it says nothing, for example, about how wide the community citing it is (imagine a couple of researchers who keep working on the same problem, useless for society, improving on each other’s results, whose research nobody cares about); I claim one should also look at how many different people are citing each paper, which would mean the research is considered important not merely by 1 person, but by a significant group of researchers (better if from different institutions, to remove possible bias); see the sketch after this list;

    6) it also doesn’t say anything about the “meaning” of the citation: sometimes you have clearly negative citations, for example a work that simply proves another work was wrong, fixing other authors’ mistakes; other times, you solve a completely different problem, but you mention the other works in a long list of citations in the related work section without any real comparison, and without any concrete practical, experimental or simulation comparison. Other (unfortunately few) times, you actually build on the other authors’ work, by extending the model, making it more general, expanding on the conclusions, removing unneeded assumptions, reusing components of their architectures or concepts of their construction, or comparing concretely on a simulation or experimental basis. However, these very important differences in the impact that a publication is having are completely lost in the merely Boolean information about whether or not it has been cited. Nor can the mere number of citations cover this gap, IMHO;

    7) what about redundancy? Many authors tend to reuse parts (or lines of reasoning), e.g., introduction material, throughout their various papers, in which they keep citing always the same works; however, this is still “only” an indication that the same author (or authors’ group) is acknowledging the work(s) done by other(s). This acknowledgement of another researcher’s work is merely repeated by the same author over more and more of his papers, but the “impact signal” might be interpreted differently if it were different researchers and different research groups actually citing an author’s paper (see 5)).
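    As a toy illustration of point 5) (with made-up data and field names, not any real database schema), counting distinct citing authors and institutions instead of raw citations would look roughly like this:

    ```python
    # Hypothetical citation records for one paper: the same pair of authors
    # cites it repeatedly, so raw citations overstate the breadth of impact.
    citing_papers = [
        {"authors": ["A. Rossi", "B. Bianchi"], "institution": "Univ. X"},
        {"authors": ["B. Bianchi", "A. Rossi"], "institution": "Univ. X"},
        {"authors": ["A. Rossi"], "institution": "Univ. X"},
        {"authors": ["B. Bianchi"], "institution": "Univ. X"},
    ]

    raw_citations = len(citing_papers)                                   # 4 citations
    distinct_authors = {a for p in citing_papers for a in p["authors"]}  # only 2 people
    distinct_institutions = {p["institution"] for p in citing_papers}    # only 1 place

    print(raw_citations, len(distinct_authors), len(distinct_institutions))  # 4 2 1
    ```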

    You might argue that the above points cannot be fixed anyway. However, are we really sure it has to be like this? For example, getting rid of the useless (and costly, and environmentally unfriendly) paper-based production of papers, imagine a publication model in which you write your paper using a tool that allows you:
    -) to differentiate among the types of citation you’re making, for example negative citations, discursive citations, and real concrete comparisons (for which you’re forced to detail the comparison);
    -) to back-refer to other introduction material you already wrote in the past, without any need to repeat it over and over again throughout all of your papers;
    -) to back-refer to related work discussions you wrote in the past, as well as incrementally change, amend, integrate and expand previous comparisons you wrote among your research and others’ research;
    -) to back-refer to notation, syntax, models, and assumptions you already stated clearly in other papers, without any need to copy them again and again in all of your papers; a researcher who is following your research (or reviewing your paper) might read your new findings in one third of the time, with these simple back-references to previous findings, so that what’s really new is clearly visible and stated, without losing time on stuff that has already been said, commented, reviewed, etc.;
    -) to incrementally change your previous papers, fixing unimportant mistakes you made, or perhaps improving slightly (or fixing) equations, with a clearly highlighted and stored history & revision log;
    -) to allow other researchers to post technical comments in a blog-like fashion; and, my last provoking statement may be: “why can’t reviews made by reviewers be publicly available”?

    Just my 2 cents (well, perhaps I’ve dropped 4 of them 🙂 ).
    Any comment welcome of course.

  6. Pingback: How to increase the Impact Factor of a Journal « The land of algorithms
