My fellow scientists cite my work at random

Although the average number of times people cite my papers is lower than that for a Siamese cat, people do cite my work. Google Scholar reckons that, at the time of writing, my papers have attracted a total of N = 2858 citations and that I have an h-index of 27. The h-index (named after Jorge Hirsch) is the largest number h such that h of an author’s papers have each been cited, i.e., referenced, at least h times. So I have published 27 papers that have each been cited at least 27 times.
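To make the definition concrete, here is a minimal Python sketch of how an h-index can be computed from a list of per-paper citation counts. The function name `h_index` and the example counts are made up for illustration; they are not my actual Google Scholar numbers.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank      # the top `rank` papers all have at least `rank` citations
        else:
            break         # counts only get smaller from here on
    return h


# Hypothetical citation counts for six papers (illustration only).
example_counts = [10, 8, 5, 4, 3, 0]
print(h_index(example_counts))  # 4: four papers have at least 4 citations each
```

Note that one hugely cited paper barely moves h: replacing the 10 above by 10000 still gives h = 4.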

Clearly an author’s total number of citations and their h-index should be correlated, but Alexander Young has looked quantitatively at how strongly they are correlated, in the sense of: if you know the value of N, can you predict the value of h? If you can, then basically you can just use one of the two numbers to assess an academic. Using both would be a bit pointless, as the two numbers together carry very little more information than either one alone.

The answer is that for many scientists, given N, you can predict the value of the h-index with reasonable accuracy (see Young’s paper and a blog post by Claus Wilke), using the very simple formula

h = 0.54 √N

This is pretty accurate for me: 0.54√2858 ≈ 29, close to the true value of 27. Young’s paper has some fancy math in it, but I think the model he uses is just the simplest possible one. You start by saying that N citations can be distributed amongst the author’s P papers in some countable number of ways. E.g., if N = 5 then the author could have P = 5 papers, each cited once; P = 4 papers, one of which is cited twice and the other three once; and so on, down to P = 1 paper cited 5 times. He then assumes that all of these ways of distributing the citations are equally likely, and uses some standard results in mathematical stats to get the equation above.
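For small N you can check the idea by brute force. The sketch below is my own illustration, not code from Young’s paper: it lists every way of splitting N citations among papers (each way is an integer partition of N), treats them all as equally likely, and compares the average h-index with 0.54√N. The function names are made up, and since the formula is a large-N approximation I would only expect rough agreement at these small values.

```python
import math


def partitions(n, max_part=None):
    """Yield every way to write n as a non-increasing tuple of positive integers."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    # Choose the largest remaining part, then partition what is left over.
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def h_of_partition(parts):
    """h-index of a citation pattern given as a non-increasing tuple."""
    h = 0
    for rank, count in enumerate(parts, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def mean_h(n):
    """Average h-index over all partitions of n, each weighted equally."""
    total = 0
    number_of_patterns = 0
    for pattern in partitions(n):
        total += h_of_partition(pattern)
        number_of_patterns += 1
    return total / number_of_patterns


def predicted_h(n):
    """The rule of thumb quoted above: h = 0.54 * sqrt(N)."""
    return 0.54 * math.sqrt(n)


# Brute force is only feasible for small N; the number of partitions grows very fast.
for n in (5, 20, 40):
    print(n, round(mean_h(n), 2), round(predicted_h(n), 2))
```

For N = 5 there are seven patterns, two of which (3+2 and 2+2+1) have h = 2 while the other five have h = 1, so the average is 9/7 ≈ 1.29, against 0.54√5 ≈ 1.21 from the formula.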

This assumption of equal probabilities looks dramatic: why should each pattern of citations be equally likely? But it is actually quite a common approximation, and it often works surprisingly well, as it appears to here. It effectively corresponds to assuming that the distribution of citations is as random as possible.

4 Comments

Paul Stevenson (@gleet_tweet) says:

January 6, 2015 at 8:18 am

I get 0.54√1303 = 19.49 compared with actual value of h=20.

1. Richard Sear says:
  
  January 6, 2015 at 10:59 am
  
  That’s close. The formula really does seem a good approximation for many scientists.
  
  1. Paul Stevenson (@gleet_tweet) says:
    
    February 7, 2015 at 3:17 pm
    
    Just noticed this guy in nuclear physics for whom the formula doesn’t work: https://scholar.google.com/citations?user=e7vL-gQAAAAJ&hl=en
Richard Sear says:

February 7, 2015 at 5:02 pm

Yeah, the h-index underestimates the impact of people who develop one massively used technique, so there must be a fair few people like that. Guess it is harder to see how the opposite could occur, e.g., people with scarcely more than say 20 papers but almost all of which are cited 20 or a few more times.