NSU mathematicians and foreign colleagues have investigated how resistant Google Scholar is to manipulation of the Hirsch index

The best-known scientometric indicator of a scientist's productivity is the Hirsch index h: the largest number h such that the scientist has h publications each cited at least h times. The Google Scholar publication and citation database lets authors create their own profiles and merge publication records in them. This feature is meant for combining records of different versions of the same publication, for example, a preprint and the published article. Merging or splitting records can change the author's Hirsch index: merging creates a "summary publication" with more citations, but reduces the number of publications.
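The definition of h, and how a single merge can raise it, can be illustrated with a short Python sketch. The citation counts are made up, and the merged record's count is assumed to be the sum of the two originals, which holds only if no paper cites both of them.

```python
def h_index(citations):
    """Largest h such that at least h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites < rank:  # counts only decrease from here, so h cannot grow further
            break
        h = rank
    return h


# Five records with 4, 4, 4, 2 and 2 citations.
print(h_index([4, 4, 4, 2, 2]))  # 3
# Merge the two 2-citation records (assuming no paper cites both):
# four records remain, each with 4 citations.
print(h_index([4, 4, 4, 4]))  # 4
```

The merge removes one publication from the profile, yet h grows, because the two weak records together clear the higher threshold.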

Mathematicians from NSU, Australia, Germany, and Poland have shown that it is easy to find a set of merge and split operations on publication records that maximizes the Hirsch index. The scientists have also proposed an approach that prevents such manipulation of the Hirsch index.

[Figure 1]

The team's first paper on this topic was published in 2016 in Artificial Intelligence, the flagship journal of the field. In it, the authors presented fast algorithms for finding a set of merge operations that maximizes the Hirsch index in Google Scholar.

Continuing this line of research, the team published new findings in the recently launched journal Quantitative Science Studies. In the new article, the authors present fast algorithms for finding a set of record-splitting operations that maximizes the Hirsch index in Google Scholar. The studies report computational experiments on the profiles of young researchers in artificial intelligence (namely, participants of the flagship conference IJCAI and researchers from the AI's 10 to Watch lists). The assumption is that manipulating the Hirsch index may be particularly attractive to young scientists looking for a job. The experiments showed that just a few merge or split operations on publication records can significantly increase the Hirsch index in Google Scholar.

[Figure 2]

In the figure, the body of each candle shows the median and the first and third quartiles over all authors.

Rene van Bevern, one of the authors of the article and head of the Algorithmics Laboratory at the NSU Department of Mathematics and Mechanics, comments:

Our work draws attention to just one of the many ways of manipulating the numerical performance indicators of scientists. Scientometric indicators can only serve as a supplementary means of evaluating researchers; they cannot be imposed as mandatory requirements, for example, for taking part in grant competitions or applying for a position.
Our work is partly provocative. We deliberately published it in journals with high scientometric indicators, so that our results on manipulating scientometric indicators would appear in every report that requires meeting scientometric targets.

As the scientists found, the possibility of manipulation stems from the way Google Scholar counts citations of merged records. In this system, the citation count of a merged record is the number of articles that cite at least one of the articles in the record. The authors propose another, more intuitive way of counting citations of merged records, which rules out counting the same citation twice between a pair of merged records and resolves the paradox of citations between publications merged into a single record. This can be represented graphically as follows.

[Figure 3]

As the right column shows, by merging publications into one record, the manipulator can inadvertently reduce the citation counts of other records. This makes it harder to find a set of merge and split operations that maximizes the Hirsch index. The mathematicians show that under this counting method, maximizing the Hirsch index is an NP-complete problem: no efficient (polynomial-time) algorithm for it exists unless P = NP.
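The two counting rules and the side effect just described can be reproduced on a toy example. The following is a minimal Python sketch under assumptions of our own, not the authors' code: papers and their citations form a graph, the profile's records are a partition of the author's papers, and every other paper stays a separate record. `union_count` follows the description of Google Scholar given above (every individual paper citing at least one member of the record counts, including papers inside the record); `fusion_count` is our reading of the proposed alternative, in which each merged record acts as a single node, so citations inside a record are ignored and several papers of one record citing the same target count once. All function names and the example data are hypothetical. Since any sequence of merges and splits starting from individual papers ends in some partition, the brute-force `best_merge` simply tries every partition; their number grows as the Bell numbers (52 for five papers), so this is an illustration only and no substitute for the algorithms in the papers.

```python
def union_count(graph, records, i):
    """Google Scholar rule as described: individual papers citing at least one
    paper of record i (papers inside the record itself included)."""
    return sum(1 for cited in graph.values() if cited & records[i])


def fusion_count(graph, records, i):
    """Assumed alternative: distinct *other* records containing a paper that
    cites into record i; each citing record counts once."""
    where = {p: j for j, rec in enumerate(records) for p in rec}
    citing = {where[p] for p, cited in graph.items() if cited & records[i]}
    citing.discard(i)  # citations inside the record are ignored
    return len(citing)


def h_index(counts):
    """Largest h such that h records have at least h citations each."""
    ranked = sorted(counts, reverse=True)
    return max((k for k, c in enumerate(ranked, 1) if c >= k), default=0)


def partitions(items):
    """Yield every split of `items` into non-empty blocks (Bell-number many)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        yield [{first}] + part  # `first` in a block of its own
        for j in range(len(part)):  # or `first` joined to an existing block
            yield part[:j] + [part[j] | {first}] + part[j + 1:]


def best_merge(graph, own, count):
    """Exhaustive search over all partitions of the author's papers `own`."""
    others = [{p} for p in graph if p not in own]
    best = (-1, None)
    for part in partitions(list(own)):
        records = part + others
        h = h_index([count(graph, records, j) for j in range(len(part))])
        if h > best[0]:
            best = (h, part)
    return best


# Toy profile: the author's papers a..e; "x_..." are external citing papers.
# Papers d and e both cite c.
graph = {"a": set(), "b": set(), "c": set(), "d": {"c"}, "e": {"c"}}
for target, n in [("a", 4), ("b", 4), ("c", 2), ("d", 2), ("e", 2)]:
    for k in range(n):
        graph[f"x_{target}{k}"] = {target}

own = ["a", "b", "c", "d", "e"]
print(best_merge(graph, own, union_count)[0])   # 4
print(best_merge(graph, own, fusion_count)[0])  # 3
```

In this toy profile, d and e have two external citations each and both cite c. Under the union rule, merging d and e yields a record with four citations while c keeps its four, so h rises from 3 to 4. Under the fusion rule, the merged record cites c only once, c drops to three citations, and no partition reaches h = 4.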

Rene van Bevern jokes:

Many works have been devoted to manipulating scientometric indicators, including the Hirsch index, and many others to detecting such manipulation. No matter what performance indicators are introduced or what rules of the game are established, researchers exist precisely to start studying right away how resistant those rules are to manipulation. Such questions are actively studied in algorithmic game theory and social choice theory. Of course, my own Hirsch index in Google Scholar is slightly embellished, too.