An algorithm ranks the reputation of peer reviewers on the basis of how many citations the studies they have reviewed attracted.

The tool, outlined in a study published in February1, could help to identify during peer review which papers are likely to become high impact, its creators say. They add that authors should give the most weight to recommendations and feedback from reviewers whose previously refereed papers have gone on to be highly cited.

The study authors extracted citation data from 308,243 papers published by journals of the American Physical Society (APS) between 1990 and 2010 that had accumulated more than five citations each. Information about the referees of these papers was not available, so the authors created imaginary reviewers, which rated each paper using an algorithm trained on citation data from the APS data set. Using the review scores that these papers received in real life (a score of 1 being poor and 5 being outstanding), the study authors compared how closely the imaginary reviewers’ scores correlated with the actual scores the papers received.
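The paper’s exact scoring procedure isn’t spelt out here, so the Python sketch below is only an illustration of the general idea: a simulated reviewer converts a paper’s citation count into a 1–5 score, which is then compared with the score the paper actually received. The citation thresholds, the function names and the use of a Spearman rank correlation are assumptions made for the example, not details taken from the study.

```python
# Illustrative sketch only: map a paper's citation count to a 1-5 review score
# and measure how well the simulated scores agree with the real ones.
# The thresholds and the Spearman correlation are assumptions, not the study's method.
from scipy.stats import spearmanr

def simulated_score(citations, thresholds=(10, 25, 50, 100)):
    """Return a 1-5 score: more citations give a higher simulated review score."""
    score = 1
    for t in thresholds:
        if citations >= t:
            score += 1
    return score  # 1 (poor) ... 5 (outstanding)

def agreement(papers):
    """papers: list of (citation_count, real_review_score) pairs."""
    simulated = [simulated_score(c) for c, _ in papers]
    real = [r for _, r in papers]
    rho, _ = spearmanr(simulated, real)
    return rho  # rank correlation between simulated and real review scores
```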

To rank the imaginary reviewers, the study authors tracked the citations accumulated by the papers published between 1990 and 2000 and checked the review scores they were given. Imaginary reviewers that gave high review scores to papers that went on to attract a high number of citations were given a high ranking.
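One way to picture that ranking step is sketched below: each imaginary reviewer’s reputation is taken to be how closely their review scores track the citations their refereed papers later attracted. The use of a Spearman correlation as the reputation measure, like all of the names here, is an assumption for illustration rather than the formula used in the study.

```python
# Illustrative sketch only: rank reviewers by how well their review scores
# anticipated the citations their refereed papers later accumulated.
from collections import defaultdict
from scipy.stats import spearmanr

def reputation_ranking(reviews):
    """reviews: iterable of (reviewer_id, review_score, later_citations) tuples."""
    by_reviewer = defaultdict(list)
    for reviewer, score, citations in reviews:
        by_reviewer[reviewer].append((score, citations))

    reputation = {}
    for reviewer, pairs in by_reviewer.items():
        if len(pairs) < 2:
            continue  # need at least two refereed papers to measure agreement
        scores, cites = zip(*pairs)
        rho, _ = spearmanr(scores, cites)  # high when high scores went to well-cited papers
        reputation[reviewer] = rho

    # Reviewers whose scores best anticipated citations come out on top.
    return sorted(reputation.items(), key=lambda kv: kv[1], reverse=True)
```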

The authors then tested how effective these reputation rankings were in predicting citation numbers of papers refereed by the same imaginary reviewers in the second decade of the data. The study found that the imaginary reviewers’ recommendations on the 2000–10 papers were in line with the actual citation counts of these papers over that time span, says study co-author An Zeng, an environmental scientist at Beijing Normal University. This suggests that the algorithm is good at predicting high-impact papers, he adds.
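A rough version of that test could look like the sketch below: each 2000–10 paper gets a reputation-weighted review score, which is then correlated with the citations the paper actually went on to receive. The weighting scheme and, again, the Spearman correlation are assumptions for illustration, not the authors’ actual analysis.

```python
# Illustrative sketch only: correlate reputation-weighted review scores for the
# 2000-10 papers with the citations those papers actually received.
from scipy.stats import spearmanr

def predicted_impact(paper_reviews, reputation):
    """paper_reviews: (reviewer_id, review_score) pairs for one paper.
    reputation: dict mapping reviewer_id to a reputation score."""
    weights = [max(reputation.get(r, 0.0), 0.0) for r, _ in paper_reviews]
    scores = [s for _, s in paper_reviews]
    if sum(weights) == 0:
        return sum(scores) / len(scores)  # no ranked reviewers: plain average
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

def validate(papers, reputation):
    """papers: list of (paper_reviews, actual_citations) for the 2000-10 set."""
    predicted = [predicted_impact(reviews, reputation) for reviews, _ in papers]
    actual = [citations for _, citations in papers]
    rho, _ = spearmanr(predicted, actual)
    return rho  # high value means reputation-weighted scores track real citations
```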

More eyes on peer reviewers

Previous attempts to quantify and predict the reach of studies have been widely criticized for relying too heavily on citation-based metrics, which, critics say, exacerbate existing biases in academia. A 2021 study2 found that non-replicable papers are cited more than replicable studies, possibly because they have more ‘interesting’ results.

Zeng acknowledges the limitations of focusing on citation metrics, but says that it’s important to evaluate the work of peer reviewers. Solid studies are sometimes rejected because of one negative review, he notes, but there’s little attention given to how professional or reliable that reviewer is. “If this algorithm can identify reliable reviewers, it will give less weight to the reviewers who are not so reliable,” says Zeng.

Journal editors often use search tools to identify candidates to peer review papers, but they have to decide manually whom to contact. If referees’ activities were ranked and quantified, it would be easier for editors to choose, Zeng points out.

However, ranking reviewers on their reputation is likely to exacerbate the inequities and biases that exist in peer review, says Anita Bandrowski, an information scientist at the University of California, San Diego.

As previous data have shown, most of the responsibility for peer review in science falls to a small subset of reviewers: typically men in senior positions in high-income nations, who are geographically closer to most journal editors.

Bandrowski notes that the algorithm might favour those with a long history of reviewing, because they’ve had more time to accumulate citations on their refereed papers. “The oldest reviewers by this metric would be the best reviewers and yet the oldest reviewers are going to be retired or dead,” she says.

Zeng disagrees that his approach will make the selection of peer reviewers more inequitable than it is now. After implementing the reputation ranking, editors might find that some reviewers who are not frequently invited have high reputation scores — in some cases better than those who are inundated with referee requests, he says.

Capturing the nuance

Laura Feetham-Walker, a reviewer-engagement manager at the Institute of Physics Publishing in Bristol, UK, worries that the algorithm might not account for incremental studies, negative findings and replications of previous studies, all of which are crucial for science, albeit often not highly cited.

“Under their system, a reviewer who gave a favourable recommendation on an incremental study — for example, for a journal that does not have novelty as an editorial criterion — would go down in the reviewer reputation ranking, simply because that manuscript would be unlikely to accrue large numbers of citations when published,” she says.

Nor does the ranking account for researchers who have never reviewed before, or at least those who have never reviewed for a particular publisher, Feetham-Walker adds.

“We know that a reviewer’s ability to provide a helpful review is dependent not just on their expertise, but also their availability and interest in the subject matter. We also know that reviewers are human, and their reviewing behaviour can change over time depending on various factors,” Feetham-Walker says. “A nuanced algorithm that took all of this into account, as well as adding new reviewers to enrich the pool, would be of genuine value to publishers.”