Quantifying research quality using article-level metrics


Quantifying research quality has been a buzz-activity in academia for the last two decades. The irony is lost in the paperwork. For reasons best left out of this essay, this activity has come to stay in our academics. One such quantifying-quality measure (QQM) that evolved recently is the Impact Factor (IF) of journals [1] that publish peer-reviewed research. Another QQM is the number of citations a peer-reviewed research article begets over time. The major difference between the two is that while the former is a journal metric, the latter is an article-level metric.

Research articles should be evaluated on their own merits. On that assertion, there could be universal agreement in the scientific community. Obviously, more ways to judge the quality of individual research articles are welcome over judging entire journals. This is where the introduction of more article-level metrics (ALM) – like the number of citations – shows promise. We recount the introduced ALM indices [2] in the following section.

"The beginning of the end for impact factors and journals", a neat online article by Richard Smith [3], explains the newly introduced ALM indices with examples. Another recent article published in PLoS by Cameron Neylon and Shirley Wu [4] discusses the pros and cons of the newly introduced ALM indices. But both these articles leave out of their discussion certain key journal requirements for the proper functioning of the proposed ALM, and the related shortcomings.

Also, the journal impact factor is increasingly seen as a very poor measure of article impact. One distinction is essential in such a generalization: in debunking the efficacy of impact factors, we are not debasing the reputation earned by research journals.

In this article, we discuss in detail the efficacy of the proposed ALM indices, the journal impact factor contrasted with prevailing journal reputation, and related issues. In the summary, we suggest possible corrective measures for ALM.

*****

Some time back, the Public Library of Science (PLoS), in a perhaps foresighted move, began placing information about the usage and reach of published articles on the articles themselves. PLoS calls these evaluating measures or indices Article-Level Metrics (ALM), to distinguish them from traditional journal-level metrics like the IF.

The available data in each PLoS article, to quote from their ALM definition [5], includes the following (a rough sketch of how such data might be represented appears after the list):

  • Article usage statistics – HTML pageviews, PDF downloads and XML downloads
  • Citations from scholarly literature – currently from PubMed Central, Scopus and CrossRef
  • Social bookmarks – currently from CiteULike and Connotea
  • Comments – left by readers of each article
  • Notes – left by readers of each article
  • Blog posts – aggregated from Postgenomic, Nature Blogs, and Bloglines
  • Ratings – left by readers of each article
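
For concreteness, here is a minimal sketch of how such per-article data might be gathered into a single record. The field names, types and example values are illustrative assumptions for this essay, not the actual PLoS schema.

```python
from dataclasses import dataclass, field

@dataclass
class ArticleLevelMetrics:
    """Illustrative per-article ALM record; field names are assumptions, not the PLoS schema."""
    doi: str
    html_views: int = 0        # article usage: HTML page views
    pdf_downloads: int = 0     # article usage: PDF downloads
    xml_downloads: int = 0     # article usage: XML downloads
    citations: int = 0         # scholarly citations (PubMed Central, Scopus, CrossRef)
    bookmarks: int = 0         # social bookmarks (CiteULike, Connotea)
    comments: int = 0          # comments left by readers
    notes: int = 0             # notes left by readers
    blog_posts: int = 0        # aggregated blog mentions
    ratings: list = field(default_factory=list)  # ratings left by readers

    def usage_total(self) -> int:
        """Total article usage across the three access formats."""
        return self.html_views + self.pdf_downloads + self.xml_downloads

# Hypothetical article, used only to show the shape of the data
alm = ArticleLevelMetrics(doi="10.xxxx/hypothetical", html_views=1200,
                          pdf_downloads=340, citations=5, blog_posts=2,
                          ratings=[4.0, 3.5])
print(alm.usage_total())  # 1540
```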

All the above user-generated addenda comprise ALM. The philosophy behind the introduction of ALM is noble: the quality and impact of an individual article should not be determined by the journal in which it happened to be published. We now discuss the uncertainties of the proposed ALM indices in detail.

*****

The citation count is one existing article-level metric. If self-citations are excluded, it is a fair measure of the reach and relevance of an article among other researchers in a given subject area. But it has two major drawbacks. It is slow to accumulate, as it takes a few incubation years for others to read and start citing an article; how long that incubation period should be remains subjective. Next, as Cameron Neylon puts it, citations do not express the sentiments of the referrer. An article can be cited for several reasons, including the not-so-reputable ones: (i) cited without any discussion, (ii) cited to critique its results. Nevertheless, citations offer a fair measure of the worthiness of an article, as they come from other peer-reviewed research articles. The measure maintains peer-to-peer credibility.
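As a rough illustration of that self-citation exclusion, the sketch below counts citations while dropping any citing paper that shares an author with the cited article. The overlap rule and the data layout are assumptions for this sketch; bibliometric databases differ in how they define a self-citation.

```python
def citation_count(article_authors, citing_papers, exclude_self=True):
    """Count citations, optionally dropping self-citations.

    citing_papers is assumed to be a list of author lists, one per citing
    paper; a citation is treated as a self-citation if any author of the
    cited article also appears on the citing paper (an assumed rule).
    """
    cited = set(article_authors)
    count = 0
    for authors in citing_papers:
        if exclude_self and cited & set(authors):
            continue  # skip self-citations
        count += 1
    return count

# Hypothetical example: three citing papers, one shares an author with the article
print(citation_count(["A. Author", "B. Author"],
                     [["C. Writer"], ["A. Author", "D. Colleague"], ["E. Reader"]]))  # 2
```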

The rest of the ALM mentioned in the above section are even less credible, if not dubious. Article usage statistics, social bookmarks, comments, notes and ratings all differ in a major way from citations: none of them is peer reviewed. Peer review cannot be imposed on such ALM because, except for a few blog posts and comments, they carry no original research content comparable or related to the article being rated. Such ALM in effect tell the article's author that scores of us are aware of the article online, while citations reach the author personally by relating more original content to the research content of the article.

Even looking at them individually, more doubts remain about the veracity of the introduced ALM. For instance, does the number of downloads speak for the quality and reputation of an article? Certainly not. At the end of every year, a few thermal science/engineering journals release online a list of their top ten or twenty most downloaded articles for that year or quarter. Without spite towards any individual or their research, I can say that the top five most downloaded articles are not always the top five in quality.

Measuring online traffic to an article measures something, but not always quality. Yes, the article is being linked, clicked and noticed for some reason, but this translates to quality about as well as the book sales of Harry Potter translate to its research quality. Mentions in blogs carry uncertainties of their own, similar to those of citation counts. For instance, a blogger can mention an article simply because it contained media she liked. I have blogged about research that I have not pursued further or had the expertise to judge or comment upon. Just because I can blog or write about such work on the internet does not mean I can judge the impact or quality of research in fields where I am not an expert. Yet ALM allow me to sway such judgement.

For instance, there is a school of thought popular amongst science bloggers that science should be 'liked', and for that, it should always be 'appealing'. What is the article-level metric for the paper that recently solved the Poincaré conjecture? Is it online at all for us to start applying the metrics mentioned above? How many of the bloggers who 'reported' the 'news' of the proof of the Poincaré conjecture successfully captured the essence of the solution? Buzz is a characteristic of blogs and the internet in general. Not quality.

*****

Scientific impact is not a simple concept that can be confined to and described by a single number. While trying to devise measures to quantify quality, we should, as a reference point, keep asking why we measure it at all. The recently evolved Impact Factor of journals is a case in point.

The misuses of the IF are detailed on its Wikipedia page [1]. Major criticisms include that the IF is not an absolute measure, that it can be inflated by journal editorial policies, and that it can be misused to judge the 'reputation' of a researcher. As a concept, the IF requires at least two journals to provide a measure of any relevance; more generally, a field needs relatively less impressive journals alongside the better ones for the IF to provide, if at all, a credible measure of journal quality.
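For reference, the impact factor itself is a simple ratio: the citations a journal receives in a given year to the items it published in the previous two years, divided by the number of citable items it published in those two years. The sketch below computes it from hypothetical counts.

```python
def impact_factor(citations_to_prev_two_years, citable_items_prev_two_years):
    """Journal impact factor for a year: citations received this year to the
    journal's previous two years of items, divided by the number of citable
    items published in those two years."""
    return citations_to_prev_two_years / citable_items_prev_two_years

# Hypothetical journal: 180 citations this year to its previous two years'
# output of 150 citable items
print(impact_factor(180, 150))  # 1.2
```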

If five journals cater to a research field and all have nearly identical IFs, using the IF as a measure of journal quality or impact becomes redundant. However, active researchers in the field, from their working experience, can judge which of these five are the more reputed journals. This situation is not hypothetical. There are about 15 to 20 journals catering to the thermal and fluid sciences, where most of my research is published. Their IFs range between 0.5 and 2, with a majority clustering between 0.9 and 1.5. But as researchers we know which amongst these are the 'better' journals read by peers – not necessarily the ones with the highest IF values.

The concept of a journal rests on proper fundamentals. At its core, a journal is its peer review. The more scholarly the peers, the more competent their peer review; the stricter such peer review, the higher the standard of the published article. Hence the reputation of a journal. If competence prevails throughout this forward sequence, the backward deduction about an article from the reputation of the journal in which it is published remains sound. There is nothing wrong with using such journal reputation as a quality measure of the impact of the articles published there. To carp at such a measure is a case of sour grapes.

The peer review process at times declines to publish quality content for varied reasons and, worse, accepts dubious or plagiarized research for publication. Despite such shortcomings, peer review remains a competent sieve for research publications, as shown recently by a study from the Publishing Research Consortium [6]. Research written and reported as peer-reviewed journal articles is intended only for peers; it remains mostly incomprehensible to non-experts, say, the online public. Keep in mind that, in the present situation of information explosion, experts in one, perhaps narrow, field of research become the 'public' when they read research from another subject domain.

To presume research articles are palatable to the ALM-tweaking online public (traffic) could even create misunderstanding of what important research is. A research article without a picture could seem drab, and one with only theorems could be uninteresting, if not incomprehensible. How, in such cases, would ALM ensure fairness in the generated impact? Devising metrics attuned and catering to the internet public, and invoking them to rate peer-reviewed research, is not right. Sticking with such ALM, we may eventually descend to the journalistic credo that what ain't impressive to, or readily understandable by, the 'common man' ain't good science/research.

Perhaps you cannot fool all of the ALM indices all the time; collectively, they should mean some deserved attention for an article. But as things stand, a researcher counts only her number of citations for article impact for all academic purposes, while the real impact of the research may remain unpredictable for several years. With several ALM indices, she must advertise more to get noticed above the cumulative noise. Instead of the current (loathsome) practice of asking others to 'cite' one's article, one now asks for the article to be cited, downloaded, bookmarked, advertised on all the social networking sites, blogged about, viewed (not necessarily read) many times over and commented upon… a sure way to raise, if not any other impact index, one's blood pressure.

*****

The indices proposed so far in ALM are devised under a presumption: that all research is published in open access journals available for free on the internet. For a peer-reviewed research article to be fairly rated by all the proposed ALM indices, it must first be published and available on the internet for free.

One may snort and prophesy a road to oblivion for journals lacking an online presence. But online presence is one thing (and possible); open (free) access is still a distant cyber-dream for many otherwise established and reputed journals controlled by middlemen publishers. PLoS, the originator of ALM, is indeed open access (and may they be blessed for that). But here is a perspective from another corner of science and the internet. There are around 15 to 20 journals related to thermal science and engineering research. Although they all have an online presence, not one of them is an open access journal; they are not available for free reading or download by non-subscribers. Publish your research in these journals and your article would never get a fair rating through ALM. Before accepting the universality of ALM, realizing a critical mass of open access journals is a must.

I am not some Elizabethan scientist who is against the internet and ALM and likes to communicate his research findings only by penning epistles to peers. I only disagree with how ALM functions. At present, ALM allows the quality of apples, say, to be judged not even by fruit-likers alone, but gives swaying scope to any eater – with or without knowledge of apples and how to rate them. Leave research (in one field) to be rated by researchers (from that field). For doing this, ALM is redundant; what exists today in the form of the reputed journals of each field is sufficient.

If the ALM indices are to be applied to research articles, the 90-9-1 rule [7] of social media should be reversed in such ratings: an inverted pyramid standing on its tip is to be conceptualized. More importance should be given to opinions or related research by peers – other researchers in the subject area – than to the online public, for whom the research paper is not written.
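As a rough illustration of such a reversal, one could weight the peer-driven signals far above the public-driven ones when combining ALM into a single score. The log-compression and the 90/10 split below are arbitrary assumptions chosen only to show the inverted emphasis, not a proposed standard.

```python
import math

def peer_weighted_score(citations, peer_comments, page_views, bookmarks):
    """Combine ALM signals with a reversed 90-9-1 emphasis (illustrative only).

    Counts are log-compressed so that sheer traffic volume cannot swamp the
    score; peer-driven signals then carry 90% of the weight and public-driven
    signals 10%.
    """
    peer = math.log1p(citations) + math.log1p(peer_comments)
    public = math.log1p(page_views) + math.log1p(bookmarks)
    return 0.9 * peer + 0.1 * public

# Hypothetical article: few citations but heavy online traffic; the peer part
# of the score (about 2.4) still outweighs the public part (about 1.2)
print(round(peer_weighted_score(citations=4, peer_comments=2,
                                page_views=5000, bookmarks=40), 2))
```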

Or judge with ALM not the research paper but its derivative works. The proposed ALM can readily be applied to blogs and online material discussing the original research article. Creating an online buzz is not necessary for a research article to make its mark among co-researchers; but secondary (derivative) content is freely available (open access), is written primarily for the online public, and is supposed to create an online buzz. ALM is the right tool to judge such impact. For instance, the research blogging aggregator [8] could employ ALM to gauge how well secondary expository material reaches the online public.

Quantifying quality has come to stay in academics. The journal impact factor is an unfair measure of the quality of individual research; journal reputation, however, is very much a quality measure within the research community. Article-level metrics are certainly welcome, but it is premature to settle on the proposed ALM indices as sufficient measures of the impact and quality of individual research. Continued experimentation is required before ALM gain credibility as a fair measure. As such, I am happy to be judged by my publications in reputed journals and by refereed research-paper citations. And I am happy to write on the internet about such research.

References

  1. Impact Factor, Wikipedia page.
  2. Article-level metrics at PLoS: addition of usage data – blog article link.
  3. Smith, R., The beginning of the end for impact factors and journals – essay link.
  4. Neylon, C., & Wu, S. (2009). Article-Level Metrics and the Evolution of Scientific Impact. PLoS Biology, 7(11). DOI: 10.1371/journal.pbio.1000242
  5. PLoS Article-Level Metrics – http://article-level-metrics.plos.org/
  6. Ware, M., & Monkman, M. (2008). Peer Review in Scholarly Journals: Perspective of the Scholarly Community – An International Study – link to paper.
  7. The 90-9-1 rule of social media – website link.
  8. Research Blogging – website link.