Research integrity breaches pose significant financial and reputational risks for publishers, as the recent detection of paper mill activity in published articles, and the resulting upheaval across the industry, has shown.
And while those outside the industry wring their hands and wonder how this could happen, those of us on the inside know better. Publishers face an ever-increasing volume of research output, pressure to keep costs low and APCs affordable, thin margins, and a reliance on unpaid work such as peer review. In that environment, it is all too likely that things will sometimes slip through the cracks.
Developing technologies, particularly AI, play a complex, double-edged role in research integrity. On the one hand, generative tools like ChatGPT, SCIgen, and Mathgen can produce papers based on underlying databases of similar research content scraped from the web and from openly available materials. Admittedly, tools like SCIgen and Mathgen were intended to produce "professionally formatted nonsense," but further development of the underlying code can easily extend that functionality to produce seemingly credible research papers that can pass peer review.
On the other hand, the same technologies that enable the generation of these papers are now being put to use to gatekeep against them and to help identify large-scale “bad actors” in the scholarly research system.
Given the scale at which paper mills are able to produce material, it is increasingly important that publishers have scalable, affordable ways to check the integrity of the papers they are receiving – which means at least some of the heavy lifting must be done by technology. And, as the saying goes, “where there’s muck there’s brass,” so publishing services vendors are now seeing a business opportunity in supporting publishers in areas like image manipulation checks and plagiarism identification.
But, as always, these tools vary in their accuracy and appropriateness, so publishers need to ask some hard questions before committing to any of these solutions. Key questions start with the database on which the AI has been trained: Is that database appropriate for your content? What statistical factors underlie a paper mill identification? Does the tool's sensitivity match your internal standards for rejection?
Publishers also need to consider how these tools fit into their workflow. Although many are sophisticated, they may require manual upload of files rather than seamless integration. Who will manage that work? And once a warning is flagged, what is the workflow for review and decision-making?
In the end, we can’t avoid the impacts of fast-moving technologies on our industry, for both good and bad. What we can do is educate ourselves about how these technologies work, and what our options are to manage the associated risks. Building a greater understanding of the technology, developing processes to support taking action, and sharing and collaborating with other publishers will all be critical in helping us to tackle the significant risk that paper mills present, not only to our bottom lines but to society’s ongoing trust in the integrity of the scientific record.
Maverick offers a program of research integrity services, including training, to help publishers operationalize research integrity and to ensure safeguards are integrated throughout the workflow – from manuscript submission and peer review to publication and data management. For a free consultation, contact your Maverick representative or email email@example.com.
By Nancy Roberts, Maverick Head of Technology and Content
Nancy Roberts is the founder of the diversity and inclusion start-up Umbrella Analytics. She has worked in a variety of production and operational roles across publishing for the past 20 years, following the completion of her postgraduate publishing diploma at West Herts College. She has a Ph.D. in Postcolonial Feminist Literary Theory and an Executive MBA from Cranfield University.