The other day, I was reading a draft of one of my students’ MSc dissertations, this one on pooled testing for Covid-19. The student had added a whole new chapter since I’d last seen it, but it all seemed to fall into some sort of English language uncanny valley. For example, it started off like this:

The fundamental thought of pooled testing is that it permits general wellbeing authorities to test little gatherings – called pools – of individuals utilising just one test. This implies you can test more individuals quicker, utilising less tests and for less cash.

It’s not that it’s unclear, exactly – but it doesn’t really sound like a human being talking, somehow. Perhaps the student’s written English wasn’t that great, but I also wondered if they might have plagiarised it from somewhere. A Google search for the whole section didn’t bring up any hits, so instead I Googled smaller segments to see if I got anything. I started with the odd phrase “general wellbeing authorities” – but that gets 3000 hits on Google, which is no good for tracking down any copying.

But looking further at those Google hits, there seemed to be a lot of news sites I’ve never heard of and skeevy-looking journals. And checking further still, they all seem to be written in the same uncanny-valley English as my student’s work. For example, the first Google link is a news story from the site “25 Hour News” that starts like this:

Coronavirus: Trump Sticks By Discredited Hydroxychloroquine

US President Donald Trump has again guarded the utilization of hydroxychloroquine to avoid coronavirus, negating his own general wellbeing authorities.

So what’s going on here? As you’ve probably guessed by now, what’s going on is “thesaurus plagiarism”; that is, a direct copy-paste job has been disguised by replacing some words (or short clusters of words) with thesaurus-equivalents.

So it turns out that my student’s paragraph…

The fundamental thought of pooled testing is that it permits general wellbeing authorities to test little gatherings – called pools – of individuals utilising just one test. This implies you can test more individuals quicker, utilising less tests and for less cash.

…was in fact copied from this article on The Conversation:

The basic idea of pooled testing is that it allows public health officials to test small groups – called pools – of people using only one test. This means you can test more people faster, using fewer tests and for less money.

We can see where the substitutions have been made: “basic idea” becomes “fundamental thought”, “allows” becomes “permits”, “groups” becomes “gatherings”, “money” becomes “cash” – and my student’s weird phrase “general wellbeing authorities” was originally “public health officials”. It’s not like any of these new phrases are wrong, exactly, but each of them is just a bit off, and putting enough of them together results in a sort of almost-gibberish.

Because the substitution is (mostly) on a word-by-word basis, it doesn’t take into account the context, so sometimes we end up seeing a synonym for the wrong sense of a word, which ends up being genuinely nonsensical. For example, the Conversation article’s sentence…

Higher infection rates mean that more pools come back positive, more people need to be retested, and savings from pooling are lower.

…became in my student’s work:

Higher disease rates imply that more pools return constructive, more individuals should be retested, and reserve funds from pooling are lower.

Some of these substitutions are perfectly benign: “disease” for “infection”, “imply” for “mean”, “individuals” for “people”. But “positive” as in “a positive test” doesn’t mean the same as “constructive”, and replacing “savings” with “reserve funds” doesn’t work at all. (This Slate article, h/t @BristOliver, brings up the spectacular example of “left behind” getting thesaurus-substituted into “sinister buttocks”.)

Once you get your eye in, you can get quite good at decoding these. For example, let’s look again at that 25 Hours News article:

US President Donald Trump has again guarded the utilization of hydroxychloroquine to avoid coronavirus, negating his own general wellbeing authorities.

I reckon “guarded” was originally “defended”, “utilization” was “use”, “negating” was probably “contradicting”, and we now know “general wellbeing authorities” was “public health officials”. “Avoid” is a bit trickier, though: “treat” would work there, but “avoid” isn’t really a synonym for “treat”, so I’m a bit stumped. Anyway, Googling for the bit we have decoded, we find this BBC news article:

US President Donald Trump has again defended the use of hydroxychloroquine to ward off coronavirus, contradicting his own public health officials.

(Our guesses were right, but it was “ward off” that became “avoid”.)

More widely, working through some of the 3000 hits for “general wellbeing advisors”, it seems likely to me that every single one of them is thesaurus-plagiarised from some other article that originally said “public health officials”. The internet is just riddled with this stuff.

Another example: this article in the Journal of Biochemical Toxicology

The event of pathogenic microorganisms in ecological waters is a progressing worry for general wellbeing authorities and those in the water administration territory around the world.

…is surely plagiarised from this article in Applied and Environmental Microbiology:

The occurrence of pathogenic microorganisms in environmental waters is an ongoing concern for public health officials and those in the water management area worldwide.

Exercise for the reader – What do you think was was the original phrasing of this:

General wellbeing authorities changed their tune when it became evident that the infection could spread among individuals indicating no manifestations.

(You can check your answer with the AP.)

Scammy automated news website for advertising clicks is one thing – but I’m surprised to see so many journal articles, even in garbagey journals, producing this nonsense. Like, for the academic doing the “writing”, this seems like much more effort than simply not plagiarising:

(And if you’re writing an article about food safety, you probably shouldn’t start referring to it as “sustenance wellbeing” instead halfway through.)

I later learned that “general wellbeing authorities” is an example of what is known as a “tortured phrase”. A tortured phrase is the result of performing thesaurus-substitution to a commonly used piece of technical jargon that, because of substituting word-at-a-time without context, leaves something nonsensical that no human would ever write. This Nature article gives a short overview of some research that works on spotting thesaurus-plagiarised articles by finding common tortured phrases in them. The article also gives some excellent examples, like “irregular esteem” (from “random value”) and “profound neural organisation” (from “deep neural network”).

Anyway, in conclusion: Anyone who‘s ever used the phrase “general wellbeing authorities” is a plagiarist*, and not a very good one.

(*except me, here)