Normally, news involves something that is, as the name implies, new. But this week, attention was given to a problem in biology that is anything but new. There have been decades of warnings that researchers sometimes perform studies using cells that have been misidentified—presented as liver cells when in fact they're derived from the spleen, for example. As cell lines are shared and studies build on earlier work, this misidentification has the potential to cause wider problems in the scientific record.
Despite decades of warnings and the existence of a database of problematic cell lines, the problem isn't going away, as emphasized by a study released last week. The new analysis estimates that as many as 10 percent of the papers in the biological sciences may be influenced by cases of mistaken cellular identity. And it's hard to ascribe this to anything other than carelessness and overconfidence on the part of biologists.
How do you end up with the wrong cells? There are a variety of ways. Often, new cell lines are made from tumor or tissue samples. If the sample is not 100-percent pure, there's a chance that something other than what you expect could grow out. In addition, some tumors can be misidentified: a tumor found in the lung may be assumed to be lung tissue when it actually represents a metastasis of a cancer that started in some other tissue. While there are ways of identifying a cell's source (typically, checking the battery of genes active in the cells will indicate their origin), this hasn't been done as consistently as it should be.
Problems don't end with the establishment of the cell line. These cells will often spend years stuck in the back of freezers, where maintaining their identity depends on clear labeling and/or careful bookkeeping. Then there's an issue associated with using them. Some cell lines are extremely well adapted to living in plastic plates (notably HeLa cells). Get a few of these mixed in with the cells you're using through accidental contamination, and they'll gradually take over the population, pushing out the cells you thought you were using.
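To see why a handful of fast-growing contaminant cells can end up dominating a culture, here is a minimal sketch of the arithmetic. It assumes simple exponential growth for both populations, and the doubling times and starting fraction are purely illustrative numbers, not measurements from any study. The function name `contaminant_fraction` is hypothetical.

```python
def contaminant_fraction(initial_fraction, contaminant_doubling_hours,
                         original_doubling_hours, hours):
    """Fraction of the culture made up of contaminant cells after `hours`.

    Assumes both populations grow exponentially without limit, which real
    cultures do not; routine splitting of a culture dilutes both populations
    equally, so it leaves this fraction unchanged.
    """
    contaminant = initial_fraction * 2 ** (hours / contaminant_doubling_hours)
    original = (1 - initial_fraction) * 2 ** (hours / original_doubling_hours)
    return contaminant / (contaminant + original)


if __name__ == "__main__":
    # Illustrative assumption: 1 contaminant cell per 1,000, doubling every
    # 20 hours versus every 30 hours for the intended cells.
    for days in (0, 10, 20, 30, 60):
        share = contaminant_fraction(0.001, 20, 30, days * 24)
        print(f"day {days:2d}: {share:.1%} contaminant")
```

Under these assumed numbers, the contaminant's share grows steadily even though it starts out nearly undetectable. The only thing that matters is the difference in growth rates, which is why a well-adapted line can displace the cells a researcher thinks they are using.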
The ability of HeLa cells to take over was first noted all the way back in the 1960s, when it was estimated that up to 10 percent of the cell lines in use had been replaced by HeLa. Since then, reports of additional problems have surfaced. Cell lines that were thought to be distinct turned out to be identical; "thymus cells" turned out to be a liver carcinoma.
Reports of these instances are appearing more regularly in the scientific literature. To its credit, the scientific community responded, creating the International Cell Line Authentication Committee. Biotech companies sprang up that offered to perform cell line validation services for a fee. Yet little seems to have changed. A 2015 report looking at newly derived cell lines from China found that 85 percent of them were misidentified, and more than 90 percent of the contaminated cell lines in China contained HeLa cells.
Still not going away
So it should be no surprise that, when the issue was recently tackled (again!) by two Dutch scientists, continued problems were apparent. The scientists, Serge Horbach and Willem Halffman, are notable in that they provided a careful quantification of just how large the remaining issue is.
To do so, they downloaded the database of problematic cell lines and identified any papers that seem to have used them, based on the name's appearance in the abstract, title, or references, all of which are publicly available. This is almost certainly an underestimate of the total number of papers. To begin with, most publishers keep the body of the text behind a paywall, so if the cell line is mentioned only there, it wouldn't have been picked up by Horbach and Halffman. They also limited themselves to cell lines in which all known samples are contaminated; there are others in which both valid and problematic lines exist.
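The screening approach described above can be sketched in a few lines. This is not Horbach and Halffman's actual code; it is a minimal illustration that assumes each paper is available as a record with `id`, `title`, `abstract`, and `references` fields, and the cell line names used here (`XYZ-1`, `QRS-7`) are made-up placeholders rather than entries from the real database.

```python
import re


def build_pattern(cell_line_names):
    """Compile one case-insensitive pattern matching any listed name.

    The lookarounds keep a name from matching inside a longer token, so
    the placeholder "XYZ-1" does not fire on "XYZ-10".
    """
    escaped = sorted((re.escape(name) for name in cell_line_names),
                     key=len, reverse=True)
    return re.compile(
        r"(?<![A-Za-z0-9])(" + "|".join(escaped) + r")(?![A-Za-z0-9])",
        re.IGNORECASE,
    )


def flag_papers(papers, cell_line_names):
    """Map paper id -> sorted list of listed cell lines named in its metadata.

    Only the title, abstract, and reference strings are searched, mirroring
    the publicly available fields mentioned in the article.
    """
    pattern = build_pattern(cell_line_names)
    canonical = {name.lower(): name for name in cell_line_names}
    flagged = {}
    for paper in papers:
        text = " ".join([paper.get("title", ""),
                         paper.get("abstract", ""),
                         *paper.get("references", [])])
        hits = {canonical[m.group(1).lower()] for m in pattern.finditer(text)}
        if hits:
            flagged[paper["id"]] = sorted(hits)
    return flagged


if __name__ == "__main__":
    # Hypothetical records and names, for illustration only.
    papers = [
        {"id": "p1", "title": "Signalling in XYZ-1 cells",
         "abstract": "", "references": []},
        {"id": "p2", "title": "Growth of XYZ-10",
         "abstract": "", "references": []},
    ]
    print(flag_papers(papers, ["XYZ-1", "QRS-7"]))
```

A name match in the metadata only suggests that a paper used the cell line, which is why the article says the papers "seem to have used them." And because the full text is not searched, a screen like this will miss papers that name the line only in the body.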
There are several pieces of bad news here. The first is that, simply based on this analysis, there have been nearly 33,000 papers published using cell lines that we know are bad. "Misidentified cell lines keep being used under their false identities long after they have been unmasked," as Horbach and Halffman put it. Following the numbers over time shows that the creation of the database hasn't caused the use of these cells to decrease; at best, it slowed the growth in the study of flawed cell lines.
If you look at the papers that cited these studies of flawed cell lines, then the number climbs to about half a million papers; the authors estimate this cache to be about 10 percent of the scientific literature in the relevant subject areas. And that's after Horbach and Halffman eliminated all cases of the same researchers citing their own papers. A total of 46 of the publications had been cited more than 1,000 times, and there were 2,600 articles with more than 100 citations.
So not only are researchers ignoring known problems when they start their research, but that research is getting published, and the rest of the scientific community is taking it seriously.
Some of these instances are likely to be relatively harmless, but Horbach and Halffman examined a few specific cases that were anything but. In one case, researchers picked a second prostate cell line in order to look at whether a result was general, even though contamination had ensured that the second cells were identical to the first. Two other cases involved cells that were identified as coming from one tissue when they originated somewhere else in the body entirely.
This is a disturbing analysis, but perhaps the most disturbing part of it is that it's anything but news. I knew about this problem more than 15 years ago, even though I never did any work that relied on any of these cell lines. Since awareness shouldn't be a problem, it's likely persisting largely because of a combination of carelessness and a degree of overconfidence, as researchers decide contamination is something that must be a problem for other researchers but not them.
All of which means that biomedical researchers need to be shocked out of this complacency. Horbach and Halffman have a simple solution: publishers should scan their back catalogs for mentions of these cell lines and label each of these papers with what's called an "Editorial Expression of Concern." These are meant to indicate that the publisher isn't entirely confident in the results of the paper. The label would warn other researchers away and would make the papers far less useful for researchers who might otherwise use them for promoting their past accomplishments.
I'd add a second suggestion: grant funding agencies should screen all incoming grant applications for use of these cells and consider it grounds for automatic rejection.
These may seem like drastic measures, but the scale of the problem seems to call for them. Beyond simply the number of publications, biomedical research has become the single largest recipient of non-military research funding in the US, so it's our money going to research that's inherently problematic. Research is a difficult enough activity that many publications will end up wrong despite researchers' best efforts, so why should money still be flowing to researchers who aren't making a best effort?