Bad science: a case study in pollution of the scientific literature

Mistakes that keep on giving

Aug 26, 2022

We’ve all heard of fraud in medical research, but we don’t hear as much about mistakes that quietly get passed on to thousands or tens of thousands of papers. They’re definitely out there, and it’s quite possible that they’re more prevalent, and costly, than fraud.

What’s sad is that they’re often preventable. What’s even sadder is that scientists are often incentivized to keep these mistakes buried.

This article is about a particularly costly and tenacious mistake that has been polluting the scientific literature for decades, with no signs of going away anytime soon.

A case of false identity

Cell lines are cultures of cells that were originally obtained from the tissue of a single individual and kept alive, usually in tubes or petri dishes. Some have been kept in culture for many decades. They’re supposed to be a pure population of cells with minimal genetic variation1, which is useful for generating consistent and reproducible results.

The Hep-2 cell line was first described in 1954 as laryngeal cancer cells, i.e., cancer cells of the larynx, an organ involved in breathing and making sounds.

In 1966, however, Stan Gartler discovered that Hep-2 cells were actually derived from contamination by cervical cancer cells from a cell line called “HeLa,” so named because they had come from a woman named Henrietta Lacks.

“Cervical” means of the cervix, which is the lower part of the uterus in the female reproductive system. That’s obviously a totally different kind of cell.

Gartler published his results in 1968 in Nature. His findings were independently corroborated in 1988.

However, it seems a lot of people didn’t get the memo.

A review from 2019 revealed that since 1954, 1036 out of 5461 publications using Hep-2 cells were still mistakenly referring to them as laryngeal. A more recent study found that as of June 2021, 3163 out of 8497 articles using Hep-2 cells were still describing them as laryngeal.
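To put those two counts on the same footing, the shares can be worked out directly. This is only a back-of-the-envelope sketch in Python: the numbers are the ones quoted above, while the labels and the `share` helper are made up for illustration. The two studies may not have used identical search criteria, so the percentages are not necessarily comparable.

```python
# Counts quoted in the text above:
# (articles describing Hep-2 as laryngeal, all articles using Hep-2)
counts = {
    "2019 review": (1036, 5461),
    "June 2021 study": (3163, 8497),
}


def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


for label, (mislabeled, total) in counts.items():
    print(f"{label}: {share(mislabeled, total)}% of Hep-2 articles say 'laryngeal'")
```

By these counts, roughly a fifth of the Hep-2 articles in the 2019 review, and more than a third in the 2021 study, described the cells as laryngeal.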

Why does this matter?

Cells derived from different tissues look and behave very differently.

All kinds of things differ between types of cancer cells, from their morphology, gene expression, and metabolism to how they respond to different treatments.

So the cell lines used in a study matter a great deal. If you’re studying laryngeal cancer, you’d obviously want to use a laryngeal cancer cell line, not a cervical cancer line.

The problem doesn’t seem to be going away

Even worse, the rate of publications misidentifying Hep-2 cells has increased over the past three decades.

A chart in the 2019 review shows, in red bars, the annual number of articles referring to Hep-2 cells as laryngeal in origin.

Now, we should distinguish between studies where it was important to use cells that were laryngeal in origin, such as studies that focused on laryngeal cancer or respiratory illnesses, vs studies where that was not the case.

In the second case, we might be able to just replace any instances of “laryngeal” with “cervical” in the paper and not compromise the paper or take away from its relevance.

Unfortunately, of the papers that misidentified Hep-2 cells, the share that falls into the first category has increased over recent years.

Another chart in the 2019 review shows, in red columns, the annual number of articles that misidentified Hep-2 cells where the laryngeal nature of the cells was directly relevant to the findings.

The misidentification is still appearing in new work. Here is just a smattering of recent papers still misidentifying Hep-2 cells: there’s this study from July 2019, this one from December 2019, one from June 2020, one from Oct 2020, one from June 2021, and this one from June 2022.

There are probably more; this was just the result of a quick search. Most of these seemed to come from China or Brazil.

[UPDATE 8/29/22: But this is not just a problem coming from China, Brazil, or other emerging economies. A significant proportion of articles using misidentified cell lines comes from the US, Europe, Israel, and Japan. For more, see the section “A peripheral problem?” from here.]

It goes way beyond Hep-2 cells

Hep-2 cells are far from the only cell lines that have been misidentified or contaminated.

HBC and BrCa5 are two cell lines that were thought to be breast cancer cells. They turned out to be rat and HeLa cells, respectively.

The INT 407 cell line, which was originally described as intestinal, was also contaminated by HeLa cells. As of June 2021, 1397 articles were found to have used this cell line inappropriately.

Cell lines that were thought to be from adenoid cystic carcinomas turned out to be HeLa cells.

Girardi heart (putatively heart), KB (putatively epidermoid cancer), Chang liver (putatively liver), WKD (putatively eye) and WISH (putatively amnion) are actually all HeLa cells.

If you’re wondering why HeLa cells have contaminated so many cell lines, it’s because they’re particularly robust and aggressive; they’ll readily displace slower growing cells if they get into a culture.

There are other cell lines that have caused contamination issues.

ECV304 cells, originally described as umbilical vein endothelial cells, are really T24 cells (bladder carcinoma).

Then there’s MDA-MB-435, a popular cell line for modeling breast cancer and referenced in more than 1,200 scientific studies. Turns out it’s actually from a male patient's melanoma (skin cancer).2

How bad is this problem?

Estimates vary, but a Register of Misidentified Cell Lines lists 531 cell lines that are known to be misidentified. The register includes only cell lines for which there is sufficient data to draw firm conclusions; many more are candidates awaiting review.

National Testing Services reported that up to 36% of cell lines were incorrectly designated.

A 2018 analysis found that 804 of 3641 human cell lines held in 16 collections were misidentified, meaning roughly two out of nine cell lines were misidentified.
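As a quick check on the “two out of nine” figure, the ratio can be reduced to the nearest simple fraction. This is a small Python sketch using only the two counts quoted above; the variable names are illustrative.

```python
from fractions import Fraction

misidentified = 804  # misidentified lines reported in the 2018 analysis
total = 3641         # human cell lines examined across 16 collections

observed = Fraction(misidentified, total)

# Closest fraction to the observed share with a single-digit denominator
simple = observed.limit_denominator(9)

print(f"observed share: {float(observed):.3f}")
print(f"nearest simple fraction: {simple}")
```

The observed share comes out at about 22%, which is why two out of nine is a fair shorthand.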

And a 2017 study identified 32,755 articles that had misidentified a cell line.

As they only searched for cell lines known to be misidentified, this is a conservative estimate of the scale of contamination in the primary literature.

Different levels of damage

This kind of mistake leads to different levels of damage.

Some studies, like ones that focus on basic biology and don’t focus on particular cancers, aren’t necessarily undermined by this kind of mistake.

Studies that are rendered useless or of limited use

But this kind of mistake can render some papers useless, or of very limited use.

For example, the authors of this study from Oct 2020 used Hep-2 cells to study the effects of a toxin from a pathogenic bacterium that commonly causes pneumonia.

Pneumonia is caused by an infection in the lungs. The authors presumably chose Hep-2 cells because they thought they were part of the respiratory system.

Now that we know Hep-2 cells are cervical, we may have to accept that this study has very limited application, unless this respiratory bacterium infects cervixes or the toxin can reach cervixes (I suppose it’s possible).

Studies that undermine the foundations of cancer research

At the most damaging end of the spectrum, a mistake like this could muddy the waters or undermine the foundations that cancer research is built upon.

In his book A Conspiracy of Cells, Michael Gold describes a puzzling observation made by cancer researchers: “spontaneous transformation,” a mysterious process by which benign cells suddenly turned malignant.

Much later it became clear that most “transformations” were just takeovers of cultures by HeLa cells.

Gold also describes how cancer researchers had been observing marked similarities between a lot of different kinds of cancer cells:

Researchers had observed that cancer cells shared many fundamental characteristics, and there had begun to emerge a unifying theory; all cancer cells grew relatively quickly and had the same basic nutritional requirements; they seeded new tumors when inoculated into the cheeks of hamsters; many had abnormally shaped chromosomes; and most carried the same surface antigens, proteins on the outside of the cell that stimulate the body’s immune system. Like winning lemons in a casino full of rigged slot machines, these traits kept coming up one after another in dozens of cell lines the scientists thought had come from dozens of cancer patients.

The truth was that they had all been studying one line of cells masquerading as the others: HeLa.

How many blind alleys did these mistakes cause? How many lost treatments or erroneous treatments?

A particularly egregious outcome involving gene therapy

One of the worst outcomes from a misidentified cell line involved South Korea’s first approved form of gene therapy.

A Korean biotech company, Kolon Life Science, had developed a therapy for arthritis that was supposed to consist of a one-time injection of chondrocytes, which are cells that produce and maintain cartilage. These chondrocytes were supposed to have been genetically modified to produce a certain growth factor.

The problem is that the cells being injected into people were misidentified. Rather than chondrocytes, they were GP2-293 cells, which are derived from human embryonic kidney 293 (HEK-293) cells.

It gets worse. HEK-293 cells may be tumorigenic. There’s now a class action lawsuit against the company.

Among those taking part in the class action are patients suffering from stomach, breast, and other forms of cancer, in addition to arthritis (more here).

Jonas Salk’s mistake while trying to develop a cancer vaccine

The eminent Jonas Salk, developer of the Salk polio vaccine and founding director of the Salk Institute, deserves a special mention.

A Conspiracy of Cells describes how Salk attended a scientific meeting3 where he described experiments from the late 1950s in which he had tried to develop a cancer vaccine.

The idea was to inject patients with monkey cells that had particular antigens (surface “identification tags”) in common with cancer cells. The hope was that they would activate the patients’ immune system against the monkey cells and trigger an attack against the cancer as well.

Strangely enough, when he injected the cells into some of his patients, they developed tumors. In retrospect, he believed those cells may not have been harmless monkey cells, but HeLa cells. Apparently he had thought this years earlier, but hadn’t spoken of it till this meeting.

He quickly added that the tumors had dissipated in a few weeks. Apparently he hadn’t helped most of his patients, but neither had he given them more cancer than they already had. And that was the whole point of his talk: to report that even when injected directly into human beings, HeLa cells do not cause cancer.

That Salk had injected cervical tumor cells straight into human beings was disturbing to at least some of the conference attendees. There were some who thought that he had been glib to conclude that there was no danger, simply because the tumors on patients’ arms had disappeared.

So he got some advice:

The conference organizers apparently found this part of Salk’s talk so unsettling they advised him to skip it in the written version to be submitted for publication. Sure enough, when the collection of reports was published… Salk’s paper made no mention of the inadvertent human experiments.

Denial and suppression from scientists

Stan Gartler, a Seattle geneticist, was one of the first to show that cell line contamination was a problem. But when he stood up at a 1966 scientific meeting to present his findings, he was met with fierce resistance.

Then in the 1970s, biologist Walter Nelson-Rees aggressively tried to expose impostor cell lines. Although his findings got some publicity because they were published in Science (after initially having been rejected), many scientists still ignored or denied the evidence.

In A Conspiracy of Cells, Michael Gold describes how this pushed Nelson-Rees to highlight individual labs or individuals that had used cross-contaminated cell lines. This earned him much vilification; other scientists accused him of “unethical conduct,” calling him “unscientific” or “ungentlemanly.” They released statements urging that he be censured or stripped of some of his positions.

Weary from years of making enemies, he left science in 1981.

After this, cell line misidentification went largely unchecked and the problem escalated. For the next 10–20 years, cell banks distributed many cell lines under their false names.

Later on, cell biologist Roland Nardone and geneticist Christopher Korch took up the fight (see here and here). In 2015, Korch4 stated:

All too often, scientists have ignored my findings. Not one of my published papers has led to a retraction by a journal or scientist. Less than 10 corrections have been issued, when each false line I discovered affects the conclusions of hundreds or thousands of papers.

In one case, Korch had contacted a laboratory that had used INT 407 cells as a model for intestinal cells. When Korch let them know that INT 407 was actually HeLa, the principal investigator of the lab acknowledged the error and stopped using INT 407 cells. But to this day, none of the lab’s 37 papers using INT 407 have been retracted or corrected or issued “letters of concern.”

As of June 2021, those papers had a total of 1212 citations, and they continue to be cited by others.

Why is this still happening?

It’s relatively easy to check the authenticity of cell lines, and we have more means of communication than ever before. How could this still be happening?

Maybe part of the reason is that articles about false cell lines are often published in journals that the cancer researchers aren’t reading.

But there is more going on. To a large extent, journals could have put an end to this. Most scientific papers today are viewed online; it would have been easy to put a large warning sign on any articles that had misidentified a cell line.

A few journals have implemented requirements for cell line authentication, but most have not attempted to clean up the past record. Perhaps it’s because the editors of the journals might be implicating some of their own work, or the work of their colleagues. It’s not fun to make enemies after all.
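To give a sense of how little machinery such a warning would need, here is a minimal Python sketch of a text screen. Everything in it is an assumption made for illustration: the `KNOWN_MISIDENTIFIED` table holds only a handful of the lines named in this article, and `flag_cell_lines` is a made-up helper, not any journal’s actual tooling. A real screen would draw on a curated source such as the Register of Misidentified Cell Lines and would need to handle far more naming variants.

```python
import re

# Hypothetical lookup table built only from claims made in this article:
# cell line name -> (how it was originally described, what it was shown to be)
KNOWN_MISIDENTIFIED = {
    "Hep-2": ("laryngeal cancer", "HeLa (cervical cancer)"),
    "INT 407": ("intestinal", "HeLa (cervical cancer)"),
    "Chang liver": ("liver", "HeLa (cervical cancer)"),
    "ECV304": ("umbilical vein endothelial", "T24 (bladder carcinoma)"),
    "MDA-MB-435": ("breast cancer", "melanoma"),
}


def _pattern(name):
    """Build a case-insensitive regex that tolerates 'Hep-2', 'HEp 2', 'Hep2'."""
    parts = re.split(r"[\s-]+", name)
    body = r"[\s-]?".join(re.escape(part) for part in parts)
    # Lookarounds stop the name from matching inside a longer token
    return re.compile(rf"(?<![A-Za-z0-9]){body}(?![A-Za-z0-9])", re.IGNORECASE)


def flag_cell_lines(text):
    """Return one warning string per listed cell line mentioned in the text."""
    warnings = []
    for name, (claimed, actual) in KNOWN_MISIDENTIFIED.items():
        if _pattern(name).search(text):
            warnings.append(
                f"{name}: originally described as {claimed}; reported to be {actual}"
            )
    return warnings


sample = "HEp-2 laryngeal carcinoma cells were cultured alongside ECV304 cells."
for warning in flag_cell_lines(sample):
    print("WARNING:", warning)
```

A screen like this only catches names already on the list, so it would miss cell lines whose contamination has not yet been documented.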

In A Conspiracy of Cells, Nelson-Rees mused that the problem was not just that cells like HeLa were hardy and aggressive:

HeLa cells persist because they have always been helped along by a certain human element in science, an element connected to emotions, egos, a reluctance to admit mistakes…
It’s all human: an unwillingness to throw away hours and hours of what was thought to be good research, worries about jeopardizing another grant that’s being applied for, the hurrying to come out with a paper first.
And it isn’t limited to biology and cancer research. Scientists in many endeavors all make mistakes, and they all have the same problems.

Wade Parks, a virologist, took the story of HeLa and coined a term:

A “HeLa” is a scientific claim that sucks people into a line of work for awhile, a line that is later refuted or shown to be a waste of time. It’s a type of error in science that occurs fairly often. And it will continue to exist.

The story of false cell lines makes it all the more ridiculous to say things like, “The science is settled.” The reality of science is much messier.

1. The reality is a bit more complicated. Cell lines can eventually diverge to the point where their genomes, epigenomes, gene expression, morphology, and even drug sensitivity are no longer identical. For more, see here.

2. Moreover, this cell line (M14) had actually lost its Y chromosome. Apparently that often happens in cell lines derived from males. If you don’t see the Y, you might assume you have a cell line from a female, but that’s not necessarily true.

3. The conference had been about the use of cells in making vaccines.

For the previous few decades, there had been concerns over whether it was safe to use continuous cell lines to grow viruses for vaccines: was there a chance those cells might trigger runaway growth, aka cancer, in recipients of vaccines prepared in those cells?

People like Salk had argued there were adequate methods for separating a vaccine’s active ingredients from the cells they were prepared in; in fact Salk had believed that even if vaccines weren’t filtered at all and whole cancer cells were injected directly into human test subjects, they would do no harm. Of course no researcher would have tried that experiment, at least not intentionally. But Salk had done it accidentally, as he had told people at the meeting.

4. Korch has also said that even after scientists found out that their favorite cell line was contaminated, they sometimes kept studying it. “They look at their line and are convinced it’s still a valid model, because its behavior seems to match their expectations.”