Health data has long been touted as the key to a revolution in medical research, fueled by billions of dollars from tech investors and Silicon Valley giants eyeing new markets.
But in the wake of a botched study on the benefits of an anti-malarial drug against COVID-19, leading academics warn that big sets of health data need to be treated with caution — and can by no means replace tried-and-true scientific methods in the search for medical treatments.
The study, published in leading medical research journal the Lancet in May, linked the use of hydroxychloroquine to increased deaths in patients infected with COVID-19. It immediately led the World Health Organization to pause its own trial on the drug, while some countries went so far as to ban its use as a treatment for the coronavirus.
Yet as questions emerged over the quality of the data used to support the study's conclusions, the authors withdrew their support and the journal took the rare step of retracting the paper.
The ensuing scandal threw a damper on the idea of medical research being based entirely on big sets of health data, at a time when the market for health data is booming and big industrial players are pushing for its use in medical research. Google this week formally notified the European Commission of its plans to buy Fitbit, the maker of wearable health-tracking devices — and its troves of sensitive health data — in a deal that has alarmed privacy campaigners, while Palantir, a U.S. data mining company, recently gained access to U.K. health data.
“The idea that AI and big data will replace randomized trials is starting to become more standard” — Charles Mayo, professor at the University of Michigan
At the heart of the scandal are concerns about the quality of the data, and the disproportionate risk to health from basing medical research on faulty data.
U.S.-based Surgisphere, the company that provided data for the botched study, claimed to have gleaned data from around 700 hospitals across six continents.
When challenged about the quality of the data, it refused to open up its databases for audit, citing privacy and confidentiality agreements with the hospitals.
It wasn't the first study backed by troves of Surgisphere data. Sapan Desai, the company's CEO and one of the study's authors, has long pushed for AI and big data analytics to be used more in health research. “With data like this, do we even need a randomized controlled trial?” he was reported as saying of the hydroxychloroquine study before it was retracted.
However, researchers contacted by POLITICO cautioned against the idea of big data replacing tried-and-tested methods like randomized controlled trials — considered the gold standard in medical research. They also questioned Surgisphere's assertion that it couldn't open up the data for audit.
Neither Surgisphere nor Desai could be reached for comment.
In a media appearance before the paper was retracted, Desai played up his company's role in the research, saying that a study of that “size and quality” had only been possible thanks to Surgisphere's technology.
“If information passes into the hands of a company, what is then the company's responsibility as steward of that data? My own opinion is that I don't think there are yet enough clear guidelines and checks for consistency at this point,” said Charles Mayo, a professor at the University of Michigan and author of a paper on big data in clinical trials.
Mayo added that the Surgisphere scandal is likely to prompt most institutions to put in place additional practices and frameworks governing the use of patient data with commercial companies — noting that oversight of this kind has long been part of clinical research, through institutional review boards and compliance offices.
Big role for big data
Experts questioned Desai's assertion that big data can replace randomized trials — but said the idea is nonetheless gaining traction as more data becomes available and AI capabilities improve.
“It needs a lot of effort to get the data to the point where it can become useful … we don't see data replacing randomized trials. I would say that is misguided,” said Mayo.
Randomized controlled trials aim to reduce bias by randomly assigning subjects to two or more groups. One group receives the intervention, while the other receives nothing or a placebo. The results of the groups are then compared.
While vital for medical research, these kinds of trials can be expensive and take years to complete. In contrast, the appeal of using big data and AI is linked to speed and lower costs, with its promise of large troves of ready-to-go data, sitting in servers around the world.
But those at the coalface of scientific research say that though big data can be useful, it is no match for the rigor of trials under test conditions.
More perilously, big data can throw up misleading results — a risk inherent to an approach where data is often shoddy and ways of checking it are limited. As Tom Treasure, a University College London professor who co-authored a paper looking at the use of big data in research, put it: “You can make some ghastly mistakes from big data.”
For Mayo, a lack of funding for the infrastructure needed to ensure data quality is one risk of relying too heavily on big data in research.
“It is exceedingly hard for institutions to get funding for the infrastructure. People sell the idea that they can do all these things with the data, but in reality the data often isn't good or well organized enough,” Mayo said.
He said that analyzing large data sets is useful in finding associations and validating the results of a study, but is “no substitute” for answering targeted questions in a way that trials do.
Treasure also poured cold water on the suggestion that data analytics can replace randomized trials.
“A new treatment has to be tested in a formal way, not just unearthed from a mass of data,” he said. “I think if you asked experienced people who have a good, rounded view, they will say we will still need trials to pick signal from the noise, but databases are still useful.”
Even so, with analytics capabilities and the amount of available data on the rise, belief in the irreplaceability of randomized controlled trials could be challenged.
“The idea that AI and big data will replace randomized trials is starting to become more standard,” Mayo said.
Theo Arvanitis, professor of digital health innovation at the University of Warwick, said that AI and big data analytics are currently considered complementary to randomized controlled trials, but that the way clinical studies are done “might have to evolve.”
Questions over process
While experts are wary of the use of big data in medical studies, the Surgisphere scandal also exposed the extent to which scientific journals are ill-equipped to properly vet research based on an analysis of vast data sets.
In the case of the Lancet study, more than 100 researchers raised post-publication concerns about the absence of an ethics review, inadequate adjustment for known and measured variables that can change the effect of what's being studied, and data that was inconsistent with government reports of cases and deaths.
According to Natalie Banner, the Wellcome Trust's lead for its Understanding Patient Data project, the scandal shows that journals haven't yet caught up with the new challenges thrown up by the increasing use of big data in research.
“I think every part of the system that is involved in the research process has not quite caught up with the potential of big data research — all the necessary checks and balances that are needed to ensure that it is done ethically and robustly,” she said.
The ability of researchers to get a study based on questionable data published on a reputable platform has shone an unflattering light on the publishing process — and on the journals themselves.
“Patient confidentiality is, rightly, sacrosanct. I don't believe that extends to institutional sources” — Tom Treasure, professor at University College London
UCL's Treasure said it is “extraordinary” that the journals didn't establish the source or the company's permission to use the data, while Paul Elbers, an intensive care physician at Amsterdam UMC, said the scandal has highlighted the “inadequacy” of the peer review system.
“A limited number of people get to look at the paper before it's published, and I'm sure they mean well, but they cannot possibly oversee all …”