At some point, if you care about science, you will have to find 30 minutes and a stiff drink and read Alvaro de Menard’s 8,500-word post “What’s Wrong with Social Science and How to Fix It: Reflections After Reading 2578 Papers.” It is that good, that entertaining, that shocking, that discouraging and, in the end, that clarifying about how poor the median paper is in many disciplines, even after the replication “crisis” was supposed to scare researchers into tightening things up.
Menard participated in the Defense Advanced Research Projects Agency (DARPA) Replication Markets project, part of a larger DARPA effort to improve research standards in the social and behavioral sciences. Participants predicted the chance of replication for about 3,000 social science papers published between 2009 and 2018. Why use prediction markets to assess replicability? At least some past research has found that people (both scientists and laypeople) can correctly predict whether a study will hold up or not. The two studies I’ve looked at that used laypeople report that they correctly predicted replicability about 70% of the time.
Some interesting toplines from Menard:
- Just 25% of the papers read were rated as having a better than 76% chance of replicating; 33% had less than a 25% chance. “We’re talking hundreds upon hundreds of terrible papers, and this is just a tiny sample of annual academic production,” he writes.
- Studies that replicate are cited at the same rate as studies that do not, and negative citations are “extremely rare.” That means, Menard posits, that scientists either aren’t reading the papers they’re citing or don’t care about their obvious lack of replicability.
- There’s no relationship between a journal’s impact (as measured by its h-index) and the average expected replication rates of the papers it publishes. Better journal ≠ better replicability.
- The recent “replication crisis” hasn’t improved replication rates. In fact, researchers have been writing about the replication crisis since the ’50s. (The 1950s, just to be clear.)
- DARPA’s Replication Markets participants collectively found economics, education, demography and marketing/management above average for producing replicable papers; evolutionary, cognitive and social psychology as well as criminology were terrible.
Menard comes down very hard on the NSF, which has an $8 billion budget, for being oblivious to marginal-quality research and its consequences. (Why, he asks, is it the Department of Defense that runs Replication Markets?) “The broken incentives of the academy did not appear out of nowhere, they are the result of grant agency policies,” Menard writes. “The importance of metascience is inversely proportional to how well normal science is working, and right now it could use some improvement.” His solutions are aimed at large organizations that could shape the metascience discourse, not at individuals. By that account, it would take a mass ethics-based revolt on the part of scientists to change things from the bottom up.
Except…that in a previous post, Menard argued that most non-replicability stems from bad methodology, which is in the hands of individuals:
> Let’s say that about half of all published research findings are false. How many of those are due to fraud? As a very rough guess I’d say that for every 100 papers that don’t replicate, 2.5 are due to fabrication/falsification, and 85 are due to lighter forms of methodological fraud. This would imply that about 1% of fraudulent papers are retracted.
>
> This is both good and bad news. On the one hand, while most fraud goes unpunished, it only represents a small portion of published research. On the other hand, it means that we can’t fix reproducibility problems by going after fabrication/falsification: if outright fraud completely disappeared tomorrow, it would be no more than an imperceptible blip in the replication crisis. A real solution needs to address the “questionable” methods used by the median scientist, not the fabrication used by the very worst of them.
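To see how small the fabrication slice is, here is a back-of-the-envelope sketch in Python that uses only the figures Menard gives in the passage above. Two assumptions are mine, not his: that “false” and “doesn’t replicate” describe the same pool of papers, and that “fraudulent papers” in his 1% figure means fabrication/falsification only. The variable names are illustrative.

```python
# Figures from the quoted passage (rough guesses by Menard's own description).
share_not_replicating = 0.50          # "about half of all published research findings are false"
fabricated_per_failure = 2.5 / 100    # fabrication/falsification per non-replicating paper
lighter_per_failure = 85 / 100        # "lighter forms of methodological fraud"
retracted_share_of_fraud = 0.01       # "about 1% of fraudulent papers are retracted"

# ASSUMPTION: "false" == "does not replicate", so the shares can be multiplied.
fabricated_share = share_not_replicating * fabricated_per_failure
lighter_share = share_not_replicating * lighter_per_failure

# ASSUMPTION: "fraudulent" in the 1% figure covers fabrication/falsification only.
implied_fraud_retractions = fabricated_share * retracted_share_of_fraud

# What the non-replicating share would be if outright fabrication vanished.
share_without_fabrication = share_not_replicating - fabricated_share


def per_10k(share: float) -> float:
    """Express a share of all published papers as papers per 10,000."""
    return round(share * 10_000, 2)


print("Fabricated papers per 10,000 published:", per_10k(fabricated_share))
print("Lighter methodological fraud per 10,000:", per_10k(lighter_share))
print("Implied fraud retractions per 10,000:", per_10k(implied_fraud_retractions))
print("Non-replicating per 10,000 without fabrication:", per_10k(share_without_fabrication))
```

Under those assumptions, removing every fabricated paper would move the non-replicating share from 50% to 48.75%, which is the “imperceptible blip” Menard describes.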
Meanwhile: If you run a research-driven organization that produces social science, you should acknowledge this crisis in “lighter forms of methodological fraud” in normal science — and use it to your competitive advantage. One step: Establish an external red team to evaluate and find flaws in your research output before you submit it for peer review. Publish their findings and your revisions in response. Because you can’t rely on journals and journalism — on the science-media industrial complex — to separate your shop’s obvious quality from the “sea of trash” that surrounds it. As Menard writes:
> Actually diving into the sea of trash that is social science gives you a more tangible perspective, a more visceral revulsion, and perhaps even a sense of Lovecraftian awe at the sheer magnitude of it all: a vast landfill — a great agglomeration of garbage extending as far as the eye can see, effluvious waves crashing and throwing up a foul foam of p=0.049 papers. As you walk up to the diving platform, the deformed attendant hands you a pair of flippers. Noticing your reticence, he gives a subtle nod as if to say: “come on then, jump in”.