Statistical Significance: Time to Go Cold Turkey

Bethany Brookshire of Science News provides the best overview I’ve seen of the drive to end “statistical significance” as a way to judge the results of scientific experiments and studies.

If you’ve missed this development, you should use Brookshire’s piece to catch up. Getting rid of statistical significance is going to be messy and fraught — the 800 statisticians and scientists who signed a 20 March Nature comment advocating just that are pulling on the end of a very long string. But now that they’re pulling, the unraveling will have major consequences for the framing of scientific results going forward — and the communication of those results.

I’d argue further that, if science doesn’t respond quickly to this call, opponents of science will use it to question results they don’t like — or even sow seeds of doubt about science’s claims to knowledge.

The central issue for a growing number of statisticians and scientists, Brookshire reports, is how science leans on statistical measures such as P values less than 0.05 “as shorthand for scientific quality.”

“First you show me your P less than 0.05, and then I will go and think about the data quality and study design,” Blakeley McShane, one of the Nature comment authors, tells Brookshire. “But you better have that first.”

For the 800 signatories of the Nature comment — and take that in for a second: 800 signatories — that’s the statistical tail wagging the scientific dog.

For example, the comment authors’ analysis of nearly 800 articles across five journals found that more than half wrongly interpreted “non-significant results” as indicating “no effect.”
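
To make that mistake concrete, here’s a minimal sketch in Python — my own illustration with made-up numbers, not from the comment or from Brookshire’s reporting. A small, noisy study of a real effect will often return p > 0.05, yet its confidence interval still leaves plenty of room for a substantial effect:

```python
# Minimal sketch: "non-significant" is not the same as "no effect".
# Hypothetical data: the true treatment effect is 0.5, but the sample is small.
import numpy as np
from scipy import stats

rng = np.random.default_rng()
control = rng.normal(loc=0.0, scale=1.0, size=15)
treated = rng.normal(loc=0.5, scale=1.0, size=15)

t_stat, p = stats.ttest_ind(treated, control)            # two-sample t-test
diff = treated.mean() - control.mean()
se = np.sqrt(treated.var(ddof=1) / 15 + control.var(ddof=1) / 15)
t_crit = stats.t.ppf(0.975, df=28)                        # 95% CI, df = n1 + n2 - 2
lo, hi = diff - t_crit * se, diff + t_crit * se

print(f"p = {p:.3f}; estimated effect = {diff:.2f}; 95% CI = ({lo:.2f}, {hi:.2f})")
# With only 15 per group, p often lands above 0.05 even though the effect is real,
# and the interval typically spans zero while still reaching effects near 1.0.
# Reading that result as "no effect" is exactly the error the authors flagged.
```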

And a P value of 0.05 isn’t a bright red line of quality, McShane tells Brookshire. “There’s no difference between a P value of 0.049 and a P value of 0.051.” (ICYMI: the P=0.05 cutoff dates back to a single 1925 monograph by the statistician Ronald Fisher.)
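
McShane’s 0.049-versus-0.051 point is easy to check for yourself. A minimal sketch (mine, assuming a two-sided z-test) shows the two p-values come from nearly identical test statistics:

```python
# Minimal sketch: the test statistics behind p = 0.049 and p = 0.051
# differ by less than one percent for a two-sided z-test.
from scipy import stats

z_at_p049 = stats.norm.ppf(1 - 0.049 / 2)   # about 1.97
z_at_p051 = stats.norm.ppf(1 - 0.051 / 2)   # about 1.95
print(z_at_p049, z_at_p051)
# A hair's-width difference in the observed statistic decides whether a result
# gets labeled "significant" or dismissed as "non-significant".
```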

Read the Nature comment to get a full sense of how silly — not to mention unscientific — privileging statistical significance can get. The authors write that they don’t want to get rid of statistical analyses — just the reliance on them “in the conventional, dichotomous way — to decide whether a result refutes or supports a scientific hypothesis.”

But what’s the alternative? Using a larger set of criteria for judging quality — and embracing uncertainty instead of pretending you can eliminate it, McShane tells Brookshire.

“What’s the quality of your data? What’s your study design like? Do you have an understanding of the underlying mechanism?” he says. “These other factors are just as important, and often more important, than measures like P values.”

The P value cutoff gives a false sense of certainty about results, McShane says. “Statistics is often wrongly perceived to be a way to get rid of uncertainty,” he says. But it’s really “about quantifying the degree of uncertainty.”
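
One way to see that false certainty: run the same hypothetical experiment many times and watch the p-value bounce around. This is my own illustration in Python, with a made-up effect size and sample size, not something from the article:

```python
# Minimal sketch: identical replications of the same true effect
# produce wildly different p-values.
import numpy as np
from scipy import stats

rng = np.random.default_rng()
pvals = []
for _ in range(20):
    control = rng.normal(0.0, 1.0, size=30)
    treated = rng.normal(0.4, 1.0, size=30)   # same true effect every time
    pvals.append(stats.ttest_ind(treated, control).pvalue)

print(sorted(round(p, 3) for p in pvals))
# The list typically runs from well under 0.01 to well above 0.2.
# That spread is the uncertainty a single p-value quietly hides.
```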

That’s going to be tough, in a world where, as Brookshire reports, 96 percent of the papers in the PubMed database that reported P values in 2015 reported at least one at or below 0.05. And, as she notes, “embracing that uncertainty would change how science is communicated to the public.”

People expect clear yes-or-no answers from science, or want to know that an experiment “found” something, though that’s never truly the case, says Julia Haaf, a psychological methodologist at the University of Amsterdam in the Netherlands. There is always uncertainty in scientific results. But right now scientists and nonscientists alike have bought into the false certainty of statistical significance.

Those teaching or communicating science — and those learning and listening — would need to understand and embrace uncertainty right along with the scientific community. “I’m not sure how we do that,” says Haaf. “What people want from science is answers, and sometimes the way we report data should show that we don’t have a clear answer; it’s messier than you think.”

For me, the flaws in relying on statistical significance are yet another reason scientific experts need to move beyond simply communicating findings and toward applying their expertise in more holistic and relevant ways — including thought leadership or authority content. The Nature authors seem to feel the same:

The objection we hear most against retiring statistical significance is that it is needed to make yes-or-no decisions. But for the choices often required in regulatory, policy and business environments, decisions based on the costs, benefits and likelihoods of all potential consequences always beat those made based solely on statistical significance. Moreover, for decisions about whether to pursue a research idea further, there is no simple connection between a P value and the probable results of subsequent studies.

Takeaway: Start preparing for a world in which you’re communicating what you and your organization know, not just what you find. Science communications needs to find firmer ground for claiming authority than just the latest findings. That includes your expertise-based ideas, paradigms and solutions.