Scientific progress despite irreproducibility: A seeming paradox

analog31 2021-08-18 21:50:11 +0000 UTC [ - ]

In my view, reproducibility is neither strictly necessary nor sufficient for scientific progress. I'm sure if you went back through the literature on something like electromagnetism, you would find results that fail to replicate, yet the theory of electromagnetism if applied properly is remarkably robust.

Scientific results can be strengthened by replication, but that's just one thing, and if that's all you do, then you end up with a science that does nothing but generate independent factoids.

On the other hand, robust science tends to look at a particular phenomenon from many different angles, and manages to connect multiple results together into a framework that can survive retracting individual studies without collapsing. This is how electromagnetism developed.

Sciences that are in the factoid phase are not necessarily junk. Discovery of a psychological "effect" is perfectly scientific and interesting. But some sciences have barely progressed beyond the factoid phase. And if those sciences are also plagued by irreproducibility, then they may embrace scientific methodology without producing a useful scientific knowledge base.

atty 2021-08-18 22:24:27 +0000 UTC [ - ]

As a physicist, I think how we normally describe what you’re talking about is that there must be a robust and explanatory theory that goes along with experimental measurements. The most fruitful areas of research are the ones where theory and experiment exist on approximately the same level of what they can predict or measure. These strong explanatory theories are the things that generate truly robust, reproducible experiments.

In cases where there are significant differences in maturity between theory and experiment, you either end up with theorists playing games with math, making up tons of unconstrained theories (string theory, for instance), or in the other direction you end up with experimentalists measuring anything and everything they can imagine, half of which is probably not theoretically enlightening, waiting for theorists to constrain their space of possible experiments to potentially fruitful paths.

Obviously this is harder for fields that don’t have models as robust as we do in physics, but I’d guess the same phenomenon happens everywhere.

analog31 2021-08-19 00:23:54 +0000 UTC [ - ]

I'm a physicist too. A bad analogy is that theory and experiment are like plywood. The individual layers are weak, but criss crossing them against one another and gluing them together makes something that's quite strong.

atty 2021-08-19 00:50:38 +0000 UTC [ - ]

I actually quite like that analogy - I may use that the next time I have a chance to teach.

wombatmobile 2021-08-19 03:21:04 +0000 UTC [ - ]

Yeah, why is it a "bad analogy"? I think there's something useful in it, possibly even essential.

wombatmobile 2021-08-19 03:18:50 +0000 UTC [ - ]

Clearly, for there to be chronic scientific progress, there must be some value in experimental practice. And yet widespread irreproducibility of results negates that value, by definition.

So, where is the progress coming from?

Your argument that "there must be a robust and explanatory theory that goes along with experimental results" helps make sense of TFA, which otherwise presents as a paradox, or a ruse to excuse shoddy practices.

The presence of a viable theory is a useful heuristic for sorting useful experimental practice from less useful. At least, I imagine it is. How can we know for sure?

Has anyone published a meta analysis that checks for this?

And has that meta analysis been independently reproduced?

dotcommand 2021-08-19 02:47:30 +0000 UTC [ - ]

> In my view, reproducibility is neither strictly necessary nor sufficient for scientific progress.

In your view? Reproducibility is essential to scientific progress because it defines what science is. It's the backbone of science. Hypothesis, testing/experiments. It's what separates science from math, arts and pseudosciences like social "sciences".

> I'm sure if you went back through the literature on something like electromagnetism, you would find results that fail to replicate, yet the theory of electromagnetism if applied properly is remarkably robust.

How would it be "robust" if it couldn't be replicated? How would the theory of electromagnetism be robust if it failed to produce replicable results?

> Scientific results can be strengthened by replication

No. It can only be a "scientific result" if it can be replicated/tested. It's definitional.

> Discovery of a psychological "effect" is perfectly scientific and interesting

No it isn't if it can't be tested/replicated/reproduced. It's not science. It's something else altogether.

> And if those sciences are also plagued by irreproducibility, then they may embrace scientific methodology without producing a useful scientific knowledge base.

There is no "scientific methodology" without testability/replication/reproducibility.

All science is based on hypothesis-testing.

analog31 2021-08-19 04:10:15 +0000 UTC [ - ]

I get what you're saying, but even within the more reputable sciences, the body of known or published results is peppered with results that will fail to replicate. They have not all been identified. There will be more. The good results are mixed with junk results yet somehow science progresses. The robustness of science has to come from somewhere other than the complete absence of junk results. This is what I meant by my first sentence.

I read that in the century after Newton, the French Academy offered a prize for evidence of the failure of Newton's laws. They gave out the prize dozens of times, yet Newtonian physics kept getting stronger and stronger. Eventually they stopped giving out the prize. Many of the contradictions apparently had to do with the lunar orbit, which was poorly understood.

NeuNeurosis 2021-08-19 14:35:12 +0000 UTC [ - ]

Check this article out https://aeon.co/essays/how-popperian-falsification-enabled-t...

Deals with some of your points.

AlotOfReading 2021-08-18 23:16:55 +0000 UTC [ - ]

Of course reproducibility isn't necessary. There's a whole subset of sciences (often called the historical sciences) where reproduction is generally impossible because of the arrow of time. You can't replicate evolving humans again, as an example.

Nonetheless, reproducibility is a really nice property. You have to be incredibly careful with your theory if it can't be justified.

euanc 2021-08-19 15:58:07 +0000 UTC [ - ]

That subset of "sciences" isn't science by definition if it is not reproducible. They may be useful but they are not sciences.

LatteLazy 2021-08-18 22:27:29 +0000 UTC [ - ]

Wait, what electromagnetic results are not reproducible!?

enkid 2021-08-18 22:35:27 +0000 UTC [ - ]

I think the parent poster is saying some EM experiments weren't initially replicated because of the peculiarities of the setup, etc. Another example is that some people couldn't see the Galilean moons initially. This made the initial reports of them somewhat questionable, as some observed them and some didn't.

TeeMassive 2021-08-19 00:05:19 +0000 UTC [ - ]

> Scientific results can be strengthened by replication, but that's just one thing, and if that's all you do, then you end up with a science that does nothing but generate independent factoids.

Scientific results are what make science universal, since they can be reproduced everywhere by anyone.

analog31 2021-08-19 00:22:26 +0000 UTC [ - ]

Indeed, at least until they can't be. The whole "reproducibility" business is because of an emerging "replication crisis," where it has been found that a large swath of published studies fail attempts at replication. Articles about this situation usually say something to the effect of "even in physics," though usually without evidence.

A variety of causes have been proposed, such as abuse / ignorance of statistics, bad experimental design, and even outright chicanery.

inter_netuser 2021-08-18 21:07:50 +0000 UTC [ - ]

Everyone who reads scientific papers as a part of their job knows a good chunk, if not the majority, of papers are fluff, and not only in social sciences.

I’ve been a reviewer for a journal and we’d always receive a good chunk of marketing whitepapers dressed up in just enough jargon to be published.

baron_harkonnen 2021-08-18 22:19:10 +0000 UTC [ - ]

> Proving the claims of rapid progress would be inordinately difficult and beyond the scope of this contribution.

That seems like it's an important part of this issue. If we know that there are massive problems with irreproducibility and we just have to go on an assumption that there has been "rapid progress" then it seems like the most obvious solution to the "paradox" is to question that progress as well.

This is similar to how in the 90s everything was labeled "healthy and fat free!" which we know resulted in food filled with sugar, then making the claim "the paradox of healthy food despite high sugar". It might be worth calling into question the claims of healthiness given the fundamental conditions that produce health aren't present.

Certainly there are places that everyone will agree we've seen progress, but all of these places seem to be where scientific progress is closely tied to commercial application.

One of the best examples that I think everyone will agree has been remarkable has been the storage capacity of batteries. While much of this progress surely starts in the labs, the true measure of progress isn't in papers, it's commercial applications. It frankly doesn't matter at all if the academic research behind battery technology was good or not, because we know these batteries work and are in fact smaller. If your cellphone weighed 10 lbs and ran out of energy in 45 minutes you wouldn't care either way what the research said.

However the answer to this "paradox" isn't just that the market is some force for testing what is real. Anyone who has worked long enough in machine learning knows a fair amount of bullshit not only exists in papers but in products as well.

A better explanation for me is that we live in an age of unprecedented economic and high energy intensity activity. You generate 160,000 TWh of power you're going to see a lot of wonderful things that look like progress. Some real, some illusions, but assuming that science is really the basis for all of this is a fairly large, and unchecked in this article, assumption. Lots of the scientific progress we've made in the last few decades has happened outside of an academic research lab, and assuming "science" is the cause might be a bit naive.

enkid 2021-08-18 22:40:22 +0000 UTC [ - ]

I agree with almost all that you say, but I disagree with the implication that progress in commercial enterprises isn't also a form of scientific progress. I totally agree that the ultimate measure of scientific progress is how much of that is accessible to wider society. That doesn't have to be in a published journal, it's anything that allows us to know about the world and leverage that knowledge for the better.

wolverine876 2021-08-18 20:59:14 +0000 UTC [ - ]

Every human institution is flawed, which imposes costs, but costs needn't bankrupt the enterprise. Every business is flawed, but some succeed (YC could tell you something about that). All software is flawed. The only solution - the only means to success - is to minimize the flaws and to find solutions that deliver positive returns on the investments. Democracy itself is very flawed, but works; so does science. (That's not an excuse for the flaws or a reason to accept them.)

tomp 2021-08-18 21:23:10 +0000 UTC [ - ]

Is this even true? Most of the progress is in "hard sciences" - physics, cellular biology, computational biology, genetics ... where experiments (and presumably reproducibility) are just fine.

epistasis 2021-08-18 21:39:02 +0000 UTC [ - ]

As somebody in computational biology, I think it's important to note that a ton of cellular biology and even computational biology is "not reproducible." This could be anything from bad documentation of how something was run (computational scientists are often terrible at lab notebook culture), to having a model system that lived for a while and no longer exists in the form that initial discoveries were made with. Or it could be that there was just that one batch of reagents that reliably reproduced a phenomenon, and all future orders of the reagents no longer get the same phenomenon, and whatever the difference was is lost to time.

But I would claim that especially in biology, though this is less than ideal for writing up neat little explanations of reality, it is still extremely useful for understanding what's going on. Scientific papers aren't meant to be ever-lasting truth, like a textbook. They are communications amongst specialists about "look here something cool happened that may be useful to you too." It is only through lots of work that a phenomenon can be established as widespread in biology, and sharing information before doing 10 years of work helps accelerate everything.

This is why I roll my eyes at complaints like that one about "foundational cancer research papers not being reproducible." It was written by scientists in industry who wanted to take a new paper and develop an entire drug program around something particularly novel and surprising. Sure, that would be ideal if it worked, but the scientific literature is a lot more than a catalog of ideas ready for commercialization. Scientists in practice understand the limitations of taking a journal paper as gospel. They always try to get something working in their own hands before basing a lot of research off another paper.

SubiculumCode 2021-08-18 23:49:41 +0000 UTC [ - ]

As a fellow scientist, thank you for your dissemination of how science is actually conducted, and what any single paper or series of papers mean. Progress almost always occurs due to the assessment of converging methods and analytic approaches, almost like individual papers are weak classifiers in a big beautiful bag.
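The "weak classifiers in a bag" picture can be sketched numerically. This is a minimal sketch, assuming each paper is an independent vote that is right with the same probability (a strong assumption, since real papers share methods and biases and are therefore correlated). The function name and the numbers are illustrative, not taken from any study:

```python
from math import comb


def majority_vote_accuracy(n_papers: int, p_correct: float) -> float:
    """Probability that a strict majority of n independent votes is right.

    Hypothetical simplification: every paper is correct with the same
    probability, and papers are independent of each other.
    """
    if n_papers % 2 == 0:
        raise ValueError("use an odd number of papers to avoid ties")
    needed = n_papers // 2 + 1  # smallest count that forms a strict majority
    # Sum the binomial probabilities of `needed` or more correct papers.
    return sum(
        comb(n_papers, k) * p_correct**k * (1 - p_correct) ** (n_papers - k)
        for k in range(needed, n_papers + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 11, 101):
        print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

Under those assumptions, the majority of three papers that are each right 60% of the time is right about 64.8% of the time, and the figure keeps climbing as more independent papers are added. Correlated errors would weaken the effect.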

epistasis 2021-08-19 02:46:02 +0000 UTC [ - ]

Exactly, the phenomena are often way too complex for even a series of papers to come up with the proper abstractions to plan the right experiments. The history of chromatin marks is really illustrative of this. We had lots of early experiments to establish the antibodies to even measure the chromatin marks, and then two decades of gathering enough data to lead to useful abstractions. And now it's all converging with the 3D organizational data from Hi-C experiments, and DNA methylation from methyl-Seq, and over the past decades the measurement technology (driven by DNA sequencing) has advanced so much that somebody in 2001 trying to anticipate this trajectory would be amazed. What was a single paper back then? Reproducible? Maybe, but that's much less important than pointing to the correct direction in this n-dimensional search space of what to examine next.

I've never looked at the historical literature from, say, 100 years ago, but I suspect that it's quite the same as today in all regards.

There's a ton of great ideas that never get the attention they deserve, a bunch of ideas that get way too much attention due to fashion or due to influencers that chose the wrong path. But who could we trust to chart a better path? Just like startups, we must accept a high failure rate when exploring the unknown.

querez 2021-08-18 22:04:58 +0000 UTC [ - ]

Counterpoint: I have a friend who works in a biomed lab where results would regularly just be made up. If a lab culture takes 5 years to grow, reproducibility is more of a theoretical than a practical matter. Also, not every reviewer has a particle accelerator in their back yard. Or the money to reproduce large computational models.

A lot of published results only pass peer review because you essentially trust the authors not to have made up their numbers.

microtherion 2021-08-18 22:26:49 +0000 UTC [ - ]

As for biology, I recommend following e.g. the work of Elisabeth Bik to see the hair raising amount of fraud going on, and getting published for years (or, if you like extra spice in your diet, you can follow Leonid Schneider).

For physics, consider how difficult it is to settle the controversies around something like the EmDrive, even among experimenters with solid reputations and impeccable professionalism.

derbOac 2021-08-18 22:10:46 +0000 UTC [ - ]

A lot of the biomedical sciences has similar problems with reproducibility. A survey found similar problems in pharmacology (I think there was one article quoting someone saying the pharmaceutical companies they were familiar with internally budgeted for about 2/3 of published academic articles to be false) and oncology, and others.

Whatever you might have to say about research in psychology, it's also the field primarily turning the microscope on itself. This is part of a tradition in the field -- modern meta-analysis has its origins there.

I'm less familiar with physics but there's a lot of problems with reproducibility in many fields.

tomp 2021-08-18 22:24:59 +0000 UTC [ - ]

> A lot of the biomedical sciences has similar problems with reproducibility.

Indeed, which is why I specifically singled out cell biology. It's the part of medicine that's closest to physics. Oncology, pharmacology etc. study humans/whole bodies, so not only are the experiments more expensive to run and much more noisy, there's also all kinds of ethical issues. Most of these aren't there, or are at least reduced, when dealing with just cells (I didn't expect "cell cultures take a long time to grow" and "the chemicals used are non-reproducible" issues that sibling comments pointed out, so I guess not quite physics)

cafebeen 2021-08-19 02:53:31 +0000 UTC [ - ]

A good example from physics is this observation related by Feynman in 1974:

"[...] It's interesting to look at the history of measurements of the charge of an electron, after Millikan. If you plot them as a function of time, you find that one is a little bit bigger than Millikan's, and the next one's a little bit bigger than that, and the next one's a little bit bigger than that, until finally they settle down to a number which is higher.

Why didn't they discover the new number was higher right away? It's a thing that scientists are ashamed of—this history—because it's apparent that people did things like this: When they got a number that was too high above Millikan's, they thought something must be wrong—and they would look for and find a reason why something might be wrong. When they got a number close to Millikan's value they didn't look so hard. And so they eliminated the numbers that were too far off, and did other things like that..."
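The slow drift Feynman describes can be reproduced with a toy model. This is only a sketch under an assumption of mine, not a reconstruction of what the experimenters did: each reported value is a weighted average of the previously published value and the true value, with `anchor_weight` (a made-up parameter) standing in for how hard people look for errors when they disagree with the accepted number. Units are arbitrary:

```python
def anchored_reports(first_published: float, true_value: float,
                     anchor_weight: float, n_experiments: int) -> list[float]:
    """Toy model: each report is pulled toward the previous published value.

    anchor_weight = 0 means no anchoring (every report equals the true value);
    values close to 1 mean reports barely move away from the prior number.
    """
    if not 0.0 <= anchor_weight < 1.0:
        raise ValueError("anchor_weight must be in [0, 1)")
    reports = []
    previous = first_published
    for _ in range(n_experiments):
        # Blend the accepted number with what the apparatus actually shows.
        reported = anchor_weight * previous + (1.0 - anchor_weight) * true_value
        reports.append(reported)
        previous = reported
    return reports


if __name__ == "__main__":
    print(anchored_reports(0.0, 1.0, 0.5, 3))
```

With a first published value of 0, a true value of 1, and a weight of 0.5, the reports come out as 0.5, 0.75, 0.875: each "a little bit bigger" than the last, settling toward the higher number instead of jumping to it.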

EverywhereTrip 2021-08-18 21:22:12 +0000 UTC [ - ]

The problem isn't irreproducibility in "science".

It is irreproducibility in a few fields. Most notably, nutrition, psychology, and economics.

These are all fields which study humans. The study of humans is far more fraught with bias and ideology. Humans are also independent decision making agents and behave in a way that atoms do not.

elisharobinson 2021-08-18 21:33:32 +0000 UTC [ - ]

Why stop there? Modern medicine is also full to the brim with untested hypotheses, i.e. only recently have we seriously looked at the effects of saline, and the placebo effect is a well documented phenomenon.

adamisom 2021-08-18 21:54:42 +0000 UTC [ - ]

Funny you mention placebo effect. It actually is probably far less of a thing than popularly believed; see https://slatestarcodex.com/2018/01/31/powerless-placebos/

From one point of view that just proves your point more, from another, less. If placebo effect is tenuous then all else equal that’s a good sign for the rest of medicine… but in fact lots of medicine is tenuous. The landmark “Most Published Research Findings Are False” was looking at medical findings iirc.

SubiculumCode 2021-08-19 00:00:36 +0000 UTC [ - ]

I agree (as a psychologist/neuroscientist) that psychology has problems with reproducibility, but I think you pick the incorrect issues (bias and ideology) nor is it that we are bad or dishonest scientists. Rather, I believe it is the aggregating factors of 1) system complexity, 2) non-uniform samples (subjects differing in unexpected/unknown ways between samples), and 3) weak theoretical knowledge.

Another post here made the point that it is necessary to guide empiricism with strong theoretical frameworks. Those strong theoretical frameworks are missing in psychology/neuroscience, because it is a very very new field tackling something very very complicated. We have weak theories, and thus empirical findings may fail to replicate for differences we do not measure (season? time of day? menstruation? obesity? coffee? as examples of things that might not be measured, or is not practical to include into statistical models due to df) and so on. But this is not to say that there has not been progress. There has been lots of progress, and theories are becoming better, ever so slowly. But we build on converging methods spanning basic neuroscience of individual neurons or local networks to MRI studies of macro phenomenology.

In the end, I think of myself more as a cartographer or explorer, much like those that set sail across the sea knowing little about what will be found, BUT DOCUMENTING IT in their naval logs and reports, so that one day those observations can be put together and build the world map. That is why open neuroscience is critical to our field.

sidlls 2021-08-19 00:18:21 +0000 UTC [ - ]

" but I think you pick the incorrect issues (bias and ideology) nor is it that we are bad or dishonest scientists."

Of course there's no dishonesty. And to be bad would require what you do to be science in the first place. From my perspective (physicist), "science" is applied too broadly. What you do isn't useless or unimportant. I just struggle to square it with "science" in any meaningful sense. That's probably not a popular opinion, but whatever: I stand by it.

SubiculumCode 2021-08-19 02:29:36 +0000 UTC [ - ]

What makes physics a science and psychology not? Are our brains made of the same matter as the rest of the universe? Surely so. Is it because psychology seeks to describe/predict aggregated phenomena? This is a non-starter. Take for example the derivation of Boyle's law, pv = k, for pressure. Surely this derivation was the result of real scientific inquiry, despite that equation merely describing the aggregate of underspecified and innumerable interactions? Is it instead because brains are too complex for scientific inquiry? I quite understand that, despite the immense technical prowess of physicists and engineers, the problems they seek to elucidate are, all said and done, easy. Easy because the complexity is low, easy because the problem space is limited, easy because you can see and control more of the entire picture. Is psychology not a science because its problems are just too hard for physicists to tackle? Or to tackle in the way they'd prefer: easy, controlled, identifiable, enumerated??? Is it because you can never *actually* rerun a psychology experiment because people are always different, always changing? Well, please refer to the experiments deriving Boyle's law. Each experiment was not an exact replica of the previous, because the innumerable collisions/interactions of molecules will never take the same path through time, no matter how hard we try, because, well hell, quantum physics, or so I am told. But Boyle did derive some useful relationships, despite the heterogeneity of subjects/runs, and so has psychology.

SantalBlush 2021-08-19 02:51:11 +0000 UTC [ - ]

How are you defining "science"? That would help us understand how it's applied too broadly.

derbOac 2021-08-18 22:12:47 +0000 UTC [ - ]

Surveys suggest it's not limited to a few fields. You might be right about fields studying humans though.

cbozeman 2021-08-19 02:53:03 +0000 UTC [ - ]

I'm going to take the opposite position and posit that the "problem" with studying humans is that we just don't know enough yet.

We can't predict human behavior because our science is simply not advanced enough. I think once we have as strong an understanding of biology as we do something like say, physics or mathematics, we'll find it significantly easier to predict human behavior.

Or put another way, I don't think we can't predict human behavior because humans are "special", but because we're actually kind of dumb on a cosmic scale.

mensetmanusman 2021-08-18 20:25:27 +0000 UTC [ - ]

One example: What would progress in psychology even look like? More people than ever are medicated for mental illness, seems like the opposite of progress.

542458 2021-08-18 20:34:04 +0000 UTC [ - ]

I don’t think number of diagnoses is a great metric for or against the progress of psychology. We could outlaw psychology and have zero diagnoses, but that wouldn’t be progress. If we’re criticizing psychology for not “curing” people I think that’s a bit unfair - people may not be cured, but their conditions can be managed. I personally would love to be cured of what ails me, but that doesn’t mean I’m not grateful for the medications and therapy that help me live a more “normal” life than would otherwise be possible.

nomel 2021-08-18 20:40:26 +0000 UTC [ - ]

With the assumption that their problems are real, what's the alternative? Surgery?

amrcnimgrnt 2021-08-19 03:13:05 +0000 UTC [ - ]

Finding the root causes of mental illness.

It's like protesting overmedication of type two diabetes patients by pointing out that surgery isn't very effective. I agree, it isn't, but we know why the patient has type 2 and how he could cure himself!

Tenoke 2021-08-18 20:45:56 +0000 UTC [ - ]

Clinical Psychology is just one part of the field of Psychology but even there can be progress in many directions - better diagnosis, less reported issues due to better prevention, less side-effects from treatments, or just pure metrics like higher life satisfaction, less suicides etc.

_Microft 2021-08-18 21:15:39 +0000 UTC [ - ]

The field of psychology is far larger than just clinical psychology. So much larger that I would recommend looking it up on Wikipedia instead of listing things here.

bsder 2021-08-19 05:58:54 +0000 UTC [ - ]

> More people than ever are medicated for mental illness, seems like the opposite of progress.

Maybe.

And maybe a bunch of those people were self-medicating with alcohol, nicotine, etc. and now are actually getting real treatment.

Don't get me wrong. I think we overly medicate people--especially school children. I think ADHD is hideously overdiagnosed.

However, I've also seen the flip side. People who were completely disconnected from reality who suddenly are back to their "normal" selves with just a small amount of medication.

miga 2021-08-18 21:33:31 +0000 UTC [ - ]

It has been conjectured before, that probably "most published research findings are false" https://en.wikipedia.org/wiki/Why_Most_Published_Research_Fi.... But science is a body of the findings that are confirmed many times, harder and harder to falsify.

amrcnimgrnt 2021-08-19 03:15:01 +0000 UTC [ - ]

You inspired me to come up with this conjecture:

The number of correct papers grows logarithmically with the number of papers published.

api 2021-08-18 20:55:21 +0000 UTC [ - ]

If you are descending a learning gradient you don't need every step to be perfect. You just need the general progress to be along the gradient. If science is even a little bit more right than wrong, theoretically it will follow the gradient over time (with some wiggles).
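The gradient analogy can be made concrete with a small simulation. This is a minimal sketch, assuming a one-dimensional objective f(x) = x**2 and a walker whose every step points downhill only with probability `p_right`; the function name, step size, and seed are illustrative choices, not anything from the article:

```python
import random


def noisy_descent(start: float, p_right: float, step: float,
                  n_steps: int, seed: int = 0) -> float:
    """Minimise f(x) = x**2 when each step is correct only with probability p_right."""
    rng = random.Random(seed)  # seeded so repeated runs give the same walk
    x = start
    for _ in range(n_steps):
        downhill = -1.0 if x > 0 else 1.0  # sign of -f'(x) for f(x) = x**2
        # With probability 1 - p_right the step goes the wrong way.
        direction = downhill if rng.random() < p_right else -downhill
        x += direction * step
    return x


if __name__ == "__main__":
    for p in (0.5, 0.55, 0.6, 1.0):
        print(p, noisy_descent(10.0, p, 0.05, 5000))
```

When `p_right` is only a little above 0.5 the walk still drifts toward the minimum, just more slowly and with more wiggles. At exactly 0.5 there is no systematic drift at all, which is the "more right than wrong" condition in the comment above.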

I don't think anyone would argue that more than 50% of scientific publications or findings are flawed.

oerpli 2021-08-18 21:35:17 +0000 UTC [ - ]

Some (e.g. Marc Andreessen: https://richardhanania.substack.com/p/flying-x-wings-into-th... ) say, 90% are either wrong or useless. It's roughly what I would guess as well.

AussieWog93 2021-08-19 02:18:50 +0000 UTC [ - ]

Based on my experience as a PhD student (Machine Learning Algorithms for Brain-Computer Interfaces), I would say that useless papers are far more of a problem than wrong ones.

I didn't see a lot of evidence of outright fraud, but a lot of mundane and irrelevant research ("This algorithm happens to work well on this one particular test dataset") presented as if it were a breakthrough that was important to the field.

"Publish or perish" is a helluva drug.

api 2021-08-19 12:12:38 +0000 UTC [ - ]

You get what you incentivize of course.

The question is: what else could we incentivize that would be better and would still allow us to scale science?

One of the hardest things about scaling human activity is figuring out what your goal functions should be. I don’t think raw publication count is a great metric.

The one I personally always used was engineering usefulness, but that only works for those areas that are close to application. It’s an increasingly useless metric as you get far from application, but the stuff that is far from application is essential to our large scale understanding and to future applicable research.

mirker 2021-08-18 21:49:50 +0000 UTC [ - ]

Citations follow some power-law type of distribution, so the 10% (or whatever) of useful ones are basically the high impact papers, anyway. It would be more surprising if citations and usefulness were anti-correlated in some way.

Back to the “gradient” analogy, in this case, papers have a feedback mechanism to suppress less useful papers. Science is not a random flurry of results mashed together.

AussieWog93 2021-08-19 02:25:05 +0000 UTC [ - ]

>It would be more surprising if citations and usefulness were anti-correlated in some way.

I have definitely seen some highly cited and influential papers overstay their welcome and hold the field back.

That's not to say they didn't propel the field forward when they were initially published, just that their overbearing influence causes us to stick to a paradigm that is no longer producing results.

"Science progresses one funeral at a time" and all that.