Hugo Hacker News

Evidence of fraud in an influential field experiment about dishonesty

IncRnd 2021-08-17 16:07:28 +0000 UTC [ - ]

In case you honestly can't reach the site, I've archived the page here - really.

https://archive.is/zayEm

function_seven 2021-08-17 15:53:11 +0000 UTC [ - ]

This is some damning info. Not only as evidence of outright fraud, but also of incompetence in that fraud. Uniformly generating something that's obviously not uniform, not even paying attention to fonts (!), and fabricating your result by adjusting not the collected "current" data, but rather the baseline data.
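For context on how blatant the uniformity is: real mileage reports cluster around a typical annual figure, while the suspect column was flat across its whole range. A toy sketch (made-up numbers, not the study's data) shows how little it takes to spot the difference:

```python
import random

random.seed(0)

N, BINS, TOP = 10_000, 10, 50_000

# Fabricated-looking column: mileage drawn uniformly over [0, 50,000)
uniform_miles = [random.uniform(0, TOP) for _ in range(N)]

# Plausible-looking column: mileage clustered around a typical annual figure
# (mean 12,000 / sd 5,000 is an assumption for illustration, not real data)
real_miles = [min(max(random.gauss(12_000, 5_000), 0), TOP - 1) for _ in range(N)]

def bin_counts(data):
    counts = [0] * BINS
    for x in data:
        counts[min(int(x * BINS / TOP), BINS - 1)] += 1
    return counts

def flatness(counts):
    # Ratio of fullest to emptiest bin; near 1 means suspiciously flat
    return max(counts) / max(min(counts), 1)

print(flatness(bin_counts(uniform_miles)))  # near 1: every bin equally full
print(flatness(bin_counts(real_miles)))     # large: the tail bins are nearly empty
```

Dividing the fullest bin by the emptiest is crude, but a flat ratio near 1 on a naturally skewed quantity like annual mileage is a red flag all by itself.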

Ariely's response[1] to this puts the fraud on the insurance company. He rightly notes that better data anomaly testing would have caught this, but I also wonder how the company "knew" which hypothesis to put their thumb on. And where did the impetus come from to double the number of records rather than leave N=6,744?

[1] http://datacolada.org/storage_strong/DanBlogComment_Aug_16_2...

frostburg 2021-08-17 15:55:37 +0000 UTC [ - ]

I don't see how this could have happened without either collusion between the insurance company and the researchers or at least one of the researchers directly tampering with the data.

xondono 2021-08-17 17:46:47 +0000 UTC [ - ]

While I see the possibility of someone at the insurance company being the “motivated party” fudging the data, I agree that it’s difficult to see how one would go about it if one does not know what the researchers would look at.

And if I had been in charge of the study (as would anyone with enough experience, I think), I would have kept that “test” secret from the company.

This to me implies at least blurry boundaries between the researcher (in this case Ariely) and someone at the company.

I will also admit to not being very impartial here, being to this day very unconvinced by Ariely’s research, and especially by his conclusions and the platform he has built around them.

function_seven 2021-08-17 16:11:22 +0000 UTC [ - ]

Right? The insurance company doesn't have a hypothesis they're married to. You'd think they would just want to know an actual technique for improving mileage honesty among their policyholders. It's the researcher who has the motive to show a significant finding. But I can also see Ariely's contact at the firm wanting to "help him out" by providing "good"(!) data.

I'm going through the responses from the other three study authors, and I'm seeing a pattern in their replies. They're all—as gently and politely as possible—laying it at Dan Ariely's feet. (He does so as well in his own response; but in my reading between the lines there's a tiny whiff of skepticism that he was blameless.) Mr. Bazerman's response seems the strongest in this regard.

From Francesca Gino[1]:

> I start all my research collaborations from a place of trust and assume that all of my co-authors provide data collected with proper care and due diligence, and that they are presented with accuracy. In the case of Study 3, I was not involved in conversations with the insurance company that conducted the field experiment, nor in any of the steps of running the study or analyzing the data.

From Max H. Bazerman[2]:

> The first time I saw the combined three-study paper was on February 23, 2011. On this initial reading, I thought I saw a problem with implausible data in Study 3. I raised the issue with a coauthor and was assured the data was accurate. I continued to ask questions because I was not convinced by the initial responses. When I eventually met another coauthor responsible for this portion of the work at a conference, I was provided more plausible explanations and felt more confidence in the underlying data. I would note that this coauthor quickly showed me the data file on a laptop; I did not nor did I have others examine the data more carefully.

From Nina Mazar[3]:

> I want to make clear that I was not involved in conducting the field study, had no interactions with the insurance company, and don’t know when, how, or by whom exactly the data was collected and entered. I have no knowledge of who fabricated the data.

and

> This whole situation has reinforced the importance of having an explicit team contract, that clearly establishes roles, responsibilities, and processes

[1] http://datacolada.org/storage_strong/Gino-memo-data-colada-A...

[2] http://datacolada.org/storage_strong/fraud.resonse.max_.8.13...

[3] http://datacolada.org/storage_strong/20210816_NM-Response2Da...

namelessoracle 2021-08-17 17:02:49 +0000 UTC [ - ]

> The insurance company doesn't have a hypothesis they're married to

While the insurance company as a whole didn't, there may have been someone inside the company who wanted to justify spending funds on this research and claim a feather in their cap for improving the accuracy of self-reporting by X or Y.

duxup 2021-08-17 16:15:56 +0000 UTC [ - ]

Sounds like the classic case of a bunch of folks involved and the important thing is 'nobody's job' and so nobody does it.

You see this in engineering failures when a bunch of companies or groups are involved to limited extents and everyone does their part, but nobody does something important because it wasn't defined who would do that.

rz2k 2021-08-17 16:27:43 +0000 UTC [ - ]

Could it be one executive wanting to get credit for his brilliant initiative by scientifically "proving" how effective it is?

Using the uniform distribution makes it seem like it wasn't one of the researchers or anyone at the insurance company who has studied actuarial science.

frankster 2021-08-18 19:17:35 +0000 UTC [ - ]

The most interesting thing about Dan Ariely's response is that the final line, containing his name, is written in a different font to the rest of his statement!

bediger4000 2021-08-17 15:58:11 +0000 UTC [ - ]

Does this count as an "easter egg"? I mean, fraud in an experiment about dishonesty is irony in its most ferrous form.

meowface 2021-08-17 16:03:50 +0000 UTC [ - ]

It is extremely ironic, but I think the most ironic would be fraud in a research paper about research paper fraud.

prvc 2021-08-17 16:10:24 +0000 UTC [ - ]

Not ironic, as I understand the concept.

IncRnd 2021-08-17 16:26:16 +0000 UTC [ - ]

It is a textbook example of situational irony for people who are on the outside of this situation, looking at how the results differ from what was expected.

jjk166 2021-08-17 16:50:39 +0000 UTC [ - ]

> a state of affairs or an event that seems deliberately contrary to what one expects and is often amusing as a result.

Dishonesty in a study on dishonesty is contrary to what I would expect and, at least in my opinion, amusing as a result.

civilized 2021-08-18 01:56:46 +0000 UTC [ - ]

Dan Ariely's response bothers me. Here's the crucial passage:

> The work was conducted over ten years ago by an insurance company with whom I partnered on this study. The data were collected, entered, merged and anonymized by the company and then sent to me. This was the data file that was used for the analysis and then shared publicly.

On a plain reading, Ariely clearly states here that the data file prepared by the insurance company was the same as the one used for the analysis and shared publicly. But we already know that he modified the file:

- He is listed as the creator and last modifier of the published Excel file.

- In an email to his collaborator Nina, he admitted that he switched the outcome labels in an attempt to make the data easier to understand. So clearly the published file was not identical to the one provided by the insurance company.
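For the curious, that creator/last-modified evidence lives inside the file itself: an .xlsx is just a zip archive whose docProps/core.xml carries the author fields. A stdlib-only sketch with made-up names, showing where the fields sit (not how the investigators actually worked):

```python
import zipfile
import xml.etree.ElementTree as ET

# OOXML namespaces for the core-properties part of an .xlsx package
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Write a minimal core.xml into a zip, standing in for a real workbook
core_xml = (
    '<cp:coreProperties xmlns:cp="{cp}" xmlns:dc="{dc}">'
    "<dc:creator>Example Creator</dc:creator>"
    "<cp:lastModifiedBy>Example Modifier</cp:lastModifiedBy>"
    "</cp:coreProperties>"
).format(**NS)

with zipfile.ZipFile("demo.xlsx", "w") as z:
    z.writestr("docProps/core.xml", core_xml)

# Read the author metadata back out, the same way you'd inspect a real file
with zipfile.ZipFile("demo.xlsx") as z:
    root = ET.fromstring(z.read("docProps/core.xml"))

creator = root.find("dc:creator", NS).text
modifier = root.find("cp:lastModifiedBy", NS).text
print(creator, modifier)
```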

So I worry that Ariely is taking us for a ride here. He had the most incentive to get the desired results, far more than the nameless insurance company.

The original insurance company dataset, if it exists, is probably lying around in an email somewhere. They were, after all, able to provide the final version, which was prepared not too long after the original.

woofie11 2021-08-18 02:35:36 +0000 UTC [ - ]

He indubitably is. And this is common. MIT uses NDAs and non-disparagement agreements to cover stuff like this up, unless you bring it up first in a very public forum. Stanford has researchers who are well-known outright frauds now driving education reform in California. I know of incidents at Brown too.

This stuff is common, especially in elite schools, and especially among researchers with a lot of popular uptake. Stuff like this brings academic jobs.

Universities have no incentive to fire high-impact professors, and internal channels are populated by friends.

Competition for faculty positions is 1000:1 at elite schools; jobs don't go to those who won't cheat.

radicaldreamer 2021-08-18 06:50:18 +0000 UTC [ - ]

Could you elaborate on the Stanford education reform topic?

woofie11 2021-08-18 14:01:12 +0000 UTC [ - ]

Do a quick web search for Dweck:

1) Results don't replicate.

2) Look up her article in Nature, which claims to be preregistered, then look at the actual preregistration. See the data mismatch.

3) Read her book, and look at the claims in the introduction. Compare those to effect sizes. Are the claims supported or contradicted even by her (already baked) data? By replication studies?

Read the Boaler books and studies. You'll find a lot of fake information too. Not just studies, but also historical stuff (e.g. passages about Einstein and other famous folks).

This is kind of an open secret in the ed research community. Everyone knows about it. Few dare talk in public. You can see outcomes for those who did. And for the general public, it's too technical. That a preregistration comes before data collection isn't something one can explain to a popular audience.

This is the culture at MIT, Brown, Stanford, and I believe many other elite schools. Those three are just where I have first-hand information.

frankster 2021-08-18 19:19:28 +0000 UTC [ - ]

His statement is written in two different fonts! His name is written in a different font to the rest of the statement.

This doesn't prove much, but it's an embarrassing coincidence for him.

frostburg 2021-08-17 15:54:09 +0000 UTC [ - ]

I've read all the individual statements by the authors and they don't even try to explain how this happened, they just vaguely bemoan their own trusting nature and deflect.

I don't see how it would be plausible for the insurance company collecting the data to independently tamper with it in that specific way (and getting the typeface wrong) before passing it to the unsuspecting researchers.

Oh, and of course they immediately taught the result to MBAs and executives. I wonder how long it'll take to filter out of the system.

civilized 2021-08-17 23:58:25 +0000 UTC [ - ]

In the comment threads here I see a lot of cynicism and blame directed at the research team, especially Dan Ariely. And I agree that they, and social science researchers in general, need to do more exploratory analysis to validate their data instead of just blindly feeding the sausage into some hypothesis test. But it's hard to believe that any of them knew about the fabrication, because they published the data used to discover the fabrication. They did so voluntarily and not under any duress.

If Ariely were involved in the fraud, wouldn't he have resisted the move to publish the data? It is very easy for researchers to make up excuses to not share data, and very difficult to force them to do so.

In the absence of an answer to this question, I find it easier to believe in shenanigans at the company. I agree it's hard to imagine what their motivation was, but it's even harder to understand why the researchers published their data so readily if one of them fabricated it.

The incompetence of the fraud doesn't really push me one way or the other. Industry and academia are definitely both extremely capable of doing incompetent, dishonest nonsense with data.

function_seven 2021-08-18 00:57:47 +0000 UTC [ - ]

I agree that the inept way the data was fabricated doesn’t point in one direction or the other. But as to the willingness to release the underlying XLS file, it’s also consistent with the other coauthors looking to distance themselves from their cheating partner. If “the fourth author” sincerely believed in the effect of top-first pledges, he’d assume a replication attempt would legitimize the “shortcut” his younger self took. When the original results failed to replicate and the world started looking for the original data, being the only member of the original team to object to releasing it would make things even worse when the fraud was eventually uncovered. Easier to roll with it and work on discrediting the source instead.

Okay, so that last paragraph is pure speculation. But people—even smart people!—do nonsensical things all the time. “If he was guilty, why would he do X?” is rarely a good defense.

All that being said, I’m still more likely to believe the fabrication was done by someone at the insurance company. But it would be better if we got more detail on who Dan was working with. Or the exact method of data delivery. (The file in question shows Dan as the original creator in metadata, and has the Cambria/Calibri issue. How exactly did that happen?)

Not holding my breath on that, though.

DominikPeters 2021-08-18 03:30:57 +0000 UTC [ - ]

At least one other coauthor also had the data, which would have made it more awkward to withhold it. Still, this argument overall does point towards none of the authors knowing that the data was made up.

duxup 2021-08-17 16:11:04 +0000 UTC [ - ]

What would the motivation be here? Just to get a paper out?

Does the insurance company have any involvement / motivation?

Outside of someone at the insurance company who wanted an outcome of the paper to fit some goal of their own, it's hard to imagine the insurance company would "care" about the results enough to mess with the data. Although I'm open to the possibility that someone was just lazy: they wanted another dataset, so someone fabricated it from an existing one just to get it done with.

milliondollar 2021-08-17 19:52:32 +0000 UTC [ - ]

I can COMPLETELY believe it was a lazy analyst at an insurance company. Boss: where's the data that Bob wanted? er... Here! (Source: been inside many insurance companies.)

Hanlon's (and Occam's) Razor all the way here. Laziness / stupidity wins.

duxup 2021-08-17 20:55:05 +0000 UTC [ - ]

Yeah the "fine here's your data, whatever" scenario with some disconnected guy who doesn't care, kinda believable.

civilized 2021-08-18 00:16:08 +0000 UTC [ - ]

It's hard to overstate how disengaged and checked out the data people at big companies can be. The idea that data is valuable enough to merit executive attention is pretty new, and data scientists often find themselves on new teams in old, blue chip companies where data has gotten no respect for many years. And the long-tenured data employees, if there are any, have the attitude you'd expect from this.

ms9 2021-08-19 05:59:37 +0000 UTC [ - ]

Agree. They had nothing to lose really....

bsder 2021-08-17 22:20:45 +0000 UTC [ - ]

100% agree. Especially if the data was going to be hard to collect.

I have fabricated data to shut up my political chain more than once in my life. Why? Because they kept pestering me after being told that the data doesn't exist yet but will exist naturally at some point in the future.

So, I can fight with my management chain because some VP has "collect data about X" on his quarterly goals and simply won't take "No" for an answer. Or I can feed him crap data that he will most likely forget about. And if the data is actually important, the data will fix itself in <n> months when I collect it.

Most probably, the data never gets looked at and I never waste the time collecting it. All good. I'm a wonderful team player that gets his job done. Probability: 95%

Or, possibly, some intern comes to me in 18 months asking why my data seems to be ... off. Cool. Unbelievably, someone is really using that data. I give a "Hrm. I'll go look at that." prioritize the poor intern, collect the data and give them an attaboy for being so diligent. Intern is happy and his boss thinks he's extra diligent. Probability: 4%

Or, if the data was actually important, I collected it and resubmitted it myself at the first point we could realistically collect it because I wanted it for myself, too. Probability: 1%

However, if that fabricated data somehow escaped the company and people depended upon it, yeah, egg yolk on the face all around, and I might get fired. Probability: 0% to a three digit engineering approximation.

civilized 2021-08-18 00:25:43 +0000 UTC [ - ]

The culture at your company must be really awful to have driven you to this. Why can't you just tell these VPs to go pound sand, since data takes some time to collect and can't simply be willed into existence by working long hours? Are you afraid they'll badmouth you to your boss or something?

bsder 2021-08-18 02:07:13 +0000 UTC [ - ]

Because I already told said VP to get lost, loudly, with justification and he still didn't listen.

It wasn't just my group being harassed. It was probably 15+ design groups. Sure, that VP eventually got nuked, but fighting with a shitty VP generally results in you losing your job before he does.

Politics is a thing. You pick your battles--you only get so many bullets. Too many people here on HN think that fighting every single slight makes you honorable. No, it actually makes you a jerk--shitty things happen even at the best places and you need to deal with them without pissing everybody off--it's called being an adult. Sure, at some point enough shitty things happen that you should leave. Prior to that you need to learn how to deal with things so that your team is protected.

Feeding that VP fabricated data meant that he thought I was "good guy" team player. My chain VP got less political heat. My team got an extra positive evaluation for generating data early and going "above and beyond". Everybody on our side got back to doing their job instead of something stupid that would never help us.

All this at the possible cost that I might have to personally say "Whoops, I screwed that up. My bad." 18 months down the road for a single VP who may not even be there that long. I'm gonna take that tradeoff 99 times out of 100.

Now, is that the case here? Don't know and it doesn't look like it. However, don't rule out the fact that someone got "tasked" with something that was obstructing them and did the absolute minimum thing to make it go away.

civilized 2021-08-18 02:22:27 +0000 UTC [ - ]

I appreciate the insight and definitely don't judge you for adapting to survive in your environment. It seems like a really toxic environment though. Really unfortunate that this level of dysfunction is normal in some places.

ms9 2021-08-19 06:01:02 +0000 UTC [ - ]

Thanks for sharing this. It is a good insight. WTF who cares, get the paycheck and whatever is an attitude far more common in industry than in academia.

nn3 2021-08-17 19:13:04 +0000 UTC [ - ]

That was a long standing question of mine: Are people who build their careers around building experiments that mislead others (as a lot of psychological experiments do) more truthful in their papers? At least in this case the answer seems to be no.

superjan 2021-08-17 20:29:43 +0000 UTC [ - ]

Small criticism: the histogram from the UK DoT uses varying-size buckets, which makes the data look like a normal distribution, while the histogram from the dataset is plotted with constant-size buckets. It doesn't affect the conclusion, though.
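A toy illustration of the bucket-width point: if you plot raw counts over unequal buckets instead of counts divided by width (i.e., density), even perfectly uniform data appears to peak in the middle.

```python
import random

random.seed(1)

# Perfectly uniform data over [0, 100)
data = [random.uniform(0, 100) for _ in range(10_000)]

equal_edges  = [0, 20, 40, 60, 80, 100]
uneven_edges = [0, 10, 25, 75, 90, 100]  # wide middle bucket, narrow edges

def counts(edges):
    # Raw count of points falling in each [lo, hi) bucket
    return [sum(lo <= x < hi for x in data) for lo, hi in zip(edges, edges[1:])]

print(counts(equal_edges))   # roughly flat, as uniform data should look
print(counts(uneven_edges))  # looks bell-shaped unless you divide by width
```

The fix is to normalize each count by its bucket width before plotting, which is exactly what a proper density histogram does.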

vincent-toups 2021-08-17 17:26:28 +0000 UTC [ - ]

Whenever I see this stuff I wonder how much fraud is being perpetrated by people with enough statistical know how to make it hard to detect.

derbOac 2021-08-17 19:30:42 +0000 UTC [ - ]

From my own experiences this is the tip of the iceberg, but the majority isn't fraud per se, more like questionable research practices that cumulatively amount to something similar. So maybe not making up data, but fishing (either variables, people, or models) until you find the right combination. On top of that are all the misuses of things, that aren't really fishing, but rather use of methods that produce significant findings, but for reasons other than what is assumed.

woofie11 2021-08-18 02:40:53 +0000 UTC [ - ]

This is exactly right. About half of the people I've seen leave MIT for faculty positions at better schools engaged in these kinds of academic fraud. I only saw one case of outright fabrication, but publishing results which the researchers knew had no support? Very common.

Re-analyzing data 20 times, changing methodologies (median versus mean, handling of outliers, etc.) typically is enough to get an interesting result, and isn't enough to raise alarms. Most people are competent enough to do something like that.
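A toy simulation of that reanalysis game, under made-up choices (mean vs. median vs. trimmed mean on pure noise): reporting the best-looking of several p-values can only inflate the false-positive rate relative to a single pre-committed analysis.

```python
import random
import statistics

random.seed(2)

def perm_p(a, b, stat, n_perm=200):
    # Two-sided permutation p-value for the difference stat(a) - stat(b)
    observed = abs(stat(a) - stat(b))
    pooled = a + b
    hits = 0
    for _ in range(n_perm):
        random.shuffle(pooled)
        if abs(stat(pooled[:len(a)]) - stat(pooled[len(a):])) >= observed:
            hits += 1
    return hits / n_perm

def trimmed_mean(xs):
    return statistics.mean(sorted(xs)[2:-2])  # "handle the outliers"

def simulate(n_studies=100):
    fished = single = 0
    for _ in range(n_studies):
        # Two groups drawn from the SAME distribution: any "effect" is noise
        a = [random.gauss(0, 1) for _ in range(30)]
        b = [random.gauss(0, 1) for _ in range(30)]
        ps = [perm_p(a, b, s)
              for s in (statistics.mean, statistics.median, trimmed_mean)]
        single += ps[0] < 0.05   # honest: one analysis chosen in advance
        fished += min(ps) < 0.05 # fishing: report whichever looks best
    return single, fished

single, fished = simulate()
# Fishing never reports fewer "significant" results than the honest analysis
print(single, fished)
```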

Credit theft is rampant at MIT as well. Financial schemes too. No one does a darned thing about it either.

vlovich123 2021-08-17 17:37:54 +0000 UTC [ - ]

Look up Darrell Huff [1][2]. When there's enough money on the table, there's lots of funding (sincere or otherwise) used to try to establish a scientific counter-narrative. Stuff like this is more run-of-the-mill corruption than what you see with things like climate change, where the underpinnings of the entire industrial economy are at stake (or milder things like Juul).

[1] https://statmodeling.stat.columbia.edu/2012/04/27/how-to-mis... [2] https://en.wikipedia.org/wiki/Darrell_Huff

skynetv2 2021-08-17 16:05:29 +0000 UTC [ - ]

To get around the hug of death - https://archive.is/zayEm

radicaldreamer 2021-08-18 06:53:53 +0000 UTC [ - ]

A lot of eyes are going to be on other papers by Dan Ariely (rightly so) and I suspect we’re going to discover a whole lot more data anomalies.

ms9 2021-08-19 04:45:52 +0000 UTC [ - ]

oops typo on prior comment. I meant to say I do NOT think he is blameless. Just not a fraud either.

ghostbrainalpha 2021-08-18 16:10:15 +0000 UTC [ - ]

I wish the people who exposed this weren't anonymous. I would love to donate to a group who was dedicated to exposing data fraud in studies.

Sort of like a scientific Snopes. Is there anything like that already?

garyfirestorm 2021-08-17 15:51:02 +0000 UTC [ - ]

HN Hug of death

ms9 2021-08-19 04:41:08 +0000 UTC [ - ]

Whether Ariely is honest or not, he is for sure not stupid; and it would be very stupid for him to fabricate data, let alone be so sloppy about it, make it public (which is rare), and then agree to a follow-up paper about how the results did not replicate. Sorry, but a guy as smart as he is would cheat more effectively than that.

That does not mean he is without fault. I suspect his lab is way too big. I also checked out his schedule once, and he was speaking all over the world with a crazy travel schedule at the same time that he was managing that giant lab, coauthoring many papers, and employing way too many research assistants and post docs. Even if the folks on his team were 97% great, that still leaves room for a truly bad or at least very sloppy apple in the cart at that research center. No idea how anyone could do any thorough quality control with that many moving parts going on at once.

I do suspect it was a mess created at the insurance company, or perhaps it was some new kid at the lab. Or perhaps Professor Plum did it with the candlestick in the dining room. I don't have a "clue" (for those that remember the board game "Clue"). I just don't see Ariely doing such a sloppy job of fabricating data for a paper he did not need. I am not saying he would never cheat at all (again, I don't have a clue), just that if he had cheated, he would have been smarter about it.

The bigger issue is that just about all these star researchers end up with MORE MORE MORE disease, which is taking over academia. People are working on more projects, giving more talks and writing more papers with more studies with more moderators and more mediators using more data sources and more research assistants and more post docs while also trying to write more books and give more talks to more audiences. Sorry, but a mess up is inevitable with that many things going on. Still, I hope this does not turn into a witch hunt after one guy, because the truth is that mistakes like this likely happen to all the star academics who have overextended themselves. How could they not?

gameswithgo 2021-08-17 17:21:02 +0000 UTC [ - ]

I can appreciate the irony at least.