“Pack it in, mathematicians, someone owes LLVM a million bucks”
hevalon 2021-08-19 13:44:34 +0000 UTC [ - ]
muldvarp 2021-08-19 13:40:08 +0000 UTC [ - ]
Secondly, as others have already explained, endless loops are UB and optimized away, but, thirdly, even if they weren't, C++ integers are more like Z/nZ than Z and checking if Collatz holds in Z/nZ is rather trivial.
stephencanon 2021-08-19 13:55:27 +0000 UTC [ - ]
This is an widely-held misconception, but C++ signed integers are _nothing_ like Z/nZ. Unsigned integers are (except for division), and machine arithmetic often is, but in the C++ abstract machine, signed overflow is undefined behavior.
jasode 2021-08-19 14:19:36 +0000 UTC [ - ]
That soundbite also repeats/paraphrases what thread author Joe Groff said "LLVM considers infinite recursion to be UB,"
But... the key is infinite loops without side effects can be optimized away. Endless loops even without a provable termination (The Halting Problem) are not UB.
But in this case, the value 'x' has a side effect of determining what bool of true/false value to return. That while() loop does not look like it's free of side effects.
agent327 2021-08-19 14:50:48 +0000 UTC [ - ]
That leaves only the loop itself, which can run infinitely long (which would be UB, and the compiler is free to assume that won't happen), or it won't, in which case it will always return true. Thus, like them or not, the rules of C++ allow this particular outcome.
I can't say I like this kind of optimisation. Sure, you can construct neat circus tricks with it like this, but other than that it seems pretty pointless for real-world software, and something that could easily turn a simple mistake into a debugging disaster.
gpderetta 2021-08-19 16:19:18 +0000 UTC [ - ]
bidirectional 2021-08-19 14:24:41 +0000 UTC [ - ]
Denvercoder9 2021-08-19 15:01:03 +0000 UTC [ - ]
detaro 2021-08-19 13:30:47 +0000 UTC [ - ]
AnIdiotOnTheNet 2021-08-19 13:34:32 +0000 UTC [ - ]
tsimionescu 2021-08-19 14:08:34 +0000 UTC [ - ]
I'd also note that this particular footgun only exists in C++, the C standard allows loops that never terminate and never do IO or volatile memory access.
rrss 2021-08-19 14:30:19 +0000 UTC [ - ]
foxfluff 2021-08-19 14:48:56 +0000 UTC [ - ]
Doesn't seem like UB but optimizing effectless loops out is permitted.
AnIdiotOnTheNet 2021-08-19 14:21:26 +0000 UTC [ - ]
Thiez 2021-08-19 14:46:11 +0000 UTC [ - ]
I think the real problem here is that people underestimate the complexity of C and C++. It often feels like it would be easy to guess what the assembly would look like for some snippet of C/C++ code (especially when it's not using any C++ features, such as the loop in the article) when we really should be thinking in terms of the abstract machine that the standard defines.
AnIdiotOnTheNet 2021-08-19 15:03:02 +0000 UTC [ - ]
I just don't think having 'gotcha' rules the compiler exploits to do whatever it wants regardless of programmer intent is a good idea, and that's how C/C++ treats UB. UB should be a compiler error and force the programmer to specify what they want the result to actually be so there are no assumptions on the programmer end and no surprises from the compiler.
Thiez 2021-08-19 15:19:47 +0000 UTC [ - ]
> UB should be a compiler error and force the programmer to specify what they want the result to actually be so there are no assumptions on the programmer end and no surprises from the compiler.
Many types of UB are not easily detected at compile time, or cannot be detected in advance at all. For instance, prompting the user for an integer and incrementing it by 1 is UB when the user enters the maximum value of an integer. So should the compiler forbid adding two numbers unless you explicitly guard against overflow? Perhaps it should, but the resulting language definitely wouldn't look anything like C.
Besides, it's not that the compiler is going out of its way to screw you over with a 'gotcha' rule, it's just optimizing using the assumption that your program is free of UB. As a programmer it is your responsibility that there is no UB. If you feel there are too many pitfalls you could use a different programming language, lobby the standards organization for a standard that has less UB, or use compiler options to detect it (e.g. ubsan for clang).
Denvercoder9 2021-08-19 15:10:39 +0000 UTC [ - ]
AnIdiotOnTheNet 2021-08-19 16:06:28 +0000 UTC [ - ]
That's a fair point, but it still feels wrong to me that the compiler can throw away large chunks of code and change the outcome of the function silently.
foxfluff 2021-08-19 14:39:16 +0000 UTC [ - ]
I think it's easier to define in a standard what's conforming behavior than it is to exactly and precisely define all the possible optimizations a compiler may (or may not) perform. So "undefined behavior" is a decent starting point. Exceptions can be added later, but if you try to spec all allowed optimizations in advance, the spec is going to be massive and also a massive burden.
AnIdiotOnTheNet 2021-08-19 14:55:57 +0000 UTC [ - ]
Yeah, my problem isn't that optimization is changing the code per-se, it's that it is introducing a non-obvious bug by doing so. Ostensibly that loop either returns true or never returns. What if it were provided a number that would never return (assume this is possible given the algorithm provided), but because of the "optimization" now returns true? It literally gives us the wrong answer.
foxfluff 2021-08-19 15:44:51 +0000 UTC [ - ]
On the other hand, it is not hard for me to incorporate into my mental model of a language that a program should eventually "do something" and if it doesn't, it is literally a wrong program.
In fact, that's exactly what I thought when I saw the code in question. Before even realizing what it is about: I read it like a normal program, I saw that it can only ever return true, therefore that code can do only one thing, and all that fluff in there does not impact the end result. So I don't think it's particularly non-obvious, but it does require something from the reader. I'm ok with that, there's a lot in C and C++ you can't just assume.
I call this a "weird program." It's trying to brute-force an answer out of the computer, but it's the kind of an answer that may or may not ever come. Hardly unlike a program that tries to answer the halting problem. Not necessarily an illegitimate program (bruteforcing is a legitimate way to find answers), but not the kind of thing you'd put inside a normal application. So I'm not concerned about normal applications running into this issue much. If a normal application could go into an infinite do-nothing loop, that is almost definitely a bug -- a literally wrong program. The optimization here only changes how that wrongness manifests, which is expected for UB & optimizations.
I do like that C gives you the escape hatch of using a constant controlling expression if you really want a true "potentially-infinite" loop.
AnIdiotOnTheNet 2021-08-19 15:59:27 +0000 UTC [ - ]
Consider this hypothetical: the loop is only potentially infinite because it is written incorrectly. In testing, I would notice an infinite loop and debugging where that is happening is pretty trivial. When the compiler optimizes that away to just always return true, it has made the source of the bug much more difficult to pin down because the code I asked the compiler to make produces a completely different result than what the compiler compiled.
foxfluff 2021-08-19 16:05:38 +0000 UTC [ - ]
Some bugs end up being extremely difficult to debug. I accept that as a fact of life; in exchange, we get pretty decent performance.
Guess what I do when there's a bug I have a hard time pinning down? Recompile with optimizations turned off.
By the way, I don't think I have ever witnessed a bug caused by an accidentally-infinite loop getting optimized out. So making that almost-never-happens situation easier to fix is not worth trading any optimizations for, imo. I'd rather just get a warning from compiler/static analyzer (if the code isn't the way it is due to macro expansion).
tsimionescu 2021-08-19 15:05:06 +0000 UTC [ - ]
AnIdiotOnTheNet 2021-08-19 16:00:54 +0000 UTC [ - ]
alisonkisk 2021-08-19 16:17:30 +0000 UTC [ - ]
cjfd 2021-08-19 13:39:16 +0000 UTC [ - ]
detaro 2021-08-19 13:46:21 +0000 UTC [ - ]
In a valid C++ program, every thread eventually does one of the following:
* terminate
* makes a call to an I/O library function
* performs an access through a volatile glvalue
* performs an atomic operation or a synchronization operation
It's fairly obvious how a loop that has none of these side-effects is thus a target to get turned into one that's guaranteed to terminate.
folli 2021-08-19 13:34:22 +0000 UTC [ - ]
davikrr 2021-08-19 13:34:43 +0000 UTC [ - ]
jasode 2021-08-19 14:06:17 +0000 UTC [ - ]
Not sure why he used the word "recursion" since collatz() is not calling itself.
forgetfulness 2021-08-19 15:23:04 +0000 UTC [ - ]
tpush 2021-08-19 14:29:41 +0000 UTC [ - ]
andrewaylett 2021-08-19 15:41:05 +0000 UTC [ - ]
tomhallett 2021-08-19 15:14:00 +0000 UTC [ - ]
detaro 2021-08-19 16:15:32 +0000 UTC [ - ]
GuB-42 2021-08-19 15:38:19 +0000 UTC [ - ]
In reality, the reason is that out of the two outcomes: true or infinite loop, the infinite loop violates the standard, and therefore, the result must be true. Another way to see it is that the only way for the code to be valid is if the programmer knows that the Collatz conjecture is true, and the compiler trusts the programmer, saying that the compiler knows the answer to the Collatz conjecture is a kind of circular reasoning.
There are other issues here: the numbers are bounded (a 128 bits integer), but the conjecture is not. And no million dollar prize have been assigned to this problem (it is not one of the millenium problems). Also, an infinite loop is not a result, it is actually the textbook example of an undecidable problem.
ColinWright 2021-08-19 13:43:44 +0000 UTC [ - ]
(b) The Collatz Conjecture isn't one of the Millennium problems;
(c) It's using bounded numbers in the range where the Collatz Conjecture is known to be true;
(d) It seems reasonable to say "If this routine exits then the value will be "true";
(e) None of what I say here is deep or new, so when people get excited about this sort of thing it makes me wonder what I've missed.
So ... what have I missed?
stephencanon 2021-08-19 14:01:13 +0000 UTC [ - ]
ColinWright 2021-08-19 14:30:27 +0000 UTC [ - ]
tpush 2021-08-19 14:40:06 +0000 UTC [ - ]
Searching 'collatz conjecture prize' quite literally puts a prize of a bout 120MM Yen (~ $1MM) as the top result.
ColinWright 2021-08-19 14:57:58 +0000 UTC [ - ]
Thanks.
ithinkso 2021-08-19 15:04:51 +0000 UTC [ - ]
[0] https://en.wikipedia.org/wiki/Beal_conjecture
ColinWright 2021-08-19 15:08:14 +0000 UTC [ - ]
<shrug>
Thanks for finding this and letting me know, I've learned something.
lostcolony 2021-08-19 13:45:26 +0000 UTC [ - ]
cabaalis 2021-08-19 14:05:53 +0000 UTC [ - ]
WJW 2021-08-19 14:17:45 +0000 UTC [ - ]
But also a layer deeper, collatz() takes a uint128 as input and the Collatz conjecture has already been proven to be true for all numbers that would fit into a uint128. So LLVM arrives to the correct answer (`return true`) because of the wrong reason, mathematically speaking.
ludocode 2021-08-19 15:13:17 +0000 UTC [ - ]
plasticchris 2021-08-19 15:33:16 +0000 UTC [ - ]
WJW 2021-08-19 15:40:46 +0000 UTC [ - ]
Thiez 2021-08-19 14:17:44 +0000 UTC [ - ]
So the compiler doesn't actually need to figure out if the loop is infinite or not.
Denvercoder9 2021-08-19 14:59:24 +0000 UTC [ - ]
Technically, it doesn't, but it requires programs to make forward progress, which is defined as either terminating, calling I/O functions, accessing a volatile variable or performing an atomic or synchronization operation. Infinite loops that'll eventually do one of these are perfectly valid C++ and don't have to terminate.
HelloNurse 2021-08-19 15:31:40 +0000 UTC [ - ]
Denvercoder9 2021-08-19 15:57:24 +0000 UTC [ - ]
junon 2021-08-19 14:09:45 +0000 UTC [ - ]
cies 2021-08-19 14:05:45 +0000 UTC [ - ]
ColinWright 2021-08-19 14:25:50 +0000 UTC [ - ]
Never mind, on-line forums are really the right place for subtlety.
lostcolony 2021-08-19 14:47:20 +0000 UTC [ - ]
I can't speak to online forums (I use them for so many things), but I'm sure Twitter is the right place for humor, and not the right place to announce a serious belief you've proved a long standing mathematical problem.
ColinWright 2021-08-19 15:03:05 +0000 UTC [ - ]
OK, I guess my sense of humour hasn't translated into your context. Clearly my comment makes you think I didn't get the joke, but I did.
> "I'm guessing you didn't realize that this joke is presented as a joke, ..."
Oh, I did. Really, I did.
I guess we're just talking past each other. <shrug> You didn't get my joke, probably because you think I didn't get the original.
Time to give up.
lostcolony 2021-08-19 15:10:34 +0000 UTC [ - ]
ColinWright 2021-08-19 15:19:26 +0000 UTC [ - ]
I really do think we've just been talking past each other. When something is as obviously "tongue-in-cheek" as the original post, I find it funny when people (a) get the joke, then (b) play it with a completely straight bat. It appears that doesn't cross cultural boundaries in the way I thought it would.
You're not alone ... some of my comments are getting significant numbers of downvotes.
<shrug> ... "Humour".
I'll go back to appearing humourless and commenting purely technically on purely technical things. Meanwhile ... nice to connect with you ... thanks for the responses.
lostcolony 2021-08-19 15:29:17 +0000 UTC [ - ]
ColinWright 2021-08-19 16:07:48 +0000 UTC [ - ]
I appreciate the feedback ... I think the best thing for me is just not to try again, and to be explicit when I know something is a joke, and clear that I'm adding factual information.
Watching people continue to downvote the comments I've made, HN is clearly the wrong place for me to do otherwise.
alisonkisk 2021-08-19 16:12:22 +0000 UTC [ - ]
rtkwe 2021-08-19 14:30:24 +0000 UTC [ - ]
It's not but there are more bounties out there than just the Millenium list. For the Collatz Conjecture specifically there's a 120M Japanese Yen prize which at this moment is a bit more than 1 million USD.
https://www.prnewswire.com/news-releases/bakuage-offers-priz...
ColinWright 2021-08-19 15:05:52 +0000 UTC [ - ]
Thanks.
gameswithgo 2021-08-19 14:38:46 +0000 UTC [ - ]
For instance not all languages that target LLVM will get this result. That is interesting.
tantalor 2021-08-19 13:48:23 +0000 UTC [ - ]
Specifically, Collatz says "if x is even then x /= 2" but the code checks "x is odd".
yissp 2021-08-19 15:02:43 +0000 UTC [ - ]
The input is an unsigned __int128, and I believe Collatz has only been verified up to 2^68.