This is a nice idea, and one I also advocate for, but it's important to keep in mind that reproducibility relies on determinism. Much of what goes into a build pipeline is inherently nondeterministic, because decisions made at compile time can differ from one compilation run to the next, even with identical flags. In fact, that's the point of an optimizing compiler; as many reproducible-build projects have discovered, turning on optimizations pretty much guarantees the output won't be reproducible.
As long as the compiler is not optimizing along the lines of "give this 3 seconds of solving time, then continue if no better solution is found", optimizing is not inherently nondeterministic.
Counterpoint: Arch Linux is 89% reproducible with optimizations enabled. The only thing I see that is difficult to make reproducible is optimization with a timeout.
Instead of using a timeout, an optimization that must be cut off when its cost becomes excessive can keep some kind of operation or size count, where the count is strictly a function of the input. For example, an optimization based on binary decision diagrams (BDDs) can put a ceiling on the number of nodes in the BDD.
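Roughly, the difference looks like this in a Python sketch (the function names and the cost model are made up, purely to illustrate): a wall-clock cutoff depends on the machine, while a step budget depends only on the input.

```python
import time

def cost(candidate):
    # Placeholder cost model, just for the sketch.
    return len(str(candidate))

def optimize_with_timeout(candidates, seconds=3.0):
    # Nondeterministic: how far we get depends on wall-clock speed and load.
    deadline = time.monotonic() + seconds
    best = None
    for c in candidates:
        if time.monotonic() > deadline:
            break
        if best is None or cost(c) < cost(best):
            best = c
    return best

def optimize_with_budget(candidates, max_steps=10_000):
    # Deterministic: the cutoff is strictly a function of the input,
    # analogous to capping the number of nodes in a BDD.
    best = None
    for steps, c in enumerate(candidates, start=1):
        if steps > max_steps:
            break
        if best is None or cost(c) < cost(best):
            best = c
    return best
```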
This is defeatist: compilers do not usually use the system RNG to make decisions, so what's happening is an entirely accidental introduction of differences which then propagate.
There is "intentional input" (contents of the source files), and "accidental input" (source file full paths, timestamps, layout of memory given to you by the OS, and so on). A reproducible build system should give the same output for the same "intentional input".
(The only place where you do see RNG-driven optimization is things like FPGA routing, which is a mess of closed toolchains anyway. It has no place in regular software compilers.)
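As a hedged sketch of what "same output for the same intentional input" means in practice (this is not any particular build tool's API, just an illustration): package a source tree so the result depends only on relative paths and file contents, not on timestamps, ownership, or filesystem enumeration order.

```python
import os
import tarfile

def reproducible_tar(src_dir: str, out_path: str) -> None:
    paths = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            paths.append(os.path.join(root, name))
    paths.sort()  # fixed order, independent of how the filesystem enumerates entries

    # Plain tar on purpose: gzip would stamp its own creation time into the header.
    with tarfile.open(out_path, "w") as tar:
        for path in paths:
            info = tar.gettarinfo(path, arcname=os.path.relpath(path, src_dir))
            # Scrub the "accidental input": timestamps, ownership, umask-dependent mode.
            info.mtime = 0
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            info.mode = 0o644
            with open(path, "rb") as f:
                tar.addfile(info, f)
```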
Well, a lot of things can have an influence here. A multithreaded build, PGO, or even a different iteration order of a hash table inside the code optimizer can be a factor. Things become probabilistic and thus somewhat nondeterministic: the build itself is nondeterministic, even though the runtime/final execution is deterministic.
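The hash-table point is easy to demonstrate even outside a compiler: Python randomizes string hashing per process, so set iteration order alone can differ between otherwise identical runs, and sorting restores determinism.

```python
symbols = {"init", "main", "parse", "emit", "link"}

# Nondeterministic: set iteration order can change from run to run
# (Python randomizes string hashing per process), so anything derived
# from it -- e.g. symbol numbering in an output file -- changes too.
unstable_order = list(symbols)

# Deterministic: impose an order that is a function of the input only.
stable_order = sorted(symbols)

print(unstable_order)
print(stable_order)
```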
"Reproducible" isn't necessary for "not modified from what everyone else gets", and that still makes some attacks FAR harder (and easier to identify, as you know what the "normal" one is). And a published Merkle tree just makes it easier to verify "none of this has changed", as opposed to SHAs on a website that could change any time.
For sure, which is one of the big benefits of git + git tagging, but the issue is that even if you know you received the same binary as someone else, without reproducible and auditable builds you have no idea whether that binary originated from the same code, in the case of a targeted attack.
> For sure, which is one of the big benefits of git + git tagging
That's not enough for serious security though, because git is (still) using SHA-1 instead of SHA-256. You would need something extra, like a signed commit.
There's also the much simpler pitfall of an attacker just creating a branch named the same as a commit hash, in the hope that people will accidentally check out the branch instead of the commit.
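A cheap defense, sketched below in Python (the repo path and hash handling are made up; the git commands themselves are standard), is to check out in detached-HEAD mode and then verify that HEAD really resolves to the expected full commit hash, so a lookalike branch can't silently win:

```python
import subprocess

def checkout_exact_commit(repo_dir: str, expected_sha: str) -> None:
    """expected_sha is the full 40-character commit hash we want."""
    # Check out in detached-HEAD mode at the given name...
    subprocess.run(
        ["git", "-C", repo_dir, "checkout", "--detach", expected_sha],
        check=True,
    )
    # ...then confirm HEAD resolves to exactly that commit, which catches
    # the case where a similarly named branch was resolved instead.
    head = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    if head != expected_sha:
        raise RuntimeError(f"checked out {head}, expected {expected_sha}")
```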