We've seen a few takes on this kind of issue, but the solution I liked the best was the Linux "developers take full responsibility" approach. The "Assisted-by:" tag was a pretty nice touch too.
The article unfortunately feels more like a rant than a good exploration of the problem space.
I've struggled with this "responsibility" take. What does it mean in the context of an open source project? As far as I understand it, the people who introduce a bug are often not the ones who end up fixing it (though they can be). Is it that if you write enough buggy code you get banned as a contributor? Is it that you're not allowed to say Claude ate my homework?
> Is it that if you write enough buggy code you get banned as a contributor?
If this is a consistent issue, your contributions would (ideally) keep getting parked in a backlog until someone with no connection to you verifies that they're as bug-free as they appear to be (excluding non-obvious security and performance issues).
> Is it that you're not allowed to say Claude ate my homework?
Yes. As the contributor, you should be the first one to look over the code, not someone else.
If the submitter of a PR needs to take full responsibility for the code within, then the code within cannot be LLM-generated because—depending on whether you consider it an original work by the LLM or a resurrected copy of its training data—it’s either not subject to copyright or under someone else’s copyright.
(At least for any coding LLM that isn’t trained entirely on one company’s own code and also offered by that company. That sort of LLM might be able to make the regurgitation argument work for them.)
Thus any project requiring “full responsibility” by submitters may as well just ban submitters from using LLM-based tooling. That’s the tack I’ve taken for my projects, and a number of large projects have taken that stance too.
(Before someone trots out “Technical enforcement of this is impossible!” be assured that such rules are not negated by a lack of technical enforcement; after all, there’s also no way to technically enforce that you didn’t copy someone else’s code and paste it in. But by thinking a lack of technical enforcement matters, you’re outing yourself as someone who will happily violate rules if they think they won’t get caught.)
> the solution I liked the best was the Linux "developers take full responsibility" approach.
The people who can realistically submit a Linux patch that will ever get looked at are already a super select group, thanks to who-you-know network effects.
You can't apply the same system to random open source projects. For people who run random small-to-medium-sized open source projects, the best option is just to ban all unsolicited PRs; otherwise you're going to spend way too much effort sorting through the slop.
I don't think that is true at all. I'm just a random FOSS dev with no connection to the Linux kernel community, and I have gotten two small commits into the Linux kernel.
There is no way you could recreate a convincing enough 90s-era codebase of a Japanese video game + its associated tools + scripts and commented-out codepaths with current AI tools.
I wouldn't be too sure about that. The original decompilations of Mario 64 and Ocarina of Time were done mostly by hand because LLMs weren't really around yet, but these kinds of projects seem perfectly suited for handing the gritty work off to AI: There is a clear output (exact binary recreation) and a straightforward path to get there (look at this assembly code and produce some C code from it). The decompilation of Twilight Princess jumped from very little to basically 100% of core code in the past year alone: https://github.com/zeldaret/tp
I have no doubt that this would be possible for MGS2 as well.
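To make the "assembly in, C out" step concrete, here's a toy example (hand-picked for illustration, not taken from any of these projects, and in x86-64 rather than the console architectures involved): the tool sees a few compiler-emitted instructions and has to propose C that compiles back to exactly the same bytes.

    /* Toy illustration only; the assembly and function name are made up.
     * Input the decompiler/LLM sees (x86-64, gcc -O2 style output):
     *
     *     scale_add:
     *         lea   eax, [rdi + rsi*4]
     *         ret
     *
     * Candidate C it proposes, which gcc -O2 folds back into that single lea: */
    int scale_add(int a, int b)
    {
        return a + b * 4;
    }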
I don't think it's impossible, but it would take a lot of time and a lot of money; likely more time than good-enough models have even been commercially available.
I have been working on an incremental decompilation-based reimplementation (basically how OpenRCT2 was done) of Worms Armageddon for the past 2 months with a lot of help from LLM tools, primarily Claude Code and Ghidra MCP. I've worked on it almost every day, hitting Claude Code Max 5x's 5-hour session limit multiple times a day. Suffice it to say, as a software-rendered, sprite-based 90s PC game, Worms Armageddon is several orders of magnitude simpler than MGS2. Despite that, I think it will be 2-3 more months of work before I can compile a fully independent version of the game.
This is despite the game being an almost ideal candidate for automated RE, as it uses deterministic game logic with built-in checksum checks in replays and multiplayer. I've downloaded all the speedruns I could find for the game (as replay files) and I've retrofitted the replay system into a massively parallel test framework, which simulates over 600 games in about 30 seconds. So Claude can port all game logic independently without much need for manual testing; the replay tests can almost guarantee perfect correctness.
MGS2 doesn't have anything like that, so every ported function requires extensive manual testing. Even with LLM tools, an accurate decomp could take years (unless you're willing to spend thousands of $currency per month on it).
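For anyone curious what such a replay harness can look like, here is a minimal sketch. The binary name, flags, and checksum output are all assumptions for illustration; the real project's setup will differ.

    /* Minimal sketch of a replay-verification harness, assuming a headless
     * build of the game ("./openwa") that takes --replay <file> and prints a
     * final game-state checksum in hex. All names/flags here are hypothetical. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Run one replay and return the checksum it reports (0 on failure). */
    static unsigned long run_replay(const char *path)
    {
        char cmd[512];
        snprintf(cmd, sizeof cmd, "./openwa --headless --replay \"%s\"", path);
        FILE *p = popen(cmd, "r");
        if (!p) return 0;
        unsigned long checksum = 0;
        if (fscanf(p, "%lx", &checksum) != 1) checksum = 0;
        pclose(p);
        return checksum;
    }

    int main(int argc, char **argv)
    {
        int failures = 0;
        /* Each argument is "<replay file>:<expected checksum>", where the
         * expected value was recorded from the original game. A real harness
         * would fork these jobs in parallel instead of running them serially. */
        for (int i = 1; i < argc; i++) {
            char *sep = strrchr(argv[i], ':');
            if (!sep) continue;
            *sep = '\0';
            unsigned long expected = strtoul(sep + 1, NULL, 16);
            if (run_replay(argv[i]) != expected) {
                fprintf(stderr, "MISMATCH: %s\n", argv[i]);
                failures++;
            }
        }
        return failures ? 1 : 0;
    }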
Can you point me to any books or online guides to getting set up with Claude Code + Ghidra MCP? I've looked into it a little and am definitely interested in trying it for a few PS1 games that have been overlooked in the hacking community. I started working on Clock Tower US manually with Ghidra and have made some progress, but nothing impressive - I've got some years of professional programming experience, but never in C and definitely not on hardware like MIPS lol
This is really cool! Your process is compelling, and your choice of game is excellent. I'd like to read a long blog post about your entire journey from the beginning to a working binary once you get there.
As it happens I do have the habit of writing very long blog posts - though none on OpenWA so far. The OpenWA readme file serves as a bit of an introduction, though it's already a month old.
Keep your eyes open for Sonic R too. Sadly, a lot of the online Sonic community has been toxic to the dev for being transparent about using Claude for the majority of the disassembly, even though he's a very talented developer with lots of credits to his name, and the work only took a few weeks compared to a year-plus if done fully manually.
Having followed his bsky during his announcement, he started off pre-emptively dissing his haters that... didn't even exist yet. He was constantly posting memes about how everyone was dissing him and how AI was totally superior (and then posting his angry sessions with Claude when it got something wrong) when most other users were just saying "that's cool man". The thing that made him quit bsky was a (now-deleted) thread someone posted criticizing the weird crash-outs. I think if he had been more... normal about the whole thing, people would have received the project quite a bit more positively.
Decompilation to C (and even C++!) has been done automatically for 2-3 decades at least. I am not sure what has changed in recent years other than people playing fast and loose with copyright (and GitHub allowing it, likely because their LLMs also stand to benefit). Introducing LLMs here is only going to introduce errors, delays and likely push you away from a reliable result.
The challenge here is readability. Reading the TP source leak you link, I think it's even behind the current state of the art, as it's barely above assembly. This is where I suspect even the smallest of LLMs may help, since you don't care that much if it introduces errors.
> Decompilation to C (and even C++!) has been done automatically for 2-3 decades at least.
Only in a very rudimentary sense and definitely not in a working compilation (much less binary equivalent) sense. LLMs have turned this from a gimmick for static analysis into something that actually works pretty well for recompilation projects.
> Only in a very rudimentary sense and definitely not in a working compilation (much less binary equivalent) sense.
Working is the easy part; the hard part is getting something that qualifies as readable C. LLMs do not really help reach the "working compilation" part, but they benefit from it.
We are way past "working compilation" when it comes to LLMs. They are already really good at writing readable, compilable code. The big problem with LLMs is making sure the output binary actually does what you wanted it to do. But if you define the goal not merely as instructions in a vague, unspecific human language but rather as recreating a given set of binary instructions after compilation, this big drawback goes away. So in a sense they are better suited for recompilation projects than for developing new applications.
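To make that concrete, here's a rough sketch of the kind of check that goal enables (file names and the objcopy-based extraction are assumptions, not any particular project's tooling): compile the candidate C with the flags believed to match the original build, dump the emitted machine code, and byte-compare it against bytes lifted from the original binary. A tool or agent can iterate on that pass/fail signal.

    /* Hypothetical compile-and-compare loop for one function. Assumes
     * "original_func.bin" holds the raw bytes lifted from the original binary
     * and "candidate.c" is the proposed C translation; the names are made up. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static size_t slurp(const char *path, unsigned char *buf, size_t cap)
    {
        FILE *f = fopen(path, "rb");
        if (!f) return 0;
        size_t n = fread(buf, 1, cap, f);
        fclose(f);
        return n;
    }

    int main(void)
    {
        /* 1. Compile with the compiler/flags believed to match the original build. */
        if (system("gcc -O2 -c candidate.c -o candidate.o") != 0) return 1;

        /* 2. Dump the emitted machine code (a real matcher would extract the
         *    one target symbol rather than the whole .text section). */
        if (system("objcopy -O binary --only-section=.text candidate.o candidate.bin") != 0) return 1;

        unsigned char a[1 << 16], b[1 << 16];
        size_t la = slurp("candidate.bin", a, sizeof a);
        size_t lb = slurp("original_func.bin", b, sizeof b);

        /* 3. Byte-exact comparison is the pass/fail signal to iterate on. */
        int match = la && la == lb && memcmp(a, b, la) == 0;
        puts(match ? "MATCH" : "MISMATCH");
        return !match;
    }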
My point is that we had been past "working compilation" way before LLMs, and I do not think anything about LLMs helps with it; at best agents use these tools with the same efficiency. I disagree that they're good at writing compilable code, but agree on the readable part.
Which decompiler reliably produced working, high level C/C++ from assembly? I would have loved to use this thing you are describing here 15 years ago. Compilation is inherently lossy, so any system that could have given you this would have needed pretty heavy LLM-like features anyways.
> I disagree that they're good at writing compilable code
That was never part of the discussion, because, as explained several times now, it is irrelevant in this case. The existence of the original binary means all you need to do is match things up, which can be automated completely.
I do not understand what is so hard about "generating working code". Even the free version of Hex-Rays was doing it 15 years ago, and I have written one at my company that I have used for over 30 years. It's actually ... trivial?
The problem is readability. No one in their right mind would call what they generate "C++". Mine still interjects assembler from time to time (and not the new version that GCC supports, but the older MSVC style).
LLMs absolutely do not help with the generate working code part, because this is an exact problem that doesn't need nor benefit from an LLM (other than maybe automating stupid iteration?). They can help with the readability part, because here once you already have a working skeleton it doesn't matter that much if they make mistakes, as it is easy to detect.
I already asked, but I guess I'll need to ask again: Please show me this tool. Hex-rays is certainly the wrong answer, because the decompiled C code usually needs tons of manual cleaning, fixing datatypes and reconstructing function prototypes before you can compile. And even then you can't be sure about functional (much less binary) equivalence. If anything, all these traditional decompilers focused on readability, not recompilability. But even there they were much worse than LLMs.
If what you said was true, the projects mentioned above wouldn't have needed years of arduous work before the age of LLMs came to be.
I get the point, but note that (custom) datatypes and function prototypes are for readability. They are not required for working or functionally equivalent code.
Absolutely. This is just the delusions of a vibe coder at best. Not just with the current generation of AI tools, but essentially never. The conversion from C, C++, Rust or whatever, through post-processing (macros etc.), through IR generation, through compile-time optimizations, through link-time optimizations, to the generated machine code is a one-way street for low-level languages. You can get a pretty close higher-level approximation that matches the flow/logic/structure, but the code will never be anywhere near close to the original source code. I could write the same C++ program in 3 different ways and get identical assembly; how do you go back to the exact source? The answer is that you don't.
Here's the same simple program, written in 3 different ways, producing identical binary compatible code: https://godbolt.org/z/qWrc8fEnn
How does the AI know whether it should produce snippet #1, #2, or #3? It does not. It cannot.
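For a rough idea of the kind of thing the godbolt example shows (this is my own hypothetical variant, not the linked snippet): three syntactically different functions that a typical optimizer canonicalizes to the same machine code, at which point "which one was the original source?" has no recoverable answer.

    /* Hypothetical illustration; whether these come out bit-identical depends
     * on the compiler and flags, but at -O2 a typical compiler canonicalizes
     * all three loop forms to the same code. */
    int sum_for(int n)
    {
        int s = 0;
        for (int i = 0; i < n; i++)
            s += i;
        return s;
    }

    int sum_while(int n)
    {
        int s = 0, i = 0;
        while (i < n) {
            s += i;
            i++;
        }
        return s;
    }

    int sum_goto(int n)
    {
        int s = 0, i = 0;
    top:
        if (i < n) {
            s += i;
            i++;
            goto top;
        }
        return s;
    }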
Who cares? Who said anything about recreating the exact code? You will get usable, compilable, and surprisingly readable source code, in your language of choice, that yields the functional equivalent of the binary.
Barring obvious edge cases that could show up but don't usually, like intentional race conditions. Timing is the one area where things get iffy.
That is quite incredible if that is true. Need to read a bit into that. Can you point towards relevant literature/examples? Also: please see my questions in the comment to your other reply
That's pre-2026 thinking. At this point, with the ability to lash IDA or similar tools to an agentic harness, there is no longer any such thing as a closed-source binary.
I’m interested in how LLMs handle obfuscated code. Throw LLM with IDA MCP at EasyAntiCheat_EOS.sys or the like (as the most common examples of heavily obfuscated software) and see how far they can get.
I find the specific singling out of vibe coding interesting for a different reason: thinking back to just last month, I recall one of the rationales behind the huge DLSS5 backlash was that it ruined the artists' original vision. And here we are a month later being amazed at an emulator that literally lets any casual player do just that through a funky point-and-click interface!
I guess if they added in an MCP server there would probably be a riot.
I often find AI makes me angry and stressed out, especially when it suggests dumb solutions to problems. Honestly makes me wonder if I'm more likely to die early from chronic AI-induced stress rather than dementia.
Isn't there a saying that you only truly know something when you're able to explain it to someone else? When I get angry at LLMs proposing stupid solutions, I see it as a positive thing: "damn, this is garbage, here is a much better solution ..." - I know, not really efficient, but enjoyable :)
The current pricing model (for Plus) feels deliberately confusing to me; I can never really tell if I'm nearing any kind of limit with my account, since nothing really seems to tell me.
Using connectrpc was a pretty refreshing experience for me. Implementing a client for the HTTP stuff at least is pretty easy!
I was able to implement a basic runner for Forgejo using the protobuf spec for the runner + libcurl within a few days.
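For anyone wondering why the HTTP side is so approachable: a Connect unary call is just an HTTP POST to /<package>.<Service>/<Method> with a JSON- (or protobuf-) encoded message body. Here's a minimal libcurl sketch using a made-up ping service rather than the actual runner API:

    /* Minimal sketch of a Connect unary call over plain HTTP with libcurl.
     * The host, service path, and JSON payload are hypothetical placeholders. */
    #include <curl/curl.h>
    #include <stdio.h>

    int main(void)
    {
        curl_global_init(CURL_GLOBAL_DEFAULT);
        CURL *curl = curl_easy_init();
        if (!curl) return 1;

        struct curl_slist *hdrs = NULL;
        hdrs = curl_slist_append(hdrs, "Content-Type: application/json");

        /* Hypothetical endpoint: package "ping.v1", service "PingService", method "Ping". */
        curl_easy_setopt(curl, CURLOPT_URL,
                         "https://example.com/ping.v1.PingService/Ping");
        curl_easy_setopt(curl, CURLOPT_HTTPHEADER, hdrs);
        curl_easy_setopt(curl, CURLOPT_POSTFIELDS, "{\"text\":\"hello\"}");

        /* The response body (JSON-encoded response message) goes to stdout by default. */
        CURLcode res = curl_easy_perform(curl);
        if (res != CURLE_OK)
            fprintf(stderr, "request failed: %s\n", curl_easy_strerror(res));

        curl_slist_free_all(hdrs);
        curl_easy_cleanup(curl);
        curl_global_cleanup();
        return res == CURLE_OK ? 0 : 1;
    }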
Bumped into your project a while back - pretty impressive. I was a little disappointed it seemed to just convert the resources rather than use the original runtime formats (since there are a few features that don't directly translate to glTF), but for a viewer it's perfectly reasonable.
Are you planning on supporting Tribes 1 maps at all?
There's still quite a surprising amount of interest in reverse engineering and extending the life of Torque games. I'm hoping to publicly release a refresh of the original Torque codebase this year, which improves support for modern platforms including WASM.
It's amazingly easy these days to reverse engineer stuff and revive old codebases!
It would be possible to have it decode the .dts and .dif formats on demand - that was my original plan - it's just much less efficient for users, as the .glb files are about 1/5 of the file size on average. (I also assume glTF loading/rendering has had a lot more optimization work put into it than what I'd be able to accomplish.) For these reasons it seemed more productive to build on the Blender addons as a starting point rather than write JavaScript/TypeScript parsers for the original formats. I still ship the original assets alongside the .glb files (meaning they have URLs but just aren't loaded) in case I want to switch someday.
Some of the custom features you may be referring to are implemented as custom properties in the glTF output - like surface flags. "Outside Visible" is one example: it's a flag baked into each .dif surface that determines whether rays can reach it from the outside, so the engine knows whether to apply the map's directional sunlight or just ambient and lightmap lighting. So, even though it technically could try to render with modern PBR, dynamic lighting/shadows and all that, it instead renders as close to the original as possible using the same (or similar) techniques. Side-by-side screenshots against actual Tribes 2 renders are often indistinguishable unless you really know what to look for!
I really wanted to like it, but the UI always put me off. I also tend to prefer a more open development model these days. Thankfully, at least for dev use, Gitea and Forgejo have both come a long way and the CI is pretty decent now (though they still don't have a GUI workflow builder!).
OneDev is not fully open (there are some modules for paid users), but Robin, even though he's the only and main developer, is truly available on every ticket you open or feature you request.
The Settlers 2 was one of my favorite games growing up - really felt like they polished up the mechanics of the first game and made the UI more tolerable.
If anyone is looking for a more modern 3d equivalent but in a slightly different setting, I'd recommend The Colonists.
The way I see it, S2 was pretty lazy. They took a system that was fairly polished already and tinkered with it without understanding how it would impact the whole, like how they made a level-up system that heavily incentivizes a degree of micromanagement the UI isn't built to support.
Or take the pig farm: Clear pros and cons in S1; in S2 it's just a bad bakery. Or the perpetually broken ship navigation, and no way to do naval invasions.