Hugo Hacker News

Reversing LZ91 from Commander Keen

pdw 2021-08-17 21:01:02 +0000 UTC

The article doesn't make this explicit, but this game is just using LZEXE. This was an executable file compressor that was very widely used in the early 90s. An early project by Fabrice Bellard!

TacticalCoder 2021-08-17 22:55:10 +0000 UTC

Oh my goodness! I spent countless hours using LZEXE back in the day and never made the link. I never realized it was made by Fabrice Bellard!

From his homepage:

> "I wrote LZEXE in 1989 and 1990 when I was 17."

Incredible.

_kdave 2021-08-17 22:30:31 +0000 UTC

Also, tools like CUP386 did that for free, but anyway, an interesting read.

albertzeyer 2021-08-17 21:00:00 +0000 UTC

Note that Commander Keen itself was also reverse engineered. The most active project is Commander Genius: https://clonekeenplus.sourceforge.io/ https://github.com/gerstrong/Commander-Genius (You will find some code by me in this. :))

indentit 2021-08-17 18:51:08 +0000 UTC

Forgive me, as I've never disassembled anything myself(!), but would it not be helpful to be able to disassemble an executable into pseudo-code or something (I guess ideally something a bit higher level but re-compilable) alongside the assembly language? It seems to me that it could be much easier to understand what is happening that way, no?

AnIdiotOnTheNet 2021-08-17 19:39:14 +0000 UTC

You'd think so, but it turns out that reversing compiled code in an automated fashion doesn't usually produce very readable results:

https://derevenets.com/examples.html

Akronymus 2021-08-18 11:14:06 +0000 UTC

Sometimes even assembly is too high level: https://www.youtube.com/watch?v=eunYrrcxXfw

mywittyname 2021-08-17 19:50:33 +0000 UTC

Sometimes you get lucky and debug information is left in the binary.

davikrr 2021-08-17 20:10:10 +0000 UTC

Hex-Rays begs to differ.

mips_avatar 2021-08-17 20:16:49 +0000 UTC

Some delta compressors like Google Courgette actually do this.

mschuster91 2021-08-17 19:16:15 +0000 UTC

It's not possible in most cases. Unless you have the exact same version of the compiler that was used, and you can figure out the build settings that were used (especially optimizations, but also things like include order), you can't recreate the original assembler code from pseudo-code or C.

Modernizations are especially tricky. Modern compilers can do all sorts of weird magic, sometimes combining two or more lines of code into one instruction. Old-school compilers don't optimize much, which is part of why performance-critical parts of game engines were written in Assembler for a long time.

Not to mention that some stuff you can do in Assembler has no equivalent in higher-level code (e.g. dealing with raw stack frames), and even Assembler to byte code is nowhere near 1:1 reversible.
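As a rough illustration of the "two or more lines into one instruction" point, here is a minimal C sketch. The function name `scaled_sum` is made up for this example, and the comment about the emitted instruction is an assumption about typical optimizing x86-64 builds, not a guarantee for any particular compiler or flag set.

```c
/* Hypothetical example: two source statements that an optimizing
 * x86-64 compiler will often fold into a single address-computation
 * instruction, something like `lea eax, [rsi + rdi*4]`.
 * Whether that happens depends on the compiler, version and flags. */
int scaled_sum(int a, int b)
{
    int t = a * 4;   /* statement 1: scale */
    return t + b;    /* statement 2: add   */
}
```

Going the other way, a decompiler that sees only the single instruction has no way to know whether the original source was one expression or two statements, which is one reason the round trip is lossy.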

bugfix 2021-08-17 19:47:09 +0000 UTC

You might not get the exact same code, but it is certainly possible to generate C/pseudo-code from the binary.

IDA Pro and Ghidra can identify functions and generate equivalent C code. I know that this is not the original code, but it does help a bit when you are trying to get an idea of what a large function is doing.

kaoD 2021-08-17 20:10:28 +0000 UTC

You're both right.

I've used Ghidra to reverse-engineer a game's serialization format[0] and, even though the C-ish result was marginally better than manually tracking registers across the disassembly, it was far from understandable.

A great deal of the work was cleaning up the resulting C into something that a human would've written instead of the garbage ASM-with-C-syntax that Ghidra produced.

That is nowhere near what OP was suggesting (although useful nonetheless).

[0] https://github.com/alvaro-cuesta/townsclipper
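To make the "ASM-with-C-syntax" versus cleaned-up distinction concrete, here is a small sketch in C. Both functions are hand-written for illustration: the first only imitates the general style of decompiler output (it is not actual Ghidra output from any real binary), and the names `FUN_00401000` and `sum_bytes` are hypothetical.

```c
#include <stdint.h>

/* Imitation of decompiler-style output: generic names, redundant
 * casts, and a guarded do/while instead of a plain for loop. */
uint32_t FUN_00401000(uint8_t *param_1, uint32_t param_2)
{
    uint32_t uVar1;
    uint32_t uVar2;

    uVar1 = 0;
    uVar2 = 0;
    if (param_2 != 0) {
        do {
            uVar1 = uVar1 + (uint32_t)*(uint8_t *)(param_1 + uVar2);
            uVar2 = uVar2 + 1;
        } while (uVar2 < param_2);
    }
    return uVar1;
}

/* The same logic after manual cleanup: meaningful names and the
 * loop shape a human would normally write. */
uint32_t sum_bytes(const uint8_t *data, uint32_t len)
{
    uint32_t total = 0;
    for (uint32_t i = 0; i < len; i++) {
        total += data[i];
    }
    return total;
}
```

The two functions compute the same result; the cleanup work is in recognizing that and choosing names, which the tool cannot do on its own.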

mschuster91 2021-08-17 20:29:40 +0000 UTC

I'm actually reverse-engineering a game myself... interestingly, for me Ghidra produces very good results, way better than IDA did ten years ago. On the other hand, I may simply be lucky because 1996 Borland C++ is a pretty dumb, unoptimizing compiler, and there is absolutely no copy protection or anything like it present in the game, not even dead code elimination.

The only thing Ghidra seems to lack any knowledge of is how to deal with the FS register that is used for SEH on win32... it just marks it as in_FS_offset, with no way to tell it that it can replace FS:[0xXX] with the appropriate TIB access macros.
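For readers unfamiliar with the FS-relative accesses mentioned here, a minimal sketch of the start of the 32-bit thread information block may help. The field names follow the `NT_TIB` structure from the Windows headers, but `Tib32` and `tib32_field_name` are hypothetical names for this example, and pointers are narrowed to `uint32_t` so the offsets match the win32 layout on any host.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the first fields of the 32-bit NT_TIB. On win32 the FS
 * segment base points at this structure, so FS:[0x00] is the head of
 * the SEH handler chain. */
typedef struct Tib32 {
    uint32_t ExceptionList;        /* FS:[0x00] SEH chain head */
    uint32_t StackBase;            /* FS:[0x04] */
    uint32_t StackLimit;           /* FS:[0x08] */
    uint32_t SubSystemTib;         /* FS:[0x0C] */
    uint32_t FiberData;            /* FS:[0x10] (union with Version) */
    uint32_t ArbitraryUserPointer; /* FS:[0x14] */
    uint32_t Self;                 /* FS:[0x18] linear address of the TIB */
} Tib32;

/* Hypothetical helper: turn an FS-relative offset, as seen in a
 * disassembly, into a field name. Returns NULL for offsets this
 * sketch does not cover. */
const char *tib32_field_name(uint32_t fs_offset)
{
    switch (fs_offset) {
    case offsetof(Tib32, ExceptionList):        return "ExceptionList";
    case offsetof(Tib32, StackBase):            return "StackBase";
    case offsetof(Tib32, StackLimit):           return "StackLimit";
    case offsetof(Tib32, SubSystemTib):         return "SubSystemTib";
    case offsetof(Tib32, FiberData):            return "FiberData";
    case offsetof(Tib32, ArbitraryUserPointer): return "ArbitraryUserPointer";
    case offsetof(Tib32, Self):                 return "Self";
    default:                                    return NULL;
    }
}
```

A lookup like this is roughly the kind of annotation one ends up doing by hand when the decompiler only shows a raw FS offset.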

stevekemp 2021-08-18 05:37:21 +0000 UTC

Off-Topic but the domain-name for that site is perfect. (I wonder how many people these days would even recognize it!)