Topic: Simple DS Series vol. 39: The Shouboutai - decompressing text?
Pidgeot
Guest
« on: September 15, 2008, 02:27:16 pm »

Long post follows. Don't say I didn't warn you.

So I've decided to try my hand at translating. I've decided upon Simple DS Series Vol. 39 - The Shouboutai, because it's an interesting game, and there's not that much text to translate - so I figure it's a good place to start.

I have some experience with ROM hacking from Pokémon games, mainly some simple text hacking in the GBA games and various kinds of data extraction in both the GBA and DS games. I've also written tools for parsing NDS .NARC files and the filesystem on DS ROMs, so I'm not a complete newbie.

I'm struggling a bit with figuring out how I'm supposed to go about this one, though. Obviously, the first thing I do is attempt to locate some of the game text. I've based my searches on this screenshot:



I looked for the "365" part, since it seemed easy to find - and it was. Turns out the strings are embedded in the ARM9 binary, this particular part starting at 0x31D60. The number is followed by the Shift-JIS representation of the "day" kanji.

So far, so good. I've determined the file the strings are contained in, and that they use SJIS for text encoding.
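
A minimal sketch of that kind of search, in Python. The helper name find_sjis is made up for illustration, and it assumes the text is stored as plain Shift-JIS bytes with nothing else interleaved:

```python
def find_sjis(data: bytes, text: str) -> list[int]:
    """Return every offset at which `text`, encoded as Shift-JIS, occurs in `data`."""
    needle = text.encode("shift_jis")
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        # Continue one byte later so overlapping matches are reported too.
        pos = data.find(needle, pos + 1)
    return offsets


# Hypothetical usage against the extracted ARM9 binary:
#   with open("arm9.bin", "rb") as f:
#       print([hex(o) for o in find_sjis(f.read(), "365")])
```

Searching for the ASCII digits alone is the safer probe: if the data turns out to be compressed, a longer string may be split up by control bytes and never match.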

Then I spot a problem: when I open the file in Vim to look more closely at the text, it doesn't decode properly:



Notice the XX part. Checking again in the hex editor shows this:



Clearly, the text is compressed - and the 0x00 at 0x31D68 makes me think of LZSS. If I'm not mistaken (which is certainly a possibility), this also seems to fit fairly well with this section I found in the same file (not sure where that text is used, but it's there):



And this brings me to my problem: How do I figure out where the texts begin? Obviously, I can't just run the entire arm9.bin file through an LZSS decompressor - after all, not everything is compressed data. I could try to look for text pointers, but I would - once again - not be sure where to start. After all, if this is LZSS, it would have to be relative to the decompressed string, and I don't know where in RAM it decompresses to. I don't know of any emulator that lets me do a memory search to find it and possibly figure out where text starts - except the debugger versions of No$GBA, but various posts around here and elsewhere suggest you can't even buy it now if you wanted to.

I've uploaded the arm9.bin file in case people need/want to look at it.

http://download.birdiesoft.dk/arm9.bin (231KB)

Any ideas would be appreciated. Let me know if you need more info.
KC
Guest
« Reply #1 on: September 15, 2008, 03:13:26 pm »

Quote from: Pidgeot on September 15, 2008, 02:27:16 pm
Obviously, I can't just run the entire arm9.bin file through an LZSS decompressor - after all, not everything is compressed data.
Well, that is true. However, most of the code file is compressed.
The game decompresses a large amount of the code and data at the start.
After taking a quick look at it, it's decompressed backward starting from 39F65h down to 4000h.

I've dumped the decompressed code file. Maybe you can use it with a few tweaks in the NDS header (and with the arm9 footer attached, of course), but I haven't tested it. Change the four bytes at 894h to 00 00 A0 E3 to disable the decompression code.
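
For anyone who would rather script the byte change than do it in a hex editor, here is a rough sketch. The function name patch_word and the output file name are placeholders, and the offset only applies to this game's binary:

```python
PATCH_BYTES = bytes.fromhex("0000A0E3")  # the four replacement bytes quoted above


def patch_word(image: bytes, offset: int, word: bytes = PATCH_BYTES) -> bytes:
    """Return a copy of `image` with the 4 bytes at `offset` replaced by `word`."""
    if len(word) != 4:
        raise ValueError("an ARM instruction is exactly 4 bytes")
    if offset < 0 or offset + 4 > len(image):
        raise ValueError("offset lies outside the image")
    return image[:offset] + word + image[offset + 4:]


# Hypothetical usage:
#   with open("arm9.bin", "rb") as f:
#       patched = patch_word(f.read(), 0x894)
#   with open("arm9_patched.bin", "wb") as f:
#       f.write(patched)
```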

http://rapidshare.com/files/145573065/simple39.rar.html
Pidgeot
Guest
« Reply #2 on: September 15, 2008, 06:46:03 pm »

Thanks a bunch - the ARM9.bin pretty much worked out of the box (after changing that one instruction) and I was able to change texts right away, then repack and see the result.



I took a good look at the assembly after your post, GBATek and ARM reference material in hand, and after a couple of hours scrutinizing it all, it's suddenly much, much clearer what's going on.

If nothing else comes out of this little pet project of mine, at least I've gotten a better understanding of ARM assembly language now. This could come in handy for some other stuff I'm currently working on.

Once again, thanks.
xdaniel
Guest
« Reply #3 on: September 16, 2008, 10:54:50 am »

Since I've got the same problem, I figured I'd just post in this thread instead of opening up a new one. Hope that's okay.

Anyway, Naruto Shippuuden Dairansen also has text inside its ARM9 binary. Lacking memory search functions and the like, I used DeSmuME to make a save state at the title screen, added a .zip extension (a wild guess) and opened up the actual save state in a hex editor. Hoping that the beginning of the ARM9 binary is identical whether compressed or not, I just searched for the first 16 bytes of it inside the save state, and actually found them at 0x100C1FD. Comparing the data there with the compressed file, it seems that 0x4000 is the "barrier" between compressed and uncompressed data here as well, meaning 0x10101FD (0x100C1FD + 0x4000) is where the data from the save state starts to differ from the data inside the ARM9 binary.
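
The search and comparison described above could be scripted roughly like this. Both function names are invented for the example, and it assumes the unzipped save state holds main RAM as one raw, contiguous block:

```python
def locate_image(dump: bytes, arm9: bytes, probe_len: int = 16) -> int:
    """Offset in `dump` of the first `probe_len` bytes of `arm9`, or -1 if absent."""
    return dump.find(arm9[:probe_len])


def first_difference(dump: bytes, arm9: bytes, base: int) -> int:
    """Offset within `arm9` at which it first differs from `dump[base:]`.

    Returns len(arm9) if the whole file matches.
    """
    for i, value in enumerate(arm9):
        if base + i >= len(dump) or dump[base + i] != value:
            return i
    return len(arm9)


# Hypothetical usage:
#   base = locate_image(savestate, arm9)
#   barrier = first_difference(savestate, arm9, base)
```

If the observation above holds, the second call would return 0x4000 for this game.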

Now, two problems arise: First of all, I have no idea where exactly the ARM9 binary ends inside the save state. I think it's somewhere shortly after 0x01052575, since 4 bytes from near the end of the compressed ARM9 binary appear there, but I'm not sure at all. Also, even assuming I've got an uncompressed ARM9 binary with the correct length from the save state, I don't know how to either recompress it when I'm done with it, or how to make the uncompressed one work with the game - the same 4 bytes that Simple DS Series 39's ARM9 has at 0x894 are at 0x89C in Dairansen's ARM9, but, unless they're actually a syscall or some such, I can't be sure that replacing them with 0000A0E3 would work there as well.

Here's the save state in case someone's got a bit of spare time to help another "ARM9 victim": http://magicstone.de/dzd/random/6rz-nske_test1.dst.zip

(hope I didn't mess any offsets or anything up, I'm not feeling well today <.<)
Pidgeot
Guest
« Reply #4 on: September 19, 2008, 06:17:33 pm »

I'm not able to try and extract it right now, but I can at least provide you with some information which should help you get started - quite possibly enough to let you try to extract it yourself. I'll see if I can't get it extracted during the weekend or something, though - I don't know what method KC used, but I think your save state might do the trick.

For starters, you're spot on about where the barrier is. The decompression algorithm is actually the same as the one used in Shouboutai, so it probably originated from the SDK.

The beginning of the ARM9 binary is, by necessity, not compressed. If it were, the DS wouldn't know how to handle it properly.

It just so happens that during the decompression algorithm, the contents of r2 show where the data goes. The instruction at 0x02000980 calculates this as 0x02064618, so that's the byte after the end of your data. Since the ARM9 binary is loaded at 0x02000000 in memory, you'll want to subtract that to get the final size = 0x64618 = 411160 bytes - so dumping that many bytes from the start of the ARM9 binary in your save state should work.
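
Spelled out as a small Python sketch (the function names are placeholders, and the start offset is whatever you found the ARM9 header at in the save state):

```python
ARM9_BASE = 0x02000000    # load address of the ARM9 binary in memory
END_ADDRESS = 0x02064618  # value computed into r2: first byte after the data


def image_size(end_address: int, base: int = ARM9_BASE) -> int:
    """Size of the decompressed image, given the address just past its end."""
    return end_address - base


def extract_image(dump: bytes, start: int, size: int) -> bytes:
    """Cut `size` bytes out of `dump` starting at `start`, refusing short reads."""
    chunk = dump[start:start + size]
    if len(chunk) != size:
        raise ValueError("dump ends before the image does")
    return chunk


# Hypothetical usage, with `start` being the offset found in the save state:
#   arm9_raw = extract_image(savestate, start, image_size(END_ADDRESS))
```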

As far as your second problem is concerned, you don't need to recompress it. Simply disabling the decompression algorithm is enough, as the ARM9 binary in memory will end up being the same. The added size won't pose a problem; just unpack the .nds, replace arm9.bin, and repack (that's what I did).

Disabling the algorithm is indeed a matter of changing the instruction you found at 0x0200089C to those bytes. That instruction is ldr r0, [r1, 0x14] - this means "load the value at the memory location (contents of r1+0x14) into r0". Changing those bytes to 0000A0E3 (or E3A00000, as the DS reads it) turns it into a "mov r0, 0x0" - "store the value 0 in r0".
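
To double-check the byte order and the encoding, here is a small sketch that reads the four file bytes as a little-endian word and picks apart the fields of an ARM "mov Rd, #imm" instruction. The decoder only handles that one instruction form with the "always" condition, and its name is made up for the example:

```python
import struct


def word_from_file_bytes(raw: bytes) -> int:
    """Interpret 4 bytes from the file as one little-endian 32-bit ARM word."""
    (word,) = struct.unpack("<I", raw)
    return word


def decode_mov_immediate(word: int):
    """Return (rd, value) if `word` is an unconditional `mov Rd, #imm`, else None."""
    if (word >> 28) != 0xE:            # condition field: 0xE = always
        return None
    if (word >> 26) & 0x3 != 0:        # bits 27-26 = 00: data-processing
        return None
    if not (word >> 25) & 1:           # I bit set: operand is an immediate
        return None
    if (word >> 21) & 0xF != 0b1101:   # opcode 1101 = MOV
        return None
    rd = (word >> 12) & 0xF
    rot = ((word >> 8) & 0xF) * 2      # immediate is rotated right by 2 * field
    imm8 = word & 0xFF
    value = ((imm8 >> rot) | (imm8 << (32 - rot))) & 0xFFFFFFFF
    return rd, value


word = word_from_file_bytes(bytes.fromhex("0000A0E3"))
print(hex(word), decode_mov_immediate(word))
```

The bytes 00 00 A0 E3 in the file come out as the word E3A00000, which decodes to register 0 with the immediate value 0.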

The reason this disables the decompression step is that the first two instructions of that routine are as follows:

cmp r0,#0x0
beq 020009F8

or, in a more readable form: If r0 is equal to 0, continue execution at 0x020009F8. 0x020009F8 is the final instruction in the algorithm: bx r14, which is ARM-speak for "return". In other words, if r0=0, it can't do anything meaningful, so it returns - r0 is supposed to point to somewhere it can get information about the compressed data. The check for 0 is checking if it actually points to something (meaning, it's not a null pointer).

Like I said, I'm not able to dump and test this particular one right now, but given that the principle is the same, and my dump worked straight away, it should work - at least assuming it really is "raw" memory you've got in that dump.