Long post follows. Don't say I didn't warn you.
So I've decided to try my hand at translating. I've decided upon Simple DS Series Vol. 39 - The Shouboutai, because it's an interesting game, and there's not that much text to translate - so I figure it's a good place to start.
I have some experience with ROM hacking from Pokémon games, mainly some simple text hacking in the GBA games and various kinds of data extraction in both the GBA and DS games. I've also written tools for parsing NDS .NARC files and the filesystem on DS ROMs, so I'm not a complete newbie.
I'm struggling a bit with figuring out how I'm supposed to go about this one, though. Obviously, the first thing I do is attempt to locate some of the game text. I've based my searches on this screenshot:
I looked for the "365" part, since it seemed easy to find - and it was. Turns out the strings are embedded in the ARM9 binary, this particular part starting at 0x31D60. The number is followed by the Shift-JIS representation of the "day" kanji.
So far, so good. I've determined the file the strings are contained in, and that they use SJIS for text encoding.
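For anyone who wants to repeat the search, here is a rough Python sketch of what I did by hand in the hex editor. The helper names (find_all, sjis_candidates) are made up for this post, and the file name arm9.bin is just what I called my dump. It tries the string both with plain ASCII digits and with full-width digits, since either could be what a game stores.

```python
from pathlib import Path


def find_all(data: bytes, needle: bytes) -> list[int]:
    """Return every offset at which needle occurs in data."""
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets


def sjis_candidates(text: str) -> dict[str, bytes]:
    """Encode text as Shift-JIS, once as-is and once with full-width digits."""
    # U+FF10..U+FF19 are the full-width forms of '0'..'9' (offset 0xFEE0).
    to_fullwidth = {ord(c): ord(c) + 0xFEE0 for c in "0123456789"}
    return {
        "ascii digits": text.encode("shift_jis"),
        "full-width digits": text.translate(to_fullwidth).encode("shift_jis"),
    }


if __name__ == "__main__":
    path = Path("arm9.bin")  # assumption: the dumped ARM9 binary sits next to the script
    if path.exists():
        data = path.read_bytes()
        for label, needle in sjis_candidates("365日").items():
            hits = ", ".join(hex(o) for o in find_all(data, needle)) or "no hits"
            print(f"{label} ({needle.hex(' ')}): {hits}")
```

This only finds strings that survive as literal bytes, so anything that got replaced by a back-reference during compression will not show up.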
Then I spot a problem: Having opened the file in Vim to look more closely at the text, it doesn't decode properly:
Notice the XX part. Checking again in the hex editor shows this:
Clearly, the text is compressed - and the 0x00 at 0x31D68 makes me think of LZSS. If I'm not mistaken (which is certainly a possibility), this also seems to fit fairly well with this section I found in the same file (not sure where that text is used, but it's there):
And this brings me to my problem: How do I figure out where the texts begin? Obviously, I can't just run the entire arm9.bin file through an LZSS decompressor - after all, not everything is compressed data. I could try to look for text pointers, but I would - once again - not be sure where to start. After all, if this is LZSS, any pointers would have to be relative to the decompressed data, and I don't know where in RAM it decompresses to. I don't know of any emulator that lets me do a memory search to find it and possibly figure out where the text starts - except the debugger versions of No$GBA, but various posts around here and elsewhere suggest you can't buy those now even if you wanted to.
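To make the "where does it start" problem concrete, this is the kind of probe I have in mind: a small Python sketch that starts decoding at a guessed flag-byte offset and prints whatever comes out. Everything about the format here is an assumption borrowed from the LZ variant the DS BIOS uses (flag bits read from the top bit down, a set bit meaning a 2-byte back-reference, length = high nibble + 3, distance = 12-bit value + 1, data read forwards). The game may well use a different layout or direction. The name lzss_probe is made up for this post.

```python
def lzss_probe(data: bytes, start: int, max_out: int = 256, unknown: int = 0x3F) -> bytes:
    """Trial-decode an LZSS-like stream beginning at a guessed flag byte.

    Assumed format (NOT confirmed for this game): one flag byte, then up to
    eight tokens; a clear bit is a literal byte, a set bit is a 2-byte
    back-reference. References reaching before the probe's own output are
    filled with `unknown` ('?') because that history is not available.
    """
    out = bytearray()
    pos = start
    while pos < len(data) and len(out) < max_out:
        flags = data[pos]
        pos += 1
        for bit in range(7, -1, -1):  # most significant bit first
            if pos >= len(data) or len(out) >= max_out:
                break
            if flags & (1 << bit):
                if pos + 1 >= len(data):  # truncated reference, stop here
                    return bytes(out[:max_out])
                b1, b2 = data[pos], data[pos + 1]
                pos += 2
                length = (b1 >> 4) + 3
                dist = (((b1 & 0x0F) << 8) | b2) + 1
                for _ in range(length):
                    out.append(out[-dist] if dist <= len(out) else unknown)
            else:
                out.append(data[pos])  # literal byte copied as-is
                pos += 1
    return bytes(out[:max_out])


if __name__ == "__main__":
    # Tiny synthetic stream: three literals, then "copy 3 bytes from 3 back".
    sample = bytes([0x10]) + b"abc" + bytes([0x00, 0x02])
    print(lzss_probe(sample, 0))
```

Running this from a handful of offsets near a known string and decoding the result as Shift-JIS would at least show whether the guess about the format holds, but it does not tell me where the compressed block really begins, which is the part I'm stuck on.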
I've uploaded the arm9.bin file in case people need/want to look at it.
http://download.birdiesoft.dk/arm9.bin (231KB)
Any ideas would be appreciated. Let me know if you need more info.