Topic: Four Swords Adventures text obfuscation
Zuqkeo
Guest
« on: July 09, 2007, 02:31:51 am »
Hello everyone, I tried to create a text dump of Four Swords Adventures (FSA) for the GameCube, but I ran into a problem. I managed to locate all the data, but it looks really weird! Here's a portion of the beginning of the text data (link). Can somebody explain what is going on here? There seem to be random characters in the dump at certain locations, as if Nintendo wanted to keep outsiders from snooping around.
Spikeman
Guest
« Reply #1 on: July 09, 2007, 02:53:06 am »
Looks like control codes to me.
Zuqkeo
Guest
« Reply #2 on: July 09, 2007, 02:59:05 am »
I initially thought of that too, but those characters also appear inside words. Take, for example:
"Cÿave of Nÿo Return", "Deathÿ Mountaiþn" or "SwaßmpÀ½ Infiÿltration", to name a few. If they were control codes, there should be a distinct pattern in them, but there is none! I mean, what control code could be needed between the C and the a of "Cave of No Return"? It doesn't make any sense. In Panel De Pon (the translation project I did) I came across control characters too, but there I could easily find the pattern (e.g. font color, font set, etc.). Here, though, most of the characters make the text look like debris.
labmaster
Guest
« Reply #3 on: July 09, 2007, 03:08:31 am »
I'll need to take a closer look, but my first guess would be that the data is actually compressed. Note that at the start the FF bytes are spaced 8 bytes apart, and the data between them is all uncompressed.
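A minimal sketch of why the FFs show up every 8 bytes, assuming each FF is a flag byte whose eight set bits mark the next eight bytes as raw characters. The function name `strip_literal_flags` and the sample bytes are hypothetical, not taken from the game.

```python
def strip_literal_flags(data: bytes) -> bytes:
    """Recover text from a leading run of 0xFF flag bytes, each followed by 8 raw bytes.

    Assumption: a 0xFF flag means "the next 8 bytes are literals". Stops at the
    first flag byte that is not 0xFF, since compressed references may follow.
    """
    out = bytearray()
    pos = 0
    while pos < len(data) and data[pos] == 0xFF:
        out += data[pos + 1:pos + 9]  # the eight literal bytes after the flag
        pos += 9
    return bytes(out)


sample = b"\xffHyrule C\xffastle"  # hypothetical sample, not a real dump
print(strip_literal_flags(sample))  # b'Hyrule Castle'
```

Once a flag byte other than FF appears, some of the following entries are back-references, which is where the "random" characters start.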
Zuqkeo
Guest
« Reply #4 on: July 09, 2007, 03:27:53 am »
Hmm... that's a possibility, but it seems rather silly to me to compress the words and end up making them longer. It might also be possible that this game uses a dictionary for the most frequently occurring words (or even a mix of both)...
labmaster
Guest
« Reply #5 on: July 09, 2007, 04:15:39 am »
It's LZSS with 8-bit flags (1 = uncompressed, 0 = compressed). The data for the chunk follows: an uncompressed entry is just a raw byte, a compressed entry is a 16-bit length-distance pair.
For example, the stuff between "Hyrule Castle" and "Village of the Blue Maiden" spells out "The Coast". After the "Th" comes the flag byte, then a length-distance pair that points to the "e C" from "Hyrule Castle". Then there's the uncompressed "o", followed by a pair pointing to "ast" from "Castle".
I'll leave you to figure out the format of the length-distance pairs (I'd hazard that the lengths are stored as the actual length minus an offset, say 2 or 3, while the distances are raw).
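A minimal LZSS decoder sketch following the description above: one flag byte governs the next eight entries, a set bit is a raw byte, a clear bit is a 2-byte length-distance pair. Several details are assumptions not confirmed in this post: flags are read most significant bit first, the length sits in the high nibble of the first byte with a 12-bit distance in the remaining bits, and, per the guess above, the length is stored minus 2 (`length_bias`) while the distance is raw (`distance_bias=0`). The name `decode_lzss` and the test bytes are hypothetical.

```python
def decode_lzss(data: bytes, length_bias: int = 2, distance_bias: int = 0) -> bytes:
    out = bytearray()
    pos = 0
    while pos < len(data):
        flags = data[pos]
        pos += 1
        for bit in range(7, -1, -1):  # assumption: most significant bit first
            if pos >= len(data):
                break
            if flags & (1 << bit):
                # flag bit 1: uncompressed raw byte
                out.append(data[pos])
                pos += 1
            else:
                # flag bit 0: 16-bit length-distance pair (layout is an assumption)
                if pos + 1 >= len(data):
                    raise ValueError("truncated length-distance pair")
                b0, b1 = data[pos], data[pos + 1]
                pos += 2
                length = (b0 >> 4) + length_bias
                distance = ((b0 & 0x0F) << 8 | b1) + distance_bias
                if distance == 0 or distance > len(out):
                    raise IndexError(f"distance {distance} points before the start of the output")
                start = len(out) - distance
                for i in range(length):
                    # copy byte by byte so overlapping references repeat recent text
                    out.append(out[start + i])
    return bytes(out)


# Hand-built bytes mimicking the "The Coast" example (not a real dump):
# two all-literal chunks, then flags 0x40 = pair "e C", literal "o", pair "ast".
demo = b"\xffHyrule C\xffastle Th\x40\x10\x0bo\x10\x0c"
print(decode_lzss(demo))  # b'Hyrule Castle The Coast'
```

The byte-by-byte copy is deliberate: a pair whose distance is shorter than its length re-reads bytes it has just written, which is how LZSS encodes repeated runs.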
Zuqkeo
Guest
« Reply #6 on: July 09, 2007, 04:28:13 am »
Thanks a lot! Now I see what the problem was. I'll go and see if I can find out how it works, and I'll keep you informed of any progress.
EDIT:
Well, I figured it out. It works like this:
length = 1st byte's high nibble + 2
distance = 1st byte's low nibble * 256 + 2nd byte + 2
However, I ran into a new problem. During decompression I get an index out of bounds error, because suddenly a very high number appears in the low nibble of the first byte, resulting in a negative value when I subtract the distance from the current position... I guess they implemented some additional features in the compression. The first error occurs after reading "Ice T", where I get "0x05, 0x70", which works out to:
length = 0x05 / 16 + 2 = 0 + 2 = 2
distance = (0x05 % 16) * 256 + 0x70 + 2 = 1280 + 112 + 2 = 1394
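The rule above written out as a small sketch (the names `decode_pair` and `copy_start` are hypothetical). It reproduces the 0x05, 0x70 calculation and the bounds check that fails when the distance reaches back past the start of the decoded text. The post does not give the actual position of "Ice T", so any position used with `copy_start` is illustrative.

```python
def decode_pair(b0: int, b1: int) -> tuple[int, int]:
    """Decode a 2-byte length-distance pair with the rule from this post."""
    length = (b0 >> 4) + 2                 # high nibble + 2
    distance = (b0 & 0x0F) * 256 + b1 + 2  # low nibble * 256 + 2nd byte + 2
    return length, distance


def copy_start(position: int, distance: int) -> int:
    """Index a back-reference copies from; negative means it points before the text."""
    start = position - distance
    if start < 0:
        raise IndexError(f"distance {distance} reaches before the start (position {position})")
    return start


print(decode_pair(0x05, 0x70))  # (2, 1394)
```

Any current position below 1394 makes `copy_start` raise, which matches the out-of-bounds error described above.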
[edit] In the meantime I found out that, because of the 8-bit flag, the distance-length pair is actually a triplet of some kind... It should be read as:
"0x0570FA".
Still don't know what that means, but at least there's progress.
« Last Edit: July 10, 2007, 11:32:58 am by Zuqkeo »