Topic: Need help on possible decompression of a file.
|
Babazoz
Guest
|
|
« on: January 26, 2008, 06:53:58 pm » |
|
:banghead: I have an earlier thread here about my project to translate the Haruhi Suzumiya game for the PSP. The text in the game is contained in about 640 DAT files, which seem to be partially compressed script files. The text is definitely readable and editable, but adding bytes results in either a crash or voices not playing.
I've tried every way I can think of to hack this file to allow text beyond the character limits without crashing the game. I've compared, searched, added, and removed, and every attempt comes up nil. I've come to the conclusion that the file is at least partially compressed in key areas which, if uncompressed, could allow editing beyond the byte limit for each section. But if it is in fact compressed, I'll be damned if I can work out how.
I really don't want to butcher the translations that come our way (any more than necessary), so I'm coming to you guys hoping someone can help, either by seeing a way to decompress, edit, and recompress these files, or with a hack that allows going beyond the byte limit. I've already found that the file stores the total file length and the length of the actual script in values near the beginning of the file, and that each section has a value giving the number of bytes in that section.
I've spent a ton of time and coffee on this, and now I'm asking for help. The file is at the link below (4K); if anyone can point me in the right direction, it would be greatly appreciated. http://www.mediafire.com/?53ckjgcsy2d
|
|
|
|
Ryusui
Guest
|
|
« Reply #1 on: January 26, 2008, 09:44:22 pm » |
|
Sounds like what you're up against isn't compression: it's metadata.
Could be something as simple as text pointers, which every romhacker should know how to work with (and the famed Atlas contains functions for handling). If it's anything more complicated, though, like checksums or file length headers, you may have to write your own custom script editor/inserter.
|
|
|
|
Babazoz
Guest
|
|
« Reply #2 on: January 27, 2008, 08:25:01 am » |
|
Have you looked at the file? From looking at it in a hex editor, it seems that at least part of it is compressed, but you may be right. Like I said, there's a hex value at the top of the file that, when flipped, gives the size of the file in bytes.
|
|
|
|
Ryusui
Guest
|
|
« Reply #3 on: January 27, 2008, 02:18:10 pm » |
|
"When flipped". What you mean to say is that it's in little-endian format; the LSB ("Least Significant Byte") comes first (as opposed to the MSB, "Most Significant Byte").
Take a look at the "compressed" data and see if it follows any kind of pattern.
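For anyone following along, here is a minimal Python sketch of what reading a "flipped" value means. It assumes the size field is a 4-byte unsigned integer at offset 0, which the thread does not confirm; `read_u32_le` and the sample bytes are made up for illustration.

```python
import struct

def read_u32_le(data: bytes, offset: int) -> int:
    """Decode an unsigned 32-bit little-endian integer at `offset`."""
    return struct.unpack_from("<I", data, offset)[0]

# Hypothetical 8-byte sample: a 4-byte size field, then 4 filler bytes.
# The bytes 08 00 00 00 on disk decode to the number 8 (LSB first).
sample = bytes.fromhex("08000000") + b"\xAA\xBB\xCC\xDD"
size_field = read_u32_le(sample, 0)

# If the field really is the file length, it should equal the byte count.
matches_length = size_field == len(sample)
```

On a real DAT file you would read the file with `open(path, "rb").read()` and compare the decoded value against `len(data)`; the offset and width may need adjusting.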
|
|
|
|
Babazoz
Guest
|
|
« Reply #4 on: January 27, 2008, 02:54:58 pm » |
|
It seems to follow one, but it's all over the place. A couple of repeating patterns here and there, then different patterns, then filenames, then more patterns. Maybe I'm wrong and it's not compressed. I don't know.
|
|
|
|
Ryusui
Guest
|
|
« Reply #5 on: January 27, 2008, 04:05:55 pm » |
|
Filenames, huh? Sounds like my experience with Death Note: The Kira Game for DS. ^_^
Okay. Repeating patterns. Do the LSBs of any patterns correspond with the starting offsets of any of the strings in the file? If so, those are string pointers. You may have to write a custom script inserter/extractor to deal with this...
|
|
|
|
Babazoz
Guest
|
|
« Reply #6 on: January 27, 2008, 04:14:04 pm » |
|
I'll check in a sec, but it just so happens we have a nice little script extractor for this game, source code included, which was given to us through the comments on my blog. It takes some tweaking, but it gives out pretty dialogue scripts for translators. The author also made an insertion script that works very well, too; because of this problem, though, it's pretty much restricted to the existing character limit.
EDIT: Still looking for a correlation, but I'm not seeing anything significant in the file.
|
|
« Last Edit: January 27, 2008, 04:32:26 pm by Babazoz »
|
|
|
|
Ryusui
Guest
|
|
« Reply #7 on: January 27, 2008, 04:44:26 pm » |
|
Does the data appear after the script text? Before the script text? In the middle of the script text?
If the data is wrapped in text, then There's Your Problem: either you're stomping on this data by accident, or the game expects to find this data in a specific place and you're moving it. In either case, you need to find the pointer - or pointers - that point to the data.
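To illustrate why moving data breaks things, here is a rough Python sketch of inserting bytes and then patching the pointers that refer to anything after the insertion point. Everything about the layout is an assumption: 16-bit little-endian pointers holding absolute file offsets, at positions you already know. `insert_and_repoint` is a hypothetical helper, not part of any existing tool.

```python
import struct

def insert_and_repoint(data: bytes, insert_at: int, new_bytes: bytes,
                       pointer_offsets: list[int]) -> bytes:
    """Insert new_bytes at insert_at, then bump every 16-bit little-endian
    pointer whose target sits at or after the insertion point.

    Assumption: bytes inserted exactly at a string boundary belong to the
    string that ends there, so the following string is the one that moves.
    """
    delta = len(new_bytes)
    out = bytearray(data[:insert_at] + new_bytes + data[insert_at:])
    for ptr_off in pointer_offsets:
        # A pointer stored after the insertion point has itself moved.
        new_off = ptr_off + delta if ptr_off >= insert_at else ptr_off
        target = struct.unpack_from("<H", out, new_off)[0]
        if target >= insert_at:
            struct.pack_into("<H", out, new_off, target + delta)
    return bytes(out)

# Hypothetical file: two pointers (to offsets 4 and 7), then "abc" and "xyz".
original = struct.pack("<HH", 4, 7) + b"abc" + b"xyz"
# Grow the first string by two bytes; the second string moves from 7 to 9.
patched = insert_and_repoint(original, 7, b"DE", [0, 2])
```

If the second pointer were left at 7 after the insert, the game would start reading in the middle of the first string, which is the kind of breakage described above.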
|
|
|
|
Babazoz
Guest
|
|
« Reply #8 on: January 27, 2008, 04:58:14 pm » |
|
The data is after the script text, for the most part. If you open the file I linked in a hex editor, it basically goes (as far as recognizable data is concerned): File length, script length, sound file ID, sentence length, sentence, and then repeats until the end, where it names off some image files to load, and after that, it's pretty much unreadable.
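As a rough sketch of that layout in Python: the field widths here are guesses (every field is assumed to be a 4-byte little-endian integer, and the script length is assumed to count the record headers too), so treat `parse_script` and `build_script` as hypothetical helpers, not the game's real format.

```python
import struct

HEADER_SIZE = 8  # assumed: file length + script length, 4 bytes each

def parse_script(data: bytes):
    """Walk the assumed layout: header, then (sound ID, length, text) records."""
    file_len, script_len = struct.unpack_from("<II", data, 0)
    pos, end = HEADER_SIZE, HEADER_SIZE + script_len
    records = []
    while pos < end:
        sound_id, text_len = struct.unpack_from("<II", data, pos)
        pos += 8
        records.append((sound_id, data[pos:pos + text_len]))
        pos += text_len
    return file_len, script_len, records

def build_script(records, trailer: bytes = b"") -> bytes:
    """Rebuild the file so every length field matches the new text sizes."""
    body = b"".join(struct.pack("<II", sid, len(text)) + text
                    for sid, text in records)
    total = HEADER_SIZE + len(body) + len(trailer)
    return struct.pack("<II", total, len(body)) + body + trailer

# Hypothetical two-sentence script followed by two bytes of trailing data.
records = [(1, b"Hi"), (2, b"Hello")]
blob = build_script(records, trailer=b"\xFF\xFF")
parsed = parse_script(blob)
```

This only shows which length fields have to change when a sentence grows; it says nothing about whatever else in the unreadable part of the file depends on those offsets.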
|
|
|
|
Ryusui
Guest
|
|
« Reply #9 on: January 27, 2008, 05:22:09 pm » |
|
Wait. "Script Length"...as in, the length of the script up to the unreadable data at the end?
|
|
|
|
Babazoz
Guest
|
|
« Reply #10 on: January 27, 2008, 05:48:07 pm » |
|
Yep. The total size of the actual dialogue in bytes. I've tried editing THAT value as well as the total file size and the size of the sentences to reflect my changes. No luck. If it's checking sizes against another value, I'd love to know where, because those were the only ones I could find. And seeing how the game more or less interprets the individual DAT files, I don't imagine the size it's checking would be in the main executable, but again, I could be completely wrong about that.
|
|
|
|
Ryusui
Guest
|
|
« Reply #11 on: January 27, 2008, 07:07:16 pm » |
|
Okay, let me see if I've got your situation understood.
You can edit the script, but you're limited to the space available: if your script takes up less space than required, that's fine, but if it takes up more, then the game crashes.
Do your expanded scripts overwrite any of the non-script data, or do you have things set up so that everything gets repositioned to its proper place after the script?
|
|
|
|
Babazoz
Guest
|
|
« Reply #12 on: January 27, 2008, 07:14:47 pm » |
|
You're half right on the first part. We can't shrink the space or enlarge it; it has to be exactly the same size. Text can only be entered within the existing character data, no more, no less. If the translated sentence is shorter than the original, it has to be padded with spaces or zeroed out (I think I tried that, and I believe it doesn't work). All non-script data is intact after enlarging, whether with our own tools or by inserting bytes with a hex editor.
|
|
|
|
Ryusui
Guest
|
|
« Reply #13 on: January 27, 2008, 07:55:10 pm » |
|
That's classic. What you're up against is that time-honored foe of romhackers everywhere: pointers. Can't believe it took me so long to pick it out; you seem to have a grip on everything else, so I didn't think it would be something this simple.
Here's the bottom line: the pointer table is a great big block of numbers that tells the game where each string begins. How it's formatted will vary from game to game, but somewhere in your mystery data - or perhaps in a related file - there should be pointers which indicate where each string begins. They might not be sequential: in Death Note, there are pointers padded with miscellaneous data that point to other pointers which actually point to the strings, and in Patlabor for SNES, the pointers are apparently mixed in with event code. But they should be there.
Try this for a test run. Use your hex editor's Hex Search function to look for the beginning offset of a string. In little-endian format, mind you. For example, if a string begins at 007F, you'd search for 7F00. If it begins at 1648, you'd search for 4816. And so on. Things might get complicated if the pointers count words instead of bytes or something: are the strings all packed in one after another, or are they padded so that they're aligned by a multiple of 4? If you can't find any pointers using the above method and the strings are indeed padded, try dividing the offset by 4 (the word offset as opposed to the byte offset) and try again. (If each string is prefixed by its length or other data, then you should count the offset where the header begins, not the offset where the text begins, as the string offset.)
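The same search can be scripted instead of done by hand. This is a minimal Python sketch, assuming the pointers are plain 2-byte or 4-byte little-endian offsets; `find_pointer_candidates` and the demo data are hypothetical.

```python
import struct

def find_pointer_candidates(data: bytes, value: int, width: int = 2) -> list[int]:
    """Return every position where `value` appears as a little-endian
    integer of `width` bytes (2 or 4)."""
    fmt = {2: "<H", 4: "<I"}[width]
    needle = struct.pack(fmt, value)  # e.g. 0x007F becomes bytes 7F 00
    hits = []
    pos = data.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = data.find(needle, pos + 1)
    return hits

# Hypothetical file: a pointer stored at 0x10 refers to a string at 0x7F.
demo = bytearray(0x90)
demo[0x7F:0x83] = b"text"
demo[0x10:0x12] = struct.pack("<H", 0x7F)
byte_hits = find_pointer_candidates(bytes(demo), 0x7F)

# Word-offset variant for strings aligned to 4 bytes: search for offset // 4.
word_needle = struct.pack("<H", 0x1648 // 4)
```

A hit is only a candidate: short values turn up by coincidence, so check whether neighbouring strings' offsets appear nearby in the same order before trusting it.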
|
|
|
|
Babazoz
Guest
|
|
« Reply #14 on: January 27, 2008, 08:08:10 pm » |
|
I've been doing something similar to this, but let me ask you this: would an in-game memory dump taken at the text I'm trying to translate help? I've tried something similar, but the sizes break down like this: 24MB dump, 1.7MB executable, 4K DAT file. I think finding pointers across files of such drastically different sizes would be pretty hard to do, right? I'll keep looking for the pointers in the other files, though (besides the dump).
|
|
|
|
|