Briefly: all .DAT files are archives used in our VFS. There is an unsigned long (4 bytes) value at the beginning of the archive; divide it by 2 to get the number of files contained. Then there is another ulong, an unknown value, but it's always set to 0x0000000B, so who cares about it.
The last important part is... INDEXES! They are managed as pairs of unsigned shorts (2 bytes each), one pair per file. First ushort is the pointer (you have to multiply it by 2048 in order to obtain the actual pointer), next ushort is file size (multiply by 2048 here, too).
Ok guys, now we need only a utility for dumping all that crap out of data files.
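For what it's worth, a minimal sketch of such a dumper in Python, going only by the layout described above. The function names and the `0000.bin` output naming are made up for illustration (the index as described carries no file names), and it hasn't been checked against anything beyond that description.

```python
import struct
from pathlib import Path

SECTOR = 2048  # pointers and sizes are stored in units of 2048 bytes


def parse_index(data: bytes) -> list[tuple[int, int]]:
    """Return (byte_offset, byte_size) for every entry in the index."""
    # Two little-endian unsigned longs: file count * 2, then the unknown 0x0B.
    count_x2, _unknown = struct.unpack_from("<II", data, 0)
    entries = []
    for i in range(count_x2 // 2):
        # Each entry is two unsigned shorts: pointer and size, both in sectors.
        sector, length = struct.unpack_from("<HH", data, 8 + i * 4)
        entries.append((sector * SECTOR, length * SECTOR))
    return entries


def dump_archive(dat_path: str, out_dir: str) -> int:
    """Write every entry to out_dir as NNNN.bin; return the number written."""
    data = Path(dat_path).read_bytes()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    entries = parse_index(data)
    for n, (offset, size) in enumerate(entries):
        # Hypothetical naming scheme: just the entry number.
        (out / f"{n:04d}.bin").write_bytes(data[offset:offset + size])
    return len(entries)
```

Something like `dump_archive("SCENARIO.DAT", "scenario_out")` would then write one file per index entry.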
One is down, what's the next?
Time to play a little game of bump the thread.
So I'm coming off a couple of days of fever and chicken noodle (not really; I can't stand the smell of that crap) and decided that learning some basics on the PlayStation's hardware would make me feel better. I'm weird. I know.
Anyway, I want to make sure I'm understanding this right. I'm gonna take the first 0x10 bytes of SCENARIO.DAT and use that as an example so we can all learn something, and, hopefully, jump start some sort of discussion.
===================================
OFFSETS:
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
===================================
96 01 00 00 0B 00 00 00 01 00 0A 00 0B 00 08 00
For the record, all the values are stored Least Significant Byte to Most Significant Byte, a.k.a. little-endian (LSB and MSB respectively for those who don't know what the hell's going on). So the bytes 96 01 00 00 read as 0x00000196. That threw me for a bit of a loop in the beginning, so heads up chuckle heads.
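If you'd rather let the computer do the byte flipping, here's a small Python sketch that decodes the sample bytes with the standard `struct` module. The variable names are mine, not anything official.

```python
import struct

# The first 0x10 bytes of SCENARIO.DAT, copied from the dump above.
header = bytes.fromhex("96010000 0B000000 0100 0A00 0B00 0800")

# "<" = little-endian; I = 4-byte unsigned value, H = 2-byte unsigned value.
count_x2, unknown, ptr1, size1, ptr2, size2 = struct.unpack("<IIHHHH", header)

print(hex(count_x2), hex(unknown))  # 0x196 0xb
print(hex(ptr1), hex(size1))        # 0x1 0xa
```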
Following what Gemini said, the first four bytes (00-03) determine how many files are contained in the archive, so we can take 0x00000196 and divide that by two and get 0xCB or 203 in decimal. I'm not sure if there are (0xCB - 1) files and the numbering begins at zero, which would bring the count down to 202 decimal, or if there is an actual 0xCBth file. I'm sure a minute or two of calculations would prove this one way or the other, but that's the least of our worries for the time being.
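One way that minute or two of calculations could go, sketched in Python. This leans on an assumption that isn't confirmed anywhere in the thread: that files are packed back to back, so each pointer equals the previous pointer plus the previous size. If that holds all the way through the last entry, the count is probably right. The helper names are made up.

```python
import struct


def count_entries(header: bytes) -> int:
    """Number of (pointer, size) pairs implied by the first ulong."""
    (count_x2,) = struct.unpack_from("<I", header, 0)
    return count_x2 // 2


def first_gap(header: bytes, entries: int):
    """Index of the first entry that doesn't start where the previous
    one ended, or None if all checked entries chain together."""
    prev_end = None
    for i in range(entries):
        ptr, size = struct.unpack_from("<HH", header, 8 + i * 4)
        if prev_end is not None and ptr != prev_end:
            return i
        prev_end = ptr + size  # still in sectors, no need to multiply
    return None


sample = bytes.fromhex("96010000 0B000000 0100 0A00 0B00 0800")
print(count_entries(sample))  # 203
print(first_gap(sample, 2))   # None (the 16-byte sample only holds two entries)
```

On the real file you'd pass the whole first sector and `count_entries(...)` as the second argument instead of 2.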
Moving on, we now know how many files there are, great. The next four bytes (04-07) are fluff, so we can ignore them. (I would be interested to know how they're handled internally, if someone with more PSX experience than I could explain that. Are they just ignored altogether or does the PSX require that there be some padding or is that left at the developer's discretion?)
Now, starting at 0x08 we get into the good stuff, the files: who they love and where they live! Ha-ha! I promise, Shannen Doherty is nowhere around. Referring back to Gemini we see that each file has a total of four bytes of information that are required to find it: the two byte relative pointer and the two byte size of the file. Gotcha.
So for the first file we have a pointer of 0x0001 (08-09) and a size of 0x000A (0A-0B). Neither one is in bytes yet; more on that in a second.
The two byte relative pointer is relative to the start of the .DAT file, NOT the CD image. Also, we need to do a little work to get the file's true position in the file. We take the pointer, 0x0001 for our first file, and multiply that by 2048 decimal ($800 hex). The inquisitive amongst you will ask, "Why did we multiply by 2048?" Sectors, my good boy!
I'm fairly new to CD architecture myself, thus what follows is a crude explanation at best. I'll leave it up to you more familiar with CDs and their high tech wizardry to clear up my discrepancies. CDs aren't read byte-for-byte like we're used to with the NES, SNES, Genesis, and so on. Oh no! They had to be different. Each CD is sectioned off into sectors that hold $800 bytes of data apiece (at least as far as the file system is concerned). When you want a byte from a given sector you have to yank (read) that entire sector into RAM and then retrieve the data from there. As far as I know you can have data that starts at the end of Sector N and stops at the beginning of Sector N+1. I liken it to the way NES ROMs have the CHR (graphics) and PRG (code/data) stored separately.
Anyway, back to the file pointers. For the first one we take the pointer, 0x0001, multiply that by 2048, and get, tada, 2048 ($800). That's in the second sector of this archive (I'll call this Sector 1 from here on, since I'm not particularly certain of file system nomenclature and I'm too American to look it up). From this we can deduce that the initial sector, what I'll call Sector 0, contains all the good stuff: the file count, the fluff, and the pointer/size pairs.
Now, this is just supposition, but I imagine that a copy of Sector 0 is kept in RAM and then each individual file is read from the disc as needed instead of keeping all of the files in a given archive in RAM.
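To make that supposition concrete, here's a rough Python sketch of the idea: read Sector 0 once, keep the index around, and seek to each file only when it's asked for. This is NOT a claim about what the game's code actually does; the class and method names are invented, and it assumes the whole index fits in Sector 0 (for SCENARIO.DAT, 8 + 203 * 4 = 820 bytes, so it would).

```python
import struct

SECTOR = 2048


class DatArchive:
    """Keeps only the index in memory; file data is read on demand."""

    def __init__(self, path: str):
        self.path = path
        with open(path, "rb") as f:
            sector0 = f.read(SECTOR)  # assumption: the index lives entirely in Sector 0
        count_x2, _unknown = struct.unpack_from("<II", sector0, 0)
        # One (pointer, size) pair per file, both still in sectors.
        self.index = [
            struct.unpack_from("<HH", sector0, 8 + i * 4)
            for i in range(count_x2 // 2)
        ]

    def read_file(self, n: int) -> bytes:
        """Seek to entry n and read just that file's sectors."""
        sector, length = self.index[n]
        with open(self.path, "rb") as f:
            f.seek(sector * SECTOR)
            return f.read(length * SECTOR)
```

With that, `DatArchive("SCENARIO.DAT").read_file(0)` would pull only the first file off the disc image instead of the whole archive.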
Move our little cursors up to $800 in our hex editors and we're at the very beginning of the first file in SCENARIO.DAT. Great, but how does this HELP me? I don't know where the hell the damn thing stops! All those numbers scare me! Oh lord! They wanna eat my babies! Get a hold of yourself for chrissakes, man. That's what the next two bytes are for. In a way, the size plays Robin to the file pointer's (or indexes in Gemini's post) Batman. Hehe...Bat poles.
Any hoo. We go back up to 0A-0B to get the size: 0x000A. Ah-ah, don't you think about dumping those measly ten bytes and calling them a file, foo. Time for more Sector Math. Sounds kinky, no? So we multiply the size, 0x000A, by 2048 (or $800 if you're doing it in hex) and we get the ACTUAL size of the file: $5000 hex and 20480 decimal. Hopefully my clunky explanation gives you some idea as to why we had to multiply by 2048 (THE SECTOR CONSTANT MWAHAHA!), so we can just move on.
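The Sector Math for the first file, spelled out as a tiny Python sketch using the pointer bytes at 08-09 (01 00) and the size bytes at 0A-0B (0A 00) from the dump. The helper name is just something I picked.

```python
SECTOR = 0x800  # 2048 bytes


def sectors_to_bytes(sectors: int) -> int:
    """Convert a sector count (or sector-based pointer) to bytes."""
    return sectors * SECTOR


start = sectors_to_bytes(0x0001)  # pointer field at 0x08-0x09
size = sectors_to_bytes(0x000A)   # size field at 0x0A-0x0B
end = start + size                # first byte past the end of the file

print(hex(start), hex(size), size, hex(end))  # 0x800 0x5000 20480 0x5800
```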
Now we know where the first file begins and how long it is, so we can dump it. Yay. Here's the final information (SCENARIO.DAT in parentheses):
0x00-0x03: Number of Files Contained in Archive, Times Two (0x196 / 2 = 0xCB)
0x04-0x07: Fluff (0x0000000B in all of the archives FOR THIS GAME)
0x08-0x09: File Pointer/Index for File #1, in Sectors (0x0001 = $800)
0x0A-0x0B: File Size of File #1, in Sectors (0x000A = $5000 bytes)
How to know if you've got this: Figure out where File #2 (File Pointer and Size stored at 0x0C-0x0F in SCENARIO.DAT) is stored and how large it is. Check the spoiler when you're done to see if you're right. Or if I'm wrong, cuz I ain't Einstein, ya feel me? Bliggity bling.
RedComet's Question or the entire reason I spent 30 minutes or better of my life typing this. This is mostly directed at Gemini: how the hell did you figure out that the first four bytes were the number of files in the archive? The next four being filler I can see from just guess work and maybe even the pointers and sizes, but there's no way I would've ever figured out the number of files in a million years. What's your secret? I'm choking Klarth's thread on VFS but she ain't a-squealin'! I can't help but feel I'm missing something that's staring me right in my face in that thread. I just can't quite place my finger on it and it's driving me halfway up a wall. Yes, there are footprints on the wall beside me.
Finally, I hope this at least sparks a little productive discussion that can further PSX, and indeed all CD and post 16-bit systems, hacking just a little. So, speak up no matter how insignificant you are and maybe we'll *all* learn something. If none of you bums learned anything, well, hopefully you got a chuckle out of it at least. I know I did, but then again I'm hyped up on medicine and coffee.