Author Topic: [BM 00] The Virtual File System  (Read 2 times)
satsu
Guest
« Reply #30 on: December 17, 2006, 10:47:40 am »

http://www.yuudachi.net/bm00-dev/datex.rar
MegaManJuno
Guest
« Reply #31 on: December 17, 2006, 03:04:30 pm »

I'm mirroring this locally as well, just in case it's not accessible elsewhere for some reason.  If you need a copy of it from me, send me a PM and I'll shoot you a link.
Spikeman
Guest
« Reply #32 on: December 21, 2006, 03:22:09 pm »

What file do I use the extractor on?

I have these files:

bm00.ccd
bm00.cue
bm00.img
bm00.sub

Do I have to somehow extract the DAT files from one of those?
KaioShin
Guest
« Reply #33 on: December 21, 2006, 03:24:09 pm »

That's only the disc image. You have to burn it or mount it with a virtual drive. The DAT files are on the disc.
Spikeman
Guest
« Reply #34 on: December 21, 2006, 03:34:27 pm »

Oh okay, but which one of those files, the .img right?
MegamanX
Guest
« Reply #35 on: December 21, 2006, 03:57:13 pm »

Yes, it should be the .img file.  It's like...687 MB or something?  I looked at it earlier this week.
Spikeman
Guest
« Reply #36 on: December 21, 2006, 03:59:28 pm »

Thanks, I have it figured out and working now. :D
Nightcrawler
Guest
« Reply #37 on: December 31, 2006, 04:52:19 pm »

Would a centrally accessible and open FTP space be useful for this? Or have you guys got everything under control already with whatever it is you're doing so far?
Aerdan
Guest
« Reply #38 on: December 31, 2006, 07:19:31 pm »

I offered space on my server through bzr, but since no one's taken me up on that, I don't think it's needed. Either that, or no one who said they were interested actually *was*.
RedComet
Guest
« Reply #39 on: March 01, 2007, 08:37:07 am »

Quote from: Gemini on December 12, 2006, 09:34:54 pm
Briefly: all .DAT files are archives used in our VFS. There is an unsigned long (4 bytes) value at the beginning of the archive which indicates the number of files contained /2, then there is another ulong, an unknown value, but it's always set to 0x0000000B, so who cares about it. :P The last important part is... INDEXES! They are managed as unsigned shorts (2 bytes each). First ushort is the pointer (you have to multiply it by 2048 in order to obtain the actual pointer), next ushort is file size (multiply by 2048 here, too).

Ok guys, now we need only a utility for dumping all that crap out of data files. :D One is down, what's the next? :P

Time to play a little game of bump the thread.

So I'm coming off a couple of days of fever and chicken noodle (not really; I can't stand the smell of that crap :P) and decided that learning some basics on the Playstation's hardware would make me feel better. I'm weird. I know.

Anyway, I want to make sure I'm understanding this right. I'm gonna take the first 0x10 bytes of SCENARIO.DAT and use that as an example so we can all learn something, and, hopefully, jump start some sort of discussion.

Code:
===================================
OFFSETS:
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
===================================
96 01 00 00 0B 00 00 00 01 00 0A 00 0B 00 08 00

For the record, all the values are stored Least Significant Byte first and Most Significant Byte last (LSB and MSB respectively, for those who don't know what the hell's going on :P); in other words, little-endian. That threw me for a bit of a loop in the beginning, so heads up, chuckleheads.
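To make the byte order concrete, here is a small Python sketch that decodes the first two values of the dump above. The helper name `le` is made up for this example; `int.from_bytes` with `"little"` does the actual work.

```python
# First 0x10 bytes of SCENARIO.DAT, copied from the hex dump above.
header = bytes.fromhex("96 01 00 00 0B 00 00 00 01 00 0A 00 0B 00 08 00")

def le(data: bytes, start: int, length: int) -> int:
    """Decode `length` bytes at `start` as an unsigned little-endian integer."""
    return int.from_bytes(data[start:start + length], "little")

first_value = le(header, 0x00, 4)   # bytes 96 01 00 00
second_value = le(header, 0x04, 4)  # bytes 0B 00 00 00
print(hex(first_value), hex(second_value))  # 0x196 0xb
```

Reading the same four bytes big-endian would give 0x96010000 instead, which is the loop it threw me for.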

Following what Gemini said, the first four bytes (00-03) determine how many files are contained in the archive, so we can take 0x00000196 and divide that by two and get 0xCB or 203 in decimal. I'm not sure if there are (0xCB - 1) files and the numbering begins at zero, which would bring the count down to 202 decimal, or if there is an actual 0xCBth file. I'm sure a minute or two of calculations would prove this one way or the other, but that's the least of our worries for the time being.
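A rough sketch of that "minute or two of calculations", assuming the index is one contiguous table of 4-byte entries starting at 0x08 (that layout is my reading of Gemini's description, not something confirmed). It only works out where the table would end if there really are 203 entries; you would still have to look at that spot in a hex editor.

```python
raw_count = 0x196            # value stored at 0x00-0x03 of SCENARIO.DAT
file_count = raw_count // 2  # Gemini: the stored value is the file count times 2

INDEX_START = 0x08           # first entry follows the two ulongs
ENTRY_SIZE = 4               # one ushort pointer + one ushort size

# Assumption: entries are packed back to back with nothing in between.
index_end = INDEX_START + file_count * ENTRY_SIZE

print(file_count, hex(file_count), hex(index_end))  # 203 0xcb 0x334
```

If entry number 203 (the four bytes just before 0x334) looks like a sane pointer/size pair, the count is 203; if it is zeros or junk, 202 is the better guess. Either way the whole table fits inside the first 2048 bytes.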

Moving on, we now know how many files there are, great. The next four bytes (04-07) are fluff, so we can ignore them. (I would be interested to know how they're handled internally, if someone with more PSX experience than I could explain that. Are they just ignored altogether or does the PSX require that there be some padding or is that left at the developer's discretion?)

Now, starting at 0x08 we get into the good stuff, the files: who they love and where they live! Ha-ha! I promise, Shannon Doherty is nowhere around. Referring back to Gemini we see that each file has a total of four bytes of information that are required to find it: the two byte relative pointer and the two byte size of the file. Gotcha.

So for the first file we have a pointer of 0x0001 (08-09) and a size, or length in bytes, of 0x000A (0A-0B).

The two byte relative pointer is relative to the start of the .DAT file, NOT the CD image. Also, we need to do a little work to get the file's true position in the file. We take the pointer, 0x0001 for our first file, and multiply that by 2048 decimal ($800 hex). The inquisitive amongst you will ask, "Why did we multiply by 2048?" Sectors, my good boy!

I'm fairly new to CD architecture myself, thus what follows is a crude explanation at best. I'll leave it up to you more familiar with CDs and their high tech wizardry to clear up my discrepancies. CDs aren't read byte-for-byte like we're used to with the NES, SNES, Genesis, and so on. Oh no! They had to be different. Each CD is sectioned off into $800 byte sectors. When you want a byte from a given sector you have to yank (read) that entire sector into ram and then retrieve the data from there. As far as I know you can have data that starts at the end of Sector N and stops at the beginning of Sector N+1. I liken it to the way NES ROMs have the CHR (graphics) and PRG (code/data) stored separately.

Anyway, back to the file pointers. For the first one we take the pointer, 0x0001, multiply that by 2048, and get, tada, 2048 ($800). That's in the second sector of this archive (I'll call this Sector 1 from here on, since I'm not particularly certain of file system nomenclature and I'm too American to look it up). From this we can deduce that the initial sector, what I'll call Sector 0, contains all the good stuff.
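In code the pointer math is a one-liner. This is just a sketch; `sector_to_offset` is a name I made up for illustration.

```python
SECTOR_SIZE = 2048  # $800 bytes of user data per sector

def sector_to_offset(pointer: int) -> int:
    """Turn a sector-based pointer from the index into a byte offset in the .DAT."""
    return pointer * SECTOR_SIZE

print(hex(sector_to_offset(0x0001)))  # 0x800
```

Since 2048 is a power of two, `pointer << 11` gives the same result, which is probably how the game itself does it.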

Now, this is just supposition, but I imagine that a copy of Sector 0 is kept in ram and then each individual file is read from the disc as needed instead of keeping all of the files in a given archive in ram.

Move our little cursors up to $800 in our hex editors and we're at the very beginning of the first file in SCENARIO.DAT. Great, but how does this HELP me? I don't know where the hell the damn thing stops! All those numbers scare me! Oh lord! They wanna eat my babies! Get a hold of yourself for chrissakes, man. That's what the next two bytes are for. In a way, the size plays Robin to the file pointer's (or indexes in Gemini's post) Batman. Hehe...Bat poles. ;D

Any hoo. We go back up to 0A-0B to get the size: 0x000A. Ah-ah, don't you think about dumping those measly ten bytes and calling them a file, foo. Time for more Sector Math. Sounds kinky, no? So we multiply the size, 0x000A, by 2048 (or $800 if you're doing it in hex) and we get the ACTUAL size of the file: $5000 hex or 20480 decimal. Hopefully my clunky explanation gives you some idea as to why we had to multiply by 2048 (THE SECTOR CONSTANT MWAHAHA!), so we can just move on.
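Working straight from the two bytes at 0A-0B in the dump (0A 00), the size calculation can be sketched like this. The variable names are only for illustration.

```python
SECTOR_SIZE = 0x800  # 2048 bytes

# Bytes at 0x0A-0x0B of SCENARIO.DAT, little-endian.
size_field = int.from_bytes(bytes.fromhex("0A 00"), "little")

# The field counts sectors, so scale it up to bytes.
size_bytes = size_field * SECTOR_SIZE

print(hex(size_field), hex(size_bytes), size_bytes)  # 0xa 0x5000 20480
```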

Now we know where the first file begins and how long it is, so we can dump it. Yay. Here's the final information (SCENARIO.DAT in parentheses):

Code:
0x00-0x03: Number of Files Contained in Archive (0xCB)
0x04-0x07: Fluff (0x0000000B in all of the archives FOR THIS GAME)
0x08-0x09: File Pointer/Index for File #1 (0x0001 = $800)
0x0A-0x0B: File Size of File #1 (0x000A sectors = $5000 bytes)

How to know if you've got this: Figure out where File #2 (File Pointer and Size stored at 0x0C-0x0F in SCENARIO.DAT) is stored and how large it is. Check the spoiler when you're done to see if you're right. Or if I'm wrong, cuz I ain't Einstein, ya feel me? Bliggity bling.


RedComet's Question, or the entire reason I spent 30 minutes or better of my life typing this. This is mostly directed at Gemini: how the hell did you figure out that the first four bytes were the number of files in the archive? The next four being filler I can see from just guess work and maybe even the pointers and sizes, but there's no way I would've ever figured out the number of files in a million years. What's your secret? I'm choking Klarth's thread on VFS but she ain't a-squealin'! I can't help feeling I'm missing something that's staring me right in my face in that thread. I just can't quite place my finger on it and it's driving me halfway up a wall. Yes, there are footprints on the wall beside me. :-[

Finally, I hope this at least sparks a little productive discussion that can further PSX, and indeed all CD and post 16-bit systems, hacking just a little. So, speak up no matter how insignificant you are and maybe we'll *all* learn something. If none of you bums learned anything, well, hopefully you got a chuckle out of it at least. I know I did, but then again I'm hyped up on medicine and coffee.
KaioShin
Guest
« Reply #40 on: March 01, 2007, 08:56:08 am »

I also had to extract an archive file for Haruhi, so I dealt with this topic myself before. Gemini figured out that format in 4 minutes, just by looking at it ;) By now, I can do it too. The magic is simply that almost all archives will have the information on how many files there are somewhere.
It became quite obvious to me when I wrote the program to extract the data. Without this information you are pretty much screwed. With that info you can easily write a perfect loop which extracts all the files. If you check for end of file manually on every byte you read, the program will be much slower and messier. The developers need to pack and extract these files themselves, so it makes sense to design the format in a way which makes this as easy as possible. You can just count on it that almost every archive format will have this information, and most likely right at the beginning. It wouldn't make sense to put this information right in the middle of a table of pointers; it seems natural that it comes first.
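Here is roughly what such a loop could look like for the .DAT layout described earlier in the thread. It is a sketch under assumptions, not the actual extractor: the function names and the numbered output file names are made up, and it takes the first ulong to be twice the file count, as RedComet read Gemini's post.

```python
import struct
from pathlib import Path

SECTOR_SIZE = 2048

def read_index(dat: bytes) -> list[tuple[int, int]]:
    """Return (byte offset, byte size) for every file listed in the archive."""
    raw_count, _unknown = struct.unpack_from("<II", dat, 0)
    file_count = raw_count // 2  # assumption: stored value is file count * 2
    entries = []
    for i in range(file_count):
        # Each entry: ushort pointer, ushort size, both counted in sectors.
        pointer, size = struct.unpack_from("<HH", dat, 8 + i * 4)
        entries.append((pointer * SECTOR_SIZE, size * SECTOR_SIZE))
    return entries

def extract_all(dat_path: str, out_dir: str) -> int:
    """Dump every file in the archive to out_dir and return how many were written."""
    dat = Path(dat_path).read_bytes()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    entries = read_index(dat)
    for number, (offset, size) in enumerate(entries):
        # The archive stores no names, so number the files (hypothetical naming).
        (out / f"{number:04d}.bin").write_bytes(dat[offset:offset + size])
    return len(entries)

# Usage (path is an example): extract_all("SCENARIO.DAT", "scenario_out")
```

Because the count is known up front, the loop never has to guess where the table ends, which is exactly the point about not checking for end of file on every byte.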

Hope I could help you a little, Gemini can possibly explain it much better though.
Nightcrawler
Guest
« Reply #41 on: March 01, 2007, 09:35:10 am »

While you've dug this topic up:

It would probably be a good idea to extract some of this information from the ex community project and stick it into documents for archive on the site. Eventually, this board will get pruned and information will be lost. That's what the database is for, information that you don't want lost. Not to mention, it will be easier to find there.
Aerdan
Guest
« Reply #42 on: March 04, 2007, 08:54:45 am »

I pointed this out in another thread, but PSX disk sectors are not 2048 bytes/sector. They are 2352 bytes. CDs used by computers [audio & data] are 2048 bytes/sector. [I have this on the best of authority [a PSX dev and the author of pSX], too, so...]
Gemini
Guest
« Reply #43 on: March 04, 2007, 10:42:38 am »

Quote from: Aerdan on March 04, 2007, 08:54:45 am
I pointed this out in another thread, but PSX disk sectors are not 2048 bytes/sector. They are 2352 bytes. CDs used by computers [audio & data] are 2048 bytes/sector. [I have this on the best of authority [a PSX dev and the author of pSX], too, so...]
Even if that's totally true, it doesn't make much of a difference. User data is 2048 bytes/sector, excluding 2336 bytes/sector XA sectors (which Windows doesn't support), as I said before.
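To show how the two numbers fit together, here is a sketch that pulls the 2048 bytes of user data out of one raw 2352-byte sector. It assumes a raw image (such as a CloneCD .img) and a Mode 2 Form 1 sector, where 12 sync bytes, a 4-byte header and an 8-byte subheader come before the data; Form 2 (XA) sectors are laid out differently and are not handled. The function name is made up.

```python
RAW_SECTOR_SIZE = 2352   # full sector as stored in a raw image
USER_DATA_SIZE = 2048    # what the file system actually sees
USER_DATA_START = 24     # 12 sync + 4 header + 8 subheader (Mode 2 Form 1 only)

def read_user_data(raw_image: bytes, sector: int) -> bytes:
    """Return the 2048 user-data bytes of the given sector of a raw image."""
    start = sector * RAW_SECTOR_SIZE + USER_DATA_START
    return raw_image[start:start + USER_DATA_SIZE]
```

So pointers inside the .DAT still get multiplied by 2048; the 2352 only matters when you poke at the raw image instead of the mounted disc.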
Griever
Guest
« Reply #44 on: June 29, 2008, 01:36:29 am »

Frankly speaking, I've read all the PSX archive stuff in this forum but I just can't figure out the structure in this case. Here is another PSX file system example: Gran Turismo 2 (either disc will do; they both have the same archive type).
It has an archive in the file GT2.vol. I've worked out the format of the pointers (at the very beginning of the .vol file): 32-bit.
 Sector := Ptr shr $0b;
 Free_Space_In_The_Last_Sector_of_The_File := Ptr and $7FF;
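The same two operations written as a small Python sketch. The function name and the example value are made up; the meaning of the two fields is taken from the Pascal lines above, not verified against GT2.vol.

```python
def decode_vol_pointer(ptr: int) -> tuple[int, int]:
    """Split a 32-bit GT2.vol pointer into (sector, free bytes in last sector)."""
    sector = ptr >> 11    # Ptr shr $0b: upper 21 bits
    slack = ptr & 0x7FF   # Ptr and $7FF: lower 11 bits
    return sector, slack

example = (5 << 11) | 0x123  # made-up value, not read from the game
print(decode_vol_pointer(example))  # (5, 291)
```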
BUT the actual file/folder stuff is a mess. I just can't get that structure. Oh well, it has file/folder names beginning at 0xB000 in the .vol file. And it has some binary data before each name. I guessed it was file/folder properties, but analysis showed it is not:

folder's words:
a81f = 1010100000011111
a819 = 1010100000011001
a820 = 1010100000100000
9248 = 1001001001001000
654c = 0110010101001100
7c6a = 0111110001101010

File's words
2b51 = 0010101101010001
9250 = 1001001001010000
2564 = 0010010101100100
a410 = 1010010000010000
6511 = 0110010100010001

I just treated an entry as a file if its name has an extension (for example .tim). And as you can see, there are no common bits in either case...
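The bit comparison can be checked mechanically. This sketch uses only the words listed above and looks for a bit that is set in every word of one group and clear in every word of the other, which is what a file/folder flag would have to look like. The helper names are made up.

```python
from functools import reduce

folder_words = [0xA81F, 0xA819, 0xA820, 0x9248, 0x654C, 0x7C6A]
file_words = [0x2B51, 0x9250, 0x2564, 0xA410, 0x6511]

def always_set(words):
    """Bits that are 1 in every word."""
    return reduce(lambda a, b: a & b, words)

def ever_set(words):
    """Bits that are 1 in at least one word."""
    return reduce(lambda a, b: a | b, words)

# A flag bit would be set in all words of one group and clear in all of the other.
folder_flag = always_set(folder_words) & ~ever_set(file_words) & 0xFFFF
file_flag = always_set(file_words) & ~ever_set(folder_words) & 0xFFFF

print(f"{folder_flag:016b} {file_flag:016b}")  # 0000000000000000 0000000000000000
```

Both masks come out empty for this sample, so no single bit separates the two groups, at least not with files picked by extension.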
The last thing I've noticed is that some file names look like $1e$1e, which means the ".." name. It looks like a new folder or something like that, but I don't know how to use it anyway.
