Alright, so I got the time tonight to read through the program Euclid wrote up as well as his commented CPU code.
Here is what I figured out:
The pointer table in the ROM gives the location of each compressed block of graphics, followed by 2 "length" bytes that give the size the data will be once uncompressed, so the routine knows when to stop.
The first byte in each block of compressed graphics is a "codeByte", as Euclid calls it. This byte indicates where a "repeat length" byte, followed by a byte saying how far back the bytes to repeat are, can be found.
I also figured out that a new codeByte is read after every 8 cycles of data, since codeByteBitTest in the routine below counts from 7 down to 0. A cycle is either a single byte being read or a series of bytes being copied; each counts as one cycle.
For example, a compressed graphics block may look like:
codeByte, data, data, data, data, data, repeat length, how far back, data, data, codeByte, data, data...
I believe this is correct. I could be a little off right now since it is 1AM...
I have a feeling I'm going to need to look more in depth at some CPU assembly to understand what is done with the codeBytes to derive from them where the repeat lengths are...
It seems like once I figure that out I can more or less make a recompressor. The codeByte spacing does not limit how much data can be repeated, since a copy counts as a single cycle however long it is; going by len = nextByte / 4 in the routine below, one copy can cover up to 63 bytes, taken from up to 1023 bytes back. So the recompressor would need to look for series of repeated bytes to compress them.
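The search for repeated bytes could be sketched as a greedy matcher. This is only a minimal sketch, not taken from anyone's actual tool: it assumes the format is exactly what the decompression routine further down in this post reads (flag bits used from bit 7 down, with a 6-bit length and a 10-bit distance back packed into two bytes). The names compress_block, find_match, MAX_LEN, MAX_DIST and MIN_MATCH are made up for illustration, and the minimum match length of 3 is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed limits, read off the decompression routine in this post. */
enum { MAX_LEN = 63, MAX_DIST = 1023, MIN_MATCH = 3 };

/* Find the longest run starting at src[pos] that also starts 1..MAX_DIST bytes
   earlier. Returns its length (0 if none) and stores the distance in *dist_out. */
static size_t find_match(const unsigned char *src, size_t n, size_t pos,
                         size_t *dist_out)
{
    size_t best = 0, dist;
    for (dist = 1; dist <= MAX_DIST && dist <= pos; dist++) {
        size_t len = 0;
        /* The match may run past pos: the decoder copies byte by byte,
           so bytes written earlier in the same copy can be re-read. */
        while (len < MAX_LEN && pos + len < n &&
               src[pos - dist + len] == src[pos + len])
            len++;
        if (len > best) {
            best = len;
            *dist_out = dist;
        }
    }
    return best;
}

/* Greedy compressor. dst needs room for n + (n + 7) / 8 bytes in the worst case.
   Returns the number of bytes written to dst. */
size_t compress_block(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t pos = 0, out = 0, code_pos = 0;
    int bit = -1; /* -1 forces a new code byte before the first cycle */

    while (pos < n) {
        size_t dist = 0, len;
        if (bit < 0) {                 /* start a new group of 8 cycles */
            code_pos = out;
            dst[out++] = 0;
            bit = 7;
        }
        len = find_match(src, n, pos, &dist);
        if (len >= MIN_MATCH) {
            assert(len <= MAX_LEN && dist >= 1 && dist <= MAX_DIST);
            dst[code_pos] |= (unsigned char)(1u << bit);            /* mark this cycle as a copy */
            dst[out++] = (unsigned char)((len << 2) | (dist >> 8)); /* length + high 2 distance bits */
            dst[out++] = (unsigned char)(dist & 0xFF);              /* low 8 distance bits */
            pos += len;
        } else {
            dst[out++] = src[pos++];                                /* plain data byte */
        }
        bit--;
    }
    return out;
}
```

For the 8 bytes "ABABABAB" this produces 20 41 42 18 02: a code byte with bit 5 set, two plain bytes, then a copy of 6 bytes from 2 bytes back. A greedy search like this is slow and not optimal, but it is enough to test a round trip against the decompressor.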
I hope I am moving in the right direction! Now that it is 2AM I'd better get some sleep... If anyone has any thoughts on what I figured out, please let me know whether I'm going about this correctly!
Here is Euclid's decompression routine, commented a little more in depth so I could understand it better:
// The first byte of the compressed data is a "codeByte". A new codeByte is read every 8 cycles.
// The codeByte's bits point out where "len" values can be found that signify bytes to be copied.
// All other bytes are actual data bytes.
// Whenever one or more bytes can be repeated, a "len" byte is found, followed by a byte that tells
// how many bytes back the block of bytes to be copied is. The top 6 bits of the "len" byte are the length
// and its low 2 bits are the high bits of the distance back. The block of data is then copied for the set "len" value.
// codeByteBitTest keeps track of when a new codeByte has to be
// read from the compressed block: after 8 cycles of (reading a byte)/(copying bytes).
#include "stdafx.h"
#include <stdio.h>
unsigned char readBuf[0xFFFF];  // Compressed data read from the ROM
unsigned char writeBuf[0xFFFF]; // Decompressed output
unsigned int length = 0x3C0;    // Uncompressed length, taken from the pointer table's data
void decompress()
{
    unsigned int i = 0, j = 0;            // i is the readBuf position, j is the writeBuf position
    unsigned char codeByte = readBuf[i];  // Read first byte of compressed block
    unsigned int codeByteBitTest = 7;     // Initial value of codeByteBitTest is always 7 (the top bit)
    unsigned int count = 0;               // Count initially starts at 0
    i++;
    while (count < length)                // Run decompression routine until counter reaches "length"
    {
        if ((codeByte & (1 << codeByteBitTest)) != 0) // If bit number codeByteBitTest of codeByte is set
        {                                             // then find the read length and copy bytes!
            //read length
            unsigned char nextByte = readBuf[i]; // Top 6 bits: read length; low 2 bits: high bits of the distance back
            unsigned int len = nextByte / 4;     // len is the amount of bytes to be written
            unsigned int start_pos;
            unsigned int k, oldJ;
            i++;
            count += len;
            start_pos = (nextByte & 3) * 256 + readBuf[i]; // How many bytes back the source of the copy starts
            i++;
            //copy bytes
            oldJ = j;
            for (k = 0; len != 0; k++)           // Stop copying once "len" bytes have been written
            {
                writeBuf[j] = writeBuf[(oldJ - start_pos) + k]; // Read a series of previous bytes and copy them until
                j++;                                            // "len" length is met
                len--;
            }
        }
        else // Bit is clear: the next byte is plain data
        {
            //read one byte
            writeBuf[j] = readBuf[i];
            i++;
            j++;
            count++;
        }
        if (codeByteBitTest == 0) // All 8 bits of the current codeByte have been used
        {
            codeByte = readBuf[i]; // The next byte in the compressed data is the new codeByte
            i++;
            codeByteBitTest = 8;   // Reset to 8; the decrement below brings it back to 7
        }
        codeByteBitTest--; // After each byte or copied bytes are read codeByteBitTest is decreased...
    }
}
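int _tmain(int argc, _TCHAR* argv[])
{
    FILE* fp = fopen("mmx3.smc", "rb");
    FILE* wp;
    if (fp == NULL)
    {
        printf("Could not open mmx3.smc\n");
        return 1;
    }
    fseek(fp, 0x200000, SEEK_SET); // Position in the file that the compressed block is read from
    fread(readBuf, sizeof(unsigned char), 0xFFFF, fp);
    fclose(fp);
    decompress();
    wp = fopen("out.bin", "wb");   // "wb", not "w": text mode can alter bytes on Windows
    if (wp == NULL)
    {
        printf("Could not open out.bin\n");
        return 1;
    }
    fwrite(writeBuf, sizeof(unsigned char), 0xFFFF, wp);
    fclose(wp);
    return 0;
}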
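A quick way to check this reading of the routine is to feed it a tiny hand-built stream. The sketch below follows the same steps as decompress() but works on buffers passed in by the caller; the name decode_block and the sample stream are invented for illustration and are not from Euclid's program. Unlike the original, it does not fetch another codeByte once the output is complete, so it never reads past the end of a short test stream.

```c
#include <assert.h>
#include <stddef.h>

/* Same steps as decompress(), on caller-supplied buffers.
   Returns the number of bytes written to out. */
size_t decode_block(const unsigned char *in, unsigned char *out, size_t length)
{
    size_t i = 0, j = 0, count = 0;
    unsigned char codeByte = in[i++]; /* first byte is always a code byte */
    int bit = 7;                      /* bits are used from 7 down to 0 */

    while (count < length) {
        if (codeByte & (1u << bit)) {
            /* copy: 6-bit length, then a 10-bit distance back */
            size_t len = in[i] >> 2;
            size_t back = (size_t)(in[i] & 3) * 256 + in[i + 1];
            i += 2;
            assert(back >= 1 && back <= j); /* source must lie inside what was written */
            count += len;
            while (len-- > 0) {
                out[j] = out[j - back];
                j++;
            }
        } else {
            /* plain data byte */
            out[j++] = in[i++];
            count++;
        }
        if (--bit < 0 && count < length) { /* 8 cycles done: next byte is a code byte */
            codeByte = in[i++];
            bit = 7;
        }
    }
    return j;
}
```

Decoding the 5 bytes 20 41 42 18 02 with a length of 8 gives "ABABABAB": code byte 0x20 has only bit 5 set, so the first two cycles are plain bytes and the third is a copy, where 0x18 / 4 = 6 bytes are taken from (0x18 & 3) * 256 + 0x02 = 2 bytes back.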
EDIT:
Just noticed mathonNapkins wrote a compression routine. Thanks I'll check it out!
On a side note, as I was looking through some of the data, most of the time it was actually getting bigger because of all the codeBytes! I suppose it pays off in the end, given the kind of compression percentages that were mentioned. I am new to compression though, so what do I know!
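The growth is easy to put a number on. Assuming the format is as the routine above reads it, every group of up to 8 cycles costs one extra codeByte, and a copy always takes 2 bytes whatever its length. The helper names literal_only_size and copy_saving are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Size of a stream that stores n bytes as plain data only:
   n data bytes plus one code byte per group of up to 8 of them. */
size_t literal_only_size(size_t n)
{
    return n + (n + 7) / 8;
}

/* Bytes saved by one copy of len bytes compared with len plain bytes,
   ignoring code bytes. A negative result means the copy is larger. */
long copy_saving(size_t len)
{
    assert(len >= 1 && len <= 63); /* the length field is 6 bits wide */
    return (long)len - 2;          /* a copy is always 2 bytes */
}
```

With the length 0x3C0 used above, a block with no repeats at all would take 960 + 120 = 1080 bytes, 12.5% larger than the raw data. The data only shrinks once enough copies of 3 or more bytes make up for the code bytes.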
Also, what is this supposed to be? I notice that readBuf does not have an array index!
if(!blockSize)
writeBuf[k++] = readBuf;