Author
|
Topic: Cartographer
|
byuu
Guest
|
|
« Reply #15 on: January 05, 2009, 02:04:15 pm » |
|
I'm not talking about filenames.
Neither was I; I was talking about displaying the text inside a user interface. But filenames are a huge problem, too. To take advantage of Unicode (UTF-16) on Windows using straight Win32 C ... Converting after the fact was nearly infeasible. I'll share some of the fun I had porting my apps to Unicode.

1) You need to #define UNICODE (cleaner and safer than adding Ws everywhere), which instantly breaks a few hundred to a few thousand Win32 API calls. You'll cry when GCC spits out 4,873 compilation errors.

2) It's best to write a generic wrapper to turn UTF-8 into UTF-16 on the fly:

    class utf16 {
    public:
      operator wchar_t*() { return buffer; }
      operator const wchar_t*() const { return buffer; }

      utf16(const char *s = "") {
        if(!s) s = "";
        //with a source length of -1, the returned count includes the null terminator
        unsigned length = MultiByteToWideChar(CP_UTF8, 0, s, -1, 0, 0);
        buffer = new wchar_t[length + 1]();  //zero-filled
        MultiByteToWideChar(CP_UTF8, 0, s, -1, buffer, length);
      }

      ~utf16() { delete[] buffer; }

    private:
      wchar_t *buffer;
    };

... and vice versa.

3) All of your filename passing fails; you have to convert filenames to UTF-16 first. This also breaks all your libc file access functions: fopen needs to become _wfopen, mkdir needs to become _wmkdir, etc. It also breaks all your third-party libraries: have fun patching zlib, libjma, etc.

4) int main(int argc, char *argv[]) fails. The non-ANSI parts become question marks, so even converting them to UTF-16 won't let you open the files. You need some serious black magic to get that back to valid UTF-8:

    int __stdcall WinMain(HINSTANCE, HINSTANCE, LPSTR, int) {
      //argv[] is in 7-bit ANSI format; Unicode characters are converted to '?'s.
      //this needs to be converted to UTF-8, eg for realpath(argv[0]) to work.
      int argc;
      wchar_t **wargv = CommandLineToArgvW(GetCommandLineW(), &argc);
      char **argv = new char*[argc];
      for(int i = 0; i < argc; i++) {
        argv[i] = new char[_MAX_PATH];
        strcpy(argv[i], utf8(wargv[i]));
      }
      //...
    }

5) All of these changes are Win32-specific, so you have to encapsulate all of them in #ifdef _WIN32 so that the code continues to work on pure UTF-8 systems like Linux / BSD / OS X. And if you want unified GUI text for all platforms, then 100% of your Win32 API calls need to wrap UTF-8 -> UTF-16.

The best part ... all of this could be avoided, and current apps could transparently gain Unicode support, if Windows would just accept a UTF-8 codepage with the *A functions.

The bad news: you pretty much have to do this stuff. If someone has a Windows username that isn't pure ANSI, and your app saves data inside their profile (as it should; apps are supposed to store data in the App Data folder), it will completely fail to save the data without Unicode support. This really pisses off non-English speakers, and for good reason. I had someone on 2ch asking why I hated Japanese people because I couldn't load Japanese-named ROMs >_<

The really bad news: most big-name commercial apps can't handle this, either! Winamp, Firefox 2 ... 95% of my applications failed to work at all when I used a non-English profile username.
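Points 3 and 5 can be combined into one small wrapper. The following is only a sketch, not code from the post: fopen_utf8 and widen are made-up names, and it assumes every path in the program is kept as UTF-8 and only converted at the Win32 boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

#ifdef _WIN32
#include <windows.h>
#include <wchar.h>

// Hypothetical helper: convert a null-terminated UTF-8 string to UTF-16.
static std::wstring widen(const char *s) {
  // With a source length of -1 the returned count includes the terminator.
  int length = MultiByteToWideChar(CP_UTF8, 0, s, -1, nullptr, 0);
  if(length <= 0) return std::wstring();
  std::wstring out(static_cast<std::size_t>(length), L'\0');
  MultiByteToWideChar(CP_UTF8, 0, s, -1, &out[0], length);
  out.resize(static_cast<std::size_t>(length) - 1);  // drop the counted terminator
  return out;
}
#endif

// Hypothetical wrapper: open a file whose name is given in UTF-8.
std::FILE *fopen_utf8(const char *path, const char *mode) {
  assert(path != nullptr && mode != nullptr);
#ifdef _WIN32
  return _wfopen(widen(path).c_str(), widen(mode).c_str());
#else
  return std::fopen(path, mode);  // the bytes are passed through unchanged
#endif
}
```

Callers would then use fopen_utf8(path, "rb") everywhere instead of fopen, so the #ifdef lives in one place.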
|
|
« Last Edit: January 05, 2009, 02:09:27 pm by byuu »
|
|
|
|
RedComet
Guest
|
|
« Reply #16 on: January 05, 2009, 04:26:41 pm » |
|
So.... RedComet, what did you program the utility in?
C++.
|
|
|
|
Gemini
Guest
|
|
« Reply #17 on: January 05, 2009, 05:23:19 pm » |
|
There's already a procedure to convert UTF8 to Windows' Unicode:

    #include "winldap.h"
    #pragma comment(lib, "wldap32.lib")

    int UnicodeToUtf8(CString string, char* &dest)
    {
        // first call with a size of 0 to get the required length
        int utf8len=LdapUnicodeToUTF8(string,string.GetLength(),dest,0);
        dest=(char*)new BYTE[utf8len+1];
        LdapUnicodeToUTF8(string,string.GetLength(),dest,utf8len);
        dest[utf8len]=0;    // the source length excludes the terminator, so add one
        return(utf8len);
    }

    int Utf8ToUnicode(TCHAR* &dest, char* string)
    {
        int len=LdapUTF8ToUnicode(string,(int)strlen(string),dest,0);
        dest=(TCHAR*)new TCHAR[len+1];
        LdapUTF8ToUnicode(string,(int)strlen(string),dest,len);
        dest[len]=0;
        return(len);
    }

I've been using these for almost 3 years, with no problems at all. Ok, it's Windows specific because of wldap, but I for one sure don't care. :p

You can also replace CString if anything similar is not available. LPCTSTR+wcslen should work fine for the task (in a Unicode build). So:

    int UnicodeToUtf8(LPCTSTR string, char* &dest)
    {
        int unilen=(int)wcslen(string);
        int utf8len=LdapUnicodeToUTF8(string,unilen,dest,0);
        dest=(char*)new BYTE[utf8len+1];
        LdapUnicodeToUTF8(string,unilen,dest,utf8len);
        dest[utf8len]=0;
        return(utf8len);
    }
|
|
« Last Edit: January 05, 2009, 05:30:26 pm by Gemini »
|
|
|
|
C_CliFF
Guest
|
|
« Reply #18 on: January 05, 2009, 06:39:18 pm » |
|
... am I the only one who wondered why you weren't supposed to use a value of $03 to represent a 3-byte pointer? :/
You're right. For some odd reason I accidentally read 24 bit instead of 32... It doesn't keep the program from crashing though.
-C_CliFF
|
|
|
|
Gil Galad
Guest
|
|
« Reply #19 on: January 06, 2009, 02:22:03 am » |
|
I talked to RedComet earlier today about Cartographer. So I have an example to show you guys based on the game Cadillac, which is a playing card puzzle type game. I am also calling the project. The main point of this post is to address some documentation and explain some things in order to make dumping text a bit easier.

BASE POINTER

I tried a pointer table dump without success and then discovered that, in order to dump the text of this game, I needed to subtract instead of add in the #BASE POINTER command. You need to subtract when the ROM address of the text is lower than the pointer value.

Cadillac is a mapper 3 Famicom game. For those that don't know, this mapper 3 game has one 32KB PRG (Program ROM) bank. The range of the data in the ROM file would be 10h - 800Fh. The text data is at 186Fh - 1F39h.

To get the real address of the data, you add $8000, because that's the address that the bank starts at, and subtract the 10h-byte header: 186Fh + 8000h - 10h = $985F. So $985F is the address, and if you flip the bytes around, 5F98 is the pointer as stored in the ROM. This adds up right if you use the SetOff pointer calculation.

Next is the pointer table location. I will show you the commands in the file.

#POINTER TABLE START: $1F3A
#POINTER TABLE STOP: $1FB1

The first two bytes of the pointer table are 5F98. Those two bytes are the correct pointer for the first line of text in this block.

Based on the way that I have my command files set up, here is how Cartographer normally works. In #BASE POINTER, you take the modifier address and either add it to or subtract it from the pointer to find the ROM address. The readme file only says that you can add, but you can also subtract. You know that 8000 is the SetOff calculation, so based on the way that Cartographer works, if you add $8000 to the pointer, the program is going to crash or not function as intended: $8000 + $985F = 1185Fh, which is way out of bounds of the PRG bank and the NES address range.

Here is the way around it. Instead, subtract $8000 from $985F, which equals 185Fh, near the intended location. Now, here is where it gets a bit weird. You also have to account for the header size: in this case you subtract the header (10h) from the BASE POINTER modifier, which gives $7FF0. So your new BASE POINTER modifier would be -$7FF0, and $985F - $7FF0 equals 186Fh, which is the correct ROM offset.

Table Files

Here are a couple of tips for table files. Make sure that you remove all bookmarks from the table files, as well as anything else that is not supported by Cartographer. Make sure the line and end break codes are at the bottom of the file. There are two types that I have used: one is for raw and the other for relative pointers. You can check out the differences in the material that I am going to provide. The end line codes in your table file should be something like this:

FE=[LINE]\r
FF=[END]\n\r

You can use the \r and \n as you wish. Now, for the RELATIVE POINTER table:

FE=[LINE]\r
/FF=[END]\n\r

Dumps

For RAW dumps, make sure your table files are correct, or the dump will either fail or come out messed up. I also removed the #END BLOCK command in the raw dump file so that I could dump the text.

In closing, some of these things were already documented. However, the things I talked about are based on my experience and how I solved some of the crashing issues. And the case of a ROM address lower than the pointer also needed to be discussed, I believe. Here are the files for you guys to look at. Some of these files are unedited and directly from Cartographer. HERE
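The arithmetic above, written out as a small sketch. The function names are made up for illustration, and the 10h-byte header and the $8000 bank address are the values from the Cadillac example, not general constants.

```cpp
#include <cassert>
#include <cstdint>

// Read a 2-byte little-endian pointer: the bytes 5F 98 become $985F.
inline std::uint32_t read_pointer_le16(const std::uint8_t *p) {
  return static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 8);
}

// Signed modifier for a single bank: header size minus the address the bank
// is mapped at. For Cadillac: 10h - 8000h = -7FF0h.
inline std::int64_t base_pointer_modifier(std::int64_t bank_address, std::int64_t header_size) {
  return header_size - bank_address;
}

// Apply the signed modifier to a pointer value to get a ROM file offset.
inline std::int64_t apply_base(std::uint32_t pointer, std::int64_t base) {
  std::int64_t offset = static_cast<std::int64_t>(pointer) + base;
  assert(offset >= 0);  // a negative result means the modifier is wrong
  return offset;
}
```

With the values from the post, apply_base(0x985F, -0x7FF0) gives 0x186F, while adding $8000 instead gives 0x1185F, which is past the end of the 32KB bank.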
|
|
|
|
C_CliFF
Guest
|
|
« Reply #20 on: January 06, 2009, 07:22:13 am » |
|
Thank you! It works now. I just tried with FF5, which uses 3-byte pointers. The text is stored at 21020D, so to get the pointer table value, you subtract 200 and add C00000: 21020D - 200 + C00000 = E1000D. E1000D is the first pointer value, the one for 21020D. What I did was subtract 200 from C00000: C00000 - 200 = BFFE00. To make sure I got to the right address where the text is stored, I subtracted BFFE00 from the pointer table value: E1000D - BFFE00 = 21020D. That worked fine, so these are my commands, if you're going to handle games that use standard 3-byte pointers:

#GAME NAME: Final Fantasy 5 (SNES)

#BLOCK NAME: Dialogue Block (RAW) // this raw block extracts fine
#TYPE: NORMAL
#METHOD: RAW
#SCRIPT START: $21020D
#SCRIPT STOP: $21FE16
#TABLE: ff5_raw.tbl
#COMMENTS: Yes
#END BLOCK

#BLOCK NAME: Dialogue Block (POINTER_RELATIVE)
#TYPE: NORMAL
#METHOD: POINTER_RELATIVE
#POINTER ENDIAN: LITTLE
#POINTER TABLE START: $2015F0
#POINTER TABLE STOP: $20205F
#POINTER SIZE: $03
#POINTER SPACE: $00
#ATLAS PTRS: Yes
#BASE POINTER: -$BFFE00 // C00000 - 200 = BFFE00. E1000D - BFFE00 = 21020D
#TABLE: ff5_ptr.tbl
#COMMENTS: Yes
#END BLOCK

I wasn't aware that you could subtract, only add, so now it works. Thank you, Gil Galad, for clearing that up!
-C_CliFF
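To check a #BASE POINTER value before running a dump, the first few pointer table entries can be decoded by hand. The sketch below is not Cartographer's code: decode_pointer_table is a made-up name, and it assumes #POINTER SPACE means the number of bytes skipped between entries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: decode every little-endian entry of a pointer table
// and add a signed base to each one, giving ROM file offsets.
std::vector<std::int64_t> decode_pointer_table(const std::vector<std::uint8_t> &table,
                                               std::size_t pointer_size,
                                               std::size_t pointer_space,
                                               std::int64_t base) {
  assert(pointer_size >= 1 && pointer_size <= 4);
  std::vector<std::int64_t> offsets;
  const std::size_t stride = pointer_size + pointer_space;
  for(std::size_t pos = 0; pos + pointer_size <= table.size(); pos += stride) {
    std::uint32_t value = 0;
    for(std::size_t i = 0; i < pointer_size; i++) {
      // least significant byte comes first
      value |= static_cast<std::uint32_t>(table[pos + i]) << (8 * i);
    }
    offsets.push_back(static_cast<std::int64_t>(value) + base);
  }
  return offsets;
}
```

For the FF5 block, the first entry E1000D is stored as the bytes 0D 00 E1, and decoding it with a pointer size of 3, a space of 0 and a base of -0xBFFE00 gives 0x21020D, the start of the script.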
|
|
« Last Edit: January 06, 2009, 09:35:11 am by C_CliFF »
|
|
|
|
Rappa
Guest
|
|
« Reply #21 on: February 13, 2009, 10:02:47 am » |
|
I'm translating Fire Emblem 3 and using Script Insertor/extractor. It's easy to use, but it does not support pointer recalculation, so I must do that manually. That's why I tried Cartographer by RedComet to dump the text in FE3. My table includes entries like this:
[code]
0000=[END]\n\r
0001=[LINE]\r
01=あ
02=い
03=う
....
F5=父
F6=行
F7=戦
....
120E=身
120F=愛
1210=姉
....
13AE=特
13AF=殊
13B0=離
....
148E=泣
148F=欲
1490=河
1491=音
[/code]
As you can see, FE3 uses two kinds of kanji: a 1-byte kind (ending at FF) and a 2-byte kind (beginning with 00 followed by 12, 13 or 14; I mean in the ROM it would be 0012XX or 0013XX...).

When I dumped the text, I had two problems:

1. The 1-byte kanji and the kana (1 byte) get confused. Say we have 12=kana1, 4F=kana2, 124F=Kanji1. Instead of displaying 124F as kana1 kana2, it displays Kanji1. How to remedy this?

2. I started dumping at the beginning of the sentence, including control codes, and the result looks like this:

//<$00>ダ[END]
//ヂさグ<$00>パあ<$00>ジいきマルス王子[LINE]
//タリス城からシ-ダ様が[LINE]
//来られました<$00>ヅ<$00>ズう<$00>パ[END]
//ジ<$00>えどうしたんだ!シ-ダ[LINE]
//城で何か あったのか<$00>ヅ<$00>パあ<$00>ジあきマルス様 会え相かった<$00>ヅ[LINE]
//ガルダの海賊が[LINE]
//<$00>突<$00>然<$00>ち おそっ録たの<$00>ヅ[LINE]
//お城も<$00>占ん<$00>ちされて[LINE]
//大ぜいの人が殺されたわ[LINE]
//おねがい!お父様を<$00>助<$00>ちけて<$00>ヅ<$00>パ[END]

I wonder where I went wrong? Here I include the table file for you to download, in case you want to check it.
http://www.mediafire.com/?juhumkzmkzy

Control codes explained
0000 = end the text (the pointer reads until this one)
0001 = line break
008800 = begin the talk
0089XX80 = change music to melody XX
00920X0084YYZZ = open a dialogue box, where X is 0 (top) or 1 (bottom), YY is the character's portrait ID, and ZZ is 04 (top left), 05 (top right), 06 (bottom left) or 07 (bottom right)
008A = pause until a key is pressed, then clear the previous text
00920X0002 = switch to the other character, with X=0 for the top character, X=1 for the bottom character
00850X = close the dialogue box, X=2 for top, X=3 for bottom
....

The text I dumped begins at $00071A2E.
Can anyone check this for me? Thank you in advance.
This is a result of script extractor/insertor. Sometimes it works wrongly!
|
|
« Last Edit: February 15, 2009, 10:46:32 am by Rappa »
|
|
|
|
Tauwasser
Guest
|
|
« Reply #22 on: February 13, 2009, 12:26:38 pm » |
|
(begins with 00 and 12,13,14 array. I mean in the Rom it would be 0012XX or 0013XX...)
But in your table file, you only wrote 12XX or 13XX. If the game has a 00 in front of each of those characters, then you'll need the 00 in the table entries, too. Also, if your game has it like 0012XX12XX12XX0013XX0012XX12XX0013XX13XX or something, then it's probably really hard to get Cartographer to know what you mean.
Also, please do not post ANSI text; it was converted to gibberish, so no one really gets your point...
cYa,
Tauwasser
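Why 12 4F comes out as Kanji1, and why putting the 00 into the table entries helps, can be shown with a toy dumper. This is a sketch under an assumption: that the dumper always takes the longest table entry matching at the current position, which is consistent with what Rappa describes but is not taken from Cartographer's source. The names dump, Bytes and Table are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
using Table = std::map<Bytes, std::string>;

// Toy dumper: at each position, try the longest table key first.
std::string dump(const Bytes &rom, const Table &table, std::size_t max_key_length) {
  assert(max_key_length >= 1);
  std::string out;
  std::size_t pos = 0;
  while(pos < rom.size()) {
    bool matched = false;
    for(std::size_t len = std::min(max_key_length, rom.size() - pos); len >= 1; len--) {
      const auto first = rom.begin() + static_cast<std::ptrdiff_t>(pos);
      const Bytes key(first, first + static_cast<std::ptrdiff_t>(len));
      const auto it = table.find(key);
      if(it != table.end()) {
        out += it->second;
        pos += len;
        matched = true;
        break;
      }
    }
    if(!matched) {
      // no entry matches: emit the raw byte as <$XX>
      static const char hex[] = "0123456789ABCDEF";
      out += "<$";
      out += hex[rom[pos] >> 4];
      out += hex[rom[pos] & 0x0F];
      out += ">";
      pos++;
    }
  }
  return out;
}
```

With entries 12, 4F and 124F, the bytes 12 4F come out as the kanji. With the kanji entered as 00124F instead, the same two bytes come out as two kana, and only 00 12 4F gives the kanji.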
|
|
|
|
Kajitani-Eizan
Guest
|
|
« Reply #23 on: February 13, 2009, 02:56:32 pm » |
|
1. The 1byte Kanji and the Kanas (1byte) are confused. Say, we have 12=kana1, 4F=kana2, 124F=Kanji1. Instead of displayed 124F as kana1 kana2, it displayed Kanji1. How to remedy this?
now, correct me if i'm wrong here, but isn't that the expected behavior?
|
|
|
|
Rappa
Guest
|
|
« Reply #24 on: February 14, 2009, 01:31:04 am » |
|
Also, please do not post ANSI text, it was converted to gibberish, so noone really gets your point... cYa,
Oh sorry. I wrote this in Unicode and opened the file with Notepad on another PC. It turned out like this. I'll correct the gibberish soon.
|
|
|
|
KaioShin
Guest
|
|
« Reply #25 on: February 14, 2009, 03:51:27 am » |
|
1. The 1byte Kanji and the Kanas (1byte) are confused. Say, we have 12=kana1, 4F=kana2, 124F=Kanji1. Instead of displayed 124F as kana1 kana2, it displayed Kanji1. How to remedy this?
now, correct me if i'm wrong here, but isn't that the expected behavior?
Yeah... how would the game be able to distinguish between these? This doesn't make much sense.
|
|
|
|
Tauwasser
Guest
|
|
« Reply #26 on: February 14, 2009, 08:40:10 am » |
|
and 2bytes kind (begins with 00 and 12,13,14 array. I mean in the Rom it would be 0012XX or 0013XX...)
As he said above, IMO it probably needs a 00 in front of it. However, the 00 code could just be a "clear kanji set" or something and 12 would be "use kanji set 12 from now on". So it might have lots of variations or something. Like, kana0012XXYYZZ0013XXZZYY where XX, YY, ZZ are always kanji codes in the 12/13 set and not kana... We need more info for this one...
cYa,
Tauwasser
|
|
|
|
Rappa
Guest
|
|
« Reply #27 on: February 15, 2009, 10:50:06 am » |
|
I corrected the ANSI.
I think 00 is a kind of trigger byte. It signals that the bytes that follow are kanji. You can see this in FE3 and FE4. But I'm OK with it now, because I already have the script. I just wonder what I would do if I didn't have it, so this problem still needs to be solved.
Anyway, I'm having some trouble with Atlas. It seems that Atlas hates Unicode tables!
|
|
|
|
hanhnn
Guest
|
|
« Reply #28 on: September 24, 2009, 09:27:50 pm » |
|
Why can't I find this tool in the Utilities section?
|
|
|
|
Nightcrawler
Guest
|
|
« Reply #29 on: September 25, 2009, 08:12:16 am » |
|
We've told RedComet about this at least 3 or 4 times. I don't know why he doesn't add it to the database, especially after he recommends people to use it and is a staff member on this site. It defies logic. I don't believe he's ever given any reason for withholding.
Care to comment RedComet? I'm calling you out!
|
|
|
|
|