Topic: Translation via spreadsheets
Nightcrawler
Guest
« Reply #15 on: June 05, 2010, 05:16:03 pm »

Quote from: Klarth on June 05, 2010, 02:22:07 pm
Quote from: Nightcrawler on June 04, 2010, 08:56:40 am
Try not to make it a command line only converter. :P
So at the moment, a neutral XML format would work for Excel at least.  Google Docs wouldn't be supported (XLS and XLSX aren't the same as the SpreadsheetML .XML files Excel supports), but CSV import/export is pretty trivial.  And I have no clue if Open Office supports an XML spreadsheet or not.

The script->XML I did earlier would've evolved into (after fixes) a root node of <script> and subnodes of <stringentity id="num"> (which contains <atlas>, <string language="en or jp">).  Haven't figured out exactly where to put comment, because for the Excel XSLT, I might just attach the comment to the cell (still viewable) rather than create an entire column specifically for comments.  It's a pretty basic XML format and I don't foresee it becoming that complex.  Had it been generated by a dumper, I would've added some things like start address of the string...and possibly some other insertion-relevant information.

First, why are we confining ourselves to Japanese and English? Wouldn't it be better to use something like Source Language and Destination Language? A translation can be for any two languages even though it's most common to see Japanese and English.

Second, just because it wasn't generated by a dumper doesn't mean data fields such as the address or other insertion-relevant information should be left out. That's why I suggested steering this into a discussion about standardizing the XML format. Ideally, you would want a dumper to be able to generate this format and an inserter to be able to insert directly from it. I realize you're just starting out and this is just a quick example, but these things should definitely be kept in mind and hashed out going forward.
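
To make both points concrete, here is a rough Python sketch of what an entry could look like with neutral source/target languages and dumper data attached. The element names <script>, <stringentity> and <string> come from Klarth's description; the sourcelang, targetlang and address attributes are purely my assumptions for illustration, not a proposal for the final names.

```python
import xml.etree.ElementTree as ET

def build_script(entries, source_lang, target_lang):
    """Build a <script> tree. Attribute names here are illustrative only."""
    root = ET.Element("script", {"sourcelang": source_lang, "targetlang": target_lang})
    for i, entry in enumerate(entries):
        node = ET.SubElement(root, "stringentity", {"id": str(i)})
        # Insertion-relevant data a dumper could record (hypothetical attribute).
        if entry.get("address") is not None:
            node.set("address", f"0x{entry['address']:06X}")
        # One <string> per language, tagged with whatever pair is in use.
        src = ET.SubElement(node, "string", {"language": source_lang})
        src.text = entry["source"]
        dst = ET.SubElement(node, "string", {"language": target_lang})
        dst.text = entry.get("target", "")
    return root

root = build_script(
    [{"address": 0x1A2B3, "source": "はい", "target": "Yes"}],
    source_lang="ja",
    target_lang="en",
)
print(ET.tostring(root, encoding="unicode"))
```

Nothing in the structure cares which two languages are involved, and a hand-made script can simply leave the address out.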
Tauwasser
Guest
« Reply #16 on: June 05, 2010, 05:33:57 pm »

More than two languages could be useful for people who translate from a given script into a third language. If you are going to use language tags, though, please don't make stuff up; use ISO 639 codes as defined by BCP 47 (Best Current Practice 47). For instance, your example uses en for English but jp for Japanese. It should be ja for Japanese, because JP is the country code for Japan. Mixing the two confuses people and decreases interoperability.
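
A tool could clean up tags on import. This is only a sketch in Python: the correction table and the function name are my own invention, it handles nothing beyond a language subtag plus an optional two-letter region, and it is nowhere near a full BCP 47 validator.

```python
# Hypothetical correction table: country codes commonly mistaken for language codes.
COMMON_MISTAKES = {"jp": "ja"}

def normalize_language_tag(tag):
    """Lowercase the language subtag, fix known mistakes, uppercase a 2-letter region."""
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    language = COMMON_MISTAKES.get(language, language)
    # By convention the language subtag is lowercase and the region is uppercase.
    rest = [p.upper() if len(p) == 2 and p.isalpha() else p for p in parts[1:]]
    return "-".join([language] + rest)

for raw in ("jp", "en_us", "JA-jp"):
    print(raw, "->", normalize_language_tag(raw))
```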

Also, don't forget versioning. Version all your XML files, so that when you parse them you can give a version error instead of failing or crashing. When the underlying XML grammar changes within the stability requirements, you will still be able to parse it correctly for the most part, but an older version of the program can give an error if it doesn't support the grammar version the file was produced with.
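
A version check along those lines might look like this in Python. The version attribute on the root node, the "major.minor" scheme and the error class are assumptions on my part, since none of that has been specified yet.

```python
import xml.etree.ElementTree as ET

SUPPORTED_MAJOR = 1  # hypothetical: the newest grammar version this tool understands

class UnsupportedVersionError(Exception):
    """Raised instead of crashing on a file this tool cannot interpret."""

def load_script(xml_text):
    root = ET.fromstring(xml_text)
    version = root.get("version")
    if version is None:
        raise UnsupportedVersionError("file has no version attribute")
    try:
        major = int(version.split(".")[0])
    except ValueError:
        raise UnsupportedVersionError(f"unreadable version {version!r}") from None
    # Minor bumps are assumed to stay compatible; a newer major version is rejected.
    if major > SUPPORTED_MAJOR:
        raise UnsupportedVersionError(
            f"file uses grammar version {version}, this tool supports {SUPPORTED_MAJOR}.x"
        )
    return root
```

The point is only that the check happens before any of the real parsing, so the user gets a readable message.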

All Atlas-specific data can be stored in tags as well, so don't leave that out either. Also, user-defined fields that don't pertain to Atlas in any specific way might be desirable, so people can use the files outside of an Atlas context and store information that is relevant to them in there. Of course there might be other standard fields that could be defined for Atlas to insert in a special way; "speaker" would be such a field. Knowing the speaker helps the translator, and for games that display it, Atlas can insert the script in the right fashion as long as the format for that tag is provided.

cYa,

Tauwasser
madmalkav
Guest
« Reply #17 on: June 06, 2010, 08:29:22 am »

Back with my translation memory software fixation:

http://en.wikipedia.org/wiki/Translation_Memory_eXchange , an XML specification that perhaps can be useful, or at least give some ideas.

http://developers.sun.com/dev/gadc/technicalpublications/articles/xliff.html , another XML specification for localization, by Sun.
« Last Edit: June 06, 2010, 09:06:16 am by madmalkav »
Klarth
Guest
« Reply #18 on: June 06, 2010, 03:32:52 pm »

Just finished a (very) WIP build.  There are a few discrepancies vs what I'd like to see, but that's partially due to coding difficulties.  The GUI is very much WIP too...it'll be more developed when more details are worked out.  At the moment, it handles Atlas files (stripping the comments for now) and basic scripts (hardcoded string delimiter of "<END>" at the moment).  I included a sample script (FF1, English dump) since I can't release the combined J and E text files I have (though I do have them working in a separate build, since they require a slightly different parser).  You can run the FF1 script yourself and it'll generate a generic XML file.  It automatically uses the xml2excel.xslt file I made (my first one, too) to generate a SpreadsheetML .XML for Excel (it contains a few hardcoded items, which'll be phased out).  It does, however, hide all Atlas commands (the entire column as well as rows) by default, something that might change, especially when I get around to more VBA.  Ensure your text encoding is Unicode/UTF-8 if you are using a Japanese script, else it won't show up in Excel.
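
For anyone who wants the gist of the basic script handling without running the build, here's a rough Python sketch of the idea. It is not the code from the WIP: the function names are made up, and the output only mirrors the <script>/<stringentity>/<sourcetext> nodes discussed in this thread.

```python
import xml.etree.ElementTree as ET

DELIMITER = "<END>"  # same hardcoded string delimiter as the WIP build

def split_script(text, delimiter=DELIMITER):
    """Split a dumped script into strings, dropping empty trailing chunks."""
    chunks = text.split(delimiter)
    # Trim the line breaks around each string but keep the ones inside it.
    return [c.strip("\r\n") for c in chunks if c.strip()]

def script_to_xml(text):
    """Wrap each string in a <stringentity> with a <sourcetext> child."""
    root = ET.Element("script")
    for i, string in enumerate(split_script(text)):
        entity = ET.SubElement(root, "stringentity", {"id": str(i)})
        ET.SubElement(entity, "sourcetext").text = string
    return root

sample = "Hello\nworld<END>\n\nBye<END>\n"
print(ET.tostring(script_to_xml(sample), encoding="unicode"))
```

When reading a real file, open it with encoding="utf-8" so Japanese text survives the trip.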

You can download the WIP from: Script2Excel

I'm worrying about working code first.  I really don't care about the difference between ja and jp when there's no real code to support it yet.  And of course the spec would support multiple languages (in one document, even).  If the translator doesn't want to view the other languages, he/she can just right click the column and hide it.  At the moment it uses <sourcetext> (which is kinda meant to be from a dumper, but it works too) and eventually it'll be something like <sourcetext lang="en">.  There are a lot of options to organize and nothing is set.

I couldn't get XmlSerializer to output proper line breaks inside of nodes for Excel (it wants &#10;, which gets escaped to &amp;#10;), so I wrote a quick macro to properly resize the row heights without unhiding the Atlas commands.  You'll first have to enable word wrap on the column whose rows you're resizing.

Code:
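Sub ResizeRowHeights()
    ' Auto-fit the height of every visible row in the current selection.
    ' Hidden rows (the Atlas command rows) are skipped so they stay hidden.
    Dim Row As Range

    For Each Row In Selection.EntireRow
        If Row.Hidden = False Then
            Row.AutoFit
        End If
    Next Row

End Sub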
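
As for the &#10; escaping itself, the same pitfall is easy to reproduce outside .NET. The sketch below uses Python's ElementTree rather than XmlSerializer, so it only illustrates the general problem and one possible workaround: keep a real line break in the text and swap in the character reference after serializing. The helper name is made up, and it assumes the tree was built in memory with no indentation whitespace between nodes.

```python
import xml.etree.ElementTree as ET

def serialize_with_linefeed_refs(elem):
    """Serialize, then turn literal line breaks into &#10; character references."""
    xml = ET.tostring(elem, encoding="unicode")
    # Safe only because nothing else in this serialized tree contains a newline.
    return xml.replace("\n", "&#10;")

# Pitfall: typing the reference into the text gets the ampersand escaped.
pitfall = ET.Element("Data")
pitfall.text = "Line one&#10;Line two"
print(ET.tostring(pitfall, encoding="unicode"))

# Workaround: store a real line break and patch the serialized output.
cell = ET.Element("Data")
cell.text = "Line one\nLine two"
print(serialize_with_linefeed_refs(cell))
```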

Once we get the form down some, we can probably start testing Google Docs (which can import/export Excel files) to see if my proposed benefits materialize.  The XML is still admittedly ugly, but I just wanted to put out a proof of concept, which wasn't that hard to do.
« Last Edit: June 06, 2010, 03:38:55 pm by Klarth »
DaMarsMan
Guest
« Reply #19 on: June 07, 2010, 01:45:51 pm »

Why don't we have a second file that contains all the conversion settings? This way you could run the batch on the script/excel and point it to the conversion settings. This would also be useful if you had multiple conversion types.

Maybe some sort of syncing method is needed. I can just picture overwriting a script file by accident with an older revision.
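
One simple guard, sketched in Python: keep a revision counter in the file and refuse to write over a newer one. The revision attribute, the function names and the error class are all hypothetical; nothing like this exists in the tool yet.

```python
import os
import xml.etree.ElementTree as ET

class StaleRevisionError(Exception):
    """Raised when a write would replace a newer file with an older revision."""

def read_revision(path):
    # A file without the (hypothetical) revision attribute counts as revision 0.
    return int(ET.parse(path).getroot().get("revision", "0"))

def safe_write(root, path):
    """Write the tree unless the file already on disk has a higher revision."""
    new_rev = int(root.get("revision", "0"))
    if os.path.exists(path):
        old_rev = read_revision(path)
        if old_rev > new_rev:
            raise StaleRevisionError(
                f"{path} is at revision {old_rev}, refusing to write revision {new_rev}"
            )
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
```

A GUI could catch the error and ask for confirmation instead of refusing outright.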
Klarth
Guest
« Reply #20 on: June 08, 2010, 11:50:15 am »

Quote from: DaMarsMan on June 07, 2010, 01:45:51 pm
Why don't we have a second file that contains all the conversion settings? This way you could run the batch on the script/excel and point it to the conversion settings. This would also be useful if you had multiple conversion types.

Maybe some sort of syncing method is needed. I can just picture overwriting a script file by accident with an older revision.
Haven't decided upon how to store the conversion settings as these will vary depending on the target program.  In any case, the user will be able to provide an XSLT file for transforming the standard XML to a target program (probably just modding the supplied XSLT).  The current UI is just a test interface.  A final UI would definitely prompt you so that you won't overwrite a script file by accident.

So the next step in this process is to develop a DTD (Document Type Definition) or XSD (XML Schema Definition) to draft the XML more clearly for review (so my program will follow the spec, rather than the spec following my program).  Does anybody have any input on which should be used?  I used DTD back in 2003, but I don't know which of those two routes I should go as I'm not a regular in the XML world.
Ruairc
Guest
« Reply #21 on: August 11, 2010, 01:24:09 am »

I know this is an old topic but I just wanted to say I really really want this.

Is there any new information or WIPs?
Klarth
Guest
« Reply #22 on: August 25, 2010, 04:43:03 pm »

Sorry for the delayed response.  I didn't see the new post.

At the moment, there has been no new progress due to a relative lack of interest in the project.  I am currently playing around with a test project and once I figure out the text encoding and/or compression scheme on it, said game will be used as a sample to help iron out design issues in the spreadsheet project (and to pretty it up some).  I wouldn't expect an official release for at least another two months if things are going well.  I'd hate to release a program that creates an Excel spreadsheet, have someone use it to translate a script, then realize that some feature of Excel they used breaks the conversion process back to plain text.