Topic: Translation via spreadsheets
Klarth
Guest
« on: June 03, 2010, 01:09:43 am »
As we all know, ROM hacking's primary translation route has been text files for its entire history. However, text files don't lend themselves well to data hiding, organization of information, or tracking changes. At the suggestion of others such as Carnivol and satsu, I propose that we move our text storage system to spreadsheets. The process will work like so:

Dump Text -> Format Script (i.e., Atlas commands) -> Convert Script to Spreadsheet -> Translate Script Using Spreadsheet -> Revise Script Using Spreadsheet -> Convert Spreadsheet to Script -> Insert Script

A utility would be developed for the text<->spreadsheet conversion. However, before such a utility can be made, there are a lot of design considerations on the spreadsheet side:
- Which spreadsheet program(s) will be supported?
- What is the feature set for the spreadsheet? (Word count for a column, % of translation complete, etc.)
- If the target spreadsheet doesn't support these features, will an external program provide them?
- Will the spreadsheet enable better translation collaboration?
- Will the spreadsheet have user access control?
- Will the spreadsheet have a way to track revisions?

The goal is to keep things simple. A standalone program will handle the conversion to and from the spreadsheet. Either code embedded in the spreadsheet or the standalone program will implement a set of translation-specific spreadsheet functionality.

I first evaluated MS Excel, since it's the most commonly used spreadsheet program. There are three methods I'm aware of that could be used. The first is integrating VBA code directly into the spreadsheet. The second is creating an Excel Add-In using C/C++. The last is creating an external program in C#/.NET that uses Excel Automation. VBA would allow some sort of user access control (since it's embedded within the spreadsheet), whereas the others don't; the spreadsheet would contain a separate worksheet with a UI for useful translation functions. An Add-In would keep the interface components within Excel, and may allow user access control with a clever solution. Excel Automation would be a last resort, as the functional environment would be outside of Excel. None of these solutions has built-in collaboration.

The option most promising to me so far: Google Spreadsheets. I haven't looked much into it (I just discovered it a few days ago), but an owner can upload a spreadsheet (after conversion from a text file) and share it with other users (collaboration and user access control). It can be edited online (free, web-based software), tracks revisions, supports multiple sheets per workbook, and supports scripting functionality. This method is also safe from a known translation evil: crashing hard drives.

The Google Spreadsheets option means that I need to create a script conversion program that generates a spreadsheet, possibly a file containing various script code to add to Google Spreadsheets, and the ability to turn the spreadsheet back into a text file. Opinions or suggestions on this?
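The text<->spreadsheet conversion in the middle of that pipeline can be sketched in a few lines. This is a minimal Python sketch, assuming a hypothetical script layout in which each entry is a `//[address]` line, a commented-out original line, and an uncommented translation line, with entries separated by blank lines. Real Atlas scripts carry commands that a real converter would have to pass through untouched, and all names here are made up for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical script layout (an assumption for this sketch, not an Atlas rule):
#   //[80085]             <- message address
#   //original text<end>  <- commented-out original
#   translated text<end>  <- translation (may be missing)
# Entries are separated by blank lines.
ADDRESS_RE = re.compile(r"^//\[([0-9A-Fa-f]+)\]$")


@dataclass
class Row:
    address: str
    original: str
    translation: str


def script_to_rows(script: str) -> list[Row]:
    """Text -> spreadsheet: one row per message."""
    script = script.replace("\r\n", "\n")  # tolerate Windows line endings
    rows = []
    for block in script.strip().split("\n\n"):
        lines = [line for line in block.splitlines() if line.strip()]
        if not lines:
            continue  # tolerate runs of extra blank lines
        match = ADDRESS_RE.match(lines[0])
        if match is None:
            raise ValueError(f"entry does not start with an address: {lines[0]!r}")
        original = lines[1].removeprefix("//") if len(lines) > 1 else ""
        translation = lines[2] if len(lines) > 2 else ""
        rows.append(Row(match.group(1), original, translation))
    return rows


def rows_to_script(rows: list[Row]) -> str:
    """Spreadsheet -> text: rebuild the commented script for insertion."""
    blocks = [f"//[{r.address}]\n//{r.original}\n{r.translation}" for r in rows]
    return "\n\n".join(blocks) + "\n"


def percent_translated(rows: list[Row]) -> float:
    """A '% of translation complete' figure a sheet could display."""
    if not rows:
        return 0.0
    done = sum(1 for r in rows if r.translation.strip())
    return 100.0 * done / len(rows)
```

The rows would then be written to whichever spreadsheet format is chosen; keeping the conversion as a pair of inverse functions makes it easy to check that a round trip through the spreadsheet loses nothing.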
DarknessSavior
Guest
« Reply #1 on: June 03, 2010, 01:12:03 am »
I think if you're going to try and do things in this direction, you'd better make sure you have support for Open Office. Not only is it free, but it isn't web-based like Google Spreadsheets.
~DS
Klarth
Guest
« Reply #2 on: June 03, 2010, 01:21:32 am »
I'm not too interested in adding full support for Open Office. As the initial exported format will be some sort of XML spreadsheet, it will be usable in it. I don't have any plans to add the extra functionality for more than one target app. You do bring up a good point about it being web-based. Some users may have to download a copy from Google Spreadsheets due to a lack of internet access where they work. I'm not sure how cleanly re-uploading the spreadsheet would work, but it wouldn't be a problem unless several people were working on it at the same time.
satsu
Guest
« Reply #3 on: June 03, 2010, 08:21:42 am »
I think this is a wonderful idea and I'm glad you've brought it up. I've been doing professional game localisation for a few years now, and pretty much all the software localisation I've done has been in some kind of spreadsheet, be it Excel XLS format, in-house tools, tools using XML, or other systems. This, for me, is one of the main points where fan and professional game localisations differ (and I'm sure I'm opening myself up to sarky comments about lazy, uncaring developers/crappy translators). I've found it very hard to go back to plain-text scripts; I find them hard to read, primitive and unwieldy. The format of the scripts used for professional localisations is an area that fan translation can learn a great deal from.

Spreadsheet-based scripts bring many benefits to the table, such as:
- Data is efficiently organised rather than represented as one long stream. Information such as the message address in the ROM, the actual message, comments &c. is separated, making it easier to read and easier to manipulate. For example, you can perform operations on the entire message column without touching the message addresses or comments. This opens up a variety of possibilities. Hate typing out tags? Copy and paste the Japanese column into the English column, and in a few simple operations you can have just the tags by themselves, waiting to be lovingly ensconced in English text.
- It enables you to use formulas, macros &c. For example, you can use macros to avoid overruns by counting pixels/characters. You don't need to create special editors for each game and so on.
- You can have multiple languages side-by-side. For example, you could have a Japanese column next to the English without the need to delete or comment out the original text. Again, this makes it easier to read.

Sample:
Address | Japanese | English | Comment
80085 | 鋼のパンツ<end> | SteelPants<end> | British English "pants"
01134 | ヘコキムシ<end> | Farts McGee<end> | Literally "farting insect"

Isn't that a whole lot better-organised than this?
//[80085]
//鋼のパンツ<end>
SteelPants<end>
//British English "pants"

//[01134]
//ヘコキムシ<end>
Farts McGee<end>
//Literally "farting insect"

Additionally, going for some kind of XML spreadsheet is a great option, as you can use a variety of editors. I have used Excel to edit XML sheets in the past, which allowed me to use macros and formulas and easily manipulate the text without wrecking its structure. It's a very versatile solution; it doesn't lock you down to a single editor. Any editor is fine as long as it respects the structure of the original XML.
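Two of the spreadsheet tricks described above, reducing a copied Japanese cell to its bare tags and flagging overruns by counting characters, boil down to small text functions that a macro or external tool could run per cell. This is a minimal Python sketch, assuming tags are anything in angle brackets, a hypothetical `<line>` break tag, and a hypothetical 18-character line limit; a real overrun check would sum per-glyph pixel widths from the game's font instead.

```python
import re

# Assumption: control codes are written in angle brackets, e.g. <end>.
TAG_RE = re.compile(r"<[^<>]*>")


def tag_skeleton(text: str) -> str:
    """Drop the visible text and keep only the tags (Japanese -> English column)."""
    return "".join(TAG_RE.findall(text))


def visible_length(text: str) -> int:
    """Number of characters that will actually be drawn (tags excluded)."""
    return len(TAG_RE.sub("", text))


def overrunning_lines(text: str, max_chars: int = 18, line_break: str = "<line>") -> list[int]:
    """Indices of lines whose visible length exceeds max_chars.

    max_chars and the <line> break tag are hypothetical placeholders; a real
    check would use the game's window width and per-glyph pixel widths.
    """
    return [
        i
        for i, line in enumerate(text.split(line_break))
        if visible_length(line) > max_chars
    ]
```

For example, `tag_skeleton("鋼のパンツ<end>")` yields just `<end>`, ready to be filled in with English text.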
« Last Edit: June 03, 2010, 12:10:05 pm by sa♥tsu »
Nightcrawler
Guest
« Reply #4 on: June 03, 2010, 09:47:19 am »
I was going to say I didn't really understand the payoff from the seemingly large load of work, but Satsu cleared that up. He also hit my other point. For this to work as nicely as it sounds, the tools aren't there. You'd need more than just a conversion from text<->spreadsheet. Satsu is using an in-house tool suite. None of us have that luxury. We're left with what? MS Office, Open Office, Google Spreadsheet? First, I'm not sure any of them are great script editing tools. Second, one is costly, one is more obscure but free, and one is online. Picking just one of the three to target doesn't seem like it will really make the impact you're trying to make or reach as far.

Next, it complicates the whole process. You're adding an extra conversion utility to the mix. The dumper should output directly to whatever format is used for editing the script, and the inserter should then be able to work with that format as well. Complicating the process further doesn't make sense to me, unless the reason is that anyone with a dumper that can at least make text files could potentially use the spreadsheet format. But there are going to be rules for the converter, so if they're not met ahead of time, that's out anyway.

I can see the benefits, but there seem to be some big hurdles to really getting it to take off. Dumping the script to an XML-based format to begin with might be the better part of the concept presented here. Of course, it's more difficult to edit that way without a tool to interpret and edit with, which circles back to the tool suite issue.

This leads me back to my thoughts on a movement to standardize as many things as possible from the ground up (such as the table file), so we can start developing compatible tools and building a suite of tools to use without sacrificing freedom and choice of tools. Standards exist everywhere else for these reasons, yet we as a ROM Hacking community largely ignore the issue. Judging by the reaction to patching format standards over the last few years, I see little hope of achieving this goal. But maybe there will be some things the majority can agree on elsewhere. Or maybe it's the few who can deliver the goods who make the decisions, and everybody else lives with them. I'd definitely like to start seeing interchangeable/compatible tool suites being made.
« Last Edit: June 03, 2010, 11:45:28 am by Nightcrawler »
DaMarsMan
Guest
« Reply #5 on: June 03, 2010, 10:48:00 am »
What about skipping the conversion and making a spreadsheet editor that opens Atlas files? If we put enough features into it, it could work.
EDIT: Actually, now that I think about it... what if we could code an extension or plugin for Office/Open Office that could do this? That could be another option.
« Last Edit: June 03, 2010, 10:53:05 am by DaMarsMan »
satsu
Guest
« Reply #6 on: June 03, 2010, 12:09:37 pm »
Quote from: Nightcrawler
I was going to say I didn't really understand the payoff from the seemingly large load of work, but Satsu cleared that up.
He also hit my other point. For this to work as nicely as it sounds, the tools aren't there. You'd need more than just a conversion from text<->spreadsheet. Satsu is using an in-house tool suite. None of us have that luxury. We're left with what? MS Office, Open Office, Google Spreadsheet? First, I'm not sure any of them are great script editing tools. Second, one is costly, one is more obscure but free, one is online. Picking just one of the three to target doesn't seem like it will really make the impact you're trying to make and reach as far.

Actually, quite a lot of the work that I've done has been in Excel files. Many developers use generic spreadsheets. Additionally, what is a great script editing tool? In the context of a game translation, I would say that Excel/spreadsheet is certainly a better script editing tool than Notepad/text file, for the reasons I've discussed above. I did find translating and editing in spreadsheets a bit strange at first, but now it's second nature to me. If Klarth were, theoretically, to update Atlas to support dumping to and inserting from XML, that'd already be a huge boost. I'm pretty sure most modern spreadsheet programs would be able to handle it.
Klarth
Guest
« Reply #7 on: June 03, 2010, 12:25:23 pm »
I understand the extra complexity of running another program, but I think the benefits of translation productivity far outweigh the hassle. The idea of creating a conversion program is indeed to keep things compatible with the mainstream and is not an ideal solution. Not many people write custom script utilities based on XML, and there are no public utilities that dump or insert XML AFAIK (or maybe there's one). I was considering writing XML utilities several years ago, but decided against it mainly due to incompatibilities with existing utilities. Back then I only saw potential for multilingual translation management and didn't foresee XML spreadsheets boosting translator productivity.
I believe creating the converter for this is going to be relatively straightforward. Maybe I'll spend a couple hours and program a prototype so you guys can see how it works. Might help some ideas pop up too.
I'm all for a standardized table format. I do see some difficulties in the shift to Unicode, but it should be done eventually. A new table standard would give me a reason to rewrite TableLib in Unicode.
DaMarsMan, the code involved in a plug-in for Office/Open Office would be equivalent to the code used in a standalone converter, minus the plug-in interfacing code itself. The script file would still need to be completely converted to a spreadsheet-friendly format.
madmalkav
Guest
« Reply #8 on: June 03, 2010, 08:27:17 pm »
As far as I know, professional translation memory software takes a similar approach to presenting data to the user. Take a look at DejaVu X if you can; it is the software of choice for one of my professional translator friends. If something half as powerful exists under a non-commercial license, it would be worth trying to go directly for that instead of plain spreadsheets.
Nightcrawler
Guest
« Reply #9 on: June 04, 2010, 08:56:40 am »
Quote from: Klarth
I understand the extra complexity of running another program, but I think the benefits of translation productivity far outweigh the hassle. The idea of creating a conversion program is indeed to keep things compatible with the mainstream and is not an ideal solution.

OK. Logical enough. History shows us we're reluctant to change in this community, so compatibility is certainly important in order to get anybody to adopt and use it. Even then, we've seen things fall flat. If it catches on, you can always create dumpers/inserters that use the format natively later on.

Quote from: Klarth
I believe creating the converter for this is going to be relatively straightforward. Maybe I'll spend a couple hours and program a prototype so you guys can see how it works. Might help some ideas pop up too.

That will probably go further than almost anything you can say on the subject. You know the old saying: a picture is worth a thousand words. What's to lose? The worst that can happen is that it's out there for someone to use if they want to. The best-case scenario is that it revolutionizes the way things are done. Try not to make it a command line only converter.

Quote from: Klarth
I'm all for a standardized table format. I do see some difficulties in the shift to Unicode, but it should be done eventually. A new table standard would give me a reason to rewrite TableLib in Unicode.

It's already starting to be hashed out over at TransCorp. I thought it best to start by assembling all of the features used in table files by various utilities, determining whether they belong in a table file, and then analyzing support for them. The idea is to standardize what we already have and try to stay generally compatible with existing utilities, but still move forward and eliminate the ambiguity, the problems, and the need to alter your table for each utility you use. This is in contrast to an approach of coming up with something entirely new, which would have a hard time getting off the ground.
« Last Edit: June 04, 2010, 12:36:37 pm by Nightcrawler »
Markliujy
Guest
« Reply #10 on: June 04, 2010, 09:11:56 am »
I'm not sure why you'd ever use text files, except that they are the simplest things: you don't need anything special to open them. But a spreadsheet is definitely more logical from the translator's point of view, and I'm pretty sure the dumping program uses arrays/lists or whatever anyway.
Did you consider the CSV format? It seems the easiest to do, and it's pretty much supported everywhere since it's literally a text file. It should also be relatively simple to convert a text file to a CSV table.
And then, if someone wants collaboration, version control and user access control, they can use Google Docs, or just edit it with their favorite spreadsheet program. On top of that, you can create Excel add-ons, OpenOffice add-ons, Google Docs add-ons, standalone programs, etc.
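A CSV round trip takes little more than a standard CSV library; the main traps are quoting and text encoding. This is a minimal Python sketch using the standard `csv` module, with hypothetical column names. Writing with the `utf-8-sig` codec prepends a byte-order mark, which Excel generally uses to recognise UTF-8 (without it, Excel tends to assume the local ANSI code page); the `csv` module also quotes fields containing commas or double quotes, which a naive split on commas would get wrong.

```python
import csv

# Hypothetical column layout for a dumped script.
COLUMNS = ["address", "japanese", "english", "comment"]


def write_csv(path: str, rows: list[dict[str, str]]) -> None:
    # "utf-8-sig" writes a byte-order mark first so spreadsheet programs can
    # detect UTF-8; newline="" lets the csv module handle line endings.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)  # fields with commas/quotes are quoted automatically


def read_csv(path: str) -> list[dict[str, str]]:
    # "utf-8-sig" strips the BOM if present and also accepts files without one.
    with open(path, encoding="utf-8-sig", newline="") as f:
        return list(csv.DictReader(f))
```

Because every column comes back as a plain string, addresses such as `01134` keep their leading zero in this round trip, although a spreadsheet program may still reinterpret them as numbers when the file is opened there.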
rmco2003
Guest
« Reply #11 on: June 04, 2010, 12:57:10 pm »
Whenever I do hacking projects, I usually use XML along with my own frontend interface for translation. XML has a lot of useful features for storing scripts; for example, translators' notes can go in XML comments for a certain tag. It's quite trivial to implement XML readers/writers in .NET too. :thumbsup:
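Reading translators' notes back out of XML comments needs a parser that keeps comments, which not every XML API does by default. This is a minimal Python sketch using the standard `xml.etree.ElementTree`, which drops comments unless it is given a `TreeBuilder(insert_comments=True)` (Python 3.8+). The `<script>`/`<string id>` layout and the convention that a note sits directly before its string are assumptions for illustration. Note that in-game tags like `<end>` have to be escaped as `&lt;end&gt;` inside XML text.

```python
import xml.etree.ElementTree as ET


def notes_by_id(xml_text: str) -> dict[str, str]:
    """Map each <string id="..."> to the XML comment directly before it.

    The element names are hypothetical. ElementTree discards comments by
    default, so the parser gets a TreeBuilder that keeps them.
    """
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    notes: dict[str, str] = {}
    pending = None
    for child in root:
        if child.tag is ET.Comment:  # kept comments appear as elements with this tag
            pending = (child.text or "").strip()
        elif child.tag == "string":
            if pending is not None:
                notes[child.get("id")] = pending
            pending = None  # a note applies only to the next string
    return notes
```

Given a `<!-- Literally "farting insect" -->` comment placed just before `<string id="01134">`, the function returns that note keyed by `"01134"`.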
Nightcrawler
Guest
« Reply #12 on: June 05, 2010, 11:31:38 am »
If XML is the direction all this ends up going, I think it would make sense to start talking about the details of an XML file standard soon. XML is pretty broad. What seems to have been established so far is that any Atlas-compatible text script can be converted to and from this new format. The details of the new format should be established as soon as Klarth provides a basic example to look at as a basis for discussion. The idea, of course, would be that future utilities could use or be compatible with this format.
Klarth
Guest
« Reply #13 on: June 05, 2010, 02:22:07 pm »
Quote from: Nightcrawler
Try not to make it a command line only converter.

The finished product would absolutely be a GUI program. :p

As far as CSV goes, it appears that Excel is unable to export Unicode text (at least UTF-8 with Japanese characters) correctly. That, or there's some hidden "feature" to correct this. I have yet to test how well Google Docs works with this. (It supports XLS, XLSX, ODS, CSV, and TXT... which means no SpreadsheetML.)

I was hoping to have something to show by now, but unfortunately the XML sample I have probably can't be released, as the source script isn't mine. It was rather easy to convert an Atlas script (with commented Japanese and uncommented English text; I didn't preserve the original comments, though) to an XML file using .NET's XmlSerializer. However, I'm fairly unseasoned in the XML game (I made one program in C++/MSXML for a configuration file half a decade ago), and generating SpreadsheetML for Excel with XmlSerializer is a bit daunting to me.

So I think the proper approach would be to define a standard XML encoding and then use XSLT to translate that standard to SpreadsheetML or other formats. Unfortunately, that involves creating two XSLTs per supported end format (one for each direction), and it's kinda disappointing Google doesn't support a brand of XML spreadsheet. But it'd lead to better longevity, as end users can write new XSLTs when other programs (such as Excel) change their XML schemas.

So at the moment, a neutral XML format would work for Excel at least. Google Docs wouldn't be supported (XLS and XLSX aren't the same as the SpreadsheetML .XML files Excel supports), but CSV import/export is pretty trivial. And I have no clue whether Open Office supports an XML spreadsheet or not.

The script->XML conversion I did earlier would've evolved (after fixes) into a root node of <script> with subnodes of <stringentity id="num">, each containing <atlas> and <string language="en or jp">. I haven't figured out exactly where to put comments, because for the Excel XSLT I might just attach the comment to the cell (still viewable) rather than create an entire column specifically for comments. It's a pretty basic XML format, and I don't foresee it becoming that complex. Had it been generated by a dumper, I would've added things like the start address of the string... and possibly some other insertion-relevant information.
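The neutral format described above (a `<script>` root, `<stringentity id="num">` children, each holding `<atlas>` and `<string language="...">`) is small enough to sketch directly. This minimal Python sketch builds it with the standard `xml.etree.ElementTree` rather than .NET's XmlSerializer, and flattens it to spreadsheet-style rows in code as a stand-in for what an XSLT would do. The element and attribute names and the `en`/`jp` codes come from the post; the function names and row layout are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_script(entries: list[dict[str, str]]) -> ET.Element:
    """Build <script><stringentity id><atlas/><string language/>...</script>."""
    root = ET.Element("script")
    for num, entry in enumerate(entries):
        entity = ET.SubElement(root, "stringentity", id=str(num))
        ET.SubElement(entity, "atlas").text = entry.get("atlas", "")
        for language in ("jp", "en"):
            node = ET.SubElement(entity, "string", language=language)
            node.text = entry.get(language, "")  # "<end>" is escaped on output
    return root


def to_rows(root: ET.Element) -> list[tuple[str, str, str, str]]:
    """Flatten to (id, atlas, jp, en) rows, the job an XSLT would otherwise do."""
    rows = []
    for entity in root.iter("stringentity"):
        texts = {s.get("language"): s.text or "" for s in entity.findall("string")}
        atlas = entity.findtext("atlas", default="")
        rows.append((entity.get("id"), atlas, texts.get("jp", ""), texts.get("en", "")))
    return rows
```

Serialising with `ET.tostring(root, encoding="unicode")` and parsing the result back with `ET.fromstring` gives the same rows, which is the property a converter in either direction would need to preserve.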
Tauwasser
Guest
« Reply #14 on: June 05, 2010, 03:15:45 pm »
Quote from: Klarth
As far as CSV goes, it appears that Excel is unable to export Unicode text (at least UTF-8 with Japanese characters) correctly.

There is no formal definition of CSV files. Hence, Excel exports using the ANSI code page, but it can import UTF formats as well. You can write a small script that exports it as CSV, or just save as "Unicode Text", which saves as TSV (tab-separated values) and should be as easy to process as CSV.

If you're thinking about coining a general XML format for this, maybe start a community poll or something. I know that I would like to include references in there, e.g. one piece of text pointing to identical text elsewhere, so that it is only dumped the first time and referenced every other time. I remember some games also call text inside of strings...

cYa, Tauwasser
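Both suggestions above are easy to prototype. This minimal Python sketch assumes that Excel's "Unicode Text" export is UTF-16 with a byte-order mark and tab-separated columns (the `utf-16` codec consumes the BOM), and illustrates the reference idea with a hypothetical entry layout: the first occurrence of a text is stored in full and later duplicates only point back to the address where it was first dumped. The function names and the `text`/`ref` keys are made up for illustration.

```python
import csv


def read_unicode_text(path: str) -> list[list[str]]:
    # Assumption: UTF-16 with a BOM and tab-separated columns, as produced by
    # Excel's "Unicode Text" save option.
    with open(path, encoding="utf-16", newline="") as f:
        return list(csv.reader(f, delimiter="\t"))


def dedupe(messages: list[tuple[str, str]]) -> list[dict[str, str]]:
    """Dump each distinct text once; later copies only reference the first.

    messages: (address, text) pairs in dump order.
    """
    first_seen: dict[str, str] = {}  # text -> address of its first occurrence
    entries = []
    for address, text in messages:
        if text in first_seen:
            entries.append({"address": address, "ref": first_seen[text]})
        else:
            first_seen[text] = address
            entries.append({"address": address, "text": text})
    return entries
```

With references, a translator only sees and translates each repeated line once; on insertion, a `ref` entry would be written from the translation stored at the referenced address.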