RHDN Forum Archive - Google Translate updated

Topic: Google Translate updated

BRPXQZME
Guest

Re: Google Translate updated

« Reply #15 on: November 24, 2009, 09:50:09 pm »

Quote from: FinS on November 24, 2009, 09:04:06 pm

"You can release Japanese?" sounds strange but I wonder if that is the literal interpretation of it.

It’s confusing 話す with 離す. One of them means “speak”. One of them means “let go”.

☆HA☆NA☆SE☆

KaioShin
Guest

Re: Google Translate updated

« Reply #16 on: November 25, 2009, 03:23:58 am »

Systran has been around for ages already; it's not a new engine by any means. As far as I know, it's also the same translation engine that ATLAS is built on (the translation software, not the script inserter...). Judging from my experience with AGTH, it's definitely not near the quality required for anything but looking up a few words. It's better than Babelfish though.

BRPXQZME
Guest

Re: Google Translate updated

« Reply #17 on: November 25, 2009, 03:27:20 am »

Babelfish is Systran-based.

KaioShin
Guest

Re: Google Translate updated

« Reply #18 on: November 25, 2009, 03:29:15 am »

Quote from: BRPXQZME on November 25, 2009, 03:27:20 am

Babelfish is Systran-based.

:banghead:

Grin

Then I mixed things up. Do you know which engine Atlas is based on? I'm pretty sure it's something different then.

Orusaka
Guest

Re: Google Translate updated

« Reply #19 on: November 25, 2009, 03:41:27 am »

Quote from: BRPXQZME on November 24, 2009, 09:50:09 pm

Quote from: FinS on November 24, 2009, 09:04:06 pm

"You can release Japanese?" sounds strange but I wonder if that is the literal interpretation of it.

It’s confusing 話す with 離す. One of them means “speak”. One of them means “let go”.

☆HA☆NA☆SE☆

I think it might be confusing it with 放す, actually. Either way, it doesn't make much sense. I understand that context is hard to program for, but if it had to pick one without considering context, why not pick the alternative with the highest frequency? 話す is more frequent than 放す. It wouldn't always give the right result, but if it's going to ignore context, it would probably be the best option.

BRPXQZME
Guest

Re: Google Translate updated

« Reply #20 on: November 25, 2009, 03:54:45 am »

eh, that’ll show me to trust kotoeri again... *mumble mumble*

Well, it just sucks, and that’s that. But ultimately, kana vs. kanji is a much harder thing than mere statistics. Computers aren’t so great at figuring out where one word stops and another begins in Japanese; humans can only do it so well because we’re cacophonologists by nature.

Quote from: KaioShin on November 25, 2009, 03:29:15 am

Quote from: BRPXQZME on November 25, 2009, 03:27:20 am

Babelfish is Systran-based.

:banghead:

Grin

Then I mixed things up. Do you know which engine Atlas is based on? I'm pretty sure it's something different then.

Atlas (or perhaps more properly “Atlas II”) is its own system. What you can get today is fundamentally the same thing that went to market in 1982, except now you get the benefit of a couple decades’ worth of tinkering and UI changes, as well as some big-ass dictionaries.

Atlas is rule-based, rather than statistical.

DarthNemesis
Guest

Re: Google Translate updated

« Reply #21 on: November 25, 2009, 06:34:56 am »

Edit: whoops, didn't see that there was a second page.

FinS
Guest

Re: Google Translate updated

« Reply #22 on: November 25, 2009, 08:06:39 am »

I double checked it with RikaiChan which is a dictionary plugin for Firefox, but you probably know that since you all seem pretty familiar with this stuff. It also picks up the "to release" definition from 放す and はなす but I can see that 放 is not even in the sentence. But the next definition is "to speak" which it picks up from 語す and はなす and I can see those are all in the sentence but in different parts.

Quote from: BRPXQZME

Computers aren’t so great at figuring out where one word stops and another begins in Japanese; humans can only do it so well because we’re cacophonologists by nature.

It's too bad words or even phrases are not separated in writing. That surely would make it easier, although I'm guessing the sentence becomes too complex in some situations, like in this example.
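
To make the word-boundary problem concrete, here is a toy greedy longest-match segmenter in Python. It is only a sketch: the `LEXICON` set, the `segment` function and the sample sentence are made up for illustration, and none of this is how Google, SYSTRAN or RikaiChan actually work.

```python
# Toy lexicon (an assumption for illustration); real segmenters use large
# dictionaries plus statistical or rule-based scoring.
LEXICON = {"日本語", "日本", "語", "を", "はなす", "は", "なす"}


def segment(text, lexicon=LEXICON):
    """Greedy longest-match: at each position, take the longest known word."""
    max_len = max(len(word) for word in lexicon)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to a single character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            # Unknown single characters are emitted as-is so the loop always advances.
            if piece in lexicon or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens


print(segment("日本語をはなす"))  # ['日本語', 'を', 'はなす']
```

Note that for 日本語はなす the same lexicon also allows 日本語 / は / なす. Greedy matching just happens to prefer the longer はなす; deciding which split is actually right needs more than a dictionary.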

Orusaka
Guest

Re: Google Translate updated

« Reply #23 on: November 25, 2009, 10:08:08 am »

Quote from: FinS on November 25, 2009, 08:06:39 am

Quote from: BRPXQZME

Computers aren’t so great at figuring out where one word stops and another begins in Japanese; humans can only do it so well because we’re cacophonologists by nature.

It's too bad words or even phrases are not separated in writing. That surely would make it easier, although I'm guessing the sentence becomes too complex in some situations, like in this example.

I can't speak to how RikaiChan works, as I use a different solution. I can, however, explain what happened. It had nothing to do with word boundaries. It's true that those would be hard to program a computer to find, but that's not where it failed. It failed because it couldn't understand the context of the sentence, which is the real trick to getting automated translation to work.

The program had only はなす to go on. All the program knows is that it has to be one of:

1. 話す - to speak
2. 放す - to separate, to set free
3. 離す - to part, to divide, to separate

All three of those are homophones (words pronounced the same way, but with different meanings). Since the program doesn't know which one is the correct one, it has to guess. Now, you and I are able to do that based on the context. We know that when we see "Japanese", it must refer to speaking, and not separation etc.

This isn't a problem that's exclusive to Japanese, but it is perhaps more prominent there. There are problems with automatic translation into other languages in regards to homophones, as well.

My point, however, was merely that, if you are not going to attempt the daunting task of trying to program your software to understand contexts, or fake it rather, it would probably be better to have it guess at the most statistically likely one, which in this case would be "to speak". The three possibilities are ranked 151, 250 and 578 respectively. Those numbers are actual frequency ranks based on newspaper occurrence, meaning that the 話 in 話す is the 151st most common kanji in Japanese newspapers, so low numbers are "better". (Granted, my numbers are more than a few years old, but there is no reason to assume they have changed.)
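
The fallback described above can be sketched in a few lines of Python. This is only an illustration: the `CANDIDATES` table and the `pick_by_frequency` function are hypothetical names, and the only real data in it are the three ranks and glosses quoted in this post.

```python
# Hypothetical candidate table for the kana reading "はなす".
# Ranks are the newspaper frequency ranks quoted in the post (lower = more common).
CANDIDATES = {
    "はなす": [
        ("話す", "to speak", 151),
        ("放す", "to set free", 250),
        ("離す", "to part", 578),
    ],
}


def pick_by_frequency(reading):
    """Ignore context and return the (kanji form, gloss) with the lowest rank."""
    kanji, gloss, _rank = min(CANDIDATES[reading], key=lambda entry: entry[2])
    return kanji, gloss


print(pick_by_frequency("はなす"))  # ('話す', 'to speak')
```

It will still guess wrong whenever the rarer reading was meant, but it never picks "release" over "speak" without a reason.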

Moulinoski
Guest

Re: Google Translate updated

« Reply #24 on: November 25, 2009, 10:35:08 am »

Hmm, now I'd like to know what are the most used kanji, period.

DarknessSavior
Guest

Re: Google Translate updated

« Reply #25 on: November 25, 2009, 11:09:41 am »

Quote from: Garoth Moulinoski on November 25, 2009, 10:35:08 am

Hmm, now I'd like to know what are the most used kanji, period.

http://www.kanji-a-day.com/100kanji.php

~DS

BRPXQZME
Guest

Re: Google Translate updated

« Reply #26 on: November 25, 2009, 01:45:57 pm »

Quote from: Orusaka on November 25, 2009, 10:08:08 am

You overestimate SYSTRAN’s ability to tell word boundaries in Japanese, though.

It didn’t just fail at translating a single word; the bigger problem is that it failed to translate the grammar naturally. If you type “日本語、話せる？” you get “Japanese, you can speak?” and other bits of Tarzan-speak. Note how none of your examples used the proper English: “Can you speak Japanese?” SYSTRAN is so afraid to switch sentence order around anywhere but where it is strictly permitted that, in this case, it merely translates the sentence and then adds a question mark at the end for か. On one hand, this has a lot to do with its original focus on assisting translation for clients like the EU and the UN that need to pass around technical documents and legal crap in multiple languages. On the other hand, it makes it utter poo for conversation and literary works (like video games, unlike what Old Man Ebert may tell you), and SYSTRAN’s academic papers would be the first to mention this.

Let’s take first a simple sentence and then a complicated one from my current project (Ace Combat 3):

もはや一刻の猶予も許されない。
Either postponement of moment is not permitted already.

The “もはや” is “already” and gets shoved to the end. “一刻の猶予も” is translated “either postponement of moment”, which makes a little bit of sense translating literally, but is ultimately gibberish. “許されない” is translated “is not permitted” and is shoehorned in right before the end, which is technically fine, but the meaning, in context of the “も〜ない” is more like “cannot permit” when you put it in English, despite this not being grammatically equivalent.

Now let’s play dirty:

死因も……どうやら人工心臓のネットワークに接続されていた自動制御プログラムが、何らかの原因で暴走した事によるものと見られ、関係者の間でも大きな波紋を呼んでいます。
Cause of death ......Somehow, the automatic control program which is connected to the network of the artificial heart, it is seen as the thing due to driving recklessly with a some cause calls the big ripple even between the authorized personnel.

Well, this is where a rule-based system like SYSTRAN shines. Except for a few word->word boo-boos and a couple of grammar niggles, this is actually reasonably close to how I translated it. All that cautiousness pays off if you end up with something that remotely makes sense.

Cf. Google, just for kicks:

もはや一刻の猶予も許されない。
Not allowed time to lose anymore.

Equally bad as above, but in some different ways.

死因も……どうやら人工心臓のネットワークに接続されていた自動制御プログラムが、何らかの原因で暴走した事によるものと見られ、関係者の間でも大きな波紋を呼んでいます。
Cause of death ... ... and Control Program also had an artificial heart is connected to the network apparently believed by the runaway thing for whatever reason, is causing a big stir among the people concerned.

Close, but no cigar. The part “is causing a big stir among the people concerned” is great, but “apparently believed by the runaway thing for whatever reason” is aswingandamiss; the “による” can mean “by” or “on account of”, which is basically the same thing, except for a subtlety in English in which “by” used with the passive voice indicates that something is directly responsible (and would be the subject of the sentence were the active voice used). In this case, Google Translate botched it, because the “と” is more important here, and SYSTRAN’s “it is seen as” is much closer to the mark.

---
As to はなす, there is a simpler explanation: usually, when kana only are used instead of a common kanji, it’s because they wanted a complex kanji but didn’t want to bother fishing through the IME to select it. SYSTRAN’s rule-based system, with its gazillions of language pairs, reflects this. Google’s statistics-based system does not, and just goes with what it thinks would make sense statistically.

Moulinoski
Guest

Re: Google Translate updated

« Reply #27 on: November 25, 2009, 06:10:03 pm »

Quote from: DarknessSavior on November 25, 2009, 11:09:41 am

Quote from: Garoth Moulinoski on November 25, 2009, 10:35:08 am

Hmm, now I'd like to know what are the most used kanji, period.

http://www.kanji-a-day.com/100kanji.php

~DS

Ooh, ありがとう！！
