I can't speak on how RikaiChan works, as I use a different solution. I can, however, explain what happened. It had nothing to do with word boundaries. It's true that those would be hard to program a computer to detect, but that's not where it failed. It failed because it couldn't understand the context of the sentence, which is the real trick to getting automated translation to work.
You overestimate SYSTRAN’s ability to tell word boundaries in Japanese, though.
It didn’t just fail at translating a single word; the bigger problem is that it failed to translate the grammar naturally. If you type “日本語、話せる？” you get “Japanese, you can speak?” and other bits of Tarzan-speak. Note how none of your examples used the proper English: “Can you speak Japanese?” SYSTRAN is so afraid to switch sentence order around anywhere but where it is strictly permitted that, in this case, it merely translates the sentence and then adds a question mark at the end for か. On one hand, this has a lot to do with its original focus on assisting translation for clients like the EU and the UN that need to pass around technical documents and legal crap in multiple languages. On the other hand, it makes it utter poo for conversation and literary works (like video games, unlike what Old Man Ebert may tell you), and SYSTRAN’s academic papers would be the first to mention this.
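To make the word-order point concrete, here is a toy Python sketch. It is emphatically not how SYSTRAN works internally: the two-entry gloss list and both function names are made up for illustration. It only contrasts glossing the words in source order with applying the subject-auxiliary inversion that an English question needs.

```python
# Toy illustration only: hand-written glosses, not SYSTRAN's lexicon or rules.
# Each pair is (Japanese token, assumed English gloss).
TOKENS = [("日本語", "Japanese"), ("話せる", "can speak")]


def literal_in_order(tokens, question=True):
    """Gloss the tokens in source order and tack the punctuation on the end."""
    obj, verb = tokens[0][1], tokens[1][1]
    # The implied subject "you" is inserted, but nothing is reordered.
    sentence = f"{obj}, you {verb}"
    return sentence + ("?" if question else ".")


def reordered(tokens, question=True):
    """Apply English word order: modal first for questions, object last."""
    obj, verb = tokens[0][1], tokens[1][1]
    modal, main = verb.split(" ", 1)  # "can speak" -> "can", "speak"
    if question:
        return f"{modal.capitalize()} you {main} {obj}?"
    return f"You {modal} {main} {obj}."


print(literal_in_order(TOKENS))  # Tarzan-speak
print(reordered(TOKENS))         # natural English
```

The first function reproduces the “Japanese, you can speak?” shape; the second gets to “Can you speak Japanese?” only because it is allowed to move words around.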
Let’s first take a simple sentence and then a complicated one from my current project (Ace Combat 3):
もはや一刻の猶予も許されない。
Either postponement of moment is not permitted already.
The “もはや” is “already” and gets shoved to the end. “一刻の猶予も” is translated “either postponement of moment”, which makes a little bit of sense translating literally, but is ultimately gibberish. “許されない” is translated “is not permitted” and is shoehorned in right before the end, which is technically fine, but the meaning, in the context of the “も〜ない”, is more like “cannot permit” when you put it in English, despite this not being grammatically equivalent.
Now let’s play dirty:
死因も……どうやら人工心臓のネットワークに接続されていた自動制御プログラムが、何らかの原因で暴走した事によるものと見られ、関係者の間でも大きな波紋を呼んでいます。
Cause of death ......Somehow, the automatic control program which is connected to the network of the artificial heart, it is seen as the thing due to driving recklessly with a some cause calls the big ripple even between the authorized personnel.
Well, this is where a rule-based system like SYSTRAN shines. Except for a few word-for-word boo-boos and a couple of grammar niggles, this is actually reasonably close to how I translated it. All that cautiousness pays off if you end up with something that remotely makes sense.
Cf. Google, just for kicks:
もはや一刻の猶予も許されない。
Not allowed time to lose anymore.
Just as bad as above, but in different ways.
死因も……どうやら人工心臓のネットワークに接続されていた自動制御プログラムが、何らかの原因で暴走した事によるものと見られ、関係者の間でも大きな波紋を呼んでいます。
Cause of death ... ... and Control Program also had an artificial heart is connected to the network apparently believed by the runaway thing for whatever reason, is causing a big stir among the people concerned.
Close, but no cigar. The part “is causing a big stir among the people concerned” is great, but “apparently believed by the runaway thing for whatever reason” is a swing and a miss; the “による” can mean “by” or “on account of”, which is basically the same thing, except for a subtlety in English in which “by” used with the passive voice indicates that something is directly responsible (and would be the subject of the sentence were the active voice used). In this case, Google Translate botched it, because the “と” is more important here, and SYSTRAN’s “it is seen as” is much closer to the mark.
---
As to はなす, there is a simpler explanation: usually, when only kana are used instead of a common kanji, it’s because the writer wanted a complex kanji but didn’t want to bother fishing through the IME to select it. SYSTRAN’s rule-based system, with its gazillions of language pairs, reflects this. Google’s statistics-based system does not, and just goes with what it thinks would make sense statistically.
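Here is a rough Python sketch of that difference, with everything in it assumed for illustration: the candidate list, the counts, the `everyday` flag, and both function names are invented, and neither SYSTRAN nor Google is claimed to work this way internally. The “rule” is just my guess at the heuristic described above: if the word was written in kana, assume the writer was dodging a less everyday kanji.

```python
# Invented candidates for kana-only はなす.
# Counts and the "everyday" flag are made up for illustration.
CANDIDATES = [
    {"kanji": "話す", "gloss": "speak", "count": 900, "everyday": True},
    {"kanji": "放す", "gloss": "release", "count": 70, "everyday": False},
    {"kanji": "離す", "gloss": "separate", "count": 30, "everyday": False},
]


def statistical_pick(candidates):
    """Pick whichever reading is most frequent overall."""
    return max(candidates, key=lambda c: c["count"])


def rule_based_pick(candidates, written_in_kana):
    """Hypothetical rule: kana-only spelling hints at a less everyday kanji."""
    if written_in_kana:
        rare = [c for c in candidates if not c["everyday"]]
        if rare:
            # Among the less everyday readings, still prefer the likeliest one.
            return max(rare, key=lambda c: c["count"])
    return statistical_pick(candidates)


print(statistical_pick(CANDIDATES)["gloss"])       # most frequent reading
print(rule_based_pick(CANDIDATES, True)["gloss"])  # the rule's choice
```

With these made-up numbers the statistical pick is “speak” and the rule picks “release”, which is the kind of disagreement between the two systems described above.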