Well, you play the game in your emulator, then run the program. It reads the emulator's RAM (or some other kind of memory) while the text is on screen, then translates that text into the target language using preset text.
HOW would that be accomplished? The answer is why this hasn't been done. Where to read in RAM or VRAM (when text is present) is, for the most part, game specific. What works on one game will likely not work on another. Also, there are countless ways to put the text together and get it on the screen. The same game could even use several different methods to create text, at the same time, on the same screen. The text may be sprites, it may be tiles on one layer, or it may be spread across several layers. On some consoles, such as the SNES, per-scanline effects may be applied that you can't duplicate externally. There always needs to be manual involvement to figure out how game X does its text and where to look for it.
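To make the "game specific" part concrete, here's a rough sketch in Python. Everything in it is made up for illustration: the TextProfile fields, the offsets, and the tile tables are assumptions, not taken from any real game or emulator. It only covers the simplest case (text stored as a plain tilemap in a memory dump) and ignores sprites, multiple layers, and scanline effects entirely.

```python
from dataclasses import dataclass

@dataclass
class TextProfile:
    """Hypothetical per-game description of where on-screen text lives."""
    tilemap_offset: int   # byte offset of the text tilemap in the memory dump
    length: int           # number of tilemap entries to read
    bytes_per_entry: int  # entry size; differs between games and consoles
    tile_to_char: dict    # game-specific table: tile index -> character

def decode_text(memory: bytes, profile: TextProfile) -> str:
    """Read tilemap entries and map them to characters using the profile."""
    chars = []
    for i in range(profile.length):
        start = profile.tilemap_offset + i * profile.bytes_per_entry
        raw = memory[start:start + profile.bytes_per_entry]
        entry = int.from_bytes(raw, "little")
        chars.append(profile.tile_to_char.get(entry, "?"))  # "?" = unknown tile
    return "".join(chars)

# Two invented games that store the same word in different ways.
game_a = TextProfile(0x10, 2, 1, {0x01: "H", 0x02: "I"})
game_b = TextProfile(0x20, 2, 2, {0x0180: "H", 0x0181: "I"})

dump = bytearray(0x40)
dump[0x10:0x12] = bytes([0x01, 0x02])
dump[0x20:0x24] = (0x0180).to_bytes(2, "little") + (0x0181).to_bytes(2, "little")
dump = bytes(dump)

text_a = decode_text(dump, game_a)
text_b = decode_text(dump, game_b)
# Reading game B's text area with game A's entry size and table gives garbage.
wrong = decode_text(dump, TextProfile(0x20, 2, 1, game_a.tile_to_char))
```

Each profile decodes its own game's text fine, but applying game A's assumptions to game B's memory produces junk. Someone has to work out those numbers and tables by hand for every game.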
With that said, you could code a custom emulator set up to use information supplied about each game to do what you desire. The limitations are that the end result only works on your special emulator, that it can only use information your emulator knows how to interpret (which may not be enough for every game), and that it only covers games people have reverse engineered enough to support.
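As a rough sketch of what that "information supplied about each game" could look like on the emulator side, again in Python and again purely hypothetical: the registry, the function names, and the idea of keying games by a SHA-1 of the ROM are my assumptions, not how any existing emulator works.

```python
import hashlib

# Hypothetical registry: ROM SHA-1 -> {original string: preset translation}
TRANSLATIONS = {}

def register_game(rom: bytes, strings: dict) -> str:
    """Store the preset translations for one specific ROM image."""
    key = hashlib.sha1(rom).hexdigest()
    TRANSLATIONS[key] = strings
    return key

def translate(rom: bytes, on_screen_text: str):
    """Return the preset translation, or None if the game is unsupported."""
    table = TRANSLATIONS.get(hashlib.sha1(rom).hexdigest())
    if table is None:
        return None  # nobody has reverse engineered this game yet
    # Supported game but unknown string: fall back to the original text.
    return table.get(on_screen_text, on_screen_text)

# Stand-in byte strings instead of real ROM images.
supported_rom = b"fake rom image A"
unsupported_rom = b"fake rom image B"
register_game(supported_rom, {"HAJIMERU": "START"})

known = translate(supported_rom, "HAJIMERU")
unknown_string = translate(supported_rom, "TSUDZUKERU")
unknown_game = translate(unsupported_rom, "HAJIMERU")
```

Note how much is missing: a game nobody has registered gets nothing at all, and even a supported game only translates the strings someone has already entered by hand.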
Lastly, the time it would take someone to reverse engineer a game enough to fully translate it with your special emulator is likely not far off from the time needed for a 'traditional' translation. A 'traditional' translation would work on all emulators (and never become outdated or permanently broken) and would allow for much more flexibility. So that's the basic reason why the idea is a bit impractical and why no one has done it, despite many people asking about it over the years.