Tenshi No Uta was similar in nature, and it's why I started the universal routine to begin with. A different text engine was used for every kind of text, and multiple text engines were used on the same screen! A single menu screen could use, say, 3 or 4 different text engines. A typical battle could see, say, 5 or 6, depending on what you do. After I implemented 2 or 3 VWFs and discovered how many more there were, I'd had enough! There's no sense rewriting the same basic code over and over. That's software design 101. If you find yourself doing it, that indicates a problem in your design!
What's even more interesting is that Tenshi No Uta's lineage is Wolf Team, who did Tales of Phantasia. Several of those guys went on to Tri-Ace, who did Star Ocean, if I'm not mistaken. I imagine there's probably a loose connection of people between the two.
As far as practicality and speed on the SNES go, mileage will vary. There are several text engine scenarios on the SNES (sprite text, straight tilemap text, dynamically drawn text, etc.). The universal routine is definitely fine for dialog, where you typically print one character per frame. It's probably OK for anything where the screen is already drawn dynamically. It's generally OK for basic menu screens too, as long as they are fairly static and don't fill the entire screen with walls of text.
Where you run into issues is when the original font is stored entirely in VRAM and the original text routine just writes tilemap entries that point straight at that font. In that situation the entire screen is typically rendered in a single frame and refreshed every frame. That's where you really suffer, especially if the screen is completely filled with text, such as a scrolling full-screen item list. It can be difficult to VWF that situation properly even with a specialized routine, and in some cases it's not possible at all because you don't have enough VRAM. Add in the overhead of the universal routine and I've found it very difficult to get acceptable speeds. The more flexible I make my routine, the more overhead it incurs, so some balance must be struck. It will certainly not be ideal for all cases. That's just not possible.
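To put rough numbers on why the full-screen case hurts, here is a back-of-the-envelope sketch. It assumes 8x8 tiles, a 32x28 tile visible screen, the SNES's 64 KB of VRAM, and a ballpark of about 6 KB of DMA per vblank. That last figure is only my assumption for illustration and varies a lot from game to game. The function names are made up for this example.

```python
# Rough budget for VWF-rendering a full screen of text on the SNES.
# Every constant here is an assumption for illustration, not a measurement.

SCREEN_TILES = 32 * 28             # 256x224 visible area in 8x8 tiles
VRAM_BYTES = 64 * 1024             # total SNES VRAM
DMA_BYTES_PER_VBLANK = 6 * 1024    # assumed ballpark; varies per game


def tile_bytes(bpp):
    """Bytes needed for one 8x8 tile at the given bit depth."""
    return 8 * 8 * bpp // 8


def tilemap_cost(tiles=SCREEN_TILES):
    """Fixed-width case: only 2-byte tilemap entries change per refresh."""
    return tiles * 2


def vwf_cost(bpp, tiles=SCREEN_TILES):
    """VWF case: every tile is unique, so its pixel data must be uploaded.

    Returns (bytes of tile data, vblanks needed at the assumed DMA rate).
    """
    total = tiles * tile_bytes(bpp)
    frames = -(-total // DMA_BYTES_PER_VBLANK)  # ceiling division
    return total, frames
```

Under those assumptions, the original fixed-width screen only has to write `tilemap_cost()` = 1792 bytes of tilemap, while a full VWF screen needs 14336 bytes of unique 2bpp tile data (3 vblanks to upload) or 28672 bytes at 4bpp (5 vblanks, and close to half of VRAM). That is before counting any CPU time spent actually drawing the glyphs.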
I am experimenting with several ideas to do a better job across all of these situations. Rather than a true universal routine, I could go more modular, where you plug in the correct function calls from a library of pieces designed to work together for the situation you need. It's not as user friendly (all you do now is define a handful of starting variables and a few things like which values should drop out of the text rendering loop), but it's more practical. Another idea is that perhaps there should be three VWFs to rule them all, one for each major type (sprite, straight tilemap, dynamic). Then each could be optimized for its target, greatly reducing overhead.
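As a loose illustration of the modular idea, here is a minimal Python model, not the actual routine and not SNES code. A shared core packs 1bpp glyphs side by side, and a backend chosen per situation decides where the finished tiles go. The toy font, the terminator set, the backend names, and the addresses are all hypothetical.

```python
# Hypothetical model: one shared VWF core plus pluggable output backends.
# Glyphs are 1bpp and 8 rows tall; each row byte holds the glyph's pixels
# in its top `width` bits. None of these names come from the real routine.

FONT = {  # toy font: char -> (pixel width, 8 row bytes)
    "i": (2, [0x80, 0x00, 0x80, 0x80, 0x80, 0x80, 0x80, 0x00]),
    "m": (6, [0x00, 0x00, 0xD0, 0xA8, 0xA8, 0xA8, 0xA8, 0x00]),
}
TERMINATORS = {"\0", "\n"}  # values that drop out of the rendering loop


def render_line(text, font=FONT, terminators=TERMINATORS):
    """Core: pack glyphs with no gaps. Returns (8 row ints, pixel width)."""
    rows, x = [0] * 8, 0
    for ch in text:
        if ch in terminators:
            break
        width, bits = font[ch]
        for y in range(8):
            # Keep only the glyph's own columns and append them to the row.
            rows[y] = (rows[y] << width) | (bits[y] >> (8 - width))
        x += width
    return rows, x


def to_tiles(rows, width):
    """Split the packed rows into 8-pixel-wide tiles (8 bytes each)."""
    n_tiles = -(-width // 8)        # ceiling division
    padded = n_tiles * 8
    tiles = []
    for t in range(n_tiles):
        shift = padded - 8 * (t + 1)
        tiles.append([((r << (padded - width)) >> shift) & 0xFF for r in rows])
    return tiles


def tilemap_backend(tiles, base_tile=0x100):
    """Straight tilemap target: one map entry per freshly drawn tile."""
    return [base_tile + i for i in range(len(tiles))]


def sprite_backend(tiles, x=16, y=32, base_tile=0):
    """Sprite target: one 8x8 sprite per tile, laid out left to right."""
    return [(x + 8 * i, y, base_tile + i) for i in range(len(tiles))]


def dynamic_backend(tiles, dest=0x4000):
    """Dynamic target: queue raw tile bytes for upload to a made-up address."""
    return [(dest + 8 * i, bytes(t)) for i, t in enumerate(tiles)]


def draw(text, backend, **options):
    """Shared entry point: the core renders, the chosen backend places."""
    tiles = to_tiles(*render_line(text))
    return tiles, backend(tiles, **options)
```

A call like `draw("imi", tilemap_backend)` renders 10 pixels of text into two tiles and returns the two tilemap entries for them. Swapping in `sprite_backend` or `dynamic_backend` changes only the placement step. The trade-off is the one described above: the caller has to know which pieces to plug together, but each backend stays small enough to optimize for its own target.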
So, there's definitely plenty left to do. In the end, it may not be something very useful to others. As an end user, you have to weigh the time you'd need to learn my system well enough to use it effectively against the time saved by not doing it yourself. You already need to be an intermediate hacker to figure out where this routine needs to go and to configure it properly. If the learning curve keeps climbing and asks more of the user, and the results may still be slow, one might be more inclined to just write it oneself. We'll see where this experiment goes. It will definitely be useful to me, if nothing else.