The PPU is optimized for left-to-right or top-to-bottom writing: the VRAM address auto-increments by 1 or by $20 after each write.
So unfortunately, writing in the reverse direction would make for a slower routine, which may or may not fit within the VRAM access window (VBlank).
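To make the "1 or $20" point concrete, here is a minimal Python sketch of the PPU's auto-increment. It assumes only bit 2 of PPUCTRL ($2000) matters (clear = +1, set = +32) and ignores everything else the register does. `vram_addresses` is a hypothetical helper for illustration, not part of any tool.

```python
def vram_addresses(start: int, count: int, ppuctrl: int = 0x00) -> list:
    """Addresses hit by `count` consecutive $2007 writes after setting `start` via $2006.

    Simplified model (assumption): only PPUCTRL bit 2 is honoured and the
    address wraps at 14 bits. Note there is no decrementing mode to select.
    """
    step = 0x20 if ppuctrl & 0x04 else 0x01  # +32 moves down a row, +1 moves right
    addr = start & 0x3FFF
    out = []
    for _ in range(count):
        out.append(addr)
        addr = (addr + step) & 0x3FFF
    return out


print([hex(a) for a in vram_addresses(0x2080, 4)])        # left to right
print([hex(a) for a in vram_addresses(0x2080, 4, 0x04)])  # top to bottom
```

Since neither step moves leftward, a right-to-left routine has to rewrite $2006 before every byte, which is where the extra cost comes from.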
(Though I guess I'm not sure exactly how critical timing on the NES usually is. I just have bad memories of Dragon Scroll: I had to replace the VRAM buffer routines just to write four freakin' extra bytes to VRAM (widening the window by 2 tiles) without causing flicker.)
I don't think this would make a big difference. If the text is being printed letter by letter, the routine probably idles for a few frames anyway. Maybe I'm underestimating it, though; I've only coded stuff like this from scratch rather than working it into existing code.
I really didn't get it. Can you explain more?
Try following these tutorials:
http://www.nintendoage.com/forum/messageview.cfm?catid=22&threadid=7155
They're written for developing your own game, but the skills they teach can be applied to ROM hacking. Basically, take the example loop from the background2 tutorial:
LoadBackground:
LDA $2002 ; read PPU status to reset the high/low latch
LDA #$20
STA $2006 ; write the high byte of $2000 address
LDA #$00
STA $2006 ; write the low byte of $2000 address
LDX #$00 ; start out at 0
LoadBackgroundLoop:
LDA background, x ; load data from address (background + the value in x)
STA $2007 ; write to PPU
INX ; X = X + 1
CPX #$80 ; Compare X to hex $80, decimal 128 - copying 128 bytes
BNE LoadBackgroundLoop ; Branch to LoadBackgroundLoop if compare was Not Equal to zero
; if compare was equal to 128, keep going down
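As a rough illustration of what that loop does, here is a Python sketch with a toy model of the three PPU registers it touches. The `PPU` class and the `background` data (tile numbers $00-$7F) are hypothetical, and the model assumes increment-by-1 mode and ignores PPU internals such as the separate temporary address register.

```python
class PPU:
    """Hypothetical toy model of PPUSTATUS ($2002), PPUADDR ($2006) and PPUDATA ($2007)."""

    def __init__(self):
        self.vram = {}        # VRAM address -> byte written
        self.addr = 0         # current VRAM address
        self.latch_hi = True  # True: next $2006 write is the high byte

    def read_2002(self):
        self.latch_hi = True  # reading PPUSTATUS resets the high/low latch

    def write_2006(self, value):
        if self.latch_hi:
            self.addr = ((value & 0x3F) << 8) | (self.addr & 0xFF)
        else:
            self.addr = (self.addr & 0x3F00) | (value & 0xFF)
        self.latch_hi = not self.latch_hi

    def write_2007(self, value):
        self.vram[self.addr] = value
        self.addr = (self.addr + 1) & 0x3FFF  # assumes increment-by-1 mode


background = list(range(0x80))  # hypothetical data: tile numbers $00-$7F

ppu = PPU()
ppu.read_2002()                 # LDA $2002
ppu.write_2006(0x20)            # high byte of $2000
ppu.write_2006(0x00)            # low byte of $2000
for x in range(0x80):           # LoadBackgroundLoop: address is set only once
    ppu.write_2007(background[x])

# A nametable row is 32 tiles wide, so $2000-$207F covers rows 0-3.
first, last = min(ppu.vram), max(ppu.vram)
print(hex(first), hex(last), divmod(last - 0x2000, 32))
```

The address is set once and the auto-increment does the rest, so the 128 bytes fill rows 0 to 3 from left to right.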
You'll want to change it to something like:
LoadBackground:
LDX #$00 ; start out at 0
LDY #$80 ; example starting low byte of PPU address
LoadBackgroundLoop:
LDA $2002 ; read PPU status to reset the high/low latch
LDA #$20
STA $2006 ; write the high byte of the PPU address ($20xx)
STY $2006 ; write the low byte from Y ($80 on the first pass, then $7F, $7E, ...)
LDA background, x ; load data from address (background + the value in x)
STA $2007 ; write to PPU
DEY ; Decrease Y, next tile will be written to the left of previous
INX ; X = X + 1
CPX #$80 ; Compare X to hex $80, decimal 128 - copying 128 bytes
BNE LoadBackgroundLoop ; Branch to LoadBackgroundLoop if compare was Not Equal to zero
; if compare was equal to 128, keep going down
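Here is the modified loop run through the same kind of toy model, as a Python sketch. The `PPU` class and `background` data are hypothetical stand-ins, and the model is simplified (increment-by-1 mode, no PPU internals). Because $2006 is rewritten every iteration, the auto-increment after each $2007 write no longer matters.

```python
class PPU:
    """Hypothetical toy model of PPUSTATUS ($2002), PPUADDR ($2006), PPUDATA ($2007)."""

    def __init__(self):
        self.vram, self.addr, self.latch_hi = {}, 0, True

    def read_2002(self):
        self.latch_hi = True  # reset the $2006 high/low latch

    def write_2006(self, value):
        if self.latch_hi:
            self.addr = ((value & 0x3F) << 8) | (self.addr & 0xFF)
        else:
            self.addr = (self.addr & 0x3F00) | (value & 0xFF)
        self.latch_hi = not self.latch_hi

    def write_2007(self, value):
        self.vram[self.addr] = value
        self.addr = (self.addr + 1) & 0x3FFF  # overwritten by the next $2006 pair anyway


background = list(range(0x80))  # hypothetical data: tile numbers $00-$7F

ppu = PPU()
y = 0x80                           # LDY #$80
for x in range(0x80):              # LoadBackgroundLoop
    ppu.read_2002()                # LDA $2002
    ppu.write_2006(0x20)           # LDA #$20 / STA $2006
    ppu.write_2006(y)              # STY $2006
    ppu.write_2007(background[x])  # LDA background,x / STA $2007
    y = (y - 1) & 0xFF             # DEY (8-bit wraparound)

for addr in (0x2080, 0x207F, 0x2001):
    print(hex(addr), divmod(addr - 0x2000, 32), ppu.vram[addr])  # address, (row, column), tile
```

Note that $2080 is row 4, column 0, so with this example starting value the second tile already lands at the far right of row 3. For a single right-to-left line you would more likely start Y at that row's right edge (for example $9F for row 4) and stop before reaching column 0.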
Now, this code is probably nothing like the code you'll be working with in the game, but it gives you an idea of what you need to do: set the PPU address again before every character you write, rather than once at the start of the string.
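To put a rough number on the "slower routine" concern, this Python sketch adds up standard 6502 cycle counts for one iteration of each loop. It assumes `LDA background,x` never crosses a page and the branch is taken without crossing one; the vblank budget of about 2273 CPU cycles is the commonly quoted NTSC figure, and the estimate ignores NMI entry, OAM DMA and anything else the game does during vblank, so real headroom is smaller.

```python
# Standard 6502 cycle counts for the instructions used in the two loops.
CYCLES = {
    "LDA abs": 4, "LDA abs,X": 4, "LDA #imm": 2,
    "STA abs": 4, "STY abs": 4,
    "INX": 2, "DEY": 2, "CPX #imm": 2, "BNE taken": 3,
}

# One iteration of the original tutorial loop (address set once, outside the loop).
original_loop = ["LDA abs,X", "STA abs", "INX", "CPX #imm", "BNE taken"]

# One iteration of the right-to-left loop (address rewritten every byte).
reverse_loop = ["LDA abs", "LDA #imm", "STA abs", "STY abs",
                "LDA abs,X", "STA abs", "DEY", "INX", "CPX #imm", "BNE taken"]

VBLANK_CYCLES = 2273  # approximate NTSC vblank length in CPU cycles (assumed figure)


def per_byte(loop):
    """Total cycles for one iteration of `loop`."""
    return sum(CYCLES[op] for op in loop)


for name, loop in [("original", original_loop), ("reverse", reverse_loop)]:
    c = per_byte(loop)
    print(f"{name}: {c} cycles/byte, at most ~{VBLANK_CYCLES // c} bytes per vblank")
```

So rewriting the address roughly doubles the per-byte cost, which matters for full-screen updates but is small change for printing one letter per frame. If nothing else writes $2005/$2006 inside the loop, each pair of $2006 writes should leave the latch back in its initial state, so the per-iteration `LDA $2002` could probably be moved before the loop to save another 4 cycles per byte.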
I'm no expert, so there might be a better way of doing this.