Part 2: Text Editing


**You will need to understand Part 1 of this tutorial before you can decompress the files used in any of the other parts.**

Text editing is the easiest thing to do in just about any game that is being hacked. Text editing in Colosseum/XD is also very easy, but it can be tedious because the game's text is scattered across many different files. You first have to figure out which file holds the text you want. If that file is inside a .fsys archive (most of the text is), you have to unarchive it, and you must be careful not to increase the compressed file size when you recompress it. If the file isn't in a .fsys archive, you can safely write over it without trouble.

All the text in-game is in Unicode, which takes up 2 bytes per character. For example, the character 'A' is 0x0041. To change this to 'B' you just need to change the second byte from 0x41 to 0x42; you can ignore the 0x00 between each letter.
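To make the byte layout concrete, here's a minimal Python sketch. It assumes the text is stored as big-endian 2-byte units (which matches 'A' being 0x0041), and the sample buffer is made up for illustration rather than taken from a game file.

```python
# Made-up sample text standing in for a slice of a game file.
data = bytearray("ABRA".encode("utf-16-be"))
print(data.hex())  # 0041004200520041

# Turn the first 'A' (0x0041) into 'B' (0x0042): only the second byte changes.
data[1] = 0x42
text = data.decode("utf-16-be")
print(text)  # BBRA
```

In a hex editor this is the same as overtyping the 41 with 42 and leaving the 00 in front of it alone.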

The easiest place to start is in Start.dol in the "&&SystemData" folder of the ISO file system. This file doesn't get compressed so you can edit the text as much as you like without it increasing the file size, assuming you replace bytes rather than insert new bytes. The dol is really large and most of it isn't text so the text may be hard to find. Try using your hex editor to search for common words.

i-1.jpg

common_rel.fdat in common.fsys also contains a lot of interesting text like the names of all the pokemon and moves.

The text is always in a file I call a "string table". It stores each piece of text with an id number. That id number can be used elsewhere in the game to reference that piece of text. I will refer to this later when editing moves and pokemon.

You can recognise the start of a string table because it has the acronym for the language the table is in. My games are the US version, so the string tables all have "US" (0x5553) in their header. You can search for this in a file to see if it contains an embedded string table, although it's usually obvious from the presence of Unicode text. After a short header, the table lists pairs of values. The first 4 bytes of each pair are the ID I mentioned previously, and the next 4 bytes are the offset of the string referenced by that ID, measured from the start of the string table (6 bytes before "US").
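If you'd rather not search by hand, the idea can be sketched in a few lines of Python. The function name `find_candidate_tables` is my own, and the sample buffer is made up. Every hit is only a candidate, since the bytes 0x55 0x53 can also turn up in data that has nothing to do with text.

```python
def find_candidate_tables(data: bytes, code: bytes = b"US") -> list[int]:
    """Return offsets where a string table might start (6 bytes before the code)."""
    starts = []
    pos = data.find(code)
    while pos != -1:
        if pos >= 6:  # there must be room for the 6 bytes in front of the code
            starts.append(pos - 6)
        pos = data.find(code, pos + 1)
    return starts


# Made-up buffer: 0x10 bytes of padding, 6 header bytes, then the language code.
sample = bytes(0x10) + bytes(6) + b"US" + bytes(8)
hits = find_candidate_tables(sample)
print([hex(h) for h in hits])  # ['0x10']
```

You would still want to eyeball each hit in a hex editor before trusting it.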

They exist in so many files that I can't list them all but here are some of the important ones.

Start.dol has a string table at 0x2CC810 (Colosseum), 0x374FC0 (XD).

i-2.jpg

common_rel has some string tables, the first one being at 0x59890 (Colosseum) or 0x4E274 (XD), with the others following consecutively.

A few files in fight_common are string tables.

i-3.jpg i-4.jpg

The third file in any .fsys containing a map is a string table. This will contain all the NPC dialogue for that map.

When changing the text, make sure you only write over one string at a time. You can usually spot the end of a string because it is terminated by 0x00 or 0xFF. Writing past this will overwrite the next string and mess up the pointers. So either repoint or try to fit your text in the same amount of space.
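Here's a rough Python sketch of the "fit it in the same space" approach. It assumes a plain string made only of 2-byte characters and ending in a 2-byte zero, so it won't cope with strings that contain special control sequences. `overwrite_string` is a name I've made up, and the buffer is a toy example.

```python
def overwrite_string(data: bytearray, offset: int, new_text: str) -> None:
    """Replace the string at `offset` in place, never touching what follows it."""
    # Walk two bytes at a time until the 0x0000 terminator.
    end = offset
    while end + 2 <= len(data) and data[end:end + 2] != b"\x00\x00":
        end += 2
    if end + 2 > len(data):
        raise ValueError("no terminator found")
    room = end - offset
    encoded = new_text.encode("utf-16-be")
    if len(encoded) > room:
        raise ValueError("new text is longer than the original string")
    # Pad with zero bytes so every later string stays at the same offset.
    data[offset:end] = encoded.ljust(room, b"\x00")


# Toy buffer holding two strings back to back.
buf = bytearray(
    "prevents flinching".encode("utf-16-be") + b"\x00\x00"
    + "next".encode("utf-16-be") + b"\x00\x00"
)
size_before = len(buf)
overwrite_string(buf, 0, "stops flinches")
print(len(buf) == size_before)  # True
```

Because the slice is replaced by exactly the same number of bytes, none of the pointers in the table need to change.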

To keep the compressed file size small, try to find some other, less important strings and shorten them. For example, changing the description of the Inner Focus ability from "prevents flinching" to something like "stops flinches" saves four characters, which leaves the compressor with less data to encode. Also, in XD it seems that the text translations for the other languages were all included in the US version, which is the one I happen to have been hacking. You can probably safely erase all of the text in those files by replacing each character with 0x00, since you'll probably only ever play in one language. I haven't done this myself but I doubt it would cause issues.


  • 4 weeks later...

I've been looking at the string tables in xD and have some updates. Each string is terminated by a 2-byte null value (0x0000). There are also some 'escape' characters (a bit like \n, \t or \\ in programming). They start with 0xFFFF, followed by one or two bytes that determine the special string or character they will be replaced with in game. 0xFFFF00 and 0xFFFF03 seem to add a new line (so basically \n). I think the GBA games have two different ones as well: one which just continues the text on the next line, and one which does the same but also scrolls the text box up a row. This might be something similar. I don't know exactly what the rest do, but some clearly fill in the player's name, the name of the NPC meant to say that line, or variable text like item or Pokémon names. 0xFFFF07 and 0xFFFF53 always have one extra byte after them, for a total of 4 bytes per special character. Every other special character I've seen so far is 3 bytes. The regular characters are 2-byte Unicode characters.

Colosseum is probably exactly the same but I haven't checked.

Edited by StarsMmd

I started looking at this after I happened to see your post a while ago, and I ended up ripping the string tables from all of the different-language ROMs. I kind of took the statement that "They exist in so many files that I can't list them all" as a challenge.:tongue: But I don't really hang around here, so I'm not entirely sure if it would be okay for me to link the "text dumps" I made? I included the string ID for each string, thinking it'd be an easy thing for people to grab and use to match string IDs in move data and stuff if they want.

For those interested, a few things I learned...

The string tables in the Japanese Colosseum actually don't have a reliable language code like 0x5553 ("US", for the US ROM). A couple of tables have 0x0001 in that position, but most are just 0x0000, meaning it's really hard to search for those by identifying the header. (In xD, the string tables in the Japanese ROM use the code "JP". And if anyone is interested, the other languages use "FR", "GE", "IT", "SP", and "UK" (United Kingdom/EU English).) Most tables list the same IDs in the same order, though, independent of the ROM, so that's a fairly reliable way to find most of the string tables in the Japanese Colosseum. There are some that have some differences, though; in particular, not every string ID used in one ROM is always used in another (a good example is 0x44F2, present in EU ROMs only).

It might go without saying, but a given ID refers to the same thing between different ROMs (JP/US/EU). As an example, Bulbasaur's species name has ID 0x03E9 in US, UK, French, German, Italian, Japanese, and Spanish string tables. Some string tables have copies in multiple files, though, and in a few cases, the same string ID may have slightly different text in one location than it does in another. For instance, string 0x3BFF in the US Colosseum in most places says a save file "could not be created", but in one place it says "could not be made".

As you say, StarsMmd, most of the special 0xFFFF "characters" use 3 bytes total. Those that use a total of 4 bytes are where 0xFFFF is followed by 0x07, 09, 38, 52, 53, 5B, or 5C. One seems to use 7 bytes total: where 0xFFFF is followed by 0x08. And as near as I can tell, all these "characters" are indeed basically the same between Colosseum and xD. I've put the functions as I know them in a spoiler below (the first line is 0xFFFF00, the next is 0xFFFF01, and so on, and lines with "--" are unused). Some are easy enough, but at some point, I got frustrated with trying to pin down all of them, so I just started identifying most of them as "unknown" or something suitably generic:tongue:. 0xFFFF59 (bubble_or_speaker) and 0xFFFF6A (maybe_speaker_ID_toggle) are interesting. Often times 0xFFFF59 will print a speech bubble before a character's dialogue; if 0xFFFF6A is used, though (I've only seen them together, 0xFFFF6AFFFF59), it seems to reveal the character's identity so that any use of 0xFFFF59 thereafter prints the character's name instead (without needing to use 0xFFFF6A again).

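0xFFFF00 newline
0xFFFF01 unknown_01
0xFFFF02 dialogue_end
0xFFFF03 clear_window
0xFFFF04 furi_kanji
0xFFFF05 furi_kana
0xFFFF06 furi_close
0xFFFF07 unknown2_07
0xFFFF08 unknown5_08
0xFFFF09 unknown2_09
0xFFFF0A --
0xFFFF0B unknown_0B
0xFFFF0C unknown_0C
0xFFFF0D some_pokemon_0D
0xFFFF0E some_pokemon_0E
0xFFFF0F some_pokemon_0F
0xFFFF10 some_pokemon_10
0xFFFF11 some_pokemon_11
0xFFFF12 some_pokemon_12
0xFFFF13 Player_alt
0xFFFF14 sent_out_pokemon_2
0xFFFF15 sent_out_pokemon_1
0xFFFF16 some_pokemon_16
0xFFFF17 some_pokemon_17
0xFFFF18 some_pokemon_18
0xFFFF19 some_pokemon_19
0xFFFF1A some_ability_1A
0xFFFF1B some_ability_1B
0xFFFF1C some_ability_1C
0xFFFF1D some_ability_1D
0xFFFF1E some_pokemon_1E
0xFFFF1F unknown_1F
0xFFFF20 some_pokemon_20
0xFFFF21 some_pokemon_21
0xFFFF22 opp_trainer_class
0xFFFF23 opp_trainer_name
0xFFFF24 unknown_24
0xFFFF25 some_opponent_25
0xFFFF26 some_opponent_26
0xFFFF27 some_opponent_27
0xFFFF28 some_move_28
0xFFFF29 some_item_29
0xFFFF2A --
0xFFFF2B Player
0xFFFF2C Rui
0xFFFF2D some_item_2D
0xFFFF2E some_item_2E
0xFFFF2F unknown_2F
0xFFFF30 var_0
0xFFFF31 var_1
0xFFFF32 var_2
0xFFFF33 var_3
0xFFFF34 var_4
0xFFFF35 var_5
0xFFFF36 var_6
0xFFFF37 var_7
0xFFFF38 unknown2_38
0xFFFF39 var_9
0xFFFF3A --
0xFFFF3B maybe_location
0xFFFF3C --
0xFFFF3D unknown_3D
0xFFFF3E --
0xFFFF3F --
0xFFFF40 --
0xFFFF41 unknown_41
0xFFFF42 unknown_42
0xFFFF43 unknown_43
0xFFFF44 unknown_44
0xFFFF45 unknown_45
0xFFFF46 unknown_46
0xFFFF47 unknown_47
0xFFFF48 --
0xFFFF49 unknown_49
0xFFFF4A --
0xFFFF4B unknown_4B
0xFFFF4C unknown_4C
0xFFFF4D unknown_4D
0xFFFF4E some_pokemon_4E
0xFFFF4F --
0xFFFF50 unknown_50
0xFFFF51 --
0xFFFF52 unknown2_52
0xFFFF53 unknown2_53
0xFFFF54 --
0xFFFF55 unknown_55
0xFFFF56 unknown_56
0xFFFF57 unknown_57
0xFFFF58 unknown_58
0xFFFF59 bubble_or_speaker
0xFFFF5A --
0xFFFF5B unknown2_5B
0xFFFF5C unknown2_5C
0xFFFF5D unknown_5D
0xFFFF5E unknown_5E
0xFFFF5F unknown_5F
0xFFFF60 --
0xFFFF61 unknown_61
0xFFFF62 unknown_62
0xFFFF63 --
0xFFFF64 unknown_64
0xFFFF65 unknown_65
0xFFFF66 --
0xFFFF67 unknown_67
0xFFFF68 --
0xFFFF69 unknown_69
0xFFFF6A maybe_speaker_ID_toggle
0xFFFF6B --
0xFFFF6C --
0xFFFF6D unknown_6D
0xFFFF6E unknown_6E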
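
For anyone writing their own parser, the byte counts described above can be sketched in Python like this. It's only a sketch under the lengths listed in this post (3 bytes by default, 4 bytes for the seven codes named, 7 bytes for 0x08), it assumes big-endian 2-byte characters, and the names `tokenize`, `FOUR_BYTE_CODES` and `SEVEN_BYTE_CODES` are made up for illustration. The sample bytes are invented, not taken from the game.

```python
FOUR_BYTE_CODES = {0x07, 0x09, 0x38, 0x52, 0x53, 0x5B, 0x5C}
SEVEN_BYTE_CODES = {0x08}


def tokenize(data: bytes, offset: int = 0) -> list:
    """Split one string into characters and (code, argument_bytes) tuples."""
    tokens = []
    pos = offset
    while pos + 2 <= len(data):
        unit = data[pos:pos + 2]
        if unit == b"\x00\x00":  # 2-byte terminator ends the string
            break
        if unit == b"\xff\xff":  # special character: length depends on the code
            code = data[pos + 2]
            if code in SEVEN_BYTE_CODES:
                total = 7
            elif code in FOUR_BYTE_CODES:
                total = 4
            else:
                total = 3
            tokens.append((code, bytes(data[pos + 3:pos + total])))
            pos += total
        else:  # ordinary 2-byte character
            tokens.append(unit.decode("utf-16-be"))
            pos += 2
    return tokens


# Invented sample: a 7-byte 0x08 sequence, "Hi", a newline code, "A", terminator.
sample = (
    b"\xff\xff\x08\xff\x00\x00\xff"
    + "Hi".encode("utf-16-be")
    + b"\xff\xff\x00"
    + "A".encode("utf-16-be")
    + b"\x00\x00"
)
tokens = tokenize(sample)
print(tokens)
```

Note that the argument bytes of a special character can contain zeros, so simply scanning for 0x0000 without tracking these lengths can cut a string short.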

Edited by Tiddlywinks

That's really amazing! I don't know how you found the patience for that. I wrote an app that lets me type in an ID and automatically searches for it (and can replace strings), although it only searches the major files right now because I couldn't be bothered to track down all the others. The details of the special characters are really great too (that 7-byte one though 8O). I was dreading the day I'd have to go and find those.

I'd really love to use your text dump as a reference if you make it available and maybe you could include a list of the fsys files that each table comes from?


Patience? I saw somewhere around here you said you've been working on these games for about a year, I think. That's patience! But when someone tells me, "This is basically how strings work" (especially being something as simple as text), man, I can absolutely go to town figuring the little bits that are missing and making something of it. :biggrin:

I don't have a problem releasing the text dumps I made, I was just being paranoid about rules/etiquette here. I don't really expect any problems, though, so here are the files of all the strings I ripped. For some reason, I'm having real trouble uploading most of the Colosseum files individually (even though I previously uploaded almost the same files with no problem), so I just packed all the languages (US, FR/GE/IT/SP/UK, JP) together into an archive; I figure US/English will probably be of most interest anyway.

Those files do have every table (that I found), so there will be some duplicate IDs/strings. But I think I've included enough information to be useful to you/anyone playing with these games. Like I said above, it's not really possible to look for the Japanese Colosseum tables directly, so it's possible I missed some oddballs there. I in fact did find one table I had missed when I did some manual checking.


Hahaha that's a fair point! This is really cool though. I'm assuming you wrote a program of sorts to dump all the text, right? If so, could you upload the source code for that as well? Mine isn't complete yet and it would be great to see how you did it (if you don't mind, of course). You don't seem to have mentioned any such code but you'd have to be pretty dedicated to do all of this manually xD.

Also, from the values I'm seeing, it looks like 0xFFFF08 almost definitely changes the font colour of the text. The next 4 values are RGBA values which determine the colour.

Edited by StarsMmd

I'd wager you're right about 0xFFFF08 changing the font color.

I certainly did write some Java. It's in pieces and the output of one doesn't always feed right into the next... I guess I can, though. Here. (Lemme also disclaim that it's probably not written to a professional standard:tongue:.)

If you run any of it, it might be helpful to know that I like to use Excel (Calc, actually) to sort and prune my intermediate output files where it's necessary. That's the nice thing about using tabs as delimiters, super easy to put into and take out of a spreadsheet.:wink: In one program, I'm also throwing much more than is remotely useful to stderr, I just haven't felt like cutting it out... (You could in fact pretty well remove anything I'm printing to stderr at this point.)

FYI, it usually takes me about 20-30 minutes to search through every file extracted from any given ROM. The other programs don't even take a minute to run, though, I think. Also FYI, one of my intermediate steps sees me writing a file for every table, and I did this because (at the moment, at least), I have a notion of only opening the file(s) I need and reading strings from there. Pretty much the only reason I went beyond that stage was to make it into a convenient text dump.

Thanks for that. I just want to look through and make sure I haven't missed anything. You managed to parse every single file, so your code can handle anything that the game contains. Mine currently only works perfectly for common_rel, tableres2 and Start.dol but will crash on things I hadn't seen, like 0xFFFF08. I haven't used Java in a while but I think I can still understand it.

Edited by StarsMmd

  • 3 weeks later...

Just a little update. 0xFFFF08 does indeed change the font colour. The next bytes are in RGBA order but the Alpha channel doesn't appear to have any effect on the font.

0xFFFF38 also changes the font colour but it uses a small group of predefined colours based on the following 1byte. The colours are as follows:

0x00 white

0x01 yellow

0x02 green

0x03 dark blue

0x04 orange

0x05 black

The range of colours is small but it uses fewer bytes.
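
As a rough sketch of how a tool could read these two sequences (in Python, with made-up names `PRESET_COLOURS` and `describe_colour`, and assuming the byte layouts exactly as described above):

```python
PRESET_COLOURS = {
    0x00: "white",
    0x01: "yellow",
    0x02: "green",
    0x03: "dark blue",
    0x04: "orange",
    0x05: "black",
}


def describe_colour(seq: bytes) -> str:
    """Describe a 0xFFFF08 (RGBA) or 0xFFFF38 (preset) colour sequence."""
    if seq[:3] == b"\xff\xff\x08" and len(seq) >= 7:
        r, g, b, a = seq[3:7]
        # Alpha is read here but doesn't appear to affect the font in game.
        return f"rgb({r}, {g}, {b})"
    if seq[:3] == b"\xff\xff\x38" and len(seq) >= 4:
        return PRESET_COLOURS.get(seq[3], f"unknown preset 0x{seq[3]:02X}")
    raise ValueError("not a colour sequence")


print(describe_colour(bytes.fromhex("ffff08ff8000ff")))  # rgb(255, 128, 0)
print(describe_colour(bytes.fromhex("ffff3803")))        # dark blue
```

Preset values outside the six listed are reported as unknown rather than guessed.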


  • 8 months later...

In my slow exploration of the assembly code, I just discovered that string tables are actually supposed to be linked lists (linking one string table to another). It's kind of trivial, but I kind of want to write it down anyway so I have somewhere to look back to if I need to. I'm just gonna lay out the whole thing, in fact, to make it a bit easier...

(This is true in xD at least; I can't guarantee it is for Colosseum. Certainly, as I mentioned above, the language code isn't really used in the Japanese Colosseum.)

String table structure:

  • 0x00 -- 4 bytes? -- Unknown (usually 0 or 1)
  • 0x04 -- 2 bytes -- Number of entries in string info list
  • 0x06 -- 2 bytes -- Language code (two ASCII letters)
  • 0x08 -- 4 bytes -- Link to next string table (an address; hard-coded 0, but filled in when the game runs)
  • 0x0C -- 4 bytes -- Link to previous string table (an address; hard-coded 0, but filled in when the game runs)
  • 0x10 -- ... -- List of string info...
  • 0x... -- ... -- List of strings...

Each entry in the string info list:

  • 0x00 -- 4 bytes -- String ID
  • 0x04 -- 4 bytes -- Offset of string text from the start of the table

The string IDs always increase from one entry to the next (i.e., they go from low to high), but the offset can be anything.
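
The layout above maps fairly directly onto Python's `struct` module. This is just a sketch, assuming big-endian fields throughout; `parse_table` and `find_offset` are names I've made up, and the sample table is built by hand for illustration. Because the IDs are sorted, a lookup can use a binary search.

```python
import struct
from bisect import bisect_left


def parse_table(data: bytes, start: int = 0):
    """Return (language_code, [(string_id, text_offset), ...]) for one table."""
    # 0x00 unknown, 0x04 entry count, 0x06 language code, 0x08/0x0C links.
    unknown, count, lang, next_link, prev_link = struct.unpack_from(">IH2sII", data, start)
    entries = [
        struct.unpack_from(">II", data, start + 0x10 + 8 * i) for i in range(count)
    ]
    return lang.decode("ascii"), entries


def find_offset(entries, string_id):
    """Binary-search the sorted IDs; offsets are relative to the table start."""
    ids = [entry[0] for entry in entries]
    i = bisect_left(ids, string_id)
    if i < len(ids) and ids[i] == string_id:
        return entries[i][1]
    return None


# Hand-built two-entry table. 0x03E9 is the Bulbasaur ID mentioned earlier in
# the thread; the second ID and both texts are purely illustrative.
header = struct.pack(">IH2sII", 0, 2, b"US", 0, 0)
info = struct.pack(">IIII", 0x03E9, 0x20, 0x03EA, 0x34)
texts = (
    "BULBASAUR".encode("utf-16-be") + b"\x00\x00"
    + "IVYSAUR".encode("utf-16-be") + b"\x00\x00"
)
sample = header + info + texts

lang, entries = parse_table(sample)
print(lang, [(hex(i), hex(o)) for i, o in entries])
print(find_offset(entries, 0x03EA))  # 52
```

To get a position in the file, add the returned offset to the table's own start address.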

The links can run between different languages of string tables (in the US game, a JP table is linked in the middle of US tables). That seems to actually be a good part of the reason they're linked (though I haven't seen it used to that effect exactly, since I'm not using the PAL game right now where you can actually use different languages).

FWIW, string IDs may have a cap of 0xEA5F, too.

Edited by Tiddlywinks

Sure, I can send that map. =) I'll include what I have for Colosseum too.

[ATTACH]12954[/ATTACH]

Some of the function names are actually names I've pulled from what looks like error message data or something in some places in the games. (I wrote a bit of a program to search for those patterns and rename the right function. For that matter, I also used the same program to remove those annoying places where Dolphin mistakenly inserts the start of a new function in the middle of another.)

Names that I've made, though, I like to put "q_" (like a substitute for a "?") at the beginning if I'm somehow not confident it's correct, or if I'm even less confident, I'll even just leave the default name and append something at the end so I at least know I've seen it if I run into it again (e.g., like "zz_028b5c8_q_AI_element_set" or "zz_010ae8c_q_Copy_helper").

Edit:

Oh, these might also be useful... Various structure definitions or partitions, some function "maps" (like input->called function, for some that seem to use something like "select case")... And in particular, all the identifications of the r13 pointers I know.

[ATTACH]12955[/ATTACH]

[ATTACH]12956[/ATTACH]

Edited by Tiddlywinks

Thanks G, I'm looking through it now. Interested to see what I'll find. Here's mine as well. All the functions I named are at the bottom of the list alphabetically (after the default named ones).

[ATTACH]12957[/ATTACH]
