    Part 2: Text Editing

    **You will need to understand Part 1 of this tutorial before you can decompress the files used in any of the other parts.**

    Text editing is the easiest thing to do in just about any game that is being hacked. It is easy in Colosseum/XD too, but it can be tedious because the text is scattered across many different files. You first have to figure out which file holds the text you want. If that file is inside a .fsys archive (most of the text is), you have to unarchive it, and you must be careful not to increase the compressed file size when you recompress it. If the file isn't in a .fsys archive, you can safely write over it without trouble.

    All the text in-game is in Unicode, which takes up 2 bytes per character. For example, the character 'A' is 0x0041. To change this to 'B' you just need to change the second byte from 0x41 to 0x42; you can ignore the 0x00 between each letter.
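A quick way to check the byte values is to let Python do the encoding. This is only a sketch: it assumes the text is big-endian UTF-16, which is what the 0x0041 example suggests, and the function names are made up for illustration.

```python
# Assumption: game text is big-endian UTF-16 (2 bytes per character, high byte first).

def encode_game_text(text: str) -> bytes:
    """Turn a Python string into the 2-bytes-per-character form."""
    return text.encode("utf-16-be")


def decode_game_text(raw: bytes) -> str:
    """Turn raw 2-bytes-per-character data back into a Python string."""
    return raw.decode("utf-16-be")


print(encode_game_text("A").hex())   # 0041
print(encode_game_text("AB").hex())  # 00410042
```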

    The easiest place to start is in Start.dol in the "&&SystemData" folder of the ISO file system. This file doesn't get compressed so you can edit the text as much as you like without it increasing the file size, assuming you replace bytes rather than insert new bytes. The dol is really large and most of it isn't text so the text may be hard to find. Try using your hex editor to search for common words.
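If your hex editor can't search for 2-byte text directly, a small script can do the same job. This is a rough sketch under the assumption that the text is big-endian UTF-16; `find_text` is a hypothetical helper, and the file path in the usage comment is just an example.

```python
def find_text(data: bytes, word: str) -> list[int]:
    """Return every offset where `word` appears as big-endian UTF-16."""
    needle = word.encode("utf-16-be")
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets


# Usage (the path is hypothetical):
# with open("Start.dol", "rb") as f:
#     print([hex(o) for o in find_text(f.read(), "Save")])
```

Because this only reads the file, it is safe to run on anything you have extracted.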

    i-1.jpg

    common_rel.fdat in common.fsys also contains a lot of interesting text like the names of all the pokemon and moves.

    The text is always in a file I call a "string table". It stores each piece of text with an id number. That id number can be used elsewhere in the game to reference that piece of text. I will refer to this later when editing moves and pokemon.

    You can recognise the start of a string table because it contains the two-letter code for the language the table is in. My games are the US version, so the string tables all have "US" (0x5553) near the start. You can search for this in a file to see if it contains an embedded string table, although it's usually obvious from the presence of Unicode text. The table begins with pairs of values. The first 4 bytes of each pair are the id I mentioned previously, and the second 4 bytes are the offset, measured from the start of the string table (6 bytes before "US"), of the string referenced by that id.
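The search for the language code can be scripted too. The sketch below only lists candidates: two bytes that happen to spell "US" will also match, so every hit still needs checking by eye. `candidate_tables` is a made-up name, and the "6 bytes before" rule is taken from the description above.

```python
def candidate_tables(data: bytes, lang: bytes = b"US") -> list[int]:
    """Return possible string table start offsets (6 bytes before each language code)."""
    starts = []
    pos = data.find(lang)
    while pos != -1:
        if pos >= 6:  # a table cannot start before the beginning of the file
            starts.append(pos - 6)
        pos = data.find(lang, pos + 1)
    return starts
```

Expect false positives in large files such as Start.dol; a hit followed by ascending id values is a much better sign.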

    They exist in so many files that I can't list them all but here are some of the important ones.

    Start.dol has a string table at 0x2CC810 (Colosseum), 0x374FC0 (XD).

    i-2.jpg

    common_rel has some string tables, the first one being at 0x59890 (Colosseum) or 0x4E274 (XD), with the others following consecutively.

    A few files in fight_common are string tables.

    i-3.jpg

    i-4.jpg

    The third file in any .fsys containing a map is a string table. This will contain all the NPC dialogue for that map.

    When changing the text, make sure you only write over one string at a time. You can usually spot the end of a string because it is terminated by 0x00 or 0xFF. Writing past this will overwrite the next string and mess up the pointers. So either repoint or try to fit your text in the same amount of space.
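Here is one way to make the "same amount of space" rule hard to break by accident. It is a minimal sketch, assuming big-endian UTF-16 text and that you already know the old string and its offset; `overwrite_string` is a hypothetical helper, not part of any existing tool.

```python
def overwrite_string(data: bytearray, offset: int, old_text: str, new_text: str) -> None:
    """Replace old_text at `offset` with new_text without changing the file size."""
    old = old_text.encode("utf-16-be")
    new = new_text.encode("utf-16-be")
    if data[offset:offset + len(old)] != old:
        raise ValueError("old text not found at this offset")
    if len(new) > len(old):
        raise ValueError("new text does not fit in the original space")
    # Pad with zero bytes so the following strings and the pointers stay where they were.
    data[offset:offset + len(old)] = new.ljust(len(old), b"\x00")


# Usage (offset and file are hypothetical):
# buf = bytearray(open("table.bin", "rb").read())
# overwrite_string(buf, 0x1234, "prevents flinching", "stops flinches")
```

Refusing anything longer than the original means the next string is never touched, at the cost of not supporting repointing.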

    In order to keep the compressed file size small, try to find some other strings which are less important and shorten them. For example, changing the description for the Inner Focus ability from "prevents flinching" to something like "stops flinches" will decrease the randomness of the data, which helps it compress. Also, in XD it seems that all the text translations for other languages were included in the US version, which is the one I happen to have been hacking. You can probably safely erase all of the text in those files by replacing each character with 0x00, since you'll probably only ever play in one language. I haven't done this myself but I doubt it would cause issues.

    Part 3:

     

    Edited by evandixon
    Update formatting

    User Feedback

    Recommended Comments

    I've been looking at the string tables in xD and have some updates. Each string is terminated by the value 0x00 spanning 2 bytes (0x0000). There are also some 'escape' characters (a bit like \n, \t or \\ in programming). They start with 0xFFFF followed by 1 or 2 bytes determining the special string or character that will be substituted in game. 0xFFFF00 and 0xFFFF03 seem to add a new line (so basically \n). I think the GBA games have 2 different ones as well: one which just continues the text on the next line and one which does the same but also scrolls up a row in the text box. Might be something similar. I don't know exactly what the rest do, but some clearly fill in the player's name, the name of the NPC meant to say that line, or variable text like item or Pokémon names. 0xFFFF07 and 0xFFFF53 always have one extra byte after them, for a total of 4 bytes for the special character. Every other special character I've seen so far is 3 bytes. The regular characters are 2-byte Unicode characters.

    Colosseum is probably exactly the same but I haven't checked.

    Edited by StarsMmd


    I started looking at this after I happened to see your post a while ago, and I ended up ripping the string tables from all of the different-language ROMs. I kind of took the statement that "They exist in so many files that I can't list them all" as a challenge. But I don't really hang around here, so I'm not entirely sure if it would be okay for me to link the "text dumps" I made? I included the string ID for each string, thinking it'd be an easy thing for people to grab and use to match string IDs in move data and stuff if they want.

    For those interested, a few things I learned...

    The string tables in the Japanese Colosseum actually don't have a reliable language code like 0x5553 ("US", for the US ROM). A couple of tables have 0x0001 in that position, but most are just 0x0000, meaning it's really hard to search for those by identifying the header. (In xD, the string tables in the Japanese ROM use the code "JP". And if anyone is interested, the other languages use "FR", "GE", "IT", "SP", and "UK" (United Kingdom/EU English).) Most tables list the same IDs in the same order, though, independent of the ROM, so that's a fairly reliable way to find most of the string tables in the Japanese Colosseum. There are some that have some differences, though; in particular, not every string ID used in one ROM is always used in another (a good example is 0x44F2, present in EU ROMs only).

    It might go without saying, but a given ID refers to the same thing between different ROMs (JP/US/EU). As an example, Bulbasaur's species name has ID 0x03E9 in US, UK, French, German, Italian, Japanese, and Spanish string tables. Some string tables have copies in multiple files, though, and in a few cases, the same string ID may have slightly different text in one location than it does in another. For instance, string 0x3BFF in the US Colosseum in most places says a save file "could not be created", but in one place it says "could not be made".

    As you say, StarsMmd, most of the special 0xFFFF "characters" use 3 bytes total. Those that use a total of 4 bytes are where 0xFFFF is followed by 0x07, 09, 38, 52, 53, 5B, or 5C. One seems to use 7 bytes total: where 0xFFFF is followed by 0x08. And as near as I can tell, all these "characters" are indeed basically the same between Colosseum and xD. I've put the functions as I know them in a spoiler below (the first line is 0xFFFF00, the next is 0xFFFF01, and so on, and lines with "--" are unused). Some are easy enough, but at some point I got frustrated with trying to pin down all of them, so I just started identifying most of them as "unknown" or something suitably generic. 0xFFFF59 (bubble_or_speaker) and 0xFFFF6A (maybe_speaker_ID_toggle) are interesting. Often 0xFFFF59 will print a speech bubble before a character's dialogue; if 0xFFFF6A is used, though (I've only seen them together, as 0xFFFF6AFFFF59), it seems to reveal the character's identity so that any use of 0xFFFF59 thereafter prints the character's name instead (without needing to use 0xFFFF6A again).

    0x00 newline
    0x01 unknown_01
    0x02 dialogue_end
    0x03 clear_window
    0x04 furi_kanji
    0x05 furi_kana
    0x06 furi_close
    0x07 unknown2_07
    0x08 unknown5_08
    0x09 unknown2_09
    0x0A --
    0x0B unknown_0B
    0x0C unknown_0C
    0x0D some_pokemon_0D
    0x0E some_pokemon_0E
    0x0F some_pokemon_0F
    0x10 some_pokemon_10
    0x11 some_pokemon_11
    0x12 some_pokemon_12
    0x13 Player_alt
    0x14 sent_out_pokemon_2
    0x15 sent_out_pokemon_1
    0x16 some_pokemon_16
    0x17 some_pokemon_17
    0x18 some_pokemon_18
    0x19 some_pokemon_19
    0x1A some_ability_1A
    0x1B some_ability_1B
    0x1C some_ability_1C
    0x1D some_ability_1D
    0x1E some_pokemon_1E
    0x1F unknown_1F
    0x20 some_pokemon_20
    0x21 some_pokemon_21
    0x22 opp_trainer_class
    0x23 opp_trainer_name
    0x24 unknown_24
    0x25 some_opponent_25
    0x26 some_opponent_26
    0x27 some_opponent_27
    0x28 some_move_28
    0x29 some_item_29
    0x2A --
    0x2B Player
    0x2C Rui
    0x2D some_item_2D
    0x2E some_item_2E
    0x2F unknown_2F
    0x30 var_0
    0x31 var_1
    0x32 var_2
    0x33 var_3
    0x34 var_4
    0x35 var_5
    0x36 var_6
    0x37 var_7
    0x38 unknown2_38
    0x39 var_9
    0x3A --
    0x3B maybe_location
    0x3C --
    0x3D unknown_3D
    0x3E --
    0x3F --
    0x40 --
    0x41 unknown_41
    0x42 unknown_42
    0x43 unknown_43
    0x44 unknown_44
    0x45 unknown_45
    0x46 unknown_46
    0x47 unknown_47
    0x48 --
    0x49 unknown_49
    0x4A --
    0x4B unknown_4B
    0x4C unknown_4C
    0x4D unknown_4D
    0x4E some_pokemon_4E
    0x4F --
    0x50 unknown_50
    0x51 --
    0x52 unknown2_52
    0x53 unknown2_53
    0x54 --
    0x55 unknown_55
    0x56 unknown_56
    0x57 unknown_57
    0x58 unknown_58
    0x59 bubble_or_speaker
    0x5A --
    0x5B unknown2_5B
    0x5C unknown2_5C
    0x5D unknown_5D
    0x5E unknown_5E
    0x5F unknown_5F
    0x60 --
    0x61 unknown_61
    0x62 unknown_62
    0x63 --
    0x64 unknown_64
    0x65 unknown_65
    0x66 --
    0x67 unknown_67
    0x68 --
    0x69 unknown_69
    0x6A maybe_speaker_ID_toggle
    0x6B --
    0x6C --
    0x6D unknown_6D
    0x6E unknown_6E
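To show why the escape lengths matter, here is a rough Python sketch of a string reader built from the sizes given above (3 bytes by default, 4 for 0x07/09/38/52/53/5B/5C, 7 for 0x08). It is not the code used for the text dumps; `ESCAPE_SIZES` and `read_string` are names invented for this example, and big-endian 2-byte characters are assumed.

```python
# Total size in bytes of each escape, including the 0xFFFF marker.
# Anything not listed here is assumed to be 3 bytes.
ESCAPE_SIZES = {0x08: 7}
ESCAPE_SIZES.update({code: 4 for code in (0x07, 0x09, 0x38, 0x52, 0x53, 0x5B, 0x5C)})


def read_string(data: bytes, offset: int) -> tuple[str, int]:
    """Decode one string at `offset`; return (text, offset just past the terminator)."""
    out = []
    pos = offset
    while True:
        unit = int.from_bytes(data[pos:pos + 2], "big")
        if unit == 0x0000:  # null terminator (also hit if the data runs out)
            return "".join(out), pos + 2
        if unit == 0xFFFF:  # escape: marker, 1 code byte, optional argument bytes
            code = data[pos + 2]
            size = ESCAPE_SIZES.get(code, 3)
            label = f"{code:02X}"
            args = data[pos + 3:pos + size].hex()
            if args:
                label += ":" + args
            out.append("[" + label + "]")
            pos += size
        else:  # ordinary 2-byte character
            out.append(chr(unit))
            pos += 2
```

Note that a 3-byte escape shifts everything after it by an odd number of bytes, so scanning a string in fixed 2-byte steps can land on a false 0x0000. That is the main reason to walk the string one token at a time.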

    Edited by Tiddlywinks
    typo


    That's really amazing! I don't know how you found the patience for that. I wrote an app that lets me type in an id and it automatically searches for me (and can replace strings), although it only searches the major files right now because I couldn't be bothered to track down all the others. The details of the special characters are really great too (that 7 byte one though!). I was dreading the day I'd have to go and find those.

    I'd really love to use your text dump as a reference if you make it available and maybe you could include a list of the fsys files that each table comes from?


    Patience? I saw somewhere around here you said you've been working on these games for about a year, I think. That's patience! But when someone tells me, "This is basically how strings work" (especially being something as simple as text), man, I can absolutely go to town figuring out the little bits that are missing and making something of it.

    I don't have a problem releasing the text dumps I made, I was just being paranoid about rules/etiquette here. I don't really expect any problems, though, so here are the files of all the strings I ripped. For some reason, I'm having real trouble uploading most of the Colosseum files individually (even though I previously uploaded almost the same files with no problem), so I just packed all the languages (US, FR/GE/IT/SP/UK, JP) together into an archive; I figure US/English will probably be of most interest anyway.

    Those files do have every table (that I found), so there will be some duplicate IDs/strings. But I think I've included enough information to be useful to you/anyone playing with these games. Like I said above, it's not really possible to look for the Japanese Colosseum tables directly, so it's possible I missed some oddballs there. I in fact did find one table I had missed when I did some manual checking.


    Hahaha that's a fair point! This is really cool though. I'm assuming you wrote a program of sorts to dump all the text, right? If so, could you upload the source code for that as well? Mine isn't complete yet and it would be great to see how you did it (if you don't mind of course). You don't seem to have mentioned any such code but you'd have to be pretty dedicated to do all of this manually.

    Also, from the values I'm seeing, it looks like 0xFFFF08 almost definitely changes the font colour of the text. The next 4 values are RGBA values which determine the colour.

    Edited by StarsMmd


    I'd wager you're right about 0xFFFF08 changing the font color.

    I certainly did write some Java. It's in pieces and the output of one doesn't always feed right into the next... I guess I can, though. Here. (Lemme also disclaim that it's probably not written to a professional standard.)

    If you run any of it, it might be helpful to know that I like to use Excel (Calc, actually) to sort and prune my intermediate output files where it's necessary. That's the nice thing about using tabs as delimiters: super easy to put into and take out of a spreadsheet. In one program, I'm also throwing much more than is remotely useful to stderr, I just haven't felt like cutting it out... (You could in fact pretty well remove anything I'm printing to stderr at this point.)

    FYI, it usually takes me about 20-30 minutes to search through every file extracted from any given ROM. The other programs don't even take a minute to run, though, I think. Also FYI, one of my intermediate steps sees me writing a file for every table, and I did this because (at the moment, at least), I have a notion of only opening the file(s) I need and reading strings from there. Pretty much the only reason I went beyond that stage was to make it into a convenient text dump.

    Thanks for that. I just want to look through and make sure I haven't missed anything. You managed to parse every single file, so your code can handle anything that the game contains. Mine currently only works perfectly for common_rel, tableres2 and Start.dol, but will crash on things I hadn't seen like 0xFFFF08. I haven't used Java in a while but I think I can still understand it.

    Edited by StarsMmd

    I'd wager you're right about 0xFFFF08 changing the font color.

    Just a little update. 0xFFFF08 does indeed change the font colour. The next bytes are in RGBA order but the Alpha channel doesn't appear to have any effect on the font.

    0xFFFF38 also changes the font colour, but it uses a small group of predefined colours selected by the following 1 byte. The colours are as follows:

    0x00 white

    0x01 yellow

    0x02 green

    0x03 dark blue

    0x04 orange

    0x05 black

    The range of colours is small but it uses fewer bytes.
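For anyone who wants to insert these by script, here is a small sketch that builds the two colour escapes as raw bytes. The helper names and the dictionary are invented for this example; the byte layout simply follows the description above (0xFFFF08 plus 4 RGBA bytes, 0xFFFF38 plus 1 index byte).

```python
# Index values for the predefined colours used by 0xFFFF38, as listed above.
PRESET_COLOURS = {
    "white": 0x00,
    "yellow": 0x01,
    "green": 0x02,
    "dark blue": 0x03,
    "orange": 0x04,
    "black": 0x05,
}


def rgba_colour(r: int, g: int, b: int, a: int = 0xFF) -> bytes:
    """Build the 7-byte 0xFFFF08 escape; alpha reportedly has no visible effect."""
    return b"\xff\xff\x08" + bytes([r, g, b, a])


def preset_colour(name: str) -> bytes:
    """Build the 4-byte 0xFFFF38 escape for one of the predefined colours."""
    return b"\xff\xff\x38" + bytes([PRESET_COLOURS[name]])


print(rgba_colour(255, 0, 0).hex())  # ffff08ff0000ff
print(preset_colour("green").hex())  # ffff3802
```

The preset form saves 3 bytes per colour change, which adds up if you are trying to keep a string within its original length.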


    In my slow exploration of the assembly code, I just discovered that string tables are actually supposed to be linked lists (linking one string table to another). It's kind of trivial, but I kind of want to write it down anyway so I have somewhere to look back to if I need to. I'm just gonna lay out the whole thing, in fact, to make it a bit easier...

    (This is true in xD at least; I can't guarantee it is for Colosseum. Certainly, as I mentioned above, the language code isn't really used in the Japanese Colosseum.)

    String table structure:

    • 0x00 -- 4 bytes? -- Unknown (usually 0 or 1)
    • 0x04 -- 2 bytes -- Number of entries in string info list
    • 0x06 -- 2 bytes -- Language code (two ASCII letters)
    • 0x08 -- 4 bytes -- Link to next string table (an address; hard-coded 0, but filled in when the game runs)
    • 0x0C -- 4 bytes -- Link to previous string table (an address; hard-coded 0, but filled in when the game runs)
    • 0x10 -- ... -- List of string info...
    • 0x... -- ... -- List of strings...

    Each entry in the string info list:

    • 0x00 -- 4 bytes -- String ID
    • 0x04 -- 4 bytes -- Offset of string text from the start of the table

    The string ID in each entry is always higher than the one before it (i.e., the IDs run from low to high), but the offset can be anything.
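The layout above maps fairly directly onto Python's struct module. This is a sketch only: it assumes every field is big-endian (as on the GameCube's PowerPC), treats the first 4 bytes as a single unknown value, and `parse_table` is a made-up name.

```python
import struct


def parse_table(data: bytes, start: int = 0) -> tuple[str, list[tuple[int, int]]]:
    """Read a string table header and return (language code, [(string ID, offset), ...])."""
    # 0x00 unknown (4), 0x04 entry count (2), 0x06 language code (2),
    # 0x08 next link (4), 0x0C previous link (4)
    _unknown, count, lang, _next, _prev = struct.unpack_from(">IH2sII", data, start)
    entries = [
        struct.unpack_from(">II", data, start + 0x10 + 8 * i)
        for i in range(count)
    ]
    return lang.decode("ascii"), entries
```

Each offset in the result is relative to the start of the table, so the string itself begins at `start + offset` in the file.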

    The links can run between different languages of string tables (in the US game, a JP table is linked in the middle of US tables). That seems to actually be a good part of the reason they're linked (though I haven't seen it used to that effect exactly, since I'm not using the PAL game right now where you can actually use different languages).

    FWIW, string IDs may have a cap of 0xEA5F, too.

    Edited by Tiddlywinks


    Are you naming the functions in the ASM in a symbol map in Dolphin? If so I'd love to see what you've discovered so far. I could send you my symbol map file as well.


    Sure, I can send that map. =) I'll include what I have for Colosseum too.

    [ATTACH]12954[/ATTACH]

    Some of the function names are actually names I've pulled from what looks like error message data or something in some places in the games. (I wrote a bit of a program to search for those patterns and rename the right function. For that matter, I also used the same program to remove those annoying places where Dolphin mistakenly inserts the start of a new function in the middle of another.)

    Names that I've made, though, I like to put "q_" (like a substitute for a "?") at the beginning if I'm somehow not confident it's correct, or if I'm even less confident, I'll even just leave the default name and append something at the end so I at least know I've seen it if I run into it again (e.g., like "zz_028b5c8_q_AI_element_set" or "zz_010ae8c_q_Copy_helper").

    Edit:

    Oh, these might also be useful... Various structure definitions or partitions, some function "maps" (like input->called function, for some that seem to use something like "select case")... And in particular, all the identifications of the r13 pointers I know.

    [ATTACH]12955[/ATTACH]

    [ATTACH]12956[/ATTACH]

    Edited by Tiddlywinks

    Thanks G, I'm looking through it now. Interested to see what I'll find. Here's mine as well. All the functions I named are at the bottom of the list alphabetically (after the default named ones).

    [ATTACH]12957[/ATTACH]
