StarsMmd Posted May 10, 2015

**You will need to understand Part 1 of this tutorial before you can decompress the files used in any other parts.**

Text editing is the easiest thing to do in just about any game that is being hacked. Text editing in Colo/XD is also very easy, but it can be very tedious because the text in the game is scattered between many different files. This means that you first have to figure out which file has the text you want. Then you may have to unarchive the file if it's in a .fsys file (most of the text is in .fsys files), and if it is, you must be careful not to increase the compressed file size when you recompress it. However, if it isn't in a .fsys file then you can safely write over it without trouble.

All the text in-game is in Unicode, which takes up 2 bytes per character. For example, the character 'A' is 0x0041. To change this to 'B' you just need to change the second byte from 0x41 to 0x42, and you can ignore the 0x00 between each letter.

The easiest place to start is Start.dol in the "&&SystemData" folder of the ISO file system. This file doesn't get compressed, so you can edit the text as much as you like without increasing the file size, assuming you replace bytes rather than insert new bytes. The dol is really large and most of it isn't text, so the text may be hard to find. Try using your hex editor to search for common words. common_rel.fdat in common.fsys also contains a lot of interesting text, like the names of all the pokemon and moves.

The text is always in a file I call a "string table". It stores each piece of text with an id number. That id number can be used elsewhere in the game to reference that piece of text. I will refer to this later when editing moves and pokemon. You can recognise the start of a string table because it has the acronym for the language the table is in. My games are the US version, so the string tables all start with "US" (0x5553).
You can search for this in a file to see if it contains an embedded string table, although it's usually obvious from the presence of Unicode text. The table begins with pairs of values. The first 4 bytes are the id I mentioned previously, and the second 4 bytes are a pointer: the offset from the start of the string table (6 bytes before "US") to the string referenced by that id.

String tables exist in so many files that I can't list them all, but here are some of the important ones:

Start.dol has a string table at 0x2CC810 (Colosseum), 0x374FC0 (XD).
common_rel has some string tables, the first one being at 0x59890 (Colosseum), 0x4E274 (XD), with the next ones following consecutively.
A few files in fight_common are string tables.
The third file in any .fsys containing a map is a string table. This will contain all the NPC dialogue for that map.

When changing the text, make sure you only write over one string at a time. You can usually spot the end of a string because it is terminated by 0x00 or 0xFF. Writing past this will overwrite the next string and mess up the pointers, so either repoint or try to fit your text in the same amount of space.

In order to keep the compressed file size small, try to find some other strings which are less important and shorten them. For example, changing the description for the Inner Focus ability from "prevents flinching" to something like "stops flinches" leaves fewer distinct bytes to compress, which should help keep the compressed size down.

Also, in XD it seems that all the text translations for other languages were included in the US version, which is the one I happen to have been hacking. You can probably safely erase all of the text in those files by replacing each character with 0x00, since you'll probably only ever play in one language. I haven't done this myself but I doubt it would cause issues.
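As a rough illustration of the "replace bytes, don't insert bytes" advice above, here is a minimal Python sketch. The function names (`encode_text`, `overwrite_in_place`) are made up for this example, and padding the leftover space with 0x00 is an assumption based on the strings being null terminated; it is not taken from any existing tool.

```python
def encode_text(text: str) -> bytes:
    """Encode text as 2-byte big-endian Unicode, e.g. 'A' -> 00 41."""
    return text.encode("utf-16-be")


def overwrite_in_place(data: bytearray, offset: int, old: str, new: str) -> None:
    """Replace `old` with `new` at `offset` without changing the file size."""
    old_bytes = encode_text(old)
    new_bytes = encode_text(new)
    if data[offset:offset + len(old_bytes)] != old_bytes:
        raise ValueError("expected text not found at this offset")
    if len(new_bytes) > len(old_bytes):
        raise ValueError("new text is longer than the original string")
    # Assumption: pad the unused space with 0x00 so the next string
    # (and every pointer after it) stays exactly where it was.
    padded = new_bytes + b"\x00" * (len(old_bytes) - len(new_bytes))
    data[offset:offset + len(old_bytes)] = padded


# Toy buffer: two unrelated bytes, one string, then its 2-byte terminator.
blob = bytearray(b"\xAA\xBB" + encode_text("prevents flinching") + b"\x00\x00")
overwrite_in_place(blob, 2, "prevents flinching", "stops flinches")
```

The buffer is the same length before and after the call, which is the property that matters when the file has to be recompressed or when other strings follow it.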
StarsMmd (Author) Posted June 2, 2015 (edited)

I've been looking at the string tables in XD and have some updates. Each string is terminated by the value 0x0000, spanning 2 bytes. There are also some 'escape' characters (a bit like \n, \t or \\ in programming). They start with 0xFFFF followed by 1 or 2 bytes determining the special string or character that it will be replaced with in game.

0xFFFF00 and 0xFFFF03 seem to add a new line (so basically \n). I think the GBA games have 2 different ones as well: one which just continues the text on the next line and one which does the same but also scrolls up a row in the text box. This might be something similar. I don't know exactly what the rest do, but some clearly fill in the player's name, the name of the NPC meant to say that line, or variable text like item or pokemon names.

0xFFFF07 and 0xFFFF53 always have one extra byte after them, taking a total of 4 bytes for the special character. Every other special character I've seen so far is 3 bytes. The regular characters are 2-byte Unicode characters and the strings are 'null terminated' (end in 0x0000). Colosseum is probably exactly the same but I haven't checked.

Edited June 13, 2016 by StarsMmd
Tiddlywinks Posted June 7, 2015 (edited)

I started looking at this after I happened to see your post a while ago, and I ended up ripping the string tables from all of the different-language ROMs. I kind of took the statement that "They exist in so many files that I can't list them all" as a challenge. :tongue: But I don't really hang around here, so I'm not entirely sure if it would be okay for me to link the "text dumps" I made? I included the string ID for each string, thinking it'd be an easy thing for people to grab and use to match string IDs in move data and stuff if they want.

For those interested, a few things I learned...

The string tables in the Japanese Colosseum actually don't have a reliable language code like 0x5553 ("US", for the US ROM). A couple of tables have 0x0001 in that position, but most are just 0x0000, meaning it's really hard to search for those by identifying the header. (In XD, the string tables in the Japanese ROM use the code "JP". And if anyone is interested, the other languages use "FR", "GE", "IT", "SP", and "UK" (United Kingdom/EU English).) Most tables list the same IDs in the same order, though, independent of the ROM, so that's a fairly reliable way to find most of the string tables in the Japanese Colosseum. There are some that have some differences, though; in particular, not every string ID used in one ROM is always used in another (a good example is 0x44F2, present in EU ROMs only).

It might go without saying, but a given ID refers to the same thing between different ROMs (JP/US/EU). As an example, Bulbasaur's species name has ID 0x03E9 in US, UK, French, German, Italian, Japanese, and Spanish string tables.

Some string tables have copies in multiple files, though, and in a few cases, the same string ID may have slightly different text in one location than it does in another.
For instance, string 0x3BFF in the US Colosseum in most places says a save file "could not be created", but in one place it says "could not be made".

As you say, StarsMmd, most of the special 0xFFFF "characters" use 3 bytes total. Those that use a total of 4 bytes are where 0xFFFF is followed by 0x07, 09, 38, 52, 53, 5B, or 5C. One seems to use 7 bytes total: where 0xFFFF is followed by 0x08. And as near as I can tell, all these "characters" are indeed basically the same between Colosseum and XD.

I've put the functions as I know them in a spoiler below (the first line is 0xFFFF00, the next is 0xFFFF01, and so on, and lines with "--" are unused). Some are easy enough, but at some point I got frustrated with trying to pin down all of them, so I just started identifying most of them as "unknown" or something suitably generic. :tongue:

0xFFFF59 (bubble_or_speaker) and 0xFFFF6A (maybe_speaker_ID_toggle) are interesting. Oftentimes 0xFFFF59 will print a speech bubble before a character's dialogue; if 0xFFFF6A is used, though (I've only seen them together, 0xFFFF6AFFFF59), it seems to reveal the character's identity so that any use of 0xFFFF59 thereafter prints the character's name instead (without needing to use 0xFFFF6A again).
newline
unknown_01
dialogue_end
clear_window
furi_kanji
furi_kana
furi_close
unknown2_07
unknown5_08
unknown2_09
--
unknown_0B
unknown_0C
some_pokemon_0D
some_pokemon_0E
some_pokemon_0F
some_pokemon_10
some_pokemon_11
some_pokemon_12
Player_alt
sent_out_pokemon_2
sent_out_pokemon_1
some_pokemon_16
some_pokemon_17
some_pokemon_18
some_pokemon_19
some_ability_1A
some_ability_1B
some_ability_1C
some_ability_1D
some_pokemon_1E
unknown_1F
some_pokemon_20
some_pokemon_21
opp_trainer_class
opp_trainer_name
unknown_24
some_opponent_24
some_opponent_26
some_opponent_27
some_move_28
some_item_29
--
Player
Rui
some_item_2D
some_item_2E
unknown_2F
var_0
var_1
var_2
var_3
var_4
var_5
var_6
var_7
unknown2_38
var_9
--
maybe_location
--
unknown_3D
--
--
--
unknown_41
unknown_42
unknown_43
unknown_44
unknown_45
unknown_46
unknown_47
--
unknown_49
--
unknown_4B
unknown_4C
unknown_4D
some_pokemon_4E
--
unknown_50
--
unknown2_52
unknown2_53
--
unknown_55
unknown_56
unknown_57
unknown_58
bubble_or_speaker
--
unknown2_5B
unknown2_5C
unknown_5D
unknown_5E
unknown_5F
--
unknown_61
unknown_62
--
unknown_64
unknown_65
--
unknown_67
--
unknown_69
maybe_speaker_ID_toggle
--
--
unknown_6D
unknown_6E

Edited June 7, 2015 by Tiddlywinks (typo)
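To make the escape lengths above concrete, here is a small Python sketch of a string decoder. It assumes only what is reported in this thread: regular characters are 2-byte big-endian Unicode, strings end at 0x0000, and an 0xFFFF escape is 3 bytes long unless its code is in the 4-byte or 7-byte groups listed. The function names and the `[code:args]` tag format are invented for this example.

```python
# Total length in bytes of each escape, including the 0xFFFF prefix,
# as reported above. Anything not listed is assumed to be 3 bytes.
FOUR_BYTE_CODES = {0x07, 0x09, 0x38, 0x52, 0x53, 0x5B, 0x5C}
SEVEN_BYTE_CODES = {0x08}


def escape_length(code: int) -> int:
    if code in SEVEN_BYTE_CODES:
        return 7
    if code in FOUR_BYTE_CODES:
        return 4
    return 3


def decode_string(data: bytes, offset: int = 0) -> str:
    """Decode one string starting at `offset`, stopping at the 0x0000 terminator."""
    out = []
    pos = offset
    while pos + 2 <= len(data):
        unit = int.from_bytes(data[pos:pos + 2], "big")
        if unit == 0x0000:
            break
        if unit == 0xFFFF:
            if pos + 2 >= len(data):
                break  # truncated escape at the end of the buffer
            code = data[pos + 2]
            length = escape_length(code)
            args = data[pos + 3:pos + length]
            if code == 0x00:
                out.append("\n")  # 0xFFFF00 is "newline" in the list above
            else:
                tag = f"[{code:02X}" + (":" + args.hex().upper() if args else "") + "]"
                out.append(tag)
            pos += length
        else:
            out.append(chr(unit))
            pos += 2
    return "".join(out)


sample = (
    "Hi".encode("utf-16-be") + b"\xFF\xFF\x00"
    + "A".encode("utf-16-be") + b"\xFF\xFF\x08\xFF\x00\x00\xFF"
    + "B".encode("utf-16-be") + b"\xFF\xFF\x38\x01"
    + b"\x00\x00"
)
decoded = decode_string(sample)
```

Note that a 3-byte or 7-byte escape leaves the following characters on an odd offset, so a decoder cannot simply step through the data two bytes at a time.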
StarsMmd (Author) Posted June 8, 2015

That's really amazing! I don't know how you found the patience for that. I wrote an app that lets me type in an id and it automatically searches for me (and can replace strings), although it only searches the major files right now because I couldn't be bothered to track down all the others. The details of the special characters are really great too (that 7 byte one though 8O). I was dreading the day I'd have to go and find those. I'd really love to use your text dump as a reference if you make it available, and maybe you could include a list of the fsys files that each table comes from?
Tiddlywinks Posted June 9, 2015

Patience? I saw somewhere around here you said you've been working on these games for about a year, I think. That's patience! But when someone tells me, "This is basically how strings work" (especially being something as simple as text), man, I can absolutely go to town figuring the little bits that are missing and making something of it. :biggrin:

I don't have a problem releasing the text dumps I made, I was just being paranoid about rules/etiquette here. I don't really expect any problems, though, so here are the files of all the strings I ripped. For some reason, I'm having real trouble uploading most of the Colosseum files individually (even though I previously uploaded almost the same files with no problem), so I just packed all the languages (US, FR/GE/IT/SP/UK, JP) together into an archive; I figure US/English will probably be of most interest anyway.

Colo: US, All languages
XD: US, All languages

Those files do have every table (that I found), so there will be some duplicate IDs/strings. But I think I've included enough information to be useful to you/anyone playing with these games. Like I said above, it's not really possible to look for the Japanese Colosseum tables directly, so it's possible I missed some oddballs there. I in fact did find one table I had missed when I did some manual checking.
StarsMmd (Author) Posted June 9, 2015 (edited)

Hahaha, that's a fair point! This is really cool though. I'm assuming you wrote a program of sorts to dump all the text, right? If so, could you upload the source code for that as well? Mine isn't complete yet and it would be great to see how you did it (if you don't mind, of course). You don't seem to have mentioned any such code, but you'd have to be pretty dedicated to do all of this manually.

Also, from the values I'm seeing, it looks like 0xFFFF08 almost definitely changes the font colour of the text. The next 4 bytes are RGBA values which determine the colour.

Edited June 9, 2015 by StarsMmd
Tiddlywinks Posted June 9, 2015

I'd wager you're right about 0xFFFF08 changing the font color.

I certainly did write some Java. It's in pieces and the output of one doesn't always feed right into the next... I guess I can, though. Here. (Lemme also disclaim that it's probably not written to a professional standard. :tongue:)

If you run any of it, it might be helpful to know that I like to use Excel (Calc, actually) to sort and prune my intermediate output files where it's necessary. That's the nice thing about using tabs as delimiters, super easy to put into and take out of a spreadsheet. :wink: In one program, I'm also throwing much more than is remotely useful to stderr, I just haven't felt like cutting it out... (You could in fact pretty well remove anything I'm printing to stderr at this point.)

FYI, it usually takes me about 20-30 minutes to search through every file extracted from any given ROM. The other programs don't even take a minute to run, though, I think.

Also FYI, one of my intermediate steps sees me writing a file for every table, and I did this because (at the moment, at least) I have a notion of only opening the file(s) I need and reading strings from there. Pretty much the only reason I went beyond that stage was to make it into a convenient text dump.
StarsMmd (Author) Posted June 9, 2015 (edited)

Thanks for that. I just want to look through and make sure I haven't missed anything. You managed to parse every single file, so your code can handle anything that the game contains. Mine currently only works perfectly for common_rel, tableres2 and start.dol but will crash on things I hadn't seen, like 0xFFFF08. I haven't used Java in a while but I think I can still understand it.

Edited June 10, 2015 by StarsMmd
StarsMmd (Author) Posted June 30, 2015

Tiddlywinks said: "I'd wager you're right about 0xFFFF08 changing the font color."

Just a little update. 0xFFFF08 does indeed change the font colour. The next bytes are in RGBA order, but the alpha channel doesn't appear to have any effect on the font.

0xFFFF38 also changes the font colour, but it uses a small group of predefined colours based on the following 1 byte. The colours are as follows:

0x00 white
0x01 yellow
0x02 green
0x03 dark blue
0x04 orange
0x05 black

The range of colours is small but it uses fewer bytes.
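The two colour escapes described above could be read along these lines. This is only a sketch: the function name and the returned text format are invented for illustration, and it relies solely on the byte layout given in this post (0xFFFF08 followed by R, G, B, A; 0xFFFF38 followed by a 1-byte preset index).

```python
PRESET_COLOURS = {
    0x00: "white",
    0x01: "yellow",
    0x02: "green",
    0x03: "dark blue",
    0x04: "orange",
    0x05: "black",
}


def describe_colour_escape(data: bytes, pos: int) -> str:
    """Describe the colour escape that starts at `pos` (hypothetical helper)."""
    if data[pos:pos + 2] != b"\xFF\xFF":
        raise ValueError("not an escape sequence")
    code = data[pos + 2]
    if code == 0x08:
        # 7 bytes in total: FF FF 08 R G B A. Alpha is read but not used,
        # since it reportedly has no visible effect on the font.
        red, green, blue, _alpha = data[pos + 3:pos + 7]
        return f"rgb({red}, {green}, {blue})"
    if code == 0x38:
        # 4 bytes in total: FF FF 38 followed by a preset index.
        index = data[pos + 3]
        return PRESET_COLOURS.get(index, f"unknown preset 0x{index:02X}")
    raise ValueError("not a colour escape")


custom = describe_colour_escape(b"\xFF\xFF\x08\xFF\x80\x00\xFF", 0)
preset = describe_colour_escape(b"\xFF\xFF\x38\x01", 0)
```

Preset indices outside the six listed values are reported as unknown rather than guessed at.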
Tiddlywinks Posted March 3, 2016 (edited)

In my slow exploration of the assembly code, I just discovered that string tables are actually supposed to be linked lists (linking one string table to another). It's kind of trivial, but I kind of want to write it down anyway so I have somewhere to look back to if I need to. I'm just gonna lay out the whole thing, in fact, to make it a bit easier...

(This is true in XD at least, I can't guarantee it is for Colosseum. Certainly, as I mentioned above, the language code isn't really used in the Japanese Colosseum.)

String table structure:
0x00 -- 4 bytes? -- Unknown (usually 0 or 1)
0x04 -- 2 bytes -- Number of entries in string info list
0x06 -- 2 bytes -- Language code (two ASCII letters)
0x08 -- 4 bytes -- Link to next string table (an address; hard-coded 0, but filled in when the game runs)
0x0C -- 4 bytes -- Link to previous string table (an address; hard-coded 0, but filled in when the game runs)
0x10 -- ... -- List of string info...
0x... -- ... -- List of strings...

Each entry in the string info list:
0x00 -- 4 bytes -- String ID
0x04 -- 4 bytes -- Offset of string text from the start of the table

The string IDs in the list are always in ascending order (each entry's ID is higher than the one before it), but the offset can be anything.

The links can run between different languages of string tables (in the US game, a JP table is linked in the middle of US tables). That seems to actually be a good part of the reason they're linked (though I haven't seen it used to that effect exactly, since I'm not using the PAL game right now, where you can actually use different languages).

FWIW, string IDs may have a cap of 0xEA5F, too.

Edited March 4, 2016 by Tiddlywinks
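The layout above maps fairly directly onto Python's `struct` module. The sketch below is an illustration under assumptions, not the code used to make the text dumps: it assumes big-endian fields (consistent with 'A' being stored as 00 41), treats the "4 bytes?" unknown field as exactly 4 bytes, and uses made-up names (`HEADER`, `ENTRY`, `read_string_table`).

```python
import struct

# Header: unknown (4), entry count (2), language code (2), next link (4), previous link (4).
HEADER = struct.Struct(">IH2sII")
# String info entry: string ID (4), offset of the text from the start of the table (4).
ENTRY = struct.Struct(">II")


def read_string_table(data: bytes, table_start: int = 0) -> dict:
    """Return the language code and a {string ID: absolute offset} map."""
    _unknown, count, lang, _next, _prev = HEADER.unpack_from(data, table_start)
    entries = {}
    for i in range(count):
        entry_pos = table_start + HEADER.size + i * ENTRY.size
        string_id, offset = ENTRY.unpack_from(data, entry_pos)
        # Offsets are relative to the table start, so convert to file offsets.
        entries[string_id] = table_start + offset
    return {"language": lang.decode("ascii", errors="replace"), "entries": entries}


# Tiny hand-built table with two strings, "AB" and "C", for demonstration only.
table = (
    HEADER.pack(0, 2, b"US", 0, 0)
    + ENTRY.pack(0x03E9, 0x20)
    + ENTRY.pack(0x03EA, 0x26)
    + "AB".encode("utf-16-be") + b"\x00\x00"
    + "C".encode("utf-16-be") + b"\x00\x00"
)
parsed = read_string_table(table)
```

Because the entry offsets are relative to the start of the table, the same function works on a table embedded in a larger file as long as `table_start` points at its first byte.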
StarsMmd (Author) Posted March 4, 2016

Are you naming the functions in the ASM in a symbol map in Dolphin? If so, I'd love to see what you've discovered so far. I could send you my symbol map file as well.
Tiddlywinks Posted March 4, 2016 (edited)

Sure, I can send that map. =) I'll include what I have for Colosseum too.

[ATTACH]12954[/ATTACH]

Some of the function names are actually names I've pulled from what looks like error message data or something in some places in the games. (I wrote a bit of a program to search for those patterns and rename the right function. For that matter, I also used the same program to remove those annoying places where Dolphin mistakenly inserts the start of a new function in the middle of another.)

Names that I've made, though, I like to put "q_" (like a substitute for a "?") at the beginning if I'm somehow not confident it's correct, or if I'm even less confident, I'll even just leave the default name and append something at the end so I at least know I've seen it if I run into it again (e.g., like "zz_028b5c8_q_AI_element_set" or "zz_010ae8c_q_Copy_helper").

Edit: Oh, these might also be useful... Various structure definitions or partitions, some function "maps" (like input->called function, for some that seem to use something like "select case")... And in particular, all the identifications of the r13 pointers I know.

[ATTACH]12955[/ATTACH] [ATTACH]12956[/ATTACH]

Edited March 4, 2016 by Tiddlywinks
StarsMmd (Author) Posted March 5, 2016

Tiddlywinks said: "Sure, I can send that map. =) I'll include what I have for Colosseum too."

Thanks G, I'm looking through it now. Interested to see what I'll find. Here's mine as well. All the functions I named are at the bottom of the list alphabetically (after the default named ones).

[ATTACH]12957[/ATTACH]