TruePikachu

  1. Keep in mind that these string references are to the features least likely to change: the available Pokémon, their abilities, the moves and items, etc. All of the storyline stuff is scripted through the Lua interface, with strings referred to through their UUIDs. In any case, if something referenced from the binary were to change, a recompile would probably be needed anyway (usually to change a hardcoded value such as the number of Pokémon, though I cannot confirm this yet).
  2. So I went further and identified a large number of additional tables. I have a feeling that something, somewhere, was compiled with MSVC++, partly because of that string, and partly because I saw some structures that looked similar to the VC++ implementation of typeinfo. If that is the case, a number of type names will have leaked in a way that lets me pair them with function calls. I know that the two lists I found in the previous post's commit are the names and categories for all the species, both real (including Mega forms) and fake (plot-controlled non-PkMn "opponents"). However, I don't have code written to dump them (at least not in the repo); the specific file (data/offsets.dat) is just a list of offsets (into code.bin) along with the names I've given them. It doesn't take long to write that code, though, so give me a moment... EDIT: Here's the output, spoilers!
  3. strings(1) will, by default, only search for ASCII/ISO8859-charset strings, but you can pass a different encoding through the -e/--encoding= argument (e.g. "-e l" for UTF16LE; see the man page for more info). Usually, when the path to a source file ends up in the output binary, it is there for assert-like functionality (note that assert() is actually a macro, so the exact implementation is compiler-dependent). Working on the disassembly, I've found what seems to be the core function which resolves UUIDs to their strings (const wchar_t *sub_1C42CC(_DWORD uuid)); this might hold at least a few details about some of the escapes, but I'm not holding my breath. I've located the lists of Pokémon names and categories in the .data section, as well as a couple more tables which I haven't fully identified (and so they aren't in the offset list). The category list is almost certainly keyed by a value in the Pokémon data file, so that might help our understanding a bit. As a side note, does anyone know for sure which register holds the reason code when executing SVC 0x3C (Break)?
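For anyone without binutils at hand, here is a rough Python sketch of what `strings -e l` does. It is only an approximation under stated assumptions: the function name `utf16le_strings` and the `min_chars` parameter are made up for illustration, and it only finds runs of printable ASCII stored as UTF-16LE, so it would not pick up Japanese text.

```python
import re

def utf16le_strings(data: bytes, min_chars: int = 4):
    """Yield (offset, text) for runs of printable ASCII stored as UTF-16LE."""
    # Each character is one printable ASCII byte followed by a zero high byte.
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_chars)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("utf-16-le")

# Small self-contained demonstration on synthetic bytes (not real game data).
sample = b"\x01\x02" + "code.bin".encode("utf-16-le") + b"\x00\x00\xff"
found = list(utf16le_strings(sample))
print(found)
```

To scan a real file, read it with `open(path, "rb").read()` and pass the bytes in.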
  4. You need to set the encoding for 2-byte characters (called Unicode in the interface) to UTF16LE (Alt+A screen). Be careful if you set it as the default encoding, though, since it causes the autoanalysis to treat a lot more pieces of data (say, vtables) as Unicode strings. Either Chunsoft messed up their compiler configuration, or these technically aren't comments (but rather debug output text or something). I haven't found any references to them yet, though, so I can't determine anything for sure. Also, IDA can't copy the characters out, so looking things up on Google Translate is a bit of a pain (I used dd(1) to cut the section out of the binary, and loaded it in Vim). RE: PM, I need to actually begin to understand ARM before I can really say anything else. FAKEEDIT: Hm, just noticed that the excerpt from the binary is terminated by a newline. It is almost certainly a debug string (something which would be printed directly to a terminal)...
  5. Hm, while one can reasonably assume the programmers knew Japanese, did you know their comments can still be found in the exefs?

         .rodata:00856B12 DCB 0
         .rodata:00856B13 DCB 0
         .rodata:00856B14 aCtr_1 unicode <メニューファイバーが何らかの理由で、ですとろいされました。かなり致命的なエラーです。(CTR版では多分フリーズ)>,0xA
         .rodata:00856B14 ; DATA XREF: .text:006D3AA0o
         .rodata:00856B14 ; .text:off_6D3C7Co
         .rodata:00856B14 unicode <>,0

     (Google Translate: "Menu fiber for some reason, it was Destroy. It is quite a fatal error. (Maybe freeze in CTR version)")
  6. My string dumper doesn't entirely count either, since one thing it does is make a data/string-table.dat file, which other tools can then easily access with (require "string-table") and then (st:uuid-string #x12345678) or whatever. This functionality is already being used by the WIP item dumper, which uses the string table and code.bin to match the item names to the items. It does a bit more than just dump the strings; it dumps them into an intermediate format that the other tools can use. EDIT: Output is actually from IDA.
  7. Definitely found the lookup table:

         .data:0092BFAB DCB 0x6E ; n
         .data:0092BFAC item_strid DCD strDummyItem0 ; 0
         .data:0092BFAC ; DATA XREF: sub_1ADBDC+37ABCo
         .data:0092BFAC ; .text:off_1E56A4o ...
         .data:0092BFAC DCD strWoodSpike ; 1
         .data:0092BFAC DCD strIronSpike ; 2
         .data:0092BFAC DCD 0xDD853600 ; 3
         .data:0092BFAC DCD 0x95D49FFA ; 4
         .data:0092BFAC DCD 0xBD288A62 ; 5
         .data:0092BFAC DCD 0xFBF878F4 ; 6
         .data:0092BFAC DCD 0x1744B491 ; 7

     EDIT: Just created https://github.com/TruePikachu/psmd-tools and have my string table code up there. About to start work on porting over my WIP item code. Copied, committed, and pushed onto a branch.
     EDIT: Heh, item ID 539 is Poké. It costs 6000P to buy it from a shop. Gold bars (540) only need 5550P...
  8.     debian-VM:/~/lisp/psmd chris:$ cat num2hex.lisp
         ; Load IEEE754 double-handling code I wrote ages ago
         (load "../elisp/binary-float.lisp")
         ; Two possibilities for the source number
         (dolist (num-poss '((#xF099A436 "Wooden Spike from QWORD")
                             (-258366410 "Wooden Spike from DWORD")))
           (destructuring-bind (num name) num-poss
             (let* ((num-as-double (float num 1.0d0))
                    (num-as-bin64 (ieee754:get-binary64 num-as-double)))
               ; It is guaranteed by an extensive C++-based test suite that num-as-bin64
               ; is the correct 'double' representation of the input number, according
               ; to IEEE754
               (format t "~A: 0x~16,'0X~%" name num-as-bin64))))

         * (load "/home/chris/lisp/psmd/num2hex.lisp")
         Wooden Spike from QWORD: 0x41EE133486C00000
         Wooden Spike from DWORD: 0xC1AECCB794000000

         C:\Users\Chris\3dDump>grep -aPRl '\x94\xB7\xCC\xAE\xC1' PSMD_00174600.exefs PSMD_00174600/
         C:\Users\Chris\3dDump>grep -aPRl '\xC1\xAE\xCC\xB7\x94' PSMD_00174600.exefs PSMD_00174600/
         C:\Users\Chris\3dDump>grep -aPRl '\xC0\x86\x34\x13\xEE\x41' PSMD_00174600.exefs PSMD_00174600/
         C:\Users\Chris\3dDump>grep -aPRl '\x41\xEE\x13\x34\x86\xC0' PSMD_00174600.exefs PSMD_00174600/
         C:\Users\Chris\3dDump>

     Ignoring the possibility that I'm overlooking something with grep or with how the number would be stored, I can't find the `double` representation anywhere either.
     EDIT: Turns out I never actually decompressed exefs <_<
     EDIT2: Found a UUID in code.bin, not as a floating point number
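The two hex values above can be cross-checked without the custom Lisp library. This is a minimal Python sketch using only the standard `struct` module; the helper name `double_bits` is made up for illustration. It converts each candidate number to an IEEE 754 binary64 and also builds the little-endian byte string that a search through the binary would need to match.

```python
import struct

def double_bits(value: int) -> int:
    """Return the IEEE 754 binary64 bit pattern of `value` converted to a double."""
    return struct.unpack(">Q", struct.pack(">d", float(value)))[0]

uuid = 0xF099A436
as_signed = uuid - (1 << 32)  # the same 32 bits read as a signed DWORD

for label, number in (("QWORD", uuid), ("DWORD", as_signed)):
    print(f"Wooden Spike from {label}: 0x{double_bits(number):016X}")

# Byte sequence as it would sit in a little-endian file, for searching.
little_endian_signed = struct.pack("<d", float(as_signed))
```

The trailing zero bytes of each pattern are why the grep expressions only list the five or six significant bytes.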
  9. After I read up quite a bit on how ARM assembly works, I'll see what I can make of a disassembly. I'm already familiar with disassemblies in general (recall that I had to disassemble that one tool someone made for the PSMD2 soundtrack to understand the data file it came with); I'll probably set aside time either later today or tomorrow for that subroutine.
  10. I mean, I'm successfully parsing the string tables, but the entries are sorted by UUID (generally), not by the order the strings themselves appear in the file. While this is probably to allow a binary search, it also means that there needs to be a lookup somewhere for e.g. item ID to string ID. That lookup is something I am completely unable to find, since the only instances of the ID that I can find are in the language files.
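To illustrate why a UUID-sorted table suggests a binary search, and why a separate item-ID lookup is still needed, here is a small Python sketch. Everything in it is placeholder data: the UUID values, the strings, and the `item_to_uuid` list are invented for the example and do not come from the game files.

```python
from bisect import bisect_left

def find_string(uuids, strings, uuid):
    """Binary-search a UUID-sorted table; return the string or None if absent."""
    i = bisect_left(uuids, uuid)
    if i < len(uuids) and uuids[i] == uuid:
        return strings[i]
    return None

# Placeholder table, sorted by UUID as the real tables appear to be.
uuids = [0x00000010, 0x00000020, 0x00000030]
strings = ["placeholder A", "placeholder B", "placeholder C"]

# Hypothetical item-ID -> UUID lookup (list index = item ID). This is the
# piece that has to live somewhere outside the string table itself.
item_to_uuid = [0x00000030, 0x00000010]

name_of_item_0 = find_string(uuids, strings, item_to_uuid[0])
print(name_of_item_0)
```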
  11. I know message_us.bin is for the US text, but it is hard to compare the MD5 checksums of the SIR0s contained within against the nonexistent message_us/ tree in the root of romfs. I only used message_en.bin and its tree to match the file names to the files. The string tables are...just weird. I've already made a bit of progress on the item data (0x24-byte structures, with the WORD at +2h being the cost to buy, and things sorted in what I'm guessing is item ID order; I'll check against the Reviver Seed description stuff, since Plain Seeds), but there is no reference to the absolute position in the string table where the name is. However, starting from the 2988th entry in common.bin (when sorted by the location in the file where the string is; this doesn't fit at all with either the ID or the order in the table pointed to by +0x2), you get the item names. I highly doubt the game sorts the string tables this way when it loads them, so there needs to be a reverse mapping somewhere (from the order of the strings' start positions to the offset in the table of string start addresses). I'll poke around for it next. Do we have a full understanding of pokemon_data_info.bin yet? My theory right now is that the "model ID"s that you changed actually refer to some field in there, which refers to a table with init data (moveset/ability/etc.). However, said field wouldn't actually be relevant to non-starters, so it might just have junk data which then maps it to whatever init data (like Poochyena -> Piplup's setup).
  12. Well, I guess I'll finish up my current mission (in the game itself), then see if I can find and make sense out of the data lists (items, etc.) since that might help me a bit more with the text decoding (some of the escapes refer to internal IDs). EDIT: Well, things start to look "weird" already, since I can't find either string ID for the Reviver Seed anywhere in the ROMFS (aside from the string tables which provide the string itself). I'll check EXEFS (since it might be a hardcoded lookup table or something), but it means that either I'm missing something major about the ID numbers, or we'll need to also pull data from EXEFS to do some things. EDIT: ... what? Not there either... EDIT: And not hiding as big-endian data. Either it isn't present (and the ID can be computed based on what is needed), or it is shuffled somewhere. Didn't something from PMD2 have a shuffled size/address field, or am I remembering some other game's structures? EDIT: Err wait <_< I might have found it now (message.bin, not message_??.bin) Nope, that's Japanese...
  13. Oh hai guys I'm back. On a more serious note, I've been (independently) working on the message_us.bin and message_us.lst files. I don't see a way to map the names (from the .lst) to the actual files (in the .bin FARC) using just that data; I instead used message_en.bin and message_en/ (which are identical, except for the FARC container) to map the names over. Text uses two-byte characters (which is UTF16LE if the high byte does NOT fall into the range A0h..FEh). There are escapes (as already noted), and their length can be determined by the high byte of the first character (there does not appear to be a simple mathematical relationship for this, unfortunately):

          (case (ldb (byte 8 8) base-code)
            ((#xA0 #xA1) 1)
            ((#xA2 #xA3) 2)
            ((#xA4) 1)
            ((#xA7) 2)
            ((#xA8) 1)
            ((#xB0) 2)
            ((#xB1 #xB2) 1)
            ((#xB3) 3)
            ((#xB4) 1)
            ((#xB5) 3)
            ((#xB6) 2)
            ((#xB8 #xB9) 3)
            ((#xBD #xBE) 2)
            ((#xBF #xC1 #xC2 #xC3 #xC4) 1)
            ((#xC5 #xC6) 2)
            ((#xC7 #xC8) 1)
            ((#xC9) 2)
            ((#xCA) 3)
            ((#xCB #xD3 #xD5 #xD6) 1)
            ((#xD7 #xD8 #xD9 #xDA #xDD #xDE #xDF #xE0) 2)
            ((#xE4 #xE5 #xE6) 3)
            ((#xEA #xEB) 1)
            ((#xEC) 2)
            ((#xEE #xF0 #xF1) 1)
            ((#xF4 #xF5 #xF6 #xF7 #xF8) 2)
            ((#xFC) 1)
            ((#xFD) 2))

      (Yes, I've since learned Common Lisp, and it is really useful to have around when dealing with Unicode.) The length in this (case) statement is in units of 2-byte characters, including the beginning character. Not listed are #xD0..#xD2, each a single character (Team/Hero/Partner, but you probably already knew that). I might mention this is the work of a single day's research (which includes a typoed "rm *" instead of "rm *.txt", wiping out my code). Any specific tasks you think I should get to work on? Side note: one of my favorite features of Lisp is naming characters from Unicode:

          (case code
            ; Known values
            (#x0000 (values nil position))
            (#x0009 (values (list #\SYMBOL_FOR_HORIZONTAL_TABULATION) position))
            (#x000A (values (list #\SYMBOL_FOR_LINE_FEED) position))
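For readers who don't use Lisp, the escape-length table above can be transcribed into Python roughly as follows. This is a sketch, not the actual tool: the names `ESCAPE_LENGTH` and `tokenize` are made up, treating unlisted high bytes in A0h..FEh as length 1 is purely an assumption for the sake of a runnable example, and the demo input is made-up code units rather than real game text.

```python
import struct

# Escape length in 2-byte units (including the first unit), keyed by high byte.
# Transcribed from the (case) table, plus D0h..D2h as single-unit escapes.
_GROUPS = {
    1: [0xA0, 0xA1, 0xA4, 0xA8, 0xB1, 0xB2, 0xB4, 0xBF, 0xC1, 0xC2, 0xC3, 0xC4,
        0xC7, 0xC8, 0xCB, 0xD0, 0xD1, 0xD2, 0xD3, 0xD5, 0xD6, 0xEA, 0xEB,
        0xEE, 0xF0, 0xF1, 0xFC],
    2: [0xA2, 0xA3, 0xA7, 0xB0, 0xB6, 0xBD, 0xBE, 0xC5, 0xC6, 0xC9,
        0xD7, 0xD8, 0xD9, 0xDA, 0xDD, 0xDE, 0xDF, 0xE0, 0xEC,
        0xF4, 0xF5, 0xF6, 0xF7, 0xF8, 0xFD],
    3: [0xB3, 0xB5, 0xB8, 0xB9, 0xCA, 0xE4, 0xE5, 0xE6],
}
ESCAPE_LENGTH = {hb: n for n, hbs in _GROUPS.items() for hb in hbs}

def tokenize(raw: bytes):
    """Split little-endian 16-bit units into ('char', str) and ('escape', tuple)."""
    count = len(raw) // 2
    units = struct.unpack("<%dH" % count, raw[: count * 2])
    tokens, i = [], 0
    while i < count:
        high = units[i] >> 8
        if 0xA0 <= high <= 0xFE:
            n = ESCAPE_LENGTH.get(high, 1)  # assumption: unknown escapes are 1 unit
            tokens.append(("escape", units[i:i + n]))
            i += n
        else:
            tokens.append(("char", chr(units[i])))
            i += 1
    return tokens

# Made-up input: 'A', a 3-unit B3h escape with two parameter units, then 'B'.
demo = struct.pack("<5H", 0x0041, 0xB300, 0x0001, 0x0002, 0x0042)
tokens = tokenize(demo)
print(tokens)
```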
  14. Further research on monster.md. Gonna just toss my stuff from IDA here...

          00000000 PkMn            struc ; (sizeof=0x44)
          00000000 index           dw ?
          00000002 unk_02          dw ?
          00000004 dex             dw ?            ; enum DEX_ID
          00000006 unk_06          dw ?
          00000008 evolveFrom      dw ?            ; Weirdness exists here with how indexing is done
          0000000A evolveMethod    dw ?            ; enum EVOLVE_TYPE
          0000000C evolveParam1    dw ?
          0000000E unk_0E          db 2 dup(?)
          00000010 spriteID        dw ?
          00000012 gender          db ?            ; enum GENDER
          00000013 bodySize        db ?
          00000014 mainType        db ?
          00000015 altType         db ?
          00000016 unk_16          db 8 dup(?)
          0000001E recruitRate     dw ?            ; Tenths of a %
          00000020 baseHP          dw ?            ; base 10
          00000022 unk_22          dw ?            ; base 10
          00000024 baseATK         db ?            ; base 10
          00000025 baseSPATK       db ?            ; base 10
          00000026 baseDEF         db ?            ; base 10
          00000027 baseSPDEF       db ?            ; base 10
          00000028 unk_28          db 10 dup(?)
          00000032 family          dw ?
          00000034 XItem0          dw ?
          00000036 XItem1          dw ?
          00000038 XItem2          dw ?
          0000003A XItem3          dw ?
          0000003C unk_3C          dw ?
          0000003E unk_3E          dw ?
          00000040 unk_40          dw ?
          00000042 unk_42          dw ?
          00000044 PkMn            ends

          FFFFFFFF ; enum EVOLVE_TYPE
          FFFFFFFF BASE_FORM        = 0
          FFFFFFFF EVOLVE_LEVEL     = 1

          FFFFFFFF ; enum GENDER
          FFFFFFFF DUMMY_ENTRY      = 0
          FFFFFFFF MALE             = 1
          FFFFFFFF FEMALE           = 2
          FFFFFFFF GENDERLESS       = 3

      Data structures actually start at +0x00008, making the overall file structure something like

          struct MonsterMdFile {
              char magic[4]; // "MD\0\0"
              uint32_t nEntries;
              PkMn entries[nEntries];
          };

      The 601st entry is NULL♀; the first is NULL♂. This +600 for female is consistent with the Wonder Mail S encoding, but I'm not entirely sure that it is a plain difference of 600, due to some mismatches in the 'evolveFrom' field. Furthermore, I believe that move information is stored in the /BALANCE/waza_p.bin and /BALANCE/waza_p2.bin files. I haven't looked inside them yet, but I'd think the learnsets are in one, and the move stats in the other.
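As a sanity check on the layout above, here is a minimal Python sketch that computes entry offsets and pulls a few of the named fields out of a buffer. It assumes little-endian fields, and the helper names (`entry_offset`, `read_entry`) are invented for the example. The demo buffer is synthetic, not real monster.md contents. The offset arithmetic (8-byte header plus 0x44 per entry) reproduces the positions quoted for Bulbasaur♂, Bulbasaur♀ and Nidoran♀♂ in the earlier research post.

```python
import struct

HEADER = struct.Struct("<4sI")   # magic "MD\0\0", then nEntries
ENTRY_SIZE = 0x44

def entry_offset(index: int) -> int:
    """File offset of the PkMn structure with the given entry number."""
    return HEADER.size + index * ENTRY_SIZE

def read_entry(data: bytes, index: int) -> dict:
    """Decode a handful of the identified fields; unknown fields are skipped."""
    base = entry_offset(index)
    idx, _unk_02, dex = struct.unpack_from("<3H", data, base)
    gender, body_size, main_type, alt_type = struct.unpack_from("<4B", data, base + 0x12)
    (base_hp,) = struct.unpack_from("<H", data, base + 0x20)
    atk, spatk, dfn, spdef = struct.unpack_from("<4B", data, base + 0x24)
    return {"index": idx, "dex": dex, "gender": gender, "bodySize": body_size,
            "mainType": main_type, "altType": alt_type, "baseHP": base_hp,
            "baseATK": atk, "baseSPATK": spatk, "baseDEF": dfn, "baseSPDEF": spdef}

# Synthetic two-entry file with made-up values in entry 1.
buf = bytearray(HEADER.pack(b"MD\0\0", 2) + bytes(2 * ENTRY_SIZE))
struct.pack_into("<3H", buf, entry_offset(1), 1, 0, 1)        # index, unk_02, dex
struct.pack_into("<B", buf, entry_offset(1) + 0x12, 1)        # gender = MALE
struct.pack_into("<H", buf, entry_offset(1) + 0x20, 45)       # baseHP (made up)
entry = read_entry(bytes(buf), 1)
print(hex(entry_offset(1)), hex(entry_offset(601)), entry)
```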
  15. Okay, just confirmed that the /BALANCE/m_level.bin file handles stat growth at level up for each PkMn. I don't know how it is indexed exactly, but 0000 is Bulbasaur. When unpacked, unSIR0'd (dd can strip the first 16 bytes), and uncompressed, you get 100 12-byte structures; the first DWORD in the structure is the number of experience points for levels 1..100, and either the BYTE or the WORD just past it is how much the HP increases after levelling up. The numbers for experience match my book, and the L1->10 HP growth, as well as the 90->100, matches as well.
      EDIT: /BALANCE/monster.md has some information, like the Lv1 stats and mappings to sprite ID numbers, but it doesn't look like it contains learnsets or possibly the evolution tree. It is a collection of 0x44=68-byte structures:

          +0x00 WORD index
          +0x02 WORD family
          +0x04 WORD dex
          +0x10 WORD spriteID
          +0x14 BYTE mainType
          +0x15 BYTE altType
          +0x20 WORD baseHP
          +0x24 BYTE baseATK
          +0x25 BYTE baseSPATK
          +0x26 BYTE baseDEF
          +0x27 BYTE baseSPDEF
          +0x32 WORD unevolvedForm
          +0x34 WORD exclusiveItems[4]

      A lot of the enums (like items and types) are already listed in http://apointlessplace.net/wms/research/item_p_Sky.xlsx as part of the WMS research. Bulbasaur♂ is at 0x0004C, and Bulbasaur♀ is at 0x09FAC. You also have such gems as Nidoran♀♂ at 0x007BC, but it's likely just a placeholder for easier access via index numbers.
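The m_level.bin layout described above (100 structures of 12 bytes for one PkMn) could be read with something like the Python sketch below. It rests on assumptions that the post leaves open: fields are taken as little-endian, the HP gain is read as a single BYTE at +4 (it may turn out to be a WORD), and the remaining bytes are ignored. The function name `read_levels` is made up, and the demo block is synthetic filler, not real growth data.

```python
import struct

ENTRY_SIZE = 12
LEVELS = 100

def read_levels(block: bytes):
    """Return a list of (level, exp_needed, hp_gain) for one PkMn's growth block."""
    if len(block) < ENTRY_SIZE * LEVELS:
        raise ValueError("expected at least 100 12-byte structures")
    rows = []
    for level in range(1, LEVELS + 1):
        offset = (level - 1) * ENTRY_SIZE
        # Assumption: DWORD experience followed by a BYTE HP increase.
        exp_needed, hp_gain = struct.unpack_from("<IB", block, offset)
        rows.append((level, exp_needed, hp_gain))
    return rows

# Synthetic block: exp = level * 10, hp gain = level % 5, rest zero padding.
block = b"".join(
    struct.pack("<IB", level * 10, level % 5).ljust(ENTRY_SIZE, b"\0")
    for level in range(1, LEVELS + 1)
)
rows = read_levels(block)
print(rows[:3])
```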