
Pokemon Super Mystery Dungeon And PMD:GTI Research And Utilities



Thanks! That piece of code that I wrote was in a spot I know is dumb, it was just the only spot that I knew I could trigger in game (I don't want to delete my save!) I'll try to find other places that are more receptive to menus, that's a fun part of the experimen

You probably want to make a save backup then, because something might corrupt your savegame, since scripts have some control over game saves. I used PowerSave to make one. Idk if the homebrew savegame manager now works with PSMD though.


Status update:

I've analyzed the FARC files in PSMD, and a lot of them have a corresponding list of files in them (usually called *_database.bin or *.lst). Unfortunately, there's no obvious relation between the filenames and the entries in the FARC, because the filename database is sorted alphabetically, and the FARC entries are sorted by what may be a hash. The newest version of FARC replaces each entry's filename pointer with what's probably a hash of the filename. I think it's a hash because the game scripts refer to backgrounds in image_2d.bin by filename (like WALLPAPER_SUB_GOLD01), and there's no obvious dictionary. I've tried various hashing algorithms like the ones on Nitrxgen's website, in addition to the C++ hash function in <functional>, but no matches. It's possible the hash is calculated with a Unicode string, which would yield different results than with an ANSI string, so my tests on Nitrxgen's page would mean nothing. The only algorithm I've tried with a Unicode string is C++'s hash function in <functional>.
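The encoding question above can be tested mechanically. The following is a minimal sketch, not a known solution: it tries a couple of standard 32-bit checksums from Python's `zlib` module over ASCII and UTF-16LE bytes, with a few casings. The algorithm list, the casing variants, and the helper name `find_matches` are assumptions for illustration; nothing here claims that any of these combinations actually matches the FARC hashes.

```python
import zlib
from itertools import product

# Candidate 32-bit hashes to try; both come from Python's standard zlib module.
ALGORITHMS = {
    "crc32": lambda b: zlib.crc32(b) & 0xFFFFFFFF,
    "adler32": lambda b: zlib.adler32(b) & 0xFFFFFFFF,
}

# "ANSI" versus "Unicode" in the post roughly corresponds to these encodings.
ENCODINGS = ("ascii", "utf-16-le")

# Casing variants, since scripts use upper-case names for the same assets.
CASINGS = {"as-is": str, "lower": str.lower, "upper": str.upper}


def find_matches(name, target_hash):
    """Return every (algorithm, encoding, casing) that maps name to target_hash."""
    hits = []
    for (algo, fn), enc, (case, conv) in product(
        ALGORITHMS.items(), ENCODINGS, CASINGS.items()
    ):
        if fn(conv(name).encode(enc)) == target_hash:
            hits.append((algo, enc, case))
    return hits
```

Calling `find_matches` with a filename and a hash taken from a FARC entry returns an empty list when none of the candidates fit, so more algorithms can be added to `ALGORITHMS` without changing the rest.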

Also, thanks to @Andibad for making his FARC Unpacker open-source, which helped me get this far.


where is the link to the source code? i'm trying to organize all this information since a ton of stuff is on dropbox or pastebins which are not guaranteed to stick around

on my github account ...

EDIT: uploaded the project to GitHub: https://github.com/andibadra/unFARC

It's still not complete. Since there's no 3DS emulator for PC (in a playable state ...), I just lazily copy-paste huge files to the memory card.


I think SMD uses zlib compression on gyu files and save files; I will check it later.

I looked at the save file, and it has a zlib header, but using .NET's DeflateStream and DotNetZip, I was unable to inflate the save. Something about the block size not being right. It could be that there are some slight differences between PSMD's library and the ones I used. Looking in the code.bin strings provided by psy_commando, it seems PSMD uses "Quazal ZLib Compression Plugin". Unfortunately, according to Bing and Google, that means nothing.
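One thing that may be worth ruling out: .NET's DeflateStream reads raw DEFLATE data (RFC 1951) and does not parse the two-byte zlib wrapper (RFC 1950), so feeding it a buffer that starts with a zlib header can fail even when the data is fine. Below is a minimal sketch using Python's `zlib` module that validates the header and tries the wrapped and raw interpretations in turn. The function names are made up for illustration, and whether the PSMD save is actually a plain zlib stream is an open question in the thread.

```python
import zlib


def looks_like_zlib_header(data):
    """Check the two-byte zlib (RFC 1950) header: compression method must be
    8 (deflate) and the 16-bit big-endian header must be a multiple of 31."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    return (cmf & 0x0F) == 8 and ((cmf << 8) | flg) % 31 == 0


def try_inflate(data):
    """Try several framings; return (label, output) for the first one that
    decodes a complete stream, or None if none of them does."""
    attempts = (
        ("zlib wrapper", data, zlib.MAX_WBITS),
        ("raw deflate, 2-byte header skipped", data[2:], -zlib.MAX_WBITS),
        ("raw deflate", data, -zlib.MAX_WBITS),
    )
    for label, payload, wbits in attempts:
        decoder = zlib.decompressobj(wbits)
        try:
            output = decoder.decompress(payload)
        except zlib.error:
            continue
        if decoder.eof:  # only accept streams that ended cleanly
            return label, output
    return None
```

If all three attempts return nothing, the payload probably is not a bare zlib or deflate stream at that offset (for example, there may be an extra container header in front of it).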


Oh hai guys I'm back :D

On a more serious note, I've been (independently) working on the message_us.bin and message_us.lst files. I don't see a way to map the names (from the .lst) to the actual files (in the .bin FARC) using just that data; I instead used message_en.bin and message_en/ (which are identical, except for the FARC container) to map the names over.

Text uses two-byte characters (which is UTF16LE if the high byte does NOT fall into the range A0h..FEh). There are escapes (as already noted), and their length can be determined by the high byte of the first character (there does not appear to be a simple mathematical relationship for this, unfortunately):

(case (ldb (byte 8 8) base-code)
 ((#xA0 #xA1) 1)
 ((#xA2 #xA3) 2)
 ((#xA4) 1)
 ((#xA7) 2)
 ((#xA8) 1)
 ((#xB0) 2)
 ((#xB1 #xB2) 1)
 ((#xB3) 3)
 ((#xB4) 1)
 ((#xB5) 3)
 ((#xB6) 2)
 ((#xB8 #xB9) 3)
 ((#xBD #xBE) 2)
 ((#xBF #xC1 #xC2 #xC3 #xC4) 1)
 ((#xC5 #xC6) 2)
 ((#xC7 #xC8) 1)
 ((#xC9) 2)
 ((#xCA) 3)
 ((#xCB #xD3 #xD5 #xD6) 1)
 ((#xD7 #xD8 #xD9 #xDA #xDD #xDE #xDF #xE0) 2)
 ((#xE4 #xE5 #xE6) 3)
 ((#xEA #xEB) 1)
 ((#xEC) 2)
 ((#xEE #xF0 #xF1) 1)
 ((#xF4 #xF5 #xF6 #xF7 #xF8) 2)
 ((#xFC) 1)
 ((#xFD) 2))

(Yes, I've since learned Common Lisp, and it is really useful to have around when dealing with Unicode.)

The length in this (case) statement is in units of 2-byte characters, including the beginning character. Not listed are #xD0..#xD2 being a single character (Team/Hero/Partner, but you probably already knew that). I might mention this is the work of a single day's research (which includes a typoed "rm *" instead of "rm *.txt", wiping out my code)
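For readers who don't use Lisp, here is a Python transcription of the (case) form above, plus a small tokenizer built on it. It is a sketch under assumptions: the table values (including the D0h..D2h single-unit escapes) are copied from the post, while the `tokenize` helper and its choice to treat unlisted escape bytes as one unit long are illustrative guesses.

```python
import struct

# Escape length in 2-byte units (including the first unit), keyed by the
# high byte of the first unit. Transcribed from the (case) form in the post.
_GROUPS = {
    1: "A0 A1 A4 A8 B1 B2 B4 BF C1 C2 C3 C4 C7 C8 CB D0 D1 D2 D3 D5 D6 "
       "EA EB EE F0 F1 FC",
    2: "A2 A3 A7 B0 B6 BD BE C5 C6 C9 D7 D8 D9 DA DD DE DF E0 EC "
       "F4 F5 F6 F7 F8 FD",
    3: "B3 B5 B8 B9 CA E4 E5 E6",
}
ESCAPE_LENGTH = {int(h, 16): n for n, hs in _GROUPS.items() for h in hs.split()}


def tokenize(data):
    """Split a little-endian buffer of 2-byte units into char/escape tokens."""
    count = len(data) // 2
    units = struct.unpack("<%dH" % count, data[: count * 2])
    tokens, i = [], 0
    while i < len(units):
        high = units[i] >> 8
        if 0xA0 <= high <= 0xFE:
            # Assumption: escapes missing from the table are one unit long.
            n = ESCAPE_LENGTH.get(high, 1)
            tokens.append(("escape", units[i:i + n]))
            i += n
        else:
            tokens.append(("char", chr(units[i])))
            i += 1
    return tokens
```

Keeping the table as data rather than a chain of conditionals makes it easy to add newly identified escape bytes as they turn up.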

Any specific tasks you think I should get to work on?

Side note: one of my favorite features of Lisp:

   (case code
     ; Known values
     (#x0000 (values nil position))
     (#x0009 (values (list #\SYMBOL_FOR_HORIZONTAL_TABULATION) position))
     (#x000A (values (list #\SYMBOL_FOR_LINE_FEED) position))

Naming characters from Unicode :D


On a more serious note, I've been (independently) working on the message_us.bin and message_us.lst files. I don't see a way to map the names (from the .lst) to the actual files (in the .bin FARC) using just that data; I instead used message_en.bin and message_en/ (which are identical, except for the FARC container) to map the names over.

I've researched the same thing. I believe the original filename is hashed somehow, then the game uses that hash to find the appropriate file. In image_2d.bin, the game even takes it a step further. It takes a filename (like "BACKGROUND_SUB_GOLD01"), looks in image_2d_database.bin to find a corresponding filename (usually just lowercase, but can be anything), then hashes that to look for the file. I managed to redirect an image lookup that way.

I tried comparing the hashing algorithm to that of SARC (scroll down to GetHashFromName()), but I don't think it's the same one. Maybe it's similar and being calculated slightly differently.

I've identified some corresponding filenames and hashes (well, positively identified the first two and made reasonable guesses about the others):

num - hash     - filename
130 - 21691D3B - wallpaper_main_top01
689 - CA5EA838 - wallpaper_main_top02
141 - 259CCD06 - wallpaper_main_top03
680 - C740D87F - wallpaper_main_top04
147 - 2882BD41 - wallpaper_main_top05

It's interesting how different the hashes can be when the filenames differ by only a bit or two.
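To put numbers on that, the sketch below XORs the listed hashes against each other and counts differing bits. The rows are copied from the table above (keeping in mind that only the first two are positively identified); the helper names are made up for illustration. One thing worth checking with it is whether two pairs of names that differ by the same XOR in the same position also give the same hash XOR, which is how CRC-style checksums behave. Treat any such pattern as a hint to test further, not as a conclusion.

```python
# (entry number, hash, filename) rows copied from the table in the post.
KNOWN = [
    (130, 0x21691D3B, "wallpaper_main_top01"),
    (689, 0xCA5EA838, "wallpaper_main_top02"),
    (141, 0x259CCD06, "wallpaper_main_top03"),
    (680, 0xC740D87F, "wallpaper_main_top04"),
    (147, 0x2882BD41, "wallpaper_main_top05"),
]


def bit_distance(a, b):
    """Number of bit positions in which two 32-bit values differ."""
    return bin(a ^ b).count("1")


def input_delta(name_a, name_b):
    """Byte-wise XOR of two equal-length ASCII names."""
    return bytes(x ^ y for x, y in zip(name_a.encode("ascii"), name_b.encode("ascii")))


# Compare every pair: how the last character differs versus how the hash differs.
for i in range(len(KNOWN)):
    for j in range(i + 1, len(KNOWN)):
        (_, hash_a, name_a), (_, hash_b, name_b) = KNOWN[i], KNOWN[j]
        print("%s vs %s: input delta %02X, hash delta %08X, %d bits differ" % (
            name_a[-2:], name_b[-2:],
            input_delta(name_a, name_b)[-1],
            hash_a ^ hash_b,
            bit_distance(hash_a, hash_b),
        ))
```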

For now, I've given up. I've started trying to map image filenames to their corresponding hash, but I don't think we'll ever have a complete list, even if we do figure out the hash. A lot of entries in image_2d_database point to "unused" (like BACKGROUND_SUB_GIFT01), so any file in the FARC that's not in the database we'll probably never know how to match. I would be interested to see your analysis of the message.bin FARC hashes to the actual filenames. This is something I considered doing myself. Knowing this would be especially useful, since it would allow me to make a string editor to complement my WIP script editor.

Any specific tasks you think I should get to work on?

Right now, there's more we don't know than what we do, so you probably can't go wrong with anything. However, here are some things that we'd like to know:

-The placement data for scripts

-The script flow data

-Where starter moves/abilities are defined when changing the player or partner Pokemon in scripts. If changing the player/partner to a non-starter Pokemon, it will have the moves and ability of some other starter Pokemon. @psy_commando could tell you more about that.

-Cafe rewards

-Item data

-Pretty much everything else


Well, I guess I'll finish up my current mission (in the game itself), then see if I can find and make sense out of the data lists (items, etc.) since that might help me a bit more with the text decoding (some of the escapes refer to internal IDs).

EDIT: Well, things start to look "weird" already, since I can't find either string ID for the Reviver Seed anywhere in the ROMFS (aside from the string tables which provide the string itself). I'll check EXEFS (since it might be a hardcoded lookup table or something), but it means that either I'm missing something major about the ID numbers, or we'll need to also pull data from EXEFS to do some things.

EDIT: ... what? Not there either...

EDIT: And not hiding as big-endian data. Either it isn't present (and the ID can be computed based on what is needed), or it is shuffled somewhere. Didn't something from PMD2 have a shuffled size/address field, or am I remembering some other game's structures?

EDIT: Err wait <_< I might have found it now (message.bin, not message_??.bin) Nope, that's Japanese...

Edited by TruePikachu

Hey TruePikachu!

Yeah, it's unclear how string IDs are assigned to stuff, except that Pokemon have their own string UUID for their names, I think.

Also, you probably don't want to look at message_en; it's not used in the US version.

You want to look at message_us. I modified the text in there, and figured out that it's what the game uses to display the text in the US version. I also noticed that the text in there is loaded based on context, and not directly in the script like GTI used to do. I suspect the "script_flow_data" files contain some hints. That, or it's hard-coded, since most Lua functions in the scripts appear to be callbacks called from the original C++ code.

Also, like evan mentioned, there is something odd with starters. I made a little script edit, which I posted earlier, to be able to pick any Pokemon as a starter, but if it's not one of the 20 official starters, its ability and starting moves will be the ones from one of the official starter Pokemon. And for some Pokemon, they won't always get the same starter's moveset/ability. For example, on Eevee I had both Oshawott's and Piplup's movesets on different occasions. Poochyena seems to always get Piplup's moveset and ability.

And looking for some of the move IDs in the binaries didn't yield anything. Not to mention that, while the fancy starter select screen is hard-coded, only the starters' model IDs are in the binary; they match the Pokemon's IDs. I edited them, and found that only their model changed, not the actual Pokemon tied to it. After the select screen, you'd get the correct Pokemon model for the starter you picked. So something tells me there is some data in the game files with details on starters and their movesets and abilities.


I know message_us.bin is for the US text, but it is hard to compare the MD5 checksums of the SIR0s contained within against the nonexistent message_us/ tree in the root of romfs. I only used message_en.bin and its tree to match the file names to the files.

The string tables are...just weird. I've already made a bit of progress on the item data (0x24-byte sized structures, with the WORD at +2h being the cost to buy, and things sorted in what I'm guessing is item ID order; I'll check against the Reviver Seed description stuff, since Plain Seeds), but there is no reference to the absolute position in the string table as for where the name is. However, starting from the 2988th entry in common.bin (when sorted by the location in the file where the string is; this doesn't fit at all with either the ID or the order in the table pointed to by +0x2), you get the item names. However, I highly doubt the game sorts the string tables this way when they load, so there needs to be a reverse mapping somewhere (to map the order of start position of the string to offset in the table of string start addresses). I'll poke around for it next.
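The record layout described above can be walked with a few lines of code. This is a minimal sketch: the 0x24-byte record size and the 16-bit cost at +2h come from the post, while little-endian byte order, the `start` offset of the first record, and the function name are assumptions that should be checked against the real file.

```python
import struct

RECORD_SIZE = 0x24  # bytes per item record, per the post
COST_OFFSET = 0x02  # 16-bit buy price inside each record, per the post


def read_item_costs(blob, start=0, count=None):
    """Return [(index, buy_cost), ...] for consecutive item records.

    Assumes little-endian fields and that records begin at `start`;
    both are guesses to verify against the actual data file.
    """
    if count is None:
        count = (len(blob) - start) // RECORD_SIZE
    return [
        (i, struct.unpack_from("<H", blob, start + i * RECORD_SIZE + COST_OFFSET)[0])
        for i in range(count)
    ]
```

If the records really are in item ID order, the index in each returned pair doubles as the item ID, which makes it easy to spot-check a few known shop prices.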

Do we have a full understanding of the pokemon_data_info.bin yet? My theory right now is that the "model ID"s that you changed actually refer to some field in there, which refers to a table with init data (moveset/ability/etc.). However, said field wouldn't actually be relevant to non-starters, so it might just have junk data which then maps it to whatever init data (like Poochyena -> Piplup's setup).


The new palemoon update seems to have completely broken the forums for me.. It seems the forum's software is really abusing user agent sniffing.. :/

I know message_us.bin is for the US text, but it is hard to compare the MD5 checksums of the SIR0s contained within against the nonexistent message_us/ tree in the root of romfs. I only used message_en.bin and its tree to match the file names to the files.

Well, the thing is, there's probably a reason why message_us doesn't have its own directory. In GTI, the text used to be stored like that, in a subdirectory. But in PSMD, it's all packed into an archive, and accessed via filename like before. The best example of that is the script function for loading the string file for shops, which is used the same way as it was in GTI.

Also, it's been reported that the European language files, which message_en is part of, are incomplete. Names of things, for one; apparently only the dialog was translated. But that's mainly just from hearing random accounts from players who tried the language hack on the US PSMD ROM.

 MENU:LoadMenuTextPool("message/shop.bin", false)
 MENU:LoadMenuTextPool("message/top.bin", true)
 SCREEN_B:LoadWallpaper("WALLPAPER_SUB_SKILL01")

Also, I'm not sure what you mean about the string tables being weird. They have a table with offsets from the start of the header to a string, and each entry has a 32-bit UUID used to refer to it in the scripts. It's been like this since GTI.

https://dl.dropboxusercontent.com/u/13343993/my_pmd_research_files/PMD_GTI/FileFormats/string_database.txt

As for the pokemon data file, silverhawke and andibad found a lot about them a few pages back. There are still some things we're not too sure about though.


I mean, I'm successfully parsing the string tables, but the order of the entries is sorted by UUID (generally), not by the order the strings themselves are in the file. While this probably is to allow a binary search, it also means that there needs to be a lookup somewhere for e.g. item ID to string ID; something which I am completely unable to find, since the only instances of the ID that I can find only exist in the language files.
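A table sorted by UUID lends itself to the binary search mentioned above. Here is a minimal in-memory sketch using Python's `bisect` module. It works on already-parsed (uuid, text) pairs and does not model the on-disk layout; the class name and the sample strings in the usage line are placeholders.

```python
from bisect import bisect_left


class StringTable:
    """UUID-keyed string lookup over (uuid, text) pairs."""

    def __init__(self, pairs):
        # Sort here because the post says the file is only "generally" sorted.
        self._pairs = sorted(pairs, key=lambda pair: pair[0])
        self._keys = [uuid for uuid, _ in self._pairs]

    def lookup(self, uuid):
        """Return the string for uuid, or None when it is not present."""
        i = bisect_left(self._keys, uuid)
        if i < len(self._keys) and self._keys[i] == uuid:
            return self._pairs[i][1]
        return None
```

For example, `StringTable([(0x20, "b"), (0x10, "a")]).lookup(0x10)` finds the entry in logarithmic time once the table is built.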


Ohh, I understand now ^^;

Well, maybe a good lead would be investigating the scripts ?

I found this while looking around:

local textId = FUNC_COMMON:GetItemExplainTextId(parent.curItem.obj.obj:GetIndex())

Then with the IDA demo, and some messing around to get an elf to feed it:

.text:0011C8A0 ; ---------------------------------------------------------------------------
.text:0011C8A0
.text:0011C8A0 loc_11C8A0                              ; CODE XREF: sub_112FAC+9510j
.text:0011C8A0                 BL      sub_1D8BE0      ; Branch with Link
.text:0011C8A4                 LDR     R2, =sub_16CA38 ; Load from Memory
.text:0011C8A8                 ADR     R1, aGetitemexpla_0 ; "GetItemExplainTextId"
.text:0011C8AC                 MOV     R0, R4          ; Rd = Op2

So the subroutine that runs this is at 0x16CA38 in the elf, and you just have to subtract 0x10000 from it to get the offset in the exefs.bin file. Though, it's bound to have all the Lua parameter processing stuff etc. in there to make things more complicated.. :/


After I read up quite a bit on how ARM assembly works, I'll see what I can make of a disassembly. I'm already familiar with disassemblies in general (recall that I had to disassemble that one tool someone made for the PSMD2 soundtrack to understand the data file it came with), so I'll probably set aside either later today or tomorrow for that subroutine.


Actually, now that you mention it, you'll probably want to look up more info on how the Lua VM works, and what the ASM for exposed functions/methods looks like. Also, all Lua numbers and constants are stored as doubles. So now that I think of it, that might be why you were having issues finding the values you wanted in the exefs, maybe?

The 3DS runs ARMv6 with VFPv2, just for reference. The VFP is a floating-point coprocessor. Basically, it adds some opcodes and registers for floating-point numbers. It works very similarly to how the NDS ARM9 worked, except that now there are executable and non-executable memory pages, services, etc.


debian-VM:/~/lisp/psmd
chris:$ cat num2hex.lisp
; Load IEEE754 double-handling code I wrote ages ago
(load "../elisp/binary-float.lisp")

; Two possibilities for the source number
(dolist (num-poss '((#xF099A436 "Wooden Spike from QWORD")
                   (-258366410 "Wooden Spike from DWORD")))
 (destructuring-bind (num name) num-poss
   (let* ((num-as-double (float num 1.0d0))
          (num-as-bin64 (ieee754:get-binary64 num-as-double)))
     ; It is guaranteed by an extensive C++-based test suite that num-as-bin64
     ; is the correct 'double' representation of the input number, according
     ; to IEEE754
     (format t "~A: 0x~16,'0X~%" name num-as-bin64))))

* (load "/home/chris/lisp/psmd/num2hex.lisp")
Wooden Spike from QWORD: 0x41EE133486C00000
Wooden Spike from DWORD: 0xC1AECCB794000000

C:\Users\Chris\3dDump>grep -aPRl '\x94\xB7\xCC\xAE\xC1' PSMD_00174600.exefs PSMD_00174600/

C:\Users\Chris\3dDump>grep -aPRl '\xC1\xAE\xCC\xB7\x94' PSMD_00174600.exefs PSMD_00174600/

C:\Users\Chris\3dDump>grep -aPRl '\xC0\x86\x34\x13\xEE\x41' PSMD_00174600.exefs PSMD_00174600/

C:\Users\Chris\3dDump>grep -aPRl '\x41\xEE\x13\x34\x86\xC0' PSMD_00174600.exefs PSMD_00174600/

C:\Users\Chris\3dDump>

Ignoring the possibility that I'm overlooking something with grep or how the number would be stored, I can't find the `double` representation anywhere either.
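The same search can be reproduced without the Lisp helper library. This is a minimal Python sketch that builds the IEEE 754 binary64 byte patterns with the standard `struct` module and scans a buffer for them. The helper names are made up for illustration; unlike the grep patterns above, it searches for the full eight bytes.

```python
import struct


def double_patterns(value):
    """Return (little_endian, big_endian) binary64 encodings of value."""
    as_float = float(value)
    return struct.pack("<d", as_float), struct.pack(">d", as_float)


def to_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return value - (1 << 32) if value & 0x80000000 else value


def find_all(haystack, needle):
    """Return every offset at which needle occurs in haystack."""
    hits, pos = [], haystack.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = haystack.find(needle, pos + 1)
    return hits
```

Typical use would be reading the decompressed code binary into `haystack` and calling `find_all` once per pattern from `double_patterns(uuid)` and `double_patterns(to_signed32(uuid))`.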

EDIT: Turns out I never actually decompressed exefs <_<

EDIT2: Found a UUID in code.bin, not as a floating point number :D


Definitely found the lookup table:

.data:0092BFAB                     DCB 0x6E ; n
.data:0092BFAC     item_strid      DCD strDummyItem0       ; 0
.data:0092BFAC                                             ; DATA XREF: sub_1ADBDC+37ABCo
.data:0092BFAC                                             ; .text:off_1E56A4o ...
.data:0092BFAC                     DCD strWoodSpike        ; 1
.data:0092BFAC                     DCD strIronSpike        ; 2
.data:0092BFAC                     DCD 0xDD853600          ; 3
.data:0092BFAC                     DCD 0x95D49FFA          ; 4
.data:0092BFAC                     DCD 0xBD288A62          ; 5
.data:0092BFAC                     DCD 0xFBF878F4          ; 6
.data:0092BFAC                     DCD 0x1744B491          ; 7
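A table like the one in this listing (one 32-bit string UUID per item ID) can be pulled out of the binary with `struct`. The sketch below assumes little-endian words and takes the table's file offset as a parameter, because the mapping from the listing's addresses to file offsets depends on how the binary was loaded; the function name and parameters are hypothetical.

```python
import struct


def read_strid_table(code, table_offset, count):
    """Read `count` little-endian 32-bit string UUIDs from `code`,
    starting at file offset `table_offset`. Index N is item ID N."""
    return list(struct.unpack_from("<%dI" % count, code, table_offset))
```

The resulting list can be combined with a parsed string table to turn an item ID into its name: look up `table[item_id]` as the UUID, then resolve that UUID to text.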

EDIT: Just created https://github.com/TruePikachu/psmd-tools and have my string table code up there. About to start work with porting over my WIP item code. Copied, committed, and pushed onto a branch.

EDIT: Heh, item ID 539 is Poké. It costs 6000P to buy it from a shop. Gold bars (540) only need 5550P...

Edited by TruePikachu

My string dumper doesn't entirely count either, since one thing it does is make a data/string-table.dat file, which then other tools can easily access by (require "string-table") then (st:uuid-string #x12345678) or whatever. This functionality is already being used by the WIP item dumper, which uses the string table and code.bin to match the item names to the items.

It does a bit more than just dump the strings, it dumps them into an intermediate format that the other tools can use :)

EDIT: Output is actually from IDA.


Hm, while one can reasonably assume the programmers knew Japanese, did you know their comments can still be found in the exefs?

.rodata:00856B12                     DCB    0
.rodata:00856B13                     DCB    0
.rodata:00856B14     aCtr_1          unicode <メニューファイバーが何らかの理由で、ですとろいされました。かなり致命的なエラーです。(CTR版では多分フリーズ)>,0xA
.rodata:00856B14                                             ; DATA XREF: .text:006D3AA0o
.rodata:00856B14                                             ; .text:off_6D3C7Co
.rodata:00856B14                     unicode <>,0

(Google Translate: "Menu fiber for some reason, it was Destroy. It is quite a fatal error. (Maybe freeze in CTR version)")


You need to set the encoding for 2-byte characters (called Unicode in the interface) to UTF16LE (Alt+A screen).

Either Chunsoft messed up their compiler configuration, or these aren't actually comments (but rather debug output text or something), in technicality. I haven't found any references to them yet, though, so I can't determine anything for sure. Also, IDA can't copy the characters out, so looking things up on Google Translate is a bit of a pain (I used dd(1) to cut out the section of the binary, and loaded it in Vim). Might want to be careful with this, though, since it causes the autoanalysis to think a lot more pieces of data (say, vtables) are actually Unicode strings, if you set the default encoding.

RE: PM, I need to actually begin to understand ARM before I can really say anything else.

FAKEEDIT: Hm, just noticed that the excerpt from the binary is terminated by a newline. It is almost certainly a debug string (something which would be printed to terminal directly)...
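As an alternative to cutting the bytes out with dd(1), a string like this can be decoded straight from the binary. A minimal sketch follows, assuming the text is NUL-terminated UTF-16LE and that `start` is the file offset of its first byte (which the caller has to work out from the disassembler's address); the function name and the `max_units` cap are illustrative.

```python
def extract_utf16le(data, start, max_units=4096):
    """Decode a NUL-terminated UTF-16LE string beginning at `start`.

    Stops at the first 00 00 code unit, at the end of the buffer, or after
    max_units code units, whichever comes first.
    """
    end = start
    limit = min(len(data) - 1, start + 2 * max_units)
    while end < limit and data[end:end + 2] != b"\x00\x00":
        end += 2
    return data[start:end].decode("utf-16-le", errors="replace")
```

The decoded text can then be printed or written to a UTF-8 file and pasted into a translator, newline terminator included.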
