1582 Posted December 16 Greetings. Super noob here when it comes to this kind of stuff, but for my CS50 final project I decided to create an offline .pkx bank-like app in Python using tkinter, sort of like how PKHeX looks, but basically just for storage. For that, I need to parse the .pkx files to get the info needed for the mons displayed. In theory I could create a separate text file that stores all of this info from some other source and look it up using the National Dex number that PKHeX automatically puts at the start of the file name upon extraction, but I feel like that would be extremely inefficient. How exactly would one go from a .pkx file to something akin to a dictionary for practical use in Python? This might be a question asked a hundred thousand times and I have very little technical knowledge, so any help would be appreciated. Thanks.
Kaphotics Posted December 16 Depends how much you want to read, and which formats you wish to support. pk* files are normally saved in their decrypted & unshuffled state. You can probably ignore that, but here's gen4's reference. How you read the byte array into a structure is up to you. To read species, you need to read it from the correct offset. Gen6's reference. Converting to JSON isn't really advisable; there are so many properties.
1582 (Author) Posted December 16 Thanks for the reply. Honestly, I only really need to parse the name of the species, the nickname, the types, the Poké Ball, the level and the game of origin (I don't know how possible this last one is). I just need to read the bytes at the offsets and convert the corresponding numbers to data via an external source, right? (E.g. the number at the offset for the Poké Ball is 3 and, according to some source of Poké Ball codes, number 3 would be an Ultra Ball.) Would Bulbapedia be a good source for that? My questions might be all over the place, so I apologise again for my lack of understanding.
Kaphotics Posted December 16 Each set of games stores data in its own format; while there may be similarities, offsets can differ. You need to identify which format(s) you would like to support, and implement readers for each. Species can be read as an integer, then localized to whatever language based on a string list for that language. Nickname can be read from the data, depending on how the game encodes strings (gen6+ use UTF-16 with 0x0000 terminators). There are quite a few open source implementations in various languages; here's a Python one for LGP/E, which is the "Gen7b" format of PKM data. https://github.com/Lincoln-LM/PyNXReader/blob/cb15cf5935fdcb8de9b8a9c268d87bc161d3af9a/structure/PK7b.py You'll need to break down each of your "needs" into smaller and smaller needs. Parsing the species? Need a reader, and need the offset to get the Species ID, then need a list of strings to localize the value to a display string. Parsing the ball? Need a reader, and need the offset to get the Ball ID, then need a list of strings to localize the value to a display string. See the pattern and reusable parts? I would recommend opening a pk* file in a hex editor, so you can "feel" what you're actually telling the program to do. Files are just data, and understanding (without code) how that data is stored is essentially a prerequisite to reimplementing that understanding as code.
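The reader / offset / string-list pattern described above could be sketched in Python roughly as follows. This is only an illustration, not PKHeX's implementation: the helper names, the offsets `SPECIES_OFFSET` and `NICKNAME_OFFSET`, and the tiny `SPECIES_NAMES` list are placeholders, and the real offsets have to come from the reference for whichever format you decide to support.

```python
import struct

# Hypothetical offsets, for illustration only. Look up the real ones
# in the structure reference for the format you are reading.
SPECIES_OFFSET = 0x08
NICKNAME_OFFSET = 0x40

# Placeholder string list: index = species ID, value = display name.
SPECIES_NAMES = ["(none)", "Bulbasaur", "Ivysaur", "Venusaur"]


def read_u16(data: bytes, offset: int) -> int:
    """Read an unsigned 16-bit little-endian integer at the given offset."""
    return struct.unpack_from("<H", data, offset)[0]


def read_utf16_string(data: bytes, offset: int, max_chars: int) -> str:
    """Read a UTF-16LE string, stopping at the first 0x0000 terminator."""
    raw = data[offset:offset + 2 * max_chars]
    for i in range(0, len(raw) - 1, 2):
        if raw[i:i + 2] == b"\x00\x00":
            raw = raw[:i]
            break
    return raw.decode("utf-16-le", errors="replace")


def lookup(names: list, index: int) -> str:
    """Turn a numeric ID into a display string, tolerating unknown IDs."""
    return names[index] if 0 <= index < len(names) else f"(unknown #{index})"


# Build a synthetic buffer so the sketch runs without a real pk* file.
demo = bytearray(0x60)
struct.pack_into("<H", demo, SPECIES_OFFSET, 3)
nick = "Bulby".encode("utf-16-le")
demo[NICKNAME_OFFSET:NICKNAME_OFFSET + len(nick)] = nick
demo = bytes(demo)

species_name = lookup(SPECIES_NAMES, read_u16(demo, SPECIES_OFFSET))
nickname = read_utf16_string(demo, NICKNAME_OFFSET, 12)
print(species_name, nickname)  # Venusaur Bulby
```

The same `read_u16` plus `lookup` pair would be reused for the ball, with a different offset and a different string list; only the nickname needs its own reader.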
1582 (Author) Posted December 16 Yeah, I understand the issue at hand better thanks to you. Super grateful for the insight and the help. Might come back with some questions as I progress through the project, but thanks a lot.
1582 (Author) Posted December 18 So I started working with .pk3 files in a hex editor and grasped most of how they are organised, but one thing (arguably the most important part), the Data section at offset 0x20, is extremely confusing to me. It's confusing for 2 main reasons: 1) The G, A, E and M order should be scrambled, but in my own shiny Pokemon files they have consistently been in GAEM order. Does PKHeX do this automatically (since that's how I extracted these files)? Or is the shininess somehow the reason for this consistency? 2) Everything seems to align with the GAEM table here, except the part I'm most curious about, which is "Origins info" in the Miscellaneous section. I might be converting the hex to binary wrong, but the binary I get just doesn't make any sense in this context. Even the trainer gender is off. I'd appreciate any help, if it isn't too much trouble. Thanks.
Kaphotics Posted December 18 If you are reading the data from a save file, then you need to jump through the same decryption & shuffling steps that the games use. Dumped PKM files can be whatever format the program wants to dump them in, and it's most convenient to dump things in the decrypted and unshuffled state. Compare your output to what the game displays (or other programs) to test your implementation. You can even write unit tests to assert that you are reading the expected values from the structure.
1582 (Author) Posted December 18 Yeah, I tried comparing the results I get to the in-game info and everything seems to line up besides the "Origins info". For my Dusclops, the hex corresponding to that segment is 18 E1, and when I convert it to binary I get 0001 1000 1110 0001, which is the only piece of info I am struggling to align with the in-game material. I'm following the same bit-by-bit principle as shown on Bulbapedia, but the info just doesn't match. Is there another conversion method I'm missing?
Kaphotics Posted Wednesday at 11:06 PM Quoting 1582: "18 E1". If read as a (little-endian) 16-bit number, it is 0xE118. https://en.wikipedia.org/wiki/Endianness
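As a small illustration of the point above, the two bytes from the Dusclops file can be interpreted both ways with Python's built-in `int.from_bytes`. The variable names here are just for the example.

```python
# The two bytes as they appear in the hex editor, in file order.
raw = bytes.fromhex("18E1")

# Little-endian: the first byte in the file is the LEAST significant one.
little = int.from_bytes(raw, "little")
# Big-endian: the first byte in the file is the MOST significant one.
big = int.from_bytes(raw, "big")

print(hex(little))             # 0xe118
print(hex(big))                # 0x18e1
print(format(little, "016b"))  # 1110000100011000
```

Converting each byte to binary separately and writing the results in file order gives the big-endian bit string, which is why the bit positions did not line up.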
1582 (Author) Posted Thursday at 12:12 PM I considered the endianness and tried both ways (despite ChatGPT telling me it was not needed when translating to binary), but either way it didn't match. I understand all of this might be getting slightly annoying, but I just don't have any better source, so please understand why I'm being pesky about this. Let me explain with an example. Here I have the hex for my Dusclops from Ruby. Everything, including the other info from the 48-byte Data structure, lines up until we reach the two highlighted bytes, which should be the Origins info. I tried both little-endian and big-endian conversions, but for the sake of this example (neither of the two matched anyway) I'll show the one you proposed. The game of origin not only doesn't match, it's a random value that isn't even relevant. The Poké Ball did actually match, but my guess is that it's just a coincidence, since I tried with another example and it no longer matched. The trainer gender doesn't match the in-game gender either. This is the part that is confusing to me. I'll show one more example with my Mew from Emerald. Here's the entire hex and, again, everything matches, so the two highlighted bytes become our point of focus. Here, both the game of origin and the Poké Ball are numbers that are completely off, but the trainer gender matches (that one could easily be a coincidence). All in all, I don't know what I'm doing wrong, and the fact that everything else lines up and matches except this very part, the part most important to me, is what's making me ask so many times. I apologise for the sheer number of questions, but I hope these examples provide a clearer picture of what I can't figure out.
Kaphotics Posted Thursday at 02:16 PM Ruby: 1110000100011000, discarding 7 bits (use the >> operator in the calculator), is 111000010. Since Version is only 4 bits, the lowest 4 bits are 0010, which is "2", for Ruby. For your Mew: 399E is 0 0111 0011 0011110; the Version bits 0011 are "3", for Emerald.
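The calculator steps above translate to a shift followed by a mask. A minimal sketch, using the two 16-bit values from this thread (the function name `game_of_origin` is just for illustration):

```python
def game_of_origin(origins: int) -> int:
    """Drop the low 7 bits, then keep only the next 4 bits (the Version field)."""
    return (origins >> 7) & 0b1111


# 16-bit Origins values from the thread, already read as little-endian.
for name, value in (("Dusclops", 0xE118), ("Mew", 0x399E)):
    shifted = value >> 7
    print(name, format(shifted, "09b"), game_of_origin(value))
# Dusclops 111000010 2
# Mew 001110011 3
```

Per the post above, 2 corresponds to Ruby and 3 to Emerald.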
1582 (Author) Posted yesterday at 05:32 AM Sorry for the late reply. I tried this and it works wonders; everything just falls into place perfectly. May I ask why we shift the bits by 7 specifically? (I understand that that is the number that made it "make sense", so to speak, but is there more to the number?) Also, this method is consistent with the bit placement, but what Bulbapedia had as indexes doesn't really work anymore. Thank you so much, though. Will probably bother you again sometime; as annoying as it might be, all of this is just too interesting.
Kaphotics Posted yesterday at 03:54 PM Your screenshot of Bulbapedia shows Met Level as bits 0-6. Game of origin starts at bit 7, ending with bit 10. So, to discard the Met Level bits, you discard a total of 7. Since there are 4 values squished into a 16-bit value, you have to do the appropriate bitwise operations to isolate the values of each.
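One way to isolate all four values is a generic "shift, then mask" helper. The sketch below is an assumption-laden illustration rather than a reference implementation: bits 0-6 and 7-10 are as stated above, while the positions used for the ball (bits 11-14) and trainer gender (bit 15) are taken from Bulbapedia's Gen 3 Origins info layout and should be checked against it. The names `Origins`, `bits` and `unpack_origins` are made up for the example.

```python
from typing import NamedTuple


class Origins(NamedTuple):
    met_level: int   # bits 0-6
    game: int        # bits 7-10
    ball: int        # bits 11-14 (assumed from Bulbapedia's layout)
    ot_gender: int   # bit 15    (assumed from Bulbapedia's layout)


def bits(value: int, start: int, width: int) -> int:
    """Return `width` bits of `value`, starting at bit index `start` (0 = lowest)."""
    return (value >> start) & ((1 << width) - 1)


def unpack_origins(raw: bytes) -> Origins:
    """Split the two Origins bytes (in file order) into their four fields."""
    if len(raw) != 2:
        raise ValueError("expected exactly 2 bytes")
    value = int.from_bytes(raw, "little")
    return Origins(
        met_level=bits(value, 0, 7),
        game=bits(value, 7, 4),
        ball=bits(value, 11, 4),
        ot_gender=bits(value, 15, 1),
    )


print(unpack_origins(bytes.fromhex("18E1")))
# Origins(met_level=24, game=2, ball=12, ot_gender=1)
print(unpack_origins(bytes.fromhex("9E39")))
# Origins(met_level=30, game=3, ball=7, ot_gender=0)
```

Each field gets its own shift amount (its starting bit) and its own mask (its width), so the Bulbapedia bit indexes can be used directly as the `start` argument.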
1582 (Author) Posted 8 hours ago I suppose I understand, but I also keep realising how lucky I keep getting with this endeavour. First of all, I would have had to learn and do so much more (which isn't inherently bad) if PKHeX hadn't saved all of my Pokemon in a decrypted and unshuffled form. Also, this 16-bit sequence is consistent after the 7-bit shift. I understand that normally I would have had to shift the bits separately for each piece of info I was trying to get, but somehow after shifting by 7 bits all of the info (trainer gender, Poké Ball used, game of origin) sits in a consistent and accurate place. I was able to parse the species name and the nickname of the Pokemon in Python today and am extremely happy. Without your help it would have been impossible, so thanks for all the insight. As always, will probably bother you again sometime.