Sunday, November 26, 2017

pg_hexedit: Rich hex editor annotations for Postgres relfiles

I've written an experimental tool for presenting PostgreSQL relation files in a hex editor with annotations/tags and tooltips that show the structure of the data and its content, including bit field values. This tool is called pg_hexedit, and is available from:

https://github.com/petergeoghegan/pg_hexedit

pg_hexedit is built on top of the open source, cross-platform GUI hex editor wxHexEditor. Since it's an experimental tool that is primarily made available for educational purposes, you are well advised to not use it on any data directory that isn't entirely disposable. It may cause data corruption. Opening a Postgres relation file in a hex editor while the server is running is a fundamentally unsafe thing to do if you care about your data. Use of the tool should be limited to throwaway installations on users' personal machines.

wxHexeditor and pg_hexedit together show information about each individual field in an interactive, easy to use way:

pg_type catalog table opened in wxHexeditor, with annotations

I originally wrote the tool in order to meet my own needs in this area. I was working on corruption detection, and it became clear that a tool like this would help with corruption simulation/white-box testing, something that I've spent rather a lot of time on. Simulating and testing novel corruption scenarios became significantly easier with pg_hexedit.

Tools like contrib/pageinspect are great, but they are still somewhat interpretive, which can actually be a hindrance for this kind of work. In short, pageinspect functions show "what tuples are on the page" logically, as well as the physical contents of individual tuples, but the exact physical state of the entire page is obscured, in order to support an item-pointer-wise SQL interface. The subtle details of how free space is managed within a single page can matter. At least to me.

"cities" nbtree page, starting with ItemId array (shown as blue tags)
I eventually realized that pg_hexedit is also broadly useful as an educational tool, and decided to make it available as an open source project. Anything that helps to demystify the internals of Postgres seems like a good thing to me. I hope that pg_hexedit will be useful to users or aspiring hackers that want to understand how PostgreSQL works from the ground up. I welcome pull requests from users that want to expand pg_hexedit. For example, support for additional index access methods would be nice. Only heapam and the nbtree index AM are currently supported.

No comments:

Post a Comment