Ported
From AlephWiki
Porting Aleph One to Other Operating Systems
Unknown author
Structure alignment issues explained (8/27/00)
The Problem:
Aleph One reads/writes C structures from/to disk files when reading the "Map", "Sounds" and "Shapes" files and writing preferences, saved games and films. This is a serious portability problem. No well-designed program should do this. There are three reasons:
1. Endianness
The byte order of integer values that take up more than one byte is CPU dependent. To solve this, the routines byte_swap_data() and byte_swap_memory() are provided. They work reasonably well, but their usefulness is limited by the following two problems.
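To make the byte-order problem concrete, here is a minimal sketch in C. It is not the actual byte_swap_data() implementation; the helper names swap16, swap32, host_is_little_endian and big_endian_to_host32 are made up for illustration, and the fixed-width types from <stdint.h> stand in for the int16/int32 typedefs used on this page.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers, not part of the Aleph One source. */

/* Reverse the two bytes of a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Reverse the four bytes of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
    uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);   /* look at the lowest-addressed byte */
    return first == 1;
}

/* Convert a 32-bit value that was read raw from a big-endian file
   into the host's byte order. */
static uint32_t big_endian_to_host32(uint32_t raw)
{
    return host_is_little_endian() ? swap32(raw) : raw;
}
```

On a big-endian host the conversion is a no-op; on a little-endian host such as i386 every multi-byte field has to go through a swap like this.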
2. Alignment of structure fields
The byte-offset of fields within structures is compiler dependent. Consider this structure:
struct foo {
int16 bar;
int32 baz;
};
On most 16-bit systems and (for historical reasons) under MacOS this struct will look like this in memory:
2 bytes "bar" | 4 bytes "baz"
However, on almost every other system it will look like this:
2 bytes "bar" | 2 bytes padding | 4 bytes "baz"
Two bytes of padding are inserted to make "baz" (being a 4-byte value) start on a 4-byte boundary in memory. This (putting an n-byte item at an n-byte boundary) is called "natural alignment". It is done because most CPUs can access data faster if it is naturally aligned. Most RISC CPUs can't access data at all if it is not naturally aligned (the program will crash when this is attempted). Even the PowerPC needs support from the operating system to be able to do this in all cases.
Some compilers have means to force a specific structure alignment, but this is highly compiler-specific and therefore can't be used in a portable program (which I have every intention of turning Aleph One into).
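A quick way to see which layout a given compiler picks is to ask it with offsetof(). The following is a small sketch, assuming the <stdint.h> types as stand-ins for int16/int32; print_foo_layout is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct foo {
    int16_t bar;
    int32_t baz;
};

/* Print where the compiler actually placed each field of "foo". */
static void print_foo_layout(void)
{
    printf("offsetof(foo, bar) = %zu\n", offsetof(struct foo, bar));
    printf("offsetof(foo, baz) = %zu\n", offsetof(struct foo, baz));
    printf("sizeof(foo)        = %zu\n", sizeof(struct foo));
}
```

With natural alignment the offset of "baz" is reported as 4; with the 2-byte alignment described for MacOS it would be 2. The numbers depend on the compiler and target, which is exactly the problem.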
3. Structure sizes
If the alignment of structure fields is compiler dependent, so is of course the size of a structure. But there's one additional caveat. Consider this structure:
struct bogon {
int32 up;
int16 down;
};
What is sizeof(bogon)? 6? Wrong. On most systems, it is 8, even though all fields are naturally aligned. This is because sizeof() returns the number of bytes an object would take inside an array of this type. Consider
bogon a[2];
On some systems (including the Mac), this will look like
4 bytes "up" from a[0] | 2 bytes "down" from a[0] | 4 bytes "up" from a[1] | 2 bytes "down" from a[1]
But this would mean that a[1].up is not naturally aligned, so on most systems the array will look like this:
4 bytes "up" from a[0] | 2 bytes "down" from a[0] | 2 bytes padding | 4 bytes "up" from a[1] | 2 bytes "down" from a[1] | 2 bytes padding
Every "bogon" effectively takes up 8 bytes of memory and that's what sizeof() returns, so something like malloc(10 * sizeof(bogon)) to dynamically allocate an array will work properly. This is the officially defined behaviour of the sizeof() operator.
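The relationship between sizeof(), array stride and tail padding can be checked directly. This is a sketch with hypothetical helper names (bogon_stride, bogon_tail_padding), again using <stdint.h> types in place of int16/int32.

```c
#include <stddef.h>
#include <stdint.h>

struct bogon {
    int32_t up;
    int16_t down;
};

/* Distance in bytes between two consecutive array elements.
   By definition this always equals sizeof(struct bogon). */
static size_t bogon_stride(void)
{
    struct bogon a[2];
    return (size_t)((char *)&a[1] - (char *)&a[0]);
}

/* Bytes of padding after the last field: typically 2 with natural
   alignment, 0 on compilers that align to 2 bytes. */
static size_t bogon_tail_padding(void)
{
    return sizeof(struct bogon)
         - (offsetof(struct bogon, down) + sizeof(int16_t));
}
```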
By now it should be clear that storing C structures on disk is a portability nightmare. Unfortunately, the Marathon (and therefore the Aleph One) source does it. As I'm primarily working on Linux/i386, where every one of the three issues mentioned above is handled differently than on the Mac, I had to do something about it to make it work.
The clean and proper way would be to rewrite all file I/O routines completely and turn something like
struct foo s;
read_from_file(file, &s, sizeof(foo));
into
struct foo s;
s.bar = read_big_endian_16(file);
s.baz = read_big_endian_32(file);
to make a clear distinction between the (portable) external representation of data and the (non-portable) internal one. SDL offers a convenient set of routines to implement file I/O in such a way, centering around an "SDL_RWops" object that represents a byte stream in a file or in memory.
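As an illustration of the field-by-field approach, here is a minimal sketch that does not depend on SDL. The byte_stream type and read_byte() are hypothetical, and read_big_endian_16()/read_big_endian_32() take that stream instead of a file handle; a real port would wrap a file or an SDL_RWops object in the same way.

```c
#include <stddef.h>
#include <stdint.h>

struct foo {
    int16_t bar;
    int32_t baz;
};

/* Hypothetical in-memory byte stream (illustration only). */
struct byte_stream {
    const uint8_t *data;
    size_t pos;
    size_t size;
};

/* Read one byte, or 0 if the stream is exhausted. */
static uint8_t read_byte(struct byte_stream *s)
{
    return s->pos < s->size ? s->data[s->pos++] : 0;
}

/* Assemble a 16-bit value, most significant byte first. */
static int16_t read_big_endian_16(struct byte_stream *s)
{
    uint16_t hi = read_byte(s);
    uint16_t lo = read_byte(s);
    /* Assumes two's complement for the unsigned-to-signed cast. */
    return (int16_t)(uint16_t)((hi << 8) | lo);
}

/* Assemble a 32-bit value, most significant byte first. */
static int32_t read_big_endian_32(struct byte_stream *s)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v = (v << 8) | read_byte(s);
    return (int32_t)v;
}

/* External (6 bytes, big-endian) to internal representation. */
static struct foo read_foo(struct byte_stream *s)
{
    struct foo f;
    f.bar = read_big_endian_16(s);
    f.baz = read_big_endian_32(s);
    return f;
}
```

Because the values are assembled from individual bytes, this code behaves the same regardless of the host's byte order, field alignment or sizeof(foo).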
Doing it like this would have taken a considerable amount of time and restructuring of the code (because in many instances, a big chunk of data is read from disk and only parsed in memory). As I wanted to see results quickly, I opted for the "middle way" by making this assumption:
If every field in a structure already lands on its natural alignment when the fields are simply laid out one after another (counting the number of bytes each field takes), the compiler will not add padding between them.
This means that, for example, in the structure "bogon" above, the field "up" will be at offset 0 and the field "down" at offset 4. This is true on almost all architectures, even on the Mac.
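Since everything that follows rests on this assumption, it can be worth letting the compiler verify it. The sketch below assumes a C11 compiler for _Static_assert; the runtime function bogon_layout_matches_disk is a hypothetical fallback for older compilers.

```c
#include <stddef.h>
#include <stdint.h>

struct bogon {
    int32_t up;
    int16_t down;
};

/* Fail the build if a field is not at the offset it has on disk. */
_Static_assert(offsetof(struct bogon, up) == 0,
               "bogon.up must be at offset 0");
_Static_assert(offsetof(struct bogon, down) == 4,
               "bogon.down must be at offset 4");

/* Runtime variant of the same check; returns 1 if the in-memory
   field offsets match the on-disk layout. */
static int bogon_layout_matches_disk(void)
{
    return offsetof(struct bogon, up) == 0 &&
           offsetof(struct bogon, down) == 4;
}
```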
This assumption solves problem 2 when only naturally aligned structures are used. So I created special versions of the structures where the problem arises. For example, the "foo" structure above is not naturally aligned, but
struct saved_foo {
int16 bar;
int16 baz_hi, baz_lo;
};
is. The problematic "baz" field is split into two. Now it's easier to read and byte-swap a 6-byte "foo" from disk:
_bs_field _bs_saved_foo[] = {
_2byte, _2byte, _2byte // Note: not _2byte, _4byte
};
struct saved_foo s;
read_from_file(file, &s, sizeof(saved_foo));
byte_swap_data(&s, sizeof(saved_foo), 1, _bs_saved_foo);
struct foo s2;
s2.bar = s.bar;
s2.baz = (s.baz_hi << 16) | (s.baz_lo & 0xFFFF); // mask so a negative baz_lo is not sign-extended
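The recombination step deserves some care because both halves are signed 16-bit values: a low half with its top bit set would otherwise be sign-extended and overwrite the high half. A sketch of a helper that avoids this follows (join_halves is a hypothetical name, and <stdint.h> types replace int16/int32).

```c
#include <stdint.h>

/* Rebuild a signed 32-bit value from two 16-bit halves that have
   already been byte-swapped into host order. The work is done in
   unsigned arithmetic so neither half is sign-extended. */
static int32_t join_halves(int16_t hi, int16_t lo)
{
    uint32_t u = ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
    /* Assumes two's complement for the unsigned-to-signed cast. */
    return (int32_t)u;
}
```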
Now about problem 3. When we read a 6-byte "bogon" structure like this
_bs_field _bs_bogon[] = {
_4byte, _2byte
};
struct bogon b;
read_from_file(file, &b, sizeof(bogon));
byte_swap_data(&b, sizeof(bogon), 1, _bs_bogon);
we are making two mistakes. Firstly, sizeof(bogon) is most likely 8, so we are reading too much from the file. And then, the byte_swap_data() function is told to swap 8 bytes, but the _bs_bogon definition only accounts for 6 bytes, so the byte_swap_data() function will run amok.
To solve this problem, I've introduced "SIZEOF" constants:
const int SIZEOF_bogon = 6;
This means that a "bogon" structure takes up 6 bytes on disk, which may be different from sizeof(bogon). Now we can write
struct bogon b;
read_from_file(file, &b, SIZEOF_bogon);
byte_swap_data(&b, SIZEOF_bogon, 1, _bs_bogon);
...and it will work. This is how most of my map-reading code was implemented, and it worked for me.
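The same distinction matters when several records are stored back to back: on disk they are SIZEOF_bogon bytes apart, in memory sizeof(bogon) bytes apart. The following sketch shows one way to unpack such a buffer; unpack_bogons is a hypothetical helper, the constant is written as an enum so that it is usable in a C constant expression, and byte-swapping each element is still required afterwards.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct bogon {
    int32_t up;
    int16_t down;
};

enum { SIZEOF_bogon = 6 };   /* size of one record on disk */

/* The on-disk record must fit into the in-memory structure. */
_Static_assert(SIZEOF_bogon <= sizeof(struct bogon),
               "on-disk bogon larger than in-memory bogon");

/* Copy "count" packed on-disk records into an array of in-memory
   structures. Source stride is SIZEOF_bogon, destination stride is
   sizeof(struct bogon). Relies on the field offsets being identical
   on disk and in memory; byte order is not touched here. */
static void unpack_bogons(struct bogon *dst, const uint8_t *src, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        memset(&dst[i], 0, sizeof dst[i]);   /* clear any tail padding */
        memcpy(&dst[i], src + i * SIZEOF_bogon, SIZEOF_bogon);
    }
}
```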
