Data sizes on various machines and compilers

The following details the sizes, as reported by sizeof(), of various data types on several different compilers under Linux, Unix, Minix, and Windows.

Test program

The test program, fed to all the compilers and run, was as follows. The line identifying the compiler was changed for each one, to the string that its "version" option produced.

#include <stdio.h>
int main(void)
{
	long a; 
	long long b; 
	int c; 
	short c1; 
	char d; 
	void *e; 
	int (*f)(); 
	size_t ss; 

	/* This string was changed for the other compilers */ 
	printf("cc (GCC) 4.0.0 20050519 (Red Hat 4.0.0-8)\n"); 

	/* sizeof yields size_t; cast to int to match the pre-C99 %d used here */ 
	printf("size of void * is %d\n", (int)sizeof(e)); 
	printf("size of code * is %d\n", (int)sizeof(f)); 

	printf("size of size_t is %d\n", (int)sizeof(ss)); 

	printf("size of long long is %d\n", (int)sizeof(b)); 
	printf("size of long is %d\n", (int)sizeof(a)); 
	printf("size of int is %d\n", (int)sizeof(c)); 
	printf("size of short is %d\n", (int)sizeof(c1)); 
	printf("size of char is %d\n", (int)sizeof(d)); 

	return(0); 
}

For some of the systems, the definitions of long long and size_t had to be removed, since those particular compilers don't support them.

Results

The sizes of the various objects are shown in the following table.

OS                        | Bits | End | Compiler                                                                  | void * | code * | size_t | long long | long | int | short | char
--------------------------|------|-----|---------------------------------------------------------------------------|--------|--------|--------|-----------|------|-----|-------|-----
Linux, Debian Jessie      | 64   | Le  | cc (Debian 4.9.2-10) 4.9.2                                                | 8      | 8      | 8      | 8         | 8    | 4   | 2     | 1
Windows 7 Home            | 64   | Le  | x86_64-w64-mingw32-gcc (GCC) 4.5.1                                        | 8      | 8      | 8      | 8         | 4    | 4   | 2     | 1
Fedora Core 4 Linux       | 64   | Le  | cc (GCC) 4.0.0 20050519 (Red Hat 4.0.0-8)                                 | 8      | 8      | 8      | 8         | 8    | 4   | 2     | 1
Slackware Linux           | 32   | Le  | cc (GCC) 3.2.3                                                            | 4      | 4      | 4      | 8         | 4    | 4   | 2     | 1
uclinux (Picotux)         | 32   | Be  | armeb-uclinux-gcc (GCC) 3.4.4 20041218 (prerelease) (Debian 3.4.3-1)      | 4      | 4      | 4      | 8         | 4    | 4   | 2     | 1
Western Digital WorldBook | 32   | Le  | cc (GCC) 3.4.2                                                            | 4      | 4      | 4      | 8         | 4    | 4   | 2     | 1
Raspberry Pi              | 32   | Le  | cc (Debian 4.6.3-12+rpi1) 4.6.3                                           | 4      | 4      | 4      | 8         | 4    | 4   | 2     | 1
Minix                     | 32   | Le  | cc (Minix 2.0) i386                                                       | 4      | 4      | 4      | N/A       | 4    | 4   | 2     | 1
MS Windows                | 32   | Le  | Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 11.00.7022 for 80x86 | 4    | 4      | 4      | N/A       | 4    | 4   | 2     | 1
HP-UX                     | 32   | Be  | cc (HP-UX 7.03 A 9000/370)                                                | 4      | 4      | 4      | N/A       | 4    | 4   | 2     | 1
MS DOS                    | 16   | Le  | Turbo C Version 2.0 (Huge Model)                                          | 4      | 4      | 2      | 4         | 4    | 2   | 2     | 1
MS DOS                    | 16   | Le  | Turbo C Version 2.0 (Large Model)                                         | 4      | 4      | 2      | 4         | 4    | 2   | 2     | 1
286-XENIX                 | 16   | Le  | cc (Large Model)                                                          | 4      | 4      | N/A    | N/A       | 4    | 2   | 2     | 1
MS DOS                    | 16   | Le  | Turbo C Version 2.0 (Medium Model)                                        | 2      | 4      | 2      | 4         | 4    | 2   | 2     | 1
286-XENIX                 | 16   | Le  | cc (Medium Model)                                                         | 2      | 4      | N/A    | N/A       | 4    | 2   | 2     | 1
MS DOS                    | 16   | Le  | Turbo C Version 2.0 (Compact Model)                                       | 4      | 2      | 2      | 4         | 4    | 2   | 2     | 1
MS DOS                    | 16   | Le  | Turbo C Version 2.0 (Small Model)                                         | 2      | 2      | 2      | 4         | 4    | 2   | 2     | 1
286-XENIX                 | 16   | Le  | cc (Small Model)                                                          | 2      | 2      | N/A    | N/A       | 4    | 2   | 2     | 1
MS DOS                    | 16   | Le  | Turbo C Version 2.0 (Tiny Model)                                          | 2      | 2      | 2      | 4         | 4    | 2   | 2     | 1

Discussion

We can see that pointers and size_t grow with the processor word size. Things were simplest on 32-bit systems, where everything was 4 bytes, whether int, long, size_t, or pointer. On 64-bit systems, int again diverges from the pointer size, much as it did in the most useful 16-bit models. Fortunately, all the other headaches of those small and medium models remain absent.

Down Memory Lane

The 16-bit systems had all the various memory models, of which two (medium and compact) have different sizes for code and data pointers. This was because the 2-byte pointers were offsets into a default segment; only in the large and huge models were all pointers complete selector-and-offset variables. Of course, having non-default data segments under 16-bit Windows ruled out all but the small and medium models there.

Since the data segment was not separate per process, we had to make nearly everything dynamically or stack allocated, use separate explicit 32-bit data pointers, and therefore redo all the standard functions with 32-bit data pointers. Rather tedious, and not much fun.

Then there was the perpetual problem of handling more than 65536 bytes of anything. That translates to only 16384 pointers in an array of them, and we routinely had more data than that. Arrays of structs, where each instance could be 80 or 100 bytes, were even less pleasant.

Thankfully, these days are past.

32-bit glory

No wonder 32-bit was welcomed. All pointers, ints, and longs were 4 bytes; no more casting to (char FAR *) everywhere in printf() argument lists just so the segment would make it through. We all settled into a relative state of bliss and laziness. Whether we used int, long, size_t, void *, or the Microsoftism DWORD rarely mattered. Not quite typeless, but a lot of things just worked. And, very importantly, no more of the bugs and errors caused by overrunning 64K segments.

64-bit austerity and luxury

With 64 bits, we now have 8-byte pointers, and on most Unix-like systems (the LP64 model) 8-byte longs as well, while ints are still 4 bytes; note from the table that 64-bit Windows (LLP64) keeps long at 4 bytes. Of course, in many cases, where we only have small values, even the 16-bit shorts remain useful. The situation resembles the 16-bit world, where pointers (at least some of them) were 4 bytes and ints were 2 bytes, so we all used unsigned long to carry them around. It is back to those old ways again.

Another big change is that things have grown. Numerical values printed as unsigned long (which, on LP64 systems, can as before hold pointers and the smaller integers) can be much longer than before: 2^32 is 4294967296 (10 digits), while 2^64 is 18446744073709551616 (20 digits). That means sprintf() calls that were given a once-ample 16 bytes, enough for 10 digits plus a sign and the terminating NUL, are even more dangerous, and that snprintf(), which lets us trade stack-overrun crashes for data truncation, is more likely to chop off data. snprintf() does tell us when that happens; if we bother to check its return value, and redo the job in the cases that need it, we will be in decent shape.

The gains from this increased diligence are fewer bugs and less risk of memory overruns.