Time to sort two million 8-byte records vs. two million 40-byte records
I was trying to get an idea of how much slower or faster sorting gets as the record size grows. This relates to low-level data structure organization, in particular to when it makes sense to index data by pointer vs. by value. The L1 cache line size on a modern Intel CPU is 64 bytes, and I was thinking it would make a difference. What I got wasn't quite what I had expected: apparently, it takes only about 4 times as long to sort two million 1024-byte records as it does to sort two million 2-byte records.
I found this pretty amazing.
Here's my program and test results:
kostja@atlas:~$ cat foo.sh
#!/bin/sh
for el_size in 2 4 8 16 32 64 128 256 512 1024
do
    echo -n "$el_size\t"
    ./a.out $el_size
done
kostja@atlas:~$ cat foo.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

#define EL_CNT 2000000

int EL_SIZE;

int my_cmp(const void *a, const void *b)
{
    return memcmp(a, b, EL_SIZE);
}

int usage(const char *progname)
{
    printf("usage: %s <element_size>\n", progname);
    return 1;
}

int main(int argc, char **argv)
{
    EL_SIZE = argc > 1 ? atoi(argv[1]) : 0;
    if (EL_SIZE <= 0)
        exit(usage(argv[0]));
    char *c = malloc(EL_CNT * EL_SIZE);
    int i;
    /* Fill the buffer with random bytes in the range 'A'..'^'. */
    for (i = 0; i < (EL_CNT * EL_SIZE); ++i)
        c[i] = (rand() % 30) + 65;
    struct timeval tv, tv1;
    gettimeofday(&tv, NULL);
    qsort(c, EL_CNT, EL_SIZE, my_cmp);
    gettimeofday(&tv1, NULL);
    printf("%lf\n", ((tv1.tv_sec - tv.tv_sec) * 1000000 +
                     tv1.tv_usec - tv.tv_usec) / 1000000.0);
    free(c);
}
kostja@atlas:~$ gcc -O3 foo.c; sh ./foo.sh
2       0.552472
4       0.502978
8       0.559762
16      0.682934
32      0.895612
64      1.203204
128     1.300302
256     1.583563
512     1.782502
1024    2.142894
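For the by-pointer side of the question raised at the top, here is a minimal sketch, not part of the timed runs above: the my_ptr_cmp comparator and the idx array are illustration only. It sorts an array of pointers into the same buffer instead of the records themselves, so qsort moves 8-byte pointers no matter how big the records are, while every comparison pays one extra indirection into the record buffer.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

#define EL_CNT 2000000

int EL_SIZE;

/* Dereference once, then compare the pointed-to records. */
int my_ptr_cmp(const void *a, const void *b)
{
    return memcmp(*(const char * const *)a,
                  *(const char * const *)b, EL_SIZE);
}

int main(int argc, char **argv)
{
    EL_SIZE = argc > 1 ? atoi(argv[1]) : 0;
    if (EL_SIZE <= 0)
        return 1;
    char *c = malloc((size_t)EL_CNT * EL_SIZE);
    char **idx = malloc(EL_CNT * sizeof(*idx));
    size_t i;
    /* Same random fill as before; size_t avoids int overflow
     * for the largest record sizes. */
    for (i = 0; i < (size_t)EL_CNT * EL_SIZE; ++i)
        c[i] = (rand() % 30) + 65;
    for (i = 0; i < EL_CNT; ++i)
        idx[i] = c + i * EL_SIZE;
    struct timeval tv, tv1;
    gettimeofday(&tv, NULL);
    /* qsort now shuffles 8-byte pointers, never the records. */
    qsort(idx, EL_CNT, sizeof(*idx), my_ptr_cmp);
    gettimeofday(&tv1, NULL);
    printf("%lf\n", ((tv1.tv_sec - tv.tv_sec) * 1000000 +
                     tv1.tv_usec - tv.tv_usec) / 1000000.0);
    free(idx);
    free(c);
    return 0;
}

Whether this wins should depend on how the memcpy traffic saved on swaps compares to the extra cache misses from chasing a pointer on every comparison; the table above gives a feel for what the by-value side costs.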