In C, I tested the i && !(i & (i - 1)) trick and compared it with __builtin_popcount(i), using gcc on Linux with the -mpopcnt flag to make sure the CPU's POPCNT instruction was used. My test program counted the integers between 0 and 2^30 that are powers of two.
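For reference, the two checks being compared can be sketched as a pair of small functions. The names is_pow2_bits and is_pow2_popcount are hypothetical, chosen only for illustration; __builtin_popcount is the GCC builtin, and whether it compiles down to a single POPCNT instruction depends on the target flags (for example -mpopcnt).

```c
#include <stdbool.h>

/* Hypothetical helper: i & (i - 1) clears the lowest set bit, so the
 * result is zero only when at most one bit was set. The leading `i &&`
 * rejects zero, which would otherwise pass the second test. */
bool is_pow2_bits(unsigned int i)
{
    return i && !(i & (i - 1));
}

/* Hypothetical helper: a power of two has exactly one bit set. */
bool is_pow2_popcount(unsigned int i)
{
    return __builtin_popcount(i) == 1;
}
```

Both functions should agree on every input; they differ only in how the compiler is asked to get there.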
At first I thought that i && !(i & (i - 1)) was 10% faster, even though I had verified in the disassembly that POPCNT was used where I called __builtin_popcount.
However, I realized that I had included an if statement, and branch prediction was probably doing better on the bit twiddling version. I removed the if and POPCNT ended up faster, as expected.
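The difference between the two ways of counting can be sketched like this, using the bit trick as the example. The function names count_with_if and count_by_adding are hypothetical, and whether the compiler actually emits a conditional branch for the first form depends on the compiler version and flags, so treat this as an illustration of the source-level change rather than a guarantee about the generated code.

```c
/* Hypothetical helpers: count the powers of two in [0, limit). */

/* First version: the increment sits under an if statement. */
unsigned int count_with_if(unsigned int limit)
{
    unsigned int n = 0;
    for (unsigned int i = 0; i < limit; i++) {
        if (i && !(i & (i - 1)))
            n++;
    }
    return n;
}

/* Second version: the 0-or-1 result of the test is added directly,
 * so there is no if at the source level. */
unsigned int count_by_adding(unsigned int limit)
{
    unsigned int n = 0;
    for (unsigned int i = 0; i < limit; i++)
        n += (i && !(i & (i - 1)));
    return n;
}
```

Both return the same count; only the second form matches what the timed program below does.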
Results:
Intel(R) Core(TM) i7-4771 CPU max 3.90GHz
    Timing (i && !(i & (i - 1))) trick
    30

    real    0m13.804s
    user    0m13.799s
    sys     0m0.000s

    Timing POPCNT
    30

    real    0m11.916s
    user    0m11.916s
    sys     0m0.000s
AMD Ryzen Threadripper 2950X 16-Core Processor max 3.50GHz
    Timing (i && !(i & (i - 1))) trick
    30

    real    0m13.675s
    user    0m13.673s
    sys     0m0.000s

    Timing POPCNT
    30

    real    0m13.156s
    user    0m13.153s
    sys     0m0.000s
Note that here the Intel CPU seems slightly slower than the AMD one with the bit twiddling, but has a much faster POPCNT (about 14% less time than the bit trick); the AMD POPCNT doesn't provide as much of a boost (about 4%).
popcnt_test.c:
    #include <stdio.h>

    // Count the integers below 2^30 that are powers of 2 (repeated 20 times for timing).
    int main() {
        int n;
        for (int z = 0; z < 20; z++) {
            n = 0;
            for (unsigned long i = 0; i < 1<<30; i++) {
    #ifdef USE_POPCNT
                n += (__builtin_popcount(i) == 1);  // Was: if (__builtin_popcount(i) == 1) n++;
    #else
                n += (i && !(i & (i - 1)));  // Was: if (i && !(i & (i - 1))) n++;
    #endif
            }
        }
        printf("%d\n", n);
        return 0;
    }
Run tests:
    gcc popcnt_test.c -O3 -o test.exe
    gcc popcnt_test.c -O3 -DUSE_POPCNT -mpopcnt -o test-popcnt.exe

    echo "Timing (i && !(i & (i - 1))) trick"
    time ./test.exe
    echo
    echo "Timing POPCNT"
    time ./test-popcnt.exe