Confirmed users
753
edits
(9 intermediate revisions by the same user not shown) | |||
Line 15: | Line 15: | ||
=== Handling of denormals === | === Handling of denormals === | ||
x86 CPUs have two distinct flags that can optionally be enabled to enable non-IEEE-compliant handling of denormals: | x86 CPUs have two distinct flags that can optionally be enabled to enable non-IEEE-compliant handling of denormals by SSE instructions: | ||
* FTZ (Flush To Zero) causes any denormal ''result'' to be flushed to zero. | * FTZ (Flush To Zero) causes any denormal ''result'' to be flushed to zero. | ||
* DAZ (Denormals Are Zero) causes any denormal ''operand'' to be handled as if it were zero. | * DAZ (Denormals Are Zero) causes any denormal ''operand'' to be handled as if it were zero. | ||
Not all CPUs have these flags. Even CPUs with SSE2 instructions can fail to support DAZ | |||
Not all CPUs have these flags. Even CPUs with SSE2 instructions can fail to support DAZ, see below the section about that. | |||
== x86-64 architecture == | == x86-64 architecture == | ||
Line 51: | Line 52: | ||
The reason to compare these normal laws to the overall ''median'', not the overall ''mean'', is that the overall mean may fall in between the actual "normal" time range and some abnormal, much higher values obtained for certain operands. In that case, ''all'' actual timings would fall far away from the mean, which would cause all timings to be reported as abnormal! This problem is averted by using the median instead, which is by definition an actual timing, and is most stable in the face of outliers. | The reason to compare these normal laws to the overall ''median'', not the overall ''mean'', is that the overall mean may fall in between the actual "normal" time range and some abnormal, much higher values obtained for certain operands. In that case, ''all'' actual timings would fall far away from the mean, which would cause all timings to be reported as abnormal! This problem is averted by using the median instead, which is by definition an actual timing, and is most stable in the face of outliers. | ||
Times are measured by clock_gettime using CLOCK_THREAD_CPUTIME_ID. In other words, this is per-thread CPU time. This should be as resilient to perturbations as it gets, especially as the core benchmarking loop is free of I/O or syscalls (no malloc) and relies on only at most two cache lines (it only accesses a few bytes on the stack, and in the worst case they would extend | Times are measured by clock_gettime using CLOCK_THREAD_CPUTIME_ID. In other words, this is per-thread CPU time. This should be as resilient to perturbations as it gets, especially as the core benchmarking loop is free of I/O or syscalls (no malloc) and relies on only at most two cache lines (it only accesses a few bytes on the stack, and in the worst case they would extend across one cache line boundary). | ||
== Caveats == | == Caveats == | ||
Line 70: | Line 71: | ||
== Running the code == | == Running the code == | ||
You can either build [https://github.com/bjacob/time-math | You can either build [https://github.com/bjacob/time-math the source code] or use [http://people.mozilla.org/~bjacob/ta/bin/ prebuilt binaries]. Note that only Linux is supported at the moment. | ||
The binaries should run on any Linux-based system: GNU/Linux, Android, Firefox OS. They are statically linked, so they don't have any dependency, not even on the standard library. | The binaries should run on any Linux-based system: GNU/Linux, Android, Firefox OS. They are statically linked, so they don't have any dependency, not even on the standard library. | ||
Line 114: | Line 115: | ||
As expected, divisions are not constant-time whatsoever, not even on finite values, and not even on integer types. | As expected, divisions are not constant-time whatsoever, not even on finite values, and not even on integer types. | ||
=== Integer addition, subtraction and multiplication are universally constant-time === | === Integer addition, subtraction and multiplication are almost universally constant-time === | ||
But again, note that integer ''division'' isn't, and nor are any ''floating-point'' operations. | But again, note that integer ''division'' isn't, and nor are any ''floating-point'' operations. | ||
See below for a strange exception, where int64 subtraction is not constant time on some 32bit CPUs. | |||
== The hopefully interesting parts == | == The hopefully interesting parts == | ||
Line 132: | Line 135: | ||
* [http://people.mozilla.org/~bjacob/ta/results/intel-pentium-d-3GHz-family-15-model-4-stepping-7-dell-dimension9150-gavin/time-math-x86-64-DAZ-FTZ.txt this Pentium D], family 15 model 4. This set of results shows that the problem exists even in 64bit mode. | * [http://people.mozilla.org/~bjacob/ta/results/intel-pentium-d-3GHz-family-15-model-4-stepping-7-dell-dimension9150-gavin/time-math-x86-64-DAZ-FTZ.txt this Pentium D], family 15 model 4. This set of results shows that the problem exists even in 64bit mode. | ||
=== A significant mass of SSE2-capable CPUs do not have DAZ === | |||
Conventional "wisdom" is that DAZ is available nearly everywhere SSE2 is. | |||
This turns out not to be the case even for Intel CPUs: this [http://people.mozilla.org/~bjacob/ta/results/intel-pentium-m-1.4GHz-family-6-model-9-stepping-5-toshiba-tecra/ Pentium M] supports SSE2 but evidently [http://people.mozilla.org/~bjacob/ta/results/intel-pentium-m-1.4GHz-family-6-model-9-stepping-5-toshiba-tecra/time-math-x86-32-sse2-DAZ-FTZ.txt doesn't support DAZ]. As a result, timings are all over the place, even on just additions, as soon as denormals are involved. | |||
It is also reported that some early Pentium 4 CPUs were in the same case. Conversely, not ''all'' Pentium M's are affected (here is an [http://people.mozilla.org/~bjacob/ta/results/intel-pentium-m-1.6GHz-family-6-model-13-stepping-6-dell-inspiron600m-gavin/ unaffected one]). | |||
=== | === Timing differences in equality comparisons === | ||
The SSE instructions being used there are ucomiss and ucomisd. | |||
The first problem is denormals: equality comparisons are just as affected as other operations by denormals. | |||
The next problem is that many CPUs have abnormally ''slow'' or ''fast'' equality comparisons when the operands are non-finite. | |||
Specifically: | |||
* Recent Intel CPUs tend to have abnormally slow comparisons with NaN. Examples: [http://people.mozilla.org/~bjacob/ta/results/intel-core-i7-2860QM-2.5GHz-family-6-model-42-stepping-7-thinkpad-w520/ Core i7], [http://people.mozilla.org/~bjacob/ta/results/intel-core-2-2.66GHz-family-6-model-15-stepping-11-st3fan/ Core 2]. | |||
* Recent AMD CPUs, low-end and older Intel CPUs, and Intel Atom CPUs have abnormally fast comparisons with NaN. Examples: [http://people.mozilla.org/~bjacob/ta/results/amd-fx-8150-family-21-model-1-stepping-2-benwa-bulldozer/ AMD FX], [http://people.mozilla.org/~bjacob/ta/results/amd-athlon-ii-x4-635-family-16-model-5-stepping-2-gkwong/ Athlon II], [http://people.mozilla.org/~bjacob/ta/results/intel-pentium-b940-2GHz-family-6-model-42-stepping-7-gerv/ Pentium B940], [http://people.mozilla.org/~bjacob/ta/results/intel-pentium-4-3.2GHz-family-15-model-3-stepping-3-tycho/ Pentium 4], [http://people.mozilla.org/~bjacob/ta/results/intel-pentium-d-3GHz-family-15-model-4-stepping-7-dell-dimension9150-gavin/ Pentium D], [http://people.mozilla.org/~bjacob/ta/results/intel-atom-d510-1.66GHz-family-6-model-28-stepping-10-jvehent/ Intel Atom]. | |||
* Some VIA CPUs have abnormally fast comparisons with NaN and Inf. See [http://people.mozilla.org/~bjacob/ta/results/via-nano-u2250-1.6GHz-family-6-model-15-stepping-3-jvehent/ this VIA Nano]. | |||
=== Some CPUs have surprising timing differences on 64-bit integer arithmetic using 32-bit integer arithmetic instructions, when DAZ is enabled === | === Some CPUs have surprising timing differences on 64-bit integer arithmetic using 32-bit integer arithmetic instructions, when DAZ is enabled === | ||
Line 171: | Line 170: | ||
movl %edx, 20(%ebx) | movl %edx, 20(%ebx) | ||
On [http://people.mozilla.org/~bjacob/ta/results/intel-pentium-4-3.2GHz-family-15-model-3-stepping-3-tycho/ one Pentium 4 CPU] that we used, this does not quite run in constant time, with timing differences of up to 25% and | On [http://people.mozilla.org/~bjacob/ta/results/intel-pentium-4-3.2GHz-family-15-model-3-stepping-3-tycho/ one Pentium 4 CPU] that we used, this does not quite run in constant time, with timing differences of up to 25% and about 3 sigma from the overall median; and the weirdest part is that this only happens when DAZ is enabled! This is of course particularly surprising as DAZ is not supposed to have any relevance to integer arithmetic. | ||
Here are the results with DAZ, showing slower-than-normal operation when the right hand side of the subtraction is INT64_MIN, with the slowest case being INT64_MIN - INT64_MIN: | Here are the results with DAZ, showing slower-than-normal operation when the right hand side of the subtraction is INT64_MIN, with the slowest case being INT64_MIN - INT64_MIN: | ||
Line 184: | Line 183: | ||
It is worth noting that this only happens on that particular Pentium 4 CPU (family 15 model 3) and not on other Netburst-based CPUs that we tested. | It is worth noting that this only happens on that particular Pentium 4 CPU (family 15 model 3) and not on other Netburst-based CPUs that we tested. | ||
=== Float to int64 conversion is non-constant-time on some CPUs === | |||
FILL ME | |||
=== std::max<int64> is non-constant-time on some CPUs === | |||
FILL ME |