Fuchsia is looking to remove the ZX_WAIT_ASYNC_ONCE constant, and our
copy of GoogleTest still has it. As usually with GoogleTest updates, our
old local patch is no longer necessary, but now we need a new one.
Change-Id: I8d226f01cf0951fd278605688684bf1ce3e17898
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/44884
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Originally, when we imported fiat-crypto field operations, the pipeline
was in early stages and the generated code had to be manually integrated
with the rest of the curve implementation, so we moved all our
supporting code to third_party/fiat for simplicity. Over time more
supporting code, like the table generators, landed there to be next to
its callers.
fiat-crypto now generates standalone files which we #include into the
supporting code. This moves the supporting code back to the usual
location. It also updates the README.md file to reflect the new
pipeline. (Most of it was a documentation of the old pipeline, which was
much more manual.)
Change-Id: I64db7235feb6566f0d3cd4db3a7146050edaf25a
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/40904
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
I missed a symbol.
Change-Id: I83c6828620a54eaab26cad08b1714402a2758fc0
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/40905
Commit-Queue: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
This file gets #included into other files, so we shouldn't use generic
names like 'fe'. This will let us import other fiat-crypto curves in the
future, if we want them.
Change-Id: Ie4e222729bde7e4ccd368b86fb9048a2ea4a58ac
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/40824
Reviewed-by: David Benjamin <davidben@google.com>
Reviewed-by: Steven Valdez <svaldez@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
This pulls in the latest upstream files and reapplies our value_barrier patch.
In particular, https://github.com/mit-plv/fiat-crypto/pull/723 made 64-bit
P-256 faster!
32-bit x86, gcc
Before:
Did 3150 ECDH P-256 operations in 4027477us (782.1 ops/sec)
Did 9912 ECDSA P-256 signing operations in 4067212us (2437.1 ops/sec)
Did 3772 ECDSA P-256 verify operations in 4059197us (929.2 ops/sec)
Did 74800 Ed25519 key generation operations in 4020883us (18602.9 ops/sec)
Did 74000 Ed25519 signing operations in 4001827us (18491.6 ops/sec)
Did 21371 Ed25519 verify operations in 4024606us (5310.1 ops/sec)
Did 78000 Curve25519 base-point multiplication operations in 4051574us (19251.8 ops/sec)
Did 25133 Curve25519 arbitrary point multiplication operations in 4063280us (6185.4 ops/sec)
After:
Did 3250 ECDH P-256 operations in 4025179us (807.4 ops/sec) [+3.2%]
Did 10277 ECDSA P-256 signing operations in 4084926us (2515.8 ops/sec) [+3.2%]
Did 3895 ECDSA P-256 verify operations in 4048734us (962.0 ops/sec) [+3.5%]
Did 74480 Ed25519 key generation operations in 4002460us (18608.6 ops/sec) [+0.0%]
Did 74000 Ed25519 signing operations in 4004425us (18479.6 ops/sec) [-0.1%]
Did 21756 Ed25519 verify operations in 4038856us (5386.7 ops/sec) [+1.4%]
Did 78000 Curve25519 base-point multiplication operations in 4031991us (19345.3 ops/sec) [+0.5%]
Did 25133 Curve25519 arbitrary point multiplication operations in 4064925us (6182.9 ops/sec) [-0.0%]
x86_64, clang, OPENSSL_SMALL
Before:
Did 20090 ECDH P-256 operations in 4019408us (4998.2 ops/sec)
Did 56000 ECDSA P-256 signing operations in 4004370us (13984.7 ops/sec)
Did 23562 ECDSA P-256 verify operations in 4062283us (5800.2 ops/sec)
Did 127000 Ed25519 key generation operations in 4005053us (31709.9 ops/sec)
Did 128000 Ed25519 signing operations in 4021902us (31825.7 ops/sec)
Did 71000 Ed25519 verify operations in 4036015us (17591.6 ops/sec)
Did 132000 Curve25519 base-point multiplication operations in 4002101us (32982.7 ops/sec)
Did 93000 Curve25519 arbitrary point multiplication operations in 4023827us (23112.3 ops/sec)
After:
Did 22263 ECDH P-256 operations in 4005099us (5558.7 ops/sec) [+11.2%]
Did 61000 ECDSA P-256 signing operations in 4024810us (15156.0 ops/sec) [+8.4%]
Did 27426 ECDSA P-256 verify operations in 4038547us (6791.1 ops/sec) [+17.1%]
Did 128000 Ed25519 key generation operations in 4015033us (31880.2 ops/sec) [+0.5%]
Did 127000 Ed25519 signing operations in 4003894us (31719.1 ops/sec) [-0.3%]
Did 70000 Ed25519 verify operations in 4017446us (17424.0 ops/sec) [-1.0%]
Did 132000 Curve25519 base-point multiplication operations in 4006282us (32948.3 ops/sec) [-0.1%]
Did 93000 Curve25519 arbitrary point multiplication operations in 4025190us (23104.5 ops/sec) [-0.0%]
Change-Id: I2f705772899c701480ca0e0885e6b75dd1bb1f5d
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/40746
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
It is tricky to create EC_FELEMs right now. This will simplify making them.
Change-Id: Icde518efed61381004e2e5569a45d653af48ca2a
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/40585
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
This fixes two issues. First, we have been lax about whether the
low-level inversion functions fail on zero input or output zero. Fix the
documentation and call the latter inv0 or inverse0 to match the
terminology used in draft-irtf-cfrg-hash-to-curve. (Although we may not
actually need inv0 given the optimization in D.2.)
This has no actual effect because the functions were only used in
contexts where the inputs were already guaranteed to be non-zero. Still,
we should be consistent here.
Second, ec_scalar_inv_montgomery and ec_scalar_inv_montgomery_vartime
claim to perform the same operation, but they do not. First, one
computed inv0 and the other computed inv (except only in some
implementations, so fix it to be consistent). Second, the former
computes inverses in the Montgomery domain, while the latter converts to
the Montgomery domain and then inverts. Rename it to
ec_scalar_to_montgomery_inv_vartime, which is... questionably
understandable but at least looks different.
Change-Id: I9b4829ce5013bdb9528078a093f41b1b158df265
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/40526
Reviewed-by: Adam Langley <agl@google.com>
In doing so, add an IgnoreAllInstructions option for FileTest. FileTest
tracks unused fields so test drivers don't accidentally miss a portion
of the test. Wycheproof tests, however, have many different key formats
in instructions. These are tedious to list out, so add an option to
ignore them, on the assumption that checking attributes is more useful
than instructions.
Change-Id: I01cc9f3a95577d576c8c2dd68f5092fceb3215b1
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/39191
Reviewed-by: Adam Langley <agl@google.com>
It's unclear to me whether the normal rsa_signature_test.txt file is
still needed with these, but I've left it in for now.
Change-Id: I6db9c9556820263c2b0bc37d144e6403b9a7a178
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/39189
Reviewed-by: Adam Langley <agl@google.com>
This CL updates the JSON conversion to preserve the flags. A
WycheproofResult now captures both "result" and "flags". An "acceptable"
test case's validity is determined by its flags. By default, we consider
an "acceptable" case as invalid, but a test driver may mark some of them
as valid by listing the flags as a parameter.
Previously, some Wycheproof tests (I think it was x25519_tests.txt?) did
not contain enough information to resolve this unambiguously. This has
since been fixed.
This also makes the converted files smaller because we no longer expand the
flags into comments.
Change-Id: I2ca02d7f1b95f250409e8b23c4ad7bb595d77fdf
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/39188
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Wycheproof have added many more tests. They'll be imported in subsequent
CLs.
Change-Id: I69d8e09328b08edbd0a96757db26b380d7a7c7ee
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/39184
Reviewed-by: Adam Langley <agl@google.com>
The experiment which motivated CECPQ2b has concluded (although the
results haven't been published yet) and the SIKE code is causing some
issues for gRPC in gprc/grpc#20100. Also, this is code size that takes
up space in Android etc.
Change-Id: I43b0b8c420f236c0fe9b40bf2517d2fde98495d5
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/38384
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
This path has always had one-too-few “..” elements since the file first
appeared, but everyone seems to have lived with it, presumably because
/include is in the search path and the compiler tries relative to that.
Change-Id: I30006209ad74d064ded5dd2cd34b1f14806dcffe
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/38344
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
I should have noticed this previously, but the SIKE code was exporting
symbols called generic things like “params”. They're not dynamically
exported, but BoringSSL is often statically linked so better to ensure
that these things are prefixed to avoid the risk of collisions.
Change-Id: I3a942dbc8f4eab703d5f1d6898f67513fd7b578c
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/36745
Commit-Queue: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
We already have crypto/dh/params.c and some of our downstream consumers
cannot take two source files with the same name in the same build
target.
Change-Id: I324ace094c2215b443e98fc9ae69876ea1929efa
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/36744
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
* CECPQ2b will use SIKE/p434 instead of SIKE/p503
* KEM uses SHA256 instead of HMAC-256
* implements new starting curve: y^2=x^3 + 6x^2 + x
* adds optimized implementation for aarch64
* adds optimized implementation for AMD64
which do not support MULX/ADOX/ADCX
* syncs the SIKE test code with the NIST Round 2
specification.
* removes references to field size from variables
names, tests and defines.
Change-Id: I5359c6c62ad342354c6d337f7ee525158586ec93
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/36704
Reviewed-by: Adam Langley <agl@google.com>
Clang recognizes the (mask & a) | (~mask & b) pattern as a select. While
it often optimizes this into a cmov, it sometimes inserts branches
instead, particularly when it detects a string of cmovs with the same
condition.
In the long term, we need language-level support for expressing our
constraints. In the short term, introduce value barriers to prevent the
compiler from reasoning about our bit tricks. Thanks to Chandler Carruth
for suggesting this pattern. It should be reasonably robust, short of
value-based PGO or the compiler learning to reason about empty inline
assembly blocks.
Apply barriers to our various constant-time selects. We should invest
more in the valgrind-based tooling to figure out if there are other
instances.
Change-Id: Icc24ce36a61f7fec021a762c27197b9c5bd28c5d
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/36484
Reviewed-by: Chandler Carruth <chandlerc@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Point addition formulas for short Weierstrass curves are often
incomplete and do not work for P + P. EC implementations usually rely on
constant-time operations never hitting this case, or at least it being
rare[0].
However, the condition checks several values. Our C functions use && and
||, and the P-256 assembly also branches. This can leak intermediate
values via a small side channel. Thanks to David Schrammel and Samuel
Weiser for reporting this.
nistz256 base point multiplication (keygen, ECDSA signing) is unaffected
due to ecp_nistz256_point_add_affine lacking a doubling case. nistp224
and nistp256 base point multiplication, on some compilers, are saved by
quirks of the "mixed" path. The generic code's base point multiplication
and all methods' arbitrary point multiplication (ECDH; ephemeral keys
makes this less interesting) are affected.
Fix the branches in the nistz256 assembly, and use bit operations in C.
Note the C versions are all different because P-224 believes true is 1,
P-256 believes true is any non-zero value, and the generic code believes
true is 0xf...f. This should be double-checked when reviewing.
Aside: The nistz256 assembly also special-cases nontrivial P + (-P) in
arbitrary point multiplication. Fortunately, the formulas in util.c hold
there and I believe one can show P + (-P) is unreachable for all curves.
Still, it would be nice to omit the branch if we can verify the assembly
works anyway.
[0] 03da376ff7/crypto/ec/ecp_nistp521.c (L1259)
Change-Id: I8958624cd6b5272e5076c6c1605ab089e85f4cb7
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/36465
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
KWP is Key Wrap with Padding, defined in RFC 5649 and SP 800-38F. Like
Key Wrap, it's a poor-man's AEAD and shouldn't be used. However, some
existing systems use it and we need to interoperate.
The interface of the added functions is a little unfortunate, but they
match the interfaces of the existing Key Wrap functions which, in turn,
match functions in OpenSSL. Hopefully this way, if OpenSSL ever add
support, we'll line up.
Change-Id: I3496c288f32230a891261586ca2e9c4ee8456c09
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/36324
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
See I9c20b660ce4b58dc633588cfd5b2e97a40203ec3. Aside for p224-64.c, we'd
already split mul_public into a dedicated function, at which point it's
simpler to just have three functions.
This makes it clearer when we do and don't care about the doubling case
coming up and avoids a bunch of NULL checks.
Change-Id: I7c5dafa7f12f4f53937d912ba22c90cb5a786f04
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/36225
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
These even trip UBSan because they break alignment requirements. The
crypto_word_t isn't doing anything here, so just read bytes.
Change-Id: Icb6dfce2c3a10f8252bbb0889cbeedcf1e8d8e62
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/36066
Reviewed-by: Adam Langley <alangley@gmail.com>
Commit-Queue: David Benjamin <davidben@google.com>
Most of the crypto_word_t* casts in third_party/sike are due to p503
being defined with uint64_t. This is a strict aliasing violation and
easily avoided with a TOBN-like macro when defining p503.
This clears almost all of the casts. Also remove an unused stdbool.h
include.
Change-Id: Ife3a4ec620f8b7f4e09c87c6fc43d8b82396046b
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/36064
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <alangley@gmail.com>
Based on Microsoft's implementation available on github:
Source: https://github.com/Microsoft/PQCrypto-SIDH
Commit: 77044b76181eb61c744ac8eb7ddc7a8fe72f6919
Following changes has been applied
* In intel assembly, use MOV instead of MOVQ:
Intel instruction reference in the Intel Software Developer's Manual
volume 2A, the MOVQ has 4 forms. None of them mentions moving
literal to GPR, hence "movq $rax, 0x0" is wrong. Instead, on 64bit
system, MOV can be used.
* Some variables were wrongly zero-initialized (as per C99 spec).
* Rewrite x86_64 assembly to AT&T format.
* Move assembly for x86_64 and aarch64 to perlasm.
* Changes to aarch64 assembly, to avoid using x18 platform register.
Assembly also correctly constructs linked list of stack-frames as
described in AAPCS64, 5.2.3.
* Move constant values to .RODATA segment, as keeping them in .TEXT
segment is not compatible with XOM.
* Fixes issue in arm64 code related to the fact that compiler doesn't
reserve enough space for the linker to relocate address of a global
variable when used by 'ldr' instructions. Solution is to use 'adrp'
followed by 'add' instruction. Relocations for 'adrp' and 'add'
instructions is generated by prefixing the label with :pg_hi21:
and :lo12: respectively.
* Enable MULX and ADX. Code from MS doesn't support PIC. MULX can't
reference global variable directly. Instead RIP-relative addressing
can be used. This improves performance around 10%-13% on SkyLake
* Check if CPU supports BMI2 and ADOX instruction at runtime. On AMD64
optimized implementation of montgomery multiplication and reduction
have 2 implementations - faster one takes advantage of BMI2
instruction set introduced in Haswell and ADOX introduced in
Broadwell. Thanks to OPENSSL_ia32cap_P it can be decided at runtime
which implementation to choose. As CPU configuration is static by
nature, branch predictor will be correct most of the time and hence
this check very often has no cost.
* Reuse some utilities from boringssl instead of reimplementing them.
This includes things like:
* definition of a limb size (use crypto_word_t instead of digit_t)
* use functions for checking in constant time if value is 0 and/or
less then
* #define's used for conditional compilation
* Use SSE2 for conditional swap on vector registers. Improves
performance a little bit.
* Fix f2elm_t definition. Code imported from MSR defines f2elm_t type as
a array of arrays. This decays to a pointer to an array (when passing
as an argument). In C, one can't assign const pointer to an array with
non-const pointer to an array. Seems it violates 6.7.3/8 from C99
(same for C11). This problem occures in GCC 6, only when -pedantic
flag is specified and it occures always in GCC 4.9 (debian jessie).
* Fix definition of eval_3_isog. Second argument in eval_3_isog mustn't be
const. Similar reason as above.
* Use HMAC-SHA256 instead of cSHAKE-256 to avoid upstreaming cSHAKE
and SHA3 code.
* Add speed and unit tests for SIKE.
Some speed results:
Skylake (64-bit):
Did 408 SIKE/P503 generate operations in 1002573us (407.0 ops/sec)
Did 275 SIKE/P503 encap operations in 1070570us (256.9 ops/sec)
Did 264 SIKE/P503 decap operations in 1098955us (240.2 ops/sec)
Skylake (32-bit):
Did 9 SIKE/P503 generate operations in 1051620us (8.6 ops/sec)
Did 5 SIKE/P503 encap operations in 1038251us (4.8 ops/sec)
Did 5 SIKE/P503 decap operations in 1103617us (4.5 ops/sec)
Change-Id: I22f0bb1f9edff314a35cd74b48e8c4962568e330
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35204
Reviewed-by: Adam Langley <alangley@gmail.com>
The new version of googletest deprecates INSTANTIATE_TEST_CASE_P in
favor of INSTANTIATE_TEST_SUITE_P, so apply the change.
This requires blacklisting C4628 on MSVC 2015 which says about digraphs
given foo<::std::tuple<...>>. Disable that warning. Digraphs are not
useful and C++11 apparently explicitly disambiguates that.
It also requires applying
https://github.com/google/googletest/pull/2226, to deal with a warning
in older MSVC.
Update-Note: Consumers using BoringSSL with their own copy of googletest
must ensure googletest was updated to a version from 2019-01-03 or
later for INSTANTIATE_TEST_SUITE_P to work. (I believe all relevant
consumers are fine here. If anyone can't update googletest and is
building BoringSSL tests, building with
-DINSTANTIATE_TEST_SUITE_P=INSTANTIATE_TEST_CASE_P would work as
workaround.)
Bug: chromium:936651
Change-Id: I23ada8de34a53131cab88a36a88d3185ab085c64
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35504
Reviewed-by: Adam Langley <agl@google.com>
Unfortunately, it's not enough to be able to turn it on thanks to the
PURE_VIRTUAL macro. But it gets us most of the way there.
Change-Id: Ie6ad5119fcfd420115fa49d7312f3586890244f4
Reviewed-on: https://boringssl-review.googlesource.com/c/34949
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Due to a language flaw in C, left-shifts on signed integers are
undefined for negative numbers. This makes them all but useless. Cast to
the unsigned type, left-shift, and cast back (casts are defined to wrap)
to silence UBSan.
Change-Id: I8fbe739aee1c99cf553462b675863e6d68c2b302
Reviewed-on: https://boringssl-review.googlesource.com/c/34446
Reviewed-by: Adam Langley <agl@google.com>
This new version makes it much easier to tell which code is handwritten
and which is verified. For some reason, it also is *dramatically* faster
for 32-bit x86 GCC. Clang x86_64, however, does take a small hit.
Benchmarks below.
x86, GCC 7.3.0, OPENSSL_SMALL
(For some reason, GCC used to be really bad at compiling the 32-bit curve25519
code. The new one fixes this. I'm not sure what changed.)
Before:
Did 17135 Ed25519 key generation operations in 10026402us (1709.0 ops/sec)
Did 17170 Ed25519 signing operations in 10074192us (1704.4 ops/sec)
Did 9180 Ed25519 verify operations in 10034025us (914.9 ops/sec)
Did 17271 Curve25519 base-point multiplication operations in 10050837us (1718.4 ops/sec)
Did 10605 Curve25519 arbitrary point multiplication operations in 10047714us (1055.5 ops/sec)
Did 7800 ECDH P-256 operations in 10018331us (778.6 ops/sec)
Did 24308 ECDSA P-256 signing operations in 10019241us (2426.1 ops/sec)
Did 9191 ECDSA P-256 verify operations in 10081639us (911.7 ops/sec)
After:
Did 99873 Ed25519 key generation operations in 10021810us (9965.6 ops/sec) [+483.1%]
Did 99960 Ed25519 signing operations in 10052236us (9944.1 ops/sec) [+483.4%]
Did 53676 Ed25519 verify operations in 10009078us (5362.7 ops/sec) [+486.2%]
Did 102000 Curve25519 base-point multiplication operations in 10039764us (10159.6 ops/sec) [+491.2%]
Did 60802 Curve25519 arbitrary point multiplication operations in 10056897us (6045.8 ops/sec) [+472.8%]
Did 7900 ECDH P-256 operations in 10054509us (785.7 ops/sec) [+0.9%]
Did 24926 ECDSA P-256 signing operations in 10050919us (2480.0 ops/sec) [+2.2%]
Did 9494 ECDSA P-256 verify operations in 10064659us (943.3 ops/sec) [+3.5%]
x86, Clang 8.0.0 trunk 349417, OPENSSL_SMALL
Before:
Did 82750 Ed25519 key generation operations in 10051177us (8232.9 ops/sec)
Did 82400 Ed25519 signing operations in 10035806us (8210.6 ops/sec)
Did 41511 Ed25519 verify operations in 10048919us (4130.9 ops/sec)
Did 83300 Curve25519 base-point multiplication operations in 10044283us (8293.3 ops/sec)
Did 49700 Curve25519 arbitrary point multiplication operations in 10007005us (4966.5 ops/sec)
Did 14039 ECDH P-256 operations in 10093929us (1390.8 ops/sec)
Did 40950 ECDSA P-256 signing operations in 10006757us (4092.2 ops/sec)
Did 16068 ECDSA P-256 verify operations in 10095996us (1591.5 ops/sec)
After:
Did 80476 Ed25519 key generation operations in 10048648us (8008.6 ops/sec) [-2.7%]
Did 79050 Ed25519 signing operations in 10049180us (7866.3 ops/sec) [-4.2%]
Did 40501 Ed25519 verify operations in 10048347us (4030.6 ops/sec) [-2.4%]
Did 81300 Curve25519 base-point multiplication operations in 10017480us (8115.8 ops/sec) [-2.1%]
Did 48278 Curve25519 arbitrary point multiplication operations in 10092500us (4783.6 ops/sec) [-3.7%]
Did 15402 ECDH P-256 operations in 10096705us (1525.4 ops/sec) [+9.7%]
Did 44200 ECDSA P-256 signing operations in 10037715us (4403.4 ops/sec) [+7.6%]
Did 17000 ECDSA P-256 verify operations in 10008813us (1698.5 ops/sec) [+6.7%]
x86_64, GCC 7.3.0
(Note these P-256 numbers are not affected by this change. Included to get a
sense of noise.)
Before:
Did 557000 Ed25519 key generation operations in 10011721us (55634.8 ops/sec)
Did 550000 Ed25519 signing operations in 10016449us (54909.7 ops/sec)
Did 190000 Ed25519 verify operations in 10014565us (18972.4 ops/sec)
Did 587000 Curve25519 base-point multiplication operations in 10015402us (58609.7 ops/sec)
Did 230000 Curve25519 arbitrary point multiplication operations in 10023827us (22945.3 ops/sec)
Did 179000 ECDH P-256 operations in 10016294us (17870.9 ops/sec)
Did 557000 ECDSA P-256 signing operations in 10014158us (55621.3 ops/sec)
Did 198000 ECDSA P-256 verify operations in 10036694us (19727.6 ops/sec)
After:
Did 569000 Ed25519 key generation operations in 10004965us (56871.8 ops/sec) [+2.2%]
Did 563000 Ed25519 signing operations in 10000064us (56299.6 ops/sec) [+2.5%]
Did 196000 Ed25519 verify operations in 10025650us (19549.9 ops/sec) [+3.0%]
Did 596000 Curve25519 base-point multiplication operations in 10008666us (59548.4 ops/sec) [+1.6%]
Did 229000 Curve25519 arbitrary point multiplication operations in 10028921us (22834.0 ops/sec) [-0.5%]
Did 182910 ECDH P-256 operations in 10014905us (18263.8 ops/sec) [+2.2%]
Did 562000 ECDSA P-256 signing operations in 10011944us (56133.0 ops/sec) [+0.9%]
Did 202000 ECDSA P-256 verify operations in 10046901us (20105.7 ops/sec) [+1.9%]
x86_64, GCC 7.3.0, OPENSSL_SMALL
Before:
Did 350000 Ed25519 key generation operations in 10002540us (34991.1 ops/sec)
Did 344000 Ed25519 signing operations in 10010420us (34364.2 ops/sec)
Did 197000 Ed25519 verify operations in 10030593us (19639.9 ops/sec)
Did 362000 Curve25519 base-point multiplication operations in 10004615us (36183.3 ops/sec)
Did 235000 Curve25519 arbitrary point multiplication operations in 10025951us (23439.2 ops/sec)
Did 32032 ECDH P-256 operations in 10056486us (3185.2 ops/sec)
Did 96354 ECDSA P-256 signing operations in 10007297us (9628.4 ops/sec)
Did 37774 ECDSA P-256 verify operations in 10044892us (3760.5 ops/sec)
After:
Did 343000 Ed25519 key generation operations in 10025108us (34214.1 ops/sec) [-2.2%]
Did 340000 Ed25519 signing operations in 10014870us (33949.5 ops/sec) [-1.2%]
Did 192000 Ed25519 verify operations in 10025082us (19152.0 ops/sec) [-2.5%]
Did 355000 Curve25519 base-point multiplication operations in 10013220us (35453.1 ops/sec) [-2.0%]
Did 231000 Curve25519 arbitrary point multiplication operations in 10010775us (23075.1 ops/sec) [-1.6%]
Did 31540 ECDH P-256 operations in 10009664us (3151.0 ops/sec) [-1.1%]
Did 99012 ECDSA P-256 signing operations in 10090296us (9812.6 ops/sec) [+1.9%]
Did 37695 ECDSA P-256 verify operations in 10092859us (3734.8 ops/sec) [-0.7%]
x86_64, Clang 8.0.0 trunk 349417
(Note these P-256 numbers are not affected by this change. Included to get a
sense of noise.)
Before:
Did 600000 Ed25519 key generation operations in 10000278us (59998.3 ops/sec)
Did 595000 Ed25519 signing operations in 10010375us (59438.3 ops/sec)
Did 184000 Ed25519 verify operations in 10013984us (18374.3 ops/sec)
Did 636000 Curve25519 base-point multiplication operations in 10005250us (63566.6 ops/sec)
Did 229000 Curve25519 arbitrary point multiplication operations in 10006059us (22886.1 ops/sec)
Did 179250 ECDH P-256 operations in 10026354us (17877.9 ops/sec)
Did 547000 ECDSA P-256 signing operations in 10017585us (54604.0 ops/sec)
Did 197000 ECDSA P-256 verify operations in 10013020us (19674.4 ops/sec)
After:
Did 560000 Ed25519 key generation operations in 10009295us (55948.0 ops/sec) [-6.8%]
Did 548000 Ed25519 signing operations in 10007912us (54756.7 ops/sec) [-7.9%]
Did 170000 Ed25519 verify operations in 10056948us (16903.7 ops/sec) [-8.0%]
Did 592000 Curve25519 base-point multiplication operations in 10016818us (59100.6 ops/sec) [-7.0%]
Did 214000 Curve25519 arbitrary point multiplication operations in 10043918us (21306.4 ops/sec) [-6.9%]
Did 180000 ECDH P-256 operations in 10026019us (17953.3 ops/sec) [+0.4%]
Did 550000 ECDSA P-256 signing operations in 10004943us (54972.8 ops/sec) [+0.7%]
Did 198000 ECDSA P-256 verify operations in 10021714us (19757.1 ops/sec) [+0.4%]
x86_64, Clang 8.0.0 trunk 349417, OPENSSL_SMALL
Before:
Did 326000 Ed25519 key generation operations in 10003266us (32589.4 ops/sec)
Did 322000 Ed25519 signing operations in 10026783us (32114.0 ops/sec)
Did 181000 Ed25519 verify operations in 10015635us (18071.7 ops/sec)
Did 335000 Curve25519 base-point multiplication operations in 10000359us (33498.8 ops/sec)
Did 224000 Curve25519 arbitrary point multiplication operations in 10027245us (22339.1 ops/sec)
Did 68552 ECDH P-256 operations in 10018900us (6842.3 ops/sec)
Did 184000 ECDSA P-256 signing operations in 10014516us (18373.3 ops/sec)
Did 76020 ECDSA P-256 verify operations in 10016891us (7589.2 ops/sec)
After:
Did 310000 Ed25519 key generation operations in 10022086us (30931.7 ops/sec) [-5.1%]
Did 308000 Ed25519 signing operations in 10007543us (30776.8 ops/sec) [-4.2%]
Did 173000 Ed25519 verify operations in 10005829us (17289.9 ops/sec) [-4.3%]
Did 321000 Curve25519 base-point multiplication operations in 10027058us (32013.4 ops/sec) [-4.4%]
Did 212000 Curve25519 arbitrary point multiplication operations in 10015203us (21167.8 ops/sec) [-5.2%]
Did 64059 ECDH P-256 operations in 10042781us (6378.6 ops/sec) [-6.8%]
Did 170000 ECDSA P-256 signing operations in 10030896us (16947.6 ops/sec) [-7.8%]
Did 72176 ECDSA P-256 verify operations in 10075369us (7163.6 ops/sec) [-5.6%]
Bug: 254
Change-Id: Ib04c773f01b542bcb8611cceb582466bfa6f6d52
Reviewed-on: https://boringssl-review.googlesource.com/c/34306
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
I don't believe we use this anymore. People using it should upgrade to a newer
NDK (or, worst case, download android-cmake themselves).
Change-Id: Ia99d7b19d6f2ec3f4ffe90795813b00480dc2d60
Reviewed-on: https://boringssl-review.googlesource.com/c/34004
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
The change seems to have stuck, so bring us closer to C/++11 static asserts.
(If we later find we need to support worse toolchains, we can always use
__LINE__ or __COUNTER__ to avoid duplicate typedef names and just punt on
embedding the message into the type name.)
Change-Id: I0e5bb1106405066f07740728e19ebe13cae3e0ee
Reviewed-on: https://boringssl-review.googlesource.com/c/33145
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
For simplicity, punt order > field or width mismatches. Analogous
optimizations are possible, but the generic path works fine and no
commonly-used curve looks hits those cases.
Before:
Did 5888 ECDSA P-384 verify operations in 3094535us (1902.7 ops/sec)
After [+6.7%]:
Did 6107 ECDSA P-384 verify operations in 3007515us (2030.6 ops/sec)
Also we can fill in p - order generically and avoid extra copies of some
constants.
Change-Id: I38e1b6d51b28ed4f8cb74697b00a4f0fbc5efc3c
Reviewed-on: https://boringssl-review.googlesource.com/c/33068
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
This is in preparation for removing the BIGNUM from cmp_x_coordinate.
Change-Id: Id8394248e3019a4897c238289f039f436a13679d
Reviewed-on: https://boringssl-review.googlesource.com/c/33064
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
This reverts commit e907ed4c4b92cd7cea250240e32695c73dacf7eb. CPUID
checks have been added so hopefully this time sticks.
Change-Id: I5e0e5b87427c1230132681f936b3c70bac8263b8
Reviewed-on: https://boringssl-review.googlesource.com/c/32924
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
This reverts commit 3d450d2844db825a906fc19f7bb1e6ce765047db. It fails
SDE, looks like a missing CPUID check before using vector instructions.
Change-Id: I6b7dd71d9e5b1f509d2e018bd8be38c973476b4e
Reviewed-on: https://boringssl-review.googlesource.com/c/32864
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Some consumer stumbled upon EC_POINT_{add,dbl} being faster with a
"custom" P-224 curve than the built-in one and made "custom" clones to
work around this. Before the EC_FELEM refactor, EC_GFp_nistp224_method
used BN_mod_mul for all reductions in fallback point arithmetic (we
primarily support the multiplication functions and keep the low-level
point arithmetic for legacy reasons) which took quite a performance hit.
EC_FELEM fixed this, but standalone felem_{mul,sqr} calls out of
nistp224 perform a lot of reductions, rather than batching them up as
that implementation is intended. So it is still slightly faster to use a
"custom" curve.
Custom curves are the last thing we want to encourage, so just route the
tuned implementations out of EC_METHOD to close this gap. Now the
built-in implementation is always solidly faster than (or identical to)
the custom clone. This also reduces the number of places where we mix
up tuned vs. generic implementation, which gets us closer to making
EC_POINT's representation EC_METHOD-specific.
Change-Id: I843e1101a6208eaabb56d29d342e886e523c78b4
Reviewed-on: https://boringssl-review.googlesource.com/c/32848
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
This commit improves the performance of ECDSA signature verification
(over NIST P-256 curve) for x86 platforms. The speedup is by a factor of 1.15x.
It does so by:
1) Leveraging the fact that the verification does not need
to run in constant time. To this end, we implemented:
a) the function ecp_nistz256_points_mul_public in a similar way to
the current ecp_nistz256_points_mul function by removing its constant
time features.
b) the Binary Extended Euclidean Algorithm (BEEU) in x86 assembly to
replace the current modular inverse function used for the inversion.
2) The last step in the ECDSA_verify function compares the (x) affine
coordinate with the signature (r) value. Converting x from the Jacobian's
representation to the affine coordinate requires to perform one inversions
(x_affine = x * z^(-2)). We save this inversion and speed up the computations
by instead bringing r to x (r_jacobian = r*z^2) which is faster.
The measured results are:
Before (on a Kaby Lake desktop with gcc-5):
Did 26000 ECDSA P-224 signing operations in 1002372us (25938.5 ops/sec)
Did 11000 ECDSA P-224 verify operations in 1043821us (10538.2 ops/sec)
Did 55000 ECDSA P-256 signing operations in 1017560us (54050.9 ops/sec)
Did 17000 ECDSA P-256 verify operations in 1051280us (16170.8 ops/sec)
After (on a Kaby Lake desktop with gcc-5):
Did 27000 ECDSA P-224 signing operations in 1011287us (26698.7 ops/sec)
Did 11640 ECDSA P-224 verify operations in 1076698us (10810.8 ops/sec)
Did 55000 ECDSA P-256 signing operations in 1016880us (54087.0 ops/sec)
Did 20000 ECDSA P-256 verify operations in 1038736us (19254.2 ops/sec)
Before (on a Skylake server platform with gcc-5):
Did 25000 ECDSA P-224 signing operations in 1021651us (24470.2 ops/sec)
Did 10373 ECDSA P-224 verify operations in 1046563us (9911.5 ops/sec)
Did 50000 ECDSA P-256 signing operations in 1002774us (49861.7 ops/sec)
Did 15000 ECDSA P-256 verify operations in 1006471us (14903.6 ops/sec)
After (on a Skylake server platform with gcc-5):
Did 25000 ECDSA P-224 signing operations in 1020958us (24486.8 ops/sec)
Did 10373 ECDSA P-224 verify operations in 1046359us (9913.4 ops/sec)
Did 50000 ECDSA P-256 signing operations in 1003996us (49801.0 ops/sec)
Did 18000 ECDSA P-256 verify operations in 1021604us (17619.4 ops/sec)
Developers and authors:
***************************************************************************
Nir Drucker (1,2), Shay Gueron (1,2)
(1) Amazon Web Services Inc.
(2) University of Haifa, Israel
***************************************************************************
Change-Id: Idd42a7bc40626bce974ea000b61fdb5bad33851c
Reviewed-on: https://boringssl-review.googlesource.com/c/31304
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
- In base.h, if BORINGSSL_PREFIX is defined, include
boringssl_prefix_symbols.h
- In all .S files, if BORINGSSL_PREFIX is defined, include
boringssl_prefix_symbols_asm.h
- In base.h, BSSL_NAMESPACE_BEGIN and BSSL_NAMESPACE_END are
defined with appropriate values depending on whether
BORINGSSL_PREFIX is defined; these macros are used in place
of 'namespace bssl {' and '}'
- Add util/make_prefix_headers.go, which takes a list of symbols
and auto-generates the header files mentioned above
- In CMakeLists.txt, if BORINGSSL_PREFIX and BORINGSSL_PREFIX_SYMBOLS
are defined, run util/make_prefix_headers.go to generate header
files
- In various CMakeLists.txt files, add "global_target" that all
targets depend on to give us a place to hook logic that must run
before all other targets (in particular, the header file generation
logic)
- Document this in BUILDING.md, including the fact that it is
the caller's responsibility to provide the symbol list and keep it
up to date
- Note that this scheme has not been tested on Windows, and likely
does not work on it; Windows support will need to be added in a
future commit
Change-Id: If66a7157f46b5b66230ef91e15826b910cf979a2
Reviewed-on: https://boringssl-review.googlesource.com/31364
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: David Benjamin <davidben@google.com>
Change-Id: I45866c3a4aa98ebac51d4e554a22eb5add45002f
Reviewed-on: https://boringssl-review.googlesource.com/31404
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Along the way, split up the EVPTest Wycheproof tests into separate tests (they
shard better when running in parallel).
Change-Id: I5ee919f7ec7c35a7f2e0cc2af4142991a808a9db
Reviewed-on: https://boringssl-review.googlesource.com/30846
Reviewed-by: Adam Langley <agl@google.com>